Bo
@BoPeng
This is because SoS tries to aggregate groups of inputs when two grouped output_from are combined. To ungroup the output, you need to use group_by='all' inside output_from.
Bo
@BoPeng

Running sos run test combined with test.sos containing the following workflow:

[single]
input: for_each=dict(i=range(2))
output: f'single_{i}.bam'

_output.touch()

[double]
input: for_each=dict(i=range(2))
output: f'double_{i}.bam'

_output.touch()

[combined]
input: output_from('single'), output_from('double')

print(_input)

You will see that the two groups from single and the two groups from double are combined to form two groups, each with one output from single and one output from double.
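Concretely, the four output files should pair up by index, so print(_input) in [combined] prints two groups, roughly:

single_0.bam double_0.bam
single_1.bam double_1.bam

To get one file per group instead, flatten both output_from first and then regroup: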

[single]
input: for_each=dict(i=range(2))
output: f'single_{i}.bam'

_output.touch()

[double]
input: for_each=dict(i=range(3))
output: f'double_{i}.bam'

_output.touch()

[combined]
input: output_from('single', group_by='all'), output_from('double', group_by='all'), group_by=1

print(_input)
This basically "flattens" and joins both output_from into a single group before separating it into groups of one file each (group_by=1).
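Concretely, the five files (two from single, three from double) are flattened and regrouped into five one-file groups, so the five substeps should print roughly:

single_0.bam
single_1.bam
double_0.bam
double_1.bam
double_2.bam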
Bo
@BoPeng
This is documented here but perhaps a more explicit example should be given.
Patrick Cudahy
@pgcudahy
That works well, thanks! I had read that documentation but couldn't figure out how to put it all together.
Patrick Cudahy
@pgcudahy
Hello, another quick question. The login nodes for my cluster get pretty congested, and during peak hours I start to see a lot of ERROR: ERROR workflow_executor.py:1206 - Failed to connect to yale_hpc_slurm: ssh connection to pgc29@xxx.xxx.xxx.xxx time out with prompt: b'' - None errors. Is there a way to adjust the timeout to make it longer?
Bo
@BoPeng
There is an option to adjust the frequency at which task status is checked (30s if you copied the examples), but as far as I know there is no option to adjust the timeout of the underlying ssh command.
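A possible workaround, as a sketch with placeholder values (and assuming status_check_interval is the polling option mentioned above): since SoS shells out to the system ssh, the connection timeout can be raised in ~/.ssh/config, and the polling interval is a per-host setting in ~/.sos/hosts.yml.

# ~/.ssh/config -- the Host pattern must match the address SoS connects to
Host xxx.xxx.xxx.xxx
    ConnectTimeout 60

# ~/.sos/hosts.yml -- check task status less often during peak hours
hosts:
  yale_hpc_slurm:
    status_check_interval: 60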
Patrick Cudahy
@pgcudahy
My dataset has gotten so large (several thousand genomes) that at every step there will likely be one or two failures: a subtle race condition, an outlier requiring much more memory or runtime so the cluster kills it, an initial file that turns out to be a contaminant, etc. So my workflow has started to bog down into a cycle of 1) submit the job, 2) check in a few hours later and see which substeps failed, 3) adjust parameters or just resubmit (it often just works the second time), 4) check in a few hours later to see how step 2 failed, 5) resubmit, 6) repeat. With a pipeline of >10 steps, this is tedious. I'd prefer not to have to babysit runs as much. Is there a way to have SoS continue to the next step even if some substeps fail? That way 99% of my samples will make it from FASTQ to a final VCF, and I can tweak things in a second run to finish up the failed 1%. Any other suggestions on how to improve robustness would be welcome.
Bo
@BoPeng
Sorry, been a busy day. If you run sos run -h, there is an option -e ERRORMODE; I think you are looking for -e ignore.
@pgcudahy
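For example (the workflow file name is a placeholder; the queue is the one from the earlier messages):

sos run mapping.sos -q yale_hpc_slurm -e ignore

With -e ignore, errors from failed substeps should no longer stop the workflow, so the surviving samples can reach the final step in one pass and the failed 1% can be rerun later.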
Patrick Cudahy
@pgcudahy
Thanks!
Patrick Cudahy
@pgcudahy
I'm having some issues with steps that have remote inputs and outputs: SoS does not notice changed files and skips the steps because of saved signatures. Where exactly are signatures stored for jobs run remotely, and how can I clear them? I've tried !sos remove -s from within my notebook, but steps still get skipped.
Bo
@BoPeng
@pgcudahy The contents of "remote" files are not checked at this point, which is why signatures involving remote files are inaccurate. The current mechanism for remote files works for some cases but is broken for others (I have a ticket for returning remote files when workdir is set) and certainly needs improvement.
This is now vatlab/sos#1411
Bo
@BoPeng
@pgcudahy Just released sos notebook 0.22.4, which makes task execution non-blocking so that you can check status, remove tasks, etc. with the buttons. Let me know if you notice any problems.
Bo
@BoPeng
vatlab/sos#1411 is also implemented, although more testing is needed for the next release.
Patrick Cudahy
@pgcudahy
Thank you Bo. I only see 0.22.3 on pip. Are there instructions somewhere on how to install from github?
0.22.3 seems to have broken my scripts, so I'm trying to figure out what's wrong
Bo
@BoPeng
0.22.3 should also be on conda. To install from GitHub, you will have to use a command such as pip install git+https://github.com/vatlab/sos.git
shashj
@shashj
Hi, is it possible to change the sigil in a workflow?
@BoPeng ?
shashj
@shashj
found it thanks
Bo
@BoPeng
@shashj Yeah, that is a basic feature, but we recently added a warning message for scripts included without indentation. I just updated the documentation.
Patrick Cudahy
@pgcudahy
Hello Bo, when submitting workflows and tasks to a cluster (e.g., %run check_validation -q yale_hpc_task_spooler -r yale_hpc_task_spooler), is there a way to synchronize the output back to my local computer? Using a named path like #scratch fails with WARNING: Error from step check_validation is ignored: [check_validation]: Failed to process step output (f'#scratch/helen_mixed_infection/data/fqtools/good_files.txt'): 'NoneType' object has no attribute 'expanduser'
Bo
@BoPeng
@pgcudahy I will have a look through #1437
Bo
@BoPeng
You seem to be talking about two problems. For the first one, -r is designed to execute everything on the remote host, whereas -q is designed to execute only part of the work remotely and therefore has a mechanism to sync files back. There are sos remote push/pull commands, but rsync can be more straightforward.
The second issue, with the anchored path, looks more like a bug, but I need more background on how it happened.
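In the meantime, a one-off manual sync with rsync would look something like this (user, host, and the remote scratch path are placeholders):

rsync -av pgc29@login.cluster.edu:/path/to/scratch/helen_mixed_infection/data/fqtools/ ./data/fqtools/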