Douglas Lowe
@douglowe

[Adam Novak, UCSC GI] Hm. Maybe we need to just rip out that system and use --noStdOutErr by default...

Correction to my earlier post: the log messages didn't actually cause my problem; the default temp directory location did. Setting the TMPDIR environment variable solved this for me (but I foolishly did this at the same time as testing this flag yesterday, and have only tested them separately now).

Michael R. Crusoe
@mr-c
@douglowe Glad to hear it! I would be very sad if we had to turn on --noStdOutErr by default.
I wonder if we can test whether TMPDIR is set badly and give users a hint about that.
@douglowe the issue was that TMPDIR wasn't a path shared on all the nodes, yes?
I guess if --batchSystem is being used and the result of tempfile.gettempdir() starts with /tmp then that might be enough
Michael R. Crusoe
@mr-c
Douglas Lowe
@douglowe
@mr-c - I'm not sure quite what the issue was, but it's quite possible it was on a local disk for each node. I will check what the default is, and let you know.
Douglas Lowe
@douglowe
yep - /tmp is used when TMPDIR isn't set, and is local to each node, so adding a warning based on that should help
Michael R. Crusoe
@mr-c
@douglowe thanks for the confirmation!
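
A minimal sketch of the check discussed above, assuming a cluster batch system name is available as a string. The function names and the set of "local" batch system names are hypothetical and are not Toil's actual API; only tempfile.gettempdir() is a real standard-library call. The comparison is slightly stricter than a plain "starts with /tmp" so that paths such as /tmpfs are not flagged.

```python
import os
import tempfile
import warnings
from typing import Optional

# Assumption: names under which a single-machine (non-cluster) run might be
# selected. Adjust to whatever the real option values are.
LOCAL_BATCH_SYSTEMS = {"single_machine", "singleMachine"}


def tmpdir_looks_node_local(batch_system: str, tmpdir: Optional[str] = None) -> bool:
    """Return True when a cluster batch system is combined with a temp dir under /tmp."""
    if batch_system in LOCAL_BATCH_SYSTEMS:
        # Only one machine is involved, so a node-local /tmp is not a problem.
        return False
    if tmpdir is None:
        # Honours TMPDIR when it is set, and typically falls back to /tmp.
        tmpdir = tempfile.gettempdir()
    resolved = os.path.normpath(tmpdir)
    return resolved == "/tmp" or resolved.startswith("/tmp/")


def warn_about_tmpdir(batch_system: str, tmpdir: Optional[str] = None) -> None:
    """Emit a hint if the temp dir is probably not shared between nodes."""
    if tmpdir_looks_node_local(batch_system, tmpdir):
        warnings.warn(
            "The temporary directory is under /tmp, which is usually local to "
            "each node. Consider setting TMPDIR to a path shared by all nodes."
        )
```

Calling warn_about_tmpdir("slurm") before a run would print the hint on a system where TMPDIR is unset.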
serut
@serut

@hannes-ucsc Thanks for the answer! So Toil does not yet have a concept for managing affinity between tasks and the nodes of the cluster it manages. Should I create an issue to track it? A few more questions, if you don't mind: this information should be stored inside the CWL, no? As I see it, the CWL describes a set of tasks, and each task can be run either anywhere or on a specific node type. And to be as clear as possible: does the task affinity concept have any other impact on Toil, or is it just an additional attribute passed along in the Mesos call? It does not break any optimisation that Toil does under the hood?

Any advice?

Douglas Lowe
@douglowe
will the toil conda package be updated to v5.2.0 soon?
Michael R. Crusoe
@mr-c
Douglas Lowe
@douglowe
ahh, thanks - I'll keep an eye on the progress of that PR then
Douglas Lowe
@douglowe
@mr-c - I've copied your patch for debian, and created a PR for the new recipe: bioconda/bioconda-recipes#26427
fingers crossed it compiles okay
Michael R. Crusoe
@mr-c
👍👍👍
Douglas Lowe
@douglowe
The new conda package installs fine on OSX, but on our local HPC I've had to disable channel priority (conda config --set channel_priority false) to avoid clashing dependency issues (https://docs.conda.io/projects/conda/en/latest/user-guide/tasks/manage-channels.html)
I'll raise an issue to make a permanent note of this for you. Would you like that issue on the toil repository, or on the bioconda repository (which is a more logical place for this, as it's a conda issue, but then it's more likely to get lost in the noise)?
serut
@serut

@ArtRand I want to be able to schedule a task so that it is executed only if the Mesos resource has the required attribute. I can't just use a Docker image to provide the right environment for the task to succeed, as the server (Mesos agent) is not located in the same "datacenter". We do not want to register Toil with a specific role, as we want to talk to a single Toil that can distribute tasks to the right Mesos resource depending on their requirements. So yes, the binary must be present, and it will be if Toil uses the right Mesos attribute.

I'm sorry to bump that thread again, I really need advice on this one! :innocent:

Michael R. Crusoe
@mr-c
@serut that ability doesn't exist today, but it could be written. Do you have engineering resources available?
Arthur Rand
@ArtRand
I may have some time to POC a feature like that, but with the new K8s cluster functionality - adding a new resource requirement might require architectural discussions. I can take a look.
serut
@serut
@mr-c Yes, we have engineering resources that we can dedicate to this specific enhancement, and we would make it available to everyone. We need a clear design from your point of view and an estimate of the effort, but I think we can add this feature. On the other hand, if @ArtRand has some time to POC it and give our engineer a good base, that would be very nice. We do not have an active Toil contributor, so it won't be easy for us to contribute and test it, but I don't think this enhancement is too complicated. I think an issue on your tracker would be a good place to collaborate on this.
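
To make the requested feature concrete, here is a rough sketch of attribute-based matching, assuming each job carries a dictionary of required attributes and each Mesos offer exposes the agent's attributes as key/value strings. Nothing here is existing Toil code; offer_satisfies and pick_offer are hypothetical names used only for illustration.

```python
from typing import Dict, List, Optional


def offer_satisfies(required: Dict[str, str], offered: Dict[str, str]) -> bool:
    """True when every attribute the job requires is present with the same value."""
    return all(offered.get(key) == value for key, value in required.items())


def pick_offer(required: Dict[str, str], offers: List[Dict[str, str]]) -> Optional[int]:
    """Return the index of the first offer that satisfies the job, or None."""
    for index, offered in enumerate(offers):
        if offer_satisfies(required, offered):
            return index
    return None


# Hypothetical agents in two datacenters; only the second has the needed binary.
example_offers = [
    {"datacenter": "a"},
    {"datacenter": "b", "has_binary": "true"},
]
chosen = pick_offer({"datacenter": "b", "has_binary": "true"}, example_offers)
```

A job with no required attributes matches any offer, so existing workflows would be scheduled as before under this scheme.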
Nikhil Kumar
@nikhil
Does anyone know if toil automatically kills its child jobs when the leader is terminated? I see batchsystems have the kill functionality, but I don't see it in use anywhere.
Michael R. Crusoe
@mr-c
@serut That's great to hear! I only contribute to Toil on the CWL side, so I can't comment on the changes needed elsewhere. Opening an issue is a great idea. We can also make it a topic of a future edition of our weekly CWL video chats. @DailyDreaming is a regular attendee of those.
Peter Amstutz
@tetron
Lon Blauvelt
@DailyDreaming
Ugh, gitter is hooked into our slack service, which is where I usually get notifications. Our bot hasn't been syncing, so Adam and I haven't been getting notifications from here since January. >.<
Michael R. Crusoe
@mr-c
@nikhil I don't know, offhand. Perhaps @DailyDreaming does?
Lon Blauvelt
@DailyDreaming
@nikhil It should. We recommend using either the "toil kill" or "toil destroy-cluster" commands. If you hit a situation where this doesn't happen, it's definitely a bug and should be submitted as an issue.
Lon Blauvelt
@DailyDreaming
[Lon Blauvelt, UCSC GI] New slack integration test.
New slack integration reverse test.
Adam Novak
@adamnovak
"integration"
pvanheus
@pvanheus
@douglowe I found with the 5.2.0 conda package I had to install enlighten manually. did you see anything like this?
pvanheus
@pvanheus
heya Toil folks - I've written this workflow, the first step of which is just about processing an input directory full of files to generate a list of lists (of pairs of files): https://gist.github.com/pvanheus/cd4c730ec429741d0e5567b33fb38b85
I run it like toil-cwl-runner --singularity --stats --clusterStats --retryCount=0 --batchSystem slurm --disableCaching --tmpdir-prefix $(pwd)/tmp --tmp-outdir-prefix $(pwd)/tmp --workDir $(pwd)/work --logFile crypt-tb-profiler-toil.log --jobStore $(pwd)/crypticJobStore process_all_reads_tb_profiler.cwl cryptic-input.yml
this is toil 5.2.0 running from conda
and the first step results in all the files in the input directory being copied into ./work/node-913aeead[....] - is there a way to avoid this behaviour? the ExpressionTool I am using is just aiming to transform a Directory to a File[File[]] essentially
pvanheus
@pvanheus
(the directory in question has 12730 files (6365 samples) totalling 1.9 TB, so copying it is a bit heavyweight :) )
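
For illustration, the Directory to list-of-pairs step described above can be sketched in Python. The real step is a CWL ExpressionTool (see the gist); this sketch only shows that pairing needs file names, not file contents. The `<sample>_1.fastq.gz` / `<sample>_2.fastq.gz` naming convention is an assumption and is not taken from the workflow.

```python
import re
from collections import defaultdict
from typing import Dict, Iterable, List

# Assumed naming convention: "<sample>_1.fastq.gz" and "<sample>_2.fastq.gz".
PAIR_PATTERN = re.compile(r"^(?P<sample>.+)_(?P<read>[12])\.fastq\.gz$")


def pair_reads(filenames: Iterable[str]) -> List[List[str]]:
    """Group file names into [read1, read2] pairs, one pair per sample."""
    by_sample: Dict[str, Dict[str, str]] = defaultdict(dict)
    for name in filenames:
        match = PAIR_PATTERN.match(name)
        if match is None:
            continue  # Ignore files that do not follow the assumed convention.
        by_sample[match.group("sample")][match.group("read")] = name

    pairs: List[List[str]] = []
    for sample in sorted(by_sample):
        reads = by_sample[sample]
        if "1" in reads and "2" in reads:  # Skip samples missing a mate.
            pairs.append([reads["1"], reads["2"]])
    return pairs
```

With one pair per sample, 12730 files would yield 6365 pairs.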
pvanheus
@pvanheus

and the main part of the workflow is now running but there are a lot of

sacct: error: slurm_persist_conn_open: Something happened with the receiving/processing of the persistent connection init message to localhost:6819: Unable to connect to database
sacct: error: slurmdbd: Sending PersistInit msg: No error

errors

Lon Blauvelt
@DailyDreaming
[Adam Novak, UCSC GI] Maybe throwing --linkImports at it would help? That asks the importer to use symlinks when it can.
pvanheus
@pvanheus
ok I'll try that
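
As a generic illustration of "use symlinks when it can", the sketch below tries a symlink first and falls back to a copy. It is not Toil's importer; link_or_copy is a hypothetical helper, and the small demo at the end only exercises it in a temporary directory.

```python
import os
import shutil
import tempfile


def link_or_copy(source: str, destination: str) -> str:
    """Stage source at destination, preferring a symlink over a full copy."""
    try:
        # An absolute target keeps the link valid regardless of the working directory.
        os.symlink(os.path.abspath(source), destination)
        return "symlink"
    except OSError:
        # Symlinks can fail (unsupported filesystem, missing permissions,
        # destination already exists), so fall back to copying the data.
        shutil.copy2(source, destination)
        return "copy"


# Small demonstration in a scratch directory.
with tempfile.TemporaryDirectory() as scratch:
    src = os.path.join(scratch, "input.dat")
    with open(src, "w") as handle:
        handle.write("payload")
    dst = os.path.join(scratch, "staged.dat")
    method = link_or_copy(src, dst)
    with open(dst) as handle:
        staged_content = handle.read()
```

A symlink costs almost nothing regardless of file size, which is why it matters for a 1.9 TB input directory. It only helps when the link target is visible from every node that reads it.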
Lon Blauvelt
@DailyDreaming
@pvanheus I would also try to install Toil from source. We'll do a release soon, but the latest code tries to symlink for CWL where possible: DataBiosphere/toil#3445
pvanheus
@pvanheus
Thanks for the tips... I've added the --linkImports and installed from source (so my toil-cwl-runner is now version 5.3.0a1) and it is still copying everything into the work directory...
pvanheus
@pvanheus
and then after doing so it does some kind of reading of each file (perhaps populating "contents"?). I'm running it on a subset of 200 files to examine behaviour more closely
pvanheus
@pvanheus
oh... it copies each file into the jobStore dir. sigh unfortunately still a lot of copying
after all this is completed though, this version is much better at keeping my cluster busy :)
Lon Blauvelt
@DailyDreaming
@pvanheus That's odd, and I wouldn't expect it to still be copying.
I could try to run it from my end if you have a reproducible workflow that you wanted to make an issue for: https://github.com/DataBiosphere/toil/issues
Vijay Lakhujani
@vlakhujani
the clusterStats option does not produce a JSON output, am I missing something?
Lon Blauvelt
@DailyDreaming
@vlakhujani That option could be worded better, as it only works with mesos (and therefore AWS). If using a different cluster, try the --stats option. If using mesos/aws, then let me know because then that's a bug we need to fix.