Adam Novak
@adamnovak
That all being said, WDL runners are able to do this with WDL code, so it might not be impossible.
Michael Milton
@multimeric
Thanks for the answer. One angle I've seen used in another system is to annotate each job with a hash which we allow the user to calculate. Then the user can try simple solutions like just hashing the file that the job resides in, or alternatively just keeping a manual version number for each task.
If WDL already does this, then I guess Toil has some concept of "has this job changed", which I would just need to plug this logic into
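A minimal sketch of the two "simple solutions" mentioned above, assuming a job is an ordinary Python function. The helper names `source_file_hash` and `job_cache_key` are hypothetical and not part of Toil; this only shows how a user-calculated key could be derived.

```python
import hashlib
import inspect
from pathlib import Path


def source_file_hash(func) -> str:
    """Hash the entire source file in which func is defined."""
    source_path = inspect.getsourcefile(func)
    if source_path is None:
        raise ValueError(f"no source file found for {func!r}")
    return hashlib.sha256(Path(source_path).read_bytes()).hexdigest()


def job_cache_key(func, manual_version=None) -> str:
    """Use a manual version number if given, else fall back to the file hash."""
    if manual_version is not None:
        return f"{func.__qualname__}:v{manual_version}"
    return f"{func.__qualname__}:{source_file_hash(func)}"


# Example: key a (stand-in) job function both ways.
print(job_cache_key(inspect.getsourcefile, manual_version=2))
print(job_cache_key(inspect.getsourcefile))
```

Hashing the whole file is coarse: any edit to the file invalidates every job defined in it, which is the trade-off against maintaining version numbers by hand.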
Marcel Loose
@gmloose
Can anyone explain to me how to interpret the output of toil --stats? The documentation is quite limited.
Marcel Loose
@gmloose
Today, I've been bitten by the fact that CWLTool URL-encodes a + character in a filename to %2B. This results in an error: Cannot make job: Invalid filename: 'P233%2B35_structure.txt' contains illegal characters
I saw there are several issues that refer to this:
common-workflow-language/cwltool#1260,
common-workflow-language/cwltool#1098, and
common-workflow-language/cwltool#1445.
Where the last one even contains an almost finished pull request.
So I was wondering, what's the status of this issue? Is it indeed a bug in CWLTool, or is this a (too) strict limitation by CWLTool on allowed characters in a filename?
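For illustration only, the encoding itself can be reproduced with the standard library. This is not cwltool's code path, just a sketch of how percent-encoding turns the `+` into `%2B` and how decoding restores it.

```python
from urllib.parse import quote, unquote

filename = "P233+35_structure.txt"

# Percent-encode the name; "+" is not in quote()'s default safe set.
encoded = quote(filename)
# Decoding gives back the original name.
decoded = unquote(encoded)

print(encoded)
print(decoded == filename)
```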
crusoe
@mr-c:matrix.org
[m]
Sorry to hear that @gmloose ; https://github.com/common-workflow-language/cwltool/pull/1446#issuecomment-850896086 shows that the PR needs some assistance. Would you like to finish it up?
It is indeed a bug in cwltool; and it should be fixed
Marcel Loose
@gmloose
I could have a look, though I have limited time.
crusoe
@mr-c:matrix.org
[m]
Thanks. Seems like it just needs a docstring plus tweaking Alex's tests to cover the rest of the newly added code paths
Marcel Loose
@gmloose
I was curious about the doc-string part. None of the functions in that file have a doc-string. Why is it enforced on this single test function?
crusoe
@mr-c:matrix.org
[m]
It was a requirement added later, and we use diff-cover to only enforce it for new code
Marcel Loose
@gmloose
I'm not really familiar with tox. How can I run only the modified test test_path_checks.py and get coverage stats?
crusoe
@mr-c:matrix.org
[m]

The two lines with no test coverage are annotated at https://github.com/common-workflow-language/cwltool/pull/1446/files#annotation_2008443310

For local checking you'll need to run all the tests with make diff-cover

Marcel Loose
@gmloose

Hm, that fails.

```
$ make diff-cover
python --version 2>&1 | grep "Python 3"
Python 3.6.9
python -m pytest -rs --cov --cov-config=.coveragerc --cov-report=
ERROR: usage: main.py [options] [file_or_dir] [file_or_dir] [...]
main.py: error: unrecognized arguments: -n --cov --cov-config=.coveragerc --cov-report=
inifile: /home/marcel/code/cwltool/tox.ini
rootdir: /home/marcel/code/cwltool

Makefile:155: recipe for target 'testcov' failed
make: *** [testcov] Error 4
```

OK, make install-dep seemed to do the trick.
Marcel Loose
@gmloose
But it still runs all tests :(. I had expected make diff-cover would only run tests in test_path_checks.py. Am I doing something wrong?
crusoe
@mr-c:matrix.org
[m]
you have to run all the tests to see how the other changes may have impacted the code coverage. You aren't doing anything wrong, no.
crusoe
@mr-c:matrix.org
[m]

So when I run make diff-cover on that PR locally I get:

cwltool/command_line_tool.py (62.5%): Missing lines 207,256-257

Which matches what codecov.io reports, so that is good 🙂
Marcel Loose
@gmloose
I've tried to get my head around what exactly is going on in revmap_file and the new test. I get the impression that the test (in its current setup) can only check that what you put in as filename, also gets out (i.e. the external filename representation). I guess that's why only the if clause is covered by the test. I guess the else clause will only be executed if you supply an internal filename representation (at least, that's what I'm guessing right now). I'm not sure how I would have to supply an internal filename representation in that current test, because it uses a CommandLineTool, which is an external thingy.
crusoe
@mr-c:matrix.org
[m]
internal in this case refers to a path within a software (docker) container
(if that works then we can collapse the code duplication later, don't worry about that for now)
Marcel Loose
@gmloose
Only adding a DockerRequirement doesn't help much. I still have an empty scheme in line 203 of command_line_tool.py. I know I can call as_uri() on the Path variable that is used in the test, but I don't know how to tweak the test to do so without completely breaking it.
crusoe
@mr-c:matrix.org
[m]
Ah, for that set RuntimeContext.outdir to some file:/// reference to a tmpdir
(tmp_path / "outdir").as_uri()
hmm.. no, that doesn't work
That is some very old code, the check for a scheme in outdir
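A small sketch of what the "empty scheme" means, assuming a POSIX system and using only the standard library (no cwltool objects). A plain filesystem path parses with no URL scheme, while the `as_uri()` form parses with scheme `file`. It does not by itself make the test reach the uncovered branch.

```python
import tempfile
from pathlib import Path
from urllib.parse import urlsplit

with tempfile.TemporaryDirectory() as tmp:
    # as_uri() needs an absolute path; the directory itself need not exist.
    outdir = Path(tmp).resolve() / "outdir"
    plain = str(outdir)
    as_uri = outdir.as_uri()

    # Scheme of a plain POSIX path vs. its file:// representation.
    plain_scheme = urlsplit(plain).scheme
    uri_scheme = urlsplit(as_uri).scheme

print(repr(plain_scheme), repr(uri_scheme))
```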
crusoe
@mr-c:matrix.org
[m]
Toil-cwl-runner doesn't use that. Maybe arvados-cwl-runner does? Paging @tetron ..
@gmloose: Let's ignore that part for the moment (and thanks for looking into this!); what about the other two lines that have no coverage?
huh, even more ancient code, from January 2018..
I'm tempted to remove both uncovered branches...
crusoe
@mr-c:matrix.org
[m]
@gmloose: I've removed the unused code and added a docstring ; thanks for the reminder about this PR!
Marcel Loose
@gmloose
So, it's ready to be merged? That would be great. I'm one of the first that would like to give it a test spin.
crusoe
@mr-c:matrix.org
[m]
If all the CI tests pass, I'll merge and make a new release; yep 🙂
Marcel Loose
@gmloose
Great!
Marcel Loose
@gmloose
Thanks. I'll give it a go today.
Marcel Loose
@gmloose
BTW: This probably means that not only issue common-workflow-language/cwltool#1445 can be closed, but common-workflow-language/cwltool#1260, and common-workflow-language/cwltool#1098 too.
crusoe
@mr-c:matrix.org
[m]
@gmloose: Huzzah, thanks for noticing!
Marcel Loose
@gmloose
Maybe a bit of a naive question. But why is it that the Toil Workflow progress bar doesn't know the total number of jobs to run beforehand? The workflow is validated and parsed completely before processing starts, right? However, when I run my workflow, I see the total number of jobs increasing during the run. This makes the progress bar quite useless for measuring progress.
crusoe
@mr-c:matrix.org
[m]
I think that feature was made for traditional (Python only) Toil workflows; it probably needs some work for toil-cwl-runner
Adam Novak
@adamnovak
@gmloose I didn't really understand the leader when I wrote it, and I wasn't willing to add any sort of traversal of the job graph. So we're using just the jobs that are currently ready to run as the progress denominator: https://github.com/DataBiosphere/toil/blob/eb2ae8365ae2ebdd50132570b20f7d480eb40cac/src/toil/leader.py#L854-L856
Maybe we should really be using just every job ID the leader has ever heard of, now that we have a cache in the ToilState that ought to have copies of everything.
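A toy illustration of why the total keeps growing, under the assumption that jobs only become known to the counter once their predecessor finishes. The graph and the `simulate` function are hypothetical and are not Toil's leader code.

```python
from collections import deque

# Hypothetical job graph: job -> successors that become ready when it finishes.
successors = {"root": ["a", "b"], "a": ["c"], "b": ["d"], "c": [], "d": []}


def simulate(graph, root):
    """Return (done, total_seen_so_far) after each finished job."""
    ready = deque([root])
    seen = 1  # denominator: only jobs that have become ready so far
    done = 0
    history = []
    while ready:
        job = ready.popleft()
        done += 1
        for child in graph[job]:
            ready.append(child)
            seen += 1
        history.append((done, seen))
    return history


for done, seen in simulate(successors, "root"):
    # The denominator moves during the run; len(successors) is the fixed total.
    print(f"{done}/{seen} (whole graph: {len(successors)})")
```

With a denominator based on every job known from the start, the bar would read out of 5 from the first step; with the ready-so-far count it starts out of 3 and climbs.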
Marcel Loose
@gmloose
I know this goes against the idea of Toil doing its utmost to complete as much work as possible and to recover from (spurious) errors in individual workflow steps, but ... is it possible to make Toil fail early and exit on the first error it encounters? This would be very helpful in debugging.
Marcel Loose
@gmloose

I encounter NFS issues when running Toil with Slurm on one of our clusters, causing jobs to fail. Two typical tracebacks follow below:

Traceback (most recent call last):
  File "/project/rapthor/Software/rapthor/bin/_toil_worker", line 8, in <module>
    sys.exit(main())
  File "/project/rapthor/Software/rapthor/lib/python3.6/site-packages/toil/worker.py", line 710, in main
    with in_contexts(options.context):
  File "/usr/lib/python3.6/contextlib.py", line 81, in __enter__
    return next(self.gen)
  File "/project/rapthor/Software/rapthor/lib/python3.6/site-packages/toil/worker.py", line 684, in in_contexts
    with manager:
  File "/project/rapthor/Software/rapthor/lib/python3.6/site-packages/toil/batchSystems/abstractBatchSystem.py", line 505, in __enter__
    self.arena.enter()
  File "/project/rapthor/Software/rapthor/lib/python3.6/site-packages/toil/lib/threading.py", line 438, in enter
    with global_mutex(self.workDir, self.mutex):
  File "/usr/lib/python3.6/contextlib.py", line 81, in __enter__
    return next(self.gen)
  File "/project/rapthor/Software/rapthor/lib/python3.6/site-packages/toil/lib/threading.py", line 340, in global_mutex
    fd_stats = os.fstat(fd)
OSError: [Errno 116] Stale file handle
Traceback (most recent call last):
  File "/project/rapthor/Software/rapthor/lib/python3.6/site-packages/toil/deferred.py", line 215, in cleanupWorker
    robust_rmtree(os.path.join(stateDirBase, cls.STATE_DIR_STEM))
  File "/project/rapthor/Software/rapthor/lib/python3.6/site-packages/toil/lib/io.py", line 51, in robust_rmtree
    robust_rmtree(child_path)
  File "/project/rapthor/Software/rapthor/lib/python3.6/site-packages/toil/lib/io.py", line 64, in robust_rmtree
    os.unlink(path)
OSError: [Errno 16] Device or resource busy: b'/project/rapthor/Share/prefactor/L667520/working/f7a704078c8f54fc8a7ccb44a8d5d5f6/deferred/.nfs00000000000e74070000e57a'

Both types of error seem to occur during clean-up.
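For reference, errno 116 and errno 16 in the tracebacks above are ESTALE and EBUSY on Linux. The sketch below shows one way clean-up code could treat those two as non-fatal. `unlink_tolerating_nfs` is a hypothetical helper, not what Toil's `robust_rmtree` does, and it assumes a platform where `errno.ESTALE` is defined.

```python
import errno
import os
import tempfile

# Errors that NFS can raise while another client still holds the file.
NFS_TRANSIENT_ERRNOS = {errno.EBUSY, errno.ESTALE}


def unlink_tolerating_nfs(path) -> bool:
    """Return True if the file is gone, False if NFS reported busy/stale."""
    try:
        os.unlink(path)
        return True
    except FileNotFoundError:
        # Already removed by someone else: treat as success.
        return True
    except OSError as e:
        if e.errno in NFS_TRANSIENT_ERRNOS:
            return False  # leave the .nfsXXXX file for a later retry
        raise


# Usage on an ordinary local file.
fd, tmp_name = tempfile.mkstemp()
os.close(fd)
first = unlink_tolerating_nfs(tmp_name)
second = unlink_tolerating_nfs(tmp_name)
print(first, second)
```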

Rohith B S
@rohith-bs
[2021-11-01T11:50:40+0530] [MainThread] [W] [toil.leader] Job failed with exit value 1: 'JobFunctionWrappingJob' kind-JobFunctionWrappingJob/instance-l0jq0ypl
Exit reason: None
[2021-11-01T11:50:40+0530] [MainThread] [W] [toil.leader] No log file is present, despite job failing: 'JobFunctionWrappingJob' kind-JobFunctionWrappingJob/instance-l0jq0ypl
[2021-11-01T11:50:54+0530] [MainThread] [W] [toil.job] Due to failure we are reducing the remaining try count of job 'JobFunctionWrappingJob' kind-JobFunctionWrappingJob/instance-l0jq0ypl with ID kind-JobFunctionWrappingJob/instance-l0jq0ypl to 2
Kindly suggest why this happens. I do not see any errors as such in the execution. I am currently using the Slurm batchSystem.
Adam Novak
@adamnovak

Anybody ever see anything like this from a Toil worker? Maybe @mr-c:matrix.org ?

    Traceback (most recent call last):
      File "/usr/local/lib/python3.6/dist-packages/toil/worker.py", line 376, in workerScript
        job = Job.loadJob(jobStore, jobDesc)
      File "/usr/local/lib/python3.6/dist-packages/toil/job.py", line 2251, in loadJob
        job = cls._unpickle(userModule, fileHandle, requireInstanceOf=Job)
      File "/usr/local/lib/python3.6/dist-packages/toil/job.py", line 1876, in _unpickle
        runnable = unpickler.load()
    AttributeError: 'Comment' object has no attribute '_end'

I'm trying to run some CWL CI tests, which broke when we rebuilt our Gitlab, with a local leader against our Kubernetes, and I'm getting this. I'd say it's a cwltool version mismatch, but as far as I can tell I have cwltool==3.1.20211020155521 in both my container and my leader virtualenv. Does CWL have a Comment object that recently grew or lost a _end?
