[Adam Novak, UCSC GI] @ionox0 This is part of the not-actually-correct logic that Toil has for preparing directories for CWL workflows. The underscore prefix is something cwltool uses to indicate a directory that is to be created to present to a CWL tool. Toil tries to generate this to control cwltool, but in practice what we have in 5.4 only works when running on a single machine or otherwise using a shared filesystem between nodes.
I've been redoing all that logic in DataBiosphere/toil#3628 so that Toil can be responsible for setting up the directory structures that CWL tools expect to see, whether there's a shared filesystem or not, but I still don't have it fully working yet. When it's done, it should be much harder to break.
exit code 120 pointing to ?
I'm trying to run Toil on the internal Kubernetes cluster; this is the command I used:
toil-cwl-runner --logDebug --enable-dev --batchSystem kubernetes --jobStore aws:us-east-1:toil-test --stats --singularity --defaultCores 1 md_launch.cwl md_list_input_descriptions.yml
but I'm getting a permission error:
<Response><Errors><Error><Code>AuthorizationFailure</Code><Message>User (arn:aws:iam::07445xxxxxx:user/cibin) does not have permission to perform (sdb:Select) on resource (arn:aws:sdb:us-east-1:074455289529:domain/toil-registry). Contact account owner.</Message><BoxUsage>0.0000137200</BoxUsage></Error></Errors><RequestID>0e71b5c9-150a-b570-a5cf-31f1f751abca</RequestID></Response>
Toil version is 5.4.0
[Adam Novak, UCSC GI] @cibinsb, if you are setting up your own AWS roles/credentials (instead of using toil launch-cluster), you need to make sure you are granting access to SimpleDB in addition to S3 for the AWS job store to work.
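For example, a minimal IAM policy sketch covering both services might look like the following. This is an assumption-laden illustration, not Toil's documented policy: the statement IDs are made up, and `"Resource": "*"` is deliberately broad for brevity. In practice you would scope the actions and resources down (e.g. to the specific SimpleDB domain and S3 bucket the job store uses).

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "ToilSimpleDBAccess",
      "Effect": "Allow",
      "Action": "sdb:*",
      "Resource": "*"
    },
    {
      "Sid": "ToilS3Access",
      "Effect": "Allow",
      "Action": "s3:*",
      "Resource": "*"
    }
  ]
}
```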
As described in https://toil.readthedocs.io/en/latest/running/cloud/kubernetes.html#aws-job-store-for-kubernetes you need to grab some AWS credentials, put them in a Kubernetes secret, and set TOIL_AWS_SECRET_NAME when you run the workflow, to grant the workers access. You also need to make sure the leader has access, either by running it in a pod with the secret mounted into ~/.aws, or by setting up ~/.aws on whatever non-pod machine you are running the leader on.
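One way to package the credentials, sketched as a Kubernetes Secret manifest. The secret name and key are examples, not names Toil requires; the only part the source confirms is that the secret's name is passed via TOIL_AWS_SECRET_NAME:

```yaml
apiVersion: v1
kind: Secret
metadata:
  name: aws-credentials   # example name; pass it to Toil via TOIL_AWS_SECRET_NAME
stringData:
  credentials: |
    [default]
    aws_access_key_id = REDACTED
    aws_secret_access_key = REDACTED
```

Then, assuming that name, you would set TOIL_AWS_SECRET_NAME=aws-credentials in the environment before invoking toil-cwl-runner.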
toil-test--files? That sounds like a very generic name, and bucket names must be unique across all of AWS. It is quite possible that someone else is already using the job store named toil-test and that you will have to pick a different, unique name.
Traceback (most recent call last):
  File "/home/test-user/toil-scripts/script.py", line 282, in <module>
    Job.Runner.startToil(main_job, options)
  File "/usr/local/lib/python3.7/site-packages/toil/job.py", line 1743, in startToil
    return toil.restart()
  File "/usr/local/lib/python3.7/site-packages/toil/common.py", line 874, in restart
    return self._runMainLoop(rootJobDescription)
  File "/usr/local/lib/python3.7/site-packages/toil/common.py", line 1132, in _runMainLoop
    jobCache=self._jobCache).run()
  File "/usr/local/lib/python3.7/site-packages/toil/leader.py", line 229, in run
    self.innerLoop()
  File "/usr/local/lib/python3.7/site-packages/toil/leader.py", line 614, in innerLoop
    self._gatherUpdatedJobs(updatedJobTuple)
  File "/usr/local/lib/python3.7/site-packages/toil/leader.py", line 573, in _gatherUpdatedJobs
    self.processFinishedJob(jobID, exitStatus, wallTime=wallTime, exitReason=exitReason)
  File "/usr/local/lib/python3.7/site-packages/toil/leader.py", line 959, in processFinishedJob
    replacementJob = self.jobStore.load(jobStoreID)
  File "/usr/local/lib/python3.7/site-packages/toil/jobStores/fileJobStore.py", line 209, in load
    with open(jobFile, 'rb') as fileHandle:
FileNotFoundError: [Errno 2] No such file or directory: 'jobStore/jobs/kind-FunctionWrappingJob/instance-pb6jcg2c/job'
[Adam Novak, UCSC GI] @cibinsb, Toil automatically appends --files to the job store name to derive the S3 bucket name, because a job store is more than just an S3 bucket; it currently includes some SimpleDB stuff, and only files go in S3.
I would try a different name than toil-test, maybe something with cibinsb in it, and see if that works.
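To illustrate the naming scheme just described, here is a small sketch (not Toil's actual code) that derives the files bucket name from an "aws:&lt;region&gt;:&lt;name&gt;" job store locator like the one in the command above, given that Toil appends "--files" to the job store name:

```python
def bucket_name_for_locator(locator: str) -> str:
    """Derive the S3 files-bucket name from an AWS job store locator.

    Illustrative sketch only: mirrors the described "--files" suffix
    convention, not Toil's internal implementation.
    """
    scheme, region, name = locator.split(":", 2)
    if scheme != "aws":
        raise ValueError(f"not an AWS job store locator: {locator}")
    return f"{name}--files"

print(bucket_name_for_locator("aws:us-east-1:toil-test"))  # toil-test--files
```

This is why the globally-unique-name requirement applies to the derived bucket name, not just the job store name you typed.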
[Adam Novak, UCSC GI] @rohith-bs Are you running your job store on a shared network filesystem that might be lagging behind messages sent through the job scheduler/not globally consistent in real time? The job appears to exist, but then is gone by the time Toil goes to load it: https://github.com/DataBiosphere/toil/blob/77a39f507b729525926c5efc9e07377483cdd005/src/toil/leader.py#L955-L959
We may be best off extending the special case handling for stale reads we have for the AWS job store to also cover the file job store, so that when the job is slow to disappear we don't crash.
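The stale-read tolerance being proposed could look something like the sketch below: retry a job-store load a few times before letting the error propagate, so a briefly inconsistent shared filesystem does not crash the leader. All names here are illustrative, not Toil's actual API.

```python
import time

def load_with_retry(load, job_store_id, attempts=4, delay=0.5):
    """Retry a job-store load to tolerate stale reads.

    `load` stands in for a job store's load method. If it raises
    FileNotFoundError, wait and retry, on the theory that a laggy
    network filesystem may not yet show the file; give up and
    re-raise after the final attempt.
    """
    for attempt in range(attempts):
        try:
            return load(job_store_id)
        except FileNotFoundError:
            if attempt == attempts - 1:
                raise
            time.sleep(delay)  # give the filesystem time to catch up
```

The design trade-off is that a genuinely deleted job now costs a few extra seconds before failing, in exchange for not crashing on transient inconsistency.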