Hi All, I am trying to start a pipeline using NF-tower on AWS batch, however the head job (nextflow itself) fails to start... I get:
Status reason Task failed to start
CannotStartContainerError: Error response from daemon: OCI runtime create failed: runc did not terminate successfully: unknown
The image that it is trying to start is:
Has anyone had this before?
There were several failures as described above, and now on the latest attempt it managed to start, but this is the log from the Nextflow container:
/usr/local/bin/nf-launcher.sh: line 25: /usr/bin/tee: Cannot allocate memory
/usr/local/bin/nf-launcher.sh: line 71: 12 Killed aws s3 sync --only-show-errors "$NXF_WORK/$cache_path" "$cache_path"
Failed to launch the Java virtual machine
NOTE: Nextflow is trying to use the Java VM defined by the following environment variables:
 JAVA_CMD: /usr/lib/jvm/java-11-amazon-corretto/bin/java
 NXF_OPTS:
/usr/local/bin/nf-launcher.sh: line 43: 30 Killed [[ "$NXF_WORK" == s3://* ]]
Obviously something is very wrong, but I feel I've checked everything and I'm not sure where else to look...
I'm finding that the tasks API (https://tower.nf/openapi/index.html#get-/workflow/-workflowId-/tasks) is returning info for a previous run of the same data (-resume was not used). Any ideas?
@pditommaso shall I file a gh issue for this or have I made an error? I'm receiving data for a previous run from the tasks API (ID 'EyoDOtC0ruDyv') when querying the latest run (ID '47kppVWxAkUEWl'), though '-resume' wasn't used for the latter.
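For reference, this is roughly the call I'm making (a sketch only; the `api.tower.nf` base URL and the bearer-token header are my assumptions from the OpenAPI docs, and the token is a placeholder):

```
curl -H "Authorization: Bearer $TOWER_ACCESS_TOKEN" \
  "https://api.tower.nf/workflow/47kppVWxAkUEWl/tasks"
```

The tasks returned match the earlier run 'EyoDOtC0ruDyv', not this one.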
.command.* files to my S3 bucket. Literally, all folders in the work directory contain only input and output files. I need to collect .command.log from several processes and at the moment cannot do that. Has anyone seen similar behaviour? When I ran AWS Batch from my laptop before, I remember seeing a couple of warnings in .nextflow.log related to S3, but in Tower I don't seem to be able to find .nextflow.log... When I set up Tower and role policies on AWS I strictly followed the Tower documentation.
Just reran it from my local laptop on AWS and checked .nextflow.log. There don't seem to be any warnings or errors except for:
WARN com.amazonaws.util.Base64 - JAXB is unavailable. Will fallback to SDK implementation which may be less performant
Though I still do not see any of the .command.* files in my work directory. The pipeline works with no problem until I specifically ask for .command.log, and then it fails because the file does not exist.
.command* files are written to folders in the work directory, and the pipeline runs OK until it gets to the process where .command.log needs to be copied
task-1.command.err.txt was empty, and when I tried to download it I got an "Unable to download file: .command.log" message on the website. I've also checked the corresponding work directory here:
/fusion/s3/my-bucket/scratch/5DQRTBKJsIIay1/bf/71c55281bdbbb44ead372c5acf3746 and it was empty.
s3:list. Is that the correct one?
.command.* files in the work directories. Am I missing anything?
maxRetries to some reasonable value for all of the processes by default? (At the moment my maxRetries depends on the exit code, but since no exit code is provided by AWS, the retry should apply whatever the failure reason is.)
process.errorStrategy = 'retry'
process.maxRetries = 5 // or more
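A minimal sketch of what I mean in nextflow.config, assuming exit codes from AWS Batch aren't reliable (the retry limit of 5 is illustrative):

```
// Hedged sketch: retry every failure regardless of exit status,
// since AWS Batch may not report a usable exit code.
process {
    errorStrategy = { task.attempt <= 5 ? 'retry' : 'finish' }
    maxRetries    = 5
}
```

The closure form of errorStrategy lets you decide per attempt without looking at task.exitStatus at all.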
-resume option in NF Tower, so that the outputs from these initial processes can be reused? Thanks!
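For context, this is the CLI behaviour I'm hoping to reproduce in Tower (a sketch; the pipeline name and work directory are placeholders):

```
# Hypothetical: re-running with the same work directory so cached task results are reused
nextflow run my-org/my-pipeline -work-dir s3://my-bucket/scratch -resume
```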