If you want to launch a pre-configured pipeline, that ID is the launch associated with that pipeline.
I think I'm missing something incredibly obvious, but I'm not seeing an ID anywhere for the pipeline. I see IDs for the workspaces, compute environments, etc., but not for the pipelines.
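In case it helps narrow things down, would the API expose it? A sketch of what I mean (assuming an access token, a known workspace ID, and that the API base URL is api.tower.nf; the pipeline listing should include an id field per entry):

# list pipelines in a workspace and look for their numeric IDs (placeholder token/ID)
curl -H "Authorization: Bearer $TOWER_ACCESS_TOKEN" \
    "https://api.tower.nf/pipelines?workspaceId=<workspace-id>"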
Has anybody else come across a recent AWS ECS/Batch failure with NF Tower?
I think that it’s related to this email to my org from 23 Aug 2021:
Hello,
Your action is required to avoid potential service interruption once Amazon ECS API request validation improvements take effect on September 24, 2021. We have identified the following API requests to Amazon ECS from your account that could be impacted by these changes:
DescribeContainerInstances
With these improvements, Amazon ECS APIs will validate that the Service and Cluster name parameters in the API match the Cluster and Service name in the ARN.
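As far as I understand it, that means a call like the one below (hypothetical cluster name, account number, and container instance ID) gets rejected once the change lands if the --cluster value doesn't match the cluster name embedded in the container instance ARN:

# hypothetical: after the change, the --cluster name must match the cluster in the ARN
aws ecs describe-container-instances \
    --cluster TowerForge-example-head \
    --container-instances arn:aws:ecs:us-west-2:123456789012:container-instance/TowerForge-example-head/0123456789abcdef0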
A recent launch into our Tower Forge infrastructure on AWS yielded this notice from AWS:
Hello,
On Wed, 1 Sep 2021 08:57:30 GMT, all EC2 instances in your Batch compute environment “arn:aws:batch:us-west-2:478885234993:compute-environment/TowerForge-2y3V6L8gnk6kM09yoB0vmS-head“ were scaled down. The compute environment is now in an INVALID state due to a misconfiguration preventing the EC2 instances from joining the underlying ECS Cluster. While in this state, the compute environment will not scale up or run any jobs. Batch will continue to monitor your compute environments and will move any compute environment whose instances do not join the cluster to INVALID.
To fix this issue, please review and update/recreate the compute environment configuration. Common compute environment misconfigurations which can prevent instances from joining the cluster include: a VPC/Subnet configuration preventing communication to ECS, incorrect Instance Profile policy preventing authorization to ECS, or a bad custom Amazon Machine Image or LaunchTemplate configuration affecting the ECS agent.
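For what it's worth, a quick way to confirm the INVALID state and see AWS's status reason (a sketch using the standard AWS CLI, with the compute environment name taken from the notice above):

# show the state, status and statusReason of the affected compute environment
aws batch describe-compute-environments \
    --compute-environments TowerForge-2y3V6L8gnk6kM09yoB0vmS-head \
    --query 'computeEnvironments[].[state,status,statusReason]'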
I tried looking in the Nextflow documentation and searching here, but I didn't see anything along these lines: is there some hidden/secret flag or something that allows you to turn on/off sending run 'tracking' to an organisation (vs. your own personal account)?
In other words: generally I want to monitor my runs in a workspace shared with other people in my department, but sometimes, if I'm running a 'sensitive' project, I want to keep a given run to myself so that I can monitor it in my personal workspace. Is there functionality to switch this on and off per run?
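For what it's worth, the closest thing I can think of (untested, and assuming runs launched from the CLI with -with-tower rather than from Tower itself; the pipeline name and IDs are placeholders) is switching the TOWER_WORKSPACE_ID environment variable per run:

# send monitoring data for this run to the shared departmental workspace
TOWER_ACCESS_TOKEN=<token> TOWER_WORKSPACE_ID=<shared-workspace-id> nextflow run my-pipeline -with-tower

# leave TOWER_WORKSPACE_ID unset to track the run in my personal workspace instead
TOWER_ACCESS_TOKEN=<token> nextflow run my-pipeline -with-tower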
I have a workflow I’m building for which some of the early processes take considerable CPU/memory to run, so I’m forced to run the workflow on AWS (no biggie). So far, I’ve been slowly adding steps and relaunching the same pipeline with resume and it’s all working great.
However, the time spent provisioning EC2 instances for later processes that don’t require much CPU/memory is frustrating. I could (in theory) just run those processes locally while I build out the rest of the workflow.
Is it possible for me to do something like copy the work/scratch directory from S3 to my local work/ directory and -resume the workflow running locally? I’ve tried the naive way of doing it, but it won’t detect that the workflow has already successfully run some of the processes. Are there any logs/files I can just manually edit, or something I can pass on the CLI, to get it to do so?
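For reference, the naive approach I mentioned was roughly this (bucket name and entry script are placeholders):

# copy the work directory produced by the AWS runs down from S3
aws s3 sync s3://<my-bucket>/work ./work

# relaunch locally, hoping the cached tasks get picked up
nextflow run main.nf -resume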
Hi all!
Super hyped to be trying tower. The user experience has been impressive so far. 🚀
However, I could not get even a single workflow to execute on the Google Life Sciences backend.
I have set up this very minimal example below. Can someone please enlighten me what might be wrong?
#!/usr/bin/env nextflow
nextflow.enable.dsl=2

process echo_remote_file_content {
    container = "docker.io/taniguti/wf-cas9:latest" // this does not work :(
    // container = "docker.io/docker/whalesay:latest" // this works!! both images are public

    input:
    path remote_file

    output:
    stdout emit: cat

    script:
    "cat $remote_file"
}

workflow {
    echo_remote_file_content(params.remote_file)
    println echo_remote_file_content.out.cat.view()
}
This is the error report:
Error executing process > 'echo_remote_file_content'
Caused by:
Process `echo_remote_file_content` terminated with an error exit status (9)
Command executed:
cat str.txt
Command exit status:
9
Command output:
(empty)
Command error:
Execution failed: generic::failed_precondition: while running "nf-6f1c929e312542a7ee1699175d05f753-main": unexpected exit status 1 was not ignored
Work dir:
gs://sensitive-bucket-name/scratch/1uc7mIoqwEIZV0/6f/1c929e312542a7ee1699175d05f753
Tip: view the complete command output by changing to the process work dir and entering the command `cat .command.out`
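For completeness, the task logs can also be read straight out of the bucket with gsutil (placeholders in place of the redacted path above):

# inspect the task's log and stderr in the Life Sciences work directory
gsutil cat gs://<bucket>/<work-dir>/.command.log
gsutil cat gs://<bucket>/<work-dir>/.command.err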
Hi all, I am trying to start a pipeline using NF Tower on AWS Batch; however, the head job (Nextflow itself) fails to start... I get:
Status reason
Task failed to start
And also
CannotStartContainerError: Error response from daemon: OCI runtime create failed: runc did not terminate successfully: unknown
The image that it is trying to start is:
public.ecr.aws/seqera-labs/tower/nf-launcher:21.08.0-edge
Has anyone had this before?
There were several failures as described before, and now on the latest attempt it managed to start, but this is the log from the Nextflow container:
/usr/local/bin/nf-launcher.sh: line 25: /usr/bin/tee: Cannot allocate memory
/usr/local/bin/nf-launcher.sh: line 71: 12 Killed aws s3 sync --only-show-errors "$NXF_WORK/$cache_path" "$cache_path"
Failed to launch the Java virtual machine
NOTE: Nextflow is trying to use the Java VM defined by the following environment variables:
JAVA_CMD: /usr/lib/jvm/java-11-amazon-corretto/bin/java
NXF_OPTS:
/usr/local/bin/nf-launcher.sh: line 43: 30 Killed [[ "$NXF_WORK" == s3://* ]]
Obviously something is very wrong, but I feel like I've checked everything and I'm not sure where else to look...
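The only other thing I can think to check (a sketch, with a placeholder job ID) is how much memory Batch actually gave the head job container, since the log lines above look like the JVM is being killed for lack of it:

# check the memory/vcpus allotted to the nf-launcher head job and why it exited
aws batch describe-jobs --jobs <head-job-id> \
    --query 'jobs[].container.[memory,vcpus,exitCode,reason]'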