    Paolo Di Tommaso
    @pditommaso
    hello, please open an issue including the "Nextflow console output" and "Nextflow log file" (you can find them in the Execution logs panel)
    Danilo Imparato
    @daniloimparato
    Done, thanks! @pditommaso
    Vlad Kiselev
    @wikiselev

    Hi All, I am trying to start a pipeline using NF Tower on AWS Batch, however the head job (Nextflow itself) fails to start... I get:

    Status reason
    Task failed to start

    And also

    CannotStartContainerError: Error response from daemon: OCI runtime create failed: runc did not terminate successfully: unknown

    The image that it is trying to start is:

    public.ecr.aws/seqera-labs/tower/nf-launcher:21.08.0-edge

    Has anyone had this before?

    Vlad Kiselev
    @wikiselev

    there were several failures as described before; on the latest attempt it managed to start, but this is the log from the Nextflow container:

    /usr/local/bin/nf-launcher.sh: line 25: /usr/bin/tee: Cannot allocate memory
    /usr/local/bin/nf-launcher.sh: line 71:    12 Killed                  aws s3 sync --only-show-errors "$NXF_WORK/$cache_path" "$cache_path"
    Failed to launch the Java virtual machine
    NOTE: Nextflow is trying to use the Java VM defined by the following environment variables:
     JAVA_CMD: /usr/lib/jvm/java-11-amazon-corretto/bin/java
     NXF_OPTS: 
    /usr/local/bin/nf-launcher.sh: line 43:    30 Killed                  [[ "$NXF_WORK" == s3://* ]]

    obviously something is very wrong, but I feel I've checked everything and I'm not sure where else to look...

    Arghhh, ok, please ignore all of the above - I specified 8 MB of RAM instead of 8 GB!..
    Paolo Di Tommaso
    @pditommaso
    :smile:
    Vlad Kiselev
    @wikiselev
    Is it normal that the hosted Tower server has API request problems? I'm seeing a lot of Oops... Unable to process request - Error ID: Ac7sDnGIoR0r7IKFYLkbR both on the website and in NF logs
    Vlad Kiselev
    @wikiselev
    And also this: Http failure response for https://tower.nf/api/orgs: 502 Bad Gateway, can't really use it at the moment
    Combiz Khozoie
    @combiz
    I'm finding that the tasks API (https://tower.nf/openapi/index.html#get-/workflow/-workflowId-/tasks) is returning info for a previous run of the same data (-resume was not used). Any ideas?
    Yes, @wikiselev, same here. Hopefully back up soon!
    Thankfully downtime is very rare - this is the first time I've seen it, so definitely not normal. :)
    Vlad Kiselev
    @wikiselev
    good to know! I used it for the first time yesterday and was super excited, but can't do anything today :-)
    probably an API service requires a restart
    Paolo Di Tommaso
    @pditommaso
    um, let me check
    @wikiselev give another try please
    Combiz Khozoie
    @combiz

    I'm finding that the tasks API (https://tower.nf/openapi/index.html#get-/workflow/-workflowId-/tasks) is returning info for a previous run of the same data (-resume was not used). Any ideas?

    @pditommaso shall I file a gh issue for this or have I made an error? I'm receiving data for a previous run from the tasks API (ID 'EyoDOtC0ruDyv') when querying the latest run (ID '47kppVWxAkUEWl'), though '-resume' wasn't used for the latter.
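For reference, a run's tasks can be fetched from the endpoint linked above with an access token. A minimal sketch, assuming the workflow ID from the message above and that `TOWER_ACCESS_TOKEN` holds a valid personal access token:

```shell
# Query the Tower tasks endpoint for a given run (sketch).
WORKFLOW_ID="47kppVWxAkUEWl"   # the run ID mentioned above
URL="https://tower.nf/api/workflow/${WORKFLOW_ID}/tasks"
echo "$URL"
# Only call the API when a token is actually configured:
if [ -n "${TOWER_ACCESS_TOKEN:-}" ]; then
  curl -s -H "Authorization: Bearer $TOWER_ACCESS_TOKEN" "$URL"
fi
```

Comparing the JSON returned for the two run IDs should show whether the API really is serving stale task data.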

    Paolo Di Tommaso
    @pditommaso
    weird, yes please do, including your request as an example
    Vlad Kiselev
    @wikiselev
    When I run my pipeline with Tower on AWS Batch it does not write any of the .command.* files to my S3 bucket. Literally, all folders in the work directory contain only input and output files. I need to collect .command.log from several processes and at the moment cannot do that. Has anyone seen similar behaviour? When I ran AWS Batch from my laptop before, I remember seeing a couple of warnings in .nextflow.log related to S3, but in Tower I don't seem to be able to find .nextflow.log... When I set up Tower and role policies on AWS I strictly followed the Tower documentation.
    Vlad Kiselev
    @wikiselev

    Just reran it from my local laptop on AWS and checked .nextflow.log. There don't seem to be any warnings or errors except for:

    WARN  com.amazonaws.util.Base64 - JAXB is unavailable. Will fallback to SDK implementation which may be less performant

    Though I still do not see any of the .command.* files in my work directory. The pipeline works with no problem until I specifically ask for .command.log, and then it fails as the file does not exist.

    Paolo Di Tommaso
    @pditommaso
    I need to collect .command.log from several process and at the moment cannot do that.
    what do you mean?
    is the execution failing?
    Vlad Kiselev
    @wikiselev
    Yes, it is failing, because I am trying to copy .command.log, which does not exist. The error is cp: cannot stat '.command.log': No such file or directory.
    Somehow, none of the .command.* files are written to folders in the work directory, and the pipeline runs OK until it gets to the process where .command.log needs to be copied
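As an aside, the failing copy described above can be made non-fatal while debugging with a guarded check. A sketch, where `saved.command.log` is a hypothetical destination name:

```shell
# Copy .command.log only if it actually exists in the task work dir (sketch).
if [ -f .command.log ]; then
  cp .command.log saved.command.log
  STATUS="copied"
else
  STATUS="missing"
  echo ".command.log not present in task work dir" >&2
fi
echo "status: $STATUS"
```

This lets the rest of the pipeline keep running while the missing-file cause is investigated.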
    Paolo Di Tommaso
    @pditommaso
    does the basic https://github.com/nextflow-io/hello work?
    Vlad Kiselev
    @wikiselev
    yes, just tried it on Tower and it succeeded. however, no .command.* files in the work directories again...
    Vlad Kiselev
    @wikiselev
    and nothing suspicious in the log file...
    Paolo Di Tommaso
    @pditommaso
    ummm, scroll down the run page to the tasks table
    click on one task for the hello
    when the task dialog opens, click on the Execution log tab
    then download the 1) task stdout, 2) task stderr, 3) task log and upload them here
    Vlad Kiselev
    @wikiselev
    thanks for your help, Paolo! I followed your instructions and downloaded task-1.command.out.txt, which had Bonjour world! inside; task-1.command.err.txt was empty, and when I tried to download task-1.command.log.txt I got an Unable to download file: .command.log message on the website. I've also checked the corresponding work directory here: /fusion/s3/my-bucket/scratch/5DQRTBKJsIIay1/bf/71c55281bdbbb44ead372c5acf3746 and it was empty.
    Paolo Di Tommaso
    @pditommaso
    I suspect that this happens because the job role does not have enough permissions to write to that bucket
    let's follow up tomorrow
    Vlad Kiselev
    @wikiselev
    Hi Paolo, thanks! I've used this policy - https://github.com/seqeralabs/nf-tower-aws/blob/4aa2b6f913928cdf5ac9a270022b80d306d56b18/forge/forge-policy.json#L59, which only mentions s3:get and s3:list. Is that the correct one?
    though, looks like I completely missed this section... https://help.tower.nf/compute-envs/aws-batch/#access-to-s3-buckets
    Ok, let me try
    Vlad Kiselev
    @wikiselev
    haha, trying to add that S3 policy to the tower user - now AWS complains that I exceed the 2048-character limit for inline policies for the tower user...
    Vlad Kiselev
    @wikiselev
    I ended up creating a user group (it has larger character limit on policies), I've added both https://github.com/seqeralabs/nf-tower-aws/blob/master/forge/forge-policy.json (compute) and https://github.com/seqeralabs/nf-tower-aws/blob/master/launch/s3-bucket-write.json (s3 access) policies to that group. Then I added my tower user to that group. Then I reran the Hello world pipeline, but it's still the same, the pipeline finishes without problems, but there are no .command.* files in the work directories. Am I missing anything?
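One quick way to check whether the updated credentials can actually write to the bucket is a probe upload from the same account. A sketch, where `my-bucket` and the probe key are placeholders, and the check is skipped when no AWS credentials are configured:

```shell
BUCKET="my-bucket"                   # placeholder bucket name
KEY="scratch/permission-probe.txt"   # hypothetical probe key
echo "probe" > /tmp/probe.txt
# Only attempt the upload when credentials resolve:
if aws sts get-caller-identity >/dev/null 2>&1; then
  if aws s3 cp /tmp/probe.txt "s3://${BUCKET}/${KEY}"; then
    RESULT="write OK"
  else
    RESULT="write FAILED - check the s3-bucket-write policy"
  fi
else
  RESULT="skipped (no AWS credentials configured)"
fi
echo "$RESULT"
```

If the probe succeeds but Tower runs still produce no .command.* files, the problem is likely not the user's S3 policy (see the issue linked in the next message).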
    Danilo Imparato
    @daniloimparato
    👆 I wonder if it has anything to do with this: https://github.com/seqeralabs/nf-tower/issues/327#issuecomment-956339788
    bioinfo
    @bioinfo:matrix.org [m]
    I wonder, could I deploy nf-tower on my local PC and skip the SMTP auth for login?
    If that's OK, any guide / link for the deployment? Thanks
    Vlad Kiselev
    @wikiselev
    What is the main tactic to handle a spot instance restart (after it is terminated by AWS)? It looks like there is no exit code when this happens... Shall I just set maxRetries to some reasonable value for all of the processes by default? (At the moment my maxRetries depends on the exit code, but as there is no exit code provided by AWS, it should apply regardless of the reason)
    Paolo Di Tommaso
    @pditommaso
    add in your config
    process.errorStrategy = 'retry' 
    process.maxRetries = 5 // or more
    Vlad Kiselev
    @wikiselev
    beautiful, many thanks, Paolo!
    kkerns85
    @kkerns85
    Hello to the Nextflow Community! What is the best way to troubleshoot an issue with launching a nf workflow from Tower using AWS Batch? My workflow worked great until recently, when I tried to rerun it. There is likely an issue on the AWS side, with my Jobs stuck in RUNNABLE status. I have exhausted all of my resources and looked at every option to resolve this issue from AWS, Stack Overflow, etc. All my environments are healthy and functional. I migrated to nf Tower thinking this would bypass or resolve my issues, but they still persist. I don't know if this is the correct place to post this, but I am desperate for help now. Thank you in advance!
    Graham Wright
    @gwright99
    Hello @kkerns85, in my experience troubleshooting this kind of problem is "part art, part science" given the number of interconnected factors. We'd be happy to assist if you open an issue at https://github.com/seqeralabs/nf-tower/issues and can provide more details on your setup.
    One initial suggestion I have for you is to check the underlying ECS clusters once your Jobs become Runnable - are Worker instances able to join the cluster(s), or does the membership count remain at 0?
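The membership count mentioned above can be read from the AWS CLI. A sketch, where the cluster name is a placeholder (AWS Batch creates an ECS cluster per compute environment; a `registeredContainerInstancesCount` stuck at 0 means no workers ever joined):

```shell
CLUSTER="my-batch-ecs-cluster"   # placeholder; find the real name in the ECS console
# Only query AWS when credentials resolve:
if aws sts get-caller-identity >/dev/null 2>&1; then
  COUNT=$(aws ecs describe-clusters --clusters "$CLUSTER" \
    --query 'clusters[0].registeredContainerInstancesCount' --output text)
else
  COUNT="unknown (no AWS credentials configured)"
fi
echo "registered container instances: $COUNT"
```

A count of 0 while jobs sit in RUNNABLE usually points at instance launch or networking problems rather than the workflow itself.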
    Phil Ewels
    @ewels
    Regarding nf-core pipelines, see the docs here: https://nf-co.re/developers/adding_pipelines (basically, join slack and tell us about it in the #new-pipelines channel)
    Will Fondrie
    @wfondrie
    Hi all - we're launching NF Tower actions programmatically. Our workflow runs often share identical parameters for the initial processes, but differ in the parameters used at later ones. Is there a way to always use the -resume option in NF Tower, so that the outputs from these initial processes can be reused? Thanks!
    Kathleen Keough
    @keoughkath_twitter
    Hi all, I'm attempting to build HISAT2 indices for a large genome as part of the nf-core rnaseq pipeline. This is a high-memory, long-running type of job since it's a big genome. It's a non-reference organism, so I can't download these indices. I used Tower to set up my compute environment on AWS Batch with mainly default parameters. This particular job is getting stuck in the "submitted" state, and based on a conversation in the Slack channel, we're thinking this may be a scheduler / resource issue. Has anyone else run into something similar and know how to address it?