    rmeinl
    @rmeinl
    Hey! I'm looking to run a workflow in nf-tower that accesses a Postgres DB. When I run it locally I store the credentials in the nextflow config. Is there a way to securely store them somewhere in nf-tower to initiate my workflow?
    Danilo Imparato
    @daniloimparato

    Hi all!

    Super hyped to be trying tower. The user experience has been impressive so far. 🚀

    However, I could not get even a single workflow to execute on the Google Life Sciences backend.

    I have set up this very minimal example below. Can someone please enlighten me what might be wrong?

    #!/usr/bin/env nextflow
    
    nextflow.enable.dsl=2
    
    process echo_remote_file_content {
    
      container = "docker.io/taniguti/wf-cas9:latest"   // this does not work :(
      // container = "docker.io/docker/whalesay:latest" // this works!! both images are public
    
      input: path remote_file
    
      output: stdout emit: cat
    
      script: "cat $remote_file"
    }
    
    workflow {
      echo_remote_file_content(params.remote_file)
      echo_remote_file_content.out.cat.view()   // view() already prints each item; wrapping it in println only prints the channel object
    }

    This is the error report:

    Error executing process > 'echo_remote_file_content'
    
    Caused by:
      Process `echo_remote_file_content` terminated with an error exit status (9)
    
    Command executed:
      cat str.txt
    
    Command exit status:
      9
    
    Command output:
      (empty)
    
    Command error:
      Execution failed: generic::failed_precondition: while running "nf-6f1c929e312542a7ee1699175d05f753-main": unexpected exit status 1 was not ignored
    
    Work dir:
      gs://sensitive-bucket-name/scratch/1uc7mIoqwEIZV0/6f/1c929e312542a7ee1699175d05f753
    
    Tip: view the complete command output by changing to the process work dir and entering the command `cat .command.out`
    Paolo Di Tommaso
    @pditommaso
    hello, please open an issue including the "Nextflow console output" and the "Nextflow log file" (you can find both in the Execution logs panel)
    Danilo Imparato
    @daniloimparato
    Done, thanks! @pditommaso
    Vlad Kiselev
    @wikiselev

    Hi All, I am trying to start a pipeline using NF-tower on AWS batch, however the head job (nextflow itself) fails to start... I get:

    Status reason
    Task failed to start

    And also

    CannotStartContainerError: Error response from daemon: OCI runtime create failed: runc did not terminate successfully: unknown

    The image that it is trying to start is:

    public.ecr.aws/seqera-labs/tower/nf-launcher:21.08.0-edge

    Has anyone had this before?

    Vlad Kiselev
    @wikiselev

    there were several failures as described before, and on the latest attempt it managed to start, but this is the log from the nextflow container:

    /usr/local/bin/nf-launcher.sh: line 25: /usr/bin/tee: Cannot allocate memory
    /usr/local/bin/nf-launcher.sh: line 71:    12 Killed                  aws s3 sync --only-show-errors "$NXF_WORK/$cache_path" "$cache_path"
    Failed to launch the Java virtual machine
    NOTE: Nextflow is trying to use the Java VM defined by the following environment variables:
     JAVA_CMD: /usr/lib/jvm/java-11-amazon-corretto/bin/java
     NXF_OPTS: 
    /usr/local/bin/nf-launcher.sh: line 43:    30 Killed                  [[ "$NXF_WORK" == s3://* ]]

    obviously something is very wrong, but I feel I've checked everything and I'm not sure where else to look...

    Arghhh, ok, please ignore all of the above - I specified 8 MB of RAM instead of 8 GB!..
    Paolo Di Tommaso
    @pditommaso
    :smile:
    Vlad Kiselev
    @wikiselev
    Is it normal for the hosted Tower server to have API request problems? I'm seeing a lot of "Oops... Unable to process request - Error ID: Ac7sDnGIoR0r7IKFYLkbR", both on the website and in the NF logs
    Vlad Kiselev
    @wikiselev
    And also this: Http failure response for https://tower.nf/api/orgs: 502 Bad Gateway, can't really use it at the moment
    Combiz Khozoie
    @combiz
    I'm finding that the tasks API (https://tower.nf/openapi/index.html#get-/workflow/-workflowId-/tasks) is returning info for a previous run of the same data (-resume was not used). Any ideas?
    Yes, @wikiselev, same here. Hopefully back up soon!
    Thankfully downtime is v. rare, the first time I've seen it, so definitely not normal. :)
    Vlad Kiselev
    @wikiselev
    good to know! I used it for the first time yesterday and was super excited, but can't do anything today :-)
    probably the API service needs a restart
    Paolo Di Tommaso
    @pditommaso
    um, let me check
    @wikiselev give another try please
    Combiz Khozoie
    @combiz

    @pditommaso shall I file a gh issue for this or have I made an error? I'm receiving data for a previous run from the tasks API (ID 'EyoDOtC0ruDyv') when querying the latest run (ID '47kppVWxAkUEWl'), though '-resume' wasn't used for the latter.

    Paolo Di Tommaso
    @pditommaso
    weird, yes please, and include your request as an example
    Vlad Kiselev
    @wikiselev
    When I run my pipeline with Tower on AWS Batch, it does not write any of the .command.* files to my S3 bucket. Literally all folders in the work directory contain only input and output files. I need to collect .command.log from several processes and at the moment cannot do that. Has anyone seen similar behaviour? When I previously ran on AWS Batch from my laptop, I remember seeing a couple of warnings in .nextflow.log related to S3, but in Tower I can't seem to find .nextflow.log... When I set up Tower and the role policies on AWS, I strictly followed the Tower documentation.
    Vlad Kiselev
    @wikiselev

    I just reran it from my local laptop on AWS and checked .nextflow.log. There don't seem to be any warnings or errors except for:

    WARN  com.amazonaws.util.Base64 - JAXB is unavailable. Will fallback to SDK implementation which may be less performant

    Though I still do not see any of the .command.* files in my work directory. The pipeline works with no problem until I specifically ask for .command.log, and then it fails because the file does not exist.

    Paolo Di Tommaso
    @pditommaso
    > I need to collect .command.log from several process and at the moment cannot do that.
    what do you mean?
    is the execution failing?
    Vlad Kiselev
    @wikiselev
    Yes, it is failing, because I am trying to copy .command.log, which does not exist. The error is cp: cannot stat '.command.log': No such file or directory.
    Somehow, none of the .command* files are written to the folders in the work directory, and the pipeline runs OK until it gets to the process where .command.log needs to be copied
    Paolo Di Tommaso
    @pditommaso
    does the basic https://github.com/nextflow-io/hello pipeline work?
    Vlad Kiselev
    @wikiselev
    yes, just tried it on Tower and it succeeded. however, no .command.* files in the work directories again...
    Vlad Kiselev
    @wikiselev
    and nothing suspicious in the log file...
    Paolo Di Tommaso
    @pditommaso
    ummm, scroll down the runs page to the tasks table
    click on one task of the hello run
    when the task dialog opens, click on the execution logs tab
    then download 1) task stdout, 2) task stderr, 3) task log and upload them here
    Vlad Kiselev
    @wikiselev
    thanks for your help, Paolo! I followed your instructions and downloaded task-1.command.out.txt which had Bonjour world! inside, task-1.command.err.txt was empty and when tried to download task-1.command.log.txt got Unable to download file: .command.log message on the website. I've also checked a corresponding work directory here: /fusion/s3/my-bucket/scratch/5DQRTBKJsIIay1/bf/71c55281bdbbb44ead372c5acf3746 and it was empty.
    Paolo Di Tommaso
    @pditommaso
    I suspect this happens because the job role does not have enough permissions to write to that bucket
    let's follow up tomorrow
    Vlad Kiselev
    @wikiselev
    Hi Paolo, thanks! I've used this policy - https://github.com/seqeralabs/nf-tower-aws/blob/4aa2b6f913928cdf5ac9a270022b80d306d56b18/forge/forge-policy.json#L59, which only mentions s3:get and s3:list. Is that the correct one?
    though, looks like I completely missed this section... https://help.tower.nf/compute-envs/aws-batch/#access-to-s3-buckets
    Ok, let me try
    Vlad Kiselev
    @wikiselev
    haha, trying to add that S3 policy to tower user - now AWS complains that I exceed 2048 character limit for inline policies for the tower user...
    Vlad Kiselev
    @wikiselev
    I ended up creating a user group (it has larger character limit on policies), I've added both https://github.com/seqeralabs/nf-tower-aws/blob/master/forge/forge-policy.json (compute) and https://github.com/seqeralabs/nf-tower-aws/blob/master/launch/s3-bucket-write.json (s3 access) policies to that group. Then I added my tower user to that group. Then I reran the Hello world pipeline, but it's still the same, the pipeline finishes without problems, but there are no .command.* files in the work directories. Am I missing anything?
    Danilo Imparato
    @daniloimparato
    👆 I wonder if it has anything to do with this: https://github.com/seqeralabs/nf-tower/issues/327#issuecomment-956339788
    bioinfo
    @bioinfo:matrix.org
    I wonder whether I could deploy nf-tower on my local PC and skip the SMTP authentication for login?
    If that is OK, is there a guide / link for the deployment? Thanks
    Vlad Kiselev
    @wikiselev
    What is the main tactic for handling Spot instance restarts (after an instance is terminated by AWS)? It looks like there is no exit code when this happens... Shall I just set maxRetries to some reasonable value for all of the processes by default? (At the moment my maxRetries depends on the exit code, but as no exit code is provided by AWS, it should apply regardless of the reason)
    Paolo Di Tommaso
    @pditommaso
    add this to your config:
    process.errorStrategy = 'retry' 
    process.maxRetries = 5 // or more
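    The same two settings can also be written as a `process` scope in `nextflow.config`. The sketch below assumes you still want a way to opt individual processes out of blanket retries; the `no_retry` label is hypothetical and only affects processes that declare `label 'no_retry'`.

    ```groovy
    // nextflow.config (sketch): retry any failed task, since a reclaimed
    // Spot instance may not report a usable exit status
    process {
        errorStrategy = 'retry'
        maxRetries    = 5          // raise this if Spot interruptions are frequent

        // Hypothetical label: processes declaring `label 'no_retry'` stop the run on failure instead
        withLabel: 'no_retry' {
            errorStrategy = 'terminate'
        }
    }
    ```

    Using a label selector keeps the default permissive (every process retries) while letting specific steps, for example ones with side effects, fail fast.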
    Vlad Kiselev
    @wikiselev
    beautiful, many thanks, Paolo!
    kkerns85
    @kkerns85
    Hello to the Nextflow Community! What is the best way to troubleshoot an issue with launching a Nextflow workflow from Tower using AWS Batch? My workflow worked great until recently, when I tried to rerun it. There is likely an issue on the AWS side, with my jobs stuck in RUNNABLE status. I have exhausted all of my resources and looked at every option to resolve this issue from AWS, Stack Overflow, etc. All my environments are healthy and functional. I migrated to nf Tower thinking this would bypass or resolve my issues, but they still persist. I don't know if this is the correct place to post this, but I am desperate for help now. Thank you in advance!
    Graham Wright
    @gwright99
    Hello @kkerns85 , in my experience troubleshooting this kind of problem is "part art, part science" given the number of inter-connecting factors. We'd be happy to assist if you open an issue at https://github.com/seqeralabs/nf-tower/issues and can provide more details on your setup.
    One initial suggestion I have for you is to check the underlying ECS clusters once your Jobs become Runnable - are Worker instances able to join the cluster(s), or does the membership count remain at 0?
    Phil Ewels
    @ewels
    Regarding nf-core pipelines, see the docs here: https://nf-co.re/developers/adding_pipelines (basically, join slack and tell us about it in the #new-pipelines channel)