These are chat archives for nextflow-io/nextflow

1st
Apr 2019
Tobias "Tobi" Schraink
@tobsecret
Apr 01 14:09
@csawye01 always happy to help out / learn new NF! Did it end up solving your issue?
micans
@micans
Apr 01 14:30
We have multiple users writing to SINGULARITY_CACHEDIR; it leads to permission errors. Is there a solution that does not involve everyone setting umask to 002?
micans
@micans
Apr 01 14:58
(Not an NF question, sorry, but perhaps I'll get lucky. I've looked into setfacl, but it does not seem to be available or enabled on our file system.)
Paolo Di Tommaso
@pditommaso
Apr 01 16:46
umm, maybe the singularity channel can help more for this
micans
@micans
Apr 01 16:47
fair enough ... I'm just very lazy 's all
Stephen Kelly
@stevekm
Apr 01 20:28

@stevekm @KochTobi have a look at nf-core/tools#288 and https://github.com/qbicsoftware/nextflow-logger-service. Still in its early stage, but I give the latter priority for our facility ;)

@sven1103 thanks for the heads up on these! I think I finished the bulk of the functionality I was trying to demo in Django; https://github.com/stevekm/nf-dashboard-dj
Not really sure what to do with it next... Right now, my biggest concern for our lab is figuring out how to wrangle the 100+ Nextflow pipelines that we have completed in order to get all their output easily accessible in a centralized format. For example, I was thinking that if I could use a database such as in nextflow-io/nextflow#743 , I might be able to store the filepath to the latest publishDir outputs, and then use the weblogger to send status messages to a central web app such as this Django one or the Flask one you linked to, so that my lab members can see the status of workflows that we have running and potentially get things like HTML reports accessible in their web browser via the web server. Really need to come up with some way to link all these disparate Nextflow outputs together into a cohesive framework. I saw you guys have a lot of stuff in the nf-core tools repo there. I will have to look and see if there is anything I can use. My intention is to keep initiating our current Nextflow pipelines manually on the HPC as usual, but allow for a web-app overview of the entire output of all Nextflow pipelines completed.

@pditommaso I noticed that the HTTP weblog output does not include anything about the process output items or the items being saved to the publishDir. Is there a way to get this information from the task, either at task completion or sent over the HTTP weblog?
If there were something like task.onComplete available, similar to workflow.onComplete, I could hack together the functionality I wanted.
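
For illustration only, here is a minimal sketch of the kind of central status collector described above: a small HTTP endpoint that accepts JSON messages posted by a run started with `-with-weblog` and keeps them in SQLite. The table layout, the function names, the database file name, and the port are all hypothetical, and the `runName` and `event` keys are an assumption about the message payload that should be checked against what your Nextflow version actually sends. As noted above, the weblog does not carry publishDir paths, so this only tracks run status.

```python
import json
import sqlite3
from http.server import BaseHTTPRequestHandler, HTTPServer


def init_db(conn):
    """Create the (hypothetical) events table if it does not exist yet."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS events ("
        "id INTEGER PRIMARY KEY AUTOINCREMENT, "
        "run_name TEXT, event TEXT, payload TEXT)"
    )
    conn.commit()


def record_event(conn, message):
    """Store one weblog message; the full JSON is kept in the payload column."""
    # Assumption: the message is a dict with 'runName' and 'event' keys.
    conn.execute(
        "INSERT INTO events (run_name, event, payload) VALUES (?, ?, ?)",
        (message.get("runName"), message.get("event"), json.dumps(message)),
    )
    conn.commit()


def latest_status(conn):
    """Return {run_name: most recent event} for every run seen so far."""
    rows = conn.execute("SELECT run_name, event FROM events ORDER BY id").fetchall()
    # Later rows overwrite earlier ones, so each run ends up with its last event.
    return {run_name: event for run_name, event in rows}


def make_handler(conn):
    """Build a request handler class bound to the given database connection."""

    class WeblogHandler(BaseHTTPRequestHandler):
        def do_POST(self):
            length = int(self.headers.get("Content-Length", 0))
            try:
                message = json.loads(self.rfile.read(length))
            except ValueError:
                message = None
            if not isinstance(message, dict):
                # Reject anything that is not a JSON object.
                self.send_response(400)
                self.end_headers()
                return
            record_event(conn, message)
            self.send_response(200)
            self.end_headers()

    return WeblogHandler


def serve(db_path="nf_events.db", port=8000):
    """Run the collector until interrupted (single-threaded, local only)."""
    conn = sqlite3.connect(db_path)
    init_db(conn)
    HTTPServer(("127.0.0.1", port), make_handler(conn)).serve_forever()
```

Calling `serve()` and launching a pipeline with `-with-weblog http://127.0.0.1:8000` should fill the table; a Django or Flask front end could then read `latest_status` to show lab members what is running.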
Stephen Kelly
@stevekm
Apr 01 20:34

We have multiple users writing to SINGULARITY_CACHEDIR; it leads to permission errors. Is there a solution that does not involve everyone setting umask to 002?

@micans might not help you guys much, but I have ended up avoiding a lot of these kinds of problems, seemingly just by creating static copies of the Singularity image files ahead of time and calling them by absolute path in the Nextflow config. It sounds like you are trying to pull from Singularity Hub or similar? Not sure if it's worth the trouble to save it all locally ahead of time. I also made all my .simg files globally readable.
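
As a rough sketch of the "globally readable" step, the helper below adds read permission for user, group, and others to every image in a directory. The function name and the `*.simg` default pattern are hypothetical, and the sketch assumes a POSIX file system where the caller owns the files; the directory itself must also be readable and traversable by other users for this to help.

```python
import os
import stat
from pathlib import Path


def make_world_readable(image_dir, pattern="*.simg"):
    """Add read permission for everyone to each matching image file.

    Returns the names of the files whose mode was actually changed.
    """
    changed = []
    for image in sorted(Path(image_dir).glob(pattern)):
        mode = stat.S_IMODE(image.stat().st_mode)
        wanted = mode | stat.S_IRUSR | stat.S_IRGRP | stat.S_IROTH
        if wanted != mode:
            # Only the read bits are added; write and execute bits are untouched.
            os.chmod(image, wanted)
            changed.append(image.name)
    return changed
```

Running it once after copying images into the shared directory is enough; a second run returns an empty list because nothing is left to change.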

Sinisa Ivkovic
@sivkovic
Apr 01 21:52
@tbugfinder I'm not sure how to use EFS with NF currently, or how that would help me with the issue. Anyway, I dug further into the issue and tried a few different things, and the working solution I found is to use a different partition as the working directory. This offloads the heavy IO to a different partition, so Docker daemon performance is not affected during downloads and uploads. I had to make a few changes in Nextflow, since I wasn't able to achieve that just by changing the configuration, so here is my pull request: sivkovic/nextflow#1. I also created a CloudFormation template for building an AMI, which also has https://github.com/aws-samples/aws-genomics-workflows running for autoscaling EBS, so if you are fine with the solution I can create a pull request to the Nextflow repository and give you these CloudFormation scripts as well. What I did, in a kind of "hacky" way, is use the TMPDIR environment variable to change the working directory in command.run, so if you have a recommendation for how to implement that better I will change it. But all the issues I had with timeouts disappeared with this change. If you and @pditommaso agree with this change I can make a pull request to the Nextflow repo. Of course, if there is a way to achieve the same thing with configuration alone, please let me know.
micans
@micans
Apr 01 23:17
Thanks @stevekm , that's pretty much what our current thinking is. Good to know there is nothing obviously better!
Rad Suchecki
@rsuchecki
Apr 01 23:30
@micans @stevekm Just curious, is the volume of images substantial enough to opt for shared space over user-specific SINGULARITY_CACHEDIRs? Or is it more about different users re-running the same pipeline and needing fixed containers for task caching?