These are chat archives for nextflow-io/nextflow

5th
Apr 2019
Rad Suchecki
@rsuchecki
Apr 05 00:03
https://twitter.com/bioinforad/status/1113952376329629696 - TL;DR: Singularity 3.1.1 no longer crashes on workDir being bound more than once, allowing the use of autoMounts = true without hacks
Alex Cerjanic
@acerjanic
Apr 05 00:59

So I'm running into a bit of trouble with the Docker support in Nextflow. My environment root squashes the NFS mounts, where all our data lives. This doesn't play well with docker bind mounts. I've tried working around this situation by setting the work dir to a local scratch drive and using stageInMode 'copy' in the process. Regardless, Nextflow is still trying to bind mount the directory where the input file lived (on the NFS share), even though it is copied into the scratch work directory.

docker run -i -v /shared/mrfil-data/acerja2/Prisma/MRE/PrelimCIVIC/CIVIC068/senfm:/shared/mrfil-data/acerja2/Prisma/MRE/PrelimCIVIC/CIVIC068/senfm -v /scratch/19/a889e542faa570fe31bae30ffa9423:/scratch/19/a889e542faa570fe31bae30ffa9423 -v "$PWD":"$PWD" -w "$PWD" --entrypoint /bin/bash -e !{matlabLicense} --shm-size=512M --name $NXF_BOXID nvcr.io/partners/matlab:r2018a -c "/bin/bash -ue /scratch/19/a889e542faa570fe31bae30ffa9423/.command.sh"

The first -v mount is the problematic one. That is the original location of the input file, but the .command.sh refers to the copied-in file located in scratch.
Is this the expected behavior? Is there a workaround or an option that I have missed in specifying the bind mounts?
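
For reference, a minimal sketch of the setup described above, assuming /scratch is a local (non-NFS) drive; the paths are placeholders:

workDir = '/scratch/nf-work'

docker {
    enabled = true
}

process {
    // copy inputs into the task work dir instead of symlinking them
    stageInMode = 'copy'
}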

Vanessasaurus
@vsoch
Apr 05 01:10
@rsuchecki woot!
Lavanya Veeravalli
@veeravalli
Apr 05 01:49
@KochTobi Is it possible to resume without changing your publishDir on S3?
Lavanya Veeravalli
@veeravalli
Apr 05 02:25
@pditommaso yes, if I delete the trace file from the publishdir, then cache works. https://github.com/nextflow-io/nextflow/issues/813
Would like to know how others are dealing with this now. Thanks
Paolo Di Tommaso
@pditommaso
Apr 05 06:22
@michaelwebb_twitter use nextflow run -q <script>
@s181706_gitlab working on that in nextflow-io/nextflow#984; so far you can just run NF as any other tool using a process
@acerjanic how would you handle these mounts with docker command line ?
Paolo Di Tommaso
@pditommaso
Apr 05 06:29
@sivkovic the plan is to add support for custom mounts in the Batch scope (not at the process level), to be able to access shared data via EFS or FSx, or to mount local paths.
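
A purely hypothetical sketch of what such a Batch-scope mount could look like; this option was not available at the time, and the name and format shown here are assumptions:

aws {
    batch {
        // mount an EFS/FSx share or a host path into every Batch job container
        volumes = ['/mnt/efs:/mnt/efs:rw', '/scratch']
    }
}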
KochTobi
@KochTobi
Apr 05 07:10
@veeravalli yes, though you have to take care of the trace files. One solution is to store them locally on your EC2 instance and upload them to your S3 bucket once your pipeline has run through
Lavanya Veeravalli
@veeravalli
Apr 05 08:01
@KochTobi yes, thought so. But is there any way that could be made part of Nextflow itself, e.g. from workflow.onComplete? Thanks.
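
A rough sketch of how that could be done from the workflow script itself, assuming the AWS CLI is available where Nextflow runs; the bucket and file names are placeholders:

workflow.onComplete {
    // upload locally written trace/report files once the run has finished
    def dest = 's3://my-bucket/pipeline-reports/'
    ['trace.txt', 'report.html', 'timeline.html'].each { name ->
        def f = file(name)
        if( f.exists() )
            "aws s3 cp ${f} ${dest}${name}".toString().execute().waitFor()
    }
}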
KochTobi
@KochTobi
Apr 05 08:57
@veeravalli sorry I am not an expert on that matter :) I can't answer that
Luca Cozzuto
@lucacozzuto
Apr 05 09:30
hi @pditommaso is there a way to make an input optional?
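
A common workaround for optional file inputs (not necessarily the one used here) is to pass a placeholder file and test for it in the script; NO_FILE is assumed to be an empty file kept in the project's assets directory, and the other names are illustrative:

params.filter = "$baseDir/assets/NO_FILE"

samples_ch = Channel.fromPath('data/*.txt')
opt_file = file(params.filter)

process foo {
    input:
    file sample from samples_ch
    file opt from opt_file

    script:
    // only add the option when a real file was provided
    def filter = opt.name != 'NO_FILE' ? "--filter $opt" : ''
    """
    my_tool --input $sample $filter
    """
}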
Jonathan Manning
@pinin4fjords
Apr 05 10:43
Is there going to be a full release for Nextflow sometime soon? We're kind of conda focused, and like to pin everything to package versions, but I'm using some of the recent improvements, so I've been using NXF_VER=19.03.0-SNAPSHOT rather than the 19.01.0 release, which is probably a bit naughty for production.
Paolo Di Tommaso
@pditommaso
Apr 05 10:44
yes, the week after next
Jonathan Manning
@pinin4fjords
Apr 05 10:44
Great :-)
Paolo Di Tommaso
@pditommaso
Apr 05 10:45
though we also provide maintenance builds through => https://www.seqera.io/
Jonathan Manning
@pinin4fjords
Apr 05 10:50
Interesting- thanks
Luca Cozzuto
@lucacozzuto
Apr 05 10:53
thanks Paolo! It worked
Paolo Di Tommaso
@pditommaso
Apr 05 11:53
:v:
Alex Cerjanic
@acerjanic
Apr 05 15:03
@pditommaso I would mount them in the same way, with the exception of omitting the first. The first -v mount corresponds to the original location of an input file in the process. However, stageInMode 'copy' has been set for this process, for which the container has also been set. If the input files have been copied to the scratch directory (which I have verified), why would we need to bind mount the original location (not scratch) of the input file? I've pasted the process in question, in case I've made things unclear. Thanks!
process reconSenFM {

    container = "nvcr.io/partners/matlab:r2018a"
    containerOptions = '-e LICENSESERVER --shm-size=512M'
    stageInMode 'copy'

    input:
    file senFM_Siemens_twix_dat from senFMDatFile

    output:
    file 'senFM.mat' into calibrationData

    script:
    """
    matlab -nodisplay -nodesktop -r "reconSenFM_Nextflow('${senFM_Siemens_twix_dat}');"
    """
}
Paolo Di Tommaso
@pditommaso
Apr 05 15:14
NF always mounts the pipeline work dir (or the temp scratch if you are using process.scratch = true) because that is where the computation is done
Alex Cerjanic
@acerjanic
Apr 05 15:31
Right, that's the behavior I expect. However, when I look at the docker run command generated by Nextflow, I get 3 mounts: 1) the original (not scratch) location of the input file, 2) the scratch location of the input file, and 3) the pipeline working directory (from -v "$PWD":"$PWD"). 2 and 3 make sense based on my reading of the documentation; 1 does not and is redundant at best.
Paolo Di Tommaso
@pditommaso
Apr 05 15:48
1) is needed when the input is staged as a symlink, but you are right that it's not needed when using stageInMode 'copy'
however I think the latter is not taken into account by the logic of the mount composition
you may want to submit a request for enhancement
also the mountFlags *may* help to solve your case
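
A minimal sketch of that docker-scope option; the flag value shown is only an example and may not apply here:

docker {
    enabled = true
    // appended to every -v volume mount, e.g. -v /path:/path:Z
    mountFlags = 'Z'
}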
Sinisa Ivkovic
@sivkovic
Apr 05 17:34
@pditommaso do you maybe have some timeframe for that? In combination with scratch I think we can configure the environment to avoid having timeouts on AWS Batch.
Also I encountered an issue with mounted volumes when running Singularity. The problem is that when you, for example, have /foo/work as a working directory and your files are located on /bar/data, you get an error like FATAL: container creation failed: unable to add /foo/work/6c/a5c0f34d860b8ea40f0f21f7aacf23 to mount list: destination /foo/work/6c/a5c0f34d860b8ea40f0f21f7aacf23 is already in the mount point list. It happens because /foo/work/6c/a5c0f34d860b8ea40f0f21f7aacf23 and /bar/data are both mounted to the container, but $PWD is also always mounted, and since it is the same as /foo/work/6c/a5c0f34d860b8ea40f0f21f7aacf23, this error occurs. Just wanted to check whether I am doing something wrong, or whether this is a bug?
Laurence E. Bernstein
@lebernstein
Apr 05 17:48
@s181706_gitlab I launch my Nextflow analysis script from another Nextflow "launcher" script which creates the config file to be used based on parameters given to the launcher. It works just fine.
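
A rough sketch of that launcher approach: a single process writes a config file from the launcher's parameters and then runs the analysis pipeline as an ordinary command-line tool (main.nf and the parameter names are placeholders; nextflow must be available in the task environment):

params.genome = 'GRCh38'
params.outdir = 'results'

process launchAnalysis {

    input:
    val genome from params.genome

    script:
    """
    echo "params.genome = '${genome}'"         > run.config
    echo "params.outdir = '${params.outdir}'" >> run.config
    nextflow run ${baseDir}/main.nf -c run.config
    """
}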
Alex Cerjanic
@acerjanic
Apr 05 18:46
Yeah, that's what I figured about 1). I looked at mountFlags, and it didn't help. I've gone ahead and opened the issue on GitHub with a reproducer. Thanks!