These are chat archives for nextflow-io/nextflow

Aug 2018
Francesco Strozzi
Aug 04 2018 07:54
@srynobio this is interesting… I am OK with Docker images being kept on the instance while jobs are running, as this avoids wasting time and resources re-downloading a Docker image for each job if it’s already present on the instance. On the other hand, I would expect stopped containers not to take up so much space, unless somehow data are saved directly inside the container and not wiped out, even if NF cleans out everything once the job has completed… just thinking out loud to see if I’ve got your issue right and to better understand the scenario in which you experienced the problem. We use aws-batch too, and we usually end up with tens of jobs running on the same instance, and we have never run into this problem before...
Tim Dudgeon
Aug 04 2018 08:20
@skptic I tried using an intermediate channel but it made no difference. e.g. using splitted = origin_parts.flatMap() to create a channel that links the processes.
I looked at two work dirs that corresponded to the same task (one that was completed during the first execution and the corresponding one that was re-run when re-executed) and the contents of the directories are identical, except for some of the .command.* files. e.g. all the input and output files are exactly the same, but a different hash was generated.
Aug 04 2018 09:01
@srynobio @fstrozzi Our instances get enough EBS volume space to run the possible number of containers in parallel. Are you saying that space would be needed for 2500 images if this were executed on a single machine?
If yes
then I'd say I've never seen that before, despite running thousands of jobs on one EC2 instance
Karin Lagesen
Aug 04 2018 12:13
Question regarding publishDir: is it possible to specify which of the outputs should be put in the directory?
I might create multiple output channels for a process, and it would make my thinking simpler if I could just have one of them as a "publish" channel
Tim Dudgeon
Aug 04 2018 13:39
@karinlag Yes, you can use the pattern parameter of publishDir to specify which files should be published. See
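
A minimal sketch of the pattern option mentioned above, written in current DSL2 syntax rather than the `into`-style channels in use when this chat took place. The process name, file names, emit names, and the `results` directory are all hypothetical and only illustrate the idea: the process declares two outputs, but only the one matching the glob gets published.

```groovy
// Hypothetical process: every name here is illustrative, not from the chat.
process MAKE_REPORT {
    // Only outputs matching the glob are copied to 'results';
    // other declared outputs stay in the task work directory.
    publishDir 'results', mode: 'copy', pattern: '*.txt'

    output:
    path 'summary.txt', emit: summary   // matches '*.txt', so it is published
    path 'debug.log',   emit: logfile   // does not match, so it is not published

    script:
    """
    echo "summary" > summary.txt
    echo "debug"   > debug.log
    """
}

workflow {
    MAKE_REPORT()
}
```

Both outputs remain available as channels to downstream processes; the pattern only affects what is copied to the publish directory.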