These are chat archives for nextflow-io/nextflow

28th
Apr 2017
Marcel Martin
@marcelm
Apr 28 2017 07:18

Hi, is there something like a .duplicate() operator for channels? It would return a copy of the channel, but leave the original one intact. I’m asking because we have many places in our code that look like this:

(original, modified) = original.into(2)
modified = modified.map { ... }

It would be nicer if this could be written somewhat like this:

modified = original.duplicate().map { ... }
Paolo Di Tommaso
@pditommaso
Apr 28 2017 07:33
Yes. Have a look at the tap operator.
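For anyone reading along, a minimal sketch of how tap could replace the into(2) pattern, assuming the channel syntax Nextflow used at the time (later called DSL1) and the closure form of tap. The names `source`, `original` and `modified` are only illustrative. Note that tap still consumes the channel it is called on, so the untouched copy is the one named inside the closure:

```groovy
// DSL1-era sketch; all channel names here are illustrative
source = Channel.from(1, 2, 3)

modified = source
    .tap { original }     // forwards every item into a new channel called `original`
    .map { it * 10 }      // the returned channel carries on down the chain

// both channels can now be consumed independently
original.subscribe { println "original: $it" }
modified.subscribe { println "modified: $it" }
```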
Marcel Martin
@marcelm
Apr 28 2017 08:19
Thanks, I will try this!
Fredrik Boulund
@boulund
Apr 28 2017 08:33
Hi! I'm experiencing some strange issues where my SLURM jobs get all wrong time and cpu requests. Pipeline code is here: https://github.com/boulund/wellness_wgs
Seems all Kaiju jobs get 8 cores and 24 hours, and all bbmap jobs get 1 minute, 1 core assignments. I've no idea what's going on. The .command.run files don't contain any SBATCH requests for time or cores...
I run the pipeline like so: nextflow run /path/to/main.nf -profile milou_proGenomes --input_reads '/path/to/*{1,2}.reads.fq'
Fredrik Boulund
@boulund
Apr 28 2017 08:38
Output from nextflow config /path/to/main.nf -profile milou_proGenomes looks all right (entire config file with the milou_proGenomes file included).
Fredrik Boulund
@boulund
Apr 28 2017 09:25
Nvm, worked around it by specifying cpu, memory, and time requirements directly in main.nf.
mitul-patel
@mitul-patel
Apr 28 2017 11:22
Can I use a Python script as a template? Will Nextflow variables with the $ sign work inside a Python script?
Kevin Sayers
@KevinSayers
Apr 28 2017 12:18
@mitul-patel using Python in the process should work fine
"""
#!/usr/bin/python
print "${text}"
"""
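A slightly fuller sketch of the same idea, assuming the process syntax of the time (DSL1). The process name `sayHello`, the channel `greetings` and the input values are made up for illustration. Nextflow substitutes `${text}` before Python sees the script, and a dollar sign meant for the script itself is escaped as `\$`:

```groovy
// DSL1-era sketch; names and values are hypothetical
process sayHello {
    input:
    val text from Channel.from('hello', 'world')

    output:
    stdout into greetings

    // ${text} is filled in by Nextflow; \$ reaches Python as a literal $
    script:
    """
    #!/usr/bin/env python
    print("${text}")
    print("price: \$5")
    """
}

greetings.subscribe { print it }
```

As far as I know the same substitution applies to a script file kept under templates/ and referenced with `template 'my_script.py'` (file name hypothetical) in the script block.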
Paolo Di Tommaso
@pditommaso
Apr 28 2017 12:22
@boulund have you solved the problem ?
Fredrik Boulund
@boulund
Apr 28 2017 12:28
@pditommaso I couldn't figure out the root cause, no, but I worked around it by hardcoding cpu, memory, and time requirements directly into main.nf
Paolo Di Tommaso
@pditommaso
Apr 28 2017 12:31
ok
Fredrik Boulund
@boulund
Apr 28 2017 12:32
Thanks anyway! :) Do you know if anyone else has experienced a similar issue before?
Paolo Di Tommaso
@pditommaso
Apr 28 2017 12:57
umm, so for example
  $kaiju {
        cpus = 8
        memory = 64.GB  // Documentation says about 13GB for proGenomes
        time = {1.h * task.attempt}
    }
Fredrik Boulund
@boulund
Apr 28 2017 12:58
yes?
Paolo Di Tommaso
@pditommaso
Apr 28 2017 12:58
instead of one hour you are getting 1 day for those tasks ?
Fredrik Boulund
@boulund
Apr 28 2017 12:58
yes
but I'm not sure the actual time of the allocation is caused by Nextflow
Paolo Di Tommaso
@pditommaso
Apr 28 2017 12:58
ah
Fredrik Boulund
@boulund
Apr 28 2017 12:59
as the batchscripts for SLURM that Nextflow created didn't contain any memory or cpu requests
Paolo Di Tommaso
@pditommaso
Apr 28 2017 12:59
you can see that in the task launch wrapper
> as the batchscripts for SLURM that Nextflow created didn't contain any memory or cpu requests
how did you check that?
Fredrik Boulund
@boulund
Apr 28 2017 13:00
looking at the .command.run files?
Paolo Di Tommaso
@pditommaso
Apr 28 2017 13:00
aren't there SLURM directives at the top?
Fredrik Boulund
@boulund
Apr 28 2017 13:00
some, but no memory or cpu requests
Paolo Di Tommaso
@pditommaso
Apr 28 2017 13:01
um, that's not good
Fredrik Boulund
@boulund
Apr 28 2017 13:01
it's indeed very strange
I guess it works to some degree, otherwise it wouldn't have read the cluster parameters and wouldn't have managed to request any slurm jobs at all.
I have no idea what was going on. But as I said, I worked around it now, so it's currently running
Paolo Di Tommaso
@pditommaso
Apr 28 2017 13:09
um, I've just tested your config and I'm getting these directives
#!/bin/bash
#SBATCH -D /Users/pditommaso/projects/wellness_wgs/work/a1/ebc5a54c21c057616418ad7c679c37
#SBATCH -J nf-kaiju
#SBATCH -o /Users/pditommaso/projects/wellness_wgs/work/a1/ebc5a54c21c057616418ad7c679c37/.command.log
#SBATCH --no-requeue
#SBATCH -c 8
#SBATCH -t 01:00:00
#SBATCH --mem 65536
#SBATCH -A b2016371
so both time and cpus look fine
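A sketch of how such a block is usually nested, assuming the per-process `$name` selector syntax used in the config quoted earlier. The profile layout is a guess at the structure, not a copy of the wellness_wgs config. The comments show which of the #SBATCH lines above each setting corresponds to:

```groovy
// nextflow.config sketch; the layout is an assumption
profiles {
    milou_proGenomes {
        process {
            executor = 'slurm'

            // settings applied only to the process named `kaiju`
            $kaiju {
                cpus = 8                        // -> #SBATCH -c 8
                memory = 64.GB                  // -> #SBATCH --mem 65536
                time = { 1.h * task.attempt }   // -> #SBATCH -t 01:00:00 on the first attempt
            }
        }
    }
}
```

If the cpus, memory and time lines are missing from .command.run, one possibility is that this block was never applied to the process, for instance because a different config file or profile was picked up.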
Fredrik Boulund
@boulund
Apr 28 2017 13:12

Very strange. I only see

#!/bin/bash
#SBATCH -D /Users/pditommaso/projects/wellness_wgs/work/a1/ebc5a54c21c057616418ad7c679c37
#SBATCH -J nf-kaiju
#SBATCH -o /Users/pditommaso/projects/wellness_wgs/work/a1/ebc5a54c21c057616418ad7c679c37/.command.log
#SBATCH --no-requeue

in my files

Paolo Di Tommaso
@pditommaso
Apr 28 2017 14:33
hard to say, make sure you didn't rename or modify the script or the config locally
also check .nextflow.log to verify it's parsing the expected config files
Félix C. Morency
@fmorency
Apr 28 2017 15:38
@pditommaso can I nextflow pull in a folder different from the default one?
Paolo Di Tommaso
@pditommaso
Apr 28 2017 15:39
yes
nextflow pull /clone/path
Félix C. Morency
@fmorency
Apr 28 2017 15:39
awesome, thanks
Paolo Di Tommaso
@pditommaso
Apr 28 2017 15:39
:+1:
Félix C. Morency
@fmorency
Apr 28 2017 15:42
mmm it doesn't work
Not a valid project name: /path/to
Paolo Di Tommaso
@pditommaso
Apr 28 2017 15:43
oops sorry
it's clone not pull
$ nextflow clone hello this/path
nextflow-io/hello cloned to: this/path
Félix C. Morency
@fmorency
Apr 28 2017 15:45
mmm but it doesn't appear in nextflow list
I would like to be able to get the same functionality as a nextflow pull, but from another path
Paolo Di Tommaso
@pditommaso
Apr 28 2017 15:46
yes, it won't be managed any more as a NF project
well, the only difference is that you will need to specify the path instead of the name to run it (and it's not reported by the list command)
is that a problem ?
Félix C. Morency
@fmorency
Apr 28 2017 15:48
It won't get the git workflow variable $workflow.repository, etc
Paolo Di Tommaso
@pditommaso
Apr 28 2017 15:49
I see
if so, nope. I'm sorry
Félix C. Morency
@fmorency
Apr 28 2017 15:50
:( I'll file a feature request
Paolo Di Tommaso
@pditommaso
Apr 28 2017 15:51
should it be a path outside the NF home directory ?
Félix C. Morency
@fmorency
Apr 28 2017 15:53
yes (on a shared fs)
Paolo Di Tommaso
@pditommaso
Apr 28 2017 15:55
would having all of them in that folder be a solution?
Félix C. Morency
@fmorency
Apr 28 2017 15:56
all of them?
Paolo Di Tommaso
@pditommaso
Apr 28 2017 15:56
yes
Félix C. Morency
@fmorency
Apr 28 2017 15:57
yes it would i guess
Paolo Di Tommaso
@pditommaso
Apr 28 2017 15:57
oh, that's easy
Félix C. Morency
@fmorency
Apr 28 2017 15:57
I just want the workflow git variable to be populated correctly and the only way atm is to use nextflow pull
oh!?
Paolo Di Tommaso
@pditommaso
Apr 28 2017 15:58
the path where projects are downloaded by the pull command is defined by the NXF_ASSETS variable
Félix C. Morency
@fmorency
Apr 28 2017 15:58
awwwwww cool!
yes this is awesome
thanks!
Anton Goloborodko
@golobor
Apr 28 2017 21:08
Hi! Thanks for the awesome tool! Our lab decided to use it for Hi-C data analysis: https://github.com/mirnylab/distiller-nf and so far it works very well!
I have a small question: we started using docker and it works very smoothly, except that the files nextflow creates in work/ are owned by root:root and thus can't be cleaned up by the user when the pipeline fails. Is there any way for docker to create these files under the name of the user launching nextflow?
Thanks again
Mike Smoot
@mes5k
Apr 28 2017 21:15

@golobor you can tell docker to use a specific uid and gid when running with the --user option. Put this:

UID = new com.sun.security.auth.module.UnixSystem().getUid()
GID = new com.sun.security.auth.module.UnixSystem().getGid()

docker {
    enabled = true    
    runOptions = '--user $UID:$GID'
}

in your nextflow.config and it will tell docker to run as the current user. You could also just hardcode the UID and GID values if you happen to know them.

Anton Goloborodko
@golobor
Apr 28 2017 21:18
awesome!! Thank you!
small correction for the recipe - replace single quotes ' with double quotes " in "--user $UID:$GID" to enable substitution of $UID and $GID inside that string
it works!
Mike Smoot
@mes5k
Apr 28 2017 21:23
Yes of course, sorry. My python roots are showing through.
Anton Goloborodko
@golobor
Apr 28 2017 21:24
haha, I'm having the exact same issue :)
Paolo Di Tommaso
@pditommaso
Apr 28 2017 21:46
yep, even easier
docker.runOptions = '-u $(id -u):$(id -g)'
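The quoting difference behind the two recipes above can be checked in plain Groovy. This is only a sketch, with `uid` and `gid` as made-up stand-ins for the real IDs:

```groovy
def uid = 1000   // made-up values standing in for the real IDs
def gid = 1000

def literal      = '--user $uid:$gid'   // single quotes: plain String, nothing substituted
def interpolated = "--user $uid:$gid"   // double quotes: GString, values filled in by Groovy

assert literal == '--user $uid:$gid'
assert interpolated.toString() == '--user 1000:1000'
```

In the one-liner with `$(id -u):$(id -g)` the single quotes are what you want: Groovy leaves the string untouched, so it is the shell launching docker that evaluates `id -u` and `id -g`.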
Anton Goloborodko
@golobor
Apr 28 2017 21:51
nice!
Paolo Di Tommaso
@pditommaso
Apr 28 2017 21:52
anyhow congrats for distiller-nf! I'm impressed, I love your coding style :)
Anton Goloborodko
@golobor
Apr 28 2017 21:54
thank you! I mean - really, thank you for making this tool, it's really, really useful! In our case, we expect the users to use all kinds of clusters and not be willing to deal with various dependencies. Having Nextflow taking care of AWS/SGE/LSF/etc and dockers is priceless!
Paolo Di Tommaso
@pditommaso
Apr 28 2017 21:54
thanks
do you have some background with java/groovy?
Anton Goloborodko
@golobor
Apr 28 2017 21:57
not at all :( Had to learn it four days ago when I decided to re-implement a Snakemake version of the pipeline (I really wanted AWS+dockers). The first 6 hours were painful, but once I figured how to mimic Python's list comprehensions with .collect and closures and found all required string operations it became very easy.
Paolo Di Tommaso
@pditommaso
Apr 28 2017 21:58
wow! chapeau (as they say in france) !
Anton Goloborodko
@golobor
Apr 28 2017 21:59
:) everything is well documented and stackoverflow helps a lot, as usual!
Paolo Di Tommaso
@pditommaso
Apr 28 2017 21:59
usually pythonists tend to be really scared by NF
Anton Goloborodko
@golobor
Apr 28 2017 22:00
I was! We even had a short twitter exchange about it ;)
Paolo Di Tommaso
@pditommaso
Apr 28 2017 22:00
I will mention your experience in some talk :)
Anton Goloborodko
@golobor
Apr 28 2017 22:00
:)
Paolo Di Tommaso
@pditommaso
Apr 28 2017 22:00
ahh, yes now I remember. great.
I would only refactor isSingleFile as shown below
Anton Goloborodko
@golobor
Apr 28 2017 22:02
oh, absolutely, i'd be super curious to know
Paolo Di Tommaso
@pditommaso
Apr 28 2017 22:02
boolean isSingleFile(object)  {
  object instanceof Path   
}
Anton Goloborodko
@golobor
Apr 28 2017 22:02
oh
that's it ? :))
nice!
Paolo Di Tommaso
@pditommaso
Apr 28 2017 22:02
yes :)
Anton Goloborodko
@golobor
Apr 28 2017 22:02
btw, what if it's a list with a single file?
Paolo Di Tommaso
@pditommaso
Apr 28 2017 22:03
it can't happen
Anton Goloborodko
@golobor
Apr 28 2017 22:03
ah, ok :)
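The helper above can be tried on its own in plain Groovy, where the import has to be spelled out. The file names are invented for the check:

```groovy
import java.nio.file.Path
import java.nio.file.Paths

// true only when the argument is a single path, not a collection of paths
boolean isSingleFile(object) {
    object instanceof Path
}

assert isSingleFile(Paths.get('sample_1.fq'))
assert !isSingleFile([Paths.get('sample_1.fq'), Paths.get('sample_2.fq')])
assert !isSingleFile([Paths.get('sample_1.fq')])   // a one-element list is still a list
```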
Paolo Di Tommaso
@pditommaso
Apr 28 2017 22:03
then you can rewrite all the tag directives without an external closure, e.g.
tag "library:${library} run:${run}"
Anton Goloborodko
@golobor
Apr 28 2017 22:04
oh!
Paolo Di Tommaso
@pditommaso
Apr 28 2017 22:04
it's implicit
Anton Goloborodko
@golobor
Apr 28 2017 22:04
makes sense, indeed
i remember now that some of the examples had it like that
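A sketch of the inline tag in context, assuming the process syntax of the time (DSL1). The process name `map_runs` and the library/run values are hypothetical and not taken from distiller-nf:

```groovy
// DSL1-era sketch; process name and input values are made up
process map_runs {
    // evaluated for each task, so every task gets its own label in the log
    tag "library:${library} run:${run}"

    input:
    set val(library), val(run) from Channel.from(['lib1', 'run1'], ['lib1', 'run2'])

    script:
    """
    echo "mapping ${library} ${run}"
    """
}
```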
Paolo Di Tommaso
@pditommaso
Apr 28 2017 22:08
as soon as you have a README in your repo I will include it in our Featured pipelines list :)
Anton Goloborodko
@golobor
Apr 28 2017 22:09
this is crazy, thank you so much! :)
Paolo Di Tommaso
@pditommaso
Apr 28 2017 22:09
ahah
Anton Goloborodko
@golobor
Apr 28 2017 22:10
we'll have docs in the next week or two, I'll let you know! Again, thank you so much!
Paolo Di Tommaso
@pditommaso
Apr 28 2017 22:12
Great, I definitely want to learn more about it. You are very welcome!
then, if you want to be a real pro, you could set up a CI server for your pipeline
like for example
Anton Goloborodko
@golobor
Apr 28 2017 22:15
yeah, that sounds like a very useful idea! I did set up travis for the custom CLI apps used by distiller (https://travis-ci.org/mirnylab/pairsamtools), but when it comes to pipelines I wasn't sure how to approach it. I will definitely take a look at your example and implement something of this sort - the peace of mind that comes from tests is worth any effort. :)
Paolo Di Tommaso
@pditommaso
Apr 28 2017 22:16
exactly, with docker it's straightforward.
the only boring thing is to create a dataset small enough to be included in the repo to test the pipeline execution
Paolo Di Tommaso
@pditommaso
Apr 28 2017 22:21
OK, going offline. Quite late here. :wave:
Anton Goloborodko
@golobor
Apr 28 2017 22:35
oh, got caught in a chat with labmates. :) Everyone is very excited about nextflow! Again, thank you for your help, hope to stay in touch!