These are chat archives for nextflow-io/nextflow

3rd
Mar 2017
Karin Lagesen
@karinlag
Mar 03 2017 09:25
Is there a document somewhere that explains the nextflow philosophy? I.e. more high level with processes, projects, work, etc?
the architecture, if I may call it that
Fredrik Boulund
@boulund
Mar 03 2017 12:42
@karinlag for a really high overview of the underlying concept I guess you could refer to the "dataflow programming model"? https://en.wikipedia.org/wiki/Dataflow_programming
But for more nextflow-specific stuff I'm not the one to say anything :) Still learning nextflow myself.
Karin Lagesen
@karinlag
Mar 03 2017 12:43
:)
Just started experimenting with it this week myself
and I discovered the work directory purely by accident
hence my question :)
Fredrik Boulund
@boulund
Mar 03 2017 12:43
ah... There is a lot of information about this in the documentation, but it's very long!
Karin Lagesen
@karinlag
Mar 03 2017 12:44
yes, and it is also very detailed
Fredrik Boulund
@boulund
Mar 03 2017 12:44
yep, for good and bad sometimes :)
Karin Lagesen
@karinlag
Mar 03 2017 12:44
I'd like something more "high level", explaining the thought process that is behind its structure
Fredrik Boulund
@boulund
Mar 03 2017 12:45
I totally understand what you mean. I also agree it's hard to grasp that kind of information from the documentation, unfortunately
Karin Lagesen
@karinlag
Mar 03 2017 12:45
I think that would help me understand things, and also understand the details better
Fredrik Boulund
@boulund
Mar 03 2017 12:45
took me many weeks of trial-and-error, reading and rereading the documentation, and asking lots of questions :)
Karin Lagesen
@karinlag
Mar 03 2017 12:45
sounds familiar :)
Fredrik Boulund
@boulund
Mar 03 2017 12:45
and I'm still here, asking questions, so maybe not the most efficient way to go about it ;)
Karin Lagesen
@karinlag
Mar 03 2017 12:45
but, having this chat is very useful
Fredrik Boulund
@boulund
Mar 03 2017 12:49
indeed! pditommaso has helped me a lot here when I was stuck
Karin Lagesen
@karinlag
Mar 03 2017 12:49
same here
Paolo Di Tommaso
@pditommaso
Mar 03 2017 12:50
I like this community :)
Phil Ewels
@ewels
Mar 03 2017 12:50
Hi @karinlag - I don't know of any high level introduction text (it would be nice), there are videos of presentations which may give a good introduction though: https://www.nextflow.io/presentations.html
Maxime Garcia
@MaxUlysse
Mar 03 2017 12:51
:+1:
Fredrik Boulund
@boulund
Mar 03 2017 12:53
So, now that there are so many here, maybe someone knows a good way to solve my current issue
Karin Lagesen
@karinlag
Mar 03 2017 12:54
@ewels thanks, I'll have a look!
Paolo Di Tommaso
@pditommaso
Mar 03 2017 12:54
@karinlag The broad view is to provide a lightweight tool to enable portable, scalable and reproducible workflows by embracing community driven technologies, such as Git, GitHub and containers
Fredrik Boulund
@boulund
Mar 03 2017 12:55
I have a read correction process that I want to produce an output channel with a pair_id and file pairs. example:
process correct_reads {
    input: 
    set pair_id, file(reads) from input_channel

    output: 
    set pair_id, file("*.corrected.fq.gz") into output_channel

    """
    correct_reads -in1 ${reads[0]} -in2 ${reads[1]} -out1 ${reads[0].baseName}.corrected.fq.gz -out2 ${reads[1].baseName}.corrected.fq.gz
     """
}
Paolo Di Tommaso
@pditommaso
Mar 03 2017 12:56
@karinlag have a look at this, this and this .
Fredrik Boulund
@boulund
Mar 03 2017 12:58
is there an easy way to give the exact file names to include in the output channel name? I'm simplifying the example a bit here, it could possibly produce several other files that match the same glob pattern.
Karin Lagesen
@karinlag
Mar 03 2017 12:59
@pditommaso thanks!
Paolo Di Tommaso
@pditommaso
Mar 03 2017 12:59
welcome
@boulund I guess the exact name is ${reads[0].baseName}.corrected.fq.gz, right ?
Fredrik Boulund
@boulund
Mar 03 2017 13:00
yep
Paolo Di Tommaso
@pditommaso
Mar 03 2017 13:01
@karinlag ah, soon a paper will be out giving a better perspective of NF
Karin Lagesen
@karinlag
Mar 03 2017 13:01
good!
Paolo Di Tommaso
@pditommaso
Mar 03 2017 13:01
@boulund Thus
Fredrik Boulund
@boulund
Mar 03 2017 13:01
I'm not familiar enough with Groovy to understand how to create the tuple with the specific file names
Paolo Di Tommaso
@pditommaso
Mar 03 2017 13:01
 output: 
    set pair_id, file("${reads[0].baseName}.corrected.fq.gz") into output_channel
Fredrik Boulund
@boulund
Mar 03 2017 13:01
but that would only send one file into the channel, right?
Paolo Di Tommaso
@pditommaso
Mar 03 2017 13:02
yes, which files do you need to output?
Fredrik Boulund
@boulund
Mar 03 2017 13:03
I'd like to capture both ${reads[0].baseName}.corrected.fq.gz and ${reads[1].baseName}.corrected.fq.gz
my first guess would be something like
set pair_id, Tuple(file("first_file_name"), file("second_file_name")) into channel
but that didn't work when I tried it...
Paolo Di Tommaso
@pditommaso
Mar 03 2017 13:04
umm
the point is that the *.corrected.fq.gz pattern is too wide, right?
Fredrik Boulund
@boulund
Mar 03 2017 13:04
yep, there might be several files in the process dir that match that glob that I don't want to include
Paolo Di Tommaso
@pditommaso
Mar 03 2017 13:06
in principle you can specify multiple names by separating them with a colon, e.g.
file ("A:B:C:..") into etc
or even better
file ("{A,B,C}.corrected.fq.gz") into etc
Fredrik Boulund
@boulund
Mar 03 2017 13:07
aha, I didn't know that. Very useful! Thanks a lot!
Paolo Di Tommaso
@pditommaso
Mar 03 2017 13:07
to make it more readable I would do the following
output: 
    set pair_id, file("{$names}.corrected.fq.gz") into output_channel
script: 
   names = reads.collect { it.baseName } .join(',')
   : 
   etc
does it make sense ?
Fredrik Boulund
@boulund
Mar 03 2017 13:09
Hmmm.... yeah, I see it now
Paolo Di Tommaso
@pditommaso
Mar 03 2017 13:10
collect all file names into names separating by a comma, then use it in the output
Fredrik Boulund
@boulund
Mar 03 2017 13:10
just like the following in Python, right?
names = ",".join([fn.basename() for fn in reads])
Paolo Di Tommaso
@pditommaso
Mar 03 2017 13:10
I guess so ;)
Fredrik Boulund
@boulund
Mar 03 2017 13:11
I think I get it. Great
Thanks a lot
Paolo Di Tommaso
@pditommaso
Mar 03 2017 13:11
welcome
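Editor's sketch, not part of the original chat: the Python comparison above can be made runnable roughly as follows. It only builds the brace pattern string that the Groovy line `reads.collect { it.baseName }.join(',')` is meant to produce. It assumes that `baseName` drops just the last extension (as `Path.stem` does in Python). The function names and sample file names are hypothetical.

```python
from pathlib import Path


def brace_pattern(reads, suffix=".corrected.fq.gz"):
    """Build a brace glob such as '{a,b}.corrected.fq.gz' from input file names."""
    # Path.stem drops only the last extension, so 'x_1.fq.gz' becomes 'x_1.fq'.
    names = ",".join(Path(r).stem for r in reads)
    return "{" + names + "}" + suffix


def expected_outputs(reads, suffix=".corrected.fq.gz"):
    """List the exact file names the brace pattern is meant to capture."""
    return [Path(r).stem + suffix for r in reads]


reads = ["sampleA_1.fq.gz", "sampleA_2.fq.gz"]  # hypothetical input pair
print(brace_pattern(reads))
print(expected_outputs(reads))
```

The point of the brace form is that it names the outputs explicitly, so any other files in the task directory that happen to match `*.corrected.fq.gz` are left out of the channel.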
Fredrik Boulund
@boulund
Mar 03 2017 13:11
now btw
when you use script: in the process definition like that. Can I still use a triple quoted string below the groovy commands to run bash stuff?
Paolo Di Tommaso
@pditommaso
Mar 03 2017 13:12
yes
Fredrik Boulund
@boulund
Mar 03 2017 13:12
awesome
Paolo Di Tommaso
@pditommaso
Mar 03 2017 13:13
the script: tag allows you to put extra code, or even conditional expressions, before the multiline string that defines the command
LukeGoodsell
@LukeGoodsell
Mar 03 2017 13:16
Hi Paolo. Is it possible to have nextflow generate relative symlinks to files in each of the work directories, and in publishDir outputs, rather than absolute symlink paths?
Fredrik Boulund
@boulund
Mar 03 2017 13:16
perfect! thanks so much
Paolo Di Tommaso
@pditommaso
Mar 03 2017 13:17
@LukeGoodsell I think no. What's your use case ?
LukeGoodsell
@LukeGoodsell
Mar 03 2017 13:28
E.g.: A workflow is run on an NFS mount point. The symlinks are valid for the system where it runs, but not for other systems that mount the volume at a different mount point.
Paolo Di Tommaso
@pditommaso
Mar 03 2017 13:30
well, the results in the work directory are meant to be discarded at the end of the computation
is your problem with the work files or with the output files produced by the publishDir directive?
LukeGoodsell
@LukeGoodsell
Mar 03 2017 13:38
It’d be nice if I can go through the work files in the future, without needing to re-link them, but the publishDir is more important. I know I could use storeDir, but for the 10GB+ files, that seems needlessly slow.
Paolo Di Tommaso
@pditommaso
Mar 03 2017 13:44
publishDir also supports hard links, which are not affected by this problem. Would that work for you?
LukeGoodsell
@LukeGoodsell
Mar 03 2017 13:51
yes, actually, I think that would. I see the documentation now and I’ll use that. Thanks!
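Editor's sketch, not part of the original chat: a small Python experiment showing why absolute symlinks break when the same volume is seen under a different path, while relative symlinks and hard links keep working. It uses a rename inside a temporary directory to stand in for a different mount point. The directory and file names are hypothetical, and it assumes a POSIX-like filesystem where creating symlinks and hard links is allowed.

```python
import os
import shutil
import tempfile
from pathlib import Path


def link_survival():
    """Report which link types still resolve after the parent directory moves."""
    base = Path(tempfile.mkdtemp())
    try:
        old_mount = base / "mnt_a"  # hypothetical original mount point
        work = old_mount / "work"
        results = old_mount / "results"
        work.mkdir(parents=True)
        results.mkdir()

        target = work / "out.txt"
        target.write_text("data\n")

        # Absolute symlink: stores the full path, including the mount point.
        os.symlink(target, results / "abs_link.txt")
        # Relative symlink: stores '../work/out.txt', independent of the mount point.
        os.symlink(os.path.relpath(target, results), results / "rel_link.txt")
        # Hard link: a second directory entry for the same file, no path stored.
        os.link(target, results / "hard_link.txt")

        # Simulate another system mounting the same volume at a different path.
        new_mount = base / "mnt_b"
        old_mount.rename(new_mount)

        names = ("abs_link.txt", "rel_link.txt", "hard_link.txt")
        # Path.exists() follows symlinks, so a dangling link reports False.
        return {name: (new_mount / "results" / name).exists() for name in names}
    finally:
        shutil.rmtree(base)


print(link_survival())
```

Hard links only work within a single filesystem, which fits the case discussed here where the work directory and the published results live on the same NFS volume.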
Paolo Di Tommaso
@pditommaso
Mar 03 2017 13:52
:+1:
LukeGoodsell
@LukeGoodsell
Mar 03 2017 13:52
It would be nice if relative paths could be used in the work directories, though. Perhaps that’s something you could consider in future, unless there’s a technical reason not to?
Paolo Di Tommaso
@pditommaso
Mar 03 2017 13:53
it could make sense. you may want to open a feature request on GitHub for that
LukeGoodsell
@LukeGoodsell
Mar 03 2017 14:04
Done: #296
Paolo Di Tommaso
@pditommaso
Mar 03 2017 14:05
ok
Félix C. Morency
@fmorency
Mar 03 2017 15:42
I noticed that the tasks that restart for no apparent reason are written in this style
    script:
    if(!params.bla)
        """
        stuff
        """
    else
        """
        other stuff
        """
I also have a
    output:
    set sid, "stuff" into bla
    if(params.something=="bla")
        file "other"
Paolo Di Tommaso
@pditommaso
Mar 03 2017 16:40
the latter is not allowed
Félix C. Morency
@fmorency
Mar 03 2017 16:42
AH
@pditommaso iirc you had an undocumented feature for optional output file?
Paolo Di Tommaso
@pditommaso
Mar 03 2017 16:44
that's true .. :)
Félix C. Morency
@fmorency
Mar 03 2017 16:44
would you mind refreshing my memory? :D
Paolo Di Tommaso
@pditommaso
Mar 03 2017 16:45
yes sorry, as simple as
file 'x' optional true into z
Félix C. Morency
@fmorency
Mar 03 2017 16:46
Awesome. Hope this fixes the problem. Thanks
Paolo Di Tommaso
@pditommaso
Mar 03 2017 16:46
:+1:
@mes5k let me assemble a new build
Mike Smoot
@mes5k
Mar 03 2017 16:49
great, thanks!
Paolo Di Tommaso
@pditommaso
Mar 03 2017 17:56
@mes5k I've just uploaded 0.24.0-SNAPSHOT, enjoy it
Mike Smoot
@mes5k
Mar 03 2017 17:56
Great, will give it a try
Paolo Di Tommaso
@pditommaso
Mar 03 2017 17:58
cool, let me know
Félix C. Morency
@fmorency
Mar 03 2017 19:58
@pditommaso so far so good. things seem to -resume normally now
Paolo Di Tommaso
@pditommaso
Mar 03 2017 21:15
good