These are chat archives for nextflow-io/nextflow

18th
May 2017
Simone Baffelli
@baffelli
May 18 2017 07:21
Good morning. It's me being annoying again. It seems that I am not able to write to a file using groovy in the local scope...
Basically, I need to collect several files, list their paths in a csv file and call a command using that file as input. I tried the following:
process stack{
    input:
      val to_stack from to_stack

    output:
      set file(rate), file(sig_rate), file(sig_ph) into stacked
      file(dt) into diff_tab

    shell:
      first_off_par=to_stack[0]['off_par']
      unw = to_stack.collect{item->item['unw']}
      off_par = to_stack.collect{item->item['off_par']}
      baseline = to_stack.collect{item->item['baseline']}
      stacking_columns = [unw, baseline].transpose()
      dt_content = stacking_columns.collect{line -> "${line[0]}\t${line[1]}"}.join("\n")
      dt = file('dt')
      dt << dt_content
      '''
      width=$(get_value !{first_off_par} interferogram_width)
      stacking !{dt} ${width} rate rate sig_rate sig_ph 256 120 - - 0
      '''

}
However, when I run it, nextflow complains that "dt" is outside the scope of the working directory. I wonder what the correct idiom is to write to dt in the local working directory.
Also, if I inspect "dt", which is created in the path where my nextflow script is, I see that the files in the csv contain paths to other working directories. I suppose the command "stacking" will again complain about them being out of scope, right?
Simone Baffelli
@baffelli
May 18 2017 07:49
Ok, I should read the documentation better. I think I know how to solve the second part ;)
Evan Floden
@evanfloden
May 18 2017 08:05
If you are still stuck, I think the best strategy is to collect the file names in the csv as one channel (with working directory paths) and then have the collected files themselves as another channel. This will allow you to pass the csv file as an argument AND have NF symlink them from the previous work dir.
See here for a working example of this:
Simone Baffelli
@baffelli
May 18 2017 08:06
I solved it as follows:
process stack{
    input:
      file(unw_ls:'*.unw') from unw_stack.collate(params.n_stack)
      file(off_ls:'*.off_par') from off_par_stack.collate(params.n_stack)
      val bl from bl_stack.collate(params.n_stack)

    output:
      set file(rate), file(sig_rate), file(sig_ph) into stacked
      file(dt:'dt') into diff_tab

    shell:
      stacking_columns = [unw_ls as List, bl].transpose()
      //construct the columns of the diff_tab file
      dt_content = stacking_columns.collect{line -> "${line[0]}\t${line[1]}"}.join("\n")
      dt = file('dt')
      dt << dt_content

      '''
      echo !{dt_content} > dt
      width=$(get_value !{off_ls[0]} interferogram_width)
      stacking dt ${width} rate sig_rate sig_ph 256 120 - - 10 0
      '''

}
Still, I am not able to have "dt" in my local scope
But yeah, your solution could end up being more elegant!
Evan Floden
@evanfloden
May 18 2017 08:11
I prefer to keep the channel operators outside the process but that is preference. Let me know if it is not clear what my example is doing.
Simone Baffelli
@baffelli
May 18 2017 08:12
It is perfectly clear! Only, my case is slightly more complicated because I want to repeat "stacking" for buffers of 10 files at a time, in a moving window sort of way.
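A minimal sketch of the "moving window" idea in plain Groovy, outside Nextflow. It uses the GDK method `List.collate(size, step, keepRemainder)`. The file names and the window size of 3 are made up for illustration. Whether the channel-level `collate` operator accepts the same arguments should be checked against the Nextflow documentation for the version in use.

```groovy
// Hypothetical file names standing in for the real interferograms.
def files = (1..5).collect { "ifg_${it}.unw".toString() }

// collate(size, step, keepRemainder):
// windows of 3 elements, advancing by 1 element each time,
// dropping the incomplete windows at the end of the list.
def windows = files.collate(3, 1, false)

// One line per window, e.g. to feed one "stacking" call each.
windows.each { window -> println window.join(' ') }
```

With a step equal to the window size the windows do not overlap, which is what `collate(params.n_stack)` amounts to.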
Paolo Di Tommaso
@pditommaso
May 18 2017 08:16
not sure what the syntax file(off_ls:'*.off_par') or file(dt:'dt') means ?
:)
Simone Baffelli
@baffelli
May 18 2017 08:17
somehow I want to access the list of "*off_par" in my groovy script
it seems to work somehow ;)
gives me some "blankSeparatedList"
or some nextflow internal object
Evan Floden
@evanfloden
May 18 2017 08:17
Me too! I assumed it was something you had suggested @pditommaso. How about constructing a channel containing the names of the 10 files? If it works though :smile:
Paolo Di Tommaso
@pditommaso
May 18 2017 08:18
I'm starting to lose control of this tool ;)
Simone Baffelli
@baffelli
May 18 2017 08:18
In principle the script works, I just cannot produce a "dt" file exclusive to the workdir scope
If I inspect dt, I will see all the csv appended in a single huge file
Paolo Di Tommaso
@pditommaso
May 18 2017 08:20
yes, because in that context the final task workdir is not yet available; you should create that file outside the process and pass it as an input
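A small plain-Groovy sketch of why a single "dt" can end up holding every CSV. The `<<` operator on a file appends, so repeated writes to the same resolved path accumulate. Here a `java.io.File` in a temporary directory stands in for the object returned by `file('dt')`. That substitution is an assumption made so the example runs without Nextflow.

```groovy
import java.nio.file.Files

// Temporary directory standing in for the launch directory.
def dir = Files.createTempDirectory('dt-demo').toFile()
def dt = new File(dir, 'dt')

// Two "tasks" writing to the same path: the second write is appended.
dt << "a.unw\t10\n"
dt << "b.unw\t20\n"
assert dt.text == "a.unw\t10\nb.unw\t20\n"

// Assigning to .text replaces the content instead of appending.
dt.text = "c.unw\t30\n"
assert dt.text == "c.unw\t30\n"
```

Neither form gives each task its own file. For that, the file has to be created in the task's work directory, or passed to the process as an input.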
Evan Floden
@evanfloden
May 18 2017 08:20
:+1:
Paolo Di Tommaso
@pditommaso
May 18 2017 08:20
or ...
Simone Baffelli
@baffelli
May 18 2017 08:20
Or use the shell command to pipe "stacking_columns" into the dt file
but not in the way I was trying to do it ;)
anyway the file(unw_ls:'*.unw') should definitely be documented, I find it pretty useful
Paolo Di Tommaso
@pditommaso
May 18 2017 08:21
create the file content as a string, pass it as an input to the process, and it will automatically be saved as a file in the process context
> anyway the file(unw_ls:'*.unw') should definitely be documented, I find it pretty useful
now I remember that I never documented it because I'm not sure I'll keep it as a feature, at least in its current form
Evan Floden
@evanfloden
May 18 2017 08:23
What does it do BTW
Simone Baffelli
@baffelli
May 18 2017 08:23
Please do! Or allow something similar
Paolo Di Tommaso
@pditommaso
May 18 2017 08:23
:)
Simone Baffelli
@baffelli
May 18 2017 08:23
It allows me to access the list of files in groovy
without naming them explicitly
Evan Floden
@evanfloden
May 18 2017 08:25
so creates a variable called ‘unw_ls’ that contains a list of the file names?
Simone Baffelli
@baffelli
May 18 2017 08:25
apparently it does, yes
Paolo Di Tommaso
@pditommaso
May 18 2017 08:25
yes, the file names assigned to the process with that pattern
Simone Baffelli
@baffelli
May 18 2017 08:25
which I use to generate my "dt" file
without needing a separate process
it's a pretty cool feature
Evan Floden
@evanfloden
May 18 2017 08:26
the process
?
Paolo Di Tommaso
@pditommaso
May 18 2017 08:26
*by the process
Evan Floden
@evanfloden
May 18 2017 08:26
the current one
Paolo Di Tommaso
@pditommaso
May 18 2017 08:26
yes
Evan Floden
@evanfloden
May 18 2017 08:26
ah okay.
Paolo Di Tommaso
@pditommaso
May 18 2017 08:27
like when you do file "foo*" from ..
but it also creates a variable handle to access their names
Evan Floden
@evanfloden
May 18 2017 08:27
I see, hence unw_ls as List
Simone Baffelli
@baffelli
May 18 2017 08:28
exactly, because it is not a normal list
groovy did not let me transpose it with bl
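The transpose-and-join step can be sketched in plain Groovy as follows. The values are invented. In the process, `unw_ls` would be the Nextflow-provided collection (hence the `as List` coercion) and `bl` the list of baselines.

```groovy
// Hypothetical stand-ins for the process inputs.
def unw_ls = ['a.unw', 'b.unw', 'c.unw']
def bl = [10, 20, 30]

// transpose() pairs the i-th element of each list: [[a.unw, 10], [b.unw, 20], ...]
def rows = [unw_ls as List, bl].transpose()

// One tab-separated line per pair, joined with newlines.
def dtContent = rows.collect { row -> "${row[0]}\t${row[1]}" }.join('\n')
println dtContent
```

On an ordinary list `as List` is a no-op. It only matters when the object is some other collection type that does not support `transpose()` directly.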
Paolo Di Tommaso
@pditommaso
May 18 2017 08:28
Simone is a good hacker ;)
Simone Baffelli
@baffelli
May 18 2017 08:29
Somehow I think I should have done my PhD in computer science, not in the geosciences
Evan Floden
@evanfloden
May 18 2017 08:31
Nah, it is all about applying the skills! So cool to see people using NF across so many different fields!
Simone Baffelli
@baffelli
May 18 2017 08:32
Four years ago I would never have imagined that I would have daily coding discussions with biologists/bioinformaticians ;)
Evan Floden
@evanfloden
May 18 2017 08:33
Haha, keep it coming, we all have something to teach each other.
Simone Baffelli
@baffelli
May 18 2017 08:35
True! And so much to learn. I see that people from very different fields have the same set of concerns
Simone Baffelli
@baffelli
May 18 2017 08:42
And I must say I'm starting to love grooy
Evan Floden
@evanfloden
May 18 2017 08:42
That is a bold statement to make around here :laughing:
Simone Baffelli
@baffelli
May 18 2017 08:42
groovy
:smile:
Paolo Di Tommaso
@pditommaso
May 18 2017 08:43
:+1:
Simone Baffelli
@baffelli
May 18 2017 08:43
I used to like Java when I was studying. And a less verbose sort of java is actually quite cool.
Paolo Di Tommaso
@pditommaso
May 18 2017 08:44
yep, it's a very handy programming lang with tons of features
Simone Baffelli
@baffelli
May 18 2017 08:45
I wonder whether scala is as handy
Paolo Di Tommaso
@pditommaso
May 18 2017 08:46
well I don't think handy is the right definition for Scala ;)
its popularity is driven mainly by Spark and the people around the Akka framework
it's somehow the C++ for the Java world
Simone Baffelli
@baffelli
May 18 2017 08:52
reason enough not to like it ;)
but I like functional programming in general
Paolo Di Tommaso
@pditommaso
May 18 2017 08:52
I tend to agree with you
Bili Dong
@qobilidop
May 18 2017 08:53
I'm not familiar with Java but I hear a lot of hype about Kotlin recently
Paolo Di Tommaso
@pditommaso
May 18 2017 08:54
@qobilidop I'm still not able to figure out your timezone ;)
Bili Dong
@qobilidop
May 18 2017 08:54
I'm in San Diego
Sleep late today😂
Paolo Di Tommaso
@pditommaso
May 18 2017 08:55
ahah
Kotlin is an interesting project, it's supposed to be a statically compiled language similar to Groovy but without the bad parts
the same for Swift (the Apple one)
Bili Dong
@qobilidop
May 18 2017 08:59
haha, just came across this today
when I saw this I was wishing for a 'groovy is like python' that I can reference
Simone Baffelli
@baffelli
May 18 2017 09:04
It would only need the equivalent of numpy
and I would move to groovy
Paolo Di Tommaso
@pditommaso
May 18 2017 09:05
all the scientific libraries in the python stack are superior, that's beyond question
Simone Baffelli
@baffelli
May 18 2017 09:05
yes! absolutely
although a colleague of mine is using julia now and he's quite happy
Paolo Di Tommaso
@pditommaso
May 18 2017 09:06
however there are some alternatives for the JVM
Bili Dong
@qobilidop
May 18 2017 09:06
I’m actually curious about your choice of Groovy for the Nextflow project if I may ask @pditommaso
forget about my question if that may take too much of your time to answer, I’m just curious
Paolo Di Tommaso
@pditommaso
May 18 2017 09:08
because the JVM provides state-of-the-art technology for parallelisation, like the dataflow model that NF is based on
also because NF has been designed to enable very large, scalable computation, for which Java is still the better choice, see Hadoop, Spark, etc
but at the same time I wanted an easy to use programming lang without the verbosity of Java
Bili Dong
@qobilidop
May 18 2017 09:10
those are solid reasons
I didn’t know about those things before. I feel I have a lot to learn.
@pditommaso Thank you very much for making this wonderful tool, even if I’m not able to fully understand the technology :)
Paolo Di Tommaso
@pditommaso
May 18 2017 09:12
:D
Simone Baffelli
@baffelli
May 18 2017 09:13
That could be interesting in my next life ;)
Paolo Di Tommaso
@pditommaso
May 18 2017 09:13
I guess so !
;)
Simone Baffelli
@baffelli
May 18 2017 09:13
At the moment I will rely on my rather large python library :)
too close to the deadline
Paolo Di Tommaso
@pditommaso
May 18 2017 09:13

phdlife !

Simone Baffelli
@baffelli
May 18 2017 10:01
:mortar_board:
Anton Goloborodko
@golobor
May 18 2017 16:15
Hi, Paolo! I got swamped with other projects, but my labmates and collaborators started testing our Nextflow pipeline for Hi-C, distiller. One issue came up that is most likely from the domain of nextflow. Basically, we have a task that wget's an .sra file and dumps it into fastq.gz and then renames fastqs. https://github.com/mirnylab/distiller-nf/blob/master/distiller.nf#L79
Paolo Di Tommaso
@pditommaso
May 18 2017 16:16
what's the problem with that ?
Anton Goloborodko
@golobor
May 18 2017 16:16
So, one of my labmates ran into an issue when one of these tasks executed every line, including renaming, but the task did not get marked as executed
I'm looking at the logs right now, can send any line
One second, i'll dig up the problematic lines
Paolo Di Tommaso
@pditommaso
May 18 2017 16:17
so, it's failing ?
Anton Goloborodko
@golobor
May 18 2017 16:17
Not at all, the task performed everything it had to, but never returned
Paolo Di Tommaso
@pditommaso
May 18 2017 16:18
umm, it's hanging ?
Anton Goloborodko
@golobor
May 18 2017 16:19
The central log (.nextflow.log) contains the following lines:
May-18 10:22:31.237 [Thread-1] DEBUG n.processor.TaskPollingMonitor - !! executor local > tasks to be completed: 20 -- first: TaskHandler[id: 2; name: download_sra (sra:SRR5266588); status: RUNNING; exit: -; workDir: /home/magus/projects/testDistiller/mapWapl3/work/04/ba825ca0b44b8b15c57d9d55924759]
May-18 10:27:31.284 [Thread-1] DEBUG n.processor.TaskPollingMonitor - !! executor local > tasks to be completed: 20 -- first: TaskHandler[id: 2; name: download_sra (sra:SRR5266588); status: RUNNING; exit: -; workDir: /home/magus/projects/testDistiller/mapWapl3/work/04/ba825ca0b44b8b15c57d9d55924759]
Paolo Di Tommaso
@pditommaso
May 18 2017 16:19
wrap it in triple ` please
Anton Goloborodko
@golobor
May 18 2017 16:20
May-18 10:22:31.237 [Thread-1] DEBUG n.processor.TaskPollingMonitor - !! executor local > tasks to be completed: 20 -- first: TaskHandler[id: 2; name: download_sra (sra:SRR5266588); status: RUNNING; exit: -; workDir: /home/magus/projects/testDistiller/mapWapl3/work/04/ba825ca0b44b8b15c57d9d55924759]
May-18 10:27:31.284 [Thread-1] DEBUG n.processor.TaskPollingMonitor - !! executor local > tasks to be completed: 20 -- first: TaskHandler[id: 2; name: download_sra (sra:SRR5266588); status: RUNNING; exit: -; workDir: /home/magus/projects/testDistiller/mapWapl3/work/04/ba825ca0b44b8b15c57d9d55924759]
sorry :)
Paolo Di Tommaso
@pditommaso
May 18 2017 16:20
no pb
Anton Goloborodko
@golobor
May 18 2017 16:20
these lines kept coming up for 2 days after the last bash line was executed
Paolo Di Tommaso
@pditommaso
May 18 2017 16:21
could it be that the docker image download hangs ?
Anton Goloborodko
@golobor
May 18 2017 16:21
oh
yeah, that would be totally possible
would you have any ideas on how to test it?
Paolo Di Tommaso
@pditommaso
May 18 2017 16:22
pull it manually and try it again ?
Anton Goloborodko
@golobor
May 18 2017 16:23
yeah, fair! We'll look into it. Thank you, that probably was the issue!
Paolo Di Tommaso
@pditommaso
May 18 2017 16:29
Let's see
I have seen a similar problem with Docker 1.9.x
Anton Goloborodko
@golobor
May 18 2017 16:31
so, my labmate actually said that the error is reproducible with this particular task
which is actually good for debugging
Anton Goloborodko
@golobor
May 18 2017 16:45
thanks for the lead, we found that different versions of docker are installed on machines where distiller works and where it doesn't!
Paolo Di Tommaso
@pditommaso
May 18 2017 16:46
Ah, it's a cluster deployment?
Anton Goloborodko
@golobor
May 18 2017 16:47
oh, no, just multiple 20-core machines scattered across the lab. Nextflow was used separately for two different projects on two different machines. Worked on one and hung on another
the machine that succeeded had the version of docker from the central ubuntu repo
$ docker --version Docker version 1.12.6, build 78d1802
haha, i fail at quotes
and the machine that failed had docker from a PPA
Docker version 17.05.0-ce, build 89658be
Paolo Di Tommaso
@pditommaso
May 18 2017 16:49
Um very recent one
Are you able to replicate the issue consistently?
Anton Goloborodko
@golobor
May 18 2017 16:50
let me confirm with the labmate!
Paolo Di Tommaso
@pditommaso
May 18 2017 16:50
Ok