These are chat archives for nextflow-io/nextflow

1st
Jun 2017
Simone Baffelli
@baffelli
Jun 01 2017 13:08 UTC
Hello. Is there a way to know the reasons why a process is rerun when nextflow is called with the resume option? I suppose the process could be modifying its own input or something like that, but I just cannot find it on my own.
Paolo Di Tommaso
@pditommaso
Jun 01 2017 13:10 UTC
use -dump-hashes and compare the differences between two runs
Simone Baffelli
@baffelli
Jun 01 2017 13:10 UTC
Excellent, exactly what I was looking for
Paolo Di Tommaso
@pditommaso
Jun 01 2017 13:10 UTC
the most likely reason is a different order of inputs
though the output is not very human-friendly
Simone Baffelli
@baffelli
Jun 01 2017 13:10 UTC
that could be
Paolo Di Tommaso
@pditommaso
Jun 01 2017 13:11 UTC
use a difftool to compare the outputs
Simone Baffelli
@baffelli
Jun 01 2017 13:11 UTC
so if I collect several files using file("a*.something")
and they are received in a different order, the process is rerun
even though the files are the same
Paolo Di Tommaso
@pditommaso
Jun 01 2017 13:13 UTC
Umm
Are collecting files? How?
Simone Baffelli
@baffelli
Jun 01 2017 13:15 UTC
No wait, it is actually the step afterwards which is rerun: collect --> do something --> something else (this is rerun)
Paolo Di Tommaso
@pditommaso
Jun 01 2017 13:16 UTC
*are you
Simone Baffelli
@baffelli
Jun 01 2017 13:17 UTC
channel
.collect()
.into{someotherchannel}
process someprocess {
  input:
    file(files:"somefiles*.ext") from someotherchannel
  output:
    file something into nextchannel

  shell:
'''
dummy-command !{files.join(" ")}
'''

}
Paolo Di Tommaso
@pditommaso
Jun 01 2017 13:19 UTC
the collect result is order neutral, strange
Can you share the two differing dumps?
Simone Baffelli
@baffelli
Jun 01 2017 13:19 UTC
sure
but in what sense?
can I get the dumps from previously run processes?
sorry workflows
Paolo Di Tommaso
@pditommaso
Jun 01 2017 13:29 UTC
Let me check
Simone Baffelli
@baffelli
Jun 01 2017 13:30 UTC
now it runs from the cache
I don't get it
Paolo Di Tommaso
@pditommaso
Jun 01 2017 13:41 UTC
mmm, though the order is not taken into consideration, I think the problem is that the files are not renamed in the same sequence, eg somefiles1.ext is not the same file in a following run, hence it will end up with a different hash key
Simone Baffelli
@baffelli
Jun 01 2017 13:42 UTC
I see
but actually it was the following process that was being rerun
Paolo Di Tommaso
@pditommaso
Jun 01 2017 13:42 UTC
two solutions: 1) you don't rename the input files or 2) you sort the channel by using collect(sort:true)
but actually it was the following process that was being rerun
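A toy model of the renaming effect described above, written in Python rather than Nextflow. It is only a sketch under stated assumptions: `task_hash`, `stage_renamed` and `stage_original` are hypothetical helpers, and the hashing here is not Nextflow's actual algorithm. It shows why staging files under positional names such as `somefiles1.ext` can make the key depend on arrival order, and why keeping the original names or sorting first avoids that.

```python
import hashlib


def task_hash(staged):
    """Hash a mapping of staged name -> content (toy model, not Nextflow's real hashing)."""
    h = hashlib.sha256()
    for name in sorted(staged):  # dict order itself does not matter
        h.update(name.encode())
        h.update(b"\0")
        h.update(staged[name])
        h.update(b"\0")
    return h.hexdigest()


def stage_renamed(files, pattern="somefiles{}.ext"):
    """Rename inputs by arrival position: first file -> somefiles1.ext, and so on."""
    return {pattern.format(i): content for i, (_, content) in enumerate(files, start=1)}


def stage_original(files):
    """Keep each input under its own name."""
    return {name: content for name, content in files}


# Same two files, received in a different order in two runs (made-up names and contents).
run1 = [("a.dat", b"AAA"), ("b.dat", b"BBB")]
run2 = [("b.dat", b"BBB"), ("a.dat", b"AAA")]

# Positional renaming pairs different content with somefiles1.ext in each run.
renamed_differs = task_hash(stage_renamed(run1)) != task_hash(stage_renamed(run2))
# Solution 1: no renaming, so the name/content pairs are identical.
original_same = task_hash(stage_original(run1)) == task_hash(stage_original(run2))
# Solution 2: sort before renaming, so positions are stable.
sorted_same = task_hash(stage_renamed(sorted(run1))) == task_hash(stage_renamed(sorted(run2)))

print(renamed_differs, original_same, sorted_same)
```

In this model the two fixes correspond to the two solutions given above: leave the input names alone, or make the order deterministic before the files are renamed.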
Simone Baffelli
@baffelli
Jun 01 2017 13:43 UTC
by rename you mean using input(somename:"newname*.ext")?
Paolo Di Tommaso
@pditommaso
Jun 01 2017 13:43 UTC
weird
yes
Simone Baffelli
@baffelli
Jun 01 2017 13:43 UTC
but just input(somename)?
and then use that?
Paolo Di Tommaso
@pditommaso
Jun 01 2017 13:44 UTC
that should work
Simone Baffelli
@baffelli
Jun 01 2017 13:44 UTC
cool, didn't read the docs carefully then...it always bothers me to rely on fixed names
Paolo Di Tommaso
@pditommaso
Jun 01 2017 13:45 UTC
I mean a variable handle not a fixed file name
Simone Baffelli
@baffelli
Jun 01 2017 13:45 UTC
yes
Paolo Di Tommaso
@pditommaso
Jun 01 2017 13:45 UTC
ok
Simone Baffelli
@baffelli
Jun 01 2017 13:45 UTC
somename in this case
It will be a FileList
Paolo Di Tommaso
@pditommaso
Jun 01 2017 13:45 UTC
yes
Simone Baffelli
@baffelli
Jun 01 2017 13:45 UTC
or whatever name it has in nextflow
:)
i can't remember
excellent!
Paolo Di Tommaso
@pditommaso
Jun 01 2017 13:46 UTC
in principle List<Path> :smile:
Simone Baffelli
@baffelli
Jun 01 2017 13:46 UTC
Yes, but internally I saw another name:)
I had to cast it to a list
Paolo Di Tommaso
@pditommaso
Jun 01 2017 13:47 UTC
I don't remember either, the most stupid name ever chosen ;)
no, you should not
Simone Baffelli
@baffelli
Jun 01 2017 13:47 UTC
I remember that I wanted to transpose it
with another list, but it did not let me
unless I would cast it to List
stacking_columns = [unw_ls as List, bl.collect{item->seconds_to_day(item as long)}].transpose()
without unw_ls as List it complained
Paolo Di Tommaso
@pditommaso
Jun 01 2017 13:48 UTC
ah yes, in this case you are right, I was thinking unw_ls.transpose()
Simone Baffelli
@baffelli
Jun 01 2017 13:49 UTC
no no, I can't just build a list out of different types of lists I guess
I suppose because the one list is "type" (excuse my imprecision) and the other one is a regular groovy list
*typed
Paolo Di Tommaso
@pditommaso
Jun 01 2017 13:54 UTC
actually it's a bit more complicated: the files are collected by an object that does not implement the List interface, though it implements the methods of a List
this is a hack needed to be able to provide custom formatting for the list items, something that I would like to solve but it's a very low-level Groovy mess
Simone Baffelli
@baffelli
Jun 01 2017 13:54 UTC
And it supports being cast to a List
so that filelist is not a Collection
though it supports the same methods?
Paolo Di Tommaso
@pditommaso
Jun 01 2017 13:56 UTC
yes
Simone Baffelli
@baffelli
Jun 01 2017 14:39 UTC
cool ;)
Simone Baffelli
@baffelli
Jun 01 2017 14:48 UTC
I think the problem is that some function I'm using to combine maps in the pipeline sometimes fails
And then the list of collected files changes
and that causes a whole series of changes downstream
:laughing:
Paolo Di Tommaso
@pditommaso
Jun 01 2017 14:50 UTC
even without renaming those files ?
Simone Baffelli
@baffelli
Jun 01 2017 14:50 UTC
I did not try that yet
Paolo Di Tommaso
@pditommaso
Jun 01 2017 14:50 UTC
ah
Simone Baffelli
@baffelli
Jun 01 2017 14:50 UTC
those were crazy days ;)
I'm trying to get a final figure done for my conference presentation :sweat_smile:
Paolo Di Tommaso
@pditommaso
Jun 01 2017 14:51 UTC
like any other day :sunglasses:
Simone Baffelli
@baffelli
Jun 01 2017 14:51 UTC
on an easy day I manage to go home at 7 :sunglasses:
but really, you deserve a medal for nextflow
It made my life much easier
Paolo Di Tommaso
@pditommaso
Jun 01 2017 14:52 UTC
because you stop working when you are at home? ! :grin:
Simone Baffelli
@baffelli
Jun 01 2017 14:52 UTC
technically yes, in my head it never stops
Paolo Di Tommaso
@pditommaso
Jun 01 2017 14:52 UTC
ahahah
instead, when you go home, users arrive here from the other side of the ocean .. :D
Simone Baffelli
@baffelli
Jun 01 2017 14:54 UTC
so it never stops for you
Paolo Di Tommaso
@pditommaso
Jun 01 2017 14:54 UTC
fucking globalisation !
Simone Baffelli
@baffelli
Jun 01 2017 14:54 UTC
do you ever sleep?
Paolo Di Tommaso
@pditommaso
Jun 01 2017 14:54 UTC
sometimes
Maxime Garcia
@MaxUlysse
Jun 01 2017 14:58 UTC
I don't believe you
Félix C. Morency
@fmorency
Jun 01 2017 14:58 UTC
^
Paolo Di Tommaso
@pditommaso
Jun 01 2017 14:58 UTC
:sleeping:
Simone Baffelli
@baffelli
Jun 01 2017 14:59 UTC
It would be rather cool not having to sleep
(especially if employers would not be aware of that)
Paolo Di Tommaso
@pditommaso
Jun 01 2017 14:59 UTC
is not everybody in the Singularity channel for the new release party !? :D
Félix C. Morency
@fmorency
Jun 01 2017 15:00 UTC
new release?
Paolo Di Tommaso
@pditommaso
Jun 01 2017 15:00 UTC
singularity 2.3
Félix C. Morency
@fmorency
Jun 01 2017 15:01 UTC
oh yeah! woohoo!
Paolo Di Tommaso
@pditommaso
Jun 01 2017 15:01 UTC
you see, you were sleeping :D
already up and running here
Félix C. Morency
@fmorency
Jun 01 2017 15:03 UTC
@pditommaso do you (or anyone here) have experience with kernel tunables (or other tunables) for network I/O, heavy load, or huge file transfers?
Paolo Di Tommaso
@pditommaso
Jun 01 2017 15:05 UTC
I'm not a kernel specialist, when I was interviewed by Google I didn't even know what an inode was :)
Félix C. Morency
@fmorency
Jun 01 2017 15:10 UTC
:D
Simone Baffelli
@baffelli
Jun 01 2017 15:24 UTC
@pditommaso is it ok if I use the nextflow logo in a presentation?
as an advertisement
Paolo Di Tommaso
@pditommaso
Jun 01 2017 15:25 UTC
you are welcome
academic ?
Simone Baffelli
@baffelli
Jun 01 2017 15:25 UTC
sure
Paolo Di Tommaso
@pditommaso
Jun 01 2017 15:26 UTC
more than happy, if you can also share it here it would be interesting for the community
Simone Baffelli
@baffelli
Jun 01 2017 15:26 UTC
I will!
Paolo Di Tommaso
@pditommaso
Jun 01 2017 15:27 UTC
great, so NF is going into space ;)
Simone Baffelli
@baffelli
Jun 01 2017 15:27 UTC
not really :)
:rocket:
what I'm working on is a bit at the edge of the space/satellite community, because i use a device similar to those on satellites, only mine is on the ground
Paolo Di Tommaso
@pditommaso
Jun 01 2017 15:28 UTC
sounds cool all the same
Simone Baffelli
@baffelli
Jun 01 2017 15:28 UTC
and my datasets are a bit smaller :sweat_smile: but much longer in time
on the order of 200/500 images/day, each a few MB in size
chdem
@chdem
Jun 01 2017 16:24 UTC
Hi there! I'm trying to write a nextflow script for MultiQC and I need to use conditional inputs in the final process: MultiQC can analyze data from many sources (each produced by its own processes). Depending on the available data sources, the user can launch the NF script with up to 3 parameters (--fasta_path and/or --bam_path and/or --vcf_path).
Of course, I need to be sure that all the processes are finished before launching the final MultiQC process
I can do that by putting the input channels from the corresponding processes in the final MultiQC process BUT
the inputs depend on the parameter combination used
Paolo Di Tommaso
@pditommaso
Jun 01 2017 16:27 UTC
then what exactly is the problem?
chdem
@chdem
Jun 01 2017 16:27 UTC
because conditional inputs are not available, with 3 parameters, I have to make 5 tests
and write the final MultiQC process 5 times
with different inputs
is there another way to do that?
I'm sorry if I'm not clear
Paolo Di Tommaso
@pditommaso
Jun 01 2017 16:31 UTC
I have some problem with this
because conditional inputs are not available, with 3 parameters, I have to make 5 tests
and write the final MultiQC process 5 times
chdem
@chdem
Jun 01 2017 16:31 UTC
actually, I have to do 7 tests and 7 versions of my final process
test 1: (!fasta && !bam && vcf); test 2: (!fasta && bam && !vcf); test 3: (fasta && !bam && !vcf); test 4: (!fasta && bam && vcf); test 5: (fasta && !bam && vcf); test 6: (fasta && bam && !vcf); test 7: (fasta && bam && vcf)
Paolo Di Tommaso
@pditommaso
Jun 01 2017 16:32 UTC
is it supposed to be NF code ?
chdem
@chdem
Jun 01 2017 16:35 UTC
if (!params.fasta && !params.bam && params.vcf) {
  process final_multiqc {
    input:
    file('') from channel_process_vcf_1.collect()
    file('') from channel_process_vcf_2.collect()
    file('') from channel_process_vcf_3.collect()

    output:
    file "*multiqc_report.html"
    file "*multiqc_data"

    script:
    """
    multiqc_command
    """
  }
}
if (!params.fasta && params.bam && !params.vcf) {
  process final_multiqc {
    input:
    file('') from channel_process_bam_1.collect()
    file('') from channel_process_bam_2.collect()
    file('') from channel_process_bam_3.collect()

    output:
    file "*multiqc_report.html"
    file "*multiqc_data"

    script:
    """
    multiqc_command
    """
  }
}
etc....
Paolo Di Tommaso
@pditommaso
Jun 01 2017 16:36 UTC
ummm too bad
chdem
@chdem
Jun 01 2017 16:36 UTC
this is exactly the same process, only inputs change
:(
ok, so no other way....
Paolo Di Tommaso
@pditommaso
Jun 01 2017 16:37 UTC
use just one multiqc step and create a single channel that collects all the other channels
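A rough Python model of this suggestion, not Nextflow code: instead of one final step per parameter combination, every enabled source feeds one merged input and the final step is written once. The helper names (`collect_inputs`, `final_multiqc`) and the report file names are made up for illustration. In a real pipeline the merge would be done with channel operators rather than Python lists.

```python
from itertools import product


def collect_inputs(fasta=None, bam=None, vcf=None):
    """Merge the reports of whichever sources were provided into one list."""
    merged = []
    for reports in (fasta, bam, vcf):
        if reports:  # skip sources the user did not enable
            merged.extend(reports)
    return merged


def final_multiqc(reports):
    """Stand-in for the single final step: it only ever sees one merged input."""
    return "multiqc " + " ".join(sorted(reports))


# Hypothetical per-source outputs, for illustration only.
available = {
    "fasta": ["fasta_qc.txt"],
    "bam": ["bam_qc.txt", "bam_stats.txt"],
    "vcf": ["vcf_qc.txt"],
}

# Walk through every non-empty combination of the three flags.
commands = []
for flags in product([False, True], repeat=3):
    if not any(flags):
        continue  # nothing requested, nothing to report
    chosen = {name: available[name] for name, on in zip(available, flags) if on}
    commands.append(final_multiqc(collect_inputs(**chosen)))

print(len(commands))
```

All seven combinations go through the same `final_multiqc` function, which is the point of funnelling everything into a single channel.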
chdem
@chdem
Jun 01 2017 16:38 UTC
Great ! Thank you @pditommaso !
I'm going to test this !
Paolo Di Tommaso
@pditommaso
Jun 01 2017 16:39 UTC
or even easier
if you have two or more different branches creating the same channel
you can have a single downstream process using it eg
chdem
@chdem
Jun 01 2017 16:40 UTC
that sounds great....
but I don't have the same number of channels in each branch
Paolo Di Tommaso
@pditommaso
Jun 01 2017 16:41 UTC
if( condition ) {
  process foo {
    output: file x into channel_process_bam_1
    :
  }
}
else {
  process bar {
    output: file x into channel_process_bam_1
    :
  }
}

process multiqc {
  input: file x from channel_process_bam_1.collect()
  :

}
chdem
@chdem
Jun 01 2017 16:42 UTC
understood
what if :
Paolo Di Tommaso
@pditommaso
Jun 01 2017 16:42 UTC
or even (depending on whether it's possible in your case)
chdem
@chdem
Jun 01 2017 16:43 UTC
if( condition ) {
  process foo {
    output: file x into channel_process_bam_1
    :
  }
  process foo2 {
    output: file x into channel_process_bam_2
    :
  }
}
else {
  process bar {
    output: file x into channel_process_bam_1
    :
  }
}
Paolo Di Tommaso
@pditommaso
Jun 01 2017 16:43 UTC
process foo {
  output: file x into channel_process_bam_1

  script:
  if( condition )
    '''
    to_this
    '''
  else
    '''
    to_that
    '''
}

process multiqc {
  input: file x from channel_process_bam_1.collect()
  :

}
chdem
@chdem
Jun 01 2017 16:44 UTC
ok, this is the example that you gave in the Google group
I now understand your point well
I'm going to do some tests
Paolo Di Tommaso
@pditommaso
Jun 01 2017 16:45 UTC
regarding your last example, then have the last process in each branch produce the same output channel, eg
if( condition ) {
  process foo {
    output: file x into channel_process_bam_1
    :
  }
  process foo2 {
    output: file x into channel_process_bam_2
    :
  }
}
else {
  process bar {
    output: file x into channel_process_bam_2
    :
  }
}
ie channel_process_bam_2 in both branches
does it make sense ?
chdem
@chdem
Jun 01 2017 16:46 UTC
yes, absolutely
thank you @pditommaso, you're very helpful!
Paolo Di Tommaso
@pditommaso
Jun 01 2017 16:46 UTC
:+1:
chdem
@chdem
Jun 01 2017 16:46 UTC
good evening !
:smile:
Paolo Di Tommaso
@pditommaso
Jun 01 2017 16:47 UTC
same