These are chat archives for nextflow-io/nextflow

21st
Jul 2017
Paolo Di Tommaso
@pditommaso
Jul 21 2017 07:31
@sergpolly I may need to review the reasoning regarding point 1; could you open a GitHub issue for that? As for point 2, is it related to the wrong output being a consequence of 1?
Shellfishgene
@Shellfishgene
Jul 21 2017 07:40
Still trying to get fastq files for multiple samples with multiple pairs per sample into a process, all pairs for one sample at a time. These are my example files:
s1.l1.R1.fa  s1.l1.R2.fa  s1.l2.R1.fa  s1.l2.R2.fa  s2.l1.R1.fa  s2.l1.R2.fa  s2.l2.R1.fa  s2.l2.R2.fa

And I'm trying this:

params.reads = "./*.R{1,2}.fa"
Channel
     .fromFilePairs(params.reads, flat: true)
     .map { prefix, file1, file2 -> tuple(getLibraryId(prefix), file1, file2) }
     .groupTuple()
     .subscribe onNext: { println it }, onComplete: { println 'Done' }

def getLibraryId( file ) {
  file.substring(0,2)
}

And from that I get this:

[s2, [s2.l2.R1.fa, s2.l1.R1.fa], [s2.l2.R2.fa, s2.l1.R2.fa]]
[s1, [s1.l2.R1.fa, s1.l1.R1.fa], [s1.l2.R2.fa, s1.l1.R2.fa]]

This is close, but the R1 and R2 files are grouped separately instead of being kept in pairs. Also, I'm not sure how to get from this to a command line for a process that takes all files for s1, but in pairs.

Paolo Di Tommaso
@pditommaso
Jul 21 2017 07:44
What is the target structure you would like to have?
Shellfishgene
@Shellfishgene
Jul 21 2017 07:45
I'm not sure about the structure; the process command would be command s1.l1.R1.fa s1.l1.R2.fa s1.l2.R1.fa s1.l2.R2.fa
Paolo Di Tommaso
@pditommaso
Jul 21 2017 07:47
What if you group per sampleId? The lexicographic ordering of the pair files would guarantee that the pairs match.
thinking ?
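Paolo's suggestion could be sketched like this (untested; it relies on groupTuple's sort option, which orders the collected files lexicographically within each group so the n-th R1 lines up with the n-th R2):

```nextflow
params.reads = "./*.R{1,2}.fa"

Channel
    .fromFilePairs(params.reads, flat: true)
    .map { prefix, r1, r2 -> tuple(getLibraryId(prefix), r1, r2) }
    // sort: true orders the grouped files lexicographically,
    // so positions in the R1 and R2 lists stay in sync
    .groupTuple(sort: true)
    .subscribe { println it }

def getLibraryId( file ) {
    file.substring(0,2)
}
```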
Shellfishgene
@Shellfishgene
Jul 21 2017 08:03
That would work I guess. I'd need an example for that, too, sorry...
mahdi-b
@mahdi-b
Jul 21 2017 08:10

@Shellfishgene If you only want to maintain order, i.e. [s2, [[s2.l2.R1.fa, s2.l2.R2.fa], [s2.l1.R1.fa, s2.l1.R2.fa]]], then perhaps you can group your files.
just replace your map with this

.map { prefix, file1, file2 -> tuple(getLibraryId(prefix), [file1, file2]) }

Paolo Di Tommaso
@pditommaso
Jul 21 2017 08:17
in that case it's easier, something like that
Shellfishgene
@Shellfishgene
Jul 21 2017 08:17
@mahdi-b yes, that works, thanks.
Now I just have to flatten that again for the process. This does not work:
input:
   set name, file(file_list) from files_ch

   """
   echo $file_list
   """
Paolo Di Tommaso
@pditommaso
Jul 21 2017 08:19
Channel.fromFilePairs('/some/path/*.l{1,2}.R{1,2}.fa', size: 4)
that's all
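Spelled out a little (a sketch; the path is a placeholder): both the lane and the read number go into the glob, and size: 4 tells fromFilePairs to collect four files per matching sample prefix.

```nextflow
Channel
    .fromFilePairs('/some/path/*.l{1,2}.R{1,2}.fa', size: 4)
    // emits one item per sample, e.g.
    // [s1, [s1.l1.R1.fa, s1.l1.R2.fa, s1.l2.R1.fa, s1.l2.R2.fa]]
    .subscribe { sample, files -> println "$sample -> $files" }
```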
mahdi-b
@mahdi-b
Jul 21 2017 08:23
@pditommaso ha! This is exactly why I love Nextflow -- so elegant.
Instead of writing documentation, you ship an @pditommaso with every git pull of NF so that he can sit next to the user while they write their script! :)
Shellfishgene
@Shellfishgene
Jul 21 2017 08:24
@pditommaso that looks simple indeed. How come nf does not print anything if I append .subscribe { println it } to that?
mahdi-b
@mahdi-b
Jul 21 2017 08:24
print "$it" ?
Shellfishgene
@Shellfishgene
Jul 21 2017 08:25
no...
Paolo Di Tommaso
@pditommaso
Jul 21 2017 08:25
if it does not print, there's no output to print ;)
double check your glob pattern
Shellfishgene
@Shellfishgene
Jul 21 2017 08:26
Yes, had a typo. However I think for me that solution won't work, some samples are split across more lanes than others, so the size: 4 won't be correct for all.
mahdi-b
@mahdi-b
Jul 21 2017 08:30
@Shellfishgene would size: -1 work?
Shellfishgene
@Shellfishgene
Jul 21 2017 08:33
@mahdi-b Yes that works, but I have to change the glob to *.l{1,2,3}.R{1,2}.fa, so I still have to put the maximum number of lanes in. Not a big problem though.
So the first solution that gets me [s1, [[/home/tbayer/temp/s1.l2.R1.fa, /home/tbayer/temp/s1.l2.R2.fa], [/home/tbayer/temp/s1.l1.R1.fa, /home/tbayer/temp/s1.l1.R2.fa]]] is better in that case. I'm just not sure how to "flatten" that for the process.
mahdi-b
@mahdi-b
Jul 21 2017 08:47
Ah, then perhaps flatten using another map after your groupTuple?
params.reads = "./*.R{1,2}.fa"
Channel
     .fromFilePairs(params.reads, flat: true)
     .map { prefix, file1, file2 -> tuple(getLibraryId(prefix), [file1, file2]) }
     .groupTuple()
     .map { t -> [t[0], t[1].flatten()] }
     .subscribe onNext: { println it }, onComplete: { println 'Done' }

def getLibraryId( file ) {
  file.substring(0,2)
}
not as elegant as @pditommaso's solution
Shellfishgene
@Shellfishgene
Jul 21 2017 08:48
Would that keep the order correct?
mahdi-b
@mahdi-b
Jul 21 2017 08:48
yes, it should!
Shellfishgene
@Shellfishgene
Jul 21 2017 08:52
It does indeed. Thanks for the help!
mahdi-b
@mahdi-b
Jul 21 2017 08:57
Glad it worked.
Shellfishgene
@Shellfishgene
Jul 21 2017 09:10
I have a process that indexes the fastq files, but the index file is not explicitly needed by the next process. Can I just leave out the output block for the indexing process?
mahdi-b
@mahdi-b
Jul 21 2017 09:27
You need an output block in the process. So I guess you can leave it out.
Shellfishgene
@Shellfishgene
Jul 21 2017 09:38
I just saw I can have an output block with a file but without directing it into a channel, so I guess that works, too.
@pditommaso I'm just wondering why the -resume option is not on by default?
mahdi-b
@mahdi-b
Jul 21 2017 09:42
Sorry, I meant you don't need one. You can leave it out if you don't need it downstream.
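Both variants could look roughly like this (a sketch; process names, channels, and the indexer command are made up):

```nextflow
// Variant 1: no output block at all -- the index file stays in the work dir
process indexReads {
    input:
    file reads from reads_ch

    """
    some_indexer $reads
    """
}

// Variant 2: declare the file as output without binding it to a channel
process indexReadsDeclared {
    input:
    file reads from reads_ch2

    output:
    file '*.idx'

    """
    some_indexer $reads
    """
}
```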
Tobias Neumann
@t-neumann
Jul 21 2017 09:55
@pditommaso coming back to an issue we discussed yesterday: I have two processes running in parallel and once they are done, I will have a final process using output files from both processes. How do I best approach this?
Paolo Di Tommaso
@pditommaso
Jul 21 2017 09:57
mix and eventually collect if you need to process all together
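As a sketch (channel and process names are made up): mix merges the items of the two output channels into one, and collect gathers them all into a single emission for a final process that needs everything at once.

```nextflow
process_a_out
    .mix(process_b_out)     // merge items from both upstream processes
    .collect()              // gather everything into a single list
    .set { all_results_ch }

process finalStep {
    input:
    file results from all_results_ch

    """
    do_something $results
    """
}
```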
Tobias Neumann
@t-neumann
Jul 21 2017 09:59
I see. I don't need them all together but rather one file each per sample - how do I guarantee I will have corresponding files in the input channel? mix does not do that or does it?
Paolo Di Tommaso
@pditommaso
Jul 21 2017 10:00
ah, you need a matching key, e.g. the sampleId, that will allow resyncing your data
Tobias Neumann
@t-neumann
Jul 21 2017 10:02
ok so I'll just have a set output from each process and then use the .phase method instead of .mix right?
Paolo Di Tommaso
@pditommaso
Jul 21 2017 10:03
yes
tho, you will need an extra map I think
Tobias Neumann
@t-neumann
Jul 21 2017 10:06
yes I think I can get this going based on the example given in the docs
Paolo Di Tommaso
@pditommaso
Jul 21 2017 10:10
also the nextflow console is very handy to make tests
I usually isolate the problem in a small snippet and test it in the console
Tobias Neumann
@t-neumann
Jul 21 2017 10:12
so you can basically run your Nextflow code in there?
Paolo Di Tommaso
@pditommaso
Jul 21 2017 10:12
only to quickly test small snippets
Shellfishgene
@Shellfishgene
Jul 21 2017 12:25
Just had an error that I could not find, and it turns out that file("${sample.preqc}") is different from file( "${sample.preqc}" ). Is that a groovy thing?
Paolo Di Tommaso
@pditommaso
Jul 21 2017 12:25
I don't see the diff
Shellfishgene
@Shellfishgene
Jul 21 2017 12:25
there are spaces in the second one
Paolo Di Tommaso
@pditommaso
Jul 21 2017 12:26
makes no sense
Shellfishgene
@Shellfishgene
Jul 21 2017 12:26
That's what I thought. I'll test again.
Paolo Di Tommaso
@pditommaso
Jul 21 2017 12:26
provide a test case
Shellfishgene
@Shellfishgene
Jul 21 2017 12:31
Never mind, it was file("${sample.preqc}") vs file "${sample.preqc}"
Maybe I should stop wasting everyone's time and do something else for the rest of the day...

Case in point ;)

[ba/bc7fc4] Submitted process > preprocess (4)
Exception in thread "Task submitter" groovy.lang.MissingPropertyException: No such property: bwt for class: java.lang.String
Possible solutions: bytes

I assume this is from a problem with my script and not nf?

Maxime Garcia
@MaxUlysse
Jul 21 2017 12:35
did you put a .bwt somewhere ?
Shellfishgene
@Shellfishgene
Jul 21 2017 12:36
It is, at least I'm getting better at finding the bugs
Yes, set the {} wrongly
Tobias Neumann
@t-neumann
Jul 21 2017 12:41
Is there a way to use the same channel for two different processes? if I just do it plain stupid and use it as input for two channels, nextflow complains
ERROR ~ Channel `masterChannel` has been used twice as an input by process `swembl` and process `sicer`
ok just saw in the FAQs - sorry
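For reference, the FAQ answer boils down to duplicating the channel with the into operator (a sketch; the channel names come from the error message above, the process bodies are made up, and the exact into syntax depends on the Nextflow version):

```nextflow
// fan the source channel out into two independent copies
masterChannel.into { swemblChannel; sicerChannel }

process swembl {
    input:
    set val(name), file(sample) from swemblChannel

    """
    swembl_cmd $sample
    """
}

process sicer {
    input:
    set val(name), file(sample) from sicerChannel

    """
    sicer_cmd $sample
    """
}
```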
Tobias Neumann
@t-neumann
Jul 21 2017 13:04

@pditommaso finally I'm at the step of phasing my channels

input:
    set val(name), file(reads) from sicerChannel.phase(swemblChannel)

This however gives me:

[[id1, SICER.bed], [id1, SWEMBL.bed]]

where I actually need:

[id1, [SICER.bed, SWEMBL.bed]]

I know you mentioned using the .map method after phase, but I'm still too stupid on Groovy to wrap my head around it how to restructure my data with .map

Paolo Di Tommaso
@pditommaso
Jul 21 2017 13:13
input:
    set val(name), file(reads) from sicerChannel.phase(swemblChannel).map { left, right -> tuple(left[0], [left[1], right[1]]) }
Félix C. Morency
@fmorency
Jul 21 2017 14:51
hey @pditommaso will you be releasing 0.25.3 today?
:D
Paolo Di Tommaso
@pditommaso
Jul 21 2017 15:18
I'm out of office, I'm sorry I can't
Félix C. Morency
@fmorency
Jul 21 2017 15:23
No problem! :D
Sergey Venev
@sergpolly
Jul 21 2017 15:23
@pditommaso I opened an issue on GitHub about the (1) question (should a process fail if it fails to copy results to storeDir)
the question (2) about the downstream process should be reformulated now: I'd understand why it proceeded the way it did if I could understand how a process emits results into its output channel
In my case the output channel contained only the results that managed to get copied from work to storeDir,
even though work had more output files. That's why the downstream process proceeded with incomplete input and didn't complain
Sergey Venev
@sergpolly
Jul 21 2017 15:29
Does it sound about right that the process outputs only the results that were copied to storeDir?
Anthony Underwood
@aunderwo
Jul 21 2017 16:30
@pditommaso yes thanks the NXF_HOME worked!!
With Singularity and Nextflow I note that "Unlike Docker, Nextflow does not automatically mount host paths in the container when using Singularity. It expects they are configured and mounted system-wide by the Singularity runtime. If your Singularity installation allows user defined bind points, read the Singularity configuration section to learn how to enable Nextflow auto mounts." What does this mean if I'm wanting to mount my home directory?
Félix C. Morency
@fmorency
Jul 21 2017 18:23
iirc, the home directory is automatically mounted by singularity
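For other host paths, the documentation passage quoted above maps to a config switch: if the Singularity installation allows user defined bind points, something like this in nextflow.config turns on Nextflow's auto mounts (a sketch based on the singularity.autoMounts option):

```nextflow
// nextflow.config
singularity {
    enabled    = true
    autoMounts = true   // ask Nextflow to bind host paths into the container
}
```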