These are chat archives for nextflow-io/nextflow

5th
Oct 2018
Riccardo Giannico
@giannicorik_twitter
Oct 05 2018 07:04
@tobsecret thank you, that's a very nice one, but I'm running nextflow on a non-X11 linux server. I'm thinking about creating a vm on my windows exclusively for that ..
Paolo Di Tommaso
@pditommaso
Oct 05 2018 07:24
you may give the Windows Subsystem for Linux a try, though I have no direct experience with it
arontommi
@arontommi
Oct 05 2018 09:44
is it possible to run nextflow within docker and use a docker image ? :D
Paolo Di Tommaso
@pditommaso
Oct 05 2018 10:00
yes, there's an (experimental) option for that: -d
nextflow -d run .. etc
Riccardo Giannico
@giannicorik_twitter
Oct 05 2018 11:36

I'm sorry guys but I'm still missing it..
If I have this:

$ ls inputdir
Sample1_S1_L001_R1_001.fastq.gz
Sample1_S1_L001_R2_001.fastq.gz
Sample1_S1_L002_R1_001.fastq.gz
Sample1_S1_L002_R2_001.fastq.gz
Sample2_S1_L001_R1_001.fastq.gz
Sample2_S1_L001_R2_001.fastq.gz
Sample2_S1_L002_R1_001.fastq.gz
Sample2_S1_L002_R2_001.fastq.gz

And I write this flow as Francesco suggested:

samples = Channel.fromFilePairs("inputdir/Sample*_*_R{1,2}_*.fastq.gz")
samples.println()

I get this:

$ nextflow run test.nf --dir "inputdir"
[Sample2_S1_L001, [/home/giannicor/test_nextflow/inputdir/Sample2_S1_L001_R1_001.fastq.gz, /home/giannicor/test_nextflow/inputdir/Sample2_S1_L001_R2_001.fastq.gz]]
[Sample1_S1_L001, [/home/giannicor/test_nextflow/inputdir/Sample1_S1_L001_R1_001.fastq.gz, /home/giannicor/test_nextflow/inputdir/Sample1_S1_L001_R2_001.fastq.gz]]
[Sample2_S1_L002, [/home/giannicor/test_nextflow/inputdir/Sample2_S1_L002_R1_001.fastq.gz, /home/giannicor/test_nextflow/inputdir/Sample2_S1_L002_R2_001.fastq.gz]]
[Sample1_S1_L002, [/home/giannicor/test_nextflow/inputdir/Sample1_S1_L002_R1_001.fastq.gz, /home/giannicor/test_nextflow/inputdir/Sample1_S1_L002_R2_001.fastq.gz]]

Isn't there a way to obtain, for each sample, a list of all R1 files and a list of all R2 files?

Bioninbo
@Bioninbo
Oct 05 2018 12:10
Hello Riccardo,
Not the prettiest but this should do the job:
Channel.from( [ [ "Sample2_S1_L001", ["/home/giannicor/test_nextflow/inputdir/Sample2_S1_L001_R1_001.fastq.gz", "/home/giannicor/test_nextflow/inputdir/Sample2_S1_L001_R2_001.fastq.gz"]],
["Sample1_S1_L001", ["/home/giannicor/test_nextflow/inputdir/Sample1_S1_L001_R1_001.fastq.gz", "/home/giannicor/test_nextflow/inputdir/Sample1_S1_L001_R2_001.fastq.gz"]],
["Sample2_S1_L002", ["/home/giannicor/test_nextflow/inputdir/Sample2_S1_L002_R1_001.fastq.gz", "/home/giannicor/test_nextflow/inputdir/Sample2_S1_L002_R2_001.fastq.gz"]],
["Sample1_S1_L002", ["/home/giannicor/test_nextflow/inputdir/Sample1_S1_L002_R1_001.fastq.gz", "/home/giannicor/test_nextflow/inputdir/Sample1_S1_L002_R2_001.fastq.gz"]] ] )
 .map{ [ it[0].split('_')[0], it ].flatten() }
 .tap{ R1_channel_tmp }
 .map{ [ it[0,1,3] ].flatten() }
 .groupTuple()
 .view() {"R2: ${it}"}
 .set{ R2_channel }

R1_channel_tmp
 .map{ [ it[0..2] ].flatten() }
 .groupTuple()
 .view() {"R1: ${it}"}
 .set{ R1_channel }
@giannicorik_twitter
Hello everyone,
I was wondering: is there a shortcut for -dump-channels? And is it possible to set this option in the config file? I found the NXF_DEBUG variable, but I am not sure whether it is related and, if so, how.
Paolo Di Tommaso
@pditommaso
Oct 05 2018 12:14
or something like this
Channel
  .fromFilePairs("inputdir/Sample*_*_R{1,2}_*.fastq.gz")
  .map { id, files -> [id.substring(0,7), files[0], files[1]] }
  .groupTuple()
  .println()
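
The reshaping that map and groupTuple perform in the snippet above can be illustrated outside Nextflow. What follows is a plain-Python sketch of the same grouping logic, not Nextflow code: the function name group_reads_by_sample and the shortened relative paths are made up for illustration, and the 7-character prefix mirrors id.substring(0,7), so it assumes sample names exactly as long as "Sample1".

```python
from collections import defaultdict


def group_reads_by_sample(pairs, prefix_len=7):
    """Group (pair_id, [r1, r2]) items by the first prefix_len characters of pair_id.

    Returns {sample: (list_of_R1_files, list_of_R2_files)}.
    Hypothetical helper: it only mimics the map + groupTuple idea.
    """
    grouped = defaultdict(lambda: ([], []))
    for pair_id, (r1, r2) in pairs:
        sample = pair_id[:prefix_len]   # same role as id.substring(0,7)
        grouped[sample][0].append(r1)   # collect all R1 files of the sample
        grouped[sample][1].append(r2)   # collect all R2 files of the sample
    return dict(grouped)


# Pairs in the shape emitted by fromFilePairs (paths shortened for the example).
pairs = [
    ("Sample2_S1_L001", ["inputdir/Sample2_S1_L001_R1_001.fastq.gz",
                         "inputdir/Sample2_S1_L001_R2_001.fastq.gz"]),
    ("Sample1_S1_L001", ["inputdir/Sample1_S1_L001_R1_001.fastq.gz",
                         "inputdir/Sample1_S1_L001_R2_001.fastq.gz"]),
    ("Sample2_S1_L002", ["inputdir/Sample2_S1_L002_R1_001.fastq.gz",
                         "inputdir/Sample2_S1_L002_R2_001.fastq.gz"]),
    ("Sample1_S1_L002", ["inputdir/Sample1_S1_L002_R1_001.fastq.gz",
                         "inputdir/Sample1_S1_L002_R2_001.fastq.gz"]),
]

result = group_reads_by_sample(pairs)
for sample, (r1_files, r2_files) in sorted(result.items()):
    print(sample, r1_files, r2_files)
```

Each sample ends up with one list of R1 files and one list of R2 files. The sketch keeps the input order within each list, which is a property of this Python version only.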
I was wondering: is there a shortcut for -dump-channels? And is it possible to set up this variable in the config file?
no
Francesco Strozzi
@fstrozzi
Oct 05 2018 12:16

Hi @giannicorik_twitter, the pattern of your files is in fact a little more complex than I thought, so my suggestion is to use the more powerful form of Channel.fromFilePairs, which lets you derive the grouping key you want from the file names. If you still want all your R1 and R2 files in the same list, you can do something like this

samples = Channel.fromFilePairs("inputdir/Sample*_*_R{1,2}_*.fastq.gz",size:-1) {file -> file.name.split(/_S\d+/)[0]}

Here size: -1 tells Channel.fromFilePairs to take all the files matching your pattern; by default it only takes 2. The second part is a closure that splits the file name according to your naming scheme and extracts just the sample name you want.

If you want separate lists for R1 and R2 files for each sample, then follow the advice from Paolo
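
To see what the grouping closure does to each file name, here is a small plain-Python sketch of the same idea. It is not Nextflow code: sample_key and group_all_files are hypothetical names, the file list is rebuilt from the names in the question, and the sketch simply collects every file sharing a key, which is how size: -1 is described above.

```python
import re
from collections import defaultdict
from pathlib import PurePosixPath


def sample_key(path):
    """Return the part of the file name before the first '_S<digits>' token."""
    return re.split(r"_S\d+", PurePosixPath(path).name)[0]


def group_all_files(paths):
    """Collect every file under its sample key, with no limit on group size."""
    groups = defaultdict(list)
    for path in sorted(paths):
        groups[sample_key(path)].append(path)
    return dict(groups)


# The eight files from the question: 2 samples x 2 lanes x 2 reads.
files = [
    f"inputdir/Sample{s}_S1_L00{lane}_R{read}_001.fastq.gz"
    for s in (1, 2)
    for lane in (1, 2)
    for read in (1, 2)
]

grouped = group_all_files(files)
for key, members in grouped.items():
    print(key, len(members))
```

With these file names, each sample key collects four files: both lanes, R1 and R2 together.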
Bioninbo
@Bioninbo
Oct 05 2018 12:17
I see. Thanks @pditommaso
Riccardo Giannico
@giannicorik_twitter
Oct 05 2018 12:54
@fstrozzi Perfect! That's exactly what I was searching for :D thank you all! :D
Issa Kehinde Salaam
@Issakenny
Oct 05 2018 14:26
Hi, when I run "nextflow run plink-ql.nf" from the git directory on my Ubuntu VirtualBox VM, the following error is displayed: ERROR ~ .nextflow/history.lock (No such file or directory).
Paolo Di Tommaso
@pditommaso
Oct 05 2018 14:33
The VirtualBox shared file system prevents file locks
run NF in the VM home folder and specify the work directory in the shared file system using the -w command line option
Issa Kehinde Salaam
@Issakenny
Oct 05 2018 14:39
Pls how can I achieve that?
Paolo Di Tommaso
@pditommaso
Oct 05 2018 14:48
cd $HOME
nextflow run plink-ql.nf -w <some other path>
Issa Kehinde Salaam
@Issakenny
Oct 05 2018 14:55
Pls, do I need to specify the path for git?
Git was cloned into /usr/bin; should I change it to the directory "/usr/local/bin" where I installed NF?
Issa Kehinde Salaam
@Issakenny
Oct 05 2018 15:16
@pditommaso I ran the above cmd with the appropriate path to the cloned h3agwas git repository on my laptop, but I get this error: "cannot find nextflow-io/plink-ql -- make sure a GitHub repository at this address https://github.com/nextflow-io/plink-ql." Pls, do I still need to clone the NF git repository as well?
Félix C. Morency
@fmorency
Oct 05 2018 15:21
Is there a way to combine set and each? ie. set sid, each file(roi) from data
Alexander Peltzer
@apeltzer
Oct 05 2018 15:44
Don't see my mistake right now and already checked the configs...
N E X T F L O W  ~  version 0.32.0
ERROR ~ Project config file is malformed -- Cause: No signature of method: nextflow.config.ConfigParser$_parse_closure5.includeconfig() is applicable for argument types: (String) values: [conf/binac.config]
Paolo Di Tommaso
@pditommaso
Oct 05 2018 16:06
mm, smells like this nextflow-io/nextflow#888
@fmorency nope, you may need a combine or transpose operator to manage that
Alexander Peltzer
@apeltzer
Oct 05 2018 16:10
Okay found some more clues:
 nextflow run . -profile docker,test
N E X T F L O W  ~  version 0.32.0
ERROR ~ Project config file is malformed -- Cause: No signature of method: nextflow.config.ConfigParser$_parse_closure5.includeconfig() is applicable for argument types: (String) values: [conf/binac.config]

 -- Check '.nextflow.log' file for details
alex@aragorn ~/I/n/eager> nextflow run main.nf -profile docker,test
N E X T F L O W  ~  version 0.32.0
Launching `main.nf` [extravagant_magritte] - revision: e25cad3301
The latter one runs through
Luca Cozzuto
@lucacozzuto
Oct 05 2018 16:11
Hi all
is there a smarter way to do this?
if (annotation_raw_file =~ /.gz$/) {
    process unzip_annotation {
        tag { annotation_raw_file }

        input:
        file(annotation_raw_file)

        output:
        file("*") into annotation_file

        script:
        """
        zcat ${annotation_raw_file} > `basename ${annotation_raw_file} .gz`
        """
    }
} else {
    annotation_file = annotation_raw_file
}
I got a warning
Alexander Peltzer
@apeltzer
Oct 05 2018 16:12
zcat doesn't care whether it's unzipped IIRC
Luca Cozzuto
@lucacozzuto
Oct 05 2018 16:12
WARN: Output channel `annotation_file` overrides another variable with the same name declared in the script context -- Rename it to avoid possible conflicts
Alexander Peltzer
@apeltzer
Oct 05 2018 16:12
So you could simply always run zcat... (and test this behaviour first maybe)
Luca Cozzuto
@lucacozzuto
Oct 05 2018 16:13
@apeltzer ? I don't want to cat a file without reason
btw thanks for answering :)
Alexander Peltzer
@apeltzer
Oct 05 2018 16:19
Maybe a when statement to define when to run which process, defining two different channels and using the mix operator to merge them into one?
Since you already know that only one of the two mixed channels is ever used, you'll be fine and the warning goes away
Luca Cozzuto
@lucacozzuto
Oct 05 2018 16:21
yes! I like it
many thanks
wait... if I mix them I get two values when mixing zipped and unzipped files..
Alexander Peltzer
@apeltzer
Oct 05 2018 16:23
Hm, couldn't you base that on your regular expression?
That way you rule out both channels being filled, so that exactly one of them ever holds a value?
Luca Cozzuto
@lucacozzuto
Oct 05 2018 16:26
I was thinking of this
annotation_file = (unzipped_annotation_file ? unzipped_annotation_file: annotation_raw_file)
but it does not work
Luca Cozzuto
@lucacozzuto
Oct 05 2018 16:34
I managed with this... (apparently)
annotation_file = (unzipped_annotation_file.ifEmpty() ? annotation_raw_file: unzipped_annotation_file)
no
Luca Cozzuto
@lucacozzuto
Oct 05 2018 16:58
this finally worked
annotation_file = (annotation_raw_file =~ /.gz$/ ? unzipped_annotation_file: annotation_raw_file)
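
The selection rule in that one-liner can be checked in isolation. Below is a plain-Python sketch of it, not Nextflow code: needs_unzip and pick_annotation are hypothetical names, and plain strings stand in for the files and channels. It also shows one caveat of the pattern: the unescaped dot in .gz$ matches any character, so a name such as genes.tgz would pass the original test; escaping the dot is an assumption about the intent.

```python
import re

# Escaped dot: only names that literally end in ".gz" count as gzipped.
GZ_SUFFIX = re.compile(r"\.gz$")


def needs_unzip(filename):
    """True when the file name ends in a literal '.gz'."""
    return bool(GZ_SUFFIX.search(filename))


def pick_annotation(raw, unzipped):
    """Mirror the ternary: use the unzipped result only for gzipped input."""
    return unzipped if needs_unzip(raw) else raw


print(pick_annotation("genes.gtf.gz", "genes.gtf"))  # gzipped input
print(pick_annotation("genes.gtf", None))            # plain input

# The unescaped pattern from the chat also accepts names like 'genes.tgz'.
print(bool(re.search(r".gz$", "genes.tgz")), needs_unzip("genes.tgz"))
```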
Alexander Peltzer
@apeltzer
Oct 05 2018 17:01
Nice :-)
Paolo Di Tommaso
@pditommaso
Oct 05 2018 17:15
@apeltzer there's a typo, it's includeConfig not includeconfig
Alexander Peltzer
@apeltzer
Oct 05 2018 17:46
-.-
Weird I didn't see it at all