These are chat archives for nextflow-io/nextflow

19th
Jul 2017
Tobias Neumann
@t-neumann
Jul 19 2017 07:39

@pditommaso Hi again - can I bother you once more regarding the Channel consisting of SE and PE files? To remind you, THIS

pairedEndRegex = params.readDir + "/*_{1,2}.fastq.gz"
SERegex = params.readDir + "/*[!12].fastq.gz"

reads_ch  = pairedEndRegex ? Channel.fromFilePairs(pairedEndRegex) : Channel.fromFilePairs(SERegex, size: 1){ file -> file.baseName.replaceAll(/.fastq/,"") }

is what you recommended me to do, which however only creates a PE channel:

Jul-18 17:29:33.132 [main] DEBUG nextflow.Channel - files for syntax: glob; folder: ../raw/results/; pattern: *_{1,2}.fastq.gz; options: null
Shellfishgene
@Shellfishgene
Jul 19 2017 07:52
Question: Why does the make_matrix process not run here? It seems nf ignores it.
#!/usr/bin/env nextflow

params.reads = "$baseDir/raw_data_name_mod/*1_R{1,2}.fastq.gz"
params.index = "$baseDir/index/pipefish"

Channel
    .fromFilePairs( params.reads )
    .ifEmpty { error "Cannot find any reads matching: ${params.reads}" }
    .set { fastq_ch } 

Channel
    .value( params.index )
    .set { index_prefix } 

process rsem {
   publishDir "rsem_out_nf", mode: 'symlink'

   input:
   val(index) from index_prefix
   set pair_id, file(reads) from fastq_ch

   output:
   file( "${pair_id}.genes.results" ) into rsem_results

   """
   rsem-calculate-expression -p ${task.cpus} --paired-end --bowtie2 --estimate-rspd --append-names ${reads} ${index} ${pair_id}
   """
}

process make_matrix {
   publishDir "rsem_out_nf", mode: 'symlink'

   input:
   file 'rsem'  from rsem_results.collect()

   output:
   file 'tpm_matrix.txt'

   """
   rsem-generate-data-matrix-tpm ${rsem} > tpm_matrix.txt
   """
}
Paolo Di Tommaso
@pditommaso
Jul 19 2017 08:04
if it doesn't run, it means the process did not receive the required data; check that the path "$baseDir/raw_data_name_mod/*1_R{1,2}.fastq.gz" is correct
Shellfishgene
@Shellfishgene
Jul 19 2017 08:05
But the rsem process runs fine, just the second does not
Paolo Di Tommaso
@pditommaso
Jul 19 2017 08:06
make_matrix ?
Shellfishgene
@Shellfishgene
Jul 19 2017 08:06
yes
Paolo Di Tommaso
@pditommaso
Jul 19 2017 08:08
it should work, do you have the .nextflow.log file ?
Shellfishgene
@Shellfishgene
Jul 19 2017 08:09
Yes:
Jul-19 10:09:21.356 [Actor Thread 7] INFO  nextflow.processor.TaskProcessor - [e0/bf868e] Cached process > rsem (4)
Jul-19 10:09:21.383 [Actor Thread 1] DEBUG nextflow.processor.TaskProcessor - <rsem> After stop
Jul-19 10:09:21.383 [Actor Thread 3] DEBUG nextflow.processor.TaskProcessor - <rsem> Sending poison pills and terminating process
Jul-19 10:09:21.384 [Actor Thread 3] DEBUG nextflow.Session - <<< barrier arrive (process: rsem)
Jul-19 10:09:21.384 [main] DEBUG nextflow.Session - Session await > all process finished
Jul-19 10:09:21.446 [Thread-2] DEBUG n.processor.TaskPollingMonitor - <<< barrier arrives (monitor: local)
Jul-19 10:09:21.446 [main] DEBUG nextflow.Session - Session await > all barriers passed
Jul-19 10:09:21.495 [main] DEBUG nextflow.script.ScriptRunner - > Execution complete -- Goodbye
Paolo Di Tommaso
@pditommaso
Jul 19 2017 08:11
please upload all by using https://pastebin.com or something similar
@t-neumann mine was more a pseudocode example, you will need a proper condition to choose between SE and PE
Shellfishgene
@Shellfishgene
Jul 19 2017 08:12
Paolo Di Tommaso
@pditommaso
Jul 19 2017 08:14
it may not be the problem, but you are using quite an old version, could you update to the latest ?
Tobias Neumann
@t-neumann
Jul 19 2017 08:16
@pditommaso you mean like a parameter? Because, to remind you, I want to have a channel consisting of both the SE and PE files in that directory.
Paolo Di Tommaso
@pditommaso
Jul 19 2017 08:17
ahhhhh
then, you want to use both PE and SE during the same execution
right?
Tobias Neumann
@t-neumann
Jul 19 2017 08:18
exactly - and depending on whether the fromFilePairs channel returns two files or one file, I run PE or SE commands respectively
Paolo Di Tommaso
@pditommaso
Jul 19 2017 08:19
fine, that's easy eg
se_ch = Channel.fromFilePairs('/some/path', size: 1)
pe_ch = Channel.fromFilePairs('/other/path')
all_ch = se_ch.mix(pe_ch)
all_ch will contain both
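Putting this together for the SE/PE case, a sketch reusing the glob patterns from earlier in the chat (the size: 1 option to fromFilePairs is the assumption that makes single-end files come through as one-element groups):

```groovy
// Mix single-end and paired-end reads into one channel.
// SE files match names not ending in _1/_2; PE files arrive as pairs.
se_ch  = Channel.fromFilePairs(params.readDir + '/*[!12].fastq.gz', size: 1)
pe_ch  = Channel.fromFilePairs(params.readDir + '/*_{1,2}.fastq.gz')
all_ch = se_ch.mix(pe_ch)

// Downstream each item is an (id, files) tuple; files.size() == 1
// means SE, 2 means PE, so a process script can branch accordingly.
```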
Tobias Neumann
@t-neumann
Jul 19 2017 08:20
cool - I'll try it right away
Shellfishgene
@Shellfishgene
Jul 19 2017 08:21
Ok, with the new version the second process runs!
Just not as expected ;). The rsem_results channel outputs just the pair id, not the full file name it seems. How do I need to change file( "${pair_id}.genes.results" ) into rsem_results to output the full filename with the .genes.results part?
Tobias Neumann
@t-neumann
Jul 19 2017 08:22
wohoo it works - thanks a bunch @pditommaso
Paolo Di Tommaso
@pditommaso
Jul 19 2017 08:22
:v:
@Shellfishgene not possible, can you share the complete script with pastebin as well ?
Shellfishgene
@Shellfishgene
Jul 19 2017 08:23
Wait, what does file 'rsem' from rsem_results.collect() actually do? For me it returns rsem1, rsem2 instead of the files that should be in the channel...
Full script is the one I pasted here above.
Paolo Di Tommaso
@pditommaso
Jul 19 2017 08:24
ah yes
if you specify file 'rsem' ..
all files will be staged with the rsem name
since you have many, not so useful ..
replace with file '*' from rsem_results.collect()
Shellfishgene
@Shellfishgene
Jul 19 2017 08:26
Ok, what do I put instead of ${rsem} in the command then?
Paolo Di Tommaso
@pditommaso
Jul 19 2017 08:26
that's another option
input: 
file rsem from rsem_results.collect()
then you can use $rsem variable in the script for all files
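Applied to the make_matrix process from above, that input form would look roughly like this (a DSL1-style sketch; the tool name and output file are taken from the pasted script):

```groovy
process make_matrix {
   publishDir "rsem_out_nf", mode: 'symlink'

   input:
   file rsem from rsem_results.collect()   // stages all collected files

   output:
   file 'tpm_matrix.txt'

   """
   rsem-generate-data-matrix-tpm ${rsem} > tpm_matrix.txt
   """
}
```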
Shellfishgene
@Shellfishgene
Jul 19 2017 08:28
Great, that works. Just out of interest, if I do file '*' from rsem_results.collect(), what do I put in the command as a variable for the files?
Paolo Di Tommaso
@pditommaso
Jul 19 2017 08:29
that is useful if the file names the process is receiving are known
you should use the expected rsem file names
Shellfishgene
@Shellfishgene
Jul 19 2017 08:29
So I would just hard-code that in the command, like *.txt in that case?
Paolo Di Tommaso
@pditommaso
Jul 19 2017 08:29
yep
Shellfishgene
@Shellfishgene
Jul 19 2017 08:29
Ok, good to know. Thanks!
Paolo Di Tommaso
@pditommaso
Jul 19 2017 08:30
:ok_hand:
Evan Floden
@evanfloden
Jul 19 2017 11:50

Minor complaint. When I run cat .nextflow/history, the result does not have the apostrophes from the original command and therefore I cannot just copy-paste.

Here the lack of apostrophes means the shell expands the glob.

2017-07-19 13:47:01    -    gigantic_heisenberg    -    ea91064aa7974d63ed94dc25e8a314c5    1000d667-ac2a-468d-a861-f8842224a84e    nextflow run kallisto.nf --reads=/data/*_read{1,2}.fastq.gz
Phil Ewels
@ewels
Jul 19 2017 11:56
+1
Paolo Di Tommaso
@pditommaso
Jul 19 2017 11:58
very lazy users!! :)
open an issue as always ;)
Evan Floden
@evanfloden
Jul 19 2017 11:58
I was hoping for some trick…
Will do
Paolo Di Tommaso
@pditommaso
Jul 19 2017 12:09
unfortunately, it's quite tricky
it would require parsing the command line, detecting globs and escaping them
Evan Floden
@evanfloden
Jul 19 2017 12:09
I imagine. The reason I ask though is that sometimes the command line is huge with long paths etc.
Paolo Di Tommaso
@pditommaso
Jul 19 2017 12:11
yes, makes sense
Félix C. Morency
@fmorency
Jul 19 2017 13:54
foo = params.foo?.tokenize(',')
What's the ? for?
Paolo Di Tommaso
@pditommaso
Jul 19 2017 13:55
that's equivalent to
if( params.foo != null ) {
  foo = params.foo.tokenize(',')
}
else {
  foo = null
}
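In plain Groovy terms, the safe-navigation operator short-circuits to null when its receiver is null. A toy illustration (hypothetical map, not from the chat):

```groovy
def opts = [foo: 'a,b,c']
assert opts.foo?.tokenize(',') == ['a', 'b', 'c']

// missing key: opts.bar is null, so ?. yields null instead of throwing
assert opts.bar?.tokenize(',') == null
```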
Félix C. Morency
@fmorency
Jul 19 2017 13:56
oh cool. will it conflict if I have params { foo = ['foo', 'bar'] } in my config?
Paolo Di Tommaso
@pditommaso
Jul 19 2017 13:57
not at all
it would be true in that case
Félix C. Morency
@fmorency
Jul 19 2017 13:57
awesome thanks
Félix C. Morency
@fmorency
Jul 19 2017 14:03
mmm I get No signature of method: java.util.ArrayList.tokenize() is applicable for argument types: (java.lang.String) values: [,] if I do params.algo = params.algo?.tokenize(',') and params.algo = ['foo', 'bar']
Paolo Di Tommaso
@pditommaso
Jul 19 2017 14:03
ooohh, sorry no
I was thinking of the ?. operator
yes, you should convert the list to a string
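One hedged way to accept either form, so a comma-separated CLI string and a config list both work (a hypothetical sketch, not from the chat):

```groovy
// params.algo may be a comma-separated string (CLI) or a list (config)
def algos = params.algo instanceof List
        ? params.algo
        : params.algo?.tokenize(',')
```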
Félix C. Morency
@fmorency
Jul 19 2017 14:11
I see
Félix C. Morency
@fmorency
Jul 19 2017 14:24
it works thanks
Paolo Di Tommaso
@pditommaso
Jul 19 2017 14:43
:+1:
Félix C. Morency
@fmorency
Jul 19 2017 15:23
btw the each feature is very cool
Paolo Di Tommaso
@pditommaso
Jul 19 2017 15:23
:D
Ashley S Doane
@DoaneAS
Jul 19 2017 19:22
Hi Paolo @pditommaso , do you have by any chance a Singularity file for your RNAseq-nf example pipeline that uses salmon?
I was looking for the dockerfile or location of docker image, from which it is easy to create a singularity image, but I couldn't find these
Paolo Di Tommaso
@pditommaso
Jul 19 2017 19:27
Usually I do convert Docker images to Singularity ones
with the latest version you can just pull them from Docker Hub
Ashley S Doane
@DoaneAS
Jul 19 2017 19:32
yes, exactly. But what is the address for the docker image to pull? I'm guessing from the nextflow.config that it is pulling container from quay.io/biocontainers, like 'quay.io/biocontainers/multiqc:1.0--py35_4', but I'm not familiar with this system
is that a docker image?
Paolo Di Tommaso
@pditommaso
Jul 19 2017 19:34
yes exactly
then you can pull with singularity with
singularity pull docker://quay.io/biocontainers/multiqc:1.0--py35_4
Ashley S Doane
@DoaneAS
Jul 19 2017 19:35
ok cool, thanks!!
Paolo Di Tommaso
@pditommaso
Jul 19 2017 19:36
tho I think you will need to specify the image size, the following should work
singularity pull --size 600 docker://quay.io/biocontainers/multiqc:1.0
or more if it's not enough
otherwise a Dockerfile to create a single Docker image would be the following
FROM continuumio/miniconda
MAINTAINER Paolo Di Tommaso <paolo.ditommaso@gmail.com>

RUN conda config --add channels defaults \
 && conda config --add channels conda-forge \
 && conda config --add channels bioconda \
 && conda install -y salmon=0.8.2 fastqc=0.11.5 multiqc=1.0
Ashley S Doane
@DoaneAS
Jul 19 2017 19:44
thanks.
I tried
singularity pull --size 600 docker://quay.io/biocontainers/multiqc:1.0:latest
and got the following: ERROR Error getting token for repository biocontainers/multiqc:1.0, exiting.
I will try converting that docker file, thanks
Paolo Di Tommaso
@pditommaso
Jul 19 2017 19:45
I think multiqc requires 1200 (MB)
Ashley S Doane
@DoaneAS
Jul 19 2017 19:46
ok - still tocken error.
Paolo Di Tommaso
@pditommaso
Jul 19 2017 19:48
tocken ?
Ashley S Doane
@DoaneAS
Jul 19 2017 19:48
sorry... error is: ERROR Error getting token for repository biocontainers/multiqc:1.0, exiting.
Paolo Di Tommaso
@pditommaso
Jul 19 2017 19:51
weird
Ashley S Doane
@DoaneAS
Jul 19 2017 19:54
for sure I have no idea what it means 🤔😬
Paolo Di Tommaso
@pditommaso
Jul 19 2017 19:55
I've never seen :/
Phil Ewels
@ewels
Jul 19 2017 20:04
v1.1 was released yesterday btw ;)
Paolo Di Tommaso
@pditommaso
Jul 19 2017 20:07
oh this worked !
$ singularity pull --size 1400 docker://quay.io/biocontainers/multiqc:1.1--py27_0
Initializing Singularity image subsystem
Opening image file: multiqc-1.1--py27_0.img
Creating 1400MiB image
Binding image to loop
Creating file system within image
Image is done: multiqc-1.1--py27_0.img
Docker image path: quay.io/biocontainers/multiqc:1.1--py27_0
Cache folder set to /home/pditommaso/.singularity/docker
[8/8] |===================================| 100.0% 
Importing: base Singularity environment
Importing: /home/pditommaso/.singularity/docker/sha256:e01066e8a456c64e445fe9b745cfab0ba690ffc964374f5f09758cc4f8e7ad5b.tar.gz
Importing: /home/pditommaso/.singularity/docker/sha256:a3ed95caeb02ffe68cdd9fd84406680ae93d633cb16422d00e8a7c22955b46d4.tar.gz
Importing: /home/pditommaso/.singularity/docker/sha256:aef3b3b2fa0d190f2a8ab3e43c7ce4e34ae8eb29d56a93c36c82740a30d4dac0.tar.gz
Importing: /home/pditommaso/.singularity/docker/sha256:531ebc5af9ff52b42017cbbde280607e75570a05b66a60be8e12f9417b3fbad4.tar.gz
Importing: /home/pditommaso/.singularity/docker/sha256:00f810677cffc160025dbf82081ebd25ca8f951257749120df70c23d04c10f1c.tar.gz
Importing: /home/pditommaso/.singularity/docker/sha256:6c2ebb6634fc80ba5bfd31a6bed097d63883a78f1deb6c4fdb58aeb219e4ccba.tar.gz
Importing: /home/pditommaso/.singularity/docker/sha256:d836c29a56fbb4798289a6d46d6726e0c52da024b834cb3ed14d80f8d5e4112a.tar.gz
Importing: /home/pditommaso/.singularity/docker/sha256:a7f760de4b2725b11b38f8e0a36c1d8abd2e2128442e8dabbec6baa86e64173b.tar.gz
Importing: /home/pditommaso/.singularity/docker/sha256:4c1fa756c345dec2e28659f9b7a6195bd1f68cab67e3e8db7edb782abcba4b57.tar.gz
Importing: /home/pditommaso/.singularity/metadata/sha256:3988644f96d4a3069f35ad75fee0173c6fd9dba693dbb44cb0198cfd0d889f1d.tar.gz
Done. Container is at: multiqc-1.1--py27_0.img
new multiqc release patched singularity as well ;)
Sergey Venev
@sergpolly
Jul 19 2017 20:35
Hi,
we were trying to run a nextflow pipeline on a SLURM cluster and faced an unusual issue: nextflow runs the first step of the pipeline and exits with an error about not being able to access the folder specified by storeDir - our intermediate cache. Once we resume from that state, the pipeline goes a step further, but then gives an error again; we resume and go one more step further ... this way we can run the whole pipeline successfully. So each process can run individually and completes successfully, but something is wrong with permissions or SLURM - could you help us debug this issue?
Paolo Di Tommaso
@pditommaso
Jul 19 2017 21:19
I'm not understanding, the error is always the same ?
Sergey Venev
@sergpolly
Jul 19 2017 21:25
Yes, it gives us a warning saying it cannot access the directory specified as storeDir
but if we resume it goes one step further
Paolo Di Tommaso
@pditommaso
Jul 19 2017 21:25
can I see the complete error message ?
Sergey Venev
@sergpolly
Jul 19 2017 21:26
it's crazy
one second, my collaborator was running it, he'll show it
rdacemel
@rdacemel
Jul 19 2017 21:29
ERROR ~ Error executing process > 'chunk_fastqs (library:MATa_R2 run:lane1)'

Caused by:
  /home/ska/rafa/programs/distiller-nf/test/intermediates/fastq_chunksº
Command executed:

  zcat SRR2601851_1.fastq.gz | split -l 20000 -d         --filter 'pbgzip -c -n 2 > $FILE.1.fastq.gz' -         MATa_R2.lane1.

  zcat SRR2601851_2.fastq.gz | split -l 20000 -d         --filter 'pbgzip -c -n 2 > $FILE.2.fastq.gz' -         MATa_R2.lane1.

Command exit status:
  0

Command output:
  (empty)

Work dir:
  /home/ska/rafa/programs/distiller-nf/work/65/0134876397edbe096431a888e58a65

Tip: view the complete command output by changing to the process work dir and entering the command `cat .command.out`

 -- Check '.nextflow.log' file for details
WARN: Killing pending tasks (2)
Paolo Di Tommaso
@pditommaso
Jul 19 2017 21:29
:+1:
what is that strange char º at the end of the ... intermediates/fastq_chunks path ?
Sergey Venev
@sergpolly
Jul 19 2017 21:31
I guess he just typed it now
it wasn't there originally
Paolo Di Tommaso
@pditommaso
Jul 19 2017 21:31
sure? :)
Sergey Venev
@sergpolly
Jul 19 2017 21:31
100%
Paolo Di Tommaso
@pditommaso
Jul 19 2017 21:32
could you also share the .nextflow.log file by using pastebin.com or something similar ?
Sergey Venev
@sergpolly
Jul 19 2017 21:32
yes, we'll try
Many thanks btw!!!
Sergey Venev
@sergpolly
Jul 19 2017 21:37
we tried errorStrategy='retry' this time - just to check
just to make it clear: nextflow IS able to access all of these folders that it complains about
moreover it copies all the intermediate files to these folders
it just fails to proceed to the next step, aka process
Paolo Di Tommaso
@pditommaso
Jul 19 2017 21:43
so the path ../intermediates/fastq_chunks contains the expected files after it failed ?
rdacemel
@rdacemel
Jul 19 2017 21:43
yes...
weird thing
Paolo Di Tommaso
@pditommaso
Jul 19 2017 21:44
this seems to be a problem with your file system, I've seen such issue with NFS
Sergey Venev
@sergpolly
Jul 19 2017 21:45
what should we ask our admins?
Paolo Di Tommaso
@pditommaso
Jul 19 2017 21:45
what happens is that a remote node copies the data to the target directory
Sergey Venev
@sergpolly
Jul 19 2017 21:45
I don't even know how to formulate the question
Paolo Di Tommaso
@pditommaso
Jul 19 2017 21:46
then the main NF app running on the master node is not able to see that data
the file system state is somehow inconsistent, due to caching or latency
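As a stop-gap while the file system catches up, one option is a retry policy in nextflow.config, along the lines Sergey's team already tried (the values here are illustrative):

```groovy
process {
    // re-run tasks whose outputs are not yet visible to the head node
    errorStrategy = 'retry'
    maxRetries    = 3
}
```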
rdacemel
@rdacemel
Jul 19 2017 21:47
we think we may have an explanation, we'll work on that and we'll let you know!
Sergey Venev
@sergpolly
Jul 19 2017 21:47
I'm actually running late - we'll see the results tomorrow. Many thanks Paolo!
Paolo Di Tommaso
@pditommaso
Jul 19 2017 21:48
yes, please let me know what sysadmins say
you are welcome
rdacemel
@rdacemel
Jul 19 2017 21:48
we were putting the intermediate files in the wrong folders (a home folder subdirectory) and that's not the way it should be
it may be related
we'll figure it out and we'll let you know
thanks again!!
Paolo Di Tommaso
@pditommaso
Jul 19 2017 21:50
ah