These are chat archives for nextflow-io/nextflow

5th
Mar 2018
Vladimir Kiselev
@wikiselev
Mar 05 2018 10:56

Hi Paolo, when defining a pipeline parameter, is this how I should do it?

params.sample = false

and then in the nextflow call use —sample abcd?
The problem is that when I use in the pipeline script:

if( params.sample ){
   …
}

sample is still false even if I define it in the nextflow call.

Paolo Di Tommaso
@pditommaso
Mar 05 2018 11:01
script params use a double dash --, not a single one -
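As a minimal sketch of the pattern discussed above (the script name and println body are illustrative, not from the chat):

```nextflow
// main.nf -- declare the parameter with a default value
params.sample = false

// if --sample abcd is passed on the command line,
// params.sample is overridden and this branch runs
if( params.sample ) {
    println "Processing sample: ${params.sample}"
}
```

Invoked as `nextflow run main.nf --sample abcd`; the double dash is for script params, while single-dash options such as `-resume` belong to Nextflow itself.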
Vladimir Kiselev
@wikiselev
Mar 05 2018 11:02
Yes, this is what I did. Above it was gitter formatting, I think
Vladimir Kiselev
@wikiselev
Mar 05 2018 11:10
ok, it works now, thanks!
Vladimir Kiselev
@wikiselev
Mar 05 2018 11:25
one more question: when one process produces stdout with multiple filenames as an output, how can I use these names in the next process and launch a separate job for each of the filenames?
Evan Floden
@evanfloden
Mar 05 2018 12:16

Recently there was a fix for using resume with retried jobs, and it works great when a task is ultimately successful. However, I'm not convinced it is working when tasks do not complete and the pipeline dies for an unspecified reason.

Consider the following situation:

[6a/e852e2] Submitted process > processA 
[6a/e852e2] NOTE: Process `processA ` terminated with an error exit status (140) -- Execution is retried (1)
[62/9f6704] NOTE: Process `processA ` terminated with an error exit status (140) -- Execution is retried (2)

--- then the pipeline dies for another reason ---

Upon running with -resume, from my initial observations, the task is submitted but with parameters from task.attempt=1 and not from the last run task attempt; in the example above, task.attempt=3.

Paolo Di Tommaso
@pditommaso
Mar 05 2018 12:26
when one process produces stdout with multiple filenames as an output
@wikiselev not sure I understand this, stdout does not produce any file, what do you mean exactly?
Vladimir Kiselev
@wikiselev
Mar 05 2018 13:00
It produced a list of filenames
And I want to use the filenames in the next process
Paolo Di Tommaso
@pditommaso
Mar 05 2018 13:01
these file names are known or do they have a common prefix or suffix ?
Vladimir Kiselev
@wikiselev
Mar 05 2018 13:44
Sorry, got disturbed by a meeting
So, the input to the first process is a sample id. The output of the first process is a list of filenames corresponding to the sample id. I write it to stdout and they look like this:
24479_3#1.cram
24479_3#2.cram
24479_3#3.cram
24479_3#4.cram
24479_4#1.cram
24479_4#2.cram
24479_4#3.cram
24479_4#4.cram
24479_5#1.cram
24479_5#2.cram
24479_5#3.cram
24479_5#4.cram
these files need to be downloaded from another resource
so I want to have a process which starts a separate downloading job for each filename in the list
the list of filenames can change depending on the original sample id
Vladimir Kiselev
@wikiselev
Mar 05 2018 13:51
the files have the same cram extension but they are not on the storage system where NF is running, I want to download them from another resource
hope this helps
Tintest
@Tintest
Mar 05 2018 14:13
Hi,
Is there a way to specify a number of CPUs per fork (maxForks)? In the same process I piped bwa into samtools; I'm allowing 3 max forks and 7 CPUs per task, but before bwa is over, samtools starts and takes 7 CPUs as well. What can I do, other than splitting my processes? I'm piping bwa into samtools to avoid producing a .sam file, so if you have a solution that doesn't require separate processes, I'm listening.
Thanks :)
Evan Floden
@evanfloden
Mar 05 2018 14:54
@wikiselev Consider this example which you should be able to run with nextflow console:
Channel
  .from("A","B","C")
  .set {id_list_ch}

process generateListsOfFiles {
    input:
    val(id) from id_list_ch

    output:
    set val(id), file ("${id}_file_list.txt") into file_list_ch

    script:
    """
    for ((i=1;i<=5;i++)); 
    do 
        echo ${id}.\$i.cram >> ${id}_file_list.txt; 
    done
    """
 }

 file_list_ch
   .map {it -> it[1].text}
  .splitText()
   .set {files_to_download}

process filesToDownload {
     input:
     val(download) from files_to_download

     script:
     """ 
     echo ${download}
     """
}
Paolo Di Tommaso
@pditommaso
Mar 05 2018 15:22
@wikiselev even easier
process foo {
  output:
  stdout foo_ch

  script:
  """
    cat <<EOL
    24479_3#1.cram
    24479_3#2.cram
    24479_3#3.cram
    24479_3#4.cram
    EOL
  """
}

process bar {
  input: 
  file x from foo_ch.flatMap{ it.readLines() }

  """
  echo $x
  """
}
Tintest
@Tintest
Mar 05 2018 15:40
The cpus directive seems to work for the whole process, but not per fork: I can give 21 CPUs to a process, but I cannot force 3x7 CPUs. Anyway, it's not a big deal, I'm just trying to discover the range of possibilities that Nextflow offers. Thanks @pditommaso :)
Paolo Di Tommaso
@pditommaso
Mar 05 2018 15:52
if you set cpus 7 and maxForks 3 it will execute at most 3 processes, each using 7 cpus
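In directive form, that combination would look something like this (the process name, input channel, and command are illustrative, not from the chat):

```nextflow
process align {
    cpus 7       // each task reserves 7 CPUs
    maxForks 3   // at most 3 tasks run in parallel, so 21 CPUs peak

    input:
    file reads from reads_ch

    script:
    """
    bwa mem -t ${task.cpus} ref.fa ${reads} > aligned.sam
    """
}
```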
Rickard Hammarén
@Hammarn
Mar 05 2018 15:56

Hi @pditommaso I've found something interesting with regards to config files.
Toy example:

profiles {

  standard {
    includeConfig 'conf/config1'
    includeConfig 'conf/config2'
  }
}

config1:

singularity {
  enabled = true
}

config 2:

singularity {
  enabled = false
}

with Nextflow v0.25.1 I get singularity false at runtime, but with 0.27.6 I get true
And I was expecting the earlier behaviour; we have our RNA config files set up with that in mind:
https://github.com/SciLifeLab/NGI-RNAseq/blob/master/nextflow.config#L35

Paolo Di Tommaso
@pditommaso
Mar 05 2018 16:05
ugly
can you please try 0.28.0-RC1?
Rickard Hammarén
@Hammarn
Mar 05 2018 16:11
Didn't seem to work
Paolo Di Tommaso
@pditommaso
Mar 05 2018 16:12
if so open an issue on GH please
Rickard Hammarén
@Hammarn
Mar 05 2018 16:13
Will do
Tintest
@Tintest
Mar 05 2018 16:25
@pditommaso Even in the case where you are piping several tools in a single command? Because samtools starts before the end of bwa, so more than 7 CPUs are used per fork. In my config file a limit of 23 CPUs is set, and it never goes over 23 CPUs for the 3 forks, but it's not equal to 21 either.
Paolo Di Tommaso
@pditommaso
Mar 05 2018 16:26
if so you will need to split that command into several processes
there's no way to handle that otherwise (whatever system you use)
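A rough sketch of the split Paolo suggests, with alignment and sorting as separate processes so each task's CPU reservation matches what actually runs (all names, channels, and commands here are illustrative, not from the chat). The trade-off is that the intermediate .sam file now lands on disk, which the pipe was avoiding:

```nextflow
process bwaAlign {
    cpus 7
    maxForks 3

    input:
    file reads from reads_ch

    output:
    file 'aligned.sam' into sam_ch

    script:
    """
    bwa mem -t ${task.cpus} ref.fa ${reads} > aligned.sam
    """
}

process samtoolsSort {
    cpus 7

    input:
    file sam from sam_ch

    output:
    file 'aligned.bam'

    script:
    """
    samtools sort -@ ${task.cpus} -o aligned.bam ${sam}
    """
}
```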
Tintest
@Tintest
Mar 05 2018 16:28
ok then, thanks
Vladimir Kiselev
@wikiselev
Mar 05 2018 16:36
Thanks a lot @skptic and @pditommaso!
sorry, another question. I want to split a string by doing this:
script:
    """
    id_run="$(echo ${cram_file} | cut -d'_' -f 1)"
    …
    """
NF complains:
ERROR ~ illegal string body character after dollar sign;
   solution: either escape a literal dollar sign "\$5" or bracket the value expression "${5}" @ line 34, column 12.
if I add an escape to the first dollar sign, then it looks like the whole expression does not work
Paolo Di Tommaso
@pditommaso
Mar 05 2018 16:39
you need to distinguish bash $ from nextflow $
try
id_run="\$(echo ${cram_file} | cut -d'_' -f 1)"
Vladimir Kiselev
@wikiselev
Mar 05 2018 16:42
yes, I did that, but then when I try to use id_run by calling ${id_run} NF complains:
Unknown variable 'id_run' -- Make sure you didn't misspell it or define somewhere in the script before use it
Paolo Di Tommaso
@pditommaso
Mar 05 2018 16:47
what's id_run? I don't see it there
I mean, is id_run a NF variable or a bash one?
Vladimir Kiselev
@wikiselev
Mar 05 2018 16:49
here is what I did:
  id_run="\$(echo ${cram_file} | cut -d'_' -f 1)"
  echo \${id_run}
it is a bash variable
the same thing works in nextflow console though:
  id_run="\$(echo ${x} | cut -d'_' -f 1)"
  echo \${id_run}
Paolo Di Tommaso
@pditommaso
Mar 05 2018 16:51
I just ran this and it works smoothly
process bar {
  input: 
  file cram_file from foo_ch.flatMap{ it.readLines() }

  """
  id_run="\$(echo ${cram_file} | cut -d'_' -f 1)"
  echo \${id_run}
  """
}
Vladimir Kiselev
@wikiselev
Mar 05 2018 16:58
thanks, worked now, don’t know why it didn’t before, maybe I pulled too quickly after a github commit
thanks a lot for your time and help!
Paolo Di Tommaso
@pditommaso
Mar 05 2018 17:40
:+1: