These are chat archives for nextflow-io/nextflow

8th
Oct 2018
Maxime Vallée
@valleem
Oct 08 2018 06:22
Hello! I have a question about -resume behaviour. I am using a GATK GenomicsDBImport process in my NF pipeline, split by chromosome. Even when a chromosome has finished successfully, -resume does not work as intended when I re-run the pipeline: the process is re-run. I am using NF and -resume for other pipelines and other processes with no trouble at all. Where could I look to investigate what falsely triggers the re-run?
Paolo Di Tommaso
@pditommaso
Oct 08 2018 06:39
there should be something changing in the task inputs
Maxime Vallée
@valleem
Oct 08 2018 06:41
Hello Paolo! Here is my process:
process GenomicsDBImport {

    cpus 1 
    memory '72 GB'
    time '12h'

    tag { chr }

    input:
    each chr from chromosomes_ch
    file (gvcf) from gvcf_ch.toList()
    file (gvcf_idx) from gvcf_idx_ch.toList()

    output:
    set chr, file ("${params.cohort}.${chr}.tar") into gendb_ch

    script:
    """
    ${GATK} GenomicsDBImport --java-options "-Xmx24g -Xms24g" \
    ${gvcf.collect { "-V $it " }.join()} \
    -L ${chr} \
    --genomicsdb-workspace-path ${params.cohort}.${chr}

    tar -cf ${params.cohort}.${chr}.tar ${params.cohort}.${chr}
    """
}
maybe the toList() is randomizing the input?
Paolo Di Tommaso
@pditommaso
Oct 08 2018 06:42
the problem could be these:
    file (gvcf) from gvcf_ch.toList()
    file (gvcf_idx) from gvcf_idx_ch.toList()
Maxime Vallée
@valleem
Oct 08 2018 06:42
Should I force them to be sorted?
Paolo Di Tommaso
@pditommaso
Oct 08 2018 06:42
use collect instead of toList
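(A minimal sketch of that change, reusing the channel names from the process above: only the two file inputs change, with collect() in place of toList().)

    input:
    each chr from chromosomes_ch
    file (gvcf) from gvcf_ch.collect()
    file (gvcf_idx) from gvcf_idx_ch.collect()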
Maxime Vallée
@valleem
Oct 08 2018 06:42
ok
I will try and report
Paolo Di Tommaso
@pditommaso
Oct 08 2018 06:43
:ok_hand:
Maxime Vallée
@valleem
Oct 08 2018 07:07
Report: it works. Run from scratch, completed chrY. Killed NF, re-ran with -resume, chrY skipped as intended! Thanks Paolo!!
Paolo Di Tommaso
@pditommaso
Oct 08 2018 07:07
:v:
Luca Cozzuto
@lucacozzuto
Oct 08 2018 09:19
Hello. I'm trying to use a shell script I have in bin as afterScript
but I got an error
/usr/share/univage/soldierantcluster/spool/node-hp0209/job_scripts/26382096: line 103: fixQcml.sh: command not found
Luca Cozzuto
@lucacozzuto
Oct 08 2018 09:37
I'm wondering whether this script should be put in another place...
Paolo Di Tommaso
@pditommaso
Oct 08 2018 09:49
nope, use afterScript "$baseDir/bin/fixQcml.sh"
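(A minimal sketch of how that directive could sit inside a process; the process name and script body are hypothetical, only the afterScript line comes from the advice above.)

process makeQcml {
    // hypothetical process; the key point is referencing the script by its absolute path under bin/
    afterScript "$baseDir/bin/fixQcml.sh"

    """
    echo "main task runs here"
    """
}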
Luca Cozzuto
@lucacozzuto
Oct 08 2018 09:54
ok thanks
Paolo Di Tommaso
@pditommaso
Oct 08 2018 10:31
and ?
Luca Cozzuto
@lucacozzuto
Oct 08 2018 10:33
sorry
was not for you :)
Paolo Di Tommaso
@pditommaso
Oct 08 2018 10:33
:smile:
Luca Cozzuto
@lucacozzuto
Oct 08 2018 10:33
too many gitter chats :)
Anthony Underwood
@aunderwo
Oct 08 2018 13:14

I have a problem with -resume and adding the lenient directive doesn't help

The process that always reruns is the trimming process (see below). Does the fact that it has a stdout output as an input make a difference?

process determine_min_read_length {
  tag { pair_id }

  input:
  set pair_id, file(file_pair) from raw_fastqs_for_length_assessment

  output:
  stdout min_read_length 

  """
  seqtk sample -s123 ${file_pair[0]} 1000 | printf "%.0f\n" \$(awk 'NR%4==2{sum+=length(\$0)}END{print sum/(NR/4)/3}')
  """
}



// Trimming
process trimming {
  tag { pair_id }

  input:
  val(min_len) from min_read_length
  file('adapter_file.fas') from adapter_file
  set pair_id, file(file_pair) from raw_fastqs_for_trimming

.....
Paolo Di Tommaso
@pditommaso
Oct 08 2018 13:28
no it should not, how is raw_fastqs_for_length_assessment defined ?
Anthony Underwood
@aunderwo
Oct 08 2018 13:30
From a 'normal' fromFilePairs channel
Channel
  .fromFilePairs( fastqs )
  .ifEmpty { error "Cannot find any reads matching: ${fastqs}" }
  .set { raw_fastqs }




// duplicate raw fastq channel
raw_fastqs.into { raw_fastqs_for_trimming; raw_fastqs_for_length_assessment; raw_fastqs_for_genome_size_estimation; raw_fastqs_for_read_counting}
Paolo Di Tommaso
@pditommaso
Oct 08 2018 13:36
Does the fact that it has a stdout output as an input make a difference?
it should not; to double check, try to remove
val(min_len) from min_read_length
and use a fake min_len value, e.g.
val(min_len) from '100'
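(Applied to the trimming process above, that debugging swap would look roughly like this; the '100' is just a placeholder value.)

  input:
  val(min_len) from '100'                                    // placeholder instead of min_read_length
  file('adapter_file.fas') from adapter_file
  set pair_id, file(file_pair) from raw_fastqs_for_trimming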
Luca Cozzuto
@lucacozzuto
Oct 08 2018 13:38
Dear all, I think many of you have struggled with multidimensional lists :) do you know how to set them up properly?
Paolo Di Tommaso
@pditommaso
Oct 08 2018 13:39
a list can contain any object, and therefore other lists .. that's it, nothing special
Luca Cozzuto
@lucacozzuto
Oct 08 2018 13:40
I'm trying to do something simple like:
def Correspondence = []
Correspondence["MS2specCount"]["shotgun"]                 = "0000007"
Oct-08 15:40:29.392 [main] DEBUG nextflow.Session - Session aborted -- Cause: argument type mismatch
Oct-08 15:40:29.582 [main] ERROR nextflow.cli.Launcher - @unknown
java.lang.IllegalArgumentException: argument type mismatch
Paolo Di Tommaso
@pditommaso
Oct 08 2018 13:43
def Correspondence = [:]
Correspondence["MS2specCount"]  = ["shotgun": "0000007"]
BTW that's not a multidimensional list
but a multidimensional associative array (a map of maps) ..
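(A minimal Groovy sketch of that map-of-maps idea, reusing the keys and value from the snippet above: the inner map has to exist before it can be indexed with a second key.)

def Correspondence = [:]                                   // an empty map, not a list
Correspondence["MS2specCount"] = [:]                       // create the inner map first
Correspondence["MS2specCount"]["shotgun"] = "0000007"      // now two-key indexing works
println Correspondence["MS2specCount"]["shotgun"]          // prints 0000007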
Luca Cozzuto
@lucacozzuto
Oct 08 2018 13:47
ERROR ~ You tried to use a map entry for an index operation, this is not allowed. Maybe something should be set in parentheses or a comma is missing?
 @ line 135, column 32.
   Correspondence["MS2specCount"]["shotgun" : "0000007"]
                                  ^

1 error
I got the error :)
I was missing the =
uff, quite different from other languages...
Paolo Di Tommaso
@pditommaso
Oct 08 2018 13:48
try to speak German when you go to your local bakery ..
Luca Cozzuto
@lucacozzuto
Oct 08 2018 13:50
he'll give me a pretzel ;)
BTW thanks!
Paolo Di Tommaso
@pditommaso
Oct 08 2018 13:54
:ok_hand:
Sven F.
@sven1103
Oct 08 2018 14:44
it is Brezel
:joy:
Luca Cozzuto
@lucacozzuto
Oct 08 2018 14:46
uff... :) I know that they are named differently in different regions: Laugenbrezel, Pretzel, Pretzl, Bretzel, Breze or Brez...
Sven F.
@sven1103
Oct 08 2018 14:54
with P really? This is new to me :D
and there is Butterbrezel (best of all)
sorry for the spam ;)
Luca Cozzuto
@lucacozzuto
Oct 08 2018 15:16
now I'll be craving them.. and the only pretzel shop in BCN just closed for good :(
Sven F.
@sven1103
Oct 08 2018 15:17

https://en.wikipedia.org/wiki/Pretzel

Yeah, that is the English version, but that is not how it is used here or in Austria

I guess the letter B was too hard to pronounce, followed by the r
:P
@pditommaso thanks for your feedback, I reopened PR #891, as my fork somehow got messed up -.-
Anthony Underwood
@aunderwo
Oct 08 2018 15:20
val(min_len) from '100'
If I run with this, -resume works correctly. Somehow the stdout output is breaking it
Diogo Silva
@ODiogoSilva
Oct 08 2018 15:30
@pditommaso This warning message has been appearing recently (since 0.31.0, I think?): WARN: Process configuration syntax $processName has been deprecated
Is there a release milestone in which the old syntax will not be supported anymore?
Paolo Di Tommaso
@pditommaso
Oct 08 2018 15:41
not scheduled in the short run, likely next year
Luca Cozzuto
@lucacozzuto
Oct 08 2018 15:45
hi again @pditommaso. I'm trying to add a value to a channel but it thinks it is a file
    output:
    set sample_id, internal_code, "shotgun_qc4l_cid", checksum, file("${sample_id}.featureXML") into shot_qc4l_cid_featureXMLfiles_for_calc_peptide_area
it complains
Missing output file(s) `shotgun_qc4l_cid` expected by process `shotgun_qc4l_cid (9d9d9d1b-9d9d-4f1a-9d27-9d2f7635059d_QC03_b6f96ae8248181b823f5a720c09445cd)` -- Error is ignored
Diogo Silva
@ODiogoSilva
Oct 08 2018 15:47
thanks!
Paolo Di Tommaso
@pditommaso
Oct 08 2018 15:48
strings are considered files, use val("shotgun_qc4l_cid")
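(Applied to the output declaration above, the fix would look roughly like this: the bare string is wrapped in val() so it is emitted as a value rather than being expected as an output file.)

    output:
    set sample_id, internal_code, val("shotgun_qc4l_cid"), checksum, file("${sample_id}.featureXML") into shot_qc4l_cid_featureXMLfiles_for_calc_peptide_area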
Luca Cozzuto
@lucacozzuto
Oct 08 2018 15:49
thanks (may I ask why strings are considered files?)
Paolo Di Tommaso
@pditommaso
Oct 08 2018 15:49
that's supposed to be a shortcut
Luca Cozzuto
@lucacozzuto
Oct 08 2018 15:53
mmm I don't know if it is a good idea. But just my opinion :P
Paolo Di Tommaso
@pditommaso
Oct 08 2018 15:55
thank you for your opinion :wink:
Anthony Underwood
@aunderwo
Oct 08 2018 15:56
I'm wondering what -resume uses to check if a process has previously run, given that stdout as an output is causing the process to rerun each time. Does it checksum the process description?
Paolo Di Tommaso
@pditommaso
Oct 08 2018 15:57
-resume creates a hash combining all input files + task script
if there's a work dir with that hash and the declared outputs -> skip the execution
if you want to debug that task use -dump-hashes and compare two executions
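(A minimal sketch of that debugging workflow, assuming the pipeline lives in a script called main.nf; -dump-hashes writes the task hash keys to the run's .nextflow.log, so the logs of two runs can be compared.)

nextflow run main.nf -resume -dump-hashes
cp .nextflow.log run1.log
nextflow run main.nf -resume -dump-hashes
cp .nextflow.log run2.log
diff run1.log run2.log      # look for the task input whose hash changes between runs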
Anthony Underwood
@aunderwo
Oct 08 2018 15:58
so when the output is not a file but stdout, how is that hashed?
Paolo Di Tommaso
@pditommaso
Oct 08 2018 15:59
the content
anonymous
@naveenmeena584
Oct 08 2018 18:18
hello
I want to run my Nextflow pipeline on a Spark cluster, is it possible?