These are chat archives for nextflow-io/nextflow

5th
Sep 2018
Shawn Rynearson
@srynobio
Sep 05 2018 01:22

I could be wrong, but I thought the AWS prefix is essentially the path: '/' is the delimiter, and everything after the last delimiter is interpreted as the file.

AWS example:

North America/USA/Washington/Bellevue
North America/USA/Washington/Seattle

Translates to:
North America/USA/Washington/ (prefix) -> Bellevue (file)

And yes, 5,500 GET requests per second is a high limit, but PUTs are capped at 3,500. And I maxed out that limit. :(

Karin Lagesen
@karinlag
Sep 05 2018 08:10
what's the relationship between queuesize and maxforks?
Paolo Di Tommaso
@pditommaso
Sep 05 2018 08:11
they are independent settings
maxForks = max number of instances of a process that can run in parallel
Karin Lagesen
@karinlag
Sep 05 2018 08:12
this would be in a slurm setting
Paolo Di Tommaso
@pditommaso
Sep 05 2018 08:12
queueSize = max number of jobs that can be queued in your cluster at any one time
this would be in a slurm setting
no
Karin Lagesen
@karinlag
Sep 05 2018 08:13
bit confused, I set queuesize to 64, and now it seems I only get 10 being submitted to the cluster
Paolo Di Tommaso
@pditommaso
Sep 05 2018 08:13
well, it depends on your workflow
Karin Lagesen
@karinlag
Sep 05 2018 08:13
so.. which one of these would limit the number of jobs submitted to a slurm queue?
Paolo Di Tommaso
@pditommaso
Sep 05 2018 08:13
queueSize
Karin Lagesen
@karinlag
Sep 05 2018 08:14
ok
Karin Lagesen
@karinlag
Sep 05 2018 08:46
ok
so I am having issues with read sets being linked to the wrong read files
so, the first thing I do is the following:

process collate_data {
// Note, not publishing these because that would mean
// triple copies of the files on the system

tag {pair_id}

input:
set pair_id, file(reads) from read_pairs

output:
//set pair_id, file("${pair_id}_R{1,2}${params.file_ending}") into read_pairs_mlst, read_pairs_amr, read_pairs_vir
set pair_id, file("${pair_id}*_concat.fq.gz") into \
  (read_pairs_mlst, read_pairs_amr, read_pairs_vir)

"""
${preCmd}
cat ${pair_id}*R1* > ${pair_id}_R1_concat.fq.gz
cat ${pair_id}*R2* > ${pair_id}_R2_concat.fq.gz
"""

}

however, one of the processes when I run this links files like this:
lrwxrwxrwx 1 hkaspersen nn9305k 173 sep. 4 09:29 2014-01-2996_S150_L006_R1_001.fastq.gz -> /work/projects/nn9305k/ecoli/rawdata/180307_E00423.A.Project_Kaspersen-DNA1-2017-12-21/180307_E00423.A.Project_Kaspersen-DNA1-2017-12-21/Sample_3/3_S150_L006_R1_001.fastq.gz
lrwxrwxrwx 1 hkaspersen nn9305k 173 sep. 4 09:29 2014-01-2996_S150_L006_R2_001.fastq.gz -> /work/projects/nn9305k/ecoli/rawdata/180307_E00423.A.Project_Kaspersen-DNA1-2017-12-21/180307_E00423.A.Project_Kaspersen-DNA1-2017-12-21/Sample_3/3_S150_L006_R2_001.fastq.gz
Paolo Di Tommaso
@pditommaso
Sep 05 2018 08:49
you may want to format the code properly
triple `
new-line
code
new-line
triple `
Karin Lagesen
@karinlag
Sep 05 2018 08:49
thanks, and sorry :)
process collate_data {
    // Note, not publishing these because that would mean
    // triple copies of the files on the system

    tag {pair_id}

    input:
    set pair_id, file(reads) from read_pairs

    output:
    //set pair_id, file("${pair_id}_R{1,2}${params.file_ending}") into read_pairs_mlst, read_pairs_amr, read_pairs_vir
    set pair_id, file("${pair_id}*_concat.fq.gz") into \
      (read_pairs_mlst, read_pairs_amr, read_pairs_vir)

    """
    ${preCmd}
    cat ${pair_id}*R1* > ${pair_id}_R1_concat.fq.gz
    cat ${pair_id}*R2* > ${pair_id}_R2_concat.fq.gz
    """
}
and the input channel to that is
Paolo Di Tommaso
@pditommaso
Sep 05 2018 08:50
better
Karin Lagesen
@karinlag
Sep 05 2018 08:50
Channel
    .fromFilePairs( params.reads, size:params.setsize )
    .ifEmpty { error "Cannot find any reads matching: ${params.reads}" }
    .set{read_pairs}
where params.reads = "/work/projects/nn9305k/ecoli/analysis/qrec_map_wgs_art1/soft_linked_reads/hiseq_flex_150bp/*L00{1,5,6}_R{1,2}_001.fastq.gz"
and params.setsize = 2
I am seeing the same kind of error propagated, that is, wrong linking of read sets in later processes too
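One quick way to check how fromFilePairs is grouping the files, as a sketch, is to print each emitted pair before any process consumes it (the printed tuple is illustrative):

Channel
    .fromFilePairs( params.reads, size: params.setsize )
    .ifEmpty { error "Cannot find any reads matching: ${params.reads}" }
    .view()   // prints e.g. [2014-01-2996_S150_L006, [/path/to/R1, /path/to/R2]]
    .set { read_pairs }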
Paolo Di Tommaso
@pditommaso
Sep 05 2018 08:53
what kind of error?
Karin Lagesen
@karinlag
Sep 05 2018 08:55
2014-01-2996_S150_L006_R1_001.fastq.gz -> /work/projects/nn9305k/ecoli/rawdata/180307_E00423.A.Project_Kaspersen-DNA1-2017-12-21/180307_E00423.A.Project_Kaspersen-DNA1-2017-12-21/Sample_3/3_S150_L006_R1_001.fastq.gz
I am in the work directory, and I am seeing the 2014 fastq.gz above pointing to the wrong fastq file in the location where the data comes from
Paolo Di Tommaso
@pditommaso
Sep 05 2018 08:57
what should the link you are expecting look like?
Karin Lagesen
@karinlag
Sep 05 2018 08:59
oooo
hm
ok, what does nextflow do when it is given an input pattern that goes to a directory with softlinked reads?
Paolo Di Tommaso
@pditommaso
Sep 05 2018 09:00
umm, follow the link?
Karin Lagesen
@karinlag
Sep 05 2018 09:00
because the input read pattern that is given here is really to a directory full of softlinks
ah
that might be the issue
Paolo Di Tommaso
@pditommaso
Sep 05 2018 09:01
check the docs
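A quick way to see where the staged files really point, as a sketch: the paths in the channel are java.nio Path objects, so toRealPath() resolves any symlink chain.

Channel
    .fromFilePairs( params.reads, size: params.setsize )
    .map { id, files ->
        println "$id -> ${files*.toRealPath()}"   // show the fully resolved targets
        tuple(id, files)
    }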
Thomas Van Parys
@thpar
Sep 05 2018 11:24
Can anyone think of a reason why a process that specifies errorStrategy 'ignore' would still use the default setting from the config file process.errorStrategy = 'finish'?
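For reference, the two places the setting can come from; the per-process directive would normally be expected to win over the config default (the process name here is made up):

// nextflow.config
process.errorStrategy = 'finish'   // default applied to every process

// main.nf
process may_fail {
    errorStrategy 'ignore'   // directive; expected to override the config default

    """
    exit 1   # with 'ignore' this failure should not stop the run
    """
}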
Shawn Rynearson
@srynobio
Sep 05 2018 15:46

quick question while working through aws-batch prefix limits.

In your example you use the following -w:

./nextflow run fstrozzi/rnaseq-encode-nf -w s3://bucket/prefix

is the prefix, i.e. the work directory path, required?

Paolo Di Tommaso
@pditommaso
Sep 05 2018 15:46
yes
Shawn Rynearson
@srynobio
Sep 05 2018 15:46
Ok so -w s3://bucket/ will always fail.
Paolo Di Tommaso
@pditommaso
Sep 05 2018 15:46
yes
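In config terms, a sketch with a placeholder bucket name:

// nextflow.config
workDir = 's3://bucket/work'   // accepted: bucket plus at least one key component
// workDir = 's3://bucket/'    // rejected: a bare bucket cannot be the work dir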
Shawn Rynearson
@srynobio
Sep 05 2018 15:46
Thanks @pditommaso
Paolo Di Tommaso
@pditommaso
Sep 05 2018 15:47
welcome :)