These are chat archives for nextflow-io/nextflow

29th
Nov 2018
Maxime HEBRARD
@mhebrard
Nov 29 2018 02:47
Hello! When I create a channel with Channel.fromPath(path/to/file), Nextflow creates a link to the corresponding file in the /work/##/... directory (that is OK, I get it). But if I build the path myself inside a map operator, Nextflow does not create the link; it works directly on the file at its original path, and then I don't get the created file in the output. Give me a minute and I'll show you an example.
Krittin Phornsiricharoenphant
@sinonkt
Nov 29 2018 03:18
@pditommaso How can I fetch multiple input files from an S3 bucket directly to scratch? I want each process to handle multiple files, something like this:
    scratch = '/raid/home/krittin/tmp'
    stageInMode = 'copy'
    stageOutMode = 'move'

    input:
    val sample, 's3:/mybucket/*' from Samples
Maxime HEBRARD
@mhebrard
Nov 29 2018 03:48

OK ... in the code below, the problem is that my BAM is on a remote path. The .bai is created in the work directory. idxstats looks at my remote BAM but does not find the .bai ...

cBams = cBamCreated.mix(cBamExists.map{
    [it.name, file("${params.outstar}/${it.name}/${it.name}_Sorted.bam")]
  });

process Stats {
  publishDir path: "${params.outstar}", mode: 'copy', pattern: "*Log*", saveAs: {filename -> "${bam[0]}/$filename"}
  publishDir path: "${params.outstar}", mode: 'copy', pattern: "*.bai", saveAs: {filename -> "${bam[0]}/${bam[0]}_Sorted.bai"}

  input:
    val bam from cBams

  output:
    file '*Log*'
    file '*.bai'

  script:
    prefix = "${bam[0]}_"
    """
    samtools index ${bam[1]} ${prefix}Sorted.bai
    samtools idxstats ${bam[1]} > ${prefix}Log.idxstats.txt
    """
}

I guess I want to get a link to the BAM file inside the work directory, compute the index and stats, and export them ...

Paolo Di Tommaso
@pditommaso
Nov 29 2018 07:31
you also need to include the bai in the cBams channel
Krittin Phornsiricharoenphant
@sinonkt
Nov 29 2018 07:31
Hi guys
Paolo Di Tommaso
@pditommaso
Nov 29 2018 07:32
@mhebrard and declare the input as set val(name), file(bam), file(bai) from cBams
Krittin Phornsiricharoenphant
@sinonkt
Nov 29 2018 07:33
With a large file (about 6 GB) on S3 I get these warnings all the time, but a smaller one, around 100 MB, is OK. What causes this?
WARN: Unable to stage foreign file: s3://biotec/F18FTSAPHT1086_HUMlxjR/HS01039/clean_data/181018_I115_CL100098734_L1_HUMlxjRAABM-525_1.fq.gz (try 1) -- Cause: Premature end of Content-Length delimited message body (expected: 6446242104; received: 4270078052
WARN: Unable to stage foreign file: s3://biotec/F18FTSAPHT1086_HUMlxjR/HS01039/clean_data/181018_I115_CL100098734_L1_HUMlxjRAABM-525_1.fq.gz (try 2) -- Cause: Premature end of Content-Length delimited message body (expected: 6446242104; received: 4113145925
Paolo Di Tommaso
@pditommaso
Nov 29 2018 07:34
no idea, it looks like an S3 hiccup
Krittin Phornsiricharoenphant
@sinonkt
Nov 29 2018 07:38
Is this OK to ignore? It retries 3 times. My FreeNAS runs over just a 1 Gbps link.
Paolo Di Tommaso
@pditommaso
Nov 29 2018 07:39
does it manage to copy the file at the end ?
Krittin Phornsiricharoenphant
@sinonkt
Nov 29 2018 07:40
still no files appear on scratch dir
Paolo Di Tommaso
@pditommaso
Nov 29 2018 07:41
then it's not really OK to ignore it, I guess
Krittin Phornsiricharoenphant
@sinonkt
Nov 29 2018 07:45
I wonder how the transparent S3 download/upload mechanism works. Could you please briefly elaborate on this?
Paolo Di Tommaso
@pditommaso
Nov 29 2018 07:46
what am I supposed to elaborate on?
Krittin Phornsiricharoenphant
@sinonkt
Nov 29 2018 07:52
I'm actually frustrated about not knowing what goes on behind the scenes when using it with scratch. Does the S3 file download directly into scratch, or is it copied there later? Which step actually triggers the download?
process Alignment {

    tag { "${sample.sampleId}_index${sample.index}" }

    publishDir "${params.publishDir}/${sample.project}/aligned/${sample.sampleId}", mode: 'copy', overwrite: true

    scratch = '/raid/home/krittin/tmp'

    stageInMode = 'copy'

    stageOutMode = 'move'

    input:
    set sample, file(sampleFiles) from Samples

    output:
    set sample, "${sample.sampleId}.bam", "${sample.sampleId}.bam.bai" into AlignedSamples

    shell:
    '''
    dnabricks germline --ref !{sample.ref} !{sample.inputFastqArgs} --no-bqsr --no-variant-calling --bam !{sample.sampleId}.bam
    '''
}
sampleFiles = file("s3://mybucket/sampleId/*.fq.gz")
Paolo Di Tommaso
@pditommaso
Nov 29 2018 07:56
when an input file is hosted on remote storage, it is copied locally
scratch does not change that; however, I'm not sure stageInMode and stageOutMode work with s3
why are you using them?
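A minimal sketch of the staging behaviour described here, written in the `set`/`from` DSL used elsewhere in this log. The bucket, the glob and the process name `CountReads` are hypothetical, and working S3 (or Minio) credentials are assumed to be configured. Declaring the S3 path as a `file` input is what makes Nextflow download it before the task starts, so the script only sees a local file name:

```groovy
// Hypothetical bucket and glob; replace with your own.
fastqs = Channel.fromPath('s3://mybucket/sampleId/*.fq.gz')

process CountReads {
    input:
    file fq from fastqs     // the remote file is copied locally before the task runs

    output:
    stdout into counts

    script:
    """
    zcat ${fq} | wc -l
    """
}

counts.view()
```

No `scratch`, `stageInMode` or `stageOutMode` setting is involved in this sketch; the download is handled by Nextflow itself.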
Krittin Phornsiricharoenphant
@sinonkt
Nov 29 2018 07:59
I need to transfer data from Minio (FreeNAS) to the DGX local SSD (7 TB), so it's like: move in -> process on GPU -> delete temp or push to s3://output
Maxime HEBRARD
@mhebrard
Nov 29 2018 08:01
Oh, input: set val(name), file(bam) from cBams, that is probably the missing piece in my code ... I didn't think I could use the "set" syntax in input ... I'll try and let you know
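An untested sketch of that declaration, assuming `cBams` emits `[name, bam]` pairs as built above and leaving out the `publishDir` lines for brevity. With `file(bam)` the BAM is staged into the task work directory, so the index written by `samtools index` lands next to the staged BAM, where `samtools idxstats` can find it:

```groovy
process Stats {
  input:
    // file(bam) stages the BAM into the work directory; val would only pass the path along
    set val(name), file(bam) from cBams

  output:
    file '*Log*'
    file '*.bai'

  script:
    // "samtools index" without an output name writes <bam>.bai beside the staged BAM
    """
    samtools index ${bam}
    samtools idxstats ${bam} > ${name}_Log.idxstats.txt
    """
}
```

Unlike the original, the index is written to its default `<bam>.bai` location here; the `*.bai` output pattern still matches it.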
Paolo Di Tommaso
@pditommaso
Nov 29 2018 08:02
@sinonkt where is the GPU processing happening: local, cloud, cluster?
Krittin Phornsiricharoenphant
@sinonkt
Nov 29 2018 08:02
local sir
minio also on premise
Paolo Di Tommaso
@pditommaso
Nov 29 2018 08:03
then NF copies the files for you, I don't think you need any stageInMode and stageOutMode
Krittin Phornsiricharoenphant
@sinonkt
Nov 29 2018 08:04
but still need to point to scratch?
S3 still hiccups, even though not a single process has been launched, ever since the warming[local] state
Paolo Di Tommaso
@pditommaso
Nov 29 2018 08:06
scratch is usually used when you work on a shared file system, like NFS, Lustre, etc.
are you using it ?
Krittin Phornsiricharoenphant
@sinonkt
Nov 29 2018 08:07
in the near future it will be GPFS
Paolo Di Tommaso
@pditommaso
Nov 29 2018 08:07
and now ?
Krittin Phornsiricharoenphant
@sinonkt
Nov 29 2018 08:07
The DGX will have a shared GPFS mount next year.
Right now it actually mounts NFS from the NAS, but I'd like to use S3 as the interface over the shared file system too
Paolo Di Tommaso
@pditommaso
Nov 29 2018 08:10
therefore I think you don't need scratch either, because the work dir is already on local storage
Krittin Phornsiricharoenphant
@sinonkt
Nov 29 2018 08:14
@pditommaso Thanks for your advice. I’ll keep it in mind.
Krittin Phornsiricharoenphant
@sinonkt
Nov 29 2018 08:21
But the work dir keeps growing. I shouldn't let files live too long on the local SSD, right? The assumption should be that anything that needs to be permanent gets published somehow?
Paolo Di Tommaso
@pditommaso
Nov 29 2018 08:24
this is how NF works so far
in the future it should make better use of local storage: nextflow-io/nextflow#452
Krittin Phornsiricharoenphant
@sinonkt
Nov 29 2018 08:30
The S3 hiccup makes the process fail too. I guess none of the files was successfully downloaded.
T_T
Paolo Di Tommaso
@pditommaso
Nov 29 2018 08:33
I guess so
micans
@micans
Nov 29 2018 10:25
@pditommaso I can add two different collectFile() examples as patterns. For the second one I have a question about whether the behaviour is robust and stable (see my latest question above): this is where .collectFile { id, files -> [ id, files.collect{ it.toString() }.join('\n') + '\n' ] } results in id being used as the file base name. It's very cool anyway.
Paolo Di Tommaso
@pditommaso
Nov 29 2018 10:47
:+1:
micans
@micans
Nov 29 2018 10:48
Let me add a question mark: :question: :grin:
Paolo Di Tommaso
@pditommaso
Nov 29 2018 10:49
to what ?
micans
@micans
Nov 29 2018 10:49
whether the behaviour is robust and stable: this bit of magic that happens with id (appearing as file name). Sorry if I missed it in the docs ... I did read collectFile() docs, bit of a mind bender ..
(I plan to use this behaviour, and retrieve that file name in the downstream process)
Paolo Di Tommaso
@pditommaso
Nov 29 2018 10:51
it's not magic, the name of the file is given by the first element in the pair you are returning
you can control it as you wish
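A small sketch of that rule, with made-up sample ids and file names: the closure returns a pair, the first element becomes the name of the collected file and the second is the content appended to it. Items that return the same name end up in the same file.

```groovy
Channel
    .from( ['sampleA', 'a1.txt'], ['sampleA', 'a2.txt'], ['sampleB', 'b1.txt'] )
    // first element of the returned pair = target file name,
    // second element = text appended to that file
    .collectFile { item -> [ "${item[0]}.list", item[1] + '\n' ] }
    .view { it.name }
```

Under these assumptions the channel should emit one file per id (`sampleA.list` and `sampleB.list`), and a downstream process can read the id back from the file name.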
micans
@micans
Nov 29 2018 10:52
Awesome. :fireworks:
Paolo Di Tommaso
@pditommaso
Nov 29 2018 10:52
:smile:
micans
@micans
Nov 29 2018 10:52
Thanks! will make the patterns today.
Paolo Di Tommaso
@pditommaso
Nov 29 2018 10:53
nice, writing docs helps a lot to understand the hidden parts :smile:
micans
@micans
Nov 29 2018 10:55
yes I agree. A step forward on the nf path
Krittin Phornsiricharoenphant
@sinonkt
Nov 29 2018 11:38
@pditommaso The S3 hiccup disappeared after I ran on the local SSD, instead of
writing files to the NAS while downloading large files from the same NAS via S3 at the same time.
The cause might be my poor shared 1 Gbps link from the NAS to the GPU machine.
Paolo Di Tommaso
@pditommaso
Nov 29 2018 11:38
:+1:
KochTobi
@KochTobi
Nov 29 2018 12:31

Hi there, when trying to access different S3 buckets from different regions via the Nextflow Sarek pipeline I get the error:

ERROR ~ The authorization header is malformed; the region 'us-east-1' is wrong; expecting 'eu-west-1' (Service: Amazon S3; Status Code: 400; Error Code: AuthorizationHeaderMalformed; Request ID: 29319296C3A507AE; S3 Extended Request ID: 70q+dCYJyWeMK0B+UaKS1PYWUsiXo1gqHpXne8JmpIv+uDTin3idGgkGLHwLoMOIgv7dnQu7Eys=)

I configured the awscli with aws configure and also checked the config files ~/.aws/config and ~/.nextflow/config. All are set to the same region. Normally s3 buckets shouldn't be a problem as they are accessible globally. Any ideas?

Paolo Di Tommaso
@pditommaso
Nov 29 2018 12:48
not 100% sure but I think currently it's only possible to access buckets in the same region
you may want to open an issue for that
Krittin Phornsiricharoenphant
@sinonkt
Nov 29 2018 12:50
@pditommaso can i use storeDir over s3?
Paolo Di Tommaso
@pditommaso
Nov 29 2018 12:50
no
Krittin Phornsiricharoenphant
@sinonkt
Nov 29 2018 12:51
and also no for workDir?
Paolo Di Tommaso
@pditommaso
Nov 29 2018 12:51
yes, but only when using the AWS cloud/batch executor
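For illustration only, a configuration sketch of an S3 work directory with the AWS Batch executor. The queue, region and bucket names are hypothetical, and a real setup also needs credentials and a Batch compute environment that are not shown here:

```groovy
// nextflow.config (sketch; queue, region and bucket names are hypothetical)
process.executor = 'awsbatch'
process.queue    = 'my-batch-queue'
aws.region       = 'eu-west-1'
workDir          = 's3://my-bucket/work'
```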
micans
@micans
Nov 29 2018 13:32
I just found the view() operator, wonderful for debugging/inspection :+1:
Paolo Di Tommaso
@pditommaso
Nov 29 2018 13:34
have a look at dump too
micans
@micans
Nov 29 2018 13:36
wonderful stuff. Does dump propagate the channel like view does? Maybe too much ..
Paolo Di Tommaso
@pditommaso
Nov 29 2018 13:37
same semantics as view, but it can be enabled from the CLI
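A short sketch of the difference, with a hypothetical tag name `doubled`: like `view`, `dump` passes items through unchanged, but it stays silent unless it is switched on when the pipeline is launched.

```groovy
Channel
    .from(1, 2, 3)
    .map { it * 2 }
    .dump(tag: 'doubled')   // prints only when enabled from the command line
    .set { doubled }
```

Assuming the script is saved as `main.nf`, running `nextflow run main.nf -dump-channels doubled` should print the items, while a plain `nextflow run main.nf` prints nothing for this channel.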
micans
@micans
Nov 29 2018 13:40
Super, yes I like the activation hook and even better that it has view semantics.
You may find my collectFile() pattern a bit too much :grin: I've stuck a dump in it now.
that sounds wrong
PhilPalmer
@PhilPalmer
Nov 29 2018 13:47

Hey,
I have the following process:

process no_reads {
    tag "${isochore_name}_${reads_name}.csv"

    input:
    set val(reads_name), file(aligned_reads) from aligned_reads_no_reads
    set val(isochore_name), file(isochore) from isochores

    output:
    file "*.csv" into csv

    script:
    """
     noReads.py $isochore $aligned_reads > ${isochore_name}_${reads_name}.csv
    """
}

I have multiple files for both $isochore and $aligned_reads and would like one process run for every possible combination. However, it currently iterates through both channels in step, so it does not cover all combinations, like so:
Submitted process > no_reads (SL2.31ch02_SRR346617.csv)
Submitted process > no_reads (SL2.31ch03_SRR346618.csv)
Is there any way of doing this while still launching separate processes? I could do it with bash logic, but then, as I understand it, they won't run in parallel.

KochTobi
@KochTobi
Nov 29 2018 13:48
@pditommaso It is possible. I recreated one s3 bucket and it works now. Guess there was sth messed up with the bucket permissions or sth similar.
micans
@micans
Nov 29 2018 13:51
@PhilPalmer Sounds like you can combine (https://www.nextflow.io/docs/latest/operator.html#combine) the two channels before your no_reads process.
PhilPalmer
@PhilPalmer
Nov 29 2018 13:52
Thanks @micans, I'll have a look
micans
@micans
Nov 29 2018 13:53
combine ... :+1:
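A sketch of that suggestion applied to the process above, assuming both channels emit `[name, file]` pairs. The channel name `reads_x_isochores` is made up for the example. `combine` emits the cross product of the two channels and flattens each pair of tuples into one four-element tuple, so a single `set` input covers every combination and each one still runs as its own task:

```groovy
// Every (reads, isochore) pair, flattened to
// [reads_name, aligned_reads, isochore_name, isochore]
aligned_reads_no_reads
    .combine(isochores)
    .set { reads_x_isochores }

process no_reads {
    tag "${isochore_name}_${reads_name}.csv"

    input:
    set val(reads_name), file(aligned_reads), val(isochore_name), file(isochore) from reads_x_isochores

    output:
    file "*.csv" into csv

    script:
    """
    noReads.py $isochore $aligned_reads > ${isochore_name}_${reads_name}.csv
    """
}
```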
Tobias "Tobi" Schraink
@tobsecret
Nov 29 2018 21:19
@pditommaso How easy is it to contribute to the docs? I have learned a couple of things about Nextflow, some of which were more or less easy to understand from the docs, and was thinking it'd be great to contribute. Do you prefer folks contributing to the docs or to the patterns folder?
Paolo Di Tommaso
@pditommaso
Nov 29 2018 21:27
That's welcome. Where to contribute depends: the docs are meant to be reference/user-guide material; the patterns are a collection of recipes/short how-to tutorials
Félix C. Morency
@fmorency
Nov 29 2018 21:52
@pditommaso been (quite!) a while since my 481 PR. will it get merged at some point?
naveen584
@naveen584
Nov 29 2018 22:07
Nextflow supports Kubernetes.
Can we run Nextflow directly on Docker Swarm and Kubernetes?