These are chat archives for nextflow-io/nextflow

26th
Apr 2018
Phil Ewels
@ewels
Apr 26 2018 06:34
This is awesome :clap:
Is there / will there be a -with-conda /path/to/environment.yml ?
Paolo Di Tommaso
@pditommaso
Apr 26 2018 06:35
oh
nope, but there could be one :)
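For context, the experimental conda support being discussed is exposed as a per-process directive. A minimal sketch, assuming the directive accepts an environment file (the file name and tool are illustrative):

```nextflow
process foo {
    // Illustrative: NF builds the conda env described by the file and
    // activates it for this task's execution.
    conda 'environment.yml'

    """
    samtools --version
    """
}
```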
Radoslaw Suchecki
@bioinforad_twitter
Apr 26 2018 07:19
What would be nextflow way to define input/output files for processes if a tool (biokanga) uses the file extension to determine the output format? Essentially I need .bam otherwise it outputs uncompressed SAM. If I specify the file explicitly, using something like params.out = "default.bam" and out = file(params.out) then I need cache = 'deep' or the alignment process gets re-run despite -resume.
Paolo Di Tommaso
@pditommaso
Apr 26 2018 07:47
@bioinforad_twitter not sure I understand the real problem here
you can have a process getting the bam file as input and producing bam as output, i.e.
process foo {
   input: file bam from in_ch
   output: file '*.bam' into out_ch
   """
   biokanga --in $bam
   """
}
it's true that the bam input file type is neither enforced nor checked by NF, so you have to be sure to provide the right input to it
eventually you can implement a validation check for that
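A minimal sketch of such a validation, assuming the inputs arrive through a channel (channel and parameter names are illustrative):

```nextflow
// Fail fast if an input lacks the .bam extension, before it ever
// reaches the alignment process.
in_ch = Channel.fromPath(params.bams)
    .map { f ->
        if( !f.name.endsWith('.bam') )
            throw new IllegalArgumentException("Expected a .bam input, got: ${f.name}")
        f
    }
```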
Radoslaw Suchecki
@bioinforad_twitter
Apr 26 2018 10:43
Should have made it clearer @pditommaso . It is just a toy example, see below, but the point is: the output file name extension determines the tool's output format. The question is: how to best handle that in agreement with the nextflow way of doing things, without making things too rigid and avoiding deep caching. A version of the code is available here: https://github.com/csiro-crop-informatics/reproducible_poc/blob/barebones/kangalign.nf
Radoslaw Suchecki
@bioinforad_twitter
Apr 26 2018 10:54
Skipping the unnecessary fluff, I was after biokanga align -o any_output_name.bam such that any_output_name.bam must have the .bam extension and will be available to downstream processes while not requiring cache = 'deep'
Radoslaw Suchecki
@bioinforad_twitter
Apr 26 2018 11:25
Thanks, works now. Using channels was the answer.
process kangaAlign {
    input:
    file r1
    file r2
    file db

    output:
    file 'out.bam' into bams

    """
    biokanga align \
    -i ${r1} \
    -u ${r2} \
    --sfx ${db} \
    -o out.bam \
    --pemode 2 \
    --substitutions 3 
    """
}
Paolo Di Tommaso
@pditommaso
Apr 26 2018 11:30
much better now, input/output must always be declared
Radoslaw Suchecki
@bioinforad_twitter
Apr 26 2018 11:31
process bamReIndex {
    input:
    file 'out.bam' from bams

    """
    samtools index out.bam
    """
}
Paolo Di Tommaso
@pditommaso
Apr 26 2018 11:32
ok, but don't forget that this will stage the input as out.bam, whatever the original file name is
Radoslaw Suchecki
@bioinforad_twitter
Apr 26 2018 11:35
So I guess in the second process no need to force it anymore
process bamReIndex {
    input:
    file bam from bams

    """
    samtools index $bam
    """
}
Paolo Di Tommaso
@pditommaso
Apr 26 2018 11:35
:+1:
Radoslaw Suchecki
@bioinforad_twitter
Apr 26 2018 11:35
Great, thanks @pditommaso.
Paolo Di Tommaso
@pditommaso
Apr 26 2018 11:36
you are welcome
Radoslaw Suchecki
@bioinforad_twitter
Apr 26 2018 11:40
btw I thought you were supposed to be on holiday, not that I am complaining, but is that your idea of holiday?
Paolo Di Tommaso
@pditommaso
Apr 26 2018 11:41
I would like to spend all my life on holiday, but unfortunately, at some point, holiday ends :grimacing:
Radoslaw Suchecki
@bioinforad_twitter
Apr 26 2018 11:42
:smile:
Ashley S Doane
@DoaneAS
Apr 26 2018 13:54

:point_up::point_up: experimental Conda support

I often use conda environments in nextflow pipelines by using bash scripts in ./bin that are called in a process, and including the relevant source activate <env> in the called bash script. Great if we can setup conda directly in the nextflow process!!
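A sketch of the wrapper pattern described above; the environment name, script name, and tool are all illustrative, and a conda installation is assumed to be on the PATH:

```bash
#!/usr/bin/env bash
# bin/run_tool.sh - activate a conda env before invoking the tool
set -euo pipefail
source activate mytool-env   # illustrative environment name
mytool --input "$1"          # illustrative tool invocation
```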

Paolo Di Tommaso
@pditommaso
Apr 26 2018 13:54
good
Ashley S Doane
@DoaneAS
Apr 26 2018 13:54
lol somehow it is quoting everything..
Paolo Di Tommaso
@pditommaso
Apr 26 2018 13:56
:)
Ashley S Doane
@DoaneAS
Apr 26 2018 13:58
@pditommaso any thoughts on the beforeScript issue?
Ashley S Doane
@DoaneAS
Apr 26 2018 14:04
it seems like nextflow must be clearing out some environment variables, as spack is normally in my path, allowing spack load <module> commands
Paolo Di Tommaso
@pditommaso
Apr 26 2018 14:24
never used spack, sorry
Simone Baffelli
@baffelli
Apr 26 2018 14:25
Is it possible that nextflow clean -before somename is not deleting everything it should?
I've only kept one specific run but the workdir keeps on getting larger
Paolo Di Tommaso
@pditommaso
Apr 26 2018 14:25
@DoaneAS I would suggest to make a dummy task and debug the script created by NF, running bash .command.run with your sysadmins
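In practice that debugging loop looks roughly like this (the work-directory hash is illustrative):

```bash
# Change into the task's work directory reported by Nextflow, then run
# the generated wrapper by hand to reproduce the environment NF sets up.
cd work/a1/b2c3d4e5f6        # illustrative task hash directory
bash .command.run
```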
Ashley S Doane
@DoaneAS
Apr 26 2018 14:26
ok, thanks! that is helpful :)
Paolo Di Tommaso
@pditommaso
Apr 26 2018 14:26
Is it possible that nextflow clean -before somename is not deleting everything it should?
it may happen if a task was used in a previous run
Simone Baffelli
@baffelli
Apr 26 2018 14:29
of course; but why should the workdir get bigger and bigger?
Those tasks that are not up to date anymore should be deleted, right?
Paolo Di Tommaso
@pditommaso
Apr 26 2018 14:29
yes
Simone Baffelli
@baffelli
Apr 26 2018 14:29
I don't understand how I might be causing a "storage leak"
Paolo Di Tommaso
@pditommaso
Apr 26 2018 14:29
me neither :)
Simone Baffelli
@baffelli
Apr 26 2018 14:30
I think I know which processes are causing it though
so maybe I can search for them in the log and manually delete these entries
I presume nextflow clean does not support this type of feature directly
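A manual workaround along those lines could use nextflow log to list the work directories of the offending tasks and delete them by hand; a sketch, where the run name and process pattern are illustrative:

```bash
# List work dirs for tasks whose process name matches the pattern,
# then remove them (double-check the list before deleting!).
nextflow log last -F 'process =~ /polCov.*/' -f workdir | while read -r dir; do
    rm -rf "$dir"
done
```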
Paolo Di Tommaso
@pditommaso
Apr 26 2018 14:32
no
Simone Baffelli
@baffelli
Apr 26 2018 14:33
But related to that I see something weird. Say I use nextflow log -F process =~ /polCov./. This does not show anything. However, if I combine it with -but or -before it shows some results
Paolo Di Tommaso
@pditommaso
Apr 26 2018 15:15
maybe nextflow log -F 'process =~ /polCov.*/'
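The quotes matter because the filter contains spaces and glob characters; a quick shell demonstration (the helper function is only for illustration):

```shell
# Unquoted, the shell word-splits the filter into several arguments
# (and could glob-expand the regex) before Nextflow ever sees it.
count_args() { echo $#; }
count_args process =~ /polCov.*/     # 3 separate arguments
count_args 'process =~ /polCov.*/'   # 1 intact argument
```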