These are chat archives for nextflow-io/nextflow

19th
Mar 2018
Brian Reichholf
@breichholf
Mar 19 2018 07:35

Hi there! I'm trying to debug an error, but can't determine why this is happening, and I'm not sure if it's nextflow or the cluster environment causing the error.
So, I have a pipeline that used to work fine, but (after updating nextflow) it breaks at one of the first steps:

bedtools: /lib64/libstdc++.so.6: version `GLIBCXX_3.4.21' not found (required by bedtools)
bedtools: /lib64/libstdc++.so.6: version `GLIBCXX_3.4.20' not found (required by bedtools)

Hunting down the error, these symbols are indeed missing from the referenced libstdc++.so.6, but(!) if I ssh to the supposedly faulty compute node and 'reproduce' the error with bash .command.run, the step works just fine. Any ideas on how I could narrow down the source further?

Paolo Di Tommaso
@pditommaso
Mar 19 2018 07:48
NF does not use that stuff, it looks like a problem in the env configuration
using bash .command.run you run it on the login node, but likely the issue is in the configuration of the compute node
to verify it you should use sbatch .command.run instead (or whatever your batch submit command is)
Brian Reichholf
@breichholf
Mar 19 2018 09:29
Good to know it's likely not on NF's side! Good suggestion with sbatch. I just logged in to the compute node via srun --pty bash and did bash .command.run, which I thought should mimic the env.
Luca Cozzuto
@lucacozzuto
Mar 19 2018 11:09
hello! Is it possible to make a file with a number of nextflow functions? Something to be easily included in different scripts?
Paolo Di Tommaso
@pditommaso
Mar 19 2018 11:10
what kind of functions ?
Luca Cozzuto
@lucacozzuto
Mar 19 2018 11:11
user defined functions
to avoid redundancy among different scripts
Maxime Garcia
@MaxUlysse
Mar 19 2018 11:13
:+1:
Paolo Di Tommaso
@pditommaso
Mar 19 2018 11:14
tasks to be executed or script helpers ?
Luca Cozzuto
@lucacozzuto
Mar 19 2018 11:15
?
things to be executed
imagine that I have a number of functions like:
mapping, indexing, SNP_calling, etc., and I store the command line, modifications etc. in a file that can be included from different scripts. I can think of using mapping and indexing in other pipelines too (like in RNAseq) without copy-pasting 95% of the code.
Paolo Di Tommaso
@pditommaso
Mar 19 2018 11:17
template
Luca Cozzuto
@lucacozzuto
Mar 19 2018 11:19
that is nice but I'm asking for something a bit different
def mappingPairs( pair_id, STARgenome, reads, cpus) { 
    """
        STAR --genomeDir ${STARgenome} \
                 --readFilesIn ${reads} \
                 --readFilesCommand zcat \
                 --outSAMunmapped None \
                 --outSAMtype BAM SortedByCoordinate \
                 --runThreadN ${cpus} \
                 --quantMode GeneCounts \
                 --outFileNamePrefix ${pair_id}

            mkdir STAR_${pair_id}
            mv ${pair_id}Aligned* STAR_${pair_id}/.
            mv ${pair_id}SJ* STAR_${pair_id}/.
            mv ${pair_id}ReadsPerGene* STAR_${pair_id}/.
            mv ${pair_id}Log* STAR_${pair_id}/.   
    """
}
I would like to store this function in a separated file that can be included in different NF pipelines
Can I do it?
Paolo Di Tommaso
@pditommaso
Mar 19 2018 11:24
yes, create a groovy helper class such as
class MyRecurrentCrap {

    static def mappingPairs( pair_id, STARgenome, reads, cpus) { 
        """
            STAR --genomeDir ${STARgenome} \
                     --readFilesIn ${reads} \
                     --readFilesCommand zcat \
                     --outSAMunmapped None \
                     --outSAMtype BAM SortedByCoordinate \
                     --runThreadN ${cpus} \
                     --quantMode GeneCounts \
                     --outFileNamePrefix ${pair_id}

                mkdir STAR_${pair_id}
                mv ${pair_id}Aligned* STAR_${pair_id}/.
                mv ${pair_id}SJ* STAR_${pair_id}/.
                mv ${pair_id}ReadsPerGene* STAR_${pair_id}/.
                mv ${pair_id}Log* STAR_${pair_id}/.   
        """
    }


}
save as MyRecurrentCrap.groovy in the lib/ path
then in the nextflow script use MyRecurrentCrap.mappingPairs( ... )
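A minimal sketch of what calling that helper from a process could look like (process, channel, and parameter names here are illustrative, not from the chat):

```groovy
// Hypothetical usage of the lib/ helper class inside a process script block.
// Assumes a 'read_pairs' channel of (pair_id, reads) tuples and a
// 'params.genomeDir' pointing at the STAR index.
process mapping {
    input:
    set pair_id, file(reads) from read_pairs

    output:
    file("STAR_${pair_id}") into star_out

    script:
    MyRecurrentCrap.mappingPairs(pair_id, params.genomeDir, reads, task.cpus)
}
```

Since the helper returns a GString, the process simply uses its return value as the task script.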
Luca Cozzuto
@lucacozzuto
Mar 19 2018 11:25
love "MyRecurrentCrap"
Paolo Di Tommaso
@pditommaso
Mar 19 2018 11:26
:joy:
Luca Cozzuto
@lucacozzuto
Mar 19 2018 11:29
actually I'm thinking that, done in a proper way, we could make BioNextflow (like bioPerl, bioPython...)
Maxime Garcia
@MaxUlysse
Mar 19 2018 11:34
Oooh, that's looking nice
Luca Cozzuto
@lucacozzuto
Mar 19 2018 11:34
so let's ask for volunteers :) here is the "empty" project
Maxime Garcia
@MaxUlysse
Mar 19 2018 11:37
@lucacozzuto Starred and forked ;-)
Luca Cozzuto
@lucacozzuto
Mar 19 2018 11:38
nice :)
Brian Reichholf
@breichholf
Mar 19 2018 12:50
@pditommaso all fixed, some problem on the cluster was unloading gcccore, which affected the environment. Some bug specific to this cluster, I guess. Thanks anyway!
Paolo Di Tommaso
@pditommaso
Mar 19 2018 13:00
:v:
gawells
@gawells
Mar 19 2018 14:36
Hi, is there a way to generate two channels from two different file globs (differing list sizes) that would be equivalent to iterating over each glob in a nested loop?
Paolo Di Tommaso
@pditommaso
Mar 19 2018 14:38
mm, I would say
ch1 = Channel.fromPath(glob1)
ch2 = Channel.fromPath(glob2)
but I guess I'm missing something
gawells
@gawells
Mar 19 2018 14:39
in bash it would be
for x in glob1; do for y in glob2; do command $x $y; done; done;
Paolo Di Tommaso
@pditommaso
Mar 19 2018 14:41
I see
don't think in terms of loops, that is taking you in the wrong direction
you can create two channels and combine them
I guess that's what you need
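For reference, a minimal sketch of the combine approach (glob patterns and the command are placeholders):

```groovy
// Cartesian product of two file globs, equivalent to the nested bash loops
// 'for x in glob1; do for y in glob2; do command $x $y; done; done'
ch1 = Channel.fromPath('glob1/*')
ch2 = Channel.fromPath('glob2/*')

ch1.combine(ch2).subscribe { x, y ->
    println "command $x $y"
}
```

The combine operator emits one tuple per pairing, so each (x, y) combination can feed a process input just as the nested loop would.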
gawells
@gawells
Mar 19 2018 14:43
ah thanks, I missed that
Mike Smoot
@mes5k
Mar 19 2018 18:51
I have a question for the nextflow community: Do people tend to put process specific configurations in the nextflow.config? In general, my nextflow.config is very simple and then I put cpu, queue, and other requirements that are specific to a process in the process block in my main.nf. I don't use the process.$name.whatever syntax in nextflow.config. I'm curious if this is what most people do or if what I'm doing is unusual?
Paolo Di Tommaso
@pditommaso
Mar 19 2018 18:53
IMO resources should always go in the config file
but I leave the answer to other people's experience
Mike Smoot
@mes5k
Mar 19 2018 18:54
Is the rationale for this that you can have different configs for different environments?
Paolo Di Tommaso
@pditommaso
Mar 19 2018 18:55
the rationale is decoupling the implementation from the resources config
Mike Smoot
@mes5k
Mar 19 2018 18:59
Hmmm. I'll have to think on that for a bit. I put most stuff in main.nf so that I have one place to look to understand how a process runs and what resources it demands.
Alexander Peltzer
@apeltzer
Mar 19 2018 19:53
At nf-core we decouple things as Paolo just mentioned by having several config files. Have a look at https://github.com/nf-core/methylseq/blob/master/nextflow.config for example
That way you can (without changing the main.nf script) run the same pipeline on various resources...
Luca Cozzuto
@lucacozzuto
Mar 19 2018 20:02
We do the same so that you can run the pipeline in different environments with different resources.
Using the variable ${task.cpus} lets you reference the allocated resources in the command line
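A sketch of what that decoupling looks like in nextflow.config, using the per-process $name syntax mentioned above (process names and resource values are illustrative):

```groovy
// nextflow.config sketch: defaults plus a per-process override,
// so main.nf never hard-codes cpus/memory/queue
process {
    executor = 'slurm'
    cpus = 1
    memory = '4 GB'

    $mapping {
        cpus = 8
        memory = '32 GB'
        queue = 'long'
    }
}
```

In main.nf the command line then just uses the allocated value, e.g. `--runThreadN ${task.cpus}`, and the same script runs unchanged on a different cluster with a different config.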
Mike Smoot
@mes5k
Mar 19 2018 20:07
Thanks @lucacozzuto and @apeltzer! Very helpful information.
Vladimir Kiselev
@wikiselev
Mar 19 2018 20:31
I second @apeltzer and @lucacozzuto, for me the whole point of using nextflow instead of other workflow tools is that it is the most mature and user-friendly for running on almost all possible environments (and new integrations keep coming). Having all your resources described in config makes your pipeline completely environment-independent.
Mike Smoot
@mes5k
Mar 19 2018 20:40
I guess we've had it easy since the same configuration has worked across both our environments for us.
Rohan Shah
@rohanshah
Mar 19 2018 20:59
I couldn't find it in the docs, but is there a way to call a "final" process when any other process in a workflow exits with a non-zero exit code? I know onError exists but that seems to only call internal code; I'd like to instead call some script.
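One possible approach, since the onError handler is ordinary Groovy code, is to shell out to an external script from inside it (the script name here is hypothetical):

```groovy
// Sketch: a workflow.onError handler in main.nf that runs an external
// cleanup script when any process fails. Groovy's String.execute()
// spawns the command as a subprocess.
workflow.onError {
    def proc = "bash cleanup_on_fail.sh".execute()
    proc.waitFor()
    println "Cleanup script exited with status ${proc.exitValue()}"
}
```

This is not a true "final process" (it runs on the head node, outside the executor), so heavyweight work would still be better modelled some other way.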