These are chat archives for nextflow-io/nextflow

23rd
Mar 2018
Maxime Garcia
@MaxUlysse
Mar 23 2018 12:30 UTC
Hi, I'm now using at least 3 different scripts in our pipeline, which means that I now have 3 different report, trace and timeline files. Is there already a way to merge them together?
Paolo Di Tommaso
@pditommaso
Mar 23 2018 12:34 UTC
no
Evan Floden
@evanfloden
Mar 23 2018 12:34 UTC
@MaxUlysse + @pditommaso Kind of related, we are looking at how to perform an operation on trace file onComplete. @edgano is checking out options today. In this case you could just cat them together.
Maxime Garcia
@MaxUlysse
Mar 23 2018 12:35 UTC
@skptic I agree that the trace files are simple enough that a simple cat would suffice
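One caveat with a plain cat: each trace file is a tab-separated table that starts with its own header line, so the headers would be repeated in the merged file. A minimal Python sketch that keeps a single header, assuming every file was written with the same fields in the same order (the function name and the sample rows are hypothetical):

```python
import csv
import io


def merge_traces(sources):
    """Concatenate tab-separated trace tables, keeping one header row.

    `sources` is an iterable of open text files (or any iterables of lines).
    Assumption: all inputs share the same columns in the same order.
    """
    header = None
    rows = []
    for src in sources:
        reader = csv.reader(src, delimiter="\t")
        first = next(reader, None)
        if first is None:
            continue  # skip empty inputs
        if header is None:
            header = first
        elif first != header:
            raise ValueError("trace files have different columns")
        # Keep every non-blank data row after the header.
        rows.extend(row for row in reader if row)
    return header, rows


# Hypothetical sample data standing in for two trace files.
trace_a = io.StringIO("task_id\tname\tstatus\n1\talign (1)\tCOMPLETED\n")
trace_b = io.StringIO("task_id\tname\tstatus\n1\tdedup (1)\tCOMPLETED\n")

header, rows = merge_traces([trace_a, trace_b])
```

With real files, the same function can be fed open file handles instead of the in-memory samples.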
Evan Floden
@evanfloden
Mar 23 2018 12:35 UTC
@pditommaso Can you access the trace object (if there is such a thing) from NF?
Maxime Garcia
@MaxUlysse
Mar 23 2018 12:36 UTC
@pditommaso I'm guessing the need for something like that will be null when we have the submodules
Paolo Di Tommaso
@pditommaso
Mar 23 2018 12:36 UTC
not sure
Maxime Garcia
@MaxUlysse
Mar 23 2018 12:40 UTC
So it might be of interest to have an option to concatenate report/trace/timeline if wanted within NF
Shall I make an issue?
Evan Floden
@evanfloden
Mar 23 2018 12:41 UTC
I would want it to be more general, i.e. perform any operation on trace files.
Maxime Garcia
@MaxUlysse
Mar 23 2018 12:42 UTC
Which type of operation would you want?
Evan Floden
@evanfloden
Mar 23 2018 12:42 UTC
I need to filter with grep, cut and sort for example.
Maxime Garcia
@MaxUlysse
Mar 23 2018 12:43 UTC
You can already specify which fields you want in the trace report
But I can see why you want to grep and sort your trace report
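For reference, the grep / cut / sort combination mentioned above can be sketched in a few lines of Python. This is only an illustration: the function name is hypothetical, and the column names (`task_id`, `name`, `status`) and rows are sample data that assume a tab-separated trace file with a header line.

```python
import csv
import io


def select_rows(lines, keep_columns, where, sort_by):
    """Filter rows (grep), sort them (sort), and keep some columns (cut)."""
    reader = csv.DictReader(lines, delimiter="\t")
    kept = [row for row in reader if where(row)]
    # Values are strings, so this is a lexicographic sort.
    kept.sort(key=lambda row: row[sort_by])
    return [[row[col] for col in keep_columns] for row in kept]


# Hypothetical sample data standing in for a trace file.
trace = io.StringIO(
    "task_id\tname\tstatus\n"
    "1\tsort_bam (2)\tCOMPLETED\n"
    "2\talign (1)\tFAILED\n"
    "3\talign (2)\tCOMPLETED\n"
)

result = select_rows(
    trace,
    keep_columns=["name", "task_id"],
    where=lambda row: row["status"] == "COMPLETED",
    sort_by="name",
)
```

Numeric columns would need converting before sorting, since every value is read as a string.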
Edgar
@edgano
Mar 23 2018 13:13 UTC
I am not able (yet) to access the trace from a process in NF
Paolo Di Tommaso
@pditommaso
Mar 23 2018 13:18 UTC
if you know the file name, you should be able to get it (to do what?)
Evan Floden
@evanfloden
Mar 23 2018 13:21 UTC
Do you really want to ask :wink:?
Paolo Di Tommaso
@pditommaso
Mar 23 2018 13:21 UTC
yes, maybe better not to know
Evan Floden
@evanfloden
Mar 23 2018 13:25 UTC
Too late! Given a file trace.txt and a file with the sequence lengths, I’m currently doing this.
Paolo Di Tommaso
@pditommaso
Mar 23 2018 13:27 UTC
jesus christ .. ! I think it's time you learn some groovy code
Edgar
@edgano
Mar 23 2018 13:29 UTC
:smile:
Maxime Garcia
@MaxUlysse
Mar 23 2018 13:35 UTC
OMG
Luca Cozzuto
@lucacozzuto
Mar 23 2018 13:40 UTC
I'm wondering why multiqc cannot read Nextflow trace file :)
Shawn Rynearson
@srynobio
Mar 23 2018 15:40 UTC
I have a question about file collection before executing a process. Say I have a process that requires the collection and completion of a number of files before execution, similar to merging bam files. I've been reading over the docs (i.e. operators), and I haven't found a clear solution. One method I was considering was using collect, but the docs state that it would flatten all entries down to a list, which could possibly trigger processes to restart if a pipeline has to be resumed, because I've reassigned files to a list or array. I know collect() also accepts optional closures, so could this be overcome by adding something like: .map{ path -> ( file(path))}
Mike Smoot
@mes5k
Mar 23 2018 15:44 UTC
Not sure I'm totally following your question, but if you're considering collect but are worried about resuming the pipeline, I've found that toSortedList helps because all inputs to the downstream process are in a consistent order.
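Why a consistent order could matter on resume can be illustrated with a toy fingerprint. This is not Nextflow's actual caching code, only a sketch that assumes a cached task is identified by a hash computed over its inputs in the order they arrive; the function and file names are hypothetical.

```python
import hashlib


def input_fingerprint(paths):
    """Toy stand-in for a cache key: an order-sensitive hash of the inputs."""
    digest = hashlib.sha256()
    for path in paths:
        digest.update(path.encode("utf-8"))
        digest.update(b"\0")  # separator so adjacent names cannot run together
    return digest.hexdigest()


# The same three files, emitted in a different order on two runs.
run1 = ["b.sorted.bam", "a.sorted.bam", "c.sorted.bam"]
run2 = ["a.sorted.bam", "c.sorted.bam", "b.sorted.bam"]

# Arrival order differs, so the fingerprints differ.
unordered_match = input_fingerprint(run1) == input_fingerprint(run2)

# Sorting first gives the same fingerprint on both runs.
sorted_match = input_fingerprint(sorted(run1)) == input_fingerprint(sorted(run2))
```

Under that assumption, a sorted list gives the downstream task the same key on every run, while an unsorted one may not.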
Shawn Rynearson
@srynobio
Mar 23 2018 15:47 UTC
@mes5k they are, but when you use any of the list options on files I believe you change your input from being files to an array, which would trigger upstream processes to run again on resume.
Mike Smoot
@mes5k
Mar 23 2018 15:51 UTC
Hmmm, I'm not sure that's been my experience, but I'd need to write a test program to verify.
Shawn Rynearson
@srynobio
Mar 23 2018 15:52 UTC
To help explain what I'm getting at, here is an image. Say I have a number of different sorted.bam files to remove dups from and I've collected them using the toSortedList operator, but 1..n of the dup processes fail and a resume is needed to reprocess. All the previous alignment steps would rerun to create the list which is fed into the dup process.
Shawn Rynearson
@srynobio
Mar 23 2018 15:59 UTC
This can get expensive on a large number of input files, so I'm essentially looking for the best way to collect but maintain them as a collection of files.
Mike Smoot
@mes5k
Mar 23 2018 16:00 UTC
Not sure I understand how "1..n number of dup processes fail". If you're using toSortedList between "alignment sorting" and "remove duplicates" then "remove duplicates" will only run once on the entire list of sorted bams, because toSortedList consumes the entire channel. Is that how you understand it?
Shawn Rynearson
@srynobio
Mar 23 2018 16:02 UTC
Sorry, you're right the "remove duplicate" would only run once.
but if it fails, would it reprocess the "alignment" step again to generate the list, or understand they are files and simply collect them?
Sorry if this is unclear.
Mike Smoot
@mes5k
Mar 23 2018 16:05 UTC
I don't think toSortedList (or any operator for that matter) would impact any upstream processes. All it should care about is the channel it consumes. So I think you'd be alright. In any case, I'd recommend writing a simple test program to verify this for yourself with dummy processes that run quickly.
Shawn Rynearson
@srynobio
Mar 23 2018 16:05 UTC
Okay, based on this I'll do some more testing. thanks @mes5k!
jncvee
@jncvee
Mar 23 2018 17:10 UTC
If I am using Python in Nextflow, how do I use the input?
Mike Smoot
@mes5k
Mar 23 2018 17:44 UTC
@jncvee here's a very simple python example:
Channel.from(1..10).set{ homer }

process doPython {

    input:
    val(x) from homer

    output:
    stdout into results

    script:
    """
    #!/usr/bin/env python
    print(${x})
    """
}

results.view()
That will print out the numbers 1-10 in whatever order the OS schedules the processes that nextflow starts.
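The key point in the example above is that `${x}` is filled in before Python ever sees the script, so each task runs a script with a literal value in it. A rough Python imitation of that substitution step, using `string.Template` as a stand-in for Nextflow's own string interpolation (the `sample_A` value is hypothetical):

```python
from string import Template

# Stand-in for the script block of the process above.
script_template = Template("print(${x})")

# One rendered script per input value, as each task would receive it.
rendered = [script_template.substitute(x=value) for value in range(1, 4)]

# A string input has to be quoted in the template, otherwise the rendered
# script would contain a bare name instead of a string literal.
quoted = Template('print("${name}")').substitute(name="sample_A")
```

So for numeric inputs the bare `${x}` is enough, while string inputs generally need quotes around the placeholder.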
jncvee
@jncvee
Mar 23 2018 17:57 UTC
okay thank you !
jncvee
@jncvee
Mar 23 2018 18:22 UTC
"To enable the SLURM executor simply set the process.executor property to slurm value in the nextflow.config file." How do I do this?