These are chat archives for nextflow-io/nextflow

3rd
May 2017
mitul-patel
@mitul-patel
May 03 2017 10:24
Hello,
Is it possible to execute process inside def....
Phil Ewels
@ewels
May 03 2017 13:45
Hi @mitul-patel - Paolo may correct me here, but basically I think the answer is no
Why do you want to run a process inside a function?
@pditommaso - I have a question about Nextflow's AWS integration - specifically, S3 integration.
In your blog post you demo saving output data to an S3 bucket by just specifying the S3 url as the output directory (really nice!)
Can you use S3 buckets for input data in the same way? If so, how does this integration work? Presumably it has to sync the files to the local filesystem first? Do I need to do anything special?
Use case - I've created a huge S3 bucket full of loads of reference genomes and indices (in collaboration with AWS Research group). Our RNA pipeline has a config file specifying the locations of reference genomes: https://github.com/SciLifeLab/NGI-RNAseq/blob/master/conf/uppmax.config#L113-L131
I'm wondering if I can create an AWS profile with config that looks just the same, but with s3:// paths instead..
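For concreteness, a sketch of what such a profile could look like (bucket name and paths are made up; this assumes the same genome config keys as the linked uppmax.config):

```nextflow
// Hypothetical AWS profile: same shape as the uppmax genome config,
// but with s3:// URLs in place of local filesystem paths.
params {
  genomes {
    'GRCh37' {
      star  = 's3://my-reference-bucket/GRCh37/STARIndex/'
      fasta = 's3://my-reference-bucket/GRCh37/genome.fa'
      gtf   = 's3://my-reference-bucket/GRCh37/genes.gtf'
    }
  }
}
```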
Michael L Heuer
@heuermh
May 03 2017 13:56
@ewels Based on the thread above :point_up: April 25, 2017 3:13 PM I believe the answer is yes. I haven't had a chance to try it yet though.
Phil Ewels
@ewels
May 03 2017 13:59
Awesome! And to check - if I have ten processes using the same s3 address, it won't download the files ten times, right?
Michael L Heuer
@heuermh
May 03 2017 14:04
Now that is a good question :)
Phil Ewels
@ewels
May 03 2017 14:07
🤑
Shellfishgene
@Shellfishgene
May 03 2017 14:21
@pditommaso : Hi! A few weeks back you offered to adapt nextflow to our queuing system, NQSII, which is similar to SGE. If you're still up for that, is there a list of commands nextflow uses and parses the output of? So I can send the command names and output formats used in NQSII.
And to check - if I have ten processes using the same s3 address, it won't download the files ten times, right?
this depends
I mean, if you use the local or grid executor they are downloaded in any case; when using the Ignite distributed executor they are downloaded only once per node
Paolo Di Tommaso
@pditommaso
May 03 2017 15:43
That said, this is a feature undergoing major refactoring to improve caching and avoid unneeded data replication
see nextflow-io/nextflow#265
Phil Ewels
@ewels
May 03 2017 15:45
Ok, good to know. I guess it would be good to minimise the downloads in this case as a reference index can be quite a bit of data
Paolo Di Tommaso
@pditommaso
May 03 2017 15:45
exactly
Phil Ewels
@ewels
May 03 2017 15:45
So for now would I be better off adding a dummy process at the start of the pipeline which downloads the files and passes them on as a regular output channel?
Paolo Di Tommaso
@pditommaso
May 03 2017 15:46
I guess so
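The dummy-process workaround mentioned above might look roughly like this (bucket name, file name, and the use of the AWS CLI on the compute node are all illustrative assumptions, not a confirmed pattern):

```nextflow
// Sketch: download the reference once in a staging process, then pass
// it downstream as a regular output channel instead of an s3:// path.
process stage_genome {
    output:
    file 'genome.fa' into genome_ch

    script:
    """
    aws s3 cp s3://my-reference-bucket/GRCh37/genome.fa genome.fa
    """
}
// downstream processes take their input from genome_ch
```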
@Shellfishgene ok, I will need one or more examples of the submit command line and a dump of the output of the qstat command
Shellfishgene
@Shellfishgene
May 03 2017 15:48
@pditommaso ok, does nextflow submit job files with the options, or all on the command line?
Paolo Di Tommaso
@pditommaso
May 03 2017 15:48
job files
you may want to try to implement it by using the SgeExecutor as template and opening a pull request
Shellfishgene
@Shellfishgene
May 03 2017 15:50
I looked at that, it would probably take me days to do that. I don't understand half the things that are going on.
Paolo Di Tommaso
@pditommaso
May 03 2017 15:51
well, the main method is getDirectives
which returns more or less the cluster command line options
the other relevant part is the parseQueueStatus
it takes a string like this
        job-ID  prior   name       user         state submit/start at     queue                          slots ja-task-ID
        -----------------------------------------------------------------------------------------------------------------
        7548318 0.00050 nf-exonera pditommaso   r     02/10/2014 12:30:51 long@node-hp0214.linux.crg.es      1
        7548348 0.00050 nf-exonera pditommaso   r     02/10/2014 12:32:43 long@node-hp0204.linux.crg.es      1
        7548349 0.00050 nf-exonera pditommaso   hqw   02/10/2014 12:32:56 long@node-hp0303.linux.crg.es      1
        7548904 0.00050 nf-exonera pditommaso   qw    02/10/2014 13:07:09                                    1
        7548960 0.00050 nf-exonera pditommaso   Eqw   02/10/2014 13:08:11                                    1
and fetch the job id and status
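As a rough Groovy sketch of what such a parser does with a dump like the one above (simplified for illustration; Nextflow's actual executor code differs):

```groovy
// Simplified qstat parser: skip the header and separator lines,
// then map each job id (column 1) to its state string (column 5).
static Map<String, String> parseQueueStatus(String text) {
    def result = [:]
    text.eachLine { String line ->
        def cols = line.trim().split(/\s+/)
        // data rows start with a numeric job id
        if( cols.size() >= 5 && cols[0].isNumber() )
            result[cols[0]] = cols[4]
    }
    return result
}
// e.g. for the dump above: result['7548318'] == 'r', result['7548960'] == 'Eqw'
```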
you can see it in action in this test
I would say start by opening a feature request on the GitHub repo specifying 1) the system/cluster name, 2) a submit command line example, 3) the queue status command, and 4) an output example
Shellfishgene
@Shellfishgene
May 03 2017 15:56
@pditommaso Ok, maybe I'll have another shot at it myself. Before I start, however, one problem I ran into with NQSII when adapting bpipe: our qstat command does not show any history, so once a job has finished or crashed, qstat no longer shows any info on it. Is this a problem?
Paolo Di Tommaso
@pditommaso
May 03 2017 15:56
no, most batch schedulers work this way (unfortunately)
Shellfishgene
@Shellfishgene
May 03 2017 15:56
Ok
How does nextflow know if the job crashed or was successful? Does it append && touch success.txt or something like that?
Paolo Di Tommaso
@pditommaso
May 03 2017 15:58
more or less, it creates a .exitcode file containing the job exit status
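In other words, the generated job wrapper records the exit status itself, so Nextflow never needs qstat history. A minimal sketch of the idea (the real .command.run wrapper does considerably more):

```shell
# Stand-in for the task's script, purely for illustration:
printf 'exit 3\n' > .command.sh

# The wrapper runs the task, then saves its exit status to a file
# that Nextflow can poll long after the scheduler forgets the job:
bash .command.sh
echo $? > .exitcode

cat .exitcode    # prints: 3
```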