These are chat archives for nextflow-io/nextflow

4th Dec 2018
Alexander Peltzer
@apeltzer
Dec 04 2018 10:16
Is there an easy way to rename an input file?
I can do a mv, but that doesn't feel correct ;-)
Paolo Di Tommaso
@pditommaso
Dec 04 2018 10:16
input:
file 'foo' from your_ch
Alexander Peltzer
@apeltzer
Dec 04 2018 10:22
Can I add a file extension then too?
like file 'foo.bar' from your_ch ?
(the tool requires fasta/fmt7 input -.-)
Paolo Di Tommaso
@pditommaso
Dec 04 2018 10:23
whatever you want, that's supposed to be the file name used to stage the input file
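for example, a minimal sketch (the process name, channel name, and tool are made up):
// hypothetical process: whatever file arrives on sequences_ch is staged as 'query.fasta'
process runTool {
    input:
    file 'query.fasta' from sequences_ch

    script:
    """
    some_tool --input query.fasta
    """
}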
Alexander Peltzer
@apeltzer
Dec 04 2018 11:42
Thanks a bunch, I figured it out and learned something new today :-D
Paolo Di Tommaso
@pditommaso
Dec 04 2018 11:43
I have a vague memory that you already asked this in the past :D
but I may be wrong ..
Alexander Peltzer
@apeltzer
Dec 04 2018 11:44
Not sure about that either ;-) But I forgot you could rename a file and use a variable at the same time, and I always have a tab of Nextflow's docs open ;-)
Anyhow, now I know how to :-)
Paolo Di Tommaso
@pditommaso
Dec 04 2018 11:45
:+1:
Félix C. Morency
@fmorency
Dec 04 2018 14:14
@pditommaso thanks. You might want to change the website: "It can be used on any POSIX compatible system (Linux, Solaris, OS X, etc)." :)
Paolo Di Tommaso
@pditommaso
Dec 04 2018 14:14
ahah, true :grin:
also Solaris .. I wonder if it still exists
cwytko
@cwytko
Dec 04 2018 16:08
For the join operator on channels, do both channels need to be done populating?
Paolo Di Tommaso
@pditommaso
Dec 04 2018 16:10
"need to be done populating"
what do you mean?
cwytko
@cwytko
Dec 04 2018 16:14
Right now I have two channels that are maps of experiments to runs from NCBI. One channel is the local set, i.e. the runs are already local for the given experiments. The other channel is the remote set that I will be downloading from NCBI. As the downloading occurs, I want to merge each complete remote experiment map with its local counterpart in the local channel, without having to wait for all the other runs in the remote channel to be downloaded with their experiments.
Paolo Di Tommaso
@pditommaso
Dec 04 2018 16:16
the join makes no difference about local/remote
remote files are downloaded when the task is executed (though it may depend on which executor you are using)
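for the record, join matches items by a common key and emits as soon as a matching pair arrives; a minimal sketch (channel contents here are made up):
// each channel emits [key, value] tuples; join matches on the first element by default
local_ch  = Channel.from( ['exp1', 'local run A'], ['exp2', 'local run B'] )
remote_ch = Channel.from( ['exp1', 'ftp://example.org/exp1.sra'], ['exp2', 'ftp://example.org/exp2.sra'] )

// emits e.g. ['exp1', 'local run A', 'ftp://example.org/exp1.sra'] as soon as both sides have the key
local_ch.join(remote_ch).println()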
cwytko
@cwytko
Dec 04 2018 16:18
Ah, so is it the relationship between the process doing the downloading and the remote channel that I need to focus on to get the 'experiment chunk' behavior I'm looking for?
As in: as soon as one map in the remote channel from the downloading process is complete, I merge it into the local channel map
Paolo Di Tommaso
@pditommaso
Dec 04 2018 16:20
there's no such concept as a remote channel; a channel contains objects, which can be remote file URIs
however, if you apply an operation to it, the file is automatically downloaded
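e.g. (hypothetical URL):
// the channel item is just a URI; the actual download happens when
// a task that consumes this file is executed
reads_ch = Channel.fromPath('https://example.org/data/reads.fastq.gz')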
cwytko
@cwytko
Dec 04 2018 16:22
I'm explaining myself badly then; I mean that the channels are labelled local and remote.
Yes, I want to apply the operation of downloading the file.
The crux of my situation is that my REMOTE_EXP channel gets huge on our SLURM system while trying to download all of the files before joining. I want a flow where, as experiments are completed, they are processed by the other processes further down the workflow.
Paolo Di Tommaso
@pditommaso
Dec 04 2018 16:25
what gets huge? The number of jobs submitted to the Slurm cluster?
cwytko
@cwytko
Dec 04 2018 16:38
No, the COMBINED_SAMPLES channel, which contains all of the runs/sample data of our work before we are able to run the other tools on it (https://github.com/SystemsGenetics/GEMmaker/blob/master/main.nf#L88-L91) & (https://github.com/SystemsGenetics/GEMmaker/blob/master/main.nf#L126-L131).
When I try to run the big experiment, we get some of the samples but no other results from the subsequent processes, and then the whole thing hangs.
Is there a better way to debug this?
And should I set a hard limit on the number of concurrent Slurm processes in the nextflow.config so that that number of processes is guaranteed to go through the whole workflow?
Paolo Di Tommaso
@pditommaso
Dec 04 2018 16:41
You can set executor.queueSize to define the max number of jobs NF can submit to Slurm
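e.g. in nextflow.config (the value 50 is just an example):
// nextflow.config
process.executor = 'slurm'
executor.queueSize = 50   // NF keeps at most 50 jobs queued/running on Slurm at a time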
Cedric
@Puumanamana
Dec 04 2018 20:59
Hi everyone. Do you know the best way to handle channels when a process can have a variable number of output files (one or two in my case)? I have trouble handling my output (which has the shape set val(id), file("${id}*.fastq")): when there are 2 outputs, I can use the .get() method in the next process, but not when I only have one output.
Tobias "Tobi" Schraink
@tobsecret
Dec 04 2018 22:28
@Puumanamana could you share your code?
The way you can imagine this is that it will give your next process one of the following: [val(id), [file(1), file(2)]] or [val(id), file(1)]
I have had a similar issue in the past, where I needed to scale up a process's memory requirements based on the number of files in a file object
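one common workaround (just a sketch, the channel names are made up) is to normalize the tuple so the file element is always a list:
// OUT_CH emits [id, file] or [id, [file1, file2]] depending on the task;
// wrap a single file in a list so downstream code always sees the same shape
NORM_CH = OUT_CH.map { id, files ->
    [ id, files instanceof List ? files : [files] ]
}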