These are chat archives for nextflow-io/nextflow

15th
May 2017
chdem
@chdem
May 15 2017 08:45
Hello! I need to launch a process 'B' that merges all the files generated by the previous process 'A'... so process B can only start once all instances of process A have finished (and all the files are generated)...
I cannot find any way to do that :worried:
Phil Ewels
@ewels
May 15 2017 09:51
Hi @chdem - add .collect() to your input channel on process B
You can see an example of it in action in our RNA pipeline: https://github.com/SciLifeLab/NGI-RNAseq/blob/master/main.nf#L760-L777
we're merging a bunch of results files in that process - $input_files becomes a list of input filenames when the command is run
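For example, something like this (a minimal sketch; the process and channel names are illustrative):

    // each A task emits one file into the a_out channel
    process A {
        input:
        val x from Channel.from(1, 2, 3)

        output:
        file "out_${x}.txt" into a_out

        script:
        """
        echo ${x} > out_${x}.txt
        """
    }

    // .collect() gathers every emission into a single list,
    // so B runs exactly once, after all A tasks have finished
    process B {
        input:
        file input_files from a_out.collect()

        script:
        """
        cat ${input_files} > merged.txt
        """
    }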
Phil Ewels
@ewels
May 15 2017 12:02
..any luck @chdem?
@pditommaso / others - I've written up some docs (including an example Nextflow config file) for the s3 iGenomes resource that I was playing with the other week. Available here: https://github.com/ewels/AWS-iGenomes
Paolo Di Tommaso
@pditommaso
May 15 2017 12:07
very interesting, well done
Phil Ewels
@ewels
May 15 2017 12:09
if anyone gets a chance to have a play with them and has feedback, that'd be great!
Paolo Di Tommaso
@pditommaso
May 15 2017 12:11
sure thing, something interesting to discuss in September: how to improve/optimise NF support for this use case
Phil Ewels
@ewels
May 15 2017 12:11
exactly :+1:
chdem
@chdem
May 15 2017 13:10
Thanks @ewels, I'm going to test it!
Félix C. Morency
@fmorency
May 15 2017 14:51

I've been using

    errorStrategy 'retry'
    maxRetries 6
    maxErrors 6

but it seems the execution is only retried once... any idea?

I have "Execution is retried (1)"
Paolo Di Tommaso
@pditommaso
May 15 2017 15:30
Can you share the full log?
Félix C. Morency
@fmorency
May 15 2017 15:32
Yes, sec
Félix C. Morency
@fmorency
May 15 2017 17:18
@pditommaso is there a built-in mechanism in NF to produce a dummy "DONE" file at the end of each process? Sometimes, when an interruption occurs as a process finishes, incomplete output files are left on disk
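Something like this would do it (a sketch, not from the actual pipeline; the process and tool names are made up). The sentinel is touched as the last command, so it only appears if the commands before it succeeded, assuming the task script aborts on the first error (as Nextflow's bash wrapper normally does):

    process foo {
        output:
        file 'result.txt' into results
        file 'DONE' into done_ch

        script:
        """
        some_long_running_tool > result.txt
        touch DONE
        """
    }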
Paolo Di Tommaso
@pditommaso
May 15 2017 17:24
only when the process is successfully executed ?
Félix C. Morency
@fmorency
May 15 2017 17:24
yes
Paolo Di Tommaso
@pditommaso
May 15 2017 17:25
the trace file won't help ?
Félix C. Morency
@fmorency
May 15 2017 17:25
the -resume seems to detect that the file exists in the work directory and launches the next process
Paolo Di Tommaso
@pditommaso
May 15 2017 17:26
not clear
Félix C. Morency
@fmorency
May 15 2017 17:27
what happens if an interruption occurs during the write of a (big) output file?
Paolo Di Tommaso
@pditommaso
May 15 2017 17:27
the process work dir is discarded in the next resumed execution
Félix C. Morency
@fmorency
May 15 2017 17:29
Mmm I definitely had cases where it was not. The NAS mount went down and an output file was only partly written. I killed NF, resumed the execution, and it started the next process using the incomplete input file
Paolo Di Tommaso
@pditommaso
May 15 2017 17:30
looks weird, what was the .exitcode ?
Félix C. Morency
@fmorency
May 15 2017 17:30
I will look at it next time it happens
Paolo Di Tommaso
@pditommaso
May 15 2017 17:31
if the job was killed or crashed it must contain a non-zero exit status
Félix C. Morency
@fmorency
May 15 2017 17:37
I see
Bili Dong
@qobilidop
May 15 2017 20:10

I have a weird issue with toAbsolutePath(). For the input I have

    input:
    file snap from snaps.flatten()

Then in the script I use snap.toAbsolutePath(), which should yield something like /A/B/C/snap1, but instead yields /A/snap1. I have checked the work dir; the file does link to /A/B/C/snap1.

One thing that seems related is how I get snaps. It comes from another process as

    output:
    file 'B/C/snap*' into snaps

Somehow the B/C/ part gets eaten. I still have no idea how to fix it.

Paolo Di Tommaso
@pditommaso
May 15 2017 20:12
can I see the complete error message?
Bili Dong
@qobilidop
May 15 2017 20:14
ERROR ~ Error executing process > 'processSnapshot (21)'

Caused by:
  Process `processSnapshot (21)` terminated with an error exit status (1)

Command executed:

  process_snapshot.py -i /home/b2dong/repos/toy-csa-nf/data0019 -o data0019.png

Command exit status:
  1

Command output:
  (empty)

Command error:
  yt : [ERROR    ] 2017-05-15 13:00:57,941 None of the arguments provided to load() is a valid file
  yt : [ERROR    ] 2017-05-15 13:00:57,941 Please check that you have used a correct path
  Traceback (most recent call last):
    File "/home/b2dong/repos/toy-csa-nf/bin/process_snapshot.py", line 16, in <module>
      ds = yt.load(str(args.i))
    File "/home/b2dong/miniconda/envs/toy-csa-nf/lib/python3.6/site-packages/yt/convenience.py", line 76, in load
      raise YTOutputNotIdentified(args, kwargs)
  yt.utilities.exceptions.YTOutputNotIdentified: Supplied ('/home/b2dong/repos/toy-csa-nf/data0019',) {}, but could not load!
Basically it says the file does not exist.
The reason I want to get the original path of the input file is that there are some other files in the original directory that might be needed to process this file.
Paolo Di Tommaso
@pditommaso
May 15 2017 20:16
I see, you can't use toAbsolutePath for that
Bili Dong
@qobilidop
May 15 2017 20:16
Is there a way to do that?
Paolo Di Tommaso
@pditommaso
May 15 2017 20:17
use the entire directory as the process input instead of a single file
Bili Dong
@qobilidop
May 15 2017 20:21
I have considered that. One issue is that I want to process each A/snap* individually, so I was considering passing both A and A/snap* to a channel. But I haven’t figured out how to achieve it.
Paolo Di Tommaso
@pditommaso
May 15 2017 20:22
are they produced by the same process ?
(I guess so)
Bili Dong
@qobilidop
May 15 2017 20:23
yes, the previous process downloads the data and passes A/snap* on for the following process to work on
i guess i can get the absolute path in the script?
Paolo Di Tommaso
@pditommaso
May 15 2017 20:24
you can write
Bili Dong
@qobilidop
May 15 2017 20:24
that works for me
so i know i shouldn’t use toAbsolutePath
Paolo Di Tommaso
@pditommaso
May 15 2017 20:24
process foo {
    output:
    file 'A' into ch_a
    file 'A/snap*' into ch_snap
}
but I still don't see the reason for doing that
as long as you have the A folder
you can access the A/snap* files from the downstream process
Bili Dong
@qobilidop
May 15 2017 20:26
yes that’s true
but i want to assign each A/snap* to an individual process
so it will go into another process and the same issue occurs
Paolo Di Tommaso
@pditommaso
May 15 2017 20:28
but i want to assign each A/snap* to an individual process
not sure to understand
Bili Dong
@qobilidop
May 15 2017 20:29
ideally i’d like to have an input variable pointing to A/snap1 while passing A to the channel
and another variable pointing to A/snap2 in another process
Paolo Di Tommaso
@pditommaso
May 15 2017 20:31
is the number of snap files known ?
Bili Dong
@qobilidop
May 15 2017 20:31
not known in advance
Paolo Di Tommaso
@pditommaso
May 15 2017 20:32
ah, if so how do you map them to specific processes ?
Bili Dong
@qobilidop
May 15 2017 20:33
so in one process I output A/snap* to a channel and then process them individually
Paolo Di Tommaso
@pditommaso
May 15 2017 20:35
um, what about using the same process, dynamically changing the task command depending on the actual snap_n file?
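Something along these lines, perhaps (a rough sketch; the tool names are made up):

    process processSnap {
        input:
        file snap from snaps.flatten()

        script:
        // pick the command based on the actual file name
        if( snap.name.endsWith('.a') )
            """
            process_variant_a ${snap}
            """
        else
            """
            process_default ${snap}
            """
    }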
Bili Dong
@qobilidop
May 15 2017 20:36
i guess i should show the complete code at this point
params.sim = 'Enzo_64'
params.snap_template = 'DD*/data????'

sim = params.sim
snap_template = params.snap_template
data_dir = 'data'
results_dir = "results/${sim}"

process downloadSimulation {
    storeDir data_dir

    output:
    file "${sim}"
    file "${sim}/${snap_template}" into snaps

    script:
    """
    wget http://yt-project.org/data/${sim}.tar.gz
    tar xzf ${sim}.tar.gz
    """
}

process processSnapshot {
    storeDir results_dir

    input:
    file snap from snaps.flatten()

    output:
    file "${snap.baseName}.png" into plots

    script:
    """
    process_snapshot.py -i ${snap.toAbsolutePath()} -o ${snap.baseName}.png
    """
}
Paolo Di Tommaso
@pditommaso
May 15 2017 20:37
better :)
Bili Dong
@qobilidop
May 15 2017 20:39
The issue is that if I use ${snap} in the last script, it won’t work, because along with DD*/data????, there are DD*/data????.a, DD*/data????.b and so on
I could pass a set of files
but a further issue is that I don’t know if they exist in advance
for this dataset, yes
but I’d like it to work with other datasets
Paolo Di Tommaso
@pditommaso
May 15 2017 20:40
um, but processSnapshot should process all DD*/data???? files altogether
or one by one ?
Bili Dong
@qobilidop
May 15 2017 20:40
one by one
one workaround is that I can resolve the input path in the python script
although i do feel it’s an abuse of nextflow to try to get the original path
Paolo Di Tommaso
@pditommaso
May 15 2017 20:42
so you need to remove .toAbsolutePath()
Bili Dong
@qobilidop
May 15 2017 20:42
yes
Paolo Di Tommaso
@pditommaso
May 15 2017 20:42
process_snapshot.py is a script you have written ?
Bili Dong
@qobilidop
May 15 2017 20:42
yes
Paolo Di Tommaso
@pditommaso
May 15 2017 20:43
let me think
Bili Dong
@qobilidop
May 15 2017 20:44
thank you for pointing out the issue with .toAbsolutePath()
Paolo Di Tommaso
@pditommaso
May 15 2017 20:47
what about this
process downloadSimulation {
    storeDir data_dir

    output:
    file "${sim}" into sim_ch
    file "${sim}/${snap_template}" into snaps_ch

    script:
    """
    wget http://yt-project.org/data/${sim}.tar.gz
    tar xzf ${sim}.tar.gz
    """
}

process processSnapshot {
    storeDir results_dir

    input:
    file sim from sim_ch
    val snap from snaps_ch.flatten().map { it.baseName }

    output:
    file "${snap}.png" into plots

    script:
    """
    process_snapshot.py -i ${sim}/${snap}.putExtensionHere -o ${snap}.png
    """
}
then provide the entire directory as the file input
and the snap file name as a value parameter
Bili Dong
@qobilidop
May 15 2017 20:49
thanks! let me try it
Bili Dong
@qobilidop
May 15 2017 21:00
ah, there’s an issue. baseName will give data????, but the file is one level deeper at DD*/data????. But this could be overcome; I just need to pass DD* as the directory.
Paolo Di Tommaso
@pditommaso
May 15 2017 21:03
I don't have a better solution for that
Bili Dong
@qobilidop
May 15 2017 21:05
That’s already a good enough solution for me. Thank you for the help! I appreciate your time.
Paolo Di Tommaso
@pditommaso
May 15 2017 21:05
:+1: