These are chat archives for nextflow-io/nextflow

16th Nov 2016
Maxime Garcia
@MaxUlysse
Nov 16 2016 09:38
this task thing sounds interesting, I'll include that in my next update ;-) Thanks
Félix C. Morency
@fmorency
Nov 16 2016 15:47
(sid, A, B, C) = Channel
    .fromFilePairs("...*{A, B, C}*",
                   size: 3,
                   flat: true) { it.parent.name }
    .separate(4)

sid.into{sid_for_something; sid_for_other}
A.into{A_for_something; A_for_other}
B.into{B_for_something}
C.into{C_for_other}

process Something {
    input:
    val sid from sid_for_something
    file A from A_for_something
    file B from B_for_something

    output:
    file "result" into D_for_other

    """
    ...
    """
}

process Other {
    input:
    val sid from sid_for_other
    file A from A_for_other
    file C from C_for_other
    file D from D_for_other

    output:
    file "stuff"

    """
    ...
    """
}
imagine I have something like ^
will D in Other be the file produced by Something with the same sid in both?
Félix C. Morency
@fmorency
Nov 16 2016 18:30
mmm doesn't seem guaranteed at all
Paolo Di Tommaso
@pditommaso
Nov 16 2016 18:53
will D in Other be the file produced by Something with the same sid in both?
it's not clear
Félix C. Morency
@fmorency
Nov 16 2016 18:56
I did a toy pipeline with some toy i/o and can show my race condition problem
Paolo Di Tommaso
@pditommaso
Nov 16 2016 18:56
much better
Félix C. Morency
@fmorency
Nov 16 2016 18:56
I also implemented a fix. It works but I'm not sure it's the right way
Paolo Di Tommaso
@pditommaso
Nov 16 2016 18:57
NF is supposed to be race-condition free..
here's my toy pipeline with toy input data
the stuff in comment is what I had before causing issue
sorry, not a race condition... more a misunderstanding of how channels work
the output file three.txt is important here. Without the "fix", the sid occurrences were not the same in the file
Paolo Di Tommaso
@pditommaso
Nov 16 2016 19:02
the inputs you are declaring do not match
I mean when you have
Channel
    .fromFilePairs("./Data/**/{1,2,3}",
                   size: 3,
                   flat: true) { it.parent.name }
that will produce a channel emitting the following
[In1, /Users/pditommaso/Downloads/nf_id/Data/In1/1, /Users/pditommaso/Downloads/nf_id/Data/In1/2, /Users/pditommaso/Downloads/nf_id/Data/In1/3]
[In2, /Users/pditommaso/Downloads/nf_id/Data/In2/1, /Users/pditommaso/Downloads/nf_id/Data/In2/2, /Users/pditommaso/Downloads/nf_id/Data/In2/3]
[In3, /Users/pditommaso/Downloads/nf_id/Data/In3/1, /Users/pditommaso/Downloads/nf_id/Data/In3/2, /Users/pditommaso/Downloads/nf_id/Data/In3/3]
Félix C. Morency
@fmorency
Nov 16 2016 19:04
yeah this is what I want
Paolo Di Tommaso
@pditommaso
Nov 16 2016 19:04
that is, each item is a tuple like the following: [ id, file1, file2, file3 ]
so far so good
Félix C. Morency
@fmorency
Nov 16 2016 19:04
yes this is okay
Paolo Di Tommaso
@pditommaso
Nov 16 2016 19:05
but that tuple does not match input declarations like this
    input:
    val sid from sid_for_one
    file one from one_for_one
you will need something like
  input: 
  set sid, file(a), file(b), file(c) from ch_x
Félix C. Morency
@fmorency
Nov 16 2016 19:07
I do a .separate() after the .fromFilePairs(). I don't need all of the tuple content in a single process, but I need some parts in multiple processes
will ^ cause problem?
Paolo Di Tommaso
@pditommaso
Nov 16 2016 19:09
ah no, but I think that it can be simplified a lot
Félix C. Morency
@fmorency
Nov 16 2016 19:10
oh?
Paolo Di Tommaso
@pditommaso
Nov 16 2016 19:12
ok, basically you need [id, file1] for the first process and [id, file2, file3] for the second process, right?
Félix C. Morency
@fmorency
Nov 16 2016 19:12
yes, let's start with that
Paolo Di Tommaso
@pditommaso
Nov 16 2016 19:21
well, I was hoping for something better...
(channel_for_one, channel_for_two) = Channel
    .fromFilePairs("./Data/**/{1,2,3}", size: 3, flat: true) { it.parent.name }
    .map { id, file1, file2, file3 -> [ tuple(id, file1), tuple(id, file2,file3) ] }
    .separate(2)
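For completeness, a hedged sketch (not part of the original chat) of how the two derived channels might then be consumed, keeping the sid attached via set inputs; process names and script bodies are placeholders:

process One {
    input:
    set sid, file(one) from channel_for_one

    """
    ...
    """
}

process Two {
    input:
    set sid, file(two), file(three) from channel_for_two

    """
    ...
    """
}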
Félix C. Morency
@fmorency
Nov 16 2016 19:22
;)
Paolo Di Tommaso
@pditommaso
Nov 16 2016 19:23
however, what's wrong with your script?
it returns
$ nextflow run pipeline.nf 
N E X T F L O W  ~  version 0.22.4
Launching `pipeline.nf` [pensive_bhaskara] - revision: c55cd0d372
[warm up] executor > local
[ce/2b1279] Submitted process > One (1)
[b6/bde3d1] Submitted process > One (2)
[38/1ab614] Submitted process > One (3)
[27/8a967e] Submitted process > Two (1)
[fd/948745] Submitted process > Two (2)
[8f/ce7455] Submitted process > Two (3)
[91/c29848] Submitted process > Three (1)
[6c/ef3ec9] Submitted process > Three (2)
[16/afd540] Submitted process > Three (3)
Félix C. Morency
@fmorency
Nov 16 2016 19:24
if you open the "results/.../Three/three.txt" file
Paolo Di Tommaso
@pditommaso
Nov 16 2016 19:24

$ cat results/In3/Three/three.txt 
In3 In3 1 In3 2 3 In3 1
Félix C. Morency
@fmorency
Nov 16 2016 19:24
Yes, you can see that the sid (In3) is the same in the file
which is good
now if you open the nf pipeline and (un)comment the part where I pass the sid around in the output:
the sid occurrences in the three.txt file won't be the same
Paolo Di Tommaso
@pditommaso
Nov 16 2016 19:26
do you mean having this
    output:
    set sid, 'one.txt' into result_for_two,
                            result_one_for_three
    file 'one.txt' into result_for_two,
                        result_one_for_three
both of them ?
Félix C. Morency
@fmorency
Nov 16 2016 19:27
no, just keep the second one
run this one ^
Paolo Di Tommaso
@pditommaso
Nov 16 2016 19:29
I saw
Félix C. Morency
@fmorency
Nov 16 2016 19:29
I'm trying to understand what's going on inside NF between the two versions
if you run the "wrong" pipeline multiple times, the content of three.txt will change
Paolo Di Tommaso
@pditommaso
Nov 16 2016 19:35
because they are not expected to match
the channel content is ordered, but the process executions are not
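A minimal sketch (not from the chat) illustrating that point: the ids enter the channel in order, but the completion order of the parallel tasks, and therefore the order of the downstream channel, may differ between runs. All names here are hypothetical:

Channel.from('In1', 'In2', 'In3').set { ids }

process Sleepy {
    input:
    val id from ids

    output:
    stdout into done

    """
    sleep \$(( RANDOM % 3 ))
    printf '%s ' ${id}
    """
}

// may print e.g. "In2 In1 In3" on one run and "In1 In3 In2" on another
done.println()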
Félix C. Morency
@fmorency
Nov 16 2016 19:36
and passing the sid around fixes that?
Paolo Di Tommaso
@pditommaso
Nov 16 2016 19:36
no
I mean it does not synchronise the processes
you have two choices: either create a channel containing all the components you need,
or synchronise the channel content based on the same sid using phase.
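A hedged sketch (not in the original chat) of the second option, using phase to pair items from two channels by their shared sid; channel names and contents are made up:

// two channels keyed by sid, arriving in different orders
ch_a = Channel.from( ['In1', 'a1'], ['In2', 'a2'], ['In3', 'a3'] )
ch_b = Channel.from( ['In3', 'b3'], ['In1', 'b1'], ['In2', 'b2'] )

ch_a
    .phase(ch_b)                                // pairs tuples that share the same first element (the sid)
    .map { a, b -> tuple(a[0], a[1], b[1]) }    // flatten to [ sid, a, b ]
    .println()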
Félix C. Morency
@fmorency
Nov 16 2016 19:40
I see, I see
OK, I will create a channel containing all the components I need and see how it goes from here
Paolo Di Tommaso
@pditommaso
Nov 16 2016 19:40
:+1:
Félix C. Morency
@fmorency
Nov 16 2016 19:40
thanks for your help @pditommaso
Paolo Di Tommaso
@pditommaso
Nov 16 2016 19:41
welcome
Jason Byars
@jbyars
Nov 16 2016 20:26
for the AWS spotPrice setting, is there a way to adjust the bid after cloud creation?
Paolo Di Tommaso
@pditommaso
Nov 16 2016 20:26
um, no..
Jason Byars
@jbyars
Nov 16 2016 21:37
good to know. Should storeDir be able to use a S3 bucket in the current build?
Jason Byars
@jbyars
Nov 16 2016 22:01
storeDir correctly detects objects that have already been processed and stored in a bucket, but at the end of .command.run it tries to create a local folder instead of pushing the file to the bucket.
Paolo Di Tommaso
@pditommaso
Nov 16 2016 22:14
oops, could you open an issue for that?
Jason Byars
@jbyars
Nov 16 2016 22:22
sure, could you clarify something about publishDir as well so I can document it appropriately for both? publishDir publishes to buckets correctly in the end, but looking at the generated .command.run, even with scratch true, it looks like the script section runs and the output is always copied back to the shared work folder before the copy-to-bucket operation occurs. Is this correct?
I'm trying to avoid copies back to shared storage for the results
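For context, a hedged sketch (not from the chat) of the kind of process configuration being described, with scratch enabled and publishDir pointing at a bucket; the bucket path and script are made up:

process Example {
    scratch true
    publishDir 's3://my-bucket/results', mode: 'copy'

    output:
    file 'out.txt'

    """
    echo hello > out.txt
    """
}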
Jason Byars
@jbyars
Nov 16 2016 22:36
it looks like this happens in all cases