These are chat archives for nextflow-io/nextflow

14th
Feb 2018
Luca Cozzuto
@lucacozzuto
Feb 14 2018 11:37
Hi all, is there a way to get a value from a channel using the id that you get from another one?
    input:
    set pair_id, file(bamfile) from BOWTIE2bam_for_wiggle
    file(bam_indexfile) from bamIndexFiles[${pair_id}] ???
Paolo Di Tommaso
@pditommaso
Feb 14 2018 11:38
nope
Luca Cozzuto
@lucacozzuto
Feb 14 2018 11:38
no way to make this?
Paolo Di Tommaso
@pditommaso
Feb 14 2018 11:39
I guess you need to join the two somehow
to feed it with the expected data
Luca Cozzuto
@lucacozzuto
Feb 14 2018 11:39
mmm
so I have two channels CHANNEL ONE AND TWO
the two have same id
Paolo Di Tommaso
@pditommaso
Feb 14 2018 11:40
sounds good
Luca Cozzuto
@lucacozzuto
Feb 14 2018 11:40
if there is a way to merge them per ID, that will be enough
Paolo Di Tommaso
@pditommaso
Feb 14 2018 11:41
The join operator creates a channel that joins together the items emitted by two channels for which a matching key exists
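A minimal sketch of how join matches by key (the sample ids and file names here are hypothetical, not from the conversation):

```nextflow
// Two channels keyed by the same sample id
bams    = Channel.from( ['sample1', 'a.bam'],     ['sample2', 'b.bam'] )
indexes = Channel.from( ['sample1', 'a.bam.bai'], ['sample2', 'b.bam.bai'] )

// join matches items on their first element (the key) by default
// and emits tuples like: [sample1, a.bam, a.bam.bai]
bams.join(indexes).println()
```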
Luca Cozzuto
@lucacozzuto
Feb 14 2018 11:42
should I join in the process definition?
Paolo Di Tommaso
@pditommaso
Feb 14 2018 11:42
should I join in the process definition?
what does it mean?
try nextflow console
run the example, you will understand . . :)
Luca Cozzuto
@lucacozzuto
Feb 14 2018 11:43
ok
thanks
Luca Cozzuto
@lucacozzuto
Feb 14 2018 11:49
input:
set pair_id, file(bamfile), file(bam_indexfile) from channelA.join(channelB)
thanks!
Paolo Di Tommaso
@pditommaso
Feb 14 2018 12:37
:clap: :clap:
You give a poor man a fish and you feed him for a day. You teach him to fish and you give him an occupation that will feed him for a lifetime.
:satisfied:
Luca Cozzuto
@lucacozzuto
Feb 14 2018 13:21
I suggest to add a similar example to the official documentation :)
Paolo Di Tommaso
@pditommaso
Feb 14 2018 13:22
good tip
jncvee
@jncvee
Feb 14 2018 14:40
Hello, I'm new to using Nextflow and am trying to understand why, if I have two processes, the program will skip the first process and just do the second one. I know it is mentioned in the tutorial, but why? And how do I get it to do both processes?
Paolo Di Tommaso
@pditommaso
Feb 14 2018 14:42
please give me the context, where is that mentioned in the tutorial ?
jncvee
@jncvee
Feb 14 2018 14:45
It is in the "Modify and resume" part of the Get Started page; it says: "You will see that the execution of the process splitLetters is actually skipped (the process ID is the same), and its results are retrieved from the cache. The second process is executed as expected, printing the reversed strings."
Paolo Di Tommaso
@pditommaso
Feb 14 2018 14:48
because if you modify a task you need to compute it at least once to get the result, do you agree on that ?
jncvee
@jncvee
Feb 14 2018 14:48
yes
Paolo Di Tommaso
@pditommaso
Feb 14 2018 14:49
hence, what's your doubt ?
jncvee
@jncvee
Feb 14 2018 14:51
I still don't know why the first process isn't showing. I personally don't think I'm modifying a task, just adding another task
Paolo Di Tommaso
@pditommaso
Feb 14 2018 14:53
so let's recapitulate, there's a script like this

params.str = 'Hello world!'

process splitLetters {

    output:
    file 'chunk_*' into letters mode flatten

    """
    printf '${params.str}' | split -b 6 - chunk_
    """
}


process convertToUpper {

    input:
    file x from letters

    output:
    stdout result

    """
    cat $x | tr '[a-z]' '[A-Z]'
    """
}

result.subscribe {
    println it.trim()
}
there are two processes; the first is executed once and the second twice
the output is like this
N E X T F L O W  ~  version 0.9.0
[warm up] executor > local
[22/7548fa] Submitted process > splitLetters (1)
[e2/008ee9] Submitted process > convertToUpper (1)
[1e/165130] Submitted process > convertToUpper (2)
HELLO
WORLD!
right ?
jncvee
@jncvee
Feb 14 2018 14:54
yes
Paolo Di Tommaso
@pditommaso
Feb 14 2018 14:55
then it modifies the second one, so the script becomes
params.str = 'Hello world!'

process splitLetters {

    output:
    file 'chunk_*' into letters mode flatten

    """
    printf '${params.str}' | split -b 6 - chunk_
    """
}


process convertToUpper {

    input:
    file x from letters

    output:
    stdout result

    """
    rev $x 
    """
}

result.subscribe {
    println it.trim()
}
if you re-execute this script with -resume it skips the part already computed, that is splitLetters
jncvee
@jncvee
Feb 14 2018 14:57
so if I add -resume it will work the way you describe
Paolo Di Tommaso
@pditommaso
Feb 14 2018 14:57
since the process convertToUpper was modified it is computed (two times, for the two different inputs)
does that make sense ?
jncvee
@jncvee
Feb 14 2018 14:59
yes, I think I got it
Vladimir Kiselev
@wikiselev
Feb 14 2018 15:52
Hi Paolo, I get exactly this error: nextflow-io/nextflow#525
What does unresolved mean?
Paolo Di Tommaso
@pditommaso
Feb 14 2018 15:57
that it has an unknown value
typically a typo in a variable name
Vladimir Kiselev
@wikiselev
Feb 14 2018 16:01
ok, many thanks, I figured out the problem!
Paolo Di Tommaso
@pditommaso
Feb 14 2018 16:01
:+1:
Shawn Rynearson
@srynobio
Feb 14 2018 16:05

@pditommaso thanks for helping me get an AWS Batch system up and running. It was in fact my AWS Batch and VPC configurations.

Question:
I know you specify your S3 bucket on the command line like so:
-w s3://mybucket
and it becomes the work directory. But if, at the beginning of processing, you want to point to data living in a bucket, how is this handled? I've seen this example but am unsure if it is handled the same way.

Paolo Di Tommaso
@pditommaso
Feb 14 2018 16:11
you can use s3://bucket/path as a regular file in Channel.fromPath, etc
Shawn Rynearson
@srynobio
Feb 14 2018 16:12
Okay so an example of this would be:
proteins = Channel.fromPath( 's3://mybucket/myfile')
Paolo Di Tommaso
@pditommaso
Feb 14 2018 16:13
yes, you can even use wildcards ..
Shawn Rynearson
@srynobio
Feb 14 2018 16:14
like: proteins = Channel.fromPath( 's3://mybucket/*fastq')
Paolo Di Tommaso
@pditommaso
Feb 14 2018 16:14
yep
Shawn Rynearson
@srynobio
Feb 14 2018 16:16
Awesome! I also noticed you suggest that users launch workflows from EC2 instances. Is this because on long-running jobs you've noticed disconnects from AWS?
Paolo Di Tommaso
@pditommaso
Feb 14 2018 16:18
well, if there is some heavy data transfer it makes sense to do it in the AWS network
Shawn Rynearson
@srynobio
Feb 14 2018 16:20
I see, if the S3 upload of data is included in your .nf script. That makes sense, although I never thought to do that :)
jncvee
@jncvee
Feb 14 2018 16:21
Hello, I'm back and my script is still producing output only from the second process even though I added -resume
Edgar
@edgano
Feb 14 2018 16:25
@jncvee is the output of your first process published anywhere? If it is the input of the second process and you didn't define a publishDir, it will be in the work folder.
jncvee
@jncvee
Feb 14 2018 16:37
@edgano the output of the first process isn't in the output folders; just copies of the original text files show up in the work folder
jncvee
@jncvee
Feb 14 2018 16:45
Okay, I was able to find the output of the first process in the work folders, and the output of the second process in the work folders as well
Edgar
@edgano
Feb 14 2018 16:47
@jncvee if you want the 1st process output in the output folders, you can use publishDir
https://www.nextflow.io/docs/latest/process.html#publishdir
jncvee
@jncvee
Feb 14 2018 17:04
I want the file to be changed in the first process and then changed again in the second process
Edgar
@edgano
Feb 14 2018 17:17
Channel.fromPath( '/some/path/*.fa' )
        .into { proteins; proteins2 }

process foo1 {
  publishDir "${params.output}/process1", mode: 'copy', overwrite: true

  input:
      file query_file from proteins
  output:
      file 'result.txt' into numbers

  script:
    """
        ./helloWorld.sh $query_file
    """
}
process foo2 {
  publishDir "${params.output}/process2", mode: 'copy', overwrite: true

  input:
      file file2 from proteins2
  output:
      file 'result2.txt' into numbers2

  script:
    """
        ./hello.sh $file2
    """
}
That's a dummy example. You get your file, you modify it in the foo1 process, and you publish it in the process1 folder.
Then the same file is modified in foo2 and the result published in the process2 folder
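Note that in the dummy example above both processes consume the original file in parallel. If the second process should instead transform the *output* of the first (as jncvee asked), foo2 can read from foo1's output channel. A hedged sketch, reusing the channel and script names from the dummy example:

```nextflow
// Sketch: chain the processes so the file is transformed twice in
// sequence; 'numbers' is foo1's output channel from the example above
process foo2 {
  publishDir "${params.output}/process2", mode: 'copy', overwrite: true

  input:
      file file2 from numbers    // foo1's result, not the original file

  output:
      file 'result2.txt' into numbers2

  script:
    """
    ./hello.sh $file2
    """
}
```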
jncvee
@jncvee
Feb 14 2018 17:32
okay, thank you, that helps a lot