These are chat archives for nextflow-io/nextflow

30th
Apr 2019
Rad Suchecki
@rsuchecki
Apr 30 02:20
Why not use a python shebang, set the executable bit, and then simply call executable.py, @tobsecret?
longkaizheng
@longkaizheng
Apr 30 02:52
Hello, all! I replaced 19.01.0.5050 with 19.04.0.5069, but the submitted tasks now stay in the qw state indefinitely (with version 19.01.0.5050 they would start running). My SGE config is:
executor = 'sge'
queue = 'all_el6.q'
penv = 'make'
cpus = 10
memory = '30 GB'
errorStrategy = 'ignore'
Rafal Gumienny
@guma44
Apr 30 07:59

Hi,
I have a simple script:

#!/usr/bin/env nextflow

import groovy.json.JsonSlurper


methods = Channel.from(["method1", "method2"])

process prepare_all_methods {
    output:
        stdout prepare_channel
    """
    sleep 1
    """
}


process get_params {
    tag "${method}"
    input:
        val prepare_channel
        val method from methods
    output:
        stdout get_params_channel
    """
    #! /bin/env python
    import json
    params = ["a", "b", "c", "d"]
    data = {"results": list()}
    for param in params:
        data["results"].append(("$method", param))
    print(json.dumps(data))
    """
}

(params_for_step1) = get_params_channel.flatMap{ x -> (new JsonSlurper()).parseText(x).results }.into(1)

process step1 {
    tag "${method}: ${param}"
    input:
        set method, param from params_for_step1
    output:
        val method into method_channel_with_param
    """
    #! /bin/env python
    import random
    import time
    time.sleep(random.choice(range(1,10)))
    """
}

step1_method = method_channel_with_param.last()

process set_stage_step1 {
    executor 'local'
    tag "${ method }"

    input:
    val method from step1_method

    output:
    val method into staged1_channel

    """
    echo STAGE1
    """
}

When I fire it the output is as follows:

N E X T F L O W  ~  version 19.01.0
Launching `./test.nf` [spontaneous_watson] - revision: b2c8b3a491
WARN: There's no process matching config selector: blast
[warm up] executor > local
[f0/eee645] Submitted process > prepare
[13/1cfe30] Submitted process > get_params (method1)
[14/4169ac] Submitted process > get_params (method2)
[27/acfdaf] Submitted process > step1 (method1: b)
[78/7a91e8] Submitted process > step1 (method1: a)
[ea/79ba20] Submitted process > step1 (method1: c)
[b9/759501] Submitted process > step1 (method2: a)
[a3/c04c4d] Submitted process > step1 (method2: b)
[75/6216e4] Submitted process > step1 (method2: d)
[ce/49046f] Submitted process > step1 (method2: c)
[a5/e3a6a0] Submitted process > step1 (method1: d)
[68/3f332b] Submitted process > set_stage_step1 (method1)

What I would expect is that the process set_stage_step1 is fired once for each
method, but this is not the case. When I use unique instead of last,
the order is not respected: set_stage_step1 fires before all steps
for a given method have finished.
What I would like to achieve is to split the process into methods and then
launch each method with its own set of parameters (generated dynamically and not
known beforehand) separately (i.e. each method could run to the end without
waiting for the others to finish). Is there a simple way to do this?

Rafal Gumienny
@guma44
Apr 30 09:17
I was wondering, could this be achieved with sub-workflows?
micans
@micans
Apr 30 09:24
@guma44 I have some trouble understanding what's going on. Can you make a 3-sentence summary of how you want to combine things, ignoring nextflow? You want to combine methods and parameters, that sounds very doable.
Rafal Gumienny
@guma44
Apr 30 09:38
@micans, I am really bad at explaining things :F. I'll try. I would like to run different methods that require a common preparation step (that is why they should be in the same script). The methods require different sets of parameters (e.g. file inputs - one per method, each different) generated dynamically for each method, i.e. I do not know them beforehand. After the first step I need to run a script for each method separately, but only once the method has been run for each of its parameters. The main requirement is that the run of the script should be independent of the method, because there are many steps to follow. I know how to wait after each step with collect, but I would like to avoid it.
By "the run of the script should be independent of the method" I meant that method1 should be independent of method2.
micans
@micans
Apr 30 09:49
@guma44 so methods can run in parallel and do not depend on each other, correct? Then you want to run method1 in parallel for a dynamically generated set of parameters, and after that collect those results and process them with a script. You want to do the same for method2 <-- is this roughly what you described? One remark is that you can avoid the wait at the collect step by using groupTuple.
Rafal Gumienny
@guma44
Apr 30 10:10
@micans Yes. That is what I would like to do :D. I'll take a look at groupTuple.
Rafal Gumienny
@guma44
Apr 30 10:27
@micans groupTuple seems to be blocking, i.e. it will not emit any group until the source channel has released all its values.
micans
@micans
Apr 30 10:59
@guma44 groupTuple will release a Tuple once it has been filled; it is aware of the size of the Tuple. I've used it here: https://github.com/cellgeni/guitar/blob/master/main.nf#L233-L235
Rafal Gumienny
@guma44
Apr 30 11:54
@micans if I run it with this change it behaves as I said:
[3a/aacf52] Submitted process > prepare
[c5/f5b319] Submitted process > get_params (method1)
[89/18c008] Submitted process > get_params (method2)
[18/d41990] Submitted process > get_params (method3)
[38/247074] Submitted process > step1 (method2: d)
[c4/4fbf09] Submitted process > step1 (method2: c)
[f9/025a44] Submitted process > step1 (method3: o)
[d4/e0771f] Submitted process > step1 (method1: b)
[00/c0eeb2] Submitted process > step1 (method1: a)
[30/9bc9af] Submitted process > step1 (method3: p)
[22/412fa6] Submitted process > step1 (method3: z)
[df/70f10a] Submitted process > step1 (method3: r)
[1f/8365e5] Submitted process > step1 (method3: y)
[d8/452e8d] Submitted process > step1 (method3: x)
[44/406e3f] Submitted process > set_stage_step1 ([method3, [y, p, o, r, x, z]])
[e9/d939a0] Submitted process > set_stage_step1 ([method1, [b, a]])
[16/a4c27f] Submitted process > set_stage_step1 ([method2, [c, d]])
the step set_stage_step1 is not launched until all the methods are finished
micans
@micans
Apr 30 12:06
Hi @guma44 -- sorry I meant the addition of groupKey, my mistake. See https://github.com/cellgeni/guitar/blob/master/main.nf#L209 (in the same file, earlier). This will actually do the release of a Tuple as I described.
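The early release described here can be illustrated outside Nextflow. What follows is a plain-Python simulation of the idea, not Nextflow's implementation: if the expected size of each group is known up front (the information a size-aware key such as groupKey carries), a group can be emitted as soon as it is full instead of when the whole stream ends. The function name `group_when_full` and the arrival order are hypothetical.

```python
from collections import defaultdict


def group_when_full(stream, sizes):
    """Yield (key, values) as soon as a key has received sizes[key] items.

    stream: iterable of (key, value) pairs in arrival order.
    sizes:  dict mapping each key to its expected number of items.
    """
    pending = defaultdict(list)
    for key, value in stream:
        pending[key].append(value)
        if len(pending[key]) == sizes[key]:
            # The group is complete: release it without waiting for other keys.
            yield key, pending.pop(key)


# Hypothetical arrival order of finished step1 tasks.
arrivals = [
    ("method2", "d"), ("method1", "b"), ("method2", "c"),
    ("method3", "o"), ("method1", "a"), ("method3", "p"),
]
expected = {"method1": 2, "method2": 2, "method3": 2}

released = list(group_when_full(arrivals, expected))
for key, values in released:
    print(key, values)
```

In this simulation method2 is released after the third arrival, well before method3 has finished, which is the behaviour being asked for.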
PhilPalmer
@PhilPalmer
Apr 30 12:09

Hi, is there a way to combine two channels of different lengths by elements at different positions?
I would like to do something like this:

ch1 = Channel.from( 64, 16, "A", 4)
ch2 = Channel.from( "A", 1)    
newCh = ch1.combine(ch2, by: 2)

newCh = [64, 16, "A", 4, 1]

micans
@micans
Apr 30 12:24
@PhilPalmer combine gives the product of all combinations. Not sure what you want; "A" is present in both channels, is that essential or does it make it harder to understand? It looks a bit like what you did was a combine but only using the second element of ch2. That could be done by filtering ch2 first I assume.
PhilPalmer
@PhilPalmer
Apr 30 12:30
Hi @micans, thanks for your help. I want to combine the elements where the channels have matching letters. I think my problem was that the element I was trying to combine by was at different positions in the two channels, so if I change the order, combine should give the desired result.
micans
@micans
Apr 30 12:47
@PhilPalmer hehe I was a bit useless .... didn't look at the by much. I learned something :-)
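The reordering idea can be sketched in plain Python. This is only an illustration of the logic, not Nextflow code: it assumes each channel item is a list (the snippet in the question emits the values one by one), and the helper names `move_key_first` and `combine_by_first` are hypothetical.

```python
def move_key_first(item, key_index):
    """Return a copy of item with the element at key_index moved to position 0."""
    item = list(item)
    return [item[key_index]] + item[:key_index] + item[key_index + 1:]


def combine_by_first(left, right):
    """Pair every left item with every right item sharing the same first element.

    Each result keeps the key once, followed by the rest of the left item
    and then the rest of the right item.
    """
    return [
        [a[0]] + a[1:] + b[1:]
        for a in left
        for b in right
        if a[0] == b[0]
    ]


left = [[64, 16, "A", 4]]   # key "A" sits at index 2
right = [["A", 1]]          # key "A" sits at index 0

# Bring the key to the same position in both inputs, then match on it.
left_keyed = [move_key_first(item, 2) for item in left]
combined = combine_by_first(left_keyed, right)
print(combined)
```

Note that the key ends up at the front of the combined item in this sketch, so the element order differs from the desired output written in the question.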
Rafal Gumienny
@guma44
Apr 30 13:04
@micans Thanks for the help: this issue (opened by you), nextflow-io/nextflow#796, was also helpful. I have not yet figured out precisely how to use it in my case, but this is a step forward.
micans
@micans
Apr 30 13:11
@guma44 +1: with these things I always make the smallest possible toy example that I can using toy files and simple unix commands (echo, touch, etc), before embedding it in a real application, just to make sure I have the file+channel logic right.
Rafal Gumienny
@guma44
Apr 30 13:16
@micans Indeed, with more complex pipeline the stuff always gets more tricky. I have my toy example so I can play now with different set ups. Thanks!
Tobias "Tobi" Schraink
@tobsecret
Apr 30 14:16
@rsuchecki it's part of a bash script.
Could I do both in a script block?
"""
#!/bin/bash
<some bash commands>
#!/bin/python
script.py <arguments>
"""
Eugene Bragin
@eugene.bragin_gitlab
Apr 30 14:51
for scalable AWS cloud deployment, would people suggest using ec2+autoscaling approach or aws Batch?
Paolo Di Tommaso
@pditommaso
Apr 30 14:51
batch
Eugene Bragin
@eugene.bragin_gitlab
Apr 30 14:52
Thanks Paolo, any particular reason why?
Paolo Di Tommaso
@pditommaso
Apr 30 14:52
less moving parts, scale to zero, better resources control
Eugene Bragin
@eugene.bragin_gitlab
Apr 30 14:53
i see, cheers
micans
@micans
Apr 30 15:10
@tobsecret AFAICS you can do /bin/python script.py <arguments> in your script section; you do not need a second section to invoke python+python script.
Sinisa Ivkovic
@sivkovic
Apr 30 15:25
@pditommaso is it possible to use functions defined in a different file, like this: https://github.com/SciLifeLab/Sarek/tree/master/lib? I get this error when I try to run the pipeline with the 19.05.0-SNAPSHOT version:
groovy.lang.MissingPropertyException: No such property: SarekUtils for class: Script_a0b7885c
    at org.codehaus.groovy.runtime.ScriptBytecodeAdapter.unwrap(ScriptBytecodeAdapter.java:65)
    at org.codehaus.groovy.runtime.callsite.PogoGetPropertySite.getProperty(PogoGetPropertySite.java:51)
    at org.codehaus.groovy.runtime.callsite.AbstractCallSite.callGroovyObjectGetProperty(AbstractCallSite.java:309)
    at Script_a0b7885c$_extractFastq_closure4.doCall(Script_a0b7885c:446)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at org.codehaus.groovy.reflection.CachedMethod.invoke(CachedMethod.java:101)
    at groovy.lang.MetaMethod.doMethodInvoke(MetaMethod.java:323)
    at org.codehaus.groovy.runtime.metaclass.ClosureMetaClass.invokeMethod(ClosureMetaClass.java:263)
    at groovy.lang.MetaClassImpl.invokeMethod(MetaClassImpl.java:1041)
    at org.codehaus.groovy.runtime.callsite.PogoMetaClassSite.call(PogoMetaClassSite.java:37)
    at org.codehaus.groovy.runtime.callsite.CallSiteArray.defaultCall(CallSiteArray.java:47)
    at org.codehaus.groovy.runtime.callsite.AbstractCallSite.call(AbstractCallSite.java:115)
    at org.codehaus.groovy.runtime.callsite.AbstractCallSite.call(AbstractCallSite.java:127)
    at nextflow.extension.MapOp$_apply_closure1.doCall(MapOp.groovy:56)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at org.codehaus.groovy.reflection.CachedMethod.invoke(CachedMethod.java:101)
    at groovy.lang.MetaMethod.doMethodInvoke(MetaMethod.java:323)
    at org.codehaus.groovy.runtime.metaclass.ClosureMetaClass.invokeMethod(ClosureMetaClass.java:263)
    at groovy.lang.MetaClassImpl.invokeMethod(MetaClassImpl.java:1041)
    at groovy.lang.Closure.call(Closure.java:405)
    at groovyx.gpars.dataflow.operator.DataflowOperatorActor.startTask(DataflowOperatorActor.java:120)
    at groovyx.gpars.dataflow.operator.DataflowOperatorActor.onMessage(DataflowOperatorActor.java:108)
    at groovyx.gpars.actor.impl.SDAClosure$1.call(SDAClosure.java:43)
    at groovyx.gpars.actor.AbstractLoopingActor.runEnhancedWithoutRepliesOnMessages(AbstractLoopingActor.java:293)
    at groovyx.gpars.actor.AbstractLoopingActor.access$400(AbstractLoopingActor.java:30)
    at groovyx.gpars.actor.AbstractLoopingActor$1.handleMessage(AbstractLoopingActor.java:93)
    at groovyx.gpars.util.AsyncMessagingCore.run(AsyncMessagingCore.java:132)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)
Everything worked fine with the previous version.
Tobias "Tobi" Schraink
@tobsecret
Apr 30 15:34
@micans hmm, the whole problem was that it didn't find script.py when it was in the bin folder in the workflow directory. I think my current solution is the easiest.
python ${basedir}/bin/script.py
Paolo Di Tommaso
@pditommaso
Apr 30 15:35
have you granted exec permission to the script?
Tobias "Tobi" Schraink
@tobsecret
Apr 30 15:35
yup
Paolo Di Tommaso
@pditommaso
Apr 30 15:35
weird, it should work
Tobias "Tobi" Schraink
@tobsecret
Apr 30 15:35
it finds it if I just type script.py but not if I type python script.py
Paolo Di Tommaso
@pditommaso
Apr 30 15:36
"but not if I type python script.py"
of course not ..
Tobias "Tobi" Schraink
@tobsecret
Apr 30 15:36
yeah, I figured
Paolo Di Tommaso
@pditommaso
Apr 30 15:36
:)
Tobias "Tobi" Schraink
@tobsecret
Apr 30 15:36
that's why I asked if the following should work
"""
#!/bin/bash
<some bash commands>
#!/bin/python
script.py <arguments>
"""
Paolo Di Tommaso
@pditommaso
Apr 30 15:36
but why use python script.py instead of script.py?
Tobias "Tobi" Schraink
@tobsecret
Apr 30 15:36
because script.py was giving me some weird errors for some reason
idk, my current solution seems to work
Paolo Di Tommaso
@pditommaso
Apr 30 15:37
:+1:
btw the #!/bin/python in the script above is just a comment ..
Tobias "Tobi" Schraink
@tobsecret
Apr 30 15:38
oh, ok - I was wondering about that!
How would you have both of those in one script block?
Paolo Di Tommaso
@pditommaso
Apr 30 15:39
you can't, and in your example you don't need it
because it is Bash that invokes the python script
Tobias "Tobi" Schraink
@tobsecret
Apr 30 15:40
Also, I was wondering if the fromSRA method also happens to work for ENA, since I noted that the ftp path for both ends in ebi.ac.uk, for some reason? :sweat_smile: I suspect not, but on one run nextflow said it was going to stage the file (which was an ENA accession), so I got optimistic
micans
@micans
Apr 30 15:41
I would try to just have script.py as an executable script. What are those weird errors?
Paolo Di Tommaso
@pditommaso
Apr 30 15:41
"I was wondering if the fromSRA method also happens to work for ENA"
frankly I don't know, those paths are returned by the SRA API query
Tobias "Tobi" Schraink
@tobsecret
Apr 30 15:58

@micans : this was the odd error I was speaking of:

Caused by:
  Process `coverage (1)` terminated with an error exit status (2)

Command executed:

  mosdepth -x ISO_349.fastq.gz ISO_349.fastq.gz.bam
  plot-dist.py *global.dist.txt

Command exit status:
  2

Command output:
  (empty)

Command error:
  /cm/local/apps/environment-modules/4.0.0//init/bash: line 15: MODULES_USE_COMPAT_VERSION: unbound variable
  /gpfs/home/schrat01/projects/aim3/bin/plot-dist.py: line 1: import: command not found
  /gpfs/home/schrat01/projects/aim3/bin/plot-dist.py: line 2: import: command not found
  /gpfs/home/schrat01/projects/aim3/bin/plot-dist.py: line 3: import: command not found
  /gpfs/home/schrat01/projects/aim3/bin/plot-dist.py: line 4: import: command not found
  /gpfs/home/schrat01/projects/aim3/bin/plot-dist.py: line 5: from: command not found
  /gpfs/home/schrat01/projects/aim3/bin/plot-dist.py: line 6: import: command not found
  /gpfs/home/schrat01/projects/aim3/bin/plot-dist.py: line 7: from: command not found
  /gpfs/home/schrat01/projects/aim3/bin/plot-dist.py: line 10: syntax error near unexpected token `('
  /gpfs/home/schrat01/projects/aim3/bin/plot-dist.py: line 10: `def main():'

Work dir:
  /gpfs/scratch/schrat01/nf_work/aim3/work/f7/1b343efcf9455017257cf35e8b3c2e

So maybe I am misunderstanding how executables work but linux should figure out that plot-dist.py is a python file, not a bash executable

@pditommaso thanks! I'll report on it again - the last time we ran it, there was some other unrelated error, so we terminated it.
micans
@micans
Apr 30 16:02
@tobsecret linux will not figure that out ... that's why you need the #!/bin/env python line. The error you have there should be solvable. It's loading a modules init script for some reason. I'd try to make a tiny nextflow file and run that.
Tobias "Tobi" Schraink
@tobsecret
Apr 30 16:12
Hmmm, but then will it still be able to run mosdepth, which is a bash executable?
Also, thanks! Learn sth new every day :pray:
micans
@micans
Apr 30 16:35
@tobsecret yes, it should be fine to have different things in your script section, including bash scripts, binaries, python scripts, et cetera .. best to make a tiny nf example where you test this (using a hello-world python program).
Rad Suchecki
@rsuchecki
Apr 30 23:35

@tobsecret what I meant was that this

"""
<some bash commands>
script.py <arguments>
"""

should work if your script.py

  • sits under bin/
  • is executable
  • starts with the right shebang - preferably #!/bin/env python as pointed out by @micans
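Two of the three conditions above can be checked mechanically. The following is a minimal sketch, assuming a POSIX file system; the helper name `script_problems` and the throwaway script content are hypothetical and not part of Nextflow.

```python
import os
import stat
import tempfile


def script_problems(path):
    """Return a list of reasons why `path` may not run as a bare command."""
    problems = []
    mode = os.stat(path).st_mode
    if not mode & (stat.S_IXUSR | stat.S_IXGRP | stat.S_IXOTH):
        problems.append("not executable")
    with open(path, "rb") as handle:
        first_line = handle.readline()
    if not first_line.startswith(b"#!"):
        problems.append("no shebang on the first line")
    return problems


# Demonstration on a throwaway file (hypothetical script content).
with tempfile.TemporaryDirectory() as tmp:
    script = os.path.join(tmp, "script.py")

    # First version: no shebang, no executable bit.
    with open(script, "w") as handle:
        handle.write("import sys\nprint('hello')\n")
    before = script_problems(script)

    # Second version: shebang added and the user executable bit set.
    with open(script, "w") as handle:
        handle.write("#!/usr/bin/env python\nimport sys\nprint('hello')\n")
    os.chmod(script, os.stat(script).st_mode | stat.S_IXUSR)
    after = script_problems(script)

print(before)
print(after)
```

A script without a shebang that is called as a bare command is interpreted by the shell, which matches the "import: command not found" errors reported earlier in the log.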