These are chat archives for nextflow-io/nextflow

16th Apr 2019
Laurence E. Bernstein
@lebernstein
Apr 16 01:22
Is it possible that you have the wrong separator? Backslash versus forward slash?
Jason Steen
@jasteen
Apr 16 01:34
Hmm, I don't believe so.
Rad Suchecki
@rsuchecki
Apr 16 05:07

@rsuchecki Mmm, so in the outlined example you mean changing the executor to local for the final step only? That would help in that specific scenario, but would of course not be a good solution if the tasks consumed a more substantial amount of resources.
So I take it that in NF there is really no way to avoid delays in a pipeline (with dependent jobs) if other jobs (another NF pipeline, another tool, manual submissions, etc.) are submitted to a queue with similar priority while the first pipeline is executing.

Yes @tintin42, I believe this is correct. However, depending on your cluster configuration, I would expect that the order in which jobs are submitted should not be a major factor; e.g. on our HPC, a job's wait time is only a small component of the prioritisation.
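For reference, the per-process executor override mentioned above can be expressed with a process selector in the config. A sketch only; the process name `finalStep` and the `slurm` executor are placeholders for your actual setup:

```groovy
// nextflow.config -- sketch; 'finalStep' and 'slurm' are placeholders
process {
    executor = 'slurm'          // cluster executor for all other processes
    withName: finalStep {
        executor = 'local'      // run only the final step on the submitting host
    }
}
```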

Paolo Di Tommaso
@pditommaso
Apr 16 09:16
@Fizol to enable the customisation of the log rendering and enable/disable it using command line options
Maciej Pawlaczyk
@Fizol
Apr 16 11:00
@pditommaso I want to send all logs to Logstash. Logback provides some nice plugins that can be configured in XML. Any advice on how to do this with the current implementation?
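For the record, the usual Logback route to Logstash is the logstash-logback-encoder library. A minimal sketch, assuming that library is on the classpath and that a custom `logback.xml` is actually picked up (which is what the customisation discussed above would have to enable); host and port are placeholders:

```xml
<configuration>
  <!-- Requires the logstash-logback-encoder dependency (assumption) -->
  <appender name="LOGSTASH" class="net.logstash.logback.appender.LogstashTcpSocketAppender">
    <destination>localhost:5000</destination>
    <encoder class="net.logstash.logback.encoder.LogstashEncoder"/>
  </appender>
  <root level="INFO">
    <appender-ref ref="LOGSTASH"/>
  </root>
</configuration>
```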
Sandeep Shantharam
@machbio
Apr 16 13:23
Sorry, I am trying to get back to building pipelines with Nextflow after a long break, and am wondering how the Nextflow node (Apache Ignite) works these days. Here are my configuration, job, and output:
$ cat nextflow.config
cluster {
    join = 'path:/home/username/nextflow/cluster'
}
process.executor = 'ignite'
$ cat main.nf
process sayHello {
    echo true
    num = Channel.from( 1, 2, 3, 4, 5, 6, 7, 8, 9 )
    input:
      val x from num
    script:
    """
    echo $HOSTNAME
    """
}
$ nextflow run main.nf
N E X T F L O W  ~  version 19.01.0
Launching `main.nf` [boring_torricelli] - revision: 9df0ebcdf4
[warm up] executor > ignite
[b1/752d13] Submitted process > sayHello (5)
[cc/f706c5] Submitted process > sayHello (3)
[c0/f30f25] Submitted process > sayHello (4)
[9a/565a51] Submitted process > sayHello (9)
[18/da509b] Submitted process > sayHello (8)
[64/34835b] Submitted process > sayHello (2)
[f9/9289b2] Submitted process > sayHello (1)
[f6/478aa0] Submitted process > sayHello (7)
[aa/921a23] Submitted process > sayHello (6)
host001
host001
host001
host001
host001
host001
host001
host001
host001
Why do all the processes run on the head node and not on the cluster nodes as well?
$ ls ~/nextflow/cluster
10.145.62.117#47500  10.145.62.118#47500  10.145.62.119#47500  10.145.62.120#47500  127.0.0.1#47500
Jonathan Manning
@pinin4fjords
Apr 16 13:34

Quick one: what's the best way to get from a channel with tuples like:

[ key1a, key1b, [val1, val2, val3]]
[ key2a, key2b, [val4, val5, val6]]

... to:

[ key1a, key1b, val1 ]
[ key1a, key1b, val2 ]
[ key1a, key1b, val3 ]
[ key2a, key2b, val4 ]
[ key2a, key2b, val5 ]
[ key2a, key2b, val6 ]

So flatten while preserving keys?

Chelsea Sawyer
@csawye01
Apr 16 13:54
@pinin4fjords I think transpose would work https://www.nextflow.io/docs/latest/operator.html#transpose
Paolo Di Tommaso
@pditommaso
Apr 16 13:57
@machbio check the log file and make sure the nodes have joined the topology
Jonathan Manning
@pinin4fjords
Apr 16 13:58
@csawye01 - aha, you're right, thank you:
#!/usr/bin/env nextflow

Channel.from([
   ['a', 'p', ['u','v', 'z'] ],
   ['b', 's', ['x','y'] ]
   ])
   .transpose()
   .println()
N E X T F L O W  ~  version 19.01.0
Launching `foo.nf` [hungry_kare] - revision: 3d88bdd13a
[a, p, u]
[a, p, v]
[a, p, z]
[b, s, x]
[b, s, y]
Sandeep Shantharam
@machbio
Apr 16 15:27
@pditommaso is there any documentation where I can find out more about the topology? I could not understand the log messages. Does this indicate the nodes are connected?
>>> +----------------------------------------------------------------------+
>>> Ignite ver. 2.4.0#20180305-sha1:aa342270b13cc1f4713382a8eb23b2eb7edaa3a5
>>> +----------------------------------------------------------------------+
>>> OS name: Linux 3.10.0-862.11.6.el7.x86_64 amd64
>>> CPU(s): 40
>>> Heap: 27.0GB
>>> VM name: 12632@host005
>>> Ignite instance name: nextflow
>>> Local node [ID=32181F48-F031-4CA8-AA09-C20942EA6C93, order=4, clientMode=false]
>>> Local node addresses: [host005/10.145.62.120, /127.0.0.1]
>>> Local ports: TCP:10800 TCP:11211 TCP:47100 TCP:47500

Apr-16 11:20:32.293 [main] INFO  o.a.i.i.m.d.GridDiscoveryManager - Topology snapshot [ver=4, servers=4, clients=0, CPUs=160, offheap=200.0GB, heap=110.0GB]
Apr-16 11:20:32.293 [main] INFO  o.a.i.i.m.d.GridDiscoveryManager - Data Regions Configured:
Apr-16 11:20:32.294 [main] INFO  o.a.i.i.m.d.GridDiscoveryManager -   ^-- default [initSize=256.0 MiB, maxSize=50.3 GiB, persistenceEnabled=false]
Apr-16 11:20:32.320 [scheduler-agent] DEBUG nextflow.scheduler.SchedulerAgent - === Scheduler agent resources: cpus=40; mem=251.5 GB; disk=98.4 GB
Apr-16 11:20:32.361 [scheduler-agent] DEBUG nextflow.scheduler.SchedulerAgent - === Waiting for master node to join..
Apr-16 11:20:33.798 [exchange-worker-#122%nextflow%] INFO  o.a.ignite.internal.exchange.time - Started exchange init [topVer=AffinityTopologyVersion [topVer=4, minorTopVer=1], crd=false, evt=DISCOVERY_CUSTOM_EVT, evtNode=34509363-34c8-41dc-8543-09c270ba34b8, customEvt=CacheAffinityChangeMessage [id=851fbb62a61-dcfbd6b4-241c-469c-a232-6ffa61a01065, topVer=AffinityTopologyVersion [topVer=4, minorTopVer=0], exchId=null, partsMsg=null, exchangeNeeded=true], allowMerge=false]
Apr-16 11:20:33.807 [exchange-worker-#122%nextflow%] INFO  o.a.i.i.p.c.d.d.p.GridDhtPartitionsExchangeFuture - Finished waiting for partition release future [topVer=AffinityTopologyVersion [topVer=4, minorTopVer=1], waitTime=0ms, futInfo=NA]
Apr-16 11:20:33.817 [exchange-worker-#122%nextflow%] INFO  o.a.ignite.internal.exchange.time - Finished exchange init [topVer=AffinityTopologyVersion [topVer=4, minorTopVer=1], crd=false]
Apr-16 11:20:33.835 [sys-#148%nextflow%] INFO  o.a.i.i.p.c.d.d.p.GridDhtPartitionsExchangeFuture - Received full message, will finish exchange [node=34509363-34c8-41dc-8543-09c270ba34b8, resVer=AffinityTopologyVersion [topVer=4, minorTopVer=1]]
Apr-16 11:20:33.836 [sys-#148%nextflow%] INFO  o.a.i.i.p.c.d.d.p.GridDhtPartitionsExchangeFuture - Finish exchange future [startVer=AffinityTopologyVersion [topVer=4, minorTopVer=1], resVer=AffinityTopologyVersion [topVer=4, minorTopVer=1], err=null]
Apr-16 11:20:33.845 [exchange-worker-#122%nextflow%] INFO  o.a.i.i.p.c.GridCachePartitionExchangeManager - Skipping rebalancing (nothing scheduled) [top=AffinityTopologyVersion [topVer=4, minorTopVer=1], evt=DISCOVERY_CUSTOM_EVT, node=34509363-34c8-41dc-8543-09c270ba34b8]
Paolo Di Tommaso
@pditommaso
Apr 16 15:52
It seems OK. Have you tried running a less trivial workload?
Sandeep Shantharam
@machbio
Apr 16 16:02
@pditommaso this seems like a very simple workload. I cannot work out what is used to decide whether a task runs on a remote or the local node. If I scale up to 200 tasks on the head node with 40 CPUs, all 200 tasks still run on the head node:
num = Channel.from( 1..200)
process sayHello {
    echo true
    input:
      val x from num
    script:
    """
    echo $HOSTNAME
    """
}
Paolo Di Tommaso
@pditommaso
Apr 16 16:04
Well, since these are trivial tasks, the main node may be able to catch all of them instead of propagating them to the remote nodes.
Sandeep Shantharam
@machbio
Apr 16 17:18
@pditommaso Thank you, it works; I ran md5sum on a large file. Any idea what the inflection point for remote tasks is? Is it CPU, memory, or I/O?
Paolo Di Tommaso
@pditommaso
Apr 16 17:20
I think none of them: tasks are put into a distributed queue and all nodes compete to process them (work stealing).
Therefore the main node has a small advantage because the queue entries are local, but with a real workload it should balance automatically.
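The effect described above can be illustrated with a toy shared-queue simulation in Python (an analogy only, not how Ignite is actually implemented): when tasks are nearly free, the worker with zero access latency (the "head") drains the queue before anyone else wakes up; when each task costs real time, the work spreads out.

```python
# Toy work-stealing analogy (NOT Ignite internals): workers pull tasks from one
# shared queue; the "head" worker reaches the queue with zero latency, while
# remote workers pay a small access delay per pull.
import queue
import threading
import time
from collections import Counter

def simulate(task_cost, n_tasks, remote_latency, n_remote=3):
    q = queue.Queue()
    for i in range(n_tasks):
        q.put(i)
    counts = Counter()
    lock = threading.Lock()

    def worker(name, latency):
        while True:
            time.sleep(latency)       # simulated network hop to reach the queue
            try:
                q.get_nowait()
            except queue.Empty:
                return
            time.sleep(task_cost)     # simulated task execution
            with lock:
                counts[name] += 1

    threads = [threading.Thread(target=worker, args=("head", 0.0))]
    threads += [threading.Thread(target=worker, args=(f"node{i}", remote_latency))
                for i in range(n_remote)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return counts

# Trivial tasks: the head drains the queue before the remote workers wake up.
print(simulate(task_cost=0.0, n_tasks=50, remote_latency=0.01))
# Non-trivial tasks: execution time dominates latency, so the work spreads out.
print(simulate(task_cost=0.02, n_tasks=50, remote_latency=0.01))
```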
Sandeep Shantharam
@machbio
Apr 16 17:24
@pditommaso Thank you for the explanation
Paolo Di Tommaso
@pditommaso
Apr 16 17:24
you are welcome