These are chat archives for nextflow-io/nextflow

10th
May 2016
Robert Syme
@robsyme
May 10 2016 08:49 UTC
DAGs look great!
Is there a recommended way to debug when an ignite cluster node isn't being assigned jobs?
When the master node starts up, the .node-nextflow.log on the snubbed machine includes lines like:
May-10 16:51:07.332 [exchange-worker-#69%nextflow%] INFO  o.a.i.i.p.c.GridCachePartitionExchangeManager - Skipping rebalancing (nothing scheduled) [top=AffinityTopologyVersion [topVer=2, minorTopVer=0], evt=NODE_JOINED, node=96c7d4b0-dc9b-4a9d-9a50-1b87bd4c512e
but there are still plenty of jobs in the queue.
Robert Syme
@robsyme
May 10 2016 08:59 UTC
The full logs are here (but any pointers to debug myself would be great)
Robert Syme
@robsyme
May 10 2016 09:18 UTC
Oh, the "Skipping rebalancing (nothing scheduled)" is expected, as the worker node has no jobs scheduled when it sees the master come online.
Paolo Di Tommaso
@pditommaso
May 10 2016 09:46 UTC
You could enable more detailed logging on the daemon side by using -trace nextflow.daemon.CustomStealingCollisionSpi
However, my guess is that there aren't enough mem/cpus on the nodes
Could it be that?
Robert Syme
@robsyme
May 10 2016 09:47 UTC
I don't think so. Would that be reported with --trace?
Paolo Di Tommaso
@pditommaso
May 10 2016 09:48 UTC
This
Robert Syme
@robsyme
May 10 2016 09:48 UTC
Perfect
Paolo Di Tommaso
@pditommaso
May 10 2016 09:48 UTC
and this
Robert Syme
@robsyme
May 10 2016 09:51 UTC
-trace is on the master node, right?
Paolo Di Tommaso
@pditommaso
May 10 2016 09:51 UTC
nope, on the daemon(s)
Robert Syme
@robsyme
May 10 2016 09:53 UTC
hmmm. I get
$ nextflow node -trace nextflow.daemon.CustomStealingCollisionSpi
Unknown option: -trace -- Check the available commands and options and syntax with 'help'
Paolo Di Tommaso
@pditommaso
May 10 2016 09:53 UTC
um, it should be
nextflow -trace ... node
Robert Syme
@robsyme
May 10 2016 09:54 UTC
silly me, sorry.
Paolo Di Tommaso
@pditommaso
May 10 2016 09:54 UTC
no pb!
Robert Syme
@robsyme
May 10 2016 09:58 UTC
There seem to be plenty of resources available on the lonely node:
TRACE n.daemon.CustomStealingCollisionSpi - Node `localhost` resources > cpus: 12 (12) - mem: 15.6 GB (942.5 MB) - active: 0 - waiting: 34
Oh, wait. Free mem < 1GB
That's it.
Paolo Di Tommaso
@pditommaso
May 10 2016 09:59 UTC
Is the JVM consuming the remaining 15 GB?
Robert Syme
@robsyme
May 10 2016 10:00 UTC
No, it's being used as disk cache
Paolo Di Tommaso
@pditommaso
May 10 2016 10:00 UTC
oh
isn't that too much?
Robert Syme
@robsyme
May 10 2016 10:02 UTC
I don't think I have control over disk cache. My understanding is that memory used as cache is freed up as soon as any other application asks for it. I'd suggest that the free memory calculation should not include cache (but I'm not sure how to isolate that out).
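One way the cache could be separated out on Linux is to read /proc/meminfo and count buffers and page cache as reclaimable. The snippet below is only a sketch of that idea, not how Nextflow or its daemon actually computes free memory. The helper name availableKb is hypothetical, and the fallback sum is a rough approximation because part of Cached (for example shared memory) cannot be reclaimed.

```groovy
// Hypothetical helper, not Nextflow's implementation: estimate the memory (in kB)
// that a Linux node could give to new jobs, treating disk cache as reclaimable.
long availableKb(String meminfo) {
    // Parse lines such as "MemFree:  965120 kB" into a name -> kB map
    Map<String, Long> fields = [:]
    meminfo.eachLine { String line ->
        def m = line =~ /^(\w+):\s+(\d+)/
        if (m.find()) {
            fields[m.group(1)] = m.group(2).toLong()
        }
    }
    // Kernels from 3.14 onwards report an estimate directly
    if (fields.containsKey('MemAvailable')) {
        return fields['MemAvailable']
    }
    // Fallback approximation: free memory plus buffers and page cache
    return (fields['MemFree'] ?: 0L) + (fields['Buffers'] ?: 0L) + (fields['Cached'] ?: 0L)
}

// On a Linux node this could be called as:
//   println availableKb(new File('/proc/meminfo').text)
```

With a figure like this, a node whose memory is mostly held by disk cache would still be reported as having room for jobs.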
Paolo Di Tommaso
@pditommaso
May 10 2016 10:03 UTC
you may be right, please report an issue at your convenience
Robert Syme
@robsyme
May 10 2016 10:04 UTC
Will do. Thanks Paolo!
Paolo Di Tommaso
@pditommaso
May 10 2016 10:04 UTC
Welcome!
Robert Syme
@robsyme
May 10 2016 10:05 UTC
In the meantime, a restart on the node is an easy fix :)
Paolo Di Tommaso
@pditommaso
May 10 2016 10:06 UTC
:)
Szilveszter Juhos
@szilvajuhos
May 10 2016 10:12 UTC
hola, if I want to store some data directly from nextflow in json, can I use the groovy json capabilities somehow?
so I do not have to write another script outside my flow
Robert Syme
@robsyme
May 10 2016 10:15 UTC
Paolo Di Tommaso
@pditommaso
May 10 2016 10:16 UTC
@szilvajuhos Nextflow does not implement any special JSON feature, however it includes the excellent Groovy JSON library. Have a look at http://groovy-lang.org/json.html
@robsyme A lot!
Robert Syme
@robsyme
May 10 2016 10:17 UTC
A fix faster than restarting:
# sync && echo 3 > /proc/sys/vm/drop_caches
Paolo Di Tommaso
@pditommaso
May 10 2016 10:17 UTC
ah, nice
to remember
@szilvajuhos I think you are looking for JsonBuilder
Szilveszter Juhos
@szilvajuhos
May 10 2016 10:34 UTC
that is ok, I just do not know how to mix nf and groovy
Paolo Di Tommaso
@pditommaso
May 10 2016 10:34 UTC
nothing special
NF is a groovy script
so just import it at the top of your script and then use it as you would do in a groovy script
Szilveszter Juhos
@szilvajuhos
May 10 2016 10:41 UTC
that is what I expected, but I am probably making a simple error: even if the whole "script" is nothing but a single import statement, I am getting this:
Paolo Di Tommaso
@pditommaso
May 10 2016 10:41 UTC
@mes5k Thanks a lot for your PR! I've commented on it. By the way, do you have a screenshot of the produced HTML?
Szilveszter Juhos
@szilvajuhos
May 10 2016 10:41 UTC
obs_test.nf: 1: unable to resolve class groovy.json
@ line 1, column 1.
import groovy.json
^
Paolo Di Tommaso
@pditommaso
May 10 2016 10:41 UTC
import groovy.json.*
like in java, the only difference is that the semicolon is not required
Szilveszter Juhos
@szilvajuhos
May 10 2016 10:42 UTC
yes, as soon as I pasted it, I saw that the star was missing
Paolo Di Tommaso
@pditommaso
May 10 2016 10:42 UTC
yep
Szilveszter Juhos
@szilvajuhos
May 10 2016 10:43 UTC
cool, now it works, thanks for sorting it out
Paolo Di Tommaso
@pditommaso
May 10 2016 10:43 UTC
:+1:
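For reference, the JSON exchange above could look roughly like this in a Nextflow or plain Groovy script. It is a minimal sketch: the summary map, its keys and the temporary output file are made up for illustration. Only the import and the JsonBuilder and JsonSlurper classes come from the Groovy JSON library mentioned in the conversation.

```groovy
import groovy.json.*

// Hypothetical data to store; the keys and values are illustrative only
def summary = [sample: 'S1', reads: 1200, tools: ['bwa', 'samtools']]

// Build the JSON text from a plain Groovy map
def json = new JsonBuilder(summary).toPrettyString()

// Write it to a file; a temporary file stands in for a real output path here
def out = File.createTempFile('summary', '.json')
out.text = json

// Read the file back with JsonSlurper to check the round trip
def parsed = new JsonSlurper().parseText(out.text)
println parsed.sample
```

Because a Nextflow script is a Groovy script, the import goes at the top of the .nf file and the rest can sit in the main script body.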
Paolo Di Tommaso
@pditommaso
May 10 2016 12:15 UTC
@robsyme I've uploaded 0.19.0-SNAPSHOT with a patch for #150, you may want to give it a try
Mike Smoot
@mes5k
May 10 2016 20:13 UTC
nextflow_screenshot.png
That's a screenshot of a nextflow DAG rendered with Cytoscape.js. It's a bit crooked in places because I adjusted the positioning of the nodes manually. You can zoom in/out on the actual html so it won't be as small in real life.
Paolo Di Tommaso
@pditommaso
May 10 2016 20:18 UTC
Not bad
do you think it could be useful?
Mike Smoot
@mes5k
May 10 2016 20:22 UTC
Yes. Of course, what I'd really like to see is the visualization updated (e.g. nodes changing colors) dynamically as the pipeline progresses. Then serve that page somewhere so that people can easily visualize their progress. But I understand that's a bunch more work!
Paolo Di Tommaso
@pditommaso
May 10 2016 20:23 UTC
Yes, that would be very cool, but at this time it seems not feasible
Mike Smoot
@mes5k
May 10 2016 20:24 UTC
Understood.
Paolo Di Tommaso
@pditommaso
May 10 2016 20:24 UTC
however there's a plan to introduce an execution API that could be used to stream dag/timeline interactive rendering
Mike Smoot
@mes5k
May 10 2016 20:26 UTC
It seems like TraceObserver already gets us most of the way there, no? The roadblock as I understand it is that DAG is only created after the pipeline runs.
Paolo Di Tommaso
@pditommaso
May 10 2016 20:27 UTC
I was wondering if it can be improved using D3
Any idea?
> It seems like TraceObserver already gets us most of the way there, no? The roadblock as I understand it is that DAG is only created after the pipeline runs.
yes
the biggest problem currently is that there isn't a daemon serving the output
Mike Smoot
@mes5k
May 10 2016 20:29 UTC
Yeah, D3 would work too. I just used Cytoscape because I've got history with them. D3's model for animation with data entering and exiting the visualization would be a natural fit for a dynamic visualization.
gotta run, but I'm going to experiment with a few things when I get a chance
Paolo Di Tommaso
@pditommaso
May 10 2016 20:34 UTC
I don't want to seem ungrateful, and I'm really happy with your PR, but I would prefer to give D3 a try before merging it
Mike Smoot
@mes5k
May 10 2016 23:18 UTC
Bare bones D3 has been added. It still needs some work to make it look nice, but that's not something I can work on at this point.