These are chat archives for nextflow-io/nextflow

15th
Feb 2019
Paolo Di Tommaso
@pditommaso
Feb 15 07:00
@aunderwo yeah, use collect instead, it should have an undocumented sort option in which you can use the same comparator logic as toSortedList
Anthony Underwood
@aunderwo
Feb 15 07:02
.collect(sort {......}) ??
Paolo Di Tommaso
@pditommaso
Feb 15 07:02
good morning! :satisfied:
.collect(sort: {...})
Anthony Underwood
@aunderwo
Feb 15 07:03
Morning - ah yes !! Thanks I'll give that a go :)
Anthony Underwood
@aunderwo
Feb 15 08:05
@pditommaso Thanks that worked. Saved some hacky bash checking of empty strings :)
Anton Khodak
@anton-khodak
Feb 15 09:29
Hi! Just to let the community know, I'm working on a server for remote Nextflow execution. The goal is to be able to run Nextflow from a web interface and to keep the execution part separate: the server will provide endpoints to run Nextflow workflows from a set of parameters, monitor their status and retrieve their outputs (or at least report where they are). Currently it interacts with Nextflow via command-line calls, but potentially the whole thing could be replaced by a Groovy server with native Nextflow integration. Our server is written in Python and is at an early stage of development; the source code is at https://github.com/cellgeni/nf-server
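Since the server drives Nextflow through command-line calls, a minimal Python sketch of that pattern might build the `nextflow run` argument list from a parameter dictionary and start it as a subprocess. The helper names (`build_nextflow_command`, `launch`) are hypothetical and not taken from nf-server; the only Nextflow options assumed are `-r` (revision), `-with-weblog` and the `--name value` convention for pipeline parameters.

```python
import shlex
import subprocess
from typing import List, Mapping, Optional


def build_nextflow_command(pipeline: str,
                           params: Mapping[str, object],
                           revision: Optional[str] = None,
                           weblog_url: Optional[str] = None) -> List[str]:
    """Build the argv list for `nextflow run` (hypothetical helper)."""
    cmd = ["nextflow", "run", pipeline]
    if revision is not None:
        cmd += ["-r", revision]                 # branch, tag or commit to run
    if weblog_url is not None:
        cmd += ["-with-weblog", weblog_url]     # Nextflow POSTs execution events here
    for name, value in params.items():
        cmd += [f"--{name}", str(value)]        # pipeline params take a double dash
    return cmd


def launch(cmd: List[str], workdir: str, log_path: str) -> subprocess.Popen:
    """Start Nextflow without a shell, sending its console output to a log file."""
    with open(log_path, "ab") as log:
        # The child keeps its own copy of the file descriptor, so closing ours is safe.
        return subprocess.Popen(cmd, cwd=workdir, stdout=log, stderr=subprocess.STDOUT)


if __name__ == "__main__":
    cmd = build_nextflow_command("my/repo",
                                 {"reads": "data/*.fastq.gz", "outdir": "results"},
                                 revision="v1.1")
    print(shlex.join(cmd))  # printable form for logs; launch(cmd, ".", "nf.log") would run it
```

Passing an argument list rather than a shell string keeps glob-like parameter values such as data/*.fastq.gz from being expanded by a shell before Nextflow sees them.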
Anthony Ferrari
@af8
Feb 15 09:33
👍
Paolo Di Tommaso
@pditommaso
Feb 15 09:33
looks interesting
rfenouil
@rfenouil
Feb 15 09:45
@stevekm Thank you for your answer about optional inputs. For now I went with the "NO_FILE" strategy, but I'll keep your reference for future needs :)
Sorry, I can't help with your question about suffixes...
To stop the execution of a pipeline once a specific process is done, I typically add a when directive into the next process in the workflow, tied to a params.skipSecondPart=false parameter.
Hope it helps.
Alexey Dushen
@blacky0x0
Feb 15 10:23
Good day =) Is Nextflow WDL-compatible? The structures look pretty similar.
Paolo Di Tommaso
@pditommaso
Feb 15 10:31
nope
JavaScript is similar to Java, but I hope you don't confuse them =)
Alexey Dushen
@blacky0x0
Feb 15 10:39
hah, got it) Don't you think that supporting industry standards or specifications would increase the popularity of NF?
Paolo Di Tommaso
@pditommaso
Feb 15 10:42
industry standards change and pass away quickly...
Maxime Garcia
@MaxUlysse
Feb 15 10:45
we have our own NF standards B-)
Philip Jonsson
@kpjonsson
Feb 15 13:04
@pditommaso I've seen that, but it's a slightly different issue, no? The problem I have is that I want to use an NF workflow on compute clusters with different LSF configurations. On one of them the base unit for the bsub command is GB, whereas on the other it is MB. On the former, that leads to jobs sitting around with memory requests in TBs, since NF just converts process.memory to a number. Am I wrong about that?
Paolo Di Tommaso
@pditommaso
Feb 15 13:06
Oops, sorry, I misread it
Philip Jonsson
@kpjonsson
Feb 15 13:07
No worries.
Paolo Di Tommaso
@pditommaso
Feb 15 13:07
that's bad
is bsub able to parse a memory unit when one is specified?
Philip Jonsson
@kpjonsson
Feb 15 13:10
Good question. I'm actually not sure that it can.
micans
@micans
Feb 15 13:12
we have LSF_UNIT_FOR_LIMITS=MB in /etc/lsf.conf
it does not seem to be set in my shell
Philip Jonsson
@kpjonsson
Feb 15 13:13
Right. I don't think bsub can parse different units on submission.
Paolo Di Tommaso
@pditommaso
Feb 15 13:13
uff
micans
@micans
Feb 15 13:13
make it a nextflow option as well, executor-bound?
Philip Jonsson
@kpjonsson
Feb 15 13:16
Uff indeed.
Paolo Di Tommaso
@pditommaso
Feb 15 13:16
last resort
Paolo Di Tommaso
@pditommaso
Feb 15 13:21
is it not exposed in any env variable?
micans
@micans
Feb 15 13:21
I'm searching... so far no luck. If /etc/lsf.conf is standard, that's my best bet so far
still peeking
Paolo Di Tommaso
@pditommaso
Feb 15 13:22
(I hate LSF)
micans
@micans
Feb 15 13:22
new find: the file $LSF_ENVDIR/lsf.conf
I quite like it ... but I see your point
Paolo Di Tommaso
@pditommaso
Feb 15 13:23
:)
Philip Jonsson
@kpjonsson
Feb 15 13:23
new find: the file $LSF_ENVDIR/lsf.conf
Yep, this is where I see that LSF_UNIT_FOR_LIMITS=GB on my cluster.
micans
@micans
Feb 15 13:23
aha, /etc/lsf.conf is a symlink to this
so that is a beautiful portable solution :-P
Paolo Di Tommaso
@pditommaso
Feb 15 13:23
emmmm
is that available on the login node?
micans
@micans
Feb 15 13:25
it's on our head nodes and on our worker nodes
Paolo Di Tommaso
@pditommaso
Feb 15 13:25
well, if so, @kpjonsson has won a pull request proposal :satisfied:
Philip Jonsson
@kpjonsson
Feb 15 13:26
Oh dear.
Philip Jonsson
@kpjonsson
Feb 15 13:50
My idea would be to just take that as a parameter in the executor and scale accordingly. Does that make sense?
Paolo Di Tommaso
@pditommaso
Feb 15 13:52
I'd much prefer a self-configuring solution
it would be enough to add an init method in the LsfExecutor overriding the base init
and it could read $LSF_ENVDIR/lsf.conf and fetch that setting
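A rough Python sketch of the self-configuring approach: read LSF_UNIT_FOR_LIMITS from $LSF_ENVDIR/lsf.conf and convert a memory request in bytes into that unit before it is handed to bsub. This is only an illustration of the logic, not Nextflow's LsfExecutor code; it assumes KB when the setting is absent (taken here to be LSF's default) and treats the units as binary multiples.

```python
import os

# Bytes per LSF unit, assuming binary multiples (1 KB = 1024 bytes).
UNIT_BYTES = {
    "KB": 1024,
    "MB": 1024 ** 2,
    "GB": 1024 ** 3,
    "TB": 1024 ** 4,
    "PB": 1024 ** 5,
    "EB": 1024 ** 6,
}
DEFAULT_UNIT = "KB"  # assumed LSF default when LSF_UNIT_FOR_LIMITS is not set


def parse_lsf_unit(conf_text: str, default: str = DEFAULT_UNIT) -> str:
    """Return the LSF_UNIT_FOR_LIMITS value found in lsf.conf text."""
    unit = default
    for raw in conf_text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue  # skip blanks, comments and non-assignments
        key, value = line.split("=", 1)
        if key.strip() == "LSF_UNIT_FOR_LIMITS":
            unit = value.strip().strip("\"'").upper()  # last assignment wins
    return unit


def read_lsf_unit(default: str = DEFAULT_UNIT) -> str:
    """Look up $LSF_ENVDIR/lsf.conf, falling back to the default if it is missing."""
    envdir = os.environ.get("LSF_ENVDIR")
    if not envdir:
        return default
    path = os.path.join(envdir, "lsf.conf")
    if not os.path.isfile(path):
        return default
    with open(path, encoding="utf-8", errors="replace") as conf:
        return parse_lsf_unit(conf.read(), default)


def memory_in_lsf_units(mem_bytes: int, unit: str) -> int:
    """Convert a byte count into `unit`, rounding up so a request never shrinks."""
    factor = UNIT_BYTES[unit]
    return -(-mem_bytes // factor)  # ceiling division on integers


if __name__ == "__main__":
    unit = read_lsf_unit()
    print(unit, memory_in_lsf_units(4 * 1024 ** 3, unit))  # a 4 GB request
```

Rounding up means a request that does not divide evenly (say 1500 MB on a cluster configured in GB) becomes 2 rather than 1, so jobs are not under-provisioned.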
Philip Jonsson
@kpjonsson
Feb 15 13:54
That sounds better.
Paolo Di Tommaso
@pditommaso
Feb 15 13:54
et voila
Stephen Kelly
@stevekm
Feb 15 15:06
@anton-khodak that sounds interesting, are you using the http-logging feature for tracking workflow execution? I was eventually planning to implement something similar for my users, at least just the monitoring part because I don't want users to naively click a 'Run' button a hundred times and blow up our cluster LOL
@anton-khodak also I see you are using Flask, was there any particular reason for choosing that? I started doing similar projects in Flask but ended up abandoning it for Django as soon as I needed to start using databases to track data. In this case, you might want a database for tracking the Nextflow execution logs, in which case Django's database ORM is a godsend.
Stephen Kelly
@stevekm
Feb 15 15:12

@rfenouil

To stop the execution of a pipeline once a specific process is done, I typically add a when directive into the next process in the workflow

this is a good idea; unfortunately my workflow has a lot of branching, so there would be a lot of steps I would need to trace out and disable, and it would get super complicated fast. I saw that there is a Nextflow console, and I was hoping that maybe it could be used for something like this, in order to insert a breakpoint in the workflow and enter an interactive console to manually execute steps and check different objects. @pditommaso not sure if this is possible??

also @pditommaso

(I hate LSF)

I'd be interested in hearing why. I've only used it briefly, so I don't know much about it; what's bad about it?

Alexey Dushen
@blacky0x0
Feb 15 15:19

insert a breakpoint in the workflow and enter an interactive console to manually execute steps and check different objects

@stevekm IMHO it's almost impossible. I built the Nextflow project and debugged the test scripts a little. A pipeline runs as a separate system process, as a single unit of work, and after execution we only have the output information and the parsed AST of the DSL text.

Alexey Dushen
@blacky0x0
Feb 15 15:27
Nextflow has an awesome user community, excellent tracing information and a lot of integrations, but it's a pity that it's impossible to debug it like source code. For me, the most wanted features still missing are a GUI to monitor jobs and splitting the code into separate modules.
Anton Khodak
@anton-khodak
Feb 15 16:05
@stevekm I started from a minimal prototype before we settle on a direction, so there is just a minimal viable set of features for now. Currently jobs are tracked through the log files Nextflow leaves behind, but a database could well be the next step. Django felt a bit heavyweight for prototyping, especially when you don't use the front-end and admin parts, and SQLAlchemy is a great alternative to Django's ORM.
Alexey Dushen
@blacky0x0
Feb 15 16:10
regarding weblog: I just edited a Python echo server script to write to a local file, and then just ran tail -f pipeline_real_time_events_file
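A minimal standard-library sketch of that kind of receiver: it accepts the JSON messages that -with-weblog POSTs and appends each one as a single line to pipeline_real_time_events_file, so tail -f shows events as they arrive. The port, the handler names and the one-JSON-object-per-line layout are assumptions for illustration, not part of Nextflow.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer

EVENTS_FILE = "pipeline_real_time_events_file"  # file name from the message above


def append_event(body: bytes, path: str = EVENTS_FILE) -> bool:
    """Append one JSON message as a single line; return False if it is not valid JSON."""
    try:
        event = json.loads(body)
    except ValueError:  # covers JSONDecodeError and undecodable bytes
        return False
    with open(path, "a", encoding="utf-8") as out:
        out.write(json.dumps(event) + "\n")  # re-serialised compactly on one line
    return True


class WeblogHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        ok = append_event(self.rfile.read(length))
        self.send_response(200 if ok else 400)
        self.end_headers()

    def log_message(self, format, *args):
        pass  # silence per-request logging; the events file is the log


if __name__ == "__main__":
    HTTPServer(("0.0.0.0", 8000), WeblogHandler).serve_forever()
```

With this running, a pipeline started with -with-weblog http://localhost:8000 should have its events followed by tail -f pipeline_real_time_events_file.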
Alexey Dushen
@blacky0x0
Feb 15 16:18
are you talking about https://github.com/nextflow-io/nextflow/tree/master/modules/nf-console?
Stephen Kelly
@stevekm
Feb 15 16:19
oh yeah I remember now, I was using PostgreSQL since it could store the nested weblog JSON directly; https://github.com/stevekm/nf-dashboard/blob/master/api.js#L59
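The same idea can be sketched with Python's built-in sqlite3 module in place of PostgreSQL: keep the full weblog message as JSON text and copy a few fields into columns for querying. SQLite stands in for PostgreSQL here (so the payload is plain text rather than a native JSON column), and the field names runName, event and utcTime are assumptions about the weblog payload; missing fields are simply stored as NULL.

```python
import json
import sqlite3

SCHEMA = """
CREATE TABLE IF NOT EXISTS weblog_events (
    id        INTEGER PRIMARY KEY AUTOINCREMENT,
    run_name  TEXT,
    event     TEXT,
    utc_time  TEXT,
    payload   TEXT NOT NULL  -- full JSON message, kept verbatim
)
"""


def open_db(path: str = ":memory:") -> sqlite3.Connection:
    conn = sqlite3.connect(path)
    conn.execute(SCHEMA)
    return conn


def store_event(conn: sqlite3.Connection, message: dict) -> None:
    """Insert one weblog message; absent fields become NULL columns."""
    conn.execute(
        "INSERT INTO weblog_events (run_name, event, utc_time, payload) VALUES (?, ?, ?, ?)",
        (message.get("runName"), message.get("event"), message.get("utcTime"),
         json.dumps(message)),
    )
    conn.commit()


def latest_event(conn: sqlite3.Connection, run_name: str):
    """Most recent event name recorded for a run, or None if the run is unknown."""
    row = conn.execute(
        "SELECT event FROM weblog_events WHERE run_name = ? ORDER BY id DESC LIMIT 1",
        (run_name,),
    ).fetchone()
    return row[0] if row else None
```

Keeping the verbatim payload alongside a few extracted columns means the schema does not need to change when the nested message gains fields.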
Paolo Di Tommaso
@pditommaso
Feb 15 17:30
Nextflow is pure Java bytecode, therefore interactive debugging is surely technically possible
Paolo Di Tommaso
@pditommaso
Feb 15 18:06
if anybody wants to step in and explore this feature, I'm happy to support them
Alexey Dushen
@blacky0x0
Feb 15 20:08
It would be great! Could you make a video or an animated gif file with LICEcap?
Jemma Nelson
@fwip
Feb 15 20:22
Question: I have a repository with several processes in it. (These processes don't depend on one another, but have a lot of shared logic implemented in scripts shared between them). Is there an easy way to run a specific process from the commandline? Something like nextflow run https://github.com/my/repo pipeline1.nf ? I love the revision specification & automatic pull, but I don't see a way to use that with a repo containing multiple scripts.
My current workaround plan is to simply check the code out manually and run it, or I suppose I could pull, then run it from the ~/.nextflow/assets/my/repo directory.
Also, I don't know if I'm misreading something, but it looks like running a different version of a pipeline replaces the currently checked-out version with the new one. It'd be cool if there was an option to have them live side by side, like .../assets/my/repo/v1.1/ or .../assets/my/repo/master/
lastwon1216
@lastwon1216
Feb 15 21:55

Hello, is there any way I can get the part of the string before the first period to name output files?
For example,
input:
test.fastq.gz
output:
test.aligned.out.bam

I tried file "${reads.baseName}*.bam", but the output comes out as test.fastqaligned.out.bam.
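In Nextflow, baseName only drops the last extension (test.fastq.gz becomes test.fastq), which is why .fastq survives in the name above; file objects also have a simpleName attribute that drops every extension. The string logic itself, cutting the file name at its first period, looks like this as a small Python sketch; the helper names and the .aligned.out.bam suffix are only for illustration.

```python
from pathlib import PurePosixPath


def stem_before_first_dot(path: str) -> str:
    """File name up to, not including, its first period: test.fastq.gz -> test."""
    return PurePosixPath(path).name.split(".", 1)[0]


def aligned_bam_name(reads: str) -> str:
    # Hypothetical naming scheme matching the example in the question.
    return f"{stem_before_first_dot(reads)}.aligned.out.bam"


if __name__ == "__main__":
    print(aligned_bam_name("/data/test.fastq.gz"))  # test.aligned.out.bam
```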

Laurence E. Bernstein
@lebernstein
Feb 15 23:01
Actually, I was wondering something similar. Can I take:
/blah/bleh/test.fastq.gz
and easily get:
/blah/bleh/test.fastq
or did I miss something simple?
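Stripping only the last extension while keeping the directory is a one-liner with Python's pathlib; in Nextflow terms it roughly corresponds to joining a file's parent directory with its baseName. A minimal sketch, with a hypothetical helper name:

```python
from pathlib import PurePosixPath


def strip_last_extension(path: str) -> str:
    """Drop only the final suffix and keep the directory: a/b/x.fastq.gz -> a/b/x.fastq."""
    return str(PurePosixPath(path).with_suffix(""))


if __name__ == "__main__":
    print(strip_last_extension("/blah/bleh/test.fastq.gz"))  # /blah/bleh/test.fastq
```

Paths without an extension come back unchanged, since with_suffix("") has nothing to remove.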