These are chat archives for nextflow-io/nextflow

17th
Dec 2018
Rad Suchecki
@rsuchecki
Dec 17 2018 00:54
@stevekm re: pollInterval - you can see that in .nextflow.log, e.g.: n.processor.TaskPollingMonitor - Creating task monitor for executor 'slurm' > capacity: 100; pollInterval: 5s; dumpInterval: 5m
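For reference, the values reported in that log line correspond to settings in the `executor` scope of nextflow.config. A minimal sketch, assuming a SLURM setup; the numbers are purely illustrative and not taken from the discussion:

```groovy
// nextflow.config (illustrative values only)
executor {
    name         = 'slurm'
    queueSize    = 100       // shows up as "capacity" in the log line
    pollInterval = '30 sec'  // how often task status is checked
    dumpInterval = '5 min'   // how often executor status is written to the log
}
```

After changing these, the "Creating task monitor" line in .nextflow.log should reflect the new values.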
Paolo Di Tommaso
@pditommaso
Dec 17 2018 09:00
@aunderwo maybe it's worth opening an issue including a test case to replicate the problem
Anthony Underwood
@aunderwo
Dec 17 2018 09:03
@pditommaso the problem I have found with this is that it is not consistent - the worst kind of issue to debug :( At first I thought it may be my config but I'm not sure now since others are seeing the same thing
Paolo Di Tommaso
@pditommaso
Dec 17 2018 09:04
also a minimal example that sometimes hangs would help
Brad Langhorst
@bwlang
Dec 17 2018 13:40
Odd problem… I see this for one particular BAM file; all others in the run completed. The file seems OK, not truncated or anything.
https://gist.github.com/bwlang/0573b66452939aed81ceeb943ccd778c
is this a nextflow issue that’s cropping up because it’s running for so long?
it’s this line that makes me wonder: .command.stub: redirection error: cannot duplicate fd: Too many open files
Paolo Di Tommaso
@pditommaso
Dec 17 2018 13:42
that's not a Nextflow error, it's FastQC reporting it
TL;DR: increase the ulimit for open files
Brad Langhorst
@bwlang
Dec 17 2018 13:43
The stack trace looks like fastqc… but I’m wondering about the file limit error that comes first?
ulimit -n
65535
Paolo Di Tommaso
@pditommaso
Dec 17 2018 13:44
it could even be a file close() leak in fastqc ..
Brad Langhorst
@bwlang
Dec 17 2018 13:45
hmm - thanks for the pointer… i’ll dig a bit more into that.
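One thing that may be worth ruling out is whether the limit seen inside the task matches the one reported in the login shell, since batch schedulers and containers can apply their own limits. A minimal sketch in DSL1 syntax; the process name is hypothetical and `echo true` is the directive that forwards task stdout to the console in Nextflow versions of this era:

```groovy
// Hypothetical diagnostic process: prints the open-file limit in the task environment
process checkOpenFileLimit {
    echo true

    script:
    """
    ulimit -n
    """
}
```

If the number printed here is much lower than 65535, the limit is being set somewhere between the login shell and the job; if it matches, a file handle leak in the tool becomes the more likely explanation.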
Tobias Neumann
@t-neumann
Dec 17 2018 14:30
also a minimal example that sometimes hangs would help
@aunderwo @pditommaso The problem with a test case for me is that I only observed this on AWS, so I would need to make the data on S3 and the associated AWS Batch infrastructure publicly available to anyone, which of course would be an insane thing to do
Krittin Phornsiricharoenphant
@sinonkt
Dec 17 2018 14:42

Hi guys, is there any way to get the runId shown in the weblog started event when I've just executed nextflow run -bg, so I can record it in a database?

{
  runId: '6fec6765-d71c-4c35-bf97-5b1ae89b8673',
  event: 'started',
  runName: 'insane_roentgen',
  runStatus: 'started',
  utcTime: '2018-12-17T14:26:24Z',
};

Is the weblog feature stable enough to rely on? Is it used extensively by our community?

Krittin Phornsiricharoenphant
@sinonkt
Dec 17 2018 14:58
Or is there any way to specify the runId, like nextflow run -runId {runId}?
Krittin Phornsiricharoenphant
@sinonkt
Dec 17 2018 15:38
Maybe the history file could be helpful.
micans
@micans
Dec 17 2018 16:09
Maybe this is related; I find getting the run names and the customised logs a bit fiddly ... I know how to do it, but it's extra work after the main run. Right now I use nextflow log runname -f script. How about an option for the main nextflow run that appends all executed commands to a (gzipped) file? This would thus (because appending) work with -resume. Perhaps it could have printf-like format specifiers, so logging can be customised. Only flush it to log once the exit status is known. I'm interested in other ideas/opinions! My current thinking is that a list of all executed commands is a nice super-basic and trustworthy reference to show users what's been done.
Timothy R. Fallon
@photocyte
Dec 17 2018 18:50
Hello all, I have the following nextflow/groovy code to produce read pairs from columns of a TSV, but the readPairs_ch only emits the last row of the TSV into a process that is using it as an input. Any idea what is wrong or an alternative way to do this?
allLines  = file(params.samples).readLines()
F_ch = Channel.create()
R_ch = Channel.create()
readPairs_ch = Channel.create()
for( line in allLines ) {
    splitline = line.split("\t")
    println splitline
    F_read = splitline[2]
    R_read = splitline[3]
    Channel.fromPath(F_read).set{ F_ch }
    Channel.fromPath(R_read).set{ R_ch }
    F_ch.combine(R_ch).set{ readPairs_ch }
}
Paolo Di Tommaso
@pditommaso
Dec 17 2018 18:51
nearly ..
triple `
new-line
Timothy R. Fallon
@photocyte
Dec 17 2018 18:51
sorry markdown formatting is screwy
Paolo Di Tommaso
@pditommaso
Dec 17 2018 18:51
code
triple ```
Timothy R. Fallon
@photocyte
Dec 17 2018 18:51
thanks :)
Paolo Di Tommaso
@pditommaso
Dec 17 2018 18:51
:clap:
forget for
use splitCsv instead
Timothy R. Fallon
@photocyte
Dec 17 2018 18:54
Is for an operator? Not seeing it in the documentation
Or you're saying don't use a loop
Paolo Di Tommaso
@pditommaso
Dec 17 2018 18:55
yeap
Timothy R. Fallon
@photocyte
Dec 17 2018 18:55
got it
Is it possible to use such loops generally? I'm thinking of the https://github.com/nextflow-io/patterns/blob/master/docs/feedback-loop.adoc as an example of more dynamic channels
Paolo Di Tommaso
@pditommaso
Dec 17 2018 18:56
Channel.fromPath(params.samples)
     .splitCsv(sep:'\t') { row -> ... etc }
it's possible but 99% you don't need it
(I'm fighting with english grammar :))
Timothy R. Fallon
@photocyte
Dec 17 2018 18:58
Okay, thanks! Will implement with splitCsv. If you have an explanation for why the original code block doesn't work I'd be interested to know. Helps me understand how Nextflow is working
Paolo Di Tommaso
@pditommaso
Dec 17 2018 18:58
nice :+1:
because in your former approach you need to create the channel, bind values to it and then also close it ..
verbose and error-prone; the splitCsv approach is much more idiomatic
Timothy R. Fallon
@photocyte
Dec 17 2018 19:00
and .set{} closes the channel?
Paolo Di Tommaso
@pditommaso
Dec 17 2018 19:00
nope
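To make the create/bind/close point concrete: in the original loop, each `.set{ }` call appears to rebind the variable to a fresh channel on every iteration, so only the channel built from the last row is left when the process reads it. A minimal sketch of the manual approach, assuming DSL1-era Nextflow (where `Channel.create()` is available) and the same column layout as the question:

```groovy
// DSL1-era sketch: create the channel once, bind one value per row, then close it
readPairs_ch = Channel.create()

file(params.samples).readLines().each { line ->
    def cols = line.split('\t')
    // bind one (forward, reverse) pair per TSV row
    readPairs_ch << tuple(file(cols[2]), file(cols[3]))
}

// signal that no more values will arrive
readPairs_ch.close()
```

This works but is easy to get wrong (forgetting the close, or rebinding the variable inside the loop), which is why the operator-based splitCsv version is preferred.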
Timothy R. Fallon
@photocyte
Dec 17 2018 19:16
Thanks, here is the solution I came up with.
Channel.fromPath(params.samples)
    .splitCsv(sep: '\t', header: false)
    .map { row ->
        println row
        return tuple(file(row[2]), file(row[3]))
    }
    .set { readPairs_ch }
Also pretty much covered in this pattern https://github.com/nextflow-io/patterns/blob/master/docs/process-per-csv-record.adoc , so I should have read the documentation more!
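For completeness, a sketch of how a process might consume the pairs emitted by `readPairs_ch`, one task per TSV row. This uses DSL1 input syntax; the process name and the `echo` command are placeholders, not part of the original pipeline:

```groovy
// DSL1 syntax; process name and command are hypothetical placeholders
process showReadPair {
    input:
    set file(fwd), file(rev) from readPairs_ch

    script:
    """
    echo "forward: ${fwd} reverse: ${rev}"
    """
}
```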
Timothy R. Fallon
@photocyte
Dec 17 2018 19:30
Another question, in the process input, is there a way to make a file/queue channel symlink be created in a subdirectory of the process working folder?
Stephen Kelly
@stevekm
Dec 17 2018 20:58
@sinonkt I think you are looking for workflow.sessionId. That is the ID that is generated by Nextflow each time you run the workflow. If you want a custom runID value then you can pass it yourself, I do that in my pipelines e.g. here: https://github.com/NYU-Molecular-Pathology/NGS580-nf/blob/2492906eb57cb74b6b112140ab58db2117ae2315/main.nf#L53
@photocyte There are some examples on parsing samplesheets here that might be helpful: https://github.com/stevekm/nextflow-demos/blob/master/parse-samplesheet/main.nf
also there is a hack-ish technique for joining sample ID pairs across channels shown here: https://github.com/stevekm/nextflow-demos/blob/master/join-pairs/main.nf ; I use it in my pipeline here: https://github.com/NYU-Molecular-Pathology/NGS580-nf/blob/2492906eb57cb74b6b112140ab58db2117ae2315/main.nf#L1446
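A rough sketch of the two options mentioned for the run ID question: reading the identifiers Nextflow generates, and passing a custom one. `workflow.sessionId` and `workflow.runName` are standard workflow properties; `params.runID` is a hypothetical parameter name chosen for illustration, and whether `sessionId` is the same value as the weblog `runId` is the suggestion made above rather than something verified here:

```groovy
// main.nf: print identifiers at start-up so a wrapper script can capture them
println "sessionId: ${workflow.sessionId}"
println "runName:   ${workflow.runName}"

// optional user-supplied ID; `params.runID` is a hypothetical parameter name
params.runID = null
if( params.runID ) println "custom run ID: ${params.runID}"
```

The custom value would then be supplied on the command line as `--runID <value>`. Separately, `nextflow run` accepts `-name` to choose the run name instead of a generated one like insane_roentgen.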
Stephen Kelly
@stevekm
Dec 17 2018 21:03

is there a way to make a file/queue channel symlink be created in a subdirectory of the process working folder?

what kind of use case did you have in mind for this? I am not sure there is a direct method but odds are you can just re-tool your task to deal with it, for example in your task script mkdir a new directory then move the input files into it

Timothy R. Fallon
@photocyte
Dec 17 2018 21:04
I think the moving solution might screw up the process caching, but haven't tried
I'm interoperating with an existing pipeline that has some expectations about directory structure
Stephen Kelly
@stevekm
Dec 17 2018 21:05
in that case, the sooner you move the entire pipeline into Nextflow the better :) lol
Timothy R. Fallon
@photocyte
Dec 17 2018 21:07
I've given that some thought, but easier said than done :) It might be possible using the named pattern rename feature described here, but haven't tried that either https://www.nextflow.io/docs/latest/process.html#multiple-input-files
micans
@micans
Dec 17 2018 21:33
@photocyte you can do e.g. file('seq/foobar') from ch_input or file('seq/*') from ch_input.collect(). This might help?
NF will arrange the files in the directory structure as specified.
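A sketch of what that input declaration could look like inside a process, assuming DSL1 syntax and a channel named `ch_input` as in the message above; the process name and the `ls` command are placeholders:

```groovy
// DSL1 syntax; process name and command are hypothetical placeholders
process legacyLayoutStep {
    input:
    file 'seq/*' from ch_input.collect()

    script:
    """
    ls -l seq/
    """
}
```

Because the staging is done by Nextflow rather than by moving files in the task script, the inputs should stay under Nextflow's control for caching purposes.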
Timothy R. Fallon
@photocyte
Dec 17 2018 21:34
I think that is the named pattern rename feature. Will definitely give that a try, just debugging other parts of my pipeline currently :)
micans
@micans
Dec 17 2018 21:45
Ah OK, you got it already, should have spotted that. Good luck debugging! (my status is permanently set to 'bugging or debugging')