These are chat archives for nextflow-io/nextflow

31st
Mar 2017
Maarten van Gompel
@proycon
Mar 31 2017 08:28
I have a question about multiple input files and the filenames of the staged files; I want to catch input files with a glob expression (e.g. *.txt), but explicitly retain the original filenames rather than rename and number them, as it's metadata I explicitly need to preserve. It's a bit unclear from the documentation how to achieve this most elegantly. How should I go about this?
Maarten van Gompel
@proycon
Mar 31 2017 08:33
hmm...thanks.. but in that example the process operates on single files, mine is a single process taking a whole list of files as input
Maarten van Gompel
@proycon
Mar 31 2017 08:39
perhaps with mode flatten on the output channel but then I'm still unsure how to collect it on the input side
Evan Floden
@evanfloden
Mar 31 2017 08:40

If you need to preserve the metadata in the filename, I would create a “sampleID” from the filename and pass that sampleID around with all the files. In this way the first value acts like a key and allows easy use of many operators. The input channel could be:

set val (sampleID), file(sampleID.txt), file (sampleID.tab) ...

Can you post an example of the filenames/process and how you would like them grouped?

Paolo Di Tommaso
@pditommaso
Mar 31 2017 08:43
The glob expression in the input files is not meant to filter files, but as a scheme to rename them
Maarten van Gompel
@proycon
Mar 31 2017 08:43
but if I want to catch multiple input files in one go I need to use a glob expression right?
Paolo Di Tommaso
@pditommaso
Mar 31 2017 08:44
If you want to preserve the original names use a variable handle declaration instead of wildcards
Maarten van Gompel
@proycon
Mar 31 2017 08:44
otherwise I would get only one from the channel each time?
Paolo Di Tommaso
@pditommaso
Mar 31 2017 08:44
Not in the process input
But in the Channel.fromPath expression
Maarten van Gompel
@proycon
Mar 31 2017 08:45
so file multiplefiles from somechannel would work and contain multiple files if I do somechannel = Channel.fromPath("*.txt") ?
hmm, no, that would get them one by one
sorry I'm a bit confused, it's my first steps with nextflow (and groovy) :)
Paolo Di Tommaso
@pditommaso
Mar 31 2017 08:47
Yes
Wait almost
Channel.fromPath("*.txt") is correct
But then you need to group the files you want to keep together
Have a look at groupTuple in the doc
Maarten van Gompel
@proycon
Mar 31 2017 08:50
right, I used that somewhere already even
I'll try to construct a minimal example first
Paolo Di Tommaso
@pditommaso
Mar 31 2017 08:53
:+1:
Phil Ewels
@ewels
Mar 31 2017 09:20
Hi @pditommaso - just want to check that echo: true works as I expect it to..
If I specify it in a process, it should redirect stdout from that process script to the terminal where I'm running Nextflow, right?
(I get nothing)
Kevin Sayers
@KevinSayers
Mar 31 2017 09:25
I think it might just be 'echo true'
Phil Ewels
@ewels
Mar 31 2017 09:25
Never mind, sorry, ignore me - being too impatient. I expected the logs to appear in real time, but they come in a chunk once the process finishes
(I kept killing it before it got that far)
And yes @KevinSayers - you're right (I had echo true in my script actually, that was just me misremembering as I typed in gitter)
Ok great. Would be a little nicer if it worked a bit more like tail -f on the logs, as I often find myself going into work directories to do that to check how some long-running processes are getting on. But that's a minor thing.
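For reference, a minimal sketch of the directive as discussed above, written without a colon, in the process syntax of the time. The process name sayHello and the echoed text are hypothetical and serve only as illustration:

```groovy
process sayHello {
    // Forward the stdout of the task script to the terminal running Nextflow.
    echo true

    script:
    """
    echo 'Hello from the process'
    """
}
```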
Evan Floden
@evanfloden
Mar 31 2017 09:40

@proycon See this example below.

/* Given 6 files = a1.txt, a2.txt, a3.txt, a1.tab, a2.tab, a3.tab */
filesChannel = Channel.fromPath("*.txt")
filesChannel
  .view()

Returns 3 items of 1 element each:

a1.txt
a2.txt
a3.txt

listedChannel = Channel.fromPath("*.txt")
  .collect()
  .view()

Returns 1 item with 3 elements:

a1.txt, a2.txt, a3.txt

groupedChannel = Channel.fromPath("*.t*")
  .map { file -> tuple(file.baseName, file) }
  .groupTuple()
  .view()

Now, starting from any file matching *.t*, we group on the baseName. Returns 3 items, each a key followed by the list of files that share it:

[a1, [a1.txt, a1.tab]]
[a2, [a2.txt, a2.tab]]
[a3, [a3.txt, a3.tab]]
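
For the original question (one process receiving the whole list of files while keeping their original names), the collect() variant can feed a process whose input is declared with a plain variable name rather than a wildcard pattern. This is a minimal sketch under assumptions, in the process syntax of the time: the process name copyAll, the channel names, the copy_ prefix and the outputdir directory are all hypothetical, and the loop assumes filenames without spaces.

```groovy
// Gather every matching file into a single list item (the collect() case above).
allDocuments = Channel.fromPath("*.txt").collect()

process copyAll {
    input:
    // A variable name (not a glob such as '*.txt') stages the files
    // under their original names.
    file docs from allDocuments

    output:
    // Output names carry a hypothetical 'copy_' prefix so they differ
    // from the names of the staged input files.
    file "outputdir/copy_*.txt" into copiedDocuments

    script:
    """
    mkdir outputdir
    for f in $docs; do
        cp "\$f" "outputdir/copy_\$f"
    done
    """
}

// Emit the produced files one by one and print them.
copiedDocuments.flatten().view()
```

Inside the script, $docs expands to the space-separated list of staged filenames, while \$f is escaped so that the shell, not Nextflow, resolves it.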
Maarten van Gompel
@proycon
Mar 31 2017 10:23
@skptic Thanks! That looks very helpful!
Evan Floden
@evanfloden
Mar 31 2017 10:25
No problem, if you have any questions, post your actual filenames, how you want them grouped/processed and we can work through it.
Maarten van Gompel
@proycon
Mar 31 2017 10:26
I think the collect() option was the key for me already, the 2nd step I might not even need
Evan Floden
@evanfloden
Mar 31 2017 10:26
:+1:
Maarten van Gompel
@proycon
Mar 31 2017 10:29
This is my toy example for this issue: https://github.com/LanguageMachines/lamaflow/blob/master/multiinputtest.nf .. the multiple input and output now seems to work, but it refuses to find the output files currently (perhaps because in the toy example they're identical to the input files and those are excluded from the glob according to the documentation)
Evan Floden
@evanfloden
Mar 31 2017 10:34
Cool, so in your example, ‘all documents’ becomes a list of all .txt files. I think your output should be outputdir/*.txt instead of output/*.txt
Maarten van Gompel
@proycon
Mar 31 2017 10:35
Oh right, that was a silly mistake indeed :)
though strangely enough, fixing that still doesn't solve it.. so perhaps my original hunch was right?
Maarten van Gompel
@proycon
Mar 31 2017 10:44
seems so, if I change the filenames so they're not identical to the input ones it works fine
rachanaj
@rachanaj
Mar 31 2017 21:21
Hi I want to pass 2 config files to nextflow at command line. For example I want to do this: nextflow run src/nextflow/metrics.nf -c $1 -c $2 -resume
Where $1 and $2 are some config files
Paolo Di Tommaso
@pditommaso
Mar 31 2017 21:23
and?
rachanaj
@rachanaj
Mar 31 2017 21:23
And this does not work
Paolo Di Tommaso
@pditommaso
Mar 31 2017 21:24
um, some more details? what's not working? what are you expecting?
rachanaj
@rachanaj
Mar 31 2017 21:26
Ok, sorry. I want to be able to pass 2 different config files to the nextflow script.
Currently my script takes only 1 config file. So if I do something like: nextflow run src/nextflow/metrics.nf -c $1 , it works and the pipeline runs as expected
But if I pass 2 different config files using the -c flag, such as "nextflow run src/nextflow/metrics.nf -c configfile1.txt -c configfile2.txt", the script only reads the first config file but not the second
Is there a way to do that in nextflow?