These are chat archives for nextflow-io/nextflow

13th
Jun 2017
Paolo Di Tommaso
@pditommaso
Jun 13 2017 07:31
version 0.25.0-RC2 is out
Alexander Mikheyev
@mikheyev
Jun 13 2017 08:33
I apologize if this question is elementary; I am really unfamiliar with Java and can't find the answer anywhere. If a channel generates a data structure, say an array, how can I reference it? For instance, if I use fromFilePairs to generate a file ID, I may want to modify it using regex substitution. How can this be accomplished?
Paolo Di Tommaso
@pditommaso
Jun 13 2017 08:34
Well, that's more of an NF-specific question than a Java one, hence you are in the right place :)
channel content is meant to be processed with NF operators
a very common operation is map, which allows you to change the content of a channel (by creating a new one)
for example
Channel.fromFilePairs('/some/data/*')
                 .map { pairId, file -> tuple( modifiedPairId, file ) }   // modifiedPairId: placeholder for the new ID you compute
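A minimal plain-Groovy sketch of the regex substitution asked about above, runnable in `nextflow console`. The list only imitates the `[pairId, [read1, read2]]` items that fromFilePairs emits; the sample names and the `_L001`-style lane suffix being stripped are hypothetical.

```groovy
// Hypothetical items shaped like fromFilePairs output: [pairId, [read1, read2]]
def items = [
    ['sampleA_L001', ['sampleA_L001_R1.fq.gz', 'sampleA_L001_R2.fq.gz']],
    ['sampleB_L002', ['sampleB_L002_R1.fq.gz', 'sampleB_L002_R2.fq.gz']]
]

// Same parameters as .map { pairId, file -> ... }; strips a trailing lane tag such as _L001
def renamePair = { pairId, file ->
    def modifiedPairId = pairId.replaceAll(/_L\d+$/, '')
    return [modifiedPairId, file]
}

// Pass each item's two elements to the two-parameter closure
def renamed = items.collect { item -> renamePair(item[0], item[1]) }
println renamed
```

Inside a pipeline, the same closure body would go in `.map { pairId, file -> ... }`.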
Alexander Mikheyev
@mikheyev
Jun 13 2017 08:39
And the variables pairId, file are automatically recognized because there are two items returned from fromFilePairs? They don't have reserved names?
Paolo Di Tommaso
@pditommaso
Jun 13 2017 08:41
yes exactly, you have to think of them as parameter names in a function
actually the syntax { foo, bar, .. -> } defines a closure which is an anonymous function
something like
Alexander Mikheyev
@mikheyev
Jun 13 2017 08:42
OK, got it. Just to extend this example a bit, if you don't mind, how would I then rename each of the forward and reverse reads, separately?
Paolo Di Tommaso
@pditommaso
Jun 13 2017 08:42
def <no name>( foo, bar, .. ) { 
  /* do something */
  return someValues 
}
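To make the comparison concrete, here is a small Groovy sketch with illustrative names: a named method and a closure that compute the same value. The closure is just a value, which is why it can be handed to an operator like map.

```groovy
// Illustrative names only
// A named method: parameters declared in parentheses
def joinIds(foo, bar) {
    return foo + '_' + bar
}

// The equivalent closure: an anonymous function whose parameters come before ->
def joinIdsClosure = { foo, bar ->
    return foo + '_' + bar
}

def fromMethod = joinIds('sample', 'R1')
def fromClosure = joinIdsClosure('sample', 'R1')
println fromMethod   // sample_R1
println fromClosure  // sample_R1
```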
what do you mean by rename? Renaming the actual files?
Alexander Mikheyev
@mikheyev
Jun 13 2017 08:45
Perhaps rename is too specific an example, I just meant performing some operation on each of the files separately. I.e., how do I reference the individual files in the variable file?
Paolo Di Tommaso
@pditommaso
Jun 13 2017 08:49
OK, since fromFilePairs is supposed to capture pairs of files, that file is expected to be a list of files, hence you can do
file[0] and file[1]
or
specify the flat option in fromFilePairs, which lets you handle each file separately, e.g.
Channel.fromFilePairs('/data/*', flat:true)
                 .map { pairId, fwdFile, rvFile -> /* do something */ }
                 .println()
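A plain-Groovy sketch of the two item shapes, with hypothetical file names: without flat the reads sit in one list indexed with [0] and [1]; with flat:true they are separate tuple elements. The last step rebuilds the flat shape from the nested one by hand.

```groovy
// Hypothetical item without flat: [pairId, [fwd, rev]]
def nested = ['sampleA', ['sampleA_R1.fq', 'sampleA_R2.fq']]
// The same pair in the flat:true shape: [pairId, fwd, rev]
def flat = ['sampleA', 'sampleA_R1.fq', 'sampleA_R2.fq']

def files = nested[1]
def fwd = files[0]   // first file of the pair
def rev = files[1]   // second file of the pair

// Rebuilding the flat shape from the nested one
def rebuilt = [nested[0], fwd, rev]
println rebuilt == flat   // true
```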
Does that make sense?
Alexander Mikheyev
@mikheyev
Jun 13 2017 08:52
Yep, I had the flat:true already figured out :) I am trying to figure out indexing now.
Paolo Di Tommaso
@pditommaso
Jun 13 2017 08:53
good! :)
Alexander Mikheyev
@mikheyev
Jun 13 2017 08:53
So is the correct syntax .map {pairId, file[0], file[1] -> tuple( modifiedPairId, file0, file1 ) } without flat:true?
Paolo Di Tommaso
@pditommaso
Jun 13 2017 08:54
w/o flat yes
provided that modifiedPairId, file0 and file1 are identifiers that you have defined in that closure
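Closure parameters must be plain names, so an expression like file[0] cannot go before the `->`; the indexing belongs in the closure body. A plain-Groovy sketch where modifiedPairId, file0 and file1 are local variables defined with def; the upper-casing rule and the sample item are only examples.

```groovy
// Hypothetical item without flat: [pairId, [fwd, rev]]
def item = ['sampleA', ['sampleA_R1.fq', 'sampleA_R2.fq']]

def toTuple = { pairId, file ->
    // Identifiers are declared in the closure body, not in the parameter list
    def modifiedPairId = pairId.toUpperCase()   // example modification only
    def file0 = file[0]
    def file1 = file[1]
    return [modifiedPairId, file0, file1]
}

def result = toTuple(item[0], item[1])
println result   // [SAMPLEA, sampleA_R1.fq, sampleA_R2.fq]
```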
Alexander Mikheyev
@mikheyev
Jun 13 2017 08:59
How do I do that? I feel lazy asking, but it might save me an hour or more of experimentation figuring this out.
Paolo Di Tommaso
@pditommaso
Jun 13 2017 09:01
well, it depends on what you need to do, but I have the feeling that you need a basic NF/Groovy syntax primer
Alexander Mikheyev
@mikheyev
Jun 13 2017 09:01
That's probably true, let me read up
Paolo Di Tommaso
@pditommaso
Jun 13 2017 09:02
I would suggest that you gain some confidence with it by playing with the basic examples you can find in the documentation
launch nextflow console and experiment a bit with the code snippets in the docs
you may also find this tutorial useful
Alexander Mikheyev
@mikheyev
Jun 13 2017 09:04
OK, thanks for the tip! I read through the docs, but couldn't figure out how to handle indexes, since I am not sure that's explicitly addressed there. I'll work through the courses, and hopefully that will help.
Paolo Di Tommaso
@pditommaso
Jun 13 2017 09:04
couldn't figure out how to handle indexes
you mean how to create a genome index?
Alexander Mikheyev
@mikheyev
Jun 13 2017 09:05
sorry, array indexes
Paolo Di Tommaso
@pditommaso
Jun 13 2017 09:05
ah
a[0], a[1] ?
Alexander Mikheyev
@mikheyev
Jun 13 2017 09:06
yep
Paolo Di Tommaso
@pditommaso
Jun 13 2017 09:06
and does that answer your question? :)
Alexander Mikheyev
@mikheyev
Jun 13 2017 09:07
Yes, as soon as I figure out how to define identifiers in a closure. For that I need to read up on groovy, most likely.
Paolo Di Tommaso
@pditommaso
Jun 13 2017 09:07
like in any other construct
def local_var = 1 
global_var = 2
Alexander Mikheyev
@mikheyev
Jun 13 2017 09:08
and within the scope of a channel?
Paolo Di Tommaso
@pditommaso
Jun 13 2017 09:09
it's the value returned by the closure, eg
Channel.from( 1,2,3 ).map { val ->
    def local_var = val + 2
    return local_var
}
.println()
Phil Ewels
@ewels
Jun 13 2017 09:11
Hi @pditommaso! I've just stumbled across a (self-inflicted) pipeline error that I've now found in our pipelines a few times. It happens when we try to use params.something as a regular variable. If it's defined several times in a pipeline script, I don't think it gets overwritten like a normal variable, right? For example, see this code chunk
Alexander Mikheyev
@mikheyev
Jun 13 2017 09:11
OK, cool. That makes sense. Thank you!
Phil Ewels
@ewels
Jun 13 2017 09:11
If I'm correct that it's not possible to define params.something more than once, do you think it would be possible for NF to throw an error when a script attempts to do this?
It'd save a bit of time debugging ;)
Paolo Di Tommaso
@pditommaso
Jun 13 2017 09:11
@mikheyev have a look at this cheat sheet for a quick intro to Groovy syntax https://dzone.com/refcardz/groovy
@ewels Hi, maybe a warning. I need to investigate; it may not be so trivial. Please open an issue for that
Phil Ewels
@ewels
Jun 13 2017 09:13
ok thanks, will do :+1:
Alexander Mikheyev
@mikheyev
Jun 13 2017 09:14
@pditommaso Thanks a lot for your help! I feel like I can grasp the basic concepts of Groovy, and of NF separately, but putting them together will take a bit of experimentation on my part :) Your explanation already resulted in a minor breakthrough in my brain.
Paolo Di Tommaso
@pditommaso
Jun 13 2017 09:16
you are welcome, NF requires a little paradigm shift at the beginning, but once you get it, it's very easy to use
Paolo Di Tommaso
@pditommaso
Jun 13 2017 11:16
Nextflow workshop schedule is now online
https://www.nextflow.io/blog/2017/nextflow-workshop.html
Simone Baffelli
@baffelli
Jun 13 2017 13:11
Hello! Is it possible to stage a temporary file in the current process working directory?
import groovy.json.JsonOutput
import groovy.json.JsonSlurper

/*
 * Get shapefile
 */
process get_shapefile {
    publishDir "${params.results}/geo"

    input:
        val(name) from params.feature_name
    output:
        file('features.shp') into shapefile

    shell:
        req = "https://api3.geo.admin.ch/rest/services/api\
/MapServer/find?layer=ch.swisstopo.swissnames3d&\
searchText=${name}&searchField=name&geometryFormat=geojson".stripMargin().stripIndent()
        response = new URL(req).text
        jsonSlurper = new JsonSlurper()
        resp_js = jsonSlurper.parseText(response)
        // relative path with no directory part, so getParent() returns null;
        // it is resolved against the JVM working directory, not the task work dir
        jsf = new File('resp.json')
        jsf.text = JsonOutput.toJson(resp_js['results'][0])
        println(jsf.getParent())
        '''
        ogr2ogr features.shp resp.json
        '''
}
I am trying to save the response to a file for later inspection, but getParent returns "null". And if I cd into the working directory I don't see resp.json
Paolo Di Tommaso
@pditommaso
Jun 13 2017 13:17
a code block is delimited with triple ``` not '''
:)
Simone Baffelli
@baffelli
Jun 13 2017 13:18
ahaha true
not '''
I always confuse it
Paolo Di Tommaso
@pditommaso
Jun 13 2017 13:18
:+1:
Simone Baffelli
@baffelli
Jun 13 2017 13:18
Now it looks nicer :smile:
Paolo Di Tommaso
@pditommaso
Jun 13 2017 13:19
yes definitely more readable
anyhow you should manage that outside the process scope
Simone Baffelli
@baffelli
Jun 13 2017 13:21
Or split it into two processes
Paolo Di Tommaso
@pditommaso
Jun 13 2017 13:21
exactly
Simone Baffelli
@baffelli
Jun 13 2017 13:21
send request -> json file
json file -> shapefile
I'm a lazy guy :smile: I wanted to avoid that
Paolo Di Tommaso
@pditommaso
Jun 13 2017 13:22
I know, but it looks to me like a map could also work
in the end you are fetching the JSON from the URL and saving it to a file, no?
if you create a channel emitting the string content of resp_js['results'][0], then a process can save it to a file automatically
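A plain-Groovy sketch of the string-producing step, so the JSON text could be emitted by a channel instead of being written from inside the process. The response text is a hypothetical stand-in for the API reply; only groovy.json.JsonSlurper and JsonOutput are used.

```groovy
import groovy.json.JsonOutput
import groovy.json.JsonSlurper

// Hypothetical stand-in for new URL(req).text
def response = '{"results":[{"id":1,"name":"Foo"},{"id":2,"name":"Bar"}]}'

// Parse the reply and keep only the first result, as in resp_js['results'][0]
def respJs = new JsonSlurper().parseText(response)
def firstResult = respJs['results'][0]

// Serialize it back to a JSON string; this text is what a channel could emit
def jsonText = JsonOutput.toJson(firstResult)
println jsonText
```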
Simone Baffelli
@baffelli
Jun 13 2017 13:39
correct, I'm fetching it
and then converting it to another format
Tiffany Delhomme
@tdelhomme
Jun 13 2017 14:06
hi all, does anyone know why the following command returns an exit code of 1:
$ nextflow run iarcbioinfo/template-nf --help
N E X T F L O W  ~  version 0.24.4
Pulling iarcbioinfo/template-nf ...
 downloaded from https://github.com/IARCbioinfo/template-nf.git
Launching `iarcbioinfo/template-nf` [elegant_thompson] - revision: e7f71cb1d1 [master]

--------------------------------------------------------
  <PROGRAM_NAME> <VERSION>: <SHORT DESCRIPTION>         
--------------------------------------------------------
Copyright (C) IARC/WHO
This program comes with ABSOLUTELY NO WARRANTY; for details see LICENSE
This is free software, and you are welcome to redistribute it
under certain conditions; see LICENSE for details.
--------------------------------------------------------

--------------------------------------------------------
  USAGE                                                 
--------------------------------------------------------

nextflow run iarcbioinfo/template-nf [-with-docker] [OPTIONS]

Mandatory arguments:
--<OPTION>                      <TYPE>                      <DESCRIPTION>

Optional arguments:
--<OPTION>                      <TYPE>                      <DESCRIPTION>

Flags:
--<FLAG>                                                    <DESCRIPTION>

$ echo $?
1
the output is what I am looking for, without any error...
Paolo Di Tommaso
@pditommaso
Jun 13 2017 14:07
because there's an exit 1
:)
the fantastic world of programming .. !
Tiffany Delhomme
@tdelhomme
Jun 13 2017 14:10
here yes...
Félix C. Morency
@fmorency
Jun 13 2017 14:35
Anyone here using SLURM and autofs?
Paolo Di Tommaso
@pditommaso
Jun 13 2017 14:41
we are using it with SGE
adanzon
@adanzon
Jun 13 2017 14:52
Hi, I'm a new Nextflow user and I was wondering what the best way is to deal with groups of BAM files that need to be processed together (by mutect2, for example): normal/tumor/tumor_t1/tumortt2. The file names aren't consistently formatted, so I can't use tricks like "*{1,2}.fq". I think I need to use something like groupBy
adanzon
@adanzon
Jun 13 2017 14:58
list = Channel 
     .from( [1,'A','B'], [1,'B','C'], [2,'B','C'],[2,'X','Y'], [2,'TAC','CAC'])
     .groupBy { it[0] }
[1:[[1, A, B], [1, B, C]], 2:[[2, B, C], [2, X, Y], [2, TAC, CAC]]]
I tried this, but how can I use the map returned by groupBy in the next process?
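groupBy gathers everything into a single map, which is why it is awkward to pass on to a process. The plain-Groovy sketch below, on the same values, turns that map into one tuple per key, the `[key, [values...], [values...]]` shape that Nextflow's groupTuple operator emits on a channel.

```groovy
def rows = [[1, 'A', 'B'], [1, 'B', 'C'], [2, 'B', 'C'], [2, 'X', 'Y'], [2, 'TAC', 'CAC']]

// groupBy returns a single Map: key -> list of rows
def byKey = rows.groupBy { it[0] }

// One tuple per key: [key, all second fields, all third fields]
def grouped = byKey.collect { key, group ->
    [key, group.collect { it[1] }, group.collect { it[2] }]
}
println grouped
```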
Paolo Di Tommaso
@pditommaso
Jun 13 2017 15:08
Hi, how do you usually manage this w/o NF?
adanzon
@adanzon
Jun 13 2017 15:11
Renaming files in bash to respect SampleId_Status format
But I want to manage it more cleanly, by creating a channel of C/T, C/T1 file pairs using the sample ID and status (C for constit and T for tumor), which come from a TSV file containing my metadata: "ID PATH STATUS GENDER ..."
Paolo Di Tommaso
@pditommaso
Jun 13 2017 15:19
ok, you can apply a rule to rename them
I would propose a couple of options:
1) create a txt file listing the file pairs as you need them, then read it line by line and create a channel from it
2) use fromFilePairs providing a custom rule to match the file pairs as needed (since you are able to rename them, you should be able to match them), e.g.
Channel
    .fromFilePairs('/some/data/*', size: -1) { file -> file.extension }
    .println { ext, files -> "Files with the extension $ext are $files" }
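The closure passed to fromFilePairs returns the grouping key for each file, and size: -1 allows groups of any size. The same idea in plain Groovy, assuming hypothetical BAM names already renamed to a `<sampleId>_<status>.bam` pattern so the sample ID is everything before the last underscore:

```groovy
// Hypothetical BAM names following a renamed <sampleId>_<status>.bam convention
def bams = ['05_086_C.bam', '05_086_T.bam', '05_086_T1.bam', '07_112_C.bam', '07_112_T.bam']

// Grouping rule: the sample ID is everything before the last underscore
def sampleIdOf = { String name -> name.substring(0, name.lastIndexOf('_')) }

// Groups may hold any number of files, similar in spirit to size: -1
def groups = bams.groupBy(sampleIdOf).collect { id, names -> [id, names] }
println groups
```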
Félix C. Morency
@fmorency
Jun 13 2017 15:27
@pditommaso is NF the one creating .command.log in the work directories?
Phil Ewels
@ewels
Jun 13 2017 15:27
yes
adanzon
@adanzon
Jun 13 2017 15:28
OK, I will try that; it's closer to my initial implementation. I think I was overthinking this. Thanks @pditommaso
Phil Ewels
@ewels
Jun 13 2017 15:28
@adanzon - our Cancer pipeline handles this with a meta file: https://github.com/SciLifeLab/CAW
(We also run mutect2)
(Basically suggestion 1 from Paolo)
adanzon
@adanzon
Jun 13 2017 15:33
Yeah, I saw your project (it's quite nice actually). I'm trying to do something similar, but I can't figure out how you generate the pairs of files to process together from
[05_086, /data/file.bam, 183,830285] [05_086, /data/file2.bam, 183,830285]
Okay, so you're creating another TSV file containing all the pairs to process? @ewels
Phil Ewels
@ewels
Jun 13 2017 15:37
@MaxUlysse knows best about this
Maxime Garcia
@MaxUlysse
Jun 13 2017 15:40
@adanzon I'm happy you find the project nice. If I can be of any help ping me.
From what I understand you want to generate pairs of C/T files
adanzon
@adanzon
Jun 13 2017 15:42
yes, possibly with multiple tumor files for the same constit @MaxUlysse
Maxime Garcia
@MaxUlysse
Jun 13 2017 15:42
We're beginning our pipeline with TSV files. Inside, we're using Nextflow channels. But since we have multiple possible entry points, we export every step to a TSV file that can be reused later
Ok, so something like C/T1, C/T2...
We're doing that in fact
adanzon
@adanzon
Jun 13 2017 15:43
I saw that. For now I have an entry TSV file, and I was trying to work only with NF channels
Maxime Garcia
@MaxUlysse
Jun 13 2017 15:44
What is your TSV file like ?
adanzon
@adanzon
Jun 13 2017 15:49
something like [ID, /data/file.bam, status (C,T,T2,...)] with metadata such as insert_size and the NGS technology used
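A plain-Groovy sketch of reading such a TSV into `[id, path, status]` rows. The column order ID, PATH, STATUS and the sample lines are assumptions based on the description above; in a pipeline the text would come from the metadata file (for example via Nextflow's splitCsv operator with `sep: '\t'`).

```groovy
// Hypothetical metadata lines: ID, PATH, STATUS, INSERT_SIZE (tab-separated)
def tsv = '''\
05_086\t/data/file.bam\tC\t183
05_086\t/data/file2.bam\tT\t183
05_086\t/data/file3.bam\tT2\t183
'''

def rows = tsv.readLines()
    .findAll { it.trim() }                 // skip blank lines
    .collect { line ->
        def cols = line.split('\t')
        [cols[0], cols[1], cols[2]]        // keep id, path, status
    }
println rows
```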
Maxime Garcia
@MaxUlysse
Jun 13 2017 15:57
Ok, I'd use a groupTuple
Or no, maybe more like a channel for your C, and a channel for your tumors and then use a mix
Sorry, I'm not at my computer so it's a little difficult to find the right example
Maxime Garcia
@MaxUlysse
Jun 13 2017 16:06
I have it
We have a normal sample in one channel and tumor samples in the other, so we're using spread to paste the second over the first channel
I hope it'll be of some help @adanzon
adanzon
@adanzon
Jun 13 2017 16:09
Oh thanks! I think I've got it. It works well for C/T, but I will need to rework it to handle multiple tumor files. It's a great start!
Maxime Garcia
@MaxUlysse
Jun 13 2017 16:12
It's quite straightforward to do that in Nextflow, you just need to take care of your IDs in your channel
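A plain-Groovy sketch of the ID bookkeeping mentioned here, including several tumors per constit: split the rows by status, pair every normal with every tumor (the all-against-all pasting that spread does), and keep only pairs whose IDs match. The rows and the `C` status code for normals are hypothetical examples modelled on the TSV described above.

```groovy
// Hypothetical rows: [id, path, status]
def rows = [
    ['05_086', '/data/n1.bam', 'C'],
    ['05_086', '/data/t1.bam', 'T'],
    ['05_086', '/data/t2.bam', 'T2'],
    ['07_112', '/data/n2.bam', 'C'],
    ['07_112', '/data/t3.bam', 'T']
]

def normals = rows.findAll { it[2] == 'C' }
def tumors  = rows.findAll { it[2] != 'C' }

// Every normal against every tumor, keeping only matching IDs
def pairs = []
normals.each { normal ->
    tumors.each { tumor ->
        if (normal[0] == tumor[0]) {
            pairs << [normal[0], normal[1], tumor[1], tumor[2]]   // [id, normalBam, tumorBam, status]
        }
    }
}
println pairs
```

The ID check is what prevents a tumor from being paired with another sample's normal.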