These are chat archives for nextflow-io/nextflow

2nd
Nov 2017
Maxime Garcia
@MaxUlysse
Nov 02 2017 07:45
Can you give us a little more detail about the output of your process A?
But my first choice would be to put all the files you want to go through your process B in the same channel
Felix Kokocinski
@fkokocinski
Nov 02 2017 09:45
Does anyone know the best way to get a variable set within a process to other processes?
In my case: process 1 is an external script that writes a string to a file, process 2 and all following want to use the string to form path names, etc.
Would I create a channel in process 2 that emits the string after reading the file?
Or can I set a global variable within a process?
Paolo Di Tommaso
@pditommaso
Nov 02 2017 09:48
processes are isolated by design
they can only communicate data via input/output channels or a global (read-only) variable
Felix Kokocinski
@fkokocinski
Nov 02 2017 10:22
OK, thanks, so I think I would have to send the string to an output channel, create another process that uses this as input to build the path names and sends these as output to other processes.
Paolo Di Tommaso
@pditommaso
Nov 02 2017 10:24
it doesn't necessarily have to be a process; for a small task you can also use one (or more) operator(s)
Felix Kokocinski
@fkokocinski
Nov 02 2017 10:27
ah, but how can I time the operator to run after process 1 which creates the file containing the string?
Paolo Di Tommaso
@pditommaso
Nov 02 2017 10:28
if the operator takes the output of the process as its input, the ordering is implicit
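A minimal sketch of that pattern, written in the same `from`/`into` syntax used elsewhere in this log. The process names, channel names, file name, and the string `run_42` are all hypothetical; the only idea taken from the discussion is that the operator consumes the process output, so it cannot run before the file exists.

```groovy
// Process 1: an external step writes a string to a file (hypothetical content).
process writeName {
    output:
    file 'name.txt' into name_file_ch

    script:
    """
    printf 'run_42' > name.txt
    """
}

// The operator receives the file only after writeName has emitted it,
// so the ordering is implicit. `text` reads the file content as a String.
name_file_ch
    .map { it.text.trim() }
    .set { name_ch }

// Process 2: uses the string to build a path name.
process useName {
    input:
    val name from name_ch

    script:
    """
    mkdir -p results_${name}
    """
}
```

If several downstream processes need the string, the channel would presumably have to be split into one channel per consumer, as is done with `into (a, b)` elsewhere in this log.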
Felix Kokocinski
@fkokocinski
Nov 02 2017 10:32
ah, yes, I'll try that. Many thanks Paolo.
Paolo Di Tommaso
@pditommaso
Nov 02 2017 10:33
:+1:
Felix Kokocinski
@fkokocinski
Nov 02 2017 10:40
sorry, one last thing: Is there a way to use this path I create dynamically in a publishDir directive?
The downstream process writes a number of files there that I don't explicitly declare as output but want to conserve.
Or is it generally enough to publishDir the parent folder?
Paolo Di Tommaso
@pditommaso
Nov 02 2017 10:42
yes, BUT downstream tasks should not depend on the output stored by publishDir; I would suggest using the usual output mechanism
Evan Floden
@evanfloden
Nov 02 2017 10:43
A massive thank you to @ewels and all those who worked on the html report. Just ran my first several day execution with it and worked perfect!
The two things I can think of to add would be: i) automated emailing of the report ii) have the hover over on the graphs include the task tag.
Paolo Di Tommaso
@pditommaso
Nov 02 2017 10:47
what do you mean by ii) ?
Evan Floden
@evanfloden
Nov 02 2017 10:47
1 and 2
Paolo Di Tommaso
@pditommaso
Nov 02 2017 10:47
ahhh
you mean the scatter ?
Evan Floden
@evanfloden
Nov 02 2017 10:48
In the report graphs, press "show closest data on hover"
For me this shows the process name which is redundant.
Edgar
@edgano
Nov 02 2017 10:48
@evanfloden wrote: "A massive thank you to @ewels and all those who worked on the html report. Just ran my first several day execution with it and worked perfect!"
Working in the 1st! :)
Evan Floden
@evanfloden
Nov 02 2017 10:48
The task tag would be much more useful so I know which task is using the resources shown.
Paolo Di Tommaso
@pditommaso
Nov 02 2017 10:51
still not understanding where you want to use/show it
Evan Floden
@evanfloden
here, in the hover-over, default_alignment is redundant
The task tag would be much more useful so I know which task used 36.5GB of mem for example
Paolo Di Tommaso
@pditommaso
Nov 02 2017 10:55
I see! you are right, create an issue. I guess there will be a long list of enhancement requests for this report
Evan Floden
@evanfloden
Nov 02 2017 10:55
Great!
Simone Baffelli
@baffelli
Nov 02 2017 15:16
Good afternoon. Can somebody help me solve a mystery?
I have this process:
process variogramRegressors{

    input:
        set file(dem_gc), file(dem_seg_par) from dem_gc_phgt
        set    file(plist), file(ref_slc_par),  file(ref_mli_par), file(ref_mli) from plist_for_phgt
        set file(pmap_coord), file(pradar_coord) from geocoded_plist


    output:
        file(phgt_txt) into (phgt, phgt_for_variogram)
        set file(phgt_txt), file(pmap_coord_txt), file(pradar_coord_txt) into regressors_for_variogram
        file(spatialGridClean) into (spatialGrid, spatialGridForPtVgm)

    shell:
        '''
            echo à
            data2pt !{dem_gc} !{ref_mli_par} !{plist} !{ref_slc_par} phgt 1 2
            pdata_to_csv.py !{plist} !{ref_slc_par} phgt FLOAT phgt_txt
            pdata_to_csv.py !{plist} !{ref_slc_par} !{pmap_coord}  FLOAT pmap_coord_txt
            pdata_to_csv.py !{plist} !{ref_slc_par} !{pradar_coord}  FLOAT pradar_coord_txt
            echo "ridx azidx r az x y h" > spatialGrid
            paste <(awk 'NR > 1 {print $1, $2, $3, $4}' pradar_coord_txt) <(awk 'NR > 1 {print $3, $4}' pmap_coord_txt) <(awk 'NR > 1 {print $3}' phgt_txt) >> spatialGrid
            #Convert to single spaces
            cat spatialGrid | tr '\\t' ' '  >> spatialGridClean
        '''
}
when it is run by nextflow, the output file spatialGridClean is empty. However, when I cd to the workdir and run the command using .command.sh, the file is produced as expected
I suspected it might be somehow related to the quoting of special characters
but I cannot find how
Edgar
@edgano
Nov 02 2017 15:23
I'm not an expert... but I think that you are only saving spatialGridClean
my mistake, I misunderstood the question
Simone Baffelli
@baffelli
Nov 02 2017 15:24
it is empty when run by nextflow
but it is created correctly when i cd to the workdir
and run the bash command
Evan Floden
@evanfloden
Nov 02 2017 15:25
@baffelli wrote: "I suspected it might be somehow related with quoting of special characters"
Simone Baffelli
@baffelli
Nov 02 2017 15:25
but then, why is the bash script working correctly?
it is generated by nextflow
Evan Floden
@evanfloden
Nov 02 2017 15:30
How about adding the spatialGrid file as an output and examining it? It may help identify on which line the problem is occurring
Simone Baffelli
@baffelli
Nov 02 2017 15:31
The problem is simply that if I cd to the workdir spatialGridClean is empty
However, if I run the script manually with .command.sh it is filled as expected
it is very puzzling
Evan Floden
@evanfloden
Nov 02 2017 15:32
Sure, but you don't know if spatialGrid is ever populated, correct?
Simone Baffelli
@baffelli
Nov 02 2017 15:33
i do
it is
so I know something should be failing at that step cat spatialGridInit | tr "\\t" ' ' >> spatialGridClean
which is a horrible way to convert all tabs to spaces
Evan Floden
@evanfloden
Nov 02 2017 15:33
Ah, even when running NF. Cool, so you know it is the last line?
Simone Baffelli
@baffelli
Nov 02 2017 15:33
I do
but as I said, when I run it manually it works, but NF somehow fails to make it run
perhaps a different bash setting?
Evan Floden
@evanfloden
Nov 02 2017 15:39
I'm stuck as well then. I'm sure someone will come along soon
Simone Baffelli
@baffelli
Nov 02 2017 15:43
I'll try with sed instead of tr
just out of curiosity
Phil Ewels
@ewels
Nov 02 2017 16:43
@skptic - great! Glad it worked! Suggestion (i) was part of my original plan for the hackathon but wasn't the top priority and never made it through to the release. If you fancy working on it then our pipelines already have e-mailed reports for template code, plus @pditommaso and I have chatted about how to implement it fairly extensively (I'm not going to have time to work on this myself for the foreseeable future sorry)
@skptic @pditommaso - regarding the task name on hover: interested to see if you can do this! Been wanting the same thing for MegaQC boxplots. Haven't found a solution yet, plotly seems to have limited support for per-point data in boxplots annoyingly.
You can always just check the table at the bottom though - clicking the headers sorts the data, so it's only a single click away ;)
Félix C. Morency
@fmorency
Nov 02 2017 18:16
Is there a way to get the remainders of the combine() operator?
mmm nvm
Shawn Rynearson
@srynobio
Nov 02 2017 22:18
Is there a preferred location within a process {} to add operations like this and this
often when I'm trying to read the contents of a file, I get a "no such variable" when using examples with myReader etc
Paolo Di Tommaso
@pditommaso
Nov 02 2017 22:21
Can you show an example of your code?
Shawn Rynearson
@srynobio
Nov 02 2017 22:24
Simple example would be something like:
process test {

    input:
    file(data) from incomining_ch

    myReader = data.newReader()
    String line
    while( line = myReader.readLine() ) {
        println line
    }
    myReader.close()

    """
    """
}
Paolo Di Tommaso
@pditommaso
Nov 02 2017 22:24
in principle it should be
process test {

    input:
    file(data) from incomining_ch

    script:
    myReader = data.newReader()
    String line
    while( line = myReader.readLine() ) {
        println line
    }
    myReader.close()

    """
    """
}
but it's not possible to access the content of input files in this way
Shawn Rynearson
@srynobio
Nov 02 2017 22:26

I basically have a single file that contains:

sample, sample.file

And I want to transfer the file across a channel, then use something like map{ it.tokenize(',')} to split the values.

I tried the script method, but I kept getting an error that data was an unknown variable.
Paolo Di Tommaso
@pditommaso
Nov 02 2017 22:27
the best way is to apply that operation with an operator, e.g.
process test {
  input: 
  val data from incomining_ch.map { it.tokenize(',') }

  script: 
  """
  something_here $data
  """
}
Shawn Rynearson
@srynobio
Nov 02 2017 22:28
Sure, but data would only contain one of the two values, right?
Paolo Di Tommaso
@pditommaso
Nov 02 2017 22:29
um, good question
Shawn Rynearson
@srynobio
Nov 02 2017 22:30
sample,sample.file
val data from incomining_ch.map { it.tokenize(',') }
= $data containing sample
Paolo Di Tommaso
@pditommaso
Nov 02 2017 22:31
data should hold the output of tokenize(',') that is supposed to be a list
Shawn Rynearson
@srynobio
Nov 02 2017 22:31
and my incoming file would look more like:
sample,sample.file
sample1,sample1.file
sample2,sample2.file
sample3,sample3.file
...
Paolo Di Tommaso
@pditommaso
Nov 02 2017 22:32
basically a csv with two columns ..
Shawn Rynearson
@srynobio
Nov 02 2017 22:33
basically, yes it would be,
you think creating a new channel with splitCsv would be preferred?
Paolo Di Tommaso
@pditommaso
Nov 02 2017 22:34
do you need to process it line by line ?
Shawn Rynearson
@srynobio
Nov 02 2017 22:34
Yes, and split the line on the comma
to capture the sample name and the associated file.
Paolo Di Tommaso
@pditommaso
Nov 02 2017 22:36
you can do
process test {
  input: 
  val data from incoming_ch.flatMap { it.tokenize('\n') }.map{ it.tokenize(',') } 
  .. 
}
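To see what the two chained operators do on their own, here is a small sketch outside any process. The variable `sheet` and its content are hypothetical stand-ins for the text of the sample file; the chain itself is the one from the snippet above.

```groovy
// Hypothetical two-column text, standing in for the content of the sample sheet.
sheet = 'sample1,sample1.file\nsample2,sample2.file\n'

Channel
    .from(sheet)                      // one item: the whole text
    .flatMap { it.tokenize('\n') }    // one item per line
    .map { it.tokenize(',') }         // each line becomes a [name, file] list
    .subscribe { println "sample=${it[0]} file=${it[1]}" }
```

Note that `tokenize` keeps any whitespace around the values, so a line such as `sample, sample.file` would likely need a trim step, for example `it.tokenize(',')*.trim()`. If the channel emits a file object rather than its text, reading the content first with `it.text` would presumably be needed.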
Shawn Rynearson
@srynobio
Nov 02 2017 22:37
and data would contain both values?
Paolo Di Tommaso
@pditommaso
Nov 02 2017 22:37
yep
you could use splitCsv but current version has a bug, it only works with this release
Shawn Rynearson
@srynobio
Nov 02 2017 22:39
but I couldn't take apart data to get independent values?
Paolo Di Tommaso
@pditommaso
Nov 02 2017 22:40
or with data[0] and data[1]
or changing the input declaration as
input: 
set val(c1), val(c2) from ..
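A short sketch of the indexed form, assuming a hypothetical channel `pairs_ch` that already emits two-element lists (the channel, process name, and sample values are made up for illustration):

```groovy
// Hypothetical channel of [name, file] pairs.
pairs_ch = Channel.from( ['sample1', 'sample1.file'], ['sample2', 'sample2.file'] )

process showPair {
    echo true

    input:
    val data from pairs_ch

    script:
    """
    echo "name=${data[0]} file=${data[1]}"
    """
}
```

The `set val(c1), val(c2)` declaration does the same unpacking in the input block, which tends to read better once the script uses the values more than once.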
Shawn Rynearson
@srynobio
Nov 02 2017 22:45

Great, so this worked:

process test {
    echo = 'true'

  input:
  set val(sample_id), val(bam) from sampleInfo_ch.flatMap { it.tokenize('\n') }.map{ it.tokenize(',') }

  script:
    """
    echo "$sample_id"
    """
}

It seems it was the double stacked map operations that made this happen.
Thanks, I would never have thought to stack them.

Paolo Di Tommaso
@pditommaso
Nov 02 2017 22:46
chaining operators is super useful
Shawn Rynearson
@srynobio
Nov 02 2017 22:47
Yes!
Thanks again @pditommaso
Paolo Di Tommaso
@pditommaso
Nov 02 2017 22:47
welcome