These are chat archives for nextflow-io/nextflow

19th
Sep 2018
Bioninbo
@Bioninbo
Sep 19 2018 09:14

Hello everyone,

Do you know how to flatten a list by one level only?

For instance if I do this:

Channel.from([['a','b'],'c',[['d','e'],['f']]]).map{ it.flatten() }

I would get ['a','b','c','d','e','f']
but I want to get ['a','b','c',['d','e'],'f']

Also I would like to filter within a list
For instance:

Channel.from(['catdog','cat','dog','foo']).map{it[3], it.filter( ~/.*dog.*/ )}

doesn't work,
But I would like to get this: ['foo','catdog','dog']

Paolo Di Tommaso
@pditommaso
Sep 19 2018 09:16
Channel.from([['a','b'],'c',[['d','e'],['f']]]).flatten().println()
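For flattening a single level only, here is a minimal sketch in plain Groovy, assuming the whole nested list arrives as one item. `collectMany` unwraps each element once and leaves deeper nesting alone. The name `flattenOnce` is made up for this example. Note that a uniform one-level rule keeps `['f']` wrapped, so the result differs slightly from the list asked for above.

```groovy
// Sample data from the question.
def nested = [['a', 'b'], 'c', [['d', 'e'], ['f']]]

// Hypothetical helper: unwrap exactly one level of nesting.
// Lists are spliced into the result; anything else is kept as a single element.
def flattenOnce = { List items ->
    items.collectMany { it instanceof List ? it : [it] }
}

def result = flattenOnce(nested)
println result   // prints [a, b, c, [d, e], [f]]
```

Inside a pipeline this could probably be applied with something like `Channel.from([nested]).map { flattenOnce(it) }`, offered as a sketch rather than a tested recipe.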
Bioninbo
@Bioninbo
Sep 19 2018 09:51
I think I solved the second question:
Channel.from([['catdog','cat','dog','foo']])
    .map{ [ it[3], it.findAll{ it =~ 'dog' } ] }
    .map{ [ it.flatten() ] }
    .println()
Paolo Di Tommaso
@pditommaso
Sep 19 2018 09:52
I don't understand why you don't just use x.flatten().filter { .. }.println()
Luca Cozzuto
@lucacozzuto
Sep 19 2018 09:53
Hi @pditommaso
I have the following problem
ERROR ~ Error executing process > 'calc_peptide_area (b2a401cd-ee09-4a2d-8799-765a237beffb_QC01_3d0c7b4ef362c15f878afef700a9afed)'

Caused by:
  Missing value declared as output parameter: process_id
this is the code
process calc_peptide_area {

    tag { sample_id }
    def process_id = PepArea_ID   

    input:
    set sample_id, internal_code, checksum, file(featxml_file) from shot_featureXMLfiles_for_calc_peptide_area.mix(srm_featureXMLfiles_for_calc_peptide_area)
    file(peptideCSV)
    file(workflowfile) from getWFFile(baseQCPath, process_id)

    output:
    set sample_id, internal_code, checksum, process_id, file("${sample_id}_QC_${process_id}.json") into pep_area_for_check

    script:
    def knime = new Knime(wf:workflowfile, csvpep:peptideCSV, stype:internal_code, featxml:featxml_file, mem:"${task.memory.mega-5000}m", qccv:"QC_${process_id}", qccvp:"QC_${ontology[process_id]}", chksum:checksum, ojid:"${sample_id}")
    knime.launch()

}
Paolo Di Tommaso
@pditommaso
Sep 19 2018 09:55
what the hell is def process_id = PepArea_ID doing there?
Luca Cozzuto
@lucacozzuto
Sep 19 2018 09:55
:)
If I put it in the script
then I cannot use it in the input:
Bioninbo
@Bioninbo
Sep 19 2018 09:57
I can't make it work with filter... i.e. this doesn't work:
Channel.from([['catdog','cat','dog','foo']])
    .map{ [ it[3], it.flatten().filter('dog') ] }
    .println()
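One likely reason the snippet above fails: `filter` is a channel operator, while inside `map` the value `it` is a plain Groovy list, which as far as I know has no `filter` method. A minimal sketch using the list methods `findAll` and `grep` instead, in plain Groovy with the same sample words:

```groovy
def words = ['catdog', 'cat', 'dog', 'foo']

// findAll keeps the elements for which the closure is true;
// `it =~ /dog/` is true when 'dog' occurs anywhere in the string.
def withDog = words.findAll { it =~ /dog/ }

// grep with a pattern keeps the elements that match the whole pattern.
def sameWithGrep = words.grep(~/.*dog.*/)

// Put the 4th element first, followed by the filtered ones.
def result = [words[3]] + withDog
println result   // prints [foo, catdog, dog]
```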
Paolo Di Tommaso
@pditommaso
Sep 19 2018 09:58
@lucacozzuto you cannot use that kind of var in input declarations
Luca Cozzuto
@lucacozzuto
Sep 19 2018 09:58
no way?
Paolo Di Tommaso
@pditommaso
Sep 19 2018 09:59
file(workflowfile) from getWFFile(baseQCPath, process_id) this does not make sense
you cannot have two froms
Luca Cozzuto
@lucacozzuto
Sep 19 2018 10:00
?
getWFFile is a function
that requires 2 params
Paolo Di Tommaso
@pditommaso
Sep 19 2018 10:00
and ...
Luca Cozzuto
@lucacozzuto
Sep 19 2018 10:00
and returns a file
Paolo Di Tommaso
@pditommaso
Sep 19 2018 10:00
ohh
Luca Cozzuto
@lucacozzuto
Sep 19 2018 10:01
that part works :)
however, is there no way to define a variable at the beginning of a process?
since several processes are quite similar, I was thinking of doing it this way
Paolo Di Tommaso
@pditommaso
Sep 19 2018 10:06
put it at the beginning of your script
Luca Cozzuto
@lucacozzuto
Sep 19 2018 10:08
ERROR ~ No such variable: process_id
Paolo Di Tommaso
@pditommaso
Sep 19 2018 10:09
without def ...
Luca Cozzuto
@lucacozzuto
Sep 19 2018 10:10
does it become global without the def?
Paolo Di Tommaso
@pditommaso
Sep 19 2018 10:10
yep
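The `def` point comes from ordinary Groovy script scoping, which as I understand it a Nextflow script inherits: a name declared with `def` is local to the script body, while a bare assignment goes into the script binding and is visible everywhere. A small plain-Groovy sketch of that rule (the names and the value 'PepArea' are placeholders, not taken from the pipeline above):

```groovy
def localId = 'PepArea'    // declared with def: local to the script body
globalId = 'PepArea'       // no def: stored in the script binding

// Methods in a script cannot see def-declared script variables,
// but they can read names from the binding.
def readGlobal() { globalId }
def readLocal() { localId }

def localVisible = true
try {
    readLocal()
} catch (MissingPropertyException e) {
    // localId is not in the binding, so the lookup fails here.
    localVisible = false
}

println "global: ${readGlobal()}, local visible in method: ${localVisible}"
```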
Luca Cozzuto
@lucacozzuto
Sep 19 2018 10:10
ERROR ~ Variable `process_id` already defined in the process scope @ line 329, column 99.
   }m", qcml:qcmlfile, qccv:"QC_${process_i
Luca Cozzuto
@lucacozzuto
Sep 19 2018 10:23
it crashes because I'm using it in several processes
I understand this is impossible at this time... but do you think it could be a possible future feature?
Can I make an issue?
Paolo Di Tommaso
@pditommaso
Sep 19 2018 10:39
Maybe :)
Bioninbo
@Bioninbo
Sep 19 2018 14:09

Hello,

I have a list with an unknown number of elements nested inside another list. I would like to split it so that each element of the nested list goes into a separate channel emission, e.g.:

[a, b, c, [d, e, f, g]]

would become:

[a, b, c, d]
[a, b, c, e]
[a, b, c, f]
[a, b, c, g]

Is it possible to do this?

Maxime Garcia
@MaxUlysse
Sep 19 2018 14:10
Oh, that's a nice one
I'm trying some stuff
Paolo Di Tommaso
@pditommaso
Sep 19 2018 14:15
Channel.from([['a', 'b', 'c', ['d', 'e', 'f', 'g']]]).transpose().println()
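To make the effect of `transpose` concrete, here is a rough plain-Groovy equivalent for this one input shape. It assumes the nested list is the last element, and it only illustrates the result; it is not how the operator is implemented.

```groovy
def item = ['a', 'b', 'c', ['d', 'e', 'f', 'g']]

// Everything except the last element stays fixed in every output row.
def head = item[0..-2]

// Emit one row per element of the nested list.
def rows = item[-1].collect { head + [it] }

rows.each { println it }
// prints:
// [a, b, c, d]
// [a, b, c, e]
// [a, b, c, f]
// [a, b, c, g]
```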
Maxime Garcia
@MaxUlysse
Sep 19 2018 14:16
Not fast enough :-(
Thanks @pditommaso
Paolo Di Tommaso
@pditommaso
Sep 19 2018 14:16
I gave you some advantage :joy:
Bioninbo
@Bioninbo
Sep 19 2018 14:16
sounds great! thanks @pditommaso
Alexander Peltzer
@apeltzer
Sep 19 2018 14:38
Does nextflow currently log the hostname of submitted jobs somewhere? would be cool for debugging, we currently have a weird filesystem error
Could open an issue too
Paolo Di Tommaso
@pditommaso
Sep 19 2018 14:39
maybe you can just add process.beforeScript = 'echo $HOSTNAME'
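As a sketch of that suggestion in nextflow.config form: `beforeScript` is a process directive, so it can be set for all processes inside the `process` scope. Using the `hostname` command instead of `$HOSTNAME` is my own variation and assumes the command exists on the compute nodes; the echoed line should then show up in each task's log output.

```groovy
// nextflow.config (sketch)
process {
    // Runs before every task's script.
    // Single quotes keep $(...) untouched so the shell expands it on the node.
    beforeScript = 'echo "Running on host: $(hostname)"'
}
```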
Alexander Peltzer
@apeltzer
Sep 19 2018 18:05
Nice idea 💡
Shawn Rynearson
@srynobio
Sep 19 2018 20:01

After updating, I noticed this warning:

WARN: Process configuration syntax $processName has been deprecated -- Replace `process.$myprocess = <value>` with a process selector

So I'm possibly going to redesign my config files, hopefully into one large file.

Based on the documentation, selectors let you label processes and apply directives to each labelled process, I'm assuming to reduce redundancy. Currently I use params, process and profiles based on my needs and compute environment.
So I'm wondering what the best nesting approach would be.

Could I have something like:

profiles {
    HPC {
        params { }
        process {
            withLabel: big_mem {
                cpus = 16
                memory = 64.GB
                queue = 'long'
            }
        }
    }
    Grid {
        params { }
        process {
            withLabel: really_big_mem {
                cpus = 40
                memory = 128.GB
                queue = 'quick'
            }
        }
    }
}