These are chat archives for nextflow-io/nextflow

2nd
May 2018
Simone Baffelli
@baffelli
May 02 2018 12:10
What exactly does this mean: Unable to resume cached task -- See log file for details com.esotericsoftware.kryo.KryoException: Unable to find class: BlankSeparatedList1_groovyProxy
I know that NF uses blankSeparatedList to represent multiple input files, but I don't understand what this error means in this context
Paolo Di Tommaso
@pditommaso
May 02 2018 12:12
I guess there's a variable holding that value in the script context that is declared without def
Simone Baffelli
@baffelli
May 02 2018 12:14
process averageTMatrix{

    errorStrategy "ignore"

    input:
        set val(dayId), val(dateId), val(rxId),
        file(par:"T??.par"), file(T11:"??.T11"),
        file(T12:"??.T12"), file(T13:"??.T13"), 
        file(T22:"??.T22"), file(T23:"??.T23"),
        file(T33:"??.T33") from collectedCohToAverage
    output:
        set val(dayId), val(rxId),
        val(["T11", "T12", "T13", "T22", "T23", "T33"]),
        file("average.T*"), file(outPar) into averagedCoherencyMatrix
    shell:
        /*Here, for each element of the matrix
        we want to prepare a file, whose name
        is given as the key of the map. The content
        of the map are the command and the list
        of collect matrix elements
        */
        tabs = [
        ["t11List", T11 as List, "ave_image"], 
        ["t12List", T12 as List, "ave_cpx"],
        ["t13List", T13 as List, "ave_cpx"],
        ["t22List", T22 as List, "ave_image"],
        ["t23List", T23 as List, "ave_cpx"],
        ["t33List", T33 as List, "ave_image"]]

        /* Now prepare the files used in the averaging */
            outPar = (par as List)[0]
            averagingCmds = tabs.collect{
                inFileName, filesToAverage, command ->
                tabCmd = "echo '${filesToAverage.join("\n")}' > ${inFileName}"
                outFileName = "average.${(filesToAverage[0] as String).tokenize(".")[-1]}"
                averagingCommand = "${command} ${inFileName} \${wd} ${outFileName}"
                [tabCmd, averagingCommand].join("\n")
            }.join("\n")
            /* Here we call the commands */
            '''
            wd=$(get_value !{par[0]} range_samples)
            !{averagingCmds}
            '''
so should I define outPar using def?
Paolo Di Tommaso
@pditommaso
May 02 2018 12:15
all variables that are not used as an output should be declared with def
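A plain-Groovy sketch of why `def` matters here. It assumes, without confirmation from the thread, that a Nextflow task script behaves like a Groovy script whose undeclared assignments land in a shared context (here a `Binding`) that Nextflow later serialises for caching. The variable names are borrowed from Simone's snippet for illustration only; this is not Nextflow's internal code.

```groovy
// Plain Groovy, not Nextflow internals: a shared Binding plays the role
// of the task context (an assumption for illustration).
def taskContext = new Binding()
new GroovyShell(taskContext).evaluate('''
    def tabs = ['t11List', 't12List']   // declared with def: local to the script body
    outPar = 'T01.par'                  // no def: written into the shared binding
''')

// Only the undeclared variable ends up in the shared context
def keptOutPar = taskContext.hasVariable('outPar')
def keptTabs = taskContext.hasVariable('tabs')
println "outPar in context: ${keptOutPar}, tabs in context: ${keptTabs}"
```

Under that assumption, `outPar` (read by the output block) has to stay undeclared, while helpers such as `tabs` and `averagingCmds` are better declared with `def`.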
gawells
@gawells
May 02 2018 12:30
@mes5k thanks, I have the closure working, but I'm not sure how to integrate it with collectFile. Ultimately, I need the patterns to come from a channel too
Channel
    .from('group1','group2','group3')
    .set{groups}

Channel
    .fromPath('?_*.txt')
    .set{input_files}

inGroup = { it, group -> if (it =~ /${group}/) {[it,group]}  }

process reGroup {
    input:
        file input_files.collectFile() {item  -> inGroup(item,'group1') }

    '''
    '''
}
Paolo Di Tommaso
@pditommaso
May 02 2018 12:31
Is there a way to use a regex as a dynamic criterion for collectFile? I want to group outputs based on the presence of a substring
is this still a problem?
gawells
@gawells
May 02 2018 12:57
yes, I've tried using a function too to return the tuple. But I'm missing something
Paolo Di Tommaso
@pditommaso
May 02 2018 12:59
let's put in this way
the collectFile closure must return a pair in which the first element is the grouping key and the second is the file itself
I want to group outputs based on the presence of a substring
the presence of a substring where ?
in the file name ?
gawells
@gawells
May 02 2018 13:02
that's right (I mixed up the order in the pair, but that hasn't fixed it)
Paolo Di Tommaso
@pditommaso
May 02 2018 13:05
a basic implementation could be
input_files.collectFile { item -> 
  if( item.name.contains('foo') ) {
    return ['foo', item]
  }
  else {
     return ['bar', item]
  }
 }
that can be rewritten as
input_files.collectFile { item -> 
  [  item.name.contains('foo') ? 'foo' : 'bar', item  ]   
}
does it make sense so far?
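Outside Nextflow, the key/item pairing can be mimicked with plain Groovy collections to check the grouping rule. Here `groupBy` only stands in for the grouping that `collectFile` performs (it is not how the operator is implemented), and the file names are made up.

```groovy
// Hypothetical file names standing in for the items of input_files
def names = ['a_foo.txt', 'b_bar.txt', 'c_foo.txt']

// Same rule as the closure above: grouping key first, item second
def pairs = names.collect { name -> [name.contains('foo') ? 'foo' : 'bar', name] }

// Group the items by key, keeping only the item part of each pair
def grouped = pairs.groupBy { it[0] }
                   .collectEntries { key, list -> [key, list.collect { it[1] }] }
println grouped
```

In Nextflow itself, `collectFile` uses each key as the name of the file it writes and appends the content of the items grouped under that key.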
Paolo Di Tommaso
@pditommaso
May 02 2018 13:11
you are slow my friend.. :)
gawells
@gawells
May 02 2018 13:20
hi, yes :) I was being tripped up by some other types
doh, typos
Paolo Di Tommaso
@pditommaso
May 02 2018 13:22
ok, anyhow the main problem is that you need a function that, given a string, extracts the grouping key, e.g. getGroupKey
having that you can write
input_files.collectFile { item -> [  getGroupKey(item.name), item  ]  }
does it make sense ?
(provided that for each file there's a grouping key)
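One possible `getGroupKey`, assuming (hypothetically) that group names look like `group1`, `group2`, ... as in gawells's `groups` channel. Both the regex and the `null` fallback for names without a key are illustrative choices, not part of Nextflow.

```groovy
// Hypothetical helper: returns the first "groupN" token found in a file
// name, or null when the name carries no group key.
String getGroupKey(String name) {
    def matcher = name =~ /group\d+/
    return matcher.find() ? matcher.group() : null
}

println getGroupKey('A_group2_sample.txt')   // group2
println getGroupKey('B_nokey.txt')           // null
```

Because the pattern consumes all the digits, `group1` and `group10` get different keys, which a plain `contains('group1')` test would not guarantee.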
Simone Baffelli
@baffelli
May 02 2018 13:28
all variables that are not used as an output should be declared with def
Ok, time to fix most of my code then
gawells
@gawells
May 02 2018 13:30
yup, I have a working function. Still figuring out how to use collectFile inside a process, I was accidentally putting file before it
Paolo Di Tommaso
@pditommaso
May 02 2018 13:30
the syntax is always the same
input: 
file <foo> from <channel>
therefore it becomes
input: 
file <foo> from input_files.collectFile { item -> [  getGroupKey(item.name), item  ]  }
Simone Baffelli
@baffelli
May 02 2018 13:32
What will happen if I'm not using def?
Paolo Di Tommaso
@pditommaso
May 02 2018 13:32
the variable content is serialised to a file for caching purposes
Simone Baffelli
@baffelli
May 02 2018 13:34
and that may cause trouble if it is not serialisable?
or is there any other risk?
Paolo Di Tommaso
@pditommaso
May 02 2018 13:39
yes, exactly: if the object is not serialisable, you will get that error
no other risk, apart from adding an unnecessary serialisation overhead to your task execution
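A sketch of the "not serialisable" failure using standard Java serialisation from Groovy. This is only a stand-in: the error at the top of the thread comes from Kryo (which Nextflow's cache uses, per the exception name), so the exact exception differs. `Holder` and `trySerialise` are hypothetical names.

```groovy
import java.io.ByteArrayOutputStream
import java.io.NotSerializableException
import java.io.ObjectOutputStream

// Hypothetical value type that does not implement Serializable
class Holder {
    String value
}

// Returns true if obj can be written with Java serialisation
boolean trySerialise(Object obj) {
    def out = new ObjectOutputStream(new ByteArrayOutputStream())
    try {
        out.writeObject(obj)
        return true
    } catch (NotSerializableException e) {
        return false
    } finally {
        out.close()
    }
}

println trySerialise('T01.par')               // true: String is Serializable
println trySerialise(new Holder(value: 'x'))  // false: Holder is not
```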
Vladimir Kiselev
@wikiselev
May 02 2018 13:46

Hi @pditommaso, I'm trying to run NF in our k8s cluster like this: ./nextflow kuberun nf-core/rnaseq --reads 'fastq/*_{1,2}.fastq.gz' --genome GRCh37 -v testpvc:/mnt/gluster/

I put the fastq folder in /mnt/gluster/; however, the pipeline complains: ERROR ~ Cannot find any reads matching: fastq/*_{1,2}.fastq.gz

I can see that NF also created a folder /mnt/gluster/pvc-319c8c17-3ca6-11e8-89b1-fa163e31bb09 where it stores all the working files and pipelines
So, the question is: what is the root directory used by NF when running a pipeline, and where should the fastq folder be located?
Paolo Di Tommaso
@pditommaso
May 02 2018 13:49
I would suggest specifying the reads path as an absolute path
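Why an absolute path helps: a relative pattern is resolved against whatever directory the run starts from, while an absolute one is used as-is. The sketch below only shows `java.nio.file` resolution semantics; the launch directory is made up, and where `kuberun` actually launches inside the pod is not stated in the thread.

```groovy
import java.nio.file.Path
import java.nio.file.Paths

// Hypothetical launch directory inside the pod
Path launchDir = Paths.get('/mnt/gluster/some-launch-dir')

// A relative path depends on the launch directory...
Path relative = launchDir.resolve('fastq')
// ...while resolving an absolute path simply returns that path
Path absolute = launchDir.resolve('/mnt/gluster/fastq')

println relative   // /mnt/gluster/some-launch-dir/fastq
println absolute   // /mnt/gluster/fastq
```

So `--reads '/mnt/gluster/fastq/*_{1,2}.fastq.gz'` no longer depends on the launch directory, assuming the volume is mounted at `/mnt/gluster` in the pod as the `-v testpvc:/mnt/gluster/` option requests.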
Vladimir Kiselev
@wikiselev
May 02 2018 13:49
Ah, ok, will try now
Paolo Di Tommaso
@pditommaso
May 02 2018 13:50
not sure that /mnt/gluster/pvc-319c8c17-3ca6-11e8-89b1-fa163e31bb09 is created by NF
Vladimir Kiselev
@wikiselev
May 02 2018 13:53
thanks, will be back in an hour or so and continue
Paolo Di Tommaso
@pditommaso
May 02 2018 13:53
:+1:
gawells
@gawells
May 02 2018 14:08
@pditommaso this seems to be doing the trick, thanks!
Channel
    .from('group1','group2','group3')
    .set{groups}

Channel
    .fromPath('?_*.txt')
    .set{input_files}

def inGroup2 (item, group) {
    if (item =~ /${group}/) return ["${group}.combined.txt",item]    
}

process reGroup {
    input:
        file input from input_files.combine(groups).collectFile() {item -> inGroup2(item[0],item[1]) }

    """
    """
}
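A plain-Groovy simulation of gawells's working version, useful for checking the logic outside Nextflow. `combinations()` stands in for `combine` (every file paired with every group) and `groupBy` stands in for `collectFile`. Dropping non-matching pairs with `findAll` is an explicit step here; whether `collectFile` itself skips a `null` returned by the closure is not confirmed in this thread. The file names are hypothetical.

```groovy
// Hypothetical inputs mirroring the two channels
def files = ['1_group1.txt', '2_group2.txt', '3_group1.txt']
def groups = ['group1', 'group2', 'group3']

// Same rule as inGroup2: [collected file name, item] on a match, else null
def inGroup2 = { String item, String group ->
    (item =~ /${group}/) ? ["${group}.combined.txt".toString(), item] : null
}

// combine: cartesian product of files and groups
def pairs = [files, groups].combinations()

def collected = pairs
    .collect { p -> inGroup2(p[0], p[1]) }   // apply the grouping rule
    .findAll { it != null }                  // drop file/group pairs that do not match
    .groupBy { it[0] }
    .collectEntries { name, list -> [name, list.collect { it[1] }.sort()] }

println collected
```

`group3` matches no file, so no `group3.combined.txt` is produced. Note that the pattern `/group1/` would also match a name containing `group10`.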
Francesco Strozzi
@fstrozzi
May 02 2018 14:12
hi guys, have you ever seen an error like this when using the concat operator ?
java.util.ConcurrentModificationException: null
    at java.util.ArrayList$Itr.checkForComodification(ArrayList.java:901)
    at java.util.ArrayList$Itr.next(ArrayList.java:851)
    at nextflow.Session.destroy(Session.groovy:546)
    at nextflow.script.ScriptRunner.terminate(ScriptRunner.groovy:348)
    at nextflow.script.ScriptRunner.execute(ScriptRunner.groovy:167)
    at nextflow.cli.CmdRun.run(CmdRun.groovy:223)
    at nextflow.cli.Launcher.run(Launcher.groovy:428)
    at nextflow.cli.Launcher.main(Launcher.groovy:582)
the code in the pipeline is very simple, I am creating 4 channels using Channel.fromFilePairs and then I use concat like this
inputs = input1.concat(input2,input3,input4)
am I missing something on how this operator works ?
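For context on the exception type only: `ConcurrentModificationException` is what a Java `ArrayList` iterator throws when the list is structurally changed while it is being iterated. The trace shows this happening inside `Session.destroy`, but nothing in the thread says what modified that list, so this sketch does not explain the `concat` behaviour itself.

```groovy
// An ArrayList modified while a for-in loop is iterating over it
def items = ['input1', 'input2', 'input3']
def failed = false
try {
    for (item in items) {
        if (item == 'input1') {
            items << 'input4'    // structural change during iteration
        }
    }
} catch (ConcurrentModificationException e) {
    failed = true                // the iterator's next() detects the change
}
println "ConcurrentModificationException thrown: ${failed}"
```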
Simone Baffelli
@baffelli
May 02 2018 14:15
no other risk, apart from adding an unnecessary serialisation overhead to your task execution
Ah, exactly as I imagined
Would I see performance gains if I declared types instead of using def (apart from the obvious ease of debugging)?
arontommi
@arontommi
May 02 2018 14:18
@fstrozzi
I am a Java and Nextflow newbie, but I think it can work like this
inputs = input1.concat(input2).concat(input3).concat(input4)
Francesco Strozzi
@fstrozzi
May 02 2018 14:19
hi @arontommi , I was checking and the problem seems related to the assignment of the output of concat to a new channel
if I pass the same expression as input to a process, it works, i.e.
val data from input1.concat(input2,input3,input4).collect()
arontommi
@arontommi
May 02 2018 14:21
nice!
Francesco Strozzi
@fstrozzi
May 02 2018 14:22
it’s weird, I thought I could assign the output of concat to a new channel
Simone Baffelli
@baffelli
May 02 2018 14:23
I must have misunderstood the role of def: if I try to use a variable declared with def in a command, I get No such property: channelCmd for class:
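A plain-Groovy illustration of where a "No such property" error can come from, under the assumption (not confirmed in the thread) that `channelCmd` was declared with `def` in one block and then read from a different scope, such as the output block, that only sees the shared context. Two `evaluate` calls sharing one `Binding` stand in for those two scopes; this is not Nextflow's actual mechanism.

```groovy
def taskContext = new Binding()
def shell = new GroovyShell(taskContext)

// First scope: one variable declared with def, one without
shell.evaluate('''
    def channelCmd = 'echo local'
    sharedCmd = 'echo shared'
''')

// Second scope: only names stored in the shared binding are visible
def shared = shell.evaluate('sharedCmd')

def errorMessage = null
try {
    shell.evaluate('channelCmd')
} catch (MissingPropertyException e) {
    errorMessage = e.message     // starts with "No such property: channelCmd"
}
println shared
println errorMessage
```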
Vladimir Kiselev
@wikiselev
May 02 2018 16:43
Hi @pditommaso , I’ve opened an issue, since there is too much text: nextflow-io/nextflow#676
Paolo Di Tommaso
@pditommaso
May 02 2018 16:44
:+1:
Stephen Kelly
@stevekm
May 02 2018 17:37
in my experience so far, I've had to use def inside Channel closures, and could not use it inside Processes
also, I've been using def in the script outside of Channels and Processes as well. It's been working better since I started doing all that. def defines the scope of the variable, right?
Hitesh Joshi
@evoxtorm
May 02 2018 19:42
Hey, I was trying to download Nextflow
[attached screenshot: Screenshot from 2018-05-03 01-11-00.png]
and it says the installation is complete, but "you may complete the installation by moving it to a directory in your $PATH". What does that mean?
Paolo Di Tommaso
@pditommaso
May 02 2018 19:44
what version of java are you using ?
Hitesh Joshi
@evoxtorm
May 02 2018 19:45
8
Paolo Di Tommaso
@pditommaso
May 02 2018 19:46
have you managed to install it before or in a different machine ?
Hitesh Joshi
@evoxtorm
May 02 2018 19:46
No I'm trying this first time
Paolo Di Tommaso
@pditommaso
May 02 2018 19:47
can you copy & paste the complete output ?
even better, please create an issue including the complete error message
Hitesh Joshi
@evoxtorm
May 02 2018 19:49
The whole output? OK, sure