These are chat archives for nextflow-io/nextflow

11th
Oct 2017
Simone Baffelli
@baffelli
Oct 11 2017 05:36
@fmorency nice to hear that our intuition worked!
Paolo Di Tommaso
@pditommaso
Oct 11 2017 08:24
Seriously thinking of removing Java 7 support starting with the next version, 0.26.0. Any objections?
Simone Baffelli
@baffelli
Oct 11 2017 08:34
No!
Any suggestions for the optimal use of collectFile with CSV files?
Paolo Di Tommaso
@pditommaso
Oct 11 2017 08:35
what do you mean ?
Simone Baffelli
@baffelli
Oct 11 2017 08:35
I want to collect several csv with the same structure
and keep the first header only
At the moment I'm doing:
detrendedPhases
            .tap{firstDetrended}
            .collectFile(newLine: true, skip:2, sort: 'index'){it->["all_detrended", it.readLines().join('\n')]}
            .into{detrendedCsv}

process detrendedSatistics{

    publishDir "${params.results}"

    input:
        file(phase_tab) from detrendedCsv
        // file(phase_tab:"detrended*.csv") from detrendedCsv
        file(first) from firstDetrended.first()
    output:
        file('residual_phase_histogram.pdf')
        file('residual_phase_ts.pdf')
    shell:
        '''
            head -2 !{first} > all.txt
            cat !{phase_tab} >> all.txt
            #head -2 detrended1.csv > all.txt; tail -n +2 -q detrended*.csv >> all.txt
            phase_residual_statistics.R all.txt residual_phase_histogram.pdf residual_phase_ts.pdf
        '''
}
where detrendedPhases is a channel emitting csv files
the problem is that first can change from run to run and the process is rerun every time
although the content of the collected file, which I'm interested in, does not
I may use map with a function that extracts the header from the file and pass it as a val instead
since the headers contents do not change
Paolo Di Tommaso
@pditommaso
Oct 11 2017 08:37
emm , too many things altogether :)
Simone Baffelli
@baffelli
Oct 11 2017 08:37
sorry, it helps me think clearly about it :smile:
Paolo Di Tommaso
@pditommaso
Oct 11 2017 08:37
skip does not help to remove the header ?
Simone Baffelli
@baffelli
Oct 11 2017 08:39
it does! But i want it
Paolo Di Tommaso
@pditommaso
Oct 11 2017 08:39
what ?
Simone Baffelli
@baffelli
Oct 11 2017 08:39
I want the header reappended to the collected file
but only a global header
Paolo Di Tommaso
@pditommaso
Oct 11 2017 08:40
I see
Simone Baffelli
@baffelli
Oct 11 2017 08:40
because all the CSVs have the same header. So I wish for something like csv -> remove header -> collect -> append one header after collecting
Paolo Di Tommaso
@pditommaso
Oct 11 2017 08:40
If it's not varying you can use seed, otherwise it's a bit complicated
I'm pressing that key ..
Simone Baffelli
@baffelli
Oct 11 2017 08:41
Well, I guess an option would be to extract the header using a map operator and pass that to the process
instead of the file
Paolo Di Tommaso
@pditommaso
Oct 11 2017 08:42
You want to keep the header of the first csv ?
Simone Baffelli
@baffelli
Oct 11 2017 08:42
or any header for that matter. they are all the same
Paolo Di Tommaso
@pditommaso
Oct 11 2017 08:44
if so, I would remove skip
and collect them with a custom function
Simone Baffelli
@baffelli
Oct 11 2017 08:45
that would be an option indeed
Something like skipping the first line
and reappending it at the end
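The idea discussed here (drop the header from every file, collect, then put a single header back) can be sketched in plain Groovy, independent of Nextflow. This is a minimal illustration under stated assumptions: `mergeCsv` and its `headerLines` parameter are hypothetical names, all inputs are assumed to share the same header, and the sample data is made up.

```groovy
// Hypothetical helper (not a Nextflow API): merge CSV texts that share the
// same header, keeping the header lines only once at the top.
String mergeCsv(List<String> csvTexts, int headerLines = 1) {
    if (!csvTexts) return ''
    // Take the header from the first text; all inputs are assumed to share it.
    List<String> header = csvTexts[0].readLines().take(headerLines)
    // Drop the header lines from every input and keep only the data rows.
    List<String> rows = csvTexts.collectMany { it.readLines().drop(headerLines) }
    return (header + rows).join('\n')
}

// Made-up sample data for illustration only.
def a = 'date,phase\n2017-10-01,0.1\n'
def b = 'date,phase\n2017-10-02,0.2\n'
println mergeCsv([a, b])
```

Because the header is taken once and every input is treated the same way, the result does not depend on which file happens to arrive first, as long as the inputs are sorted before merging. Newer versions of the collectFile documentation also describe a `keepHeader` option aimed at this case; whether it is available depends on the Nextflow version in use.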
Paolo Di Tommaso
@pditommaso
Oct 11 2017 08:46
open an issue on GH for that please
Simone Baffelli
@baffelli
Oct 11 2017 08:46
Will do
Paolo Di Tommaso
@pditommaso
Oct 11 2017 08:46
(I've pressed that key :))
Simone Baffelli
@baffelli
Oct 11 2017 08:48
#479
well done!
Paolo Di Tommaso
@pditommaso
Oct 11 2017 08:48
good
Simone Baffelli
@baffelli
Oct 11 2017 08:49
how can I tell the custom function to only attach the header when collecting is done?
I suppose there is no easy way to do it at the moment?
At least judging from the documentation: https://www.nextflow.io/docs/latest/operator.html#collectfile
Paolo Di Tommaso
@pditommaso
Oct 11 2017 08:51
what is the content of the input channel ?
files ?
Simone Baffelli
@baffelli
Oct 11 2017 08:51
yes
let's assume that
Paolo Di Tommaso
@pditommaso
Oct 11 2017 08:53
the problem is how to detect the first one ?
Simone Baffelli
@baffelli
Oct 11 2017 08:54
well I could initialize a counter
Paolo Di Tommaso
@pditommaso
Oct 11 2017 08:54
you said that
Simone Baffelli
@baffelli
Oct 11 2017 08:54
does collectFile support a custom closure + sorting?
Because I want to ensure reproducibility and cache integrity
in the worst case I could sort the channel
Paolo Di Tommaso
@pditommaso
Oct 11 2017 08:55
yes
Paolo Di Tommaso
@pditommaso
Oct 11 2017 09:06
definitely more challenging to write the documentation for transpose than to implement it ..
Simone Baffelli
@baffelli
Oct 11 2017 09:07
Examples help a lot
Paolo Di Tommaso
@pditommaso
Oct 11 2017 09:08
I know, but still you need to write some description :)
Simone Baffelli
@baffelli
Oct 11 2017 09:08
Yes, that is not easy
Simone Baffelli
@baffelli
Oct 11 2017 09:45
How can an innocuous closure invalidate the cache? :angry:
def collectCSV(String outputName)
{
    def iterationCounter = 0
    innerCollector = 
    {
        item -> 
        if(iterationCounter == 0)
        {
            text = item.readLines().join("\n")
        }
        else
        {
            text = item.readLines()[2..-1].join("\n")
        }
        // iterationCounter += 1
        println(text)
        return [outputName, text]
    }
    return innerCollector
}
Whenever I add this to my pipeline, the cache is invalidated
Paolo Di Tommaso
@pditommaso
Oct 11 2017 09:46
you can't and don't need to return a closure . .
and iterationCounter must be declared in the global scope, otherwise it's useless
Simone Baffelli
@baffelli
Oct 11 2017 09:47
right! I was doing it for buffer
but I don't want global variables polluting my scope
Paolo Di Tommaso
@pditommaso
Oct 11 2017 09:48
:)
Simone Baffelli
@baffelli
Oct 11 2017 09:48
and I want to use multiple csvCollectors
they should not steal each other's counters
Paolo Di Tommaso
@pditommaso
Oct 11 2017 09:48
you can pass it as an argument, but you still need a global var
Simone Baffelli
@baffelli
Oct 11 2017 09:48
but then why is this nested closure invalidating the cache of upstream processes?
Paolo Di Tommaso
@pditommaso
Oct 11 2017 09:49
I guess NF is not able to sort a closure ..
Simone Baffelli
@baffelli
Oct 11 2017 09:52
:laughing:
but that happens even if I just define it
and never use that
Paolo Di Tommaso
@pditommaso
Oct 11 2017 09:53
well I don't think so
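For reference, in plain Groovy (outside Nextflow) a closure returned from a function captures that function's local variables, so each collector can carry its own counter without a global. The sketch below only shows that language behaviour. `makeCsvCollector` and its parameters are hypothetical names, and it does not demonstrate how collectFile or the resume cache treat such a stateful closure.

```groovy
// Hypothetical factory (names are illustrative): every call returns a new
// closure that owns a private counter, so collectors do not share state.
Closure makeCsvCollector(String outputName, int headerLines = 2) {
    int seen = 0  // local variable captured by the closure below
    return { String csvText ->
        List<String> lines = csvText.readLines()
        // Keep the header only for the first item this collector receives.
        String text = (seen == 0 ? lines : lines.drop(headerLines)).join('\n')
        seen++
        [outputName, text]
    }
}

def first = makeCsvCollector('a.csv', 1)
def second = makeCsvCollector('b.csv', 1)
def csv = 'h\n1\n'   // made-up CSV with a one-line header
println first(csv)   // header kept on the first call
println first(csv)   // header dropped on the second call
println second(csv)  // separate counter, so the header is kept again
```

Note that the output of a counter-based closure depends on the order in which items arrive, which is why sorting the inputs matters for reproducibility in this approach.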
Francesco Strozzi
@fstrozzi
Oct 11 2017 11:34
I’m getting a strange error
ERROR ~ General error during conversion: Index: 5, Size: 5

java.lang.IndexOutOfBoundsException: Index: 5, Size: 5
    at java.util.ArrayList.rangeCheck(ArrayList.java:653)
    at java.util.ArrayList.get(ArrayList.java:429)
    at nextflow.ast.NextflowDSLImpl.convertProcessBlock(NextflowDSLImpl.groovy:272)
    at nextflow.ast.NextflowDSLImpl.convertProcessDef(NextflowDSLImpl.groovy:795)
    at nextflow.ast.NextflowDSLImpl$1.visitMethodCallExpression(NextflowDSLImpl.groovy:109)
    at org.codehaus.groovy.ast.expr.MethodCallExpression.visit(MethodCallExpression.java:66)
    at org.codehaus.groovy.ast.CodeVisitorSupport.visitExpressionStatement(CodeVisitorSupport.java:71)
    at org.codehaus.groovy.ast.ClassCodeVisitorSupport.visitExpressionStatement(ClassCodeVisitorSupport.java:196)
    at org.codehaus.groovy.ast.stmt.ExpressionStatement.visit(ExpressionStatement.java:42)
    at org.codehaus.groovy.ast.CodeVisitorSupport.visitBlockStatement(CodeVisitorSupport.java:37)
    at org.codehaus.groovy.ast.ClassCodeVisitorSupport.visitBlockStatement(ClassCodeVisitorSupport.java:166)
    at org.codehaus.groovy.ast.stmt.BlockStatement.visit(BlockStatement.java:71)
    at org.codehaus.groovy.ast.ClassCodeVisitorSupport.visitClassCodeContainer(ClassCodeVisitorSupport.java:104)
    at org.codehaus.groovy.ast.ClassCodeVisitorSupport.visitConstructorOrMethod(ClassCodeVisitorSupport.java:115)
    at org.codehaus.groovy.ast.ClassCodeVisitorSupport.visitMethod(ClassCodeVisitorSupport.java:126)
    at org.codehaus.groovy.ast.ClassNode.visitContents(ClassNode.java:1081)
    at org.codehaus.groovy.ast.ClassCodeVisitorSupport.visitClass(ClassCodeVisitorSupport.java:53)
    at nextflow.ast.NextflowDSLImpl.visit(NextflowDSLImpl.groovy:82)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at org.codehaus.groovy.runtime.callsite.PogoMetaMethodSite$PogoCachedMethodSite.invoke(PogoMetaMethodSite.java:169)
    at org.codehaus.groovy.runtime.callsite.PogoMetaMethodSite.call(PogoMetaMethodSite.java:71)
    at org.codehaus.groovy.runtime.callsite.CallSiteArray.defaultCall(CallSiteArray.java:48)
    at org.codehaus.groovy.runtime.callsite.AbstractCallSite.call(AbstractCallSite.java:113)
    at org.codehaus.groovy.runtime.callsite.AbstractCallSite.call(AbstractCallSite.java:133)
    at org.codehaus.groovy.control.customizers.ASTTransformationCustomizer.call(ASTTransformationCustomizer.groovy:294)
    at org.codehaus.groovy.control.CompilationUnit.applyToPrimaryClassNodes(CompilationUnit.java:1065)
    at org.codehaus.groovy.control.CompilationUnit.doPhaseOperation(CompilationUnit.java:603)
    at org.codehaus.groovy.control.CompilationUnit.processPhaseOperations(CompilationUnit.java:581)
    at org.codehaus.groovy.control.CompilationUnit.compile(CompilationUnit.java:558)
    at groovy.lang.GroovyClassLoader.doParseClass(GroovyClassLoader.java:298)
    at groovy.lang.GroovyClassLoader.parseClass(GroovyClassLoader.java:268)
    at groovy.lang.GroovyShell.parseClass(GroovyShell.java:688)
    at groovy.lang.GroovyShell.parse(GroovyShell.java:700)
    at groovy.lang.GroovyShell.parse(GroovyShell.java:736)
    at nextflow.script.ScriptRunner.parseScript(ScriptRunner.groovy:295)
    at nextflow.script.ScriptRunner.execute(ScriptRunner.groovy:154)
    at nextflow.cli.CmdRun.run(CmdRun.groovy:221)
    at nextflow.cli.Launcher.run(Launcher.groovy:410)
    at nextflow.cli.Launcher.main(Launcher.groovy:564)

1 error
I’ve just added a when clause
Paolo Di Tommaso
@pditommaso
Oct 11 2017 11:35
can I see that snippet ?
Is there script: before the command ?
Francesco Strozzi
@fstrozzi
Oct 11 2017 11:37
ehm no
is that the issue ?
Paolo Di Tommaso
@pditommaso
Oct 11 2017 11:37
it could be the problem tho it should report a nicer message ..
Francesco Strozzi
@fstrozzi
Oct 11 2017 11:59
yes, now it’s working thanks
Paolo Di Tommaso
@pditommaso
Oct 11 2017 12:03
:+1:
Simone Baffelli
@baffelli
Oct 11 2017 12:09
So another question: does groupTuple consider a file's content or only its path?
Paolo Di Tommaso
@pditommaso
Oct 11 2017 12:38
only paths by default
Simone Baffelli
@baffelli
Oct 11 2017 13:01
Fine, found another way to do it
Venkat Malladi
@vsmalladi
Oct 11 2017 13:53
@baffelli what are you trying to do?
Simone Baffelli
@baffelli
Oct 11 2017 13:57
I group measurements by date, estimate a certain model, split them and subtract the model prediction from each measurement, and finally I want to group them again by date
I just added an id to each group
Venkat Malladi
@vsmalladi
Oct 11 2017 13:57
Ah
Simone Baffelli
@baffelli
Oct 11 2017 13:57
That makes it easier to find them again
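The "add an id to each group" trick can be illustrated in plain Groovy: if every item carries its group id, the groups can be rebuilt after the items have been processed individually. This is only a sketch with made-up data and an assumed `[groupId, fileName]` item shape. In a pipeline the same role would typically be played by grouping on the first tuple element.

```groovy
// Made-up items shaped as [groupId, fileName]; the id was attached before
// the groups were split, so it survives the per-measurement step.
def items = [
    ['2017-10-01', 'a.csv'],
    ['2017-10-02', 'b.csv'],
    ['2017-10-01', 'c.csv']
]

// Regroup by the id in position 0 and keep only the file names per group.
def regrouped = items.groupBy { it[0] }.collect { id, members ->
    [id, members.collect { it[1] }]
}
println regrouped
```

Grouping on an explicit id compares only the key, not file paths or contents, which avoids the question raised above.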
Venkat Malladi
@vsmalladi
Oct 11 2017 14:13
Ya, I have something similar for the ChIP-seq experiments
Paolo Di Tommaso
@pditommaso
Oct 11 2017 14:29
I'm asking again for western time zone users
Seriously thinking of removing Java 7 support starting with the next version, 0.26.0. Any objections?
Luca Cozzuto
@lucacozzuto
Oct 11 2017 14:35
no
Francesco Strozzi
@fstrozzi
Oct 11 2017 14:43
no, go on and burn legacy support https://i.imgur.com/EhuKB3j.gif
Luca Cozzuto
@lucacozzuto
Oct 11 2017 14:44
peaceful
Paolo Di Tommaso
@pditommaso
Oct 11 2017 14:45
terrorist !
Francesco Strozzi
@fstrozzi
Oct 11 2017 14:45
that’s what you do when you leave behind legacy code / stuff no ?
:)
Mike Smoot
@mes5k
Oct 11 2017 15:21
Happy to remove Java 7 support!
Paolo Di Tommaso
@pditommaso
Oct 11 2017 15:23
:+1:
Michael L Heuer
@heuermh
Oct 11 2017 15:24
We were forced to drop JDK 7 due to an upstream dependency and haven't run into any issues.
Ah other than JDK 8 javadoc being more strict, not sure that is an issue for groovy.
Paolo Di Tommaso
@pditommaso
Oct 11 2017 15:27
jdk 8 works like a charm
JDK 9 could be a bit more critical
Michael L Heuer
@heuermh
Oct 11 2017 15:30
Some teams like Cytoscape are having trouble with jdk 9, I think more from a how-to-launch-the-JVM point of view
Anthony Underwood
@aunderwo
Oct 11 2017 15:31
We'd struggle if we didn't have java 7
Paolo Di Tommaso
@pditommaso
Oct 11 2017 15:31
using the new module system can break a lot of things
@aunderwo oh really ?
Anthony Underwood
@aunderwo
Oct 11 2017 15:32
running old centos 6.4
Paolo Di Tommaso
@pditommaso
Oct 11 2017 15:32
what's the problem with upgrading the JVM ?
Anthony Underwood
@aunderwo
Oct 11 2017 15:32
we have jdk/1.7.0_25
Paolo Di Tommaso
@pditommaso
Oct 11 2017 15:35
why can't you install jvm 8 ?
Anthony Underwood
@aunderwo
Oct 11 2017 15:37
It's ok - I think we're good. We have an env module for jdk/1.8.0_121
Paolo Di Tommaso
@pditommaso
Oct 11 2017 15:37
nice
java 7 was deprecated in april 2015
Anthony Underwood
@aunderwo
Oct 11 2017 15:37
:thumbsup:
Mike Smoot
@mes5k
Oct 11 2017 15:50
Hi @pditommaso I just ran into a situation where publishDir failed saving data to S3 a few times, but my pipeline ran to completion. I see the error messages in the logs and I can recover the data, but I'm wondering if it makes sense to add retry capability to publishDir as well? Very similar to nextflow-io/nextflow#295.
Paolo Di Tommaso
@pditommaso
Oct 11 2017 15:52
yes, open an issue for that please
Mike Smoot
@mes5k
Oct 11 2017 15:55
Great, will do!
Félix C. Morency
@fmorency
Oct 11 2017 16:10
we're still using ubuntu 14.04 (thus jdk 7)
there are some unofficial PPAs with jdk8
also, I made NF work on FreeBSD (with some minor fixes)
Paolo Di Tommaso
@pditommaso
Oct 11 2017 16:11
nice, feel free to contribute
Félix C. Morency
@fmorency
Oct 11 2017 16:12
will do
Paolo Di Tommaso
@pditommaso
Oct 11 2017 16:12
about the JDK, are you able to upgrade it ?
(I remember we talked about that .. )
Félix C. Morency
@fmorency
Oct 11 2017 16:13
not by any official means but as I said there are unofficial ppa... upgrading the whole cluster might take some time because next ubuntu versions are systemd-based
Paolo Di Tommaso
@pditommaso
Oct 11 2017 16:13
well, you can continue to use 0.25.x :)
Félix C. Morency
@fmorency
Oct 11 2017 16:14
yes for the time being
Félix C. Morency
@fmorency
Oct 11 2017 16:19
Are there any pros/cons of using /usr/bin/env bash everywhere (shebang) instead of /bin/bash?
Paolo Di Tommaso
@pditommaso
Oct 11 2017 16:20
I think /usr/bin/env is portable
Félix C. Morency
@fmorency
Oct 11 2017 16:21
Because bash is not located in /bin on TrueOS (BSD). I had to make a symlink to make NF work. However, /usr/bin/env bash works A1
I was wondering if changing every shebang in the NF codebase to /usr/bin/env would cause any problem
Paolo Di Tommaso
@pditommaso
Oct 11 2017 16:23
NF is already using /usr/bin/env bash
Félix C. Morency
@fmorency
Oct 11 2017 16:24
Not in the installation script nor in the .command.sh
Paolo Di Tommaso
@pditommaso
Oct 11 2017 16:24
where ?
oops
Félix C. Morency
@fmorency
Oct 11 2017 16:25
:D
Also, curl -s https://get.nextflow.io
The shebang is /bin/bash
Paolo Di Tommaso
@pditommaso
Oct 11 2017 16:26
:)
Félix C. Morency
@fmorency
Oct 11 2017 16:26
I can make a PR that fixes everything
Paolo Di Tommaso
@pditommaso
Oct 11 2017 16:26
fantastic
Félix C. Morency
@fmorency
Oct 11 2017 18:26
@pditommaso is there any way I can test that my fixes don't break anything?
Paolo Di Tommaso
@pditommaso
Oct 11 2017 18:27
make test
Félix C. Morency
@fmorency
Oct 11 2017 18:30
thanks. currently running
Paolo Di Tommaso
@pditommaso
Oct 11 2017 18:31
:clap:
Félix C. Morency
@fmorency
Oct 11 2017 18:51
@pditommaso I guess it is expected that tests relying on shifter/docker fail since I don't have those tech installed?
Paolo Di Tommaso
@pditommaso
Oct 11 2017 18:52
what's the error message ?
Félix C. Morency
@fmorency
Oct 11 2017 18:56
It's complaining about differences linked to my changes. I will dig
Paolo Di Tommaso
@pditommaso
Oct 11 2017 18:56
there should be an html report that could help
Félix C. Morency
@fmorency
Oct 11 2017 18:57
yes, I'm looking at it
Félix C. Morency
@fmorency
Oct 11 2017 19:28
100% successful
Paolo Di Tommaso
@pditommaso
Oct 11 2017 19:29
like a pro!
Félix C. Morency
@fmorency
Oct 11 2017 19:47
@pditommaso do you have more tests that only you can run?
Paolo Di Tommaso
@pditommaso
Oct 11 2017 19:47
there are integration tests on Circle
but they are only triggered when committing to master
Félix C. Morency
@fmorency
Oct 11 2017 19:48
Ok. make test is passing. Time to make a PR?
Paolo Di Tommaso
@pditommaso
Oct 11 2017 19:49
that would be great
Félix C. Morency
@fmorency
Oct 11 2017 19:55
Done. Thanks for the help
Paolo Di Tommaso
@pditommaso
Oct 11 2017 19:55
:+1:
steve jones
@jones_steve1_twitter
Oct 11 2017 22:02
Hi, I'm a new user of nextflow and I'm working through the examples. The software seems excellent. I have a question about 'pipeline parameters' mentioned on the getting started page that I can't find an answer to. How can I make an error message if the user doesn't input required parameters? It seems that as long as I mention the parameter flag on the command line, the script will proceed regardless if any string is associated with the parameter
Paolo Di Tommaso
@pditommaso
Oct 11 2017 22:05
hi, the common idiom is to define a default value for each parameter and allow the user to override it if needed
do you want to validate a parameter or to check if a user provided a different value ?
steve jones
@jones_steve1_twitter
Oct 11 2017 22:09
I think option #2. I want to write a script that requires the user to specify a sample ID. When I internally define "params.sample = 'null' ", simply including the command line flag '--sample ' will over-ride it, without having to actually include anything
Paolo Di Tommaso
@pditommaso
Oct 11 2017 22:11
you don't have to do anything special
providing --sample on the command line allows you to access params.sample in the script
steve jones
@jones_steve1_twitter
Oct 11 2017 22:13
It seems that simply including '--sample' in the command-line call of the script will override my internal declaration of 'null' with ''
Or maybe I'm going about this the wrong way. I essentially, want to force the use to declare some parameters and error-out if they don't
*user
Paolo Di Tommaso
@pditommaso
Oct 11 2017 22:14
put in your script
params.sample = null // NOTE not 'null'
if( !params.sample ) error "Missing sample parameter"
println "sample: $params.sample"
that's it
steve jones
@jones_steve1_twitter
Oct 11 2017 22:15
Ah, thank you. I appreciate your quick response.
steve jones
@jones_steve1_twitter
Oct 11 2017 22:22
Sorry. I'm still fiddling with this. I think I run into the same issue. As long as I invoke the '--sample' parameter without specifying it on the command line the script runs run, i.e. "nextflow test.nf --sample"
*runs fine
Paolo Di Tommaso
@pditommaso
Oct 11 2017 22:24
mmm, I think you haven't used my example
steve jones
@jones_steve1_twitter
Oct 11 2017 22:27
I excluded the third print statement, I didn't realize it was required. It works now but excludes the error statement you specified in line 2 ("Missing sample parameter")
Paolo Di Tommaso
@pditommaso
Oct 11 2017 22:28
can you copy and paste your code ?
steve jones
@jones_steve1_twitter
Oct 11 2017 22:30
#!/usr/bin/env nextflow


/*
 * Declare parameters
 */

params.sample = null // NOTE not 'null'
if( !params.sample ) error "Missing sample parameter"
println "sample: $params.sample"
Using this command: nextflow test.nf --sample
Paolo Di Tommaso
@pditommaso
Oct 11 2017 22:31
I have this output
$ nextflow run test.nf 
N E X T F L O W  ~  version 0.25.7
Launching `test.nf` [trusting_ardinghelli] - revision: 08091a9d76
Missing sample parameter

 -- Check script 'test.nf' at line: 9 or see '.nextflow.log' file for more details


$ nextflow run test.nf --sample 
N E X T F L O W  ~  version 0.25.7
Launching `test.nf` [boring_tesla] - revision: 08091a9d76
sample: true
isn't that what you want ?
steve jones
@jones_steve1_twitter
Oct 11 2017 22:32
Interesting this is what I see:
N E X T F L O W  ~  version 0.25.7
Launching `./process/test1.nf` [ecstatic_kilby] - revision: 3b3853c690
sample: true
Paolo Di Tommaso
@pditommaso
Oct 11 2017 22:34
you haven't specified what command line you have used ..
steve jones
@jones_steve1_twitter
Oct 11 2017 22:35
nextflow test.nf --sample
Paolo Di Tommaso
@pditommaso
Oct 11 2017 22:35
if so, it's correct
or your point is that on the command line there isn't a value for the sample?
steve jones
@jones_steve1_twitter
Oct 11 2017 22:37
yes correct
Paolo Di Tommaso
@pditommaso
Oct 11 2017 22:38
if no value is given, it is implicitly treated as a flag set to true
you can use this check if so
if( !params.sample || params.sample instanceof Boolean ) error "Missing sample parameter"
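The check above can be wrapped in a small reusable function. The following is a plain Groovy sketch, not a Nextflow feature: `requireParam` is a hypothetical name, plain maps stand in for the `params` object, and it throws a standard exception where a pipeline script would call `error`.

```groovy
// Hypothetical helper (not part of Nextflow): return a usable parameter
// value or fail with a clear message.
def requireParam(Map paramMap, String name) {
    def value = paramMap[name]
    // null or empty means the parameter was never set; a Boolean means the
    // flag was given without a value (as with a bare `--sample`).
    if (!value || value instanceof Boolean) {
        throw new IllegalArgumentException("Missing ${name} parameter")
    }
    return value
}

// Plain maps stand in for the params object here.
println requireParam([sample: 'S1'], 'sample')
try {
    requireParam([sample: true], 'sample')
} catch (IllegalArgumentException e) {
    println e.message
}
```

One design consequence to keep in mind: this treats a Boolean as "missing", so it only suits parameters that must carry a value, not genuine on/off flags.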
steve jones
@jones_steve1_twitter
Oct 11 2017 22:41
Ok, got it