These are chat archives for nextflow-io/nextflow

28th
Feb 2017
amacbride
@amacbride
Feb 28 2017 00:40
@pditommaso Not to get off-topic, but have you been happy with your experience publishing with PeerJ?
Paolo Di Tommaso
@pditommaso
Feb 28 2017 00:41
I would say yes
much better and faster than high profile journals
are you planning a pre-print or a peer-reviewed paper?
amacbride
@amacbride
Feb 28 2017 00:49
eventually peer-reviewed
but may start with a pre-print
I've published in PLoS Computational Biology before, and had a good experience, but that was for some much more high-profile work as a part of a team.
So I'm exploring all the various options.
Paolo Di Tommaso
@pditommaso
Feb 28 2017 00:56
if you don't care about impact factor, PeerJ is great
going offline.. :sleeping:
Karin Lagesen
@karinlag
Feb 28 2017 10:08
I'm wondering about using templates vs just running the script itself
I am assuming there is a benefit there that I'm not seeing as a first-timer?
Paolo Di Tommaso
@pditommaso
Feb 28 2017 10:13
well, the main difference is that templates allow you to use NF variables in the template script
whereas with a plain bash/perl/whatever script you have to pass any arguments via the script command line or env variables
Karin Lagesen
@karinlag
Feb 28 2017 11:38
Hmm... that makes sense. Will have to give it a go :)
Tim Diels
@timdiels
Feb 28 2017 12:14
What does Nextflow offer in terms of modularisation? I.e. to what degree can I reuse parts of a pipeline?
Karin Lagesen
@karinlag
Feb 28 2017 12:49
I'm reading the docs now. I see 'it' used a lot. Is this convention, or is it actually part of the language?
Paolo Di Tommaso
@pditommaso
Feb 28 2017 12:52
@karinlag no, it is the implicit argument of a closure. Think lambda function in python.
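A minimal Groovy sketch of the implicit closure argument described above. The list values and variable names are arbitrary and chosen only for illustration:

```groovy
// `it` is the implicit parameter of a closure that declares no parameters
def doubled = [1, 2, 3].collect { it * 2 }

// The same closure written with an explicitly named parameter
def doubledExplicit = [1, 2, 3].collect { n -> n * 2 }

println doubled
println doubledExplicit
```

Both forms are equivalent; naming the parameter tends to read better once the closure body grows beyond a single expression.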
Karin Lagesen
@karinlag
Feb 28 2017 12:53
thanks! and thanks for the ref to python :) makes it (slightly) more understandable :smile:
Paolo Di Tommaso
@pditommaso
Feb 28 2017 12:53
;)
Karin Lagesen
@karinlag
Feb 28 2017 12:55
btw, how do we report errors in the docs? I suspect that the code found under input repeaters is missing a 3
(provided I've understood things)
Paolo Di Tommaso
@pditommaso
Feb 28 2017 12:55
that would be fantastic
Karin Lagesen
@karinlag
Feb 28 2017 12:56
me understanding things or me reporting errors or there being errors? :smile:
Paolo Di Tommaso
@pditommaso
Feb 28 2017 12:56
all the docs are in the GitHub repo (rst files)
just open a PR for any problem
Karin Lagesen
@karinlag
Feb 28 2017 12:56
ack
thanks
Paolo Di Tommaso
@pditommaso
Feb 28 2017 12:56
reporting errors! of course! :D
Karin Lagesen
@karinlag
Feb 28 2017 12:57
just checking :laughing:
Paolo Di Tommaso
@pditommaso
Feb 28 2017 12:59
@timdiels You can reuse task logic by using external scripts or templates, or functions with groovy/java libraries. But at this time there isn't sub-workflow modularisation
Tim Diels
@timdiels
Feb 28 2017 13:01
@pditommaso Would nextflow run mysubworkflow work as an alternative (for some cases)?
Paolo Di Tommaso
@pditommaso
Feb 28 2017 13:02
In the sense that you can launch it from another NF script like any other tool? Of course.
Karin Lagesen
@karinlag
Feb 28 2017 13:20
wow, just discovered you've incorporated the module system! awesome!
Paolo Di Tommaso
@pditommaso
Feb 28 2017 13:20
Yes, conda is on the radar as well!
that said, NF is promoting containers to manage deps
Tim Diels
@timdiels
Feb 28 2017 13:25
So, when 2 pipelines share a snippet of code (e.g. processes and channels), it's not possible to put it in a separate file and include it in each pipeline?
Anthony Underwood
@aunderwo
Feb 28 2017 13:26
Afternoon. I am recreating a pipeline that we currently run using a homebrew execution engine. Just wondering whether it is possible in Nextflow to conditionally run process Y if the output file from process X contains a given text (e.g. an XML tag with a given value), else run process Z
Paolo Di Tommaso
@pditommaso
Feb 28 2017 13:28
@timdiels Nope, my suggestion is to externalise as much task logic as possible into separate scripts, and use NF to orchestrate/parallelise the execution. It should not be a single process for each tool.
Tim Diels
@timdiels
Feb 28 2017 13:29
Ok, thanks
Anthony Underwood
@aunderwo
Feb 28 2017 13:31
I can see there is a choice operator (https://www.nextflow.io/docs/latest/operator.html#choice) and when I look at some of the example workflows I can see there are if statements but haven't found one that checks the contents of a process output file as the comparator
Paolo Di Tommaso
@pditommaso
Feb 28 2017 13:32
there are different strategies, have a look here
(need to leave temporarily)
Anthony Underwood
@aunderwo
Feb 28 2017 13:33
Thanks - would be interested in how that example can be applied to whole processes rather than within
Karin Lagesen
@karinlag
Feb 28 2017 13:50
I really like the fromFilePairs channel thing. Any way to do the same, but for four files sharing the same prefix?
Paolo Di Tommaso
@pditommaso
Feb 28 2017 14:05
@karinlag yes, use fromFilePairs(..., size:4). See here.
@aunderwo The choice operator can be an option in this case.
Anthony Underwood
@aunderwo
Feb 28 2017 14:07

@pditommaso

there are different strategies, have a look here

That example only seems to check whether an output is an instance of a path; it does not read a file and then decide which process to run.

Oops sorry messages crossed paths
Paolo Di Tommaso
@pditommaso
Feb 28 2017 14:08
it needs a blank line :)
that example was meant to show how to create a conditional process script
you can even do something like
if (condition) {
  process A {  .. }
}
else {
  process B { .. }
}
though I tend to avoid it if possible
Anthony Underwood
@aunderwo
Feb 28 2017 14:10
Ok using choice how would I test file contents of an output from a previous process
?
Would I put a more complex multi-line block in place instead of a -> a =~ /^Hello.*/ ? 0 : 1
I guess if a was a file I can operate on that in groovy?
Paolo Di Tommaso
@pditommaso
Feb 28 2017 14:12
the a argument would be a file in your code
you will need to write an appropriate method that takes the file as argument, reads the content and returns 0 or 1 depending on the choice
that would be plain groovy/java code
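A minimal sketch of such a method in plain Groovy. The name chooseByContent, the marker string and the temporary file are all hypothetical; the helper only assumes its argument exposes a text property, as java.io.File does in Groovy:

```groovy
// Hypothetical helper: returns 0 if the file text contains the marker, else 1.
// `f` is left untyped; this sketch assumes it exposes a `text` property,
// as java.io.File does in Groovy.
def chooseByContent(f, String marker) {
    f.text.contains(marker) ? 0 : 1
}

// Stand-alone demo with a temporary file (the contents are made up)
def tmp = File.createTempFile('process_x_output', '.xml')
tmp.deleteOnExit()
tmp.text = '<status>PASS</status>'

println chooseByContent(tmp, '<status>PASS</status>')
println chooseByContent(tmp, '<status>FAIL</status>')
```

In a pipeline, a function like this would be called from the closure given to choice, in place of the one-line regex test quoted earlier. For large outputs, reading the whole file into memory may not be ideal, so a line-by-line scan could be a better fit.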
Anthony Underwood
@aunderwo
Feb 28 2017 14:13
OK, got it. :) Do you know if any of the examples have a choice operator in a real world example?
Paolo Di Tommaso
@pditommaso
Feb 28 2017 14:15
for example
the interesting thing in this example is that the choosing function directionSorter is passed as an argument
you may have noticed that it is defined here
Maxime Garcia
@MaxUlysse
Feb 28 2017 14:17
Impressive
Paolo Di Tommaso
@pditommaso
Feb 28 2017 14:17
:)
credits to @robsyme
Anthony Underwood
@aunderwo
Feb 28 2017 14:25

you may have noticed that it is defined here

That is nice. So choice creates two channels and these channels can be the inputs for two processes doing different things? Would there have to be nested choices if you wanted the equivalent of
if (cond1) { ... } else if (cond2) { ... } else if (cond3) { ... } else { ... }
???

Paolo Di Tommaso
@pditommaso
Feb 28 2017 14:29
not sure I understand correctly
first thing, choice does not create the destination channels, they must be created before. See for example here.
then, you can have as many choice options/channels as you need, i.e.
Anthony Underwood
@aunderwo
Feb 28 2017 14:30
Ok - choice feeds items into pre-created channels
Paolo Di Tommaso
@pditommaso
Feb 28 2017 14:30
source .choice( a, b, .. , n ) { /* choice rule */ }
you can do
if (cond1) { ... } else if (cond2) { ... } else if (cond3) { ... } else { ... }
but I would use a switch statement as in the example
Anthony Underwood
@aunderwo
Feb 28 2017 14:31
OK cool - so the block should return 0 ..n where n is the number of channels -1
Paolo Di Tommaso
@pditommaso
Feb 28 2017 14:32
yes, 0 selects the first, 1 the second .. n-1 the last :)
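A minimal sketch of a switch-based choice rule that returns a channel index between 0 and n-1. The closure name route, the status strings and the four-way split are assumptions made for illustration; they do not come from the linked example:

```groovy
// Hypothetical routing rule: maps a status string to a channel index (0 .. n-1)
def route = { String status ->
    switch (status) {
        case 'PASS': return 0   // first channel
        case 'WARN': return 1   // second channel
        case 'FAIL': return 2   // third channel
        default:     return 3   // fourth (last) channel
    }
}

println route('PASS')
println route('something else')
```

A closure like this would play the role of the choice rule in the source.choice(...) call shown above, with four channels created beforehand to match the four possible return values.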
Anthony Underwood
@aunderwo
Feb 28 2017 14:32
Got it - I shall give that a try. Thanks for the advice - much appreciated
Paolo Di Tommaso
@pditommaso
Feb 28 2017 14:34
you are welcome
Manuel
@kohleman
Feb 28 2017 15:12

I am also using this nice feature of fromFilePairs. Very elegant.

Besides using the pair_id (see code below), I would also like to use only part of this variable.
In a concrete case the pair_id is 'BSSE_QGF_55000_CA761ANXX_5_ESBIPGRA00342_GCTCGGTA_S29_L005'. Now I would like to also have only 'BSSE_QGF_55000_CA761ANXX_5' which can be used in the publishDir or in the command line part. I failed so far, but maybe there is a 'nextflow' way to do this.

params.reads = '/demultiplexed_6/*/*/*R{1,2}_001.fastq.gz'

Channel.fromFilePairs(params.reads)
    .ifEmpty { error "Cannot find any reads matching: ${params.reads}" }
    .set { read_pairs }

process trim {

    tag "${pair_id}"

    publishDir "${params.outpath}/${params.title}/$pair_id", mode: 'link'

    input:
    set pair_id, file(reads) from read_pairs

    """
    trimmomatic PE -threads 8 -phred33 -trimlog ${pair_id}_trim_log_out.txt [...]
    """
}
Paolo Di Tommaso
@pditommaso
Feb 28 2017 15:48
you can write a helper function that gets the shortened id from pair_id and then apply it in the publishDir rule
eg
def getShortId( str ) {
  return str.substring(5) /* replace with your code */
}
then
publishDir "${params.outpath}/${params.title}/${getShortId(pair_id)}"
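One possible body for getShortId, sketched in plain Groovy. It assumes the wanted prefix is always the first five underscore-separated fields of pair_id, which holds for the sample id quoted above but may not hold for every naming scheme; the fields parameter is an addition for illustration:

```groovy
// Keeps the first `fields` underscore-separated tokens of the id.
// The default of 5 is an assumption based on the sample id in the question.
def getShortId(String pairId, int fields = 5) {
    pairId.tokenize('_').take(fields).join('_')
}

def pairId = 'BSSE_QGF_55000_CA761ANXX_5_ESBIPGRA00342_GCTCGGTA_S29_L005'
println getShortId(pairId)
```

Splitting on the separator rather than using a fixed substring offset keeps the helper working when individual fields vary in length.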
Manuel
@kohleman
Feb 28 2017 16:21
Thanks, that is what I needed :-)
Paolo Di Tommaso
@pditommaso
Feb 28 2017 16:22
:+1:
Félix C. Morency
@fmorency
Feb 28 2017 18:20
@pditommaso what hashing method is used internally by NF?
Paolo Di Tommaso
@pditommaso
Feb 28 2017 18:20
Murmur128
Félix C. Morency
@fmorency
Feb 28 2017 18:21
thanks
Paolo Di Tommaso
@pditommaso
Feb 28 2017 18:22
are u going to hack it? :)
Félix C. Morency
@fmorency
Feb 28 2017 18:22
@pditommaso maybe. we are in the process of getting some FDA certification for our pipeline :)
Paolo Di Tommaso
@pditommaso
Feb 28 2017 18:23
I see cool
Félix C. Morency
@fmorency
Feb 28 2017 20:47
is there a way to see why nextflow pull failed on my private gitlab server?
Paolo Di Tommaso
@pditommaso
Feb 28 2017 20:48
unexpected error ?
Félix C. Morency
@fmorency
Feb 28 2017 20:48
yeah. I get a Make sure exists a Gitlab repository at this address
Paolo Di Tommaso
@pditommaso
Feb 28 2017 20:48
try
Félix C. Morency
@fmorency
Feb 28 2017 20:48
and when I go to said address in my browser, the repo is there
Paolo Di Tommaso
@pditommaso
Feb 28 2017 20:48
nextflow -log <filename> pull .. etc
there should be more details in the log
Félix C. Morency
@fmorency
Feb 28 2017 20:50
Feb-28 15:49:23.438 [main] DEBUG nextflow.cli.Launcher - $> /home/morency/software/nextflow/nextflow -log nflog pull nf-dmri-core/dmri-human-nf -hub imk
Feb-28 15:49:23.481 [main] INFO  nextflow.cli.CmdPull - Checking nf-dmri-core/dmri-human-nf ...
Feb-28 15:49:23.772 [main] DEBUG nextflow.scm.RepositoryProvider - Request [credentials fmorency:********************] -> https://imeka-server-01/api/v3/projects/nf-dmri-core%2Fdmri-human-nf
Feb-28 15:49:23.954 [main] DEBUG nextflow.scm.RepositoryProvider - Request [credentials fmorency:********************] -> https://imeka-server-01/api/v3/projects/nf-dmri-core%2Fdmri-human-nf
Feb-28 15:49:23.963 [main] DEBUG nextflow.scm.RepositoryProvider - Request [credentials fmorency:********************] -> https://imeka-server-01/api/v3/projects/nf-dmri-core%2Fdmri-human-nf
Feb-28 15:49:23.979 [main] DEBUG nextflow.cli.Launcher - Operation aborted
nextflow.exception.AbortOperationException: Cannot find `nf-dmri-core/dmri-human-nf` -- Make sure exists a GitLab repository at this address `https://imeka-server-01/nf-dmri-core/dmri-human-nf`
    at nextflow.scm.RepositoryProvider.validateFor(RepositoryProvider.groovy:210)
    at nextflow.scm.AssetManager.checkValidRemoteRepo(AssetManager.groovy:343)
    at nextflow.scm.AssetManager.download(AssetManager.groovy:522)
    at nextflow.cli.CmdPull$_run_closure1.doCall(CmdPull.groovy:78)
    at nextflow.cli.CmdPull$_run_closure1.call(CmdPull.groovy)
    at org.codehaus.groovy.runtime.DefaultGroovyMethods.each(DefaultGroovyMethods.java:2030)
    at org.codehaus.groovy.runtime.DefaultGroovyMethods.each(DefaultGroovyMethods.java:2015)
    at org.codehaus.groovy.runtime.DefaultGroovyMethods.each(DefaultGroovyMethods.java:2056)
    at nextflow.cli.CmdPull.run(CmdPull.groovy:74)
    at nextflow.cli.Launcher.run(Launcher.groovy:410)
    at nextflow.cli.Launcher.main(Launcher.groovy:558)
I can access the referenced api webpage in firefox
mmm but not in incognito mode
Paolo Di Tommaso
@pditommaso
Feb 28 2017 20:53
something related to credentials ?
Félix C. Morency
@fmorency
Feb 28 2017 20:53
mmm maybe
mm I don't see what
im using a self-signed cert for https
can it cause problem?
Paolo Di Tommaso
@pditommaso
Feb 28 2017 20:57
have you specified the GitLab token as described here?
Félix C. Morency
@fmorency
Feb 28 2017 20:57
yup
Paolo Di Tommaso
@pditommaso
Feb 28 2017 21:02
are you able to curl/wget https://imeka-server-01/api/v3/projects/nf-dmri-core%2Fdmri-human-nf ?
Félix C. Morency
@fmorency
Feb 28 2017 21:04
i get a 404 in my console
i guess because im not auth?
Paolo Di Tommaso
@pditommaso
Feb 28 2017 21:05
yes, you will need to set the token in the request header, have a look at the GitLab API docs
Félix C. Morency
@fmorency
Feb 28 2017 21:05
i also get an error if I don't pass --no-check-certificate
doesn't nf pull support self-signed cert URLs?
Paolo Di Tommaso
@pditommaso
Feb 28 2017 21:06
umm
we're entering a grey area ..
are you using some custom certificate ?
Félix C. Morency
@fmorency
Feb 28 2017 21:07
self-signed yeah
I can access the API from curl/wget with my API token, but I have to pass an additional option to skip the certificate check
Paolo Di Tommaso
@pditommaso
Feb 28 2017 21:09
that could be the problem, I think you will need to somehow add it to the Java keystore ..
not exactly my cup of tea
Félix C. Morency
@fmorency
Feb 28 2017 21:10
It seems there is a Java option to disable cert check: -Dcom.sun.net.ssl.checkRevocation=false
is there a way to feed that to NF?
Paolo Di Tommaso
@pditommaso
Feb 28 2017 21:11
nextflow -Dcom.sun.net.ssl.checkRevocation=false pull .. etc
Félix C. Morency
@fmorency
Feb 28 2017 21:13
mmm still nothing. ill investigate. thanks!
Paolo Di Tommaso
@pditommaso
Feb 28 2017 21:14
ok
Félix C. Morency
@fmorency
Feb 28 2017 21:22
ok now it works
needed to add the cert to the default java keystore
Paolo Di Tommaso
@pditommaso
Feb 28 2017 21:22
exactly
I should add a note in the doc
Félix C. Morency
@fmorency
Feb 28 2017 21:23
I used
openssl s_client -showcerts -connect server.com:443 </dev/null 2>/dev/null|openssl x509 -outform PEM >mycertfile.pem
to get the https cert from the url and then
keytool -import -alias <some descriptive name> -file <certificate file> -keystore <path to keystore>
Paolo Di Tommaso
@pditommaso
Feb 28 2017 21:24
cool
Félix C. Morency
@fmorency
Feb 28 2017 21:24
default keystore in openjdk7 on ubuntu 14.04 is located at /usr/lib/jvm/java-1.7.0-openjdk-amd64/jre/lib/security/cacerts
Paolo Di Tommaso
@pditommaso
Feb 28 2017 21:24
what's s_client in the first command ?
Félix C. Morency
@fmorency
Feb 28 2017 21:25
An openssl subcommand
s_client  This implements a generic SSL/TLS client which can establish
                 a transparent connection to a remote server speaking SSL/TLS.
                 It's intended for testing purposes only and provides only
                 rudimentary interface functionality but internally uses
                 mostly all functionality of the OpenSSL ssl library.
Paolo Di Tommaso
@pditommaso
Feb 28 2017 21:25
great
Félix C. Morency
@fmorency
Feb 28 2017 21:26
I found on the net that the default password for the keystore is changeit
it worked here
Paolo Di Tommaso
@pditommaso
Feb 28 2017 21:26
:)
Félix C. Morency
@fmorency
Feb 28 2017 21:29
thanks for your help
Paolo Di Tommaso
@pditommaso
Feb 28 2017 21:30
happy to help
Mike Smoot
@mes5k
Feb 28 2017 23:37

@pditommaso my brain isn't working well today, can you explain why this code is failing? None of the files from the input list are getting copied into the work dir:

Channel
    //.from("asdf.txt", "fdsa.txt")
    .from("asdf.txt")
    .map{ file(it) }
    .toSortedList()
    .into{
        data
    }

process hello {

    input:
    file "*.txt" from data

    script:
    """
    cat *.txt
    """
}

If I toggle the comments for .from(...) then things work as expected. I gather there's some different handling for lists of length one.

Paolo Di Tommaso
@pditommaso
Feb 28 2017 23:51
because to maintain the original file name you should use file "*" from data
Mike Smoot
@mes5k
Feb 28 2017 23:52
Yup, that works.... any idea why?
Paolo Di Tommaso
@pditommaso
Feb 28 2017 23:52
otherwise with a single file *.txt it's expanded to .txt
Mike Smoot
@mes5k
Feb 28 2017 23:53
yeah, that's the behavior I see, but why is a single file different?
It's no big deal, I'm just curious
Paolo Di Tommaso
@pditommaso
Feb 28 2017 23:53
if you look at the table it's consistent
though a bit counterintuitive in this example .. :/
Mike Smoot
@mes5k
Feb 28 2017 23:54
cool, I'll take a look. Thanks!
Paolo Di Tommaso
@pditommaso
Feb 28 2017 23:56
though it would make more sense to maintain the original file name .. need to think about whether it could have some breaking side effect on existing code
you may want to report it as an issue
Mike Smoot
@mes5k
Feb 28 2017 23:57
I'll study a bit closer and report something if it makes sense