These are chat archives for nextflow-io/nextflow

7th
Sep 2017
spaceturtle
@spaceturtle
Sep 07 2017 06:13
Is there any function or library to handle help?
Simone Baffelli
@baffelli
Sep 07 2017 06:45
@pditommaso I should have applied :(
I could have given another perspective on nextflow since my work is not even remotely related to genomics.
Paolo Di Tommaso
@pditommaso
Sep 07 2017 07:08
@spaceturtle what help?
@baffelli yep!
Samuel Lampa
@samuell
Sep 07 2017 07:42
Hi folks, just read https://f1000research.com/articles/6-876/v1 and found that Nextflow is not included in popular tool directories like https://bio.tools or https://omictools.com
Paolo Di Tommaso
@pditommaso
Sep 07 2017 07:45
Hi, you may want to comment that ..
Samuel Lampa
@samuell
Sep 07 2017 07:48
Not sure yet how the editing process works ... if crowd-sourced, or by some edit team ...
It seems at least on bio.tools, site users can add content
... So I presume it is thought that tool makers do register their tools.
Phil Ewels
@ewels
Sep 07 2017 07:53
I think other people registered MultiQC on both, but I "claimed" them as author afterwards (not that it makes much difference).
Paolo Di Tommaso
@pditommaso
Sep 07 2017 07:53
yes, they make very generic recommendations, apart from the one to use domain registries, which IMHO are completely useless :)
(still need to wake up..)
Maxime Garcia
@MaxUlysse
Sep 07 2017 08:03
I registered CAW on biotools too, but it was just so that it could appear on http://www.nbis.se/infrastructure/tools/
Paolo Di Tommaso
@pditommaso
Sep 07 2017 08:05
how are they linked?
Maxime Garcia
@MaxUlysse
Sep 07 2017 08:07
They grep the tools they're working on from the biotools website
Paolo Di Tommaso
@pditommaso
Sep 07 2017 08:07
I see
Maxime Garcia
@MaxUlysse
Sep 07 2017 08:08
I'm guessing they just did not want to make another catalogue of tools
Paolo Di Tommaso
@pditommaso
Sep 07 2017 08:09
raise your hand if you have ever used such catalogs when looking for a tool ..
Evan Floden
@evanfloden
Sep 07 2017 08:11
Sounds like we need another registry of registries
Maxime Garcia
@MaxUlysse
Sep 07 2017 08:11
I did use this sort of thing when looking for references and other tools, I think it can be useful if you're not up to date, or not very connected
But I agree, it's not that useful, and we could probably do without
Paolo Di Tommaso
@pditommaso
Sep 07 2017 08:12
Phil retired his comment :)
Samuel Lampa
@samuell
Sep 07 2017 08:12
While I like that bio.tools is more "free" / "open" (as I understand), I really like the UI of omictools ... have used a bit to browse around for datasets ... not sure that I used for any concrete work yet, but might do
Maxime Garcia
@MaxUlysse
Sep 07 2017 08:12
image too big I guess
Phil Ewels
@ewels
Sep 07 2017 08:14
:+1: I definitely see the point in having mini-lists of packages maintained by a group etc as with the NBIS one (the SciLifeLab one is http://opensource.scilifelab.se )
I find the big ones a bit overwhelming though, and the info about each package is often very thin on the ground
So I'm not sure how much more helpful it is than a google search to be honest
Samuel Lampa
@samuell
Sep 07 2017 08:16
I think the more "visual" UI of omictools can give a sense of the "landscape" of tools and datasets in a given field, when new to it...
Evan Floden
@evanfloden
Sep 07 2017 08:16

IMO these catalogs will become much more relevant when they include 1) containers, 2) programmatic access/format for the command-line options, 3) a format for the type of data in/out.

This will allow the super fast deployment of tools into workflows such as NF.

Samuel Lampa
@samuell
Sep 07 2017 08:16
(or, at least that is my impression ... then comes the question of data completeness of course ... you don't know what you don't know :) )
Good points @skptic ... would be awesome with more machine readability of that kind.
Problem is, folks doing machine readability these days (semweb guys) often do things in way overcomplicated ways, with too little eye for practicality for the community to have the slightest chance of helping to curate data.
Phil Ewels
@ewels
Sep 07 2017 08:23
Whilst I like the idea of that @skptic, I can't see us ever doing something like that. As it is, it often takes us quite a while of reading and testing to work out what flags to use with each tool in a pipeline (for best results / biological sense). I can't see how an automated machine system could get that right, at least not easily.
Samuel Lampa
@samuell
Sep 07 2017 08:30
Yea, I think the data itself would have to be self-described with meta data in more detail, to have any chance of that to work. There are data input/output descriptions for tools (https://en.wikipedia.org/wiki/SADI), but yea, requires them to consume/produce data in RDF format :P
Evan Floden
@evanfloden
Sep 07 2017 08:37
@ewels Very true, there is really no substitute for biological interpretation. With regards to flags etc, a first step would be a 'standard' yuck which the catalogs used to describe them so we can parse them into something like this kallisto config example in NF.
Phil Ewels
@ewels
Sep 07 2017 08:51
Yup, I'm always keen for sensible defaults! Also, I feel like it's time to link to the standard XKCD comic now..
Maxime Garcia
@MaxUlysse
Sep 07 2017 08:59
I don't think this comic is up to date anymore, we should make a new standard comic to replace it ;-)
Rickard Hammarén
@Hammarn
Sep 07 2017 09:21
I have a question about .fromFilePairs, or other ways of grouping input files. My use case is that I have samples sequenced over multiple lanes and want to submit them all to one process, e.g.
3_160412_AC8B7GANXX_P4701_1001_1.fastq.gz  3_160412_AC8B7GANXX_P4701_1001_2.fastq.gz  4_160412_AC8B7GANXX_P4701_1001_1.fastq.gz  4_160412_AC8B7GANXX_P4701_1001_2.fastq.gz  5_160412_AC8B7GANXX_P4701_1001_1.fastq.gz  5_160412_AC8B7GANXX_P4701_1001_2.fastq.gz
How do I end up with 3 grouped pairs? Do I need to handle it with some custom code from the beginning instead of using something like .fromFilePairs?
Paolo Di Tommaso
@pditommaso
Sep 07 2017 09:25
not sure, but setting Channel.fromFilePairs('?_*_{1,2}.fastq.gz', size: 6) should work
Tiffany Delhomme
@tdelhomme
Sep 07 2017 09:26
Hi all,
does anyone know if this is expected?
delhommet@X130121:~/Documents/GitHub_repos/nextflow_toys$ cat test_params.nf 
#!/usr/bin/env nextflow

/*
 * Try to change an input parameter keeping same name
 */

params.foo = null

if (params.foo == "x") { params.foo = "new_foo1"; log.info "I am trying to set foo as new_foo1" }
  else if (params.foo == "y") { params.foo = "new_foo2" ; log.info "I am trying to set foo as new_foo2"}
  else {log.info "I don't want to change foo"}

log.info "The value of foo is : ${params.foo}"


delhommet@X130121:~/Documents/GitHub_repos/nextflow_toys$ nextflow run test_params.nf --foo=x
N E X T F L O W  ~  version 0.24.4
Launching `test_params.nf` [small_hopper] - revision: e427052373
I am trying to set foo as new_foo1
The value of foo is : x
Rickard Hammarén
@Hammarn
Sep 07 2017 09:27
@pditommaso thanks, I'll have a play with it
Paolo Di Tommaso
@pditommaso
Sep 07 2017 09:27
yes, params cannot be overridden
use another var
Tiffany Delhomme
@tdelhomme
Sep 07 2017 09:29
ok! thanks :+1:
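A minimal sketch of the "use another var" advice above, assuming the same test script: a value passed with `--foo` on the command line stays in `params.foo`, so the derived value goes into a separate variable. The name `foo_value` is hypothetical and not from the original script.

```groovy
#!/usr/bin/env nextflow

params.foo = null

// params.foo keeps whatever was passed with --foo, so derive a new variable
// instead of re-assigning it (foo_value is an illustrative name)
def foo_value = params.foo
if (params.foo == 'x') {
    foo_value = 'new_foo1'
}
else if (params.foo == 'y') {
    foo_value = 'new_foo2'
}

log.info "The value of foo is : ${foo_value}"
```

Run with `--foo=x`, this should log `new_foo1` rather than `x`, since the script never tries to write back into `params`.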
Rickard Hammarén
@Hammarn
Sep 07 2017 09:38
hmm, only results in
ERROR ~ Cannot find any reads matching: *P4701_1001_{1,2}.fastq.gz
Paolo Di Tommaso
@pditommaso
Sep 07 2017 09:46
with
Channel.fromFilePairs('{3,4,5}_*_{1,2}.fastq.gz', size:6).println()
[, [/Users/pditommaso/projects/nextflow/3_160412_AC8B7GANXX_P4701_1001_1.fastq.gz, /Users/pditommaso/projects/nextflow/3_160412_AC8B7GANXX_P4701_1001_2.fastq.gz, /Users/pditommaso/projects/nextflow/4_160412_AC8B7GANXX_P4701_1001_1.fastq.gz, /Users/pditommaso/projects/nextflow/4_160412_AC8B7GANXX_P4701_1001_2.fastq.gz, /Users/pditommaso/projects/nextflow/5_160412_AC8B7GANXX_P4701_1001_1.fastq.gz, /Users/pditommaso/projects/nextflow/5_160412_AC8B7GANXX_P4701_1001_2.fastq.gz]]
almost works, but it does not capture the prefix
I think you need to specify your own grouping rule
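One way such a grouping rule could look, as a hedged sketch: `fromFilePairs` can take a closure that returns the grouping key for each file, and `size: -1` accepts any number of files per group. The key logic below (dropping the leading lane number and the trailing read number) is an assumption based on the file names in the question, and whether the closure form is available should be checked against the Nextflow version in use.

```groovy
// Group every lane of a sample under one key, e.g. 160412_AC8B7GANXX_P4701_1001
Channel
    .fromFilePairs('*_{1,2}.fastq.gz', size: -1) { file ->
        // assumed name layout: <lane>_<date>_<flowcell>_<project>_<sample>_<read>.fastq.gz
        def parts = file.name.split('_')
        // drop the first token (lane) and the last token (read number + extension)
        parts[1..-2].join('_')
    }
    .println()
```

With the six files listed above, this should emit a single item holding the shared key and all six files, which can then be passed to one process.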
Evan Floden
@evanfloden
Sep 07 2017 12:28

Before I put in a feature request: When a job fails, it would be really useful to get the work dir as is given when the job is submitted. The message for example:

WARN: Process `dpa_alignment (ltn - UPP - DPA - 1000)` terminated with an error exit status (126) -- Execution is retried (1)

makes it almost impossible to track back when I have thousands of jobs running (and failing).
I would propose something like:

WARN: Process `dpa_alignment (ltn - UPP - DPA - 1000)` [da/8e42c6] terminated with an error exit status (126) -- Execution is retried (1)
Paolo Di Tommaso
@pditommaso
Sep 07 2017 12:31
The task hash would help right?
Paolo Di Tommaso
@pditommaso
Sep 07 2017 12:38
Ah, actually the task hash is printed. That's the work dir prefix
You would like the complete path?
Evan Floden
@evanfloden
Sep 07 2017 12:41
even just workdir prefix would be ideal. keeps it neat and tidy
To be consistent with submission and resubmission:
[da/8e42c6] WARN: Process `dpa_alignment (ltn - UPP - DPA - 1000)` terminated with an error exit status (126) -- Execution is retried (1)
Phil Ewels
@ewels
Sep 07 2017 12:51
Hi @pditommaso - I'm trying to be clever with our config file with reference genomes, but stuff isn't working as I would expect.
I have the following:
params {
  igenomes_base = './iGenomes/'
  genomes {
    'GRCh37' {
      bed12   = { "${params.igenomes_base}/Homo_sapiens/Ensembl/GRCh37/Annotation/Genes/genes.bed" }
However, ${params.igenomes_base} isn't substituting in to the file path as I expect
If we print it at run time, instead of a nice file path string, we get:
igenomes$_run_closure1$_closure2$_closure3$_closure28@669513d8
Any ideas what I'm getting wrong here?
If we try to use it in a fromPath channel we get the following:
ERROR ~ org.codehaus.groovy.runtime.GStringImpl cannot be cast to java.nio.file.FileSystem
Paolo Di Tommaso
@pditommaso
Sep 07 2017 13:09
you won't need the closure
just use bed12 = "${params.igenomes_base}/Homo_sapiens/Ensembl/GRCh37/Annotation/Genes/genes.bed"
Rickard Hammarén
@Hammarn
Sep 07 2017 13:10
that's what we used originally but that wouldn't expand properly
Paolo Di Tommaso
@pditommaso
Sep 07 2017 13:10
why not ?
Rickard Hammarén
@Hammarn
Sep 07 2017 13:11
It gives the following error:
 /var/spool/slurmd/job206121/slurm_script: line 71: /lupus/proj/ngi2016003/nobackup/NGI/ANALYSIS/P8455/rnaseq_ngi_v1.3_test/${params.igenomes_base}/Mus_musculus/Ensembl/GRCm38/Sequence/STARIndex: bad substitution
Paolo Di Tommaso
@pditommaso
Sep 07 2017 13:13
ummm
the following snippet
println params
prints
[igenomes_base:./iGenomes/, genomes:[GRCh37:[bed12:./iGenomes//Homo_sapiens/Ensembl/GRCh37/Annotation/Genes/genes.bed]]]
where does that error come from?
Rickard Hammarén
@Hammarn
Sep 07 2017 13:20
this line in the .command.run sbatch file:
ln -s /lupus/proj/ngi2016003/nobackup/NGI/ANALYSIS/P8455/rnaseq_ngi_v1.3_test/${params.igenomes_base}/Mus_musculus/Ensembl/GRCm38/Sequence/STARIndex STARIndex
Phil Ewels
@ewels
Sep 07 2017 13:23
Are double quotes important? @Hammarn I think we had single quotes before?
Paolo Di Tommaso
@pditommaso
Sep 07 2017 13:23
ummm, yes I think something like that
it's like BASH, single-quote => no variable substitution
Phil Ewels
@ewels
Sep 07 2017 13:24
Yup, I noticed the single quotes so switched them when I added the closure. I just overshot ;)
Rickard Hammarén
@Hammarn
Sep 07 2017 13:29
Ok, I'll give that a proper test tomorrow morning!
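For reference, a sketch of the config with the fix discussed above: a plain double-quoted string instead of a closure or single quotes. Only the `bed12` entry from the original snippet is shown; the closing braces are added here to complete the block.

```groovy
params {
  igenomes_base = './iGenomes/'
  genomes {
    'GRCh37' {
      // double quotes: ${params.igenomes_base} is substituted when the config is parsed
      // single quotes would leave the literal text ${params.igenomes_base} in the path
      bed12 = "${params.igenomes_base}/Homo_sapiens/Ensembl/GRCh37/Annotation/Genes/genes.bed"
    }
  }
}
```

This matches the `println params` result shown earlier, where `bed12` resolves to a path under `./iGenomes/`.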
Steve Marshall
@stevemmarshall
Sep 07 2017 15:24
quick question regarding using AWS, should I use the nextflow ami, do I need to install things like bioconda?
Paolo Di Tommaso
@pditommaso
Sep 07 2017 15:26
NF only requires a JVM and the Docker runtime
you can use any VM provided it includes those deps and supports cloud-init: https://cloudinit.readthedocs.io/en/latest/
(disclaimer: only tested with Amazon Linux and Ubuntu)
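As a rough illustration of why tools like bioconda do not need to be installed on the VM itself: with Docker enabled, each process runs inside a container that carries its own tools. A minimal `nextflow.config` sketch, where the image name is purely hypothetical:

```groovy
// nextflow.config
// run every process inside a Docker container
docker.enabled = true

// hypothetical image name; replace with an image that contains the pipeline's tools
process.container = 'example/pipeline-tools:latest'
```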
Steve Marshall
@stevemmarshall
Sep 07 2017 17:37
for using sge as the scheduler do you need to enable anything else? I can run the hello world example locally on the master node but when I try and run it on the cluster, I get this error... Failed to connect to auth socket '/tmp/.commd/socket'
Paolo Di Tommaso
@pditommaso
Sep 07 2017 18:00
NF can be deployed with or w/o a legacy scheduler in the cloud
installing SGE is quite a cumbersome task, I would suggest using third-party tools like ElastiCluster or Alces Flight
Phil Ewels
@ewels
Sep 07 2017 19:02
For me the beauty of using nextflow + docker in AWS is that you don't need to worry about any of that stuff :) Small and light but you have everything you need..
Paolo Di Tommaso
@pditommaso
Sep 07 2017 19:07