These are chat archives for nextflow-io/nextflow

2nd
Nov 2018
Rad Suchecki
@rsuchecki
Nov 02 2018 03:46
try this to see what is available @micans
import static groovy.json.JsonOutput.*
println(prettyPrint(toJson(config)))
Krittin Phornsiricharoenphant
@sinonkt
Nov 02 2018 06:29
Hi, I'm encountering this error from a multi-threaded process. Could you please give me a hint on how to track it down further?
works/06/c28/.command.stub: redirection error: cannot duplicate fd: Too many open files
works/06/c28/.command.stub: cannot make pipe for command substitution: Too many open files
works/06/c28/.command.stub: pipe error: Too many open files
works/06/c28/.command.stub: pipe error: Too many open files
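These errors typically mean the per-process limit on open file descriptors was exhausted (the pipeline below opens several pipes and 32 threads per tool). A quick way to check and, up to the hard limit, raise it; 4096 is an illustrative value:

```shell
# Show the current soft limit on open file descriptors for this shell
ulimit -n
# Show the hard limit (the ceiling the soft limit may be raised to)
ulimit -Hn
# Raise the soft limit for this shell and its children; the value is
# illustrative and must not exceed the hard limit above
ulimit -n 4096 2>/dev/null || echo "could not raise limit"
```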
Krittin Phornsiricharoenphant
@sinonkt
Nov 02 2018 06:35
.command.sh looks like this; for a small file.vcf.gz it's perfectly fine.
#!/bin/bash -ue
bgzip --decompress --threads 32 -c file.vcf.gz |
    sed 's/ID=AD,Number=./ID=AD,Number=R/' |
    vt decompose -s - |
    vt normalize -r ref.fasta - |
    java -Xmx32G -jar $SNPEFF_JAR -t GRCh37.75 |
    bgzip --threads 32 -c > annotated.vcf.gz
tabix -p vcf annotated.vcf.gz
Maxime HEBRARD
@mhebrard
Nov 02 2018 06:54

hello
I have a series of processes that take a list of names as input and create files... Is there a way to check whether the output file exists and, if so, skip the processes for that name?

Channel.from('A', 'B', 'C') .set{chList}

process procFirst {
  input: val name from chList
  output: set val(name), stdout into chFirstOut
  """
  printf "Prefix_${name}"
  """
}

process procSecond {
  publishDir path: "${params.outdir}/sub/",
    saveAs: {filename -> "${arr[0]}/$filename"},
    mode: 'copy'
  input: val arr from chFirstOut
  output:
    file '*.txt'
    val "${arr[1]}_Suffix.txt" into chSecondOut
  script:
    """
    touch "${arr[1]}_Suffix.txt"
    """
}

chSecondOut.subscribe{ println it }

I am looking into when or beforeScript but I'm not sure how to proceed

Maxime Garcia
@MaxUlysse
Nov 02 2018 07:15
@mhebrard Have you tried something like when: !file ?
with file being the file whose existence you want to check
Maxime HEBRARD
@mhebrard
Nov 02 2018 07:18
Trying now
Maxime HEBRARD
@mhebrard
Nov 02 2018 07:24
process procSecond {
  when: !file("${params.outdir}/sub/${arr[0]}/${arr[1]}_Suffix.txt")
  publishDir path: "${params.outdir}/sub/",
    saveAs: {filename -> "${arr[0]}/$filename"},
    mode: 'copy'
  input: val arr from chFirstOut
  output:
    file '*.txt'
    val "${arr[1]}_Suffix.txt"
    stdout into chSecondOut
  script:
    bool = !file("${params.outdir}/sub/${arr[0]}/${arr[1]}_Suffix.txt")
    """
    printf "${arr[0]}:${bool}"
    touch "${arr[1]}_Suffix.txt"
    """
}

// B:false
// But the process is still printing
Maxime HEBRARD
@mhebrard
Nov 02 2018 07:38
hmm hmm, very strange ... if I put the when directive after the input, then procSecond does not run
Maxime Garcia
@MaxUlysse
Nov 02 2018 07:42
when should be between output and script I think
Maxime HEBRARD
@mhebrard
Nov 02 2018 07:43
actually the problem is that file(...) returns the path of my file whether the file exists or not!
yes when after input is working
Maxime Garcia
@MaxUlysse
Nov 02 2018 07:44
Maybe when: !("${params.outdir}/sub/${arr[0]}/${arr[1]}_Suffix.txt").exists()
Maxime HEBRARD
@mhebrard
Nov 02 2018 07:50
yes !! file().exists() sounds like what I want
Maxime Garcia
@MaxUlysse
Nov 02 2018 07:52
Hoping it'll work
Quick question
Why do you have this arr[0] and arr[1] ?
That does seem a little strange
Maxime HEBRARD
@mhebrard
Nov 02 2018 07:58
oh, it's because I want to keep my original name along with the files ...
full working script
params.out = "./test/"
Channel.from('A', 'B', 'C').set{chList}

process procFirst {
  tag "${name}"
  input: val name from chList
  output: set val(name), stdout into chFirstOut
  """
  printf "Prefix_${name}"
  """
}

process procSecond {
  tag "${arr[0]}"
  publishDir path: "${params.out}", mode: 'copy',
    saveAs: {filename -> "${arr[0]}/$filename"}
  input: val arr from chFirstOut
  when: !file("${params.out}/${arr[0]}/${arr[1]}_Suffix.txt").exists()
  output:
    file '*.txt'
    stdout into chSecondOut
  script:
    bool = !file("${params.out}/${arr[0]}/${arr[1]}_Suffix.txt").exists()
    """
    printf "Create file ${arr[0]}:${bool}"
    touch "${arr[1]}_Suffix.txt"
    """
}

chSecondOut.subscribe{ println it }
then if the end files don't exist, procSecond runs... if the files exist, procSecond doesn't run... and if only file A exists, then procSecond runs only for B and C :)
it was a bit puzzling but it works fine
Maxime Garcia
@MaxUlysse
Nov 02 2018 08:06
I'm happy I could be of help
Maxime HEBRARD
@mhebrard
Nov 02 2018 08:07
yes thanks :) ... writing a snippet flow helps a lot to sort things out ... instead of testing with real data >.<
crap, my use case is a bit more complex :( ... if the file exists I want to send it into a channel ... else I want to generate the file first and then send it to the channel ...
Maxime Garcia
@MaxUlysse
Nov 02 2018 08:36
I see
The way we managed that was with different entry points
we can begin our pipeline either with fastqs or with differently processed bams
and manage all processes with when statements
Maxime HEBRARD
@mhebrard
Nov 02 2018 08:41
sounds good ... I am looking at the choice operator on channels ...
like: list of names -> choice (if bam exists > send to counting step : else > read fastq and map > send to counting step)
Maxime Garcia
@MaxUlysse
Nov 02 2018 08:45
That would work
Maxime HEBRARD
@mhebrard
Nov 02 2018 08:45
because if the bam exists I still need to catch it for the next step
and then mix the output channels: chToCounting = chBamExists.mix(chBamCreated)
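A sketch of that routing with the (pre-DSL2) choice operator; process names, channel names, and the BAM path are illustrative:

```nextflow
// Target channels must exist before choice() can route into them
chBamExists    = Channel.create()
chNeedsMapping = Channel.create()

// Send each name to channel 0 if its BAM is already on disk, else to 1
Channel.from('A', 'B', 'C')
    .choice( chBamExists, chNeedsMapping ) { name ->
        file("${params.out}/${name}.bam").exists() ? 0 : 1
    }

// chNeedsMapping feeds the fastq-reading/mapping process; its output
// channel (say chBamCreated) is then merged back with the existing BAMs:
// chToCounting = chBamExists.mix(chBamCreated)
```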
Maxime HEBRARD
@mhebrard
Nov 02 2018 09:29
Question: can I check whether a param exists and throw an error if not, inside a process?
hmm, maybe by adding a when: params.arg and then throwing the error if the output channel isEmpty ...
Maxime Garcia
@MaxUlysse
Nov 02 2018 09:32
yes
if (!params.test) exit 1, "error message"
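A small sketch of that fail-fast check, placed at the top of the script so it runs before any process (the parameter name arg is hypothetical):

```nextflow
// Abort with a non-zero status before any process runs
// if a required parameter is missing
if( !params.arg ) exit 1, "Missing required parameter: --arg"
```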
Maxime HEBRARD
@mhebrard
Nov 02 2018 09:34
the if statement will work before the process, but not inside, right?
Maxime Garcia
@MaxUlysse
Nov 02 2018 09:37
right
micans
@micans
Nov 02 2018 10:39
Thanks @rsuchecki !
rfenouil
@rfenouil
Nov 02 2018 11:44

Hello, I am trying to generate sample names from file names by removing a suffix in a script block.

input:
    file alignResults    from    ch_alignResult_forFilter.collect()

script:
    """
    filterALign.R --inputFiles  $alignResults \
                  --sampleNames ${ alignResults.collect{ it.name-~'_aligned\\.tsv.*' }.join(' ') } 
    """

The file list comes from a collected channel, so I believe I'm dealing with a nextflow.util.BlankSeparatedList object.
Because alignResults.collect{} seems to return a plain list object, I had to mimic the BlankSeparatedList behaviour by calling join on the result, but I believe there is a much simpler way to achieve this. Can you help make it cleaner?
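For reference, a plain-Groovy sketch of that suffix stripping (the file names are illustrative): a BlankSeparatedList simply renders its items separated by blanks, so collect{...}.join(' ') as written above is already about as simple as it gets.

```groovy
// Strip the '_aligned.tsv*' suffix from each file name and join with
// spaces, mimicking how a BlankSeparatedList renders in a script block.
// String - Pattern removes the first regex match from the string.
def files = ['s1_aligned.tsv', 's2_aligned.tsv.gz']
def names = files.collect { it - ~/_aligned\.tsv.*/ }.join(' ')
assert names == 's1 s2'
```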

Riccardo Giannico
@giannicorik_twitter
Nov 02 2018 11:49

@rfenouil I had the same question and I adopted this amazing solution:

   Channel
        .fromFilePairs("${params.dir}/*.fastq.gz",size:-1) {file -> file.name.split(/_S\d+_L/)[0]}
        .ifEmpty {error "File ${params.dir} not parsed properly"}
        .set { samplelist }

   process mapping {
        input:
        set val(sample), file(reads) from samplelist

It works great even if you have multiple fastq files for the same sample

Martin Proks
@matq007
Nov 02 2018 12:37
has anyone had permission issues with singularity images? I've converted a docker image to a singularity image, and the tool requires write access to the conda env path, which is denied due to insufficient permissions. Does anyone have an idea how to work around this issue?
Martin Proks
@matq007
Nov 02 2018 12:52
Ok, so it's because the image is read-only. I've tried adding runOptions = "--writable" but that doesn't solve the issue; then I get ERROR : Unable to open squashfs image in read-write mode: Read-only file system. So what could be a workaround?
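One commonly suggested workaround in Singularity 2.x (a hedged sketch; the image names are illustrative): a squashfs image is immutable, so rebuild it as a writable sandbox directory and point the container setting at that directory instead.

```shell
# Rebuild the read-only squashfs image as a writable sandbox directory
singularity build --sandbox my_image_sandbox/ my_image.simg
# Then keep runOptions = "--writable" in nextflow.config and use the
# sandbox directory as the container, e.g.:
#   process.container = '/path/to/my_image_sandbox'
```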
Maxime Garcia
@MaxUlysse
Nov 02 2018 12:53
what is your singularity version?
Martin Proks
@matq007
Nov 02 2018 12:53
2.6.0-dist
I found this https://github.com/mirnylab/distiller-nf/blob/master/nextflow.config but this looks quite complicated ...
rfenouil
@rfenouil
Nov 02 2018 13:43
@giannicorik_twitter thank you for your answer, but I don't understand the strategy in your example. You create pairs of (samplename, filename) in your channel?
Or did you want to illustrate the use of split on filenames?
From what I understand of your example, the samplelist channel should only contain the parsed filenames; how do you keep the file paths in there?
Pierre Lindenbaum
@lindenb
Nov 02 2018 13:54

@pditommaso

weird, I don't see how it can be related to the custom executor

ok, I've done a little debugging on the standalone jar compiled with make pack. As far as I understand, the state of each process is never saved in .nextflow/history because the test if( !cli ) is always true when running from the jar: https://github.com/nextflow-io/nextflow/blob/master/src/main/groovy/nextflow/script/ScriptRunner.groovy#L398

void verifyAndTrackHistory(String cli, String name) {
(...)
        if( !cli )
return
(....)
}

Am I on the right path? How can I enable this when running from the jar file? (Reminder: I'm compiling a custom Executor and I can only use it 'offline'.)
Thanks

Paolo Di Tommaso
@pditommaso
Nov 02 2018 14:00
how are you launching your local build?
I mean, how do you run the NF you have compiled?
Pierre Lindenbaum
@lindenb
Nov 02 2018 14:02
make pack && java -jar build/libs/nextflow-0.33.0-SNAPSHOT-all.jar run -resume test02.nf
Paolo Di Tommaso
@pditommaso
Nov 02 2018 14:02
replace java -jar build/libs/nextflow-0.33.0-SNAPSHOT-all.jar with ./launch.sh
I guess that's the problem
Pierre Lindenbaum
@lindenb
Nov 02 2018 14:05
OK, with launch.sh. The second time I ran my workflow I got: "WARN: [splitLetters (run 2 seconds)] Unable to resume cached task -- See log file for details"
in the log:
[Actor Thread 3] WARN  nextflow.processor.TaskProcessor - [splitLetters (run 1 seconds)] Unable to resume cached task -- See log file for details
java.lang.RuntimeException: java.nio.channels.ClosedChannelException
    at com.google.common.base.Throwables.propagate(Throwables.java:240)
    at org.iq80.leveldb.table.Table.openBlock(Table.java:83)
    at org.iq80.leveldb.util.TableIterator.getNextBlock(TableIterator.java:102)
    at org.iq80.leveldb.util.TableIterator.seekInternal(TableIterator.java:57)
    at org.iq80.leveldb.util.TableIterator.seekInternal(TableIterator.java:26)
    at org.iq80.leveldb.util.AbstractSeekingIterator.seek(AbstractSeekingIterator.java:41)
    at org.iq80.leveldb.util.InternalTableIterator.seekInternal(InternalTableIterator.java:45)
    at org.iq80.leveldb.util.InternalTableIterator.seekInternal(InternalTableIterator.java:25)
    at org.iq80.leveldb.util.AbstractSeekingIterator.seek(AbstractSeekingIterator.java:41)
    at org.iq80.leveldb.impl.Level0.get(Level0.java:103)
    at org.iq80.leveldb.impl.Version.get(Version.java:169)
    at org.iq80.leveldb.impl.VersionSet.get(VersionSet.java:223)
    at org.iq80.leveldb.impl.DbImpl.get(DbImpl.java:616)
    at org.iq80.leveldb.impl.DbImpl.get(DbImpl.java:577)
    at nextflow.CacheDB.getTaskEntry(CacheDB.groovy:160)
    at nextflow.CacheDB$getTaskEntry.call(Unknown Source)
    at nextflow.processor.TaskProcessor.checkCachedOutput(TaskProcessor.groovy:814)
    at nextflow.processor.TaskProcessor$checkCachedOutput$2.callCurrent(Unknown Source)
    at org.codehaus.groovy.runtime.callsite.CallSiteArray.defaultCallCurrent(CallSiteArray.java:51)
    at org.codehaus.groovy.runtime.callsite.AbstractCallSite.callCurrent(AbstractCallSite.java:157)
    at org.codehaus.groovy.runtime.callsite.AbstractCallSite.callCurrent(AbstractCallSite.java:185)
    at nextflow.processor.TaskProcessor.checkCachedOrLaunchTask(TaskProcessor.groovy:699)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at org.codehaus.groovy.runtime.callsite.PlainObjectMetaMethodSite.doInvoke(PlainObjectMetaMethodSite.java:43)
    at org.codehaus.groovy.runtime.callsite.PogoMetaMethodSite$PogoCachedMethodSiteNoUnwrapNoCoerce.invoke(PogoMetaMethodSite.java:190)
    at org.codehaus.groovy.runtime.callsite.PogoMetaMethodSite.callCurrent(PogoMetaMethodSite.java:58)
    at org.codehaus.groovy.runtime.callsite.AbstractCallSite.callCurrent(AbstractCallSite.java:185)
    at nextflow.processor.TaskProcessor.invokeTask(TaskProcessor.groovy:540)
    at nextflow.processor.InvokeTaskAdapter.call(InvokeTaskAdapter.groovy:62)
    at groovyx.gpars.dataflow.operator.DataflowOperatorActor.startTask(DataflowOperatorActor.java:120)
    at groovyx.gpars.dataflow.operator.ForkingDataflowOperatorActor.access$001(ForkingDataflowOperatorActor.java:35)
    at groovyx.gpars.dataflow.operator.ForkingDataflowOperatorActor$1.run(ForkingDataflowOperatorActor.java:58)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)
Caused by: java.nio.channels.ClosedChannelException: null
    at sun.nio.ch.FileChannelImpl.ensureOpen(FileChannelImpl.java:110)
    at sun.nio.ch.FileChannelImpl.read(FileChannelImpl.java:721)
    at org.iq80.leveldb.table.FileChannelTable.read(FileChannelTable.java:96)
    at org.iq80.leveldb.table.FileChannelTable.readBlock(FileChannelTable.java:55)
    at org.iq80.leveldb.table.Table.openBlock(Table.java:80)
    ... 36 common frames omitted
Paolo Di Tommaso
@pditommaso
Nov 02 2018 14:06
wow ugly, never seen before
Pierre Lindenbaum
@lindenb
Nov 02 2018 14:06
(I'm running the local executor, not my custom one)
Riccardo Giannico
@giannicorik_twitter
Nov 02 2018 14:07

@rfenouil if you try out my example and then print the content of samplelist like this: samplelist.println(), you'll find it looks like this:

[Sample1, [/path/to/Sample1_S1_L001_R1_001.fastq.gz, /path/to/Sample1_S1_L001_R2_001.fastq.gz, /path/to/Sample1_S1_L001_R1_002.fastq.gz, /path/to/Sample1_S1_L001_R2_002.fastq.gz]]
[Sample2, [/path/to/Sample2_S1_L001_R1_001.fastq.gz, /path/to/Sample2_S1_L001_R2_001.fastq.gz, /path/to/Sample2_S1_L001_R1_002.fastq.gz, /path/to/Sample2_S1_L001_R2_002.fastq.gz]]

if in the input block you write set val(sample), file(reads) from samplelist,
you will have a 'mapping' process running for each sample in a different folder; in each folder you will have symlinks to that sample's fastq.gz files (and a "sample" variable containing the sample name).
Does that answer your question?

Paolo Di Tommaso
@pditommaso
Nov 02 2018 14:09
@lindenb delete the .nextflow directory in the launching dir and try it again
rfenouil
@rfenouil
Nov 02 2018 14:13
@giannicorik_twitter thank you, I understand the set val(sample), file(reads) from samplelist now.
I actually forgot that fromFilePairs creates tuples with the sample name followed by the list of paths... That's why I did not understand how you managed to get both the sample name and the file paths in your channel :)
got it now, thank you
Pierre Lindenbaum
@lindenb
Nov 02 2018 14:13
@pditommaso (same, running twice)
$ rm -rf .nextflow && ./launch.sh run -resume test02.nf  &&  ./launch.sh run -resume test02.nf 
N E X T F L O W  ~  version 0.33.0-SNAPSHOT
Launching `test02.nf` [hopeful_colden] - revision: 1d2b45ebb4
WARN: It appears you have never run this project before -- Option `-resume` is ignored
[warm up] executor > local
[d4/fe97af] Submitted process > splitLetters (run 1 seconds)
[94/6250dc] Submitted process > splitLetters (run 2 seconds)
[16/c2fcb2] Submitted process > splitLetters (run 3 seconds)
[74/420d46] Submitted process > p2
N E X T F L O W  ~  version 0.33.0-SNAPSHOT
Launching `test02.nf` [compassionate_turing] - revision: 1d2b45ebb4
[warm up] executor > local
WARN: [splitLetters (run 1 seconds)] Unable to resume cached task -- See log file for details
WARN: [splitLetters (run 2 seconds)] Unable to resume cached task -- See log file for details
WARN: [splitLetters (run 3 seconds)] Unable to resume cached task -- See log file for details
[1b/84536f] Submitted process > splitLetters (run 1 seconds)
[10/4dde12] Submitted process > splitLetters (run 2 seconds)
[9c/8838f6] Submitted process > splitLetters (run 3 seconds)
[b9/f58ea1] Submitted process > p2
Paolo Di Tommaso
@pditommaso
Nov 02 2018 14:15
are you launching it in a NFS like shared directory ?
Pierre Lindenbaum
@lindenb
Nov 02 2018 14:15
no , my local computer
Paolo Di Tommaso
@pditommaso
Nov 02 2018 14:16
what if you run nextflow run hello && nextflow run hello -resume ?
Pierre Lindenbaum
@lindenb
Nov 02 2018 14:18
@pditommaso
N E X T F L O W  ~  version 0.33.0-SNAPSHOT                 
Pulling nextflow-io/hello ...
 downloaded from https://github.com/nextflow-io/hello.git
Launching `nextflow-io/hello` [insane_franklin] - revision: c9b0ec7286 [master]
[warm up] executor > local
[5d/d375c4] Submitted process > sayHello (2)
[70/bb1113] Submitted process > sayHello (3)
[04/d841ef] Submitted process > sayHello (1)
Hello world!
Ciao world!
[df/b2f20c] Submitted process > sayHello (4)
Bonjour world!
Hola world!
N E X T F L O W  ~  version 0.33.0-SNAPSHOT
Launching `nextflow-io/hello` [magical_cori] - revision: c9b0ec7286 [master]
[warm up] executor > local
WARN: [sayHello (1)] Unable to resume cached task -- See log file for details
WARN: [sayHello (2)] Unable to resume cached task -- See log file for details
WARN: [sayHello (3)] Unable to resume cached task -- See log file for details
WARN: [sayHello (4)] Unable to resume cached task -- See log file for details
[3c/e52347] Submitted process > sayHello (1)
[94/870f44] Submitted process > sayHello (2)
[0d/946952] Submitted process > sayHello (3)
Bonjour world!
[19/507460] Submitted process > sayHello (4)
Ciao world!
Hello world!
Hola world!
Paolo Di Tommaso
@pditommaso
Nov 02 2018 14:19
well, I was asking for the stock nextflow, to be sure it's not a problem with your local build
Pierre Lindenbaum
@lindenb
Nov 02 2018 14:20
ok
N E X T F L O W  ~  version 0.31.1
Launching `nextflow-io/hello` [romantic_brattain] - revision: c9b0ec7286 [master]
[warm up] executor > local
[6c/d59f7e] Submitted process > sayHello (2)
[23/edab51] Submitted process > sayHello (1)
[c4/791eb3] Submitted process > sayHello (3)
Ciao world!
[d4/74e83b] Submitted process > sayHello (4)
Bonjour world!
Hola world!
Hello world!
N E X T F L O W  ~  version 0.31.1
Launching `nextflow-io/hello` [pensive_noyce] - revision: c9b0ec7286 [master]
[warm up] executor > local
[6c/d59f7e] Cached process > sayHello (2)
[23/edab51] Cached process > sayHello (1)
Bonjour world!
Ciao world!
[c4/791eb3] Cached process > sayHello (4)
Hola world!
[d4/74e83b] Cached process > sayHello (3)
Hello world!
Paolo Di Tommaso
@pditommaso
Nov 02 2018 14:21
we can conclude it's a problem in your build :)
Pierre Lindenbaum
@lindenb
Nov 02 2018 14:22
yep... I don't think I changed anything critical in the code when I wrote my custom engine... :-s
Paolo Di Tommaso
@pditommaso
Nov 02 2018 14:23
hard to say without having a look at the code ¯\_(ツ)_/¯
Pierre Lindenbaum
@lindenb
Nov 02 2018 14:24
:-)
chdem
@chdem
Nov 02 2018 14:50
Hi everyone !
Just a basic technical question
Is there a way to execute some code after all instances of a process have been run?
Paolo Di Tommaso
@pditommaso
Nov 02 2018 14:53
of a specific process ?
chdem
@chdem
Nov 02 2018 14:53
yeap
chdem
@chdem
Nov 02 2018 14:59
Ok
Ahah, that's what I'm already doing. It's practical, but a little complicated for something that is ultimately quite simple. Aren't you planning something nicer?
Or is this the definitive official method?
Paolo Di Tommaso
@pditommaso
Nov 02 2018 15:19
maybe a better method could be implemented; you may want to open a feature request for that
chdem
@chdem
Nov 02 2018 16:12
Sure !
Thanks @pditommaso ! :D
Paolo Di Tommaso
@pditommaso
Nov 02 2018 16:13
welcome :)
micans
@micans
Nov 02 2018 16:42
For this mock thing I assume you'd do a collect() on the channel if you want bar to run after all instances of foo have finished.
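A minimal sketch of that pattern (the process and file names are illustrative): collect() gathers all of foo's outputs into a single emission, so bar becomes one task that can only start once every foo instance has finished.

```nextflow
process foo {
  input: val x from Channel.from(1, 2, 3)
  output: file "out_${x}.txt" into chFoo
  """
  echo ${x} > out_${x}.txt
  """
}

process bar {
  // a single bar task, triggered only after all foo tasks complete
  input: file outs from chFoo.collect()
  """
  cat ${outs} > merged.txt
  """
}
```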
Paolo Di Tommaso
@pditommaso
Nov 02 2018 16:43
oh-oh, you are right!
chdem
@chdem
Nov 02 2018 16:45
I like this ! :D
micans
@micans
Nov 02 2018 16:46
:+1:
Paolo Di Tommaso
@pditommaso
Nov 02 2018 16:51
actually that would be needed if the first task were executed more than once
micans
@micans
Nov 02 2018 18:36
yes, it applies more to the @chdem use case than to the mock example .. have a good weekend!