These are chat archives for nextflow-io/nextflow

6th
Mar 2019
teoKusa!
@teoKusa_twitter
Mar 06 08:17
Hi everyone, I have a simple question. I'm using the channel factory fromPath to generate some channels that need to be mixed and combined later.
I'm keeping very similar file names in all the directories, so I expect the files to show up in the channels in the same order
however that doesn't seem to be the case when I print them - any clue on the order with which the files are added to a channel using glob?
so something like fromPath("/bla/bla/bla/*.fasta")
Luca Cozzuto
@lucacozzuto
Mar 06 08:41
@tobsecret wonderful!
Tim Dudgeon
@tdudgeon
Mar 06 09:14
I'm trying to use the nextflow run -C my.config option (upper case C) to specify to only use my.config for configuration but am getting:
Unknown option: -C -- Check the available commands and options and syntax with 'help'
Is the tip found in the docs incorrect?
https://www.nextflow.io/docs/latest/config.html
Kevin Sayers
@KevinSayers
Mar 06 09:17
@tdudgeon nextflow -C my.config run .... should work
Tim Dudgeon
@tdudgeon
Mar 06 09:19
OK, so the order is important!
KochTobi
@KochTobi
Mar 06 09:45
For some arguments it is, for others it isn't. AFAIK some arguments, like -resume, can appear anywhere after your workflow, while others, like -log logfile or -C my.conf, need to come before the run command. I couldn't find anything on this in the documentation. Is there a section listing all command-line parameters and their order that I might have missed?
Maxime Garcia
@MaxUlysse
Mar 06 10:19
When it's not in the docs, create the docs
Kevin Sayers
@KevinSayers
Mar 06 10:32
@KochTobi There is some work being done on the arguments under the cli-docs branch on github. https://github.com/nextflow-io/nextflow/blob/cli-docs/docs/cli.rst
KochTobi
@KochTobi
Mar 06 11:26
@KevinSayers Ah thanks!!
KochTobi
@KochTobi
Mar 06 14:40
Hi there, I just encountered another problem. I want to run a workflow without containers but with Conda. I have a config file (https://github.com/KochTobi/Sarek/blob/master/conf/conda.config) where I set process.conda. This should disable containerization, correct? Sadly, for me it doesn't: it still pulls the Singularity image. Any idea what I did wrong or misunderstood?
Paolo Di Tommaso
@pditommaso
Mar 06 14:40
not at all
KochTobi
@KochTobi
Mar 06 14:41
So containerization is not affected by process.conda?
Paolo Di Tommaso
@pditommaso
Mar 06 14:41
you still need to specify docker.enabled = false
provided it was enabled somewhere else
KochTobi
@KochTobi
Mar 06 14:43
I don't want any containerization at all, only the Conda environment. So I set docker.enabled = false and use process.conda = environment.yml, and this should do the trick?
Paolo Di Tommaso
@pditommaso
Mar 06 14:43
by default there's no containerisation, therefore ignore it
process.conda takes the Conda environment file, not a boolean
(and implicitly enables Conda)
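Putting that advice together, a minimal config sketch might look like the following. The file name environment.yml is a hypothetical placeholder, and the two enabled = false lines only matter if a container engine was switched on elsewhere, for example by another profile.

```groovy
// conda.config (sketch): run processes in a Conda environment, no containers

// process.conda takes the path to a Conda environment file (hypothetical name)
// and, by being set, enables Conda for the processes
process.conda = "$baseDir/environment.yml"

// Containers are off by default; these lines only undo settings
// that some other config file or profile may have applied
docker.enabled = false
singularity.enabled = false
```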
KochTobi
@KochTobi
Mar 06 14:45
Ok thanks I will give it a try :+1:
Paolo Di Tommaso
@pditommaso
Mar 06 14:45
:ok_hand:
teoKusa!
@teoKusa_twitter
Mar 06 14:47
Hi everyone, I have a simple question. I'm using the channel factory fromPath to generate some channels that need to be mixed and combined later.
whoops, sorry
well - my question is above anyway :D
Paolo Di Tommaso
@pditommaso
Mar 06 14:47
that's an assertion not a question :D
teoKusa!
@teoKusa_twitter
Mar 06 14:53
sorry, I'll copy paste
@teoKusa_twitter
Hi everyone, I have a simple question. I'm using the channel factory fromPath to generate some channels that need to be mixed and combined later.
I'm keeping very similar file names in all the directories, so I expect the files to show up in the channels in the same order
however that doesn't seem to be the case when I print them - any clue on the order with which the files are added to a channel using glob?
so something like fromPath("/bla/bla/bla/*.fasta")
Paolo Di Tommaso
@pditommaso
Mar 06 14:54
order is not guaranteed
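Since the glob order is not guaranteed, one way to combine files from several directories is to key each file by its base name and pair them with the join operator, rather than relying on emission order. This is a sketch, not something proposed in the thread; the directory paths are hypothetical placeholders.

```groovy
// Hypothetical input directories; replace with your own paths
ch_a = Channel.fromPath('/path/to/dir_a/*.fasta')
    .map { f -> [ f.baseName, f ] }   // key each file by its name without extension

ch_b = Channel.fromPath('/path/to/dir_b/*.fasta')
    .map { f -> [ f.baseName, f ] }

// join matches items on their first element (the key), regardless of arrival order,
// and emits [ key, file_from_a, file_from_b ]
ch_a.join(ch_b).view()
```

By default join only emits keys present in both channels; files without a partner are dropped unless remainder: true is passed.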
Lukas Jelonek
@lukasjelonek
Mar 06 14:55
Hi, is there a way to display the configuration of k8s jobs? I have a problem using Docker images from a private Docker registry. At the moment I always get credential problems.
Paolo Di Tommaso
@pditommaso
Mar 06 14:55
put in the config
    k8s {
      debug.yaml = true
    }
and debug the generated pod spec
Lukas Jelonek
@lukasjelonek
Mar 06 14:56
Oh, so easy. Great. I'll try it
Lukas Jelonek
@lukasjelonek
Mar 06 15:04

As suspected the imagePullSecret is ignored. Here is my example:

process test {
        pod ([[imagePullSecret: "my-registry"], [env: 'TEST', value: 'HOHO'], [label: 'K', value:'V']])
        container 'my-registry-server:5003/test/test:latest'
        executor 'k8s'

        script:
        """
        ls -la
        """

}

and it produces the following yaml:

apiVersion: v1
kind: Pod
metadata:
  name: nf-1ab23b29aeca59948b5286dc4fa60bfc
  namespace: test-namespace
  labels: {app: nextflow, runName: exotic_payne, taskName: test, processName: test,
    sessionId: uuid-2c0517f4-01f4-4316-83a2-2c54ccd61e43}
spec:
  restartPolicy: Never
  containers:
  - name: nf-1ab23b29aeca59948b5286dc4fa60bfc
    image: my-registry-server:5003/test/test:latest
    command: [/bin/bash, -uex, .command.run]
    workingDir: /workspace/ubuntu/work/1a/b23b29aeca59948b5286dc4fa60bfc
    env:
    - {name: TEST, value: HOHO}
    - {name: NXF_OWNER, value: '0:0'}
    volumeMounts:
    - {name: vol-1, mountPath: /workspace}
  volumes:
  - name: vol-1
    persistentVolumeClaim: {claimName: test-volume}

It is missing imagePullSecrets, and I can't see the label in the YAML either.

Chelsea Sawyer
@csawye01
Mar 06 15:04
I am using the 'when' directive for some of my processes to check that a channel is not empty before running the process, but it keeps giving me the error "No signature of method: groovyx.gpars.dataflow.DataflowQueue.isEmpty() is applicable for argument types: () values: []
Possible solutions: ifEmpty(java.lang.Object), set(groovy.lang.Closure), identity(groovy.lang.Closure), isBound(), dump(), groupBy()". My code:
when:
!some_channel.isEmpty()
Is there a way of doing this differently, or is there just a small glaring error I am missing?
Lukas Jelonek
@lukasjelonek
Mar 06 15:05
Or maybe I missed something?
KochTobi
@KochTobi
Mar 06 15:19
@pditommaso thanks again. I used multiple profiles and one of them loaded singularity. Disabling docker in my conda profile worked like a charm.
Lukas Jelonek
@lukasjelonek
Mar 06 15:22
@csawye01 there doesn't seem to be an isEmpty operator on channels. I would try duplicating the channel with the into operator and turning one copy into a list with the toList or collect operator for the emptiness check. The other copy can then be used wherever you like.
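A sketch of that suggestion, with hypothetical channel names. It borrows the collect().ifEmpty([]) idiom so that exactly one value, the emptiness flag, is always emitted.

```groovy
// Duplicate the channel: one copy for the check, one for the real work
some_channel.into { ch_for_check; ch_for_work }

// collect() gathers every item into a single list but emits nothing on an
// empty channel, so ifEmpty([]) substitutes an empty list in that case
ch_is_empty = ch_for_check
    .collect()
    .ifEmpty([])
    .map { items -> items.isEmpty() }

ch_is_empty.view { empty -> "some_channel is empty: ${empty}" }
```

ch_for_work is left untouched here, for use as a normal input downstream.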
teoKusa!
@teoKusa_twitter
Mar 06 15:23
Oh, ok @pditommaso, thanks, that's good to know. But I guess I can order them using orderedLists? Can I do that with files too?
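If a single, deterministically ordered emission is acceptable, one option (a sketch, assuming sorting by file name is what you want) is the toSortedList operator with a comparator closure; the glob is the placeholder path from the question.

```groovy
// Gather all matching files and emit them as ONE list sorted by file name
Channel.fromPath('/bla/bla/bla/*.fasta')
    .toSortedList { a, b -> a.name <=> b.name }
    .view()
```

Note that this emits a single list rather than one item per file, so downstream steps receive all files at once.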
Kevin Sayers
@KevinSayers
Mar 06 15:35
@lukasjelonek I came across the issue with the imagePullSecrets too
Lukas Jelonek
@lukasjelonek
Mar 06 15:36
Could you solve it?
Kevin Sayers
@KevinSayers
Mar 06 15:38
Unfortunately not yet
micans
@micans
Mar 06 15:43

@csawye01 You don't need when to check if a channel is empty; if a channel is empty then nothing will reach the target process. These idioms can be useful for channel creation (I got them from nf-core/atacseq):

ch_salmon_index = params.run_salmon
    ? Channel.fromPath(params.salmon_index)
       .ifEmpty { exit 1, "Salmon index not found: ${params.salmon_index}" }
    : Channel.empty()

and for collecting from a channel that may be empty:

file (fastqc:'fastqc/*') from ch_multiqc_fastqc.collect().ifEmpty([])
Lukas Jelonek
@lukasjelonek
Mar 06 15:48
I checked all options for the pod directive and the following are not included in the generated yaml file: imagePullSecret, imagePullPolicy and label
Lukas Jelonek
@lukasjelonek
Mar 06 16:06
I think I found a possible reason for the missing values in the YAML. They are not copied in the plus method of PodOptions, and I suppose that method is called at some point during YAML creation.
Chelsea Sawyer
@csawye01
Mar 06 16:09
@micans so if I want to make optional processes that will only run when a specific channel is empty and won't run if the channel contains a value I don't have to put a when directive and that won't muck anything up downstream?
micans
@micans
Mar 06 16:09
oh uh sorry. that sounds the other way round. You want it to run if a channel is empty!
makes sense, the way I read it I'm sure you know already.
I'm confused now ...
Chelsea Sawyer
@csawye01
Mar 06 16:23
@micans Some processes are only run if a failure is detected. I used the choice operator to send the results to one of two channels: option1 should go only to the extra processes, and option2 to the processes further downstream. I hope that's a bit clearer, but let me know if not.
option1 = Channel.create()
option2 = Channel.create()
// items whose first element starts with "fail" go to option1 (index 0), all others to option2 (index 1)
option_check.choice( option1, option2 ) { a -> a[0] =~ /^fail.*/ ? 0 : 1 }
process fail_option {
       input:
       val item from option1
       script:
       """
       echo "do a thing for option1 but not option2: ${item}"
       """
}
Lukas Jelonek
@lukasjelonek
Mar 06 16:36
@KevinSayers I fixed the problem. There is a bug in Nextflow that loses the information in these three fields. @pditommaso I will create a pull request for the fix.
Kevin Sayers
@KevinSayers
Mar 06 16:41
@lukasjelonek great!
Tobias "Tobi" Schraink
@tobsecret
Mar 06 16:53
@csawye01 I may be misunderstanding this but I think isEmpty is only available for lists. Would ifEmpty work?
Something like:
process fail_option {
    input:
    val item from some_channel
    when:
    some_channel.ifEmpty(true)
    script:
    """script goes here"""
}
Chelsea Sawyer
@csawye01
Mar 06 17:04
@tobsecret I just tried it and got the error "some_channel has been used as an input by more than a process or an operator"
Lukas Jelonek
@lukasjelonek
Mar 06 17:07
@csawye01 You have to duplicate the channel in advance
some_channel.into{some_channel_1; some_channel_2;}
process fail_option {
    input:
    val item from some_channel_1
    when:
    some_channel_2.ifEmpty(true)
    script:
    """script goes here"""
}
Tobias "Tobi" Schraink
@tobsecret
Mar 06 17:10
Oooh, right - my bad! yes, with the duplication trick it should work!
Tobias "Tobi" Schraink
@tobsecret
Mar 06 17:24
Actually, we can simplify the whole thing.
Tobias "Tobi" Schraink
@tobsecret
Mar 06 17:40
brb, cooking up an MCVE
lastwon1216
@lastwon1216
Mar 06 19:33
Hello, I am trying to process a batch of many single-end FASTQ files with a Nextflow script, but it does not produce a separate output file for each FASTQ file. Can someone help me with this issue, please? I looked at the documentation and used the following channel:
params.reads="$PWD/*.fastq.gz"
Channel.fromPath( params.reads )
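For reference, a minimal sketch of the one-output-per-file pattern: every item emitted by the fromPath channel triggers its own task. The process name count_lines, the line-counting command and the relative glob are hypothetical placeholders, not taken from the original script.

```groovy
// Hypothetical glob, resolved relative to the launch directory
params.reads = "*.fastq.gz"

// fromPath emits one item per matching file
reads_ch = Channel.fromPath( params.reads )

// Each item received from reads_ch runs as a separate task,
// so every FASTQ file produces its own output file
process count_lines {
    input:
    file reads from reads_ch

    output:
    file "${reads.simpleName}.lines.txt" into counts_ch

    script:
    """
    zcat ${reads} | wc -l > ${reads.simpleName}.lines.txt
    """
}

counts_ch.view()
```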
Tobias "Tobi" Schraink
@tobsecret
Mar 06 19:37

So it seems when: doesn't do what we want here, because it is evaluated for every single item. We can use conditional execution instead:

channel1 = Channel.empty()
process fail_option{
    input:
    val input from channel1.ifEmpty(false)
    exec:
    if( !input ) {
        println "Channel is empty!"
        }
}

The downside is that this still submits a job for each item in channel1 when channel1 is not empty.

Tobias "Tobi" Schraink
@tobsecret
Mar 06 20:17
@csawye01
The following actually mitigates that issue. It still submits a job even if channel1 is not empty, but only one job.
It fails if channel1 is not empty and the first value it emits is false:
channel1 = Channel.empty()
//channel1 = Channel.from(1,2)
process fail_option{
    input:
    val input from channel1.ifEmpty(false).first()
    exec:
    if( !input ) {
        println "Channel is empty!"
        }
}
This should print Channel is empty! if given channel1 = Channel.empty() and should print nothing (but still submit a job) if given channel1 = Channel.from(1,2)
micans
@micans
Mar 06 21:46
@csawye01 when you write "Some processes are only run if a failure is detected", I am curious how you do it. We do something like that here: https://github.com/cellgeni/rnaseq/blob/master/main.nf#L242-L261, triggering different output channels by using optional true, with the script section creating different outputs on success or failure. I am interested in other solutions; a drawback of this one is that transient errors are retried under -resume: this way we get a neat notification of the failure, but we cannot resume. Another thing that came to mind is that the until channel operator can be handy.
micans
@micans
Mar 06 22:02
(I should have written "not retried": we can resume, but the failed process will not be retried)
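A generic sketch of the optional-output pattern described above (not the cellgeni code itself): the script catches the failure itself and writes one of two marker files, each routed to its own channel. The command some_command and the sample names are hypothetical. Because the task always exits successfully, -resume treats it as complete, which matches the caveat that the failed step is not retried.

```groovy
// Hypothetical input: sample names
samples_ch = Channel.from('sampleA', 'sampleB')

process run_step {
    input:
    val sample from samples_ch

    // Only one of the two files is created per task, so both are optional
    output:
    file("${sample}.ok") optional true into success_ch
    file("${sample}.failed") optional true into failure_ch

    script:
    // 'some_command' is a placeholder for the real tool; its failure inside
    // the if-test does not abort the script, so the task itself succeeds
    """
    if some_command ${sample}; then
        touch ${sample}.ok
    else
        touch ${sample}.failed
    fi
    """
}

success_ch.view { "succeeded: $it" }
failure_ch.view { "failed: $it" }
```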
micans
@micans
Mar 06 23:28
@pditommaso to me mixing channels is a bit of boilerplate the way I've used it. It would be natural to me to simply specify the same output channel twice, but this is not allowed. Is the reason to avoid errors, or is it difficult to decide when the mixing should happen (I'd assume just before it is used as an input channel)?
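For context, a sketch of the boilerplate being described, with hypothetical process and channel names: since the same output channel cannot be declared in two processes, each process gets its own channel and the two are merged with mix before being consumed.

```groovy
process step_a {
    output:
    file 'a.txt' into ch_a

    script:
    """
    echo a > a.txt
    """
}

process step_b {
    output:
    file 'b.txt' into ch_b

    script:
    """
    echo b > b.txt
    """
}

// Declaring 'into ch_results' in both processes is not allowed, so the
// two output channels are merged explicitly before use downstream
ch_results = ch_a.mix(ch_b)
ch_results.view()
```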