These are chat archives for nextflow-io/nextflow

6th
Jul 2017
Paolo Di Tommaso
@pditommaso
Jul 06 2017 08:29
@aunderwo have you solved the installation problem ?
Evan Floden
@evanfloden
Jul 06 2017 09:20

Going back to what we were discussing the other day.

I require channel C to be the input of a process.

Channel C can originate from either Channel A or Channel B

Paolo Di Tommaso
@pditommaso
Jul 06 2017 09:21
then
Evan Floden
@evanfloden
Jul 06 2017 09:22
But .concat() does not work as Channel A or B could be empty.
So I cannot do:
ChannelB
  .concat(ChannelA)
  .set { ChannelC }
Paolo Di Tommaso
@pditommaso
Jul 06 2017 09:23
empty would be fine, but I guess you are not using that
you need to ensure that every channel produces at least the closing signal
Evan Floden
@evanfloden
Jul 06 2017 09:25
Maybe I don't understand empty properly, but I would think I must set either ChannelA or ChannelB to empty depending on the contents of the other channel?
I thought I tried this but will try again now.
Paolo Di Tommaso
@pditommaso
Jul 06 2017 09:26
when you do Channel.from(1,2,3,something)
a special STOP element is always added implicitly, which signals the channel's termination
it is this element that stops the process execution and ultimately the entire pipeline
if you create a channel with Channel.create() you are supposed to add this stop element manually with channel.close()
Evan Floden
@evanfloden
Jul 06 2017 09:28
Ah okay. That makes sense. I'll test it out
Paolo Di Tommaso
@pditommaso
Jul 06 2017 09:29
Instead Channel.empty() == Channel.create().close()
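
A minimal sketch of the point above, assuming the DSL1-style syntax used elsewhere in this conversation. The channel names follow Evan's example; the values 1, 2, 3 are placeholders. Because Channel.empty() is already closed, concat can finish once ChannelB has emitted its items:

```groovy
// Sketch only: the contents of ChannelB are placeholder values.
ChannelA = Channel.empty()          // emits nothing, but is already closed
ChannelB = Channel.from(1, 2, 3)    // the STOP element is appended implicitly

ChannelB
  .concat(ChannelA)
  .set { ChannelC }

// ChannelC carries the items of ChannelB and then closes,
// so anything consuming it can terminate normally.
ChannelC.subscribe { println it }
```

If ChannelA were instead built with Channel.create() and never closed, ChannelC would never receive its closing signal and the run would be expected to hang.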
Evan Floden
@evanfloden
Jul 06 2017 09:34
Thanks @pditommaso, worked a treat. Another topic for the best practices document methinks :wink:
Paolo Di Tommaso
@pditommaso
Jul 06 2017 09:35
:+1:
Nextflow, the book :)
Evan Floden
@evanfloden
Jul 06 2017 10:15

So my problem + solution is as follows:

Either the user provides --trees or not.

If not, they are generated in an optional process and output into a channel trees1. Then trees2 is set to empty.

If they are provided, trees1 is set to empty.

In either case, an empty channel is concatenated with the other into a channel treesForAlignment, which can be used for all downstream processes.

if ( params.trees ) {
  Channel
    .empty()
    .set { trees1 }
}   

if ( !params.trees ) {
  Channel
    .empty()
    .set { trees2 }
}   

trees1
  .concat ( trees2 )
  .set { treesForAlignment }
Seems to work well for now
Paolo Di Tommaso
@pditommaso
Jul 06 2017 10:50
hey man, they invented if(condition) { } else { } ;)
anyhow good, though I would like to add something to handle these cases in a better way
where's the final code ?
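
For illustration, the two separate if blocks could be folded into one if/else along these lines. This is only a sketch: the default for params.trees, the use of Channel.fromPath for user-provided trees, and the placeholder item standing in for the output of the optional tree-generating process are all assumptions, not Evan's final code.

```groovy
// Sketch only: how trees1 and trees2 get their real content is assumed.
params.trees = false

if ( params.trees ) {
  // user-provided trees: nothing is generated, so trees1 is an empty (closed) channel
  trees1 = Channel.empty()
  trees2 = Channel.fromPath(params.trees)
}
else {
  // no user trees: trees1 would come from the optional process;
  // a placeholder item stands in for that output here
  trees1 = Channel.from('generated_tree_placeholder')
  trees2 = Channel.empty()
}

trees1
  .concat( trees2 )
  .set { treesForAlignment }

treesForAlignment.subscribe { println it }
```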
Evan Floden
@evanfloden
Jul 06 2017 12:12

Next thing you will be telling me that all your fancy groovy closures are not obfuscations? :wink:

In reality I have several conditions. But yes, it would be great if conditionals + channels played nicer together.

Paolo Di Tommaso
@pditommaso
Jul 06 2017 12:13
ahhah
Brian Reichholf
@breichholf
Jul 06 2017 16:15
I have a new question: I'd like to do the following:
  • split up genome.fasta into chromosomes, mitochondria, rRNA (with some extra processing) in one process + collect output in one channel
  • in next process, build indexes
  • in third process (and this is the tricky part I don't know how to do): Consume the channel and align in a specific step-wise manner to filter out contaminants.
    Is there an easy way to achieve this?
Brian Reichholf
@breichholf
Jul 06 2017 16:21
I have:
process prepareRef {
  input:
  file genome from fasta

  output:
  file '*.fa' into splitGenome

  script:
  """
  # Extract mitochondrial > mito.fa
  # Extract rRNA > ribosome.fa
  # Put remainder > genome.fa
  """
}

process genomeIndexes {
  input:
  // flatten() emits each '*.fa' file as its own item instead of one list
  file part from splitGenome.flatten()

  output:
  file '*.ebwt' into indexes

  script:
  // strip the trailing '.fa' to get the index base name, e.g. 'mito'
  prefix = part.toString() - ~/(\.fa)?$/
  """
  bowtie-build $part $prefix
  """
}

process stepwiseAlign {
  input:
  ???

  output:
  file '*_aligned.bam' into alignments

  script:
  """
  # Align to ribosome
  # Align ribo_unaligned to mitochondria
  # Align mito_unaligned to genome
  """
}
Phil Ewels
@ewels
Jul 06 2017 17:18
If you save the prefix as well as the file name to the output channel when building the indexes, you'll know what they are
Then you can pull in all of the channel with .collect() for each input and access the three different indexes by their prefix val..
(..maybe)
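
One way Phil's suggestion might look, as an untested sketch in DSL1 syntax. It assumes splitGenome comes from prepareRef above, that the FASTA files are named ribosome.fa, mito.fa and genome.fa, and that reads arrive through a hypothetical params.reads parameter; none of this is confirmed by the pipeline in question.

```groovy
// Sketch only: params.reads and readChannel are hypothetical names.
readChannel = Channel.fromPath(params.reads)

process genomeIndexes {
  input:
  file part from splitGenome.flatten()     // one FASTA file per task

  output:
  // emit the prefix together with the index files it belongs to
  set val(prefix), file("${prefix}*.ebwt") into indexes

  script:
  prefix = part.baseName                   // e.g. 'mito' for mito.fa
  """
  bowtie-build $part $prefix
  """
}

process stepwiseAlign {
  input:
  file reads from readChannel
  // drop the prefix and gather every index file into a single item,
  // so all three indexes are staged in the task directory
  file idx from indexes.map { prefix, files -> files }.collect()

  output:
  file '*_aligned.bam' into alignments

  script:
  """
  # Align $reads to the 'ribosome' index
  # Align ribo_unaligned to the 'mito' index
  # Align mito_unaligned to the 'genome' index
  """
}
```

Since the index base names are fixed by the file names produced in prepareRef, the script can refer to them directly once all index files are staged together.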
Félix C. Morency
@fmorency
Jul 06 2017 17:56
@pditommaso do you have an eta for 0.25.3? :D
Brian Reichholf
@breichholf
Jul 06 2017 18:14
Hmmm.... I'll have to read the docs more thoroughly and write up a toy example. I'll probably have to define a channel from the list ['ribosome', 'mitochondria', 'genome_cleaned'] and then apply some .collect() in the stepwiseAlign process, perhaps?!
Brian Reichholf
@breichholf
Jul 06 2017 18:25

Does each let me iterate over a list of tuples, one tuple at a time? Then I could do something like:

process indexParts {
  input:
  set val(part), file(partFile) from splitGenome

  output:
  set val(part), file(partFile), file("${part}_index.*") into genomePartIndexes

  script:
  """
  bowtie-build $partFile ${part}_index
  """
}

Could I do something like:

process stepWiseAlign {
  input:
  file reads from ReadChannel
  each part, partFile, part_index from genomePartIndex

  output:
  file '*_aligned.bam' into alignments

  script:
  """
  # If read_part_unaligned exists: align this to part, produce file_part1_part2_unaligned
  # Finally: create bam
  """
}
Paolo Di Tommaso
@pditommaso
Jul 06 2017 18:32
no, that syntax is not supported
does stepWiseAlign have the same multiplicity as indexParts ?
(ie is executed as many times as indexParts)
Paolo Di Tommaso
@pditommaso
Jul 06 2017 18:41
@fmorency next week
Félix C. Morency
@fmorency
Jul 06 2017 18:41
\o/
Paolo Di Tommaso
@pditommaso
Jul 06 2017 18:41
:)
Félix C. Morency
@fmorency
Jul 06 2017 18:42
where's the 'Donate' button on the NF web page? :D
Paolo Di Tommaso
@pditommaso
Jul 06 2017 18:42
ehehe
Félix C. Morency
@fmorency
Jul 06 2017 18:42
Would love to give you and your team some beer money
Paolo Di Tommaso
@pditommaso
Jul 06 2017 18:43
I should put:
  • $ 1'000
  • $ 10'000
  • $ 100'000
click here ;)
Evan Floden
@evanfloden
Jul 06 2017 18:43
[Bitcoin phishing link]
Félix C. Morency
@fmorency
Jul 06 2017 18:43
Haha I unfortunately don't have that kind of beer money :P
Paolo Di Tommaso
@pditommaso
Jul 06 2017 18:45
you are still in time to join the NF event in September :sunglasses:
Evan Floden
@evanfloden
Jul 06 2017 18:46

Would love to give you and your team some beer money

Pretty much the best testimonial you can get Paolo!

Paolo Di Tommaso
@pditommaso
Jul 06 2017 18:47
ahaha
So far the winner is Christophe D:
I think of it since a moment and I never takes time to tell it to you but NextFlow completely changed our way of work here, at the hospital ! We are better in traceability, better in script organisation, better in development cycles...with Singularity usage and Slurm, It really revolutionized our way of working (while we work with Nextflow since hardly a few months)...We have some more of work but we progress faster than before!
Félix C. Morency
@fmorency
Jul 06 2017 18:49
Pffblblbl
Evan Floden
@evanfloden
Jul 06 2017 18:50
I guess it depends on if you value helping to improve human health or beer more... :hospital: or :beers: ??
Paolo Di Tommaso
@pditommaso
Jul 06 2017 18:52
I know that you second the latter :joy:
Félix C. Morency
@fmorency
Jul 06 2017 18:57
Thanks @skptic for the support :sparkles:
Mike Smoot
@mes5k
Jul 06 2017 19:25
Hi @pditommaso how can I tell if a process produces a dataflow variable instead of a channel? Because it just runs once?
Paolo Di Tommaso
@pditommaso
Jul 06 2017 19:27
no, only if all its inputs are dataflow variables; running only once is not sufficient
eg
Mike Smoot
@mes5k
Jul 06 2017 19:29
Ok, that makes sense. Does .toSortedList() convert a channel into a variable?
Or is there any other way to cast a channel to a variable?
Mike Smoot
@mes5k
Jul 06 2017 19:36
It appears as though .toSortedList() does create a variable, which is super helpful!
Evan Floden
@evanfloden
Jul 06 2017 20:06
Has anyone created, or is anyone using, a Vim syntax file for NF by any chance?
I have started using the groovy one here but would be nice to have the operators, processes and directives etc coloured up.
Félix C. Morency
@fmorency
Jul 06 2017 20:11
@skptic I was wondering exactly the same thing a few hours ago
Evan Floden
@evanfloden
Jul 06 2017 20:20
Great, I'll wait to see if @pditommaso has something already and do a simple modification of the groovy .vim file adding the operators as keywords. The rest is not too bad to be fair.
Paolo Di Tommaso
@pditommaso
Jul 06 2017 21:14
it would be great, just waiting for a willing hacker !
It appears as though .toSortedList() does create a variable, which is super helpful
yes, there are three classes of operators: 1) the ones that, given a queue channel, return another queue channel, 2) the ones returning a value channel (e.g. toList, collect, sum, etc.) and 3) the ones returning nothing (e.g. println, subscribe, etc.)
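
A small sketch of the second class, in DSL1 syntax with made-up values and a hypothetical process name. toSortedList turns a queue channel into a value channel, which a process can read again for every task instead of consuming it once:

```groovy
// Sketch only: values and the process name are placeholders.
Channel
  .from(3, 1, 2)
  .toSortedList()
  .set { sorted }            // value channel holding the sorted list

Channel
  .from('a', 'b', 'c')
  .set { letters }           // ordinary queue channel

process pair {
  input:
  val items from sorted      // read again by every task
  val letter from letters    // consumed one item per task

  exec:
  println "$letter -> $items"
}
```

The process is expected to run once per item in letters, each time seeing the same sorted list.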
Mike Smoot
@mes5k
Jul 06 2017 21:19
Thanks @pditommaso!
Paolo Di Tommaso
@pditommaso
Jul 06 2017 21:20
sorry, before I was about to miss the train .. :)
Mike Smoot
@mes5k
Jul 06 2017 21:39

I'm working on a branch of my pipeline, but when I push changes to the branch, it doesn't seem like the updated branch is getting pulled when nextflow runs. The logs seem to indicate that master gets pulled, but not the branches.

Jul-06 20:53:22.545 [main] DEBUG nextflow.cli.Launcher - $> /usr/local/bin/nextflow run http://git.l.synthgeno.global/SGI-Pipelines/nextflow_eukaryotic_annotation.git -resume -hub gitlab -r aws_changes -latest -params-file /mnt/efs/nextflow/run.2734c43a-d7cd-42ff-814b-fb5d487477bb/params.yaml -process.executor slurm
Jul-06 20:53:22.830 [main] INFO  nextflow.cli.CmdRun - N E X T F L O W  ~  version 0.25.2
Jul-06 20:53:23.452 [main] DEBUG nextflow.scm.AssetManager - Repository URL: http://git.l.synthgeno.global/SGI-Pipelines/nextflow_eukaryotic_annotation.git; Project: SGI-Pipelines/nextflow_eukaryotic_annotation; Hub provider: gitlab
Jul-06 20:53:23.517 [main] DEBUG nextflow.scm.AssetManager - Git config: /tools/nextflow/assets/SGI-Pipelines/nextflow_eukaryotic_annotation/.git/config; branch: master; remote: origin; url: http://git.l.synthgeno.global/SGI-Pipelines/nextflow_eukaryotic_annotation.git
Jul-06 20:53:23.518 [main] INFO  nextflow.cli.CmdRun - Pulling SGI-Pipelines/nextflow_eukaryotic_annotation ...
Jul-06 20:53:23.518 [main] DEBUG nextflow.scm.AssetManager - Pull pipeline SGI-Pipelines/nextflow_eukaryotic_annotation  -- Using local path: /tools/nextflow/assets/SGI-Pipelines/nextflow_eukaryotic_annotation
Jul-06 20:53:24.854 [main] INFO  nextflow.cli.CmdRun -  Fast-forward
Jul-06 20:53:25.015 [main] DEBUG nextflow.scm.AssetManager - Git config: /tools/nextflow/assets/SGI-Pipelines/nextflow_eukaryotic_annotation/.git/config; branch: master; remote: origin; url: http://git.l.synthgeno.global/SGI-Pipelines/nextflow_eukaryotic_annotation.git
Jul-06 20:53:25.016 [main] INFO  nextflow.cli.CmdRun - Launching `SGI-Pipelines/nextflow_eukaryotic_annotation` [fabulous_cuvier] - revision: cc9d753696 [aws_changes]

Am I reading that right?

Paolo Di Tommaso
@pditommaso
Jul 06 2017 21:42
the branch is aws_changes ?
Mike Smoot
@mes5k
Jul 06 2017 21:44
yes
Paolo Di Tommaso
@pditommaso
Jul 06 2017 21:45
it sounds like a bug :/
open an issue
Mike Smoot
@mes5k
Jul 06 2017 21:47
Yeah, I think there needs to be setRemoteBranchName somewhere after this line. I guess the default for pull command is master.
Paolo Di Tommaso
@pditommaso
Jul 06 2017 21:47
you may try a patch :)
Mike Smoot
@mes5k
Jul 06 2017 21:49
Will see what I can do! :)