These are chat archives for nextflow-io/nextflow

17th
Oct 2018
Rad Suchecki
@rsuchecki
Oct 17 2018 00:04
Make sure bams_for_import is declared as an output in the process @bwlang
Paolo Di Tommaso
@pditommaso
Oct 17 2018 04:26
exactly
@davisem run NF from a directory where it can create such a lock and specify the shared file system dir with the -w opt
Karin Lagesen
@karinlag
Oct 17 2018 08:29
ok, question:
My users all have umask set to 0002
however, when nextflow runs spades, their tmp files do not get those permissions
I am kind of assuming this is a spades issue, and not a nextflow problem, but thought I'd swing this by you anyhow
Luca Cozzuto
@lucacozzuto
Oct 17 2018 08:48
Dear all, when using watchPath and join operators the pipeline gets stuck, while using fromPath it moves further... I am wondering if the join operator is missing something
Eric Davis
@davisem
Oct 17 2018 08:48
@pditommaso This is how I am running. Usually I see 1-5 out of ~200 nextflow processes fail with this error, and it's not deterministic. I notice that only 1 history file is created (working dir from which the job array was submitted) despite the -w command. Too many nextflow processes trying to lock the same file?
Paolo Di Tommaso
@pditommaso
Oct 17 2018 10:05
@karinlag are you using docker ?
@lucacozzuto provide test case / example
@davisem ideally you should use separate launch dirs
Luca Cozzuto
@lucacozzuto
Oct 17 2018 10:56
dear @pditommaso I'm trying to use join with remainder: true when using watchPath and it ignores the remainder
Luca Cozzuto
@lucacozzuto
Oct 17 2018 11:24
if you launch it and then move the files in the input folder (named aaa) to another folder and back you will activate the watchPath
with fromPath everything is fine
LukeGoodsell
@LukeGoodsell
Oct 17 2018 12:36
Hello. Is there an easy way to launch a daemon as part of a Nextflow workflow? E.g., starting a run-specific BLAT server that is used by other processes and then closed when the workflow finishes?
Brad Langhorst
@bwlang
Oct 17 2018 13:03

it seems odd to me that I need to make a channel like this:

file('bams_for_import') into bams_for_import

to be able to do

publishDir "${outputDir}", mode: 'copy', pattern: 'bams_for_import'

why is that?

I don't consume that channel anywhere else.
Paolo Di Tommaso
@pditommaso
Oct 17 2018 13:13
the rationale is that publishDir copies files declared as outputs, not others
though you can omit the into bams_for_import part
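A minimal sketch of that rule (process and file names are hypothetical): publishDir only considers files matched by an output: declaration, and the into clause can be dropped when the channel is not consumed elsewhere.

```nextflow
process export_bams {
    // publishDir only acts on files declared in the output block
    publishDir "${params.outputDir}", mode: 'copy', pattern: 'bams_for_import'

    output:
    // declared as an output so publishDir can see it;
    // no `into` clause is needed if nothing downstream consumes it
    file('bams_for_import')

    shell:
    '''
    mkdir bams_for_import
    touch bams_for_import/sample.md.bam
    '''
}
```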
@LukeGoodsell what do you mean ?
Paolo Di Tommaso
@pditommaso
Oct 17 2018 13:25

@lucacozzuto

if you launch it and then move the files in the input folder (named aaa) to another folder and back you will activate the watchPath

aaaaaand so ?
Luca Cozzuto
@lucacozzuto
Oct 17 2018 13:27
you see that there is no NULL in the results
as you would expect when replacing watchPath with fromPath
Paolo Di Tommaso
@pditommaso
Oct 17 2018 13:33
that's correct, because watchPath never ends, therefore the remainder can never be produced
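A bounded sketch of why fromPath behaves differently (channel contents are hypothetical): remainder: true can only emit the unmatched items once both source channels have completed, which never happens when one side is a watchPath.

```nextflow
left  = Channel.from( ['a', 1], ['b', 2] )
right = Channel.from( ['a', 10] )

// both channels are bounded, so once they complete the join emits
// the unmatched 'b' tuple padded with null (the remainder)
left.join( right, remainder: true ).println()

// with an unbounded source such as Channel.watchPath(...), the join
// keeps waiting for a possible future match, so no remainder appears
```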
Luca Cozzuto
@lucacozzuto
Oct 17 2018 13:34
mmm
Luca Cozzuto
@lucacozzuto
Oct 17 2018 13:54
it would be nice to have a warning in some operators when using watchPath... btw thanks for answering :)
Paolo Di Tommaso
@pditommaso
Oct 17 2018 13:55
:sleeping::sleeping:
welcome :)
Luca Cozzuto
@lucacozzuto
Oct 17 2018 13:57
@pditommaso if you need :coffee: you know where we are... :)
Brad Langhorst
@bwlang
Oct 17 2018 13:57
thanks @pditommaso … with this in place I do not see a directory called bams_for_import containing the ../.bam symlinks in the output directory… Do I need to do some globbing e.g. file('bams_for_import/')
bams_for_import is properly created in the output folder: https://gist.github.com/bwlang/bfd2d8cecc77f1daab40484cc392e52d
Paolo Di Tommaso
@pditommaso
Oct 17 2018 14:01
nope, make sure to use file('bams_for_import') not file('bams_for_import/')
Brad Langhorst
@bwlang
Oct 17 2018 14:01
hmm - I'll try again.
Luca Cozzuto
@lucacozzuto
Oct 17 2018 14:05
@pditommaso btw not a warning in the execution but in the documentation :)
Paolo Di Tommaso
@pditommaso
Oct 17 2018 14:06
ohh
maybe after some litres of coffee :)
Eric Davis
@davisem
Oct 17 2018 14:07
thanks @pditommaso !
k-hench
@k-hench
Oct 17 2018 14:10

hi everyone,
I'm having trouble accomplishing a task that should be fairly simple:
I want to cross several channels of model parameters and then use them within a process script.
Can someone point me to what I'm doing wrong here (neither of the two versions worked)?

Version 1:

#!/usr/bin/env nextflow

/* model parameters */
extend = Channel.from( 5000, 1000 )
recom = Channel.from( 50, 10 )
ne = Channel.from( 10000 )

params =  extend.combine( recom ).combine( ne ).map{ row -> [ ext:row[0], rec:row[1], ne:row[2]] }

process msms {
  echo true

  input:
  val( x ) from params

  script:
  """
  echo "e : ${x.ext}"
  """
}

this gives the error:

WARN: Access to undefined parameter ext -- Initialise it to a default value eg. params.ext = some_value

Version two:

process msms {
  echo true

  input:
  set val( x ), val( y ), val ( z ) from params

  script:
  """
  echo "e : ${x}"
  """
}

this gives the error:

"WARN: Input tuple does not match input set cardinality declared by process msms -- offending value: [:]"

yet, the preparation itself seems to be fine:

extend = Channel.from( 5000, 1000 )
recom = Channel.from( 50, 10 )
ne = Channel.from( 10000 )

extend.combine( recom ).combine( ne ).map{ row -> [ ext:row[0], rec:row[1], ne:row[2]] }.println()

-> this returns

[ext:5000, rec:50, ne:10000]

[ext:1000, rec:50, ne:10000]

[ext:5000, rec:10, ne:10000]

[ext:1000, rec:10, ne:10000]

I'm quite confused by this since a flavour of "version 1" has worked before (in other scripts) - so any help would be much appreciated :)

Luca Cozzuto
@lucacozzuto
Oct 17 2018 14:17
don't use params
I think it's reserved
#!/usr/bin/env nextflow

/* model parameters */
extend = Channel.from( 5000, 1000 )
recom = Channel.from( 50, 10 )
ne = Channel.from( 10000 )

pars = extend.combine( recom ).combine( ne ).map{ row -> [ ext:row[0], rec:row[1], ne:row[2]] }

process msms {
  echo true

  input:
  val( x ) from pars

  script:
  """
  echo "e : ${x.ext}"
  """
}
N E X T F L O W  ~  version 0.30.2
Launching `main.nf` [maniac_pike] - revision: ab7b3f2f6b
[warm up] executor > local
[1d/c853b4] Submitted process > msms (1)
[5f/60841f] Submitted process > msms (3)
[61/71eb63] Submitted process > msms (2)
[5e/47fb5e] Submitted process > msms (4)
e : 5000
e : 5000
e : 1000
e : 1000
k-hench
@k-hench
Oct 17 2018 14:19
you're right - thanks for the reminder - with 'pars' it works perfectly - thanks :D
Brad Langhorst
@bwlang
Oct 17 2018 14:56

hmm

    publishDir "${outputDir}", mode: 'copy', pattern: '*.{md.bam}*'
    publishDir "${outputDir}", mode: 'copy', pattern: 'bams_for_import'

    input:
        set val(library), file(libraryBam) from aligned_files.groupTuple()

    output:
        ...
        file('bams_for_import')

    shell:
    '''
       ...
       mkdir bams_for_import && pushd bams_for_import && ln -s ../!{library}.md.bam . && popd

does make the output folder but does not copy the links… instead it copies the entire data again, and only for one file. Have I misunderstood something?

Paolo Di Tommaso
@pditommaso
Oct 17 2018 14:58
let me check
pushd and popd are just fancy versions of cd foo and cd - ?
micans
@micans
Oct 17 2018 15:01
What does the exclamation mark do? ../!{library}.md.bam
Paolo Di Tommaso
@pditommaso
Oct 17 2018 15:01
we already have enough problems .. :smile:
micans
@micans
Oct 17 2018 15:02
bc-2,TIC-misc/tic-test, echo ../!{asdfkj}
bash: !{asdfkj}: event not found
@bwlang I've tried this
process foo {
   publishDir "results", mode: 'copy', pattern: 'bams_for_import'

    output:
    file('bams_for_import')

    shell:
    '''
       touch foo.md.bam
       mkdir bams_for_import 
       cd bams_for_import 
       ln -s ../foo.md.bam 
    '''
}
outputs
$ tree results/
results/
└── bams_for_import
    └── foo.md.bam
:wave:
micans
@micans
Oct 17 2018 15:04
:mouse: squeak, sorry, thanks
Brad Langhorst
@bwlang
Oct 17 2018 15:04
is bams_for_import/foo.bam a link or does it contain the data?
Paolo Di Tommaso
@pditommaso
Oct 17 2018 15:05
that's exactly what this code does
       touch foo.md.bam
       mkdir bams_for_import 
       cd bams_for_import 
       ln -s ../foo.md.bam
ah, you mean the result
Brad Langhorst
@bwlang
Oct 17 2018 15:06
in my hands the file in the working directory is a link, but the output directory has the data
twice
Paolo Di Tommaso
@pditommaso
Oct 17 2018 15:07
I got a concrete file
but why don't you just move that file under bams_for_import ?
I mean in the task, why are you using a symlink ?
Brad Langhorst
@bwlang
Oct 17 2018 15:08
we have this odd system where the next stage in the system deletes the bam files as they are moved, which is not easy to change. So this is a workaround to avoid losing the bams when they are sucked up downstream.
I’ll play a bit more with copyNoFollow
Paolo Di Tommaso
@pditommaso
Oct 17 2018 15:09
use a hardlink instead!
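A minimal shell sketch of the suggestion (paths are hypothetical): unlike a symlink, a hard link is a second name for the same inode, so the data survives removal of the original entry. Note that hard links only work within a single filesystem.

```shell
mkdir -p work outdir/bams_for_import
echo "bam data" > work/sample.md.bam

# hard link (no -s): both names refer to the same inode
ln work/sample.md.bam outdir/bams_for_import/sample.md.bam

# removing the original name does not delete the data
rm work/sample.md.bam
cat outdir/bams_for_import/sample.md.bam   # still prints: bam data
```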
Brad Langhorst
@bwlang
Oct 17 2018 15:10
this does work for me, but is very ugly:
i=0
while ! test -f "!{outputDir}/!{md_file}"; do
   sleep 1
   i=$((i+1))
   if [ $i -gt 100 ]; then
      echo "!{md_file} not seen in 100 iterations."
      exit 1
   fi
done
mkdir -p !{outputDir}/bams_for_import && pushd !{outputDir}/bams_for_import && ln -sf ../!{md_file} .
hard link.. hmm can that be copied from one filesystem to another?
(working dir is on a scratch volume, output dir is on a different mount)
brb
Paolo Di Tommaso
@pditommaso
Oct 17 2018 15:12
wait are you moving to a directory outside the task work dir ?
mkdir -p !{outputDir}/bams_for_import
?
Brad Langhorst
@bwlang
Oct 17 2018 16:45
that’s the very ugly part… with the crazy loop.
I did that as a downstream task, which works, but is terrible.
LukeGoodsell
@LukeGoodsell
Oct 17 2018 18:57
@pditommaso As part of a workflow, I generate a run-specific peptidome and genome. I would like to launch BLAT daemons for each, providing extremely fast and high-throughput querying to multiple other workflow processes. I would like the daemons to be automatically launched by the workflow and closed when the workflow stops, either when it finishes or halts prematurely. The daemons also need to run on a different machine than the other processes so that they don't share resources.
Currently the best approach I can come up with is convoluted, involving launching monitoring programs in the background on the login machine and daemon machines that use files in controlled locations to signal when processes exit or are no longer needed, and the host name of the daemon machines.
Does Nextflow have built-in support for processes that launch services for other processes in this way?
Brad Langhorst
@bwlang
Oct 17 2018 19:55
@LukeGoodsell : I don’t know of any nextflow-native support for this. However, we do something similar using sshkit to check or spin up a remote server from within a nextflow process.

@pditommaso : ok - so good news,

    publishDir "${outputDir}", mode: 'copyNoFollow', pattern: 'bams_for_import'
    beforeScript "set +u; source activate ${conda_env}"

    input:
        set val(library), file(libraryBam) from aligned_files.groupTuple()

    output:
       ...
        file('bams_for_import')

    shell:
    '''
    …
    mkdir bams_for_import && cd bams_for_import && ln -s ../!{library}.md.bam . && cd -
    '''

does seem to produce just a relative symlink to the bam in the preceding directory. However - only a single link is present in the output directory, presumably the last one to execute. Any ideas to avoid clobbering the output directory?