These are chat archives for nextflow-io/nextflow

30th Jun 2017
Paolo Di Tommaso
@pditommaso
Jun 30 2017 07:32
@amacbride Yes, I've noticed that it appears when it's not needed. We are dangerously near to the halting problem here .. :)
anyhow I think I'm going to demote it to a debug line, so it will be visible only in the log file
Phil Ewels
@ewels
Jun 30 2017 08:42
:+1:
Maxime Garcia
@MaxUlysse
Jun 30 2017 10:51
:+1:
Rickard Hammarén
@Hammarn
Jun 30 2017 11:39

@pditommaso Hi! I have a strange problem. This might be related to #388 and #141 but I'm not sure, and it's difficult to know what's causing it and thus to replicate it.
When I run the pipeline normally everything works great but when I try to run through docker I get the error

ERROR ~ Channel `bam_markduplicates` has been used twice as an input by process `markDuplicates` and another operator

Which channel it is changes when I change other un-related lines in the code.
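For illustration, a minimal sketch (all names hypothetical, DSL1 syntax) of the pattern that normally triggers this error: the same channel consumed both by a process input and by an operator.

```groovy
// Sketch: `bam_markduplicates` has two consumers, which Nextflow rejects at startup
bam_markduplicates = Channel.fromPath('*.bam')

process markDuplicates {
    input:
    file bam from bam_markduplicates      // first consumer

    script:
    """
    picard MarkDuplicates I=${bam} O=${bam.baseName}.md.bam M=metrics.txt
    """
}

bam_markduplicates.println()              // second consumer -> "used twice" error
```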

Paolo Di Tommaso
@pditommaso
Jun 30 2017 11:41
can you share your pipeline script ?
Rickard Hammarén
@Hammarn
Jun 30 2017 11:44
It all started when I added lines 1015-1041, so I suspect the cause is somewhere in there:
https://github.com/Hammarn/NGI-RNAseq/blob/master/main.nf#L1015
Paolo Di Tommaso
@pditommaso
Jun 30 2017 11:50
ok, the message is misleading but the channel is used twice as an output
here and here
(unrelated: this is useless)
Paolo Di Tommaso
@pditommaso
Jun 30 2017 11:56
wait, that is bam_markduplicates and those are two different branches .. my fault
Paolo Di Tommaso
@pditommaso
Jun 30 2017 12:03
what version are you using ?
Rickard Hammarén
@Hammarn
Jun 30 2017 12:04
I tried it on 0.25.0 0.25.1 and the latest snapshot
Paolo Di Tommaso
@pditommaso
Jun 30 2017 12:06
could you check 0.24.4 ?
Rickard Hammarén
@Hammarn
Jun 30 2017 12:07
sure
odd
ERROR ~ Error executing process > 'fastqc (SRR4238359_subsamp)'

Caused by:
  Process `fastqc (SRR4238359_subsamp)` terminated with an error exit status (125)

Command executed:

  fastqc -q SRR4238359_subsamp.fastq.gz

Command exit status:
  125

Command output:
  (empty)

Command error:
  docker: Cannot connect to the Docker daemon at unix:///var/run/docker.sock. Is the docker daemon running?.
  See 'docker run --help'.
Paolo Di Tommaso
@pditommaso
Jun 30 2017 12:09
oh, but this is related to docker .. :/
are you running on your machine ?
Rickard Hammarén
@Hammarn
Jun 30 2017 12:10
yes
Paolo Di Tommaso
@pditommaso
Jun 30 2017 12:10
apparently the docker daemon is down
try to restart it
Rickard Hammarén
@Hammarn
Jun 30 2017 12:11
Yeah, seems to work fine now
So the error message was just wrong in the later versions then?
Yeah, it runs fine with all the versions now. Great :smile:
Paolo Di Tommaso
@pditommaso
Jun 30 2017 12:13
so is it a problem with the 0.25.x versions ?
Rickard Hammarén
@Hammarn
Jun 30 2017 12:14
well, the incorrect error message does seem to be. But I'm not sure.
I've had a few unhelpful Nextflow error messages in the past
Paolo Di Tommaso
@pditommaso
Jun 30 2017 12:15
umm, if you could manage to create a replicable test case it would be a big help
Rickard Hammarén
@Hammarn
Jun 30 2017 12:16
sure, I'll give it a try
Rickard Hammarén
@Hammarn
Jun 30 2017 12:28
Ok, so I managed to come up with something. If you try to run this without having Docker at all, or after killing the Docker daemon, you can replicate the behaviour.
0.25.1 gives the channel already used error
0.25.0 gives the correct error but just hangs
0.24.4 gives the correct error i.e. is docker running?
It does seem to be consistent though ¯\_(ツ)_/¯
Paolo Di Tommaso
@pditommaso
Jun 30 2017 12:29
oh
not able to replicate :/
Rickard Hammarén
@Hammarn
Jun 30 2017 12:31
interesting
Phil Ewels
@ewels
Jun 30 2017 12:47
https://www.synapse.org/#!Synapse:syn8507133/wiki/415976 Not sure if this is interesting for anyone? I think it requires CWL interoperability so maybe not brilliant for NF..
"GA4GH/DREAM Workflow Execution Challenge"
Paolo Di Tommaso
@pditommaso
Jun 30 2017 12:49
I'm coming to codefest/bosc exactly to discuss this stuff
I don't think the Workflow execution API requires CWL
actually they are supporting WDL as well
@ewels would you be interested in participating in this challenge?
Phil Ewels
@ewels
Jun 30 2017 12:59
I like the idea but I'm not super clear on how much work is involved..
Paolo Di Tommaso
@pditommaso
Jun 30 2017 13:00
yes, I think it's not feasible in a couple of months
Phil Ewels
@ewels
Jun 30 2017 13:02
Where does it say that you only have a couple of months?
The September 29, 2017 date is just the close of registration for the challenge, no?
Can discuss in person at BOSC perhaps. Looks like Brad C is involved, in data contribution at least.
Paolo Di Tommaso
@pditommaso
Jun 30 2017 13:02
ahh, you are right
it could be a nice project to kick-start at the codefest
Phil Ewels
@ewels
Jun 30 2017 13:06
:+1:
Phil Ewels
@ewels
Jun 30 2017 15:20

@pditommaso - unrelated as problems solved, but regarding your comments above:

ok, the message is misleading but the channel is used twice as an output
here and here
(unrelated: this is useless)

The lines you link to are all within two massive if/else blocks
so they're never duplicated at run time
Paolo Di Tommaso
@pditommaso
Jun 30 2017 15:21
yes, I saw that
Phil Ewels
@ewels
Jun 30 2017 15:21
for example, the one you say is useless - if that's not there then the whole pipeline grinds to a halt
as there are no output channels for the downstream processes
unless I'm missing something?
Paolo Di Tommaso
@pditommaso
Jun 30 2017 15:22
you are right, I was confused by the if branch
Phil Ewels
@ewels
Jun 30 2017 15:23
ok cool, then we're on the same page :) :+1:
Paolo Di Tommaso
@pditommaso
Jun 30 2017 15:24
:)
Phil Ewels
@ewels
Jun 30 2017 15:58
Hi! Me again, basic question sorry. How do I read an input file contents in a script block?
e.g. `input: file sm from chc`, then `script: print chc.text`
except that doesn't work
Paolo Di Tommaso
@pditommaso
Jun 30 2017 15:58
you can't :/
Phil Ewels
@ewels
Jun 30 2017 15:59
aha, that explains it :sweat_smile:
Paolo Di Tommaso
@pditommaso
Jun 30 2017 15:59
the real file is only resolved at runtime
Phil Ewels
@ewels
Jun 30 2017 15:59
but I'm inside the script block, so that's executed at runtime, no?
I can access the contents of channels that come from a stdout output type
Paolo Di Tommaso
@pditommaso
Jun 30 2017 16:00
during the task pre-processing
but the task can be submitted to the cluster scheduler that can allocate its own scratch directory ..
it's true that the input files should be known at that time, but for some complicated reasons they are not
Phil Ewels
@ewels
Jun 30 2017 16:04
ok, no worries. My hacky style of doing it works so I'll just stick with that :)
(if I .subscribe to the channels outside of a process I can get at the file contents)
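The workaround described above might look like this (channel name hypothetical): a `subscribe` outside any process can read a file's contents directly, because the file object resolves in the Nextflow JVM rather than in a task's work directory.

```groovy
// Sketch: reading file contents outside a process via subscribe
software_versions = Channel.fromPath('results/*_version.txt')

software_versions.subscribe { f ->
    println f.text   // the path is resolved here, so .text works
}
```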
Paolo Di Tommaso
@pditommaso
Jun 30 2017 16:07
yes
Phil Ewels
@ewels
Jun 30 2017 16:38
Hmm, bit hacky but seems to work ok. I have to create dummy channels for all processes to stop the process from running too early which feels messy :\
Anyway, now I have software versions which is what I wanted, so I should probably stop obsessing :laughing:
Paolo Di Tommaso
@pditommaso
Jun 30 2017 16:39
:joy:
a bit worried about this code :grimacing:
have you considered using a `process { exec: <code> }` block?
Phil Ewels
@ewels
Jun 30 2017 17:45
Ah no, that would be nicer!
Phil Ewels
@ewels
Jun 30 2017 19:10
Does code in an exec block have access to the file content then?
Paolo Di Tommaso
@pditommaso
Jun 30 2017 19:14
yes, provided you declare them as a `val` instead of a `file`
see here nextflow-io/nextflow#378
then, you can move all that code in the process including the yaml generation
just using a groovy multiline string
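A sketch of that suggestion (process and channel names hypothetical): with a native `exec:` block and inputs declared as values, the Groovy code runs in the Nextflow JVM and can read the files and build the YAML with a multiline string.

```groovy
// Sketch: native process reading a file passed as a value
process collectVersions {
    input:
    val vpath from version_files   // path passed as a value, not staged as a file

    exec:
    def content = file(vpath).text          // readable: exec runs in the Nextflow JVM
    def yaml = """
    software_versions:
      tool: ${content.trim()}
    """                                     // Groovy multiline string
    println yaml
}
```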
Phil Ewels
@ewels
Jun 30 2017 20:30
ok great!
Last thing - how do I create an output file in pure groovy with exec?
Mike Smoot
@mes5k
Jun 30 2017 20:47

Hi @pditommaso I'm wondering if there is anything like storeDir for caching S3 buckets locally through the file function? My use case is downloading databases (e.g. for blast) to my cluster. I currently do this manually with a process and I cache the results for other pipelines to use. This works, but is a lot of redundant code since each db needs to be in a separate channel. I was thinking along these lines:

Channel
    .from( file("s3://example/my_blast_db/*", storeDir: "/mnt/my/local/cache") )
    .set { blast_db_dir }

Is there another similarly easy way to accomplish this?
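For comparison, the process-based approach Mike describes can be sketched like this (paths and names hypothetical): `storeDir` keeps the downloaded database in a permanent location, so later runs reuse it instead of re-downloading.

```groovy
// Sketch: caching an S3 database locally with storeDir
process fetchBlastDb {
    storeDir '/mnt/my/local/cache'

    output:
    file 'my_blast_db' into blast_db_dir   // reused on subsequent runs

    script:
    """
    aws s3 cp --recursive s3://example/my_blast_db my_blast_db
    """
}
```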