These are chat archives for nextflow-io/nextflow

7th
Aug 2017
Edgar
@edgano
Aug 07 2017 07:30
awesome @pditommaso !!! hahaha it's incredible!
Simone Baffelli
@baffelli
Aug 07 2017 08:05
Good morning. I have probably asked this question already, but I forgot the answer. Is there a way to find out why a process was submitted again?
when using --resume
Paolo Di Tommaso
@pditommaso
Aug 07 2017 08:12
it's -resume not --resume :)
you can dump the process hashes with -dump-hashes, but it's not very friendly ..
Simone Baffelli
@baffelli
Aug 07 2017 08:15
ah right..anyway I don't understand why it isn't resuming correctly
It worked on friday :crying_cat_face:
somebody must have `touch`ed my files
Paolo Di Tommaso
@pditommaso
Aug 07 2017 08:15
maybe the heat :)
Simone Baffelli
@baffelli
Aug 07 2017 08:15
it was quite hot here too
It would be quite useful if the trace would allow custom messages based on channel values
Paolo Di Tommaso
@pditommaso
Aug 07 2017 08:17
meta-debugging-programming ?
Simone Baffelli
@baffelli
Aug 07 2017 08:17
because then it would be easier to check and find the culprit of a failed resume
right :smile:
extensible debugging
Paolo Di Tommaso
@pditommaso
Aug 07 2017 08:18
!
Simone Baffelli
@baffelli
Aug 07 2017 08:19
anyway while my workflow crunches data, I'm trying to extend randomSample with a seed parameter
Paolo Di Tommaso
@pditommaso
Aug 07 2017 08:19
:+1:
Simone Baffelli
@baffelli
Aug 07 2017 08:24
:sparkles: -dump-hashes does exactly what I need
Paolo Di Tommaso
@pditommaso
Aug 07 2017 08:24
cool
I should make it a bit more readable
Simone Baffelli
@baffelli
Aug 07 2017 08:25
I think the problem is related to the ordering of data
because I need several images to average in order to compute a mask
and depending on the order they arrive, that step is run again
and the subsequent steps are run again as well
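A minimal sketch of why arrival order can defeat -resume, assuming a cache key that simply hashes the inputs in the order they are received. This is not Nextflow's actual hashing code; `task_key`, the script name, and the file names are all hypothetical.

```python
import hashlib


def task_key(script, inputs):
    """Hypothetical cache key: hash the script text, then each input in order."""
    h = hashlib.sha256()
    h.update(script.encode())
    for item in inputs:
        h.update(b"\0")  # separator so adjacent names cannot run together
        h.update(item.encode())
    return h.hexdigest()


script = "average_images.sh"
run_1 = ["img_03.tif", "img_01.tif", "img_02.tif"]  # arrival order in one run
run_2 = ["img_01.tif", "img_02.tif", "img_03.tif"]  # arrival order in the next

# Same files, different order: the keys differ, so the step would run again.
print(task_key(script, run_1) == task_key(script, run_2))

# Sorting before gathering gives the same key whatever the arrival order.
print(task_key(script, sorted(run_1)) == task_key(script, sorted(run_2)))
```

In a pipeline the analogous move would be to sort the gathered items (for example with toSortedList) before they reach the averaging step.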
Paolo Di Tommaso
@pditommaso
Aug 07 2017 08:26
is a step gathering the result of an upstream process ?
Simone Baffelli
@baffelli
Aug 07 2017 08:26
yes
but actually to avoid that I used storeDir
but now I see that the file in the storeDir was overwritten
Paolo Di Tommaso
@pditommaso
Aug 07 2017 08:27
storeDir is almost always a bad idea..
Simone Baffelli
@baffelli
Aug 07 2017 08:27
:cry:
but I must find a way to reuse the same mask
Paolo Di Tommaso
@pditommaso
Aug 07 2017 08:30
what do you mean by mask, a collection of image files ?
Simone Baffelli
@baffelli
Aug 07 2017 08:32
yes, from my huge time series I pick some images and average them
and from the average I select certain pixels based on a quality measure
and I want to always use these pixels
Paolo Di Tommaso
@pditommaso
Aug 07 2017 08:33
how is this logic implemented now?
Simone Baffelli
@baffelli
Aug 07 2017 08:36
from the first pipeline run i collect the first N images, compute the quality measure
and then use storeDir to save that
and use it downstream
Paolo Di Tommaso
@pditommaso
Aug 07 2017 08:38
in principle removing storeDir should not alter the logic of the pipeline
Simone Baffelli
@baffelli
Aug 07 2017 08:40
no indeed
but storeDir was a lazy way to avoid the problem of changing masks when I change the images I want to work on
but probably that's the way to go
Paolo Di Tommaso
@pditommaso
Aug 07 2017 08:41
lazy things are deferred problems ;)
Simone Baffelli
@baffelli
Aug 07 2017 08:42
I think my best bet is to do my tests and experiment on a shorter time series
and then just let it run for good when everything works
Paolo Di Tommaso
@pditommaso
Aug 07 2017 08:42
exactly !
Simone Baffelli
@baffelli
Aug 07 2017 08:44
on a separate note, how do I run a test from the console?
I want to test if my addition of the seed parameter works
Simone Baffelli
@baffelli
Aug 07 2017 08:45
yeah I wrote one
the question was more whether there is an easy way to run a single test
Paolo Di Tommaso
@pditommaso
Aug 07 2017 08:46
how are you planning to add the seed? as a second parameter ?
Simone Baffelli
@baffelli
Aug 07 2017 08:46
from the console
yes
an optional parameter
Paolo Di Tommaso
@pditommaso
Aug 07 2017 08:46
makes sense
console, you mean the shell terminal ?
Simone Baffelli
@baffelli
Aug 07 2017 08:46
it will probably be very ugly groovy
yes, because I stopped using any IDE
IntelliJ/PyCharm was driving me crazy
Paolo Di Tommaso
@pditommaso
Aug 07 2017 08:47
:)
in the project root:
make compile
then
Simone Baffelli
@baffelli
Aug 07 2017 08:48
it compiles :confetti_ball:
Paolo Di Tommaso
@pditommaso
Aug 07 2017 08:49
make test class=nextflow.extension.DataflowMathExtensionTest
Simone Baffelli
@baffelli
Aug 07 2017 08:51
except that I will use
make test class=nextflow.extension.RandomSampleTest, right?
Paolo Di Tommaso
@pditommaso
Aug 07 2017 08:52
well, currently the only test is in that class
but you may want to create a new RandomSampleTest, that's fine
Simone Baffelli
@baffelli
Aug 07 2017 08:54
but what about "RandomSampleTest"?
there is already a groovy test there
Paolo Di Tommaso
@pditommaso
Aug 07 2017 08:57
oh, true!
yes, that's fine
Simone Baffelli
@baffelli
Aug 07 2017 08:58
well, I suppose that one does not test the operator on a channel
but just the random sampler on a list
but that should suffice
Paolo Di Tommaso
@pditommaso
Aug 07 2017 08:59
given the same seed it should return the same result, right?
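The property under discussion (same seed, same sample) can be sketched as follows. This is a Python analogue for illustration only, not the Groovy code of the randomSample operator; the function name `random_sample` and the seed value are hypothetical.

```python
import random


def random_sample(items, n, seed=None):
    """Draw n items without replacement; a fixed seed makes the draw repeatable."""
    rng = random.Random(seed)  # seed=None falls back to system entropy
    return rng.sample(list(items), n)


letters = [chr(c) for c in range(ord("A"), ord("K"))]  # 'A'..'J'

first = random_sample(letters, 5, seed=23)
second = random_sample(letters, 5, seed=23)

# Two draws with the same seed give the same elements in the same order.
print(first == second)
```

Using a private `random.Random` instance rather than the module-level generator keeps the seeded draw independent of any other random calls in the program.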
Simone Baffelli
@baffelli
Aug 07 2017 08:59
that's exactly the test I wrote
:+1:
make test takes forever, is that normal?
Paolo Di Tommaso
@pditommaso
Aug 07 2017 09:01
well, I guess the first time it's a bit longer: download, compile, etc
Simone Baffelli
@baffelli
Aug 07 2017 09:03
I'll let it grind...can't wait to be disappointed by a failed test
Paolo Di Tommaso
@pditommaso
Aug 07 2017 09:03
ok, but it's doing something, right?
Simone Baffelli
@baffelli
Aug 07 2017 09:04
it does not say anything at the moment
only
./gradlew -q :test --tests nextflow.extension.RandomSampleTest
Paolo Di Tommaso
@pditommaso
Aug 07 2017 09:05
ummm
ok, there's no output because of -q
it should be fine
if you want to see what it's doing try
./gradlew test --tests nextflow.extension.RandomSampleTest
Simone Baffelli
@baffelli
Aug 07 2017 09:07
> Building 94% > :test > 3 tests completed
that sounds good
but I'm not sure that your tests are entirely correct, strictly speaking there is a chance (however tiny) that the random sample returns exactly 'A'...'J'
Paolo Di Tommaso
@pditommaso
Aug 07 2017 09:11
true, I thought about that, but I couldn't find anything better
any improvement is welcome
Simone Baffelli
@baffelli
Aug 07 2017 09:12
I guess it's fine enough
Paolo Di Tommaso
@pditommaso
Aug 07 2017 09:12
;)
Simone Baffelli
@baffelli
Aug 07 2017 09:12
there is no way to deterministically test for randomness :)
because whenever you specify a sequence, there is always a nonzero probability that that sequence turns up
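To put a number on "however tiny": a rough calculation, assuming the test shuffles all ten letters 'A'..'J' uniformly and fails only if the result comes back in the original order. Both assumptions are illustrative; the actual test may draw fewer elements.

```python
import math
from fractions import Fraction

n = 10  # the ten letters 'A'..'J' (assumed sample size)

orderings = math.factorial(n)       # number of equally likely orderings
p_identity = Fraction(1, orderings)  # chance of getting 'A'..'J' back unchanged

print(orderings)
print(float(p_identity))
```

Under these assumptions a spurious failure is expected roughly once in a few million runs, which is why the test is good enough in practice even though it is not strictly deterministic.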
Paolo Di Tommaso
@pditommaso
Aug 07 2017 09:13
exactly
Simone Baffelli
@baffelli
Aug 07 2017 09:18
now I see why the test does not run..I was applying the operator to the same channel twice
Paolo Di Tommaso
@pditommaso
Aug 07 2017 09:18
and it hangs :)
Simone Baffelli
@baffelli
Aug 07 2017 09:20
it just waits and waits
Paolo Di Tommaso
@pditommaso
Aug 07 2017 09:20
you can put a @spock.lang.Timeout(1) to avoid that
Simone Baffelli
@baffelli
Aug 07 2017 09:48
:confetti_ball: all tests passed
Paolo Di Tommaso
@pditommaso
Aug 07 2017 09:50
green !
Simone Baffelli
@baffelli
Aug 07 2017 09:52
nextflow-io/nextflow#421
Shellfishgene
@Shellfishgene
Aug 07 2017 12:15
@ewels @pditommaso I have (another) problem with the MethylSeq pipeline, may not be a bug though. From NF I get WARN: Process 'get_software_versions' failed -- Error is ignored. The work dir for this process is empty, so I'm not sure how to debug this. The .nextflow.log is also not helpful (to me, at least).
Maxime Garcia
@MaxUlysse
Aug 07 2017 12:16
which NF version are you using ?
Shellfishgene
@Shellfishgene
Aug 07 2017 12:17
Still 0.25.1, from bioconda.
Phil Ewels
@ewels
Aug 07 2017 12:17
@Shellfishgene - this is fixed, see GitHub reply :)
assuming that it's this one: SciLifeLab/NGI-MethylSeq#25
Shellfishgene
@Shellfishgene
Aug 07 2017 12:18
@ewels I saw that, thanks! But this is different, it's about the get_versions task.
Phil Ewels
@ewels
Aug 07 2017 12:18
ah sorry
Shellfishgene
@Shellfishgene
Aug 07 2017 12:18
@MaxUlysse Just tried 0.25.5, same problem
Phil Ewels
@ewels
Aug 07 2017 12:19
Could you create a new issue about it and attach the .nextflow.log file please?
Paolo Di Tommaso
@pditommaso
Aug 07 2017 12:20
I think it's an application error not a NF issue
Phil Ewels
@ewels
Aug 07 2017 12:20
This process is pure groovy, so the work directory is always empty
Shellfishgene
@Shellfishgene
Aug 07 2017 12:20
@ewels sure, I asked here first because I'm not sure if the problem is on my side
Paolo Di Tommaso
@pditommaso
Aug 07 2017 12:20
are you using a ignore error strategy ?
Shellfishgene
@Shellfishgene
Aug 07 2017 12:21
I'm just not sure how to get more info to debug. The log file only has this:
Aug-07 13:57:28.145 [Actor Thread 24] DEBUG nextflow.processor.TaskProcessor - <get_software_versions> Poison pill arrived; port: 9
Aug-07 13:57:28.145 [Actor Thread 27] DEBUG nextflow.processor.StateObj - <get_software_versions> State before poison: StateObj[submitted: 1; completed: 0; poisoned: false ]
Aug-07 13:57:28.200 [Actor Thread 24] DEBUG nextflow.processor.TaskProcessor - <get_software_versions> After stop
Aug-07 13:57:28.203 [Task submitter] INFO  nextflow.Session - [34/0c9bf3] Submitted process > get_software_versions
Aug-07 13:57:28.430 [Task monitor] WARN  nextflow.processor.TaskProcessor - Process `get_software_versions` failed -- Error is ignored
Aug-07 13:57:28.431 [Actor Thread 22] DEBUG nextflow.processor.TaskProcessor - <get_software_versions> Sending poison pills and terminating process
Aug-07 13:57:28.431 [Actor Thread 22] DEBUG nextflow.Session - <<< barrier arrive (process: get_software_versions)
Phil Ewels
@ewels
Aug 07 2017 12:21
it's almost certainly the fault of the pipeline - I already fixed one error which sounded very similar to this earlier today
Shellfishgene
@Shellfishgene
Aug 07 2017 12:21
@pditommaso errorStrategy = { task.exitStatus == Integer.MAX_VALUE ? 'retry' : 'finish' }
Phil Ewels
@ewels
Aug 07 2017 12:22
shouldn't matter about the retry strategy really
and I was wrong above - it does save a file into the work directory when it works
you usually get a long groovy traceback when stuff breaks there though
Paolo Di Tommaso
@pditommaso
Aug 07 2017 12:26
umm, that error message is reported when the ignore strategy is set (because an error condition is somehow expected) https://github.com/nextflow-io/nextflow/blob/experimental/src/main/groovy/nextflow/processor/TaskProcessor.groovy#L1018-1018
Shellfishgene
@Shellfishgene
Aug 07 2017 12:26
Ah, the newer nf version gave some more info in the log it seems:
Aug-07 14:17:56.633 [Task monitor] DEBUG n.processor.TaskPollingMonitor - Task completed > TaskHandler[id: 9; name: get_software_versions; status: COMPLETED; exit: -; error: groovy.lang.MissingMethodException: No signature of method: java.lang.Boolean.getText() is applicable for argument types: () values: []
Possible solutions: getAt(java.lang.String), getClass(), every(), grep(), collect(), inspect(); workDir: /sfs/fs3/work-geomar7/smomw240/mela_rrbs/work/c5/a126751c7a623e4adf5049ed2df33b]
Aug-07 14:17:56.639 [Task monitor] WARN  nextflow.processor.TaskProcessor - Process `get_software_versions` failed -- Error is ignored
Aug-07 14:17:56.639 [Actor Thread 1] DEBUG nextflow.Session - <<< barrier arrive (process: get_software_versions)
Paolo Di Tommaso
@pditommaso
Aug 07 2017 12:27
try to execute with the cli option -process.errorStrategy terminate
it should report the error cause
Shellfishgene
@Shellfishgene
Aug 07 2017 12:29
It gave the same output as above in the log file, still no files in the workdir.
Paolo Di Tommaso
@pditommaso
Aug 07 2017 12:30
with -process.errorStrategy terminate ?
Shellfishgene
@Shellfishgene
Aug 07 2017 12:30
@ewels Also the pipeline reports success, but multiqc is not run (probably because of missing software versions). Not sure if that's intentional.
@pditommaso yes
Paolo Di Tommaso
@pditommaso
Aug 07 2017 12:31
umm, if so modify or create a new nextflow.config file in the launch directory with this content
process.$get_software_versions.errorStrategy = 'terminate'
and try it again
Shellfishgene
@Shellfishgene
Aug 07 2017 12:33
Ok, that worked.
Paolo Di Tommaso
@pditommaso
Aug 07 2017 12:34
and the error message is ?
Shellfishgene
@Shellfishgene
Aug 07 2017 12:34
The error reported now is the same as above, No signature of method:...., and it copies the whole code block from the workflow.
Phil Ewels
@ewels
Aug 07 2017 12:35
hmm, sounds like the error I had when passing empty placeholder channels before
eg. trying to do getText() on false
Paolo Di Tommaso
@pditommaso
Aug 07 2017 12:37
umm, if so, you can use '' instead of false
Shellfishgene
@Shellfishgene
Aug 07 2017 12:37
So that process just takes the stdout/stderr from all the other processes and greps the software versions out, right?
Phil Ewels
@ewels
Aug 07 2017 12:37
@pditommaso - you mean trimgalore_results = Channel.from('') should be ok?
Paolo Di Tommaso
@pditommaso
Aug 07 2017 12:38
can I see the line in the script ?
@Shellfishgene - yes. Also, MultiQC should always run. But it depends on the software versions process, so presumably doesn't run if that fails
Shellfishgene
@Shellfishgene
Aug 07 2017 12:39
@ewels That makes sense.
So even if the process is pure groovy, should it not get all the command.log files in its work dir?
Paolo Di Tommaso
@pditommaso
Aug 07 2017 12:39
@ewels let me think . .
Phil Ewels
@ewels
Aug 07 2017 12:40
seems to be the case, yes
Paolo Di Tommaso
@pditommaso
Aug 07 2017 12:41
Phil Ewels
@ewels
Aug 07 2017 12:41
in fairness, now that I'm collecting everything from .command.out instead of using the stdout channels, I could do it as a more normal process
@pditommaso - it's not the MultiQC process
Apologies, I linked the wrong line before
should have been the one below
Paolo Di Tommaso
@pditommaso
Aug 07 2017 12:44
it should work with ''
tho I don't like it so much
Phil Ewels
@ewels
Aug 07 2017 12:44
:+1: testing with '' now
link = like typo?
Paolo Di Tommaso
@pditommaso
Aug 07 2017 12:45
oops yes
fixed ;)
Phil Ewels
@ewels
Aug 07 2017 12:48
I like '' better if it means that I can get rid of the conditionals in the sw versions process
We worked on adding software version parsing to core MultiQC at the BOSC Codefest (with @Hammarn) anyway, so hopefully I can strip this from the NF pipeline soon.
Paolo Di Tommaso
@pditommaso
Aug 07 2017 12:52
:ok_hand:
Phil Ewels
@ewels
Aug 07 2017 12:57
Caused by:
  No signature of method: java.lang.String.getText() is applicable for argument types: () values: []
Possible solutions: getAt(int), getAt(groovy.lang.Range), getAt(groovy.lang.IntRange), getAt(groovy.lang.EmptyRange), getAt(java.lang.String), getAt(groovy.lang.IntRange)
this is with a find + replace to use Channel.from('') instead of Channel.from(false)
Shellfishgene
@Shellfishgene
Aug 07 2017 13:00
The same one I get, but here it's still Channel.from(false)
Paolo Di Tommaso
@pditommaso
Aug 07 2017 13:04
my fault, the point is that it's expecting a file
and passing an empty channel won't trigger the process execution
a workaround could be to use toList instead of collect
the main difference is that toList returns an empty list for an empty channel, but then you will need to properly check the trimgalore list to verify that it contains at least one element
to recap
1) change this with trimgalore_logs = Channel.empty()
Paolo Di Tommaso
@pditommaso
Aug 07 2017 13:10
2) change this with val trimgalore from trimgalore_logs.toList()
3) change this with
software_versions['FastQC'] = \
      fastqc ? fastqc[0].getText().find(/FastQC v(\S+)/) { match, version -> "v$version" } : null
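The guard in step 3 can be sketched in Python as follows: an empty list (a skipped tool) yields no version instead of raising an exception. This is an analogue of the Groovy ternary above, not the pipeline's code; `find_version`, the second pattern, and the sample log text are hypothetical.

```python
import re


def find_version(logs, pattern):
    """Return 'v<version>' from the first log text, or None if the list is
    empty or the pattern does not match."""
    if not logs:  # tool was skipped: nothing to read
        return None
    match = re.search(pattern, logs[0])
    return f"v{match.group(1)}" if match else None


software_versions = {
    # Made-up log text, used only to exercise the pattern.
    "FastQC": find_version(["FastQC v0.11.5"], r"FastQC v(\S+)"),
    # Empty list stands in for a tool that did not run.
    "Trim Galore!": find_version([], r"version (\S+)"),
}

print(software_versions)
```

The point of the guard is that the emptiness check happens before any element is touched, so the same code path serves both the "tool ran" and "tool skipped" cases.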
Phil Ewels
@ewels
Aug 07 2017 13:12
ok cool, testing now :+1:
Seems to be hanging.. (trying to be patient :) )
Does this look right? ewels/NGI-MethylSeq@193a513
Paolo Di Tommaso
@pditommaso
Aug 07 2017 13:20
umm, weird
this runs

process foo {
 echo true 
 input: val x from Channel.empty().toList()
 script:
 """
 echo $x
 """
}
Is get_software_versions hanging the execution ?
Phil Ewels
@ewels
Aug 07 2017 13:29
trying to debug line by line atm
if I change this to val bismark_summary from Channel.empty().toList() it works fine
without it I get to [9a/1f6ec0] Cached process > bismark_summary and then it hangs
agh, wrong link sorry
:confused:
Paolo Di Tommaso
@pditommaso
Aug 07 2017 13:31
is bismark_summary always executed ?
oops
yeah, should always be executed
though it takes the bismark_dedup_log_2 channel
which in this case is bismark_dedup_log_2 = Channel.from(false)
though in the travis test it wasn't running --rrbs, so it should have been the regular dedup output channel there
Simone Baffelli
@baffelli
Aug 07 2017 13:33
What does this mean ?
 ERROR nextflow.extension.ChoiceOp - @unknown
Paolo Di Tommaso
@pditommaso
Aug 07 2017 13:34
@ewels but the log says
[42/515e98] Submitted process > bismark_summary
@baffelli look in the log file
Phil Ewels
@ewels
Aug 07 2017 13:36
Last verbose log messages (NF still running):
Aug-07 15:36:05.663 [Actor Thread 7] INFO  nextflow.processor.TaskProcessor - [c0/9b466e] Cached process > bismark_report (SRR389222_Enterobacteria_phage_lambda_Bisulfite-Seq)
Aug-07 15:36:05.665 [Actor Thread 1] DEBUG nextflow.Session - <<< barrier arrive (process: bismark_report)
Aug-07 15:36:05.666 [Actor Thread 2] INFO  nextflow.processor.TaskProcessor - [9a/1f6ec0] Cached process > bismark_summary
Aug-07 15:36:05.670 [Actor Thread 1] DEBUG nextflow.Session - <<< barrier arrive (process: bismark_summary)
Further up:
Aug-07 15:36:05.476 [main] DEBUG nextflow.Session - >>> barrier register (process: get_software_versions)
Aug-07 15:36:05.512 [main] DEBUG nextflow.processor.TaskProcessor - Creating operator > get_software_versions -- maxForks: 4
Paolo Di Tommaso
@pditommaso
Aug 07 2017 13:38
it suggests that bismark_summary and bismark_report run correctly
Simone Baffelli
@baffelli
Aug 07 2017 13:38
@pditommaso that was all the log said
Paolo Di Tommaso
@pditommaso
Aug 07 2017 13:38
you are doing all this mess to collect the tools version numbers, right ? :D
@baffelli oh .. hard to say if so
Simone Baffelli
@baffelli
Aug 07 2017 13:39
plus some stack traces that were not very useful
Paolo Di Tommaso
@pditommaso
Aug 07 2017 13:39
stack traces are always useful :)
Simone Baffelli
@baffelli
Aug 07 2017 13:39
I removed a try catch in the closure and now all of a sudden it works
too late :sweat_smile: I'm running it again and it resumes again
Paolo Di Tommaso
@pditommaso
Aug 07 2017 13:40
better !
@ewels what about refactoring get_software_versions into something like
Simone Baffelli
@baffelli
Aug 07 2017 13:41
but should it happen again, i will keep the log, promise
Phil Ewels
@ewels
Aug 07 2017 13:41
@pditommaso - yes, it seemed like a good idea at the time :sweat:
it started out as a simple one-liner using stdout channel from one process
then grew into this monstrosity :imp:
Paolo Di Tommaso
@pditommaso
Aug 07 2017 13:42
process get_software_versions {
  output:
  file 'tool_x.version' into tool_ch
  script:
  """
  tool_x --version >> versions.txt
  tool_y --version >> versions.txt
  ..
  grep 'Tool X Ver' versions.txt > tool_x.version
  """
}
in this way it would run independently . .
Phil Ewels
@ewels
Aug 07 2017 13:43
yes, that would be nicer :+1:
Paolo Di Tommaso
@pditommaso
Aug 07 2017 13:43
w/o all that mess of if and collect;)
Simone Baffelli
@baffelli
Aug 07 2017 13:44
@pditommaso you don't want to see my pipeline then
chains of collect and map and toSortedList everywhere
Shellfishgene
@Shellfishgene
Aug 07 2017 13:46
@ewels Another thing, I just noticed by chance that bismark prints the error Failed to open filehandle: Too many open files at.... I know this is related to the server, but the pipeline continued anyway. Not sure why...
Paolo Di Tommaso
@pditommaso
Aug 07 2017 13:46
Chains are good, ifs are bad! :)
LukeGoodsell
@LukeGoodsell
Aug 07 2017 14:41
Hi again! Is it possible to have storeDir (sym)link to the files, or have publishDir files be reused by default (without -resume)?
kevbrick
@kevbrick
Aug 07 2017 14:44
Hi all ... I'm running nextflow using Slurm. On an interactive node, I have no problem running the pipe, but if I run pipe as an sbatch job, I get "Unable to initialize nextflow environment" error !! Any idea what's going on? Are there ENV VARs I need to set that might differ between these run environments? I can't find much documentation on this type of behaviour ... any help is appreciated ...
Félix C. Morency
@fmorency
Aug 07 2017 14:45
@kevbrick can you post the complete log?
Paolo Di Tommaso
@pditommaso
Aug 07 2017 14:48
@LukeGoodsell I'm not understanding
@kevbrick what version are you using ?
kevbrick
@kevbrick
Aug 07 2017 14:50
Yup ... log below ...
Aug-07 10:34:46.154 [main] DEBUG nextflow.cli.Launcher - Setting http proxy: [dtn01-e0, 3128]
Aug-07 10:34:46.324 [main] DEBUG nextflow.cli.Launcher - Setting https proxy: [dtn01-e0, 3128]
Aug-07 10:34:46.324 [main] DEBUG nextflow.cli.Launcher - $> /usr/local/apps/nextflow/0.24.4/bin/nextflow run /data/RDCO/code/pipelines//bwaAlignPE_bam.groovy --bam /data/RDCO/code/pipelines/testPE.bam --name kevinTestPE2 --threads 6 --sample_name kevinTestPE2 --library kevinTestPE2 --rundate 000101 --outdir ./kevinTestPE2 --genome mm10
Aug-07 10:34:46.678 [main] INFO nextflow.cli.CmdRun - N E X T F L O W ~ version 0.24.4
Aug-07 10:34:46.730 [main] INFO nextflow.cli.CmdRun - Launching /data/RDCO/code/pipelines/bwaAlignPE_bam.groovy [elated_mahavira] - revision: 3736603faa
Aug-07 10:34:46.769 [main] DEBUG nextflow.config.ConfigBuilder - Found config base: /data/RDCO/code/pipelines/nextflow.config
Aug-07 10:34:46.775 [main] DEBUG nextflow.config.ConfigBuilder - Parsing config file: /data/RDCO/code/pipelines/nextflow.config
Aug-07 10:34:47.849 [main] DEBUG nextflow.config.ConfigBuilder - Setting config profile: 'standard'
Aug-07 10:34:48.127 [main] DEBUG nextflow.Session - Session uuid: fccbcab4-4423-4a52-9ed5-de874f06b450
Aug-07 10:34:48.127 [main] DEBUG nextflow.Session - Run name: elated_mahavira
Aug-07 10:34:48.128 [main] DEBUG nextflow.Session - Executor pool size: 56
Aug-07 10:34:48.196 [main] DEBUG nextflow.cli.CmdRun -
Version: 0.24.4 build 4341
Modified: 22-05-2017 11:18 UTC (07:18 EDT)
System: Linux 2.6.32-642.3.1.el6.x86_64
Runtime: Groovy 2.4.10 on Java HotSpot(TM) 64-Bit Server VM 1.8.0_11-b12
Encoding: UTF-8 (ANSI_X3.4-1968)
Process: 3412@cn3614 [10.2.10.186]
CPUs: 56 - Mem: 252.1 GB (64.4 GB) - Swap: 2 GB (2 GB)
Aug-07 10:34:48.281 [main] DEBUG nextflow.Session - Work-dir: /gpfs/gsfs2/users/RDCO/kevbrick/test/work [gpfs]
Aug-07 10:34:48.282 [main] DEBUG nextflow.Session - Script base path does not exist or is not a directory: /data/RDCO/code/pipelines/bin
Aug-07 10:34:48.678 [main] DEBUG nextflow.Session - Session start invoked
Aug-07 10:34:48.683 [main] DEBUG nextflow.processor.TaskDispatcher - Dispatcher > start
Aug-07 10:34:48.683 [main] DEBUG nextflow.script.ScriptRunner - > Script parsing
Aug-07 10:34:50.040 [main] DEBUG nextflow.script.ScriptRunner - > Launching execution
Aug-07 10:34:50.086 [main] INFO nextflow.Nextflow - ====================================================================
Aug-07 10:34:50.086 [main] INFO nextflow.Nextflow - BWA MEM PIPELINE : Map, mark duplicates, sort and index
Aug-07 10:34:50.086 [main] INFO nextflow.Nextflow - ====================================================================
Aug-07 10:34:50.086 [main] INFO nextflow.Nextflow - ref genome : mm10
Aug-07 10:34:50.086 [main] INFO nextflow.Nextflow - genome fasta : /data/RDCO/genomes//mm10/BWAIndex/version0.7.10/genome.fa
Aug-07 10:34:50.086 [main] INFO nextflow.Nextflow - bam : /data/RDCO/code/pipelines/testPE.bam
Aug-07 10:34:50.090 [main] INFO nextflow.Nextflow - RG:SM : kevinTestPE2
Aug-07 10:34:50.090 [main] INFO nextflow.Nextflow - RG:PL : ILLUMINA
Aug-07 10:34:50.090 [main] INFO nextflow.Nextflow - RG:PU : HISEQ2500
Aug-07 10:34:50.090 [main] INFO nextflow.Nextflow - RG:LB : kevinTestPE2
Aug-07 10:34:50.090 [main] INFO nextflow.Nextflow - RG:ID : kevinTestPE2
Aug-07 10:34:50.101 [main] INFO nextflow.Nextflow - RG:DT : 101
Aug-07 10:34:50.101 [main] INFO nextflow.Nextflow - outName : kevinTestPE2.bwaMemPE.mm10
Aug-07 10:34:50.101 [main] INFO nextflow.Nextflow - outdir : ./kevinTestPE2
Aug-07 10:34:50.101 [main] INFO nextflow.Nextflow - temp_dir : /tmp
Aug-07 10:34:50.102 [main] INFO nextflow.Nextflow - threads : 6
Aug-07 10:34:50.102 [main] INFO nextflow.Nextflow - mem : 16G
Aug-07 10:34:50.311 [main] DEBUG nextflow.processor.ProcessFactory - << taskConfig executor: slurm
Aug-07 10:34:50.312 [main] DEBUG nextflow.pr
Paolo Di Tommaso
@pditommaso
Aug 07 2017 14:51
well it's running if so .. :)
LukeGoodsell
@LukeGoodsell
Aug 07 2017 14:55
@pditommaso I’d like to have output files in a permanent location, and automatically re-used when re-running the pipeline (à la storeDir) but I want to have them created as links or symlinks, rather than copied (à la publishDir)
storeDir seems only to be able to copy the files, while publishDir files don’t get used when re-running the pipeline.
Simone Baffelli
@baffelli
Aug 07 2017 14:57
but nextflow automatically uses cached results if you use -resume
Paolo Di Tommaso
@pditommaso
Aug 07 2017 14:57
I see, nope that's not possible when using storeDir, basically because the task works in that folder
LukeGoodsell
@LukeGoodsell
Aug 07 2017 14:59
Yes, but I do want to re-run tasks that weren’t stored since the process script hasn’t changed, but a library in the scripts it calls has. Unless nextflow is clever enough to detect that a library in a called script has changed.
Simone Baffelli
@baffelli
Aug 07 2017 14:59
well that's exactly the same problem I have when I change the implementation of a script
I suppose a solution could be to send the library file over a channel
and combine it with the inputs of the process that uses it
LukeGoodsell
@LukeGoodsell
Aug 07 2017 15:01
Ugh
Simone Baffelli
@baffelli
Aug 07 2017 15:01
AFAIK nextflow does not check the implementation of your scripts
Paolo Di Tommaso
@pditommaso
Aug 07 2017 15:01
I think it's what we were talking about nextflow-io/nextflow#413
Simone Baffelli
@baffelli
Aug 07 2017 15:01
Agree
Paolo Di Tommaso
@pditommaso
Aug 07 2017 15:02
that would be a mess . .
Simone Baffelli
@baffelli
Aug 07 2017 15:02
it could be theoretically possible to hash the script I guess, but I don't know how feasible that would be and I suppose that could conflict with the normal hashing based on the command string
Because you may want it to consider both aspects
Paolo Di Tommaso
@pditommaso
Aug 07 2017 15:03
umm, there could be a solution
LukeGoodsell
@LukeGoodsell
Aug 07 2017 15:03
… I’m all eyes
Paolo Di Tommaso
@pditommaso
Aug 07 2017 15:04
provided the user scripts are located in the bin/ folder, NF could hash those scripts and manage the resume if they change
this could be done
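A rough sketch of the idea, assuming the scripts live in a bin/ folder and that a single digest over their relative paths and contents is enough to detect a change. This is not an existing Nextflow feature or API; `bin_dir_hash` and the script contents are hypothetical.

```python
import hashlib
import tempfile
from pathlib import Path


def bin_dir_hash(bin_dir):
    """Digest of every file under bin_dir: relative path plus file contents."""
    root = Path(bin_dir)
    h = hashlib.sha256()
    for path in sorted(root.rglob("*")):  # sorted for a stable traversal order
        if path.is_file():
            h.update(path.relative_to(root).as_posix().encode())
            h.update(b"\0")
            h.update(path.read_bytes())
    return h.hexdigest()


with tempfile.TemporaryDirectory() as tmp:
    bin_dir = Path(tmp) / "bin"
    bin_dir.mkdir()
    script = bin_dir / "align.sh"

    script.write_text("bwa mem ref.fa reads.fq\n")
    before = bin_dir_hash(bin_dir)
    unchanged = bin_dir_hash(bin_dir)

    script.write_text("bwa mem -t 6 ref.fa reads.fq\n")  # edit the script
    after = bin_dir_hash(bin_dir)

print(before == unchanged)  # untouched scripts: cached results stay valid
print(before == after)      # edited script: the task should be re-run
```

A digest like this only sees files inside bin/. A change in a library that a script imports from elsewhere would leave the digest untouched, so it would not be detected.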
LukeGoodsell
@LukeGoodsell
Aug 07 2017 15:05
The problem is that the script itself doesn’t change, but one of its dependencies does
Simone Baffelli
@baffelli
Aug 07 2017 15:05
well then you are out of luck
Paolo Di Tommaso
@pditommaso
Aug 07 2017 15:05
I tend to agree :/
Simone Baffelli
@baffelli
Aug 07 2017 15:06
because nf is language-agnostic, how could it check the dependencies?

@pditommaso should I open an issue for

provided the user scripts are located in the bin/ folder, NF could hash those scripts and manage the resume if they change

LukeGoodsell
@LukeGoodsell
Aug 07 2017 15:06
Just so I’m clear: is it possible to:
  1. Specify the permanent location of a file, and have nextflow link the output to that path rather than copy
  2. Check if the permanent location exists and reuse it if it does.
Paolo Di Tommaso
@pditommaso
Aug 07 2017 15:07
it's storeDir but it does not make the symlink
why do you need the link instead of a regular file ?
LukeGoodsell
@LukeGoodsell
Aug 07 2017 15:09
I don’t want to duplicate files that already exist. I’m dealing with multiple ~400 GB BAM files; copying the output is expensive
hardlinking would be perfect, symlinking would be acceptable.
Paolo Di Tommaso
@pditommaso
Aug 07 2017 15:10
the storeDir does not copy, just works there (unless you are using scratch, are you?)
Phil Ewels
@ewels
Aug 07 2017 15:19
@pditommaso - took me most of the afternoon, but now refactored :sweat_smile: ewels/NGI-MethylSeq@08415b7
Hopefully cleaner pipeline code now!
And easier to untangle as / when MultiQC can do this itself
(will depend on certain software authors including the software version in the log where it's not already)
Paolo Di Tommaso
@pditommaso
Aug 07 2017 15:20
great!
:)
Phil Ewels
@ewels
Aug 07 2017 15:21
hah, yup! ;)
@Shellfishgene - if you could pull the latest version and give that a try, that'd be great!
Hopefully works as intended now
Mike Smoot
@mes5k
Aug 07 2017 19:47

Hi @pditommaso I just started a pipeline and am watching stdout and I see this:

[warm up] executor > slurm
[skipping] Stored process > fetch_infernal_db (1)
[skipping] Stored process > fetch_blast_db_data (1)
[fa/bc719f] Submitted process > check_archetype_availability
[c7/c55c82] Submitted process > createExtrinsicFile
WARN: Process 'writeGenewiseYaml' cannot be executed by 'slurm' executor -- Using 'local' executor instead
[warm up] executor > local
[0a/f09d00] Submitted process > hintGffToGtf (trinity)
[8e/2cd7dc] Submitted process > hintGffToGtf (cufflinks)
[a8/91e76e] Submitted process > maskedFastaPassThru
[ef/755f70] Submitted process > cleanFasta
[8b/406e6f] Submitted process > createAugustusInputGffHints

Does this mean that because writeGenewiseYaml can't run with slurm, only that process will be executed locally or all processes will be executed locally? That executor > local line scares me...

Paolo Di Tommaso
@pditommaso
Aug 07 2017 19:48
yes, it's a process with groovy code, right ?
Mike Smoot
@mes5k
Aug 07 2017 19:48
exactly
yep, seeing a bunch of jobs getting submitted to the slurm partition.
Paolo Di Tommaso
@pditommaso
Aug 07 2017 19:53
if you don't want to see that message set executor 'local' in the process def
Mike Smoot
@mes5k
Aug 07 2017 19:54
Good tip, will do! I guess it's not that hard to figure out that it needs to run locally, but I'm still impressed that nextflow is that smart!
Paolo Di Tommaso
@pditommaso
Aug 07 2017 19:55
well, if you were running it with Ignite executor, it would run in a remote node ..
Mike Smoot
@mes5k
Aug 07 2017 19:59
If I installed nextflow on all the worker nodes, could this work with slurm too?
Paolo Di Tommaso
@pditommaso
Aug 07 2017 20:00
running it via slurm or independently ?
Mike Smoot
@mes5k
Aug 07 2017 20:01
via slurm
Paolo Di Tommaso
@pditommaso
Aug 07 2017 20:02
yes
you don't need to install it, slurm will deploy the daemons
Mike Smoot
@mes5k
Aug 07 2017 20:05
very interesting!
Félix C. Morency
@fmorency
Aug 07 2017 20:29
hey @mes5k, what's the status of #397 ?
I was reading the NF backlog and found interest in that issue
Paolo Di Tommaso
@pditommaso
Aug 07 2017 20:30
oh :)
Mike Smoot
@mes5k
Aug 07 2017 20:31
Sorry, I haven't had a chance to implement Paolo's suggestions. My time for hacking on nextflow is pretty limited, unfortunately.
Félix C. Morency
@fmorency
Aug 07 2017 20:31
Oh okay. Np. I had a couple of questions like I was wondering if it would be compatible with scratch
Mike Smoot
@mes5k
Aug 07 2017 20:34
So that your cached files would appear in scratch for worker nodes, or something along those lines?
Félix C. Morency
@fmorency
Aug 07 2017 20:35
Yes. Poke the cache on disk for said file and link it in scratch instead of downloading the file from the remote
Mike Smoot
@mes5k
Aug 07 2017 20:40
I hadn't thought about that, but I understand why you'd want it. Off the top of my head, it seems more natural to me to have a separate daemon running on worker nodes keeping scratch sync'd with a shared filesystem.
like rsync in a for loop
Félix C. Morency
@fmorency
Aug 07 2017 20:41
how would you do that for files produced by the pipeline and re-used in multiple tasks?
Paolo Di Tommaso
@pditommaso
Aug 07 2017 20:43
by using cacheableFile as I was suggesting NF will create the cache in the shared storage
Félix C. Morency
@fmorency
Aug 07 2017 20:44
where is that shared storage located?
Mike Smoot
@mes5k
Aug 07 2017 20:44
And then rsync the shared storage to the worker nodes. I suppose it's possible, but I'd want some hard numbers before I'd optimize like that.
Paolo Di Tommaso
@pditommaso
Aug 07 2017 20:45
the work dir ..
Félix C. Morency
@fmorency
Aug 07 2017 20:45
mmm my work dir is located on remote so it wouldn't help
Paolo Di Tommaso
@pditommaso
Aug 07 2017 20:46
oh
interesting use case, you may want to comment there
Félix C. Morency
@fmorency
Aug 07 2017 20:47
Will do.
Paolo Di Tommaso
@pditommaso
Aug 07 2017 20:47
see you guys, I've made my 18 hours ;)