These are chat archives for nextflow-io/nextflow

7th
Mar 2018
Luca Cozzuto
@lucacozzuto
Mar 07 2018 10:24 UTC
Hi Paolo, I have a problem with escaping characters :)
PS: the previous problem was solved changing the queue system version :)
Paolo Di Tommaso
@pditommaso
Mar 07 2018 10:27 UTC
I have maaaaany problems ..
Evan Floden
@evanfloden
Mar 07 2018 10:28 UTC
But "escaping characters" ain’t one
Luca Cozzuto
@lucacozzuto
Mar 07 2018 10:30 UTC
yeah please do not escape from my escape problem! :9
:)
basically this
sed s/\\\[//g
is recognized as an error
Paolo Di Tommaso
@pditommaso
Mar 07 2018 10:33 UTC
you have to escape the escape .. so this should work
"""
sed s/\\[//g
"""
no wait
just twice
Luca Cozzuto
@lucacozzuto
Mar 07 2018 10:33 UTC
no
it fails
with 4 "\" it worked!
PS: I hate this :)
Paolo Di Tommaso
@pditommaso
Mar 07 2018 10:38 UTC
mmmm, guess the sed won't work as expected
Luca Cozzuto
@lucacozzuto
Mar 07 2018 10:41 UTC
no sorry it required 6 "\"!!!
Paolo Di Tommaso
@pditommaso
Mar 07 2018 10:41 UTC
what's the normal one?
Luca Cozzuto
@lucacozzuto
Mar 07 2018 10:42 UTC
uff... actually it's not working as expected
I'll try something different from sed
Paolo Di Tommaso
@pditommaso
Mar 07 2018 10:42 UTC
tell me the one working in bash
Luca Cozzuto
@lucacozzuto
Mar 07 2018 10:42 UTC
sed s/\\[//g
only two from the shell
Paolo Di Tommaso
@pditommaso
Mar 07 2018 10:43 UTC
why the double \ in the shell script?
Luca Cozzuto
@lucacozzuto
Mar 07 2018 10:45 UTC
...
Paolo Di Tommaso
@pditommaso
Mar 07 2018 10:45 UTC
ok, you are right
productivity tip of the day
when you have a regexp in the NF command use slashy strings, i.e.
process foo {

  script:
  /
  echo 'ciao[' | sed s/\\[//g 
  /
}
shit not working :satisfied:
Luca Cozzuto
@lucacozzuto
Mar 07 2018 10:46 UTC
ehheheheheh
productivity tip of the day: try until it works :/
Paolo Di Tommaso
@pditommaso
Mar 07 2018 10:47 UTC
this works
process foo {
  script:
  /
  echo 'ciao[' | sed s@\\[@@g 
  /
}
in sed you can use @ (or almost any other character) as the delimiter instead of /
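For contrast, a minimal sketch (not from the chat; the process name is hypothetical) of the same command in a regular double-quoted script block, where the Groovy string adds its own escaping layer on top of Bash:

process foo_quoted {   // hypothetical, shown only for contrast with the slashy-string version
  script:
  """
  echo 'ciao[' | sed 's@\\[@@g'
  """
}

Groovy collapses the double backslash into a single one and the Bash single quotes protect what is left, so sed finally receives a pattern matching a literal open bracket. Unquoted in Bash the same command needs yet another level of backslashes, hence the escalating backslash counts above; the slashy string / ... / skips the Groovy escaping layer entirely.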
Luca Cozzuto
@lucacozzuto
Mar 07 2018 10:48 UTC
wonderful
Paolo Di Tommaso
@pditommaso
Mar 07 2018 10:48 UTC
one more :beer: + the one from last time = :beers:
Luca Cozzuto
@lucacozzuto
Mar 07 2018 10:49 UTC
you'll get drunk
Marco Capuccini
@mcapuccini
Mar 07 2018 15:32 UTC
Hello everyone!
Paolo Di Tommaso
@pditommaso
Mar 07 2018 15:33 UTC
Hey there
Marco Capuccini
@mcapuccini
Mar 07 2018 15:33 UTC
I was taking a look at the Nextflow publication in Nature Biotechnology. I have a question about the parallelization approach.
Paolo Di Tommaso
@pditommaso
Mar 07 2018 15:34 UTC
Yep
Marco Capuccini
@mcapuccini
Mar 07 2018 15:36 UTC
You use processes and channels to build the dependency graph, right? Then each process tracks its job in the cluster manager (e.g. Kubernetes), and the processes live on the machine that launches the workflow?
Paolo Di Tommaso
@pditommaso
Mar 07 2018 15:37 UTC
If by process you mean the data structure, yes
The job will obviously run remotely
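As an aside, a minimal config sketch (queue name hypothetical) of how that split works in practice: the Nextflow driver stays on the submitting machine, and the executor setting alone decides where each task is actually run:

process {
  executor = 'slurm'   // or 'k8s', 'sge', 'awsbatch', ...: tasks are submitted to the cluster manager
  queue    = 'long'    // hypothetical queue name, used by grid executors
}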
Marco Capuccini
@mcapuccini
Mar 07 2018 15:38 UTC
yep, I understand
What is the maximum parallelism level that you manage to handle? Luigi does something similar with workers, but I can't get over 300 simultaneous jobs.
Paolo Di Tommaso
@pditommaso
Mar 07 2018 15:39 UTC
Basically the application orchestrates the job executions
Marco Capuccini
@mcapuccini
Mar 07 2018 15:40 UTC
But is a thread/process spawned for each parallel job in order to track the real job that runs in the cluster manager?
Paolo Di Tommaso
@pditommaso
Mar 07 2018 15:41 UTC
Nope, behind the scenes there's a kind of actor
Marco Capuccini
@mcapuccini
Mar 07 2018 15:41 UTC
ok, that's a point in favour of using Nextflow :)
Paolo Di Tommaso
@pditommaso
Mar 07 2018 15:41 UTC
You can safely run some millions of jobs, in my experience
;)
There's a new K8s support ready to be released
Marco Capuccini
@mcapuccini
Mar 07 2018 15:42 UTC
good :)
Paolo Di Tommaso
@pditommaso
Mar 07 2018 15:43 UTC
You may be interested, and I would like to have your feedback
Marco Capuccini
@mcapuccini
Mar 07 2018 15:43 UTC
Sure!
Have you thought about a Spark runner, to get data locality and interoperability with the Hadoop ecosystem?
Paolo Di Tommaso
@pditommaso
Mar 07 2018 15:44 UTC
Can't send the link now, but if you check in the docs folder there's a Kubernetes document
Marco Capuccini
@mcapuccini
Mar 07 2018 15:44 UTC
Yes, I'll give it a look, I am fed up with Luigi :)
Paolo Di Tommaso
@pditommaso
Mar 07 2018 15:45 UTC
Spark should be the next one ..
Marco Capuccini
@mcapuccini
Mar 07 2018 15:46 UTC
I could actually contribute to that. Do you have documentation on "how to make a runner"? Or is there a place I could start?
Paolo Di Tommaso
@pditommaso
Mar 07 2018 15:47 UTC
That would be nice, the source is the documentation of course ;)
But I should have a skeleton somewhere
I'm with the mobile now, I can send some links later
Félix C. Morency
@fmorency
Mar 07 2018 15:48 UTC
@mcapuccini We gave up Luigi a year ago in favor of Nextflow (+ SLURM). Been there.
Paolo Di Tommaso
@pditommaso
Mar 07 2018 15:48 UTC
And we could possibly organise a call
Edgar
@edgano
Mar 07 2018 15:48 UTC
hey guys, do you know if I can use multiple conditions in the "search" text field of the report?
Paolo Di Tommaso
@pditommaso
Mar 07 2018 15:49 UTC
Don't think so
Marco Capuccini
@mcapuccini
Mar 07 2018 15:49 UTC
Sure! I'll send you a mail then!
Edgar
@edgano
Mar 07 2018 15:50 UTC
ok @pditommaso
Shellfishgene
@Shellfishgene
Mar 07 2018 16:20 UTC
If I set publishDir to "move", is the step repeated each time I run the pipeline again?
Paolo Di Tommaso
@pditommaso
Mar 07 2018 16:21 UTC
move invalidates the cache, it should be written in the docs
Shellfishgene
@Shellfishgene
Mar 07 2018 16:24 UTC
I see it now; only use it for a terminating process. Thanks.
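A minimal sketch (process details hypothetical) of the distinction: mode: 'copy' leaves the original output in the work directory, so the cache and -resume keep working, while mode: 'move' removes it and should only be used when no downstream process (or resumed run) needs the file:

process sort_bam {
  publishDir 'results', mode: 'copy'   // 'move' would remove the work-dir file and invalidate the cache

  output:
  file 'out.bam' into bam_ch

  script:
  """
  touch out.bam
  """
}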
Paolo Di Tommaso
@pditommaso
Mar 07 2018 16:24 UTC
:+1:
Shellfishgene
@Shellfishgene
Mar 07 2018 16:57 UTC
If I move the file back to the workdir, will it be cached again? Just so I don't have to run this again...
Paolo Di Tommaso
@pditommaso
Mar 07 2018 16:57 UTC
in principle yes, provided file timestamps have not changed
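A sketch of the manual restore (paths hypothetical); a plain mv preserves the modification time, which is what the cache check compares along with the file size:

mv results/sample_sort.bam work/<hash>/sample_sort.bam   # <hash> stands for the task's original two-level work directory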
rfenouil
@rfenouil
Mar 07 2018 17:12 UTC
Hello @pditommaso, is there an easy way to automatically delete the 'work' subdirectories, except the ones that were used for the last (successful) execution of the pipeline?
The goal is to free disk space from old executions while still allowing a -resume for small upcoming changes
Paolo Di Tommaso
@pditommaso
Mar 07 2018 17:13 UTC
nextflow clean -h
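A minimal usage sketch of what that enables (the run name gloomy_curie is hypothetical), assuming the clean options -n (dry run), -f (force) and -but (keep the named run) shown by the help:

nextflow log                          # list previous runs and their generated names
nextflow clean -n -but gloomy_curie   # dry run: show what would be removed, keeping that run's work dirs
nextflow clean -f -but gloomy_curie   # actually delete the work dirs of every other run, so -resume still works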
rfenouil
@rfenouil
Mar 07 2018 17:17 UTC
Awesome! Sorry, I should have found that one myself...
Thank you
Paolo Di Tommaso
@pditommaso
Mar 07 2018 17:17 UTC
:+1:
Edgar
@edgano
Mar 07 2018 17:34 UTC
@pditommaso is there a way to have a "default" name for the report/trace?
The idea is to have the report/trace use the execution name instead of trace.txt, trace.txt.1, ... Can I add something to the config file?
trace.file = 'foo'
or
trace.file = params.nameCommandLineOption
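For reference, a minimal config sketch of the settings in question (the params.runName parameter is hypothetical and would be supplied on the command line); a user-defined param can be interpolated this way, but the auto-generated run name is not available in the config:

trace {
  enabled = true
  file    = "${params.runName}_trace.txt"
}
report {
  enabled = true
  file    = "${params.runName}_report.html"
}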
Edgar
@edgano
Mar 07 2018 17:38 UTC
yeah, but I can't use the unique name given by nextflow
Paolo Di Tommaso
@pditommaso
Mar 07 2018 17:38 UTC
nope :/
Edgar
@edgano
Mar 07 2018 17:39 UTC
because in the history we can identify the runs by this unique name, and I'm thinking of using it for the execution reports too
Paolo Di Tommaso
@pditommaso
Mar 07 2018 17:39 UTC
true, makes sense; open an issue and implement it :)
Edgar
@edgano
Mar 07 2018 17:39 UTC
:smile:
Shellfishgene
@Shellfishgene
Mar 07 2018 18:22 UTC
What's wrong with this: set id, file("${id}_sort.bam") into sorted_bam? I'm getting "illegal string body character after dollar sign;"
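That output declaration is valid Groovy ($ followed by { is an interpolation); this compiler error usually points at a $ elsewhere in a double-quoted string that is not followed by a valid identifier or {, typically a Bash or awk construct such as $1 or $(...), which must be escaped as \$. A minimal sketch (commands and channel names hypothetical):

process sort_reads {
  input:
  set id, file(bam) from bam_ch

  output:
  set id, file("${id}_sort.bam") into sorted_bam   // fine: ${id} is a Groovy interpolation

  script:
  """
  samtools sort ${bam} > ${id}_sort.bam
  samtools view ${id}_sort.bam | awk '{ print \$1 }'   # an unescaped \$1 here triggers exactly that error
  """
}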