These are chat archives for nextflow-io/nextflow

7th Mar 2018
Luca Cozzuto
@lucacozzuto
Mar 07 2018 10:24
Hi Paolo, I have a problem with escaping characters :)
PS: the previous problem was solved changing the queue system version :)
Paolo Di Tommaso
@pditommaso
Mar 07 2018 10:27
I have maaaaany problems ..
Evan Floden
@evanfloden
Mar 07 2018 10:28
But "escaping characters" ain’t one
Luca Cozzuto
@lucacozzuto
Mar 07 2018 10:30
yeah please do not escape from my escape problem! :9
:)
basically this
sed s/\\\[//g
is recognized as an error
Paolo Di Tommaso
@pditommaso
Mar 07 2018 10:33
you have to escape the escape .. so this should work
"""
sed s/\\[/g
"""
no wait
just twice
Luca Cozzuto
@lucacozzuto
Mar 07 2018 10:33
no
it fails
with 4 "\" it worked!
PS: I hate this :)
Paolo Di Tommaso
@pditommaso
Mar 07 2018 10:38
mmmm, guess the sed won't work as expected
Luca Cozzuto
@lucacozzuto
Mar 07 2018 10:41
no sorry it required 6 "\"!!!
Paolo Di Tommaso
@pditommaso
Mar 07 2018 10:41
what does the normal one look like?
Luca Cozzuto
@lucacozzuto
Mar 07 2018 10:42
uff... actually it's not working as expected
I'll try something different from sed
Paolo Di Tommaso
@pditommaso
Mar 07 2018 10:42
tell me the one working in bash
Luca Cozzuto
@lucacozzuto
Mar 07 2018 10:42
sed s/\\[//g
only two from shell
Paolo Di Tommaso
@pditommaso
Mar 07 2018 10:43
why double \ in the shell script ?
Luca Cozzuto
@lucacozzuto
Mar 07 2018 10:45
...
Paolo Di Tommaso
@pditommaso
Mar 07 2018 10:45
ok, you are right
productivity tip of the day
when you have regexp in the NF command use slashy strings ie
process foo {

  script:
  /
  echo 'ciao[' | sed s/\\[//g 
  /
}
shit not working :satisfied:
Luca Cozzuto
@lucacozzuto
Mar 07 2018 10:46
ehheheheheh
productivity tip of the day: try until it works :/
Paolo Di Tommaso
@pditommaso
Mar 07 2018 10:47
this works
process foo {
  script:
  /
  echo 'ciao[' | sed s@\\[@@g 
  /
}
in sed you can replace / with @
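The delimiter trick can be checked in plain bash, where only one level of escaping applies (the strings are the ones from the chat):

```shell
# '[' is special in sed's regex, so it needs one backslash in plain bash.
echo 'ciao[' | sed 's/\[//g'
# The s-command delimiter can be any character; '@' avoids clashing
# with the '/' that terminates a Groovy slashy string.
echo 'ciao[' | sed 's@\[@@g'
```

Both commands print `ciao`. Inside a Nextflow double-quoted script block each backslash must itself be escaped for Groovy, which is where the 4-vs-6 backslash confusion above came from; slashy strings avoid that extra layer.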
Luca Cozzuto
@lucacozzuto
Mar 07 2018 10:48
wonderful
Paolo Di Tommaso
@pditommaso
Mar 07 2018 10:48
a :beer: more + the one from last time = :beers:
Luca Cozzuto
@lucacozzuto
Mar 07 2018 10:49
you'll get drunk
Marco Capuccini
@mcapuccini
Mar 07 2018 15:32
Hello everyone!
Paolo Di Tommaso
@pditommaso
Mar 07 2018 15:33
Hey there
Marco Capuccini
@mcapuccini
Mar 07 2018 15:33
I was giving a look at the Nat. Nextflow publication. I have a question about the parallelization approach.
Paolo Di Tommaso
@pditommaso
Mar 07 2018 15:34
Yep
Marco Capuccini
@mcapuccini
Mar 07 2018 15:36
You use processes and channels to build the dep. graph, right? Then each process tracks the job in the cluster manager (e.g. Kubernetes), and the processes will live in the machine that launches the workflow?
Paolo Di Tommaso
@pditommaso
Mar 07 2018 15:37
If by process you mean the data structure, yes
The job obviously will run remotely
Marco Capuccini
@mcapuccini
Mar 07 2018 15:38
yep, I understand
What is the maximum parallelism level that you manage to handle? Luigi does something similar with workers, but I can't get over 300 simultaneous jobs.
Paolo Di Tommaso
@pditommaso
Mar 07 2018 15:39
Basically the application orchestrates the job executions
Marco Capuccini
@mcapuccini
Mar 07 2018 15:40
But is a thread/process spawned for each parallel job in order to track the real job that runs in the cluster manager?
Paolo Di Tommaso
@pditommaso
Mar 07 2018 15:41
Nope, behind the scenes there's a kind of actor
Marco Capuccini
@mcapuccini
Mar 07 2018 15:41
ok, that's a point to use Nextflow :)
Paolo Di Tommaso
@pditommaso
Mar 07 2018 15:41
You can safely run some millions of jobs in my experience
;)
There's a new K8s support ready to be released
Marco Capuccini
@mcapuccini
Mar 07 2018 15:42
good :)
Paolo Di Tommaso
@pditommaso
Mar 07 2018 15:43
You may be interested, and I would like to have your feedback
Marco Capuccini
@mcapuccini
Mar 07 2018 15:43
Sure!
Have you thought about a Spark runner? To get data locality and interoperability with the Hadoop ecosystem?
Paolo Di Tommaso
@pditommaso
Mar 07 2018 15:44
Can't send the link now, but if you check in the docs folder there's a Kubernetes document
Marco Capuccini
@mcapuccini
Mar 07 2018 15:44
Yes, I'll give it a look, I am fed up with Luigi :)
Paolo Di Tommaso
@pditommaso
Mar 07 2018 15:45
Spark should be the next one ..
Marco Capuccini
@mcapuccini
Mar 07 2018 15:46
I could contribute on that actually. Do you have documentation on "how to make a runner"? Or is there a place I could start?
Paolo Di Tommaso
@pditommaso
Mar 07 2018 15:47
That would be nice, the source is the documentation of course ;)
But I should have a skeleton somewhere
I'm with the mobile now, I can send some links later
Félix C. Morency
@fmorency
Mar 07 2018 15:48
@mcapuccini We gave up Luigi a year ago in favor of Nextflow (+ SLURM). Been there.
Paolo Di Tommaso
@pditommaso
Mar 07 2018 15:48
And we can eventually organise a call
Edgar
@edgano
Mar 07 2018 15:48
hey guys, do you know if I can use multiple conditions in the "search" text field of the report?
Paolo Di Tommaso
@pditommaso
Mar 07 2018 15:49
Don't think so
Marco Capuccini
@mcapuccini
Mar 07 2018 15:49
Sure! I'll send you a mail then!
Edgar
@edgano
Mar 07 2018 15:50
ok @pditommaso
Shellfishgene
@Shellfishgene
Mar 07 2018 16:20
If I set publishDir to "move", is the step repeated each time I run the pipeline again?
Paolo Di Tommaso
@pditommaso
Mar 07 2018 16:21
move invalidates the cache, it should be written in the docs
Shellfishgene
@Shellfishgene
Mar 07 2018 16:24
I see it now; only use it for a terminating process. Thanks.
Paolo Di Tommaso
@pditommaso
Mar 07 2018 16:24
:+1:
Shellfishgene
@Shellfishgene
Mar 07 2018 16:57
If I move the file back to the workdir, will it be cached again? Just so I don't have to run this again...
Paolo Di Tommaso
@pditommaso
Mar 07 2018 16:57
in principle yes, provided file timestamps have not changed
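The distinction discussed above can be sketched in a (illustrative, DSL1-style) process; the name and paths are made up:

```groovy
process align {
    // mode 'move' deletes the output from the work dir, so -resume can no
    // longer find it and the task re-runs: use it only for a final,
    // terminating process. 'copy' (or the default 'symlink') leaves the
    // work-dir copy in place, so the cache stays valid.
    publishDir 'results', mode: 'copy'

    input:
    file bam from sorted_bam

    output:
    file 'aligned.bam'

    script:
    """
    cp ${bam} aligned.bam
    """
}
```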
rfenouil
@rfenouil
Mar 07 2018 17:12
Hello @pditommaso , is there an easy way to automatically delete the 'work' subdirectories except the ones that were used for the last (successful) execution of the pipeline ?
The goal is to clean disk space for old executions but still allow a -resume for 'small' coming changes
Paolo Di Tommaso
@pditommaso
Mar 07 2018 17:13
nextflow clean -h
rfenouil
@rfenouil
Mar 07 2018 17:17
Awesome! Sorry, should have found this one by myself...
Thank you
Paolo Di Tommaso
@pditommaso
Mar 07 2018 17:17
:+1:
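For rfenouil's use case (keep only the last successful run, delete the rest), the relevant `nextflow clean` invocation looks roughly like this; the run name is a placeholder, real ones come from `nextflow log`:

```shell
# Dry run (-n): preview what would be deleted, keeping only the named run.
nextflow clean -but agitated_euler -n
# Then actually delete (-f = force).
nextflow clean -but agitated_euler -f
```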
Edgar
@edgano
Mar 07 2018 17:34
@pditommaso is there a way to have a "default" name for the report/trace?
The idea is to have the report/trace with the execution name and not trace.txt , trace.txt.1, ... Can I add something to the config file?
trace.file = 'foo'
or
trace.file = params.nameCommandLineOption
Edgar
@edgano
Mar 07 2018 17:38
yeah, but I can't use the unique name given by nextflow
Paolo Di Tommaso
@pditommaso
Mar 07 2018 17:38
nope :/
Edgar
@edgano
Mar 07 2018 17:39
because in the history we can identify runs by this unique name, and I'm thinking of using this unique name for the execution reports
Paolo Di Tommaso
@pditommaso
Mar 07 2018 17:39
true, makes sense; open an issue and implement it :)
Edgar
@edgano
Mar 07 2018 17:39
:smile:
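What is possible today is limited to static values or params in the config, as Edgar's snippet suggests; a sketch along those lines (the param name is illustrative, and the auto-generated run name is not available here):

```groovy
// nextflow.config -- a user-supplied parameter stands in for the
// auto-generated run name, which config cannot access (hypothetical).
params.runId = 'default'

trace {
    enabled = true
    file    = "trace_${params.runId}.txt"
}
report {
    enabled = true
    file    = "report_${params.runId}.html"
}
```

Invoked e.g. as `nextflow run main.nf --runId my_run`.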
Shellfishgene
@Shellfishgene
Mar 07 2018 18:22
What's wrong with this: set id, file("${id}_sort.bam") into sorted_bam, I'm getting "illegal string body character after dollar sign;"
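[Editor's note: that Groovy error typically comes not from the `set ... into` line itself but from a bare `$` elsewhere in a double-quoted string or script block; a hedged sketch of the usual fix, with a hypothetical script:]

```groovy
process sortBam {
    input:
    set id, file(bam) from bam_ch

    output:
    set id, file("${id}_sort.bam") into sorted_bam

    script:
    // In a double-quoted script block Groovy interpolates '$', so a
    // bash '$' must be written as '\$'; a bare '$' not followed by a
    // valid expression raises "illegal string body character after
    // dollar sign". (samtools command is illustrative only.)
    """
    samtools sort ${bam} -o ${id}_sort.bam
    echo "done at \$(date)"
    """
}
```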