These are chat archives for nextflow-io/nextflow

22nd
Nov 2017
Paolo Di Tommaso
@pditommaso
Nov 22 2017 09:26
nextflow-io/nextflow#529
Tim Diels
@timdiels
Nov 22 2017 12:07
I have an algorithm which does not always converge and has no timeout arg. Can I use Nextflow's time constraint to time it out without marking it as an error and without emitting the partial output on its output channels? It runs the algorithm separately for each input set.
Paolo Di Tommaso
@pditommaso
Nov 22 2017 12:20
yes, use errorStrategy 'ignore'
incomplete outputs are ignored
Tim Diels
@timdiels
Nov 22 2017 12:28
what if it fails due to out of memory or exits non-zero, don't those get ignored as well then?
Paolo Di Tommaso
@pditommaso
Nov 22 2017 12:29
yes
Tim Diels
@timdiels
Nov 22 2017 12:38
I think I'll put a timeout inside the script then, thanks.
Paolo Di Tommaso
@pditommaso
Nov 22 2017 12:38
what executor are you using ?
local ?
Tim Diels
@timdiels
Nov 22 2017 12:40
sge for now
Paolo Di Tommaso
@pditommaso
Nov 22 2017 12:42
if so, use the memory, time and errorStrategy 'ignore' directives
SGE will kill the job if it exceeds the mem or time limits
Tim Diels
@timdiels
Nov 22 2017 12:46
I only want to ignore timeouts, and perhaps out-of-memory errors, but preferably not real errors where the algorithm itself exits non-zero
Paolo Di Tommaso
@pditommaso
Nov 22 2017 12:46
as before
Tim Diels
@timdiels
Nov 22 2017 12:47
It does not ignore non-zero exit?
Paolo Di Tommaso
@pditommaso
Nov 22 2017 12:48
ahhh
now I get it right
time and mem errors are in the range 137-140
you can specify an error strategy as { task.exitStatus in 137..140 ? 'ignore' : 'terminate' }
Tim Diels
@timdiels
Nov 22 2017 12:54
ah perfect, thanks
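Putting the suggestions above together, a process definition might look like the following minimal sketch. The process name `converge`, the channel `input_sets`, the command `my_algorithm`, the file names, and the 2h / 8 GB limits are all hypothetical placeholders. The range check follows the 137-140 range quoted above for time and memory kills; the actual exit codes may differ depending on how the cluster is configured.

```groovy
// Hypothetical input channel: one file per input set.
input_sets = Channel.fromPath('data/*.in')

process converge {
    executor 'sge'

    // Placeholder limits; the scheduler kills the job when either is exceeded.
    time '2h'
    memory '8 GB'

    // Ignore only tasks killed for exceeding time/memory limits;
    // any other non-zero exit status still terminates the pipeline.
    errorStrategy { task.exitStatus in 137..140 ? 'ignore' : 'terminate' }

    input:
    file input_set from input_sets

    output:
    file 'result.txt' into results

    script:
    """
    my_algorithm ${input_set} > result.txt
    """
}
```

With this setup, a task that is ignored emits nothing on `results`, so downstream processes only see the input sets that converged in time.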
Maxime Garcia
@MaxUlysse
Nov 22 2017 13:05
@pditommaso I managed to advance with AWS Batch, I now have the error download failed: s3://xxx/.command.run to - An error occurred (403) when calling the HeadObject operation: Forbidden, so I'm guessing I have to get the correct security settings
Paolo Di Tommaso
@pditommaso
Nov 22 2017 13:07
Yep you need to grant S3 full permission
Maxime Garcia
@MaxUlysse
Nov 22 2017 13:10
Public access for everyone?
Paolo Di Tommaso
@pditommaso
Nov 22 2017 13:11
Nono!
You need an IAM policy
I can send in 30 mins
Maxime Garcia
@MaxUlysse
Nov 22 2017 13:12
ok
Perfect
Thanks a lot
Paolo Di Tommaso
@pditommaso
Nov 22 2017 14:01
so
Batch needs three roles: Service role, Instance role, Spot fleet role
Paolo Di Tommaso
@pditommaso
Nov 22 2017 14:07
for the Service role you need a role with the AWSBatchServiceRole IAM policy
for the Instance role add these policies: AmazonS3FullAccess and AmazonEC2ContainerServiceforEC2Role
finally, for the Spot fleet role, you need a role with the AmazonEC2SpotFleetRole policy
ping @MaxUlysse
Maxime Garcia
@MaxUlysse
Nov 22 2017 14:13
I had the service role and the spot fleet role right, but not the instance role
Thanks
Maxime Garcia
@MaxUlysse
Nov 22 2017 14:50
OK, so I now have a missing output file for my process, so that's good progress
Maxime Garcia
@MaxUlysse
Nov 22 2017 15:38
So it seems that no file was copied from my S3 bucket into the work directory
But I did get all the .command.* files in my S3 work directory
So I think I'm on a good path
Paolo Di Tommaso
@pditommaso
Nov 22 2017 15:40
ok, when you post a question I will reply :)
Maxime Garcia
@MaxUlysse
Nov 22 2017 15:41
lol
Do you know which policies/roles, or what else, I can verify to see why it's not working?
Paolo Di Tommaso
@pditommaso
Nov 22 2017 15:43
so, repeat again what's the problem ?
Maxime Garcia
@MaxUlysse
Nov 22 2017 15:44
I can write into my s3 bucket for the work directory, but it seems that I can't read the files that are inside my input s3 bucket
Paolo Di Tommaso
@pditommaso
Nov 22 2017 15:45
any error message in the task log files ?
Maxime Garcia
@MaxUlysse
Nov 22 2017 15:46
The user-provided path *_fastqc.{zip,html} does not exist.
but there is no input file
Paolo Di Tommaso
@pditommaso
Nov 22 2017 15:47
can you dump the content of the .command.run
Meeting
Maxime Garcia
@MaxUlysse
Nov 22 2017 15:51
No problem
Sorry, I had trouble with the Linux Gitter app, it did not want to upload my file
John C. Earls
@JohnCEarls
Nov 22 2017 16:40

I'm trying to figure out the best way to do something. I have a set of channels that all output similarly formatted data. I want to conditionally, based on some param, run a process on those mixed channels.

I was creating an empty channel and then using mix in the from clause.

hist = Channel.empty()

process whatever{

input:
    set val(data_source), val(process_source), file('table.csv') from hist.mix(db_hist, dn_hist, gd_hist, sd_hist)
...
}

And that worked well enough when I always wanted to run it. I would like to set a parameter that only conditionally runs it, so I tried:

hist = Channel.empty()
if(params.write_hist){
    hist.mix(db_hist, dn_hist, gd_hist, sd_hist)
}
process whatever{

input:
    set val(data_source), val(process_source), file('table.csv') from hist
...
}

but this blew up on me.

I'm probably way off base with how I am doing this, so I wanted to see how I should be going about it.

John C. Earls
@JohnCEarls
Nov 22 2017 16:46

sry, second one should have been

```
hist = Channel.empty()
if(params.write_hist){
    hist.mix(db_hist, dn_hist, gd_hist, sd_hist)
}
process whatever{

input:
    set val(data_source), val(process_source), file('table.csv') from hist
...
}
```

Mike Smoot
@mes5k
Nov 22 2017 16:47
By blow up do you mean when params.write_hist is true, your process doesn't run?
John C. Earls
@JohnCEarls
Nov 22 2017 16:49
yes, it complains that hist is used twice
ERROR ~ Channel `hist` has been used twice as an input by process `histogram_generation` and another operator
Mike Smoot
@mes5k
Nov 22 2017 16:50
I'd write that like:
if (params.write_hist) {
    db_hist.mix(dn_hist, gd_hist, sd_hist).set{ hist }
} else {
    Channel.empty().set{ hist }
}
process whatever{

input:
    set val(data_source), val(process_source), file('table.csv') from hist
...
}
John C. Earls
@JohnCEarls
Nov 22 2017 16:51
Thanks, I'll try that.
Mike Smoot
@mes5k
Nov 22 2017 16:51
In your example hist gets consumed by the mix command. You need to set the result of mix to something.
by something I mean a new channel
John C. Earls
@JohnCEarls
Nov 22 2017 16:53
does hist need to be initialized or does set create it?
Mike Smoot
@mes5k
Nov 22 2017 16:55
set creates it
John C. Earls
@JohnCEarls
Nov 22 2017 16:58
Thank you. Your name seems familiar and it looks like you are a Cytoscape dev (I checked out your GitHub). Are you one of Trey Ideker's guys?
Mike Smoot
@mes5k
Nov 22 2017 16:59
Well, set doesn't create the channel, but gives it a name so that it can be used. Channel.empty() creates the channel and returns it. set takes the returned channel and gives it a name.
Yup, I worked for Trey on Cytoscape for several years. Now I'm at Synthetic Genomics.
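To illustrate the point about `set` only naming a channel: an ordinary assignment names the result just as well, so the conditional can also be written as one expression. This is a minimal sketch, assuming the four `*_hist` channels from the example above already exist and that `params.write_hist` is a boolean; the `wc -l` command is only a placeholder script body.

```groovy
// mix returns a new channel; the ternary binds either that channel
// or an empty one to the name hist, which is then consumed exactly once.
hist = params.write_hist ? db_hist.mix(dn_hist, gd_hist, sd_hist) : Channel.empty()

process whatever {
    input:
    set val(data_source), val(process_source), file('table.csv') from hist

    script:
    """
    wc -l table.csv
    """
}
```

When `params.write_hist` is false, `hist` is empty and the process simply never runs.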
John C. Earls
@JohnCEarls
Nov 22 2017 16:59
BTW, that worked like a charm.
Mike Smoot
@mes5k
Nov 22 2017 17:00
great!
John C. Earls
@JohnCEarls
Nov 22 2017 17:00
Awesome, I am from the Institute for Systems Biology. Paul Shannon is in my lab and we are currently working with Trey on his "deepTranslate" project.
Paolo Di Tommaso
@pditommaso
Nov 22 2017 17:01
cool people meet on NF channel :satisfied:
John C. Earls
@JohnCEarls
Nov 22 2017 17:01
:) Well thanks again.
Mike Smoot
@mes5k
Nov 22 2017 17:02
Very cool! Trey is a great guy to work with.
John C. Earls
@JohnCEarls
Nov 22 2017 17:03
Super smart and seems like a lot of fun. Good enthusiasm.