These are chat archives for nextflow-io/nextflow

28th
Feb 2018
Bioninbo
@Bioninbo
Feb 28 2018 09:45
Hello. Is it possible to specify the directory in which to save the .nextflow.log files?
Paolo Di Tommaso
@pditommaso
Feb 28 2018 09:46
nextflow -log path/file.log run .. etc
Bioninbo
@Bioninbo
Feb 28 2018 09:50
Thanks. And to specify that in the config file?
Paolo Di Tommaso
@pditommaso
Feb 28 2018 09:51
nextflow -h is your friend :P
Bioninbo
@Bioninbo
Feb 28 2018 10:01
Sorry I don't get it. When I add log = "run_metrics/file.log" to the config file it does not create the log file
Paolo Di Tommaso
@pditommaso
Feb 28 2018 10:01
show me the full command line you have used?
Bioninbo
@Bioninbo
Feb 28 2018 10:02
import java.text.SimpleDateFormat
def date = new Date()
sdf = new SimpleDateFormat("MM-dd-yyyy_HH_mm_ss")
current_date = sdf.format(date)
log = "run_metrics/${current_date}/file.log"
Paolo Di Tommaso
@pditommaso
Feb 28 2018 10:02
triple ` then new-line
code
triple ` then new-line
Bioninbo
@Bioninbo
Feb 28 2018 10:05
this code works
timeline {
    enabled = true
    file = "run_metrics/${current_date}/timeline.html"
}
Paolo Di Tommaso
@pditommaso
Feb 28 2018 10:05
triple ` new-line
code
triple ` new-line
Bioninbo
@Bioninbo
Feb 28 2018 10:06
sorry
Paolo Di Tommaso
@pditommaso
Feb 28 2018 10:07
much better
then: the log file path cannot be defined programmatically in the config
only as a command line option, as I showed above
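e.g. to get a timestamped log path like the one you tried in the config, build it in the shell instead (a sketch, assuming bash; the path and script name are illustrative):
d="run_metrics/$(date +%m-%d-%Y_%H_%M_%S)"
mkdir -p "$d"
nextflow -log "$d/file.log" run main.nf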
Bioninbo
@Bioninbo
Feb 28 2018 10:08
Ok I see
Thanks for the clarification
Paolo Di Tommaso
@pditommaso
Feb 28 2018 10:09
welcome
marchoeppner
@marchoeppner
Feb 28 2018 13:34
has anyone tried using nextflow with singularity on a Slurm cluster? Specifically, I wanted to use the $TMPDIR variable set by Slurm to write some (duh!) temp data in one of my stages. But when running with singularity and nextflow, TMPDIR is unset - I suppose because it doesn't exist inside the singularity container. Any ideas of how to work around that?
the problem is that $TMPDIR is specific to each job (here: /scratch/slurm.<JOB_ID>.<USER_ID> or similar)
Maxime Garcia
@MaxUlysse
Feb 28 2018 13:35
@marchoeppner We're using slurm or local and singularity, but I'm afraid we never used $TMPDIR
Can't you specify your work directory in /scratch directly?
marchoeppner
@marchoeppner
Feb 28 2018 13:37
well yes, but the idea is that by using the built-in slurm tmp dir it automatically gets cleaned up after the job finishes
whereas I would need to ensure that it is cleaned up using some "after script" voodoo in nextflow. Possible, but a bit inconvenient (and wouldn't work if the job crashes)
Paolo Di Tommaso
@pditommaso
Feb 28 2018 13:37
the TMPDIR should be correctly handled
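btw, for node-local temp space there is also the scratch directive, e.g. a minimal config sketch (untested with singularity, so treat it as an assumption):
process {
    scratch = true   // run each task in a node-local temporary dir, then copy outputs back
}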
what version of NF are you using?
marchoeppner
@marchoeppner
Feb 28 2018 13:38
0.25.5
Paolo Di Tommaso
@pditommaso
Feb 28 2018 13:38
too old, update it, it's free ;)
marchoeppner
@marchoeppner
Feb 28 2018 13:39
yes, but the later versions break my code :D some tools generate report files with fixed names that I wish to pass to multiqc. But I think NF 0.26 and upwards do not allow files with the same name to exist within a stage
but I suppose I need to find a way around that..
Paolo Di Tommaso
@pditommaso
Feb 28 2018 13:41
well, this sounds strange; no breaking change was introduced recently, maybe you only need to update some piece of your code
marchoeppner
@marchoeppner
Feb 28 2018 13:42
I just tried it once, noticed the issue and couldn't be bothered to update. But if I need to, I am sure I can figure out a way
thanks!
Phil Ewels
@ewels
Feb 28 2018 14:26
We had the same thing. It’s not a breaking change - it’s just that now NF doesn’t allow you to do what was previously a silent error
That is - staging multiple files with identical names (overwriting randomly)
It’s an easy fix though :)
Put them into a directory pattern with a wildcard and NF does auto-increment counting
( @pditommaso - this is the thing we talked about for a new feature where NF can have the counter in the path and maintain the original base file name)
Phil Ewels
@ewels
Feb 28 2018 14:32
nextflow-io/nextflow#568
e.g. file ('alignment/??/*') from alignment_logs.collect() - should work from 0.28.1-SNAPSHOT onwards
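here's a fuller sketch of the consuming process (process and channel names are made up):
process multiqc {
    input:
    file ('logs/??/*') from alignment_logs.collect()

    output:
    file 'multiqc_report.html'

    """
    multiqc logs/
    """
}
each staged file should land in its own numbered sub-folder (logs/01/, logs/02/, ...) so identical base names no longer clash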
Simone Baffelli
@baffelli
Feb 28 2018 15:16
Hello! Is there something like workflow.onStart?
I have a very convoluted idea on how to deal with failed processes, which involves creating a file to store the paths of the files that caused the failure. I'd like to make sure that file exists before starting the workflow
Shawn Rynearson
@srynobio
Feb 28 2018 15:20

I have a question regarding data and how it's run on aws-batch. I noticed from the .command.run script that you copy over:

/home/ec2-user/miniconda/bin/aws --region us-west-2 s3 cp --only-show-errors s3://testbucket/work/my.bam my.bam

/home/ec2-user/miniconda/bin/aws --region us-west-2 s3 cp --only-show-errors s3://testbucket/work/54/1c29767397a8e492af6e9ebeba1df6/.command.sh .command.sh

The question I have is: is the data transferred into my /home/ec2-user/, and if so, does it require me to have a large root instance?

Also, is it possible to use ephemeral drives?
jncvee
@jncvee
Feb 28 2018 15:29
hi, if I want to run a nextflow script on a SLURM cluster, what command do I have to put in? Is it just sbatch then the file name? Or will I still need to put in ./nextflow run then the file name?
Shawn Rynearson
@srynobio
Feb 28 2018 15:31
@jncvee others can correct me, but yes you can; here are the docs on how to do it.
@jncvee in my config I have the following:
process {
    executor = 'slurm'

    $fastqc {
        errorStrategy = 'retry'
        clusterOptions = '-A myaccount -p mypartition'
    }
}
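@jncvee and to be clear: you don't sbatch the script yourself. With executor = 'slurm' in the config you launch the pipeline as usual, e.g.
./nextflow run my_script.nf
and nextflow takes care of submitting each process to Slurm via sbatch for you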
jncvee
@jncvee
Feb 28 2018 15:34
okay thank you! that helps a lot!
jncvee
@jncvee
Feb 28 2018 16:25
do all file outputs need a corresponding channel to go into?
jncvee
@jncvee
Feb 28 2018 16:37
Also, if I am using Python for part of a process, will I still be able to/need to use an input and output?
Simone Baffelli
@baffelli
Feb 28 2018 16:37
What do you mean?
Your python process must produce a file named according to the output specification
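e.g. a minimal sketch (names are made up):
process py_step {

    output:
    file 'result.txt' into result_ch

    """
    #!/usr/bin/env python
    with open('result.txt', 'w') as f:
        f.write('hello')
    """
}
the shebang makes the script block run under python, and the output block declares the file the script is expected to create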
Paolo Di Tommaso
@pditommaso
Feb 28 2018 16:42
@srynobio I'm a bit lost, is your question about Batch or Slurm?
Simone Baffelli
@baffelli
Feb 28 2018 17:01
@pditommaso is it possible that combine returns the same combinations twice?
Paolo Di Tommaso
@pditommaso
Feb 28 2018 17:04
uh, if there are duplicates in the source items, yes
numbers = Channel.from(1,2,3)
words = Channel.from('hello', 'ciao', 'ciao')
numbers
    .combine(words)
    .println()
which prints:
[1, hello]
[2, hello]
[3, hello]
[1, ciao]
[2, ciao]
[3, ciao]
[1, ciao]
[2, ciao]
[3, ciao]
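if you need each combination only once, you can drop the duplicates first with the unique operator, e.g.
numbers.combine(words.unique()).println()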
Simone Baffelli
@baffelli
Feb 28 2018 17:05
I see
That is the reason then
Phil Ewels
@ewels
Feb 28 2018 17:05
@jncvee - if it’s an intermediate file then no, you don’t need an output channel. If you want to use that file in other processes, or save it using publishDir, then it needs to be specified under output though
@jncvee - a bit difficult to understand the question though. More details / a pseudo code example may help :+1:
Simone Baffelli
@baffelli
Feb 28 2018 17:10
@pditommaso it is related to transpose
jncvee
@jncvee
Feb 28 2018 17:11
So I do want to save it using publishDir. Will the output need a channel to go into?
Simone Baffelli
@baffelli
Feb 28 2018 17:13
It cannot transpose a tuple of (number, [item1,item2])
Paolo Di Tommaso
@pditommaso
Feb 28 2018 17:21
frankly I don't know, test case => etc :)
Simone Baffelli
@baffelli
Feb 28 2018 17:21
I will
Stephen Kelly
@stevekm
Feb 28 2018 17:56
Hey @pditommaso do you have any suggestions on which kinds of questions should be posted here vs. Google Groups vs. GitHub Issue/Feature requests?
Paolo Di Tommaso
@pditommaso
Feb 28 2018 18:01
generally errors on GH, everything else here/google forum
Phil Ewels
@ewels
Feb 28 2018 18:22
@jncvee I don’t think so. It’ll need to be declared but I don’t think it needs a channel, no
@pditommaso - correct me if I’m wrong..
Paolo Di Tommaso
@pditommaso
Feb 28 2018 18:26
exactly
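e.g. a minimal sketch (hypothetical names):
process report {
    publishDir 'results'

    output:
    file 'report.txt'

    """
    echo hello > report.txt
    """
}
the file is declared under output so publishDir picks it up, but there's no into channel since nothing downstream consumes it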
Mike Smoot
@mes5k
Feb 28 2018 19:54
Is there any way for nextflow to behave differently based on the type of error that occurs when using errorStrategy retry? I'm running into a situation where sbatch (slurm executor) periodically fails for unknown reasons and a bunch of processes fail and then retry. Because this happens a lot, I eventually exhaust my retries and the pipeline dies. I can set higher maxRetries and maxErrors for everything, but I'd like something like maxRetries 10 for sbatch related failures and maxRetries 2 for all others. This would be a complete hack, so I'm totally OK with the answer being "NO"!
Paolo Di Tommaso
@pditommaso
Feb 28 2018 20:00
Mike Smoot
@mes5k
Feb 28 2018 20:03
See, this is why I asked! :)
Paolo Di Tommaso
@pditommaso
Feb 28 2018 20:03
lol, what's the problem with that?
Mike Smoot
@mes5k
Feb 28 2018 20:06
I just assumed my problem was esoteric enough that no one else had seen it.
Paolo Di Tommaso
@pditommaso
Feb 28 2018 20:08
never tried, but you should be able to set maxRetries { task.exitStatus == XXX ? n : m }
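i.e. something like this in the config (a sketch of the same idea, untested; the exit status value is hypothetical, use whatever your sbatch failures actually return):
process {
    errorStrategy = 'retry'
    maxRetries = { task.exitStatus == 140 ? 10 : 2 }   // 140 is a placeholder
}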
Mike Smoot
@mes5k
Feb 28 2018 20:13
That's what I'll try, but the task.exitStatus appears to be 1, so I'm not sure it'll be easy to distinguish my sbatch failures from others. I was just poking through the code to see if task has something else I could use to test.
Paolo Di Tommaso
@pditommaso
Feb 28 2018 20:13
nope :(
Mike Smoot
@mes5k
Feb 28 2018 20:15
is task.error populated?
I guess not
Paolo Di Tommaso
@pditommaso
Feb 28 2018 20:15
I don't think so
Mike Smoot
@mes5k
Feb 28 2018 20:20
Probably better for me to solve the root problem in any case. Thanks!
Paolo Di Tommaso
@pditommaso
Feb 28 2018 20:20
I guess so ..