These are chat archives for nextflow-io/nextflow

15th
Aug 2018
Venkat Malladi
@vsmalladi
Aug 15 2018 02:09
@mes5k that worked, didn't know why I tried readLines
Shellfishgene
@Shellfishgene
Aug 15 2018 07:20
Is there a way to get an overview of which jobs finished successfully and which did not after a nextflow run?
tbugfinder
@tbugfinder
Aug 15 2018 08:32
nextflow log ... gives you many options to filter.
Jens Preußner
@jenzopr
Aug 15 2018 09:30
Hi all! Is it possible to have process script templates in remote locations (e.g. a github repository)? Any experience with that?
Jens Preußner
@jenzopr
Aug 15 2018 09:46
I guess the question boils down to: Is file(...) usable in script: template file('...')?
micans
@micans
Aug 15 2018 09:50
readLines() is a Groovy method, splitText() is a NF operator.
I use them both
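For example (just a quick sketch with a placeholder file name):

// readLines(): a Groovy/Nextflow file method that returns all lines as a list
def lines = file('samples.txt').readLines()
println lines.size()

// splitText(): a Nextflow operator that emits each line as a separate channel item
Channel
    .fromPath('samples.txt')
    .splitText()
    .subscribe { println it }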
Sander Bervoets
@Biocentric
Aug 15 2018 09:52
@tbugfinder Instance type: c4.xlarge and job definitions: either nextflow_rnaseq or nf-nextflow-rnaseq-nf
Alper Yilmaz
@alperyilmaz
Aug 15 2018 10:51

Hi, I'm trying to implement AWS Batch with Nextflow and I'm having trouble with the spotPrice option in the config. My repo is alperyilmaz/nextflow-rnaseq, which is a copy of the nextflow-io/rnaseq-encode-nf tutorial with some bits from fstrozzi/rnaseq-encode-nf.

When I run Nextflow on the master node, it gets stuck after the index and parseEncode steps:

[b8/49cd21] Submitted process > index (Homo_sapiens.GRCh38.cdna.all.fa.gz)
[dd/c200f0] Submitted process > parseEncode (/home/ec2-user/.nextflow/assets/alperyilmaz/nextflow-rnaseq/data/metadata.small.tsv)

There are no error messages printed on screen. When I go to the AWS console I notice that there are failed Spot requests.

In the current config there are two spotPrice lines in the cloud section. The first one works: the master node is launched with a max price of $1. But in the autoscale part, spotPrice = 1 is ignored and the fleet is requested with very low maximum price bids.

cloud {
    imageId = 'ami-7c97d804'
    instanceType = 'm4.xlarge'
    spotPrice = 1
    securityGroup = 'sg-8b3100fa'
    keyName = 'alper-uswest2-keypair'
    userName = 'ec2-user'
    autoscale {
        enabled = true
        minInstances = 5
        maxInstances = 10
        imageId = 'ami-7c97d804'
        instanceType = 'm4.4xlarge'
        spotPrice = 1
        terminateWhenIdle = true
    }
}

As seen below, two different fleets were requested.

[screenshot: Spot Request Error]

The fleets consisted of multiple instance types with different weights.

# Spot request 1
Allocation strategy: lowestPrice

Instance type(s): m4.2xlarge weight=1 $0.12000000000, 
                  r4.2xlarge weight=1 $0.1200000000, 
                  m4.4xlarge weight=2 $0.0600000000, 
                  r4.4xlarge weight=2 $0.0600000000

Max price: $0.12


# Spot request 2
Allocation strategy: lowestPrice

Instance type(s): m4.large weight=1 $0.03000000000, 
                  r4.large weight=1 $0.0300000000, 
                  m4.xlarge weight=2 $0.0150000000, 
                  r4.xlarge weight=2 $0.0150000000

Max price: $0.03

I tried changing the spotPrice value within autoscale (or even removing it completely), but each time two fleets with maximum prices of $0.12 and $0.03 were requested. How can I troubleshoot this problem?

Shellfishgene
@Shellfishgene
Aug 15 2018 11:00
@tbugfinder How do I use nextflow log to look at the jobs of a single run?
Kevin Sayers
@KevinSayers
Aug 15 2018 11:04
@Shellfishgene are you looking for something like https://www.nextflow.io/docs/latest/tracing.html#tasks which shows each process?
Shellfishgene
@Shellfishgene
Aug 15 2018 11:06
That's an option if I remember to use -with-report. I was more looking for something like the messages during the run: which jobs were canceled, which retried and so on.
Or rather -with-trace. I'll just try to remember to use that from now on...
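(For reference, the trace report can also be switched on permanently in nextflow.config, so there's nothing to remember on the command line. A small sketch; the file name is just an example:)

trace {
    enabled = true
    file = 'pipeline_trace.txt'
}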
Alper Yilmaz
@alperyilmaz
Aug 15 2018 11:13
As far as I understand, @Shellfishgene is looking for a live report, which is available via the -with-weblog option maybe?
Shellfishgene
@Shellfishgene
Aug 15 2018 11:14
No, just a report like the one -with-trace gives, just without having to give the option ;). I thought log maybe provided more detail, but I guess not.
Alper Yilmaz
@alperyilmaz
Aug 15 2018 11:43
@apeltzer, in your qbicsoftware/icgc-featurecounts example the awsbatch.config file has a batch{ ... } section. In my example (pasted above) I only have aws{ ... } and cloud{ ... } sections. Should I move the cloud info under the batch section?
Alexander Peltzer
@apeltzer
Aug 15 2018 13:03
Hi Alper!
Sorry was in meetings till now
IIRC the aws{} and cloud{} scopes won't work with AWS Batch
the batch{} scope doesn't exist, so that config is currently incorrect. You could do something like this instead, however: https://github.com/nf-core/rnaseq/blob/master/conf/awsbatch.config
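i.e. roughly this shape (a minimal sketch, not a copy of the linked file; the queue name and bucket are placeholders you'd replace with your own):

process {
    executor = 'awsbatch'
    queue = 'my-batch-queue'   // the AWS Batch job queue you created
}

// the awsbatch executor needs an S3 work directory
workDir = 's3://my-bucket/work'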
Shellfishgene
@Shellfishgene
Aug 15 2018 14:09
How do I make sure files from three channels are for the same sample? For example this input mixes different samples:
input:
   set val( sample ), file( "${sample}.bam" ) from indexed_bam
   set val( sample ), file( "${sample}.bam.bai" ) from bam_index
   set val( sample ), file( "${sample}.split.bam" ) from split_bam
Alper Yilmaz
@alperyilmaz
Aug 15 2018 14:09
@apeltzer, thanks for the info. In the example you sent, I can see params like max_memory and max_cpu, but where can I put options like spotPrice, instanceType, and imageId?
Alexander Peltzer
@apeltzer
Aug 15 2018 14:09
That's stuff you'd need to specify when setting up your JobQueue and ComputeEnvironment on AWS Batch directly
you can specify e.g. a custom AMI ID there directly
I'll push out a hands-on how-to in the next few days when I find the time to polish it
Alper Yilmaz
@alperyilmaz
Aug 15 2018 14:10
oh my! Do I need to specify them on the AWS Batch page?
Alexander Peltzer
@apeltzer
Aug 15 2018 14:10
Yes
Alper Yilmaz
@alperyilmaz
Aug 15 2018 14:11
Then, what is the next least complicated way to do many alignments on AWS, using the cloud and aws scopes?
Alexander Peltzer
@apeltzer
Aug 15 2018 14:12
AWS Batch would be the easiest way in my opinion
Took me quite some time though too
Alper Yilmaz
@alperyilmaz
Aug 15 2018 14:12
If it took time for you, it will take ages for me :(
Maxime Garcia
@MaxUlysse
Aug 15 2018 14:12
I agree with @apeltzer AWS Batch is easier
they do have some good docs about how to set everything up
Alper Yilmaz
@alperyilmaz
Aug 15 2018 14:13
If I use aws and cloud scopes, will NF take care of failed instances/jobs/commands?
Shellfishgene
@Shellfishgene
Aug 15 2018 15:02

Why does nf produce two copies of all input files with numbers appended when I define the input like this:

input:
   set val( sample ), file( indexed_bam ) from indexed_bam
   file( "${sample}.bam.bai" ) from bam_index
   file( "${sample}.split.bam" ) from split_bam

I get X.bam.bai1 and X.bam.bai2 and the same for split.bam.

Karin Lagesen
@karinlag
Aug 15 2018 15:16
@Shellfishgene can you show how you specify bam_index and split_bam?
Shellfishgene
@Shellfishgene
Aug 15 2018 15:17
output:
   set val( sample ), file( "${sample}.split.bam" ) into split_bam

   """
   samtools view -Sb@ ${task.cpus} ${split_sam} > ${sample}.split.bam
   """
Similar for the index file.
@karinlag I just saw others do this by combining channels first on the sample id, is that necessary?
Karin Lagesen
@karinlag
Aug 15 2018 15:22
I suspect, but am not completely sure, that you can't access a variable that is set in the same input directive like that
another question: are the X.bam.bai1 and X.bam.bai2 identical or different?
you are getting two files in there somehow, not sure how.
Shellfishgene
@Shellfishgene
Aug 15 2018 15:26
They are linked to two different files, one version is linked to ...8cb6cca/X.bam.bai and the other to ...56098123432/input.2. Not sure what that means.
Karin Lagesen
@karinlag
Aug 15 2018 15:28
so, you have two different processes putting things into the same channel, and managing to get them to mix
I suggest going with putting all files that belong together into the same channel
Shellfishgene
@Shellfishgene
Aug 15 2018 15:29

Well, now I combine like this:

indexed_bam
    .combine(bam_index, by:0)
    .combine(split_bam, by:0)
    .set {discover_files}

and then do
set val( sample ), file( indexed_bam ), file( bam_index ), file( split_bam ) from discover_files
This seems to work
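(For what it's worth, a slightly shorter variant is join, which matches tuples on the first element by default. A sketch, assuming each of the three channels emits exactly one [sample, file] tuple per sample:)

indexed_bam
    .join(bam_index)
    .join(split_bam)
    .set { discover_files }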

Karin Lagesen
@karinlag
Aug 15 2018 15:29
you beat me to it :)
Shellfishgene
@Shellfishgene
Aug 15 2018 15:31
It seems a little cumbersome somehow. But thanks for the help!
Karin Lagesen
@karinlag
Aug 15 2018 15:33
no problem :)
tbugfinder
@tbugfinder
Aug 15 2018 21:19
@Shellfishgene nextflow log -h is a starting point. However, the help output doesn't mention the run name argument: nextflow log <runname> -F "status == 'COMPLETED'"
You can also list the available fields using -l
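For example (a sketch; <runname> is the name nextflow log prints for each previous run):

# list the field names usable with -f / -F
nextflow log <runname> -l

# show name, status and exit code of every task in one run
nextflow log <runname> -f name,status,exit

# only the tasks that failed
nextflow log <runname> -F "status == 'FAILED'"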