These are chat archives for nextflow-io/nextflow

22nd
Sep 2017
ashkurti
@ashkurti
Sep 22 2017 08:30
Hi! In a process I want to launch with an LSF executor, I need to specify a directive that I would normally put in a standard submission script as: #BSUB -R "affinity[core(1)] span[ptile=16]". How can I express this in a Nextflow process? Would just adding it to the script body be fine?
Luca Cozzuto
@lucacozzuto
Sep 22 2017 08:34
Hi Paolo, I was thinking of "forcing" some stages of Nextflow to re-run just by removing the subfolders where the intermediate files are...
what about "forcing" a process in this way?
Phil Ewels
@ewels
Sep 22 2017 09:21
@lucacozzuto: I think you're looking for cache: false?
(see docs)
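(For reference, a minimal sketch of the cache directive, with a hypothetical process name:)
process my_task {
    cache false   // always re-execute this process, even when the run is resumed with -resume

    script:
    """
    echo hello
    """
}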
@pditommaso - workflow introspection variables have container which has the docker container. But I guess this also shows singularity containers? Is there a way to track whether singularity or docker (or anything else) was used?
Luca Cozzuto
@lucacozzuto
Sep 22 2017 09:23
thanks Phil, but what about one single task?
Phil Ewels
@ewels
Sep 22 2017 09:26
ah, that's harder. Yeah probably easiest to just delete those work directories as you say
Luca Cozzuto
@lucacozzuto
Sep 22 2017 09:26
however I learnt cache: false :)
Paolo Di Tommaso
@pditommaso
Sep 22 2017 09:38
@ashkurti Hi, cluster-specific options can be specified with the clusterOptions directive; putting them in the script body won't work
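(A minimal sketch of the directive Paolo means, with a hypothetical queue option for illustration:)
process my_task {
    // scheduler-specific submit options belong in this directive, not in the script body
    clusterOptions '-q myqueue'

    script:
    """
    echo hello
    """
}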
@lucacozzuto I think you need #413. It will be included in the next release
@ewels Yes you are right, good point. Now it's not possible to know which engine was used ..
Phil Ewels
@ewels
Sep 22 2017 09:41
I've just thought of some more data which would be nice to have too ;) The manifest scope - then we can have workflow homepage and description etc.
Paolo Di Tommaso
@pditommaso
Sep 22 2017 09:43
maybe the config files? but it would require #264
Phil Ewels
@ewels
Sep 22 2017 09:45
yeah, maybe.. or just the parsed config? Then we could print all parsed params too
isn't there a nextflow config command or something similar? eg. if we could just scoop all of that into a variable
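(Such a command does exist; a quick sketch of scooping it up, assuming a recent Nextflow version:)
nextflow config > resolved.config   # prints the fully resolved configuration, params included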
Paolo Di Tommaso
@pditommaso
Sep 22 2017 09:46
makes sense
Phil Ewels
@ewels
Sep 22 2017 09:46
re: the requested requirements for each task - working nicely, though they show up as - for cached processes. I guess this is unavoidable? It doesn't make a big difference; it's just that we do have the used cpu / memory etc. for cached tasks, so we can display their usage, but not their normalised usage.
Paolo Di Tommaso
@pditommaso
Sep 22 2017 09:47
it would be nice to have some expandable text area, to avoid having a report that's too long
Phil Ewels
@ewels
Sep 22 2017 09:47
yeah definitely, I was already sort of planning that :P
searchable too
Paolo Di Tommaso
@pditommaso
Sep 22 2017 09:47
cool
the requested requirements for each task - working nicely, though they show up as - for cached processes.
what do you mean ?
Phil Ewels
@ewels
Sep 22 2017 09:48
image.png
cpus / memory / time are there for the processes that ran this time, but not for processes that were cached
Paolo Di Tommaso
@pditommaso
Sep 22 2017 09:49
um.. there should be
Phil Ewels
@ewels
Sep 22 2017 09:49
%cpu / vmem / duration are there for all processes
Luca Cozzuto
@lucacozzuto
Sep 22 2017 09:49
really really nice
Paolo Di Tommaso
@pditommaso
Sep 22 2017 09:49
I will check
Phil Ewels
@ewels
Sep 22 2017 09:49
ah ok cool :+1: Would be nice :)
Paolo Di Tommaso
@pditommaso
Sep 22 2017 09:50
@lucacozzuto we are pro here ;)
Luca Cozzuto
@lucacozzuto
Sep 22 2017 09:50
I see :)
Paolo Di Tommaso
@pditommaso
Sep 22 2017 09:55
@ewels I've just tested and that info is there also for cached processes
I guess you resumed a run launched with a previous version
Phil Ewels
@ewels
Sep 22 2017 09:56
aha, ok yes sorry :+1:
Paolo Di Tommaso
@pditommaso
Sep 22 2017 09:56
hence that info was not stored
Phil Ewels
@ewels
Sep 22 2017 09:56
will rerun from scratch
Paolo Di Tommaso
@pditommaso
Sep 22 2017 09:56
BTW you just tested that backward compatibility works fine ;)
Phil Ewels
@ewels
Sep 22 2017 09:57
:clap: :laughing:
Now it's handling units in javascript which is giving me a headache :confounded:
Paolo Di Tommaso
@pditommaso
Sep 22 2017 09:58
I GUESS SO !
Phil Ewels
@ewels
Sep 22 2017 10:04
I don't suppose that there's some super elegant method that could be implemented in the groovy code to always try to convert time / memory into standard units is there..? ;)
before I start reimplementing a tonne of string checks looking for ms, s, GB, MB etc etc
Paolo Di Tommaso
@pditommaso
Sep 22 2017 10:06
units are already reported in human-readable format, is that not fine?
Phil Ewels
@ewels
Sep 22 2017 10:07
no I want the opposite, it's a pain for plotting and also for table sorting
Paolo Di Tommaso
@pditommaso
Sep 22 2017 10:08
so you want to access the raw numbers and convert to strings in the table?
Phil Ewels
@ewels
Sep 22 2017 10:08
Yup :+1: Table is only a secondary thing, I mostly want raw numbers for plotting
I'm currently writing code to "normalise" the cpu/ memory / time against what was requested
Paolo Di Tommaso
@pditommaso
Sep 22 2017 10:09
I see
Phil Ewels
@ewels
Sep 22 2017 10:09
eg. if 4 cpus were requested and %cpu is 200% then cpus_used would be 50%
..if 16 GB memory is requested and peak_vmem is 300 MB then memory_used is 1.8% or whatever..
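(Phil's arithmetic as a tiny Groovy sketch, with made-up trace values:)
// hypothetical raw values from the trace file
def cpusRequested  = 4
def pctCpu         = 200          // %cpu reported for the task
def memRequestedMB = 16 * 1024    // 16 GB requested, in MB
def peakVmemMB     = 300          // peak vmem, in MB

def cpusUsed   = pctCpu / cpusRequested             // => 50 (% of requested CPUs)
def memoryUsed = peakVmemMB / memRequestedMB * 100  // => ~1.8 (% of requested memory)
println "cpus_used=${cpusUsed}% memory_used=${Math.round(memoryUsed * 10) / 10}%"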
Paolo Di Tommaso
@pditommaso
Sep 22 2017 10:10
cool
but then you will need to convert to a human-readable format ..
Phil Ewels
@ewels
Sep 22 2017 10:11
equally, feel free to compute that on the Groovy side too ;)
Anthony Underwood
@aunderwo
Sep 22 2017 10:11

Hi, when creating instances on AWS, how is the size of the local image specified?

df -h
Filesystem      Size  Used Avail Use% Mounted on
/dev/xvda1      7.8G  1.9G  5.9G  24% /
devtmpfs        7.9G   96K  7.9G   1% /dev
tmpfs           7.9G     0  7.9G   0% /dev/shm

Seems to be the same whether I specify a t2.micro or an m4.xlarge. Is it defined by the AMI? If so, can it be increased?

Paolo Di Tommaso
@pditommaso
Sep 22 2017 10:11
equally, feel free to compute that on the Groovy side too
I think it would require much less code, no?
Phil Ewels
@ewels
Sep 22 2017 10:12
yup!
it's still a pain to have strings for plotting as I need to submit raw numbers to plotly for the graph
but percentages in this case is obviously easy
but I want to plot both "normalised" (percentages) and also raw usage (units)
Anthony Underwood
@aunderwo
Sep 22 2017 10:13

@ewels This looks awesome. Is this a GUI for monitoring nextflow workflows in progress?

image.png

Phil Ewels
@ewels
Sep 22 2017 10:13
@aunderwo - not in progress (currently), only a report on completion
Paolo Di Tommaso
@pditommaso
Sep 22 2017 10:14
Current state of the report ^
ouch, nearly 3MB. Should probably trim down the included JS libraries a little..
Paolo Di Tommaso
@pditommaso
Sep 22 2017 10:15
I'm a bit scared of the final size of a one-million-task report
Phil Ewels
@ewels
Sep 22 2017 10:15
Making something to monitor workflows in progress is what Mike was working on at the hackathon though: nextflow-io/nextflow#454
@pditommaso yeah... not sure that we can do much about that though?
In MultiQC I compress the JSON payload and uncompress it again in javascript
it's a bit messy though
Anthony Underwood
@aunderwo
Sep 22 2017 10:17

@ewels That is a thing of beauty!

test_report.html

Phil Ewels
@ewels
Sep 22 2017 10:17
..is the right answer! ;)
Paolo Di Tommaso
@pditommaso
Sep 22 2017 10:17
yes! phil is a javascript hero! :)
Anthony Underwood
@aunderwo
Sep 22 2017 10:18

for more recent EC2 instances (https://aws.amazon.com/ec2/instance-types/) the storage states EBS only - no fixed size

@aunderwo have a look here https://www.nextflow.io/docs/latest/awscloud.html#instance-storage

Paolo Di Tommaso
@pditommaso
Sep 22 2017 10:19
EBS is not supported yet
ashkurti
@ashkurti
Sep 22 2017 10:20

thanks @pditommaso, would the following syntax be fine, for example for multiple clusterOptions:

    clusterOptions:
    '#BSUB -R "affinity[core(1)]    span[ptile=16]"'
    "#BSUB -data input_file"

as I am getting an error: Unknown keyword clusterOptions

Anthony Underwood
@aunderwo
Sep 22 2017 10:20
So I'm puzzled: when an m4.xlarge is created, how are the sizes of /dev/xvda1, devtmpfs and tmpfs determined?
df -h
Filesystem      Size  Used Avail Use% Mounted on
/dev/xvda1      7.8G  1.9G  5.9G  24% /
devtmpfs        7.9G   96K  7.9G   1% /dev
tmpfs           7.9G     0  7.9G   0% /dev/shm
Paolo Di Tommaso
@pditommaso
Sep 22 2017 10:24
@ashkurti nope, it should be clusterOptions = "affinity[core(1)] span[ptile=16]"
@aunderwo as far as I know each instance has one or more ephemeral disks
ashkurti
@ashkurti
Sep 22 2017 10:26
ok, so could we have multiple lines of clusterOptions? And if it detects the -R flag automatically for the affinity, how can I express the #BSUB -data input_file condition?
Paolo Di Tommaso
@pditommaso
Sep 22 2017 10:27
I fear this can be problematic
let me check one thing
ok, I was wrong
NF appends the clusterOptions string to a #BSUB declaration
in practical terms, whatever you specify in clusterOptions is added in the #BSUB header
for example
clusterOptions 'xxx'
it turns into
#BSUB xxx
I think you can do clusterOptions '-R "affinity[core(1)] span[ptile=16]" -data your_file'
ashkurti
@ashkurti
Sep 22 2017 10:32
ok thanks, very helpful will try this right away!!
ashkurti
@ashkurti
Sep 22 2017 10:41
I think there is some value that NF assigns for -R and I get the following error:
Command output:
  Syntax Error: Multiple -R resource requirement strings are not supported on "span", "cu" and "affinity" sections. Specify multiple -R resource requirement strings only on order, same, rusage, and select sections.
  Bad resource requirement syntax. Job not submitted.
in fact, looking at the NF-generated script in the work dir, the -R flag is included twice: once per my specification and once, I suppose, as determined by NF: #BSUB -R "span[hosts=1]"
Paolo Di Tommaso
@pditommaso
Sep 22 2017 10:44
NF adds a -R for cpus and memory
ashkurti
@ashkurti
Sep 22 2017 10:45
so how can I combine my specifications with the NF ones?
Paolo Di Tommaso
@pditommaso
Sep 22 2017 10:46
do not specify cpus or memory as NF directives; provide them with clusterOptions instead
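(Putting it together for ashkurti's case; a sketch, under the assumption that -n and -M are the right LSF flags for slots and memory limit at your site:)
process my_task {
    executor 'lsf'
    // no cpus/memory directives here, so NF adds no -R line of its own;
    // everything goes through the scheduler options instead
    clusterOptions '-n 16 -M 16000 -R "affinity[core(1)] span[ptile=16]" -data input_file'

    script:
    """
    echo hello
    """
}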
ashkurti
@ashkurti
Sep 22 2017 10:52
thanks great tip!
Toni Hermoso Pulido
@toniher
Sep 22 2017 10:59
Hello, a simple question I didn't find a clear example for so far... I'm trying to retrieve many files from a process and simply copy them to a directory. What would be the most straightforward way?
ashkurti
@ashkurti
Sep 22 2017 11:04
and do you include the exports with a beforeScript directive? If so, do you separate them with just a space or with a ;?
Anthony Underwood
@aunderwo
Sep 22 2017 11:11
@pditommaso it seems that nextflow cloud creates the AWS instance with a default size of 8GB. There's a chance this might not be enough space. I think it may be possible to increase the storage size using withBlockDeviceMappings in the Java AWS API. Or would you recommend making scratch on a large shared volume?
Maybe that's the way to go if we need more space
Luca Cozzuto
@lucacozzuto
Sep 22 2017 11:25
Hello, a simple question I didn't find a clear example for so far... I'm trying to retrieve many files from a process and simply copy them to a directory. What would be the most straightforward way?
@toniher this refers to when you have the same name
for the output
I think overwrite: false in publishDir
Paolo Di Tommaso
@pditommaso
Sep 22 2017 11:32
@aunderwo NF uses the AWS default, which is 8GB; yes, you can use bootStorageSize to set a bigger size
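(A minimal sketch, reusing Anthony's cloud scope from above; the size value is an assumption:)
cloud {
    imageId = 'ami-43f49030'
    instanceType = 'm4.xlarge'
    bootStorageSize = '50 GB'   // override the 8 GB AWS default boot disk
}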
Anthony Underwood
@aunderwo
Sep 22 2017 11:32
Thanks
Paolo Di Tommaso
@pditommaso
Sep 22 2017 11:33
@lucacozzuto that was an answer or a question ? :)
Luca Cozzuto
@lucacozzuto
Sep 22 2017 11:33
question
since it does not work
:)
Paolo Di Tommaso
@pditommaso
Sep 22 2017 11:33
.. and the question is?
Luca Cozzuto
@lucacozzuto
Sep 22 2017 11:34
imagine two processes that output two files
with the same name
what is the easiest way to keep them without overwriting?
Paolo Di Tommaso
@pditommaso
Sep 22 2017 11:34
store in a subdir with a different name
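(i.e. something along these lines, with hypothetical process and directory names:)
process foo {
    publishDir 'results/foo'   // each process gets its own subdir, so the shared filename can't clash

    output:
    file 'report.txt'

    script:
    """
    echo foo > report.txt
    """
}

process bar {
    publishDir 'results/bar'

    output:
    file 'report.txt'

    script:
    """
    echo bar > report.txt
    """
}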
Luca Cozzuto
@lucacozzuto
Sep 22 2017 11:35
I was thinking of a new param of publishDir
to collect those files, changing the name as you do with collect()
in the processes
like name: PAOLO_*
Paolo Di Tommaso
@pditommaso
Sep 22 2017 11:36
:)
Luca Cozzuto
@lucacozzuto
Sep 22 2017 11:36
and you have PAOLO_1, PAOLO_2 etc
what do you think?
Paolo Di Tommaso
@pditommaso
Sep 22 2017 11:37
you can already do that with publishDir path, saveAs: { "PAOLO_${count++}" }
Luca Cozzuto
@lucacozzuto
Sep 22 2017 11:38
oooook, it was not so easy to get that from the documentation, BUT having this "name" param would be really nice
do you want me to make an issue?
:)
Paolo Di Tommaso
@pditommaso
Sep 22 2017 11:38
sure, we can continue to discuss there
Luca Cozzuto
@lucacozzuto
Sep 22 2017 11:41
nextflow-io/nextflow#463
Toni Hermoso Pulido
@toniher
Sep 22 2017 11:42
Cannot invoke method next() on null object
publishDir 'kk', saveAs: { "blast_${count++}" }
seems not to work
Paolo Di Tommaso
@pditommaso
Sep 22 2017 11:42
oops, declare count=0 before the process definition
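(Putting the two snippets together for Toni's case, a minimal sketch:)
count = 0   // declared before the process so the saveAs closure can see and increment it

process blast {
    publishDir 'kk', saveAs: { filename -> "blast_${count++}" }

    output:
    file 'out.txt'

    script:
    """
    echo result > out.txt
    """
}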
Phil Ewels
@ewels
Sep 22 2017 12:50
@pditommaso - anyway, I'll make a new PR with my recent tidying and then we can move on with the normalised numbers / standard units stuff when you're ready.
Anthony Underwood
@aunderwo
Sep 22 2017 14:33

Hey - I'm running a toy pipeline in AWS

created cluster

nextflow cloud create my-cluster -c 3

ran a workflow on 4 samples which should be parallelised, but the timeline suggests things are happening sequentially.

image.png
Any idea why this is happening?
Anthony Underwood
@aunderwo
Sep 22 2017 14:39

Sorry to report another issue
AWS complains now when I use spotPrice

> Launch configuration:
 - driver: 'aws'
 - imageId: 'ami-43f49030'
 - instanceType: 'm4.xlarge'
 - keyFile: /Users/anthony/.ssh/id_rsa.pub
 - securityGroup: 'sg-60be9818'
 - spotPrice: 0.06
 - subnetId: 'subnet-b2af26d7'
 - userName: 'anthony'

Please confirm you really want to launch the cluster with above configuration [y/n] y
Launching worker node -- Waiting for `running` status.. ERROR ~ The parameter groupName cannot be used with the parameter subnet (Service: AmazonEC2; Status Code: 400; Error Code: InvalidParameterCombination; Request ID: d51e9355-7a47-494a-8c68-ed47503aa4ba)

Here's the cloud section from the config file:

cloud {
    imageId = 'ami-43f49030'
    instanceType = 'm4.xlarge'
    spotPrice = 0.06
    subnetId = 'subnet-b2af26d7'
    securityGroup = 'sg-60be9818'
}
if I remove the spotPrice line then it works
Paolo Di Tommaso
@pditommaso
Sep 22 2017 14:41
regarding the second issue, it's an AWS error message
The parameter groupName cannot be used with the parameter subnet
Anthony Underwood
@aunderwo
Sep 22 2017 14:42
I'm sure this worked earlier this week or last weekend
I have copied what I did into a google doc that I sent to colleagues
I tried removing the subnetId but it doesn't like that either
Paolo Di Tommaso
@pditommaso
Sep 22 2017 14:44
these settings may change depending on the AMI and instance type you are using
I don't have many more details; the AWS documentation is the proper place to look
Anthony Underwood
@aunderwo
Sep 22 2017 14:46
but as I said, this is exactly the code I used before, which is now not working, and it follows the recommendation in the blog post
Paolo Di Tommaso
@pditommaso
Sep 22 2017 14:46
regarding the execution, we could try to have a look at the NF logs
surely it's weird
but it's clearly an AWS API error message
Anthony Underwood
@aunderwo
Sep 22 2017 14:47
yes, so has the API changed?
Paolo Di Tommaso
@pditommaso
Sep 22 2017 14:48
looks strange, I can give it a try later
Anthony Underwood
@aunderwo
Sep 22 2017 14:48
Ok thanks
Paolo Di Tommaso
@pditommaso
Sep 22 2017 14:49
Sep-22 14:44:42.821 [main] DEBUG nextflow.scheduler.Scheduler - +++ Initial cluster topology:
- nodeId=0305a702-d84e-4f0f-95b4-68f9333b7108; hostname=ip-172-31-22-195; instance-id: i-074e123b6044a9eed; boot-time=22-Sep-2017 14:40; tot-res:[cpus=4; mem=15.7 GB; disk=5.9 GB]; idle=-
it looks like there's only one node
(sorry, need to leave now)
Anthony Underwood
@aunderwo
Sep 22 2017 14:50
Ahh, I just reran using m4.xlarge rather than t2.micro and it looks OK now
This is because m4.xlarge has 4 vCPUs
Anthony Underwood
@aunderwo
Sep 22 2017 15:04
image.png
This timeline is for an m4.xlarge with 4 vCPUs
Anthony Underwood
@aunderwo
Sep 22 2017 15:13
Here's the timeline for the same process with a cluster with 2 nodes of the instance type m4.large (2 vCPUs)
image.png
However the nextflow log again suggests that there's only one node
https://gist.github.com/aunderwo/f4c367a1d1afd680a9c23cee4fbcee6b
Anthony Underwood
@aunderwo
Sep 22 2017 15:35
Here's another log when I specified -c 3 with m4.large instances. The topology is the same
https://gist.github.com/aunderwo/ea12377c657e6ec1755f52d2e505b02c
Unless I'm doing something stupid, there seems to be a bug when setting up the cluster
Paolo Di Tommaso
@pditommaso
Sep 22 2017 15:57
More likely a security group problem
Anthony Underwood
@aunderwo
Sep 22 2017 15:58
What ports need to be open? I just have ssh incoming and http and https outgoing
Does ignite communicate on other ports?
Paolo Di Tommaso
@pditommaso
Sep 22 2017 16:03
Open all inbound ports with the security group itself as the source
Anthony Underwood
@aunderwo
Sep 22 2017 16:04
TCP and UDP?
Paolo Di Tommaso
@pditommaso
Sep 22 2017 16:05
Yes
(tho TACO should be enough)
*TCP
Anthony Underwood
@aunderwo
Sep 22 2017 16:06
Now I want some Tacos :)
Paolo Di Tommaso
@pditommaso
Sep 22 2017 16:07
Mobile autocorrection .. :)
Anthony Underwood
@aunderwo
Sep 22 2017 16:19
I've added those rules
image.png
However the log and timeline still suggest just one node
ashkurti
@ashkurti
Sep 22 2017 16:20

do you use the beforeScript directive for exports in an LSF environment? If so, how do you separate multiple export instances? Would the following be correct:

beforeScript 'export var1=export1 ; export var2=export2'

I am just trying to figure out how to launch a script that uses LSF through NF but I am having problems.
@pditommaso can you have a look at this please

Anthony Underwood
@aunderwo
Sep 22 2017 16:20
image.png

@ashkurti
I think you'd use env variables as such
https://www.nextflow.io/docs/latest/config.html#scope-env

Anthony Underwood
@aunderwo
Sep 22 2017 16:25
@ashkurti These get passed on as exports I believe
ashkurti
@ashkurti
Sep 22 2017 16:35

@pditommaso do you mean creating a nextflow.config file with the following content for example:

process.$myProcessName.env {
    OMP_NUM_THREADS=1
    LD_LIBRARY_PATH='/opt/ibm/lib:$LD_LIBRARY_PATH'
}

@aunderwo with these do you mean the variables I am passing with the beforeScript directive?

Anthony Underwood
@aunderwo
Sep 22 2017 16:38
@ashkurti the variables in the config are for all processes I think. The doc says
The env scope allows you to define one or more environment variables that will be exported to the system environment where pipeline processes need to be executed.
@pditommaso I think both inbound and outbound ports are required for the security group. Please can you share your security group config
ashkurti
@ashkurti
Sep 22 2017 16:40
it also says
It is possible to set the properties for a specific process in your pipeline by prefixing the process name with the symbol $ and using it as special scope identifier. For example:

process.queue = 'short'
process.$hello.queue = 'long'
Anthony Underwood
@aunderwo
Sep 22 2017 16:40
Ahh ok - didn't know that :)
ashkurti
@ashkurti
Sep 22 2017 16:41
:)
Anthony Underwood
@aunderwo
Sep 22 2017 16:50
@pditommaso opening outgoing ports allows cluster creation to work. However it appears that shared file storage isn't working
Launching `aunderwo/nextflow_ariba` [awesome_visvesvaraya] - revision: 8b55dbab50 [master]
[warm up] executor > ignite
[bf/fc57dd] Submitted process > get_database
[ea/f19246] Submitted process > run_ariba (2)
[73/123de3] Submitted process > run_ariba (1)
[d6/1357f8] Submitted process > run_ariba (4)
WARN: Process `run_ariba (1)` failed -- Execution is retried (1)
[f6/83b232] Re-submitted process > run_ariba (1)
WARN: Process `run_ariba (4)` failed -- Execution is retried (1)
[94/657fda] Re-submitted process > run_ariba (4)
[99/368da6] Submitted process > run_ariba (3)
WARN: Process `run_ariba (1)` failed -- Execution is retried (2)
[87/f0b28c] Re-submitted process > run_ariba (1)
WARN: Process `run_ariba (4)` failed -- Execution is retried (2)
[fd/ed9069] Re-submitted process > run_ariba (4)
[c5/bb6793] Re-submitted process > run_ariba (1)
WARN: Process `run_ariba (1)` failed -- Execution is retried (3)
[ea/7416f0] Re-submitted process > run_ariba (4)
WARN: Process `run_ariba (4)` failed -- Execution is retried (3)
ERROR ~ Error executing process > 'run_ariba (1)'

Caused by:
  java.nio.file.NoSuchFileException: /home/anthony/work/c5/bb6793e83a17153a66676594671d6e/.command.sh

Command executed:

  ariba run ariba_db test1.R1.fastq.gz test1.R2.fastq.gz test1.ariba
  cp test1.ariba/report.tsv test1.report.tsv

Command exit status:
  -

Command output:
  (empty)

Work dir:
  /home/anthony/work/c5/bb6793e83a17153a66676594671d6e

Tip: when you have fixed the problem you can continue the execution appending to the nextflow command line the option `-resume`

 -- Check '.nextflow.log' file for details

Is it essential to specify an EFS shared mount point?

To configure the EFS file system you need to provide your EFS storage ID and the mount path by using the sharedStorageId and sharedStorageMount properties.
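(From the docs quoted above, a minimal cloud-scope sketch; the file system ID and mount path are placeholders:)
cloud {
    imageId = 'ami-43f49030'
    instanceType = 'm4.xlarge'
    sharedStorageId = 'fs-12345678'     // your EFS file system ID
    sharedStorageMount = '/mnt/efs'     // mount path, used as the pipeline work dir
}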

Paolo Di Tommaso
@pditommaso
Sep 22 2017 17:52
@ashkurti nope, you can't use that syntax; env is not a process directive
you need to define it as
env.FOO=bar
but you can define inputs as env variables, see here
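(So for ashkurti's case, something like this in nextflow.config; note the env scope applies to all processes:)
env {
    OMP_NUM_THREADS = 1   // exported into the environment of every task
    FOO = 'bar'
}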
@aunderwo if you don't use EFS, you will need to use S3 as shared storage
Anthony Underwood
@aunderwo
Sep 22 2017 18:41
@pditommaso I've defined input dirs and output dirs as s3:// URLs and this has worked fine when my previous setup was only allowing communication with a single node. However now there seems to be an issue with worker nodes not being able to access files on /home/anthony/work
Paolo Di Tommaso
@pditommaso
Sep 22 2017 18:42
an issue with worker nodes not being able to access files on /home/anthony/work
that's a local folder, it cannot be used to run your tasks
Anthony Underwood
@aunderwo
Sep 22 2017 18:43
in your blog example that used https://github.com/pditommaso/paraMSA I couldn't see anywhere that an alternative workspace was defined
Paolo Di Tommaso
@pditommaso
Sep 22 2017 18:43
because it uses EFS
Anthony Underwood
@aunderwo
Sep 22 2017 18:44
ah - so just by specifying an EFS mountpoint the work dir will be created there automagically?
I presume this can't be done with S3 (slow file access?)
Paolo Di Tommaso
@pditommaso
Sep 22 2017 18:45
if you specify EFS, NF will mount it and use it as the work dir for your pipeline
Anthony Underwood
@aunderwo
Sep 22 2017 18:45
Is there an alternative?
Paolo Di Tommaso
@pditommaso
Sep 22 2017 18:46
otherwise you have to use S3 (or install your own shared storage) and specify a work dir there when running the pipeline
Is there an alternative?
to what?
Anthony Underwood
@aunderwo
Sep 22 2017 18:48

An alternative to using EBS for work dir?
Would you even recommend that?

To use S3 as scratch, would I need to use NXF_WORK or the scratch directive?

Paolo Di Tommaso
@pditommaso
Sep 22 2017 18:49
EBS cannot be used as shared (writable) storage
nextflow run <your pipeline> -w s3://your-bucket
or
export NXF_WORK=s3://your-bucket
nextflow run <your pipeline>
Anthony Underwood
@aunderwo
Sep 22 2017 18:51

Sorry I meant EFS

EBS cannot be used as shared (writable) storage

Thanks will give it a go
> nextflow run <your pipeline> -w s3://your-bucket
Paolo Di Tommaso
@pditommaso
Sep 22 2017 18:52
ok
Anthony Underwood
@aunderwo
Sep 22 2017 21:26
Yay! AWS now works as expected
1) Open all ports to the security group, inbound AND outbound
2) specify a work directory via -w s3://nextflow-data/work
I might write a blog post detailing the AWS protocol step by step
Paolo Di Tommaso
@pditommaso
Sep 22 2017 21:31
that would be nice !
Venkat Malladi
@vsmalladi
Sep 22 2017 21:43
@aunderwo that would be awesome
Anthony Underwood
@aunderwo
Sep 22 2017 21:43
@vsmalladi Will give it a go :)
Venkat Malladi
@vsmalladi
Sep 22 2017 21:44
Cool, I am working on wrapping most of my code in Python, then using pytest to unit-test the scripts, with a Nextflow integration test on top
Anthony Underwood
@aunderwo
Sep 22 2017 21:45
very nice - would like to see how you run the nextflow integration test
Venkat Malladi
@vsmalladi
Sep 22 2017 21:45
Ya still working on it, but will keep you posted
Anthony Underwood
@aunderwo
Sep 22 2017 21:45
@vsmalladi Thanks
Paolo Di Tommaso
@pditommaso
Sep 22 2017 21:46
another blog post.. :)
Venkat Malladi
@vsmalladi
Sep 22 2017 21:46
@aunderwo Do you have a repository I can follow?
Venkat Malladi
@vsmalladi
Sep 22 2017 21:50
cool