These are chat archives for nextflow-io/nextflow

29th
Aug 2018
Lukas Jelonek
@lukasjelonek
Aug 29 2018 08:19

Hey,

I have a problem with wildcards in s3 paths. When I try the following

params.location = 's3://lj-nf-fastq-testdata/*.fastq.gz'
Channel.fromPath(params.location).set{ch_fastqs}

I get the following exception

N E X T F L O W  ~  version 0.31.1
Launching `./main.nf` [voluminous_varahamihira] - revision: 35fb256120
Exception in thread "pool-3-thread-1" java.lang.IllegalArgumentException: Key cannot be empty
        at com.amazonaws.util.ValidationUtils.assertStringNotEmpty(ValidationUtils.java:89)
        at com.amazonaws.services.s3.AmazonS3Client.getObject(AmazonS3Client.java:1380)
        at com.amazonaws.services.s3.AmazonS3Client.getObject(AmazonS3Client.java:1276)
        at com.upplication.s3fs.AmazonS3Client.getObject(AmazonS3Client.java:110)
        at com.upplication.s3fs.util.S3ObjectSummaryLookup.getS3Object(S3ObjectSummaryLookup.java:197)
        at com.upplication.s3fs.util.S3ObjectSummaryLookup.lookup(S3ObjectSummaryLookup.java:88)
        at com.upplication.s3fs.S3FileSystemProvider.readAttributes(S3FileSystemProvider.java:643)
        at java.nio.file.Files.readAttributes(Files.java:1737)
        at java.nio.file.FileTreeWalker.getAttributes(FileTreeWalker.java:219)
        at java.nio.file.FileTreeWalker.visit(FileTreeWalker.java:276)
        at java.nio.file.FileTreeWalker.walk(FileTreeWalker.java:322)
        at java.nio.file.Files.walkFileTree(Files.java:2662)
        at nextflow.file.FileHelper.visitFiles(FileHelper.groovy:732)
        at nextflow.file.PathVisitor$_pathImpl_closure1.doCall(PathVisitor.groovy:141)
        at nextflow.file.PathVisitor$_pathImpl_closure1.call(PathVisitor.groovy)
        at groovy.lang.Closure.run(Closure.java:499)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
        at java.lang.Thread.run(Thread.java:748)

No problem when I access a file without wildcards.

Shellfishgene
@Shellfishgene
Aug 29 2018 08:43
I have a process that uses Java, and memory is set with the -Xmx option. If I understand this correctly, I can't use the task.retries or task.memory variables to adapt the memory value in the Java option when the task is retried?
Paolo Di Tommaso
@pditommaso
Aug 29 2018 08:53
yes
I have a problem with wildcards in s3 paths. When I try the following
I guess you need to specify at least a base dir i.e.
params.location = 's3://lj-nf-fastq-testdata/foo/*.fastq.gz'
Shellfishgene
@Shellfishgene
Aug 29 2018 08:56
@pditommaso Was the "yes" for my question? If yes, that means it also does not make sense to increase cpus with task attempts because most processes will use task.cpus to increase a command line option, correct?
Paolo Di Tommaso
@pditommaso
Aug 29 2018 08:56
YES :smile:
Shellfishgene
@Shellfishgene
Aug 29 2018 08:57
Ok, thanks :)
Karin Lagesen
@karinlag
Aug 29 2018 08:59
ok, so I am having trouble understanding why I can't get the commitId out from my nf script
I have a script under version control and I add
println "Project : $workflow.projectDir"
println "Git info: $workflow.repository - $workflow.revision [$workflow.commitId]"
println "Cmd line: $workflow.commandLine"
to my script
the first and last line gives what I expect, but the git info line just gives null for all of them
Paolo Di Tommaso
@pditommaso
Aug 29 2018 09:00
how is your launch command line ?
Karin Lagesen
@karinlag
Aug 29 2018 09:01
nextflow run test.nf
...was that what you were asking, btw?
Paolo Di Tommaso
@pditommaso
Aug 29 2018 09:01
the commitId is only reported if you run it from GitHub, i.e.
nextflow run user/project
Karin Lagesen
@karinlag
Aug 29 2018 09:02
ok, so this won't work for code that I have in a local repo?
Paolo Di Tommaso
@pditommaso
Aug 29 2018 09:02
no
Karin Lagesen
@karinlag
Aug 29 2018 09:02
that explains, thanks :)
Paolo Di Tommaso
@pditommaso
Aug 29 2018 09:02
:+1:
Shellfishgene
@Shellfishgene
Aug 29 2018 09:04
If I use move mode with publishDir, nf will not find these files as cached on resuming a workflow I assume?
Paolo Di Tommaso
@pditommaso
Aug 29 2018 09:04
exactly
Shellfishgene
@Shellfishgene
Aug 29 2018 09:07
Sorry, last one: Is there a way to give multiple wildcards to a path channel, like Channel.fromPath( "/data/foo*.fa, /data/bar*.fa" )?
the last blue note box
Shellfishgene
@Shellfishgene
Aug 29 2018 09:09
Ah, didn't read far enough, thanks!
Paolo Di Tommaso
@pditommaso
Aug 29 2018 09:09
:wink:
Shellfishgene
@Shellfishgene
Aug 29 2018 09:11
How would I do that on the command line? --files "samples/A*, samples/B*"?
Paolo Di Tommaso
@pditommaso
Aug 29 2018 09:11
--files "samples/A*,samples/B*"
then in the script
Channel.fromPath( params.files.tokenize(',') )
Shellfishgene
@Shellfishgene
Aug 29 2018 09:13
makes sense, cheers
Paolo Di Tommaso
@pditommaso
Aug 29 2018 09:13
:wave:
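
A side note on the command-line form above: quoting the pattern list matters, because an unquoted glob is expanded by the shell before Nextflow ever sees it. The following is a minimal shell sketch of that difference. The `count_args` helper and the scratch `samples/` files are made up for illustration and are not part of Nextflow.

```shell
# Hypothetical helper: print how many arguments it received.
count_args() { echo "$#"; }

# Scratch directory with a few made-up sample files.
tmp=$(mktemp -d)
mkdir "$tmp/samples"
touch "$tmp/samples/A1.fa" "$tmp/samples/A2.fa" "$tmp/samples/B1.fa"

# Unquoted: the shell expands the glob, so the program receives one
# argument per matching file.
unquoted=$(count_args "$tmp"/samples/A*)

# Quoted: the comma-separated pattern list arrives as one literal string.
quoted=$(count_args "$tmp/samples/A*,$tmp/samples/B*")

echo "unquoted=$unquoted quoted=$quoted"   # prints: unquoted=2 quoted=1
rm -rf "$tmp"
```

With the quoted form, the script receives the whole string in params.files and can split it with tokenize(',') as in the reply above.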
Lukas Jelonek
@lukasjelonek
Aug 29 2018 09:16

I guess you need to specify at least a base dir i.e.

params.location = 's3://lj-nf-fastq-testdata/foo/*.fastq.gz'

Worked. Thanks

Paolo Di Tommaso
@pditommaso
Aug 29 2018 09:17
:v:
Lukas Jelonek
@lukasjelonek
Aug 29 2018 09:19
Maybe a comment on that might be useful in the documentation, like: Wildcards do not work on the top level of a bucket. It is required that the data is located in a subdirectory. s3://mybucket/*.txt does not work, whereas s3://mybucket/data/*.txt works.
Paolo Di Tommaso
@pditommaso
Aug 29 2018 09:20
:+1:
Anthony Underwood
@aunderwo
Aug 29 2018 09:56
Hey all - I have only run nextflow with docker on a Mac before. Now that I come to run this on a Linux box, I am wondering what the recommended way of running nextflow as a non-root user is, given that docker run requires root
Luca Cozzuto
@lucacozzuto
Aug 29 2018 09:57
@aunderwo I run docker run in our cluster without being root
Anthony Underwood
@aunderwo
Aug 29 2018 10:23

@lucacozzuto Thanks. I can do this so I don't have to run nextflow with sudo but now it's no longer finding executables

  .command.sh: line 2: fastqc: command not found

but I can see this if I run the docker image interactively

docker run -it bioinformant/ghru-assembly
bio@9576f944c566:/$ which fastqc
/home/bio/.linuxbrew/bin/fastqc
If I run the same thing as root - all is good
Paolo Di Tommaso
@pditommaso
Aug 29 2018 13:45
what if you run it as docker run bioinformant/ghru-assembly fastqc ?
Anthony Underwood
@aunderwo
Aug 29 2018 16:06
@pditommaso yes that works
docker run bioinformant/ghru-assembly fastqc -h

            FastQC - A high throughput sequence QC analysis tool

SYNOPSIS

    fastqc seqfile1 seqfile2 .. seqfileN

    fastqc [-o output dir] [--(no)extract] [-f fastq|bam|sam]
           [-c contaminant file] seqfile1 .. seqfileN

DESCRIPTION

    FastQC reads a set of sequence f.....
Paolo Di Tommaso
@pditommaso
Aug 29 2018 16:07
if so change to the task work dir and run bash .command.run
Anthony Underwood
@aunderwo
Aug 29 2018 16:08
it says
bash .command.run
/home/bio/work/bd/58643bf418f42a99e3464e2f42ec15/.command.sh: line 2: fastqc: command not found
Paolo Di Tommaso
@pditommaso
Aug 29 2018 16:09
bash -x .command.run ?
Anthony Underwood
@aunderwo
Aug 29 2018 16:09
+ set -e
+ set -u
+ NXF_DEBUG=0
+ [[ 0 > 1 ]]
+ trap on_exit EXIT
+ trap on_term TERM INT USR1 USR2
++ dd bs=18 count=1 if=/dev/urandom
++ base64
++ tr +/ 0A
+ export NXF_BOXID=nxf-pHozSc7JNAt0eF9XiXOBzmwN
+ NXF_BOXID=nxf-pHozSc7JNAt0eF9XiXOBzmwN
+ NXF_SCRATCH=
+ [[ 0 > 0 ]]
+ touch /home/bio/work/bd/58643bf418f42a99e3464e2f42ec15/.command.begin
+ [[ -n '' ]]
+ rm -f ERR668456_1.fastq.gz
+ rm -f ERR668456_2.fastq.gz
+ ln -s /home/bio/assembly_29_08_2019/raw_fastqs/ERR668456_1.fastq.gz ERR668456_1.fastq.gz
+ ln -s /home/bio/assembly_29_08_2019/raw_fastqs/ERR668456_2.fastq.gz ERR668456_2.fastq.gz
+ set +e
++ set +u
++ nxf_mktemp /dev/shm
+ ctmp=/dev/shm/nxf.leD0zFjGHQ
+ cout=/dev/shm/nxf.leD0zFjGHQ/.command.out
+ mkfifo /dev/shm/nxf.leD0zFjGHQ/.command.out
+ cerr=/dev/shm/nxf.leD0zFjGHQ/.command.err
+ mkfifo /dev/shm/nxf.leD0zFjGHQ/.command.err
+ tee1=6166
+ tee2=6167
+ pid=6168
+ wait 6168
+ tee .command.out
+ tee .command.err
++ id -u
++ id -g
+ docker run -i -e NXF_DEBUG=0 -v /home/bio:/home/bio -v /home/bio/work/bd/58643bf418f42a99e3464e2f42ec15:/home/bio/work/bd/58643bf418f42a99e3464e2f42ec15 -w /home/bio/work/bd/58643bf418f42a99e3464e2f42ec15 --entrypoint /bin/bash -u 1002:1002 --name nxf-pHozSc7JNAt0eF9XiXOBzmwN bioinformant/ghru-assembly -c '/bin/bash /home/bio/work/bd/58643bf418f42a99e3464e2f42ec15/.command.stub'
/home/bio/work/bd/58643bf418f42a99e3464e2f42ec15/.command.sh: line 2: fastqc: command not found
+ ret=127
+ wait 6166 6167
+ on_exit
+ exit_status=127
+ printf 127
+ set +u
+ [[ -n 6166 ]]
+ kill 6166
+ [[ -n 6167 ]]
+ kill 6167
+ [[ -n /dev/shm/nxf.leD0zFjGHQ ]]
+ rm -rf /dev/shm/nxf.leD0zFjGHQ
+ docker rm nxf-pHozSc7JNAt0eF9XiXOBzmwN
+ exit 127
Paolo Di Tommaso
@pditommaso
Aug 29 2018 16:10
docker run -u 1002:1002 bioinformant/ghru-assembly fastqc ?
Anthony Underwood
@aunderwo
Aug 29 2018 16:11
same as before - works OK
Paolo Di Tommaso
@pditommaso
Aug 29 2018 16:12
well, you need to debug it
surely means fastqc cannot be found in the container PATH
Anthony Underwood
@aunderwo
Aug 29 2018 16:13
sure - I'm running through it now :)
yes - just very strange that if I run as root user it works fine
Paolo Di Tommaso
@pditommaso
Aug 29 2018 16:13
modify the .command.sh with a simple env | sort
and check the PATH
Anthony Underwood
@aunderwo
Aug 29 2018 16:42

OK - so here's the thing - don't run docker as a host user that has the same user name as the container's default user

I was running as the host user bio and the container's user was also bio, so -v /home/bio:/home/bio mounted the host directory over a directory that already existed in the image, and that directory contained the subdir with the fastqc executable!!

Paolo Di Tommaso
@pditommaso
Aug 29 2018 16:43
ohh
that's a good one :wink:
Anthony Underwood
@aunderwo
Aug 29 2018 16:45

Serves me right trying to be clever with the users in the docker container !!

I've been using Packer to create Vagrant images, DigitalOcean snapshots and docker images at the same time, and was using a user called bio rather than the standard root for docker images
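
The failure described above comes from how Docker bind mounts behave: a `-v host:container` mount hides whatever the image already has at the container path, so mounting the host's /home/bio over the image's /home/bio hid /home/bio/.linuxbrew/bin/fastqc. Below is a minimal shell sketch of a pre-flight check for this situation. The `mount_hides` function is hypothetical (it is not a Docker or Nextflow feature) and only compares path strings, using the paths from the log above.

```shell
# Hypothetical helper: succeed (exit 0) if a bind mount on container path $1
# would hide the image path $2, i.e. $2 is the mount target itself or lies
# underneath it. Pure string comparison, no Docker calls.
mount_hides() {
    mh_target="${1%/}"   # container-side mount point, trailing slash removed
    case "$2" in
        "$mh_target"|"$mh_target"/*) return 0 ;;
        *) return 1 ;;
    esac
}

# Location of the tool inside the image, taken from the log above.
tool=/home/bio/.linuxbrew/bin/fastqc

if mount_hides /home/bio "$tool"; then
    echo "mount on /home/bio hides $tool"
fi

# A mount on the work directory alone would leave the tool visible.
if ! mount_hides /home/bio/work "$tool"; then
    echo "mount on /home/bio/work leaves $tool visible"
fi
```

This only flags the path overlap. Whether the tool actually disappears still depends on what the host directory contains at that location.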