These are chat archives for nextflow-io/nextflow

11th
Apr 2018
JezSw
@JezSw
Apr 11 2018 10:53
Hey, has anyone had any luck using Nextflow with Singularity images to do CUDA/GPU work on a SLURM cluster? I've been investigating and SLURM normally seems to modify the CUDA_VISIBLE_DEVICES environment variable to control which GPUs are used, but this doesn't seem to get passed through when using Nextflow, whereas it does when using a straight sbatch script.
Evan Floden
@evanfloden
Apr 11 2018 10:58
Not sure if relevant to SLURM, but @KevinSayers did work with Nextflow + CUDA/GPU + Singularity.
See for example here: https://github.com/KevinSayers/SRAGPU-nf - A Nextflow workflow that utilizes GPU enabled Singularity containers to process scRNA-seq data
Kevin Sayers
@KevinSayers
Apr 11 2018 11:01
@skptic @JezSw never tested it on SLURM, just AWS GPU instances, unfortunately.
JezSw
@JezSw
Apr 11 2018 11:03
Thanks for that, we can get images to use CUDA, and be used on SLURM nodes. The main issue is actually limiting which ones they see! Hence the CUDA_VISIBLE_DEVICES question. I'll put my sample up shortly.
mmmnano@odin:/mnt/nanostore/soft/testGPU$ cat testGPU.sh
#!/bin/bash
#
#SBATCH --job-name=test
#SBATCH --output=res_%j.txt
#
#SBATCH -p gpu
#SBATCH --gres gpu:1

srun hostname
srun singularity exec /mnt/nanostore/soft/images/crumpit-2018-03-14-03faf4da43e7.img /mnt/nanostore/soft/samples/1_Utilities/deviceQuery/deviceQuery
srun sleep 10
mmmnano@odin:/mnt/nanostore/soft/testGPU$ sbatch testGPU.sh
Submitted batch job 39145
mmmnano@odin:/mnt/nanostore/soft/testGPU$ cat res_39145.txt
thor
/mnt/nanostore/soft/samples/1_Utilities/deviceQuery/deviceQuery Starting...

 CUDA Device Query (Runtime API) version (CUDART static linking)

Detected 1 CUDA Capable device(s)

Device 0: "Quadro P5000"
  Compute Mode:
     < Default (multiple host threads can use ::cudaSetDevice() with device simultaneously) >

deviceQuery, CUDA Driver = CUDART, CUDA Driver Version = 9.0, CUDA Runtime Version = 9.0, NumDevs = 1
Result = PASS
mmmnano@odin:/mnt/nanostore/soft/testGPU$ cat testGPU.nf
#!/usr/bin/env nextflow

process get_cuda_visible_devices{
    clusterOptions '-p gpu --gres=gpu:1'

    output:
    stdout output_channel

    script:
    """
    hostname
    /mnt/nanostore/soft/samples/1_Utilities/deviceQuery/deviceQuery
    """
  }

output_channel.subscribe { print "I say..  $it" }
mmmnano@odin:/mnt/nanostore/soft/testGPU$ nextflow run testGPU.nf -with-singularity /mnt/nanostore/soft/images/crumpit-2018-03-14-03faf4da43e7.img -with-trace -w /mnt/nanostore/SCRATCH/
N E X T F L O W  ~  version 0.28.0
Launching `testGPU.nf` [lonely_varahamihira] - revision: a158568238
[warm up] executor > slurm
[3d/9d8107] Submitted process > get_cuda_visible_devices
I say..  thor
/mnt/nanostore/soft/samples/1_Utilities/deviceQuery/deviceQuery Starting...

 CUDA Device Query (Runtime API) version (CUDART static linking)

Detected 3 CUDA Capable device(s)

Device 0: "Quadro P5000"
  Compute Mode:
     < Default (multiple host threads can use ::cudaSetDevice() with device simultaneously) >

Device 1: "Quadro P5000"
  Compute Mode:
     < Default (multiple host threads can use ::cudaSetDevice() with device simultaneously) >

Device 2: "Quadro P5000"
  Compute Mode:
     < Default (multiple host threads can use ::cudaSetDevice() with device simultaneously) >
> Peer access from Quadro P5000 (GPU0) -> Quadro P5000 (GPU1) : Yes
> Peer access from Quadro P5000 (GPU0) -> Quadro P5000 (GPU2) : Yes
> Peer access from Quadro P5000 (GPU1) -> Quadro P5000 (GPU0) : Yes
> Peer access from Quadro P5000 (GPU1) -> Quadro P5000 (GPU2) : Yes
> Peer access from Quadro P5000 (GPU2) -> Quadro P5000 (GPU0) : Yes
> Peer access from Quadro P5000 (GPU2) -> Quadro P5000 (GPU1) : Yes

deviceQuery, CUDA Driver = CUDART, CUDA Driver Version = 9.0, CUDA Runtime Version = 9.0, NumDevs = 3
Result = PASS
Kevin Sayers
@KevinSayers
Apr 11 2018 11:11
what is the output if you cat .command.run in the working dir?
JezSw
@JezSw
Apr 11 2018 11:13
#!/bin/bash
#SBATCH -D /mnt/nanostore/SCRATCH/3d/9d8107291d3bb67015fa4c1f4296fa
#SBATCH -J nf-get_cuda_visible_devices
#SBATCH -o /mnt/nanostore/SCRATCH/3d/9d8107291d3bb67015fa4c1f4296fa/.command.log
#SBATCH --no-requeue
#SBATCH -p gpu --gres=gpu:1

# NEXTFLOW TASK: get_cuda_visible_devices
set -e
set -u
NXF_DEBUG=${NXF_DEBUG:=0}; [[ $NXF_DEBUG > 1 ]] && set -x

nxf_env() {
    echo '============= task environment ============='
    env | sort | sed "s/\(.*\)AWS\(.*\)=\(.\{6\}\).*/\1AWS\2=\3xxxxxxxxxxxxx/"
    echo '============= task output =================='
}

nxf_kill() {
    declare -a ALL_CHILD
    while read P PP;do
        ALL_CHILD[$PP]+=" $P"
    done < <(ps -e -o pid= -o ppid=)

    walk() {
        [[ $1 != $$ ]] && kill $1 2>/dev/null || true
        for i in ${ALL_CHILD[$1]:=}; do walk $i; done
    }

    walk $1
}

nxf_mktemp() {
    local base=${1:-/tmp}
    if [[ $base == /dev/shm && ! -d $base ]]; then base=/tmp; fi
    if [[ $(uname) = Darwin ]]; then mktemp -d $base/nxf.XXXXXXXXXX
    else TMPDIR="$base" mktemp -d -t nxf.XXXXXXXXXX
    fi
}

on_exit() {
  exit_status=${ret:=$?}
  printf $exit_status > /mnt/nanostore/SCRATCH/3d/9d8107291d3bb67015fa4c1f4296fa/.exitcode
  set +u
  [[ "$tee1" ]] && kill $tee1 2>/dev/null
  [[ "$tee2" ]] && kill $tee2 2>/dev/null
  [[ "$ctmp" ]] && rm -rf $ctmp || true
  exit $exit_status
}

on_term() {
    set +e
    [[ "$pid" ]] && nxf_kill $pid
}

trap on_exit EXIT
trap on_term TERM INT USR1 USR2

export NXF_BOXID="nxf-$(dd bs=18 count=1 if=/dev/urandom 2>/dev/null | base64 | tr +/ 0A)"
NXF_SCRATCH=''
[[ $NXF_DEBUG > 0 ]] && nxf_env
touch /mnt/nanostore/SCRATCH/3d/9d8107291d3bb67015fa4c1f4296fa/.command.begin
[[ $NXF_SCRATCH ]] && echo "nxf-scratch-dir $HOSTNAME:$NXF_SCRATCH" && cd $NXF_SCRATCH

set +e
ctmp=$(nxf_mktemp /dev/shm)
cout=$ctmp/.command.out; mkfifo $cout
cerr=$ctmp/.command.err; mkfifo $cerr
tee .command.out < $cout &
tee1=$!
tee .command.err < $cerr >&2 &
tee2=$!
(
set +u; env - PATH="$PATH" SINGULARITYENV_TMP="$TMP" SINGULARITYENV_TMPDIR="$TMPDIR" singularity exec /mnt/nanostore/soft/images/crumpit-2018-03-14-03faf4da43e7.img /bin/bash -c "cd $PWD; /bin/bash /mnt/nanostore/SCRATCH/3d/9d8107291d3bb67015fa4c1f4296fa/.command.stub"
) >$cout 2>$cerr &
pid=$!
wait $pid || ret=$?
wait $tee1 $tee2
Luca Cozzuto
@lucacozzuto
Apr 11 2018 11:28
Dear all, is anyone trying to run Windows apps inside containers on Linux machines?
Kevin Sayers
@KevinSayers
Apr 11 2018 11:29
@JezSw no idea, it looks like it should be equivalent to me but I am also not very familiar with clusterOptions or any common pitfalls. Seems like a question for @pditommaso :laughing:
JezSw
@JezSw
Apr 11 2018 11:31
Thanks for looking anyway @KevinSayers ! Yeah I have a feeling the setting is getting gobbled up somewhere or there's an obscure option that may help.
Paolo Di Tommaso
@pditommaso
Apr 11 2018 11:32
let's reduce this problem to a single task issue
I'm not a big expert on GPUs, but if you are able to run a single job, then you can scale your workload with NF
@JezSw I'm missing what the main issue is?
Kevin Sayers
@KevinSayers
Apr 11 2018 11:37
If you switch your testGPU.sh to:
#SBATCH --job-name=test
#SBATCH --output=res_%j.txt
#
#SBATCH -p gpu --gres gpu:1
does it still work outside of NF?
Paolo Di Tommaso
@pditommaso
Apr 11 2018 11:40
ok, I think I've understood what the problem is
NF clears the environment, look at the env - prefixing the singularity command line
you need to explicitly specify all the variables you need
JezSw
@JezSw
Apr 11 2018 11:41
@pditommaso . The main issue is that SLURM uses the CUDA_VISIBLE_DEVICES environment variable to control which GPUs are shown to the job. This cannot be found in a nextflow with singularity job. @KevinSayers that .sh script is being submitted using straight sbatch, no NF.
Paolo Di Tommaso
@pditommaso
Apr 11 2018 11:42
exactly
try to add in your nextflow config
env.CUDA_VISIBLE_DEVICES='${CUDA_VISIBLE_DEVICES:-1}'
note the ' single quote
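For context, a minimal nextflow.config sketch with that line (the executor and singularity settings are assumptions from this thread; only the env entry is the actual fix):
// nextflow.config (sketch)
process.executor = 'slurm'
singularity.enabled = true

// single-quoted so Groovy does not interpolate it: the literal
// ${CUDA_VISIBLE_DEVICES:-1} is exported into the task environment and
// expanded by bash at runtime, i.e. after SLURM has set the variable
env.CUDA_VISIBLE_DEVICES = '${CUDA_VISIBLE_DEVICES:-1}'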
JezSw
@JezSw
Apr 11 2018 11:49
Awesome @pditommaso that's done the trick. Thanks!
Paolo Di Tommaso
@pditommaso
Apr 11 2018 11:49
:tada: :tada:
jncvee
@jncvee
Apr 11 2018 14:30
tput: unknown terminal "xterm-256color"
tput: unknown terminal "xterm-256color"
tput: unknown terminal "xterm-256color"
tput: unknown terminal "xterm-256color"
tput: unknown terminal "xterm-256color"
Hi, I don't know what this means and I keep getting it
When are you getting this error?
what is your setup?
jncvee
@jncvee
Apr 11 2018 14:37
when I am running a script on a cluster
Paolo Di Tommaso
@pditommaso
Apr 11 2018 14:37
what is the complete output? I mean, who/what is writing that error message?
Maxime Garcia
@MaxUlysse
Apr 11 2018 14:38
Which script? What kind of cluster?
Can you provide more information?
Félix C. Morency
@fmorency
Apr 11 2018 14:40
It doesn't look nf-related?
Paolo Di Tommaso
@pditommaso
Apr 11 2018 14:42
it could
jncvee
@jncvee
Apr 11 2018 14:52
it's a script that I made and it's a cluster I use for school so i really dont know too much about it
Paolo Di Tommaso
@pditommaso
Apr 11 2018 14:53
emm, what makes you think it's nextflow related ? :)
jncvee
@jncvee
Apr 11 2018 14:55
i've only gotten it when using a nextflow script
Paolo Di Tommaso
@pditommaso
Apr 11 2018 14:56
I would suggest getting assistance from your sysadmins
and eventually open an issue once you have a more detailed error report
Luca Cozzuto
@lucacozzuto
Apr 11 2018 15:01
Hi @pditommaso how to get the current working directory for nextflow?
Paolo Di Tommaso
@pditommaso
Apr 11 2018 15:02
work directory of what? single task or entire workflow ?
Luca Cozzuto
@lucacozzuto
Apr 11 2018 15:02
single task
Paolo Di Tommaso
@pditommaso
Apr 11 2018 15:03
plain old bash $PWD
and don't forget to escape the $ ==> \$PWD
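e.g. a minimal sketch of that inside a process (process name here is just illustrative):
process show_task_dir {
    script:
    """
    # \$PWD is escaped so Nextflow leaves it alone and bash resolves it
    # inside the task, giving the task's own work directory
    echo "task work dir: \$PWD"
    """
}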
Luca Cozzuto
@lucacozzuto
Apr 11 2018 15:11
thanks!
Paolo Di Tommaso
@pditommaso
Apr 11 2018 15:11
:+1:
Stephen Kelly
@stevekm
Apr 11 2018 15:27
hey was anyone working on this one? nextflow-io/nextflow#493
Paolo Di Tommaso
@pditommaso
Apr 11 2018 15:28
nope
Stephen Kelly
@stevekm
Apr 11 2018 15:28
I was thinking about giving it a shot but I've never done anything in Groovy before so it might take me some time to get up to speed
but it sounded like its mostly just mirroring the functionality of the Docker and Singularity features?
Paolo Di Tommaso
@pditommaso
Apr 11 2018 15:29
well, actually there would be very little of groovy to implement that
the main work is to generate the proper bash code in the wrapper created by NF
Stephen Kelly
@stevekm
Apr 11 2018 15:30
ok I was actually working on custom conda packages the other week and figured out how to make them, I did one here https://github.com/NYU-Molecular-Pathology/NGS580-nf/tree/master/conda/annovar-150617
I was thinking just have the user set up a package directory like that in a dir called 'conda' or something, and then it would just be a wrapper for NF to run those commands to build the env, then load it in the execution script
Paolo Di Tommaso
@pditommaso
Apr 11 2018 15:32
well this comment describes a possible implementation
do you have specific doubts on that ?
jncvee
@jncvee
Apr 11 2018 15:33
how does nextflow run zipped files? I ran a trimming script with an unzipped file and it worked, but when I zipped it, it did not work. I believe this is a nextflow problem
Paolo Di Tommaso
@pditommaso
Apr 11 2018 15:42
how does nextflow run zipped files?
nothing special, it's just a file like any other
Stephen Kelly
@stevekm
Apr 11 2018 15:47
@pditommaso as per your conda comment there and the commands & files in my repo, there are a lot of variables that go into creating a conda env. For example, on my systems there are actually multiple 'conda' installations, and none are in the $PATH automatically, which is why I call mine by path in that Makefile. I am not yet familiar enough with 'conda' to be clear on your 'approach' points. For me I think the simplest way to start would be to look for pre-existing env's, and offload the creation & management of them to the user before running the pipeline. But I would probably have to spend more time messing with it to get a better idea
doesn't Nextflow already leave config information in the user's home directory? Maybe that would be a good place to store the envs?
Paolo Di Tommaso
@pditommaso
Apr 11 2018 15:49
with a pre-existing env, you can just activate it before launching NF
I don't see any value in that
Stephen Kelly
@stevekm
Apr 11 2018 15:49
there is value because, just like Docker containers, you might want a different env per task
Right now I am using the beforeScript directive for it but it gets messy and crowded fast: https://github.com/NYU-Molecular-Pathology/NGS580-nf/blob/f700e1b36f2e13b0d56c7a4e4951710dc31e7b68/nextflow.config#L312
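for reference, a sketch of that beforeScript pattern in nextflow.config (process names, conda path and env names below are placeholders, not the real ones from that repo):
process {
    // activate a pre-built conda env before each task's script runs
    withName: align {
        beforeScript = 'source /opt/conda/bin/activate align-env'
    }
    withName: annotate {
        beforeScript = 'source /opt/conda/bin/activate annovar-env'
    }
}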
Paolo Di Tommaso
@pditommaso
Apr 11 2018 15:51
as soon as you have a few of them, it would be a mess to manually create them; frankly I don't like this approach
Stephen Kelly
@stevekm
Apr 11 2018 15:51
which approach? beforeScript?
Paolo Di Tommaso
@pditommaso
Apr 11 2018 15:52
that a user creates a conda environment, then he/she specifies the env name in the NF config/process
IMO NF should be able to handle conda modules or env files eg
process foo {
  conda 'bwa=1.1'

  '''
  bwa mem .. etc
  ''' 
}
or
process foo {
  conda '/some/path/env.yml'

  '''
  bwa mem .. etc
  ''' 
}
Stephen Kelly
@stevekm
Apr 11 2018 15:56
using env.yml seems like it would be much easier
Paolo Di Tommaso
@pditommaso
Apr 11 2018 15:56
if you want to give that a try, it's fine by me
but in any case it requires creating the env on the fly starting from that file and then activating it
the former is just a special case
because given a list of conda modules, it just requires creating the env file first
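roughly, the bash the wrapper would have to generate could look like this (sketch only; the paths and caching scheme are illustrative, not the actual implementation):
# sketch of wrapper-generated bash (illustrative paths)
ENV_PREFIX="$NXF_WORK/conda/bwa-env"
if [ ! -d "$ENV_PREFIX" ]; then
    # build the env once from the YAML file declared in the process
    conda env create --prefix "$ENV_PREFIX" --file /some/path/env.yml
fi
source activate "$ENV_PREFIX"
# ...then run the task script as usual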
jncvee
@jncvee
Apr 11 2018 15:59
i keep receiving this error Line 1 in FASTQ file is expected to start with '@', but found '\x1f\x8b\x08\x00^E\x19W\x00\x03' when it is zipped
Paolo Di Tommaso
@pditommaso
Apr 11 2018 16:00
go in the task work dir
and debug it running bash .command.run
check the files, fastq file, etc
Stephen Kelly
@stevekm
Apr 11 2018 16:00
it sounds like your tool only wants them unzipped, are you sure it works with zipped files?
jncvee
@jncvee
Apr 11 2018 16:01
i am using cutadapt and it has been used with zipped files before
Edgar
@edgano
Apr 11 2018 16:03
in the cutadapt documentation: "Files compressed with bzip2 (.bz2) or xz (.xz) are also supported, but only if the Python installation includes the proper modules. xz files require Python 3.3 or later."
maybe you don't have the right Python version in the container....
Stephen Kelly
@stevekm
Apr 11 2018 16:06
http://cutadapt.readthedocs.io/en/stable/guide.html : "Cutadapt supports compressed input and output files. Whether an input file needs to be decompressed or an output file needs to be compressed is detected automatically by inspecting the file name: If it ends in .gz, then gzip compression is assumed."
maybe the file is getting staged without the .gz suffix? Something like the sketch below could pin the staged name.
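a sketch (DSL of that era; channel name and adapter sequence are made up) of staging the file under an explicit .gz name so cutadapt's name-based detection works:
process trim {
    input:
    // stage the incoming file under a name that keeps the .gz suffix
    file 'reads.fastq.gz' from reads_ch

    script:
    """
    cutadapt -a AGATCGGAAGAGC -o trimmed.fastq.gz reads.fastq.gz
    """
}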
@pditommaso are there any dev docs for Nextflow? Trying to get an idea of the overall structure of the program and its modules, and figure out which parts to look at for the conda implementation
Paolo Di Tommaso
@pditommaso
Apr 11 2018 16:11
unfortunately no, BTW I'm commenting in that issue with an implementation idea I've just had
Paolo Di Tommaso
@pditommaso
Apr 11 2018 16:34
@stevekm try to have a look if it makes sense to you
Stephen Kelly
@stevekm
Apr 11 2018 18:19
ok. One thing, about this: "The environment needs to be created in the workflow work directory, this guarantees that it's shared with all computing nodes (as for singularity images)."
on our HPC, the admins estimate that there is a ~20s delay for the NFS storage to update between all nodes. Would this cause issues?
also, I just remembered that when I was trying to build my custom conda package, I needed to use 'conda-build' which was separate from conda and not installed by default I think, not sure if that might come up as a requirement for NF to perform the steps you describe
installing 'conda-build' required updating a lot of the built-in conda packages too so it was not as trivial as I had hoped to set up
Rohan Shah
@rohanshah
Apr 11 2018 22:05
does anyone know how to properly use the executor.jobName config? I tried adding it to my nextflow.config file but when the process is submitted to AWS Batch it uses the name of the process as the Batch job name
I also tried overriding it within my individual processes but got an ERROR ~ No such variable: jobName
Rohan Shah
@rohanshah
Apr 11 2018 22:43
nevermind, after digging through the code it seems that executor.jobName is not used for AWS Batch and cannot be easily changed to do so since it is not a grid executor
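for reference, on grid executors (SLURM, SGE, LSF) the documented usage is a closure in the executor scope of nextflow.config, e.g.:
// nextflow.config - grid executors only, ignored by AWS Batch
executor {
    jobName = { "$task.name - $task.hash" }
}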
Rohan Shah
@rohanshah
Apr 11 2018 22:51
does anyone know of any other way to set a process name using variables in the nextflow runtime?