These are chat archives for nextflow-io/nextflow

2nd
Oct 2017
Michael L Heuer
@heuermh
Oct 02 2017 00:52
@lucacozzuto did you open an issue re: file splitting? Please @ mention me there so that I might contribute...
ashkurti
@ashkurti
Oct 02 2017 08:36
good morning!
I have a problem parsing the nextflow.config file, which has the following content:
process.$simulationDlMeso.env {
    OMP_NUM_THREADS=1
    LD_LIBRARY_PATH='/opt/ibm/lib:$LD_LIBRARY_PATH'
}
the problem reported is:
-bash-4.2$ nextflow run firstLsf.nf 
N E X T F L O W  ~  version 0.25.7
Launching `firstLsf.nf` [evil_mcnulty] - revision: b12940fee4
ERROR ~ Unable to parse config file: 'nextflow.config' 

  No signature of method: groovy.util.ConfigObject.env() is applicable for argument types: (_nf_config_7c6b28f4$_run_closure1) values: [_nf_config_7c6b28f4$_run_closure1@481ba2cf]
  Possible solutions: any(), any(groovy.lang.Closure), any(groovy.lang.Closure), find(), get(java.lang.Object), find(groovy.lang.Closure)
Paolo Di Tommaso
@pditommaso
Oct 02 2017 08:37
that syntax is not allowed
you can only do
env {
    OMP_NUM_THREADS=1
    LD_LIBRARY_PATH='/opt/ibm/lib:$LD_LIBRARY_PATH'
}
ashkurti
@ashkurti
Oct 02 2017 08:38
if I wanted to assign different env values to different processes
is there any shortcut then
I mean any way of doing that?
Paolo Di Tommaso
@pditommaso
Oct 02 2017 08:39
you need to define them as an input
process foo {
  input:
  env OMP_NUM_THREADS from 1
  '''
  your_command
  '''
}
or
process foo {

  '''
  export OMP_NUM_THREADS=1
  your_command
  '''
}
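For reference, a minimal sketch combining the second approach with the environment values from the nextflow.config shown earlier (the /opt/ibm/lib path and your_command are taken from this conversation and are assumptions, not verified settings):
process simulationDlMeso {
    // note the escaped \$ so LD_LIBRARY_PATH is expanded by the shell, not by Groovy
    """
    export OMP_NUM_THREADS=1
    export LD_LIBRARY_PATH=/opt/ibm/lib:\$LD_LIBRARY_PATH
    your_command
    """
}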
ashkurti
@ashkurti
Oct 02 2017 08:41
ok thanks, I will try the second way and remove the nextflow.config file
ashkurti
@ashkurti
Oct 02 2017 08:50
have you got any examples of a nextflow process with an LSF scheduler that uses a Data Manager?
in my case the cluster job generated by nextflow disappears 2 seconds after entering run status (due to a problem with the data manager), but the nextflow run is still pending at the following status:
-bash-4.2$ nextflow run firstLsf.nf 
N E X T F L O W  ~  version 0.25.7
Launching `firstLsf.nf` [gloomy_mercator] - revision: 38417ed010
[warm up] executor > lsf
[28/e71f6f] Submitted process > simulationDlMeso
Paolo Di Tommaso
@pditommaso
Oct 02 2017 08:55
what's a data manager?
ashkurti
@ashkurti
Oct 02 2017 08:57
it is a transfer of input and output data between the login and the compute nodes that is done by the LSF scheduler (technically to reduce computation costs, but in practice causing other problems from a usability perspective)
Paolo Di Tommaso
@pditommaso
Oct 02 2017 08:58
mm, I think that can be the problem
NF is able to move input/output data on its own (when the option process.scratch = true is specified)
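A minimal sketch of that option in nextflow.config, assuming the lsf executor from this conversation:
process {
    executor = 'lsf'
    // run each task in a local scratch directory and copy results back to the shared work dir
    scratch  = true
}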
ashkurti
@ashkurti
Oct 02 2017 09:00
ok, but if the job has disappeared I don't understand why NF is still waiting for something
Paolo Di Tommaso
@pditommaso
Oct 02 2017 09:00
however it needs the control files created by NF in the work directory on the shared file system
I don't understand why NF is still waiting for something
I guess because it waits for the control files, which are not placed where they are supposed to be
is the data manager mandatory ?
ashkurti
@ashkurti
Oct 02 2017 09:02
in the case of this cluster it is intrinsic to LSF
but while the jobs run fine with a separate LSF script
when I try to replicate that script on NF
I can't get it to work
Paolo Di Tommaso
@pditommaso
Oct 02 2017 09:03
how do you run NF ?
ashkurti
@ashkurti
Oct 02 2017 09:03
nextflow run firstLsf.nf
Paolo Di Tommaso
@pditommaso
Oct 02 2017 09:04
is the lsf executor specified in the config file?
ashkurti
@ashkurti
Oct 02 2017 09:05
this is my script:
#!/usr/bin/env nextflow

process simulationDlMeso {

    publishDir "."

    input:
    file deether_dir from Channel.value(file('./DEETHER/*'))

    output:
    file 'OUTPUT'

    executor 'lsf'
    time '3h'
    queue 'panther'
    module 'ibm ibmessl cuda/8.0 openmpi/2.0.2 utilities-gcc:python2'
    clusterOptions '-R "affinity[core(1)]    span[ptile=16]" -data /gpfs/cds/local/HCRI003/rla09/axs10-rla09/nxf/getstarted/DEETHER/FIELD -data /gpfs/cds/local/HCRI003/rla09/axs10-rla09/nxf/getstarted/DEETHER/CONTROL -data /gpfs/cds/local/HCRI003/rla09/axs10-rla09/nxf/getstarted/DEETHER/CONFIG -data /gpfs/cds/local/HCRI003/rla09/shared/common/dpd_xl_openmpi-v2.0.2-gpu_omp.exe -n 16'

    """
    export LD_LIBRARY_PATH=/opt/ibm/lib:$LD_LIBRARY_PATH
    export OMP_NUM_THREADS=1
    mpirun -n 16 dpd_xl_openmpi-v2.0.2-gpu_omp.exe
    touch simulation_complete
    bstage out -src HISTORY000000
    bstage out -src HISTORY000001
    bstage out -src HISTORY000002
    bstage out -src HISTORY000003
    bstage out -src OUTPUT
    bstage out -src simulation_complete
    """
with no config file
Paolo Di Tommaso
@pditommaso
Oct 02 2017 09:05
ok
are you launching it in a directory accessible to all nodes ?
ashkurti
@ashkurti
Oct 02 2017 09:06
I was even thinking of removing the input block, as the data will be transferred to the compute nodes anyway by the Data Manager through the -data directives
yes
in a directory accessible to all nodes
Paolo Di Tommaso
@pditommaso
Oct 02 2017 09:07
what's the content of the directory work/28/e71f6f...
ashkurti
@ashkurti
Oct 02 2017 09:08
with the understanding that the job created by NF will still have me as the owner and therefore should have no problems accessing the files specified in the script, of which I am the owner
-bash-4.2$ ls -al work/5c/45caceda05a3e31e4dc1c255e99388/
total 0
drwxr-xr-x 2 axs10-rla09 rla09 4096 Oct  2 10:04 .
drwxr-xr-x 3 axs10-rla09 rla09 4096 Oct  2 10:04 ..
-rw-r--r-- 1 axs10-rla09 rla09  476 Oct  2 10:04 .command.env
-rw-r--r-- 1 axs10-rla09 rla09 2997 Oct  2 10:04 .command.run
-rw-r--r-- 1 axs10-rla09 rla09  638 Oct  2 10:04 .command.sh
that is the content, and although the job on the cluster has terminated with problems it looks like NF still waits for it to complete.
Paolo Di Tommaso
@pditommaso
Oct 02 2017 09:11
yes, because NF waits for a file named .exitcode to be created in that folder when the job completes
you can try to debug just that task
by changing into that directory and running bash .command
it should submit the execution with lsf, and you can try to understand why it's not working
ashkurti
@ashkurti
Oct 02 2017 09:21
done.
-bash-4.2$ bash .command.run 
Lmod has detected the following error:  The following module(s) are unknown: "ibm/ibmessl"
so the ibm and ibmessl modules it complains about are two separate modules that need to be loaded separately. Maybe the way I have expressed the module loads in the script is missing something?
Paolo Di Tommaso
@pditommaso
Oct 02 2017 09:22
oh
you need to specify the modules one by one or separate them with a :
in any case it should produce an error
is the .exitcode file in that folder now?
ashkurti
@ashkurti
Oct 02 2017 09:25
now it is.
Paolo Di Tommaso
@pditommaso
Oct 02 2017 09:25
well, the module syntax is wrong but I'm not sure that's the main issue
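For reference, a hedged sketch of the corrected directive, with the module names copied from the script above (whether these exact modules exist on the cluster is an assumption):
    // modules separated by ':' are loaded one by one
    module 'ibm:ibmessl:cuda/8.0:openmpi/2.0.2:utilities-gcc:python2'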
ashkurti
@ashkurti
Oct 02 2017 09:25
Before running this, I interrupted the pending NF with CTRL+c but it still did not create an .exitcode file
which was created after running bash .command.run
and contains 1
Paolo Di Tommaso
@pditommaso
Oct 02 2017 09:27
I would suggest trying a simpler pipeline, e.g. nextflow run hello, to verify that you are able to run on that cluster
ashkurti
@ashkurti
Oct 02 2017 09:28
I am able to run jobs normally
or maybe you meant to verify that I can run nextflow jobs on that cluster
Paolo Di Tommaso
@pditommaso
Oct 02 2017 09:28
exactly
ashkurti
@ashkurti
Oct 02 2017 09:28
I can run the hello instance
from the login node
successfully
Paolo Di Tommaso
@pditommaso
Oct 02 2017 09:29
using the lsf executor ?
ashkurti
@ashkurti
Oct 02 2017 09:29
no without the lsf executor
Paolo Di Tommaso
@pditommaso
Oct 02 2017 09:29
eh, that may be the problem
create a nextflow.config file with these settings
process {
  executor = 'lsf'
  time = '3h'
  queue = 'panther'
}
in the same directory launch nextflow run hello
ashkurti
@ashkurti
Oct 02 2017 09:37
bash-4.2$ nextflow run helloworld.nf 
N E X T F L O W  ~  version 0.25.7
Launching `helloworld.nf` [friendly_goldberg] - revision: b38eb225dd
[warm up] executor > lsf
[5d/cc4159] Submitted process > printHello (4)
[8c/9b6aee] Submitted process > printHello (3)
[83/ff9d13] Submitted process > printHello (1)
[aa/d4938c] Submitted process > printHello (2)
-bash-4.2$ bjobs
JOBID   USER    STAT  QUEUE      FROM_HOST   EXEC_HOST   JOB_NAME   SUBMIT_TIME
715048  axs10-r RUN   panther    hcplogin1   pgc324@pant *Hello_(4) Oct  2 10:36
715049  axs10-r PEND  panther    hcplogin1               *Hello_(3) Oct  2 10:36
715050  axs10-r PEND  panther    hcplogin1               *Hello_(1) Oct  2 10:36
715051  axs10-r PEND  panther    hcplogin1               *Hello_(2) Oct  2 10:36
Paolo Di Tommaso
@pditommaso
Oct 02 2017 09:38
almost there ..
does it print
Ciao world!
Bonjour world!
Hello world!
Hola world!
or does it hang?
ashkurti
@ashkurti
Oct 02 2017 09:38
it hangs
...
Paolo Di Tommaso
@pditommaso
Oct 02 2017 09:39
I see
ashkurti
@ashkurti
Oct 02 2017 09:39
is this because NF expects all outputs to be generated in the same folder work/... ?
if there are any files generated by the cluster, they will be stored on the remote compute nodes unless specifically told to be transferred back to wherever the launching folder is
do we know the names of the files generated by nextflow that we want brought back ...
Paolo Di Tommaso
@pditommaso
Oct 02 2017 09:42
let me check one thing
ashkurti
@ashkurti
Oct 02 2017 09:43
ok thanks
Paolo Di Tommaso
@pditommaso
Oct 02 2017 09:55
I'm uploading a possible patch
ashkurti
@ashkurti
Oct 02 2017 09:55
what would the fix be related to then?
Paolo Di Tommaso
@pditommaso
Oct 02 2017 09:58
to specify the path of the task work dir in the job def
ok, in the same dir try to run
NXF_VER=0.25.8-SNAPSHOT nextflow run hello
ashkurti
@ashkurti
Oct 02 2017 09:59
ok
Paolo Di Tommaso
@pditommaso
Oct 02 2017 09:59
it will download some stuff then run it
ashkurti
@ashkurti
Oct 02 2017 10:00
bash-4.2$ NXF_VER=0.25.8-SNAPSHOT nextflow run hello
CAPSULE: Downloading dependency io.nextflow:nextflow:pom:0.25.8-20171002.095338-4
CAPSULE: Transfer failed: capsule.org.eclipse.aether.transfer.ArtifactNotFoundException: Could not find artifact io.nextflow:nextflow:pom:0.25.8-20171002.095338-4 in local (file:/gpfs/fairthorpe/local/HCRI003/rla09/axs10-rla09/.m2/repository) (for stack trace, run with -Dcapsule.log=verbose)
CAPSULE: Downloading dependency io.nextflow:nextflow:pom:0.25.8-20171002.095338-4
CAPSULE: Downloading dependency io.nextflow:nxf-commons:pom:0.25.8-20171002.095403-4
CAPSULE: Transfer failed: capsule.org.eclipse.aether.transfer.ArtifactNotFoundException: Could not find artifact io.nextflow:nxf-commons:pom:0.25.8-20171002.095403-4 in local (file:/gpfs/fairthorpe/local/HCRI003/rla09/axs10-rla09/.m2/repository) (for stack trace, run with -Dcapsule.log=verbose)
CAPSULE: Downloading dependency io.nextflow:nxf-commons:pom:0.25.8-20171002.095403-4
CAPSULE: Downloading dependency io.nextflow:nxf-httpfs:pom:0.25.8-20171002.095513-4
CAPSULE: Transfer failed: capsule.org.eclipse.aether.transfer.ArtifactNotFoundException: Could not find artifact io.nextflow:nxf-httpfs:pom:0.25.8-20171002.095513-4 in local (file:/gpfs/fairthorpe/local/HCRI003/rla09/axs10-rla09/.m2/repository) (for stack trace, run with -Dcapsule.log=verbose)
CAPSULE: Downloading dependency io.nextflow:nxf-httpfs:pom:0.25.8-20171002.095513-4
CAPSULE: Downloading dependency io.nextflow:nextflow:jar:0.25.8-20171002.095338-4
CAPSULE: Downloading dependency io.nextflow:nxf-commons:jar:0.25.8-20171002.095403-4
CAPSULE: Downloading dependency io.nextflow:nxf-httpfs:jar:0.25.8-20171002.095513-4
N E X T F L O W  ~  version 0.25.8-SNAPSHOT
Launching `nextflow-io/hello` [ecstatic_shockley] - revision: 6b9515aba6 [master]
[warm up] executor > lsf
[5f/d5605e] Submitted process > sayHello (4)
[42/c4977a] Submitted process > sayHello (2)
[78/ca465a] Submitted process > sayHello (3)
[c1/f285de] Submitted process > sayHello (1)
Paolo Di Tommaso
@pditommaso
Oct 02 2017 10:00
still hangs ?
ashkurti
@ashkurti
Oct 02 2017 10:00
yes
Paolo Di Tommaso
@pditommaso
Oct 02 2017 10:01
too bad
ashkurti
@ashkurti
Oct 02 2017 10:01
I know
Paolo Di Tommaso
@pditommaso
Oct 02 2017 10:01
last try, wait
Paolo Di Tommaso
@pditommaso
Oct 02 2017 10:11
ok, I've made another change, try this please
ashkurti
@ashkurti
Oct 02 2017 10:11
thanks!!!
Paolo Di Tommaso
@pditommaso
Oct 02 2017 10:11
CAPSULE_RESET=1 NXF_VER=0.25.8-SNAPSHOT nextflow run hello
ashkurti
@ashkurti
Oct 02 2017 10:12
it is still hanging ...
Paolo Di Tommaso
@pditommaso
Oct 02 2017 10:14
I'm out of options :/
you need to try asking your sysadmins for help
ashkurti
@ashkurti
Oct 02 2017 10:15
the thing is
I do not understand what is happening here
Paolo Di Tommaso
@pditommaso
Oct 02 2017 10:16
it's very simple: for each job NF creates a work dir and a launcher script
ashkurti
@ashkurti
Oct 02 2017 10:16
I have found a way to check what is stored in the remote folders of the disk connected to the compute nodes
Paolo Di Tommaso
@pditommaso
Oct 02 2017 10:17
it executes the launcher, i.e. .command.run
which is expected to run in that folder and create a file named .exitcode in that folder as well
you can replicate a single task execution just by moving into that folder and running bash .command
ashkurti
@ashkurti
Oct 02 2017 10:28
/one/two/three/etc...
then the remote directory related to the compute nodes will be /one/panther/three/etc...
I will replace this manually within the hello tasks in the cd section and see if this works
actually, ignore what I just wrote
which I am deleting now as it did not work
Paolo Di Tommaso
@pditommaso
Oct 02 2017 10:32
ok
ashkurti
@ashkurti
Oct 02 2017 10:32
I went
into the first folder, the one whose launched script would print Ciao world
Paolo Di Tommaso
@pditommaso
Oct 02 2017 10:33
using what command?
ashkurti
@ashkurti
Oct 02 2017 10:34
and it worked with just bash .command.run (it complains no such file or directory with bash .command)
Paolo Di Tommaso
@pditommaso
Oct 02 2017 10:34
ok
ashkurti
@ashkurti
Oct 02 2017 10:34
and at the same moment the hanging instance of NF printed Ciao world as well
Paolo Di Tommaso
@pditommaso
Oct 02 2017 10:34
oh
yes, because it created the .exitcode file that NF is expecting
so now the question is
why does it work if you run it directly but not with LSF?
ashkurti
@ashkurti
Oct 02 2017 10:36
yes
Paolo Di Tommaso
@pditommaso
Oct 02 2017 10:36
you can launch it with your scheduler using the bsub command
it should be bsub .command.run
ashkurti
@ashkurti
Oct 02 2017 10:39
it complains line 8: .command.run: command not found
Paolo Di Tommaso
@pditommaso
Oct 02 2017 10:39
how do you submit jobs to your cluster?
ashkurti
@ashkurti
Oct 02 2017 10:42
sorry, my mistake, bsub < .command.run, but it complains with the following
line 1: syntax error near unexpected token `newline'
line 1: `bsub> '
Paolo Di Tommaso
@pditommaso
Oct 02 2017 10:44
?
ashkurti
@ashkurti
Oct 02 2017 10:46
so we submit jobs in the following way: bsub < name_of_submission_script
Paolo Di Tommaso
@pditommaso
Oct 02 2017 10:46
yes right, but I don't understand the error message
ashkurti
@ashkurti
Oct 02 2017 10:47
and in this case, with bsub < .command.run, a file is generated by the cluster with this content
/gpfs/panther/local/HCRI003/rla09/axs10-rla09/.lsbatch/1506940864.3819845.shell: line 1: syntax error near unexpected token `newline'
/gpfs/panther/local/HCRI003/rla09/axs10-rla09/.lsbatch/1506940864.3819845.shell: line 1: `bsub> '
Paolo Di Tommaso
@pditommaso
Oct 02 2017 10:48
I have no clue what this is . .
try to debug that. It should work: it's just a bash script invoking the task command, i.e. .command.sh, and creating the .exitcode file when it completes
ashkurti
@ashkurti
Oct 02 2017 12:56
looking at the script generated by NF
that should then be launched through the LSF scheduler
I think the problem lies in three points
  1. There are files generated by that script on the compute nodes that are expected in the launching directory. These files would need to be staged back in the script through the directive bstage out -src file_name, such as bstage out -src .exitcode for example.
  2. The remote directory (as per complications of our server) has the same name as the launching directory except for the second part of the absolute path: /first/second/third/etc., where second should be replaced by panther.
  3. The files within the launching directory that will be used from within the script during the execution phase will need to be referenced from the scheduler and transferred to the compute nodes through the #BSUB -data absolute_path_of_file directive.
ashkurti
@ashkurti
Oct 02 2017 13:08
Do you think a NF patch could be integrated to address these issues for LSF schedulers that use a Data Manager?
I have to leave now but will come back to this later
Paolo Di Tommaso
@pditommaso
Oct 02 2017 13:09
please open a feature request on GH, with the above details
we will continue the discussion there
Rickard Hammarén
@Hammarn
Oct 02 2017 13:56
@pditommaso We have a problem with execution timing and evaluation for our config. This kinda works but always ends up being ./iGenomes/ and not being overwritten here.
We tried using {} to delay the evaluation until runtime, but that gave the following error:
ERROR ~ org.codehaus.groovy.runtime.GStringImpl cannot be cast to java.nio.file.FileSystem
Paolo Di Tommaso
@pditommaso
Oct 02 2017 14:02
please open an issue including the full error trace from the .nextflow.log file
Anthony Underwood
@aunderwo
Oct 02 2017 17:01
@pditommaso pleased to say that the advice to set the docker user via docker.runOptions = "-u \$(id -u):\$(id -g) " worked. So it was the non-standard docker image user that was causing the issue.

In the meantime I'd made my own NF workflow and docker image (a normal image without changing the user) and it had worked, which seemed to confirm our suspicions

Paolo Di Tommaso
@pditommaso
Oct 02 2017 17:05
I see, good to know
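For reference, a minimal sketch of where that setting lives in nextflow.config (the enabled flag is shown only for completeness):
docker {
    enabled    = true
    // run the container as the calling user so output files are not owned by root
    runOptions = '-u $(id -u):$(id -g)'
}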
Félix C. Morency
@fmorency
Oct 02 2017 17:05
Would it be relevant to add this information to the NF docker section?
Paolo Di Tommaso
@pditommaso
Oct 02 2017 17:06
I was thinking that, though if I'm not wrong it can cause other tricky side effects
for example:
$ docker run -it busybox sh
/ # env
HOSTNAME=90646f96d1e6
SHLVL=1
HOME=/root
TERM=xterm
PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin
PWD=/
/ # exit
$ docker run -it -u $(id -u):$(id -g) busybox sh
/ $ env
HOSTNAME=38dd1a01531c
SHLVL=1
HOME=/
TERM=xterm
PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin
PWD=/
/ $ exit
when setting the external user the HOME is /
this can break tools that rely on that
Paolo Di Tommaso
@pditommaso
Oct 02 2017 17:12
though, when using singularity the HOME is expected to be different from the one defined in the container ..
Félix C. Morency
@fmorency
Oct 02 2017 17:13
Interesting
Paolo Di Tommaso
@pditommaso
Oct 02 2017 17:14
maybe we should mount the host HOME implicitly in the same manner singularity does
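A hedged sketch of what that could look like today through runOptions (purely illustrative; -e and -v are standard docker flags, not a dedicated Nextflow feature):
docker {
    // keep the calling user and mount its real home inside the container, as singularity does by default
    runOptions = '-u $(id -u):$(id -g) -e HOME=$HOME -v $HOME:$HOME'
}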
Félix C. Morency
@fmorency
Oct 02 2017 17:15
or make it an option. iirc it's an option in the singularity config file
Paolo Di Tommaso
@pditommaso
Oct 02 2017 17:16
yep, but it's turned on by default
Félix C. Morency
@fmorency
Oct 02 2017 17:16
yes