These are chat archives for nextflow-io/nextflow

2nd
Mar 2018
Daniel E Cook
@danielecook
Mar 02 2018 04:44
Has anyone used pyenv within nextflow?
I’m having the darndest time getting a custom virtualenv loaded within a docker container...
Bioninbo
@Bioninbo
Mar 02 2018 08:56
I had trouble using conda environments in a container, so now I use different containers.
Alexander Peltzer
@apeltzer
Mar 02 2018 09:14
Can't confirm that. My experiences were quite nice, so I'm using biocontainers now for most of the stuff
For 80-90% of the tools out there, there is an appropriate Docker/Singularity container then
Paolo Di Tommaso
@pditommaso
Mar 02 2018 09:30
@srynobio I think it won't work that way; you will need a custom batch job template that handles that mount
Luca Cozzuto
@lucacozzuto
Mar 02 2018 10:04
Hi @pditommaso, I'm getting a generic exit status 140 at the end of an execution on the cluster (while it runs fine manually using singularity exec -e .command.sh). I requested a lot of memory just in case, but the process still dies. What can I do to debug it? PS: there is nothing in .err or in .out. Thanks!
Paolo Di Tommaso
@pditommaso
Mar 02 2018 10:06
increase the mem ..
and time ..
Luca Cozzuto
@lucacozzuto
Mar 02 2018 10:06
I used 90G but in the trace it looks like it needs only 2G
Paolo Di Tommaso
@pditommaso
Mar 02 2018 10:07
how long is it taking?
Luca Cozzuto
@lucacozzuto
Mar 02 2018 10:07
less than one min
Paolo Di Tommaso
@pditommaso
Mar 02 2018 10:08
go in the task work dir
Luca Cozzuto
@lucacozzuto
Mar 02 2018 10:08
ok
Paolo Di Tommaso
@pditommaso
Mar 02 2018 10:08
launch the job with qsub .command.run
take the job_id
Luca Cozzuto
@lucacozzuto
Mar 02 2018 10:08
ok
Paolo Di Tommaso
@pditommaso
Mar 02 2018 10:08
when it's killed use qacct -j <jobid> to see the reason why it was killed
if qacct does not work => kill gabriel in your office :joy:
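The steps described here (go to the task work dir, resubmit `.command.run` with `qsub`, note the job id, then query the accounting record) could be scripted roughly as follows. This is only a sketch: the helper names are hypothetical, and the regular expression assumes the usual Grid Engine confirmation line of the form `Your job <id> ("<name>") has been submitted`, which may differ on other installations.

```python
import re
import subprocess


def parse_job_id(qsub_output: str) -> str:
    """Extract the numeric job id from qsub's confirmation line (assumed format)."""
    match = re.search(r"Your job(?:-array)? (\d+)", qsub_output)
    if match is None:
        raise ValueError(f"no job id found in: {qsub_output!r}")
    return match.group(1)


def submit_and_get_id(work_dir: str) -> str:
    """Run `qsub .command.run` inside a task work dir and return the job id."""
    result = subprocess.run(
        ["qsub", ".command.run"],
        cwd=work_dir, capture_output=True, text=True, check=True,
    )
    return parse_job_id(result.stdout)


def accounting_record(job_id: str) -> str:
    """Return the raw `qacct -j <job_id>` text; only available once the job has ended."""
    result = subprocess.run(
        ["qacct", "-j", job_id], capture_output=True, text=True, check=True,
    )
    return result.stdout
```

The accounting record is written only after the job finishes, so `accounting_record` may need to be retried a few times after the job disappears from the queue.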
Luca Cozzuto
@lucacozzuto
Mar 02 2018 10:15
ok
I just killed Gabriel
now?
Paolo Di Tommaso
@pditommaso
Mar 02 2018 10:20
the head of IT ;)
Luca Cozzuto
@lucacozzuto
Mar 02 2018 10:20
==============================================================
qname        short-sl7           
hostname     node-hp0510.linux.crg.es
group        Bioinformatics_Unit 
owner        lcozzuto            
project      NONE                
department   defaultdepartment   
jobname      nf-dropReport_(SRR1784313)
jobnumber    22271189            
taskid       undefined
account      sge                 
priority     0                   
cwd          /nfs/software/bi/biocore_tools/git/nextflow/indrop/indrop/work/5c/f54981e507422fdde3dc2081cea2f5
submit_host  ant-login7.linux.crg.es
submit_cmd   qsub .command.run   
qsub_time    03/02/2018 11:19:38.600
start_time   03/02/2018 11:19:43.206
end_time     03/02/2018 11:19:52.544
granted_pe   NONE                
slots        1                   
failed       0    
deleted_by   NONE
exit_status  140                 
ru_wallclock 9.338        
ru_utime     0.380        
ru_stime     0.329        
ru_maxrss    5680                
ru_ixrss     0                   
ru_ismrss    0                   
ru_idrss     0                   
ru_isrss     0                   
ru_minflt    83970               
ru_majflt    39                  
ru_nswap     0                   
ru_inblock   15282               
ru_oublock   72                  
ru_msgsnd    0                   
ru_msgrcv    0                   
ru_nsignals  0                   
ru_nvcsw     2206                
ru_nivcsw    402                 
cpu          6.110        
mem          9.930             
io           0.057             
iow          0.000             
maxvmem      1025.701G
maxrss       412.598M
maxpss       403.660M
arid         undefined
jc_name      NONE
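A record like the one above is easier to inspect once it is turned into key/value pairs. The sketch below parses the text and converts memory fields such as `maxvmem` to bytes so they can be compared with what was requested. The function names are hypothetical, and treating the `K`/`M`/`G`/`T` suffixes as powers of 1024 is an assumption.

```python
# Assumption: suffixes are binary multiples (1G = 1024**3 bytes).
_UNITS = {"K": 1024, "M": 1024**2, "G": 1024**3, "T": 1024**4}


def parse_qacct(text: str) -> dict:
    """Turn `qacct -j` output into a dict, skipping the '=====' separator line."""
    record = {}
    for line in text.splitlines():
        if line.startswith("="):
            continue
        parts = line.split(None, 1)  # key, then the rest of the line
        if len(parts) == 2:
            record[parts[0]] = parts[1].strip()
    return record


def to_bytes(value: str) -> float:
    """Convert values like '1025.701G' or '412.598M' to a number of bytes."""
    value = value.strip()
    if value and value[-1].upper() in _UNITS:
        return float(value[:-1]) * _UNITS[value[-1].upper()]
    return float(value)


def exceeds_request(record: dict, requested: str) -> bool:
    """True when the recorded maxvmem is larger than the requested memory."""
    return to_bytes(record["maxvmem"]) > to_bytes(requested)
```

With the values shown here, `exceeds_request(parse_qacct(text), "100G")` would be true, since 1025.701G is roughly ten times a 100G request, even though `maxrss` is only about 412M.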
Paolo Di Tommaso
@pditommaso
Mar 02 2018 10:21
that's all?
usually there's the error message at the end
Luca Cozzuto
@lucacozzuto
Mar 02 2018 10:25
that's all
Paolo Di Tommaso
@pditommaso
Mar 02 2018 10:25
frankly I don't know
edit the .command.run, increase the mem and try it again
Luca Cozzuto
@lucacozzuto
Mar 02 2018 10:27
If I run in ant-login it works... so maybe I can measure the memory used
Paolo Di Tommaso
@pditommaso
Mar 02 2018 10:28
eventually, but there's something wrong
look at the maxvmem in the qacct output: 1025.701G
Luca Cozzuto
@lucacozzuto
Mar 02 2018 10:29
it is quite a lot... what is this?
PS: I'm requesting 100G from the node
Paolo Di Tommaso
@pditommaso
Mar 02 2018 10:29
and it's using 1025G, therefore the cluster kills it
you owe me a :beer: :joy:
Luca Cozzuto
@lucacozzuto
Mar 02 2018 10:31
I did not get it (but you'll get your beer)
I don't think ant-login has 1000G...
Paolo Di Tommaso
@pditommaso
Mar 02 2018 10:31
that's not ant-login
hostname node-hp0510.linux.crg.es
Luca Cozzuto
@lucacozzuto
Mar 02 2018 10:32
if I run in ant-login it works.
Paolo Di Tommaso
@pditommaso
Mar 02 2018 10:32
man you don't read the stuff
that's another problem
Luca Cozzuto
@lucacozzuto
Mar 02 2018 10:32
man you don't listen :)
Paolo Di Tommaso
@pditommaso
Mar 02 2018 10:33
for some reason the cluster measures the job as using that much mem => it kills it
maybe the cluster mem probe is wrong, I have no clue
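The exit status itself is consistent with the job being killed from outside. By shell convention a process terminated by signal N is reported with status 128 + N, so 140 corresponds to signal 12. The sketch below decodes that. The name table uses Linux x86/ARM signal numbers and is an assumption about the compute node, and which signal a given Grid Engine setup sends when it enforces a limit depends on its configuration.

```python
# Assumption: Linux x86/ARM signal numbering; other platforms differ.
LINUX_SIGNAL_NAMES = {
    9: "SIGKILL",
    10: "SIGUSR1",
    12: "SIGUSR2",
    15: "SIGTERM",
    24: "SIGXCPU",
    25: "SIGXFSZ",
}


def signal_from_exit_status(status: int):
    """Return the signal number if status follows the 128+N convention, else None."""
    if 128 < status < 256:
        return status - 128
    return None


def describe_exit_status(status: int) -> str:
    """Give a short human-readable reading of an exit status."""
    sig = signal_from_exit_status(status)
    if sig is None:
        return f"exit status {status}: not a signal-style status"
    name = LINUX_SIGNAL_NAMES.get(sig, "unknown signal")
    return f"exit status {status}: terminated by signal {sig} ({name})"
```

For the job above, `describe_exit_status(140)` reports signal 12 (SIGUSR2 under the assumed numbering), which fits a job terminated by the scheduler rather than one failing on its own.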
Luca Cozzuto
@lucacozzuto
Mar 02 2018 10:33
ok so I should ask this to SIT
or kill the rest of them
:)
thanks
Paolo Di Tommaso
@pditommaso
Mar 02 2018 10:34
that sounds like a good plan ;)
Luca Cozzuto
@lucacozzuto
Mar 02 2018 10:50
I tried in a simple qlogin and it works... but with qsub it dies
Vladimir Kiselev
@wikiselev
Mar 02 2018 14:21
I love Luca-Paolo conversations here, keep going! More jokes and more beer, please ;-)
Luca Cozzuto
@lucacozzuto
Mar 02 2018 15:12
I love it as well when it ends well... :) not yet for now...
Shawn Rynearson
@srynobio
Mar 02 2018 22:13
@pditommaso I'm reviewing the .command.run shell script that's launched into your docker on aws-sbatch, and it looks like you do all the work in $NXF_SCRATCH or /tmp, is this correct?