These are chat archives for nextflow-io/nextflow

18th
Jan 2019
Raoul J.P. Bonnal
@helios
Jan 18 07:32

opened an issue to keep track of it nextflow-io/nextflow#1002

I'll look into it. Btw, using the local solution for filtering directories, does saveAs have access to the full path or just the basename? I see a relativize call in the source code.

Paolo Di Tommaso
@pditommaso
Jan 18 08:32
File name relative to the task work dir
Raoul J.P. Bonnal
@helios
Jan 18 08:39
but just the file name, not the full relative path, right?
Paolo Di Tommaso
@pditommaso
Jan 18 09:32
it's the same path as you have declared in the output: declaration
therefore if you have
output:
file 'foo.bam'
it's foo.bam
if you have specified
output:
file 'dir/foo.bam'
it's dir/foo.bam
Raoul J.P. Bonnal
@helios
Jan 18 09:57
if it's *, does it include subdirectories in the inner path as well?
Paolo Di Tommaso
@pditommaso
Jan 18 09:58
I think so
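A minimal sketch of that behavior (process name, paths, and glob are hypothetical): the value passed to saveAs is the output path relative to the task work dir, matching the output: declaration, so a subdirectory prefix is preserved and can be used for filtering; returning null skips publishing that file.

```nextflow
process sketchSaveAs {
    // 'path' is e.g. 'out/foo.bam' or 'out/logs/run.log' here,
    // i.e. the path as declared in output:; return null to skip
    publishDir 'results', saveAs: { path ->
        path.contains('/logs/') ? null : path
    }

    output:
    file 'out/**'

    script:
    """
    mkdir -p out/logs
    touch out/foo.bam out/logs/run.log
    """
}
```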
rfenouil
@rfenouil
Jan 18 13:49
Hello (sorry deleted my message by mistake).
I am using something like Channel.fromPath(...).groupBy{...} and I would like each group to be emitted as a separate value (not as a single big associative array).
Can someone suggest a conversion strategy ?
Shellfishgene
@Shellfishgene
Jan 18 13:57
I seem to remember there was a way to make the process directives, for example time or memory, depend on the number of input files. Is this possible?
rfenouil
@rfenouil
Jan 18 14:13
Looks like Channel.fromPath(...).groupBy{...}.flatMap() does the job. Should have searched harder before asking :)
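For the record, a minimal sketch of that pattern (the glob and grouping key are hypothetical): groupBy emits a single map of key -> list of files, and flatMap() then re-emits each map entry as its own value.

```nextflow
Channel
    .fromPath('data/*.fastq')                      // hypothetical input glob
    .groupBy { f -> f.baseName.tokenize('_')[0] }  // e.g. group by sample prefix
    .flatMap()                                     // one map entry per emission
    .subscribe { println it }
```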
rfenouil
@rfenouil
Jan 18 14:37

@Shellfishgene something like this changes the memory declaration based on number of files

Channel.fromPath("yourPathHere/*").into{ ch_files; ch_filesForCount }

process testProcess {

    memory { 3.GB * nbFiles }

    input:
    file myFiles from ch_files
    val  nbFiles from ch_filesForCount.count()

    script:
    """
    """
}

However, this example is probably not optimal (better to wait for confirmation from somebody with more experience), and the empty script means it doesn't actually do anything. Hope it helps as a base though.

Shellfishgene
@Shellfishgene
Jan 18 14:47
Thanks, I found count() and was wondering if I need a separate channel
rfenouil
@rfenouil
Jan 18 14:47
I could not manage to do without channel duplication but it is maybe possible
Gurus know better ;)
micans
@micans
Jan 18 14:53
It may be possible to do nbFiles.size(), but IIRC there may be an issue where this yields an error if there is only one file; if so then I think the plan is to fix that.
rfenouil
@rfenouil
Jan 18 14:54
You should be careful though: the example I gave defines the memory for one process based on the number of elements in the channel. If you want to set memory based on the number of elements in 'myFiles', you should use memory {3.GB*(myFiles.size())}
No need for channel duplication in that case, but it answers a different question
@micans Thank you for the warning too
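A minimal sketch of that per-task variant (names are hypothetical): here collect() makes myFiles a single list, so size() inside the memory closure counts the files actually staged for the task.

```nextflow
Channel.fromPath('data/*.bam').collect().set { ch_all_bams }

process mergeBams {
    // scale memory with the number of files staged for this task
    memory { 3.GB * myFiles.size() }

    input:
    file myFiles from ch_all_bams

    script:
    """
    echo "merging ${myFiles.size()} files"
    """
}
```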
Shellfishgene
@Shellfishgene
Jan 18 15:00
What does size do? I can't find it in the list of operators.
micans
@micans
Jan 18 15:01
it's the length of a list/array; a Groovy method, not a Nextflow operator
Shellfishgene
@Shellfishgene
Jan 18 15:01
ok, thanks
Yasset Perez-Riverol
@ypriverol
Jan 18 15:26
Hi, has anyone tested Singularity + LSF?
Paolo Di Tommaso
@pditommaso
Jan 18 15:27
@micans ?
anyhow, I don't think there's any critical requirement in combining the two
micans
@micans
Jan 18 15:51
@pditommaso did I say something stupid?
again?
Paolo Di Tommaso
@pditommaso
Jan 18 15:52
why ?! :satisfied:
micans
@micans
Jan 18 15:52
or do you ask whether I've tested singularity + LSF? :-)
Paolo Di Tommaso
@pditommaso
Jan 18 15:52
exactly
micans
@micans
Jan 18 15:52
hehe
Yasset Perez-Riverol
@ypriverol
Jan 18 15:52
@micans my config looks like:

docker {
    enabled = false
}

singularity {
    enable = true
}

process.executor = 'lsf'

trace {
    enabled = true
}

micans
@micans
Jan 18 15:53
we were talking just now about trying singularity here at Sanger with LSF. but we've not done it yet
Yasset Perez-Riverol
@ypriverol
Jan 18 15:54
here's the config on the LSF compute node (local run) with singularity

docker {
    enabled = false
}

singularity {
    enabled = true
}

trace {
    enabled = true
}

Paolo Di Tommaso
@pditommaso
Jan 18 15:55
just add process.executor = 'lsf'
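Putting the snippets together, the combined nextflow.config would presumably look like this; note that the Singularity scope setting is spelled `enabled`, not `enable`:

```groovy
docker {
    enabled = false
}

singularity {
    enabled = true    // 'enabled', not 'enable'
}

process.executor = 'lsf'

trace {
    enabled = true
}
```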
Yasset Perez-Riverol
@ypriverol
Jan 18 15:55
yes i did it in the previous config ..
Paolo Di Tommaso
@pditommaso
Jan 18 15:55
but ?
Yasset Perez-Riverol
@ypriverol
Jan 18 15:56
this is the definition of the process in my workflow

process createDecoyDb {
container 'biocontainers/searchgui:v2.8.6_cv2'

input:
file "db.fasta" from fasta_file

output:
file "db_concatenated_target_decoy.fasta" into fasta_decoy_db

script:
"""
java -cp /home/biodocker/bin/SearchGUI-2.8.6/SearchGUI-2.8.6.jar eu.isas.searchgui.cmd.FastaCLI -decoy -in db.fasta
"""

}

Paolo Di Tommaso
@pditommaso
Jan 18 15:56
triple `
new line
code
new line
triple `
Yasset Perez-Riverol
@ypriverol
Jan 18 16:00
process createDecoyDb {
    container 'biocontainers/searchgui:v2.8.6_cv2'

    input:
    file "db.fasta" from fasta_file

    output:
    file "db_concatenated_target_decoy.fasta" into fasta_decoy_db

    script:
    """
    java -cp /home/biodocker/bin/SearchGUI-2.8.6/SearchGUI-2.8.6.jar eu.isas.searchgui.cmd.FastaCLI -decoy -in db.fasta
    """
}
Paolo Di Tommaso
@pditommaso
Jan 18 16:00
cool, but I still don't see the problem?
Yasset Perez-Riverol
@ypriverol
Jan 18 16:01
let me send you the error file
ERROR ~ Error executing process > 'createDecoyDb'

Caused by:
  Process `createDecoyDb` terminated with an error exit status (1)

Command executed:

  java -cp /home/biodocker/bin/SearchGUI-2.8.6/SearchGUI-2.8.6.jar eu.isas.searchgui.cmd.FastaCLI -decoy -in db.fasta

Command exit status:
  1

Command output:
  (empty)

Command error:
  Error: Could not find or load main class eu.isas.searchgui.cmd.FastaCLI
Paolo Di Tommaso
@pditommaso
Jan 18 16:03
ummm, invalid class path?
Yasset Perez-Riverol
@ypriverol
Jan 18 16:04
yes, but when it's not LSF it works: local+docker and local+singularity
Paolo Di Tommaso
@pditommaso
Jan 18 16:05
try cat <task-work-dir>/.command.run | grep singularity
Yasset Perez-Riverol
@ypriverol
Jan 18 16:07
nothing
it looks like it is not "wrapping" the singularity part
Paolo Di Tommaso
@pditommaso
Jan 18 16:08
what's the output of nextflow config ?
Yasset Perez-Riverol
@ypriverol
Jan 18 16:09
You mean what is my nextflow config ?
Paolo Di Tommaso
@pditommaso
Jan 18 16:10
no, it's a command
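i.e., run from the launch directory it prints the fully resolved configuration, which is handy for checking that the singularity and executor settings are actually being picked up:

```
nextflow config
# or, to narrow it down:
nextflow config | grep -i -A2 singularity
```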
slowsmile
@slowsmile
Jan 18 16:18
Hi guys, I'm new to Nextflow, can anyone give me a little help? In my code I wrapped a module in one process; the module normally emits log messages on the fly, but it stopped doing so when running within Nextflow, even though I specify "stdout result" as output and subscribe to it at the end
Paolo Di Tommaso
@pditommaso
Jan 18 16:19
what does it mean emits log steps ?
slowsmile
@slowsmile
Jan 18 16:19
is there any way we can still have the function spit out verbose info like it did in a normal bash environment, even though it is wrapped within Nextflow?
that means printing out verbose info, like each step
if you run a function under a shell, it prints some step log information during its run
Paolo Di Tommaso
@pditommaso
Jan 18 16:22
could it be that the info is printed to stderr instead of stdout?
slowsmile
@slowsmile
Jan 18 16:22
once it is wrapped in nextflow, it looks like it won't print this info on the fly; it all gets output after the entire process has ended
I set up my output channel as output:
stdout result
and then result.subscribe{ println it }
Paolo Di Tommaso
@pditommaso
Jan 18 16:24
I'm asking if your tool may print to stderr instead of stdout
slowsmile
@slowsmile
Jan 18 16:25
let me check; it does indeed print all the log info after the entire process is finished
just won't do it on the fly
Paolo Di Tommaso
@pditommaso
Jan 18 16:26
well this is expected
slowsmile
@slowsmile
Jan 18 16:26
yes it is stdout
not stderr
so my question is: is there any way to have it output the stdout while the process is running?
Paolo Di Tommaso
@pditommaso
Jan 18 16:27
nope
slowsmile
@slowsmile
Jan 18 16:27
emmm ...
so there's no way we can tell the users which step in the process Nextflow is working on, like giving a bit of info during its run?
besides the process tag
Paolo Di Tommaso
@pditommaso
Jan 18 16:29
tool output is not streamed live
NF is designed to deploy thousands of tasks on remote nodes; streaming all of them would be overwhelming
slowsmile
@slowsmile
Jan 18 16:31
OK, thanks. So what is the best way to give users some info about a running Nextflow pipeline,
so that they know which step Nextflow is running now and how much time it takes?
Paolo Di Tommaso
@pditommaso
Jan 18 16:35
other than the default NF output, there's a more informative one using the -ansi option
but I don't think that's what you are looking for
Tobias "Tobi" Schraink
@tobsecret
Jan 18 16:37

I am getting a bit of a cryptic error, running the following code:

  enaGroupGet -f fastq ERS010304
  number_of_files=$(find . -name '*.fastq.gz' | wc -l)
  exit $(($number_of_files%2)) #checks if there are an even number of files --> exits with 0 if yes, and 1 if not

Error:

/cm/local/apps/environment-modules/4.0.0//init/bash: line 15: MODULES_USE_COMPAT_VERSION: unbound variable

This should return an error code of 1; instead it returns the above. Am I misunderstanding something?

slowsmile
@slowsmile
Jan 18 16:40
Thanks Paolo
micans
@micans
Jan 18 16:42
@tobsecret I assume that's using the 'modules' framework, and that error is from a .err file in a work dir. Can you replicate the error by running .command.sh interactively?
(in the same module environment)
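The usual way to replicate that (the work-dir hash below is hypothetical) is to cd into the failing task's directory, reported in the error message, and re-run the wrapper scripts by hand:

```
cd work/a1/b2c3d4...        # hypothetical task work dir from the error report
bash .command.run           # full wrapper (environment, modules, container)
bash .command.sh            # or just the bare task script
```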
Tobias "Tobi" Schraink
@tobsecret
Jan 18 16:51
yup, turns out that's not what's actually breaking it
Should have thought of that before posting here
looks like I am just gonna have to delete said accession.
micans
@micans
Jan 18 17:11
@pditommaso I have a potentially fuzzy question, but maybe I'm lucky. Yesterday I posted a question about a groovy filter function that we apply on channels (with choice), where you suggested restructuring it a bit so that an if branch would not be evaluated. For a long time I've wondered what happens behind the scenes, especially in the resume case: when code is evaluated, and in what context. Anyway ... feel free to ignore; I will make some test cases that are a bit quicker than our rnaseq pipeline.
Paolo Di Tommaso
@pditommaso
Jan 18 17:25
uuuuh, remind me that next week, I left office :sunglasses:
micans
@micans
Jan 18 17:25
:+1: have a great weekend
Tobias "Tobi" Schraink
@tobsecret
Jan 18 17:25
Enjoy!
tbugfinder
@tbugfinder
Jan 18 17:51
@pditommaso How do you create changelog.txt from releases' definition on GitHub?
Yasset Perez-Riverol
@ypriverol
Jan 18 17:51
Tested Singularity + LSF: works great. Thanks @pditommaso & @micans
Andrew Fiore-Gartland
@agartland
Jan 18 17:56
Hi, I've been using Nextflow with AWS Batch with great success! It's been fun to figure out Nextflow. I have a question about local memory usage though: even when I start Nextflow with only cloud-destined processes, it allocates a large chunk of memory locally with the Java VM... it seems to be 30-50 GB?
It's stopped me from kicking off multiple jobs with Java memory errors. I'm starting the jobs from a big server, so I hadn't even noticed the issue until now. Does this make sense to you? Is there any way to reduce the footprint?
image.png
that's my htop above, and the java processes are one instance of nextflow (you'll have to trust me, because the command is too long to fit on the screen)
Tobias "Tobi" Schraink
@tobsecret
Jan 18 18:29
@agartland You can set the minimum and maximum heap size for the launcher JVM to mitigate this problem
like so: NXF_OPTS='-Xms512, -Xmx2G' nextflow run pipeline.nf
Andrew Fiore-Gartland
@agartland
Jan 18 18:31
ahh OK. does that line go in the config file, inside the executor block?
what do you recommend for running remote jobs? I don't have a sense of how much computation nf is doing locally in that case...
oh, is NXF_OPTS just an environment variable that you set?
micans
@micans
Jan 18 18:35
See also here ... and I think there's no comma after -Xms512
yes, just env variable
Andrew Fiore-Gartland
@agartland
Jan 18 18:38
I see, so the reason it's so big is that it's taking 1/64 of the server memory by default. I'll try 512 and 2G.
thanks!
Tobias "Tobi" Schraink
@tobsecret
Jan 18 18:46
Oh yeah, sorry that comma was a typo. And yes, 2G should be plenty - I have run thousands of samples using 1G maximum heap size.
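So, for anyone reading this in the archive, the corrected invocation would presumably be (note that the -Xms value also needs a JVM unit suffix such as 'm'):

```
NXF_OPTS='-Xms512m -Xmx2G' nextflow run pipeline.nf
```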
hydriniumh2
@hydriniumh2
Jan 18 18:55
Has anyone here run into issues with the aws batch compute environment not updating desired vcpus and nextflow just idling?
Is this an AWS permissions issue?
Paolo Di Tommaso
@pditommaso
Jan 18 19:20
@tbugfinder by hand!
Yasset Perez-Riverol
@ypriverol
Jan 18 22:52
Hi all, do you know whether, if we resume (-resume) a workflow after updating a Dockerfile that is used in the workflow with Singularity, it will pull/build the new container?