These are chat archives for nextflow-io/nextflow

27th
Apr 2017
marchoeppner
@marchoeppner
Apr 27 2017 05:40
Good morning everyone
I am currently adding another profile to my pipeline, so I can use it in our growing OpenStack private cloud. To this end, I was hoping to also containerize everything with Singularity.
I noticed that I cannot provide a singularity image to nextflow when it is a swift store object (http:/...). Am I doing something wrong, or is that not currently supported?
Nextflow expands the path automatically assuming that it lives in my home directory
Paolo Di Tommaso
@pditommaso
Apr 27 2017 07:31
good morning @marchoeppner
nope, singularity images need to be stored in the local or shared file system
interested to learn more about your use case
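for example, in our setups the image just sits on a shared path and is referenced from nextflow.config, roughly like this (the path below is only a placeholder):

singularity.enabled = true
process.container   = '/shared/images/exome-pipeline.img'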
marchoeppner
@marchoeppner
Apr 27 2017 08:08
right - we are in the process of building up a fairly large OpenStack system (in Germany that seems preferred over AWS, probably for data protection reasons - medical data is a very touchy subject here). My thought was that by providing the Singularity image via Swift, users of the pipeline won't have to deal with getting the image into their OpenStack instance, basically one less step to worry about. The image includes everything that is needed for exome analysis - if NF could get the image from Swift, then the user really only has to provide the actual input data (the sample file describing the reads)
Paolo Di Tommaso
@pditommaso
Apr 27 2017 08:10
is Swift storage compatible with the AWS S3 API?
marchoeppner
@marchoeppner
Apr 27 2017 08:11
that is a good point, hm - haven't looked into that; maybe...
Paolo Di Tommaso
@pditommaso
Apr 27 2017 08:13
also, what kind of clustering are you planning to use?
marchoeppner
@marchoeppner
Apr 27 2017 08:14
I am currently using elasticluster to launch slurm clusters
Paolo Di Tommaso
@pditommaso
Apr 27 2017 08:15
I see, so you will run a slurm cluster
marchoeppner
@marchoeppner
Apr 27 2017 08:15
exactly
Paolo Di Tommaso
@pditommaso
Apr 27 2017 08:17
Singularity images cannot be automatically downloaded by NF because, when you spawn multiple tasks in a cluster, that would result in a race condition (multiple jobs downloading the same image)
with Docker it's possible because the Docker daemon takes care of synchronising the multiple requests
but there is no such daemon with Singularity
marchoeppner
@marchoeppner
Apr 27 2017 08:20
right, Docker would of course work with OpenStack - I was using Singularity because we are also running an actual HPC cluster (so no Docker there). I'll think about this some more - downloading the Singularity image is not the worst thing, since that only has to be done once.
Paolo Di Tommaso
@pditommaso
Apr 27 2017 08:21
exactly, in our deployments we consider the container image an asset that has to be available, just like the input data
I think it's fair in the case of HPC
anyhow, they are also developing a Singularity Hub that at some point could be used to automatically download images
marchoeppner
@marchoeppner
Apr 27 2017 08:24
thanks, I'll keep an eye on that
Alessia
@alesssia
Apr 27 2017 08:49
Hello everyone. I have a problem with our new PBS configuration. As of today, all jobs must have a resource request conforming to this specification: #PBS -lselect=X:ncpus=Y:mem=Z[:mpiprocs=S:ompthreads=T]. When running my Nextflow job I get, for each process, the following error: "qsub: All jobs must specify a select statement, jobs cannot be allocated proper resources without this!". This is because the call is qsub -N nf-dedup_1 .command.run, and .command.run specifies resources as follows:
#PBS -l nodes=1:ppn=6
#PBS -l walltime=01:00:00
#PBS -l mem=32gb
Is there any way to have it done as #PBS -l select=1:ncpus=4:mem=32G?
Paolo Di Tommaso
@pditommaso
Apr 27 2017 08:52
yes, use the clusterOptions directive, which allows you to pass any custom option to the cluster scheduler
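for example, something like this in the config (the values below are just an example, adjust them to your queue):

process {
    executor = 'pbs'
    clusterOptions = '-l select=1:ncpus=4:mem=32gb'
}

or set clusterOptions directly inside a process definition if only some processes need it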
Alessia
@alesssia
Apr 27 2017 08:55
cool, I missed it, my bad! Thanks!
Paolo Di Tommaso
@pditommaso
Apr 27 2017 08:55
no pb, you are welcome
mitul-patel
@mitul-patel
Apr 27 2017 09:20
Hello, I need help regarding channels in Nextflow. I want to group paired-end files for multiple samples... Can I write a glob that can detect {1,2}.fastq.gz, {R1,R2}.fasta.gz, {1,2}.fastq, {R1,R2}.fq within the same directory?
Paolo Di Tommaso
@pditommaso
Apr 27 2017 09:21
you mean having different extensions ?
mitul-patel
@mitul-patel
Apr 27 2017 09:28
yes, I don't know the extension; they may end with 1.fastq.gz or R1.fq.gz or 1.fq or 1.fastq
Paolo Di Tommaso
@pditommaso
Apr 27 2017 09:32
yes, you can do something like
fromFilePairs( '*{1,2,R1,R2}.{fasta.gz,fastq,fq}' )
the important thing is that the * matches a common prefix for the same pair
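for instance, feeding the pairs straight into a process (the process and command names here are just placeholders):

read_pairs = Channel.fromFilePairs( '*{1,2,R1,R2}.{fasta.gz,fastq,fq}' )

process trim {
  input:
  set pair_id, file(reads) from read_pairs
  script:
  """
  your_trimmer --sample $pair_id --reads $reads
  """
}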
mitul-patel
@mitul-patel
Apr 27 2017 09:34
great... that's what I wanted... many thanks!
mitul-patel
@mitul-patel
Apr 27 2017 10:00
what would be the best way to iterate a block of multiple processes? Like, I want to run 5 different processes one after another, 10 times...
I am thinking about processes inside a while loop... or functions like in Python...
Paolo Di Tommaso
@pditommaso
Apr 27 2017 10:01
NF uses a functional approach, thus there are no loops
process execution is triggered by the data
however, you can repeat the execution of a process over a range of values
see input repeaters
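e.g. a tiny sketch (the process and command names are made up):

process foo {
  input:
  each x from 1..10
  script:
  "your_command --index $x"
}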
mitul-patel
@mitul-patel
Apr 27 2017 10:11
still, it won't solve my problem, because I want to run each process 10 times, but not in one go. I want to run process1, process2, process3, process4 and process5, then again process1, process2, process3, process4 and process5, and so on until 10 iterations...
Paolo Di Tommaso
@pditommaso
Apr 27 2017 10:12
if so, you need to think of it in terms of flow instead of iteration
I guess the main data are the reads, right ?
mitul-patel
@mitul-patel
Apr 27 2017 10:22
my main data are not reads... for each read pair I want to iterate the 5 processes 10 times.
Paolo Di Tommaso
@pditommaso
Apr 27 2017 10:24
I see, anyhow it's the same
you can have an input repeater, i.e. each x from 1..10, in the first process
this will trigger the multiple executions of the downstream processes
mitul-patel
@mitul-patel
Apr 27 2017 11:01
sorry, I didn't mention that each process's input is the output of the previous process. Like, the output of process 1 will be the input of process 2, and the output of process 2 will be the input of process 3, and so on... do you think it will still work?
Paolo Di Tommaso
@pditommaso
Apr 27 2017 11:02
of course, this is exactly what NF is meant to do :)
for example:
process foo {
  input: 
  file data from data_ch
  each x from 1..10
  output: 
  set x, file('result') into foo_ch 
  script: 
  """
  your_command --in $data --index $x --out result
  """
}

process bar {
  input:
  set x, file(data) from foo_ch
  output: 
  set x, file('result') into bar_ch 
  script:
  """
  your_command --in $data --index $x --out result
  """
}

...
and so on...
mitul-patel
@mitul-patel
Apr 27 2017 13:04
@pditommaso fromFilePairs( '*{1,2,R1,R2}.{fasta.gz,fastq,fq}' ) doesn't work for CP123_B11_R1_trimmed.fastq.gz
@pditommaso I also tried fromFilePairs( '*{1,2,R1,R2}.*.{fasta.gz,fastq,fq}' ), but that doesn't work either...
Paolo Di Tommaso
@pditommaso
Apr 27 2017 13:10
the first doesn't match the file name
the second won't work because the * is supposed to be used to capture the grouping prefix
mitul-patel
@mitul-patel
Apr 27 2017 13:19
The files are CP123_B11_R1_trimmed.fastq.gz and CP123_B11_R2_trimmed.fastq.gz. I tried fromFilePairs( '*{1,2,R1,R2}.*.{fastq.gz,fastq,fq}' ); can I use * to match trimmed?
Paolo Di Tommaso
@pditommaso
Apr 27 2017 13:20
I fear not; with file names that different, it may make sense to create a CSV file listing your file pairs, and then parse that file
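for example, something along these lines (the file name and column layout are just an assumption):

// pairs.csv rows look like: sampleId,/path/to/R1_trimmed.fastq.gz,/path/to/R2_trimmed.fastq.gz
Channel
    .fromPath('pairs.csv')
    .splitCsv()
    .map { row -> tuple(row[0], file(row[1]), file(row[2])) }
    .set { read_pairs }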
Paolo Di Tommaso
@pditommaso
Apr 27 2017 13:58
Registration is open for Nextflow workshop
Phil Ewels
@ewels
Apr 27 2017 14:10
:tada:
mitul-patel
@mitul-patel
Apr 27 2017 15:34
@pditommaso I can now get all FastQ files via TSV parsing.....
I am getting this warning when I use a Docker image: WARNING: Your kernel does not support swap limit capabilities or the cgroup is not mounted. Memory limited without swap.
Paolo Di Tommaso
@pditommaso
Apr 27 2017 15:36
what ?
too little memory? it looks like a Docker problem, google for that