These are chat archives for nextflow-io/nextflow

9th
Aug 2017
Francesco Strozzi
@fstrozzi
Aug 09 2017 05:30 UTC
+1
LukeGoodsell
@LukeGoodsell
Aug 09 2017 07:25 UTC
Bonjourno. Is there a way to have Nextflow abort in the event of a warning à la options(warn = 2) in R, or use strict in Perl?
Paolo Di Tommaso
@pditommaso
Aug 09 2017 07:25 UTC
Buongiorno ! :)
nope, warn = 2 in NF is supposed to be an error, hence it will abort ..
what warning are you referring to?
LukeGoodsell
@LukeGoodsell
Aug 09 2017 07:32 UTC
I got a "WARN: Input tuple does not match input set cardinality declared by process …"
Fortunately, I spotted it and fixed a bug that could have broken my pipeline. I was hoping there may be a way to have it fail at the warning so that potential bugs can be more easily spotted.
It wasn’t a bug in a script/shell section but a change to the set(…) spec for a channel
(NB: the bug was mine, not Nextflow’s)
Paolo Di Tommaso
@pditommaso
Aug 09 2017 07:34 UTC
yes, it could make sense to have a strict flag at some point
LukeGoodsell
@LukeGoodsell
Aug 09 2017 07:35 UTC
Shall I create a feature request issue on GitHub?
Paolo Di Tommaso
@pditommaso
Aug 09 2017 07:36 UTC
yes
grazie
:)
LukeGoodsell
@LukeGoodsell
Aug 09 2017 07:36 UTC
de nada
Paolo Di Tommaso
@pditommaso
Aug 09 2017 07:36 UTC
(time to learn some italian in this channel)
:D
Simone Baffelli
@baffelli
Aug 09 2017 07:37 UTC
I agree
LukeGoodsell
@LukeGoodsell
Aug 09 2017 07:37 UTC
haha, I support this idea. I’m bored of English!
Simone Baffelli
@baffelli
Aug 09 2017 07:37 UTC
I use too much english at work
I don't even know many technical words in italian
Francesco Strozzi
@fstrozzi
Aug 09 2017 07:39 UTC
Like "Sequenziamento di ultima generazione"?
LukeGoodsell
@LukeGoodsell
Aug 09 2017 07:46 UTC
Segnalato: #426
Paolo Di Tommaso
@pditommaso
Aug 09 2017 07:48 UTC
better to keep italian for food ;)
Shellfishgene
@Shellfishgene
Aug 09 2017 08:03 UTC
I wonder if some bioinformatics groups / labs have already made a decision to not buy a large cluster but just to use AWS or similar.
Paolo Di Tommaso
@pditommaso
Aug 09 2017 08:04 UTC
labs tend to use internal infra
but there are companies/startups (also in this channel) just using the cloud
Francesco Strozzi
@fstrozzi
Aug 09 2017 08:08 UTC
@Shellfishgene yes :)
that’s why I started discussing AWS Batch with Paolo
Shellfishgene
@Shellfishgene
Aug 09 2017 08:08 UTC
Does it come out cheaper for you, or is it also about space/administration?
Francesco Strozzi
@fstrozzi
Aug 09 2017 08:09 UTC
it’s cheaper because we literally pay just what we use (I know, seems like advertising but it’s true) + a lot simpler administration, although you need to invest some time in learning how AWS works (especially security and users / permissions).
LukeGoodsell
@LukeGoodsell
Aug 09 2017 08:09 UTC
@Shellfishgene, I work at a startup and use AWS. There are much lower up-front costs, and it allows us to prototype and test different infrastructures.
Our needs are constantly changing, and if we bought our own infrastructure, it would almost certainly have both excess capacity in some regards, and insufficient capacity in others.
Shellfishgene
@Shellfishgene
Aug 09 2017 08:15 UTC
That makes sense. I guess larger clusters at Universities are shared between enough people that they don't really run idle either.
Francesco Strozzi
@fstrozzi
Aug 09 2017 08:15 UTC
yes exactly, for small groups or companies I think it starts to make much more sense to go to the cloud (AWS, Google, Azure…)
LukeGoodsell
@LukeGoodsell
Aug 09 2017 08:30 UTC
I have no data, but I imagine the economies of scale may make it cheaper for universities to use IAAS providers when their current clusters reach end-of-life. The data protection, flexibility and control considerations may still outweigh cost when it comes to the decision, though.
Shellfishgene
@Shellfishgene
Aug 09 2017 08:35 UTC
I think the typical uni cluster is added on to and ends up as a mishmash of nodes... Ours is actually leased it seems, and gets replaced every so often.
Simone Baffelli
@baffelli
Aug 09 2017 08:46 UTC
Plus I presume it may be easier: at our uni, to use a cluster one has to send some sort of proposal and get it approved, while to use AWS one only has to pay :money_with_wings:
Francesco Strozzi
@fstrozzi
Aug 09 2017 09:04 UTC
I am totally in favour of the cloud, but it still takes a certain effort from both administrators and users to adapt or change the way they do certain things. Running workloads on EC2 is similar to what you do locally, but there may be differences in the way users access data, for example (mainly S3; EFS is way too expensive and, I think, only used in a few places). Using managed services like Batch or ECS to run containers also implies a shift in approach for the users (and the administrators as well). What I mean is that for many universities the whole process could take months (and money) to train people to use a new platform.
What I have seen in some places is the hybrid approach: keep the local infra and attach the cloud when there is a peak in usage or special needs. This way it is more transparent for the users and the administrators get time to learn
Paolo Di Tommaso
@pditommaso
Aug 09 2017 09:10 UTC
mobility is the key, develop on the laptop, deploy in the uni cluster, scale in the cloud when needed
Shellfishgene
@Shellfishgene
Aug 09 2017 09:45 UTC
@pditommaso I just ran nextflow self-update using sudo, as I have it in /opt/. It appears that this also changes stuff in ~/.nextflow, for example the framework dir. This is then owned by root and nextflow console does not work anymore. Easy to fix, but is it supposed to work like that?
Paolo Di Tommaso
@pditommaso
Aug 09 2017 09:46 UTC
NF is supposed to be installed without sudo
Shellfishgene
@Shellfishgene
Aug 09 2017 09:47 UTC
Ok, so system-wide installs are discouraged?
Paolo Di Tommaso
@pditommaso
Aug 09 2017 09:49 UTC
in a cluster environment ?
Shellfishgene
@Shellfishgene
Aug 09 2017 09:50 UTC
No, this is on a single multi-user workstation. I guess it's so easy to install every user can do it.
Paolo Di Tommaso
@pditommaso
Aug 09 2017 09:51 UTC
I suggest to only share the nextflow launcher script in /usr/local/bin
then it will pull the deps as needed
the drawback is that the self-update won't be able to overwrite the launcher because it's owned by root
Phil Ewels
@ewels
Aug 09 2017 09:56 UTC
We played with this kind of thing for a while, but now I just tell people to install their own user version
Seems easier and tends to work better we find
Simone Baffelli
@baffelli
Aug 09 2017 10:24 UTC
Totally unrelated question: is the use of storeDir really discouraged?
Paolo Di Tommaso
@pditommaso
Aug 09 2017 10:25 UTC
unless you need to preserve data across different pipelines/instances
Simone Baffelli
@baffelli
Aug 09 2017 10:27 UTC
i would like to preserve the mask I computed, so that it always stays the same when I change other parameters
but somehow on monday storeDir did not work properly
and the process ran again, invalidating the cache of all the subsequent processes
Paolo Di Tommaso
@pditommaso
Aug 09 2017 10:33 UTC
Could anybody have touched those files?
Simone Baffelli
@baffelli
Aug 09 2017 11:49 UTC
That could be, because they were on our server
which is still in trouble
that's a good point
does it imply that the hash is computed on the basis of the date and not of the file contents?
Paolo Di Tommaso
@pditommaso
Aug 09 2017 11:50 UTC
NF creates the task hashes from the file name and last modified timestamp of each input, by default
if you want to use the file content use cache 'deep'
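To see why a touched file can trigger a re-run under name-plus-timestamp keying while a content-based key would stay the same, here is a minimal shell sketch. It does not reproduce Nextflow's actual hashing: the file name mask.tif, the marker file, and the timestamps are made up for illustration, and cksum merely stands in for a content hash.

```shell
set -eu
workdir=$(mktemp -d)
mask="$workdir/mask.tif"      # hypothetical stored output
marker="$workdir/marker"      # hypothetical reference point in time

printf 'mask data\n' > "$mask"
# Give the file and the marker fixed, old timestamps (made-up dates)
touch -t 201708070900 "$mask"
touch -t 201708080900 "$marker"

sum_before=$(cksum < "$mask")
# Someone (or a struggling file server) touches the file:
# the modification time moves to "now", the bytes stay the same
touch "$mask"
sum_after=$(cksum < "$mask")

if [ "$mask" -nt "$marker" ]; then mtime_changed=yes; else mtime_changed=no; fi
if [ "$sum_before" = "$sum_after" ]; then content_changed=no; else content_changed=yes; fi
echo "mtime changed: $mtime_changed, content changed: $content_changed"
```

A key built from the timestamp sees a different input here, whereas a key built from the content does not, which is the behaviour the cache 'deep' setting mentioned above is meant to select.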
Simone Baffelli
@baffelli
Aug 09 2017 11:50 UTC
:cool: Next feature request: custom hashing per task
:clap:
Amazing. Just tell us what nextflow does not do!!
Paolo Di Tommaso
@pditommaso
Aug 09 2017 11:52 UTC
:)
Simone Baffelli
@baffelli
Aug 09 2017 11:55 UTC
Now I found out where the problem was :smile: stupid me: the mask was correctly stored, but it depended on an upstream process whose input I changed. Now storing the upstream too, should take care of it
Paolo Di Tommaso
@pditommaso
Aug 09 2017 11:59 UTC
:+1:
Simone Baffelli
@baffelli
Aug 09 2017 13:25 UTC
@pditommaso would it make sense to modify the buffer operator so that the opening and closing criteria could access the buffer's content as well?
the use case would be the following:
  • I have a channel emitting sets consisting of files and dates.
  • Whenever the time difference from starting to ending dates in the buffer is smaller than a certain threshold, the buffer contents should be emitted
    To do so, I need a memory where I keep track of the starting date. At the moment I use an external variable that is used in the closure, but it would be somehow more elegant to provide it directly at closure invocation time
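The kind of stateful closing criterion described here can be sketched in plain shell. This is not the Nextflow buffer operator, only an illustration of the bookkeeping: the threshold, file names, and timestamps are hypothetical, and it assumes an item stays in the current buffer while it lies within the threshold of that buffer's first item.

```shell
set -eu
threshold=60   # hypothetical window length in seconds
start=""       # state: timestamp of the first item in the current buffer
buffer=""      # file names collected in the current buffer
groups=""      # emitted buffers, one per line

# Hypothetical input: "<file> <epoch seconds>", already sorted by time
while read -r name ts; do
  if [ -z "$start" ]; then
    start=$ts
  elif [ $((ts - start)) -gt "$threshold" ]; then
    # Window exceeded: emit the buffer and open a new one with this item
    groups="${groups}${buffer}
"
    buffer=""
    start=$ts
  fi
  buffer="${buffer:+$buffer }$name"
done <<'EOF'
a.dat 0
b.dat 30
c.dat 50
d.dat 100
e.dat 130
EOF

# Emit whatever is left in the last buffer
groups="${groups}${buffer}"
printf '%s\n' "$groups"
```

The variable start plays the role of the external variable mentioned above: the closing test needs it in addition to the item currently being examined.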
Ido Tamir
@idot
Aug 09 2017 14:03 UTC
Hi, I try to have this command:
'''awk '{if($2 == 4){ print "@" $1 "\n" $10 "\n+\n" $11 }}' reads.umi.cut.rrna.sam > reads.umi.clean.fq'''
but it ends up as: awk '{if($2 == 4){ print "@" $1 "
" $10 "
+
" $11 }}' reads.umi.cut.rrna.sam > reads.umi.clean.fq
^ unterminated string
etc ...
Félix C. Morency
@fmorency
Aug 09 2017 14:11 UTC
can you use proper formatting? it's hard to read
Ido Tamir
@idot
Aug 09 2017 14:11 UTC
Nextflow turns the third " into a string terminator.
Oh there is markdown!
#!/bin/bash -ue awk '{if($2 == 4){ print "@" $1 " " $10 " + " $11 }}' reads.umi.cut.rrna.sam > reads.umi.clean.fq
Félix C. Morency
@fmorency
Aug 09 2017 14:15 UTC
can you post the script section of your process?
or the entire process
Ido Tamir
@idot
Aug 09 2017 14:17 UTC
#!/bin/bash -ue
awk '{if($2 == 4){ print "@" $1 "
" $10 "
+
" $11 }}' reads.umi.cut.rrna.sam > reads.umi.clean.fq
in .command.sh
the process is just:
process extrac_unaligned_rrna {

  input:
     file "reads.umi.cut.rrna.sam" from rrna
  output:
     file "reads.umi.clean.fq" into clean
  shell:
      '''awk '{if($2 == 4){ print "@" $1 "\n" $10 "\n+\n" $11 }}' reads.umi.cut.rrna.sam > reads.umi.clean.fq'''
}
Command error:
  awk: cmd. line:1: {if($2 == 4){ print "@" $1 "
  awk: cmd. line:1:                            ^ unterminated string
  awk: cmd. line:1: {if($2 == 4){ print "@" $1 "
  awk: cmd. line:1:                            ^ syntax error
Félix C. Morency
@fmorency
Aug 09 2017 14:23 UTC

I don't use shell: blocks, but the Nextflow documentation
https://www.nextflow.io/docs/latest/process.html#shell
has a note about this:
Shell script definition requires the use of single-quote ' delimited strings. When using double-quote " delimited strings, dollar variables are interpreted as Nextflow variables as usual. See String interpolation.
Did you try escaping the "\n" such as "\\n"?
Ido Tamir
@idot
Aug 09 2017 14:29 UTC
yes indeed I had to escape the \n
Thank you very much @fmorency
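For reference, the command that has to end up in .command.sh is the awk program with the two characters \ and n still inside its string literals; in the Nextflow block that means writing \\n, as confirmed above. A shell-only sketch of the intended result, using a made-up two-record SAM-like file (the read names, sequences, and qualities are hypothetical):

```shell
set -eu
workdir=$(mktemp -d)
cd "$workdir"

# Hypothetical tab-separated records with 11 columns:
# column 2 is the flag, column 10 the sequence, column 11 the qualities
printf 'read1\t4\t*\t0\t0\t*\t*\t0\t0\tACGT\tIIII\n' > reads.umi.cut.rrna.sam
printf 'read2\t0\tchr1\t100\t60\t4M\t*\t0\t0\tTTGA\tHHHH\n' >> reads.umi.cut.rrna.sam

# awk itself expands \n inside its double-quoted strings,
# so the shell must pass the backslash through untouched (single quotes do that)
awk '{if($2 == 4){ print "@" $1 "\n" $10 "\n+\n" $11 }}' reads.umi.cut.rrna.sam > reads.umi.clean.fq

cat reads.umi.clean.fq
```

Only read1 carries flag 4, so the output is a single four-line FASTQ record. If .command.sh shows real line breaks inside the awk quotes instead, the escape was consumed before the script was written.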
Félix C. Morency
@fmorency
Aug 09 2017 14:30 UTC
You're welcome.
Shellfishgene
@Shellfishgene
Aug 09 2017 14:48 UTC
Is there a way to prevent nf from using the 'local' executor even when there is no config file?
Félix C. Morency
@fmorency
Aug 09 2017 15:02 UTC
@Shellfishgene I guess you could pass it on the command line such as nextflow run -process.executor slurm ...
Shellfishgene
@Shellfishgene
Aug 09 2017 15:03 UTC
I meant more as a global config setting somewhere, so I don't accidentally run jobs on the head node of the cluster when I forget to specify or copy the config file...
Félix C. Morency
@fmorency
Aug 09 2017 15:04 UTC
@Shellfishgene You can define the environment variable NXF_EXECUTOR with the default executor you want to use
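A minimal sketch of that suggestion, assuming a bash login shell; slurm is only an example value taken from the command-line suggestion above, so substitute whatever scheduler the cluster actually runs.

```shell
# Assumption: slurm is the desired default executor on this cluster
export NXF_EXECUTOR=slurm

# Quick check that the variable is visible to child processes such as nextflow
echo "default executor: ${NXF_EXECUTOR}"
```

Putting the export line in a shell startup file such as ~/.bashrc would make it apply to every new session, which covers the case of a forgotten config file.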
Shellfishgene
@Shellfishgene
Aug 09 2017 15:05 UTC
Cool, thanks!