These are chat archives for nextflow-io/nextflow

25th Apr 2018
Ashley S Doane
@DoaneAS
Apr 25 2018 00:13

@pditommaso my beforeScript command is
process { beforeScript = 'source /home/asd2007/spack/share/spack/setup-env.sh' }
I have this in my config file.
setup-env.sh sets up Spack, and the error I get is module: command not found.

I can pm you the script if that helps (it's 190 lines)
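(A minimal sketch of one possible explanation, not a confirmed fix: module is usually a shell function defined by an environment-modules init script, which the non-interactive shell running beforeScript may not have loaded; the init path below is hypothetical and system-dependent.)

process {
    // hypothetical modules init path; source it before Spack's setup-env.sh
    beforeScript = 'source /etc/profile.d/modules.sh; source /home/asd2007/spack/share/spack/setup-env.sh'
}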

Maxime Garcia
@MaxUlysse
Apr 25 2018 06:57
@Bioninbo was also interested in an ATAC-seq pipeline
Paolo Di Tommaso
@pditommaso
Apr 25 2018 09:07
@sgdavis1 it bounds the number of processes that can be executed, I mean if the memory limit is 100 GB and each process defines memory 10.GB, you will be able to run 10 processes in parallel. Note that NF does not enforce the memory usage, i.e. if a task consumes more (or less), NF is not aware of that
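(For reference, a minimal sketch of the setup being described, assuming the local executor; the 100 GB cap and 10.GB per-task value are the figures from the example above.)

// local executor exposing at most 100 GB; each task declares 10 GB, so at most
// 100 / 10 = 10 tasks run concurrently (actual usage is not enforced by NF)
executor {
    name   = 'local'
    memory = 100.GB
}

process {
    memory = 10.GB
}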
Caspar
@caspargross
Apr 25 2018 11:44
Is there a way to force Nextflow to re-run a specific process on a resumed execution? The process code in the Nextflow script did not change, but I made some changes to the called script.
Maxime Garcia
@MaxUlysse
Apr 25 2018 11:44
you can try to remove the work folder corresponding to this process
Caspar
@caspargross
Apr 25 2018 11:45
there are a lot of work folders since the process is called multiple times
Paolo Di Tommaso
@pditommaso
Apr 25 2018 11:45
it should be able to recognise the changes in the called script, provided it is located in the project bin/ dir
Caspar
@caspargross
Apr 25 2018 11:45
ahh thanks, I forgot about the bin dir
:thumbsup:
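(A minimal DSL sketch of the pattern just mentioned, with hypothetical script and channel names: a script kept in the project's bin/ directory is on the task PATH, and changing it causes the affected tasks to be re-executed on -resume.)

// my_script.sh lives in <project>/bin/ and is called by name;
// editing it invalidates the cache for this process on -resume
process runScript {
    input:
    file reads from reads_ch

    output:
    file 'result.txt' into results_ch

    """
    my_script.sh $reads > result.txt
    """
}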
Steven Davis
@sgdavis1
Apr 25 2018 13:47
@pditommaso Awesome, thanks, that's all I needed to know. Would this behavior change if I explicitly added a memory directive with a higher value to my process (like 30-35 GB)?
Matthieu Foll
@mfoll
Apr 25 2018 14:58

Hi, I have some questions for those using Nextflow on a cluster without internet access. I want to use pipelines hosted on GitHub and still keep track of the exact pipeline version used. I tried the following procedure:
On my local machine with internet access:

git clone https://github.com/nextflow-io/hello.git
tar -cvzf hello.tar.gz hello/

Then I transfer hello.tar.gz to the cluster without internet access and put it in the Nextflow assets folder:

mkdir ~/.nextflow/assets/nextflow-io/
tar -xvzf hello.tar.gz -C ~/.nextflow/assets/nextflow-io/

Then I run Nextflow:

$ nextflow run nextflow-io/hello
N E X T F L O W  ~  version 0.28.2
Launching `nextflow-io/hello` [deadly_gates] - revision: d4c9ea84de [master]

That's great, as I see the correct revision (d4c9ea84de) and branch. But Nextflow hangs for a couple of minutes before successfully completing the pipeline, and I can see an error in the .nextflow.log file:

Apr-25 16:53:09.446 [main] DEBUG nextflow.scm.AssetManager - WARN: Failed to check remote Git revision
org.eclipse.jgit.api.errors.TransportException: https://github.com/nextflow-io/hello.git: cannot open git-upload-pack
    at org.eclipse.jgit.api.LsRemoteCommand.execute(LsRemoteCommand.java:222)
    at org.eclipse.jgit.api.LsRemoteCommand.call(LsRemoteCommand.java:160)
    at nextflow.scm.AssetManager.getRemoteCommitId(AssetManager.groovy:884)
    at nextflow.scm.AssetManager.checkRemoteStatus0(AssetManager.groovy:895)
    at nextflow.scm.AssetManager.checkRemoteStatus(AssetManager.groovy:913)
    at nextflow.cli.CmdRun.getScriptFile(CmdRun.groovy:293)
    at nextflow.cli.CmdRun.run(CmdRun.groovy:198)
    at nextflow.cli.Launcher.run(Launcher.groovy:428)
    at nextflow.cli.Launcher.main(Launcher.groovy:582)
Caused by: org.eclipse.jgit.errors.TransportException: https://github.com/nextflow-io/hello.git: cannot open git-upload-pack
    at org.eclipse.jgit.transport.TransportHttp.connect(TransportHttp.java:566)
    at org.eclipse.jgit.transport.TransportHttp.openFetch(TransportHttp.java:326)
    at org.eclipse.jgit.api.LsRemoteCommand.execute(LsRemoteCommand.java:199)
    ... 8 common frames omitted
Caused by: java.net.ConnectException: Connection timed out (Connection timed out) github.com
    at org.eclipse.jgit.util.HttpSupport.response(HttpSupport.java:210)
    at org.eclipse.jgit.transport.TransportHttp.connect(TransportHttp.java:504)
    ... 10 common frames omitted

So my questions are:

  • is this a good idea?
  • is there any way to tell Nextflow not to check the remote Git revision, so I don't have to wait for the timeout each time?
Maxime Garcia
@MaxUlysse
Apr 25 2018 15:00
@mfoll We're using Sarek on a cluster without internet connection as well
Paolo Di Tommaso
@pditommaso
Apr 25 2018 15:01
ummm, maybe using a bare clone could fix the problem; the guru here is @emi80
Maxime Garcia
@MaxUlysse
Apr 25 2018 15:01
we're using that to get the revision
Paolo Di Tommaso
@pditommaso
Apr 25 2018 15:02
the problem is that NF tries to check if there's a new remote version
This message was deleted
Matthieu Foll
@mfoll
Apr 25 2018 15:02
If I don't use the -latest option, why is NF doing this?
Paolo Di Tommaso
@pditommaso
Apr 25 2018 15:03
yes, to warn the user about a new release
but I agree there should be an -offline flag to avoid that, could you please open an issue for it?
Matthieu Foll
@mfoll
Apr 25 2018 15:03
Sure!
Emilio Palumbo
@emi80
Apr 25 2018 15:03
@mfoll you can create a remote on the cluster that does not have internet access and push to it from your local machine
Maxime Garcia
@MaxUlysse
Apr 25 2018 15:04
maybe it's doing that because you're putting it inside the nextflow assets directory, have you tried another location?
Paolo Di Tommaso
@pditommaso
Apr 25 2018 15:04
maybe it's doing that because you're putting it inside the nextflow assets directory
that's for sure
Emilio Palumbo
@emi80
Apr 25 2018 15:05
then on the cluster you can nextflow pull file://{path to the local remote}
it's a bit tricky but I have used it quite often and it works well
Maxime Garcia
@MaxUlysse
Apr 25 2018 15:06

that's for sure

I knew there was something strange there

Matthieu Foll
@mfoll
Apr 25 2018 15:07
@MaxUlysse if I don’t put it in the assets directory it runs without checking if there’s a new remote version, but then the revision number doesn’t match the GitHub revision:
nextflow run hello/main.nf 
N E X T F L O W  ~  version 0.28.2
Launching `hello/main.nf` [jovial_kalman] - revision: b9f107bada
Thanks @emi80, I'll try it!
Paolo Di Tommaso
@pditommaso
Apr 25 2018 15:07
@emi80 how do you create the bare repo?
Emilio Palumbo
@emi80
Apr 25 2018 15:08
create the folder, move inside it and run git init --bare
then you can push via ssh from the local machine
Paolo Di Tommaso
@pditommaso
Apr 25 2018 15:10
what is the command to push?
Emilio Palumbo
@emi80
Apr 25 2018 15:10
add the remote: git remote add cluster user@node:/path/to/bare/repo
push: git push cluster <branch>
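(Putting the steps above together, a sketch of the whole round trip; host names and paths are placeholders.)

# on the cluster (no internet access): create an empty bare repository
mkdir -p /path/to/bare/hello.git
cd /path/to/bare/hello.git
git init --bare

# on the local machine (with internet access): clone from GitHub and push to the cluster
git clone https://github.com/nextflow-io/hello.git
cd hello
git remote add cluster user@node:/path/to/bare/hello.git
git push cluster master

# back on the cluster: let Nextflow pull the project from the local bare repository
nextflow pull file:///path/to/bare/hello.git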
Paolo Di Tommaso
@pditommaso
Apr 25 2018 15:11
ah-ah, great thanks
I need to take note of this
Matthieu Foll
@mfoll
Apr 25 2018 15:14
Same here, thanks for the explanation…
Thomas Zichner
@zichner
Apr 25 2018 15:24
Hi. I have a more general question.
Is there a best practice for checking whether input files exist?
Obviously, NF will exit with an error if a file is missing; however, it would be nice to check all files (e.g., reference genome, mapping index, etc.) at the beginning of a script and give meaningful error messages if something is missing.

In a pipeline written by @ewels I saw the following construct:

star_index = Channel
        .fromPath(params.star_index)
        .ifEmpty { exit 1, "STAR index not found: ${params.star_index}" }

However, as far as I can see, this does not work if a wrong file path is given, since fromPath() doesn't check for file existence and therefore the channel is not empty.

Any ideas / suggestions are highly appreciated.
Phil Ewels
@ewels
Apr 25 2018 15:29
huh, nice catch! I don't think I realised that :)
Paolo Di Tommaso
@pditommaso
Apr 25 2018 15:29
you can also do something like this
genome_file     = file(params.genome)
variants_file   = file(params.variants)
blacklist_file  = file(params.blacklist)

assert genome_file.exists(), "Genome file is missing"
assert variants_file.exists(), "Variant file is missing"
assert blacklist_file.exists(), "Blacklist file is missing"
Thomas Zichner
@zichner
Apr 25 2018 15:31
That looks pretty clean and straightforward, very nice!
Phil Ewels
@ewels
Apr 25 2018 15:32
Cool! :+1: Will also steal that :)
Thomas Zichner
@zichner
Apr 25 2018 15:32
I guess one could also build this into a channel of filenames in order to test many?!
Phil Ewels
@ewels
Apr 25 2018 15:33
though that doesn't work with channels right @pditommaso? You just have file object variables there.
Paolo Di Tommaso
@pditommaso
Apr 25 2018 15:33
um, that does not make much sense, because the Channel.fromPath idiom returns only existing files ...
Phil Ewels
@ewels
Apr 25 2018 15:34
ahaaa, ok that explains why I've never seen the problem before :D
Paolo Di Tommaso
@pditommaso
Apr 25 2018 15:34
LOL
Phil Ewels
@ewels
Apr 25 2018 15:34
ok good, so I can relax - my code is working fine :laughing:
Thomas Zichner
@zichner
Apr 25 2018 15:35
But in the documentation it says "It does not check the file existence."
Paolo Di Tommaso
@pditommaso
Apr 25 2018 15:35
oops
where exactly ?
Maxime Garcia
@MaxUlysse
Apr 25 2018 15:36
We're checking if the reference files exist in Sarek
Thomas Zichner
@zichner
Apr 25 2018 15:36
And I had used the approach by @ewels and in my case it did not catch a wrong file path
Paolo Di Tommaso
@pditommaso
Apr 25 2018 15:36
ok, I was referring to when using glob patterns...
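(To make the distinction concrete, a small sketch reusing the parameter names from the example above: with a glob pattern only files that actually exist are emitted, so ifEmpty catches the missing-index case, while a plain path is emitted without any existence check.)

// glob: only existing matches are emitted, so an empty channel means "not found"
star_index = Channel
        .fromPath("${params.star_index}/*")
        .ifEmpty { exit 1, "STAR index not found: ${params.star_index}" }

// plain path: no existence check is performed, the value is emitted as-is
genome_ch = Channel.fromPath(params.genome)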
Maxime Garcia
@MaxUlysse
Apr 25 2018 15:38
But we probably stole everything from @ewels too ;-)
Or maybe not, in fact. If you want, @zichner, you can check around here: https://github.com/SciLifeLab/Sarek/blob/f4a2581e8d75fd8bff5b9ec9b24c97387adbd001/main.nf#L626
Thomas Zichner
@zichner
Apr 25 2018 15:41
@MaxUlysse Thanks a lot for the suggestion, I will look into this
Phil Ewels
@ewels
Apr 25 2018 15:43
@pditommaso - could it also work for non-glob patterns? (as a new feature I guess...)
Paolo Di Tommaso
@pditommaso
Apr 25 2018 15:44
do you mean to check the existence?
Phil Ewels
@ewels
Apr 25 2018 15:44
yes
and the ability to read, if you're feeling nice
Maxime Garcia
@MaxUlysse
Apr 25 2018 15:44
@zichner you're welcome, don't hesitate if you have questions
Paolo Di Tommaso
@pditommaso
Apr 25 2018 15:45
we could add an option checkIfExists ..
Phil Ewels
@ewels
Apr 25 2018 15:45
:+1:
Paolo Di Tommaso
@pditommaso
Apr 25 2018 15:45
add an issue to the endless list :)
Thomas Zichner
@zichner
Apr 25 2018 15:46
A checkIfExists option would indeed be great
Phil Ewels
@ewels
Apr 25 2018 15:46
:sparkles: :shipit:
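(For what it's worth, a sketch of how the proposed option might look in use; at this point in the conversation checkIfExists is only a suggestion, not a released feature.)

// hypothetical usage of the proposed option: fail fast if the path does not exist
star_index = Channel
        .fromPath(params.star_index, checkIfExists: true)
        .ifEmpty { exit 1, "STAR index not found: ${params.star_index}" }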
Paolo Di Tommaso
@pditommaso
Apr 25 2018 15:46
BTW who is interested in Conda support here?
Phil Ewels
@ewels
Apr 25 2018 15:46
We are :)
Paolo Di Tommaso
@pditommaso
Apr 25 2018 15:46
good, a basic implementation is ready
Vladimir Kiselev
@wikiselev
Apr 25 2018 15:47
We, too
Paolo Di Tommaso
@pditommaso
Apr 25 2018 15:47
I need to understand complex use cases
how many modules may be specified, different conda channels
or do you use conda env files?
Phil Ewels
@ewels
Apr 25 2018 15:48
We have been using env files for the nf-core pipelines, yes (then building docker images from them)
Paolo Di Tommaso
@pditommaso
Apr 25 2018 15:48
do you have some examples ?
Paolo Di Tommaso
@pditommaso
Apr 25 2018 15:49
nice
these files can be used with conda create .. --file environment.yml, right?
ah, ok it's exactly that in the Dockerfile :+1:
interesting, it's enough to add the conda bin path to the PATH to make it work
https://github.com/nf-core/RNAseq/blob/master/Dockerfile#L8
is this equivalent to source activate etc.?
Phil Ewels
@ewels
Apr 25 2018 16:32
Yup, exactly. Workaround to avoid having to use that command every time the container is run.
Seems to work really well :)
:point_up::point_up: experimental Conda support
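(A sketch of the pattern being described, with a hypothetical environment name: the environment is created from the YAML definition and its bin directory is prepended to PATH, which has roughly the same effect as source activate for finding the installed tools.)

# build the environment from its YAML definition (the -n name is hypothetical)
conda env create -f environment.yml -n my-pipeline-1.0

# make its tools available without `source activate` (conda prefix assumed to be /opt/conda)
export PATH=/opt/conda/envs/my-pipeline-1.0/bin:$PATH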
Venkat Malladi
@vsmalladi
Apr 25 2018 20:13
@pditommaso will have to take a look
will solve lots of issues
Paolo Di Tommaso
@pditommaso
Apr 25 2018 20:13
nice
Alexander Peltzer
@apeltzer
Apr 25 2018 20:14
Same here
good that you implemented it already!
Venkat Malladi
@vsmalladi
Apr 25 2018 20:15
@pditommaso so the idea is to use conda instead of docker or singularity containers
or as an alternative for most people
Paolo Di Tommaso
@pditommaso
Apr 25 2018 20:15
yep
Venkat Malladi
@vsmalladi
Apr 25 2018 20:16
okay, that will make it easier by not having to deal with docker
but still ensure all of the same dependencies
Paolo Di Tommaso
@pditommaso
Apr 25 2018 20:17
if your pipeline only uses standard tools, bioconda can be an easier solution
Venkat Malladi
@vsmalladi
Apr 25 2018 20:18
ah okay
so I could have docker, conda and singularity config files
Paolo Di Tommaso
@pditommaso
Apr 25 2018 20:18
some other pipelines may use custom packages and scripts; in those cases, a container could be better
same for deploying in the cloud
therefore NF allows you to switch from one system to another
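(A minimal sketch of what that switching could look like in a config file, with hypothetical profile and image names; the conda directive is the experimental support discussed above.)

// run the same pipeline with -profile docker, -profile singularity or -profile conda
profiles {
    docker {
        docker.enabled    = true
        process.container = 'myorg/mypipeline:latest'
    }
    singularity {
        singularity.enabled = true
        process.container   = 'docker://myorg/mypipeline:latest'
    }
    conda {
        process.conda = "$baseDir/environment.yml"
    }
}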
Venkat Malladi
@vsmalladi
Apr 25 2018 20:19
Great, I will have to test these out
Paolo Di Tommaso
@pditommaso
Apr 25 2018 20:20
:ok_hand:
Vladimir Kiselev
@wikiselev
Apr 25 2018 23:12
@pditommaso , wow, this is cool