
Matt MacManes
all that polymorphism from individuals can cause problems
Richard Meitern
Matt MacManes
but ya, sorry, have to cat them together
however many you end up including
Richard Meitern
I have 10 individuals ~30 M PE 95 reads per individual
Matt MacManes
how many treatments or biological units?
In animals, 20-40M read pairs is often optimal
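Not from the chat: if a library is far above that 20-40M range, a quick (if crude) way to downsample a FASTQ is to take the first 4*N lines, since each read is a 4-line record; for real work a random sampler such as seqtk sample is preferable. A toy sketch:

```shell
# Hypothetical example: keep only the first 2 reads (2 * 4 = 8 lines) of a FASTQ.
# For real data you'd want random sampling (e.g. seqtk sample), not head.
printf '@r%d\nACGT\n+\nIIII\n' 1 2 3 4 5 > all.fastq   # 5 toy reads
n_reads=2
head -n $((n_reads * 4)) all.fastq > subset.fastq
wc -l < subset.fastq   # 8 lines = 2 reads
```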
Richard Meitern
so the answer from the cluster admins was NO docker. As an alternative they suggest "singularity" http://singularity.lbl.gov/ so I can build ORP on a local machine and make a singularity image if I can't make it work with conda
they gave no explanation though
but I guess this is related to security issues
actually singularity also imports docker images so I can do that as well
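For reference (not from the chat): Singularity can indeed consume Docker images directly via a docker:// URI. A sketch of what that might look like for ORP, assuming the macmaneslab/orp image on Docker Hub and a working Singularity install on the cluster; the exact flags vary by Singularity version, so treat this as a starting point, not a tested recipe:

```shell
# Hypothetical sketch: convert the ORP Docker image to a Singularity image
# and run a command inside it. Requires Singularity and network access;
# the image name/tag is taken from the docker commands later in this log.
singularity pull orp.sif docker://macmaneslab/orp:2.2.6
singularity exec orp.sif ls /home/orp/Oyster_River_Protocol
```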
Matt MacManes
i don't have experience with singularity, but I can figure this out hopefully today
Richard Meitern
No, you don't have to. I'll do it and report back how it went
Shawn Doyle
Hi Matt. I was curious if there is any reason that the ORP wouldn't work with a microbial (mostly bacteria) metatranscriptome?
Matt MacManes
Hi @Ice_Microbes_twitter , there is no reason why it would not work, but it may not work well…
for instance, in the merging steps, we choose the best member of each isoform-group, which makes sense in a non-meta assembly. In your assembly, there could be multiple species represented (correctly) in a single iso-group, and here we'd only be picking one of them.
In the end, I’d say you can try, but look really critically at your results, versus the assemblies that Trinity, Spades, and TransABySS produce, all of which are available in the assemblies/ folder at the end of the run.
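Not ORP's actual merging code, but the "keep the best member of each group" step Matt describes can be illustrated with a sort + awk one-liner over toy (group, transcript, score) rows; it makes the metatranscriptome caveat concrete, since only one row per group survives:

```shell
# Toy illustration (not ORP's real code): given group/transcript/score rows,
# keep the single highest-scoring transcript per group. A group that correctly
# contains several species would still yield only one representative.
cat > groups.tsv <<'EOF'
grp1 tA 0.90
grp1 tB 0.95
grp2 tC 0.80
EOF
sort -k1,1 -k3,3nr groups.tsv | awk '!seen[$1]++' > best.tsv
cat best.tsv
```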
Shawn Doyle
Gotcha, thanks Matt.
Konstantinos Kyriakidis
@macmanes Dear Matt, first of all thank you very much for your great work. I would like to ask you if this approach I found could improve your pipeline even more https://github.com/EI-CoreBioinformatics/mikado
Samuele Greco
Hi all and thank you for this awesome protocol!
I am trying it right now and everything seems to run fine, but it has been stuck on "Calculating read diagnostics" for 181+ hours now. It looks like it is running transrate and using 2 cores (even though I set 64) and a bunch of RAM (~16G). Before killing and rerunning I wanted to ask if this is expected. The transcriptome is not too large and there are about 200 million raw reads. Also, I don't know if this is known, but I wanted to report that the SPAdes step easily runs out of memory when using more than 8-16 cores, depending on the dataset size.
Adam Stuckert

Hi @54mu,

This is a known issue, as it sometimes gets hung on this step. This step does take a while, but I'd say this one is definitely hung. I'd kill this job and start again.

A corollary issue is that 200M reads is a lot, and it will pretty dramatically increase the memory/time needed, particularly at this step.

How much memory are you giving the job? And what version of ORP/SPAdes are you running? Memory used to be an issue with SPAdes, but that hasn't been the case for a year or two.

Samuele Greco
Hi, I am giving 256G of RAM; ORP version is 2.2.6 and SPAdes is 3.13.0
Adam Stuckert
Hi @54mu, that seems like weird behavior to me. I would ask the folks that manage SPAdes; they are generally quite responsive. One possibility is that the machine you are computing on does not actually have more than the 256G you've requested available for the number of cores you've given the job.
Samuele Greco
Thanks! that may be the case, I will talk to them. By the way, looks like restarting the job did the trick for now.
Hello... I am new here, so apologies if I have posted in the wrong place or butted into the middle of someone else's thread. I have installed ORP according to the instructions, but when running the 'Test the installation' section in the 'sampledata' folder I get this error: Total time = 2.32095s
Reported 69 pairwise alignments, 69 HSPs.
15 queries aligned.
make: [/PROJECTS/Oyster_River_Protocol/sampledata/assemblies/diamond/test.list5] Error 141
Deleting file `/PROJECTS/Oyster_River_Protocol/sampledata/assemblies/diamond/test.list5'
Any thoughts on how to fix Error 141? Thanks, Matt
Samuele Greco

Hi, I would like to point out some issues and the fixes I found. The first (this is mainly related to Trinity and TransABySS): raw reads from SRA are often refused by Trinity because of their names, and simply discarding the names doesn't work well for Trinity and TransABySS (I guess it's about how they perform the scaffolding process). To fix this, a simple bash script like this should work:

# read file names need to be formatted as filename.1.fastq.gz / filename.2.fastq.gz
# usage: fix_headers.sh <reads.fastq.gz> <mate number: 1 or 2>
# (doubled awk braces un-escaped; the undefined $read taken to be the mate number $2)
filename=$(basename -- "$1")
zcat "$1" | awk -v fn="$filename" -v r="$2" '{print (NR%4 == 1) ? "@" fn "_" ++i "/" r : $0}' > "$filename.$2.fastq"
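A self-contained way to sanity-check the renaming idea above (with the awk braces un-escaped and the mate number passed explicitly, since the pasted snippet looks templated; file names here are made up):

```shell
# Build a tiny gzipped FASTQ, rename its headers to filename_N/mate style,
# and inspect the result. i starts at 0 in awk, so ++i numbers reads from 1.
printf '@SRR000001.1 1 length=4\nACGT\n+\nIIII\n' | gzip > sample.fastq.gz
filename=$(basename -- sample.fastq.gz)
zcat sample.fastq.gz \
  | awk -v fn="$filename" -v r=1 \
      '{ if (NR % 4 == 1) print "@" fn "_" ++i "/" r; else print }' \
  > "$filename.1.fastq"
head -n 1 "$filename.1.fastq"   # @sample.fastq.gz_1/1
```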

Another issue relates to the transrate step; this happened when I was using a very large number of reads as input (>500,000,000). I don't know much about Ruby, but for some reason it was failing to remove some temporary folders because they were not empty. This was fixed by changing line 145 of /mnt/DATA/Software/Oyster_River_Protocol/software/orp-transrate/lib/app/lib/transrate/snap.rb:

from:

Dir.delete(@index_name) if Dir.exist?(@index_name)

to:

require 'fileutils'
FileUtils.rm_r @index_name if Dir.exist?(@index_name)
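The failure mode behind that snap.rb change is the classic "directory not empty" error; the shell equivalent (this is an analogy, not the actual Ruby code) is the difference between rmdir and rm -r:

```shell
# Analogy for the snap.rb fix above: Dir.delete, like rmdir, refuses a
# non-empty directory; FileUtils.rm_r, like rm -r, removes it recursively.
mkdir -p tmpidx/leftover
rmdir tmpidx 2>/dev/null && echo "rmdir succeeded" || echo "rmdir refused: not empty"
rm -r tmpidx            # recursive removal works
test -d tmpidx || echo "tmpidx gone"
```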

I hope this will be helpful if someone stumbles on the same issues

Pietro de Mello

Hi all,
First of all, thanks so much for the ORP. Really neat idea, and very nice coding. I was wondering if there is any way I can add multiple R1 and R2 files to the READ1 and READ2 flags. I tried doing so by separating the file paths with commas, but I get the following error message:

* Welcome to the Oyster River *
* This is version 2.2.6 *
/bin/bash: shell: command not found
/home/orp/Oyster_River_Protocol/oyster.mk:156: recipe for target 'readcheck' failed
make: * [readcheck] Error 127

Here's how I called ORP:

$orp_path/oyster.mk STRAND=RF \
MEM=40 \
CPU=10 \
READ1=$folder_path/39_b_S8.1.fq,$folder_path/39_o_S6.1.fq \
READ2=$folder_path/39_b_S8.2.fq,$folder_path/39_o_S6.2.fq \
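As Matt notes earlier in this log ("have to cat them together"), one file per read direction is the usual workaround when comma-separated lists fail: concatenate the libraries first, then point READ1/READ2 at the combined files. A toy sketch (file contents made up; for .gz inputs the gzip streams can simply be appended the same way):

```shell
# Sketch: combine two R1 libraries into one file before calling oyster.mk.
printf '@a/1\nAC\n+\nII\n' > 39_b_S8.1.fq
printf '@b/1\nGT\n+\nII\n' > 39_o_S6.1.fq
cat 39_b_S8.1.fq 39_o_S6.1.fq > combined.1.fq
wc -l < combined.1.fq   # 8 lines = 2 reads
# then: oyster.mk ... READ1=combined.1.fq READ2=combined.2.fq
```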

Hi Matt, thank you for this great pipeline. All appears to be running very smoothly. I noticed the unique gene count in the report was unexpectedly low. The problem appears to be in the parsing of *diamond.txt, for example line 316 in oyster.mk: .....| cut -d -f2 |...... should be .....| cut -d -f1 |....... Perhaps the naming scheme in SwissProt has changed recently? But as it stands, the current code is counting up taxa rather than unique genes. Let us know if I am on the right track.
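The field-index point above can be seen on a SwissProt-style name (the delimiter in the pasted command was stripped; "_" is assumed here, as in entry names like CASQ1_HUMAN):

```shell
# With "_" as the delimiter, -f1 yields the gene token and -f2 the species
# token, so counting field 2 counts taxa rather than unique genes.
echo 'CASQ1_HUMAN' | cut -d '_' -f1   # CASQ1 (gene)
echo 'CASQ1_HUMAN' | cut -d '_' -f2   # HUMAN (taxon)
```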
Hi Matt, I have very much enjoyed using this pipeline with some other genomic data, but am running into a consistent issue concerning the Trinity run. I keep receiving the error 'Exception in thread "main" java.lang.OutOfMemoryError: GC overhead limit exceeded
at java.util.Arrays.copyOfRange(Arrays.java:3664)
at java.lang.String.<init>(String.java:207)
at java.lang.String.substring(String.java:1933)
at SeqVertex.getNameKmerAdj(SeqVertex.java:459)
at SeqVertex.getShortSeq(SeqVertex.java:430)
at SeqVertex.getShortSeqWID(SeqVertex.java:470)
at SeqVertex.toString(SeqVertex.java:260)
at java.lang.String.valueOf(String.java:2994)
at java.lang.StringBuilder.append(StringBuilder.java:131)
at TransAssembly_allProbPaths.compactLinearPaths(TransAssembly_allProbPaths.java:12965)
at TransAssembly_allProbPaths.main(TransAssembly_allProbPaths.java:802)
warning, cmd: java -Xmx20G -Xms1G -Xss1G -XX:ParallelGCThreads=2 -jar /usr/local/apps/gb/ORP/Oyster_River_Protocol/software/anaconda/install/envs/orp_v2/opt/trinity-2.8.4/Butterfly/Butterfly.jar -N 100000 -L 200 -F 500 -C /scratch/trm76056/DrosIsoSeq_Dir/DrosSpecies_Dir/RecSpecies/assemblies/Recens.trinity/read_partitions/Fb_0/CBin_55/c5558.trinity.reads.fa.out/chrysalis/Component_bins/Cbin0/c0.graph --path_reinforcement_distance=25 failed with ret: 256, going to retry.

Hi Matt, question re the use of diamond in ORP. I've found that diamond blast searches block certain neuropeptides that I know are expressed (via antibody labeling) because of a repeat-sequence masking option. When I turn that off, I can then recover these sequences. I noticed this blasting ORP assemblies, but have now realized/remembered that diamond is part of ORP. I couldn't find the specific diamond command that I think is used after Orthofinder(I know where to change the diamond command for Orthofinder itself).

Basically, I'm thinking of re-running ORP after changing diamond just to check whether it makes a difference for assembly...

Hi Matt, Thank you for designing and sharing ORP. It is a great resource. I was trying to run ORP to generate transcriptome assembly for my files. However, I am facing certain challenges in doing so. When I install orp conda environment, i see that it doesn't install Transrate and TransABySS. I was able to install Transrate (conda install -c lmfaber transrate). However, when I try to install TransABySS (conda install -c bioconda transabyss), I encounter version conflicts between various packages including python 3.7 and python-igraph 0.8. I have the screen shot for the conflicts listed. Hence, I am unable to run ORP on my files. Is there any other way to install TransABySS, that can circumvent the conflicts? Or is there a way to use ORP without TransABySS? Any suggestions would be greatly appreciated. Kind regards,Tulika
Hi Matt, thank you for this great protocol.
When using the new SPAdes version 3.14.0 with ORP 2.8.8, the .orp.fasta is not generated. Do you have any idea about this? With SPAdes 3.13 the process stops with error code -9, irrespective of whether I use it single-threaded or with multiple threads.
Hi Matt, the ORP now runs until diamond:
Total time = 17.7921s
Reported 48823 pairwise alignments, 51295 HSPs.
37759 queries aligned.
And it stops with "ERROR Impossible to read /home/orp/assemblies/brain_assembly.ORP.fasta". The ORP.fasta is still not generated. I found a similar post on that problem here: macmanes-lab/Oyster_River_Protocol#32. What do you think about that? I don't want to change the code without having an idea of why there is this issue. Thank you for your help.
hi @macmanes and company, I would like to be able to provide the pipeline some additional transcriptome assemblies to be incorporated into the assembly merging step (OrthoFuse). Is that possible? I seem to recall having read somewhere that OrthoFuse was created in such a way that it could take additional assemblies. If this is so, how can I do it? If ORP cannot be set up to run this way, would it be possible to run everything from OrthoFuser onwards, providing the additional assemblies? Any other recommendation? Thanks in advance!
Hi everyone. Just to follow up on issue #40 on the github repo. Comparing the make file and outputs it seems that the 'orthofusing' step is failing after the creation of the .orthout files. I don't see the deletion of the .group files or the creation of the 'good.list'. It's probably not related but I also noticed the echo calls were not being printed to stdout.
I'm also noticing that the behavior of Orthofinder seems a bit strange. Given the number of groups being generated, and the fact that the number of members per group is (from a very cursory search) always one, I guess there must be something going wrong higher up?
I noticed some issues in the beginning with the scripts complaining about the lack of an orp_v2 environment and py27 so maybe the orthofuser.mk file isn't quite functional in the present repo/docker build?
I guess I should include my original error from github here too:

Hi @macmanes I'm trying to run orthofuser.mk on a set of transcriptomes (same species) that have been generated using the Oyster River Protocol (ORP). I'm running a docker container that has the most up-to-date version of ORP pulled from the master channel on an Ubuntu 18.04 server. The ORP generated transcriptomes were generated on the same container. Following renaming I tried to run the orthofuser.mk snakemake file, and the program runs for some time before failing. The error I receive isn't so revealing to me:

/home/orp/Oyster_River_Protocol/orthofuser.mk:68: recipe for target '/home/orp/assemblies/merge.orthomerged.fasta' failed
make: *** [/home/orp/assemblies/merge.orthomerged.fasta] Error 1

Please let me know if any additional info might be useful for determining why the script is failing.

tobias hildebrandt

Hello, I am trying to install ORP on an AWS server. I encountered the following issues:

AWS Ubuntu Server 20.04

1) Installed ORP using docker option 1 (sudo).

2) chmod -R 777 transferData

3) Tweaked the following statement:

docker run -it \
--mount type=bind,source=/home/ubuntu/,target=/home/orp/docker \
macmaneslab/orp:2.2.6 bash

to:

docker run -it \
--mount type=bind,source=/home/ubuntu/transferData/,target=/home/orp/docker \
macmaneslab/orp:2.2.6 bash

4) Started ORP

$HOME/Oyster_River_Protocol/oyster.mk \
MEM=128 \
CPU=32 \
READ1=All_R1_Reads.fastq.gz \
READ2=fixed_R3_reads.fastq.gz \

5) ORP crashed. Error dealing with "pipeliner"

I am new to gitter and orp. Is there a way to post the screen shots that I took?

Since this was a dead end for me, I tried a different option. However, that didn't work either. Here is what I did:

Installation WITHOUT docker

New instance, this time with an AWS Ubuntu Server 16.04, same specifications as before (290 GB HDD, 32 vCPUs, 128 GB RAM)

Installation was successful. The test run was terminated at the very end(!), causing the following error:

/home/ubuntu/Oyster_River_Protocol/oyster.mk:365: recipe for target '/home/ubuntu/Oyster_River_Protocol/sampledata/assemblies/test.ORP.fasta' failed
make: * [/home/ubuntu/Oyster_River_Protocol/sampledata/assemblies/test.ORP.fasta] Error 1

Changed permissions with chmod -R 777 sampledata

This time ORP crashed right away!

Exactly the same error message as before

/home/ubuntu/Oyster_River_Protocol/oyster.mk:365: recipe for target '/home/ubuntu/Oyster_River_Protocol/sampledata/assemblies/test.ORP.fasta' failed
make: * [/home/ubuntu/Oyster_River_Protocol/sampledata/assemblies/test.ORP.fasta] Error 1

Hello! First of all, thanks @macmanes for creating this! I am fairly new to bioinformatics and it's my first time on Gitter, so my apologies if I am not using this right. I am facing the same problem as @tulika98: I am not able to run the sampledata after installing ORP using conda. I followed the instructions and used the Make process to install the orp environment in conda. The problem is that the conda environment that is created has Python 3.7, and apparently neither TransABySS nor Transrate is compatible with Python 3.7. I tried to downgrade the Python version, but I get a bunch of incompatibilities with other dependencies since they are already installed. I also tried to start over and mess with the Makefile code to see if I could create the orp environment with another Python version, but I was not able to figure out a way. Any suggestions? (I am working on Ubuntu 18.04.4 LTS and conda 4.8.3.) Thanks a lot!
I've run the ORP and generated a report using the default eukaryota_odb10 database, and now would like to create another report against the viridiplantae_odb10 database using report.mk. The database is sym-linked to Oyster_River_Protocol/busco_dbs, but the job returns an error message about unable to find run_BUSCO.py script. I already have BUSCO output for the eukaryota database, so it seems BUSCO is installed; what is missing is the script. I don't see the script on the Github site, either. Is another version of the report.mk script available that doesn't require the run_BUSCO.py script?
Greetings! I love ORP and have successfully used it quite a few times now. However, I've run into an issue that I can't quite solve. No matter what setting I put as the strand parameter, the final assembly analysis comes with a graph that looks like three peaks: one on the left side, the main one in the middle, and a small-ish one on the right side (I tried putting it here, but Gitter removes white space and mushes it together).
I would assume that low Busco score is somehow related to this issue. Would be great if you can give me any suggestion on improving the quality of assembly - if it is possible. Thank you!
🦇Alexis M. Brown 🧬

Hello! I absolutely love the ORP and have run it successfully with a lot of my reads. However, I'm running into quite the head scratcher for reads that are 75 bp long or less. The program documentation says you can specify your SPADES2_KMER=INT length, which I have done in the following lines:

MAKEDIR := $(dir $(firstword $(MAKEFILE_LIST)))
RCORR := ${shell which rcorrector}

My reads are exactly 75 bp long, but changing the SPADES2_KMER flag does not resolve the issue. I still receive the following error:


/bin/bash: line 8: shell: command not found

I found a discussion about this on the github from 2019: macmanes-lab/Oyster_River_Protocol#17

Was this ever resolved/addressed? How can I get ORP to run for reads that are 75bp or less? Hoping to hear back!
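Independent of the makefile error above, a constraint worth checking for short reads (not from the chat, but a general SPAdes rule): assembler k-mer values must be odd and strictly shorter than the read length, so for 75 bp reads any SPADES2_KMER value of 75 or more, or any even value, is invalid. A hypothetical sanity check:

```shell
# Hypothetical check of a SPADES2_KMER value against 75 bp reads:
# k must be odd and strictly less than the read length.
read_len=75
k=55
if [ $((k % 2)) -eq 1 ] && [ "$k" -lt "$read_len" ]; then
  echo "k=$k ok for ${read_len}bp reads"
else
  echo "k=$k invalid for ${read_len}bp reads"
fi
```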

@macmanes Hi, is there the possibility to modify snap.rb and/or oyster.mk files in the docker container (with non-root privileges)?
Samuele Greco
Hi, how does the unique gene count for the assembly work? Is it just the count of unique annotations in a diamond run against UniProt? Is there a gene-to-transcript map being generated? How are the unannotated transcripts treated?
Hi @macmanes. I'm new to ORP and trying to install it on an HPC cluster. I'm able to download ORP from the git, but make is failing. Right off the bat, make gives me an error that Oyster_River_Protocol/software/anaconda/install/bin/conda/ doesn't exist. The Makefile starts off by setting CONDAROOT = ${DIR}/software/anaconda/install/, but that dir doesn't exist. What am I missing here? I don't see the step where this dir is made. I made the dirs that were missing and tried make again. The paths in config.ini were made by the Makefile, e.g. Oyster_River_Protocol/software/anaconda/install/envs/orp_busco/bin/ wasn't made. Actually, nothing after install/ was made. Any suggestions for how to install or start over? I'm doing this work on an HPC. Thanks for your help. I look forward to using ORP.