Matt MacManes
@macmanes
So this line in the Makefile:
cd ${DIR}/software/anaconda && bash Anaconda3-2018.12-Linux-x86_64.sh -b -p install/
changes to:
cd ${DIR}/software/anaconda && bash Anaconda3-2018.12-Linux-x86_64.sh -b -p /gpfs/hpc/home/rix133/orpconda
All your stuff will then be in $HOME/orpconda, but that is fine, as conda will pick that up automatically.
Richard Meitern
@rix133
Thanks! I had already figured this out myself, but it's good to have here for anybody else who runs into this problem.
Richard Meitern
@rix133
@macmanes Another question: if I have reads from several individuals, can I input them all to oyster.mk, or does it take only one READ1 and one READ2 file (so that I have to cat them together first)?
Matt MacManes
@macmanes
I typically say one read set per biological unit; try to limit the number of individuals you include in the assembly.
All that polymorphism from multiple individuals can cause problems.
Richard Meitern
@rix133
ok
Matt MacManes
@macmanes
But yeah, sorry, you have to cat them together, however many you end up including, as in the sketch below.
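(A minimal sketch of combining per-individual files before running oyster.mk; the file names here are hypothetical:)

# concatenate all READ1 files, and all READ2 files, keeping the same individual order in both
cat ind1.R1.fastq.gz ind2.R1.fastq.gz ind3.R1.fastq.gz > all.R1.fastq.gz
cat ind1.R2.fastq.gz ind2.R2.fastq.gz ind3.R2.fastq.gz > all.R2.fastq.gz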
Richard Meitern
@rix133
I have 10 individuals, with ~30M paired-end 95 bp reads per individual.
Matt MacManes
@macmanes
how many treatments or biological units?
In animals, 20-40M read pairs is often optimal
Richard Meitern
@rix133
So the answer from the cluster admins was NO Docker. As an alternative they suggest Singularity (http://singularity.lbl.gov/), so I can build ORP on a local machine and make a Singularity image if I can't manage it with conda.
They gave no explanation though, but I guess this is related to security issues.
Actually, Singularity can also import Docker images, so I could do that as well.
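(A minimal sketch of the Docker-to-Singularity route, assuming a Singularity version that supports docker:// URIs; the macmaneslab/orp:2.2.6 tag is taken from later in this chat and may not be the version you want:)

# build a Singularity image directly from the Docker Hub image
singularity pull docker://macmaneslab/orp:2.2.6
# run a shell inside it (the image file name depends on the Singularity version)
singularity exec orp_2.2.6.sif bash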
Matt MacManes
@macmanes
I don't have experience with Singularity, but hopefully I can figure this out today.
Richard Meitern
@rix133
No, you don't have to; I'll do it and report back on how it went.
Shawn Doyle
@Ice_Microbes_twitter
Hi Matt. I was curious if there is any reason that the ORP wouldn't work with a microbial (mostly bacteria) metatranscriptome?
Matt MacManes
@macmanes
Hi @Ice_Microbes_twitter, there is no reason why it would not work, but it may not work well…
For instance, in the merging steps we choose the best member of each isoform-group, which makes sense in a non-meta assembly. In your assembly there could be multiple species represented (correctly) in a single iso-group, and here we'd only be picking one of them.
In the end, I'd say you can try, but look really critically at your results versus the assemblies that Trinity, SPAdes, and TransABySS produce, all of which are available in the assemblies/ folder at the end of the run.
Shawn Doyle
@Ice_Microbes_twitter
Gotcha, thanks Matt.
Konstantinos Kyriakidis
@kokyriakidis
@macmanes Dear Matt, first of all thank you very much for your great work. I would like to ask whether this approach I found could improve your pipeline even more: https://github.com/EI-CoreBioinformatics/mikado
54mu
@54mu
Hi all and thank you for this awesome protocol!
I am trying it right now and everything seems to run fine, but it has been stuck on "Calculating read diagnostics" for 181+ hours now. It looks like it is running transrate and using 2 cores (even though I set 64) and a bunch of RAM (~16G). Before killing and rerunning it, I wanted to ask whether this is expected. The transcriptome is not too large, and there are about 200 million raw reads. Also, I don't know if this is known, but I wanted to report that the SPAdes step easily runs out of memory when using more than 8-16 cores, depending on the dataset size.
Adam Stuckert
@AdamStuckert

Hi @54mu,

This is a known issue, as it sometimes gets hung on this step. This step does take a while, but I'd say this one is definitely hung. I'd kill this job and start again.

A corollary issue is that 200M reads is a lot, and it will pretty dramatically increase the memory/time needed, particularly at this step.

How much memory are you giving the job? And what version of the ORP/SPAdes are you running? Memory usage used to be an issue with SPAdes, but that hasn't been the case for a year or two.

54mu
@54mu
Hi, I am giving it 256G of RAM; the ORP version is 2.2.6 and SPAdes is 3.13.0.
Adam Stuckert
@AdamStuckert
Hi @54mu, that seems like weird behavior to me. I would ask the folks that manage SPAdes; they are generally quite responsive. One possibility is that the machine you are computing on does not actually have the 256G of memory available for the number of cores you've given the job.
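(A quick way to sanity-check the machine's actual resources against what the job requests; standard Linux commands, nothing ORP-specific:)

free -g    # total and available RAM, in GB
nproc      # number of cores visible to the job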
54mu
@54mu
Thanks! That may be the case; I will talk to them. By the way, it looks like restarting the job did the trick for now.
mb492
@mb492
Hello... I am new here, so apologies if I have posted in the wrong place or butted into the middle of someone else's thread. I have installed ORP according to the instructions, but when running the 'Test the installation' section in the 'sampledata' folder I get this error:

Total time = 2.32095s
Reported 69 pairwise alignments, 69 HSPs.
15 queries aligned.
make: [/PROJECTS/Oyster_River_Protocol/sampledata/assemblies/diamond/test.list5] Error 141
make: Deleting file `/PROJECTS/Oyster_River_Protocol/sampledata/assemblies/diamond/test.list5'

Any thoughts on how to fix Error 141? Thanks, Matt
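(One note that may help here: a shell reports exit status 128 plus the signal number when a command is killed by a signal, so Error 141 usually means 128 + 13, i.e. SIGPIPE: a process wrote to a pipe whose reader had already exited:)

kill -l 13    # prints PIPE: signal 13 is SIGPIPE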
54mu
@54mu

Hi, I would like to point out some issues and related fixes I found. The first one is mainly related to Trinity and TransABySS: raw reads from SRA are often rejected by Trinity because of their names, and simply discarding the names doesn't work well for Trinity and TransABySS (I guess it has to do with how they perform the scaffolding process). To fix this, a simple bash script like the following should work:

#!/bin/bash
# Rename reads so Trinity/TransABySS accept them.
# Read file names need to be formatted as filename.1.fastq.gz / filename.2.fastq.gz
# Usage: rename_reads.sh <reads.N.fastq.gz> <N>   (N is 1 or 2)
read=$2
filename=$(basename -- "$1")
filename="${filename%.*.*.*}"   # strip the trailing .N.fastq.gz
# every 4th line starting at line 1 is a header: rewrite it as @<filename>_<i>/<N>
zcat "$1" | awk '{print (NR%4 == 1) ? "@'$filename'_" ++i "/'$read'" : $0}' > "$filename.$2.fastq"
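
(A usage sketch, assuming the script above is saved as rename_reads.sh; the accession name is hypothetical:)

bash rename_reads.sh SRR0000000.1.fastq.gz 1
bash rename_reads.sh SRR0000000.2.fastq.gz 2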

Another issue relates to the transrate step; this happened when I was using a very large number of reads as input (>500,000,000). I don't know much about Ruby, but for some reason it was failing to remove some temporary folders because they were not empty. This was fixed by changing line 145 of /mnt/DATA/Software/Oyster_River_Protocol/software/orp-transrate/lib/app/lib/transrate/snap.rb:

# from: Dir.delete can only remove an empty directory, so it failed on the leftover files
Dir.delete(@index_name) if Dir.exist?(@index_name)
# to: FileUtils.rm_r removes the directory recursively, even when it is not empty
require 'fileutils'
FileUtils.rm_r @index_name if Dir.exist?(@index_name)

I hope this will be helpful if someone stumbles upon the same issues.

Pietro de Mello
@plhm

Hi all,
First of all, thanks so much for the ORP. Really neat idea, and very nice coding. I was wondering if there is any way I can add multiple R1 and R2 files to the READ1 and READ2 flags. I tried doing so by separating the file paths with commas, but I get the following error message:

* Welcome to the Oyster River *
* This is version 2.2.6 *
ERROR: YOUR READ1 FILE DOES NOT EXIST AT THE LOCATION YOU SPECIFIED
/bin/bash: shell: command not found
/home/orp/Oyster_River_Protocol/oyster.mk:156: recipe for target 'readcheck' failed
make: *** [readcheck] Error 127

Here's how I called ORP:

orp_path=/home/orp/Oyster_River_Protocol
$orp_path/oyster.mk STRAND=RF \
TPM_FILT=1 \
MEM=40 \
CPU=10 \
READ1=$folder_path/39_b_S8.1.fq,$folder_path/39_o_S6.1.fq \
READ2=$folder_path/39_b_S8.2.fq,$folder_path/39_o_S6.2.fq \
RUNOUT=full_out

Victaphanta
@Victaphanta
Hi Matt, thank you for this great pipeline. All appears to be running very smoothly. I noticed that the unique gene count in the report was unexpectedly low. The problem appears to be in the parsing of *diamond.txt, e.g. line 316 in oyster.mk: ... | cut -d -f2 | ... should be ... | cut -d -f1 | ... Perhaps the naming scheme in SwissProt has changed recently? But as it stands, the current code is counting up taxa rather than unique genes. Let me know if I am on the right track.
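(To illustrate the gene-vs-taxa point, assuming SwissProt-style entry names of the form GENE_SPECIES and an underscore delimiter; the actual delimiter used in oyster.mk is not shown above:)

echo "SOX2_HUMAN" | cut -d _ -f1    # SOX2  -> the gene name, what should be counted
echo "SOX2_HUMAN" | cut -d _ -f2    # HUMAN -> the taxon, what was being counted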
theresamiorin
@theresamiorin_twitter
Hi Matt, I have very much enjoyed using this pipeline with some other genomic data, but I am running into a consistent issue concerning the Trinity run. I keep receiving this error:

Exception in thread "main" java.lang.OutOfMemoryError: GC overhead limit exceeded
at java.util.Arrays.copyOfRange(Arrays.java:3664)
at java.lang.String.<init>(String.java:207)
at java.lang.String.substring(String.java:1933)
at SeqVertex.getNameKmerAdj(SeqVertex.java:459)
at SeqVertex.getShortSeq(SeqVertex.java:430)
at SeqVertex.getShortSeqWID(SeqVertex.java:470)
at SeqVertex.toString(SeqVertex.java:260)
at java.lang.String.valueOf(String.java:2994)
at java.lang.StringBuilder.append(StringBuilder.java:131)
at TransAssembly_allProbPaths.compactLinearPaths(TransAssembly_allProbPaths.java:12965)
at TransAssembly_allProbPaths.main(TransAssembly_allProbPaths.java:802)
warning, cmd: java -Xmx20G -Xms1G -Xss1G -XX:ParallelGCThreads=2 -jar /usr/local/apps/gb/ORP/Oyster_River_Protocol/software/anaconda/install/envs/orp_v2/opt/trinity-2.8.4/Butterfly/Butterfly.jar -N 100000 -L 200 -F 500 -C /scratch/trm76056/DrosIsoSeq_Dir/DrosSpecies_Dir/RecSpecies/assemblies/Recens.trinity/read_partitions/Fb_0/CBin_55/c5558.trinity.reads.fa.out/chrysalis/Component_bins/Cbin0/c0.graph --path_reinforcement_distance=25 failed with ret: 256, going to retry.
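(For reference, a hedged sketch only: the failing Butterfly command above was launched with -Xmx20G, and Trinity exposes that heap ceiling via its --bflyHeapSpaceMax option; within ORP the Trinity call would have to be edited in oyster.mk, and the file names and the 40G value here are hypothetical:)

# example only: raise the Java heap available to Butterfly beyond the 20G seen in the log
Trinity --seqType fq --left reads.1.fq --right reads.2.fq \
  --max_memory 40G --bflyHeapSpaceMax 40G --CPU 10 --output trinity_out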
desmondramirez
@desmondramirez

Hi Matt, a question re the use of DIAMOND in ORP. I've found that DIAMOND blast searches miss certain neuropeptides that I know are expressed (via antibody labeling), because of a repeat-sequence masking option. When I turn that off, I can then recover these sequences. I noticed this while blasting ORP assemblies, but have now realized/remembered that DIAMOND is part of ORP. I couldn't find the specific DIAMOND command that I think is used after OrthoFinder (I know where to change the DIAMOND command for OrthoFinder itself).

Basically, I'm thinking of re-running ORP after changing diamond just to check whether it makes a difference for assembly...
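(For reference, a hedged sketch of a DIAMOND call with masking disabled, assuming a DIAMOND version that supports the --masking flag and using hypothetical file names; the command ORP actually runs would have to be edited in oyster.mk:)

# example only: --masking 0 turns off DIAMOND's default repeat/complexity masking
diamond blastx --masking 0 -d uniprot_sprot.dmnd -q assembly.fasta -o hits.txt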

tulika98
@tulika98
Hi Matt, thank you for designing and sharing ORP. It is a great resource. I was trying to run ORP to generate a transcriptome assembly for my files. However, I am facing certain challenges in doing so. When I install the orp conda environment, I see that it doesn't install Transrate and TransABySS. I was able to install Transrate (conda install -c lmfaber transrate). However, when I try to install TransABySS (conda install -c bioconda transabyss), I encounter version conflicts between various packages, including Python 3.7 and python-igraph 0.8. I have a screenshot of the conflicts listed. Hence, I am unable to run ORP on my files. Is there any other way to install TransABySS that can circumvent the conflicts? Or is there a way to use ORP without TransABySS? Any suggestions would be greatly appreciated. Kind regards, Tulika
syhof
@syhof
Hi Matt, thank you for this great protocol.
syhof
@syhof
When using the new SPAdes version 3.14.0 with version 2.8.8, the .orp.fasta is not generated. Do you have any idea about this? With SPAdes 3.13 the process stops with error code -9, irrespective of whether I use it single-threaded or with multiple threads.
syhof
@syhof
Hi Matt, the ORP now runs until diamond:

Total time = 17.7921s
Reported 48823 pairwise alignments, 51295 HSPs.
37759 queries aligned.

And it stops with "ERROR Impossible to read /home/orp/assemblies/brain_assembly.ORP.fasta". The ORP.fasta is still not generated. I found a similar post on that problem here: macmanes-lab/Oyster_River_Protocol#32. What do you think about that? I don't want to change the code without having an idea of why this issue occurs. Thank you for your help.
syhof
@syhof
solved
santiagorevale
@santiagorevale
Hi @macmanes and company, I would like to be able to provide the pipeline with some additional transcriptome assemblies to be incorporated into the assembly merging step (OrthoFuse). Is that possible? I seem to recall having read somewhere that OrthoFuse was created in such a way that it could take additional assemblies. If this is so, how can I do it? If ORP cannot be set up to run this way, would it be possible to run everything from OrthoFuser onwards, providing the additional assemblies? Any other recommendations? Thanks in advance!
peterdfields
@peterdfields
Hi everyone. Just to follow up on issue #40 on the GitHub repo: comparing the makefile and the outputs, it seems that the 'orthofusing' step is failing after the creation of the .orthout files. I don't see the deletion of the .group files or the creation of the 'good.list'. It's probably not related, but I also noticed that the echo calls were not being printed to stdout.
I'm also noticing that the behavior of Orthofinder seems a bit strange. Given the number of groups being generated, and the fact that the number of members per group is (from a very cursory search) always one, I guess there must be something going wrong higher up?
I noticed some issues in the beginning with the scripts complaining about the lack of an orp_v2 environment and py27 so maybe the orthofuser.mk file isn't quite functional in the present repo/docker build?
peterdfields
@peterdfields
I guess I should include my original error from github here too:

Hi @macmanes, I'm trying to run orthofuser.mk on a set of transcriptomes (same species) that were generated using the Oyster River Protocol (ORP). I'm running a Docker container that has the most up-to-date version of ORP pulled from the master channel, on an Ubuntu 18.04 server. The ORP-generated transcriptomes were produced in the same container. Following renaming, I tried to run the orthofuser.mk file, and the program runs for some time before failing. The error I receive isn't so revealing to me:

/home/orp/Oyster_River_Protocol/orthofuser.mk:68: recipe for target '/home/orp/assemblies/merge.orthomerged.fasta' failed
make: *** [/home/orp/assemblies/merge.orthomerged.fasta] Error 1

Please let me know if any additional info might be useful for determining why the script is failing.

tobias hildebrandt
@timeout2575_twitter

Hello, I am trying to install ORP on an AWS server. I encountered the following issues:

AWS Ubuntu Server 20.04

1) Installed ORP using docker option 1 (sudo).

2) chmod -R 777 transferData

3) Tweaked the following statement:

docker run -it \
--mount type=bind,source=/home/ubuntu/,target=/home/orp/docker \
macmaneslab/orp:2.2.6 bash

New:

docker run -it \
--mount type=bind,source=/home/ubuntu/transferData/,target=/home/orp/docker \
macmaneslab/orp:2.2.6 bash

4) Started ORP

$HOME/Oyster_River_Protocol/oyster.mk \
STRAND=RF \
TPM_FILT=1 \
MEM=128 \
CPU=32 \
READ1=All_R1_Reads.fastq.gz \
READ2=fixed_R3_reads.fastq.gz \
RUNOUT=test

5) ORP crashed. Error dealing with "pipeliner".

I am new to Gitter and ORP. Is there a way to post the screenshots that I took?

Since this was a dead end for me, I tried a different option. However, that didn't work either. Here is what I did:

Installation WITHOUT docker

New instance, this time with an AWS Ubuntu Server 16.04, specifications as before (290 GB HDD, 32 vCPUs, 128 GB RAM).

Installation was successful. The test run was terminated at the very end(!), causing the following error:

/home/ubuntu/Oyster_River_Protocol/oyster.mk:365: recipe for target '/home/ubuntu/Oyster_River_Protocol/sampledata/assemblies/test.ORP.fasta' failed
make: *** [/home/ubuntu/Oyster_River_Protocol/sampledata/assemblies/test.ORP.fasta] Error 1

Changed permissions with chmod -R 777 sampledata

This time ORP crashed right away!

Exactly the same error message as before:

/home/ubuntu/Oyster_River_Protocol/oyster.mk:365: recipe for target '/home/ubuntu/Oyster_River_Protocol/sampledata/assemblies/test.ORP.fasta' failed
make: *** [/home/ubuntu/Oyster_River_Protocol/sampledata/assemblies/test.ORP.fasta] Error 1

detergentemultiusos
@detergentemulti_twitter
Hello! First of all, thanks @macmanes for creating this! I am fairly new to bioinformatics and it's my first time on Gitter, so my apologies if I am not using this right. I am facing the same problem as @tulika98: I am not able to run the sampledata after installing ORP using conda. I followed the instructions and used the Make process to install the orp environment in conda. The problem is that the conda environment that is created has Python 3.7, and apparently neither TransABySS nor Transrate is compatible with Python 3.7. I tried to downgrade the Python version, but I get a bunch of incompatibilities with other dependencies since they are already installed. I also tried to start over and tweak the Makefile code to see if I could create the orp environment with another Python version, but I was not able to figure out a way. Any suggestions? (I am working on Ubuntu 18.04.4 LTS with conda 4.8.3.) Thanks a lot!
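(For anyone hitting the same conflicts, a possible workaround sketch; untested, the Python 3.6 pin is an assumption, and the channels are the ones mentioned above:)

# create a separate env with an older Python, then add the two tools that fail under 3.7
conda create -n orp_extras python=3.6
conda activate orp_extras
conda install -c lmfaber transrate
conda install -c bioconda transabyss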