Julia di Iulio
@juliadiiulio_twitter
ha I did not !
abhinav
@nellore
it's much easier
Julia di Iulio
@juliadiiulio_twitter
I'll try
abhinav
@nellore
your credentials might work
but you then need to be able to create the default emr roles from your laptop
if they're already set up for your account it may work
Julia di Iulio
@juliadiiulio_twitter
hahah from what I learned so far... nothing is set up for my account :yum: but I'll try!
Ben Strober
@BennyStrobes
Hi @nellore. I'm hoping to use Rail to generate exon-exon junction counts for ~200 samples (at ~50 million reads). As Rail takes a fair amount of time to run, I was planning on running multiple batches and then aggregating the junction data across the batches. I just want to check that doing this would give the same quantification as running all the samples in one batch?
abhinav
@nellore
@BennyStrobes it will if you use first-pass junctions
which are exactly the junctions recorded by snaptron
moreover, you can run only the first part of rail
if you specify that the only deliverable you want is jx
that is
use
-d jx
and then the whole thing takes a bit less than half the time it usually does
do you understand what i mean by "first-pass junctions?"
Ben Strober
@BennyStrobes
Hey @nellore, thanks for the quick reply! That's perfect, as I want to compare my results to the snaptron data. I don't understand what you mean by "first-pass junctions"; could you elaborate a bit?
abhinav
@nellore
junctions detected by the aligner in a given sample on a single pass of alignment
if we share junctions across samples, we may find that some junctions undetected in a given sample after a single pass of alignment are detected there on a second pass of alignment
but if you're comparing with snaptron
yes, you can simply use -d jx
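Since first-pass junctions are detected per sample on a single pass of alignment, combining batches amounts to taking the union of the per-batch tables. A minimal sketch of that aggregation follows. The in-memory layout (a dict keyed by junction, mapping sample name to read count) and the name `merge_batches` are hypothetical and do not reflect Rail's actual output format; loading each batch's junction output into this shape is left out.

```python
from collections import defaultdict


def merge_batches(batches):
    """Union per-batch junction tables into one table.

    Each batch is a dict mapping a junction key, here a hypothetical
    (chrom, start, end, strand) tuple, to {sample_name: read_count}.
    """
    merged = defaultdict(dict)
    for batch in batches:
        for junction, per_sample in batch.items():
            for sample, count in per_sample.items():
                # A sample should belong to exactly one batch.
                if sample in merged[junction]:
                    raise ValueError(f"sample {sample!r} appears in two batches")
                merged[junction][sample] = count
    return dict(merged)


# Toy data standing in for two batches of samples.
batch_a = {("chr1", 100, 200, "+"): {"s1": 5, "s2": 3}}
batch_b = {
    ("chr1", 100, 200, "+"): {"s3": 7},
    ("chr2", 50, 90, "-"): {"s3": 1},
}
merged = merge_batches([batch_a, batch_b])
```

A sample missing from a junction's inner dict simply had no first-pass reads spanning that junction, so it can be treated as a count of zero downstream.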
Candace Liu
@cliu72
Hello! I am using Rail-RNA to process my RNA-Seq data, and am wondering if there is a way to remove PCR duplicates using UMIs after the alignment step (bowtie2) but before the tsv files are generated? Maybe stop the job after the bam files are created, do filtering, then resume the job? I am trying to use UMI-tools (https://github.com/CGATOxford/UMI-tools) which requires bam files as input to group and deduplicate PCR duplicates.
undiagnosed
@undiagnosed
Hello. I opened issue #85, which has a description of the problem I am having. I tried running again with uncompressed fastq files and it's currently at > 10 hrs stuck on step 4/24 with 0 tasks completed and no cpu utilization. Any idea what's going on? Thanks
Candace Liu
@cliu72
Hi @nellore, I asked a question a week ago about using UMI-tools with Rail. Did you have a chance to look into it? Thanks!
undiagnosed
@undiagnosed
From the stacktrace, it's stuck on line 565 in align_reads.py, return_code = bowtie_process.wait(). I guess bowtie isn't running for some reason?
abhinav
@nellore
@cliu72 sorry for the late response. what you're asking for would require some nontrivial hacking -- rail doesn't read BAMs to generate TSVs, but rather hadoop intermediates that are in a SAM-like format. it would probably be much easier for you to write scripts that generate rail-like TSVs from groups of BAMs deduplicated by UMI-tools.
Candace Liu
@cliu72
Hi again! I am trying to incorporate Rail into MultiQC (http://multiqc.info/), a cool tool that aggregates summary statistics from different software into a single HTML report. To do so, MultiQC parses through the output for each tool. Is there an output option for Rail that includes summary statistics such as the number of reads unmapped and multimapped? I know I can calculate these using counts.tsv.gz, but MultiQC tries to avoid this kind of calculation as it may have a huge impact on processing time. If it doesn't already exist, such a file may be useful for future releases - just a suggestion. Thanks!
abhinav
@nellore
@cliu72 hey! which additional summary statistics do you want? counts.tsv.gz already provides the numbers of mapped and unmapped reads in each sample
it also tells you how many reads are mapped uniquely
and you can subtract that from the total number of mapped reads to obtain the number of multimappers
Candace Liu
@cliu72
Hi! Thanks for such a quick response. Yeah, I realize I can calculate summary statistics manually, but was just wondering if they were readily available in some sort of final log file (like in STAR). MultiQC tries to avoid calculating these statistics because it slows down the generation of the HTML report.
abhinav
@nellore
hey, yeah
so what i'm confused about is counts.tsv.gz is telling you exactly the summary statistics
which additional ones do you want?
one could also write multimapped reads, yes, but if you want them right now you can perform the subtraction between two columns of counts.tsv.gz: something like gzip -cd counts.tsv.gz | tail -n +2 | rev | cut -f1 | rev | awk -F',' '{print $1-$2}' will give you the number of multimappers
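For a parser such as a MultiQC module, the same subtraction could be done in Python rather than in a shell pipeline. The sketch below mirrors the one-liner step by step: skip the header, take the last tab-separated field, split it on commas, and subtract the second value from the first. The function name `multimapper_counts` is hypothetical, and treating the first column as a row label is an assumption about the file layout, not something confirmed here.

```python
import gzip


def multimapper_counts(path):
    """Return [(row_label, first - second), ...] from the last column.

    Mirrors: gzip -cd FILE | tail -n +2 | rev | cut -f1 | rev
             | awk -F',' '{print $1-$2}'
    """
    results = []
    with gzip.open(path, "rt") as handle:
        next(handle, None)  # skip the header line, like `tail -n +2`
        for line in handle:
            if not line.strip():
                continue  # ignore blank lines
            fields = line.rstrip("\n").split("\t")
            # Last tab-separated field, split on commas, as in the one-liner.
            first, second = fields[-1].split(",")[:2]
            # Assumption: the first column identifies the row (e.g. the sample).
            results.append((fields[0], int(first) - int(second)))
    return results
```

Calling `multimapper_counts("counts.tsv.gz")` reads the file once, so the cost is a single pass over a small per-sample table.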
TaraNY
@TaraNY
Hello I have an error and I'm not sure what to make of it.
*Errors encountered*
Streaming command "LC_ALL=C sort -S 307200 -k1,1 -t$'\t' -m /home/CAM/tyankee/rail-rna_logs/align_readlets/dp.tasks/0.* | /isg/shared/apps/rail-rna/0.2.4b/pypy-2.5-linux_x86_64-portable/bin/pypy /isg/shared/apps/rail-rna/0.2.4b/rail-rna/rna/steps/align_readlets.py --bowtie-idx=/home/FCAM/jcotney/GENOME/Homo_sapiens/UCSC/hg38/Sequence/BowtieIndex/genome --bowtie-exe=/isg/shared/apps/rail-rna/0.2.4b/bowtie-1.1.2/bowtie --gzip-level=3 -- -t --sam-nohead --startverbose -v 0 -a -m 30 >/home/CAM/tyankee/rail-rna_logs/align_readlets/0 2>/home/CAM/tyankee/rail-rna_logs/align_readlets/dp.reduce.log/0.0.log" failed; exit level was 2.
Job flow failed on Thursday, Feb 01, 2018 at 04:29:39 PM EST. Run time was 6606.285 seconds.
To start this job flow from where it left off, run:
/isg/shared/apps/rail-rna/0.2.4b/pypy-2.5-linux_x86_64-portable/bin/pypy /isg/shared/apps/rail-rna/0.2.4b/rail-rna/dooplicity/emr_simulator.py -j /home/CAM/tyankee/rail-rna_logs/resume_flow_V5V2UF4SBAXZ.json -b /isg/shared/apps/rail-rna/0.2.4b/rail-rna/rna/driver/rail-rna.txt -l /home/CAM/tyankee/rail-rna_logs/flow.2018-02-01T14:39:30.187676.log -f --max-attempts 1 --num-processes 35
I'm thinking it's having a problem with our Bowtie index?
Vamsi Kodali
@vkkodali
Hello! I have a question about how Rail-RNA aligns reads around 'short' exons that are < 9 nt long. My understanding is that, for example, if a 100 nt read aligns to a location with a 5 nt overhang at the left end then the read will be 'readletized'. Because the min_readlet_size is 9, the left-most readlet would then align to the genome with a 5 nt overhang. Is this where the 'Cap search' process (described in Figure S8 of doi:10.1093/bioinformatics/btw575) is initiated? If so, how does the search_window_size parameter apply here? Will the search be conducted only 1000 nt upstream? What if the intron is >1000 nt long?
gianmaz
@gianmaz
@nellore Hello, I have an error at the 4th step of the pipeline, "Align reads and segment them into readlets". The error says "No space left on device encountered sorting files". I am running 200 samples together on a machine with 1 TB of memory. Could you confirm that it is a memory issue? Are there any parameters that can be tuned to avoid this error? Thank you.
abhinav
@nellore
@gianmaz it's a disk space issue
try setting the scratch directory to someplace you know has a lot of space
with --scratch
also may help to use the --gzip-intermediates option to compress temporary files
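Before launching a large run, it can help to check how much free space the intended scratch location actually has. A minimal sketch using Python's standard library follows; the helper names and any threshold passed to `has_room` are hypothetical, and the temporary directory only stands in for whatever directory would be passed to --scratch.

```python
import shutil
import tempfile


def free_gib(path):
    """Return free space, in GiB, on the filesystem that holds `path`."""
    return shutil.disk_usage(path).free / 2**30


def has_room(path, required_gib):
    """True if `path` has at least `required_gib` GiB free."""
    return free_gib(path) >= required_gib


if __name__ == "__main__":
    # Stand-in for the directory that would be passed to --scratch.
    scratch = tempfile.gettempdir()
    print(f"{scratch}: {free_gib(scratch):.1f} GiB free")
```

How much space a given run needs depends on the number and size of the samples, so the required figure has to come from experience with earlier, smaller batches.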
gianmaz
@gianmaz
@nellore Thank you! That worked fine :). I just wanted to be sure that I am doing things correctly. I need the exon/gene counts for my 3000 samples. I would like to run rail-rna in batches of around 250 samples each. Could you confirm that running rail in batches will not affect the gene counts? Moreover, for one batch, align_reads/dp.reduce.log/ gives 100% unpaired in the first Bowtie round and then on average 67% paired reads in the second Bowtie round. In realign_reads/dp.reduce.log/, however, I got 100% unpaired reads. Is this normal? What should I expect? Thank you!
gianmaz
@gianmaz
@nellore One last issue: At step 14 I get this error in the log file "Bowtie build failed, but probably because FASTA file was empty. continuing... " What could be the cause of this error? Thank you again!
TaraNY
@TaraNY
Hello, we are getting a very weird error. It's odd because we had this pipeline running just fine. However, we recently had this issue:
Traceback (most recent call last):
  File "app_main.py", line 75, in run_toplevel
  File "/home/CAM/tyankee/raildotbio/rail-rna/rna/steps/compare_alignments.py", line 284, in <module>
    tie_margin=args.tie_margin
  File "/home/CAM/tyankee/raildotbio/rail-rna/rna/utils/alignment_handlers.py", line 874, in print_alignment_data
    multiread_reports_and_ties[0][0][0].rpartition('\x1d')[2]
KeyError: 'CS13_14479_R2.fastq.gz'
It seems like we've tried everything, but this Python error pops up after about 6 hours of running.