abhinav
@nellore
@cliu72 hey! which additional summary statistics do you want? counts.tsv.gz already provides the numbers of mapped and unmapped reads in each sample
it also tells you how many reads are mapped uniquely
and you can subtract that from the total number of mapped reads to obtain the number of multimappers
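for example, a sample with 1,000,000 mapped reads of which 800,000 map uniquely has 1,000,000 - 800,000 = 200,000 multimapping reads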
Candace Liu
@cliu72
Hi! Thanks for such a quick response. Yeah, I realize I can calculate summary statistics manually, but I was just wondering if they were readily available in some sort of final log file (like in STAR). MultiQC tries to avoid calculating these statistics because it slows down the generation of the HTML report.
abhinav
@nellore
hey, yeah
so what i'm confused about is that counts.tsv.gz already gives you exactly those summary statistics
which additional ones do you want?
one could also write multimapped reads, yes, but if you want them right now you can perform the subtraction between two columns of counts.tsv.gz: something like gzip -cd counts.tsv.gz | tail -n +2 | rev | cut -f1 | rev | awk -F',' '{print $1-$2}' will give you the number of multimappers
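the same pipeline spelled out step by step, assuming (as the one-liner above does) that the last tab-delimited field of counts.tsv.gz holds comma-separated counts with total mapped reads first and uniquely mapped reads second:
gzip -cd counts.tsv.gz |          # decompress to stdout, leaving the original file alone
  tail -n +2 |                    # skip the header line
  awk -F'\t' '{print $NF}' |      # keep only the last tab-delimited field
  awk -F',' '{print $1 - $2}'     # total mapped minus uniquely mapped = multimappers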
TaraNY
@TaraNY
Hello, I have an error and I'm not sure what to make of it.
*Errors encountered*
Streaming command "LC_ALL=C sort -S 307200 -k1,1 -t$'\t' -m /home/CAM/tyankee/rail-rna_logs/align_readlets/dp.tasks/0.* | /isg/shared/apps/rail-rna/0.2.4b/pypy-2.5-linux_x86_64-portable/bin/pypy /isg/shared/apps/rail-rna/0.2.4b/rail-rna/rna/steps/align_readlets.py --bowtie-idx=/home/FCAM/jcotney/GENOME/Homo_sapiens/UCSC/hg38/Sequence/BowtieIndex/genome --bowtie-exe=/isg/shared/apps/rail-rna/0.2.4b/bowtie-1.1.2/bowtie --gzip-level=3 -- -t --sam-nohead --startverbose -v 0 -a -m 30 >/home/CAM/tyankee/rail-rna_logs/align_readlets/0 2>/home/CAM/tyankee/rail-rna_logs/align_readlets/dp.reduce.log/0.0.log" failed; exit level was 2.
Job flow failed on Thursday, Feb 01, 2018 at 04:29:39 PM EST. Run time was 6606.285 seconds.
To start this job flow from where it left off, run:
/isg/shared/apps/rail-rna/0.2.4b/pypy-2.5-linux_x86_64-portable/bin/pypy /isg/shared/apps/rail-rna/0.2.4b/rail-rna/dooplicity/emr_simulator.py -j /home/CAM/tyankee/rail-rna_logs/resume_flow_V5V2UF4SBAXZ.json -b /isg/shared/apps/rail-rna/0.2.4b/rail-rna/rna/driver/rail-rna.txt -l /home/CAM/tyankee/rail-rna_logs/flow.2018-02-01T14:39:30.187676.log -f --max-attempts 1 --num-processes 35
I'm thinking it's having a problem with our Bowtie index?
Vamsi Kodali
@vkkodali
Hello! I have a question about how Rail-RNA aligns reads around 'short' exons that are < 9 nt long. My understanding is that, for example, if a 100 nt read aligns to a location with a 5 nt overhang at the left end, then the read will be 'readletized'. Because the min_readlet_size is 9, the left-most readlet would then align to the genome with a 5 nt overhang. Is this where the 'Cap search' process (described in Figure S8 of doi:10.1093/bioinformatics/btw575) is initiated? If so, how does the search_window_size parameter apply here? Will the search be conducted only 1000 nt upstream? What if the intron is >1000 nt long?
gianmaz
@gianmaz
@nellore Hello, I have an error at the 4th step of the pipeline, "Align reads and segment them into readlets". The error says "No space left on device encountered sorting files." I am running 200 samples together on a machine with 1 TB of memory. Could you confirm that it is a memory issue? Are there any parameters that can be tuned to avoid this error? Thank you.
abhinav
@nellore
@gianmaz it's a disk space issue
try setting the scratch directory to someplace you know has a lot of space
with --scratch
also may help to use the --gzip-intermediates option to compress temporary files
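for example, appended to a local run (the manifest and Bowtie index paths here are just placeholders for your own setup; only --scratch and --gzip-intermediates are the options in question):
# --scratch points rail-rna's temporary/sort files at a filesystem with plenty of room
# --gzip-intermediates compresses those temporary files to cut disk usage further
rail-rna go local -m /path/to/your.manifest -x /path/to/bowtie_idx/genome \
    --scratch /big/disk/rail_scratch \
    --gzip-intermediates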
gianmaz
@gianmaz
@nellore Thank you! That worked fine :). I just wanted to be sure that I am doing things correctly. I need the exon/gene counts for my 3000 samples. I would like to run rail-rna in batches of around 250 samples each. Could you confirm that running rail in batches will not affect the gene counts? Moreover, for one batch align_reads/dp.reduce.log/ gives 100% unpaired in the first Bowtie round and then on average 67% paired reads in the second Bowtie round, while in realign_reads/dp.reduce.log/ I get 100% unpaired reads. Is this normal? What should I expect? Thank you!
gianmaz
@gianmaz
@nellore One last issue: At step 14 I get this error in the log file "Bowtie build failed, but probably because FASTA file was empty. continuing... " What could be the cause of this error? Thank you again!
TaraNY
@TaraNY
Hello, we are getting a very weird error. It's odd because we had this pipeline running just fine. However, we recently had this issue:
Traceback (most recent call last):
File "app_main.py", line 75, in run_toplevel
File "/home/CAM/tyankee/raildotbio/rail-rna/rna/steps/compare_alignments.py", line 284, in <module>
tie_margin=args.tie_margin
File "/home/CAM/tyankee/raildotbio/rail-rna/rna/utils/alignment_handlers.py", line 874, in print_alignment_data
multiread_reports_and_ties[0][0][0].rpartition('\x1d')[2]
KeyError: 'CS13_14479_R2.fastq.gz'
It seems like we've tried everything, but this Python error pops up after about 6 hours of running.
Vy T. Nguyen
@vnguyen0009
Hello! I have been working with rail-rna for about a month now. I am trying to run paired end samples through your pipeline. I managed to get one experiment to go through but now I am running into the same error for my next two experiments. I am stuck on step 20/24 and received the following error in the log file:
Traceback (most recent call last):
File "app_main.py", line 75, in run_toplevel
File "/home/ubuntu/raildotbio/rail-rna/rna/steps/coveragepre.py", line 180, in <module>
for
, _, uniqueness, diff in diffs:
ValueError: expected length 4, got 8
Vy T. Nguyen
@vnguyen0009
Here is also the actual error:
Streaming command "LC_ALL=C sort -T /disc2/91_190506_TruSeq/scratch -S 307200 -k1,1 -k2,3 -t$'\t' -m /disc2/rail-rna_logs/precoverage/dp.tasks/6.* | /home/ubuntu/raildotbio/pypy-2.5-linux_x86_64-portable/bin/pypy /home/ubuntu/raildotbio/rail-rna/rna/steps/coverage_pre.py --bowtie-idx=/disc1/BowtieIndex/genome --library-size 40 --read-counts counts.tsv.gz --partition-stats --manifest=/disc1/sample_manifest/91_190506.manifest --output-ave-bigwig-by-chr 2>/disc2/rail-rna_logs/precoverage/dp.reduce.log/6.0.log" failed; exit level was 1.
Job flow failed on Thursday, Oct 31, 2019 at 12:57:02 AM UTC. Run time was 39644.291 seconds.
To start this job flow from where it left off, run:
/home/ubuntu/raildotbio/pypy-2.5-linux_x86_64-portable/bin/pypy /home/ubuntu/raildotbio/rail-rna/dooplicity/emr_simulator.py -j /disc2/rail-rna_logs/resume_flow_IVVIOIYH26ZL.json -b /home/ubuntu/raildotbio/rail-rna/rna/driver/rail-rna.txt -l /disc2/rail-rna_logs/flow.2019-10-30T13:56:17.966831.log -f --max-attempts 1 --num-processes 7 --sort "sort -T /disc2/91_190506_TruSeq/scratch"
Traceback (most recent call last):
File "app_main.py", line 75, in run_toplevel
File "/home/ubuntu/raildotbio/rail-rna/dooplicity/emr_simulator.py", line 2044, in <module>
args.direct_write)
File "/home/ubuntu/raildotbio/rail-rna/dooplicity/emr_simulator.py", line 1900, in run_simulation
max_attempts=max_attempts
File "/home/ubuntu/raildotbio/rail-rna/dooplicity/emr_simulator.py", line 1359, in execute_balanced_job_with_retries
raise RuntimeError
RuntimeError
abhinav
@nellore
hi @vnguyen0009 ! let's try to figure this out
can you view /disc2/rail-rna_logs/precoverage/dp.reduce.log/6.0.log?
Vy T. Nguyen
@vnguyen0009
That would be great! What do you need from me?
abhinav
@nellore
that log should have more information about the error
Vy T. Nguyen
@vnguyen0009
Traceback (most recent call last):
File "app_main.py", line 75, in run_toplevel
File "/home/ubuntu/raildotbio/rail-rna/rna/steps/coveragepre.py", line 180, in <module>
for
, _, uniqueness, diff in diffs:
ValueError: expected length 4, got 8
abhinav
@nellore
that's no good and somewhat opaque
do you have the files matching /disc2/rail-rna_logs/precoverage/dp.tasks/6.*?
Vy T. Nguyen
@vnguyen0009
yes I have that file.
abhinav
@nellore
how many such files are there?
Vy T. Nguyen
@vnguyen0009
There are seven 6.* files
abhinav
@nellore
are they large?
Vy T. Nguyen
@vnguyen0009
no, between 40M and 42M each
abhinav
@nellore
so those files contain no identifiable info EXCEPT for a possible deletion signal -- they simply hold just enough information to construct coverage bigwigs specifying how many primary alignments map across each base of the genome
if you're comfortable sending them to me
i can take a look
i suspect it'll be hard to diagnose this issue without studying those files
Vy T. Nguyen
@vnguyen0009
I think it should be fine. Let me double-check with my supervisor really quick.
Yes, I should be able to. Sorry, just needed to double-check.
Vy T. Nguyen
@vnguyen0009
Trying to send them through Gitter. The files are too big for email. ;(
Okay, sent you a link to share the files through OneDrive. Let me know if you get it. I sent it to your email.