abhinav
@nellore
but i would also check whether some walltime is being saturated
exit code 9 means "ran out of cpu time"
ponomarevsy
@ponomarevsy
Thanks for your feedback. Yesterday I submitted a job on 16 CPUs (a high-memory node), and it is still running with these options: #$ -pe fillup 16; #$ -l highmem,h_vmem=45G
ponomarevsy
@ponomarevsy
For the failed jobs the rail-rna_logs/align_reads/dp.reduce.log directory is empty.
ponomarevsy
@ponomarevsy
I wonder if this is MPI-related, as it seems to fail only when running on multiple nodes... If that is the case I need to go back to my ipython cluster submission script...
Candace Liu
@cliu72

Hello! I've been testing Rail-RNA and I have some questions I was hoping someone could help me with. Is there any built-in way to output a file with summary statistics, i.e., mapping rate, uniquely mapped reads, and multimapped reads? Also, how are multimapped reads handled? Is there a way to calculate the multimapping rate? I tried summing up all the unique reads and dividing by total reads, but it gave me a weird number.

In the counts.tsv file in my output, the third to last column seems to just be labeled with a star (*). What does this column represent? Also, I am still new in looking at this kind of data and unfamiliar with some of the “groups” or column names in counts.tsv. For example, what do the following represent: chr15_KI270727v1_random (what does the “random” mean, is this actually a part of chromosome 15), chrUn_KI270442v1, chrUn_GL000216v2, chrEBV, and chrM?

Also, in the coverage_bigWigs output folder, I am getting more output files than http://docs.rail.bio/deliverables/ says under the “bw: coverage vectors” section. For example, for sample A01, I am getting A01.A.bw, A01.A.unique.bw, A01.C.bw, A01.C.unique.bw, A01.G.bw, A01.G.unique.bw, A01.N.bw, A01.N.unique.bw, A01.T.bw, A01.T.unique.bw, A01.bw, and A01.unique.bw. Why are there separate files for each base?

ponomarevsy
@ponomarevsy
Rail-RNA runs fine on 16 engines but fails on > 16 at this step:
00h:53m:14s || Step 4/24: Align reads and segment them into readlets
01h:50m:02s || Partitioned 32 inputs into tasks.
abhinav
@nellore
@ponomarevsy i would ask for some help from your sysadmin. i'm now pretty confident this isn't a problem with rail; you are submitting your job to a queue where you have limited cpu time, and you need a larger allocation to be able to finish your job
abhinav
@nellore
@cliu72 the counts.tsv file in the cross_sample_results folder of the output is a table. each element of the table has two numbers separated by a comma. the first number is the total number of reads (mates for paired-end samples) that aligned to the given contig (column header). the second number is the total number of reads (mates for paired-end samples) that aligned uniquely to the given contig (column header). the * column denotes unmapped reads. to understand what "random" means, you should read more about genome builds; a good reference is https://gatkforums.broadinstitute.org/gatk/discussion/7857/reference-genome-components. note that you've likely aligned to a primary assembly. also note that chrEBV is epstein-barr virus, which you find in a lot of people i guess.
@cliu72 as for those separate files for each base, that's just a compact way of storing the coverage of mismatches between reads and the reference. they're not documented right now, so you don't have to worry about them.
hope this helps, everyone!
and sorry for the delayed support
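(To make the counts.tsv layout nellore describes concrete, here is a minimal Python sketch that pulls one sample's row apart into (total, unique) pairs per column. It assumes only what is stated above — a tab-separated table whose cells hold two comma-separated numbers, with contigs as column headers — plus one assumption of mine: that the sample name sits in the first column. The path and function name are illustrative, not rail's API.)

    import csv

    def parse_counts_row(path, sample):
        """Return {column: (total, unique)} for one sample's row of counts.tsv."""
        with open(path) as handle:
            reader = csv.DictReader(handle, delimiter='\t')
            for row in reader:
                # assumption: first column holds the sample name
                if row[reader.fieldnames[0]] == sample:
                    # note: the trailing columns (*, total mapped, total) are
                    # summaries rather than contigs -- treat them accordingly
                    return {col: tuple(int(n) for n in row[col].split(','))
                            for col in reader.fieldnames[1:]}
        raise KeyError('sample %r not found in %s' % (sample, path))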
Candace Liu
@cliu72

Thanks for your detailed reply; it helped a lot! If I understand correctly, chr15_KI270727v1_random represents a sequence on chromosome 15 but with unknown order or orientation. So, for example, if counts.tsv tells me that for a certain sample, chr15_KI270727v1_random has a total of 100 reads and 80 uniquely mapped reads, while chr15 has 4000000 total reads and 3000000 unique reads, are the 100 total and 80 unique reads a subset of the 4000000 total and 3000000 unique reads? If so, calculating the percentage of multimapping reads isn't as simple as adding all the uniquely mapped reads and dividing that by the total number of reads.

Also, in my counts.tsv file, the sum of the values in the * (unmapped) and "total mapped reads" columns does not equal the number in the "total reads" column. Why is this?

Thanks again!

P.S. I'm using rail with the ultimate goal of creating recount objects. I asked a question here: https://gitter.im/recount-contributions/Lobby but haven't received a reply yet. Since there seem to be a lot of overlapping authors, I was wondering if I could get some help there as well. Thanks!
abhinav
@nellore
@cliu72 "So, for example, if counts.tsv tells me that for a certain sample, chr15_KI270727v1_random has a total of 100 reads and 80 uniquely mapped reads, while chr15 has 4000000 total reads and 3000000 unique reads, are the 100 total and 80 unique reads a subset of the 4000000 total and 3000000 unique reads? If so, calculating the percentage of multimapping reads isn't as simple as adding all the uniquely mapped reads and dividing that by the total number of reads." -- rail nominates at most one alignment as the primary alignment of a given read, and counts.tsv counts only primary alignments. so chr15_KI270727v1_random's alignments are not a subset of chr15's alignments -- they are disjoint sets, and computing the percentage of multimapping reads is indeed as simple as adding the uniquely mapped reads, dividing by the total number of reads, and subtracting the result from 1
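(Plugging Candace's illustrative numbers into that recipe — values from her question above, not real data:)

    # counts are of primary alignments only, so per-contig counts are disjoint
    unique = 3000000 + 80   # uniquely mapped: chr15 + chr15_KI270727v1_random
    total = 4000000 + 100   # all primary alignments to those two contigs
    multimapping_rate = 1.0 - float(unique) / total  # works in Python 2 and 3
    print(multimapping_rate)  # ~0.25; in practice, sum over every contig column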
abhinav
@nellore
as for how * + total mapped reads != total reads, this shouldn't happen
can you show me your counts file?
Candace Liu
@cliu72
Thanks for your help!
abhinav
@nellore
ABHINAVs-MacBook-Pro:Downloads verve$ gzip -cd counts.tsv.gz | rev | cut -f1-3 | rev | tail -n +2 | \
    awk '{split($1,unmapped,","); split($2,aligned,","); split($3,total,","); for (i=1; i<=2; i++) { if (unmapped[i] + aligned[i] != total[i]) {print "fail"}}}'
ABHINAVs-MacBook-Pro:Downloads verve$
everything appears to check out
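(A rough Python 3 equivalent of that awk pipeline, for readers who don't speak awk; it assumes the same layout the one-liner does — the last three columns are * (unmapped), total mapped, and total, each cell holding "all,unique":)

    import gzip

    with gzip.open('counts.tsv.gz', 'rt') as handle:
        next(handle)  # skip the header row
        for line in handle:
            unmapped, aligned, total = [
                [int(n) for n in cell.split(',')]
                for cell in line.rstrip('\n').split('\t')[-3:]
            ]
            for u, a, t in zip(unmapped, aligned, total):
                if u + a != t:
                    print('fail')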
Candace Liu
@cliu72
Ah, I looked more closely through my R code, and it turns out the command I was using to split at the comma wasn't working correctly... sorry about that... Thanks!
abhinav
@nellore
oops!
Julia di Iulio
@juliadiiulio_twitter

Hi @nellore! Just started using Rail-RNA because it looks like it rocks :) but I've had some issues... I am trying (with help from devOps people) to run it with AWS EMR...
Here is the error I get; any help would be greatly appreciated:

Parameter validation failed:
Missing required parameter in LifecycleConfiguration.Rules[0]: "Prefix"
Traceback (most recent call last):
  File "/usr/lib64/python2.7/runpy.py", line 174, in _run_module_as_main
    "__main__", fname, loader, pkg_name)
  File "/usr/lib64/python2.7/runpy.py", line 72, in _run_code
    exec code in run_globals
  File "/usr/local/raildotbio/rail-rna/__main__.py", line 975, in <module>
    ec2_slave_security_group_id=args.ec2_slave_security_group_id
  File "/usr/local/raildotbio/rail-rna/rna/driver/rna_config.py", line 6435, in __init__
    secure_stack_name=secure_stack_name)
  File "/usr/local/raildotbio/rail-rna/rna/driver/rna_config.py", line 2217, in __init__
    days=intermediate_lifetime)
  File "/usr/local/raildotbio/rail-rna/dooplicity/ansibles.py", line 386, in expire_prefix
    ' '.join(aws_command)
RuntimeError: Error encountered changing lifecycle parameters with command "aws --profile default s3api put-bucket-lifecycle --bucket rail-rna --lifecycle-configuration {"Rules":[{"Status": "Enabled", "ID": "something", "NoncurrentVersionExpiration": {"NoncurrentDays": 365}, "Expiration": {"ExpiredObjectDeleteMarker": true}, "AbortIncompleteMultipartUpload": {"DaysAfterInitiation": 7}}, {"Status": "Enabled", "Prefix": "ElasticMode/20170816.intermediate/", "Expiration": {"Days": -1}}]}".

In the meantime, I also tried running it locally, and I get the following error for some of the samples, but not all:

less rail-rna_logs/align_reads/dp.reduce.log/0.0.log
Traceback (most recent call last):
  File "app_main.py", line 75, in run_toplevel
  File "/usr/local/raildotbio/rail-rna/rna/steps/align_reads.py", line 774, in <module>
    no_polyA=args.no_polyA)
  File "/usr/local/raildotbio/rail-rna/rna/steps/align_reads.py", line 465, in go
    for is_reversed, name, qual in xpartition:
ValueError: expected length 3, got 2

I thought this could be due to bad input FASTQ, so I used --skip-bad-records, but it doesn't seem to fix it.
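(For what it's worth, that ValueError is the signature of tuple unpacking receiving too few fields — line 465 expects every record to split into three values. A tiny illustration; the message wording is PyPy's, which rail runs under, as the app_main.py frame hints:)

    records = [('True', 'read1', 'IIII'),  # well-formed: three fields
               ('True', 'read2')]          # truncated: only two fields
    for is_reversed, name, qual in records:
        pass
    # PyPy: ValueError: expected length 3, got 2
    # (CPython words the same failure differently)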

Julia di Iulio
@juliadiiulio_twitter
oh and I'm using Rail-RNA v0.2.4a if that helps
and I get the same error (in elastic mode) with v0.2.4b
abhinav
@nellore
hi @juliadiiulio_twitter! thanks for trying rail-rna. the lifecycle error is a known issue in rail-rna v0.2.4b; see nellore/rail#59. (we want to fix a few other issues too before our next release.) the current workaround is to use --intermediate-lifetime -1 in your rail-rna command to disable expiring intermediate data; after rail is done with a job flow, you can manually delete the intermediates from s3 using either the console or the aws cli
as for your local-mode error, my best guess is this has something to do with malformed input. from some previous experience helping another user, one possibility is you're trimming reads, and for some reads the trimmer eliminates the entire read. this could be way off though, and i'm happy to debug live with you in this chat room
Julia di Iulio
@juliadiiulio_twitter
Thanks @nellore for your advice :)
I tried v0.2.4a in elastic mode and get the same error (whether or not I use --intermediate-lifetime -1). Any other ideas? Or should I go back to an earlier version than v0.2.4a?
For the local mode, I will look into it (there is indeed some preprocessing on the FASTQ files that might have removed the whole read in some instances)
Thanks a lot!
abhinav
@nellore
gah
i see intermediate-lifetime = -1 is an issue too!
ok, good to know
can you try creating a new bucket in the AWS console
and adding one dud lifecycle rule?
Julia di Iulio
@juliadiiulio_twitter
Let me check with our devops team (but I think that's what they did). I'll come back to you as soon as I hear from them!
abhinav
@nellore
great!
Julia di Iulio
@juliadiiulio_twitter
Okay, they confirmed that they did that already
abhinav
@nellore
alright, can i just walk you through hacking rail so it doesn't do this?
Julia di Iulio
@juliadiiulio_twitter
sure! I'll do my best to follow:)
abhinav
@nellore
edit the file /usr/local/raildotbio/rail-rna/dooplicity/ansibles.py so L356 reads
    if 'NoSuchLifecycleConfiguration' not in errors:
rather than
    if 'NoSuchLifecycleConfiguration' in errors:
save and rerun, and hopefully that particular error will be gone
but then you'll still face the potential overtrimming issue
one way to handle that: if a read is 100% trimmed, replace it with a single-character read sequence "N" with quality sequence "#"
on the other hand, you may find you don't really need to trim since rail-rna soft-clips alignments
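(A minimal sketch of that placeholder workaround, applied to trimmed FASTQ before handing it to rail — the function name and the 4-line-record assumption are mine, not rail's:)

    import sys

    def patch_empty_reads(fastq_in, fastq_out):
        """Replace fully trimmed (empty) reads with sequence 'N', quality '#'."""
        with open(fastq_in) as src, open(fastq_out, 'w') as dst:
            while True:
                record = [src.readline() for _ in range(4)]
                if not record[0]:
                    break  # end of file
                name, seq, plus, qual = (line.rstrip('\n') for line in record)
                if not seq:  # the trimmer removed the entire read
                    seq, qual = 'N', '#'
                dst.write('%s\n%s\n%s\n%s\n' % (name, seq, plus, qual))

    if __name__ == '__main__':
        patch_empty_reads(sys.argv[1], sys.argv[2])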
Julia di Iulio
@juliadiiulio_twitter

hmm looks like I still get the error

Parameter validation failed:
Missing required parameter in LifecycleConfiguration.Rules[0]: "Prefix"
Traceback (most recent call last):
  File "/usr/lib64/python2.7/runpy.py", line 174, in _run_module_as_main
    "__main__", fname, loader, pkg_name)
  File "/usr/lib64/python2.7/runpy.py", line 72, in _run_code
    exec code in run_globals
  File "/usr/local/raildotbio/rail-rna/__main__.py", line 975, in <module>
    ec2_slave_security_group_id=args.ec2_slave_security_group_id
  File "/usr/local/raildotbio/rail-rna/rna/driver/rna_config.py", line 6435, in __init__
    secure_stack_name=secure_stack_name)
  File "/usr/local/raildotbio/rail-rna/rna/driver/rna_config.py", line 2217, in __init__
    days=intermediate_lifetime)
  File "/usr/local/raildotbio/rail-rna/dooplicity/ansibles.py", line 386, in expire_prefix
    ' '.join(aws_command)
RuntimeError: Error encountered changing lifecycle parameters with command "aws --profile default s3api put-bucket-lifecycle --bucket rail-rna --lifecycle-configuration {"Rules":[{"Status": "Enabled", "ID": "something", "NoncurrentVersionExpiration": {"NoncurrentDays": 365}, "Expiration": {"ExpiredObjectDeleteMarker": true}, "AbortIncompleteMultipartUpload": {"DaysAfterInitiation": 7}}, {"Status": "Enabled", "Prefix": "jdiiulio/storage/analysis/RNAseq/HNX/GNE/ElasticMode/20170816.intermediate/", "Expiration": {"Days": -1}}]}".

abhinav
@nellore
alright better idea
add a new line at the beginning of the function
right before L315
write return
i think AWS changed its API
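(Roughly what that hack amounts to — a sketch only; the function name expire_prefix comes from the tracebacks above, and the signature here is illustrative. The likely root cause, consistent with the 'Missing required parameter ... "Prefix"' message and nellore's guess, is that the AWS API started rejecting lifecycle rules without a Prefix, so short-circuiting rail's lifecycle call sidesteps the validation:)

    # in /usr/local/raildotbio/rail-rna/dooplicity/ansibles.py
    def expire_prefix(self, prefix, days=4):  # illustrative signature
        return  # no-op: skip "aws s3api put-bucket-lifecycle" entirely
        # ...original body, now unreachable...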