These are chat archives for nellore/rail

1st
Jun 2015
abhinav
@nellore
Jun 01 2015 00:05
@maximus-b (neglected to mention)
maximus-b
@maximus-b
Jun 01 2015 14:57
Hi @nellore, thanks for the reply. Yes, I think I do. Is it the line after "Errors encountered"? This is the one: "Streaming command "sort -S 307200 -k1,1 -m /home/user/data.dir/rail-rna_logs/intron_search/dp.tasks/29.* | /home/user/raildotbio/pypy-2.5-linux_x86_64-portable/bin/pypy /home/user/raildotbio/rail-rna/rna/steps/intron_search.py --bowtie-idx=/home/user/data.dir/myref --partition-length=5000 --max-intron-size=500000 --min-intron-size=10 --min-exon-size=9 --search-window-size=1000 --motif-radius=5 >/home/user/data.dir/rail-rna_logs/intron_search/29 2>/home/user/data.dir/rail-rna_logs/intron_search/dp.reduce.log/29.0.log"; failed; exit level was 1."
The command recommended to resume Rail-RNA after I "fix the error" is "/home/user/raildotbio/pypy-2.5-linux_x86_64-portable/bin/pypy /home/user/raildotbio/rail-rna/dooplicity/emr_simulator.py -j /home/user/data.dir/rail-rna_logs/resume_flow_E66EEUGJG2IH.json -b /home/user/raildotbio/rail-rna/rna/driver/rail-rna.txt -l /home/user/data.dir/rail-rna_logs/flow.2015-05-31T16:28:18.972171.log -f --max-attempts 1 --num-processes 32". I do, however, need suggestions on what exactly to fix.
abhinav
@nellore
Jun 01 2015 15:25
can you view /home/user/data.dir/rail-rna_logs/intron_search/dp.reduce.log/29.0.log ?
try less /home/user/data.dir/rail-rna_logs/intron_search/dp.reduce.log/29.0.log, and tell me what you see
@maximus-b
maximus-b
@maximus-b
Jun 01 2015 15:48
Yes. For the log files in the dp.reduce.log directory, I checked 29.0.log and it says "Traceback (most recent call last):
  File "app_main.py", line 75, in run_toplevel
  File "/home/user/raildotbio/rail-rna/rna/steps/intron_search.py", line 1330, in <module>
    global_alignment=global_alignment)
  File "/home/user/raildotbio/rail-rna/rna/steps/intron_search.py", line 1256, in go
    intron_pos, intron_end_pos) in introns:
  File "/home/user/raildotbio/rail-rna/rna/steps/intron_search.py", line 818, in introns_from_clique
    left_motif_search_size
  File "/home/user/raildotbio/rail-rna/rna/utils/bowtie_index.py", line 158, in get_stretch
    assert starting_rec >= 0
AssertionError"
In fact, all of the files in this particular log directory either have the same error or say "DONE with intron_search.py; in/out=xxxxxxx/xxxxx; time=xxx.xxx s"
abhinav
@nellore
Jun 01 2015 15:55
interesting -- where did you get your bowtie index?
maximus-b
@maximus-b
Jun 01 2015 15:55
(I reran the analysis with 60 threads, changing only the -p in the rail-rna command line.) The dp.reduce.log directory has 17 files with the "Traceback"s in them and 43 files with "DONE with intron_search.py".
I indexed it myself using bowtie and bowtie2 prior to running rail-rna. I tried the standard stable version we have on the server, and also the specific distribution that was installed with rail-rna. Both gave the same error.
Do you require the genome to be a finished/high-quality draft? The contigs I am using are a lower-quality draft (lower sequencing coverage) from a de novo sequencing effort.
abhinav
@nellore
Jun 01 2015 15:59
no, this really should work on any indexes built with bowtie/bowtie2; are the contigs you're using public?
maximus-b
@maximus-b
Jun 01 2015 16:01
No, unfortunately not. We have just completed the assembly and wish to use the transcriptome sequencing data to gather a set of gene models to train gene predictors with.
abhinav
@nellore
Jun 01 2015 16:03
alright, then let me try to debug this with you
maximus-b
@maximus-b
Jun 01 2015 16:04
Yes, please, if you will bear with the fact that I am not able to share the data directly with you.
Thank you very much!
abhinav
@nellore
Jun 01 2015 16:06
edit the file /home/user/raildotbio/rail-rna/rna/utils/bowtie_index.py, and add the line print >>sys.stderr, self.offset_in_ref right before line 158
also add an import sys at the top of the file
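so the patched spot would look roughly like this (just a sketch; your surrounding lines may differ):

import sys  # added near the top of bowtie_index.py

    # ...inside get_stretch, right before the failing assertion (line 158)
    print >>sys.stderr, self.offset_in_ref
    assert starting_rec >= 0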
maximus-b
@maximus-b
Jun 01 2015 16:07
Top of the file, as in before the "import os"?
abhinav
@nellore
Jun 01 2015 16:07
yes, that works
now try resuming the job with /home/user/raildotbio/pypy-2.5-linux_x86_64-portable/bin/pypy /home/user/raildotbio/rail-rna/dooplicity/emr_simulator.py -j /home/user/data.dir/rail-rna_logs/resume_flow_E66EEUGJG2IH.json -b /home/user/raildotbio/rail-rna/rna/driver/rail-rna.txt -l /home/user/data.dir/rail-rna_logs/flow.2015-05-31T16:28:18.972171.log -f --max-attempts 1 --num-processes 32
you'll still see the error, but there will be more output in 29.log
maximus-b
@maximus-b
Jun 01 2015 16:08
and line 158 is "assert starting_rec >= 0"?
abhinav
@nellore
Jun 01 2015 16:08
right
so before that line
maximus-b
@maximus-b
Jun 01 2015 16:10
Yes. Started.
abhinav
@nellore
Jun 01 2015 16:10
cool
maximus-b
@maximus-b
Jun 01 2015 16:11
I should wait for it to die again before checking 29.0.log?
abhinav
@nellore
Jun 01 2015 16:11
yeah
or whatever reducer fails first; the error will tell you
maximus-b
@maximus-b
Jun 01 2015 16:13
So I should wait for it to die, check which reducer died first, and post what it says in the log of that reducer? Got it.
abhinav
@nellore
Jun 01 2015 16:13
another question: how many bases are in this assembly you constructed?
yeah, that would be great
abhinav
@nellore
Jun 01 2015 16:56
is it taking way longer?
maximus-b
@maximus-b
Jun 01 2015 17:00
Hi, yes. It is taking way longer. I just ran off to do something else and returned. It is still running.
abhinav
@nellore
Jun 01 2015 17:00
ok ctrl+c it!
maximus-b
@maximus-b
Jun 01 2015 17:00
And the genome is something over 700 Mbp
abhinav
@nellore
Jun 01 2015 17:00
it's probably dumping too much output
try opening 29.log now
it may be a mess
maximus-b
@maximus-b
Jun 01 2015 17:02
It is.
abhinav
@nellore
Jun 01 2015 17:02
hah, oops!
can you paste a bit of it though?
maximus-b
@maximus-b
Jun 01 2015 17:02
The size is 2.4G
, 'genome_PE1_(paired)_contig_101702': [0, 135], 'genome_PE1_(paired)_contig_101703': [0, 1194], 'genome_PE1_(paired)_contig_101704': [0, 200, 666, 1157], 'genome_PE1_(paired)_contig_101705': [0], 'genome_PE1_(paired)_contig_101706': [0, 443], 'genome_PE1_(paired)_contig_101707': [0], 'genome_PE1_(paired)_contig_101708': [0], 'genome_PE1_(paired)_contig_101709': [0], 'genome_PE1_(paired)_contig_101710': [0], 'genome_PE1_(paired)_contig_101711': [0], 'genome_PE1_(paired)_contig_101712': [0], 'genome_PE1_(paired)_contig_101713': [0, 133], 'genome_PE1_(paired)_contig_101714': [0, 1721], 'genome_PE1_(paired)_contig_101715': [0], 'genome_PE1_(paired)_contig_101716': [0], 'genome_PE1_(paired)_contig_101717': [0], 'genome_PE1_(paired)_contig_101718': [0], 'genome_PE1_(paired)_contig_101719': [0], 'genome_PE1_(paired)_contig_101720': [0, 667], 'genome_PE1_(paired)_contig_101721': [0, 148, 505], 'genome_PE1_(paired)_contig_101722': [0, 274, 598, 2321, 2500], 'genome_PE1_(paired)_contig_101723': [0], 'genome_PE1_(paired)_contig_101724': [0, 149], 'genome_PE1_(paired)_contig_101725': [0, 646, 800], 'genome_PE1_(paired)_contig_101726': [0, 159], 'genome_PE1_(paired)_contig_101727': [0, 2062], 'genome_PE1_(paired)_contig_101728': [0], 'genome_PE1_(paired)_contig_101729': [0, 1024, 2584], 'genome_PE1_(pai^C
That is the last bit before I Ctrl+C'ed the cat
abhinav
@nellore
Jun 01 2015 17:08
okay, let's be less sloppy. can you return bowtie_index.py to its original state by removing the import sys and the line before the assertion, and instead change assert starting_rec >= 0 to assert starting_rec >= 0, ('Starting record is negative when %d bases of contig %s were requested at offset %d.' % (count, ref_id, ref_off))
then we can drill down on the error using the python interpreter. thanks for doing this, by the way
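concretely, line 158 goes from

assert starting_rec >= 0

to the one-liner above; wrapped for readability, it's

assert starting_rec >= 0, (
    'Starting record is negative when %d bases of contig %s were '
    'requested at offset %d.' % (count, ref_id, ref_off)
)

count, ref_id, and ref_off should already be in scope inside get_stretch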
maximus-b
@maximus-b
Jun 01 2015 17:11
No problem.
So, I should use the same command to resume?
abhinav
@nellore
Jun 01 2015 17:12
yep
it'll overwrite that 2.4GB file
...and the others
abhinav
@nellore
Jun 01 2015 17:22
have to go shortly -- i'll be back in a few hours -- but i'm just going to ask you, after rerunning, to find the contig on which the AssertionError occurs and construct a bowtie 1 index for just that contig
Ben
@BenLangmead
Jun 01 2015 17:22
that could happen if self.offset_in_ref is empty (weird) or if ref_off is negative (weird) or if ref_off is less than the very first element of self.offset_in_ref[ref_id] -- also weird
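if the lookup is something like a bisect over the record offsets, all three cases collapse to the same failure (sketch, not the actual code):

import bisect

def starting_record(offset_in_ref, ref_id, ref_off):
    offsets = offset_in_ref[ref_id]  # start offsets of this contig's records
    # rightmost record starting at or before ref_off
    starting_rec = bisect.bisect_right(offsets, ref_off) - 1
    # an empty list, a negative ref_off, or ref_off < offsets[0] all make
    # bisect_right return 0, so starting_rec lands at -1 here
    assert starting_rec >= 0
    return starting_rec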
maximus-b
@maximus-b
Jun 01 2015 17:24
How, roughly, will I be able to find that contig? Will it be stated in the problematic log?
And what do I do with the bowtie1 index of that contig? Or are you trying to see whether bowtie1 would actually build an index out of that contig or spit out complaints?
Ben
@BenLangmead
Jun 01 2015 17:26
the new assert that @nellore suggested should tell us more
maximus-b
@maximus-b
Jun 01 2015 17:28
oh, thanks! We will see how things go then, but I kinda have to go too (but for like maybe 8-9 hours) so I will definitely post outputs when I have them. Cheers, guys!
It's only running at "Inputs partitioned: 27/60" now.
maximus-b
@maximus-b
Jun 01 2015 18:56
Hi (me again), the same error was reported, namely partition 29. And the logs in the dp.reduce.log directory show 8 contigs that cause problems due to AssertionError: Starting record is negative when xx bases of contig genome_PE1_(paired)_contig_xxxxx were requested at offset -1. It looked to me like a length issue, so I extracted these contigs and checked their lengths: 191-2141 bp. Except for that single one of 191 bp, the rest are >= 500 bp. I have also built individual bowtie1 indices for these sequences with the bowtie-1.1.1 binaries shipped with rail-rna. Please advise on next steps / further information I may have missed. Now I really will be gone for 6-7 hours.
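(In case it helps, I checked the lengths with a quick script along these lines -- contigs.fa is just a placeholder for my extracted sequences:)

# tally contig lengths from a FASTA file (python 2, as shipped with rail-rna)
lengths = {}
name = None
with open('contigs.fa') as fasta:
    for line in fasta:
        line = line.strip()
        if line.startswith('>'):
            name = line[1:].split()[0]
            lengths[name] = 0
        elif name is not None:
            lengths[name] += len(line)
for name in sorted(lengths):
    print name, lengths[name]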
abhinav
@nellore
Jun 01 2015 19:15
thanks for reporting back; i think i know the issue
and i'll update rail to address it
abhinav
@nellore
Jun 01 2015 19:22
it's that a negative offset from the beginning of the reference is requested when rail-rna searches for motifs upstream of an alignment
abhinav
@nellore
Jun 01 2015 19:39
this is never a problem for human because every contig (but dinky chrM) begins with a string of Ns, so no alignment of a read segment occurs near the beginning of a contig. thanks for drawing our attention to the issue.
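the gist of the fix is to clamp the motif search window at position 0 -- roughly this, with illustrative names rather than the exact patch:

# never let the upstream search window start before the contig begins
search_start = max(0, alignment_start - search_window_size)
window = reference_index.get_stretch(rname,
                                     search_start,
                                     alignment_start - search_start)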
abhinav
@nellore
Jun 01 2015 21:41
@maximus-b just released v0.1.6b for you
just reinstall rail with it
exactly the way you installed it last time
choose yes when it asks if you want to overwrite the existing installation
you don't have to start your job flow from the beginning; you can resume it from where it left off because i changed just a few lines of code that don't affect the first part of the pipeline
see if it fixes your issue
if it does, to be on the safe, reproducible side, you may want to run your job flow from the very beginning
abhinav
@nellore
Jun 01 2015 21:47
let us know how things turn out!