Other than using junctions.bgz for a particular compilation, any approach is going to be either study-specific (as in recount3) or heavily filtered (as in snapcount). Ultimately this is because the junction matrices for a compilation like srav3 are massive (228,000,000 x 315,000) if fully materialized as dense matrices, and are still quite large even if stored as sparse matrices.
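To put "massive" in numbers, here is a back-of-envelope estimate for the dense case. It assumes one 4-byte (32-bit) integer per cell; that byte width is an assumption, not something stated in this thread:

$$
228{,}000{,}000 \times 315{,}000 \approx 7.18 \times 10^{13}\ \text{cells}
$$

$$
7.18 \times 10^{13}\ \text{cells} \times 4\ \text{bytes/cell} \approx 2.87 \times 10^{14}\ \text{bytes} \approx 287\ \text{TB}
$$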
Although the data in junctions.bgz is also available in a relational format, queries against the very large compilations in recount3 (srav3, tcgav2, and gtexv2) tend to take a very long time in that format, so it's best to use the bgzipped files in a streaming mode, as @nellore suggested.
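A minimal sketch of what "streaming mode" can look like with standard tools, assuming a POSIX shell with gzip and awk. The column positions (chromosome/start/end in 3/4/5) are an assumption borrowed from the sort keys suggested further down this thread, and CHR_COL/START_COL/END_COL are hypothetical override variables, not part of any recount3 or Snaptron tooling:

```shell
# Sketch: stream a bgzipped junctions file and print rows overlapping a region.
# bgzip output is multi-member gzip, so plain `gzip -dc` decompresses it
# sequentially, without an index and without holding the file in memory.
# ASSUMPTION: tab-separated; chromosome/start/end default to columns 3/4/5
# (override via the hypothetical CHR_COL/START_COL/END_COL variables).
# Region bounds are 1-based and inclusive, matching Snaptron's convention.
stream_region() {
  local file=$1 chrom=$2 qstart=$3 qend=$4
  gzip -dc < "$file" | awk -F'\t' -v c="$chrom" -v qs="$qstart" -v qe="$qend" \
    -v cc="${CHR_COL:-3}" -v sc="${START_COL:-4}" -v ec="${END_COL:-5}" \
    '$cc == c && $sc + 0 <= qe + 0 && $ec + 0 >= qs + 0'   # interval overlap test
}
```

Usage: `stream_region junctions.bgz chr8 79611215 79616821 > overlaps.tsv`. This is still one full pass over the file; the gain is that nothing is ever materialized as a matrix or loaded into a database.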
my_junc <- QueryBuilder(compilation = 'srav2', regions = 'chr8:79611215-79616821')
but you will get back all the events that also overlap the region:
my_junc@colData
but it's not super obvious that there's a way to get the samples that had the one junction I was looking for out of the 30 that were returned.
Ooh, figured it out! I'd recommend including this in the vignette, as IMO the function names and order of operations seemed a bit unclear:
# build the region query, then keep only the junction with this snaptron_id
query <- QueryBuilder(compilation = 'srav3h', regions = 'chr8:79611215-79616821')
query <- set_row_filters(query, snaptron_id == 72542235)
# run the junction query with the row filter applied
my_juncs <- query_jx(query)
So you can look up the exact snaptron_id first and then set a row filter on it.
@aleighbrown table 1 of http://snaptron.cs.jhu.edu/reftables.html says region coordinates are 1-based in a query; this is also true of returned coordinates.
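If you need to hand these junctions to BED-based tools, note that BED uses 0-based, half-open intervals, so a 1-based, inclusive start has to be shifted down by one while the end stays the same. A minimal sketch, assuming a hypothetical tab-separated layout with snaptron_id in column 2 and chromosome/start/end in columns 3/4/5:

```shell
# Sketch: convert 1-based, inclusive junction coordinates to BED's
# 0-based, half-open convention: subtract 1 from the start, keep the end.
# ASSUMPTION (hypothetical layout): tab-separated rows with snaptron_id in
# column 2 and chromosome/start/end in columns 3/4/5.
# Output columns: chrom, chromStart, chromEnd, name (the snaptron_id).
to_bed() {
  awk -F'\t' -v OFS='\t' '{ print $3, $4 - 1, $5, $2 }'
}
```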
a more straightforward way to perform a single-junction query can perhaps be inferred from p. 4 of http://www.bioconductor.org/packages/release/bioc/manuals/snapcount/man/snapcount.pdf; see the Exact field.
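If you are already streaming junctions.bgz rather than using snapcount, an exact single-junction lookup is just an equality test on the coordinate columns. This is not snapcount's Exact modifier, only a shell analogue; because coordinates are 1-based in both queries and results, the values can be compared as-is. The column positions and the CHR_COL/START_COL/END_COL variables are assumptions, as above:

```shell
# Sketch only: keep rows whose junction coordinates match exactly.
# Reads decompressed rows on stdin, e.g. from `gzip -dc < junctions.bgz`.
# ASSUMPTION: tab-separated; chromosome/start/end default to columns 3/4/5
# (override via the hypothetical CHR_COL/START_COL/END_COL variables).
# Both sides are 1-based and inclusive, so no coordinate shift is needed.
exact_junction() {
  awk -F'\t' -v c="$1" -v s="$2" -v e="$3" \
    -v cc="${CHR_COL:-3}" -v sc="${START_COL:-4}" -v ec="${END_COL:-5}" \
    '$cc == c && $sc + 0 == s + 0 && $ec + 0 == e + 0'
}
```

Usage: `gzip -dc < junctions.bgz | exact_junction chr8 79611215 79616821`.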
thanks for using!
Use sort to sort by junction (I believe this will be -k3,3 -k4,4n -k5,5n -k7,7 for junctions.bgz), then pipe the result to a script that merges lines corresponding to the same junction. It should be possible to obtain the other database files starting from there. Use --parallel=n to use n threads when sorting.
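A minimal sketch of that sort-then-merge step, assuming GNU sort and awk. The sort keys are the ones suggested above (chromosome, start, end, strand). The samples column (13 by default, via a hypothetical SAMPLES_COL variable) and its comma-prefixed id:coverage format (e.g. ",12:4,97:1") are assumptions to check against your files; with that format, merging two sample lists is plain concatenation. THREADS is likewise a hypothetical variable feeding --parallel:

```shell
# Sketch: sort junction rows, then collapse rows describing the same junction.
# ASSUMPTIONS: tab-separated input; chromosome/start/end/strand in columns
# 3/4/5/7; samples list in column SAMPLES_COL (default 13), comma-prefixed.
# Output: chromosome, start, end, strand, concatenated samples list.
merge_junctions() {
  local samples_col=${SAMPLES_COL:-13}
  LC_ALL=C sort -t "$(printf '\t')" -k3,3 -k4,4n -k5,5n -k7,7 \
      --parallel="${THREADS:-4}" \
  | awk -F'\t' -v OFS='\t' -v sc="$samples_col" '
      {
        key = $3 OFS $4 OFS $5 OFS $7
        if (key == prev) { merged = merged $sc; next }  # same junction: append samples
        if (NR > 1) print prev, merged                  # flush the previous junction
        prev = key; merged = $sc
      }
      END { if (NR > 0) print prev, merged }'
}
```

Usage, with hypothetical paths: `{ gzip -dc < set1/junctions.bgz; gzip -dc < set2/junctions.bgz; } | THREADS=8 merge_junctions > merged.tsv`. This only carries the coordinates and the sample list forward; summary columns such as sample counts or coverage sums would have to be recomputed from the merged list.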
Take the *.sjs.merged.motifs files from the unify run directory of each of the 2 sets, copy them to one machine so you have them together, then re-run the final steps of the jxn unify pipeline to get the fully merged set of jxns:
collect_per_study_sjs => merge_per_study_sjs => collect_study_merged_sjs => merge_all_sjs => annotate_all_sjs
(final step produces the full set of merged jxns with motifs and annotations)