My pseudocode looks like:
val readsDS = ac.loadAlignments(inputBam) // Load the BAM file with paired-end reads
val readsKeep = readsDS.toDF
.groupBy("readName") // One row per read name, so the aggregates are per pair
.agg(count("readName").alias("countReads"), sum("PairIndex").alias("sumPairs")) // Assuming PairIndex = 1 if 1st, and 2 if 2nd in pair
.where("countReads = 2 AND sumPairs = 3") // This will give a dataframe of read names to keep
// Do a "self-join" with the read names for the reads to keep
val filteredReads: AlignmentDataset = readsDS.transformDataset(
(ds: Dataset[AlignmentProduct]) =>
ds.join(readsKeep, Seq("readName"), "left_semi") // Keep only reads whose name is in readsKeep
.as[AlignmentProduct]) // join returns a DataFrame, so convert back (needs spark.implicits._ in scope)
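To make the intent concrete, here is the same filter on a toy example in plain Scala collections (no Spark or ADAM). ToyRead, pairIndex, and keepCompletePairs are made-up names for illustration only, and the 1/2 pair index is the same assumption as above, not an actual ADAM field.

```scala
// Toy stand-in for an alignment record; only the two fields the filter needs.
// pairIndex follows the assumption above: 1 = first in pair, 2 = second in pair.
final case class ToyRead(readName: String, pairIndex: Int)

// Keep reads whose name occurs exactly twice, once as mate 1 and once as mate 2.
def keepCompletePairs(reads: Seq[ToyRead]): Seq[ToyRead] = {
  val keepNames: Set[String] = reads
    .groupBy(_.readName)
    .collect {
      // count == 2 and sum == 3 can only be satisfied by the indices {1, 2}
      case (name, group)
          if group.size == 2 && group.map(_.pairIndex).sum == 3 => name
    }
    .toSet
  // Equivalent of the left-semi join: keep rows whose name is in the keep set
  reads.filter(r => keepNames.contains(r.readName))
}

val toyReads = Seq(
  ToyRead("r1", 1), ToyRead("r1", 2),                   // complete pair
  ToyRead("r2", 1),                                     // mate missing
  ToyRead("r3", 1), ToyRead("r3", 2), ToyRead("r3", 2)  // extra alignment
)
val kept = keepCompletePairs(toyReads)
```

Only the two r1 records should survive; r2 fails the count check and r3 fails both checks.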
Is there a better way to do what I'm doing? The paired end reads are giving me a headache.
P.S. I hate bam files.
P.P.S. Thanks for any advice or suggestions. :)
Hi Michael, thank you again for your help! I'd looked at Fragments, but I also wanted to be able to filter reads based on read flags, e.g. getProperPair and getPrimaryAlignment, and I couldn't figure out how to access the read info to filter the Fragments.
Any suggestions? Thanks so much again!
Hello again... I've been analyzing a BAM file with ADAM in which the mapping qualities are unavailable (i.e. mapping quality = 255). I noticed that when the file is read into an AlignmentDataset, the mapping qualities of 255 are changed to null, which is fine and good. The problem is that when I write the values back to file with saveAsSam, the nulls are converted to 0, not 255. Any idea what I'm doing wrong, or is there a way to get 255s instead of 0s?
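To spell out the round trip I expected versus what I'm seeing, here is a toy sketch in plain Scala. It makes no ADAM calls; readMapq, writeMapqExpected, and writeMapqObserved are made-up names that only model the behavior described above, not how ADAM implements it.

```scala
// In SAM, a MAPQ of 255 means the mapping quality is not available.
val UnavailableMapq: Int = 255

// Reading: 255 becomes "no value" (what shows up as null in the dataset).
def readMapq(raw: Int): Option[Int] =
  if (raw == UnavailableMapq) None else Some(raw)

// Writing, as I expected it: a missing value goes back out as 255.
def writeMapqExpected(mapq: Option[Int]): Int =
  mapq.getOrElse(UnavailableMapq)

// Writing, as I observe it: a missing value comes out as 0 instead.
def writeMapqObserved(mapq: Option[Int]): Int =
  mapq.getOrElse(0)

val roundTripExpected = writeMapqExpected(readMapq(255))
val roundTripObserved = writeMapqObserved(readMapq(255))
```

The difference matters downstream because 0 is itself a valid mapping quality, so after the round trip I can no longer tell "unavailable" apart from a genuinely low-confidence mapping.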
Thanks again for all your help and for fielding my (many) questions. :)
hi, I've been searching through the documentation but couldn't find an answer so I thought I'd ask here. I apologize in advance if this question has been asked before.
I'm using broadcastRegionJoinAgainst(gffObject) to join reads to annotations in a GFF3 file. Is strand information used for determining overlap? If yes, is there a way to disable strand matching? If no, is there a way to enforce it?
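To be concrete about the two behaviors I mean, here is a toy sketch in plain Scala. ToyRegion, ToyStrand, and both overlap functions are made-up names for illustration, with half-open [start, end) coordinates assumed; this is not a claim about what ADAM's join does internally.

```scala
sealed trait ToyStrand
case object Forward extends ToyStrand
case object Reverse extends ToyStrand

// Half-open interval [start, end) on a contig, with a strand.
final case class ToyRegion(contig: String, start: Long, end: Long, strand: ToyStrand)

// Behavior 1: strand is ignored; only contig and coordinates decide overlap.
def overlapsIgnoringStrand(a: ToyRegion, b: ToyRegion): Boolean =
  a.contig == b.contig && a.start < b.end && b.start < a.end

// Behavior 2: same coordinate test, but the strands must also match.
def overlapsSameStrand(a: ToyRegion, b: ToyRegion): Boolean =
  overlapsIgnoringStrand(a, b) && a.strand == b.strand

val read = ToyRegion("chr1", 100L, 150L, Forward)
val gene = ToyRegion("chr1", 120L, 500L, Reverse)
```

With these two regions, the read and the gene overlap under the first definition but not under the second, which is exactly the case I need to control.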
Thanks for your help!