Just merged master for the first time in a while and picked up the logger, which is really useful. BUT! I'm curious whether you've found it finicky. I can get it to print reliably... sometimes. It seems picky about format? I don't know. When it works it really is useful, so perhaps I'm still on the learning curve...
Note to self: always rebuild the conda package after merging another branch. Otherwise weird problems will linger, and you'll spend many hours on them that you'll wish you had back at the end of your life B-)
The 1st and 2nd reads of PE ddRAD will always be in the same orientation, since the first Illumina adapter will only ligate to the fragment end carrying the first cutter's overhang, and the second adapter only to the end carrying the second cutter's overhang. By "GBS", in my terminology, I always mean "two cut sites but only one cutter". Both ends of a fragment then carry the same overhang, so either adapter can ligate to either end, and therefore the 1st and 2nd reads are interchangeably forward- or reverse-stranded.
So in PE ddRAD you just have to revcomp R2 and both reads will be on the same strand. In GBS, on the other hand, you should still revcomp R2, but the pair (R1, R2) on this strand could still match a different pair (R2, R1) sequenced from the complementary strand. Does that make sense?
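A quick way to convince yourself of the GBS point: simulate reading one fragment from each end and merge each pair as (R1, revcomp(R2)). This is a minimal sketch, not ipyrad code; the helper names (revcomp, merge_pair, sequence_pair) and the 20 bp fragment are hypothetical, and reads are assumed to be uppercase ACGTN strings with no spacer between the merged halves.

```python
# Hypothetical helpers illustrating strand orientation of merged PE reads.
_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")

def revcomp(seq):
    """Reverse complement of an uppercase DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]

def merge_pair(r1, r2):
    """Concatenate R1 with revcomp(R2) so both halves are on R1's strand."""
    return r1 + revcomp(r2)

def sequence_pair(fragment, n):
    """Simulate PE reads of length n where adapter 1 sits on the left end."""
    r1 = fragment[:n]             # R1 reads the top strand from the left
    r2 = revcomp(fragment)[:n]    # R2 reads the bottom strand from the right
    return r1, r2

frag = "AATTCGGACTTGCAAGTCAG"                   # hypothetical 20 bp fragment
r1, r2 = sequence_pair(frag, 6)                 # adapter 1 on the left end
r1b, r2b = sequence_pair(revcomp(frag), 6)      # GBS: adapter 1 on the right end

merged_a = merge_pair(r1, r2)
merged_b = merge_pair(r1b, r2b)
print(merged_a)                        # AATTCGAGTCAG
print(merged_a == merged_b)            # False: a plus-strand-only search misses it
print(merged_b == revcomp(merged_a))   # True: searching both strands finds it
```

In ddRAD only the first orientation can occur, so merged reads from the same locus always agree on one strand. In GBS the second orientation yields exactly the reverse complement of the first merged sequence.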
For vsearch, this means that for ddRAD I concatenate (R1, revcomp(R2)) and search for hits on a single strand only (--strand plus), whereas for pairGBS I concatenate (R1, revcomp(R2)) the same way and search with --strand both.
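The only thing that changes between the two datatypes is the --strand value passed to vsearch. Below is a minimal sketch of building that command as an argument list. The function names, the datatype strings "pairddrad" and "pairgbs", the choice of --cluster_fast, and the 0.85 identity default are all assumptions for illustration, not how ipyrad actually calls vsearch.

```python
# Hypothetical mapping from datatype to vsearch's --strand value for
# merged (R1 + revcomp(R2)) reads.
STRAND_BY_DATATYPE = {
    "pairddrad": "plus",  # adapters are cutter-specific: one orientation only
    "pairgbs": "both",    # single cutter: a locus can appear as its revcomp
}

def vsearch_strand(datatype):
    """Return the --strand value for a datatype (KeyError if unknown)."""
    return STRAND_BY_DATATYPE[datatype]

def vsearch_cmd(infile, uc_out, datatype, ident=0.85):
    """Build an illustrative vsearch clustering command as an argument list."""
    return [
        "vsearch",
        "--cluster_fast", infile,
        "--id", str(ident),
        "--strand", vsearch_strand(datatype),
        "--uc", uc_out,
    ]

cmd = vsearch_cmd("merged.fa", "clusters.uc", "pairgbs")
print(" ".join(cmd))
```

Passing the list to subprocess.run(cmd, check=True) would run it without shell quoting issues; keeping the strand choice in a single lookup table makes it hard for the two datatypes to drift apart.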
I've been messing with code for D-statistics since we're trying to use them for another project here. I feel like I've finally gotten over the hump in understanding how fast numpy can actually be. I've also been playing with numba and getting some nice additional speed-ups: a for-loop that took 10 s before now runs in 50 ms.
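To illustrate where that kind of speed-up comes from, here is a sketch of the D-statistic written two ways: a plain Python loop over sites and a vectorized numpy version that replaces the loop with whole-array arithmetic. It assumes the frequency-based ABBA-BABA form (as in Durand et al. 2011) with derived-allele frequencies p1..p4 per site and p4 as the outgroup. The function names and the simulated data are hypothetical, and this is not the project's actual code.

```python
import numpy as np

def d_stat_loop(p1, p2, p3, p4):
    """Patterson's D from per-site derived-allele frequencies, Python loop."""
    abba = baba = 0.0
    for a, b, c, d in zip(p1, p2, p3, p4):
        abba += (1 - a) * b * c * (1 - d)
        baba += a * (1 - b) * c * (1 - d)
    return (abba - baba) / (abba + baba)

def d_stat_numpy(p1, p2, p3, p4):
    """Same statistic vectorized: each pattern is one array expression."""
    abba = np.sum((1 - p1) * p2 * p3 * (1 - p4))
    baba = np.sum(p1 * (1 - p2) * p3 * (1 - p4))
    return (abba - baba) / (abba + baba)

rng = np.random.default_rng(42)
freqs = rng.random((4, 100_000))   # simulated frequencies, not real data
d_loop = d_stat_loop(*freqs)
d_vec = d_stat_numpy(*freqs)
print(np.isclose(d_loop, d_vec))   # True (summation order differs slightly)
```

The loop version is also the natural candidate for numba: decorating it with numba's @njit should compile it to machine code, which helps when a computation (for example resampling inside a bootstrap) is awkward to express as array operations.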