These are chat archives for dereneaton/ipyrad

25 Jan 2018
Isaac Overcast
@isaacovercast
Jan 25 2018 00:10
@richiehodel_twitter Mmm, combining different read lengths is tricky. Combining 50bp and 100bp reads from the same locus will result in the 50bp reads being extended with 50 indels, assuming vsearch doesn't make a total hash of the clustering (which it very well may do). There's probably a gap penalty we could introduce in the vsearch clustering step that would prevent totally insane clusters from forming, but our assumption is that read lengths are in the same ballpark (within the normal variance of a typical sequencing run).
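(To picture that: a toy Python sketch, not ipyrad code, of the terminal-gap padding described above, with made-up stand-in reads.)

```python
# Toy illustration only (not ipyrad's actual clustering code):
# a 50bp read clustered with a 100bp read from the same locus
# ends up padded with 50 terminal gap characters.
long_read = "ACGT" * 25            # stand-in 100bp read
short_read = "ACGT" * 12 + "AC"    # stand-in 50bp read

# Within the cluster alignment, every sequence must share the same
# aligned length, so the short read is extended with indels ('-').
aligned_short = short_read + "-" * (len(long_read) - len(short_read))

print(aligned_short)               # 50 bases followed by 50 '-' characters
assert len(aligned_short) == 100
```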
Isaac Overcast
@isaacovercast
Jan 25 2018 00:17
@richiehodel_twitter 7 weeks is a very, very long time, even for such a large dataset. I'd be willing to bet that vsearch (the clustering tool we use internally) is just thrashing super hard trying to deal with the different read lengths. We use the default gap penalty (I think), which could be wreaking havoc on mixed-length data. vsearch has a --gapopen flag you could experiment with in this situation (increasing the penalty for opening a gap within the sequence and reducing the penalty for introducing a gap at the edges), but that would mean cloning the ipyrad repo and hacking the code yourself, because this is kind of an edge use-case.
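(For anyone who goes down that road: a hypothetical stand-alone sketch, not ipyrad's actual internals, of calling vsearch by hand with adjusted penalty strings per the vsearch manual; the filenames, identity threshold, and the 40I/1E value are placeholders to experiment with. vsearch's defaults are --gapopen 20I/2E and --gapext 2I/1E, where I means an interior gap and E a terminal one.)

```python
# Hypothetical experiment (not how ipyrad invokes vsearch internally):
# cluster mixed-length reads with a stiffer interior gap-open penalty
# and a cheaper terminal one, so length differences are absorbed at
# the read ends rather than as interior indels.
import subprocess

subprocess.run(
    [
        "vsearch",
        "--cluster_fast", "merged_reads.fasta",  # placeholder input
        "--id", "0.85",                  # placeholder identity threshold
        "--gapopen", "40I/1E",           # raise interior, lower end-gap penalty
        "--gapext", "2I/1E",             # vsearch's default extension penalties
        "--centroids", "centroids.fasta",
        "--uc", "clusters.uc",
    ],
    check=True,
)
```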
richie hodel
@richiehodel_twitter
Jan 25 2018 05:04
@isaacovercast Thanks for the info! We'll figure out a way to deal with the different read lengths. And we haven't started our run yet; I think the 7 weeks is a run that @nitishnarula is working on. My dataset has only half the number of individuals that his does, so I was trying to get a feel for the computing resources I'd need to request.