These are chat archives for dereneaton/ipyrad

Mar 2016
Deren Eaton
Mar 08 2016 02:27
Here's a general question: If we assume most users are gonna do a conda install, is it still poor form to have too many dependencies?
I mean, right now we have some bloat in the install that is not necessarily required: jupyter, toyplot, numba...
But it's not like it hurts anyone to pull in these packages if they aren't already installed. The install only takes like 5 seconds longer.
Isaac Overcast
Mar 08 2016 03:37
No harm, no foul. Literally nobody will notice.
This is one of the tools you reviewed recently, yes? I haven't read the paper yet, is it worthy?
Deren Eaton
Mar 08 2016 15:41

I'm looking at the args to smalt:

... -f sam -n 2 -x -c 0.85

it looks like these do the following:
-n = threads
-x = use exhaustive search
-c = minimum coverage of kmers to the query seq

Would we get better matches if we messed with these flags?
-m = minscor. default is <wordlen> + <stepsize> - 1.
-y = minid. This seems like the closest to vsearch clust_threshold.

Because at the moment it seems like we are missing ref matches for some sequences and not others in the sim data, even when they differ by only one base.
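
A minimal sketch of what trying -y could look like when building the option string above. The helper `smalt_map_options` and its parameter names are hypothetical (not ipyrad code), the 0.9 passed for -y is just an illustrative value, and the elided front part of the command is left out:

```python
def smalt_map_options(threads=2, mincover=0.85, minid=None):
    """Build the option list discussed above (hypothetical helper).

    threads  -> -n
    mincover -> -c (minimum coverage of k-mers to the query seq)
    minid    -> -y (identity threshold; only added if given)
    """
    opts = ["-f", "sam", "-n", str(threads), "-x", "-c", str(mincover)]
    if minid is not None:
        # -y is the flag that looks closest to vsearch clust_threshold
        opts += ["-y", str(minid)]
    return opts


# Current flags, then the same flags with -y added for comparison.
print(" ".join(smalt_map_options()))
print(" ".join(smalt_map_options(minid=0.9)))
```

Keeping -y optional makes it easy to run the sim data both ways and see whether the one-base-off reads start matching.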

The GlbPSs software tries to better assemble really messy paired-gbs data where the reads overlap. Which we already do, obviously. It's not a very cohesive piece of software, more of a collection of Perl scripts if I remember right.
Isaac Overcast
Mar 08 2016 15:46
-y looks good, seems worth a shot.
Deren Eaton
Mar 08 2016 17:52
Should we just make using MPI the default? I mean, why bother having the --MPI flag at all?
There is a lab here at Yale doing tsetse fly research who want to analyze pairddrad data with a reference genome. I told them we're close, but still working on it.
Deren Eaton
Mar 08 2016 18:12
Are you free this weekend? Maybe we could meet up Sat for a tiny hackathon so I can get caught up on refmapping and we can try to squash these last bugs.