These are chat archives for dereneaton/ipyrad

16th
May 2016
Isaac Overcast
@isaacovercast
May 16 2016 00:05
I see, those both make sense. When are you leaving for the field? June something?
Deren Eaton
@dereneaton
May 16 2016 00:24
ha, no. I'm leaving Tuesday.
we might need to revise our timeline.
Isaac Overcast
@isaacovercast
May 16 2016 00:25
Oh whoa dude that's soon. Time flies... Yah, agreed. So you get back June something then?
Deren Eaton
@dereneaton
May 16 2016 00:25
June 8.
what do you think we're lacking before we can release a true Beta?
there's a bunch of things we want to add, but not all of them need to be ready for it to be functional.
Isaac Overcast
@isaacovercast
May 16 2016 00:27
I mean, nothing really. I'm still having an issue pulling the reference sequence into each locus for muscle aligning for PE, but we can just not include that as a beta feature.
Deren Eaton
@dereneaton
May 16 2016 00:28
yeah, for the first real release I really want to have the ref method carry the location data all the way through, and that will take some serious work.
but we could probably release beta for denovo assemblies.
Isaac Overcast
@isaacovercast
May 16 2016 00:29
You mean in the label for each read? That's actually working, last time I checked.
Deren Eaton
@dereneaton
May 16 2016 00:29
oh, cool. Yeah, well then we just need to parse the labels so we can include the location info in the output files (e.g., loci and vcf).
Isaac Overcast
@isaacovercast
May 16 2016 00:30
oh right, yeah that'll be a little work.
What about testing on real data? I haven't rerun the GBS through the whole pipeline in a while. It might be comforting to re-run some real data start to finish; it would be nice to feel confident at least with the datatypes we have access to.
Deren Eaton
@dereneaton
May 16 2016 00:34
yeah, for sure. I've had good success with RAD, and just now with pairddrad, though there are still some small step7 bugs for pairddrad.
I'm working on those now.
Isaac Overcast
@isaacovercast
May 16 2016 00:39
Nasty bug in data.stats. Assembly line 227:
Deren Eaton
@dereneaton
May 16 2016 00:40
what happens?
ah. How do you get NaNs?
Isaac Overcast
@isaacovercast
May 16 2016 00:41
ValueError: Cannot convert NA to integer
Deren Eaton
@dereneaton
May 16 2016 00:41
I added that line b/c the summary was sometimes printing all floats and it looked ugly.
there's probably some more clever pandas fix for that problem though.
Isaac Overcast
@isaacovercast
May 16 2016 00:42
By bad luck, mostly. I was testing step3 PE reference, and one of the samples failed and got NaN for all its stats for that step. Not sure why.
Maybe it would be better to leave them as floats and just try to format them nicer? NaN is a valid float value.
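A minimal sketch of the "leave them as floats and format them nicer" idea, using only the Python standard library. The helper name `format_stat` is hypothetical and is not part of ipyrad; it only illustrates why casting NaN to an integer fails and how a display-only formatter sidesteps that.

```python
import math


def format_stat(value):
    """Render a numeric stat for display without casting NaN to int.

    Hypothetical helper for illustration; not ipyrad's actual code.
    """
    value = float(value)
    if math.isnan(value):
        # NaN is a valid float but has no integer equivalent, so show it as text.
        return "NaN"
    if value.is_integer():
        # Whole-number floats such as 1500.0 print as "1500".
        return str(int(value))
    # Anything else keeps two decimal places.
    return f"{value:.2f}"


# Casting NaN to an integer is the kind of operation that raises ValueError.
try:
    int(float("nan"))
    cast_failed = False
except ValueError:
    cast_failed = True

# A failed sample keeps NaN in its row, and the summary still prints cleanly.
stats = [1500.0, float("nan"), 0.25]
formatted = [format_stat(v) for v in stats]
print(cast_failed, formatted)
```

With pandas, a function like this could be applied per column (for example with `Series.map`) only at print time, so the stored stats stay as floats.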
Deren Eaton
@dereneaton
May 16 2016 00:44
ok.
Isaac Overcast
@isaacovercast
May 16 2016 00:44
I'm gonna go grab a sandwich, I'll be back in a bit...
Deren Eaton
@dereneaton
May 16 2016 00:44
ok
Isaac Overcast
@isaacovercast
May 16 2016 01:33
Did you see the advice I gave Cycadales about increasing max low quality sites vs decreasing the phredQ cutoff? Do you have any thoughts on which is a better strategy? I tested his data with a converted phredQ score of 13 (~95% confidence) and got lots more unfiltered reads. Downstream, do you think it's better to have more N's or more low confidence base calls?
Deren Eaton
@dereneaton
May 16 2016 01:35
yeah, I agree with you. I think it's probably better to have more low conf calls downstream. Weird that the phred scores were so low, though.
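For reference, the trade-off discussed here can be sketched in a few lines. A Phred score Q corresponds to an error probability of 10^(-Q/10), so Q = 13 is roughly 95% confidence. The function names, the ASCII offset of 33, and the mask-then-filter logic below are assumptions for illustration, not ipyrad's actual filtering code.

```python
def phred_to_error_prob(q):
    """Convert a Phred quality score to the probability of a wrong base call."""
    return 10 ** (-q / 10)


def filter_read(seq, qual, min_q, max_low_qual_bases, offset=33):
    """Mask bases scoring below min_q with N.

    Returns the masked read, or None when more than max_low_qual_bases
    bases were masked (the read is filtered out). Illustrative only.
    """
    masked = []
    n_low = 0
    for base, char in zip(seq, qual):
        # Assumes qualities are encoded as ASCII characters with the given offset.
        if ord(char) - offset < min_q:
            masked.append("N")
            n_low += 1
        else:
            masked.append(base)
    if n_low > max_low_qual_bases:
        return None
    return "".join(masked)


# Toy read: 'I' encodes Q40 and '.' encodes Q13 at offset 33.
seq, qual = "ACGTAC", "II..I."

strict = filter_read(seq, qual, min_q=20, max_low_qual_bases=2)    # read dropped
more_ns = filter_read(seq, qual, min_q=20, max_low_qual_bases=5)   # kept, with N's
lower_q = filter_read(seq, qual, min_q=13, max_low_qual_bases=2)   # kept, low-confidence calls
print(strict, more_ns, lower_q)
```

Raising the allowed number of low quality bases keeps the read but fills it with N's, while lowering the cutoff keeps the original, less certain base calls.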
Deren Eaton
@dereneaton
May 16 2016 02:05
I ran into the NaN bug, it is nasty.
Deren Eaton
@dereneaton
May 16 2016 05:44
OK, I think the bug in step7 for pairddrad data is now fixed. Just worked on all of the sim data sets, and on 4 samples from Ron's empirical data set.
Cycadales
@Cycadales
May 16 2016 06:33
@dereneaton @isaacovercast what do you guys think of these settings that I am using for 150 PE read data: max_low_qual_bases = 20 and phredQ score = 25? I am running this on a work computer with a 24-core CPU and 96 GB RAM. What would you guys suggest?
Isaac Overcast
@isaacovercast
May 16 2016 14:41
@Cycadales Yeah, I would dial back the max_low_qual_bases. I sent you an email with more explicit suggestions.
@dereneaton Step 6 is working on the gbs data again. It's soooo fast, it's insane. Destroyed the data in like an hour and a half. Last time I checked performance on step 6 it took 5 hours. :rocket:
Deren Eaton
@dereneaton
May 16 2016 15:47
niiiice.
Deren Eaton
@dereneaton
May 16 2016 16:53
@isaacovercast can you push a mac update?
Deren Eaton
@dereneaton
May 16 2016 17:48
fyi, I should note that cutters with an ambiguous base are not yet supported for demultiplexing. And step1 is really poorly parallelized at the moment, at least since I last made some changes. It's on my TODOs.
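One possible way to handle an ambiguous cutter, sketched here as an assumption rather than as ipyrad's approach, is to expand the IUPAC ambiguity codes into every concrete sequence and match reads against each of them. `expand_cutter` is a hypothetical name, and "CWGC" is just an example cutter with one ambiguous base.

```python
from itertools import product

# Standard IUPAC nucleotide ambiguity codes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def expand_cutter(cutter):
    """Return every unambiguous sequence an IUPAC-coded cutter can stand for.

    Hypothetical helper for illustration; not part of ipyrad.
    """
    choices = [IUPAC[base] for base in cutter.upper()]
    return ["".join(combo) for combo in product(*choices)]


variants = expand_cutter("CWGC")
print(variants)
```

A demultiplexer could then accept a read if the bases following the barcode match any of the expanded variants. The number of variants grows multiplicatively with each ambiguous position, which is fine for short cut sites.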