These are chat archives for dereneaton/ipyrad

22nd
Feb 2016
Deren Eaton
@dereneaton
Feb 22 2016 17:29
I added some cool stuff last night. The -r flag prints a much cleaner output now. Assembly objects now have two separate stats dicts: data.stats_dfs holds pandas data frames for each step, and data.stats_files holds the paths to the saved stats files as strings (quick example below).
I've also got .phy, .snp, .usnp, and .str outputs building.
and a cleaner results file for step 7, with the filters applied correctly now, I believe.
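For example, from the API it looks roughly like this (rough sketch, untested; the per-step attribute name "s3" is a guess at the naming pattern, and `data` is assumed to be an Assembly that has finished step 3):
# the "s3" key is an assumption about the per-step naming
print(data.stats_dfs.s3)     # pandas DataFrame of step 3 stats
print(data.stats_files.s3)   # path to the saved step 3 stats file, as a string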
Isaac Overcast
@isaacovercast
Feb 22 2016 19:15
tight!
hmm, ok, so I think we need a new strategy for max_fragment_length. Right now it's pegged at 150 for single-end, but single-end gbs reads that merge can be longer than that. Here are counts of read lengths in the clustS file for one individual (see the sketch after the list):
   2574 150
   2577 151
   2537 152
   2879 153
   2628 154
   2388 155
    360 156
     29 157
     18 158
      5 159
     28 160
      7 161
     21 162
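Counts like these can be pulled out with something along these lines (sketch; the filename is a placeholder, and it assumes fasta-style records with one sequence per line and "//" cluster separators):
import gzip
from collections import Counter

# sketch: tally sequence lengths in a clustS file (filename is a placeholder)
lengths = Counter()
with gzip.open("sample_clustS.gz", "rt") as infile:
    for line in infile:
        line = line.strip()
        if line and line != "//" and not line.startswith(">"):
            lengths[len(line.replace("-", ""))] += 1   # drop alignment gaps

for length in sorted(lengths):
    print(lengths[length], length)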
Isaac Overcast
@isaacovercast
Feb 22 2016 19:35
Love the new -r format
Isaac Overcast
@isaacovercast
Feb 22 2016 19:41
Not "gbs reads that merge", but you know what i mean, a read on the reverse strand that partially overlaps.
Deren Eaton
@dereneaton
Feb 22 2016 20:22
hmm, setting too large a number cranks up the memory load a bit, since we build those large empty arrays for filling. But we could choose slightly larger values for gbs. Does it crash and burn on the long sequences?
Ideally we would write it so that if a read exceeds the maxlen we just trim the read to the maxlen and keep going. The expected data loss from this should be super minimal.
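Something like this at the point where reads get written into the arrays (a sketch; the names are placeholders, not the actual code):
def trim_to_maxlen(seq, maxlen):
    # truncate overlong reads instead of crashing on them;
    # anything past maxlen is simply discarded
    return seq[:maxlen]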
Deren Eaton
@dereneaton
Feb 22 2016 21:13
whoops, the downside to my "upgrading" the names of the stats dicts is that all older JSON loads are broken
hopefully no major changes to core Assembly or Sample names for a while.
Deren Eaton
@dereneaton
Feb 22 2016 21:23
OK, I made a quick fix in the API to rescue an old assembly. In an IPython shell run this to convert it to the new format:
import ipyrad as ip
data = ip.load_old_json("old_assembly.json")
data.save()
available in 0.1.58
Isaac Overcast
@isaacovercast
Feb 22 2016 22:09
nice.
Sequences that exceed max_frag_len cause step 4 to crash and burn. I added a chunk of code to cluster_within.sample_cleanup() that actually calculates max_frag_len from the data.
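Roughly this idea, as a sketch (not necessarily the literal code in sample_cleanup()):
# derive max_frag_len from the observed read lengths instead of assuming 150;
# `lengths` here is the Counter from the earlier snippet, 150 is the old default
max_frag_len = max(lengths) if lengths else 150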
Deren Eaton
@dereneaton
Feb 22 2016 22:13
nice
I noticed that the conda version is still at 0.1.37
do you think we should have versioner.py push all tags to conda?
It kinda seems like we should
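Something like this in versioner.py, I'd guess (sketch; the recipe path and a logged-in anaconda client are assumptions):
import subprocess

# sketch: build the conda package for the new tag and push it to anaconda.org;
# assumes conda-build and the anaconda-client are installed and authenticated
pkg = subprocess.check_output(
    ["conda", "build", "--output", "conda.recipe/"]).decode().strip()
subprocess.check_call(["conda", "build", "conda.recipe/"])
subprocess.check_call(["anaconda", "upload", pkg])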
Isaac Overcast
@isaacovercast
Feb 22 2016 22:14
Dude, I was totally thinking that
Deren Eaton
@dereneaton
Feb 22 2016 22:15
do you think you could do that?
Isaac Overcast
@isaacovercast
Feb 22 2016 22:15
Totally, it'll take me two seconds.
Isaac Overcast
@isaacovercast
Feb 22 2016 22:23
Done (0.1.59). You still have to remember to go to anaconda.org to set it as the default. I'll try to automate that too, cuz that's kinda annoying.
Deren Eaton
@dereneaton
Feb 22 2016 22:27
strange, seems like it should default to the newest one.
Isaac Overcast
@isaacovercast
Feb 22 2016 22:27
Yep, it sure does seem like it should; it's weird that it doesn't.
Deren Eaton
@dereneaton
Feb 22 2016 23:15
Is the conda install working for you...? I'm getting some weird stuff going on.
Deren Eaton
@dereneaton
Feb 22 2016 23:43
I might have broken the conda recipe when I was messing with stuff so that conda would build on rtd. I don't think I changed the conda.recipe/ stuff, but I did add some requirements files...