These are chat archives for dereneaton/ipyrad

11th
Dec 2015
Isaac Overcast
@isaacovercast
Dec 11 2015 19:08
That's a juicy speedup. What loop was it? What file? I want to check out what you did.
Also, this was pretty fucked, cost me half the day today:
dereneaton/ipyrad#26
Non-deterministic behavior. I had to cycle ipcluster and kill all my notebooks and it "fixed" this problem. Something to keep an eye on.
Deren Eaton
@dereneaton
Dec 11 2015 19:41
Pulled in changes, now can't get the CLI to run. Tried a new conda build...
Traceback (most recent call last):
  File "/home/deren/anaconda/bin/ipyrad", line 5, in <module>
    from pkg_resources import load_entry_point
  File "/home/deren/anaconda/lib/python2.7/site-packages/setuptools-18.5-py2.7.egg/pkg_resources/__init__.py", line 3095, in <module>
  File "/home/deren/anaconda/lib/python2.7/site-packages/setuptools-18.5-py2.7.egg/pkg_resources/__init__.py", line 3081, in _call_aside
  File "/home/deren/anaconda/lib/python2.7/site-packages/setuptools-18.5-py2.7.egg/pkg_resources/__init__.py", line 3108, in _initialize_master_working_set
  File "/home/deren/anaconda/lib/python2.7/site-packages/setuptools-18.5-py2.7.egg/pkg_resources/__init__.py", line 660, in _build_master
  File "/home/deren/anaconda/lib/python2.7/site-packages/setuptools-18.5-py2.7.egg/pkg_resources/__init__.py", line 673, in _build_from_requirements
  File "/home/deren/anaconda/lib/python2.7/site-packages/setuptools-18.5-py2.7.egg/pkg_resources/__init__.py", line 846, in resolve
pkg_resources.DistributionNotFound: The 'ipyrad==0.0.66' distribution was not found and is required by the application
I feel like I've seen this before... but forgot what to do.
Deren Eaton
@dereneaton
Dec 11 2015 20:00
fixed... somehow
Deren Eaton
@dereneaton
Dec 11 2015 20:22
The biggest speed-up was in creating bootstrap data sets. Before I had a terrible list comprehension that was trying to build a thousand new sampled data sets at once. Now I have an initialized np.array of the right size and I fill it one row at a time by sampling random rows from the real data set. The data are int arrays, and iterating and summing numerical arrays is way fast, and can be sped up even more with numba.jit. It's in ipyrad.analysis.dstat.py. Not quite ready for running yet, I'm still testing individual functions.
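For anyone reading along, here is a minimal sketch of the preallocate-and-fill idea described above. It is not the code in dstat.py: the function name `bootstrap_replicates`, its arguments, and the toy data are assumptions made for illustration, and it fills one whole replicate per iteration rather than a single row.

```python
import numpy as np


def bootstrap_replicates(data, nboots, seed=None):
    """Return an (nboots, nrows, ncols) array of resampled copies of `data`.

    Hypothetical helper for illustration; not part of ipyrad.
    """
    rng = np.random.RandomState(seed)
    nrows = data.shape[0]
    # Allocate the full output once instead of building a list of arrays.
    boots = np.empty((nboots,) + data.shape, dtype=data.dtype)
    for idx in range(nboots):
        # Sample row indices with replacement, then copy those rows in.
        rows = rng.randint(0, nrows, size=nrows)
        boots[idx] = data[rows]
    return boots


# Toy int array standing in for the real data set (made-up values).
data = np.arange(12, dtype=np.int64).reshape(4, 3)
boots = bootstrap_replicates(data, nboots=1000, seed=42)
```

Because the output is a single int array, summing over replicates afterwards (e.g. `boots.sum(axis=1)`) stays inside numpy instead of looping over Python lists.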
Deren Eaton
@dereneaton
Dec 11 2015 22:33
A thought on the API: I think I want to change set_params and get_params to params_set and params_get. That way params{tab} would bring up the three most likely funcs (params_set, params_get, and paramsdict)
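A rough sketch of why the shared prefix helps, using a toy class rather than the real Assembly object. The class body and the `clust_threshold` key are placeholders assumed for the example; only the three names params_set, params_get, and paramsdict come from the proposal above.

```python
class Assembly(object):
    """Toy stand-in for an Assembly object; not the real ipyrad class."""

    def __init__(self, name):
        self.name = name
        # Placeholder parameter, assumed for the example.
        self.paramsdict = {"clust_threshold": 0.85}

    def params_get(self, key):
        """Return the current value of one parameter."""
        return self.paramsdict[key]

    def params_set(self, key, value):
        """Set one parameter to a new value."""
        self.paramsdict[key] = value


data = Assembly("test")
# Roughly what tab completion on `data.params` would offer.
completions = sorted(n for n in dir(data) if n.startswith("params"))
print(completions)  # ['params_get', 'params_set', 'paramsdict']
```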
Deren Eaton
@dereneaton
Dec 11 2015 22:48
I'm thinking about what attributes will be available from ip{tab} and data{tab}, and cleaning them up a bit by hiding some functions and attributes. In the case of the binaries I moved them into an ObjDict called bins, so they will be called as e.g., data.bins.muscle.
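A minimal sketch of what an ObjDict-style container could look like, assuming it is simply a dict with attribute access. The real class may differ, and the binary paths and the second entry below are made up for the example.

```python
class ObjDict(dict):
    """dict whose keys can also be read and written as attributes."""

    def __getattr__(self, name):
        # Only called when normal attribute lookup fails.
        try:
            return self[name]
        except KeyError:
            raise AttributeError(name)

    def __setattr__(self, name, value):
        self[name] = value

    def __dir__(self):
        # Expose keys so that tab completion can list them.
        return sorted(set(dir(type(self))) | set(self.keys()))


class Data(object):
    """Toy stand-in for an Assembly object; not the real ipyrad class."""

    def __init__(self):
        # Hypothetical paths, for illustration only.
        self.bins = ObjDict(muscle="/usr/bin/muscle")


data = Data()
data.bins.otherbin = "/usr/bin/otherbin"  # hypothetical second binary
muscle_path = data.bins.muscle
```

With this layout `data.bins.` and tab lists only the binaries, and they no longer clutter the top-level `data.` completions.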
I think I'll create a whole new module called load that will center around functions to load variously formatted data files into Assembly objects with linked Samples and data. So load_assembly() will load our assembly pickles, but you could imagine functions to load_loci, load_vcf, etc. Maybe the bloated link_fastqs() function would belong in here too.
These would be focused on downstream analyses, and provide a lot of backwards compatibility for people with existing pyrad data sets to jump into using ipyrad popgen stuff, etc.
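As a rough sketch of the simplest member of such a module, assuming assemblies are stored as ordinary pickle files. The signatures are guesses, `save_assembly` is a hypothetical counterpart added only to make the example complete, and a plain dict stands in for an Assembly object.

```python
import pickle


def save_assembly(assembly, path):
    """Write an object to `path` as a pickle (hypothetical helper)."""
    with open(path, "wb") as out:
        pickle.dump(assembly, out, protocol=pickle.HIGHEST_PROTOCOL)


def load_assembly(path):
    """Read back an object previously written with save_assembly()."""
    with open(path, "rb") as infile:
        return pickle.load(infile)
```

Functions like load_loci or load_vcf would follow the same pattern: take a file path, parse it, and return an Assembly with its Samples linked.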
Isaac Overcast
@isaacovercast
Dec 11 2015 23:13
That all makes sense, i tend to prefer get* and set, but it sounds like you have a good reason to go this other direction, so i'm with you. Also, i like the data.bins. idea, makes sense.
disregard errant emphasis :sunglasses:
Deren Eaton
@dereneaton
Dec 11 2015 23:14
I'll mull it over. Not completely sold on get, set yet
Isaac Overcast
@isaacovercast
Dec 11 2015 23:15
I am within striking distance on the ref mapping. been crankin on it today. so close i can taste it. def have something functioning this weekend.
Deren Eaton
@dereneaton
Dec 11 2015 23:16
sweeeeet
I'll try to finish step6 and then we can run some test data sets pretty much to the end
Isaac Overcast
@isaacovercast
Dec 11 2015 23:19
After refmapping gets working for rad-seq, i'm considering merging the refseq branch into master. I have tested it with the 'denovo' flag and it works quite nicely.
Then i can fork a branch for gbs/ddrad/etc
Isaac Overcast
@isaacovercast
Dec 11 2015 23:31
Oh, fuck yeah, it's totally working.