These are chat archives for dereneaton/ipyrad

23rd
Jan 2016
Isaac Overcast
@isaacovercast
Jan 23 2016 00:01
I had a good idea of how to hack around the numba issue. I think i can do some magic in setup.py. I'll try it out, it'd be easier if pip just fuckin worked.
Deren Eaton
@dereneaton
Jan 23 2016 00:06
no kidding. Thanks for going at this tho. A big milestone.
Isaac Overcast
@isaacovercast
Jan 23 2016 01:17
oof. Here's my best idea for supporting pip: Inside setup.py install miniconda and then do a conda install of llvmlite. Best i can do. If you don't use the precompiled binaries of llvm in conda you have to build them yourself which is a bitch.
It's doable but it'd be a little hackish. Have to detect OS, dl the right miniconda installer, run it, then the user would have to click through a bunch of shit, miniconda installer is... verbose.
Lemme think about it a bit more...
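A minimal sketch of the "detect OS, pick the right installer" step described above. The installer file names follow Miniconda's "latest" naming scheme and are listed here as assumptions; `pick_installer` is a hypothetical helper, not part of ipyrad's setup.py.

```python
import platform

# Assumed installer file names (Miniconda "latest" naming scheme).
_INSTALLERS = {
    "Linux": "Miniconda3-latest-Linux-x86_64.sh",
    "Darwin": "Miniconda3-latest-MacOSX-x86_64.sh",
    "Windows": "Miniconda3-latest-Windows-x86_64.exe",
}


def pick_installer(system=None):
    """Return the installer file name for an OS name (default: this machine)."""
    # platform.system() reports "Linux", "Darwin" or "Windows".
    system = system or platform.system()
    try:
        return _INSTALLERS[system]
    except KeyError:
        raise RuntimeError("unsupported platform: " + system)
```

The shell installers also accept `-b` (batch mode) and `-p PREFIX`, which may avoid most of the click-through.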
Deren Eaton
@dereneaton
Jan 23 2016 01:20
whoa, that's tricky. Do you think it's legit? I suppose we could use this to sneak in mpi4py and hdf5 headers too
Isaac Overcast
@isaacovercast
Jan 23 2016 01:41
Lol! Not sure if it's legit, it seems kind of sneaky but it'd work. Make conda a "deep" requirement. I'm still hacking on it.
Deren Eaton
@dereneaton
Jan 23 2016 01:56
looks like pandas can pretty easily read and write to and from json. I think saving objects as json should be a longer term goal. It's already up as a ticket #21
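A small sketch of the pandas JSON round trip mentioned above. The stats table and its column names are made up for illustration and are not ipyrad's actual object layout.

```python
import io

import pandas as pd

# Hypothetical per-sample stats table; names and values are illustrative only.
stats = pd.DataFrame(
    {"reads_raw": [1000, 1200], "clusters": [250, 300]},
    index=["sample_A", "sample_B"],
)

# Serialize to a JSON string (pass a file path instead to write to disk).
text = stats.to_json(orient="split")

# Read it back; wrapping the string in StringIO works across pandas versions.
restored = pd.read_json(io.StringIO(text), orient="split")
```

`orient="split"` stores index, columns and data separately, so row and column labels come back unchanged.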
Deren Eaton
@dereneaton
Jan 23 2016 07:10
is a conda install working for you currently? I tried it out and got this error:
```
deren@oud:~/Documents/ipyrad$ conda install ipyrad
Fetching package metadata: ....
Error: No packages found in current linux-64 channels matching: ipyrad
```
Deren Eaton
@dereneaton
Jan 23 2016 20:34
fyi: pip just released its 8.0 update. Maybe there's something new in there to help.
I've got single and paired data finishing through step 6 for me in both the API and CLI.
it builds the hdf5 array with 4 data sets.
  1. 'catgs' : dims=(nloci, nsamples, maxlen, 4)
  2. 'filters' : dims=(nloci, 4)
  3. 'seqs' : dims=(nloci, nsamples, maxlen)
  4. 'edges' : dims=(nloci, 4)
catgs and seqs are full. Step 7 will fill the filters and edges arrays, and then apply them to seqs or catgs to get the final data set.
I still need to clean up some extra files that are left behind and figure out this chunking business, which will drastically reduce the disk space needed and allow much faster access to the data on disk. But to start I just wanted to get it running, which it seems to be doing now.
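For reference, a rough sketch of the four-dataset layout with chunking turned on, assuming h5py. The sizes, dtypes, chunk shape and file name are hypothetical and not taken from the ipyrad code.

```python
import h5py
import numpy as np

# Hypothetical sizes, chosen only for illustration.
nloci, nsamples, maxlen = 1000, 12, 150

# driver="core" with backing_store=False keeps the file in memory only.
with h5py.File("sketch.h5", "w", driver="core", backing_store=False) as io5:
    # Each chunk holds 100 whole loci, so reading a block of loci
    # touches only a few chunks; gzip shrinks the mostly-empty depth array.
    io5.create_dataset(
        "catgs", (nloci, nsamples, maxlen, 4), dtype=np.uint32,
        chunks=(100, nsamples, maxlen, 4), compression="gzip",
    )
    io5.create_dataset(
        "seqs", (nloci, nsamples, maxlen), dtype="S1",
        chunks=(100, nsamples, maxlen), compression="gzip",
    )
    # The two per-locus arrays are small enough to leave unchunked.
    io5.create_dataset("filters", (nloci, 4), dtype=np.bool_)
    io5.create_dataset("edges", (nloci, 4), dtype=np.uint16)

    shapes = {name: dset.shape for name, dset in io5.items()}
    catgs_chunks = io5["catgs"].chunks
```

Chunking along the locus axis fits access patterns that walk through loci in blocks; a different access pattern would call for a different chunk shape.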
Deren Eaton
@dereneaton
Jan 23 2016 20:39
The 'nnnn' splitter in the sequences is recorded as [0,0,0,0] depths in catgs.
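A toy illustration of that convention, assuming NumPy: the splitter sites carry all-zero depths, so they can be found again from the depth array alone. The sequence and the depth value of 5 are made up.

```python
import numpy as np

# Toy paired-end consensus: read 1, the 'nnnn' splitter, read 2.
seq = np.array(list("ACGTnnnnTTGA"))

# One row of four base depths per site (arbitrary depth of 5 everywhere).
catg = np.full((seq.size, 4), 5, dtype=np.uint32)

# Splitter sites are recorded as [0, 0, 0, 0].
catg[seq == "n"] = 0

# Sites whose total depth is zero mark the splitter.
zero_depth = np.flatnonzero(catg.sum(axis=1) == 0)
```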
Isaac Overcast
@isaacovercast
Jan 23 2016 21:29
Sweet!
Conda install is still in progress. I'll check out the new version of pip...
Isaac Overcast
@isaacovercast
Jan 23 2016 21:38
Did you test step6 on the sim_rad_R1 SE data? I'm trying it and it's erroring out inside the while loop in fill_superseqs(). If there's more than one seq from an individual in the chunk then the call to indices.remove(sidx) throws an index error. I'll try running it again from step1.
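A sketch of how that failure mode can arise, assuming `indices` is a plain Python list of sample rows not yet seen in the chunk. This is a guess at the pattern, not the actual `fill_superseqs()` code; `mark_seen` is a hypothetical helper. Note that `list.remove` raises `ValueError` when the value is already gone.

```python
def mark_seen(indices, hits):
    """Remove each hit from `indices`, tolerating repeated samples."""
    remaining = list(indices)
    for sidx in hits:
        # A second hit from the same sample would make an unguarded
        # remaining.remove(sidx) fail, so check membership first.
        if sidx in remaining:
            remaining.remove(sidx)
    return remaining


# Sample 2 contributes two seqs to the same chunk.
left = mark_seen([0, 1, 2, 3], hits=[2, 0, 2])
```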
Deren Eaton
@dereneaton
Jan 23 2016 21:41
lol, you're right. I made one last change and broke it right before I pushed it.
Deren Eaton
@dereneaton
Jan 23 2016 22:15
OK. fixed.
Deren Eaton
@dereneaton
Jan 23 2016 22:53
btw: Idaho workshop date is March 5-6. So I guess that's our earliest deadline. I think we should be able to have everything running with good docs and some cookbook recipes by then.
Deren Eaton
@dereneaton
Jan 23 2016 22:59
I've been thinking about whether there is a way to allow users to change Sample names within the API, i.e., change sample.name after step1 without breaking everything...
I'm finding the answer seems like no. It would be a pain in the ass.
but maybe... they can't change the self.samples.keys, tho. Trying it out.
Deren Eaton
@dereneaton
Jan 23 2016 23:13
no, nvm. Keeping track of sample.name possibly being different from the key is a mess. If we wanted to allow users to change names it would require writing a new function that replaced/reassigned dictkeys so that the key matched the new name.
or I suppose we just provide a cookbook recipe showing users how to update both.
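Such a recipe might look roughly like this. `Sample` here is a bare stand-in class and `rename_sample` is a hypothetical helper; the real objects carry more state (file paths, stats) that would also need checking.

```python
class Sample:
    """Stand-in for a sample object that stores its own name."""

    def __init__(self, name):
        self.name = name


def rename_sample(samples, old, new):
    """Re-key `samples` so the dict key and Sample.name stay in sync."""
    if new != old and new in samples:
        raise KeyError("sample name already in use: " + new)
    sample = samples.pop(old)  # raises KeyError if `old` is missing
    sample.name = new
    samples[new] = sample
    return sample


samples = {"1A_0": Sample("1A_0"), "1B_0": Sample("1B_0")}
rename_sample(samples, "1A_0", "pop1_ind0")
```

Updating both in one function means the key and the name can never drift apart halfway through.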
Isaac Overcast
@isaacovercast
Jan 23 2016 23:33
March 5/6 sounds doable.
woah, what's the use case for changing a sample name? That does sound tricky.
btw, I'm still seeing step6 die on the simrad SE data. same place, inside fill_superseqs() at line 436