These are chat archives for dereneaton/ipyrad

8th
Dec 2015
Deren Eaton
@dereneaton
Dec 08 2015 00:16
More fixes to steps 3 and 5 just merged to master
We're getting pretty close to having this thing running. Unfortunately I'm gonna be pretty busy the next week or two. :books: My input is gonna slow down til then.
Isaac Overcast
@isaacovercast
Dec 08 2015 01:56
I know! I'm psyched!
Also, turns out converting mpileup to the format we need is actually just as annoying as I thought it would be, so it set me back a day (today mostly). Think I have it licked, just debugging the last bits.
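The chat doesn't say what "the format we need" is, so the following is only a minimal sketch of the part of that conversion that is well defined: tokenizing the read-bases column of single-sample `samtools mpileup` output into per-base counts. It assumes the column conventions documented for samtools mpileup (`.`/`,` = reference match on forward/reverse strand, `^` plus one mapping-quality character = read start, `$` = read end, `+N`/`-N` followed by N bases = indel, `*` or `#` = deleted base, `>`/`<` = reference skip). The function names `parse_pileup_bases` and `parse_mpileup_line` are hypothetical, not ipyrad code.

```python
import re
from collections import Counter


def parse_pileup_bases(ref, bases):
    """Tally bases in one mpileup read-bases column (hypothetical helper).

    Matches ('.' and ',') are counted as the reference base; deletions
    ('*', or '#' with --reverse-del) are counted under '*'; indel
    sequences, read-start/end markers and reference skips are skipped.
    """
    counts = Counter()
    ref = ref.upper()
    i = 0
    while i < len(bases):
        c = bases[i]
        if c == "^":
            # '^' is always followed by one mapping-quality character
            i += 2
            continue
        if c == "$":
            i += 1
            continue
        if c in "+-":
            # indel: '+' or '-', then a length, then that many bases
            m = re.match(r"\d+", bases[i + 1:])
            if m is None:
                raise ValueError("malformed indel at position {}".format(i))
            i += 1 + len(m.group()) + int(m.group())
            continue
        if c in ".,":
            counts[ref] += 1
        elif c.upper() in "ACGTN":
            counts[c.upper()] += 1
        elif c in "*#":
            counts["*"] += 1
        # '>' and '<' (reference skips) fall through and are ignored
        i += 1
    return counts


def parse_mpileup_line(line):
    """Split one single-sample mpileup line and count its bases."""
    chrom, pos, ref, depth, bases, quals = line.rstrip("\n").split("\t")[:6]
    return chrom, int(pos), ref, parse_pileup_bases(ref, bases)
```

For example, `parse_pileup_bases("A", "..,,^I.$TtG+2ACa-1c*")` counts six `A` (five matches plus one lowercase mismatch), two `T`, one `G` and one deletion, while the `+2AC` and `-1c` indel sequences are skipped rather than counted.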
Also, I was thinking about pulling all refseq mapping code out of cluster_within and putting it into its own python file and making function calls into it. There's just a lot of code and it's making cluster_within pretty bulky, plus I think it's good practice to isolate functionality in this way. I'm probably gonna do that unless you have strong feelings otherwise.
Deren Eaton
@dereneaton
Dec 08 2015 02:27
Yeah, good idea. I've been playing around with working ipyparallel into the new way we want it. Getting close.
Isaac Overcast
@isaacovercast
Dec 08 2015 04:25
Question: vsearch derep and clustering have to use the '--strand both' flag for GBS because you get a mix of forward and reverse reads from hits at any given cut site, if I understand correctly. I'm working with a similar problem in refseq, reads that align to the reverse strand, and am contemplating taking the reverse complement of these for the final output. I assume this is what is happening here but wanted to verify. Align everything as if it was on the forward strand?
Deren Eaton
@dereneaton
Dec 08 2015 17:11
yeah, that's right.
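Orienting everything to the forward strand comes down to reverse-complementing reads known to come from the reverse strand; vsearch's `--strand both` handles the same situation by also comparing each query's reverse complement. A minimal sketch of that operation, assuming plain DNA strings with IUPAC ambiguity codes; `revcomp` and `to_forward` are hypothetical names, not functions from ipyrad:

```python
# Complement table for IUPAC nucleotide codes; ambiguity codes map to the
# code for the complementary set (R=A/G <-> Y=C/T, K=G/T <-> M=A/C, B <-> V,
# D <-> H), while S, W and N are their own complements. Case is preserved.
_COMPLEMENT = str.maketrans("ACGTRYKMSWBDHVNacgtrykmswbdhvn-",
                            "TGCAYRMKSWVHDBNtgcayrmkswvhdbn-")


def revcomp(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def to_forward(seq, on_reverse_strand):
    """Orient a read so it reads along the forward strand."""
    return revcomp(seq) if on_reverse_strand else seq
```

For example, `revcomp("AACG")` returns `"CGTT"`. One caveat when the reads come out of a SAM/BAM file: the SAM format already stores SEQ for reverse-strand alignments (FLAG 0x10) reverse-complemented onto the forward reference strand, so applying this step again there would flip them back.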
Deren Eaton
@dereneaton
Dec 08 2015 18:18
Here's the new ipyparallel setup:
Deren Eaton
@dereneaton
Dec 08 2015 18:23

CLI:
main launches ipcluster with a unique cluster-id (could do profiles too, but doesn't yet). ipyrad.core.Assembly imports this id and sets it as a hidden attribute of the Assembly object:
data._ipclusterid = 'ipyrad-[pid]'
data._ipprofile = ''
When the ipyparallel.Client() is launched it uses these two attributes so that it connects to and later kills the correct controller.
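The CLI flow above can be sketched as follows. This is an illustration under assumptions, not ipyrad's code: `Assembly` here is a hypothetical stand-in that models only the two hidden attributes, and `make_cluster_id`, `ipcluster_cmd` and `client_kwargs` are hypothetical helpers. The `ipcluster` flags (`--n`, `--cluster-id`, `--profile`, `--daemonize`) and the `cluster_id`/`profile` keyword arguments of `ipyparallel.Client` are the ones ipyparallel provides, but check `ipcluster start --help` for the installed version.

```python
import os


class Assembly:
    """Hypothetical stand-in for ipyrad.core.Assembly; only the two hidden
    attributes from the description above are modeled."""

    def __init__(self, name, ipclusterid="", ipprofile=""):
        self.name = name
        self._ipclusterid = ipclusterid
        self._ipprofile = ipprofile


def make_cluster_id(pid=None):
    """Return a unique cluster id of the form 'ipyrad-[pid]'."""
    if pid is None:
        pid = os.getpid()
    return "ipyrad-{}".format(pid)


def ipcluster_cmd(action, cluster_id, ncores=None, profile=""):
    """Build the argument list for `ipcluster start` or `ipcluster stop`.

    Using the same cluster id for both ensures stop kills the controller
    that this process started and no other.
    """
    cmd = ["ipcluster", action, "--cluster-id={}".format(cluster_id)]
    if action == "start":
        if ncores is not None:
            cmd.append("--n={}".format(ncores))
        cmd.append("--daemonize")
    if profile:
        cmd.append("--profile={}".format(profile))
    return cmd


def client_kwargs(data):
    """Keyword arguments for ipyparallel.Client from the hidden attributes."""
    kwargs = {}
    if data._ipclusterid:
        kwargs["cluster_id"] = data._ipclusterid
    if data._ipprofile:
        kwargs["profile"] = data._ipprofile
    return kwargs
```

In this sketch, main would run `ipcluster_cmd("start", data._ipclusterid, ncores=4)` through `subprocess`, connect with `ipyparallel.Client(**client_kwargs(data))`, and finish with `ipcluster_cmd("stop", data._ipclusterid)`. Keeping the command building separate from `subprocess` makes the id handling testable without a running cluster.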

API:
ipcluster should be started outside of ipyrad. If the user starts it with a non-default cluster-id or profile, the user can set these as attributes of the Assembly object so that the Client attaches to the correct cluster. If ipyparallel is not running or can't find the profile, etc., it should return an informative error message.
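For the API case, a minimal sketch of attaching to a user-started cluster and turning connection failures into an informative error. `get_client` is a hypothetical function and `Assembly` a hypothetical stand-in. Because ipyparallel can fail in more than one way (a missing connection file, a hub timeout), this sketch catches broadly and re-raises with the profile and cluster id it tried. The `client_factory` parameter is an assumption added only so the error path can be exercised without a running cluster.

```python
class Assembly:
    """Hypothetical stand-in for ipyrad.core.Assembly."""

    def __init__(self, name):
        self.name = name
        self._ipclusterid = ""  # set by the user if ipcluster used --cluster-id
        self._ipprofile = ""    # set by the user if ipcluster used --profile


def get_client(data, client_factory=None):
    """Attach to an already running ipcluster using the Assembly's attributes."""
    if client_factory is None:
        import ipyparallel  # imported lazily so the module loads without it
        client_factory = ipyparallel.Client

    kwargs = {}
    if data._ipclusterid:
        kwargs["cluster_id"] = data._ipclusterid
    if data._ipprofile:
        kwargs["profile"] = data._ipprofile

    try:
        return client_factory(**kwargs)
    except Exception as err:
        # Broad on purpose: the failure type depends on what went wrong.
        raise RuntimeError(
            "Could not connect to ipcluster (profile={!r}, cluster_id={!r}). "
            "Start it first, e.g. `ipcluster start --n=4`, and if you used a "
            "non-default --profile or --cluster-id, set data._ipprofile or "
            "data._ipclusterid to match. Original error: {}".format(
                data._ipprofile or "default", data._ipclusterid, err)
        ) from err
```

A user who started `ipcluster start --n=4 --profile=mpi` would set `data._ipprofile = "mpi"` before calling `get_client(data)`.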