These are chat archives for dereneaton/ipyrad

27th
Jan 2016
Deren Eaton
@dereneaton
Jan 27 2016 00:54
OK, I merged my h5 branch to master. The loci file can now be built completely from the h5 arrays created in step 6, with no need to read in the catclust.gz file. I think it's pretty fast too.
It doesn't handle PE data yet, but can with a small tweak.
Deren Eaton
@dereneaton
Jan 27 2016 01:11
it passed all three SE test data sets.
Isaac Overcast
@isaacovercast
Jan 27 2016 01:20
Sweet. I fixed the CLI to handle relative paths for workdir, including ./, though it still uses the directory name as the assembly name. Right now I'm chasing down a bug in refmap. It's a nasty intermittent thing that I've only seen once, on a very large dataset. I'm working on reproducing it, but large datasets take time to cluster, so it's kind of a waiting game.
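A rough sketch of the relative-path handling mentioned above. The helper names `resolve_workdir` and `default_assembly_name` are hypothetical and not taken from ipyrad. It shows how `./` and `~` can be turned into an absolute path, and why the assembly name ends up being the directory name when it is derived from that path.

```python
import os


def resolve_workdir(path):
    """Return an absolute, normalized version of a user-supplied directory.

    Hypothetical helper for illustration; not ipyrad's actual code.
    """
    return os.path.abspath(os.path.expanduser(path))


def default_assembly_name(path):
    """Derive a name from the last component of the resolved directory."""
    return os.path.basename(resolve_workdir(path))
```

Calling `default_assembly_name("./")` returns the name of the current directory, which matches the behavior described in the message.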
Deren Eaton
@dereneaton
Jan 27 2016 01:30
That's a pain. Can you add the test refseq file to data/? I want to start testing on refseq as well.
Isaac Overcast
@isaacovercast
Jan 27 2016 01:38
Added a small manufactured mitogenome to tests/data that has reliable hits for simulated rad and ddrad data
re: #93 I do like this idea. I grep'd for working_directory in ipyrad/* and there are only 33 matches. 26 matches in tests/
Deren Eaton
@dereneaton
Jan 27 2016 01:41
Cool.
BTW, does the ref fasta also work with a multifasta file?
Isaac Overcast
@isaacovercast
Jan 27 2016 01:42
What do you mean by multifasta?
Just looked it up. I imagine it would work fine. Not sure what the use case would be...
Deren Eaton
@dereneaton
Jan 27 2016 01:47
Like if you have lots of separate contigs representing your reference genome
different chromosomes, etc.
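For anyone reading along: a multifasta file is just several `>`-headed records in one file, one per contig or chromosome. Below is a small reader sketch. The function name `read_fasta` and the two-contig example are made up for illustration and say nothing about how ipyrad or the mapper parse the reference.

```python
def read_fasta(lines):
    """Yield (name, sequence) for every record in an iterable of FASTA lines."""
    name, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                yield name, "".join(chunks)
            # The record name is the first whitespace-delimited token after ">"
            header = line[1:].split()
            name, chunks = (header[0] if header else ""), []
        elif name is not None:
            chunks.append(line)
    if name is not None:
        yield name, "".join(chunks)


# Made-up two-contig reference, for illustration only
example = """>contig1 example
ACGTAC
GTTA
>contig2
GGCC
"""
records = list(read_fasta(example.splitlines()))
```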
Isaac Overcast
@isaacovercast
Jan 27 2016 01:48
Oh hell yeah, that totally works. I'm testing glenn's gbs data against the manakin genome and it's all contigs
Deren Eaton
@dereneaton
Jan 27 2016 01:48
ok, good, just making sure. I'll get at it soon.
Isaac Overcast
@isaacovercast
Jan 27 2016 01:52
I think the working_dir to project_dir conversion makes sense. It emphasizes the fact that you could have multiple assemblies in the same project, branched off the same data.
If we're gonna make that change then now is the time. It could break shit for a minute, since it's referenced in so many places, but I could do it if you want to pull the trigger.
Deren Eaton
@dereneaton
Jan 27 2016 02:05
yeah, let's do it.
I just pushed a fix in refseq where a pipe was calling the system bedtools instead of ipyrad/bin/bedtools. In case mixing different beds might be causing your bug.
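A sketch of the general idea behind that fix: always build the full path to the bundled binary instead of letting the shell find whatever is on `$PATH`. The helpers `bundled_binary` and `run_tool` are hypothetical names and the directory is passed in by the caller. This is not the actual refseq code, and it assumes a Unix-like system.

```python
import os
import subprocess


def bundled_binary(bin_dir, name):
    """Return the absolute path of `name` inside `bin_dir`.

    There is no fallback to $PATH on purpose: silently picking up a
    system copy is exactly the mix-up being avoided.
    """
    path = os.path.abspath(os.path.join(bin_dir, name))
    if not (os.path.isfile(path) and os.access(path, os.X_OK)):
        raise FileNotFoundError("no executable {!r} in {}".format(name, bin_dir))
    return path


def run_tool(bin_dir, name, args):
    """Run the bundled tool with an argument list (no shell) and return stdout as text."""
    cmd = [bundled_binary(bin_dir, name)] + list(args)
    return subprocess.check_output(cmd).decode()
```

Passing an argument list rather than a shell string also means every stage of a pipe has to name its binary explicitly, so a stray system copy cannot slip in.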
Isaac Overcast
@isaacovercast
Jan 27 2016 02:06
oh fuck! good catch
I'll do the working_directory to project_directory switch. You think it should be 'project_directory' or 'project_dir'?
any preference?
Deren Eaton
@dereneaton
Jan 27 2016 02:15
Dir
While you're at it, should we be more consistent in referring to the others as file or path, i.e. sorted fastq path, barcode path, reference file...
Isaac Overcast
@isaacovercast
Jan 27 2016 02:24
Good idea.
I'll crank it out.
Isaac Overcast
@isaacovercast
Jan 27 2016 04:09
fuggg. Caught a nasty bug. If you ctrl+c while dillout is saving yr assembly it gets corrupted, and then you have to start over from step 1. Added a handler to assembly.save() to trap keyboard interrupts.
Deren Eaton
@dereneaton
Jan 27 2016 04:11
Geez. Seems like an unlikely thing.
I do think we can convert the save object to json, tho.
It will just take a little work.
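One way a JSON save could look, as a sketch only. The function names and the idea of writing to a temp file first are assumptions added here, not something already in ipyrad. It assumes everything worth saving can be reduced to plain dicts, lists, strings, and numbers.

```python
import json
import os
import tempfile


def save_json(params, path):
    """Write `params` as JSON to a temp file, then swap it into place."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "w") as handle:
            json.dump(params, handle, indent=2, sort_keys=True)
        # Atomic when tmp and path are on the same filesystem
        os.replace(tmp, path)
    except BaseException:
        # Also covers KeyboardInterrupt: drop the partial temp file, keep the old save
        if os.path.exists(tmp):
            os.remove(tmp)
        raise


def load_json(path):
    """Read a saved parameter dict back from disk."""
    with open(path) as handle:
        return json.load(handle)
```

Because the existing file is only replaced after the new one is fully written, an interrupted save would leave the previous copy in place.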
Isaac Overcast
@isaacovercast
Jan 27 2016 04:18
Yeah, unlikely but very nasty. I'm testing the fix now.
Deren Eaton
@dereneaton
Jan 27 2016 04:19
How do we avoid that? Seems like it could happen for json file too.
Isaac Overcast
@isaacovercast
Jan 27 2016 04:20
I have a strategy. There's a way to trap kbd interrupts
Got a block in assembly.save() to guard against it.
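For reference, one common way to trap Ctrl+C around a save is to hold the SIGINT back until the write is done. This is a sketch of that general technique, not necessarily what the block in assembly.save() does. `DelayedInterrupt` and `save_text` are hypothetical names.

```python
import signal


class DelayedInterrupt(object):
    """Hold back Ctrl+C (SIGINT) inside a `with` block and re-raise it on exit.

    Main thread only, since that is where Python allows signal handlers
    to be installed.
    """

    def __enter__(self):
        self.pending = False
        old = signal.signal(signal.SIGINT, self._remember)
        # A handler installed outside Python is reported as None
        self._old = old if old is not None else signal.SIG_DFL
        return self

    def _remember(self, signum, frame):
        # Just note the interrupt; do not raise while the block is running
        self.pending = True

    def __exit__(self, exc_type, exc, tb):
        signal.signal(signal.SIGINT, self._old)
        if self.pending and exc_type is None:
            raise KeyboardInterrupt
        return False


def save_text(data, path):
    """Write `data` to `path`; a Ctrl+C during the write is deferred until it finishes."""
    with DelayedInterrupt():
        with open(path, "w") as handle:
            handle.write(data)
```

The interrupt is not swallowed: it is raised as soon as the guarded block exits, so the user still gets out, just after the file is complete.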
Deren Eaton
@dereneaton
Jan 27 2016 04:21
Oh, cool.
That's pro.
Isaac Overcast
@isaacovercast
Jan 27 2016 04:21
B-)
Isaac Overcast
@isaacovercast
Jan 27 2016 04:31
I don't think kbd interrupts are cleaning up all ipclients. I'm seeing tons of orphaned ipyparallel engines in ps, eventually they stack up and then ipyrad stops running with nasty error msgs. Seen anything like this?
Seeing lots of lines like this in my ps:
  508  1995     1   0  7:23PM ??        15:36.50 /usr/local/opt/anaconda/bin/python /usr/local/opt/anaconda/bin/ipcluster start --daemon --cluster-id=ipyrad-1947 --controller=Local -n 24
Isaac Overcast
@isaacovercast
Jan 27 2016 04:37
Looks like ipcluster start is getting orphaned to process 1 (init). I'm racked for the day, but i'll try to look at it tomorrow.
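A quick way to spot these from Python, as a sketch. It assumes a Unix-like system where `ps -A -o pid,ppid,command` is available. `find_orphans` and `list_orphans` are hypothetical helper names. It flags processes whose parent PID is 1 and whose command line mentions ipcluster, like PID 1995 in the ps line above.

```python
import subprocess


def find_orphans(ps_output, keyword="ipcluster"):
    """Return PIDs whose parent is PID 1 and whose command mentions `keyword`.

    Expects lines of "PID PPID COMMAND", e.g. from `ps -A -o pid,ppid,command`.
    """
    orphans = []
    for line in ps_output.splitlines():
        fields = line.split(None, 2)
        if len(fields) < 3 or not (fields[0].isdigit() and fields[1].isdigit()):
            continue  # skip the header and malformed lines
        pid, ppid, command = int(fields[0]), int(fields[1]), fields[2]
        if ppid == 1 and keyword in command:
            orphans.append(pid)
    return orphans


def list_orphans(keyword="ipcluster"):
    """Run ps and report matching orphaned PIDs."""
    out = subprocess.check_output(["ps", "-A", "-o", "pid,ppid,command"])
    return find_orphans(out.decode(errors="replace"), keyword)
```

This only reports the PIDs. Deciding whether to kill them is left to the caller, since a daemonized ipcluster that is still in use also has PID 1 as its parent.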
Isaac Overcast
@isaacovercast
Jan 27 2016 04:46
ps - check out the new Hinds record. It's really good.
Deren Eaton
@dereneaton
Jan 27 2016 18:02
Cool, I'll check it out.
Sounds like a good fix for ipcluster death. That's an annoying problem.