These are chat archives for dereneaton/ipyrad

8th
Jan 2016
Isaac Overcast
@isaacovercast
Jan 08 2016 00:52
lol, i picked up this habit of surrounding parens/braces with spaces internally, e.g. if isinstance( samples, str ). I notice you don't do that and i've been trying to stick to your coding convention, but i slip up occasionally.
Deren Eaton
@dereneaton
Jan 08 2016 01:04
Ha, I use a pylinter plugin for sublimetext which highlights anything that doesn't follow its code standard (flake8 or PEP8 or something). I've just gotten in the habit of fixing anything that it highlights.
Also the reason I never go past 80 characters in a line.
Isaac Overcast
@isaacovercast
Jan 08 2016 01:05
Now that's a rule i do try to live by. Did i ask you already if you're going to the NY area pop genomics workshop at princeton in a couple weeks?
Deren Eaton
@dereneaton
Jan 08 2016 01:05
I found the revcomp problem for paired reads. I had fixed it a while back and then forgot about it on a lingering branch. Doing some final tests now then I'll push it.
oh yeah, I was gonna ask about it. No, I hadn't even heard of it
do you need to register?
got a link?
Isaac Overcast
@isaacovercast
Jan 08 2016 01:08
I mean "need" is a strong word, nahmean. There was no reg fee, i'm sure if you show up they're not gonna turn you around..
Deren Eaton
@dereneaton
Jan 08 2016 01:12
dang, I'm totally out of the popgen loop
Isaac Overcast
@isaacovercast
Jan 08 2016 01:16
Dude, this shit is so close to being done, i'm psyched.
I mean "done", of course B-)
Deren Eaton
@dereneaton
Jan 08 2016 01:22
yeah, I'm really excited too!
Deren Eaton
@dereneaton
Jan 08 2016 01:40
I'll send you those files soon. I just got them to run all the way through pairgbs and it's golden.
Isaac Overcast
@isaacovercast
Jan 08 2016 01:40
Siiiiiiiiiiiiiiiick.
Isaac Overcast
@isaacovercast
Jan 08 2016 01:49
fml. git has 2 kinds of tags, lightweight and annotated. The lightweight tags aren't actually "real" tags, they don't change the output of git describe, which i'm trying to use to make maintaining version number in ipyrad/__init__.py less stupid. We should make a solemn pact to only ever use git tag -a <version_number>
Isaac Overcast
@isaacovercast
Jan 08 2016 02:45
I more or less fixed the problem of maintaining version numbers in git tags and in ipyrad/__init__.py. The new setup.py auto-updates __init__.py. See this ticket for the new workflow: #25
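For anyone reading the archive later, here is a minimal sketch of the idea in the two messages above: take the output of git describe and write it into a version line. The function names and the `__version__ = "..."` layout are assumptions made for illustration; this is not the setup.py from the ticket.

```python
import re


def parse_describe(described):
    """Split `git describe` output into (tag, commits_since_tag, short_hash)."""
    text = described.strip()
    # On a tagged commit the output is just the tag, e.g. "0.1.5".
    # Otherwise it is "<tag>-<n>-g<hash>", e.g. "0.1.5-3-gabc1234".
    match = re.match(r"^(?P<tag>.+)-(?P<n>\d+)-g(?P<sha>[0-9a-f]+)$", text)
    if match is None:
        return text, 0, None
    return match.group("tag"), int(match.group("n")), match.group("sha")


def update_version_line(init_text, version):
    """Return init_text with its __version__ assignment set to `version`."""
    # Hypothetical layout: a single line such as  __version__ = "0.1.4"
    return re.sub(
        r'^__version__[ \t]*=[ \t]*["\'].*["\'][ \t]*$',
        '__version__ = "{}"'.format(version),
        init_text,
        flags=re.MULTILINE,
    )
```

In practice the input string would probably come from something like `subprocess.check_output(["git", "describe"])`, which by default only considers annotated tags, hence the pact to use git tag -a.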
Deren Eaton
@dereneaton
Jan 08 2016 16:18
How are you getting to Princeton for the meeting?
Isaac Overcast
@isaacovercast
Jan 08 2016 17:41
Unknown. there are a bunch of people from my lab going, i assume we'll probably carpool. I'll coordinate on it and let you know if there are open seats.
Also, i'm still not happy with the auto-tagging thing, i have an idea how to make it easier/more reliable.
Deren Eaton
@dereneaton
Jan 08 2016 17:42
It seems to have worked for me without running python setup.py
maybe cuz I ran pip install -e .
Isaac Overcast
@isaacovercast
Jan 08 2016 17:45
that'll do it.
Deren Eaton
@dereneaton
Jan 08 2016 17:45
It seems fine to me then. We just need to remember to give it a version name when we push to master.
Have you tried running pairs with the latest Master?
it should be orienting them correctly.
Or I guess I can reference it as 0.1.5
Isaac Overcast
@isaacovercast
Jan 08 2016 17:47
not yet, i'll give it a shot
For step 6 you mentioned yesterday that you're planning on writing out .vcf and .loci?
Deren Eaton
@dereneaton
Jan 08 2016 18:00
It currently writes out aligned loci in cat.clust.gz, which is basically the .loci file without the line showing SNPs.
And it outputs the HDF5 supercatg file
I figure the first thing I'll do in step7 is build a vcf from both after applying filters.
Or you can work on that if you want. I'm not gonna be able to do any work on code today.
I'm thinking of doing a pretty big rewrite of the old step7 code since it's really clunky.
speaking of which there are like 8 outstanding bug reports on step7 in the pyrad help forum that I should probably attend to.
Isaac Overcast
@isaacovercast
Jan 08 2016 18:27
I guess what i was thinking is we already have a loci2vcf function, if we apply filters to the cat.clust.gz and write out .loci, then we can just run loci2vcf to get that. Is there an advantage in going from supercatg+clust.gz -> vcf and then vcf -> .loci?
I've already been working on step7 so i can keep hacking on it. Are there specific design ideas you had in mind?
Isaac Overcast
@isaacovercast
Jan 08 2016 18:37
Also, if you get a chance to dropbox me the pairgbs data I'd love to start testing on real data.
Deren Eaton
@dereneaton
Jan 08 2016 18:40
Oh, well the idea of the supercatg is to have the depth information for every base call. So we could use the loci2vcf function and then just query the depth info from supercatg to fill it in.
One idea I had was that if VCF was the basis for conversions then users could potentially post-filter their data by only allowing base calls with mincoverage=X.
we don't currently save the genotype quality scores either, which could also be used as a post-filter. They could be saved from step5 and end up in VCF as well. Or be recalculated later (maybe easier), it's just the binomprobr() function from consens_se.py.
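A rough sketch of the mincoverage=X post-filter idea described above, assuming the base calls for one sample at one locus are a string and the per-site depths are a same-length list of integers. The function name and the plain-list layout are hypothetical; the real depth information would live in the HDF5 supercatg file.

```python
def mask_low_depth(calls, depths, mincov):
    """Replace base calls whose read depth is below mincov with 'N'."""
    if len(calls) != len(depths):
        raise ValueError("calls and depths must have the same length")
    # Keep a call only when its depth meets the threshold.
    return "".join(
        base if depth >= mincov else "N"
        for base, depth in zip(calls, depths)
    )


# Depths of 2 and 5 fall below the threshold of 6 and get masked.
filtered = mask_low_depth("ACGT", [10, 2, 6, 5], 6)
```

The same pattern would apply to genotype quality scores if they were stored alongside the depths.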
Isaac Overcast
@isaacovercast
Jan 08 2016 18:47
Got it, makes sense.
I have a spot in a carpool to princeton if you want it, still working on logistics.. more info soon.
Deren Eaton
@dereneaton
Jan 08 2016 19:10
That would be awesome.
Isaac Overcast
@isaacovercast
Jan 08 2016 20:40
OK, i'll hold a spot for you.
Got step6 run all the way through, seems to work great!
Looking at cat.clust.gz, and i'm having a hard time interpreting what's going on in a small fraction of the clusters:
1D0_734 TGCAGCCAATCAGATCGTGCAATTCACCGTAACCGATCCACATAATACCACACTTCCCCGGTTCTAAGAGAGCGGTGAAGGATGTCCCTCAGGGSSSSAATTGCCCTCCAGGGCTCATAACTGAAGGCGCCCTCGTTCATGCGCATTCGCCATCCTAACAGCACGGGCTGCCGATGGGTTTTTTTATCACTATGGGCT
3J0_88 TGCAGCCAATCAGATCGTGCGATTCACCGTAACCGATCCACATAATACCACACTTCCCCGGTTCTAAGAGAGCGGTGAAG------------------AATTGCCCTCCGGGGCTCATAACTGAAGGCGCCCTCGTTCATGCGCATTCGGCATCCTAACAGCACGGGCTGCCGATGGGTTTTTTTATCACTATGGGCT
1C0_269 TGCAGCCAATCAGATCGTGCAATTCACCGTAACCGATCCACATAATACCACACTTCCCCGGTTCTAAGAGAGCGGTGAAG------------------AATTGCCCTCCAGGGCTCATAACTGAAGGCGCCCTCGTTCATGCGCATTCGGCATCCTWACAGCACGGGCTGCCGATGGGTTTTTTTATCACTATGGGCT
I see a handful of them like this and my brain is contorting to understand, the two conspicuous elements being the missing "SSSS" pair separator, and also the huge gap. I'm assuming this is an artifact of merging, but am not familiar enough with the merge code to know quite why it'd do this. Thoughts?
Deren Eaton
@dereneaton
Jan 08 2016 21:44
Do you have an adapter filter turned on?
It looks like R1 is being trimmed
Though I thought I had it so that it would trim R2 to the same length when it trims R1.
Oh, is this simulated pairddrad?
Something is still not right, the cut site "AATT" should be on the far right. Not next to the separator "SSSS".
Isaac Overcast
@isaacovercast
Jan 08 2016 22:41
Filter adapters = 0
It is the sim pairddrad
Isaac Overcast
@isaacovercast
Jan 08 2016 22:51
Looks like the orientation is right in the revcompR2.fastq, but then it's reversed (with AATT at the beginning) in .merged_.fastq
Isaac Overcast
@isaacovercast
Jan 08 2016 23:09
It looks like vsearch --fastq_mergepairs is internally reversing R2 passed in at the --reverse flag.
Since cluster_within is already doing a vsearch --revcomp right before that, it explains why the merged R2 are on the correct strand, but reversed. I think the fix to this is to not revcomp R2 before --fastq_mergepairs, but instead to simply comp() R2, and let vsearch do the rev. I want to get your thoughts on this before i make any change here..
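To make the strand bookkeeping concrete, here is a small sketch of the three string operations involved. It only illustrates the hypothesis in the message above, that the merge step applies its own reversal to R2; it does not confirm what vsearch does internally. The helper names and the toy read are made up for illustration.

```python
# Only plain A/C/G/T are handled; N and ambiguity codes pass through unchanged.
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def comp(seq):
    """Complement each base, keeping the original order."""
    return seq.translate(COMPLEMENT)


def rev(seq):
    """Reverse the order of bases without complementing."""
    return seq[::-1]


def revcomp(seq):
    """Reverse complement: complement every base, then reverse."""
    return rev(comp(seq))


r2 = "AATTCG"  # toy read starting with the AATT cut site

# If R2 is revcomped first and the merge step then reverses it again,
# the result is complemented but back in the original order.
double_handled = rev(revcomp(r2))

# If R2 is only complemented first, one reversal gives the true revcomp.
proposed = rev(comp(r2))
```

Under that assumption, `double_handled` equals `comp(r2)` (right strand, wrong direction), while `proposed` equals `revcomp(r2)`, which matches the fix suggested above.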