These are chat archives for dereneaton/ipyrad

13th
Jan 2016
Isaac Overcast
@isaacovercast
Jan 13 2016 14:43
I'm about to commit the code that updates assemblies. I adopted a "don't ask, just do it and inform" strategy. Keep an eye on it after your next pull.
Isaac Overcast
@isaacovercast
Jan 13 2016 16:11
For me consens_se.py is crashing on cleanup, are you seeing this behavior?
Isaac Overcast
@isaacovercast
Jan 13 2016 16:49
nvm i fixed it. pull new from master if you want step5 to work, tmpcat and tmpcons were getting cleaned up prematurely.
Isaac Overcast
@isaacovercast
Jan 13 2016 17:55
Wicked bug in cluster_within.clustall(). Testing on glenn's data works fine on small # of samples, but the full dataset errors out with IndexError: list index out of range... Can't tell yet if it's a prob with one individual or an emergent bug.. Seems like a race condition.
Isaac Overcast
@isaacovercast
Jan 13 2016 18:21
Licked it... sort of. An empty utemp file will crash build_clusters. I added a guard to check for length utemp > 0, so at least it'll catch this. I tried running the vsearch commands for this individual by hand and they seem to work, but they still output blank utemp files. What could cause this?
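A minimal sketch of the kind of guard described above, assuming the utemp file is a plain file on disk. The helper name `utemp_has_hits` is hypothetical and is not the actual check in build_clusters:

```python
import os


def utemp_has_hits(path):
    """Return True only if the utemp file exists and is not empty.

    Hypothetical helper: the real guard in build_clusters may look different.
    """
    return os.path.exists(path) and os.path.getsize(path) > 0
```

A caller could skip the sample and log a warning when this returns False, rather than hitting the IndexError later.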
Isaac Overcast
@isaacovercast
Jan 13 2016 20:18
https://github.com/dereneaton/ipyrad/wiki/Preview-mode docs. It also made me realize that if you run step1 full, then run step1 in preview it'll overwrite :-/ Can't think of a good way around that besides just making sure people know...
Deren Eaton
@dereneaton
Jan 13 2016 20:25
I'm trying to write a custom Exception that we can use to exit gracefully out of an ipyclient at any time and report a warning message.
Isaac Overcast
@isaacovercast
Jan 13 2016 21:36
hackersonly again useful. The max fragment length of glenn's gbs data is 132.
Deren Eaton
@dereneaton
Jan 13 2016 21:44
Something that might be worth hacking on for step3 is the vsearch parameter "-query_cov". It's currently hardcoded. I recently raised it to 60 to be conservative, but I've tested it with good results as low as 30. This is the amount the query seq must overlap with the seed, if I remember right.
And it could presumably influence what the max frag length will be.
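A rough sketch of what un-hardcoding this could look like: build the vsearch command from a function argument instead of a literal. Assumptions: the function name, the file names, the `-usearch_global` subcommand and the default identity are placeholders, not what step3 actually runs, and vsearch is assumed to take query_cov as a fraction (so the "60" above would be 0.60).

```python
def build_vsearch_cmd(query, db, userout, query_cov=0.60, ident=0.85):
    """Return a vsearch command list with a configurable -query_cov.

    Hypothetical helper; the subcommand and defaults are placeholders.
    """
    if not 0.0 <= query_cov <= 1.0:
        raise ValueError("query_cov should be a fraction between 0 and 1")
    return [
        "vsearch",
        "-usearch_global", query,
        "-db", db,
        "-id", str(ident),
        "-query_cov", str(query_cov),
        "-userout", userout,
    ]
```

The list can be handed to subprocess as-is, and the value could come from a hackersonly-style setting.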
Isaac Overcast
@isaacovercast
Jan 13 2016 21:49
Good idea, I'll make a ticket... As of now everything seems to be going well with glenn's data. I changed the default preview length for step 1 (2000000 lines) and preview is working better, have run it up through step 5 at this point.
Deren Eaton
@dereneaton
Jan 13 2016 21:57
Cool. I've got someone sending me pairddrad data soon for testing.
This exception handling thing is confusing as hell. Still working on it.
I'm moving custom exception classes to util.py
Isaac Overcast
@isaacovercast
Jan 13 2016 22:22
Good idea.
Isaac Overcast
@isaacovercast
Jan 13 2016 22:53
I want to change how newparamsfile works. Instead of hard coding it I'd like to create a new temp assembly and generate the proper and current format on the fly. My idea also involves changing getparamsinfo to make it an ordered dict of tuples, the first tuple can be the current formatted long description, and the second tuple will be the short description for outputting to the params file. Any thoughts?
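Roughly what that structure might look like, as a sketch only. The parameter names, descriptions and output format below are placeholders, not the real getparamsinfo contents:

```python
from collections import OrderedDict

# Each value is a (long description, short description) tuple.
# Entries are placeholders for illustration.
PARAMS_INFO = OrderedDict([
    ("assembly_name", (
        "Long help text shown to the user for assembly_name ...",
        "Assembly name",
    )),
    ("project_dir", (
        "Long help text shown to the user for project_dir ...",
        "Project dir",
    )),
])


def params_lines(values, info=PARAMS_INFO):
    """Format one params-file line per parameter, in dict order."""
    lines = []
    for idx, (name, (_long, short)) in enumerate(info.items()):
        lines.append("{:<30} ## [{}] [{}]: {}".format(
            values[name], idx, name, short))
    return lines
```

The long description stays available as `PARAMS_INFO[name][0]`, and the params file only ever uses the short one.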
Deren Eaton
@dereneaton
Jan 13 2016 22:55
I can get it to seemingly catch the ipyparallel.CompositeError and pass it along as a nicely formatted IPyradError, but for some reason it still kills the ipyclient even though I caught the exception. This is driving me mad.
Yeah, that sounds like a great idea re: newparamsfile and getparamsinfo. I suppose you'll have to create the new tmp assembly in __main__?
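For reference, the wrapping pattern for the exception issue looks roughly like this. It is a sketch: `RemoteEngineError` is a stand-in for ipyparallel.CompositeError so the snippet runs without ipyparallel, and `run_engine_job` is a hypothetical name. It only shows the re-raise, and does not explain why the ipyclient still dies.

```python
class IPyradError(Exception):
    """Custom error meant to carry a readable message to the user."""


class RemoteEngineError(Exception):
    """Stand-in for the remote error raised by the parallel client."""


def run_engine_job(func):
    """Run func and convert a remote failure into an IPyradError."""
    try:
        return func()
    except RemoteEngineError as err:
        # Re-raise with a friendlier message, keeping the original text.
        raise IPyradError("engine job failed: {}".format(err))
```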
Isaac Overcast
@isaacovercast
Jan 13 2016 23:00
I was thinking more like putting it inside newparamsfile, inside write_params(). It'd mean importing Assembly in this module. Do you think that'd work?
Deren Eaton
@dereneaton
Jan 13 2016 23:00
I also like this because we could probably create something like data1.writeparams() to create a params.txt file from an API assembly object.
would that make sense?
oh wait, but the assembly object needs the params from the params file, so I guess that would only be useful after the fact.
Isaac Overcast
@isaacovercast
Jan 13 2016 23:08
No that's a great idea, it would be a good way for API users to export their parameters in a way they could share/archive. should be simple. I can make a writeparams function for Assembly and then newparamsfile would be a simple wrapper that creates a new default assembly and just writes it out to ./params.txt
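A toy version of that plan, to make the shape concrete. The `Assembly` class here is a stand-in with two placeholder parameters, not the real ipyrad class, and the line format is made up:

```python
from collections import OrderedDict


class Assembly(object):
    """Toy stand-in for an assembly object holding ordered params."""

    def __init__(self, name):
        self.name = name
        self.paramsdict = OrderedDict([
            ("assembly_name", name),
            ("project_dir", "./"),
        ])

    def writeparams(self, outfile):
        """Write the current parameter values, one per line."""
        with open(outfile, "w") as out:
            for idx, (key, val) in enumerate(self.paramsdict.items()):
                out.write("{:<25} ## [{}] {}\n".format(val, idx, key))


def newparamsfile(outfile="params.txt"):
    """Thin wrapper: write the params of a fresh default assembly."""
    Assembly("default").writeparams(outfile)
```

An API user would call `data1.writeparams("params.txt")` on an existing object, while the CLI path would go through `newparamsfile()`.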
Deren Eaton
@dereneaton
Jan 13 2016 23:26
I figured out how to grab the host names that engines are connected to. I think we can use this to create the threaded_view hub in steps 3 & 6 to ensure the threads that are being passed as a group are on the same machine. I wrote some pseudo-code in step3 that will need to be tested on an HPC setup.
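The grouping part of that idea can be sketched without a cluster. This assumes the engine-id to hostname mapping has already been collected somehow; both function names are hypothetical and this is not the pseudo-code in step3:

```python
from collections import defaultdict


def group_engines_by_host(engine_hosts):
    """Map each hostname to the sorted list of engine ids running on it."""
    groups = defaultdict(list)
    for engine_id, host in sorted(engine_hosts.items()):
        groups[host].append(engine_id)
    return dict(groups)


def threaded_groups(engine_hosts, nthreads):
    """Chunk engines into groups of up to nthreads, never mixing hosts."""
    chunks = []
    for _host, ids in sorted(group_engines_by_host(engine_hosts).items()):
        for start in range(0, len(ids), nthreads):
            chunks.append(ids[start:start + nthreads])
    return chunks
```

Each returned chunk could then back one threaded view, so a multi-threaded job never spans two machines.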
Isaac Overcast
@isaacovercast
Jan 13 2016 23:32
Nice. That'll be sick when we can get it running on a cluster.
Isaac Overcast
@isaacovercast
Jan 13 2016 23:52
In step6 with glenn's data. In cluster_across.cluster() vsearch is crashing and throwing a SystemExit error, which is bubbling all the way back up to clientwrapper. I caught the command, ran it by hand and it told me: Fatal error: illegal character '-' on line 63 in fasta file
Looked at the cathaps file and there are many lines where a sequence and the following read name are concatenated:
>cra-ant-FMNH_391885_655
TGCAGGNTCCNCTTAGGACANGCTGCTCTGTCACTGGGCCAGGACTCTGTGCTGGGCTGGGCAGAGGGGCTGGCACTGAGGGGCACACTCAGTCTGTGCTGAGATCCCCTGGANCCGTGACCAGGGGTCCTGC>NK_171570_1179
TGCAGCCCCCTCCCTGCTACNAAAACTGATACACAGGGTCATTAAGGTCCCCAGTAATACATGAGGCCATNGATNCCATTAATTCCTCAGGCTGGTTAAACAAGACTTTNTCCTCTTGCTCTCCCTGCCTGC
>cra-ant-B31683_1
and:
>cra-ant-B31683_334
TGCAGTGCCACAGTCCCTCTGCAAAGCCCCCTTATATTCTGGGAATGCAATNGCTCTGGAGAGCCTCCTTACCCTCTCCAGCCTCCTCNTCCCACGAAGCCTTGTTGCCCGGTCATGCTGC>cra-ant-72900-Ancash-Chinchan_295
TGCAGTGTCTNNNNCAGTGTCCCTTTTGTTCCATCTGGCTCTGAAGTGTCACAATGACCCCCTGATTCCATGAGGCTCCACAGAGTGACAATGATCCCTTTGGATCAANAAGACTCTGC>cra-cur-B6032_652
TGCAGAGCATTTGGAAGCCTACGGAGGTGGCTATTATGCATTGCTNAGCTCATCAGAGAGGGAAAACAACTTCTGAGTTAGGAAATTACTTTGCTGACCACATAGCAAAGGGGGCTGC
Deren Eaton
@dereneaton
Jan 13 2016 23:57
yuck. That's a problem.
missing a newline somewhere
Isaac Overcast
@isaacovercast
Jan 13 2016 23:57
It looks like this only happens on lines where the next line is shorter than the previous
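A quick way to scan a file for this symptom, as a sketch: flag any line with a '>' somewhere after the first character, and optionally split it back into two lines. Function names are hypothetical, and this only detects or patches the output, it does not fix whatever drops the newline.

```python
def find_fused_records(lines):
    """Return 1-based line numbers where a '>' appears mid-line."""
    bad = []
    for lineno, line in enumerate(lines, start=1):
        if ">" in line.rstrip("\n")[1:]:
            bad.append(lineno)
    return bad


def split_fused(lines):
    """Yield lines with any fused 'SEQ>name' line split in two."""
    for line in lines:
        line = line.rstrip("\n")
        idx = line.find(">", 1)
        if idx == -1:
            yield line
        else:
            yield line[:idx]
            yield line[idx:]
```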
Deren Eaton
@dereneaton
Jan 13 2016 23:57
I wonder why it's happening in the real data but not sims
Isaac Overcast
@isaacovercast
Jan 13 2016 23:57
sims are all the same length?
Deren Eaton
@dereneaton
Jan 13 2016 23:58
could be