These are chat archives for dereneaton/ipyrad

26th
Apr 2016
Deren Eaton
@dereneaton
Apr 26 2016 19:48
hey hey, any updates?
Isaac Overcast
@isaacovercast
Apr 26 2016 22:22
jah mon, nice work! You were crushing the tickets yesterday :sunglasses:
Cycadales
@Cycadales
Apr 26 2016 22:22
Hi @dereneaton, i would be most happy to do that as i am sure it could help many others too. Also i would be really grateful. I will upload the files and drop you an email.
Isaac Overcast
@isaacovercast
Apr 26 2016 22:23
I am rewriting step 3 a little to actually make it handle errors better. right now there are lots of ways it can die where it's really hard to figure out wtf happened, so i'm trying to make it much more explicit when shit breaks.
Also I know denovo+reference and reference are fsck, but that's kind of the reason i got sidetracked onto error handling for step 3, so i can actually debug reference without wanting to pull my hair out :haircut:
Deren Eaton
@dereneaton
Apr 26 2016 22:31
cool. I think it's ok if reference stuff takes a while longer. We could release a beta version with just denovo methods first. I think we're pretty much at that stage.
Isaac Overcast
@isaacovercast
Apr 26 2016 22:32
w00t!
Deren Eaton
@dereneaton
Apr 26 2016 22:32
I FINally got the vcf file building with indels included. It was an epic battle.
I'm gonna push some new simulated data files really soon.
Isaac Overcast
@isaacovercast
Apr 26 2016 22:33
nice!
Deren Eaton
@dereneaton
Apr 26 2016 22:34
One thing I reaaaally want to get to work is to grab the stderr from vsearch while it's running and use that to make our own progress bar. But I haven't figured out how yet. When I try to grab the stderr I can print or store everything BUT the final % counter. Something weird is going on...
do you have any expert knowledge of how such things work?
I think I need to trick it into thinking it's a terminal to print updates...
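For reference, "tricking" a child process into seeing a terminal is usually done with a pseudo-terminal. The following is a minimal sketch for Unix-like systems, assuming only the standard `pty`, `os`, and `subprocess` modules. The helper name `run_with_pty_stderr` and the stand-in child command are hypothetical, and nothing in this conversation establishes whether vsearch actually checks for a terminal.

```python
import os
import pty
import subprocess
import sys

def run_with_pty_stderr(cmd):
    """Run cmd with stderr attached to a pseudo-terminal; return captured bytes."""
    master_fd, slave_fd = pty.openpty()
    process = subprocess.Popen(cmd, stderr=slave_fd)
    os.close(slave_fd)  # the parent only needs the master end
    chunks = []
    while True:
        try:
            data = os.read(master_fd, 1024)
        except OSError:
            # Linux raises an I/O error once the child has closed the slave end
            break
        if not data:
            # other platforms signal end-of-file with an empty read
            break
        chunks.append(data)
    os.close(master_fd)
    process.wait()
    return b"".join(chunks)

# Hypothetical stand-in child: reports whether its stderr looks like a terminal.
child = [sys.executable, "-c",
         "import sys; sys.stderr.write(str(sys.stderr.isatty()))"]
captured = run_with_pty_stderr(child)
```

With a plain `stderr=subprocess.PIPE` the same child would report `False`, which is the difference a program could use to decide whether to print live updates.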
Isaac Overcast
@isaacovercast
Apr 26 2016 22:43
mmm, i don't really know about this. That's weird... Here's a clue: I piped stderr from a manual call to vsearch to a file. Inside the file I see a bunch of stuff like this:
Writing output file 0%  ^MWriting output file 0%  ^MWriting output file 0%  ^MWriting output file 1%  ^MWriting output file 1%  ^MWriting output file 2%  ^MWriting output file 2%  ^MWriting output file 3%  ^MWriting output file 3%  ^MWriting output file 4%  ^MWriting output file 4%  ^MWriting output file 5%  ^MWriting output file 5%  ^MWriting output file 6%  ^MWriting output file 6%  ^MWriting output file 6%  ^MWriting output file 7%  ^MWriting output file 7%  ^MWriting output file 8%  ^MWriting output file 8%  ^MWriting output file 9%  ^MWriting output file 9%  ^MWriting output file 10%  ^MWriting output file 10%
but when i cat the file i only see this:
yeti:ipyrad-test_edits iovercast$ cat tmp.out 
Reading file G_0_R1_.fastq 100%  
1887520 nt in 20080 seqs, min 94, max 94, avg 94
Dereplicating 100%  
Sorting 100%
2155 unique sequences, avg cluster 9.3, median 2, max 50
Writing output file 100%  
vsearch v1.9._osx_x86_64, 64.GB RAM, 24 cores
https://github.com/torognes/vsearch
Isaac Overcast
@isaacovercast
Apr 26 2016 22:52
Ok, i looked at the vsearch code and when it's printing the progress update it prepends each line with '\r' (carriage return), thereby overwriting the previous status. Tricky. This is probably definitely confusing the pipe in subprocess. Not sure how to work around it...
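To illustrate the carriage-return behavior described here, this is a small sketch that splits already-captured stderr text on `\r` and pulls out the percentages. The sample string only mimics the shape of the output pasted above, and the names `PROGRESS` and `parse_progress` are made up for this example.

```python
import re

# Sample stderr text in the shape shown above; "\r" is the ^M seen in the file.
captured = (
    "Writing output file 0%  \rWriting output file 0%  \r"
    "Writing output file 1%  \rWriting output file 100%  \n"
)

# A label, some whitespace, then an integer percentage at the end of the piece.
PROGRESS = re.compile(r"^(?P<label>.+?)\s+(?P<pct>\d+)%\s*$")

def parse_progress(text):
    """Return (label, percent) pairs, dropping consecutive repeats."""
    updates = []
    for piece in re.split(r"[\r\n]+", text):
        match = PROGRESS.match(piece)
        if match is None:
            continue  # blank piece or a line that is not a progress update
        update = (match.group("label"), int(match.group("pct")))
        if not updates or updates[-1] != update:
            updates.append(update)
    return updates

updates = parse_progress(captured)
```

Splitting on both `\r` and `\n` means the final "100%" line, which ends in a newline rather than a carriage return, is still picked up.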
Deren Eaton
@dereneaton
Apr 26 2016 22:52
shit
I tried using the vsearch logfile function, but it doesn't seem to write to the logfile until it's done.
Isaac Overcast
@isaacovercast
Apr 26 2016 23:05
You can do it like this:
# read stderr one byte at a time; b'' marks EOF under Python 2 and 3
acc = []
for c in iter(lambda: process.stderr.read(1), b''):
    if c == b"\r":
        print(b"".join(acc).decode())
        acc = []
    else:
        acc.append(c)
Deren Eaton
@dereneaton
Apr 26 2016 23:11
nice, trying it out now.
Deren Eaton
@dereneaton
Apr 26 2016 23:20
...
Reading file /home/deren/Downloads/pedicularis/cyatho_consens/cyatho-min8_cathaps.tmp 100%
42408380 nt in 573115 seqs, min 35, max 79, avg 74
Masking 100%
Counting unique k-mers 100%
Clusterin
Reading file /home/deren/Downloads/pedicularis/cyatho_consens/cyatho-min8_cathaps.tmp 100%
42408380 nt in 573115 seqs, min 35, max 79, avg 74
Masking 100%
Counting unique k-mers 100%
Clustering
still stops at Clustering... but doesn't print the number after...
Isaac Overcast
@isaacovercast
Apr 26 2016 23:22
:-/
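import subprocess

# derep command used for testing; the paths are specific to this machine
cmd = "/usr/local/bin/vsearch --derep_fulllength /tmp/ipyrad-test/ipyrad-test_edits/2G_0_R1_.fastq --output /tmp/wat"
cmd = cmd.split()
#cmd = "/usr/local/bin/vsearch"
process = subprocess.Popen(cmd, stderr=subprocess.PIPE)
acc = []
# read stderr one byte at a time; b'' marks EOF under Python 2 and 3
for c in iter(lambda: process.stderr.read(1), b''):
    # vsearch separates progress updates with '\r', so print on each one
    if c == b"\r":
        print(b"".join(acc).decode())
        acc = []
    else:
        acc.append(c)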
Deren Eaton
@dereneaton
Apr 26 2016 23:24
does that work for you?
Isaac Overcast
@isaacovercast
Apr 26 2016 23:24

this is the code i used to test inside a notebook, seemed to work.

Reading file /tmp/ipyrad-test/ipyrad-test_edits/2G_0_R1_.fastq 0%  
Reading file /tmp/ipyrad-test/ipyrad-test_edits/2G_0_R1_.fastq 0%  
Reading file /tmp/ipyrad-test/ipyrad-test_edits/2G_0_R1_.fastq 1%  
Reading file /tmp/ipyrad-test/ipyrad-test_edits/2G_0_R1_.fastq 1%  
Reading file /tmp/ipyrad-test/ipyrad-test_edits/2G_0_R1_.fastq 2%  
Reading file /tmp/ipyrad-test/ipyrad-test_edits/2G_0_R1_.fastq 3%  
Reading file /tmp/ipyrad-test/ipyrad-test_edits/2G_0_R1_.fastq 3%

etc, etc

can you try just running that dummy code in a notebook? It maybe, maybe could be a linux vs mac thing...
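One property of the byte-at-a-time loop is worth spelling out: it only prints when it sees a `\r`, so any text after the last `\r` stays in the buffer and is never shown. The sketch below is a hedged variant that also splits on newlines and flushes the remainder at end-of-file. The generator name `iter_updates` is hypothetical, and a tiny Python child stands in for vsearch so the example runs anywhere.

```python
import subprocess
import sys

def iter_updates(stream):
    """Yield text pieces from a binary stream, split on carriage return or newline."""
    acc = []
    for byte in iter(lambda: stream.read(1), b""):
        if byte in (b"\r", b"\n"):
            if acc:
                yield b"".join(acc).decode()
                acc = []
        else:
            acc.append(byte)
    if acc:
        # flush text that was never followed by a terminator
        yield b"".join(acc).decode()

# Hypothetical stand-in for vsearch: a child that writes '\r'-prefixed updates.
child_code = (
    "import sys\n"
    "for pct in (0, 50, 100):\n"
    "    sys.stderr.write('\\rClustering %d%%' % pct)\n"
    "    sys.stderr.flush()\n"
)
process = subprocess.Popen([sys.executable, "-c", child_code],
                           stderr=subprocess.PIPE)
updates = list(iter_updates(process.stderr))
process.wait()
```

Because the stand-in child prefixes each update with `\r` and never writes a trailing one, the last update only appears thanks to the final flush.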
Deren Eaton
@dereneaton
Apr 26 2016 23:35
try it with clustering instead of derep. It also prints progress for a bunch of junk about reading and counting lines, but then it stops printing progress when it starts clustering.
Isaac Overcast
@isaacovercast
Apr 26 2016 23:39
Clustering 0%  
Clustering 0%  
Clustering 1%  
Clustering 1%  
Clustering 2%  
Clustering 3%  
Clustering 3%  
Clustering 4%  
Clustering 4%  
Clustering 5%  
Clustering 5%  
Clustering 6%  
Clustering 7%  
Clustering 7%  
Clustering 8%
D:
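Once label and percent values like the ones above are available, drawing a custom bar is straightforward. This is only an illustrative sketch: `render_bar` and its `width` parameter are invented here and are not part of ipyrad or vsearch.

```python
def render_bar(label, percent, width=20):
    """Return a one-line text progress bar for a label and a 0-100 percent."""
    percent = max(0, min(100, percent))  # clamp out-of-range values
    filled = width * percent // 100
    return "{} [{}{}] {:3d}%".format(
        label, "#" * filled, "-" * (width - filled), percent
    )

bar = render_bar("Clustering", 8)
```

Writing `"\r" + bar` to stdout without a newline would redraw the bar in place, the same trick vsearch uses for its own counter.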
Deren Eaton
@dereneaton
Apr 26 2016 23:40
wtf
that's what I waaaant
Isaac Overcast
@isaacovercast
Apr 26 2016 23:40
lol!
is the code in a place where you could commit? You could check it in and i could try it on my mac, gather more data...
Deren Eaton
@dereneaton
Apr 26 2016 23:41
yeah, I'll go for it.
Deren Eaton
@dereneaton
Apr 26 2016 23:51
Ok, master has some not-working code in cluster_across.py at line 321
Isaac Overcast
@isaacovercast
Apr 26 2016 23:51
ok, gimme a minute to try it