These are chat archives for dereneaton/ipyrad

18th Oct 2016
Edgardo M. Ortiz
@edgardomortiz
Oct 18 2016 08:42
Hi, I keep getting this error in step 4. It only happens on a single sample, and only when it is analyzed as pairs of reads. The sample has 10X as many reads as the average of the other samples but not many more clusters, so its clusters must hold roughly 10X as many reads as those of the other samples (maybe that is related to the IndexError?). I have even tried subsampling the fastq to 0.1 of the original number of reads and the error keeps happening.
2016-10-18 03:17:27,677 pid=36532 [jointestimate.py] ERROR Sample DIP-Dmeye-1 failed with error IndexError(index 15827 is out of bounds for axis 0 with size 15827)
Is there a way to skip the calculations for a particular sample? The sample did not have problems in step 4 with previous versions.
R2C2.lab
@R2C2_Lab_twitter
Oct 18 2016 13:42
Hi, I am getting an error message in step 7

regina@ccmar-r2c2-01:~/ipyrad2> ipyrad -p params-seqtkpooled.txt -s 7


ipyrad [v.0.4.1]

Interactive assembly and analysis of RAD-seq data

loading Assembly: seqtkPooled
from saved path: ~/ipyrad2/seqtkPooled.json
local compute node: [20 cores] on ccmar-r2c2-01

Step 7: Filter and write output files for 2 Samples
[####################] 100% filtering loci | 0:00:10
[####################] 100% building loci/stats | 0:00:01

Empty varcounts array. Probably no samples passed filtering.

Encountered an unexpected error (see ./ipyrad_log.txt)
Error message is below -------------------------------
max() arg is an empty sequence

any idea?
Deren Eaton
@dereneaton
Oct 18 2016 13:44
@R2C2_Lab_twitter It looks like none of your samples passed filtering. We should change this to print as a warning rather than an error. I would probably recommend lowering the min_samples_locus parameter.
R2C2.lab
@R2C2_Lab_twitter
Oct 18 2016 13:45
3 ## [21] [min_samples_locus]: Min # samples per locus for output
should I run step 7 again after changing [21] in the params file to 2 or even 1?
Deren Eaton
@dereneaton
Oct 18 2016 13:48
No, setting it below 3 is not very useful, since you will only get pairs or singleton loci, which have no phylogenetic information.
Do you have many loci for each of your samples in step 5?
You can check this in the stats output files, or by running ipyrad -p params.txt -r
R2C2.lab
@R2C2_Lab_twitter
Oct 18 2016 13:50

regina@ccmar-r2c2-01:~/ipyrad2> ipyrad -p params-seqtkpooled.txt -r

Summary stats of Assembly seqtkPooled

            state  reads_raw  reads_passed_filter  reads_merged  \
MC-O1seqtk      5    2475250              2472755       2472755
MC-R1seqtk      5    2149888              2147420       2147420

            clusters_total  clusters_hidepth  hetero_est  error_est  \
MC-O1seqtk          919931            123840    0.020721   0.011916
MC-R1seqtk          703525             82026    0.020616   0.010535

            reads_consens
MC-O1seqtk         121555
MC-R1seqtk          79565

Full stats files

step 1: None
step 2: ./seqtkPooled_edits/s2_rawedit_stats.txt
step 3: ./seqtkPooled_clust_0.85/s3_cluster_stats.txt
step 4: ./seqtkPooled_clust_0.85/s4_joint_estimate.txt
step 5: ./seqtkPooled_consens/s5_consens_stats.txt
step 6: None
step 7: None

Deren Eaton
@dereneaton
Oct 18 2016 13:50
@jldimond, the ipyrad vcf output has DP=X for read depth. The Base count is an additional type of information, but you can ignore it with vcftools.
Oh, if you only have two samples then yes, you will have to set min_samples_locus to a max of 2.
R2C2.lab
@R2C2_Lab_twitter
Oct 18 2016 13:51
ok, and should I run just step 7 again, or everything (-s 1,2,3,4,5,6,7) with [21] set to 2?
Deren Eaton
@dereneaton
Oct 18 2016 13:51
Just 7
You may have to use the -f flag
@edgardomortiz For now you could edit the JSON to set that sample's state to 4. Or this could be done in the API too.
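For reference, a minimal sketch of the API route mentioned above (an assumption about the v0.4-era Python API, using ip.load_json(), the samples dict, and Sample.stats; the JSON path here is a placeholder, not the actual file):

## hedged sketch, not a confirmed recipe: mark one sample as already done
## with step 4 so the joint estimate is skipped for it on the next run
import ipyrad as ip

## load the saved assembly from its JSON file (placeholder path)
data = ip.load_json("my_assembly.json")

## set the problem sample's state to 4
data.samples["DIP-Dmeye-1"].stats.state = 4

## write the updated assembly back to disk (assumes the Assembly.save()
## method available in this version of the API)
data.save()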
R2C2.lab
@R2C2_Lab_twitter
Oct 18 2016 13:57
@dereneaton like this? ipyrad -p params-seqtkpooled.txt -s 7 -f
Deren Eaton
@dereneaton
Oct 18 2016 14:25
yep
R2C2.lab
@R2C2_Lab_twitter
Oct 18 2016 14:25
thanks!
cwessinger
@cwessinger
Oct 18 2016 16:27
@isaacovercast I've rerun step 3 and moved through the chunking stage very quickly this time. Strange. I'll let you know whether this happens again. Thanks!
Jay Dimond
@jldimond
Oct 18 2016 16:44
@dereneaton - the DP output shows depth across all samples. I was hoping to get that information for each sample. In VCFtools, --geno-depth will summarize DP counts for each locus in each sample, but only if DP is in the FORMAT field. Incidentally, doing an internet search I came across this file in the ipyrad repository: https://github.com/dereneaton/ipyrad/blob/master/ipyrad/file_conversion/loci2vcf.py. It shows the type of format VCFtools would need, but it is not the format I am getting in my ipyrad output.
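(As an aside, a minimal sketch of what per-sample depth extraction looks like once DP is in the FORMAT column, per the VCF spec; the function and file name are made up for illustration and are not part of ipyrad or VCFtools:)

## hedged sketch: read per-sample DP from a VCF whose FORMAT column
## includes DP; the file name is a placeholder
def per_sample_depths(vcf_path):
    samples = []
    with open(vcf_path) as vcf:
        for line in vcf:
            if line.startswith("##"):
                continue
            fields = line.rstrip("\n").split("\t")
            if line.startswith("#CHROM"):
                ## sample names begin at the 10th column of the header line
                samples = fields[9:]
                continue
            fmt = fields[8].split(":")
            if "DP" not in fmt:
                ## DP is only in INFO, not per sample; skip this site
                continue
            idx = fmt.index("DP")
            depths = [geno.split(":")[idx] for geno in fields[9:]]
            yield fields[0], fields[1], dict(zip(samples, depths))

## usage: print the depth of every sample at every site
for chrom, pos, depths in per_sample_depths("data.vcf"):
    print(chrom, pos, depths)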
James McDaniel
@jlmcdaniel
Oct 18 2016 19:51
Hi @isaacovercast, I updated to the newest version of ipyrad and sometimes I get the Can't build targets without any engines error. This error never occurred before. Do you have any idea what might be causing it? Thanks!
My bash script looks almost identical to the one Deren helped me prepare: https://gist.github.com/dereneaton/6c6bfe6e487eec49cb0731bc9c3565ac. However, I had to add a line for ipcluster because the condor job fails without it and gives the error Can't find any instance of ipcluster.
Isaac Overcast
@isaacovercast
Oct 18 2016 20:11
@jlmcdaniel If you run ipcluster manually then you need to pass the --ipcluster flag to ipyrad (http://ipyrad.readthedocs.io/HPC_script.html?highlight=HPC#running-ipcluster-by-hand). Like this:
## start an ipcluster instance with --profile=ipyrad
$ ipcluster start --n 48 --profile=ipyrad --daemonize

## run ipyrad with --ipcluster flag so it knows to look for
## that specific ipcluster instance
$ ipyrad -p params-test.txt -s 3 --ipcluster
James McDaniel
@jlmcdaniel
Oct 18 2016 20:29
@isaacovercast Yes, sorry, my bash script looks identical to yours. I get the Can't build targets without any engines error with it. It only happens every few jobs, though; it doesn't happen every time. I do lower the --n flag based on the number of cores I run. Could this be the issue? Should --n always be 48? Thanks!
To be more precise, this is my exact bash script:
#!/bin/bash

## load .bashrc
source /mnt/gluster/jlmcdaniel/.bashrc

## start ipcluster
ipcluster start --n 4 --profile=ipyrad --daemonize

## give ipcluster time to start running
sleep 60

## navigate to directory with your params file
cd /mnt/gluster/jlmcdaniel/ipyrad/branching/txt_files/params

## run ipyrad
ipyrad -p params-job2.txt -s 2 --ipcluster