These are chat archives for dereneaton/ipyrad
`-d` flag, it will attempt to cluster the samples that failed and will also enable debug output. After you do this, email me the ipyrad_log.txt file. I'll PM you my email address...
`maxalleles=2` and n-ploid as well) and all get clustered in step 6?
`max_alleles_consens` haplotypes are filtered out. Deren, can you chime in here and let me know if I got this right?
`max_alleles_consens=1`, in which case the step 4 heterozygosity estimate will be fixed to zero, the error rate will absorb all of the variation within sites, and the step 5 base calls will be haploid calls. For all other values of `max_alleles_consens`, base calls are made using the diploid model with the H and E values estimated in step 4. After base calls are made at each site, ipyrad counts the number of alleles in each cluster. Like Edgardo said, this value is now simply stored in step 5 and used later in step 7 to filter loci, under the assumption that if a locus has paralogs in one sample, it probably has them in other samples too, but there just wasn't enough variation to detect them. We don't currently have a way to apply the filter in step 5 instead of step 7 (as was done in pyrad), but we could make that an option.
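To make the step 5 / step 7 split concrete, here is a minimal Python sketch. It is not ipyrad's actual code. It assumes a cluster is a list of aligned, equal-length reads and that heterozygous sites are marked in the consensus with IUPAC ambiguity codes. The names `count_alleles`, `filter_loci`, and the `min_depth` cutoff are hypothetical and only illustrate the idea. Step 5 just records each sample's allele count, and step 7 then drops a locus if any sample exceeds `max_alleles_consens`.

```python
from collections import Counter

# IUPAC two-base ambiguity codes, used here (by assumption) to mark het sites
AMBIGUOUS = set("RYSWKM")


def count_alleles(reads, consensus, min_depth=2):
    """Count distinct haplotypes across the heterozygous sites of one cluster.

    Hypothetical simplification: `reads` are aligned, equal-length strings;
    haplotypes seen in fewer than `min_depth` reads are treated as errors.
    """
    het_sites = [i for i, base in enumerate(consensus) if base in AMBIGUOUS]
    if not het_sites:
        return 1  # no heterozygous sites: a single allele
    haplotypes = Counter("".join(read[i] for i in het_sites) for read in reads)
    return sum(1 for depth in haplotypes.values() if depth >= min_depth)


def filter_loci(locus_allele_counts, max_alleles_consens):
    """Step-7-style filter: drop a locus if ANY sample exceeds the limit.

    `locus_allele_counts` maps locus -> {sample: allele count}.
    """
    return {
        locus: counts
        for locus, counts in locus_allele_counts.items()
        if all(n <= max_alleles_consens for n in counts.values())
    }


if __name__ == "__main__":
    reads = ["ACGT", "ACGT", "ATGT", "ATGT", "AAGT"]  # last read: likely an error
    print(count_alleles(reads, "AYGT"))
    loci = {"loc1": {"sampA": 2, "sampB": 2}, "loc2": {"sampA": 2, "sampB": 3}}
    print(list(filter_loci(loci, max_alleles_consens=2)))
```

Here `loc2` is removed for every sample because one sample shows three alleles, which is the "paralog in one sample implies paralog everywhere" assumption described above.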
`max_alleles_consens` at step 5 would also be a great option, especially when combining distantly related species, where I can't assume that a paralog in one species will also be a paralog in the rest. In fact, the current behavior is leaving me with just a small fraction of the loci, and most of them are nearly invariant. I know ddRAD is not the best for phylogenetics among species, but I think this option would at least help recover more loci from this kind of data. It would also help identify problematic samples that have too many "polyploid" consensus sequences.
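The per-sample alternative proposed here could look roughly like the following hypothetical Python sketch. As far as this thread says, ipyrad has no such option yet, and `filter_per_sample`, `flag_samples`, and the 5% threshold are illustrative assumptions. Instead of removing a locus everywhere, it drops only the offending consensus sequences within each sample. It also reports the fraction of each sample's consensus sequences that exceeded the limit, so problematic samples can be flagged.

```python
def filter_per_sample(sample_allele_counts, max_alleles_consens):
    """Hypothetical step-5-style filter applied within each sample.

    `sample_allele_counts` maps sample -> {locus: allele count}.
    Returns (kept, frac_excluded): the surviving consensus sequences per
    sample, and the fraction of each sample's consensus sequences excluded.
    """
    kept = {}
    frac_excluded = {}
    for sample, counts in sample_allele_counts.items():
        kept[sample] = {
            locus: n for locus, n in counts.items() if n <= max_alleles_consens
        }
        total = len(counts)
        excluded = total - len(kept[sample])
        frac_excluded[sample] = excluded / total if total else 0.0
    return kept, frac_excluded


def flag_samples(frac_excluded, threshold=0.05):
    """Flag samples whose share of "polyploid" consensus sequences is too high.

    The default threshold is an arbitrary illustrative value.
    """
    return sorted(s for s, frac in frac_excluded.items() if frac > threshold)


if __name__ == "__main__":
    counts = {
        "sp1": {"loc1": 2, "loc2": 3},
        "sp2": {"loc1": 2, "loc2": 2, "loc3": 1, "loc4": 2},
    }
    kept, frac = filter_per_sample(counts, max_alleles_consens=2)
    print(kept)
    print(frac)
    print(flag_samples(frac))
```

With these inputs, `sp2` keeps its copy of `loc2` even though `sp1`'s copy was excluded, whereas a step 7 style rule would drop `loc2` for both species. `sp1` is flagged because half of its consensus sequences exceeded the limit.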