These are chat archives for dereneaton/ipyrad

Sep 2018
Sep 24 2018 12:23
Hi, would it be possible to output the s7 filtering stats separately for every sample in my assembly? I am particularly interested in the number (or proportion) of loci filtered_by_max_alleles in each sample. In my dataset of ca. 220 individuals, ca. 20% of all pre-filtered loci were labeled as filtered_by_max_alleles (i.e. more than 2 alleles, as my organism is diploid). On the other hand, if I assemble only a subset of 16 individuals, only 7% of loci are labeled as filtered_by_max_alleles, and those same 16 individuals end up with only half as many loci in the complete assembly as in the 16-sample assembly. I suspect that some of the samples in the complete dataset may be cross-contaminated and would therefore show more than 2 alleles at many loci. But because the contamination is presumably "low-divergence", many loci would still pass the filter, i.e. those that are not variable between the sample and the contaminant. I thought that if I could see the distribution of loci with 3 or more alleles across individuals, I could get a hint about which samples might be cross-contaminated. (I tested clustering with 85% and 90% similarity thresholds, which gave a slight increase in the proportion of loci filtered_by_max_alleles in the 90% assembly; min_sample was 4 in all cases.) Thank you very much for any advice.
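A minimal sketch of the per-sample tally described above, assuming you already have, for each locus, the number of alleles observed in each sample. The input structure and the function name `max_allele_fraction` are hypothetical illustrations, not an ipyrad output or API. Samples whose fraction stands well above the rest would be candidates for a closer contamination check.

```python
from collections import Counter


def max_allele_fraction(loci, max_alleles=2):
    """Return, per sample, the fraction of its loci with > max_alleles alleles.

    loci: hypothetical list of dicts, one per locus, mapping
          sample name -> number of alleles seen in that sample.
    """
    present = Counter()  # loci in which each sample appears
    excess = Counter()   # loci in which each sample exceeds max_alleles
    for locus in loci:
        for sample, n_alleles in locus.items():
            present[sample] += 1
            if n_alleles > max_alleles:
                excess[sample] += 1
    return {s: excess[s] / present[s] for s in present}


# Toy data: sample C shows >2 alleles at two of its three loci.
loci = [
    {"A": 1, "B": 2, "C": 3},
    {"A": 2, "B": 1, "C": 4},
    {"A": 1, "C": 2},
]
fractions = max_allele_fraction(loci)
# Sort so the samples with the most excess-allele loci come first.
ranked = sorted(fractions.items(), key=lambda kv: kv[1], reverse=True)
print(ranked)  # → [('C', 0.6666666666666666), ('A', 0.0), ('B', 0.0)]
```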
Sep 24 2018 13:41
@eaton-lab Oh sorry, I didn't know! Perfect! Thank you very much!
Sep 24 2018 18:41

This may be something I am overlooking, but I am unable to get through step 3 (I am working in an HPC environment). Steps 1 and 2 worked fine, but I receive the following error for step 3:

IPyradError(vsearch v2.0.3_linux_x86_64, 15.5GB RAM, 32 cores
Reading file /dev/stdin 100%
310391941 nt in 2145951 seqs, min 35, max 145, avg 145

Thanks in advance for any suggestions!

Sep 24 2018 18:53
@eaton-lab Is there an easy way to convert between ipyrad and pyrad file formats, or do I need to start from scratch?
Isaac Overcast
Sep 24 2018 19:43
@dejsha There's no straightforward way to do this. The best way would be either to hack the step 7 code yourself, or to write a custom script that parses the *_across/catclust.gz file.
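One possible starting point for such a custom script is sketched below. It assumes (not confirmed here) that catclust.gz is a gzipped FASTA-like file in which clusters are separated by `//` lines, headers look like `>sample_index`, and heterozygous sites in consensus sequences are written as two-base IUPAC codes. Because consensus sequences cannot directly show a third allele, it reports a rough proxy (loci with at least `min_hets` heterozygous sites per sample), not ipyrad's own max_alleles filter. The functions `iter_clusters` and `het_loci_per_sample` are hypothetical names; check the layout of your own file before relying on the numbers.

```python
import gzip
from collections import defaultdict

# Two-base IUPAC ambiguity codes used for heterozygous sites (assumption).
HETS = set("RYSWKM")


def iter_clusters(path):
    """Yield each cluster as a list of (header, sequence) pairs.

    Assumes headers start with '>', each is followed by one sequence line,
    and clusters are separated by lines starting with '//'.
    """
    cluster = []
    with gzip.open(path, "rt") as handle:
        lines = iter(handle)
        for line in lines:
            line = line.strip()
            if not line:
                continue
            if line.startswith("//"):
                if cluster:  # consecutive '//' lines yield nothing extra
                    yield cluster
                    cluster = []
            elif line.startswith(">"):
                seq = next(lines, "").strip()
                cluster.append((line[1:], seq))
    if cluster:
        yield cluster


def het_loci_per_sample(path, min_hets=1):
    """Map sample -> (loci with >= min_hets het sites, loci present)."""
    present = defaultdict(int)
    het_rich = defaultdict(int)
    for cluster in iter_clusters(path):
        for header, seq in cluster:
            # Assumed header layout '<sample>_<index>'; rsplit keeps
            # underscores inside sample names intact.
            sample = header.rsplit("_", 1)[0]
            present[sample] += 1
            if sum(base in HETS for base in seq.upper()) >= min_hets:
                het_rich[sample] += 1
    return {s: (het_rich[s], present[s]) for s in present}


# Tiny demo file in the assumed layout.
demo = "demo_catclust.gz"
with gzip.open(demo, "wt") as out:
    out.write(">s1_0\nACGTR\n>s2_0\nACGTA\n//\n//\n"
              ">s1_1\nAYGTA\n>s2_1\nACGTA\n//\n//\n")
counts = het_loci_per_sample(demo)
print(counts)  # → {'s1': (2, 2), 's2': (0, 2)}
```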
@heather340 Can you re-run step 3 with the -d flag and post the last several lines of the ipyrad_log.txt file? Also, it could just be a simple resource allocation issue. It seems you have 15.5GB of RAM and 32 cores, but we recommend 4GB of RAM per core for normal operation.
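The 4GB-per-core guideline above translates into a quick calculation: with 15.5GB of RAM, about 3 cores fit the recommendation rather than 32. The helper `cores_for_ram` below is a hypothetical illustration of that arithmetic, not part of ipyrad.

```python
def cores_for_ram(ram_gb, gb_per_core=4.0, max_cores=None):
    """Largest core count that keeps at least gb_per_core GB of RAM per core."""
    n = max(1, int(ram_gb // gb_per_core))  # always allow at least one core
    if max_cores is not None:
        n = min(n, max_cores)  # never exceed the cores actually available
    return n


print(cores_for_ram(15.5, max_cores=32))  # → 3
```

The result could then be passed to ipyrad's core-count option, e.g. `ipyrad -p params-data.txt -s 3 -c 3`, where `params-data.txt` is a placeholder for your own params file.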
@mcfaddenlab I assume you mean the .loci file output by pyrad. There's no way to do this conversion short of writing a custom script. It's probably easier to just redo the assembly.
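If you do go the custom-script route, a first step would be reading the pyrad .loci file into per-locus dictionaries, from which any target format could be written. The sketch below assumes lines of the form `>name   SEQUENCE` and a separator line starting with `//` after each locus; `read_loci` is a hypothetical helper, and the layout should be checked against your own file.

```python
def read_loci(path):
    """Return a list of loci, each a dict mapping sample name -> sequence."""
    loci, current = [], {}
    with open(path) as handle:
        for line in handle:
            if line.startswith("//"):
                # Separator line (may also carry variant markers and a locus id).
                if current:
                    loci.append(current)
                    current = {}
            elif line.startswith(">"):
                fields = line[1:].split()
                # Assumed layout: first field is the name, last is the sequence.
                current[fields[0]] = fields[-1]
    if current:
        loci.append(current)
    return loci


# Tiny demo file in the assumed layout.
demo = "demo.loci"
with open(demo, "w") as out:
    out.write(">s1     ACGTA\n>s2     ACGTT\n//         *|1\n"
              ">s1     GGCCA\n>s3     GGCCA\n//          |2\n")
loci = read_loci(demo)
print(len(loci))  # → 2
```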