These are chat archives for dereneaton/ipyrad
rm_duplicates are loci with multiple hits within one sample during the "clustering across" step. You can think of these as paralogs or pseudo-paralogs. The reliability of datasets produced with versions < 0.7.16 is not guaranteed. I would expect the extreme amount of rm_duplicates filtering to be highly biased, so I'd imagine you'd want to redo your assembly if you have the option. I would recommend rerunning from step 3 for the most accurate results. On the upside, you're going to get a LOT more loci/SNPs with the fixed version.
ls -l in the ipyrad working directory, as well as the *_fastqs directory inside the working directory? Also, can you paste in the first 6 or 7 lines of your params file?
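If it helps, here is a rough sketch that gathers all of that in one go. The function name show_assembly_info and its two arguments (the working directory and the path to the params file) are made up for illustration and are not part of ipyrad; it only wraps ls -l and head.

```shell
# Hypothetical helper; the name and argument order are illustrative only.
# Usage: show_assembly_info <working_dir> <params_file>
show_assembly_info() {
    workdir="$1"
    params="$2"
    echo "== ls -l $workdir =="
    ls -l "$workdir"
    # List every *_fastqs directory directly inside the working directory.
    for d in "$workdir"/*_fastqs; do
        [ -d "$d" ] || continue   # skip if the glob matched nothing
        echo "== ls -l $d =="
        ls -l "$d"
    done
    echo "== first 7 lines of $params =="
    head -n 7 "$params"
}
```

Redirecting the output to a file (for example show_assembly_info . path/to/your/params/file > info.txt, where both paths are placeholders) makes it easy to paste back here.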
filter_min_trim_len, so double check your value here isn't too high. We can take a look at some of the clusters found during step 6 to see what they look like. If you look in your working directory there should be a directory that looks like <your_assembly>_across, and inside this directory is a file that ends with _catclust.gz. Can you execute the following command and email me the output?
gunzip -c *_across/*_catclust.gz | head -n 50
cd to your working directory for this to work.
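If you'd rather not worry about which directory you're in, a small wrapper along these lines should do the same thing. This is only a sketch: peek_catclust is a made-up name, and it assumes the *_across/*_catclust.gz layout described above.

```shell
# Hypothetical helper; the function name is illustrative only.
# Usage: peek_catclust [working_dir] [number_of_lines]
peek_catclust() {
    workdir="${1:-.}"
    nlines="${2:-50}"
    # Run in a subshell so the caller's current directory is left unchanged.
    (
        cd "$workdir" || exit 1
        for f in *_across/*_catclust.gz; do
            # If the glob matched nothing, report it instead of passing
            # the literal pattern to gunzip.
            [ -e "$f" ] || { echo "no *_catclust.gz found under $workdir/*_across" >&2; exit 1; }
            gunzip -c "$f" | head -n "$nlines"
        done
    )
}
```

For example, peek_catclust /path/to/working_dir > catclust_head.txt (the path is a placeholder) writes the first 50 lines to a file you can attach to an email.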