These are chat archives for dereneaton/ipyrad

Dec 2016
Dec 16 2016 13:19

Hi all, I have a question about whether a locus can be kept only if it is present in a minimum number of individuals per population.

In the old pyRAD it was possible to specify that a locus/allele should be reported in the final output only if it was represented in a minimum number of individuals per population (defining the populations in the hierarchical clustering at the bottom of the param file).
In ipyrad there is option 21, min_samples_locus, which is the minimum number of samples that must have data at a given locus for it to be retained in the final data set, but this is counted across all samples. For example, in our case we have 11 populations with 10 individuals per population, and we want loci that are in at least 5 individuals per population. If we set min_samples_locus to 5, we get loci that are represented in a random selection of the 110 individuals, with most of the populations not represented. If we set min_samples_locus to 55, the final loci can still include individuals from some populations but not from others.

Is there any option in ipyrad to specify a minimum number of individuals per population? I can't find any mention of it in the manual.

Also, what is the popfile in ipyrad used for? The manual describes how the file should be formatted, but I don't understand its role in step 7. I would imagine that something could be done at this stage to specify a value per population?
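
The per-population criterion described above can be sketched in plain Python. This is only an illustration of the filtering logic, not ipyrad code: the function name `passes_min_per_pop`, the sample names, and the population names are all made up for the example.

```python
from collections import Counter


def passes_min_per_pop(samples_with_data, sample_to_pop, min_per_pop):
    """Return True if every population has at least `min_per_pop`
    samples with data at this locus (hypothetical helper, not ipyrad API)."""
    # Count, per population, how many samples have data at the locus.
    counts = Counter(sample_to_pop[s] for s in samples_with_data)
    # Every population in the mapping must reach the threshold.
    all_pops = set(sample_to_pop.values())
    return all(counts[pop] >= min_per_pop for pop in all_pops)


# 11 populations with 10 individuals each, as in the question (names invented).
sample_to_pop = {
    f"pop{p}_ind{i}": f"pop{p}" for p in range(11) for i in range(10)
}

# Locus A: 5 individuals from every population have data (55 samples).
locus_a = [f"pop{p}_ind{i}" for p in range(11) for i in range(5)]

# Locus B: also 55 samples, but concentrated in the first six populations,
# so a global threshold of 55 would keep it.
locus_b = [f"pop{p}_ind{i}" for p in range(5) for i in range(10)]
locus_b += [f"pop5_ind{i}" for i in range(5)]

keep_a = passes_min_per_pop(locus_a, sample_to_pop, 5)
keep_b = passes_min_per_pop(locus_b, sample_to_pop, 5)
```

Both loci have data for 55 samples, so a single global threshold cannot tell them apart; only the per-population count separates locus A from locus B.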

Todd Pierson
Dec 16 2016 19:26
Simple question for a stupid mistake: is there an easy way to regenerate a .json file (from the outputs and statistics files for each step that are still in place) if it's accidentally deleted?
Deren Eaton
Dec 16 2016 19:47
@twpierson There is not a very simple way to do it. I would recommend restarting from step 1 and loading in your demultiplexed data by using the 'sorted_fastq_path'.
Todd Pierson
Dec 16 2016 19:48
