These are chat archives for dereneaton/ipyrad

4th
Oct 2017
Ollie White
@Ollie_W_White_twitter
Oct 04 2017 08:14

Thanks @isaacovercast for the suggestion regarding denovo-reference assembly. I have got some error messages following step 3 using the denovo-reference method.

Fatal error: Invalid line 3133846 in FASTQ file: Unexpected end of file
)

and

OSError([Errno 2] No such file or directory: '/scratch/oww1v14/ipyrad/data1_edits/frs234-refmap_derep.fastq')

When I checked this directory, it only contained frs234_derep.fastq.

Do you have any suggestions on where I might have gone wrong? It's odd because one of my branches ran step 3 no problem.

Cheers
Ollie

Isaac Overcast
@isaacovercast
Oct 04 2017 15:42
@congliu0514 Did you re-run the whole assembly from step 1 with max_alleles_consens = 1 or did you just re-run step 7?
Isaac Overcast
@isaacovercast
Oct 04 2017 17:03
@rfolkert This behavior meets my expectations. If you specify 12 samples per population for 14 populations, this would drastically reduce the number of loci recovered. Specifying 12 for min_samples_locus outputs all loci for which there are 12 or more samples in total, which is much more permissive. Does this make sense?
@Ollie_W_White_twitter it looks like a corrupt fastq file. My guess is that you're running out of disk space (either the disk is full or you're hitting a quota), and then the derep file is getting truncated. Can you verify that you have plenty of disk space?
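
A minimal sketch of that kind of check, assuming Python 3 and an uncompressed FASTQ file with four lines per record. The helper names `fastq_looks_complete` and `free_gib` are hypothetical and not part of ipyrad. The sketch only tests whether a file looks truncated and how much free space the filesystem reports; it does not check per-user quotas.

```python
import shutil


def fastq_looks_complete(path):
    """Return True if the file looks like whole 4-line FASTQ records.

    Hypothetical helper: a truncated file usually ends mid-record, so the
    line count is not a multiple of 4 or the last quality line is shorter
    than its sequence line.
    """
    n_lines = 0
    seq_len = None
    with open(path, "rt") as handle:
        for n_lines, line in enumerate(handle, start=1):
            text = line.rstrip("\r\n")
            pos = n_lines % 4
            if pos == 1 and not text.startswith("@"):
                return False  # header line must start with '@'
            elif pos == 2:
                seq_len = len(text)  # remember the sequence length
            elif pos == 3 and not text.startswith("+"):
                return False  # separator line must start with '+'
            elif pos == 0 and len(text) != seq_len:
                return False  # quality string must match sequence length
    # An empty file is treated as incomplete.
    return n_lines > 0 and n_lines % 4 == 0


def free_gib(path):
    """Free space, in GiB, on the filesystem that holds `path`."""
    return shutil.disk_usage(path).free / 2**30
```

Calling `fastq_looks_complete` on the derep file and `free_gib` on the edits directory should show whether the file was cut short and whether the disk is nearly full.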
rfolkert
@rfolkert
Oct 04 2017 18:00
@isaacovercast If I understand correctly: setting min_samples_locus to 12 recovers loci which are found at least 12 times in every population. I am interested in comparing these 14 populations and doing population analyses on the data (Fst, lfmm). Is it also possible to run them all in one analysis while also recovering loci that are not found within all populations, so that I have all the data in one VCF file? I could set min_samples_locus to 1 or to 0 maybe...
Isaac Overcast
@isaacovercast
Oct 04 2017 18:57
@rfolkert min_samples_locus set to 12 recovers loci which are found in at least 12 samples across ALL samples combined. The behavior you suggest is what the population assignments file is used for.
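
To illustrate the difference, here is a toy sketch in plain Python. It is not ipyrad's implementation: the function names, the sample names, and the set-based data layout are all assumptions made for this example. The first filter counts samples over the whole dataset, while the second requires a minimum count within every population.

```python
def passes_min_samples_locus(samples_with_data, min_samples_locus):
    """Global filter: enough samples in total, regardless of population."""
    return len(samples_with_data) >= min_samples_locus


def passes_population_minimums(samples_with_data, populations, pop_minimums):
    """Per-population filter: every population must meet its own minimum."""
    for pop, members in populations.items():
        present = len(samples_with_data & members)
        if present < pop_minimums[pop]:
            return False
    return True


# Hypothetical toy data: two populations with three samples each.
populations = {
    "pop1": {"a1", "a2", "a3"},
    "pop2": {"b1", "b2", "b3"},
}

# A locus with data in all of pop1 but only one sample of pop2.
locus = {"a1", "a2", "a3", "b1"}

# Four samples in total, so a global minimum of 3 keeps the locus.
kept_globally = passes_min_samples_locus(locus, 3)

# pop2 has only one sample with data, so a minimum of 3 per population drops it.
kept_per_population = passes_population_minimums(
    locus, populations, {"pop1": 3, "pop2": 3}
)
```

In this toy model the same locus is kept by the global filter and dropped by the per-population filter, which is why per-population minimums across many populations are much stricter.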
rfolkert
@rfolkert
Oct 04 2017 19:37
@isaacovercast yes, thank you, I should have been clearer about the difference between min_samples_locus and the pop_assign_file. # pop1:12 pop2:12 etc. searches for loci recovered a minimum of 12 times in each population. But my question still remains: if I set # pop1:0 pop2:0 etc., would it then recover all loci for all populations?
eachambers
@eachambers
Oct 04 2017 20:23

@isaacovercast @dereneaton
Hi Deren and Isaac:
I'm running 12 samples of 2bRAD data, and have had a few issues along the way. At Step 2, I got a Python egg cache error (described in dereneaton/ipyrad#197), which I was able to resolve by downgrading to ipyrad v.0.7.1 (instead of 0.7.15). I also had an error at Step 6 (described in dereneaton/ipyrad#253), which was resolved by downgrading numpy to v.1.12.1. Now, the process is stopping at Step 7, and the error is:

error in filter_stacks on chunk 0: IOError(Unable to open file (Unable to lock file, errno = 11, error message = 'resource temporarily unavailable'))

After looking at the source code, it seems like this might be an issue with async, but I can't figure out how to fix it. I tried upgrading numpy, but that didn't work (same error message). Any help with this would be much appreciated. Also, I'm using Python 2.7.