These are chat archives for dereneaton/ipyrad

22nd
Feb 2018
Isaac Overcast
@isaacovercast
Feb 22 2018 00:57
@toczydlowski I know for a fact that the snps are correct. The locus numbers and positions are correct. In terms of the NS=261;DP=1034 values I'm not positive, I'll have to look. But yeah, I think the depths in the vcf are unreliable right now. mindepth is definitely being honored, I'm certain of this. Sorry the depths are wonky right now.
@ivanprates Deren pushed the new version, v.0.7.23, with the fixed min_trim_len.
tim-oconnor
@tim-oconnor
Feb 22 2018 05:41
Hi @isaacovercast and @eatonlab. I have post-filtered the loci in my assemblies and would like to generate .str, .phy, and .nex files from just a subset of the original loci. Are there internal functions in ipyrad that accept a list of loci (or, alternatively, a .alleles.loci file) and output other file formats? I could probably write this but don't want to reinvent the wheel. Thanks for any guidance.
Isaac Overcast
@isaacovercast
Feb 22 2018 14:02
@tim-oconnor There's not really a straightforward way to do this. All the output formats are generated from the internal hdf5 files after applying filters, so there's no code in the codebase for turning a .loci formatted file into .str or .phy. I'm pretty sure pyrad V3 (the original version) used the .loci format as the starting point for converting to other formats, so you might check the github for the original version for python scripts that do these conversions. They should still work.
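For readers who want to see the shape of such a conversion, here is a minimal sketch. It is not ipyrad or pyrad code. It assumes a .loci-style layout in which each data line is a sample name followed by whitespace and an aligned sequence, and each locus ends with a line starting with `//`. The function names `parse_loci` and `to_phylip` are hypothetical, and `keep` selects loci by their 0-based position in the file, not by ipyrad's locus IDs.

```python
from collections import OrderedDict

def parse_loci(text):
    """Split .loci-style text into a list of {sample: sequence} dicts, one per locus."""
    loci, current = [], OrderedDict()
    for line in text.splitlines():
        if line.startswith("//"):          # assumed locus separator line
            if current:
                loci.append(current)
            current = OrderedDict()
        elif line.strip():
            name, seq = line.split()[:2]
            current[name.lstrip(">")] = seq  # tolerate an optional ">" name prefix
    if current:
        loci.append(current)
    return loci

def to_phylip(loci, keep=None):
    """Concatenate the selected loci into a relaxed PHYLIP string.

    Samples missing from a locus are padded with N for that locus.
    """
    chosen = [loc for i, loc in enumerate(loci) if keep is None or i in keep]
    names = sorted({name for loc in chosen for name in loc})
    rows = {name: [] for name in names}
    for loc in chosen:
        length = len(next(iter(loc.values())))
        for name in names:
            rows[name].append(loc.get(name, "N" * length))
    seqs = {name: "".join(parts) for name, parts in rows.items()}
    nsites = len(next(iter(seqs.values()))) if seqs else 0
    width = max(len(n) for n in names) + 2 if names else 0
    lines = ["{} {}".format(len(names), nsites)]
    lines += [name.ljust(width) + seqs[name] for name in names]
    return "\n".join(lines) + "\n"

# Made-up two-locus example, not real data.
example = """\
sampleA     ACGTAC
sampleB     ACGTAT
//               *|0|
sampleA     GGCA
sampleC     GGTA
//            -   |1|
"""

if __name__ == "__main__":
    print(to_phylip(parse_loci(example), keep={0, 1}))
```

Passing a post-filtered set of positions as `keep` restricts the output to that subset. A .alleles.loci file or a .str output would need extra handling that this sketch does not attempt.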
tim-oconnor
@tim-oconnor
Feb 22 2018 15:40

@isaacovercast Thanks, I'll look into it! Another, unrelated issue: a friend and I have both noticed that using the denovo-reference option can cause assemblies to fail in step 3. Sometimes the assembly just hangs after building clusters (100% for many hours), and other times I get one of the two error messages below. Confusingly, re-running the same code can sometimes get the assembly to completion. A straight denovo assembly with otherwise identical parameters avoids all these problems. I'm using ipyrad v.0.7.22. I tried to update to v.0.7.23 and try again, but conda install -c ipyrad ipyrad tells me all packages are already installed.

-------------------------------------------------------------
  ipyrad [v.0.7.22]
  Interactive assembly and analysis of RAD-seq data
 -------------------------------------------------------------
  loading Assembly: s345-6x-c85
  from saved path: /global/scratch/toconnor/ipyrad/larrea/s345-6x-c85.json
  establishing parallel connection:
  host compute node: [22 cores] on n0058.savio2

  Step 3: Clustering/Mapping reads
  [####################] 100%  indexing reference    | 0:00:02  
  [####################] 100%  dereplicating         | 0:00:14  
  [####################] 100%  mapping               | 0:00:34  
  [####################] 100%  clustering            | 0:57:45  
  [####################] 100%  building clusters     | 0:00:03  ERROR:ipyrad.core.assembly:IOError([Errno 2] No such file or directory: '/global/scratch/toconnor/ipyrad/larrea/s345-6x-c85_clust_0.85/CAI_514_6x.utemp.sort')


  Encountered an unexpected error (see ./ipyrad_log.txt)
  Error message is below -------------------------------
IOError([Errno 2] No such file or directory: '/global/scratch/toconnor/ipyrad/larrea/s345-6x-c85_clust_0.85/CAI_514_6x.utemp.sort')

Alternatively, I sometimes get this error, which seems directly related

 -------------------------------------------------------------
  ipyrad [v.0.7.22]
  Interactive assembly and analysis of RAD-seq data
 -------------------------------------------------------------
  loading Assembly: s345-6x-c90
  from saved path: /global/scratch/toconnor/ipyrad/larrea/s345-6x-c90.json
  establishing parallel connection:
  host compute node: [22 cores] on n0058.savio2

  Step 3: Clustering/Mapping reads
  [####################] 100%  indexing reference    | 0:00:02  
  [####################] 100%  dereplicating         | 0:00:13  
  [####################] 100%  mapping               | 0:00:33  
  [####################] 100%  clustering            | 1:05:01  
  [####################] 100%  building clusters     | 0:01:49  

  Encountered an unexpected error (see ./ipyrad_log.txt)
  Error message is below -------------------------------
IOError([Errno 2] No such file or directory: '/global/scratch/toconnor/ipyrad/larrea/s12_edits/T5Q4_2038_6x-refmap_derep.fastq')
@isaacovercast Sorry, I was trying to say that the second error looks to be directly related to the reference mapping.
Deren Eaton
@dereneaton
Feb 22 2018 15:57
@tim-oconnor thanks, we'll have to look into it. I don't think we've done rigorous testing on the denovo-reference method lately and it's possible some recent changes to other things introduced some kind of error.
Isaac Overcast
@isaacovercast
Feb 22 2018 16:59
@tim-oconnor That seems bizarre that it's failing on different files. That sort of feels like a resource allocation issue, but it's weird that it works for denovo.... Can you post the last few lines of the ipyrad_log.txt?
toczydlowski
@toczydlowski
Feb 22 2018 22:56
@isaacovercast Thanks for the confirmation. NS looks to be correct (I've checked internally and against .loci), but DP cannot be correct, because with a mindepth of 8 and 261 samples (previous example), the minimum for DP would be 2,088, which is about double what the .vcf is reporting (DP = 1034).
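The arithmetic behind that lower bound can be checked with a short sketch. It assumes every sample counted in NS contributes at least mindepth reads to DP. The function name `min_total_depth` is hypothetical, and the numbers are the ones quoted in the message.

```python
def min_total_depth(n_samples, min_depth):
    """Smallest possible summed depth if every sample has at least min_depth reads."""
    return n_samples * min_depth

ns, mindepth, reported_dp = 261, 8, 1034   # values quoted in the chat
lower_bound = min_total_depth(ns, mindepth)

print(lower_bound)                           # 2088
print(reported_dp >= lower_bound)            # False: reported DP is below the bound
print(round(lower_bound / reported_dp, 2))   # 2.02, i.e. the bound is about double
```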
toczydlowski
@toczydlowski
Feb 22 2018 23:48
Calling the hive mind - I found a distinct pattern in my GSB (de novo) data set: ipyrad finds many more singleton and doubleton SNPs than Stacks, but a very similar total number of loci. I filtered the SNPs from each program to 75% or greater coverage across individuals (I did this in R, so it is actually on a per-SNP basis, not per locus like ipyrad does initially), removed SNPs with heterozygosity above 50%, and filtered to 1 random SNP per locus, which gives about 6,500 SNPs (300 individuals). For ipyrad 0.7.13, clust of 0.97, singletons are at about 13% frequency, doubletons 15%, tripletons 3%. For Stacks 1.30, "clust of 0.97" (M and n = 3), singletons and doubletons are each at about 4.5% frequency, tripletons also 3% (like ipyrad). If I bring ipyrad all the way down to a clust of 0.85, singletons are now 15% and doubletons 13%, tripletons still 3%. So changing params within ipyrad doesn't seem to affect this a great deal, but there is this big difference between Stacks and ipyrad. Ideas as to what might be causing this difference?
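The filtering steps described above can be sketched as follows. This is not the R code that was actually used. It assumes genotypes coded as alternate-allele counts (0, 1, 2, or None for missing) and takes "singleton", "doubleton", and "tripleton" to mean a minor allele count of 1, 2, and 3. All function names are hypothetical and the data are made up.

```python
import random
from collections import Counter

def minor_allele_count(genos):
    """Minor allele count at one SNP, ignoring missing genotypes."""
    called = [g for g in genos if g is not None]
    alt = sum(called)
    return min(alt, 2 * len(called) - alt)

def passes_filters(genos, min_cov=0.75, max_het=0.5):
    """Keep SNPs with >= min_cov called individuals and heterozygosity <= max_het."""
    called = [g for g in genos if g is not None]
    if not called or len(called) / len(genos) < min_cov:
        return False
    het = sum(1 for g in called if g == 1) / len(called)
    return het <= max_het

def mac_spectrum(snps, seed=0):
    """snps: list of (locus_id, genotypes). Returns {minor allele count: fraction of SNPs}."""
    rng = random.Random(seed)
    by_locus = {}
    for locus, genos in snps:
        if passes_filters(genos):
            by_locus.setdefault(locus, []).append(genos)
    # One random SNP per locus, drawn after the per-SNP filters.
    picked = [rng.choice(cands) for cands in by_locus.values()]
    counts = Counter(minor_allele_count(g) for g in picked)
    total = sum(counts.values())
    return {mac: n / total for mac, n in sorted(counts.items())}

# Toy data: four individuals, five loci with one SNP each.
snps = [
    ("loc1", [0, 0, 0, 1]),        # singleton
    ("loc2", [0, 1, 1, 0]),        # doubleton, heterozygosity exactly 0.5
    ("loc3", [0, None, None, 1]),  # dropped: coverage 0.5
    ("loc4", [1, 1, 1, 0]),        # dropped: heterozygosity 0.75
    ("loc5", [2, 0, 0, 0]),        # doubleton carried by one homozygote
]

if __name__ == "__main__":
    print(mac_spectrum(snps))
```

Running the same function over SNP tables exported from both programs would give the two spectra on an identical footing, which helps rule out differences introduced by the filtering itself.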