These are chat archives for dereneaton/ipyrad

Jul 2017

Any solution for this issue?

Hi @dereneaton @isaacovercast ,
with a colleague we are trying to use a pop_assign_file. We have 69 individuals, and we get the error below with both v.0.6.27 and v.0.7.2 with Deren's workaround. Do you have any idea what might be the issue?

-bash-4.2$ ipyrad -p params-notean_nopoor.txt -s 7 -f -d

  ** Enabling debug mode ** 

  ipyrad [v.0.7.2]
  Interactive assembly and analysis of RAD-seq data
  loading Assembly: notean_nopoor
  from saved path: /data/filer-5-2/bernhardt/2017/GBS_wheat_group/assemblies_75_inds/notean_nopoor.json
  establishing parallel connection:
  host compute node: [32 cores] on

  Step 7: Filter and write output files for 69 Samples
  [####################] 100%  filtering loci        | 0:01:55  

  Encountered an error (see details in ./ipyrad_log.txt)
  Error summary is below -------------------------------
error in filter_stacks on chunk 0: IndexError(index 71 is out of bounds for axis 1 with size 69)
Jul 21 2017 15:47 UTC
@isaacovercast @dereneaton Hi guys! I was wondering if you, or any of the ipyrad users, could explain to me how majority-rule base calling works. I have data with very low coverage, and I have noticed that with this parameter I get far more homozygotes than with statistical base calling. Thanks!
Deren Eaton
Jul 21 2017 19:15 UTC
Hey @Cycadales_twitter, do you mean that the number of cores detected is sometimes different than you expect? That can happen if ipyrad takes a while to detect all of the available engines. We use a sort of adaptive strategy where it looks for cores until it doesn't find any more for a few seconds, and then it assumes it has found all of them. It usually works fine. If it doesn't, you can ensure it connects to all available cores by starting the ipcluster instance manually. This is usually done behind the scenes when using the ipyrad CLI, but is required when using the ipyrad API.
You start an ipcluster instance by calling the ipcluster program:
## starts ipcluster running in the background
ipcluster start --n=20 --daemonize

## tells ipyrad to connect to the running ipcluster instance
ipyrad -p params.txt -s 123 --ipcluster
There are more advanced ways to run this as well which we haven't fully documented yet.
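For anyone curious what the "look for cores until none show up for a few seconds" strategy looks like, here is a minimal sketch in Python. It is not ipyrad's actual code: the function name `wait_for_engines`, its parameters, and the default timings are all hypothetical, chosen only to illustrate the idea.

```python
import time


def wait_for_engines(count_engines, quiet_period=3.0, poll_interval=0.5,
                     clock=time.monotonic, sleep=time.sleep):
    """Poll until the engine count has not grown for `quiet_period` seconds.

    count_engines: any zero-argument callable returning the number of
    engines currently visible (hypothetical hook, not an ipyrad API).
    """
    best = count_engines()
    last_change = clock()
    # Keep polling while the most recent increase is still "fresh".
    while clock() - last_change < quiet_period:
        sleep(poll_interval)
        current = count_engines()
        if current > best:
            # New engines appeared, so restart the quiet-period timer.
            best = current
            last_change = clock()
    return best
```

With ipyparallel, one could presumably pass something like `lambda: len(client.ids)` for a connected `ipyparallel.Client()`. The weakness is visible in the loop: if engines register more slowly than `quiet_period`, the function returns early with too few cores, which is why starting ipcluster manually is the safer route.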
Deren Eaton
Jul 21 2017 19:43 UTC
Hi @natagalle, majority-rule calling picks the most common base, and if two bases are tied the site is called as the ambiguity character. You can see the actual base counts in the VCF output. Majority calls are only made for sites with depth below the mindepth_statistical setting and above the mindepth_majrule setting. The majority-rule calls are definitely more likely to incorrectly call heterozygotes.
@joqb I'll play around with the pop-assignments and see if I can replicate it.
Jul 21 2017 21:13 UTC
@dereneaton Thanks! So if there is a tie, the site could erroneously be called as heterozygous. And if there is a site with depth 5 and with A=3 and T=2, it could erroneously be called homozygous (AA). Did I get it right?
Deren Eaton
Jul 21 2017 21:19 UTC
@natagalle exactly.
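To make the two cases above concrete, here is a small Python sketch of majority-rule calling as described in this thread. It is an illustration, not ipyrad's implementation: the function `majrule_call`, the default thresholds, the inclusive lower bound, and the handling of three-way ties are all assumptions.

```python
from collections import Counter

# IUPAC ambiguity codes for two-base ties.
IUPAC = {
    frozenset("AG"): "R", frozenset("CT"): "Y", frozenset("CG"): "S",
    frozenset("AT"): "W", frozenset("GT"): "K", frozenset("AC"): "M",
}


def majrule_call(bases, mindepth_majrule=2, mindepth_statistical=6):
    """Call one site from a string of observed bases, e.g. "AAATT".

    Thresholds are illustrative values, not ipyrad defaults.
    Returns None when depth is high enough for the statistical caller.
    """
    counts = Counter(b for b in bases.upper() if b in "ACGT")
    depth = sum(counts.values())
    if depth < mindepth_majrule:
        return "N"  # too shallow to call at all (assumed behaviour)
    if depth >= mindepth_statistical:
        return None  # outside the majority-rule depth window
    ranked = counts.most_common()
    top_count = ranked[0][1]
    tied = [base for base, count in ranked if count == top_count]
    if len(tied) == 1:
        return tied[0]  # clear majority: called as homozygous
    if len(tied) == 2:
        return IUPAC[frozenset(tied)]  # tie: ambiguity character
    return "N"  # three or more tied bases (assumed behaviour)


print(majrule_call("AAATT"))  # A=3, T=2 at depth 5 -> "A"
print(majrule_call("AATT"))   # A=2, T=2 tie -> "W"
```

The first call shows how a true A/T heterozygote at depth 5 ends up as homozygous A, and the second shows how any tie is reported as heterozygous regardless of whether one of the bases is a sequencing error.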