These are chat archives for dereneaton/ipyrad

Jul 2017
Glib Mazepa
Jul 11 2017 10:42
Hi @dereneaton and all, is there any way within the pipeline to account for (filter on) LD between loci? I am running a STRUCTURE analysis on ipyrad's .str output and would like to check the data for LD. Perhaps someone has already tried this?
Deren Eaton
Jul 11 2017 13:37
Hi @mazepago_twitter , I assume that SNPs are completely linked within loci, and effectively unlinked between loci. And so for an analysis like structure it is best to sample a single SNP per locus, and to bootstrap the analysis over multiple iterations where different random SNPs are sampled in each. We have a recommended workflow for that here:
As for measuring LD, however, I have not done that myself.
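The sampling idea described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not ipyrad's own implementation: the function names and the toy `snps_by_locus` dictionary (locus name mapped to the positions of its variable sites) are hypothetical.

```python
import random


def sample_one_snp_per_locus(snps_by_locus, rng):
    """Pick one SNP position at random from every locus that has any SNPs."""
    return {
        locus: rng.choice(positions)
        for locus, positions in sorted(snps_by_locus.items())
        if positions  # loci without variable sites contribute nothing
    }


def replicate_snp_sets(snps_by_locus, n_reps, seed=None):
    """Build n_reps independent resamplings, one SNP per locus in each."""
    rng = random.Random(seed)
    return [sample_one_snp_per_locus(snps_by_locus, rng) for _ in range(n_reps)]


# Hypothetical toy input: locus name -> positions of variable sites.
toy = {"locus_0": [3, 17, 42], "locus_1": [8], "locus_2": [], "locus_3": [5, 60]}
reps = replicate_snp_sets(toy, n_reps=5, seed=123)
for i, rep in enumerate(reps):
    print(i, rep)
```

Each replicate would then be written out and analysed separately, so that the spread across replicates reflects which SNP happened to be drawn from each locus.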
Jul 11 2017 16:21

Hi @dereneaton , I just updated, but I'm getting an error:

-bash-4.2$ ipyrad -p params-taen.txt -s 7
Traceback (most recent call last):
  File "/data/filer-5-2/bernhardt/miniconda2/bin/ipyrad", line 11, in <module>
    load_entry_point('ipyrad==0.7.2', 'console_scripts', 'ipyrad')()
  File "/data/filer-5-2/bernhardt/miniconda2/lib/python2.7/site-packages/setuptools-27.2.0-py2.7.egg/pkg_resources/", line 565, in load_entry_point
  File "/data/filer-5-2/bernhardt/miniconda2/lib/python2.7/site-packages/setuptools-27.2.0-py2.7.egg/pkg_resources/", line 2598, in load_entry_point
  File "/data/filer-5-2/bernhardt/miniconda2/lib/python2.7/site-packages/setuptools-27.2.0-py2.7.egg/pkg_resources/", line 2258, in load
  File "/data/filer-5-2/bernhardt/miniconda2/lib/python2.7/site-packages/setuptools-27.2.0-py2.7.egg/pkg_resources/", line 2264, in resolve
  File "/data/filer-5-2/bernhardt/miniconda2/lib/python2.7/site-packages/ipyrad/", line 20, in <module>
    from . import load as _load
  File "/data/filer-5-2/bernhardt/miniconda2/lib/python2.7/site-packages/ipyrad/load/", line 14, in <module>
    from .load import test_assembly
  File "/data/filer-5-2/bernhardt/miniconda2/lib/python2.7/site-packages/ipyrad/load/", line 13, in <module>
    from ipyrad.assemble.util import *
  File "/data/filer-5-2/bernhardt/miniconda2/lib/python2.7/site-packages/ipyrad/assemble/", line 7, in <module>
    from . import cluster_within
  File "/data/filer-5-2/bernhardt/miniconda2/lib/python2.7/site-packages/ipyrad/assemble/", line 31, in <module>
    from refmap import *
  File "/data/filer-5-2/bernhardt/miniconda2/lib/python2.7/site-packages/ipyrad/assemble/", line 15, in <module>
    import pysam
  File "/data/filer-5-2/bernhardt/miniconda2/lib/python2.7/site-packages/pysam/", line 5, in <module>
    from pysam.libchtslib import *
ImportError: /lib64/ version `GLIBC_2.23' not found (required by /data/filer-5-2/bernhardt/miniconda2/lib/

Did I do something wrong?

Another thing: before updating we were having an issue. We can subsample by simply adding the individual names at the end of the line, but not when using a text file; it is not able to find the individuals. Is there a trick with the format of this file? After the update nothing


PS: regarding my previous error, I managed to subsample and rerun steps 6 and 7 without issues. I am still running it on the real dataset and should know tomorrow.

Deren Eaton
Jul 11 2017 17:34
Thanks. No, this is a problem with a new package update for one of the dependencies. I'll look into it. It seems to not be working on all systems at the moment.
Jul 11 2017 18:08

Hi @dereneaton , I am working through the Jupyter NB for determining phylogenetic signal and its relationship to missing data (
The presence/absence matrix generation and model fitting worked fine for the simple simulated dataset at the beginning of the notebook, but I am having trouble getting the code to run with my data. Specifically, there seems to be an error reading in my RAxML tree as a newick file:

data1 = dataset("data1")
data1.files.loci4 = "/home/nathan/Documents/jatropha/data/ultimate/FW/fwkept/jat-kept-fw85m4_outfiles/jat-kept-fw85m4.loci"
data1.files.tree = "/home/nathan/Documents/jatropha/data/ultimate/FW/fwkept/from cipres/fw85m4/RAxML_bestTree.kept4a-fw85m4.tre"
data1.files.s2 = "/home/nathan/Documents/jatropha/data/test/kept-7_0_edits/s2_rawedit_stats.txt"
dsets = [data1]

Runs fine, but

## submit parallel [getarray] jobs, keyed by dataset name
asyncs = {}
for dset in dsets:
    asyncs[dset.name] = lbview.apply(getarray, *[dset.files.loci4, dset.files.tree])

## collect results
for dset in dsets:
    dset.lxs4, dset.tree = asyncs[dset.name].get()
    print dset.name, "\n", dset.lxs4, "\n"
/home/nathan/miniconda2/lib/python2.7/site-packages/ete3-3.0.0b36-py3.5.egg/ete3/parser/newick.pyc in _read_node_data(subnw, current_node, node_type, matcher, formatcode)
    384             _parse_extra_features(node, data[2])
    385     else:
--> 386         raise NewickError("Unexpected newick format '%s' " %subnw[0:50])
    387     return
NewickError: Unexpected newick format ':0.00288112864877994356'

I looked through the ete3 forums to try to solve the issue, but haven't been able to make anything work.

Eaton Lab
Jul 11 2017 18:31
@joqb oh boy, apparently this missing library is a big problem for lots of people. I'm finding a lot of examples online of other people saying the only fix they can find is "upgrade your version of ubuntu". We'll have to find some kind of workaround since cluster users can't do that. Dang, it worked fine on the machines I tested on, of course.
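For anyone wanting to check whether their own machine is affected, a minimal sketch using only the Python standard library is below. It compares the glibc version Python reports against the 2.23 named in the error above. The helper names are made up for this example, and `platform.libc_ver()` may return empty strings on systems where it cannot detect glibc.

```python
import platform


def parse_version(text):
    """Turn a dotted version string such as '2.17' into a tuple of ints."""
    return tuple(int(part) for part in text.split(".") if part.isdigit())


def glibc_is_at_least(found, required):
    """Compare versions numerically, so that 2.5 counts as older than 2.23."""
    return parse_version(found) >= parse_version(required)


lib, version = platform.libc_ver()
if lib == "glibc" and version:
    ok = glibc_is_at_least(version, "2.23")
    print("glibc", version, "meets 2.23" if ok else "is older than 2.23")
else:
    print("could not detect glibc on this system")
```

The numeric comparison matters because a plain string comparison would wrongly rank "2.5" above "2.23".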
Eaton Lab
Jul 11 2017 18:36
I think I can fix it though...
Eaton Lab
Jul 11 2017 18:53
@leclearnm cool that you're checking out that code. I think if you try changing the ete3.Tree format to 0 it will read the tree. Also it looks like in my code I loaded the 'bipartitions' tree instead of the 'best' tree.
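A rough sketch of the format suggestion above: the helper below tries ete3's newick format codes in order, starting with 0, and returns the first one that parses. The helper name and the example newick string are hypothetical. The loader is passed in as an argument, and is assumed to be `ete3.Tree`, whose `format` keyword takes these codes. Note that a format can parse without error and still read support values as node names, so the result is worth inspecting.

```python
def read_newick_any_format(newick, loader,
                           formats=(0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 100)):
    """Return (format_code, tree) for the first format code that parses."""
    failures = {}
    for fmt in formats:
        try:
            return fmt, loader(newick, format=fmt)
        except Exception as err:  # ete3 raises NewickError on a mismatch
            failures[fmt] = str(err)
    raise ValueError("no format code could parse the tree: %r" % failures)


if __name__ == "__main__":
    try:
        from ete3 import Tree  # only needed for this demonstration
    except ImportError:
        Tree = None
    if Tree is not None:
        # A made-up tree with a support value on the internal node.
        fmt, tree = read_newick_any_format("((a:0.1,b:0.2)95:0.05,c:0.3);", Tree)
        print("parsed with format", fmt)
```

In the notebook this would stand in for the direct tree-loading call, with the path to the RAxML tree file passed in place of the newick string.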
Eaton Lab
Jul 11 2017 19:29
@joqb OK, I see, the pysam/samtools/htslib libraries were all updated recently on the bioconda channel to avoid this issue. Their version is not working for me currently either, but it gets past our current problem and runs into a different one...
Eaton Lab
Jul 11 2017 20:25
@all rolling back to 0.7.1 until 0.7.2 installation error is fixed, hopefully within the next day.