Isaac Overcast
@isaacovercast
@laninsky I'm also verifying the order of SNPs in the .str and .vcf files. Not 100% sure at this point, but I'll let you know once I make sure.
Ivan Prates
@ivanprates
Hi @isaacovercast, thanks so much for the reply and for looking into this. I think this is totally fine - I was just concerned that it wouldn't be using most of the cores most of the time. I guess the question now is what the expected time would be to run step 6 with a data set like mine (~700 samples). I asked for 10 days on our HPC cluster, but I now wonder if that will be enough. Isaac, are the different substeps of step 6 checkpointed like they used to be in 0.7.30? Thanks so much once again.
Isaac Overcast
@isaacovercast
@ivanprates Step 6 has been massively overhauled for performance. I have a test dataset here with 200 samples and ~1e6 reads per sample (single-end) and you can watch step 6 run, it's fast. If your data is paired-end it might take a little longer, but 10 days is far beyond how long step 6 should take with the new codebase. The checkpointing did not make it to the 0.9 version, but with the improved performance we don't think it's necessary. I will be curious to see how your data proceeds, so please let me know if step 6 drags out too long.
We implemented hierarchical clustering in step 6, which greatly improves performance. It should run fast. At the same time 700 samples is a monster dataset, so please do let me know how it goes, will be happy to help tune it.
Alana Alexander
@laninsky
Thanks for looking into these issues @isaacovercast !
tbh, if the vcf depths are fine, then the order of SNPs in the other outputs doesn't matter too much, because I can pull the genotypes from the vcf file, but might be good to know anyway for the future.
Ivan Prates
@ivanprates
Thanks, @isaacovercast. Perhaps as a reference, "clustering tier 1" just reached 50% after 20 hours. I sure will let you know how it goes.
Isaac Overcast
@isaacovercast
@ivanprates That sounds totally reasonable, for the size of your data. Keep me informed.
giorgio-92
@giorgio-92

Hello everybody,
I'm trying to use bucky in ipyrad, but it gives an error. This is the script:

import ipyrad.analysis as ipa
import ipyparallel as ipp
samples = ["Ca_PON", "Cd_BEL", "Cg_GLY2", "Cga_AOS", "Ch_CER", "Cm_LEON", "Co_ANT", "Cp_PAM1", "Ct_BIE"]
c = ipa.bucky(name="8044", data="/home/utente/Scrivania/Grenoble/con_trimmed/8044/prova_trimm_outfiles/prova_trimm.loci", workdir="analysis-bucky", samples=samples, minsnps=0)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
AttributeError: 'module' object has no attribute 'bucky'

I've already read the issue on github "dereneaton/ipyrad#346", and I've tried to reinstall ipyrad with -f, but it keeps giving me the same error. Any ideas?

Isaac Overcast
@isaacovercast
@giorgio-92 What version of ipyrad are you running? It works for me with the current version:
>>> import ipyrad.analysis as ipa
>>> ipa.bucky("wat", "wat", "wat")
<ipyrad.analysis.bucky.Bucky object at 0x7f2b1d457198>
Tomasz Suchan
@TomaszSuchan
Hi all, I'm using ipa.pca for my analysis and would like to make a loadings plot. I can't find any function to do this or how this data is stored. Can you please help me out?
Tomasz Suchan
@TomaszSuchan
OK, I managed to extract the loadings by modifying the code as here: https://www.nxn.se/valent/loadings-with-scikit-learn-pca. Now my question is - how to get a list of SNPs that were subsampled for the analysis?
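For reference, the loadings trick linked above can be sketched with plain scikit-learn (this is not part of the ipa.pca API; the array shapes and names here are illustrative):

```python
# Sketch: extract PCA "loadings" by scaling the principal axes by the
# square root of the explained variance (per the linked post).
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.RandomState(42)
genotypes = rng.randint(0, 3, size=(20, 100)).astype(float)  # samples x SNPs

pca = PCA(n_components=5)
pca.fit(genotypes)

# rows: SNPs, columns: principal components
loadings = pca.components_.T * np.sqrt(pca.explained_variance_)

# SNP columns with the largest absolute loading on PC1
top_snps = np.argsort(np.abs(loadings[:, 0]))[::-1][:10]
```

Matching the resulting row indices back to SNP positions then requires knowing which SNPs went into the analysis, which is the question that follows.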
giorgio-92
@giorgio-92

@isaacovercast I used the version installed with the command "conda install -c ipyrad ipyrad", and it gave me the same problem. So I removed that version and installed the new one with conda install ipyrad -c bioconda (0.9.28). I tried to use Python 2.7, but the software gives me this error:
(buckyenv) utente@gio-pc:~$ python
Python 2.7.17 |Anaconda, Inc.| (default, Oct 21 2019, 19:04:46)
[GCC 7.3.0] on linux2
Type "help", "copyright", "credits" or "license" for more information.

import ipyrad.analysis as ipa
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/home/utente/miniconda3/envs/buckyenv/lib/python2.7/site-packages/ipyrad/analysis/__init__.py", line 44, in <module>
from .fasttree import Fasttree as fasttree
File "/home/utente/miniconda3/envs/buckyenv/lib/python2.7/site-packages/ipyrad/analysis/fasttree.py", line 123
print("Fasta file already exist in: {}".format(str(self.params.f)), end='\n')
^
SyntaxError: invalid syntax
So I tried to use Python 3.0, but it says it's incompatible and doesn't install ipyrad. It seems the code uses Python 3 syntax that is incompatible with Python 2.7.
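For reference, the SyntaxError in that traceback is Python 3 print-function syntax running under Python 2.7; the `__future__` import makes the same line legal on Python 2 and is a no-op on Python 3 (the path below is illustrative):

```python
# print(..., end='...') is a function call in Python 3, a syntax error in
# plain Python 2; the __future__ import enables the Python 3 form on 2.x.
from __future__ import print_function

msg = "Fasta file already exists in: {}".format("/tmp/outdir")
print(msg, end='\n')
```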

Isaac Overcast
@isaacovercast
@TomaszSuchan Well, there's not really an easy way to do this. The idea is that you'd run multiple replicates, subsampling at each replicate, to get an accurate picture; so the way it's designed, any given subsampling is not of specific interest. What is it that you want to do?
@giorgio-92 The fasttree/python2.7 error is fixed in the repo but hasn't been pushed to a new conda package yet. Python 3 should work fine. When reporting an error, the exact error message text is useful for debugging. What does it say that leads you to believe it's incompatible? I have been running ipyrad on Python 3 for more than a year and it works. Did you try reinstalling conda?
Tomasz Suchan
@TomaszSuchan
@isaacovercast Thanks for your response! I wanted to see which loci are responsible for most of the variation along one PCA axis. We have a strange pattern that I think might be contamination. Anyway, I could solve this by computing the loadings on the dataset without filtering any loci and without replication, then comparing the locations of the loci with the vcf file. It would actually be really cool if ipa.pca somehow stored the information on the loci used in the analysis, but this is a good workaround.
Isaac Overcast
@isaacovercast
@TomaszSuchan The resampling procedure selects one snp per locus, so it's always using information from all loci for each replicate.
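The one-SNP-per-locus resampling described here can be sketched in plain numpy (the locus-ID array below is a made-up example, not ipyrad internals):

```python
# Sketch: pick one SNP column at random from each locus, so every locus
# contributes exactly one site per replicate.
import numpy as np

rng = np.random.RandomState(0)
# hypothetical locus ID for each SNP column in the genotype matrix
loci = np.array([0, 0, 1, 1, 1, 2, 3, 3])

chosen = []
for locus in np.unique(loci):
    candidates = np.where(loci == locus)[0]  # SNP columns in this locus
    chosen.append(rng.choice(candidates))    # keep one at random
chosen = np.array(chosen)
```

Each replicate draws a fresh `chosen` index set, so across replicates all loci are represented while within-locus linkage is avoided.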
Tomasz Suchan
@TomaszSuchan
@isaacovercast Yes, that's what I realized. So after using all the loci I could match them to the original vcf. Thanks!
Isaac Overcast
@isaacovercast
+1 Cool. Hope you're enjoying the analysis.ipa module, it's one of my favorite and most useful parts!
giorgio-92
@giorgio-92

@isaacovercast I've reinstalled conda and now it seems to work!! But now there is another error! What could it be?

(buckyenv3) utente@gio-pc:~$ python
Python 3.7.4 (default, Aug 13 2019, 20:35:49)
[GCC 7.3.0] :: Anaconda, Inc. on linux
Type "help", "copyright", "credits" or "license" for more information.

import ipyrad.analysis as ipa
import ipyparallel as ipp
samples = ["Ca_PON", "Cd_BEL", "Cg_GLY2", "Cga_AOS", "Ch_CER", "Cm_LEON", "Co_ANT", "Cp_PAM1", "Ct_BIE"]
c = ipa.bucky(name="8044", data="/home/utente/Scrivania/Grenoble/con_trimmed/8044/prova_trimm_outfiles/prova_trimm.loci", workdir="analysis-bucky", samples=samples, minsnps=0)
c.params
bucky_alpha [0.1, 1.0, 10.0]
bucky_nchains 4
bucky_niter 1000000
bucky_nreps 4
maxloci None
mb_mcmc_burnin 100000
mb_mcmc_ngen 1000000
mb_mcmc_sample_freq 1000
minsnps 0
seed 665635204

ipyclient = ipp.Client()
print("{} engines found".format(len(ipyclient)))
4 engines found
c.run(force=True, ipyclient=ipyclient)
wrote 16 nexus files to ~/analysis-bucky/8044
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/home/utente/miniconda3/envs/buckyenv3/lib/python3.7/site-packages/ipyrad/analysis/bucky.py", line 326, in run
self.run_mrbayes(force=force, quiet=quiet, ipyclient=ipyclient)
File "/home/utente/miniconda3/envs/buckyenv3/lib/python3.7/site-packages/ipyrad/analysis/bucky.py", line 453, in run_mrbayes
progressbar(len(ready), sum(ready), start, printstr)
File "/home/utente/miniconda3/envs/buckyenv3/lib/python3.7/site-packages/ipyrad/analysis/utils.py", line 141, in progressbar
progress = 100 * (finished / float(total))
ZeroDivisionError: float division by zero
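The traceback shows the division failing when `total` is zero; a guarded progress bar can be sketched like this (a hypothetical illustration, not the actual ipyrad fix):

```python
import sys
import time

def progressbar(finished, total, start, message):
    # Guard against total == 0 so an empty job list cannot raise
    # ZeroDivisionError (hypothetical sketch, not ipyrad's fix).
    progress = 100 * (finished / float(total)) if total else 100
    hashes = "#" * int(progress / 5.0)
    nohash = " " * (20 - len(hashes))
    elapsed = time.strftime("%H:%M:%S", time.gmtime(time.time() - start))
    print("\r[{}] {:>3}% {} | {}".format(
        hashes + nohash, int(progress), elapsed, message), end="")
    sys.stdout.flush()

progressbar(0, 0, time.time(), "sum replicate runs")  # no crash on total == 0
```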

Isaac Overcast
@isaacovercast
@giorgio-92 This is fixed. Look for v.0.9.29 on bioconda some time later today.
giorgio-92
@giorgio-92

@isaacovercast I've tried to install the new version, but it seems that conda can't find it!

(buckyenv3) utente@gio-pc:~$ conda install ipyrad=0.9.29 -c bioconda
Collecting package metadata (current_repodata.json): done
Solving environment: failed with initial frozen solve. Retrying with flexible solve.
Collecting package metadata (repodata.json): done
Solving environment: failed with initial frozen solve. Retrying with flexible solve.

PackagesNotFoundError: The following packages are not available from current channels:

  • ipyrad=0.9.29

Current channels:

To search for alternate channels that may provide the conda package you're
looking for, navigate to

https://anaconda.org

and use the search bar at the top of the page.

Isaac Overcast
@isaacovercast
@giorgio-92 Works for me. Just do it like this: conda install -c bioconda ipyrad
giorgio-92
@giorgio-92
@isaacovercast I tried it but it keeps installing the 0.9.28 version!!
Isaac Overcast
@isaacovercast
@giorgio-92 What version of python do you have?
@giorgio-92 What operating system? Mac? Linux?
giorgio-92
@giorgio-92
@isaacovercast my python version is 3.7.4, and I'm on linux!
Isaac Overcast
@isaacovercast
If you already have the package installed you have to use the -f flag to force the upgrade: conda install -c bioconda ipyrad -f
Don't know why I didn't think of that sooner...
giorgio-92
@giorgio-92

@isaacovercast OK, now it works, but there is still the same error (though the first progress bar did reach 100%)

c.run(force=True, ipyclient=ipyclient)
wrote 16 nexus files to ~/analysis-bucky/8044
[####################] 100% 0:00:01 | infer gene-tree posteriors
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/home/utente/miniconda3/envs/buckyenv3/lib/python3.7/site-packages/ipyrad/analysis/bucky.py", line 328, in run
self.run_mbsum(force=force, quiet=quiet, ipyclient=ipyclient)
File "/home/utente/miniconda3/envs/buckyenv3/lib/python3.7/site-packages/ipyrad/analysis/bucky.py", line 406, in run_mbsum
progressbar(sum(ready), len(ready), start, printstr)
File "/home/utente/miniconda3/envs/buckyenv3/lib/python3.7/site-packages/ipyrad/analysis/utils.py", line 141, in progressbar
progress = 100 * (finished / float(total))
ZeroDivisionError: float division by zero

Isaac Overcast
@isaacovercast
Oh shoot, I missed one change! I will fix it and push. You can work around this by using run(quiet=True); this disables the progress bars, which is where the error occurs.
@giorgio-92 v.0.9.30 will be up on bioconda some time later today.
giorgio-92
@giorgio-92

@isaacovercast I've tried again with version 0.9.30, but it doesn't work!! There is still the same error!

(buckyenv3) utente@gio-pc:~$ python
Python 3.7.4 (default, Aug 13 2019, 20:35:49)
[GCC 7.3.0] :: Anaconda, Inc. on linux
Type "help", "copyright", "credits" or "license" for more information.

import ipyrad.analysis as ipa
import ipyparallel as ipp
samples = ["Ca_PON", "Cd_BEL", "Cg_GLY2", "Cga_AOS", "Ch_CER", "Cm_LEON", "Co_ANT", "Cp_PAM1", "Ct_BIE"]
c = ipa.bucky(name="8044", data="/home/utente/Scrivania/Grenoble/con_trimmed/8044/prova_trimm_outfiles/prova_trimm.loci", workdir="analysis-bucky", samples=samples, minsnps=0)
c.params
bucky_alpha [0.1, 1.0, 10.0]
bucky_nchains 4
bucky_niter 1000000
bucky_nreps 4
maxloci None
mb_mcmc_burnin 100000
mb_mcmc_ngen 1000000
mb_mcmc_sample_freq 1000
minsnps 0
seed 595003987

ipyclient = ipp.Client()
print("{} engines found".format(len(ipyclient)))
4 engines found
c.run(force=True, ipyclient=ipyclient)
wrote 16 nexus files to ~/analysis-bucky/8044
[####################] 100% 0:00:01 | infer gene-tree posteriors
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/home/utente/miniconda3/envs/buckyenv3/lib/python3.7/site-packages/ipyrad/analysis/bucky.py", line 328, in run
self.run_mbsum(force=force, quiet=quiet, ipyclient=ipyclient)
File "/home/utente/miniconda3/envs/buckyenv3/lib/python3.7/site-packages/ipyrad/analysis/bucky.py", line 406, in run_mbsum
progressbar(sum(ready), len(ready), start, printstr)
File "/home/utente/miniconda3/envs/buckyenv3/lib/python3.7/site-packages/ipyrad/analysis/utils.py", line 141, in progressbar
progress = 100 * (finished / float(total))
ZeroDivisionError: float division by zero

Regarding quiet=True, I wrote it like this: c.run(force=True, ipyclient=ipyclient, quiet=True), but it only does:

c.run(force=True, ipyclient=ipyclient, quiet=True)

and doesn't give back any results in workdir!

Isaac Overcast
@isaacovercast
@giorgio-92 Your mrbayes install is almost certainly broken.
Open a terminal, switch to your ipyrad conda env and run mb; I'm betting it will crash. What's the error message?
giorgio-92
@giorgio-92

@isaacovercast you're right, this is the error:

mb: error while loading shared libraries: libreadline.so.6: cannot open shared object file: No such file or directory

giorgio-92
@giorgio-92
I've tried to uninstall and reinstall but it gives me back the same error
Isaac Overcast
@isaacovercast
@giorgio-92 It's a nasty dependency issue. I don't know how to solve it at this point: dereneaton/ipyrad#384
You might try rolling back to the ipyrad v.0.7 branch (which is in the ipyrad conda channel), but no promises.
Isaac Overcast
@isaacovercast

@giorgio-92 Another alternative that definitely does work (assuming you have sudo on your system):

conda remove mrbayes --force
sudo apt-get install mrbayes

I tested this and the bucky analysis module does work after this.

giorgio-92
@giorgio-92
@isaacovercast, now it works and completes the first progress bar! But after that it says this:


c.run(force=True, ipyclient=ipyclient)
wrote 16 nexus files to ~/analysis-bucky/8044
[####################] 100% 0:07:00 | infer gene-tree posteriors
[####################] 100% 0:00:00 | sum replicate runs
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/home/utente/miniconda3/envs/buckyenv3/lib/python3.7/site-packages/ipyrad/analysis/bucky.py", line 330, in run
self.run_bucky(force=force, quiet=quiet, ipyclient=ipyclient)
File "/home/utente/miniconda3/envs/buckyenv3/lib/python3.7/site-packages/ipyrad/analysis/bucky.py", line 527, in run_bucky
progressbar(sum(ready), len(ready), start, printstr)
File "/home/utente/miniconda3/envs/buckyenv3/lib/python3.7/site-packages/ipyrad/analysis/utils.py", line 147, in progressbar
.format(hashes + nohash, int(progress), elapsed, message),
TypeError: unsupported format string passed to tuple.__format__
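This error appears whenever a tuple lands in a format field that carries a format spec: str.format then calls the tuple's own `__format__`, which rejects any non-empty spec. A minimal illustration (the `printstr` value is hypothetical):

```python
# A tuple in a bare "{}" field is fine (it falls back to str(tuple)),
# but any format spec such as "{:<20}" raises TypeError.
printstr = ("sum replicate runs", "")   # hypothetical (message, extra) pair

ok = "{}".format(printstr)              # no spec: works
try:
    "{:<20}".format(printstr)           # non-empty spec: TypeError
    raised = False
except TypeError:
    raised = True

# The fix is to pass a string element of the tuple instead:
fixed = "{:<20}".format(printstr[0])
```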

Isaac Overcast
@isaacovercast
@giorgio-92 Pushed a new version v.0.9.31, should be up on bioconda in an hour or two. You are the first person to use this module since our big upgrade, so that's why we're encountering an unusual number of bugs.
giorgio-92
@giorgio-92
@isaacovercast sure!! No problem, thank you for your help and for your time!
giorgio-92
@giorgio-92
@isaacovercast finally the program works and gives back the CF-a10.0.concordance output!! Thank you very much
benkerbs
@benkerbs
Hi Isaac & Deren, would it be possible to add tree age prior as a modifiable parameter to the mrbayes wrapper? Even when I force set mb.params.treeagepr="fixed(5.8)", the nexus string reverts to treeagepr=offsetexp(1, 5). I tried to nano the nexus file and manipulate directly in another terminal, but again the line reverts to offsetexp(1, 5) as soon as I run. Any help here would be so appreciated since I am having trouble running mrbayes independently on our cluster (and because this wrapper is so useful). Thanks!
Jenn Drummond
@jdrum00
Hi, everybody. Thanks for the package, and congrats on the paper coming out! I'm trying to help some folks in our labs with their ipyrad runs, and coming up against runtime issues. I see step 6 just got some optimization, which is great. We're stuck at step 3 with a 179-sample set. I've Googled around, but does anyone know of a good reference for expected runtimes? I'm on ipyrad 0.9.31, with 179 fastq.gz's ranging fairly smoothly from 5M to 300M. Right now I'm trying step 3 in separate tests with 6, 12, 24, and 48 samples to get a feel for performance, but even the 6-sample test is only 83% done after 7 hours, and the 48-sample test hasn't budged from "0% loading reads". My command is ipyrad -p params.txt -s3 -c32 --MPI, with 4 nodes and 8 cores per, and 4G RAM per CPU. (I could try more cores, but I wanted to do all the tests the same way and at the same time, and I can't run four simultaneous 80-core tests.) Thanks for any insights or references!
tahamimo
@tahamimo
Hi, do you have any suggestions on how to measure the number of SNPs, level of heterozygosity, gene diversity, Fst, and Fis with ipyrad outfiles?
Nayuta YAMAMOTO
@NayutaYamamoto_twitter
Hi, would it be possible to do "Consensus reduction" without meta-data of the genomic positions of RAD loci relative to a reference genome?