    Eaton Lab
    @eaton-lab
    @pmckenz1 can you remind me when you are available for office hours?
    Eaton Lab
    @eaton-lab
    btw, do you have any problems using curl on OSX? From my understanding it should work fine, but some students were having trouble.
    Patrick McKenzie
    @pmckenz1
    Oh sorry, missed these. curl works fine -- will message this to the main group too, but I think people were just printing to terminal rather than using an output flag
    Eaton Lab
    @eaton-lab
    @pmckenz1 just thought of a hiccup you're likely to run into with ipyparallel, which is that when you make a change to your code, such as the simcat module, it does not update on your remote engines until you restart them, just like how you have to restart the notebook.
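The stale-code behavior described above comes from Python caching imported modules in each process. Here is a minimal sketch of that caching in a single process, with no ipyparallel involved; the module name `toy_simcat_demo` is hypothetical and only stands in for a package like simcat.

```python
import importlib
import sys
import tempfile
from pathlib import Path


def demo_stale_import():
    """Show that editing a module on disk does not change an already-imported copy."""
    with tempfile.TemporaryDirectory() as tmp:
        mod_path = Path(tmp) / "toy_simcat_demo.py"  # hypothetical module name
        mod_path.write_text("VERSION = 1\n")
        sys.path.insert(0, tmp)
        try:
            import toy_simcat_demo
            first = toy_simcat_demo.VERSION

            # Edit the source on disk, as you would while developing.
            mod_path.write_text("VERSION = 2  # edited\n")

            # A second import is a no-op: the module is cached in sys.modules.
            import toy_simcat_demo
            stale = toy_simcat_demo.VERSION

            # Only an explicit reload (or a fresh process) picks up the edit.
            importlib.invalidate_caches()
            reloaded = importlib.reload(toy_simcat_demo).VERSION
        finally:
            sys.path.remove(tmp)
            sys.modules.pop("toy_simcat_demo", None)
    return first, stale, reloaded


print(demo_stale_import())
```

Each remote engine is its own Python process with its own module cache, so restarting the engines is the simple way to get every one of them onto the edited code.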
    Patrick McKenzie
    @pmckenz1
    Thanks!
    Patrick McKenzie
    @pmckenz1
    Any chance you were able to find the Markov Katana Perl script referenced in the paper? I tried and failed -- no link on the paper and no github repo that I could find.
    Eaton Lab
    @eaton-lab
    no idea, I don't see it either
    Patrick McKenzie
    @pmckenz1
    @eaton-lab Any chance you've seen this error before when running in parallel?
    host compute node: [4 cores] on Patricks-MacBook-Pro.local
    [                    ]   0% | 0:00:29 | simulating count matrices
    Unknown exception encountered: Unable to open file (unable to lock file, errno = 35, error message = 'Resource temporarily unavailable')
    It's proving to be a tough error to track down
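For context on the message itself: on macOS, errno 35 is EAGAIN ("Resource temporarily unavailable"), which is what a non-blocking file lock reports when another open handle already holds the lock. The sketch below reproduces that errno with plain `fcntl.flock` on a temporary file (Unix only). Whether this is what happened in the run above is an assumption; nothing here uses HDF5 or simcat.

```python
import errno
import fcntl
import tempfile


def second_lock_errno():
    """Lock a file through one handle, then try to lock it again through another."""
    with tempfile.NamedTemporaryFile() as first:
        # First handle takes an exclusive, non-blocking lock.
        fcntl.flock(first.fileno(), fcntl.LOCK_EX | fcntl.LOCK_NB)
        with open(first.name, "rb") as second:
            try:
                # Second handle on the same file cannot get the lock.
                fcntl.flock(second.fileno(), fcntl.LOCK_EX | fcntl.LOCK_NB)
            except OSError as err:
                return err.errno
    return None


code = second_lock_errno()
print(code, errno.errorcode.get(code))
```

If the parallel jobs are all opening the same file for writing at once, checking for a handle that was left open in another process or notebook kernel would be a reasonable first step.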
    Patrick McKenzie
    @pmckenz1
    Hoping to revive the Gitter this year in a more official way. If you're new to Gitter, know that you can open a formatting cheatsheet by clicking on the button in the bottom right-hand corner of this box (with an M and a downward arrow)
    GUO CEN
    @gc2799_gitlab
    Thanks for letting me know, I think it's cool.
    Eaton Lab
    @eaton-lab
    Hey Pedicularis folks, you might be interested in this recent review of pollination studies in China: https://www.sciencedirect.com/science/article/pii/S2468265918300751
    Eaton Lab
    @eaton-lab
    An interesting paper on distance methods for species tree inference: https://arxiv.org/abs/1806.04974. I've been thinking distance trees may be a good starting method for getting a topology or edge lengths for simcat. I'm not sure how to treat missing data in this though... I suppose you just calculate pairwise distances between all pairs using whichever data they share.
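A minimal sketch of the "use whichever data they share" idea: for each pair of samples, compare only the sites where neither sample is missing. The missing-data symbols and the simple proportion-of-differences distance are assumptions for illustration, not the distance used in the paper.

```python
from itertools import combinations

MISSING = {"N", "-", "?"}  # assumed missing-data symbols


def shared_site_distance(seq_a, seq_b):
    """Proportion of differing sites among sites present in both sequences."""
    shared = [(a, b) for a, b in zip(seq_a, seq_b)
              if a not in MISSING and b not in MISSING]
    if not shared:
        return None  # no overlapping data, distance undefined
    return sum(a != b for a, b in shared) / len(shared)


def distance_matrix(seqs):
    """Pairwise distances for a dict of name -> aligned sequence."""
    return {(x, y): shared_site_distance(seqs[x], seqs[y])
            for x, y in combinations(sorted(seqs), 2)}


toy = {"A": "ACGTAC", "B": "ACGANN", "C": "NNGTAC"}
print(distance_matrix(toy))
```

Note that each pair ends up with a different number of shared sites, so some distances are estimated from much less data than others.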
    Isaac Overcast
    @isaacovercast
    watdo
    Edgar Benavides
    @Edgagoras_gitlab
    Hey hey, Joining GitLab for the best lab @ Columbia
    Patrick McKenzie
    @pmckenz1
    awesome, good to have a crowd here!
    Patrick McKenzie
    @pmckenz1
    @eaton-lab allman & rhodes paper looks cool. Not sure, but at a glance seems like you're right that pairwise is fine -- they build the tree from pairwise distances (theorem 8) and have found the log-det distances just based on relative site pattern frequencies between two taxa
    Dense paper though
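To make "log-det distances from site pattern frequencies between two taxa" concrete, here is a sketch in plain Python. It uses a commonly cited normalized form, d = -1/4 [ln|det F| - 1/2 (sum ln f_x + sum ln f_y)], where F is the 4x4 matrix of joint base frequencies for the pair and f_x, f_y are its row and column sums. That normalization is an assumption here; the paper's exact definition may differ.

```python
import math

BASES = "ACGT"


def determinant(matrix):
    """Determinant by Gaussian elimination with partial pivoting."""
    m = [row[:] for row in matrix]
    n = len(m)
    det = 1.0
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-15:
            return 0.0
        if pivot != col:
            m[col], m[pivot] = m[pivot], m[col]
            det = -det  # a row swap flips the sign
        det *= m[col][col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n):
                m[r][c] -= factor * m[col][c]
    return det


def logdet_distance(seq_x, seq_y):
    """Normalized log-det distance from the joint base frequencies of two sequences."""
    # Use only sites where both sequences have an unambiguous base.
    pairs = [(a, b) for a, b in zip(seq_x, seq_y) if a in BASES and b in BASES]
    if not pairs:
        raise ValueError("no shared sites")
    freq = [[0.0] * 4 for _ in range(4)]
    for a, b in pairs:
        freq[BASES.index(a)][BASES.index(b)] += 1.0 / len(pairs)
    det_f = determinant(freq)
    if det_f == 0.0:
        raise ValueError("singular frequency matrix, distance undefined")
    f_x = [sum(row) for row in freq]                              # row sums
    f_y = [sum(freq[i][j] for i in range(4)) for j in range(4)]   # column sums
    log_marginals = sum(math.log(v) for v in f_x) + sum(math.log(v) for v in f_y)
    return -0.25 * (math.log(abs(det_f)) - 0.5 * log_marginals)


x = "ACGT" * 5
y = "ACGT" * 4 + "ACGA"  # one substitution out of 20 sites
print(logdet_distance(x, x), logdet_distance(x, y))
```

Only the two sequences being compared enter the calculation, which is why missing data can be handled pair by pair.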
    Eaton Lab
    @eaton-lab

    Some papers of interest:

    1. "A One-Penny Imputed Genome from Next-Generation Reference Panels"
      https://www.sciencedirect.com/science/article/pii/S0002929718302428?via%3Dihub
      Why: using a large population sample you can phase SNPs for any given sample. We could hopefully do this with the large Amaranthus data. Maybe even with smaller population data sets in Pedicularis (what is considered large?). They use msprime simulations in the paper to show that it works, which is cool. Another idea: using a phased panel of pollen data to then use to phase diploids.

    2. "Loter: A Software Package to Infer Local Ancestry for a Wide Range of Species"
      https://academic.oup.com/mbe/article/35/9/2318/5040668
      Why: infer local ancestry tracks for multiple species. This is what I really want to do with data like for the oaks. Problem is you need phased data for it to work well. The method for assigning tracks here is similar to what we want to do in assigning parental alleles to pollen. We could get phase like in the method above...

    Patrick McKenzie
    @pmckenz1
    Have we looked at this paper? "Dgen: a test statistic for detection of general introgression scenarios" @eaton-lab https://www.biorxiv.org/content/biorxiv/early/2018/06/17/348649.full.pdf
    People love to lose sleep over "allele dropout".
    Eaton Lab
    @eaton-lab
    so many papers like this.
    Patrick McKenzie
    @pmckenz1
    "Inference of recombination maps from a single pair of genomes and its application to archaic samples" https://www.biorxiv.org/content/biorxiv/early/2018/10/25/452268.full.pdf
    ^HMMs for days
    Eaton Lab
    @eaton-lab
    @pmckenz1 polya-urn models (the Dirichlet distribution) are the basis for the BUCKy method, and this paper describes a new way to massively parallelize Bayesian inference under this model. I think it could be very relevant to the spatial bucky method Patrick and I have been discussing: https://arxiv.org/abs/1704.03581, https://thebayesianobserver.wordpress.com/category/machine-learning/page/2/
    Eaton Lab
    @eaton-lab
    latent dirichlet allocation in pymc3: https://docs.pymc.io/notebooks/lda-advi-aevb.html
    Eaton Lab
    @eaton-lab
    @pmckenz1: set("1234567") works even easier.
    Patrick McKenzie
    @pmckenz1
    Smh
    Almost too easy
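For anyone reading along without the original snippet: calling `set` on a string iterates over its characters, so the result is a set of one-character strings, not integers. A tiny sketch (the conversion to integers is an added illustration, not part of the original exchange):

```python
# set() on a string yields its individual characters.
digits = set("1234567")
print(sorted(digits))

# If integers are needed, convert each character explicitly.
numbers = {int(ch) for ch in digits}
print(sorted(numbers))
```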
    Isaac Overcast
    @isaacovercast
    @pmckenz1 @eaton-lab This is the paper I was looking for re: phylogenies from datasets with extensive missing data: https://www.ncbi.nlm.nih.gov/pubmed/26589995
    Patrick McKenzie
    @pmckenz1
    @isaacovercast similar themes in this paper but focused on ASTRAL and MP-EST: https://www.biorxiv.org/content/early/2018/11/04/461699
    (above) Deep learning in genomics
    sandrahoffberg
    @sandrahoffberg
    I am having trouble downloading software to teremoto and starting a jupyter notebook... For example, this website says I should be able to use conda to install bcl2fastq, but I am getting an "unexpected error". https://anaconda.org/dranew/bcl2fastq
    Can anyone help me troubleshoot?
    Isaac Overcast
    @isaacovercast
    Is this error happening during conda install?
    Works for me. Did you try creating a new conda env?
    sandrahoffberg
    @sandrahoffberg
    yes, happening during conda install... I didn't try creating a new env
    sandrahoffberg
    @sandrahoffberg
    I was able to install it with a new conda env - thanks!
    sandrahoffberg
    @sandrahoffberg
    I can't remember the account name (partition) to use on Habanero. Does anyone know?
    Eaton Lab
    @eaton-lab
    dsi
    tle003
    @tle003
    Hi, I am having problems with toytree. I am plotting with toytree version 0.1.16, since I have a recent installation of ipyrad. I can get a simple plot with tre1.draw(); but as soon as I try to add attributes such as height, I get an attribute error, such as AttributeError: TreeStyle instance has no attribute 'setattr'. Is there any code specific to the .draw() function that I can use to add attributes to my toytree plots? Is there a more recent version of toytree that I can now use with ipyrad? It seems like a very useful tool that I would like to use in my work.
    sandrahoffberg
    @sandrahoffberg
    What is the difference between running a job on multiple nodes, cores, and multithreaded? On the moto eaton partition, is it possible to run more than 1 node? I am trying to run RAxML PTHREADS or HYBRID and not specifying the resources correctly.
    Eaton Lab
    @eaton-lab
    The HPC is made up of many nodes, which are basically separate computers. Each has 20 cores, spread across 1 or 2 CPUs, so a single node can run up to about 20 threads at once. To work across multiple nodes (e.g., you want 40 cores) you need to use multi-process methods such as MPI; multithreaded jobs can only parallelize within a single node (e.g., up to 20 cores/threads). Not all programs or steps in a program can be parallelized in both ways. For example, raxml is primarily a multi-threaded program (-T), meaning it can make use of at most 1 node. However, the bootstrapping steps can be run across multiple cores or nodes since they run independently of each other (this is what the MPI version of raxml does). I don't think this is done automatically for you with the -f a argument, though; I think you instead need to tell it to do bootstrapping and then tell it to do the full tree search. It's usually easier to just use the PTHREADS version, submit a long -f a job and wait. But I guess getting a whole node for multiple days might take a while...
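The point about bootstrap replicates being independent can be sketched in a few lines: each replicate needs only the data and its own random seed, so replicates can be handed to any worker in any order. This is an illustration only. The function names are hypothetical, the "replicate" just resamples sites with replacement, and Python threads stand in for workers; in RAxML the workers would be pthreads within a node or MPI processes across nodes.

```python
import random
from concurrent.futures import ThreadPoolExecutor


def bootstrap_replicate(job):
    """Resample alignment sites with replacement, using a per-replicate seed."""
    sites, seed = job
    rng = random.Random(seed)
    return [rng.choice(sites) for _ in sites]


def run_replicates(sites, n_reps, n_workers=4):
    """Run independent replicates on a pool of workers (threads as a stand-in)."""
    jobs = [(sites, seed) for seed in range(n_reps)]
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        return list(pool.map(bootstrap_replicate, jobs))


sites = list("ACGTACGT")
parallel = run_replicates(sites, n_reps=10)
serial = [bootstrap_replicate((sites, seed)) for seed in range(10)]
print(parallel == serial)
```

Because no replicate depends on another, the serial and pooled runs give the same replicates; a single tree search does not split up this way, which is why it stays within one node.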
    @sandrahoffberg
    "eLife's first computationally reproducible article"