Daniel McDonald
@wasade
:/
Jai Ram Rideout
@jairideout
...and that package list will continue to grow each time qiime2 adds a new dependency (it's recursive)
Daniel McDonald
@wasade
i imagine this is a larger issue for conda, so hopefully they’re digging into a reasonable resolution
Evan Bolyen
@ebolyen
if you dig through their issue tracker, you will find no such evidence
Daniel McDonald
@wasade
there are issues about specifying the channel in meta.yaml
which would resolve our issues
Evan Bolyen
@ebolyen
and it was closed if I recall
Daniel McDonald
@wasade
was it?
ugh
Evan Bolyen
@ebolyen
or is this a new one?
yeah I've lost hope for a reasonable solution
Daniel McDonald
@wasade
conda/conda#988
conda/conda-build#532
Evan Bolyen
@ebolyen
Ok those are the two I am familiar with
no clear resolution, but maybe next year?
the PR that almost fixed this was closed
in favor of some internal refactor, which I can't seem to find a breadcrumb trail for
but that was relatively recently
Daniel McDonald
@wasade
should we follow up on the threads as a group and see if we can get some attention on it?
Jamie Morton
@mortonjt
also, is it possible to have the conda recipe that uploaded the binaries to biocore on biocore/conda-recipes?
doesn't look like the skbio recipe is there: https://github.com/biocore/conda-recipes
could make it easier to debug this problem
Evan Bolyen
@ebolyen
We just use pypi skeleton, and the problem is just that I compiled the Linux build locally (Ubuntu 14.04). We know why this happens, it's just that conda is flawed. conda-forge uses a docker container called linux-anvil, which would be good to use
Haven't gotten around to it though and it doesn't fix the fundamental issue
Jamie Morton
@mortonjt
hmm, that's very unfortunate. I guess we'll have to just stick with pypi
note that this will be problematic with qiime2, since it will limit the machines that it can be distributed on (i.e., you can't conda install qiime2 plugins)
Erik Cohen
@cohenpts

Hello everyone, I have a quick statistics question; I hope it's OK to ask here.

I am trying to do linear interpolation for a set of data, but I want the end number to fall between 0 and 15 instead of between 0 and 1. How can I map the results onto the 0-15 range?

I am using this formula to get the 0-1 between the range,
(ele.value - min) / (max - min)

Jai Ram Rideout
@jairideout
Hey @cohenpts, this question would be better suited for a math/statistics forum (e.g. http://math.stackexchange.com/). Thanks!
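For reference, the 0-1 formula in the question extends to any target range by multiplying the normalized value by the width of the target range and adding its lower bound. A minimal sketch, where the function name `rescale` and the sample values are hypothetical and only the 0-15 target comes from the question:

```python
def rescale(value, src_min, src_max, dst_min=0.0, dst_max=15.0):
    """Linearly map value from [src_min, src_max] onto [dst_min, dst_max]."""
    if src_max == src_min:
        raise ValueError("source range has zero width")
    # This is the 0-1 normalization quoted in the question.
    unit = (value - src_min) / (src_max - src_min)
    # Stretch to the target width, then shift to the target lower bound.
    return dst_min + unit * (dst_max - dst_min)


# Hypothetical sample data, not from the original message.
values = [2.0, 4.0, 6.0, 10.0]
lo, hi = min(values), max(values)
scaled = [rescale(v, lo, hi) for v in values]
print(scaled)  # [0.0, 3.75, 7.5, 15.0]
```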
Jai Ram Rideout
@jairideout
scikit-bio 0.5.1 is live: support for interval metadata! https://github.com/biocore/scikit-bio/releases/tag/0.5.1
Justine Debelius
@jwdebelius
I'm looking for suggestions for bootstrapping a distance matrix.
Is the best way to take the distance matrix into a numpy array, bootstrap that, and then make a new distance matrix? The major disadvantage is that I have to re-index my grouping object.
Evan Bolyen
@ebolyen
Ideally DistanceMatrix would handle array indexing, allowing you to generate a random vector as your bootstrap iteration
but since it doesn't, you might look at xray
(although I think that project changed names)
it handles labelled ndarrays
xarray now I guess
Daniel McDonald
@wasade
@ebolyen .ids and .data do support fancy indexing though
Evan Bolyen
@ebolyen
@wasade great point!
Justine Debelius
@jwdebelius
The block I have is this:
dm_data = dm.data
dm_ids = dm.ids
id_pos = [dm_ids.index(id_) for id_ in ids]
skbio.DistanceMatrix(dm_data[id_pos][id_pos], new_names)
Evan Bolyen
@ebolyen
You should be able to use something like np.random.choice to generate an array and then use fancy indexing on dm.ids and dm.data
the dm_data[id_pos][id_pos] also doesn't look quite right to me, but I always forget the fancy indexing rules for multi-dimensional arrays so I am probably wrong
Evan Bolyen
@ebolyen

Wait... do you want duplicate IDs? or should the IDs stay the same, just with bootstrapped data? I don't think distance matrix will allow proper bootstrapping of the IDs as they need to be unique and bootstrapping (if I recall correctly) requires random choices with replacement.

So you probably just need to bootstrap the data, in which case your code is close, but the index needs to be [id_pos][:, id_pos] so that symmetry is preserved.

@wasade please correct me if I'm wrong

Here's some sample code demonstrating the indexing:
In [1]: import numpy as np

In [2]: x = np.reshape(np.arange(100), (10, 10))

In [3]: x
Out[3]: 
array([[ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9],
       [10, 11, 12, 13, 14, 15, 16, 17, 18, 19],
       [20, 21, 22, 23, 24, 25, 26, 27, 28, 29],
       [30, 31, 32, 33, 34, 35, 36, 37, 38, 39],
       [40, 41, 42, 43, 44, 45, 46, 47, 48, 49],
       [50, 51, 52, 53, 54, 55, 56, 57, 58, 59],
       [60, 61, 62, 63, 64, 65, 66, 67, 68, 69],
       [70, 71, 72, 73, 74, 75, 76, 77, 78, 79],
       [80, 81, 82, 83, 84, 85, 86, 87, 88, 89],
       [90, 91, 92, 93, 94, 95, 96, 97, 98, 99]])

In [4]: r = np.random.choice(10, (10,), replace=True)

In [5]: r
Out[5]: array([9, 8, 2, 6, 9, 3, 7, 2, 2, 1])

In [6]: x[r][:, r]
Out[6]: 
array([[99, 98, 92, 96, 99, 93, 97, 92, 92, 91],
       [89, 88, 82, 86, 89, 83, 87, 82, 82, 81],
       [29, 28, 22, 26, 29, 23, 27, 22, 22, 21],
       [69, 68, 62, 66, 69, 63, 67, 62, 62, 61],
       [99, 98, 92, 96, 99, 93, 97, 92, 92, 91],
       [39, 38, 32, 36, 39, 33, 37, 32, 32, 31],
       [79, 78, 72, 76, 79, 73, 77, 72, 72, 71],
       [29, 28, 22, 26, 29, 23, 27, 22, 22, 21],
       [29, 28, 22, 26, 29, 23, 27, 22, 22, 21],
       [19, 18, 12, 16, 19, 13, 17, 12, 12, 11]])
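As a side-by-side check of the indexing discussed in this thread, the sketch below reuses the draw shown in Out[5] and compares `x[r][r]` with `x[r][:, r]`. The `np.ix_` form is an alternative not mentioned in the conversation; it selects rows and columns in one step.

```python
import numpy as np

x = np.arange(100).reshape(10, 10)
r = np.array([9, 8, 2, 6, 9, 3, 7, 2, 2, 1])  # the draw shown in Out[5]

# A second [r] indexes the rows again, so this is x[r[r]], not a row/column pick.
rows_twice = x[r][r]

# Rows first, then columns: the symmetric selection wanted for a distance matrix.
rows_then_cols = x[r][:, r]

# np.ix_ builds an open mesh, giving the same row/column selection in one step.
via_ix = x[np.ix_(r, r)]

print(np.array_equal(rows_then_cols, via_ix))      # True
print(np.array_equal(rows_twice, rows_then_cols))  # False
```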
Justine Debelius
@jwdebelius
I'm bootstrapping with random choice. I want to be able to pass a bootstrapped grouping object and distance matrix into a function.
I need to maintain the label - distance relationship, not just bootstrap the distances.
Evan Bolyen
@ebolyen
So you'll either need to not use skbio.DistanceMatrix as it won't permit duplicate ids, or keep a map of original bootstrapped ids to surrogate ids (e.g. something like 0, 1, 2, ..., n).
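The surrogate-id approach could be sketched roughly as follows. This is only an illustration under stated assumptions: the function name `bootstrap_distance_matrix` is hypothetical, the grouping is assumed to be a plain dict from sample id to group label (the actual grouping object in the thread is not specified), and the toy matrix is made up.

```python
import numpy as np


def bootstrap_distance_matrix(data, ids, grouping, rng):
    """Resample a square distance array with replacement.

    Returns the resampled array, unique surrogate ids, a grouping keyed by
    surrogate id, and the original id that each surrogate stands for.
    """
    n = len(ids)
    idx = rng.choice(n, size=n, replace=True)
    # Index rows, then columns, so the result stays symmetric.
    boot_data = data[idx][:, idx]
    # Surrogate ids are unique even when an original id is drawn twice.
    surrogate_ids = [str(i) for i in range(n)]
    original_ids = [ids[i] for i in idx]
    boot_grouping = {s: grouping[o] for s, o in zip(surrogate_ids, original_ids)}
    return boot_data, surrogate_ids, boot_grouping, original_ids


# Hypothetical toy inputs.
ids = ["a", "b", "c", "d"]
data = np.array([[0, 1, 2, 3],
                 [1, 0, 4, 5],
                 [2, 4, 0, 6],
                 [3, 5, 6, 0]], dtype=float)
grouping = {"a": "x", "b": "x", "c": "y", "d": "y"}

rng = np.random.default_rng(0)
boot_data, surrogate_ids, boot_grouping, original_ids = bootstrap_distance_matrix(
    data, ids, grouping, rng)
```

Because the surrogate ids are unique, `boot_data` and `surrogate_ids` should be acceptable inputs to `skbio.DistanceMatrix`, while `original_ids` keeps the label-distance relationship recoverable. Note that duplicated samples end up with a distance of zero between them.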
Justine Debelius
@jwdebelius
So, that's been my solution. It's just kind of janky feeling, and means re-indexing the grouping.
I wanted to see if anyone knew a better way 'cause you all scikit-bio way harder than I do.
Andrew Fiore-Gartland
@agartland
Quick question: How do I create a collection of sequences to write to a FASTA file? What collection object do I use in the current version? TabularMSA does not apply because they are not all the same length.
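No answer to this question appears in the log. As a library-independent illustration of the FASTA layout for sequences of unequal length, here is a minimal sketch in plain Python; the function `write_fasta` and the sample records are hypothetical and are not part of scikit-bio. Within scikit-bio itself, it may be enough to pass a plain list or generator of sequence objects to `skbio.io.write` with `format='fasta'`, but that is an assumption to verify against the documentation for the installed version.

```python
import io


def write_fasta(records, handle, width=60):
    """Write (id, sequence) pairs as FASTA, wrapping sequence lines at width."""
    for seq_id, seq in records:
        handle.write(f">{seq_id}\n")
        for start in range(0, len(seq), width):
            handle.write(seq[start:start + width] + "\n")


# Hypothetical records of different lengths.
records = [("seq1", "ACGT"), ("seq2", "ACGTACGTAC")]
buf = io.StringIO()
write_fasta(records, buf, width=8)
fasta_text = buf.getvalue()
```

Any writable text handle works in place of the `StringIO` buffer, for example a file opened with `open(path, "w")`.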
Anders Pitman
@anderspitman
My datavis professor was talking about clustering for visualization and phylogenies today and mentioned QIIME. At one point he was trying to remember Rob's name. Thankfully I was wearing my scikit-bio tshirt under my jacket. I've never felt so prepared for a moment in my life. I was ready, and I did not disappoint.