import boost_histogram as bh
import numpy as np
# From the Boost.Histogram examples
values = [1]*9 + [91]
weight=2
weights=weight*np.ones(len(values))
wm = bh.accumulators.WeightedMean().fill(values, weight=weight)
print("\n# By hand")
print(f"values: {values}")
print(f"weights: {weights}")
_sum_of_weights = sum(weights)
print(f"sum of weights: {_sum_of_weights}")
_sum_of_weights_squared = sum(np.square(weights))
print(f"sum of weights squared: {_sum_of_weights_squared}")
print(f"weighted mean: {np.average(values, weights=weights)}")
print(f"np variance: {np.var(values)}")
_weighted_variance = sum(np.square(values-np.average(values, weights=weights)))/(len(values)-1) # Better ways to do this
print(f"weighted variance: {_weighted_variance}")
print("\n# Boost.Histogram accumulator")
print(wm) # Difference _was_ that Boost.Histogram nicely gives an unbiased estiamtor! :)
# By hand
values: [1, 1, 1, 1, 1, 1, 1, 1, 1, 91]
weights: [2. 2. 2. 2. 2. 2. 2. 2. 2. 2.]
sum of weights: 20.0
sum of weights squared: 40.0
weighted mean: 10.0
np variance: 729.0
weighted variance: 810.0
# Boost.Histogram accumulator
WeightedMean(sum_of_weights=20, sum_of_weights_squared=40, value=10, variance=810)
import boost_histogram as bh
h = bh.Histogram(bh.axis.Regular(10, 0, 1), storage=bh.storage.WeightedMean())
h.fill([1], sample=[3], weight=[2])
Hm. Okay I'll poke around at this later tonight, but does that mean that I need to make a second histogram to keep track of the uncertainties on my first histogram? For instance,
import hist
from hist import Hist
import uproot4 as uproot
root_file = uproot.open("example.root")
values, edges = root_file["data"].to_numpy()
_hist = Hist(
hist.axis.Regular(len(edges) - 1, edges[0], edges[-1], name=None),
storage=hist.storage.Double(),
)
_hist[:] = values
_hist.view()
giving
array([ 0., 0., 0., 0., 1., 16., 95., 272., 459., 591., 674.,
658., 623., 531., 456., 389., 321., 254., 194., 161., 120., 95.,
74., 60., 53., 38., 27., 19., 20., 15., 11., 10., 14.,
5., 6., 3., 7., 2., 6., 1., 3., 2., 3., 0.,
0., 1., 0., 0., 0., 0.])
if I wanted to now get the uncertainties on this histogram (so that I could then add them in quadrature while summing this "data" histogram with others to get the uncertainty on the resultant summed histogram) I'm not quite clear how I would do so.
import boost_histogram as bh
import numpy as np
bins = [1, 2, 3]
yields = np.asarray([3, 4])
stdev = np.asarray([1, 2])
h = bh.Histogram(
bh.axis.Variable(bins, underflow=False, overflow=False),
storage=bh.storage.Weight(),
)
h[...] = np.stack([yields, stdev ** 2], axis=-1)
print(f"yields: {h.view().value}, stdev: {np.sqrt(h.view().variance)}")
Currently, weighted histograms don’t plot or interact nicely with the rest of the Scikit-HEP ecosystem, but a Protocol to allow this is in the works and should be rougly ready by the end of the month.
@henryiii That's great and would be quite helpful to be able to use as defaults for https://github.com/matthewfeickert/heputils/blob/master/src/heputils/convert.py
Is there an Issue open at the moment that I could follow for this? I realize I'm also making a lot of requests without volunteering to help on anything, so apologies there, but maybe by 2021 I'll understand enough that I can start making some more helpful Issues or easy PRs.
I don’t think it’s really a replacement for gitter; it’s more of a replacement for a) stack overflow, and b) opening issues for help instead of a bug. It’s also all in one place - GitHub, so less fragmented. It’s really just a bit more orginisation on issues, with some minor added features. An issue closes - you aren’t supposed to care about the issue anymore when it’s fixed. But if it was opened as a discussion, or moved to a discussion, in the Q&A category, then it just gets an answer, but remains easy to find, so it becomes less likely to become a recurring quesiton. The “showcase” topic looks great - people can post thier most intersting and exciting examples of boost-histogram.
And while it’s new to me, other members of our comunity have been trying them out already (the PyHF repo has had them on for over a month, for example).
What is that one place you describe, gitter? To me gitter does not seem suited for focused discussions - the history is hard to read, even more so when one comes back after a month and tries to find a topic discussed previously.
Yes, I was referring to Gitter as the place to communicate with users.
My histogram is having a Regular
and a strCategory
axis and I want to show the cumulative sum of the categories, but with the different components stacked on top of each other.
I found out I can just use np.cumsum(axis=<category axis>)
to calculate the cumulative sum and create a new histogram from that. In had also managed to draw a stacked histogram by plotting multiple plt.bar
with different bottom=...
arguments taken from the cumulative sums. But I dislike that visualy, because I want to have a black outline for my components, but no outlines for the individual bins, as seen in the screenshot.