The naive assumption when dividing histograms with error bars is that the error bars are independent (the same assumption that is usually made when adding or subtracting), but the most common use-case for dividing is to make an efficiency plot, in which the numerator is a strict subset of the denominator and both are counting statistics. Even when we know that this is the case, there are different ways of handling the statistics that differ for ratios close to 0 or 1. See the table on this page. There are strong arguments for some of these options, but not everybody agrees.
So if there is a way to divide histograms in a histogramming library, it should probably be some kind of method call, so that the statistical treatment can be configurable. If it's just the / operator, a lot would have to be assumed.
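To make the difference concrete, here is a minimal stdlib-only sketch comparing the naive "independent errors" treatment with two binomial treatments of an efficiency k/n. The function names are made up for illustration and are not taken from any histogramming library; the Wilson score interval is just one of the possible options, chosen here because it needs no special functions.

```python
import math

def naive_ratio_error(k, n):
    """Error on k/n if k and n are treated as independent Poisson counts."""
    # sigma^2 = k/n^2 + k^2/n^3, written this way so k == 0 does not divide by zero
    return math.sqrt(k / n**2 + k**2 / n**3)

def binomial_error(k, n):
    """Normal-approximation (Wald) binomial error, with k a subset of n."""
    r = k / n
    return math.sqrt(r * (1.0 - r) / n)

def wilson_interval(k, n, z=1.0):
    """Wilson score interval; z=1 corresponds to roughly 68% coverage."""
    centre = (k + z**2 / 2) / (n + z**2)
    half = z / (n + z**2) * math.sqrt(k * (n - k) / n + z**2 / 4)
    # Clamp to [0, 1] to guard against floating-point overshoot
    return max(0.0, centre - half), min(1.0, centre + half)

for k, n in [(50, 100), (99, 100), (100, 100)]:
    lo, hi = wilson_interval(k, n)
    print(f"{k}/{n}: naive ±{naive_ratio_error(k, n):.3f}, "
          f"binomial ±{binomial_error(k, n):.3f}, Wilson [{lo:.3f}, {hi:.3f}]")
```

Near an efficiency of 1 the naive error bar extends well above 1, the Wald error collapses to zero at k = n, and the Wilson interval stays inside [0, 1], which is the kind of disagreement between treatments that a bare / operator would have to paper over.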
yes, definitely! If it's not appropriate to include in a histogramming library, is this something that people are doing manually in their analyses? Or is there some other stats package that's more appropriate to be using here? Because right now, the simplest way of making an efficiency plot seems to me to convert boost_histograms into TH1 and divide there, which is a little silly!
I think the coffea histogram implementation has some of the relevant methods already implemented (https://coffeateam.github.io/coffea/modules/coffea.hist.html).
These are in hist now too: https://hist.readthedocs.io/en/latest/reference/hist.html#module-hist.intervals
If you want to use hist to make a ratio plot where the ratio is an efficiency then you could just follow the example in the User Guide with kwarg rp_uncertainty_type="poisson-ratio" (where of course you'd change the histograms so that hist_1 is a strict subset of hist_2).
I think the coffea histogram implementation has some of the relevant methods already implemented (https://coffeateam.github.io/coffea/modules/coffea.hist.html).
thanks, I always steered away from coffea because all the examples are CMS-based and I haven't sat down and translated to atlas jargon :D I didn't actually realise it had histogramming!
and
If you want to use hist to make a ratio plot where the ratio is an efficiency then you could just follow the example in the User Guide with kwarg rp_uncertainty_type="poisson-ratio"
I also hadn't realised that hist could do this properly because I didn't see any efficiencies in the example.
two options now, thanks everyone!
The methods that coffea has implemented were basically ported into the hist.intervals module, so using either should give identical results, and if coffea moves to using hist it will basically just be changes in the API called. :+1: Big thanks to @nsmith- here as he was the first to implement these in coffea and has been very helpful in giving feedback and advice.
Hi all,
When using UHI on a 3D histogram with an IntCategory first axis, I notice that it seems to ignore the starting indices of my projection, e.g. h[1::sum, ...] should project the contents after 0 in the first dimension, but this start index seems to be ignored. I see the same result if I slice and then manually call project.
Upon further investigation, it seems that this only happens if I don't provide the stop attribute of the slice, i.e. h[1:len:sum] works. This is really, really useful, btw. Thanks for all the hard work.
h[1:len:sum] cuts off the overflow bin, while h[1::sum] includes overflow
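A toy model of that rule on a single axis, in plain Python rather than the actual library: uhi_sum and the counts layout below are hypothetical, only non-negative integer endpoints are handled, and this is a sketch of the semantics described above, not how boost-histogram implements them.

```python
def uhi_sum(counts, start=None, stop=None):
    """Toy model of h[start:stop:sum] on one axis.

    `counts` is laid out as [underflow, bin 0, ..., bin N-1, overflow].
    An omitted endpoint (None) pulls in the flow bin on that side; an
    explicit endpoint does not. Pass stop=N to mimic `len`.
    """
    first = 0 if start is None else start + 1         # +1 skips the underflow slot
    last = len(counts) if stop is None else stop + 1  # exclusive upper edge
    return sum(counts[first:last])

counts = [5, 10, 20, 30, 7]  # underflow=5, three regular bins, overflow=7
n_bins = len(counts) - 2

print(uhi_sum(counts, 1, None))    # like h[1::sum]: bins 1, 2 and overflow
print(uhi_sum(counts, 1, n_bins))  # like h[1:len:sum]: bins 1, 2 only
```

So the two results differ by exactly the overflow content, and if the overflow bin happens to hold a lot of entries the start index can look like it was ignored.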
boost-histogram, which I've now closed.
some_hist[0:len:sum] IIRC*
* It's the sum that's doing the work here (the projection): https://uhi.readthedocs.io/en/latest/indexing.html#slicing
[["one", "two"], ["three"]], then there’s not a good way to do that, but it’s tricky to write as well; what is the name for the merged bin? Since we depend on Python 3.6+ now, I guess an (ordered) dict could be used; keys would be new bin names, and the values would be collections of bins to merge. If we had pick, I think it would be easier, you could select each iteration then merge.
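Roughly what the dict idea could look like, sketched on a plain mapping of category name to count. merge_categories is a hypothetical helper for illustration, not an existing hist or boost-histogram API, and it ignores variances and additional axes.

```python
def merge_categories(counts, groups):
    """Merge category bins: `groups` maps new bin name -> old bin names."""
    merged = {}
    for new_name, old_names in groups.items():
        # Each new bin is the sum of the old bins assigned to it
        merged[new_name] = sum(counts[name] for name in old_names)
    return merged

counts = {"one": 4, "two": 6, "three": 1}
groups = {"one_or_two": ["one", "two"], "three": ["three"]}

print(merge_categories(counts, groups))  # {'one_or_two': 10, 'three': 1}
```

The dict keys answer the "what is the name for the merged bin?" question directly, and insertion order gives the order of the new bins.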
hist instead of coffea/hist because the current workaround is not trivial
h[hist.tag.underflow, 1] a value gets returned as expected. If I try to make a projection like h[hist.tag.underflow, ...] I am hit with ValueError: bins > 0 required
import hist
import numpy as np
s = hist.tag.Slicer()
w = 1e-2 # e.g. a cross section for a process
x = np.random.normal(loc=0.4, scale=0.4, size=1000)
y = np.random.normal(loc=0.6, scale=0.4, size=1000)
h = hist.Hist(
    hist.axis.Regular(3, 0, 1, name="x", label="axis"),
    hist.axis.Regular(3, 0, 1, name="y", label="axis"),
    hist.storage.Weight(),
)
h.fill(x=x, y=y, weight=w)
h[hist.tag.underflow, ...]
coffea if I understand that correctly) and then save all at once?