heatherrussell
@heatherrussell
sorry if this is already stated somewhere, I checked the docs and the open issues but didn't see anything relevant
Jim Pivarski
@jpivarski

The naive assumption when dividing histograms with error bars is that the error bars are independent (the same assumption that is usually made when adding or subtracting), but the most common use-case for dividing is to make an efficiency plot, in which the numerator is a strict subset of the denominator and both are counting statistics. Even when we know that this is the case, there are different ways of handling the statistics that differ for ratios close to 0 or 1. See the table on this page. There are strong arguments for some of these options, but not everybody agrees.

So if there is a way to divide histograms in a histogramming library, it should probably be some kind of method call, so that the statistical treatment can be configurable. If it's just the / operator, a lot would have to be assumed.
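
As a rough illustration of why the statistical treatment matters, the sketch below compares the naive independent-error propagation with a simple normal-approximation binomial error for a few efficiency values. The function names and the counts are made up for illustration, and the binomial formula is only one of the possible treatments, not the one any particular library uses.

```python
import numpy as np


def naive_ratio_error(passed, total):
    """Error on passed/total if both counts were independent Poisson values."""
    passed = np.asarray(passed, dtype=float)
    total = np.asarray(total, dtype=float)
    return (passed / total) * np.sqrt(1.0 / passed + 1.0 / total)


def binomial_error(passed, total):
    """Normal-approximation binomial error, treating passed as a subset of total."""
    passed = np.asarray(passed, dtype=float)
    total = np.asarray(total, dtype=float)
    eff = passed / total
    return np.sqrt(eff * (1.0 - eff) / total)


# Hypothetical counts: the numerator is always a subset of the denominator.
passed = np.array([50.0, 90.0, 100.0])
total = np.array([100.0, 100.0, 100.0])

naive = naive_ratio_error(passed, total)
binomial = binomial_error(passed, total)

for k, n, e_naive, e_binom in zip(passed, total, naive, binomial):
    print(f"{k:.0f}/{n:.0f}: naive={e_naive:.3f}, binomial={e_binom:.3f}")
```

The two treatments disagree most at an efficiency of exactly 1: the naive error stays large while the normal-approximation error collapses to zero, which is one reason interval constructions such as Clopper-Pearson are often preferred near 0 and 1.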

heatherrussell
@heatherrussell

> So if there is a way to divide histograms in a histogramming library, it should probably be some kind of method call, so that the statistical treatment can be configurable. If it's just the / operator, a lot would have to be assumed.

yes, definitely! If it's not appropriate to include in a histogramming library, is this something that people are doing manually in their analyses? Or is there some other stats package that's more appropriate to be using here? Because right now, the simplest way of making an efficiency plot seems to me to convert boost_histograms into TH1 and divide there, which is a little silly!

alexander-held
@alexander-held
I think the coffea histogram implementation has some of the relevant methods already implemented (https://coffeateam.github.io/coffea/modules/coffea.hist.html).
Matthew Feickert
@matthewfeickert
There is a subset of them in hist now too: https://hist.readthedocs.io/en/latest/reference/hist.html#module-hist.intervals
Matthew Feickert
@matthewfeickert
@heatherrussell If you want to use hist to make a ratio plot where the ratio is an efficiency then you could just follow the example in the User Guide with kwarg rp_uncertainty_type="poisson-ratio" (where of course you'd change the histograms so that hist_1 is a strict subset of hist_2).
heatherrussell
@heatherrussell

> I think the coffea histogram implementation has some of the relevant methods already implemented (https://coffeateam.github.io/coffea/modules/coffea.hist.html).

thanks, I always steered away from coffea because all the examples are CMS-based and I haven't sat down and translated them to ATLAS jargon :D I didn't actually realise it had histogramming!

and

> If you want to use hist to make a ratio plot where the ratio is an efficiency then you could just follow the example in the User Guide with kwarg rp_uncertainty_type="poisson-ratio"

I also hadn't realised that hist could do this properly because I didn't see any efficiencies in the example.

two options now, thanks everyone!

Andrzej Novak
@andrzejnovak
It should be pointed out that the plan is for coffea's histogram implementation to switch to using hist, so if you have not used either yet, starting with the one packaged with coffea is not recommended.
Matthew Feickert
@matthewfeickert
To follow up on @andrzejnovak's comment: the coverage intervals that coffea has implemented were basically ported into the hist.intervals module, so using either should give identical results, and if coffea moves to using hist it will basically just be a change in the API called. :+1: Big thanks to @nsmith- here, as he was the first to implement these in coffea and has been very helpful in giving feedback and advice.
Angus Hollands
@agoose77:matrix.org
[m]

Hi all,
When using UHI on a 3D histogram with an IntCategory first axis, I notice that it seems to ignore the starting indices of my projection, e.g.

h[1::sum, ...] should project the contents after 0 in the first dimension, but this start index seems to be ignored. I see the same result if I slice and then manually call project.

Upon further investigation, it seems that this only happens if I don't provide the stop attribute of the slice, i.e. h[1:len:sum] works. This is really, really useful, btw. Thanks for all the hard work.

Angus Hollands
@agoose77:matrix.org
[m]
I'll open a bug report with a reproducer :)
Hans Dembinski
@HDembinski
Thanks, very much appreciated!
Hans Dembinski
@HDembinski
@agoose77:matrix.org I cannot reproduce this on a 1-dimensional histogram. I use this syntax a lot myself and have not noticed any issues so far.
h[1:len:sum] cuts off the overflow bin, while h[1::sum] includes overflow
The IntCategory axis has an overflow bin to keep track of all the items that did not end up in one of your categories.
Hans Dembinski
@HDembinski
Ok, I can confirm this on IntCategory, it is a bug
Angus Hollands
@agoose77:matrix.org
[m]
@HDembinski thanks for taking a look, I'll try it again - I noticed that any value of start didn't affect the projection which surprised me
Ah, excellent (not going mad) !
Hans Dembinski
@HDembinski
Did you already start your Bug report?
Otherwise I will post code to reproduce
Angus Hollands
@agoose77:matrix.org
[m]
Hans Dembinski
@HDembinski
This is a boost-histogram issue, but ok
Angus Hollands
@agoose77:matrix.org
[m]
OK, shall I migrate it?
Hans Dembinski
@HDembinski
I moved it, thanks for reporting! Very annoying bug :(
Angus Hollands
@agoose77:matrix.org
[m]
No, thanks for checking in! I opened a redundant issue on boost-histogram, which I've now closed.
Hans Dembinski
@HDembinski
I think you closed the issue that I moved?
But you reopened it, so it is ok
Angus Hollands
@agoose77:matrix.org
[m]
😂 yes, it took a few goes, but we're now in a place of equilibrium :)
Henry Schreiner
@henryiii
I love that we can move around issues within scikit-hep :)
Angus Hollands
@agoose77:matrix.org
[m]
It took me by surprise at first, super useful.
Hans Dembinski
@HDembinski

> I love that we can move around issues within scikit-hep :)

Me, too, it is great

Andrzej Novak
@andrzejnovak
In hist, how can I rebin on a categorical axis?
Andrzej Novak
@andrzejnovak
@henryiii
Hans Dembinski
@HDembinski
@andrzejnovak Rebinning does not make sense on a category axis, since categories are not ordinal
Andrzej Novak
@andrzejnovak
@HDembinski possibly a poor word choice, it's really this problem scikit-hep/hist#211 - merging categories if you will
agoose77
@agoose77:matrix.org
[m]
@andrzejnovak: this is a projection in hist terms
I believe you can do this with some_hist[0:len:sum], IIRC
It's the sum that's doing the work here (the projection) https://uhi.readthedocs.io/en/latest/indexing.html#slicing
The UHI syntax is really really nice.
Henry Schreiner
@henryiii
If you want to merge arbitrary bins, say [["one", "two"], ["three"]], then there’s not a good way to do that, but it’s tricky to write as well; what is the name for the merged bin? Since we depend on Python 3.6+ now, I guess an (ordered) dict could be used; keys would be new bin names, and the values would be collections of bins to merge. If we had pick, I think it would be easier: you could select each iteration and then merge.
Andrzej Novak
@andrzejnovak
I would favour the dict semantics; it should be more intuitive and less error-prone than any looping. It's a really common use case for virtually all of my workflows (pT-binned samples need to be separately scaled and then merged), and I've been hesitating to recommend that new students use hist instead of coffea/hist because the current workaround is not trivial.
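
The dict semantics discussed here can be sketched without any histogram library, using a list of category names and a NumPy array of counts whose first dimension is the category axis. The helper `merge_categories` and the merged names "low" and "high" are hypothetical, chosen only for illustration; this is not an existing hist or boost-histogram API.

```python
import numpy as np


def merge_categories(names, counts, mapping):
    """Sum category rows of `counts` according to `mapping`.

    names   : existing category names, one per row of `counts`
    counts  : array whose first dimension is the category axis
    mapping : dict of new name -> collection of old names to merge
    """
    names = list(names)
    counts = np.asarray(counts)
    merged_names = list(mapping)  # dicts preserve insertion order in Python 3.7+
    merged = np.zeros((len(merged_names),) + counts.shape[1:], dtype=counts.dtype)
    for new_index, members in enumerate(mapping.values()):
        for member in members:
            merged[new_index] += counts[names.index(member)]
    return merged_names, merged


# Three categories, each with four bins along a second axis.
names = ["one", "two", "three"]
counts = np.array(
    [
        [1, 2, 3, 4],
        [10, 20, 30, 40],
        [100, 200, 300, 400],
    ]
)

merged_names, merged = merge_categories(
    names, counts, {"low": ["one", "two"], "high": ["three"]}
)
print(merged_names)
print(merged)
```

The keys answer the naming question for the merged bins, and any per-sample scaling could be applied to the rows before they are summed.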
Andrzej Novak
@andrzejnovak
I've run into something strange (wanted to check here before I open an issue). I create a 2D hist and fill it such that entries fall into flow bins. If I access flow bins separately like h[hist.tag.underflow, 1], a value gets returned as expected. If I try to make a projection like h[hist.tag.underflow, ...], I am hit with ValueError: bins > 0 required.
Angus Hollands
@agoose77:matrix.org
[m]
@andrzejnovak: what kind of axes do you have?
Andrzej Novak
@andrzejnovak
regular
import hist
import numpy as np
s = hist.tag.Slicer()

w = 1e-2  # e.g. a cross section for a process
x = np.random.normal(loc=0.4, scale=0.4, size=1000)
y = np.random.normal(loc=0.6, scale=0.4, size=1000)

h = hist.Hist(
    hist.axis.Regular(3, 0, 1, name="x", label="axis"),
    hist.axis.Regular(3, 0, 1, name="y", label="axis"),
    hist.storage.Weight()
)
h.fill(x=x, y=y, weight=w)

h[hist.tag.underflow, ...]
Angus Hollands
@agoose77:matrix.org
[m]
Hmm. I'm not sure on this one 😕
Andrzej Novak
@andrzejnovak
maybe @henryiii can be summoned :)
Andrzej Novak
@andrzejnovak
unrelated: does boost_histogram have a recommended/defined way to serialize/save stuff? I vaguely recall seeing some discussion about it, but don't remember. I am currently writing TH1s to ROOT files, but it feels superfluous since I am reading them back out and converting to numpy arrays :D
alexander-held
@alexander-held
I'm curious about this too! In particular I'm wondering about a use case where I may be producing many histograms in parallel and would like to save them to a single file. Is this possible, or do I need to first collect them (like coffea if I understand that correctly) and then save all at once?
Henry Schreiner
@henryiii
Currently pickling is well supported, and quite fast. There’s a plan in the very near future to offer something specific that will even work across boost-histogram for Python and Boost.Histogram C++, but that’s still in the planning phase. There also might be Hist-specific support for some other formats, and uproot 4 writing should support boost-histograms.