boost-histogram 1.0! I'm adopting the new API for subclassing, and saw in https://boost-histogram.readthedocs.io/en/latest/usage/subclassing.html that family=object() is recommended when only overriding Histogram. What is the difference between object() and object? While trying to understand this, I noticed that object is object is True, while object() is object() (are those instances?) is False. Is the latter part an issue given the following?
It just has to support is, and be the exact same object on all your subclasses.
object is a class; classes are singletons, there’s just one.
object() is an instance, and you can make as many as you want; each will live in a different place in memory, check with id().
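To make the class-versus-instance distinction concrete, here is a small plain-Python check. Nothing in it is specific to boost-histogram, and the names a and b are only for illustration. It uses the built-in id() to show that two instances are distinct objects:

```python
# `object` is the class itself; every reference to it is the same object.
print(object is object)  # True

# Each call to object() builds a new, distinct instance.
a = object()
b = object()
print(a is b)            # False

# While both instances are alive, their identities differ.
print(id(a) != id(b))    # True
```

The instances are bound to names here so that both stay alive during the comparison.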
family= can be anything that supports the is operator, which is literally everything, with the exception of the module boost_histogram (as that’s already taken by boost-histogram). The module hist would be a bad choice too, as then your axes would randomly come out as Hist's or your own.
The old way works fine: FAMILY = object() at the top of the file, then use family=FAMILY when you subclass. But for most users, a handy existing object is the module you are in, that is, “hist” or “boost_histogram”. It’s unique to you, and is descriptive. You can use family=None (or the object class, anything works); you just don’t want some other extension to also use the same one - then boost-histogram won’t be able to distinguish between them when picking Axis, Storage, etc. If all you use is Histogram, though, then it really doesn’t matter.
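As a rough illustration of why two extensions should not share a family, here is a toy model in plain Python. It is not how boost-histogram is implemented; register, lookup, and all class names are hypothetical and exist only for this sketch. The only idea taken from the discussion is that components are matched to a family by identity:

```python
# Toy registry: a list of (family, kind, cls) entries, matched by identity.
_REGISTRY = []


def register(family, kind, cls):
    """Record that `cls` is the `kind` component of `family`."""
    _REGISTRY.append((family, kind, cls))


def lookup(family, kind):
    """Return every registered class matching this family (by `is`) and kind."""
    return [c for fam, k, c in _REGISTRY if fam is family and k == kind]


# One extension uses its own private token for all of its components.
FAMILY = object()


class MyHist: ...
class MyRegular: ...


register(FAMILY, "Histogram", MyHist)
register(FAMILY, "Regular", MyRegular)


# Two unrelated extensions that both happened to choose family=None.
class TheirRegular: ...
class OtherRegular: ...


register(None, "Regular", TheirRegular)
register(None, "Regular", OtherRegular)

print(lookup(FAMILY, "Regular"))     # exactly one match: MyRegular
print(len(lookup(None, "Regular")))  # 2, so the choice is ambiguous
```

With a private token the lookup has a single answer; with a shared value like None there is no way to tell which extension's axis was meant.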
The point of object() is to make a truly unique object. For example, if I make NOTHING = object(), then use def f(x=NOTHING): if x is NOTHING, I can now always tell if someone passed a keyword argument in. They can’t make NOTHING; they have to pull NOTHING out of my source and use it from there. You can’t “remake” it accidentally.
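Spelled out as runnable code, the sentinel idea looks roughly like this. The return strings are made up for the example; only NOTHING and f come from the message above:

```python
# A module-level sentinel: no caller can build an equal-by-identity object.
NOTHING = object()


def f(x=NOTHING):
    # Identity check: true only if the caller left x at its default.
    if x is NOTHING:
        return "no argument passed"
    return f"got {x!r}"


print(f())      # no argument passed
print(f(None))  # got None
```

Unlike a default of None, this still distinguishes f() from f(None), because None is a value a caller might legitimately pass.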
The ideal way would have been the following:
class Hist(bh.Histogram): …
class Regular(bh.axis.Regular, parent=Hist)
The problem is that this would have been very hard to design without circular imports, as Histogram almost always has Axis usages in it. It can be done, but it would have required changes to boost-histogram and to user code, which would also have to follow this strict regimen. Using a token is much simpler; it doesn’t require as much caution in user code (or boost-histogram).
Histogram would create a new object() and then not match the object() in the family definition, but from what I understand now this is not what happens - this object is created once when the class is defined, and any other subclass defined in my code with family=object() would pick up a different object and be unique too.
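That understanding can be checked in plain Python without boost-histogram. In this sketch, Base, make_token, and the _family attribute are hypothetical stand-ins; the point is only that a class keyword argument is evaluated once, when the class statement runs, and not on every instantiation:

```python
created = []  # every token that make_token() has produced


def make_token():
    """Stand-in for writing object() inline, but with bookkeeping."""
    token = object()
    created.append(token)
    return token


class Base:
    # Hypothetical base class that stores the family keyword on the subclass.
    def __init_subclass__(cls, family, **kwargs):
        super().__init_subclass__(**kwargs)
        cls._family = family


class A(Base, family=make_token()):
    pass


class B(Base, family=make_token()):
    pass


a1, a2 = A(), A()
print(len(created))                # 2: one token per class statement
print(a1._family is a2._family)    # True: instances share the class's token
print(A._family is B._family)      # False: each class got its own token
```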
If I added a default for family for Histogram, it would have been object(). I could special case None, that is, if family=None, it just makes an object() for you.
I could also make that the default for Histogram, and only require family= on the other subclasses. But if you have an Axis or other subclass, you have to go back and add family= on the Histogram; that’s why I force it to always be dealt with on Histogram, it prepares you for also subclassing other components. I didn’t really think too much about only subclassing Histogram.
By the way, can’t you do the following?
import cabinetry
class Histogram(bh.Histogram, family=cabinetry): ...
That would allow you to easily add subclasses for axes if you needed to customize them later.
Yes, I could use that too. I was looking at object() following the documentation:
If you only override Histogram, just use family=object().
The additions in my histogram class are rather lightweight and I don't expect to go deeper and subclass axes. On the other hand I see no downside of
@henryiii @jpivarski Can you tell me if this is a hist Issue or a uproot Issue or neither? https://gist.github.com/matthewfeickert/ab6ac8677aad2e04738111d0af3e0549
(There's a Binder link in the Gist if you want to play with it in browser)
@henryiii @jpivarski Another follow-up question on moving from ROOT files to hist.Hist histograms via uproot: is there any way to use the .to_hist() API to get storage=hist.storage.Weight()? Or at the moment should I just write a little converter like I did here?
Well that's super cool to hear. Congrats in advance, Boost.Histogram team. :)
boost-histogram returning consistent objects via view() for different storages. I personally value a consistent API for the tiny subset of features I use in practice (e.g. double/weight storages) higher than the extra flexibility. I suspect API consistency may also help with typing. Is this more consistent API something that may fit into hist (or is it maybe already available there)?
h.variances()[…] = will not work if variance is computed, as you are setting a computed value (mean storages). .values()[…] = should work on all the existing storages, though. I would mostly recommend setting them all at once, using the h[…] = syntax, though.
This could be fixed, though; right now h.variances() returns a NumPy array that has been generated. Though https://github.com/scikit-hep/boost-histogram/discussions/504 would make this all much more elegant; you could write h.variances = … and that would just work (and support flow / noflow).
The naive assumption when dividing histograms with error bars is that the error bars are independent (the same assumption that is usually made when adding or subtracting), but the most common use-case for dividing is to make an efficiency plot, in which the numerator is a strict subset of the denominator and both are counting statistics. Even when we know that this is the case, there are different ways of handling the statistics that differ for ratios close to 0 or 1. See the table on this page. There are strong arguments for some of these options, but not everybody agrees.
So if there is a way to divide histograms in a histogramming library, it should probably be some kind of method call, so that the statistical treatment can be configurable. If it's just the / operator, a lot would have to be assumed.
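To illustrate what a configurable division could look like, here is a minimal plain-Python sketch. The function divide and its method names are hypothetical, not part of hist or boost-histogram, and it covers only two of the possible treatments: first-order error propagation assuming independent numerator and denominator, and the simple binomial formula for unweighted counts where the numerator is a subset of the denominator.

```python
import math


def divide(num, den, num_var, den_var, method="independent"):
    """Bin-by-bin ratio with a selectable variance treatment.

    Hypothetical helper for illustration; not a hist or boost-histogram API.
    """
    values, variances = [], []
    for n, d, vn, vd in zip(num, den, num_var, den_var):
        if d == 0:
            # Ratio is undefined for an empty denominator bin.
            values.append(math.nan)
            variances.append(math.nan)
            continue
        r = n / d
        if method == "independent":
            # First-order propagation, numerator and denominator uncorrelated.
            var = vn / d**2 + n**2 * vd / d**4
        elif method == "binomial":
            # Numerator is a subset of the denominator (unweighted counts).
            var = r * (1 - r) / d
        else:
            raise ValueError(f"unknown method: {method!r}")
        values.append(r)
        variances.append(var)
    return values, variances


passed = [5, 10]
total = [10, 10]
# Poisson counting statistics: the variance of each bin equals its count.
vals, var_ind = divide(passed, total, passed, total, method="independent")
_, var_bin = divide(passed, total, passed, total, method="binomial")
print(vals, var_ind, var_bin)
```

In the second bin the ratio is exactly 1: the independent treatment reports a nonzero variance while the simple binomial one reports zero, which is the kind of disagreement near 0 or 1 mentioned above.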