    Eduardo Rodrigues
    @eduardo-rodrigues
    BIG congrats on this great milestone :+1: !
    Hans Dembinski
    @HDembinski
    Also from me!
    Jan Pipek
    @janpipek
    Nice! Congrats! (and sorry that I am not able to follow everything and did not even find the time to make physt compatible with some of the Protocols you established yet)
    Henry Schreiner
    @henryiii
    In the very near future, I probably could lend a hand. :)
    alexander-held
    @alexander-held

    Congratulations on boost-histogram 1.0! I'm adopting the new API for subclassing, and saw in https://boost-histogram.readthedocs.io/en/latest/usage/subclassing.html that family=object() is recommended when only overriding Histogram. What is the difference between object() and object? While trying to understand this, I noticed that object is object is True and object() is object() (are those instances?) is False. Is the latter part an issue given the following?

    It just has to support is, and be the exact same object on all your subclasses.

    Henry Schreiner
    @henryiii
    object is a class; classes are singletons, so there’s just one. object() is an instance, and you can make as many as you want; each will live in a different place in memory (check with id()). Basically, family= can be anything that supports is, which is literally everything, with the exception of the module boost_histogram (as that’s already taken by boost-histogram). The module hist would be a bad choice too, as then your axes would randomly come out as Hist's or your own. The old way works fine: FAMILY = object() at the top of the file, then use family=FAMILY when you subclass. But for most users, a handy existing object is the module you are in, that is, “hist” or “boost_histogram”. It’s unique to you, and it’s descriptive. You can use family=None (or the object class; anything works), you just don’t want some other extension to also use the same one, because then boost-histogram won’t be able to distinguish between them when picking Axis, Storage, etc. If all you use is Histogram, though, then it really doesn’t matter.
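A quick plain-Python check of the identity rules described above. FAMILY mirrors the module-level token from the "old way"; same_family is a hypothetical helper standing in for the identity (is) check the family mechanism relies on, not boost-histogram's actual code.

```python
# Module-level token, created once when this module is first imported.
FAMILY = object()


def same_family(token_a, token_b):
    """Hypothetical helper: family matching only ever uses identity (`is`)."""
    return token_a is token_b


# The class `object` is one single object, so it is identical to itself.
assert same_family(object, object)
# Each object() call makes a new instance at a different address (see id()).
assert not same_family(object(), object())
# Reusing the stored token is what puts two subclasses in the same family.
assert same_family(FAMILY, FAMILY)
```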
    One use for object() is to make a truly unique object. For example, if I make NOTHING = object(), then write def f(x=NOTHING): and check if x is NOTHING, I can always tell whether someone passed x in. They can’t make NOTHING themselves; they would have to pull NOTHING out of my source and use it from there, so you can’t “remake” it accidentally.
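A runnable version of that sentinel pattern. The function body and its return strings are illustrative assumptions; the point is that passing None is still distinguishable from passing nothing.

```python
# Unique sentinel: no caller can construct another object that `is` this one.
NOTHING = object()


def f(x=NOTHING):
    """Report whether x was supplied at all, even when it was supplied as None."""
    if x is NOTHING:
        return "no argument"
    return f"got {x!r}"


print(f())      # no argument
print(f(None))  # got None
```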
    Henry Schreiner
    @henryiii

    The ideal way would have been the following:

    class Hist(bh.Histogram): ...
    class Regular(bh.axis.Regular, parent=Hist): ...

    The problem with this is that it would have been very hard to design without circular imports, as a Histogram almost always uses Axis classes inside it. It can be done, but it would have required changes to boost-histogram and to user code, which would also have to follow this strict regimen. Using a token is much simpler; it doesn’t require as much caution in user code (or in boost-histogram).
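To make the token idea concrete, here is a toy model (not boost-histogram's implementation) in which a library picks the Axis subclass whose family token is identical to the Histogram subclass's token. Every class and function name here is hypothetical; the only assumption carried over from the discussion is that matching uses is.

```python
class Histogram:
    """Toy base class standing in for a histogram type."""

    def __init_subclass__(cls, *, family, **kwargs):
        super().__init_subclass__(**kwargs)
        cls._family = family


class Axis:
    """Toy base class that remembers every subclass defined."""

    _registry = []

    def __init_subclass__(cls, *, family, **kwargs):
        super().__init_subclass__(**kwargs)
        cls._family = family
        Axis._registry.append(cls)


def axis_class_for(hist_cls):
    """Pick the one Axis subclass whose family token *is* the histogram's token."""
    matches = [c for c in Axis._registry if c._family is hist_cls._family]
    if len(matches) != 1:
        raise LookupError(f"expected one Axis subclass in this family, found {len(matches)}")
    return matches[0]


MINE = object()   # this extension's token
OTHER = object()  # some other extension's token


class MyHist(Histogram, family=MINE): ...


class MyAxis(Axis, family=MINE): ...


class OtherAxis(Axis, family=OTHER): ...


print(axis_class_for(MyHist).__name__)  # MyAxis
```

If OtherAxis had also been declared with family=MINE, the lookup would find two candidates and fail; that is the ambiguity that arises when two extensions share one token.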

    alexander-held
    @alexander-held
    Thanks! I was wondering whether an instance of the class inheriting from Histogram would somehow create a new object() that then would not match the object() in the family definition, but from what I understand now, this is not what happens: the object is created once, when the class is defined, and any other class in my code that also inherits with family=object() would get a different object and be unique too.
    Henry Schreiner
    @henryiii
    The idea of family=object() is only valid if you don’t have any custom Axes, as you can’t (without knowing that family is stored as ._family, anyway) access the family after you’ve made it inline there.
    alexander-held
    @alexander-held
    Thanks Henry!
    Henry Schreiner
    @henryiii

    If I had added a default for family on Histogram, it would have been object(). I could special-case None, that is, if family=None, it just makes an object() for you.

    I could also make that the default for Histogram, and only require family= on the other subclasses. But if you have an Axis or other subclass, you have to go back and add family= on the Histogram; that’s why I force it to always be dealt with on Histogram, since it prepares you for also subclassing other components. I didn’t really think too much about only subclassing Histogram.
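A sketch of the proposed None special case, on a toy base class rather than boost_histogram.Histogram. Storing the token as _family follows the earlier message; the rest is a hypothetical design. It also confirms the point above that the token is created once per class, not once per instance.

```python
class Histogram:
    """Toy base class, not boost_histogram.Histogram."""

    def __init_subclass__(cls, *, family=None, **kwargs):
        super().__init_subclass__(**kwargs)
        # Hypothetical None special case: make a fresh, unique token for this
        # subclass. This runs once, when the class statement executes.
        cls._family = object() if family is None else family


class A(Histogram): ...


class B(Histogram): ...


# Separate subclasses get separate tokens...
assert A._family is not B._family
# ...while every instance of one subclass shares its class's single token.
assert A()._family is A()._family is A._family
```

One trade-off of this design: family=None would then mean "make me a unique family" instead of being an ordinary shared token as it is today.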

    By the way, can’t you do

    import boost_histogram as bh
    import cabinetry
    class Histogram(bh.Histogram, family=cabinetry): ...

    ? That would make it easy to add subclasses for axes later, if you ever needed to customize them.

    alexander-held
    @alexander-held

    Yes, I could use that too. I was looking at object() following the documentation:

    If you only override Histogram, just use family=object().

    The additions in my histogram class are rather lightweight and I don't expect to go deeper and subclass axes. On the other hand I see no downside of family=cabinetry either.

    Henry Schreiner
    @henryiii
    I’ll at least update the docs a bit in the future; the None update should be simple too.
    Matthew Feickert
    @matthewfeickert

    @henryiii @jpivarski Can you tell me if this is a hist Issue or a uproot Issue or neither? https://gist.github.com/matthewfeickert/ab6ac8677aad2e04738111d0af3e0549

    (There's a Binder link in the Gist if you want to play with it in browser)

    Henry Schreiner
    @henryiii
    @matthewfeickert Shouldn't that be np.sqrt(hist.values())?
    Nicholas Smith
    @nsmith-
    I remember hearing bh has a sparse storage option, but I can't find it in the docs. Is that something implemented in the Python bindings?
    Matthew Feickert
    @matthewfeickert

    @henryiii @jpivarski Another follow-up question on moving from ROOT files to hist.Hist histograms via uproot: Is there any way to use uproot's .to_hist() API to get a hist.Hist with storage=hist.storage.Weight()? Or at the moment should I just write a little converter like I did here?

    https://github.com/matthewfeickert/heputils/issues/24#issuecomment-800867686

    Jim Pivarski
    @jpivarski
    Currently, the storage depends on whether the ROOT histogram has a Sumw2 in it. If you're not getting weighted storage, then your histogram must not have one (barring bugs, of course).
    The Uproot interface is supposed to be minimal, just a bridge to get you into the boost-histogram or hist package. If you need a function that creates trivial weights or specified weights for a histogram with no weights, that sounds like something the histogramming libraries should cover.
    Henry Schreiner
    @henryiii
    boost-histogram 0.13.1, 1.0.1, and hist 2.2.1 released.
    Henry Schreiner
    @henryiii
    Boost.Histogram team is @HDembinski, congrats to him :)
    Eduardo Rodrigues
    @eduardo-rodrigues
    :+1:
    Jim Pivarski
    @jpivarski
    Indeed, congratulations @HDembinski, that's awesome!
    Hans Dembinski
    @HDembinski
    Thanks, my lobbying seems to have helped in this case
    alexander-held
    @alexander-held
    I came across the discussion in scikit-hep/boost-histogram#459 about boost-histogram returning consistent objects via view() for different storages. I personally value a consistent API for the tiny subset of features I use in practice (e.g. double/weight storages) higher than the extra flexibility. I suspect API consistency may also help with typing. Is this more consistent API something that may fit into hist (or is it maybe already available there)?
    Hans Dembinski
    @HDembinski
    You can now call histogram.values(); you don't need the view anymore.
    Henry Schreiner
    @henryiii
    The pressure on this is much less now that you have the PlottableProtocol methods, .values(), .variances(), and .counts(). You can’t quite get everything there, but most of it. Technically, Hist could change the return into a View even for simple storages, but then it would lose compatibility with boost-histogram, which is a no-go. Maybe, now that there’s an easy way to get those values, .view() could be made to always return a View, but that would be a breaking change and would need to be considered carefully.
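To illustrate the typing benefit of a consistent API: code written against a small protocol works for any storage without touching view(). HasValuesAndVariances, relative_errors, and ToyHist are hypothetical names, and the real protocol defines more than these two methods (for example the counts() method mentioned above).

```python
import math
from typing import Optional, Protocol, Sequence, runtime_checkable


@runtime_checkable
class HasValuesAndVariances(Protocol):
    """Simplified stand-in for part of the plottable-histogram protocol."""

    def values(self) -> Sequence[float]: ...

    def variances(self) -> Optional[Sequence[float]]: ...


def relative_errors(h: HasValuesAndVariances) -> list:
    """sqrt(variance) / value per bin, for any object with values()/variances()."""
    variances = h.variances()
    if variances is None:
        raise ValueError("this object does not provide variances")
    return [math.sqrt(var) / val if val else math.nan
            for val, var in zip(h.values(), variances)]


class ToyHist:
    """Any class with matching methods satisfies the protocol; no inheritance needed."""

    def __init__(self, values, variances):
        self._values = list(values)
        self._variances = list(variances)

    def values(self):
        return list(self._values)

    def variances(self):
        return list(self._variances)


h = ToyHist([4.0, 25.0], [4.0, 25.0])
print(isinstance(h, HasValuesAndVariances))  # True
print(relative_errors(h))                    # [0.5, 0.2]
```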
    Hans Dembinski
    @HDembinski
    @henryiii Beat you by a fraction of a second apparently
    Henry Schreiner
    @henryiii
    Not by much, but yes. You just did it with an email, too.
    alexander-held
    @alexander-held
    Thanks! This is very nice and I forgot I am already using the feature. Access for writing is still different I think (e.g. h.view().value = [3,5] for weight, h[...] = [1,2] for double), but that is probably less frequently needed.
    Henry Schreiner
    @henryiii
    Actually, h.values()[…] = [1,2] works. If you want to set variance, you should set both at the same time, h[…] = [[1,.1], [2,.1]].
    alexander-held
    @alexander-held
    That's great, I was not aware this works for weighted storage too. Thanks!
    Henry Schreiner
    @henryiii
    h.variances()[…] = will not work if variance is computed, as you are setting a computed value (mean storages). .values()[…] = should work on all the existing storages, though. I would mostly recommend setting them all at once, using the h[…] = syntax, though.
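For a 1D histogram with weighted storage, the nested [[value, variance], ...] layout used with h[…] = above can be built from separate lists with a small helper. pack_weighted is a hypothetical name, and the default is an assumption: for fills with unit weights, the sum of squared weights equals the count, so variance equals value (the "trivial weights" case mentioned earlier).

```python
def pack_weighted(values, variances=None):
    """Return [[value, variance], ...] for each bin of a 1D histogram.

    If variances is omitted, assume every fill had weight 1, so the sum of
    squared weights equals the count and variance == value.
    """
    if variances is None:
        variances = values
    if len(values) != len(variances):
        raise ValueError("values and variances must have the same length")
    return [[v, var] for v, var in zip(values, variances)]


print(pack_weighted([1, 2], [0.1, 0.1]))  # [[1, 0.1], [2, 0.1]]
print(pack_weighted([3, 5]))              # [[3, 3], [5, 5]]
```

The result has the same shape as the h[…] = [[1,.1], [2,.1]] example, so values and variances get set together in one assignment.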
    Hans Dembinski
    @HDembinski

    h.variances()[…] = will not work if variance is computed, as you are setting a computed value (mean storages). .values()[…] = should work on all the existing storages, though. I would mostly recommend setting them all at once, using the h[…] = syntax, though.

    This could be fixed, though, right?

    Everything that a user may naively do should work as expected if it is technically possible.
    Henry Schreiner
    @henryiii
    No, h.variances() returns a NumPy array that has been generated. Though https://github.com/scikit-hep/boost-histogram/discussions/504 would make this all much more elegant; you could write h.variances = … and that would just work (and support flow / noflow).
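A toy model of why assigning into h.variances()[...] is lost when the variances are computed: a method that builds a new array on every call has nowhere to write back to. ToyHist and its variance formula are hypothetical; they only mimic the stored-versus-computed distinction described here.

```python
class ToyHist:
    """Toy model (not boost-histogram): values are stored, variances are computed."""

    def __init__(self, values, counts):
        self._values = list(values)
        self._counts = list(counts)

    def values(self):
        # Returns the stored list itself, so item assignment writes through.
        return self._values

    def variances(self):
        # Hypothetical formula, recomputed on every call into a brand-new list.
        return [v / n if n else 0.0 for v, n in zip(self._values, self._counts)]


h = ToyHist([2.0, 6.0], [1, 3])
h.values()[0] = 4.0      # changes the stored data
h.variances()[0] = 99.0  # changes only a temporary list, which is then discarded
print(h.values())        # [4.0, 6.0]
print(h.variances())     # [4.0, 2.0]
```

A property with a setter, as in the h.variances = … idea from the linked discussion, is the usual way to give such an assignment somewhere to go.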
    Hans Dembinski
    @HDembinski
    Right, I forgot that it is a function call, so we cannot just implement [:] assignment on it.
    Hans Dembinski
    @HDembinski
    Added some more comments to that discussion
    Hans Dembinski
    @HDembinski
    @alexander-held The view gives you low-level access to the accumulators. Since the accumulators are different, the interface is different. We will not change that. If you want the unified high-level interface, you can use .values() and .variances() on the histogram.
    Andrzej Novak
    @andrzejnovak
    Apologies if it's in the docs, but I couldn't find it. Doesn't hist have a .scale option?
    Dawned on me 1s after I asked, I can just multiply it
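Multiplying by a constant scales each bin's value linearly, but for weighted storage the variance has to pick up the square of the factor, since Var(cX) = c²Var(X). The pure-Python helper below (a hypothetical name, operating on [value, variance] pairs) only shows that arithmetic; it is not hist's implementation.

```python
def scale_weighted(bins, c):
    """Scale [[value, variance], ...] pairs by a constant factor c.

    Values scale linearly and variances quadratically: Var(c*X) = c**2 * Var(X).
    """
    return [[c * v, c * c * var] for v, var in bins]


print(scale_weighted([[10.0, 10.0], [4.0, 4.0]], 0.5))  # [[5.0, 2.5], [2.0, 1.0]]
```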
    heatherrussell
    @heatherrussell
    Are there plans to support dividing histograms / efficiencies / perhaps Poisson uncertainties on data points (asymmetric error bars)?
    sorry if this is already stated somewhere, I checked the docs and the open issues but didn't see anything relevant
    Jim Pivarski
    @jpivarski

    The naive assumption when dividing histograms with error bars is that the error bars are independent (the same assumption that is usually made when adding or subtracting), but the most common use-case for dividing is to make an efficiency plot, in which the numerator is a strict subset of the denominator and both are counting statistics. Even when we know that this is the case, there are different ways of handling the statistics that differ for ratios close to 0 or 1. See the table on this page. There are strong arguments for some of these options, but not everybody agrees.

    So if there is a way to divide histograms in a histogramming library, it should probably be some kind of method call, so that the statistical treatment can be configurable. If it's just the / operator, a lot would have to be assumed.
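Following the suggestion that the statistical treatment should be an explicit choice rather than hidden in /, here is a minimal per-bin sketch with a method argument. It implements only two textbook binomial intervals (normal approximation and Wilson score); the function name and signature are hypothetical, and it assumes unweighted counts where the numerator is a subset of the denominator.

```python
import math


def efficiency(passed, total, method="wilson", z=1.0):
    """Return (eff, low, high) for passed <= total unweighted counts.

    method="normal": symmetric binomial interval; zero width at eff 0 or 1.
    method="wilson": Wilson score interval; stays in [0, 1], nonzero at the edges.
    z is the number of standard deviations (1.0 is roughly 68% coverage).
    """
    if total <= 0 or not 0 <= passed <= total:
        raise ValueError("need total > 0 and 0 <= passed <= total")
    p = passed / total
    if method == "normal":
        half = z * math.sqrt(p * (1 - p) / total)
        return p, max(0.0, p - half), min(1.0, p + half)
    if method == "wilson":
        denom = 1 + z * z / total
        center = (p + z * z / (2 * total)) / denom
        half = z * math.sqrt(p * (1 - p) / total + z * z / (4 * total * total)) / denom
        return p, center - half, center + half
    raise ValueError(f"unknown method {method!r}")


print(efficiency(10, 10, method="normal"))  # (1.0, 1.0, 1.0)
```

Applied bin by bin to the numerator and denominator counts, this gives asymmetric error bars for an efficiency plot. Other common choices, such as Clopper-Pearson, need beta-distribution quantiles (for example from SciPy) and are not shown.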

    heatherrussell
    @heatherrussell

    So if there is a way to divide histograms in a histogramming library, it should probably be some kind of method call, so that the statistical treatment can be configurable. If it's just the / operator, a lot would have to be assumed.

    yes, definitely! If it's not appropriate to include in a histogramming library, is this something that people are doing manually in their analyses? Or is there some other stats package that's more appropriate to be using here? Because right now, the simplest way of making an efficiency plot seems to me to convert boost_histograms into TH1 and divide there, which is a little silly!

    alexander-held
    @alexander-held
    I think the coffea histogram implementation has some of the relevant methods already implemented (https://coffeateam.github.io/coffea/modules/coffea.hist.html).