    Hans Dembinski
    @HDembinski
    I abuse them to that end. I think we have everything in place for this use case, no?
    Jim Pivarski
    @jpivarski

    Does anyone have an answer to this: https://stackoverflow.com/questions/63813448/writing-boost-histograms-with-uproot

    It would have to be Uproot3, since Uproot4 doesn't write anything yet.

    Henry Schreiner
    @henryiii
    How would I write a histogram with variances in uproot3? It should be easy to mimic that with boost-histogram.
    Jim Pivarski
    @jpivarski

    I just looked into it and found a physt example, which handles variances:

    https://github.com/scikit-hep/uproot-methods/blob/80dbc8123c577253585b33b7d8b3d72acc42818b/uproot_methods/classes/TH1.py#L447-L508

    It creates classes with the right names and the right fields, which Uproot 3 recognizes when assigning to a key of an output file (in __setitem__). The recognition happens in

    https://github.com/scikit-hep/uproot-methods/blob/80dbc8123c577253585b33b7d8b3d72acc42818b/uproot_methods/convert.py#L44-L45

    Considering how complicated this looks, it's a toss-up whether it's valuable to do it now, so that Uproot 3 will recognize and write boost-histogram and hist objects, or if it would be better to wait a month or two for me to add the file-writing to Uproot 4. The new interface would be more formal than this.

    Maybe more than two months—I've claimed file-writing in Uproot 4 as a milestone for December 1, though.

    Henry Schreiner
    @henryiii

    Hist 2.0.0 is out! This is the result of the work @LovelyBuggies and I have been doing for Google Summer of Code 2020. Changes since Beta 1:

    • Based on boost-histogram 0.11; now supports two way boost-histogram <-> hist conversion without metadata issues.
    • mplhep is now used for all plotting. Return types changed; fig dropped, new figures only created if needed.
    • QuickConstruct was rewritten, uses new.Reg(...).Double(); not as magical but clearer types and usage.
    • Plotting dependencies are no longer installed by default; use pip install "hist[plot]" to request them.

    The following new features were added:

    • Jupyter HTML repr's were added.
    • flow=False shortcut added.
    • Static type checker support for dependent projects.

    See more details at https://github.com/scikit-hep/hist

    Eduardo Rodrigues
    @eduardo-rodrigues
    Congrats, folks :+1: !
    Jan Pipek
    @janpipek
    Cool!
    Henry Schreiner
    @henryiii
    Building on top of the recently released pybind11 2.6.0, boost-histogram 0.11.1 is out! Python 3.9 support and wheels, PyPy support and wheels, 40% faster accumulators, better CMake support, and quite a bit more just from the upgrade!
    N!no
    @LovelyBuggies
    👍
    Hans Dembinski
    @HDembinski
    Hi Henry, what made the accumulators faster?
    Boost-1.75 will be released in a few weeks and will have some minor patches for Boost Histogram to increase compatibility with the latest compiler versions and the upcoming c++20 standard.
    We discovered a compiler bug in gcc that some code in Boost Histogram triggered. After I reported that, the gcc folks were very quick and are currently fixing this. :)
    Matthew Feickert
    @matthewfeickert

    @LovelyBuggies @henryiii I think I already asked Henry this (sorry) but can you remind me how one can take two hist objects with the same binning and add them? I see that in the example notebooks I have here https://github.com/matthewfeickert/heputils that just trying to fill a histogram

    root_file = uproot.open("example.root")
    mass_hists = [
        heputils.convert.uproot_to_hist(root_file[key]) for key in root_file.keys()
    ]
    stack_hist = mass_hists[0].copy()
    stack_hist.reset()
    for hist in mass_hists:
        stack_hist.fill(hist)
    stack_hist.plot1d()
    mass_hists[0].plot1d()

    is apparently not the right way to do things.

    Jim Pivarski
    @jpivarski
    It's not mass_hists[0] + mass_hists[1] + ...?
    In which case, sum(mass_hists[1:], mass_hists[0]) would add them all.
    Matthew Feickert
    @matthewfeickert

    mass_hists[0] + mass_hists[1]

    results in

    ---------------------------------------------------------------------------
    ValueError                                Traceback (most recent call last)
    <ipython-input-4-7bab2e7a86b3> in <module>
         12 # mass_hists[0]
         13 # help(stack_hist)
    ---> 14 mass_hists[0] + mass_hists[1]
    
    /srv/conda/envs/notebook/lib/python3.7/site-packages/boost_histogram/_internal/hist.py in __add__(self, other)
        222     def __add__(self, other):
        223         result = self.copy(deep=False)
    --> 224         return result.__iadd__(other)
        225 
        226     def __iadd__(self, other):
    
    /srv/conda/envs/notebook/lib/python3.7/site-packages/boost_histogram/_internal/hist.py in __iadd__(self, other)
        227         if isinstance(other, (int, float)) and other == 0:
        228             return self
    --> 229         self._compute_inplace_op("__iadd__", other)
        230 
        231         # Addition may change the axes if they can grow
    
    /srv/conda/envs/notebook/lib/python3.7/site-packages/boost_histogram/_internal/hist.py in _compute_inplace_op(self, name, other)
        261     def _compute_inplace_op(self, name, other):
        262         if isinstance(other, Histogram):
    --> 263             getattr(self._hist, name)(other._hist)
        264         elif isinstance(other, _histograms):
        265             getattr(self._hist, name)(other)
    
    ValueError: axes not mergable
    Jim Pivarski
    @jpivarski
    I'd read that error message as saying that the binning is not the same.
    Henry Schreiner
    @henryiii
    If the axes are the same, yes, that works. If the axes don’t match, it won’t.
    Matthew Feickert
    @matthewfeickert
    Hm.
    >>> bins = [hist.to_numpy()[1] for hist in mass_hists]
    >>> all([np.array_equal(x, y) for x, y in zip(bins, bins[::-1])])
    True
    Henry Schreiner
    @henryiii
    I’d use reduce to sum up a list of histograms, functools.reduce(operator.add, mass_hists), but you should be able to do sum(mass_hists), if you’d rather.
    Is the metadata different, perhaps?
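The two ways of summing a list mentioned above can be sketched with a toy stand-in class. ToyHist is hypothetical and is not boost-histogram's implementation; it only mimics the behaviour visible in the traceback: adding 0 is a no-op, and mismatched axes raise "axes not mergable". It assumes that sum() works because the histogram accepts the integer 0 that sum() starts from.

```python
import functools
import operator
from dataclasses import dataclass


@dataclass
class ToyHist:
    """Hypothetical stand-in for a 1D histogram (not the real library class)."""

    edges: tuple
    counts: tuple

    def __add__(self, other):
        # Adding the integer 0 is a no-op, which is what lets sum() work.
        if isinstance(other, (int, float)) and other == 0:
            return self
        if not isinstance(other, ToyHist):
            return NotImplemented
        # Only histograms with identical axes can be merged.
        if self.edges != other.edges:
            raise ValueError("axes not mergable")
        merged = tuple(a + b for a, b in zip(self.counts, other.counts))
        return ToyHist(self.edges, merged)

    # sum() evaluates 0 + hist first, which falls back to __radd__.
    __radd__ = __add__


hists = [
    ToyHist((0, 1, 2), (1, 2)),
    ToyHist((0, 1, 2), (3, 4)),
    ToyHist((0, 1, 2), (5, 6)),
]

total_reduce = functools.reduce(operator.add, hists)  # no start value needed
total_sum = sum(hists)                                # starts from 0
total_start = sum(hists[1:], hists[0])                # explicit start value

print(total_reduce.counts)  # (9, 12)
```

All three spellings give the same result here. The reduce form never touches the integer 0, so it does not depend on the class special-casing it.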
    Matthew Feickert
    @matthewfeickert
    oh
    so things like the axis name matter?
    Jim Pivarski
    @jpivarski
    (Maybe there ought to be an easy way to strip metadata? And it should be recommended in the "axes not mergable" error message?)
    Matthew Feickert
    @matthewfeickert

    Hm. Though

    for hist in mass_hists:
        assert hist.metadata is None

    passes

    okay, I might be doing something stupid. I'll poke at this more after my next meeting
    Henry Schreiner
    @henryiii
    What would you do with the axes metadata? c = a + b, what’s the metadata for c? .metadata is only one item in the metadata, any valid attribute is also metadata, like .name, etc.
    Jim Pivarski
    @jpivarski
    One option is to use the first, though that clearly has downsides. Forcing the user to make a choice by wrapping the function call with a metadata stripper/overwriter that gets advertised in the error message might be the best way to go...
    Matthew Feickert
    @matthewfeickert

    This is all helpful and makes me appreciate the concept of metadata more. I guess the good news is that uproot will handle things in a more intelligent manner than I have been once some issues have been resolved, where what I have been (hackily) doing is the following https://github.com/matthewfeickert/heputils/blob/3dd0e858f002041a7ae15fc310d92ddb0ea4fe26/src/heputils/convert.py#L7-L32

    If I just don't even attempt to handle the name then things work with the following:

    import numpy as np
    import uproot4 as uproot
    import mplhep
    import hist
    from hist import Hist
    import functools
    import operator
    
    mplhep.set_style("ATLAS")
    
    def uproot_to_hist(uproot_hist):
        values, edges = uproot_hist.to_numpy()
        _hist = hist.Hist(
            hist.axis.Regular(len(edges) - 1, edges[0], edges[-1]),
            storage=hist.storage.Double(),
        )
        _hist[:] = values
        return _hist
    
    root_file = uproot.open("example.root")
    mass_hists = [
        uproot_to_hist(root_file[key]) for key in root_file.keys()
    ]
    
    stack_hist = functools.reduce(operator.add, mass_hists)
    stack_hist.plot1d()
    for mass_hist in mass_hists:
        mass_hist.plot1d()
    Matthew Feickert
    @matthewfeickert
    Though I guess I still want to know, is there a more intelligent way to deal with the name metadata?
    Henry Schreiner
    @henryiii
    As long as it matches, it should be fine. There used to be a way to get the metadata all at once (.metadata), but now there’s not a public location like that. It would be nice to be able to just do ax1.__dict__ = ax2.__dict__ then merge.
    Matthew Feickert
    @matthewfeickert
    yeah, I also just looked at the docs for hist and was reminded that name is the name of the axis and not the histogram
    Henry Schreiner
    @henryiii
    (There is a private way to do exactly that - I think it’s ax._ax.metadata)
    You can name your histogram if you want to.
    Matthew Feickert
    @matthewfeickert
    so what I was doing should have been setting the label and not the name
    ah okay, thanks for the info on ax._ax.metadata!
    Henry Schreiner
    @henryiii
    “name” is something that can be used to refer to the axis and should be unique. “label” is anything.
    Matthew Feickert
    @matthewfeickert
    :+1:
    Thanks Henry!
    Henry Schreiner
    @henryiii
    And you can use __dict__ on a histogram, just not on an axis. I wonder if @property; def __dict__(self): return self._ax.metadata would break anything...
    It’s a slots class so normal __dict__ is not used.
    Hans Dembinski
    @HDembinski
    @matthewfeickert The idea of metadata is to distinguish between otherwise identical axes, e.g. distinguish axis.Regular(10, 0, 10, metadata="time in seconds") from axis.Regular(10, 0, 10, metadata="length in meters"). That's why adding histograms in boost-histogram (and hist) only works when metadata matches on all axes.
    The assumption here is that you can only add histograms with identical layout, not only physical layout in terms of the axis grid but also logical layout in terms of what these axes mean
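The distinction between physical and logical layout can be illustrated with a small toy model. ToyAxis, same_grid, mergeable and with_metadata are hypothetical names made up for this sketch and are not part of boost-histogram or hist; the sketch only assumes, as described above, that two axes count as identical when both the bin grid and the metadata match. The with_metadata helper shows the kind of explicit "metadata overwriter" suggested earlier in the discussion.

```python
from dataclasses import dataclass, replace
from typing import Any


@dataclass(frozen=True)
class ToyAxis:
    """Hypothetical regular axis: a bin grid plus free-form metadata."""

    bins: int
    start: float
    stop: float
    metadata: Any = None


def same_grid(a: ToyAxis, b: ToyAxis) -> bool:
    # Physical layout only: number of bins and range.
    return (a.bins, a.start, a.stop) == (b.bins, b.start, b.stop)


def mergeable(a: ToyAxis, b: ToyAxis) -> bool:
    # Logical layout as well: dataclass equality compares every field,
    # including the metadata.
    return a == b


def with_metadata(axis: ToyAxis, metadata: Any = None) -> ToyAxis:
    # Return a copy whose metadata was chosen explicitly by the caller.
    return replace(axis, metadata=metadata)


time_axis = ToyAxis(10, 0, 10, metadata="time in seconds")
length_axis = ToyAxis(10, 0, 10, metadata="length in meters")

print(same_grid(time_axis, length_axis))  # True
print(mergeable(time_axis, length_axis))  # False
print(mergeable(with_metadata(time_axis), with_metadata(length_axis)))  # True
```

Requiring the caller to pick the metadata makes the choice visible, instead of silently keeping whatever the first operand carried.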
    Matthew Feickert
    @matthewfeickert
    this makes sense now. Before I wasn't assuming that anything called "metadata" could actually have any impact on operations
    Peter Fackeldey
    @pfackeldey

    Dear Hist/boost-histogram developers,

    thank you for this great histogramming library, it is a pleasure to work with it :)

    I have a question to you regarding fancy indexing on a StrCategory axis.
    My (Hist) histogram looks as follows:

    In [38]: h
    Out[38]:
    Hist(
      StrCategory(['GluGluToHHTo2B2VTo2L2Nu_node_cHHH2p45'], growth=True),
      StrCategory(['ee', 'mumu', 'emu'], growth=True),
      StrCategory(['nominal'], growth=True),
      Regular(40, 0, 200, name='MET', label='$p_{T}^{miss}$'),
      storage=Weight()) # Sum: WeightedSum(value=2608.44, variance=47.3505) (WeightedSum(value=2775.4, variance=50.5706) with flow)

    Now I would like to "group" e.g. the "ee" and "emu" category together, which means that I'd like to do something like:

    h[{"dataset": "GluGluToHHTo2B2VTo2L2Nu_node_cHHH2p45", "category": ["ee", "emu"], "systematic": "nominal"}]

    Unfortunately this does not work, as the ["ee", "emu"] is not a valid indexing operation.

    Is there a way to do such a fancy indexing on StrCategory axes? In case this is not supported, is there a nice workaround?
    (I am basically looking for something, which works similar to coffea.Hist.group method: https://github.com/CoffeaTeam/coffea/blob/master/coffea/hist/hist_tools.py#L1115)

    Thank you already a lot in advance!
    Best, Peter

    Peter Fackeldey
    @pfackeldey
    small update: I found that the following works, but I think this would be a very nice feature for Hist, if this can be handled a bit more conveniently. What do you think?
    np.sum(h[{"dataset": "GluGluToHHTo2B2VTo2L2Nu_node_cHHH2p45", "systematic": "nominal"}].view()[h.axes["category"].index(["ee", "emu"]), ...], axis=0)
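The workaround above amounts to picking rows of the bin-content array by category label and summing them. A minimal NumPy-only sketch of that idea follows; it works on a plain array rather than on a real Hist object, and group_categories, the labels and the numbers are made up for illustration.

```python
import numpy as np

# Assumed layout: one row per category, one column per bin.
categories = ["ee", "mumu", "emu"]
values = np.array(
    [
        [1.0, 2.0, 3.0],        # ee
        [10.0, 20.0, 30.0],     # mumu
        [100.0, 200.0, 300.0],  # emu
    ]
)


def group_categories(values, categories, selected):
    """Sum the rows of `values` that belong to the `selected` categories."""
    missing = [name for name in selected if name not in categories]
    if missing:
        raise KeyError(f"unknown categories: {missing}")
    rows = [categories.index(name) for name in selected]
    # Fancy-index the category axis, then collapse it.
    return values[rows, ...].sum(axis=0)


grouped = group_categories(values, categories, ["ee", "emu"])
print(grouped)  # [101. 202. 303.]
```

For a weighted histogram, the same row selection and sum would apply to the variances, since variances of independent contributions add.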
    Henry Schreiner
    @henryiii
    @pfackeldey The issue is here: scikit-hep/boost-histogram#296 . There could be a shortcut in Hist if this is not solved upstream, but ideally indexing by ["ee", "emu"] would not cause the histogram to convert into a view.
    Peter Fackeldey
    @pfackeldey
    @henryiii Thank you for your reply! Alright, I'll go for now with the reduce option, so that I don't have to convert into a view and can keep the hist :) I'd love to see a nice indexing solution in the future, such as the syntax you proposed in the issue.
    Peter Fackeldey
    @pfackeldey
    I hope you don't mind another question:
    I would like to fill a histogram with a very fine (regular) binning, e.g. 1000 bins from 0 to 1. After filling I would like to rebin into a variable binning of e.g. [0, 5, 7, 8, 9.5, 1] (only 5 bins). As far as I can see, there is the rebin action, but unfortunately it only accepts a single value/factor. Is it possible to rebin a Regular/Variable axis to any arbitrary binning, as long as the binning is "valid"?
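One way to think about this request, independent of any histogram library, is to sum the fine bins that fall between each pair of new edges. Below is a minimal NumPy sketch under that assumption. rebin_to_edges is a hypothetical helper, not an API of boost-histogram or hist, and the target edges are illustrative values chosen to lie on the fine grid. "Valid" here means that every new edge coincides with an existing fine edge.

```python
import numpy as np


def rebin_to_edges(counts, fine_edges, new_edges):
    """Merge finely binned counts into coarser bins given by `new_edges`."""
    counts = np.asarray(counts, dtype=float)
    fine_edges = np.asarray(fine_edges, dtype=float)
    new_edges = np.asarray(new_edges, dtype=float)

    if counts.size != fine_edges.size - 1:
        raise ValueError("need exactly one count per fine bin")
    if new_edges.size < 2:
        raise ValueError("need at least two new edges")

    # Index of the fine edge closest to each requested edge.
    idx = np.abs(fine_edges[:, None] - new_edges[None, :]).argmin(axis=0)
    if not np.allclose(fine_edges[idx], new_edges):
        raise ValueError("new edges must coincide with existing bin edges")
    if np.any(np.diff(idx) <= 0):
        raise ValueError("new edges must be strictly increasing")

    # Sum the fine bins lying between consecutive new edges.
    return np.add.reduceat(counts[: idx[-1]], idx[:-1])


fine_edges = np.linspace(0.0, 1.0, 1001)  # 1000 regular bins from 0 to 1
fine_counts = np.ones(1000)               # one entry per fine bin
new_edges = [0.0, 0.5, 0.7, 0.8, 0.95, 1.0]

coarse = rebin_to_edges(fine_counts, fine_edges, new_edges)
print(coarse)  # [500. 200. 100. 150.  50.]
```

The same merging can be applied to per-bin variances of a weighted histogram, because the variance of a merged bin is the sum of the variances of its parts.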