    Henry Schreiner
    @henryiii
    Boost.Histogram team is @HDembinski, congrats to him :)
    Eduardo Rodrigues
    @eduardo-rodrigues
    :+1:
    Jim Pivarski
    @jpivarski
    Indeed, congratulations @HDembinski, that's awesome!
    Hans Dembinski
    @HDembinski
    Thanks, my lobbying seems to help this case
    alexander-held
    @alexander-held
    I came across the discussion in scikit-hep/boost-histogram#459 about boost-histogram returning consistent objects via view() for different storages. I personally value a consistent API for the tiny subset of features I use in practice (e.g. double/weight storages) higher than the extra flexibility. I suspect API consistency may also help with typing. Is this more consistent API something that may fit into hist (or is it maybe already available there)?
    Hans Dembinski
    @HDembinski
    You can now call histogram.values(); you don't need the view anymore.
    Henry Schreiner
    @henryiii
    The pressure on this is much less now that you have the PlottableProtocol methods, .values(), .variances(), and .counts(). You can’t quite get everything there, but most of it. Technically, Hist could change the return into a View even for simple storages, but then it would lose compatibility with boost-histogram, which is a no-go. Maybe, now that there’s an easy way to get those values, .view() could be made to always return a View, but that would be a breaking change and would need to be considered carefully
    Hans Dembinski
    @HDembinski
    @henryiii Beat you by the fraction of a second apparently
    Henry Schreiner
    @henryiii
    Not by much, but yes. You just did it with an email, too.
    alexander-held
    @alexander-held
    Thanks! This is very nice and I forgot I am already using the feature. Access for writing is still different I think (e.g. h.view().value = [3,5] for weight, h[...] = [1,2] for double), but that is probably less frequently needed.
    Henry Schreiner
    @henryiii
    Actually, h.values()[…] = [1,2] works. If you want to set variance, you should set both at the same time, h[…] = [[1,.1], [2,.1]].
    alexander-held
    @alexander-held
    That's great, I was not aware this works for weighted storage too. Thanks!
    Henry Schreiner
    @henryiii
    h.variances()[…] = will not work if variance is computed, as you are setting a computed value (mean storages). .values()[…] = should work on all the existing storages, though. I would mostly recommend setting them all at once, using the h[…] = syntax, though.
    Hans Dembinski
    @HDembinski

    h.variances()[…] = will not work if variance is computed, as you are setting a computed value (mean storages). .values()[…] = should work on all the existing storages, though. I would mostly recommend setting them all at once, using the h[…] = syntax, though.

    This could be fixed, though, right?

    Everything that a user may naively do should work as expected if it is technically possible
    Henry Schreiner
    @henryiii
    No, h.variances() returns a NumPy array that has been generated. Though https://github.com/scikit-hep/boost-histogram/discussions/504 would make this all much more elegant; you could write h.variances = … and that would just work (and support flow / noflow).
    Hans Dembinski
    @HDembinski
    Right, I forgot that it is a function call where we cannot just implement the [:]
    Hans Dembinski
    @HDembinski
    Added some more comments to that discussion
    Hans Dembinski
    @HDembinski
    @alexander-held The view gives you low-level access to the accumulators. Since the accumulators are different, the interface is different. We will not change that. If you want the unified high-level interface, you can use .values() and .variances() on the histogram.
    Andrzej Novak
    @andrzejnovak
    Apologies if it's in the docs, but I couldn't find it. Doesn't hist have a .scale option?
    Dawned on me 1s after I asked, I can just multiply it
    heatherrussell
    @heatherrussell
    Are there plans to support dividing histograms / efficiencies / perhaps poisson uncertainties on data points (asymmetric error bars)?
    sorry if this is already stated somewhere, I checked the docs and the open issues but didn't see anything relevant
    Jim Pivarski
    @jpivarski

    The naive assumption when dividing histograms with error bars is that the error bars are independent (the same assumption that is usually made when adding or subtracting), but the most common use-case for dividing is to make an efficiency plot, in which the numerator is a strict subset of the denominator and both are counting statistics. Even when we know that this is the case, there are different ways of handling the statistics that differ for ratios close to 0 or 1. See the table on this page. There are strong arguments for some of these options, but not everybody agrees.

    So if there is a way to divide histograms in a histogramming library, it should probably be some kind of method call, so that the statistical treatment can be configurable. If it's just the / operator, a lot would have to be assumed.
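To illustrate why a configurable method call is attractive here, the following is a sketch in plain Python, not an API from boost-histogram, hist, or coffea. The function name `efficiency` and its parameters are hypothetical. It compares the naive normal-approximation (Wald) interval with the Wilson score interval, two of the treatments that disagree near 0 and 1.

```python
import math


def efficiency(passed, total, method="wilson", z=1.0):
    """Return (estimate, lower, upper) for a binomial efficiency.

    Hypothetical helper for illustration only. `z` is the number of
    standard deviations (z=1.0 is roughly a 68% interval).
    """
    if total <= 0 or not 0 <= passed <= total:
        raise ValueError("need 0 <= passed <= total and total > 0")
    p = passed / total
    if method == "wald":
        # Normal approximation: collapses to zero width at p = 0 or p = 1.
        half = z * math.sqrt(p * (1.0 - p) / total)
        lower, upper = p - half, p + half
    elif method == "wilson":
        # Wilson score interval: stays inside [0, 1] with non-zero width.
        denom = total + z * z
        center = (passed + 0.5 * z * z) / denom
        half = (z / denom) * math.sqrt(
            passed * (total - passed) / total + 0.25 * z * z
        )
        lower, upper = center - half, center + half
    else:
        raise ValueError(f"unknown method: {method!r}")
    return p, max(0.0, lower), min(1.0, upper)


# Per-bin efficiencies for a numerator that is a subset of the denominator.
numerator = [0, 3, 10]
denominator = [10, 10, 10]
wald = [efficiency(k, n, "wald") for k, n in zip(numerator, denominator)]
wilson = [efficiency(k, n, "wilson") for k, n in zip(numerator, denominator)]
```

For the first and last bins the Wald interval has zero width while the Wilson interval does not, which is the kind of difference a bare `/` operator would have to silently choose between.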

    heatherrussell
    @heatherrussell

    So if there is a way to divide histograms in a histogramming library, it should probably be some kind of method call, so that the statistical treatment can be configurable. If it's just the / operator, a lot would have to be assumed.

    yes, definitely! If it's not appropriate to include in a histogramming library, is this something that people are doing manually in their analyses? Or is there some other stats package that's more appropriate to be using here? Because right now, the simplest way of making an efficiency plot seems to me to be to convert boost_histograms into TH1 and divide there, which is a little silly!

    alexander-held
    @alexander-held
    I think the coffea histogram implementation has some of the relevant methods already implemented (https://coffeateam.github.io/coffea/modules/coffea.hist.html).
    Matthew Feickert
    @matthewfeickert
    @heatherrussell If you want to use hist to make a ratio plot where the ratio is an efficiency then you could just follow the example in the User Guide with kwarg rp_uncertainty_type="poisson-ratio" (where of course you'd change the histograms so that hist_1 is a strict subset of hist_2).
    heatherrussell
    @heatherrussell

    I think the coffea histogram implementation has some of the relevant methods already implemented (https://coffeateam.github.io/coffea/modules/coffea.hist.html).

    thanks, I always steered away from coffea because all the examples are CMS-based and I haven't sat down and translated to atlas jargon :D I didn't actually realise it had histogramming!

    and

    If you want to use hist to make a ratio plot where the ratio is an efficiency then you could just follow the example in the User Guide with kwarg rp_uncertainty_type="poisson-ratio"

    I also hadn't realised that hist could do this properly because I didn't see any efficiencies in the example.

    two options now, thanks everyone!

    Andrzej Novak
    @andrzejnovak
    It should be pointed out that the plan is for coffea's histogram implementation to switch to using hist, so if you have not used either yet, going for the one packaged with coffea is not recommended
    Matthew Feickert
    @matthewfeickert
    To follow up on @andrzejnovak's comment the coverage intervals that coffea has implemented were basically ported into the hist.intervals module, so using either should give identical results and if coffea moves to using hist it will basically just be changes in the API called. :+1: Big thanks to @nsmith- here as he was the first to implement these in coffea and has been very helpful in giving feedback and advice.
    Angus Hollands
    @agoose77:matrix.org
    [m]

    Hi all,
    When using UHI on a 3D histogram with an IntCategory first axis, I notice that it seems to ignore the starting indices of my projection, e.g.

    h[1::sum, ...] should project the contents after 0 in the first dimension, but this start index seems to be ignored. I see the same result if I slice and then manually call project.

    Upon further investigation, it seems that this only happens if I don't provide the stop attribute of the slice, i.e. h[1:len:sum] works. This is really, really useful, btw. Thanks for all the hard work.

    Angus Hollands
    @agoose77:matrix.org
    [m]
    I'll open a bug report with a reproducer :)
    Hans Dembinski
    @HDembinski
    Thanks, very much appreciated!
    Hans Dembinski
    @HDembinski
    @agoose77:matrix.org I cannot reproduce this on a 1-dimensional histogram. I use this syntax a lot myself and did not notice any issues so far.
    h[1:len:sum] cuts off the overflow bin, while h[1::sum] includes overflow
    The IntCategory axis has an overflow bin to keep track of all the items that did not end up in one of your categories.
    Hans Dembinski
    @HDembinski
    Ok, I can confirm this on IntCategory, it is a bug
    Angus Hollands
    @agoose77:matrix.org
    [m]
    @HDembinski thanks for taking a look, I'll try it again - I noticed that any value of start didn't affect the projection which surprised me
    Ah, excellent (not going mad) !
    Hans Dembinski
    @HDembinski
    Did you already start your Bug report?
    Otherwise I will post code to reproduce
    Angus Hollands
    @agoose77:matrix.org
    [m]
    Hans Dembinski
    @HDembinski
    This is a boost-histogram issue, but ok
    Angus Hollands
    @agoose77:matrix.org
    [m]
    OK, shall I migrate it?
    Hans Dembinski
    @HDembinski
    I moved it, thanks for reporting! Very annoying bug :(
    Angus Hollands
    @agoose77:matrix.org
    [m]
    No, thanks for checking in! I opened a redundant issue on boost-histogram, which I've now closed.
    Hans Dembinski
    @HDembinski
    I think you closed the issue that I moved?
    But you reopened it, so it is ok