    Angus Hollands
    @agoose77:matrix.org
    [m]

    Hi all,
    When using UHI on a 3D histogram with an IntCategory first axis, I notice that it seems to ignore the starting indices of my projection, e.g.

    h[1::sum, ...] should project the contents after 0 in the first dimension, but this start index seems to be ignored. I see the same result if I slice and then manually call project.

    Upon further investigation, it seems that this only happens if I don't provide the stop attribute of the slice, i.e. h[1:len:sum] works. This is really, really useful, btw. Thanks for all the hard work.

    Angus Hollands
    @agoose77:matrix.org
    [m]
    I'll open a bug report with a reproducer :)
    Hans Dembinski
    @HDembinski
    Thanks, very much appreciated!
    Hans Dembinski
    @HDembinski
    @agoose77:matrix.org I cannot reproduce this on a 1-dimensional histogram. I use this syntax a lot myself and did not notice any issues so far.
    h[1:len:sum] cuts off the overflow bin, while h[1::sum] includes overflow
    The IntCategory axis has an overflow bin to keep track of all the items that did not end up in one of your categories.
    Hans Dembinski
    @HDembinski
    Ok, I can confirm this on IntCategory, it is a bug
    Angus Hollands
    @agoose77:matrix.org
    [m]
    @HDembinski thanks for taking a look, I'll try it again - I noticed that any value of start didn't affect the projection which surprised me
    Ah, excellent (not going mad) !
    Hans Dembinski
    @HDembinski
    Did you already start your Bug report?
    Otherwise I will post code to reproduce
    Angus Hollands
    @agoose77:matrix.org
    [m]
    Hans Dembinski
    @HDembinski
    This is a boost-histogram issue, but ok
    Angus Hollands
    @agoose77:matrix.org
    [m]
    OK, shall I migrate it?
    Hans Dembinski
    @HDembinski
    I moved it, thanks for reporting! Very annoying bug :(
    Angus Hollands
    @agoose77:matrix.org
    [m]
    No, thanks for checking in! I opened a redundant issue on boost-histogram, which I've now closed.
    Hans Dembinski
    @HDembinski
    I think you closed the issue that I moved?
    But you reopened it, so it is ok
    Angus Hollands
    @agoose77:matrix.org
    [m]
    😂 yes, it took a few goes, but we're now in a place of equilibrium :)
    Henry Schreiner
    @henryiii
    I love that we can move around issues within scikit-hep :)
    Angus Hollands
    @agoose77:matrix.org
    [m]
    It took me by surprise at first, super useful.
    Hans Dembinski
    @HDembinski

    I love that we can move around issues within scikit-hep :)

    Me, too, it is great

    Andrzej Novak
    @andrzejnovak
    In hist, how can I rebin on a categorical axis?
    Andrzej Novak
    @andrzejnovak
    @henryiii
    Hans Dembinski
    @HDembinski
    @andrzejnovak Rebinning does not make sense on a category axis, since categories are not ordinal
    Andrzej Novak
    @andrzejnovak
    @HDembinski possibly a poor word choice, it's really this problem scikit-hep/hist#211 - merging categories if you will
    agoose77
    @agoose77:matrix.org
    [m]
    @andrzejnovak: this is a projection in hist terms
    I believe you can do this by some_hist[0:len:sum] IIRC *
    It's the sum that's doing the work here (the projection) https://uhi.readthedocs.io/en/latest/indexing.html#slicing
    The UHI syntax is really really nice.
    Henry Schreiner
    @henryiii
    If you want to merge arbitrary bins, say [["one", "two"], ["three"]], then there's not a good way to do that, and it's tricky to write as well: what is the name for the merged bin? Since we depend on Python 3.6+ now, I guess an (ordered) dict could be used; keys would be new bin names, and the values would be collections of bins to merge. If we had pick, I think it would be easier: you could select each iteration, then merge.
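The dict semantics Henry sketches can be modelled in plain Python. This is only an illustration of the idea, not hist or boost-histogram API: `merge_categories` is a hypothetical helper, and the per-category bin contents are represented as ordinary lists.

```python
def merge_categories(counts, groups):
    """Merge per-category bin contents.

    counts: dict mapping old category name -> list of bin contents
    groups: dict mapping new category name -> collection of old names to merge
    (hypothetical helper, not part of hist)
    """
    merged = {}
    for new_name, old_names in groups.items():
        rows = [counts[name] for name in old_names]
        # Add the selected categories bin by bin.
        merged[new_name] = [sum(column) for column in zip(*rows)]
    return merged


counts = {"one": [1, 2, 3], "two": [10, 20, 30], "three": [5, 5, 5]}
merged = merge_categories(counts, {"low": ["one", "two"], "high": ["three"]})
print(merged)
```

The dict keys answer the naming question directly: each merged bin gets the name of its key, and dict ordering fixes the order of the new categories.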
    Andrzej Novak
    @andrzejnovak
    I would favour the dict semantics, should be more intuitive and less error-prone than any looping. It's a really common use case for virtually all of my workflows (ptbinned samples need to be separately scaled and then merged) and I've been hesitating on recommending new students to use hist instead of coffea/hist because the current workaround is not trivial
    Andrzej Novak
    @andrzejnovak
    I've run into something strange (wanted to check here before I open an issue). I create a 2D hist and fill it such that entries fall into flow bins. If I access flow bins separately like h[hist.tag.underflow, 1], a value gets returned as expected. If I try to make a projection like h[hist.tag.underflow, ...], I am hit with ValueError: bins > 0 required
    Angus Hollands
    @agoose77:matrix.org
    [m]
    @andrzejnovak: what kind of axes do you have?
    Andrzej Novak
    @andrzejnovak
    regular
    import hist
    import numpy as np
    s = hist.tag.Slicer()
    
    w = 1e-2  # e.g. a cross section for a process
    x = np.random.normal(loc=0.4, scale=0.4, size=1000)
    y = np.random.normal(loc=0.6, scale=0.4, size=1000)
    
    h = hist.Hist(
        hist.axis.Regular(3, 0, 1, name="x", label="axis"),
        hist.axis.Regular(3, 0, 1, name="y", label="axis"),
        hist.storage.Weight(),
    )
    h.fill(x=x, y=y, weight=w)
    
    # Reported above to raise "ValueError: bins > 0 required"
    h[hist.tag.underflow, ...]
    Angus Hollands
    @agoose77:matrix.org
    [m]
    Hmm. I'm not sure on this one 😕
    Andrzej Novak
    @andrzejnovak
    maybe @henryiii can be summoned :)
    Andrzej Novak
    @andrzejnovak
    unrelated: does boost_histogram have a recommended/defined way to serialize/save stuff? I vaguely recall seeing some discussion about it, but don't remember. I am currently writing TH1s to root files, but it feels superfluous since I am reading them back out and converting to numpy arrays :D
    alexander-held
    @alexander-held
    I'm curious about this too! In particular I'm wondering about a use case where I may be producing many histograms in parallel and would like to save them to a single file. Is this possible, or do I need to first collect them (like coffea if I understand that correctly) and then save all at once?
    Henry Schreiner
    @henryiii
    Currently pickling is well supported, and quite fast. There's a plan in the very near future to offer something specific that will even work across boost-histogram for Python and Boost.Histogram C++, but that's still in the planning phase. There also might be Hist-specific support for some other formats, and uproot 4 writing should support boost-histograms.
    Jim Pivarski
    @jpivarski

    In this talk, https://indico.cern.ch/event/1028381/#6-pythonic-data-science, I pointed out that this is an area that needs improvement: serialization of boost-histograms and interoperability with ROOT. One avenue is to use ROOT histograms as the serialization format, which takes advantage of the predominance of that format, but would lose some of the boost-histogram features, like mixed categorical/continuous axes. Uproot writing should do those conversions in either case.

    Beyond that, there ought to be a lossless way to save and load boost-histograms.

    Oh, one of the other ideas was just for Boost.Histograms (in C++) to use ROOT's streamer-generation macros, so that they can be serialized to ROOT files without conversion into and out of TH*.
    Henry Schreiner
    @henryiii
    Hmm, using underflow/overflow in a slice to pick looks like a possible bug. Underflow/overflow is special in slicing, and since we don’t have a single-element pick, it uses slicing in the backend.
    Andrzej Novak
    @andrzejnovak
    Is there a possible workaround?
    Henry Schreiner
    @henryiii
    Just really hacky ones, like using .view()
    Fixing it likely will be hacky too unless we get a “pick_one” command of some sort.
    Andrzej Novak
    @andrzejnovak
    ah, that's a bummer, I've actually stumbled into this while trying to investigate some other sus flow related behaviour
    Henry Schreiner
    @henryiii
    It will be fixable, I think, just will be a little messy on the backend until/unless we get a one-element "pick".
    Hans Dembinski
    @HDembinski
    I suggested a non-hacky solution in the PR that can be implemented in pure Python