    Henry Schreiner
    @henryiii
    If you want to merge arbitrary bins, say [["one", "two"], ["three"]], then there’s not a good way to do that, but it’s tricky to write as well; what is the name for the merged bin? Since we depend on Python 3.6+ now, I guess an (ordered) dict could be used; keys would be new bin names, and the values would be collections of bins to merge. If we had pick, I think it would be easier, you could select each iteration then merge.
    Andrzej Novak
    @andrzejnovak
    I would favour the dict semantics, should be more intuitive and less error-prone than any looping. It's a really common use case for virtually all of my workflows (ptbinned samples need to be separately scaled and then merged) and I've been hesitating on recommending new students to use hist instead of coffea/hist because the current workaround is not trivial
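The dict semantics discussed above can be sketched in plain Python, independent of any histogram library. This is only an illustration of the idea, not an existing hist or boost-histogram API: the helper name `scale_and_merge`, the sample names, and the scale factors are all hypothetical.

```python
def scale_and_merge(samples, groups, scales=None):
    """Merge named per-bin count lists according to a dict of groups.

    samples: dict mapping an existing bin (sample) name to a list of counts
    groups:  dict mapping each new bin name to the existing names to merge
    scales:  optional dict of per-sample scale factors (default 1.0)
    """
    scales = scales or {}
    merged = {}
    for new_name, old_names in groups.items():
        # Scale each sample separately before merging
        scaled = [
            [scales.get(name, 1.0) * count for count in samples[name]]
            for name in old_names
        ]
        # Sum bin by bin across all samples in this group
        merged[new_name] = [sum(column) for column in zip(*scaled)]
    return merged


# Hypothetical inputs: three samples with three bins each
samples = {
    "one": [1, 2, 3],
    "two": [10, 20, 30],
    "three": [100, 200, 300],
}
merged = scale_and_merge(
    samples,
    groups={"low": ["one", "two"], "high": ["three"]},
    scales={"two": 0.5},
)
print(merged)
```

The keys of `groups` answer the "what is the name for the merged bin?" question directly, and dict ordering (guaranteed since Python 3.7) fixes the order of the new bins.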
    Andrzej Novak
    @andrzejnovak
    I've run into something strange (wanted to check here before I open an issue). I create a 2D hist and fill it such that entries fall into flow bins. If I access flow bins separately like h[hist.tag.underflow, 1], a value gets returned as expected. If I try to make a projection like h[hist.tag.underflow, ...], I am hit with ValueError: bins > 0 required
    Angus Hollands
    @agoose77:matrix.org
    @andrzejnovak: what kind of axes do you have?
    Andrzej Novak
    @andrzejnovak
    regular
    import hist
    import numpy as np
    s = hist.tag.Slicer()
    
    w = 1e-2  # e.g. a cross section for a process
    x = np.random.normal(loc=0.4, scale=0.4, size=1000)
    y = np.random.normal(loc=0.6, scale=0.4, size=1000)
    
    h = hist.Hist(
        hist.axis.Regular(3, 0, 1, name="x", label="axis"),
        hist.axis.Regular(3, 0, 1, name="y", label="axis"),
        hist.storage.Weight(),
    )
    h.fill(x=x, y=y, weight=w)
    
    h[hist.tag.underflow, ...]  # raises ValueError: bins > 0 required
    Angus Hollands
    @agoose77:matrix.org
    Hmm. I'm not sure on this one 😕
    Andrzej Novak
    @andrzejnovak
    maybe @henryiii can be summoned :)
    Andrzej Novak
    @andrzejnovak
    unrelated: does boost_histogram have a recommended/defined way to serialize/save stuff? I vaguely recall seeing some discussion about it, but don't remember. I am currently writing TH1s to root files, but it feels superfluous since I am reading them back out and converting to numpy arrays :D
    alexander-held
    @alexander-held
    I'm curious about this too! In particular I'm wondering about a use case where I may be producing many histograms in parallel and would like to save them to a single file. Is this possible, or do I need to first collect them (like coffea if I understand that correctly) and then save all at once?
    Henry Schreiner
    @henryiii
    Currently pickling is well supported, and quite fast. There’s a plan in the very near future to offer something specific that will even work across boost-histogram for Python and Boost.Histogram C++, but that’s still in the planning phase. There also might be Hist-specific support for some other formats, and uproot 4 writing should support boost-histograms.
    Jim Pivarski
    @jpivarski

    In this talk, https://indico.cern.ch/event/1028381/#6-pythonic-data-science, I pointed out that this is an area that needs improvement: serialization of boost-histograms and interoperability with ROOT. One avenue is to use ROOT histograms as the serialization format, which takes advantage of the predominance of that format, but would lose some of the boost-histogram features, like mixed categorical/continuous axes. Uproot writing should do those conversions in either case.

    Beyond that, there ought to be a lossless way to save and load boost-histograms.

    Oh, one of the other ideas was just for Boost.Histograms (in C++) to use ROOT's streamer-generation macros, so that they can be serialized to ROOT files without conversion into and out of TH*.
    Henry Schreiner
    @henryiii
    Hmm, using underflow/overflow in a slice to pick looks like a possible bug. Underflow/overflow is special in slicing, and since we don’t have a single-element pick, it uses slicing in the backend.
    Andrzej Novak
    @andrzejnovak
    Is there a possible workaround?
    Henry Schreiner
    @henryiii
    Just really hacky ones, like using .view()
    Fixing it likely will be hacky too unless we get a “pick_one” command of some sort.
    Andrzej Novak
    @andrzejnovak
    ah, that's a bummer, I've actually stumbled into this while trying to investigate some other sus flow related behaviour
    Henry Schreiner
    @henryiii
    It will be fixable, I think, just will be a little messy on the backend until/unless we get a one-element "pick".
    Hans Dembinski
    @HDembinski
    I suggested a non-hacky solution in the PR that can be implemented in pure Python
    Jim Pivarski
    @jpivarski

    This is likely a very easy question, but I haven't found the answer. What's a quick way to h.plot(...) with a logarithmic y-axis? I've tried ylog=True, logy=True, and yscale="log".

    I could follow the example given in scikit-hep/hist#198 and extract the ax object out of the output, but I'm writing a tutorial and I want to show how easy this is. (It's for the Uproot tutorial, but I'm trying to include related libraries.)

    @henryiii (The above question is for hist, not necessarily boost-histogram.)
    Henry Schreiner
    @henryiii
    I think we currently just expect users to use the existing mpl tools to do this, rather than trying to shortcut it - but it’s likely common enough to provide a shortcut. I’ll check in a little bit - we pass on to mplhep, so if mplhep already provides a shortcut, then that “just works”.
    Though, what’s wrong with plt.yscale("log")?
    Jim Pivarski
    @jpivarski
    I haven't imported matplotlib.pyplot as plt yet, but I could do that.
    Henry Schreiner
    @henryiii
    That way, you don’t have to look up anything, and it works everywhere, rather than being hist-specific.
    Jim Pivarski
    @jpivarski
    It just seems like something that should be part of a one-liner, just as it is in Pandas.
    Hist.plot is sending some properties downstream (i.e. nothing to look up: they're defined by Matplotlib), in the StepPatch object it turns out.
    Henry Schreiner
    @henryiii
    We could match pandas.
    Jim Pivarski
    @jpivarski

    https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.plot.html

    • ax
    • figsize
    • title
    • grid
    • legend
    • logx
    • logy
    • loglog
    • xticks
    • yticks
    • xlim
    • ylim
    • xlabel
    • ylabel
    • rot
    • fontsize
    • colormap
    • colorbar

    ?

    Henry Schreiner
    @henryiii
    We could add some of those to Hist - I think mplhep should remain like mpl and use the normal, composable functions and calls.
    But I’m not against adding logy to hist. Not sure if logx makes much sense, or several of the others.
    Jim Pivarski
    @jpivarski
    That's true. This is a superset of what I think would be reasonable.
    ax is definitely good. That's saved me a few times (in Pandas).
    Henry Schreiner
    @henryiii
    We have ax
    Jim Pivarski
    @jpivarski
    Oh, good. When it's not a super-quick plot, I always start with fig, ax = plt.subplots(...).
    Henry Schreiner
    @henryiii
    Yes, as one should. :)
    Hans Dembinski
    @HDembinski
    I think it is really bad to duplicate matplotlib interface
    Wrapping a popular library like matplotlib behind your own interface is doing more harm than good, because now someone has to learn the matplotlib api and this additional api
    Andrzej Novak
    @andrzejnovak
    I agree with Hans here, though given how ubiquitous pandas is, having similar shortcuts is not too far out of left field. While I also get a little annoyed by how long matplotlib.pyplot as plt is, duplicating the interface is not the way. For faster access I could see 2 options: 1) have a shortcut to pyplot with hist.plt/mplhep.plt, or 2) .plot() returning the ax object. I think Henry is very pro returning the artists, but the tuple with the artists could quite easily contain the ax as well. Then you could do h_obj.plot().ax.semilogy() if you really wanted a one-liner
    Henry Schreiner
    @henryiii
    I thought about that one-liner, but it only works exactly once; you can’t make two changes that way (and you can easily get the axis from an artist). Is “pyplot” a required import, though? I seem to remember that you can actually avoid that import, and just import the parts you need (though I’m pretty sure we don’t do that)
    The point of hist.plot is to have a quick way to make a plot. If you want to import stuff, you can make figures/axes with plt, use mplhep.histplot, etc. Just like pandas, “.plot” is intended to be a shortcut for exploration, so mimicking the pandas API for quick-plotting isn’t unreasonable.
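The remark that the axis is easy to get from an artist can be illustrated with Matplotlib alone. In this sketch `Axes.stairs`, which returns a `StepPatch`, stands in for whatever artists `Hist.plot` returns; that substitution is an assumption made to keep the example small. Holding on to the axes also sidesteps the "only works once" limit of the chained one-liner.

```python
import matplotlib

matplotlib.use("Agg")  # non-interactive backend so the sketch runs headless

import matplotlib.pyplot as plt
import numpy as np

values, edges = np.histogram(
    np.random.default_rng(1).normal(size=10_000), bins=50
)

fig, ax = plt.subplots()
artist = ax.stairs(values, edges)  # returns a matplotlib.patches.StepPatch

# Every artist that has been added to an Axes keeps a reference to it
same_ax = artist.axes

# With the axes in hand, any number of changes can follow
same_ax.set_yscale("log")
same_ax.set_ylabel("Entries")
```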
    Henry Schreiner
    @henryiii
    But I don’t want this to balloon, either, and skills you learn for matplotlib are composable / work anywhere, including other matplotlib plots, such as from pandas
    Supporting the pandas plot API might be something we could make a bit more formal. Will look into it.
    Matthew Feickert
    @matthewfeickert

    This isn't directly related, but this is reminding me of when I learned about how to use interactive vs. non-interactive matplotlib backends earlier this year and how that relates to using the matplotlib.pyplot.subplots vs. matplotlib.figure.Figure APIs

    https://gist.github.com/matthewfeickert/84245837f09673b2e7afea929c016904

    Henry Schreiner
    @henryiii
    Looks like histplot from mplhep already duplicates / provides a lot of this. I’d think keeping that simple and only having the shortcuts in .plot would have been better.
    veprbl
    @veprbl:matrix.org
    Hey! I have a question: Is there a file format that allows interop between C++ and python boost-histogram?
    Henry Schreiner
    @henryiii
    It’s planned. I think you can find a discussion of it a little while back in this(?) channel.
    veprbl
    @veprbl:matrix.org
    I see. Thanks!
    Henry Schreiner
    @henryiii
    Shiny new versions of boost-histogram and hist are out, just in time for the talk at PyHEP. Watch the talk in 6.5 hours on Zoom or YouTube LiveStream (I think) to see what’s new!