    Henry Schreiner
    @henryiii
    I’m only talking about the upper bin
    Sorry, didn’t realize I didn’t specify that. But the point is a user should be able to make these decisions in the functor
    Hans Dembinski
    @HDembinski
    agreed
    still, if we want to use the name loc, we need to agree on something
    because "loc" doesn't explicitly suggest what's happening
    Henry Schreiner
    @henryiii
    I like loc behaving like the integer form, similar to array[1:3], always rounding down, but don’t have a strong preference.
    Nicholas Smith
    @nsmith-
    btw what's the situation for NaNflow bins in boost::hist?
    Henry Schreiner
    @henryiii
    Currently doesn’t have them as far as I know (develop doesn’t, anyway)
    I assume that would be implementable as a custom axis type by a third party if needed
    (Though having an option on the current axes would be more elegant)
    Do you need them, I guess?
    Nicholas Smith
    @nsmith-
    eh, they can act as debug tools but idk if anything else
    Hans Dembinski
    @HDembinski
    NaNs are put in the overflow bin
    Hans Dembinski
    @HDembinski
    If you want to know explicitly when you fill a NaN, you can add a check outside of the histogram. It is not a good idea to add NaNflow bins to Boost.Histogram, as they come with a memory cost and conceptually it is not clear how to handle them. Underflow and overflow have a clear meaning: one counts things below the axis range, the other above. When you integrate over an axis, it is clear what should happen with these bins. Where should you add NaN bins?
    Not all feature ideas are good ideas.
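A minimal sketch of the "check outside of the histogram" idea, in plain Python. The helper name `split_nans` is hypothetical and not part of Boost.Histogram; it only shows how NaNs could be counted and set aside before the remaining values are passed to a fill call.

```python
import math


def split_nans(values):
    """Return (clean_values, nan_count) for a sequence of floats.

    Hypothetical helper: NaNs are counted here, outside the histogram,
    so the caller decides what to do with them before filling.
    """
    clean = []
    nan_count = 0
    for v in values:
        if math.isnan(v):
            nan_count += 1
        else:
            clean.append(v)
    return clean, nan_count


data = [0.5, float("nan"), 2.0, float("nan"), 1.5]
clean, n_nan = split_nans(data)
print(n_nan, clean)
```

Only `clean` would then be used to fill the histogram, and `n_nan` can be logged or asserted on as a debug aid.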
    Hans Dembinski
    @HDembinski

    @jpivarski What do you think about using the "keyword" slice to do a slicing cut in C++? It would look something like this

    auto h2 = reduce(h1, slice(2, 3));

    Note that I have to use functions such as slice, because C++ does not have keyword arguments.

    Jim Pivarski
    @jpivarski
    @HDembinski Actually, since this is C++, perhaps the appropriate word (and object type) is std::span? That might be better than importing a word from Python that has a C++ equivalent already. (Spans don't have a skip, right? Well for this, you don't want that anyway...)
    Hans Dembinski
    @HDembinski
    I cannot use std::span directly, because it is C++20, but I could use a type called span. std::span can be constructed in many different ways, one way is from a pair of iterators. I suppose that's the meaning you are referring to, because axis indices behave a bit like iterators (but not quite, they are not dereferenceable).
    I am not sure whether the analogy is a good one to draw from or whether it would be better to use a different name to not allude to the similarity
    Henry Schreiner
    @henryiii
    I prefer something like slice, especially if it could take some of the more complicated ideas we’ve had at some point. Would it help to draw up a plan for a similar “short-cut” syntax to the one we have for Python? In C++ we can use custom literals, etc. so it might be interesting.
    Hans Dembinski
    @HDembinski
    Like you said, C++ is not used in scripts so short-cut syntax is not needed.
    Henry Schreiner
    @henryiii
    It would not have to be “short”, but it could be very useful to have an axis-by-axis transformation syntax.
    Hans Dembinski
    @HDembinski
    But since the slice notation is used in Python, I would like to offer it also in C++, so that the Python code can delegate to the C++ code
    Well, not the notation, but the functionality
    Henry Schreiner
    @henryiii
    Yes, I’d like to see the same sort of expression be possible in C++ - I don’t expect it to be “short-cut”, just clear :)
    Hans Dembinski
    @HDembinski

    It would not have to be “short”, but it could be very useful to have an axis-by-axis transformation syntax.

    With reduce you can already transform axis by axis

    You cannot use project in reduce currently, but I could add that functionality
    Hans Dembinski
    @HDembinski

    Yes, I’d like to see the same sort of expression be possible in C++ - I don’t expect it to be “short-cut”, just clear :)

    Ok, here we agree then

    Henry Schreiner
    @henryiii
    Yes, I think it all could go into reduce. I was thinking something that had one transform per axis would be nice, but you can mimic that in reduce just by adding axis 0, 1, …, N to each argument, so that’s okay. And it nicely provides a way to do “nothing” to an axis: just don’t add that axis. I would recommend users provide all axes sequentially and not duplicate an axis, for readability.
    Hans Dembinski
    @HDembinski

    reduce in C++ was designed in this way so that the expensive allocation of a new storage buffer is only done once.

    hist2 = hist.shrink(0, 3, 6).rebin(0, 2).shrink(1, 2, 4).rebin(1, 2)

    does four expensive allocations and deallocations. Allocating from the heap is one of the most expensive operations nowadays, costing 100 to 1000 cycles.

    hist2 = reduce(hist, shrink_and_rebin(0, 3, 6, 2), shrink_and_rebin(1, 2, 4, 2))

    does only one such allocation, which is the minimum.
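To make the single-allocation point concrete, here is a rough one-dimensional sketch in plain Python. It is not the Boost.Histogram implementation: this `shrink_and_rebin` is a hypothetical stand-in that works on a list of counts with bin indices, ignores flow bins, and assumes the kept range divides evenly by the merge factor.

```python
def shrink_and_rebin(counts, begin, end, merge):
    """Keep bins [begin, end) and merge every `merge` adjacent bins.

    Hypothetical 1D helper: the output buffer is allocated once and
    filled in a single pass over the kept bins.
    """
    if not 0 <= begin <= end <= len(counts):
        raise ValueError("bin range out of bounds")
    if merge < 1 or (end - begin) % merge != 0:
        raise ValueError("kept range must be a multiple of merge")
    out = [0] * ((end - begin) // merge)  # the only result allocation
    for i in range(begin, end):
        out[(i - begin) // merge] += counts[i]
    return out


counts = [1, 2, 3, 4, 5, 6, 7, 8]

# Combined operation: one output buffer.
result = shrink_and_rebin(counts, 2, 6, 2)

# Chained operations: each step builds its own temporary list.
shrunk = counts[2:6]
chained = [shrunk[i] + shrunk[i + 1] for i in range(0, len(shrunk), 2)]

print(result, chained)
```

Both paths give the same counts; the difference is only in how many intermediate buffers are created along the way.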

    Henry Schreiner
    @henryiii
    hist2 = hist.reduce(bh.shrink_and_rebin(0, 3, 6, 2), bh.shrink_and_rebin(1, 2, 4, 2)) either has to have two versions implemented (1 arg and 2 arg), or it has to also do the same chain.
    Hans Dembinski
    @HDembinski
    Why? You will delegate to the C++ version, no?
    reduce takes *args
    and reduce needs to be a free function, not a method
    If you are not sure, maybe you should say so?
    Martin Percossi
    @junzi_gitlab
    Hi all
    First off - what a lovely library you have written.
    Second, I am here because I have a question: is it possible to create a histogram whose bins acquire the values passed from a numpy array?
    Hans Dembinski
    @HDembinski
    @junzi_gitlab Just to clarify, which library do you mean? We mostly discuss here the Python frontend to the C++ library Boost.Histogram, but many other libraries have been discussed as well.
    Martin Percossi
    @junzi_gitlab
    @HDembinski - thanks for your reply. I meant Boost.Histogram...
    I would like to use .view() to turn a histo into a set of values I can stuff in a pandas dataframe. Then I would like to perform a window operation on the values. This works perfectly. My problem is putting the modified bins back into a histo.
    Martin Percossi
    @junzi_gitlab
    Looking at the examples I fear it can't be done...
    Martin Percossi
    @junzi_gitlab
    Another thing I find a bit odd: to my mind a canonical operation of a histogram is to a) fill it with data and then b) ask, for some (new) value z, "what is the percentile for z?" IIUC this requires operations "outside" the Boost.Histogram library (e.g. h.view().cumsum(), h.axis(0).index(X), etc.), which I find surprising.
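For reference, the "outside the library" computation described above can be sketched in a few lines of plain Python. This assumes the bin counts (without flow bins) and the matching ascending bin edges have already been extracted from the histogram; `fraction_below` is a hypothetical name, and the linear interpolation inside the bin is an added assumption (entries spread uniformly within each bin).

```python
from bisect import bisect_right
from itertools import accumulate


def fraction_below(counts, edges, z):
    """Approximate fraction of filled entries with value below z.

    counts: per-bin counts, no flow bins.
    edges:  len(counts) + 1 ascending bin edges.
    """
    total = sum(counts)
    if total == 0:
        raise ValueError("histogram is empty")
    if z <= edges[0]:
        return 0.0
    if z >= edges[-1]:
        return 1.0
    i = bisect_right(edges, z) - 1          # index of the bin containing z
    cumulative = [0, *accumulate(counts)]   # entries below each bin's lower edge
    width = edges[i + 1] - edges[i]
    partial = counts[i] * (z - edges[i]) / width
    return (cumulative[i] + partial) / total


counts = [1, 2, 1]
edges = [0.0, 1.0, 2.0, 3.0]
print(fraction_below(counts, edges, 1.5))
```

Multiplying the result by 100 gives the percentile; without the interpolation term the answer is only resolved to bin boundaries.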
    Doug Davis
    @douglasdavis
    boost-histogram is very much in development and I think I'm not overstepping by saying that you should feel free to open an issue in the GitHub repo to spin up a feature request.
    Henry Schreiner
    @henryiii
    Please do open an issue with what you want to do. We can probably add it. Hoping for a less-alpha version by mid Oct. Also keep in mind there should be a nicer “wrapper” for boost.histogram called hist at some point.
    Martin Percossi
    @junzi_gitlab
    Thanks for the suggestion; managing my free time is always an issue with three kids and a side business (zenaud.io), apart from a day job in finance, but this is a really great project so if I do get a free slot on the weekend I would love to help. I agree that the wrapper could use a few tweaks here and there; e.g. more use of properties as opposed to methods which would make it feel more pythonic. BTW I have managed to circumvent my issue for now by taking my calculation out of Boost.Histogram: after initially filling I rip the bins out using .view(flow=True) and the rest of my calculation then operates on numpy arrays and pandas objects.
    Just to end on a positive note - the speed of the library is amazing. Also, I really like the focus of the library - no pointless methods to plot the contents of the histogram. As a C++ developer I like that I can write code that can accept objects created from a Python script. So all in all I'm very pleased, and look forward to making use of more advanced features.
    Hans Dembinski
    @HDembinski

    Another thing I find a bit odd: to my mind a canonical operation of a histogram is to a) fill it with data and then b) ask, for some (new) value z, "what is the percentile for z?" IIUC this requires operations "outside" the Boost.Histogram library (e.g. h.view().cumsum(), h.axis(0).index(X), etc.), which I find surprising.

    Boost.Histogram provides Python bindings to the C++ library of the same name. Boost and the C++ standard library follow the philosophy of providing a set of orthogonal components. Each one is specialized to do just one thing really well, and interfaces are provided to combine all these components to do amazing things. The keyword here is orthogonal: one library does not duplicate functionality that can already be obtained from another library component.

    This is the most efficient and powerful way to design a set of libraries, because it requires you to learn the smallest number of library interfaces. I don't know whether you played with Lego as a kid, but this is the same thing. You can build anything with Lego, because all the pieces fit together.

    The Python wrapper to Boost.Histogram will follow the same philosophy. It is a mapping of the C++ functionality. On top of that, Henry is working on a library called hist, which provides an interface that lets you do common analysis tasks in a flash.

    Just to end on a positive note - the speed of the library is amazing. Also, I really like the focus of the library - no pointless methods to plot the contents of the histogram. As a C++ developer I like that I can write code that can accept objects created from a Python script. So all in all I'm very pleased, and look forward to making use of more advanced features.

    That's great to hear, you are one of the users I had in mind when I designed Boost.Histogram :).