    Matthieu Marinangeli
    @marinang
    Basically replace
    from root_numpy import root2array, rec2array
    
    signal = root2array("/tmp/HIGGS-signal.root",
                        "tree",
                        branch_names)
    signal = rec2array(signal)
    with
    import uproot
    
    signal_tree = uproot.open("/tmp/HIGGS-signal.root")["tree"]
    signal = signal_tree.arrays(branch_names)
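    One difference worth noting in this migration (assuming the uproot 3 behavior): arrays(branch_names) returns a dict keyed by branch name rather than the 2D array that rec2array produced, so a small stacking step recovers the old shape. The branch names and values below are illustrative stand-ins, not read from the actual file:

```python
import numpy as np

# Stand-in for the dict that signal_tree.arrays(branch_names) would return;
# the branch names and values here are illustrative only.
branch_names = ["lepton_pT", "lepton_eta"]
arrays = {
    "lepton_pT": np.array([35.2, 48.7, 22.1]),
    "lepton_eta": np.array([0.4, -1.2, 2.0]),
}

# Stack the per-branch columns into the (n_events, n_branches) array
# that rec2array used to produce.
signal = np.stack([arrays[name] for name in branch_names], axis=1)
```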
    Matthew Feickert
    @matthewfeickert

    Hey @all. My Ph.D. students have been asking me questions about how to do end user analysis with the scikit-hep ecosystem and I've obviously shown them all of our work and tutorials. However, I was also considering making a small helper package that would wrap a lot of the common things one might want to do with uproot and mplhep and friends to make onboarding people faster. It would be pip installable to be useful but it wouldn't be on PyPI.

    Do people think this is a good idea? Or is it a bit of an anti-pattern for what we're trying to achieve as a project? Is helping my Ph.D. students go from 0 to plots in group meetings even faster a good idea, or is the creation of yet another wrapper utility library a bad thing for proper adoption of all of our tools?

    Eduardo Rodrigues
    @eduardo-rodrigues
    Hi @matthewfeickert. Hmm, as usual, no perfect solution.
    How about the following: for easy environments, e.g. for experiments, we've had the idea of the scikit-hep metapackage for a long while, but never actually turned that first org package into a proper metapackage. At this stage we're really not that far off anymore. Once ready, wouldn't that be a good solution for an environment for your students?
    Then, progress has been made in the tutorials package scikit-hep-tutorials, see https://scikit-hep.org/scikit-hep-tutorials for the little bits that are ready, and a lot more is in the pipeline - sadly, higher-priority work often gets in the way.
    Your use case could serve as a motivation for that extra push needed? I'm also willing to help a bit.
    Just a thought. Pushing on this would be beneficial for everyone.
    Matthew Feickert
    @matthewfeickert

    @eduardo-rodrigues agreed that this is hard.

    I like the idea of the metapackage and actually forgot that https://github.com/scikit-hep/scikit-hep existed. I think the "hard" thing here is getting everyone to agree on the scope of the library functionality. For example, should there be some sort of scikithep.viz.plot_stack_hist functionality for making a very common stack plot using @andrzejnovak's mplhep? This obviously exists in mplhep, but a wrapper function would save my Ph.D. students from writing the same 20 lines of code over and over. But should this be in mplhep or should it be in scikit-hep? Or neither?
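    For concreteness, the kind of helper being discussed might look like the sketch below. The name plot_stack_hist and its signature are hypothetical, not part of any Scikit-HEP package; the sketch uses plain matplotlib so it stands alone, whereas in practice one would likely build it on mplhep.histplot:

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np

def plot_stack_hist(samples, labels, bins=25, hist_range=None, ax=None):
    """Hypothetical helper: stack several samples into one filled histogram."""
    if ax is None:
        _, ax = plt.subplots()
    counts, edges, _ = ax.hist(samples, bins=bins, range=hist_range,
                               stacked=True, histtype="stepfilled", label=labels)
    ax.legend()
    return counts, edges, ax

# Example: two toy "background" samples stacked into one plot.
rng = np.random.default_rng(0)
counts, edges, ax = plot_stack_hist(
    [rng.normal(0.0, 1.0, 500), rng.normal(1.0, 2.0, 300)],
    labels=["bkg A", "bkg B"], bins=20, hist_range=(-5.0, 5.0))
```

    Whether such a one-liner belongs in a shared package or in each group's own helper module is exactly the design question raised above.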

    It seems like this is a bit of a tricky design situation akin to asking "what should be in matplotlib and what should be in seaborn?"

    Also great to see that there is a Jupyter Book for the tutorials (which I also didn't know about)! Nice work there @eduardo-rodrigues and @henryiii! :D

    Henry Schreiner
    @henryiii
    The idea of the metapackage is to have a simple way to set up an environment, not that it “wraps” everything with a new interface. There will likely not be scikithep. or skhep. anything eventually, but pip install scikit-hep will just be a good thing to tell a student for them to get started with a coherent environment. If, for example, you had scikithep.viz.plot_stack_hist, you couldn’t use it if you didn’t have scikit-hep installed but did have mplhep and hist installed. That would either be bad for keeping distinct, modular packages, or it would duplicate the interface and double the training material, both bad.
    pip install scikit-hep would give you uproot, awkward, hist, boost-histogram, mplhep, vector, etc.
    (It is getting close to that now, but there are still a few things inside scikit-hep that are not standalone yet.)
    Matthew Feickert
    @matthewfeickert
    Ah okay. So it sounds like the plan for scikit-hep would be similar in effect to the Tidyverse in R (from what I understand from my R-using wife)? So there would be intentionally limited use cases for import skhep (some still, but not like a wrapper library) and its main job is to just install the ecosystem. (yes?)
    Matthew Feickert
    @matthewfeickert

    Your organization, scikit-hep, is off the waitlist—you're all set to start using code scanning on public repositories!

    Learn more about code scanning.

    Note, if you requested access on private repositories as well, you should hear from a member of GitHub's sales team soon with more details.

    Nice

    Eduardo Rodrigues
    @eduardo-rodrigues
    OK, so @henryiii already replied on the scope of the scikit-hep metapackage. For reference, when the project started, we did start in the direction you mention, the one that would go with from skhep.XXX import YYY, but then figured that we should rather be very modular and so on. The direction we took in the end - what we have today - proved right, I think :-).
    Yes, that R package seems to be a bit the same philosophy.
    We do not need that much work to finalise a dev version of the metapackage. Actually, one of the things nice to have would be a way to test the compatibilities with a docker container ... As for the few submodules still in scikit-hep, we can still keep them in a dev release, since they're harmless. Indeed that does not prevent the usage of the package as a metapackage. Have a think and maybe we can give the idea that extra push it needs?
    Thanks anyway for triggering the discussion.
    Matthew Feickert
    @matthewfeickert

    Actually, one of the things nice to have would be a way to test the compatibilities with a docker container

    @eduardo-rodrigues can you elaborate on what is needed here? Maybe in an https://github.com/scikit-hep/scikit-hep issue with me tagged?

    Thanks anyway for triggering the discussion.

    Sure. I guess I'm still interested in discussing whether a utility library with a higher-level API that wraps a lot of the common patterns we all use is useful, or whether that is a community anti-pattern.

    Nicholas Smith
    @nsmith-
    https://scikit-hep.org/developer/ is a nice resource. Since conda is a big part of this community, I wonder if some discussion on putting packages into conda-forge could be added?
    Eduardo Rodrigues
    @eduardo-rodrigues

    https://scikit-hep.org/developer/ is a nice resource. Since conda is a big part of this community, I wonder if some discussion on putting packages into conda-forge could be added?

    Pinging @chrisburr and @henryiii as conda maintainers, if I may say. Why don't you go ahead and create an issue for this in the https://github.com/scikit-hep/scikit-hep.github.io/ repo?

    Actually, one of the things nice to have would be a way to test the compatibilities with a docker container

    @eduardo-rodrigues can you elaborate on what is needed here? Maybe in an https://github.com/scikit-hep/scikit-hep issue with me tagged?

    @matthewfeickert I opened scikit-hep/scikit-hep#108 to discuss. I was thinking out loud. Let's see if the idea actually makes sense ...

    Matthew Feickert
    @matthewfeickert
    @/all I just noticed that Scikit-HEP is listed in the NumPy paper! https://www.nature.com/articles/s41586-020-2649-2 :) c.f. top of Figure 2
    Jonas Eschle
    @mayou36
    Yeah, great to see that, good spot! ;) Although, to be fair, it's under the wrong topic I think; it should rather be under domain-specific, similar to AstroPy
    Matthew Feickert
    @matthewfeickert
    Yeah, sure. Though given the size differences between Scikit-HEP and the others, I can see why they might not lump us in with SunPy (aka, most of heliophysics ;) ). So I'll happily take it for now.
    Eduardo Rodrigues
    @eduardo-rodrigues
    Thanks for sharing, and well spotted! Indeed great modulo the caveat already mentioned.
    abhijitm08
    @abhijitm08
    Hi all, I am using uproot4 version 0.0.23 (installed through pip). When I specify the library as pandas it returns an empty tuple, but using the numpy library works. Is there something I am doing wrong?
    If I set cut to None and use pandas library it reads fine.
    Hans Dembinski
    @HDembinski
    Hi all, iminuit v1.5.0 was released yesterday. The most notable changes are in the Jupyter displays, which have been restructured to show a warning when a parameter is fitted at a limit, and the coloring was changed to make it dark-theme friendly (and friendlier for people with color blindness).
    If you wonder how to change Jupyter to a dark theme, you can use the module jupyterthemes
    Hans Dembinski
    @HDembinski
    I tried out Sympy last week to write a generic error propagation formula and was pleasantly surprised how nice it is. So I wrote a mini-notebook about it. https://github.com/HDembinski/essays/blob/master/error_propagation_with_sympy.ipynb
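    The core of that idea, stated here generically rather than copied from the notebook, is that SymPy can differentiate the measurement function symbolically, so first-order error propagation drops out in a few lines. A minimal sketch for f = x*y with independent uncertainties:

```python
import sympy as sp

# Declaring symbols positive lets simplify() reach the familiar relative-error form.
x, y, sx, sy = sp.symbols("x y sigma_x sigma_y", positive=True)
f = x * y  # example measurement function; any SymPy expression works here

# Linear (first-order) error propagation for independent x and y:
# var(f) ~ (df/dx)^2 * sigma_x^2 + (df/dy)^2 * sigma_y^2
var_f = (sp.diff(f, x) * sx) ** 2 + (sp.diff(f, y) * sy) ** 2
rel_var = sp.simplify(var_f / f**2)  # relative variance of f
```

    For a product, rel_var reduces to sigma_x**2/x**2 + sigma_y**2/y**2, the usual quadrature rule for relative errors.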
    Hans Dembinski
    @HDembinski
    @andrzejnovak Congratulations on adding pyplot.stairs to matplotlib! I quickly read through the three PRs yesterday and it was a major effort and you persisted. At some point you seemed to be frustrated by the iterative nature of implementing something, which then triggers discussion about the design and new design specs. That is indeed normal for high-quality (FOSS) projects. When you try to find the best solution, some iteration is necessary. Often the best solution becomes clear only after several iterations of writing code.
    The matplotlib maintainers are really good. I questioned whether it should be pyplot.stair instead of pyplot.stairs and they had thought about this and gave a very comprehensive answer in favour of stairs.
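    For anyone who hasn't seen the new API: stairs takes pre-binned data (counts plus bin edges), which is the natural shape for HEP histograms, rather than the raw samples that hist expects. A quick sketch (requires matplotlib >= 3.4; the toy data is illustrative):

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so this runs without a display
import matplotlib.pyplot as plt
import numpy as np

# Pre-binned data: n counts and n+1 bin edges.
rng = np.random.default_rng(1)
counts, edges = np.histogram(rng.normal(size=1000), bins=30)

fig, ax = plt.subplots()
step = ax.stairs(counts, edges, fill=True)  # draws the pre-binned histogram
```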
    Cédric Hernalsteens
    @chernals
    @jpivarski Thanks for your quick reply on github. A couple questions to make sure I really understand:
    • uproot-methods is gone and is replaced by the behaviors, right?
    • the models should be generated from the streamers if not defined manually; does that mean that in the case of boost::histogram even the streamers are missing (I do indeed get an unknown class)?
    <Unknown BDSBH4D<boost::histogram::histogram<tuple<boost::histogram::axis::regular<double,boost::use_default,boost::use_default,boost::use_default>,boost::histogram::axis::regular<double,boost::use_default,boost::use_default,boost::use_default>,boost::histogram::axis::regular<double,boost::use_default,boost::use_default,boost::use_default>,boost::histogram::axis::regular<double,boost::use_default,boost::use_default,boost::use_default>>,boost::histogram::storage_adaptor<vector<double>>>> at 0x00013e4e15e0>
    Jim Pivarski
    @jpivarski

    Uproot-methods was intended to separate behaviors for ROOT objects from the deserialization code in Uproot. The idea was that Uproot is not supposed to contain analysis tools, and if enough behaviors are added to objects to make them useful for analysis, that wall breaks down. Uproot-methods was also intended to be user-contributed, and it was, but not much.

    I've given up on that separation in Uproot 4, in part because more coordination between Uproot-methods and Uproot was needed than expected. The behaviors on Uproot 4 objects are still not supposed to make them analysis tools, though: they should consist of data extraction only. (Writing files is a different mechanism.) So for instance, edges, values, to_boost, to_numpy...

    To find out if the object has a streamer, use

    tree.file.show_streamers()

    or show_streamers("BDSBH4D<boost::histogram...") because if you pass the name of a C++ class, it will only show that class and its dependencies.

    Or use

    tree.file.streamers

    for a dict of dicts (C++ name → class version → streamer info), which might be easier to search via its keys.
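    Since the templated C++ names are long, a substring scan over the keys is a practical way to search that mapping. The dict below is a stand-in for tree.file.streamers; its keys and values are illustrative, not read from a real file:

```python
# Stand-in for tree.file.streamers: C++ class name -> class version -> streamer info.
streamers = {
    "TH1D": {3: "<streamer info>"},
    "BDSBH4D<boost::histogram::histogram<...>>": {1: "<streamer info>"},
}

# Find every streamer whose C++ name mentions the wrapper class.
matches = [name for name in streamers if "BDSBH4D" in name]
```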

    @HDembinski What do you use to encode Boost.Histograms in ROOT?

    Cédric Hernalsteens
    @chernals

    Thanks again, I’ll have a look (maybe not tonight).

    Just a quick note: the boost::histograms are actually inside a templated wrapper class. For some reason we ended up with that “design”, but it is all a bit messy and might be due to my limited understanding of templates in C++.

    It seems that the streamers are present for my wrapper class and also for boost::histogram::(histogram, axis, etc).
    It also seems that one of our PhD students, out of despair at getting those written to a ROOT file, made BDSBH4D inherit from TH1D although it shouldn’t; if it does, it is just to make it “ROOT friendly”.
    Hans Dembinski
    @HDembinski
    Ok, I just deleted a couple of comments after reading this more carefully. @chernals You apparently have saved a C++ Boost.Histogram in a ROOT file. How did you do that? I am surprised that this works at all, I certainly did not implement a streamer in C++ to make that happen. Perhaps CLING generates streamers for Boost.Histograms automatically, or someone wrote them and I am not aware of that.
    Cédric Hernalsteens
    @chernals

    @HDembinski Yes I believe we have correctly written boost::histograms in a ROOT file. We basically have a class, which inherits from TH1D, that holds a boost::histogram, and we write that class “as usual”.

    So far we have managed to read it from Python (using pyroot) only by calling a C++ method on that class that takes the filename and reads it. So it seems that we have an issue with pyroot. That part is not totally clear to me yet; we have a student looking at it.

    With uproot4, I can see it in the file but it is an “unknown” class, although it seems that the streamers are present (and that makes sense, otherwise it wouldn’t have been written in the first place, I guess).

    Matthieu Marinangeli
    @marinang

    Hey folks, I posted an issue on the hepstats repo about a renaming of a submodule called hypotests which stands for "hypothesis tests" scikit-hep/hepstats#32. With this submodule one can do discovery tests and compute upper limits or confidence intervals using the Likelihood Ratio (LR) as a test statistic. For the test statistic distributions either the asymptotic formulae are used or they are constructed with pseudo-experiments.

    hepstats has room for other statistical tools as well, so it is maybe misleading to keep the name hypotests for a submodule doing inference with the (profile) LR as the only test statistic. I was thinking of giving this submodule a name implying that the LR is used, such as:

    • likelihoodratio
    • lrstat
    • lrinferences
    • ....

    Another possibility is to split hypotests into several submodules, one each for:

    • inferences using the LR
    • energy test
    • bump hunter (?)
    • ... any other statistical test we use in hep

    Any suggestions?

    By the way I just finished my PhD and I don't know where I will work in the future. So I probably won't be able to actively develop for hepstats anymore :( .
    But I will try to do my best to maintain it.

    Giovanni Volta
    @GiovanniVolta

    Hi everyone, I am new to using iminuit. Currently I am trying to perform a likelihood fit of two Gaussian peaks plus a flat noise. Apart from the scaling factors of the two peaks, I have a good initialization for the other parameters.
    Nevertheless, during the migrad routine the parameters are set to nan. The total function is continuous and the second derivative exists. I am attaching a reproducible example:

    import numpy as np
    from iminuit import Minuit
    
    def background(x, m, q, N1, mu1, sigma1, N2, mu2, sigma2):
        print(m, q, N1, mu1, sigma1, N2, mu2, sigma2)
        gauss_1 = N1 * 1 / np.sqrt(2 * np.pi) / sigma1 * np.exp(-(x - mu1) ** 2 / 2. / sigma1 ** 2)
        print(gauss_1)
        gauss_2 = N2 * 1 / np.sqrt(2 * np.pi) / sigma2 * np.exp(-(x - mu2) ** 2 / 2. / sigma2 ** 2)
        print(gauss_2)
        flat    = m*x + q 
        print(flat)
        print('\n###########\n')
        return flat + gauss_1 + gauss_2
    
    # histogram entry
    hh = [49., 48., 44., 46., 49., 58., 46., 54., 61., 112., 142., 133., 163., 140., 140., 194., 379., 625.,
          999., 1381., 1461., 1179., 926., 521., 309., 148., 76., 57., 41., 49., 47., 37., 48., 38.]
    # bins
    bb = [21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43,
          44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55]
    
    mu_kr_1_ini, sigma_kr_1_ini = 32.14, 1.83
    mu_kr_2_ini, sigma_kr_2_ini = 41.54, 2.09
    m_flat_ini, q_flat_ini      = 0, 55
    
    from iminuit.cost import BinnedNLL
    binned_nll = BinnedNLL(hh, bb, background)
    B = Minuit(binned_nll,
               N1  = 1000, N2 = 10000,
               m      = m_flat_ini,     fix_m      = False,  #limit_m      = (None, None),
               q      = q_flat_ini,     fix_q      = False,  #limit_q      = (None, None),
               mu1    = mu_kr_1_ini,    #limit_mu1    = (None, None), fix_mu1    = True,   
               sigma1 = sigma_kr_1_ini, #limit_sigma1 = (None, None), fix_sigma1 = True,
               mu2    = mu_kr_2_ini,    #limit_mu2    = (None, None), fix_mu2    = True,   
               sigma2 = sigma_kr_2_ini, #limit_sigma2 = (None, None), fix_sigma2 = True,
               errordef=Minuit.LIKELIHOOD)
    B.migrad()

    Is what I am doing reasonable? How can I avoid the parameters being set to nan?

    Dmitry Romanov
    @majesticra_gitlab
    Hi all! I'm trying to tame hist to plot histograms (from an existing ROOT file). What I can't find is an example of how to set a 1D histogram's background?
    Dmitry Romanov
    @majesticra_gitlab
    Ok, figured it out from mplhep:
    h.plot(histtype='fill', alpha=0.1)
    Hans Dembinski
    @HDembinski
    @GiovanniVolta Hi, you need to set limits on some parameters. The sum flat + gauss_1 + gauss_2 must be positive, since BinnedNLL computes its logarithm.
    In your case this is not going to be obvious. I suggest doing a LeastSquares fit first and then starting the BinnedNLL fit from the LeastSquares result.
    More hints: use scipy.stats.norm or some other library implementation of the pdfs instead of writing the Gaussian by hand.
    Hans Dembinski
    @HDembinski
    Oh, another mistake: BinnedNLL expects a CDF, while you compute the PDF.
    I admit that the docstring of BinnedNLL is currently not great and does not explain all this.
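    To make the CDF-vs-PDF point concrete: the binned cost functions consume the model's cumulative form evaluated at the bin edges, and since the model above carries yields N1 and N2, ExtendedBinnedNLL with an integrated model is the closer fit. The function below is a hedged sketch of that integrated counterpart (parameter names follow the original example; it is not fitted to the data above):

```python
import numpy as np
from scipy.stats import norm

def background_integral(xe, m, q, N1, mu1, sigma1, N2, mu2, sigma2):
    """Integral of the two-Gaussian-plus-linear model up to each bin edge xe.

    ExtendedBinnedNLL takes differences of this to get expected counts per bin,
    so the Gaussians enter via norm.cdf rather than a hand-written PDF.
    """
    return (N1 * norm.cdf(xe, mu1, sigma1)
            + N2 * norm.cdf(xe, mu2, sigma2)
            + m * xe**2 / 2 + q * xe)

# Sanity check: with the Gaussians switched off (N1 = N2 = 0) and a flat
# density m = 0, q = 1, each unit-width bin should hold exactly one unit.
xe = np.array([0.0, 1.0, 2.0])
expected = np.diff(background_integral(xe, 0.0, 1.0, 0.0, 0.0, 1.0, 0.0, 0.0, 1.0))
```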