    Hans Dembinski
    Another weird thing: Numba has a typed list, numba.typed.List, which basically does this already and the performance seems to be good. The problem is that Numba does not provide a way to view this list as a numpy array, even though this should be trivial.
    There is an open issue for this, but it has "low priority"
    Eduardo Rodrigues

    Hi @eduardo-rodrigues and @henryiii just wanted to let you know that I love using particle

    Thanks a lot @HDembinski, that's great feedback :-)!

    BTW, I should say that I will be adding basic info on all nuclei by the end of the week (finally, since it was requested).
    Hans Dembinski
    Awesome :D
    The iminuit logo is finally online, could the Scikit-HEP website be updated to use the logo as well?
    Henry Schreiner
    Of course!
    Henry Schreiner
    Eduardo Rodrigues
    iminuit logo - nice to see it out there!
    Eduardo Rodrigues
    Re: nuclei in Particle - here's the kind of thing you will be able to do once the PR on nuclei goes in:
    >>> print(Particle.dump_table(filter_fn=lambda p: p.pdgid.is_nucleus and p.pdgid.A==32, tablefmt='rst',exclusive_fields=['pdgid', 'name', 'latex_name', 'mass', 'charge']))
    ===========  ======  ===========================  =============  ========
          pdgid  name    latex_name                            mass    charge
    ===========  ======  ===========================  =============  ========
     1000100320  Ne32    ^{32}\mathrm{Ne}             29844.986969         10
    -1000100320  Ne32~   ^{32}\mathrm{\overline{Ne}}  29844.986969         10
     1000110320  Na32    ^{32}\mathrm{Na}             29826.1148987        11
    -1000110320  Na32~   ^{32}\mathrm{\overline{Na}}  29826.1148987        11
     1000120320  Mg32    ^{32}\mathrm{Mg}             29807.0192697        12
    -1000120320  Mg32~   ^{32}\mathrm{\overline{Mg}}  29807.0192697        12
     1000130320  Al32    ^{32}\mathrm{Al}             29796.7448898        13
    -1000130320  Al32~   ^{32}\mathrm{\overline{Al}}  29796.7448898        13
     1000140320  Si32    ^{32}\mathrm{Si}             29783.7301475        14
    -1000140320  Si32~   ^{32}\mathrm{\overline{Si}}  29783.7301475        14
     1000150320  P32     ^{32}\mathrm{P}              29783.5057133        15
    -1000150320  P32~    ^{32}\mathrm{\overline{P}}   29783.5057133        15
     1000160320  S32     ^{32}\mathrm{S}              29781.7950523        16
    -1000160320  S32~    ^{32}\mathrm{\overline{S}}   29781.7950523        16
     1000170320  Cl32    ^{32}\mathrm{Cl}             29794.4804277        17
    -1000170320  Cl32~   ^{32}\mathrm{\overline{Cl}}  29794.4804277        17
     1000180320  Ar32    ^{32}\mathrm{Ar}             29805.6313435        18
    -1000180320  Ar32~   ^{32}\mathrm{\overline{Ar}}  29805.6313435        18
     1000190320  K32     ^{32}\mathrm{K}              29828.2293902        19
    -1000190320  K32~    ^{32}\mathrm{\overline{K}}   29828.2293902        19
    ===========  ======  ===========================  =============  ========
    (Masses are in MeV, the standard HEP unit.)
    Hans Dembinski
    @henryiii Thank you that was super quick!
    @eduardo-rodrigues It took me a while because I wanted to replace the original font in the logo with a free font
    That table looks very impressive and it is markdown-compatible, isn't it?
    I was not aware of "dump_table" but that sounds very handy
    For my current project, I wanted a table of all long-lived particles (lifetime > 30 picoseconds). I could have used this, I think
    Hans Dembinski
    Ah, I see, it is restructuredText
    And you can tell it which format to use, nice
    The tabulate package seems pretty cool
    Eduardo Rodrigues
    That's right, the example above is restructuredText. But tabulate allows you to print out in many other formats.
    I had presented Particle.dump_table(...) at PyHEP 2019, though I ran (too) quickly over a lot of material ...
    Henry Schreiner
    Tabulate is an optional dependency of pandas, and is used by .to_markdown(), just noticed that earlier when writing a post
    Eduardo Rodrigues
    Here is what you wanted to do, @HDembinski :
    >>> from hepunits import ps
    >>> from particle import Particle
    >>> print(Particle.dump_table(filter_fn=lambda p: p.lifetime>30*ps,exclusive_fields=['pdgid', 'name', 'lifetime']))
          pdgid  name                       lifetime
    -----------  -------  --------------------------
             11  e-                inf
            -11  e+                inf
             13  mu-              2196.98034989
            -13  mu+              2196.98034989
             21  g                 inf
             22  gamma             inf
            130  K(L)0              51.1431197705
            211  pi+                26.0327460626
           -211  pi-                26.0327460626
            310  K(S)0               0.0895429002893
            321  K+                 12.3793859591
           -321  K-                 12.3793859591
           2112  n        879374684632
          -2112  n~       879374684632
           2212  p                 inf
          -2212  p~                inf
           3112  Sigma-              0.147912798078
          -3112  Sigma~+             0.147912798078
           3122  Lambda              0.263179508775
          -3122  Lambda~             0.263179508775
           3222  Sigma+              0.0801817458213
          -3222  Sigma~-             0.0801817458213
           3312  Xi-                 0.16373431628
          -3312  Xi~+                0.16373431628
           3322  Xi0                 0.289961212091
          -3322  Xi~0                0.289961212091
           3334  Omega-              0.0815628192623
          -3334  Omega~+             0.0815628192623
     1000000010  n        879374684632
    -1000000010  n~       879374684632
     1000010010  p                 inf
    -1000010010  p~                inf
    The lifetime is given in ns, the standard HEP unit of time.
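    The cut used above can be sketched without the particle or hepunits packages. This is only an illustration: it assumes the convention that ns is the base time unit (so ps is 1e-3), the lifetimes are rounded values hand-copied from the table above, and the pi0 entry is an approximate value added to show a particle that fails the cut. The helper name long_lived is hypothetical.

    ```python
    # Assumed unit convention: ns is the base unit, so 1 ps = 1e-3 in these units.
    ns = 1.0
    ps = 1e-3 * ns

    # Lifetimes in ns, rounded from the table above. The pi0 value is an
    # approximate figure added only to have one entry below the threshold.
    lifetimes = {
        "mu-": 2196.98,
        "K(S)0": 0.0895,
        "Sigma+": 0.0802,
        "pi0": 8.4e-8,
    }


    def long_lived(table, threshold):
        """Return the names whose lifetime exceeds the threshold, in input order."""
        return [name for name, tau in table.items() if tau > threshold]


    selected = long_lived(lifetimes, 30 * ps)
    print(selected)
    ```

    Writing the threshold as 30 * ps keeps the unit explicit, so the comparison stays correct as long as the table and the threshold use the same base unit.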
    Jim Pivarski

    I was playing around with Numba a lot these days for a project and wanted to share some surprising results.
    In this notebook, I am testing several low-level ways of making a dynamically growing numpy array inside a Numba-accelerated function. https://github.com/HDembinski/testing_grounds/blob/master/notebook/Growing%20Array%20in%20Numba.ipynb

    @HDembinski Actually, this makes a good demo of lowering in Numba: it requires a lot of undocumented features. I made an example here: https://gist.github.com/jpivarski/7bc83e5aa70d5e3dd8483eb49800885c

    And I guess I ought to put a lot of comments in it. But this could make a good teaching example, since it touches on just about everything.

    But the punchline is
    buf = GrowableBuffer(float, initial=10)
    def test6(x):
    assert numpy.asarray(buf).tolist() == [1.1, 2.2, 3.3, 4.4, 5.5]
    Jim Pivarski
    I just added a bunch of comments to make it more pedagogical. I think an "Extending Numba" tutorial could be written around it.
    Henry Schreiner
    Jim Pivarski
    Possibly, although the vector one has Awkward dependencies, and this one isn't connected to anything else.
    Hans Dembinski
    But how fast is it?
         newbuffer = numpy.zeros(reservation, dtype=self._buffer.dtype)
    numpy.empty would be better here, since you are overwriting the memory in the next line anyway
    I am sceptical that this is going to be faster. It looks more or less like "ArrayBuilder" in my notebook and that was pretty slow, although I used @jitclass, which lowers the whole class
    Hans Dembinski

    def __init__(self, dtype, initial=1024, resize=2.0):

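    The default should rather be 1.5, as in some std::vector implementations (the growth factor is implementation-defined: MSVC uses 1.5, libstdc++ and libc++ use 2)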

    For those who didn't look into my notebook, the fastest version (the last one) was 141 times as fast as the version based on @jitclass https://github.com/HDembinski/testing_grounds/blob/master/notebook/Growing%20Array%20in%20Numba.ipynb
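    To make the resize discussion concrete, here is a small pure-Python simulation that counts how often a geometrically growing buffer reallocates and how many elements it copies. It is a sketch under stated assumptions, not a benchmark: count_growth is a hypothetical helper, and it models only the bookkeeping, not Numba or NumPy performance.

    ```python
    def count_growth(n_items, initial=1024, resize=1.5):
        """Simulate n_items appends.

        Returns (reallocations, elements_copied, final_capacity).
        """
        capacity = initial
        length = 0
        reallocations = 0
        copied = 0
        for _ in range(n_items):
            if length == capacity:
                # Grow geometrically, but always gain at least one slot.
                capacity = max(capacity + 1, int(capacity * resize))
                reallocations += 1
                copied += length  # existing elements move to the new buffer
            length += 1
        return reallocations, copied, capacity


    for factor in (1.5, 2.0):
        print(factor, count_growth(1_000_000, resize=factor))
    ```

    A smaller factor reallocates more often but leaves less reserved memory unused; for any factor above 1 the total number of copied elements stays proportional to the number of appends.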
    Henry Schreiner
    @HDembinski Can you copy-and-paste it and try it on the same hardware?
    Hans Dembinski

    Sure. I pasted Jim's code into my notebook. It seemed to work, but running

    @numba.njit
    def fill(x):
        b = GrowableBuffer(float, 0)
        for xi in x:
            b.append(xi)
        return b.__array__()
    %timeit fill(x)

    got me this error

    TypingError                               Traceback (most recent call last)
    <ipython-input-13-88dccecb2396> in <module>
          5         b.append(xi)
          6     return b.__array__()
    ----> 7 fill(x)
          9 get_ipython().run_line_magic('timeit', 'fill(x)')
    /usr/local/Caskroom/miniconda/base/envs/prompt_hadrons/lib/python3.6/site-packages/numba/dispatcher.py in _compile_for_args(self, *args, **kws)
        399                 e.patch_message(msg)
    --> 401             error_rewrite(e, 'typing')
        402         except errors.UnsupportedError as e:
        403             # Something unsupported is present in the user code, add help info
    /usr/local/Caskroom/miniconda/base/envs/prompt_hadrons/lib/python3.6/site-packages/numba/dispatcher.py in error_rewrite(e, issue_type)
        342                 raise e
        343             else:
    --> 344                 reraise(type(e), e, None)
        346         argtypes = []
    /usr/local/Caskroom/miniconda/base/envs/prompt_hadrons/lib/python3.6/site-packages/numba/six.py in reraise(tp, value, tb)
        666             value = tp()
        667         if value.__traceback__ is not tb:
    --> 668             raise value.with_traceback(tb)
        669         raise value
    TypingError: Failed in nopython mode pipeline (step: nopython frontend)
    Untyped global name 'GrowableBuffer': cannot determine Numba type of <class 'type'>
    File "<ipython-input-13-88dccecb2396>", line 3:
    def fill(x):
        b = GrowableBuffer(float, 0)
    Jim Pivarski
    @HDembinski Ideal resize is 1.5, rather than 2.0? That's great to know (I'll start using that everywhere).
    Your error comes from the fact that __array__ isn't a lowered function, but it could be.
    It would be similar to the _buffer property, except trimmed with the trim function.
    About numpy.zeros: yes, it was originally numpy.empty, but I switched to zeros while debugging. I'll switch it back in the gist.
    Jim Pivarski
    In cases where I've tried @jitclass, the performance was underwhelming. The same is true of iterators in Numba: it's possible, but it's considerably faster to write what would be idiomatic C than idiomatic Python. I hope this lowered GrowableBuffer works for you, since that's what I would use for a performance-critical application like this, rather than @jitclass.
    Oh, sorry: your first error is trying to create a GrowableBuffer inside of the function, which is another function we could add. (Have to switch into Python mode to make it, just like _ensure_reserved, so it would be another slow/rarely called function like that.) The second error you would encounter is .__array__() unless we add that. But these are both embellishments; to test it, you can create the GrowableBuffer outside the function and convert it to a NumPy array outside as well.
    Hans Dembinski
    Could you post a snippet that would work, so I can try the benchmark?
    I could accept that @jitclass is not doing well yet, because it is a rather new feature, but the surprising thing was that just putting the growth code into a separate jitted function also made the code much slower. That should not happen, if numba used inlining properly. AFAIK they do inline jitted code, and in other tests numba did spectacularly well, even in comparison with pybind11 C++ code, so... I am really puzzled by this.
    Whatever is going on, my benchmark is particularly sensitive to this effect. In my real code, the work that has to be done before each call to .append is significant, so the performance hit shouldn't be so bad
    Jim Pivarski
    (Actually, @jitclass has been around for a couple of years, and I looked into its implementation: it's like List[T], carrying a mutable buffer for the class instance attributes, and it's all specialized, so the only thing that might be an issue is excessive reference-counting. I don't know what's going wrong.)
    This ought to work:
    @numba.njit
    def fill_loop(x, b):
        for xi in x:
            b.append(xi)
    def fill(x):
        b = GrowableBuffer(float)
        fill_loop(x, b)
        return b.__array__()
    %timeit fill(x)
    Jim Pivarski

    The initial parameter can't be zero because it's the initial reservation, not the initial length. I had an

    assert resize > 1.0

    to sanity-check the resize parameter, but we also need to have

    assert initial > 0

    I don't know how much of a "cheat" it would be to make the initial very large—then you wouldn't be measuring the resize operation, just filling an array.

    I thought that initial=1024 would be reasonable (small enough that if you have a lot of barely filled arrays for some reason it's not going to be a problem, but big enough that the exponential growth gets started soon: no ramp-up through 1 item, 2 items, 4 items, etc.).

    (Except resize is now 1.5; in Awkward as well. Interesting to hear about the golden ratio!)
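    For readers without the gist at hand, the roles of initial and resize can be sketched in plain NumPy. This is a stand-in for illustration only: it is not the lowered Numba type from the gist, it reuses the name GrowableBuffer just to match the discussion, and the internal attribute names are assumptions.

    ```python
    import numpy


    class GrowableBuffer:
        """Pure-NumPy stand-in; not the lowered Numba type from the gist."""

        def __init__(self, dtype, initial=1024, resize=1.5):
            assert initial > 0   # initial is a reservation, so it cannot be zero
            assert resize > 1.0  # otherwise the buffer would never grow
            self._buffer = numpy.empty(initial, dtype=dtype)
            self._length = 0
            self._resize = resize

        def append(self, value):
            if self._length == len(self._buffer):
                # Reserve more room. numpy.empty is enough here because the
                # filled part is copied in immediately afterwards.
                reservation = max(self._length + 1,
                                  int(len(self._buffer) * self._resize))
                newbuffer = numpy.empty(reservation, dtype=self._buffer.dtype)
                newbuffer[: self._length] = self._buffer[: self._length]
                self._buffer = newbuffer
            self._buffer[self._length] = value
            self._length += 1

        def __len__(self):
            return self._length

        def __array__(self, dtype=None, copy=None):
            # Trim the reservation down to the filled part.
            out = self._buffer[: self._length]
            return out if dtype is None else out.astype(dtype)


    buf = GrowableBuffer(float, initial=2)
    for value in (1.1, 2.2, 3.3, 4.4, 5.5):
        buf.append(value)
    print(numpy.asarray(buf).tolist())
    ```

    With initial=2 the sketch has to grow three times to hold five values, which exercises the resize path; with the default initial=1024 the same loop would never reallocate.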