    Marius Wachtler
    @undingen
    Hi, sorry, here in Austria it's already quite late :-( so I decided not to join the session after lunch/dinner. But I hope you are making good progress :-)
    Matti Picus
    @mattip
    datnamer
    @datnamer
    @njsmith are we considering "strict mypy" to be a valid source format option and if so can it go in the notes?
    Antoine Pitrou
    @pitrou
    Can someone elaborate on what exactly "the way forward" is? :-) Sorry, it's a bit hard to follow remotely.
    Siu Kwan Lam
    @sklam
    Antoine Pitrou
    @pitrou
    I'm not sure np.int32.__add__ is an interesting case. Every JIT should already be able to implement this most trivially.
    Difficult / interesting things are: 1) edge case semantics (NaN, NaT etc.) 2) broadcasting 3) complicated indexing 4) the myriad of weird high-level helpers Numpy has
    Of course you also want the exposed IR to be expressive enough for efficient optimizations (hence my remark about nditer() for commutative reductions)
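    A minimal sketch, in plain NumPy, of two of the tricky behaviors just listed (broadcasting and NaN edge-case semantics) that a JIT would have to reproduce exactly:
    import numpy as np

    # broadcasting: a (3, 2) array plus a (2,) array -> a (3, 2) result
    a = np.arange(6).reshape(3, 2)
    print(a + np.array([10, 20]))

    # edge-case semantics: NaN compares unequal even to itself
    print(np.nan == np.nan)  # False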
    Nathaniel J. Smith
    @njsmith
    the goal isn't to find something really compelling, exactly, so much as to find a first place to dip our toes into the water
    though __add__ is a bit tricky in that numpy's scalar __add__ does overflow checking
    Antoine Pitrou
    @pitrou
    Ah, well, that's part of the discussion, then: should JITs also display the RuntimeWarning? It can pretty much ruin performance if it prevents SIMD vectorizing :-)
    Call it "numnum"
    "pitrou"
    Nathaniel J. Smith
    @njsmith
    are there a lot of opportunities for SIMD vectorization of scalar add? I guess it happens...
    Richard Plangger
    @planrich
    if you do vector + scalar there is
    you can constant/variable expand the scalar value into a vector
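    A minimal NumPy-level sketch of that vector + scalar case; the lane-by-lane expansion of the scalar is what the JIT would do under the hood:
    import numpy as np

    v = np.arange(8, dtype=np.int32)
    s = np.int32(3)
    # conceptually the scalar is expanded to [3, 3, 3, 3, 3, 3, 3, 3], so
    # each SIMD instruction can add it to a whole chunk of lanes at once
    print(v + s)  # [ 3  4  5  6  7  8  9 10]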
    Nathaniel J. Smith
    @njsmith
    yeah, but that goes through the np.add ufunc, not scalar add
    and for no sensible reason, scalar add has overflow checking and np.add does not
    Richard Plangger
    @planrich
    overflow checking is necessary to stay within Python language semantics
    Nathaniel J. Smith
    @njsmith
    (personally I think the ideal semantics would be that numpy by default does overflow-checking for both scalars and arrays, and that this can be controlled by something like np.errstate, so if people need maximum speed they can explicitly turn off the checking. In this case there would still be a branch to check whether checking was enabled, but this seems easily hoistable.)
    In [7]: np.int32(2**31 - 1) * np.int32(2**31 - 1)
    /home/njs/.user-python3.5-64bit/bin/ipython:1: RuntimeWarning: overflow encountered in int_scalars
      #!/home/njs/.user-python3.5-64bit/bin/python3.5
    Out[7]: 1
    ^^ python language semantics may disagree but :-/
    and if you use arrays instead, you get 1 without the warning
    Richard Plangger
    @planrich
    I guess you mean 'overflow checking' should occur for arrays as well?
    (controlled by something like np.errstate)
    Nathaniel J. Smith
    @njsmith
    yes
    right now it's unconditional for scalars and impossible for arrays; I think it should be consistent between scalars and arrays, controllable, and default to on (in decreasing order of priority)
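    A rough sketch of what those controllable semantics could look like; if memory serves, the scalar warning already goes through the np.errstate machinery, so the change would mostly be making arrays honor the same switch:
    import numpy as np

    # today: the scalar integer-overflow warning goes through the errstate
    # machinery, so it can already be silenced explicitly
    with np.errstate(over='ignore'):
        x = np.int32(2**31 - 1) * np.int32(2**31 - 1)  # wraps to 1, no warning

    # the proposal: array ops would honor the same switch; today they wrap
    # silently no matter what errstate says
    a = np.array([2**31 - 1], dtype=np.int32)
    print(a * a)  # -> [1]; never warns today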
    Kevin Modzelewski
    @kmod
    As a non-numpy person, I feel like it would be great if there were some example benchmarks that were relatively agreed to be representative
    *Representative of the things that people would want to be faster in numpy
    It could also help make it more clear how the work on the add function would end up translating into speedups
    Nathaniel J. Smith
    @njsmith
    @kmod : there were some links to existing benchmarks earlier in chat here: https://gitter.im/python-compilers-workshop/chat?at=57851290b79455146fa44595
    Kevin Modzelewski
    @kmod
    yah I'm looking through them but there are a lot, and they seem to stress different things
    Nathaniel J. Smith
    @njsmith
    numpy also has some microbenchmarks in its benchmarks/ directory
    Kevin Modzelewski
    @kmod
    Here's one I picked at random
    It seems like there's not much cross-C work to be done there
    Or well, I don't know, it's not obvious to me what people would want to be faster
    Nathaniel J. Smith
    @njsmith
    but historically, it's never made much sense to benchmark real complex numpy-using code, because optimizing individual numpy operations was essentially equivalent to optimizing complex numpy-using code :-)
    yeah, that cronbach benchmark is not going to really benefit from any fancy JIT
    I bet if you google any numba talk or tutorial you will find a nice example of a function that could be made much faster given a jit that understands numpy :-)
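    For example, the kind of loop-heavy function those talks usually show (a sketch in the spirit of the numba tutorials, not any specific one):
    import numpy as np
    from numba import jit

    @jit(nopython=True)
    def sum2d(arr):
        # explicit loops: slow in CPython, fast once numba compiles them
        total = 0.0
        for i in range(arr.shape[0]):
            for j in range(arr.shape[1]):
                total += arr[i, j]
        return total

    print(sum2d(np.ones((1000, 1000))))  # 1000000.0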
    datnamer
    @datnamer
    Here is an example benchmark
    Siu Kwan Lam
    @sklam
    @njsmith , I have a minimal PyIR interpreter working for our "array sum" function at https://github.com/sklam/pyir_interpreter
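    (A hypothetical plain-Python version of that "array sum" kernel, just to fix ideas; the real one lives in the repo linked above:)
    import numpy as np

    # hypothetical plain-Python version of the "array sum" kernel; the
    # actual function is in the pyir_interpreter repo
    def array_sum(a):
        total = 0
        for i in range(a.shape[0]):
            total += a[i]
        return total

    print(array_sum(np.arange(10)))  # 45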
    Richard Plangger
    @planrich
    @sklam cool. maybe I'll find some time to rpythonify it. though much more thought has to be put into the opcodes (I think)
    Siu Kwan Lam
    @sklam
    Yes, the opcodes need more discussion. I'm hoping the interpreter will make it easier to experiment with the opcode design.
    Mark had some foresight here :)