stuartarchibald
@stuartarchibald
Numba public meeting is about to start if anyone wants to join. Details are here: https://numba.discourse.group/t/public-numba-dev-meeting-tuesday-november-10-2020/
nelson2005
@nelson2005
Using objmode, is it possible for the jitted function to test whether objmode is required for a given function call, similar to the dummy 'needs_objmode' function in this example?
def f1(x):
    return x + 2

@njit
def tester():
    x = 2
    if needs_objmode(f1):
        with objmode:
            x += f1(x)
    else:
        x += f1(x)
59 replies
nelson2005
@nelson2005

Ah, this is probably what I was looking for: List.empty_list(types.int64(types.int64).as_type())

This does result in some NumbaTypeSafetyWarnings about unsafe casts... can those be disregarded?

2 replies
Angus Hollands
@agoose77

I've noticed that when implementing a pattern like

def single_item(row):
    return np.array(...)

def multiple_items(rows):
    result = np.empty((len(rows), N))
    for i, row in enumerate(rows):
        result[i] = single_item(row)
    return result

Numba is usually slower with this pattern than when I pass a results array into single_item, e.g. out = result[i]. Is this a sign that I should lean towards output arrays for my "private" Numba routines?

12 replies
eecarres
@eecarres

Hi again! I'm trying this:

@numba.njit(["float64[:](float64[:,:],float64,float64,float64,float64,float64)"])
def sig_nb_parabol_eqn_2d(data, a, b, c, d, e):
    x = data[0]
    y = data[1]
    return(-(((x - b) / a)**2 + ((y - d) / c)**2) + e).ravel()

But it gives me this error:

TypeError                                 Traceback (most recent call last)
<ipython-input-3-3107ce5d498b> in <module>
     28 
     29 nb_parabol_eqn_2d(data, a, b, c, d, e)
---> 30 sig_nb_parabol_eqn_2d(data, a, b, c, d, e)
     31 
     32 print(timeit.timeit('parabol_eqn_2d(data, a, b, c, d, e)',globals=globals(),number=100000))

/usr/local/lib/python3.5/site-packages/numba/dispatcher.py in _explain_matching_error(self, *args, **kws)
    572         msg = ("No matching definition for argument type(s) %s"
    573                % ', '.join(map(str, args)))
--> 574         raise TypeError(msg)
    575 
    576     def _search_new_conversions(self, *args, **kws):

TypeError: No matching definition for argument type(s) array(float64, 3d, C), float64, float64, float64, float64, float64

I don't see where that 3D float64 array is coming from...

27 replies
Matteo Lepur
@matteolepur
Hi, does anyone know where the documentation for numba-scipy is? I just need a basic example of using a SciPy function within a Numba function.
3 replies
Valentin Haenel
@esc
@jpivarski : success! I managed to get the Numba integration testing for Awkward array to work: numba/numba-integration-testing#54
5 replies
Raven Pillmann
@RavenPillmann
Hi folks! I've been trying to profile GPU usage of Numba CUDA kernels. Using Nsight Systems, I'm not able to see any CUDA traces, but I know the Numba CUDA kernels are being called. Has anyone had success using NVIDIA Nsight Systems to profile Numba CUDA kernels?
15 replies
edwinlim0919
@edwinlim0919
With the new Apple M1 chip coming out, would Numba be able to support Mac devices using the M1 chip? I think the M1 uses an ARM processor.
1 reply
Riccardo De Maria
@rdemaria
Is it normal that a function of 250 LOC takes 20 seconds to compile in Numba, while clang takes 0.4 s? I was expecting similar compilation times...
morizin
@morizin
I see that CuPy and Numba could be integrated
is that true
1 reply
James Gray
@JimLGray_gitlab
Hi everyone, I'm really confused because I can't seem to even import numba.
Has this happened for anyone else?
I am running 0.51.2
on ubuntu 18.04.5
Valentin Haenel
@esc
@JimLGray_gitlab please open an issue on the issue tracker so we can diagnose this, thanks!
1 reply
@rdemaria no, that seems off, please do open an issue on the issue tracker so that this can be diagnosed.
James Gray
@JimLGray_gitlab
Thanks, I'll try fiddling around a bit first to see if it's just an issue with my setup, i.e. reinstall everything and do all my system updates.
Hameer Abbasi
@hameerabbasi

Hello. Last week, the meeting was cancelled in favor of a public meeting, but it disappeared off the calendar, which made me miss it, unfortunately.

Is there a calendar I can subscribe to for public meetings?

8 replies
Jens Renders
@JensRenders_gitlab
I just finished tracking down a bug in my project caused by the fact that numba cuda atomic operations do not support negative wrap around indexing. The documentation makes it seem like it does. Not sure if this is the place to report such an issue, please send me to the correct place if needed :)
3 replies
roberto forcen
@rforcen
@c200chromebook try this:
import numpy as np
import numba
from numba import prange

n=100
floats = np.zeros((n, n))

@numba.njit(parallel=True)
def compilable(dims, x):
    def expensive_fn(x):
        while x > 1.00001:
            x **= 0.9999
        return x

    for i in prange(dims[0]):
        for j in prange(dims[1]):
            x[i][j] = expensive_fn(i + j + 50)


compilable((n, n), floats)
print(floats)
roberto forcen
@rforcen
Hi! I just joined the group and wanted to share some code. Voronoi is a typical algorithm for testing multithreading capabilities; this implementation uses both @vectorize and njit with the parallel option:
'''
Voronoi problem solved with numba vectorize & njit(parallel)
'''
import timeit

import numpy as np
from PIL import Image
from numba import vectorize, njit, prange, int32


def voronoi(size, points, colors):
    h, w = size
    n: int = w * h
    n_points: int = len(points)
    amask: int = np.int32(0xff00_0000)
    max_int: int = np.iinfo(np.int32).max

    @vectorize('int32(int32)', target='parallel', nopython=True, fastmath=True)
    def calc_color(ix):  # current index 0..n -> color

        def distance_squared(p0, p1):
            d0, d1 = p0[0] - p1[0], p0[1] - p1[1]
            return d0 * d0 + d1 * d1

        min_dist = max_int
        circ_diam = 1  # as distance is squared
        ind = -1

        current_point = ix % w, ix // w

        for i in range(n_points):
            d = distance_squared(points[i], current_point)

            if d < circ_diam: break
            if d < min_dist:
                min_dist = d
                ind = i

        return amask if ind == -1 else colors[ind] | amask

    return calc_color(np.arange(n).astype('i4'))


@njit(parallel=True, fastmath=True)
def voronoi_jit(size, points, colors):
    h, w = size
    n: int = w * h
    n_points: int = len(points)
    amask: int = np.int32(0xff00_0000)
    max_int: int = np.iinfo(np.int32).max

    def calc_color(ix):  # current index 0..n -> color

        def distance_squared(p0, p1):
            d0, d1 = p0[0] - p1[0], p0[1] - p1[1]
            return d0 * d0 + d1 * d1

        min_dist = max_int
        circ_diam = 1  # as distance is squared
        ind = -1

        current_point = ix % w, ix // w

        for i in range(n_points):
            d = distance_squared(points[i], current_point)

            if d < circ_diam: break
            if d < min_dist:
                min_dist = d
                ind = i

        return amask if ind == -1 else colors[ind] | amask

    img = np.empty(n, dtype=int32)

    for i in prange(n):
        img[i] = calc_color(i)

    return img


def test_voronoi():
    sz = 1024 * 2

    size = (sz, sz)
    n = sz
    n_points = n * 3
    points = np.random.uniform(0, min(size), size=n_points * 2).reshape(n_points, 2).astype('i4')  # x,y
    colors = np.random.uniform(0x0000_0000, 0x00ff_ffff, size=n_points).astype('i4')

    t0 = timeit.default_timer()

    image = voronoi(size, points, colors)
    # image = voronoi_jit(size, points, colors)

    t0 = timeit.default_timer() - t0

    img = Image.frombytes(mode='RGBA', size=size, data=image).show()  # .save('voronoi.png', format='png')

    print(f'generated voronoi, {n_points} points, of {size} in {t0:.3} secs')


if __name__ == '__main__':
    test_voronoi()
luk-f-a
@luk-f-a
hi @rforcen , thanks a lot for sharing. Gitter does not have good search capability, so no one would find your code in a few days. Why don't you post it to discourse? https://numba.discourse.group/
c200chromebook
@c200chromebook

Another odd q. This bombs, but I don't think it should, as the len of a cuda local array is necessarily a constant.

import numpy
from numba import cuda

ret = numpy.ndarray(1)

@cuda.jit
def a(r):
    la = cuda.local.array(2, dtype=numpy.float64)
    la[0] = 2.1
    la[1] = 3.2
    arr = cuda.local.array(len(la), dtype=numpy.int32)
    r[0] = la[0]+la[1]

a[1,1](ret)
print(ret[0])

same if I do

    arr = cuda.local.array(la.shape, dtype=numpy.int32)
17 replies
c200chromebook
@c200chromebook
What's the equivalent of cuda.local.array for njit? It seems to not be np.ndarray
2 replies
roberto forcen
@rforcen
Any plans to include GLSL as a target? This would definitely add better compatibility
4 replies

hi @rforcen , thanks a lot for sharing. Gitter does not have good search capability, so no one would find your code in a few days. Why don't you post it to discourse? https://numba.discourse.group/

ok, thanks for the advice

Jouestheminer
@jouestheminer_twitter
Hi, is it just me, or is the Numba version on pip still 0.51.2 (https://pypi.org/project/numba/) while Anaconda already has 0.52.0RC3? When will Numba be updated on PyPI?
4 replies
roberto forcen
@rforcen
How can I access the 'current item' in parallel frames, i.e. like gl_VertexID in GLSL?
I'm now generating an np.arange as a parameter to simulate this, but it requires extra time and memory.
I built two parallel Numba functions that are 4.6 times faster than the NumPy equivalent:
@njit(parallel=True)  # -> np.arange(n * n, dtype='i4').reshape(n, n)
def grid(n, m):
    v = np.empty((n, m), dtype=np.int32)
    for i in prange(n):
        for j in prange(m):
            v[i][j] = i * m + j
    return v


@njit(parallel=True)  # -> np.arange(n, dtype='i4')
def xrange(n):
    v = np.empty((n), dtype=np.int32)
    for i in prange(n):
        v[i] = i
    return v
49 replies
roberto forcen
@rforcen
just updated git with all code snippets: https://github.com/rforcen/numba
roberto forcen
@rforcen
Testing some of the latest developments on an ARM device (aarch64, Amlogic S905x3, Cortex-A55, 4 cores, 1.8GHz), and it works great! Really nice performance in parallel code, about 1/10 of an i7-4790, installing with pip & llvm dev
Jack Miller
@NeutralKaon
Hello! I have a really silly question that I'd just love a bit of advice with.
I'm a scientist, and I'm trying to use numba to speed up scipy's ODE solver so that I can call the results rapidly in an MCMC package, emcee, which is run with multiprocessing.
If I call my likelihood function directly, which involves several numba @njit(cache=True)'s, I see a significant speed-up: what would take ~300 s takes ~6. However, when running the MCMC chain, although I see a speed-up, this isn't anything like as big.
If I profile the code with SnakeViz, it looks like a lot of time is spent in compiling the bytecode:
image.png
Can I ask -- is my interpretation of "compiler_lock.py taking up 279 s means that 279s were spent compiling" correct?
And furthermore, what can I do to prevent this? It looks like each multiprocessing pool is re-compiling what I hope would be the same likelihood function -- even though I've cached it? Should I do ahead-of-time compilation?
Thank you all v. much!
roberto forcen
@rforcen

And furthermore, what can I do to prevent this? It looks like each multiprocessing pool is re-compiling what I hope would be the same likelihood function -- even though I've cached it? Should I do ahead-of-time compilation?

I've also noticed a delay on the first run, so I use a 'warm up' first call with a small number of iterations, see #warm up on https://github.com/rforcen/numba/blob/main/DomainColoring.py

Jack Miller
@NeutralKaon
@rforcen good idea! However, I don't think I see that -- I just ran (overnight) emcee with 16 iterations and it looks like the "compile" part has scaled linearly with the number of iterations. I don't think that this should happen -- shouldn't it be cached once, and not perpetually recompiled?
15 replies
image.png
^have another picture
Jack Miller
@NeutralKaon
One last quick question -- numba-scipy is on github, but looks a bit light at the moment -- is support for odepack planned? (Thank you all v. much for being amazing!)
2 replies
Paul Ortmann
@p-ortmann

I am trying to set up a jitclass with an empty list as below and am getting a typing error that I don't quite understand.

import numba as nb
from collections import OrderedDict
from numba.core.types import Tuple, int64, float64
from numba.core.types.containers import ListType
from numba.typed.typedlist import List

spec_dynamic_events = [('capacity', ListType(Tuple((float64, float64))))]

spec_dynamic_events=OrderedDict(spec_dynamic_events)
@nb.experimental.jitclass(spec_dynamic_events)
class DynamicEvent(object):
    def __init__(self):
        self.capacity = List.empty_list(Tuple((float64,float64)))

my_event=DynamicEvent()

This yields 'Failed in nopython mode pipeline (step: nopython frontend)
Untyped global name 'Tuple': cannot determine Numba type of <class 'numba.core.types.abstract._TypeMetaclass'> '
Could somebody give me a hint where to look for a solution or is this not supported?

5 replies
Black Box Technology
@blackbox-tech

Is there any documentation on the @guvectorize layout declarations? I'm having problems passing in an array. For example:

@nb.guvectorize(["f8[:], f8[:], f8[:], f8[:]", ],
                "(len_a),(len_b)->(len_a),(len_a)", nopython=True)
def foo(a, b, c, d):
    len_a = len(a)
    i = 0
    mx = np.max(b)
    mn = np.min(b)
    while i < len_a:        
        c[i] = a[i] * mx
        d[i] = a[i] + mn
        i += 1


a = np.array([[ 0., 0., 0., 0.],
              [ 1., 1., 1., 1.],
              [ 2., 2., 2., 2.]])

b = np.array([[ 5., -1.,  1., -1.],
              [ 5.,  2.,  2.,  1.]])

c, d = foo(a, b, axis=0)

This code gives the error:

TypeError: foo: axis can only be used with a single shared core dimension, not with the 2 distinct ones implied by signature (len_a),(len_b)->(len_a),(len_a).

There is only 1 core dimension in this function (the dimension of the 1st array that is used to determine the size of the output arrays), how do I declare the second array in the layout to be an array but a non-core dimension?

I can work around this error with the following wrapper, but it is a hack:

def foo_with_axis(a, b, axis=-1):
    c, d = foo(np.moveaxis(a, axis, -1), np.moveaxis(b, axis, -1))
    return np.moveaxis(c, axis, -1), np.moveaxis(d, axis, -1)
9 replies
Razvan Chitu
@razvanch
hi. I would like to @jitclass a skiplist implementation, and as part of that my Node class will need to have a member that is a list of other nodes. what would be the best way to do this in numba?
27 replies
roberto forcen
@rforcen
is there a numba equivalent of python's eval func or must it be hand coded using the @intrinsic decorator?
31 replies
Haozhi Sha
@jshcht
Hello! How can I efficiently accelerate calculations on a 4D array? Should I use four nested loops, three with 'range' and one with 'prange'?
7 replies
Chris Barnes
@clbarnes
Seems like there's a comma in the wrong place in the setup.py, which means you get the wrong error type and message when you have an unsupported Python version. The PR is a single character but I raised it anyway...
Leo Fang
@leofang
Hi @gmarkall, quick question: in the latest CAI v3, in addition to acting at import time, shouldn't NUMBA_CUDA_ARRAY_INTERFACE_SYNC also act at export time (by making sure no stream would be exported to other libraries when it's set to 0)?