I get NVVM compilation errors after updating to Cuda 11.2. Cuda 11.1 works fine.
Is this issue known? (CentOS 7, numpy 1.19.4, numba 0.52.0)

Error message (same for any kernel):

numba.cuda.cudadrv.error.NvvmError: Failed to compile

<unnamed> (44, 19): parse expected comma after load's type
2 replies
(wrong community, sorry)
Kenichi Maehashi
I think you’re posting to the wrong room :wink: https://gitter.im/cupy/community
oops, sorry
roberto forcen
just posted some code in the comments for the article "How fast is C++ compared to Python?" https://towardsdatascience.com/how-fast-is-c-compared-to-python-978f18f474c7, finding that numba is 1.5 times faster than g++ -O3
17 replies
Ankit Mahato
Check out the interactive open source Mandelbrot set viewer I built to demonstrate several numba concepts - https://realworldpython.guide/ready-set-go-numba/
3 replies
Harshal Chaudhari

I am attempting to implement an n-ary tree using jitclass. Each node of the tree would have a dictionary of all its children nodes. Here's my implementation:

from collections import OrderedDict
from numba import int32, optional, deferred_type
from numba import types, typed
from numba.experimental import jitclass

node_type = deferred_type()
kv_ty = (int32, node_type)

spec = OrderedDict()
spec['idx'] = int32
spec['parent'] = optional(node_type)
spec['children'] = types.DictType(*kv_ty)

@jitclass(spec)
class TreeNode(object):
    def __init__(self, idx):
        self.idx = idx
        self.parent = None
        self.children = typed.Dict.empty(*kv_ty)

node_type.define(TreeNode.class_type.instance_type)

if __name__ == "__main__":
    tn0 = TreeNode(0)
    tn1 = TreeNode(1)
    tn1.parent = tn0
    tn0.children[0] = tn1

When I try to run this code, I get an error

AttributeError: 'DeferredType' object has no attribute 'name'

Can someone please help me with this? It is difficult to find documentation on DeferredType because it is an internal feature of the package.


9 replies
is it possible to keep a reference to a plain-python class as a member of a jitclass? Something like

spec = [
    ('pyclass', ???),  # what type goes here?
]

@jitclass(spec)
class NumbaClass:
    def __init__(self, pyclass):
        self.pyclass = pyclass
    def doit(self):
        with objmode():
            self.pyclass.doit()  # call back into the interpreter
3 replies
How can I use bytes in numba? I get this error: resolving callee type: Function(<class 'bytes'>)

Is it possible to use a struct with a pointer to struct?
Like this (adapted from here):

from cffi import FFI
from numba.core.typing import cffi_utils

src = """
typedef struct nested_struct {
    float  x;
} nested_struct;

/* Define the C struct */
typedef struct my_struct {
    nested_struct*    nested;
} my_struct;
"""

ffi = FFI()
ffi.cdef(src)

print(cffi_utils.map_type(ffi.typeof('my_struct'), use_record_dtype=True))

NotImplementedError: Record(x[type=float32;offset=0];4;True)* cannot be represented as a Numpy dtype

1 reply
Nils J.
Hey folks, I am trying to understand why I get different results for a GUFunc and an equivalent Python function. The results agree to within a small margin (~1e-6), but I want to understand whether I am doing something wrong. I added a Colab link below; I am thankful for any help or advice.

hi, i have a dict like below

self.dict_1 = Dict.empty(u8, i4)

As the number of elements grows, trying to get a value (like below) gets slower.

Is there any way to give the dict a size hint so that it can pre-allocate memory?
I know how many elements I want at initialization, but I don't know the keys at that point.
Or is there a way to initialize the dict with some random keys first and then clear them out?
My purpose is to be able to do dict.get(key) faster.

Siu Kwan Lam
@bgzhen64_gitlab, no pre-allocation routine is exposed. But if .get is getting slower as the dict gets bigger, I don’t think it’s a problem with allocation; allocation happens at insert. The slower get may be caused by hash collisions.
Is there any way to get the address of a jitclass from within a member function?
from numba.experimental import jitclass

@jitclass([])
class Demo:
    def __init__(self):
        pass

    def address(self):
        return id(self)  # this doesn't work, fails with Untyped global name 'id'

8 replies

I'm wondering if there is an equivalent to the Numba's Tuple/UniTuple that is mutable for CUDA. For example I would like to write the following device function

@cuda.jit(device = True)
def test(p):
    p[1] = math.sqrt(p[1]**2 + p[2]**2) - 1
    return p

but instead have to write this device function

@cuda.jit(device = True)
def test(p):
    return (p[0], math.sqrt(p[1]**2 + p[2]**2) - 1, p[2])

In these examples, p is a simple float3-style CUDA vector, i.e. a UniTuple representing an xyz position.

8 replies
Hannes Pahl
Hi everyone, happy new year! :)
Hannes Pahl
I was curious to have a closer look at some numba internals and gave myself the challenge to see whether I can extend the abilities of numba's Enum/IntEnum implementation. I seem to get most things working, but since I chose to turn the EnumMembers into Structs, I am struggling to get IntEnumMembers into numpy arrays. Is there a point where one can hook in, to tell numba to explicitly cast a value before adding it into a numpy array?
50 replies

@/all Thanks to all those who attended Numba's first open meeting yesterday. Meeting minutes are here: https://github.com/numba/numba/wiki/Minutes_2021_01_05 (thanks for uploading @esc). Topics for next week are:

  • Generic jit classes (jit class extensions, a bit more like C++ templates, using Python typing annotations).
  • Changes for target-specific overload.

Everyone is welcome to join, meetings are weekly on Tuesdays, details are here https://numba.discourse.group/t/weekly-open-dev-meeting-2021/417/2

2 replies
hi! I'm trying to use external C functions in numba nopython mode. I have another big function that uses these external C functions, and I want it to be cached (cache=True). Are any of the cffi, cython, or ctypes interfaces cacheable?
12 replies
Modifying a DictType from multiple threads isn't supported; what about the typed list? Can it be appended to from multiple threads in prange()?
23 replies
How do I pass a Tuple(2x Tuple(ints x i)) as a literal? For example tensordot(A, B, axes=((1,0),(-2,-1))), where ((1,0),(-2,-1)) should be passed as a tuple of tuples of int literals to generate a specific implementation.
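For context, plain NumPy already accepts the nested axes tuples (including negative axis numbers) that would need to be seen as literals here; the array shapes below are made up just so the axis pairing is valid:

```python
import numpy as np

# Hypothetical shapes: contract A's axes (1, 0) with B's axes (-2, -1).
A = np.arange(6.0).reshape(2, 3)   # axis 0 -> length 2, axis 1 -> length 3
B = np.arange(6.0).reshape(3, 2)   # axis -2 -> length 3, axis -1 -> length 2
C = np.tensordot(A, B, axes=((1, 0), (-2, -1)))
print(C.shape)  # all axes are contracted, so the result is a 0-d scalar
```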
23 replies
Is there a way to make a jitclass with a jit function member? In this case I'd like to_be_called to be some kind of pointer/reference to the jit function.
from numba import njit, int64
from numba.experimental import jitclass

@jitclass([("to_be_called", int64)])  # what type goes here?
class Caller:
    def __init__(self, to_be_called):
        self.to_be_called = to_be_called

    def call(self):
        # I'd like to write something like
        # return self.to_be_called()
        return 0

@njit
def to_be_called_impl():
    return 42

# caller = Caller(to_be_called_impl)
# caller.call()
10 replies
https://github.com/numba/numba/issues/6345#issuecomment-759509837 <-- we have landed initial Python 3.9 support! 🎉 thanks to @stuartarchibald and @sklam !
Okay, went straight to discourse this time :)
1 reply
Dieter Werthmüller

My GitHub Action started to fail recently (Python 3.6, 3.7; not 3.8) with the following:

     import numba as nb
/usr/share/miniconda3/envs/tenv/site-packages/numba/__init__.py:14: in <module>
    from numba.core import config
/usr/share/miniconda3/envs/tenv/site-packages/numba/core/config.py:20: in <module>
    MACHINE_BITS = tuple.__itemsize__ * 8
E   AttributeError: type object 'tuple' has no attribute '__itemsize__'

Is this a known issue or should I open a GitHub issue (or discourse discussion) with more details?
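For what it's worth, that attribute does exist on a normal CPython build (it is the size in bytes of one slot in a tuple's variable-length part), which is how numba derives the machine word size, so its absence suggests a broken or unusual Python installation:

```python
import sys

# On a regular CPython build, tuple.__itemsize__ is sizeof(PyObject *),
# i.e. 8 on 64-bit builds, so the word size can be computed as:
MACHINE_BITS = tuple.__itemsize__ * 8
print(MACHINE_BITS, sys.maxsize > 2**32)  # e.g. "64 True" on a 64-bit build
```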

26 replies
Sean M. Law

Hi all, I have a STUMPY user who has CUDA 11.2 installed, which may not be compatible with his Numba version. In STUMPY (which depends on Numba), I check cuda.is_available() before importing the submodules that use CUDA and, in this case, CUDA is available but it's the wrong version. This is the error they are receiving:

  File ".../lib/python3.6/site-packages/stumpy/__init__.py", line 21, in <module>
    from .gpu_stump import gpu_stump  # noqa: F401

  File ".../lib/python3.6/site-packages/stumpy/gpu_stump.py", line 18, in <module>
    "(i8, f8[:], f8[:], i8,  f8[:], f8[:], f8[:], f8[:], f8[:],"

  File ".../lib/python3.6/site-packages/numba/cuda/decorators.py", line 136, in kernel_jit

  File ".../lib/python3.6/site-packages/numba/cuda/compiler.py", line 811, in __init__

  File ".../lib/python3.6/site-packages/numba/cuda/compiler.py", line 952, in compile

  File ".../lib/python3.6/site-packages/numba/cuda/compiler.py", line 576, in bind

  File ".../lib/python3.6/site-packages/numba/cuda/compiler.py", line 446, in get
    ptx = self.ptx.get()

  File ".../lib/python3.6/site-packages/numba/cuda/compiler.py", line 416, in get

  File ".../lib/python3.6/site-packages/numba/cuda/cudadrv/nvvm.py", line 548, in llvm_to_ptx
    ptx = cu.compile(**opts)

  File ".../lib/python3.6/site-packages/numba/cuda/cudadrv/nvvm.py", line 236, in compile
    self._try_error(err, 'Failed to compile\n')

  File ".../lib/python3.6/site-packages/numba/cuda/cudadrv/nvvm.py", line 254, in _try_error
    self.driver.check_error(err, "%s\n%s" % (msg, self.get_log()))

  File ".../lib/python3.6/site-packages/numba/cuda/cudadrv/nvvm.py", line 144, in check_error
    raise exc

NvvmError: Failed to compile

<unnamed> (114, 19): parse expected comma after load's type

Is there a Numba best practice for also checking whether the CUDA version is compatible? Or am I focusing on the wrong problem?

24 replies
Can a class be AOT compiled in Numba? I couldn't find any documentation on that...
1 reply
@stuartarchibald am here for now
82 replies
@here Open meeting is starting now if anyone wants to join: https://numba.discourse.group/t/public-meeting-tuesday-january-19-2021/442
Mario Roy
Is there a way to specify the chunk_size value so that, behind the scenes, OpenMP is configured accordingly (for example chunk_size 1)? There is NUMBA_NUM_THREADS; I wish I could specify a NUMBA_CHUNK_SIZE for prange. Is that possible?
#pragma omp parallel for schedule(static, 1)
1 reply
Thank you. I'm making a Python demo that consumes many cores and uses Numba, plus computes on the GPU and CPU simultaneously. But I wish there were a way to tell Numba to use 1 for chunk_size.
45 replies
Mario Roy
Yeah. Thank you.
Quick question: I have a path-dependent algo involving a np.array with dtype='datetime64[ns]'. Does njit work on this, or do I have to convert it to an array of integers in order to use nopython mode?
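Numba's datetime64 support is limited, so one common workaround (a sketch, not necessarily the only option) is to reinterpret the datetime64[ns] array as int64 nanosecond counts before passing it to the njitted function; the timestamps below are made up:

```python
import numpy as np

# datetime64[ns] stores an int64 nanosecond count internally, so a view
# reinterprets the same buffer as plain integers (zero copy).
ts = np.array(['2021-01-01T00:00:00', '2021-01-01T00:00:01'],
              dtype='datetime64[ns]')
ns = ts.view(np.int64)
print(ns[1] - ns[0])  # 1000000000 ns, i.e. 1 second
```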
2 replies
And another question: I suppose the fastmath argument is irrelevant for ARM chips?
5 replies
@/all Numba Weekly Open Development meeting, 1.5hrs from now, details if you want to join.
Graham Markall
Turns out I was wrong, it's the .lib files that are arch-specific and got the wrong version included - my fix is already in the recipe: https://github.com/conda-forge/cudatoolkit-feedstock/blob/master/recipe/build.py#L294
Umberto Lupo

Hi! I hope this is the right place to ask for help on the following. I am currently trying to njit an algorithm for processing Delaunay triangulations, with the following features:

  1. The inputs are an array X and a homogeneous list simplices of tuples of integers. The shape of X and the length of each tuple are arbitrary, but what is always true is that X.shape[1] equals the length of each tuple in simplices plus 1. Call this the "dimension", dim. The tuples may be assumed to be sorted.
  2. In pure Python, I would have the algorithm work with and return a dictionary d with keys all integers from 1 to dim (included). To key i there would correspond another dictionary whose keys are all the sub-tuples of length i + 1 of the tuples in simplices, and whose values are floats. Example: d = {1: {(0, 1): 0., (0, 2): 0., (1, 2): 0}, 2: {(0, 1, 2): 0.}} for dim equal to 2. Notice that d is inhomogeneous but each d[i] is homogeneous.
  3. In the inner workings of the algorithm, and again assuming pure Python, each d[i] is constructed iteratively from d[i + 1]. In particular, one would want to write loops as follows:
    # sigma is a tuple in d[i + 1].keys(), so len(sigma) == i + 2
    for k in range(i + 2):
        tau = sigma[:k] + sigma[k + 1:]
        # And do stuff with tau

Now I have at least two problems trying to njit this algorithm. One comes from point 2, because I would like to have an inhomogeneous dictionary. The second comes from point 3, because numba is not happy with non-constant slicing. Note that tau in the code snippet is the tuple form of np.delete(sigma, k), but I really need the tuple because it serves as a dictionary key, and arrays can't be used for that purpose.

I know that there is plenty of performance to be gained from njitting this program, as I wrote an ad-hoc version which only works when dim is 2 (by defining two dictionaries d_1 and d_2 instead of d, and writing out the for loop in 3 explicitly) and gained a large factor in runtime.

Thanks in advance for your help! Happy to provide more context if anything is unclear.
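For what it's worth, the sub-tuple construction in point 3 can be written as a pure-Python reference first (the simplex and the placeholder values below are made up):

```python
def build_level(d_next, i):
    """Construct the keys of d[i] from d[i + 1]: each key of d[i + 1] is a
    tuple of length i + 2, and dropping one entry yields a length-(i + 1)
    sub-tuple usable as a dictionary key."""
    d_i = {}
    for sigma in d_next:
        for k in range(len(sigma)):
            tau = sigma[:k] + sigma[k + 1:]  # drop element k
            d_i[tau] = 0.0                   # placeholder value
    return d_i

d2 = {(0, 1, 2): 0.0}
d1 = build_level(d2, 1)
print(d1)  # the three edges of the triangle, each mapped to 0.0
```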

7 replies
Stefano Roberto Soleti
Hello! I don't know if this has been asked already, but I couldn't find any information. I have a CUDA device function that I would also like to call outside a kernel. What is the best way to achieve that? I would like to avoid having e.g. myfunc and myfunc_gpu.
Graham Markall
the pattern for doing that can look something like:
def myfunc(arg):
    ...

myfunc_gpu = cuda.jit(myfunc)
@soleti does that fit in with the way you're trying to code?
Stefano Roberto Soleti

Yes, that was the solution I came up with, I was wondering if there was a way to have only

def myfunc(arg):

where I could call myfunc both inside and outside a CUDA kernel

Graham Markall
oh, yes - just do:

@njit
def myfunc(arg):
    ...
Stefano Roberto Soleti
Oh ok I didn't know, njitted functions can be called from a CUDA kernel?
Graham Markall
(you can call an @njit function from the host or a CUDA kernel)
Stefano Roberto Soleti
Umberto Lupo

This is not-so-secretly related to my question above but as it is very small, perhaps it fits better here. If I try to use the new immutable dictionary feature as follows:

@njit
def foo():
    d = {0: {(0, 0, 0): 0.}, 1: {(1, 1): 0.}}
    d[1][(2, 2)] = 2.
    return d

I get the following error:

No implementation of function Function(<built-in function setitem>) found for signature:

 >>> setitem(DictType[UniTuple(int64 x 3),float64]<iv=None>, UniTuple(Literal[int](4) x 2), float64)

Is this expected behaviour?

36 replies