Hi, I was able to use linspace in Numba, but when I try to pass tuples to linspace I get an error:
2 replies
@njit
def clus():
    points = np.linspace((1,2),(5,6),20)
    return points

TypingError Traceback (most recent call last)

<ipython-input-79-d59e448f35b9> in <module>
3 points = np.linspace((1,2),(5,6),20)
4 return points
----> 5 clus()

~/.local/lib/python3.6/site-packages/numba/core/dispatcher.py in _compile_for_args(self, *args, **kws)
413 e.patch_message(msg)
--> 415 error_rewrite(e, 'typing')
416 except errors.UnsupportedError as e:
417 # Something unsupported is present in the user code, add help info

~/.local/lib/python3.6/site-packages/numba/core/dispatcher.py in error_rewrite(e, issue_type)
356 raise e
357 else:
--> 358 reraise(type(e), e, None)
360 argtypes = []

~/.local/lib/python3.6/site-packages/numba/core/utils.py in reraise(tp, value, tb)
78 value = tp()
79 if value.__traceback__ is not tb:
---> 80 raise value.with_traceback(tb)
81 raise value

TypingError: Failed in nopython mode pipeline (step: nopython frontend)
No implementation of function Function(<function linspace at 0x7fa6305d9840>) found for signature:

linspace(Tuple(Literal[int](1), Literal[int](2)), Tuple(Literal[int](5), Literal[int](6)), Literal[int](20))

There are 2 candidate implementations:

  - Of which 2 did not match due to:
  Overload of function 'linspace': File: numba/core/typing/npydecl.py: Line 615.
    With argument(s): '(UniTuple(int64 x 2), UniTuple(int64 x 2), int64)':
   No match.

During: resolving callee type: Function(<function linspace at 0x7fa6305d9840>)
During: typing of call at <ipython-input-79-d59e448f35b9> (3)

File "<ipython-input-79-d59e448f35b9>", line 3:
def clus():
points = np.linspace((1,2),(5,6),20)
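
One possible workaround, sketched under the assumption that only the scalar form of np.linspace is available in nopython mode (which is what the "No match" message above suggests): call linspace once per coordinate and fill a 2-D array. The helper name linspace_points is hypothetical, and the import guard only exists so the sketch also runs where Numba is not installed.

```python
import numpy as np

try:
    from numba import njit
except ImportError:  # assumption: fall back to plain Python when Numba is absent
    def njit(func):
        return func


@njit
def linspace_points(start, stop, num):
    # start and stop are same-length tuples of numbers, e.g. (1, 2) and (5, 6).
    dims = len(start)
    points = np.empty((num, dims), dtype=np.float64)
    for k in range(dims):
        # Scalar endpoints: the form of np.linspace used here per coordinate.
        points[:, k] = np.linspace(start[k], stop[k], num)
    return points


points = linspace_points((1, 2), (5, 6), 20)
```

The result has shape (20, 2), with one row per interpolated point, matching what NumPy returns for the tuple call outside of Numba.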

Hello, I have a question: when I use @jit(cache=True), it raises the warning: Cannot cache compiled function "deal1" as it uses lifted code
Is there a way to see which code was lifted, e.g. in debug mode? I want to use this function for AOT compilation.
1 reply
CARON Renaud
Hello, how could one do S -= P efficiently inside a CUDA device function, where S and P are (n,2) and (1,2) arrays, please?
45 replies
Graham Markall
@BlackBip I just saw you also posted on discourse about the previous question - please do ping here if you create posts on discourse, then I can jump on there and have a look
Hello all!
1 reply
Gabriele Maurina
Hi, is it possible to compile and save vectorize and guvectorize functions? I am working on an application where time is key, and 6 seconds to load Numba and compile every time is really long. I wish I could compile them only once and then distribute the binaries.
6 replies
Samuel Holt
Hi all, can we do another release of the library? I love Numba and actively use it for mathematical processing at scale. I would really like to use the new f-string support from PR numba/numba#6608, which has been merged but not yet released. How can I help get a release of the latest master on GitHub out? Also, thank you all for such an amazing project and library.
6 replies
Filippo Vicentini
Hi all (devs), what is the relationship between experimental.jitclass, experimental.structref and numba-extras/generic_jitclass? Is there consensus on what will become the final implementation, and on whether a higher-level interface will be built on top of structref?
4 replies
Hello, does anyone have any idea where this error may have come from? I am trying to use: cuda.managed_array but I find this error: AttributeError: 'ManagedNDArray' object has no attribute 'alloc_size'. I don't get this error when I use this function on 1 GPU only. But when I use it on several GPUs with cuda.gpus[i] I get this error.
4 replies
@gmarkall is there a method similar to register_attr and lower_getattr that adds indexing capability to a custom CUDA struct?
2 replies
from numba import njit

@njit
def div_by_zero(x):
    return x/0


@njit
def parent(x):
    return div_by_zero(x)


In an example like this, the error message is

----> 15 parent(10)
ZeroDivisionError: division by zero

Is there a way (e.g. a debug option) to pinpoint where the error actually happens, i.e. return x/0? In a larger code base, it gets harder to find where the error actually occurs.

8 replies
Guoqiang QI

I get many warnings like the ones below when I run python -m numba.runtests:

..../root/huawei/numba/numba/tests/test_sort.py:934: NumbaPendingDeprecationWarning: 
Encountered the use of a type that is scheduled for deprecation: type 'reflected list' found for argument 'A' of function 'make_quicksort_impl.<locals>.GET'.

For more information visit https://numba.pydata.org/numba-doc/latest/reference/deprecation.html#deprecation-of-reflection-for-list-and-set-types

File "numba/misc/quicksort.py", line 55:
        def GET(A, idx_or_val):



/root/huawei/numba/numba/core/object_mode_passes.py:161: NumbaDeprecationWarning: 
Fall-back from the nopython compilation path to the object mode compilation path has been detected, this is deprecated behaviour.

For more information visit https://numba.pydata.org/numba-doc/latest/reference/deprecation.html#deprecation-of-object-mode-fall-back-behaviour-when-using-jit

File "numba/tests/test_builtins.py", line 40:

def any_usecase(x, y):

..................................<string>:2: RuntimeWarning: invalid value encountered in arccos
........<string>:2: RuntimeWarning: invalid value encountered in arccos
.........................<string>:2: RuntimeWarning: invalid value encountered in arccosh
........<string>:2: RuntimeWarning: invalid value encountered in arccosh
.........................<string>:2: RuntimeWarning: invalid value encountered in arcsin
........<string>:2: RuntimeWarning: invalid value encountered in arcsin
.............................................................<string>:2: RuntimeWarning: divide by zero encountered in arctan
....<string>:2: RuntimeWarning: divide by zero encountered in arctan
...............................................<string>:2: RuntimeWarning: invalid value encountered in arctanh
........<string>:2: RuntimeWarning: invalid value encountered in arctanh
...................../root/huawei/numba/numba/core/typed_passes.py:331: NumbaPerformanceWarning: 
The keyword argument 'parallel=True' was specified but no transformation for parallel execution was possible.

To find out why, try turning on parallel diagnostics, see https://numba.pydata.org/numba-doc/latest/user/parallel.html#diagnostics for help.
File "numba/tests/test_blackscholes.py", line 27:
def cnd_array(d):
    <source elided>
               (K * (A1 + K * (A2 + K * (A3 + K * (A4 + K * A5))))))
    return np.where(d > 0, 1.0 - ret_val, ret_val)

  def cnd_array(d):
/root/huawei/numba/numba/core/object_mode_passes.py:151: NumbaWarning: Function "cnd_array" was compiled in object mode without forceobj=True.

Is this normal, or is something wrong with my environment? conda list shows:

# packages in environment at /root/miniconda3/envs/numbaenv:
# Name                    Version                   Build  Channel
_libgcc_mutex             0.1                        main
_openmp_mutex             5.1                      51_gnu
blas                      1.0                    openblas
ca-certificates           2021.4.13            hd43f75c_1
certifi                   2020.12.5        py38hd43f75c_0
cffi                      1.14.5           py38hdced402_0
jinja2                    2.11.3             pyhd3eb1b0_0
ld_impl_linux-aarch64     2.36.1               h0ab8de2_3
libffi                    3.3                  h7c1a80f_2
libgcc-ng                 10.2.0              h1234567_51
libgfortran-ng            10.2.0              h9534d94_51
libgfortran5              10.2.0              h1234567_51
libgomp                   10.2.0              h1234567_51
libllvm10                 10.0.1               h0eb0881_6
libopenblas               0.3.13               hf4835c0_1
libstdcxx-ng              10.2.0              h1234567_51
llvmdev                   9.0.1                         0    numba
llvmlite                  0.37.0.dev0              pypi_0    pypi
markupsafe                1.1.1            py38hfd63f10_0
@guoqiangqi many, many warnings during Numba testing are normal
some of them come from tests of the warnings themselves, some are expected, and so on...
if none of the tests actually fail, you are probably good to go
Guoqiang QI
@esc got it, thanks for your reply.

Hi, I am trying to use the normal CDF inside a Numba njit-compiled function, e.g. via numba_scipy, but so far it fails. Here is my code:

from numba import njit
from scipy.special import ndtr
from scipy.stats import norm as no
# import numba_scipy

a = no.cdf(0.)                      # desired test outcome: 0.5

@njit
def test(x):
    a = ndtr(x)
    b = no.cdf(x)
    return a, b                  # naive implementations via scipy normal cdf functions


From what I read about numba_scipy, it should only need to be installed; however, I get an error:

Untyped global name 'ndtr': Cannot determine Numba type of <class 'numpy.ufunc'>

How should I code it? Thanks!

10 replies
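
Independently of whether numba_scipy can be made to work here, the standard normal CDF can be written with math.erfc, which Numba supports in nopython mode, using the identity Phi(x) = erfc(-x / sqrt(2)) / 2. The sketch below is one possible stand-in for ndtr on scalar inputs, not the numba_scipy approach; the name norm_cdf is hypothetical, and the import guard only exists so the sketch also runs where Numba is not installed.

```python
import math

try:
    from numba import njit
except ImportError:  # assumption: fall back to plain Python when Numba is absent
    def njit(func):
        return func


@njit
def norm_cdf(x):
    # Standard normal CDF via Phi(x) = erfc(-x / sqrt(2)) / 2.
    return 0.5 * math.erfc(-x / math.sqrt(2.0))


center = norm_cdf(0.0)
upper = norm_cdf(1.96)
```

For array inputs, the same function could be called element by element inside a jitted loop.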
Rishi Kulkarni

As I understood it, telling Numba to inline a function is mostly for type inference and doesn't affect performance. However, I have a function that IS getting performance benefits from inlining that I'm not quite understanding. I was wondering if anyone here had any ideas:

@nb.jit(nopython=True, inline='always')
def bounded_uint(ub):
    x = np.random.randint(low=2**32)
    m = ub * x
    l = np.bitwise_and(m, 2**32 - 1)
    if l < ub:
        t = (2**32 - ub) % ub
        while l < t:
            x = np.random.randint(low=2**32)
            m = ub * x
            l = np.bitwise_and(m, 2**32 - 1)            
    return m >> 32

def nb_fast_shuffle(arr):
    i = arr.shape[0] - 1
    while i > 0:
        j = bounded_uint(i + 1)
        arr[i], arr[j] = arr[j], arr[i]
        i -= 1

x = np.arange(1000)

4.25 µs ± 47.3 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)

#change inline to "never"

7.73 µs ± 699 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)


1 reply
Hi. I'm writing a minimal wrapper around scipy.sparse.csr_matrix. If I make a tuple of arrays, or a jitclass with three fields, will the arrays be copied or passed by reference when I pass the struct/object around?
2 replies
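
For the tuple case, a quick way to check is to mutate one of the arrays inside a jitted function and look at the caller's array afterwards. The sketch below does that for a (data, indices, indptr) tuple; the function name scale_values and the sample values are hypothetical, and the jitclass case is not covered. The import guard only exists so the sketch also runs where Numba is not installed.

```python
import numpy as np

try:
    from numba import njit
except ImportError:  # assumption: fall back to plain Python when Numba is absent
    def njit(func):
        return func


@njit
def scale_values(csr_parts, factor):
    # csr_parts is a (data, indices, indptr) tuple of NumPy arrays.
    data, indices, indptr = csr_parts
    for k in range(data.shape[0]):
        data[k] *= factor  # in-place write into the array received via the tuple


data = np.array([1.0, 2.0, 3.0])
indices = np.array([0, 2, 1], dtype=np.int32)
indptr = np.array([0, 1, 2, 3], dtype=np.int32)

scale_values((data, indices, indptr), 10.0)
```

After the call, data holds the scaled values, which indicates that the jitted function worked on the caller's buffer rather than on a copy.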

Hello, I hope you are all well. I'm encountering a small but annoying error which must be related to the context used by CUDA. Let me explain. I run several kernels and I want to read the data from the CPU while the kernel is running. So far, no problem: the solution proposed by @gmarkall, using a managed array and then reading the contents of the variables used in the kernel, worked perfectly well.

The idea is that I run a kernel in a context using:
with cuda.gpus[gpu_number]:

I read what's inside while the gpu is running. So I can have the evolution of my variables.

And I do the same operation several times. I restart the kernel several times once the previous one has finished.
However, it doesn't work after the first iteration: I can't read the variables. The code blocks right after the kernel function is called, which prevents me from reading what's inside. Only after the kernel has finished its calculations do I have access to the variables again.

The problem is a bit unclear on my side, and I have the impression that it is related to CUDA contexts. Does anybody have an idea? Thank you.

1 reply
Robin Carter
Hi, I have a function deep within my code decorated with parallel=True. Works great. For some starting configurations of my program I want to use a multiprocessing pool; for others, almost all of the code runs serially. This is intended. However, with the pool, the Numba function is called from what is already 1 of 8 running processes. What will Numba do in that case? Will it somehow know that it should only use 1 core? Or will it potentially start 64 workers and incur a performance penalty? If so, do I need two versions of my function, one with parallel=True and one without, and pass the configuration down to the call site to pick the right one (multiprocessing pool or serial)?
4 replies
Angus Hollands
Hi all,
If I have a type alias e.g. index_list_type = typeof(typed.List.empty_list(types.int64)), is there any way to create an instance of that type inside a jitted fn? i.e. how can I simplify this code?
def _groupby_indices_specialize(group):
    index_list_type = typeof(typed.List.empty_list(types.int64))
    group_type = typeof(group.dtype)

    def groupby(group):
        keys = typed.Dict.empty(group_type, types.uint64)
        groups = typed.List.empty_list(index_list_type)

        for i, group in enumerate(group):
            if group not in keys:
                key = keys[group] = len(keys)

            key = keys[group]
        return groups

    return groupby
4 replies
I'd also like to avoid testing for the key each time, e.g. expect that the condition fails (perf wise)
Hi! What is the correct way of checking the type of a namedtuple inside a generated_jit function? I was hoping something like the following would work:
from collections import namedtuple

import numpy as np
from numba import generated_jit

conf1 = namedtuple("conf1", "opt1 opt2 opt3")
conf2 = namedtuple("conf2", "opt1 opt2")

@generated_jit(nogil=True, nopython=True)
def g(config):

    if isinstance(config, conf1):
        def impl(config):
            print("Using conf1 impl.")
    else:
        def impl(config):
            print("Using conf2 impl.")

    return impl

c1 = conf1(np.ones(10), "321", 1)
c2 = conf2("123", 4.5)

10 replies
Could it be that @overload does not work at all for the CUDA target? I am getting a mysterious error message: No definition for lowering <built-in method sub_impl of _dynfunc._Closure object at 0x000001D6B08FCBE0>(<Signature>)
14 replies
Is there a reliable way to create a Record on the GPU? cuda.local.array((1,), dtype)[0] doesn't work for me.
5 replies
The types provided to a jitted function for AOT compilation != the types provided when actually calling the jitted function. Is there a place in the code where this mapping is defined, so that you can, for instance, check whether the user passed in the wrong arguments, or is this mapping done explicitly?
2 replies
Pablo Rodríguez Robles
I am having trouble extending the following answer (point 1) to several arguments: https://stackoverflow.com/questions/49683653/how-to-pass-additional-parameters-to-numba-cfunc-passed-as-lowlevelcallable-to-s. The problem is about using a numba integrand with scipy.integrate.quad and passing other arguments to the integrand using LowLevelCallable.
Rishi Kulkarni

Hi all,

I'm having a memory leak issue with defining a jitted function inside of a closure. I have a class that defines a method as follows:

    def _random_return(col_to_permute, keys):

        @nb.jit(nopython=True)
        def _random_return_impl(data):
            if col_to_permute == 0:
                nb_fast_shuffle(data[:, col_to_permute])
            else:
                nb_strat_shuffle(data[:, col_to_permute], keys)
            return data

        return _random_return_impl

But for whatever reason, it doesn't seem like the "old" function is removed from memory whenever _random_return() is used to redefine the function. This causes the memory usage of that Python process to increase without bound until the process is killed. Removing the jit decorator here fixes the issue and is only a slight performance loss, so I'm going with that, but I am puzzled why this is the case. Happy to hear any ideas.

4 replies
Is there a way to make a.xy = 1, 2 behave like a = a._replace(xy=(1,2))? If this requires some low-level stuff, I would be fine with that.
Henrique Fingler
Is there any reason why Numba calls cuDeviceComputeCapability instead of cuDeviceGetAttribute()? The former has been deprecated since CUDA 5.0.
12 replies
Li Jin
Hi! I have a question about Numba and recarrays: I am trying to rewrite a loop function in Numba that takes a recarray, loops over it and applies a kernel function. I got most of it working; however, I have some trouble assigning the results back to the resulting recarray. Any thoughts?
import pandas as pd
import numba
import numpy as np

def kernel(arr):
    return np.nanmean(arr['v']), np.nanmean(arr['w']), np.nanmean(arr['v'] + arr['w'])

def loop(kernel, arr, dtype, size):
    results = np.zeros(len(arr), dtype=dtype)
    for i in range(len(arr)):
        result = kernel(arr[0: i])
        for j in range(size):
            # this doesn't work
            # results[i][j] = result[j]
            pass

    return results

arr = pd.DataFrame({'v': [1, 2, 3], 'w': [4, 5, 6]}).to_records()
dtype = np.dtype([('a', 'f8'), ('b', 'f4'), ('c', 'f8')])

loop(kernel, arr, dtype, 3)
13 replies
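
A likely reason results[i][j] fails under Numba is that the record fields have different types ('f8' and 'f4' here), so a field selected by a runtime index j has no single type at compile time. One possible workaround is to address each field by its literal name. The sketch below assumes the kernel is inlined, that the window includes row i (the original slices arr[0: i]), and that the input is a plain structured array instead of the pandas-derived one; fill_results is a hypothetical name. The import guard only exists so the sketch also runs where Numba is not installed.

```python
import numpy as np

try:
    from numba import njit
except ImportError:  # assumption: fall back to plain Python when Numba is absent
    def njit(func):
        return func


@njit
def fill_results(values, results):
    # values has float fields 'v' and 'w'; results has fields 'a', 'b', 'c'.
    for i in range(len(values)):
        window = values[0:i + 1]  # assumption: the window includes row i
        # Each field is addressed by a literal name, so its type is known
        # at compile time, unlike results[i][j] with a runtime j.
        results['a'][i] = np.nanmean(window['v'])
        results['b'][i] = np.nanmean(window['w'])
        results['c'][i] = np.nanmean(window['v'] + window['w'])
    return results


values = np.zeros(3, dtype=[('v', 'f8'), ('w', 'f8')])
values['v'] = [1.0, 2.0, 3.0]
values['w'] = [4.0, 5.0, 6.0]
results = np.zeros(3, dtype=[('a', 'f8'), ('b', 'f4'), ('c', 'f8')])
fill_results(values, results)
```

The cost is that the field names are hard-coded, so a generic loop over an arbitrary kernel and dtype would need a different mechanism.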
Hans Dembinski
Hi Numba team, you guys are awesome. I noticed your Twitter account and wondered whether you would be interested in featuring https://github.com/HDembinski/monolens. Monolens converts part of your screen to grayscale or filters it to simulate color vision deficiency. The internals use Numba to do the color conversion in real time, on multiple cores, as you move the window around! You could retweet this post by Matthew Feickert, who generated a nice demo GIF: https://twitter.com/HEPfeickert/status/1399835341486493699
6 replies
Jack O'Brien
Hello! I have a question regarding jitclasses. I'm currently trying to allocate an array of jitclass objects in a numba jitted function, but I don't want to run the instantiation yet. Is there a way to allocate an empty array of jitclass objects or pointers to jitclass objects inside a jitted function?
14 replies
I would like to use the NUMBA_DISABLE_JIT option for debugging. However, I have StructRef code and I get a maximum recursion depth error. Similar code for jitclass works fine. Is there any way to use NUMBA_DISABLE_JIT with code that uses StructRef?
#!/bin/env python
import os
os.environ['NUMBA_DISABLE_JIT'] = '1'

import numba
import numpy as np
from numba import njit
from numba.core import types
from numba.experimental import structref

@structref.register
class ValueType(types.StructRef):
    def preprocess_fields(self, fields):
        return tuple((name, types.unliteral(typ)) for name, typ in fields)

class Value(structref.StructRefProxy):
    def __new__(cls, val):
        return structref.StructRefProxy.__new__(cls, val)

    @property
    def val(self):
        return Value_get_val(self)

@njit
def Value_get_val(self):
    return self.val

structref.define_proxy(Value, ValueType, ["val"])

@njit
def div_by_zero(node):
    return node.val / 0

node1 = Value(54321)

div_by_zero(node1)
RecursionError                            Traceback (most recent call last)
structref_mini_test.py in <module>
     35     return node.val / 0
---> 37 node1 = Value(54321)
     39 div_by_zero(node1)
structref_mini_test.py in __new__(cls, val)
     18 class Value(structref.StructRefProxy):
     19     def __new__(cls, val):
---> 20         return structref.StructRefProxy.__new__(cls, val)
     22     @property
~/anaconda3/envs/py38/lib/python3.8/site-packages/numba/experimental/structref.py in __new__(cls, *args)
    374             # cache it to attribute to avoid recompilation
    375             cls.__numba_ctor = ctor
--> 376         return ctor(*args)
    378     @property
~/anaconda3/envs/py38/lib/python3.8/site-packages/numba/experimental/structref.py in ctor(*args)
    371             @njit
    372             def ctor(*args):
--> 373                 return cls(*args)
    374             # cache it to attribute to avoid recompilation
    375             cls.__numba_ctor = ctor
... last 3 frames repeated, from the frame below ...
structref_mini_test.py in __new__(cls, val)
     18 class Value(structref.StructRefProxy):
     19     def __new__(cls, val):
---> 20         return structref.StructRefProxy.__new__(cls, val)
     22     @property
RecursionError: maximum recursion depth exceeded
10 replies
I am getting an error message that ends with "Enable logging at debug level for details." How do I do that? logging.getLogger("numba").setLevel("DEBUG") doesn't do it. (Or no debug messages are actually produced.)
8 replies
Celine Sin
Hello! I'm struggling with installing Numba to use tbb (thread here numba/numba#7095)
145 replies
Sylvain Finot

Hi, I wrote a kernel to solve a diffusion equation on a grid.
Then I call this kernel in a for loop.
The problem is that after a while the execution time explodes.
Basically the same cell in jupyter takes ~300ms the first time (compilation time)
Then around 80ms each time.
But after a while it jumps to more than 2s

Am I allowed to share a link to the colab notebook ?

1 reply
Ian Henriksen
Hi, I'm in the process of checking the types in a cuda kernel I've written and noticed that for j in range(uint32(sigma.shape[3])): infers j64 as a 64 bit integer even though range is passed a 32 bit integer as an argument. Currently I'm just taking the 64 bit loop variable and converting it back to 32 bits for further arithmetic and hoping the LLVM optimizer can optimize out the redundant conversions. Is there some way to just make the loop variable have the desired precision instead of having to downcast back to 32 bits?
5 replies

Hi, I tried to do a numba.prange loop, but numba seems to be unhappy with the call i+1 in the loop. In detail, this is

for i in numba.prange(5):
    A[i:i+1] += 2

Can anyone help with this issue? I need the array slice, the simple array element isn't enough.

Hannes Pahl
16 replies
Ian Henriksen
When writing a C++ cuda kernel I would normally annotate various pointer arguments with __restrict__. Is there any numba equivalent? Alternatively, is this just assumed by default?
9 replies
A long while ago @stuartarchibald and @eric-wieser had a conversation about @overload and @generated_jit; I couldn't work out how to start a Gitter thread from my phone at the time. If there's no existing Python implementation of the target function, are the two methods exactly equivalent in terms of generated code? Is there any reason to choose one over the other?
4 replies
hello. are there any suggestions as to how to make a long running numba loop "interruptible" seeing as there's no KeyboardInterrupt to work with? might there be some mechanism to allow a loop to check for interruption every few iterations?
3 replies
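
One generic pattern, sketched here without knowing what the replies suggested, is to split the long loop into chunks and drive the chunks from ordinary Python. The interpreter only gets a chance to raise KeyboardInterrupt once the compiled call returns, so the chunk size bounds how long Ctrl-C can stay pending. The names partial_sum and interruptible_sum and the workload are hypothetical, and the import guard only exists so the sketch also runs where Numba is not installed.

```python
try:
    from numba import njit
except ImportError:  # assumption: fall back to plain Python when Numba is absent
    def njit(func):
        return func


@njit
def partial_sum(start, stop):
    # One chunk of the long-running loop, executed as compiled code.
    total = 0.0
    for i in range(start, stop):
        total += i * 0.5
    return total


def interruptible_sum(n, chunk=100_000):
    total = 0.0
    for start in range(0, n, chunk):
        # Back in the interpreter between chunks: a pending Ctrl-C is
        # raised here as KeyboardInterrupt.
        total += partial_sum(start, min(start + chunk, n))
    return total


result = interruptible_sum(250_000)
```

The chunk size trades responsiveness against the overhead of re-entering the compiled function; state that must survive between chunks has to be passed in and returned explicitly.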
@sklam: wonderful, thank you!
Ian Henriksen
While looking through the IR for a GPU kernel to check that the types are what I expect, I noticed that the result of adding two 32-bit unsigned integers gets typed as a 64-bit integer. Is this intentional? Currently I'm just downcasting things aggressively and hoping the LLVM optimizer sorts out what's going on. Is there a better way?
2 replies