Public channel for discussing Numba usage. Don't post confidential info here! Consider posting questions to: https://numba.discourse.group/ !
import numpy as np
import numba as nb
from numba import cuda
import math

hundred_twenty_eight_floats = np.zeros(128)
hundred_twenty_eight_floats[:] = list(range(128))

@cuda.jit(nb.void(nb.float64[::1], nb.bool_))
def cycle(vals, update_early):
    offset = 64 if update_early else 0
    for i in range(5000):
        vals[cuda.grid(1) + offset] = math.sin(vals[cuda.grid(1) + offset])

stream = cuda.stream()
stream2 = cuda.stream()
for i in range(25):
    cycle[2, 32, stream](hundred_twenty_eight_floats, False)
    cycle[2, 32, stream2](hundred_twenty_eight_floats, True)
import numpy as np
import numba as nb
from numba import cuda
import math

hundred_twenty_eight_floats_h = np.zeros(128)
hundred_twenty_eight_floats_h[:] = list(range(128))
hundred_twenty_eight_floats = cuda.to_device(hundred_twenty_eight_floats_h)

@cuda.jit(nb.void(nb.float64[::1], nb.bool_))
def cycle(vals, update_early):
    offset = 64 if update_early else 0
    for i in range(5000):
        vals[cuda.grid(1) + offset] = math.sin(vals[cuda.grid(1) + offset])

stream = cuda.stream()
stream2 = cuda.stream()
for i in range(25):
    cycle[2, 32, stream](hundred_twenty_eight_floats, False)
    cycle[2, 32, stream2](hundred_twenty_eight_floats, True)

hundred_twenty_eight_floats.to_host()
print(hundred_twenty_eight_floats_h.sum())
You can pass error_model='numpy' to the @jit decorator. Docs: https://numba.readthedocs.io/en/stable/reference/jit-compilation.html?highlight=error_model#numba.jit
Hi, I am trying to install numba through virtualenv (pip install numba) in a JupyterHub (remote instance), but I get the following error:
error: Command "gcc -pthread -B /sw/spack-rhel6/jupyterhub/jupyterhub/compiler_compat -Wl,--sysroot=/ -Wsign-compare -DNDEBUG -g -fwrapv -O3 -Wall -Wstrict-prototypes -fPIC -Inumba -I/mnt/lustre01/pf/b/b381465/kernels/data/include -I/sw/spack-rhel6/jupyterhub/jupyterhub/include/python3.8 -c numba/_devicearray.cpp -o build/temp.linux-x86_64-3.8/numba/_devicearray.o -std=c++11" failed with exit status 1
llvmlite is already installed and working, but I can never get numba to install.
Can I call an @njit function (A) with parallel=True and a prange inside, from a prange loop in another @njit(parallel=True) function (B)? Will the prange in the first function (A) be sequential when called from within the second function (B)?
LoweringError: Failed in nopython mode pipeline (step: nopython mode backend)
Operands must be the same type, got (i64, i32)
import numpy as np
import numba as nb
from numba import cuda
sixty_four = np.zeros(64)
blocks_of_thirty_two = sixty_four.reshape(-1,32)
nb_sixty_four = cuda.to_device(sixty_four)
nb_sixty_four.reshape(-1, 32)  # fails on a device array
Here's a small one: CUDA device arrays behave differently from NumPy. NumPy will happily reshape using a -1 to infer the remaining dimension; numba-cuda will not.
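A workaround, assuming the code knows the fixed dimension anyway, is to compute the inferred dimension explicitly on the host before reshaping; an explicit shape like this is what NumPy's -1 expands to, and per the report above it is the form device arrays accept. Shown here with NumPy only:

```python
import numpy as np

sixty_four = np.zeros(64)

# NumPy infers the -1 dimension itself:
a = sixty_four.reshape(-1, 32)            # shape (2, 32)

# Compute the inferred dimension explicitly instead of using -1:
rows = sixty_four.size // 32
b = sixty_four.reshape(rows, 32)          # shape (2, 32)

print(a.shape, b.shape)
```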
double square(double)
you can easily call it if you define extern double square(double) asm ("cfunc._ZN10mymodule11square$2419Ed");
The case of a function like unicode_type myrepeat(unicode_type s, int count) seems to be a bit more complex. I am not quite sure what the signature of this function should look like, or how I should allocate my unicode_type argument (NRT or Py_BuildValue?). Maybe you guys can give me a hint?
import numpy as np
from numba import jitclass, typeof  # newer Numba: from numba.experimental import jitclass

counter_dtype = np.dtype([('element', np.int32, 5)])
one_counter = np.zeros(1, dtype=counter_dtype)[0]

spec = [
    ("counter", typeof(one_counter)),
]

@jitclass(spec)
class UpdatingStuff:
    def __init__(self, counter):
        self.counter = counter

UpdatingStuff(one_counter)
The NUMBA_DEBUG_CACHE output has nothing useful about why the cache is recreated. This amounted to a lot of files taking some GB of storage when this program was executed a couple thousand times.
dfq['c'] = dfq['b'] + '|' + dfq['a'] and c_array = dfq['c'].unique(). I saw that numba now supports str operations. But so far my attempts have failed. It would be useful to combine these 2 operations into a single function.
@njit(parallel=True)
def concat_pipe_str(prefix, suffix):
    for i in range(len(prefix)):
        prefix[i] = prefix[i] + '|' + suffix[i]
    return prefix
TypingError: Failed in nopython mode pipeline (step: nopython frontend)
non-precise type array(pyobject, 1d, C)
During: typing of argument at <ipython-input-2-577501f82f1f> (120)
File "<ipython-input-2-577501f82f1f>", line 120:
def concat_pipe_str(prefix, suffix):
for i in range(len(prefix)):
^
I am projecting a bunch of triangles from a 3D triangular mesh onto a 2D detector with many pixels. I currently handle all triangles in parallel with a CUDA kernel. Each triangle is only a little work to project because they are small, so their projection overlaps with only a few pixels.
However, sometimes my algorithm runs into large triangles, which cover almost the entire detector, causing one cuda thread to loop over all detector pixels sequentially. This is so slow it almost locks the machine.
To solve this I think it might help to turn this loop into a new kernel call, but in numba I cannot launch a kernel from inside a kernel. Is there some solution to this in numba? Or do I have to translate the entire thing to C++ and write a wrapper?
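Numba's CUDA target does not support dynamic parallelism (launching a kernel from a kernel), so a common workaround is a two-pass scheme driven from the host: the first kernel projects small triangles and only flags large ones; the host then launches a pixel-parallel kernel per flagged triangle. This is a structural sketch only; AREA_LIMIT, big_flags, and the kernel bodies are hypothetical placeholders, not working projection code:

```python
import numpy as np
from numba import cuda

AREA_LIMIT = 64  # hypothetical threshold, in covered pixels

@cuda.jit
def project_pass1(areas, big_flags):
    # one thread per triangle: small triangles would be projected
    # here; large ones are only flagged for the second pass
    t = cuda.grid(1)
    if t < areas.shape[0]:
        big_flags[t] = 1 if areas[t] > AREA_LIMIT else 0
        # ... project triangle t here if it is small ...

@cuda.jit
def project_big_triangle(tri_index, image):
    # one thread per detector pixel, for a single large triangle
    x, y = cuda.grid(2)
    if x < image.shape[0] and y < image.shape[1]:
        pass  # ... test pixel (x, y) against the triangle, accumulate ...

# host side: copy big_flags back, then launch project_big_triangle
# once per flagged triangle (or batch the large triangles together)
```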