c200chromebook
@c200chromebook
settled on
for i in range(5000):
    vals[cuda.threadIdx.x + offset] = math.sin(vals[cuda.threadIdx.x + offset])
c200chromebook
@c200chromebook
Shouldn't these kernels be overlapping?
import numpy as np
import numba as nb
from numba import cuda
import math


hundred_twenty_eight_floats = np.zeros(128)
hundred_twenty_eight_floats[:] = list(range(128))


@cuda.jit(nb.void(nb.float64[::1], nb.bool_))
def cycle(vals, update_early):
    offset = 64 if update_early else 0
    for i in range(5000):
        vals[cuda.grid(1) + offset] = math.sin(vals[cuda.grid(1) + offset])


stream = cuda.stream()
stream2 = cuda.stream()
for i in range(25):
    cycle[2, 32, stream](hundred_twenty_eight_floats, False)
    cycle[2, 32, stream2](hundred_twenty_eight_floats, True)
when I actually run it I get:
[screenshot: profiler timeline]
MegaIng
@MegaIng
Isn't it that they 'may' overlap?
And if your GPU doesn't have enough warps to execute them at the same time, it won't.
c200chromebook
@c200chromebook
nah that wasn't it
needed to move it to device, note the tiny d-to-h
import numpy as np
import numba as nb
from numba import cuda
import math

hundred_twenty_eight_floats_h = np.zeros(128)
hundred_twenty_eight_floats_h[:] = list(range(128))
hundred_twenty_eight_floats = cuda.to_device(hundred_twenty_eight_floats_h)


@cuda.jit(nb.void(nb.float64[::1], nb.bool_))
def cycle(vals, update_early):
    offset = 64 if update_early else 0
    for i in range(5000):
        vals[cuda.grid(1) + offset] = math.sin(vals[cuda.grid(1) + offset])


stream = cuda.stream()
stream2 = cuda.stream()
for i in range(25):
    cycle[2, 32, stream](hundred_twenty_eight_floats, False)
    cycle[2, 32, stream2](hundred_twenty_eight_floats, True)

hundred_twenty_eight_floats.to_host()

print(hundred_twenty_eight_floats_h.sum())
MegaIng
@MegaIng
Right.
c200chromebook
@c200chromebook
[screenshot: profiler timeline]
MegaIng
@MegaIng
Is there a way to tell numba cuda to not do zero checks for division? The behavior of returning infinity is exactly what I want at the moment. Instead numba inserts the checks, forcing me to add another branching check to actually return the correct value.
4 replies
c200chromebook
@c200chromebook
try fast math
MegaIng
@MegaIng
Doesn't work
c200chromebook
@c200chromebook
:/
MegaIng
@MegaIng
Still inserts the check. The checks are done by numba, and fastmath is, I think, a CUDA option.
No option to skip the check.
stuartarchibald
@stuartarchibald
set the error model to 'numpy'
that might help?
MegaIng
@MegaIng
Not implemented for cuda
stuartarchibald
@stuartarchibald
Perhaps open an issue requesting it?
MegaIng
@MegaIng
Yeah. Will do that
stuartarchibald
@stuartarchibald
Great, thanks.
majra20
@majra20
Hi folks, I have a problem and I am not sure how to solve it. I have a cfunc to which I want to pass a list of pointers (they can point at different types). I want to use carray to get a view of the data these pointers reference. Is there a way to cast an int64 to a pointer? I couldn't find anything useful.
23 replies
c200chromebook
@c200chromebook
Is there a memset exposed on the device?
20 replies
doing x[:] = 0 is slow
c200chromebook
@c200chromebook
Also is there a strided device_memset
Joan Saladich
@joan.saladich_gitlab

Hi, I am trying to install numba through virtualenv (pip install numba) on a JupyterHub (remote instance), but I get the following error:
error: Command "gcc -pthread -B /sw/spack-rhel6/jupyterhub/jupyterhub/compiler_compat -Wl,--sysroot=/ -Wsign-compare -DNDEBUG -g -fwrapv -O3 -Wall -Wstrict-prototypes -fPIC -Inumba -I/mnt/lustre01/pf/b/b381465/kernels/data/include -I/sw/spack-rhel6/jupyterhub/jupyterhub/include/python3.8 -c numba/_devicearray.cpp -o build/temp.linux-x86_64-3.8/numba/_devicearray.o -std=c++11" failed with exit status 1

llvmlite is already installed and working, but I can never get numba to install.

70 replies
dantexp
@dantexp:matrix.org
[m]
Hi :) I was wondering what happens when I call an njit function (A) with parallel=True and prange inside from a prange loop in another @njit(parallel=True) function (B). Will the prange in the first function (A) be sequential when called from within the second function (B)?
2 replies
dantexp
@dantexp:matrix.org
[m]
ok, thanks for clarification!
10 replies
c200chromebook
@c200chromebook
If I have a raw assignment to a device_array, eg, x[:] = 0, how do I assign it to a stream?
3 replies
luispedrogarcia
@luispedrogarcia
Hello, was wondering what this issue means:
LoweringError: Failed in nopython mode pipeline (step: nopython mode backend) Operands must be the same type, got (i64, i32)
I'm trying to run esda.moran.Moran_Local, which is a function that examines spatial autocorrelation across geographic space.
10 replies
c200chromebook
@c200chromebook
import numpy as np
import numba as nb
from numba import cuda
sixty_four = np.zeros(64)

blocks_of_thirty_two = sixty_four.reshape(-1,32)
nb_sixty_four = cuda.to_device(sixty_four)
nb_sixty_four.reshape(-1, 32)
Here's a small one - cuda device array behaves differently than numpy. Numpy will happily reshape using a -1 to fill in excess, numba-cuda will not.
3 replies
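A small host-side workaround sketch (names are illustrative): compute the dimension that -1 would infer, then pass an explicit shape, which device arrays do accept.

```python
import numpy as np

flat = np.zeros(64)
ncols = 32
assert flat.size % ncols == 0          # -1 is only valid if this divides evenly
nrows = flat.size // ncols             # the dimension reshape(-1, 32) would infer
reshaped = flat.reshape(nrows, ncols)  # an explicit shape also works on device arrays
print(reshaped.shape)  # (2, 32)
```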
manuels
@manuels
Hi, I am trying to call an AOT-compiled function from C (without any Python interpreter). With a simple function like double square(double) you can easily call it if you define extern double square(double) asm ("cfunc._ZN10mymodule11square$2419Ed");. The case of a function like unicode_type myrepeat(unicode_type s, int count) seems to be a bit more complex. I am not quite sure what the signature of this function should look like and how I should allocate my unicode_type argument (NRT or Py_BuildValue?). Maybe you guys can give me a hint?
4 replies
uchytilc
@uchytilc
@gmarkall @stuartarchibald This might be too little too late given the recent announcement (https://developer.nvidia.com/cuda-python#), but I finally published the ctypes CUDA bindings. I'm not sure if there is a better place to put this, but here's the link if anyone wants to take a look. It is still very much a work in progress, but most of the work is on the higher-level Python API.
https://github.com/uchytilc/PyCu
20 replies
JSKenyon
@JSKenyon
Hi! I am busy trying to compose a relatively complicated piece of numba code using generated_jit/overload. However, whilst moving stuff to generated jit I noticed that performance was getting steadily worse. I then coded up the following example which demonstrates a fairly large discrepancy between jitted/generated_jit/overload code. I am not quite sure if this qualifies as a bug and wanted to be sure that I am not doing something obviously wrong. I will place the code in a reply to this message.
11 replies
Angus Hollands
@agoose77:matrix.org
[m]
Hi all, when someone passes in a view of an array, e.g. the columns of a C_CONTIGUOUS 2D array, I assume numba doesn't make a copy?
3 replies
bairdlampard534
@bairdlampard534
How do I get the first key of numba typed dict efficiently? is list(some_dict.keys())[0] the best solution (inside njit code)? thank you!
2 replies
joshuallee
@joshuallee
I am having trouble initialising a jitclass with a singular instance of numpy dtype. The following is the code where I am getting the error.
import numpy as np
from numba import jitclass, typeof

counter_dtype = np.dtype([('element', np.int32, 5)])
one_counter = np.zeros(1, dtype=counter_dtype)[0]

spec = [
    ("counter", typeof(one_counter))
]

@jitclass(spec)
class UpdatingStuff: 
    def __init__(self, counter):
        self.counter = counter

UpdatingStuff(one_counter)
31 replies
Maximilian Nöthe
@maxnoe
Hi,
I have an issue with caching. On each run of the program, numba creates a new cache file, although nothing has changed. The NUMBA_DEBUG_CACHE output gives no useful information about why the cache is recreated. This amounted to a lot of files taking up some GB of storage after the program was executed a couple thousand times.
Is this the result of a known limitation? What could I do to investigate?
32 replies
David Wynter
@davidwynter_gitlab
I wanted to see if I could improve performance of some pandas string functions. Specifically dfq['c'] = dfq['b'] + '|' + dfq['a'] and c_array = dfq['c'].unique(). I saw that numba now supports str operations. But so far my attempts have failed. It would be useful to combine these 2 operations into a single function.
Here is my attempt at the first
@njit(parallel=True)
def concat_pipe_str(prefix, suffix):
    for i in range(len(prefix)):
        prefix[i] = prefix[i] + '|' + suffix[i]
    return prefix
I am familiar with using the signature as part of the decorator, but there is no numba type for str.
I get this error
TypingError: Failed in nopython mode pipeline (step: nopython frontend)
non-precise type array(pyobject, 1d, C)
During: typing of argument at <ipython-input-2-577501f82f1f> (120)

File "<ipython-input-2-577501f82f1f>", line 120:
def concat_pipe_str(prefix, suffix):
    for i in range(len(prefix)):
    ^
David Wynter
@davidwynter_gitlab
Can someone give me an idea of how I use the str functions?
57 replies
Antriksh Misri
@antrikshmisri
Can someone tell me if scripts backed by numba are supported on raspberry pi?
8 replies
David Wynter
@davidwynter_gitlab
Is there a way to test if a str already exists in a numba typed List? I tried if not 'str' in typed_list:
4 replies
RendersJens
@RendersJens

I am projecting a bunch of triangles from a 3D triangular mesh onto a 2D detector with many pixels. I currently handle all triangles in parallel with a cuda kernel. Each triangle is only a little work to project because they are small, so their projections overlap with only a few pixels.

However, sometimes my algorithm runs into large triangles, which cover almost the entire detector, causing one cuda thread to loop over all detector pixels sequentially. This is so slow it almost locks the machine.

To solve this I think it might help to change this loop to a new kernel call, but in numba I cannot call a kernel from a kernel. Is there some solution to this in numba? Or do I have to translate the entire thing to C++ and write a wrapper?

9 replies