Graham Markall
@gmarkall
it would be some work to make a library with a numba extension for a VCD object, but I think that would make an awesome tool for doing fast analysis of VCDs
this actually sounds super interesting but I have no free time to really contribute to such an idea
bosje
@bosje:matrix.org
[m]
hmm
i could possibly implement something like that, with some guidance on numba best practices
Graham Markall
@gmarkall
but if you have the time and are keen to do it, I'd be happy to try and support you with answering questions (and shouting about it to all my FPGA friends)
(to clarify, people I know / have worked with that do stuff with FPGAs... I'm not forming meaningful relationships with reconfigurable logic :-) )
probably the best way to start would be to work through the "Extending Numba" section of the docs - then you'll have some ability to estimate the effort needed and whether you want to bother going further
bosje
@bosje:matrix.org
[m]
Great, i'll take a look. Thanks!
By the way, do you know if numpy has a sparse array as i described?
Graham Markall
@gmarkall
I think scipy has sparse representations like CSR, etc
https://docs.scipy.org/doc/scipy/reference/sparse.html - I'd guess that these use underlying numpy arrays
bosje
@bosje:matrix.org
[m]
does numba support those?
Graham Markall
@gmarkall
Looks like it presently doesn't
i think you'd probably be quickest getting things working using your own sparse representation with a couple of numpy arrays, like one for indices and one for values
bosje
@bosje:matrix.org
[m]
Could you give a quick example/pseudocode of what that would look like? I imagine an object with two arrays, but how would the indices map to the data array?
esc
@esc
IIRC wikipedia has a good example
oh no, I am mistaken..
Graham Markall
@gmarkall
supposing you have one signal with values 2, 7, 5, and 6 that began at t0 = 0 then changed at t1 = 15, t2 = 27, t3 = 40, the index array would look like [0, 15, 27, 40] and the values array like [2, 7, 5, 6]
esc
@esc
I just re-read the conversation, I think what you are looking for may be: https://sparse.pydata.org/en/stable/
I know they use Numba under the hood, I am not sure if their datastructures can be used in Numba compiled functions. Would be good to find out though!
Graham Markall
@gmarkall
then to know what the value was at t = 20, do a bisection search on the index array to find the last index whose timestamp is <= 20 (index 1 in this case), then use that to index into the values array: values[1], which is 7
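A minimal sketch of that lookup, using the stdlib's bisect (the variable names here are illustrative, not from the chat):

```python
from bisect import bisect_right

# Sparse signal: `indices` holds the change times,
# `values` holds the value taken at each change.
indices = [0, 15, 27, 40]
values = [2, 7, 5, 6]

def value_at(t):
    # bisect_right returns the insertion point after any equal timestamps,
    # so the last change at or before t sits one position to the left.
    return values[bisect_right(indices, t) - 1]
```

For example, `value_at(20)` bisects to position 2 and returns `values[1]`, i.e. 7.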
ISTR sparse installs a Numba extension, so maybe it does
Christopher Mayes
@ChristopherMayes
Can custom dtypes be used with vectorize, guvectorize? Something like:
ztype = np.dtype([('vec', 'float64', 2), ('ix', int)])
@vectorize(['ztype(ztype)'], target='cuda')
def zcalc(zin):
    ...
    return zout
bosje
@bosje:matrix.org
[m]

gmarkall (Graham Markall):

Hey Graham, I implemented a VCD parser and a class to represent a signal as you suggested (See below)

Basically a FastVCD object holds a dict of Signal objects. For my use case, I have about 100k VCDs, each with about 30-100 signals. I want to iterate through all VCDs and collect information from all signals. To do this with numba, should I decorate these two classes with @jitclass? The Signal class is pretty simple, it just holds two integer arrays.

pseudocode of what I want to do :

vcds = [List of FastVCD objs]
for i, v in enumerate(vcds):
    for t in range(0, 1000):  # iterate through time
        sum = 0
        # Sum up values of all signals at time t
        for s in v.signals:
            sum += s[t]
        arr[i, t] = sum
return arr

My current code:

from bisect import bisect_right
class Signal:
    def __init__(self) -> None:
        self.indices = []
        self.data = []

    def add(self, i, d):
        self.indices.append(i)
        self.data.append(d)

    def __getitem__(self, i):
        x = bisect_right(self.indices, i)
        return self.data[x-1]
class FastVCD:
    def __init__(self, vcd):
        self.signals = {} # Dict of signal objects
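A jitclass-friendly variant of the Signal class above would need typed numpy arrays rather than Python lists. Here is a pure-numpy sketch of that shape (class and method names are hypothetical; a @jitclass spec could declare the two fields as int64[:]):

```python
import numpy as np

# Numpy-backed version of the Signal idea: two typed arrays instead of
# Python lists, which is the layout numba's @jitclass would require.
class NumpySignal:
    def __init__(self, indices, data):
        self.indices = np.asarray(indices, dtype=np.int64)
        self.data = np.asarray(data, dtype=np.int64)

    def value_at(self, t):
        # np.searchsorted(..., side='right') matches bisect_right:
        # the last change at or before t is one position to the left.
        x = np.searchsorted(self.indices, t, side='right')
        return self.data[x - 1]
```

Building the arrays once after parsing (rather than appending in a loop) keeps the hot lookup path free of Python-list overhead.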
bosje
@bosje:matrix.org
[m]
The slowest part of the current algorithm is iterating through time. One solution could be to just iterate through timestamps where signals are changing
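That idea, visiting only the timestamps where something actually changes, could be sketched like this (names are hypothetical; signals are assumed to be (indices, values) pairs of numpy arrays):

```python
import numpy as np

def sum_at_changes(signals):
    # signals: list of (indices, values) pairs of numpy int arrays.
    # Visit only the union of all change times across the signals,
    # instead of stepping t through every tick.
    times = np.unique(np.concatenate([idx for idx, _ in signals]))
    sums = np.zeros(len(times), dtype=np.int64)
    for idx, vals in signals:
        # Value of this signal at each event time: its last change <= t
        pos = np.searchsorted(idx, times, side='right') - 1
        sums += vals[pos]
    return times, sums
```

Between events nothing changes, so the per-tick sums can be reconstructed from this compressed result if needed.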
arundhati87
@arundhati87
Getting the error: TypeError: 'DeviceFunctionTemplate' object is not callable
import os
import cv2
import time
import numpy as np
from numba import jit,cuda
from google.colab import drive

drive.mount('/content/gdrive')

#@cuda.jit('f8[:,:](u1[:,:])', parallel=True, cache=True, device=True)
@cuda.jit(device=True)
def normalize_mat(depth_src):
    depth_min = depth_src.min()
    depth_max = depth_src.max()
    depth = (depth_src - depth_min) / (depth_max - depth_min)

    return depth

def generate_stereo(depth_dir, depth_prefix, out_dir, f):
    filename = f.split(".")[0]
    print("=== Start processing:", filename, "===")
    depth_src = cv2.imread(os.path.join(depth_dir, depth_prefix + filename + ".jpg"), cv2.IMREAD_GRAYSCALE)

    depth = normalize_mat(depth_src)
    depth = depth * 255

    cv2.imwrite(os.path.join(out_dir, "depth_" + filename + ".jpg"), depth)

def file_processing_im(depth_dir, depth_prefix, out_dir):
    for f in os.listdir(depth_dir):
        filename = f.split(".")[0]
        generate_stereo(depth_dir, depth_prefix, out_dir, filename)

def main():
    start_time = time.time()

    depth_dir = 'gdrive/MyDrive/depth/'
    depth_prefix = 'Depth_'
    out_dir = 'gdrive/MyDrive/output/'

    if not os.path.exists(out_dir):
        os.mkdir(out_dir)

    file_processing_im(depth_dir, depth_prefix, out_dir)

    print(time.time() - start_time, "seconds for base generation")

if __name__ == "__main__":
    main()
bosje
@bosje:matrix.org
[m]
this provides enough speed up for my use case such that i do not need numba really
sabbraxcaddabra
@sabbraxcaddabra
Hello everybody, I need some help. I have njit-ed code that I want to use in an app I plan to freeze with PyInstaller. I also tried the numba AOT compiler and it works well. The question is: can I use the AOT-compiled code in the project without the numba package? It would be good not to oblige the project to depend on numba if I use AOT instead of JIT compilation. Thank you in advance!
Christopher Mayes
@ChristopherMayes
Are there any plans to support numpy math functions? For example, numpy.sinc doesn’t have an equivalent in math
This one is trivial to implement (https://github.com/numpy/numpy/blob/b235f9e701e14ed6f6f6dcba885f7986a833743f/numpy/lib/function_base.py#L3476), but it would be nice not to have to.
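For reference, a scalar version of numpy's normalized sinc really is a one-liner plus a zero guard (a plain-Python sketch, nothing numba-specific):

```python
import math

def sinc(x):
    # numpy's sinc is the normalized one: sin(pi*x) / (pi*x),
    # with the removable singularity sinc(0) defined as 1.
    if x == 0.0:
        return 1.0
    y = math.pi * x
    return math.sin(y) / y
```

For example, `sinc(0.5)` is `sin(pi/2) / (pi/2)`, i.e. `2/pi`.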
Graham Markall
@gmarkall
(possibly / probably) stupid question about the CPU target - it seems if you pass a pandas series to a CPU-jitted function it doesn't know how to type it, and I need to call to_numpy() on it first... Is that expected?
rupeshknn
@rupeshknn

Hello, the issue is about indexing numpy array in a jited function.

@jit(nopython=True)
def foo(index):
    x = np.array([1,2,3])
    y = x[index]
    return y
print(foo(4))

This returns some random number rather than raising an error.
This only happens with numpy arrays. With tuples or numba typed lists, an IndexError: list index out of range is raised

nelson2005
@nelson2005
This is as designed. You can turn on bounds checking with the NUMBA_BOUNDSCHECK=1 environment variable
nelson2005
@nelson2005
I've been receiving this error in discourse recently... anyone have insight? -->"Sorry, you can't include links in your posts"
brandonwillard
@brandonwillard:matrix.org
[m]
yeah, I've run into that link issue on Discourse a couple times already, and it just results in me duplicating things from elsewhere in my posts (e.g. contents of Gists)
JD
@rudiejd
can you use xoroshiro128p_uniform_float64 on a CUDA device function?
whenever i try to call it i get some kind of type error
Siu Kwan Lam
@sklam
@rudiejd, yes, you should be able to use it in a device function. Can you share the exception traceback?
JD
@rudiejd

here's a code example
import numba
from numba import cuda
from numba.cuda.random import create_xoroshiro128p_states
from numba.cuda.random import xoroshiro128p_uniform_float64

@cuda.jit('void(float32[:,:])', device=True)
def device(rng_states):
    thread_id = cuda.grid(1)
    probability = xoroshiro128p_uniform_float64(rng_states, thread_id)

@cuda.jit()
def kernel(rng_states):
    device(rng_states)

BPG = 10
TPB = 10
rng_states = create_xoroshiro128p_states(BPG * TPB, seed=42069)
kernel[TPB, BPG](rng_states)

Error message:
numba.core.errors.TypingError: Failed in cuda mode pipeline (step: nopython frontend)
Internal error at <numba.core.typeinfer.CallConstraint object at 0x2ad3ee333970>.
module, class, method, function, traceback, frame, or code object was expected, got CPUDispatcher
During: resolving callee type: Function(<numba.cuda.compiler.DeviceDispatcher object at 0x2ad3ee333820>)
During: typing of call at /home/rudiejd/cse620c_finalproject/test.py (21)

Enable logging at debug level for details.

File "test.py", line 21:
def device(rng_states):
<source elided>
thread_id = cuda.grid(1)
probability = xoroshiro128p_uniform_float64(rng_states, thread_id)
^

@sklam should i call it differently, or is there something i have to import in the device function?
Siu Kwan Lam
@sklam
The problem is the type signature in @cuda.jit('void(float32[:,:])', device=True). rng_states is not a 2d float32 array. I'd suggest leaving the type signature out
e.g.
@cuda.jit(device=True)
def device(rng_states):
    thread_id = cuda.grid(1)
    probability = xoroshiro128p_uniform_float64(rng_states, thread_id)
JD
@rudiejd
@sklam ah okay, that fixed it in my minimal example! thank you. in general, can i leave the type signature out of cuda jit-ed methods and let it infer?
Siu Kwan Lam
@sklam
I usually just let it infer.
I only provide signatures when I want to prevent it compiling new versions for different types.
JD
@rudiejd
one more question: what does it mean when numba cannot unpack the arguments for an exception? my kernel is running with NUMBA_ENABLE_CUDASIM=1, but when i run it in actual cuda i get this error:
Traceback (most recent call last):
  File "/home/rudiejd/cse620c_finalproject/milestone3.py", line 734, in <module>
    cuda_main(agent_count, graph_type, comm_algo)
  File "/home/rudiejd/cse620c_finalproject/milestone3.py", line 498, in cuda_main
    hm.run_parallel(BPG, TPB)
  File "/home/rudiejd/cse620c_finalproject/milestone3.py", line 280, in run_parallel
    kernel[BPG, TPB](
  File "/home/rudiejd/.conda/envs/env/lib/python3.9/site-packages/numba/cuda/compiler.py", line 868, in __call__
    return self.dispatcher.call(args, self.griddim, self.blockdim,
  File "/home/rudiejd/.conda/envs/env/lib/python3.9/site-packages/numba/cuda/compiler.py", line 1003, in call
    kernel.launch(args, griddim, blockdim, stream, sharedmem)
  File "/home/rudiejd/.conda/envs/env/lib/python3.9/site-packages/numba/cuda/compiler.py", line 752, in launch
    exccls, exc_args, loc = self.call_helper.get_exception(code)
ValueError: not enough values to unpack (expected 3, got 2)