from numba import types, njit
from numba.extending import intrinsic
from llvmlite import ir

@intrinsic
def readcyclecounter(typingctx):
    sig = types.int64()

    def codegen(context, builder, signature, args):
        # Declare the LLVM intrinsic: i64 llvm.readcyclecounter()
        fnty = ir.FunctionType(ir.IntType(64), [])
        fn = builder.module.declare_intrinsic('llvm.readcyclecounter',
                                              fnty=fnty)
        return builder.call(fn, [])

    return sig, codegen

@njit
def foo():
    start = readcyclecounter()
    # Do something here
    end = readcyclecounter()
    elapsed = end - start
    return elapsed

print(foo())
In : from time import time

In : from numba import njit, objmode

In : @njit
...: def time_now():
...:     with objmode(t='float64'):
...:         t = time()
...:     return t
...:

In : time_now()
Out: 1566205053.2636802

In : time_now()
Out: 1566205054.8763537

In : time_now()
Out: 1566205056.2839804

In : @njit
...: def work():
...:     ts = time_now()
...:     acc = 0
...:     for i in range(1000000):
...:         acc += i
...:         acc /= 7.
...:     te = time_now()
...:     print("Elapsed = ", te - ts)
...:     return acc
...:

In : work()
Elapsed =  0.012462377548217773
Out: 166666.47222222222
To do this with @intrinsic I think you'd need to ABI-compatibly define the C structs defined in
time.h, recreate the glibc/kernel-level defined impls of e.g.
clk_id, make sure you're on a system that supports these things, and then
declare and call the function via builder.module.get_or_insert_function(type, "c_func_name"). The easier
routes are to call time.time through objmode (as above) and to add C code to Numba's
_helperlib.c as needed to hide the difficulties noted with ABIs.
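For illustration, here's a minimal sketch of that external-call route, using libc's clock() since it takes no arguments and so dodges the time.h struct/ABI problems. It assumes an LP64 system where clock_t is a 64-bit integer, and that the JIT can resolve the clock symbol from the running process; treat it as a sketch, not a supported API:

from numba import types, njit
from numba.extending import intrinsic
from llvmlite import ir

@intrinsic
def c_clock(typingctx):
    # Sketch only: assumes clock_t is i64 (true on LP64 glibc systems).
    sig = types.int64()

    def codegen(context, builder, signature, args):
        fnty = ir.FunctionType(ir.IntType(64), [])
        # Hand-rolled get_or_insert_function: reuse the declaration if
        # codegen already added it to this module.
        fn = builder.module.globals.get('clock')
        if fn is None:
            fn = ir.Function(builder.module, fnty, name='clock')
        return builder.call(fn, [])

    return sig, codegen

@njit
def measure():
    start = c_clock()
    acc = 0.0
    for i in range(1000000):
        acc += i
    end = c_clock()
    return end - start, acc

print(measure())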
So, curious if anyone has experience with the Numba CUDA implementation? Not sure why not all of the results of a matrix operation end up in the output...
import numpy as np
from numba import cuda

@cuda.jit
def matadd(A, B, C):
    i, j = cuda.grid(2)
    C[i][j] = A[i][j] + B[i][j]

n = 4
a = np.random.uniform(low=-100, high=100, size=(n, n)).astype(np.float32)
b = np.random.uniform(low=-100, high=100, size=(n, n)).astype(np.float32)
result = np.zeros((n, n), dtype=np.float32)
matadd(a, b, result)
print(result)
The result only shows a value in the first row/column; the rest are still 0s:
[[-13.121542   0.          0.          0.        ]
 [  0.          0.          0.          0.        ]
 [  0.          0.          0.          0.        ]
 [  0.          0.          0.          0.        ]]
You need to launch the kernel with a launch configuration: function_name[blocks_per_grid, threads_per_block](args). Without a launch configuration it'll default to, I think, a 1x1 grid, i.e. a single thread, which is not something you want!
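To make that concrete, here's the example above with an explicit launch configuration added. The (16, 16) threads_per_block is an arbitrary choice for illustration; the bounds check guards the threads that fall outside the 4x4 array once the grid is larger than the data:

import numpy as np
from numba import cuda

@cuda.jit
def matadd(A, B, C):
    i, j = cuda.grid(2)
    # Guard: with a 16x16 block over a 4x4 array, most threads are out of range.
    if i < C.shape[0] and j < C.shape[1]:
        C[i, j] = A[i, j] + B[i, j]

n = 4
a = np.random.uniform(low=-100, high=100, size=(n, n)).astype(np.float32)
b = np.random.uniform(low=-100, high=100, size=(n, n)).astype(np.float32)
result = np.zeros((n, n), dtype=np.float32)

threads_per_block = (16, 16)
blocks_per_grid = ((n + threads_per_block[0] - 1) // threads_per_block[0],
                   (n + threads_per_block[1] - 1) // threads_per_block[1])
matadd[blocks_per_grid, threads_per_block](a, b, result)
print(result)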