import cupy
import numpy as np


def test_astype_boolean_view():
    dtype = np.int8
    a = cupy.array([0, 1, 2, 3, 4], dtype=dtype).view(dtype=cupy.bool_)
    print(f"a = {a}")
    a = a.astype(dtype)
    print(f"a = {a}")


if __name__ == "__main__":
    test_astype_boolean_view()
#include <cupy/atomics.cuh>
#include <cupy/carray.cuh>
#include <cupy/complex.cuh>

typedef bool in0_type;
typedef bool out0_type;

extern "C" __global__ void
cupy_copy__bool_int8(const CArray<bool, 1, 1, 1> _raw_in0,
                     CArray<signed char, 1, 1, 1> _raw_out0, CIndexer<1> _ind) {
  ;
  CUPY_FOR(i, _ind.size()) {
    _ind.set(i);
    const in0_type in0(_raw_in0[_ind.get()]);
    out0_type out0;
    out0 = in0;
    _raw_out0[_ind.get()] = (out0) ? 1 : 0;
    ;
  };
}
CuPy eventually calls NVIDIA's driver API function cuLaunchKernel (see cupy/cuda/function.pyx) to launch the kernel 'cupy_copy__bool_int8'.
Source code for reference:
cpdef launchKernel(
        intptr_t f, unsigned int grid_dim_x, unsigned int grid_dim_y,
        unsigned int grid_dim_z, unsigned int block_dim_x,
        unsigned int block_dim_y, unsigned int block_dim_z,
        unsigned int shared_mem_bytes, intptr_t stream, intptr_t kernel_params,
        intptr_t extra):
    with nogil:
        status = cuLaunchKernel(
            <Function>f, grid_dim_x, grid_dim_y, grid_dim_z,
            block_dim_x, block_dim_y, block_dim_z,
            shared_mem_bytes, <Stream>stream,
            <void**>kernel_params, <void**>extra)
    check_status(status)
As I understand it, the parameter kernel_params is a pointer to pointers (a void**). In my case its length is three: the first and second pointers each point to a _CArray struct instance defined in cupy/_core/_carray.pxd, and the third points to a _CIndexer struct instance defined in the same file.
My question is: how is this pointer-to-pointer kernel_params passed to cupy_copy__bool_int8 as kernel function parameters? The generated .cu file I posted above requires const CArray<bool, 1, 1, 1> for the first argument, but it seems we are passing a pointer to a different struct type. I must have missed something, please help. I couldn't find any good source code walkthroughs for CuPy online. Much appreciated.
For the cupy_copy__bool_int8 kernel, both CArray and CIndexer have only a __device__ constructor and no __host__ constructor. Does that mean I have to instantiate them inside a kernel? So from my host main function I invoked a __global__ function that constructs the CArray and CIndexer instances and then calls the other __global__ function, cupy_copy__bool_int8. That would probably mean using dynamic parallelism, which requires adding -rdc=true and -lcudadevrt, right? I didn't manage to make it work yet.
When I use nvprof to check, only one kernel function is called: cupy_copy__bool_int8. So I think one doesn't need to launch another __global__ function to call into this one, but I don't know how to instantiate the CArray and CIndexer on the device and then call cupy_copy__bool_int8.
cuLaunchKernel would copy the arguments pointed to by the pointers in kernel_params (please check the CUDA driver API docs). [...] (cupy/_core/_carray.{pxd,pyx}, for example), and then on the device, when the kernel takes the copied inputs, they are viewed as device CArray and CIndexer (which is effectively a reinterpret_cast, as you rightly guessed). [...] cudaMemcpy or any CUDA kernel
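To make the "array of pointers whose targets are copied by value" idea concrete, here is a minimal host-only sketch using ctypes. It does not launch anything and is not CuPy's code: the class names HostCArray1D and HostCIndexer1D, the helper pack_kernel_params, the field order (data pointer, size, shape, strides), and the device addresses 0x1000 and 0x2000 are all assumptions made for illustration.

```python
import ctypes

Index1 = ctypes.c_ssize_t * 1  # one-element index array for the 1-D case


class HostCArray1D(ctypes.Structure):
    # Assumed layout for a 1-D array: data pointer, size, shape[1], strides[1].
    _fields_ = [
        ("data", ctypes.c_void_p),
        ("size", ctypes.c_ssize_t),
        ("shape", Index1),
        ("strides", Index1),
    ]


class HostCIndexer1D(ctypes.Structure):
    # Assumed layout for a 1-D indexer: size, shape[1], index[1].
    _fields_ = [
        ("size", ctypes.c_ssize_t),
        ("shape", Index1),
        ("index", Index1),
    ]


def pack_kernel_params(*args):
    """Return a void*[] whose i-th slot holds the host address of args[i].

    The caller must keep the argument objects alive while the array is in use,
    because only their addresses are stored here.
    """
    params = (ctypes.c_void_p * len(args))()
    for i, arg in enumerate(args):
        params[i] = ctypes.addressof(arg)
    return params


# Placeholder device addresses; a real launch would use allocator pointers.
src = HostCArray1D(0x1000, 5, Index1(5), Index1(1))
dst = HostCArray1D(0x2000, 5, Index1(5), Index1(1))
ind = HostCIndexer1D(5, Index1(5), Index1(0))
kernel_params = pack_kernel_params(src, dst, ind)

# Conceptually what the launch does with slot 0: follow the pointer and copy
# sizeof(struct) bytes by value into the kernel's first parameter.
raw = ctypes.string_at(kernel_params[0], ctypes.sizeof(HostCArray1D))
copied = HostCArray1D.from_buffer_copy(raw)
```

The point of the sketch is that the kernel never sees the host pointers themselves, only the bytes they point to, which is why a host struct with a matching layout can stand in for a type that has only a __device__ constructor.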
import numpy

_typechars = '?bBhHiIlLefdFD'

def _cuda_can_cast(from_dtype, to_dtype):
    from_dtype = numpy.dtype(from_dtype)
    to_dtype = numpy.dtype(to_dtype)
    return _typechars.find(from_dtype.char) <= _typechars.find(to_dtype.char)
cpdef _custom_can_cast(d1, d2):
    _typechars = '?bhilqBHILQefdFD'
    d1 = numpy.dtype(d1).char
    d2 = numpy.dtype(d2).char
    if d1 == '?':
        return True
    if d2 == '?':
        return False
    if d1 in 'bhilq':
        # d1 is signed integral type
        if d2 in 'efdFD':
            return True
        if d2 in 'BHILQ':
            return False
        return _typechars.find(d1) <= _typechars.find(d2)
    if d1 in 'BHILQ':
        # d1 is unsigned integral type
        if d2 in 'efdFD':
            return True
        if d2 in 'BHILQ':
            return _typechars.find(d1) <= _typechars.find(d2)
        if d1 == 'L' and d2 == 'q':
            # Note: This is a special case, because L is uint32 or uint64 depending on the platform.
            # can cast: uint32 -> int64, cannot cast: uint64 -> int64
            if numpy.dtype(d1) == numpy.dtype('Q'):
                return False
            else:
                return True
        return _typechars.find(d1.lower()) < _typechars.find(d2)
    if d1 in 'efdFD':
        if d1 == 'd' and d2 == 'F':
            return False
        return _typechars.find(d1) <= _typechars.find(d2)
Hi everyone! Is there a way to include a C library in an ElementwiseKernel? I have to calculate the tangent inside a kernel and I want to use math.h.
Here is a sample of what I want to do. It is simplified, so it doesn't make much sense on its own, but I would like to be able to do this:
collisions = cp.ElementwiseKernel(
    'float64 xtg, float64 ztg',
    'float64 new_ztg',
    '''
    #include <math.h>
    new_ztg = tan(xtg/ztg)
    ''',
    'collisions')
Is it possible to accomplish this? Thank you very much!!
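One possible direction, sketched under assumptions rather than taken from the thread: CUDA's built-in math functions such as tan are already visible in device code, so the #include can simply be dropped from the kernel body. If extra device-side declarations are needed, ElementwiseKernel accepts a preamble argument whose text is placed at file scope. The helper ratio and the wrapper build_collisions_kernel below are hypothetical names used only for illustration.

```python
# Hypothetical device helper, emitted at file scope through `preamble`.
PREAMBLE = """
__device__ double ratio(double a, double b) {
    return a / b;
}
"""

# No #include here: tan() is a built-in CUDA math function in device code.
OPERATION = "new_ztg = tan(ratio(xtg, ztg))"


def build_collisions_kernel():
    """Build the kernel lazily; this needs a working CuPy install and a GPU."""
    import cupy as cp

    return cp.ElementwiseKernel(
        "float64 xtg, float64 ztg",   # input parameters
        "float64 new_ztg",            # output parameter
        OPERATION,                    # per-element body
        "collisions",                 # kernel name
        preamble=PREAMBLE,
    )
```

With a GPU available, calling build_collisions_kernel()(xtg, ztg) on two float64 CuPy arrays should then evaluate tan(xtg / ztg) elementwise.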
cupyx.distributed.
Failed to import CuPy.
If you installed CuPy via wheels (cupy-cudaXXX or cupy-rocm-X-X), make sure that the package matches with the version of CUDA or ROCm installed.
On Linux, you may need to set LD_LIBRARY_PATH environment variable depending on how you installed CUDA/ROCm.
On Windows, try setting CUDA_PATH environment variable.
Check the Installation Guide for details:
https://docs.cupy.dev/en/latest/install.html
Original error:
Hello!
I am attempting to write an ElementwiseKernel in CuPy. The function I am trying to turn into a kernel is this:
def compute_res(u, z):
    return cp.linalg.norm((cp.eye(z.shape[0]) - cp.dot(u, cp.transpose(u))) @ z)
The kernel is to be applied to pairs taken from two arrays, where U has shape (5, 225, 20) and Z has shape (5000, 225). So I want each element u of U, of shape (225, 20), and each element z of Z, of shape (225,), to be passed to compute_res.
What I have tried is:
compute_res = cp.ElementwiseKernel(
    "raw T u, T z", "T res", "res = norm(eye(z.shape[0]) - u[i].T @ u[i]) @ z"
)
But this does not seem to work. Can anyone point me in the right direction? Thank you.
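An ElementwiseKernel body is plain CUDA C operating on scalars, so names like norm, eye, and @ are not available there. One alternative, offered as a sketch rather than the intended solution, is to skip the custom kernel and batch the computation with array operations, using the identity that (I - u uᵀ) z equals z - u (uᵀ z). The function name compute_res_batched and the xp parameter (the array module, cupy on GPU or numpy on CPU) are assumptions, as is the reading that every u should be paired with every z, giving a (5, 5000) result.

```python
def compute_res_batched(xp, U, Z):
    """Residual norms for every (u, z) pair.

    xp: array module (cupy or numpy).
    U:  shape (K, D, R), e.g. (5, 225, 20).
    Z:  shape (N, D),    e.g. (5000, 225).
    Returns an array of shape (K, N).
    """
    # coeffs[k, n, :] = U[k].T @ Z[n]
    coeffs = xp.einsum("kdr,nd->knr", U, Z)
    # recon[k, n, :] = U[k] @ coeffs[k, n]
    recon = xp.einsum("kdr,knr->knd", U, coeffs)
    # ||z - u (u^T z)|| along the feature axis, without forming the D x D matrix
    return xp.linalg.norm(Z[None, :, :] - recon, axis=-1)
```

Called as compute_res_batched(cp, U, Z), this avoids building the 225 x 225 matrix for each pair and keeps all of the work in a few library calls.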
Hi !
I am trying to import the functions and definitions from https://github.com/cupy/cupy/blob/master/cupy_backends/cuda/libs/nccl.pyx
into a .pyx file using cimport, but it is not working.
I would like to know whether this is even possible and, if so, whether someone can give me some guidance on how to do it.
Thank you for the help!
PS: I noticed that in the cupy folder (after installing it with pip) the file nccl.pyx does not exist, only the .so for the library, hence the question of whether it is possible.