Priya Nagda
@pri1311
Hello! I had a small doubt: will CuPy be moving forward with the previous year's idea list for GSoC '22, or will a new list be added later?
Kenichi Maehashi
@kmaehashi
I've made a dedicated room for GSoC 2022. Join and connect with mentors if you are interested! https://gitter.im/cupy/gsoc22
Priya Nagda
@pri1311
Thank you @crazymaster!
Bruce
@BruceDai003
Hello, I have a question regarding the source files cupy/_core/include/cupy/carray.cuh and cupy/_core/_carray.pxd in CuPy. As far as I can see, both define a CArray class and a CIndexer class. Why is it done this way, and what is the relationship between the two?
The reason I ask is that I was trying to learn how typical CuPy functions work. For example, I tracked down the following simple Python example:
import cupy
import numpy as np

def test_astype_boolean_view():
    dtype = np.int8
    a = cupy.array([0, 1, 2, 3, 4], dtype=dtype).view(dtype=cupy.bool_)
    print(f"a = {a}")
    a = a.astype(dtype)
    print(f"a = {a}")        


if __name__ == "__main__":
    test_astype_boolean_view()
I mean, I tracked down how astype works. I see that it uses the NVRTC API to compile the kernel function into a 'cubin' file. I also dumped the generated 'cu' file here for reference:
#include <cupy/atomics.cuh>
#include <cupy/carray.cuh>
#include <cupy/complex.cuh>

typedef bool in0_type;
typedef bool out0_type;

extern "C" __global__ void
cupy_copy__bool_int8(const CArray<bool, 1, 1, 1> _raw_in0,
                     CArray<signed char, 1, 1, 1> _raw_out0, CIndexer<1> _ind) {
  ;
  CUPY_FOR(i, _ind.size()) {
    _ind.set(i);
    const in0_type in0(_raw_in0[_ind.get()]);
    out0_type out0;
    out0 = in0;
    _raw_out0[_ind.get()] = (out0) ? 1 : 0;
    ;
  };
}
Bruce
@BruceDai003

CuPy will eventually call NVIDIA's driver API cuLaunchKernel (as in cupy/cuda/function.pyx) to launch the kernel 'cupy_copy__bool_int8'.
Source code for reference:

cpdef launchKernel(
        intptr_t f, unsigned int grid_dim_x, unsigned int grid_dim_y,
        unsigned int grid_dim_z, unsigned int block_dim_x,
        unsigned int block_dim_y, unsigned int block_dim_z,
        unsigned int shared_mem_bytes, intptr_t stream, intptr_t kernel_params,
        intptr_t extra):
    with nogil:
        status = cuLaunchKernel(
            <Function>f, grid_dim_x, grid_dim_y, grid_dim_z,
            block_dim_x, block_dim_y, block_dim_z,
            shared_mem_bytes, <Stream>stream,
            <void**>kernel_params, <void**>extra)
    check_status(status)

As I understand it, the parameter kernel_params is a pointer to pointers. In my case its length will be three: the first (and also the second) element points to a _CArray struct instance defined in the cupy/_core/_carray.pxd file, and the third points to a _CIndexer struct instance defined in the same file.
My question is: how can this pointer-to-pointer kernel_params be passed to cupy_copy__bool_int8 as the kernel function parameters? From the generated 'cu' file I posted above, the kernel requires a const CArray<bool, 1, 1, 1> as its first argument, but it seems we are passing a pointer to a different struct type? I must have missed something; please help. I couldn't find any good source-code walkthroughs of CuPy online. Much appreciated.

Bruce
@BruceDai003
Also, by the way, I tried to write some simple CUDA code to mimic the functionality of my posted Python code. But I saw that in the cupy_copy__bool_int8 kernel, both CArray and CIndexer have only __device__ constructors, no __host__ constructor. Does that mean I have to instantiate them in a kernel? So in my host main function I invoked a __global__ function that constructs the CArray and CIndexer instances, and from that __global__ function I call the other __global__ function cupy_copy__bool_int8. That would mean using dynamic parallelism, which in turn means adding -rdc=true and -lcudadevrt, right? I haven't managed to make it work yet.
Also, I don't think this is the way CuPy does it, because when I checked with nvprof, only one kernel function is called: cupy_copy__bool_int8. So I think one doesn't need to launch another __global__ function to call into this one, but I don't know how to instantiate those CArray and CIndexer objects on the device and then call cupy_copy__bool_int8.
Bruce
@BruceDai003
Is there some sort of reinterpret_cast happening during this function call?
Leo Fang
@leofang
Hi @BruceDai003
  1. cuLaunchKernel copies the arguments pointed to by the pointers in kernel_params; please check the CUDA driver API docs
  2. Since the kernel parameters (on the host) will be copied by the CUDA driver, the trick in CuPy is to fill host structs with the required contents (those from cupy/_core/_carray.{pxd,pyx}, for example); on the device, when the kernel receives the copied inputs, they are viewed as the device-side CArray and CIndexer (which is effectively a reinterpret_cast, as you rightly guessed)
1 reply
The reason you don't see any data copy for the kernel launch in the profiler is that it's a special pipeline implemented in the driver, rather than a call to any (public) host API like cudaMemcpy or any CUDA kernel
Hope this unblocks your contribution to CuPy! 🙂
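The byte-layout trick described above can be sketched with nothing but ctypes, no GPU needed. This is my own illustration, not CuPy's actual code; HostCArray and DeviceViewCArray are invented stand-ins. One struct is filled on the "host" side, its raw bytes are copied the way the driver copies kernel parameters, and they are read back through a second type with the identical layout.

```python
import ctypes

class HostCArray(ctypes.Structure):
    """Stand-in for the host-side _CArray struct in _carray.pxd."""
    _fields_ = [('data', ctypes.c_void_p),
                ('size', ctypes.c_ssize_t),
                ('shape0', ctypes.c_ssize_t)]

class DeviceViewCArray(ctypes.Structure):
    """Stand-in for the device-side CArray template in carray.cuh."""
    _fields_ = [('data', ctypes.c_void_p),
                ('size', ctypes.c_ssize_t),
                ('shape0', ctypes.c_ssize_t)]

# The host fills the struct; the CUDA driver memcpy's these bytes into the
# kernel's parameter space, where the kernel reads them through the
# device-side type.
host = HostCArray(data=0x1234, size=5, shape0=5)
view = DeviceViewCArray()
ctypes.memmove(ctypes.byref(view), ctypes.byref(host), ctypes.sizeof(host))
assert (view.data, view.size, view.shape0) == (0x1234, 5, 5)
```

Because both types describe the same byte layout, nothing needs to be converted at the launch boundary; that is the "effectively reinterpret_cast" in the answer above.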
Bruce
@BruceDai003
https://docs.cupy.dev/en/stable/user_guide/performance.html?highlight=float32#prefer-float32-over-float64 Is there any further development on this? I see that in CuPy, when we do simple ops, for example an elementwise add of two different dtypes, the dtype promotion rule is adopted from the numpy.can_cast API. So, for example, if dtype1 = int32 and dtype2 = float32, they resolve to float64 as the common dtype. Although fp64 would indeed be more accurate than fp32 in this case, is there any option to disable fp64? I checked in PyTorch (https://github.com/pytorch/pytorch/blob/master/c10/core/ScalarType.h#L408); it defines a promotion table there. I guess fp32 is OK then?
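The promotion behavior in question can be checked directly with NumPy (a quick illustration of mine):

```python
import numpy as np

# NumPy's promotion rule, which CuPy follows: mixing int32 with float32
# promotes to float64, because float32 cannot represent every int32 exactly.
common = np.promote_types(np.int32, np.float32)
assert common == np.float64
```

PyTorch's promotion table, by contrast, resolves this pair to float32, trading exactness for speed; that difference is what is being asked about here.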
Bruce
@BruceDai003
Another question: I just checked that in cupyx/jit/_cuda_typerules.py there is a _cuda_can_cast function. According to this function, int8 can be cast to uint8? That doesn't seem right to me for negative numbers.
_typechars = '?bBhHiIlLefdFD'


def _cuda_can_cast(from_dtype, to_dtype):
    from_dtype = numpy.dtype(from_dtype)
    to_dtype = numpy.dtype(to_dtype)
    return _typechars.find(from_dtype.char) <= _typechars.find(to_dtype.char)
Bruce
@BruceDai003
I tried to implement a type-cast function like this, which looks more like PyTorch's conversion rules. Not sure if this is better:
cpdef _custom_can_cast(d1, d2):
    _typechars = '?bhilqBHILQefdFD'
    d1 = numpy.dtype(d1).char
    d2 = numpy.dtype(d2).char
    if d1 == '?':
        return True
    if d2 == '?':
        return False
    if d1 in 'bhilq':
        # d1 is signed integral type
        if d2 in 'efdFD':
            return True
        if d2 in 'BHILQ':
            return False
        return _typechars.find(d1) <= _typechars.find(d2)
    if d1 in 'BHILQ':
        # d1 is an unsigned integral type
        if d2 in 'efdFD':
            return True
        if d2 in 'BHILQ':
            return _typechars.find(d1) <= _typechars.find(d2)
        if d1 == 'L' and d2 == 'q':
            # Note: special case, because 'L' is uint32 or uint64 depending on the platform.
            # can cast: uint32 -> int64; cannot cast: uint64 -> int64
            if numpy.dtype(d1) == numpy.dtype('Q'):
                return False
            else:
                return True
        return _typechars.find(d1.lower()) < _typechars.find(d2)
    if d1 in 'efdFD':
        if d1 == 'd' and d2 == 'F':
            return False
        return _typechars.find(d1) <= _typechars.find(d2)
emcastillo
@emcastillo
🎉Released CuPy v10.2.0 & v11.0.0a2!
CuPy v11.0.0a2 includes improved NumPy/SciPy function coverage, support for CUDA 11.6 & ROCm 5.0, and several bug fixes and improvements!
Full release notes are available on GitHub:
https://github.com/cupy/cupy/releases/tag/v10.2.0
https://github.com/cupy/cupy/releases/tag/v11.0.0a2
JianFei Zhao
@Amos-Zhao
Hello, everyone. I have a question about for loops with CuPy. How can I show the progress of a for loop while using CuPy? Simply use print()?
1 reply
manthan_verma
@manthan_verma:matrix.org
[m]
Hello everyone. When is CuPy releasing an FFT wrapper for cuFFTMp?
2 replies
Kenichi Maehashi
@kmaehashi
📣 Released CuPy v10.3.0 & v11.0.0b1!
CuPy v11.0.0b1 comes with more SciPy functions and binary packages for CUDA 11.6 & ROCm 5.0!
Full release notes are available on GitHub:
https://github.com/cupy/cupy/releases/tag/v10.3.0
https://github.com/cupy/cupy/releases/tag/v11.0.0b1
Martín Moreno
@martinmorenoc

Hi everyone! Is there a way to include a C library in an ElementwiseKernel? I have to calculate the tangent inside a kernel and I want to use math.h.
Here is a sample of what I want to do; it doesn't make much sense because it is simplified, but I would like to be able to do this:

collisions = cp.ElementwiseKernel(
    'float64 xtg, float64 ztg',
    'float64 new_ztg',
    '''
    #include <math.h>
    new_ztg = tan(xtg/ztg)
    ''',
    'collisions')

Is it possible to accomplish this? Thank you very much!!

1 reply
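For what it's worth, one possible approach (my sketch, not from this thread's replies, and untested here since it needs a CUDA-capable GPU): CUDA device code already provides tan() as a built-in math function, so no #include is needed inside the kernel body at all.

```python
import cupy as cp

# tan() is a built-in CUDA device function, so the kernel body can call it
# directly without '#include <math.h>' (sketch; requires a CUDA GPU).
collisions = cp.ElementwiseKernel(
    'float64 xtg, float64 ztg',
    'float64 new_ztg',
    'new_ztg = tan(xtg / ztg);',
    'collisions')

# Usage: new_ztg = collisions(cp.array([0.1, 0.2]), cp.array([1.0, 2.0]))
```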
Kenichi Maehashi
@kmaehashi
This message was deleted
3 replies
Kenichi Maehashi
@kmaehashi
We have released v10.3.1 as a hot-fix for a regression that prevented CuPy from running on Maxwell or earlier CUDA GPUs.
https://github.com/cupy/cupy/releases/tag/v10.3.1
samrere
@samrere
Hi everyone, is there a way in CuPy to implement functions such as tanh, sigmoid, erf, etc. using a lookup table? Thanks!
Vetinari
@Vetinar1
Hey, some quick questions: is there ever a case where a CuPy array or CuPy result gets implicitly transferred to the host, or does everything always live on the device unless otherwise specified?
Also, are there any publications or papers related to CuPy that I can cite when writing about it in a thesis? I didn't find any in the documentation.
4 replies
suwen-ux
@suwen-ux
企业微信截图_16503377412163(1).png
I use CuPy to accelerate my NumPy calculation, but it takes a lot of time when I use cupy.asarray
企业微信截图_16503378031980(1).png
企业微信截图_16503378426539(1).png
features is an 800000*800000 float32 numpy.ndarray
Can someone help me? Thanks
suwen-ux
@suwen-ux
@kmaehashi
Akifumi Imanishi
@asi1024
@suwen-ux Is the size of features really 800000*800000? An 800000*800000 float32 ndarray requires 2560 GB of GPU memory and will cause an OOM error.
suwen-ux
@suwen-ux
Oh! That's my mistake; features is 800000*256, sorry
@asi1024
企业微信截图_16503636368073.png
this is another question
suwen-ux
@suwen-ux
@asi1024 This is my first step. I think maybe the kernel compilation causes it, but I can't confirm.
suwen-ux
@suwen-ux
Or can I do the kernel compilation earlier? Thanks
Priya Nagda
@pri1311
I could be wrong, but this error can also be thrown if you are running out of memory. Someone please confirm.
Priya Nagda
@pri1311
Or even running the code on a different architecture could solve the problem in some cases.
Martín Moreno
@martinmorenoc
Hi! Is there a way to accomplish the np.subtract.outer method with CuPy? Thanks!
4 replies
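One common workaround (my sketch, rather than this thread's replies): the outer subtraction can be written with broadcasting, which works identically for CuPy arrays.

```python
import numpy as np  # the same broadcasting expression works with cupy arrays

# Outer subtraction via broadcasting: outer[i, j] = a[i] - b[j],
# equivalent to np.subtract.outer(a, b).
a = np.array([1.0, 2.0, 3.0])
b = np.array([10.0, 20.0])
outer = a[:, None] - b[None, :]
assert np.array_equal(outer, np.subtract.outer(a, b))
```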
Akifumi Imanishi
@asi1024
📣 Released CuPy v10.4.0 & v11.0.0b2!
CuPy v11.0.0b2 includes more JIT improvements and the sparse matrix support of cupyx.distributed.
Full release notes are available on GitHub:
https://github.com/cupy/cupy/releases/tag/v10.4.0
https://github.com/cupy/cupy/releases/tag/v11.0.0b2
nav-id
@nav-id
Hi! I noticed that the 11.0.0b2 release notes mention "JIT: Support .shape". Does this mean it should work with the fuse decorator too?
1 reply
andoorve
@andoorve
I'm wondering if anyone else has run into this issue when upgrading CUDA versions.

ImportError:

Failed to import CuPy.

If you installed CuPy via wheels (cupy-cudaXXX or cupy-rocm-X-X), make sure that the package matches with the version of CUDA or ROCm installed.

On Linux, you may need to set LD_LIBRARY_PATH environment variable depending on how you installed CUDA/ROCm.
On Windows, try setting CUDA_PATH environment variable.

Check the Installation Guide for details:
https://docs.cupy.dev/en/latest/install.html

Original error:

ImportError: /home/murali/cupy/cupy/_core/core.cpython-310-x86_64-linux-gnu.so: undefined symbol: _ZSt28__throw_bad_array_new_lengthv

I've just upgraded CUDA to 11.6 and see this when building from source. I wanted to make sure it wasn't something dumb on my end before filing an issue.
2 replies
On second thought, I've also upgraded to Ubuntu 22.04; it might have something to do with gcc as well.
Amir Arfan
@amirarfan

Hello!
I am attempting to write an ElementwiseKernel in CuPy. The function I am trying to turn into a kernel is this:

def compute_res(u, z):
    return cp.linalg.norm((cp.eye(z.shape[0]) - cp.dot(u, cp.transpose(u))) @ z)

The kernel is to be applied to pairs of matrices, where U has shape (5, 225, 20) and Z has shape (5000, 225). So I want each element of U, u of shape (225, 20), and each element of Z, z of shape (225,), to be fed to compute_res.

What I have tried is

compute_res = cp.ElementwiseKernel(
    "raw T u, T z", "T res", "res = norm(eye(z.shape[0]) - u[i].T @ u[i]) @ z"
)

But this does not seem to work. Can anyone point me in the right direction? Thank you.

4 replies
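As a point of reference (my sketch, not from the thread's replies): the residual above can be computed for all pairs at once with plain batched array operations instead of an ElementwiseKernel; the identical code runs under CuPy by swapping np for cp. The shapes are scaled down here for illustration.

```python
import numpy as np

# The projector form (I - u u^T) z equals z - u @ (u.T @ z), which avoids
# materializing the 225x225 identity and vectorizes over both batch axes.
rng = np.random.default_rng(0)
U = rng.standard_normal((2, 6, 3))   # small stand-in for shape (5, 225, 20)
Z = rng.standard_normal((4, 6))      # small stand-in for shape (5000, 225)

# proj[b, n] = U[b] @ (U[b].T @ Z[n]) for every (b, n) pair at once
proj = np.einsum('bik,bjk,nj->bni', U, U, Z)
res = np.linalg.norm(Z[None, :, :] - proj, axis=-1)   # shape (2, 4)

# Spot-check one pair against the direct formula from compute_res
direct = np.linalg.norm((np.eye(6) - U[1] @ U[1].T) @ Z[2])
assert np.allclose(res[1, 2], direct)
```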
sophia-estrela
@sophia-estrela

Hi !
I am trying to import the functions and definitions in https://github.com/cupy/cupy/blob/master/cupy_backends/cuda/libs/nccl.pyx
into a .pyx file with a cimport, but it is not working.
I would like to know if it is even possible, and whether someone can give me some guidance on how to do it?
Thank you for the help!

PS: I noticed that in the cupy folder (after installing it with pip) the nccl.pyx file does not even exist, just the .so for the library; hence the question of whether it is possible.

3 replies