
Hi, I'm looking to work on ILU (cupy/cupy#2749) - I've written iterative solver kernels in CUDA before, including ILU.

Before I start, I have some questions:

- How does cupy's architecture support functions running on single and multiple GPUs? How does cupy determine the number of threads and the number of blocks for each function?
- Sometimes NVIDIA library functions (like cuBLAS, cuSPARSE) are available, but sometimes kernels need to be written by hand. In what kinds of situations do the cupy library developers write custom kernels to support new functions?

That said, as long as the necessary cuBLAS and cuSPARSE functions are in place, the ILU kernel can be written using those high-level APIs, so my questions are mostly about the general case. Thank you.

- Most of CuPy's functions have a highly NumPy-compatible interface, so they do not expose a way to specify the block size or the grid size.
- The custom kernel feature is documented here: https://docs-cupy.chainer.org/en/stable/tutorial/kernel.html. It lets you specify the block size.

Right now almost all cupy functions launch kernels with the same number of threads per block; the number of blocks is determined by the size of the arrays.

We are working on some kind of auto-tuning for these parameters.

The even lower-level class `cupy.cuda.Function` has an interface to specify threads/blocks, but it's mostly used internally.
You can switch the current device by using `cupy.cuda.Device`:

```
with cupy.cuda.Device(1):
    a = cupy.exp(cupy.ones((2, 3), 'f'))
    print(a.device)  # Prints <CUDA Device 1>
```

Hi, I found a function with a type of syntax I've never seen before https://github.com/cupy/cupy/blob/9f9ef7d15d632619f650471e9407722ce6a73438/cupy/_sorting/count.py#L27

Hi, I've written a draft for CuPy sparse mean #2674: https://gist.github.com/wongalvis14/e4c1af431c20cb18d5457a372a32c3ec

It's not yet optimized relative to the numpy and cupy implementations, which I have yet to study.

Please give me some feedback on how to bring it up to production quality, thanks.


That's one thing I noticed in scipy's implementation. I'm wondering if there's code within cupy I can refer to for cupy-oriented optimizations.

Also, I don't know if it's correct to handle the axis=0 case by returning a flattened array; I just did what was simplest and most intuitive to match the output formats of scipy and cupy's dense mean.
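For reference, here is what scipy itself does for the output format, shown on a small example (using `scipy.sparse` directly, since `cupyx.scipy.sparse` is intended to mirror its interface):

```python
import numpy as np
from scipy import sparse

m = sparse.csr_matrix(np.array([[1., 0., 2.],
                                [0., 3., 0.]]))

# scipy returns dense 2-D results for axis-wise means,
# not flattened 1-D arrays:
col_means = m.mean(axis=0)  # shape (1, 3)
row_means = m.mean(axis=1)  # shape (2, 1)

print(col_means)  # [[0.5 1.5 1. ]]
print(row_means)  # [[1.] [1.]]
```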

It seems that dividing first and then summing would require elementwise division, which isn't implemented. I also see that even simpler elementwise operations such as addition aren't implemented? https://github.com/cupy/cupy/blob/ebe2cae7b2a6efb2b601d7da1a6471eeb0434a6e/cupyx/scipy/sparse/compressed.py#L426

Adding or dividing by a scalar shouldn't have many tricky cases; is there already a sparse elementwise kernel in cupy?
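One caveat worth noting: scalar division only touches the explicitly stored values, so it can be sketched as an operation on the `.data` array alone, while scalar addition would also change the implicit zeros. A small illustration with `scipy.sparse` (whose interface `cupyx.scipy.sparse` mirrors):

```python
import numpy as np
from scipy import sparse

m = sparse.csr_matrix(np.array([[2., 0., 4.],
                                [0., 6., 0.]]))

# Dividing by a scalar only affects the stored nonzeros, so it can be
# done directly on the .data array without densifying the matrix:
scaled = m.copy()
scaled.data /= 2.0
print(scaled.toarray())  # [[1. 0. 2.] [0. 3. 0.]]

# Scalar *addition* is the tricky case: it would also change the
# implicit zeros, so it cannot be expressed on .data alone.
```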

I have improved it to reduce the risk of precision error: https://gist.github.com/wongalvis14/91400ced33778fd24e315ed85f5c6655

Oh actually, I should put it under cupyx/scipy/sparse/data.py alongside other elementwise functions like power, right?

I saw that one of the projects is to support RNG: https://github.com/cupy/cupy/wiki/GSoC-2020-Project-Ideas

cuRAND is already imported into cupy and cupy.random already exists, right? What is left to be done for that project?


Hey everybody! I am Rushabh Vasani, a second-year I.T. undergrad from India. I am highly interested in working with cupy this summer under GSoC, specifically on "CuPy coverage of NumPy functions". I have contributed to cupy previously, and I am really looking forward to contributing more. Right now I am listing the numpy functions that are not yet in cupy for my reference, and trying to raise some more PRs in the project. Can you guide me further on getting started with this project and the community? Thanks in advance!

`cupy.util.memoize`

, but now both ufuncs/element-wise kernels (I've never used it before): https://github.com/ericmjl/autograd-cupy

I am curious about its performance and extensibility compared to Chainer's (if there is one). Also, it might be worth adding it to the `cupyx` namespace given that the Autograd project is inactive now?
**[Emilio, chainer]** autograd and cupy are orthogonal to each other.

Mostly you can think of Chainer as an autograd library that already includes components for neural networks.

As we decided to relegate Chainer to maintenance mode, we won't be adding or maintaining any kind of autograd support for cupy.