sevagh
@sevagh
if you comment out the set_allocator line, you get the OOM error (array too big for gpu): https://pastebin.com/LXp1qrf1
if you uncomment the set_allocator line, you get an illegal access error when trying to run the ifft: https://pastebin.com/qbY4CkSv
sevagh
@sevagh
interesting. forward fft2 is ok, ifft2 isn't
also taking a cupy.asnumpy() on the output of the fft gives the same illegal access
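Since the pastebin links above are dead, here is a hedged reconstruction of the setup being described, with an illustrative shape: the set_allocator line switches CuPy to CUDA managed (unified) memory, which allows allocating arrays larger than device memory.

import cupy

# back all CuPy allocations with CUDA managed (unified) memory;
# commenting this out makes the allocation below fail with OOM instead
cupy.cuda.set_allocator(cupy.cuda.MemoryPool(cupy.cuda.malloc_managed).malloc)

big = cupy.zeros((32768, 32768), dtype=cupy.complex64)  # ~8 GB, illustrative
out = cupy.fft.fft2(big)    # the forward transform reportedly succeeds
back = cupy.fft.ifft2(out)  # the inverse transform hits the illegal access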
sevagh
@sevagh
i suppose i can try to split my ndarray into sizes that can fit in my gpu memory manually and do several fft calls
Leo Fang
@leofang
Thanks, @sevagh. The pastebin.com links do not work, but I can reproduce it locally. May I kindly ask you to open an issue in CuPy and post this problem along with your reproducer, so we can keep track of it and investigate later?
sevagh
@sevagh
will do - thanks for helping
Leo Fang
@leofang

also taking a cupy.asnumpy() on the output of the fft gives the same illegal access

Yes, I guess it's the same situation as in ifft2: there's memory corruption somewhere. It's just that with ifft2 it happens earlier for some reason

i suppose i can try to split my ndarray into sizes that can fit in my gpu memory manually and do several fft calls

This would work, yes. Another hack off the top of my head is to create a cupy.cuda.cufft.PlanNd object yourself, and then call PlanNd.fft() with your managed-memory-backed array. Warning: this is an internal API; it's not stable and we don't provide documentation, so you might have to play around with it to figure out how it works. The simplest way is to run an ifft2 call with a small array and print the plan cache: cupy.fft.config.get_plan_cache(). It will tell you the plan you just created with all the arguments you need, and then you'll have a clue for how to create a larger plan.

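A hedged sketch of the plan-cache trick just described, with an illustrative small array (get_plan_cache() and PlanNd are internal, unstable APIs, so the output format may differ across CuPy versions):

import cupy

a = cupy.zeros((256, 256), dtype=cupy.complex64)
cupy.fft.ifft2(a)  # creates and caches a PlanNd behind the scenes

# printing the cache shows the cached plan and the arguments it was created with
print(cupy.fft.config.get_plan_cache())
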
sevagh
@sevagh
ok. on the other hand, would waiting for this bug to be solved (not to put pressure on the maintainer team) be a solution?
in other words, do you expect the example above to have worked fine? is it still the correct way to apply a cufft on an ndarray too big to fit in gpu memory?
not sure if my title makes sense
Leo Fang
@leofang
Thanks @sevagh. Someone (including myself) will look into it. I would naively expect it to work, yes. But I don't think the test suite stresses making device memory and managed memory interchangeable, so I wouldn't be surprised if there are corner cases we didn't foresee. Also, given the history of cuFFT, it could be that cuFFT is messing around with us, in which case all we can do is report it to NVIDIA and wait for a fix to land...
sevagh
@sevagh
your suggestion to play with PlanNd.fft directly is working better:
import cupy
import cupyx.scipy.fftpack
from cupy.cuda import cufft

gpu_array = cupy.array(big_array)  # big_array is a complex host array
plan = cupyx.scipy.fftpack.get_fft_plan(gpu_array)

# perform giant 2D FFT in-place (cuFFT defines CUFFT_FORWARD = -1)
plan.fft(gpu_array, gpu_array, cufft.CUFFT_FORWARD)

# perform giant 2D IFFT in-place (CUFFT_INVERSE = 1)
plan.fft(gpu_array, gpu_array, cufft.CUFFT_INVERSE)

back_to_host = cupy.asnumpy(gpu_array)
unfortunately, i get the illegal access still in the final line, when trying to copy it back to a host numpy ndarray
Leo Fang
@leofang
That's a good call to use get_fft_plan()! I wrote it myself and I can't believe I didn't suggest you use it... Sorry
This is a good finding
It suggests cuFFT is to blame
I'll look into it later
sevagh
@sevagh
i mentioned the above in the github issue for posterity
Leo Fang
@leofang
👍
Leo Fang
@leofang

:tada: Just Released CuPy v9.0.0!
v9.0.0 includes JIT APIs, cuSPARSELt, improved ROCm support and more!
Read the blog for highlights: https://medium.com/cupy-team/cupy-v9-is-here-27e9cbfbf7e5
Or release notes for the full changes: https://github.com/cupy/cupy/releases/tag/v9.0.0

Dear @/all, in a few hours CuPy v9.0.0 will appear on Conda-Forge as well! Note that starting with CuPy v9, all satellite packages (cuDNN, cuTENSOR, etc.) become optional. If you need to use them, you should do an explicit conda install. See the installation guide for more information.

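For reference, a hedged example of pulling in the now-optional packages explicitly (package names as they appear on conda-forge; see the installation guide for the authoritative list):

conda install -c conda-forge cupy cudnn cutensor nccl
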
Leo Fang
@leofang
btw we are solving an issue on the (optional) cuGraph support for the Conda-Forge package. Once it's resolved I will update here.
ragna19
@draganal28
Is support for cupyx.sparse.lil_matrix planned to happen any time in the near future?
Leo Fang
@leofang
Not to my knowledge, but contribution is always welcome!

btw we are solving an issue on the (optional) cuGraph support for the Conda-Forge package. Once it's resolved I will update here.

Dear @/all, cuGraph support is now enabled. For Conda-Forge users, this means you can just reinstall CuPy v9.0.0 with

conda install -c conda-forge cupy libcugraph

and the new cupyx.scipy.sparse.csgraph.connected_components() API will work now!

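For reference, a hedged usage sketch of the new API, which mirrors scipy.sparse.csgraph (requires the cuGraph-enabled install above; the tiny graph is illustrative):

import cupy
from cupyx.scipy import sparse
from cupyx.scipy.sparse import csgraph

# a 4-node graph with two components: {0, 1} and {2, 3}
adj = sparse.csr_matrix(cupy.array([[0, 1, 0, 0],
                                    [1, 0, 0, 0],
                                    [0, 0, 0, 1],
                                    [0, 0, 1, 0]], dtype=cupy.float32))
n_components, labels = csgraph.connected_components(adj)
print(n_components)  # 2
print(labels)        # e.g. [0 0 1 1]
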
ragna19
@draganal28
@leofang : thanks! One more question: conda cannot find cupy for aarch64, and from the posts on GitHub it seems this has not been provided yet (the issue is still open). Do I gather that right?
Leo Fang
@leofang

That's right, unfortunately the infrastructure for aarch64/ppc64le is not ready yet.

If you are interested in helping, please take a look at https://conda-forge.org/docs/maintainer/knowledge_base.html?#adding-support-for-a-new-cuda-version, which also applies to supporting a new architecture. Then, in the referred feedstocks you'll find incomplete work that you might be able to pick up. I'd say we're closer to getting ppc64le done than aarch64. For this support, though, it's better to continue the discussion in https://gitter.im/conda-forge/conda-forge.github.io instead of here 🙂

ragna19
@draganal28
@leofang Thanks!
drperpen
@drperpen
Hi, I am developing a python tool to reconstruct 3D point clouds from 2D images. I am using cupy for all the heavy processing (mostly matrix multiplications and pixel matching). My previous reconstruction tool was made in Houdini, using its OpenCL node that took care of all the memory management for me. With cupy, however, I am still looking for a way to load, say, 100 images from different cameras and then process them. The images are 8K, 16-bit PNGs (around 250 MB each), so what I do now is load them using imageio, then crop them (ROI) by around half, then load them to the GPU. But even then it seems too much. What is the best approach when dealing with such large batches of images?
Masayuki Takagi
@takagi
Hi @drperpen! I haven't used it, but NVIDIA provides DALI for loading and processing images and other data. Although it is aimed at deep learning, you may be able to use it for preparing your data before you process it with CuPy. https://docs.nvidia.com/deeplearning/dali/user-guide/docs/index.html
drperpen
@drperpen
@takagi thank you for the info, I will take a look at it. What I did so far is change the saving method during the previous step (demosaicing and undistorting the images): instead of saving them as 100 PNGs per frame, I save a single huge numpy array [100, 6004, 7920, 3]. Then during reconstruction it is faster to load a single file, not 100. Afterwards I can select what goes to GPU memory and what doesn't.
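A hedged sketch of that workflow (the filename, shape, and indices are illustrative, and the memory-mapped load is an assumption, not something stated above): keeping the stack in one .npy file and loading it with mmap_mode means only the selected cameras and ROI are actually read from disk and copied to the GPU.

import numpy as np
import cupy

# memory-map the saved stack; nothing is read into RAM yet
frames = np.load('frame_0001.npy', mmap_mode='r')    # shape (100, 6004, 7920, 3)

roi = frames[10:20, 1000:4000, 2000:6000, :]         # select cameras and crop ROI
gpu_batch = cupy.asarray(np.ascontiguousarray(roi))  # copy only this slice to GPU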
Benjamin Zaitlen
@quasiben
@drperpen you might also be interested in cuCIM: https://docs.rapids.ai/api/cucim/nightly/
drperpen
@drperpen
@quasiben Thank you!
Thomas Aarholt
@thomasaarholt
Does anyone have any experience with cupy-backed dask arrays?
I'm trying to work out why the following example of summing a larger-than-memory array is A) very slow and B) crashes if I increase the size of the array.
Thomas Aarholt
@thomasaarholt
I ended up creating an issue on the dask github here: dask/dask#7687
If anyone has any thoughts, please take a look!
(In short, I'm summing a da.ones array on the GPU, and it takes a long time, much longer than just using the CPU)
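For context, a minimal sketch of one common way to build a CuPy-backed dask array (the map_blocks idiom; the sizes are illustrative, and the actual reproducer is in the linked issue):

import cupy
import dask.array as da

# a chunked array of ones; converting each chunk with cupy.asarray makes
# the subsequent reduction run on the GPU, chunk by chunk
x = da.ones((100_000, 10_000), chunks=(10_000, 10_000)).map_blocks(cupy.asarray)
total = x.sum().compute()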
Kenichi Maehashi
@kmaehashi
:tada: Released CuPy v9.1.0 & v10.0.0a1!
This release includes support for cuSPARSELt v0.1.0, cuDNN v8.2.0, NCCL v2.9.8, atomicAdd support in JIT, and more!
Refer to the release notes for the full changes: https://github.com/cupy/cupy/releases/tag/v9.1.0 https://github.com/cupy/cupy/releases/tag/v10.0.0a1
Leo Fang
@leofang

:tada: Released CuPy v9.1.0 & v10.0.0a1!

:tada: Both releases are now available on Conda-Forge as well!

Thomas Aarholt
@thomasaarholt
:tada: to both the release and conda-availability!
Just a note: "Note: many of these PRs are already backported to v9.0.0 and available since the release." Surely that's "backported to the v9 series" and not "v9.0.0"?
Kenichi Maehashi
@kmaehashi
Good suggestion :smile: Updated.
amcwhort
@amcwhort
Hi All, I'm having some trouble installing CuPy on my work PC. I have CUDA 11.2, Python 3.9, and a good GPU. However, when I try to install the package using PyCharm's built-in installer, I'm getting an error: "TypeError: LoadLibrary() argument 1 must be str, not None".
emcastillo
@emcastillo
can you try pip install cupy-cuda112?
amcwhort
@amcwhort
[attached screenshot: image.png]
when I put that into the prompt, it says that I already have it. however, I still get an error when trying to import it in PyCharm
It also does not show up in the list of installed packages