Thomas Aarholt
@thomasaarholt
Just out of curiosity (and because I'm probably getting one from work) - has there been any discussion on using CuPy with the new Apple M1 processor?
Leo Fang
@leofang
CuPy only supports NVIDIA and AMD GPUs. I actually looked into supporting Apple's GPU via its Metal language. My conclusion was that it is not suitable for scientific computing (e.g. it lacks double-precision floating-point numbers and complex numbers) and it'd be very difficult for CuPy to work with.
Thomas Aarholt
@thomasaarholt
I expected something along those lines! I'm surprised about the lack of double precision! Thanks for the answer!
Ben Cutilli
@benvcutilli
Hey, can anyone tell me what "workspace" means in the context of cuDNN? The CUDA documentation doesn't really discuss this; it assumes you already know what it is.
Kenichi Maehashi
@kmaehashi
Workspace is device memory used by the cuDNN library to compute and store intermediate values.
Ben Cutilli
@benvcutilli
Thanks!
Ben Cutilli
@benvcutilli
@leofang @thomasaarholt Unless I'm missing something here, Metal at least has dgemm and the like: https://developer.apple.com/documentation/accelerate/blas
also has the z* functions
It's gotta be a full BLAS implementation
Leo Fang
@leofang
Apple's Accelerate is for CPU only afaik. NumPy does use (or more precisely, re-enable) it.
nav-id
@nav-id
Hi, I'm trying to make the most of @fuse, but I can't find much documentation. In particular I'm struggling to find a way to use the size of the array returned by one operation as a parameter for another, e.g. cupy.random.random(size=(x.shape[0], 1)). I would appreciate any advice. Thanks in advance.
2 replies
Leo Fang
@leofang
For CuPy to work we need the ability to write custom GPU kernels, but double precision and complex numbers aren't part of the primitive types of Metal
@nav-id didn't x.size work for you?
nav-id
@nav-id
@leofang I'm afraid not, I get the following error: AttributeError: '_ArrayProxy' object has no attribute 'size'. I tried casting it to different types but no luck
Leo Fang
@leofang
_ArrayProxy is not part of CuPy. It sounds like x is coming from another library?
Oh sorry
I take it back
you're asking about @fuse 😅
my apologies, maybe someone from the team could give a better suggestion, as I am less familiar with @fuse's functionality
nav-id
@nav-id
no problem @leofang thank you for your quick reply!
Ben Cutilli
@benvcutilli
@leofang Ah, you're right. I did some searching though, and I found this "gemm" (bad pun): https://developer.apple.com/documentation/metalperformanceshaders/mpsmatrixmultiplication
look at alpha and beta
Not sure if the GPU is faster than the M1's dedicated hardware for linear algebra, but maybe a benchmark might tell
Ben Cutilli
@benvcutilli
Wait
I'm getting my Greek letters and matrices mixed up
Ben Cutilli
@benvcutilli
Well, I can't find additional positive or negative proof, but the fact that alpha and beta are Doubles is suspicious
Ben Cutilli
@benvcutilli
It appears that you're right about double precision. However, @thomasaarholt, it appears that the GPU is much slower than the dedicated hardware on the M1.
both facts
Leo Fang
@leofang

If you are installing CuPy and cuTENSOR from conda-forge, we noticed a binary incompatibility issue. For the time being, please limit the cuTENSOR version as follows: conda install -c conda-forge cupy cutensor=1.2 ... We are working on a proper fix and will announce here once it is done.

@/all This is now fixed. conda install -c conda-forge cupy cutensor ... will work just fine (with cuTENSOR 1.2). The support of cuTENSOR 1.3 in conda-forge is ongoing.

Leo Fang
@leofang
@benvcutilli Yeah it looks odd. I think the ref you found is for "Metal Performance Shaders", which is kinda a collection of high-level constructs IIUC (consider it as the GPU version of a subset of Accelerate, or a mixed subset of cuDNN/cuSOLVER/etc), whereas the primitive type support that I was referring to is part of "Metal", the programming model (low-level, if you like), see the language guide https://developer.apple.com/metal/Metal-Shading-Language-Specification.pdf
so I'd rather compare "Metal" with "CUDA"
Leo Fang
@leofang
@benvcutilli Aha, the supported matrix/vector dtypes are listed here: https://developer.apple.com/documentation/metalperformanceshaders/mpsdatatype
so no double precision in Apple GPU
emcastillo
@emcastillo
@nav-id sorry for the delay, @fuse actually only works with elementwise and reduction functions; use of APIs such as random or linalg is not supported. Only regular arithmetic/trigonometric functions such as cupy.add, cupy.sin, and cupy.sum are supported.
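A minimal sketch of the distinction, hedged with a NumPy fallback so it also runs without a GPU (on a CUDA machine `xp` is cupy and the real `@cupy.fuse()` decorator is used; the no-op `fuse` stand-in below is purely for the CPU fallback):

```python
try:
    import cupy as xp                  # GPU path: real kernel fusion
    fuse = xp.fuse
except ImportError:
    import numpy as xp                 # CPU fallback so the sketch runs anywhere
    fuse = lambda: (lambda f: f)       # no-op stand-in for @cupy.fuse()

@fuse()
def squared_error_sum(x, y):
    # Only elementwise ufuncs (add, sin, square, ...) and reductions
    # (sum, prod, ...) may appear here; calls like cupy.random.random or
    # cupy.linalg.* inside the body are not supported by @fuse.
    return xp.sum(xp.square(x - y))

x = xp.arange(4.0)
y = xp.zeros(4)
print(float(squared_error_sum(x, y)))  # 14.0
```

This is why the `cupy.random.random(size=(x.shape[0], 1))` call from the earlier question cannot live inside a fused function: it is neither an elementwise op nor a reduction.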
Ben Cutilli
@benvcutilli
Right, that's the thing I was looking at, but it didn't say "This is the list of types that are supported in matrix operations". And then float32 was in a different section with an ambiguous "Float" type that says it represents all floats or something like that, which doesn't make any sense.
3 replies
I think it boils down to the common complaint that Apple's documentation is bad
Ben Cutilli
@benvcutilli
I should also point out that I'm using a Trainer instance to do the training, which uses the Iterator
Kenichi Maehashi
@kmaehashi
:mega: Released CuPy v9.3.0 & v10.0.0b1!
This release includes support for CUDA 11.4 and Compute Capability 8.6 (RTX 30X0 and AX000 series), cupyx.scipy.sparse.linalg.* enhancements, and more!
Refer to the release notes for the full changes:
Thomas Aarholt
@thomasaarholt
:rocket:
Masayuki Takagi
@takagi
:tada:
qy.fofr
@FofrQy_twitter
hi guys, I’ve just started using cupy. what I’m trying to do is use cupy to replace numpy in my code. the problem is when I try to use cupy.dot(), it always reports an error saying “implicit conversion to a host numpy array is not allowed”. However, I’ve checked the documentation for cupy.dot() and made sure I give cupy.dot() two cupy.ndarray arguments as it requires. is there any possible explanation for this issue? The original numpy code runs normally. Thank you anyway🥺
qy.fofr
@FofrQy_twitter
from the error info, may I infer that .dot() here can only accept numpy objects? but it does take cupy.ndarray as input in other conditions I tried… Sad…
Kenichi Maehashi
@kmaehashi
Hi, could you share the code that reproduces the error?
qy.fofr
@FofrQy_twitter
I suppose it won't be necessary now LOL. I converted the input arrays via cupy just before invoking .dot(), and it works! now my code runs 10 times faster, thanks again~
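For readers hitting the same error: it usually means a device/host mix at some call boundary, and the usual fix has this shape. A minimal sketch (with a NumPy fallback and a hypothetical `asnumpy` shim so it also runs without a GPU; real CuPy provides `cp.asnumpy`):

```python
import numpy as np

try:
    import cupy as cp                 # GPU path
except ImportError:
    cp = np                           # CPU fallback: numpy mirrors the API
    cp.asnumpy = lambda a: a          # hypothetical shim; real cupy has asnumpy

a_host = np.arange(6.0).reshape(2, 3)
b_host = np.arange(6.0).reshape(3, 2)

# Move the inputs to the device explicitly; CuPy refuses to convert
# host arrays implicitly, which is where the error message comes from.
a = cp.asarray(a_host)
b = cp.asarray(b_host)
c = cp.dot(a, b)                      # computed on the device under cupy

c_host = cp.asnumpy(c)                # copy back to host only when needed
```

Keeping the explicit `asarray`/`asnumpy` boundaries at the edges of the computation is also what delivers the speedup: data stays on the GPU between calls.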
Kenichi Maehashi
@kmaehashi

:tada: Released CuPy v9.4.0 & v10.0.0b2!
This release includes support for NVIDIA's CUDA Python (docs, repo), AMD ROCm 4.3, and many more distributions in cupy.random.Generator thanks to GSoC student @povinsahu1909!

Refer to the release notes for the full changes:

:mega: Pre-Release (alpha/beta/RC) wheels will be removed from PyPI:
  • Starting in v10.0.0b2, we stopped uploading pre-release binary wheels to PyPI. To try v10.0.0b2 wheels, you will need an extra option like pip install cupy-cuda*** -f https://github.com/cupy/cupy/releases/tag/v10.0.0b2 (see cupy/cupy#5671 for more info.)
  • We are also going to remove outdated (v8.0.0rc1 or earlier) pre-release binary wheels from PyPI on September 20th (see cupy/cupy#5667 for more info.)
Théophile Cantelobre
@theophilec
Hi all! I have a quick question; if this is not the best place for it, I'm happy to move it to the correct place :) Given A and B, two ndarrays of shapes (M, D, D) and (N, D, D), what would you recommend for efficiently computing cp.matmul(A[i], B[j]) for all i and j? I have already thought of looping through, expanding A[i], and multiplying by B. Any input would be welcome! Thanks!
7 replies
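One broadcasting-based approach (a sketch, not necessarily what the thread's replies suggested), written with NumPy since cp.matmul follows NumPy's broadcasting rules: insert length-1 axes so a single batched matmul covers every (i, j) pair:

```python
import numpy as np  # cp.matmul follows numpy's broadcasting rules

M, N, D = 3, 4, 5
rng = np.random.default_rng(0)
A = rng.random((M, D, D))
B = rng.random((N, D, D))

# A[:, None] has shape (M, 1, D, D) and B[None, :] has shape (1, N, D, D);
# the batch dimensions broadcast to (M, N), so one matmul call computes
# all M*N products without a Python-level double loop.
C = np.matmul(A[:, None], B[None, :])

assert C.shape == (M, N, D, D)
assert np.allclose(C[1, 2], A[1] @ B[2])
```

One caveat: this materializes all M*N result matrices at once, so for large M and N a chunked loop over one axis may fit device memory better.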
Kenichi Maehashi
@kmaehashi
Quick survey on dropping CUDA 10.1 support: https://twitter.com/CuPy_Team/status/1435465647178149895
saujay
@saujay
Hi guys, what are the use cases for ElementwiseKernel and RawKernel? I searched, but couldn't find clear instructions or examples.
2 replies