I'm running a GEMM, A*B, on two <VR, STAR> matrices, where A is about 400GB and B is about 1MB. I have more than a terabyte of memory, but this GEMM gives me OOM errors. Is there any reason why this should be the case? Do I need to redistribute the matrices for some reason first?
A is 6177583-by-8096, B is 8096-by-20, and the resulting matrix should be 6177583-by-20.
I tried explicitly redistributing every matrix to <MC, MR>, and now I'm getting OOM errors when redistributing A, which I do by creating a new <MC, MR> matrix and copying A into it.
So I guess the question now is: what is the memory cost of redistributing a matrix from <VR, STAR> to <MC, MR>? I'd like to think that as long as I can hold two copies of the matrix in memory, it should be fine.
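For reference, here is a rough sanity check of the sizes involved. It is only a sketch: it assumes 8-byte double-precision entries (an assumption, though it matches the roughly 400GB figure for A), and the helper name `dense_bytes` is made up for illustration. It counts matrix storage only, not any temporary buffers a redistribution might allocate, which is exactly the part I don't know.

```python
# Back-of-the-envelope storage estimate for the matrices in this GEMM.
# Assumption: dense storage with 8-byte (double-precision) entries.

def dense_bytes(rows, cols, bytes_per_entry=8):
    """Total bytes needed to store a dense rows-by-cols matrix."""
    return rows * cols * bytes_per_entry

a_bytes = dense_bytes(6177583, 8096)   # A
b_bytes = dense_bytes(8096, 20)        # B
c_bytes = dense_bytes(6177583, 20)     # result of A*B

# Two full copies of A, e.g. the original plus a redistributed copy.
two_copies_of_a = 2 * a_bytes

for name, size in [("A", a_bytes), ("B", b_bytes), ("A*B", c_bytes),
                   ("two copies of A", two_copies_of_a)]:
    print(f"{name}: {size / 1e9:.3f} GB")
```

Under these assumptions, two copies of A come to about 800GB in aggregate, which is still under a terabyte, so any OOM would have to come from memory used beyond the two copies themselves, or from how the data is spread across processes.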