da.from_array(large_numpy_array) was blowing up workers, as (counterintuitively) the large_numpy_array is not actually partitioned along the expected chunks. Each worker appears to get a full copy of large_numpy_array regardless of the chunking.
from_array(large_array, chunks=chunks).compute(), for example, does not allocate data to workers the way one would expect from the chunking
scatter, is that right?
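One way to sidestep shipping a big in-memory array around, assuming the data can be produced or loaded one chunk at a time, is to build each chunk lazily so it only materializes inside a task. A rough sketch (`load_block` is a made-up stand-in for whatever actually reads the data, and the sizes are arbitrary):

```python
import numpy as np
import dask
import dask.array as da


def load_block(start, stop, ncols):
    # Hypothetical loader: stands in for reading rows [start, stop) from disk.
    rows = np.arange(start, stop, dtype=np.float64)[:, None]
    return rows * np.ones((1, ncols))


nrows, ncols, step = 1000, 8, 250  # illustrative sizes, not from the thread

# One delayed task per chunk; nothing is loaded until compute() is called.
blocks = [
    da.from_delayed(
        dask.delayed(load_block)(s, min(s + step, nrows), ncols),
        shape=(min(s + step, nrows) - s, ncols),
        dtype=np.float64,
    )
    for s in range(0, nrows, step)
]

x = da.concatenate(blocks, axis=0)
total = x.sum().compute()
```

With this layout the graph only carries small task descriptions rather than the full array, so each task creates just the block it needs.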
solve operation across the last 2 dimensions, i.e. given
B I want to do
C[i,:,:]=solve(A[i,:,:], B[i,:,:]) for all i in the leading dimension... I tried the below but it seems to be really slow (slower than numpy) - does anyone know what I'm doing wrong/what I could do better? Sorry if this was the wrong place to ask.
C = da.apply_gufunc(np.linalg.solve, "(i,j),(i,k)->(j,k)", A, B, vectorize=True, output_dtypes=A.dtype)
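A possible reason for the slowness: vectorize=True wraps the function in np.vectorize, which loops in Python over every i. np.linalg.solve already broadcasts over leading dimensions, so it may be enough to drop that flag and let each chunk be solved in one batched call. A small sketch, assuming A and B are chunked only along the leading dimension (the shapes, chunk size and random test data here are made up):

```python
import numpy as np
import dask.array as da

rng = np.random.default_rng(0)
n_batch, n, k = 64, 5, 3  # illustrative sizes

# Add a multiple of the identity so the random test matrices are not near-singular.
A_np = rng.standard_normal((n_batch, n, n)) + n * np.eye(n)
B_np = rng.standard_normal((n_batch, n, k))

# Chunk along the leading (batch) axis only; the core dimensions stay in one chunk.
A = da.from_array(A_np, chunks=(16, n, n))
B = da.from_array(B_np, chunks=(16, n, k))

# No vectorize=True: np.linalg.solve handles the whole (16, n, n) block at once.
C = da.apply_gufunc(
    np.linalg.solve, "(i,j),(i,k)->(j,k)", A, B, output_dtypes=A.dtype
)
C_np = C.compute()
```

For arrays this small plain numpy will likely still win because of scheduling overhead; the chunked version only starts to pay off when the batch is large enough to be worth splitting.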
scale and then read it back then scale up/down from that?