Please use Stack Overflow with the #dask tag for usage questions and GitHub issues for bug reports.
Hello everyone. Just wanted to know whether I can perform Dask array computations inside a delayed function, something like this.
def dask_delayed_example(X):
    mean = X.mean(axis=0)
    std = X.std(axis=0)
    return X - mean

X = df.to_dask_array(lengths=True)
res = dask_delayed_example(X)
Is Dask delayed only meant for parallelizing computation or can it also be used to perform computation on larger-than-memory datasets in a distributed manner?
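For what it's worth, the centering step in the snippet can be written directly against a Dask array without wrapping it in delayed, since Dask array operations are already lazy and run chunk by chunk. A minimal sketch, assuming a randomly generated array standing in for df.to_dask_array(lengths=True); the shape, the chunk sizes and the name center are made up for illustration:

```python
import dask.array as da
import numpy as np


def center(X):
    """Subtract the per-column mean; this only builds a lazy graph."""
    return X - X.mean(axis=0)


# Hypothetical stand-in for df.to_dask_array(lengths=True)
X = da.random.random((1000, 4), chunks=(250, 4))

res = center(X)         # still a lazy dask array, nothing computed yet
result = res.compute()  # runs the chunked computation, returns a NumPy array
```

Only the final compute() call materializes the result, so the input itself never has to fit in memory as a single NumPy array.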
Hi there! I'm trying to use dask-ssh as follows:
% dask-ssh --scheduler master1 node1
---------------------------------------------------------------
Dask.distributed v1.27.0
Worker nodes:
0: node1
scheduler node: master1:8786
---------------------------------------------------------------
[ scheduler master1:8786 ] : /home/applis/anaconda/envs/py3v19.04/bin/python -m distributed.cli.dask_scheduler --port 8786
[ worker node1 ] : /home/applis/anaconda/envs/py3v19.04/bin/python -m None master1:8786 --nthreads 0 --host node1 --memory-limit auto
but I cannot connect a client to the scheduler, and logging does not provide more information. Client('master1:8786') fails with OSError: Timed out trying to connect... and I cannot access the web UI at master1:8787. However, running dask-scheduler and dask-worker works fine. Any suggestions?
da.from_array(large_numpy_array) was blowing up workers, as (counterintuitively) the large_numpy_array is not actually partitioned along the expected chunks. Each worker appears to get a full copy of large_numpy_array regardless of the chunking.
from_array(large_array, chunks=chunks)[0].compute(), for example, does not allocate data to workers the way one would expect from the chunking.
scatter, is that right?
solve operation across the last 2 dimensions, i.e. given A and B I want to do C[i,:,:] = solve(A[i,:,:], B[i,:,:]) for all i in the leading dimension... I tried the below, but it seems to be really slow (slower than NumPy). Does anyone know what I'm doing wrong or what I could do better? Sorry if this was the wrong place to ask.
C = da.apply_gufunc(np.linalg.solve, "(i,j),(i,k)->(j,k)", A, B, vectorize=True, output_dtypes=A.dtype)
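One thing that might be worth trying, offered as a sketch rather than a confirmed fix: vectorize=True wraps the function in np.vectorize, which loops over the matrices one at a time in Python, while np.linalg.solve already accepts stacked matrices and broadcasts over the leading dimension. The example below makes the same apply_gufunc call without vectorize. The array sizes, the chunking and the diagonally dominant test matrices are assumptions made up for illustration, and whether this beats plain NumPy will depend on the problem size.

```python
import dask.array as da
import numpy as np

rng = np.random.default_rng(0)
n, m, k = 8, 3, 2

# Hypothetical inputs: adding m * I makes every A[i] diagonally dominant,
# so each system is safely invertible.
A_np = rng.random((n, m, m)) + m * np.eye(m)
B_np = rng.random((n, m, k))

# Chunk only along the leading axis; the core dimensions stay in one chunk.
A = da.from_array(A_np, chunks=(4, m, m))
B = da.from_array(B_np, chunks=(4, m, k))

# np.linalg.solve handles a whole stack of matrices per chunk,
# so vectorize is left at its default of False.
C = da.apply_gufunc(
    np.linalg.solve, "(i,j),(i,k)->(j,k)", A, B, output_dtypes=A.dtype
)
C_np = C.compute()
```

With this layout each task solves a whole block of systems in a single NumPy call instead of one system per Python-level iteration.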