Hi there! I'm trying to use dask-ssh as follows:
% dask-ssh --scheduler master1 node1
---------------------------------------------------------------
Dask.distributed v1.27.0
Worker nodes:
0: node1
scheduler node: master1:8786
---------------------------------------------------------------
[ scheduler master1:8786 ] : /home/applis/anaconda/envs/py3v19.04/bin/python -m distributed.cli.dask_scheduler --port 8786
[ worker node1 ] : /home/applis/anaconda/envs/py3v19.04/bin/python -m None master1:8786 --nthreads 0 --host node1 --memory-limit auto
but I cannot connect a client to the scheduler, and logging does not provide more information. Client('master1:8786') fails with OSError: Timed out trying to connect... and I cannot access the web UI at master1:8787. However, running dask-scheduler and dask-worker manually works fine. Any suggestions?
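For reference, a minimal sketch of the failing connection attempt described above, using the scheduler address from the dask-ssh output; the explicit timeout is only an illustrative addition:

```python
from dask.distributed import Client

# Scheduler address taken from the dask-ssh output above.
# A longer timeout only helps rule out a slow handshake; it does not
# change whether the scheduler is actually reachable.
client = Client("master1:8786", timeout=30)
print(client.scheduler_info())
```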
da.from_array(large_numpy_array) was blowing up workers because (counterintuitively) large_numpy_array is not actually partitioned along the expected chunks: each worker appears to get a full copy of large_numpy_array regardless of the chunking. from_array(large_array, chunks=chunks)[0].compute(), for example, does not allocate data to workers the way one would expect from the chunking. So the fix is to use scatter instead, is that right?
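A minimal sketch of the scatter-based approach mentioned above, assuming a distributed Client is available; the scheduler address, array size, and chunk sizes are made up for illustration:

```python
import numpy as np
import dask
import dask.array as da
from dask.distributed import Client

client = Client("master1:8786")  # hypothetical scheduler address

large_numpy_array = np.random.random((10_000, 10_000))

# Ship the array to the cluster once, rather than embedding it in the
# task graph (which is what makes every worker receive a full copy).
future = client.scatter(large_numpy_array)

# Build a dask array from the scattered data; wrapping the future in
# dask.delayed lets the distributed scheduler resolve it on the workers.
x = da.from_delayed(
    dask.delayed(future),
    shape=large_numpy_array.shape,
    dtype=large_numpy_array.dtype,
)

# Rechunk on the cluster, then work block-wise as usual.
x = x.rechunk((1_000, 1_000))
print(x[0].sum().compute())
```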
I want to apply a solve operation across the last 2 dimensions, i.e. given A and B I want to do C[i,:,:] = solve(A[i,:,:], B[i,:,:]) for all i in the leading dimension... I tried the below, but it seems to be really slow (slower than numpy); does anyone know what I'm doing wrong / what I could do better? Sorry if this was the wrong place to ask.
C = da.apply_gufunc(np.linalg.solve, "(i,j),(i,k)->(j,k)", A, B, vectorize=True, output_dtypes=A.dtype)
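One likely speed-up, sketched below: np.linalg.solve already broadcasts over leading dimensions, so vectorize=True (which falls back to a Python-level loop via np.vectorize) can usually be dropped. The shapes and chunk sizes are made up; the last two axes are kept in single chunks, as apply_gufunc expects for core dimensions:

```python
import numpy as np
import dask.array as da

# Made-up sizes: 1000 independent 100x100 systems with 5 right-hand sides.
# Only the leading dimension is chunked; the last two axes stay whole so
# each block maps to a single stacked LAPACK solve.
A = da.random.random((1000, 100, 100), chunks=(100, 100, 100))
B = da.random.random((1000, 100, 5), chunks=(100, 100, 5))

# Without vectorize=True, np.linalg.solve handles the stacked matrices in
# one vectorized call per chunk instead of a Python loop per system.
C = da.apply_gufunc(
    np.linalg.solve, "(i,j),(i,k)->(j,k)", A, B,
    output_dtypes=A.dtype,
)

result = C.compute()
```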
scale and then read it back, then scale up/down from that?