Hi everyone, probably a simple question, but I couldn't find the specific docs. How should I deploy an adaptive cluster on k8s using dask? Shall I write a .py file with

from dask_kubernetes import KubeCluster

cluster = KubeCluster()
cluster.adapt(minimum=0, maximum=100)  # scale between 0 and 100 workers

and run it on the scheduler machine?
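For reference, a minimal worker pod spec sketch that KubeCluster can be pointed at (the image tag, resource numbers, and worker arguments below are illustrative assumptions, not required values):

```yaml
# worker-spec.yaml -- illustrative values only
kind: Pod
metadata:
  labels:
    app: dask-worker
spec:
  restartPolicy: Never
  containers:
    - name: dask
      image: daskdev/dask:latest
      imagePullPolicy: IfNotPresent
      # Worker flags are assumptions; tune nthreads/memory to your nodes.
      args: [dask-worker, --nthreads, "2", --memory-limit, 4GB, --death-timeout, "60"]
      resources:
        limits:
          cpu: "2"
          memory: 4G
        requests:
          cpu: "2"
          memory: 4G
```

With the classic dask_kubernetes API this kind of spec can be loaded with KubeCluster.from_yaml('worker-spec.yaml') before calling adapt().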
You can use yarn logs -applicationId <your application id>, but that is only available for stopped applications. If the application is still running you can get the worker logs using get_worker_logs, as you said above, or through the YARN ResourceManager web UI (Skein's web UI server shows live logs for all services).
dask.dataframe, so I'm curious: if you avoid it, do you get better performance? e.g.

def read_table(filename):
    import pyarrow.parquet as pq
    tbl = pq.read_table(filename)
    return tbl.to_pandas()

futs = client.map(read_table, filenames)
df = client.submit(pd.concat, futs).result()
Hey all, I have the following problem, reproducible with this example:

import pandas as pd
import dask.array as da
import dask.dataframe as dd

s = pd.Series([1, 2, 3, 4, 5])
ds = dd.from_pandas(s, npartitions=2)

print(ds.sum())
print(da.sqrt(ds.sum()))
print(da.sin(ds.sum()))
print(da.power(ds.sum(), 2))

Computing any dask.array ufunc of a dask Scalar triggers a computation. If it is done on the Series instead, the behavior is as expected (a dask graph is returned). Any ideas on why this happens?
Nothing is working for this:

import dask.dataframe as dd

df = dd.read_parquet('gcs://anaconda-public-data/nyc-taxi/nyc.parquet/part.0.parquet')