```python
futures = df.to_parquet(.., compute=False)
try:
    client.compute(futures)
except Exception as e:
    ..log stuff..
    client.retry(futures)
```
also, slight nit in terminology
```python
futures = df.to_parquet(.., compute=False)
```
Those are delayed objects, not futures. Futures point to work that has been launched already.
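The distinction can be sketched with plain `dask.delayed` (a minimal sketch; the `inc` function is made up for illustration):

```python
import dask


@dask.delayed
def inc(x):
    return x + 1


d = inc(1)            # a Delayed object: nothing has been launched yet
result = d.compute()  # the work only runs when compute is called
# With a distributed Client, client.compute(d) would instead return a
# Future, which points at work that has already been submitted.
```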
yarn application logs, as discussed offline.
Hi everyone. Probably a simple question, but I couldn't find the specific docs. How should I deploy an adaptive cluster on k8s using dask? Shall I write a .py file with
```python
from dask_kubernetes import KubeCluster

cluster = KubeCluster()
cluster.adapt(minimum=0, maximum=100)  # scale between 0 and 100 workers
```
and run it on scheduler machine?
`yarn logs -applicationId <your application id>`, and is only available for stopped applications. If the application is still running you can get the worker logs using `get_worker_logs`, as you said above, or through the YARN ResourceManager web UI (Skein's web UI server shows live logs for all services).
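For reference, `get_worker_logs` is a method on the distributed `Client`; here is a minimal sketch against a throwaway `LocalCluster` rather than a real YARN deployment (the cluster parameters are just to keep the example small):

```python
from dask.distributed import Client, LocalCluster

# Start a tiny in-process cluster purely to illustrate the call;
# on YARN you would connect the Client to the running scheduler instead.
cluster = LocalCluster(n_workers=1, processes=False, dashboard_address=None)
client = Client(cluster)

# Returns a dict mapping each worker address to its recent log records.
logs = client.get_worker_logs()

client.close()
cluster.close()
```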
`dask.dataframe`, so I'm curious whether you get better performance if you avoid it - e.g.
```python
def read_table(filename):
    # import inside the task so it runs on the worker
    import pyarrow.parquet as pq
    tbl = pq.read_table(filename)
    return tbl.to_pandas()

futs = client.map(read_table, filenames)
df = client.submit(pd.concat, futs).result()
```