I believe I'm ~90% there; I'm just having trouble getting the dashboard.
Do I need to do anything to get the dashboard set up, i.e. copy the URL (which URL?) into the search bar in the Dask extension?
OK, found it using the scheduler's external IP.
Now I'm just hoping to pin it next to my notebook using the Dask extension.
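For anyone else asking "which URL?": here is a minimal sketch that assumes the dashboard runs on the scheduler's host on Dask's default dashboard port 8787 (the scheduler itself defaults to 8786). It turns a scheduler address into the base URL you paste into the Dask JupyterLab extension's search bar. The helper name `dashboard_url` and the IP address are hypothetical. If you already have a `distributed.Client`, its `dashboard_link` attribute gives you the link directly.

```python
from urllib.parse import urlparse

# Dask's default ports: 8786 for the scheduler, 8787 for the dashboard.
DEFAULT_DASHBOARD_PORT = 8787


def dashboard_url(scheduler_address: str, dashboard_port: int = DEFAULT_DASHBOARD_PORT) -> str:
    """Derive a dashboard base URL from a scheduler address like 'tcp://10.0.0.5:8786'.

    Assumes the dashboard is served from the same host as the scheduler
    (IPv4 address or hostname; an IPv6 literal would need brackets re-added).
    """
    host = urlparse(scheduler_address).hostname
    if host is None:
        raise ValueError(f"cannot parse a host from {scheduler_address!r}")
    return f"http://{host}:{dashboard_port}"


# Hypothetical external IP: substitute your scheduler service's external address.
print(dashboard_url("tcp://203.0.113.10:8786"))
```

To see the main status page in a browser, open that base URL with `/status` appended. If your Kubernetes service maps the dashboard to a different external port, pass that port as `dashboard_port`.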
Hi! The Enterprise Data Science Architecture Conference focuses on how to properly productionise data science solutions at scale. Dask is a tool that I have personally used to get the job done. Most Dask users would be interested in seeing how large companies productionise machine learning solutions. 27th March 2020 is a great time to visit Melbourne, Australia for a unique and high-quality conference. I invite you to view our speakers list at https://edsaconf.io and reserve your place, because we have a unique mix of speakers.
Hi guys, I'm new to Dask. Are there any tutorials on deploying Dask jobs natively on k8s? I tried the official docs, but I can't even run the demo successfully.
I've containerized a demo Python job and deployed it as a pod on my k8s cluster, but it keeps retiring the other workers.
I've also created RBAC for the pod.
Hello - I'm struggling to resolve the issue I posted here: dask/dask#5634. Any chance that there's anyone here open to a short consulting gig to help me debug it?
If you write code that doesn't need to shuffle/sync across Dask nodes and is "embarrassingly parallelisable", you'll maybe get into the realm of 95%.
If you write code that goes over your data sequentially, row by row, you'll be at 0%.
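One way to put numbers on the 95% vs 0% contrast is Amdahl's law. This is our framing, not necessarily what the commenter measured. If a fraction p of the work parallelises perfectly across n workers and the rest runs serially, the ideal speedup is 1 / ((1 - p) + p / n). The sketch ignores scheduler and network overhead, so treat its results as an upper bound. `amdahl_speedup` is a hypothetical helper name.

```python
def amdahl_speedup(parallel_fraction: float, n_workers: int) -> float:
    """Ideal speedup on n_workers when only parallel_fraction of the work is parallel.

    Amdahl's law: S = 1 / ((1 - p) + p / n). Overheads are ignored, so real
    Dask runs will scale worse than this.
    """
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be between 0 and 1")
    if n_workers < 1:
        raise ValueError("n_workers must be at least 1")
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / n_workers)


for p in (1.0, 0.95, 0.0):
    speedup = amdahl_speedup(p, 16)
    # Efficiency is speedup per worker; 1.0 (100%) would be perfect scaling.
    print(f"p={p:.2f}: speedup {speedup:5.2f}x, efficiency {speedup / 16:.0%}")
```

The row-by-row case corresponds to p = 0: adding workers leaves the speedup at exactly 1. Even a small serial share caps the gain well below the worker count.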
That said, the scheduler can be a bottleneck for really big task graphs.
But I'm not sure that's always the case; we haven't scaled to the size where it made sense to look into it.
Do you think Dask will get a more efficient scheduler at some point, or is that more of a "nice-to-have"? :)
The performance of the scheduler is always being optimised… There have been specific attempts to reimplement it in Cython or other languages, but rest assured that the often-quoted “1ms overhead per task” is pessimistic.
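Here is a back-of-envelope sketch of why very large graphs can make the scheduler the bottleneck. It rests on simplifying assumptions: one scheduler processes every task serially at a fixed cost (taken here as the pessimistic 1 ms quoted above), and task work spreads perfectly across workers. Under those assumptions, the scheduler limits throughput once individual tasks run shorter than n_workers * overhead. The function names are hypothetical.

```python
def runtime_lower_bound_s(
    n_tasks: int,
    task_runtime_s: float,
    n_workers: int,
    per_task_overhead_s: float = 1e-3,
) -> float:
    """Crude lower bound on wall time: the larger of scheduler time and worker time.

    Assumes one scheduler handles every task serially at per_task_overhead_s and
    that task work spreads perfectly across n_workers (data transfer ignored).
    """
    scheduler_time = n_tasks * per_task_overhead_s
    worker_time = n_tasks * task_runtime_s / n_workers
    return max(scheduler_time, worker_time)


def scheduler_bound(task_runtime_s: float, n_workers: int, per_task_overhead_s: float = 1e-3) -> bool:
    """True when, under the model above, the scheduler rather than the workers sets the pace."""
    return task_runtime_s < n_workers * per_task_overhead_s


# 1,000,000 tasks on 100 workers: short tasks vs long tasks.
for task_s in (0.01, 1.0):
    bound = runtime_lower_bound_s(1_000_000, task_s, 100)
    print(f"{task_s} s tasks: >= {bound:,.0f} s, scheduler-bound={scheduler_bound(task_s, 100)}")
```

Under this model, 10 ms tasks leave the workers waiting on roughly 1,000 s of scheduler time, while 1 s tasks are worker-bound. Fewer, larger tasks (bigger chunks or partitions) are the usual lever.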
I tried to make an SSHCluster with a tunnel, but it seems Dask doesn't play nicely with asyncssh: https://bpaste.net/show/K7XOG ("got Future <Future pending> attached to a different loop").
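The quoted message comes from asyncio itself. It raises this RuntimeError when a task awaits a Future that belongs to a different event loop. The paste isn't reproduced here, so what follows is only a minimal standard-library reproduction of that class of error. It assumes that asyncssh and Dask ended up on separate loops, and it also shows the same-loop pattern that avoids the error. It uses neither Dask nor asyncssh, and the function names are hypothetical.

```python
import asyncio


async def await_future(fut: asyncio.Future) -> None:
    # Awaiting a Future bound to another loop triggers asyncio's loop check.
    await fut


def reproduce_different_loop_error() -> str:
    """Await a Future created on one event loop from a task running on another."""
    other_loop = asyncio.new_event_loop()
    try:
        foreign_future = other_loop.create_future()  # bound to other_loop
        try:
            asyncio.run(await_future(foreign_future))  # runs on a fresh loop
        except RuntimeError as exc:
            return str(exc)
        return "no error"
    finally:
        other_loop.close()


async def same_loop_ok() -> str:
    # Creating the Future on the currently running loop avoids the mismatch.
    loop = asyncio.get_running_loop()
    fut = loop.create_future()
    loop.call_soon(fut.set_result, "done")
    return await fut


print(reproduce_different_loop_error())
print(asyncio.run(same_loop_ok()))
```

If that assumption holds, one thing to try is creating the tunnel and the cluster from the same coroutine. Dask's cluster managers generally accept `asynchronous=True` for use inside a running loop, though we haven't verified that for this particular SSHCluster setup.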
As I understand from the documentation (https://docs.dask.org/en/latest/remote-data-services.html), Dask does not currently support Azure Blob Storage or Azure Data Lake Gen 2 as a data source. Is there a timeline for this? We are planning to store our data in Azure Data Lake Gen 2 and use Dask for feature engineering as well as training with XGBoost.
“adlfs” is now available on PyPI, but only on a personal channel for conda (https://anaconda.org/defusco/adlfs). A conda-forge package should be coming soon. The master version of fsspec knows about adlfs and will use it if it's installed. So the short answer is: yes, Dask can read from and write to both Azure Data Lake and Blob Storage. @TomAugspurger, what happened to the release? Is it time to update the text in the docs yet?
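Here is a minimal sketch of what reading from Azure could look like. It assumes that adlfs is installed and registers the `abfs` protocol with fsspec, and that account-key authentication is acceptable. The container, path, credentials and helper names are hypothetical. The URL and option helpers run without Azure access, but `read_azure_csv` needs a real storage account.

```python
from typing import Any, Dict


def abfs_url(container: str, path: str) -> str:
    """Build an fsspec URL using the 'abfs' protocol (assumed to be registered by adlfs)."""
    return f"abfs://{container}/{path.lstrip('/')}"


def azure_storage_options(account_name: str, account_key: str) -> Dict[str, Any]:
    # Keyword arguments that fsspec forwards to adlfs's filesystem class.
    return {"account_name": account_name, "account_key": account_key}


def read_azure_csv(container: str, path: str, account_name: str, account_key: str):
    """Lazily read CSV files from Azure into a Dask DataFrame (requires dask and adlfs)."""
    import dask.dataframe as dd  # imported here so the helpers above work without Dask

    return dd.read_csv(
        abfs_url(container, path),
        storage_options=azure_storage_options(account_name, account_key),
    )


# Hypothetical container and glob; replace with your own.
print(abfs_url("features", "/2020/*.csv"))
```

The resulting DataFrame can then feed feature engineering as usual. Wiring it into XGBoost training is a separate step that this sketch does not cover.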
No idea. I haven’t done anything on adlfs in a few weeks.
Oh, it’s @AlbertDeFusco’s PR.
Have any dask-jobqueue users gotten adaptive deployment working?
[in this channel]
With Dask Gateway, does the gateway initiate connections to the client, or is it one-way?
With some tunneling, can I have my client (notebook) on my local machine and the gateway on a remote k8s cluster?