    lancelot1969
    @lancelot1969
    Is the rolling function fully working in dask? And if it is not, what is a possible reason? Thanks
    Martin Durant
    @martindurant
    @lancelot1969, that’s a pretty vague question. Do you have something specific you want to ask?
    lancelot1969
    @lancelot1969
    @martindurant :) I tried to be general, but I guess it just sounded vague. I mean the rolling function with aggregations, run with dask distributed.
    Martin Durant
    @martindurant
    I see a whole section in the API docs listing rolling operations and describing caveats. You do not say what “not working” means for you.
    lancelot1969
    @lancelot1969
    I was also getting duplicate rows with map_overlap for groupby rolling
    with datetimes
    Martin Durant
    @martindurant
    “also”? Please post a question on Stack Overflow with a fully reproducible example showing the problem you encountered.
    lancelot1969
    @lancelot1969
    Thank you!
    Ray Bell
    @raybellwaves
    On the latest dask, trying df = df.repartition(partition_size="100MB"), I get ValueError: need at least one array to concatenate
    I think there is a mismatch between the pandas dtypes I have (df = df.convert_dtypes()) and specifying the schema using a pyarrow dict.
    Traceback here https://gist.github.com/raybellwaves/1c5c1f9a88a502b809aafac129463d88
    Boaz Mohar
    @boazmohar
    Hi, I was wondering what the Dask maintainers' approach is to issues submitted based on static analysis tools. I looked up Dask on lgtm, as I started using it recently, and found some issues. I would be able to submit issues and maybe even PRs, but it is beyond my abilities to add tests showing the issue, due to my pretty basic understanding of the Dask core code. For example, there are 4 places where nested loops reuse the same variable. Another example is an unused exception object. Thanks!
    Martin Durant
    @martindurant
    I think it's reasonable to at least provide a summary of the analysis output in an issue. We do use various linting tools in the CI.
    Grigore Cristian-Andrei
    @grrigore
    Hello, is Dask suitable for low-RAM machines?
    Martin Durant
    @martindurant
    @grrigore: dask allows out-of-core computations, so you can do things with limited RAM that you otherwise couldn’t. However, it depends on what you mean by “low RAM”: when analysing data, you will always need to keep some of it in memory, namely the “chunk/partition size”.
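To make the chunk-size point concrete, here is a small sketch with a dask array. The shapes and chunk sizes are arbitrary choices for illustration; the idea is only that the reduction is computed chunk by chunk, so the working set is a few chunks rather than the whole array.

```python
import dask.array as da

# A 4000 x 1000 float64 array (32 MB in total), stored as 8 chunks
# of 500 x 1000 elements (4 MB each). Sizes are illustrative only.
x = da.random.random((4000, 1000), chunks=(500, 1000))

chunk_mb = 500 * 1000 * 8 / 1e6   # size of one chunk in MB
total_mb = x.nbytes / 1e6         # size of the full array in MB

# The mean is reduced per chunk and then combined, so the full array
# never has to be materialised at once.
mean = x.mean().compute()

print(x.numblocks, chunk_mb, total_mb, mean)
```

How many chunks are in memory at the same time depends on the scheduler and the number of worker threads, so the chunk size should be chosen with that in mind.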
    Trym A E Lindell
    @TrymAELindell
    @martindurant The fix you suggested worked like a charm:)
    goriliukasbuxton
    @goriliukasbuxton
    TypeError: can't pickle _asyncio.Task objects
    trying to add task to:
    class Service():
        def __init__(self, client):
            self.out_folder = None
            self.client = client
    Bas Nijholt
    @basnijholt

    I've been watching some videos about speeding up the dask scheduler and this is really exciting!

    The last time I tested dask (1-2 years ago), it was too slow for our workflows: simple task graphs on 500-1000 nodes (~20,000 cores). Is anyone aware of the largest scale at which someone has run dask computations?

    Andrzej Novak
    @andrzejnovak
    The dask client/cluster seems to stay around if I ctrl-c a script. Is there a way around it?
    Ale 😡
    @aleperalta82_twitter
    Hello, I've just watched Jim Crist's talk on dask-gateway and I was wondering about implementing a backend for HashiCorp's Nomad. As far as I can see, one would only need to implement the backend, and dask-gateway would then allow me to create a cluster? dask-gateway doesn't rely on other dask packages, like dask-kubernetes for Kubernetes support? What would be the minimum implementation needed to support Nomad?
    Martin Durant
    @martindurant
    You are right in your assumptions. I know nothing about Nomad, do you need an in-cluster server, or something more like dask-cloudprovider? Does Nomad provide a k8s, HPC or Hadoop-like layer?
    Ale 😡
    @aleperalta82_twitter
    Hi Martin, nomad is a scheduler. It sort of competes with Kubernetes
    You can schedule docker images and more
    More correctly nomad is an orchestrator like kubernetes.
    And no, afaik nomad doesn't support layers for HPC or Hadoop
    Martin Durant
    @martindurant
    OK, so then you would be between dask-kubernetes (one set of dask scheduler/workers, controlled from outside) versus gateway (a service on the inside, which can launch sets of schedulers/workers, but communicates to the outside world over one port). Gateway is much more complex; if you were to make both, there would likely be a lot of common code or run templates.
    (I mean dask-kubernetes vs. gateway-on-kubernetes as inspiration for your new project)
    Another inspiration: https://github.com/dask/dask-docker (has a compose setup)
    Ale 😡
    @aleperalta82_twitter
    gateway would allow me to launch new clusters, whereas something like dask-kubernetes allows me to set up a single cluster, is that correct? If I want a new cluster, I would "run" dask-kubernetes again somehow?
    Martin Durant
    @martindurant
    correct
    Ale 😡
    @aleperalta82_twitter
    Thank you very much for the answers and clarifications. I'm asking just to have an idea of what I would need to do if I want to create dask cluster with my current setup (nomad). Thanks again.
    Martin Durant
    @martindurant
    the compose.yaml in dask-docker might be a good start
    Ale 😡
    @aleperalta82_twitter
    Yes it is! I was also wondering whether I could create clusters on demand with nomad and how one would go about it; that's why I was looking at gateway.
    Davis Bennett
    @d-v-b
    image.png
    can I pass an argument to .visualize() that prevents me from seeing all these extra tasks:
    I have a massive array dataset and I'd like to look at the task graph of an operation on a subregion, but this requires not visualizing all the to-be-culled tasks
    Martin Durant
    @martindurant
    You probably want to optimize your graph before passing to .visualize?
    Davis Bennett
    @d-v-b
    doesn't visualize do that by default?
    maybe I need to tell it explicitly to use the array optimizer
    ahh optimize is False by default in visualize
    Martin Durant
    @martindurant
    right got it. I ought to have known the exact thing, having looked at this very recently!
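A small sketch of the point above: optimizing a graph for a subregion culls the tasks that the result does not depend on. The array, the `+ 1` operation, and the slice are invented for illustration; the keyword on `.visualize()` is `optimize_graph`, and the rendering call is left commented out because it needs graphviz installed.

```python
import dask
import dask.array as da

# A 10 x 10 grid of chunks; only the top-left chunk is needed below.
x = da.ones((1000, 1000), chunks=(100, 100))
sub = (x + 1)[:100, :100]

# The raw graph still contains the tasks for every chunk of x.
n_before = len(dict(sub.__dask_graph__()))

# dask.optimize returns a tuple of optimized collections.
(sub_opt,) = dask.optimize(sub)
n_after = len(dict(sub_opt.__dask_graph__()))

print(n_before, n_after)

# To render the optimized graph directly (requires graphviz):
# sub.visualize(filename="subregion.svg", optimize_graph=True)
```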
    Davis Bennett
    @d-v-b
    next question: why am I not seeing these two independent branches getting simplified
    image.png
    Martin Durant
    @martindurant
    It suggests they have a common input
    Davis Bennett
    @d-v-b
    image.png
    doesn't look like fusion occurs even when there's no shared root
    Martin Durant
    @martindurant
    agree, one might think some of those linear chains would fuse
    Matthew Rocklin
    @mrocklin
    There is some width and branching there. If you want more aggressive fusion you can bump up the config value for optimization.fuse.ave-width I think
    Davis Bennett
    @d-v-b
    setting that parameter prior to calling visualize doesn't do anything to the rendered task graph, unfortunately
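For reference, a minimal sketch of setting the config value mentioned above. The array and the value 4 are arbitrary, and this is not a diagnosis of the graph in the screenshots. The setting is only consulted when optimization actually runs, for example through `dask.optimize` or `.visualize(optimize_graph=True)`, so it has to be active at that point.

```python
import dask
import dask.array as da

# Toy computation (hypothetical): two elementwise steps and a reduction.
x = da.ones((8, 8), chunks=(2, 2))
y = ((x + 1) * 2).sum(axis=0)

# Apply the setting only inside this block, and optimize while it is active.
with dask.config.set({"optimization.fuse.ave-width": 4}):
    width_inside = dask.config.get("optimization.fuse.ave-width")
    (y_fused,) = dask.optimize(y)

n_tasks = len(dict(y_fused.__dask_graph__()))
print(width_inside, n_tasks)
```

Comparing `n_tasks` for a few different values is a quick way to check whether the setting changes anything for a given graph.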