    Vraj Shah
    @vrajshah11
    I can clarify if you have any questions
    but I suspect it's not applying the function in parallel; pandas takes exactly the same time for the whole thing
    Adrian
    @adrianchang__twitter
    hey guys, does anyone have a good example of using a dask-ml pipeline in a real-time service?
    mostly having trouble dealing with real-time one-hot encoding ....
    I am the author of disk.frame. I am trying to do fair benchmarks with Dask. Can anyone help me get this right?
    I need to know how to properly tune dask for best performance on a single-machine
    So far, it feels like disk.frame is slightly faster than Dask, but I think I might be doing something wrong
    RuiLoureiro
    @RuiLoureiro
    Hey everyone, I'm trying to define custom operations and am a bit uncertain as to how to implement it. Posted a more detailed question on SO:
    https://stackoverflow.com/questions/57597151/how-do-i-use-dask-to-efficiently-compute-custom-statistics
    Vishesh Mangla
    @XtremeGood
    hey has anyone used joblib?
    Scott Sievert
    @stsievert
    @RuiLoureiro maybe https://docs.dask.org/en/latest/caching.html or Pythons builtin lru_cache? (also submitted comment on SO)
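    For readers following the `lru_cache` suggestion above, here is a minimal sketch using only the standard library. The function names (`column_mean`, `custom_stat`) and the call counter are hypothetical and are not taken from the linked SO question; the point is only that a repeated intermediate result is computed once and then reused.

```python
from functools import lru_cache

# Hypothetical counter, used only to show how often the expensive step runs.
CALLS = {"count": 0}


@lru_cache(maxsize=None)
def column_mean(values: tuple) -> float:
    """Stand-in for an expensive intermediate result shared by several statistics."""
    CALLS["count"] += 1
    return sum(values) / len(values)


def custom_stat(values: tuple) -> float:
    """Population variance, reusing the cached mean instead of recomputing it."""
    m = column_mean(values)
    return sum((v - m) ** 2 for v in values) / len(values)


data = (1.0, 2.0, 3.0, 4.0)   # arguments must be hashable, hence a tuple
first = custom_stat(data)     # computes the mean once (cache miss)
second = column_mean(data)    # served from the cache (cache hit)
```

    Note that `lru_cache` lives inside a single Python process, so with a multi-process or distributed scheduler each worker process would hold its own cache.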
    Loïc Estève
    @lesteve
    About caching within a dask context I have heard of this https://github.com/radix-ai/graphchain from this comment. I have not used it though.
    evalparse
    @xiaodaigh
    Gotta say the support from dask vs Julia Community is pretty ordinary
    Jim Crist-Harif
    @jcrist
    @xiaodaigh, dask developers are generally not active on gitter, please reach out via github or stackoverflow with questions like this.
    evalparse
    @xiaodaigh
    @jcrist Thanks for the tip. I will post to github next. Cos my SO post hasn't received much attention either....
    Jim Crist-Harif
    @jcrist
    Dask is an open source project, and like many other projects has limited developer resources. Asking a question and then complaining that you got no response within 24 hours isn't productive. Please be patient and respectful of others' time.
    evalparse
    @xiaodaigh
    Not complaining. Just comparing experiences; usually I get fairly quick responses on Julia questions. So it is a relative experience of the communities. I am also an open source author, so I understand. That's why I never bother anyone in dask/dev. I tried SO, Twitter, etc. but got hardly any response to my noob questions. So to me, the community is not very active, that's all.
    Niloy-Chakraborty
    @Niloy-Chakraborty
    Hi all, can I use Streamz to consume data from RabbitMQ and then process it using dask?
    Pedro Lopes
    @pedroallenrevez

    Hey all, I have the following problem reproducible with the example:

    import pandas as pd
    import dask.array as da
    import dask.dataframe as dd

    s = pd.Series([1,2,3,4,5])
    ds = dd.from_pandas(s, npartitions=2)
    print(ds.sum())
    print(da.sqrt(ds.sum()))
    print(da.sin(ds.sum()))
    print(da.power(ds.sum(), 2))

    Computing any dask.array ufunc of a dask.Scalar will trigger a computation.
    If it is done on the Series, the behavior is as expected (dask graph is returned). Any ideas on why this happens?

    Sarah Bird
    @birdsarah
    Can anyone give me a quick pulse on whether I'm going crazy with issue: dask/dask#5319
    (it will help me know how to proceed with my etl)
    Michael Adkins
    @madkinsz
    Any advice on running docker-in-docker on Dask? e.g. running a containerized task in a Dask worker node on Kubernetes
    Kolmar Kafran
    @kafran
    @xiaodaigh Have you tried StackOverflow?
    suraj bhatt
    @surisurajbhatt_twitter
    Hi, I'm unable to execute the following query:
    import dask.dataframe as dd
    df = dd.read_parquet('gcs://anaconda-public-data/nyc-taxi/nyc.parquet/part.0.parquet')

    Error : ArrowIOError: Unexpected end of stream: Page was smaller (5242780) than expected (6699768)

    Martin Durant
    @martindurant
    ^ please try with fsspec 0.4.2
    suraj bhatt
    @surisurajbhatt_twitter
    could you please give me demo syntax @martindurant ?
    Martin Durant
    @martindurant
    same syntax, but update your version of fsspec, available via pip or conda
    suraj bhatt
    @surisurajbhatt_twitter
    did that but now I get an error: KeyError: 'gcs' @martindurant
    Martin Durant
    @martindurant
    Then you should probably also update gcsfs
    suraj bhatt
    @surisurajbhatt_twitter
    which version
    Martin Durant
    @martindurant
    latest
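    Since the advice above comes down to which versions of `fsspec` and `gcsfs` are installed, a small standard-library sketch for checking that may help. The helper name `installed_versions` is hypothetical; `importlib.metadata` only reports what is installed in the current environment and says nothing about which versions are compatible with each other.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_versions(packages):
    """Map each package name to its installed version string, or None if absent."""
    found = {}
    for name in packages:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found


# Packages mentioned in this thread; extend the list as needed.
report = installed_versions(["fsspec", "gcsfs", "dask"])
for name, ver in report.items():
    print(f"{name}: {ver or 'not installed'}")
```

    Including this output in a bug report makes it easier for others to reproduce the problem.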
    suraj bhatt
    @surisurajbhatt_twitter
    nothing is working for this:
    import dask.dataframe as dd
    df = dd.read_parquet('gcs://anaconda-public-data/nyc-taxi/nyc.parquet/part.0.parquet')

    @martindurant

    Tom Augspurger
    @TomAugspurger
    @surisurajbhatt_twitter can you write a minimal example and post a github issue? http://matthewrocklin.com/blog/work/2018/02/28/minimal-bug-reports
    James Stidard
    @jamesstidard
    Hi, I was wondering if it's OK for a dask delayed function to use a process pool within it? Or will that cause havoc with the scheduler/resource monitoring of dask?
    Brett Naul
    @bnaul
    @jamesstidard if you're running with the multiprocess scheduler you'll definitely have issues, not sure about the threaded scheduler...you could try using the "tasks from tasks" pattern with the distributed scheduler https://distributed.dask.org/en/latest/task-launch.html
    random small q: I've noticed my k8s daskworkers sometimes take 30-60s to connect to the scheduler even after all of the pods are running, is there a config value I might have messed with that controls how often the workers retry to connect? I changed a lot of stuff trying to fix some timeouts so I'm guessing it's my fault
    James Stidard
    @jamesstidard
    @bnaul ah great, thank you!
    Eduardo Gonzalez
    @eddienko
    Hi, quick question: what does the "orange" colour mean in the Dask Web UI bokeh interface? Cannot find it in the docs!
    dask_orange.png
    Eduardo Gonzalez
    @eddienko
    Oh, thanks, I missed that one!
    Benjamin Zaitlen
    @quasiben
    @eddienko perhaps that information could also be noted here: https://docs.dask.org/en/latest/diagnostics-distributed.html . Any interest in submitting a PR ?
    Understand if you are busy, but you are not the first person to miss that page
    Eduardo Gonzalez
    @eddienko
    I would have expected it to be in https://distributed.dask.org/en/latest/web.html
    Benjamin Zaitlen
    @quasiben
    or there
    Eduardo Gonzalez
    @eddienko
    cool. I may submit a PR ;-)