GFleishman
@GFleishman
I don't remember any deprecation warnings about this function previously and there doesn't seem to be anything about it in the source. I guess it's weird that click would remove it (or relocate it, not sure) in v8.0 w/o having given any warnings? I guess I can ask them about it.
Martin Durant
@martindurant
That’s a good idea. Maybe post the issue just with your original situation and link to it too - then we can find out if pip is doing this to others too.
Matthew Rocklin
@mrocklin
Hrm, I'm apparently double booked during the maintenance meeting this morning with something that is somewhat important. My apologies. I'll try to sneak out of my other meeting, but it's looking unlikely.
Martin Durant
@martindurant
Is anybody interested in getting involved in Google Summer of Code? NumFOCUS is a host organisation, so we could enter a project, if we can come up with a good idea of what that project (dask or dask-adjacent) might be. I don’t have such an idea yet! I would gladly try to brainstorm one, though, and co-mentor. In my experience, you shouldn’t expect to get much work out of GSoC (training the coder will take as much effort as doing the original thing would have, on average), but it’s a good way to widen our contributor pool, be forced to come up with ideas, and practice mentorship.
Julia Signell
@jsignell
I'm going to miss the maintenance meeting this morning. My report is same as usual though
kpasko
@kpasko
anyone have issues getting any of the docker images to read parquet? it seems they don't include the fastparquet or pyarrow packages by default, though it could of course just be my naivete in deployment
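One workaround for the missing parquet engines is to extend the stock image. This is a sketch, assuming the `daskdev/dask` image name and that `pip` is available inside it:

```dockerfile
# Hypothetical Dockerfile adding parquet support to the stock dask image
FROM daskdev/dask:latest
RUN pip install fastparquet pyarrow
```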
Matthew Rocklin
@mrocklin
Dask array team ^ ??
Benjamin Zaitlen
@quasiben
We're still planning on a release this Friday, correct ?
Benjamin Zaitlen
@quasiben
Came across a paper on folks trying to build a multi-backend execution engine for Python:
http://cidrdb.org/cidr2021/papers/cidr2021_paper08.pdf
Martin Durant
@martindurant
Yes saw it. Feels like Blaze… (joking, a little)
James Bourbeau
@jrbourbeau
I'm gonna have to miss the meeting today. Last week: released Dask and Distributed 2021.02.0, continued master->main changes with @jsignell, tried to be active on new issues / PRs. This week: similar things.
kirikov
@kirikov
Hello Dask community, my name is Kirill and I'm CTO of http://datrics.ai/. We're developing a no-code data-science platform built on dask. We have problems with memory leaks and performance, and we need help or some consultation from folks who know dask deeply.
If you're interested, please DM me. Thank you!
Martin Durant
@martindurant
Some of the dask-involved companies such as Anaconda and Quansight might be interested in a consulting contract, if you were to get in touch with them directly. Obviously, it’s in the interests of Dask in general to solve memory issues, so if you can make your situation, or a good proxy for it public, you might get more help.
Itamar Turner-Trauring
@itamarst
so I'm looking at this bug where there's a dataframe with a column with dtype object, and it stores datetime objects rather than strings
but the metadata heuristics just assume that dtype("O") means str
my first thought is that meta generation should be given some data from the underlying series, so it can guess based on real data
which might be intrusive, but might work
Julia Signell
@jsignell
Try setting the _meta by hand and see if it works
I think I did try that on the MRE and it didn't quite fix it
Itamar Turner-Trauring
@itamarst
(I have a smaller reproducer now BTW, the arrow stuff was a distraction, just how the end user found it)
Julia Signell
@jsignell
you can set the meta using ddf._meta = ddf.head()
Can we move this discussion to the issue actually?
Itamar Turner-Trauring
@itamarst
will do
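As an aside, the ambiguity in that exchange is easy to see with plain pandas, no dask needed: `dtype("O")` carries no information about what the column actually holds, which is why meta inference can only guess:

```python
import datetime

import pandas as pd

# Two series with identical dtype but very different contents:
s_str = pd.Series(["a", "b"], dtype=object)
s_dt = pd.Series([datetime.datetime(2021, 1, 1)], dtype=object)

# Both report dtype('O'); nothing distinguishes strings from datetimes,
# so any heuristic that maps object -> str will misjudge the second one.
print(s_str.dtype, s_dt.dtype)  # object object
```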
Ali Kefia
@alikefia
Hello, I am new to dask, quick question: the Nanny and Worker classes share some logic to check the parameters; is there any reason not to delegate this to ONE side?
A side effect on temporary-directory: we concatenate the suffix dask-worker-space twice onto the configured tmp folder.
jakirkham
@jakirkham
Hi Ali, could you please raise this as issue on Distributed? Also please feel free to drop that link here so people can follow the conversation. Thanks! 😀
Ali Kefia
@alikefia
Sure @jakirkham !
Ali Kefia
@alikefia
I will work on a PR and ask for reviews :)
Sebastian Berg
@seberg
I think I had asked before, and pretty sure it is fine. But Dask does not expect numpy to forward "invalid" arguments to ufuncs, right? Something like np.add(dask_arr1, dask_arr2, dask_specific_argument="value")? I am cleaning out the code in NumPy and that includes checking argument names (not the actual values) up-front before dispatching to __array_ufunc__.
Matthew Rocklin
@mrocklin
Hrm, I'm not sure I know of a good use case for this currently. In general I would not expect Numpy to properly handle Dask specific keywords, so maybe this question is moot?
Sebastian Berg
@seberg
Yeah, should be moot and I was just being paranoid. In any case, if anyone notices a change just let me know.
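For readers following along, here is a toy version of the protocol in question; `Wrapped` is a hypothetical stand-in for a dask-like duck array. NumPy routes ufunc calls on such objects through `__array_ufunc__`, forwarding the keyword arguments, so with Sebastian's change only keyword names NumPy itself recognizes would ever reach the wrapper:

```python
import numpy as np

class Wrapped:
    """Hypothetical dask-like duck array intercepting ufunc calls."""

    def __init__(self, data):
        self.data = np.asarray(data)

    def __array_ufunc__(self, ufunc, method, *inputs, **kwargs):
        # NumPy forwards the ufunc, the method name ('__call__',
        # 'reduce', ...), the operands, and any keyword arguments.
        arrays = [x.data if isinstance(x, Wrapped) else x for x in inputs]
        return Wrapped(getattr(ufunc, method)(*arrays, **kwargs))

out = np.add(Wrapped([1, 2]), Wrapped([3, 4]))
print(out.data.tolist())  # [4, 6]
```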
Doug Friedman
@realdoug_twitter

Hello, apologies if i'm not asking in the right place here, but:

I noticed that running mypy on a file that uses dask results in Skipping analyzing 'dask.bag': found module but no type hints or library stubs
Assuming it's done with consideration and very incrementally, is there openness to receiving PRs that add support for the typing module and related tooling to dask?

I am a relative newcomer to dask so I could be totally off base here but figured i'd ask! Thanks!

Martin Durant
@martindurant
I think that falls under “we haven’t got around to it”. Certainly, typing has not been a priority. I expect that implementing it would be quite an undertaking. We do not run mypy in CI. Probably no one is outright opposed, so long as it doesn’t complicate the code too much.
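In the meantime, a downstream project can silence the "no type hints or library stubs" error per-package. This is a standard mypy config sketch (in mypy.ini, or the equivalent setup.cfg/pyproject section):

```ini
# mypy.ini — treat dask as untyped instead of erroring on missing stubs
[mypy-dask.*]
ignore_missing_imports = True
```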
Doug Friedman
@realdoug_twitter
numpy supports it now so that might be a starting point for the portions that mimic the numpy api
Jason Wagner
@keegean1_gitlab
hello
I'm trying to use dask to load a csv file, and I've specified the dtypes and the column names. When I call n = df['item'] and then call n.compute(), I get an error saying it's trying to convert 'item' from object to float. The item is actually a float. I know all my data in the csv file are valid numbers.
jakirkham
@jakirkham
@keegean1_gitlab would suggest either raising an issue or StackOverflow question. This is intended for library development discussions
Martin Durant
@martindurant
@jcrist @TomAugspurger @jacobtomlinson - would appreciate if Doug and I were also invited to any gateway/deploy sync up you plan.
niranda perera
@nirandaperera
Hi all, I would like to write a custom dask scheduler. Could you please point me to some entrypoints/ documentation regarding this?
Martin Durant
@martindurant
@nirandaperera ! For a local scheduler, you need to essentially implement get, see dask.get or dask.threaded.get. You might find the dask-on-ray executor interesting (which defers scheduling to ray).
For a distributed scheduler, it would be quite an undertaking. You would start by looking through the current implementation. Note that others have written schedulers, e.g., https://github.com/It4innovations/rsds
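As a rough illustration of the `get` contract Martin mentions: a scheduler receives a graph (a dict mapping keys to tasks, where a task is a tuple headed by a callable) and a key to evaluate. `simple_get` and the string-only keys are simplifications; real dask graphs also allow tuple keys and nested task lists:

```python
from operator import add, mul

def simple_get(dsk, key):
    """Toy synchronous scheduler: evaluate `key` in a dask-style graph.

    Tuples headed by a callable are tasks; string arguments naming a
    graph key are resolved (and cached) recursively; anything else is
    treated as plain data.
    """
    cache = {}

    def resolve(x):
        if isinstance(x, str) and x in dsk:
            if x not in cache:
                cache[x] = resolve(dsk[x])
            return cache[x]
        if isinstance(x, tuple) and x and callable(x[0]):
            func, *args = x
            return func(*(resolve(a) for a in args))
        return x

    return resolve(key)

dsk = {
    "a": 1,
    "b": (add, "a", 2),   # b = a + 2 = 3
    "c": (mul, "b", 10),  # c = b * 10 = 30
}
print(simple_get(dsk, "c"))  # → 30
```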
rajeshkumarrs
@rajeshkumarrs
I am trying to contribute for the first time. Is this the right forum to ask for help/direction? I have picked up a “good first issue”, dask/dask#6812. Is this issue about adding a meaningful error message in to_csv()? I tried to look for the "File Not Found" error message in ./dataframe/io/csv.py, which has the to_csv method, but I couldn't find it. Can you please help me confirm whether I am looking in the right place for this issue? If not, can you please provide direction? Thank you!
Martin Durant
@martindurant
If you have specific questions about an issue, you should ask on that issue itself. The file-not-found error (FileNotFoundError, I assume) is likely coming from fsspec or the builtin open function.
Julien Muniak
@Darune
Hi, is this a correct place to talk about gcsfs as well?
Julien Muniak
@Darune

I'm trying to run the tests, and it seems some tests that depend on recursive=True are failing in my environment. Tracking down why led me to fsspec trying to run _get_file() on a "folder" (if we can call them that), and it fails with a 404. One way to fix this is to use a glob in the test (adding a *) so that only files are sent to _get_file().
But I'm confused: the CI tells me the test is passing, so I'm wondering whether this should be fixed at all, or whether my test environment is just not set up well.

If it should be fixed, should we fix the test itself, or the logic of fsspec / gcsfs when using recursive=True? The second option would be a breaking change; some might rely on the current behavior.

Martin Durant
@martindurant
probably an issue on gcsfs is best, if you can make a reproducer.
Julien Muniak
@Darune
sure will do
Daniel Mesejo-León
@mesejo

Hello everyone! I'm not sure whether this is a bug (or where it is), but I found this behavior confusing:

    s_2 = pd.Series(1, index=[*range(20, 0, -1)])
    ds_2 = dd.from_pandas(s_2, npartitions=5)

    # works
    assert_eq(s_2.index.to_series(), ds_2.index.to_series())

    # doesn't work
    assert s_2.index.to_series() == ds_2.index.to_series().compute()

Basically the first assert works, but the second does not. Why is that? Moreover, when I do:

print(ds_2.index.to_series().compute().tolist())

The output is:

[1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20]

The reverse of the expected result.