    Yuvi Panda
    @yuvipanda
    @jcrist @mrocklin kubecon 2020 in March has CfP closing Dec 4. I think a talk on Dask would be very appropriate and well received
    Jim Crist-Harif
    @jcrist
    Ooo, I'll submit something. Thanks for the ping.
    Yuvi Panda
    @yuvipanda
    @jcrist yw! I pinged in the PANGEO discourse too
    Matthew Rocklin
    @mrocklin
    cc @jacobtomlinson , this is close by in Amsterdam. I'm not sure how this aligns with your schedule though.
    David Hoese
    @djhoese
    quick sanity check: As far as everyone knows dask fully supports python 3.8, right? Had a Satpy user say they had a weird multiprocessing error on python 3.8. I've asked them for a simple example so I can try to reproduce, but thought I'd check here too.
    David Hoese
    @djhoese
    looks like it is probably a false alarm. Maybe the user's environment or something Satpy-specific
    Иван Сердюк
    @oceanfish81_twitter
    Hi there
    anyone interested in giving a talk on Dask in Kiev, Ukraine (May 22-23, 2020)?
    Kolmar Kafran
    @kafran

    Hello everyone. I'm reading JSON with the following structure. result is tabular data and I want to load it into a DataFrame. I'm getting it with .pluck('result'), which returns a list of dictionaries. But when I convert it with to_dataframe(), the dataframe has 500k+ columns =/

    {
        "dateTime": "2019-12-01 00:00:00",
        "dateTimeMilliseconds": 0000000000000,
        "operationID": 0000000,
        "error": null,
        "result": [
            {dictionary with columns: values},
            {...},
            {...}
        ]
    }

    If I pass it to .from_sequence() and then .to_dataframe(), it works, but it eats all my RAM

    Yuvi Panda
    @yuvipanda
    how do I get logs from dask-kubernetes' scheduler?
    I ask it for a worker, but it doesn't seem to create the pod
    nor output errors
    Martin Durant
    @martindurant
    Should be on the info tab of the dashboard, button at the top
    Yuvi Panda
    @yuvipanda
    hmm, all looks fine
    distributed.scheduler - INFO - Receive client connection: Client-4db8889a-1531-11ea-804e-0edc9fd66343
    aha, if I run cluster.scale(1), after a while I get:
    Task exception was never retrieved
    future: <Task finished coro=<_wrap_awaitable() done, defined at /opt/conda/envs/geostack/lib/python3.7/asyncio/tasks.py:596> exception=AssertionError()>
    Traceback (most recent call last):
      File "/opt/conda/envs/geostack/lib/python3.7/asyncio/tasks.py", line 603, in _wrap_awaitable
        return (yield from awaitable.__await__())
      File "/opt/conda/envs/geostack/lib/python3.7/site-packages/distributed/deploy/spec.py", line 42, in _
        assert self.status == "running"
    AssertionError
    tornado.application - ERROR - Exception in callback functools.partial(<bound method IOLoop._discard_future_result of <zmq.eventloop.ioloop.ZMQIOLoop object at 0x7f0da548f550>>, <Task finished coro=<SpecCluster._correct_state_internal() done, defined at /opt/conda/envs/geostack/lib/python3.7/site-packages/distributed/deploy/spec.py:284> exception=AssertionError()>)
    Traceback (most recent call last):
      File "/opt/conda/envs/geostack/lib/python3.7/site-packages/tornado/ioloop.py", line 743, in _run_callback
        ret = callback()
      File "/opt/conda/envs/geostack/lib/python3.7/site-packages/tornado/ioloop.py", line 767, in _discard_future_result
        future.result()
      File "/opt/conda/envs/geostack/lib/python3.7/site-packages/distributed/deploy/spec.py", line 317, in _correct_state_internal
        await w  # for tornado gen.coroutine support
      File "/opt/conda/envs/geostack/lib/python3.7/site-packages/distributed/deploy/spec.py", line 42, in _
        assert self.status == "running"
    AssertionError
    Yuvi Panda
    @yuvipanda
    is there a way to make logs very verbose?
    I can tell that no calls are being made to kubernetes
    so it's somewhere in dask
    Martin Durant
    @martindurant
    :)
    Yuvi Panda
    @yuvipanda
    nothing seems to help unfortunately. I think what's happening is the scheduler process isn't starting, but I actually don't know if the scheduler is an external process or not
    Matthew Rocklin
    @mrocklin
    @yuvipanda you may want cluster.pods() and cluster.logs()
    (if you're using dask-kubernetes)
    Yuvi Panda
    @yuvipanda
    I am
    the cluster object doesn't seem to have either method
    conda-forge's dask is at v0.15? or am I misreading https://anaconda.org/conda-forge/dask?
    (sorry for all the dumb questions!)
    yuvipanda @yuvipanda tries unpinning dask in environment file
    Martin Durant
    @martindurant
    Says 2.8.1
    Yuvi Panda
    @yuvipanda
    aaah
    it says that for linux noarch
    but has 4 badges saying 0.15
    my bad
    Yuvi Panda
    @yuvipanda
    hmm, LocalCluster has a .logs and shows me logs
    while KubeCluster does not
    And LocalCluster does work
    Yuvi Panda
    @yuvipanda
    hmm, I guess this is how people new to JupyterHub feel :D
    I'm gonna go figure out how to get logs to go somewhere not stderr
    Matthew Rocklin
    @mrocklin
    maybe get_logs?
    There is, I believe, some logs method there
    Yuvi Panda
    @yuvipanda
    @mrocklin yeah! except there isn't much there, so I was trying to set it to debug. After doing so in ~/.dask/config.yaml, nothing there either
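[Editorial note: for reference, distributed's log levels can be raised through Dask's configuration file. A sketch of such a config, assuming a recent Dask release, which reads from ~/.config/dask/ rather than the older ~/.dask/ location:]

```yaml
# ~/.config/dask/dask.yaml  (older versions read ~/.dask/config.yaml)
logging:
  distributed: debug
  distributed.scheduler: debug
  distributed.client: debug
```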
    Yuvi Panda
    @yuvipanda
    hmm, I reverted back to an image used in a different hub that I know is working
    so this is confusing
    Yuvi Panda
    @yuvipanda
    aaargh, it was a typo! My RoleBinding referred to a ClusterRole and not a Role
    why wasn't this in the k8s audit logs?!
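[Editorial note: a sketch of the kind of typo described, with illustrative names. A namespaced RoleBinding can legally reference a ClusterRole, but if the permissions were defined in a Role, the roleRef kind must say Role, or the binding silently grants nothing useful:]

```yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: dask-kubernetes
  namespace: dask
subjects:
- kind: ServiceAccount
  name: dask-kubernetes
  namespace: dask
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: Role        # the typo described above had ClusterRole here
  name: dask-kubernetes
```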