Philipp Rudiger
@philippjfr
Cool, it's still not picking up on my custom conda env unfortunately. This should work, right?
from dask_kubernetes import KubeCluster
cluster = KubeCluster(env={'PATH': '/home/jovyan/my-conda-envs/datashader_dev/bin:$PATH'}, n_workers=2)
Scott Henderson
@scottyhq
hmm, so that used to work! but i just checked and it no longer does the trick. maybe due to changes in the way repo2docker sets up the conda environment (pangeo-data/pangeo-stacks#47)
i thought maybe adding 'CONDA_PREFIX' or 'CONDA_DEFAULT_ENV' would help, but at some point when workers are initialized, the 'notebook' environment keeps ending up in front: 'PATH': '/srv/conda/envs/notebook/bin:/srv/conda/condabin:/home/jovyan/my-conda-envs/dask-minimal/bin:/srv/conda/condabin:/home/jovyan/.local/bin:/home/jovyan/.local/bin:/srv/conda/envs/notebook/bin:/srv/conda/bin:/srv/npm/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin'
you can check all the environment variables on the workers with client.run(lambda: os.environ)
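For reference, a minimal sketch of that inspection trick (assuming an existing cluster object; the names here are illustrative):
import os
from dask.distributed import Client

client = Client(cluster)
# client.run executes the function on every worker and returns
# a dict mapping worker address -> result
worker_envs = client.run(lambda: dict(os.environ))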
Philipp Rudiger
@philippjfr
Ah nice, didn't know about client.run; I'd been using the much hackier delayed(fn)().compute().
Scott Henderson
@scottyhq
@philippjfr - this seems to work!
import sys
cluster = KubeCluster(env={'NB_PYTHON_PREFIX': sys.prefix})
Philipp Rudiger
@philippjfr
Cool! I was hacking the sys.path after the fact, but that seems a bit nicer.
One more thing, does anyone have a really large zarr dataset I could test distributed regridding code on?
Joe Hamman
@jhamman
Most (maybe all) of these datasets are in GCP central though.
Scott Henderson
@scottyhq
we can move any of those you might be interested in experimenting with to s3://pangeo-data-useast1, which is accessible from nasa.pangeo.io (there are only a few datasets in that bucket currently)
Philipp Rudiger
@philippjfr
That would be really great. Out of those, the Hydro cgiar_pet seems perfect. The hydrosheds seem interesting too, but I'm not quite clear on how they are structured; is there a time dimension that's not made explicit?
David Brochart
@davidbrochart
@philippjfr The hydrosheds datasets don't have any time dimension. They are GDAL VRTs, you can open them with e.g.:
import xarray as xr
import gcsfs

# Download the VRT from GCS, then open it locally as an xarray DataArray
fs = gcsfs.GCSFileSystem('pangeo-data')
fs.get('pangeo-data/hydrosheds/acc.vrt', './acc.vrt')
da = xr.open_rasterio('./acc.vrt')
# Select band 1 over a 1x1 degree window; the 0.1 power compresses the range for plotting
(da.sel(band=1, x=slice(-60, -59), y=slice(1, 0)) ** 0.1).plot()
Tina Odaka
@tinaok
@kmpaul @guillaumeeb @willirath, I'm just back from holiday and trying to catch up on the situation. Does anyone have any updates on Pangeo participation at SC19? I think it would be a great occasion to talk about HPC integration of Pangeo and optimisation, and also to show our test cases and benchmarks.
Kevin Paul
@kmpaul
@tinaok Yes! SC19 is in my backyard this year, so we will be there. There is a workshop on interactive HPC that I think a lot of Pangeo work would fit into well. However, there are other workshops that would be good fits for other topics, such as (maybe) a scientific data reduction workshop, a workshop on cloud-HPC interoperability, and maybe others. I think the interactive HPC workshop would be a good venue, though.
@tinaok Actually, I’ve been meaning to connect with you on this for a while, but I’ve been busy and forgot. Thanks for reaching out!
Tina Odaka
@tinaok
@kmpaul Wonderful! I am looking for someone with whom I can make a joint presentation on usage of Pangeo (my domain is more benchmarking and optimal usage of HPC, but I can expand) at a workshop or BoF. Interactive HPC sounds good too.
Kevin Paul
@kmpaul
I’d be happy to join you in that, @tinaok. Count me in!
Scott Henderson
@scottyhq
@davidbrochart and @philippjfr I've gone ahead and copied hydrosheds over to S3, so you can now access it from either hub:
import xarray as xr
import s3fs

# requester_pays: the requester's AWS account is billed for the transfer
fs = s3fs.S3FileSystem(anon=False, requester_pays=True)
fs.get('pangeo-data-useast1/hydrosheds/acc.vrt', './acc.vrt')
da = xr.open_rasterio('./acc.vrt')
da
Philipp Rudiger
@philippjfr
@scottyhq Great, does that include the cgiar_pet dataset?
Also my server seems to keep crashing on nasa.pangeo.io:
Server Connection Error
Invalid response: 503
Scott Henderson
@scottyhq
hmm.. I'm seeing Evicted Pods on the cluster, no idea what the cause might be. Let me know next time you encounter the error and I might be able to glean more.
I'll transfer cgiar_pet as well.
Joe Hamman
@jhamman
@scottyhq - I just got a connection error too!
Scott Henderson
@scottyhq
hmm...
message: 'The node was low on resource: ephemeral-storage. Container notebook was using 28Ki, which exceeds its request of 0.'
phase: Failed
reason: Evicted
Joe Hamman
@jhamman
hmmm, for your info, I hadn't done anything in my session yet. It was just created and then died almost immediately.
Scott Henderson
@scottyhq
not exactly sure what fills it up, but it looks like all the nodes are still launching with the default 20GB EBS disk (we updated new clusters to 100GB)
Scott Henderson
@scottyhq
in fact we documented this earlier! pangeo-data/pangeo-cloud-federation#274
Scott Henderson
@scottyhq
I'll have a bit of time this afternoon to make sure it's updated.
David Hoese
@djhoese
@jhamman At scipy you talked about how pangeo's binder can scale down to 0. Any idea how long it keeps things alive before shutting everything down? What about time to start up? What about the timeout on individual inactive JLab sessions?
I was thinking that if I load a repository X minutes before a tutorial, then everyone should have speedy access to their JLab session. However, if I do it too early then we'll have to wait anyway.
Joe Hamman
@jhamman
I think the Jupyter session timeout is something like 60 minutes. K8s nodes scale down after that, and I don't recall the rate. You would certainly have 5-10 minutes though.
David Hoese
@djhoese
thanks
Kolmar Kafran
@kafran
'{:,.2f}'.format(1234.1234) results in '1,234.12'. How can I get '1.234,12', since '{:.,2f}'.format(1234.1234) is not valid?
David Hoese
@djhoese
@kafran You may want to look at the locale module and the n string format specifier. Have you tried Stack Overflow? The best answer I found was:
In [32]: '{:.2n}'.format(1234.1234)
Out[32]: '1,2e+03'
which I think is a bug in Python, since it shouldn't be using exponent formatting
Nevermind, that might be intended, but I still can't find a way to do what you want
if you post a question on Stack Overflow, please link it here. I'm curious
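A minimal sketch of the locale approach (assuming a German-style locale such as de_DE.UTF-8 is installed; locale names vary by platform):
import locale

# LC_NUMERIC controls the decimal point and the thousands separator
locale.setlocale(locale.LC_NUMERIC, 'de_DE.UTF-8')
# grouping=True inserts the thousands separator: returns '1.234,12'
locale.format_string('%.2f', 1234.1234, grouping=True)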
matrixbot
@matrixbot
ashwinvis: @kafran Using a comma as the decimal separator might depend on locale settings. If you are on Linux, try localectl status
Kolmar Kafran
@kafran
@matrixbot I already tried it
Scott Henderson
@scottyhq
@philippjfr - staging.nasa.pangeo.io is back up if you want to keep experimenting
Sarah Bird
@birdsarah
quick poll in case anyone has experience - how steep is the learning curve (by your own definition) for kafka?
Matthew Rocklin
@mrocklin
Steep hill that is short enough that you can jog up it. Then it plateaus for a bit. Then there is a climb later on if you feel like going forward. There isn't really much to it as a system if you're just using it as storage. Also, some random notes on the Python APIs: http://matthewrocklin.com/blog/work/2017/10/10/kafka-python
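For the "just using it as storage" case, a minimal sketch with the kafka-python package (assuming a broker at localhost:9092; the topic name is illustrative):
from kafka import KafkaProducer, KafkaConsumer

# Write one message to a topic
producer = KafkaProducer(bootstrap_servers='localhost:9092')
producer.send('events', b'hello kafka')
producer.flush()

# Read it back from the beginning of the topic; stop if idle for 5 s
consumer = KafkaConsumer('events', bootstrap_servers='localhost:9092',
                         auto_offset_reset='earliest',
                         consumer_timeout_ms=5000)
for message in consumer:
    print(message.value)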
Sarah Bird
@birdsarah
thanks @mrocklin
David Hoese
@djhoese
Is anyone up to date on what Unidata's work on NetCDF and zarr compatibility actually entails? Mainly, how would I, the user, interact with or use this functionality? I'm guessing all GOES-16 ABI NetCDF files on GCS and S3 would have to be rewritten in zarr format and that format could be read by NetCDF? Or...something? Anyone know?
James A. Bednar
@jbednar
@rsignell-usgs would probably know...
Dr. Andreas Hopfgartner
@anderl80
Hi all, I have a Pangeo cluster in my gcloud account. I'm not so familiar with gcloud (more Azure); is there a way to shut down the cluster so that it incurs no costs?
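One possible approach, assuming a GKE cluster created with gcloud (the cluster and node-pool names below are placeholders): scaling the node pool to zero stops VM charges while keeping the cluster definition.
gcloud container clusters resize my-cluster --node-pool default-pool --num-nodes 0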
Matthew Rocklin
@mrocklin
Dask Jobqueue + SpecCluster rewrite issue here, if anyone wants to play around with HPC systems: dask/dask-jobqueue#306
Ryan May
@dopplershift
@djhoese The work going into netcdf-c will make zarr 'just' another on-disk format for the library to use, like HDF5 or netCDF-3. Really, though, this is a great question to send to the netCDF user list: https://www.unidata.ucar.edu/software/netcdf/mailing-lists.html (or support-netcdf@unidata.ucar.edu)
@djhoese As for the current data in S3 (and maybe GCS), there is support in the library now to make byte-range requests over HTTP, so I think you could do a direct read of HDF5-backed data in S3 using the library.
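A sketch of such a byte-range read through netCDF4-python (the URL below is illustrative, and this assumes the underlying netcdf-c was built with byte-range support, which is enabled by appending #mode=bytes to the URL):
import netCDF4

# '#mode=bytes' tells netcdf-c to read the remote file via HTTP
# byte-range requests instead of downloading it first
url = 'https://noaa-goes16.s3.amazonaws.com/path/to/some_abi_file.nc#mode=bytes'
ds = netCDF4.Dataset(url)
print(ds.variables.keys())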
David Hoese
@djhoese
@dopplershift any idea how it encodes that in the HTTP request? Doesn't that have to be understood by the server? I guess that feature is important enough that it was probably put in a while ago
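For what it's worth, byte ranges are plain HTTP: the client sends a standard Range header, which S3 and most web servers already understand, and the server replies with 206 Partial Content. A minimal illustration with the requests package (the URL is a placeholder):
import requests

# Ask the server for only the first 100 bytes of the object
r = requests.get('https://example-bucket.s3.amazonaws.com/file.nc',
                 headers={'Range': 'bytes=0-99'})
print(r.status_code)  # 206 if the server honors the Range header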