Pier
@PhenoloBoy
It seems that in the upcoming 6 months there will be a beta for netCDF that will use Zarr, but right now it's more gossip than anything else (at least to my understanding)
Rob Fatland
@robfatland
@rabernat Fantastic, I was hoping you'd be available.
Charles Blackmon-Luca
@charlesbluca
Quickly tossed together some notebooks to generate a catalog of all the data on gs://pangeo-data; my idea is that one day this could be automated with a script that runs on a regular basis
Rob Fatland
@robfatland
If nobody minds, I'm going to hijack appear.in/pangeo from 5pm to 5:30pm PDT today for a conversation on Megaptera, our citizen science whale call identification ML project. LMK if there are any conflicts and I'll bang on over to Zoom.
Anderson Banihirwe
@andersy005

RE: For some reason, @tjcrone and I are having trouble making intake work with s3. I think Tim will open an intake issue.

@rabernat & @tjcrone, did you figure this out? Could you expand on what the exact issue was? I am trying to create static intake catalogs pointing to CESM LENS data in S3 and I seem to be having some issues when accessing the data.

Joe Hamman
@jhamman
@robfatland :thumbsup:
Tom Augspurger
@TomAugspurger
Ping me if you’re having issues with s3fs. There’s been some churn lately.
Filipe
@ocefpaf
Did anyone lose a laptop charger during the meeting last week?
Ryan Abernathey
@rabernat
@tjcrone - could you open an intake issue about the s3fs / intake problem we ran into? I don't have the code to reproduce on my machine.
Ryan Abernathey
@rabernat
Satpy / pyresample experts. What is the best way to serialize an area definition and store it in an xarray dataset?
Satpy does some of this stuff internally, but I want to roll my own.
@djhoese
Philip Austin
@phaustin
@ocefpaf Pretty sure that's my Mac charger -- feel free to gift it to some visitor/grad student who finds themselves without one.
Rob Fatland
@robfatland
@ocefpaf I also lost a charger: a black Surface charger with a blade-like connector that clips on via a magnet; it also has a spare USB output port
Ryan Abernathey
@rabernat
ocean.pangeo.io seems to be down
I just added a node to the core pool
Yuvi Panda
@yuvipanda
I am going to submit a PANGEO talk to the local kubernetes meetup here
Primarily as a recruiting tool :D
Probably do the sea level rise demo, and phrase it like 'fight climate change with kubernetes' or something like that
Yuvi Panda
@yuvipanda
and then hopefully I can go around giving this talk in many places
I know this isn't all that PANGEO does but it's a good sell
Philip Austin
@phaustin
@yuvipanda — if you wind up with an AMI/gce image for live demos like this we would definitely be interested in reusing it — I gave a Math Dept. Jupyter Day pangeo talk yesterday using TLJH on EC2 and @mrocklin ’s https://www.youtube.com/watch?v=nH_AQo8WdKw notebook as a dask motivator and it went smoothly.
Ryan Abernathey
@rabernat
@yuvipanda :+1:
Yuvi Panda
@yuvipanda
submitted it! I've called myself a 'part of the PANGEO project', which I hope isn't overstating it!
@phaustin will do! I think I'll mostly cannibalize other folks' talks, but tune them to a devops audience
with that my keyboard time for the day is up! ttyl!
Ryan Abernathey
@rabernat
you are definitely "part of the PANGEO project"! In fact, it's hard to imagine where we would be without you!
Ian Rose
@ian-r-rose
:arrow_double_up:
David Hoese
@djhoese

@rabernat Check out my work on geoxarray: geoxarray/geoxarray#13

The way I see it you have a couple choices. Mainly depends on what your goal is for the representation/serialization.

If we are talking about the best way to make a serializable version of a grid/area definition, your main issue is the projection. Well Known Text (WKT) is supposed to be the most fully defined way of describing a projection. PROJ strings apparently can't fully describe all projections. I've been leaning towards pyproj's CRS objects as my preferred container for projection information since it can convert to PROJ strings, PROJ dicts, WKT (different versions), or a CF grid mapping variable. There was also a discussion between @snowman2 and @dopplershift about seeing how possible it would be to use pyproj's CRS objects to replace cartopy's CRS objects (or at least making cartopy's based on pyproj's).

You then have your extent information. Do you force them all to be the same or allow for all the variations? lower-left + upper-right + number of rows/columns, upper-left + pixel size + number of rows/columns, some other combination of these types of parameters, etc.

Are you going to define a single object that defines this information (pyresample's AreaDefinition) or encode it entirely in the xarray Dataset? If in the Dataset, do you depend on the x and y coordinates to define the extents? Or do you have separate attributes/coordinates?
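
A minimal sketch of the pyproj route @djhoese describes, assuming pyproj >= 2.2 (the PROJ string and the grid-mapping variable name are just illustrative):

```python
from pyproj import CRS
import xarray as xr

# Build a CRS; from_wkt, from_epsg, and from_dict work the same way.
crs = CRS.from_proj4("+proj=lcc +lat_1=33 +lat_2=45 +lon_0=-97 +datum=WGS84")

wkt = crs.to_wkt()      # WKT: the most fully defined textual representation
proj4 = crs.to_proj4()  # PROJ string: may lose detail for some projections
cf = crs.to_cf()        # dict of CF grid mapping attributes

# One option for encoding it in a Dataset: a dimensionless "grid mapping"
# variable carrying the CF attributes, per the CF conventions.
ds = xr.Dataset()
ds["crs"] = xr.DataArray(0, attrs=cf)
```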

Filipe
@ocefpaf
@robfatland That's the one we found. Don has it.
Kai Mühlbauer
@kmuehlbauer
Hi, short question. I want to test https://binder.pangeo.io/ with one of my notebook repos. The data used in this repo is normally cloned to the system the notebooks are running on and made available to the notebooks via env variables. Is there a simple way to achieve this? If not, what are my options?
Tim Head
@betatim
if you put a start file in your repo (https://repo2docker.readthedocs.io/en/latest/config_files.html#start-run-code-before-the-user-sessions-starts) you can run commands when your repo is launched
you could also bake the data into the image by using a postBuild file. or you use a data format which allows random access over the network and only read what you need when you need it
depends a bit on how big your data is and how much of it you need
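For what it's worth, a sketch of the start-file route: the repo2docker docs show start as a shell script that ends with exec "$@", and any executable that hands off to the passed-in command the same way should work. A Python equivalent might look like this (the data repo URL, path, and env variable name are all hypothetical):

```python
#!/usr/bin/env python3
# Hypothetical executable "start" file at the repo root.
import os
import subprocess
import sys

DATA_DIR = os.path.expanduser("~/data")  # illustrative location

# Clone the data once per container start; a shallow clone keeps launch fast.
if not os.path.isdir(DATA_DIR):
    subprocess.run(
        ["git", "clone", "--depth", "1",
         "https://github.com/example/notebook-data", DATA_DIR],  # hypothetical repo
        check=True,
    )

# Expose the data location the way the notebooks expect it.
os.environ["NOTEBOOK_DATA"] = DATA_DIR

# Hand off to the notebook server command repo2docker passes in.
os.execvp(sys.argv[1], sys.argv[1:])
```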
Kai Mühlbauer
@kmuehlbauer
@betatim Thanks for the hints. How much can I safely bake into the container? I need to check how much data it is. If it fits within your suggested size, I would bake it in instead of pulling it every time.
Tim Head
@betatim
If it is more than ~1000MB I'd think about only pulling it when you actually need it (so neither postBuild nor start), because otherwise it either makes the image huge (slow startup times) or gives your binder a lot to do on startup (slow startup times :) ), as well as a lot of network traffic
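
And a sketch of the random-access option, here with a Zarr store on GCS (the bucket path and variable name are made up):

```python
import gcsfs
import xarray as xr

# Anonymous access, assuming the bucket is public.
fs = gcsfs.GCSFileSystem(token="anon")
store = fs.get_mapper("pangeo-data/example-dataset.zarr")  # hypothetical path

# Opening is lazy: only metadata is read here.
ds = xr.open_zarr(store)

# Only the chunks covering this selection are actually transferred.
subset = ds["precip"].sel(time="2019-01").load()
```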
David Brochart
@davidbrochart
I'm going to update the GPM precipitation dataset at gs://pangeo-data/gpm_imerg_early, because a new version has been released and the TRMM era has been merged, so it's now available from June 2000 instead of 2014. I will also include the probabilityLiquidPrecipitation field, along with the precipitationCal field, so that we have information about solid precipitation.
Kai Mühlbauer
@kmuehlbauer
@betatim Thanks again, I'll think about restructuring everything.
David Hoese
@djhoese
@kmuehlbauer (hi from another non-vispy project) In some of my satpy repositories I've had a download script run in the background from the start script. The data being downloaded was on Google's cloud storage, though, so I wasn't too worried about the network traffic for the ~3GB I was downloading.
Scott
@scollis
@kmuehlbauer I use postBuild for data
Scott
@scollis
Are folks having issues today with postBuilds? My jupyter labextension install is throwing errors.
According to the tracker it should be fixed.
I wonder if the docker image needs updating
@scottyhq ...
David Hoese
@djhoese
@scollis The issue with doing the data download in postBuild is that it all gets bundled into the image. If the image takes too long to download, it can stop your image from being loaded in time and BinderHub will time out.
If you do the data download in start without putting it in the background, a similar thing happens: BinderHub waits for JupyterLab to start up, but it won't come up in time because the data is still downloading.
(from experience)
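
A minimal variant of the start sketch above that sidesteps this: kick off the download in the background, then exec the server right away (the helper script name is hypothetical):

```python
#!/usr/bin/env python3
# Hypothetical "start" file: launch the download in the background so the
# notebook server can come up before BinderHub's startup timeout.
import os
import subprocess
import sys

subprocess.Popen([sys.executable, "download_data.py"])  # not waited on
os.execvp(sys.argv[1], sys.argv[1:])  # start the server immediately
```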
Scott
@scollis
Great point! One issue is that the current way I am building my image uses a Pangeo Docker container as a base, and somehow this stops start from being run