Pier
@PhenoloBoy
have a look at this. Even if it's far from perfect and looks like it was written by a monkey, it could help you. You'll have to re-adapt it, as the writing part isn't there. Unfortunately, I couldn't retest whether it's working as I'm a little bit busy
Scott
@scollis
Awesome, thanks!
Pier
@PhenoloBoy
the approach is unconventional and I don't suggest anybody follow it. From time to time, in some cases, it's the only solution I've figured out
Scott
@scollis
@jhamman is there any example that stores data from Kubernetes workers to a cloud store like Google Cloud Storage as a way of returning data?
Pier
@PhenoloBoy
Zarr or Parquet is your solution
Scott
@scollis
That's really nice @PhenoloBoy … it acts as a way to start thinking about stuff. Yeah, gotta learn to use Zarr
Pier
@PhenoloBoy
I asked @jhamman the same question a few days ago
the solution is to use Zarr or Parquet
it seems that in the next 6 months there will be a beta of netCDF that uses Zarr, but right now it's more gossip than anything else (at least as far as I understand)
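A minimal sketch of the pattern being suggested, where each worker writes its partial result to a shared store and returns only a small key instead of the data. Assumptions, all loudly stand-ins: a local temporary directory plays the role of a bucket such as a Google Cloud Storage path, JSON files play the role of Zarr chunks or Parquet partitions, and a thread pool plays the role of Kubernetes workers. `process_chunk` and `gather` are hypothetical names.

```python
import json
import tempfile
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path


def process_chunk(chunk_id: int, store: Path) -> str:
    """Compute a partial result on a worker and write it to the shared store.

    Only the key (a short string) is returned to the client, not the data.
    """
    result = [chunk_id * 10 + i for i in range(3)]  # placeholder computation
    key = f"chunk-{chunk_id:04d}.json"
    (store / key).write_text(json.dumps(result))  # stand-in for a Zarr/Parquet write
    return key


def gather(store: Path, keys: list[str]) -> list[int]:
    """Reassemble the full result on the client by reading each key back."""
    out: list[int] = []
    for key in sorted(keys):
        out.extend(json.loads((store / key).read_text()))
    return out


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        store = Path(tmp)  # stand-in for a bucket path
        with ThreadPoolExecutor(max_workers=4) as pool:
            keys = list(pool.map(lambda i: process_chunk(i, store), range(4)))
        print(keys)
        print(gather(store, keys))
```

With real data the same idea usually means each task writing its own region of a Zarr store (for example through an fsspec/gcsfs mapper) or its own Parquet partition, so that nothing large has to travel back through the scheduler.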
Rob Fatland
@robfatland
@rabernat fantastic, I was hoping you'd be available.
Charles Blackmon-Luca
@charlesbluca
Quickly tossed together some notebooks to generate a catalog of all the data on gs://pangeo-data; my idea is that one day a script could do this automatically on a regular basis
Rob Fatland
@robfatland
If nobody minds I'm going to hijack appear.in/pangeo from 5pm to 5:30pm PDT today for a conversation on Megaptera, our citizen science whale call identification ML project. LMK if anything conflicts and I'll bang on over to Zoom.
Anderson Banihirwe
@andersy005

RE: For some reason, @tjcrone and I are having trouble making intake work with s3. I think Tim will open an intake issue.

@rabernat & @tjcrone, did you figure this out? Could you expand on what the exact issue was? I am trying to create static intake catalogs pointing to CESM LENS data in S3 and I seem to be having some issues when accessing the data.

Joe Hamman
@jhamman
@robfatland :thumbsup:
Tom Augspurger
@TomAugspurger
Ping me if you’re having issues with s3fs. There’s been some churn lately.
Filipe
@ocefpaf
Did anyone lose a laptop charger during the meeting last week?
Ryan Abernathey
@rabernat
@tjcrone - could you open an intake issue about the s3fs / intake problem we ran into? I don't have the code to reproduce on my machine.
Ryan Abernathey
@rabernat
Satpy / pyresample experts. What is the best way to serialize an area definition and store it in an xarray dataset?
Satpy does some of this stuff internally, but I want to roll my own.
@djhoese
Philip Austin
@phaustin
@ocefpaf pretty sure that's my mac charger -- feel free to gift it to some visitor/grad student who finds themself without one.
Rob Fatland
@robfatland
@ocefpaf I also lost a charger: a black Surface charger with a blade-like connector that clips on via magnet; it also has a spare USB output port
Ryan Abernathey
@rabernat
ocean.pangeo.io seems to be down
I just added a node to the core pool
Yuvi Panda
@yuvipanda
I am going to submit a PANGEO talk to the local kubernetes meetup here
Primarily as a recruiting tool :D
Probably do the sea level rise demo, and phrase it like 'fight climate change with kubernetes' or something like that
Yuvi Panda
@yuvipanda
and then hopefully I can go around giving this talk in many places
I know this isn't all that PANGEO does but it's a good sell
Philip Austin
@phaustin
@yuvipanda — if you wind up with an AMI/gce image for live demos like this we would definitely be interested in reusing it — I gave a Math Dept. Jupyter Day pangeo talk yesterday using TLJH on EC2 and @mrocklin ’s https://www.youtube.com/watch?v=nH_AQo8WdKw notebook as a dask motivator and it went smoothly.
Ryan Abernathey
@rabernat
@yuvipanda :+1:
Yuvi Panda
@yuvipanda
submitted it! I've called myself a 'part of the PANGEO project', which I hope isn't overstating it!
@phaustin will do! I think I'll mostly cannibalize other folks' talks, but tune them to a devops audience
with that my keyboard time for the day is up! ttyl!
Ryan Abernathey
@rabernat
you are definitely "part of the PANGEO project"! In fact, it's hard to imagine where we would be without you!
Ian Rose
@ian-r-rose
:arrow_double_up:
David Hoese
@djhoese

@rabernat Check out my work on geoxarray: geoxarray/geoxarray#13

The way I see it you have a couple choices. Mainly depends on what your goal is for the representation/serialization.

If we are talking about the best way to make a serializable version of a grid/area definition, your main issue is the projection. Well Known Text (WKT) is supposed to be the most fully defined way of describing a projection. PROJ strings apparently can't fully describe all projections. I've been leaning towards pyproj's CRS objects as my preferred container for projection information since it can convert to PROJ strings, PROJ dicts, WKT (different versions), or a CF grid mapping variable. There was also a discussion between @snowman2 and @dopplershift about seeing how possible it would be to use pyproj's CRS objects to replace cartopy's CRS objects (or at least making cartopy's based on pyproj's).

You then have your extent information. Do you force them all to be the same or allow for all the variations? lower-left + upper-right + number of rows/columns, upper-left + pixel size + number of rows/columns, some other combination of these types of parameters, etc.

Are you going to define a single object that defines this information (pyresample's AreaDefinition) or encode it entirely in the xarray Dataset? If in the Dataset, do you depend on the x and y coordinates to define the extents? Or do you have separate attributes/coordinates?
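To make the extent options above concrete, here is a stdlib-only sketch converting between two of them: outer-edge corners plus grid shape (similar in spirit to pyresample's `area_extent`), and upper-left corner plus pixel size plus grid shape. It also recovers the outer edges from evenly spaced pixel-center coordinates, which is roughly what relying on a Dataset's x and y coordinates amounts to. The class and function names are hypothetical, and the projection (CRS) is deliberately left out.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CornerExtent:
    """Lower-left and upper-right outer pixel edges plus grid shape."""
    xmin: float
    ymin: float
    xmax: float
    ymax: float
    rows: int
    cols: int


@dataclass(frozen=True)
class OriginPixelExtent:
    """Upper-left outer corner, pixel size, and grid shape."""
    ulx: float
    uly: float
    pixel_x: float
    pixel_y: float  # positive; rows run downward from uly
    rows: int
    cols: int


def to_origin_pixel(e: CornerExtent) -> OriginPixelExtent:
    # Pixel size is the full span divided by the number of pixels.
    return OriginPixelExtent(
        ulx=e.xmin,
        uly=e.ymax,
        pixel_x=(e.xmax - e.xmin) / e.cols,
        pixel_y=(e.ymax - e.ymin) / e.rows,
        rows=e.rows,
        cols=e.cols,
    )


def to_corners(e: OriginPixelExtent) -> CornerExtent:
    # Walk from the upper-left corner across all columns and down all rows.
    return CornerExtent(
        xmin=e.ulx,
        ymin=e.uly - e.rows * e.pixel_y,
        xmax=e.ulx + e.cols * e.pixel_x,
        ymax=e.uly,
        rows=e.rows,
        cols=e.cols,
    )


def pixel_centers_x(e: OriginPixelExtent) -> list[float]:
    # x coordinates as they would typically appear on a Dataset: pixel centers.
    return [e.ulx + (i + 0.5) * e.pixel_x for i in range(e.cols)]


def edges_from_centers(centers: list[float]) -> tuple[float, float]:
    """Recover outer edges from evenly spaced centers (needs at least 2)."""
    step = (centers[-1] - centers[0]) / (len(centers) - 1)
    return centers[0] - step / 2, centers[-1] + step / 2
```

The round trip is only lossless for regular, unrotated grids; an irregular or rotated grid needs more than these six numbers.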

Filipe
@ocefpaf
@robfatland that's the one we found. Don has it.
Kai Mühlbauer
@kmuehlbauer
Hi, short question. I want to test https://binder.pangeo.io/ with one of my notebook repos. The data used in this repo is normally cloned to the system the notebooks are running on and made available to the notebooks via env variables. Is there a simple way to achieve this? If not, what are my options?
Tim Head
@betatim
if you put a start file in your repo (https://repo2docker.readthedocs.io/en/latest/config_files.html#start-run-code-before-the-user-sessions-starts) you can run commands when your repo is launched
you could also bake the data into the image by using a postBuild file. Or use a data format which allows random access over the network and only read what you need when you need it
depends a bit on how big your data is and how much of it you need
Kai Mühlbauer
@kmuehlbauer
@betatim Thanks for the hints. How much can I safely bake into the container? I need to check how much data it is. If it fits within your suggested size, I would bake it in instead of pulling it every time.
Tim Head
@betatim
if it is more than ~1000MB i'd think about only pulling it when you actually need it (so neither postBuild nor start) because it either makes the image huge (slow startup times) or gives your binder a lot to do on startup (slow startup times :) ) as well as a lot of network traffic
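One way to follow the "only pull it when you actually need it" advice, while keeping the env-variable convention from the original question, is a small fetch-on-first-use cache. This is a sketch: `DATA_DIR`, `get_file`, and the injected `fetch` callable are hypothetical, and the actual download (HTTP, a cloud storage client, etc.) is left to the caller.

```python
import os
from pathlib import Path
from typing import Callable


def data_dir() -> Path:
    # DATA_DIR is a hypothetical variable name; fall back to ~/data if unset.
    return Path(os.environ.get("DATA_DIR", Path.home() / "data"))


def get_file(name: str, fetch: Callable[[str, Path], None]) -> Path:
    """Return a local path for `name`, downloading it only on first use."""
    target = data_dir() / name
    if not target.exists():
        target.parent.mkdir(parents=True, exist_ok=True)
        partial = target.with_name(target.name + ".part")
        fetch(name, partial)     # download into a temporary file first
        partial.replace(target)  # rename so a half-finished download is never used
    return target
```

In a notebook, `fetch` could be as simple as `lambda name, dest: urllib.request.urlretrieve(BASE_URL + name, dest)`, where `BASE_URL` is a placeholder for wherever the data actually lives.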
David Brochart
@davidbrochart
I'm going to update the GPM precipitation dataset at gs://pangeo-data/gpm_imerg_early, because a new version has been released and the TRMM era has been merged, so it's now available from June 2000 instead of 2014. I will also include the probabilityLiquidPrecipitation field along with the precipitationCal field, so that we have information about solid precipitation.
Kai Mühlbauer
@kmuehlbauer
@betatim Thanks again, I'll think about restructuring everything.
David Hoese
@djhoese
@kmuehlbauer (hi from another non-vispy project) In some of my satpy repositories I've had the start script run a download script in the background. The data being downloaded was on Google's cloud storage, though, so I wasn't too worried about the network traffic for the ~3GB I was downloading.
Scott
@scollis
@kmuehlbauer I use the postBuild file for data