RE: For some reason, @tjcrone and I are have trouble making intake work with s3. I think Tim will open an intake issue.
@rabernat & @tjcrone, did you figure this out? Could you expand on what the exact issue was? I am trying to create static intake catalogs pointing to CESM LENS data in S3 and I seem to be having some issues when accessing the data.
@rabernat Check out my work on geoxarray: geoxarray/geoxarray#13
The way I see it you have a couple choices. Mainly depends on what your goal is for the representation/serialization.
If we are talking about the best way to make a serializable version of a grid/area definition, your main issue is the projection. Well Known Text (WKT) is supposed to be the most fully defined way of describing a projection. PROJ strings apparently can't fully describe all projections. I've been leaning towards pyproj's CRS objects as my preferred container for projection information since it can convert to PROJ strings, PROJ dicts, WKT (different versions), or a CF grid mapping variable. There was also a discussion between @snowman2 and @dopplershift about seeing how possible it would be to use pyproj's CRS objects to replace cartopy's CRS objects (or at least making cartopy's based on pyproj's).
You then have your extent information. Do you force them all to be the same or allow for all the variations? lower-left + upper-right + number of rows/columns, upper-left + pixel size + number of rows/columns, some other combination of these types of parameters, etc.
Are you going to define a single object that defines this information (pyresample's AreaDefinition) or encode it entirely in the xarray Dataset? If in the Dataset, do you depend on the x and y coordinates to define the extents? Or do you have separate attributes/coordinates?
startfile in your repo (https://repo2docker.readthedocs.io/en/latest/config_files.html#start-run-code-before-the-user-sessions-starts) you can run commands when your repo is launched
postBuildfile. or you use a data format which allows random access over the network and only read what you need when you need it
gs://pangeo-data/gpm_imerg_early, because a new version has been released and the TRMM era has been merged, so it's now available from June 2000 instead of 2014. I will also include the
probabilityLiquidPrecipitationfield, along with the
precipitationCalfield, so that we have an information about solid precipitation.
startscript. The data being downloaded though was on google's cloud storage though so I wasn't too worried about the network traffic for the ~3GB I was downloading.
startwithout putting it in the background then a similar thing will happen where binderhub will wait for jupyter lab to start up but it won't happen in time because data is taking too long to download