Discussion channel to work on creating metadata files that can provide Zarr-type access speeds to older-format data of many different types
look at how _FillValue went from -999900 to 0, that would be great
It looks to me like it's happening during the xarray CF decoding with a Zarr store here
_FillValue from -999900 to 0, and based on my debugging that's happening during the CF conversion process
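One way to narrow down where the value changes is to look at what is stored in the metadata before any CF decoding runs. The sketch below assumes a kerchunk-style reference dict whose keys follow the Zarr v2 layout ("name/.zarray", "name/.zattrs"); the helper name stored_fill_values and the example values are hypothetical and say nothing about where the 0 actually comes from in the real dataset.

```python
import json


def stored_fill_values(refs):
    """Return {variable: (.zarray fill_value, .zattrs _FillValue)} for a reference dict."""
    result = {}
    for key, value in refs.items():
        if not key.endswith("/.zarray"):
            continue
        name = key[: -len("/.zarray")]
        # Metadata entries are usually JSON strings, but may already be dicts.
        zarray = json.loads(value) if isinstance(value, str) else value
        raw_attrs = refs.get(name + "/.zattrs", "{}")
        zattrs = json.loads(raw_attrs) if isinstance(raw_attrs, str) else raw_attrs
        result[name] = (zarray.get("fill_value"), zattrs.get("_FillValue"))
    return result


# Hypothetical reference set with a single variable, for illustration only.
example_refs = {
    "TMP/.zarray": json.dumps(
        {"shape": [2, 2], "chunks": [2, 2], "dtype": "<f4", "fill_value": 0.0}
    ),
    "TMP/.zattrs": json.dumps({"_FillValue": -999900.0, "units": "K"}),
}

print(stored_fill_values(example_refs))
```

If the two stored values already disagree, the difference predates decoding; if they agree, the decoding step is the more likely place to look.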
/shared/users/lsterzinger/hrrr.ipynb, I also uploaded it to nbviewer here https://nbviewer.jupyter.org/gist/lsterzinger/c6f8c68c35f94794b5c76cf8b1fea30a
I just posted on @lsterzinger's GEOS tutorial repo that we ought to make that a recipe for pangeo-forge, and also mentioned NWM, but totally forgot about the HRRR work! These should all be there, even if pangeo-forge doesn't yet have a mechanism for dealing with regularly updated datasets.
@martindurant 100%, let me take a closer look at your HDF5 recipe and see what's needed
out is a dict.
with fs2.open(outfname, "w") as f:
    f.write(ujson.dumps(out))
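For anyone following along without the notebook, here is a minimal round-trip sketch of the same write. It assumes fs2 is a filesystem object with an open() method (not shown in the chat), so the built-in open() on a temporary local file stands in for it, and the contents of out are a made-up placeholder.

```python
import json
import os
import tempfile

try:
    import ujson  # same module as in the snippet above, if installed
except ImportError:
    ujson = json  # standard-library fallback with the same dumps() call

# Hypothetical stand-in for the real reference dict.
out = {"version": 1, "refs": {".zgroup": '{"zarr_format": 2}'}}
outfname = os.path.join(tempfile.mkdtemp(), "refs.json")

# Serialize the dict to a JSON string and write it out.
with open(outfname, "w") as f:
    f.write(ujson.dumps(out))

# Read it back to confirm the file holds the same dict.
with open(outfname) as f:
    restored = json.load(f)

print(restored == out)
```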
@martindurant, I just realized that, as you predicted yesterday, we have some more work to do on time variables, at least for GRIB files! Check out cells [17] and [18] in this notebook:
https://nbviewer.jupyter.org/gist/rsignell-usgs/fedf4b0e2d80bd9d202792ed99100d6f
The "time" variable is the time at which the model was run, and since I'm appending the latest forecast to the "best time series", all the values at the end are the same.
Meanwhile the "valid_time" variable, what one would expect to be the "time" variable (having the time values for each hour of the forecast), has only the first two values, with all the rest NaN.
So can we just flip them? We don't really care about providing the hour at which the model was run, since that could be in the description of the dataset. An evenly spaced variable called "time" (which apparently lives in the "valid_time" variable in GRIB) is what we want. Make sense?
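To make the idea concrete: the valid time of a forecast record is its run time plus its forecast step, so the evenly spaced axis could in principle be rebuilt from those two pieces. This is only a sketch with made-up values, assuming the step for each record is available; the helper name valid_times is hypothetical and this is not what the notebook does.

```python
from datetime import datetime, timedelta


def valid_times(run_times, steps):
    """Valid time of each record = model run time + forecast step."""
    return [run + step for run, step in zip(run_times, steps)]


# Hypothetical "best time series": two analysis hours, then a 3-hour
# forecast appended from the latest run, so the run time repeats at the end.
latest = datetime(2021, 6, 1, 1)
run_times = [datetime(2021, 6, 1, 0), latest, latest, latest, latest]
steps = [timedelta(hours=h) for h in (0, 0, 1, 2, 3)]

times = valid_times(run_times, steps)
for t in times:
    print(t.isoformat())
```

The repeated run times at the end turn into consecutive hours once the step is added, which is the "time" axis one would expect.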