Discussion channel for creating metadata (reference) files that can provide Zarr-like access speeds to data stored in many different older formats
I just posted on @lsterzinger’s GEOS tutorial repo that we ought to make that a recipe for pangeo-forge, and also mentioned NWM, but totally forgot about the HRRR work! These should all be there, even if pangeo-forge doesn't yet have a mechanism for dealing with regularly updated datasets.
@martindurant 100%, let me take a closer look at your HDF5 recipe and see what's needed
out is a dict.
with fs2.open(outfname, "w") as f:
    f.write(ujson.dumps(out))
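For context, a minimal sketch of what writing such a reference dict looks like end to end. The dict contents below are hypothetical stand-ins for what the recipe's translate step produces, and this uses the stdlib json module (ujson is a faster drop-in):

```python
import json

# Hypothetical reference dict shaped like fsspec-reference-maker output:
# "refs" maps zarr keys either to inline content or to
# [url, byte_offset, byte_length] triples pointing into the original file.
out = {
    "version": 1,
    "refs": {
        ".zgroup": '{"zarr_format": 2}',
        "t2m/0.0.0": ["s3://bucket/file.grib2", 0, 1024],
    },
}

# Write the references to a local JSON file (the chat snippet above does the
# same through an fsspec filesystem, so it can target s3:// etc. directly).
with open("reference.json", "w") as f:
    f.write(json.dumps(out))
```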
@martindurant, I just realized that indeed, as you predicted yesterday, we have some more work to do on time variables, at least for GRIB files! Check out cells [17] and [18] in this notebook:
https://nbviewer.jupyter.org/gist/rsignell-usgs/fedf4b0e2d80bd9d202792ed99100d6f
The "time" variable is the time at which the model was run, and since I'm appending the latest forecast to the "best time series", all the values at the end are the same.
Meanwhile the "valid_time" variable, which is what one would expect "time" to be (holding the time values for each hour of the forecast), has only the first two values, with all the rest NaN.
So can we just flip them? We don't really care about providing the hour at which the model was run, since that could go in the dataset description. An evenly spaced variable called "time" (which apparently lives in the "valid_time" variable in GRIB) is what we want. Make sense?
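To illustrate the proposed flip, here's a sketch in xarray with made-up values (not the real HRRR data): a toy dataset where "time" repeats the model run time and "valid_time" carries the hourly forecast times, with the run-time coordinate then replaced by the valid_time values:

```python
import numpy as np
import pandas as pd
import xarray as xr

# Toy dataset mimicking the GRIB layout described above (hypothetical values):
# "time" holds the model run time repeated, "valid_time" the forecast hours.
run = np.repeat(np.datetime64("2019-01-01T22:00"), 3)
valid = pd.date_range("2019-01-01T22:00", periods=3, freq="h")
ds = xr.Dataset(
    {"t2m": ("time", [1.0, 2.0, 3.0])},
    coords={"time": ("time", run), "valid_time": ("time", valid)},
)

# Replace the run-time "time" coordinate with the evenly spaced valid_time
# values, then drop valid_time since it is now redundant.
ds = ds.assign_coords(time=ds.valid_time.values).drop_vars("valid_time")
```

The same swap on the real dataset would of course first require filling in the NaN valid_time entries.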
import fsspec
import xarray as xr

rpath = 's3://esip-qhub/noaa/hrrr/jsons/20210901.t00z.wrfsfcf01.json'
s_opts = {'requester_pays': True, 'skip_instance_cache': True}
r_opts = {'anon': True}
fs = fsspec.filesystem("reference", fo=rpath, ref_storage_args=s_opts,
                       remote_protocol='s3', remote_options=r_opts)
m = fs.get_mapper("")
ds = xr.open_dataset(m, engine="zarr", backend_kwargs=dict(consolidated=False))
ds.data_vars
ds.data_vars
Data variables:
refd (y, x) float32 ...
si10 (y, x) float32 ...
u (y, x) float32 ...
u10 (y, x) float32 ...
unknown (y, x) float32 ...
v (y, x) float32 ...
v10 (y, x) float32 ...
In [8]: ds
Out[8]:
<xarray.Dataset>
Dimensions: (time: 9, y: 1059, x: 1799)
Coordinates:
heightAboveGround float64 ...
latitude (y, x) float64 ...
longitude (y, x) float64 ...
step timedelta64[ns] ...
* time (time) datetime64[us] 2019-01-01T22:00:00 ... 2019-01-...
valid_time (time) datetime64[ns] ...
Dimensions without coordinates: y, x
Data variables:
d2m (time, y, x) float32 ...
pt (time, y, x) float32 ...
r2 (time, y, x) float32 ...
sh2 (time, y, x) float32 ...
t2m (time, y, x) float32 ...
Attributes:
Conventions: CF-1.7
GRIB_centre: kwbc
GRIB_centreDescription: US National Weather Service - NCEP
GRIB_edition: 2
GRIB_subCentre: 0
history: 2021-09-02T16:57 GRIB to CDM+CF via cfgrib-0.9.9...
institution: US National Weather Service - NCEP
In [9]: ds.time.values
Out[9]:
array(['2019-01-01T22:00:00.000000', '2019-01-01T23:00:00.000000',
'2019-01-02T00:00:00.000000', '2019-01-02T01:00:00.000000',
'2019-01-02T02:00:00.000000', '2019-01-02T03:00:00.000000',
'2019-01-02T04:00:00.000000', '2019-01-02T05:00:00.000000',
       '2019-01-02T06:00:00.000000'], dtype='datetime64[us]')
(this is the original set of files, not the ones from your newer example)