Discussion channel to work on creating metadata files that can provide Zarr-like access speeds to older formatted data of many different types
@martindurant, I just realized that, as you predicted yesterday, we have some more work to do on time variables, at least for GRIB files! Check out cells [17] and [18] in this notebook:
https://nbviewer.jupyter.org/gist/rsignell-usgs/fedf4b0e2d80bd9d202792ed99100d6f
The "time" variable is the time at which the model was run, and since I'm appending the latest forecast to the "best time series", all the values at the end are the same.
Meanwhile the "valid_time" variable, which is what one would expect the "time" variable to be (it holds the time values for each hour of the forecast), has only the first two values; all the rest are NaN.
So can we just flip them? We don't really care about providing the hour at which the model was run, since that could go in the description of the dataset. An evenly spaced variable called "time" (which apparently lives in the "valid_time" variable in GRIB) is what we want. Make sense?
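For what it's worth, here is a minimal sketch of that flip in plain xarray on toy data. The variable names mirror the ones above, but the "run_time" name and the toy values are just for illustration; this is not how the reference JSONs themselves would be fixed:

import numpy as np
import pandas as pd
import xarray as xr

# Toy dataset mimicking the layout described above: "time" holds the
# (repeated) model run hour, "valid_time" the evenly spaced forecast hours.
run = pd.to_datetime(["2019-01-01T22:00"] * 3)
valid = pd.date_range("2019-01-01T22:00", periods=3, freq="h")
ds = xr.Dataset(
    {"t2m": ("time", np.random.rand(3))},
    coords={"time": ("time", run), "valid_time": ("time", valid)},
)

# Flip the roles: stash the run time under another name, then promote
# valid_time to be the "time" dimension coordinate.
ds = ds.rename({"time": "run_time"})
ds = ds.rename({"valid_time": "time"})
ds = ds.swap_dims({"run_time": "time"})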
import fsspec
import xarray as xr

# Open the pre-built reference JSON on S3 as a virtual Zarr store
rpath = 's3://esip-qhub/noaa/hrrr/jsons/20210901.t00z.wrfsfcf01.json'
s_opts = {'requester_pays': True, 'skip_instance_cache': True}  # options for reading the JSON itself
r_opts = {'anon': True}                                         # options for the referenced GRIB bytes
fs = fsspec.filesystem("reference", fo=rpath, ref_storage_args=s_opts,
                       remote_protocol='s3', remote_options=r_opts)
m = fs.get_mapper("")
ds = xr.open_dataset(m, engine="zarr", backend_kwargs=dict(consolidated=False))
ds.data_vars
Data variables:
refd (y, x) float32 ...
si10 (y, x) float32 ...
u (y, x) float32 ...
u10 (y, x) float32 ...
unknown (y, x) float32 ...
v (y, x) float32 ...
v10 (y, x) float32 ...
In [8]: ds
Out[8]:
<xarray.Dataset>
Dimensions: (time: 9, y: 1059, x: 1799)
Coordinates:
heightAboveGround float64 ...
latitude (y, x) float64 ...
longitude (y, x) float64 ...
step timedelta64[ns] ...
* time (time) datetime64[us] 2019-01-01T22:00:00 ... 2019-01-...
valid_time (time) datetime64[ns] ...
Dimensions without coordinates: y, x
Data variables:
d2m (time, y, x) float32 ...
pt (time, y, x) float32 ...
r2 (time, y, x) float32 ...
sh2 (time, y, x) float32 ...
t2m (time, y, x) float32 ...
Attributes:
Conventions: CF-1.7
GRIB_centre: kwbc
GRIB_centreDescription: US National Weather Service - NCEP
GRIB_edition: 2
GRIB_subCentre: 0
history: 2021-09-02T16:57 GRIB to CDM+CF via cfgrib-0.9.9...
institution: US National Weather Service - NCEP
In [9]: ds.time.values
Out[9]:
array(['2019-01-01T22:00:00.000000', '2019-01-01T23:00:00.000000',
'2019-01-02T00:00:00.000000', '2019-01-02T01:00:00.000000',
'2019-01-02T02:00:00.000000', '2019-01-02T03:00:00.000000',
'2019-01-02T04:00:00.000000', '2019-01-02T05:00:00.000000',
'2019-01-02T06:00:00.000000'], dtype='datetime64[us]')
(this is the original set of files, not the ones from your newer example)
Maybe we should keep the hour at which the model was run, but as a variable called forecast_time. While the time variable would just march forward with uniform 1-hour time steps, the forecast_time variable would have 1-hour time steps up to the latest forecast, and then 0-hour time steps (all the forecast_time values of the last 18 data records would be the same). Does that make sense?
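To make that shape concrete, a toy sketch (the 24 records and the T06 run hour are made-up numbers, not taken from the real files):

import numpy as np
import pandas as pd

# "time" marches uniformly; "forecast_time" marches with it until the
# latest run hour, then stays flat for every record from that run.
time = pd.date_range("2019-01-01T00:00", periods=24, freq="h")
latest_run = np.datetime64("2019-01-01T06:00")
forecast_time = np.minimum(time.values, latest_run)

print(np.diff(time.values).astype("timedelta64[h]"))    # uniform 1-hour steps
print(np.diff(forecast_time).astype("timedelta64[h]"))  # 1-hour steps, then zeros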
# cases
# a) this is accum_dim -> note values, dealt with above
# b) this is a dimension that didn't change -> copy (once)
# c) this is a normal var, without accum_dim, var.shape == var0.shape -> copy (once)
# d) this is a var needing reshape -> each dataset's keys get new names, update shape
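Reading those cases as a standalone sketch (assumed names and signature, not the actual combine code):

def classify_var(name, dims, shape, base_shape, accum_dim):
    """Illustrative dispatch over the four cases in the comment above.

    Hypothetical helper: base_shape is the variable's shape in the
    first dataset (var0 above).
    """
    if name == accum_dim:
        return "a"    # the accumulation dimension itself; values handled above
    if accum_dim not in dims and shape == base_shape:
        return "b/c"  # unchanged dimension or ordinary variable: copy once
    return "d"        # grows along accum_dim: rename each dataset's keys, update its shape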