    Chelle Gentemann
    @cgentemann
    this is so great -
    Martin Durant
    @martindurant
    you don’t need rasterio, it’s pure C buffers (i.e., zarr reader with compressor=None)
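A minimal sketch of why no raster library is needed when the data are stored uncompressed: a chunk reference is just a byte range (offset, length) in a file, and "decoding" it only reinterprets those bytes as numbers. It assumes little-endian float32 values and uses the stdlib `struct` module in place of numpy; `read_raw_chunk` is a hypothetical helper, not part of zarr or fsspec.

```python
import io
import struct


def read_raw_chunk(f, offset, length, code="f", byteorder="<"):
    """Decode an uncompressed byte range as a flat list of numbers.

    No decompression happens: the stored bytes already are the array values.
    """
    itemsize = struct.calcsize(byteorder + code)
    if length % itemsize:
        raise ValueError("length is not a whole number of elements")
    f.seek(offset)
    buf = f.read(length)
    if len(buf) != length:
        raise EOFError("byte range runs past the end of the file")
    count = length // itemsize
    return list(struct.unpack(f"{byteorder}{count}{code}", buf))


# Demo: a 16-byte header followed by three little-endian float32 values.
payload = struct.pack("<3f", 1.5, -2.0, 3.25)
fake_file = io.BytesIO(b"HEADER-16-BYTES!" + payload + b"trailer")
print(read_raw_chunk(fake_file, offset=16, length=12))  # [1.5, -2.0, 3.25]
```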
    Rich Signell
    @rsignell-usgs
    oh nice
    Chelle Gentemann
    @cgentemann
    (me opening up aws hub quickly)
    Martin Durant
    @martindurant
    For the 128GB files, I can subset the massive array on the biggest dimension, but this will be by hand for now.
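Subsetting by hand on the biggest dimension can be sketched as follows, assuming that dimension is the leading (slowest-varying) one of a C-ordered, uncompressed array, so each slab of rows is one contiguous byte range. `split_leading_dim` is a hypothetical helper that only emits the chunk references; the matching `.zarray` metadata (chunks of `rows_per_chunk` along that axis, `"compressor": null`) would still have to be written. Because zarr normally expects every stored chunk to be full size, the sketch requires an even split.

```python
from math import prod


def split_leading_dim(name, url, offset, shape, itemsize, rows_per_chunk):
    """Return zarr-style chunk references splitting a contiguous array on axis 0."""
    if shape[0] % rows_per_chunk:
        raise ValueError("rows_per_chunk must divide the leading dimension")
    row_bytes = prod(shape[1:]) * itemsize      # bytes in one row along axis 0
    chunk_bytes = rows_per_chunk * row_bytes    # bytes in one chunk
    suffix = "".join(".0" for _ in shape[1:])   # other axes are not split
    refs = {}
    for i in range(shape[0] // rows_per_chunk):
        refs[f"{name}/{i}{suffix}"] = [url, offset + i * chunk_bytes, chunk_bytes]
    return refs


# Demo: a (4, 2, 3) float32 array starting 100 bytes into a file, 2 rows per chunk.
print(split_leading_dim("x", "data.bin", 100, (4, 2, 3), 4, 2))
```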
    Rich Signell
    @rsignell-usgs
    2021-09-02_13-08-50.jpg
    It appears that future Lucas will be selling car insurance rather than geodata science
    Chelle Gentemann
    @cgentemann
    we aren't enough fun. he is leaving us for hollywood. i knew it.
    maybe he will give us a discount and do a commercial for fsspec? wow, that just made me think, gosh wouldn't it be nice if open source libraries had so much money that NPR would say, 'brought to you by xarray, they put the data in climate' or 'fsspec, a solution for your cloudy confusion headaches'
    Rich Signell
    @rsignell-usgs
    But I do like that he chose "ottoinsurance"
    Ooh, yeah Chelle, that would be cool
    Chelle Gentemann
    @cgentemann
    i wonder if we could ask NPR to be part of the year of open science and have one day where they just talk about open source!
    Lucas Sterzinger
    @lsterzinger

    It appears that future Lucas will be selling car insurance rather than geodata science

    Gotta make ends meet somehow!

    Rich Signell
    @rsignell-usgs
    hee hee
    @martindurant , guess what? Our variables are screwed up also for the GRIB. I thought they looked funny. Here's what should be returned based on our filter:
    https://nbviewer.jupyter.org/gist/rsignell-usgs/d7fa16be2bb4323ae9d700a17b1fe2cb
    Martin Durant
    @martindurant
    Can you make a brief summary of where we’re up to? The grib files convert to single JSONs but don’t combine? Something else?
    Rich Signell
    @rsignell-usgs
    The single json is messed up also:
    import fsspec
    import xarray as xr

    rpath = 's3://esip-qhub/noaa/hrrr/jsons/20210901.t00z.wrfsfcf01.json'
    # options for reading the reference JSON itself (requester-pays bucket)
    s_opts = {'requester_pays':True, 'skip_instance_cache':True}
    # options for the remote files that the references point into
    r_opts = {'anon':True}
    fs = fsspec.filesystem("reference", fo=rpath, ref_storage_args=s_opts,
                           remote_protocol='s3', remote_options=r_opts)
    m = fs.get_mapper("")
    ds = xr.open_dataset(m, engine="zarr", backend_kwargs=dict(consolidated=False))
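The reference filesystem in that snippet maps each zarr key either to inline content or to a byte range in a remote file. A simplified, stdlib-only sketch of that lookup, assuming the version-1 layout (`{"version": 1, "refs": {...}}`) and ignoring templates and generators; `load_refs` and `resolve` are hypothetical helpers, not fsspec's implementation, and the `s3://example-bucket/...` path is made up.

```python
import base64
import json


def load_refs(text):
    """Return the key -> reference mapping from a reference JSON string."""
    doc = json.loads(text)
    # Version-1 files nest the mapping under "refs"; older files are the mapping itself.
    return doc["refs"] if doc.get("version") == 1 else doc


def resolve(refs, key):
    """Return ('inline', bytes) or ('range', url, offset, length) for one key."""
    value = refs[key]
    if isinstance(value, str):
        # Inline data; binary content is marked with a "base64:" prefix.
        if value.startswith("base64:"):
            return ("inline", base64.b64decode(value[len("base64:"):]))
        return ("inline", value.encode())
    if len(value) == 1:
        return ("range", value[0], None, None)  # the whole remote file
    url, offset, length = value
    return ("range", url, offset, length)


example = json.dumps({
    "version": 1,
    "refs": {
        ".zgroup": '{"zarr_format": 2}',
        "t2m/0.0": ["s3://example-bucket/file.grib2", 1000, 500],
    },
})
refs = load_refs(example)
print(resolve(refs, "t2m/0.0"))  # ('range', 's3://example-bucket/file.grib2', 1000, 500)
```

If a key points at the wrong byte range, the dataset still opens but the values belong to a different field, which is why checking the variable list is worthwhile.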
    Lucas Sterzinger
    @lsterzinger

    The single json is messed up also:

    Maybe I shouldn't have been so quick to send the example notebook to Ryan :laughing:

    Rich Signell
    @rsignell-usgs
    You sent it?
    ds.data_vars
    
    Data variables:
        refd     (y, x) float32 ...
        si10     (y, x) float32 ...
        u        (y, x) float32 ...
        u10      (y, x) float32 ...
        unknown  (y, x) float32 ...
        v        (y, x) float32 ...
        v10      (y, x) float32 ...
    Lucas Sterzinger
    @lsterzinger
    Yeah, just gave the link on that twitter thread. I'll reply to it saying we've discovered some issues with it and maybe don't use it for anything actually important
    Rich Signell
    @rsignell-usgs
    but should be:
    Data variables:
        unknown  (y, x) float32 ...
        t2m      (y, x) float32 ...
        pt       (y, x) float32 ...
        sh2      (y, x) float32 ...
        d2m      (y, x) float32 ...
        r2       (y, x) float32 ...
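This comparison is easy to turn into a check, for example in the missing grib2 tests: compare the variable names in the dataset opened from the JSON with those the filter should select. A minimal sketch; `diff_vars` is a hypothetical helper, and the two lists are copied from the outputs above. With a real dataset, `list(ds.data_vars)` would supply `got`.

```python
def diff_vars(actual, expected):
    """Return (missing, unexpected) variable names, each sorted."""
    actual, expected = set(actual), set(expected)
    return sorted(expected - actual), sorted(actual - expected)


got = ["refd", "si10", "u", "u10", "unknown", "v", "v10"]
want = ["unknown", "t2m", "pt", "sh2", "d2m", "r2"]
missing, unexpected = diff_vars(got, want)
print("missing:", missing)        # ['d2m', 'pt', 'r2', 'sh2', 't2m']
print("unexpected:", unexpected)  # ['refd', 'si10', 'u', 'u10', 'v', 'v10']
```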
    Martin Durant
    @martindurant
    I appear to have found a not
    We have no tests of the grib2 module… I can only suppose that when testing manually, I was accidentally reusing an fs instance?
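One way an fs instance gets reused by accident: fsspec caches filesystem instances keyed on their constructor arguments, which is why an option like `skip_instance_cache=True` exists. The sketch below is a simplified model of that caching, not fsspec's actual code; `CachedInstances` and `ToyFS` are hypothetical names.

```python
class CachedInstances(type):
    """Metaclass that returns the same instance for identical constructor arguments."""

    def __init__(cls, *args, **kwargs):
        super().__init__(*args, **kwargs)
        cls._cache = {}

    def __call__(cls, *args, skip_instance_cache=False, **kwargs):
        key = (args, tuple(sorted(kwargs.items())))
        if not skip_instance_cache and key in cls._cache:
            return cls._cache[key]  # reused: any state on it carries over
        obj = super().__call__(*args, **kwargs)
        if not skip_instance_cache:
            cls._cache[key] = obj
        return obj


class ToyFS(metaclass=CachedInstances):
    def __init__(self, fo=None):
        self.fo = fo
        self.state = {}  # e.g. already-parsed messages from an earlier run


a = ToyFS(fo="x.json")
a.state["parsed"] = True
b = ToyFS(fo="x.json")            # same arguments -> same object
print(b is a, b.state)            # True {'parsed': True}
c = ToyFS(fo="x.json", skip_instance_cache=True)
print(c is a, c.state)            # False {}
```

In a manual test session this means a second "fresh" run can silently see state left over from the first, which a test that builds everything from scratch would not.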
    Martin Durant
    @martindurant
    I am running it locally, but it’s taking a while. We probably can’t add the exact contents of the example* functions to the tests.
    Lucas Sterzinger
    @lsterzinger
    That not should not have been a not, other nots notwithstanding :wink:
    Martin Durant
    @martindurant

    Finished

    In [8]: ds
    Out[8]:
    <xarray.Dataset>
    Dimensions:            (time: 9, y: 1059, x: 1799)
    Coordinates:
        heightAboveGround  float64 ...
        latitude           (y, x) float64 ...
        longitude          (y, x) float64 ...
        step               timedelta64[ns] ...
      * time               (time) datetime64[us] 2019-01-01T22:00:00 ... 2019-01-...
        valid_time         (time) datetime64[ns] ...
    Dimensions without coordinates: y, x
    Data variables:
        d2m                (time, y, x) float32 ...
        pt                 (time, y, x) float32 ...
        r2                 (time, y, x) float32 ...
        sh2                (time, y, x) float32 ...
        t2m                (time, y, x) float32 ...
    Attributes:
        Conventions:             CF-1.7
        GRIB_centre:             kwbc
        GRIB_centreDescription:  US National Weather Service - NCEP
        GRIB_edition:            2
        GRIB_subCentre:          0
        history:                 2021-09-02T16:57 GRIB to CDM+CF via cfgrib-0.9.9...
        institution:             US National Weather Service - NCEP
    
    In [9]: ds.time.values
    Out[9]:
    array(['2019-01-01T22:00:00.000000', '2019-01-01T23:00:00.000000',
           '2019-01-02T00:00:00.000000', '2019-01-02T01:00:00.000000',
           '2019-01-02T02:00:00.000000', '2019-01-02T03:00:00.000000',
           '2019-01-02T04:00:00.000000', '2019-01-02T05:00:00.000000',
           '2019-01-02T06:00:00.000000'], dtype='datetime64[us]')

    (this is the original set of files, not the ones from your newer example)

    That not should have been a not but was not!
    Rich Signell
    @rsignell-usgs
    Essentially a sign error. I love it!
    I will try it!
    Martin Durant
    @martindurant
    How much data is that? I notice that the coordinate is valid_time, but the values of time are mostly missing
    Rich Signell
    @rsignell-usgs
    I was thinking about modifying the combined json to remove time and rename valid_time to be time
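A minimal sketch of that edit, assuming the combined JSON uses the version-1 layout with a flat `refs` dict whose keys look like `valid_time/.zarray`, `valid_time/0`, and so on. `drop_var` and `rename_var` are hypothetical helpers and the paths are made up. The sketch does not touch `.zmetadata` (not read here, since `consolidated=False`), and any `coordinates` attribute that lists `valid_time` may also need editing.

```python
def drop_var(refs, name):
    """Remove every reference key belonging to variable `name`."""
    return {k: v for k, v in refs.items() if k.split("/", 1)[0] != name}


def rename_var(refs, old, new):
    """Rename variable `old` to `new` by rewriting its key prefix."""
    out = {}
    for k, v in refs.items():
        head, sep, tail = k.partition("/")
        out[(new if head == old else head) + sep + tail] = v
    return out


refs = {
    ".zgroup": '{"zarr_format": 2}',
    "time/.zarray": "<time array metadata>",
    "time/0": "<inline time values>",
    "valid_time/.zarray": "<valid_time array metadata>",
    "valid_time/0": ["s3://example-bucket/f.grib2", 0, 100],
}
fixed = rename_var(drop_var(refs, "time"), "valid_time", "time")
print(sorted(fixed))  # ['.zgroup', 'time/.zarray', 'time/0']
```

Dropping first and renaming second avoids the renamed keys colliding with the old `time` keys.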
    Martin Durant
    @martindurant
    Sounds good. What’s the difference between them?
    Rich Signell
    @rsignell-usgs
    time is the time the model forecast was run. valid_time is the time that corresponds to the data being simulated.
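The two are related by the forecast step: valid_time = time + step. A small sketch, assuming HRRR object names encode the cycle and forecast hour as in `20210901.t00z.wrfsfcf01` (cycle 00z, forecast hour 01); that naming pattern is inferred from the one path in this chat, and `times_from_name` is a hypothetical helper.

```python
import re
from datetime import datetime, timedelta

# <YYYYMMDD>.t<HH>z.wrfsfcf<FF>: run date, cycle hour, forecast hour (assumed pattern)
NAME = re.compile(r"(\d{8})\.t(\d{2})z\.wrfsfcf(\d{2})")


def times_from_name(name):
    """Return (time, step, valid_time) parsed from an HRRR-style file name."""
    m = NAME.search(name)
    if m is None:
        raise ValueError(f"unrecognised file name: {name}")
    day, cycle, fhour = m.groups()
    init = datetime.strptime(day, "%Y%m%d") + timedelta(hours=int(cycle))
    step = timedelta(hours=int(fhour))
    return init, step, init + step


print(times_from_name("s3://esip-qhub/noaa/hrrr/jsons/20210901.t00z.wrfsfcf01.json"))
```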
    This one day of data is about 1 GB in size
    Martin Durant
    @martindurant
    OK, so indeed your rename appears right to me. I wonder why they are NaT, though.
    Rich Signell
    @rsignell-usgs
    Does it have something to do with reading only the first two time steps?
    It would be nice to have the time the model forecast was initialized included as a variable so that users could tell whether the forecast data was 1 hour out from initialization or 18, perhaps called forecast_time. While the time variable would just march forward with uniform 1 hour time steps, the forecast_time variable would have 1 hour time steps up to the latest forecast, and then 0 hour time steps (all forecast_time values of the last 18 data records would be the same). Does that make sense?
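The proposed `forecast_time` can be sketched as follows, assuming the past is covered by the 1-hour forecast from each hourly cycle and the future by the latest cycle's 18 forecast hours. `build_time_axes` is a hypothetical helper: `time` advances in uniform 1-hour steps, while `forecast_time` advances by 1 hour up to the latest cycle and then stays constant.

```python
from datetime import datetime, timedelta

HOUR = timedelta(hours=1)


def build_time_axes(first_init, latest_init, horizon=18):
    """Return (time, forecast_time) lists for a 'best available' hourly series."""
    time, forecast_time = [], []
    init = first_init
    # Past: one record per cycle, taken from its 1-hour forecast.
    while init < latest_init:
        time.append(init + HOUR)
        forecast_time.append(init)
        init += HOUR
    # Future: every forecast hour of the latest cycle shares one forecast_time.
    for h in range(1, horizon + 1):
        time.append(latest_init + h * HOUR)
        forecast_time.append(latest_init)
    return time, forecast_time


t, ft = build_time_axes(datetime(2021, 9, 1, 0), datetime(2021, 9, 1, 3))
print(len(t), t[0], t[-1])  # 21 2021-09-01 01:00:00 2021-09-01 21:00:00
```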
    Martin Durant
    @martindurant
    I suppose it’s not included in the cases set out in MultiZarrToZarr._build_output:
                # cases
                # a) this is accum_dim -> note values, dealt with above
                # b) this is a dimension that didn't change -> copy (once)
                # c) this is a normal var, without accum_dim, var.shape == var0.shape -> copy (once)
                # d) this is var needing reshape -> each dataset's keys get new names, update shape
    This would be a coordinate that DOES change (maps 1:1 with the accumulation dimension)
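One possible way to sort variables into the listed cases plus the new one, as an illustration only: `classify` is a hypothetical function, not the logic in `MultiZarrToZarr._build_output`, and its inputs (whether a variable is a dimension or coordinate, and whether its values differ between input datasets) are assumed to be known. Case "e" is the coordinate that varies 1:1 with the accumulation dimension, whose values would be stacked the way case "a" values are.

```python
def classify(name, dims, accum_dim, is_dim, is_coord, varies):
    """Return the case letter ('a'..'e') for one variable across the inputs."""
    if name == accum_dim:
        return "a"  # accumulation dimension: collect values
    if not varies:
        return "b" if is_dim else "c"  # unchanged: copy once
    if is_coord and set(dims) <= {accum_dim}:
        return "e"  # per-input scalar (or accum_dim-only) coordinate: stack values
    return "d"  # data variable: rename each dataset's keys, update shape


print(classify("valid_time", (), "time", is_dim=False, is_coord=True, varies=True))  # e
```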
    Martin Durant
    @martindurant
    Got a talk at PyData Global
    Lucas Sterzinger
    @lsterzinger
    🎉
    As more people start using reference maker (e.g. intake/fsspec-reference-maker#72), should we advertise this gitter in the README?
    Rich Signell
    @rsignell-usgs
    We should create a better name for the gitter if we do, since we know now this isn't just netcdf4
    Hopefully the PyData Global talk will get more attention on this!
    Martin Durant
    @martindurant
    Yes, I think so. There are still plenty of obvious holes like intake/fsspec-reference-maker#74 that had better be fixed.
    I am writing a FITS parser, where times are recorded as ISO strings.
    The coordinates, on the other hand… Also, should waveband be a coordinate or separate variables?
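If the FITS times become a coordinate, one option is to convert the ISO strings to CF-style numeric offsets, which xarray can then decode as datetimes. A minimal sketch assuming naive UTC strings such as `2021-09-02T13:08:50.500`; a trailing `Z` is stripped because `datetime.fromisoformat` before Python 3.11 rejects it. `iso_to_cf` is a hypothetical helper.

```python
from datetime import datetime


def iso_to_cf(values, epoch="1970-01-01T00:00:00"):
    """Convert ISO 8601 strings to (offsets in seconds, CF units string)."""
    base = datetime.fromisoformat(epoch)
    offsets = [
        (datetime.fromisoformat(v.rstrip("Z")) - base).total_seconds()
        for v in values
    ]
    return offsets, f"seconds since {epoch}"


offsets, units = iso_to_cf(["2021-09-02T13:08:50", "2021-09-02T13:09:50.500Z"])
print(offsets[1] - offsets[0], units)  # 60.5 seconds since 1970-01-01T00:00:00
```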
    Lucas Sterzinger
    @lsterzinger

    We should create a better name for the gitter if we do, since we know now this isn't just netcdf4

    Can't we create a gitter directly from that repo name? So it would be intake/fsspec-reference-maker