    Martin Durant
    @martindurant

    The .zattrs files are written by xarray in _build_output, lines

            ds.to_zarr(out, chunk_store={}, compute=False,
                       consolidated=False)  # fills in metadata&coords

    so we don’t have any control here. Later on we explicitly skip copying the .zattrs files
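    As a rough illustration of the "skip copying the .zattrs files" step, here is a minimal sketch. It assumes that both the xarray-written target and the destination can be treated as plain key/value mappings, as zarr v2 stores are. The helper `copy_metadata` and its `skip` parameter are hypothetical, not the recipe's actual code.

```python
from collections.abc import Mapping, MutableMapping

# zarr v2 metadata keys end in one of these names; chunk keys (e.g. "t2m/0.0") do not
METADATA_NAMES = (".zgroup", ".zarray", ".zattrs")


def copy_metadata(src: Mapping, dst: MutableMapping, skip=(".zattrs",)) -> list:
    """Copy zarr metadata keys from src to dst, skipping any whose final
    path component is listed in `skip`. Returns the keys that were copied."""
    copied = []
    for key in src:
        name = key.rsplit("/", 1)[-1]
        if name in METADATA_NAMES and name not in skip:
            dst[key] = src[key]
            copied.append(key)
    return copied


# Hypothetical store contents, roughly what to_zarr(compute=False) might leave behind
src = {
    ".zgroup": b'{"zarr_format": 2}',
    ".zattrs": b'{"title": "from xarray"}',
    "t2m/.zarray": b"{}",
    "t2m/.zattrs": b'{"units": "K"}',
    "t2m/0.0": b"\x00" * 8,
}
dst = {}
print(copy_metadata(src, dst))  # ['.zgroup', 't2m/.zarray']
```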

    Rich Signell
    @rsignell-usgs
    @lsterzinger, I'm having trouble finding your notebook that creates the ReferenceFileSystem JSON for the HRRR forecast
    Martin Durant
    @martindurant
    I just posted on @lsterzinger’s GEOS tutorial repo that we ought to make that a recipe for pangeo-forge, and also mentioned NWM, but totally forgot about the HRRR work! These should all be there, even if pangeo-forge doesn’t yet have a mechanism for dealing with regularly updated datasets.
    Lucas Sterzinger
    @lsterzinger
    @rsignell-usgs It's on esip-qhub at /shared/users/lsterzinger/hrrr.ipynb; I also uploaded it to nbviewer here: https://nbviewer.jupyter.org/gist/lsterzinger/c6f8c68c35f94794b5c76cf8b1fea30a

    I just posted on @lsterzinger’s GEOS tutorial repo that we ought to make that a recipe for pangeo-forge, and also mentioned NWM, but totally forgot about the HRRR work! These should all be there, even if pangeo-forge doesn’t yet have a mechanism for dealing with regularly updated datasets.

    @martindurant 100%, let me take a closer look at your HDF5 recipe and see what's needed

    Lucas Sterzinger
    @lsterzinger
    @rsignell-usgs I just realized that the HRRR filenames don't include dates, only times, so the list of JSONs created by that notebook is out of order (compare to the list of URLs in the 2nd cell)
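    One way around the ordering problem is to carry the date from the GRIB object's directory into the JSON name, giving names like the `20210901.t00z.wrfsfcf01.json` used later in this thread, so that a plain lexicographic sort is chronological. A minimal sketch, assuming the HRRR objects follow the layout `s3://noaa-hrrr-bdp-pds/hrrr.YYYYMMDD/conus/hrrr.tHHz.wrfsfcfFF.grib2` (an assumption, not taken from the notebook); `json_name` is a hypothetical helper.

```python
import re

# Assumed HRRR object-key layout, e.g.
# s3://noaa-hrrr-bdp-pds/hrrr.20210901/conus/hrrr.t00z.wrfsfcf01.grib2
HRRR_RE = re.compile(r"hrrr\.(\d{8})/conus/hrrr\.(t\d{2}z)\.(wrfsfcf\d{2})\.grib2$")


def json_name(url: str) -> str:
    """Build an output name that keeps the date from the directory,
    e.g. '20210901.t00z.wrfsfcf01.json', so names sort chronologically."""
    m = HRRR_RE.search(url)
    if m is None:
        raise ValueError(f"unrecognised HRRR URL: {url}")
    date, cycle, fcst = m.groups()
    return f"{date}.{cycle}.{fcst}.json"


urls = [
    "s3://noaa-hrrr-bdp-pds/hrrr.20210901/conus/hrrr.t23z.wrfsfcf01.grib2",
    "s3://noaa-hrrr-bdp-pds/hrrr.20210902/conus/hrrr.t00z.wrfsfcf01.grib2",
]
names = [json_name(u) for u in urls]
# Sorting on the time-only basename would put t00z (Sept 2) before t23z (Sept 1);
# with the date prefix, a lexicographic sort matches chronological order.
print(sorted(names))
```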
    Rich Signell
    @rsignell-usgs
    I'm trying to write the HRRR json to S3 (so that I can use a distributed cluster) and something is wrong here:
        out = scan_grib(u, common, so, inline_threashold=100, filter=afilter)        
        with fs2.open(outfname, "wb") as f:
            f.write(ujson.dumps(out))
    @martindurant, I'm guessing you can see what's wrong. out is a dict.
    Martin Durant
    @martindurant
    Is outfname “s3://..”? Do you need credentials in there? Does the first line complete?
    Rich Signell
    @rsignell-usgs
    yes
    oh, it's not "wb", is it?
    yep, this works:
    with fs2.open(outfname, "w") as f:
        f.write(ujson.dumps(out))
    Martin Durant
    @martindurant
    I think the builtin json allows either text or binary, but maybe ujson doesn't
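    The underlying rule is that `ujson.dumps` returns a `str`, while a file opened with "wb" only accepts bytes. For what it's worth, the standard library's `json.dumps` also returns a `str`. The sketch below shows the same rule with stdlib `json` and a local temp file standing in for ujson and the S3 filesystem `fs2`. Those substitutions are assumptions, although fsspec's binary-mode file objects likewise expect bytes.

```python
import json
import os
import tempfile

# Stand-in for the dict returned by scan_grib (hypothetical contents)
out = {"version": 1, "refs": {".zgroup": '{"zarr_format": 2}'}}
text = json.dumps(out)  # a str, not bytes

path = os.path.join(tempfile.mkdtemp(), "refs.json")

# 1. Writing a str to a binary-mode file fails
error = None
try:
    with open(path, "wb") as f:
        f.write(text)
except TypeError as exc:
    error = exc
print(type(error).__name__)  # TypeError

# 2. Text mode accepts the str directly (the fix found above)
with open(path, "w") as f:
    f.write(text)

# 3. Binary mode also works if the str is encoded to bytes first
with open(path, "wb") as f:
    f.write(text.encode("utf-8"))
```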
    Martin Durant
    @martindurant
    Loads of uncompressed FITS files in “gcs://pangeo-data/SDO_AIA_Images” (3TB); see pangeo-data/pangeo#269
    There appears to be exactly one binary chunk per file (at least in the one file I downloaded)
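    With one uncompressed chunk per file, a reference only needs the byte offset and length of that chunk. FITS headers are 80-character cards packed into 2880-byte blocks and ending with an END card. The data starts at the next block boundary, is big-endian, and has NAXIS1 as its fastest-varying axis. A minimal sketch under those assumptions, for a simple primary HDU only (no extensions, no PCOUNT/GCOUNT handling). The caller must pass enough leading bytes to include the END card. `primary_hdu_layout` is a hypothetical helper.

```python
import math

BLOCK = 2880  # FITS files are organised in 2880-byte blocks
CARD = 80     # each header "card" is 80 ASCII characters

# Big-endian numpy/zarr dtype string for each FITS BITPIX value
BITPIX_DTYPE = {8: "|u1", 16: ">i2", 32: ">i4", 64: ">i8", -32: ">f4", -64: ">f8"}


def primary_hdu_layout(header_bytes: bytes):
    """Parse a primary FITS header and return (offset, nbytes, shape, dtype)
    for the single uncompressed data array that follows it."""
    cards = {}
    for start in range(0, len(header_bytes), CARD):
        card = header_bytes[start:start + CARD].decode("ascii")
        key = card[:8].strip()
        if key == "END":
            header_len = start + CARD
            break
        if card[8:10] == "= ":
            # good enough for the integer keywords used below (not quoted strings)
            cards[key] = card[10:].split("/")[0].strip()
    else:
        raise ValueError("no END card found")
    # the data starts at the next 2880-byte boundary after the END card
    offset = math.ceil(header_len / BLOCK) * BLOCK
    bitpix = int(cards["BITPIX"])
    naxes = [int(cards[f"NAXIS{i}"]) for i in range(1, int(cards["NAXIS"]) + 1)]
    nbytes = abs(bitpix) // 8 * math.prod(naxes)
    # NAXIS1 varies fastest, so the C-order (numpy/zarr) shape is reversed
    return offset, nbytes, tuple(reversed(naxes)), BITPIX_DTYPE[bitpix]


def card(text: str) -> bytes:
    return text.ljust(CARD).encode("ascii")


# Synthetic 4096x4096 16-bit image header, padded with spaces to one block
hdr = b"".join([
    card("SIMPLE  =                    T"),
    card("BITPIX  =                   16"),
    card("NAXIS   =                    2"),
    card("NAXIS1  =                 4096"),
    card("NAXIS2  =                 4096"),
    card("END"),
]).ljust(BLOCK, b" ")
print(primary_hdu_layout(hdr))  # (2880, 33554432, (4096, 4096), '>i2')
```

    The offset and length then become a reference entry of the form `[url, offset, nbytes]` for the array's single chunk.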
    Chelle Gentemann
    @cgentemann
    Just to follow up from the meeting today about trying to get a higher-profile article out about how Lucas/Rich/Martin are changing how we think about accessing data: I'm happy to help out with a Medium post, and like I said in our chat, right now I'm in write-only mode. If someone can figure out how to change my access back to science-only I'd appreciate it, but for now here is a document we can start working in. I'll try to get an outline going. https://docs.google.com/document/d/1O2dPeB1smArHg62XcNOxwwEpWDwdUiWIn09XS-fr4tc/edit?usp=sharing
    Rich Signell
    @rsignell-usgs
    Thanks Chelle! I'll check this out in more detail tomorrow.
    @lsterzinger, here's the HRRR forecast notebook using Dask, with the 1-hour tau from the past forecasts plus the latest forecast: https://nbviewer.jupyter.org/gist/rsignell-usgs/a047178ee12d44c2a7900ee86ba2fbc7
    I'm not sure why we are getting winds at 10m with a filter of 2m. Also we seem to be getting multiple wind variables. Will explore more tomorrow. But the hard part works!
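    The notebook itself isn't reproduced here. As a rough sketch of how such a "best time series" file list can be assembled, the code takes the tau=1 forecast from each past hourly run, then the latest run's forecast hours. It assumes hourly HRRR runs and an assumed bucket layout; the template and the helper `best_time_series` are hypothetical and not checked against the notebook.

```python
from datetime import datetime, timedelta

# Assumed object layout on the NOAA HRRR bucket (illustrative)
TEMPLATE = "s3://noaa-hrrr-bdp-pds/hrrr.{run:%Y%m%d}/conus/hrrr.t{run:%H}z.wrfsfcf{fh:02d}.grib2"


def best_time_series(latest_run: datetime, n_past: int, n_fcst: int) -> list:
    """1-hour forecasts (tau=1) from the previous n_past hourly runs, followed
    by forecast hours 1..n_fcst of the latest run."""
    past = [(latest_run - timedelta(hours=h), 1) for h in range(n_past, 0, -1)]
    latest = [(latest_run, fh) for fh in range(1, n_fcst + 1)]
    return [TEMPLATE.format(run=run, fh=fh) for run, fh in past + latest]


for url in best_time_series(datetime(2021, 9, 2, 0), n_past=2, n_fcst=2):
    print(url)
```

    For a latest run of 00z on 2021-09-02, this lists the t22z and t23z f01 files from 2021-09-01, then t00z f01 and f02, whose valid times are consecutive hours.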
    Rich Signell
    @rsignell-usgs

    @martindurant, I just realized that, as you predicted yesterday, we have some more work to do on time variables, at least for GRIB files! Check out cells [17] and [18] in this notebook:
    https://nbviewer.jupyter.org/gist/rsignell-usgs/fedf4b0e2d80bd9d202792ed99100d6f

    The "time" variable is the time at which the model was run, and since I'm appending the latest forecast to the "best time series", all the values at the end are the same.

    Meanwhile the "valid_time" variable, what one would expect to be the "time" variable (having the time values for each hour of the forecast), has only the first two values, with all the rest NaN.

    So can we just flip them? We don't really care about providing the hour at which the model was run, since that could be in the description of the dataset. An evenly-spaced variable called "time" (that apparently is in the "valid_time" variable in Grib) is what we want. Make sense?
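    In cfgrib's terms, "time" is the model reference (run) time and "valid_time" is time + step. The sketch below uses hypothetical values to show why the flip makes sense for a best time series. Computing valid times from (run, forecast hour) pairs gives an evenly spaced axis, while the run times repeat once the latest forecast is appended.

```python
from datetime import datetime, timedelta

# (model run time, forecast hour) for a "best time series": tau=1 from two past
# runs, then hours 1-3 of the latest run (hypothetical example values)
latest = datetime(2021, 9, 2, 0)
pairs = [(latest - timedelta(hours=2), 1), (latest - timedelta(hours=1), 1)]
pairs += [(latest, fh) for fh in (1, 2, 3)]

run_times = [run for run, _ in pairs]                           # GRIB "time"
valid_times = [run + timedelta(hours=fh) for run, fh in pairs]  # GRIB "valid_time"

steps = {b - a for a, b in zip(valid_times, valid_times[1:])}
print(steps == {timedelta(hours=1)})  # True: valid_time is evenly spaced
print(len(set(run_times[-3:])))       # 1: the latest run's "time" repeats
```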

    Martin Durant
    @martindurant
    All booked up until the afternoon...
    Martin Durant
    @martindurant
    40TB of FITS imaging in ‘s3://stpubdata/tess/public/mast’ in 128GB uncompressed files?
    Rich Signell
    @rsignell-usgs
    So we can read FITS into xarray via rasterio? https://gdal.org/drivers/raster/fits.html
    Chelle Gentemann
    @cgentemann
    this is so great -
    Martin Durant
    @martindurant
    you don’t need rasterio, it’s pure C buffers (i.e., zarr reader with compression=None)
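    To make "zarr reader with compression=None" concrete, here is a sketch of zarr v2 array metadata describing the raw data block as a single uncompressed, C-ordered chunk, wrapped in a version-1 reference set. The array name `image`, the URL, and the offset/length values are hypothetical. For xarray you would also want a `.zattrs` with `_ARRAY_DIMENSIONS`, which is omitted here.

```python
import json


def zarray_for_raw_buffer(shape, dtype: str) -> str:
    """zarr v2 array metadata describing one uncompressed, C-ordered chunk
    covering the whole array (e.g. a raw FITS data block)."""
    meta = {
        "zarr_format": 2,
        "shape": list(shape),
        "chunks": list(shape),  # a single chunk spanning the full array
        "dtype": dtype,         # FITS data are big-endian, e.g. ">i2"
        "compressor": None,     # raw bytes: no decompression step
        "filters": None,
        "fill_value": None,
        "order": "C",
    }
    return json.dumps(meta)


refs = {
    "version": 1,
    "refs": {
        ".zgroup": json.dumps({"zarr_format": 2}),
        "image/.zarray": zarray_for_raw_buffer((4096, 4096), ">i2"),
        # hypothetical file URL plus the byte offset and length of the data block
        "image/0.0": ["gcs://bucket/path/file.fits", 2880, 2 * 4096 * 4096],
    },
}
print(refs["refs"]["image/.zarray"])
```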
    Rich Signell
    @rsignell-usgs
    oh nice
    Chelle Gentemann
    @cgentemann
    (me opening up aws hub quickly)
    Martin Durant
    @martindurant
    For the 128GB files, I can subset the massive array on the biggest dimension, but this will be by hand for now.
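    Splitting one huge uncompressed array into several references is straightforward when the split is along the slowest-varying axis in C order (for FITS, the last NAXIS). In that case each piece is a single contiguous byte range. If the biggest dimension is not the slowest one, this sketch does not apply. The helper `split_first_axis` is hypothetical. It requires the chunk size to divide the axis evenly, because zarr v2 expects every stored uncompressed chunk, including the last, to hold a full chunk's bytes.

```python
import math


def split_first_axis(url, offset, shape, itemsize, rows_per_chunk, var="data"):
    """Reference entries that cut one C-ordered array on disk into chunks along
    its first (slowest-varying) axis; each chunk is one contiguous byte range."""
    if shape[0] % rows_per_chunk:
        raise ValueError("rows_per_chunk must divide shape[0] evenly")
    row_bytes = itemsize * math.prod(shape[1:])
    zeros = ".".join("0" * (len(shape) - 1))  # chunk index is 0 on every other axis
    refs = {}
    for i, start in enumerate(range(0, shape[0], rows_per_chunk)):
        key = f"{var}/{i}.{zeros}" if zeros else f"{var}/{i}"
        refs[key] = [url, offset + start * row_bytes, rows_per_chunk * row_bytes]
    return refs


# Tiny example: a (4, 3, 2) int16 array starting at byte 2880, two rows per chunk
print(split_first_axis("file.fits", 2880, (4, 3, 2), 2, 2))
```

    The matching `.zarray` would then use `chunks = [rows_per_chunk, *shape[1:]]` instead of the full shape.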
    Rich Signell
    @rsignell-usgs
    [image: 2021-09-02_13-08-50.jpg]
    It appears that future Lucas will be selling car insurance rather than geodata science
    Chelle Gentemann
    @cgentemann
    We aren't enough fun. He is leaving us for Hollywood. I knew it.
    maybe he will give us a discount and do a commercial for fsspec? wow, that just made me think, gosh wouldn't it be nice if open source libraries had so much money that NPR would say, 'brought to you by xarray, they put the data in climate' or 'fsspec, a solution for your cloudy confusion headaches'
    Rich Signell
    @rsignell-usgs
    But I do like that he chose "ottoinsurance"
    Ooh, yeah Chelle, that would be cool
    Chelle Gentemann
    @cgentemann
    I wonder if we could ask NPR to be part of the Year of Open Science and have one day where they just talk about open source!
    Lucas Sterzinger
    @lsterzinger

    It appears that future Lucas will be selling car insurance rather than geodata science

    Gotta make ends meet somehow!

    Rich Signell
    @rsignell-usgs
    hee hee
    @martindurant, guess what? Our variables are also screwed up for the GRIB. I thought they looked funny. Here's what should be returned based on our filter:
    https://nbviewer.jupyter.org/gist/rsignell-usgs/d7fa16be2bb4323ae9d700a17b1fe2cb
    Martin Durant
    @martindurant
    Can you make a brief summary of where we’re up to? The grib files convert to single JSONs but don’t combine? Something else?
    Rich Signell
    @rsignell-usgs
    The single json is messed up also:
    import fsspec
    import xarray as xr
    rpath = 's3://esip-qhub/noaa/hrrr/jsons/20210901.t00z.wrfsfcf01.json'
    s_opts = {'requester_pays':True, 'skip_instance_cache':True}  # for reading the reference JSON
    r_opts = {'anon':True}  # for the remote GRIB files the references point to
    fs = fsspec.filesystem("reference", fo=rpath, ref_storage_args=s_opts,
                           remote_protocol='s3', remote_options=r_opts)
    m = fs.get_mapper("")
    ds = xr.open_dataset(m, engine="zarr", backend_kwargs=dict(consolidated=False))
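    To tell whether a single JSON is itself wrong (rather than the combine step), it can help to pull individual keys out of it by hand. Each key maps either to inline data or to a byte range of a GRIB file. The sketch below is a simplified stand-in for what the reference filesystem does per key, not fsspec's implementation. `read_reference` is hypothetical, and it reads a local file in place of S3.

```python
import base64
import os
import tempfile


def read_reference(refs: dict, key: str) -> bytes:
    """Return the bytes behind one key of a version-1 reference set:
    a str is inline data, a list is [url] or [url, offset, length]."""
    ref = refs[key]
    if isinstance(ref, str):
        # binary inline data is stored base64-encoded behind a "base64:" prefix
        if ref.startswith("base64:"):
            return base64.b64decode(ref[len("base64:"):])
        return ref.encode("utf-8")
    with open(ref[0], "rb") as f:  # local file as a stand-in for s3://...
        if len(ref) == 1:
            return f.read()  # whole-file reference
        _, offset, length = ref
        f.seek(offset)
        return f.read(length)


# Hypothetical "GRIB" file: 6 bytes of header followed by a 4-byte message
path = os.path.join(tempfile.mkdtemp(), "fake.grib2")
with open(path, "wb") as f:
    f.write(b"HEADERDATA")

refs = {
    ".zgroup": '{"zarr_format": 2}',
    "t2m/0.0": [path, 6, 4],
    "time/0": "base64:" + base64.b64encode(b"\x00\x01").decode("ascii"),
}
print(read_reference(refs, "t2m/0.0"))  # b'DATA'
```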
    Lucas Sterzinger
    @lsterzinger

    The single json is messed up also:

    Maybe I shouldn't have been so quick to send the example notebook to Ryan :laughing:

    Rich Signell
    @rsignell-usgs
    You sent it?
    ds.data_vars
    
    Data variables:
        refd     (y, x) float32 ...
        si10     (y, x) float32 ...
        u        (y, x) float32 ...
        u10      (y, x) float32 ...
        unknown  (y, x) float32 ...
        v        (y, x) float32 ...
        v10      (y, x) float32 ...
    Lucas Sterzinger
    @lsterzinger
    Yeah, just gave the link on that twitter thread. I'll reply to it saying we've discovered some issues with it and maybe don't use it for anything actually important
    Rich Signell
    @rsignell-usgs
    but should be:
    Data variables:
        unknown  (y, x) float32 ...
        t2m      (y, x) float32 ...
        pt       (y, x) float32 ...
        sh2      (y, x) float32 ...
        d2m      (y, x) float32 ...
        r2       (y, x) float32 ...
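    The expected list above is what a filter on 2 m above ground should yield, with no 10 m winds. The sketch below shows the intended filter semantics under an assumption: the filter is a dict of GRIB keys to values, and a message is kept only if every key is present and equal. It is not the grib2 module's actual code, and the message attributes are hypothetical. A check along these lines could also serve as a cheap unit test that doesn't need the full example functions.

```python
def keep_message(attrs: dict, filt: dict) -> bool:
    """Keep a GRIB message only if every filter key is present and equal."""
    return all(k in attrs and attrs[k] == v for k, v in filt.items())


# Hypothetical per-message attributes, as read from each GRIB message's keys
messages = [
    {"var": "t2m", "typeOfLevel": "heightAboveGround", "level": 2},
    {"var": "u10", "typeOfLevel": "heightAboveGround", "level": 10},
    {"var": "v10", "typeOfLevel": "heightAboveGround", "level": 10},
    {"var": "r2", "typeOfLevel": "heightAboveGround", "level": 2},
    {"var": "refd", "typeOfLevel": "heightAboveGround", "level": 1000},
]
filt = {"typeOfLevel": "heightAboveGround", "level": 2}
print([m["var"] for m in messages if keep_message(m, filt)])  # ['t2m', 'r2']
```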
    Martin Durant
    @martindurant
    I appear to have found a not
    We have no tests of the grib2 module… I can only suppose that when testing manually, I was accidentally reusing an fs instance?
    Martin Durant
    @martindurant
    I am running it locally, but it’s taking a while. We probably can’t add the exact contents of the example* functions to the tests.