    Martin Durant
    @martindurant
    (or submitting it to pangeo-forge, of course)
    Alex Kerney
    @abkfenris
    Ok, I thought that using {{ CATALOG_DIR }}/ would be the right choice. I'm going to try to hand roll a catalog that doesn't have the extra kwargs.
    The reference.yaml files come from the recipe generation, so those are all the defaults.
    Martin Durant
    @martindurant
    You mean pangeo-forge-recipes’ HDFReferenceRecipe? The intake stub is intended to be edited (no automatic way to do that yet).
    Alex Kerney
    @abkfenris
    Yep
    I'll try to write a catalog and remove the stubs after generation
    Martin Durant
    @martindurant
    Having done that, you can make an issue at recipes with ideas about how an automatic catalogue would best be made.
    Alex Kerney
    @abkfenris

    I've pushed up a combined catalog, but it looks like when using the github backend, the repo and org get dropped.

    import intake

    # Open the catalog over plain HTTPS and load the combined dataset lazily
    http_cat = intake.open_catalog("https://raw.githubusercontent.com/gulfofmaine/OISST_intake/main/catalog.yaml")
    http_ds = http_cat.complete.to_dask()
    http_ds['sst'].sel(time="1990-01-01").plot()

    Works, while

    github_cat = intake.open_catalog("github://gulfofmaine:OISST_intake@/catalog.yaml")
    github_ds = github_cat.complete.to_dask()
    github_ds

    Fails with TypeError: __init__() missing 2 required positional arguments: 'org' and 'repo'.

    It looks like the catalog_dir isn't getting fully populated.

    http_cat.complete
    
    complete:
      args:
        chunks: {}
        consolidated: false
        storage_options:
          fo: https://raw.githubusercontent.com/gulfofmaine/OISST_intake/main/complete/reference.json
          remote_options: {}
          remote_protocol: https
        urlpath: reference://
      description: Complete OISST daily data
      driver: intake_xarray.xzarr.ZarrSource
      metadata:
        catalog_dir: https://raw.githubusercontent.com/gulfofmaine/OISST_intake/main

    Compared to

    github_cat.complete
    
    complete:
      args:
        chunks: {}
        consolidated: false
        storage_options:
          fo: github:///complete/reference.json
          remote_options: {}
          remote_protocol: https
        urlpath: reference://
      description: Complete OISST daily data
      driver: intake_xarray.xzarr.ZarrSource
      metadata:
        catalog_dir: github://
    Martin Durant
    @martindurant
    Bahh. So you would need target_options after all. Perhaps Intake should include a “{{ CATALOG_STORAGE_OPTIONS }}” or something - but right now it does not.
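
A rough sketch of the workaround Martin describes, assuming the reference filesystem forwards `target_options` (together with `target_protocol`) to whatever filesystem opens `fo`. The helper name `github_target_options` is hypothetical, and the resulting entry has not been tested against the live catalog. It only shows where the missing `org` and `repo` live in the `github://org:repo@/path` URL and how they could be spelled out by hand.

```python
from urllib.parse import urlsplit


def github_target_options(url: str) -> dict:
    """Hypothetical helper: read org and repo from a github://org:repo@/path URL.

    The github URL form stores the org in the "username" slot and the
    repo in the "password" slot of the URL's userinfo part.
    """
    parts = urlsplit(url)
    return {"org": parts.username, "repo": parts.password}


catalog_url = "github://gulfofmaine:OISST_intake@/catalog.yaml"

# What the catalog entry's storage_options might look like once the
# org and repo are given explicitly instead of relying on CATALOG_DIR.
storage_options = {
    "fo": "github:///complete/reference.json",
    "target_protocol": "github",
    "target_options": github_target_options(catalog_url),
    "remote_protocol": "https",
    "remote_options": {},
}

print(storage_options["target_options"])
```

Because the values are written into the entry, the catalog no longer depends on how `catalog_dir` was rendered, at the cost of hard-coding the org and repo.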
    Lucas Sterzinger
    @lsterzinger

    Sorry for the delay, but I think I'm ready to publish my next blog post on fsspec-reference-maker soon (this one focusing on how to use referencemaker + dask to make references for the NWM dataset). Please feel free to leave any comments/suggestions you think might be necessary!

    https://docs.google.com/document/d/1O3N2PJxrZPWHJ1hRirObwF_1_M8UCDv9fjnMyl208Tw/edit?usp=sharing

    Martin Durant
    @martindurant
    Enjoy
    Martin Durant
    @martindurant
    Suggests that I should have made the filter a coordinate rather than separate variables, so they can show up as a selector in the plot. I don’t know why I have to make the coordinates, and I didn’t persuade quadmesh (which took very long) to use the provided coords. Seems I don’t understand how that’s supposed to work ( @rsignell-usgs )
    Rich Signell
    @rsignell-usgs
    Martin, you need to use rasterize=True with quadmesh on most grids, or you blow out browser memory. Is that what happened?
    BTW, it doesn't look like I provided the link to my most updated HRRR GRIB best time series notebook here: https://nbviewer.jupyter.org/gist/rsignell-usgs/ae7bfb16e8a4049bb1ca0379805c4dc2
    Martin Durant
    @martindurant

    Is that what happened

    Quite likely!
    Do you want to play with the notebook and show me how it should be displayed? Note that the individual frames are still quite big (30 MB or something), so it won’t ever be super responsive.

    Also, there’s a bug in the largest wavelength I have yet to fix.
    Rich Signell
    @rsignell-usgs
    Okay, I modified the hvplot command to include rasterize=True, and Martin, that is one super cool image!
    https://nbviewer.jupyter.org/gist/rsignell-usgs/2fa1f8dc93a2ba69a7c85f288398636b
    Chelle Gentemann
    @cgentemann
    holy crap that is cool.
    you know at nasa all the earth scientists are the bummers - oh the climate is collapsing - oh a giant storm is going to hit us. we are all jealous of the helio/planetary/astro people who are like 'look we have a cool RC helicopter on mars!'
    'oh look a pretty picture of the sun'
    rich --- can you put some %%time in the cells?
    Rich Signell
    @rsignell-usgs
    hee hee. BTW, I'm not really using the cluster there, because we are just reading one chunk and displaying it
    If we were going to do some calculations over multiple chunks, however, we definitely would appreciate the cluster
    Chelle Gentemann
    @cgentemann
    I think this is fine as is --- just want to demonstrate the fast access ---
    Rich Signell
    @rsignell-usgs
    for you Chelle, only for you
    Chelle Gentemann
    @cgentemann
    oh rich, you are the best friend ever!
    Lucas Sterzinger
    @lsterzinger
    That's a super fast TTS! (Time To Sun)
    Chelle Gentemann
    @cgentemann
    so i thought of a name for what you all are working on --- how about zarr-meta? it explains what you are doing & doesn't use the term cloud-performant, which lucas knows I just love...
    Chelle Gentemann
    @cgentemann
    hey - do you all want to do a write up of this for AGU's Earth and Space Science? I would be your editor! I was just on a call this morning where we said we want to publish more papers that include jupyter notebooks, and an article like this would fit into the ESS portfolio really well. this would be a great short paper on the process, its implications, and then notebooks showing different data examples. I think this would be really valuable.....ESS is open access.
    Lucas Sterzinger
    @lsterzinger
    I'm happy to be involved in that @cgentemann
    Martin Durant
    @martindurant
    A nice calculation would be a correlation versus time, presumably shorter wavelengths lead longer wavelengths. I think that was the original ask.
    By the way, FITS is special, I could subdivide the frames on the biggest axis, if that would prove something interesting.
    Chelle Gentemann
    @cgentemann
    okay---- is it possible to get this to work? JWST created a yaml for science files called ASDF: https://asdf.readthedocs.io/en/stable/
    Chelle Gentemann
    @cgentemann
    okay - I showed one friend but told him not to share until it is all written up --- he was like OMG they get to keep FITS. omg omg omg! no more arguing!
    Chelle Gentemann
    @cgentemann
    i knew I started a draft. maybe this can be a starting point? https://docs.google.com/document/d/1O2dPeB1smArHg62XcNOxwwEpWDwdUiWIn09XS-fr4tc/edit
    I'm particularly interested to see what is written in the section "Describe FSSPEC without saying FSSPEC"
    Martin Durant
    @martindurant
    I am aware of ASDF, the astro-specific “modern” format (but not cloud-friendly!) that has failed to gain traction for so long. But there are many hurdles to the references method being a viable entry to FITS for astro analysis. In order, most important first:
    • the majority of FITS files are whole-file compressed, and this only works for uncompressed. We could maybe cope with bzip2, or even better zstd, but gzip is the one that’s used, and it is probably impossible
    • the astropy stack and other tooling assumes FITS and does not integrate with xarray. It could be sold as a way towards using Dask, which they would want to do (cf https://docs.sunpy.org/projects/ndcube/en/stable/index.html for solar, i.e., this type of data)
    • astro coordinates are hard to use in the xarray model (this has been discussed extensively in pydata/xarray#3620, still unresolved).
    (but ASDF would work well as a target for referencing! https://asdf-standard.readthedocs.io/en/1.6.0/file_layout.html )
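
A small standard-library sketch of the first hurdle, assuming a reference is essentially an (offset, length) pair into the stored file. The byte layout here is a made-up stand-in, not real FITS. It shows that a range read returns the chunk when the file is stored uncompressed, while the same range of a whole-file gzip returns compressed bytes that cannot be used on their own.

```python
import gzip

# Stand-in for an uncompressed file: a 16-byte header, then ten 8-byte chunks.
header = b"H" * 16
chunks = [bytes([i]) * 8 for i in range(10)]
raw = header + b"".join(chunks)

# A reference to chunk 3: where it starts and how long it is.
offset, length = 16 + 3 * 8, 8


def read_range(blob: bytes, offset: int, length: int) -> bytes:
    """Mimic a byte-range request against the stored bytes."""
    return blob[offset:offset + length]


# Uncompressed storage: the range read returns the chunk directly.
direct = read_range(raw, offset, length)

# Whole-file gzip: the same range lands in the compressed stream.
zipped = gzip.compress(raw)
wrong = read_range(zipped, offset, length)

# Getting the chunk back means inflating from the start of the stream.
recovered = gzip.decompress(zipped)[offset:offset + length]

print(direct == chunks[3], wrong == chunks[3], recovered == chunks[3])
```

Uncompressing the files after upload, as suggested in the chat, would put every chunk back at a fixed byte offset, which is what the range read relies on.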
    Chelle Gentemann
    @cgentemann
    omg. you are killing me. they don't use internal compression?
    this is really helpful martin, thanks. if we can demo it working, that would be a win, and we can work more with nasa --- if we can show any advancement for data access - just maybe by uncompressing the gzip files after pushing to AWS, maybe that would be a path forward? the astro community is so FITS FITS FITS.... just like I was binary binary binary 30 years ago....
    Martin Durant
    @martindurant
    So if I did that same dataset, but with
    • the wavelength as a coordinate (so you can select it with a slider)
    • mapping to helio lat/lon (or just show how you can recreate the world coordinates)
    • a version with sub-selection in one of the dimensions, so you can do a timeseries on a section of the image more efficiently
      … is that enough of an argument?
      Should I do the same for the much more massive set of downsampled JPEG images on the public SDO server?
      I was hoping to get around to the dataset in intake/fsspec-reference-maker#78 , as a nice example of a dataset made by merging on multiple dimensions.
    Rich Signell
    @rsignell-usgs
    So cool that ReferenceFileSystem is now featuring GeoTIFF!
    https://github.com/intake/fsspec-reference-maker/issues/78#issuecomment-924456900
    Martin Durant
    @martindurant
    Perhaps more importantly: that dataset has FIVE dimensions, three aggregated (and two-dimensional images in each chunk)
    Rich Signell
    @rsignell-usgs
    Yeah, that is also very cool
    @martindurant , did you get contacted by Ivelina Momcheva from the Space Telescope Science Institute? I spoke with her a few days ago and she is very interested in your FITS work (and in Pangeo in general).
    Martin Durant
    @martindurant
    I did not
    Martin Durant
    @martindurant
    OK, I will speak with her this afternoon.
    Rich Signell
    @rsignell-usgs
    Awesome!
    Chelle Gentemann
    @cgentemann
    this is great!!!
    Chelle Gentemann
    @cgentemann
    is anyone else going to Ocean Sciences? This looks like a good session that we should submit this project to: https://www.aslo.org/osm2022/scientific-sessions/#od