    Lucas Sterzinger
    @lsterzinger
    🎉
    As more people start using reference maker (e.g. intake/fsspec-reference-maker#72), should we advertise this gitter in the README?
    Rich Signell
    @rsignell-usgs
    We should create a better name for the gitter if we do, since we know now this isn't just netcdf4
    Hopefully the PyData Global talk will get more attention on this!
    Martin Durant
    @martindurant
    Yes, I think so. There are still plenty of obvious holes like intake/fsspec-reference-maker#74 that had better be fixed.
    I am writing a FITS parser, where times are recorded as ISO strings.
    The coordinates, on the other hand… Also, should waveband be a coordinate or separate variables?
    Lucas Sterzinger
    @lsterzinger

    We should create a better name for the gitter if we do, since we know now this isn't just netcdf4

    Can't we create a gitter directly from that repo name? So it would be intake/fsspec-reference-maker

    Martin Durant
    @martindurant
    intake/ doesn’t have a gitter org (pangeo-data and continuumIO do). But yes, we could do whatever. I still think we might want to wait a bit, though.
    Chelle Gentemann
    @cgentemann
    um i was trying to change the channel name and it offered to change the avatar so i thought well maybe i can just do that but at least on my gitter now all the pangeo icons are gone.... oops.
    shhh don't tell ryan it was me!
    Rich Signell
    @rsignell-usgs
    I still see my Pangeo icons
    I think we should wait a bit also
    Chelle Gentemann
    @cgentemann
    i changed it back! but it is a little different... odd that I could just do that....
    Martin Durant
    @martindurant
    Ooh, the pangeo icon did change...
    Chelle Gentemann
    @cgentemann
    I know nothing.
    but really, WTF. I can accidentally change all the icons in a group but in order to change the name of a room I have to email support@gitter??? is this really my fault?
    Martin Durant
    @martindurant
    Bad software design
    Martin Durant
    @martindurant
    400GB of solar imaging referenced in Intake catalog "gcs://mdtemp/SDO.yaml"
    Actually, 2-min cadence images are available in more filters across all times from 2010 to now at http://jsoc2.stanford.edu/data/aia/images/
    Chelle Gentemann
    @cgentemann
    is anyone working on creating the zarr mapping for the solar data? & a notebook demo-ing the solar image access 'normal' versus fsspec ref maker?
    Martin Durant
    @martindurant
    Above I posted an intake cat with a zarr mapping done. I haven't done any further work; there is no "normal" access to this kind of dataset using astropy, but there is ndcube, like an alternate xarray, which I don't have experience with. I don't think it has a multi-FITS loader from remote; the broader sunpy docs talk about downloading everything.
    Alex Kerney
    @abkfenris
    I've been playing with reference maker and pangeo-forge-recipes and made a repo that uses GitHub Actions to generate references for OISST each day. https://github.com/gulfofmaine/OISST_intake (look in complete/ and preliminary/ for the references). It's also generating Intake catalogs, but it looks like the file object is pointed to the wrong spot for Intake to load it.
    Martin Durant
    @martindurant
    Instead of "./preliminary/reference.json", use "{{ CATALOG_DIR }}/reference.json"
    You can open it by the HTTP raw link, or (and I like how this looks) using the fsspec GitHub backend:
    cat = intake.open_catalog("github://gulfofmaine:OISST_intake@/preliminary/reference.yaml")
    Oh, and you shouldn’t have target_protocol or target_options - they will be derived from the eventual URL
    Martin Durant
    @martindurant
    In general, we don’t want skip_instance_cache: true either, since it can result in repeated work, but I don’t think it will hurt in this case and can be useful for testing.
    When you feel it is ready, please consider adding a setup.py with entry_points and releasing to pypi, so that people can just “install” the dataset and be sure of having the right dependencies. It then appears under the global intake.cat catalog.
    (or submitting it to pangeo-forge, of course)
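[Editor's sketch] The "install the dataset" idea above relies on Python entry points: a dataset package declares its catalog in setup.py, and installed catalogs can then be discovered by group name. The snippet below is a minimal standard-library sketch of that discovery step only. The group name `intake.catalogs` is an assumption taken from Intake's packaging documentation and should be checked against your Intake version; `installed_intake_catalogs` is a hypothetical helper, not part of Intake.

```python
from importlib import metadata


def installed_intake_catalogs(group="intake.catalogs"):
    """Return {entry point name: 'module:attr'} for everything registered under `group`.

    Hypothetical helper for illustration; the default group name is an assumption.
    A dataset package would register itself in setup.py with something like:
        entry_points={"intake.catalogs": ["oisst = my_package:cat"]}
    where `oisst` and `my_package` are made-up names.
    """
    eps = metadata.entry_points()
    if hasattr(eps, "select"):
        # Python 3.10+: selectable entry point collection
        selected = eps.select(group=group)
    else:
        # Python 3.8/3.9: plain dict mapping group name -> list of entry points
        selected = eps.get(group, [])
    return {ep.name: ep.value for ep in selected}


if __name__ == "__main__":
    # Prints an empty dict when no dataset packages are installed.
    print(installed_intake_catalogs())
```

If no package registers under that group, the function simply returns an empty dict, so it is safe to run in a fresh environment.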
    Alex Kerney
    @abkfenris
    Ok, I thought that using {{ CATALOG_DIR }}/ would be the right choice. I'm going to try to hand roll a catalog that doesn't have the extra kwargs.
    The reference.yaml's are coming from the recipe generation so those are all the defaults.
    Martin Durant
    @martindurant
    You mean pangeo-forge-recipes' HDFReferenceRecipe? The intake stub is intended to be edited (no automatic way to do that yet).
    Alex Kerney
    @abkfenris
    Yep
    I'll try to write a catalog and remove the stubs after generation
    Martin Durant
    @martindurant
    Having done that, you can make an issue at recipes with ideas about how an automatic catalogue would best be made.
    Alex Kerney
    @abkfenris

    I've pushed up a combined catalog, but it looks like when using the github backend, the repo and org get dropped.

    http_cat = intake.open_catalog("https://raw.githubusercontent.com/gulfofmaine/OISST_intake/main/catalog.yaml")
    http_ds = http_cat.complete.to_dask()
    http_ds['sst'].sel(time="1990-01-01").plot()

    Works, while

    github_cat = intake.open_catalog("github://gulfofmaine:OISST_intake@/catalog.yaml")
    github_ds = github_cat.complete.to_dask()
    github_ds

    Fails with TypeError: __init__() missing 2 required positional arguments: 'org' and 'repo'.

    It looks like the catalog_dir isn't getting fully populated.

    http_cat.complete
    
    complete:
      args:
        chunks: {}
        consolidated: false
        storage_options:
          fo: https://raw.githubusercontent.com/gulfofmaine/OISST_intake/main/complete/reference.json
          remote_options: {}
          remote_protocol: https
        urlpath: reference://
      description: Complete OISST daily data
      driver: intake_xarray.xzarr.ZarrSource
      metadata:
        catalog_dir: https://raw.githubusercontent.com/gulfofmaine/OISST_intake/main

    Compared to

    github_cat.complete
    
    complete:
      args:
        chunks: {}
        consolidated: false
        storage_options:
          fo: github:///complete/reference.json
          remote_options: {}
          remote_protocol: https
        urlpath: reference://
      description: Complete OISST daily data
      driver: intake_xarray.xzarr.ZarrSource
      metadata:
        catalog_dir: github://
    Martin Durant
    @martindurant
    Bahh. So you would need target_options after all. Perhaps Intake should include a “{{ CATALOG_STORAGE_OPTIONS }}” or something - but right now it does not.
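[Editor's sketch] The catalog_dir of "github://" above is consistent with how the URL is laid out: in "github://org:repo@/path", the org and repo occupy the user-info part of the URL rather than the path, so a directory derived from the path alone cannot carry them. The standard-library snippet below shows that split and builds the kind of `target_options` the message above refers to. `github_target_options` is a hypothetical helper, and whether the resulting `storage_options` loads correctly in Intake was not tested in this conversation, so treat it as an assumption.

```python
from urllib.parse import urlsplit


def github_target_options(catalog_url):
    """Extract org/repo from a github://org:repo@/path URL (hypothetical helper)."""
    parts = urlsplit(catalog_url)
    if parts.scheme != "github":
        raise ValueError(f"not a github:// URL: {catalog_url!r}")
    # org and repo sit where a username and password would normally go
    return {"org": parts.username, "repo": parts.password}


url = "github://gulfofmaine:OISST_intake@/catalog.yaml"
options = github_target_options(url)

# Hand-written storage_options for the catalog entry: the reference file is
# addressed by path only, and org/repo are supplied explicitly (assumed workaround).
storage_options = {
    "fo": "github:///complete/reference.json",
    "target_options": options,
    "remote_protocol": "https",
    "remote_options": {},
}

print(urlsplit(url).path)  # the path part holds no org or repo
print(storage_options)
```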
    Lucas Sterzinger
    @lsterzinger

    Sorry for the delay, but I think I'm ready to publish my next blog post on fsspec-reference-maker soon (this one focusing on how to use referencemaker + dask to make references for the NWM dataset). Please feel free to leave any comments/suggestions you think might be necessary!

    https://docs.google.com/document/d/1O3N2PJxrZPWHJ1hRirObwF_1_M8UCDv9fjnMyl208Tw/edit?usp=sharing

    Martin Durant
    @martindurant
    Enjoy
    Martin Durant
    @martindurant
    Suggests that I should have made the filter a coordinate rather than separate variables, so they can show up as a selector in the plot. I don’t know why I have to make the coordinates, and I didn’t persuade quadmesh (which took very long) to use the provided coords. Seems I don’t understand how that’s supposed to work ( @rsignell-usgs )
    Rich Signell
    @rsignell-usgs
    Martin you need to use rasterize=True on quadmesh on most grids or you blow out browser memory. Is that what happened?
    BTW, it doesn't look like I provided the link to my most updated HRRR GRIB best time series notebook here: https://nbviewer.jupyter.org/gist/rsignell-usgs/ae7bfb16e8a4049bb1ca0379805c4dc2
    Martin Durant
    @martindurant

    Is that what happened

    Quite likely!
    Do you want to play with the notebook and show me how it should be displayed? Note that the individual frames are still quite big (30MB or something), so it won't ever be super responsive.

    Also, there’s a bug in the largest wavelength I have yet to fix.
    Rich Signell
    @rsignell-usgs
    Okay, I modified the hvplot command to include rasterize=True, and Martin, that is one super cool image!
    https://nbviewer.jupyter.org/gist/rsignell-usgs/2fa1f8dc93a2ba69a7c85f288398636b
    Chelle Gentemann
    @cgentemann
    holy crap that is cool.
    you know at nasa all the earth scientists are the bummers - oh the climate is collapsing - oh a giant storm is going to hit us. we are all jealous of the helio/planetary/astro people who are like 'look we have a cool RC helicopter on mars!'
    'oh look a pretty picture of the sun'
    rich --- can you put some %%time in the cells?
    Rich Signell
    @rsignell-usgs
    hee hee. BTW, I'm not really using the cluster there, because we are just reading one chunk and displaying it
    If we were going to do some calculations over multiple chunks, however, we definitely would appreciate the cluster
    Chelle Gentemann
    @cgentemann
    I think this is fine as is --- just want to demonstrate the fast access ---