Discussion channel to work on creating metadata files that can provide Zarr-like access speeds to older-formatted data of many different types
Well thank you! It’s been fun!
So you could have increased the `for` ranges in the notebook and come up with something massive?
By the way, I would always recommend adding `simple_templates=True` to the reference filesystem (faster init; it should be the default) and maybe also setting `chunks=` in `open_dataset`. For the latter, the best values to choose depend on your analysis, but it can make a big difference to load times if the original chunks are small. That's why it's nice to eventually write Intake specs, to hide this kind of detail.
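For anyone following along, here's roughly what that looks like in code. This is a minimal sketch: the reference JSON path, bucket, and chunk sizes are made up, so swap in your own.

```python
import fsspec
import xarray as xr

# Open the reference filesystem; the fo= path is a hypothetical
# combined-reference JSON, not one of the notebooks above.
fs = fsspec.filesystem(
    "reference",
    fo="s3://example-bucket/mur_combined.json",
    remote_protocol="s3",
    remote_options={"anon": True},
    simple_templates=True,  # faster init, as recommended above
)
mapper = fs.get_mapper("")

# chunks= controls the Dask chunking; requesting chunks larger than the
# stored ones can speed up loads when the original chunks are small.
ds = xr.open_dataset(
    mapper,
    engine="zarr",
    backend_kwargs={"consolidated": False},
    chunks={"time": 1, "lat": 1000, "lon": 2000},  # tune for your analysis
)
```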
Two tests: one with 3 years of data (~900 files) runs fine, but when I scale to 7000 files... no errors, but it isn't running okay...
900 files: https://jupyter.qhub.esipfed.org/user/cgentemann/doc/tree/shared/users/cgentemann/notebooks/cloud_mur_v41_3yr.ipynb
7000 files: https://jupyter.qhub.esipfed.org/user/cgentemann/doc/tree/shared/users/cgentemann/notebooks/cloud_mur_v41-all.ipynb
ideas on what might be wrong?
@cgentemann here's an example of how I did what Martin mentioned with the NWM stuff. Hopefully there are enough comments/text to explain how it works, but let me know if you need help with anything
https://nbviewer.jupyter.org/gist/lsterzinger/8a93fc1780495aa84694f6d4b1a3708e
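For anyone reading later: the gist uses the older fsspec-reference-maker package, so here's a rough sketch of the same workflow with the current kerchunk names. The S3 paths are placeholders, not the actual NWM bucket.

```python
import json
import fsspec
from kerchunk.hdf import SingleHdf5ToZarr
from kerchunk.combine import MultiZarrToZarr

fs = fsspec.filesystem("s3", anon=True)
urls = ["s3://" + p for p in fs.glob("s3://example-bucket/nwm/*.nc")]

# Step 1: build a reference dict for each NetCDF4/HDF5 file
single_refs = []
for u in urls:
    with fs.open(u) as f:
        single_refs.append(SingleHdf5ToZarr(f, u).translate())

# Step 2: combine the per-file references along the time dimension
mzz = MultiZarrToZarr(
    single_refs,
    concat_dims=["time"],
    remote_protocol="s3",
    remote_options={"anon": True},
)
combined = mzz.translate()

# Write the combined reference set so others can open it directly
with open("combined.json", "w") as out:
    json.dump(combined, out)
```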
Lucas, you should be working on the ESIP qhub also!
Free compute and S3 storage, courtesy of ESIP (via AWS credits)
Wasn't sure if I was allowed to keep using this after I was done formally working for you. If I were to make a publicly accessible reference/Intake catalog of GOES, would it be okay to throw it up on the ESIP S3?