Discussion channel for work on creating metadata files that can provide zarr-like access speeds to data in many different older formats
For the “extra storage” of the references, you might want to note that there are various encoding tricks that work well. The simplest would be to zstd-compress the whole JSON: that maybe gets you a factor of 10 in size, but is super fast to unpack.
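For readers who haven't seen this trick, here is a minimal sketch using the `zstandard` package; the reference dict below is a made-up kerchunk-style example, not real data:

```python
import json

import zstandard  # third-party: pip install zstandard

# Hypothetical reference set: keys map to inline metadata or to
# [url, offset, length] byte ranges in the original file.
refs = {
    "version": 1,
    "refs": {
        ".zgroup": '{"zarr_format": 2}',
        "temp/0.0": ["s3://bucket/file.nc", 30000, 12000],
    },
}

raw = json.dumps(refs).encode("utf-8")
packed = zstandard.ZstdCompressor(level=9).compress(raw)
print(len(raw), "->", len(packed))

# Decompression is cheap, so unpacking at read time stays fast.
unpacked = json.loads(zstandard.ZstdDecompressor().decompress(packed))
assert unpacked == refs
```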
This is being given to a group with only basic Python knowledge, so I was really focused on just showing the concept, not optimization/storage tricks
Lucas, to me it's important that you don't say you need to read the whole netcdf4 file to get the metadata
Yes, I agree. This is a piece of information I got from a conversation with Kevin Paul at NCAR -- I think either I misunderstood what he was saying (the most likely explanation) or he misunderstood exactly how cloud access happens, maybe thinking of a time when you could only request entire files from object storage.
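To make the point concrete, here is a small sketch of a metadata-only read. It assumes h5py >= 2.9 (file-like object support) and fsspec with s3fs installed; the bucket path and variable name are made up. Because object stores serve arbitrary byte ranges, only the superblock and metadata blocks get fetched, not the whole file:

```python
import fsspec
import h5py

# Hypothetical public dataset; anon=True means unauthenticated access.
with fsspec.open("s3://bucket/file.nc", "rb", anon=True) as f:
    ds = h5py.File(f, "r")            # opens via ranged reads only
    print(list(ds))                   # variable/group names
    print(ds["temperature"].shape)    # shape & dtype live in the metadata
```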
I will make updates to the presentation before the workshop, thank you for your help!
{"MALLOC_TRIM_THRESHOLD_": "0"}
in the environment variables on your dask workers. " it feels fragile.
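For context, one way that advice gets applied in practice (a sketch, not the poster's exact setup): glibc reads MALLOC_TRIM_THRESHOLD_ once at process startup, so it has to be in the environment before the worker processes are spawned, e.g. by setting it in the parent process before creating the cluster. Newer dask versions also expose nanny configuration for pre-spawn environment variables.

```python
import os

# Must happen before any worker process is spawned, because glibc
# only reads MALLOC_TRIM_THRESHOLD_ at process startup.
os.environ["MALLOC_TRIM_THRESHOLD_"] = "0"

from dask.distributed import Client, LocalCluster

# Worker processes spawned by the nannies inherit the parent environment.
cluster = LocalCluster(n_workers=4, processes=True)
client = Client(cluster)
```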