Sep 2018
Rufus Pollock
Sep 15 2018 11:29

But, did finally manage to get something pushed!
My gosh did it swell up though. The original data was about 29MB uncompressed. Compressed it's about 6MB. But the CSV version on DataHub is 56MB. The JSON version is 173MB, and even the zipped version containing both is 21MB.
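
A rough sketch of why the same rows can come out bigger as JSON than as CSV: if the JSON is written as a list of row objects (an assumption here, not something confirmed about the DataHub export), every row repeats every field name, whereas CSV states them once in the header. The field names and sample rows below are made up purely for illustration.

```python
import csv
import gzip
import io
import json


def serialized_sizes(rows, fieldnames):
    """Return byte sizes of the same rows as CSV, JSON and gzipped CSV."""
    # CSV: field names appear once, in the header row.
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=fieldnames)
    writer.writeheader()
    writer.writerows(rows)
    csv_bytes = buf.getvalue().encode("utf-8")

    # JSON as a list of objects: field names are repeated in every row.
    json_bytes = json.dumps(rows).encode("utf-8")

    return {
        "csv": len(csv_bytes),
        "json": len(json_bytes),
        "csv_gz": len(gzip.compress(csv_bytes)),
    }


# Hypothetical sample data, not the dataset discussed above.
fields = ["station_id", "date", "temperature"]
rows = [
    {"station_id": i % 50, "date": "2018-09-15", "temperature": 20.5}
    for i in range(10_000)
]

sizes = serialized_sizes(rows, fields)
print(sizes)
```

The exact ratios depend heavily on the data, so this only shows the direction of the effect, not the numbers quoted above.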

That's kind of expected. Crudely put, my sense atm is that "storage is cheap" compared with the cognitive and processing complexity of managing compressed data. That said, we keep actively reconsidering this (esp. if smaller storage means faster processing). For example, we are thinking about supporting Parquet too.