These are chat archives for HdrHistogram/HdrHistogram

Sep 2015
Sep 14 2015 18:12

Size difference (in bytes, with 2 digits of precision and a 1 usec to 1 day range):

                          V1      V2   V2/V1
empty histogram           52      48    0.92
30% filled in center    2096    1688    0.80
all full               10744    9340    0.86
Clearly V2 is better on size: 8% to 20% smaller in these three cases.
On the CPU side, as expected, V2 is really slow in pure Python compared to the optimized V1, which uses numpy arrays.
Times in seconds to encode or decode, 1000 times, a histogram that is 30% filled:

            V1      V2
Encode    0.79    4.69
Decode    0.23    3.11

Profiling shows that the zigzag decode takes about 60% of the time, while iterating over the varint byte stream takes 30%. So these two alone make up 90% of the time.
I'll have to look at C wrappers if I have to deploy V2 as it is... Google has C++ libraries wrapped in Python; I'll need to see if the varint encoders/decoders are exposed (GPB prepends a type to every varint). Even if I could use that C++ code, it would only make up for the 60%. So it looks like I'll need to write a small piece of C code that does everything ;-(

GPB provides a pure Python zigzag codec, which I borrowed (to get the numbers above), and an option to use a native C++ implementation.
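
For reference, the two hot spots from the profile (zigzag decode and iterating over the varint byte stream) look roughly like this in pure Python. This is only a minimal sketch of a generic protobuf-style zigzag + LEB128 varint codec, not the code that was actually timed; the function names are made up for illustration, and the exact HdrHistogram V2 wire format may differ in details.

```python
MASK64 = 0xFFFFFFFFFFFFFFFF  # assumption: counts fit in a signed 64-bit range


def zigzag_encode(n):
    """Map a signed int to unsigned so that small magnitudes stay small."""
    return ((n << 1) ^ (n >> 63)) & MASK64


def zigzag_decode(z):
    """Inverse of zigzag_encode."""
    return (z >> 1) ^ -(z & 1)


def varint_encode(u):
    """Encode an unsigned int as 7-bit groups, low group first."""
    out = bytearray()
    while u > 0x7F:
        out.append((u & 0x7F) | 0x80)  # high bit set: more bytes follow
        u >>= 7
    out.append(u)
    return bytes(out)


def varint_decode_stream(data):
    """Yield each unsigned int found in a concatenated varint byte stream."""
    value = 0
    shift = 0
    for b in bytearray(data):  # per-byte Python loop: this is the slow part
        value |= (b & 0x7F) << shift
        if b & 0x80:
            shift += 7
        else:
            yield value
            value = 0
            shift = 0


def encode_counts(counts):
    """Hypothetical helper: zigzag + varint encode a list of counts."""
    return b"".join(varint_encode(zigzag_encode(c)) for c in counts)


def decode_counts(data):
    """Hypothetical helper: decode a byte stream back into a list of counts."""
    return [zigzag_decode(u) for u in varint_decode_stream(data)]
```

Every byte goes through several interpreted operations here, which is consistent with these two steps dominating the decode time; a C implementation would run the same loop natively.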