These are chat archives for HdrHistogram/HdrHistogram

17th
Sep 2015
Alec
@ahothan
Sep 17 2015 00:43

@giltene that rule of the 9th byte is always "full" is it standard LEB128 or a "special" HdrHistogram variant?
The GPB python code looks pretty clear on this. Examples:
https://github.com/google/protobuf/blob/master/python/google/protobuf/internal/encoder.py

def _VarintSize(value):
"""Compute the size of a varint value."""
if value <= 0x7f: return 1
if value <= 0x3fff: return 2
if value <= 0x1fffff: return 3
if value <= 0xfffffff: return 4
if value <= 0x7ffffffff: return 5
if value <= 0x3ffffffffff: return 6
if value <= 0x1ffffffffffff: return 7
if value <= 0xffffffffffffff: return 8
if value <= 0x7fffffffffffffff: return 9
return 10

As you can see the size for a full 64-bit value is 10 not 9 and the encode/decode loops do not make a special case of the 9th byte.

With the 9th byte being full rule, the max encodable value will be 63-bit.
Gil Tene
@giltene
Sep 17 2015 02:52
Hmmmm.....
Gil Tene
@giltene
Sep 17 2015 03:06
Scanning through various LEB128 implementations, I do see the (potential) 10 byte encoding in most. I guess some of the protocols I was reading were "non-standard" LEB128 (The original LEB128 was only used for 32 bit values). So this 64 bit encoding should probably called something other than LEB128. E.g. kryo uses this same 1-9 byte encoding in various things (e.g. http://kryo.googlecode.com/svn/trunk/src/com/esotericsoftware/kryo/io/UnsafeOutput.java) .
I/we should change the comments then to say the equivalent of "LEB128 variant with a 9 byte max length for 64 bit variables".
Michael Barker
@mikeb01
Sep 17 2015 03:09
It seems wasteful to need an extra byte to hold 1 bit, especially when the purpose of it is encoding efficiency.
I guess it stems from the original purpose was to support encoding arbitrary length integers not specifically 32/64 bit.
Gil Tene
@giltene
Sep 17 2015 03:10
I guess the 9 byte variants came along at some point when people realized that 64 bit integers are "special" in that that last bit left over can replace the "more" encoding in the common variant. For 32 bit ints, this never needed consideration. Strange that it wasn't picked up by Google though for GPB. It would be especially obvious for non-ZigZag encoded negatives. (with ZigZag, only very large positives or very large negatives would ever use that 10th byte).
Gil Tene
@giltene
Sep 17 2015 04:01
@ahothan @mikeb01 : Ok, I did find a few other places that use this same variant (9 bytes max for 64 bit) for varLong encoding. [It's what SkinnyHistogram was using, which is what got me started on V2 to begin with). But I found no special name designating it (VarLongS was the closest I found for a variant name). I just pushed comment changes (in the Java code) to document the fact that this is not a standard LEB128 format. I'm calling it a LEB128-64b9B variant.
Alec
@ahothan
Sep 17 2015 05:21
Fair enough. I don't think many people will have 64-bit values in their counters anyway, so that is mostly an academic discussion. Still for the sake of completeness we should document that counters are 63-bit only (at least if they have to be encoded) and the code might as well throw an exception if it ever has to encode a 64-bit counter value? Who knows when that can happen...
Gil Tene
@giltene
Sep 17 2015 06:25
@ahothan : Long.MAX_VALUE is within the encoding range, so I see no need to note anything. LEB128-64b9B has the full 64 bit range. The encoding uses negative numbers to encode Zero run lengths, but negative counts are undefined and cannot be recorded. As such, the encoding supports the full range of positive integers possible in 64 bit signed integers.