These are chat archives for HdrHistogram/HdrHistogram

29th Jul 2015
Gil Tene
@giltene
Jul 29 2015 05:31
@ahothan I'd focus on V1 and skip V0 (unless you want to be able to read old V0 format log files in python). You can consider the Java code to be the "documentation" of the V1 format for now. But the format is pretty straightforward: compressed V1 encodings have a two-word header (a cookie and the compressed payload length), and their payload is a compressed payload that decompresses to a V1 non-compressed encoding. You can see the compressed-format decoding logic in https://github.com/HdrHistogram/HdrHistogram/blob/master/src/main/java/org/HdrHistogram/AbstractHistogram.java#L1937 .
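A minimal sketch of reading that compressed wrapper in python, based on the description above (two BIG_ENDIAN 32-bit words, then a zlib-compressed payload); a real decoder would also validate the cookie value, which this sketch does not:

```python
import struct
import zlib

def decode_compressed_v1(buf):
    """Unwrap a compressed V1 encoding: 2-word header + zlib payload."""
    # Header: 32-bit cookie, then 32-bit compressed payload length (BIG_ENDIAN).
    cookie, payload_len = struct.unpack_from(">ii", buf, 0)
    # The payload decompresses to a non-compressed V1 encoding.
    return zlib.decompress(buf[8:8 + payload_len])
```

The returned bytes would then be parsed as a non-compressed V1 encoding.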
Non-compressed V1 encodings (which can appear either "raw" or as the compressed payload of compressed V1 encodings) have a 40-byte header, made up of a 32-bit cookie followed by three 32-bit signed ints, two 64-bit signed longs, and one 64-bit double (in that order, all BIG_ENDIAN encoded). The header fields can be seen in the decoding logic here: https://github.com/HdrHistogram/HdrHistogram/blob/master/src/main/java/org/HdrHistogram/AbstractHistogram.java#L1788 , and the roles of the fields after the cookie are: payloadLengthInBytes, normalizingIndexOffset, numberOfSignificantValueDigits, lowestTrackableUnitValue, highestTrackableValue, integerToDoubleValueConversionRatio
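That header lays out naturally as a `struct` format string. A sketch, assuming a 32-bit cookie precedes the six fields listed above (which is what brings the total to 40 bytes); the field names here just mirror the roles as listed:

```python
import struct
from collections import namedtuple

V1Header = namedtuple("V1Header", [
    "cookie",
    "payload_length_in_bytes",
    "normalizing_index_offset",
    "number_of_significant_value_digits",
    "lowest_trackable_unit_value",
    "highest_trackable_value",
    "integer_to_double_value_conversion_ratio",
])

# 4 x 32-bit int, 2 x 64-bit long, 1 x 64-bit double, all BIG_ENDIAN: 40 bytes.
V1_HEADER = struct.Struct(">iiiiqqd")

def parse_v1_header(buf):
    return V1Header(*V1_HEADER.unpack_from(buf, 0))
```

The counter array payload would follow immediately after these 40 bytes.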
Gil Tene
@giltene
Jul 29 2015 05:40
Non-compressed V1 encodings include as many array entries as needed to cover the maxValue, but they do not truncate the bottom of the array. This is not a real issue for the compressed forms (which are the ones commonly used for logs and wire formats), since a leading sequence of zeros compresses so well that the logic and complexity needed to avoid it were considered a waste (at least for V1).
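A quick sanity check of that claim, modeling counters as 8-byte zero words: a 1000x difference in the number of leading zero counters barely moves the compressed size.

```python
import zlib

# 100 zero counters vs. 100,000 zero counters, 8 bytes each.
few_zeros = zlib.compress(bytes(8 * 100))
many_zeros = zlib.compress(bytes(8 * 100_000))
# Raw sizes differ by 1000x; compressed sizes differ by only a handful
# of bytes, since deflate run-length-encodes the zeros.
```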
Alec
@ahothan
Jul 29 2015 16:10
@giltene: thanks for the pointer, I will have a look at the Java code and implement V1 only. I'm still working on getting most functions of HdrHistogram_C implemented and tested in python. I have implemented all the iterators and am now debugging an issue with the linear iterator where the number of iterations is smaller than the test code expects. In short, when recording a corrected value of 16800000 with an expected interval of 10000, the values 16790000 and 16780000 fall into the same bucket in python, causing the linear iteration with a bucket size of 10000 to come up one iteration short (since that bucket's counter is 2). Will try the C version to see what is going on...
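For what it's worth, the bucket math seems to predict exactly this. A simplified sketch, assuming 3 significant digits (2048 = 2^11 sub-buckets, so a half-count magnitude of 10) and a lowest trackable value of 1; `lowest_equivalent` here is my own stripped-down version of the lowest-equivalent-value calculation, not the library's code:

```python
SUB_BUCKET_HALF_COUNT_MAGNITUDE = 10  # 2048 sub-buckets for 3 significant digits

def lowest_equivalent(value):
    # Bucket 0 has unit resolution; each higher bucket doubles the resolution.
    bucket_index = max(0, value.bit_length() - (SUB_BUCKET_HALF_COUNT_MAGNITUDE + 1))
    return (value >> bucket_index) << bucket_index

# Around 16.8M (~2^24) the bucket resolution is 2^14 = 16384, so two
# values only 10000 apart can land in the same bucket.
```

Under these assumptions, 16780000 and 16790000 share the bucket starting at 16777216, while 16800000 falls in the next one, which would account for the missing linear-iterator step.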
Alec
@ahothan
Jul 29 2015 16:17
@giltene: regarding bottom-of-array truncation, I have no doubt the compression of leading zeros is very efficient; my concern is more about memory allocation while decompressing, since python's zlib always creates new storage on decompress. In my particular usage with wrk2, the default lowest trackable value is 1 usec while the min value is often in the msec range, so we have a lot of zero counters at the bottom. Anyway, I'll try it the V1 way and see how bad it really is (worst case I can always raise the lowest trackable value in wrk2).
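One possible way to blunt that allocation concern, sketched here as an idea rather than a drop-in fix: decompress in bounded chunks with `zlib.decompressobj()` instead of a single `zlib.decompress()` call, so peak extra storage stays around `chunk_size` and the leading zero counters could be scanned and dropped as they stream past.

```python
import zlib

def stream_decompress(payload, chunk_size=65536):
    """Decompress in chunks of at most chunk_size bytes at a time."""
    d = zlib.decompressobj()
    out = bytearray()
    data = payload
    while data:
        out += d.decompress(data, chunk_size)  # cap output per call
        data = d.unconsumed_tail               # leftover compressed input
    out += d.flush()
    return bytes(out)
```

This sketch still accumulates everything for demonstration; a real reader would parse (or skip) counters chunk by chunk instead of appending to `out`.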