These are chat archives for HdrHistogram/HdrHistogram

9th Sep 2015
Michael Barker
@mikeb01
Sep 09 2015 02:00
There was a small bug with the new test, which I've fixed.
Michael Barker
@mikeb01
Sep 09 2015 02:06
V2 decoding (but not encoding yet) supported in the latest C version.
Gil Tene
@giltene
Sep 09 2015 02:34
strange thing with that Log/log bug. How did this work on my Mac? I didn't think it ignores cases...
Michael Barker
@mikeb01
Sep 09 2015 02:35
I think the Mac file system is magic. It preserves and ignores case.
I notice when allocating the buffer for encoding the LEB128 data, you use countsArrayLength * 16. For LEB128 you should only need countsArrayLength * 9? And I think the TZLE would only make it smaller?
Gil Tene
@giltene
Sep 09 2015 02:40
you mean wordSizeInBytes?
Michael Barker
@mikeb01
Sep 09 2015 02:40
int getNeededPayloadByteBufferCapacity(final int relevantLength) {
    if (useTzleEncoding) {
        return (relevantLength * 16);
    }
    return (relevantLength * wordSizeInBytes);
}
Gil Tene
@giltene
Sep 09 2015 02:40
i.e. countsArrayLength * 16
I do it because that's the worst case.
I think.
Although maybe you are right, since any TZLE count would be on 1 or more zeros
which means it is at worst 8 bytes coded in LEB128, which is 9...
Ok. I'll do a pass to change it to 9 across the board then.
Michael Barker
@mikeb01
Sep 09 2015 02:42
Cool, I was just up to implementing that bit and will use 9 too.
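A hedged illustration of the 9-byte cap they settle on here (not the library's actual code; names are mine): a LEB128-style varint in which the first 8 bytes each carry 7 payload bits plus a continuation flag, while a 9th byte, when needed, carries the remaining 8 bits outright, so a 64-bit count never needs more than 9 bytes.

```python
def encode_varint(value: int) -> bytes:
    """LEB128-style varint capped at 9 bytes: bytes 1-8 carry 7 payload
    bits each plus a continuation flag; a 9th byte, when needed, carries
    the remaining 8 bits with no flag."""
    value &= (1 << 64) - 1              # treat as an unsigned 64-bit word
    out = bytearray()
    for _ in range(8):
        if value < 0x80:                # fits in 7 bits: final byte, no flag
            out.append(value)
            return bytes(out)
        out.append((value & 0x7F) | 0x80)
        value >>= 7
    out.append(value)                   # 9th byte: the last 8 bits verbatim
    return bytes(out)
```

With this cap, relevantLength * 9 is a safe worst-case payload buffer size, which is the change being agreed on above.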
Gil Tene
@giltene
Sep 09 2015 02:42
Also, I'll delete that .Log file and put back a .log for consistency with the other file names
Michael Barker
@mikeb01
Sep 09 2015 02:43
Although it shouldn't make any material difference to the encoding, just a little more space efficient.
Gil Tene
@giltene
Sep 09 2015 02:54
Yes, it will only affect buffer allocations, which go by worst case. Change made and pushed
Michael Barker
@mikeb01
Sep 09 2015 03:35
C implementation now supports v1.2 for both encoding and decoding.
Gil Tene
@giltene
Sep 09 2015 04:01
Cool! Any notes/issues/comments now that you've reviewed it the best way possible (by porting it)?
@ahothan : Do you want to take a shot at supporting V2 encoding and v1.2 log format support (which basically means V2 encoding) in your python version?
Michael Barker
@mikeb01
Sep 09 2015 04:07
Nothing major. I'd probably pull the '9' value used in AbstractHistogram out into a constant (it's used in ~7 places). It was pretty straightforward and worked without too many headaches. I haven't benchmarked or done size comparisons. It did force me to do some important clean-ups.
Gil Tene
@giltene
Sep 09 2015 04:10
@ahothan: just sent you an invite for the HdrHistogram organization on github. You should be able to use that to move your repo under that org, I think. You should pick a good name for it under the top level hierarchy, something python-natural that would mention python. HdrHistogram-py, or hdr_histogram_py, or ...
Yeah, I went back and forth on that 9...
Gil Tene
@giltene
Sep 09 2015 04:24
I plan to wait a week or so and then release 2.1.7 to maven central after sleeping on it for a few nights. At that point the default format for Java will shift, and having a C version released at the same time would be a good thing...
Michael Barker
@mikeb01
Sep 09 2015 07:02
Sounds good. I'll aim for a 1.0 around the same time.
Alec
@ahothan
Sep 09 2015 07:49
@giltene: TBH I was not expecting a sudden change of encoding that early ;-) Iterating through array counters (which is what LEB128 is about) is not going to be very fast in Python, so I'll need to do some benchmarks to see how V2 fares compared to V1 in terms of CPU.
If size and speed matter that much, have you looked at other compression algorithms that may provide very fast compression speed and a good ratio? For me decoding speed is more important, as I need to be able to decompress and add as fast as possible; compression ratio matters less, since the cost of sending 2KB of payload is the same as sending 2.5KB (it's all split into 1500-byte MTU packets).
I managed pretty decent decompression speed with V1 because I used numpy's native speed for array addition, which I can't do with V2.
Alec
@ahothan
Sep 09 2015 08:02
@mikeb01: I'm amazed at how fast you can port this to C! Please let me know before you publish that version, as it's going to break my wrk2 code (I need to stick with your 0.9 until I switch on the python side).
Michael Barker
@mikeb01
Sep 09 2015 08:12
The change is pretty straightforward. Most of the effort was copying across the ZigZag value encoding.
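The ZigZag step mentioned here is the standard protobuf-style mapping of signed values onto small unsigned ones, so that negative values (like the zeros counts) still encode to short varints. A minimal Python sketch, assuming 64-bit signed inputs (Python's arbitrary-precision ints make the arithmetic shift behave like Java's `>>` for negatives):

```python
def zigzag_encode(v: int) -> int:
    # 0 -> 0, -1 -> 1, 1 -> 2, -2 -> 3, ... : small magnitudes stay small
    return (v << 1) ^ (v >> 63)

def zigzag_decode(u: int) -> int:
    # invert the mapping: even -> non-negative, odd -> negative
    return (u >> 1) ^ -(u & 1)
```

Without this mapping, a value like -1 would encode as a maximum-length varint; with it, -1 becomes 1 and takes a single byte.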
Michael Barker
@mikeb01
Sep 09 2015 08:22
I happened to have a spare day at work.
Michael Barker
@mikeb01
Sep 09 2015 08:33
You can use the 0.9.0 tag for the older format, 0.9.1 for the newer one. There is a pre-release build of 0.9.1 with binaries for Mac and Linux. https://github.com/HdrHistogram/HdrHistogram_c/releases/tag/HdrHistogram_c-0.9.1
Gil Tene
@giltene
Sep 09 2015 15:00
After a night's sleep, I realized that TZLE can simply be ZLE, simplifying the code a bit and, more importantly, simplifying the description (a negative value is simply a zeros count). The reason TZLE was there to begin with is that I was playing with it before ZigZag, and had the trailing zeros count encoded in the high bits of the actual value that preceded the zeros (when the value was below a threshold). ZigZag took the need for that away (the value shrinks to the appropriate number of bytes, and carrying the zeros count as a separate value doesn't increase the payload).
So I'm thinking of changing the (not yet released) V2 encoding to ZLE, and we'll need a corresponding change in C. @mikeb01, does that work for you?
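The ZLE scheme described here ("a negative value is simply a zeros count") can be sketched in Python roughly as follows; the function names are illustrative, not the library's:

```python
def zle_encode(counts):
    """Zero-length-encode a counts array: each run of zeros is replaced
    by a single negative value whose magnitude is the run length;
    nonzero counts pass through unchanged."""
    out = []
    i = 0
    while i < len(counts):
        if counts[i] == 0:
            run = 0
            while i < len(counts) and counts[i] == 0:
                run += 1
                i += 1
            out.append(-run)            # negative value = zeros count
        else:
            out.append(counts[i])
            i += 1
    return out

def zle_decode(values):
    """Expand negative values back into runs of zeros."""
    out = []
    for v in values:
        if v < 0:
            out.extend([0] * -v)
        else:
            out.append(v)
    return out
```

Because histogram counts arrays are mostly zeros, this pass alone usually shrinks the array dramatically before the ZigZag/LEB128 and deflate stages even run.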
Gil Tene
@giltene
Sep 09 2015 15:31
Ok. I just pushed the change to use ZLE (instead of TZLE). Updated the V2 test log and the related jHiccupV2Log() test.
Gil Tene
@giltene
Sep 09 2015 15:39
@ahothan Regarding compression choices: I was arguing against it before (under the "zlib does this stuff internally" notion, and based on my earlier RLE attempts), but the Khronus folks' SkinnyHistogram convinced me. It turns out that while RLE doesn't make a huge difference (redundant with zlib), zero length encoding does. And the LEB128+ZigZag thing also does. As to their cost (an extra pass and logic): the significant reduction in zlib work that they both create seems to pay off from a CPU point of view (overall compression speed is 4x-5x faster in Java this way). My guess is that for most histograms this will also be true for python, because the LEB128 array you'll end up walking is much smaller than the counts array.
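A rough Python illustration of why the extra pass pays off: on a sparse counts array, the ZLE + ZigZag + varint pass hands zlib a handful of bytes instead of tens of kilobytes of raw 64-bit words. This is only a sketch under the encoding described in this conversation, not the library's actual payload format:

```python
import struct
import zlib

def varint(value: int) -> bytes:
    """Simple unsigned LEB128 (uncapped; fine for this demo)."""
    out = bytearray()
    while value >= 0x80:
        out.append((value & 0x7F) | 0x80)
        value >>= 7
    out.append(value)
    return bytes(out)

def zigzag(v: int) -> int:
    return (v << 1) ^ (v >> 63)         # signed -> small unsigned

def v2_like_payload(counts) -> bytes:
    """ZLE the counts (runs of zeros -> one negative value), then
    ZigZag + varint every value, then deflate the small byte stream."""
    out = bytearray()
    i = 0
    while i < len(counts):
        if counts[i] == 0:
            run = 0
            while i < len(counts) and counts[i] == 0:
                run += 1
                i += 1
            out += varint(zigzag(-run))
        else:
            out += varint(zigzag(counts[i]))
            i += 1
    return zlib.compress(bytes(out))

# A sparse counts array, as latency histograms usually are:
counts = [0] * 10000
counts[100], counts[5000] = 7, 42

# V1-like: deflate the raw 64-bit words; V2-like: deflate the ZLE'd varints.
v1_like = zlib.compress(struct.pack('<%dq' % len(counts), *counts))
v2_like = v2_like_payload(counts)
```

Here zlib's input drops from 80,000 bytes to under a dozen, which is the "significant reduction in zlib work" being described: most of the CPU win comes from deflate simply having far less to chew on.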
Gil Tene
@giltene
Sep 09 2015 15:51
@ahothan As for other compression schemes, lz4 seems like the best candidate for an additional future compression scheme, since it is so widely used and available in many languages. I just wish there was a BSD-licensed Java version (the original lz4 is BSD licensed, but the commonly used Java one out there is Apache licensed). I have no problem with Apache myself, but due to the delicate GPLv2 vs. Apache issues (only some of which I understand) I've been trying to avoid including or depending on code that has potential license compatibility issues for others (e.g. "what if some GPLv2 licensed project wanted to include HdrHistogram?"). Currently I use only BSD and/or CC0 licenses for the actual code in the library (the combination is compatible with pretty much all other OSS [and most non-OSS] licenses), and I have avoided depending on any non-platform-included external stuff. Bottom line: using/requiring lz4 would add a dependency that I need to understand better from a licensing point of view...
Michael Barker
@mikeb01
Sep 09 2015 18:28
@giltene I'll have a look at ZLE today.
Michael Barker
@mikeb01
Sep 09 2015 19:00
Done.
Michael Barker
@mikeb01
Sep 09 2015 21:54
@giltene The other option for compression would be snappy. It doesn't compress as well (around double the size), but in a couple of places where I've used it, it was significantly faster than gzip.