These are chat archives for HdrHistogram/HdrHistogram

4th
Aug 2015
Michael Haberler
@mhaberler
Aug 04 2015 14:09
hi - I'm looking into using HdrHistogram_c in machinekit.io, a portable RT motion control application
I'm at the most basic steps - cmake ., make test - and get segfaults on both arm7 andx86 debian jessie
should I report an issue, or am I overlooking something?
Gil Tene
@giltene
Aug 04 2015 15:18
@ahothan The purpose for having a compressed payload length in the payload header is to facilitate framing. While a decompressor reading the compressed payload as a stream could determine the end of the payload stream from it's internal encoding, a protocol parser that may want to skip over the payload, or needs to copy the payload into a side buffer for processing would have a hard time determining the number of opaque bytes that need to move.
Gil Tene
@giltene
Aug 04 2015 15:23
@mhaberler If you can put together the actual failure output and specific steps that lead to it, posting about it here is a good first step. If, after a first look, it's an actual issue with the code or build scripts (as opposed to setup on your side), opening an issue on github makes sense. The same is true even if a change could improve the ease of building in ends like yours.
Alec
@ahothan
Aug 04 2015 15:36
@giltene ok as it turned out I did make use of that payload length field for all my decoded payload objects (since they can have a shorter array).
  • @mikeb01, I have converted my iterator code to use the simpler index-based iteration as recommended by @giltene , lots of changes but I see the code is simpler and faster (my whole UT run went from 4 seconds to less than 3). I ended up converting the Java version of the iterators almost line by line as it it a tricky part and will ease future syncs. The UT did help a lot to make sure I did not break anything with the conversion.
    I have also implemented the V1 encoding (and about to deprecate the JSON based) and added more APIs that were missing.
    I only have a few features/API not implemented as I do not need them, the list is in the readme.rst. Would be happy to get any feedback, the source code and readme file are in github:
    https://github.com/ahothan/hdrhistogram
    You can also give it a run pretty easily with a few steps using python pip (as described in the readme). The last version 0.0.5 is uploaded to PyPI:
    https://pypi.python.org/pypi/hdrhistogram

The python API use is best viewed from the UT code (https://github.com/ahothan/hdrhistogram/blob/master/test/test_hdrhistogram.py)
The UT does some V1 compressions and I got the following compressed sizes:

range: 1 usec to 24 hours, 3 digits (the standard wrk2 settings):
best case (all counters at zero) =

Alec
@ahothan
Aug 04 2015 15:44

276 bytes (binary, zlib compression only), 385 bytes (with base64 encoding)
typical case (I used 20% aggregated counters with distinct values at about 30% from the start) =
12712, 17172

worts case (all counters have distinct values) =
60501, 81689

With just 2 digits the sizes get much smaller:
85, 126
1630, 2212
8048, 10879

Michael Haberler
@mhaberler
Aug 04 2015 16:11
@giltene - excellent, thanks - will do
Gil Tene
@giltene
Aug 04 2015 18:55
@ahothan Looks like things are coming along nicely for your pure python imply. Do you want to create a repo for your stuff under the HdrHistogram org? Since we'll likely see both a "pure python" impl (yours) and a wrapped-around-C impl, what's a good naming scheme for the repos to denoting the two?
Michael Barker
@mikeb01
Aug 04 2015 20:57
@mhaberler Definitely raise an issue. Please could you do a couple things that you could do to help me isolate the problem. First do a build with debug enabled cmake -DCMAKE_BUILD_TYPE=Debug. Then if you look in the test directory underneath your build directory there will be 4 binaries named 'test_*'. If you could run each of those independently to see which one segfaults. Attach the core file to the bug report. If you could also run the failing binary with valgrind it would be appreciated.
Alec
@ahothan
Aug 04 2015 21:03
@giltene I'd be happy to move it under the github HdrHIstogram org, do you have instructions on how to do such move? As for naming, to follow the other repo names (HdrHistogram, Hdr_Histogram_c), we could name the repository HdrHistogram_py (too bad the erlang version uses the "hdr_histogram" prefix).
As for the python/C wrapper I leave it to whoever owns it (HdrHistogram_pyc or whatever).
I still have to do some work to verify interop of the compressed format with Mike's C version, then modify wrk2 to use Mike's C implementation to generate the compressed histograms. Will also need to look at the Recorder code.
Alec
@ahothan
Aug 04 2015 21:19
@giltene: Regarding the wire transport protocol for HdrHistogram compressed content, I wanted to get your opinion about standardizing this as well in some form, is it something you are interested in?
First, the base64 wrapping is needed for usages that require a printable content (MIME format, text files...) at the cost of the base64 overhead (+33%). However it may not be required for others who can handle binary content just fine and by doing so we can save some precious CPU by skipping the base64 encoding and decoding. In my python implementation I made that an option of the encoder class (b64_wrap = True by default, will bypass the base64 wrapping if False).
Second, I think it would be great to define a standardized inter-connection protocol for sending compressed histograms since we are already half way done with the wire format. Using text log files may not always be best (especially with periodic reporting at scale).
Gil Tene
@giltene
Aug 04 2015 21:27
@mikeb01 if you add support for auto-sizing (like the Java version) it should be possible to reuse a histogram even if the new parameters don't match the old ones. This should be "easy" if the number of decimal points match (the array length will always be appropriate for some range), and is slightly more complicated if the number of decimal points changes (there may be partial bucket space over). I'm probably
Michael Barker
@mikeb01
Aug 04 2015 21:33
@giltene I didn't do the auto-resize as I've implemented the histogram as a contiguous structure (array at the end), so I would need to rethink the API to make it work or move to a structure which had a pointer at the end so I could resize the array internally. I'm not too worried as I think the only case where you get a big win from reusing the histogram is reading the histogram logs where the histograms are most likely to be constructed with the same parameters.
Gil Tene
@giltene
Aug 04 2015 22:37
I wouldn't do it for the reuse on read alone. But I'd consider it for the auto-sizing benefit. Once you get used to auto sizing, it's hard to go back... This is me Su because you don't have to cover the worst case any more. In many places I just let auto sizing do all the work now. But even in latency sensitive stuf f where I don't wZnt to incite a resize on the fast oath, I now normally size things to cover a common expected range (e.g. to 1 second or less), and let auto sizing take care of handling large latency values. The argument there is that if something I measure took more than a second, the resize cost involved in recording it is negligeable (as opposed to the cost of resizing to record a 2usec op).
Michael Barker
@mikeb01
Aug 04 2015 22:39
I see a bit of a rewrite in my future....
I suspect that separating the header from the body (the likely implementation) will probably simplify some other things too. And here I thought I was being so clever :-).