These are chat archives for HdrHistogram/HdrHistogram

25th
Feb 2017
Gil Tene
@giltene
Feb 25 2017 00:21
Cool. I updated http://hdrhistogram.org
We should probably transition the plotting links from that page to your JS-based hlog file plotter.
Gil Tene
@giltene
Feb 25 2017 00:31
LRHDR Histogram idea: I've been kicking around an idea that is tempting to call "LDRHistogram", but is more properly termed "Low Resolution High Dynamic Range"... It's inspired by the wish to somehow get a workable way for existing monitoring systems and time-series DBs to deal with response time and latency information in a sensible (and not irrelevant-and-useless) way.
The key notion is this: While I'd love to have time-series DBs store HdrHistograms as value BLOBs (and series of those BLOBs over time), I think it will take a long time for that to happen.
What time-series DBs are good at storing and summarizing are numbers. And sensible example uses for such numbers are counts of things that happened, or rates of things that happened in a period of time. Those can go relatively unharmed through the meaning-oblivious processing layers that summarize and plot values and series of those values. Non-sensible example uses for those numbers are anything that has to do with latency or response time magnitude, including percentiles, max, and min values.
Gil Tene
@giltene
Feb 25 2017 00:37
[The reason for the non-sensible uses is that the various summarization and plotting techniques all aggregate values by averaging them. E.g. the value of a pixel in a graph usually represents the average value for the time period covered by that pixel. And averages of response time magnitudes are always nonsensical.]
So my thinking is: What if we flipped the problem on its side? Rather than report a series of latency magnitudes at various percentile levels as is common today (e.g. 50%'ile, 75%'ile, 90%'ile, 99%'ile, 99.9%'ile are commonly carried through by statsd),
what if we instead reported a series of "How many times did a latency higher than X show up?" values. E.g. How many were above [10msec, 20msec, 50msec, 100msec, etc.]?
Gil Tene
@giltene
Feb 25 2017 00:42
With those values (and a count of the total latency measurements in the same period) you could answer questions like "what was the % of responses that took longer than 50msec?" and "was the % of >50msec response times higher than 1%?". Which, if you think about it, makes a lot more sense than "what was the 99%'ile, and was it higher than 50msec?".
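To make the arithmetic concrete, here's a minimal sketch in Java (the counter names and numbers are purely illustrative, not from any real system):

// Hypothetical per-interval counters, as reported by the monitoring
// pipeline: one "count above threshold" value plus the total count.
long countAbove50ms = 37;       // how many latencies were > 50msec?
long totalCount = 10_000;       // total latency measurements in the period

double percentAbove50ms = 100.0 * countAbove50ms / totalCount;  // 0.37%
boolean worseThanOnePercent = percentAbove50ms > 1.0;           // false here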
The "trick" here (and historically the reason people don't do this, I think) is that to log counts of "how many times did a latency higher than X show up?" values, you need to know X. Ahead of time. And usually the systems that produce the monitoring information don't really know (or want to hard-code) what the interesting levels or thresholds are...
Gil Tene
@giltene
Feb 25 2017 00:49
This is where my "LRHDR" idea comes in. I have a theory that is yet to be substantiated: I think that, without knowing what the application is or what range it is interested in, we can start from the assumption that all interesting threshold levels start with the digits 1, 2, or 5. E.g. 1msec, 2msec, 5msec, 10msec, 20msec, 50msec, 100msec, 200msec, 500msec, 1 sec, 2 sec, 5 sec, 10 sec, 20 sec, 1 minute, 2 minutes, 5 minutes.
That's 17 values, covering 1msec to 5 minutes. It's actually a histogram with a high dynamic range and a low resolution, with buckets pre-selected at "humanly interesting" reporting levels.
Imagine Metrics reporting this vector of counts (along with a total count) for every latency metric being tracked. Imagine statsd doing the same. Imagine Graphite storing it, and imagine using Grafana et al. to plot them...
Gil Tene
@giltene
Feb 25 2017 00:55
These values (counts of threshold crossings) would flow well through those systems. It's ok to aggregate them (the count of >50msec in an hour is the sum of the counts within that hour). It's ok to plot them with pixel-averaging plotters, and it's ok to do ratio-based stuff with them (including computing the percentage of things that crossed or did not cross a threshold over an hour).
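To illustrate why the aggregation is safe, here's a sketch (a hypothetical helper, not an existing API): rolling up intervals is just element-wise addition of their count vectors, with no averaging of magnitudes anywhere.

// Hypothetical helper illustrating the aggregation property: the count
// vector for a longer period is the element-wise sum of the vectors for
// the sub-periods it covers.
static long[] aggregate(long[] intervalA, long[] intervalB) {
    long[] sum = new long[intervalA.length];
    for (int i = 0; i < intervalA.length; i++) {
        sum[i] = intervalA[i] + intervalB[i];
    }
    return sum;
}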
so... Thoughts?
Marshall Pierce
@marshallpierce
Feb 25 2017 01:24
Hm, interesting. I'll think it over but it seems like a pragmatic solution to the broader pipeline issue.
Speaking of which I'd love to hear your feedback on some notes I've been intermittently compiling on where to go next with Metrics (the Java library): https://github.com/marshallpierce/metrics-thoughts
My OSS time has been occupied with the Rust HdrH impl, but trying out some of ^^ those ideas is next.
Basically I'm trying to fix one end of the chain of impedance mismatches. Metrics does the right thing for viewing graphs locally in jconsole and the wrong thing for most everything else.
Marshall Pierce
@marshallpierce
Feb 25 2017 01:29
It would be nice if we could pick the LRHDR measurement points such that you could convert a "regular" HdrHistogram to an LRHDR. That way you could measure in Hdr and export to systems that could only handle LRHDR-style data.
Gil Tene
@giltene
Feb 25 2017 01:36
Yeah, I think the LRHDR measurement points are the 1, 2, 5 things, because those will satisfy most folks' levels. When I ask people questions like "What's the acceptable level for X?", the answer usually starts with a 1, 2, or 5. And even when it doesn't (e.g. I got "300msec" back the other day), they'd tend to happily jump to one of the "close-by" levels (200msec or 500msec) if those were all they had available. It's a much more sensible "rounding" than looking at the 99%'ile (where the 99.01%'ile or 98.99%'ile may have seen a huge jump).
I expect to do exactly what you suggest: measure with HdrH and summarize from there to LRHDR (which probably needs a better acronym).
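A minimal sketch of that summarization step, assuming the Java HdrHistogram's existing getCountBetweenValues() and getMaxValue() API (the method name and threshold handling here are illustrative):

import org.HdrHistogram.Histogram;

// Illustrative only: summarize a full-resolution HdrHistogram into the
// 17-value "count above threshold" vector described above. Counts are
// approximate to within the histogram's configured resolution.
static long[] countsAboveThresholds(Histogram histogram, long[] thresholds) {
    long[] countsAbove = new long[thresholds.length];
    long max = histogram.getMaxValue();
    for (int i = 0; i < thresholds.length; i++) {
        // Count everything strictly above the threshold, up to the max.
        countsAbove[i] = (max > thresholds[i])
                ? histogram.getCountBetweenValues(thresholds[i] + 1, max)
                : 0;
    }
    return countsAbove;
}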
Vladimir Bukhtoyarov
@vladimir-bukhtoyarov
Feb 25 2017 08:27
In the "Low Resolution" case it is not a big deal to avoid hardcoding the intervals. We can just store the range boundaries in an additional sorted array and use binary search to find the appropriate range. Since the number of ranges is not supposed to be high, storing the additional array will not consume much memory. Just a simple mockup:
import java.util.Arrays;
import java.util.concurrent.atomic.AtomicLongArray;

public class LowResolutionHistogramMockup {

    // Upper bucket boundaries in msec, following the 1-2-5 series above:
    // 1msec..500msec, 1sec..20sec, then 1, 2, and 5 minutes.
    private static final long[] DEFAULT_OFFSETS = new long[] {
            1L,
            2L,
            5L,
            10L,
            20L,
            50L,
            100L,
            200L,
            500L,
            1000L,
            2000L,
            5000L,
            10000L,
            20000L,
            60000L,   // 1 minute
            120000L,  // 2 minutes
            300000L   // 5 minutes
    };

    private final long[] offsets;
    // One bucket per offset, plus a trailing overflow bucket for values
    // above the last offset.
    private final AtomicLongArray buckets;

    public LowResolutionHistogramMockup() {
        this(DEFAULT_OFFSETS);
    }

    public LowResolutionHistogramMockup(long[] offsets) {
        if (offsets != DEFAULT_OFFSETS) {
            // The default offsets are known-good; custom ones need checking.
            assertSorted(offsets);
            assertNotContainsDuplicates(offsets);
            assertNotContainsNegative(offsets);
        }
        this.offsets = offsets;
        this.buckets = new AtomicLongArray(offsets.length + 1);
    }

    public void record(long value) {
        int bucketIndex = binarySearchBucketIndex(value);
        buckets.incrementAndGet(bucketIndex);
    }

    private int binarySearchBucketIndex(long value) {
        int index = Arrays.binarySearch(offsets, value);
        if (index < 0) {
            // No exact match: binarySearch returns -(insertionPoint) - 1,
            // and the insertion point is the first offset greater than value.
            index = -index - 1;
        }
        return index;
    }

    // ... validation helpers and query methods omitted

}
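A quick usage sketch of the mockup above (values are made up; per the binary search, bucket i holds values up to and including offsets[i], with a trailing overflow bucket):

// Purely illustrative usage of the mockup above:
LowResolutionHistogramMockup histogram = new LowResolutionHistogramMockup();
histogram.record(3);        // lands in the (2, 5] msec bucket
histogram.record(42);       // lands in the (20, 50] msec bucket
histogram.record(999_999);  // above the last offset: the overflow bucket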
Jon Gjengset
@jonhoo
Feb 25 2017 15:17
Minor point, but I'd like to see µs also be included in the range. It expands it by a couple of values, but there are definitely high-performance applications where it matters.