These are chat archives for HdrHistogram/HdrHistogram

Mar 2017
Marshall Pierce
Mar 31 2017 00:00
@giltene While rolling out some internal monitoring, I took the opportunity to explore a [Timer|Latency|Candy]Gram-like setup. It was pretty straightforward. The service being monitored is in an, ahem, "runtime-challenged" language, so it emits individual measurements to statsd via UDP as "timers" in statsd-speak. statsd is a mess, but it does more or less work. Anyway, statsd periodically flushes that timer data to graphite. statsd emits a heap of useless things (like "mean of the 90th percentile") each time it flushes. Mercifully, you can disable its percentile emissions, which saves some graphite space, but you can't turn off mean, std dev, etc. You can, however, reduce or eliminate the retention of those metrics in graphite. You can also enable "histogram" calculation in statsd, which will place timer measurements into user-configurable buckets. Those will show up in graphite as, etc.
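For anyone following along, here is a minimal sketch of what "emitting a timer to statsd via UDP" looks like on the wire. It assumes the standard statsd line format `<name>:<value>|ms` and statsd's default UDP port 8125; the metric name `api.request` and the helper names `format_timer` and `send_timer` are made up for illustration and are not from the service described above.

```python
import socket


def format_timer(name: str, millis) -> bytes:
    """Build one statsd timer datagram: b"<name>:<value>|ms"."""
    return f"{name}:{millis}|ms".encode("ascii")


def send_timer(name: str, millis, host: str = "127.0.0.1", port: int = 8125) -> None:
    """Fire-and-forget a single timer measurement over UDP.

    host and port are assumptions (8125 is statsd's default port).
    """
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.sendto(format_timer(name, millis), (host, port))


if __name__ == "__main__":
    # Hypothetical metric name; one datagram per measurement.
    send_timer("api.request", 250)
```

Since this is UDP, nothing tells the sender whether statsd actually received the datagram.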
You can then use Grafana, or whatever dashboard you like, to execute a Graphite function: divideSeries(sum({bin_500,bin_1000,bin_...}), ...) is the fraction of things that are in the 500ms bin (so, greater than the next bin smaller than 500) and up.
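The arithmetic behind that query can be sketched outside Graphite too. This is only an illustration of the idea, not the exact Graphite expression: it assumes each bin holds the count of measurements that fell into it during one flush, keyed by the bin's upper limit, and that the divisor is the total count across all bins. The function name and the sample counts are invented.

```python
import math


def fraction_at_or_above(bin_counts: dict, threshold: float) -> float:
    """Fraction of measurements that landed in the `threshold` bin or any higher bin.

    bin_counts maps each bin's upper limit (math.inf for the last bin)
    to the number of measurements counted in that bin.
    """
    total = sum(bin_counts.values())
    if total == 0:
        return 0.0  # no measurements in this interval
    tail = sum(count for upper, count in bin_counts.items() if upper >= threshold)
    return tail / total


if __name__ == "__main__":
    # Invented counts for one flush interval, bins at 100, 500, 1000 and "inf".
    counts = {100: 900, 500: 60, 1000: 30, math.inf: 10}
    print(fraction_at_or_above(counts, 500))
```

With these made-up counts, 100 of the 1000 measurements sit in the 500ms bin or above, so the fraction is one tenth.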
So, it's not obvious, but it's not hard either.
Unfortunately there isn't a way to stop statsd from emitting the bogus, but tempting, mean/median/etc. (And while it's doing all that work, it's letting UDP packets queue up in the kernel, because nodejs... sigh)