These are chat archives for HdrHistogram/HdrHistogram

Oct 2016
Michal Kotelba
Oct 26 2016 01:24
I'm currently working on some ~fancy Dropwizard Metrics integration and talking w/ @vladimir-bukhtoyarov (the author of metrics-core-hdr):
Does anyone have any thoughts/ideas?
Michal Kotelba
Oct 26 2016 01:31
In particular, I'm looking to impl/use Counters + Meters + Timers in a performant sampling/snapshotting manner, but, even w/ Vladimir's lovely work, Metrics itself doesn't quite seem to have a clean way to do, well, much of anything :P
Marshall Pierce
Oct 26 2016 02:48
@mkotelba I'm pondering how to address these issues (which I think are largely structural, and not likely to change in metrics v3) for v4. I'd love to hear what your use case is.
Capturing data is only half the battle; for it to be useful it needs to be packaged up and shipped to somewhere else for storage, display, etc. So, the challenge for Metrics (the library) is not only to do a good job capturing but also to try to get some consensus from the other half (Graphite, Prometheus, Datadog, etc) on how to represent things like a histogram.
Michal Kotelba
Oct 26 2016 03:35

@vladimir-bukhtoyarov's ideas in dropwizard/metrics#1016 are related.

I do of course agree re: the importance of reporting, but, IMO, it's all for naught if the capturing bit doesn't manage to accurately record what is desired: in our case, an atomic snapshot of all Metrics.

That said, I think the default/base impl of the Metrics, MetricRegistry, etc should maintain compatibility. Perhaps moving + increasing the visibility of the actual impl guts of those classes, and thus allowing for clean extensibility, could be a start?

Marshall Pierce
Oct 26 2016 03:43
Could be, though I'm not sure it makes sense to have arbitrarily complex (custom) data structures be well represented in metrics -- what's the benefit vs just writing custom code to handle that? Anyway, we should probably take this discussion somewhere else since I think we're straying from hdrhistogram specifically.
If you're an IRCer, #metrics on freenode would do the job.
It's getting late in my time zone; I'll catch up with you tomorrow.
Gil Tene
Oct 26 2016 04:55
I obviously think that interval histograms should be a basic data type (blob) that is transported out of Metrics, maintained in histogram data form via summarizers and conduits like statsd, and stored as interval histogram data in time series databases. All latency, response time, and other time-length related reporting should be logged as that data type, and not in the silly ways it is logged today (averages, percentiles, etc.).
Getting Metrics to at least internally maintain all response time related metrics as histograms all the way through is a good start, and the work Marshall has done on an HdrHistogram based reservoir is a solid step toward that. But at some point we need to start working on things that are external (statsd, graphite) so that Metrics could send them that data without losing its valuable representation.
Given how compact the wire form for interval histograms tends to be (they are typically only a couple of hundred bytes in size), that's the form I'd like to see for on-the-wire transmission and for time-series-DB storage...
Michael Barker
Oct 26 2016 06:31
It would also be useful if various stores had first class support for hdr histograms so that operations like aggregating across a time range could be performed by the DB.
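
A minimal sketch of why interval histograms aggregate cleanly across a time range while per-interval percentiles do not. It assumes a simplified fixed-width bucket layout: `IntervalHistogram` and its methods are hypothetical stand-ins for illustration, not HdrHistogram's bucket scheme and not any DB's API (the real library exposes comparable operations such as `add` and `getValueAtPercentile`).

```java
import java.util.Arrays;
import java.util.List;

/** Hypothetical, simplified histogram with fixed-width buckets (illustration only). */
final class IntervalHistogram {
    private final long[] counts;     // counts[i] = number of values v with v / bucketWidth == i
    private final long bucketWidth;

    IntervalHistogram(int bucketCount, long bucketWidth) {
        this.counts = new long[bucketCount];
        this.bucketWidth = bucketWidth;
    }

    void record(long value) {
        if (value < 0) throw new IllegalArgumentException("negative value");
        // Values past the last bucket are clamped into it (a simplification).
        int idx = (int) Math.min(value / bucketWidth, counts.length - 1);
        counts[idx]++;
    }

    /** Merging two intervals is just adding bucket counts; no information is lost. */
    void add(IntervalHistogram other) {
        if (other.counts.length != counts.length || other.bucketWidth != bucketWidth) {
            throw new IllegalArgumentException("incompatible bucket layout");
        }
        for (int i = 0; i < counts.length; i++) counts[i] += other.counts[i];
    }

    long totalCount() {
        return Arrays.stream(counts).sum();
    }

    /** Upper bound of the bucket that contains the requested percentile (0 if empty). */
    long valueAtPercentile(double percentile) {
        long total = totalCount();
        if (total == 0) return 0;
        long target = Math.max(1, (long) Math.ceil(percentile * total / 100.0));
        long seen = 0;
        for (int i = 0; i < counts.length; i++) {
            seen += counts[i];
            if (seen >= target) return (i + 1) * bucketWidth;
        }
        return counts.length * bucketWidth;
    }

    /** What a store with first-class histogram support could do for a time-range query. */
    static IntervalHistogram aggregate(List<IntervalHistogram> intervals,
                                       int bucketCount, long bucketWidth) {
        IntervalHistogram merged = new IntervalHistogram(bucketCount, bucketWidth);
        for (IntervalHistogram h : intervals) merged.add(h);
        return merged;
    }

    public static void main(String[] args) {
        IntervalHistogram quiet = new IntervalHistogram(100, 10);
        IntervalHistogram spiky = new IntervalHistogram(100, 10);
        for (int i = 0; i < 100; i++) quiet.record(5);
        for (int i = 0; i < 90; i++) spiky.record(5);
        for (int i = 0; i < 10; i++) spiky.record(505);

        IntervalHistogram merged = aggregate(List.of(quiet, spiky), 100, 10);
        long averagedP99 = (quiet.valueAtPercentile(99) + spiky.valueAtPercentile(99)) / 2;
        System.out.println("p99 from merged histogram: " + merged.valueAtPercentile(99));
        System.out.println("average of per-interval p99s: " + averagedP99);
    }
}
```

The two printed numbers differ: averaging stored percentiles gives a figure that corresponds to no real percentile of the combined data, which is the argument for shipping and storing the histogram itself.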
Michal Kotelba
Oct 26 2016 06:45

@giltene I also love me some HdrHistogram :)

However, we should keep in mind that not all reporters are / will be live network connections to some variety of stats service. For example, the Metrics pipeline in the apps/services I'm working on is completely decoupled from (and knows absolutely nothing about) the potential, eventual destination of its output, a remote Elastic Stack pipeline (Filebeat -> Logstash -> Elasticsearch <-> Kibana).

Periodically, my reporter is serializing all metric values as the fields of a Logstash JSON entry.

In the past, all of the values exposed by a given metric were being output (i.e. count, each of the percentiles, etc. for histograms), but I'm working on limiting this down to just the raw "values" (e.g. counts mapped to timestamps for timers) and having Elasticsearch do the analysis + aggregation work (surprise, it also uses HdrHistogram :D).
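
A minimal sketch of that kind of reporter output, assuming one JSON object per report with each raw metric value as a field. `JsonLineReporterSketch`, `toJsonLine`, and the metric names are hypothetical; only the `@timestamp` field follows the usual Logstash event convention, and the escaping is deliberately minimal.

```java
import java.time.Instant;
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical reporter helper: renders raw metric values as one JSON line. */
final class JsonLineReporterSketch {

    /** Builds a single-line JSON object: "@timestamp" first, then one field per metric. */
    static String toJsonLine(Instant timestamp, Map<String, Long> rawValues) {
        StringBuilder sb = new StringBuilder();
        sb.append("{\"@timestamp\":\"").append(timestamp).append('"');
        for (Map.Entry<String, Long> e : rawValues.entrySet()) {
            sb.append(",\"").append(escape(e.getKey())).append("\":").append(e.getValue());
        }
        return sb.append('}').toString();
    }

    /** Escapes only backslashes and double quotes; a real reporter should use a JSON library. */
    private static String escape(String s) {
        return s.replace("\\", "\\\\").replace("\"", "\\\"");
    }

    public static void main(String[] args) {
        Map<String, Long> values = new LinkedHashMap<>();
        values.put("requests.count", 42L);        // hypothetical metric names
        values.put("requests.errors.count", 3L);
        System.out.println(toJsonLine(Instant.now(), values));
    }
}
```

Because only raw values are written, nothing in the line depends on where it ends up; whatever reads it downstream is free to do the aggregation.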

Vladimir Bukhtoyarov
Oct 26 2016 08:36
Sorry guys that we're using this space to discuss problems not directly related to HdrHistogram. @marshallpierce, could you respond directly in dropwizard/metrics#1016?
Could somebody take a look at this small request, HdrHistogram/HdrHistogram#115?
It is a small fix that prevents unnecessary allocation of inactiveHistogram inside the constructor of Recorder, for cases when clients prefer to use getIntervalHistogram instead of getIntervalHistogramInto.
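
A minimal sketch of the general idea, assuming a simplified two-buffer recorder. `LazyRecorderSketch`, its method names, and the `allocations` counter are hypothetical and exist only for this illustration; this is not HdrHistogram's `Recorder` code nor the actual change in #115, and it uses plain `synchronized` methods where the real class uses a more sophisticated scheme.

```java
import java.util.Arrays;

/** Hypothetical two-buffer recorder that allocates its inactive buffer lazily. */
final class LazyRecorderSketch {
    private final int bucketCount;
    private long[] active;
    private long[] inactive;   // deliberately NOT allocated in the constructor
    int allocations = 0;       // instrumentation for this example only

    LazyRecorderSketch(int bucketCount) {
        this.bucketCount = bucketCount;
        this.active = allocate();
    }

    private long[] allocate() {
        allocations++;
        return new long[bucketCount];
    }

    synchronized void record(int bucket) {
        active[bucket]++;
    }

    /**
     * Returns the counts recorded since the last call. The caller owns the result
     * and may hand back a previously returned array to be recycled (or null).
     */
    synchronized long[] getInterval(long[] toRecycle) {
        if (toRecycle != null && toRecycle.length != bucketCount) {
            throw new IllegalArgumentException("recycled array has the wrong size");
        }
        inactive = toRecycle;  // any buffer allocated eagerly would be dropped here
        swap();
        long[] sampled = inactive;
        inactive = null;       // handed to the caller, so it cannot be reused internally
        return sampled;
    }

    /** Copies the interval into a caller-supplied array and keeps the internal buffer. */
    synchronized void getIntervalInto(long[] target) {
        swap();
        System.arraycopy(inactive, 0, target, 0, bucketCount);
    }

    private void swap() {
        if (inactive == null) {
            inactive = allocate();   // allocate only when a swap actually needs it
        } else {
            Arrays.fill(inactive, 0);
        }
        long[] tmp = inactive;
        inactive = active;
        active = tmp;
    }

    public static void main(String[] args) {
        LazyRecorderSketch recorder = new LazyRecorderSketch(8);
        recorder.record(3);
        long[] first = recorder.getInterval(null);
        recorder.record(1);
        long[] second = recorder.getInterval(first);  // recycles the first array
        System.out.println(Arrays.toString(second) + " allocations=" + recorder.allocations);
    }
}
```

In this model a caller that always recycles through `getInterval` never benefits from a buffer created in the constructor, since the first call replaces it anyway; deferring the allocation to the swap avoids that wasted array.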