These are chat archives for HdrHistogram/HdrHistogram

11th
Dec 2017
Jon Gjengset
@jonhoo
Dec 11 2017 00:04
@giltene As an aside, if you still want me to move the Rust HdrHistogram port under the HdrHistogram org on GitHub, could you add me to the HdrHistogram organization?
I can't actually move the repository if I don't have access to the place it's moved :p
See also jonhoo/hdrsample#60
Alec
@ahothan
Dec 11 2017 16:11
@marshallpierce , @giltene : is there any progress on the specification for the HdrHistogram encoding? I see the common repo is still empty (assuming that spec would be stored there)?
Marshall Pierce
@marshallpierce
Dec 11 2017 16:12
I got sidetracked by implementing interval log read/write support. That was wrapped up recently.
The experience did leave me persuaded that we should probably not shoehorn any extra cross-implementation testing metadata into the interval log format, so that's at least one decision made (for me at least)
Alec
@ahothan
Dec 11 2017 16:13
I am trying to see if that binary format can be adopted by networking benchmarking tools to store latency histograms (I am more specifically working in Network Function Virtualization and trying to coordinate between various standards).
Marshall Pierce
@marshallpierce
Dec 11 2017 16:13
It certainly can; that's a natural use case
Alec
@ahothan
Dec 11 2017 16:14
This will be related to work ongoing in IETF (RFC), ETSI and OPNFV
Marshall Pierce
@marshallpierce
Dec 11 2017 16:14
So, you just need a spec of the serialization format for one histogram, not the interval log format?
Alec
@ahothan
Dec 11 2017 16:15
yes only the serialized encoding
It would also be nice to have a spec detailing the accuracy and limitations of this encoding
(mostly related to how Hdr deals with buckets)
Currently the networking benchmarking industry does a terrible job reporting and storing latency.
Marshall Pierce
@marshallpierce
Dec 11 2017 16:16
OK. That part is fairly straightforward. I can help you write up a spec (IETF flavored in EBNF even) but in the meantime take a look at https://github.com/jonhoo/hdrsample/blob/master/src/serialization/v2_serializer.rs. It is commented and fairly straightforward, and has lots of tests.
I can also explain the accuracy loss.
Alec
@ahothan
Dec 11 2017 16:16
and that was acknowledged by the standards community, so it might be good to push forward Hdr
Marshall Pierce
@marshallpierce
Dec 11 2017 16:17
Sure. The things (to my mind) this data structure brings are quantifiable, predictable loss of accuracy and very fast recording.
Alec
@ahothan
Dec 11 2017 16:18
exactly and I don't know of any other solution that can do this better (open source or commercial)
Marshall Pierce
@marshallpierce
Dec 11 2017 16:18
You will be able to have a t-digest that's compact, but it's probabilistic so you don't really know what you're getting, for instance.
Sometimes that's ideal, but sometimes it's not.
In essence, HdrH works by having linear lengths of k sub buckets that are grouped into buckets. Each sub bucket in a given bucket has the same width, and the next bucket will have sub buckets that are twice as wide.
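A minimal Python sketch of the layout just described, for illustration only. It assumes every bucket has exactly `k` sub buckets and deliberately ignores the special-cased first bucket; the function name `simplified_layout` is hypothetical and is not part of any HdrHistogram implementation.

```python
def simplified_layout(k, bucket_count):
    """Return (bucket, sub_bucket_width, first_value, last_value) rows.

    Simplified model only: every bucket has k sub buckets, and the sub
    bucket width doubles from one bucket to the next.
    """
    rows = []
    first = 0
    for bucket in range(bucket_count):
        width = 1 << bucket            # 1, 2, 4, ...: doubles per bucket
        last = first + k * width - 1   # k sub buckets, each `width` wide
        rows.append((bucket, width, first, last))
        first = last + 1
    return rows


for row in simplified_layout(k=4, bucket_count=3):
    print(row)
```

With `k=4` the three buckets cover the values 0 to 3, 4 to 11, and 12 to 27, so each bucket spans twice the range of the one before it while using the same number of counters.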
Alec
@ahothan
Dec 11 2017 16:20
I don't know yet how this will go. If it is accepted by the community, we could even make an RFC out of this encoding
Marshall Pierce
@marshallpierce
Dec 11 2017 16:20
(The first bucket is a bit of a special case but we can gloss over that)
Alec
@ahothan
Dec 11 2017 16:20
with backing for all existing implementations in multiple languages
Marshall Pierce
@marshallpierce
Dec 11 2017 16:21
Basically, you would have 1024 sub buckets of width 1, then another 1024 of width 2, etc. This tends to do "what you want": high precision at small measurements, less precision at large measurements. With carefully chosen sizes, the math can be done efficiently with bit shifts, etc, so recording is very fast.
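To make the bit-shift point concrete, here is a hedged Python sketch of mapping a value to a counter slot. It assumes the first bucket is the special case and holds 2048 sub buckets of width 1 (values 0 to 2047), with each later bucket adding 1024 sub buckets of doubled width, so that bucket boundaries fall on powers of two. The names `bucket_of`, `slot_of`, and `width_of` are hypothetical; this is not the code of any particular implementation.

```python
HALF_COUNT_BITS = 10                 # log2(1024); assumed for illustration
HALF_COUNT = 1 << HALF_COUNT_BITS    # 1024 sub buckets added per later bucket


def bucket_of(value):
    # Bucket 0 covers 0..2047; bucket b >= 1 covers 2**(b + 10) .. 2**(b + 11) - 1.
    return max(0, value.bit_length() - (HALF_COUNT_BITS + 1))


def slot_of(value):
    """Flat counter index for a non-negative integer, using only shifts and adds."""
    b = bucket_of(value)
    sub = value >> b                 # position inside the bucket, in units of its width
    return ((b + 1) << HALF_COUNT_BITS) + (sub - HALF_COUNT)


def width_of(value):
    """Width of the sub bucket that `value` falls into."""
    return 1 << bucket_of(value)


assert slot_of(2048) == slot_of(2049)   # width 2: both values share one counter
assert slot_of(2050) == slot_of(2048) + 1
```

Under these assumptions the sub bucket width never exceeds 1/1024 of the recorded value once past the first bucket, which is one way to read "quantifiable, predictable loss of accuracy": the worst-case error is a fixed fraction of the value, below 0.1% here.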
(Stop me if I'm boring you and you already know how all this stuff works)
Alec
@ahothan
Dec 11 2017 16:21
yes I know how it works ;-)
Marshall Pierce
@marshallpierce
Dec 11 2017 16:22
We'd certainly have the "multiple interoperating implementations" out of the way
K, I'll save my fingers the typing then. ;) Anyway, I've got my hands pretty full on other projects so I can't push such an effort forward but I'm happy to help edit and improve documents you're preparing.
I think an RFC is more of a political statement than an effective technical tool for something like this but in certain markets (especially networking) it's not real if it doesn't have an RFC.
Alec
@ahothan
Dec 11 2017 16:24
I'm in the middle of all this, between networking vendors, telecom operators, test equipment vendors, independent test organizations
and latency just sucks the way it is done today
Marshall Pierce
@marshallpierce
Dec 11 2017 16:24
I agree; it's a big problem in many fields that try to measure, well, anything
If only someone would pay me for a year to go off in a corner and build up tools around this so people could stop using "mean latency" just because they have nothing else to look at
Alec
@ahothan
Dec 11 2017 16:25
unfortunately it will have to rely on people like you and me, who must do this as a side job
I thought there was a starter text document on the Hdr encoding. Isn't that the case?
Marshall Pierce
@marshallpierce
Dec 11 2017 16:26
There is, in the .NET documentation, but it's pretty bare bones. If you want something like an RFC, I'd start from scratch and fill it in using that document and the source
I may be biased, but I think the Rust implementation is pretty comprehensible and well documented. ;)
Alec
@ahothan
Dec 11 2017 16:27
ok, fair enough - I'm not familiar with Rust and not intending to be ;-)
Marshall Pierce
@marshallpierce
Dec 11 2017 16:27
I don't think you need to be to understand how the serialization logic works.
Alec
@ahothan
Dec 11 2017 16:27
I guess the Java implementation would be the reference
Marshall Pierce
@marshallpierce
Dec 11 2017 16:28
It is, but it's a little more obtuse, since Gil knew what he meant when he wrote it.
In the Rust port, I didn't just blindly port the logic; I made sure I understood each element and commented where I felt it would have been helpful for my own understanding the first time.
The Java impl definitely has the widest feature set, the best backwards compatibility with old serialization formats, and the most usage by far I'm sure.
Alec
@ahothan
Dec 11 2017 16:30
OK, let me chat with my IETF contacts to see how we can push that, and I will get back to you. If there is any interest in helping push for any form of RFC standard in this community, let me know.
Marshall Pierce
@marshallpierce
Dec 11 2017 16:30
I think it could well be a valuable tool to help get wider adoption of the format. I'll do what I can to help, but my time is pretty limited.
Keep us apprised