These are chat archives for HdrHistogram/HdrHistogram

2nd Jun 2017
Gil Tene
@giltene
Jun 02 2017 00:48
I'm playing with it, and so far I've run into an inconsistency in tests that verify that v.getPercentile() in percentile-based iteration matches getValueAtPercentile().
Gil Tene
@giltene
Jun 02 2017 16:21
Argh!!!! (((100.0 19961)/20000) / 100.0) 20000) = 19961.000000000004
(((100.0 19961)/20000) / 100.0) 20000) = 19961.000000000004
I mean (((100.0 * 19961) / 20000) / 100.0) * 20000 = 19961.000000000004
Which means that when computing the 99.805%'ile point in a histogram with 20000 values recorded (which is properly the 19961st value), you get the 19962nd value if I use Math.ceil to compute the count boundary rather than rounding to nearest.
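For illustration, a minimal standalone Java sketch of the count-to-percentile-and-back round trip described above (the 20000-value histogram is assumed; only the plain double math from the messages is shown):

```java
public class PercentileRounding {
    public static void main(String[] args) {
        long totalCount = 20000;
        long targetCount = 19961; // the 19961st value, i.e. the 99.805%'ile

        // count -> percentile, as percentile iteration reports it:
        double percentile = (100.0 * targetCount) / totalCount;

        // percentile -> count, as a value-at-percentile lookup must compute it:
        double countAtPercentile = (percentile / 100.0) * totalCount;

        System.out.println(countAtPercentile);             // 19961.000000000004
        System.out.println(Math.ceil(countAtPercentile));  // 19962.0 -- one value too far
        System.out.println(Math.round(countAtPercentile)); // 19961   -- nearest rounding lands correctly
    }
}
```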
Gil Tene
@giltene
Jun 02 2017 16:29
This may not seem like a huge deal at first, but:
getPercentileAtOrBelowValue(histogram.getValueAtPercentile(99.8046875)) = 99.805
getPercentileAtOrBelowValue(histogram.getValueAtPercentile(99.805)) = 99.84
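A sketch of the round-trip check behind those two lines, using the public Java HdrHistogram methods named above (the recorded values here are an assumption for illustration; the actual unit-test histogram differs, so the printed numbers may too):

```java
import org.HdrHistogram.Histogram;

public class RoundTripCheck {
    public static void main(String[] args) {
        // Assumed setup: 20000 distinct values recorded with 3 significant digits.
        Histogram histogram = new Histogram(3600000000000L, 3);
        for (long i = 1; i <= 20000; i++) {
            histogram.recordValue(i);
        }

        double p1 = 99.8046875; // occurs naturally in percentile iteration
        long v1 = histogram.getValueAtPercentile(p1);
        double p2 = histogram.getPercentileAtOrBelowValue(v1); // e.g. 99.805 in Gil's case

        // The circular-consistency question: does feeding the reported
        // percentile back in return the same (equivalent) value?
        long v2 = histogram.getValueAtPercentile(p2);
        System.out.println("v1=" + v1 + ", v2=" + v2
                + ", equivalent=" + histogram.valuesAreEquivalent(v1, v2));
    }
}
```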
Gil Tene
@giltene
Jun 02 2017 16:34
Bottom line: due to FP rounding issues, it is hard to get consistent behavior for percentile math.
Alec
@ahothan
Jun 02 2017 17:12
It will be hard to enforce that invariant, especially with variable-size buckets. This could be a problem for unit testing with randomly generated percentile values (such as those a tool like Hypothesis would produce). I'm not sure it really matters for real use cases; from what I can see, most uses don't require such an invariant. Maybe @Julian can comment on how important it is for that invariant to be met (or whether it matters only for the sake of running Hypothesis).
Julian Berman
@Julian
Jun 02 2017 22:01
@giltene @ahothan as a user, I think for me the important thing is understanding what guarantee exists
I interpreted what was there as "don't worry, you'll never be off by more than 0.1%"
But yeah what I'm actually after is something precise telling me how accurate to expect to be
(And then Hypothesis would let me / us guarantee it's met)
Because right now I think I can generate datasets that will have arbitrarily large errors right? I stopped playing with things yesterday but I can attach some code if that'd help?
I'm also having trouble reading / understanding @giltene 's post probably because of fun gitter formatting things
Julian @Julian tries harder :)
Julian Berman
@Julian
Jun 02 2017 22:07
haha ok now I see, it's 3 attempts at typing the same thing, not 3 examples :P
Alec
@ahothan
Jun 02 2017 23:18
@Julian, do you actually have a real use case? One example where deviating too much would actually matter? Some concrete requirements? I've been using HdrHistogram for other purposes (performance benchmarking for the data plane: packet throughput and latency) and could not care less about that invariant, because it is precise enough to do my job.
Gil Tene
@giltene
Jun 02 2017 23:59
The numbers above are from an actual histogram that was already there in the Java unit tests. The "accuracy" is not my main concern; it's the lack of circular consistency that bothers me. I feed in one percentile (99.8046875%, which naturally occurs in percentile iteration, such as when printing percentile distributions), and get a value that correctly covers a slightly larger percentile (99.805%), because the bucket that the requested percentile falls in extends up to the higher percentile. That's fine, but when I feed that percentile back in, it goes one bucket farther, which means I now see two different, non-equivalent values for the same percentile: one that told me its percentile is 99.805%, and a different one that I get if I ask for the 99.805%'ile value. The fact that the two are not equal is a problem IMO, as it can mess with logic in all sorts of places.
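The invariant being described can be stated as a small property check (a sketch only; histogram is assumed to be any populated org.HdrHistogram.Histogram, and this is the property under discussion, not code from the library):

```java
import org.HdrHistogram.Histogram;

public class CircularConsistency {
    // Property: the value returned for a percentile, and the value returned
    // for the percentile that value reports, should be equivalent (same bucket).
    static boolean holdsFor(Histogram histogram, double percentile) {
        long v = histogram.getValueAtPercentile(percentile);
        double reported = histogram.getPercentileAtOrBelowValue(v);
        long vAgain = histogram.getValueAtPercentile(reported);
        return histogram.valuesAreEquivalent(v, vAgain);
    }
}
```

With ceil-based count boundaries, the 99.8046875% / 99.805% pair above is exactly the kind of input for which this property fails.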