These are chat archives for HdrHistogram/HdrHistogram

Aug 2017
Marshall Pierce
Aug 24 2017 00:02

Finally found evidence that (quantile.prev() * count).ceil() != (quantile * count).prev().ceil():

mult, prev, ceil: high: 0 low: 2 equal: 9999998
prev, mult, ceil: high: 0 low: 1 equal: 9999999

That's when feeding the same numbers to both cases.

And here's one candidate culprit:
count 2539606004 quantile 0.6108728978260836 q count prev before 1551376479 q count prev after 1551376480
In this case, the latter ends up being the answer we want (the precise value of the product without any prev() and before a ceil() is 1.5513764790000004583659344e9). I'll let it run for a few hours and see what it comes up with.
Marshall Pierce
Aug 24 2017 01:49
Every hundred million random guesses or so it finds another one.
count 1692338703 quantile 0.5264026701160898 q count prev before 890851612 q count prev after 890851613
count 2650272743 quantile 0.5201906334513438 q count prev before 1378647057 q count prev after 1378647058
count 3854327114 quantile 0.5345657960675105 q count prev before 2060391442 q count prev after 2060391443
count 1724200216 quantile 0.6420357025404759 q count prev before 1106998098 q count prev after 1106998097
count 3184044753 quantile 0.8399534750509208 q count prev before 2674449456 q count prev after 2674449455
Marshall Pierce
Aug 24 2017 02:04

Found another wrinkle about testing the accuracy of different FP methods. When generating counts and quantiles to use as input for the various forms of FP arithmetic, if you generate a quantile as index / count, which then gets multiplied by count, the resulting answer (after ceil etc) is the same as the arbitrary-precision result about half the time and too low by 1 about half the time, like this test output (hopefully self explanatory):

---- quantile_count_product_fp_rounding_random stdout ----
        mult, ceil: high: 0 low: 4604677 equal: 5395323
mult, prev, ceil: high: 0 low: 5002432 equal: 4997568
prev, mult, ceil: high: 0 low: 5002432 equal: 4997568

. However, if the quantile is random float and the count is a random int, the distribution is very different:

---- quantile_count_product_fp_rounding_random stdout ----
        mult, ceil: high: 0 low: 1 equal: 9999999
mult, prev, ceil: high: 0 low: 4 equal: 9999996
prev, mult, ceil: high: 0 low: 4 equal: 9999996
Since in common usage quantiles will be created by iterating via percentiles or asking for 0.5 or something like that, I'm discounting the former case.
Marshall Pierce
Aug 24 2017 02:11
So, here are some things we know:
  • It does produce different results, but only infrequently, if you prev() the product of quantile * count, or apply prev to quantile before multiplying.
  • With random quantiles and counts, (quantile * count).ceil(), (quantile.prev() * count).ceil(), and (quantile * count).prev().ceil() all produce all but identical (and very low) rates of error compared to doing that calculation with arbitrary-precision math.
  • When looking only at the product quantile * count, half the time the result is equal to the result of casting an arbitrary-precision product to f64, and the other half it's higher than the f64'd arbitrary-precision.
Marshall Pierce
Aug 24 2017 02:22
Addendum to that last point: if, instead of squeezing the arbitrary-precision product into an f64 and comparing the two f64s, I make an arbitrary-precision rational out of the floating-point product and compare the two rationals, half the random samples end up bigger than the arbitrary-precision result, the other half are smaller, and a minuscule fraction end up equal.
Marshall Pierce
Aug 24 2017 02:41
(The goal of all this floating point masochism is to try to figure out how to actually write a good test that might help us figure out which of the slightly different styles of arithmetic is actually best by some reasonably objective measure)
Marshall Pierce
Aug 24 2017 20:28
OK, I think I may be making a little useful headway.
  • We know from tests like that compared to arbitrary-precision arithmetic, computing the count at a quantile with any of the three candidate FP routines (no prev, prev before multiplication, prev after multiplication) is almost always going to yield the closest answer, and unsurprisingly the one without aprev (just a simple multiplication and ceil) in the mix has the fewest errors. The IEEE754 designers did a good job. (The pattern of having a fraction as many errors as the other two options holds with larger sample counts.)
  • Given that, I started looking into why I got test failures when not using any prev in the quantile count calculation. I replaced as much of the test calculation as possible with arbitrary-precision math (and those tests all passed), but one thing I couldn't replace (since it wasn't in the test code specifically) is the quantile calculation in percentile/quantile iteration. That uses quantile: self.total_count_to_index as f64 / self.hist.count() as f64 in Rust, with equivalent logic in Java: (100.0 * totalCountToCurrentIndex) / arrayTotalCount. However, we've seen in practice that the pattern of ((x as f64) / (y as f64)) * (y as f64) seems to often produce numbers that are slightly higher than y. This is why using prev() seemed to help: it counteracted the errors introduced by calculating the quantile count with a quantile produced the way percentile iteration uses.
  • Example: histo containing 50 numbers (1 to 50). A seemingly reasonable test is to calculate the quantile for every number and then ask the histogram for the value at that quantile and make sure it's the same. (This is basically what iterating percentiles will do, as well as the obvious "iterate every value" approach.) When you get to the number 7, its quantile should be 0.14 (twice 7%). However, 0.14 can't be represented exactly in IEEE754, so what you get is actually the FP representation of the fraction 1261007895663739/9007199254740992, or 0.1400000000000000133226762955018 (etc). When you multiply this by 50, you get 7.000000000000001 (really 7.0000000000000006661338147750939242 etc), which is dutifully ceil'd up to 8. So, the quantile iteration is correct, but it can't produce the actual quantile needed with sufficient accuracy, so the histogram dutifully reports the value it's supposed to.
  • The only tests that fail are the ones that iterate this way and produce the troublesome floating point artifacts. If quantiles are random, or calculated not by dividing a int-ish float by the int-ish float version of the total count, all is well.
  • Upshot: prev just happens to cover up for some of the (essentially unavoidable) sins of calculating a quantile this way. So, my current thinking is that we document accessor methods for quantiles and percentiles and logic that consumes such things (like valueAtPercentile and friends) with warnings about FP accuracy ("best effort", "could be off by one bucket", etc), and use (quantile * count).ceil() with no prev when getting the value for a quintile. (This logic produces 2 for q=0.5000000000000001 of [1, 2] and 19961 for q=0.99805 of [1, ... 20_000], btw.) Accordingly, we would basically just not test that iterating with a percentile iterator produces the expected results because I don't think that's actually a reasonable standard anymore: we know that such iteration will produce incorrect quantiles/percentiles, and if that's a problem, we should solve it by adding an additional (slower) iterator that uses arbitrary-precision arithmetic.