These are chat archives for HdrHistogram/HdrHistogram

24th Aug 2017
Marshall Pierce
@marshallpierce
Aug 24 2017 00:02

Finally found evidence that `(quantile.prev() * count).ceil()` != `(quantile * count).prev().ceil()`:

```
mult, prev, ceil: high: 0 low: 2 equal: 9999998
prev, mult, ceil: high: 0 low: 1 equal: 9999999
```

That's when feeding the same numbers to both cases.

And here's one candidate culprit:
`count 2539606004 quantile 0.6108728978260836 q count prev before 1551376479 q count prev after 1551376480`
In this case, the latter ends up being the answer we want (the precise value of the product without any `prev()` and before a `ceil()` is `1.5513764790000004583659344e9`). I'll let it run for a few hours and see what it comes up with.
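For reference, here's a minimal Rust sketch of the two expressions being compared, fed the logged culprit inputs. The `prev()` here is a hand-rolled stand-in (next-lower float via the bit pattern) for whatever float-stepping helper the real test uses; that substitution is my assumption.

```rust
/// Next f64 below `x`; valid for positive, finite, non-zero `x`.
/// A hand-rolled stand-in for the `prev()` used in the snippets above.
fn prev(x: f64) -> f64 {
    f64::from_bits(x.to_bits() - 1)
}

fn main() {
    let count = 2539606004_u64 as f64;
    let quantile = 0.6108728978260836_f64;
    // prev() applied to the quantile before multiplying...
    let before = (prev(quantile) * count).ceil();
    // ...versus prev() applied to the product.
    let after = prev(quantile * count).ceil();
    println!("prev before: {} prev after: {}", before, after);
}
```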
Marshall Pierce
@marshallpierce
Aug 24 2017 01:49
Every hundred million random guesses or so it finds another one.
```
count 1692338703 quantile 0.5264026701160898 q count prev before 890851612 q count prev after 890851613
count 2650272743 quantile 0.5201906334513438 q count prev before 1378647057 q count prev after 1378647058
count 3854327114 quantile 0.5345657960675105 q count prev before 2060391442 q count prev after 2060391443
count 1724200216 quantile 0.6420357025404759 q count prev before 1106998098 q count prev after 1106998097
count 3184044753 quantile 0.8399534750509208 q count prev before 2674449456 q count prev after 2674449455
```
Marshall Pierce
@marshallpierce
Aug 24 2017 02:04

Found another wrinkle in testing the accuracy of the different FP methods. When generating counts and quantiles as input for the various forms of FP arithmetic, if you generate a quantile as `index / count` and then multiply it by `count`, the resulting answer (after `ceil` etc.) matches the arbitrary-precision result about half the time and is too low by 1 the other half, as in this test output (hopefully self-explanatory):

```
---- quantile_count_product_fp_rounding_random stdout ----
mult, ceil: high: 0 low: 4604677 equal: 5395323
mult, prev, ceil: high: 0 low: 5002432 equal: 4997568
prev, mult, ceil: high: 0 low: 5002432 equal: 4997568
```

However, if the `quantile` is a random float and the `count` is a random int, the distribution is very different:

```
---- quantile_count_product_fp_rounding_random stdout ----
mult, ceil: high: 0 low: 1 equal: 9999999
mult, prev, ceil: high: 0 low: 4 equal: 9999996
prev, mult, ceil: high: 0 low: 4 equal: 9999996
```
Since in common usage quantiles will be created by iterating via percentiles or by asking for `0.5` or the like, I'm discounting the former case.
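One way to get the "arbitrary-precision" side of such a comparison without a bignum library is to compute the ceiling exactly with integer arithmetic on the f64's bit pattern. This is only a sketch of the idea, not the actual test code, and it assumes the quantile is a normal double in (0, 1) that isn't absurdly tiny:

```rust
/// Exact ceil(q * count), treating q as the exact rational its bit
/// pattern encodes (q = mantissa / 2^shift). Sketch only: q must be a
/// normal double in (0, 1) within this sketch's range.
fn exact_ceil(q: f64, count: u64) -> u128 {
    let bits = q.to_bits();
    let mantissa = (bits & ((1u64 << 52) - 1)) | (1u64 << 52); // implicit leading 1
    let shift = 1075 - ((bits >> 52) & 0x7ff) as i64; // q = mantissa / 2^shift
    assert!(shift > 0 && shift < 128, "quantile out of this sketch's range");
    let prod = mantissa as u128 * count as u128;
    (prod + (1u128 << shift) - 1) >> shift // exact ceiling division
}

fn main() {
    // An index/count-style quantile: value 7 of a 50-count histogram.
    let q = 7.0_f64 / 50.0;
    let exact = exact_ceil(q, 50);
    let float = (q * 50.0).ceil() as u128;
    // Both are 8 here: the FP error is already baked into q itself.
    println!("exact: {} float: {}", exact, float);
}
```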
Marshall Pierce
@marshallpierce
Aug 24 2017 02:11
So, here are some things we know:
• Applying `prev()` to the product `quantile * count` versus applying `prev()` to `quantile` before multiplying does produce different results, but only infrequently.
• With random quantiles and counts, `(quantile * count).ceil()`, `(quantile.prev() * count).ceil()`, and `(quantile * count).prev().ceil()` all produce nearly identical (and very low) error rates compared to doing the calculation with arbitrary-precision math.
• Looking only at the product `quantile * count`: half the time the result equals the arbitrary-precision product cast to `f64`, and the other half it's higher than that cast result.
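To get a feel for how infrequent the first point's disagreement is, a self-contained search loop can be sketched like this. The xorshift RNG and the exact sampling ranges are my assumptions for the sake of a runnable example, not what the real test used:

```rust
/// Next f64 below a positive, finite, non-zero `x` (stand-in for `prev()`).
fn prev(x: f64) -> f64 {
    f64::from_bits(x.to_bits() - 1)
}

/// Tiny xorshift64 RNG so the sketch needs no external crates.
fn xorshift(state: &mut u64) -> u64 {
    let mut x = *state;
    x ^= x << 13;
    x ^= x >> 7;
    x ^= x << 17;
    *state = x;
    x
}

/// Count samples where prev-before-multiply and prev-after-multiply
/// disagree after ceil().
fn count_mismatches(samples: u64) -> u64 {
    let mut state = 0x9e3779b97f4a7c15;
    let mut mismatches = 0;
    for _ in 0..samples {
        // Random quantile in (0, 1) and random nonzero u32-range count.
        let quantile = ((xorshift(&mut state) >> 11) | 1) as f64 / (1u64 << 53) as f64;
        let count = ((xorshift(&mut state) >> 32) | 1) as f64;
        let before = (prev(quantile) * count).ceil();
        let after = prev(quantile * count).ceil();
        if before != after {
            mismatches += 1;
        }
    }
    mismatches
}

fn main() {
    // Per the chat, hits show up roughly once per hundred million guesses,
    // so a small run like this will usually find none.
    println!("mismatches in 1M samples: {}", count_mismatches(1_000_000));
}
```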
Marshall Pierce
@marshallpierce
Aug 24 2017 02:22
Addendum to that last point: if, instead of squeezing the arbitrary-precision product into an `f64` and comparing the two `f64`s, I make an arbitrary-precision rational out of the floating-point product and compare the two rationals, half the random samples end up bigger than the arbitrary-precision result, the other half are smaller, and a minuscule fraction end up equal.
Marshall Pierce
@marshallpierce
Aug 24 2017 02:41
(The goal of all this floating point masochism is to try to figure out how to actually write a good test that might help us figure out which of the slightly different styles of arithmetic is actually best by some reasonably objective measure)
Marshall Pierce
@marshallpierce
Aug 24 2017 20:28
OK, I think I may be making a little useful headway.
• We know from tests like https://github.com/jonhoo/hdrsample/blob/026ab71d354c3d91f072f15c427625c70d793cf9/tests/data_access.rs#L758 that, compared to arbitrary-precision arithmetic, computing the count at a quantile with any of the three candidate FP routines (no `prev`, `prev` before multiplication, `prev` after multiplication) almost always yields the closest answer, and unsurprisingly the one without a `prev` in the mix (just a simple multiplication and `ceil`) has the fewest errors. The IEEE 754 designers did a good job. (The pattern of that variant having a fraction as many errors as the other two options holds with larger sample counts.)
• Given that, I started looking into why I got test failures when not using any `prev` in the quantile count calculation. I replaced as much of the test calculation as possible with arbitrary-precision math (and those tests all passed), but one thing I couldn't replace (since it isn't in the test code specifically) is the quantile calculation in percentile/quantile iteration. That uses `quantile: self.total_count_to_index as f64 / self.hist.count() as f64` in Rust, with equivalent logic in Java: `(100.0 * totalCountToCurrentIndex) / arrayTotalCount`. However, we've seen in practice that the pattern `((x as f64) / (y as f64)) * (y as f64)` often produces numbers slightly higher than `x`. This is why using `prev()` seemed to help: it counteracted the errors introduced by calculating the quantile count from a quantile produced the way percentile iteration produces it.
• Example: histo containing 50 numbers (1 to 50). A seemingly reasonable test is to calculate the quantile for every number, then ask the histogram for the value at that quantile, and make sure it's the same. (This is basically what iterating percentiles will do, as well as the obvious "iterate every value" approach.) When you get to the number 7, its quantile should be 0.14 (twice 7%). However, 0.14 can't be represented exactly in IEEE 754, so what you get is actually the FP representation of the fraction `1261007895663739/9007199254740992`, or `0.1400000000000000133226762955018` (etc). When you multiply this by `50`, you get `7.000000000000001` (really `7.0000000000000006661338147750939242` etc), which is dutifully `ceil`'d up to `8`. So, the quantile iteration is correct, but it can't produce the actual quantile needed with sufficient accuracy, and the histogram dutifully reports the value at the slightly-too-high quantile it was actually given.
• The only tests that fail are the ones that iterate this way and produce the troublesome floating-point artifacts. If quantiles are random, or calculated other than by dividing an int-ish float by the int-ish float version of the total count, all is well.
• Upshot: `prev` just happens to cover up for some of the (essentially unavoidable) sins of calculating a quantile this way. So, my current thinking is that we document accessor methods for quantiles and percentiles and the logic that consumes such things (like `valueAtPercentile` and friends) with warnings about FP accuracy ("best effort", "could be off by one bucket", etc.), and use `(quantile * count).ceil()` with no `prev` when getting the value for a quantile. (This logic produces `2` for `q=0.5000000000000001` of `[1, 2]` and `19961` for `q=0.99805` of `[1, ... 20_000]`, btw.) Accordingly, we would basically stop testing that iterating with a percentile iterator produces the expected results, because I don't think that's actually a reasonable standard anymore: we know such iteration will produce incorrect quantiles/percentiles, and if that's a problem, we should solve it by adding an additional (slower) iterator that uses arbitrary-precision arithmetic.
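The 0.14 walk-through and the `[1, 2]` sanity check above can both be reproduced directly:

```rust
fn main() {
    // Value 7 of 50: percentile iteration computes q = index / count.
    let q = 7.0_f64 / 50.0;          // nearest double to 0.14, slightly above it
    let product = q * 50.0;          // rounds to 7.000000000000001, not 7.0
    assert!(product > 7.0);
    assert_eq!(product.ceil(), 8.0); // so ceil lands one value too high

    // Sanity check for (quantile * count).ceil() with no prev():
    // q just above 0.5 of a 2-count histogram yields count 2.
    assert_eq!((0.5000000000000001_f64 * 2.0).ceil(), 2.0);

    println!("q = {:.17}, q * 50 = {:.17}", q, product);
}
```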
• Upshot: `prev` just happens to cover up for some of the (essentially unavoidable) sins of calculating a quantile this way. So, my current thinking is that we document accessor methods for quantiles and percentiles and logic that consumes such things (like `valueAtPercentile` and friends) with warnings about FP accuracy ("best effort", "could be off by one bucket", etc), and use `(quantile * count).ceil()` with no `prev` when getting the value for a quintile. (This logic produces `2` for `q=0.5000000000000001` of `[1, 2]` and `19961` for `q=0.99805` of `[1, ... 20_000]`, btw.) Accordingly, we would basically just not test that iterating with a percentile iterator produces the expected results because I don't think that's actually a reasonable standard anymore: we know that such iteration will produce incorrect quantiles/percentiles, and if that's a problem, we should solve it by adding an additional (slower) iterator that uses arbitrary-precision arithmetic.