These are chat archives for HdrHistogram/HdrHistogram

22nd
Aug 2017
Marshall Pierce
@marshallpierce
Aug 22 2017 13:43

jonhoo/hdrsample@5f5f6ae is a quick test about the aforementioned multiplication behavior -- it compares multiplying a quantile (f64) by a count (u64 cast to f64) with multiplying a quantile (f64 converted to an arbitrary-precision rational) by a count (u64 as rational) and then converting the result to an f64. This isn't a perfect test (maybe I should convert the first f64 product to a rational and compare that to the rational product?) but when running 10 million random iterations, here's some sample output:

high 5002030 low 0 same 4997969

In other words, the fp product was higher than the rational product half the time and the same half the time but never too low. So, that lends some justification for stripping an ulp off of the product.

Marshall Pierce
@marshallpierce
Aug 22 2017 14:21
jonhoo/hdrsample@2b09300 muddies the water somewhat. That tests (quantile.prev() * count).ceil(), (quantile * count).ceil(), and (quantile * count).prev().ceil() vs the equivalent calculation with rational arithmetic. All three did about the same with 10m random numbers (different random numbers for each one):
mult then ceil
high 0 low 1 same 9999999
mult, prev, ceil
high 0 low 3 same 9999997
prev, mult, ceil
high 0 low 2 same 9999998
Gil Tene
@giltene
Aug 22 2017 15:12
My head hurts
Marshall Pierce
@marshallpierce
Aug 22 2017 16:00
Too much squinting at the eclipse? ;)
Marshall Pierce
@marshallpierce
Aug 22 2017 19:52
Here's a fun fact: quantile = 0.5000000000000001 (1 ulp past 0.5) with a histogram containing 1, 2 produces a value of 1 using (quantile * count).prev().ceil() as u64. However, why are we using ceil()? Seems like that is just introducing yet another FP opportunity to not express an integer well. If I use (quantile * count).prev() as u64 + 1, that value becomes 2, which is what we want for 1 ulp past 0.5.
Marshall Pierce
@marshallpierce
Aug 22 2017 20:10
(Naturally, satisfying that test means that other ones break...)
Marshall Pierce
@marshallpierce
Aug 22 2017 20:16
Ah, ceil() will return the callee if it is already at an fp integer-ish number.
Gil Tene
@giltene
Aug 22 2017 20:40
Yeh, and ciel() of something is the only meaningful alternative to rounding to the nearest integer value. This whole thing started with the rounding leading to surprising results (more surprising than the ulp edge cases), like the 40%'ile of (5, 500, 500000) being 5.
(Or was it that the 25%'ile of that set comes out as 500. Don't remember, but either one is clearly surprising)
Gil Tene
@giltene
Aug 22 2017 20:47
Maybe we should just document +/- 10 ulps on the input parameter and call it a day? I'm pretty comfortable that whatever math we end up doing, even if it accumulates 2ulps of error per step, won't add up to being 10 ulps off.
And +/- 10 ups on fp64 is still much more "accurate" than fp32. Or at least sounds like it is.
Marshall Pierce
@marshallpierce
Aug 22 2017 20:54
Actually, I think my test that came up with 2ulps of wrongness when measuring that part of the calculation was flawed. I'm reworking it now to use better quality (and replayable) random numbers to make debugging easier.
I don't think we can meaningfully say anything about ulps of error on the input parameter because what matters is the product, and the degree to which that can be wrong depends on the size of the histogram.
IOW, if your histogram is 1 << 60 in size, you're probably gonna have a bad day precision wise, and I'm not sure if we can really bound that based on ulps at the scale of the input, which since it's in [0, 1] will have little teeny ulps.
I think we can guarantee (and usefully document) that you might get results that will be wrong by one bucket one way or the other. (In practice, with the .prev() in there to bias things down, we're unlikely to see getting a bucket that's too high, but I haven't convinced myself that it couldn't still happen.)
I suspect that's really what matters to users: sometimes, you will get the adjacent bucket because fp is just that way, but you should never see worse than that.
Marshall Pierce
@marshallpierce
Aug 22 2017 21:00
A statement about the number of ulps boils down to the same thing, but is (to me) harder to mathematically justify.
(sorry for the rust centric syntax -- i just figured object-style is easier to read inline than a bunch of Math.foo(bar))
Marshall Pierce
@marshallpierce
Aug 22 2017 22:01
I really nerd-sniped myself... while looking for the best available seedable PRNG, I found http://www.pcg-random.org/ which has a lot of things to read