These are chat archives for HdrHistogram/HdrHistogram

16th
Aug 2017
Marshall Pierce
@marshallpierce
Aug 16 2017 15:36
Other than me unpacking my numerical analysis text, does anyone have an idea how to judge the relative stability of the two different orders of FP operations, or how to bound the number of ulps the result could be wrong by? I've already reproduced it being wrong by 2 ulps, but I'm not confident that that's the upper bound
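A minimal standalone sketch (not HdrHistogram code; the count and step size are arbitrary) that hunts for percentiles where the two operation orders disagree, and reports how far apart the results land in ulps:

```java
// Compare (count * p) / 100.0 against (count / 100.0) * p and report the gap,
// measured in ulps via the bit representations (both results are positive, so
// the difference of the raw bits is the number of representable doubles apart).
public class OrderingDemo {
    public static void main(String[] args) {
        long totalCount = 10_000_000_000L;               // arbitrary large count
        for (double p = 0.0; p <= 100.0; p += 0.0137) {  // arbitrary step
            double multiplyFirst = (totalCount * p) / 100.0;
            double divideFirst = (totalCount / 100.0) * p;
            if (multiplyFirst != divideFirst) {
                long ulpsApart = Math.abs(Double.doubleToLongBits(multiplyFirst)
                        - Double.doubleToLongBits(divideFirst));
                System.out.printf("p=%.4f: %.3f vs %.3f (%d ulp apart)%n",
                        p, multiplyFirst, divideFirst, ulpsApart);
            }
        }
    }
}
```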
Gil Tene
@giltene
Aug 16 2017 17:39
I updated the Java version with changes to use nextAfter() [nextDown() is only available since Java 8], and changed the multiply/divide order per @marshallpierce's comment above (the multiply-before-divide order clearly has a positive effect on the roundoff behavior). I also ported the test noted above and added one that iterates on the inserted values rather than percentiles (so it covers each value).
I'm still nervous about the ulp behaviors, since I don't have a strong handle on what the actual bound is. Need an FP expert (and don't want to become one).
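A hedged sketch of what that change amounts to (illustrative names, not the actual Java source): multiply before dividing, then pull the floating-point count one ulp toward zero before ceiling it.

```java
// Illustrative only: here the ulp is shaved off the *result* of the computation,
// which is the variant that gets revisited below.
static long countAtPercentile(double requestedPercentile, long totalCount) {
    double percentile = Math.min(requestedPercentile, 100.0);
    double fpCount = (percentile * totalCount) / 100.0;   // multiply-before-divide
    fpCount = Math.nextAfter(fpCount, 0.0);               // step one ulp toward zero
    return Math.max(1, (long) Math.ceil(fpCount));
}
```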
Marshall Pierce
@marshallpierce
Aug 16 2017 19:02
I haven't found any relatively accessible documentation on FP error bounds :( http://docs.oracle.com/cd/E19957-01/806-3568/ncg_goldberg.html is rich in facts but not very applicable.
Gil Tene
@giltene
Aug 16 2017 21:27
I'm thinking of moving the ulp removal (nextDown/nextAfter) to the percentile parameter itself, as opposed to the countAtPercentile computation. That would match the +/- 1 ulp documentation (the percentile parameter is the one we can actually document expectations on, since it is what we are given). In addition, doing so seems to remove the sensitivity to the multiply/divide order discussed above.
Gil Tene
@giltene
Aug 16 2017 21:47
Per the above, I just updated the Java version to remove the ulp from the requested percentile rather than from the resultant countAtPercentile computation. The new code seems cleaner and easier to explain expectations on. It "makes sense" that removing the ulp earlier in the computation (while still meeting the documented +/- 1 ulp expectations) gives us bigger breathing room (since that ulp delta is then multiplied by totalCount/100).
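A hedged sketch of the revised version (again, illustrative names only): the ulp comes off the requested percentile itself, before any arithmetic, so the 1 ulp of slack gets scaled by totalCount/100 along with everything else.

```java
static long countAtPercentile(double requestedPercentile, long totalCount) {
    // Remove one ulp from the requested percentile (still within the documented
    // +/- 1 ulp expectation), then do the multiply/divide.
    double percentile = Math.nextAfter(Math.min(requestedPercentile, 100.0), 0.0);
    double fpCount = (percentile * totalCount) / 100.0;
    return Math.max(1, (long) Math.ceil(fpCount));
}
```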
Julian Berman
@Julian
Aug 16 2017 21:47
@marshallpierce I am clearly not an expert either, but possibly you'd find Hypothesis (property-based testing) to be a useful way to find floats satisfying whatever constraint
I think I posted my source above but if not will find it now
Gil Tene
@giltene
Aug 16 2017 21:49
It may be that the right thing to remove the ulp from is actually the 0..1 quantile (percentile/100.0), btw. But then the documentation on the API will be bothersome. For those of you who have a quantile API (instead of percentile), that will probably be the right thing to do.
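The quantile-flavored variant of the same idea, as a sketch:

```java
static long countAtQuantile(double requestedQuantile, long totalCount) {
    // Remove the ulp from the 0..1 quantile rather than from the percentile.
    double quantile = Math.nextAfter(Math.min(requestedQuantile, 1.0), 0.0);
    return Math.max(1, (long) Math.ceil(quantile * totalCount));
}
```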
Julian Berman
@Julian
Aug 16 2017 21:50
looks like the state I ended up leaving this in is actually populating hdrhistograms, but yeah you can probably just go back to looking for floats: https://bpaste.net/show/c4ebd034b59e
Gil Tene
@giltene
Aug 16 2017 22:03
@Julian I still expect to fail the original hypothesis right at the edge, unless the hypothesis is re-written to accept the values at the requested percentile +/- 1 ulp. The example test at HdrHistogram/HdrHistogram@abb45f6 shows the currently expected/accepted edge behavior I'm talking about: being a "tiny bit" above a requested percentile (1 ulp above 50.0%, in this case of a histogram with only two values in it) can lead to moving into the next count unit, and to a large change in the value reported for the requested %'ile.
Any more than that 1 ulp and we'll get the "right" result
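A small illustration of that edge case against the Java API (Histogram, recordValue, and getValueAtPercentile are the real entry points; which value each query reports depends on the ulp handling being discussed, so the comments describe the situation rather than assert results):

```java
import org.HdrHistogram.Histogram;

public class EdgeCaseDemo {
    public static void main(String[] args) {
        Histogram histogram = new Histogram(3);  // 3 significant value digits
        histogram.recordValue(1);
        histogram.recordValue(2);
        // With only two recorded values, 50.0% sits exactly on the boundary.
        System.out.println("at 50.0%:        " + histogram.getValueAtPercentile(50.0));
        // One ulp above 50.0% is the edge in question: whether this reports the
        // lower or the higher value is exactly the behavior being pinned down above.
        System.out.println("at nextUp(50.0): " + histogram.getValueAtPercentile(Math.nextUp(50.0)));
    }
}
```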
Marshall Pierce
@marshallpierce
Aug 16 2017 22:37
Perhaps we should just expose a valueAtQuantile that we can have tighter bounds on?
And document valueAtPercentile to say "if you care about precision, look at valueAtQuantile"
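A sketch of how that split could look (getValueAtQuantile is a proposed, hypothetical method here, not something the library exposes at this point; these would sit on the histogram class):

```java
// Hypothetical API sketch: the quantile-based method carries the tighter bounds,
// and the percentile-based method is a thin wrapper documented to defer to it.
public long getValueAtQuantile(double quantile) {
    // ... the bounded, quantile-native lookup would live here ...
    throw new UnsupportedOperationException("sketch only");
}

public long getValueAtPercentile(double percentile) {
    // "if you care about precision, look at valueAtQuantile"
    return getValueAtQuantile(percentile / 100.0);
}
```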
Marshall Pierce
@marshallpierce
Aug 16 2017 22:49
If we allow iteration and querying based on quantiles, that would really make reasoning about fp errors simpler.
I'm going to take a stab at making pctile iteration just be a wrapper around quantile iteration.
Marshall Pierce
@marshallpierce
Aug 16 2017 23:13
So far so good. Mainly that's just removing multiplications and divisions involving 100.0, which can only do good things FP-wise.
Marshall Pierce
@marshallpierce
Aug 16 2017 23:41
I've pushed the Rust changes to that same branch; it looks like the Java PercentileIterator is pretty similar and could also end up skipping a few factors of 100.0, like this one in AbstractHistogramIterator's next(): ((100.0 * totalCountToCurrentIndex) / arrayTotalCount)
In other words, we're not losing anything for users of percentile-flavored data by storing quantile instead and simply returning 100.0 * quantile for percentile.
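A hedged sketch of the iterator-side version of that (names are illustrative, loosely modeled on the fields mentioned above): keep the internal bookkeeping as a 0..1 quantile and only multiply by 100.0 at the output edge.

```java
// Illustrative only, not the actual iterator code.
static double quantileIteratedTo(long totalCountToCurrentIndex, long arrayTotalCount) {
    return (double) totalCountToCurrentIndex / arrayTotalCount;  // no 100.0 involved
}

static double percentileIteratedTo(long totalCountToCurrentIndex, long arrayTotalCount) {
    // Percentile-flavored callers lose nothing: scale only at the edge.
    return 100.0 * quantileIteratedTo(totalCountToCurrentIndex, arrayTotalCount);
}
```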