These are chat archives for HdrHistogram/HdrHistogram

15th
Aug 2017
Marshall Pierce
@marshallpierce
Aug 15 2017 14:26
Finally getting around to poking around in this change. I find this floating point arcana pretty hard to wrap my mind around, so I could be missing something, but I think that maybe subtracting the ulp in getValueAtPercentile could be slightly wrong in some conditions. Julian alluded to nextbefore vis-a-vis ulp's availability in other languages, and I think that might actually be more correct. Might it not be possible that x - ulp(x) is in fact smaller than the largest fp value smaller than x? If so, that could lead to ceil(x - ulp(x)) being one smaller than ceil(nextbefore(x)). I can implement it either way (https://docs.rs/ieee754/0.2.2/ieee754/trait.Ieee754.html has all I need), and as of Java 1.8 we have https://docs.oracle.com/javase/8/docs/api/java/lang/Math.html#nextDown-double- also.
Marshall Pierce
@marshallpierce
Aug 15 2017 14:49
There are plenty of places where x - Math.ulp(x) < Math.nextDown(x), e.g. 32.0 has a nextDown of 31.999999999999996 but 32.0 - ulp(32.0) = 31.999999999999993.
Not too surprising that powers of 2 are where the ulp would tick up in size, I suppose...
Marshall Pierce
@marshallpierce
Aug 15 2017 15:03
I haven't found a double where Math.ceil differs between x - ulp(x) and nextDown(x), but it does seem like nextDown is a better fit to compensate for the behavior in the comment in getValueAtPercentile about fpCountAtPercentile being 1 ulp too big.
Alec
@ahothan
Aug 15 2017 15:45
I am having some trouble to implement this in python because I could not find an ulp() function. I could get a next up and next down and a diff between next up and next down gives me a different number than the ulp value in Java since the same test case fails. I like @marshallpierce 's idea of using next down instead
Marshall Pierce
@marshallpierce
Aug 15 2017 15:46
You could just wrap the rust implementation... ;)
marshallpierce @marshallpierce puts the rust evangelism hat back in the corner
Marshall Pierce
@marshallpierce
Aug 15 2017 15:47
ulp will be tough for most languages, I think
Hypothetically if you can get the raw bits you can cast to a 64-bit int and manipulate that, but I doubt that's universally available
Julian Berman
@Julian
Aug 15 2017 15:49
Does rust have ulp?
Marshall Pierce
@marshallpierce
Aug 15 2017 15:50
Yes, someone has written a nice wrapper https://docs.rs/ieee754/0.2.2/ieee754/trait.Ieee754.html that exposes pretty much everything you could want
(since you can decompose an f64 to its raw bits, it's only a matter of some mildly painful wrappers and accompanying tests)
Julian Berman
@Julian
Aug 15 2017 15:51
Interesting, I didn't know of a language other than Java that exposed it
I think I follow your message above though, think it matches my intuition from way back when too
Marshall Pierce
@marshallpierce
Aug 15 2017 15:52
really it's Math.doubleToRawLongBits that's the key in Java's case -- once you have that it's just some bit wrangling. You could write Math.ulp yourself with a large dose of IEEE754 courage if you have the raw bits :)
Julian @Julian nods
Julian Berman
@Julian
Aug 15 2017 15:52
That operation Python has, I think the edge cases are the annoying parts
Marshall Pierce
@marshallpierce
Aug 15 2017 15:52
yes it's definitely a pretty finicky piece of code -- edge cases are easy to get wrong and hard to spot
Julian Berman
@Julian
Aug 15 2017 15:56
(tbqh yeah I'd just go with wrapping the rust lib if it's well tested, it's about ~5 minutes of Python work to wrap it via cffi, but yeah I suspect the nextbefore thing is more important to decide)
Marshall Pierce
@marshallpierce
Aug 15 2017 15:58
The core logic is very well tested; I merged my original rust version (and its tests) with jonhoo's rust efforts. There are a few things, like iteration, that could use more tests, but I don't know of any actual issues with them -- I just have a few tests left to port that would make me feel better.
Julian Berman
@Julian
Aug 15 2017 15:58
Nice
Marshall Pierce
@marshallpierce
Aug 15 2017 20:29
I've been tinkering with this (https://github.com/jonhoo/hdrsample/compare/master...marshallpierce:percentile?expand=1) and I have a particular case that fails: with 1..100000 as the contents of a histogram, iterating to 64.415th percentile yields a value of 64415, but calculating the value for that percentile uses a fractional count of 64415.000000000015, whose previous FP value is 64415.00000000001 which then rounds to 64416. That's in the next bucket, so the calculated value is 64447. Wouldn't this be a value that is too high?
Interestingly, if I change the order of FP operations from (percentile / 100.0) * total countto (percentile * total count) / 100.0, the test passes.
Marshall Pierce
@marshallpierce
Aug 15 2017 20:34
https://play.rust-lang.org/?gist=5903994dbea3a8290b62c573d488d4f4&version=stable shows why: the latter ordering yields the more accurate 64415.00000000001.
In other words, (percentile / 100.0) * total count can be 2 ulps high. (In this instance, x - ulp is the same as the immediately previous FP value, so that's not a problem here.)
My numerical analysis is rusty (haven't used it since undergrad) so I won't claim to have a good rationale for why that calculation is more stable with the other ordering, but when total_count is more than a small number (100k in this case), it does kinda make sense to me that you wouldn't want to multiply the (imprecise) fp division of percentile / 100.0 by a large-ish number.
Marshall Pierce
@marshallpierce
Aug 15 2017 20:45
Of course, percentile * total_count would also have error. In the worst case of percentile = 100.0 and total_count = u64::max_value(), the ulp of that product is 262144. (That would then be divided by 100.0.)