These are chat archives for HdrHistogram/HdrHistogram

Aug 2017
Gil Tene
Aug 21 2017 04:41
My reason for taking the ulp off the quantile (or percentile) is that that's something we can clearly document in the API (as I do in the current JavaDoc). However, I'm not advocating that we ciel() the input. My current implementation removes 1 ulp (with the equivalent of .prev()), from the input percentile (or quantile), then multiplies it by the totalCount (and /100.0 for percentile), and only then does the ciel(). Have you found any errors in using this technique (i.e. behavior that does not match the documentation?). Note that while we "could do better" in some cases, e.g. with fractions that would yield the correct result even without taking off 1 ulp, the results in those cases are not "wrong", since they fall within the expectations that the documentation sets.
Marshall Pierce
Aug 21 2017 17:57

Sorry if that wasn't clear; I wasn't suggesting to ceil() the input, only agreeing that it does make sense to ceil() the product. (Is there a good name for the thing I've been describing as "quantile count", i.e. quantile * count?)

I haven't dug very deep about prev()-ing the quantile before multiplying by the count or prev()-ing the product, though some quick tests seem to indicate that these tend to be identical:

            let q_count_prev_before_mult = (quantile.prev() * length as f64).ceil();
            let q_count_prev_after_mult = (quantile * length as f64).prev().ceil();
However, I still think that documenting any particular number of ulps of precision is a bit optimistic. compares arbitrary-precision arithmetic (starting from an f64 quantile so as to be fair) with the usual floating point arithmetic, and it finds lots of cases where the resulting error is 2 ulps. Maybe if I left it running for a while it would find 3 or 4 ulps, who knows.
Marshall Pierce
Aug 21 2017 18:04
So, to summarize:
  • quantile.prev() * count).ceil() seems to be equivalent to (quantile * count).prev().ceil()inasmuch as 5 minutes of effort failed to find a case where they were different. I find them both difficult to rationalize, in that the best I can do is "it seems to work better if you throw a prev in there because FP multiplication where one argument is the FP equivalent of an integer is frequently just a touch too large".
  • We definitely cannot guarantee +/- 1 ulp since there are plenty of cases that are +/- 2 ulp, and I am not confident that the upper bound of the error is 2.