These are chat archives for HdrHistogram/HdrHistogram

8th Mar 2016
Marshall Pierce
@marshallpierce
Mar 08 2016 01:59
Another question in the queue: in AbstractHistogram#shiftValuesRight, the comment before it throws ArrayIndexOutOfBoundsException says that "Any shifting into the bottom-most half bucket would represents a loss of accuracy". I think I follow, but just in case: let's say you have a sub bucket count of 2048 and a min value of 1 (unit magnitude 0), and a value of 2048 recorded (the first entry in bucket index 1). Conceptually, shifting right twice gives 512, which can be represented in the bottom bucket, but adjusting the counts index (2048) by -1024 * 2 lands you at index 0, which maps to value 0, which is wrong. However, value 4092 (index 2047+1024-1=3070) gets shifted to index 3070-2048 = 1022, which is correct. So, because the bottom half bucket of bucket 0 isn't on half the scale of the top half, shifting the index by a multiple of subBucketHalfCount doesn't work there. Is this the "loss of accuracy" you were referring to, aka being wrong?
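The index arithmetic in this example can be checked with a small standalone sketch. The class and helper names below are illustrative, not AbstractHistogram's actual methods; it assumes a histogram with subBucketCount = 2048 and unitMagnitude = 0:

```java
public class IndexSketch {
    static final int SUB_BUCKET_HALF_COUNT_MAGNITUDE = 10;
    static final int SUB_BUCKET_HALF_COUNT = 1 << SUB_BUCKET_HALF_COUNT_MAGNITUDE; // 1024
    static final long SUB_BUCKET_MASK = 2L * SUB_BUCKET_HALF_COUNT - 1;            // 2047

    // Bucket holding `value`: determined by the highest set bit above the
    // sub-bucket range.
    static int bucketIndex(long value) {
        return 64 - Long.numberOfLeadingZeros(value | SUB_BUCKET_MASK)
                - (SUB_BUCKET_HALF_COUNT_MAGNITUDE + 1);
    }

    // Counts-array index for `value` (unitMagnitude = 0).
    static int countsIndex(long value) {
        int bucket = bucketIndex(value);
        int subBucket = (int) (value >>> bucket);
        return ((bucket + 1) << SUB_BUCKET_HALF_COUNT_MAGNITUDE)
                + (subBucket - SUB_BUCKET_HALF_COUNT);
    }

    // Lowest value an index maps back to; bucket 0's bottom half maps
    // index i straight to value i.
    static long valueAtIndex(int index) {
        int bucket = (index >> SUB_BUCKET_HALF_COUNT_MAGNITUDE) - 1;
        int subBucket = (index & (SUB_BUCKET_HALF_COUNT - 1)) + SUB_BUCKET_HALF_COUNT;
        if (bucket < 0) {
            bucket = 0;
            subBucket -= SUB_BUCKET_HALF_COUNT;
        }
        return (long) subBucket << bucket;
    }

    public static void main(String[] args) {
        System.out.println(countsIndex(2048));             // 2048: first entry of bucket 1
        System.out.println(countsIndex(4092));             // 3070
        // Shifting right by 2 subtracts 2 * 1024 from the index:
        System.out.println(valueAtIndex(3070 - 2 * 1024)); // 1022: round-trips back to 3070
        System.out.println(valueAtIndex(2048 - 2 * 1024)); // 0, not 512: wrong
    }
}
```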
Gil Tene
@giltene
Mar 08 2016 03:19
Yes. The simple logic test for maintaining accuracy is: if I shift to the right and then shift back to the left, will I always regain my original values? When you shift right into lower order buckets that are above the lowest half-bucket, the lower order bucket's unit is 1/2 the size of the next higher bucket's, which maintains the accuracy (e.g. 4000 +/- 2 is the same accuracy as 2000 +/- 1, and can safely be shifted back the other way while regaining the original 4000 value. Same goes for 4002 +/- 2, and since there is no way to represent 4001, it's not a problem). However, when shifting into the bottom half bucket from e.g. the top half of the 0th bucket, you are going from a unit size of 1 to a unit size of 1, where you would really need a unit size of 1/2 to maintain accuracy. E.g. shifting 801 +/- 1 right by one gives 400 +/- 1, which when shifted back to the left becomes 800 (and not the original 801).
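The round-trip test can be put in numbers with a tiny sketch. The helper below is illustrative, not HdrHistogram API: it halves a value, quantizes it to the unit size of the destination half-bucket, and shifts back left.

```java
public class RoundTripSketch {
    // Shift `value` right by one into a half-bucket whose unit size is
    // `targetUnit`, then shift back left. Lossless only when targetUnit
    // is half the source half-bucket's unit size.
    static long shiftRightThenBack(long value, long targetUnit) {
        long shifted = (value / 2 / targetUnit) * targetUnit; // quantize to target unit
        return shifted * 2;                                   // shift back left
    }

    public static void main(String[] args) {
        // 4000 (bucket 1, unit 2) -> 2000 (unit 1) -> 4000: accuracy kept.
        System.out.println(shiftRightThenBack(4000, 1));
        // 801 (bucket 0 top half, unit 1) -> bottom half bucket, where the
        // unit stays 1 instead of the needed 1/2 -> 400 -> 800, not 801.
        System.out.println(shiftRightThenBack(801, 1));
    }
}
```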
Marshall Pierce
@marshallpierce
Mar 08 2016 04:20
Ah, I see. So it's two things: for top-half-to-bottom-half shifts within bucket 0, it loses accuracy, and for shifts from buckets above the 0th into the bottom half, it's simply wrong.
Any thoughts re: the ternary for subBucketHalfCountMagnitude? Also, when calculating unitMagnitude, it's (int) Math.floor(...) -- wouldn't the int cast already floor? Or is there some subtlety there I'm missing?
Gil Tene
@giltene
Mar 08 2016 10:02
I'll look at the ternary and Math.floor questions later. For the shift thing: the examples I gave were for shifting right by 1. When shifting right by N the same logic applies: right shifts to any buckets other than the bottom half bucket work right (unit ratio is power of 2 that matches the shift), and shifting into the bottom half bucket always has the problem of the unit ratio being off by 1 from what is needed to maintain accuracy.
For left shifts there is a matching unit scale issue, which is handled by AbstractHistogram#shiftLowestHalfBucketContentsLeft. Unlike all other buckets, where shifting left can leave values in place in the array since the unit size scales with the shift, shifting values in the bottom half left requires them to be copied one by one to the relevant target indexes, zeroing the original index as we go, and carefully doing it in the right order (if done in a single pass).
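The copy-one-by-one idea can be sketched standalone. This is illustrative only, not the actual shiftLowestHalfBucketContentsLeft (which also has to coordinate with the bulk index shift applied to the other buckets); it assumes unitMagnitude = 0 and subBucketHalfCount = 1024, so a bottom-half value v sits at counts index v:

```java
public class BottomHalfShiftSketch {
    static final int HALF_MAG = 10;            // subBucketHalfCountMagnitude
    static final int HALF = 1 << HALF_MAG;     // subBucketHalfCount = 1024

    // Counts-array index for a value (unitMagnitude = 0, subBucketCount = 2048).
    static int countsIndex(long value) {
        int bucket = 64 - Long.numberOfLeadingZeros(value | (2L * HALF - 1))
                - (HALF_MAG + 1);
        return ((bucket + 1) << HALF_MAG) + ((int) (value >>> bucket) - HALF);
    }

    // Move each bottom-half count to the slot of its left-shifted value.
    // Walking the sources from the top down means every destination (always
    // a higher index than its source) has either already been drained or
    // lies outside the bottom half, so no count gets shifted twice.
    static void shiftLowestHalfLeft(long[] counts, int shift) {
        for (int src = HALF - 1; src >= 1; src--) {
            if (counts[src] != 0) {
                counts[countsIndex((long) src << shift)] += counts[src];
                counts[src] = 0;
            }
        }
    }

    public static void main(String[] args) {
        long[] counts = new long[2048];
        counts[3] = 5;     // value 3, count 5
        counts[700] = 2;   // value 700, count 2
        shiftLowestHalfLeft(counts, 1);
        System.out.println(counts[6]);    // 5: value 3 -> 6, still in the bottom half
        System.out.println(counts[1400]); // 2: value 700 -> 1400, top half of bucket 0
    }
}
```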
Marshall Pierce
@marshallpierce
Mar 08 2016 16:31
OK. I'll implement shifting soon while all this is fresh, but first I'll need getCountAtValue to be able to test shifting. On that note, in this logic: `final int index = Math.min(Math.max(0, countsArrayIndex(value)), (countsArrayLength - 1));`
  • why is the max there? Even an undersized value shouldn't result in an index < 0, since the bucket and sub bucket index calculations just end up at 0 when the requested value is less than the initial min value.
  • for when the value requested is too big and would be clamped to length - 1 by the min, I'm wondering if there's a specific reason it's this way instead of expressing "that value is too big" in some way? As it is, if you had a histogram with a max expressible value of 65,535, and you'd recorded a value there, and you ask for the count at 100 quadrillion, you'll get 1, which seems misleading.
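For reference, the clamp isolated as a standalone helper (hypothetical name, not AbstractHistogram API), showing the behavior described above:

```java
public class ClampSketch {
    // Mirrors the clamping expression from getCountAtValue.
    static int clampedIndex(int rawIndex, int countsArrayLength) {
        return Math.min(Math.max(0, rawIndex), countsArrayLength - 1);
    }

    public static void main(String[] args) {
        // An over-range value whose raw index would be 5000 in a 4096-slot
        // counts array clamps to 4095, the top slot, so the count recorded
        // at the histogram's max is reported rather than "out of range".
        System.out.println(clampedIndex(5000, 4096)); // 4095
        System.out.println(clampedIndex(-3, 4096));   // 0
    }
}
```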