@marshallpierce It helps to start by visualizing the buckets and sub-buckets. Let's start with the requirements:
We are given a number of decimal points of accuracy that we need to maintain. This means that results need to be accurate to within one unit up to the decimal point range given. So e.g. given a 3 decimal point accuracy, the expectation is obviously for "+/- 1 unit at 1000". It also means that it's "ok to be +/- 2 units at 2000". The "tricky" thing (the reason for that 2x you ask about) is that it is NOT ok to be +/- 2 units at 1999. Only starting at 2000. So internally, we need to maintain single-unit resolution up to 2 x 10^decimalPoints.
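The arithmetic behind that requirement can be sketched as follows. This is only an illustration: the class and method names are hypothetical, and the integer-only power-of-two rounding is an assumption made here for clarity, not a copy of the library's code.

```java
final class SingleUnitResolutionSketch {
    /** 2 * 10^decimalPoints: the range that must keep single-unit resolution. */
    static long largestValueWithSingleUnitResolution(int decimalPoints) {
        long powerOfTen = 1;
        for (int i = 0; i < decimalPoints; i++) {
            powerOfTen *= 10;
        }
        return 2 * powerOfTen;
    }

    /** Rounds that range up to a power of two (larger than required is fine). */
    static int subBucketCount(int decimalPoints) {
        long largest = largestValueWithSingleUnitResolution(decimalPoints);
        // For largest >= 2 this equals ceil(log2(largest)).
        int magnitude = 64 - Long.numberOfLeadingZeros(largest - 1);
        return 1 << magnitude;
    }

    public static void main(String[] args) {
        for (int digits = 0; digits <= 5; digits++) {
            System.out.println(digits + " decimal points: single-unit range "
                    + largestValueWithSingleUnitResolution(digits)
                    + ", sub-bucket count " + subBucketCount(digits));
        }
    }
}
```

With 3 decimal points the single-unit range is 2000, which rounds up to 2048 sub-buckets.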
The 1'th bucket covers from 2048..4095 in multiples of 2
The 2'th bucket covers from 4096..8191 in multiples of 4
Bucket 0 is "special" here. It is the only one that has 2048 entries. All the rest have 1024 entries (because their bottom half overlaps with and is already covered by the previous buckets).
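To make the bucket layout concrete, here is a small sketch that derives each bucket's value range, step, and number of new entries. It assumes 3 decimal points (2048 sub-buckets) and a smallest unit of 1; the class and method names are made up for illustration.

```java
final class BucketRangeSketch {
    static final int SUB_BUCKET_COUNT = 2048;            // assumed: 3 decimal points
    static final int SUB_BUCKET_HALF_COUNT = SUB_BUCKET_COUNT / 2;

    /** First value not already covered by an earlier bucket. */
    static long lowestValue(int bucket) {
        return bucket == 0 ? 0 : ((long) SUB_BUCKET_HALF_COUNT) << bucket;
    }

    /** Last value that falls in this bucket. */
    static long highestValue(int bucket) {
        return (((long) SUB_BUCKET_COUNT) << bucket) - 1;
    }

    /** Width of one sub-bucket: values in this bucket are multiples of this. */
    static long step(int bucket) {
        return 1L << bucket;
    }

    /** Number of sub-buckets this bucket contributes. */
    static long entries(int bucket) {
        return (highestValue(bucket) - lowestValue(bucket) + 1) / step(bucket);
    }

    public static void main(String[] args) {
        for (int bucket = 0; bucket < 4; bucket++) {
            System.out.println("bucket " + bucket + ": " + lowestValue(bucket) + ".."
                    + highestValue(bucket) + " in multiples of " + step(bucket)
                    + ", " + entries(bucket) + " entries");
        }
    }
}
```

Under these assumptions bucket 0 covers 0..2047 in multiples of 1 with 2048 entries, and every later bucket adds 1024 entries.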
We can think of this as needing storage for N+1 buckets, each with enough slots to hold 1/2 of the largestValueWithSingleUnitResolution (the lower half is covered by previous buckets), with the +1 being used for the lower half of the 0'th bucket.
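The storage rule can be sketched like this. The bucket-counting loop is an assumption about how one might find N for a given highest trackable value, again with 2048 sub-buckets and a smallest unit of 1; none of these names come from the conversation.

```java
final class CountsArrayLengthSketch {
    static final int SUB_BUCKET_COUNT = 2048;            // assumed: 3 decimal points
    static final int SUB_BUCKET_HALF_COUNT = SUB_BUCKET_COUNT / 2;

    /** Number of buckets (N) needed so that highestTrackableValue fits. */
    static int bucketsNeeded(long highestTrackableValue) {
        long smallestUntrackableValue = SUB_BUCKET_COUNT; // bucket 0 covers 0..2047
        int buckets = 1;
        while (smallestUntrackableValue <= highestTrackableValue) {
            if (smallestUntrackableValue > Long.MAX_VALUE / 2) {
                return buckets + 1; // one more doubling would overflow a long
            }
            smallestUntrackableValue <<= 1;
            buckets++;
        }
        return buckets;
    }

    /** (N + 1) half-sized buckets; the +1 is the lower half of bucket 0. */
    static int countsArrayLength(long highestTrackableValue) {
        return (bucketsNeeded(highestTrackableValue) + 1) * SUB_BUCKET_HALF_COUNT;
    }

    public static void main(String[] args) {
        System.out.println("buckets for 8191: " + bucketsNeeded(8191));
        System.out.println("counts length for 8191: " + countsArrayLength(8191));
    }
}
```

For a highest value of 8191 this gives 3 buckets and 4096 slots, which matches 2048 + 1024 + 1024.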
The indexing math in AbstractHistogram#countsArrayIndex uses a trick where it subtracts subBucketHalfCount from subBucketIndex when computing the index. This subtraction will result in a non-negative value in all buckets except the 0th bucket (since a value is only in one of those buckets if it is larger than half the bucket's starting-at-0 range). For bucket 0, the negative result works because the bucketBaseIndex for bucket 0 is half-way through its sub-buckets...
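Here is a minimal sketch of that indexing trick, written from the description above rather than copied from the library. It assumes 2048 sub-buckets (subBucketHalfCountMagnitude of 10); the class name and the argument checks are illustrative.

```java
final class CountsArrayIndexSketch {
    static final int SUB_BUCKET_HALF_COUNT_MAGNITUDE = 10; // assumed: 3 decimal points
    static final int SUB_BUCKET_HALF_COUNT = 1 << SUB_BUCKET_HALF_COUNT_MAGNITUDE;
    static final int SUB_BUCKET_COUNT = 2 * SUB_BUCKET_HALF_COUNT;

    static int countsArrayIndex(int bucketIndex, int subBucketIndex) {
        if (subBucketIndex < 0 || subBucketIndex >= SUB_BUCKET_COUNT) {
            throw new IllegalArgumentException("subBucketIndex out of range");
        }
        if (bucketIndex != 0 && subBucketIndex < SUB_BUCKET_HALF_COUNT) {
            throw new IllegalArgumentException("lower half belongs to an earlier bucket");
        }
        // Base points at the middle of the bucket's sub-bucket range,
        // even for bucket 0, whose base is therefore 1024 rather than 0.
        int bucketBaseIndex = (bucketIndex + 1) << SUB_BUCKET_HALF_COUNT_MAGNITUDE;
        // Negative only in bucket 0, where it reaches back into the lower half.
        int offsetInBucket = subBucketIndex - SUB_BUCKET_HALF_COUNT;
        return bucketBaseIndex + offsetInBucket;
    }

    public static void main(String[] args) {
        System.out.println(countsArrayIndex(0, 0));     // first slot of bucket 0
        System.out.println(countsArrayIndex(1, 1024));  // first slot of bucket 1
        System.out.println(countsArrayIndex(2, 1024));  // first slot of bucket 2
    }
}
```

Bucket 0 fills indices 0..2047, bucket 1 fills 2048..3071, and bucket 2 starts at 3072, so the array has no gaps.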
I'll enqueue another question -- I'm not clear on why leadingZeroCountBase is defined the way it is. Is it because it represents the number of leading zeros of the largest value that can fit in bucket 0, namely 64 - (subBucketHalfCountMagnitude + 1)?
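One way to check that reading is a sketch of the bucket lookup using a leading-zero count. It assumes a smallest unit of 1 (so no unit magnitude term), 2048 sub-buckets, and non-negative values; the names are hypothetical and this is not a copy of the library's method.

```java
final class BucketIndexSketch {
    static final int SUB_BUCKET_HALF_COUNT_MAGNITUDE = 10; // assumed: 3 decimal points
    // Largest value that fits in bucket 0: 2^(magnitude + 1) - 1 = 2047.
    static final long SUB_BUCKET_MASK = (1L << (SUB_BUCKET_HALF_COUNT_MAGNITUDE + 1)) - 1;
    // Leading zeros of that largest bucket-0 value.
    static final int LEADING_ZERO_COUNT_BASE = 64 - (SUB_BUCKET_HALF_COUNT_MAGNITUDE + 1);

    /** OR-ing with the mask makes every bucket-0 value look like 2047. */
    static int bucketIndex(long value) {
        return LEADING_ZERO_COUNT_BASE - Long.numberOfLeadingZeros(value | SUB_BUCKET_MASK);
    }

    /** Drops the low bits that this bucket's coarser resolution cannot represent. */
    static int subBucketIndex(long value, int bucketIndex) {
        return (int) (value >>> bucketIndex);
    }

    public static void main(String[] args) {
        long[] samples = {0, 2047, 2048, 4095, 4096, 8191};
        for (long v : samples) {
            int b = bucketIndex(v);
            System.out.println(v + " -> bucket " + b + ", sub-bucket " + subBucketIndex(v, b));
        }
    }
}
```

Each extra doubling of the value removes one leading zero, so the difference from the base is the bucket number.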
LargestValueWith... Thing: you are technically correct. But (a) larger than required is ok (smaller than required would be incorrect). And (b) since we limit decimal points to 5 or 6, and we round up to nearest power of 2, it will matter exactly never...
Yep, definitely. I want to get Phaser support in there too.
My long-term goal (which I'd love to pick your brain on some time) is to build a statistically sound way of handling all metrics, which for histogram-suitable metrics would mean that the wire format would be how histograms are reported to the mothership, and are then aggregated into longer timeframes there.
My thoughts on that aren't fully formed yet, but basically it comes down to this: everything can be viewed as a (possibly discontinuous) function, and for certain types of data, we can effectively compress it to a single number (e.g. we may record only max queue depth over the last minute because we do not care about other numbers). I talked a little to a statistician and they told me that I wasn't crazy.
Really what's bothering me about current metrics processing systems is that they have no way to express the semantics that are valid for a given sort of data. Numbers aren't all the same. You can't do the same things with a number like queue depth (a point-in-time assessment of a single attribute of a complex system that depends on all previous time) that you can with, say, 99.9%tile service time.
OK, I think I've taken enough of your time for one evening; thanks for the help. I'm sure I'll have other questions later, but I can proceed further through the record path now and actually have some confidence in what I'm blindly porting.
largestValueWithSingleUnitResolution can be 2 * 10^0 = 2 at the smallest, so subBucketCountMagnitude = log_2(largestValueWithSingleUnitResolution) is 1 at the smallest. Since sBCM is never 0, I think the ternary here is unnecessary: subBucketHalfCountMagnitude = ((subBucketCountMagnitude > 1) ? subBucketCountMagnitude : 1) - 1;
Also, I think that's just Math.max(subBucketCountMagnitude, 1) - 1 even if the ternary is necessary, and max can be implemented with sufficient bit twiddling to avoid a branch, which I assume/hope has been done.
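A quick sketch comparing the three forms mentioned above. The bit-twiddling variant is one common branch-free formulation chosen here for illustration; whether the JIT compiles the ternary or Math.max without a branch is not something this sketch demonstrates.

```java
final class HalfCountMagnitudeSketch {
    /** The ternary form quoted above. */
    static int withTernary(int subBucketCountMagnitude) {
        return ((subBucketCountMagnitude > 1) ? subBucketCountMagnitude : 1) - 1;
    }

    /** The same computation using Math.max. */
    static int withMax(int subBucketCountMagnitude) {
        return Math.max(subBucketCountMagnitude, 1) - 1;
    }

    /** Branch-free max(m, 1) - 1; assumes m - 1 does not overflow an int. */
    static int withBitTwiddling(int subBucketCountMagnitude) {
        int diff = subBucketCountMagnitude - 1;   // negative only when m < 1
        // diff >> 31 is all ones when diff is negative, otherwise zero.
        int max = subBucketCountMagnitude - (diff & (diff >> 31));
        return max - 1;
    }

    public static void main(String[] args) {
        for (int m = 0; m <= 3; m++) {
            System.out.println(m + ": " + withTernary(m) + " " + withMax(m)
                    + " " + withBitTwiddling(m));
        }
    }
}
```

All three agree for every magnitude from 0 to 63, and the guard only changes the result when the magnitude is below 1.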
Anyway, I'm off to sleep; I'll be interested to see what you have to say on TSX and such things. Cheers