These are chat archives for HdrHistogram/HdrHistogram

19th
Sep 2015
Michael Barker
@mikeb01
Sep 19 2015 01:29
@ahothan Thanks, I've updated the code so that it checks that we don't run off the end of the counts array and also checks that the amount of data read is the same as the amount available so that will return errors if values are truncated or the input exceeds the size of the histogram.
Gil Tene
@giltene
Sep 19 2015 01:32
@ahothan: Re: 32 and 16 bit counters: The java implementation is signed for everything. And accordingly the V1 encodings for 4 word and 2 word sizes only supports counts to Integer.MAX_VALUE and Short.MAX_VALUE. In V2, there is no longer a wire-specified word size, since all counts can go to Long.MAX_VALUE, but can also be as small as 1 byte on the wire...
Alec
@ahothan
Sep 19 2015 14:42

@mikeb01 : new decode looks good (although 9 bytes extra padding could have been reduced to 1)

I was looking at the encoding side and there is one more adjustment there that may be needed when it comes to zeros count:

        int32_t zeros = 1;

        while (i < counts_limit && 0 == h->counts[i])
        {
            zeros++;
            i++;
        }

        data_index += zig_zag_encode_i64(&encoded->counts[data_index], -zeros);

@giltene: is there a reason why the zeros counts is limited to min negative value in 32-bit?
because the code above does not check that bound and this will cause the decode side check to error out:

while (data_index < data_limit && counts_index < h->counts_len)
{
    data_index += zig_zag_decode_i64(&counts_data[data_index], &value);

    if (value < INT32_MIN)
    {
        return HDR_TRAILING_ZEROS_INVALID;
    }

Might be simpler to just allow negative values on 64-bit and get rid of the check in the decode?

Alec
@ahothan
Sep 19 2015 14:47
actually the decode side will not error out since the zeros variable will wrap and become negative, which translates into a positive value being encoded...
Alec
@ahothan
Sep 19 2015 14:56
just realized that all indices to navigate the counts array including size are 32-bit, I guess max counts_len has to fit inside a int32_t.
Gil Tene
@giltene
Sep 19 2015 15:11
@ahothan the use of 32 bit ints for counting zeros in limit is there for convenience because Java array indexes are 32 bit ints. A wrapping negative value presumably won't be possible because counts_limit can never reach 2^31. With precision limited to 5 decimal points and highest trackable value limited to 2^63-1, the highest possible length of a counts array is something like 2^17 * ((63-17) + 1). Which is less than 2^23. Even if we expanded decimal points up to 7 the counts array will stay below 2GB.
Gil Tene
@giltene
Sep 19 2015 15:21
BTW, the reason I cap it at 5 is "stupid user tricks". Initial versions were not capped so low, and I got several "I tried this and it took forever and then ran out of memory" questions from people who didn't realize they are dealing with a number than controls the exponential size of the histogram storage. I found a cap of 5 to be reasonable enough to keep running (with 6 you can end up with a 44MB histogram, and with 7 a 640MB histogram).
Alec
@ahothan
Sep 19 2015 19:05
@giltene ok that sounds good, good to know that the max counts_len cannot exceed that size. So this plugs the encoding side.
On the decoding side we still need to be super paranoid and protect against malicious encodes.
I think I found one more issue with the latest C decode ... more on this later