by

Where communities thrive


  • Join over 1.5M+ people
  • Join over 100K+ communities
  • Free without limits
  • Create your own community
People
Repo info
Activity
    Marshall Pierce
    @marshallpierce
    neat, will have a go at the rust version of PackedHIstogram when I have a moment or three
    Alexandre Victoor
    @Alex_Victoor_twitter
    Hello all and happy new year!
    I have make some progress writing the TypeScript version of PackedHistogram. I am almost done writing a first version of the underlying packed long array. However I still can get why during the resize process, there are 2 steps: a first step where only the non leaf nodes are copied and a second steps dedicated on the leaf nodes.
    @giltene I have tried to merge those two steps into one, copying leaf nodes in method 'populateEquivalentEntriesWithZerosFromOther' instead of zeros. It seems to work fine, all tests are green. Would you be interested by a pull request?
    Gil Tene
    @giltene
    @Alex_Victoor_twitter The reason for the separation is concurrency races. Look at https://github.com/HdrHistogram/HdrHistogram/blob/master/src/main/java/org/HdrHistogram/packedarray/ConcurrentPackedLongArray.java#L66 too see the difference between where the structure is prepared (when the context being prepared is not active) and where the values are populated (where the context being copied from is inactive, and the newly created context is).
    The preparation step produces a more space-efficient re-packing of the array, since if we simply added the values one by one without that structure-replication step, a bunch of intermediate steps in creating entries would leave "dead" entries in the storage.
    In a non-concurrent implementation, you can probably simplify this quite a bit. The code was written "concurrent first" with the non concurrent implementation being a special case, rather than the other way around...
    Gil Tene
    @giltene
    In addition, in a non-concurrent case, you can also reduce the entry size. The previousVersionIndex field in each entry is only needed to deal with concurrency. And the protocol described in https://github.com/HdrHistogram/HdrHistogram/blob/master/src/main/java/org/HdrHistogram/packedarray/AbstractPackedArrayContext.java#L92 can probably be dramatically simplified as well.
    And last: I chose to use 8 leaf entries per word mostly because of Java atomic array limitations (it only supports atomic arrays of ints and longs). If Java supported atomic byte or atomic short arrays, I'd probably elect to have the leaf level be smaller (only 1 or two leafs per word). It's hard to tell what the storage impact of that choice is.
    For now, I'd just copy the logic as is. But changing the leaf level shift from 3 to 2, 1, or 0 may produce even better packing (or not).
    (I'd probably keep it at 1 at the very least to make alignment simpler).
    Gil Tene
    @giltene
    BTW, porting probably produces the best code review. E.g. this cool catch by @Alex_Victoor_twitter (HdrHistogram/HdrHistogram#165) got by several others people who had skimmed over the code...
    So if anyone else wants to start a port to help find my logic bugs... (hint hint @marshallpierce)
    Alexandre Victoor
    @Alex_Victoor_twitter
    @giltene Thanks a lot for the explanations! I get now why there are two steps in the resize/copy process. Thanks also for the advices. I have already simplified things a little bit, removing concurrent related stuff. Right now I have kept room for previousVersionIndex only to be able to use the java version has a reference implementation while debugging, it won't last long :)
    marshallpierce @marshallpierce eyes TODO list sadly
    amirhadadi
    @amirhadadi
    @giltene I noticed that getMean and getStdDeviation in AbstractHistogramBase use recordedValuesIterator in a non thread safe manner.
    This means that even for a histogram that's not being changed, calling getMean and getStdDeviation from multiple threads is not thread safe.
    I guess that's intentional right?
    Qihe Bian
    @ufosky
    In rust, how to make SyncHistogram windowed?
    Gil Tene
    @giltene
    @amirhadadi It's definitely a choice. Histograms other than SynchronizedHistogram do not offer safe multithreaded access beyond what they explicitly promise (e.g. ConcurrentHistogram and AtomicHistogram make specific promises that relate to recording behavior but not to observability). And when things are not thread-safe, I consider "early or more likely failure in the presence of unsafe use" to be a feature, not a bug... Yes, it is probably possible to make the observation of a non-mutating histogram thread safe, but I don't see much value in that, and don't really want to allocate new iterators for each e.g. getMean() call, or skip using the built-in iterators in internal implementation.
    Alec
    @ahothan
    @giltene / @mikeb01 / @marshallpierce , could you confirm the histogram log file format was changed to support tagging and when was this added?
    I got a request for supporting this (HdrHistogram/HdrHistogram_py#27) and would like to get more info about it.
    Jon Gjengset
    @jonhoo
    @ahothan for what it's worth, I'm the maintainer of the Rust port (along with Marshall), and I'm the one who reported it :)
    See also the last comment I added

    Specifically that the Java documentation for the log format (https://www.javadoc.io/static/org.hdrhistogram/HdrHistogram/2.1.12/org/HdrHistogram/HistogramLogReader.html) says

    A valid interval description line contains an optional Tag=tagString text field, followed by an interval description.

    Not sure when that was added though

    Alec
    @ahothan
    @jonhoo thanks for the pointer, I was not aware there was a tag there, I wonder if the java repo has a set of "reference" histogram log files, including some with the optional tag, which I can update the python version with, that would allow us to find out if our implementation is "broken"
    anyway, I might be able to fix this maybe by next week (unless someone else does a PR)
    not a lot of code to change but updating the UT will need some work
    Jon Gjengset
    @jonhoo
    Yeah, it'd be really nice if we had some common set of test files across the different HdrHistogram implementations. The Rust implementations have some files over in https://github.com/HdrHistogram/HdrHistogram_rust/tree/master/tests/data that may be useful
    Michael Barker
    @mikeb01
    It was added in version 1.3 of the log format. The C version at the moment just discards the value. Raise a ticket on the C version if you need it and I will have a look at the best way to surface it.
    Alec
    @ahothan
    @jonhoo ok thanks, actually found one set of files under https://github.com/HdrHistogram/HdrHistogram/tree/master/src/test/resources/org/HdrHistogram, so I'll just copy that set over.
    @mikeb01 looks like optional tags were added 4 years ago (duh). Thanks for confirming python version is not the only one missing that support
    Alec
    @ahothan
    @jonhoo i'm closing your request as I just pushed the tag support for logs. Thanks!
    Jon Gjengset
    @jonhoo
    @ahothan Great, thank you!
    Gil Tene
    @giltene
    "one of these days" we should put together an actual document on the common format stuff under the "common repo at https://github.com/HdrHistogram/Common , and view that as the master for common serialized, wire format, and log format definitions. Any volunteers?
    Michael Barker
    @mikeb01
    Would it be awful if we put an arbitrary limit on the size of a tag value? I'm thinking 1023 characters.
    Martin Thompson
    @mjpt777
    If it is a numbered tag then 20 characters is sufficient.
    Michael Barker
    @mikeb01
    It's a string.
    Gil Tene
    @giltene
    @mikeb01 that seems reasonable. Even less may be ok. E.g. 32 or 80
    Jon Gjengset
    @jonhoo
    @giltene That would be super helpful. Especially if paired with a sort of standardized test suite, that can then be ported to different languages. It's a little sad that the different ports all currently have mismatched test suites, when I'm sure we could find (and fix) all sorts of issues if they all had the union of the tests.
    Marshall Pierce
    @marshallpierce
    perhaps something like a directory structure organized for easy parsing, like a text file with one number per line for the original data and another file with the serialized form, etc
    Alec
    @ahothan
    first easy step would be to move the "reference" log files to that common repo, the each individual project can run its UT to read log files from that repo
    the other main source of differences is the apis that are not implemented everywhere (such as the tag feature for histogram logs)
    I thought @marshallpierce proposed to do a spec writeup of the histogram binary format, is that available?
    Marshall Pierce
    @marshallpierce
    I did propose to do that long ago -- never got around to it :( The Rust implementation has some top level docs at least: https://docs.rs/hdrhistogram/7.1.0/hdrhistogram/serialization/index.html
    Alec
    @ahothan
    Maybe you could move that under Common and somehow that will get more detailed over time from more contributors?
    Marshall Pierce
    @marshallpierce
    Added to my todo.
    Michael Barker
    @mikeb01
    I've release HDR Histogram 0.10.0, which includes tag support. https://github.com/HdrHistogram/HdrHistogram_c/releases/tag/0.10.0
    Michael Barker
    @mikeb01
    I'm considering what level of compiler version I should support. At the moment it is C90, but I've used a bunch of C99 stuff (stdint, stdbool). I'm considering just making the C99 the specified standard. Anyone actively using older compilers that needs that level of support?
    Linus Sellberg
    @yxhuvud
    standardized test suite would indeed be very nice. I remember having to look at and steal a bunch from a bunch of different suites when implementing (pretty half-assed) a recorder for crystal
    Michael Barker
    @mikeb01
    I've released a 0.10.1 which fixes some compile error on new compilers. I'm decided to switch to c99 as the compiler standard which I'll include in a 0.11.0 release shortly.
    Marshall Pierce
    @marshallpierce
    I'd go with whichever c standard makes your life as a maintainer easier. If people want old stuff supported on code they get for free, they can ask for an enterprise support contract IMO...
    Marshall Pierce
    @marshallpierce
    (plus, C99 support was complete in gcc by 2010 -- surely a decade is enough time, I say while remembering embedded projects working with Micron's weird toolchain based on a prehistoric version of gcc)