These are chat archives for HdrHistogram/HdrHistogram

26th
Apr 2016
Gil Tene
@giltene
Apr 26 2016 01:43
@mjpt777 Agreed on the need for a schema definition, especially with all the variants we now have. We have two separate things to spec, though: one is how an HdrHistogram is encoded, and the other is what a log file format looks like. This recent discussion is about the log file format only [no change to encodings].
@ahothan : The way I see it, reading the next histogram in a file (regardless of tag) is an "advanced" thing: it is only useful if you are trying to efficiently stream through a file and deal with multiple separate tags (e.g. summarize 20 tags [each separately] in a single pass). I'm going to hold off on that sort of API, since I'm not yet sure it's needed. I'll focus on the "read this file with this specific tag" case [which looks like grep'ing for the lines with the specified tag].
For finding the tags in a file, I think I'll just do an API for that. I don't like that it requires a separate pass (since e.g. in Java the log reader can work on any input stream). But I also can't think of an easy way to do tag-discovery in a stream and then go back to read stuff based on what was discovered without doing multiple passes. So I'll keep it simple for now.
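A minimal sketch of the two-pass approach described above: one pass to discover tags, a second to read only the lines for one tag. The line layout is an assumption made for illustration (the format was still under discussion here): a tagged interval line is taken to start with `Tag=<name>,`, untagged lines belong to the default tag (shown as `None`), and `#` comments and the quoted column legend are skipped. The function names are hypothetical, not part of any HdrHistogram API.

```python
from typing import Iterable, Iterator, List, Optional

TAG_PREFIX = "Tag="  # assumed marker for a tagged interval line


def tag_of(line: str) -> Optional[str]:
    """Return the tag of an interval line, or None for an untagged line."""
    if line.startswith(TAG_PREFIX):
        return line[len(TAG_PREFIX):].split(",", 1)[0]
    return None


def is_interval_line(line: str) -> bool:
    """True for data lines; skips blanks, '#' comments and the quoted legend."""
    stripped = line.strip()
    return bool(stripped) and not stripped.startswith(("#", '"'))


def discover_tags(lines: Iterable[str]) -> List[Optional[str]]:
    """Pass 1: collect the distinct tags in order of first appearance."""
    seen: List[Optional[str]] = []
    for line in lines:
        if is_interval_line(line):
            tag = tag_of(line.strip())
            if tag not in seen:
                seen.append(tag)
    return seen


def lines_for_tag(lines: Iterable[str], tag: Optional[str]) -> Iterator[str]:
    """Pass 2: yield only the interval lines carrying the requested tag."""
    for line in lines:
        if is_interval_line(line) and tag_of(line.strip()) == tag:
            yield line.strip()
```

Both functions consume their input once, so running discovery and then a filtered read over a non-seekable stream would need the stream to be reopened or buffered, which is the multiple-pass cost mentioned above.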
Michael Barker
@mikeb01
Apr 26 2016 01:54
I would avoid complicating it. I would just have a lambda-like API that provides the log line, but lazily decodes the histogram. Anything complicated can be built on top, e.g. indexing or filtering by tag or simply building a tag set.
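One way to picture the lambda-style API suggested above is a scanner that hands each parsed line to a callback and leaves the payload encoded until the callback asks for it. This is only a sketch under assumptions: the `Tag=<name>,` prefix and four-field line layout are illustrative, `LogEntry` and `scan` are hypothetical names, and `decode()` merely base64-decodes the payload as a stand-in for real histogram decoding.

```python
import base64
from dataclasses import dataclass
from typing import Callable, Iterable, Optional


@dataclass(frozen=True)
class LogEntry:
    tag: Optional[str]   # None for an untagged (default-tag) line
    start: float         # interval start timestamp, seconds
    length: float        # interval length, seconds
    payload: str         # still-encoded histogram text

    def decode(self) -> bytes:
        """Decode on demand (stand-in: base64 only, no histogram parsing)."""
        return base64.b64decode(self.payload)


def scan(lines: Iterable[str], on_entry: Callable[[LogEntry], None]) -> None:
    """Call on_entry for every interval line; nothing is decoded here."""
    for raw in lines:
        line = raw.strip()
        if not line or line.startswith(("#", '"')):
            continue  # blank line, comment, or column legend
        tag = None
        if line.startswith("Tag="):
            tag_field, line = line.split(",", 1)
            tag = tag_field[len("Tag="):]
        start, length, _max_value, payload = line.split(",", 3)
        on_entry(LogEntry(tag, float(start), float(length), payload))
```

Building a tag set or filtering by tag then becomes a few lines in the caller, for example `tags = set(); scan(lines, lambda e: tags.add(e.tag))`, and only the entries the caller selects ever pay the decoding cost.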
Gil Tene
@giltene
Apr 26 2016 01:55
@mikeb01 : I'm certainly planning on bumping the log version. And non-tagged log files should still be readable with old code. It's new log file versions that actually contain tagged lines that will present a problem for e.g. existing Java and C (and probably C#) code. I expect that the new code for those will be built to parse both tagged and untagged lines.
Michael Barker
@mikeb01
Apr 26 2016 01:56
As long as the version is bumped, my existing code will just return an error. It will fail, but with some degree of gracefulness...
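The "fail gracefully on a bumped version" behaviour could look roughly like the sketch below. It assumes the version is announced in a `#[Histogram log format version X.Y]` comment near the top of the file; the set of supported versions, the exception type, and the function name are hypothetical and chosen only for illustration.

```python
import re
from typing import Iterable, Optional, Tuple

VERSION_RE = re.compile(r"^#\[Histogram log format version (\d+)\.(\d+)\]")

# Hypothetical: the versions an "old" reader was written against.
SUPPORTED = {(1, 1), (1, 2)}


class UnsupportedLogVersion(ValueError):
    """Raised when the log declares a version this reader does not know."""


def check_version(lines: Iterable[str]) -> Optional[Tuple[int, int]]:
    """Return the declared (major, minor), or None if no header is found.

    Only the leading comment block is inspected; scanning stops at the
    first non-comment line.
    """
    for raw in lines:
        stripped = raw.strip()
        match = VERSION_RE.match(stripped)
        if match:
            version = (int(match.group(1)), int(match.group(2)))
            if version not in SUPPORTED:
                raise UnsupportedLogVersion(
                    "unsupported log format version %d.%d" % version
                )
            return version
        if stripped and not stripped.startswith("#"):
            break  # reached the legend or data without a version header
    return None
```

A reader that performs this check before parsing rejects a newer file with a clear error instead of misreading tagged lines as malformed data.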
Gil Tene
@giltene
Apr 26 2016 01:57
@mikeb01 : as for a new [log file] format that would be more extensible, I'm open to doing that too, but as an additional [major] step after this. @ahothan had some suggestions for that in the past, too. TLV or JSON-like are the two major options.
@mikeb01 : re: the tag api. I think a simple api for getting the list of tags in a file is needed regardless of adding a lambda-style api for advanced scanning.
Michael Barker
@mikeb01
Apr 26 2016 02:06
Maybe it is my mildly sadistic nature, but I would add the single scanning API and then see how it was used and what others asked for. Adding all of the features you mentioned earlier is straightforward on top of a scanning API.
Gil Tene
@giltene
Apr 26 2016 05:06
@mikeb01 We already have an application in mind as a driver for this: Cassandra-stress. And I already know that we'll want a log processor that will be able to report on a specific tag (the way does on the default tag right now), will be able to tell you what tags are in the file, and [probably] will produce multiple .hgrm report files, one for each tag in a hlog file.
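The per-tag reporting described above can be sketched as a single pass that buckets interval lines by tag and then derives one report name per bucket. Everything specific here is an assumption for illustration: the `Tag=<name>,` prefix, the `"default"` label for untagged lines, the `<base>.<tag>.hgrm` naming scheme, and the function names. No histogram decoding or percentile output is attempted.

```python
from collections import defaultdict
from typing import Dict, Iterable, List


def split_by_tag(lines: Iterable[str], default_tag: str = "default") -> Dict[str, List[str]]:
    """Group interval lines by tag in one pass, stripping the tag field."""
    groups: Dict[str, List[str]] = defaultdict(list)
    for raw in lines:
        line = raw.strip()
        if not line or line.startswith(("#", '"')):
            continue  # blank line, comment, or column legend
        if line.startswith("Tag="):
            tag_field, rest = line.split(",", 1)
            groups[tag_field[len("Tag="):]].append(rest)
        else:
            groups[default_tag].append(line)
    return dict(groups)


def report_names(base: str, tags: Iterable[str]) -> Dict[str, str]:
    """Map each tag to a hypothetical report file name, one .hgrm per tag."""
    return {tag: "%s.%s.hgrm" % (base, tag) for tag in tags}
```

Calling `report_names("run1", split_by_tag(lines))` gives the list of tags in the file and the report each one would feed, at the cost of holding every interval line in memory until the pass completes.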