These are chat archives for HdrHistogram/HdrHistogram

26th
Sep 2017
Marshall Pierce
@marshallpierce
Sep 26 2017 14:26
Also, it sounds like line wraps in JSON won't be an issue: the spec requires that they be encoded as "\n" rather than being left as a literal ASCII 0x0A. IMO the spec is unclear on this, but in practice implementations do perform this replacement.
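For illustration, Python's json module performs exactly this replacement when serializing a string (a quick sketch; the multi-line payload here is made up):

```python
import json

# A made-up multi-line payload, standing in for histogram log content
# embedded inside a JSON string value.
payload = "line one\nline two"

encoded = json.dumps(payload)
# The literal 0x0A byte is escaped as the two characters '\' and 'n',
# so the JSON document itself contains no raw newline.
assert "\n" not in encoded
print(encoded)
```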
Alec
@ahothan
Sep 26 2017 15:38
@giltene we can certainly piggyback (again) on the histogram log's metacomments by adding the histogram content description. In that case there is no need to create a binary JSON file and the associated sidecar.
The only downside is that the metacomment section in the log file becomes more and more complex, with a mix of different nested syntaxes (plain text, dates, now JSON), which looks a bit unruly.
Note that nesting a large JSON dict in a large comment might not be the best / most readable format; perhaps YAML would be more appropriate.
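To make the concern concrete, a log header along these lines would mix several syntaxes in one comment block (the #[HistogramContent ...] tag and its JSON body are hypothetical, invented here purely for illustration; the other lines follow the existing log format):

```
#[StartTime: 1506440194.000 (seconds since epoch), Tue Sep 26 15:38:00 PDT 2017]
#[BaseTime: 0.0 (seconds since epoch)]
#[HistogramContent: {"description": "request latency", "unit": "ms"}]
"StartTimestamp","Interval_Length","Interval_Max","Interval_Compressed_Histogram"
```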
Marshall Pierce
@marshallpierce
Sep 26 2017 16:31
I think anything we jam into a comment in the log format isn't going to be conveniently readable by humans, but I do share the concern that we may be stretching the log format a bit far. For use as a source of test comparison data, the parser would have to be structured so that it emits data not normally needed for consuming a log, like comments. Naturally, the existing parser in the Java impl, for instance, ignores comments other than the magic #[StartTime, etc., but now it would have to expose a hypothetical #[BagOfOtherStuff foobarbaz]. That would mean either a separate parser that is mostly the same as the normal parser, or extending the parser to produce a stream of events from which one could reproduce the original file, or, in the testing case, extract the test-related data.
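As a rough sketch of what such a "stream of events" could look like (the function name, event tuples, and regex here are hypothetical, not part of any HdrHistogram implementation):

```python
import re

# Hypothetical event-stream parser sketch: instead of silently skipping
# comments, emit every line as a typed event so callers that need
# metacomments (e.g. a #[BagOfOtherStuff foobarbaz] tag) can see them.
META_RE = re.compile(r"^#\[(\w+)[:\s]\s*(.*)\]$")

def log_events(lines):
    for line in lines:
        m = META_RE.match(line)
        if m:
            yield ("meta", m.group(1), m.group(2))   # #[Key: value] metacomment
        elif line.startswith("#"):
            yield ("comment", line)                  # free-form comment
        else:
            yield ("data", line)                     # CSV header or interval row

events = list(log_events([
    "#[StartTime: 1506440194.000 (seconds since epoch)]",
    "# free-form comment",
    "1.0,1.0,3.5,HISTFAKEBASE64==",
]))
```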
So, I think being able to define canonical expected values for both single histograms and for logs (and the extra data in a log, like timestamps and tags) is great, but I think that doing it all in one file is not so great: it isn't a great fit for what the log file currently does and how it's parsed, and it gets us back to having to rewrite input files to be amenable to test use, which I think is inherently less desirable than leaving input data pristine and providing separate "this is the answer you should have gotten" data.
Marshall Pierce
@marshallpierce
Sep 26 2017 16:37
Granted, my inclinations here are probably colored by my experience working with things like Chromium's corpus of X509 test data (various forms of bogus certs, etc.) that was stored in a custom PEM-flavored format requiring assembly of base64 snippets from here and there. It was inconvenient even to read, and it made me nervous -- was a parsing error caused by an expected failure, or by me accidentally fat-fingering a base64 copy-n-paste? So, I prefer untouched input files: you commit them as-is and can then be confident that nobody has mucked with them, rather than having to look through diffs that rewrite test metadata to be sure they don't also modify the input data.
Given that log parsing is also something I would like to test concretely, I'd prefer not to have to parse the log to extract the data that would let me test parsing the log.
Alec
@ahothan
Sep 26 2017 17:47
I tend to agree on the overloading of the log file (when will it stop? ;-))
I already tried to steer @giltene toward a more amenable (and more modern) format - to no avail. Nothing against the good ol' log format, but at some point it becomes a bit of a stretch to put so many odd-looking and mismatched things in the comments.
And I did not mean that these additional metacomments would be in every log; they would be optional and present only in logs that serve as validation tests (of course the parser will have to support the optional pieces nonetheless).
and please do not bring X509 and ASN.1 into the picture ;-)
Marshall Pierce
@marshallpierce
Sep 26 2017 17:55
I will not suggest ASN.1 for anything. :)