Hey all, based on our discussions, I've started a document for housing agent evaluations and debating concerns for logging/telemetry agents.
A basic summary on our experiences with logging agents:

* fluentd: Too slow, couldn't handle our volume at any reasonable resource usage
* fluentbit: Buggy. They've since fixed their broken JSON handling but there's probably other issues
* filebeat: Doesn't handle rotation and deletion properly

All but filebeat also had the issue where they would buffer internally if the sink went down, rather than backpressuring to the source. i.e. if you're reading from a file, but you can't send the data right now, you read from the file and put it in a new file instead of just...waiting before reading more from the file.

IMO agents should be as simple as possible and do the bare minimum to get the data off the node into somewhere more robust (e.g. message queue). Then any transforms or extra cross-referencing should be a separate processing step.
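To make the backpressure point concrete, here is a minimal Go sketch of "just wait before reading more". It is not how any of the agents above are implemented. The `Sink` interface, `flakySink`, `forward`, and `demo` are all hypothetical names made up for illustration, and a real agent would sleep with backoff in `retry` rather than bump a counter.

```go
package main

import (
	"bufio"
	"errors"
	"fmt"
	"io"
	"strings"
)

// Sink is a hypothetical destination, e.g. a message queue client.
type Sink interface {
	// Send returns an error while the sink is unavailable.
	Send(line string) error
}

// forward ships src to sink line by line. It does not consume the next
// line until the sink has accepted the current one, so a sink outage
// stalls the reader instead of spilling data into a second file.
// (bufio.Scanner keeps one small, bounded in-memory buffer.)
func forward(src io.Reader, sink Sink, retry func()) (int, error) {
	scanner := bufio.NewScanner(src)
	sent := 0
	for scanner.Scan() {
		line := scanner.Text()
		for sink.Send(line) != nil {
			retry() // wait here; no further lines are consumed meanwhile
		}
		sent++
	}
	return sent, scanner.Err()
}

// flakySink is a hypothetical sink that rejects the first few sends.
type flakySink struct {
	failures int
	received []string
}

func (s *flakySink) Send(line string) error {
	if s.failures > 0 {
		s.failures--
		return errors.New("sink unavailable")
	}
	s.received = append(s.received, line)
	return nil
}

// demo forwards three lines through a sink that is down for two attempts.
func demo() (sent int, waits int, received []string) {
	sink := &flakySink{failures: 2}
	sent, _ = forward(strings.NewReader("a\nb\nc\n"), sink, func() { waits++ })
	return sent, waits, sink.received
}

func main() {
	sent, waits, received := demo()
	fmt.Printf("sent=%d waits=%d received=%v\n", sent, waits, received)
}
```

The source file is the only durable buffer here: if the sink stays down, the read position simply stops advancing, and rotation or deletion handling remains a separate problem.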
@tsloughter that reflects our evaluations as well. By the way, Brian Troutwine (creator of cernan) has been advising us on Vector. I'm not sure if you worked with him at Postmates:
- fluentbit: Buggy. They've since fixed their broken JSON handling but there's probably other issues
The JSON issues! For a little bit of context, and to not repeat it here, see:
Fluentbit reached 1.1 with those issues 👀.
That was fixed quite a while ago, so I suspect this experience is a bit old
@PettitWesley we are still able to produce JSON parsing issues, but I agree, the major ones are fixed!
"OpenTelemetry captures metrics, distributed traces, resource metadata, and logs (logging support is incubating now)" from https://opentelemetry.io/about/. Where can I find documentation on the incubation work being done? Is there an example of a Go implementation of this? Does OpenTelemetry act like a wrapper around `log` in the case of Go, or does it provide its own solution like