OpenTelemetry Metrics Office Hours in 10 minutes
When using OpenTelemetry together with a database (e.g. PostgreSQL), it would sometimes be extremely helpful to retrieve extended information about a problematic query. There are multiple alternatives:
The database uses OTel in its codebase and provides a configuration parameter for when to add extended information (e.g. with a 2s threshold like PostgreSQL's auto_explain, it could set the record span bit to 1 for all queries taking longer than 2s, and add an annotation/attribute/event with the EXPLAIN output to the span). Most queries would not have a span from the database side.
Always record a span in the database, and attach the EXPLAIN output if the query takes longer than auto_explain's 2s.
Some other mechanism?
Will there be other mechanisms that control when the spans are recorded?
I think this kind of functionality could make profiling (simple/most) database query problems easier :) Also, I guess there is a need for knobs to enable/disable certain traces when the application is not experiencing problems...
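To make the second alternative above concrete, here is a minimal sketch in plain Python. It is not the OpenTelemetry API and not anything PostgreSQL ships: `QuerySpan`, `run_traced_query`, the `execute`/`explain` callables, and the `db.explain_plan` attribute name are all hypothetical, chosen only for illustration. Every query gets a span, and the plan is attached only when the measured duration exceeds the threshold.

```python
import time
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, Tuple

# Hypothetical threshold, mirroring the 2s auto_explain example above.
SLOW_QUERY_THRESHOLD_S = 2.0


@dataclass
class QuerySpan:
    """Stand-in for a tracing span; NOT an OpenTelemetry class."""
    statement: str
    duration_s: float = 0.0
    attributes: Dict[str, str] = field(default_factory=dict)


def run_traced_query(
    statement: str,
    execute: Callable[[str], Any],
    explain: Callable[[str], str],
    threshold_s: float = SLOW_QUERY_THRESHOLD_S,
    clock: Callable[[], float] = time.monotonic,
) -> Tuple[Any, QuerySpan]:
    """Always record a span; attach the plan only for slow queries."""
    span = QuerySpan(statement=statement)
    start = clock()
    result = execute(statement)
    span.duration_s = clock() - start
    if span.duration_s > threshold_s:
        # "db.explain_plan" is an illustrative name, not an agreed convention.
        span.attributes["db.explain_plan"] = explain(statement)
    return result, span
```

The first alternative would differ mainly in that no span is created at all below the threshold, which keeps overhead down but loses the timing of ordinary queries.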
@meastp I would suggest that the solution is to have detailed and thorough semantic conventions for every common database. We’re actively working out what those conventions should be, so this is a good time to get involved.
Once we have specified a set of attributes for describing common databases, filtering rules for those attributes can be specified as well. Filtering can then be installed at various points in the telemetry system; for example, as span processor plugins for every language, as config options for collectors and sidecars, or as content policies for L7 network proxies.
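As a rough illustration of what attribute-based filtering rules could look like, here is a small sketch in plain Python. It does not use the OpenTelemetry SDK or its span processor interface; `FilterRule`, `keep_span`, and `filter_spans` are hypothetical helpers, and the attribute keys in the usage note are placeholders, since the conventions are still being worked out.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, Iterable, List

Attributes = Dict[str, Any]


@dataclass(frozen=True)
class FilterRule:
    """One rule: an attribute key plus a predicate over its value (hypothetical)."""
    key: str
    predicate: Callable[[Any], bool]

    def matches(self, attributes: Attributes) -> bool:
        # A rule never matches a span that lacks the attribute.
        return self.key in attributes and self.predicate(attributes[self.key])


def keep_span(attributes: Attributes, rules: Iterable[FilterRule]) -> bool:
    """Keep a span only if every rule matches its attributes."""
    return all(rule.matches(attributes) for rule in rules)


def filter_spans(spans: Iterable[Attributes], rules: List[FilterRule]) -> List[Attributes]:
    """Return the attribute sets of the spans that pass all rules."""
    return [attrs for attrs in spans if keep_span(attrs, rules)]
```

For example, rules such as `FilterRule("db.system", lambda v: v == "postgresql")` and `FilterRule("db.duration_ms", lambda v: v >= 2000)` would keep only slow PostgreSQL spans. The same rule set could in principle be evaluated in a language SDK, a collector, or a proxy, which is the point of expressing it over shared attribute names.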