Where communities thrive


  • Join over 1.5M+ people
  • Join over 100K+ communities
  • Free without limits
  • Create your own community
People
Activity
    Max Golovanov
    @maxgolov
    Ignore code coverage.
    We don't want to disable it - sometimes it gives good insights. But in most cases, esp. for small PRs - it gives false positives. We use our best judgement to ignore it when necessary.
    TomRoSystems
    @TomRoSystems
    Thanks. I've closed previous one and reopened new one - open-telemetry/opentelemetry-cpp#447 (please approve)
    Max Golovanov
    @maxgolov
    That works. Maybe std::lock_guard on top of entire method scope.
    Mark
    @MarkSeufert
    Has anyone had the circular_buffer_benchmark_smoketest fail within the Bazel Valgrind CI?? I don't see how it could be related to my PR, so I'm wondering if there's a randomness factor in this test that can trigger to failure once in a while.
    2 replies
    Lalit Kumar Bhasin
    @lalitb
    Hi All, Github Discussion forum is now enabled for opentelemetry-cpp repo: https://github.com/open-telemetry/opentelemetry-cpp/discussions. Feel free to use it for any discussions, ideas and Q & A on topics related to the project. More on Github Discussions ( still in Beta ) : https://docs.github.com/pt/free-pro-team@latest/discussions
    7 replies
    Lalit Kumar Bhasin
    @lalitb
    @/all fyi alpha/test pre-release opentelemetry-cpp v0.1.0 is now available on current commit hash: https://github.com/open-telemetry/opentelemetry-cpp/releases/tag/v0.1.0
    Lalit Kumar Bhasin
    @lalitb
    @/all In case someone missed, as stated in Meeting agenda notes - we don't have community meeting today.
    Max Golovanov
    @maxgolov
    How do we handle multitenancy in OpenTelemetry... maybe it's more of a spec question. If I need to obtain Tracer(iKey1) that is sending traces to Tenant1. And the Tracer(iKey2) that is sending traces to Tenant2. For example, one tenant is my component tenant. The other one is my customer's component tenant. Traces must flow to separate storage locations due to privacy considerations. Both tracers might be associated with the same exporter kind, same channel. BUT different destination / auth key. How do I achieve that with existing API? We have wild variety of options to load different tracing libraries/versions of the libraries on API surface. But where is the API that allows me to bind a Tracer to a given destination / instrumentation key... Looks like in Jaeger this is addressed by obtaining a Tracer from configuration API, which provides the necessary routing / binding. But this is not the case for our current API surface.... in other words, why would TracerProvider API be missing ability to associate the tracer with a tenant? Do we assume that TracerProvider is always single-tenant?
    Also it seems like node.js example implements different semantics for the parameter passed to TracerProvider, e.g. here the guys use example-basic-tracer-node as the parameter.. which does not seem to be a library name? I'd be actually happy if we adjust the meaning of the first GetTracer parameter to be whatever, implementation-dependent. Then a given TracerProvider factory may decide how to treat the first parameter: be that destination (such as auth key) or configuration or alias, that then allows to lookup specific tracer config details (destination, tenant, auth key, batching params, etc.) rather than limiting this to be a library name. What do you guys think?
    Max Golovanov
    @maxgolov
    This will be actually equivalent, in spirit, to how log4cxx is managing loggers , described here
    Josh Suereth
    @jsuereth
    AFAIK - In other Languages, you create multiple "Tracer" instances for each tenant and grab your traces from there
    the "globals" are for APIs to add "default" telemetry, but multi-tenenant users are expected to have to do their own thing (create their own instances and isolate the flows of telemetry)
    It's not really clear in the spec, now, but we had to do an investigation into it for similar reasons
    I think C++ doesn't support this, so maybe we should open an issue against C++? (Or one against the spec if we need more clarity here)
    Lalit Kumar Bhasin
    @lalitb
    We do have a issue against C++ though specific to logger for now : open-telemetry/opentelemetry-cpp#417. @jsuereth do you know which languages support it ? We can see the approach they follow.
    Josh Suereth
    @jsuereth
    The theory being you don't use the global version in multi-tenant use case
    Max Golovanov
    @maxgolov
    We don't have this addressed in C++ SDK at all right now. Here is the use-case and pseudo code:
    auto tracerProvider = GetMyTracerProvider();
    auto tracer1 = tracerProvider::GetTracer("com.acme.component1");
    auto tracer2 = tracerProvider::GetTracer("com.acme.component2");
    This is in spirit identical to log4j, log4cxx, jul, etc.. What it gives: all modules in a single process operate with one factory not multiple factories. Then each may obtain their own tracer. They all might as well use ONE instrumentation library/version, but they require different settings. Sampling, ingestion key, etc.
    And what I am suggesting is -- we relax the definition of the instrumenationName to be not just MUST be library_name (which is impractical for big customers who test their telemetry for months with given instrumentation library+version). But to MAY be library_name, or MAY be unique instrument name. Then that instrument name can be intelligently mapped by the TracerProvider factory itself to corresponding library/version/channel/config settings behind-the-scenes. Moving out the provisioning piece outside of code. ==> less code, easier to adjust instrumentation keys, easier to reconfigure the instrument to use a different library. In essence, a symlink.
    I logged an issue here to elaborate on what this means exactly here:
    open-telemetry/oteps#145
    What I think needs to happen, the spec should change the wording for MUST be library_name to MAY be library_name; or MAY be instrument name or alias to be looked up and mapped by the Tracer Provider implementation.
    Max Golovanov
    @maxgolov
    When I obtain the Tracer object, I'd like the TracerProvider factory to associate my object by-name => to given set of { library_name, library_version, library_options, channel_settings} (that comes in external / separate config rather than in code). And return me the provisioned Tracer instance that I can use. This would also fully satisfy the multi-tenancy requirement, assuming that each module (SDK-in-SDK) within the same app has their own unique instrumentation name or GUID. Later on we can group / chain the handlers for given instrument name, i.e. you can fork your telemetry easily to console and whatever exporter in that model. With one instrument name.
    Josh Suereth
    @jsuereth
    I think that's a reasonable change, may need to expand into some more details.
    E.g let's say you have your RPC / HTTP library annotated.. However you need to report metrics on those RPCs per-client/tennant and you may also want to send some telemetry to your own monitoring.
    How would you accomplish that today?
    Max Golovanov
    @maxgolov
    I think I got a bit confused by spec, but I sent an update / PR for the spec (Tigran already approved): open-telemetry/opentelemetry-specification#1327
    Once we say that the GetTracer API accepts a module / component name, then in the TracerProvider itself - we can introduce some form of configuration (that'd be very similar conceptually to log4j ), that may identify the channel/exporter/channel config settings/channel authorization-ingestion key, for a given component. I'll send out some example to elaborate on what I mean :)
    Josh Suereth
    @jsuereth
    Ah, right
    Max Golovanov
    @maxgolov
    Then each component ( source ) - via configuration of the TracerProvider - may identify what destination it is bound to. It is source --> configurable blurb with channel/settings/authkey -> destination.. One can even build a MetaTracerProvider / mock TracerProvider that way, that aggregates different providers (maybe living in different DLLs) into one.. It's like wiring/rewiring a source -- patch panel -- destination, in essence. This would work, as long as component names within the application are all unique. We do not want a component to accidentally obtain a tracer that belongs to a different component. That'd be bad.
    Josh Suereth
    @jsuereth
    So the collector already has a "router" exporter
    Max Golovanov
    @maxgolov
    Basically, I was thinking like a local tracer provider figuring out the channel.
    In some of our scenarios, we need to send using the same channel type, but different URL.
    Josh Suereth
    @jsuereth
    right
    I like the idea.
    Max Golovanov
    @maxgolov
    Now if our exporter knows how to work with a given URL, means the TracerProvider then would manage how to instantiate, say, two different instances of the same exporter kind - that will give corresponding Tracer bound to given URL_A, URL_B.
    Yeah.
    I think I was really stuck on that NOT statement in the spec, which we cleared up.
    Josh Suereth
    @jsuereth
    Interesting, I need to take time to look deeper after I'm done my PITA java bugs :)
    Also, FYI - something for CPP to think about. If we have some kind of network failure on an exporter/exporter thread, are we going to provide a mechanism to try to "scramble" and refresh our connections/servers on failure?
    OpenCensus has a P0 bug where certain exceptions cause all telemetry to fail to propogate
    Tom Tan
    @ThomsonTan
    @maxgolov is the suggestion about handling multi-tenancy in exporter?
    Max Golovanov
    @maxgolov
    @ThomsonTan - Yes. We can handle it in several ways. We either pre-populate some baggage on Tracer before returning it (baggage may contain tenant identifier deducted from tenant name), OR we associate the tracer with a separately configured exporter. Like, several exporters - differently configured with different destinations / auth-keys. It's a technical detail. We can discuss :)

    Yes, we can support multitenancy with existing Get Tracer API. No issues. I was a bit concerned about the original definition in the spec, as it implied that if I always obtain GetTracer("http-tracer","v1.0") ... And it seemed like we can't that way support multiple per-tenant instances of that..

    Now instead of that - we can do GetTracer("MyModule") , and that will be as-per spec after my PR is merged. Then the TracerProvider may decide where to obtain the tracer from. And even if it comes from standard http-tracer instrumentation library --- the meta-TracerProvider may append extras to baggage of that Tracer instance.. Or somehow otherwise identify that different tenants have different URLs or different auth keys. And different tenants / modules may obtain differently provisioned instances of a tracer with the same channel/exporter kind.

    Max Golovanov
    @maxgolov

    Josh,

    If we have some kind of network failure on an exporter/exporter thread..

    I think there could be several ways to handle it. What I think we are missing currently is async model to propagate the failure code back to upper layer. Since a lot of it is async, we could have a model where we notify the upper layer via registered error handler callback.. This may run in SDK thread, like ISR, with limitations that customers should not do silly things in that callback. But at least they'd know if something was failed to be delivered..... Retry policy is a good discussion for anything HTTP / gRPC, it's like we end up maintaining a separate worker thread for retries. Sometimes though the HTTP stack itself (like WinInet) maintains the actual network transport thread. And only some periodic callbacks need to be invoked from that 'foreign' thread. In that model the retries are orchestrated by SDK. But the actual thread that is doing upload work is handled by HTTP stack. I think Apple HTTP stack would operate nearly the same way.

    Josh Suereth
    @jsuereth
    Exactly. We need a channel to be able to handle this, and I'm not sure "status" on Export is enough.
    Max Golovanov
    @maxgolov

    We do have some logic to contribute. But it's in a different codebase (finally OSS). We'd need to talk thru the better practices, how we can shape it in the exporter in OT SDK. I don't 100% like it how we handled it in the other SDK.. But we can borrow all HTTP clients from there now that it is OSS:
    https://github.com/microsoft/cpp_client_telemetry/tree/master/lib/http

    The clients are designed to pass back the status failure code / indication. It'd be still up to Exporter logic, e.g. be that Elastic, or any other exporter that has to deal with HTTP retries. How we handled it elsewhere (I don't 100% like, again, how it was done) - we had a separate, kinda abstract TPM component that was responsible for the exponential back-off policy with configurable retries:
    https://github.com/microsoft/cpp_client_telemetry/blob/master/lib/tpm/TransmissionPolicyManager.cpp

    I don't think it'd be reasonable to borrow this entire logic from MS code as-is. Fundamentals are good. But we can code it much nicer in OpenTelemetry. Like, NOT with raw pointers, but only using smart pointers; following best practices of modern C++. I can try to do a write-up to recap the thoughts, so that we can do the clean room reimplementation of the exponential back-off, considering all possible conditions (HTTP failures, SSL failures, HTTP connection reset in the middle, HTTP post body successful but connection reset on end i.e. no status code returned, and what HTTP status codes may be retryable and what not). It's a broad topic though.. Then there's Retry-After: there would be something for the STANDARD OTLP collector. But Elastic and other vendor collectors may have their own policy / throttling / DDoS protections. I am assuming we are designing this to operate at tremendous scale. :) (sorry for too many words).

    And, unfortunately, none of what we have applies to HTTP/2 :( .. I am sure we do need HTTP/2
    Max Golovanov
    @maxgolov
    There's a glitch on GitHub Ideas that first post shows 404, as if it didn't succeed. Then it posts the same idea twice :grin: .. I'd like to collect some feedback on this idea:
    https://github.com/open-telemetry/opentelemetry-cpp/discussions/511
    I hit this one while instrumenting something with ETW exporter. It is a bit inconvenient to have non-owning maps for string attributes on API. We could have made it owning, with a string (or C-string copy) across the ABI boundary in a safe manner. Extra copy, but nicer usage pattern if you have to deal with temporary strings returned by various utility functions. At least that's the pattern we used elsewhere. I provided a reference implementation.