Joshua MacDonald
@jmacd
There was a brief mention of sampling in today's Spec SIG call. @eyjohn raised a point about whether sampling decisions are re-evaluated when span attributes change, I believe.
@eyjohn Can you restate your question / concern?
Joshua MacDonald
@jmacd
@lizthegrey You're on record as caring about sampling too. Would you summarize your concerns?
Liz Fong-Jones
@lizthegrey
oh yes, I care a great deal about sampling
tl;dr my concern is that overloading traceid to mean sampling rate as well is too imprecise
Joshua MacDonald
@jmacd
I believe you're calling for propagating the sampling rate of a trace
Liz Fong-Jones
@lizthegrey
and therefore yes, we need to propagate the sampling rate of a trace and its spans
I know Bogdan's previous approach was to say if it starts with 16 0s and then has 16 non-zeroes, it is a 1 / 2**16 trace, etc
but I'd rather be explicit about what the sample rate is
and if you already have a sampling probability of 1/N from upstream and then you further downsample by 1/2, then you need the sampling rate to be 2N for those downstream spans, if that makes sense :)
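That arithmetic can be sketched in a few lines. This is a minimal illustration (the function name is ours, not from any SDK), treating the propagated "sampling rate" as an inverse probability, i.e. 1 in N spans kept:

```python
def combine_sampling_rates(upstream_rate: int, local_rate: int) -> int:
    """Combine an upstream inverse probability (1 in upstream_rate kept)
    with a further local downsampling step (1 in local_rate of those kept).
    Keeping 1/N of traces and then 1/2 of those keeps 1/(2N) overall."""
    return upstream_rate * local_rate

# Upstream kept 1 in N = 10 traces; we further keep 1 in 2,
# so downstream spans must carry rate 2N = 20.
assert combine_sampling_rates(10, 2) == 20
```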
Liz Fong-Jones
@lizthegrey
and straightforward probability samplers are obvious (at 1/N, one kept span stands in for N spans)
but for dynamic sampling, the probability may vary depending upon the fields of the span etc.
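The estimation Liz is describing can be sketched as follows: each kept span carries its own (possibly dynamic, per-endpoint) rate, and summing the rates approximates the true total. Field names here are illustrative, not a standardized schema:

```python
def estimate_total(sampled_spans) -> int:
    """Each kept span with inverse probability (sampling rate) N stands in
    for N spans, so the estimated true total is the sum of the rates."""
    return sum(span["sampling_rate"] for span in sampled_spans)

# With dynamic sampling the rate can vary per span (e.g. by endpoint):
spans = [
    {"endpoint": "/health", "sampling_rate": 100},    # heavily sampled
    {"endpoint": "/checkout", "sampling_rate": 1},    # always kept
    {"endpoint": "/health", "sampling_rate": 100},
]
assert estimate_total(spans) == 201
```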
Joshua MacDonald
@jmacd
Some have said we should use tracestate for this. Are you hoping for a standard form of baggage?
Liz Fong-Jones
@lizthegrey
e.g. that if you're sampling based on endpoint
yes, that's precisely correct, we need a standard baggage field
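What such a propagated field could look like can be sketched against the W3C tracestate format. The key name `ot` and the `rate:` prefix below are hypothetical -- the thread is precisely about standardizing such a field, and no name had been agreed:

```python
def with_sampling_rate(tracestate: str, rate: int) -> str:
    """Prepend a tracestate entry carrying the inverse sampling probability.
    The "ot=rate:N" encoding is illustrative, not a standard."""
    entry = f"ot=rate:{rate}"
    return entry if not tracestate else f"{entry},{tracestate}"

def read_sampling_rate(tracestate: str):
    """Recover the propagated rate from a tracestate string, if present."""
    for member in tracestate.split(","):
        key, _, value = member.strip().partition("=")
        if key == "ot" and value.startswith("rate:"):
            return int(value[len("rate:"):])
    return None

ts = with_sampling_rate("vendor=abc", 16)
assert read_sampling_rate(ts) == 16
```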
Joshua MacDonald
@jmacd
OK, I would agree to that. :)
Liz Fong-Jones
@lizthegrey
so that all OTel SDKs understand how to read the sampling field even if it was generated by a different language SDK etc
I was putting off discussing it as long as we were still mired in 0.3 land
but for 0.4 I'd like to see us do it
essentially I think a lot of the disconnect here is that vendors with their own metrics systems may not care that much about sampling precision, whereas those of us who sample but report the multiplied-out sample totals wind up needing it to approximate the total number of RPCs etc
Joshua MacDonald
@jmacd

I believe there could be an argument over interpretation. Although it's a mouthful, I think using the term "inverse probability" is helpful. I'm also in favor of calling it a lower bound--where a lower bound on inverse probability equates to an upper bound on probability. It's saying that "at the time of Extract on a context, we believed the sampling rate, a.k.a. inverse probability, was no less than the indicated value."

I say this because some sampling schemes are a bit speculative about what is kept--I'm thinking of reservoir sampling approaches.

I have a second concern about sampling, which has to do with several loose ends in the Span API:
• how can a caller tell whether a span is a no-op, or shall we recommend a lazy interface for any kind of deferment?
• shall "UpdateName" be a special case?
• is the Sampler required to re-evaluate its decision when new attributes are set?
Joshua MacDonald
@jmacd

My position is (1) that callers ought to be able to tell whether a span operation will have no effect w/o a lazy interface, (2) UpdateName should not exist, SetName is OK, (3) Sampler should be considered a "head" sampler.

The Sampler decision informs whether a SpanData will be built and processed. The span processors can all implement their own sampling designs after the decision is made to build a SpanData, and these will each be recorded with different sampling rates. It's in this setting that I consider the propagated sampling rate to be a lower bound--it's the result of a head-sampling decision to build a span or trace based on the initial conditions, whereas the span or trace could eventually be recorded with a higher sampling rate if it survives (through random chance) some sort of selection process.

To firm this up, I'm suggesting that the default SDK should implement a head sampler, one that does not re-evaluate sampling decisions. The span processors and otel collector can implement tail sampling, and we can propagate a lower bound of sampling rate. The propagated lower bound value helps us limit the volume of trace data collection, whereas actual sampling rates are likely to be computed in the span processors, not in the Sampler.
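A minimal sketch of that head-sampler design, assuming illustrative class and method names (not the actual OpenTelemetry SDK API):

```python
import random

class HeadSampler:
    """A 'head' sampler: decides once, from initial conditions only."""
    def __init__(self, rate: int):
        self.rate = rate  # inverse probability: keep 1 in `rate`

    def should_sample(self, trace_id: str, name: str):
        # Returns (sampled, propagated lower-bound rate). Tail stages may
        # later record the span at a higher rate, never a lower one.
        return random.randrange(self.rate) == 0, self.rate

class Span:
    def __init__(self, sampler: HeadSampler, trace_id: str, name: str):
        self.sampled, self.rate = sampler.should_sample(trace_id, name)
        self.attributes = {}

    def set_attribute(self, key: str, value) -> None:
        # Setting attributes never re-triggers the sampling decision;
        # callers can inspect `span.sampled` to tell a no-op span apart.
        if self.sampled:
            self.attributes[key] = value

span = Span(HeadSampler(rate=1), trace_id="abc", name="GET /")
span.set_attribute("endpoint", "/checkout")
assert span.sampled and span.rate == 1
```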
Liz Fong-Jones
@lizthegrey
+++ yes, love it
indeed, this is for head-sampling only.
and we can do tail sampling later in collector, processors, or in your satellites/our refinery/etc.
but you have to start somewhere to start cutting the bulk down
glad we're violently in agreement here
Joshua MacDonald
@jmacd
((( If you let me talk much longer on this topic, we'll come to the paper "Magic Sets and Other Strange Ways to Implement Logic Programs" (SIGMOD, 1986) and it will be a great digression )))
Evgeny Yakimov
@eyjohn

@jmacd regarding my earlier call-out, it was less to do with sampling decisions and more about the ability to call addLink after span creation, which I understood was removed due to sampling-related concerns.

Having said that, I do indeed have some views on sampling, so I'm happy to chip in some of my thoughts:

Some characteristics of samplers (in-house ones) that I have found useful are:

1. Ability to influence sampling from application code (i.e. the application code can force sample)