Nothing Is Broken. That's the Problem.

Your dashboards are green. Average response time is 40 milliseconds, same as last quarter. And yet support tickets are up, a few enterprise accounts are “evaluating alternatives,” and your best engineers keep getting paged at 2am for something they can never quite reproduce by morning.

No alert fired. No error budget burned. The average is hiding the only latency that matters.

Why the average lies

In a modern system, a single user request fans out into dozens of internal calls: services, caches, queues, a database or two. When those calls run in parallel, the user waits for the slowest one; when they run in sequence, every slow step adds to the total. Either way, the experience isn’t set by your typical request; it’s set by your worst few.

This is why p99, the latency that your slowest 1% of requests meet or exceed, behaves so differently from the average. And it gets worse with scale, not better. The math is unforgiving: the more services sit on the critical path of a request, the higher the odds that at least one of them is having a slow moment. If each call independently has a 1% chance of being slow, a request that makes 100 of them hits at least one slow call about 63% of the time (1 - 0.99^100). Recent field analysis (InfoQ’s writeup on adaptive hedged requests) put it plainly: stragglers, not failures, drive your p99. Nothing has to break for your tail to blow out; one service just has to hesitate.
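The fan-out arithmetic is easy to check for yourself. The sketch below assumes every call is independent and has the same chance of being slow, which real systems only approximate; the function name and the 1% figure are illustrative, not taken from any particular system.

```python
def p_any_slow(n_calls: int, p_slow: float) -> float:
    """Probability that at least one of n independent calls is slow."""
    return 1.0 - (1.0 - p_slow) ** n_calls


# Assumption: each call has a 1% chance of landing in its own slow tail.
for n in (1, 10, 50, 100):
    share = p_any_slow(n, 0.01)
    print(f"{n:>3} calls per request: {share:.1%} of requests hit a slow call")
```

A 1% problem at the level of a single service becomes a majority experience at the level of the request once enough services are involved.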

There’s a measurement trap underneath this that catches a lot of teams: you cannot average percentiles across machines. A “p99” that was computed by averaging the p99 of each node is a number with no real meaning. The honest way is to aggregate raw histograms and compute the percentile once, globally. If your team can’t tell you how their p99 is calculated, treat the number with suspicion.
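To make the trap concrete, here is a minimal sketch with two hypothetical nodes: a quiet one that is healthy and a busy one where 5% of requests are slow. The bucket boundaries, traffic numbers, and function names are all invented for illustration; production systems would use their metrics library's histogram type rather than a hand-rolled one.

```python
from collections import Counter

# Assumed bucket upper bounds in milliseconds. Every node must share them,
# otherwise the histograms cannot be merged.
BUCKETS_MS = [5, 10, 25, 50, 100, 250, 500, 1000]


def to_histogram(latencies_ms):
    """Count samples per bucket (smallest upper bound that fits the sample)."""
    hist = Counter()
    for x in latencies_ms:
        bound = next((b for b in BUCKETS_MS if x <= b), float("inf"))
        hist[bound] += 1
    return hist


def percentile(hist, q):
    """Upper bound of the bucket that contains the q-th quantile."""
    total = sum(hist.values())
    if total == 0:
        raise ValueError("empty histogram")
    seen = 0
    for bound in sorted(hist):
        seen += hist[bound]
        if seen >= q * total:
            return bound


# Made-up traffic: the quiet node serves 100 fast requests,
# the busy node serves 9,900 requests of which 5% are slow.
quiet = to_histogram([8] * 100)
busy = to_histogram([8] * 9405 + [400] * 495)

# Wrong: average the per-node p99s, ignoring how much traffic each node saw.
averaged_p99 = (percentile(quiet, 0.99) + percentile(busy, 0.99)) / 2

# Right: merge the histograms, then compute the percentile once.
global_p99 = percentile(quiet + busy, 0.99)

print(f"averaged per-node p99: {averaged_p99} ms, global p99: {global_p99} ms")
```

With these made-up numbers the averaged figure is 255 ms, while the p99 of the merged histogram falls in the 500 ms bucket. The averaged number understates the tail by half because it gives the quiet node the same weight as the node carrying 99% of the traffic.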

What the field is actually doing about it

Three shifts are worth a leader’s attention right now.

Histograms over averages. The current generation of metrics tooling (Prometheus and the OpenTelemetry standard, among others) is built around latency distributions, not single numbers. The point isn’t the tooling — it’s the mindset shift from “what’s our latency” to “what does the shape of our latency look like, and how fat is the tail.”

Exemplars — closing the gap from symptom to cause. The frustrating part of tail latency has always been reproducing it. Exemplars are a now-mainstream technique that links a spike on a latency chart directly to the actual trace of a real slow request. You stop guessing which dependency stalled; you click through to the one that did. This is the single biggest reduction in “we couldn’t reproduce it” that most teams can buy with configuration rather than a rewrite.

Designing the tail down. On the architecture side, adaptive hedged requests (sending a backup request only when the first is running long) are a live 2026 example: one open-source implementation, modeled on Google’s original “Tail at Scale” paper, took a team’s p99 from 64ms to 17ms, a cut of roughly 73%, for about 9% added load. It’s a reminder that the tail is an engineering target you can attack, not weather you have to endure.
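For readers who want to see the mechanism, here is a minimal sketch of a hedged request using Python's asyncio. It is not the implementation referenced above: it uses a fixed hedge delay, whereas an adaptive version would typically derive that delay from a recent latency percentile. The helper names (`hedged`, `make_backend`) and the timings are hypothetical.

```python
import asyncio


async def hedged(make_request, hedge_after_s: float):
    """Start one request; if it is still running after hedge_after_s,
    start a single backup and return whichever finishes first.
    Sketch only: an exception from the first finisher is re-raised as-is."""
    primary = asyncio.create_task(make_request())
    done, _ = await asyncio.wait({primary}, timeout=hedge_after_s)
    if done:
        return primary.result()  # fast path: no extra load was added

    backup = asyncio.create_task(make_request())
    done, pending = await asyncio.wait(
        {primary, backup}, return_when=asyncio.FIRST_COMPLETED
    )
    for task in pending:
        task.cancel()  # drop the straggler so it stops consuming resources
    return done.pop().result()


def make_backend(delays_s):
    """Hypothetical backend: the i-th call sleeps delays_s[i-1] seconds
    and returns its attempt number."""
    attempts = iter(enumerate(delays_s, start=1))

    async def call():
        attempt, delay = next(attempts)
        await asyncio.sleep(delay)
        return attempt

    return call


# Simulated straggler: the first attempt stalls for 500 ms, the backup takes 10 ms.
winner = asyncio.run(hedged(make_backend([0.5, 0.01]), hedge_after_s=0.05))
print(f"answered by attempt {winner}")
```

In this simulation the backup answers and the stalled primary is cancelled. The design choice that keeps the added load small is the delay: because the backup fires only for requests that are already running long, most requests never trigger one.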

The catch: visibility into the tail isn’t free. Telemetry volumes have grown several-fold in recent years, and chasing every outlier with full-fidelity tracing is how teams end up with observability budgets that force finance into the conversation. The current discipline is intelligent sampling — keep the slow and the erroring traces, sample the boring ones — so you can see the tail without paying to store the entire haystack.
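The sampling rule described above can be sketched in a few lines. This is an illustration of the idea, not any vendor's or collector's actual configuration: the 250 ms threshold, the 1% baseline rate, and the function name are assumptions chosen for the example.

```python
import random


def keep_trace(duration_ms, is_error, slow_threshold_ms=250.0,
               baseline_rate=0.01, rng=random):
    """Decide, once a trace has completed, whether to store it.

    Always keep errors and slow traces; keep only a small random
    share of the fast, successful ones.
    """
    if is_error or duration_ms >= slow_threshold_ms:
        return True
    return rng.random() < baseline_rate


# Made-up workload: 10,000 fast successes, 100 slow requests, 20 errors.
rng = random.Random(7)  # seeded so the run is repeatable
traces = ([(12.0, False)] * 10_000) + ([(900.0, False)] * 100) + ([(15.0, True)] * 20)
kept = sum(keep_trace(d, e, rng=rng) for d, e in traces)
print(f"stored {kept} of {len(traces)} traces")
```

Every slow or failed trace survives, while storage for the uneventful majority drops by roughly two orders of magnitude. Note that the decision is made after the request finishes, which is what lets it depend on the observed duration.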

What to actually do

Ban the average from your SLOs. Hold your services to p99 (and p99.9 where revenue or reputation is on the line). Track p95 too — it’s less noisy and tells you about the broad middle.
Ask one question in your next review: “How is our p99 computed, and can we click from a latency spike to the trace that caused it?” The answer tells you whether your team can actually diagnose the tail or is flying on vanity numbers.
Budget the tail and its cost together. Tail visibility and observability spend are the same conversation. Set a telemetry budget, sample deliberately, and treat “keep everything” as the expensive default it is — not a strategy.
Treat build-vs-buy as a tail question. The hard part isn’t collecting metrics; it’s correlating the tail across services without drowning in cost. That’s where most teams under-invest and overspend at the same time.

The slowest 1% of requests is where churn hides, where on-call burns out, and where your largest customers form their opinion of your reliability. Your average will never show it to you.

The tail isn’t an edge case. For the customer who hits it, it’s the whole experience.
