rethink o11y

Observability is not a dashboard collection hobby.

Last updated · 2026-04-12

The wrong question

You have infrastructure. Not observability.

Infrastructure entitles you to nothing. Not to understanding. Not to root cause. Not to a dashboard that tells you what's wrong.

If you want answers, earn them.

Metrics, logs, traces — a vendor taxonomy, not engineering.

Three storage formats. They say nothing about whether you understand your system.

There are teams with all three pillars who can't answer "why is this slow now?"

One engineer with kubectl top, tcpdump, and two well-placed log lines finds it in minutes.

200 imported panels nobody reads until an incident — then nobody knows which panel matters.

A dashboard you didn't design is one you don't understand.

Every panel should answer a specific question.

If you can't articulate the question, delete the panel. You will not miss it.
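
One way to hold yourself to that rule is to audit the dashboard JSON. This is a minimal sketch, not a standard tool. It assumes the question each panel answers is written in the panel's `description` field, which is a convention chosen here for illustration. The function name and the example dashboard are hypothetical. The `panels`, `type`, `title`, and `description` keys follow Grafana's dashboard JSON model.

```python
import json


def panels_without_question(dashboard: dict) -> list[str]:
    """Return titles of panels whose description (the question) is empty."""
    missing = []
    stack = list(dashboard.get("panels", []))
    while stack:
        panel = stack.pop()
        # Collapsed row panels keep their children under their own "panels" key.
        stack.extend(panel.get("panels", []))
        if panel.get("type") == "row":
            continue  # rows are layout, not questions
        if not (panel.get("description") or "").strip():
            missing.append(panel.get("title", "<untitled>"))
    return sorted(missing)


# Hypothetical dashboard, trimmed to the fields the audit reads.
example = json.loads("""
{
  "title": "checkout-service",
  "panels": [
    {"type": "timeseries", "title": "p99 latency",
     "description": "Is checkout slower than it was an hour ago?"},
    {"type": "timeseries", "title": "CPU by pod"},
    {"type": "row", "title": "Imported", "panels": [
      {"type": "gauge", "title": "Goroutines", "description": "  "}
    ]}
  ]
}
""")

print(panels_without_question(example))  # ['CPU by pod', 'Goroutines']
```

Every title it prints is a candidate for deletion, or for a question you have not written down yet.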

You chose alert fatigue: arbitrary thresholds, copied rules, snooze instead of delete.

An alert that needs no action isn't an alert — it's spam.

I've seen teams cut alerting rules from 400 to 15 and improve incident response.

Not despite the reduction — because of it.
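
A rough way to find the spam is to list every alert that names no action. The sketch below assumes "has an action" means "carries a `runbook_url` annotation". That is a common convention, not something Prometheus requires, so substitute whatever your team uses. The function name, rule names, and URL are hypothetical. The `groups`, `rules`, `alert`, `record`, and `annotations` keys follow the Prometheus rule file format, shown here already parsed into a dict.

```python
def alerts_without_action(rule_file: dict, key: str = "runbook_url") -> list[str]:
    """Return names of alerting rules that lack the given annotation."""
    flagged = []
    for group in rule_file.get("groups", []):
        for rule in group.get("rules", []):
            if "alert" not in rule:
                continue  # recording rules page nobody
            annotations = rule.get("annotations") or {}
            if not str(annotations.get(key, "")).strip():
                flagged.append(rule["alert"])
    return flagged


# Hypothetical rule file, as it would look after YAML parsing.
rules = {
    "groups": [
        {
            "name": "checkout",
            "rules": [
                {
                    "record": "job:http_requests:rate5m",
                    "expr": "sum by (job) (rate(http_requests_total[5m]))",
                },
                {
                    "alert": "CheckoutErrorRateHigh",
                    "expr": 'job:http_requests:rate5m{job="checkout"} > 100',
                    "for": "10m",
                    "annotations": {
                        "runbook_url": "https://runbooks.example.com/checkout-errors"
                    },
                },
                {
                    "alert": "HighCPU",
                    "expr": "rate(process_cpu_seconds_total[5m]) > 0.8",
                    "annotations": {"summary": "CPU is high"},
                },
            ],
        }
    ]
}

print(alerts_without_action(rules))  # ['HighCPU']
```

An alert on this list is not automatically spam, but someone should be able to say what the responder does when it fires. If nobody can, delete it.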

Every metric you never query is waste. Every log line never searched is waste. Every trace never followed is waste.

Some orgs spend more on o11y than on the infra they observe.
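
To put a number on the waste, compare what you collect with what anything actually queries. This is a crude sketch under stated assumptions: you already have the set of collected metric names and the list of PromQL expressions from your dashboards and alert rules. How you export those is up to you and is not shown. The function name and sample data are hypothetical. The tokenizer is a plain regex, not a PromQL parser, so a label or function name that happens to equal a metric name would count as a reference.

```python
import re

# Matches identifiers using the character set allowed in Prometheus metric names.
TOKEN = re.compile(r"[a-zA-Z_:][a-zA-Z0-9_:]*")


def never_queried(collected: set[str], expressions: list[str]) -> list[str]:
    """Return collected metric names that appear in none of the expressions."""
    referenced: set[str] = set()
    for expr in expressions:
        referenced.update(TOKEN.findall(expr))
    return sorted(collected - referenced)


# Hypothetical inventory: what is scraped versus what is asked.
collected = {
    "http_requests_total",
    "http_request_duration_seconds_bucket",
    "go_gc_duration_seconds",
    "process_open_fds",
}
expressions = [
    'sum by (job) (rate(http_requests_total{code=~"5.."}[5m]))',
    "histogram_quantile(0.99, sum by (le) "
    "(rate(http_request_duration_seconds_bucket[5m])))",
]

print(never_queried(collected, expressions))
# ['go_gc_duration_seconds', 'process_open_fds']
```

Treat the result as a list of questions, not a verdict. A metric nobody queries today may be the one you reach for in the next incident, but you should be able to say why you keep it.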

Not "what can we collect?"

"What do we need to know?"

Instrument for the answers. Stop there.

Prometheus. Grafana. OpenTelemetry. Built by people who owe you nothing.

They give you the ability to observe. Whether you do is on you.

If you can say what you need to know, you have the foundation for real observability.

If you can't, no tooling will save you.

Start understanding systems.