If your platform bill has a mystery line item, odds are it is logging.
Not because logging is bad. Because logging is easy to turn on and hard to turn down once people depend on it.
In Azure, a few clicks can start streaming diagnostics into Log Analytics, Application Insights, Sentinel, storage accounts, event hubs, and partner tools. That is great. It is also a budget decision made by whoever clicked the box.
Treat observability like architecture. Because it is.
TL;DR
· Logging spend is a design decision, not an ops surprise. If you did not choose it on purpose, you chose it by default.
· Three levers control most logging cost outcomes: what you collect, how long you keep it, and how you query it.
· The win is not 'log less'. The win is 'log the right things, for the right time, in the right place, with ownership'.
Why logging costs behave differently from 'normal' cloud costs
Compute and storage usually map to a workload. If the workload stops, the cost drops.
Logging costs keep moving even when the workload stays the same, because they are driven by volume, churn, and habits:
· Volume: how much data you emit and ingest.
· Churn: bursts, retries, noisy dependencies, and chatty libraries.
· Habits: dashboards with expensive queries, alerts that run every minute, and retention that never gets revisited.
That is why logging bills feel like the weather. They should not.
Lever 1: What you collect (volume control)
This is the blunt instrument. It also has the greatest impact when you do it early.
Think of logs as a product. Products have requirements. Requirements have scope. Scope has an owner.
Start with a simple question: what do we need to detect and prove?
· Detection: signals that tell you something is broken or drifting.
· Proof: evidence for audit, incident review, and postmortems.
Everything else is optional. Optional data should not default to 'forever, for everyone'.
Practical controls in Azure:
Be intentional with diagnostic settings. Many resource types offer multiple categories. Not all categories are equally valuable.
Use Data Collection Rules (DCRs) and transformations where applicable to shape incoming data. Filter out obvious noise before it reaches your primary workspace.
Sample high-volume traces. Keep full fidelity only where it changes decisions. Sampling is a design choice, not a failure.
Control cardinality. High-cardinality dimensions (user IDs, request IDs, full URLs, container names) can explode costs and make queries slow. Keep them, but use them surgically.
Create a 'logging budget' per workload. Not a finance budget. A data budget: max GB/day plus the signals you promise to deliver.
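To make the sampling point above concrete, here is a minimal sketch of deterministic, hash-based head sampling. The function names (`keep_trace`, `should_log`) and the 10% rate are my own illustrative choices, not an Azure or OpenTelemetry API; the point is that the decision is stable per trace ID, so a trace is never half-collected, and that errors bypass sampling entirely.

```python
import hashlib

def keep_trace(trace_id: str, sample_rate: float) -> bool:
    """Deterministic head sampling: the same trace ID always gets the
    same keep/drop decision, so a trace is never half-collected."""
    # Hash the ID into a stable fraction in [0, 1) and compare to the rate.
    digest = hashlib.sha256(trace_id.encode()).digest()
    bucket = int.from_bytes(digest[:8], "big") / 2**64
    return bucket < sample_rate

def should_log(trace_id: str, is_error: bool) -> bool:
    """Keep ~10% of routine traces; errors keep full fidelity."""
    return is_error or keep_trace(trace_id, 0.10)
```

Hashing rather than random sampling is the design choice here: every service that sees the same trace ID makes the same decision, which keeps distributed traces whole.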
Lever 2: How long you keep it (retention and tiering)
Retention is where good intentions quietly turn into recurring costs.
Most teams keep logs longer than they use them, because nobody wants to be the person who deleted 'the thing we needed'. That is a governance problem, not a tooling problem.
Practical retention moves in Azure:
Set workspace and table-level retention based on purpose. Hot troubleshooting data needs days or weeks. Audit evidence might need months. They do not need the same tier.
Separate 'hot' and 'cold' paths. Use Log Analytics for fast search and active detection. Use an archive or storage for long-term evidence and rare investigations.
Use separate workspaces when it makes ownership clear. One workspace for 'everything' usually means nobody owns the bill.
Document your retention rationale. The goal is not to justify cost. The goal is to make the decision repeatable when someone asks, 'Why 90 days?'
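One lightweight way to make the retention rationale repeatable is to keep the policy as data. This is an illustrative sketch, not Azure configuration: the purposes, field names, and day counts are example values I chose, and the real numbers belong to your workload owners. The useful property is that an undocumented purpose fails loudly instead of silently defaulting to 'keep forever'.

```python
# Illustrative retention policy per data purpose.
# The numbers are placeholders, not recommendations.
RETENTION_POLICY = {
    "troubleshooting": {"hot_days": 14, "archive_days": 0},
    "detection":       {"hot_days": 30, "archive_days": 90},
    "audit":           {"hot_days": 30, "archive_days": 365},
}

def retention_for(purpose: str) -> dict:
    """Look up the documented policy; unknown purposes raise
    instead of inheriting an implicit 'forever'."""
    if purpose not in RETENTION_POLICY:
        raise ValueError(f"No retention rationale recorded for {purpose!r}")
    return RETENTION_POLICY[purpose]
```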
Lever 3: How you query it (query behavior and detection design)
Even if ingestion is stable, query patterns can turn a reasonable spend into a runaway bill.
A platform can pay for data twice: once to store it, and again to scan it over and over.
Practical query and alert controls:
Treat alert rules like code. Review them. Version them. Retire them. An alert that only fires once a week rarely needs to evaluate every minute; it is a great candidate for a slower schedule or a narrower query.
Build 'cheap first' dashboards: start with summary tables and narrow time windows. Only expand when you need detail.
Use pre-aggregation where it makes sense. If every team runs the same heavy query, consider a shared function, materialized views where available, or a derived dataset that is cheaper to read.
Put guardrails around exploration. Ad hoc hunting is valuable during incidents. It should not be the default daily workflow for everyone.
Measure query cost and performance. If you cannot tell which queries are expensive, you cannot control them.
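The 'pay twice' math above is simple enough to sketch. This is an illustrative model, not a real Azure billing API: `RecurringQuery` and its fields are my own names, and the scanned-GB figures are placeholders. The point is that daily scan volume is frequency times scope, so a modest per-run query on a tight schedule can out-scan a heavy daily one.

```python
from dataclasses import dataclass

@dataclass
class RecurringQuery:
    name: str
    scanned_gb_per_run: float  # average data scanned per execution
    runs_per_day: int          # alert schedule or dashboard refresh rate

def daily_scan_gb(q: RecurringQuery) -> float:
    """Total data scanned per day: per-run scope times frequency."""
    return q.scanned_gb_per_run * q.runs_per_day

def top_offenders(queries: list[RecurringQuery], n: int = 3) -> list[RecurringQuery]:
    """Rank recurring queries by daily scan volume, worst first."""
    return sorted(queries, key=daily_scan_gb, reverse=True)[:n]
```

For example, an alert that scans 2 GB every minute scans 2,880 GB per day, more than a dashboard that scans 50 GB once an hour.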

A quick decision framework you can actually use
When someone asks, 'Should we turn on logging for this?', you do not need a meeting. You need a checklist:
1. What decision will this data change? (detect, diagnose, prove, or none)
2. Who owns it? (app team, platform team, security, vendor)
3. What is the data budget? (GB/day and peak expectations)
4. Where does it land? (workspace, Sentinel, storage, event hub, partner)
5. How long do we keep it in each place? (hot retention and cold retention)
6. What queries or alerts will run on it? (frequency and scope)
If you cannot answer these, you are not 'enabling logs'. You are creating a cost surface area.
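The checklist works as a gate precisely because every question must have an answer. As a sketch, assuming my own field names rather than any real intake form, it might look like this: a request with unanswered questions lists exactly what is missing.

```python
from dataclasses import dataclass, fields
from typing import Optional

@dataclass
class LoggingRequest:
    """One record per 'should we turn on logging?' request.
    Field names mirror the six checklist questions above."""
    decision_changed: Optional[str] = None     # detect, diagnose, prove, or none
    owner: Optional[str] = None                # app, platform, security, vendor
    gb_per_day_budget: Optional[float] = None  # data budget, including peaks
    destination: Optional[str] = None          # workspace, storage, event hub...
    hot_retention_days: Optional[int] = None
    cold_retention_days: Optional[int] = None
    query_plan: Optional[str] = None           # what will run on it, how often

def missing_answers(req: LoggingRequest) -> list[str]:
    """The gate: list every checklist question still unanswered."""
    return [f.name for f in fields(req) if getattr(req, f.name) is None]
```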
Ownership is the missing piece
Most logging pain is not technical. It is organizational.
Logging sits between teams: app owners emit signals, platform teams route and store them, security teams consume them, and finance teams pay for them.
So the bill lands in a shared bucket, and everyone shrugs.
Fix that by assigning an explicit owner for each of these:
· The schema owner: decides which fields matter and which are banned.
· The budget owner: accountable for the monthly trend and anomalies.
· The consumer owner: owns the alerts, dashboards, and detection quality.
· The change owner: approves changes to diagnostic settings, DCRs, and retention.
When ownership is clear, costs become explainable. That is the whole game.
What to do in the next 30 minutes
If you want a fast win without breaking anything, do this:
· Pick your top 1-2 workspaces by spend.
· List the top tables by ingestion and the top sources by category. You are looking for obvious noise, not perfection.
· Check retention settings. If anything is set to 'keep forever' without a reason, flag it.
· Find the top recurring alerts and dashboards. Identify anything scanning large time ranges frequently.
· Write down the 3 biggest opportunities and assign an owner for each.
That is enough to stop the bleeding and start a real observability operating model.

Close
Logging spend is not an accident. It is the sum of your architecture decisions, your defaults, and your habits.
The goal is not to be cheap. The goal is to be intentional.
If you can explain your logging costs in one paragraph, you are doing better than most teams.
If you cannot, start with the three levers. Then give the bill a real owner.