At a glance
If you cannot explain why Azure SQL costs moved, you do not have cost clarity. You have a bill.
Azure billing shows where money landed. It rarely shows who caused it.
You need a simple cost model: direct costs stay with apps; shared costs get split by a usage signal.
Start with an 80/20 split. Then tighten it with telemetry from Query Store and Azure Monitor.
Why this matters
SQL is one of the fastest ways for cloud costs to become a political issue. Platform teams get blamed for “expensive shared services.” App teams get blamed for “bad queries.” FinOps gets stuck in the middle, trying to explain a bill that does not map cleanly to ownership.
The real issue is not that SQL is expensive. It is that SQL is easy to share and hard to attribute. Elastic pools, managed instances, central monitoring, private endpoints, backups, and reservations all mix. The invoice looks clean. The accountability does not.
The three truths you must separate
Billing truth: what Azure billed and where it landed (subscription, resource group, meter).
Operational truth: what actually consumed the resources (CPU, I/O, storage, connections, query patterns).
Accountability truth: who should pay based on your operating model (shared platform vs app-specific use).
Cost clarity is the discipline of reconciling all three truths in a way that is repeatable, explainable, and fair enough that teams accept it. Not perfect. Accepted.
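Reconciling the three truths has one hard requirement: whatever you allocate must add back up to what was billed. A minimal sketch of that check, with illustrative numbers and team names (none of this comes from a real Azure export):

```python
def reconcile(billed: float, allocations: dict[str, float],
              tolerance: float = 0.01) -> bool:
    """Return True if per-team allocations sum back to the billed amount."""
    return abs(billed - sum(allocations.values())) <= tolerance


# Hypothetical month: what Azure billed for a shared pool vs who we charged.
pool_bill = 1200.00
allocated = {"app-a": 700.00, "app-b": 380.00, "platform-base": 120.00}

assert reconcile(pool_bill, allocated)  # accountability must sum to billing
```

If this assertion ever fails, your model is inventing or losing money, and teams will stop trusting the showback.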
Define “platform” vs “app” spend in plain language
Do not let this become a philosophical debate. Write definitions you can put in a chargeback policy and defend in a meeting.
Platform spend
Shared capacity that exists so multiple apps can run (elastic pool, shared managed instance, shared SQL IaaS cluster).
Baseline controls required for the environment (monitoring, security baselines, backups, private networking patterns).
Central operational overhead that is intentionally shared (automation, patching, platform tooling).
App spend
Dedicated database resources purchased for one app (single database, dedicated MI, dedicated SQL VM).
App-specific premium features or scaling choices (zone redundancy, higher tier, extra replicas, aggressive retention).
App-driven consumption inside shared capacity that can be measured and attributed (CPU, reads/writes, storage growth).
A practical spend map for Azure SQL
This table is a starting point. Adjust based on how you run SQL in your estate.
| Cost component | What Azure bills | Usually owned by | How to allocate |
| --- | --- | --- | --- |
| Elastic pool compute | Pool (not per DB) | Platform | Split by per-DB CPU time signal |
| Elastic pool storage | Pool storage | Platform | Split by per-DB data size or growth |
| Single DB compute | Database | App | Direct to app |
| Backup / long-term retention | Backup meters | Shared, but often app-driven | Direct if DB is dedicated; split if shared |
| Log Analytics + diagnostics | Workspace + ingestion | Platform | Split by volume per DB/app if you collect per DB |
| Defender for SQL / security tooling | Per-resource / per-server | Platform | Split evenly or by DB count if signals are weak |
| Private endpoints + DNS | NICs, zones, resolvers | Platform | Treat as platform unless you measure per app |
| Reserved instances / savings plans | Amortized discount | FinOps + platform | Allocate the discount back to the consumers |
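The spend map above is easy to encode as a lookup so your allocation job never guesses. A sketch with hypothetical component keys (these are not Azure meter names):

```python
# Map each cost component to its allocation method from the spend map.
# "metered:<signal>" means split by that per-DB signal.
ALLOCATION = {
    "elastic_pool_compute": "metered:cpu_time",
    "elastic_pool_storage": "metered:db_size",
    "single_db_compute": "direct",
    "backup_ltr_dedicated": "direct",
    "defender_for_sql": "even_split",
    "private_endpoints": "platform",
}


def method_for(component: str) -> str:
    # Unmapped components default to platform-owned, so gaps are visible
    # on the platform bill instead of silently disappearing.
    return ALLOCATION.get(component, "platform")
```

Defaulting unknowns to "platform" is deliberate: it gives the platform team an incentive to keep the map complete.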
Where most orgs lose cost clarity
They try to use billing tags to solve a telemetry problem.
They centralize platform resources but do not design a fair allocation method.
They adopt elastic pools to simplify operations, then act surprised when attribution gets messy.
They apply reservations, savings plans, or hybrid benefits and do not push the savings back to the consuming apps.
If you are using shared SQL capacity, you must plan for allocation. Otherwise, the platform team becomes the dumping ground for everyone else’s consumption.
A simple cost model that scales
Use four buckets. Keep them consistent across your dashboards, exports, and conversations.
App direct: costs billed to an app-owned resource (single DB, dedicated MI, dedicated SQL VM).
App shared: an app’s share of shared capacity (elastic pool, shared MI, shared tooling).
Platform base: baseline services you run, whether apps exist or not (core workspaces, central automation, base networking).
Platform shared: shared services that scale with adoption (pools, shared MI fleets, shared monitoring per DB).
Your reporting should show all four buckets. The goal is not to make the platform spend disappear. The goal is to show what is truly baseline versus what grew because consumption grew.
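The four buckets can be assigned mechanically from two facts about a cost row: does it carry an app tag, and is the resource shared. A sketch, assuming a simplified row shape (the real Cost Management export schema differs):

```python
def bucket(row: dict) -> str:
    """Classify a cost row into one of the four buckets.

    `row` is a hypothetical shape: an optional "app" tag and a "shared"
    flag derived from your platform inventory, not Azure's export schema.
    """
    has_app = bool(row.get("app"))
    shared = bool(row.get("shared"))
    if has_app and not shared:
        return "app_direct"       # app-owned dedicated resource
    if has_app and shared:
        return "app_shared"       # app's slice of shared capacity
    if shared:
        return "platform_shared"  # shared service that scales with adoption
    return "platform_base"        # baseline, exists with or without apps
```

Keeping the classifier this dumb is the point: every row lands in exactly one bucket, and disputes become arguments about tags and inventory, not about the model.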
Allocation strategies that actually work
There are only three methods worth debating. Pick one per cost component. Document it. Then automate it.
1) Direct allocation (the easy wins)
If a resource is dedicated to one app, bill it directly. Do not overcomplicate it.
Use separate resource groups or subscriptions per app where possible.
Enforce required tags (app, environment, owner) with policy and deny when missing.
Keep dedicated databases out of elastic pools unless there is a strong reason.
2) Even split (acceptable when signals are weak)
Sometimes you do not have a clean usage signal. An even split is fine when you are transparent about it.
Shared security tooling costs where per-app telemetry is not feasible
Base networking and name resolution patterns (private DNS zones, resolvers)
Platform automation run costs if every team benefits similarly
3) Metered split (the long-term answer for shared SQL)
For elastic pools and shared managed instances, split costs based on a usage signal that approximates who caused consumption.
A practical signal stack (pick the strongest you can get):
CPU time per database (Query Store or DMVs)
Data reads/writes per database (Query Store, DMVs, or Azure Monitor where available)
Storage footprint or growth per database
Connection count per database (a weak proxy, but sometimes enough)
Then allocate pool cost using a weighted formula. Example:
Compute share per DB = (DB CPU time / total CPU time)
Storage share per DB = (DB size / total size)
Pool cost allocation per DB = (pool compute cost × compute share) + (pool storage cost × storage share)
This is not perfect. It is defensible. And it gives teams a lever: tune queries, reduce I/O, control growth.
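The weighted formula above is a few lines of code. A sketch, assuming you already have per-DB CPU time (e.g. from Query Store) and data size as plain dicts; the signal names are illustrative:

```python
def allocate_pool(pool_compute_cost: float, pool_storage_cost: float,
                  cpu_time: dict, db_size: dict) -> dict:
    """Split pool cost per database using the weighted formula.

    cpu_time: per-DB CPU seconds (e.g. from Query Store or DMVs).
    db_size: per-DB data size (e.g. GB). Both are assumed inputs.
    """
    total_cpu = sum(cpu_time.values())
    total_size = sum(db_size.values())
    allocation = {}
    for db in cpu_time:
        compute_share = cpu_time[db] / total_cpu
        storage_share = db_size[db] / total_size
        allocation[db] = (pool_compute_cost * compute_share
                          + pool_storage_cost * storage_share)
    return allocation
```

Note the useful property: a CPU-heavy database with little data pays mostly for compute, while a cold archive database pays mostly for storage, which matches the levers each team actually controls.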
Data you need (and where it comes from)
You can do this without buying another FinOps tool. You need two feeds: costs and signals.
Cost feed
Azure Cost Management exports (daily) to storage
Amortized view for reservations/savings plans where possible
Consistent tag set on resources (at least app, environment, owner)
Signal feed
Query Store (CPU time, reads, writes, duration) at the database level
Azure Monitor metrics for SQL (CPU %, storage %, workers, sessions)
DMVs for elastic pools and database resource stats (snapshot and trend)
Optional: diagnostic logs to Log Analytics if you want a single query surface
The trick is joining them. Costs are billed per resource and meter. Signals are measured per database. Your allocation layer is the glue.
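The glue layer is a join between resource-level costs and database-level signals. A minimal sketch with illustrative data; the pool name, membership map, and field names are assumptions, not Azure schema:

```python
# Cost feed: billed per resource (here, one elastic pool).
costs = {"pool-prod-01": 1200.00}

# Inventory: which databases live in which pool.
pool_members = {"pool-prod-01": ["db-a", "db-b"]}

# Signal feed: per-DB CPU seconds, e.g. aggregated from Query Store.
cpu_seconds = {"db-a": 900.0, "db-b": 300.0}

# The join: spread each pool's cost across its member databases by signal.
allocated = {}
for pool, cost in costs.items():
    members = pool_members[pool]
    total = sum(cpu_seconds[db] for db in members)
    for db in members:
        allocated[db] = cost * cpu_seconds[db] / total
```

In practice each feed is a daily export (Cost Management to storage, a scheduled Query Store snapshot), but the join itself stays this simple.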
A 30-day adoption plan that does not melt ops
Week 1: Establish the taxonomy
Agree on the four buckets and publish a one-page definition.
Standardize tags: appId, env, ownerTeam, costCenter, dataClass (minimum viable set).
Decide which SQL patterns count as “platform shared” (pools, shared MI).
Week 2: Build the first allocation
Export costs daily and build a simple Power BI or workbook view by subscription and resource group.
Pull a weekly CPU-time signal per database from Query Store or DMVs.
Allocate pool compute using CPU time. Allocate pool storage using DB size.
Publish a first showback report with assumptions in writing.
Week 3: Make it operational
Automate the allocation job (scheduled notebook, function, or dataflow).
Add an exception process: missing tags, unknown owner, shared database edge cases.
Add a monthly review with platform + FinOps + top app owners. Keep it short.
Week 4: Enforce and improve
Enforce required tags on new SQL resources using policy and a paved road template.
Push reservations or savings benefits back into app showback (do not keep the discount centralized).
Add the second-best signal (reads/writes or storage growth) if CPU alone is too noisy.
Pitfalls and sharp edges
Elastic pools hide per-DB billing. You must measure usage to split fairly.
Backup and long-term retention can spike without anyone noticing. Treat retention as an explicit app choice.
Central Log Analytics workspaces can become “platform spend” even when apps generate the logs. Track ingestion by source.
Reservations and savings plans change what “cost” means. Decide whether your showback uses actual, amortized, or blended.
Tagging breaks on day 2 unless you enforce it. Policy is not optional if you want cost clarity at scale.
If your goal is to reduce spend, the most common win is not “optimize SQL.” It is “stop paying for shared ambiguity.” Once teams see what they own, behavior changes.
The operator rule
If you cannot explain an Azure SQL cost change in one minute using data, you do not have cost ownership. You have surprise billing. Fix the model first, then tune the queries.
