The Complete Guide to AWS Cost Optimisation

Most AWS bills do not grow because teams are careless. They grow because nobody owns them. Engineers ship features, infrastructure accumulates, and the invoice arrives weeks after the decisions that caused it. By the time anyone looks, the spend is baked in and the context is gone.

Cost optimisation is not a one-off cleanup. It is a discipline: see the spend clearly, make it somebody's job, and fix the structural things that quietly compound. This guide walks through that discipline in the order that actually works, from visibility through to the habits that keep your bill flat while you grow.

We run production AWS environments for regulated and technology-led businesses every day, with Datadog as the observability layer. The advice below is what we apply when we take over a customer's account and find the bill running 30 to 50 percent higher than it needs to be.

Start with visibility, because you cannot cut what you cannot see

Every cost programme that fails starts the same way: someone tries to cut spend before they can see it. They guess, they delete the wrong thing, something breaks, and the programme dies. Do the visibility work first.

Cost Explorer

AWS Cost Explorer is the free, built-in starting point. It shows your spend broken down by service, account, region, and tag, with daily and monthly granularity. Two features matter most.

Usage analysis lets you slice the bill by usage type, so you can see whether a rising EC2 line is more instances, larger instances, or more data transfer. That distinction changes the fix entirely. Forecasting projects your spend forward based on historical trend, which is how you catch a problem in week two of the month instead of on the invoice. Set forecasting against your budget so the gap is visible early.
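To make the "catch it in week two" point concrete, here is a minimal sketch of a run-rate projection compared against a budget. It is deliberately naive (a flat average of the days so far) and is not how Cost Explorer computes its own forecast. The function names and the figures are hypothetical.

```python
import calendar
from datetime import date


def run_rate_forecast(daily_costs, as_of):
    """Project month-end spend by extending the average daily cost so far."""
    if not daily_costs:
        raise ValueError("need at least one day of cost data")
    days_in_month = calendar.monthrange(as_of.year, as_of.month)[1]
    average_per_day = sum(daily_costs) / len(daily_costs)
    return average_per_day * days_in_month


def budget_gap(daily_costs, as_of, monthly_budget):
    """Positive result means the projection overshoots the budget."""
    return run_rate_forecast(daily_costs, as_of) - monthly_budget


# Hypothetical figures: 14 days at 400 per day in a 30-day month, budget of 10,000.
costs_so_far = [400.0] * 14
gap = budget_gap(costs_so_far, date(2024, 6, 14), 10_000.0)
print(f"Projected overshoot: {gap:.2f}")
```

A flat average ignores weekly seasonality and growth, so treat it as an early-warning signal rather than a substitute for the built-in forecast.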

Cost Explorer is enough for most teams to find the top three problems. When you need line-item detail (per-resource, per-hour), turn on the Cost and Usage Report, which writes the full granular dataset to S3 for querying in Athena or loading into a dashboard.

Budgets and billing alerts

AWS Budgets lets you set a monthly spend target and get alerted when actual or forecasted spend crosses a threshold. Set this up on day one. An alert that fires when forecasted spend reaches 80 percent of the budget gives you time to act before the month closes.

The most common budgeting mistakes we see: setting a single account-wide budget with no breakdown (you get alerted but have no idea which team caused it), alerting only on actual spend rather than forecast (you find out too late), and setting budgets once and never revisiting them as the business grows. Budgets should track reality, not a number someone picked a year ago.

Route the alerts somewhere people actually look. Email gets ignored. Pushing budget alerts into the channel where your engineers already work makes them act. We wrote a step-by-step guide on how to connect AWS Budgets to Slack for exactly this reason.
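As a sketch of what a forecast-based budget looks like in code, the snippet below builds the request for the boto3 Budgets `create_budget` call, with an SNS topic as the subscriber so the alert can be forwarded into a chat channel. The account ID, budget name, topic ARN, and helper names are placeholders we made up for illustration. The boto3 import is kept inside the function that calls AWS, so the request can be built and inspected without credentials.

```python
def build_budget_request(account_id, name, monthly_limit, sns_topic_arn, threshold_pct=80.0):
    """Build create_budget arguments for a monthly cost budget with a forecast alert."""
    return {
        "AccountId": account_id,
        "Budget": {
            "BudgetName": name,
            "BudgetLimit": {"Amount": str(monthly_limit), "Unit": "USD"},
            "TimeUnit": "MONTHLY",
            "BudgetType": "COST",
        },
        "NotificationsWithSubscribers": [
            {
                "Notification": {
                    # FORECASTED alerts fire on projected spend, not spend already incurred.
                    "NotificationType": "FORECASTED",
                    "ComparisonOperator": "GREATER_THAN",
                    "Threshold": threshold_pct,
                    "ThresholdType": "PERCENTAGE",
                },
                "Subscribers": [{"SubscriptionType": "SNS", "Address": sns_topic_arn}],
            }
        ],
    }


def create_budget(request):
    """Send the request to AWS. Needs boto3 installed and credentials configured."""
    import boto3  # imported lazily so the builder above works without AWS access

    boto3.client("budgets").create_budget(**request)


# Placeholder values for illustration only.
request = build_budget_request(
    account_id="123456789012",
    name="payments-monthly",
    monthly_limit=5000,
    sns_topic_arn="arn:aws:sns:eu-west-2:123456789012:budget-alerts",
)
```

One budget per team or product, each built this way, avoids the single account-wide budget problem described above.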

Tagging and cost allocation

Tags are the foundation of everything that follows. Without them, your bill is one undifferentiated number. With them, you can attribute every pound to a team, a product, an environment, or a customer.

Activate cost allocation tags in the billing console so they appear as dimensions in Cost Explorer and the Cost and Usage Report. Then enforce a tagging standard: at minimum, an owner tag, an environment tag (production, staging, development), and a cost-centre or product tag. Make the tags mandatory in your infrastructure-as-code so resources cannot be created without them. A tag that is optional is a tag that is missing on the resource you most need to investigate.
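A minimal sketch of the kind of check that makes tags mandatory, suitable for a CI step or a pre-deployment hook. The tag keys mirror the standard described above. The exact key spellings and the function name are our assumptions, not an AWS convention.

```python
# Assumed key names for the tagging standard; adjust to your own spelling.
REQUIRED_TAGS = {"owner", "environment", "cost-centre"}
ALLOWED_ENVIRONMENTS = {"production", "staging", "development"}


def tag_violations(tags):
    """Return a list of human-readable problems with a resource's tags."""
    problems = [f"missing tag: {key}" for key in sorted(REQUIRED_TAGS - set(tags))]
    environment = tags.get("environment")
    if environment is not None and environment not in ALLOWED_ENVIRONMENTS:
        problems.append(f"invalid environment: {environment}")
    return problems


# Hypothetical resource definitions, as they might be parsed from infrastructure-as-code.
good = {"owner": "payments", "environment": "staging", "cost-centre": "cc-101"}
bad = {"owner": "payments", "environment": "prod"}

print(tag_violations(good))
print(tag_violations(bad))
```

Failing the build when the list is non-empty is what turns the standard from a guideline into a guarantee.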

Good tagging is what turns "the bill went up 12 percent" into "the payments team's staging environment went up 12 percent," which is a problem with an owner and a fix.

Dashboards and tooling

For ongoing visibility, build a dashboard rather than logging into Cost Explorer each week. You can build one with Amazon QuickSight on top of the Cost and Usage Report, which gives finance and engineering a shared view. Some teams prefer custom Cost Explorer dashboards or third-party cost tools. The tool matters less than the habit: a single view everyone trusts, reviewed on a regular cadence.

AWS Trusted Advisor is worth checking too. Its cost optimisation checks flag idle load balancers, underutilised instances, unattached EBS volumes, and unassociated Elastic IPs. It is not exhaustive, and the full set of cost checks requires a Business or Enterprise Support plan, but if you already have one it surfaces obvious waste at no extra charge. For a more thorough sweep, see our guide on tools to detect unused AWS resources.

Make spend accountable

Visibility without accountability is just a nicer-looking bill. Once you can attribute spend by tag, give each team or product visibility into its own number and a target to hold. The teams that own their spend behave differently from the teams that treat infrastructure as free. Separate AWS accounts per team or environment, joined under AWS Organizations, make this cleaner still, because the boundary is structural rather than reliant on tag discipline.

This is the single biggest cultural lever. Cost stops being a finance problem at month-end and becomes an engineering input at design time.

Right-size everything: the biggest quick win

After visibility, right-sizing is where the fastest savings live. Most environments are provisioned for a peak that rarely arrives, or for a guess made when the workload was smaller.

Start with compute. Review EC2 and RDS instances against their actual CPU, memory, and network utilisation over a representative period. An instance sitting at 8 percent CPU for a month is likely two or three sizes too big, provided memory and network are similarly quiet. Use a rightsizing checklist so the review is systematic rather than ad hoc, and make it a recurring exercise, not a one-time purge.
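The arithmetic behind "two or three sizes" can be sketched as follows. It assumes each step down within an instance family halves the vCPU count and that CPU utilisation scales inversely with capacity, which is a simplification. The target utilisation values and the function name are hypothetical, and a real review should look at peak utilisation and memory as well as the average.

```python
def size_steps_down(avg_cpu_pct, target_cpu_pct=50.0, max_steps=4):
    """Count how many times capacity can be halved while projected CPU stays at or below target.

    Returns (steps, projected_cpu_pct).
    """
    if avg_cpu_pct <= 0:
        raise ValueError("average CPU must be positive")
    steps = 0
    projected = avg_cpu_pct
    # Halving capacity doubles projected utilisation under the linear-scaling assumption.
    while steps < max_steps and projected * 2 <= target_cpu_pct:
        projected *= 2
        steps += 1
    return steps, projected


print(size_steps_down(8.0))                       # conservative 50 percent target
print(size_steps_down(8.0, target_cpu_pct=70.0))  # more aggressive 70 percent target
```

At 8 percent average CPU, the conservative target allows two steps down (projected 32 percent) and the aggressive one allows three (projected 64 percent).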

Lambda is its own discipline. Memory and CPU are coupled, so the cheapest configuration is rarely the lowest memory setting. Sometimes more memory finishes the job faster and costs less overall. We cover the method in detail in right-sizing AWS Lambda memory.
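Here is a small sketch of why the lowest memory setting is not always the cheapest. Lambda bills duration in GB-seconds, so a configuration that finishes much faster can cost less per invocation. The prices are assumptions based on published x86 on-demand rates and should be checked against current pricing for your region. The duration measurements are invented for illustration.

```python
PRICE_PER_GB_SECOND = 0.0000166667    # assumed rate; verify against current pricing
PRICE_PER_REQUEST = 0.20 / 1_000_000  # assumed per-request charge


def invocation_cost(memory_mb, duration_ms):
    """Cost of one invocation: GB-seconds consumed plus the flat request charge."""
    gb_seconds = (memory_mb / 1024) * (duration_ms / 1000)
    return gb_seconds * PRICE_PER_GB_SECOND + PRICE_PER_REQUEST


def cheapest(measurements):
    """Pick the (memory_mb, duration_ms) pair with the lowest cost per invocation."""
    return min(measurements, key=lambda m: invocation_cost(*m))


# Hypothetical measurements of the same function at three memory settings.
measurements = [(128, 2400), (512, 520), (1024, 300)]

for memory_mb, duration_ms in measurements:
    print(memory_mb, duration_ms, f"{invocation_cost(memory_mb, duration_ms):.10f}")
print("cheapest:", cheapest(measurements))
```

In this invented data set the 512 MB setting wins, because it uses 0.26 GB-seconds per call against roughly 0.30 for both the smallest and the largest setting.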

Then hunt the idle. Unattached EBS volumes, old snapshots, idle load balancers, forgotten dev environments left running over weekends, and orphaned NAT gateways all cost money for nothing. Auto-scaling helps here too, not just for performance but for cost: scaling down out of hours and during troughs means you stop paying for capacity you are not using.
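As one example of hunting the idle, the sketch below filters the output of the EC2 `describe_volumes` call for volumes that are not attached to anything and estimates what they cost each month. The per-GB price is a placeholder, and the helper names are ours. The boto3 import sits inside the fetch function so the filtering logic can be tried on sample data without AWS access.

```python
def unattached_volumes(volumes):
    """Keep volumes in the 'available' state with no attachments."""
    return [v for v in volumes if v.get("State") == "available" and not v.get("Attachments")]


def monthly_waste(volumes, price_per_gb_month):
    """Estimated monthly cost of unattached volumes at a flat per-GB price."""
    return sum(v["Size"] for v in unattached_volumes(volumes)) * price_per_gb_month


def fetch_volumes(region):
    """Fetch every EBS volume in a region. Needs boto3 and credentials."""
    import boto3  # imported lazily so the functions above run without AWS access

    paginator = boto3.client("ec2", region_name=region).get_paginator("describe_volumes")
    return [volume for page in paginator.paginate() for volume in page["Volumes"]]


# Sample data shaped like the describe_volumes response (IDs are made up).
sample = [
    {"VolumeId": "vol-aaa", "Size": 100, "State": "in-use", "Attachments": [{"InstanceId": "i-1"}]},
    {"VolumeId": "vol-bbb", "Size": 500, "State": "available", "Attachments": []},
    {"VolumeId": "vol-ccc", "Size": 250, "State": "available", "Attachments": []},
]

print([v["VolumeId"] for v in unattached_volumes(sample)])
print(monthly_waste(sample, price_per_gb_month=0.08))  # 0.08 is a placeholder price
```

Snapshot the volumes and confirm ownership via tags before deleting anything the filter returns.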

Commit to save: Savings Plans and Reserved Instances

Once your usage is right-sized and stable, commit to it. AWS rewards commitment with substantial discounts over on-demand pricing.

Savings Plans give you a discount in exchange for committing to a steady hourly spend over one or three years. Compute Savings Plans are the flexible option, applying across instance families, sizes, regions, and even Lambda and Fargate. Reserved Instances lock you to a specific instance type in a specific region, which is less flexible but can go slightly deeper on discount for predictable, fixed workloads.

The rule of thumb: right-size first, then commit. Committing to oversized infrastructure just locks in waste at a discount. Cover your stable baseline with commitments and leave headroom for on-demand to absorb variability. Reserved Instance planning is worth doing properly, modelling your baseline against one-year and three-year terms before you commit, because the commitment is real money for the full term.
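A simplified model shows why covering the baseline works and over-committing does not. It assumes a single uniform discount across all usage, which real Savings Plans do not have (rates vary by instance family, region, and term), and the usage figures are invented. The commitment is expressed as discounted spend per hour, so a commitment of C absorbs C / (1 - discount) of on-demand-equivalent usage.

```python
def hourly_cost(on_demand_usage, commitment, discount):
    """Cost for one hour: the commitment is always paid, overflow is billed on demand."""
    covered = commitment / (1 - discount)  # on-demand-equivalent usage the commitment absorbs
    overflow = max(0.0, on_demand_usage - covered)
    return commitment + overflow


def total_cost(usage_by_hour, commitment, discount):
    """Total cost over a series of hourly on-demand-equivalent usage figures."""
    return sum(hourly_cost(usage, commitment, discount) for usage in usage_by_hour)


# Hypothetical usage: a steady baseline of 10 per hour with one peak hour at 20.
usage = [10.0, 10.0, 10.0, 20.0]
discount = 0.25  # assumed uniform discount

print("no commitment:   ", total_cost(usage, 0.0, discount))
print("cover baseline:  ", total_cost(usage, 7.5, discount))
print("cover the peak:  ", total_cost(usage, 15.0, discount))
```

In this toy example, committing to the baseline costs 40 against 50 on demand, while committing to the peak costs 60, which is worse than having no commitment at all.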

Fix your storage bill, because S3 is where money hides

Storage spend creeps. Data goes in, nothing comes out, and the bill grows quietly for years. The fix is lifecycle policy.

S3 has multiple storage classes priced for different access patterns, from Standard for frequently accessed data down through Infrequent Access and the Glacier tiers for archival. The mistake is leaving everything in Standard forever. Set lifecycle rules to transition objects to cheaper classes as they age, and to expire objects that have no business value past a certain date (old logs, temporary exports, stale build artefacts).
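A lifecycle rule of the kind described above can be sketched as the configuration passed to the boto3 S3 `put_bucket_lifecycle_configuration` call. The prefix, day counts, rule ID, and bucket name are hypothetical and should come from your own access patterns and retention obligations.

```python
def build_lifecycle_config(prefix, to_ia_days=30, to_glacier_days=90, expire_days=365):
    """Build a lifecycle configuration that tiers objects down as they age, then expires them."""
    if not (to_ia_days < to_glacier_days < expire_days):
        raise ValueError("expected to_ia_days < to_glacier_days < expire_days")
    return {
        "Rules": [
            {
                "ID": f"age-out-{prefix.strip('/') or 'all'}",
                "Filter": {"Prefix": prefix},
                "Status": "Enabled",
                "Transitions": [
                    {"Days": to_ia_days, "StorageClass": "STANDARD_IA"},
                    {"Days": to_glacier_days, "StorageClass": "GLACIER"},
                ],
                "Expiration": {"Days": expire_days},
            }
        ]
    }


def apply_lifecycle(bucket, config):
    """Apply the configuration to a bucket. Needs boto3 and credentials."""
    import boto3  # imported lazily so the builder works without AWS access

    boto3.client("s3").put_bucket_lifecycle_configuration(
        Bucket=bucket, LifecycleConfiguration=config
    )


# Hypothetical rule for a logs prefix.
config = build_lifecycle_config("logs/")
print(config["Rules"][0]["ID"])
```

Note that applying a lifecycle configuration replaces any existing one on the bucket, so merge new rules with the current set rather than overwriting it blindly.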

If your access patterns are unpredictable, S3 Intelligent-Tiering moves objects between access tiers automatically based on usage, so you capture the savings without having to model the access pattern yourself. For predictable patterns, explicit lifecycle rules are cheaper because Intelligent-Tiering carries a small monitoring charge per object.

Two things make storage optimisation safe rather than scary. First, understand your data retention obligations before you expire anything, especially in regulated environments where some data must be kept for years. We cover how to set this up in AWS data retention policies. Second, know what you actually have before you change it: a proper AWS storage cost analysis tells you where the spend sits so you optimise the expensive 20 percent rather than fiddling with the cheap long tail.

Backups deserve the same scrutiny. Backup data accumulates, and old snapshots and AMIs are pure cost once they are past their retention window. Apply lifecycle and retention to backups as deliberately as you do to primary data.

Control the hidden costs

Some of the most surprising line items are not compute or storage at all.

Data transfer is the classic. Moving data between regions, between availability zones, and out to the internet all costs money, and it is rarely visible in the architecture diagram. Cross-region transfer in particular catches teams out as they add redundancy or expand geographically. We break down the patterns and fixes in cross-region data transfer costs.

Lambda has its own hidden costs beyond compute. Data transfer out of Lambda, and the cost of logging every invocation at full verbosity, add up at scale. There is a genuine trade-off between logging enough to debug and logging so much it dominates the function's cost. Decide deliberately what you log at what level, rather than defaulting to verbose everywhere.
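To put a number on the logging trade-off, here is a rough sketch of monthly log ingestion cost at two verbosity levels. The ingestion price is an assumed illustrative figure (check the current CloudWatch Logs rate for your region), storage charges are ignored, and the invocation count and log sizes are invented.

```python
LOG_INGEST_PRICE_PER_GB = 0.50  # assumption: illustrative ingestion price per GB


def monthly_log_cost(invocations, log_bytes_per_invocation, price_per_gb=LOG_INGEST_PRICE_PER_GB):
    """Ingestion cost for a month of logs, ignoring storage and retention charges."""
    gigabytes = invocations * log_bytes_per_invocation / 1024 ** 3
    return gigabytes * price_per_gb


# Hypothetical workload: 100 million invocations a month.
verbose = monthly_log_cost(100_000_000, 4096)  # about 4 KiB of debug output per call
terse = monthly_log_cost(100_000_000, 512)     # one short structured line per call

print(f"verbose: {verbose:.2f}, terse: {terse:.2f}")
```

For a short-running function, the verbose figure can exceed the compute charge for the same invocations, which is the point at which logging dominates the function's cost.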

Spend the right way from the start

The cheapest cost optimisation is the spend you never commit to. When you are choosing services, the AWS free tier versus paid services decision matters more than it looks, because the free tier has limits that quietly turn into bills the moment you cross them. Choose managed services where the operational saving outweighs the premium, and self-manage where it does not. The right architecture decision at design time saves more than any amount of after-the-fact cleanup.

Make it a habit, not a project

The teams that keep their AWS bill under control do not run an annual cost project. They run a monthly rhythm. Someone owns the number. The dashboard gets reviewed. Anomalies get investigated while the context is still fresh. New commitments get modelled before they are made. Right-sizing happens on a schedule.

This is the FinOps discipline in plain terms: visibility, accountability, and optimisation, repeated. It is not glamorous, and that is exactly why most teams skip it until the bill forces their hand.

Where Critical Cloud comes in

Running this discipline continuously, on top of operating the platform reliably, is more than most lean teams have capacity for. That is what we do.

Critical Cloud operates AWS environments as a managed service, with Datadog as the observability layer across infrastructure, applications, and cost. We use Datadog Cloud Cost Management to correlate spend with what is actually happening in your systems, so a cost spike is tied to the deploy, the traffic pattern, or the misconfiguration that caused it, not discovered three weeks later on the invoice. As the world's first Powered by Datadog accredited partner, we bring that visibility as standard, not as an add-on.

If your AWS bill is growing faster than your business, see how Critical Support works or talk to us about a cost review.
