Reliability Engineering Lab

Cloud Cost Optimisation: A Practical Checklist for Engineering Teams

Feb 10, 2026

Why most cloud cost reviews stall

Engineering teams often know their cloud bill is too high. The harder question is where to look first and what to do about it. Generic advice like "rightsize your instances" or "delete unused resources" is true but rarely actionable without a systematic approach.

This checklist is organized by impact tier. Start with the quick wins. Build a habit of reviewing the rest quarterly.

Tier 1: Quick wins (days, not weeks)

These items typically take less than a day each to investigate and regularly uncover savings worth 10–30% of total spend.

Idle and underutilized compute

  • Identify instances running below 10% CPU utilization for 7+ consecutive days
  • Stop or terminate instances tagged as dev, test, or staging that have not been accessed in 30 days
  • Check for instances started for one-off tasks that were never terminated

Most cloud providers offer native utilization views. Set a 7-day lookback window as your baseline. An instance that has been idle for a full week is a strong candidate for stopping or downsizing, once its owner has confirmed it is not needed.
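
The 7-day, 10% rule can be expressed as a small filter. The sketch below is one way to do it in Python, assuming you have already exported one average CPU percentage per instance per day from your provider's monitoring tool. The function name, instance IDs, and sample values are hypothetical.

```python
from typing import Dict, List


def idle_instances(daily_cpu: Dict[str, List[float]],
                   threshold_pct: float = 10.0,
                   min_days: int = 7) -> List[str]:
    """Return IDs whose most recent `min_days` daily CPU averages are all below the threshold."""
    flagged = []
    for instance_id, samples in daily_cpu.items():
        recent = samples[-min_days:]
        # Require a full window of history before judging an instance idle.
        if len(recent) >= min_days and all(s < threshold_pct for s in recent):
            flagged.append(instance_id)
    return sorted(flagged)


# Hypothetical sample data: one average CPU % per day, oldest first.
sample = {
    "i-web-01": [42.0, 38.5, 51.2, 47.9, 40.1, 44.4, 39.8],
    "i-batch-07": [2.1, 1.8, 3.0, 2.2, 1.9, 2.5, 2.0],
    "i-dev-13": [4.0, 3.5, 25.0, 3.1, 2.8, 3.3, 3.0],  # one busy day breaks the streak
    "i-new-02": [1.0, 1.2, 0.9],                        # too little history to judge
}

print(idle_instances(sample))  # ['i-batch-07']
```

Daily averages can hide short bursts, so treat the output as a review list rather than a termination list.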

Unattached storage volumes

  • List all block storage volumes not attached to a running instance
  • Check snapshot schedules — automated snapshots from terminated instances often continue indefinitely
  • Review object storage buckets with no access events in 90 days and no lifecycle policy

Unattached volumes and forgotten snapshots are among the most common sources of waste. They are invisible in dashboards that only show running resources.
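
A minimal sketch of the first check, assuming you can export a volume inventory and a list of running instance IDs. The `Volume` type, the IDs, and the price of 0.10 per GiB-month are placeholders for illustration, not any provider's actual schema or rate.

```python
from dataclasses import dataclass
from typing import List, Optional, Set


@dataclass
class Volume:
    volume_id: str
    size_gib: int
    attached_to: Optional[str]  # instance ID, or None when unattached


def unattached_volumes(volumes: List[Volume], running: Set[str]) -> List[Volume]:
    """Volumes with no attachment, or attached to an instance that is not running."""
    return [v for v in volumes
            if v.attached_to is None or v.attached_to not in running]


def monthly_cost(volumes: List[Volume], price_per_gib_month: float) -> float:
    """Estimated monthly storage cost for the given volumes."""
    return sum(v.size_gib for v in volumes) * price_per_gib_month


# Hypothetical inventory.
inventory = [
    Volume("vol-a", 100, "i-1"),
    Volume("vol-b", 500, None),
    Volume("vol-c", 200, "i-9"),  # i-9 is stopped, so it is not in the running set
]
orphans = unattached_volumes(inventory, running={"i-1"})

print([v.volume_id for v in orphans])          # ['vol-b', 'vol-c']
print(f"{monthly_cost(orphans, 0.10):.2f}")    # 70.00
```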

Oversized databases

  • Compare provisioned database instance size against peak connection counts and query throughput
  • Identify databases with provisioned IOPS but consistently low I/O utilization
  • Check for read replicas that are not serving read traffic

A read replica kept for a failover scenario that has never been tested is both a cost problem and a reliability risk.
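
The provisioned-IOPS check comes down to a ratio of observed peak to provisioned capacity. The sketch below assumes you have both numbers per database. The 30% cut-off is an illustrative assumption rather than a recommendation from this checklist, and the database names are hypothetical.

```python
from typing import Dict, List, Tuple


def iops_utilization(peak_iops: float, provisioned_iops: float) -> float:
    """Observed peak I/O as a fraction of what is provisioned."""
    if provisioned_iops <= 0:
        raise ValueError("provisioned_iops must be positive")
    return peak_iops / provisioned_iops


def oversized(databases: Dict[str, Tuple[float, float]],
              max_utilization: float = 0.30) -> List[str]:
    """Names of databases whose peak I/O stays below `max_utilization` of provisioned."""
    return sorted(name for name, (peak, provisioned) in databases.items()
                  if iops_utilization(peak, provisioned) < max_utilization)


# Hypothetical data: name -> (peak IOPS observed, provisioned IOPS).
dbs = {
    "orders-db": (900.0, 10000.0),
    "search-db": (7200.0, 8000.0),
}

print(oversized(dbs))  # ['orders-db']
```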

Tier 2: Reservation and commitment strategy

Reserved instances and savings plans

  • Calculate coverage rate: what percentage of your baseline compute is covered by reservations or savings plans?
  • Identify stable workloads running continuously for 3+ months — these are candidates for 1-year commitments
  • Avoid reserving instance types that are likely to change with application growth or migration plans

A coverage rate below 40% on workloads that have been stable for over 6 months is a strong signal of undercommitment.
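
Coverage rate is simple arithmetic once you pick a unit. The sketch below assumes instance-hours over a billing period (spend would work equally well) and encodes the 40% and 6-month thresholds from the paragraph above. The function names and sample figures are hypothetical.

```python
def coverage_rate(committed_hours: float, baseline_hours: float) -> float:
    """Fraction of baseline compute hours covered by commitments, capped at 1.0."""
    if baseline_hours <= 0:
        raise ValueError("baseline_hours must be positive")
    return min(committed_hours / baseline_hours, 1.0)


def is_undercommitted(rate: float, stable_months: int,
                      min_rate: float = 0.40,
                      min_stable_months: int = 6) -> bool:
    """True when coverage is below `min_rate` on a workload stable for over `min_stable_months`."""
    return rate < min_rate and stable_months > min_stable_months


# Hypothetical month: 10 always-on instances (7,300 hours), 3 of them reserved.
rate = coverage_rate(committed_hours=2190, baseline_hours=7300)

print(f"{rate:.0%}")                              # 30%
print(is_undercommitted(rate, stable_months=9))   # True
```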

Spot and preemptible instances

  • Identify stateless workloads that tolerate interruption: batch processing, CI/CD runners, data pipelines
  • Review whether your container workloads have graceful shutdown handling — this is a prerequisite for spot usage
  • Check whether your current on-demand costs for these workloads justify the engineering effort to migrate

Spot instances can reduce compute costs by 60–80% for the right workload profiles. The key constraint is interruption tolerance.
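
Whether the migration effort is justified can be estimated as a payback period. The sketch below applies the 60–80% range to a hypothetical on-demand bill and a hypothetical one-off engineering cost. Both figures are assumptions to replace with your own.

```python
def spot_savings(on_demand_monthly: float, discount: float) -> float:
    """Monthly savings if the workload moves to spot at the given discount (0.0 to 1.0)."""
    if not 0.0 <= discount <= 1.0:
        raise ValueError("discount must be between 0 and 1")
    return on_demand_monthly * discount


def payback_months(migration_cost: float, monthly_savings: float) -> float:
    """Months until the one-off migration cost is recovered."""
    if monthly_savings <= 0:
        return float("inf")
    return migration_cost / monthly_savings


# Hypothetical inputs: 5,000/month of on-demand CI runners, 12,000 of engineering effort.
on_demand = 5000.0
effort = 12000.0

for discount in (0.60, 0.80):
    months = payback_months(effort, spot_savings(on_demand, discount))
    print(f"{discount:.0%} discount: payback in {months:.1f} months")
```

A payback measured in years rather than months is a sign that the workload is too small to be worth migrating.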

Tier 3: Architecture and licensing

Data transfer costs

  • Pull a breakdown of data transfer charges from your billing console
  • Identify cross-region data movement — this is often unnecessary and expensive
  • Check whether applications are routing traffic through a NAT gateway when direct endpoint routing is available

Data transfer is frequently the second or third largest line item on cloud bills, and it is almost never reviewed until it becomes a problem.
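
Once the transfer charges are exported, ranking them by type shows where to look first. The sketch below assumes each billing line item has been reduced to a `transfer_type` label and a `cost`. Those field names and category labels are hypothetical and will differ from your provider's billing export.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def transfer_breakdown(line_items: List[Dict]) -> List[Tuple[str, float]]:
    """Total cost per transfer type, largest first."""
    totals: Dict[str, float] = defaultdict(float)
    for item in line_items:
        totals[item["transfer_type"]] += item["cost"]
    return sorted(totals.items(), key=lambda kv: kv[1], reverse=True)


# Hypothetical line items for one month.
items = [
    {"transfer_type": "cross-region", "cost": 1200.0},
    {"transfer_type": "nat-gateway", "cost": 800.0},
    {"transfer_type": "internet-egress", "cost": 400.0},
    {"transfer_type": "cross-region", "cost": 300.0},
]

for transfer_type, cost in transfer_breakdown(items):
    print(f"{transfer_type:16} {cost:8.2f}")
```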

Software licensing

  • Audit bring-your-own-license (BYOL) usage — are you paying for licenses you already own?
  • Check database engine choices: are you using a licensed commercial engine (Oracle, SQL Server) where a compatible open-source alternative would serve the workload?
  • Review third-party marketplace subscriptions for active usage

Tagging and allocation gaps

  • Define a mandatory tag schema covering: environment, team, cost center, and project
  • Identify the percentage of spend that is currently untagged or incorrectly tagged
  • Set up budget alerts per tag dimension so teams see their own spend

Without tagging, cost ownership is invisible. Without ownership, there is no accountability loop.
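
The untagged share of spend is straightforward to measure from a resource export. The sketch below assumes each resource carries a `cost` and a `tags` dictionary, and uses the four mandatory tags listed above. The exact key spellings and resource records are hypothetical.

```python
from typing import Dict, List, Set

# Mandatory tag schema from the checklist; the key spellings are an assumption.
REQUIRED_TAGS = {"environment", "team", "cost_center", "project"}


def missing_tags(tags: Dict[str, str]) -> Set[str]:
    """Required tags that are absent or empty on a resource."""
    present = {key for key, value in tags.items() if value}
    return REQUIRED_TAGS - present


def untagged_spend_pct(resources: List[Dict]) -> float:
    """Percentage of total spend on resources missing at least one required tag."""
    total = sum(r["cost"] for r in resources)
    if total == 0:
        return 0.0
    untagged = sum(r["cost"] for r in resources if missing_tags(r["tags"]))
    return 100.0 * untagged / total


# Hypothetical resources.
resources = [
    {"cost": 600.0, "tags": {"environment": "prod", "team": "payments",
                             "cost_center": "cc-12", "project": "checkout"}},
    {"cost": 300.0, "tags": {"environment": "dev", "team": "data",
                             "cost_center": "cc-07"}},   # project missing
    {"cost": 100.0, "tags": {}},                          # no tags at all
]

print(untagged_spend_pct(resources))  # 40.0
```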

Turning the checklist into a process

A one-time audit produces a list of savings opportunities. A process ensures they compound. The most effective pattern is a weekly calibration: a 30-minute review of the previous week's spend against a defined baseline, with a bias toward acting on findings immediately rather than parking them in a backlog.
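
The weekly review can start from a short list of variances instead of the full bill. The sketch below compares last week's spend per team against a baseline and returns anything above a tolerance, plus anything with no baseline at all. The 10% tolerance, team names, and figures are assumptions for illustration.

```python
from typing import Dict, List, Optional, Tuple


def weekly_variances(baseline: Dict[str, float],
                     actual: Dict[str, float],
                     tolerance: float = 0.10) -> List[Tuple[str, float, Optional[float]]]:
    """Return (key, spend, relative change) for spend above baseline by more than `tolerance`.

    Keys with no usable baseline are reported with a change of None.
    """
    findings = []
    for key in sorted(actual):
        spend = actual[key]
        base = baseline.get(key)
        if not base:
            findings.append((key, spend, None))  # new or unbaselined spend
            continue
        change = (spend - base) / base
        if change > tolerance:
            findings.append((key, spend, change))
    return findings


# Hypothetical weekly figures per team.
baseline = {"platform": 1000.0, "data": 2000.0}
last_week = {"platform": 1250.0, "data": 2050.0, "ml": 400.0}

print(weekly_variances(baseline, last_week))
# [('ml', 400.0, None), ('platform', 1250.0, 0.25)]
```

Keeping the output short supports the bias toward acting in the meeting: each finding should leave with an owner and a decision.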

The teams that control their cloud spend most effectively are not the ones that run the largest optimization projects. They are the ones that treat cost visibility as an operational discipline.
