AWS latency: Root causes and how to fix them fast

Latency is invisible until it's not. Your users notice every extra 100 milliseconds. It affects sales, engagement, and whether people stay or leave. Most latency problems come from three things: geography, network setup, and underprovisioned resources. All fixable. Usually faster than you think.

The three fundamental causes

Distance is the first cause. Data in fibre optic cable travels at roughly two-thirds the speed of light in a vacuum. That is fast. But the furthest points on Earth are still about 20,000 kilometres apart. That's 67 milliseconds one way at vacuum light speed, and closer to 100 milliseconds in fibre. Add router hops, congestion, and queuing, and you're at 150-200 milliseconds. Your application is fine. The network isn't.
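The arithmetic behind that figure is easy to check. In the sketch below, the 20,000 km distance comes from the text; the fibre factor of 0.67 is an assumed typical value for light in glass, which travels slower than in a vacuum.

```python
# Back-of-envelope propagation delay. Distances are straight-line
# approximations; real cable routes are longer.
C_VACUUM_KM_PER_S = 299_792.458  # speed of light in a vacuum
FIBRE_FACTOR = 0.67              # assumption: light in glass moves at roughly 2/3 c


def one_way_delay_ms(distance_km: float, speed_factor: float = 1.0) -> float:
    """Time for a signal to cover distance_km, in milliseconds."""
    return distance_km / (C_VACUUM_KM_PER_S * speed_factor) * 1000.0


vacuum_ms = one_way_delay_ms(20_000)
fibre_ms = one_way_delay_ms(20_000, FIBRE_FACTOR)
print(f"vacuum: {vacuum_ms:.0f} ms, fibre: {fibre_ms:.0f} ms one way")
```

This is propagation time only. A round trip doubles it, before any router or server adds its own delay.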

Network setup is the second cause. Your application might be in the right region, but routing is bad. Traffic bounces through unnecessary hops. Firewalls and NAT gateways add latency. Public internet routing is unpredictable. Some packets take fast paths, some slow. Users see inconsistent response times.

Resource underprovisioning is the third cause. Your instance is saturated. Requests queue. A request that should take 50ms waits 500ms for the instance to finish processing the previous request. The instance itself is the bottleneck, not the network.
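A rough way to see why a busy instance gets slow so quickly is a textbook single-server queue (M/M/1). That model is an assumption made here for illustration: real instances serve requests concurrently and traffic is rarely perfectly random, so treat the numbers as a shape, not a prediction.

```python
def mean_response_ms(service_ms: float, utilisation: float) -> float:
    """Mean time in system (waiting plus service) for an M/M/1 queue."""
    if not 0.0 <= utilisation < 1.0:
        raise ValueError("utilisation must be in [0, 1)")
    return service_ms / (1.0 - utilisation)


# A request that needs 50 ms of actual work, at rising load.
for u in (0.5, 0.8, 0.9, 0.95):
    print(f"{u:.0%} busy -> {mean_response_ms(50, u):.0f} ms")
```

Under this model a 50 ms request averages about 500 ms end to end once the server is 90% busy. Latency does not grow in a straight line with load. It climbs steeply as utilisation nears 100%.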

Measure before fixing

Guess wrong about where latency comes from and you'll spend months on the wrong optimization. Use CloudWatch to measure. Route 53 health checks, with latency measurement enabled, show latency from different regions to your endpoints. Compare. If London to your app is 15ms and Frankfurt to your app is 50ms, geography is your problem, not resources.
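For a quick comparison before setting anything up in AWS, a small probe run from hosts in different locations gives a first approximation. This is a stand-in sketch, not Route 53: it times a TCP handshake (plus DNS lookup) using only the standard library, and the function names are made up for illustration.

```python
import socket
import statistics
import time


def tcp_connect_ms(host: str, port: int = 443, timeout: float = 2.0) -> float:
    """Time to resolve the host and complete one TCP handshake, in ms."""
    start = time.perf_counter()
    with socket.create_connection((host, port), timeout=timeout):
        pass  # connection established; close it straight away
    return (time.perf_counter() - start) * 1000.0


def median_connect_ms(host: str, port: int = 443, samples: int = 5) -> float:
    """Median of several probes, to damp one-off spikes."""
    return statistics.median(tcp_connect_ms(host, port) for _ in range(samples))
```

Run `median_connect_ms("your-endpoint.example.com")` from a machine near each user population, with your own hostname in place of the placeholder. A large gap between locations points at geography or routing. Similar numbers everywhere point back at the application or its resources.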

VPC Flow Logs show network path. Enable them. Parse them. Find unexpected routing or packet drops.
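A minimal parsing sketch, assuming the default (version 2) flow log format with its 14 space-separated fields. The sample records are made up for illustration. Note that REJECT means traffic was denied by a security group or network ACL, which is one kind of drop, not all of them.

```python
from collections import Counter

# Field order of the default (version 2) flow log record.
FIELDS = (
    "version account_id interface_id srcaddr dstaddr srcport dstport "
    "protocol packets bytes start end action log_status"
).split()


def parse_record(line: str) -> dict:
    """Split one flow log line into a dict keyed by field name."""
    values = line.split()
    if len(values) != len(FIELDS):
        raise ValueError(f"expected {len(FIELDS)} fields, got {len(values)}")
    return dict(zip(FIELDS, values))


def rejected_flows(lines) -> Counter:
    """Count REJECT records per (srcaddr, dstaddr, dstport)."""
    counts = Counter()
    for line in lines:
        rec = parse_record(line)
        if rec["action"] == "REJECT":
            counts[(rec["srcaddr"], rec["dstaddr"], rec["dstport"])] += 1
    return counts


# Made-up sample records, for illustration only.
SAMPLE = [
    "2 123456789012 eni-0abc 10.0.1.5 10.0.2.9 44321 5432 6 10 840 1700000000 1700000060 ACCEPT OK",
    "2 123456789012 eni-0abc 10.0.1.5 10.0.3.7 44322 6379 6 3 180 1700000000 1700000060 REJECT OK",
    "2 123456789012 eni-0abc 10.0.1.5 10.0.3.7 44323 6379 6 3 180 1700000060 1700000120 REJECT OK",
]
print(rejected_flows(SAMPLE).most_common())
```

Repeated rejects towards a cache or database port usually mean a rule is blocking traffic the application then retries, and retries show up as latency.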

AWS X-Ray traces requests through your system. Instrument your application. See where time is spent. Database? Network? Application code?

CloudWatch also tracks infrastructure metrics. CPU, network throughput, disk latency. Memory too, once the CloudWatch agent is installed on EC2. Look at them when latency is high. If CPU sits around 80% or more while latency is high, you need more instances or a faster instance type.

Geography: Pick the right region

If most of your users are in the UK, use eu-west-2 (London). If they're in Europe, use eu-west-1 (Ireland) or eu-central-1 (Frankfurt). If they're global, you need multi-region or edge caching.

Multi-AZ within a region is fine for redundancy. The cost is small: traffic within the same AZ is typically sub-millisecond, and cross-AZ traffic is in the low single-digit milliseconds.

AWS Local Zones bring a subset of AWS services to large cities that sit further from a full region. If your users are concentrated in a city with a Local Zone and you need the lowest possible latency, run there. Cost is higher. Service availability is limited. Use for applications that can't tolerate more than a few milliseconds.

For global applications, don't run workloads in every region. That's expensive. Use edge services instead.

Network: Fix routing and connectivity

Public internet routing is unpredictable. A packet from London to your app in Frankfurt might bounce through Amsterdam, Berlin, Munich, and finally Frankfurt. Or it might go direct. AWS doesn't control routing on the public internet. Paths vary from one connection to the next.

AWS Global Accelerator fixes this. Traffic enters at the nearest AWS edge location and travels the rest of the way over AWS's private backbone network. AWS controls most of the path. Fast, consistent, reliable. AWS cites performance improvements of up to 60%, without changing your application.

CloudFront is for caching (static content, APIs). Global Accelerator is for routing (non-HTTP traffic, dynamic content, lowest latency).

AWS Direct Connect is a private connection from your on-premises network to AWS. Faster than public internet. More consistent. Costs more. Use if you have large amounts of data or latency-sensitive applications.

For non-critical workloads, optimise within VPC first. Use gateway VPC endpoints for S3 and DynamoDB (free). Those keep traffic on the AWS network instead of sending it out through a NAT gateway.

Resources: Provision properly

Right-size your instances. If CPU sits at 80%, your app is probably queueing requests. Scale up to a larger instance, or scale out with more instances.

Use instance types that fit your workload. Compute-optimised instances for CPU-heavy work. Memory-optimised for databases. Network-optimised for high throughput.

Use enhanced networking (ENA for EC2, network optimisation for RDS). Single-digit percentage improvement, but it's free.

For databases, query performance matters as much as instance size. Add indexes. Avoid full table scans. Monitor slow queries. Fix them.

Caching: Eliminate repeated work

ElastiCache (Redis or Memcached) stores frequently accessed data in memory. Sub-millisecond access time. Reduces database load and latency simultaneously.

Common pattern: application asks database for product info. Takes 5ms. ElastiCache stores it. Next time, takes 0.1ms. 50x faster.
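The 50x figure applies to a cache hit. What users see on average depends on the hit ratio. A small sketch of that arithmetic, using the 5 ms and 0.1 ms figures above and assuming a miss pays for the cache lookup and then the database query:

```python
def effective_latency_ms(hit_ratio: float,
                         cache_ms: float = 0.1,
                         db_ms: float = 5.0) -> float:
    """Average read latency when hit_ratio of reads are served from cache."""
    if not 0.0 <= hit_ratio <= 1.0:
        raise ValueError("hit_ratio must be between 0 and 1")
    hit_cost = cache_ms            # served straight from memory
    miss_cost = cache_ms + db_ms   # failed lookup, then the database
    return hit_ratio * hit_cost + (1.0 - hit_ratio) * miss_cost


for ratio in (0.0, 0.5, 0.9, 0.99):
    print(f"hit ratio {ratio:.0%}: {effective_latency_ms(ratio):.2f} ms")
```

At a 90% hit ratio the average is about 0.6 ms, roughly 8x better than the database alone. The last few percent of hit ratio matter more than they look.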

Downside is complexity. You now have to invalidate the cache when data changes. Data can be stale. Worth it for read-heavy workloads. Not worth it for write-heavy workloads.
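A minimal cache-aside sketch shows where the complexity comes from. Plain dictionaries stand in for ElastiCache and the database here, and the class and key names are hypothetical. A real client would talk to Redis or Memcached over the network.

```python
import time


class CacheAside:
    """Cache-aside reads with a TTL and explicit invalidation on write."""

    def __init__(self, database: dict, ttl_seconds: float = 60.0):
        self.database = database      # stand-in for the real database
        self.ttl_seconds = ttl_seconds
        self._cache = {}              # key -> (value, expiry time)

    def get(self, key):
        entry = self._cache.get(key)
        if entry is not None and entry[1] > time.monotonic():
            return entry[0]           # cache hit
        value = self.database[key]    # cache miss: read the source of truth
        self._cache[key] = (value, time.monotonic() + self.ttl_seconds)
        return value

    def update(self, key, value):
        self.database[key] = value    # write to the database first
        self._cache.pop(key, None)    # then drop the stale cached copy


db = {"product:1": {"name": "Widget", "price": 10}}
store = CacheAside(db)
store.get("product:1")                                    # miss, fills the cache
store.update("product:1", {"name": "Widget", "price": 12})
print(store.get("product:1")["price"])                    # re-read after invalidation
```

Any write that bypasses `update` leaves the old value in the cache until the TTL expires. That is the staleness trade-off in practice.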

CloudFront caches static content and API responses. Serves from edge locations near users. Massive latency improvement for content delivery.

Monitoring: Know when you're slow

Set up CloudWatch dashboards that show latency metrics. P50, P99. Average latency is misleading. P99 matters. If P99 is 5 seconds and average is 200ms, you have a spiky workload. That spikiness frustrates users.
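A small worked example of why the average hides the problem. The latencies are made up, and the percentile uses the simple nearest-rank method, which may differ slightly from how a monitoring tool interpolates.

```python
import math
import statistics


def percentile(samples, pct: float):
    """Nearest-rank percentile: smallest sample with pct% of values at or below it."""
    ordered = sorted(samples)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


# Made-up latencies in ms: 98 fast requests and 2 very slow ones.
latencies_ms = [100] * 98 + [5000] * 2

print("mean:", statistics.mean(latencies_ms), "ms")
print("p50: ", percentile(latencies_ms, 50), "ms")
print("p99: ", percentile(latencies_ms, 99), "ms")
```

The mean lands just under 200 ms and the median at 100 ms, yet one request in fifty takes five seconds. Only the P99 shows it.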

Set alarms on latency thresholds. If P99 exceeds your SLA, alert. Act before users complain.
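As a sketch, a P99 alarm on an Application Load Balancer could be defined like this. It assumes the app sits behind an ALB and that boto3 is installed. The alarm name, load balancer identifier, and the five-minute window are illustrative choices, not recommendations.

```python
def p99_alarm_params(load_balancer: str, threshold_seconds: float) -> dict:
    """Keyword arguments for CloudWatch put_metric_alarm on ALB p99 latency."""
    return {
        "AlarmName": "api-p99-latency",          # hypothetical name
        "Namespace": "AWS/ApplicationELB",
        "MetricName": "TargetResponseTime",      # reported in seconds
        "Dimensions": [{"Name": "LoadBalancer", "Value": load_balancer}],
        "ExtendedStatistic": "p99",              # percentile, not Average
        "Period": 60,                            # one-minute data points
        "EvaluationPeriods": 5,                  # breach for 5 minutes in a row
        "Threshold": threshold_seconds,
        "ComparisonOperator": "GreaterThanThreshold",
        "TreatMissingData": "notBreaching",
    }


def create_alarm(params: dict) -> None:
    """Create the alarm. Needs boto3 and AWS credentials."""
    import boto3  # imported here so the sketch loads without the SDK
    boto3.client("cloudwatch").put_metric_alarm(**params)


# Hypothetical load balancer identifier and a 1 second SLA.
params = p99_alarm_params("app/my-alb/0123456789abcdef", 1.0)
print(params["MetricName"], params["ExtendedStatistic"], params["Threshold"])
```

Add an `AlarmActions` entry with an SNS topic ARN to have the alarm notify someone. Without one it only changes state.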

Real user measurement (RUM) is important. Synthetic tests show what your app does in controlled conditions. Real users show what it does in the wild. Network congestion, browser rendering, content blocking extensions. All real users experience this. Synthetic tests miss it.

Where Critical Cloud comes in

Most teams guess at where latency comes from. Guess wrong, you waste months. Observability matters. You need to see exactly where time is spent and why latency changed.

We're a Powered by Datadog accredited partner. We instrument latency deeply. Trace latency from user to application. See database query time. See network round trips. See everything.

If your application is slow or you're not sure where latency is coming from, see how Critical Support works.