Best practices for AWS network latency optimisation: Get data to users faster

Latency is distance plus processing time. In AWS, the geography question comes first: if your users are in the UK and your workload is in us-east-1, you are paying a latency penalty on every request that no amount of optimisation within the region will overcome. Start with correct region selection, then apply the techniques below.

Start with region and Availability Zone placement

Deploy workloads in the AWS region closest to the majority of your users. For users in the UK, eu-west-2 (London) provides the lowest baseline latency. eu-west-1 (Ireland) is a reasonable alternative: baseline latency from UK endpoints is slightly higher, but it is a longer-established region with broader service availability.

Within a region, communication between AZs adds latency compared with communication inside a single AZ. Place latency-sensitive components that communicate heavily (application server to database, cache to application) in the same AZ. For high availability, deploy across AZs, accepting that calls between components in different AZs will see slightly higher latency.
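The AZ penalty matters most for chatty request paths, because sequential calls pay the extra round trip one after another. The sketch below makes that arithmetic explicit; the 40 queries and the 0.5ms of extra round-trip time per cross-AZ call are hypothetical figures, not AWS measurements, so substitute numbers measured in your own VPC.

```python
def added_network_ms(sequential_calls: int, extra_rtt_ms: float) -> float:
    """Extra latency per user request when each sequential call crosses an AZ.

    Sequential calls wait for each other, so the penalty grows linearly
    with the number of calls on the request path.
    """
    if sequential_calls < 0 or extra_rtt_ms < 0:
        raise ValueError("inputs must be non-negative")
    return sequential_calls * extra_rtt_ms


# Hypothetical figures: 40 sequential database queries per page view,
# 0.5 ms of extra round-trip time per query when the database is in another AZ.
print(added_network_ms(40, 0.5))  # 20.0
```

A sub-millisecond difference per call becomes tens of milliseconds per page, which is why co-locating the chattiest pairs pays off even when the rest of the stack is spread across AZs.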

Avoid deploying latency-sensitive workloads across regions unless geographic distribution is a deliberate architectural decision (active-active multi-region, data residency requirements). Cross-region round trips add anywhere from around 10ms between neighbouring regions to well over 100ms between continents, depending on the region pair.
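The distance component has a hard floor set by physics. A minimal sketch, assuming light travels through optical fibre at roughly 200 km per millisecond (about two-thirds of its vacuum speed) and using approximate great-circle distances rather than real cable routes, estimates the best-case round-trip time between two points:

```python
# Light in optical fibre travels at roughly two-thirds of its vacuum speed,
# about 200 km per millisecond.
FIBRE_KM_PER_MS = 200.0


def min_rtt_ms(distance_km: float, route_factor: float = 1.0) -> float:
    """Lower bound on round-trip time over fibre, ignoring all processing.

    route_factor > 1 models cable paths that are longer than the
    great-circle distance between the two endpoints.
    """
    if distance_km < 0 or route_factor < 1.0:
        raise ValueError("distance must be >= 0 and route_factor >= 1")
    one_way_ms = distance_km * route_factor / FIBRE_KM_PER_MS
    return 2 * one_way_ms


# Approximate great-circle distances (illustrative, not measured paths):
# London to Dublin ~460 km, London to northern Virginia ~5,900 km.
print(min_rtt_ms(460))   # 4.6
print(min_rtt_ms(5900))  # 59.0
```

Measured round trips are higher because of routing detours, queueing and equipment delay, but the bound shows why a UK user talking to us-east-1 pays tens of milliseconds per round trip that no in-region tuning can remove.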

Use EC2 placement groups for high-throughput compute

For workloads requiring the lowest possible latency between instances (HPC, distributed databases, tightly coupled microservices), EC2 placement groups control where instances are physically placed within AWS infrastructure.

Cluster placement group: Places instances in a single rack or set of racks within one AZ to maximise network throughput and minimise latency between instances. Typical latency is measured in tens of microseconds. Best for: HPC jobs, distributed machine learning training, any workload where intra-cluster bandwidth (up to 100 Gbps with supported instance types) and latency matter more than AZ redundancy.

Spread placement group: Places each instance on separate underlying hardware, maximising isolation. Each instance is on a different rack with independent power and networking. Best for a small number of critical instances where hardware failure isolation matters.

Partition placement group: Divides instances into partitions, each in a separate rack. Multiple instances can be in the same partition but partitions are isolated from each other. Best for large distributed systems (Kafka, Cassandra, HDFS) where rack-aware placement matters for data durability.
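The three strategies map onto the EC2 CreatePlacementGroup API. The sketch below only builds the keyword arguments, so it runs without AWS credentials; the group names are hypothetical, and the boto3 calls in the trailing comment (`create_placement_group`, and `run_instances` with a `Placement` argument) show how the result would be used rather than something executed here. The limit of seven partitions per AZ reflects the EC2 documentation at the time of writing; check the current quotas for your account.

```python
from typing import Optional

VALID_STRATEGIES = {"cluster", "spread", "partition"}


def placement_group_kwargs(name: str, strategy: str,
                           partition_count: Optional[int] = None) -> dict:
    """Build keyword arguments for the EC2 CreatePlacementGroup API.

    Intended to be passed as ec2_client.create_placement_group(**kwargs).
    """
    if strategy not in VALID_STRATEGIES:
        raise ValueError(f"unknown strategy: {strategy!r}")
    kwargs = {"GroupName": name, "Strategy": strategy}
    if strategy == "partition":
        # Partition groups support up to seven partitions per AZ.
        if partition_count is None or not 1 <= partition_count <= 7:
            raise ValueError("partition strategy needs partition_count in 1..7")
        kwargs["PartitionCount"] = partition_count
    elif partition_count is not None:
        raise ValueError("partition_count only applies to the partition strategy")
    return kwargs


# Hypothetical usage with boto3 (needs AWS credentials, not run here):
#   ec2 = boto3.client("ec2", region_name="eu-west-2")
#   ec2.create_placement_group(**placement_group_kwargs("kafka-brokers", "partition", 3))
#   ec2.run_instances(..., Placement={"GroupName": "kafka-brokers"})
```

Validating the partition count locally is a design choice: it surfaces a misconfiguration before any API call is made, rather than relying on the API error.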

For standard web application workloads, placement groups are not necessary. The latency benefit is only noticeable for workloads where instances communicate extremely frequently or need guaranteed high-bandwidth intra-cluster networking.

Enable Enhanced Networking

Enhanced Networking uses hardware virtualisation (SR-IOV) to provide higher bandwidth, lower latency, and lower CPU utilisation compared to traditional virtualised networking. It is available on most current-generation EC2 instance types and is enabled by default on newer instance families.

Verify Enhanced Networking status on existing instances:

aws ec2 describe-instances \
  --instance-ids i-xxxx \
  --query 'Reservations[*].Instances[*].EnaSupport'

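If the instance was launched from an older AMI or uses an older instance type, Enhanced Networking may not be enabled. Current-generation instance types (m6i, m7i, c6i, c7i families) use the Elastic Network Adapter (ENA), but the AMI must include the ENA driver and have ENA support enabled. Most current Amazon Linux, Ubuntu and Windows AMIs meet both requirements, so migrating to one of these instance types with a current AMI gives you Enhanced Networking without further configuration.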
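Checking one instance at a time does not scale to a fleet. A minimal sketch, assuming you have saved or piped the JSON output of `aws ec2 describe-instances --output json`, lists every instance whose `EnaSupport` flag is not true; the instance IDs in the sample are hypothetical.

```python
import json


def instances_without_ena(describe_output: dict) -> list:
    """Return IDs of instances whose EnaSupport flag is not true.

    describe_output is the parsed JSON from `aws ec2 describe-instances`.
    The flag can be absent on older instances, so absence counts as False.
    """
    missing = []
    for reservation in describe_output.get("Reservations", []):
        for instance in reservation.get("Instances", []):
            if not instance.get("EnaSupport", False):
                missing.append(instance["InstanceId"])
    return missing


# Hypothetical sample shaped like describe-instances output.
sample = json.loads(
    '{"Reservations": [{"Instances": ['
    '{"InstanceId": "i-aaa", "EnaSupport": true},'
    '{"InstanceId": "i-bbb"}]}]}'
)
print(instances_without_ena(sample))  # ['i-bbb']
```

In practice you would pass `json.load(sys.stdin)` with the CLI output piped in, and treat each reported ID as a candidate for an instance type or AMI upgrade.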

For the highest network performance requirements, Elastic Fabric Adapter (EFA) provides OS-bypass networking for HPC and machine learning workloads, reducing latency to single-digit microseconds for supported instance types.

Use CloudFront for static and cacheable content

CloudFront is AWS's global CDN, with edge locations in 30+ countries. A response cached at an edge location is served from there, so the user pays only the edge-to-user round trip rather than the full round trip to the origin.

For UK users accessing content from a London-region origin, the latency gain from CloudFront is modest: the London edge locations are already close to the origin, so cached responses mainly save origin processing time. Edge locations in Frankfurt, Paris and Amsterdam do more for users in continental Europe. The benefit grows with the user's distance from the origin: a user in Australia served a cached response from a Sydney edge location gets substantially better performance than one making a direct request to eu-west-2.

What to cache with CloudFront:

  • Static assets (images, CSS, JavaScript): set long TTLs (one year, with the immutable Cache-Control directive, for versioned file names)
  • API responses that are the same for all users or for categories of users: set TTLs matching how long each response stays valid
  • HTML for largely static pages: set lower TTLs matching your content update frequency
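Those policies are usually expressed as Cache-Control headers on the origin's responses, which CloudFront honours within the minimum and maximum TTLs of the distribution's cache policy. The helper below is a hypothetical sketch: the category names and the `cache_control` function are illustrative, and the TTL for shared responses is yours to choose. The `immutable` directive is only safe when the file name changes whenever the content does.

```python
ONE_YEAR_SECONDS = 365 * 24 * 60 * 60  # 31,536,000


def cache_control(kind: str, ttl_seconds: int = 0) -> str:
    """Return a Cache-Control header value for a content category (hypothetical categories)."""
    if kind == "static":
        # Versioned assets (e.g. app.3f9c2.js) never change under the same URL.
        return f"public, max-age={ONE_YEAR_SECONDS}, immutable"
    if kind in ("shared_api", "html"):
        # Shared responses: the TTL should match how long the content stays valid.
        if ttl_seconds <= 0:
            raise ValueError("shared responses need a positive TTL")
        return f"public, max-age={ttl_seconds}"
    if kind == "personalised":
        # Keep per-user responses out of shared caches entirely.
        return "private, no-store"
    raise ValueError(f"unknown content kind: {kind!r}")


print(cache_control("static"))     # public, max-age=31536000, immutable
print(cache_control("html", 300))  # public, max-age=300
```

Making "personalised" an explicit category that returns `private, no-store` is a deliberate default: it is safer to opt responses into shared caching than to opt them out.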

Do not attempt to cache personalised content or authenticated session-specific responses without very careful cache key design. A misconfigured CloudFront distribution serving the wrong user's data to another user is a data breach.

Optimise TCP connection handling

Each new TCP connection requires a three-way handshake, costing one round trip before any data is sent. For HTTPS, the TLS handshake adds further round trips: two for a full TLS 1.2 handshake, one for TLS 1.3. Repeated for every request, these round trips add measurable latency, especially for distant users.
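A rough model makes the cost concrete. This sketch counts only network round trips before the first response byte (TCP handshake, TLS handshake, then the request itself), ignores server processing, TCP Fast Open and TLS 1.3 0-RTT resumption, and uses a hypothetical 75ms round trip to stand in for a distant user.

```python
# Round trips needed before the first byte of the HTTP response arrives.
TCP_HANDSHAKE_RTTS = 1
TLS_HANDSHAKE_RTTS = {"1.2": 2, "1.3": 1}  # full handshakes, no resumption
REQUEST_RTTS = 1  # the HTTP request and the start of its response


def time_to_first_byte_ms(rtt_ms: float, tls_version: str, new_connection: bool) -> float:
    """Network-only TTFB estimate; a reused connection skips both handshakes."""
    rtts = REQUEST_RTTS
    if new_connection:
        rtts += TCP_HANDSHAKE_RTTS + TLS_HANDSHAKE_RTTS[tls_version]
    return rtts * rtt_ms


RTT_MS = 75  # hypothetical round trip for a distant user
print(time_to_first_byte_ms(RTT_MS, "1.2", new_connection=True))   # 300
print(time_to_first_byte_ms(RTT_MS, "1.3", new_connection=True))   # 225
print(time_to_first_byte_ms(RTT_MS, "1.3", new_connection=False))  # 75
```

The model shows that connection reuse saves more than the TLS version upgrade does, and that both savings scale with round-trip time.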

Connection reuse: Ensure your application and load balancers use HTTP/1.1 keep-alive or HTTP/2 connections rather than opening a new connection per request. Application Load Balancers support HTTP/2 to clients by default. Verify your backend connections from the ALB to the application servers are also using keep-alive.
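To see keep-alive working, count how many TCP connections a client actually opens. This self-contained sketch uses only Python's standard library: a throwaway server on 127.0.0.1 stands in for your application server, and each distinct client port it records is one TCP connection. The class and function names are hypothetical.

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer


class PortRecordingHandler(BaseHTTPRequestHandler):
    # HTTP/1.1 makes http.server keep the connection open between requests.
    protocol_version = "HTTP/1.1"
    client_ports: set = set()

    def do_GET(self):
        # Each TCP connection has its own client port: distinct ports = connections.
        PortRecordingHandler.client_ports.add(self.client_address[1])
        body = b"ok"
        self.send_response(200)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep the demo quiet


def connections_used(requests: int, reuse: bool) -> int:
    """Send `requests` GETs to a local server and count the TCP connections opened."""
    PortRecordingHandler.client_ports = set()
    server = HTTPServer(("127.0.0.1", 0), PortRecordingHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    try:
        conn = http.client.HTTPConnection("127.0.0.1", server.server_port)
        for _ in range(requests):
            if not reuse:
                # Simulate a client that opens a fresh connection per request.
                conn.close()
                conn = http.client.HTTPConnection("127.0.0.1", server.server_port)
            conn.request("GET", "/")
            conn.getresponse().read()  # drain the body so the socket can be reused
        conn.close()
    finally:
        server.shutdown()
        server.server_close()
    return len(PortRecordingHandler.client_ports)


print(connections_used(5, reuse=True))   # 1
print(connections_used(5, reuse=False))  # one connection, and one handshake, per request
```

Over loopback the difference costs almost nothing; over a real network, every extra connection adds the handshake round trips described above.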

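TLS version and session resumption: A full TLS 1.3 handshake needs one round trip, compared with two for TLS 1.2, and resuming a previous session shortens the handshake further (to one round trip in TLS 1.2, and optionally zero in TLS 1.3 with early data). AWS CloudFront, ALB and API Gateway all support TLS 1.3; on ALB, choose a security policy that includes it. Ensure your origin servers also support TLS 1.3 for connections from the load balancer.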

Global Accelerator for distant users: AWS Global Accelerator brings user traffic onto AWS's global network backbone at the edge location nearest the user, instead of carrying it across the public internet all the way to your region. AWS cites latency improvements of up to 60% in some cases. It is most beneficial for latency-sensitive applications with users in multiple geographic regions.

Measure before and after

Latency optimisation without measurement is guesswork. Establish baseline measurements for the specific latency metrics that matter to your application:

  • P50, P95, P99 request latency from your ALB or API Gateway metrics
  • DNS resolution time (relevant if you are changing CloudFront or Route 53 configurations)
  • Time to First Byte (TTFB) for content delivery optimisation
  • Database query latency (separate from application latency; often the dominant component)
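Percentiles matter because averages hide the slow tail. When you are working from raw samples (for example, timings extracted from load balancer access logs), a nearest-rank percentile is enough for a baseline. The latency distribution below is hypothetical, chosen to show how a healthy-looking mean can sit alongside a slow P99.

```python
import math
from typing import Sequence


def percentile(samples: Sequence[float], p: float) -> float:
    """Nearest-rank percentile: the smallest sample with at least p% of samples <= it."""
    if not samples:
        raise ValueError("no samples")
    if not 0 < p <= 100:
        raise ValueError("p must be in (0, 100]")
    ordered = sorted(samples)
    rank = math.ceil(p * len(ordered) / 100)  # 1-based rank
    return ordered[rank - 1]


# Hypothetical request latencies in ms: mostly fast, with a slow tail.
latencies_ms = [20] * 95 + [200] * 4 + [1500]

print({p: percentile(latencies_ms, p) for p in (50, 95, 99)})  # {50: 20, 95: 20, 99: 200}
print(sum(latencies_ms) / len(latencies_ms))  # 42.0
```

The mean of 42ms describes no real request, while the P99 of 200ms shows what one user in a hundred experiences; record the same percentiles before and after each change so the comparison is like for like.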

Measure from the user's perspective using CloudWatch Synthetics: create canary scripts that simulate user requests from specific AWS regions and record latency. This gives you a geographic view of your latency profile.

Where Critical Cloud comes in

Latency analysis for production workloads requires correlating network metrics, application trace data, and database query profiles simultaneously. An improvement at the network layer may expose a bottleneck at the database layer that was previously masked. As the world's first Powered by Datadog accredited partner, we provide correlated network, application, and infrastructure performance data in a single view, making latency root cause analysis tractable rather than a multi-tool investigation. See how Critical Support works.