Weighted routing in Route 53: A practical guide to traffic splitting

Weighted routing is the simplest way to split traffic between endpoints using DNS. You give each target a weight (0 to 255), Route 53 calculates the ratio, and it answers DNS queries in that proportion. It works. It's cheap. It doesn't care about geography or latency. That's the appeal and the limitation.

How weighted routing actually works

When you query a weighted Route 53 record, Route 53 picks one of the configured endpoints based on the weights you've set. The math is straightforward: if you weight three servers at 1, 2, and 3 (total of 6), the first gets roughly 17% of traffic, the second 33%, and the third 50%.

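The actual numbers don't matter, as long as each weight stays within the 0 to 255 range. Weights of 1, 2, 3 produce the same split as 10, 20, 30 or 50, 100, 150. Because only the ratio counts, you can adjust traffic gradually by editing weights, without redeploying anything.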
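The share arithmetic above can be sketched in a few lines of Python. This is an illustration only: the function name and the server labels are made up for the example, and it models the expected share of DNS answers, not real traffic.

```python
def traffic_shares(weights):
    """Return each record's expected share of DNS answers as a fraction.

    `weights` maps a record label (hypothetical names) to its weight.
    """
    if not weights:
        return {}
    total = sum(weights.values())
    if total == 0:
        # When every weight in the group is zero, Route 53 returns
        # all records with equal probability.
        return {name: 1 / len(weights) for name in weights}
    return {name: w / total for name, w in weights.items()}


shares = traffic_shares({"server-a": 1, "server-b": 2, "server-c": 3})
for name, share in shares.items():
    print(f"{name}: {share:.1%}")
# server-a: 16.7%
# server-b: 33.3%
# server-c: 50.0%
```

Scaling every weight by the same factor leaves the result unchanged, which is why 1:2:3 and 10:20:30 behave identically.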

When you set a record's weight to zero, it normally receives no traffic. But paired with health checks, zero-weighted records become backups. If all your primary records fail health checks, Route 53 routes traffic to the zero-weighted backup. This is useful for graceful degradation, though it assumes your backup can actually handle production traffic.
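A rough model of that selection logic, assuming a simplified view of the service: healthy records with a non-zero weight are considered first, and zero-weighted records are only considered when none of those are healthy. The record dictionaries and the function name are hypothetical, and what happens when every record is unhealthy is deliberately not modelled.

```python
import random


def pick_record(records, rng=random):
    """Pick one record the way the text above describes (simplified).

    Each record is a dict with 'name', 'weight' and 'healthy' keys.
    Returns None when nothing is healthy; the real service has its own
    behaviour in that case, which this sketch does not attempt to copy.
    """
    healthy = [r for r in records if r["healthy"]]
    primaries = [r for r in healthy if r["weight"] > 0]
    if primaries:
        weights = [r["weight"] for r in primaries]
        return rng.choices(primaries, weights=weights, k=1)[0]
    backups = [r for r in healthy if r["weight"] == 0]
    if backups:
        return rng.choice(backups)
    return None


records = [
    {"name": "primary-1", "weight": 50, "healthy": False},
    {"name": "primary-2", "weight": 50, "healthy": False},
    {"name": "backup", "weight": 0, "healthy": True},
]
print(pick_record(records)["name"])  # prints "backup": both primaries are down
```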

When to use weighted routing

Weighted routing shines for a specific set of problems. Use it for blue-green and canary-style deployments where you shift traffic gradually from the old version to the new one. Send 90% to the stable version, 10% to the new one. Watch metrics. Adjust the weights. Flip to 0/100 when you're confident.

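Use it for A/B testing. Route different percentages of users to different application versions and compare behaviour. DNS caching means the same client usually gets routed to the same target for the lifetime of the TTL, but not beyond it. Once the cached answer expires, the client can land on the other variant, so handle stickiness in the application if a user must stay in one variant.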

Use it for cross-region load balancing when you don't care about latency and just want to balance capacity. Send 50% of traffic to your London data centre and 50% to your Frankfurt data centre based on cost or capacity, not geography.

Don't use weighted routing if you care about routing users to the geographically closest endpoint. That's latency-based routing. Don't use it if you're trying to serve region-specific content. Use geolocation routing for that.

Setting up weighted records

Log into the Route 53 console, select your hosted zone, and create a new record. Choose weighted as the routing policy. Enter your record name (www, api, whatever). Pick the record type (A, AAAA, CNAME). Enter the target IP or domain.

Now assign a weight. If you're setting up two records with equal distribution, give them both weight 50. If you want an 80/20 split, use 80 and 20. The absolute numbers don't matter, only the ratio.

Set a TTL. A lower TTL (60 seconds) means resolvers and clients pick up weight changes faster, but they query Route 53 more often. A higher TTL (300 seconds or more) reduces query cost but means changes propagate more slowly. Use the same TTL for every record in the weighted group.

Add a record ID. Make it descriptive: "production-london" or "canary-deployment". This helps you track which is which in logs and dashboards.
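If you'd rather script this than click through the console, the same settings map onto a change batch for the Route 53 API. The sketch below only builds the request dictionaries. The domain, the documentation-range IP addresses, the record IDs, and the hosted zone ID in the commented-out call are placeholders, not values from a real setup.

```python
def weighted_record_change(name, ip, set_identifier, weight, ttl=60,
                           health_check_id=None):
    """Build one UPSERT change for a weighted A record."""
    if not 0 <= weight <= 255:
        raise ValueError("weight must be between 0 and 255")
    record = {
        "Name": name,
        "Type": "A",
        "SetIdentifier": set_identifier,  # the record ID from the console
        "Weight": weight,
        "TTL": ttl,
        "ResourceRecords": [{"Value": ip}],
    }
    if health_check_id:
        record["HealthCheckId"] = health_check_id
    return {"Action": "UPSERT", "ResourceRecordSet": record}


change_batch = {
    "Comment": "80/20 split between two hypothetical targets",
    "Changes": [
        weighted_record_change("www.example.com.", "192.0.2.10",
                               "production-london", 80),
        weighted_record_change("www.example.com.", "192.0.2.20",
                               "canary-deployment", 20),
    ],
}

# Applying it needs boto3 and AWS credentials; the zone ID is a placeholder:
# import boto3
# boto3.client("route53").change_resource_record_sets(
#     HostedZoneId="Z_EXAMPLE", ChangeBatch=change_batch)
```

Keeping the builder separate from the API call makes it easy to review or diff a change before anything touches the hosted zone.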

Enable health checks. This is not optional. Create a health check for each weighted record that actually monitors the target. Point it at an HTTP endpoint that returns 200 when the service is healthy. Route 53 checks every 30 seconds by default, or every 10 seconds if you pay for fast checks. Once the check has failed the configured number of consecutive times (three by default), that record stops receiving traffic.
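As a sketch of what such a health check looks like through the API, the helper below builds the arguments for a basic HTTP check. The domain and the /health path are assumptions for illustration, and the failure threshold of 3 is passed explicitly so the intent is visible.

```python
import uuid


def http_health_check_request(domain, path="/health", port=80,
                              fast=False, failure_threshold=3):
    """Build keyword arguments for creating a basic HTTP health check."""
    return {
        # Any unique string; it makes retries of the same request safe.
        "CallerReference": str(uuid.uuid4()),
        "HealthCheckConfig": {
            "Type": "HTTP",
            "FullyQualifiedDomainName": domain,
            "Port": port,
            "ResourcePath": path,
            "RequestInterval": 10 if fast else 30,  # seconds
            "FailureThreshold": failure_threshold,
        },
    }


request = http_health_check_request("london.example.com")
print(request["HealthCheckConfig"]["RequestInterval"])  # 30

# Creating it needs boto3 and AWS credentials:
# import boto3
# boto3.client("route53").create_health_check(**request)
```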

For AWS targets like Application Load Balancers, use alias records and set "Evaluate Target Health" to Yes. Route 53 checks the target's health automatically.
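For the alias case, the record carries an AliasTarget instead of a TTL and a list of IP addresses. A minimal sketch, assuming placeholder values for the load balancer's DNS name and its hosted zone ID (look up the real ones for your load balancer and region):

```python
def weighted_alias_change(name, set_identifier, weight,
                          target_dns_name, target_hosted_zone_id):
    """Build one UPSERT change for a weighted alias A record."""
    return {
        "Action": "UPSERT",
        "ResourceRecordSet": {
            "Name": name,
            "Type": "A",
            "SetIdentifier": set_identifier,
            "Weight": weight,
            # Alias records take no TTL of their own.
            "AliasTarget": {
                "HostedZoneId": target_hosted_zone_id,
                "DNSName": target_dns_name,
                "EvaluateTargetHealth": True,
            },
        },
    }


change = weighted_alias_change(
    "api.example.com.", "production-london", 50,
    "my-alb-1234567890.eu-west-2.elb.amazonaws.com.",  # placeholder
    "Z_ALB_ZONE_PLACEHOLDER",                          # placeholder
)
print(change["ResourceRecordSet"]["AliasTarget"]["EvaluateTargetHealth"])  # True
```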

Testing before you deploy

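Before sending real traffic, test your configuration. Use nslookup or dig to query your domain repeatedly. Point the queries at one of the hosted zone's authoritative name servers, otherwise your local resolver will keep handing back the same cached answer until the TTL expires. If you've weighted two servers 1:1, you should see roughly equal distribution of IP addresses in the responses. If you see 9:1 distribution when you wanted 1:1, you've misconfigured the weights.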
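Once you've collected a batch of answers, a short script can do the tallying for you. The sketch below takes a plain list of returned IP addresses, however you gathered them (for example by running dig +short against an authoritative name server in a loop). The addresses and the 10-point tolerance are assumptions for illustration.

```python
from collections import Counter


def observed_split(answers):
    """Return the fraction of answers that pointed at each IP address."""
    counts = Counter(answers)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {ip: n / total for ip, n in counts.items()}


def split_matches(answers, expected, tolerance=0.1):
    """True if every expected share is within `tolerance` of what was seen."""
    observed = observed_split(answers)
    return all(abs(observed.get(ip, 0.0) - share) <= tolerance
               for ip, share in expected.items())


# 100 sample answers: 52 for one target, 48 for the other.
answers = ["192.0.2.10"] * 52 + ["192.0.2.20"] * 48
expected = {"192.0.2.10": 0.5, "192.0.2.20": 0.5}
print(split_matches(answers, expected))  # True
```

A small sample will always wobble around the configured ratio, so keep the tolerance generous or collect more answers before deciding the weights are wrong.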

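Simulate a failure. Stop the service on one of your targets or block its health check endpoint. Query the domain again. Route 53 should stop returning that IP once the health check has failed enough consecutive times: with the default threshold of three, that's roughly 90 seconds at the standard 30-second interval, or about 30 seconds with fast health checks. Then bring the service back and verify traffic resumes.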
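To set expectations for that test, it helps to estimate how long failover should take. The sketch below is a back-of-the-envelope model, not a guarantee: it assumes detection takes about the check interval multiplied by the failure threshold (three is the assumed default here), and that clients holding a cached answer need up to one further TTL to move.

```python
def failover_estimate_seconds(request_interval, failure_threshold=3, ttl=60):
    """Return (detection, worst_case) in seconds under the simple model above."""
    detection = request_interval * failure_threshold
    worst_case = detection + ttl  # cached answers linger for up to one TTL
    return detection, worst_case


print(failover_estimate_seconds(30))  # (90, 150) with standard checks
print(failover_estimate_seconds(10))  # (30, 90) with fast checks
```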

Test your zero-weight backup. If all your primary records are down, does Route 53 route to the backup? It should, but only if the backup's health check is passing.

Cost and performance considerations

Route 53 charges per hosted zone (about £0.40 per month) and per million queries (about £0.32). AWS bills in US dollars, so treat the sterling figures as approximate. Weighted routing doesn't cost extra, but health checks do. Each health check is about £0.50 per month for a standard check. If you have 10 weighted records with health checks, that's £5 per month in health check costs plus your query costs.
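The arithmetic is simple enough to keep in a helper. This sketch uses the approximate sterling figures quoted above as defaults. They are this article's rough numbers, not a price list, and the 5 million queries in the example is an arbitrary assumption.

```python
def monthly_cost_gbp(hosted_zones, million_queries, health_checks,
                     zone_price=0.40, query_price=0.32, check_price=0.50):
    """Estimate a monthly Route 53 bill from the approximate prices above."""
    return (hosted_zones * zone_price
            + million_queries * query_price
            + health_checks * check_price)


# One hosted zone, 5 million queries, 10 health-checked weighted records.
print(f"£{monthly_cost_gbp(1, 5, 10):.2f}")  # £7.00
```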

Queries are cached by DNS resolvers. This means your traffic distribution is approximate, not exact. If a resolver caches your response for 300 seconds, the clients it serves will all be directed to the same target for 300 seconds. This is why TTL matters. Lower TTL gives more uniform distribution but higher query costs.
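A toy simulation makes the effect visible. It assumes each resolver caches a single answer per TTL window and serves it to every client behind it. The resolver sizes, target names, and seed are invented for the example.

```python
import random


def client_split_for_one_ttl_window(resolver_clients, weights, rng):
    """Share of clients sent to each target during one TTL window.

    `resolver_clients` lists how many clients sit behind each resolver;
    `weights` maps target name to its Route 53 weight.
    """
    names = list(weights)
    totals = dict.fromkeys(names, 0)
    for clients in resolver_clients:
        # One cached answer per resolver decides where all its clients go.
        target = rng.choices(names, weights=[weights[n] for n in names], k=1)[0]
        totals[target] += clients
    all_clients = sum(resolver_clients)
    return {n: totals[n] / all_clients for n in names}


rng = random.Random(42)
weights = {"stable": 50, "canary": 50}
few_big = [5000, 3000, 2000]   # three large resolvers
many_small = [10] * 1000       # a thousand small ones
print(client_split_for_one_ttl_window(few_big, weights, rng))
print(client_split_for_one_ttl_window(many_small, weights, rng))
```

With a handful of large resolvers the split can land far from 50/50 in any one window. With many small resolvers it settles close to the configured ratio.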

Weighted routing doesn't account for latency. A user in London can be routed to a server in Singapore if the weights tell Route 53 to do it. If you care about latency, reconsider your approach. Use latency-based routing instead, or add a layer of application logic to redirect users to the nearest region.

Practical examples

For a gradual rollout, start with a 90:10 split (old:new). Run for a few hours, monitoring error rates and latency of the new version. If everything looks good, move to 50:50. Spend a few more hours verifying. Then flip to 10:90 and finally 0:100.
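That schedule is easy to encode so nobody has to remember the next step. A minimal sketch: the step list mirrors the rollout described above, while the function name and the choice to send everything back to the old version when metrics look bad are assumptions for the example.

```python
ROLLOUT_STEPS = [(90, 10), (50, 50), (10, 90), (0, 100)]  # (old, new) weights
ROLLBACK = (100, 0)  # assumed rollback target: all traffic to the old version


def next_weights(current, metrics_ok):
    """Return the (old, new) weights to apply after reviewing metrics."""
    if not metrics_ok:
        return ROLLBACK
    if current not in ROLLOUT_STEPS:
        return ROLLOUT_STEPS[0]  # not started yet, or just rolled back
    index = ROLLOUT_STEPS.index(current)
    return ROLLOUT_STEPS[min(index + 1, len(ROLLOUT_STEPS) - 1)]


print(next_weights((90, 10), metrics_ok=True))   # (50, 50)
print(next_weights((50, 50), metrics_ok=False))  # (100, 0)
```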

For blue-green deployments, keep both the blue and green environments running. Route 100% to blue. Deploy green alongside it. Once green is warmed up, switch to 1:99 (blue:green) and monitor. If something goes wrong, flip back immediately. Once you're confident, go 0:100 and decommission blue.

For A/B testing, route 50% to variant A and 50% to variant B. Collect metrics for a week. Make a decision. Clean up the losing variant.

Where Critical Cloud comes in

Weighted routing works best when you have visibility into what's actually happening. Are health checks catching failures fast enough? Is your new version actually receiving the traffic you think it is? Is the observed split drifting from the intended weights because of DNS caching?

This is where observability becomes critical. We're a Powered by Datadog accredited partner, and we instrument Route 53 health checks, DNS query patterns, and traffic distribution into Datadog. That way you're not guessing whether your weighted routing is working as intended.

If you're managing complex traffic splits manually, see how Critical Support can help you automate and monitor this properly.