ECS Auto Scaling: Automatic task and cluster scaling done right

ECS Auto Scaling is two separate problems solved by one system. First, you need to decide how many tasks to run (Application Auto Scaling). Second, you need enough EC2 instances to host them (Cluster Auto Scaling). Get either one wrong and either your application becomes unavailable or you pay for unused capacity. Most teams deploy scaling policies once and never touch them again. That's usually wrong.

How ECS Auto Scaling works

You configure a CloudWatch metric (CPU, memory, or custom). You set a threshold. When the metric crosses that threshold, an alarm triggers. The alarm tells Application Auto Scaling to add or remove tasks. For cluster scaling, a separate mechanism monitors instance utilisation and launches or terminates EC2 instances.

The system is automatic and continuous. CloudWatch samples your metrics every minute. Alarms evaluate every minute. Once an alarm fires, the scaling action starts within seconds, although new tasks still need time to start and pass health checks. This means your application responds to traffic spikes quickly, and you stop paying for unused resources during quiet periods.

Three types of scaling policies

Target tracking is the simplest approach. You say "keep CPU at 70%" and ECS maintains it automatically. CloudWatch reports actual CPU usage, Application Auto Scaling adjusts the task count, and the service settles into a steady state around the target. This works well for predictable workloads. The downside is less control. If your workload has bursty behaviour, target tracking might add tasks more slowly than you'd like.
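As a rough sketch, a 70% CPU target could be expressed as the parameter set below, shaped for the Application Auto Scaling put_scaling_policy call. The cluster name, service name, policy name and cooldown values are assumptions for illustration, not recommendations.

```python
# Hypothetical cluster and service names; substitute your own.
RESOURCE_ID = "service/example-cluster/example-service"

# Parameters shaped for Application Auto Scaling's put_scaling_policy.
target_tracking_policy = {
    "PolicyName": "cpu-target-70",  # assumed name
    "ServiceNamespace": "ecs",
    "ResourceId": RESOURCE_ID,
    "ScalableDimension": "ecs:service:DesiredCount",
    "PolicyType": "TargetTrackingScaling",
    "TargetTrackingScalingPolicyConfiguration": {
        "TargetValue": 70.0,  # "keep CPU at 70%"
        "PredefinedMetricSpecification": {
            "PredefinedMetricType": "ECSServiceAverageCPUUtilization",
        },
        "ScaleOutCooldown": 60,   # seconds, assumed value
        "ScaleInCooldown": 300,   # seconds, assumed value
    },
}
```

With boto3 installed and credentials configured, this dictionary could be passed as keyword arguments to the application-autoscaling client's put_scaling_policy method.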

Step scaling gives you explicit control. You set multiple thresholds. "If CPU goes above 75%, add 2 tasks. If CPU goes above 85%, add 5 tasks. If CPU falls below 30%, remove 1 task." You can handle different levels of demand differently. Sudden traffic spike? Add more tasks faster. Small increase? Add fewer. The cost is complexity. You're now maintaining multiple thresholds instead of one.
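The three thresholds quoted above can be sketched as plain logic. This is a simplified model, not the service's implementation: real step scaling policies define their steps as intervals relative to an alarm threshold, and scale-out and scale-in are separate policies. The function names and the min/max values are assumptions.

```python
def step_adjustment(cpu_percent: float) -> int:
    """Return the change in task count for one CPU reading."""
    if cpu_percent > 85:
        return 5    # large spike: add tasks faster
    if cpu_percent > 75:
        return 2    # moderate increase: add fewer
    if cpu_percent < 30:
        return -1   # quiet period: remove one task
    return 0        # inside the dead band: do nothing


def next_task_count(current: int, cpu_percent: float,
                    minimum: int = 2, maximum: int = 10) -> int:
    """Apply the step, then clamp to the configured min and max."""
    proposed = current + step_adjustment(cpu_percent)
    return max(minimum, min(maximum, proposed))


print(next_task_count(4, 80.0))
print(next_task_count(9, 90.0))
```

The gap between 30% and 75% is deliberate. A wide dead band keeps the service from adding and removing tasks around a single value.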

Scheduled scaling runs on a calendar. You say "add 5 tasks at 08:00 GMT on weekdays, scale back at 18:00 GMT." No metrics involved, just time. Perfect for predictable patterns (business hours, seasonal demand, known events). Useless for unpredictable traffic.
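A weekday schedule like the one above might look as follows, shaped for the Application Auto Scaling put_scheduled_action call. Scheduled actions set minimum and maximum capacity rather than adding a relative number of tasks, so "add 5 tasks" is modelled here as raising the minimum from an assumed baseline of 2 to 7. Action names, resource names and capacities are illustrative assumptions. Schedules are evaluated in UTC unless a timezone is supplied, which matches GMT.

```python
RESOURCE_ID = "service/example-cluster/example-service"  # hypothetical

def scheduled_action(name: str, cron: str, min_tasks: int, max_tasks: int) -> dict:
    """Build parameters shaped for put_scheduled_action."""
    return {
        "ServiceNamespace": "ecs",
        "ScheduledActionName": name,
        "ResourceId": RESOURCE_ID,
        "ScalableDimension": "ecs:service:DesiredCount",
        "Schedule": cron,
        "ScalableTargetAction": {
            "MinCapacity": min_tasks,
            "MaxCapacity": max_tasks,
        },
    }

# Cron fields: minutes, hours, day-of-month, month, day-of-week, year.
scale_up = scheduled_action("weekday-open", "cron(0 8 ? * MON-FRI *)", 7, 10)
scale_down = scheduled_action("weekday-close", "cron(0 18 ? * MON-FRI *)", 2, 10)
```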

Choosing the right metric

CPU utilisation works for compute-heavy applications. If your application's performance degrades when CPU exceeds 70%, scale at 65% to give yourself a buffer. Test this under load to find the real threshold.

Memory utilisation works for memory-heavy applications. Caches, data processing, large datasets. Same approach. Test under load, find where performance degrades, scale at 80% of that.

Custom metrics work for everything else. Request concurrency for APIs. Queue depth for batch jobs. Database connection pool usage. Anything you can emit to CloudWatch becomes a scalable metric. These are often more accurate than CPU or memory because they directly measure what you care about.
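For illustration, a queue depth reading could be packaged like this before being sent with the CloudWatch put_metric_data call. The namespace, metric name and dimension are hypothetical; pick names that match your own conventions.

```python
from datetime import datetime, timezone

def queue_depth_datum(depth: int, service: str) -> dict:
    """Build one datapoint shaped for CloudWatch's MetricData list."""
    return {
        "MetricName": "QueueDepth",                                   # assumed name
        "Dimensions": [{"Name": "ServiceName", "Value": service}],   # assumed dimension
        "Timestamp": datetime.now(timezone.utc),
        "Value": float(depth),
        "Unit": "Count",
    }

# Parameters shaped for put_metric_data.
payload = {
    "Namespace": "Example/Workers",  # hypothetical namespace
    "MetricData": [queue_depth_datum(42, "example-service")],
}
```

Emit the metric at least once a minute so alarms have a datapoint in every evaluation period.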

Test under realistic load. Generate traffic at the scale you expect. Watch which resource becomes the bottleneck first. That's your metric.

Setting up task scaling

Log into the ECS console. Select your service. Find the "Service auto scaling" section. Enable it. Set minimum tasks (what you run during off-peak) and maximum tasks (highest safe limit).

For a service that handles 50 requests per second at peak, minimum might be 2 tasks, maximum might be 10. Set this conservatively. It's easier to increase later.
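One way to arrive at numbers like these is to work back from measured per-task throughput. The sketch below assumes each task sustains 10 requests per second and applies a 2x safety factor for the ceiling. Both figures are assumptions you would replace with results from your own load tests. The resulting dictionary is shaped for the Application Auto Scaling register_scalable_target call.

```python
import math

PEAK_RPS = 50        # from the example above
RPS_PER_TASK = 10    # assumption: measure this under load
HEADROOM = 2.0       # assumption: safety factor for the ceiling

tasks_at_peak = math.ceil(PEAK_RPS / RPS_PER_TASK)
max_tasks = math.ceil(tasks_at_peak * HEADROOM)

# Parameters shaped for register_scalable_target.
scalable_target = {
    "ServiceNamespace": "ecs",
    "ResourceId": "service/example-cluster/example-service",  # hypothetical
    "ScalableDimension": "ecs:service:DesiredCount",
    "MinCapacity": 2,
    "MaxCapacity": max_tasks,
}

print(tasks_at_peak, max_tasks)
```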

Choose your scaling policy. For most applications, start with target tracking at 70% CPU. Monitor for a week. If it's responding well, you're done. If you want more control, switch to step scaling with 65% scale-out and 30% scale-in thresholds.

Create CloudWatch alarms. The console usually does this automatically. For custom metrics, you'll need to create them manually. Set the evaluation period to 2-3 minutes. This prevents scaling on transient spikes.
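To see why a multi-minute evaluation period filters out transient spikes, here is a simplified model of alarm evaluation. It requires every datapoint in the window to breach the threshold. Real CloudWatch alarms also support "M out of N" datapoints and configurable missing-data handling, which this sketch ignores.

```python
def alarm_breached(datapoints: list[float], threshold: float,
                   evaluation_periods: int = 3) -> bool:
    """True only if the last N one-minute datapoints all exceed the threshold."""
    recent = datapoints[-evaluation_periods:]
    if len(recent) < evaluation_periods:
        return False  # not enough data yet
    return all(value > threshold for value in recent)


one_minute_spike = [50.0, 90.0, 60.0]
sustained_load = [70.0, 72.0, 80.0]

print(alarm_breached(one_minute_spike, threshold=65.0))
print(alarm_breached(sustained_load, threshold=65.0))
```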

Enable health checks on your load balancer. Unhealthy tasks should be marked as such so ECS replaces them, rather than scaling adding capacity on top of tasks that are failing.

Scaling at the cluster level

Task scaling without cluster scaling is a problem. If you scale up to 10 tasks but only have 2 EC2 instances, the new tasks can't run. They just sit pending. You need cluster-level scaling.

Use capacity providers. Create an Auto Scaling group with your EC2 instances. Register it with ECS as a capacity provider with managed scaling enabled. Set a target capacity (typically 70-80%, which keeps 20-30% of instance capacity spare; 100% means no headroom). ECS publishes the CapacityProviderReservation metric, which compares the number of instances your tasks need with the number currently running. When the metric exceeds the target, the Auto Scaling group launches new instances. When it falls below, unused instances are terminated.
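A capacity provider definition along these lines could be passed to the ECS create_capacity_provider call. The provider name, the ARN placeholder and the step sizes are assumptions. Managed termination protection is left disabled here because enabling it also requires instance scale-in protection on the Auto Scaling group.

```python
# Parameters shaped for the ECS create_capacity_provider call.
capacity_provider = {
    "name": "example-on-demand",  # hypothetical name
    "autoScalingGroupProvider": {
        # Placeholder: replace with the ARN of your own Auto Scaling group.
        "autoScalingGroupArn": "REPLACE_WITH_YOUR_ASG_ARN",
        "managedScaling": {
            "status": "ENABLED",
            "targetCapacity": 80,          # within the 70-80% range discussed above
            "minimumScalingStepSize": 1,   # assumed value
            "maximumScalingStepSize": 10,  # assumed value
        },
        "managedTerminationProtection": "DISABLED",
    },
}
```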

During scale-out from zero instances, ECS launches two instances immediately to provide capacity. This is intentional. It prevents cascading failures where you're always one instance short.
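The two-instance behaviour falls out of how the CapacityProviderReservation metric is calculated. The sketch below follows AWS's published description of the metric (instances needed divided by instances running, times 100, with fixed values when nothing is running). Treat the special cases and the simplified scaling arithmetic as assumptions to verify against current AWS documentation.

```python
import math

def capacity_provider_reservation(needed: int, running: int) -> float:
    """Instances needed as a percentage of instances running."""
    if needed == 0 and running == 0:
        return 100.0   # nothing needed, nothing running: at target
    if running == 0:
        return 200.0   # tasks waiting but no instances at all
    return needed / running * 100


def desired_instances(needed: int, running: int, target_capacity: int = 100) -> int:
    """Simplified target tracking: scale capacity by metric / target."""
    cpr = capacity_provider_reservation(needed, running)
    current = max(running, 1)  # assumption: zero capacity is treated as one
    return math.ceil(current * cpr / target_capacity)


print(desired_instances(needed=1, running=0))   # scale-out from zero
print(desired_instances(needed=6, running=4))   # ordinary scale-out
```

With a target of 100, a reading of 200 asks for twice the assumed capacity of one, which is where the two instances come from.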

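Combine on-demand and spot instances in your auto scaling group. On-demand handles your baseline load. Spot instances handle burst traffic. Spot discounts vary by instance type and region, often around 70% and up to 90% off the on-demand price, but you accept that instances can be reclaimed with two minutes' notice. Load balancer health checks stop routing to tasks on reclaimed instances, and ECS replaces those tasks elsewhere.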

Use capacity provider strategies to distribute tasks across different providers. Say you have two capacity providers: on-demand and spot. You could configure your service to run 70% of tasks on on-demand and 30% on spot. ECS maintains this ratio as you scale.
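A 70/30 split can be written as weights of 7 and 3 in a capacityProviderStrategy list, which is the shape ECS accepts when you create or update a service. The provider names are hypothetical, and the helper below only approximates the resulting placement. ECS applies its own rounding, and any base value is satisfied before weights are considered.

```python
# Shaped for the capacityProviderStrategy parameter of an ECS service.
strategy = [
    {"capacityProvider": "example-on-demand", "base": 0, "weight": 7},
    {"capacityProvider": "example-spot", "base": 0, "weight": 3},
]

def approximate_split(task_count: int, providers: list[dict]) -> dict[str, int]:
    """Estimate tasks per provider from weights alone (ignores base)."""
    total_weight = sum(p["weight"] for p in providers)
    return {
        p["capacityProvider"]: round(task_count * p["weight"] / total_weight)
        for p in providers
    }

print(approximate_split(10, strategy))
print(approximate_split(20, strategy))
```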

Monitoring scaling activity

Set up CloudWatch alarms for scaling events themselves. When tasks are scaled out, you want to know. When the service scales out and in repeatedly over a short period (thrashing), you want to be alerted, because that signals a problem.

Monitor DesiredTaskCount vs RunningTaskCount. If they're consistently out of sync, you have a problem. Either your instances are too small, you're hitting task start time limits, or your cluster capacity is too constrained.
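A brief mismatch during a deployment or scale-out is normal, so the check worth alerting on is a sustained gap. A minimal sketch, assuming you have already pulled both series as equal-length lists of one-minute samples (the five-sample window is an arbitrary choice):

```python
def sustained_gap(desired: list[int], running: list[int],
                  min_periods: int = 5) -> bool:
    """True if running stayed below desired for the last min_periods samples."""
    pairs = list(zip(desired, running))[-min_periods:]
    if len(pairs) < min_periods:
        return False  # not enough history to judge
    return all(r < d for d, r in pairs)


desired = [4, 6, 6, 6, 6, 6]
catching_up = [4, 4, 5, 6, 6, 6]
stuck = [4, 4, 4, 4, 4, 4]

print(sustained_gap(desired, catching_up))
print(sustained_gap(desired, stuck))
```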

Monitor task startup time. Some applications take 30 seconds to fully start. If you're scaling up during traffic spikes, those cold starts matter. Optimise your application startup or pre-warm instances during scheduled scaling before peak hours.

Look at CPU and memory utilisation trends. If CPU is consistently below 30%, you're over-provisioned. If it's consistently above 80%, you're under-provisioned. Adjust your thresholds.

Watch for scaling thrashing. If your service is constantly scaling up and down in short cycles, something's wrong. This usually means your metric is too noisy, your threshold is right at your normal operating point, or you have bursty workloads that don't benefit from scaling.
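One simple way to put a number on thrashing is to count how often the task count changes direction within a window. This is a heuristic sketch, and what counts as "too many" reversals is a judgment call for your own workload.

```python
def count_reversals(task_counts: list[int]) -> int:
    """Count switches between scaling out and scaling in."""
    # Keep only the samples where the count actually changed.
    deltas = [b - a for a, b in zip(task_counts, task_counts[1:]) if b != a]
    # A reversal is an increase followed by a decrease, or the opposite.
    return sum(
        1 for prev, cur in zip(deltas, deltas[1:])
        if (prev > 0) != (cur > 0)
    )


print(count_reversals([2, 4, 2, 4, 2]))   # oscillating
print(count_reversals([2, 3, 5, 8]))      # steady ramp
```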

Common mistakes

Don't set minimum tasks to zero unless you can handle startup latency. A service with zero tasks running can take 30+ seconds before it starts serving traffic. That's not acceptable for customer-facing applications.

Don't use absolute metrics (raw CPU in MHz) for scaling. Use relative metrics (percentage). Your instances might change size over time. Percentages scale with them.

Don't forget to set a sensible maximum task limit. With a ceiling that is too generous, runaway scaling can cost thousands per hour. Set it to something defensible. Ask yourself: if my application gets DDoS'd or leaks connections, what's the most it should ever scale to?

Don't scale based on metrics that don't correlate with load. If you're scaling on memory but your application has a memory leak, scaling won't help. Fix the leak first.

Don't rely on scaling alone for high availability. Scaling takes time. Health checks and multi-zone deployments catch failures faster.

Cost considerations

Each task added costs money. Running at minimum capacity during off-peak saves significantly. If you have 2 tasks baseline and scale to 10 during peak, you're paying for 8 additional tasks only during peak hours.
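The saving is easy to estimate in task-hours. The sketch below assumes a 10-hour peak window per day, which is an illustrative figure rather than something measured. Multiply the task-hours by your own per-task price to get money.

```python
BASELINE_TASKS = 2
PEAK_TASKS = 10
PEAK_HOURS_PER_DAY = 10   # assumption for illustration
HOURS_PER_DAY = 24

# Running at peak size all day.
fixed_task_hours = PEAK_TASKS * HOURS_PER_DAY

# Baseline all day, plus the 8 extra tasks only during the peak window.
extra_tasks = PEAK_TASKS - BASELINE_TASKS
scaled_task_hours = BASELINE_TASKS * HOURS_PER_DAY + extra_tasks * PEAK_HOURS_PER_DAY

saving = 1 - scaled_task_hours / fixed_task_hours
print(fixed_task_hours, scaled_task_hours, f"{saving:.0%}")
```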

Scheduled scaling combined with target tracking works well. Scale up 30 minutes before you expect traffic. This eliminates cold start latency. Then let target tracking handle unexpected spikes above your baseline.
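The pre-warm time can be derived from the expected peak rather than hard-coded. This helper is a sketch that produces an AWS-style cron expression for weekdays. The function name is hypothetical, and it does not handle a lead time that crosses midnight, where the day-of-week field would also need to shift.

```python
from datetime import datetime, timedelta

def prewarm_cron(peak_hhmm: str, lead_minutes: int = 30) -> str:
    """Return a weekday cron expression lead_minutes before the peak time."""
    peak = datetime.strptime(peak_hhmm, "%H:%M")
    start = peak - timedelta(minutes=lead_minutes)
    # Cron fields: minutes, hours, day-of-month, month, day-of-week, year.
    return f"cron({start.minute} {start.hour} ? * MON-FRI *)"


print(prewarm_cron("08:00"))
```

The scheduled action would raise the minimum capacity at that time, and the target tracking policy stays attached to absorb anything above it.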

Monitor your scaling costs. Some teams see their AWS bills double because they set minimum tasks too high or maximum tasks too high. Use Cost Explorer to see how scaling affects your bills.

Use spot instances aggressively for batch jobs and non-critical services. They interrupt, but that's fine if you have multiple tasks.

Where Critical Cloud comes in

Scaling only works if you can see what's actually happening. You need to know if your alarms are triggering correctly, if scaling is fast enough, if your chosen metrics actually correlate with performance.

We're a Powered by Datadog accredited partner, and we instrument ECS deeply. That means tracking task count changes, alarm state transitions, metric values, and scaling effectiveness into Datadog. You get complete visibility into whether your scaling strategy is working.

If you're unsure about your scaling configuration or you're managing complex multi-service scaling, see how Critical Support works.