AWS firewall rules: 7 best practices for security that actually works

AWS provides four distinct network security controls: security groups (stateful, instance-level), network ACLs (stateless, subnet-level), AWS WAF (application-layer, for HTTP and HTTPS traffic), and AWS Network Firewall (full-stack VPC firewall). Most organisations use some of these correctly and misconfigure the rest. The common failure modes are security groups with 0.0.0.0/0 inbound rules because it was easier during development, NACLs left at default with no explicit deny rules, and WAF deployed but left in count mode, so it logs attacks without blocking them.

These seven practices cover the controls that matter.

1. Restrict outbound rules on security groups, not just inbound

Security groups are stateful: if you allow inbound traffic on a port, AWS automatically allows the return traffic. You do not write explicit outbound rules for responses.

The mistake is allowing outbound 0.0.0.0/0 (all traffic to anywhere) as a default. This allows your EC2 instance to initiate connections to any destination on any port, which is how compromised instances communicate with command-and-control servers, exfiltrate data, and participate in outbound attacks.

Restrict outbound security group rules to what each instance actually needs to communicate with:
- Application servers: outbound to the database security group on the database port, outbound to AWS service endpoints on 443
- Database servers: outbound to AWS endpoints for backup and monitoring only. Responses to application queries need no outbound rule, because security groups are stateful
- NAT Gateway or internet access: explicitly scoped to the destinations and ports required, not wildcard

Review all security groups with outbound 0.0.0.0/0 quarterly. Most will have no legitimate reason for unrestricted outbound access.
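The quarterly review can be partly automated. The following is a minimal sketch, assuming the input is a list of dictionaries in the shape returned by the EC2 DescribeSecurityGroups API (for example boto3's describe_security_groups()["SecurityGroups"]). The function name and the sample data are hypothetical.

```python
from typing import Any

# CIDR blocks that mean "anywhere" for IPv4 and IPv6.
OPEN_CIDRS = {"0.0.0.0/0", "::/0"}


def find_open_egress(security_groups: list[dict[str, Any]]) -> list[str]:
    """Return the IDs of groups with at least one egress rule open to anywhere."""
    flagged = []
    for group in security_groups:
        for rule in group.get("IpPermissionsEgress", []):
            cidrs = {r["CidrIp"] for r in rule.get("IpRanges", [])}
            cidrs |= {r["CidrIpv6"] for r in rule.get("Ipv6Ranges", [])}
            if cidrs & OPEN_CIDRS:
                flagged.append(group["GroupId"])
                break  # one open rule is enough to flag the group
    return flagged


# Hypothetical sample data in the DescribeSecurityGroups response shape.
sample_groups = [
    {"GroupId": "sg-app", "IpPermissionsEgress": [
        {"IpProtocol": "-1", "IpRanges": [{"CidrIp": "0.0.0.0/0"}]}]},
    {"GroupId": "sg-db", "IpPermissionsEgress": [
        {"IpProtocol": "tcp", "FromPort": 443, "ToPort": 443,
         "IpRanges": [{"CidrIp": "10.0.0.0/16"}]}]},
]
print(find_open_egress(sample_groups))
```

The sketch only flags groups; deciding whether each flagged group has a legitimate reason for open egress is still a human review step.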

2. Reference security groups, not IP addresses

When an application server needs to communicate with a database server, the naive approach is to allow traffic from the application server's IP address. This breaks when instances are replaced (new IP), when Auto Scaling adds instances (new IPs), and when instances are relaunched in a different Availability Zone (new subnet, new IP).

The correct approach: allow traffic from the application server's security group. Write the rule as: inbound TCP 5432 from sg-applicationserver. Any instance in the application server security group can connect; no IP management required. When Auto Scaling adds a new application server and assigns it to the security group, it immediately has the correct database access.
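As a rough illustration, a group-referencing rule can be expressed in the IpPermissions structure that the EC2 API uses for ingress rules. This is a sketch under assumptions: the helper name, the group ID, and the description text are hypothetical placeholders.

```python
def sg_reference_rule(source_group_id: str, port: int, description: str) -> dict:
    """Build one TCP ingress permission whose source is a security group, not a CIDR."""
    return {
        "IpProtocol": "tcp",
        "FromPort": port,
        "ToPort": port,
        # Referencing a group instead of IpRanges: membership decides access.
        "UserIdGroupPairs": [
            {"GroupId": source_group_id, "Description": description}
        ],
    }


# Hypothetical ID standing in for the application server security group.
rule = sg_reference_rule("sg-0123456789abcdef0", 5432, "PostgreSQL from app tier")
print(rule)

# With boto3, the rule would be attached to the database group roughly like this
# (not executed here, since it needs credentials and real group IDs):
# ec2.authorize_security_group_ingress(GroupId=db_group_id, IpPermissions=[rule])
```

Note that the permission contains no IpRanges key at all, which is what removes the IP bookkeeping.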

This also makes security group rules self-documenting: the rule explicitly describes the relationship between components rather than listing IP addresses that require a separate document to interpret.

3. Use NACLs as a coarse-grained subnet boundary

Network ACLs operate at the subnet level and are stateless (you must write explicit rules for both inbound and outbound traffic, including return traffic). Because they are stateless and processed in numerical order, they are cumbersome for fine-grained rules but effective for broad exclusions.
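To make the evaluation order concrete, here is a small simulation of how numbered rules are matched: lowest rule number first, first match wins, with an implicit deny at the end. It is a simplified model for illustration, not AWS code; the NaclRule class and the sample rules are hypothetical, and only source address and destination port are modelled.

```python
import ipaddress
from dataclasses import dataclass


@dataclass(frozen=True)
class NaclRule:
    number: int
    cidr: str
    port_range: tuple[int, int]
    action: str  # "allow" or "deny"


def evaluate(rules: list[NaclRule], source_ip: str, port: int) -> str:
    """Return the action of the lowest-numbered matching rule, else the implicit deny."""
    addr = ipaddress.ip_address(source_ip)
    for rule in sorted(rules, key=lambda r: r.number):
        low, high = rule.port_range
        if addr in ipaddress.ip_network(rule.cidr) and low <= port <= high:
            return rule.action
    return "deny"  # the final catch-all rule of every NACL


inbound = [
    NaclRule(200, "0.0.0.0/0", (443, 443), "allow"),
    NaclRule(100, "203.0.113.0/24", (0, 65535), "deny"),  # evaluated first
]
print(evaluate(inbound, "203.0.113.7", 443))   # blocked range
print(evaluate(inbound, "198.51.100.5", 443))  # ordinary client
print(evaluate(inbound, "198.51.100.5", 22))   # no rule matches
```

Because NACLs are stateless, a real configuration would need a matching outbound rule list (typically covering the ephemeral port range) for the replies to the allowed inbound traffic.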

Use NACLs to:
- Explicitly deny inbound traffic from known malicious IP ranges (threat intelligence feeds)
- Block outbound traffic on specific high-risk ports (445/SMB, 23/Telnet) at the subnet level as a backstop even if security groups allow it
- Create a hard boundary between subnets that should never communicate directly

Do not use NACLs for the fine-grained rules that belong in security groups. Maintaining NACL rules alongside security group rules creates confusion about which layer is responsible for what.

4. Deploy WAF in block mode, not count mode

AWS WAF managed rule groups arrive pre-configured. The temptation is to deploy them in count mode first to observe what they would block before enabling blocking. This is reasonable for the first 24-48 hours to identify false positives. It is not reasonable as a long-term configuration.

A WAF in count mode provides zero protection. It generates logs and metrics that look like security controls, while attacks it would have blocked pass through. Review the WAF logs during the count-mode observation period, tune the rules to eliminate false positives, then switch to block mode within a week at most.

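The AWS managed rule groups to enable for most web applications:
- Core Rule Set (CRS): broad protection against common web exploits, including XSS and other OWASP Top 10 categories. SQL injection has its own managed rule group (SQL database), which is worth adding if the application talks to a SQL database
- Known Bad Inputs: Common payloads used in attacks
- Amazon IP Reputation List: Traffic from known malicious IPs
- Bot Control: Detects and controls bot traffic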

Enable all in block mode for production. If a specific rule causes false positives for legitimate traffic, exclude the specific rule rather than switching the whole group to count mode.
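The difference between "whole group in count mode" and "one rule excluded" shows up in the web ACL rule definition. The sketch below builds rule entries in the WAFv2 rule structure and flags any group still in count mode. The helper names are hypothetical, and SizeRestrictions_BODY is used only as an example of a rule that sometimes produces false positives.

```python
def managed_group_rule(name: str, priority: int, count_only_rules: tuple = ()) -> dict:
    """Managed rule group in block mode, with optional per-rule Count overrides."""
    statement = {"VendorName": "AWS", "Name": name}
    if count_only_rules:
        # Override individual rules to Count while the rest of the group still blocks.
        statement["RuleActionOverrides"] = [
            {"Name": rule_name, "ActionToUse": {"Count": {}}}
            for rule_name in count_only_rules
        ]
    return {
        "Name": name,
        "Priority": priority,
        "Statement": {"ManagedRuleGroupStatement": statement},
        # {"None": {}} keeps the group's own actions; {"Count": {}} would
        # put the whole group in count mode.
        "OverrideAction": {"None": {}},
        "VisibilityConfig": {
            "SampledRequestsEnabled": True,
            "CloudWatchMetricsEnabled": True,
            "MetricName": name,
        },
    }


def groups_in_count_mode(rules: list[dict]) -> list[str]:
    """Return names of rules whose group-level override is Count."""
    return [r["Name"] for r in rules if "Count" in r.get("OverrideAction", {})]


web_acl_rules = [
    managed_group_rule("AWSManagedRulesCommonRuleSet", 10, ("SizeRestrictions_BODY",)),
    managed_group_rule("AWSManagedRulesKnownBadInputsRuleSet", 20),
    managed_group_rule("AWSManagedRulesAmazonIpReputationList", 30),
]
print(groups_in_count_mode(web_acl_rules))
```

Running groups_in_count_mode against the rules of each production web ACL is one way to catch a group that was left in count mode after the observation period.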

5. Prioritise rules to match traffic patterns

AWS Network Firewall processes stateless rules in priority order, lowest number first. The first matching rule wins. Every rule a packet has to be tested against before it finds a match adds processing work, so the rules that match the most traffic belong at the top.

Structure stateless rules efficiently:
- Low rule numbers (1-99): pass trusted, high-volume flows that do not need stateful inspection (significantly reduces stateful engine load)
- Mid rule numbers (100-999): permit known-good traffic (internal VPC-to-VPC, AWS service endpoints)
- High rule numbers (1000-9000): custom deny rules for specific IPs or ports
- Default action: forward to stateful engine for everything that does not match stateless rules

For stateful rules, domain-based blocking (block *.malicious-domain.com) is more maintainable than IP-based blocking for threat intelligence: malicious infrastructure changes IP addresses frequently, but domain names are more stable.
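A domain denylist for the stateful engine can be sketched as a rules source in the RulesSourceList shape. The helper name is hypothetical, and the domain is the placeholder from the text above; a leading dot is the convention for matching a domain together with its subdomains.

```python
def domain_denylist(domains: list[str]) -> dict:
    """Build a stateful rules source that denies HTTP and TLS traffic to the given domains."""
    return {
        "RulesSourceList": {
            "Targets": sorted(set(domains)),          # de-duplicated, stable order
            "TargetTypes": ["TLS_SNI", "HTTP_HOST"],  # inspect the SNI and the Host header
            "GeneratedRulesType": "DENYLIST",
        }
    }


rules_source = domain_denylist([".malicious-domain.com", ".malicious-domain.com"])
print(rules_source)
```

Because matching happens on the TLS SNI and HTTP Host header, this kind of rule only applies to HTTP and TLS traffic, so IP-based rules may still be needed for other protocols.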

6. Deploy across multiple Availability Zones

A firewall endpoint deployed in a single AZ is a single point of failure. If the AZ experiences an outage, traffic through that endpoint is disrupted regardless of how healthy the protected workload is.

Deploy AWS Network Firewall endpoints in each AZ used by the protected workloads. For a three-AZ VPC, that means three firewall endpoints. Route tables in each AZ point to the firewall endpoint in the same AZ. Traffic stays within the AZ for firewall inspection.
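A quick coverage check can compare the AZs a workload uses against the AZs that have a ready firewall endpoint. This sketch assumes input in the shape of the SyncStates map returned by the Network Firewall DescribeFirewall API (under FirewallStatus); the function name, AZ names, and IDs are hypothetical.

```python
def azs_missing_endpoint(sync_states: dict, workload_azs: list[str]) -> list[str]:
    """Return workload AZs that have no firewall endpoint in READY state."""
    ready = {
        az for az, state in sync_states.items()
        if state.get("Attachment", {}).get("Status") == "READY"
    }
    return sorted(set(workload_azs) - ready)


# Hypothetical SyncStates excerpt: endpoints exist in two of three AZs.
sync_states = {
    "eu-west-2a": {"Attachment": {"SubnetId": "subnet-aaa", "EndpointId": "vpce-aaa", "Status": "READY"}},
    "eu-west-2b": {"Attachment": {"SubnetId": "subnet-bbb", "EndpointId": "vpce-bbb", "Status": "CREATING"}},
}
print(azs_missing_endpoint(sync_states, ["eu-west-2a", "eu-west-2b", "eu-west-2c"]))
```

The same map also gives the EndpointId per AZ, which is the value each AZ's route table would target so that traffic is inspected in its own AZ.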

The other controls do not carry this risk. WAF is a regional service and provides AZ resilience automatically. Security groups and NACLs are enforced by the VPC itself, so there is no per-AZ endpoint to deploy or lose. The risk area is specifically AWS Network Firewall, which requires explicit multi-AZ endpoint deployment.

7. Enable and monitor firewall logs

A firewall that blocks traffic without logging it is a black box. When a legitimate application request fails, you need to know whether a firewall rule is responsible and which one.

Enable logging for all firewall controls:
- Network Firewall: flow logs (every accepted/rejected connection) and alert logs (rule matches) to CloudWatch or S3
- WAF: full request logging to CloudWatch Logs, with at least 90-day retention
- VPC Flow Logs: all traffic flowing through VPC interfaces, for network forensics
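For Network Firewall, the flow and alert destinations can be described in the service's LoggingConfiguration structure. The following is a sketch only: the helper name, bucket, prefix, and log group are hypothetical, and the choice of S3 for flow logs and CloudWatch Logs for alerts is one possible split, not a recommendation from the text above.

```python
def network_firewall_logging(flow_bucket: str, alert_log_group: str) -> dict:
    """Build a logging configuration with flow logs to S3 and alert logs to CloudWatch Logs."""
    return {
        "LogDestinationConfigs": [
            {
                "LogType": "FLOW",
                "LogDestinationType": "S3",
                "LogDestination": {"bucketName": flow_bucket, "prefix": "network-firewall/flow"},
            },
            {
                "LogType": "ALERT",
                "LogDestinationType": "CloudWatchLogs",
                "LogDestination": {"logGroup": alert_log_group},
            },
        ]
    }


# Hypothetical destination names.
logging_config = network_firewall_logging("example-firewall-logs", "/example/network-firewall/alert")
print([c["LogType"] for c in logging_config["LogDestinationConfigs"]])
```

When applying a configuration like this through the API, destinations may need to be added one at a time rather than in a single call, so treat the structure as the target state rather than a single request body.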

Set up a CloudWatch alarm on WAF blocked requests, using either the BlockedRequests metric that WAF publishes for each web ACL and rule or a metric filter on the WAF logs. Alert when the blocked request rate spikes significantly above baseline: either someone is attacking, or a new legitimate use case is being incorrectly blocked. Both need investigation.
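"Significantly above baseline" needs a definition before it can page anyone. One simple option, shown here as a sketch rather than the only reasonable choice, is to compare the latest count against the mean of recent periods plus a few standard deviations. The function name, the three-sigma default, and the minimum floor are assumptions to tune against real traffic.

```python
from statistics import mean, pstdev


def is_spike(history: list[int], current: int,
             sigmas: float = 3.0, min_floor: int = 10) -> bool:
    """True when current exceeds the baseline mean by `sigmas` standard deviations.

    min_floor stops very quiet baselines from alerting on a handful of blocks.
    """
    threshold = max(mean(history) + sigmas * pstdev(history), min_floor)
    return current > threshold


# Hypothetical blocked-request counts for the previous six periods.
history = [20, 25, 22, 18, 24, 21]
print(is_spike(history, 200))  # far above baseline
print(is_spike(history, 23))   # within normal variation
```

CloudWatch anomaly detection alarms offer a managed alternative to a hand-rolled threshold; the sketch is mainly useful for reasoning about what the alert should and should not fire on.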

Where Critical Cloud comes in

Firewall configuration that looks correct on paper but has 0.0.0.0/0 in practice, WAF in count mode, or Network Firewall endpoints only in one AZ provides a false sense of security. We audit and operate AWS network security for regulated businesses, with firewall log analysis surfaced in the same operational view as application and infrastructure signals. As the world's first Powered by Datadog accredited partner, we treat firewall block events as first-class operational telemetry. See how Critical Support works.