AWS VPC Deep Dive: Production Networking That Scales

Master AWS VPC networking for production: subnets, route tables, NAT gateways, security groups, and network architecture patterns that scale securely.

Azynth Team

AWS VPC (Virtual Private Cloud) is the foundation of your cloud infrastructure. Here's how to design VPC architecture that's secure, scalable, and cost-effective.

VPC Fundamentals

A VPC is your private network in AWS:

VPC (10.0.0.0/16)
├── Public Subnet (10.0.1.0/24)  → Internet Gateway
├── Private Subnet (10.0.2.0/24) → NAT Gateway
└── Database Subnet (10.0.3.0/24) → Isolated

Key concepts:

  • CIDR block: IP address range (e.g., 10.0.0.0/16 = 65,536 addresses; AWS reserves 5 addresses in every subnet)
  • Subnets: Subdivisions of your VPC
  • Route tables: Define traffic routing
  • Gateways: Connect to internet or other VPCs

Production VPC Architecture

Multi-Tier Design

    # Terraform: Production VPC
    resource "aws_vpc" "main" {
      cidr_block           = "10.0.0.0/16"
      enable_dns_hostnames = true
      enable_dns_support   = true

      tags = {
        Name        = "production-vpc"
        Environment = "production"
      }
    }

    # Public subnets (load balancers, NAT gateways)
    resource "aws_subnet" "public" {
      count                   = 3
      vpc_id                  = aws_vpc.main.id
      cidr_block              = "10.0.${count.index + 1}.0/24"
      availability_zone       = data.aws_availability_zones.available.names[count.index]
      map_public_ip_on_launch = true

      tags = {
        Name = "public-${count.index + 1}"
        Tier = "public"
      }
    }

    # Private subnets (application servers)
    resource "aws_subnet" "private" {
      count             = 3
      vpc_id            = aws_vpc.main.id
      cidr_block        = "10.0.${count.index + 11}.0/24"
      availability_zone = data.aws_availability_zones.available.names[count.index]

      tags = {
        Name = "private-${count.index + 1}"
        Tier = "private"
      }
    }

    # Database subnets (RDS, ElastiCache)
    resource "aws_subnet" "database" {
      count             = 3
      vpc_id            = aws_vpc.main.id
      cidr_block        = "10.0.${count.index + 21}.0/24"
      availability_zone = data.aws_availability_zones.available.names[count.index]

      tags = {
        Name = "database-${count.index + 1}"
        Tier = "database"
      }
    }
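The subnet resources above read availability zone names from data.aws_availability_zones.available, which the snippet does not declare. A minimal sketch of that data source, assuming the provider's region offers at least three available zones:

```hcl
# Looks up the AZs available to this account in the provider's region.
# The subnet resources index into this list with count.index (0, 1, 2),
# so the region is assumed to have at least three zones.
data "aws_availability_zones" "available" {
  state = "available"
}
```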

Three-Tier VPC Architecture:

Layer 1: Public Subnets

  • Components: Application Load Balancer, NAT Gateway, Bastion hosts
  • Internet access: Bidirectional (via Internet Gateway)
  • CIDR example: 10.0.1.0/24, 10.0.2.0/24, 10.0.3.0/24

Layer 2: Private Subnets

  • Components: ECS tasks, EC2 instances, Lambda functions
  • Internet access: Outbound only (via NAT Gateway)
  • CIDR example: 10.0.11.0/24, 10.0.12.0/24, 10.0.13.0/24

Layer 3: Database Subnets

  • Components: RDS, ElastiCache, managed databases
  • Internet access: None (fully isolated)
  • CIDR example: 10.0.21.0/24, 10.0.22.0/24, 10.0.23.0/24

Traffic Flow:

Internet → Internet Gateway → Public Subnets → Private Subnets → Database Subnets
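To keep RDS instances inside the database tier, the database subnets are usually wrapped in a DB subnet group. The following is a sketch rather than part of the original configuration; the group name is a placeholder.

```hcl
# Groups the three database subnets so RDS can place instances
# (and Multi-AZ standbys) only inside the isolated tier.
# The name "production-db" is a placeholder chosen for this example.
resource "aws_db_subnet_group" "main" {
  name       = "production-db"
  subnet_ids = aws_subnet.database[*].id

  tags = {
    Name = "production-db-subnet-group"
  }
}
```

An RDS instance would then reference it through its db_subnet_group_name argument.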

Internet Gateway vs NAT Gateway

Internet Gateway (IGW)

For public subnets:

    resource "aws_internet_gateway" "main" {
      vpc_id = aws_vpc.main.id

      tags = {
        Name = "production-igw"
      }
    }

    # Route table for public subnets
    resource "aws_route_table" "public" {
      vpc_id = aws_vpc.main.id

      route {
        cidr_block = "0.0.0.0/0"
        gateway_id = aws_internet_gateway.main.id
      }

      tags = {
        Name = "public-rt"
      }
    }

    # Associate with public subnets
    resource "aws_route_table_association" "public" {
      count          = 3
      subnet_id      = aws_subnet.public[count.index].id
      route_table_id = aws_route_table.public.id
    }

Result: Resources in a public subnet that have a public IP address get direct, bidirectional internet access.

NAT Gateway

For private subnets:

    # Elastic IP for NAT Gateway
    resource "aws_eip" "nat" {
      count  = 3
      domain = "vpc"

      tags = {
        Name = "nat-eip-${count.index + 1}"
      }
    }

    # NAT Gateway (one per AZ for HA)
    resource "aws_nat_gateway" "main" {
      count         = 3
      allocation_id = aws_eip.nat[count.index].id
      subnet_id     = aws_subnet.public[count.index].id

      tags = {
        Name = "nat-gateway-${count.index + 1}"
      }
    }

    # Route table for private subnets
    resource "aws_route_table" "private" {
      count  = 3
      vpc_id = aws_vpc.main.id

      route {
        cidr_block     = "0.0.0.0/0"
        nat_gateway_id = aws_nat_gateway.main[count.index].id
      }

      tags = {
        Name = "private-rt-${count.index + 1}"
      }
    }

Result: Private resources can initiate outbound internet traffic, but not receive inbound.
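The private route tables above only take effect once they are associated with subnets; a subnet without an explicit association falls back to the VPC's main route table. A sketch of the missing associations follows, along with route tables for the database tier that have no default route. The "database" resource names are assumptions chosen to match the subnet names used earlier.

```hcl
# Attach each private subnet to the route table for its own AZ,
# so outbound traffic uses the NAT Gateway in the same zone.
resource "aws_route_table_association" "private" {
  count          = 3
  subnet_id      = aws_subnet.private[count.index].id
  route_table_id = aws_route_table.private[count.index].id
}

# Database route tables carry no 0.0.0.0/0 route. Only the implicit
# "local" route for the VPC CIDR exists, which keeps the tier isolated.
resource "aws_route_table" "database" {
  count  = 3
  vpc_id = aws_vpc.main.id

  tags = {
    Name = "database-rt-${count.index + 1}"
  }
}

resource "aws_route_table_association" "database" {
  count          = 3
  subnet_id      = aws_subnet.database[count.index].id
  route_table_id = aws_route_table.database[count.index].id
}
```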

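Cost consideration: Each NAT Gateway costs ~$33/month ($0.045/hour in us-east-1) plus $0.045 per GB processed, so three gateways run about $97/month before any traffic. For dev environments, use a single NAT Gateway instead of one per AZ.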
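One way to switch between the two layouts is a boolean variable. The sketch below is a variant of the NAT resources above, not an addition to them (the resource names are the same, so it would replace the earlier definitions). The variable name single_nat_gateway is hypothetical.

```hcl
# Hypothetical toggle: true for dev/staging, false for production.
variable "single_nat_gateway" {
  type    = bool
  default = false
}

locals {
  # One NAT Gateway in total, or one per AZ.
  nat_gateway_count = var.single_nat_gateway ? 1 : 3
}

resource "aws_eip" "nat" {
  count  = local.nat_gateway_count
  domain = "vpc"
}

resource "aws_nat_gateway" "main" {
  count         = local.nat_gateway_count
  allocation_id = aws_eip.nat[count.index].id
  subnet_id     = aws_subnet.public[count.index].id
}

# Still one route table per AZ. With a single NAT Gateway they all
# point at index 0, so two of the AZs send their traffic across zones.
resource "aws_route_table" "private" {
  count  = 3
  vpc_id = aws_vpc.main.id

  route {
    cidr_block     = "0.0.0.0/0"
    nat_gateway_id = aws_nat_gateway.main[var.single_nat_gateway ? 0 : count.index].id
  }
}
```

The trade-off is that a single gateway becomes a single point of failure for outbound traffic and adds cross-AZ data transfer, which is usually acceptable outside production.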

Security Groups vs Network ACLs

Security Groups (Stateful)

Best practice: Default deny, explicit allow

    # ALB security group
    resource "aws_security_group" "alb" {
      name        = "alb-sg"
      description = "Allow HTTP/HTTPS inbound"
      vpc_id      = aws_vpc.main.id

      ingress {
        from_port   = 80
        to_port     = 80
        protocol    = "tcp"
        cidr_blocks = ["0.0.0.0/0"]
        description = "HTTP from internet"
      }

      ingress {
        from_port   = 443
        to_port     = 443
        protocol    = "tcp"
        cidr_blocks = ["0.0.0.0/0"]
        description = "HTTPS from internet"
      }

      egress {
        from_port   = 0
        to_port     = 0
        protocol    = "-1"
        cidr_blocks = ["0.0.0.0/0"]
        description = "Allow all outbound"
      }
    }

    # Application security group
    resource "aws_security_group" "app" {
      name        = "app-sg"
      description = "Allow traffic from ALB"
      vpc_id      = aws_vpc.main.id

      ingress {
        from_port       = 8080
        to_port         = 8080
        protocol        = "tcp"
        security_groups = [aws_security_group.alb.id]
        description     = "HTTP from ALB"
      }

      egress {
        from_port   = 0
        to_port     = 0
        protocol    = "-1"
        cidr_blocks = ["0.0.0.0/0"]
      }
    }

    # Database security group
    resource "aws_security_group" "db" {
      name        = "db-sg"
      description = "Allow traffic from app servers"
      vpc_id      = aws_vpc.main.id

      ingress {
        from_port       = 5432
        to_port         = 5432
        protocol        = "tcp"
        security_groups = [aws_security_group.app.id]
        description     = "PostgreSQL from app servers"
      }

      # No egress rules needed (response traffic is stateful)
    }

Flow:

Internet → ALB (80/443) → App (8080) → Database (5432)

Network ACLs (Stateless)

Use case: Additional layer of defense

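    resource "aws_network_acl" "private" {
      vpc_id     = aws_vpc.main.id
      subnet_ids = aws_subnet.private[*].id

      # Allow inbound from VPC
      ingress {
        rule_no    = 100
        protocol   = "-1"
        action     = "allow"
        cidr_block = "10.0.0.0/16"
        from_port  = 0
        to_port    = 0
      }

      # Allow return traffic for outbound internet requests made through
      # the NAT Gateway. NACLs are stateless, and these responses arrive
      # from internet source addresses on ephemeral ports.
      ingress {
        rule_no    = 110
        protocol   = "tcp"
        action     = "allow"
        cidr_block = "0.0.0.0/0"
        from_port  = 1024
        to_port    = 65535
      }

      # Deny all other inbound
      ingress {
        rule_no    = 200
        protocol   = "-1"
        action     = "deny"
        cidr_block = "0.0.0.0/0"
        from_port  = 0
        to_port    = 0
      }

      # Allow all outbound (requests and return traffic)
      egress {
        rule_no    = 100
        protocol   = "-1"
        action     = "allow"
        cidr_block = "0.0.0.0/0"
        from_port  = 0
        to_port    = 0
      }
    }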

Security Groups vs NACLs:

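Feature    | Security Groups                            | Network ACLs
-----------|--------------------------------------------|-------------------------------------------------
Stateful   | Yes - return traffic automatically allowed | No - must explicitly allow both directions
Rules      | Allow only                                 | Allow AND deny
Applied to | Instance (ENI level)                       | Subnet level
Evaluation | All rules processed                        | Rules processed in numerical order (first match wins)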

Recommendation: Use security groups for most cases. Use NACLs for subnet-level deny rules (e.g., block specific IPs).
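A subnet-level deny of the kind recommended above might look like the sketch below. It assumes a dedicated NACL for the public subnets (not part of the original configuration) and uses 203.0.113.0/24, a documentation-only range, as a stand-in for the addresses to block.

```hcl
# Custom NACL for the public subnets. Rules are managed as separate
# aws_network_acl_rule resources, so there are no inline rule blocks here.
resource "aws_network_acl" "public" {
  vpc_id     = aws_vpc.main.id
  subnet_ids = aws_subnet.public[*].id

  tags = {
    Name = "public-nacl"
  }
}

# Rule 50 is evaluated before rule 100, so the deny wins for this range.
# 203.0.113.0/24 is a placeholder for the range you want to block.
resource "aws_network_acl_rule" "deny_blocked_range" {
  network_acl_id = aws_network_acl.public.id
  rule_number    = 50
  egress         = false
  protocol       = "-1"
  rule_action    = "deny"
  cidr_block     = "203.0.113.0/24"
}

# A custom NACL denies everything by default, so all other inbound
# and outbound traffic has to be allowed explicitly.
resource "aws_network_acl_rule" "allow_inbound" {
  network_acl_id = aws_network_acl.public.id
  rule_number    = 100
  egress         = false
  protocol       = "-1"
  rule_action    = "allow"
  cidr_block     = "0.0.0.0/0"
}

resource "aws_network_acl_rule" "allow_outbound" {
  network_acl_id = aws_network_acl.public.id
  rule_number    = 100
  egress         = true
  protocol       = "-1"
  rule_action    = "allow"
  cidr_block     = "0.0.0.0/0"
}
```

Security groups cannot express this, because they have no deny rules; port-level filtering stays in the security groups shown earlier.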

VPC Peering

Connect VPCs together:

    # VPC peering connection
    resource "aws_vpc_peering_connection" "prod_to_shared" {
      vpc_id      = aws_vpc.production.id
      peer_vpc_id = aws_vpc.shared_services.id
      auto_accept = true # Only valid when both VPCs are in the same account and region

      tags = {
        Name = "prod-to-shared-peering"
      }
    }

    # Add routes to route tables on both sides.
    # This assumes a single private route table per VPC; with one route
    # table per AZ, each of them needs its own route.
    resource "aws_route" "prod_to_shared" {
      route_table_id            = aws_route_table.private.id
      destination_cidr_block    = "10.1.0.0/16" # Shared VPC CIDR
      vpc_peering_connection_id = aws_vpc_peering_connection.prod_to_shared.id
    }

    resource "aws_route" "shared_to_prod" {
      route_table_id            = aws_route_table.shared_private.id
      destination_cidr_block    = "10.0.0.0/16" # Production VPC CIDR
      vpc_peering_connection_id = aws_vpc_peering_connection.prod_to_shared.id
    }

Use cases:

  • Connect production VPC to shared services VPC (monitoring, logging)
  • Multi-account architectures
  • Development/staging/production separation

Limitations:

  • No transitive peering (A→B, B→C doesn't mean A→C)
  • VPC CIDRs can't overlap
  • Default quota of 50 active peering connections per VPC (can be raised to 125)
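The peering example above relies on auto_accept = true, which only works when both VPCs live in the same account and region. For the multi-account or cross-region cases, the accepting side needs its own resource. The sketch below assumes a second provider configuration aliased "peer" (with credentials for the peer account, if it differs) and two hypothetical input variables.

```hcl
# Assumption: a second provider configuration for the accepting side.
provider "aws" {
  alias  = "peer"
  region = "us-west-2"
}

# Hypothetical inputs: the peer VPC and the account that owns it.
variable "peer_vpc_id" {
  type = string
}

variable "peer_owner_id" {
  type = string
}

# Requester side. auto_accept is left unset because it cannot be
# combined with a peer in another region or account.
resource "aws_vpc_peering_connection" "prod_to_dr" {
  vpc_id        = aws_vpc.production.id
  peer_vpc_id   = var.peer_vpc_id
  peer_owner_id = var.peer_owner_id
  peer_region   = "us-west-2"

  tags = {
    Name = "prod-to-dr-peering"
  }
}

# Accepter side, created through the aliased provider.
resource "aws_vpc_peering_connection_accepter" "dr" {
  provider                  = aws.peer
  vpc_peering_connection_id = aws_vpc_peering_connection.prod_to_dr.id
  auto_accept               = true
}
```

Routes in both VPCs are still required, exactly as in the same-account example.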

VPC Endpoints

Access AWS services privately, without sending traffic through an Internet Gateway or NAT Gateway:

Gateway Endpoints (S3, DynamoDB)

    resource "aws_vpc_endpoint" "s3" {
      vpc_id       = aws_vpc.main.id
      service_name = "com.amazonaws.us-east-1.s3"

      route_table_ids = concat(
        aws_route_table.private[*].id,
        aws_route_table.database[*].id
      )

      tags = {
        Name = "s3-endpoint"
      }
    }

Cost: Free!
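DynamoDB uses the same gateway endpoint mechanism. A sketch, assuming a hypothetical aws_region variable in place of the hard-coded region and attaching the endpoint only to the private route tables:

```hcl
# Hypothetical variable so the service name follows the deployment region.
variable "aws_region" {
  type    = string
  default = "us-east-1"
}

resource "aws_vpc_endpoint" "dynamodb" {
  vpc_id            = aws_vpc.main.id
  service_name      = "com.amazonaws.${var.aws_region}.dynamodb"
  vpc_endpoint_type = "Gateway"

  # Adds a route for the DynamoDB prefix list to each private route table.
  route_table_ids = aws_route_table.private[*].id

  tags = {
    Name = "dynamodb-endpoint"
  }
}
```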

Interface Endpoints (Most AWS services)

    resource "aws_vpc_endpoint" "secrets_manager" {
      vpc_id              = aws_vpc.main.id
      service_name        = "com.amazonaws.us-east-1.secretsmanager"
      vpc_endpoint_type   = "Interface"
      subnet_ids          = aws_subnet.private[*].id
      security_group_ids  = [aws_security_group.vpc_endpoints.id]
      private_dns_enabled = true

      tags = {
        Name = "secretsmanager-endpoint"
      }
    }

    # Security group for VPC endpoints
    resource "aws_security_group" "vpc_endpoints" {
      name        = "vpc-endpoints-sg"
      description = "Allow HTTPS from VPC"
      vpc_id      = aws_vpc.main.id

      ingress {
        from_port   = 443
        to_port     = 443
        protocol    = "tcp"
        cidr_blocks = [aws_vpc.main.cidr_block]
      }
    }

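Cost: ~$7/month per endpoint per AZ ($0.01/hour for each subnet the endpoint is placed in) + $0.01/GB of data processed. The example above spans three private subnets, so it costs about $22/month before traffic.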

When to use:

  • Private subnets without NAT Gateway
  • High data transfer to S3/DynamoDB (saves NAT costs)
  • Compliance (data doesn't leave AWS network)

VPC Flow Logs

Monitor network traffic:

    # Send flow logs to CloudWatch
    resource "aws_flow_log" "main" {
      vpc_id          = aws_vpc.main.id
      traffic_type    = "ALL" # ACCEPT, REJECT, or ALL
      iam_role_arn    = aws_iam_role.flow_logs.arn
      log_destination = aws_cloudwatch_log_group.flow_logs.arn

      tags = {
        Name = "vpc-flow-logs"
      }
    }

    resource "aws_cloudwatch_log_group" "flow_logs" {
      name              = "/aws/vpc/flow-logs"
      retention_in_days = 7
    }
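The flow log above references aws_iam_role.flow_logs, which is not shown. A minimal sketch of that role follows; the role and policy names are placeholders, and the wildcard resource mirrors the broad policy commonly used for flow logs, so it could be narrowed to the log group.

```hcl
# Role that the VPC Flow Logs service assumes to write to CloudWatch Logs.
resource "aws_iam_role" "flow_logs" {
  name = "vpc-flow-logs" # placeholder name

  assume_role_policy = jsonencode({
    Version = "2012-10-17"
    Statement = [{
      Effect    = "Allow"
      Principal = { Service = "vpc-flow-logs.amazonaws.com" }
      Action    = "sts:AssumeRole"
    }]
  })
}

# Permissions needed to create log streams and publish log events.
resource "aws_iam_role_policy" "flow_logs" {
  name = "vpc-flow-logs-write" # placeholder name
  role = aws_iam_role.flow_logs.id

  policy = jsonencode({
    Version = "2012-10-17"
    Statement = [{
      Effect = "Allow"
      Action = [
        "logs:CreateLogGroup",
        "logs:CreateLogStream",
        "logs:PutLogEvents",
        "logs:DescribeLogGroups",
        "logs:DescribeLogStreams",
      ]
      Resource = "*"
    }]
  })
}
```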

Analyze with CloudWatch Logs Insights:

    # Top 10 rejected connections
    fields @timestamp, srcAddr, dstAddr, dstPort, action
    | filter action = "REJECT"
    | stats count() as rejectedConnections by srcAddr, dstAddr, dstPort
    | sort rejectedConnections desc
    | limit 10

Use cases:

  • Troubleshoot connectivity issues
  • Detect unauthorized access attempts
  • Compliance and auditing
  • Monitor traffic patterns
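For long-term retention (the compliance and auditing case), flow logs can also be delivered to S3, which can be cheaper to store than CloudWatch Logs. The sketch below captures only rejected traffic; the bucket name is a placeholder and must be globally unique.

```hcl
# Placeholder bucket for archived flow logs.
resource "aws_s3_bucket" "flow_logs" {
  bucket = "example-vpc-flow-logs"
}

# A second flow log that archives only rejected traffic to S3.
# No IAM role is needed for the S3 destination type.
resource "aws_flow_log" "rejects_to_s3" {
  vpc_id               = aws_vpc.main.id
  traffic_type         = "REJECT"
  log_destination_type = "s3"
  log_destination      = aws_s3_bucket.flow_logs.arn

  tags = {
    Name = "vpc-flow-logs-rejects"
  }
}
```

Logs stored in S3 are not visible to CloudWatch Logs Insights, so querying them needs a separate tool such as Athena.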

Transit Gateway (Advanced)

For complex multi-VPC architectures:

    # Transit Gateway (hub)
    resource "aws_ec2_transit_gateway" "main" {
      description                     = "Production Transit Gateway"
      default_route_table_association = "enable"
      default_route_table_propagation = "enable"

      tags = {
        Name = "production-tgw"
      }
    }

    # Attach VPCs
    resource "aws_ec2_transit_gateway_vpc_attachment" "prod" {
      transit_gateway_id = aws_ec2_transit_gateway.main.id
      vpc_id             = aws_vpc.production.id
      subnet_ids         = aws_subnet.private[*].id

      tags = {
        Name = "prod-vpc-attachment"
      }
    }

When to use:

  • 5+ VPCs that need to communicate
  • Hub-and-spoke network topology
  • Centralized egress/ingress
  • Cross-region VPC connectivity (by peering Transit Gateways across regions)

Cost: ~$36/month per attachment + data transfer
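Attaching a VPC does not by itself send traffic to the Transit Gateway; the VPC's route tables also need a route that targets it. The sketch below assumes three private route tables (one per AZ, as in the NAT Gateway section) and routes the whole 10.0.0.0/8 organization range through the gateway. Traffic inside the VPC still follows the local route.

```hcl
# Send traffic for the rest of the organization's address space
# to the Transit Gateway from every private route table.
resource "aws_route" "private_to_tgw" {
  count                  = 3
  route_table_id         = aws_route_table.private[count.index].id
  destination_cidr_block = "10.0.0.0/8"
  transit_gateway_id     = aws_ec2_transit_gateway.main.id

  # The route can only be created once the VPC attachment exists.
  depends_on = [aws_ec2_transit_gateway_vpc_attachment.prod]
}
```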

CIDR Planning

Production best practices:

Organization: 10.0.0.0/8
├── us-east-1
│   ├── Production VPC: 10.0.0.0/16
│   │   ├── Public subnets: 10.0.1.0/24, 10.0.2.0/24, 10.0.3.0/24
│   │   ├── Private subnets: 10.0.11.0/24, 10.0.12.0/24, 10.0.13.0/24
│   │   └── Database subnets: 10.0.21.0/24, 10.0.22.0/24, 10.0.23.0/24
│   └── Staging VPC: 10.1.0.0/16
├── us-west-2
│   └── DR VPC: 10.10.0.0/16
└── Shared Services VPC: 10.100.0.0/16

Tips:

  • Use /16 for VPCs (65,536 addresses)
  • Use /24 for subnets (256 addresses, 251 usable after the 5 that AWS reserves)
  • Reserve space for growth
  • Document your CIDR plan
  • Avoid overlapping CIDRs
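Rather than hard-coding each subnet range, the plan above can be derived from the VPC CIDR with Terraform's built-in cidrsubnet function. The sketch below reproduces the production VPC layout; the local and output names are illustrative.

```hcl
locals {
  vpc_cidr = "10.0.0.0/16"

  # cidrsubnet(prefix, newbits, netnum): /16 plus 8 new bits gives /24
  # subnets, and netnum selects which /24 inside the VPC range.
  public_cidrs   = [for i in range(3) : cidrsubnet(local.vpc_cidr, 8, i + 1)]
  private_cidrs  = [for i in range(3) : cidrsubnet(local.vpc_cidr, 8, i + 11)]
  database_cidrs = [for i in range(3) : cidrsubnet(local.vpc_cidr, 8, i + 21)]
}

output "subnet_plan" {
  value = {
    public   = local.public_cidrs   # 10.0.1.0/24, 10.0.2.0/24, 10.0.3.0/24
    private  = local.private_cidrs  # 10.0.11.0/24, 10.0.12.0/24, 10.0.13.0/24
    database = local.database_cidrs # 10.0.21.0/24, 10.0.22.0/24, 10.0.23.0/24
  }
}
```

The gaps between tiers (10.0.4.0/24 through 10.0.10.0/24, for example) are the space reserved for growth.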

Cost Optimization

VPC Component Costs:

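Component                       | Monthly Cost             | Data Transfer
--------------------------------|--------------------------|------------------------------------------------
VPC, Subnets, Route tables      | Free                     | Free
Security Groups, NACLs          | Free                     | Free
Internet Gateway                | Free                     | No gateway fee (standard internet egress rates apply)
Gateway Endpoint (S3, DynamoDB) | Free                     | Free
NAT Gateway                     | $32.40/month per gateway | $0.045/GB processed
Interface VPC Endpoint          | $7.20/month per AZ       | $0.01/GB
VPC Peering                     | Free                     | $0.01/GB (cross-AZ)
Transit Gateway                 | $36/month/attachment     | $0.02/GB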

Cost optimization tips:

  1. Use Gateway Endpoints for S3/DynamoDB (free), especially for large data transfers
  2. Single NAT Gateway for dev/staging
  3. VPC Peering instead of Transit Gateway (for <5 VPCs)
  4. Interface Endpoints for high-volume traffic to other AWS services ($0.01/GB instead of $0.045/GB through a NAT Gateway)

Best Practices

  1. Multi-AZ by default - Deploy across 3 AZs minimum
  2. Separate tiers - Public, private, database subnets
  3. Least privilege security groups - Only allow required ports
  4. VPC Flow Logs - Enable for troubleshooting and security
  5. Tag everything - Use consistent tagging strategy
  6. CIDR planning - Reserve space for future growth
  7. NAT Gateway HA - One per AZ for production
  8. VPC Endpoints - Reduce NAT costs and improve security
  9. Infrastructure as Code - Use Terraform/CloudFormation
  10. Network ACLs sparingly - Use security groups for most rules
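For the tagging practice, the AWS provider's default_tags block applies a common set of tags to every taggable resource, so individual resources only need their own Name. A sketch, assuming it is merged into the existing provider block rather than added as a second one; the tag keys are examples.

```hcl
provider "aws" {
  region = "us-east-1"

  # Applied to every taggable resource created through this provider.
  # Tags set on a resource itself override these on key conflicts.
  default_tags {
    tags = {
      Environment = "production"
      ManagedBy   = "terraform"
    }
  }
}
```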

Troubleshooting

Can't reach internet from private subnet

    # Check the route table has a 0.0.0.0/0 route to a NAT Gateway
    aws ec2 describe-route-tables --route-table-ids rtb-xxx

    # Verify the NAT Gateway is healthy (state "available")
    aws ec2 describe-nat-gateways --nat-gateway-ids nat-xxx

    # Check the security group allows outbound traffic
    aws ec2 describe-security-groups --group-ids sg-xxx

VPC peering not working

    # Verify peering is active
    aws ec2 describe-vpc-peering-connections --vpc-peering-connection-ids pcx-xxx

    # Check route tables on both sides
    # Check security groups allow traffic from peer VPC CIDR

High NAT Gateway costs

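    # 1. Analyze VPC Flow Logs to find the top talkers (CloudWatch Logs Insights)
    stats sum(bytes) as bytesTransferred by srcAddr, dstAddr
    | sort bytesTransferred desc
    | limit 10

    # 2. Check which of those destinations are on the internet
    # 3. Consider VPC endpoints for traffic that goes to AWS services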

Conclusion

AWS VPC is the foundation of secure, scalable cloud infrastructure. Key takeaways:

  • Multi-tier design separates public, private, and database layers
  • Security groups provide instance-level firewall rules
  • NAT Gateways enable outbound internet from private subnets
  • VPC Endpoints reduce costs and improve security
  • VPC Flow Logs are essential for troubleshooting and security
  • CIDR planning is critical for avoiding future conflicts

Master VPC networking, and you've mastered AWS.


