AWS ECS Production Deployment: The Complete Guide
Deploy containerized applications on AWS ECS with auto-scaling, blue/green deployments, and production-grade monitoring.
Amazon ECS is AWS's container orchestration service. It's simpler than Kubernetes but powerful enough for most production workloads. Here's how to deploy containers on ECS like a pro.
ECS on EC2 vs ECS on Fargate vs Kubernetes (EKS)
ECS EC2:
- You manage EC2 instances
- Full control over instance types
- Lower cost for steady workloads
ECS Fargate:
- Serverless containers
- No instance management
- Pay per vCPU/GB/second
- Roughly 30% cost premium over equivalent, well-utilized EC2 capacity
Kubernetes (EKS):
- More complex, more powerful
- Better for multi-cloud
- Larger ecosystem
Recommendation: Start with Fargate, move to EC2 if cost or customization needs demand it.
Core Concepts
ECS Cluster
├── Services (long-running tasks)
│   ├── Task Definition (container specs)
│   ├── Tasks (running containers)
│   └── Load Balancer
└── Scheduled Tasks (cron jobs)
Task Definition: The Blueprint
    {
      "family": "api",
      "networkMode": "awsvpc",
      "requiresCompatibilities": ["FARGATE"],
      "cpu": "512",
      "memory": "1024",
      "taskRoleArn": "arn:aws:iam::ACCOUNT:role/api-task-role",
      "executionRoleArn": "arn:aws:iam::ACCOUNT:role/ecs-execution-role",
      "containerDefinitions": [
        {
          "name": "api",
          "image": "ACCOUNT.dkr.ecr.us-east-1.amazonaws.com/api:v1.2.3",
          "portMappings": [
            { "containerPort": 8080, "protocol": "tcp" }
          ],
          "environment": [
            { "name": "NODE_ENV", "value": "production" }
          ],
          "secrets": [
            {
              "name": "DATABASE_URL",
              "valueFrom": "arn:aws:secretsmanager:us-east-1:ACCOUNT:secret:db-url"
            }
          ],
          "logConfiguration": {
            "logDriver": "awslogs",
            "options": {
              "awslogs-group": "/ecs/api",
              "awslogs-region": "us-east-1",
              "awslogs-stream-prefix": "ecs"
            }
          },
          "healthCheck": {
            "command": ["CMD-SHELL", "curl -f http://localhost:8080/health || exit 1"],
            "interval": 30,
            "timeout": 5,
            "retries": 3,
            "startPeriod": 60
          }
        }
      ]
    }

Replace ACCOUNT with your 12-digit AWS account ID. The container health check runs inside the container, so the image must include curl.
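The Terraform in the next section refers to aws_ecs_task_definition.api, which this guide does not define. A minimal sketch of how the JSON above might be expressed as that resource follows. The role aws_iam_role.api_task and the variables ecr_repository_url and image_tag are hypothetical names introduced for illustration, and the health check and environment entries are left out for brevity.

```hcl
# Sketch only: Terraform equivalent of the task definition JSON.
# aws_iam_role.api_task, var.ecr_repository_url and var.image_tag are
# hypothetical names, not defined anywhere in this guide.
resource "aws_ecs_task_definition" "api" {
  family                   = "api"
  network_mode             = "awsvpc"
  requires_compatibilities = ["FARGATE"]
  cpu                      = "512"
  memory                   = "1024"
  task_role_arn            = aws_iam_role.api_task.arn
  execution_role_arn       = aws_iam_role.ecs_execution.arn

  # Container definitions are passed as a JSON string, so the keys keep
  # the camelCase spelling of the ECS API.
  container_definitions = jsonencode([{
    name         = "api"
    image        = "${var.ecr_repository_url}:${var.image_tag}"
    essential    = true
    portMappings = [{ containerPort = 8080, protocol = "tcp" }]
    secrets = [{
      name      = "DATABASE_URL"
      valueFrom = aws_secretsmanager_secret.db_url.arn
    }]
    logConfiguration = {
      logDriver = "awslogs"
      options = {
        "awslogs-group"         = "/ecs/api"
        "awslogs-region"        = "us-east-1"
        "awslogs-stream-prefix" = "ecs"
      }
    }
  }])
}
```

If the CI pipeline also registers task definition revisions, decide which tool owns the task definition so the two do not overwrite each other.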
Infrastructure as Code with Terraform
    # VPC for ECS
    module "vpc" {
      source = "terraform-aws-modules/vpc/aws"

      name = "ecs-vpc"
      cidr = "10.0.0.0/16"

      azs             = ["us-east-1a", "us-east-1b", "us-east-1c"]
      private_subnets = ["10.0.1.0/24", "10.0.2.0/24", "10.0.3.0/24"]
      public_subnets  = ["10.0.101.0/24", "10.0.102.0/24", "10.0.103.0/24"]

      enable_nat_gateway = true
      single_nat_gateway = false # HA: one per AZ
    }

    # ECS Cluster
    resource "aws_ecs_cluster" "main" {
      name = "production"

      setting {
        name  = "containerInsights"
        value = "enabled"
      }
    }

    # Application Load Balancer
    resource "aws_lb" "api" {
      name               = "api-lb"
      load_balancer_type = "application"
      subnets            = module.vpc.public_subnets
      security_groups    = [aws_security_group.alb.id]
    }

    resource "aws_lb_target_group" "api" {
      name        = "api-tg"
      port        = 8080
      protocol    = "HTTP"
      vpc_id      = module.vpc.vpc_id
      target_type = "ip" # required for awsvpc/Fargate tasks

      health_check {
        path                = "/health"
        healthy_threshold   = 2
        unhealthy_threshold = 3
        timeout             = 5
        interval            = 30
        matcher             = "200"
      }

      deregistration_delay = 30
    }

    resource "aws_lb_listener" "api" {
      load_balancer_arn = aws_lb.api.arn
      port              = "443"
      protocol          = "HTTPS"
      ssl_policy        = "ELBSecurityPolicy-TLS-1-2-2017-01"
      certificate_arn   = aws_acm_certificate.main.arn

      default_action {
        type             = "forward"
        target_group_arn = aws_lb_target_group.api.arn
      }
    }

    # ECS Service
    resource "aws_ecs_service" "api" {
      name            = "api"
      cluster         = aws_ecs_cluster.main.id
      task_definition = aws_ecs_task_definition.api.arn
      desired_count   = 3
      launch_type     = "FARGATE"

      network_configuration {
        subnets          = module.vpc.private_subnets
        security_groups  = [aws_security_group.api.id]
        assign_public_ip = false
      }

      load_balancer {
        target_group_arn = aws_lb_target_group.api.arn
        container_name   = "api"
        container_port   = 8080
      }

      # Rolling update: start new tasks before stopping old ones.
      # These are top-level arguments; aws_ecs_service has no
      # deployment_configuration block.
      deployment_maximum_percent         = 200
      deployment_minimum_healthy_percent = 100

      # Roll back automatically if new tasks fail to stabilize
      deployment_circuit_breaker {
        enable   = true
        rollback = true
      }

      # Graceful deployments: the listener must exist before tasks register
      depends_on = [aws_lb_listener.api]
    }
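The configuration above references aws_security_group.alb and aws_security_group.api without defining them. One plausible shape for the pair is sketched below, assuming the only inbound path to the tasks should be the load balancer on port 8080. The group names and the open egress rule on the task group are assumptions to adapt.

```hcl
# Sketch only: security groups assumed by the ALB and the ECS service.
resource "aws_security_group" "alb" {
  name   = "api-alb"
  vpc_id = module.vpc.vpc_id

  ingress {
    description = "HTTPS from the internet"
    from_port   = 443
    to_port     = 443
    protocol    = "tcp"
    cidr_blocks = ["0.0.0.0/0"]
  }

  egress {
    description = "Forward requests and health checks to tasks in the VPC"
    from_port   = 8080
    to_port     = 8080
    protocol    = "tcp"
    cidr_blocks = [module.vpc.vpc_cidr_block]
  }
}

resource "aws_security_group" "api" {
  name   = "api-tasks"
  vpc_id = module.vpc.vpc_id

  ingress {
    description     = "Only the ALB may reach the container port"
    from_port       = 8080
    to_port         = 8080
    protocol        = "tcp"
    security_groups = [aws_security_group.alb.id]
  }

  egress {
    description = "Outbound for image pulls, Secrets Manager and CloudWatch Logs"
    from_port   = 0
    to_port     = 0
    protocol    = "-1"
    cidr_blocks = ["0.0.0.0/0"]
  }
}
```

Referencing the ALB's security group rather than a CIDR range keeps the task rule correct even if the load balancer's addresses change.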
Auto-Scaling
Scale based on CPU, memory, or custom metrics:
    # Target tracking: Maintain 70% CPU
    resource "aws_appautoscaling_target" "api" {
      service_namespace  = "ecs"
      scalable_dimension = "ecs:service:DesiredCount"
      resource_id        = "service/${aws_ecs_cluster.main.name}/${aws_ecs_service.api.name}"
      min_capacity       = 3
      max_capacity       = 20
    }

    resource "aws_appautoscaling_policy" "api_cpu" {
      name               = "api-cpu-scaling"
      policy_type        = "TargetTrackingScaling"
      service_namespace  = aws_appautoscaling_target.api.service_namespace
      scalable_dimension = aws_appautoscaling_target.api.scalable_dimension
      resource_id        = aws_appautoscaling_target.api.resource_id

      target_tracking_scaling_policy_configuration {
        predefined_metric_specification {
          predefined_metric_type = "ECSServiceAverageCPUUtilization"
        }

        target_value       = 70.0
        scale_in_cooldown  = 300 # scale in slowly
        scale_out_cooldown = 60  # scale out quickly
      }
    }

    # Predefined ALB metric: scale on request count per target
    resource "aws_appautoscaling_policy" "api_requests" {
      name               = "api-request-scaling"
      policy_type        = "TargetTrackingScaling"
      service_namespace  = aws_appautoscaling_target.api.service_namespace
      scalable_dimension = aws_appautoscaling_target.api.scalable_dimension
      resource_id        = aws_appautoscaling_target.api.resource_id

      target_tracking_scaling_policy_configuration {
        predefined_metric_specification {
          predefined_metric_type = "ALBRequestCountPerTarget"
          resource_label         = "${aws_lb.api.arn_suffix}/${aws_lb_target_group.api.arn_suffix}"
        }

        target_value = 1000 # Requests per target per minute
      }
    }
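The policies above cover CPU and request count. Memory-based scaling follows the same pattern with a different predefined metric. The 75% target here is an assumed starting point, not a recommendation from measurements.

```hcl
# Sketch only: target tracking on average memory utilization.
resource "aws_appautoscaling_policy" "api_memory" {
  name               = "api-memory-scaling"
  policy_type        = "TargetTrackingScaling"
  service_namespace  = aws_appautoscaling_target.api.service_namespace
  scalable_dimension = aws_appautoscaling_target.api.scalable_dimension
  resource_id        = aws_appautoscaling_target.api.resource_id

  target_tracking_scaling_policy_configuration {
    predefined_metric_specification {
      predefined_metric_type = "ECSServiceAverageMemoryUtilization"
    }

    target_value       = 75.0 # assumed target; tune to the workload
    scale_in_cooldown  = 300
    scale_out_cooldown = 60
  }
}
```

When several target tracking policies are attached to one service, Application Auto Scaling scales out if any policy calls for it and scales in only when all of them agree.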
Blue/Green Deployments
Zero-downtime deployments with CodeDeploy:
    resource "aws_codedeploy_app" "api" {
      name             = "api"
      compute_platform = "ECS"
    }

    resource "aws_codedeploy_deployment_group" "api" {
      app_name               = aws_codedeploy_app.api.name
      deployment_group_name  = "api-deployment-group"
      service_role_arn       = aws_iam_role.codedeploy.arn
      deployment_config_name = "CodeDeployDefault.ECSAllAtOnce"

      # ECS deployment groups use blue/green with traffic control
      deployment_style {
        deployment_option = "WITH_TRAFFIC_CONTROL"
        deployment_type   = "BLUE_GREEN"
      }

      blue_green_deployment_config {
        terminate_blue_instances_on_deployment_success {
          action                           = "TERMINATE"
          termination_wait_time_in_minutes = 5 # window for a quick rollback
        }

        deployment_ready_option {
          action_on_timeout = "CONTINUE_DEPLOYMENT"
        }
      }

      ecs_service {
        cluster_name = aws_ecs_cluster.main.name
        service_name = aws_ecs_service.api.name
      }

      load_balancer_info {
        target_group_pair_info {
          prod_traffic_route {
            listener_arns = [aws_lb_listener.api.arn]
          }

          # Two target groups are required; CodeDeploy swaps traffic between
          # them. api_blue and api_green must be defined separately.
          target_group {
            name = aws_lb_target_group.api_blue.name
          }

          target_group {
            name = aws_lb_target_group.api_green.name
          }
        }
      }
    }
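The deployment group references two target groups (api_blue, api_green) and a CodeDeploy service role that are not defined in this guide. A minimal sketch of those pieces follows, assuming the same port and health check as the single target group used earlier. Names are illustrative. The ECS service itself would also need a deployment_controller block with type = "CODE_DEPLOY", which replaces the rolling-update circuit breaker for that service.

```hcl
# Sketch only: resources assumed by the CodeDeploy deployment group.
resource "aws_lb_target_group" "api_blue" {
  name        = "api-blue"
  port        = 8080
  protocol    = "HTTP"
  vpc_id      = module.vpc.vpc_id
  target_type = "ip"

  health_check {
    path    = "/health"
    matcher = "200"
  }
}

resource "aws_lb_target_group" "api_green" {
  name        = "api-green"
  port        = 8080
  protocol    = "HTTP"
  vpc_id      = module.vpc.vpc_id
  target_type = "ip"

  health_check {
    path    = "/health"
    matcher = "200"
  }
}

# Role that CodeDeploy assumes to shift traffic and update the service
resource "aws_iam_role" "codedeploy" {
  name = "api-codedeploy"

  assume_role_policy = jsonencode({
    Version = "2012-10-17"
    Statement = [{
      Effect    = "Allow"
      Action    = "sts:AssumeRole"
      Principal = { Service = "codedeploy.amazonaws.com" }
    }]
  })
}

resource "aws_iam_role_policy_attachment" "codedeploy" {
  role       = aws_iam_role.codedeploy.name
  policy_arn = "arn:aws:iam::aws:policy/AWSCodeDeployRoleForECS"
}
```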
Secrets Management
Never hardcode secrets:
    # Store secrets in Secrets Manager
    resource "aws_secretsmanager_secret" "db_url" {
      name = "production/database-url"
    }

    resource "aws_secretsmanager_secret_version" "db_url" {
      secret_id     = aws_secretsmanager_secret.db_url.id
      secret_string = var.database_url
    }

    # Grant ECS task execution role access
    resource "aws_iam_role_policy" "ecs_secrets" {
      role = aws_iam_role.ecs_execution.id

      policy = jsonencode({
        Version = "2012-10-17"
        Statement = [{
          Effect = "Allow"
          Action = [
            "secretsmanager:GetSecretValue"
          ]
          Resource = [
            aws_secretsmanager_secret.db_url.arn
          ]
        }]
      })
    }
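The policy above attaches to aws_iam_role.ecs_execution, which is not defined in this guide. A minimal sketch of that role is shown here, assuming the standard AWS managed policy for image pulls and log delivery is sufficient. The role name matches the ARN used in the task definition earlier.

```hcl
# Sketch only: the task execution role that ECS uses to pull images,
# write logs and resolve secrets before the container starts.
resource "aws_iam_role" "ecs_execution" {
  name = "ecs-execution-role"

  assume_role_policy = jsonencode({
    Version = "2012-10-17"
    Statement = [{
      Effect    = "Allow"
      Action    = "sts:AssumeRole"
      Principal = { Service = "ecs-tasks.amazonaws.com" }
    }]
  })
}

# AWS managed policy covering ECR pulls and CloudWatch Logs delivery
resource "aws_iam_role_policy_attachment" "ecs_execution" {
  role       = aws_iam_role.ecs_execution.name
  policy_arn = "arn:aws:iam::aws:policy/service-role/AmazonECSTaskExecutionRolePolicy"
}
```

The execution role is distinct from the task role: the former is used by the ECS agent at startup, the latter by the application code at runtime.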
Scheduled Tasks (Cron Jobs)
    # EventBridge rule for scheduled task
    resource "aws_cloudwatch_event_rule" "daily_report" {
      name                = "daily-report"
      description         = "Run daily report at 2 AM UTC"
      schedule_expression = "cron(0 2 * * ? *)"
    }

    resource "aws_cloudwatch_event_target" "daily_report" {
      rule      = aws_cloudwatch_event_rule.daily_report.name
      target_id = "daily-report-task"
      arn       = aws_ecs_cluster.main.arn
      role_arn  = aws_iam_role.events.arn

      ecs_target {
        task_count          = 1
        task_definition_arn = aws_ecs_task_definition.report.arn
        launch_type         = "FARGATE"

        network_configuration {
          subnets         = module.vpc.private_subnets
          security_groups = [aws_security_group.tasks.id]
        }
      }
    }
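The event target depends on aws_iam_role.events, which EventBridge assumes to start the task. A sketch of a role with the two permissions this typically requires (running the task and passing its IAM roles to ECS) follows. The role aws_iam_role.report_task is a hypothetical task role for the report job.

```hcl
# Sketch only: role that EventBridge assumes to launch the scheduled task.
# aws_iam_role.report_task is a hypothetical name.
resource "aws_iam_role" "events" {
  name = "ecs-events-role"

  assume_role_policy = jsonencode({
    Version = "2012-10-17"
    Statement = [{
      Effect    = "Allow"
      Action    = "sts:AssumeRole"
      Principal = { Service = "events.amazonaws.com" }
    }]
  })
}

resource "aws_iam_role_policy" "events_run_task" {
  name = "run-daily-report"
  role = aws_iam_role.events.id

  policy = jsonencode({
    Version = "2012-10-17"
    Statement = [
      {
        # Allow starting this specific task definition revision
        Effect   = "Allow"
        Action   = "ecs:RunTask"
        Resource = aws_ecs_task_definition.report.arn
      },
      {
        # ECS needs the task's roles handed over at launch time
        Effect = "Allow"
        Action = "iam:PassRole"
        Resource = [
          aws_iam_role.ecs_execution.arn,
          aws_iam_role.report_task.arn
        ]
        Condition = {
          StringLike = { "iam:PassedToService" = "ecs-tasks.amazonaws.com" }
        }
      }
    ]
  })
}
```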
Logging and Monitoring
    # CloudWatch Log Group
    resource "aws_cloudwatch_log_group" "api" {
      name              = "/ecs/api"
      retention_in_days = 30
    }

    # Dashboard of service-level metrics from the AWS/ECS namespace.
    # Metrics need ClusterName and ServiceName dimensions to return data.
    resource "aws_cloudwatch_dashboard" "ecs" {
      dashboard_name = "ECS-Production"

      dashboard_body = jsonencode({
        widgets = [
          {
            type = "metric"
            properties = {
              metrics = [
                ["AWS/ECS", "CPUUtilization", "ClusterName", "production", "ServiceName", "api"],
                [".", "MemoryUtilization", ".", ".", ".", "."]
              ]
              period = 300
              stat   = "Average"
              region = "us-east-1"
              title  = "ECS Resource Utilization"
            }
          }
        ]
      })
    }
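A dashboard shows what is happening but does not notify anyone. As a sketch of the alerting half, the alarms below page through an SNS topic when the service runs hot or the load balancer sees target errors. The thresholds, the topic name, and the choice of these two metrics are assumptions to adjust.

```hcl
# Sketch only: basic alerting for the api service. Thresholds are assumed.
resource "aws_sns_topic" "alerts" {
  name = "ecs-alerts"
}

# Sustained high CPU across the service
resource "aws_cloudwatch_metric_alarm" "api_cpu_high" {
  alarm_name          = "api-cpu-high"
  namespace           = "AWS/ECS"
  metric_name         = "CPUUtilization"
  statistic           = "Average"
  comparison_operator = "GreaterThanThreshold"
  threshold           = 85
  period              = 60
  evaluation_periods  = 5

  dimensions = {
    ClusterName = aws_ecs_cluster.main.name
    ServiceName = aws_ecs_service.api.name
  }

  alarm_actions = [aws_sns_topic.alerts.arn]
  ok_actions    = [aws_sns_topic.alerts.arn]
}

# Errors returned by the containers themselves, as seen by the ALB
resource "aws_cloudwatch_metric_alarm" "api_5xx" {
  alarm_name          = "api-target-5xx"
  namespace           = "AWS/ApplicationELB"
  metric_name         = "HTTPCode_Target_5XX_Count"
  statistic           = "Sum"
  comparison_operator = "GreaterThanThreshold"
  threshold           = 10
  period              = 60
  evaluation_periods  = 3
  treat_missing_data  = "notBreaching" # no traffic means no errors

  dimensions = {
    LoadBalancer = aws_lb.api.arn_suffix
  }

  alarm_actions = [aws_sns_topic.alerts.arn]
}
```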
CI/CD Pipeline
    # GitHub Actions: Build and deploy
    name: Deploy to ECS

    on:
      push:
        branches: [main]

    # Required so the job can request an OIDC token for role-to-assume
    permissions:
      id-token: write
      contents: read

    jobs:
      deploy:
        runs-on: ubuntu-latest
        steps:
          - uses: actions/checkout@v4

          - name: Configure AWS credentials
            uses: aws-actions/configure-aws-credentials@v4
            with:
              role-to-assume: ${{ secrets.AWS_ROLE_ARN }}
              aws-region: us-east-1

          - name: Login to ECR
            id: ecr-login
            uses: aws-actions/amazon-ecr-login@v2

          - name: Build and push image
            env:
              ECR_REGISTRY: ${{ steps.ecr-login.outputs.registry }}
              IMAGE_TAG: ${{ github.sha }}
            run: |
              docker build -t "$ECR_REGISTRY/api:$IMAGE_TAG" .
              docker push "$ECR_REGISTRY/api:$IMAGE_TAG"

          - name: Update task definition
            id: task-def
            uses: aws-actions/amazon-ecs-render-task-definition@v1
            with:
              task-definition: task-definition.json
              container-name: api
              image: ${{ steps.ecr-login.outputs.registry }}/api:${{ github.sha }}

          - name: Deploy to ECS
            uses: aws-actions/amazon-ecs-deploy-task-definition@v1
            with:
              task-definition: ${{ steps.task-def.outputs.task-definition }}
              service: api
              cluster: production
              wait-for-service-stability: true
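The workflow's role-to-assume value points at an IAM role that trusts GitHub's OIDC provider, which avoids storing long-lived AWS keys in GitHub. A sketch of that role's trust policy is shown below. It assumes the GitHub OIDC provider is already registered in the account, and example-org/example-repo is a placeholder for the real repository.

```hcl
# Sketch only: IAM role assumable by GitHub Actions through OIDC.
# Assumes the OIDC provider already exists in this AWS account;
# "example-org/example-repo" is a placeholder.
data "aws_iam_openid_connect_provider" "github" {
  url = "https://token.actions.githubusercontent.com"
}

resource "aws_iam_role" "github_deploy" {
  name = "github-actions-ecs-deploy"

  assume_role_policy = jsonencode({
    Version = "2012-10-17"
    Statement = [{
      Effect    = "Allow"
      Action    = "sts:AssumeRoleWithWebIdentity"
      Principal = { Federated = data.aws_iam_openid_connect_provider.github.arn }
      Condition = {
        StringEquals = {
          "token.actions.githubusercontent.com:aud" = "sts.amazonaws.com"
          # Only workflows running on the main branch of this repository
          "token.actions.githubusercontent.com:sub" = "repo:example-org/example-repo:ref:refs/heads/main"
        }
      }
    }]
  })
}
```

The role still needs a permissions policy for what the pipeline does: pushing to ECR, registering task definitions, updating the service, and passing the task and execution roles.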
Cost Optimization
1. Fargate Spot
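Up to 70% discount for interruptible workloads. Fargate Spot tasks can be reclaimed with a two-minute warning, so use it for work that tolerates interruption: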
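    resource "aws_ecs_service" "batch" {
      # ... name, cluster, task_definition and network settings omitted ...
      # Do not set launch_type when using a capacity provider strategy.

      capacity_provider_strategy {
        capacity_provider = "FARGATE_SPOT"
        weight            = 100
        base              = 0
      }
    }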
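Running everything on Spot risks losing all tasks at once. A common middle ground is to keep a small on-demand base and place the remainder on Spot. The sketch below assumes a hypothetical worker service and task definition, and the 1:3 weighting is illustrative.

```hcl
# Sketch only: mix on-demand Fargate with Fargate Spot.
# aws_ecs_task_definition.worker is a hypothetical name.
resource "aws_ecs_cluster_capacity_providers" "main" {
  cluster_name       = aws_ecs_cluster.main.name
  capacity_providers = ["FARGATE", "FARGATE_SPOT"]
}

resource "aws_ecs_service" "worker" {
  name            = "worker"
  cluster         = aws_ecs_cluster.main.id
  task_definition = aws_ecs_task_definition.worker.arn
  desired_count   = 4

  # The first task always runs on on-demand Fargate
  capacity_provider_strategy {
    capacity_provider = "FARGATE"
    base              = 1
    weight            = 1
  }

  # Beyond the base, about three of every four tasks go to Spot
  capacity_provider_strategy {
    capacity_provider = "FARGATE_SPOT"
    weight            = 3
  }

  network_configuration {
    subnets         = module.vpc.private_subnets
    security_groups = [aws_security_group.tasks.id]
  }

  depends_on = [aws_ecs_cluster_capacity_providers.main]
}
```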
2. Right-sizing
Monitor and adjust:
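    # CloudWatch metrics show actual usage.
    # Service metrics are published with both ClusterName and ServiceName
    # dimensions, so both must be given.
    aws cloudwatch get-metric-statistics \
      --namespace AWS/ECS \
      --metric-name CPUUtilization \
      --dimensions Name=ClusterName,Value=production Name=ServiceName,Value=api \
      --start-time 2024-01-01T00:00:00Z \
      --end-time 2024-01-07T23:59:59Z \
      --period 3600 \
      --statistics Average Maximum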
3. Savings Plans
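Compute Savings Plans cover both Fargate and EC2 usage. Committing to a one- or three-year term typically saves 30-50%, depending on the term and payment option.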
Production Checklist
- Health checks configured
- Auto-scaling enabled
- Secrets in Secrets Manager/Parameter Store
- Logs exported to CloudWatch
- Container Insights enabled
- Task role follows least privilege
- Deployment circuit breaker enabled
- Multi-AZ deployment
- Load balancer in front
- Resource limits set (CPU/memory)
- Blue/green deployments for critical services
- Monitoring and alerting configured
When NOT to Use ECS
- Heavy Kubernetes investment: Stick with K8s
- Multi-cloud requirement: Use Kubernetes
- Complex service mesh needs: Consider EKS + Istio
- Self-hosted requirement: Use Docker Swarm or Nomad
Conclusion
ECS strikes the balance between simplicity and power. It's the sweet spot for teams that want containers without Kubernetes complexity.
Start with Fargate for simplicity, optimize costs with EC2 launch type when needed, and leverage AWS integrations for a seamless production experience.