Deploy containerized applications on AWS ECS with auto-scaling, blue/green deployments, and production-grade monitoring.

AWS ECS Production Deployment: The Complete Guide

Amazon ECS is AWS's container orchestration service. It's simpler than Kubernetes but powerful enough for most production workloads. Here's how to deploy containers on ECS like a pro.

ECS vs Kubernetes vs Fargate

ECS EC2:

You manage EC2 instances
Full control over instance types
Lower cost for steady workloads

ECS Fargate:

Serverless containers
No instance management
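Pay per vCPU and per GB of memory, billed by the second
~30% cost premium over comparable EC2 capacity (a rough figure that varies with utilization)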

Kubernetes (EKS):

More complex, more powerful
Better for multi-cloud
Larger ecosystem

Recommendation: Start with Fargate, move to EC2 if cost or customization needs demand it.

Core Concepts

ECS Cluster
├── Services (long-running tasks)
│   ├── Task Definition (container specs)
│   ├── Tasks (running containers)
│   └── Load Balancer
└── Scheduled Tasks (cron jobs)

Task Definition: The Blueprint

{
  "family": "api",
  "networkMode": "awsvpc",
  "requiresCompatibilities": ["FARGATE"],
  "cpu": "512",
  "memory": "1024",
  "taskRoleArn": "arn:aws:iam::ACCOUNT:role/api-task-role",
  "executionRoleArn": "arn:aws:iam::ACCOUNT:role/ecs-execution-role",
  "containerDefinitions": [{
    "name": "api",
    "image": "123456789.dkr.ecr.us-east-1.amazonaws.com/api:v1.2.3",
    "portMappings": [{
      "containerPort": 8080,
      "protocol": "tcp"
    }],
    "environment": [
      {"name": "NODE_ENV", "value": "production"}
    ],
    "secrets": [
      {
        "name": "DATABASE_URL",
        "valueFrom": "arn:aws:secretsmanager:us-east-1:ACCOUNT:secret:db-url"
      }
    ],
    "logConfiguration": {
      "logDriver": "awslogs",
      "options": {
        "awslogs-group": "/ecs/api",
        "awslogs-region": "us-east-1",
        "awslogs-stream-prefix": "ecs"
      }
    },
    "healthCheck": {
      "command": ["CMD-SHELL", "curl -f http://localhost:8080/health || exit 1"],
      "interval": 30,
      "timeout": 5,
      "retries": 3,
      "startPeriod": 60
    }
  }]
}
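
The Terraform service shown later in this guide references aws_ecs_task_definition.api, which is never defined. A minimal sketch of how the JSON above might be expressed in Terraform follows. The task role name (aws_iam_role.api_task) and the var.api_image variable are hypothetical, and the sketch assumes the execution role, secret, and log group are the resources defined elsewhere in this guide.

```hcl
# Sketch only: mirrors the JSON task definition above.
# aws_iam_role.api_task and var.api_image are hypothetical names.
resource "aws_ecs_task_definition" "api" {
  family                   = "api"
  network_mode             = "awsvpc"
  requires_compatibilities = ["FARGATE"]
  cpu                      = "512"
  memory                   = "1024"
  task_role_arn            = aws_iam_role.api_task.arn
  execution_role_arn       = aws_iam_role.ecs_execution.arn

  container_definitions = jsonencode([{
    name      = "api"
    image     = var.api_image # e.g. the ECR image URI including its tag
    essential = true

    portMappings = [{ containerPort = 8080, protocol = "tcp" }]

    environment = [{ name = "NODE_ENV", value = "production" }]

    # Injected at start-up by the execution role, never stored in the image
    secrets = [{
      name      = "DATABASE_URL"
      valueFrom = aws_secretsmanager_secret.db_url.arn
    }]

    logConfiguration = {
      logDriver = "awslogs"
      options = {
        "awslogs-group"         = aws_cloudwatch_log_group.api.name
        "awslogs-region"        = "us-east-1"
        "awslogs-stream-prefix" = "ecs"
      }
    }

    healthCheck = {
      command     = ["CMD-SHELL", "curl -f http://localhost:8080/health || exit 1"]
      interval    = 30
      timeout     = 5
      retries     = 3
      startPeriod = 60
    }
  }])
}
```

Note that the curl-based health check only works if curl is installed in the container image.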

Infrastructure as Code with Terraform

# VPC for ECS
module "vpc" {
  source = "terraform-aws-modules/vpc/aws"

  name = "ecs-vpc"
  cidr = "10.0.0.0/16"

  azs             = ["us-east-1a", "us-east-1b", "us-east-1c"]
  private_subnets = ["10.0.1.0/24", "10.0.2.0/24", "10.0.3.0/24"]
  public_subnets  = ["10.0.101.0/24", "10.0.102.0/24", "10.0.103.0/24"]

  enable_nat_gateway     = true
  single_nat_gateway     = false
  one_nat_gateway_per_az = true  # HA: one NAT gateway per AZ
}

# ECS Cluster
resource "aws_ecs_cluster" "main" {
  name = "production"

  setting {
    name  = "containerInsights"
    value = "enabled"
  }
}

# Application Load Balancer
resource "aws_lb" "api" {
  name               = "api-lb"
  load_balancer_type = "application"
  subnets            = module.vpc.public_subnets
  security_groups    = [aws_security_group.alb.id]
}

resource "aws_lb_target_group" "api" {
  name        = "api-tg"
  port        = 8080
  protocol    = "HTTP"
  vpc_id      = module.vpc.vpc_id
  target_type = "ip"

  health_check {
    path                = "/health"
    healthy_threshold   = 2
    unhealthy_threshold = 3
    timeout             = 5
    interval            = 30
    matcher             = "200"
  }

  deregistration_delay = 30
}

resource "aws_lb_listener" "api" {
  load_balancer_arn = aws_lb.api.arn
  port              = "443"
  protocol          = "HTTPS"
  ssl_policy        = "ELBSecurityPolicy-TLS13-1-2-2021-06"
  certificate_arn   = aws_acm_certificate.main.arn

  default_action {
    type             = "forward"
    target_group_arn = aws_lb_target_group.api.arn
  }
}

# ECS Service
resource "aws_ecs_service" "api" {
  name            = "api"
  cluster         = aws_ecs_cluster.main.id
  task_definition = aws_ecs_task_definition.api.arn
  desired_count   = 3
  launch_type     = "FARGATE"

  network_configuration {
    subnets          = module.vpc.private_subnets
    security_groups  = [aws_security_group.api.id]
    assign_public_ip = false
  }

  load_balancer {
    target_group_arn = aws_lb_target_group.api.arn
    container_name   = "api"
    container_port   = 8080
  }

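  # Rolling update: start new tasks before stopping old ones
  deployment_maximum_percent         = 200
  deployment_minimum_healthy_percent = 100

  # Roll back automatically if new tasks fail to stabilize
  deployment_circuit_breaker {
    enable   = true
    rollback = true
  }

  # Let Application Auto Scaling own the task count after creation
  lifecycle {
    ignore_changes = [desired_count]
  }

  # Ensure the listener exists before the service registers targets
  depends_on = [aws_lb_listener.api]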
}
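
The load balancer and service above reference aws_security_group.alb and aws_security_group.api without defining them. One possible shape for those groups is sketched below, assuming the ALB accepts public HTTPS traffic and the tasks should only be reachable from the ALB on port 8080. The group names are illustrative.

```hcl
# Sketch only: security groups assumed by the ALB and ECS service above.
resource "aws_security_group" "alb" {
  name   = "api-alb" # illustrative name
  vpc_id = module.vpc.vpc_id

  # Public HTTPS traffic to the load balancer
  ingress {
    from_port   = 443
    to_port     = 443
    protocol    = "tcp"
    cidr_blocks = ["0.0.0.0/0"]
  }

  egress {
    from_port   = 0
    to_port     = 0
    protocol    = "-1"
    cidr_blocks = ["0.0.0.0/0"]
  }
}

resource "aws_security_group" "api" {
  name   = "api-tasks" # illustrative name
  vpc_id = module.vpc.vpc_id

  # Only the load balancer may reach the container port
  ingress {
    from_port       = 8080
    to_port         = 8080
    protocol        = "tcp"
    security_groups = [aws_security_group.alb.id]
  }

  # Outbound access for image pulls, Secrets Manager, and CloudWatch Logs
  egress {
    from_port   = 0
    to_port     = 0
    protocol    = "-1"
    cidr_blocks = ["0.0.0.0/0"]
  }
}
```

Referencing the ALB's security group rather than a CIDR range keeps the tasks unreachable from anything else in the VPC.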

Auto-Scaling

Scale based on CPU, memory, or custom metrics:

# Scalable target: allow the service to run between 3 and 20 tasks
resource "aws_appautoscaling_target" "api" {
  service_namespace  = "ecs"
  scalable_dimension = "ecs:service:DesiredCount"
  resource_id        = "service/${aws_ecs_cluster.main.name}/${aws_ecs_service.api.name}"
  min_capacity       = 3
  max_capacity       = 20
}

# Target tracking: maintain 70% average CPU
resource "aws_appautoscaling_policy" "api_cpu" {
  name               = "api-cpu-scaling"
  policy_type        = "TargetTrackingScaling"
  service_namespace  = aws_appautoscaling_target.api.service_namespace
  scalable_dimension = aws_appautoscaling_target.api.scalable_dimension
  resource_id        = aws_appautoscaling_target.api.resource_id

  target_tracking_scaling_policy_configuration {
    predefined_metric_specification {
      predefined_metric_type = "ECSServiceAverageCPUUtilization"
    }
    target_value       = 70.0
    scale_in_cooldown  = 300
    scale_out_cooldown = 60
  }
}

# Predefined ALB metric: scale on request count per target
resource "aws_appautoscaling_policy" "api_requests" {
  name               = "api-request-scaling"
  policy_type        = "TargetTrackingScaling"
  service_namespace  = aws_appautoscaling_target.api.service_namespace
  scalable_dimension = aws_appautoscaling_target.api.scalable_dimension
  resource_id        = aws_appautoscaling_target.api.resource_id

  target_tracking_scaling_policy_configuration {
    predefined_metric_specification {
      predefined_metric_type = "ALBRequestCountPerTarget"
      resource_label         = "${aws_lb.api.arn_suffix}/${aws_lb_target_group.api.arn_suffix}"
    }
    target_value = 1000  # Requests per target per minute
  }
}
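
The policies above cover CPU and request count, but not the memory or custom-metric cases mentioned at the start of this section. A sketch of both follows. The 75% memory target, the MyApp namespace, and the QueueDepthPerTask metric are hypothetical placeholders; a custom metric only works if your application actually publishes it to CloudWatch.

```hcl
# Sketch only: memory-based target tracking (the 75% target is an assumption).
resource "aws_appautoscaling_policy" "api_memory" {
  name               = "api-memory-scaling"
  policy_type        = "TargetTrackingScaling"
  service_namespace  = aws_appautoscaling_target.api.service_namespace
  scalable_dimension = aws_appautoscaling_target.api.scalable_dimension
  resource_id        = aws_appautoscaling_target.api.resource_id

  target_tracking_scaling_policy_configuration {
    predefined_metric_specification {
      predefined_metric_type = "ECSServiceAverageMemoryUtilization"
    }
    target_value = 75.0
  }
}

# Sketch only: target tracking on a metric the application publishes itself.
# Namespace, metric name, and target value are hypothetical.
resource "aws_appautoscaling_policy" "api_queue" {
  name               = "api-queue-scaling"
  policy_type        = "TargetTrackingScaling"
  service_namespace  = aws_appautoscaling_target.api.service_namespace
  scalable_dimension = aws_appautoscaling_target.api.scalable_dimension
  resource_id        = aws_appautoscaling_target.api.resource_id

  target_tracking_scaling_policy_configuration {
    customized_metric_specification {
      namespace   = "MyApp"
      metric_name = "QueueDepthPerTask"
      statistic   = "Average"

      dimensions {
        name  = "ServiceName"
        value = aws_ecs_service.api.name
      }
    }
    target_value = 100
  }
}
```

When several target tracking policies are attached to one service, Application Auto Scaling scales out if any policy calls for it and scales in only when all of them agree.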

Blue/Green Deployments

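Zero-downtime deployments with CodeDeploy. The ECS service must use the CODE_DEPLOY deployment controller, and CodeDeploy shifts traffic between the two target groups named below: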

resource "aws_codedeploy_app" "api" {
  name             = "api"
  compute_platform = "ECS"
}

resource "aws_codedeploy_deployment_group" "api" {
  app_name               = aws_codedeploy_app.api.name
  deployment_group_name  = "api-deployment-group"
  service_role_arn       = aws_iam_role.codedeploy.arn
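  deployment_config_name = "CodeDeployDefault.ECSAllAtOnce"

  # ECS deployments must be blue/green with traffic control
  deployment_style {
    deployment_option = "WITH_TRAFFIC_CONTROL"
    deployment_type   = "BLUE_GREEN"
  }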

  blue_green_deployment_config {
    terminate_blue_instances_on_deployment_success {
      action                           = "TERMINATE"
      termination_wait_time_in_minutes = 5
    }

    deployment_ready_option {
      action_on_timeout = "CONTINUE_DEPLOYMENT"
    }
  }

  ecs_service {
    cluster_name = aws_ecs_cluster.main.name
    service_name = aws_ecs_service.api.name
  }

  load_balancer_info {
    target_group_pair_info {
      prod_traffic_route {
        listener_arns = [aws_lb_listener.api.arn]
      }

      target_group {
        name = aws_lb_target_group.api_blue.name
      }

      target_group {
        name = aws_lb_target_group.api_green.name
      }
    }
  }
}
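
The deployment group above depends on two target groups and a CodeDeploy service role that are not defined in this guide. A minimal sketch of those supporting resources follows. The resource names match the references above, while the name values and health check settings are assumptions copied from the earlier target group.

```hcl
# Sketch only: blue and green target groups referenced by the deployment group.
resource "aws_lb_target_group" "api_blue" {
  name        = "api-blue"
  port        = 8080
  protocol    = "HTTP"
  vpc_id      = module.vpc.vpc_id
  target_type = "ip"

  health_check {
    path    = "/health"
    matcher = "200"
  }
}

resource "aws_lb_target_group" "api_green" {
  name        = "api-green"
  port        = 8080
  protocol    = "HTTP"
  vpc_id      = module.vpc.vpc_id
  target_type = "ip"

  health_check {
    path    = "/health"
    matcher = "200"
  }
}

# Service role that lets CodeDeploy update the ECS service and shift traffic
resource "aws_iam_role" "codedeploy" {
  name = "api-codedeploy" # illustrative name

  assume_role_policy = jsonencode({
    Version = "2012-10-17"
    Statement = [{
      Effect    = "Allow"
      Action    = "sts:AssumeRole"
      Principal = { Service = "codedeploy.amazonaws.com" }
    }]
  })
}

resource "aws_iam_role_policy_attachment" "codedeploy" {
  role       = aws_iam_role.codedeploy.name
  policy_arn = "arn:aws:iam::aws:policy/AWSCodeDeployRoleForECS"
}
```

For this to work, the ECS service would also need a deployment_controller block with type = "CODE_DEPLOY" and a load_balancer block pointing at the blue target group. The deployment circuit breaker applies only to rolling updates, so it would be dropped from a service deployed this way.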

Secrets Management

Never hardcode secrets:

# Store secrets in Secrets Manager
resource "aws_secretsmanager_secret" "db_url" {
  name = "production/database-url"
}

resource "aws_secretsmanager_secret_version" "db_url" {
  secret_id     = aws_secretsmanager_secret.db_url.id
  secret_string = var.database_url
}

# Grant ECS task execution role access
resource "aws_iam_role_policy" "ecs_secrets" {
  role = aws_iam_role.ecs_execution.id

  policy = jsonencode({
    Version = "2012-10-17"
    Statement = [{
      Effect = "Allow"
      Action = [
        "secretsmanager:GetSecretValue"
      ]
      Resource = [
        aws_secretsmanager_secret.db_url.arn
      ]
    }]
  })
}
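
The policy above attaches to aws_iam_role.ecs_execution, which is not defined in this guide. A sketch of that role and of the database_url variable follows; the role name is an assumption taken from the ARN in the task definition.

```hcl
# Sketch only: execution role used by ECS to pull images, write logs,
# and fetch secrets before the container starts.
resource "aws_iam_role" "ecs_execution" {
  name = "ecs-execution-role"

  assume_role_policy = jsonencode({
    Version = "2012-10-17"
    Statement = [{
      Effect    = "Allow"
      Action    = "sts:AssumeRole"
      Principal = { Service = "ecs-tasks.amazonaws.com" }
    }]
  })
}

# AWS-managed baseline permissions: ECR pull and CloudWatch Logs write
resource "aws_iam_role_policy_attachment" "ecs_execution" {
  role       = aws_iam_role.ecs_execution.name
  policy_arn = "arn:aws:iam::aws:policy/service-role/AmazonECSTaskExecutionRolePolicy"
}

# Marking the variable sensitive keeps the value out of plan output
variable "database_url" {
  type      = string
  sensitive = true
}
```

Even with sensitive = true, the secret value is still written to the Terraform state file, so the state backend should be encrypted and access-controlled.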

Scheduled Tasks (Cron Jobs)

# EventBridge rule for scheduled task
resource "aws_cloudwatch_event_rule" "daily_report" {
  name                = "daily-report"
  description         = "Run daily report at 2 AM UTC"
  schedule_expression = "cron(0 2 * * ? *)"
}

resource "aws_cloudwatch_event_target" "daily_report" {
  rule      = aws_cloudwatch_event_rule.daily_report.name
  target_id = "daily-report-task"
  arn       = aws_ecs_cluster.main.arn
  role_arn  = aws_iam_role.events.arn

  ecs_target {
    task_count          = 1
    task_definition_arn = aws_ecs_task_definition.report.arn
    launch_type         = "FARGATE"

    network_configuration {
      subnets         = module.vpc.private_subnets
      security_groups = [aws_security_group.tasks.id]
    }
  }
}
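
The event target above assumes an aws_iam_role.events role that EventBridge can use to start the task. A minimal sketch follows, assuming the report task uses the same execution role as the API; the role and policy names are illustrative.

```hcl
# Sketch only: role assumed by EventBridge to run the scheduled task.
resource "aws_iam_role" "events" {
  name = "ecs-events-run-task" # illustrative name

  assume_role_policy = jsonencode({
    Version = "2012-10-17"
    Statement = [{
      Effect    = "Allow"
      Action    = "sts:AssumeRole"
      Principal = { Service = "events.amazonaws.com" }
    }]
  })
}

resource "aws_iam_role_policy" "events_run_task" {
  name = "run-daily-report" # illustrative name
  role = aws_iam_role.events.id

  policy = jsonencode({
    Version = "2012-10-17"
    Statement = [
      {
        # Allow starting this specific task definition revision
        Effect   = "Allow"
        Action   = "ecs:RunTask"
        Resource = aws_ecs_task_definition.report.arn
      },
      {
        # ECS needs the roles named in the task definition passed to it.
        # Add the task role ARN here as well if the report task has one.
        Effect   = "Allow"
        Action   = "iam:PassRole"
        Resource = [aws_iam_role.ecs_execution.arn]
        Condition = {
          StringEquals = { "iam:PassedToService" = "ecs-tasks.amazonaws.com" }
        }
      }
    ]
  })
}
```

Because the RunTask permission is pinned to one task definition revision, it has to be reapplied whenever the report task definition changes.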

Logging and Monitoring

# CloudWatch Log Group
resource "aws_cloudwatch_log_group" "api" {
  name              = "/ecs/api"
  retention_in_days = 30
}

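# Dashboard of service-level CPU and memory (standard AWS/ECS metrics)
resource "aws_cloudwatch_dashboard" "ecs" {
  dashboard_name = "ECS-Production"

  dashboard_body = jsonencode({
    widgets = [
      {
        type = "metric"
        properties = {
          metrics = [
            # AWS/ECS metrics are published per cluster and service, so the dimensions are required
            ["AWS/ECS", "CPUUtilization", "ClusterName", aws_ecs_cluster.main.name, "ServiceName", aws_ecs_service.api.name, {stat = "Average"}],
            [".", "MemoryUtilization", ".", ".", ".", ".", {stat = "Average"}]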
          ]
          period = 300
          stat   = "Average"
          region = "us-east-1"
          title  = "ECS Resource Utilization"
        }
      }
    ]
  })
}
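
A dashboard shows trends but does not notify anyone. As a sketch of the alerting side, the alarm below fires when average CPU stays above a threshold; the 85% threshold, the 15-minute window, and the SNS topic name are assumptions to tune for your workload.

```hcl
# Sketch only: notification topic (subscriptions such as email are not shown).
resource "aws_sns_topic" "alerts" {
  name = "ecs-alerts" # illustrative name
}

# Alarm when the api service averages more than 85% CPU for 15 minutes
resource "aws_cloudwatch_metric_alarm" "api_cpu_high" {
  alarm_name          = "api-cpu-high"
  alarm_description   = "api service CPU above 85% for 15 minutes"
  namespace           = "AWS/ECS"
  metric_name         = "CPUUtilization"
  statistic           = "Average"
  period              = 300
  evaluation_periods  = 3
  comparison_operator = "GreaterThanThreshold"
  threshold           = 85

  dimensions = {
    ClusterName = aws_ecs_cluster.main.name
    ServiceName = aws_ecs_service.api.name
  }

  alarm_actions = [aws_sns_topic.alerts.arn]
  ok_actions    = [aws_sns_topic.alerts.arn]
}
```

Setting the alarm threshold above the 70% auto-scaling target means it should only fire when scaling cannot keep up, for example when the service has reached max_capacity.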

CI/CD Pipeline

# GitHub Actions: Build and deploy
name: Deploy to ECS
on:
  push:
    branches: [main]

jobs:
  deploy:
    runs-on: ubuntu-latest
    permissions:
      id-token: write  # required to assume the AWS role via OIDC
      contents: read
    steps:
      - uses: actions/checkout@v4

      - name: Configure AWS credentials
        uses: aws-actions/configure-aws-credentials@v4
        with:
          role-to-assume: ${{ secrets.AWS_ROLE_ARN }}
          aws-region: us-east-1

      - name: Login to ECR
        id: ecr-login
        uses: aws-actions/amazon-ecr-login@v2

      - name: Build and push image
        env:
          ECR_REGISTRY: ${{ steps.ecr-login.outputs.registry }}
          IMAGE_TAG: ${{ github.sha }}
        run: |
          docker build -t $ECR_REGISTRY/api:$IMAGE_TAG .
          docker push $ECR_REGISTRY/api:$IMAGE_TAG

      - name: Update task definition
        id: task-def
        uses: aws-actions/amazon-ecs-render-task-definition@v1
        with:
          task-definition: task-definition.json
          container-name: api
          image: ${{ steps.ecr-login.outputs.registry }}/api:${{ github.sha }}

      - name: Deploy to ECS
        uses: aws-actions/amazon-ecs-deploy-task-definition@v2
        with:
          task-definition: ${{ steps.task-def.outputs.task-definition }}
          service: api
          cluster: production
          wait-for-service-stability: true

Cost Optimization

1. Fargate Spot

Up to 70% discount for interruptible workloads:

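resource "aws_ecs_service" "batch" {
  name            = "batch"
  cluster         = aws_ecs_cluster.main.id
  task_definition = aws_ecs_task_definition.batch.arn  # hypothetical task definition
  desired_count   = 1

  # Do not set launch_type when using a capacity provider strategy.
  # network_configuration is omitted here for brevity.
  capacity_provider_strategy {
    capacity_provider = "FARGATE_SPOT"
    weight            = 100
    base              = 0
  }
}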
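
A service can only use FARGATE_SPOT if that capacity provider is associated with the cluster. A sketch of the association follows, with a default strategy that keeps one task on regular Fargate and splits the rest roughly 3:1 in favor of Spot. The base and weight values are assumptions to adjust for your tolerance to interruption.

```hcl
# Sketch only: make both Fargate capacity providers available to the cluster.
resource "aws_ecs_cluster_capacity_providers" "main" {
  cluster_name       = aws_ecs_cluster.main.name
  capacity_providers = ["FARGATE", "FARGATE_SPOT"]

  # Always keep at least one task on on-demand Fargate
  default_capacity_provider_strategy {
    capacity_provider = "FARGATE"
    base              = 1
    weight            = 1
  }

  # Place roughly three Spot tasks for every additional on-demand task
  default_capacity_provider_strategy {
    capacity_provider = "FARGATE_SPOT"
    weight            = 3
  }
}
```

Spot tasks receive a two-minute warning before interruption, so the workload should handle SIGTERM and be safe to retry.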

2. Right-sizing

Monitor and adjust:

# Hourly average CPU for the api service over one week.
# Service-level metrics need both the ClusterName and ServiceName dimensions.
aws cloudwatch get-metric-statistics \
  --namespace AWS/ECS \
  --metric-name CPUUtilization \
  --dimensions Name=ClusterName,Value=production Name=ServiceName,Value=api \
  --start-time 2024-01-01T00:00:00Z \
  --end-time 2024-01-07T23:59:59Z \
  --period 3600 \
  --statistics Average

3. Savings Plans

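Compute Savings Plans cover both Fargate and EC2 usage: commit to a steady hourly spend for a 1- or 3-year term for roughly 30-50% savings.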

Production Checklist

Health checks configured
Auto-scaling enabled
Secrets in Secrets Manager/Parameter Store
Logs exported to CloudWatch
Container Insights enabled
Task role follows least privilege
Deployment circuit breaker enabled
Multi-AZ deployment
Load balancer in front
Resource limits set (CPU/memory)
Blue/green deployments for critical services
Monitoring and alerting configured

When NOT to Use ECS

Heavy Kubernetes investment: Stick with K8s
Multi-cloud requirement: Use Kubernetes
Complex service mesh needs: Consider EKS + Istio
Self-hosted requirement: Use Docker Swarm or Nomad

Conclusion

ECS strikes the balance between simplicity and power. It's the sweet spot for teams that want containers without Kubernetes complexity.

Start with Fargate for simplicity, optimize costs with EC2 launch type when needed, and leverage AWS integrations for a seamless production experience.

