AWS ECS: Essential IAM Policies and Configuration Notes

A comprehensive collection of IAM policies, configuration examples, and best practices for working with Amazon Elastic Container Service (ECS).

Amazon Elastic Container Service (ECS) is a fully managed container orchestration service that makes it easy to deploy, manage, and scale containerized applications. This article provides essential IAM policies and configuration notes to help you work effectively with ECS.

ECS Cluster Management Policy

To manage ECS clusters, you’ll need appropriate IAM permissions. The following policy allows users to create and list ECS clusters:

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "ecs:CreateCluster",
        "ecs:ListClusters"
      ],
      "Resource": [
        "*"
      ]
    }
  ]
}

Policy Explanation:

  • This policy grants permission to create new ECS clusters and list existing ones
  • The `CreateCluster` and `ListClusters` actions do not accept specific resource ARNs
  • The resource is set to `*` (all resources) because these actions operate at the account level
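
With this policy attached, a user can exercise both actions from the AWS CLI. A minimal sketch, assuming the AWS CLI is installed and configured with credentials for the account (the cluster name is a placeholder):

aws ecs create-cluster --cluster-name demo-cluster
aws ecs list-clusters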

ECS Role for Load Balancer Integration

When integrating ECS with Elastic Load Balancing (ELB), you need to create a role with the following permissions:

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "ec2:AuthorizeSecurityGroupIngress",
        "ec2:Describe*",
        "elasticloadbalancing:DeregisterInstancesFromLoadBalancer",
        "elasticloadbalancing:DeregisterTargets",
        "elasticloadbalancing:Describe*",
        "elasticloadbalancing:RegisterInstancesWithLoadBalancer",
        "elasticloadbalancing:RegisterTargets"
      ],
      "Resource": "*"
    }
  ]
}

Purpose of this Policy:

This policy enables ECS to interact with load balancers by providing permissions to:

  1. Security Group Management: Modify security group rules to allow traffic from load balancers
  2. EC2 Discovery: Describe EC2 instances and their attributes
  3. Load Balancer Registration: Register and deregister instances with both Classic and Application Load Balancers
  4. Load Balancer Configuration: View load balancer configurations and settings
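
Because this permissions policy is attached to a role that the ECS service itself assumes, the role also needs a trust relationship with the ECS service principal. A minimal sketch of such a trust policy, assuming the legacy `ecs.amazonaws.com` service principal (newer setups typically rely on the ECS service-linked role instead):

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": { "Service": "ecs.amazonaws.com" },
      "Action": "sts:AssumeRole"
    }
  ]
}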

Configuring ECS with Load Balancers using Ansible

When using Ansible to manage ECS services with load balancers, you can configure both classic and application load balancers.

Using Application Load Balancers with Ansible ECS Module

The Ansible `ecs_service` module supports Application Load Balancer integration. You need to specify the ELB target group ARN along with container details:

load_balancers:
  - targetGroupArn: arn:aws:elasticloadbalancing:region:account-id:targetgroup/my-targets/73e2d6bc24d8a067
    containerName: mycontainer
    containerPort: 8080

Required Parameters:

  • `targetGroupArn`: The full ARN of the target group to associate with the service
  • `containerName`: The name of the container to associate with the load balancer (as it appears in the task definition)
  • `containerPort`: The port on the container to associate with the load balancer

Reference: Ansible GitHub Issue #2998

Understanding Target Group Ports in ECS with Application Load Balancers

When configuring Application Load Balancers (ALB) with ECS, it’s important to understand how target group ports work.

Target Group Port Configuration

The target group port specifies which port the load balancer should use to route traffic to your containers. This is distinct from both the listener port (where the load balancer accepts traffic) and the container port (where your application listens inside the container).

For ECS services, you need to ensure that:

  1. The target group port matches the port your tasks actually expose (with dynamic host port mapping, ECS registers each task’s assigned host port with the target group, so the configured target group port mainly serves as a default)
  2. Security groups allow traffic on this port
  3. The container port in your task definition matches this configuration

For more details, see this Stack Overflow discussion on target group ports with ALB and EC2 Container Service.
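
As an illustration, a task definition using dynamic host port mapping might declare port mappings like the fragment below (container port and protocol are placeholders); ECS then registers whatever host port is assigned at launch with the target group:

"portMappings": [
  {
    "containerPort": 8080,
    "hostPort": 0,
    "protocol": "tcp"
  }
]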

EC2 Instance Role for ECS Container Instances

EC2 instances that serve as ECS container instances require specific permissions to communicate with the ECS service, pull images from ECR, and send logs to CloudWatch. The following policy, named `AmazonEC2ContainerServiceforEC2Role`, should be attached to the IAM role used by your container instances:

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "ecs:CreateCluster",
        "ecs:DeregisterContainerInstance",
        "ecs:DiscoverPollEndpoint",
        "ecs:Poll",
        "ecs:RegisterContainerInstance",
        "ecs:StartTelemetrySession",
        "ecs:Submit*",
        "ecr:GetAuthorizationToken",
        "ecr:BatchCheckLayerAvailability",
        "ecr:GetDownloadUrlForLayer",
        "ecr:BatchGetImage",
        "logs:CreateLogStream",
        "logs:PutLogEvents"
      ],
      "Resource": "*"
    }
  ]
}

This Policy Enables:

  1. ECS Agent Operations: Allows the ECS agent to register the instance with your clusters and communicate with the ECS service
  2. ECR Integration: Permits pulling container images from Amazon ECR (Elastic Container Registry)
  3. CloudWatch Logs: Enables sending container logs to CloudWatch

Implementation Note: This policy should be attached to the IAM role that you assign to your EC2 instances when launching them. The ECS agent running on these instances will use these permissions to perform its functions.
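
AWS also publishes this policy as a managed policy of the same name, so one option is simply to attach it to the instance role; a sketch with the AWS CLI, assuming the role is named `ecsInstanceRole` (a common convention, not a requirement):

aws iam attach-role-policy \
  --role-name ecsInstanceRole \
  --policy-arn arn:aws:iam::aws:policy/service-role/AmazonEC2ContainerServiceforEC2Role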

Availability Zone Alignment for ECS and Load Balancers

Critical Configuration Requirement

When using Elastic Load Balancers with ECS, ensure that your load balancer is configured for the same Availability Zones (AZs) where your ECS instances or tasks are running.

Common Error:

reason: Target is in an Availability Zone that is not enabled for the load balancer

Resolution Steps:

  1. Identify the AZs where your ECS instances or Fargate tasks are running
  2. Verify the AZs enabled for your load balancer
  3. Either:
    • Enable the missing AZs in your load balancer configuration, or
    • Launch your ECS instances/tasks in AZs that match your load balancer
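
The checks in steps 1 and 2, and the fix in step 3, can be scripted with the AWS CLI; a sketch, with placeholder ARNs and subnet IDs:

# Which AZ is each target in, and why is it unhealthy?
aws elbv2 describe-target-health --target-group-arn <target-group-arn>

# Which subnets (and therefore AZs) is the load balancer attached to?
aws elbv2 describe-load-balancers --load-balancer-arns <load-balancer-arn>

# Enable an additional AZ by adding its subnet to the load balancer
aws elbv2 set-subnets --load-balancer-arn <load-balancer-arn> --subnets subnet-aaaa subnet-bbbb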

Best Practice: When designing your ECS architecture, plan your AZ strategy in advance to ensure proper alignment between all components.

Manually Creating Application Load Balancers for ECS

Health Check Configuration

When manually creating an Application Load Balancer (ALB) for use with ECS, pay special attention to the health check configuration:

  1. Health Check Path: Configure a specific URI path that your application responds to with a healthy status

    • Example: `/health`, `/status`, or `/ping`
    • For simple applications, you can use `/` if your application responds with a 200 OK status
  2. Health Check Settings:

    • Interval: How frequently the ALB sends health check requests (recommended: 30 seconds)
    • Timeout: How long to wait for a response (recommended: 5 seconds)
    • Healthy threshold: Number of consecutive successful checks to mark as healthy (recommended: 2)
    • Unhealthy threshold: Number of consecutive failed checks to mark as unhealthy (recommended: 2)

Note: Ensure your application properly responds to health check requests at the configured path, or your services will be marked as unhealthy.
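
These settings map directly onto target group attributes, so they can also be applied from the AWS CLI; a sketch using the recommended values above, with a placeholder target group ARN:

aws elbv2 modify-target-group \
  --target-group-arn <target-group-arn> \
  --health-check-path /health \
  --health-check-interval-seconds 30 \
  --health-check-timeout-seconds 5 \
  --healthy-threshold-count 2 \
  --unhealthy-threshold-count 2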

Using Application Load Balancers with Amazon ECS

Application Load Balancers (ALBs) are ideal for container-based applications and microservices architectures running on ECS. They provide advanced routing capabilities and deeper integration with ECS.

Key Features of Application Load Balancers for ECS

  1. Path-based Routing: Route traffic to different ECS services based on URL paths

    • Example: `/api/*` routes to your API service while `/app/*` routes to your web application
  2. Multiple Port Support: Register the same container instance on multiple ports

    • Enables running multiple containerized applications on the same EC2 instance
  3. AWS Service Integration: Works seamlessly with ECS, IAM, Auto Scaling, and CloudFormation

    • Simplifies deployment and management of containerized applications
  4. Enhanced Monitoring: Provides detailed metrics and improved health checks

    • Better visibility into application performance and health

Core Components of Application Load Balancers

  1. Load Balancer: The entry point for all client traffic

  2. Listener: Evaluates incoming connection requests based on protocol and port

    • Example: HTTP on port 80 or HTTPS on port 443
  3. Rules: Define how requests are routed to target groups

    • Conditions can include path patterns, host headers, or query parameters
    • Each rule has a priority that determines processing order
  4. Target: The destination for traffic (EC2 instances, IP addresses, or Lambda functions)

    • For ECS, these are your container instances or Fargate tasks
  5. Target Group: A logical grouping of targets

    • Each ECS service typically maps to a target group
    • Health checks are configured at the target group level
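
To make the relationship between listeners, rules, and target groups concrete, the path-based routing example from the feature list above could be wired up with the AWS CLI roughly as follows (the ARNs are placeholders):

aws elbv2 create-rule \
  --listener-arn <listener-arn> \
  --priority 10 \
  --conditions Field=path-pattern,Values='/api/*' \
  --actions Type=forward,TargetGroupArn=<api-target-group-arn>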

For a detailed walkthrough, see AWS’s guide on Microservice Delivery with Amazon ECS and Application Load Balancers.

Troubleshooting ECS with Load Balancers

Common Issues and Solutions

When your ECS services aren’t properly connecting with Elastic Load Balancers, check these common problem areas:

  1. Security Group Configuration

    • Ensure the load balancer security group can reach your container instances
    • Verify container instance security groups allow traffic from the load balancer
    • Check that the correct ports are open (both listener port and target port)
  2. Application Health

    • Confirm your application is running correctly inside the container
    • Verify the application responds properly to health check requests
    • Check container logs for application errors (via CloudWatch Logs)
  3. Network Configuration

    • Verify subnet configurations for both load balancer and ECS tasks/instances
    • Ensure Availability Zones are aligned between load balancer and ECS resources
    • For tasks in private subnets, check NAT gateway configuration if internet access is needed
  4. Service Configuration

    • Confirm the correct container name and port are specified in the service definition
    • Verify the target group ARN is correct
    • Check that the task definition exposes the expected port

Diagnostic Steps

  1. Check ECS service events for deployment failures
  2. Review target group health status in the EC2 console
  3. Examine ECS task status and stopped task reasons
  4. Inspect container logs for application errors
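
Each of these steps has a CLI equivalent; a sketch, assuming a cluster named my-cluster, a service named my-service, AWS CLI v2, and placeholder ARNs and log group name:

# 1. Recent service events (deployment failures, registration errors)
aws ecs describe-services --cluster my-cluster --services my-service \
  --query 'services[0].events[:10]'

# 2. Target health as seen by the load balancer
aws elbv2 describe-target-health --target-group-arn <target-group-arn>

# 3. Stopped tasks and the reason they stopped
aws ecs list-tasks --cluster my-cluster --desired-status STOPPED
aws ecs describe-tasks --cluster my-cluster --tasks <task-arn> \
  --query 'tasks[0].stoppedReason'

# 4. Application logs, if the task uses the awslogs driver
aws logs tail <log-group-name> --since 1h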

Configuring ECS Services with Load Balancers

Service Definition Best Practices

When creating or updating ECS services that use load balancers, follow these configuration guidelines:

  1. Task Definition Preparation

    • Ensure your container definition includes the correct port mappings
    • Use dynamic host port mapping for better resource utilization
    • Consider using the `awsvpc` network mode for simplified networking
  2. Load Balancer Configuration

    • Create and configure your target group before creating the ECS service
    • Set appropriate health check settings based on your application’s startup time
    • Configure stickiness settings if your application requires session persistence
  3. Service Definition

    • Specify the correct load balancer details in your service definition
    • Include container name, container port, and target group ARN
    • Set an appropriate deployment configuration for rolling updates

Example Service Configuration with Load Balancer

{
  "cluster": "my-cluster",
  "serviceName": "my-service",
  "taskDefinition": "my-task:1",
  "loadBalancers": [
    {
      "targetGroupArn": "arn:aws:elasticloadbalancing:region:account:targetgroup/my-targets/73e2d6bc24d8a067",
      "containerName": "web-app",
      "containerPort": 8080
    }
  ],
  "desiredCount": 2,
  "launchType": "FARGATE",
  "networkConfiguration": {
    "awsvpcConfiguration": {
      "subnets": ["subnet-12345", "subnet-67890"],
      "securityGroups": ["sg-12345"],
      "assignPublicIp": "ENABLED"
    }
  }
}
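
If that JSON were saved as service.json, it could be passed straight to the CreateService API via the AWS CLI; a one-line sketch:

aws ecs create-service --cli-input-json file://service.json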

Ansible ECS Service with Load Balancers: Documentation Notes

Required Load Balancer Parameters

When using the Ansible `ecs_service` module with load balancers, each load balancer dictionary must include the following parameters:

  1. For Classic Load Balancers:

    • `loadBalancerName`: The name of the load balancer
    • `containerName`: The name of the container to associate with the load balancer
    • `containerPort`: The port on the container to associate with the load balancer
  2. For Application/Network Load Balancers:

    • `targetGroupArn`: The ARN of the target group
    • `containerName`: The name of the container to associate with the load balancer
    • `containerPort`: The port on the container to associate with the load balancer

Documentation Gap

The Ansible documentation for the `ecs_service` module doesn’t fully specify these required parameters, but they are documented in the AWS Boto3 SDK documentation: Boto3 ECS Documentation

Example Ansible Playbook

- name: Create ECS service with load balancer
  ecs_service:
    state: present
    name: my-service
    cluster: my-cluster
    task_definition: my-task
    desired_count: 3
    load_balancers:
      - targetGroupArn: arn:aws:elasticloadbalancing:region:account:targetgroup/my-targets/73e2d6bc24d8a067
        containerName: web-app
        containerPort: 80

Note: Always refer to the Boto3 ECS Documentation for the most accurate and complete parameter requirements.

ECS Cluster Auto-scaling Best Practices

Understanding ECS Cluster Scaling

Effective auto-scaling for ECS clusters requires understanding the relationship between container requirements and instance capacity. The key is to scale based on resource utilization rather than arbitrary cluster size.

Calculating the Optimal Scaling Threshold

The formula for determining when to scale your ECS cluster is:

Threshold = (1 - max(Container Reservation) / Total Capacity of a Single Container Instance) * 100

Example Calculation:

Assume you have:

  • Container instance capacity: 2048 MB RAM
  • Maximum container reservation: 512 MB RAM

Threshold = (1 - 512 / 2048) * 100
Threshold = (1 - 0.25) * 100
Threshold = 75%

This means you should set your auto-scaling policy to add capacity when the cluster’s memory reservation exceeds 75%.

Comprehensive Scaling Approach

  1. Calculate thresholds for both memory and CPU:

    • Memory threshold as shown above
    • CPU threshold using the same formula with CPU units
  2. Use the lower of the two thresholds:

    • This ensures you scale before either resource becomes a bottleneck
  3. Configure CloudWatch alarms:

    • Set alarms based on `MemoryReservation` and `CPUReservation` metrics
    • Use the calculated thresholds as alarm triggers
  4. Set appropriate scaling policies:

    • Scale out quickly (add instances when threshold is reached)
    • Scale in conservatively (remove instances when utilization is low for a sustained period)

For a detailed walkthrough of this approach, see Docker on ECS: Scale your cluster automatically.
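
A CloudWatch alarm implementing the 75% memory-reservation threshold from the example might be created roughly as follows (the cluster name, evaluation settings, and scaling policy ARN are placeholders to adapt):

aws cloudwatch put-metric-alarm \
  --alarm-name my-cluster-memory-reservation-high \
  --namespace AWS/ECS \
  --metric-name MemoryReservation \
  --dimensions Name=ClusterName,Value=my-cluster \
  --statistic Average \
  --period 60 \
  --evaluation-periods 3 \
  --threshold 75 \
  --comparison-operator GreaterThanOrEqualToThreshold \
  --alarm-actions <scale-out-policy-arn>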

ECS Container Instance Troubleshooting

Common Issue: Containers Terminated During Startup

One frequent problem with ECS containers, especially for Java applications, is containers being marked unhealthy and terminated before they fully initialize.

Root Cause Analysis

This typically happens because:

  1. Inadequate Health Check Configuration: Health checks begin too soon after container startup
  2. Long Application Startup Time: Applications with significant initialization time (like JVM-based applications) need more time before they can respond to health checks
  3. Mismatched Timing Parameters: The health check interval and threshold settings don’t accommodate the application’s startup characteristics

Solution Strategies

  1. Adjust Health Check Settings:

    • Increase the `healthCheckGracePeriodSeconds` in your ECS service definition
    • For Java applications, consider values of 60-120 seconds or more
  2. Optimize Application Startup:

    • Implement a quick-responding health endpoint that returns healthy before full initialization
    • Consider using Spring Boot’s health indicators with separate readiness/liveness probes
  3. Task Definition Improvements:

    • Use the `startPeriod` parameter in container health checks to provide initialization time
    • Increase `interval` between health checks
    • Adjust `retries` to allow more failed health checks before marking unhealthy

Example Task Definition with Optimized Health Check

"healthCheck": {
  "command": ["CMD-SHELL", "curl -f http://localhost:8080/health || exit 1"],
  "interval": 30,
  "timeout": 5,
  "retries": 3,
  "startPeriod": 120
}

The `startPeriod` of 120 seconds gives the JVM and application time to initialize before health checks begin counting toward the healthy/unhealthy status.
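
At the service level, the `healthCheckGracePeriodSeconds` mentioned above complements the container-level `startPeriod`; a sketch of a service definition using it (names and the target group ARN are placeholders), typically applied to services that sit behind a load balancer:

{
  "cluster": "my-cluster",
  "serviceName": "my-java-service",
  "taskDefinition": "my-java-task:1",
  "desiredCount": 2,
  "healthCheckGracePeriodSeconds": 120,
  "loadBalancers": [
    {
      "targetGroupArn": "arn:aws:elasticloadbalancing:region:account:targetgroup/java-targets/0123456789abcdef",
      "containerName": "java-app",
      "containerPort": 8080
    }
  ]
}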