AWS ECS: Essential IAM Policies and Configuration Notes
Amazon Elastic Container Service (ECS) is a fully managed container orchestration service for deploying, managing, and scaling containerized applications. This article collects the IAM policies and configuration notes you are most likely to need when working with ECS.
ECS Cluster Management Policy
To manage ECS clusters, you’ll need appropriate IAM permissions. The following policy allows users to create and list ECS clusters:
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "ecs:CreateCluster",
        "ecs:ListClusters"
      ],
      "Resource": [
        "*"
      ]
    }
  ]
}
Policy Explanation:
- This policy grants permission to create new ECS clusters and list existing ones
- The `CreateCluster` and `ListClusters` actions do not accept specific resource ARNs
- The resource is set to `*` (all resources) because these actions operate at the account level
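If you generate policies from scripts, the document above can be built and serialized with the standard library. This is a minimal sketch; the function name `build_cluster_policy` is hypothetical and not part of any AWS SDK.

```python
import json


def build_cluster_policy():
    """Return the cluster management policy shown above as a dict."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": ["ecs:CreateCluster", "ecs:ListClusters"],
                # These actions do not accept resource ARNs, so "*" is used.
                "Resource": ["*"],
            }
        ],
    }


# Serialize to JSON text suitable for pasting into a policy editor.
policy_json = json.dumps(build_cluster_policy(), indent=2)
print(policy_json)
```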
ECS Role for Load Balancer Integration
When integrating ECS with Elastic Load Balancing (ELB), you need to create a role with the following permissions:
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "ec2:AuthorizeSecurityGroupIngress",
        "ec2:Describe*",
        "elasticloadbalancing:DeregisterInstancesFromLoadBalancer",
        "elasticloadbalancing:DeregisterTargets",
        "elasticloadbalancing:Describe*",
        "elasticloadbalancing:RegisterInstancesWithLoadBalancer",
        "elasticloadbalancing:RegisterTargets"
      ],
      "Resource": "*"
    }
  ]
}
Purpose of this Policy:
This policy enables ECS to interact with load balancers by providing permissions to:
- Security Group Management: Modify security group rules to allow traffic from load balancers
- EC2 Discovery: Describe EC2 instances and their attributes
- Load Balancer Registration: Register and deregister instances with Classic Load Balancers, and targets with the target groups used by Application Load Balancers
- Load Balancer Configuration: View load balancer configurations and settings
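The `Describe*` entries in this policy are wildcards that cover every action whose name starts with `Describe`. The sketch below shows how such patterns match individual action names. It is a deliberately simplified model, not IAM's evaluation logic: it ignores Deny statements, conditions, and resource scoping, and the helper name `action_is_listed` is hypothetical.

```python
from fnmatch import fnmatchcase

# Action patterns copied from the policy above.
ELB_ROLE_ACTIONS = [
    "ec2:AuthorizeSecurityGroupIngress",
    "ec2:Describe*",
    "elasticloadbalancing:DeregisterInstancesFromLoadBalancer",
    "elasticloadbalancing:DeregisterTargets",
    "elasticloadbalancing:Describe*",
    "elasticloadbalancing:RegisterInstancesWithLoadBalancer",
    "elasticloadbalancing:RegisterTargets",
]


def action_is_listed(action, patterns=ELB_ROLE_ACTIONS):
    """Return True if `action` matches any pattern in the Action list.

    Matching is case-insensitive and supports the `*` wildcard.
    """
    needle = action.lower()
    return any(fnmatchcase(needle, pattern.lower()) for pattern in patterns)


print(action_is_listed("ec2:DescribeInstances"))                     # True
print(action_is_listed("elasticloadbalancing:DescribeTargetHealth"))  # True
print(action_is_listed("ec2:TerminateInstances"))                    # False
```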
Configuring ECS with Load Balancers using Ansible
When using Ansible to manage ECS services with load balancers, you can configure both classic and application load balancers.
Using Application Load Balancers with Ansible ECS Module
The Ansible `ecs_service` module supports Application Load Balancer integration. You need to specify the ELB target group ARN along with container details:
load_balancers:
  - targetGroupArn: arn:aws:elasticloadbalancing:region:account-id:targetgroup/my-targets/73e2d6bc24d8a067
    containerName: mycontainer
    containerPort: 8080
Required Parameters:
- `targetGroupArn`: The full ARN of the target group to associate with the service
- `containerName`: The name of the container to associate with the load balancer (as it appears in the task definition)
- `containerPort`: The port on the container to associate with the load balancer
Reference: Ansible GitHub Issue #2998
Understanding Target Group Ports in ECS with Application Load Balancers
When configuring Application Load Balancers (ALB) with ECS, it’s important to understand how target group ports work.
Target Group Port Configuration
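The target group port is the default port on which the load balancer sends traffic to registered targets. This is distinct from both the listener port (where the load balancer accepts traffic) and the container port (where your application listens inside the container). When ECS registers a task with a target group, it registers the port the task is actually reachable on (the host port, which may be dynamically assigned, or the container port in `awsvpc` mode), and that per-target port overrides the target group default.
For ECS services, you need to ensure that:
- The port registered for each target is reachable on your container instances or tasks
- Security groups allow traffic on this port
- The container port in your task definition matches the container port in the service's load balancer configuration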
For more details, see this Stack Overflow discussion on target group ports with ALB and EC2 Container Service.
EC2 Instance Role for ECS Container Instances
EC2 instances that serve as ECS container instances require specific permissions to communicate with the ECS service, pull images from ECR, and send logs to CloudWatch. The following policy, named `AmazonEC2ContainerServiceforEC2Role`, should be attached to the IAM role used by your container instances:
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "ecs:CreateCluster",
        "ecs:DeregisterContainerInstance",
        "ecs:DiscoverPollEndpoint",
        "ecs:Poll",
        "ecs:RegisterContainerInstance",
        "ecs:StartTelemetrySession",
        "ecs:Submit*",
        "ecr:GetAuthorizationToken",
        "ecr:BatchCheckLayerAvailability",
        "ecr:GetDownloadUrlForLayer",
        "ecr:BatchGetImage",
        "logs:CreateLogStream",
        "logs:PutLogEvents"
      ],
      "Resource": "*"
    }
  ]
}
This Policy Enables:
- ECS Agent Operations: Allows the ECS agent to register the instance with your clusters and communicate with the ECS service
- ECR Integration: Permits pulling container images from Amazon ECR (Elastic Container Registry)
- CloudWatch Logs: Enables sending container logs to CloudWatch
Implementation Note: This policy should be attached to the IAM role that you assign to your EC2 instances when launching them. The ECS agent running on these instances will use these permissions to perform its functions.
Availability Zone Alignment for ECS and Load Balancers
Critical Configuration Requirement
When using Elastic Load Balancers with ECS, ensure that your load balancer is configured for the same Availability Zones (AZs) where your ECS instances or tasks are running.
Common Error:
reason: Target is in an Availability Zone that is not enabled for the load balancer
Resolution Steps:
- Identify the AZs where your ECS instances or Fargate tasks are running
- Verify the AZs enabled for your load balancer
- Either:
  - Enable the missing AZs in your load balancer configuration, or
  - Launch your ECS instances/tasks in AZs that match your load balancer
Best Practice: When designing your ECS architecture, plan your AZ strategy in advance to ensure proper alignment between all components.
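The comparison in the resolution steps is a set difference. A minimal sketch, assuming you have already collected the two AZ lists by whatever means you prefer; the function name and the zone names are hypothetical example values.

```python
def missing_load_balancer_zones(target_zones, load_balancer_zones):
    """Return AZs that host targets but are not enabled on the load balancer."""
    return sorted(set(target_zones) - set(load_balancer_zones))


# Hypothetical example values.
task_zones = ["us-east-1a", "us-east-1b", "us-east-1c"]
lb_zones = ["us-east-1a", "us-east-1b"]

missing = missing_load_balancer_zones(task_zones, lb_zones)
print(missing)  # ['us-east-1c']
```

An empty result means every AZ that hosts targets is enabled on the load balancer.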
Manually Creating Application Load Balancers for ECS
Health Check Configuration
When manually creating an Application Load Balancer (ALB) for use with ECS, pay special attention to the health check configuration:
- Health Check Path: Configure a specific URI path that your application responds to with a healthy status
  - Example: `/health`, `/status`, or `/ping`
  - For simple applications, you can use `/` if your application responds with a 200 OK status
- Health Check Settings:
  - Interval: How frequently the ALB sends health check requests (recommended: 30 seconds)
  - Timeout: How long to wait for a response (recommended: 5 seconds)
  - Healthy threshold: Number of consecutive successful checks to mark as healthy (recommended: 2)
  - Unhealthy threshold: Number of consecutive failed checks to mark as unhealthy (recommended: 2)
Note: Ensure your application properly responds to health check requests at the configured path, or your services will be marked as unhealthy.
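These settings interact: a target changes state only after the threshold number of consecutive checks, so the reaction time is roughly the interval multiplied by the threshold. The sketch below is a rough estimate under that assumption; it ignores the timeout and any scheduling jitter, and the function name is hypothetical.

```python
def seconds_until_state_change(interval, threshold):
    """Rough time for `threshold` consecutive checks spaced `interval` seconds apart."""
    return interval * threshold


# Recommended values from the list above.
INTERVAL = 30
HEALTHY_THRESHOLD = 2
UNHEALTHY_THRESHOLD = 2

print(seconds_until_state_change(INTERVAL, HEALTHY_THRESHOLD))    # 60
print(seconds_until_state_change(INTERVAL, UNHEALTHY_THRESHOLD))  # 60
```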
Using Application Load Balancers with Amazon ECS
Application Load Balancers (ALBs) are ideal for container-based applications and microservices architectures running on ECS. They provide advanced routing capabilities and deeper integration with ECS.
Key Features of Application Load Balancers for ECS
- Path-based Routing: Route traffic to different ECS services based on URL paths
  - Example: `/api/*` routes to your API service while `/app/*` routes to your web application
- Multiple Port Support: Register the same container instance on multiple ports
  - Enables running multiple containerized applications on the same EC2 instance
- AWS Service Integration: Works seamlessly with ECS, IAM, Auto Scaling, and CloudFormation
  - Simplifies deployment and management of containerized applications
- Enhanced Monitoring: Provides detailed metrics and improved health checks
  - Better visibility into application performance and health
Core Components of Application Load Balancers
- Load Balancer: The entry point for all client traffic
- Listener: Evaluates incoming connection requests based on protocol and port
  - Example: HTTP on port 80 or HTTPS on port 443
- Rules: Define how requests are routed to target groups
  - Conditions can include path patterns, host headers, or query parameters
  - Each rule has a priority that determines processing order
- Target: The destination for traffic (EC2 instances, IP addresses, or Lambda functions)
  - For ECS, these are your container instances or Fargate tasks
- Target Group: A logical grouping of targets
  - Each ECS service typically maps to a target group
  - Health checks are configured at the target group level
For a detailed walkthrough, see AWS’s guide on Microservice Delivery with Amazon ECS and Application Load Balancers.
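To make the rule and priority model concrete, here is a simplified sketch of path-based rule evaluation: rules are tried in ascending priority order, and the first match wins, with a default target group as the fallback. The rule set and target group names are hypothetical, and `fnmatchcase` only approximates ALB path-pattern matching (for example, it also interprets `[...]` character sets).

```python
from fnmatch import fnmatchcase

# Hypothetical rules: (priority, path pattern, target group name).
RULES = [
    (20, "/app/*", "web-targets"),
    (10, "/api/*", "api-targets"),
]
DEFAULT_TARGET_GROUP = "default-targets"


def route(path, rules=RULES, default=DEFAULT_TARGET_GROUP):
    """Return the target group of the lowest-numbered rule that matches `path`."""
    for _priority, pattern, target_group in sorted(rules):
        if fnmatchcase(path, pattern):
            return target_group
    # No rule matched, so the default action applies.
    return default


print(route("/api/users"))       # api-targets
print(route("/app/index.html"))  # web-targets
print(route("/other"))           # default-targets
```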
Troubleshooting ECS with Load Balancers
Common Issues and Solutions
When your ECS services aren’t properly connecting with Elastic Load Balancers, check these common problem areas:
- Security Group Configuration
  - Ensure the load balancer security group can reach your container instances
  - Verify container instance security groups allow traffic from the load balancer
  - Check that the correct ports are open (both listener port and target port)
- Application Health
  - Confirm your application is running correctly inside the container
  - Verify the application responds properly to health check requests
  - Check container logs for application errors (via CloudWatch Logs)
- Network Configuration
  - Verify subnet configurations for both load balancer and ECS tasks/instances
  - Ensure Availability Zones are aligned between load balancer and ECS resources
  - For tasks in private subnets, check NAT gateway configuration if internet access is needed
- Service Configuration
  - Confirm the correct container name and port are specified in the service definition
  - Verify the target group ARN is correct
  - Check that the task definition exposes the expected port
Diagnostic Steps
- Check ECS service events for deployment failures
- Review target group health status in the EC2 console
- Examine ECS task status and stopped task reasons
- Inspect container logs for application errors
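The Service Configuration checks can be run offline against the JSON documents before you deploy. The following is a sketch that assumes you have the service definition and task definition loaded as dicts using the ECS field names (`loadBalancers`, `containerDefinitions`, `portMappings`); the function name and sample data are hypothetical.

```python
def check_load_balancer_mapping(service, task_definition):
    """Return a list of problems with the service's loadBalancers entries."""
    # Map each container name to the set of container ports it exposes.
    exposed = {
        container["name"]: {m["containerPort"] for m in container.get("portMappings", [])}
        for container in task_definition.get("containerDefinitions", [])
    }
    problems = []
    for entry in service.get("loadBalancers", []):
        name, port = entry.get("containerName"), entry.get("containerPort")
        if name not in exposed:
            problems.append(f"container {name!r} is not in the task definition")
        elif port not in exposed[name]:
            problems.append(f"container {name!r} does not expose port {port}")
    return problems


# Hypothetical sample documents.
task_def = {"containerDefinitions": [{"name": "web-app", "portMappings": [{"containerPort": 8080}]}]}
good_service = {"loadBalancers": [{"containerName": "web-app", "containerPort": 8080}]}
bad_service = {"loadBalancers": [{"containerName": "web-app", "containerPort": 80}]}

print(check_load_balancer_mapping(good_service, task_def))  # []
print(check_load_balancer_mapping(bad_service, task_def))
```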
Configuring ECS Services with Load Balancers
Service Definition Best Practices
When creating or updating ECS services that use load balancers, follow these configuration guidelines:
- Task Definition Preparation
  - Ensure your container definition includes the correct port mappings
  - Use dynamic host port mapping for better resource utilization
  - Consider using the `awsvpc` network mode for simplified networking
- Load Balancer Configuration
  - Create and configure your target group before creating the ECS service
  - Set appropriate health check settings based on your application’s startup time
  - Configure stickiness settings if your application requires session persistence
- Service Definition
  - Specify the correct load balancer details in your service definition
  - Include container name, container port, and target group ARN
  - Set an appropriate deployment configuration for rolling updates
Example Service Configuration with Load Balancer
{
  "cluster": "my-cluster",
  "serviceName": "my-service",
  "taskDefinition": "my-task:1",
  "loadBalancers": [
    {
      "targetGroupArn": "arn:aws:elasticloadbalancing:region:account:targetgroup/my-targets/73e2d6bc24d8a067",
      "containerName": "web-app",
      "containerPort": 8080
    }
  ],
  "desiredCount": 2,
  "launchType": "FARGATE",
  "networkConfiguration": {
    "awsvpcConfiguration": {
      "subnets": ["subnet-12345", "subnet-67890"],
      "securityGroups": ["sg-12345"],
      "assignPublicIp": "ENABLED"
    }
  }
}
Ansible ECS Service with Load Balancers: Documentation Notes
Required Load Balancer Parameters
When using the Ansible `ecs_service` module with load balancers, each load balancer dictionary must include the following parameters:
- For Classic Load Balancers:
  - `loadBalancerName`: The name of the load balancer
  - `containerName`: The name of the container to associate with the load balancer
  - `containerPort`: The port on the container to associate with the load balancer
- For Application/Network Load Balancers:
  - `targetGroupArn`: The ARN of the target group
  - `containerName`: The name of the container to associate with the load balancer
  - `containerPort`: The port on the container to associate with the load balancer
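Because these keys are easy to omit, it can help to validate each load balancer dictionary before handing it to the module. A minimal sketch of such a check follows; the function name `missing_load_balancer_keys` is hypothetical and is not part of Ansible or Boto3.

```python
REQUIRED_COMMON = {"containerName", "containerPort"}


def missing_load_balancer_keys(entry):
    """Return the required keys that are absent from one load balancer dict."""
    missing = sorted(REQUIRED_COMMON - set(entry))
    # Classic load balancers are identified by name, ALB/NLB by target group ARN.
    if "loadBalancerName" not in entry and "targetGroupArn" not in entry:
        missing.append("loadBalancerName or targetGroupArn")
    return missing


classic = {"loadBalancerName": "my-elb", "containerName": "web-app", "containerPort": 80}
incomplete = {"containerName": "web-app"}

print(missing_load_balancer_keys(classic))     # []
print(missing_load_balancer_keys(incomplete))
```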
Documentation Gap
The Ansible documentation for the `ecs_service` module doesn’t fully specify these required parameters, but they are documented in the AWS Boto3 SDK documentation: Boto3 ECS Documentation
Example Ansible Playbook
- name: Create ECS service with load balancer
  ecs_service:
    state: present
    name: my-service
    cluster: my-cluster
    task_definition: my-task
    desired_count: 3
    load_balancers:
      - targetGroupArn: arn:aws:elasticloadbalancing:region:account:targetgroup/my-targets/73e2d6bc24d8a067
        containerName: web-app
        containerPort: 80
Note: Always refer to the Boto3 ECS Documentation for the most accurate and complete parameter requirements.
ECS Cluster Auto-scaling Best Practices
Understanding ECS Cluster Scaling
Effective auto-scaling for ECS clusters requires understanding the relationship between container requirements and instance capacity. The key is to scale based on resource utilization rather than arbitrary cluster size.
Calculating the Optimal Scaling Threshold
The formula for determining when to scale your ECS cluster is:
Threshold = (1 - max(Container Reservation) / Total Capacity of a Single Container Instance) * 100
Example Calculation:
Assume you have:
- Container instance capacity: 2048 MB RAM
- Maximum container reservation: 512 MB RAM
Threshold = (1 - 512 / 2048) * 100
Threshold = (1 - 0.25) * 100
Threshold = 75%
This means you should set your auto-scaling policy to add capacity when the cluster’s memory reservation exceeds 75%.
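The same calculation as a small function, useful when instance sizes or reservations change. This is a sketch; the function name is hypothetical, and the input check reflects the assumption that the largest container must fit on a single instance.

```python
def scale_out_threshold(max_container_reservation, instance_capacity):
    """Return the reservation percentage at which capacity should be added."""
    if not 0 < max_container_reservation <= instance_capacity:
        raise ValueError("reservation must be positive and fit on one instance")
    return (1 - max_container_reservation / instance_capacity) * 100


# Values from the example above: 512 MB reservation, 2048 MB instance.
print(scale_out_threshold(512, 2048))  # 75.0
```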
Comprehensive Scaling Approach
- Calculate thresholds for both memory and CPU:
  - Memory threshold as shown above
  - CPU threshold using the same formula with CPU units
- Use the lower of the two thresholds:
  - This ensures you scale before either resource becomes a bottleneck
- Configure CloudWatch alarms:
  - Set alarms based on `MemoryReservation` and `CPUReservation` metrics
  - Use the calculated thresholds as alarm triggers
- Set appropriate scaling policies:
  - Scale out quickly (add instances when threshold is reached)
  - Scale in conservatively (remove instances when utilization is low for a sustained period)
For a detailed walkthrough of this approach, see Docker on ECS: Scale your cluster automatically.
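The first two steps of this approach can be sketched as follows: compute the threshold for each resource, then pick the lower one. The CPU figures in the example (512 CPU units reserved on an instance with 1024 units) are hypothetical values chosen for illustration, as are the function names.

```python
def reservation_threshold(max_reservation, capacity):
    """Threshold = (1 - max reservation / single-instance capacity) * 100."""
    return (1 - max_reservation / capacity) * 100


def pick_alarm_threshold(memory, cpu):
    """Return (metric name, threshold) for the lower of the two thresholds.

    `memory` and `cpu` are (max container reservation, instance capacity) pairs.
    """
    candidates = {
        "MemoryReservation": reservation_threshold(*memory),
        "CPUReservation": reservation_threshold(*cpu),
    }
    metric = min(candidates, key=candidates.get)
    return metric, candidates[metric]


# Memory values from the earlier example; CPU values are hypothetical.
print(pick_alarm_threshold(memory=(512, 2048), cpu=(512, 1024)))
# ('CPUReservation', 50.0)
```

Here CPU is the tighter resource, so 50% is the threshold to use for the alarm triggers.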
ECS Container Instance Troubleshooting
Common Issue: Containers Terminated During Startup
One frequent problem with ECS containers, especially for Java applications, is containers being marked unhealthy and terminated before they fully initialize.
Root Cause Analysis
This typically happens because:
- Inadequate Health Check Configuration: Health checks begin too soon after container startup
- Long Application Startup Time: Applications with significant initialization time (like JVM-based applications) need more time before they can respond to health checks
- Mismatched Timing Parameters: The health check interval and threshold settings don’t accommodate the application’s startup characteristics
Solution Strategies
- Adjust Health Check Settings:
  - Increase the `healthCheckGracePeriodSeconds` in your ECS service definition
  - For Java applications, consider values of 60-120 seconds or more
- Optimize Application Startup:
  - Implement a quick-responding health endpoint that returns healthy before full initialization
  - Consider using Spring Boot’s health indicators with separate readiness/liveness probes
- Task Definition Improvements:
  - Use the `startPeriod` parameter in container health checks to provide initialization time
  - Increase `interval` between health checks
  - Adjust `retries` to allow more failed health checks before marking unhealthy
Example Task Definition with Optimized Health Check
"healthCheck": {
  "command": ["CMD-SHELL", "curl -f http://localhost:8080/health || exit 1"],
  "interval": 30,
  "timeout": 5,
  "retries": 3,
  "startPeriod": 120
}
The `startPeriod` of 120 seconds gives the JVM and application time to initialize: health check failures during this window do not count toward the `retries` limit, so a slow start alone does not mark the container unhealthy.
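The effect of `startPeriod` can be illustrated with a small simulation. This is a simplified model, not the container runtime's actual scheduler: it assumes a check runs every `interval` seconds starting at `t = interval`, that a check passes once the application is ready, and it ignores `timeout`. The function name and the 100-second startup time are hypothetical.

```python
def simulate_health(ready_at, interval=30, retries=3, start_period=120, horizon=600):
    """Return (status, time) for the container's first health state change.

    Failures that occur before `start_period` has elapsed are not counted
    toward `retries`; later consecutive failures are.
    """
    consecutive_failures = 0
    for t in range(interval, horizon + 1, interval):
        if t >= ready_at:
            # The application answers the check successfully.
            return ("HEALTHY", t)
        if t >= start_period:
            consecutive_failures += 1
            if consecutive_failures >= retries:
                return ("UNHEALTHY", t)
    return ("UNKNOWN", horizon)


# An application that needs 100 seconds to start.
print(simulate_health(ready_at=100, start_period=120))  # ('HEALTHY', 120)
print(simulate_health(ready_at=100, start_period=0))    # ('UNHEALTHY', 90)
```

With the grace period, the three early failures are ignored and the container becomes healthy on its first check after startup; without it, the same application is marked unhealthy at 90 seconds.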