Deep Dive into ECS & Fargate

A comprehensive guide to understanding and implementing Amazon ECS with Fargate – 2025 Edition

Introduction

Amazon ECS (Elastic Container Service) is a fully managed container orchestration service that helps you run, stop, and manage containers on a cluster. With ECS, your containers are defined in a task definition that you use to run individual tasks or services.

This guide walks you through ECS and Fargate, from basic concepts to advanced deployment strategies.

System Components & Interactions

Understanding how your system components interact is key to operating a reliable infrastructure. In an ECS deployment, components like DNS (Route 53), Global Accelerator, load balancers (ALB/NLB), ECS clusters, and security features all work together.

  • Route 53: Directs user requests using DNS.
  • Global Accelerator: Routes traffic based on performance and availability.
  • Application Load Balancer (ALB): Distributes incoming traffic to ECS tasks.
  • ECS Cluster (Fargate): Runs containerized applications in isolated tasks.
  • Security Groups & IAM Roles: Secure access and control AWS API permissions.
  • VPC & Subnets: Define network boundaries.
  • Storage (EFS & S3): Provide persistent storage for your applications.
  • Monitoring & Logging: CloudWatch collects metrics and logs to monitor system health.
[Diagram: End User → Route 53 → Global Accelerator → Application Load Balancer → ECS Cluster (Fargate) with Task 1, Task 2, and Task 3; Security Groups & IAM Roles; Storage (EFS & S3)]

What is Amazon ECS?

Amazon ECS is a container management service that leverages Docker containers to run your applications. It organizes containers into clusters, tasks, and services for efficient resource management.

Key Concepts

Clusters

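Logical groupings of tasks and services. On the EC2 launch type, a cluster also contains the container instances that run your tasks; on Fargate there are no instances to manage.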

Task Definitions

Blueprints defining container images and resource requirements.

Tasks

Running instances of your task definitions.

Services

Ensure that a desired number of tasks run continuously.

How ECS Works

ECS operates with a control plane that schedules and manages tasks, and a data plane where tasks run. With the awsvpc network mode, each task gets its own Elastic Network Interface (ENI) for isolated networking.


+---------------------------+
|   ECS Control Plane       |
| (Schedules & Manages)     |
+------------+--------------+
             │
       +-----▼------+
       |   API &    |
       |  Console   |
       +------------+
            

Deployment Options: EC2 vs. Fargate

Running ECS on EC2 gives you full control over the underlying servers, while Fargate abstracts away infrastructure management: AWS provisions, patches, and maintains the compute capacity your tasks run on.

ECS on EC2

Pros: Full control, potential cost savings, advanced customization.
Cons: Requires manual server management, scaling, and patching.

ECS on Fargate

Pros: No server management; AWS handles capacity provisioning, host security patching, and maintenance.
Cons: Less control over the underlying hardware.

Deep Dive into Fargate

AWS Fargate is a serverless compute engine for containers. With Fargate, you define only the container requirements (image, CPU, memory, networking), while AWS manages the underlying servers, their capacity, and their maintenance.

Enhanced Storage

• 20 GiB of ephemeral storage per task by default, configurable up to 200 GiB.
• Use EFS or S3 for persistent storage.
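
As a minimal sketch of how the ephemeral storage size could be raised, the fragment below sets EphemeralStorage on a Fargate task definition. The resource name LargeScratchTask, the family name, and the 100 GiB size are illustrative assumptions; the sample image is the one used later in this guide.

```yaml
Resources:
  LargeScratchTask:                 # hypothetical resource name
    Type: AWS::ECS::TaskDefinition
    Properties:
      Family: large-scratch-task    # hypothetical family name
      Cpu: '1024'
      Memory: '2048'
      NetworkMode: awsvpc
      RequiresCompatibilities: [FARGATE]
      # Request more than the 20 GiB default (illustrative value, maximum is 200)
      EphemeralStorage:
        SizeInGiB: 100
      ContainerDefinitions:
        - Name: app
          Image: amazon/amazon-ecs-sample:latest
          Essential: true
```

Ephemeral storage is discarded when the task stops, so anything that must outlive the task still belongs in EFS or S3.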

Networking & Security

  • Each task gets its own ENI for isolated networking.
  • Security groups and VPC isolation protect resources.
  • Task-level IAM roles provide fine-grained permissions.
[Diagram: Fargate tasks inside a VPC, each with its own ENI, protected by security groups]
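
One way to express the security group isolation described above is to let tasks accept traffic only from the load balancer's security group. This is a sketch, not a definitive setup: the names ALBSecurityGroup and TaskSecurityGroup, port 80, and the open ingress range are assumptions, and VPC refers to a VPC resource defined elsewhere in the template.

```yaml
Resources:
  ALBSecurityGroup:                 # hypothetical name
    Type: AWS::EC2::SecurityGroup
    Properties:
      GroupDescription: Allow HTTP from the internet to the load balancer
      VpcId: !Ref VPC
      SecurityGroupIngress:
        - IpProtocol: tcp
          FromPort: 80
          ToPort: 80
          CidrIp: 0.0.0.0/0

  TaskSecurityGroup:                # hypothetical name
    Type: AWS::EC2::SecurityGroup
    Properties:
      GroupDescription: Allow traffic to tasks only from the load balancer
      VpcId: !Ref VPC
      SecurityGroupIngress:
        - IpProtocol: tcp
          FromPort: 80
          ToPort: 80
          # Reference the ALB group instead of a CIDR range, so only
          # the load balancer can reach the task ENIs
          SourceSecurityGroupId: !Ref ALBSecurityGroup
```

Because each task has its own ENI, the task security group applies per task rather than per host.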

Multi-Region Deployment

Deploying your application across multiple regions increases availability and reduces latency. Route 53 and Global Accelerator direct users to the nearest healthy region and shift traffic away from a region whose endpoints become unhealthy.

[Diagram: Primary Region (us-east-1) and Secondary Region (eu-west-1), fronted by Route 53 + Global Accelerator]

Code Examples

8.1 VPC and Network Configuration

This snippet sets up a VPC with public and private subnets. Public subnets host the load balancer, while private subnets run your ECS tasks securely.


# VPC Configuration for each region
Resources:
  VPC:
    Type: AWS::EC2::VPC
    Properties:
      CidrBlock: 10.0.0.0/16
      EnableDnsSupport: true
      EnableDnsHostnames: true
      Tags:
        - Key: Name
          Value: !Sub ${AWS::StackName}-vpc

  # Public Subnets for Load Balancer
  PublicSubnet1:
    Type: AWS::EC2::Subnet
    Properties:
      VpcId: !Ref VPC
      CidrBlock: 10.0.1.0/24
      AvailabilityZone: !Select [0, !GetAZs '']
      MapPublicIpOnLaunch: true

  PublicSubnet2:
    Type: AWS::EC2::Subnet
    Properties:
      VpcId: !Ref VPC
      CidrBlock: 10.0.2.0/24
      AvailabilityZone: !Select [1, !GetAZs '']
      MapPublicIpOnLaunch: true

  # Private Subnets for ECS Tasks
  PrivateSubnet1:
    Type: AWS::EC2::Subnet
    Properties:
      VpcId: !Ref VPC
      CidrBlock: 10.0.3.0/24
      AvailabilityZone: !Select [0, !GetAZs '']

  PrivateSubnet2:
    Type: AWS::EC2::Subnet
    Properties:
      VpcId: !Ref VPC
      CidrBlock: 10.0.4.0/24
      AvailabilityZone: !Select [1, !GetAZs '']
            

Explanation:

  • Creates a VPC with a CIDR block of 10.0.0.0/16 with DNS support enabled.
  • Defines two public subnets for hosting the load balancer.
  • Defines two private subnets for securely running ECS tasks.
  • Uses intrinsic functions (!Sub, !Ref, !Select, !GetAZs) for dynamic configuration.
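
The subnets above do not yet have any routes, so neither the load balancer nor the tasks could reach the internet as written. The following sketch shows one possible routing layout for the same VPC. All resource names here are assumptions, and using a single NAT gateway is a cost-saving simplification rather than a recommendation.

```yaml
Resources:
  InternetGateway:
    Type: AWS::EC2::InternetGateway

  GatewayAttachment:
    Type: AWS::EC2::VPCGatewayAttachment
    Properties:
      VpcId: !Ref VPC
      InternetGatewayId: !Ref InternetGateway

  # Public subnets route directly to the internet gateway
  PublicRouteTable:
    Type: AWS::EC2::RouteTable
    Properties:
      VpcId: !Ref VPC

  PublicRoute:
    Type: AWS::EC2::Route
    DependsOn: GatewayAttachment
    Properties:
      RouteTableId: !Ref PublicRouteTable
      DestinationCidrBlock: 0.0.0.0/0
      GatewayId: !Ref InternetGateway

  PublicSubnet1RouteAssociation:
    Type: AWS::EC2::SubnetRouteTableAssociation
    Properties:
      SubnetId: !Ref PublicSubnet1
      RouteTableId: !Ref PublicRouteTable

  PublicSubnet2RouteAssociation:
    Type: AWS::EC2::SubnetRouteTableAssociation
    Properties:
      SubnetId: !Ref PublicSubnet2
      RouteTableId: !Ref PublicRouteTable

  # Private subnets reach the internet through a NAT gateway
  NatEip:
    Type: AWS::EC2::EIP
    DependsOn: GatewayAttachment
    Properties:
      Domain: vpc

  NatGateway:
    Type: AWS::EC2::NatGateway
    Properties:
      AllocationId: !GetAtt NatEip.AllocationId
      SubnetId: !Ref PublicSubnet1

  PrivateRouteTable:
    Type: AWS::EC2::RouteTable
    Properties:
      VpcId: !Ref VPC

  PrivateRoute:
    Type: AWS::EC2::Route
    Properties:
      RouteTableId: !Ref PrivateRouteTable
      DestinationCidrBlock: 0.0.0.0/0
      NatGatewayId: !Ref NatGateway

  PrivateSubnet1RouteAssociation:
    Type: AWS::EC2::SubnetRouteTableAssociation
    Properties:
      SubnetId: !Ref PrivateSubnet1
      RouteTableId: !Ref PrivateRouteTable

  PrivateSubnet2RouteAssociation:
    Type: AWS::EC2::SubnetRouteTableAssociation
    Properties:
      SubnetId: !Ref PrivateSubnet2
      RouteTableId: !Ref PrivateRouteTable
```

With one NAT gateway, tasks in both private subnets lose outbound access if the gateway's Availability Zone fails; a NAT gateway per zone avoids that at higher cost.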

8.2 EFS Setup for Persistent Storage

This snippet configures an Amazon EFS file system with automatic backups and mount targets in private subnets.


# EFS Configuration
Resources:
  EFSFileSystem:
    Type: AWS::EFS::FileSystem
    Properties:
      Encrypted: true
      PerformanceMode: generalPurpose
      ThroughputMode: bursting
      BackupPolicy:
        Status: ENABLED
      LifecyclePolicies:
        - TransitionToIA: AFTER_30_DAYS
      FileSystemTags:
        - Key: Name
          Value: !Sub ${AWS::StackName}-efs

  # Mount Targets in Private Subnets
  MountTarget1:
    Type: AWS::EFS::MountTarget
    Properties:
      FileSystemId: !Ref EFSFileSystem
      SubnetId: !Ref PrivateSubnet1
      SecurityGroups: [!Ref EFSSecurityGroup]

  MountTarget2:
    Type: AWS::EFS::MountTarget
    Properties:
      FileSystemId: !Ref EFSFileSystem
      SubnetId: !Ref PrivateSubnet2
      SecurityGroups: [!Ref EFSSecurityGroup]

  # Access Point for ECS Tasks
  EFSAccessPoint:
    Type: AWS::EFS::AccessPoint
    Properties:
      FileSystemId: !Ref EFSFileSystem
      PosixUser:
        Uid: "1000"
        Gid: "1000"
      RootDirectory:
        Path: "/ecs"
        CreationInfo:
          OwnerGid: "1000"
          OwnerUid: "1000"
          Permissions: "755"
            

Explanation:

  • Creates an encrypted EFS file system with general-purpose performance.
  • Enables backup policies and lifecycle management (transition to IA after 30 days).
  • Defines mount targets in private subnets for high availability.
  • Creates an access point with specific POSIX user settings for ECS tasks.
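
The mount targets above reference an EFSSecurityGroup that the snippet does not define. A minimal sketch of what it might look like follows. It assumes a security group for the ECS tasks exists in the same template under the hypothetical name TaskSecurityGroup, and it opens only the NFS port (2049) to that group.

```yaml
Resources:
  EFSSecurityGroup:
    Type: AWS::EC2::SecurityGroup
    Properties:
      GroupDescription: Allow NFS from ECS tasks to EFS mount targets
      VpcId: !Ref VPC
      SecurityGroupIngress:
        - IpProtocol: tcp
          FromPort: 2049            # NFS
          ToPort: 2049
          # TaskSecurityGroup is a hypothetical name for the
          # security group attached to the ECS tasks
          SourceSecurityGroupId: !Ref TaskSecurityGroup
```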

8.3 ECS Task Definition with Storage Integration

This snippet defines an ECS task that mounts an EFS volume and sets environment variables for S3 access.


# ECS Task Definition
Resources:
  TaskDefinition:
    Type: AWS::ECS::TaskDefinition
    Properties:
      Family: !Sub ${AWS::StackName}-task
      Cpu: '1024'
      Memory: '2048'
      NetworkMode: awsvpc
      RequiresCompatibilities: [FARGATE]
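      # Assumes ExecutionRole and TaskRole are AWS::IAM::Role resources in this
      # template; !GetAtt returns the ARN, whereas !Ref returns the role name
      ExecutionRoleArn: !GetAtt ExecutionRole.Arn
      TaskRoleArn: !GetAtt TaskRole.Arn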
      Volumes:
        - Name: efs-storage
          EFSVolumeConfiguration:
            FilesystemId: !Ref EFSFileSystem
            TransitEncryption: ENABLED
            AuthorizationConfig:
              AccessPointId: !Ref EFSAccessPoint
              IAM: ENABLED
      ContainerDefinitions:
        - Name: app
          Image: !Sub ${AWS::AccountId}.dkr.ecr.${AWS::Region}.amazonaws.com/${ImageRepository}:latest
          Essential: true
          MountPoints:
            - SourceVolume: efs-storage
              ContainerPath: /mnt/efs
              ReadOnly: false
          Environment:
            - Name: S3_BUCKET
              Value: !Ref DataBucket
            - Name: AWS_REGION
              Value: !Ref AWS::Region
          LogConfiguration:
            LogDriver: awslogs
            Options:
              awslogs-group: !Ref LogGroup
              awslogs-region: !Ref AWS::Region
              awslogs-stream-prefix: ecs
            

Explanation:

  • Defines a Fargate task with 1024 CPU units (1 vCPU) and 2048 MiB of memory.
  • Uses the awsvpc network mode for enhanced isolation.
  • Specifies execution and task IAM roles for secure operations.
  • Mounts an EFS volume using the previously created file system and access point.
  • Sets environment variables (S3_BUCKET, AWS_REGION) for runtime configuration.
  • Configures CloudWatch logging for the container.
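
The task definition refers to a LogGroup and an ExecutionRole that are not shown. The sketch below is one plausible way to define them, assuming both live in the same template. The log group name and the 30-day retention are illustrative choices; the managed policy is the AWS-provided task execution policy, which covers pulling images from ECR and writing to CloudWatch Logs.

```yaml
Resources:
  LogGroup:
    Type: AWS::Logs::LogGroup
    Properties:
      LogGroupName: !Sub /ecs/${AWS::StackName}   # illustrative naming scheme
      RetentionInDays: 30                         # illustrative retention

  ExecutionRole:
    Type: AWS::IAM::Role
    Properties:
      AssumeRolePolicyDocument:
        Version: '2012-10-17'
        Statement:
          - Effect: Allow
            Principal:
              Service: ecs-tasks.amazonaws.com
            Action: sts:AssumeRole
      ManagedPolicyArns:
        - arn:aws:iam::aws:policy/service-role/AmazonECSTaskExecutionRolePolicy
```

The execution role is used by ECS itself to start the task; permissions your application code needs at runtime belong on the task role instead.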

8.4 Global Load Balancing Configuration

This snippet sets up Route 53 and Global Accelerator for cross-region traffic routing and automatic failover.


# Global Load Balancing
Resources:
  GlobalAccelerator:
    Type: AWS::GlobalAccelerator::Accelerator
    Properties:
      Name: !Sub ${AWS::StackName}-accelerator
      Enabled: true
      IpAddressType: IPV4

  Listener:
    Type: AWS::GlobalAccelerator::Listener
    Properties:
      AcceleratorArn: !Ref GlobalAccelerator
      PortRanges:
        - FromPort: 80
          ToPort: 80
      Protocol: TCP

  EndpointGroup1:
    Type: AWS::GlobalAccelerator::EndpointGroup
    Properties:
      ListenerArn: !Ref Listener
      EndpointGroupRegion: us-east-1
      EndpointConfigurations:
        - EndpointId: !Ref LoadBalancer
          Weight: 100

  EndpointGroup2:
    Type: AWS::GlobalAccelerator::EndpointGroup
    Properties:
      ListenerArn: !Ref Listener
      EndpointGroupRegion: eu-west-1
      EndpointConfigurations:
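        # Note: !ImportValue resolves only exports from the same region as this
        # stack; for an ALB in another region, its ARN typically has to be
        # passed in another way (for example, as a stack parameter).
        - EndpointId: !ImportValue SecondaryRegionALB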
          Weight: 100

  DNSRecord:
    Type: AWS::Route53::RecordSet
    Properties:
      HostedZoneId: !Ref HostedZone
      Name: !Sub app.${DomainName}
      Type: A
      AliasTarget:
        DNSName: !GetAtt GlobalAccelerator.DnsName
        HostedZoneId: Z2BJ6XQ5FK7U4H
            

Explanation:

  • Creates a Global Accelerator with IPv4 addressing.
  • Sets up a listener on port 80 using TCP.
  • Defines endpoint groups for us-east-1 and eu-west-1, linking them to load balancers.
  • Configures a Route 53 alias record pointing to the Global Accelerator DNS name.
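
The endpoint group above points at a LoadBalancer resource that these snippets never define. As a rough sketch, a regional Application Load Balancer for Fargate tasks could look like the following. HttpListener, TargetGroup, and ALBSecurityGroup are assumed names, the health check path is a placeholder, and the subnets are the public subnets from section 8.1.

```yaml
Resources:
  LoadBalancer:
    Type: AWS::ElasticLoadBalancingV2::LoadBalancer
    Properties:
      Type: application
      Scheme: internet-facing
      Subnets:
        - !Ref PublicSubnet1
        - !Ref PublicSubnet2
      SecurityGroups:
        - !Ref ALBSecurityGroup     # hypothetical security group

  TargetGroup:
    Type: AWS::ElasticLoadBalancingV2::TargetGroup
    Properties:
      VpcId: !Ref VPC
      Protocol: HTTP
      Port: 80
      # awsvpc tasks register by IP address, not by instance ID
      TargetType: ip
      HealthCheckPath: /            # placeholder path

  HttpListener:
    Type: AWS::ElasticLoadBalancingV2::Listener
    Properties:
      LoadBalancerArn: !Ref LoadBalancer
      Protocol: HTTP
      Port: 80
      DefaultActions:
        - Type: forward
          TargetGroupArn: !Ref TargetGroup
```

The listener is named HttpListener here to avoid clashing with the Global Accelerator resource called Listener in the snippet above.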

8.5 Auto Scaling Configuration

This snippet implements dynamic scaling based on CPU and memory usage with defined cooldown periods.


# Auto Scaling Configuration
Resources:
  ServiceScalableTarget:
    Type: AWS::ApplicationAutoScaling::ScalableTarget
    Properties:
      MaxCapacity: 10
      MinCapacity: 2
      ResourceId: !Sub service/${ECSCluster}/${ECSService.Name}
      ScalableDimension: ecs:service:DesiredCount
      ServiceNamespace: ecs
      RoleARN: !GetAtt AutoScalingRole.Arn

  ScalingPolicy:
    Type: AWS::ApplicationAutoScaling::ScalingPolicy
    Properties:
      PolicyName: !Sub ${AWS::StackName}-scaling-policy
      PolicyType: TargetTrackingScaling
      ScalingTargetId: !Ref ServiceScalableTarget
      TargetTrackingScalingPolicyConfiguration:
        PredefinedMetricSpecification:
          PredefinedMetricType: ECSServiceAverageCPUUtilization
        TargetValue: 70.0
        ScaleInCooldown: 60
        ScaleOutCooldown: 60

  MemoryScalingPolicy:
    Type: AWS::ApplicationAutoScaling::ScalingPolicy
    Properties:
      PolicyName: !Sub ${AWS::StackName}-memory-scaling
      PolicyType: TargetTrackingScaling
      ScalingTargetId: !Ref ServiceScalableTarget
      TargetTrackingScalingPolicyConfiguration:
        PredefinedMetricSpecification:
          PredefinedMetricType: ECSServiceAverageMemoryUtilization
        TargetValue: 75.0
        ScaleInCooldown: 60
        ScaleOutCooldown: 60
            

Explanation:

  • Defines a scalable target for the ECS service (min 2, max 10 tasks).
  • Creates a CPU-based scaling policy targeting 70% utilization with 60-second cooldowns.
  • Creates a memory-based scaling policy targeting 75% utilization with similar cooldowns.
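
CPU and memory are not the only signals a target-tracking policy can follow. The sketch below adds a third policy that tracks requests per target on the load balancer. It assumes LoadBalancer and TargetGroup resources exist in the same template (the TargetGroup name is hypothetical), and the target of 1000 requests and the cooldown values are illustrative numbers, not tuned recommendations.

```yaml
Resources:
  RequestCountScalingPolicy:        # hypothetical name
    Type: AWS::ApplicationAutoScaling::ScalingPolicy
    Properties:
      PolicyName: !Sub ${AWS::StackName}-request-scaling
      PolicyType: TargetTrackingScaling
      ScalingTargetId: !Ref ServiceScalableTarget
      TargetTrackingScalingPolicyConfiguration:
        PredefinedMetricSpecification:
          PredefinedMetricType: ALBRequestCountPerTarget
          # Label format: <load balancer full name>/<target group full name>
          ResourceLabel: !Sub ${LoadBalancer.LoadBalancerFullName}/${TargetGroup.TargetGroupFullName}
        TargetValue: 1000.0         # illustrative requests per target
        ScaleInCooldown: 300        # scale in more cautiously than out
        ScaleOutCooldown: 60
```

When several target-tracking policies share a scalable target, the service scales out if any policy calls for it and scales in only when all of them agree.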

8.6 Monitoring and Alerts

This snippet configures CloudWatch alarms and dashboards for monitoring ECS service and ALB metrics.


# CloudWatch Alarms
Resources:
  HighCPUAlarm:
    Type: AWS::CloudWatch::Alarm
    Properties:
      AlarmName: !Sub ${AWS::StackName}-high-cpu
      MetricName: CPUUtilization
      Namespace: AWS/ECS
      Dimensions:
        - Name: ClusterName
          Value: !Ref ECSCluster
        - Name: ServiceName
          Value: !GetAtt ECSService.Name
      Statistic: Average
      Period: 60
      EvaluationPeriods: 2
      Threshold: 85
      ComparisonOperator: GreaterThanThreshold
      AlarmActions:
        - !Ref AlertTopic

  Dashboard:
    Type: AWS::CloudWatch::Dashboard
    Properties:
      DashboardName: !Sub ${AWS::StackName}-dashboard
      DashboardBody: !Sub |
        {
          "widgets": [
            {
              "type": "metric",
              "properties": {
                "metrics": [
                  ["AWS/ECS", "CPUUtilization", "ServiceName", "${ECSService.Name}", "ClusterName", "${ECSCluster}"],
                  [".", "MemoryUtilization", ".", ".", ".", "."]
                ],
                "period": 300,
                "stat": "Average",
                "region": "${AWS::Region}",
                "title": "Service Metrics"
              }
            },
            {
              "type": "metric",
              "properties": {
                "metrics": [
                  ["AWS/ApplicationELB", "RequestCount", "LoadBalancer", "${LoadBalancer.LoadBalancerFullName}"],
                  [".", "TargetResponseTime", ".", ".", {"stat": "Average"}]
                ],
                "period": 300,
                "stat": "Sum",
                "region": "${AWS::Region}",
                "title": "Load Balancer Metrics"
              }
            }
          ]
        }
            

Explanation:

  • Sets up a CloudWatch alarm that fires when average CPU utilization stays above 85% for two consecutive 60-second periods and notifies AlertTopic.
  • Creates a CloudWatch Dashboard displaying ECS and ALB metrics using dynamic substitutions.
  • Helps monitor and troubleshoot the performance of your services.
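
The alarm above sends its notifications to an AlertTopic that is not defined in the snippet. A possible definition, together with a matching memory alarm, is sketched below. The email address is a placeholder, the HighMemoryAlarm name is an assumption, and the thresholds simply mirror the CPU alarm.

```yaml
Resources:
  AlertTopic:
    Type: AWS::SNS::Topic
    Properties:
      TopicName: !Sub ${AWS::StackName}-alerts
      Subscription:
        - Protocol: email
          Endpoint: ops@example.com   # placeholder address

  HighMemoryAlarm:                    # hypothetical name
    Type: AWS::CloudWatch::Alarm
    Properties:
      AlarmName: !Sub ${AWS::StackName}-high-memory
      MetricName: MemoryUtilization
      Namespace: AWS/ECS
      Dimensions:
        - Name: ClusterName
          Value: !Ref ECSCluster
        - Name: ServiceName
          Value: !GetAtt ECSService.Name
      Statistic: Average
      Period: 60
      EvaluationPeriods: 2
      Threshold: 85
      ComparisonOperator: GreaterThanThreshold
      # Do not alarm when the service reports no data points
      TreatMissingData: notBreaching
      AlarmActions:
        - !Ref AlertTopic
```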

8.7 Backup and Disaster Recovery

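This snippet defines an AWS Backup vault and backup plan, plus the IAM role that S3 needs for cross-region replication. A backup plan protects nothing until resources are assigned to it, and the replication role takes effect only once the source bucket's replication configuration references it.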


# Backup Configuration
Resources:
  BackupVault:
    Type: AWS::Backup::BackupVault
    Properties:
      BackupVaultName: !Sub ${AWS::StackName}-vault
      # EncryptionKeyArn expects a key ARN; !Ref on an AWS::KMS::Key returns the key ID
      EncryptionKeyArn: !GetAtt KMSKey.Arn

  BackupPlan:
    Type: AWS::Backup::BackupPlan
    Properties:
      BackupPlan:
        BackupPlanName: !Sub ${AWS::StackName}-backup-plan
        BackupPlanRule:
          - RuleName: DailyBackups
            TargetBackupVault: !Ref BackupVault
            ScheduleExpression: cron(0 5 ? * * *)
            StartWindowMinutes: 60
            CompletionWindowMinutes: 120
            Lifecycle:
              DeleteAfterDays: 30
          - RuleName: WeeklyBackups
            TargetBackupVault: !Ref BackupVault
            ScheduleExpression: cron(0 5 ? * 1 *)
            StartWindowMinutes: 60
            CompletionWindowMinutes: 120
            Lifecycle:
              DeleteAfterDays: 90

  ReplicationRole:
    Type: AWS::IAM::Role
    Properties:
      AssumeRolePolicyDocument:
        Version: 2012-10-17
        Statement:
          - Effect: Allow
            Principal:
              Service: s3.amazonaws.com
            Action: sts:AssumeRole
      Policies:
        - PolicyName: S3ReplicationPolicy
          PolicyDocument:
            Version: 2012-10-17
            Statement:
              - Effect: Allow
                Action:
                  - s3:GetReplicationConfiguration
                  - s3:ListBucket
                Resource: !GetAtt DataBucket.Arn
              - Effect: Allow
                Action:
                  - s3:GetObjectVersionForReplication
                  - s3:GetObjectVersionAcl
                  - s3:GetObjectVersionTagging
                Resource: !Sub ${DataBucket.Arn}/*
              - Effect: Allow
                Action:
                  - s3:ReplicateObject
                  - s3:ReplicateDelete
                  - s3:ReplicateTags
                Resource: !Sub arn:aws:s3:::${SecondaryRegionBucket}/*
            

Explanation:

  • Creates a backup vault with encryption using a specified KMS key.
  • Defines backup plans with daily and weekly rules and retention policies.
  • Sets up an IAM role for S3 replication for cross-region disaster recovery.
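
The backup plan above has no resources assigned to it. A minimal sketch of a backup selection that attaches the EFS file system from section 8.2 follows. BackupRole and EFSBackupSelection are assumed names; the managed policy is the AWS-provided service role policy for AWS Backup.

```yaml
Resources:
  BackupRole:                       # hypothetical name
    Type: AWS::IAM::Role
    Properties:
      AssumeRolePolicyDocument:
        Version: '2012-10-17'
        Statement:
          - Effect: Allow
            Principal:
              Service: backup.amazonaws.com
            Action: sts:AssumeRole
      ManagedPolicyArns:
        - arn:aws:iam::aws:policy/service-role/AWSBackupServiceRolePolicyForBackup

  EFSBackupSelection:               # hypothetical name
    Type: AWS::Backup::BackupSelection
    Properties:
      BackupPlanId: !Ref BackupPlan
      BackupSelection:
        SelectionName: !Sub ${AWS::StackName}-efs-selection
        IamRoleArn: !GetAtt BackupRole.Arn
        # Assign the EFS file system to the daily and weekly rules
        Resources:
          - !GetAtt EFSFileSystem.Arn
```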

8.8 Simple ECS Fargate Cluster Example

A basic CloudFormation snippet that creates a VPC, public subnet, ECS cluster, task definition, and service for Fargate.


AWSTemplateFormatVersion: '2010-09-09'
Description: Basic ECS Fargate Cluster Setup
Parameters:
  VpcCIDR:
    Type: String
    Default: 10.0.0.0/16
Resources:
  VPC:
    Type: AWS::EC2::VPC
    Properties:
      CidrBlock: !Ref VpcCIDR
      EnableDnsSupport: true
      EnableDnsHostnames: true

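  PublicSubnet:
    Type: AWS::EC2::Subnet
    Properties:
      VpcId: !Ref VPC
      CidrBlock: 10.0.1.0/24
      MapPublicIpOnLaunch: true

  # Internet access so Fargate can pull the container image
  InternetGateway:
    Type: AWS::EC2::InternetGateway

  GatewayAttachment:
    Type: AWS::EC2::VPCGatewayAttachment
    Properties:
      VpcId: !Ref VPC
      InternetGatewayId: !Ref InternetGateway

  PublicRouteTable:
    Type: AWS::EC2::RouteTable
    Properties:
      VpcId: !Ref VPC

  PublicRoute:
    Type: AWS::EC2::Route
    DependsOn: GatewayAttachment
    Properties:
      RouteTableId: !Ref PublicRouteTable
      DestinationCidrBlock: 0.0.0.0/0
      GatewayId: !Ref InternetGateway

  PublicSubnetRouteAssociation:
    Type: AWS::EC2::SubnetRouteTableAssociation
    Properties:
      SubnetId: !Ref PublicSubnet
      RouteTableId: !Ref PublicRouteTable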

  ECSCluster:
    Type: AWS::ECS::Cluster
    Properties:
      ClusterName: MyFargateCluster

  TaskDefinition:
    Type: AWS::ECS::TaskDefinition
    Properties:
      Family: MyTask
      Cpu: '512'
      Memory: '1024'
      NetworkMode: awsvpc
      RequiresCompatibilities: [FARGATE]
      ContainerDefinitions:
        - Name: app
          Image: amazon/amazon-ecs-sample:latest
          PortMappings:
            - ContainerPort: 80

  ECSService:
    Type: AWS::ECS::Service
    Properties:
      Cluster: !Ref ECSCluster
      TaskDefinition: !Ref TaskDefinition
      DesiredCount: 2
      LaunchType: FARGATE
      NetworkConfiguration:
        AwsvpcConfiguration:
          Subnets: [!Ref PublicSubnet]
          AssignPublicIp: ENABLED
            

Explanation:

  • Creates a VPC with DNS support using a CIDR block of 10.0.0.0/16.
  • Defines a public subnet that automatically assigns public IP addresses.
  • Creates an ECS cluster named "MyFargateCluster".
  • Defines a task definition for Fargate with a sample container image and port mapping.
  • Creates an ECS service that runs two tasks in the public subnet on Fargate.

Dynamic Data in CloudFormation Templates

CloudFormation uses intrinsic functions like !Ref, !Sub, and !Select to dynamically insert data into your templates.

  • !Ref: Returns the value of a parameter or resource (e.g., !Ref VPC returns the VPC ID).
  • !Sub: Performs string substitution to dynamically generate resource names and IDs.
  • !Select: Retrieves a specific item from a list (for example, choosing an availability zone).
  • !GetAtt: Retrieves attributes from a resource, such as an ARN or URL.
  • !ImportValue: Imports values from other CloudFormation stacks to enable cross-stack references.

These intrinsic functions allow your templates to be dynamic and reusable, automatically updating as resource values change.
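
To make the functions above concrete, here is a small Outputs sketch that uses several of them together. It assumes the VPC and EFSFileSystem resources from the earlier snippets are in the same template; the export name and the consumer stack name "network" are hypothetical.

```yaml
# Producer stack: publish values for other stacks in the same region
Outputs:
  VpcId:
    Description: ID of the VPC
    Value: !Ref VPC                       # !Ref on a VPC returns its ID
    Export:
      Name: !Sub ${AWS::StackName}-vpc-id # !Sub builds the export name
  FirstAvailabilityZone:
    Description: First Availability Zone in the current region
    Value: !Select [0, !GetAZs '']        # !Select picks one list item
  FileSystemArn:
    Description: ARN of the EFS file system
    Value: !GetAtt EFSFileSystem.Arn      # !GetAtt reads a resource attribute

# Consumer stack (hypothetical), assuming the producer stack is named "network":
#   VpcId: !ImportValue network-vpc-id
```

An exported value cannot be changed or removed while another stack imports it, which is worth keeping in mind before exporting anything that may need to change.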

Security & Networking In Depth

In this section, we dive deeper into the security and networking aspects of ECS. Discover how each task gets its own ENI, how security groups and VPC isolation protect your infrastructure, and how task-level IAM roles enforce fine-grained permissions.

[Diagram: ENI per Task · Security Groups & VPC · Task-Level IAM Roles]

Each of these is a key security component: dedicated ENIs isolate tasks at the network layer; security groups and VPC configurations restrict traffic; and IAM roles ensure that each task has only the permissions it needs.
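
As an illustration of task-level least privilege, the sketch below defines a task role limited to the S3 bucket and EFS access point used in the earlier snippets. It assumes DataBucket, EFSFileSystem, and EFSAccessPoint are resources in the same template; the policy name and the exact set of actions are assumptions that would need adjusting to what the application really does.

```yaml
Resources:
  TaskRole:
    Type: AWS::IAM::Role
    Properties:
      AssumeRolePolicyDocument:
        Version: '2012-10-17'
        Statement:
          - Effect: Allow
            Principal:
              Service: ecs-tasks.amazonaws.com
            Action: sts:AssumeRole
      Policies:
        - PolicyName: AppDataAccess         # hypothetical policy name
          PolicyDocument:
            Version: '2012-10-17'
            Statement:
              # Read and write objects in the application bucket only
              - Effect: Allow
                Action:
                  - s3:GetObject
                  - s3:PutObject
                Resource: !Sub ${DataBucket.Arn}/*
              # Mount and write to EFS, but only through the access point
              - Effect: Allow
                Action:
                  - elasticfilesystem:ClientMount
                  - elasticfilesystem:ClientWrite
                Resource: !GetAtt EFSFileSystem.Arn
                Condition:
                  StringEquals:
                    elasticfilesystem:AccessPointArn: !GetAtt EFSAccessPoint.Arn
```

Containers in the task receive these credentials automatically, so no access keys need to be baked into the image or passed as environment variables.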

Summary & Agenda

  1. Introduction: Overview of ECS and Fargate.
  2. System Components & Interactions: How DNS, load balancers, clusters, and security interact.
  3. What is ECS? Key ECS concepts and components.
  4. How ECS Works: The control plane, data plane, and networking (awsvpc mode).
  5. Deployment Options: Comparing ECS on EC2 versus Fargate.
  6. Deep Dive into Fargate: Benefits and inner workings of Fargate.
  7. Multi-Region Deployment: Ensuring high availability and reduced latency.
  8. Code Examples: CloudFormation snippets for VPCs, task definitions, load balancers, auto scaling, and more.
  9. Dynamic Data: Using intrinsic functions to create dynamic, reusable templates.
  10. Security & Networking: Detailed look at networking isolation, security groups, and task-level IAM roles.
  11. Summary & Agenda: Recap of key points and next steps.

This comprehensive guide is designed to give you a clear understanding of ECS & Fargate and how to leverage its features to build secure, scalable containerized applications.