From Terraform to GitOps: Deploying a Three-Tier Microservices App on EKS

About the Project

I built this project to get hands-on with the tools I’d actually use in a DevOps role. It’s a three-tier quiz application: React frontend, Flask backend, and PostgreSQL on RDS, all running on AWS EKS.

What makes this interesting (to me at least) is the clean separation between infrastructure and application using the App of Apps pattern. Terraform manages the infrastructure (VPC, EKS, RDS, ECR, IAM roles) and installs ArgoCD. Then, a single kubectl apply bootstraps the entire GitOps workflow via a root Application. That root Application manages everything else - platform tools (ALB Controller, External Secrets Operator, Prometheus) and the actual application workloads - using sync waves to ensure correct deployment order.

The flow is: Terraform → kubectl apply → Root App → Platform Apps → Workloads. This mirrors how production GitOps setups are usually structured: Terraform creates the infrastructure and the “GitOps engine” (ArgoCD), then steps back and lets GitOps drive everything else.

I also use External Secrets Operator to sync secrets from AWS Secrets Manager to Kubernetes. This means no secrets in Git and no manual secret management.

For CI/CD, I went with GitHub Actions using OIDC to authenticate with AWS. No long-lived credentials sitting in GitHub Secrets. ArgoCD handles the GitOps side, so any changes to the Kubernetes manifests get synced automatically. And I’ve got Prometheus/Grafana for monitoring because you can’t improve what you can’t measure.

If you want to see the code, it’s all on GitHub.

Architecture & Design Decisions

Let me explain why I made certain choices. Some were obvious, others I learnt the hard way.

Why Terraform for Everything?

I actually started with eksctl for the cluster. Seemed easier at first. But then I had infrastructure in two places: some in eksctl YAML, some in Terraform. It got confusing fast.

Moving everything to Terraform meant one source of truth. The VPC, EKS cluster, RDS, ECR repos, IAM roles, all defined in one place. Now I can spin up the whole environment with terraform apply and tear it down with terraform destroy. No more “oh crap I forgot to delete that security group” moments.

Why Managed Node Groups?

EKS gives you three compute options: self-managed nodes, managed node groups, or Fargate.

I picked managed node groups because they hit the sweet spot. AWS handles the boring stuff: AMI updates, node lifecycle, patching. But I still control instance types and scaling.

Fargate would be simpler (no nodes to manage at all) but it costs more for sustained workloads. Self-managed nodes give you full control but that’s overkill for this project. I don’t want to be SSHing into nodes to update packages.

How It All Connects

The journey of a request:

User: hits the ALB (sitting in a public subnet)
Ingress Controller: routes /api calls to the backend, everything else to the frontend
Application Layer: React serves the UI, Flask handles the API
Data Layer: RDS PostgreSQL tucked away in isolated database subnets, only accessible from the EKS nodes
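
To make the routing rule concrete, here is a minimal Python sketch of the decision the ALB makes for each request. It is not the controller's actual logic, just the same rule written out, and it assumes the Ingress has one prefix rule for /api plus a catch-all for everything else.

```python
def route(path: str) -> str:
    """Return which service a request path would be forwarded to.

    Assumption: one prefix rule for /api and a catch-all rule for /.
    """
    # A prefix rule for /api matches /api itself and anything under /api/,
    # but not a sibling path such as /apiary.
    if path == "/api" or path.startswith("/api/"):
        return "backend"
    return "frontend"


if __name__ == "__main__":
    for p in ["/", "/quiz/1", "/api/health/ready", "/apiary"]:
        print(p, "->", route(p))
```

The /apiary case is there to show why a plain string prefix check would be too loose.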

Prerequisites

You will need the AWS CLI, kubectl, Terraform, and Docker installed locally. I am assuming you have basic familiarity with Kubernetes and AWS.

Cloning the Repository

git clone https://github.com/wegoagain-dev/3-tier-eks.git
cd 3-tier-eks

Step 1: Understanding the Terraform Structure

Heads up on costs: This setup will run you about $190-200/month if you leave it running.

Component	What it is	Monthly Cost
EKS Control Plane	The Kubernetes master	~$73
EC2 Nodes (2x)	t3.medium instances	~$60
RDS PostgreSQL	db.t3.micro database	~$13
NAT Gateway	So private subnets can reach the internet	~$32
ALB	Load balancer	~$16
Total		~$194/month

Based on eu-west-2 (London) pricing. Don’t forget to terraform destroy when you’re done experimenting!
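
If you want to adapt the estimate to your own setup, the total is just the sum of the rows. A quick Python sketch using the rounded figures from the table; the 730 hours per month used for the hourly figure is my assumption, not an AWS billing rule.

```python
# Rounded monthly figures copied from the cost table (USD, eu-west-2).
monthly_costs_usd = {
    "EKS control plane": 73,
    "EC2 nodes (2x t3.medium)": 60,
    "RDS PostgreSQL (db.t3.micro)": 13,
    "NAT Gateway": 32,
    "ALB": 16,
}

HOURS_PER_MONTH = 730  # assumption: average month length in hours

total = sum(monthly_costs_usd.values())
hourly = total / HOURS_PER_MONTH

if __name__ == "__main__":
    print(f"Total: ~${total}/month, roughly ${hourly:.2f} per hour left running")
```

Swap in your own instance types or node count and the per-hour figure tells you what a forgotten environment costs overnight.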

The terraform/ directory contains all the infrastructure code:

terraform/
├── provider.tf          # AWS and Helm providers + S3 backend
├── variables.tf         # All configurable values
├── vpc.tf               # VPC with public, private, and database subnets
├── eks.tf               # EKS cluster using terraform-aws-modules
├── rds.tf               # PostgreSQL RDS in database subnets
├── external_secrets.tf  # IAM role for External Secrets Operator (IRSA)
├── lb_controller.tf     # IAM role for AWS Load Balancer Controller (IRSA)
├── argocd.tf            # ArgoCD installation via Helm
├── ecr.tf               # ECR repositories for frontend and backend
├── github_oidc.tf       # OIDC provider for GitHub Actions
└── outputs.tf           # Useful outputs like RDS endpoint

Notice what’s missing: Terraform doesn’t install Helm charts for platform tools directly. Instead, it installs ArgoCD, then you bootstrap GitOps with one kubectl command. The root Application then manages all platform tools via the k8s/platform/ directory. This is the App of Apps pattern - Terraform bootstraps GitOps, then GitOps takes over.

The VPC Setup

I use the terraform-aws-modules/vpc/aws module to create a proper network layout:

module "vpc" {
  source  = "terraform-aws-modules/vpc/aws"
  version = "~> 5.1.0"

  name = var.cluster_name
  cidr = var.vpc_cidr

  azs              = var.azs
  public_subnets   = var.public_subnets
  private_subnets  = var.private_subnets
  database_subnets = var.database_subnets

  enable_nat_gateway = true
  single_nat_gateway = true  # Saves money for dev environments

  enable_dns_hostnames = true
  enable_dns_support   = true

  # Tags required for EKS to discover subnets
  public_subnet_tags = {
    "kubernetes.io/role/elb" = "1"
  }
  private_subnet_tags = {
    "kubernetes.io/role/internal-elb" = "1"
  }
}

The key bit here is the three-tier subnet layout:

Public subnets: Where the ALB lives
Private subnets: Where the EKS nodes run (they get internet access via NAT Gateway)
Database subnets: Completely isolated, no internet access at all
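
The actual CIDR ranges live in variables.tf and aren't shown here, so the following is only an illustration of how a VPC range can be carved into the three tiers. The 10.0.0.0/16 range, the /24 subnet size, and the two availability zones are all assumptions of mine.

```python
import ipaddress


def carve_subnets(vpc_cidr: str, az_count: int, new_prefix: int = 24) -> dict:
    """Split a VPC CIDR into public, private and database subnets, one per AZ.

    Hypothetical helper: the real values come from Terraform variables.
    """
    vpc = ipaddress.ip_network(vpc_cidr)
    blocks = list(vpc.subnets(new_prefix=new_prefix))
    needed = 3 * az_count
    if len(blocks) < needed:
        raise ValueError(f"{vpc_cidr} cannot hold {needed} /{new_prefix} subnets")

    tiers = {}
    for index, tier in enumerate(["public", "private", "database"]):
        start = index * az_count
        tiers[tier] = [str(block) for block in blocks[start:start + az_count]]
    return tiers


if __name__ == "__main__":
    for tier, cidrs in carve_subnets("10.0.0.0/16", az_count=2).items():
        print(tier, cidrs)
```

The resulting lists are the shape the VPC module expects for public_subnets, private_subnets, and database_subnets.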

The EKS Cluster

The EKS module handles all the heavy lifting:

module "eks" {
  source  = "terraform-aws-modules/eks/aws"
  version = "~> 19.0"

  cluster_name    = var.cluster_name
  cluster_version = "1.31"

  vpc_id     = module.vpc.vpc_id
  subnet_ids = module.vpc.private_subnets

  eks_managed_node_groups = {
    standard-workers = {
      instance_types = ["t3.medium"]
      min_size       = 1
      max_size       = 3
      desired_size   = 2

      iam_role_additional_policies = {
        AmazonEKSWorkerNodePolicy          = "arn:aws:iam::aws:policy/AmazonEKSWorkerNodePolicy"
        AmazonEKS_CNI_Policy               = "arn:aws:iam::aws:policy/AmazonEKS_CNI_Policy"
        AmazonEC2ContainerRegistryReadOnly = "arn:aws:iam::aws:policy/AmazonEC2ContainerRegistryReadOnly"
        CloudWatchAgentServerPolicy        = "arn:aws:iam::aws:policy/CloudWatchAgentServerPolicy"
      }
    }
  }

  enable_irsa                    = true
  cluster_endpoint_public_access = true
}

The enable_irsa = true setting is important. It sets up an OIDC provider for the cluster, which lets Kubernetes ServiceAccounts assume IAM roles. The AWS Load Balancer Controller and External Secrets Operator need this to access AWS services.

Step 2: Understanding OIDC and IRSA

This is where things get interesting. Two different OIDC providers, two different purposes, both working together to eliminate hardcoded credentials.

What is OIDC?

OIDC (OpenID Connect) is an authentication protocol built on top of OAuth 2.0. It allows systems to verify identity via signed tokens without sharing passwords. In AWS, OIDC providers let you authenticate with AWS services using tokens from external identity providers.

The Two OIDC Providers in This Setup

1. GitHub OIDC Provider (for CI/CD Authentication)

This lets GitHub Actions authenticate with AWS without storing long-lived credentials in GitHub Secrets.

How it works:

GitHub Actions requests a JWT token from GitHub’s OIDC provider
Sends token to AWS STS (Security Token Service)
AWS validates token against your OIDC provider
Returns temporary AWS credentials (valid for 1 hour)
Actions uses credentials to push to ECR

The Terraform setup:

resource "aws_iam_openid_connect_provider" "github" {
  url             = "https://token.actions.githubusercontent.com"
  client_id_list  = ["sts.amazonaws.com"]
  thumbprint_list = ["6938fd4d98bab03faadb97b34396831e3780aea1"]
}

resource "aws_iam_role" "github_actions" {
  name = "GitHubActionsECR"

  assume_role_policy = jsonencode({
    Version = "2012-10-17"
    Statement = [{
      Action = "sts:AssumeRoleWithWebIdentity"
      Effect = "Allow"
      Principal = {
        Federated = aws_iam_openid_connect_provider.github.arn
      }
      Condition = {
        StringLike = {
          "token.actions.githubusercontent.com:sub" = "repo:${var.github_repo}:*"
        }
      }
    }]
  })
}

The trust policy says: “Only GitHub Actions workflows from my specific repo can assume this role.”
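
To see what that condition actually lets through, here is a small Python sketch that mimics the StringLike check against the sub claim in GitHub's token. It approximates IAM's wildcard matching with fnmatch, which is close but not identical (IAM only understands * and ?), and the repo value wegoagain-dev/3-tier-eks is my guess at what var.github_repo holds.

```python
from fnmatch import fnmatchcase


def trust_policy_allows(sub_claim: str, github_repo: str) -> bool:
    """Approximate the StringLike condition on the token's sub claim.

    Assumption: fnmatchcase stands in for IAM's * and ? wildcards.
    """
    pattern = f"repo:{github_repo}:*"
    return fnmatchcase(sub_claim, pattern)


if __name__ == "__main__":
    repo = "wegoagain-dev/3-tier-eks"  # assumed value of var.github_repo
    candidates = [
        "repo:wegoagain-dev/3-tier-eks:ref:refs/heads/main",
        "repo:wegoagain-dev/3-tier-eks:pull_request",
        "repo:someone-else/3-tier-eks:ref:refs/heads/main",
    ]
    for sub in candidates:
        print(sub, "->", trust_policy_allows(sub, repo))
```

Note that the trailing wildcard accepts any branch, tag, or pull request in the repo. Tightening the pattern to something like repo:OWNER/REPO:ref:refs/heads/main would limit the role to one branch.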

2. EKS OIDC Provider (for IRSA)

This is created automatically when you set enable_irsa = true. It allows Kubernetes pods to assume IAM roles using their ServiceAccounts.

IRSA (IAM Roles for Service Accounts) is an EKS feature that uses OIDC to give pods AWS permissions without giving those permissions to the entire node.

How IRSA Works (The Full Flow)

Without IRSA, you’d have to give the EC2 nodes broad IAM permissions, and every pod on those nodes would inherit those permissions. That’s a security nightmare - your quiz app pods shouldn’t be able to create load balancers!

With IRSA, each pod only gets the permissions attached to its specific ServiceAccount.

The complete flow:

1. Terraform creates IAM role with trust policy
   ↓
2. Trust policy specifies: "Only pods with ServiceAccount X can assume me"
   ↓
3. ArgoCD creates ServiceAccount with annotation: eks.amazonaws.com/role-arn
   ↓
4. Pod starts using that ServiceAccount
   ↓
5. EKS injects OIDC token (JWT) into pod at /var/run/secrets/eks.amazonaws.com/serviceaccount/token
   ↓
6. AWS SDK in pod reads token and sends to STS: "I want to assume role Y"
   ↓
7. STS validates: "Is this token signed by my trusted OIDC provider? Does the subject match?"
   ↓
8. STS returns temporary AWS credentials (valid for 1 hour)
   ↓
9. Pod uses credentials to make AWS API calls
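
Steps 5 and 6 are normally hidden inside the AWS SDK, so here is a rough Python sketch of what happens there. EKS sets the AWS_ROLE_ARN and AWS_WEB_IDENTITY_TOKEN_FILE environment variables in the pod, and the SDK turns them into an AssumeRoleWithWebIdentity request. The function name and the session name are made up for illustration, and the sketch stops short of calling STS.

```python
import os


def build_assume_role_request(env=os.environ) -> dict:
    """Collect the parameters an SDK would send to sts:AssumeRoleWithWebIdentity.

    Hypothetical helper: real SDKs do this internally and also handle
    refreshing the credentials before they expire.
    """
    role_arn = env["AWS_ROLE_ARN"]
    token_file = env["AWS_WEB_IDENTITY_TOKEN_FILE"]

    # The projected ServiceAccount token is a JWT stored as plain text.
    with open(token_file, encoding="utf-8") as handle:
        token = handle.read().strip()

    return {
        "RoleArn": role_arn,
        "RoleSessionName": "irsa-sketch",  # illustrative session name
        "WebIdentityToken": token,
    }
```

In a real pod you never write this yourself. As long as the SDK is recent enough to support web identity tokens, it picks up both variables automatically.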

Concrete Example: External Secrets Operator

1. Terraform creates the IAM role:

resource "aws_iam_role" "external_secrets" {
  name = "devops-quiz-external-secrets-role"

  assume_role_policy = jsonencode({
    Version = "2012-10-17"
    Statement = [{
      Effect = "Allow"
      Principal = {
        Federated = module.eks.oidc_provider_arn  # EKS OIDC provider
      }
      Action = "sts:AssumeRoleWithWebIdentity"
      Condition = {
        StringEquals = {
          # Only this specific ServiceAccount can assume the role
          "${module.eks.oidc_provider}:sub" = "system:serviceaccount:external-secrets:external-secrets"
        }
      }
    }]
  })
}

# Attach policy to read Secrets Manager
resource "aws_iam_role_policy" "external_secrets" {
  name = "external-secrets-policy"
  role = aws_iam_role.external_secrets.id

  policy = jsonencode({
    Version = "2012-10-17"
    Statement = [{
      Effect = "Allow"
      Action = [
        "secretsmanager:GetSecretValue",
        "secretsmanager:DescribeSecret"
      ]
      Resource = aws_secretsmanager_secret.db_credentials.arn
    }]
  })
}

Notice: Terraform creates the IAM role with the trust policy, but doesn’t create the Kubernetes ServiceAccount. The role ARN is hardcoded in the GitOps manifest.

2. ArgoCD installs ESO with ServiceAccount annotation:

serviceAccount:
  create: true
  name: external-secrets
  annotations:
    eks.amazonaws.com/role-arn: arn:aws:iam::373317459404:role/devops-quiz-external-secrets-role

3. When ESO pod starts:

EKS sees the ServiceAccount annotation
EKS injects the OIDC token into the pod
ESO AWS SDK reads the token
Sends to STS to assume the role
Gets temporary credentials
Uses credentials to read from Secrets Manager

Why this matters: The ALB Controller can create ALBs because it has that specific IAM role. My backend pods can’t create ALBs because their ServiceAccount doesn’t have those permissions. Principle of least privilege, automated.

Summary: OIDC vs IRSA

Concept	What It Is	Used For
OIDC	Authentication protocol	Proves identity via signed tokens
GitHub OIDC	OIDC provider for GitHub	CI/CD pipeline auth to AWS
EKS OIDC	OIDC provider for EKS cluster	Pod authentication to AWS
IRSA	EKS feature using EKS OIDC	Allows pods to assume IAM roles

Step 3: The Database Layer

The RDS instance sits in the database subnets with a security group that only allows traffic from the EKS nodes:

resource "aws_security_group" "rds" {
  name   = "${var.cluster_name}-rds-sg"
  vpc_id = module.vpc.vpc_id

  ingress {
    from_port       = 5432
    to_port         = 5432
    protocol        = "tcp"
    security_groups = [module.eks.node_security_group_id]
  }
}

resource "aws_db_instance" "db" {
  allocated_storage      = 20
  engine                 = "postgres"
  engine_version         = "16.8"
  instance_class         = var.db_instance_class
  db_name                = var.db_name
  username               = var.db_username
  password               = random_password.password.result
  db_subnet_group_name   = aws_db_subnet_group.rds.name
  vpc_security_group_ids = [aws_security_group.rds.id]
  skip_final_snapshot    = true
}

The password is generated using random_password and stored in AWS Secrets Manager. No hardcoded credentials floating about.

External Secrets Operator: GitOps for Secrets

I used to have Terraform create Kubernetes secrets directly, but that’s problematic for GitOps. If ArgoCD manages your app, but Terraform creates the secrets, you have two sources of truth. Plus, storing secrets in Terraform state isn’t ideal.

External Secrets Operator (ESO) syncs secrets from AWS Secrets Manager to Kubernetes automatically. Here’s how it works:

1. Terraform stores the secret in AWS Secrets Manager:

resource "aws_secretsmanager_secret" "db_credentials" {
  name = "devops_quiz_db_secret"
}

resource "aws_secretsmanager_secret_version" "db_credentials_version" {
  secret_id = aws_secretsmanager_secret.db_credentials.id
  secret_string = jsonencode({
    DATABASE_URL = "postgresql://${var.db_username}:${random_password.password.result}@${aws_db_instance.db.address}:5432/${var.db_name}"
    RDS_ENDPOINT = aws_db_instance.db.address
  })
}

2. Terraform creates the IAM role for IRSA:

resource "aws_iam_role" "external_secrets" {
  name = "devops-quiz-external-secrets-role"

  assume_role_policy = jsonencode({
    # Trust policy allows only the ESO ServiceAccount to assume this role
    Version = "2012-10-17"
    Statement = [{
      Effect = "Allow"
      Principal = {
        Federated = module.eks.oidc_provider_arn
      }
      Action = "sts:AssumeRoleWithWebIdentity"
      Condition = {
        StringEquals = {
          "${module.eks.oidc_provider}:sub" = "system:serviceaccount:external-secrets:external-secrets"
        }
      }
    }]
  })
}

Notice: Terraform creates the IAM role, but not the Helm chart. The role ARN gets hardcoded in the GitOps manifest.

3. ArgoCD installs ESO via the root Application:

The k8s/platform/external-secrets-chart.yaml Application installs ESO with the ServiceAccount annotation:

serviceAccount:
  create: true
  name: external-secrets
  annotations:
    eks.amazonaws.com/role-arn: arn:aws:iam::373317459404:role/devops-quiz-external-secrets-role

4. ArgoCD deploys the ClusterSecretStore and ExternalSecrets (in k8s/apps/external-secrets.yaml):

# ClusterSecretStore tells ESO how to connect to AWS
apiVersion: external-secrets.io/v1
kind: ClusterSecretStore
metadata:
  name: aws-secrets-manager
spec:
  provider:
    aws:
      service: SecretsManager
      region: eu-west-2
      auth:
        jwt:
          serviceAccountRef:
            name: external-secrets
            namespace: external-secrets

---
# ExternalSecret fetches the secret from AWS
apiVersion: external-secrets.io/v1
kind: ExternalSecret
metadata:
  name: database-secret
  namespace: 3-tier-app-eks
spec:
  refreshInterval: 1h
  secretStoreRef:
    name: aws-secrets-manager
    kind: ClusterSecretStore
  target:
    name: database-secret
  data:
    - secretKey: DATABASE_URL
      remoteRef:
        key: devops_quiz_db_secret
        property: DATABASE_URL

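Now the secret flows: AWS Secrets Manager → ESO → Kubernetes Secret → Pod. No secrets in Git, no Kubernetes Secrets created by Terraform, and ArgoCD manages everything on the cluster side. One caveat: the generated password and the secret version still live in Terraform state, so the state bucket needs encryption and tight access control.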
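
On the application side, DATABASE_URL is an ordinary PostgreSQL connection URL. Below is a hedged Python sketch of building and parsing one. The helper names and the example host are mine, not taken from the backend code. The point it illustrates: a generated password containing characters such as @, : or / has to be percent-encoded, otherwise the URL parses incorrectly.

```python
from urllib.parse import quote, unquote, urlsplit


def build_database_url(user: str, password: str, host: str, db_name: str,
                       port: int = 5432) -> str:
    """Assemble a PostgreSQL URL, percent-encoding the credentials."""
    return (
        f"postgresql://{quote(user, safe='')}:{quote(password, safe='')}"
        f"@{host}:{port}/{db_name}"
    )


def parse_database_url(url: str) -> dict:
    """Split a PostgreSQL URL back into its parts, decoding the credentials."""
    parts = urlsplit(url)
    return {
        "user": unquote(parts.username or ""),
        "password": unquote(parts.password or ""),
        "host": parts.hostname,
        "port": parts.port,
        "database": parts.path.lstrip("/"),
    }


if __name__ == "__main__":
    # Hypothetical values; the real ones come from the Kubernetes Secret.
    url = build_database_url("quiz", "p@ss:w/rd#1", "db.example.internal", "quizdb")
    print(url)
    print(parse_database_url(url))
```

If the Terraform side interpolates the raw password into the URL, either restrict the character set of random_password or wrap the password in Terraform's urlencode() function so both ends agree.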

Sync-Wave Annotations: Controlling Deployment Order

When you have resources that depend on each other, you need to control the order they deploy. ArgoCD has sync-waves for this:

# Wave -1: Secrets must be created before Deployments can use them
apiVersion: external-secrets.io/v1
kind: ExternalSecret
metadata:
  name: database-secret
  annotations:
    argocd.argoproj.io/sync-wave: "-1"

ArgoCD deploys from lowest wave number to highest. This ensures the namespace exists before ExternalSecrets, and secrets exist before Deployments try to mount them.

The Kubernetes Bridge

The initContainers in the backend and migration job use the RDS endpoint from the secret to wait for the database:

initContainers:
  - name: wait-for-db
    image: busybox
    command: ["sh", "-c", 'until nc -z -v -w30 $RDS_ENDPOINT 5432; do echo "waiting for DB.."; sleep 2; done;']
    env:
      - name: RDS_ENDPOINT
        valueFrom:
          secretKeyRef:
            name: rds-endpoint-secret
            key: RDS_ENDPOINT

This is cleaner than hardcoding the endpoint. If RDS ever changes, the secret updates automatically, and the pods get the new value on restart.
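
If you would rather do the same wait inside the application instead of an initContainer, the nc loop translates to a few lines of Python. This is a sketch, not what the repo does, and unlike the shell loop it gives up after a fixed number of attempts (the default of 30 is arbitrary).

```python
import socket
import time


def wait_for_port(host: str, port: int, attempts: int = 30,
                  delay: float = 2.0, timeout: float = 30.0) -> bool:
    """Return True once a TCP connection to host:port succeeds.

    Mirrors `nc -z` in a retry loop, but bounded: returns False after
    `attempts` failed tries instead of looping forever.
    """
    for _ in range(attempts):
        try:
            # Opening and immediately closing the socket is all `nc -z` does.
            with socket.create_connection((host, port), timeout=timeout):
                return True
        except OSError:
            print("waiting for DB..")
            time.sleep(delay)
    return False
```

A call such as wait_for_port(os.environ["RDS_ENDPOINT"], 5432) at startup would behave like the initContainer, with the trade-off that the app image now owns the retry logic.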

Step 4: Deploying the Infrastructure

Before running Terraform, you need to create an S3 bucket for the state file. The backend is configured in provider.tf:

terraform {
  # The backend block lives inside the top-level terraform block
  backend "s3" {
    bucket       = "devops-quiz-terraform-state"
    key          = "dev/terraform.tfstate"
    region       = "eu-west-2"
    encrypt      = true
    use_lockfile = true
  }
}

Create the bucket manually first, then:

cd terraform
terraform init
terraform apply

This will take about 15 to 20 minutes. Terraform creates the VPC, EKS cluster, RDS instance, ECR repositories, IAM roles for IRSA, and installs ArgoCD via Helm.

Once done, configure kubectl:

aws eks update-kubeconfig --name devops-quiz --region eu-west-2
kubectl get nodes

Step 5: CI/CD with GitHub Actions

The CI/CD pipeline uses OIDC to authenticate with AWS. No long-lived credentials stored in GitHub Secrets.

The GitOps Flow

This is my favourite part of the whole deployment pipeline. Once it’s set up, deploying is just git push.

What happens:

I push code to GitHub
GitHub Actions kicks in and builds the Docker images
Trivy scans for vulnerabilities (fails the build if it finds CRITICAL issues)
Images get pushed to ECR
Actions updates the Kubernetes manifests with the new image tags
ArgoCD notices the changes in Git
ArgoCD syncs the changes to the cluster

No manual kubectl apply. No SSHing into servers. Just push code and watch it deploy.

The Workflow

The GitHub Actions workflow does the following:

Checks out the code
Authenticates with AWS using OIDC
Builds the Docker images
Runs Trivy vulnerability scanning (fails on CRITICAL vulnerabilities)
Pushes to ECR if scans pass
Updates the Kubernetes manifests with the new image tags
Commits and pushes the changes

- name: Build frontend image
  run: |
    docker build -t $ECR_REGISTRY/$ECR_REPOSITORY_FRONTEND:$IMAGE_TAG ./frontend

- name: Run Trivy vulnerability scanner
  uses: aquasecurity/trivy-action@master
  with:
    image-ref: ${{ steps.login-ecr.outputs.registry }}/${{ env.ECR_REPOSITORY_FRONTEND }}:${{ github.sha }}
    exit-code: "1"
    severity: "CRITICAL"

- name: Push Frontend image to ECR
  if: success()
  run: |
    docker push $ECR_REGISTRY/$ECR_REPOSITORY_FRONTEND:$IMAGE_TAG

The last step updates the image tags in the Kubernetes manifests and commits them back to the repo:

- name: Update Kubernetes Manifests
  run: |
    # Update Frontend Manifest
    sed -i "s|image: .*$ECR_REPOSITORY_FRONTEND:.*|image: $ECR_REGISTRY/$ECR_REPOSITORY_FRONTEND:$IMAGE_TAG|g" k8s/apps/frontend.yaml

    # Update Backend Manifest
    sed -i "s|image: .*$ECR_REPOSITORY_BACKEND:.*|image: $ECR_REGISTRY/$ECR_REPOSITORY_BACKEND:$IMAGE_TAG|g" k8s/apps/backend.yaml

    # Update Migration Job Manifest
    sed -i "s|image: .*$ECR_REPOSITORY_BACKEND:.*|image: $ECR_REGISTRY/$ECR_REPOSITORY_BACKEND:$IMAGE_TAG|g" k8s/apps/migration-job.yaml

    git add k8s/apps/frontend.yaml k8s/apps/backend.yaml k8s/apps/migration-job.yaml
    git commit -m "Update image tags to $IMAGE_TAG"
    git push

This is the GitOps bit. ArgoCD watches the repo and syncs any changes to the cluster.
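
The sed one-liners are dense, so here is the same substitution as a Python sketch you can run locally against a manifest snippet. The registry and repository names in the demo are placeholders, not the real ECR values.

```python
import re


def update_image_tag(manifest: str, registry: str, repository: str, tag: str) -> str:
    """Rewrite every `image:` line for `repository` to point at the new tag.

    Equivalent in spirit to:
      sed "s|image: .*REPO:.*|image: REGISTRY/REPO:TAG|g"
    """
    pattern = re.compile(rf"image: .*{re.escape(repository)}:.*")
    replacement = f"image: {registry}/{repository}:{tag}"
    # A callable replacement avoids any backslash interpretation in the new value.
    return pattern.sub(lambda _match: replacement, manifest)


if __name__ == "__main__":
    # Placeholder registry and repository, for illustration only.
    snippet = (
        "      containers:\n"
        "        - name: frontend\n"
        "          image: 111122223333.dkr.ecr.eu-west-2.amazonaws.com/quiz-frontend:abc123\n"
    )
    print(update_image_tag(
        snippet,
        "111122223333.dkr.ecr.eu-west-2.amazonaws.com",
        "quiz-frontend",
        "def456",
    ))
```

Because the match starts at "image:", the indentation in front of it is left alone, which is why the sed version doesn't break the YAML either.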

Step 6: Deploying with GitOps (The App of Apps Pattern)

After Terraform finishes, you have a running EKS cluster with ArgoCD installed. But ArgoCD doesn’t know what to manage yet. This is where the App of Apps pattern comes in.

The GitOps Architecture

Instead of manually applying Kubernetes manifests or having Terraform install Helm charts, everything flows through Git:

Terraform creates infrastructure and installs ArgoCD. One kubectl apply bootstraps the root Application. Root Application manages platform tools AND application workloads using sync waves.

The k8s/ Directory Structure

k8s/
├── platform/                    # Platform Applications (ArgoCD manages)
│   ├── root.yaml               # Bootstrap: kubectl apply -f this
│   ├── alb-controller.yaml     # ALB Controller Helm Application
│   ├── external-secrets-chart.yaml  # ESO Helm Application (wave -2)
│   └── prometheus.yaml         # Prometheus Helm Application
│
└── apps/                        # Application workloads
    ├── namespace.yaml
    ├── configmap.yaml
    ├── external-secrets.yaml   # ClusterSecretStore + ExternalSecrets
    ├── backend.yaml
    ├── frontend.yaml
    ├── migration-job.yaml
    └── ingress.yaml

Understanding the Flow

1. Bootstrap with one kubectl command:

kubectl apply -f k8s/platform/root.yaml

This single command creates the root Application that manages everything else.

2. The root Application uses sync waves:

apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: external-secrets-chart
  annotations:
    argocd.argoproj.io/sync-wave: "-2"  # Deploy first (provides CRDs)
spec:
  source:
    repoURL: https://charts.external-secrets.io
    chart: external-secrets
    targetRevision: 2.0.0
---
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: aws-load-balancer-controller
  annotations:
    argocd.argoproj.io/sync-wave: "-1"  # Deploy after ESO
spec:
  source:
    repoURL: https://aws.github.io/eks-charts
    chart: aws-load-balancer-controller
    targetRevision: 3.0.0
---
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: 3-tier-app
  annotations:
    argocd.argoproj.io/sync-wave: "0"  # Deploy after platform
spec:
  source:
    repoURL: https://github.com/wegoagain-dev/3-tier-eks.git
    targetRevision: main
    path: k8s/apps

Why sync waves matter: Without them, the 3-tier app might try to deploy before the ALB Controller or External Secrets Operator are ready. This would cause failures - the Ingress can’t create an ALB if the controller isn’t running, and ExternalSecrets can’t be created if ESO CRDs aren’t installed yet.

With sync waves:

Wave -2: External Secrets Operator deploys first (provides CRDs)
Wave -1: ALB Controller and Prometheus deploy
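ArgoCD waits for them to be healthy (for child Applications this relies on the Application health check, which has been off by default since ArgoCD 1.8 and has to be re-enabled in argocd-cm)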
Wave 0: Application workloads deploy after platform is ready

3. Resource-level sync waves within each app:

The k8s/apps/ directory also uses sync-waves to ensure correct ordering:

Wave -2: Creates the namespace
Wave -1: Sets up External Secrets and fetches from AWS
Wave 0: Deploys backend, frontend, and runs migrations
Wave 1: Creates the ALB ingress
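
The ordering above can be sketched in a few lines of Python. This is a simplification of what ArgoCD does: it ignores sync phases and health checks and only shows the grouping by wave. The resource list is illustrative and the Deployment names are assumptions. A resource without the annotation is treated as wave 0, which matches ArgoCD's default.

```python
from itertools import groupby

WAVE_ANNOTATION = "argocd.argoproj.io/sync-wave"


def wave_of(manifest: dict) -> int:
    """Read the sync wave from a manifest's annotations, defaulting to 0."""
    annotations = manifest.get("metadata", {}).get("annotations", {})
    return int(annotations.get(WAVE_ANNOTATION, "0"))


def plan_waves(manifests: list) -> list:
    """Group manifests into waves, lowest wave first."""
    ordered = sorted(
        manifests,
        key=lambda m: (wave_of(m), m["kind"], m["metadata"]["name"]),
    )
    return [
        (wave, [f'{m["kind"]}/{m["metadata"]["name"]}' for m in group])
        for wave, group in groupby(ordered, key=wave_of)
    ]


def _manifest(kind: str, name: str, wave=None) -> dict:
    """Build a minimal manifest dict for the demo."""
    metadata = {"name": name}
    if wave is not None:
        metadata["annotations"] = {WAVE_ANNOTATION: str(wave)}
    return {"kind": kind, "metadata": metadata}


# Illustrative resources; Deployment names are assumed.
EXAMPLE = [
    _manifest("Ingress", "three-tier-ingress", 1),
    _manifest("Deployment", "backend"),          # no annotation: wave 0
    _manifest("Namespace", "3-tier-app-eks", -2),
    _manifest("ExternalSecret", "database-secret", -1),
    _manifest("Deployment", "frontend", 0),
]

if __name__ == "__main__":
    for wave, names in plan_waves(EXAMPLE):
        print(f"wave {wave}: {names}")
```

The real controller also waits for each wave to become healthy before starting the next, which is the part that actually prevents the race conditions.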

Watch the Deployment

# Check all Applications
kubectl get applications -n argocd

# You should see:
# NAME            SYNC STATUS   HEALTH STATUS
# platform-tools  Synced        Healthy
# 3-tier-app      Synced        Healthy

# Watch platform tools come up first
kubectl get pods -n kube-system -l app.kubernetes.io/name=aws-load-balancer-controller
kubectl get pods -n external-secrets
kubectl get pods -n monitoring

# Then watch your app
kubectl get pods -n 3-tier-app-eks -w

# Check secrets were created by ESO
kubectl get secrets -n 3-tier-app-eks
kubectl get externalsecrets -n 3-tier-app-eks

Access the ArgoCD UI

kubectl port-forward svc/argocd-server -n argocd 8080:443

Get the initial admin password:

kubectl get secret argocd-initial-admin-secret -n argocd -o jsonpath="{.data.password}" | base64 -d

[Screenshot: ArgoCD dashboard]

Testing the Deployment

Once ArgoCD shows “Healthy” and “Synced”:

# Access the app (get the ALB URL from the ADDRESS column)
kubectl get ingress -n 3-tier-app-eks

# Test the app
curl http://<ALB-ADDRESS>/api/health/ready

# View logs if something is wrong
kubectl logs -n 3-tier-app-eks -l app=backend --tail=50

Troubleshooting Sync Issues

If ArgoCD shows “OutOfSync”:

# Force a hard refresh
kubectl patch application 3-tier-app -n argocd --type merge -p '{"metadata":{"annotations":{"argocd.argoproj.io/refresh":"hard"}}}'

# Check sync status details
kubectl get application 3-tier-app -n argocd -o yaml | grep -A 20 "operationState:"

# View ArgoCD logs
kubectl logs -n argocd -l app.kubernetes.io/name=argocd-application-controller --tail=100

Once everything is synced and healthy, you can access the application using the ALB URL from the ingress.

[Demo: the app in action]

Step 7: Monitoring with Prometheus and Grafana

The kube-prometheus-stack is installed via ArgoCD. Access Grafana:

kubectl port-forward svc/prometheus-grafana -n monitoring 3000:80

Get the admin password:

kubectl get secret prometheus-grafana -n monitoring -o jsonpath="{.data.admin-password}" | base64 -d

The stack comes with dashboards for cluster health, node metrics, and pod resource usage out of the box.

[Screenshot: Grafana dashboard]

Things That Broke (And How I Fixed Them)

Because nothing ever works first time, right?

Ingress stuck without an ADDRESS

Spent way too long on this. The AWS Load Balancer Controller needs the right IAM policy, and I was using a v2.7 policy while the Helm chart installed v3.0 of the controller. The fix? Fetch the policy from the controller’s repo instead of keeping a stale copy. I pull it from the main branch, which tracks the latest code; pinning the URL to the tag that matches your controller version is the safer option:

data "http" "alb_controller_policy" {
  url = "https://raw.githubusercontent.com/kubernetes-sigs/aws-load-balancer-controller/main/docs/install/iam_policy.json"
}

Frontend showing “Connection Error”

Classic mistake. I hardcoded localhost:8000 in the React app during development. Worked fine on my machine! But in production, the frontend couldn’t reach the backend. Fixed it by using a relative path (/api) which the ALB ingress routes correctly.

Docker images built on M1 Mac wouldn’t run on EKS

Built the images on my Mac (ARM architecture) but EKS nodes are x86. The pods would crash with “exec format error”. Now I use docker buildx build --platform linux/amd64 to build for the right architecture.

Frontend health checks failing

The nginx config was listening on port 8080, but the Kubernetes manifests had port 80 for the health probes. The probes kept failing with “connection refused” because they were hitting the wrong port. Fixed by updating the frontend deployment to use port 8080 everywhere:

# Deployment: container port and readiness probe both use 8080
ports:
  - containerPort: 8080
readinessProbe:
  httpGet:
    path: /
    port: 8080

# Service: expose port 80 and forward to the container's 8080
ports:
  - port: 80
    targetPort: 8080  # Route port 80 to 8080

Migration job using wrong image tag

The CI pipeline updates image tags in the manifests, but the migration job was still using latest. This caused “ImagePullBackOff” errors because latest didn’t exist in ECR. Fixed by ensuring the CI pipeline also updates k8s/apps/migration-job.yaml with the same image tag as the backend.

ArgoCD sync stuck on immutable Job

Most of a Kubernetes Job’s spec, including its pod template, is immutable. Once the Job is created, you can’t change its image. ArgoCD would try to apply changes and fail with “field is immutable” errors. I initially tried Replace=true but that caused infinite recreation loops when combined with selfHeal: true. The proper solution is using a PostSync hook:

apiVersion: batch/v1
kind: Job
metadata:
  name: database-migration
  annotations:
    # Run as PostSync hook - executes after all other resources are synced
    # Deletes after success so it doesn't keep recreating
    argocd.argoproj.io/hook: PostSync
    argocd.argoproj.io/hook-delete-policy: HookSucceeded

This runs the migration as a PostSync hook - it executes after all other resources are synced, then deletes itself on success. No more infinite loops.

Secrets not syncing from AWS

Initially, I had Terraform creating Kubernetes secrets directly. This worked, but it meant secrets were in Terraform state and ArgoCD wasn’t managing them. Refactored to use External Secrets Operator (ESO) which syncs secrets from AWS Secrets Manager. Now secrets flow: AWS Secrets Manager → ESO → Kubernetes Secret, all managed by ArgoCD.

Helm repo cache errors

Got errors like “no cached repo found” when running Terraform. Fixed by adding the Helm repos locally:

helm repo add eks https://aws.github.io/eks-charts
helm repo add argo https://argoproj.github.io/argo-helm
helm repo add external-secrets https://charts.external-secrets.io
helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
helm repo update

Race conditions between platform and app

Initially, I had separate kubectl commands for platform tools and the app. The app would sometimes fail because it tried to deploy before the ALB Controller or External Secrets Operator were ready. Fixed by using the App of Apps pattern with sync waves - ESO deploys first (wave -2) to provide CRDs, then ALB Controller and Prometheus (wave -1), then the app deploys after they’re healthy (wave 0).

CRD annotation too long error

Got this error when ArgoCD tried to apply the External Secrets Operator CRDs: metadata.annotations: Too long: must have at most 262144 bytes. This happens because ArgoCD’s client-side apply adds a last-applied-configuration annotation that exceeds Kubernetes’ limit for large CRDs. Fixed by adding ServerSideApply=true to the sync options:

syncOptions:
  - CreateNamespace=true
  - ServerSideApply=true  # Fixes CRD annotation limit

Server-side apply handles the apply logic on the Kubernetes server side without the large annotation.
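
To get a feel for why large CRDs trip this limit, here is a rough Python check. Client-side apply stores a JSON copy of the whole manifest in a single annotation, so the manifest's serialised size is what matters. The function name is made up, and the size it computes only approximates what kubectl or ArgoCD would actually write.

```python
import json

# Kubernetes caps the total size of an object's annotations at 256 KiB.
MAX_ANNOTATIONS_BYTES = 256 * 1024  # 262144


def last_applied_fits(manifest: dict) -> bool:
    """Estimate whether a last-applied-configuration annotation would fit.

    Approximation: serialises the manifest as compact JSON and compares
    its byte length with the annotation size limit.
    """
    payload = json.dumps(manifest, separators=(",", ":"))
    return len(payload.encode("utf-8")) <= MAX_ANNOTATIONS_BYTES


if __name__ == "__main__":
    small = {"kind": "ConfigMap", "metadata": {"name": "demo"}, "data": {"k": "v"}}
    # Stand-in for a CRD with a very large OpenAPI schema.
    huge = {"kind": "CustomResourceDefinition", "spec": {"schema": "x" * (300 * 1024)}}
    print("small fits:", last_applied_fits(small))
    print("huge fits:", last_applied_fits(huge))
```

With server-side apply the API server tracks field ownership itself, so this annotation is never written and the size of the CRD stops being a problem.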

Cleanup

Follow these steps in order to avoid orphaned resources:

Step 1: Delete the Ingress first (so the ALB gets cleaned up)

kubectl delete ingress three-tier-ingress -n 3-tier-app-eks

Wait a minute or two for the ALB to be deleted.

Step 2: Destroy Terraform resources

cd terraform
terraform destroy

This deletes everything: EKS cluster, VPC, RDS, ECR repos, the lot.

What I’d Add Next

This works, but it’s not production-ready. Here’s what I’d tackle next:

Custom Domain & HTTPS

Right now I’m using the ALB’s auto-generated DNS name (something like k8s-3tierap-3tierap-1234567890.eu-west-2.elb.amazonaws.com). Not exactly user-friendly. Adding Route53 and ACM certificates would give me a proper domain with SSL.

Horizontal Pod Autoscaling (HPA)

Currently the backend just runs with 2 replicas all the time. That’s fine until I get a traffic spike and the app crashes, or it’s 3am and I’m paying for capacity I’m not using. HPA would scale pods up when CPU hits 70% and scale down when things are quiet.
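
A minimal sketch of what that could look like. The Deployment name backend and the maxReplicas value are assumptions; the namespace, the 2-replica floor, and the 70% CPU target come from this post.

```yaml
# HPA sketch: keep at least 2 backend pods, scale out on CPU.
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: backend-hpa              # hypothetical name
  namespace: 3-tier-app-eks
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: backend                # assumed Deployment name
  minReplicas: 2
  maxReplicas: 6                 # assumed ceiling
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70 # percent of the pods' CPU requests
```

This only works if metrics-server is running in the cluster and the backend containers declare CPU requests, since utilisation is measured against the request. With ArgoCD in the picture, it is probably also worth dropping the fixed replicas field from the Deployment manifest so the two don’t fight over the replica count.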

Network Policies

Right now, any pod can talk to any other pod. That’s… not great from a security perspective. Network policies would lock this down so that only the backend pods can reach the database and only the frontend can reach the backend. If someone compromises the frontend, they can’t just hop to the database.
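
Here is a hedged sketch of the frontend-to-backend half. The app: frontend / app: backend labels and port 5000 are assumptions, not taken from the repo.

```yaml
# NetworkPolicy sketch: only frontend pods may open connections to backend pods.
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: backend-allow-frontend-only   # hypothetical name
  namespace: 3-tier-app-eks
spec:
  podSelector:
    matchLabels:
      app: backend                    # assumed label
  policyTypes:
    - Ingress
  ingress:
    - from:
        - podSelector:
            matchLabels:
              app: frontend           # assumed label
      ports:
        - protocol: TCP
          port: 5000                  # assumed Flask port
```

Two caveats. The policy does nothing unless the cluster’s CNI enforces network policies. And if the ALB sends API traffic straight to the backend pods, that traffic would also need an allow rule (for example an ipBlock covering the load balancer’s subnets). Since the database lives on RDS outside the cluster, the database half would be an egress rule with an ipBlock, alongside the RDS security group.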

Resource Quotas & Limits

Without quotas, one runaway pod could consume all the cluster’s CPU and memory. I’ve seen it happen. Setting namespace quotas and pod limits keeps things predictable.
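
A sketch of the two pieces together: a ResourceQuota capping the namespace as a whole, and a LimitRange giving containers default requests and limits when they don’t set their own. All the numbers and resource names are illustrative assumptions.

```yaml
# Namespace-wide ceiling on what all pods together may request or use.
apiVersion: v1
kind: ResourceQuota
metadata:
  name: app-quota                # hypothetical name
  namespace: 3-tier-app-eks
spec:
  hard:
    requests.cpu: "2"
    requests.memory: 4Gi
    limits.cpu: "4"
    limits.memory: 8Gi
    pods: "20"
---
# Per-container defaults, applied when a container omits requests/limits.
apiVersion: v1
kind: LimitRange
metadata:
  name: app-defaults             # hypothetical name
  namespace: 3-tier-app-eks
spec:
  limits:
    - type: Container
      defaultRequest:
        cpu: 100m
        memory: 128Mi
      default:
        cpu: 500m
        memory: 512Mi
```

The LimitRange matters here because once a quota covers requests and limits, pods that don’t specify them are rejected unless defaults exist.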

Multi-AZ RDS

The database is a single point of failure right now. If its availability zone goes down (rare, but it happens), the whole app goes down. Multi-AZ RDS would automatically fail over to a standby in another AZ.

What I Learned

Going through this project taught me a few things:

The App of Apps pattern with sync waves is the right way to do GitOps. Initially, I had Terraform installing Helm charts for ALB Controller, External Secrets, and Prometheus. It worked, but it was messy. Terraform would hang trying to connect to a cluster that wasn’t ready yet, and I had infrastructure concerns mixed with application deployment. Refactoring to the App of Apps pattern - where Terraform only creates infrastructure and ArgoCD, then a single kubectl apply bootstraps everything via a root Application with sync waves - made everything cleaner. Now Terraform finishes quickly, and GitOps drives the entire platform and application lifecycle with guaranteed ordering.

OIDC and IRSA are game-changers for security. Understanding the difference between GitHub OIDC (for CI/CD auth) and EKS OIDC (for pod auth via IRSA) was crucial. GitHub OIDC eliminates long-lived AWS credentials in GitHub Secrets. IRSA eliminates the need to give nodes broad IAM permissions. Both use short-lived tokens and the principle of least privilege. The key insight: OIDC is the authentication mechanism (it proves identity), and IRSA builds on it to map a Kubernetes ServiceAccount to an IAM role, whose policies decide what the pod is allowed to do.

Separate concerns: Terraform creates IAM, GitOps creates pods. For IRSA to work, you need both the IAM role (Terraform’s job) and the ServiceAccount with the role annotation (GitOps’s job). The clean separation is: Terraform creates the role with a trust policy that references the OIDC provider, the role ARN is then hardcoded in the GitOps manifests, and ArgoCD creates the ServiceAccounts and pods. This keeps infrastructure (IAM, OIDC providers) separate from workload configuration.
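
The GitOps half of that split is small. A sketch, with a made-up account ID, role name, and ServiceAccount name:

```yaml
# ServiceAccount sketch: the annotation links the pod identity to an IAM role.
apiVersion: v1
kind: ServiceAccount
metadata:
  name: external-secrets-sa      # hypothetical name
  namespace: 3-tier-app-eks
  annotations:
    # Placeholder ARN: use the role ARN that Terraform outputs.
    eks.amazonaws.com/role-arn: arn:aws:iam::111122223333:role/example-eso-role
```

The trust policy on the Terraform side has to name this exact namespace and ServiceAccount, so renaming either one in Git without updating the role breaks the link.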

External Secrets Operator is the way to go for GitOps secrets. Storing secrets in Terraform state or Git is a bad idea. ESO bridges AWS Secrets Manager and Kubernetes seamlessly. The secret stays in AWS, ESO syncs it to Kubernetes using IRSA for authentication, and ArgoCD manages the ExternalSecret resource. Best of both worlds - security of AWS Secrets Manager, convenience of Kubernetes secrets.
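
For illustration, an ExternalSecret could look roughly like this. The store name, secret names, and keys are all hypothetical, and the apiVersion depends on the ESO release installed (newer releases also serve v1).

```yaml
# ExternalSecret sketch: ESO reads one property from AWS Secrets Manager
# and writes it into a regular Kubernetes Secret.
apiVersion: external-secrets.io/v1beta1
kind: ExternalSecret
metadata:
  name: db-credentials           # hypothetical name
  namespace: 3-tier-app-eks
spec:
  refreshInterval: 1h            # how often ESO re-reads the AWS secret
  secretStoreRef:
    name: aws-secrets-manager    # assumed SecretStore, set up to use IRSA
    kind: SecretStore
  target:
    name: db-credentials         # Kubernetes Secret that ESO creates
  data:
    - secretKey: DATABASE_URL    # key inside the Kubernetes Secret
      remoteRef:
        key: three-tier/db       # assumed secret name in Secrets Manager
        property: database_url   # assumed JSON property inside that secret
```

Only this manifest lives in Git. The secret value itself never leaves AWS Secrets Manager and the cluster.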

Sync-waves are essential for ordered deployments. When you have resources that depend on each other (platform tools must be ready before apps, namespace must exist before deployments, secrets must exist before pods can mount them), sync-waves ensure they deploy in the right order. Without them, you’ll get random failures on first deploy. Application-level sync waves prevent race conditions between platform and apps. Resource-level sync waves ensure correct ordering within each application.

GitOps is worth the setup. Yeah, it took time to get ArgoCD working. But now deploying is just git push. I don’t have to remember kubectl commands or worry about which version is running where. The Git repo is the source of truth. The single kubectl apply -f k8s/platform/root.yaml bootstrap means anyone can deploy the entire stack after Terraform runs.

IRSA isn’t optional. I initially thought about just giving the nodes broad IAM permissions. That would’ve been a mistake. IRSA means each pod only gets the permissions it needs via its ServiceAccount. It’s more work upfront (creating IAM roles with trust policies) but way more secure. Plus, understanding the OIDC token flow is a great interview topic.

Cost surprises are real. That NAT Gateway cost caught me off guard. $32/month just so private subnets can reach the internet! Worth knowing about before you deploy.

Port mismatches are sneaky. The frontend nginx was listening on 8080, but I had port 80 in the Kubernetes manifests. Everything looked fine, but health checks kept failing. Always double-check your ports match between the container and the manifest.
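
One way to make this harder to get wrong is to name the container port once and reference it by name everywhere else. A sketch, assuming a Deployment and Service both called frontend, a placeholder image, and a health check on /:

```yaml
# The container listens on 8080; the probe and Service reference it by name.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: frontend                 # assumed name
  namespace: 3-tier-app-eks
spec:
  replicas: 2
  selector:
    matchLabels:
      app: frontend
  template:
    metadata:
      labels:
        app: frontend
    spec:
      containers:
        - name: frontend
          image: example.com/frontend:1.0   # placeholder image
          ports:
            - name: http
              containerPort: 8080           # must match what nginx listens on
          readinessProbe:
            httpGet:
              path: /                       # assumed health check path
              port: http
---
apiVersion: v1
kind: Service
metadata:
  name: frontend
  namespace: 3-tier-app-eks
spec:
  selector:
    app: frontend
  ports:
    - port: 80            # port other things use to reach the Service
      targetPort: http    # resolves to containerPort 8080 above
```

The Service can still expose port 80. What has to match nginx is the containerPort and whatever targetPort resolves to.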

Provider cleanup matters. I initially had kubernetes and http providers in Terraform that weren’t actually being used. Cleaning those up made the configuration clearer and reduced dependencies. Only include providers you’re actually using.

If you’re trying this yourself and get stuck, feel free to open an issue on the repo. I’ve probably hit the same problem.