Build a Serverless CI/CD pipeline on AWS with Fargate, CodePipeline and Terraform

This post provides an example of deploying a CI/CD pipeline on AWS using the serverless container platform Fargate and the fully managed CodePipeline service. We’ll also use Terraform to automate building the entire AWS environment, as shown in the diagram below.

Specifically, we’ll be creating the following AWS resources:

  • 1x demo VPC including public/private subnets, NAT gateway and security groups etc
  • 1x ALB for providing LB services to a target group of 2x Fargate container tasks
  • 1x ECS cluster with a Fargate service definition (running our demo app)
  • 1x CodePipeline definition, which builds the demo app from GitHub Repo (with a webhook trigger) and deploys it to the same Fargate service
  • 1x ECR repository for hosting pipeline build images
  • 2x S3 Buckets as build & artifact cache

References – for this demo, I’m using these Terraform modules found on GitHub:

PREREQUISITES

  • Access to an AWS testing environment
  • Install Git & Terraform on your client
  • Install AWS toolkits including AWS CLI, AWS-IAM-Authenticator
  • Check the NTP clock & sync status on your client —> important!
  • Clone or download the Terraform code here.
  • Clone or fork the demo app (including the CodePipeline buildspec) here.
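A quick way to confirm the client tools from the list above are installed is to check that each one is on the PATH. This is only a minimal sketch: the binary names (git, terraform, aws, aws-iam-authenticator) are my assumption of how the tools are installed on your client.

```shell
#!/bin/sh
# Print the name of every tool in the argument list that is not on PATH.
missing_tools() {
  for tool in "$@"; do
    command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
  done
}

# Binary names assumed from the prerequisites list.
missing=$(missing_tools git terraform aws aws-iam-authenticator)
if [ -n "$missing" ]; then
  echo "Missing tools:" $missing
else
  echo "All prerequisite tools found"
fi
```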

Step-1: Review the Terraform Script

Let’s take a close look at the Terraform code. I’ll skip the VPC and ALB sections and focus on the ECS/Fargate service and CodePipeline definition.

This section creates an ECS cluster with the Fargate service definition. Note that I have used a bitnami node image as a placeholder for testing; it will be replaced automatically by our demo app when CodePipeline first runs.

############################# Create ECS Cluster and Fargate Service ##################################


resource "aws_ecs_cluster" "ecs_cluster" {
  name = "default"
}


module "ecs_fargate" {
  source           = "git::https://github.com/tmknom/terraform-aws-ecs-fargate.git?ref=tags/2.0.0"
  name             = var.ecs_service_name
  container_name   = var.container_name
  container_port   = var.container_port
  cluster          = aws_ecs_cluster.ecs_cluster.arn
  subnets          = module.vpc.public_subnets
  target_group_arn = join("", module.alb.target_group_arns)
  vpc_id           = module.vpc.vpc_id

  container_definitions = jsonencode([
    {
      name      = var.container_name
      image     = "bitnami/node:latest"
      essential = true
      portMappings = [
        {
          containerPort = var.container_port
          protocol      = "tcp"
        }
      ]
    }
  ])

  desired_count                      = 2
  deployment_maximum_percent         = 200
  deployment_minimum_healthy_percent = 100
  deployment_controller_type         = "ECS"
  assign_public_ip                   = true
  health_check_grace_period_seconds  = 10
  platform_version                   = "LATEST"
  source_cidr_blocks                 = ["0.0.0.0/0"]
  cpu                                = 256
  memory                             = 512
  requires_compatibilities           = ["FARGATE"]
  iam_path                           = "/service_role/"
  description                        = "Fargate demo example"
  enabled                            = true

  tags = {
    Environment = "Dev"
  }
}

This section creates an ECR repository (for hosting the build image) and defines the pipeline, which builds the demo app from the GitHub repo, pushes the new image to ECR and deploys it to the ECS cluster and Fargate service created above.

################################### Create ECR Repo and Code Pipeline ###################################


resource "aws_ecr_repository" "fargate-repo" {
  name = var.ecr_repo

  image_scanning_configuration {
    scan_on_push = true
  }
}

module "ecs_codepipeline" {
  source                = "git::https://github.com/cloudposse/terraform-aws-ecs-codepipeline.git?ref=master"
  name                  = var.app_name
  namespace             = var.namespace
  region                = var.region
  image_repo_name       = var.ecr_repo
  stage                 = var.stage
  github_oauth_token    = var.github_oath_token
  github_webhooks_token = var.github_webhooks_token
  webhook_enabled       = "true"
  repo_owner            = var.github_repo_owner
  repo_name             = var.github_repo_name
  branch                = "master"
  service_name          = module.ecs_fargate.ecs_service_name
  ecs_cluster_name      = aws_ecs_cluster.ecs_cluster.arn
  privileged_mode       = "true"
}

Note the pipeline is synced to GitHub with a webhook trigger enabled, and you’ll need to supply a GitHub personal token for this. So go create one if you haven’t already done so.

Step-2: Create the Serverless Pipeline with Terraform

Configure AWS environment variables

[root@cloud-ops01 tf-aws-eks]# aws configure
AWS Access Key ID [*****]: 
AWS Secret Access Key [***]: 
Default region name [us-east-1]: 
Default output format [json]:

Update terraform.tfvars based on your own environment

region = "us-east-1"
ecs_service_name = "ecs-svc-example"
container_port = 3000
container_name = "demo-app"
namespace = "xxx"
stage = "dev"
app_name = "demo-app-xxxx"
ecr_repo = "fargate-demo-repo"
github_oath_token = "xxxx"
github_webhooks_token = "xxxx"
github_repo_owner = "xxxx"
github_repo_name = "fargate-demo-app"
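If you would rather not keep the GitHub tokens in terraform.tfvars, Terraform also reads any environment variable named TF_VAR_<name> as the value of the input variable <name>. The sketch below is one possible pre-flight check, not part of the original repo: the helper name require_tf_vars is hypothetical, and it assumes you have removed the two token lines from terraform.tfvars (values in that file take precedence over environment variables).

```shell
#!/bin/sh
# Return non-zero, and report each name on stderr, if any of the given
# Terraform variables has no TF_VAR_<name> value in the environment.
require_tf_vars() {
  status=0
  for name in "$@"; do
    eval "value=\${TF_VAR_${name}-}"
    if [ -z "$value" ]; then
      echo "TF_VAR_${name} is not set" >&2
      status=1
    fi
  done
  return "$status"
}

# Example usage (token values are placeholders):
#   export TF_VAR_github_oath_token="xxxx"
#   export TF_VAR_github_webhooks_token="xxxx"
#   require_tf_vars github_oath_token github_webhooks_token && terraform apply
```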

Now run the Terraform script

terraform init
terraform apply

The process will take about 5 mins and you should see an output like this. Note the public URL of the ALB, which is providing LB services to the 2x Fargate container tasks.

Step-3: Review the Fargate Service

On the AWS Console, go to “Elastic Container Service (ECS) —> Cluster” and we can see an ECS cluster “default” has been created, with 1x Fargate service defined and 2x container tasks/pods running.

and here are the two running container tasks/pods:

Click any of the tasks to confirm it’s running our demo app image deployed from the ECR repository.

Next, search for AWS service “Developer Tools —> CodePipeline“, you’ll see our Pipeline has been deployed with a (1st) successful execution.

Now search for “EC2 -> Load Balancer” and confirm that an ALB has been created, deployed on two different subnets across two AZs.

This is because we are spreading the 2x ECS container tasks across two AZs for high availability.

Go to the ALB public DNS/URL and you should see the default page of our demo app running on AWS Fargate, cool!

Step-4: Test the Pipeline Run

It’s testing time now! As discussed, the pipeline is synced to the GitHub repository and will be triggered by a push to the master branch. The actual build task is defined in buildspec.yaml, which contains a simple 3-phase process as shown below. Note the output of the build process includes a JSON artifact (imagedefinitions.json), which records the ECR path of the latest build image.

version: 0.2
phases:
  pre_build:
    commands:
      - echo Logging in to Amazon ECR...
      - aws --version
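      # "aws ecr get-login" was removed in AWS CLI v2; get-login-password is the v2 equivalent
      - aws ecr get-login-password --region $AWS_DEFAULT_REGION | docker login --username AWS --password-stdin $AWS_ACCOUNT_ID.dkr.ecr.$AWS_DEFAULT_REGION.amazonaws.com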
      - REPOSITORY_URI=$AWS_ACCOUNT_ID.dkr.ecr.$AWS_REGION.amazonaws.com/$IMAGE_REPO_NAME
      - IMAGE_TAG=$(echo $CODEBUILD_RESOLVED_SOURCE_VERSION | cut -c 1-7)
  build:
    commands:
      - echo Build started on `date`
      - echo Building the Docker image...
      - REPO_URI=$AWS_ACCOUNT_ID.dkr.ecr.$AWS_REGION.amazonaws.com/$IMAGE_REPO_NAME
      - docker pull $REPO_URI:latest || true
      - docker build --cache-from $REPO_URI:latest --tag $REPO_URI:latest --tag $REPO_URI:$IMAGE_TAG .
  post_build:
    commands:
      - echo Build completed on `date`
      - echo Pushing the Docker images...
      - REPO_URI=$AWS_ACCOUNT_ID.dkr.ecr.$AWS_REGION.amazonaws.com/$IMAGE_REPO_NAME
      - docker push $REPO_URI:latest
      - docker push $REPO_URI:$IMAGE_TAG
      - echo Writing image definitions file...
      - printf '[{"name":"demo-app","imageUri":"%s"}]' "$REPO_URI:$IMAGE_TAG" | tee imagedefinitions.json
artifacts:
  files: imagedefinitions.json
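To make the tagging logic easier to follow, here is a small local sketch of what the buildspec does with the commit SHA and the image definitions file. The commit SHA and the account ID in REPO_URI are made-up placeholders; in CodeBuild these values come from the build environment. As far as I know, the "name" field has to match the container name in the task definition (container_name in our tfvars), otherwise the ECS deploy stage cannot map the image.

```shell
#!/bin/sh
# Derive the short image tag the same way the pre_build phase does.
short_tag() {
  echo "$1" | cut -c 1-7
}

# Render the imagedefinitions.json body for one container.
# $1 = container name, $2 = full image URI including the tag
image_definitions() {
  printf '[{"name":"%s","imageUri":"%s"}]' "$1" "$2"
}

# Placeholder values for illustration only.
COMMIT="99cc610aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa"
REPO_URI="123456789012.dkr.ecr.us-east-1.amazonaws.com/fargate-demo-repo"

IMAGE_TAG=$(short_tag "$COMMIT")
image_definitions "demo-app" "$REPO_URI:$IMAGE_TAG"
echo
```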

To test the pipeline run, we’ll make a “cosmetic change” to the app revision (v1.0 —> v1.1)

Commit and push to master.

As expected, this has triggered a new pipeline run

Soon you’ll see two additional pods are launching with a new revision number of “3” — this is because by default Fargate implements a rolling update deployment strategy with a default minimum healthy percent of 100%. So it will not remove the previous container pods (revision 2) until the new ones are running and ready.
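The task counts during the rollout follow from the deployment settings in our Terraform module block (desired_count = 2, deployment_maximum_percent = 200, deployment_minimum_healthy_percent = 100). The sketch below works through the arithmetic; the rounding directions (maximum rounded down, minimum rounded up) reflect my understanding of the ECS documentation, so treat them as an assumption.

```shell
#!/bin/sh
# Upper bound on running tasks during a deployment:
# desired_count * maximum_percent / 100, rounded down.
max_tasks() {
  echo $(( $1 * $2 / 100 ))
}

# Lower bound on healthy tasks during a deployment:
# desired_count * minimum_healthy_percent / 100, rounded up.
min_healthy_tasks() {
  echo $(( ($1 * $2 + 99) / 100 ))
}

# Values from the ecs_fargate module block: desired_count=2, 200%, 100%.
echo "max tasks during deploy: $(max_tasks 2 200)"
echo "min healthy tasks: $(min_healthy_tasks 2 100)"
```

With these settings the service may run up to 4 tasks while never dropping below 2 healthy ones, which is why the two new tasks start before the old ones are stopped.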

Once the revision 3 pods are running, we can see the revision 2 pods being terminated and de-registered from the service.

Eventually the v2 pods are removed and the Fargate service is now updated with revision 3, which consists of the new pods running our demo app “v1.1”.

In the CodePipeline history, verify that the new build and deployment have completed successfully.

Also, verify the new image (tag “99cc610”) of the demo app is pushed to ECR as expected.

Go to the Fargate tasks (revision 3) again and verify the container pods are indeed running on the new image “99cc610”.

Refresh the ALB address to see the v1.1 app page loading — Magic!

Cloud Native DevOps on GCP Series Ep3 – Use Terraform to launch a Serverless CI/CD pipeline with Cloud Run, GCR and Cloud Build

This is the third episode of our Cloud Native DevOps on GCP series. In the previous chapters, we have achieved the following:

This time, we will take a step further and go completely serverless by deploying the same Node app onto the Google Cloud Run platform. Cloud Run is built from an open source project named Knative, which is a serverless framework developed based on the industry proven Kubernetes architecture. Whilst Knative is developed with the same event-driven concept (like other serverless solutions), it also offers great flexibility and multi-cloud portability at a container level.

For this demo, we will firstly launch a Cloud Run Service with an initial image using cloudrun-hello app provided by Google. We will also create a Cloud Build Pipeline to automatically build and push our Node app onto GCR, and then deploy it to the same Cloud Run Service (as a new revision). As previously, the pipeline will be synced to GitHub repository and automatically triggered by a Git push event.

Best of all, all GCP resources in this environment, including the Cloud Run Service and the Cloud Build Pipeline, will be provisioned via Terraform, as illustrated below.

WHAT YOU’LL NEED:

  • Access to a GCP testing environment
  • Install Git and Terraform on your client
  • Install GCloud SDK
  • Check the NTP clock & sync status on your client -> important!
  • Clone or download the Terraform script here
  • Clone or fork the NodeJS demo app here

Step-1: Prepare the GCloud Environment

To start, configure the GCloud environment variables and authentications.

gcloud init
gcloud config set accessibility/screen_reader true
gcloud auth application-default login

Enable required GCP API services

gcloud services enable servicenetworking.googleapis.com
gcloud services enable cloudresourcemanager.googleapis.com
gcloud services enable cloudbuild.googleapis.com
gcloud services enable containerregistry.googleapis.com 
gcloud services enable run.googleapis.com 
gcloud services enable sourcerepo.googleapis.com    

Update Cloud Build service account with all the necessary roles so it will have required permissions to access Cloud Run and GCR within the project.

PROJECT_ID=`gcloud config get-value project`
CLOUDBUILD_SA="$(gcloud projects describe $PROJECT_ID --format 'value(projectNumber)')@cloudbuild.gserviceaccount.com"
gcloud projects add-iam-policy-binding $PROJECT_ID --member serviceAccount:$CLOUDBUILD_SA --role roles/editor
gcloud projects add-iam-policy-binding $PROJECT_ID --member serviceAccount:$CLOUDBUILD_SA --role roles/run.admin
gcloud projects add-iam-policy-binding $PROJECT_ID --member serviceAccount:$CLOUDBUILD_SA --role roles/container.developer
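If the list of roles grows, the bindings can be expressed as a loop. The sketch below is a dry run that only prints the gcloud commands so you can review them first; the helper name print_role_bindings, the project ID and the project number are placeholders I made up for illustration.

```shell
#!/bin/sh
# Print one add-iam-policy-binding command per role instead of running it.
# $1 = project ID, $2 = service account email, remaining args = roles
print_role_bindings() {
  project_id=$1
  member=$2
  shift 2
  for role in "$@"; do
    echo "gcloud projects add-iam-policy-binding $project_id --member serviceAccount:$member --role $role"
  done
}

# Placeholder project ID and project number.
print_role_bindings "my-project" "123456789012@cloudbuild.gserviceaccount.com" \
  roles/editor roles/run.admin roles/container.developer
```

Piping the output to sh would execute the printed commands once you are happy with them.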

Step-2: Connect Cloud Build to GitHub Repository

Next, let’s connect Cloud Build to the demo app Git Repository. On GCP console, go to “Cloud Build —> Triggers —> Connect Repository” and then select “GitHub” as below. (You will be redirected to GitHub for authentication.)

Select the demo app repository which contains the sample NodeJs application.

On the next page, make sure to click “Skip for now” and we are done. We’ll leave it to Terraform to create the trigger later.

Step-3: Run Terraform Script to launch a Serverless CI/CD Pipeline

Before executing the script, make sure to update the variables (as defined in terraform.tfvars) as per your own GCP environment.

project_id = "xxxxxxxx"
location = "asia-northeast1"
gcr_region = "asia"
github_owner = "xxxxxx"
github_repository = "xxxxxx"

Run the Terraform script.

terraform init
terraform apply

Since we are not provisioning any infrastructure resources (it’s serverless!), the process should take no more than 2 to 3 mins. Take note of the URL provided in the output: this is the public URL of our Cloud Run Service.

On GCP console verify the Cloud Run Service has been deployed successfully.

Now go to the above URL and you should see the default page of the cloudrun-hello app.

Before we move forward, confirm there is now a Cloud Build trigger provisioned by Terraform, with the pipeline config defined as “cloudbuild.yaml”.

Step-4: Test the Pipeline

Now let’s take a closer look at the pipeline code. This is a basic 3-stage pipeline:

  • Build the demo Node app
  • Push the image to GCR
  • Deploy the image from GCR to the existing Cloud Run Service

steps:
  # Build Node app docker image
  - name: "gcr.io/cloud-builders/docker"
    args:
      - build
      - -t
      - ${_GCR_REGION}.gcr.io/$PROJECT_ID/${_SERVICE_NAME}:$COMMIT_SHA
      - .

  # Push Node app image to GCR
  - name: "gcr.io/cloud-builders/docker"
    args:
      - push 
      - ${_GCR_REGION}.gcr.io/$PROJECT_ID/${_SERVICE_NAME}:$COMMIT_SHA

  # Deploy the docker image to Cloud Run Service
  - name: "gcr.io/cloud-builders/gcloud"
    args:
      - run
      - deploy
      - ${_SERVICE_NAME}
      - --image=${_GCR_REGION}.gcr.io/$PROJECT_ID/${_SERVICE_NAME}:$COMMIT_SHA
      - --region=${_LOCATION}
      - --platform=managed

images:
  - "${_GCR_REGION}.gcr.io/$PROJECT_ID/${_SERVICE_NAME}:$COMMIT_SHA"

timeout: 1200s
substitutions:
  _LOCATION: asia-northeast1 
  _GCR_REGION: asia 
  _SERVICE_NAME: cloudrun-demo
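To see what image reference the three steps end up sharing, the substitution can be reproduced locally. This is just an illustration: the region and service name are the defaults from the substitutions block, while the project ID and commit SHA are placeholders (Cloud Build fills in $PROJECT_ID and $COMMIT_SHA itself).

```shell
#!/bin/sh
# Assemble the image reference the same way cloudbuild.yaml does.
# $1 = GCR region prefix, $2 = project ID, $3 = service name, $4 = commit SHA
image_ref() {
  echo "$1.gcr.io/$2/$3:$4"
}

# "asia" and "cloudrun-demo" are the substitution defaults;
# "my-project" and "0123abc" are placeholders.
image_ref "asia" "my-project" "cloudrun-demo" "0123abc"
```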

Time to test the pipeline! We’ll add a note into the README file.

Commit and push to Git.

This should automatically trigger the pipeline, and the 3-stage process should complete in around a minute 🙂

Now go back to our Cloud Run Service, you should see a new revision has been deployed by Cloud Build, with the container image now pointing to the GCR path (which contains our demo app).

Refresh the browser and Boom — you now have access to the demo app running on Google Cloud Run!

This concludes our Cloud Native DevOps on GCP series. I hope this has been informative and thanks very much for reading!

Cloud Native DevOps on GCP Series Ep2 – Create a CI/CD pipeline with GKE, GCR and Cloud Build

This is the second episode of our Cloud Native DevOps on GCP series. In the previous chapter, we have built a multi-AZ GKE cluster with Terraform. This time, we’ll create a cloud native CI/CD pipeline leveraging our GKE cluster and Google DevOps tools such as Cloud Build and Google Container Registry (GCR). We’ll create a Cloud Build trigger by connecting to GitHub repository to perform automatic build, test and deployment of a sample micro-service app onto the GKE cluster.

For this demo, I have provided a simple NodeJS app which is already containerized and packaged as a Helm Chart for fast K8s deployment. You can find all the artifacts at my GitHub Repo, including the demo app, Helm template/chart, as well as the Cloud Build pipeline code.

WHAT YOU’LL NEED:

  • Access to a GCP testing environment
  • Install Git, Kubectl and Terraform on your client
  • Install Docker on your client
  • Install GCloud SDK
  • Check the NTP clock & sync status on your client -> important!
  • Clone or download the demo app repo here

Step-1: Prepare the GCloud Environment

To begin, configure the GCloud environment variables and authentications.

gcloud init
gcloud config set accessibility/screen_reader true
gcloud auth application-default login

Register GCloud as a Docker credential helper — this is important so our Docker client will have privileged access to interact with GCR. (Later we’ll need to build and push a Helm client image to GCR as required for the pipeline deployment process)

gcloud auth configure-docker

Enable required GCP API services.

gcloud services enable compute.googleapis.com
gcloud services enable servicenetworking.googleapis.com
gcloud services enable cloudresourcemanager.googleapis.com
gcloud services enable container.googleapis.com
gcloud services enable cloudbuild.googleapis.com

Update Cloud Build service account with an editor role so it will have required permissions to access GKE and GCR within the project.

PROJECT_ID=`gcloud config get-value project`
CLOUDBUILD_SA="$(gcloud projects describe $PROJECT_ID --format 'value(projectNumber)')@cloudbuild.gserviceaccount.com"
gcloud projects add-iam-policy-binding $PROJECT_ID --member serviceAccount:$CLOUDBUILD_SA --role roles/editor

Step-2: Launch a GKE Cluster using Terraform

If you have been following the series and have already deployed a GKE cluster, you can skip this step and move on to the next. Otherwise you can follow this post to build a GKE cluster with Terraform.

Make sure to deploy an Ingress Controller as there is an Ingress service defined in our Helm Chart!

kubectl apply -f https://raw.githubusercontent.com/kubernetes/ingress-nginx/controller-0.32.0/deploy/static/provider/cloud/deploy.yaml  

Step-3: Initialize Helm for Application Deployment on GKE

As mentioned above, for this demo we have encapsulated our demo app into a Helm Chart. Helm is a package management system designed for simplifying and accelerating application deployment on the Kubernetes platform.

In version 2, Helm consists of a local client and a Tiller server pod (deployed in the K8s cluster) that interacts with the Kube-apiserver for app deployment. (Helm 3 later removed Tiller.) In our example, we’ll first build a customised Helm client docker image and push it to GCR. This image will then be used by Cloud Build to interact with the Tiller server (deployed on GKE) for deploying the pre-packaged Helm chart, as illustrated in the diagram below.

First let’s configure a service account for Tiller and initialize Helm (server component) on our GKE cluster.

kubectl apply -f ./k8s-helm/tiller.yaml
helm init --history-max 200 --service-account tiller

We’ll then build and push a customised Helm client image to GCR. This might take a few minutes.

cd ./k8s-helm/cloud-builders-community/helm
docker build -t gcr.io/$PROJECT_ID/helm .
docker push gcr.io/$PROJECT_ID/helm

On GCR, confirm that a new Helm (client) image has been pushed.

Step-4: Review the (Cloud Build) Pipeline Code

Before we move forward, let’s take a moment to review the pipeline code (as defined in cloudbuild.yaml). There are 4 stages in total in our Cloud Build pipeline:

  1. Build a docker image with our demo app
  2. Push the new image to GCR
  3. Deploy Helm chart (for our demo app) to GKE via GCR
  4. Integration Testing

The first two stages are straightforward: we’ll use the Google-published Cloud Builder docker image to build the node app image and push it to the GCR repository.

  # Build demo app image
  - name: gcr.io/cloud-builders/docker
    args:
      - build
      - -t
      - gcr.io/$PROJECT_ID/node-app:$COMMIT_SHA
      - .
  # Push demo app image to GCR
  - name: gcr.io/cloud-builders/docker
    args:
      - push
      - gcr.io/$PROJECT_ID/node-app:$COMMIT_SHA

Next we’ll leverage the (previously built) Helm client to interact with our GKE cluster and to deploy the Helm chart (for our node app), with the image repository pointing to the GCR path from the last pipeline stage.

  # Deploy with Helm Chart
  - name: gcr.io/$PROJECT_ID/helm
    args:
      - upgrade
      - -i
      - node-app
      - ./k8s-helm/node-app
      - --set
      - image.repository=gcr.io/$PROJECT_ID/node-app,image.tag=$COMMIT_SHA
      - -f
      - ./k8s-helm/node-app/values.yaml
    env:
      - CLOUDSDK_COMPUTE_REGION=$_CUSTOM_REGION
      - CLOUDSDK_CONTAINER_CLUSTER=$_CUSTOM_CLUSTER
      - KUBECONFIG=/workspace/.kube/config
      - TILLERLESS=false
      - TILLER_NAMESPACE=kube-system

Lastly, we’ll run an integration test to verify the demo app status on our GKE cluster. Our node app has a built-in health-check URL configured at “/health”, and we’ll use another Cloud Builder image (curl) to call this URL path, expecting a return message of <“status”: “ok”>. Note: here we poll the internal DNS address of the k8s service (of the demo app), so there is no dependency on IP allocations.

  # Integration Testing
  - name: gcr.io/cloud-builders/kubectl
    entrypoint: 'bash'
    args:
      - '-c'
      - |
        kubectl delete --wait=true pod curl
        kubectl run curl --restart=Never --image=gcr.io/cloud-builders/curl --generator=run-pod/v1 -- http://node-app.default.svc.cluster.local/health
        sleep 15
        kubectl logs curl 
        kubectl logs curl | grep OK
    env:
      - CLOUDSDK_COMPUTE_REGION=$_CUSTOM_REGION
      - CLOUDSDK_CONTAINER_CLUSTER=$_CUSTOM_CLUSTER
      - KUBECONFIG=/workspace/.kube/config
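The test step above waits a fixed 15 seconds before reading the pod logs. If that turns out to be flaky on a slow cluster, one alternative is a small retry helper that polls until the check passes. This is only a sketch and not part of the original pipeline: the retry function and the check_health wrapper shown in the comment are hypothetical names.

```shell
#!/bin/sh
# Run a command until it succeeds, up to max_attempts times,
# sleeping delay seconds between attempts.
# $1 = max attempts, $2 = delay in seconds, remaining args = command
retry() {
  max_attempts=$1
  delay=$2
  shift 2
  attempt=1
  while ! "$@"; do
    if [ "$attempt" -ge "$max_attempts" ]; then
      return 1
    fi
    attempt=$((attempt + 1))
    sleep "$delay"
  done
  return 0
}

# Hypothetical usage inside the integration-test step:
#   check_health() { kubectl logs curl | grep -q OK; }
#   retry 10 3 check_health
```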

Step-5: Create a Cloud Build Trigger by Connecting to GitHub Repository

Now that we have our GKE cluster ready and the Helm image pushed to GCR, the next step is to connect Cloud Build to the GitHub repository and create a CI trigger. On GCP console, go to Cloud Build -> Triggers and select the GitHub repo as below.

If this is the first time you are connecting to GitHub in Cloud Build, it will redirect you to an authorization page like below. Accept it in order to access your repositories.

Select the demo app repository, which also includes the pipeline config (cloudbuild.yaml) file.

Create a push trigger on the next page and you should see a summary like this.

You can manually run the trigger now to kick off the CI build process. However, we’ll run more thorough testing to verify the end-to-end pipeline automation in the next section.

Step-6: Test the CI/CD Pipeline

It’s time to test our CI/CD pipeline! First we’ll make a “cosmetic” version change (1.0.0 to 1.0.1) to the Helm chart for our demo app.

Commit the change and push to the Git repository.

This (push event) should have triggered our Cloud Build pipeline. You can jump on the GCP console to monitor the fully automated 4-stage process. The pipeline will be completed once the integration test has returned a status of OK.

On the GKE cluster we can see our Helm chart v-1.0.1 has been deployed successfully.

The deployment and node app are running as expected.

Retrieve the Ingress public IP and update the local hosts file for quick testing. (Note the Ingress host is defined as “node-app.local”)

[root@cloud-ops01 nodejs-cloudbuild-demo]# kubectl get ingresses 
NAME       HOSTS            ADDRESS         PORTS   AGE
node-app   node-app.local   34.87.213.107   80      15m
[root@cloud-ops01 nodejs-cloudbuild-demo]# 
[root@cloud-ops01 nodejs-cloudbuild-demo]# echo "34.87.213.107  node-app.local" >> /etc/hosts   
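Running the echo command more than once leaves duplicate lines in /etc/hosts. A slightly safer variant appends the entry only when the hostname is not mapped yet. The helper name add_host_entry is hypothetical, and the example below writes to a temporary file rather than the real /etc/hosts (which needs root).

```shell
#!/bin/sh
# Append "ip  hostname" to a hosts file only if no non-comment line
# already maps that hostname.
# $1 = hosts file, $2 = IP address, $3 = hostname
add_host_entry() {
  hosts_file=$1
  ip=$2
  host=$3
  if awk -v h="$host" '$1 !~ /^#/ { for (i = 2; i <= NF; i++) if ($i == h) found = 1 } END { exit !found }' "$hosts_file" 2>/dev/null; then
    return 0
  fi
  printf '%s  %s\n' "$ip" "$host" >> "$hosts_file"
}

# Demo against a temporary file; the second call is a no-op.
demo=$(mktemp)
add_host_entry "$demo" 34.87.213.107 node-app.local
add_host_entry "$demo" 34.87.213.107 node-app.local
cat "$demo"
rm -f "$demo"
```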

Now point your browser to “node-app.local” and you should see the demo app page like below. Congrats, you have just successfully deployed a cloud native CI/CD pipeline on GCP!

Cloud Native DevOps on GCP Series Ep1 – Build a GKE Cluster with Terraform

This is the first episode of our Cloud Native DevOps on GCP series. Here we’ll be building a Google Kubernetes Engine (GKE) cluster using Terraform. From my personal experience, GKE has been one of the most scalable and reliable managed Kubernetes solutions, and it’s also 100% upstream compliant and certified by CNCF.

For this demo I have provided a sample Terraform script here. The target state will look like this:

Specifically, we’ll be launching the following GCP/GKE resources:

  • 1x new VPC for hosting the demo GKE cluster
  • 1x /17 CIDR block as the primary address space for the VPC
  • 2x /18 CIDR blocks for the GKE Pod and Service address spaces
  • 1x GKE high availability cluster across 2x Availability Zones (AZs)
  • 2x GKE worker instance groups (2x nodes each)
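For a feel of how much address space those prefix lengths give us, the number of IPv4 addresses in a block is 2^(32 - prefix). A quick sketch of the arithmetic (the helper name cidr_size is just for illustration):

```shell
#!/bin/sh
# Number of IPv4 addresses in a CIDR block with the given prefix length.
cidr_size() {
  echo $(( 1 << (32 - $1) ))
}

echo "/17 -> $(cidr_size 17) addresses"
echo "/18 -> $(cidr_size 18) addresses"
```

So the /17 primary range holds 32768 addresses, and each /18 range for Pods and Services holds 16384.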

PREREQUISITES

  • Access to a GCP testing environment
  • Install Git, Kubectl and Terraform on your client
  • Install GCloud SDK
  • Check the NTP clock & sync status on your client -> important!
  • Clone the Terraform Repo here

Step-1: Set up the GCloud Environment and Run the Terraform Script

To begin, run the interactive GCloud commands below to prepare the GCP environment

gcloud init  
gcloud config set accessibility/screen_reader true  
gcloud auth application-default login  

Remember to update the terraform.tfvars with your own GCP project_id

project_id = "xxxxxxxx"

Make sure to enable the GKE API if it is not already enabled

gcloud services enable container.googleapis.com

Now run the Terraform script:

terraform init
terraform apply

The whole process should take about 7 to 10 mins, and you should get an output like this:

Now register the cluster and update the kubeconfig file

[root@cloud-ops01 tf-gcp-gke]# gcloud container clusters get-credentials node-pool-cluster-demo --region australia-southeast1
Fetching cluster endpoint and auth data.
kubeconfig entry generated for node-pool-cluster-demo.

Step-2: Verify the GKE Cluster Status

Check that we can access the GKE cluster and there should be 4x worker nodes provisioned.

[root@cloud-ops01 ~]# kubectl get nodes
NAME                                               STATUS   ROLES    AGE     VERSION
gke-node-pool-cluster-demo-pool-01-03a2c598-34lh   Ready    <none>   8m59s   v1.16.9-gke.2
gke-node-pool-cluster-demo-pool-01-03a2c598-tpwq   Ready    <none>   9m      v1.16.9-gke.2
gke-node-pool-cluster-demo-pool-01-e903c7a8-04cf   Ready    <none>   9m5s    v1.16.9-gke.2
gke-node-pool-cluster-demo-pool-01-e903c7a8-0lt8   Ready    <none>   9m5s    v1.16.9-gke.2
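If you want to script this check rather than eyeball it, the STATUS column can be counted with awk. The sketch below feeds in the sample output from above; against a live cluster you would pipe kubectl get nodes into the same function. The helper name count_ready_nodes is hypothetical.

```shell
#!/bin/sh
# Count nodes whose STATUS column is exactly "Ready", reading
# `kubectl get nodes` output from stdin (the header line is skipped).
count_ready_nodes() {
  awk 'NR > 1 && $2 == "Ready" { n++ } END { print n + 0 }'
}

# Live usage would be: kubectl get nodes | count_ready_nodes
count_ready_nodes <<'EOF'
NAME                                               STATUS   ROLES    AGE     VERSION
gke-node-pool-cluster-demo-pool-01-03a2c598-34lh   Ready    <none>   8m59s   v1.16.9-gke.2
gke-node-pool-cluster-demo-pool-01-03a2c598-tpwq   Ready    <none>   9m      v1.16.9-gke.2
gke-node-pool-cluster-demo-pool-01-e903c7a8-04cf   Ready    <none>   9m5s    v1.16.9-gke.2
gke-node-pool-cluster-demo-pool-01-e903c7a8-0lt8   Ready    <none>   9m5s    v1.16.9-gke.2
EOF
```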

This can also be verified on the GKE console

The 4x worker nodes are provisioned over 2x managed instance groups across two different AZs

Run kubectl describe nodes and we can see each node has been tagged with a few customised labels based on its unique properties. These are important metadata which can be used for selective Pod/Node deployment and other use cases like affinity or anti-affinity rules.

Step-3: Deploy GKE Add-on Services

  • Install Metrics-Server to provide cluster-wide resource metrics collection and to support use cases such as Horizontal Pod Autoscaling (HPA)
[root@cloud-ops01 tf-gcp-gke]# kubectl apply -f https://github.com/kubernetes-sigs/metrics-server/releases/download/v0.3.6/components.yaml

Wait for a few seconds and we should have resource stats

[root@cloud-ops01 tf-gcp-gke]# kubectl top nodes
NAME                                               CPU(cores)   CPU%   MEMORY(bytes)   MEMORY%   
gke-node-pool-cluster-demo-pool-01-03a2c598-34lh   85m          4%     798Mi           14%       
gke-node-pool-cluster-demo-pool-01-03a2c598-tpwq   300m         15%    816Mi           14%       
gke-node-pool-cluster-demo-pool-01-e903c7a8-04cf   191m         9%     958Mi           16%       
gke-node-pool-cluster-demo-pool-01-e903c7a8-0lt8   102m         5%     795Mi           14%    
  • Next, deploy an NGINX Ingress Controller so we can use L7 URL load balancing and save cost by reducing the number of external load balancers required
[root@cloud-ops01 tf-gcp-gke]# kubectl apply -f https://raw.githubusercontent.com/kubernetes/ingress-nginx/controller-0.32.0/deploy/static/provider/cloud/deploy.yaml  

On GCP console we can see that an external Load Balancer has been provisioned in front of the Ingress Controller. Take note of the LB address below: this is the public IP that will be consumed by our ingress services.

In addition, we’ll deploy 2x storage classes to provide dynamic persistent storage support for stateful pods and services. Note the different persistent disk (PD) specs (standard & SSD) for different I/O requirements.

 [root@cloud-ops01 tf-gcp-gke]# kubectl create -f ./storage/storageclass/  

Step-4: Deploy Sample Apps onto the GKE Cluster for Testing

  • We’ll first deploy the famous Hipster Shop app, which is a cloud-native microservice application developed by Google.
kubectl apply -f https://raw.githubusercontent.com/GoogleCloudPlatform/microservices-demo/master/release/kubernetes-manifests.yaml  

Wait for all the Pods to be up and running

[root@cloud-ops01 tf-gcp-gke]# kubectl get pods 
NAME                                     READY   STATUS    RESTARTS   AGE
adservice-687b58699c-fq9x4               1/1     Running   0          2m16s
cartservice-778cffc8f6-dnxmr             1/1     Running   0          2m20s
checkoutservice-98cf4f4c-69fqg           1/1     Running   0          2m26s
currencyservice-c69c86b7c-mz5zv          1/1     Running   0          2m19s
emailservice-5db6c8b59f-jftv7            1/1     Running   0          2m27s
frontend-8d8958c77-s9665                 1/1     Running   0          2m24s
loadgenerator-6bf9fd5bc9-5lsrn           1/1     Running   3          2m19s
paymentservice-698f684cf9-7xbjc          1/1     Running   0          2m22s
productcatalogservice-789c77b8dc-4tk4w   1/1     Running   0          2m21s
recommendationservice-75d7cd8d5c-4x9kl   1/1     Running   0          2m25s
redis-cart-5f59546cdd-8tj8f              1/1     Running   0          2m17s
shippingservice-7d87945947-nhb5x         1/1     Running   0          2m18s

Check the external frontend service; you should see that an LB has been deployed by GKE with a public IP assigned

[root@cloud-ops01 ~]# kubectl get svc frontend-external 
NAME                TYPE           CLUSTER-IP      EXTERNAL-IP     PORT(S)        AGE
frontend-external   LoadBalancer   192.168.74.68   35.197.182.62   80:32408/TCP   5m32s

You should be able to access the app via the LB public IP.

  • Next, we’ll deploy the sample Guestbook app to verify the persistent storage setup.
[root@cloud-ops01 tf-gcp-gke]# kubectl create ns guestbook-app  
[root@cloud-ops01 tf-gcp-gke]# kubectl apply -f ./demo-apps/guestbook/  

The application requests 2x persistent volumes (PV) for the redis-master and redis-slave pods. Both PVs should be automatically provisioned by the persistent volume claims (PVC) with the 2x different storage classes as we deployed earlier. You should see the STATUS reported as “Bound” between each PV and PVC mapping.

Retrieve the external IP/DNS for the frontend service of the Guestbook app.

[root@cloud-ops01 tf-gcp-gke]# kubectl get svc frontend -n guestbook-app 
NAME       TYPE           CLUSTER-IP        EXTERNAL-IP    PORT(S)        AGE
frontend   LoadBalancer   192.168.127.128   34.87.228.35   80:31006/TCP   23m

You should be able to access the Guestbook app now. Enter and submit some messages, then try destroying and redeploying the app: your data will be kept by the redis PVs.

  • Lastly, we’ll deploy a modified version of the yelb app to test the NGINX ingress controller
[root@cloud-ops01 tf-gcp-gke]# kubectl create ns yelb  
[root@cloud-ops01 tf-gcp-gke]# kubectl apply -f ./demo-apps/yelb/

You should see an ingress service deployed as per below.

Retrieve the external IP for the ingress service within the yelb namespace. As mentioned before, this should be the same address as the external LB deployed for the ingress controller.

[root@cloud-ops01 tf-gcp-gke]# kubectl get ingresses -n yelb 
NAME           HOSTS        ADDRESS       PORTS   AGE
yelb-ingress   yelb.local   35.189.3.12   80      6m47s

Also, notice the ingress host is defined as “yelb.local”. This is the hostname that the HTTP ingress rule matches when routing requests, so we’ll add it to the local hosts file (with the ingress public IP) for a quick test.

[root@cloud-ops01 tf-gcp-gke]# echo "35.189.3.12  yelb.local" >> /etc/hosts

And that’s it: incoming requests to “yelb.local” are now routed via the ingress service to the yelb frontend pod running on our GKE cluster.
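One caveat with appending to the hosts file using echo is that every re-run adds another duplicate line. Below is a small sketch of an idempotent variant. The function name add_host_entry and its optional third argument are my own assumptions for illustration, and it needs the same root access as the echo command.

```shell
# add_host_entry IP HOSTNAME [FILE]   (hypothetical helper)
# Appends "IP  HOSTNAME" to FILE (default /etc/hosts) unless a non-comment
# line in FILE already lists HOSTNAME. An existing mapping is left untouched,
# even if it points to a different IP.
add_host_entry() {
  local ip="$1" host="$2" file="${3:-/etc/hosts}"
  # Look for the hostname as a whole field on any non-comment line.
  if awk -v h="$host" '
       /^[[:space:]]*#/ { next }
       { for (i = 2; i <= NF; i++) if ($i == h) found = 1 }
       END { exit !found }' "$file"; then
    echo "$host is already mapped in $file, leaving it unchanged" >&2
  else
    printf '%s  %s\n' "$ip" "$host" >> "$file"
  fi
}

# Usage:
#   add_host_entry 35.189.3.12 yelb.local
```

Remember to remove the entry once testing is done, since the LB address will change if the ingress controller is redeployed.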

Provision an AWS EKS Cluster with Terraform

In this post we’ll provision an AWS Elastic Kubernetes Service (EKS) cluster using Terraform. EKS is an upstream-compliant Kubernetes solution that is fully managed by AWS.

I have provided a sample Terraform script here. It will build a multi-AZ EKS cluster that looks like this:

Specifically, we’ll be launching the following AWS resources:

  • 1x new VPC for hosting the EKS cluster
  • 3x private subnets (across 3x different AZ) for the EKS worker nodes
  • 3x public subnets for hosting ELBs (mapped to EKS external Load Balancer services)
  • 1x NAT Gateway for Internet access and publishing external services
  • 2x Auto-Scaling Groups for 2x EKS worker groups, with different EC2 instance sizes (each ASG is set to a desired capacity of 2x, so we’ll get a total of 4x worker nodes)
  • 2x Security Groups attached to the 2x ASGs for management access

PREREQUISITES

  • Access to an AWS testing environment
  • Install Git, Terraform & Kubectl on your client
  • Install AWS toolkits including AWS CLI, AWS-IAM-Authenticator
  • Check the NTP clock & sync status on your client —> important!
  • Clone the Terraform Repo
git clone https://github.com/sc13912/tf-aws-eks.git

Step-1: Set the AWS Environment Variables and run the Terraform script

[root@cloud-ops01 tf-aws-eks]# aws configure
AWS Access Key ID [*****]: 
AWS Secret Access Key [***]: 
Default region name [us-east-1]: 
Default output format [json]:
[root@cloud-ops01 tf-aws-eks]# terraform init
[root@cloud-ops01 tf-aws-eks]# terraform apply

The process will take about 10 to 15 minutes, and your Terraform output should look like this:

Register the cluster and update the kubeconfig file with the correct cluster name.

[root@cloud-ops01 tf-aws-eks]# aws eks --region us-east-1 update-kubeconfig --name demo-eks-zUqzVyxb
Added new context arn:aws:eks:us-east-1:979459205431:cluster/demo-eks-zUqzVyxb to /root/.kube/config

Step-2: Verify the EKS Cluster status

Verify we can access the EKS cluster and the 4x worker nodes that have just been created.

[root@cloud-ops01 tf-aws-eks]# kubectl get nodes
NAME                         STATUS   ROLES    AGE   VERSION
ip-10-0-1-113.ec2.internal   Ready    <none>   43m   v1.16.8-eks-e16311
ip-10-0-1-40.ec2.internal    Ready    <none>   43m   v1.16.8-eks-e16311
ip-10-0-2-26.ec2.internal    Ready    <none>   43m   v1.16.8-eks-e16311
ip-10-0-3-23.ec2.internal    Ready    <none>   43m   v1.16.8-eks-e16311

Run kubectl describe nodes and we can see each node has been tagged with a few customised labels based on its unique properties. This metadata is important because it can be used for selective Pod/Node scheduling and other use cases like affinity or anti-affinity rules.

Now log into the AWS console and navigate to EC2 > Auto Scaling > Auto Scaling Groups. You’ll find the two ASGs that have been provisioned by Terraform.

Now check the EC2 instances. We should have 2+2 worker nodes with different instance sizes, and they should be spread across all 3x AZs.
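The AZ spread can also be checked from the command line instead of the console. The sketch below counts nodes per zone from the output of kubectl get nodes with a zone label column added. The helper name nodes_per_zone is made up for this example, and I’m assuming the failure-domain.beta.kubernetes.io/zone label that v1.16 nodes carry (newer clusters use topology.kubernetes.io/zone instead).

```shell
# nodes_per_zone (hypothetical helper)
# Reads `kubectl get nodes -L <zone-label>` output on stdin, where the zone is
# the last column, and prints "<zone> <node-count>" per zone, sorted by zone.
# Assumes every node carries the label; an unlabelled node would be counted
# under whatever value sits in its last column.
nodes_per_zone() {
  awk 'NR > 1 && NF > 0 { count[$NF]++ }
       END { for (z in count) print z, count[z] }' | sort
}

# Usage:
#   kubectl get nodes -L failure-domain.beta.kubernetes.io/zone | nodes_per_zone
```

With 4x nodes and 3x AZs, one zone will necessarily hold two nodes.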

Step-3: Deploy Kubernetes Add-on Services

  • Install Metrics-Server to provide cluster-wide resource metrics collection and to support use cases such as Horizontal Pod Autoscaling (HPA)
[root@cloud-ops01 tf-aws-eks]# kubectl apply -f https://github.com/kubernetes-sigs/metrics-server/releases/download/v0.3.6/components.yaml 

Wait a few seconds, then verify that resource stats are now available.

[root@cloud-ops01 tf-aws-eks]# kubectl top nodes
NAME                         CPU(cores)   CPU%   MEMORY(bytes)   MEMORY%   
ip-10-0-1-113.ec2.internal   88m          9%     417Mi           27%       
ip-10-0-1-40.ec2.internal    126m         6%     600Mi           17%       
ip-10-0-2-26.ec2.internal    360m         18%    760Mi           22%       
ip-10-0-3-23.ec2.internal    84m          8%     454Mi           30%       
  • Next, deploy an NGINX Ingress Controller so we can use L7 URL load balancing.
[root@cloud-ops01 tf-aws-eks]# kubectl apply -f https://raw.githubusercontent.com/kubernetes/ingress-nginx/controller-0.32.0/deploy/static/provider/aws/deploy.yaml

Verify the Ingress pods and services are running.

[root@cloud-ops01 tf-aws-eks]# kubectl get pods -n ingress-nginx 
NAME                                        READY   STATUS      RESTARTS   AGE
ingress-nginx-admission-create-2fvlb        0/1     Completed   0          103s
ingress-nginx-admission-patch-4tvnk         0/1     Completed   0          102s
ingress-nginx-controller-5cc4589cc8-7fr64   1/1     Running     0          117s
[root@cloud-ops01 tf-aws-eks]# 
[root@cloud-ops01 tf-aws-eks]# kubectl get svc -n ingress-nginx  
NAME                                 TYPE           CLUSTER-IP       EXTERNAL-IP                                                                     PORT(S)                      AGE
ingress-nginx-controller             LoadBalancer   172.20.114.166   aaa1d4619924247688fc4eeb4f85cd48-76f9a6f87fe42022.elb.us-east-1.amazonaws.com   80:31060/TCP,443:31431/TCP   2m2s
ingress-nginx-controller-admission   ClusterIP      172.20.3.211     <none>  
  • In addition, we’ll deploy some storage classes (with different I/O specifications) to provide the dynamic persistent storage required for stateful pods and services.
[root@cloud-ops01 tf-aws-eks]# kubectl apply -f ./storage/storageclass/  
storageclass.storage.k8s.io/fast-50 created
storageclass.storage.k8s.io/standard created
  • Optionally, we can deploy the Kubernetes dashboard for some basic UI visibility.
[root@cloud-ops01 tf-aws-eks]# kubectl apply -f https://raw.githubusercontent.com/kubernetes/dashboard/master/aio/deploy/recommended.yaml  
[root@cloud-ops01 tf-aws-eks]# kubectl apply -f ./kube-dashboard/  

Retrieve the dashboard token.

[root@cloud-ops01 tf-aws-eks]# SA_NAME=admin-user  
[root@cloud-ops01 tf-aws-eks]# kubectl -n kube-system describe secret $(kubectl -n kube-system get secret | grep ${SA_NAME} | awk '{print $1}')  
... 
token:      eyJhbGciOiJSUzI1NiIsImtpZCI6IlFsUzNqaW9iNFVsXy1BNlppdk9YZVVDZkFxMTJqeGMtSlA0LXN5QjZDdkkifQ.eyJpc3MiOiJrdWJlcm5ldGVzL3NlcnZpY2VhY2NvdW50Iiwia3ViZXJuZXRlcy5pby9zZXJ2aWNlYWNjb3VudC9uYW1lc3BhY2UiOiJrdWJlLXN5c3RlbSIsImt1YmVybmV0ZXMuaW8vc2VydmljZWFjY291bnQvc2VjcmV0Lm5hbWUiOiJhZG1pbi11c2VyLXRva2VuLTliODdiIiwia3ViZXJuZXRlcy5pby9zZXJ2aWNlYWNjb3VudC9zZXJ2aWNlLWFjY291bnQubmFtZSI6ImFkbWluLXVzZXIiLCJrdWJlcm5ldGVzLmlvL3NlcnZpY2VhY2NvdW50L3NlcnZpY2UtYWNjb3VudC51aWQiOiJjNjkwZTk5Zi0zM2ViLTRlZjctYTA2Ny03MDVjMTE3ODI1NjUiLCJzdWIiOiJzeXN0ZW06c2VydmljZWFjY291bnQ6a3ViZS1zeXN0ZW06YWRtaW4tdXNlciJ9.h1a_8pySJ7hebSci-mP8tPXmCY0vCQOCKzeDKICDMEE4Qlt-FGSwoBMEzTLLcA2-MUtDjkzbjJlFPZMl2EsiaxPbP63_yn_0l4hZqMdM4nKjvrtVCXUvY9fJOREj3lNvG4Uy1QiyU3pgKbUKdFpvSYPVPGmqq_hFTc5U9KXwk_bBgIIJr9S2a8_yIvchMtTrsxdh3O1P-AeP5Bd5FZJSG9QeI2z1guD8ewWOa2W4Z5E4wKZ10yVVslhh_OcQgQ2eBvtDD6_mrDwSs1tQUbY83jbHR7yYOTYmz-v2EnLWb3cUbO8u3EHL_qWjRTPcMTuH9RLZwTf7CLH6RYoEVlUvLw
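If you only want the token value (for example to copy it to the clipboard), the describe output can be filtered. This is a minimal sketch; secret_token is a hypothetical helper name, and it assumes the token: line format shown above.

```shell
# secret_token (hypothetical helper)
# Reads `kubectl describe secret` output on stdin and prints only the value
# of the first "token:" line.
secret_token() {
  awk '$1 == "token:" { print $2; exit }'
}

# Usage, reusing the command from above:
#   kubectl -n kube-system describe secret \
#     $(kubectl -n kube-system get secret | grep ${SA_NAME} | awk '{print $1}') | secret_token
```

Treat the token like a password: it grants whatever access the admin-user service account has on the cluster.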

Get the dashboard LB service address.

[root@cloud-ops01 tf-aws-eks]# kubectl get svc kubernetes-dashboard  -n kubernetes-dashboard  
NAME                   TYPE           CLUSTER-IP      EXTERNAL-IP                                                              PORT(S)         AGE
kubernetes-dashboard   LoadBalancer   172.20.62.179   aa2e50aa1703d4163a87c1dbe5bab77a-723142651.us-east-1.elb.amazonaws.com   443:30822/TCP   6m46s

Open the URL in a browser, paste the token to authenticate, and you should land on a dashboard page like this:

Step-4: Deploy sample apps on the EKS cluster for testing

  • Firstly, deploy the provided sample Guestbook app to verify the persistent storage setup.
[root@cloud-ops01 tf-aws-eks]# kubectl create ns guestbook-app  
[root@cloud-ops01 tf-aws-eks]# kubectl apply -f ./demo-apps/guestbook/    

The application requests 2x persistent volumes (PV) for the redis-master and redis-slave pods. Both PVs should be provisioned automatically through their persistent volume claims (PVC), each using one of the 2x storage classes we deployed earlier. Each PVC should report a STATUS of “Bound” once it has been matched to its PV.

[root@cloud-ops01 tf-aws-eks]# kubectl get pvc -n guestbook-app 
NAME                 STATUS   VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS   AGE
redis-master-claim   Bound    pvc-ad5310a6-249f-4526-9ed6-0596b70fa171   2Gi        RWO            standard       38m
redis-slave-claim    Bound    pvc-a3e97098-600a-4ede-bc4a-e9235602d42c   4Gi        RWO            fast-50        38m

Again, retrieve the external IP/DNS of the frontend service for the Guestbook app.

[root@cloud-ops01 storageclass]# kubectl get svc -n guestbook-app 
NAME           TYPE           CLUSTER-IP       EXTERNAL-IP                                                               PORT(S)        AGE
frontend       LoadBalancer   172.20.19.131    a9e282a1efb6b4f97a288e183c68ac82-2013066277.us-east-1.elb.amazonaws.com   80:32578/TCP   45m
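For scripting, the address can be pulled out directly rather than copied from the table. The following is a sketch using kubectl’s jsonpath output; lb_address is a helper name I made up. On AWS the load balancer is reported as a hostname, while other platforms such as GKE report an IP, so the template prints whichever field is populated.

```shell
# lb_address NAMESPACE SERVICE   (hypothetical helper)
# Prints the external hostname or IP of a LoadBalancer service. It only wraps
# `kubectl get svc`; the field is not populated until the load balancer has
# finished provisioning.
lb_address() {
  kubectl get svc "$2" -n "$1" \
    -o jsonpath='{.status.loadBalancer.ingress[0].hostname}{.status.loadBalancer.ingress[0].ip}'
}

# Usage:
#   curl -I "http://$(lb_address guestbook-app frontend)/"
```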

You should be able to access the Guestbook app now. Enter and submit some messages, then destroy and redeploy the app. Your data will be kept by the redis PVs.

  • Next, we’ll deploy a modified version of the yelb app to test the NGINX ingress controller
[root@cloud-ops01 tf-aws-eks]# kubectl create ns yelb  
[root@cloud-ops01 tf-aws-eks]# kubectl apply -f ./demo-apps/yelb/  

Retrieve the external DNS address for the ingress service within the yelb namespace. Notice the ingress host is defined as “yelb.local”. Next, we’ll need to get the public IP of the ingress service and then update the local hosts file for a quick test.

[root@cloud-ops01 tf-aws-eks]# kubectl get ingresses -n yelb
NAME           HOSTS        ADDRESS                                                                         PORTS   AGE
yelb-ingress   yelb.local   a8821ed5391434981a35cd6599ed7671-a0d9702f226e21d8.elb.us-east-1.amazonaws.com   80      63m

Run nslookup against the ingress DNS name to get its public IP, then update the local hosts file.

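[root@cloud-ops01 tf-aws-eks]# nslookup a8821ed5391434981a35cd6599ed7671-a0d9702f226e21d8.elb.us-east-1.amazonaws.com
...
Non-authoritative answer: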
Name:   a8821ed5391434981a35cd6599ed7671-a0d9702f226e21d8.elb.us-east-1.amazonaws.com
Address: 54.175.25.189

[root@cloud-ops01 tf-aws-eks]# echo "54.175.25.189  yelb.local" >> /etc/hosts      
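The lookup step can be scripted as well. Here is a rough sketch that pulls the first resolved address out of nslookup output; first_answer_ip is a hypothetical helper name, and it relies on the “Non-authoritative answer:” marker seen above, so it prints nothing for an authoritative reply.

```shell
# first_answer_ip (hypothetical helper)
# Reads nslookup output on stdin and prints the first address listed after
# the "Non-authoritative answer:" marker. Addresses before the marker belong
# to the DNS server itself and are skipped.
first_answer_ip() {
  awk '/^Non-authoritative answer:/ { seen = 1; next }
       seen && $1 == "Address:" { print $2; exit }'
}

# Usage (ELB_DNS_NAME is a placeholder for the ADDRESS column shown above):
#   nslookup "$ELB_DNS_NAME" | first_answer_ip
```

Keep in mind that an AWS load balancer name can resolve to several addresses and these can change over time, so a hosts file entry like this is only suitable for a short-lived test.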

We should have access to the app now.