Recently I tried out the Terraform NSX-T provider and it worked like a charm. In this post, I will walk through a simple example of how to leverage Terraform to provision a basic NSX tenant network environment, which includes the following (see the HCL sketch after the list):
create a Tier-1 router
create (linked) routed ports on the new T1 router and the existing upstream T0 router
link the T1 router to the upstream T0 router
create three logical switches with three logical ports
create three downlink LIFs (with subnets/gateway defined) on the T1 router, and link each of them to the logical switch ports accordingly
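For reference, the core of the script maps onto a handful of NSX-T provider (manager-mode) resources. Below is a trimmed-down sketch rather than the full script from the repo; the display names, transport zone, edge cluster and LIF subnets are placeholders from my lab that you would swap for your own values.

# Look up existing objects: overlay transport zone, edge cluster and the upstream T0
data "nsxt_transport_zone" "overlay_tz" {
  display_name = "TZ-Overlay"
}

data "nsxt_edge_cluster" "edge_cluster" {
  display_name = "Edge-Cluster-01"
}

data "nsxt_logical_tier0_router" "t0" {
  display_name = "T0-Router"
}

# Tier-1 router, attached to the edge cluster and advertising its connected routes
resource "nsxt_logical_tier1_router" "t1" {
  display_name                = "T1-Tenant-Demo"
  edge_cluster_id             = data.nsxt_edge_cluster.edge_cluster.id
  enable_router_advertisement = true
  advertise_connected_routes  = true
}

# Linked routed ports on the T0 and T1, which form the T0 <-> T1 link
resource "nsxt_logical_router_link_port_on_tier0" "t0_to_t1" {
  display_name      = "Port-to-T1-Tenant-Demo"
  logical_router_id = data.nsxt_logical_tier0_router.t0.id
}

resource "nsxt_logical_router_link_port_on_tier1" "t1_to_t0" {
  display_name                  = "Port-to-T0"
  logical_router_id             = nsxt_logical_tier1_router.t1.id
  linked_logical_router_port_id = nsxt_logical_router_link_port_on_tier0.t0_to_t1.id
}

# Three logical switches, each with a logical port and a matching T1 downlink LIF
locals {
  lif_gateways = ["10.1.1.1/24", "10.1.2.1/24", "10.1.3.1/24"]
}

resource "nsxt_logical_switch" "ls" {
  count             = 3
  display_name      = "LS-Tenant-${count.index + 1}"
  transport_zone_id = data.nsxt_transport_zone.overlay_tz.id
  admin_state       = "UP"
}

resource "nsxt_logical_port" "lp" {
  count             = 3
  display_name      = "LP-Tenant-${count.index + 1}"
  admin_state       = "UP"
  logical_switch_id = nsxt_logical_switch.ls[count.index].id
}

resource "nsxt_logical_router_downlink_port" "lif" {
  count                         = 3
  display_name                  = "LIF-Tenant-${count.index + 1}"
  logical_router_id             = nsxt_logical_tier1_router.t1.id
  linked_logical_switch_port_id = nsxt_logical_port.lp[count.index].id
  ip_address                    = local.lif_gateways[count.index]
}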
Once the tenant environment is provisioned by Terraform, the 3x tenant subnets will be automatically published to the T0 router and propagated to the rest of the network (if BGP is enabled), and we should be able to reach the individual LIF addresses. Below is a sample topology deployed in my lab (here I'm using pre-provisioned static routes between the T0 and the upstream network for simplicity).
Software Versions Used & Verified
Terraform – v0.12.25
NSX-T Provider – v3.0.1 (auto downloaded by Terraform)
NSX-T Data Center – v3.0.2 (build 0.0.16887200)
Sample Terraform Script
You can find the sample Terraform script at my Git repo here — remember to update the variables based on your own environment.
Run the Terraform script and this should take less than a minute to complete.
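If you haven't used Terraform before, the standard workflow is all that's needed (the NSX-T provider plugin is pulled down automatically during init):

terraform init      # downloads the NSX-T provider plugin
terraform plan      # preview the NSX objects to be created
terraform apply     # confirm with "yes" to provision the tenant environment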
We can review and verify that the required NSX components were built successfully via the NSX Manager UI. Note: you'll need to switch to "Manager mode" to see the newly created elements (T1 router, logical switches, etc.), as Terraform was interacting with the NSX management plane (via the MP-API) directly.
In addition, we can check and confirm that the 3x tenant subnets are published from T1 to T0 by SSHing into the active edge node. Make sure you connect to the correct VRF for the T0 service router (SR) in order to see its full route table; here we can see the 3x /24 subnets are indeed advertised from T1 to T0 as Tier-1 connected (t1c) routes.
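From memory, the relevant edge node CLI commands look roughly like this (the VRF number and prompts will differ in your environment):

get logical-routers      # note the VRF number of the Tier-0 service router (SR)
vrf <vrf-number>         # switch into the T0 SR VRF context
get route                # the three /24 tenant subnets should appear with the t1c flag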
As expected, I can reach each of the three LIFs on the T1 router from the lab terminal VM.
This blog provides an example of deploying a CI/CD pipeline on AWS utilising the serverless container platform Fargate and the fully managed CodePipeline service. We'll also use Terraform to automate building the entire AWS environment, as shown in the diagram below.
Specifically, we’ll be creating the following AWS resources:
1x demo VPC including public/private subnets, NAT gateway and security groups etc
1x ALB for providing LB services to a target group of 2x Fargate container tasks
1x ECS cluster with a Fargate service definition (running our demo app)
1x CodePipeline definition, which builds the demo app from GitHub Repo (with a webhook trigger) and deploys it to the same Fargate service
1x ECR repository for hosting pipeline build images
2x S3 Buckets as build & artifact cache
References – for this demo, I’m using these Terraform modules found on GitHub:
Clone or fork the demo app (including the CodePipeline buildspec) here.
Step-1: Review the Terraform Script
Let's take a closer look at the Terraform code. I'll skip the VPC and ALB sections and focus on the ECS/Fargate service and the CodePipeline definition.
This section creates an ECS cluster with the Fargate service definition. Note that I have used a Bitnami Node image purely as a placeholder for testing; it will be replaced automatically by our demo app via the CodePipeline execution.
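The repo builds this via community modules, so purely for illustration, a stripped-down equivalent using plain resources might look like the following; the container port, IAM role, security group, target group and VPC references are placeholders, not the exact names used in the repo.

resource "aws_ecs_cluster" "default" {
  name = "default"
}

resource "aws_ecs_task_definition" "app" {
  family                   = "demo-app"
  requires_compatibilities = ["FARGATE"]
  network_mode             = "awsvpc"
  cpu                      = 256
  memory                   = 512
  execution_role_arn       = aws_iam_role.ecs_task_execution.arn

  # Placeholder image for the initial deployment; CodePipeline later swaps in
  # the demo app image built and pushed to ECR
  container_definitions = jsonencode([{
    name         = "demo-app"
    image        = "bitnami/node:latest"
    essential    = true
    portMappings = [{ containerPort = 3000, protocol = "tcp" }]
  }])
}

resource "aws_ecs_service" "app" {
  name            = "demo-app"
  cluster         = aws_ecs_cluster.default.id
  task_definition = aws_ecs_task_definition.app.arn
  launch_type     = "FARGATE"
  desired_count   = 2    # 2x Fargate tasks, spread across two AZs behind the ALB

  network_configuration {
    subnets         = module.vpc.private_subnets
    security_groups = [aws_security_group.ecs_tasks.id]
  }

  load_balancer {
    target_group_arn = aws_lb_target_group.app.arn
    container_name   = "demo-app"
    container_port   = 3000
  }
}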
This section creates an ECR repository (for hosting the build image) and defines the pipeline, which builds the demo app from the GitHub repo, pushes the new image to ECR and deploys it to the same ECS cluster and Fargate service created above.
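Again as a rough sketch only (the repo's module hides most of this): the ECR repository plus a three-stage CodePipeline with a GitHub source, a CodeBuild build and an ECS deploy action. The IAM role, artifact bucket, GitHub variables and CodeBuild project referenced here are assumed to be defined elsewhere in the script.

resource "aws_ecr_repository" "app" {
  name = "demo-app"
}

resource "aws_codepipeline" "app" {
  name     = "demo-app-pipeline"
  role_arn = aws_iam_role.codepipeline.arn

  artifact_store {
    location = aws_s3_bucket.artifacts.bucket
    type     = "S3"
  }

  stage {
    name = "Source"
    action {
      name             = "Source"
      category         = "Source"
      owner            = "ThirdParty"
      provider         = "GitHub"
      version          = "1"
      output_artifacts = ["source"]
      configuration = {
        Owner      = var.github_owner
        Repo       = var.github_repo
        Branch     = "master"
        OAuthToken = var.github_token
      }
    }
  }

  stage {
    name = "Build"
    action {
      name             = "Build"
      category         = "Build"
      owner            = "AWS"
      provider         = "CodeBuild"
      version          = "1"
      input_artifacts  = ["source"]
      output_artifacts = ["build"]
      configuration = {
        ProjectName = aws_codebuild_project.build.name
      }
    }
  }

  stage {
    name = "Deploy"
    action {
      name            = "Deploy"
      category        = "Deploy"
      owner           = "AWS"
      provider        = "ECS"
      version         = "1"
      input_artifacts = ["build"]
      configuration = {
        ClusterName = aws_ecs_cluster.default.name
        ServiceName = aws_ecs_service.app.name
        FileName    = "imagedefinitions.json"
      }
    }
  }
}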
Step-2: Run the Terraform Script
The process will take about 5 minutes and you should see an output like this. Note the public URL of the ALB, which is providing LB services to the 2x Fargate container tasks.
Step-3: Review the Fargate Service
On the AWS Console, go to “Elastic Container Service (ECS) —> Cluster” and we can see an ECS cluster “default” has been created, with 1x Fargate service defined and 2x container tasks/pods running.
and here are the two running container tasks/pods:
Click any of the tasks to confirm it's running our demo app image deployed from the ECR repository.
Next, search for the AWS service "Developer Tools —> CodePipeline", and you'll see our pipeline has been deployed with its first successful execution.
Now search for "EC2 —> Load Balancer" and confirm that an ALB has been created; it should be deployed on two different subnets across two AZs.
This is because we are spreading the 2x ECS container tasks across two AZs for high availability.
Go to the ALB public DNS/URL and you should see the default page of our demo app running on AWS Fargate, cool!
Step-4: Test the Pipeline Run
It's testing time now! As discussed, the pipeline is synced to the GitHub repository and will be triggered by a push-to-master event. The actual build task is defined within buildspec.yaml, which contains a simple 3-stage process as per below. Note that the output of the build process includes a JSON artifact (imagedefinitions.json) containing the ECR path of the latest build image.
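For reference, a minimal buildspec along these lines would do the job; it's not the exact file from the repo, and the container name ("demo-app", which must match the ECS task's container name) and the REPOSITORY_URI environment variable are placeholders.

version: 0.2

phases:
  pre_build:
    commands:
      # Log in to ECR (AWS CLI v1 style) and derive an image tag from the short commit SHA;
      # REPOSITORY_URI is assumed to be set as an environment variable on the CodeBuild project
      - $(aws ecr get-login --no-include-email --region $AWS_DEFAULT_REGION)
      - IMAGE_TAG=$(echo $CODEBUILD_RESOLVED_SOURCE_VERSION | cut -c 1-7)
  build:
    commands:
      - docker build -t $REPOSITORY_URI:$IMAGE_TAG .
  post_build:
    commands:
      - docker push $REPOSITORY_URI:$IMAGE_TAG
      # Emit the artifact consumed by the ECS deploy action
      - printf '[{"name":"demo-app","imageUri":"%s"}]' $REPOSITORY_URI:$IMAGE_TAG > imagedefinitions.json

artifacts:
  files:
    - imagedefinitions.json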
To test the pipeline run, we’ll make a “cosmetic change” to the app revision (v1.0 —> v1.1)
Commit and push to master.
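Something along these lines (the file holding the version string is hypothetical and depends on the app layout):

git add views/index.html                      # wherever the app revision string lives
git commit -m "Bump demo app revision to v1.1"
git push origin master                        # the push-to-master webhook fires the pipeline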
As expected, this has triggered a new pipeline run
Soon you'll see two additional pods launching with a new revision number of "3". This is because, by default, Fargate implements a rolling-update deployment strategy with a minimum healthy percent of 100%, so it will not remove the previous container pods (revision 2) until the new ones are running and ready.
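If you ever need to tune that behaviour, the knobs live on the ECS service resource in Terraform; the values below are the defaults.

resource "aws_ecs_service" "app" {
  # ...arguments as shown earlier...

  # Keep 100% of the desired count healthy and allow up to 200% during a deployment,
  # so the revision 2 tasks are only drained once the revision 3 tasks are running
  deployment_minimum_healthy_percent = 100
  deployment_maximum_percent         = 200
}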
Once the v3 pods are running, we can see the v2 pods being terminated and de-registered from the service.
Eventually the v2 pods are removed and the Fargate service is now updated with revision 3, which consists of the new pods running our demo app “v1.1”.
In the CodePipeline history, verify the new build & deployment process have been completed successfully.
Also, verify the new image (tag “99cc610”) of the demo app is pushed to ECR as expected.
Go to the Fargate tasks (revision 3) again and verify the container pods are indeed running on the new image “99cc610”.
Refresh the ALB address to see the v1.1 app page loading — Magic!
This time, we will take a step further and go completely serverless by deploying the same Node app onto the Google Cloud Run platform. Cloud Run is built from the open source Knative project, a serverless framework based on the industry-proven Kubernetes architecture. Whilst Knative follows the same event-driven concept as other serverless solutions, it also offers great flexibility and multi-cloud portability at the container level.
For this demo, we will first launch a Cloud Run service with an initial image using the cloudrun-hello app provided by Google. We will also create a Cloud Build pipeline to automatically build and push our Node app to GCR, and then deploy it to the same Cloud Run service (as a new revision). As previously, the pipeline will be synced to the GitHub repository and automatically triggered by a Git push event.
Best of all, all GCP resources in this environment, including the Cloud Run service and the Cloud Build pipeline, will be provisioned via Terraform, as illustrated below.
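The two key resources in the Terraform script boil down to something like the sketch below; the service name, region variable and GitHub owner/repo variables are placeholders.

# Cloud Run service, seeded with Google's cloudrun-hello image
resource "google_cloud_run_service" "demo" {
  name     = "cloudrun-demo"
  location = var.region

  template {
    spec {
      containers {
        image = "gcr.io/cloudrun/hello"   # replaced by our Node app via the pipeline
      }
    }
  }
}

# Allow unauthenticated access to the public service URL
resource "google_cloud_run_service_iam_member" "public" {
  service  = google_cloud_run_service.demo.name
  location = google_cloud_run_service.demo.location
  role     = "roles/run.invoker"
  member   = "allUsers"
}

# Cloud Build trigger fired on a push to master of the connected GitHub repo
resource "google_cloudbuild_trigger" "demo" {
  name     = "cloudrun-demo-trigger"
  filename = "cloudbuild.yaml"

  github {
    owner = var.github_owner
    name  = var.github_repo
    push {
      branch = "^master$"
    }
  }
}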
Next, let’s connect Cloud Build to the demo app Git Repository. On GCP console, go to “Cloud Build —> Triggers —> Connect Repository” and then select “GitHub” as below. (You will be redirected to GitHub for authentication.)
Select the demo app repository which contains the sample NodeJs application.
On the next page, make sure to click "Skip for now" and we are done. We'll leave it to Terraform to create the trigger later.
Step-3: Run Terraform Script to Launch a Serverless CI/CD Pipeline
Before executing the script, make sure to update the variables (as defined in terraform.tfvars) to match your own GCP environment.
Since we are not provisioning any infrastructure resources (it's serverless!), the process should take less than 2~3 minutes. Take note of the URL provided in the output; this is the public URL of our Cloud Run service.
On GCP console verify the Cloud Run Service has been deployed successfully.
This is the second episode of our Cloud Native DevOps on GCP series. In the previous chapter, we built a multi-AZ GKE cluster with Terraform. This time, we'll create a cloud native CI/CD pipeline leveraging our GKE cluster and Google DevOps tools such as Cloud Build and Google Container Registry (GCR). We'll create a Cloud Build trigger connected to a GitHub repository to perform automatic build, test and deployment of a sample microservice app onto the GKE cluster.
For this demo, I have provided a simple NodeJS app which is already containerized and packaged as a Helm chart for fast K8s deployment. You can find all the artifacts at my GitHub repo, including the demo app, Helm template/chart, as well as the Cloud Build pipeline code.
Register gcloud as a Docker credential helper: this is important so that our Docker client has privileged access to interact with GCR. (Later we'll need to build and push a Helm client image to GCR, which is required for the pipeline deployment process.)
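This is a one-liner with the Cloud SDK:

gcloud auth configure-docker    # adds the GCR registries to Docker's credential helper config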
Step-3: Initialize Helm for Application Deployment on GKE
As mentioned above, for this demo we have encapsulated our demo app into a Helm Chart. Helm is a package management system designed for simplifying and accelerating application deployment on the Kubernetes platform.
In version 2, Helm consists of a local client and a Tiller server pod (deployed in the K8s cluster) that interacts with the kube-apiserver for app deployment. In our example, we'll first build a customised Helm client Docker image and push it to GCR. This image will then be used by Cloud Build to interact with the Tiller server (deployed on GKE) to deploy the pre-packaged Helm chart, as illustrated in the diagram below.
First, let's configure a service account for Tiller and initialise Helm (the server component) on our GKE cluster.
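For Helm v2 this is the usual sequence; note that binding cluster-admin to Tiller is fine for a lab but far too broad for production.

# Service account + RBAC binding for Tiller, then install Tiller onto the cluster
kubectl create serviceaccount tiller --namespace kube-system
kubectl create clusterrolebinding tiller-cluster-admin \
  --clusterrole=cluster-admin \
  --serviceaccount=kube-system:tiller
helm init --service-account tiller --wait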
Next, we'll leverage the (previously built) Helm client image to interact with our GKE cluster and deploy the Helm chart (for our Node app), with the image repository pointing to the GCR path from the previous pipeline stage.
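As a sketch, this deploy stage of cloudbuild.yaml would look something like the step below, using the Helm builder image pushed to GCR earlier; the release name, chart path and image repository are placeholders based on my repo layout, while the cluster and region values match the GKE cluster from the previous post.

steps:
  # ...earlier build & push stages omitted...

  # Deploy/upgrade the Helm chart, pointing the image at the freshly built GCR tag
  - name: 'gcr.io/$PROJECT_ID/helm'
    args:
      - 'upgrade'
      - '--install'
      - 'node-app'
      - './charts/node-app'
      - '--set'
      - 'image.repository=gcr.io/$PROJECT_ID/node-app,image.tag=$SHORT_SHA'
    env:
      - 'CLOUDSDK_COMPUTE_REGION=australia-southeast1'
      - 'CLOUDSDK_CONTAINER_CLUSTER=node-pool-cluster-demo'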
Lastly, we'll run an integration test to verify the demo app status on our GKE cluster. Our Node app has a built-in health-check URL configured at "/health", and we'll be leveraging another Cloud Builder curl image to ping this URL path, expecting a return message of {"status": "ok"}. Note: here we should be polling the internal DNS address of the K8s service (for the demo app) so there is no dependency on IP allocations.
Step-4: Create a Cloud Build Trigger by Connecting to GitHub Repository
Now that we have our GKE cluster ready and Helm image pushed to GCR, the next step is to connect Cloud Build to the GitHub repository and create a CI trigger. On GCP console, go to Cloud Build —> Triggers, select the GitHub repo as below.
If this is the first time you are connecting to GitHub in Cloud Build, it will redirect you to an authorization page like the one below; accept it in order to grant access to your repositories.
Select the demo app repository, which also includes the pipeline config (cloudbuild.yaml) file.
Create a push trigger in the next page and you should see a summary like this.
You can manually run the trigger now to kick off the CI build process. However, we'll run more thorough testing to verify the end-to-end pipeline automation process in the next section.
Step-5: Test the CI/CD Pipeline
It’s time to test our CI/CD pipeline! First we’ll make a “cosmetic” version change (1.0.0 to 1.0.1) to the Helm chart for our demo app.
Commit the change and push to the Git repository.
This (push event) should have triggered our Cloud Build pipeline. You can jump on the GCP console to monitor the fully automated 4-stage process. The pipeline will be completed once the integration test has returned a status of OK.
On the GKE cluster we can see our Helm chart v1.0.1 has been deployed successfully.
The deployment and node app are running as expected.
Retrieve the Ingress public IP and update the local hosts file for quick testing. (Note the Ingress URL is defined as "node-app.local".)
[root@cloud-ops01 nodejs-cloudbuild-demo]# kubectl get ingresses
NAME HOSTS ADDRESS PORTS AGE
node-app node-app.local 184.108.40.206 80 15m
[root@cloud-ops01 nodejs-cloudbuild-demo]# echo "220.127.116.11 node-app.local" >> /etc/hosts
Now point your browser to “node-app.local” and you should see the demo app page like below. Congrats, you have just successfully deployed a cloud native CI/CD pipeline on GCP!
This is the first episode of our Cloud Native DevOps on GCP series. Here we'll be building a Google Kubernetes Engine (GKE) cluster using Terraform. From my personal experience, GKE has been one of the most scalable and reliable managed Kubernetes solutions, and it's also 100% upstream compliant and certified by the CNCF.
For this demo I have provided a sample Terraform script here. The target state will look like this:
Specifically, we'll be launching the following GCP/GKE resources (see the condensed HCL sketch after the list):
1x new VPC for hosting the demo GKE cluster
1x /17 CIDR block as the primary address space for the VPC
2x /18 CIDR blocks for the GKE Pod and Service address spaces
1x GKE high-availability cluster across 2x Availability Zones (AZs)
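Condensed down, the script is essentially the following resources; the CIDR values mirror the list above, while the names, machine type and zone list are placeholders (the full script in the repo has more options set).

resource "google_compute_network" "vpc" {
  name                    = "gke-demo-vpc"
  auto_create_subnetworks = false
}

resource "google_compute_subnetwork" "gke" {
  name          = "gke-demo-subnet"
  network       = google_compute_network.vpc.id
  region        = var.region
  ip_cidr_range = "10.10.0.0/17"            # /17 primary range for the nodes

  secondary_ip_range {
    range_name    = "gke-pods"
    ip_cidr_range = "10.10.128.0/18"        # /18 Pod address space
  }
  secondary_ip_range {
    range_name    = "gke-services"
    ip_cidr_range = "10.10.192.0/18"        # /18 Service address space
  }
}

resource "google_container_cluster" "demo" {
  name           = "node-pool-cluster-demo"
  location       = var.region               # regional (HA) control plane
  node_locations = ["australia-southeast1-a", "australia-southeast1-b"]   # 2x AZs
  network        = google_compute_network.vpc.name
  subnetwork     = google_compute_subnetwork.gke.name

  # The node pool is managed separately, so drop the default one
  remove_default_node_pool = true
  initial_node_count       = 1

  ip_allocation_policy {
    cluster_secondary_range_name  = "gke-pods"
    services_secondary_range_name = "gke-services"
  }
}

resource "google_container_node_pool" "pool_01" {
  name       = "pool-01"
  cluster    = google_container_cluster.demo.name
  location   = var.region
  node_count = 2                             # per zone, i.e. 4 nodes across the 2 AZs

  node_config {
    machine_type = "n1-standard-2"           # placeholder machine type
  }
}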
Remember to update the terraform.tfvars with your own GCP project_id
project_id = "xxxxxxxx"
Make sure to enable the GKE API if not already
gcloud services enable container.googleapis.com
Now run the Terraform script:
The whole process should take about 7~10 minutes, and you should get an output like this:
Now register the cluster and update kubeconfig file
[root@cloud-ops01 tf-gcp-gke]# gcloud container clusters get-credentials node-pool-cluster-demo --region australia-southeast1
Fetching cluster endpoint and auth data.
kubeconfig entry generated for node-pool-cluster-demo.
Step-2: Verify the GKE Cluster Status
Check that we can access the GKE cluster and there should be 4x worker nodes provisioned.
[root@cloud-ops01 ~]# kubectl get nodes
NAME STATUS ROLES AGE VERSION
gke-node-pool-cluster-demo-pool-01-03a2c598-34lh Ready <none> 8m59s v1.16.9-gke.2
gke-node-pool-cluster-demo-pool-01-03a2c598-tpwq Ready <none> 9m v1.16.9-gke.2
gke-node-pool-cluster-demo-pool-01-e903c7a8-04cf Ready <none> 9m5s v1.16.9-gke.2
gke-node-pool-cluster-demo-pool-01-e903c7a8-0lt8 Ready <none> 9m5s v1.16.9-gke.2
This can also be verified on the GKE console.
The 4x worker nodes are provisioned over 2x managed instance groups across two different AZs
Run kubectl describe nodes and we can see each node has been tagged with a few customised labels based on its unique properties. These are important metadata that can be used for selective Pod/Node deployment and for use cases like affinity or anti-affinity rules, as in the sketch below.
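For instance, a deployment could pin its pods to a specific node pool or zone purely via the GKE-applied labels. This is an illustrative manifest only, not part of the demo repo; the zone value is a placeholder, and on newer clusters the zone label is topology.kubernetes.io/zone.

apiVersion: apps/v1
kind: Deployment
metadata:
  name: zone-pinned-app
spec:
  replicas: 2
  selector:
    matchLabels:
      app: zone-pinned-app
  template:
    metadata:
      labels:
        app: zone-pinned-app
    spec:
      # Schedule only onto nodes from pool-01 in a single AZ, using GKE's node labels
      nodeSelector:
        cloud.google.com/gke-nodepool: pool-01
        failure-domain.beta.kubernetes.io/zone: australia-southeast1-a
      containers:
        - name: app
          image: nginx:1.19        # placeholder image
          ports:
            - containerPort: 80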
Step-3: Deploy GKE Add-on Services
Install Metrics-Server to provide cluster-wide resource metrics collection and to support use cases such as Horizontal Pod Autoscaling (HPA)
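One common way to install it is straight from the upstream release manifest (the exact version/URL may differ from what I used):

kubectl apply -f https://github.com/kubernetes-sigs/metrics-server/releases/latest/download/components.yaml
kubectl top nodes    # once the metrics pipeline is up, this returns node CPU/memory usage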
On GCP console we can see that an external Load Balancer has been provisioned in front of the Ingress Controller. Take a note of the LB address at below — this is the public IP that will be consumed by our ingress services.
In addition, we'll deploy 2x storage classes to provide dynamic persistent storage support for stateful pods and services. Note the different persistent disk (PD) specs (standard & SSD) for different I/O requirements; they look roughly like the manifests below.
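The class names here are placeholders and may differ from the repo, but the GCE PD provisioner and type parameters are the standard ones.

# Standard (HDD-backed) persistent disks
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: pd-standard
provisioner: kubernetes.io/gce-pd
parameters:
  type: pd-standard
---
# SSD-backed persistent disks for higher I/O workloads
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: pd-ssd
provisioner: kubernetes.io/gce-pd
parameters:
  type: pd-ssd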
The application requests 2x persistent volumes (PV) for the redis-master and redis-slave pods. Both PVs should be automatically provisioned by the persistent volume claims (PVC) with the 2x different storage classes as we deployed earlier. You should see the STATUS reported as “Bound” between each PV and PVC mapping.
Retrieve the external IP/DNS for the frontend service of the Guestbook app.
[root@cloud-ops01 tf-gcp-gke]# kubectl get svc frontend -n guestbook-app
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
frontend LoadBalancer 192.168.127.128 18.104.22.168 80:31006/TCP 23m
You should be able to access the Guestbook app now. Enter and submit some messages, then try destroying and redeploying the app; your data will be preserved by the Redis PVs.
Lastly, we’ll deploy a modified version of the yelb app to test the NGINX ingress controller
You should see an ingress service deployed as per below.
Retrieve the external IP for the ingress service within the yelb namespace. As mentioned before, this should be the same address of the external LB deployed for the ingress controller.
[root@cloud-ops01 tf-gcp-gke]# kubectl get ingresses -n yelb
NAME HOSTS ADDRESS PORTS AGE
yelb-ingress yelb.local 22.214.171.124 80 6m47s
Also, notice the ingress host is defined as "yelb.local". This is the DNS entry that will be matched by the HTTP ingress service, so we'll update the local hosts file (with the ingress public IP) for quick testing.