Enabling embedded Harbor Image Registry in vSphere 7 with Kubernetes

This will be a quick blog to demonstrate how to enable the (embedded) Harbor Image Registry in vSphere 7 with Kubernetes. Harbor was originally developed by VMware as an enterprise-grade private container registry. It was donated to the CNCF in 2018 and recently became a CNCF graduated project.

For this demo, we’ll activate the embedded Harbor registry within the vSphere 7 Kubernetes environment and integrate it with the Supervisor Cluster for container management and deployment.

WHAT YOU’LL NEED:

Enabling the embedded Harbor Registry in vSphere 7 with Kubernetes

To begin, go to your vSphere 7 “Workload Cluster —> Namespaces —> Image Registry”, and then click “Enable Harbor”.

Make sure to select the vSAN storage policy to provide persistent storage as required for the Harbor installation.

The process will take a few minutes, and you should see 7x vSphere Pods after Harbor is installed and enabled. Take a note of the Harbor URL — this is the external address of the K8s load balancer created by NSX-T.

Push Container Images to Harbor Registry

First, let’s log into the Harbor UI and take a quick look. Since this is embedded within vSphere, it supports the SSO login 🙂

Harbor will automatically create a project for every vSphere namespace we have created. In my case, two projects, “dev01” and “guestbook”, have been created, mapped to the two namespaces in my vSphere workload cluster.

Click the “dev01” project and then “Repositories” — as expected it is currently empty, and we’ll be pushing container images to this repository for a quick test. Before we can do that, however, we’ll need to download and import the registry certificate to our client machine for certificate-based authentication. Click “Registry Certificate” to download the ca.crt file.

Next, on the local client create a new directory under /etc/docker/certs.d/ using the same name as the registry FQDN (URL).

[root@pacific-ops01 ~]# cd /etc/docker/certs.d/
[root@pacific-ops01 certs.d]# mkdir 192.168.100.133
[root@pacific-ops01 certs.d]# cd 192.168.100.133/
[root@pacific-ops01 192.168.100.133]# vim ca.crt

Now, let’s get a test (nginx) image, tag it, and try to push it to the dev01 repository.

[root@pacific-ops01 ~]# docker login 192.168.100.133 --username administrator@vsphere.local
Password: 
Login Succeeded

[root@pacific-ops01 ~]# docker pull nginx
Using default tag: latest
Trying to pull repository docker.io/library/nginx ... 
latest: Pulling from docker.io/library/nginx
bf5952930446: Pull complete 
cb9a6de05e5a: Pull complete 
9513ea0afb93: Pull complete 
b49ea07d2e93: Pull complete 
a5e4a503d449: Pull complete 
Digest: sha256:b0ad43f7ee5edbc0effbc14645ae7055e21bc1973aee5150745632a24a752661
Status: Downloaded newer image for docker.io/nginx:latest
[root@pacific-ops01 ~]# 
[root@pacific-ops01 ~]# docker images
REPOSITORY          TAG                 IMAGE ID            CREATED             SIZE
docker.io/nginx     latest              4bb46517cac3        3 days ago          133 MB
[root@pacific-ops01 ~]# 
[root@pacific-ops01 ~]# docker tag docker.io/nginx 192.168.100.133/dev01/nginx
[root@pacific-ops01 ~]# 
[root@pacific-ops01 ~]# docker push 192.168.100.133/dev01/nginx
The push refers to a repository [192.168.100.133/dev01/nginx]
550333325e31: Pushed 
22ea89b1a816: Pushed 
a4d893caa5c9: Pushed 
0338db614b95: Pushed 
d0f104dc0a1f: Pushed 
latest: digest: sha256:179412c42fe3336e7cdc253ad4a2e03d32f50e3037a860cf5edbeb1aaddb915c size: 1362
[root@pacific-ops01 ~]# 

It works, perfect! Now refresh the repository and we can see the new nginx image we just pushed.

Deploy Kubernetes Pods to Supervisor Cluster from the Harbor Registry

Let’s run a quick test to deploy a Pod using the nginx image from our Harbor Registry. First, log into the Supervisor Cluster and switch to the “dev01” namespace/context.

[root@pacific-ops01 ~]# kubectl vsphere login --server=192.168.100.129 --vsphere-username administrator@vsphere.local --insecure-skip-tls-verify
Password: 
Logged in successfully.
…
[root@pacific-ops01 ~]# kubectl config use-context dev01
Switched to context "dev01".

Create an nginx Pod config using the image path from our Harbor repository.

apiVersion: v1
kind: Pod
metadata:
  labels:
    run: nginx-demo
  name: nginx-demo
  namespace: dev01
spec:
  containers:
  - image: 192.168.100.133/dev01/nginx
    name: nginx-demo
  restartPolicy: Always

Deploy the Pod.

[root@pacific-ops01 ~]# kubectl apply -f nginx-demo.yaml 
pod/nginx-demo created

Monitor the events, and soon you’ll see the Pod deployed successfully using the image fetched from the Harbor repository.

[root@pacific-ops01 ~]# kubectl get  events -n dev01
LAST SEEN   TYPE     REASON                         OBJECT                                                    MESSAGE
48s         Normal   Status                         image/nginx-4f70b77c704ff28acdf14ce0405bc1811e8ee077-v0   pacific-esxi-3: Image status changed to Resolving
40s         Normal   Resolve                        image/nginx-4f70b77c704ff28acdf14ce0405bc1811e8ee077-v0   pacific-esxi-3: Image resolved to ChainID sha256:80b21afd8140706d5fe3b7106ae6147e192e6490b402bf2dd2df5df6dac13db8
40s         Normal   Bind                           image/nginx-4f70b77c704ff28acdf14ce0405bc1811e8ee077-v0   Imagedisk 80b21afd8140706d5fe3b7106ae6147e192e6490b402bf2dd2df5df6dac13db8-v0 successfully bound
32s         Normal   Status                         image/nginx-4f70b77c704ff28acdf14ce0405bc1811e8ee077-v0   Image status changed to Fetching
14s         Normal   Status                         image/nginx-4f70b77c704ff28acdf14ce0405bc1811e8ee077-v0   Image status changed to Ready
7s          Normal   SuccessfulRealizeNSXResource   pod/nginx-demo                                            Successfully realized NSX resource for Pod
<unknown>   Normal   Scheduled                      pod/nginx-demo                                            Successfully assigned dev01/nginx-demo to pacific-esxi-1
50s         Normal   Image                          pod/nginx-demo                                            Image nginx-4f70b77c704ff28acdf14ce0405bc1811e8ee077-v0 bound successfully
39s         Normal   Pulling                        pod/nginx-demo                                            Waiting for Image dev01/nginx-4f70b77c704ff28acdf14ce0405bc1811e8ee077-v0
14s         Normal   Pulled                         pod/nginx-demo                                            Image dev01/nginx-4f70b77c704ff28acdf14ce0405bc1811e8ee077-v0 is ready
7s          Normal   SuccessfulMountVolume          pod/nginx-demo                                            Successfully mounted volume default-token-bqxc2
7s          Normal   Created                        pod/nginx-demo                                            Created container nginx-demo
7s          Normal   Started                        pod/nginx-demo                                            Started container nginx-demo



[root@pacific-ops01 ~]# kubectl get pods -n dev01   
NAME         READY   STATUS    RESTARTS   AGE
nginx-demo   1/1     Running   0          60s

Use kubectl describe pod to confirm the nginx Pod is indeed running on the image pulled from the Harbor registry.
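For example (output omitted here; the grep simply picks out the image-related fields):

[root@pacific-ops01 ~]# kubectl describe pod nginx-demo -n dev01 | grep -i image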

Deploying Contour Ingress Controller on Tanzu Kubernetes Grid (TKG)

This blog provides a guide to help you deploy the Contour Ingress Controller onto a Tanzu Kubernetes Grid (TKG) cluster. Contour is an open source Kubernetes ingress controller that exposes HTTP/HTTPS routes for internal services so they are reachable from outside the cluster. Like many other ingress controllers, Contour can provide advanced L7 URL/URI based routing and load balancing, as well as SSL/TLS termination capabilities.

Contour was originally developed by Heptio (VMware) and was recently handed over to the CNCF as an incubating project. Contour consists of a control plane that is provisioned via a K8s deployment, and an Envoy-based data plane running as a DaemonSet on every cluster worker node.

(Image source: https://projectcontour.io/contour-v014/)

WHAT YOU’LL NEED:

For this lab, we’ll install the Contour ingress controller onto a TKG cluster, and we’ll then deploy a sample app (supplied within the manifest) for testing the Ingress services. The overall service topology will look like this:

Install the Contour Ingress Controller

To begin, unzip the TKG extension manifest (I’m using v1.1.0).

[root@pacific-ops01 ~]# tar -xzf tkg-extensions-manifests-v1.1.0-vmware.1.tar.gz 

Log into your TKG cluster and make sure you are in the correct context.

[root@pacific-ops01 ~]# kubectl vsphere login --server=192.168.100.129 --vsphere-username administrator@vsphere.local --insecure-skip-tls-verify --tanzu-kubernetes-cluster-name dev01-tkg-01 --tanzu-kubernetes-cluster-namespace dev01
[root@pacific-ops01 ~]# kubectl config use-context dev01-tkg-01 

Next, install cert-manager (a prerequisite for the Contour Ingress setup) onto the TKG cluster.
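The manifest bundle ships the cert-manager YAMLs, so the install is typically a single apply (path assumed from the v1.1.0 extension layout; adjust to where you extracted the bundle):

[root@pacific-ops01 ~]# kubectl apply -f tkg-extensions-v1.1.0/cert-manager/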

Before we can install Contour and Envoy, we’ll need to make a small change to the Envoy service config (02-service-envoy.yaml). As illustrated in the service topology, we will deploy a LoadBalancer in front of the ingress controller. So we’ll update the Envoy service type from NodePort (default) to LoadBalancer.
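After the edit, the relevant part of 02-service-envoy.yaml looks roughly like this (a sketch — the metadata, selector and port values come from the supplied template and may differ slightly in your version):

apiVersion: v1
kind: Service
metadata:
  name: envoy
  namespace: tanzu-system-ingress
spec:
  # changed from the default NodePort so NSX-T provisions an external LB IP
  type: LoadBalancer
  selector:
    app: envoy
  ports:            # targetPort values omitted in this sketch; keep the template's
  - name: http
    port: 80
  - name: https
    port: 443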

Now deploy Contour and Envoy onto the cluster.
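With v1.1.0 this is typically a single apply against the contour directory of the extracted manifest (again, adjust the path to your environment):

[root@pacific-ops01 ~]# kubectl apply -f tkg-extensions-v1.1.0/ingress/contour/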

We can see a Contour deployment and an Envoy DaemonSet with 3x Pods (we have 3 worker nodes) have been deployed in the tanzu-system-ingress namespace. Also, take a note of the external IP (192.168.100.130) of the Envoy LoadBalancer service, as this will be used by our Ingress services.
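You can confirm the same from the CLI, for example:

[root@pacific-ops01 ~]# kubectl get deployment,daemonset,service -n tanzu-system-ingress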

Deploy a Sample App for testing Ingress Services

Deploy the sample app from within the manifest (see the example after this list); this will create:

  • one new namespace called “test-ingress”
  • one deployment of the “helloweb” app, with a Replicaset of 3x Pods
  • two separate services called “s1” & “s2” — Note: both services are actually pointing to the same 3x Pods (as they are using the same Pod selector)
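A hedged example of the deploy command — the sample app sits under the examples folder of the extracted manifest, though the exact sub-directory name may vary between versions:

[root@pacific-ops01 ~]# kubectl apply -f tkg-extensions-v1.1.0/ingress/contour/examples/common/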

Verify the Pods are up and running

[root@pacific-ops01 ~]# kubectl get pods -n test-ingress 
NAME                        READY   STATUS    RESTARTS   AGE
helloweb-7cd97b9cb8-qjwtk   1/1     Running   0          50s
helloweb-7cd97b9cb8-r9s8g   1/1     Running   0          51s
helloweb-7cd97b9cb8-swztl   1/1     Running   0          51s

and both services (s1 & s2) are deployed as expected.

[root@pacific-ops01 ~]# kubectl get svc -n test-ingress 
NAME   TYPE        CLUSTER-IP      EXTERNAL-IP   PORT(S)   AGE
s1     ClusterIP   10.40.183.104   <none>        80/TCP    1m
s2     ClusterIP   10.40.129.12    <none>        80/TCP    1m

We can’t get to these services yet as they are internal K8s services (ClusterIP) only. We’ll need to deploy an Ingress object so that Contour can expose these services and route external traffic to them. The good news is that there’s already an Ingress config template provided in the manifest. I’ve made the following changes to the template as per my lab environment (my lab domain is vxlan.co). Note the hostname (URL) and the path (URI), as we’ll be using these to access the two services.
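For reference, the resulting Ingress config in my lab looks roughly like this — the hostname, paths, secret name and backends reflect my environment, so treat it as a sketch rather than the exact template:

apiVersion: extensions/v1beta1
kind: Ingress
metadata:
  name: https-ingress
  namespace: test-ingress
spec:
  tls:
  - hosts:
    - ingress.vxlan.co
    secretName: https-secret
  rules:
  - host: ingress.vxlan.co
    http:
      paths:
      - path: /s1
        backend:
          serviceName: s1
          servicePort: 80
      - path: /s2
        backend:
          serviceName: s2
          servicePort: 80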

Deploy the Ingress object.

[root@pacific-ops01 ~]# cd tkg-extensions-v1.1.0/ingress/contour/examples/https-ingress 
[root@pacific-ops01 https-ingress]# kubectl apply -f .
ingress.extensions/https-ingress created
secret/https-secret created

Verify the Ingress service is running as expected

[root@pacific-ops01 https-ingress]# kubectl get ingress -n test-ingress 
NAME            HOSTS              ADDRESS   PORTS     AGE
https-ingress   ingress.vxlan.co             80, 443   2m

Create a DNS record with the ingress hostname by pointing to the Envoy load balancer external IP.
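If you’d rather not touch DNS straight away, you can also test by resolving the hostname to the Envoy LB IP on the fly with curl (-k skips validation of my self-signed certificate):

[root@pacific-ops01 ~]# curl -k --resolve ingress.vxlan.co:443:192.168.100.130 https://ingress.vxlan.co/s1
[root@pacific-ops01 ~]# curl -k --resolve ingress.vxlan.co:443:192.168.100.130 https://ingress.vxlan.co/s2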

Now test access to the s1 service by browsing https://ingress.vxlan.co/s1

and s2 service by browsing https://ingress.vxlan.co/s2

Congrats, you have successfully deployed a Contour Ingress controller on a TKG cluster!

Deploying vSphere 7 with Kubernetes and Tanzu Kubernetes Grid (TKG) Cluster

In this post we’ll explore the vSphere 7 with Kubernetes capabilities and the detailed deployment steps in order to provision a vSphere supervisor cluster and a Tanzu Kubernetes Grid (TKG) cluster.

If you are new to vSphere 7 and Tanzu Kubernetes, below are some background readings that can be used as a good starting point:

Requirements

I’ll be building a nested vSphere7/VCF4 environment on my home lab ESXi host, and the overall lab setup looks like this:

As you might have guessed, this lab requires a lot of resources! Specifically, you’ll need the following:

  • a physical ESXi host running vSphere 6.7 or later
  • capacity to provision VMs with up to 8x vCPUs
  • capacity to provision up to 140-180GB of RAM
  • around 1TB of spare storage
  • a flat /24 subnet with external/Internet connectivity (can be shared with the lab management network)
  • access to vSphere 7 ESXi/VCSA and NSX-T/Edge 3.0 OVA files and trial licenses

In order to save time on provisioning the vSphere/VCF stack, I’m using William Lam‘s vSphere 7 automation script as discussed here. You can find the PowerShell code and further details at his Git repository.

All demo apps and configuration yaml files used in this lab can be found at my Git Repo.

We’ll cover the following steps:

  • #1 – build a (nested) vSphere7/VCF4 stack
  • #2 – configure workload management and deploy supervisor cluster
  • #3 – deploy a demo app with native vSphere Pod services
  • #4 – deploy a TKG cluster
  • #5 – vSphere environment overview (post deployment)

Step-1: Deploy a vSphere7/VCF4 stack

First, you’ll need to download William’s PowerShell script and modify it based on your own lab environment. You’ll also need to download the required OVAs and place them in the same path as defined in the script — note that for the VCSA you’ll need to extract the ISO and point the path to the extracted folder!

Now let’s run the PowerShell script and you’ll see a deployment summary page like this:

Hit “Y” to kick off the deployment; for me the whole process took just a little over 1 hour.

Once the script completes, you should see a vApp like this deployed under your physical ESXi host.

Step-2: Configure Workload Management and Deploy Supervisor Cluster

To activate vSphere 7 native Kubernetes capabilities, we need to enable workload management which will configure our nested ESXi cluster as a supervisor cluster. First, log into the nested VCSA, and navigate to “Menu” —> “Workload Management”, click “Enable”:

Select our nested ESXi cluster to be configured as a supervisor cluster.

Select the supervisor Control Plane VM size.

Configure the management network settings for the supervisor cluster; note that we’ll need to reserve a block of five consecutive addresses for the control plane VMs, including a VIP.

Next, configure the vSphere Pod network settings — for this demo we’ll reserve one /27 as the Ingress CIDR block, providing NAT IPs to be consumed by Load Balancer and Ingress services, and another /27 as the Egress CIDR block, providing outbound SNAT IPs for the provisioned K8s namespaces.

Configure storage policies by selecting the pre-provisioned pacific-gold vSAN policy, then click “Finish” to begin the deployment of the supervisor cluster.

This process will take another 20~30 mins to complete, and you’ll see a cluster of 3x control plane VMs being provisioned.

Back in “Workload Management” —> “Cluster”, you should see our supervisor cluster (consisting of 3x ESXi hosts) is now up and running. Also, take a note of the VIP address of the control plane VMs, as we’ll be using that IP to log into the supervisor cluster.

Step-3: Deploy a demo app with Native vSphere Pods

To consume the native vSphere Kubernetes Pod capabilities, we first need to create a vSphere Namespace, which is mapped to a K8s namespace within the supervisor cluster. vSphere leverages the K8s namespace logical construct to provide resource segmentation for the vSphere pods/services/deployments, and it offers a flexible way to attach authorization and network/storage policies for different environments.

Go to “Menu” —> “Workload Management”, and click “Create Namespace”.

Since we’ll be deploying a sample guestbook app, we’ll name the namespace “guestbook”.

Next, grant the vSphere admin user edit permission on the namespace, and assign the vSAN storage policy “pacific-gold-storage-policy” to the namespace — this is important, as (behind the scenes) we are leveraging the vSAN CSI (Container Storage Interface) driver to provide persistent storage support for the cluster.

Now we are ready to dive into the vSphere supervisor cluster! Before we can do that, let’s get the kubectl CLI and the vSphere plugin package.
Open the CLI tools link here:

Follow the onscreen instructions to download and install the vSphere Kubectl CLI toolkit onto your management host (I’m using a CentOS7 VM).
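On a Linux client the steps boil down to something like the following — the download URL is the one shown on the CLI tools page (here using my control plane VIP), so treat the exact path as an assumption:

[root@pacific-ops01 ~]# curl -k -O https://192.168.100.129/wcp/plugin/linux-amd64/vsphere-plugin.zip
[root@pacific-ops01 ~]# unzip vsphere-plugin.zip
[root@pacific-ops01 ~]# cp bin/kubectl bin/kubectl-vsphere /usr/local/bin/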

Time to log into our supervisor K8s cluster! — remember to use the control plane VIP (192.168.100.129) as noted before.

[root@Pacific-Ops01]# kubectl vsphere login --server=192.168.100.129 -u administrator@vsphere.local --insecure-skip-tls-verify

Switch context to our “guestbook” namespace:

[root@Pacific-Ops01]# kubectl config use-context guestbook
Switched to context "guestbook".

Take a look at the cluster nodes; you’ll see the 3x master nodes (supervisor control VMs) and 3x worker nodes (ESXi hosts):

[root@pacific-ops01 vs7-k8s]# kubectl get nodes -o wide
NAME                               STATUS   ROLES    AGE   VERSION                    INTERNAL-IP       EXTERNAL-IP   OS-IMAGE                 KERNEL-VERSION      CONTAINER-RUNTIME
420a7d079f62a8ae40fb4bffea3cee48   Ready    master   8d    v1.16.7-2+bfe512e5ddaaaa   10.244.0.196      <none>        VMware Photon OS/Linux   4.19.84-1.ph3-esx   docker://18.9.9
420acb46e78281fcfaf3f45ea3d7c577   Ready    master   8d    v1.16.7-2+bfe512e5ddaaaa   10.244.0.194      <none>        VMware Photon OS/Linux   4.19.84-1.ph3-esx   docker://18.9.9
420aef27c9f45b01e8e0ed4a7e45cf2e   Ready    master   8d    v1.16.7-2+bfe512e5ddaaaa   10.244.0.195      <none>        VMware Photon OS/Linux   4.19.84-1.ph3-esx   docker://18.9.9
pacific-esxi-1                     Ready    agent    8d    v1.16.7-sph-4d52cd1        192.168.100.121   <none>        <unknown>                <unknown>           <unknown>
pacific-esxi-2                     Ready    agent    8d    v1.16.7-sph-4d52cd1        192.168.100.122   <none>        <unknown>                <unknown>           <unknown>
pacific-esxi-3                     Ready    agent    8d    v1.16.7-sph-4d52cd1        192.168.100.123   <none>        <unknown>                <unknown>           <unknown>

Clone the Git repo for this demo lab, and apply a dummy network policy (permitting all ingress and all egress traffic).

[root@pacific-ops01 ~]# git clone https://github.com/sc13912/vs7-k8s.git
Cloning into 'vs7-k8s'...
remote: Enumerating objects: 15, done.
remote: Counting objects: 100% (15/15), done.
remote: Compressing objects: 100% (10/10), done.
remote: Total 15 (delta 2), reused 12 (delta 2), pack-reused 0
Unpacking objects: 100% (15/15), done.
[root@pacific-ops01 ~]# cd vs7-k8s/
[root@pacific-ops01 vs7-k8s]# kubectl apply -f network-policy-allowall.yaml
networkpolicy.networking.k8s.io/allow-all created
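For reference, an allow-all policy is just an empty Pod selector with open ingress and egress rules — a minimal sketch of what network-policy-allowall.yaml would contain (the actual file in the repo may differ slightly):

apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-all
  namespace: guestbook
spec:
  podSelector: {}      # match every Pod in the namespace
  policyTypes:
  - Ingress
  - Egress
  ingress:
  - {}                 # allow all inbound traffic
  egress:
  - {}                 # allow all outbound traffic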

To deploy the guestbook app, we’ll leverage the dynamic persistent volume provisioning capability of the vSphere CSI driver by calling the vSAN storage class “pacific-gold-storage-policy”:

kind: PersistentVolumeClaim
apiVersion: v1
metadata:
  namespace: guestbook
  name: redis-master-claim
spec:
  accessModes:
    - ReadWriteOnce
  storageClassName: pacific-gold-storage-policy
  resources:
    requests:
      storage: 2Gi

Apply the PVC YAMLs for both the Redis master and slave Pods.

[root@pacific-ops01 vs7-k8s]# kubectl apply -f guestbook/guestbook-master-claim.yaml
persistentvolumeclaim/redis-master-claim created

[root@pacific-ops01 vs7-k8s]# kubectl apply -f guestbook/guestbook-slave-claim.yaml 
persistentvolumeclaim/redis-slave-claim created

Verify both PVCs show a “Bound” status, mapped to two dynamically provisioned persistent volumes (PVs):

[root@pacific-ops01 vs7-k8s]# kubectl get pvc
NAME                 STATUS   VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS                  AGE
redis-master-claim   Bound    pvc-0102e725-41ad-440b-8a02-8af4d4768ebb   2Gi        RWO            pacific-gold-storage-policy   14m
redis-slave-claim    Bound    pvc-fb4b7bbe-9b35-40e8-b251-8f2effe85a2d   2Gi        RWO            pacific-gold-storage-policy   13m

Now deploy the guestbook app.

[root@pacific-ops01 vs7-k8s]# kubectl apply -f guestbook/guestbook-all-in-one.yaml 
service/redis-master created
deployment.apps/redis-master created
service/redis-slave created
deployment.apps/redis-slave created
service/frontend created
deployment.apps/frontend created

Wait until all the Pods are up and running:

[root@pacific-ops01 vs7-k8s]# kubectl get pods -o wide -n guestbook 
NAME                            READY   STATUS    RESTARTS   AGE     IP             NODE             NOMINATED NODE   READINESS GATES
frontend-6cb7f8bd65-kjgh2       1/1     Running   0          3m2s    10.244.0.214   pacific-esxi-2   <none>           <none>
frontend-6cb7f8bd65-mlv79       1/1     Running   0          3m2s    10.244.0.213   pacific-esxi-1   <none>           <none>
frontend-6cb7f8bd65-slz6b       1/1     Running   0          3m2s    10.244.0.215   pacific-esxi-2   <none>           <none>
frontend-6cb7f8bd65-vtkfz       1/1     Running   0          3m3s    10.244.0.212   pacific-esxi-1   <none>           <none>
redis-master-64fb8775bf-65sdc   1/1     Running   0          3m10s   10.244.0.210   pacific-esxi-1   <none>           <none>
redis-slave-779b6d8f79-bj9q7    1/1     Running   0          3m7s    10.244.0.211   pacific-esxi-2   <none>           <none>

Retrieve the Load Balancer service IP — note NSX has allocated an IP from the /27 Ingress CIDR block:

[root@pacific-ops01 vs7-k8s]# kubectl get svc -n guestbook 
NAME           TYPE           CLUSTER-IP    EXTERNAL-IP       PORT(S)        AGE
frontend       LoadBalancer   10.32.0.209   192.168.100.130   80:32610/TCP   4m15s
redis-master   ClusterIP      10.32.0.34    <none>            6379/TCP       4m22s
redis-slave    ClusterIP      10.32.0.197   <none>            6379/TCP       4m21s

Hit the load balancer IP in browser to test the guestbook app. Enter and submit some messages, and try to destroy and redeploy the app, your data will be kept by the redis PVs.

Step-4: Deploy a TKG cluster

Before we can deploy a TKG cluster, we’ll need to create a content library subscription by pointing to https://wp-content.vmware.com/v2/latest/lib.json, which contains the VMware Tanzu Kubernetes images:

Wait about 5~10 mins for the library to fully sync; at this point I can see two versions of Tanzu K8s images:

Next, create a new namespace called “dev01” which will be hosting our new TKG cluster.

Back to the CLI, we’ll switch context from “guestbook” to the new “dev01” namespace:

[root@pacific-ops01 vs7-k8s]# kubectl config get-contexts 
CURRENT   NAME              CLUSTER           AUTHINFO                                          NAMESPACE
          192.168.100.129   192.168.100.129   wcp:192.168.100.129:administrator@vsphere.local   
          dev01             192.168.100.129   wcp:192.168.100.129:administrator@vsphere.local   dev01
*         guestbook         192.168.100.129   wcp:192.168.100.129:administrator@vsphere.local   guestbook
[root@pacific-ops01 vs7-k8s]# 
[root@pacific-ops01 vs7-k8s]# kubectl config use-context dev01 
Switched to context "dev01".

Let’s examine the two TKG K8s versions available from the library:

[root@pacific-ops01 vs7-k8s]# kubectl get virtualmachineimages
NAME                                                        AGE
ob-15957779-photon-3-k8s-v1.16.8---vmware.1-tkg.3.60d2ffd   9m44s
ob-16466772-photon-3-k8s-v1.17.7---vmware.1-tkg.1.154236c   9m44s

There are also different classes for the TKG VM templates:

[root@pacific-ops01 vs7-k8s]# kubectl get  virtualmachineclasses
NAME                 AGE
best-effort-large    4h48m
best-effort-medium   4h48m
best-effort-small    4h48m
best-effort-xlarge   4h48m
best-effort-xsmall   4h48m
guaranteed-large     4h48m
guaranteed-medium    4h48m
guaranteed-small     4h48m
guaranteed-xlarge    4h48m
guaranteed-xsmall    4h48m

I have prepared the following YAML config for my TKG cluster — I’m using 1x master node and 3x worker nodes, all using the “guaranteed-small” machine class.

[root@pacific-ops01 vs7-k8s]# cat tkg-cluster01.yaml 
apiVersion: run.tanzu.vmware.com/v1alpha1
kind: TanzuKubernetesCluster
metadata:
  name: dev01-tkg-01
  namespace: dev01
spec:
  distribution:
    version: v1.16
  topology:
    controlPlane:
      class: guaranteed-small
      count: 1
      storageClass: pacific-gold-storage-policy
    workers:
      class: guaranteed-small
      count: 3
      storageClass: pacific-gold-storage-policy
  settings:
    network:
      cni:
        name: calico
      services:
        cidrBlocks: ["10.36.0.0/16"]
      pods:
        cidrBlocks: ["10.242.0.0/16"]

Apply the config to create the TKG cluster.

[root@pacific-ops01 vs7-k8s]# kubectl apply -f tkg-cluster01.yaml 
tanzukubernetescluster.run.tanzu.vmware.com/dev01-tkg-01 created

Monitor the cluster creation process; eventually you’ll see all 4x TKG VMs up and running:

[root@pacific-ops01 vs7-k8s]# kubectl get tanzukubernetesclusters.run.tanzu.vmware.com 
NAME           CONTROL PLANE   WORKER   DISTRIBUTION                     AGE   PHASE
dev01-tkg-01   1               3        v1.16.8+vmware.1-tkg.3.60d2ffd   13m   creating

[root@pacific-ops01 vs7-k8s]# kubectl get machines 
NAME                                         PROVIDERID                                       PHASE
dev01-tkg-01-control-plane-n9hqx             vsphere://420aff74-1367-9654-b2ba-59f8a64c3b52   running
dev01-tkg-01-workers-nwmhh-c766c8f77-nnbsj   vsphere://420aca94-26f3-f1c6-e112-607c28c439a4   provisioned
dev01-tkg-01-workers-nwmhh-c766c8f77-pcv65   vsphere://420a2c44-f4e3-f698-b173-86a6b4b3fa27   provisioned
dev01-tkg-01-workers-nwmhh-c766c8f77-zqfwj   vsphere://420a2c16-3002-b2c2-ef5d-d4e3d7a08bf8   provisioned

[root@pacific-ops01 vs7-k8s]# kubectl get machines            
NAME                                         PROVIDERID                                       PHASE
dev01-tkg-01-control-plane-n9hqx             vsphere://420aff74-1367-9654-b2ba-59f8a64c3b52   running
dev01-tkg-01-workers-nwmhh-c766c8f77-nnbsj   vsphere://420aca94-26f3-f1c6-e112-607c28c439a4   running
dev01-tkg-01-workers-nwmhh-c766c8f77-pcv65   vsphere://420a2c44-f4e3-f698-b173-86a6b4b3fa27   running
dev01-tkg-01-workers-nwmhh-c766c8f77-zqfwj   vsphere://420a2c16-3002-b2c2-ef5d-d4e3d7a08bf8   running

Time to log into our new cluster!

[root@pacific-ops01 vs7-k8s]# kubectl vsphere login --server=192.168.100.129 --vsphere-username administrator@vsphere.local --insecure-skip-tls-verify --tanzu-kubernetes-cluster-name dev01-tkg-01 --tanzu-kubernetes-cluster-namespace dev01

[root@pacific-ops01 vs7-k8s]# kubectl config use-context dev01-tkg-01 
Switched to context "dev01-tkg-01".

Once you are logged in and have switched to the “dev01-tkg-01” cluster context, verify that all 4x TKG nodes are in “Ready” status:

[root@pacific-ops01 ~]# kubectl get nodes 
NAME                                         STATUS   ROLES    AGE   VERSION
dev01-tkg-01-control-plane-n9hqx             Ready    master   22m   v1.16.8+vmware.1
dev01-tkg-01-workers-nwmhh-c766c8f77-nnbsj   Ready    <none>   56s   v1.16.8+vmware.1
dev01-tkg-01-workers-nwmhh-c766c8f77-pcv65   Ready    <none>   61s   v1.16.8+vmware.1
dev01-tkg-01-workers-nwmhh-c766c8f77-zqfwj   Ready    <none>   85s   v1.16.8+vmware.1

We are now ready to deploy demo apps into the TKG cluster. First, update the cluster RBAC and Pod Security Policies by applying the supplied yaml config.

[root@pacific-ops01 vs7-k8s]# kubectl apply -f allow-nonroot-clusterrole.yaml 
clusterrole.rbac.authorization.k8s.io/psp:privileged created
clusterrolebinding.rbac.authorization.k8s.io/all:psp:privileged created
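For reference, based on the object names in the output above, the applied config is essentially a ClusterRole granting “use” on the privileged PodSecurityPolicy plus a binding for all service accounts — a sketch only (the PSP name is assumed from the TKG defaults):

kind: ClusterRole
apiVersion: rbac.authorization.k8s.io/v1
metadata:
  name: psp:privileged
rules:
- apiGroups: ["policy"]
  resources: ["podsecuritypolicies"]
  resourceNames: ["vmware-system-privileged"]   # default privileged PSP shipped with TKG (assumption)
  verbs: ["use"]
---
kind: ClusterRoleBinding
apiVersion: rbac.authorization.k8s.io/v1
metadata:
  name: all:psp:privileged
roleRef:
  kind: ClusterRole
  name: psp:privileged
  apiGroup: rbac.authorization.k8s.io
subjects:
- kind: Group
  name: system:serviceaccounts
  apiGroup: rbac.authorization.k8s.io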

Next, deploy the yelb demo app:

[root@pacific-ops01 vs7-k8s]# kubectl apply -f yelb/yelb-lb.yaml
service/redis-server created
service/yelb-db created
service/yelb-appserver created
service/yelb-ui created
deployment.apps/yelb-ui created
deployment.apps/redis-server created
deployment.apps/yelb-db created
deployment.apps/yelb-appserver created

Wait for all the Pods to be up and running, then retrieve the external IP of the yelb-ui Load Balancer (assigned by NSX from the pre-provisioned /27 Ingress CIDR block):

[root@pacific-ops01 vs7-k8s]# kubectl get svc yelb-ui -n yelb-app 
NAME      TYPE           CLUSTER-IP    EXTERNAL-IP       PORT(S)        AGE
yelb-ui   LoadBalancer   10.40.19.40   192.168.100.132   80:30116/TCP   9d

Go to the LB IP and you’ll see the app is running successfully.

vSphere Environment Overview

Below is a quick overview of the vSphere lab environment after you have completed all the steps. You should see a supervisor cluster (consisting of 3x ESXi worker nodes and 3x control plane VMs), a TKG cluster with its own namespace, and a guestbook microservice app deployed with native vSphere Pod services leveraging the vSAN CSI.

Here is the network topology overview captured from the NSX-T UI. Note that NSX automatically deploys a dedicated Tier-1 gateway for every TKG cluster created. The Tier-1 gateway also provides egress SNAT and ingress LB capabilities for the TKG cluster.

Build a Serverless CI/CD pipeline on AWS with Fargate, CodePipeline and Terraform

This blog provides an example for deploying a CI/CD pipeline on AWS utilising the serverless container platform Fargate and the fully managed CodePipeline service. We’ll also use Terraform to automate the process of building the entire AWS environment, as shown in the diagram below.

Specifically, we’ll be creating the following AWS resources:

  • 1x demo VPC including public/private subnets, NAT gateway and security groups etc
  • 1x ALB for providing LB services to a target group of 2x Fargate container tasks
  • 1x ECS cluster with a Fargate service definition (running our demo app)
  • 1x CodePipeline definition, which builds the demo app from GitHub Repo (with a webhook trigger) and deploys it to the same Fargate service
  • 1x ECR repository for hosting pipeline build images
  • 2x S3 Buckets as build & artifact cache

References – for this demo, I’m using these Terraform modules found on GitHub:

PREREQUISITES

  • Access to an AWS testing environment
  • Install Git & Terraform on your client
  • Install AWS toolkits including AWS CLI, AWS-IAM-Authenticator
  • Check the NTP clock & sync status on your client —> important!
  • Clone or download the Terraform code here.
  • Clone or fork the demo app (including the CodePipeline buildspec) here.

Step-1: Review the Terraform Script

Let’s take a close look at the Terraform code. I’ll skip the VPC and ALB sections and focus on the ECS/Fargate service and CodePipeline definition.

This section creates an ECS cluster with the Fargate service definition. Note I have used a Bitnami Node image for testing purposes; it will get replaced automatically by our demo app via the CodePipeline execution.

############################# Create ECS Cluster and Fargate Service ##################################


resource "aws_ecs_cluster" "ecs_cluster" {
  name = "default"
}


module "ecs_fargate" {
  source           = "git::https://github.com/tmknom/terraform-aws-ecs-fargate.git?ref=tags/2.0.0"
  name             = var.ecs_service_name
  container_name   = var.container_name
  container_port   = var.container_port
  cluster          = aws_ecs_cluster.ecs_cluster.arn
  subnets          = module.vpc.public_subnets
  target_group_arn = join("", module.alb.target_group_arns)
  vpc_id           = module.vpc.vpc_id

  container_definitions = jsonencode([
    {
      name      = var.container_name
      image     = "bitnami/node:latest"
      essential = true
      portMappings = [
        {
          containerPort = var.container_port
          protocol      = "tcp"
        }
      ]
    }
  ])

  desired_count                      = 2
  deployment_maximum_percent         = 200
  deployment_minimum_healthy_percent = 100
  deployment_controller_type         = "ECS"
  assign_public_ip                   = true
  health_check_grace_period_seconds  = 10
  platform_version                   = "LATEST"
  source_cidr_blocks                 = ["0.0.0.0/0"]
  cpu                                = 256
  memory                             = 512
  requires_compatibilities           = ["FARGATE"]
  iam_path                           = "/service_role/"
  description                        = "Fargate demo example"
  enabled                            = true

  tags = {
    Environment = "Dev"
  }
}

This section creates an ECR repository (for hosting the build image) and defines the pipeline, which builds the demo app from the GitHub repo, pushes the new image to ECR, and deploys it to the same ECS cluster and Fargate service created above.

################################### Create ECR Repo and Code Pipeline ###################################


resource "aws_ecr_repository" "fargate-repo" {
  name = var.ecr_repo

  image_scanning_configuration {
    scan_on_push = true
  }
}

module "ecs_codepipeline" {
  source                = "git::https://github.com/cloudposse/terraform-aws-ecs-codepipeline.git?ref=master"
  name                  = var.app_name
  namespace             = var.namespace
  region                = var.region
  image_repo_name       = var.ecr_repo
  stage                 = var.stage
  github_oauth_token    = var.github_oath_token
  github_webhooks_token = var.github_webhooks_token
  webhook_enabled       = "true"
  repo_owner            = var.github_repo_owner
  repo_name             = var.github_repo_name
  branch                = "master"
  service_name          = module.ecs_fargate.ecs_service_name
  ecs_cluster_name      = aws_ecs_cluster.ecs_cluster.arn
  privileged_mode       = "true"
}

Note the pipeline is synced to GitHub with a webhook trigger enabled, and you’ll need to supply a GitHub personal access token for this. So go create one if you haven’t already done so.


Step-2: Create the Serverless Pipeline with Terraform

Configure AWS environment variables

[root@cloud-ops01 tf-aws-eks]# aws configure
AWS Access Key ID [*****]: 
AWS Secret Access Key [***]: 
Default region name [us-east-1]: 
Default output format [json]:

Update terraform.tfvars based on your own environment:

region = "us-east-1"
ecs_service_name = "ecs-svc-example"
container_port = 3000
container_name = "demo-app"
namespace = "xxx"
stage = "dev"
app_name = "demo-app-xxxx"
ecr_repo = "fargate-demo-repo"
github_oath_token = "xxxx"
github_webhooks_token = "xxxx"
github_repo_owner = "xxxx"
github_repo_name = "fargate-demo-app"

Now run the Terraform script

terraform init
terraform apply

The process will take about 5 mins and you should see an output like this. Note the public URL of the ALB, which is providing LB services to the 2x Fargate container tasks.

Step-3: Review the Fargate Service

On the AWS Console, go to “Elastic Container Service (ECS) —> Cluster” and we can see an ECS cluster “default” has been created, with 1x Fargate service defined and 2x container tasks/pods running.

Here are the two running container tasks/pods:

Click any of the tasks to confirm it’s running our demo app image deployed from the ECR repository.

Next, search for the AWS service “Developer Tools —> CodePipeline”; you’ll see our pipeline has been deployed with a first successful execution.

Now search for “EC2 —> Load Balancer” and confirm that an ALB has been created; it should be deployed on two different subnets across two AZs.

This is because we are spreading the 2x ECS container tasks across two AZs for high availability.

Go to the ALB public DNS/URL and you should see the default page of our demo app running on AWS Fargate, cool!

Step-4: Test the Pipeline Run

It’s testing time now! As discussed, the pipeline is synced to the GitHub repository and will be triggered by a push-to-master event. The actual build task is defined within buildspec.yaml, which contains a simple 3-stage process as shown below. Note the output of the build process includes a JSON artifact (imagedefinitions.json) containing the ECR path of the latest build image.

version: 0.2
phases:
  pre_build:
    commands:
      - echo Logging in to Amazon ECR...
      - aws --version
      - eval $(aws ecr get-login --region $AWS_DEFAULT_REGION --no-include-email)
      - REPOSITORY_URI=$AWS_ACCOUNT_ID.dkr.ecr.$AWS_REGION.amazonaws.com/$IMAGE_REPO_NAME
      - IMAGE_TAG=$(echo $CODEBUILD_RESOLVED_SOURCE_VERSION | cut -c 1-7)
  build:
    commands:
      - echo Build started on `date`
      - echo Building the Docker image...
      - REPO_URI=$AWS_ACCOUNT_ID.dkr.ecr.$AWS_REGION.amazonaws.com/$IMAGE_REPO_NAME
      - docker pull $REPO_URI:latest || true
      - docker build --cache-from $REPO_URI:latest --tag $REPO_URI:latest --tag $REPO_URI:$IMAGE_TAG .
  post_build:
    commands:
      - echo Build completed on `date`
      - echo Pushing the Docker images...
      - REPO_URI=$AWS_ACCOUNT_ID.dkr.ecr.$AWS_REGION.amazonaws.com/$IMAGE_REPO_NAME
      - docker push $REPO_URI:latest
      - docker push $REPO_URI:$IMAGE_TAG
      - echo Writing image definitions file...
      - printf '[{"name":"demo-app","imageUri":"%s"}]' "$REPO_URI:$IMAGE_TAG" | tee imagedefinitions.json
artifacts:
  files: imagedefinitions.json

To test the pipeline run, we’ll make a “cosmetic change” to the app revision (v1.0 —> v1.1).

Commit and push to master.
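This is just a standard Git workflow, for example:

[root@pacific-ops01 fargate-demo-app]# git add .
[root@pacific-ops01 fargate-demo-app]# git commit -m "bump app revision to v1.1"
[root@pacific-ops01 fargate-demo-app]# git push origin master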

As expected, this has triggered a new pipeline run.

Soon you’ll see two additional Pods launching with a new revision number of “3” — this is because, by default, Fargate implements a rolling update deployment strategy with a minimum healthy percent of 100%, so it will not remove the previous container Pods (revision 2) until the new ones are running and ready.

Once the v3 Pods are running, we can see the v2 Pods being terminated and de-registered from the service.

Eventually the v2 Pods are removed, and the Fargate service is updated to revision 3, which consists of the new Pods running our demo app “v1.1”.

In the CodePipeline history, verify the new build and deployment have completed successfully.

Also, verify the new image (tag “99cc610”) of the demo app is pushed to ECR as expected.

Go to the Fargate tasks (revision 3) again and verify the container pods are indeed running on the new image “99cc610”.

Refresh the ALB address to see the v1.1 app page loading — Magic!

Cloud Native DevOps on GCP Series Ep3 – Use Terraform to launch a Serverless CI/CD pipeline with Cloud Run, GCR and Cloud Build

This is the third episode of our Cloud Native DevOps on GCP series. In the previous chapters, we have achieved the following:

This time, we will take a step further and go completely serverless by deploying the same Node app onto the Google Cloud Run platform. Cloud Run is built on an open source project named Knative, a serverless framework based on the industry-proven Kubernetes architecture. While Knative follows the same event-driven concept as other serverless solutions, it also offers great flexibility and multi-cloud portability at the container level.

For this demo, we will first launch a Cloud Run Service with an initial image using the cloudrun-hello app provided by Google. We will also create a Cloud Build pipeline to automatically build and push our Node app to GCR, and then deploy it to the same Cloud Run Service (as a new revision). As before, the pipeline will be synced to the GitHub repository and automatically triggered by a Git push event.

Best of all, all GCP resources in this environment, including the Cloud Run Service and the Cloud Build pipeline, will be provisioned via Terraform, as illustrated below.

WHAT YOU’LL NEED:

  • Access to a GCP testing environment
  • Install Git and Terraform on your client
  • Install GCloud SDK
  • Check the NTP clock & sync status on your client —> important!
  • Clone or download the Terraform script here
  • Clone or fork the NodeJS demo app here

Step-1: Prepare the GCloud Environment

To start, configure the GCloud environment variables and authentications.

gcloud init
gcloud config set accessibility/screen_reader true
gcloud auth application-default login

Enable required GCP API services

gcloud services enable servicenetworking.googleapis.com
gcloud services enable cloudresourcemanager.googleapis.com
gcloud services enable cloudbuild.googleapis.com
gcloud services enable containerregistry.googleapis.com 
gcloud services enable run.googleapis.com 
gcloud services enable sourcerepo.googleapis.com    

Update the Cloud Build service account with the necessary roles so it has the required permissions to access Cloud Run and GCR within the project.

PROJECT_ID=`gcloud config get-value project`
CLOUDBUILD_SA="$(gcloud projects describe $PROJECT_ID --format 'value(projectNumber)')@cloudbuild.gserviceaccount.com"
gcloud projects add-iam-policy-binding $PROJECT_ID --member serviceAccount:$CLOUDBUILD_SA --role roles/editor
gcloud projects add-iam-policy-binding $PROJECT_ID --member serviceAccount:$CLOUDBUILD_SA --role roles/run.admin
gcloud projects add-iam-policy-binding $PROJECT_ID --member serviceAccount:$CLOUDBUILD_SA --role roles/container.developer

Step-2: Connect Cloud Build to GitHub Repository

Next, let’s connect Cloud Build to the demo app Git repository. On the GCP console, go to “Cloud Build —> Triggers —> Connect Repository” and then select “GitHub” as below. (You will be redirected to GitHub for authentication.)

Select the demo app repository which contains the sample NodeJS application.

On the next page, make sure to click “Skip for now” and we are done. We’ll leave it to Terraform to create the trigger later.

Step-3: Run the Terraform Script to Launch a Serverless CI/CD Pipeline

Before executing the script, make sure to update the variables (as defined in terraform.tfvars) as per your own GCP environment.

project_id = "xxxxxxxx"
location = "asia-northeast1"
gcr_region = "asia"
github_owner = "xxxxxx"
github_repository = "xxxxxx"

Run the Terraform script.

terraform init
terraform apply

Since we are not provisioning any infrastructure resources (it’s serverless!), the process should take less than 2~3 mins. Take a note of the URL provided in the output — this is the public URL of our Cloud Run Service.
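If you miss the URL in the Terraform output, you can always pull it back with gcloud — assuming the service name matches the one used later in the pipeline config (cloudrun-demo) and the region set in the variables:

gcloud run services describe cloudrun-demo --platform managed --region asia-northeast1 --format 'value(status.url)'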

On the GCP console, verify the Cloud Run Service has been deployed successfully.

Now go to the above URL and you should see the default page of the cloudrun-hello app.

Before we move forward, confirm there is now a Cloud Build trigger provisioned by Terraform, with the pipeline config defined in “cloudbuild.yaml”.

Step-4: Test the Pipeline

Now let’s take a closer look at the pipeline code. This is a basic 3-stage pipeline:

  • Build the demo Node app
  • Push the image to GCR
  • Deploy the image from GCR to the existing Cloud Run Service

steps:
  # Build Node app docker image
  - name: "gcr.io/cloud-builders/docker"
    args:
      - build
      - -t
      - ${_GCR_REGION}.gcr.io/$PROJECT_ID/${_SERVICE_NAME}:$COMMIT_SHA
      - .

  # Push Node app image to GCR
  - name: "gcr.io/cloud-builders/docker"
    args:
      - push 
      - ${_GCR_REGION}.gcr.io/$PROJECT_ID/${_SERVICE_NAME}:$COMMIT_SHA

  # Deploy the docker image to Cloud Run Service
  - name: "gcr.io/cloud-builders/gcloud"
    args:
      - run
      - deploy
      - ${_SERVICE_NAME}
      - --image=${_GCR_REGION}.gcr.io/$PROJECT_ID/${_SERVICE_NAME}:$COMMIT_SHA
      - --region=${_LOCATION}
      - --platform=managed

images:
  - "${_GCR_REGION}.gcr.io/$PROJECT_ID/${_SERVICE_NAME}:$COMMIT_SHA"

timeout: 1200s
substitutions:
  _LOCATION: asia-northeast1 
  _GCR_REGION: asia 
  _SERVICE_NAME: cloudrun-demo

Time to test the pipeline! We’ll add a note into the README file.

Commit and push to Git.

This should automatically trigger the pipeline, and the 3-stage process should be completed in around a minute 🙂

Now go back to our Cloud Run Service; you should see a new revision has been deployed by Cloud Build, with the container image now pointing to the GCR path (which contains our demo app).

Refresh the browser and Boom — you now have access to the demo app running on Google Cloud Run!

This concludes our Cloud Native DevOps on GCP series. I hope this has been informative and thanks very much for reading!