NKE lab series – Ep3: Deep dive into NKE networking with Calico CNI

This is the third episode of our NKE lab series. Previously, I walked through deploying an NKE-enabled K8s cluster on Nutanix CE (Ep1) and deploying the Guestbook demo app onto it (Ep2).

In this episode, we’ll take a deep dive into NKE networking by exploring the following:

  • PART-1: Exploring Calico CNI deployment models within an NKE cluster
  • PART-2: Applying standard Kubernetes network policy in an NKE cluster
  • PART-3: Leveraging Calico-specific policies in an NKE cluster

Prerequisites

  • a 1-node or 3-node Nutanix CE 2.0 cluster (depending on your lab compute capacity) deployed in nested virtualization, as documented here and here
  • an NKE-enabled K8s cluster deployed on Nutanix CE (see Ep1)
  • the Guestbook demo app deployed onto the NKE cluster (see Ep2)
  • a lab network environment that supports VLAN tagging and provides basic infrastructure services such as AD, DNS and NTP (these are required when installing the CE cluster)
  • a Linux/Mac workstation for managing the Kubernetes cluster, with kubectl installed

PART-1: Exploring Calico CNI models in NKE

Calico is one of the most popular CNI plugins in the Kubernetes community, and it has been widely deployed in production thanks to its reliable performance and comprehensive networking and security features.

Calico supports a variety of flexible Kubernetes networking deployment options, including:

  • Non-overlay networking (the most performant model, with no encapsulation involved)
    • Flat L2 mode (full-mesh BGP peering between K8s nodes to route Pod IPs)
    • BGP route reflector mode (uses iBGP route reflectors to reduce peering pressure in a large cluster; reflectors can be ToR L3 switches, or simply nodes acting as software reflectors)
  • Overlay networking (for cross-subnet cluster connectivity, or where the underlying network has no BGP support)
    • IP-in-IP encapsulation
    • VXLAN encapsulation

In an NKE cluster, Calico is pre-configured to use the (default) Flat L2 mode, where all K8s nodes are deployed within the same L2 subnet and establish a full-mesh BGP peering to exchange Pod IP prefixes. This is the simplest yet most performant option, as it introduces no additional data-plane encapsulation overhead.
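
As a back-of-the-envelope sketch of why full-mesh peering only scales so far (and why the route reflector option exists), note that the session count grows quadratically with the node count. This is plain arithmetic, not Calico-specific code:

```python
# In a full mesh, every node peers with every other node, so the number
# of BGP sessions grows quadratically with the node count. This is why
# larger clusters switch to the route reflector model described above.
def full_mesh_sessions(nodes: int) -> int:
    """Total BGP sessions in a full mesh of `nodes` nodes."""
    return nodes * (nodes - 1) // 2

def peers_per_node(nodes: int) -> int:
    """BGP peers each node maintains in a full mesh."""
    return nodes - 1

# Our 3-node NKE cluster (1 master + 2 workers):
print(full_mesh_sessions(3))   # 3 sessions in total
print(peers_per_node(3))       # 2 peers per node
# A 100-node cluster would need 4950 sessions -- hence route reflectors.
print(full_mesh_sessions(100))
```

For our small lab cluster, a full mesh of 3 sessions is trivial, which is why the Flat L2 default is a perfectly good fit here.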

Let’s check our NKE cluster to find out more. First, let’s see which Calico CNI API resources are available to us.

sc@vx-ops02:~$ kubectl api-resources | grep calico
bgpconfigurations                                     crd.projectcalico.org/v1               false        BGPConfiguration
bgpfilters                                            crd.projectcalico.org/v1               false        BGPFilter
bgppeers                                              crd.projectcalico.org/v1               false        BGPPeer
blockaffinities                                       crd.projectcalico.org/v1               false        BlockAffinity
caliconodestatuses                                    crd.projectcalico.org/v1               false        CalicoNodeStatus
clusterinformations                                   crd.projectcalico.org/v1               false        ClusterInformation
felixconfigurations                                   crd.projectcalico.org/v1               false        FelixConfiguration
globalnetworkpolicies                                 crd.projectcalico.org/v1               false        GlobalNetworkPolicy
globalnetworksets                                     crd.projectcalico.org/v1               false        GlobalNetworkSet
hostendpoints                                         crd.projectcalico.org/v1               false        HostEndpoint
ipamblocks                                            crd.projectcalico.org/v1               false        IPAMBlock
ipamconfigs                                           crd.projectcalico.org/v1               false        IPAMConfig
ipamhandles                                           crd.projectcalico.org/v1               false        IPAMHandle
ippools                                               crd.projectcalico.org/v1               false        IPPool
ipreservations                                        crd.projectcalico.org/v1               false        IPReservation
kubecontrollersconfigurations                         crd.projectcalico.org/v1               false        KubeControllersConfiguration
networkpolicies                                       crd.projectcalico.org/v1               true         NetworkPolicy
networksets                                           crd.projectcalico.org/v1               true         NetworkSet

If we query the ippools.crd.projectcalico.org API, we can see the Calico deployment details.

sc@vx-ops02:~$ kubectl get ippools.crd.projectcalico.org -o yaml > nke-dev01-calico-ippool.yaml
sc@vx-ops02:~$ cat nke-dev01-calico-ippool.yaml 
...
  spec:
    allowedUses:
    - Workload
    - Tunnel
    blockSize: 26
    cidr: 172.20.0.0/16
    ipipMode: Never
    natOutgoing: true
    nodeSelector: all()
    vxlanMode: Never
kind: List

As shown above, we are not using any overlay encapsulation (both ipipMode and vxlanMode are set to Never). Each K8s node is assigned a /26 block from the pre-allocated Pod CIDR (172.20.0.0/16), which is used to assign IP addresses to its local Pods.
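
To put numbers on that block allocation, here is a quick sketch using Python’s standard ipaddress module (the figures follow directly from the prefix lengths; the number of Pod IPs actually usable per block may differ slightly depending on Calico’s IPAM behaviour):

```python
import ipaddress

# How many /26 blocks can be carved out of the /16 Pod CIDR, and how
# many addresses does each node-local block hold?
pod_cidr = ipaddress.ip_network("172.20.0.0/16")
blocks = list(pod_cidr.subnets(new_prefix=26))

print(len(blocks))              # 1024 blocks available across all nodes
print(blocks[0].num_addresses)  # 64 addresses per block
```

So each node starts with a 64-address block, and the pool can serve up to 1024 blocks; Calico IPAM will allocate additional blocks to a node if it exhausts its first one.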

We can also query the caliconodestatuses.crd.projectcalico.org API for further node networking details. However, to use this feature we first need to create a CalicoNodeStatus resource specifying which node, and what information, we want to collect.

sc@vx-ops02:~$ cat calico-node-status.yaml 
---
apiVersion: crd.projectcalico.org/v1
kind: CalicoNodeStatus
metadata:
  name: caliconodestatus-master-0
spec:
  classes:
    - Agent
    - BGP
    - Routes
  node: nke-dev01-89e792-master-0 
  updatePeriodSeconds: 10

---
apiVersion: crd.projectcalico.org/v1
kind: CalicoNodeStatus
metadata:
  name: caliconodestatus-worker-0
spec:
  classes:
    - Agent
    - BGP
    - Routes
  node: nke-dev01-89e792-worker-0 
  updatePeriodSeconds: 10

---
apiVersion: crd.projectcalico.org/v1
kind: CalicoNodeStatus
metadata:
  name: caliconodestatus-worker-1
spec:
  classes:
    - Agent
    - BGP
    - Routes
  node: nke-dev01-89e792-worker-1 
  updatePeriodSeconds: 10


sc@vx-ops02:~$ kubectl apply -f calico-node-status.yaml 
caliconodestatus.crd.projectcalico.org/caliconodestatus-master-0 created
caliconodestatus.crd.projectcalico.org/caliconodestatus-worker-0 created
caliconodestatus.crd.projectcalico.org/caliconodestatus-worker-1 created

Once deployed, give it around 30 seconds to collect the information, then query the caliconodestatuses.crd.projectcalico.org API:

sc@vx-ops02:~$ kubectl get caliconodestatuses.crd.projectcalico.org -o yaml > nke-dev01-caliconodestatus.yaml
sc@vx-ops02:~$ cat nke-dev01-caliconodestatus.yaml 

There’s a lot of information here, but we’ll focus on this section under the master node:

...
    bgp:
      numberEstablishedV4: 2
      numberEstablishedV6: 0
      numberNotEstablishedV4: 0
      numberNotEstablishedV6: 0
      peersV4:
      - peerIP: 192.168.102.104
        since: "14:19:16"
        state: Established
        type: NodeMesh
      - peerIP: 192.168.102.103
        since: "14:19:26"
        state: Established
        type: NodeMesh

    routes:
      routesV4:
...
      - destination: 172.20.188.64/26
        gateway: 192.168.102.103
        interface: eth0
        learnedFrom:
          peerIP: 192.168.102.103
          sourceType: NodeMesh
        type: FIB
      - destination: 172.20.116.128/26
        gateway: 192.168.102.104
        interface: eth0
        learnedFrom:
          peerIP: 192.168.102.104
          sourceType: NodeMesh
        type: FIB

From the above we can confirm that the master node (192.168.102.102) has established full-mesh BGP peering with the other two worker nodes:

  • Worker-0: 192.168.102.103 (Pod route 172.20.188.64/26 learned via BGP)
  • Worker-1: 192.168.102.104 (Pod route 172.20.116.128/26 learned via BGP)

We can further confirm these Pod routes by looking at the Pod IP addresses on each worker node:

sc@vx-ops02:~$ kubectl get pods --all-namespaces -o wide | grep nke-dev01-89e792-worker-1 | grep 172.20
guestbook        frontend-795b566649-4zkqc                    1/1     Running   0              82m   172.20.116.140    nke-dev01-89e792-worker-1   <none>           <none>
guestbook        frontend-795b566649-8tf76                    1/1     Running   0              82m   172.20.116.139    nke-dev01-89e792-worker-1   <none>           <none>
guestbook        redis-follower-5ffdf87b7d-4lqlr              1/1     Running   0              82m   172.20.116.141    nke-dev01-89e792-worker-1   <none>           <none>
guestbook        redis-leader-c767d6dbb-t5t8j                 1/1     Running   0              82m   172.20.116.142    nke-dev01-89e792-worker-1   <none>           <none>
kube-system      calico-kube-controllers-5cd67d7657-52km8     1/1     Running   0              11h   172.20.116.128    nke-dev01-89e792-worker-1   <none>           <none>
kube-system      coredns-6fb596b5df-p7mhq                     1/1     Running   0              11h   172.20.116.129    nke-dev01-89e792-worker-1   <none>           <none>
metallb-system   controller-77676c78d9-5gvd6                  1/1     Running   0              93m   172.20.116.136    nke-dev01-89e792-worker-1   <none>           <none>
ntnx-system      alertmanager-main-0                          2/2     Running   1 (11h ago)    11h   172.20.116.133    nke-dev01-89e792-worker-1   <none>           <none>
ntnx-system      blackbox-exporter-5458d77cfb-d62kc           3/3     Running   0              11h   172.20.116.132    nke-dev01-89e792-worker-1   <none>           <none>
ntnx-system      csi-snapshot-webhook-756b45fb5c-t9k8k        1/1     Running   0              11h   172.20.116.131    nke-dev01-89e792-worker-1   <none>           <none>
ntnx-system      fluent-bit-v8gdt                             1/1     Running   0              11h   172.20.116.130    nke-dev01-89e792-worker-1   <none>           <none>
sc@vx-ops02:~$ 
sc@vx-ops02:~$ kubectl get pods --all-namespaces -o wide | grep nke-dev01-89e792-worker-0 | grep 172.20
guestbook        frontend-795b566649-j4r75                    1/1     Running   0              82m   172.20.188.73     nke-dev01-89e792-worker-0   <none>           <none>
kube-system      coredns-6fb596b5df-4kkrc                     1/1     Running   0              11h   172.20.188.64     nke-dev01-89e792-worker-0   <none>           <none>
ntnx-system      csi-snapshot-controller-7d68bf5bd7-6fg9c     1/1     Running   0              11h   172.20.188.65     nke-dev01-89e792-worker-0   <none>           <none>
ntnx-system      fluent-bit-t9d7c                             1/1     Running   0              11h   172.20.188.66     nke-dev01-89e792-worker-0   <none>           <none>
ntnx-system      kube-state-metrics-54c97cdfdd-rkm2m          3/3     Running   0              11h   172.20.188.69     nke-dev01-89e792-worker-0   <none>           <none>
ntnx-system      kubernetes-events-printer-6f44868d47-5sg98   1/1     Running   0              11h   172.20.188.67     nke-dev01-89e792-worker-0   <none>           <none>
ntnx-system      prometheus-adapter-678c647d87-rbkqc          1/1     Running   0              11h   172.20.188.70     nke-dev01-89e792-worker-0   <none>           <none>
ntnx-system      prometheus-k8s-0                             2/2     Running   0              11h   172.20.188.71     nke-dev01-89e792-worker-0   <none>           <none>
ntnx-system      prometheus-operator-f57b8d9cb-kpwp9          2/2     Running   0              11h   172.20.188.68     nke-dev01-89e792-worker-0   <none>           <none>

PART-2: Applying standard Kubernetes network policy

With Calico deployed in our NKE cluster, we can immediately use standard Kubernetes network policies to enhance cluster-wide security.

For example, by default we can access the Guestbook service from anywhere within the cluster. We can test this by launching a testpod in the default namespace and curling the frontend service (using the Kubernetes DNS short form service.namespace):

sc@vx-ops02:~$ kubectl run testpod -it  --rm --image=yauritux/busybox-curl -- sh
/home # 
/home # curl frontend.guestbook
<html ng-app="redis">
  <head>
    <title>Guestbook</title>
    <link rel="stylesheet" href="//netdna.bootstrapcdn.com/bootstrap/3.1.1/css/bootstrap.min.css">
    <script src="https://ajax.googleapis.com/ajax/libs/angularjs/1.2.12/angular.min.js"></script>
    <script src="controllers.js"></script>
    <script src="https://cdnjs.cloudflare.com/ajax/libs/angular-ui-bootstrap/2.5.6/ui-bootstrap-tpls.js"></script>
  </head>
  <body ng-controller="RedisCtrl">
    <div style="width: 50%; margin-left: 20px">
      <h2>Guestbook</h2>
    <form>
    <fieldset>
    <input ng-model="msg" placeholder="Messages" class="form-control" type="text" name="input"><br>
    <button type="button" class="btn btn-primary" ng-click="controller.onRedis()">Submit</button>
    </fieldset>
    </form>
    <div>
      <div ng-repeat="msg in messages track by $index">
        {{msg}}
      </div>
    </div>
    </div>
  </body>
</html>
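
The short name frontend.guestbook resolves because the pod’s DNS search list appends the cluster suffixes to it. A one-line sketch of the expansion (this assumes the default cluster domain of cluster.local; the helper function is illustrative, not a real Kubernetes API):

```python
# Expand a Kubernetes service short name to its fully qualified DNS name.
# "frontend.guestbook" -> "frontend.guestbook.svc.cluster.local"
def service_fqdn(service: str, namespace: str,
                 cluster_domain: str = "cluster.local") -> str:
    return f"{service}.{namespace}.svc.{cluster_domain}"

print(service_fqdn("frontend", "guestbook"))
# frontend.guestbook.svc.cluster.local
```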

What if we want to allow only a certain namespace to access the Guestbook service? For example, we can use the Kubernetes ingress network policy below to restrict access to our guestbook namespace to traffic originating from the “testns” namespace.

sc@vx-ops02:~$ kubectl create ns testns

sc@vx-ops02:~$ kubectl get ns --show-labels | grep testns
testns            Active   40m     kubernetes.io/metadata.name=testns

sc@vx-ops02:~$ cat net-policy.yaml 
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: mypolicy
  namespace: guestbook
spec:
  podSelector: {}
  policyTypes:
  - Ingress
  ingress:
  - from:
    - namespaceSelector:
        matchLabels:
          kubernetes.io/metadata.name: testns

sc@vx-ops02:~$ kubectl apply -f net-policy.yaml 
networkpolicy.networking.k8s.io/mypolicy created

After applying the ingress policy, we should only be able to access the frontend service from within the testns namespace. Open another terminal and launch a new testpod in the testns namespace to verify this.

sc@vx-ops02:~$  kubectl run testpod -n testns -it  --rm --image=yauritux/busybox-curl -- sh

/home # curl frontend.guestbook
<html ng-app="redis">
  <head>
    <title>Guestbook</title>
    <link rel="stylesheet" href="//netdna.bootstrapcdn.com/bootstrap/3.1.1/css/bootstrap.min.css">
    <script src="https://ajax.googleapis.com/ajax/libs/angularjs/1.2.12/angular.min.js"></script>
    <script src="controllers.js"></script>
    <script src="https://cdnjs.cloudflare.com/ajax/libs/angular-ui-bootstrap/2.5.6/ui-bootstrap-tpls.js"></script>
  </head>
  <body ng-controller="RedisCtrl">
    <div style="width: 50%; margin-left: 20px">
      <h2>Guestbook</h2>
    <form>
    <fieldset>
    <input ng-model="msg" placeholder="Messages" class="form-control" type="text" name="input"><br>
    <button type="button" class="btn btn-primary" ng-click="controller.onRedis()">Submit</button>
    </fieldset>
    </form>
    <div>
      <div ng-repeat="msg in messages track by $index">
        {{msg}}
      </div>
    </div>
    </div>
  </body>
</html>
/home # 

Back in the previous terminal, we are now unable to connect to the frontend service from the first testpod in the default namespace. Cool!

/home # curl frontend.guestbook
curl: (28) Failed to connect to frontend.guestbook port 80 after 127280 ms: Operation timed out

PART-3: Leveraging Calico-specific policies

One of the limitations of standard Kubernetes network policy is that it only supports allow rules; explicit deny rules are not available. This makes it difficult to implement a blocklist scenario, where you want to block specific traffic but allow everything else.

The good news is that Calico’s own network policies do support explicit deny rules.

In the example below, we use Calico network policies to create two egress rules for the guestbook namespace: deny outbound access to 8.0.0.0/8, but allow access to everything else.

sc@vx-ops02:~$ cat net-policy1.yaml 
---
apiVersion: crd.projectcalico.org/v1
kind: NetworkPolicy
metadata:
  name: custom-deny-egress
  namespace: guestbook
spec:
  order: 10
  types:
    - Egress
  egress:
    - action: Deny
      destination:
        nets:
          - 8.0.0.0/8
---
apiVersion: crd.projectcalico.org/v1
kind: NetworkPolicy
metadata:
  name: default-allow-egress
  namespace: guestbook
spec:
  order: 20
  types:
    - Egress
  egress:
    - action: Allow
      destination:
        nets:
          - 0.0.0.0/0
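
Calico evaluates policies in ascending order, and the first matching rule wins; that is why the Deny at order 10 takes precedence over the catch-all Allow at order 20. A toy first-match evaluator (illustrative only, not the real Felix dataplane) behaves the same way:

```python
import ipaddress

# The two egress policies above, as (order, action, destination) tuples.
RULES = [
    (10, "Deny",  ipaddress.ip_network("8.0.0.0/8")),
    (20, "Allow", ipaddress.ip_network("0.0.0.0/0")),
]

def egress_verdict(dst: str) -> str:
    """Return the first matching action, evaluating rules in ascending order."""
    ip = ipaddress.ip_address(dst)
    for _order, action, net in sorted(RULES, key=lambda r: r[0]):
        if ip in net:
            return action
    return "Deny"  # implicit default once a Calico policy selects the pod

print(egress_verdict("8.8.8.8"))        # Deny  (matched 8.0.0.0/8 at order 10)
print(egress_verdict("1.1.1.1"))        # Allow (fell through to 0.0.0.0/0)
print(egress_verdict("192.168.100.5"))  # Allow
```

This also explains why the second, catch-all Allow policy is needed at all: once any Calico policy selects a pod, traffic not explicitly allowed is dropped.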

Now let’s apply the Calico policy.

sc@vx-ops02:~$ kubectl apply -f net-policy1.yaml 
networkpolicy.crd.projectcalico.org/custom-deny-egress created
networkpolicy.crd.projectcalico.org/default-allow-egress created

To test this, we can simply find a pod in the guestbook namespace, attach to it and run some ping tests.

sc@vx-ops02:~$ kubectl get pods -n guestbook 
NAME                              READY   STATUS    RESTARTS   AGE
frontend-795b566649-4zkqc         1/1     Running   0          7h3m
frontend-795b566649-8tf76         1/1     Running   0          7h3m
frontend-795b566649-j4r75         1/1     Running   0          7h3m
redis-follower-5ffdf87b7d-4lqlr   1/1     Running   0          7h3m
redis-leader-c767d6dbb-t5t8j      1/1     Running   0          7h3m

sc@vx-ops02:~$ kubectl exec -ti -n guestbook frontend-795b566649-j4r75 -- sh
# apt-get install iputils-ping

# ping 8.8.8.8
PING 8.8.8.8 (8.8.8.8) 56(84) bytes of data.
^C
--- 8.8.8.8 ping statistics ---
7 packets transmitted, 0 received, 100% packet loss, time 6ms

# ping 1.1.1.1
PING 1.1.1.1 (1.1.1.1) 56(84) bytes of data.
64 bytes from 1.1.1.1: icmp_seq=1 ttl=55 time=10.4 ms
64 bytes from 1.1.1.1: icmp_seq=2 ttl=55 time=9.33 ms
^C
--- 1.1.1.1 ping statistics ---
2 packets transmitted, 2 received, 0% packet loss, time 2ms
rtt min/avg/max/mdev = 9.330/9.856/10.382/0.526 ms
# 
# ping 192.168.100.5
PING 192.168.100.5 (192.168.100.5) 56(84) bytes of data.
64 bytes from 192.168.100.5: icmp_seq=1 ttl=126 time=0.877 ms
64 bytes from 192.168.100.5: icmp_seq=2 ttl=126 time=0.792 ms
^C

As we can see, the frontend pod is unable to ping Google’s DNS at 8.8.8.8 (which falls within the denied 8.0.0.0/8 range) but can still reach other networks. Exactly what we expected!