
How to check if the Kubernetes control plane is healthy


Disclaimer

I previously published this post on my work blog, https://reece.tech.

Why is this important

We are running an on-premises Kubernetes cluster (currently version 1.11.6) on Red Hat Linux 7.5 (in VMware). Most documentation (especially around master version upgrades) mentions checking that the control plane is healthy before performing any cluster changes. This is an important step for consistency and repeatability, and equally important during day-to-day management of your cluster - but how exactly do we do this?

Our approach

Our (multi-master) Kubernetes control plane consists of several components: etcd, kube-apiserver, kube-scheduler, kube-controller-manager and so on. Each component should be verified during this process.

Starting simple

Run kubectl get nodes -o wide to ensure all nodes are Ready, and check that the master servers carry the master role.
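To avoid eyeballing the output, a small sketch like the following (assuming the standard kubectl output columns, where STATUS is the second field) fails if any node is not Ready:

```shell
# Print any node whose STATUS column is not exactly "Ready" and
# exit non-zero if one is found (column layout assumed from kubectl 1.11)
kubectl get nodes --no-headers | awk '$2 != "Ready" { print; bad = 1 } END { exit bad }'
```

Note that a cordoned node shows Ready,SchedulingDisabled and would also be flagged, which is usually what you want to know about before maintenance.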

Running kubectl get cs shows the status of vital control plane components at a glance.

# kubectl get cs
NAME                 STATUS    MESSAGE              ERROR
controller-manager   Healthy   ok                   
scheduler            Healthy   ok                   
etcd-2               Healthy   {"health": "true"}   
etcd-4               Healthy   {"health": "true"}   
etcd-1               Healthy   {"health": "true"}   
etcd-3               Healthy   {"health": "true"}   
etcd-0               Healthy   {"health": "true"}

Load balancing

Our apiserver endpoint is load balanced (via an F5 load balancer). Because a load balancer can mask a failed backend, check the health of each master node directly to ensure all of them are up and responding to requests.
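One way to do that is to query each apiserver's /healthz endpoint directly, bypassing the load balancer VIP. The IP addresses below are placeholders - substitute your own master node addresses:

```shell
# Query /healthz on every master directly (placeholder IPs).
# A healthy apiserver answers with the literal string "ok".
for ip in 10.0.0.20 10.0.0.21 10.0.0.22; do
  echo "${ip}: $(curl -sk "https://${ip}:6443/healthz")"
done
```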

etcd

We are running etcd (3.2.22) as a standalone systemd service on each of our 5 Kubernetes master nodes. The health of this service can be checked using the etcdctl tool as follows:

# etcdctl member list
e7e4dac626988c1e: name=victkubea02.XX.yy peerURLs=http://10.XX.XX.21:2380 clientURLs=http://0.0.0.0:2379 isLeader=false
ee3ce541f522cb82: name=victkubea05.XX.yy peerURLs=http://10.XX.XX.24:2380 clientURLs=http://0.0.0.0:2379 isLeader=false
efecd5b345acd2e2: name=victkubea04.XX.yy peerURLs=http://10.XX.XX.23:2380 clientURLs=http://0.0.0.0:2379 isLeader=false
f23ec2be4c9c3a2f: name=victkubea03.XX.yy peerURLs=http://10.XX.XX.22:2380 clientURLs=http://0.0.0.0:2379 isLeader=false
ffeb4905ae8fa6c0: name=victkubea01.XX.yy peerURLs=http://10.XX.XX.20:2380 clientURLs=http://0.0.0.0:2379 isLeader=true

# etcdctl cluster-health
member e7e4dac626988c1e is healthy: got healthy result from http://0.0.0.0:2379
member ee3ce541f522cb82 is healthy: got healthy result from http://0.0.0.0:2379
member efecd5b345acd2e2 is healthy: got healthy result from http://0.0.0.0:2379
member f23ec2be4c9c3a2f is healthy: got healthy result from http://0.0.0.0:2379
member ffeb4905ae8fa6c0 is healthy: got healthy result from http://0.0.0.0:2379
cluster is healthy

As you can see, there is one leader with four followers, which is expected. Each node is healthy, as is the cluster overall. We could stop here, or dig deeper by comparing key / value contents across nodes to be certain everything is in sync. A relatively simple test (short of dumping all keys / values into a local file and diffing each node's data) is to count the lines returned for the key / value pairs on every etcd server.

# ETCDCTL_API=3 etcdctl get / --prefix | wc -l
734297
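To run the same count on every member in one go, a loop over SSH works. The hostnames below are placeholders; the counts should match, or be very close, since writes may land between the individual queries:

```shell
# Count key/value lines on each etcd member and print them side by side
# (placeholder hostnames - substitute your master nodes)
for host in master01 master02 master03 master04 master05; do
  echo "${host}: $(ssh "${host}" 'ETCDCTL_API=3 etcdctl get / --prefix | wc -l')"
done
```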

kube-scheduler

There can only be one active kube-scheduler at a time - the schedulers on the other master servers stay dormant. This is achieved through leader election: the active scheduler periodically renews a lease, recorded as an annotation on an Endpoints object (and thus ultimately stored in etcd). The following command shows which master currently holds the lease.

# kubectl get endpoints kube-scheduler -n kube-system -o yaml | grep control-plane
    control-plane.alpha.kubernetes.io/leader: '{"holderIdentity":"victkubea01_399a5ce9-fcf4-11e8-8adc-005056a8097d","leaseDurationSeconds":15,"acquireTime":"2018-12-11T03:43:59Z","renewTime":"2019-01-23T23:43:22Z","leaderTransitions":29}'

In this example, victkubea01 is currently the kube-scheduler leader, there have been 29 leadership transitions so far, and the lease duration is 15 seconds. Should this kube-scheduler fail to renew the lease, another kube-scheduler will become the leader and take over. This happens during rolling restarts or updates.

Stopping kubelet and docker temporarily on victkubea01 should lead to a leadership change within 30 seconds.
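If you only want the current holder rather than the whole annotation, a sketch like this extracts it - jsonpath needs the dots in the annotation key escaped, and sed then pulls holderIdentity out of the JSON value:

```shell
# Print just the current kube-scheduler leader
kubectl get endpoints kube-scheduler -n kube-system \
  -o jsonpath='{.metadata.annotations.control-plane\.alpha\.kubernetes\.io/leader}' \
  | sed 's/.*"holderIdentity":"\([^"]*\)".*/\1/'
```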

kube-controller-manager

Similar to the kube-scheduler, the kube-controller-manager can only be active on one of the master nodes.

# kubectl get endpoints kube-controller-manager -n kube-system -o yaml | grep control-plane
    control-plane.alpha.kubernetes.io/leader: '{"holderIdentity":"victkubea01_5d1b0efc-d720-11e8-91ec-005056a8097d","leaseDurationSeconds":15,"acquireTime":"2018-10-24T02:54:03Z","renewTime":"2019-01-23T23:53:21Z","leaderTransitions":28}'

kube-apiserver

The kube-apiserver process / pod is active on every master node. Accessing this endpoint directly using kubectl or curl on each master node is a simple first check.

# curl -k https://10.XX.XX.20:6443/api/
{
  "kind": "APIVersions",
  "versions": [
    "v1"
  ],
  "serverAddressByClientCIDRs": [
    {
      "clientCIDR": "0.0.0.0/0",
      "serverAddress": "10.XX.XX.225:6443"
    },
    {
      "clientCIDR": "10.XX.XX.0/12",
      "serverAddress": "10.XX.XX.1:443"
    }
  ]
}

Another good way to ensure that all API servers are working together and sharing state correctly is to compare the checksum of the configuration as seen from each master. Depending on your admin.conf file, this may hit the same endpoint every time or different API servers, so either run it multiple times or make sure it queries different API servers.

# kubeadm config view | md5sum
5062203b981028963247ceb1953dd210  -
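To be certain the comparison covers every apiserver (rather than the same endpoint repeatedly), you can run the checksum on each master over SSH. Hostnames are placeholders; all checksums should be identical:

```shell
# Checksum the kubeadm configuration as seen from each master
# (placeholder hostnames - the md5sums should all match)
for host in master01 master02 master03; do
  echo "${host}: $(ssh "${host}" 'kubeadm config view | md5sum')"
done
```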

Checking the CPU utilisation of each master server (especially the kube-apiserver container / process) can also shed light on configuration problems. CPU utilisation should be below 10 percent - during previous upgrades we have experienced high load (100% CPU) from this process on every master node, clearly indicating a problem.
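As a quick sketch using the 10 percent threshold from above - and assuming kube-apiserver is visible as a host process of that name, which may differ in containerised setups:

```shell
# Print kube-apiserver CPU usage and warn above 10%
# (process name is an assumption; adjust for your container runtime)
ps -C kube-apiserver -o %cpu= \
  | awk '{ printf "kube-apiserver CPU: %s%%\n", $1; if ($1 > 10) print "WARNING: above 10%" }'
```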

Pods

Check that all pods are running. This can be limited to the kube-system namespace if your other namespaces routinely contain pods in non-running states.

kubectl get pods -o wide --all-namespaces | grep -vE 'Running|Completed'
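The same check can gate a maintenance script. This sketch counts pods in any state other than Running or Completed and fails if it finds any:

```shell
# Count pods that are neither Running nor Completed across all namespaces
# and abort if there are any (grep -cv prints the count of non-matching lines)
bad=$(kubectl get pods --all-namespaces --no-headers | grep -cvE 'Running|Completed')
if [ "$bad" -gt 0 ]; then
  echo "${bad} pod(s) are not Running/Completed - investigate before proceeding"
  exit 1
fi
echo "all pods healthy"
```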

Once all of these tests have completed successfully, you can start performing maintenance or upgrades on your cluster.
