
Migrating Kubernetes from Docker to Containerd

Disclaimer

This post was previously published on my work blog, https://reece.tech.

Overview

I have operated multiple on-premises and cloud-hosted K8s clusters for many years, and we heavily utilise Docker as the container runtime for both master and worker nodes.

As most readers will be aware by now, the Kubernetes 1.20 release announced the deprecation and future removal of the much-loved Docker (dockershim) runtime interface.

This post documents our journey from Docker to a suitable replacement.

Options

The two most obvious alternatives were CRI-O and containerd. Since containerd is the default for many cloud-based K8s offerings and was already being used behind the scenes by our Docker layer anyway, the choice was quite easy.

Changes required

The main change (for K8s 1.19.5) was to install containerd instead of dockerd and then start kubelet with the additional command line options --container-runtime=remote --container-runtime-endpoint=/run/containerd/containerd.sock.
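For reference, a minimal sketch of how these flags can be wired in on a kubeadm-provisioned node; the file location and the KUBELET_EXTRA_ARGS variable follow common kubeadm conventions and are assumptions, not our exact setup.

# /etc/sysconfig/kubelet (assumed location; /etc/default/kubelet on Debian-based systems)
KUBELET_EXTRA_ARGS="--container-runtime=remote --container-runtime-endpoint=/run/containerd/containerd.sock"

After editing, reload systemd and restart kubelet (systemctl daemon-reload && systemctl restart kubelet) for the new runtime settings to take effect.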

The dedicated /var/lib/docker volume was renamed and remounted as /var/lib/containerd. We also added dedicated volumes for /var/lib/kubelet and /var/log, as disk space usage for these directories increased somewhat after the migration.

A newer version of crictl was required, as well as changes to /etc/crictl.yaml to set the default CRI runtime via runtime-endpoint: unix:///run/containerd/containerd.sock.
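For completeness, a minimal /etc/crictl.yaml along these lines; the runtime-endpoint is the one mentioned above, while the image-endpoint, timeout and debug values are illustrative assumptions:

# /etc/crictl.yaml
runtime-endpoint: unix:///run/containerd/containerd.sock
image-endpoint: unix:///run/containerd/containerd.sock
timeout: 10
debug: false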

Rather than running docker system prune -a -f on each worker periodically, we now use the script below on each containerd node to remove exited containers and clean up unused images.

for id in $(crictl ps -a | grep -i exited | awk '{print $1}'); do crictl rm "$id"; done
crictl rmi --prune

For CentOS 7 kernels, an additional kernel parameter was required, as we experienced random problems such as "cannot allocate memory" errors when kubelet was starting new pods, especially for Kubernetes cronjobs. This led to quite a few pods hanging in ContainerCreating state, which is obviously not ideal. Adding the option cgroup.memory=nokmem to the kernel command line fixed the issue for us.
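On CentOS 7 the parameter can be appended to the kernel command line with grubby, for example as below; this is a sketch rather than our exact procedure, so verify against your own boot configuration before rebooting.

# Append the parameter to all installed kernels (CentOS 7 / GRUB2)
grubby --update-kernel=ALL --args="cgroup.memory=nokmem"
# Confirm it is present, then reboot the node for it to take effect
grubby --info=ALL | grep nokmem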

Logging

Our logging pipeline has changed a bit over the years; the solution prior to migrating to containerd was a modified version of the https://github.com/looplab/logspout-logstash daemonset. Each pod read logs from journald, enriched them with cluster and Docker metadata, and forwarded them to Logstash instances located on each of our Elasticsearch servers. This solution was not perfect: logspout-logstash sometimes lost network connectivity to Logstash without recovering, and the combination of docker-ce and journald added considerable extra load to each worker.

The new logging solution with containerd uses fluent-bit to tail container logs from /var/log/containers/ and to ship them, including K8s labels, straight to Elasticsearch. Fluent-bit also filters out some unnecessary logging, such as health checks.
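A stripped-down sketch of what such a fluent-bit configuration can look like; the Elasticsearch host name, the tag and the health-check exclude pattern are illustrative assumptions rather than our exact production config:

# fluent-bit.conf (sketch; assumes the stock "cri" parser from parsers.conf)
[INPUT]
    Name              tail
    Path              /var/log/containers/*.log
    Parser            cri
    Tag               kube.*

[FILTER]
    Name              kubernetes
    Match             kube.*
    Merge_Log         On

[FILTER]
    Name              grep
    Match             kube.*
    Exclude           log /healthz

[OUTPUT]
    Name              es
    Match             kube.*
    Host              elasticsearch.example.internal
    Port              9200
    Logstash_Format   On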

Familiarity

After years of using familiar Docker commands, we suddenly found ourselves learning and using ctr and crictl. We introduced a temporary docker shell script wrapper which runs the equivalent ctr and crictl commands for troubleshooting tasks such as docker images, docker ps or docker rm, to name a few.
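A simplified sketch of such a wrapper (not our exact script) could look like the following, mapping a handful of common Docker subcommands onto their crictl equivalents:

#!/bin/sh
# Hypothetical "docker" wrapper: translate a few common subcommands to crictl.
# Install it earlier in $PATH than any real docker binary would be.
cmd="$1"
[ -n "$cmd" ] && shift
case "$cmd" in
  images) exec crictl images "$@" ;;
  ps)     exec crictl ps "$@" ;;
  rm)     exec crictl rm "$@" ;;
  rmi)    exec crictl rmi "$@" ;;
  logs)   exec crictl logs "$@" ;;
  *)      echo "docker wrapper: '$cmd' not supported, use crictl or ctr directly" >&2; exit 1 ;;
esac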

Other side effects

The average worker load has decreased considerably, most likely due to migrating away from journald. Also, having pod logs in the local /var/log/containers directory made debugging (especially of the logging pipeline) somewhat easier.

Conclusion

The actual changes required were quite small; however, the migration forced quite a big change to our logging infrastructure and also required additional monitoring on each worker in order to be fit for production workloads.

Links and further reading

https://kubernetes.io/blog/2020/12/02/dont-panic-kubernetes-and-docker/ 
https://acloudguru.com/blog/engineering/kubernetes-is-deprecating-docker-what-you-need-to-know
