Disclaimer
I previously published this post on my work blog, https://reece.tech.
Overview
We have operated multiple on-premise and cloud-hosted K8s clusters for many years and have heavily utilised docker as the container runtime for both master and worker nodes.
As most readers will be aware by now, the Kubernetes 1.20 release announced the deprecation and future removal of the much-loved docker interface (dockershim).
This post documents our journey from docker to a suitable replacement option.
Options
The two most obvious alternatives are cri-o and containerd. Since containerd is the default for many cloud-based K8s environments and was already being used behind the scenes by our K8s docker layer anyway, the choice was quite easy.
Changes required
The main change (for K8s 1.19.5) was to install containerd instead of dockerd and then start kubelet with the additional --container-runtime=remote --container-runtime-endpoint=/run/containerd/containerd.sock command-line options.
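For reference, here is a minimal sketch of wiring these flags in, assuming a kubeadm-style setup where kubelet runs under systemd and picks up KUBELET_EXTRA_ARGS from a drop-in file (the file path and variable name are illustrative, not necessarily our exact setup):

```
# Hypothetical systemd drop-in pointing kubelet at containerd
cat <<'EOF' > /etc/systemd/system/kubelet.service.d/20-containerd.conf
[Service]
Environment="KUBELET_EXTRA_ARGS=--container-runtime=remote --container-runtime-endpoint=/run/containerd/containerd.sock"
EOF

systemctl daemon-reload
systemctl restart kubelet
```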
The dedicated /var/lib/docker volume has been renamed and remounted as /var/lib/containerd. We also added dedicated volumes for /var/lib/kubelet and /var/log, as disk space usage for these directories increased somewhat after the migration.
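As a rough illustration of the layout (device names and filesystem types are placeholders, not our actual setup), the corresponding /etc/fstab additions look something like this:

```
# Placeholder devices for the dedicated runtime, kubelet and log volumes
cat <<'EOF' >> /etc/fstab
/dev/vg_data/lv_containerd  /var/lib/containerd  xfs  defaults  0 0
/dev/vg_data/lv_kubelet     /var/lib/kubelet     xfs  defaults  0 0
/dev/vg_data/lv_log         /var/log             xfs  defaults  0 0
EOF
```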
A new version of crictl was required, as well as a change to /etc/crictl.yaml to set the default CRI runtime via runtime-endpoint: unix:///run/containerd/containerd.sock.
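The resulting /etc/crictl.yaml is tiny; roughly the following (only the runtime-endpoint line is strictly required, the image endpoint falls back to it when omitted):

```
cat <<'EOF' > /etc/crictl.yaml
runtime-endpoint: unix:///run/containerd/containerd.sock
EOF
```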
Rather than running docker system prune -a -f periodically on each worker, we now use the script below on each containerd node to remove exited containers and unused images.
```
for id in $(crictl ps -a | grep -i exited | awk '{print $1}'); do
  crictl rm "$id"
done
crictl rmi --prune
```
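One way to run this periodically is a simple cron entry; for example (the script path and schedule are placeholders):

```
# Hypothetical nightly clean-up job
cat <<'EOF' > /etc/cron.d/containerd-cleanup
0 3 * * * root /usr/local/bin/containerd-cleanup.sh >> /var/log/containerd-cleanup.log 2>&1
EOF
```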
For CentOS 7 kernels, an additional kernel parameter was required, as we experienced random problems such as cannot allocate memory errors when kubelet was starting new pods, especially for Kubernetes cronjobs. This led to quite a few pods hanging in ContainerCreating state, which is obviously not ideal. Adding the cgroup.memory=nokmem option to the kernel command line fixed the issue for us.
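On CentOS 7 the parameter can be added to the kernel command line with grubby, followed by a reboot; a sketch:

```
# Append cgroup.memory=nokmem to all installed kernels, then reboot
grubby --update-kernel=ALL --args="cgroup.memory=nokmem"
reboot

# After the reboot, verify the parameter is active
grep -o 'cgroup.memory=nokmem' /proc/cmdline
```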
Logging
Our logging pipeline has changed a bit over the years; the solution prior to migrating to containerd was a modified version of the https://github.com/looplab/logspout-logstash daemonset. Each pod read logs from journald, enriched them with cluster and docker metadata, and forwarded them to the logstash instances located on each of our Elasticsearch servers. This solution was not perfect: logspout-logstash sometimes lost network connectivity to logstash without recovering, and the combination of docker-ce and journald added quite some extra load to each worker.
The new logging solution with containerd employs fluent-bit to tail container logs from /var/log/containers/ and send them, including K8s labels, straight to Elasticsearch. Fluent-bit also filters out some unnecessary logging, such as health checks.
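As an illustration, a stripped-down fluent-bit configuration along these lines covers the tail input, the Kubernetes metadata enrichment and the health-check filtering; the parser name, exclude pattern and Elasticsearch host are assumptions rather than our exact production config:

```
# Minimal fluent-bit sketch (assumed parser, pattern and host); deployed via a ConfigMap in practice
cat <<'EOF' > /etc/fluent-bit/fluent-bit.conf
[SERVICE]
    Parsers_File  parsers.conf

[INPUT]
    Name          tail
    Path          /var/log/containers/*.log
    Parser        cri
    Tag           kube.*

[FILTER]
    Name          kubernetes
    Match         kube.*
    Merge_Log     On

[FILTER]
    # drop health-check noise (example pattern only)
    Name          grep
    Exclude       log /healthz

[OUTPUT]
    Name          es
    Match         kube.*
    Host          elasticsearch.example.com
    Port          9200
EOF
```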
Familiarity
After years of using familiar docker commands, we suddenly found ourselves learning and using ctr and crictl. We introduced a temporary docker shell-script wrapper which runs the equivalent ctr and crictl commands for troubleshooting tasks such as docker images, docker ps or docker rm, to name a few.
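The original wrapper is not published here, but a minimal sketch of the idea, mapping only a handful of sub-commands to their crictl equivalents and passing everything else straight through, looks roughly like this:

```
#!/bin/bash
# Hypothetical "docker" compatibility wrapper for troubleshooting on containerd nodes
cmd="$1"
[ -z "$cmd" ] && exec crictl   # no arguments: just show crictl help
shift

case "$cmd" in
  images) exec crictl images "$@" ;;
  ps)     exec crictl ps "$@" ;;
  rm)     exec crictl rm "$@" ;;
  rmi)    exec crictl rmi "$@" ;;
  logs)   exec crictl logs "$@" ;;
  exec)   exec crictl exec -it "$@" ;;
  *)      echo "docker-wrapper: passing '$cmd' through to crictl" >&2
          exec crictl "$cmd" "$@" ;;
esac
```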
Other side effects
The average worker load has decreased considerably, most likely due to migrating away from journald. Having pod logs in the local /var/log/containers directory also made debugging (especially of the logging pipeline) somewhat easier.
Conclusion
The actual changes required were quite small; however, the migration forced quite a big change to our logging infrastructure and also required additional monitoring on each worker in order to be fit for production workloads.
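As an example of the kind of extra per-worker monitoring, even a simple check against the CRI endpoint (a sketch, not our actual checks) goes a long way:

```
#!/bin/bash
# Hypothetical worker health check: alert if the containerd CRI endpoint stops responding
if ! crictl --timeout 5s info > /dev/null 2>&1; then
  echo "CRITICAL: containerd CRI endpoint not responding"
  exit 2
fi
echo "OK: containerd CRI endpoint responding"
```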
Links and further reading
https://kubernetes.io/blog/2020/12/02/dont-panic-kubernetes-and-docker/
https://acloudguru.com/blog/engineering/kubernetes-is-deprecating-docker-what-you-need-to-know