Disclaimer
I previously published this post on my work blog, https://reece.tech.
Intro
We run an on-premises Kubernetes cluster on Red Hat Enterprise Linux 7.5 (in VMware). The /var/lib/docker file system is a separate partition formatted with ext4, and we used overlay as the storage driver for Docker, which was the recommendation for earlier RHEL 7 releases.
What happened
One fine day, one of our containers started creating core dumps at about 1 GB per minute, causing /var/lib/docker (100 GB in size) to fill up in less than 90 minutes. Existing pods crashed, and new pods could not pull their images or start up. We manually deleted the existing pods on one of the Kubernetes worker nodes, but the container in question migrated to a different worker and continued its mission.
Investigation
We had believed that each running container was limited to 10 GB by default, but this did not seem to be the case. After consulting the relevant documentation, it became clear that neither the overlay storage driver nor ext4 supports container size limits, and that this combination is no longer the recommended setup. At the time of writing, xfs and overlay2 are recommended; in combination with xfs project quotas they can enforce a size limit per container.
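The rule we took away from the documentation can be written down as a small check. This is only a sketch of the rule as described in this post, not Docker's own logic; the function name supports_size_limit is made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper: succeeds only for the combination described above
# (overlay2 storage driver, xfs backing file system, project quota mount option).
# Arguments: storage driver, backing file system, comma-separated mount options.
supports_size_limit() {
    driver=$1
    fs=$2
    opts=$3
    [ "$driver" = "overlay2" ] || return 1
    [ "$fs" = "xfs" ] || return 1
    case ",$opts," in
        *,pquota,*|*,prjquota,*) return 0 ;;
        *) return 1 ;;
    esac
}

# Our original setup, then the new one.
supports_size_limit overlay ext4 "rw,relatime" \
    && echo "old setup: size limit possible" || echo "old setup: no size limit"
supports_size_limit overlay2 xfs "rw,relatime,prjquota" \
    && echo "new setup: size limit possible" || echo "new setup: no size limit"
```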
Resolution
Reformatting the disk with xfs and updating /etc/fstab:
/dev/sdb /var/lib/docker xfs defaults,quota,prjquota,pquota,gquota 0 0
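The reformat itself could look roughly like the following. This is a dry-run sketch under assumptions: the post does not record the exact mkfs options we used, the helper names run and reformat_docker_disk are made up, and the -n ftype=1 option is included because overlay2 needs d_type support on xfs. It only prints the commands unless DRY_RUN=0 is set.

```shell
#!/bin/sh
# Dry-run sketch: prints the commands instead of running them unless DRY_RUN=0.
# /dev/sdb and /var/lib/docker are taken from the fstab line above.
DRY_RUN=${DRY_RUN:-1}

run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# Hypothetical helper; arguments: block device, mount point.
reformat_docker_disk() {
    dev=$1
    mnt=$2
    run systemctl stop docker
    run umount "$mnt"
    # WARNING: this destroys all images and containers stored on the device.
    # ftype=1 is an assumption; it provides the d_type support overlay2 needs.
    run mkfs.xfs -f -n ftype=1 "$dev"
    # Mounting by mount point picks up the options from /etc/fstab.
    run mount "$mnt"
}

reformat_docker_disk /dev/sdb /var/lib/docker
```

Docker is left stopped on purpose here, since the daemon options still need to change before it is started again.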
Updating /etc/systemd/system/docker.service.d/override.conf:
ExecStart=/usr/bin/dockerd --storage-driver=overlay2 --exec-opt native.cgroupdriver=systemd --log-driver=journald --storage-opt overlay2.override_kernel_check=true --storage-opt overlay2.size=10G
The overlay2.override_kernel_check=true option is required for older (3.10.x) kernels.
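For completeness, here is a sketch of how the whole drop-in file might be written. Only the ExecStart line with the dockerd options comes from our setup as shown above; the [Service] header and the empty ExecStart= line are the usual systemd drop-in pattern and are assumed here, and the helper name write_docker_override is made up.

```shell
#!/bin/sh
# Hypothetical helper: writes override.conf into the given directory.
write_docker_override() {
    dir=$1
    mkdir -p "$dir"
    cat > "$dir/override.conf" <<'EOF'
[Service]
# The empty ExecStart= clears the command from the packaged unit file.
# systemd only accepts more than one ExecStart for Type=oneshot services.
ExecStart=
ExecStart=/usr/bin/dockerd --storage-driver=overlay2 --exec-opt native.cgroupdriver=systemd --log-driver=journald --storage-opt overlay2.override_kernel_check=true --storage-opt overlay2.size=10G
EOF
}
```

On a worker node this would be called as write_docker_override /etc/systemd/system/docker.service.d, followed by systemctl daemon-reload and systemctl restart docker.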
Testing the new setup
# docker info | egrep "Backing Filesystem|Storage Driver"
Storage Driver: overlay2
Backing Filesystem: xfs
# mount | grep '/dev/sdb on /var/lib/docker'
/dev/sdb on /var/lib/docker type xfs (rw,relatime,seclabel,attr2,inode64,usrquota,prjquota,grpquota)
# docker container disk space is limited to 10 GB
root@aaba31936b78:/# dd if=/dev/zero of=out bs=4096k
dd: error writing 'out': No space left on device
2560+0 records in
2559+0 records out
10737352704 bytes (11 GB, 10 GiB) copied, 7.11036 s, 1.5 GB/s
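The numbers in the dd output can be checked against the configured limit with a little shell arithmetic. This is a sketch that assumes overlay2.size=10G is read as 10 GiB, which matches the "10 GiB" dd reports.

```shell
#!/bin/sh
# Numbers taken from the dd output above.
block_size=$((4096 * 1024))            # bs=4096k, in bytes
limit=$((10 * 1024 * 1024 * 1024))     # overlay2.size=10G, assumed to mean 10 GiB
written=10737352704                    # bytes reported by dd

echo "blocks that fit into the limit: $((limit / block_size))"
echo "bytes short of the limit:       $((limit - written))"
```

dd read 2560 blocks, which is exactly how many fit into 10 GiB, but the last write stopped 64 KiB short of the limit. Presumably the remainder is used by other data counted against the same project quota; the post does not confirm this.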
# xfs_quota -x -c 'report -h' /var/lib/docker
Project quota on /var/lib/docker (/dev/sdb)
                        Blocks
Project ID   Used   Soft   Hard Warn/Grace
---------- ---------------------------------
...
#197          16K    10G    10G  00 [------]
#198           8K    10G    10G  00 [------]
#199        10.0G    10G    10G  00 [------] # <---- this container uses 10 GB max.
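A report like this can also be filtered for containers that are close to their limit. The following is a sketch under assumptions: the helper name near_limit is made up, and it expects the report without -h, where we assume the columns are project ID, used, soft and hard in 1 KiB blocks. The sample rows are modelled on the report above, not copied from real output.

```shell
#!/bin/sh
# Hypothetical helper: reads an xfs_quota project report (without -h) on stdin
# and prints every project at or above the given percentage of its hard limit.
# Assumed columns: ID, used, soft, hard (sizes in 1 KiB blocks).
near_limit() {
    awk -v pct="${1:-95}" '
        $1 ~ /^#[0-9]+$/ && $4 > 0 && $2 * 100 >= $4 * pct {
            printf "%s %d%%\n", $1, $2 * 100 / $4
        }'
}

# Sample rows modelled on the report above, with sizes converted to KiB.
printf '%s\n' \
    '#197 16 10485760 10485760 00 [--------]' \
    '#198 8 10485760 10485760 00 [--------]' \
    '#199 10485696 10485760 10485760 00 [--------]' | near_limit 95
```

On a worker node the input would come from something like xfs_quota -x -c 'report -p' /var/lib/docker | near_limit 95. A threshold below 100 is used because, as the dd test shows, a full container can end up slightly under its hard limit.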