Disclaimer
I previously published this post on my work blog, https://reece.tech.
Intro
We run an on-premises Kubernetes cluster on Red Hat Enterprise Linux (RHEL) 7.5, hosted in VMware.
The /var/lib/docker file system is a separate partition formatted with ext4, and we used overlay as the Docker storage driver, which was the recommendation for earlier RHEL 7 releases.
What happened
One fine day, one of our containers started writing core dumps at roughly 1 GB per minute, filling /var/lib/docker (100 GB in size) in less than 90 minutes. Existing pods crashed, and new pods could neither pull their images nor start up. We manually deleted the existing pods on one of the Kubernetes worker nodes; however, the offending container was simply rescheduled onto a different worker and continued its mission.
Investigation
We believed there was a default size limit of 10 GB for each running container, but this turned out not to be the case (that 10 GB default is the base device size of the devicemapper storage driver). After consulting the relevant documentation, it became clear that neither the overlay storage driver nor ext4 supports per-container size limits, and that this combination is no longer the recommended setup. At the time of writing, overlay2 on xfs is recommended; combined with xfs project quotas, it can enforce a size limit for each container.
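Whether a node actually meets these conditions can be read from its mount table. The following is a minimal POSIX sh sketch, assuming /proc/mounts-style input (device, mount point, type, options); the function name overlay2_size_supported is hypothetical. It succeeds only if /var/lib/docker (or another path given as the first argument) is an xfs file system mounted with prjquota or pquota, which are synonyms for xfs project quotas.

```shell
#!/bin/sh
# Hypothetical helper: succeed (exit 0) if the given mount point, default
# /var/lib/docker, is an xfs file system mounted with project quotas.
# Input on stdin uses the /proc/mounts layout: device mountpoint fstype options ...
overlay2_size_supported() {
  mnt=${1:-/var/lib/docker}
  awk -v mnt="$mnt" '
    $2 == mnt && $3 == "xfs" {
      # prjquota and pquota both enable xfs project quotas.
      n = split($4, opts, ",")
      for (i = 1; i <= n; i++)
        if (opts[i] == "prjquota" || opts[i] == "pquota") found = 1
    }
    END { exit (found ? 0 : 1) }
  '
}
```

On a worker node this could be run as `overlay2_size_supported < /proc/mounts && echo "project quotas enabled"`. With the original ext4 partition it fails, since ext4 never matches the xfs condition.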
Resolution
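Reformatting the disk as xfs and updating /etc/fstab to mount it with quotas enabled:
/dev/sdb /var/lib/docker xfs defaults,quota,prjquota,pquota,gquota 0 0
Here quota enables user quotas, gquota enables group quotas, and prjquota and pquota (synonyms) enable project quotas; only the project quota is strictly required for the overlay2 size limit.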
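Updating /etc/systemd/system/docker.service.d/override.conf. In a drop-in, the empty ExecStart= line is needed to clear the command inherited from docker.service before setting the new one:
[Service]
ExecStart=
ExecStart=/usr/bin/dockerd --storage-driver=overlay2 --exec-opt native.cgroupdriver=systemd --log-driver=journald --storage-opt overlay2.override_kernel_check=true --storage-opt overlay2.size=10G
The change takes effect after systemctl daemon-reload and systemctl restart docker.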
The overlay2.override_kernel_check=true option is required on older 3.10.x kernels such as RHEL 7's, which would otherwise fail Docker's kernel version check for overlay2.
Testing the new setup
# docker info | egrep "Backing Filesystem|Storage Driver"
Storage Driver: overlay2
 Backing Filesystem: xfs
 
# mount | grep '/dev/sdb on /var/lib/docker'
/dev/sdb on /var/lib/docker type xfs (rw,relatime,seclabel,attr2,inode64,usrquota,prjquota,grpquota)
 
# docker container disk space is limited to 10 GB
root@aaba31936b78:/# dd if=/dev/zero of=out bs=4096k
dd: error writing 'out': No space left on device
2560+0 records in
2559+0 records out
10737352704 bytes (11 GB, 10 GiB) copied, 7.11036 s, 1.5 GB/s
 
# xfs_quota -x -c 'report -h' /var/lib/docker
Project quota on /var/lib/docker (/dev/sdb)
                        Blocks             
Project ID   Used   Soft   Hard Warn/Grace  
---------- ---------------------------------
...
#197          16K    10G    10G  00 [------]
#198           8K    10G    10G  00 [------]
#199        10.0G    10G    10G  00 [------]  # <---- this container uses 10 GB max.
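To spot containers that have hit their limit without reading the whole report, the output above can be filtered. This is a minimal POSIX sh and awk sketch, assuming the column order shown in the report (project ID, used, soft, hard); the function name full_projects is hypothetical. It converts sizes such as 16K or 10.0G to bytes and prints the project IDs whose usage has reached the hard limit.

```shell
#!/bin/sh
# Hypothetical helper: read the output of
#   xfs_quota -x -c 'report -h' /var/lib/docker
# on stdin and print the project IDs whose used blocks have reached the
# hard limit. Assumes the column order: ID, Used, Soft, Hard.
full_projects() {
  awk '
    # Convert a human-readable size such as 16K, 10G or 10.0G to bytes.
    function to_bytes(s,   n, u) {
      n = s + 0                     # leading number, e.g. 10.0
      u = substr(s, length(s), 1)   # unit suffix, if any
      if (u == "K") return n * 1024
      if (u == "M") return n * 1024 ^ 2
      if (u == "G") return n * 1024 ^ 3
      if (u == "T") return n * 1024 ^ 4
      return n
    }
    # Project rows start with "#<id>"; headers and separators do not.
    substr($1, 1, 1) == "#" && NF >= 4 {
      used = to_bytes($2)
      hard = to_bytes($4)
      if (hard > 0 && used >= hard) print $1
    }
  '
}
```

Usage: `xfs_quota -x -c 'report -h' /var/lib/docker | full_projects`, which for the report above prints #199. The -h values are rounded, so the comparison is only as precise as the displayed figures; the plain report without -h gives exact numbers but uses a different layout that this sketch does not handle.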