Environment

Yugabyte Platform - 2.14

Issue

In a Kubernetes environment, if a tserver is in a crashloop because its disk is full, there is no way to get a bash session into the tserver container, as it is continually being restarted. There are, however, other ways to inspect the mounts and storage and determine which files or directories are filling up the disk.

Resolution

Overview

In addition to the yb-tserver container, the tserver pod contains a yb-cleanup container, which is designed to purge or zip old logs to conserve disk space.

To get more details about the containers in the tserver pod, use the command below.

kubectl -n <namespace> describe pod yb-tserver-0

This shows the details of the yb-cleanup container; a subset of the output is below.

yb-cleanup:
  Container ID:  containerd://d0f8ce4873dab43ff66ac63a6ce9a547aa3f7150823500e492386350268230b7
  Image:         quay.io/yugabyte/yugabyte:2.14.1.0-b36
  Image ID:      quay.io/yugabyte/yugabyte@sha256:fb322935064af376b3a0f9548483192547782b98f88f10fbdf8c739c12fbe3a7
  Port:          <none>
  Host Port:     <none>
  Command:
    /sbin/tini
    --
  Args:
    /bin/bash
    -c
    while true; do
      sleep 3600;
      /home/yugabyte/scripts/log_cleanup.sh;
    done

This shows that the yb-cleanup container runs a cleanup script once an hour.
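To confirm which container in the pod is actually restarting, the per-container status can be read with kubectl's JSONPath output instead of scanning the full describe output. The following is a minimal sketch; the function name container_status is hypothetical and not part of any Yugabyte tooling.

```shell
# Hypothetical helper: print one line per container in a pod with its
# name, restart count and waiting reason (empty if it is not waiting).
# Arguments: $1 = namespace, $2 = pod name.
container_status() {
  kubectl -n "$1" get pod "$2" \
    -o jsonpath='{range .status.containerStatuses[*]}{.name}{"\t"}{.restartCount}{"\t"}{.state.waiting.reason}{"\n"}{end}'
}

# Usage (<namespace> is a placeholder):
#   container_status <namespace> yb-tserver-0
```

A crash-looping container normally reports CrashLoopBackOff as its waiting reason, while the yb-cleanup container should show no waiting reason if it is still running.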

Steps

1. To determine why the tserver is in a crashloop, get the logs from the container and check them for any indication of the error.

kubectl -n <namespace> logs yb-tserver-0 -c yb-tserver

If it is a disk full issue, you will see output like the example below. In our example, the log directory had filled up because logs were being generated faster than the cleanup script could purge or zip them. Network issues had caused the high volume of logging.

.....
DNS addr resolve: yb-tserver-0.yb-tservers.yb-dev-xxxx-yyyy.svc.cluster.local
DNS addr resolve success.
Bind ipv4: 10.11.12.13 port 9042
Bind success.
DNS addr resolve: 0.0.0.0
DNS addr resolve success.
Bind ipv4: 0.0.0.0 port 5433
Bind success.
Could not open file in log_dir /mnt/disk0/yb-data/tserver/logs: No space left on device
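When the container has already restarted, the current log may not yet contain the error. A minimal sketch of filtering the log for the disk full message is shown here; the function name disk_full_lines is hypothetical, and the usage line assumes the same pod and container names as the command above.

```shell
# Hypothetical helper: read container log text on stdin and print only
# the lines that report a full disk. The exit status is 0 if at least
# one line matched and non-zero otherwise.
disk_full_lines() {
  grep -i 'No space left on device'
}

# Usage (<namespace> is a placeholder; --previous reads the log of the
# last terminated instance of the container):
#   kubectl -n <namespace> logs yb-tserver-0 -c yb-tserver --previous | disk_full_lines
```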

2. Get a bash session into the yb-cleanup container.

kubectl -n <namespace> exec -it yb-tserver-0 -c yb-cleanup -- bash

Once in the container, you can see all the disks and navigate to the required directories and files.
Example:

[root@yb-tserver-0 yugabyte]# df -h
Filesystem  Size  Used Avail Use% Mounted on
overlay      46G  5.5G   40G  13% /
tmpfs        64M     0   64M   0% /dev
tmpfs        16G     0   16G   0% /sys/fs/cgroup
/dev/sdc     49G   49G    0G 100% /home/yugabyte
shm          64M   20K   64M   1% /dev/shm
/dev/sda1    46G  5.5G   40G  13% /mnt/disk0
tmpfs        28G   12K   28G   1% /run/secrets/kubernetes.io/serviceaccount
tmpfs        16G     0   16G   0% /proc/acpi
tmpfs        16G     0   16G   0% /proc/scsi
tmpfs        16G     0   16G   0% /sys/firmware
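With many mounts it can be easier to list only the nearly full filesystems. The sketch below filters df output by the Use% column; the function name full_mounts and the default threshold of 90 are assumptions made for illustration, and mount points containing spaces would not be handled.

```shell
# Hypothetical helper: read df output on stdin and print
# "<mount point> <use%>" for every filesystem at or above the
# threshold given as $1 (default 90). The header line is skipped.
full_mounts() {
  awk -v limit="${1:-90}" 'NR > 1 && $5 + 0 >= limit { print $6, $5 }'
}

# Usage from outside the pod, without an interactive session
# (<namespace> is a placeholder; -P selects the POSIX output format):
#   kubectl -n <namespace> exec yb-tserver-0 -c yb-cleanup -- df -P | full_mounts 90
```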

In our example, we navigated to the logs directory, /mnt/disk0/yb-data/tserver/logs, and removed old logs so that the tservers could restart.
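The search for what to remove can be sketched with standard tools run inside the yb-cleanup container. Both function names below are hypothetical, the 7 day default is an arbitrary assumption, and the second function only lists candidates without deleting anything.

```shell
# Hypothetical helper: print the largest entries (size in KiB, then path)
# directly under directory $1, largest first. $2 = number of entries to
# show (default 10).
largest_entries() {
  du -sk "$1"/* 2>/dev/null | sort -rn | head -n "${2:-10}"
}

# Hypothetical helper: list regular files under directory $1 that were
# last modified more than $2 days ago (default 7). Nothing is deleted.
old_files() {
  find "$1" -type f -mtime +"${2:-7}" -print
}

# Usage inside the yb-cleanup container:
#   largest_entries /mnt/disk0/yb-data/tserver
#   old_files /mnt/disk0/yb-data/tserver/logs 7
```

Review the listed files before removing them with rm, since which logs are safe to delete depends on what is still needed for troubleshooting.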
