Environment
- Yugabyte Platform - 2.14
Issue
In a Kubernetes (k8s) environment, a tserver can enter a crash loop because its disk is full. In this case there is no way to get a bash session into the tserver container, as it is continually being restarted. However, there are other ways to inspect the mounts and storage and determine which files or directories are filling up the disk.
Resolution
Overview
In addition to the yb-tserver container, the tserver pod contains a yb-cleanup container, which is designed to purge or zip old logs to maintain disk space.
To get more details about the containers in the tserver pod, use the command below.
kubectl -n <namespace> describe pod yb-tserver-0
This shows the details of the yb-cleanup container; a subset of the output is below.
yb-cleanup:
Container ID: containerd://d0f8ce4873dab43ff66ac63a6ce9a547aa3f7150823500e492386350268230b7
Image: quay.io/yugabyte/yugabyte:2.14.1.0-b36
Image ID: quay.io/yugabyte/yugabyte@sha256:fb322935064af376b3a0f9548483192547782b98f88f10fbdf8c739c12fbe3a7
Port: <none>
Host Port: <none>
Command:
/sbin/tini
--
Args:
/bin/bash
-c
while true; do
sleep 3600;
/home/yugabyte/scripts/log_cleanup.sh;
done
This shows that the yb-cleanup container runs a cleanup script once an hour.
Steps
1. To determine why the tserver is in a crash loop, get the logs from the container and check them for errors.
kubectl -n <namespace> logs yb-tserver-0 -c yb-tserver
If it is a disk full issue, you will see output like the example below. In our example the log directory had filled up because logs were generated faster than the cleanup script could purge or zip them. Network issues caused the high rate of log generation.
.....
DNS addr resolve: yb-tserver-0.yb-tservers.yb-dev-xxxx-yyyy.svc.cluster.local
DNS addr resolve success.
Bind ipv4: 10.11.12.13 port 9042
Bind success.
DNS addr resolve: 0.0.0.0
DNS addr resolve success.
Bind ipv4: 0.0.0.0 port 5433
Bind success.
Could not open file in log_dir /mnt/disk0/yb-data/tserver/logs: No space left on device
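The log check in step 1 can be scripted. The following is a minimal sketch, not part of YugabyteDB: the function names has_disk_full_error and check_tserver_disk_full are hypothetical, and it assumes kubectl is already configured for the cluster. It also reads the logs of the previous container instance with the --previous flag, which can help when the current instance has only just restarted.

```shell
#!/usr/bin/env bash

# Hypothetical helper: succeeds (exit 0) when the log text on stdin
# contains the "No space left on device" message shown above.
has_disk_full_error() {
  grep -q "No space left on device"
}

# Hypothetical wrapper: scans the current and the previous yb-tserver
# container logs of a pod for the disk full message.
# Arguments: namespace, pod name. Assumes kubectl can reach the cluster.
check_tserver_disk_full() {
  local ns="$1" pod="$2"
  {
    kubectl -n "$ns" logs "$pod" -c yb-tserver 2>/dev/null
    kubectl -n "$ns" logs "$pod" -c yb-tserver --previous 2>/dev/null
  } | has_disk_full_error
}

# Example (not executed here):
#   check_tserver_disk_full my-namespace yb-tserver-0 && echo "disk full"
```

A zero exit status only indicates that the message appears in the logs; the surrounding log lines should still be reviewed to confirm which directory is affected.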
2. Get a bash session in the yb-cleanup container.
kubectl -n <namespace> exec -it yb-tserver-0 -c yb-cleanup -- bash
Once in the container, you can see all the disks and navigate to the required directories and files.
Example
[root@yb-tserver-0 yugabyte]# df -h
Filesystem      Size  Used Avail Use% Mounted on
overlay          46G  5.5G   40G  13% /
tmpfs            64M     0   64M   0% /dev
tmpfs            16G     0   16G   0% /sys/fs/cgroup
/dev/sdc         49G   49G    0G 100% /home/yugabyte
shm              64M   20K   64M   1% /dev/shm
/dev/sda1        46G  5.5G   40G  13% /mnt/disk0
tmpfs            28G   12K   28G   1% /run/secrets/kubernetes.io/serviceaccount
tmpfs            16G     0   16G   0% /proc/acpi
tmpfs            16G     0   16G   0% /proc/scsi
tmpfs            16G     0   16G   0% /sys/firmware
In our example we navigated to the logs directory /mnt/disk0/yb-data/tserver/logs and removed old logs so that the tserver could restart.
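To decide which files to remove, it helps to list the largest entries and the oldest files in the full directory first. The sketch below is one possible approach, run from the bash session in the yb-cleanup container. The function names largest_entries and old_files are hypothetical, and it assumes that du, sort, head and find are available in the container image.

```shell
#!/usr/bin/env bash

# Hypothetical helper: print the N largest entries directly under a
# directory, largest first, with sizes in KiB. Hidden files are skipped
# because the shell glob does not match them.
largest_entries() {
  local dir="$1" count="${2:-10}"
  du -sk "$dir"/* 2>/dev/null | sort -rn | head -n "$count"
}

# Hypothetical helper: print regular files under a directory that were
# last modified more than N days ago (default 7).
old_files() {
  local dir="$1" days="${2:-7}"
  find "$dir" -type f -mtime +"$days" -print
}

# Example (not executed here):
#   largest_entries /mnt/disk0/yb-data/tserver/logs 5
#   old_files /mnt/disk0/yb-data/tserver/logs 7
```

Neither function deletes anything. Review the output before removing files, and remove only logs that are no longer needed.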