Environment
YugabyteDB - Core DB
Issue:
- The tserver process crashes with the following FATAL log:
F0418 23:14:41.833551 hybrid_clock.cc:169] Too big clock skew is detected: 0.503s, while max allowed is: 0.500s
- Master logs report the below errors:
hybrid_clock.cc:172] Too big clock skew is detected: 1.078s, while max allowed is: 0.500s
I0610 12:39:59.924661 48 cluster_balance.cc:311] Total pending adds=1, total pending removals=0, total pending leader stepdowns=0
Resolution:
To keep the clocks in sync on the universe's nodes, install NTP or Chrony.
To check whether the clock is in sync using NTP, run ntpq -p:
~]$ ntpq -p
     remote           refid      st t when poll reach   delay   offset  jitter
==============================================================================
+clock.util.phx2 .CDMA.           1 u  111  128  377  175.495    3.076   2.250
*clock02.util.ph .CDMA.           1 u   69  128  377  175.357    7.641   3.671
 ms21.snowflakeh .STEP.          16 u    -  1024    0    0.000    0.000   0.000
 rs11.lvs.iif.hu .STEP.          16 u    -  1024    0    0.000    0.000   0.000
 2001:470:28:bde .STEP.          16 u    -  1024    0    0.000    0.000   0.000
To check whether the clock is in sync using Chrony, run chronyc sources.
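Beyond listing sources, chronyc can also report how far the local clock is from its selected source. A quick sketch using standard chronyc subcommands:

# List configured time sources with reachability and offsets
chronyc sources -v

# Report the system clock's measured offset from the selected source;
# for YugabyteDB this should stay well under the 0.5s max allowed skew
chronyc tracking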
See the appropriate documentation on using chrony or ntp for your distribution; for convenience, some documentation links are provided below. Remember to confer with your systems team on the appropriate solution for your environment.
Red Hat Enterprise Linux:
- ntp documentation
- chrony documentation
Root Cause:
The above errors indicate that the nodes running the tserver/master processes have clock skew outside of an acceptable range. Clock skew and clock drift can lead to significant consistency issues and should be fixed as soon as possible. YugabyteDB uses the fail_on_out_of_range_clock_skew flag to govern the behavior of the tserver/master process when clock skew is detected.
fail_on_out_of_range_clock_skew is set to true in all YugabyteDB releases starting with YugabyteDB 2.8. Find the documentation on release versioning here.
If the fail_on_out_of_range_clock_skew gflag is set to false, the tserver/master process will not crash on clock skew. However, it will still log the error messages, and if the clock skew is not addressed, data inconsistencies may occur.
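For reference, this gflag is passed to the server process on the command line like any other gflag (on Yugabyte Anywhere-managed universes, set it through the platform instead). A minimal sketch with placeholder master addresses and data directory; relaxing the flag only suppresses the crash and does not make the underlying skew safe:

# Placeholder addresses and paths; substitute your own.
# This is a stopgap while time synchronization is being fixed.
./bin/yb-tserver \
  --tserver_master_addrs 10.0.0.1:7100,10.0.0.2:7100,10.0.0.3:7100 \
  --fs_data_dirs /mnt/d0 \
  --fail_on_out_of_range_clock_skew=false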
YugabyteDB and VMware vMotion
Note: This section is only relevant to those using VMware vMotion
VMware vMotion is a complex event where a VM is live-migrated from one physical ESXi host to another. vMotion generally has two broad phases:
- Iterative Pre-copy - Multiple iterations where cold memory pages are copied from source to destination
- Suspend/Resume - Final phase where the VM with its remaining hot pages is suspended on the source and resumed on the destination
During vMotion, the VMware Guest OS may incur clock skew in the suspend/resume phase, depending on the order in which the guest processes are resumed. If YugabyteDB processes are running in a VM, the clock skew between the VM and actual time (or its peer VMs) might exceed the threshold that the YugabyteDB tserver process can tolerate, resulting in a tserver process crash. Below we propose two ways to mitigate the buildup of this clock skew.
If you have clock skew due to vMotion, then this Prometheus query will show a sharp jump at the time of the event:
max(max_over_time(hybrid_clock_skew{export_type=~"(tserver_export|master_export)", node_prefix="<universe_name>"}[56s])) by (exported_instance)
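If you prefer to check from the command line rather than the Prometheus UI, the same expression can be submitted to Prometheus's standard HTTP query API; the Prometheus host and port below are placeholders for your own monitoring endpoint:

curl -s 'http://prometheus.example.com:9090/api/v1/query' \
  --data-urlencode 'query=max(max_over_time(hybrid_clock_skew{export_type=~"(tserver_export|master_export)", node_prefix="<universe_name>"}[56s])) by (exported_instance)'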
Option 1 - VMware Tools solution
The best practices for synchronizing time in the Guest OS during VM live migration in VMware environments are listed in this KB article. This requires VMware Tools to be installed in the Guest OS. VMware Tools offers a one-time clock synchronization with the host clock immediately after a suspend/resume during vMotion. Depending on when VMware Tools is scheduled to run on resume, this option can be used to immediately synchronize the VM clock with the host clock. So, if the host synchronizes time with an NTP server, then with this option the guest time will also automatically be synchronized with the host's NTP server immediately after a vMotion. The VMware Tools setting pref.timeLagInMilliseconds is the allowable lag between the guest and host times; by default it is set to 1000 (1s). The YugabyteDB tserver process cannot tolerate a clock skew beyond 500ms with its peers (which are synchronized to real time). Therefore, to prevent tserver crashes, this value should be set aggressively to 100 (100ms).
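As an illustration, the VMware Tools state can be inspected from inside the guest with vmware-toolbox-cmd. The tools.conf location and key syntax shown in the comments below are an assumption on our part; verify them against the VMware KB article referenced above before applying them in your environment:

# Confirm VMware Tools is installed and check whether periodic time sync
# is enabled (the one-time sync after suspend/resume happens regardless)
vmware-toolbox-cmd -v
vmware-toolbox-cmd timesync status

# Assumed file location and key syntax for the lag threshold; verify
# against the VMware KB article before use:
#   /etc/vmware-tools/tools.conf
#     [timeSync]
#     pref.timeLagInMilliseconds = 100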
Option 2 - Use PTP for clock synchronization
PTP, or Precision Time Protocol, is a newer standard for time synchronization that provides microsecond-level synchronization between the source and the target. It is much more precise than NTP (Network Time Protocol), which only provides millisecond-level synchronization. Here is more information on ESXi's support for PTP and the PTP hardware clock (PHC). NTP clients such as Chrony can be configured on modern Linux OSs to source the system time from a PHC, a paravirtualized virtual hardware device that exposes the host time. Here are the system requirements for enabling a PHC in Linux guest OSs.
Requirement        | Version
-------------------|--------------------------------
vSphere version    | vSphere 7.0 or above
Virtual HW version | 17 or above
Linux OS version   | CentOS 7.6 / RHEL 8.0 or above
With the above host/guest versions we can configure Chrony in the Guest VMs running YugabyteDB to sync with the PTP virtual hardware device backed by a precise host clock.
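As one concrete sketch, chrony supports a PHC reference clock directly. The /dev/ptp0 path below is an assumption; check ls /dev/ptp* and /sys/class/ptp/*/clock_name in your guest to find the VMware precision clock device before using it:

# /etc/chrony.conf (excerpt); /dev/ptp0 is assumed, verify the device first
refclock PHC /dev/ptp0 poll 2

# Restart chronyd and confirm the PHC is selected as the source
systemctl restart chronyd
chronyc sources -v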