Environment
YugabyteDB - Core DB
Issue:
- The tserver process crashes with the following FATAL log:
F0418 23:14:41.833551 hybrid_clock.cc:169] Too big clock skew is detected: 0.503s, while max allowed is: 0.500s
- Master logs report the below errors:
hybrid_clock.cc:172] Too big clock skew is detected: 1.078s, while max allowed is: 0.500s
I0610 12:39:59.924661 48 cluster_balance.cc:311] Total pending adds=1, total pending removals=0, total pending leader stepdowns=0
Resolution:
To keep the clocks in sync on the universe's nodes, install NTP or Chrony.
To check whether the clock is in sync using NTP, run ntpq -p:
~]$ ntpq -p
     remote           refid      st t when poll reach   delay   offset  jitter
==============================================================================
+clock.util.phx2 .CDMA.           1 u  111  128  377  175.495    3.076   2.250
*clock02.util.ph .CDMA.           1 u   69  128  377  175.357    7.641   3.671
 ms21.snowflakeh .STEP.          16 u    -  1024    0    0.000    0.000   0.000
 rs11.lvs.iif.hu .STEP.          16 u    -  1024    0    0.000    0.000   0.000
 2001:470:28:bde .STEP.          16 u    -  1024    0    0.000    0.000   0.000
To check whether the clock is in sync using Chrony, run chronyc sources.
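Beyond listing sources, chronyc can also report how far the local clock is from its selected source. A quick sketch using standard chronyc subcommands:

# List configured time sources with reachability and offsets
chronyc sources -v

# Report the system clock's measured offset from the selected source;
# for YugabyteDB this should stay well under the 0.5s max allowed skew
chronyc tracking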
See the appropriate documentation on using chrony or ntp for your distribution; for convenience, some documentation links are provided below. Remember to confer with your systems team on the appropriate solution for your environment.
Red Hat Enterprise Linux:
- ntp documentation
- chrony documentation
Root Cause:
The above errors indicate that the nodes running the tserver/master processes have clock skew outside of an acceptable range. Clock skew and clock drift can lead to significant consistency issues and should be fixed as soon as possible. YugabyteDB uses the fail_on_out_of_range_clock_skew flag to govern the behavior of the tserver/master process when clock skew is detected.
fail_on_out_of_range_clock_skew is set to true in all YugabyteDB releases starting with YugabyteDB 2.8. Find the documentation on release versioning here.
If the fail_on_out_of_range_clock_skew gflag is set to false, the tserver/master process will not crash on clock skew. However, it will still log the error messages, and if the clock skew is not addressed, data inconsistencies may occur.
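For reference, this gflag is passed to the server process on the command line like any other gflag (on Yugabyte Anywhere-managed universes, set it through the platform instead). A minimal sketch with placeholder master addresses and data directory; relaxing the flag only suppresses the crash and does not make the underlying skew safe:

# Placeholder addresses and paths; substitute your own.
# This is a stopgap while time synchronization is being fixed.
./bin/yb-tserver \
  --tserver_master_addrs 10.0.0.1:7100,10.0.0.2:7100,10.0.0.3:7100 \
  --fs_data_dirs /mnt/d0 \
  --fail_on_out_of_range_clock_skew=false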
YugabyteDB and VMware vMotion
Note: This section is only relevant to those using VMware vMotion
VMware vMotion is a complex event where a VM is live-migrated from one physical ESXi host to another. vMotion generally has two broad phases:
- Iterative Pre-copy - Multiple iterations where cold memory pages are copied from source to destination
- Suspend/Resume - Final phase where the VM with its remaining hot pages is suspended on the source and resumed on the destination
During vMotion, the VMware Guest OS may incur clock skew in the suspend/resume phase, depending on the order in which the guest processes are resumed. If YugabyteDB processes are running in a VM, the clock skew between the VM and actual time (or its peer VMs) might exceed the threshold that the YugabyteDB tserver process can tolerate, resulting in a tserver process crash. Below we propose two ways to mitigate the buildup of this clock skew.
If you have clock skew due to vMotion, then this Prometheus query will show a sharp jump at the time of the event:
max(max_over_time(hybrid_clock_skew{export_type=~"(tserver_export|master_export)", node_prefix="<universe_name>"}[56s])) by (exported_instance)
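If you prefer to check from the command line rather than the Prometheus UI, the same expression can be submitted to Prometheus's standard HTTP query API; the Prometheus host and port below are placeholders for your own monitoring endpoint:

curl -s 'http://prometheus.example.com:9090/api/v1/query' \
  --data-urlencode 'query=max(max_over_time(hybrid_clock_skew{export_type=~"(tserver_export|master_export)", node_prefix="<universe_name>"}[56s])) by (exported_instance)'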
Option 1 - VMware Tools solution
The best practices for synchronizing time in the Guest OS during VM live migration in VMware environments are listed in this KB article. This requires VMware Tools to be installed in the Guest OS. VMware Tools offers a one-time clock synchronization with the host clock immediately after a suspend/resume during vMotion. Depending on when VMware Tools is scheduled to run on resume, this option can be used to immediately synchronize the VM clock with the host clock. So, if the host synchronizes time with an NTP server, then with this option the guest time will also automatically be synchronized with the host's NTP server immediately after a vMotion. The VMware Tools setting pref.timeLagInMilliseconds is the allowable lag between the guest and host times; by default it is set to 1000 (1s). The YugabyteDB tserver process cannot tolerate a clock skew beyond 500ms with its peers (which are synchronized to real time). Therefore, to prevent tserver crashes, this value should be set aggressively to 100 (100ms).
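As an illustration, the VMware Tools state can be inspected from inside the guest with vmware-toolbox-cmd. The tools.conf location and key syntax shown in the comments below are an assumption on our part; verify them against the VMware KB article referenced above before applying them in your environment:

# Confirm VMware Tools is installed and check whether periodic time sync
# is enabled (the one-time sync after suspend/resume happens regardless)
vmware-toolbox-cmd -v
vmware-toolbox-cmd timesync status

# Assumed file location and key syntax for the lag threshold; verify
# against the VMware KB article before use:
#   /etc/vmware-tools/tools.conf
#     [timeSync]
#     pref.timeLagInMilliseconds = 100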
Option 2 - Use PTP for clock synchronization
PTP, or Precision Time Protocol, is a newer standard for time synchronization that provides microsecond-level synchronization between the source and the target. It is much more precise than NTP (Network Time Protocol), which only provides millisecond-level synchronization. Here is more information on ESXi's support for PTP and the PTP hardware clock (PHC). NTP clients such as Chrony can be configured on modern Linux OSs to source the system time from a PHC, a paravirtualized virtual hardware device that exposes the host time. Here are the system requirements for enabling a PHC in Linux guest OSs.
Requirement        | Version
-------------------|--------------------------------
vSphere version    | vSphere 7.0 or above
Virtual HW version | 17 or above
Linux OS version   | CentOS 7.6 / RHEL 8.0 or above
With the above host/guest versions we can configure Chrony in the Guest VMs running YugabyteDB to sync with the PTP virtual hardware device backed by a precise host clock.
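As one concrete sketch, chrony supports a PHC reference clock directly. The /dev/ptp0 path below is an assumption; check ls /dev/ptp* and /sys/class/ptp/*/clock_name in your guest to find the VMware precision clock device before using it:

# /etc/chrony.conf (excerpt); /dev/ptp0 is assumed, verify the device first
refclock PHC /dev/ptp0 poll 2

# Restart chronyd and confirm the PHC is selected as the source
systemctl restart chronyd
chronyc sources -v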