- YugabyteDB - 2.4
- YugabyteDB - 2.6
- YugabyteDB - 2.8
- YugabyteDB - 2.12
- YugabyteDB - 2.14
During a rolling restart of a Yugabyte database initiated through YugabyteDB Anywhere (formerly Yugabyte Platform), an increase in query latency is observed. Once the rolling restart has completed, latency returns to normal.
In versions of YugabyteDB Anywhere prior to 2.8.1, during a rolling restart, the database software on each node is shut down without taking any action to drain incoming requests from the remote procedure call (RPC) queue on the node. When a node hosting tablet leaders is restarted, any pending RPCs are discarded and must be retried, leading to an increase in query latency.
Starting in version 2.8.1, YugabyteDB Anywhere supports the use of a database feature called "tablet leader blacklisting" during rolling restarts. Once this feature is enabled, tablet leadership is moved away from the node scheduled for restart, giving the RPC queues a chance to drain before the node is restarted.
IMPORTANT: Systems using custom (CA-signed) TLS certificates and running YugabyteDB Anywhere versions 2.12.2 through 2.12.9 should be upgraded to version 2.12.10 or newer to address bug PLAT‑4658 prior to enabling this feature.
1. Upgrade YugabyteDB Anywhere to version 2.8.1 or newer. This feature is supported with database versions 2.4 and newer.
2. Use the YugabyteDB Anywhere REST API to set the runtime configuration setting yb.upgrade.blacklist_leaders to true to enable tablet leader blacklisting during rolling restarts.
For example, the command below enables tablet leader blacklisting for all Universes associated with the corresponding YugabyteDB Anywhere instance. Replace <platform_address> with the hostname or IP address of the YugabyteDB Anywhere instance, <cuuid> with the Customer ID value from the User Profile section of the YugabyteDB Anywhere user interface, and <auth_token> with a suitable REST API auth token:
curl --request PUT \
  --url https://<platform_address>/api/v1/customers/<cuuid>/runtime_config/00000000-0000-0000-0000-000000000000/key/yb.upgrade.blacklist_leaders \
  --header 'Content-Type: text/plain' \
  --header 'X-AUTH-YW-API-TOKEN: <auth_token>' \
  --data true
More information about getting and setting YugabyteDB Anywhere runtime configuration variables is available in the Runtime Configuration section of the REST API documentation.
NOTE: Only the SuperAdmin user can modify runtime configuration variables at the global scope (00000000-0000-0000-0000-000000000000).
NOTE: The tablet leader blacklisting feature will be enabled by default in a future release.
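After setting the key, it can be useful to read the value back. The sketch below assumes the runtime configuration endpoint also answers GET requests on the same path used for the PUT command above and returns the stored value as plain text. The helper names runtime_config_url and get_runtime_config are hypothetical, and the arguments correspond to the same placeholders as in the command above.

```shell
# Global scope UUID, as used in the PUT command above.
GLOBAL_SCOPE="00000000-0000-0000-0000-000000000000"

# Build the runtime configuration URL for a key (hypothetical helper).
# Arguments: <platform_address> <cuuid> <key> [scope]
runtime_config_url() {
  printf 'https://%s/api/v1/customers/%s/runtime_config/%s/key/%s\n' \
    "$1" "$2" "${4:-$GLOBAL_SCOPE}" "$3"
}

# Read a key back at the global scope.
# Assumption: GET is accepted on the same path as PUT.
# Arguments: <platform_address> <cuuid> <auth_token> <key>
get_runtime_config() {
  curl --silent --show-error --fail --request GET \
    --url "$(runtime_config_url "$1" "$2" "$4")" \
    --header "X-AUTH-YW-API-TOKEN: $3"
}

# Example (not run here):
#   get_runtime_config <platform_address> <cuuid> <auth_token> yb.upgrade.blacklist_leaders
```

Keeping the URL construction in one helper avoids retyping the long path when checking more than one key.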
3. If necessary, adjust the blacklist_leader_wait_time_ms runtime configuration setting. By default, YugabyteDB Anywhere will wait a maximum of 60000 ms (1 minute) for tablet leader migration to complete before restarting each node.
The amount of time required for all tablet leaders to migrate varies depending on the number of tablets on each node. By default, the database software will perform 2 tablet leader moves per second.
YugabyteDB Anywhere periodically checks the status of tablet leader migration and will restart a node immediately if all leaders have been migrated, so this setting can be safely increased to several minutes. This setting acts as a "backstop" that prevents a rolling restart from hanging in the event that tablet leader migration does not complete in a timely manner.
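As a rough sizing aid, the default rate of 2 leader moves per second gives a lower bound on how long a node needs to shed its tablet leaders. The sketch below is a hypothetical helper (estimate_leader_wait_ms is not part of any Yugabyte tool) that converts a tablet leader count into milliseconds, assuming leaders move at a constant rate with no other overhead.

```shell
# Estimate the time (in ms) needed to move all tablet leaders off a node.
# Arguments: <leader_count> [moves_per_second, default 2]
# Assumption: a constant move rate; real migrations may be slower.
estimate_leader_wait_ms() {
  leaders=$1
  rate=${2:-2}
  # Integer ceiling of leaders * 1000 / rate.
  echo $(( (leaders * 1000 + rate - 1) / rate ))
}

# A node hosting 300 tablet leaders, at the default rate:
estimate_leader_wait_ms 300   # prints 150000
```

In this example, 300 leaders need about 150 seconds, which exceeds the 60000 ms default, so the wait time would have to be raised (with some margin) for migration to finish before the node is restarted.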
4. If necessary, adjust the master GFlag load_balancer_max_concurrent_moves. This flag controls the number of tablet leader moves that the database software will perform concurrently and therefore how fast tablet leader blacklisting will complete. If rolling restarts of a Universe are taking too long after enabling tablet leader blacklisting, adjust this flag as shown in the table below and initiate a rolling restart of the Universe:
|Node vCPUs|Node Memory|load_balancer_max_concurrent_moves|
|---|---|---|
|4 or more|8 GiB or more|10|
For more information on how to set database GFlags, see the Edit configuration flags section of the documentation.