Environment
- YugabyteDB
Issue
Need to tune or improve the performance of the load balancer under the following conditions:
- Node additions
- Replication returning to the configured replication factor (RF) after a node failure in a zone
- Node replacement
Solution
Tuning the load balancer should usually be done at the recommendation of Yugabyte Support. Please test any changes during a period of low load on the cluster, because incorrect or oversized values can quickly overwhelm the cluster's available IO or network bandwidth.
Data re-balancing happens mainly when nodes are added or removed. Some of the configurations apply only to node addition, as indicated in the tables below. The following flags can be tuned on the YB-Master and YB-Tserver to improve the performance of data balancing and leader movement:
YB Master Flags
| Configuration Flag | Description | Default Value | Node Add | Node Remove |
| --- | --- | --- | --- | --- |
| load_balancer_max_concurrent_adds | Maximum number of tablet peer replicas to add in any one run of the load balancer. | 1 | Yes | Yes |
| load_balancer_max_concurrent_moves | Maximum number of tablet leaders on tablet servers (across the cluster) to move in any one run of the load balancer. | 10 (2 in older releases) | Yes | Yes |
| load_balancer_max_concurrent_moves_per_table | Maximum number of tablet leaders per table to move in any one run of the load balancer. The total number of leader moves across the cluster is still limited by load_balancer_max_concurrent_moves; this flag prevents a single table from consuming the entire leader-move quota and starving other tables. | 1 | Yes | Yes |
| load_balancer_max_concurrent_removals | Maximum number of over-replicated tablet peer removals to perform in any one run of the load balancer. | 1 | Yes | No |
| load_balancer_max_concurrent_tablet_remote_bootstraps | Maximum number of tablets being remote bootstrapped across the cluster. | 10 | Yes | Yes |
| load_balancer_max_concurrent_tablet_remote_bootstraps_per_table | Maximum number of tablets being remote bootstrapped for any one table. The total number of remote bootstraps across the cluster is still limited by load_balancer_max_concurrent_tablet_remote_bootstraps; this flag prevents a single table from consuming all available remote bootstrap sessions and starving other tables. | 2 | Yes | Yes |
| load_balancer_max_over_replicated_tablets | Maximum number of running tablet replicas that are allowed to exceed the configured replication factor. | 1 | Yes | No |
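As an illustration, the master flags above are ordinary gflags: they can be set persistently in the yb-master flag file or startup arguments and, for flags that are runtime-settable in your release, adjusted on a running master with yb-ts-cli. The addresses, paths, and values below are placeholders, not recommendations:

```sh
# Persistent change: add to the yb-master flag file / startup arguments,
# then restart the master processes for the change to take effect.
--load_balancer_max_concurrent_adds=2
--load_balancer_max_concurrent_tablet_remote_bootstraps=20

# Runtime change (does not survive a restart): 10.0.0.1:7100 is a
# placeholder master RPC address; --force may be needed for flags that
# are not tagged as runtime-safe in your version.
yb-ts-cli --server_address=10.0.0.1:7100 set_flag --force load_balancer_max_concurrent_adds 2
```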
YB Tserver Flags
| Configuration Flag | Description | Default Value | Node Add | Node Remove |
| --- | --- | --- | --- | --- |
| remote_bootstrap_rate_limit_bytes_per_sec | Rate limit, in bytes per second, across all tablets being remote bootstrapped from or to this process. | 256 MB/s | Yes | Yes |
| rate_limiter_min_size | Minimum size of each transmission request. | 32 KB | Yes | Yes |
| remote_bootstrap_max_chunk_size | Maximum chunk size transferred at a time during remote bootstrap. | 64 MB [1] | Yes | Yes |
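For example, a minimal sketch of raising remote bootstrap throughput on the yb-tserver side (the value shown is a placeholder, not a recommendation; the flag is expressed in bytes per second):

```sh
# yb-tserver flag file / startup arguments.
# The default of 256 MB/s corresponds to 268435456 bytes/s;
# the placeholder below doubles it to 512 MB/s (536870912 bytes/s).
--remote_bootstrap_rate_limit_bytes_per_sec=536870912
```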
The recommended tuning practice is as follows:
1) Increase the number of tablets the load balancer moves at a time (using the relevant flags above) until the per-tserver rate limit is reached.
2) Increase remote_bootstrap_rate_limit_bytes_per_sec to a maximum safe value, estimated from the IO and network bandwidth that remains available during peak cluster operations.
For example, during a cluster expansion tablets will be temporarily over-replicated, so double load_balancer_max_over_replicated_tablets until the enforced remote_bootstrap_rate_limit_bytes_per_sec is hit, then observe how much IO and network bandwidth remains available and re-tune the remote bootstrap rate limit appropriately.
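While iterating on these values, it helps to confirm that the load balancer has finished its current work before re-tuning. As a sketch (the master addresses below are placeholders, and exact command availability varies by release), yb-admin can report this state:

```sh
# Placeholder master addresses.
MASTERS=10.0.0.1:7100,10.0.0.2:7100,10.0.0.3:7100

# Reports whether the load balancer still has pending moves.
yb-admin --master_addresses "$MASTERS" get_is_load_balancer_idle

# Percentage of data moved off blacklisted nodes (useful for node removal
# or replacement flows that use blacklisting).
yb-admin --master_addresses "$MASTERS" get_load_move_completion
```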
Footnotes
1. This value was increased from 1 MB, as tracked in https://github.com/yugabyte/yugabyte-db/issues/11868. The new value is safe to apply on prior versions where it is not the default.
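To apply the newer default on an older release, the flag can be set explicitly on the yb-tserver (sketch only; the value is in bytes):

```sh
# 64 MB expressed in bytes, for releases where the default is still 1 MB.
--remote_bootstrap_max_chunk_size=67108864
```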