Environment
- Yugabyte Platform - 2.8.x
Issue
When an operation such as node removal, node addition, or tablet splitting has started and the load balancer needs to be sped up, the steps below describe how to adjust its runtime parameters.
Resolution
Steps
Run the following steps on the master leader.
1. Export the path to the Yugabyte binaries.
export PATH=$HOME/tserver/bin:$PATH
2. Check where the data currently resides. Run the command a few times, about a minute apart, to see what is changing. With the default settings, rebalancing is very slow.
yb-admin -init_master_addrs `hostname -i` list_all_tablet_servers -certs_dir_name <path to certs>
3. On the master leader (it must be this node), run the following to see the current load balancer settings.
curl -s "http://`hostname -i`:7000/varz?raw" | grep load
You should see output like the following. Save this output to a text file so the original values can be restored later.
--master_ignore_deleted_on_load=true
--load_balancer_initial_delay_secs=120
--balancer_load_max_standard_deviation=2
--TEST_load_balancer_handle_under_replicated_tablets_only=false
--TEST_load_balancer_skip_inactive_tablets=true
--TEST_load_balancer_wait_after_count_pending_tasks_ms=0
--enable_global_load_balancing=true
--enable_load_balancing=true
--load_balancer_count_move_as_add=true
--load_balancer_drive_aware=true
--load_balancer_ignore_cloud_info_similarity=false
--load_balancer_max_concurrent_adds=1
--load_balancer_max_concurrent_moves=2
--load_balancer_max_concurrent_moves_per_table=1
--load_balancer_max_concurrent_removals=1
--load_balancer_max_concurrent_tablet_remote_bootstraps=10
--load_balancer_max_concurrent_tablet_remote_bootstraps_per_table=2
--load_balancer_max_over_replicated_tablets=1
--load_balancer_num_idle_runs=5
--load_balancer_skip_leader_as_remove_victim=false
--TEST_inject_load_transaction_delay_ms=0
--TEST_download_partial_wal_segments=false
4. Make runtime changes to the parameters below.
Be careful on a production universe with traffic.
This example adjusts the parameters up to 5X their default values, which is appropriate for clusters with 4-CPU nodes. For larger nodes, for example 16 CPUs, we recommend going to 10X.
However, for the parameter load_balancer_max_concurrent_tablet_remote_bootstraps, 40 is an appropriate setting for all scenarios.
yb-ts-cli -server_address `hostname -i`:7100 set_flag -force load_balancer_max_concurrent_tablet_remote_bootstraps 40 -certs_dir_name <path to certs>
yb-ts-cli -server_address `hostname -i`:7100 set_flag -force load_balancer_max_concurrent_tablet_remote_bootstraps_per_table 10 -certs_dir_name <path to certs>
yb-ts-cli -server_address `hostname -i`:7100 set_flag -force load_balancer_max_concurrent_adds 5 -certs_dir_name <path to certs>
yb-ts-cli -server_address `hostname -i`:7100 set_flag -force load_balancer_max_concurrent_moves 5 -certs_dir_name <path to certs>
yb-ts-cli -server_address `hostname -i`:7100 set_flag -force load_balancer_max_concurrent_moves_per_table 5 -certs_dir_name <path to certs>
yb-ts-cli -server_address `hostname -i`:7100 set_flag -force load_balancer_max_over_replicated_tablets 5 -certs_dir_name <path to certs>
yb-ts-cli -server_address `hostname -i`:7100 set_flag -force load_balancer_max_concurrent_removals 5 -certs_dir_name <path to certs>
5. Check that the above changes have taken effect.
curl -s http://`hostname -i`:7000/varz | grep load
6. Monitor data balancing and leader movement, either via the master UI (tablet servers) or with yb-admin commands such as list_all_tablet_servers.
7. Once rebalancing is complete, revert the modified gflags to their original values using the yb-ts-cli command.
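The revert in step 7 can be prepared from the output saved in step 3. The following is a minimal sketch, not an official tool: the function name print_revert_commands, the variable LB_FLAGS, and the saved file name are hypothetical, and the yb-ts-cli invocation simply mirrors the form used in step 4. It only prints the commands so they can be reviewed before being run.

```shell
#!/usr/bin/env bash
# Sketch: print the yb-ts-cli commands that restore the flags changed in step 4,
# using the "--flag=value" lines saved in step 3. Nothing is executed here.

# The seven flags modified in step 4.
LB_FLAGS="load_balancer_max_concurrent_tablet_remote_bootstraps
load_balancer_max_concurrent_tablet_remote_bootstraps_per_table
load_balancer_max_concurrent_adds
load_balancer_max_concurrent_moves
load_balancer_max_concurrent_moves_per_table
load_balancer_max_over_replicated_tablets
load_balancer_max_concurrent_removals"

# Usage: print_revert_commands SAVED_FILE MASTER_IP [extra yb-ts-cli args...]
print_revert_commands() {
  local saved_file=$1 master_ip=$2
  shift 2
  local flag value
  for flag in $LB_FLAGS; do
    # Match the exact line "--<flag>=<value>" and keep only the value.
    value=$(sed -n "s/^--${flag}=//p" "$saved_file" | head -n 1)
    if [ -z "$value" ]; then
      echo "# WARNING: ${flag} not found in ${saved_file}" >&2
      continue
    fi
    # Any remaining arguments (for example the TLS option) are appended as-is.
    echo "yb-ts-cli -server_address ${master_ip}:7100 set_flag -force ${flag} ${value}${*:+ $*}"
  done
}
```

For example, assuming the step 3 output was saved as lb_flags_before.txt (a hypothetical name), `print_revert_commands lb_flags_before.txt "$(hostname -i)"` prints one set_flag command per flag; on a TLS-enabled cluster, append `-certs_dir_name <path to certs>` to the call.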
Footnotes
For yb-admin and yb-ts-cli commands, the extra argument -certs_dir_name is only required for clusters with TLS enabled.
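One way to avoid editing every command depending on whether TLS is enabled is a small wrapper. This is a sketch under assumptions: run_yb and CERTS_DIR are hypothetical names, not part of the Yugabyte tooling. The wrapper appends -certs_dir_name only when CERTS_DIR is set to a non-empty value.

```shell
#!/usr/bin/env bash
# Sketch: run a yb-admin or yb-ts-cli command, appending the TLS option
# only when CERTS_DIR (hypothetical variable) is non-empty.

run_yb() {
  if [ -n "${CERTS_DIR:-}" ]; then
    "$@" -certs_dir_name "$CERTS_DIR"
  else
    "$@"
  fi
}
```

With CERTS_DIR exported on a TLS-enabled cluster (or left unset otherwise), a call such as `run_yb yb-ts-cli -server_address "$(hostname -i)":7100 set_flag -force load_balancer_max_concurrent_adds 5` works unchanged in both cases.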
Related KB articles
https://support.yugabyte.com/hc/en-us/articles/4424009538317-How-to-tune-Load-Balancer-performance
https://yugabyte.zendesk.com/knowledge/articles/6489683969677/en-us?brand_id=360003803051