Environment
- YugabyteDB - all supported versions
Overview
Customers may need to confirm whether YugabyteDB moved a tablet replica or tablet leader by reviewing YB-Master and YB-TServer logs.
Tablet movement usually means one of the following:
- A tablet replica is added to a new YB-TServer and then removed from another YB-TServer.
- A tablet leader is moved from one YB-TServer to another.
- A newly added replica is remote bootstrapped from an existing peer.
YugabyteDB tablet movement is coordinated by the YB-Master load balancer. The master decides what to move, schedules Raft configuration changes, and tracks add/remove/leader stepdown tasks. YB-TServers then execute the underlying operations such as ChangeConfig and remote bootstrap.
For a full replica move, the common evidence chain is:
- YB-Master logs a decision to move or add a replica.
- YB-Master sends an `ADD_SERVER` `ChangeConfig` request.
- The target YB-TServer remote bootstraps the tablet from an existing peer.
- The remote bootstrap completes and the new replica is promoted from `PRE_VOTER` to `VOTER`.
- YB-Master removes the old replica with a `REMOVE_SERVER` `ChangeConfig` request.
Leader movement is lighter weight. It does not copy tablet data. The master logs a leader movement decision and sends a leader stepdown request so another existing replica can become leader.
Steps
1. Check the load balancer state
Use yb-admin to check whether the load balancer is active or idle.
```shell
./bin/yb-admin \
    -master_addresses <master_addresses> \
    get_load_balancer_state

./bin/yb-admin \
    -master_addresses <master_addresses> \
    get_is_load_balancer_idle
```
If a data move is in progress after a cluster configuration change, check the completion percentage:
```shell
./bin/yb-admin \
    -master_addresses <master_addresses> \
    get_load_move_completion
```
2. Search YB-Master logs for movement decisions
YB-Master logs are the starting point because the load balancer makes the placement decision.
```shell
grep -E "Moving tablet|Adding replica of tablet|Removing replica of tablet|Moving leader of tablet|Stepdown Leader RPC|Change config succeeded|ChangeConfig\\(\\) failed" /path/to/yb-master.INFO*
```
Useful YB-Master log patterns:
```
Moving tablet <tablet_id> from <source_ts_uuid> to <target_ts_uuid>. Reason: <reason>
Adding replica of tablet <tablet_id> to <target_ts_uuid>. Reason: <reason>
Removing replica of tablet <tablet_id> from <source_ts_uuid>. Reason: <reason>
Moving leader of tablet <tablet_id> from <source_ts_uuid> to <target_ts_uuid>. Reason: <reason>
AddServer ChangeConfig RPC for tablet <tablet_id> ... Change config succeeded on leader TS <leader_ts_uuid> for tablet <tablet_id> with type AddServer ChangeConfig for replica <peer_uuid>
RemoveServer ChangeConfig RPC for tablet <tablet_id> ... Change config succeeded on leader TS <leader_ts_uuid> for tablet <tablet_id> with type RemoveServer ChangeConfig for replica <peer_uuid>
ChangeConfig() failed on leader <leader_ts_uuid> ...
```
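On a busy cluster it usually helps to narrow the search to one tablet before reading the sequence. The following is a minimal sketch, not a YugabyteDB tool: `master_events_for_tablet` is a hypothetical helper name, and it assumes the master log lines follow the formats listed above. It keeps only lines that mention the tablet ID and match one of the movement patterns, including the `Stepdown Leader RPC` entries used for leader moves.

```shell
# Hypothetical helper: print YB-Master movement events for one tablet.
# Usage: master_events_for_tablet <tablet_id> <master_log_file>...
# Assumes log lines use the message formats quoted in this article.
master_events_for_tablet() {
  local tablet_id="$1"
  shift
  # First keep lines that mention the tablet, then keep only movement events.
  grep -h -F -- "$tablet_id" "$@" \
    | grep -E 'Moving tablet|Adding replica of tablet|Removing replica of tablet|Moving leader of tablet|Stepdown Leader RPC|Change config succeeded|ChangeConfig\(\) failed'
}
```

For example, `master_events_for_tablet 02e654a6c8a54c5193e78526beea59f8 /path/to/yb-master.INFO*`. Lines are printed in the order grep reads the files, so compare timestamps when several rotated log files are involved.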
3. Search YB-TServer logs for remote bootstrap
Replica movement requires remote bootstrap on the target YB-TServer. Search the target and source YB-TServer logs for the tablet ID.
```shell
grep -E "Initiating RemoteBootstrap|Beginning remote bootstrap session|Began remote bootstrap session|Remote bootstrap complete|Remote bootstrap for tablet ended successfully|ChangeRole succeeded|Remote bootstrap session with id" /path/to/yb-tserver.INFO*
```
Useful target-side YB-TServer log patterns:
```
T <tablet_id> P <target_ts_uuid>: Initiating RemoteBootstrap from Peer <source_ts_uuid> (<source_host>:9100)
T <tablet_id> P <target_ts_uuid>: Remote client base: Beginning remote bootstrap session from peer <source_ts_uuid> [...]
T <tablet_id> P <target_ts_uuid>: Remote client base: Began remote bootstrap session <session_id> [...]
T <tablet_id> P <target_ts_uuid>: Remote bootstrap complete. Replacing tablet superblock.
T <tablet_id> P <target_ts_uuid>: Remote bootstrap: Opening tablet
T <tablet_id> P <target_ts_uuid>: Remote bootstrap for tablet ended successfully
```
Useful source-side YB-TServer log patterns:
```
Remote bootstrap session with id <session_id> completed. Stats: Transmission rate: <rate>, ... Total bytes: <bytes>
ChangeRole succeeded for bootstrap session <session_id>
```
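The remote bootstrap session ID appears in both the target-side and source-side messages, so it is a convenient key for joining the two logs. The sketch below uses two hypothetical helpers, `rb_session_ids` and `rb_session_status`, and assumes the `Began remote bootstrap session <session_id>` and source-side message formats shown above.

```shell
# Hypothetical helper: print session IDs the target peer began for a tablet.
# Usage: rb_session_ids <tablet_id> <target_tserver_log>...
rb_session_ids() {
  local tablet_id="$1"
  shift
  # Target-side lines carry a "T <tablet_id> P <peer_uuid>:" prefix.
  grep -h -F -- "T $tablet_id P " "$@" \
    | sed -n 's/.*Began remote bootstrap session \([^ ]*\).*/\1/p'
}

# Hypothetical helper: report whether the source peer logged session
# completion and ChangeRole success for one session ID.
# Usage: rb_session_status <session_id> <source_tserver_log>...
rb_session_status() {
  local session_id="$1"
  shift
  local completed=no change_role=no
  grep -q -F -- "Remote bootstrap session with id $session_id completed" "$@" && completed=yes
  grep -q -F -- "ChangeRole succeeded for bootstrap session $session_id" "$@" && change_role=yes
  printf '%s completed=%s change_role=%s\n' "$session_id" "$completed" "$change_role"
}
```

For example, loop over `rb_session_ids <tablet_id> /path/to/target/yb-tserver.INFO*` and pass each ID to `rb_session_status` along with the source YB-TServer logs. A session with `completed=yes change_role=no` suggests the data copy finished but the promotion step was not logged on that peer.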
4. Confirm the final tablet placement
Use `list_tablet_servers` to verify which YB-TServers currently host the tablet's replicas.
```shell
./bin/yb-admin \
    -master_addresses <master_addresses> \
    list_tablet_servers <tablet_id>
```
For a table-level view, use the following. A `max_tablets` value of `0` returns all tablets for the table.
```shell
./bin/yb-admin \
    -master_addresses <master_addresses> \
    list_tablets <keyspace_type>.<keyspace_name> <table_name> 0
```
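After a replica move, the expected end state is that the target UUID is listed and the source UUID is not. The sketch below checks that against captured `list_tablet_servers` output. It is a hypothetical helper, `check_final_placement`, and it assumes only that the command output contains each current replica's UUID as plain text; it does not parse any particular column layout.

```shell
# Hypothetical helper: check captured list_tablet_servers output for the
# expected end state of a replica move.
# Usage: check_final_placement "<captured_output>" <source_uuid> <target_uuid>
# Assumption: the output contains each current replica's UUID as plain text.
check_final_placement() {
  local output="$1" source_uuid="$2" target_uuid="$3"
  if ! printf '%s\n' "$output" | grep -q -F -- "$target_uuid"; then
    echo "target $target_uuid not found"
    return 1
  fi
  if printf '%s\n' "$output" | grep -q -F -- "$source_uuid"; then
    echo "source $source_uuid still listed"
    return 1
  fi
  echo "moved: $source_uuid -> $target_uuid"
}
```

For example, capture the output with `out=$(./bin/yb-admin -master_addresses <master_addresses> list_tablet_servers <tablet_id>)`, then run `check_final_placement "$out" <source_ts_uuid> <target_ts_uuid>`. A nonzero return means the move has not reached its final placement yet, or has not happened.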
5. Interpret the movement
Use the log messages to classify what happened:
| Evidence | Meaning |
|---|---|
| `Adding replica of tablet <tablet_id> to <target>` | The master is adding a missing replica. This can happen when a placement zone does not have enough replicas. |
| `Moving tablet <tablet_id> from <source> to <target>` | The load balancer decided to move a replica from one YB-TServer to another. The reason can be per-table or global tablet imbalance. |
| `AddServer ChangeConfig RPC ... type AddServer ChangeConfig` | The Raft config accepted the new peer, commonly first as `PRE_VOTER`. |
| `Remote bootstrap complete. Replacing tablet superblock.` | The target YB-TServer finished copying the tablet from the source peer and is replacing its local tablet superblock. |
| `Remote bootstrap for tablet ended successfully` | The target YB-TServer opened the bootstrapped tablet and finished remote bootstrap successfully. |
| `ChangeRole succeeded for bootstrap session` | The bootstrapped peer was promoted from learner/pre-voter state to a serving role. |
| `Removing replica of tablet <tablet_id> from <source>` | The old replica is being removed after the tablet became over-replicated. |
| `RemoveServer ChangeConfig RPC ... type RemoveServer ChangeConfig` | The Raft config accepted removal of the old peer. |
| `Moving leader of tablet <tablet_id> from <source> to <target>` | Only tablet leadership is moving. This does not copy tablet data. |
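The evidence table can also be applied line by line. The following sketch is a hypothetical `classify_movement_line` function that maps one log line to a short label using the message substrings from the table. The labels are illustrative names, not YugabyteDB terms, and the order of the `case` branches matters only in that the more specific leader patterns are tested first.

```shell
# Hypothetical helper: label one log line with the movement step it evidences,
# using the message substrings from the evidence table.
classify_movement_line() {
  case "$1" in
    *"Moving leader of tablet"*|*"Stepdown Leader RPC"*) echo "leader-move" ;;
    *"Adding replica of tablet"*) echo "add-missing-replica" ;;
    *"Moving tablet"*) echo "replica-move-decision" ;;
    *"type AddServer ChangeConfig"*) echo "raft-add-server" ;;
    *"Remote bootstrap complete. Replacing tablet superblock."*) echo "bootstrap-data-copied" ;;
    *"Remote bootstrap for tablet ended successfully"*) echo "bootstrap-finished" ;;
    *"ChangeRole succeeded for bootstrap session"*) echo "peer-promoted" ;;
    *"Removing replica of tablet"*) echo "remove-old-replica" ;;
    *"type RemoveServer ChangeConfig"*) echo "raft-remove-server" ;;
    *) echo "other" ;;
  esac
}
```

For example, feed it the lines found for one tablet with `while IFS= read -r line; do printf '%s\t%s\n' "$(classify_movement_line "$line")" "$line"; done`, reading from the grep output of the previous steps. A complete replica move should show the labels roughly in table order.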
Example: Replica Movement
The following example shows how to follow one tablet through the master decision, Raft config change, target-side remote bootstrap, source-side bootstrap completion, and old-replica removal.
YB-Master decides to move the replica
YB-Master logs the tablet ID, source YB-TServer UUID, target YB-TServer UUID, and the reason for the move.
```
I0511 09:55:32.687013 72 cluster_balance.cc:1655] Moving tablet 02e654a6c8a54c5193e78526beea59f8 from b5fa153e3f8f4f9c844e2a41d4bd5ae8 to dfcfc5c76be44a48b244e047d4e76e22. Reason: Source tserver has more tablets for this table than destination (8 > 3)
```
This means the load balancer selected tablet 02e654a6c8a54c5193e78526beea59f8 for movement from source peer b5fa153e3f8f4f9c844e2a41d4bd5ae8 to target peer dfcfc5c76be44a48b244e047d4e76e22.
YB-Master confirms the add-server config change
Before the old peer can be removed, the new peer is added to the Raft configuration.
```
I0511 09:55:32.700906 736 async_rpc_tasks.cc:831] AddServer ChangeConfig RPC for tablet 02e654a6c8a54c5193e78526beea59f8 (transactions [id=9058142903e84c80b6f1d988a9894f12]) on peer d94fc9e9ec86402a91ebc168a5d3e125 with cas_config_opid_index 5. Reason: Source tserver has more tablets for this table than destination (8 > 3) (task=0x000013693e2dd358, state=kComplete): Change config succeeded on leader TS d94fc9e9ec86402a91ebc168a5d3e125 for tablet 02e654a6c8a54c5193e78526beea59f8 with type AddServer ChangeConfig for replica dfcfc5c76be44a48b244e047d4e76e22
```
This means the Raft leader accepted the target peer into the config, usually first as a learner or pre-voter while remote bootstrap runs.
Target YB-TServer remote bootstraps the tablet
The target peer starts a remote bootstrap session from an existing peer, replaces the tablet superblock, opens the tablet, and reports successful completion.
```
I0511 09:55:32.693066 228 ts_tablet_manager.cc:3524] T 02e654a6c8a54c5193e78526beea59f8 P dfcfc5c76be44a48b244e047d4e76e22: Initiating RemoteBootstrap from Peer b5fa153e3f8f4f9c844e2a41d4bd5ae8 (<source_host>:9100)
I0511 09:55:32.720897 228 remote_bootstrap_client.cc:428] T 02e654a6c8a54c5193e78526beea59f8 P dfcfc5c76be44a48b244e047d4e76e22: Remote client base: Began remote bootstrap session dfcfc5c76be44a48b244e047d4e76e22-02e654a6c8a54c5193e78526beea59f8-4925.231s [Bootstrapping from FOLLOWER]
I0511 09:55:32.732460 228 remote_bootstrap_client.cc:482] T 02e654a6c8a54c5193e78526beea59f8 P dfcfc5c76be44a48b244e047d4e76e22: Remote client base: Remote bootstrap complete. Replacing tablet superblock.
I0511 09:55:32.739709 228 ts_tablet_manager.cc:1726] T 02e654a6c8a54c5193e78526beea59f8 P dfcfc5c76be44a48b244e047d4e76e22: Remote bootstrap: Opening tablet
I0511 09:55:33.547946 228 ts_tablet_manager.cc:1747] T 02e654a6c8a54c5193e78526beea59f8 P dfcfc5c76be44a48b244e047d4e76e22: Remote bootstrap for tablet ended successfully
```
This confirms that the target YB-TServer created and opened the replica.
Source peer reports remote bootstrap completion
The existing peer that served the bootstrap files logs session completion and ChangeRole success for the target peer.
```
I0511 09:55:32.736579 747 remote_bootstrap_service.cc:544] Remote bootstrap session with id dfcfc5c76be44a48b244e047d4e76e22-02e654a6c8a54c5193e78526beea59f8-4925.231s completed. Stats: Transmission rate: 8601307, RateLimiter total time slept: 0.000s, Total bytes: 3948, Read rate 484.881 bytes/msec (Total ms: 8.144), CRC computation rate: 476178.989 bytes/msec(Total ms: 0.008)
I0511 09:55:32.738956 747 remote_bootstrap_service.cc:574] ChangeRole succeeded for bootstrap session dfcfc5c76be44a48b244e047d4e76e22-02e654a6c8a54c5193e78526beea59f8-4925.231s
```
YB-Master removes the old source replica
After the target replica is available, YB-Master removes the old peer. The removal reason commonly shows the tablet is now over-replicated because this is the cleanup phase of the move.
```
I0511 09:55:33.689706 72 cluster_balance.cc:1670] Removing replica of tablet 02e654a6c8a54c5193e78526beea59f8 from b5fa153e3f8f4f9c844e2a41d4bd5ae8. Reason: Tablet is over-replicated (this is expected if the tablet is being moved)
I0511 09:55:33.700935 758 async_rpc_tasks.cc:831] RemoveServer ChangeConfig RPC for tablet 02e654a6c8a54c5193e78526beea59f8 (transactions [id=9058142903e84c80b6f1d988a9894f12]) on peer d94fc9e9ec86402a91ebc168a5d3e125 with cas_config_opid_index 8. Reason: Tablet is over-replicated (this is expected if the tablet is being moved) (task=0x000013693c1f62d8, state=kComplete): Change config succeeded on leader TS d94fc9e9ec86402a91ebc168a5d3e125 for tablet 02e654a6c8a54c5193e78526beea59f8 with type RemoveServer ChangeConfig for replica b5fa153e3f8f4f9c844e2a41d4bd5ae8
I0511 09:55:33.701050 689 master_heartbeat_service.cc:1126] Tablet: 02e654a6c8a54c5193e78526beea59f8 reported consensus state change. New consensus state: current_term: 2 leader_uuid: "d94fc9e9ec86402a91ebc168a5d3e125" config { opid_index: 9 peers { permanent_uuid: "004047b796a344e18c37573e9a8b7ec8" member_type: VOTER ... } peers { permanent_uuid: "d94fc9e9ec86402a91ebc168a5d3e125" member_type: VOTER ... } peers { permanent_uuid: "dfcfc5c76be44a48b244e047d4e76e22" member_type: VOTER ... } }
```
This confirms that the final Raft config no longer includes the old peer b5fa153e3f8f4f9c844e2a41d4bd5ae8 and includes the target peer dfcfc5c76be44a48b244e047d4e76e22.
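The consensus state line is long, so reading the peer list by eye is error-prone. The sketch below is a hypothetical `raft_peers_from_state` filter that prints one `uuid member_type` pair per peer. It assumes, as in the sample above, that `member_type` immediately follows `permanent_uuid` inside each `peers { ... }` entry.

```shell
# Hypothetical filter: read "reported consensus state change" lines on stdin
# and print "<permanent_uuid> <member_type>" for each peer in the config.
# Assumes member_type directly follows permanent_uuid, as in the sample log.
raft_peers_from_state() {
  grep -o 'permanent_uuid: "[0-9a-f]*" member_type: [A-Z_]*' \
    | sed 's/permanent_uuid: "\([0-9a-f]*\)" member_type: \([A-Z_]*\)/\1 \2/'
}
```

For example, `grep -h "reported consensus state change" /path/to/yb-master.INFO* | grep -F <tablet_id> | tail -n 1 | raft_peers_from_state` shows the most recent membership the master recorded for that tablet. The `leader_uuid` field is not matched because the pattern requires the `permanent_uuid:` key.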
Example: Leader Movement
Leader movement appears in the YB-Master log as `Moving leader of tablet` and `Stepdown Leader RPC` entries. The reason is shown inline in each entry.
```
I0511 09:54:42.528630 72 cluster_balance.cc:1682] Moving leader of tablet 02e654a6c8a54c5193e78526beea59f8 from b5fa153e3f8f4f9c844e2a41d4bd5ae8 to 004047b796a344e18c37573e9a8b7ec8. Reason: Source tserver has more leaders for this table than destination (7 > 0)
I0511 09:54:42.529081 682 async_rpc_tasks.cc:947] Stepdown Leader RPC for tablet 02e654a6c8a54c5193e78526beea59f8 (transactions [id=9058142903e84c80b6f1d988a9894f12]) on peer b5fa153e3f8f4f9c844e2a41d4bd5ae8. Reason: Source tserver has more leaders for this table than destination (7 > 0) (task=0x000013693ed64598, state=kRunning): Prep Leader step down 1, leader_uuid=b5fa153e3f8f4f9c844e2a41d4bd5ae8, change_ts_uuid=b5fa153e3f8f4f9c844e2a41d4bd5ae8
I0511 09:54:42.530087 682 async_rpc_tasks.cc:980] Stepdown Leader RPC for tablet 02e654a6c8a54c5193e78526beea59f8 (transactions [id=9058142903e84c80b6f1d988a9894f12]) on peer b5fa153e3f8f4f9c844e2a41d4bd5ae8. Reason: Source tserver has more leaders for this table than destination (7 > 0) (task=0x000013693ed64598, state=kRunning): Stepping down leader b5fa153e3f8f4f9c844e2a41d4bd5ae8 for tablet 02e654a6c8a54c5193e78526beea59f8 with new leader 004047b796a344e18c37573e9a8b7ec8
```
Use the Stepdown Leader RPC lines to identify leader movement. Use the remote bootstrap lines separately to identify replica movement.
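To list leadership changes compactly, only the final `Stepping down leader ... with new leader ...` line of each stepdown is needed; the `Prep Leader step down` line precedes it. The sketch below is a hypothetical `leader_change_from_stepdown` filter that prints `<tablet_id> <old_leader> -> <new_leader>` for each such line, assuming the message format shown above.

```shell
# Hypothetical filter: read master log lines on stdin and print
# "<tablet_id> <old_leader> -> <new_leader>" for each completed stepdown request.
leader_change_from_stepdown() {
  sed -n 's/.*Stepping down leader \([0-9a-f]*\) for tablet \([0-9a-f]*\) with new leader \([0-9a-f]*\).*/\2 \1 -> \3/p'
}
```

For example, `grep -h -F "Stepping down leader" /path/to/yb-master.INFO* | leader_change_from_stepdown`. Lines without the `Stepping down leader` message, such as the `Prep Leader step down` entry, produce no output.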
Common Findings
- Placement expansion can appear as `Adding replica ... Reason: Placement (...) does not have enough replicas of this tablet.`
- Full movement can appear as `Moving tablet ... Reason: Source tserver has more tablets for this table than destination.`
- Replica cleanup can appear as `Removing replica ... Reason: Tablet is over-replicated (this is expected if the tablet is being moved).`
- Leader balancing can appear as `Moving leader ... Reason: Source tserver has more leaders for this table than destination.`
- If YB-Master logs show `Moving tablet` but no target-side remote bootstrap logs, search all YB-TServer logs for the tablet ID and target UUID. The target may not have received or accepted the bootstrap request yet.
- If `ChangeConfig()` fails with a message that the contacted replica is not the leader, check later master logs for a successful `ChangeConfig` against the current leader.
- Use `Moving leader of tablet` and `Stepdown Leader RPC` to identify leadership movement. Use `Moving tablet`, `AddServer ChangeConfig`, remote bootstrap, and `RemoveServer ChangeConfig` to identify replica movement.
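The check for a `Moving tablet` decision without target-side remote bootstrap logs can be run across many moves at once. The sketch below is a hypothetical `find_unbootstrapped_moves` function: it reads master log lines on stdin, extracts each tablet, source, and target from `Moving tablet` decisions, and looks for the matching `T <tablet_id> P <target_uuid>: Remote bootstrap for tablet ended successfully` line in the YB-TServer log files given as arguments. It assumes the message formats quoted in this article.

```shell
# Hypothetical helper: for each "Moving tablet" decision read from stdin,
# report whether any given YB-TServer log shows the target finishing
# remote bootstrap for that tablet.
# Usage: find_unbootstrapped_moves <tserver_log>... < <master_log>
find_unbootstrapped_moves() {
  # Without file arguments grep would read stdin, which is the master log here.
  if [ "$#" -eq 0 ]; then
    echo "usage: find_unbootstrapped_moves TSERVER_LOG... < MASTER_LOG" >&2
    return 2
  fi
  grep -o 'Moving tablet [0-9a-f]* from [0-9a-f]* to [0-9a-f]*' \
    | while read -r _ _ tablet _ src _ dst; do
        if grep -q -F -- "T $tablet P $dst: Remote bootstrap for tablet ended successfully" "$@"; then
          echo "$tablet $src -> $dst: bootstrap finished on target"
        else
          echo "$tablet $src -> $dst: no finished bootstrap found on target"
        fi
      done
}
```

For example, `cat /path/to/yb-master.INFO* | find_unbootstrapped_moves /path/to/*/yb-tserver.INFO*`. A "no finished bootstrap" result only means the line was not found in the files supplied, so confirm that the target YB-TServer's logs for that time window were included before concluding the bootstrap did not happen.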