Environment
- YugabyteDB
Issue
When trying to create a backup of a YugabyteDB table using snapshots, the backup may fail with an exception like:
Timed out waiting for snapshot
This means that the snapshot creation operation did not complete within the timeout. The snapshot may be stuck in the `CREATING` state, and the master logs will have errors like:
Failed, will be retried: Leader not ready to serve requests. (yb/consensus/consensus.cc:117): Leader not yet replicated NoOp to be ready to serve requests (tablet server error 24)
and
Failed to CREATING snapshot at c8bb52ecea584d5aa35a18b4c89f1e01: Reached maximum number of retries (20), terminal: 0, 2 was running
This indicates that the tablet leader was unable to replicate a no-op operation to its followers, which it must do before it is ready to serve requests.
Cause
One or more follower tablet servers are lagging behind the leader by one or more terms. To verify this, check the tablet report and compare the term and index values of the followers with those of the leader. The lag occurs when the leader tablet server cannot send requests to the followers because the consensus batch size has exceeded its limit. To confirm this, look for warnings like:
Can't advance the committed index across term boundaries until operations from the current term are replicated.
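One way to look for this warning is a recursive `grep` over the server log directory. The following is a minimal sketch, not an official procedure: `LOG_DIR` is a placeholder for wherever the logs live in your deployment, and the function name is hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: count occurrences of the warning in each log file under a directory.
# LOG_DIR is a placeholder (assumption); set it to your own log directory.
PATTERN="Can't advance the committed index across term boundaries"

count_term_boundary_warnings() {
  # $1 = directory to search.
  # Prints "file:count" for every file that contains the warning at least once.
  grep -rFc -- "$PATTERN" "$1" 2>/dev/null | grep -v ':0$' || true
}

# Only search when LOG_DIR has been provided by the caller.
if [ -n "${LOG_DIR:-}" ]; then
  count_term_boundary_warnings "$LOG_DIR"
fi
```

Files with a non-zero count belong to servers where the leader could not advance the committed index, which matches the symptom described above.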
Resolution
Overview
As described under Cause, one or more followers lag behind the leader because the consensus batch size has exceeded its limit. Raising that limit on the affected tablet servers allows the leader to send requests to the followers again.
Steps
To resolve this issue, use `yb-ts-cli` to increase the value of the `consensus_max_batch_size_bytes` GFlag on the tablet servers that host the affected tablets. The default value of this GFlag is 4 MB. Set it to 64 MB; if needed, it can be increased up to 196 MB. Reset it to the default value once load balancing is completed.
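The step above can be sketched as follows. This is an illustration under stated assumptions: the `TSERVERS` list is a placeholder, `9100` is assumed to be the tablet server RPC port, the helper function names are hypothetical, and `--force` may or may not be needed depending on how the flag is tagged in your version. The script only prints the commands so they can be reviewed before being run.

```shell
#!/usr/bin/env bash
# Sketch: build the yb-ts-cli commands that raise consensus_max_batch_size_bytes.
# TSERVERS is a placeholder list of host:port pairs (assumption: RPC port 9100).
TSERVERS="${TSERVERS:-127.0.0.1:9100}"

mb_to_bytes() {
  # Convert a size in MB to bytes, since the flag value is given in bytes.
  echo $(( $1 * 1024 * 1024 ))
}

set_batch_size_cmd() {
  # $1 = host:port of a tablet server, $2 = new size in MB.
  # Prints the command instead of running it.
  echo "yb-ts-cli --server_address=$1 set_flag --force consensus_max_batch_size_bytes $(mb_to_bytes "$2")"
}

for ts in $TSERVERS; do
  set_batch_size_cmd "$ts" 64
done
```

To revert after load balancing is completed, the same command can be generated with `4` in place of `64`. A value changed with `set_flag` is applied to the running process only, so it is worth checking whether the flag also needs to be restored in the server configuration.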