Environment
- YugabyteDB
Issue
In older versions of YugabyteDB, we have seen cases where large WAL entries are persisted on the tablet leader but cannot be replicated to the followers, leaving the tablets in an unhealthy or unusable state.
Symptoms:
- Tablets cannot acquire a leader lease
- Snapshot creation fails
- Backup fails
Additional Information:
This issue is fixed in the versions listed below. Use the steps in this article only on YugabyteDB versions earlier than these releases.
- YugabyteDB 2.8.9.0
- YugabyteDB 2.6.20.0
- YugabyteDB 2.12.10.0
- YugabyteDB 2.14.2.0
- YugabyteDB 2.15.3.0
Important Notes:
- Use these steps only when the leader has accepted the message but the followers fail to accept it, which happens only in YugabyteDB versions earlier than those listed above.
- If the leader itself is not accepting the large messages, the user should try a different approach, such as reducing the prefetch.
- Increasing rpc_max_message_size too far (e.g., 512MB or more) is not recommended.
Resolution
Overview
To fix this issue, increase the rpc_max_message_size value so that it is larger than the largest message.
Steps
- To find the largest message size, run the command below against all yb-tserver logs. The numeric sort (sort -n) ensures that the last line of output is the largest value even when the sizes have different numbers of digits.
grep 'The frame had a length of' yb-tserver.INFO | grep 'tcp_stream.cc' | sed 's/.* length of //g' | sed 's/, but we .*)//g' | sort -n | uniq | tail
Example: In the output below, 300298043 is the largest message size.
grep -r 'The frame had a length of' | grep 'tcp_stream.cc' | sed 's/.* length of //g' | sed 's/, but we .*)//g' | sort -n | uniq | tail
299128732
300298041
300298042
300298043
- Once we have the largest message size, set rpc_max_message_size to a value a few megabytes larger than that size. For example:
yb-ts-cli --server_address hostname:9100 set_flag rpc_max_message_size 333256758 --force
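The two steps above can also be sketched as a single helper that reads log lines, picks the largest rejected frame length numerically, and adds some headroom. This is only an illustration: the function name, the 8 MiB default headroom, and the 512 MiB ceiling check (taken from the note on not raising the value too far) are assumptions, not part of YugabyteDB.

```shell
#!/usr/bin/env bash
# Sketch: read yb-tserver log lines on stdin and print a candidate
# rpc_max_message_size (largest rejected frame length plus headroom).
# The function name and both defaults are illustrative assumptions.

suggest_rpc_max_message_size() {
  local headroom="${1:-8388608}"    # assumed default headroom: 8 MiB
  local ceiling="${2:-536870912}"   # 512 MiB, the "too large" mark noted earlier
  local largest suggested

  # Keep only the oversized-frame messages, extract the frame length,
  # and take the numerically largest one.
  largest=$(grep 'The frame had a length of' \
    | grep 'tcp_stream.cc' \
    | sed -n 's/.* length of \([0-9][0-9]*\).*/\1/p' \
    | sort -n | tail -n 1)

  if [ -z "$largest" ]; then
    echo "no oversized-frame messages found" >&2
    return 1
  fi

  suggested=$((largest + headroom))
  if [ "$suggested" -ge "$ceiling" ]; then
    echo "suggested value $suggested is at or above $ceiling bytes" >&2
    return 2
  fi

  echo "$suggested"
}
```

A possible invocation is `cat yb-tserver.INFO | suggest_rpc_max_message_size`; the printed number can then be passed to the set_flag command shown above. If the function reports that the value reaches the ceiling, the message is likely too large for this workaround.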
Note:
- Once the tablets become healthy, revert the GFlag to its default value of 256MB and advise the user to avoid large messages.
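If the flag has to be changed on more than one yb-tserver, both for the increase and for the later revert, a small dry-run helper can reduce copy-and-paste mistakes. The sketch below only prints the commands so they can be reviewed before running; the function name and host names are hypothetical, and port 9100 is taken from the example above.

```shell
#!/usr/bin/env bash
# Sketch: print (do not run) one yb-ts-cli set_flag command per tserver.
# Usage: print_set_flag_commands <value-in-bytes> <host> [<host> ...]
# The function name and the host names are placeholders for illustration.

print_set_flag_commands() {
  local value="$1"
  shift

  # Reject an empty or non-numeric value before generating anything.
  case "$value" in
    ''|*[!0-9]*)
      echo "value must be a positive integer number of bytes" >&2
      return 1
      ;;
  esac

  local host
  for host in "$@"; do
    echo "yb-ts-cli --server_address ${host}:9100 set_flag rpc_max_message_size ${value} --force"
  done
}
```

For example, `print_set_flag_commands 333256758 ts1.example.com ts2.example.com` prints one command per host. The same helper can be reused for the revert by passing the default value in bytes.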