Issue
A universe-related task fails in YugabyteDB Anywhere with the following error:
Error occurred in subtask taskType : WaitForServerReady, taskState: Failure
java.lang.RuntimeException: WaitForServerReady, max number attempts reached
Cause
There are several possible causes. For example, YugabyteDB can take a long time to load tablet metadata when the number of tablets is high. If the server is not ready within the timeout (10 minutes by default), YBA (YugabyteDB Anywhere) marks the task as failed.
A scenario observed in YBA logs:
- A G-Flag upgrade failed on the second node of an RF3 cluster with a WaitForServerReady exception:
YW 2024-05-15T21:45:09.190Z [ERROR] a34c3577-0716-4163-b207-7c8d8e53dbfa from TaskExecutor in TaskPool-11 - Error occurred in subtask taskType : WaitForServerReady, taskState: Failure
java.lang.RuntimeException: WaitForServerReady, max number attempts reached: 593. Failing...
- Here the T-Server takes a long time to start its RPC server because of the large number of tablets (see DB-4304). On node yb-dev-test-n2, tablet loading started at the time shown below:
YW 2024-05-15T21:35:08.839Z [INFO] a34c3577-0716-4163-b207-7c8d8e53dbfa from WaitForServerReady in TaskPool-GFlagsUpgrade(546971f2-a64f-4cf2-97a3-059649b2e5d4)-0 - TSERVER on node yb-dev-test-n2 not ready after iters=0, error '4095 tablets not running out of 4123.'.
and the wait timed out at the time shown below, with 174 tablets still not running:
YW 2024-05-15T21:45:09.180Z [INFO] a34c3577-0716-4163-b207-7c8d8e53dbfa from WaitForServerReady in TaskPool-GFlagsUpgrade(546971f2-a64f-4cf2-97a3-059649b2e5d4)-0 - Timing out after iters=593. error '174 tablets not running out of 4122.'.
- The wait lasted just over 10 minutes (21:35:08 to 21:45:09), so YBA marked the task as failed.
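The elapsed time between the two WaitForServerReady log lines quoted above can be checked with a short script. This is a minimal sketch: the regular expression and the helper name parse_line are illustrative assumptions based only on the two lines shown here, not a general parser for YBA logs, and the middle of each line is shortened to "...".

```python
import re
from datetime import datetime

# The two log lines from the scenario above, shortened to the fields parsed here.
START_LINE = (
    "YW 2024-05-15T21:35:08.839Z [INFO] ... TSERVER on node yb-dev-test-n2 "
    "not ready after iters=0, error '4095 tablets not running out of 4123.'."
)
END_LINE = (
    "YW 2024-05-15T21:45:09.180Z [INFO] ... Timing out after iters=593. "
    "error '174 tablets not running out of 4122.'."
)

# Assumed pattern: timestamp, iteration count, then the tablet counts in the error text.
LINE_RE = re.compile(
    r"^YW (?P<ts>\S+)Z .*?iters=(?P<iters>\d+)"
    r".*?'(?P<pending>\d+) tablets not running out of (?P<total>\d+)\.'"
)


def parse_line(line):
    """Extract the timestamp, iteration count, and tablet counts from one log line."""
    match = LINE_RE.search(line)
    if match is None:
        raise ValueError("line does not look like a WaitForServerReady message")
    return {
        "time": datetime.strptime(match["ts"], "%Y-%m-%dT%H:%M:%S.%f"),
        "iters": int(match["iters"]),
        "pending": int(match["pending"]),
        "total": int(match["total"]),
    }


start = parse_line(START_LINE)
end = parse_line(END_LINE)
elapsed_seconds = (end["time"] - start["time"]).total_seconds()

print(f"Waited {elapsed_seconds:.3f} s over {end['iters']} iterations")
print(f"Tablets still not running at timeout: {end['pending']} of {end['total']}")
```

Comparing the elapsed time with the configured timeout, and checking how many tablets were still not running when the check gave up, gives a rough idea of how much additional time the node needs.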
Resolution
Overview
To resolve the issue, manually increase the value of "yb.checks.wait_for_server_ready.timeout" in the YBA Runtime Configuration.
Steps
- In the YBA UI, go to Admin > Advanced, search for yb.checks.wait_for_server_ready.timeout, select Action > Edit Configuration, increase the value as needed, and save the changes.
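The same change can in principle be scripted instead of made through the UI. The sketch below only builds the HTTP request and does not send it. It assumes the YBA REST API exposes runtime configuration keys at /api/v1/customers/<customer UUID>/runtime_config/<scope>/key/<key>, accepts the new value as a text/plain body, authenticates with the X-AUTH-YW-API-TOKEN header, and uses the all-zero UUID for the global scope. Verify these assumptions against the API reference for your YBA version before use. The function name build_timeout_request and all of its parameters are hypothetical.

```python
import urllib.request

# Assumption: the all-zero UUID addresses the global runtime configuration scope.
GLOBAL_SCOPE = "00000000-0000-0000-0000-000000000000"
TIMEOUT_KEY = "yb.checks.wait_for_server_ready.timeout"


def build_timeout_request(base_url, customer_uuid, api_token, value, scope=GLOBAL_SCOPE):
    """Build (but do not send) a PUT request that sets the timeout key.

    `value` should use the same format the UI shows for the current value.
    """
    url = (
        f"{base_url.rstrip('/')}/api/v1/customers/{customer_uuid}"
        f"/runtime_config/{scope}/key/{TIMEOUT_KEY}"
    )
    return urllib.request.Request(
        url,
        data=value.encode("utf-8"),
        method="PUT",
        headers={
            "X-AUTH-YW-API-TOKEN": api_token,  # assumed auth header
            "Content-Type": "text/plain",      # assumed body format
        },
    )
```

To apply the change, pass the returned object to urllib.request.urlopen once the endpoint and value format have been confirmed. The UI steps above remain the documented path.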