Environment
- YugabyteDB Anywhere - 2.18.5
Issue
The YugabyteDB universe rolling restart fails with below error messages in YBA.
java.lang.RuntimeException: WaitForServer(045beb8a-b356-416e-850d-e3882ba50493, yb-test-cluster-n2, type=TSERVER) did not respond in the set time.
Cause
There are several possible causes. For instance, YugabyteDB can take time to load tablet metadata if the number of tablets is high. When this process exceeds 5 minutes, YBA (YugabyteDB Anywhere) marks it as timed out, which is hard-coded to 5 minutes.
A scenario observed in YBA logs:
- The YBA log showing error at 17:59:10
YW 2024-03-01T17:59:10.124Z [ERROR] 005e5e94-aeba-4a31-a0bc-6cba20703db8 from TaskExecutor in TaskPool-SoftwareUpgrade(045beb8a-b356-416e-850d-e3882ba50493)-0 - Failed to execute task type Wait ForServer UUID 3700e0bb-b5f0-4e0a-8318-11da4fb4d633 details ....., hit error.
java.lang.RuntimeException: WaitForServer(045beb8a-b356-416e-850d-e3882ba50493, yb-test-cluster-n2, type=TSERVER) did not respond in the set time.
- The respective T-Server's RPC Server started at 17:59:29
I0301 17:59:29.073175 9141 rpc_server.cc:173] RPC server started. Bound to: <IP-Address>:9100
This example shows that the T-Server started slightly late, while YBA marked the operation as timed out.
Resolution
Overview
To resolve the issue manually increase the YBA timeout using the parameter wait_for_server_timeout
to some larger value and restart the YBA.
Steps
For YBA-Installer Based Installations:
1. SSH into YBA node.
2. Open the file `/opt/yba-ctl/yba-ctl.yml` using Vi or any text editor.
3. Add the below property near the platform.additional which can be set as.
platform:
... # Other values here
.
.
# This section is for and additional data entries that you can
# add to platform.conf if you desire. Please enter the text
# exactly in the correct format starting from the lines below the
# piped delimiter, as it will be appended to the end of yb-platform.conf
# in the exact format typed.
# Example:
# yb {
# taskGC.task_retention_duration=60
# }
platform:
additional: |
yb {
wait_for_server_timeout=900000ms <--- Here we have added the timeout value.
}
# Config parameters for Postgres
# ==============================
...
4. Save the file run below command to reconfigure the YBA.
sudo yba-ctl reconfigure
For Replicated Based YBA Installations:
1. SSH into YBA node.
2. Access the yugaware container using following command:
docker exec -ti yugaware bash
3. Modify the application.docker.conf
file and insert the wait_for_server_timeout
to 900000 ms
as below:
yb {
.
.
health.check_interval_ms = 300000
health.status_interval_ms = 43200000
wait_for_server_timeout = 900000 ms <------ Here we added the timeout.
.
.
releases.path = "/opt/yugabyte/node-agent/releases"
}
4. Restart the yugware container.
sudo docker restart yugaware
5. Retry the Rolling Restart of Universe again.
Comments
0 comments
Please sign in to leave a comment.