- YugabyteDB Anywhere - 2.12.1 or newer
After upgrading to YBA version 2.12.1, new "DB write/read test error" alerts are received for multiple Universes. For example:
name=DB write/read test error
message=Test YSQL write/read operation failed on 1 nodes(s) for universe '<Universe Name>'
- Additional Information:
This is a new health check introduced in YBA 2.12. The test does the following:
- Connects to the DB via ysqlsh.
- Writes one row to a special range-sharded table.
- Reads this row back right away to verify that the data was written.
- Removes the row from the table.
- This test runs every five minutes and tries to connect to every tserver in every Universe by default.
Currently, the alert includes only the Universe information. There is a feature request to include node information in the alerts as well.
To narrow down the nodes reporting this issue, we have to use a Prometheus query.
Because this alert is based on a metric that carries the node information, the easiest way to identify the nodes reporting this issue is to run the query below in the Prometheus UI:
"yb_node_ysql_write_read < 1"
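The same query can also be run outside the UI. The following is a minimal Python sketch that sends it to the standard Prometheus HTTP API endpoint (/api/v1/query) and lists the label set of each series returned. The function names and the base URL passed to fetch_query_result are hypothetical, and the label that carries the node name depends on the deployment, so the sketch returns all labels rather than assuming one.

```python
import json
import urllib.parse
import urllib.request

# Query taken from this article; it already filters to failing series.
QUERY = "yb_node_ysql_write_read < 1"


def fetch_query_result(base_url, query=QUERY, timeout=10):
    """Run an instant query against the Prometheus HTTP API.

    base_url is an assumption: the address of the Prometheus instance
    used by YBA in your environment.
    """
    params = urllib.parse.urlencode({"query": query})
    url = base_url.rstrip("/") + "/api/v1/query?" + params
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return json.load(response)


def failing_series(payload):
    """Return the label set of every series in an instant-query response."""
    if payload.get("status") != "success":
        raise ValueError("Prometheus query did not succeed")
    return [series["metric"] for series in payload["data"]["result"]]
```

Calling failing_series(fetch_query_result(base_url)) with the Prometheus address for your installation returns one dictionary of labels per failing series; an empty list means no node is currently failing the check.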
If we review the Postgres logs on the problematic node for the problem window, they contain the following error:
2023-02-28 14:02:01.914 UTC  ERROR: Operation failed. Try again: Unknown transaction, could be recently aborted: d22df22a-b353-4de8-a397-32ab7e64e656
2023-02-28 14:02:01.914 UTC  STATEMENT: insert into write_read_test values (101) on conflict do nothing; select from write_read_test where id = 101; delete from write_read_test where id = 101;
We were able to reproduce the issue internally and found that the query sometimes fails and must be retried for the check to pass.
We have filed a bug for this issue, https://yugabyte.atlassian.net/browse/PLAT-7561, and the fixes are already available and have been backported to newer releases.
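To illustrate what retrying the query means, here is a minimal Python sketch of a generic retry wrapper. It is not the code from PLAT-7561. The function name run_with_retry, the attempt count, and the delay are all hypothetical, and the only detail taken from this article is the "Try again" text in the error message.

```python
import time


def run_with_retry(operation, attempts=3, delay_seconds=0.5):
    """Call operation() and retry when the error text asks to try again.

    operation is any callable that runs the write/read/delete statements
    and raises an exception on failure (hypothetical, for illustration).
    """
    for attempt in range(1, attempts + 1):
        try:
            return operation()
        except Exception as error:
            # Retry only transient failures such as
            # "Operation failed. Try again: Unknown transaction ...".
            retryable = "Try again" in str(error)
            if not retryable or attempt == attempts:
                raise
            time.sleep(delay_seconds)
```

Errors that do not contain "Try again" are raised immediately, so a genuine connectivity or permission problem would still fail the check on the first attempt.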
To avoid the frequent alerts, we can also disable the check temporarily using the KB below: