Environment:
- Yugabyte CoreDB
Issue:
Read queries on a table time out after a large number of deletes have been run against it.
delete from frames where prefix='FDRPEdge3';
OperationTimedOut: errors={'10.64.10.55': 'Client request timeout. See Session.execute[_async](timeout)'}, last_host=10.64.10.55
Root Cause:
By default, compactions are triggered automatically as new data arrives and memstores are flushed to create SSTable files. There are no scheduled compactions, although that feature is being considered, as tracked in this GitHub issue:
https://github.com/yugabyte/yugabyte-db/issues/7614
This behavior can cause problems when a partition holds a lot of tombstones, i.e. dead data that has been deleted but not yet compacted. Queries against such a table can time out, because the SELECT has to skip past possibly millions of dead rows before it reaches valid rows.
Resolution:
Overview
To resolve the above issue, manual compactions can be run on the table.
Steps to run manual compaction:
Step 1. Force-set the unresponsive_ts_rpc_retry_limit flag to 0 on the master nodes only, because Flush/Compact requests to the tServer are RPCs that are expected to take a long time.
e.g.
~/tserver/bin/yb-ts-cli --server_address=<master 1> set_flag -force unresponsive_ts_rpc_retry_limit 0
~/tserver/bin/yb-ts-cli --server_address=<master 2> set_flag -force unresponsive_ts_rpc_retry_limit 0
~/tserver/bin/yb-ts-cli --server_address=<master 3> set_flag -force unresponsive_ts_rpc_retry_limit 0
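The three calls above differ only in the server address, so they can be driven from a loop. The following is a minimal sketch, not an official tool: MASTERS, DRY_RUN, YB_TS_CLI and set_retry_limit are names introduced here for illustration, and master1 to master3 are placeholder addresses. With DRY_RUN=1 (the default in this sketch) it only prints the commands it would run.

```shell
#!/bin/sh
# Sketch: apply one value of unresponsive_ts_rpc_retry_limit to every master.
# MASTERS, DRY_RUN and set_retry_limit are illustrative names, not Yugabyte tooling.
YB_TS_CLI="${YB_TS_CLI:-$HOME/tserver/bin/yb-ts-cli}"
MASTERS="${MASTERS:-master1 master2 master3}"  # hypothetical addresses; replace with your own
DRY_RUN="${DRY_RUN:-1}"                         # 1 = print only, 0 = execute

set_retry_limit() {
  # $1 is the value to set (0 before compaction, 20 to restore the default).
  for addr in $MASTERS; do
    if [ "$DRY_RUN" = "1" ]; then
      echo "$YB_TS_CLI --server_address=$addr set_flag -force unresponsive_ts_rpc_retry_limit $1"
    else
      "$YB_TS_CLI" --server_address="$addr" set_flag -force \
        unresponsive_ts_rpc_retry_limit "$1" || return 1
    fi
  done
}

set_retry_limit 0
```

Review the printed commands first, then rerun with DRY_RUN=0 to apply them.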
Step 2. Run the manual compaction using the yb-admin command on one node only. In the command below, timeout_in_seconds is set to 86400, i.e. 1 day, so that the command does not time out and returns when the compaction completes. You can set it higher if needed. It is fine if the command times out after the timeout_in_seconds duration: the compaction keeps running in the background. If the compaction completes before timeout_in_seconds, the command reports completion. If the command times out, the only drawback is that it becomes harder to monitor the compaction or to know when it completes.
Note: This only needs to be executed on one node in the cluster. It will run compaction on all the nodes in the cluster.
~/tserver/bin/yb-admin -master_addresses <master addresses> compact_table <keyspace> <table name> 86400
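Because this command can run for hours, it can help to record its exit status and elapsed time. The sketch below wraps the same yb-admin call; run_compaction and YB_ADMIN are names introduced here for illustration. This article does not state which exit status yb-admin returns on a timeout, so the function only reports whatever status it received.

```shell
#!/bin/sh
# Sketch: run compact_table and report its exit status and elapsed time.
# run_compaction and YB_ADMIN are illustrative names, not Yugabyte tooling.
YB_ADMIN="${YB_ADMIN:-$HOME/tserver/bin/yb-admin}"

run_compaction() {
  # Arguments: <master addresses> <keyspace> <table name> <timeout_in_seconds>
  start=$(date +%s)
  "$YB_ADMIN" -master_addresses "$1" compact_table "$2" "$3" "$4"
  rc=$?
  end=$(date +%s)
  echo "compact_table exit status: $rc, elapsed: $((end - start))s"
  return "$rc"
}

# Example call (placeholders as in the command above):
# run_compaction "<master addresses>" "<keyspace>" "<table name>" 86400
```

Running the call under nohup or inside a tmux/screen session keeps it alive if the SSH connection drops.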
Step 3. Once the compaction is done, revert the unresponsive_ts_rpc_retry_limit flag to its default value (20) on the same master nodes.
~/tserver/bin/yb-ts-cli --server_address=$IP1 set_flag -force unresponsive_ts_rpc_retry_limit 20
~/tserver/bin/yb-ts-cli --server_address=$IP2 set_flag -force unresponsive_ts_rpc_retry_limit 20
~/tserver/bin/yb-ts-cli --server_address=$IP3 set_flag -force unresponsive_ts_rpc_retry_limit 20
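Steps 1 to 3 can also be chained so that the flag is reverted only after yb-admin reports success. This is a rough sketch under stated assumptions: all function and variable names are hypothetical, master1 to master3 are placeholder addresses, and it assumes yb-admin exits with a non-zero status when compact_table fails or times out, which this article does not confirm.

```shell
#!/bin/sh
# Sketch: Steps 1-3 in sequence. All names below are illustrative, not Yugabyte tooling.
# Assumption: yb-admin exits non-zero if compact_table fails or times out.
YB_BIN="${YB_BIN:-$HOME/tserver/bin}"
MASTER_HOSTS="${MASTER_HOSTS:-master1 master2 master3}"  # hypothetical addresses
DEFAULT_RETRY_LIMIT=20                                    # default value used in Step 3

set_limit_on_masters() {
  # $1 is the value to set on every master.
  for host in $MASTER_HOSTS; do
    "$YB_BIN/yb-ts-cli" --server_address="$host" set_flag -force \
      unresponsive_ts_rpc_retry_limit "$1" || return 1
  done
}

compact_and_revert() {
  # Arguments: <master addresses> <keyspace> <table name> <timeout_in_seconds>
  set_limit_on_masters 0 || return 1
  if "$YB_BIN/yb-admin" -master_addresses "$1" compact_table "$2" "$3" "$4"; then
    set_limit_on_masters "$DEFAULT_RETRY_LIMIT"
  else
    echo "compact_table did not report success; flag left at 0." >&2
    echo "Revert it to $DEFAULT_RETRY_LIMIT once the compaction has finished." >&2
    return 1
  fi
}

# Example call (placeholders as in Step 2):
# compact_and_revert "<master addresses>" "<keyspace>" "<table name>" 86400
```

Leaving the flag at 0 on failure is deliberate: the compaction may still be running in the background, and the revert in Step 3 is meant to happen only once it is done.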