Environment
- YugabyteDB Version: All versions
Issue
Operations such as rolling restarts, G-flag changes, or other universe-wide operations fail with an error indicating that there are under-replicated tablets. The error message is typically of the form:
CheckUnderReplicatedTablets, timing out after retrying ... Under-replicated tablet size: X. Failing...
Resolution
Overview
The root cause of this issue is a misconfigured tablespace whose placement information refers to a cloud, region, or zone that does not exist in the universe's configuration. When a table or index is created in such a tablespace, its tablets cannot be placed correctly and remain permanently under-replicated. The YugabyteDB load balancer keeps trying to place these tablets, but every attempt fails because the placement policy cannot be satisfied.
Steps
1. Identify the Under-replicated Tablets
When an operation fails with the CheckUnderReplicatedTablets error, the first step is to identify which tablets are under-replicated. This can be done by querying the master leader's /api/v1/tablet-under-replication endpoint.
http://<master-leader-ip>:7000/api/v1/tablet-under-replication
This will return a JSON object containing a list of under-replicated tablets. Note down the table_uuid and tablet_uuid for each of the under-replicated tablets.
{
  "underreplicated_tablets": [
    {
      "table_uuid": "...",
      "tablet_uuid": "...",
      "underreplicated_placements": [...]
    }
  ]
}
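As a minimal sketch of the step above, the response can be fetched and grouped by table in Python. The function names fetch_under_replicated and tablets_by_table are hypothetical helpers introduced here for illustration, and the code assumes the response has the shape shown above (an underreplicated_tablets list whose entries carry table_uuid and tablet_uuid).

```python
import json
from urllib.request import urlopen


def fetch_under_replicated(master_leader_ip, port=7000, timeout=10):
    """Fetch the endpoint's JSON response from the master leader."""
    url = f"http://{master_leader_ip}:{port}/api/v1/tablet-under-replication"
    with urlopen(url, timeout=timeout) as response:
        return json.load(response)


def tablets_by_table(payload):
    """Group under-replicated tablet UUIDs by the table they belong to."""
    grouped = {}
    for tablet in payload.get("underreplicated_tablets", []):
        grouped.setdefault(tablet["table_uuid"], []).append(tablet["tablet_uuid"])
    return grouped


if __name__ == "__main__":
    # Placeholder UUIDs for illustration only; replace this sample with
    # fetch_under_replicated("<master-leader-ip>") against a real universe.
    sample = {
        "underreplicated_tablets": [
            {"table_uuid": "table-1", "tablet_uuid": "tablet-a"},
            {"table_uuid": "table-1", "tablet_uuid": "tablet-b"},
        ]
    }
    for table_uuid, tablet_uuids in tablets_by_table(sample).items():
        print(table_uuid, tablet_uuids)
```

Grouping by table_uuid is convenient because the next step works per table rather than per tablet.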
2. Identify the Problematic Tablespace
Once you have the table_uuid of an under-replicated tablet, you can find the corresponding table name and its tablespace.
- To find the table name, you can use the /tables endpoint on the master leader UI, and search for the table_uuid.
http://<master-leader-ip>:7000/api/v1/tables
- Once you have the table name, you can find its tablespace by connecting to the database using ysqlsh and running the following command:
\d+ <table_name>
This will show the tablespace of the table. You can also get a list of all tablespaces with their configurations using:
SELECT * FROM pg_tablespace;
3. Verify the Tablespace Configuration
Now, compare the placement information of the problematic tablespace with the actual configuration of the universe.
- You can get the universe's placement configuration from the YBA UI, under the 'Edit Universe' section, or by querying the master leader's /api/v1/cluster-config endpoint.
http://<master-leader-ip>:7000/api/v1/cluster-config
- The tablespace configuration is in the spcoptions column of the pg_tablespace table. It will look something like this:
{"replica_placement":"{\"num_replicas\":3, \"placement_blocks\":[{\"cloud\":\"gcp\",\"region\":\"us-central1\",\"zone\":\"us-central1-a\",\"min_num_replicas\":1},{\"cloud\":\"gcp\",\"region\":\"us-central1\",\"zone\":\"us-central1-b\",\"min_num_replicas\":1},{\"cloud\":\"gcp\",\"region\":\"us-east1\",\"zone\":\"us-east1-a\",\"min_num_replicas\":1}]}"}
4. Correct the Tablespace Configuration
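To pin down which placement block needs correcting, the comparison from step 3 can be sketched in Python. This is an illustration under stated assumptions: invalid_placement_blocks is a hypothetical helper, the tablespace options are assumed to be available as the JSON string shown in step 3, and universe_zones is a set of (cloud, region, zone) tuples that you compile yourself from the universe's placement configuration. The three zones used in the example are assumed values, not read from a real universe.

```python
import json


def invalid_placement_blocks(spcoptions_json, universe_zones):
    """Return placement blocks whose (cloud, region, zone) is not in the universe."""
    options = json.loads(spcoptions_json)
    # replica_placement is itself a JSON document stored as a string.
    placement = json.loads(options["replica_placement"])
    return [
        block
        for block in placement["placement_blocks"]
        if (block["cloud"], block["region"], block["zone"]) not in universe_zones
    ]


if __name__ == "__main__":
    # Assumed universe layout for illustration only.
    universe_zones = {
        ("gcp", "us-central1", "us-central1-a"),
        ("gcp", "us-central1", "us-central1-b"),
        ("gcp", "us-central1", "us-central1-c"),
    }
    placement = {
        "num_replicas": 3,
        "placement_blocks": [
            {"cloud": "gcp", "region": "us-central1", "zone": "us-central1-a", "min_num_replicas": 1},
            {"cloud": "gcp", "region": "us-central1", "zone": "us-central1-b", "min_num_replicas": 1},
            {"cloud": "gcp", "region": "us-east1", "zone": "us-east1-a", "min_num_replicas": 1},
        ],
    }
    spcoptions = json.dumps({"replica_placement": json.dumps(placement)})
    for block in invalid_placement_blocks(spcoptions, universe_zones):
        print("not in universe:", block["cloud"], block["region"], block["zone"])
```

If the options string in your environment is formatted differently from the example in step 3, adjust the parsing accordingly; the comparison itself stays the same.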
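Drop the affected tables and the tablespace, then recreate both with a placement that matches the universe. Dropping a table deletes its data, so back up anything you need to keep first.
-- Drop the tables that use the incorrect tablespace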
DROP TABLE <table_name>;
-- Drop the problematic tablespace
DROP TABLESPACE <tablespace_name>;
-- Recreate the tablespace with the correct placement configuration
CREATE TABLESPACE <tablespace_name> WITH (replica_placement = '{"num_replicas":3, "placement_blocks":[
{"cloud":"gcp","region":"us-central1","zone":"us-central1-a","min_num_replicas":1},
{"cloud":"gcp","region":"us-central1","zone":"us-central1-b","min_num_replicas":1},
{"cloud":"gcp","region":"us-central1","zone":"us-central1-c","min_num_replicas":1}
]}');
-- Recreate the tables using the corrected tablespace
CREATE TABLE <table_name> (...) TABLESPACE <tablespace_name>;
5. Verify the Fix
After correcting the tablespace configuration, the under-replicated tablets should start to get placed correctly. You can monitor the /api/v1/tablet-under-replication endpoint to confirm that the list of under-replicated tablets becomes empty.
Once the list is empty, you can retry the operation that was failing.
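The monitoring described above can be automated with a small polling loop. The sketch below is a hypothetical helper, not part of any YugabyteDB tooling: fetch is a caller-supplied function that returns the parsed JSON from the /api/v1/tablet-under-replication endpoint, and the interval and number of checks are arbitrary defaults chosen for illustration.

```python
import time


def wait_until_replicated(fetch, interval_s=30, max_checks=20, sleep=time.sleep):
    """Poll until no tablets are reported as under-replicated.

    fetch: zero-argument callable returning the endpoint's parsed JSON.
    Returns True once the list is empty, False if max_checks is exhausted.
    """
    for attempt in range(max_checks):
        remaining = fetch().get("underreplicated_tablets", [])
        if not remaining:
            return True
        print(f"check {attempt + 1}: {len(remaining)} tablet(s) still under-replicated")
        if attempt + 1 < max_checks:
            sleep(interval_s)
    return False
```

Passing fetch and sleep as parameters keeps the loop independent of how the endpoint is reached, and makes it easy to try out with canned responses before pointing it at a real master leader.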
Additional Information
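A related error can appear at table creation time when the requested placement cannot be satisfied:
ERROR: Invalid table definition: Error creating table <table_name> on the master: Not enough tablet servers in the requested placements. Need at least X, have Y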
For more information on tablespaces, please refer to the YugabyteDB documentation on Tablespaces.