Environment
- YugabyteDB (the incident below was observed on version 2024.2.2.1-b6)
Overview
This article provides a step-by-step troubleshooting guide for diagnosing and resolving database clone timeouts, using a real-world incident as a case study. It highlights key log lines, explains the clone workflow, and provides actionable steps to address the issue.
Issue
- Symptoms:
  - CREATE DATABASE using clone timed out.
  - Users can see the clone target database but cannot connect to it.
  - Connections to the clone target are not enabled because the clone did not complete successfully.
Clone Workflow & Key Logs
- Master leader is: n1.
- Involved Tservers: n1, n2.
Workflow and Logs:
- Clone Initiation (tserver n1):
I0416 10:28:00.347015 3523954 client.cc:976] Creating database clone_db_testnoncoldb35 as clone of database testnoncoldb35
- CloneNamespace RPC to Master Leader:
I0416 10:28:00.347188 694864 clone_state_manager.cc:259] Servicing CloneNamespace request: source_namespace { name: "testnoncoldb35" database_type: YQL_DATABASE_PGSQL } restore_ht: 7146756834701565952 target_namespace_name: "clone_db_testnoncoldb35" pg_source_owner: "yugabyte" pg_target_owner: "yugabyte"
- Master sends a ClonePgSchema request to tserver n2 (ysql_dump starts):
- Note: The dump took about 6 minutes (10:28:00 to 10:33:55).
I0416 10:28:00.441762 2660014 ysql_binary_runner.cc:34] Running tool: [/apps/yugabyte/yb-software/yugabyte-2024.2.2.1-b6-centos-x86_64/bin/../postgres/bin/ysql_dump, --host=/tmp/.yb.0.0.0.0:5433, --port=5433, --schema-only, --serializable-deferrable, --create, --read-time=1744813680347062, --include-yb-metadata, testnoncoldb35]
- Creating the Schema (tserver n2):
I0416 10:33:55.909320 2660014 ysql_binary_runner.cc:34] Running tool: [/apps/yugabyte/yb-software/yugabyte-2024.2.2.1-b6-centos-x86_64/bin/../postgres/bin/ysqlsh, --host=/tmp/.yb.0.0.0.0:5433, --port=5433, --file=ysql_dump_P9N0xf, --set, ON_ERROR_STOP=on]
- Creating Namespace and Objects (master leader):
I0416 10:33:56.992177 1791223 catalog_manager.cc:8333] CreateNamespace from 22.36.181.25:42559: name: "clone_db_testnoncoldb35"
- Finished Creating Last Object:
- Note: Creating the database objects took about 6.5 minutes (10:33:56 to 10:40:17).
I0416 10:40:17.705891 2189688 catalog_manager.cc:4345] Successfully created index k_9
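- Timeout Error (tserver n1):
- Note: This error was logged at 10:38:00, exactly 10 minutes after the clone request, while objects were still being created.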
I0416 10:38:00.348660 3523954 client_master_rpc.cc:79] 0x000051a1ac49dc20 -> ListClones: Failed: Timed out (yb/client/client_master_rpc.cc:43): Request ListClones timed out after the deadline expired
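The restore_ht in the CloneNamespace request and the --read-time passed to ysql_dump refer to the same instant, which can help confirm that a given dump belongs to a given clone request. Below is a minimal Python sketch, assuming YugabyteDB's HybridTime layout of physical microseconds shifted left by 12 bits with a 12-bit logical counter in the low bits; the helper names split_hybrid_time and micros_to_utc are hypothetical.

```python
from datetime import datetime, timedelta, timezone
from typing import Tuple

# Assumption: HybridTime = (physical microseconds << 12) | logical counter.
LOGICAL_BITS = 12
EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def split_hybrid_time(ht: int) -> Tuple[int, int]:
    """Return (physical_microseconds, logical_counter) for a raw HybridTime."""
    return ht >> LOGICAL_BITS, ht & ((1 << LOGICAL_BITS) - 1)


def micros_to_utc(micros: int) -> datetime:
    """Convert Unix-epoch microseconds to an aware UTC datetime without float rounding."""
    return EPOCH + timedelta(microseconds=micros)


restore_ht = 7146756834701565952   # restore_ht from the CloneNamespace log line
read_time_us = 1744813680347062     # --read-time argument passed to ysql_dump

physical_us, logical = split_hybrid_time(restore_ht)
print(physical_us == read_time_us, logical)
print(micros_to_utc(physical_us))
```

For this incident the sketch prints True 0 followed by 2025-04-16 14:28:00.347062+00:00. The four-hour offset from the 10:28:00 log timestamps suggests the logs are written in a local time zone.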
Summary Table
| Step | Node | Time | Log Snippet / Action |
|---|---|---|---|
| Clone initiation | tserver n1 | 10:28:00.347 | Creating database clone_db_testnoncoldb35 as clone of database testnoncoldb35 |
| CloneNamespace RPC | master n1 | 10:28:00.347 | Servicing CloneNamespace request... |
| ysql_dump | tserver n2 | 10:28:00.442 | Running tool: ... ysql_dump ... |
| ysqlsh execution | tserver n2 | 10:33:55.909 | Running tool: ... ysqlsh ... |
| CreateNamespace | master n1 | 10:33:56.992 | CreateNamespace ... name: "clone_db_testnoncoldb35" |
| Timeout error | tserver n1 | 10:38:00.349 | ListClones: Failed: Timed out ... |
| Last object created | master n1 | 10:40:17.706 | Successfully created index k_9 |
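The snippets in the table can serve as search patterns when collecting logs from the master leader and tservers. Below is a minimal Python sketch that pulls the timestamps out of those lines and computes the phase durations, assuming the standard glog line prefix shown above and that the lines from all nodes have been concatenated into one list. The MARKERS dictionary and its key names are hypothetical and should be adapted to your own database and object names. glog lines carry no year, so the year is passed in, and durations measured across different nodes are only as accurate as the clock synchronization between them.

```python
import re
from datetime import datetime
from typing import Dict, Iterable, Optional

# glog prefix: severity letter, MMDD, HH:MM:SS.micros, thread id, file:line]
GLOG_PREFIX = re.compile(r"^[IWEF](\d{4}) (\d{2}:\d{2}:\d{2}\.\d{6}) +\d+ [^\]]+\] ")

# Hypothetical marker names mapped to substrings taken from the summary table.
MARKERS = {
    "clone_start": "as clone of database testnoncoldb35",
    "dump_start": "postgres/bin/ysql_dump",
    "ysqlsh_start": "postgres/bin/ysqlsh",
    "last_object": "Successfully created index k_9",
    "timeout": "ListClones: Failed: Timed out",
}


def parse_glog_time(line: str, year: int) -> Optional[datetime]:
    """Return the timestamp of a glog line, or None if it has no glog prefix."""
    m = GLOG_PREFIX.match(line)
    if m is None:
        return None
    mmdd, hms = m.groups()
    return datetime.strptime(f"{year}{mmdd} {hms}", "%Y%m%d %H:%M:%S.%f")


def find_events(lines: Iterable[str], year: int) -> Dict[str, datetime]:
    """Map each marker to the timestamp of the first log line that contains it."""
    events: Dict[str, datetime] = {}
    for line in lines:
        ts = parse_glog_time(line, year)
        if ts is None:
            continue
        for name, needle in MARKERS.items():
            if name not in events and needle in line:
                events[name] = ts
    return events


# Log lines from this incident (abbreviated).
SAMPLE = [
    "I0416 10:28:00.347015 3523954 client.cc:976] Creating database clone_db_testnoncoldb35 as clone of database testnoncoldb35",
    "I0416 10:28:00.441762 2660014 ysql_binary_runner.cc:34] Running tool: [.../postgres/bin/ysql_dump, --schema-only, testnoncoldb35]",
    "I0416 10:33:55.909320 2660014 ysql_binary_runner.cc:34] Running tool: [.../postgres/bin/ysqlsh, --file=ysql_dump_P9N0xf]",
    "I0416 10:38:00.348660 3523954 client_master_rpc.cc:79] 0x000051a1ac49dc20 -> ListClones: Failed: Timed out",
    "I0416 10:40:17.705891 2189688 catalog_manager.cc:4345] Successfully created index k_9",
]

ev = find_events(SAMPLE, year=2025)
phases = {
    "ysql_dump": ev["ysqlsh_start"] - ev["dump_start"],
    "object creation": ev["last_object"] - ev["ysqlsh_start"],
    "total": ev["last_object"] - ev["clone_start"],
    "until timeout": ev["timeout"] - ev["clone_start"],
}
for phase, delta in phases.items():
    print(f"{phase:>15}: {delta}")
```

For these lines the sketch reports about 5 min 55 s for the dump, 6 min 22 s for object creation, 12 min 17 s in total, and a timeout 10 minutes after the clone request.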
Root Cause
- The total time for cloning the schema (dump plus object creation) was about 12 minutes: roughly 6 minutes for ysql_dump and 6.5 minutes for object creation (10:28:00 to 10:40:17).
- The configured timeout for the clone operation (ysql_clone_pg_schema_rpc_timeout_ms) was 10 minutes (600000 ms).
- The operation exceeded the timeout, so the tserver reported a failure at 10:38:00 even though schema creation eventually completed at 10:40:17.
Resolution
Steps
- Check the logs on the master leader and tservers for the above patterns.
- Compare the total time taken for schema dump and object creation with the configured timeout.
- Increase the value of ysql_clone_pg_schema_rpc_timeout_ms if your schema is large or clone operations are slow.
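One way to choose a new value is to take the longest clone duration observed and add headroom. Below is a minimal Python sketch under that assumption; the helper name suggest_clone_timeout_ms, the 1.5x headroom factor, and rounding up to whole minutes are all hypothetical choices, not YugabyteDB guidance. Check the flag reference for your version to confirm which server process reads ysql_clone_pg_schema_rpc_timeout_ms and how it can be changed.

```python
import math

MS_PER_MINUTE = 60_000


def suggest_clone_timeout_ms(observed_seconds: float, headroom: float = 1.5,
                             current_ms: int = 10 * MS_PER_MINUTE) -> int:
    """Hypothetical sizing rule: observed duration times headroom, rounded up
    to a whole minute, and never lower than the current setting."""
    needed_ms = observed_seconds * 1000 * headroom
    rounded_ms = math.ceil(needed_ms / MS_PER_MINUTE) * MS_PER_MINUTE
    return max(rounded_ms, current_ms)


# ~12 min 17 s from the clone request to the last object in this incident.
observed = 737.36
value = suggest_clone_timeout_ms(observed)
print(f"--ysql_clone_pg_schema_rpc_timeout_ms={value}")
```

With the roughly 737 seconds measured in this incident, the sketch suggests 1140000 ms (19 minutes). A short clone never lowers the value below the current 10-minute setting.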
Conclusion
- Database clone timeouts typically occur when the schema is large enough that dumping it and recreating its objects takes longer than the configured timeout.
- Increasing ysql_clone_pg_schema_rpc_timeout_ms to accommodate larger schemas or slower clusters should prevent premature failures.
Reference number
- SUPPORT-723