Table of Contents

Environment

YugabyteDB YSQL

Issue

When creating an index in YugabyteDB (or PostgreSQL), you may encounter the error:

ERROR:  timed out waiting for postgres backends to catch up
DETAIL:  1 backends on database 13515 are still behind catalog version 2.
HINT:  Run the following query on all tservers to find the lagging backends: SELECT * FROM pg_stat_activity WHERE backend_type != 'walsender' AND backend_type != 'yb-conn-mgr walsender' AND catalog_version < 2 AND datid = 13515;

This error indicates that some backend Postgres processes are still using an outdated catalog version, blocking the index creation.

Additional Information:

The yb-master UI shows the task as waiting on catalog version.

Index Creation Fails with “timed out waiting for postgres backends to catch up”.png

Resolution

Overview

The most common cause is a session in an idle in transaction state or long-running queries or transactions. This happens when a user or process connects to the database, runs BEGIN, and then leaves the session open without committing or rolling back. These sessions prevent the catalog from advancing, blocking DDL operations like CREATE INDEX.

Steps

Identify the catalog version the build is waiting on

On the same node where the CREATE INDEX session is running, open a second YSQL session and run the following query. It joins pg_stat_progress_create_index with pg_stat_activity to expose the current phase, wait event, and the catalog_version the build is waiting on:

SELECT
  i.relname AS indexname,
  t.relname AS tablename,
  p.phase,
  p.tuples_total,
  p.tuples_done,
  CASE
    WHEN p.tuples_total > 0
    THEN ROUND((p.tuples_done::numeric / p.tuples_total::numeric) * 100, 2)
    ELSE 0
  END AS percent_complete,
  now() - a.backend_start AS time_elapsed,
  a.wait_event_type,
  a.wait_event,
  a.catalog_version,
  a.datid,
  a.datname
FROM pg_stat_progress_create_index p
JOIN pg_stat_activity a ON p.pid = a.pid
JOIN pg_class i ON p.index_relid = i.oid
JOIN pg_class t ON p.relid = t.oid;

Example output:

       indexname       | tablename |    phase     | tuples_total | tuples_done | percent_complete |  time_elapsed   | wait_event_type |    wait_event    | catalog_version | datid |  datname
-----------------------+-----------+--------------+--------------+-------------+------------------+-----------------+-----------------+------------------+-----------------+-------+-----------
 idx_employees_email_1 | employees | initializing |      3000000 |           0 |             0.00 | 01:48:36.856407 | RPCWait         | WaitingOnTServer |               9 | 13515 | yugabyte

Interpret the row as follows:

phase = initializing with wait_event = WaitingOnTServer confirms the build is stuck in the catalog-propagation phase, not in the actual backfill.
catalog_version (9 in the example) is the version the backend is waiting for all other backends to reach. Use this value as the threshold in the next step.
datid (13515 in the example) is the OID of the database in which the index is being created. Catalog versions are tracked per database, so this value must also be carried forward as a filter in the next step to avoid false positives from sessions in unrelated databases.
time_elapsed tells you how long the build has been blocked.

Note: The view pg_stat_progress_create_index is local to the node that received the CREATE INDEX statement. Run this query from the same node as the CREATE INDEX session, otherwise it returns no rows.

Find the lagging backends on all tservers

Connect to each tserver in the universe and list the backends whose catalog_version is below the value from the previous step, scoped to the database where the index is being built. Replace 9 with the catalog_version and 13515 with the datid you observed:

SELECT *
FROM pg_stat_activity
WHERE backend_type != 'walsender' AND backend_type != 'yb-conn-mgr walsender'
  AND catalog_version < 9
  AND datid = 13515;   -- Replace with actual datid

Each row returned is a session that has not yet refreshed its catalog cache and is therefore blocking the index build. Note the pid, datname, usename, application_name, client_addr, state, backend_start, xact_start, and query columns for each row — you will need these to identify the owning application before taking action.

Apply a solution

Once the lagging backends are identified, choose one of the following solutions based on whether the blocking sessions are expected to complete on their own.

Solution 1: Increase Timeout

If you know that the blocking sessions will complete in a particular time, you can increase the timeout for waiting for backends to catch up by setting the yb_wait_for_backends_catalog_version_timeout flag to a higher value (e.g., 10 minutes):

set yb_wait_for_backends_catalog_version_timeout to '10min';

Solution 2: Verify, Confirm and Terminate Blocking Backends

Warning: Always confirm with the application team before terminating any backend. The lagging sessions returned by the previous step may correspond to legitimate batch jobs, ETL pipelines, or long-running reports. Abruptly terminating them can cause partial writes, application errors, or data inconsistency. Use the application_name, usename, client_addr, and query columns from pg_stat_activity to identify the owner and obtain explicit approval before proceeding.

Once approved, terminate each blocking session by its pid:

SELECT pg_terminate_backend(<pid>);

Replace <pid> with the process ID from the pg_stat_activity output of the Find the lagging backends on all tservers step.

Retry Index Creation

After terminating the blocking sessions, retry your CREATE INDEX statement.

If the Issue Persists
- Re-run the diagnostic to check for new blocking sessions.
- Investigate which user or process is creating these sessions and address the root cause (e.g., application code leaving transactions open or long-running queries).