- YugabyteDB - 2.4 or lower
- YCQL write operations fail with the following error:
Write failed: Operation failed. Try again. (yb/docdb/lock_batch.cc:31): Timeout: 1.309s: Failed to obtain locks until deadline: 21460499.690s
- High latencies for read/write operations due to RPC threads stuck waiting on a lock.
- The rpc_tp_TabletServer thread queue builds up, causing contention and eventually a deadlock condition, which is visible at https://<tablet-server-ip>:9000/threadz
Yugabyte's CoreDB engineering team is tracking the issue. Please refer to the following GitHub issues.
1. A rolling restart may resolve this issue. This can be done from Yugabyte's admin console.
2. Set the client and server YCQL timeouts to the same value, based on your application requirements. This value defaults to 60s:
DEFINE_int32(client_read_write_timeout_ms, 60000, "Timeout for client read and write operations.");
3. If the issue is not resolved after implementing steps 1 and 2, open a P1 ticket with Yugabyte Support.
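As a rough illustration of step 2, the following Python sketch compares a client-side timeout with the server-side value. The function name and the choice to express the client timeout in seconds are assumptions made for this example and are not part of any YugabyteDB driver or tool; only the 60000 ms default comes from the flag definition quoted above. How the two values are read from your driver configuration and tablet server flags is left out.

```python
# Default taken from the client_read_write_timeout_ms definition quoted above.
SERVER_DEFAULT_TIMEOUT_MS = 60_000


def compare_timeouts(client_timeout_s, server_timeout_ms=SERVER_DEFAULT_TIMEOUT_MS):
    """Classify how a client timeout (seconds) relates to the server timeout (ms).

    Hypothetical helper for illustration only.
    """
    client_ms = round(client_timeout_s * 1000)
    if client_ms < server_timeout_ms:
        # The situation the article warns about: the client gives up
        # while the server may still be retrying the operation.
        return "client-lower"
    if client_ms > server_timeout_ms:
        return "client-higher"
    return "aligned"


if __name__ == "__main__":
    for seconds in (10, 60, 120):
        print(seconds, compare_timeouts(seconds))
```

A result of "client-lower" corresponds to the configuration that step 2 is meant to eliminate.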
YugabyteDB internally retries operations on expired transactions, or on transactions that conflict with other transactions, when executing queries on a transaction-enabled YCQL table. When the client timeout is set lower than the server timeout, the YCQL service keeps retrying the operation without aborting it, which causes a large RPC queue to build up. This eventually leads to a deadlock condition in which threads trying to acquire a lock are blocked and waiting, resulting in overall latency and slowness.
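The queue build-up described above can be approximated with a toy model. The sketch below is not YugabyteDB code. It assumes, hypothetically, that each client reissues its request as soon as its own timeout expires, and that every request keeps a server thread busy until the server timeout is reached. Under those assumptions, the number of requests a single client can have in flight at once is the server timeout divided by the client timeout, rounded up.

```python
import math


def peak_inflight_ops(client_timeout_s, server_timeout_s, clients=1):
    """Estimate peak in-flight requests on the server in a simplified model.

    Assumptions (illustrative only):
      - a client reissues its request each time its own timeout expires;
      - each request occupies a server thread until the server timeout.
    """
    if client_timeout_s <= 0 or server_timeout_s <= 0:
        raise ValueError("timeouts must be positive")
    # Requests issued every client_timeout_s, each living server_timeout_s.
    per_client = max(1, math.ceil(server_timeout_s / client_timeout_s))
    return clients * per_client


if __name__ == "__main__":
    print(peak_inflight_ops(10, 60))               # mismatched timeouts
    print(peak_inflight_ops(60, 60))               # aligned timeouts
    print(peak_inflight_ops(10, 60, clients=100))  # many clients
```

In this model, a 10s client timeout against a 60s server timeout leaves six requests per client in flight, while aligned timeouts leave one. Real behavior depends on workload and retry policy, so the numbers only indicate the direction of the effect.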