Getting error messages like "Timed out waiting for server response" and "Client request timeout" – Yugabyte

Environment

Yugabyte core DB
YSQL
YCQL

Issue

Application reporting query timeout:

  com.datastax.oss.driver.api.core.DriverTimeoutException: Query timed out after PT2S  
    at com.datastax.oss.driver.api.core.DriverTimeoutException.copy(DriverTimeoutException.java:34)  
    at com.datastax.oss.driver.internal.core.util.concurrent.CompletableFutures.getUninterruptibly(CompletableFutures.java:149)

Query timeout using cqlsh

   OperationTimedOut: errors={'10.184.7.181': 'Client request timeout.  See Session.execute[_async](timeout)'}, last_host=10.184.7.xxx

Operation timeout

  Caused by: com.datastax.driver.core.exceptions.NoHostAvailableException: All host(s) tried for query failed (tried: /240b:c0e0:204:5400:b434:2:0:46af:9042 (com.datastax.driver.core.exceptions.OperationTimedOutException: [/240b:c0e0:204:5400:b434:2:0:46af:9042] Timed out waiting for server response), /240b:c0e0:204:5400:b434:2:0:4fb3:9042

Resolution

1. Increase the client timeout.

For YCQL:

cqlsh --request-timeout=<timeout-in-seconds>

The default value is 10 seconds.

Note: If the client driver's timeout is lower than the server-side timeout (client_read_write_timeout_ms), the driver may give up and retry the same operation while the server is still working on the original request. These duplicate requests add load and make the situation worse.

Therefore, another option is to raise the client driver timeout above the server-side client_read_write_timeout_ms setting.

2. Allow the server to serve longer client reads by raising client_read_write_timeout_ms. Start by increasing it from its default of 60000 (1 minute) to 300000 (5 minutes) to see whether that allows the query to succeed. This GFlag change requires a rolling restart of the tservers.

Note: The downside of increasing the timeouts is that they apply globally, and doing so removes the database's built-in mechanism for protecting itself against large queries that would pollute the signals the database uses to cache data.

For YSQL:

ysql_client_read_write_timeout_ms is the flag that overrides the value of client_read_write_timeout_ms for YSQL. It was introduced in v2.6/2.7 to separate the YSQL timeout parameter from the YCQL one. Earlier releases use the same parameter for both.

3. Use the partition_hash function to parallelize your query to avoid timing out on a single thread:

https://docs.yugabyte.com/latest/api/ycql/expr_fcall/#partition-hash-function
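As a rough illustration of this approach, the sketch below splits the partition_hash value space into equal slices and builds one YCQL query per slice, so each slice can be run as a separate, smaller request. It assumes the hash space is 0 to 65535, as described in the documentation linked above; the function name, the table name ks.t, and the hash columns h1 and h2 are hypothetical, and the queries are only built here, not executed.

```python
# Assumption: partition_hash() returns values in 0..65535 (see the linked docs).
HASH_SPACE = 65536


def hash_range_queries(table, hash_columns, num_ranges, select="*"):
    """Build one YCQL SELECT per slice of the partition_hash space.

    table and hash_columns are placeholders for your own schema;
    hash_columns must list the table's hash partition columns in order.
    """
    if not 1 <= num_ranges <= HASH_SPACE:
        raise ValueError("num_ranges must be between 1 and 65536")
    cols = ", ".join(hash_columns)
    queries = []
    for i in range(num_ranges):
        # Integer arithmetic keeps the slices contiguous and non-overlapping.
        lo = i * HASH_SPACE // num_ranges
        hi = (i + 1) * HASH_SPACE // num_ranges - 1
        queries.append(
            f"SELECT {select} FROM {table} "
            f"WHERE partition_hash({cols}) >= {lo} "
            f"AND partition_hash({cols}) <= {hi}"
        )
    return queries


if __name__ == "__main__":
    for q in hash_range_queries("ks.t", ["h1", "h2"], 4):
        print(q)
```

Each generated query covers only part of the table, so it is more likely to finish within the timeout. The queries can then be issued concurrently from the application, with the number of slices tuned to the table size.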

Root Cause:

Yugabyte has an internal query timeout that acts as a primitive throttling mechanism, preventing long-running or runaway queries from consuming all resources on the system. By default, this timeout is 60 seconds, so any query that runs longer than 60 seconds is cancelled and does not succeed.

In general, a query that hits this timeout is not written to take advantage of the Yugabyte architecture. Increasing client_read_write_timeout_ms may help to mitigate the issue, but it is not a recommended long-term solution. Instead, leverage the parallelism of YugabyteDB to scale your query, for example by using the partition_hash function if you are using the YCQL API. Find documentation on partition_hash here: https://docs.yugabyte.com/latest/api/ycql/expr_fcall/#partition-hash-function

If the failing query is a row count, review the article on ycrc, a tool that leverages the partition_hash function to parallelize a row count operation. The article is available here: https://yugabyte.zendesk.com/hc/en-us/articles/360060685992
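To illustrate the general idea of a parallelized row count (this is not ycrc's actual implementation), the sketch below sums per-slice counts over the partition_hash space using a thread pool. The callable count_range is hypothetical: it stands in for whatever code runs a SELECT COUNT(*) restricted to one partition_hash range and returns the result. The 0 to 65535 hash space is assumed from the partition_hash documentation.

```python
from concurrent.futures import ThreadPoolExecutor

# Assumption: partition_hash() returns values in 0..65535.
HASH_SPACE = 65536


def split_hash_space(num_ranges):
    """Return contiguous, non-overlapping (lo, hi) inclusive slices."""
    return [
        (i * HASH_SPACE // num_ranges, (i + 1) * HASH_SPACE // num_ranges - 1)
        for i in range(num_ranges)
    ]


def parallel_row_count(count_range, num_ranges=16, max_workers=8):
    """Sum the counts of all slices, running slices concurrently.

    count_range(lo, hi) is a hypothetical callable supplied by the caller;
    it should count rows whose partition_hash falls in [lo, hi].
    """
    ranges = split_hash_space(num_ranges)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return sum(pool.map(lambda r: count_range(*r), ranges))


if __name__ == "__main__":
    # Stand-in for a real query: pretend each hash value holds one row.
    fake_count = lambda lo, hi: hi - lo + 1
    print(parallel_row_count(fake_count))
```

Because each slice is counted by its own short query, no single request has to scan the whole table within the timeout. The number of slices and workers are tuning knobs and should be chosen with the cluster's capacity in mind.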
