Environment
- Yugabyte core DB
- YSQL
- YCQL
Issue
- Application reporting query timeout:
com.datastax.oss.driver.api.core.DriverTimeoutException: Query timed out after PT2S at com.datastax.oss.driver.api.core.DriverTimeoutException.copy(DriverTimeoutException.java:34) at com.datastax.oss.driver.internal.core.util.concurrent.CompletableFutures.getUninterruptibly(CompletableFutures.java:149)
- Query timeout using cqlsh
OperationTimedOut: errors={'10.184.7.181': 'Client request timeout. See Session.execute[_async](timeout)'}, last_host=10.184.7.xxx
- Operation timeout
Caused by: com.datastax.driver.core.exceptions.NoHostAvailableException: All host(s) tried for query failed (tried: /240b:c0e0:204:5400:b434:2:0:46af:9042 (com.datastax.driver.core.exceptions.OperationTimedOutException: [/240b:c0e0:204:5400:b434:2:0:46af:9042] Timed out waiting for server response), /240b:c0e0:204:5400:b434:2:0:4fb3:9042
Resolution
1. Increase the client timeout.
For YCQL:
cqlsh --request-timeout=<timeout-in-seconds>
The default cqlsh request timeout is 10 seconds.
Note: If the client driver's timeout is lower than the server-side timeout, the driver may give up on and retry an operation that the server is still processing. The resulting duplicate requests add load and make the situation worse.
Therefore, another option is to raise the client driver timeout above the server-side client_read_write_timeout_ms setting.
2. Allow the server to serve longer client reads by raising client_read_write_timeout_ms. Start by increasing it from its default of 60000 (1 minute) to 300000 (5 minutes) to see whether that allows the query to succeed. This GFlag change requires a rolling restart of the tservers.
Note: The downside of increasing the timeouts is that they are applied globally, and doing so removes the database's built-in mechanism for protecting itself against large queries that would pollute the signals the database uses to cache data.
For YSQL:
ysql_client_read_write_timeout_ms is the flag that overrides the value of client_read_write_timeout_ms for YSQL. This flag was introduced in v2.6/2.7 and separates the YSQL timeout parameter from the YCQL one. Earlier releases use the same parameter for both.
3. Use the partition_hash function to parallelize your query to avoid timing out on a single thread:
https://docs.yugabyte.com/latest/api/ycql/expr_fcall/#partition-hash-function
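As a minimal sketch of step 3, the snippet below splits the partition_hash value space into contiguous ranges and builds one YCQL query per range, so that each query scans only part of the table. It assumes the hash space is 0 to 65535, as described in the linked documentation. The function names, the table name, and the key column names are hypothetical and used only for illustration.

```python
def partition_hash_ranges(num_splits, hash_max=65535):
    """Split the inclusive hash space [0, hash_max] into contiguous inclusive ranges."""
    total = hash_max + 1
    if not 1 <= num_splits <= total:
        raise ValueError("num_splits must be between 1 and %d" % total)
    step, extra = divmod(total, num_splits)
    ranges = []
    start = 0
    for i in range(num_splits):
        # The first `extra` ranges take one additional value so nothing is left over.
        size = step + (1 if i < extra else 0)
        ranges.append((start, start + size - 1))
        start += size
    return ranges


def build_range_queries(table, key_columns, num_splits):
    """Return one YCQL SELECT per hash range (table and columns are placeholders)."""
    cols = ", ".join(key_columns)
    return [
        f"SELECT * FROM {table} "
        f"WHERE partition_hash({cols}) >= {lo} AND partition_hash({cols}) <= {hi};"
        for lo, hi in partition_hash_ranges(num_splits)
    ]
```

The arguments passed to partition_hash must be the partition key columns of the table. Each generated query can then be issued on its own connection or thread, so that no single request has to finish the whole scan within the timeout.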
Root Cause:
Yugabyte has an internal query timeout that acts as a primitive throttling mechanism, preventing long-running or runaway queries from consuming all resources on the system. By default, this timeout is 60 seconds. Any query that runs longer than 60 seconds is cancelled and does not succeed.
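Because the server cancels a query once client_read_write_timeout_ms elapses, a client timeout below that value only causes the driver to abandon and possibly retry work that is still in flight. The following is a minimal sketch of keeping the client timeout above the server-side value. It assumes a session object that exposes a default_timeout attribute in seconds, as the DataStax Python driver's Session does. The helper names and the 5-second margin are assumptions for illustration, not recommended values.

```python
def client_timeout_seconds(server_timeout_ms=60000, margin_seconds=5.0):
    """Return a client timeout in seconds that sits above the server-side timeout."""
    if server_timeout_ms <= 0 or margin_seconds < 0:
        raise ValueError("server_timeout_ms must be positive and margin_seconds non-negative")
    return server_timeout_ms / 1000.0 + margin_seconds


def apply_client_timeout(session, server_timeout_ms=60000, margin_seconds=5.0):
    """Set session.default_timeout (seconds) above the server-side timeout.

    `session` is assumed to behave like cassandra.cluster.Session, whose
    default_timeout attribute applies to requests without an explicit timeout.
    """
    session.default_timeout = client_timeout_seconds(server_timeout_ms, margin_seconds)
    return session.default_timeout
```

With the default server setting of 60000 ms, this yields a 65-second client timeout. If client_read_write_timeout_ms is raised to 300000, pass that value so the client limit stays above it.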
In general, a query that hits this timeout is not written to take advantage of the Yugabyte architecture. Increasing client_read_write_timeout_ms may mitigate the issue, but it is not a recommended long-term solution. Instead, leverage the parallelism of YugabyteDB to scale your query, for example by using the partition_hash function if you are using the YCQL API. Documentation on partition_hash is available here: https://docs.yugabyte.com/latest/api/ycql/expr_fcall/#partition-hash-function
If the failing query is a row count, review the article on ycrc, a tool that uses the partition_hash function to parallelize a row count operation. The article is available here: https://yugabyte.zendesk.com/hc/en-us/articles/360060685992
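The general idea of a parallelized row count can be sketched as follows: issue one count query per partition_hash range concurrently and sum the partial results. This is an illustration of the technique only, not the ycrc implementation. The function name and the run_count_query callable (which takes a query string and returns an integer count) are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor


def parallel_row_count(range_queries, run_count_query, max_workers=8):
    """Run each per-range count query concurrently and sum the partial counts.

    range_queries: iterable of query strings, one per partition_hash range.
    run_count_query: caller-supplied callable mapping a query string to an int.
    """
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map preserves input order and re-raises any worker exception.
        return sum(pool.map(run_count_query, range_queries))
```

With the DataStax Python driver, run_count_query could be something like `lambda q: session.execute(q).one()[0]`, assuming each query is a `SELECT count(*)` restricted to one hash range. Each partial count then only needs to complete within the timeout for its own range.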