Environment:
Yugabyte core DB - all versions
Issue
A CQL client encounters one or more of the following errors:
Client-side (application logs):
Unable to connect to any servers', {'10.20.30.40': ConnectionException('Failed to initialize
new connection to 10.20.30.40: Error from server: code=1001 [Coordinator node overloaded,
rejecting connection]com.datastax.driver.core.exceptions.OperationTimedOutException com.datastax.driver.core.exceptions.TransportException
TServer logs:
UpdateTransaction request on yb.tserver.TabletServerService from 10.238.220.15:38209 dropped due to backpressure. The service queue is full, it has 5000 items.
Note: The message "Coordinator node overloaded" is generated by the Cassandra/CQL client driver, not by the YugabyteDB server. The server sends CQL error code OVERLOADED (0x1001), and the driver translates that into the "Coordinator node overloaded" text the application sees.
Conceptual Background
Services, queues, and threads inside a TServer
Each YugabyteDB TServer process is composed of multiple services, each responsible for a distinct area of work — tablet operations, consensus (Raft leader election), CQL request handling, and so on. Each service has:
- A request queue — a bounded, FIFO queue where incoming RPCs wait to be processed.
- A pool of RPC worker threads — each thread picks up one request from the queue and processes it to completion before picking up the next.
The queue has a fixed maximum size (configurable per service). This limit exists to prevent unbounded memory growth: if the server cannot process requests as fast as they arrive, the queue would grow indefinitely and eventually cause an out-of-memory crash. The limit acts as a safety valve.
Why separate queues per service
Different services have different priorities. For example, Raft consensus (leader election, log replication) is more important than a regular read or write — if consensus is starved, it causes unnecessary leader movements and cluster instability.
By giving each service its own queue, a misbehaving or overloaded workload on one service (e.g. a flood of CQL writes) cannot starve a higher-priority service (e.g. consensus). Each service's queue fills and overflows independently.
How backpressure works
Here is what happens step by step when a CQL request arrives:
- CQL layer receives the request. Before doing anything else, it checks:
- Is the server under memory pressure? (soft memory limit exceeded?)
- Are there available CQL processor slots? (concurrent operation limit)
- If either check fails, the request is rejected immediately at the CQL layer — it never reaches the tablet server queue.
- Request enters the TServer service queue. If the CQL-layer checks pass, the request is placed in the tablet server's RPC service queue as a pending item.
- RPC worker threads process requests. Each thread picks up one item from the queue and works on it. Some operations are fast; others (compression, decompression, large scans, cross-shard transactions) are CPU-intensive and take longer.
- Queue fills up. If all RPC threads are busy with slow operations and new requests keep arriving faster than they are completed, the queue grows. Once it reaches its maximum capacity (default: 5000 items), the queue stops accepting new requests.
- Backpressure error is returned. The service pool returns ERROR_SERVER_TOO_BUSY to the layer that submitted the request. The CQL layer receives this and converts it to the CQL-native error code OVERLOADED (0x1001). The client driver then shows "Coordinator node overloaded."
Write-path throttling (separate from queue backpressure)
For write operations specifically, there are two additional checks that happen after the request has been dequeued and is being processed:
- Memory-based throttling: If the tablet's memory tracker shows usage above the soft limit (default 85%), the write is rejected probabilistically — the closer to 100%, the higher the rejection probability.
- SST file-based throttling: If a tablet has accumulated too many SST files (compaction is falling behind), writes are rejected probabilistically to give compaction time to catch up.
These write-path checks do not affect reads, metadata operations, or new connections. They only affect write requests that have already been dequeued.
Summary of the layers
Client Application
│
│ Sees: "Coordinator node overloaded" (code=1001)
│ This text comes from the CQL driver, not from the server.
│
▼
CQL Layer (cqlserver) ◄── Layer 1: CQL-level gates
│ ✓ Check: CQL processor slots available?
│ ✓ Check: Memory below soft limit?
│ If either fails → ERROR_SERVER_TOO_BUSY → OVERLOADED (0x1001)
│
▼
RPC Service Queue ◄── Layer 2: Queue capacity
│ ✓ Check: Queue has room? (default max: 5000)
│ If full → "dropped due to backpressure" → ERROR_SERVER_TOO_BUSY
│
▼
RPC Worker Threads ◄── Layer 3: Processing
│ Thread picks up request and processes it.
│ For writes only:
│ ✓ Check: Tablet memory above 85%? → probabilistic reject
│ ✓ Check: SST files above 24? → probabilistic reject
│
▼
RocksDB / StorageA request can be rejected at any of these layers. Each rejection ultimately results in the same OVERLOADED (0x1001) error to the client, but the cause and the fix are different depending on which layer rejected it. The TServer logs indicate which layer triggered the rejection.
Root Cause — The Four Triggers
The YugabyteDB server returns error code OVERLOADED (0x1001) to CQL clients when it cannot accept more work. Internally, any component that returns ERROR_SERVER_TOO_BUSY is converted by the CQL layer into this OVERLOADED response.
There are four distinct triggers, listed from most common to least common.
1. RPC service queue full (most common)
When all RPC worker threads are busy and the queue reaches its capacity, new incoming requests are rejected with backpressure.
Relevant flag:
| Flag | Default | Description |
|---|---|---|
| tablet_server_svc_queue_length | 5000 | Maximum number of pending requests in the tablet server service queue |
Log signature:
request on yb.tserver.TabletServerService from <ip>:<port> dropped due to backpressure. The service queue is full, it has 5000 items.
2. CQL processor limit exhaustion (common)
The CQL layer limits how many concurrent CQL processors (roughly: active CQL operations) can exist. When this limit is reached, new CQL calls are rejected before they even reach the tablet server queue.
Relevant flag:
| Flag | Default | Description |
|---|---|---|
| cql_processors_limit | -4000 | Max CQL processors. Positive = absolute limit. Negative = per 1 GB of memory limit (e.g. -4000 means 4000 processors per GB). 0 = unlimited. |
Log signature:
Unable to allocate CQL processor, already allocated <N> of <limit>
Code: src/yb/yql/cql/cqlserver/cql_service.cc (GetProcessor / CQLProcessorsLimit)
3. Memory pressure (less common)
There are two memory pressure checks that can trigger this error:
3a. CQL-level memory throttle (applies to ALL CQL calls)
Before a CQL request is queued, the CQL layer checks whether the server is under memory pressure. If the soft memory limit is exceeded, the call is rejected.
Relevant flags:
| Flag | Default | Description |
|---|---|---|
| memory_limit_soft_percentage | 85 | Percentage of hard memory limit at which throttling begins |
| throttle_cql_calls_on_soft_memory_limit | true | Enable/disable CQL memory throttling |
| throttle_cql_calls_policy | 0 | 0 = respond with OVERLOADED error; 1 = silently drop (client gets timeout instead of an explicit error) |
3b. Write-path memory throttle (applies to WRITES only)
When processing a write request, the server checks memory usage at the tablet level. If usage is between the soft limit (85%) and the hard limit (100%), writes are rejected with a probability that increases linearly.
Rejection probability formula:
P(reject) = (M - 85) / 15
Where M is memory usage as a percentage of the hard limit (85 ≤ M ≤ 100).
| Memory usage | Rejection probability |
|---|---|
| 85% | 0% |
| 90% | 33% |
| 92.5% | 50% |
| 100% | 100% |
Caution: An older version of this KB stated the formula as (M-85)/100, which dramatically understates the rejection rate. The correct denominator is 15 (the range from soft limit to hard limit: 100 - 85 = 15).
Code: src/yb/util/mem_tracker.cc (SoftLimitExceeded, line 668)
4. SST file count pressure (uncommon, WRITES only)
When a tablet has accumulated many SST files (indicating compaction is falling behind), write requests are rejected probabilistically to give compaction time to catch up. This does not affect reads or new connections.
Rejection probability formula:
P(reject) = (F - soft_limit) / (hard_limit - soft_limit)
Where F is the number of SST files.
Relevant flags:
| Flag | Default | Description |
|---|---|---|
| sst_files_soft_limit | 24 | SST file count at which probabilistic rejection starts |
| sst_files_hard_limit | 48 | SST file count at which all writes are rejected |
| SST file count | Rejection probability |
|---|---|
| 24 | 0% |
| 36 | 50% |
| 48 | 100% |
Resolution
Immediate actions
- Reduce processing load — lower the number of queries per second from the application side.
- Reduce concurrent connections — each connection consumes a CQL processor slot. Use connection pooling and keep pool sizes reasonable for the cluster size.
- Switch from async to sync queries (or add concurrency limits) — if the application uses executeAsync, hundreds or thousands of queries may execute simultaneously, overwhelming the server. Either switch to synchronous execution or add a concurrency limiter (e.g. a semaphore or bounded executor) to cap in-flight requests.
- Check for compaction backlog — if TServer logs show SST file limit messages, compaction is falling behind. Ensure sufficient disk I/O capacity and CPU.
Scaling
If the load is expected and sustained, the cluster is likely undersized:
- Scale up (more CPU/RAM per node) — directly increases RPC thread capacity and memory headroom.
- Scale out (more nodes) — distributes requests across more TServers.
The most common occurrence for this error is during bulk load or COPY operations that are too aggressive. If the load pattern is not a one-time event and becomes regular, the cluster should be sized to handle it.
Tuning flags (use with caution)
| Flag | When to tune |
|---|---|
| tablet_server_svc_queue_length (default 5000) | Increase if you have CPU headroom but see brief queue spikes. Without more CPU, this only delays the error. |
| cql_processors_limit (default -4000) | Increase if CQL processor exhaustion is the bottleneck and memory allows it. |
| memory_limit_soft_percentage (default 85) | Raise to delay memory-based throttling. Risk: increases chance of OOM. |
| sst_files_soft_limit / sst_files_hard_limit (default 24/48) | Raise if compaction is temporarily behind and you want to allow more writes through. Risk: read amplification from too many SST files. |
| throttle_cql_calls_on_soft_memory_limit (default true) | Set to false to disable CQL-level memory throttling entirely. Risk: OOM under pressure. |
Important: Increasing queue sizes or limits without adding CPU or memory only delays the error — it does not solve the underlying capacity problem. For sustained high load, scaling the cluster is the correct long-term fix.
Reference ID: SUPPORT-900
Comments
0 comments
Please sign in to leave a comment.