Throttling mechanism in YugaByte TServer due to high memory usage – Yugabyte

When a Yugabyte TServer instance is deployed, it is configured with hard and soft memory limits. By default, if memory_limit_hard_bytes is not set, the hard limit is 85% of the available RAM. This default comes from the following flag definition in src/yb/util/mem_tracker.cc in the yugabyte-db repository:

DEFINE_double(default_memory_limit_to_ram_ratio, 0.85,
            "If memory_limit_hard_bytes is left unspecified, then it is "
            "set to default_memory_limit_to_ram_ratio * Available RAM.");

By default, the soft limit is 85% of the hard limit, as set by the following flag in the same file:

DEFINE_int32(memory_limit_soft_percentage, 85,
            "Percentage of the hard memory limit that this daemon may "
            "consume before memory throttling of writes begins. The greater "
            "the excess, the higher the chance of throttling. In general, a "
            "lower soft limit leads to smoother write latencies but "
            "decreased throughput, and vice versa for a higher soft limit.");
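The two flags can be combined into concrete byte values roughly as follows. This is a minimal sketch under stated assumptions, not the code in mem_tracker.cc: MemoryLimits and ComputeMemoryLimits are hypothetical names, a memory_limit_hard_bytes value of 0 is assumed to mean "unspecified", and truncation to whole bytes is assumed where the real implementation may round differently.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical container for the two limits, in bytes.
struct MemoryLimits {
  int64_t hard_bytes;
  int64_t soft_bytes;
};

// Hypothetical sketch of how the documented flags combine.
// Assumption: memory_limit_hard_bytes == 0 means "unspecified"; any positive
// value is used as-is. Other special values are not modelled.
MemoryLimits ComputeMemoryLimits(int64_t available_ram_bytes,
                                 int64_t memory_limit_hard_bytes = 0,
                                 double default_memory_limit_to_ram_ratio = 0.85,
                                 int32_t memory_limit_soft_percentage = 85) {
  assert(available_ram_bytes > 0);
  assert(memory_limit_soft_percentage >= 0 && memory_limit_soft_percentage <= 100);

  // Hard limit: explicit setting if given, otherwise ratio * available RAM
  // (truncated to whole bytes).
  const int64_t hard =
      memory_limit_hard_bytes > 0
          ? memory_limit_hard_bytes
          : static_cast<int64_t>(available_ram_bytes * default_memory_limit_to_ram_ratio);

  // Soft limit: a percentage of the hard limit (integer division truncates).
  const int64_t soft = hard * memory_limit_soft_percentage / 100;
  return MemoryLimits{hard, soft};
}
```

For a node with 16 GiB of RAM, ComputeMemoryLimits(16LL << 30) yields a hard limit of 14602888806 bytes (about 13.6 GiB) and a soft limit of 12412455485 bytes (about 11.56 GiB).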

For example, on a 16GB node the hard limit is 0.85 × 16GB = 13.6GB and the soft limit is 0.85 × 13.6GB = 11.56GB.

Once consumption exceeds the 11.56GB soft limit, the TServer starts rejecting new internal writes, with a probability that depends on current memory consumption. (When a TServer receives an insert, update, delete or similar request from an external client, the request is converted into one or more internal writes, which are then replicated through Raft.) For each internal write, a random number between 0 and 1 is drawn, and the write is rejected if that number is less than (C - S)/(H - S), where S is the soft limit, H is the hard limit and C is the current consumption.
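The rejection check described above can be sketched in C++ as follows. The function names are hypothetical, std::mt19937_64 with std::uniform_real_distribution is an arbitrary choice of random source (not necessarily what YugabyteDB uses), and consumption at or beyond either limit is short-circuited to probability 0 or 1 so the behaviour at the boundaries is exact.

```cpp
#include <cassert>
#include <cstdint>
#include <random>

// Rejection probability (C - S) / (H - S), clamped to [0, 1]:
// 0 at or below the soft limit S, 1 at or above the hard limit H.
double RejectionProbability(int64_t current_bytes, int64_t soft_limit_bytes,
                            int64_t hard_limit_bytes) {
  assert(soft_limit_bytes < hard_limit_bytes);
  if (current_bytes <= soft_limit_bytes) return 0.0;
  if (current_bytes >= hard_limit_bytes) return 1.0;
  return static_cast<double>(current_bytes - soft_limit_bytes) /
         static_cast<double>(hard_limit_bytes - soft_limit_bytes);
}

// Hypothetical per-write check: draw a uniform number in [0, 1) and reject
// the internal write if it is smaller than the rejection probability.
bool ShouldRejectInternalWrite(int64_t current_bytes, int64_t soft_limit_bytes,
                               int64_t hard_limit_bytes, std::mt19937_64& rng) {
  const double p =
      RejectionProbability(current_bytes, soft_limit_bytes, hard_limit_bytes);
  if (p <= 0.0) return false;  // at or below the soft limit: never throttle
  if (p >= 1.0) return true;   // at or above the hard limit: always throttle
  std::uniform_real_distribution<double> uniform(0.0, 1.0);
  return uniform(rng) < p;
}
```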

The rejection probability is therefore 0 at 11.56GB and 100% at 13.6GB: no internal writes are rejected when consumption is at 11.56GB, some are rejected when it is between 11.56GB and 13.6GB, and all are rejected once it reaches 13.6GB.
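Plugging the 16GB example into the formula makes the behaviour between the two limits concrete; halfway between S and H, roughly every second internal write is rejected:

```latex
p(C) = \frac{C - S}{H - S}, \qquad S = 11.56\,\text{GB}, \quad H = 13.6\,\text{GB}

p(11.56) = \frac{0}{2.04} = 0, \qquad
p(12.58) = \frac{1.02}{2.04} = 0.5, \qquad
p(13.6) = \frac{2.04}{2.04} = 1
```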

Note that a rejected internal write does not necessarily mean that the client's write request fails. As mentioned, a single client request may be translated into multiple internal writes, and only some of them may be rejected. When an internal write is rejected, the TServer retries it after a delay controlled by the following flag in src/yb/rpc/rpc.cc:

DEFINE_int32(linear_backoff_ms, 1,
            "Number of milliseconds added to delay while using linear backoff strategy.");

That is, after each rejection the internal write is retried after a delay that grows by 1ms with every retry: the first retry waits 1ms, the second 2ms, the third 3ms, and so on.
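The retry schedule can be expressed as a small helper. This sketch only models the linear increment controlled by linear_backoff_ms; the function names are hypothetical, and any jitter, upper bound or RPC deadline the real retrier in rpc.cc may apply is ignored.

```cpp
#include <cassert>
#include <cstdint>

// Delay before the n-th retry (1-based) under linear backoff: each retry
// waits linear_backoff_ms longer than the previous one.
int64_t LinearBackoffDelayMs(int retry_number, int64_t linear_backoff_ms = 1) {
  assert(retry_number >= 1);
  return retry_number * linear_backoff_ms;
}

// Total time spent waiting across the first `retries` retries.
int64_t TotalBackoffMs(int retries, int64_t linear_backoff_ms = 1) {
  assert(retries >= 0);
  int64_t total = 0;
  for (int n = 1; n <= retries; ++n) {
    total += LinearBackoffDelayMs(n, linear_backoff_ms);
  }
  return total;
}
```

With the default of 1ms, an internal write that needs 10 retries waits 1 + 2 + ... + 10 = 55ms in total (in general k(k+1)/2 ms for k retries).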

Because of these retry delays, the original write request from the external client takes longer to complete, and the extra latency grows with the number of retries its internal writes need.
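To get a feel for the extra latency, the rejection check and the linear backoff can be combined in a small simulation of a single internal write. This is a hypothetical model, not YugabyteDB code: it assumes memory consumption (and therefore the rejection probability) stays constant across retries, treats each attempt as an independent random draw, and uses max_retries as a stand-in for the RPC deadline.

```cpp
#include <cassert>
#include <cstdint>
#include <random>

// Result of one simulated internal write.
struct WriteOutcome {
  bool accepted;      // false if max_retries was exhausted
  int retries;        // number of retries performed
  int64_t waited_ms;  // total backoff delay accumulated
};

// Hypothetical simulation: each attempt is rejected with a fixed probability;
// after a rejection the write is retried with linear backoff.
WriteOutcome SimulateInternalWrite(double rejection_probability, int max_retries,
                                   std::mt19937_64& rng,
                                   int64_t linear_backoff_ms = 1) {
  assert(rejection_probability >= 0.0 && rejection_probability <= 1.0);
  std::bernoulli_distribution rejected(rejection_probability);
  WriteOutcome out{false, 0, 0};
  while (true) {
    if (!rejected(rng)) {
      out.accepted = true;  // attempt went through
      return out;
    }
    if (out.retries == max_retries) return out;  // give up (deadline stand-in)
    ++out.retries;
    out.waited_ms += out.retries * linear_backoff_ms;  // wait before this retry
  }
}
```

With a rejection probability of 0.5 (consumption halfway between the soft and hard limits), an internal write in this model needs about one retry on average, because the number of rejections before success is geometrically distributed with mean p/(1 - p).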
