This issue affects all currently released versions of Yugabyte.
- Database operations fail with the message
"Operation failed, memory consumption has exceeded its limit or the limit
of an ancestral tracker".
- Inbound RPCs hang with the message
"Unable to allocate read buffer because of limit".
- Soft memory limit errors:
"Soft memory limit exceeded (at X% of capacity)" or "We have exceeded our soft
memory limit (current capacity is X%). However, there are no ops currently
runnable which would free memory".
- CQL calls are rejected on the client side when the soft memory limit is reached,
with errors such as "Coordinator node overloaded" or "Unable to connect".
- Exceeding the soft and hard memory limits can trigger out-of-memory (OOM) kills.
Configure TServer flags to help alleviate the issue
Changing these flags requires a restart of the affected processes to take effect.
- Log in to the Yugabyte DB admin console.
- Click Universes, select the universe you want to change, and go to Nodes.
- Click the Actions drop-down at the top-right corner.
- Click Edit Flags.
db_block_cache_size_percentage = 40
global_memstore_size_mb_max = 1024
Note: global_memstore_size_mb_max takes a value in megabytes, so 1024 corresponds to 1 GB.
- Update the values for the T-Server, confirm Rolling is selected as the upgrade option, and click OK.
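For reference, the same settings correspond to yb-tserver command-line flags when the server is managed outside the console. The following is an illustrative sketch only; the values match the steps above but are examples, not recommendations, and the data directory path is hypothetical:

```shell
# Illustrative yb-tserver invocation with the two memory-related flags.
# global_memstore_size_mb_max is specified in megabytes (1024 MB = 1 GB).
yb-tserver --fs_data_dirs=/mnt/d0 \
  --db_block_cache_size_percentage=40 \
  --global_memstore_size_mb_max=1024
```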
Consider a Schema or DB Redesign
Colocated Tables: Another way to reduce memory usage is to limit the number of tablets per node using a feature called colocated tables. Colocation places all of the data into a single tablet, reducing the tablet count per node and, with it, resource consumption and RPC volume. If your workload supports it, colocated tables may be a partial solution; review your schema to see whether it fits this use case.
Note: Colocation is enabled at database creation time. Data in the colocation tablet is still replicated according to the cluster's replication factor. Refer to the following links to explore further.
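As a sketch, enabling colocation for a new database looks like the following. The database name app_db is hypothetical, and the exact property spelling varies by release (older versions use colocated rather than COLOCATION):

```shell
# Colocation is chosen when the database is created; every table in
# app_db will then live in the single colocation tablet.
ysqlsh -c "CREATE DATABASE app_db WITH COLOCATION = true;"
```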
Reduce Shards per TServer or per Table
This is recommended for small tables. You can set the values globally using TServer flags, or on a per-table basis using a feature called tablet splitting. Refer to the following links to explore both concepts.
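Both approaches can be sketched as follows. The flag values and the table definition are illustrative assumptions, and the per-tserver shard flags affect newly created tables only:

```shell
# Global defaults: one tablet per table per tserver for new YSQL and
# YCQL tables (illustrative values, not recommendations).
yb-tserver --ysql_num_shards_per_tserver=1 --yb_num_shards_per_tserver=1

# Per-table alternative: fix the tablet count at creation time.
ysqlsh -c "CREATE TABLE small_lookup (k INT PRIMARY KEY, v TEXT) SPLIT INTO 1 TABLETS;"
```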
The following are some of the root causes:
- The database has a soft memory limit, 85% of the hard limit by default, controlled by the memory_limit_soft_percentage parameter; when it is exceeded, write requests are throttled.
DEFINE_int32(memory_limit_soft_percentage, 85,
             "Percentage of the hard memory limit that this daemon may "
             "consume before memory throttling of writes begins. The greater "
             "the excess, the higher the chance of throttling. In general, a "
             "lower soft limit leads to smoother write latencies but "
             "decreased throughput, and vice versa for a higher soft limit.");
TAG_FLAG(memory_limit_soft_percentage, advanced);
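The behavior the flag describes can be sketched as follows. This is an illustrative model only, not Yugabyte's actual implementation; the hard limit value is hypothetical:

```python
# Illustrative sketch of soft-limit write throttling: below the soft
# limit writes always proceed; between the soft and hard limits the
# rejection probability grows with the excess ("the greater the excess,
# the higher the chance of throttling"); at the hard limit all writes
# are rejected.
import random

HARD_LIMIT_BYTES = 4 * 1024**3      # hypothetical 4 GiB hard limit
SOFT_LIMIT_PERCENTAGE = 85          # mirrors memory_limit_soft_percentage

def should_throttle_write(consumed_bytes, rng=random.random):
    """Return True if a write should be throttled at this memory usage."""
    soft_limit = HARD_LIMIT_BYTES * SOFT_LIMIT_PERCENTAGE / 100
    if consumed_bytes <= soft_limit:
        return False
    if consumed_bytes >= HARD_LIMIT_BYTES:
        return True
    # Linear ramp of rejection probability between soft and hard limits.
    excess_fraction = (consumed_bytes - soft_limit) / (HARD_LIMIT_BYTES - soft_limit)
    return rng() < excess_fraction
```

Lowering the soft limit starts throttling earlier, smoothing write latencies at the cost of throughput, exactly as the flag's help text states.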
In certain situations, such as when a large database operation is running while a node restarts, a new node joins the cluster, or there is a large lag between leader and followers, the size and number of RPCs used to catch followers up with tablet leaders increases significantly.
The tablet server is slow to process these AppendEntries RPCs because of their volume. The inbound RPCs do not fit in the read buffer, the BinaryCallParser cannot grow the buffer because of the memory limits, and the requests hang indefinitely. With the memory limits exceeded and the TCP receive buffer full, these requests are blocked from further processing. The unprocessed requests consume more memory until the hard memory limit is reached, triggering out-of-memory (OOM).
- This issue can also occur due to a high tablet count per node in the cluster. Refer to the article High number of tablets.
If you need more help, please reach out to Yugabyte Support.