This issue affects all currently released versions of Yugabyte.
- The Yugabyte database and node crash, with errors like the following in the OS diagnostic logs:
dmesg -T | grep -i 'Out of memory'
Out of memory: Kill process 23930 (yb-tserver) score 420 or sacrifice child
- We have also seen the tablet server process crash due to an open issue in the gperftools 2.8 package (tcmalloc library). Please see the following GitHub issues for more details.
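To see which processes the OOM killer has targeted, the dmesg output shown above can be summarized per process. The following is a minimal Python sketch, not an official tool: the function name oom_victims and the regular expression are illustrative, and the pattern assumes the kernel message format shown in this article (it also accepts the "Killed process" wording printed by newer kernels).

```python
import re
from collections import Counter

# Matches lines such as:
#   Out of memory: Kill process 23930 (yb-tserver) score 420 or sacrifice child
# "Kill(ed)?" also covers the "Killed process" wording of newer kernels.
OOM_LINE = re.compile(
    r"Out of memory: Kill(?:ed)? process (\d+) \(([^)]+)\)",
    re.IGNORECASE,
)


def oom_victims(dmesg_text):
    """Count OOM-killer events per process name in dmesg-style text."""
    return Counter(match.group(2) for match in OOM_LINE.finditer(dmesg_text))


# The first line is taken from this article; the other two are hypothetical samples.
sample = "\n".join([
    "Out of memory: Kill process 23930 (yb-tserver) score 420 or sacrifice child",
    "Out of memory: Kill process 24011 (postgres) score 35 or sacrifice child",
    "Out of memory: Killed process 25102 (yb-tserver) total-vm:1024kB",
])

print(oom_victims(sample).most_common())
# prints [('yb-tserver', 2), ('postgres', 1)]
```

On a live node, the text passed to oom_victims would be the captured output of dmesg -T rather than the sample string.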
Troubleshooting Out of memory (OOM) issues has a broad scope, as they can occur for numerous reasons depending on the environment and the workload. This article covers only tuning the Yugabyte database cluster to reduce the likelihood of a crash due to OOM. It is also essential to optimize memory performance following OS best practices. Please also refer to your OS-specific documentation to prevent system crashes caused by the Out of Memory killer.
Drop unused tables
The first recommended solution is to drop unused tables, as each table will create tablets according to the sharding strategy in use, which is by default 8 shards per table. Review your data and determine if you can drop test tables or tables which are not in use, or see if you can do a schema re-design to reduce the number of tables required.
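To get a rough sense of how much dropping tables helps, the tablet count can be estimated from the number of tables. The sketch below assumes the default of 8 shards per table mentioned above. The replication factor of 3, the node count of 3, and an even spread of tablet replicas across nodes are assumptions for illustration only, and estimate_tablets is a hypothetical helper name.

```python
import math


def estimate_tablets(num_tables, shards_per_table=8,
                     replication_factor=3, num_nodes=3):
    """Return (total tablets, approximate tablet replicas per node).

    Assumes every table uses the same shard count and that replicas
    are spread evenly across nodes (an idealization).
    """
    total_tablets = num_tables * shards_per_table
    replicas_per_node = math.ceil(
        total_tablets * replication_factor / num_nodes
    )
    return total_tablets, replicas_per_node


before = estimate_tablets(200)   # 200 tables in the cluster
after = estimate_tablets(150)    # after dropping 50 unused tables
print(before, after)
# prints (1600, 1600) (1200, 1200)
```

Under these assumptions, dropping 50 of 200 tables removes 400 tablets from the cluster, which is why removing unused tables is listed first.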
Reduce SQL connections
Each SQL connection has some overhead, because every connection is served by a separate postgres process. Especially on systems with small amounts of memory, reducing the number of postgres connections may reduce the chance of the OOM killer being triggered. A limit can be enforced by setting the ysql_max_connections flag.
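One way to choose a value is to work backwards from the memory you are willing to spend on connections. The sketch below is illustrative only: the memory budget and the per-connection figure are hypothetical inputs that you would need to measure on your own system, and both helper names are made up for this example. The only name taken from Yugabyte is the ysql_max_connections flag itself, rendered here in the usual --flag=value form.

```python
def max_connections_for_budget(connection_budget_mb, per_connection_mb):
    """Largest connection count whose combined memory fits the budget."""
    if per_connection_mb <= 0:
        raise ValueError("per_connection_mb must be positive")
    return max(1, int(connection_budget_mb // per_connection_mb))


def tserver_flag(name, value):
    """Render a flag in --name=value form."""
    return f"--{name}={value}"


# Hypothetical numbers: 2048 MB reserved for connections,
# 10 MB measured per postgres backend process.
limit = max_connections_for_budget(2048, 10)
print(tserver_flag("ysql_max_connections", limit))
# prints --ysql_max_connections=204
```

Measure the per-connection memory under your real workload before relying on a number like this, since it varies with query complexity and session state.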
Use colocated tables or tablegroups (Beta)
If your workload supports it, using colocated tables may be a partial solution. Colocated tables reside on the same tablets, so each additional table adds no tablet overhead. Small, relational tables are good candidates for colocation. Consider your options regarding colocated tables and see if you can modify your schema to fit this use case.
Note: This feature is in Beta and should NOT be used in production environments.
Tune configuration flags to decrease the overhead of each individual tablet
If you are unable to reduce the number of tablets to an acceptable level using one of the methods above, you can reduce the memory usage of the core DB by adjusting the following settings. Please confirm these values with Yugabyte Support for your specific use case:
db_block_cache_size_percentage = 40
global_memstore_size_mb_max = 1024
default_memory_limit_to_ram_ratio = 0.80
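To see what these values mean in absolute terms, the sketch below converts them to megabytes for a given node size. It rests on several assumptions that should be checked against the documentation for your version: that default_memory_limit_to_ram_ratio sets the process memory limit as a fraction of RAM, that db_block_cache_size_percentage is a percentage of that limit, and that global_memstore_size_mb_max caps a memstore otherwise sized as a percentage of the limit (10 percent is assumed here). The function name and the 16 GB node are hypothetical.

```python
def tserver_memory_budget(total_ram_mb,
                          ram_ratio=0.80,
                          block_cache_pct=40,
                          memstore_max_mb=1024,
                          memstore_pct=10):
    """Approximate memory sizes in MB implied by the flag values.

    memstore_pct is an assumed percentage of the memory limit; the
    other defaults are the values listed in this article.
    """
    memory_limit = total_ram_mb * ram_ratio
    block_cache = memory_limit * block_cache_pct / 100
    memstore = min(memory_limit * memstore_pct / 100, memstore_max_mb)
    return {
        "memory_limit_mb": round(memory_limit),
        "block_cache_mb": round(block_cache),
        "memstore_mb": round(memstore),
    }


# Hypothetical node with 16 GB (16384 MB) of RAM.
print(tserver_memory_budget(16384))
# prints {'memory_limit_mb': 13107, 'block_cache_mb': 5243, 'memstore_mb': 1024}
```

Running the numbers for your actual node size makes it easier to discuss the proposed values with Yugabyte Support.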
If you have any questions or feedback, please reach out to Yugabyte Support.