Environment:
YugabyteDB using YSQL
This issue affects specific versions based on an upgrade path: a cluster is affected only if it was upgraded from one of the initial versions below to one of the final versions below.
Initial version:
- 2.1.8.1 to 2.2.6
- any 2.3
- 2.5.0 to 2.5.1
Final version:
- 2.4
- 2.7.1
Problem summary:
After upgrading according to the pattern above, the following issues appear.
1. The health check (core files) fails on the nodes.
Many core files appear with the following message:
core_postgres.6751: ELF 64-bit LSB core file x86-64, version 1 (SYSV), SVR4-style, from 'postgres: shopping_cart_dev_user shopping_cart_dev 10.35.232.14(58534) startup', real uid: 993, effective uid: 993, real gid: 989, effective gid: 989, execfn: '/home/yugabyte/yb-software/yugabyte-2.4.4.0-b7-centos-x86_64/postgres/bin/postgres', platform: 'x86_64'
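The message above is the output of the file utility run against one of the core dumps. A minimal check, assuming the cores are collected under /home/yugabyte/cores (this path is deployment-specific), to confirm that the dumps come from the postgres binary shipped with the upgraded release:

cd /home/yugabyte/cores    # hypothetical cores directory; adjust to where the health check reports cores
file core_postgres.*       # prints the originating binary and process info for each core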
2. Connections to postgres fail immediately with the following error:
[yugabyte@hostname logs]$ ysqlsh
ysqlsh: server closed the connection unexpectedly
This probably means the server terminated abnormally
before or while processing the request.
Resolution
Product fixes for this issue are available in the following releases:
- 2.4.5
- 2.6
- 2.7.2
Workaround:
As a short-term workaround, set the following gflag on the master servers to resolve this issue on one of the affected versions:
use_docdb_aware_bloom_filter = false
Note: apply this flag ONLY to the master processes.
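A minimal sketch of applying the workaround on a manually deployed node, assuming yb-master reads its gflags from a flagfile at /home/yugabyte/master/conf/server.conf (both the path and the service manager are deployment-specific assumptions; on Yugabyte Platform-managed universes, set the gflag through the platform's G-Flags interface instead):

# Hypothetical flagfile path; adjust to your deployment's layout.
MASTER_CONF=/home/yugabyte/master/conf/server.conf

# Append the flag only if it is not already present.
grep -q '^--use_docdb_aware_bloom_filter' "$MASTER_CONF" \
  || echo '--use_docdb_aware_bloom_filter=false' >> "$MASTER_CONF"

# Restart only the yb-master process so it picks up the new flag; use whatever
# mechanism manages yb-master in your environment (systemd shown as an example).
sudo systemctl restart yb-master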
Root Cause
This issue is caused by an incompatible bloom filter that was generated for the master tablets; the filter started being applied in 2.4.4.
Starting in the affected initial versions (2.1.8.1 to 2.2.6, any 2.3, or 2.5.0 to 2.5.1), a bloom filter compatible with hash-partitioned tablets was generated for the master tablets. However, the master tables for SQL use range-partitioned data. The issue did not appear in any of the initial versions because master tablets did not have a RocksDB block cache; the bloom filters were being generated incorrectly, they just weren't used.
In versions 2.4.4 and 2.7, a block cache was introduced for the masters. As a result, the incorrectly generated bloom filter was applied, causing a panic. This is why the workaround of disabling bloom filters for the masters is successful: even though a bad bloom filter still exists, the flag prevents it from being applied, so the panic cannot happen.
This specifically occurs on the postgres system tables, which are read during YSQL connection startup, because postgres system tables are range partitioned by default rather than hash partitioned. The issue cannot occur for YCQL use cases because CQL tables cannot be created with range sharding. For more detail on the difference between range- and hash-partitioned data, see the sharding documentation here: https://docs.yugabyte.com/latest/architecture/docdb-sharding/sharding/
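For illustration only (the postgres system catalog tables are created internally, not by users), the hypothetical user tables below show the two sharding modes in YSQL: declaring the first primary-key column with ASC or DESC produces a range-sharded table, while HASH (the YSQL default for the first primary-key column) produces a hash-sharded one.

# Hash-sharded table (the YSQL default for the first primary-key column):
ysqlsh -c "CREATE TABLE hash_example (k int, v text, PRIMARY KEY (k HASH));"

# Range-sharded table, the layout used by the postgres system catalog tables:
ysqlsh -c "CREATE TABLE range_example (k int, v text, PRIMARY KEY (k ASC));"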
The long-term resolution takes range-partitioned data into account and generates a correct bloom filter for the RocksDB scan.