Environment
- YugabyteDB - 2.14, 2.16, and 2.18
Issue
Query performance degrades in a multi-region deployment. Typically the degradation is reported only for specific queries, and it is seen across regions.
Resolution
Overview
This problem is due to a known issue (https://github.com/yugabyte/yugabyte-db/issues/16999) with LZ4 compression. It can be reproduced by setting --stream_compression_algo=3. The --stream_compression_algo flag specifies which RPC compression algorithm to use (it requires --enable_stream_compression to be set to true). Valid values are:
- 0: No compression (default value)
- 1: Gzip
- 2: Snappy
- 3: LZ4
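As a quick sanity check, the compression flags currently in effect on a TServer can be inspected through its web UI. This is a hedged sketch: the host is a placeholder, and 9000 is assumed to be the default yb-tserver web UI port in your deployment.

```shell
# Inspect the compression-related gflags on a TServer via its /varz page.
# <tserver-host> is a placeholder; adjust the port if your deployment differs.
curl -s http://<tserver-host>:9000/varz | grep -E "stream_compression"
```

If the output shows --stream_compression_algo=3 together with --enable_stream_compression=true, the cluster is running the affected LZ4 configuration.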
The performance issue is caused by a discrepancy between the data length reported in the RPC header and the actual data received over the socket by pggate.
The data length in the RPC header is always higher. pggate, on the receiving side, notices this difference, allocates a buffer large enough to hold the larger amount of data reported by the RPC header, and waits for the remaining data to arrive.
But of course, there is no more data to be sent by the local T-Server, and so pggate blocks.
What resolves this standoff is a "no-data" message/packet/RPC that is sent with an upper bound of 5 seconds. This message tells pggate that there is no more data to receive, which resets the buffer and forces pggate to process what it has already received.
The issue we're seeing isn't really a compression problem: Snappy's and LZ4's implementations differ ever so slightly, in that Snappy always sends the "no-data" packet, whereas LZ4 does not.
So, in the case of LZ4, pggate is stuck waiting for the remaining data to arrive. In both cases, the data length in the RPC header and the length of the actual data differ by the exact same amount.
To hit the latency issue, the RPC response size needs to meet this criterion: compressed response size < internal buffer size < decompressed response size.
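The stall window can be illustrated with a minimal sketch (not YugabyteDB code; all names here are hypothetical). The reader sizes its buffer expectation from the header's decompressed length, but only the compressed bytes actually arrive on the socket:

```python
def reader_stalls(compressed_size: int, buffer_size: int, decompressed_size: int) -> bool:
    """Return True if the reader would block waiting for data that never arrives.

    Hypothetical model of the issue: the RPC header advertises the decompressed
    length, but only `compressed_size` bytes arrive on the socket. If the
    internal buffer is sized between the two, the reader believes more data is
    still coming and waits (up to the 5-second "no-data" upper bound).
    """
    return compressed_size < buffer_size < decompressed_size


# A response that compresses from 96 KiB down to 20 KiB, read through a
# 64 KiB internal buffer, falls inside the stall window:
print(reader_stalls(20 * 1024, 64 * 1024, 96 * 1024))  # True: reader waits

# If the compressed payload already exceeds the buffer, the condition does
# not hold and the reader proceeds normally:
print(reader_stalls(96 * 1024, 64 * 1024, 96 * 1024))  # False
```

The key point is that all three sizes must line up; responses that compress to larger than the internal buffer, or whose decompressed size fits within it, do not trigger the latency.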
Steps
The workaround is to change --stream_compression_algo from 3 to 2, switching to Snappy compression, which is not affected by this issue.
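A sketch of applying the workaround, assuming flags are passed on the yb-tserver command line (your deployment may manage flags differently, e.g. via a flags file or YugabyteDB Anywhere):

```shell
# Restart each yb-tserver with Snappy (2) instead of LZ4 (3).
# Only the compression flags are shown; keep your other flags unchanged.
./bin/yb-tserver \
  --enable_stream_compression=true \
  --stream_compression_algo=2 \
  ...   # remaining yb-tserver flags
```

Apply the change to every TServer in the cluster, since both sides of an RPC stream negotiate compression.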
Additional Information
Internal DB JIRA - https://yugabyte.atlassian.net/browse/DB-6318
GH Issue - https://github.com/yugabyte/yugabyte-db/issues/16999