Environment:
- YBA Version: all versions
Issue:
Customers frequently encounter issues when configuring High Availability (HA) in YBA, most often with certificate management. This KB outlines a systematic approach to identifying and resolving HA issues. The most common problems include:
- Network/firewall issues (e.g., connection refused or timeout)
- Certificate verification failures (e.g., mismatched certificates)
- Incorrect certificate installation on an HA node
- HA configuration attempted through a load balancer (not permitted by YBA)
Resolution:
Initial Troubleshooting Questions:
- YBA Version: Identify the version of YBA that the customer is running.
- Certificates: Confirm whether the customer is using self-signed or CA-signed certificates.
- Topology: Understand the number of HA nodes and their geographical locations (zones).
Active Troubleshooting Steps:
1. Review the application.log File:
- Check for error messages that point to common issues like network/firewall errors (connection refused, connection timeout) or certificate verification issues.
- Look for patterns of error messages such as:
Caused by: javax.net.ssl.SSLHandshakeException: No trust manager was able to validate this certificate chain: # of exceptions = 3
ERROR: connection refused --network error
Network Issues: Identify if any nodes are unable to connect due to firewall or security group settings. Ensure proper communication between nodes.
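The log review above can be sketched as a small shell helper. This is a minimal sketch, not an official YBA tool: the function name `scan_ha_errors` is hypothetical, the log path must be supplied by the caller, and the three patterns are only the signatures mentioned in this KB.

```shell
# Count lines in an application.log that match common HA failure signatures.
# Usage: scan_ha_errors <path-to-application.log>
# (hypothetical helper; the log location depends on the installation)
scan_ha_errors() {
    local log_file="$1" pattern count
    # Each pattern is a case-insensitive extended regular expression.
    for pattern in 'SSLHandshakeException' \
                   'connection refused' \
                   'timed out|timeout'; do
        # grep -c prints 0 and exits non-zero when nothing matches,
        # so "|| true" keeps the loop going under "set -e".
        count=$(grep -Eic -- "$pattern" "$log_file" || true)
        printf '%s %s\n' "$count" "$pattern"
    done
}
```

A non-zero count on the first line points toward the certificate checks in the following steps, while hits on the other two lines point toward firewall or security group settings.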
2. Verify the Certificate Presented by the YBA UI:
- From your browser, inspect the certificate for both the Active node and each Standby node.
- Steps to follow:
- Right-click on the YBA UI and select Inspect.
- Navigate to the Security tab and click View Certificate.
- Capture the SHA-256 fingerprint from the browser for each node.
3. Compare Certificate Fingerprints:
- Ensure that the SHA-256 fingerprint of the certificate presented in the browser matches the fingerprint of the certificate uploaded to the trust store on each node. To compute the fingerprint of a certificate file, run:
openssl x509 -noout -fingerprint -sha256 -in <certificate_file>
If the fingerprints do not match, the wrong certificate may be installed on the node or there was an issue during the certificate upload process.
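The comparison can also be done from a terminal instead of the browser. The sketch below assumes `openssl` is installed and that the node serves HTTPS on a port you pass in. All function names are hypothetical. `normalize_fp` exists because fingerprints copied from a browser and printed by different OpenSSL versions vary in prefix, separators, and letter case.

```shell
# Reduce an "openssl x509 -fingerprint" line, or a fingerprint copied from a
# browser, to bare lowercase hex so formatting differences do not matter.
normalize_fp() {
    printf '%s' "$1" | sed 's/^.*=//' | tr -d ': \n' | tr 'A-F' 'a-f'
}

# SHA-256 fingerprint of the certificate a host actually serves.
served_fp() {   # usage: served_fp <host> <port>
    openssl s_client -connect "$1:$2" -servername "$1" </dev/null 2>/dev/null |
        openssl x509 -noout -fingerprint -sha256
}

# SHA-256 fingerprint of a PEM certificate file.
file_fp() {     # usage: file_fp <certificate_file>
    openssl x509 -noout -fingerprint -sha256 -in "$1"
}

# Succeed only when a certificate was retrieved and it matches the file.
fp_matches() {  # usage: fp_matches <host> <port> <certificate_file>
    local served expected
    served=$(normalize_fp "$(served_fp "$1" "$2")")
    expected=$(normalize_fp "$(file_fp "$3")")
    [ -n "$served" ] && [ "$served" = "$expected" ]
}
```

For example, `fp_matches <node_hostname> <port> <certificate_file> && echo match` would be run once per node, with the port being whichever one the YBA UI is served on in your environment.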
4. Check Subject Alternative Names (SAN):
- Verify that the hostname of each YBA node is included in the list of Subject Alternative Names (SAN) in the certificate.
If a hostname is missing, the certificate is not valid for that node, which causes validation errors during HA communication.
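One way to run the SAN check from a terminal is sketched below. It assumes OpenSSL 1.1.1 or later, which provides the `-ext` option of `openssl x509`. The function names are hypothetical, and the check only looks for an exact DNS entry, so wildcard entries are not expanded.

```shell
# Print the SAN extension of a PEM certificate (needs OpenSSL 1.1.1+ for -ext).
print_san() {     # usage: print_san <certificate_file>
    openssl x509 -noout -ext subjectAltName -in "$1"
}

# Read SAN text on stdin and succeed if it lists the given DNS name exactly.
# Wildcard entries such as *.example.com are NOT expanded by this check.
san_has_dns() {   # usage: print_san <certificate_file> | san_has_dns <hostname>
    tr ',' '\n' |
        sed 's/^[[:space:]]*//; s/[[:space:]]*$//' |
        grep -Fxq -- "DNS:$1"
}
```

Running `print_san <certificate_file> | san_has_dns <node_hostname> || echo "hostname missing from SAN"` once per node hostname covers every node in the HA setup.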
5. Root Certificate in the Trust Store:
- Ensure that the root certificate of each node is installed in the trust store for all other nodes in the HA setup.
- CA-signed Certificates: The root certificate is normally the same for all YBA nodes.
- Self-signed Certificates: Each YBA node has its own root certificate, so every node needs the root certificate of every other node in its trust store.
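With self-signed certificates the number of trust relationships grows quickly, and a missing entry in one direction is easy to overlook. The sketch below only prints a checklist of the pairs to verify. The function name `trust_pairs` and the node names are hypothetical. If the PEM files are at hand, each pair can be checked with something like `openssl verify -CAfile <peer_root_cert> <peer_node_cert>`.

```shell
# Print every "node needs root of peer" pair for the given HA nodes.
# For N nodes with self-signed certificates this yields N * (N - 1) lines.
trust_pairs() {   # usage: trust_pairs <node>...
    local node peer
    for node in "$@"; do
        for peer in "$@"; do
            [ "$node" = "$peer" ] || printf '%s needs root of %s\n' "$node" "$peer"
        done
    done
}
```

For three nodes, `trust_pairs yba1 yba2 yba3` prints six lines, one for each direction of each node pair.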
6. Failover Testing:
- Perform a failover to each standby node and confirm that the new active node can successfully communicate with other standby nodes.
- Ensure backups are functioning between nodes after failover. Sometimes, certificates are correctly set up for one direction only, leading to HA issues during failover.
Failover steps:
- Promote a standby node to active.
- Confirm that the new active node is reachable by all other nodes.
- Perform a backup operation and ensure it completes without certificate or connection errors.
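The reachability check in the failover steps can be sketched with a plain TCP probe. This is an assumption-laden sketch: it relies on `bash` and the coreutils `timeout` command being available, the function names are hypothetical, and the port is whatever port your YBA nodes use to talk to each other, which this KB does not specify.

```shell
# Succeed if a TCP connection to host:port opens within 5 seconds.
# Uses bash's /dev/tcp pseudo-device, so bash is invoked explicitly.
can_connect() {   # usage: can_connect <host> <port>
    timeout 5 bash -c 'exec 3<>"/dev/tcp/$0/$1"' "$1" "$2" 2>/dev/null
}

# Report reachability of each listed node on one port.
check_nodes() {   # usage: check_nodes <port> <host>...
    local port="$1" host
    shift
    for host in "$@"; do
        if can_connect "$host" "$port"; then
            printf '%s:%s reachable\n' "$host" "$port"
        else
            printf '%s:%s UNREACHABLE\n' "$host" "$port"
        fi
    done
}
```

Because firewall rules can differ per direction, the probe is most useful when run from every node against all the others, both before and after the failover. A successful TCP connection only rules out network blocks; certificate problems still need the checks in the earlier steps.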
RCA:
Most HA configuration issues arise from misconfigured certificates, including:
- Mismatched certificates between the browser and the trust store.
- Missing Subject Alternative Names (SANs) for hostnames, leading to validation errors.
- Improper root certificate setup, particularly with self-signed certificates where each node requires the root certificate of every other node in its trust store.
- One-way certificate setup causing failures during HA failover.
Using a load balancer for HA configuration is not permitted by the YBA architecture:
- A standby HA node must be configured by accessing that node directly. The YBA architecture does not permit configuring a standby HA instance through a load balancer. Once all HA nodes are configured, a load balancer can direct traffic from a single hostname or IP to the active instance, but any later HA change must still be made by connecting to the target node directly.