Environment
- Yugabyte Anywhere: All versions
- YugabyteDB Version: - All versions
Issue
Intermittent incremental backup failures are observed with the error message "Got empty success marker for base backup". This can affect the reliability of the backup and restore strategy.
Resolution
Overview
The root cause of this issue is intermittent network connectivity problems between the YugabyteDB cluster and the backup storage location (e.g., S3-compatible object storage). The backup process attempts to download a success marker file from the storage to verify the status of the base backup. When the download times out due to network issues, the process fails and reports the "Got empty success marker" error. The error curlCode: 28, Timeout was reached in the logs is a strong indicator of this network timeout.
Steps
1. Verify the Error in the Logs
To confirm that the issue is caused by a network timeout, you can search for the error message in the YugabyteDB Anywhere (YBA) logs. The logs are typically located in the support bundle YBC logs.
Look for log entries similar to the following:
E20251111 05:30:27.678261 259 cloud_store_s3.cc:335] Download Failed: univ-04f7af47-dfeb-4bfe-8c38-4dfffaa91c15/ybc_backup-6d29e275b9554b0a8fd4169adaf534c1/full/2025-11-11T05:13:20/multi-table-obf_checkpoint_2693470d381c4fcd9e276b886bb2481a/success Error: curlCode: 28, Timeout was reached
E20251111 05:30:27.678416 259 cloud_store_s3.cc:162] Got an unknown aws http response code: -1
I20251111 05:30:27.678478 259 cloud_store.cc:111] Op failed, retrying in: 1s; count: 1
This log indicates that a download operation from the S3-compatible storage failed with a timeout (curlCode: 28).
2. Troubleshoot Network Connectivity
Since the issue is network-related, you need to investigate the network path between your YugabyteDB cluster and the backup storage.
- Check Network Stability: Work with your network team to monitor the network for any signs of instability, packet loss, or high latency between the cluster and the storage endpoint.
- Firewall and Security Groups: Ensure that firewalls, security groups, and any other network security appliances are not interfering with the connection to the storage. Check for any rules that might be throttling or dropping connections.
- Test with Different Storage Endpoints: If possible, try configuring backups to a different storage endpoint or even a different S3-compatible provider (e.g., Minio, another cloud provider's object storage) to see if the issue persists. This can help isolate whether the problem is with the network path or the storage provider itself.
3. Monitor Backups
After taking steps to address potential network issues, it's important to monitor the backups to ensure the problem is resolved.
- Observe Backup Status: Keep a close watch on the status of your incremental backups in the YugabyteDB Anywhere UI.
- Review Logs: If failures continue, review the logs again to see if the curlCode: 28 error is still present.
4. Mitigation
If the network issues are outside of your immediate control, you can consider the following mitigation strategies:
- Increase Timeouts (If possible): In some cases, it may be possible to increase the timeout for S3 operations.
- Backup Scheduling: If the failures are intermittent, you can adjust your backup schedule to run more frequently. This increases the chances of getting a successful backup. However, this is a temporary workaround and does not address the root cause.
Additional Information
The curlCode: 28 error is a generic timeout error from the curl library, which is used internally by YugabyteDB for S3 communication. It's not a YugabyteDB-specific error but rather an indication of a problem at the network transport layer.
SUPPORT-844
Comments
0 comments
Please sign in to leave a comment.