Environment

YugabyteDB Anywhere

Issue

When a scheduled backup is triggered in YugabyteDB Anywhere (YBA), it may fail if another universe operation is already in progress, for example another backup, a Stop node action, or an LDAP user sync. This includes tasks triggered through the API or automation. The failure can raise a "Backup Schedule Failure" alert even if the backup is retried and eventually succeeds. With the default alert configuration, these transient failures produce false positives and unnecessary notifications.

Root Cause

The backup schedule and another universe operation (e.g., a sync or maintenance task) are scheduled to run at overlapping times.
When the backup is triggered while the universe is locked by another operation, the backup cannot start and fails initially.
The "Backup Schedule Failure" alert has a default duration of 0 seconds, so it fires immediately on any failure, even if the backup is retried and succeeds shortly after.
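
To check whether a backup schedule collides with other planned operations, the overlap described above can be sketched as a simple time-window test. This is an illustrative sketch only: the function name `find_conflicts`, the operation names, and the start times and durations are hypothetical and do not come from YBA.

```python
from datetime import datetime, timedelta

def find_conflicts(backup_start, operations):
    """Return names of operations whose time window covers backup_start.

    operations: list of (name, start, duration) tuples. All values here
    are hypothetical; take real ones from your own schedules.
    """
    return [
        name
        for name, start, duration in operations
        if start <= backup_start < start + duration
    ]

# Hypothetical schedule: an LDAP user sync and a node stop.
operations = [
    ("ldap_user_sync", datetime(2024, 1, 1, 2, 0), timedelta(minutes=10)),
    ("stop_node", datetime(2024, 1, 1, 4, 0), timedelta(minutes=5)),
]

# A backup scheduled at 02:05 falls inside the sync window.
conflicts = find_conflicts(datetime(2024, 1, 1, 2, 5), operations)
print(conflicts)
```

If the list is non-empty, shifting the backup schedule or the other task so they no longer overlap avoids the initial failure altogether.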

Solution

Tune the alert configuration to reduce false positives:

Increase the alert duration for the "Backup Schedule Failure" alert from the default 0 seconds to 3600 seconds (1 hour) or any other desired value.
This change ensures that the alert will only fire if the backup remains in a failed state for a continuous hour, preventing alerts for transient or retried failures.
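
The effect of the duration setting can be illustrated with a small simulation. This is a simplified model under stated assumptions, not YBA's actual evaluation code: the function `alert_fires` and the sample timelines are hypothetical, and the model assumes the alert fires once the failed state has persisted continuously for at least the configured duration.

```python
def alert_fires(samples, duration_sec):
    """Return True if the failed state persists for at least duration_sec.

    samples: list of (time_in_seconds, failed) pairs sorted by time.
    Simplified, hypothetical model of a duration-based alert.
    """
    failing_since = None
    for t, failed in samples:
        if not failed:
            # Recovery resets the continuous-failure window.
            failing_since = None
            continue
        if failing_since is None:
            failing_since = t
        if t - failing_since >= duration_sec:
            return True
    return False

# Backup fails at t=60s, is retried, and succeeds by t=180s.
transient = [(0, False), (60, True), (120, True), (180, False)]
# Backup stays failed for a full hour.
persistent = [(0, True), (1800, True), (3600, True)]

print(alert_fires(transient, 0))      # default duration of 0 seconds
print(alert_fires(transient, 3600))   # tuned duration of 1 hour
print(alert_fires(persistent, 3600))
```

In this model, a duration of 0 fires on the first failed sample, while a duration of 3600 ignores the retried failure and still catches the persistent one.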

Steps to Tune Alert Configuration

Navigate to Alert Configurations:
- In YBA, go to Admin > Alert Configurations.
Locate the "Backup Schedule Failure" Alert:
- Find the alert rule for "Backup Schedule Failure".
Edit the Alert Duration:
- Change the duration parameter from 0s (seconds) to 3600s (1 hour).
Save the Configuration:
- Apply and save the changes.

Conclusion

Adjusting the alert duration for "Backup Schedule Failure" in YugabyteDB Anywhere helps prevent unnecessary notifications caused by temporary conflicts with other universe operations. By setting the alert duration to 3600 seconds (1 hour), you ensure alerts are only triggered for persistent backup failures and not for short-lived, automatically retried incidents. This tuning reduces false positives and allows your team to focus on genuine issues requiring attention.

How to Tune Alert Configuration for Backup Schedule Conflicts
