Environment
- YugabyteDB Anywhere - 2.16 and above.
Issue
After upgrading YugabyteDB Anywhere to 2.16.1, database backups take longer to complete.
The YugabyteDB Anywhere application log shows that most of the time is spent in Phase 3 of the database backup.
Before Upgrade:
YW 2023-02-21T05:38:11.854Z [DEBUG] 085f7d70-02e5-4c10-8d38-6ecbd455179f from ShellProcessHandler in TaskPool-MultiTableBackup(d704042e-3bf2-4dfe-91d0-12fe350fce8d)-0 - 0:00:16.324224 : PHASE 3 : Upload snapshot directories
After Upgrade:
YW 2023-02-22T14:52:26.173Z [DEBUG] 10d36e82-2bda-45f8-905b-889efb2be5ad from ShellProcessHandler in TaskPool-MultiTableBackup(d704042e-3bf2-4dfe-91d0-12fe350fce8d)-5 - 9:49:02.058427 : PHASE 3 : Upload snapshot directories
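To compare Phase 3 timings across many backup runs, the elapsed-time field can be pulled out of log lines like the two above. The following is a minimal sketch, assuming the field always has the H:MM:SS.ffffff shape seen in these samples (a run longer than 24 hours may be printed differently and would not match). The names PHASE_RE and parse_phase are illustrative and are not part of YugabyteDB Anywhere.

```python
import re
from datetime import timedelta

# Matches "<H:MM:SS.ffffff> : PHASE <n> : <description>" at the end of a log line.
PHASE_RE = re.compile(r"(\d+):(\d{2}):(\d{2})\.(\d+) : PHASE (\d+) : (.+)$")


def parse_phase(line):
    """Return (phase_number, description, elapsed) or None if the line has no phase entry."""
    match = PHASE_RE.search(line)
    if match is None:
        return None
    hours, minutes, seconds, fraction, phase, description = match.groups()
    elapsed = timedelta(
        hours=int(hours),
        minutes=int(minutes),
        seconds=int(seconds),
        # Normalize the fractional part to microseconds (6 digits).
        microseconds=int(fraction.ljust(6, "0")[:6]),
    )
    return int(phase), description.strip(), elapsed


# Shortened copies of the two log lines shown above.
before_line = "... - 0:00:16.324224 : PHASE 3 : Upload snapshot directories"
after_line = "... - 9:49:02.058427 : PHASE 3 : Upload snapshot directories"

before = parse_phase(before_line)
after = parse_phase(after_line)
print("Phase 3 before upgrade:", before[2])
print("Phase 3 after upgrade: ", after[2])
```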
There are two possible scenarios:
1. The log has no useful information about where the time is being spent during Phase 3.
2. The log reports the following WARNING messages during Phase 3:
WARNING: Found a snapshot directory '/mnt/disk0/yb-data/tserver/data/rocksdb/table-<id>/tablet-<id>.snapshots/<id>' on tablet server '<tablet_server>' that is not present in the list of tablets we are interested in that have this tserver hosting it (..., ... ), skipping
This KB is specific to scenario 2. Checking for and printing these WARNING messages adds to the backup run time, and the overhead grows with the number of tablet leaders in the universe.
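To tell which scenario applies, it can help to count how many of these WARNING messages a backup produced. The following is a minimal sketch, assuming the message text matches the sample above. The log file path is passed on the command line because its location depends on the deployment. WARNING_RE and count_skip_warnings are illustrative names, not part of any Yugabyte tooling.

```python
import re
import sys
from collections import Counter

# Matches the start of the WARNING message shown above and captures the tablet server.
WARNING_RE = re.compile(
    r"Found a snapshot directory '(?P<snapshot_dir>[^']*)' "
    r"on tablet server '(?P<tserver>[^']*)' "
    r"that is not present in the list of tablets"
)


def count_skip_warnings(lines):
    """Count the 'Found a snapshot directory ... skipping' warnings per tablet server."""
    counts = Counter()
    for line in lines:
        match = WARNING_RE.search(line)
        if match:
            counts[match.group("tserver")] += 1
    return counts


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python count_warnings.py <path-to-application-log>
    with open(sys.argv[1], errors="replace") as log_file:
        per_tserver = count_skip_warnings(log_file)
    for tserver, count in per_tserver.most_common():
        print(f"{tserver}: {count} warning(s)")
    print("Total:", sum(per_tserver.values()))
```

A total of zero suggests scenario 1, which this KB does not cover.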
Resolution
Overview
The problem is caused by the yb_backup.py script, which spends too much time locating the tablet leaders during the backup and printing a WARNING message each time it finds a follower (which is not needed for the backup). The code shown in step 3 below is the part that causes the problem.
Steps
As a workaround, comment out the following lines in the yb_backup.py script and retry the backup:
1. SSH or log in to the yugaware Docker container.
(Follow the appropriate steps to log in to the Yugaware host for the deployment type used.)
2. Edit the backup script:
vi /opt/yugabyte/devops/bin/yb_backup.py
3. Comment out the group of lines starting at line 2428 to skip printing the WARNING messages. Do not comment out the continue line at the end:
    if tablet_id not in tablets_by_tserver_ip[tserver_ip]:
        # logging.warning(
        #     ("Found a snapshot directory '{}' on tablet server '{}' that is not "
        #      "present in the list of tablets we are interested in that have this "
        #      "tserver hosting it ({}), skipping.").format(
        #         snapshot_dir, tserver_ip,
        #         ", ".join(sorted(tablets_by_tserver_ip[tserver_ip]))))
        continue
4. Re-run the backup.
Additional Information
An internal JIRA ticket has been filed to investigate the issue further: https://yugabyte.atlassian.net/browse/PLAT-7481
If YugabyteDB Anywhere is upgraded to a newer version that does not include the code fix, the yb_backup.py file is overwritten and the steps above must be repeated.
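After an upgrade, a quick check can show whether the workaround is still in place. The following is a minimal sketch, assuming the guard line appears in yb_backup.py exactly as shown in the steps above and that only its first occurrence matters. GUARD and workaround_applied are illustrative names. The script only reads the file and does not modify it.

```python
import sys

# The line that precedes the logging.warning call in the steps above.
GUARD = "if tablet_id not in tablets_by_tserver_ip[tserver_ip]:"


def workaround_applied(source):
    """Inspect the lines that follow the first GUARD line.

    Returns True if, ignoring comments and blank lines, the guard is followed
    directly by 'continue' (the warning is commented out), False if any other
    active statement comes first (the warning is still active, or the block is
    in an unexpected state), and None if the guard line is not found.
    """
    lines = source.splitlines()
    for index, line in enumerate(lines):
        if line.strip() != GUARD:
            continue
        for following in lines[index + 1:]:
            stripped = following.strip()
            if not stripped or stripped.startswith("#"):
                continue  # skip blank lines and commented-out code
            return stripped == "continue"
        return None
    return None


if __name__ == "__main__":
    # Default path taken from step 2 above; pass another path as the first argument.
    path = sys.argv[1] if len(sys.argv) > 1 else "/opt/yugabyte/devops/bin/yb_backup.py"
    try:
        with open(path) as script:
            state = workaround_applied(script.read())
    except OSError as error:
        print(f"Could not read {path}: {error}")
    else:
        messages = {
            True: "Workaround is in place.",
            False: "Warning is active; reapply the steps.",
            None: "Guard line not found; the script may have changed.",
        }
        print(messages[state])
```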