Environment
- YugabyteDB Anywhere - 2.1.+
Issue
Sometimes a state gets stuck in the YugabyteDB Anywhere (YBA) UI. This blocks certain actions in the UI and makes it appear that some tasks are still in progress.
For the specific case of the UI being stuck at "Waiting for data migration", see the related KB:
https://support.yugabyte.com/hc/en-us/articles/4405157060365-Tasks-UI-hangs-on-progress-of-a-task
The example used in this KB is backups stuck in the "In Progress" state when the backup is no longer running.
This can happen when a backup taken via the YBA UI hangs, runs into problems, or terminates, but the UI still considers it in progress.
In one customer case, backups showed as in progress for several days, and the UI would not allow more backups to be taken.
Resolution
Overview
To free up the UI, the metadata must be changed so that the backups that have terminated are marked "Failed" instead of "In Progress".
Use caution here. Make these metadata changes only with the assistance of Yugabyte Support, because the exact changes required can vary subtly depending on the issue.
A fix for these sorts of issues is being planned for a future release.
Steps
1. Begin by backing up the metadata before making any changes. On the YBA node, create a dump of the yugaware Postgres database:
sudo docker exec postgres pg_dump -U postgres yugaware > /<path_to>/yugaware.sql
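This dump is the only way back if a metadata edit goes wrong, so it may be worth checking that it completed before continuing. The following is a minimal POSIX shell sketch, not a YBA tool. The function name `verify_dump` is hypothetical. The check assumes pg_dump's default plain-text format, whose output ends with the comment `-- PostgreSQL database dump complete`.

```sh
#!/bin/sh
# Sketch: sanity-check a plain-format pg_dump file before editing metadata.
# Assumes pg_dump's default plain SQL output, which ends with the comment
# "-- PostgreSQL database dump complete".
verify_dump() {
  dump_file=$1
  # Fail if the file is missing or empty.
  if [ ! -s "$dump_file" ]; then
    echo "dump missing or empty: $dump_file" >&2
    return 1
  fi
  # A truncated dump will not contain the completion comment.
  if ! grep -qF -e '-- PostgreSQL database dump complete' "$dump_file"; then
    echo "dump looks incomplete: $dump_file" >&2
    return 1
  fi
  echo "dump OK: $dump_file"
}
```

For example, run `verify_dump /<path_to>/yugaware.sql` right after the dump. A non-zero exit status means the file is missing, empty, or appears truncated.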
2. Confirm that the backup in question is no longer running. One way to do this is to check the backup's storage target in the backup configuration (for example, the S3 bucket) and confirm that nothing is still writing to it.
3. On the YBA server, log into the yugaware database.
sudo docker exec -it postgres bash
psql -U postgres yugaware
4. Determine the backups still marked as "In Progress" and their corresponding task UUIDs:
select backup_uuid, state, task_uuid from backup where state = 'In Progress';
Then determine the task and sub-tasks associated with each task UUID found above:
select uuid, task_type, task_state, parent_uuid from task_info where uuid = '<task_uuid>';
select uuid, task_type, task_state, parent_uuid from task_info where parent_uuid = '<task_uuid>';
The parent (main) task is likely to be something like MultiTableBackup, and the sub-tasks are likely to be BackupTable and UniverseUpdateSucceeded.
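The backup lookup and the two task_info queries can also be combined into a single read-only query that lists each stuck backup next to its parent task and sub-tasks. The sketch below wraps that query in a shell function so it can be piped into psql. The function name `stuck_backup_tasks_sql` is hypothetical, and the query assumes only the backup and task_info columns used in the queries above.

```sh
#!/bin/sh
# Sketch: print a read-only query listing each backup still marked
# 'In Progress' together with its parent task and sub-tasks.
stuck_backup_tasks_sql() {
  cat <<'EOF'
select b.backup_uuid, b.state as backup_state,
       t.uuid as task_uuid, t.task_type, t.task_state, t.parent_uuid
from backup b
join task_info t
  on t.uuid = b.task_uuid          -- the parent (main) task
  or t.parent_uuid = b.task_uuid   -- its sub-tasks
where b.state = 'In Progress'
order by b.backup_uuid, t.parent_uuid nulls first, t.task_type;
EOF
}
```

For example: `stuck_backup_tasks_sql | sudo docker exec -i postgres psql -U postgres yugaware`. The parent task has no parent_uuid, so `nulls first` sorts it ahead of its sub-tasks within each backup.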
5. Check the universe table. In its universe_details_json, the backupInProgress element is likely to be set to true.
If YBA manages multiple universes, make sure you identify the correct universe UUID.
select universe_details_json from universe where universe_uuid = '<universe_uuid>';
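When YBA manages several universes, it can be easier to pick the right universe UUID if the relevant flags for all of them are listed at once. This sketch prints a read-only query that extracts backupInProgress and updateInProgress from universe_details_json. The function name `universe_flags_sql` is hypothetical, and the query assumes the column holds valid JSON text, as the `::jsonb` cast in step 8 implies.

```sh
#!/bin/sh
# Sketch: print a read-only query showing the in-progress flags for every
# universe, so the correct universe_uuid can be chosen before step 8.
universe_flags_sql() {
  cat <<'EOF'
select universe_uuid, name,
       universe_details_json::jsonb -> 'backupInProgress' as backup_in_progress,
       universe_details_json::jsonb -> 'updateInProgress' as update_in_progress
from universe
order by name;
EOF
}
```

For example: `universe_flags_sql | sudo docker exec -i postgres psql -U postgres yugaware`.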
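6. First, clean up the backup table. Scope the update to each backup UUID identified in step 4 rather than to every backup in the "In Progress" state:
update backup set state = 'Failed' where backup_uuid = '<backup_uuid>' and state = 'In Progress';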
7. Next, clean up the task_info table to mark the backup task and its sub-tasks as failed:
update task_info set task_state = 'Failure' where uuid = '<task_uuid>';
update task_info set task_state = 'Failure' where parent_uuid = '<task_uuid>';
8. Finally, update the universe table to set backupInProgress to false for the correct universe:
yugaware=# begin;
BEGIN
yugaware=*# update universe set universe_details_json = jsonb_set(universe_details_json::jsonb, '{backupInProgress}', 'false') where universe_uuid = '<universe_uuid>';
UPDATE 1
yugaware=*# commit;
COMMIT
This should free up the UI and allow new backups to be taken. The previously hung backups should appear as Failed in the Tasks tab.
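Steps 6 to 8 all substitute the same placeholders by hand, so a small generator can help keep them consistent. The sketch below is hypothetical and is not a YBA or Yugabyte Support tool. `build_cleanup_sql` checks that each argument looks like a UUID, so a typo or empty value cannot widen a WHERE clause. It then prints a single transaction that:
- scopes the backup update to one backup,
- marks the task and its sub-tasks as Failure, and
- sets backupInProgress to the JSON boolean false.

Review the output, ideally with Yugabyte Support, before running it.

```sh
#!/bin/sh
# Hypothetical helper: prints one transaction covering steps 6-8 for a single
# stuck backup. UUIDs are validated so a typo cannot widen the WHERE clauses.

is_uuid() {
  case $1 in
    *[!0-9a-fA-F-]*) return 1 ;;   # reject anything but hex digits and dashes
    ????????-????-????-????-????????????) return 0 ;;
    *) return 1 ;;                 # wrong length or layout (including empty)
  esac
}

build_cleanup_sql() {
  backup_uuid=$1 task_uuid=$2 universe_uuid=$3
  for id in "$backup_uuid" "$task_uuid" "$universe_uuid"; do
    if ! is_uuid "$id"; then
      echo "not a UUID: '$id'" >&2
      return 1
    fi
  done
  # \\set becomes \set in this unquoted heredoc: stop psql at the first error.
  cat <<EOF
\\set ON_ERROR_STOP on
begin;
update backup set state = 'Failed'
  where backup_uuid = '$backup_uuid' and state = 'In Progress';
update task_info set task_state = 'Failure'
  where uuid = '$task_uuid' or parent_uuid = '$task_uuid';
update universe
  set universe_details_json = jsonb_set(universe_details_json::jsonb, '{backupInProgress}', 'false')
  where universe_uuid = '$universe_uuid';
commit;
EOF
}
```

For example, write the output to a file such as cleanup.sql and review it. Then apply it with `sudo docker exec -i postgres psql -U postgres yugaware < cleanup.sql`. With ON_ERROR_STOP set, psql stops at the first failing statement, and the uncommitted transaction is rolled back when the session ends.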
Other situations where state gets stuck
1. When a restore operation fails, the universe can remain in the state "updateInProgress": true. The state is not cleared when the restore fails, and it blocks all task scheduling, including health checks. It is expected to clear, or be corrected by a self-check, after a period of time. If it does not clear, a metadata change may be needed with the assistance of Yugabyte Support.
2. An issue with YBA where nodes are shown in the "Update Cert" state even though no actions were taken to modify any certificates.
This node state prevents the processes running on those nodes from being bounced, which blocks operations that require a rolling restart.
3. When a gflag is set on the masters, the non-master TServer nodes end up in the `Update Gflag` state.
This may need a metadata update with the assistance of Yugabyte Support.
Similarly, when a gflag update fails because of an issue with the universe, the state can remain `Update Gflag`, which prevents any other actions on the universe.
This also needs a metadata update with the assistance of Yugabyte Support.