Environment
- YugabyteDB Anywhere - 2.x
Issue
A customer introduced a policy that any AMI (Amazon Machine Image) would be deleted from their system after 90 days.
They wanted to retain the same IP address for the VM (Virtual Machine) so that they did not have to update the clients connecting to YugabyteDB.
Because the deleted AMI could not be reinstated, any change to the universe (pause/resume, etc.) failed.
Resolution
Overview
All steps must be run sequentially, and each step must complete before moving on to the next.
Upgrading YugabyteDB Anywhere to 2.18.3 or later (preferably 2.20.x) is required for these steps to complete.
Steps
1. Back up Master/Tserver/Yb-controller gFlag confs
- Copy the master/tserver/yb-controller gFlag configuration (server.conf) from the node, so that when these services are started again post patching they come up with the same set of gFlags.
- SSH into the DB node.
cd master/conf
Copy the content of server.conf and store it somewhere off the node (a minimal sketch is shown below).
Repeat the above for tserver and yb-controller as well.
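As an illustration only, the following is a minimal sketch of copying the three files off the node with scp. The SSH user (yugabyte), key path, backup directory, and the controller/conf path are assumptions; adjust them to match your environment.
# run from your workstation or the YBA host, not from the DB node itself
mkdir -p ./gflag_backup
scp -i <ssh_key.pem> yugabyte@<db_node_ip>:master/conf/server.conf ./gflag_backup/master-server.conf
scp -i <ssh_key.pem> yugabyte@<db_node_ip>:tserver/conf/server.conf ./gflag_backup/tserver-server.conf
scp -i <ssh_key.pem> yugabyte@<db_node_ip>:controller/conf/server.conf ./gflag_backup/controller-server.conf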
2. AWS UI/CLI Steps
- Deploy a VM from the AWS console, using the AMI that you want the universe VM to use (a CLI sketch of these AWS steps is shown at the end of this step).
- Ensure that the boot disk is not deleted when the VM is deleted.
- Terminate the VM deployed above. Note: we are only interested in the boot disk, which we ensured will not be deleted as part of the VM termination.
- Stop the node associated with the universe from the AWS console.
- Once the node is stopped, detach the boot volume (and data volume(s) if required) associated with the instance; the volume details can be obtained from the AWS console for the VM.
- NOTE: you may need to detach data volumes as well, but this is not usually required.
- A sample AWS CLI command for this is:
aws ec2 detach-volume --volume-id <volume_id> --instance-id <instance_id>
- Attach the new boot volume (the one we retained after terminating the temporary VM) to the above instance_id:
aws ec2 attach-volume --volume-id <volume_id> --instance-id <instance_id> --device <device_path (ex: /dev/sda1)>
aws ec2 wait volume-in-use --volume-ids <volume_id>
aws ec2 modify-instance-attribute --instance-id <instance_id> --block-device-mappings "[{ \"DeviceName\": \"<device_name>\", \"Ebs\": { \"DeleteOnTermination\": true/false } }]"
- Reattach any other volumes that were previously detached.
- Start the node that we stopped earlier in this step from the AWS console.
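The following is a minimal CLI sketch of the AWS sub-steps above. The AMI ID, instance type, subnet, instance IDs, and the boot device name (/dev/sda1) are placeholders/assumptions; verify the correct device name for your AMI before running.
# launch a temporary VM from the new AMI, keeping its boot volume on termination
aws ec2 run-instances --image-id <new_ami_id> --instance-type <instance_type> --subnet-id <subnet_id> --block-device-mappings '[{"DeviceName":"/dev/sda1","Ebs":{"DeleteOnTermination":false}}]'
# terminate the temporary VM; its boot volume is retained because of the mapping above
aws ec2 terminate-instances --instance-ids <temporary_instance_id>
# stop the universe node whose boot volume will be swapped
aws ec2 stop-instances --instance-ids <universe_node_instance_id>
aws ec2 wait instance-stopped --instance-ids <universe_node_instance_id>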
3. Update YBA universe_details
We can use edit_universe_details.py for performing this operation
Take a Backup of the Yugabyte Anywhere Database:
For a YBA Installer or Yugabundle based install:
pg_dump -U postgres -d yugaware > platform_dump.sql
For Replicated based install:
sudo docker exec -it postgres pg_dump -U postgres -d yugaware > platform_dump.sql
For the Kubernetes based install:
kubectl exec -it <yugaware pod> -c postgres -n <namespace> -- pg_dump -U postgres yugaware > platform_dump.sql
Using the edit_universe_details.py Script:
Note: This process will open the universe details in a vi window where you can make necessary edits and save them with :wq
For a YBA Installer or Yugabundle based install, the script needs to be run from the YBA instance.
On the YBA instance, navigate to the directory where the script is located:
cd /opt/yugabyte/devops/bin/
- To execute the script, run the following on the YBA instance:
sudo ./edit_universe_details.py -i <universe_uuid> -t standalone
For a Replicated install, the script needs to be run from the YBA instance.
To get the script, copy it from the yugaware container while on the YBA instance:
sudo docker cp yugaware:/opt/yugabyte/devops/bin/edit_universe_details.py .
- To execute the script, run the following on the YBA instance:
sudo ./edit_universe_details.py -i <universe_uuid> -t docker
For a Kubernetes install, the script needs to be run from outside of the pods.
- To get the script, it can be copied from the yugaware container in the yugaware platform pod:
kubectl cp <yugaware pod>:/opt/yugabyte/devops/bin/edit_universe_details.py ~/edit_universe_details.py -c yugaware -n <namespace>
- To execute the script, run the following:
./edit_universe_details.py -i <universe_uuid> -t kubernetes -p <yugaware pod> -n <namespace>
Edits to Make:
- Run the python script for the concerned universe.
- Set isMaster and isTserver to false in the JSON for the concerned node (the node for which we performed the above operations).
- Set the state to Stopped for the universe node.
The diff should look like the following before confirming the update:
161c161
< "isMaster": true,
---
> "isMaster": false,
163c163
< "isTserver": true,
---
> "isTserver": false,
175c175
< "state": "Live",
---
> "state": "Stopped",
219c219
< }
\ No newline at end of file
---
> }
4. Reprovisioning
This section highlights the steps that need to be executed to reprovision the nodes & start the required services.
- Trigger the reprovision API for the node. Below is a sample curl request:
curl <YBA-URL>/api/v1/customers/<customer_uuid>/universes/<universe_uuid>/nodes/<node_name> -X 'PUT' -H 'X-AUTH-YW-API-TOKEN: <auth-token>' -H 'Content-Type: application/json' -H 'Accept: application/json, text/plain, */*' -d '{"nodeAction":"REPROVISION"}'
node_name - can be obtained from the YBA UI, under the 'nodes' section.
auth-token - can be obtained from the UI, from the 'user profile' section.
- Update the gFlag conf files that we copied in the first step to their respective places.
- (Copy the server.conf files from step 1 back into place; a minimal sketch is shown at the end of this step.)
- Check that the services are enabled:
systemctl enable yb-master.service
systemctl enable yb-tserver.service
systemctl enable yb-controller.service
- Start the three services.
- The following shows starting the services via systemctl. Older instances may use cron jobs instead; in that case, running 'crontab -l' will show the commands to start the services.
systemctl start yb-master.service [--user]
systemctl start yb-tserver.service [--user]
systemctl start yb-controller.service [--user]
- Verify the services are up using ps aux | grep yugabyte
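As an illustration only, a minimal sketch of copying the backed-up server.conf files from step 1 back onto the node (do this before enabling and starting the services). The SSH user, key path, backup directory, and controller/conf path are the same assumptions as in step 1.
# run from the machine that holds the backups taken in step 1
scp -i <ssh_key.pem> ./gflag_backup/master-server.conf yugabyte@<db_node_ip>:master/conf/server.conf
scp -i <ssh_key.pem> ./gflag_backup/tserver-server.conf yugabyte@<db_node_ip>:tserver/conf/server.conf
scp -i <ssh_key.pem> ./gflag_backup/controller-server.conf yugabyte@<db_node_ip>:controller/conf/server.conf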
5. Update YBA universe_details
Once the services are restarted, we need to revert the universe_details changes made in step 3 so that the YBA universe_details match the actual node state again.
(We can use edit_universe_details.py for performing this operation)
- Run the python script for the concerned universe.
- Set isMaster and isTserver to true in the JSON for the concerned node (the node for which we performed the above operations).
- Set the state to Live for the universe node.
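Before confirming the update, the diff should be roughly the reverse of the one shown in step 3 (the exact line numbers depend on your universe JSON; the ones below reuse the earlier example):
161c161
< "isMaster": false,
---
> "isMaster": true,
163c163
< "isTserver": false,
---
> "isTserver": true,
175c175
< "state": "Stopped",
---
> "state": "Live",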
6. Fixing the provider
Once the universe running with the above provider is fixed, we need to fix the provider as well, to ensure that new deployments/nodes come up with the new AMI.
We need to update the imageBundle associated with the provider so that it references the new AMI that needs to be used.
- Assuming provider_uuid is the UUID of the provider that we need to fix.
- Log in to the postgres container running on the YBA host and connect to the yugaware database:
sudo docker exec -it postgres bash
psql -U postgres -d yugaware
select details from image_bundle where provider_uuid='<provider_uuid>';
- Keep a copy of the details selected above.
- Manually update the AMI ID in the details JSON obtained above. The details look similar to the following:
{"arch":"x86_64","regions":{"us-west-2":{"ybImage":"ami-0ebb279a6cc8e3dc2","sshUserOverride":"ec2-user"}}}
- Replace ybImage with the desired AMI ID, and update the image bundle in the DB:
update image_bundle set details='<details>' where uuid='<image_bundle_uuid>';
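Putting the above together, a minimal sketch of the full session for a Replicated based install (the provider UUID, image bundle UUID, and AMI ID are placeholders; for other install types, connect to the yugaware database in the equivalent way):
sudo docker exec -it postgres psql -U postgres -d yugaware
-- inspect the current image bundle(s) for the provider and note the uuid of the one to change
select uuid, details from image_bundle where provider_uuid='<provider_uuid>';
-- write the edited JSON (with the new ybImage) back to that row
update image_bundle set details='{"arch":"x86_64","regions":{"us-west-2":{"ybImage":"<new_ami_id>","sshUserOverride":"ec2-user"}}}' where uuid='<image_bundle_uuid>';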
Additional Information
In a future version, this process will be included within the UI and will require much less user interaction.
This article will be updated as and when the new flow is released.