Environment
- YugabyteDB 2.2 and above
Issue
When following the Yugabyte documentation for the yb-admin alter_universe_replication command, the command returns the following error:
I0428 18:00:58.765663 58290 meta_cache.cc:755] 0x00000000131afa10 -> LookupByIdRpc(tablet: da76297c50dc45d5a9b578441e5b20ef, num_attempts: 1): Failed, got resp error NOT_THE_LEADER
- Additional Information: Review the validation steps below before making any changes to your cluster. As a general recommendation, always save a copy of the universe config via the `get_universe_config` command for future reference, should it be needed.
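- For example, the following hedged one-liner (using the same master addresses as the examples below; substitute your own) saves a pretty-printed copy of the config to a file:
[yugabyte@yb-dev-hamilton-n1 bin]$ ./yb-admin -master_addresses 10.207.0.42,10.207.0.43,10.202.0.35 get_universe_config | python -m json.tool > universe_config_backup.json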
Resolution
Overview
In an environment where xCluster replication has been enabled, all replication streams are governed by the information stored in the producer map and consumer registry of the universe config.
Sometimes a universe can undergo a master/leader change that leaves the master addresses stored in the producer map out of sync with the Producer Universe's actual masters. If this occurs in your environment, use the following steps to validate the issue and then fix the discrepancy.
Steps
1. Examine the reported masters list from the Producer Universe using the yb-admin list_all_masters command:
[yugabyte@yb-dev-hamilton-replication-universe-1-n3 bin]$ ./yb-admin -master_addresses `hostname` list_all_masters
Master UUID                          RPC Host/Port       State    Role
6eb085e82c654b22970eaa4bb3dcf98c     10.202.0.35:7100    ALIVE    LEADER
801de3c2d34d41c1850fcf5ec962a698     10.207.0.42:7100    ALIVE    FOLLOWER
e82afc71c0974a61ab191d1382b59a6f     10.207.0.43:7100    ALIVE    FOLLOWER
2. Examine the reported masters list stored in the universe config using the yb-admin get_universe_config command:
[yugabyte@yb-dev-hamilton-n1 bin]$ ./yb-admin -master_addresses 10.207.0.42,10.207.0.43,10.202.0.35 get_universe_config | python -m json.tool
{
"clusterUuid": "7ea1f6c9-f20e-4107-b052-285546ba88ba",
"consumerRegistry": {
"producerMap": {
"7ea1f6c9-f20e-4107-b052-285546ba88ba": {
"masterAddrs": [
{
"host": "10.202.0.35",
"port": 7100
},
{
"host": "10.207.0.42",
"port": 7100
},
{
"host": "10.207.0.47",
"port": 7100
}
--- </snip> ---
- Notice that the masterAddrs list in the producerMap is inconsistent with the output of the list_all_masters command on the Producer Universe: the config still lists 10.207.0.47, which is not one of the producer's current masters.
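- If jq is available, the masterAddrs entries can be pulled out of the config directly for easier comparison (an optional convenience sketch, not required by this procedure):
./yb-admin -master_addresses 10.207.0.42,10.207.0.43,10.202.0.35 get_universe_config | jq -r '.consumerRegistry.producerMap[].masterAddrs[] | "\(.host):\(.port)"'
- In this example, the command would print 10.202.0.35:7100, 10.207.0.42:7100, and the stale 10.207.0.47:7100.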
3. Using the yb-admin alter_universe_replication command, update the masterAddrs values in the producerMap to align with the list_all_masters output from the Producer Universe.
- Note: This command should be run on the Consumer Universe.
[yugabyte@yb-dev-hamilton-1-n1 bin]$ ./yb-admin -master_addresses `hostname` alter_universe_replication 7ea1f6c9-f20e-4107-b052-285546ba88ba set_master_addresses 10.202.0.35:7100,10.207.0.42:7100,10.207.0.43:7100
Replication altered successfully
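- Note: For future occurrences, the comma-separated address list can be built directly from the Producer Universe's output instead of being typed by hand. The following is a minimal sketch run on a Producer node; the awk/paste post-processing is an illustration, not part of yb-admin:
[yugabyte@yb-dev-hamilton-replication-universe-1-n3 bin]$ ./yb-admin -master_addresses `hostname` list_all_masters | awk 'NR>1 {print $2}' | paste -sd, -
10.202.0.35:7100,10.207.0.42:7100,10.207.0.43:7100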
4. Using the yb-admin get_universe_config command, verify that the masterAddrs list is now consistent with the output of the list_all_masters command on the Producer Universe.
[yugabyte@yb-dev-hamilton-n1 bin]$ ./yb-admin -master_addresses 10.207.0.42,10.207.0.43,10.202.0.35 get_universe_config | python -m json.tool
{
"clusterUuid": "7ea1f6c9-f20e-4107-b052-285546ba88ba",
"consumerRegistry": {
"producerMap": {
"7ea1f6c9-f20e-4107-b052-285546ba88ba": {
"masterAddrs": [
{
"host": "10.202.0.35",
"port": 7100
},
{
"host": "10.207.0.42",
"port": 7100
},
{
"host": "10.207.0.43",
"port": 7100
}
--- </snip> ---
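- The producer and consumer views can also be compared mechanically. The sketch below uses bash process substitution and assumes jq is installed and both universes are reachable from one host; PRODUCER_MASTERS and CONSUMER_MASTERS are placeholder variables, not yb-admin flags:
diff <(./yb-admin -master_addresses $PRODUCER_MASTERS list_all_masters | awk 'NR>1 {print $2}' | sort) \
     <(./yb-admin -master_addresses $CONSUMER_MASTERS get_universe_config | jq -r '.consumerRegistry.producerMap[].masterAddrs[] | "\(.host):\(.port)"' | sort)
- Empty diff output indicates the two lists now match.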
5. To confirm that replication is working as expected, perform a test insert or deletion on a replicated table.
- Note: In this example, we delete a value from the table on the Producer side and verify that the change propagates to the Consumer.
5.a Check the number of entries on the Consumer table
[yugabyte@yb-dev-hamilton-1-n1 bin]$ ./ycqlsh `hostname`
Connected to local cluster at yb-dev-hamilton-zd1453-1-n1:9042.
[ycqlsh 5.0.1 | Cassandra 3.9-SNAPSHOT | CQL spec 3.4.2 | Native protocol v4]
Use HELP for help.
ycqlsh> select count (*) from ybdemo_keyspace.cassandrauserid ;
count
-------
4635
5.b Check the number of entries on the Producer table
[yugabyte@yb-dev-hamilton-replication-universe-1-n3 bin]$ ./ycqlsh `hostname`
Connected to local cluster at yb-dev-hamilton-zd1453-replication-universe-1-n3:9042.
[ycqlsh 5.0.1 | Cassandra 3.9-SNAPSHOT | CQL spec 3.4.2 | Native protocol v4]
Use HELP for help.
ycqlsh> select count (*) from ybdemo_keyspace.cassandrauserid ;
count
-------
4635
5.c Delete an entry from the Producer table, and verify the count
ycqlsh> DELETE FROM ybdemo_keyspace.cassandrauserid WHERE user_name='54c11b91-0add-42eb-a71a-59831bb136e0:1100';
ycqlsh> select count (*) from ybdemo_keyspace.cassandrauserid ;
count
-------
4634
5.d Confirm that the Consumer reflects the same changes
ycqlsh> select count (*) from ybdemo_keyspace.cassandrauserid ;
count
-------
4634
ycqlsh> SELECT * FROM ybdemo_keyspace.cassandrauserid WHERE user_name='54c11b91-0add-42eb-a71a-59831bb136e0:1100';
user_name | password | update_time
-----------+----------+-------------
(0 rows)
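- Note: As a convenience, the same count check can be scripted from a single host. This is a hedged sketch: it assumes ycqlsh accepts the -e/--execute flag it inherits from cqlsh, and PRODUCER_NODE/CONSUMER_NODE are placeholder variables for reachable node addresses:
for node in $PRODUCER_NODE $CONSUMER_NODE; do echo "== $node =="; ./ycqlsh $node -e "SELECT COUNT(*) FROM ybdemo_keyspace.cassandrauserid;"; done
- Matching counts on both sides confirm that the deletion replicated.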
Additional Information
This KB is intended as a temporary workaround. Our Engineering team is working to address the underlying issue via GHI 7807.