Environment
- YugabyteDB - YCQL API
- YugabyteDB - YSQL API
Issue
- What is the difference between a primary key and a partition key?
- Why do I have data skew if I have a unique primary key?
Resolution
When designing a schema, include enough columns in your partition key to make a high-cardinality partition key. Enclose all of the partition key columns in an inner set of parentheses, as described in the documentation:
YCQL partition key definition
YSQL partition key definition
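In general, the YCQL primary key clause takes the following form, where the inner parentheses delimit the partition key columns and any remaining columns become clustering columns (the column names here are placeholders):

PRIMARY KEY ((partition_col_1, partition_col_2), clustering_col_1, clustering_col_2)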
For example, if you have the following schema with very few vendors but many unique items, explicitly define the partition key with an additional set of parentheses so that it includes both vendor and item_id.
Here, only vendor is in the partition key:
CREATE TABLE test.single_column_partition_key (
    vendor text,
    item_id text,
    location_id text,
    description jsonb,
    PRIMARY KEY (vendor, item_id, location_id)
)
WITH default_time_to_live = 0
AND transactions = {'enabled': 'true'};
But in this schema, both vendor and item_id are in the partition key:
CREATE TABLE test.double_column_partition_key (
    vendor text,
    item_id text,
    location_id text,
    description jsonb,
    PRIMARY KEY ((vendor, item_id), location_id)
)
WITH default_time_to_live = 0
AND transactions = {'enabled': 'true'};
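With the two-column partition key, reads should then supply both vendor and item_id so they can be routed to a single tablet. A sketch, with hypothetical values:

-- Both partition key columns are specified, so this read targets one tablet
SELECT location_id, description
FROM test.double_column_partition_key
WHERE vendor = 'acme' AND item_id = 'widget-42';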
Further Discussion
Primary keys in YugabyteDB must be unique, but the partition key may be any leading subset of the primary key columns. This means that it is possible to design a schema that is subject to data skew, because the columns that make the primary key unique are not necessarily used in the partition key.
The documentation on our sharding strategy is found here, but in this article we'll examine how partition keys are defined at the query layer.
Take this example in YCQL:
CREATE TABLE test.single_column_partition_key (
    vendor text,
    item_id text,
    location_id text,
    description jsonb,
    PRIMARY KEY (vendor, item_id, location_id)
)
WITH default_time_to_live = 0
AND transactions = {'enabled': 'true'};
Because the partition key is, by default, only the first column of the primary key, in this case only vendor is the partition key. This can be seen in the table view in the master HTTP UI at port 7000 - for example:
http://localhost:7000/table?id=11bd2791d06a4543bf71fd1525fad346
Table: test.single_column_partition_key (11bd2791d06a4543bf71fd1525fad346)
Column | ID | Type |
---|---|---|
vendor | 0 | string NOT NULL PARTITION KEY |
item_id | 1 | string NOT NULL NOT A PARTITION KEY |
location_id | 2 | string NOT NULL NOT A PARTITION KEY |
description | 3 | jsonb NULLABLE NOT A PARTITION KEY |
Remember that YCQL offers only hash sharding, although the same behavior applies to hash-sharded tables in YSQL.
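For reference, here is a sketch of how the same single-column partition key could be declared in YSQL, where hash sharding is expressed with the HASH annotation (this assumes a schema named test already exists; the YCQL table properties like default_time_to_live do not apply in YSQL):

-- vendor is the hash-sharded partition key; item_id and location_id
-- are range-sorted within each tablet
CREATE TABLE test.single_column_partition_key (
    vendor text,
    item_id text,
    location_id text,
    description jsonb,
    PRIMARY KEY (vendor HASH, item_id ASC, location_id ASC)
);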
If you had only two vendors, but each vendor offered hundreds of items at dozens of locations, you'd likely have no issues with primary key collisions or duplicates. But YugabyteDB would distribute the data into no more than two hash partitions, because there are only two distinct vendor values to hash.
So even if you had twelve nodes, at most six of them - the ones holding replicas of the two tablets for these hashes - would actually contain data for this table at RF=3: two tablets times three replicas is six tablet peers. Only one replica of each tablet serves as leader, so you'd have data hot spots for this table on six nodes, and CPU hot spots on the nodes containing the two leaders if this table has high read/write activity.
If instead we create the table with two columns in the partition key:
CREATE TABLE test.double_column_partition_key (
    vendor text,
    item_id text,
    location_id text,
    description jsonb,
    PRIMARY KEY ((vendor, item_id), location_id)
)
WITH default_time_to_live = 0
AND transactions = {'enabled': 'true'};
We would see in the master HTTP UI:
Table: test.double_column_partition_key (f4bd8afcdaff48c38ad55c4775020c11)
Column | ID | Type |
---|---|---|
vendor | 0 | string NOT NULL PARTITION KEY |
item_id | 1 | string NOT NULL PARTITION KEY |
location_id | 2 | string NOT NULL NOT A PARTITION KEY |
description | 3 | jsonb NULLABLE NOT A PARTITION KEY |
Now, the partition hash is calculated from both the vendor and item_id columns. In this scenario, with only two vendors but hundreds of items, the cardinality of the partition key is hundreds of times higher than before, when we partitioned on vendor alone.
If we were expanding the database to hundreds of nodes, we might still have some hot spots, so it may make sense to also include location_id in the partition key - making the partition key the full primary key - to ensure we have sufficient cardinality to avoid hot spots. This schema would look like:
CREATE TABLE test.triple_column_partition_key (
    vendor text,
    item_id text,
    location_id text,
    description jsonb,
    PRIMARY KEY ((vendor, item_id, location_id))
)
WITH default_time_to_live = 0
AND transactions = {'enabled': 'true'};
Considerations
To query efficiently, every query needs to specify all of the partition key columns. This limits how many columns you should put in the partition key.
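For example, a point read against the three-column partition key must supply all three columns to be routed to a single tablet (the values here are hypothetical):

-- All partition key columns are specified, so the read goes to one tablet
SELECT description
FROM test.triple_column_partition_key
WHERE vendor = 'acme'
AND item_id = 'widget-42'
AND location_id = 'store-7';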
So this proposed partition key would cause issues if some access patterns do not filter on item_id or location_id. Such a query may need to use the partition_hash function to execute efficiently, as used in the ycrc tool:
https://support.yugabyte.com/hc/en-us/articles/360060685992-How-to-run-count-query-in-Yugabyte-DB-
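A sketch of that pattern against the three-column schema: partition_hash values range from 0 to 65535, so the full table can be covered by issuing one such query per hash range, in parallel, and summing the results.

-- Count one slice of the hash space; other slices can run in parallel
SELECT COUNT(*)
FROM test.triple_column_partition_key
WHERE partition_hash(vendor, item_id, location_id) >= 0
AND partition_hash(vendor, item_id, location_id) <= 16383;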