Apache Kafka Partitions as Unit of Storage
Understanding how to manage Kafka's storage at the partition level
When you call the send() function to write a new event, an internal process decides which partition that event should go to. This process is known as partitioning, and it can be customized by a component called a partitioner.
If the producer is configured with the partitioner.class property, it executes that partitioner to compute which partition to use. If no custom partitioner is configured, Kafka computes the partition from the key assigned to the event. The partitionForKey() method of Kafka's built-in partitioner describes how the key is used along with the number of available partitions: the serialized key is hashed, and the result is taken modulo the partition count.
If the event has no key, or if partitioner.ignore.keys is set to true, Kafka falls back to choosing the partition based on factors such as broker load and the amount of data produced to each partition. This partitioning behavior was introduced by KIP-794.
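The key-based rule described above can be sketched in a few lines of Java. This is a simplified illustration rather than Kafka's source: Kafka hashes the serialized key with murmur2, whereas this sketch swaps in Arrays.hashCode() as a stand-in so that it stays self-contained. The class name and the sample key are hypothetical.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

class KeyPartitioningSketch {

    // Clears the sign bit so the hash can be used with the modulo operator.
    static int toPositive(int number) {
        return number & 0x7fffffff;
    }

    // Maps a serialized key to a partition index in [0, numPartitions).
    // Assumption: Arrays.hashCode stands in for Kafka's murmur2 hash.
    static int partitionForKey(byte[] serializedKey, int numPartitions) {
        return toPositive(Arrays.hashCode(serializedKey)) % numPartitions;
    }

    public static void main(String[] args) {
        byte[] key = "customer-42".getBytes(StandardCharsets.UTF_8);
        int first = partitionForKey(key, 6);
        int second = partitionForKey(key, 6);
        // The same key always lands on the same partition while the partition count is unchanged.
        System.out.println("partition=" + first + ", stable=" + (first == second));
    }
}
```

Because the result depends on the number of partitions, adding partitions to a topic can change where events with an existing key are written.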
When you call the poll() function from the consumer to read events, the events are read from partitions selected beforehand by an internal process that decides how to assign partitions to consumers. This process is known as assignment, and it can be customized by a component called an assignor. To better understand how assignors work, you need to understand how Kafka handles consumers. All consumers that subscribe to topics must belong to a consumer group. This is the reason the group.id property in the consumer API is mandatory for them. Every group has a group coordinator, which oversees who joins and leaves the group.
The assign() method from the RangeAssignor implementation describes how range assignment works.
The assign() method from the CooperativeStickyAssignor implementation describes how cooperative sticky assignment works.
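To make the range strategy mentioned above more concrete, here is a minimal sketch for a single topic. It is not the RangeAssignor source code. It only mirrors the arithmetic that strategy is known for: consumers are sorted by member id, each one receives a contiguous block of partitions, and the first consumers absorb any remainder. The class, method, and member names are hypothetical.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeSet;

class RangeAssignmentSketch {

    // Splits partitions 0..numPartitions-1 of one topic into contiguous ranges,
    // one range per consumer, following the sorted order of the member ids.
    static Map<String, List<Integer>> assignRange(List<String> memberIds, int numPartitions) {
        List<String> sorted = new ArrayList<>(new TreeSet<>(memberIds));
        Map<String, List<Integer>> assignment = new LinkedHashMap<>();
        if (sorted.isEmpty()) {
            return assignment;
        }
        int perConsumer = numPartitions / sorted.size();
        int withExtra = numPartitions % sorted.size();
        for (int i = 0; i < sorted.size(); i++) {
            // The first `withExtra` consumers each take one additional partition.
            int start = perConsumer * i + Math.min(i, withExtra);
            int length = perConsumer + (i < withExtra ? 1 : 0);
            List<Integer> partitions = new ArrayList<>();
            for (int p = start; p < start + length; p++) {
                partitions.add(p);
            }
            assignment.put(sorted.get(i), partitions);
        }
        return assignment;
    }

    public static void main(String[] args) {
        System.out.println(assignRange(List.of("consumer-b", "consumer-a", "consumer-c"), 7));
    }
}
```

With seven partitions and three consumers, the sketch gives consumer-a partitions 0 to 2, consumer-b partitions 3 and 4, and consumer-c partitions 5 and 6.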
To move partitions onto other brokers, for example after adding brokers to the cluster, you can use the kafka-reassign-partitions tool available in the /bin folder of your Kafka distribution. You can do this by first generating a reassignment recommendation given the new layout of your cluster.
When generating the recommendation in this example, broker-list was set to 2,3, which corresponds to the broker ids of the newly added brokers. The partitions-to-reassign.json file provided as a parameter is a file you must create yourself, and it should contain information about which one or more partitions you intend to reassign. You should create this file using the following syntax:
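As a minimal sketch, and assuming the file is the one passed to the tool's --topics-to-move-json-file option when generating a recommendation, the Java snippet below holds the JSON in a text block and writes it to disk. The topic name orders and the class name are hypothetical placeholders. Text blocks require Java 15 or later.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

class ReassignmentFileSketch {

    // JSON listing the topics whose partitions should be considered for reassignment.
    // "orders" is a hypothetical topic name; replace it with your own topic.
    static final String TOPICS_TO_MOVE = """
            {
              "topics": [
                { "topic": "orders" }
              ],
              "version": 1
            }
            """;

    public static void main(String[] args) throws IOException {
        // Writes the file into the current working directory.
        Path file = Path.of("partitions-to-reassign.json");
        Files.writeString(file, TOPICS_TO_MOVE);
        System.out.println("Wrote " + file.toAbsolutePath());
    }
}
```

You could then pass the file to kafka-reassign-partitions together with the --broker-list and --generate options to print a proposed reassignment before executing anything.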