4. Distributed System

Kafka is a distributed system. A cluster runs several brokers, and each partition is replicated across them in a leader and follower arrangement: one replica is the leader and the others are followers. Write operations are always performed on the leader partition. By default, read operations are served by the leader as well; the followers mainly keep a synchronized copy, although Kafka 2.4 and later can be configured to let consumers fetch from a follower.

![[kafka-distributed-producing-1.png]]

Replication happens at the partition level, as shown above.

* The benefit is that if one broker goes down, we don't lose all of our data.

**How do publishers and subscribers know which broker holds the leader or follower partitions for a specific topic?**

ZooKeeper stores the cluster metadata that records which broker holds the leader and follower partitions of each topic. Clients do not have to query ZooKeeper themselves: they can ask any broker for this metadata. Newer Kafka versions replace ZooKeeper with the built-in KRaft controller quorum.

![[kafka-distributed-producing-2.png]]

Before sending any data, a producer finds out where the leader partition is located for the topic it is about to write to. Only then is the data sent. Writes go to the leader, and the follower partitions then copy the new records from the leader to stay synchronized.

![[kafka-distributed-producing-3.png]]

The message sent to leader partition 2 in the users topic is replicated to follower partition 2.

![[kafka-distributed-consumer.png]]

In the example above, the consumer specifies both the topic and the partition. Specifying the partition can be avoided by using consumer groups: consumers placed in a consumer group are assigned partitions automatically by the group's coordinator.

ZooKeeper does not decide which replica a consumer reads from. By default the consumer reads from the leader of partition 1 in the example above; when follower fetching is configured, the broker can direct it to a follower instead. Since followers synchronize their data with the leader, an in-sync follower returns the same messages, though it may lag slightly behind the leader.
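
The producing flow described above (look up the leader for a partition, write to it, then replicate to the followers) can be sketched as a small in-memory model. This is a toy illustration in Python, not Kafka's client API: the `Broker` and `Cluster` classes, the `leader_for` and `produce` methods, and the broker IDs are all made up for this sketch, and replication is simplified to an immediate copy.

```python
from dataclasses import dataclass, field


@dataclass
class Broker:
    """Toy broker: keeps one list of records per (topic, partition)."""
    broker_id: int
    logs: dict = field(default_factory=dict)

    def append(self, topic, partition, record):
        self.logs.setdefault((topic, partition), []).append(record)

    def read(self, topic, partition):
        # Return a copy so callers cannot mutate the stored log.
        return list(self.logs.get((topic, partition), []))


@dataclass
class Cluster:
    """Toy cluster: brokers plus a metadata table of leaders and followers."""
    brokers: dict = field(default_factory=dict)   # broker_id -> Broker
    metadata: dict = field(default_factory=dict)  # (topic, partition) -> replica info

    def add_partition(self, topic, partition, leader, followers):
        # Make sure every broker that hosts a replica exists.
        for broker_id in [leader, *followers]:
            self.brokers.setdefault(broker_id, Broker(broker_id))
        self.metadata[(topic, partition)] = {
            "leader": leader,
            "followers": list(followers),
        }

    def leader_for(self, topic, partition):
        # The metadata lookup a producer performs before sending data.
        return self.metadata[(topic, partition)]["leader"]

    def produce(self, topic, partition, record):
        entry = self.metadata[(topic, partition)]
        # 1. The write is performed on the leader partition.
        self.brokers[entry["leader"]].append(topic, partition, record)
        # 2. The record is copied to each follower (simplified: immediate copy).
        for follower_id in entry["followers"]:
            self.brokers[follower_id].append(topic, partition, record)
        return entry["leader"]


# Hypothetical layout: partition 2 of "users" led by broker 1, followed by broker 3.
cluster = Cluster()
cluster.add_partition("users", 2, leader=1, followers=[3])
cluster.produce("users", 2, "alice signed up")
print(cluster.brokers[3].read("users", 2))  # ['alice signed up']
```

Because the record is copied to every replica, losing broker 1 in this sketch would still leave the data readable on broker 3.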