The Kafka controller is a thread that runs inside exactly one broker of a Kafka cluster, i.e. in a cluster of N brokers, only one broker acts as the controller at any given time.
It acts as the brain of the cluster, keeping the cluster functioning smoothly and resiliently.
Whenever a Kafka cluster is spun up, each broker first creates a session with ZooKeeper and then tries to create an ephemeral znode named “/controller”. The broker that succeeds in creating the “/controller” node becomes the controller.
The rest of the brokers will create a watch on this “/controller” node.
Why? What’s the need to watch this “/controller” node?
Suppose the controller broker goes down Then the cluster should start behaving abnormally? Right? This actually never happens. Kafka is very good when it comes to self-healing. So, to overcome such a scenario rest of the brokers (Kafka Cluster minus Kafka Controller) will create a watch on the “/controller” node. As soon as the controller goes down or its session with the zookeeper is lost then this znode will be deleted and the rest of the brokers will be notified, and a new controller will be elected again.
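The election-plus-watch mechanism above can be sketched in a few lines. This is a minimal in-memory simulation of the ZooKeeper semantics (ephemeral node creation race, watch callbacks on deletion), not the real ZooKeeper client API; all class and method names here are illustrative.

```python
# Minimal in-memory sketch of the "/controller" election race and
# watch-based re-election. Real Kafka uses ephemeral znodes and watches
# through the ZooKeeper client; this only mimics that behaviour.

class FakeZooKeeper:
    def __init__(self):
        self.controller = None   # broker id currently owning "/controller"
        self.watchers = []       # callbacks fired when the node is deleted

    def try_create_controller(self, broker_id):
        """First broker to call this wins; later callers get False."""
        if self.controller is None:
            self.controller = broker_id
            return True
        return False

    def watch_controller(self, callback):
        self.watchers.append(callback)

    def session_lost(self, broker_id):
        """Ephemeral node is deleted when the owning session dies."""
        if self.controller == broker_id:
            self.controller = None
            watchers, self.watchers = self.watchers, []
            for cb in watchers:   # notify all watching brokers
                cb()

class Broker:
    def __init__(self, broker_id, zk, live):
        self.id, self.zk, self.live = broker_id, zk, live

    def elect(self):
        # Try to become controller; on failure, watch for the next election.
        if self.id in self.live and not self.zk.try_create_controller(self.id):
            self.zk.watch_controller(self.elect)

zk = FakeZooKeeper()
live = [0, 1, 2]
brokers = [Broker(i, zk, live) for i in live]
for b in brokers:
    b.elect()
print(zk.controller)   # -> 0 (first broker to try wins the race)

live.remove(0)
zk.session_lost(0)     # controller dies; watchers re-run the election
print(zk.controller)   # -> 1 (a surviving broker takes over)
```

The key property the sketch preserves is that re-election is triggered by the deletion notification itself, so no broker ever has to poll for controller failure.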
A lot of theory? Let’s verify this practically now.
Log in to the ZooKeeper shell using zookeeper-shell.sh localhost:2181 (this utility is available under the Kafka broker’s bin directory).
Then use “ls /” to list all the znodes available as shown.
To check how many brokers are functioning in the cluster, use ls /brokers/ids.
The output depicts that we have 2 brokers with ids as 0 and 1 connected to zookeeper.
Now let’s refocus on the controller. You got it right: there is a “/controller” znode that contains information about the current controller. Since we have 2 brokers, the controller must be one of these 2. Let’s verify this now!
We can see that the broker with id “0” is the controller. Some of you might be wondering how a controller is re-elected in case of a controller failure. Cool, you are going in the right direction.
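For reference, the value stored in the “/controller” znode (what get /controller prints) is a small JSON document. In ZooKeeper-based Kafka versions it looks roughly like the payload below (the exact fields and timestamp value can vary by version; the timestamp shown is made up), so the controller’s broker id can also be read programmatically:

```python
import json

# Example of the JSON stored at /controller, as printed by
# `get /controller` in zookeeper-shell. The timestamp here is a
# placeholder; field names follow the ZooKeeper-based Kafka layout.
payload = '{"version":1,"brokerid":0,"timestamp":"1593267988429"}'

controller = json.loads(payload)
print(controller["brokerid"])   # -> 0
```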
To understand this, let’s shut down a broker (make sure you shut down the broker with id “0” so that you can see the concept clearly).
Do a Ctrl+C in the Kafka console window (never do this in production) and run get /controller again in the ZooKeeper shell window.
Here we see that the broker with id “1” is now the controller.
The takeaway here is that at any point in time only one broker acts as the controller.
Responsibilities of the Controller
The controller performs several crucial tasks:
Broker Liveness: The controller monitors broker liveness by keeping a watch on the brokers’ ZooKeeper znodes. Once a broker’s session with ZooKeeper is lost, the controller treats it as a broker failure. For some partitions this failure means losing their leader, and that leadership has to be transferred to another broker.
Leader Election: If a broker fails, the partitions for which this broker was the leader need a new leader. The controller elects a new leader from the ISR list and persists this mapping in ZooKeeper to avoid loss of information. Not only on failure: when a new topic is created, the controller assigns leaders for its partition(s) too.
Updating the ISR: Once a new leader for a partition has been elected, the controller persists it in ZooKeeper and broadcasts the change to all brokers (remember, any broker in the cluster can serve metadata requests). When the controller elects a new leader, it also bumps the leader epoch, sends that information to all members of the ISR list, and persists the ISR list in ZooKeeper.
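The leader election and epoch bump described above can be sketched for a single partition. This is a conceptual simulation only; real Kafka does this inside the controller with ZooKeeper writes and LeaderAndIsr requests, and the function and field names below are illustrative.

```python
# Conceptual sketch of the controller's leader re-election for one
# partition after a broker failure. Names are illustrative, not Kafka's
# actual internal APIs.

def reelect_leader(state, failed_broker):
    """Return the new partition state after `failed_broker` dies."""
    # Drop the failed broker from the in-sync replica set.
    isr = [b for b in state["isr"] if b != failed_broker]
    if not isr:
        raise RuntimeError("no in-sync replica left; partition stays offline")
    return {
        "leader": isr[0],                            # pick a surviving ISR member
        "isr": isr,
        "leader_epoch": state["leader_epoch"] + 1,   # epoch bump fences the old leader
    }

partition = {"leader": 0, "isr": [0, 1, 2], "leader_epoch": 7}
print(reelect_leader(partition, failed_broker=0))
# -> {'leader': 1, 'isr': [1, 2], 'leader_epoch': 8}
```

The epoch bump is the important detail: a stale leader that comes back with an old epoch can be rejected by the replicas, which is why the epoch is distributed along with the new ISR list.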
We saw that the controller is re-elected automatically, so can we shut down any broker at any time? Should we perform a planned shutdown of a broker or just press Ctrl+C? Let’s understand this now.
Impact of a Controlled vs Uncontrolled Shutdown of a Broker on the Controller:
Not everything can be performed in a controlled manner, but if we are intentionally shutting down brokers for maintenance or to roll out an update, it’s good practice to perform a controlled shutdown. A controlled shutdown has several benefits, and one of them directly impacts the controller.
If a broker is about to shut down, the partitions for which this broker is the leader will have to get a new leader. This leader re-election has a cost: during the election, the partition goes offline (a client can’t read from or write to it).
So, in a controlled shutdown the partitions go offline one by one, and the impact is minimal and negligible in most cases.
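Controlled shutdown is governed by a few broker settings; these are real Kafka broker configs for ZooKeeper-based clusters (the values shown are the usual defaults):

```properties
# Attempt a controlled shutdown (migrate partition leadership away)
# before the broker process exits.
controlled.shutdown.enable=true
# How many times the broker retries the controlled shutdown request.
controlled.shutdown.max.retries=3
# Backoff between retries, in milliseconds.
controlled.shutdown.retry.backoff.ms=5000
```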
But think of an uncontrolled shutdown: all the partitions go offline at once, and the controller is now under pressure to re-elect a new leader for all of them. The controller performs leader election partition by partition, which means the last partition remains offline the longest, e.g. with N partitions, the Nth partition has to endure the maximum offline time.
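A back-of-the-envelope calculation makes the Nth-partition claim concrete. The per-partition election time below is an assumed constant purely for illustration; in reality it varies with cluster load and metadata size.

```python
# If each leader election takes roughly t seconds and an uncontrolled
# shutdown forces N sequential elections, the last (Nth) partition is
# offline for about N * t. Here t is an assumed constant.

def max_offline_seconds(num_partitions, election_time_s):
    return num_partitions * election_time_s

# e.g. 1000 leader partitions at an assumed 5 ms per election:
print(max_offline_seconds(1000, 0.005))   # -> 5.0 seconds for the last partition
```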
What if the controller itself shuts down abruptly? As explained, a new controller is elected, and this new controller initializes itself by reading a lot of metadata: the metadata for all partitions in the cluster, who is the leader, who are the followers, the ISR lists, etc. Hence, the more partitions there are, the longer this recovery takes.