How Do Load Balancers Work? What Is Consistent Hashing? - System Design
Updated: Dec 9, 2022
Imagine a horizontally scaled system, i.e. one that uses many servers to handle requests. There needs to be some mechanism that directs each incoming request to a destination server. It is also required that no single server is flooded with requests while the others sit idle; ideally, requests are distributed uniformly among the servers. All of these tasks are carried out by a separate entity called a Load Balancer.
Note: To learn more about system scaling and its types (Horizontal Scaling and Vertical Scaling), refer to my blog:
To formalize: load balancing is the process of distributing a set of incoming requests over a set of servers, with the aim of making their overall processing efficient. Load balancing can optimize response time and avoid uneven distribution, i.e. overloading some servers while other compute nodes are left idle.
So how do Load Balancers distribute these requests?
Modulo Hashing or Distributed Hashing
Modulo hashing is one of the ways by which load balancers distribute incoming requests among the servers. This involves hashing the unique id of the requests and taking the modulo of the hashed number with the number of servers.
For example, suppose my system consists of 4 servers (Server 0, Server 1, Server 2, Server 3). I receive a request, compute the hash of its request id, and get 35467. Since I have 4 servers, I take 35467 modulo 4 and get a remainder of 3:
35467 mod 4 = 3.
This indicates that the Load Balancer should direct the request to Server 3 as per modulo hashing.
Since a good hash function maps request ids uniformly at random, taking the result modulo the number of servers spreads the requests evenly across all the servers.
In practice, the request id is generally the user id or client id. The hash of the same user/client id always yields the same server id, which implies that all requests from the same client are directed to the same server. Hence it is common practice for these servers to cache the user/client information in their local memory.
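Modulo hashing as described above can be sketched in a few lines of Python. The server names and the use of MD5 here are assumptions for illustration; any stable hash function works:

```python
import hashlib

# Hypothetical server pool for illustration
SERVERS = ["server-0", "server-1", "server-2", "server-3"]

def pick_server(request_id: str) -> str:
    """Modulo hashing: hash the request id, then mod by the server count."""
    # MD5 gives a hash that is stable across processes and restarts.
    # (Python's built-in hash() is randomized per run, so it is
    # unsuitable for routing decisions.)
    h = int(hashlib.md5(request_id.encode()).hexdigest(), 16)
    return SERVERS[h % len(SERVERS)]

# The same client id always maps to the same server, which is what
# makes per-server caching of client data effective.
assert pick_server("client-42") == pick_server("client-42")
```

Because the mapping is deterministic, a client's requests keep landing on the server that already holds its cached data.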
Let's look at an example of how the requests are distributed among the servers:
| Request Id | Hash of the request id | Hash mod 4 (number of servers) | Server Id (with 4 servers) |
| --- | --- | --- | --- |
| … | … | 2 | Server 2 |
| … | … | 2 | Server 2 |
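A table like the one above can also be generated programmatically. A small sketch, where the sample request ids and the MD5-based hash are assumptions for illustration:

```python
import hashlib

NUM_SERVERS = 4

def stable_hash(request_id: str) -> int:
    """A process-stable integer hash of the request id."""
    return int(hashlib.md5(request_id.encode()).hexdigest(), 16)

# Print one table row per request id: hash mod 4 decides the server.
for rid in ["user-1", "user-2", "user-3", "user-4"]:
    server = stable_hash(rid) % NUM_SERVERS
    print(f"{rid} -> hash mod {NUM_SERVERS} = {server} -> Server {server}")
```

Running this with real client ids shows how the remainders, and hence the server assignments, spread across the pool.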