Learn the concept of Consistent Hashing in Load Balancing and understand this key System Design idea

Suman Mandal
7 min read · May 1, 2021

Well, as an attentive reader, one should ask three questions before even starting to read: Why? What? How? So let’s get started with WHY?

Big companies ask interview questions about this topic, and that is one big motivation to read about it. More importantly, it is essential for building large-scale systems like Uber, Swiggy, Tinder, Instagram, etc., which deal with millions of requests from around the world in a short span of time.

To start with Consistent Hashing, we need to understand System Design. It helps us create large-scale applications that can serve many requests with minimal response time, and it makes an application fault-tolerant, consistent, and scalable. In one word, to build an efficient system (application), we need to apply the concepts of System Design. (Give yourself some time at your leisure to read about it on the internet.)

In System Design, there is a concept called Load Balancing. The idea is to scatter (balance) the incoming requests (loads) across the designated workers (servers, in the case of an application) uniformly, so that all workers share the work equally. It is the load balancer’s responsibility to watch the system and ensure that no single worker carries a heavy load (serving requests, in the case of an application) alone; the work should be uniformly distributed, and the system should never become skewed.

Now it is time to know WHAT is Consistent Hashing?

Consistent Hashing is a popular technique to balance loads. In other words, it maps requests to specific nodes (the servers) intelligently, so that every node in the system works on a similar number of requests. It also lets the system add or remove nodes flexibly while maintaining a good Load Factor (the number of requests served per node).

Now it's time to answer the big question: HOW can we achieve it?

Let’s consider there are R requests, and rᵢ is the id of one of them

where i ∈ {1, 2, …, R}

Now we have to consider a hashing function Hρ which hashes the id rᵢ

Hρ(rᵢ) = a number that we will use to map the request

Let’s say we have an array of length N

But think of it as a ring -

Now some maths to do 📚 If we hash the value rᵢ with Hρ, the value Hρ(rᵢ) should give us an index into the ring array. But Hρ(rᵢ) can be any number, possibly greater than (N-1), so we cannot use it directly as an index into the ring. What we can do is apply a MOD operation on Hρ(rᵢ) by N, which means -

Value of the index into the ring array for a request is INDEXᵣ = Hρ(rᵢ) % N
(which returns the remainder, always less than N)

For example, say Hρ(rᵢ) = 123445, N = 20, so INDEXᵣ = Hρ(rᵢ)%N = 5

Then the picture looks like -

So in this manner, we can map all the requests to the processing queue (which is basically our Ring)
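To make this concrete, here is a minimal Python sketch of the request-mapping step. The choice of MD5 as Hρ and the names h_rho / request_index are illustrative assumptions, not something prescribed by the technique:

```python
import hashlib

N = 20  # length of the ring array

def h_rho(request_id: str) -> int:
    """H_rho: hash a request id to a (possibly very large) integer."""
    # MD5 is just one convenient, well-distributed hash; any good hash works.
    return int(hashlib.md5(request_id.encode()).hexdigest(), 16)

def request_index(request_id: str) -> int:
    """INDEX_r = H_rho(r_i) % N, always in the range 0..N-1."""
    return h_rho(request_id) % N

print(request_index("r42"))  # some slot between 0 and 19
```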

Now let’s say we have S servers, and sⱼ is the id of one of them

where j ∈ {1, 2, …, S}

Here also we have to consider a different hashing function Hσ, which will hash sⱼ. Applying the MOD operation with N on it yields INDEXₛ = Hσ(sⱼ) % N

Value of the index into the ring array for a server is INDEXₛ = Hσ(sⱼ) % N
(which returns the remainder, always less than N)

So yesss!! By now, you might have understood: we are also going to map the servers onto the same ring array (this is the catch 😵).

For example, say Hσ(sⱼ) = 107, N = 20, so INDEXₛ = Hσ(sⱼ)%N = 7
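As a sketch of the server-mapping step, Hσ is taken here to be SHA-1 purely to keep it distinct from the Hρ above; the function names are again made up for illustration:

```python
import hashlib

N = 20  # same ring as before

def h_sigma(server_id: str) -> int:
    """H_sigma: a second hash function, used for server ids."""
    return int(hashlib.sha1(server_id.encode()).hexdigest(), 16)

def server_index(server_id: str) -> int:
    """INDEX_s = H_sigma(s_j) % N."""
    return h_sigma(server_id) % N

print(server_index("s3"))  # the ring slot this server occupies
```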

The picture for server mapping looks something like this -

So as we keep hashing all the requests and servers in this manner, we may end up with something like this -

This is an imaginary scenario diagram

These blue-marked requests will be served by the nearest red-marked server in the clockwise direction. From the above picture, we can say that -

- requests referenced in 2, 3, 4, 5, 6 will be served by the server referenced in 7
- requests referenced in 8, 9, 10 will be served by the server referenced in 11
- And so on...
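The clockwise walk is easy to express with a sorted list of server slots and a binary search. In this sketch, slots 7 and 11 come from the example above, while 15 and 19 are invented to fill out the ring:

```python
import bisect

server_slots = [7, 11, 15, 19]  # sorted ring slots occupied by servers (N = 20)

def serving_slot(request_index: int) -> int:
    """Walk clockwise from the request's slot to the first server slot."""
    i = bisect.bisect_left(server_slots, request_index)
    # Past the last server? Wrap around the ring to the first one.
    return server_slots[i % len(server_slots)]

for r in [2, 3, 4, 5, 6, 8, 9, 10]:
    print(f"request at slot {r} -> server at slot {serving_slot(r)}")
```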

What did we achieve?

Well, by hashing we have introduced randomness into the placement of requests and servers on the ring. And a good hash function keeps the mapping uniform. So each server will have a theoretically equal load to serve. This is what we desired at the very beginning -

Reducing the load on heavily loaded nodes and spreading the tasks (loads) evenly (or uniformly) across all the other nodes. This is why it is called Load Balancing.

Benefits

Now let’s say one of the servers goes down, but there is nothing to worry about. The requests that were served by the server that just went down will now be served by the next server in the clockwise direction. From the example above, if we lose the server referenced in 7, then the requests referenced in 2, 3, 4, 5, 6 will be mapped to the next server, referenced in 11. So now the server referenced in 11 will serve the requests referenced in 2, 3, 4, 5, 6, 8, 9, 10.
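In code, losing a server is just removing its slot; the same clockwise lookup then automatically lands on the next server. Continuing the hypothetical slots from the earlier sketch:

```python
import bisect

# Same hypothetical ring as before, but the server at slot 7 has gone down.
server_slots = [11, 15, 19]

def serving_slot(request_index: int) -> int:
    i = bisect.bisect_left(server_slots, request_index)
    return server_slots[i % len(server_slots)]

for r in [2, 3, 4, 5, 6]:
    print(f"request at slot {r} -> server at slot {serving_slot(r)}")  # all land on 11
```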

We call this concept Fault Tolerance. This is why, even if a Facebook server crashes, within a fraction of a second your request is served by another server, and you enjoy the uninterrupted service of such apps. And that makes you happy 😊

Problem 🔥

There are still some issues with this design. As you can see, due to a server shutdown, another server takes on extra load. In our example it is still ok, because the load increases by only 5 requests (2, 3, 4, 5, 6). But although the R loads are theoretically distributed evenly, in practice each server does not always serve exactly R/S requests. The requests can be skewed and can create a situation where, say, a server sₘ is mapped to 50 loads, the next one sₙ is mapped to 30 loads, and every other server in the system is mapped to around 10 loads. Now, if sₘ fails, sₙ will be mapped to 50 + 30 = 80 loads. This is really unfair to sₙ, since its fellow servers are serving just 10 requests each.

Well, this is a real problem. It arises because there are few server points on the ring and a comparatively large number of loads on the same ring. But it is also true that the number of servers will always be much smaller than the number of loads; in practice, there are only a handful of servers. So the solution is to have more server points on the ring, yet we are restricted to a limited number of physical servers. Then how do we solve the issue?

Solution

We have to create virtual nodes. Let’s say we have P nodes -

Servers ⇒ s₁, s₂, s₃, … , sₚ

So in the earlier design, we would have created P points on the ring array. In the new design, for each node we create K virtual points, so there will be P × K points on the ring. The ring points contain the addresses (the references) of the nodes, so basically we create K such references for each node. That is what we call Virtual Servers.

Thus we have more references to the servers, which reduces the number of loads per server point. So in this design, even if one server goes down, its load is split into small pieces that are remapped to several different next-available servers rather than dumped onto one. And thus the load factor remains nearly the same (it changes negligibly).

To get the best result, you have to choose a good value of K smartly.

K ≈ log(N) is a good approximation that one may choose
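Putting it all together, here is a minimal sketch of the virtual-node idea. The s#k naming convention for virtual points, the four server ids, and the ring size are all illustrative assumptions:

```python
import bisect
import hashlib
import math

N = 1024                 # a larger ring makes the spreading visible
K = round(math.log(N))   # K ≈ log(N), as suggested above

def h_sigma(key: str) -> int:
    return int(hashlib.sha1(key.encode()).hexdigest(), 16)

# Place K virtual points for each physical server on the ring.
ring = {}  # slot -> physical server id
for server in ["s1", "s2", "s3", "s4"]:
    for k in range(K):
        ring[h_sigma(f"{server}#{k}") % N] = server

slots = sorted(ring)

def lookup(request_index: int) -> str:
    """Clockwise walk, now landing on a virtual point of some server."""
    i = bisect.bisect_left(slots, request_index)
    return ring[slots[i % len(slots)]]

print(lookup(123445 % N))  # which physical server gets the earlier example request
```

Because each server now owns K scattered slots, removing one server hands its slots to several different neighbours instead of a single one, which is exactly the spreading effect described above.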

Conclusion

System Design is a subjective discipline built on concepts like this: the more you improve here, the better the software you build. We also need to understand that Consistent Hashing is not the ultimate answer to fault tolerance and scalability, but it is one of the most useful request-allocation techniques. There is always scope for improving the algorithm, and a good choice of the numbers N, P, and K (described above) is what to focus on.

Created by Suman Mandal with ❤️
Follow me - LinkedIn, Instagram, Twitter
