Updated: Feb 14
Let's dive right in -
What is the CAP theorem?
The CAP theorem (also known as Brewer’s theorem) was first proposed by Eric Brewer in 2000. It states that a distributed database system cannot simultaneously provide consistency, availability, and partition tolerance; it can guarantee at most 2 of these 3 properties.
Consistency means that every node in the system holds the same data at the same time. If I read from any of the nodes, I get the most recent write (or an error), never an older version of the data. A system that follows this principle is called a consistent system.
Availability means that every request received by a working node gets a response, with no exceptions or errors, even when other parts of the system are down. It is important to note that an available system can still return a result that is not the latest version of the data.
A partition is a break in the communication channel between nodes of a distributed system. Partition tolerance means that the system continues to work even when a partition occurs, that is, when some messages between nodes are dropped or delayed.
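To make partition tolerance a little more concrete, here is a minimal Python sketch under simplifying assumptions: a single in-memory link that drops each message with some probability, plus a `cut` flag that simulates a full partition. `LossyLink` and `send_with_retries` are hypothetical names, not part of any library. Retrying rides out occasional dropped messages, and when the link stays down the sender reports failure instead of crashing or hanging, so the rest of the system can keep running.

```python
import random


class LossyLink:
    """Toy network link between two nodes that may drop messages."""

    def __init__(self, drop_rate: float = 0.0, seed: int = 0):
        self.drop_rate = drop_rate      # chance that a single message is lost
        self.cut = False                # True simulates a full partition
        self.rng = random.Random(seed)  # seeded so runs are repeatable
        self.delivered = []             # messages that made it across

    def send(self, msg) -> bool:
        """Attempt one delivery; return False if the message was lost."""
        if self.cut or self.rng.random() < self.drop_rate:
            return False
        self.delivered.append(msg)
        return True


def send_with_retries(link: LossyLink, msg, max_attempts: int = 5) -> bool:
    """Retry a few times to ride out dropped messages.

    Returns False (rather than raising or blocking forever) when the link
    stays down, so the caller can keep serving and decide how to proceed.
    """
    for _ in range(max_attempts):
        if link.send(msg):
            return True
    return False


link = LossyLink(drop_rate=0.3, seed=42)
print(send_with_retries(link, "x=1"))  # True
link.cut = True                        # simulate a partition
print(send_with_retries(link, "x=2"))  # False: reported, not crashed
```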
Where is the CAP theorem used in the real world?
In real-world systems, partition tolerance is not optional. Any data store spread across a network will eventually experience partitions, so you want your system to tolerate such message outages.
That leaves a choice between consistency and availability. When a partition occurs, the CAP theorem lets us pick one of these two combinations:
CP (Consistency and Partition Tolerance)
AP (Availability and Partition Tolerance)
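The difference between the two choices only shows up while a partition is in progress. The Python sketch below is a deliberately simplified model, assuming just two replicas with synchronous replication, so neither side of a partition holds a majority. `Replica`, `make_pair`, `partition`, and `Unavailable` are hypothetical names. In CP mode a cut-off replica refuses requests rather than risk divergent or stale data; in AP mode it always answers, even if the answer is stale.

```python
class Unavailable(Exception):
    """Raised when a CP replica refuses a request during a partition."""


class Replica:
    def __init__(self, mode: str):
        if mode not in ("CP", "AP"):
            raise ValueError("mode must be 'CP' or 'AP'")
        self.mode = mode
        self.value = None
        self.peer = None
        self.partitioned = False  # True while the link to the peer is down

    def write(self, value):
        if self.partitioned and self.mode == "CP":
            # Cannot replicate, so reject instead of letting replicas diverge
            raise Unavailable("cannot reach peer; write rejected")
        self.value = value
        if not self.partitioned:
            self.peer.value = value  # synchronous replication while connected

    def read(self):
        if self.partitioned and self.mode == "CP":
            # The other side may hold newer writes, so refuse to answer
            raise Unavailable("cannot confirm latest value; read rejected")
        return self.value  # AP: always answers, possibly with stale data


def make_pair(mode: str):
    a, b = Replica(mode), Replica(mode)
    a.peer, b.peer = b, a
    return a, b


def partition(a: Replica, b: Replica):
    a.partitioned = b.partitioned = True


for mode in ("CP", "AP"):
    a, b = make_pair(mode)
    a.write("v1")          # replicated to both replicas while connected
    partition(a, b)
    try:
        a.write("v2")      # AP: accepted on a only; CP: rejected
        print(mode, "read on b:", b.read())
    except Unavailable as err:
        print(mode, "refused:", err)
```

In AP mode, `b` answers with `v1` even though `a` already holds `v2`; that stale answer is the price an AP system pays for staying available.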
As you start to scale, consistency and availability become real concerns, because partitioning your data sources becomes a necessity.
If you want to know more about System Scaling, check out this blog on Horizontal vs Vertical Scaling: https://www.thegeekyminds.com/post/all-about-scaling-a-system-horizontal-scaling-vs-vertical-scaling-system-design
A NoSQL database that is distributed across multiple nodes has to tolerate partitions, so it can either be CP (Consistency and Partition Tolerance) or AP (Availability and Partition Tolerance). If a NoSQL database is CA (Consistency and Availability), it means it is not distributed at all and is monolithic, for example running on a single node.
CP (Consistency and Partition Tolerance) with NoSQL
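MongoDB is an example of a NoSQL database that follows CP (Consistency and Partition Tolerance). MongoDB replicates data across several nodes in a replica set, where a single primary node accepts all writes. The members send heartbeats (pings) to each other every two seconds, and if a member does not respond within 10 seconds, the others mark it as inaccessible. If the primary cannot reach a majority of the members, it steps down, and the replica set accepts no writes until a new primary is elected.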
This way, MongoDB compromises on availability to stay consistent.
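The majority rule behind this trade-off can be sketched in a few lines of Python. This is a toy model, not MongoDB's implementation: `ReplicaSetView` and `try_write` are hypothetical names, and elections, heartbeats, and write concerns are left out. The idea is that a primary may accept writes only while it can reach a majority of the replica set, so a primary stranded on the minority side of a partition stops accepting writes.

```python
from dataclasses import dataclass


@dataclass
class ReplicaSetView:
    """What one node can currently see of its replica set (toy model)."""
    members: int    # total voting members in the set
    reachable: int  # members this node can reach, counting itself

    def has_majority(self) -> bool:
        # Strict majority: more than half of all voting members
        return self.reachable > self.members // 2


def try_write(view: ReplicaSetView, is_primary: bool) -> str:
    # Only a primary that still sees a majority may accept writes;
    # a primary cut off from the majority steps down instead.
    if not is_primary:
        return "rejected: not primary"
    if not view.has_majority():
        return "rejected: stepped down (no majority)"
    return "accepted"


# A 3-member set split 1 | 2: the isolated old primary stops taking writes,
print(try_write(ReplicaSetView(members=3, reachable=1), is_primary=True))
# while the 2-member side can elect a new primary and keep accepting writes.
print(try_write(ReplicaSetView(members=3, reachable=2), is_primary=True))
```

Requiring a strict majority means at most one side of any partition can accept writes, which is what keeps the data consistent at the cost of the minority side's availability.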
AP (Availability and Partition Tolerance) with NoSQL
Cassandra is an example of a distributed database that uses AP (Availability and Partition Tolerance). Cassandra is a P2P (peer-to-peer) system that replicates data on multiple nodes. For each read or write, the client chooses a consistency level, which sets how many replicas must respond; when the replicas that respond disagree, Cassandra returns the value with the most recent write timestamp. This means that if a write was made recently and has reached only one of the nodes, a read served by other nodes (for example, at consistency level ONE) returns an older copy of the data.
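How a read reconciles the replicas' answers can be modeled with a small Python sketch. It is a simplification, not Cassandra's code: `Cell`, `write`, and `read` are hypothetical helpers, a write reaches only the replicas listed in `targets`, and a read contacts only the replicas listed in `contacted` and keeps the value with the newest timestamp (last-write-wins).

```python
from dataclasses import dataclass


@dataclass
class Cell:
    value: str
    timestamp: int  # write time; the newest timestamp wins on conflict


def write(replicas, targets, value, ts):
    """Apply a write to the replicas that received it (the others miss it)."""
    for i in targets:
        replicas[i] = Cell(value, ts)


def read(replicas, contacted):
    """Query some replicas and keep the cell with the latest timestamp."""
    return max((replicas[i] for i in contacted), key=lambda c: c.timestamp).value


N = 3
replicas = [Cell("v1", 1) for _ in range(N)]  # all replicas start in sync

# A write at consistency level ONE: so far only replica 0 has it.
write(replicas, targets=[0], value="v2", ts=2)

print(read(replicas, contacted=[2]))     # "v1": stale read at ONE
print(read(replicas, contacted=[0, 1]))  # "v2": replica 0 has the newest cell
print(read(replicas, contacted=[1, 2]))  # "v1": two replicas, both stale
```

The last read shows that contacting two replicas is not enough on its own: the write reached one replica (W = 1) and the read consulted two (R = 2), and because R + W = 3 is not greater than N = 3, the two sets need not overlap. Reading and writing at QUORUM (R = W = 2) gives R + W > N, so every read set shares at least one replica with the latest write.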
Cassandra does, however, guarantee eventual consistency: over time, the write is replicated to the other nodes, for example through hinted handoff, read repair, and anti-entropy repair.
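Eventual consistency means the replicas converge once they exchange what they know. The Python sketch below shows only that convergence idea, with every pair of replicas comparing their copies and keeping the newer one (last-write-wins). It is an assumption-heavy simplification: `Cell`, `merge`, and `anti_entropy_round` are hypothetical names, and real databases use far more efficient repair mechanisms than comparing every pair of replicas.

```python
from dataclasses import dataclass
from itertools import combinations


@dataclass
class Cell:
    value: str
    timestamp: int


def merge(a: Cell, b: Cell) -> Cell:
    # Last-write-wins: keep whichever cell carries the newer timestamp
    return a if a.timestamp >= b.timestamp else b


def anti_entropy_round(replicas):
    """Every pair of replicas compares copies and both keep the newer one."""
    for i, j in combinations(range(len(replicas)), 2):
        winner = merge(replicas[i], replicas[j])
        replicas[i] = replicas[j] = winner


replicas = [Cell("v2", 2), Cell("v1", 1), Cell("v1", 1)]  # only replica 0 saw v2
anti_entropy_round(replicas)
print([c.value for c in replicas])  # ['v2', 'v2', 'v2']
```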
Is the CAP theorem valid in today's world?
Yes. The CAP theorem is still valid, and it is important to keep its trade-offs in mind while designing systems. Although we have come up with better solutions like eventual consistency, achieving all three properties of CAP simultaneously is still not possible.
The CAP theorem should be seen as a guiding tool when designing distributed systems, helping you decide which database to choose and how to design around it.
The CAP theorem says that only 2 of the 3 properties can be guaranteed, but designing systems in the real world need not be so black and white. You can opt for solutions that provide eventual consistency or eventual availability. Even though you still can't fully achieve all 3 properties simultaneously, this is better than having only one of consistency and availability.
So do not take the CAP theorem as an absolute, but think of it as something where there can be partial trade-offs. Be innovative while designing systems.
There's no one right answer when it comes to designing systems. As a software developer, it is your responsibility to look at business needs and implement whatever works best for your organization.
Blockchain technology is an excellent example: it sacrifices immediate consistency in favor of eventual consistency.
To learn more about blockchains, check out this article: https://www.thegeekyminds.com/post/so-how-do-blockchains-and-cryptocurrencies-actually-work-exploring-web3
A Brief History of the CAP Theorem
After the invention of the transistor, computers grew rapidly more capable, and architects needed to figure out how to store ever more data reliably. One solution was to use distributed systems. Decades later, Eric Brewer of UC Berkeley introduced the CAP theorem to describe the trade-offs these systems face.
Brewer first presented the idea as a conjecture in his keynote at the ACM Symposium on Principles of Distributed Computing (PODC) in 2000. Seth Gilbert and Nancy Lynch of MIT published a formal proof in 2002, and the theorem was quickly adopted by architects who were looking for ways to solve their data storage problems.
Bonus: CAP Theorem and Latency
Many people say that the CAP theorem does not take latency into consideration. But in fact, partitions and latency are deeply connected. A partition does not necessarily mean a complete break in communication between 2 interconnected systems; it also covers a message that is not delivered within a given time frame, which is effectively a timeout.
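From a node's point of view, a reply that arrives too late is indistinguishable from one that never arrives. The Python sketch below makes that concrete under simple assumptions: `call_replica` is a hypothetical stand-in for a remote call whose latency is simulated with `time.sleep`, and `reachable` treats a missed deadline exactly like a broken link.

```python
import concurrent.futures
import time

# Shared worker pool; a timed-out call keeps running in the background,
# but the caller stops waiting for it.
_pool = concurrent.futures.ThreadPoolExecutor(max_workers=4)


def call_replica(delay_s: float) -> str:
    """Hypothetical remote call; delay_s simulates network latency."""
    time.sleep(delay_s)
    return "ack"


def reachable(delay_s: float, timeout_s: float) -> bool:
    """Return False when the reply misses the deadline, as if partitioned."""
    future = _pool.submit(call_replica, delay_s)
    try:
        return future.result(timeout=timeout_s) == "ack"
    except concurrent.futures.TimeoutError:
        return False  # too slow: from the caller's view, the link is down


print(reachable(delay_s=0.01, timeout_s=0.5))  # True: reply arrived in time
print(reachable(delay_s=0.3, timeout_s=0.05))  # False: treated as a partition
```

Choosing the timeout is itself a trade-off: a short timeout declares partitions more often (forcing the consistency-versus-availability decision more often), while a long one adds latency to every slow request.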
The CAP theorem is as valid today as it was when it was proposed in 2000. What has changed are the dynamics and trade-offs, as we have advanced a lot in system design and technology.
And that's a wrap! Hi, I am Gourav Dhar, a software developer and I write blogs on Backend Development and System Design. Subscribe to my Newsletter and learn something new every week - https://thegeekyminds.com/subscribe