Updated: Dec 12, 2022
Most of the content on the internet is served from physical servers located in a particular geographical place. If you live far from that location, the content loads more slowly for you, because your request has to travel all the way to the server and the response has to travel all the way back to your machine. When the server is hundreds or thousands of kilometres away, that round trip adds a noticeable delay.
This is where CDN comes in. CDN stands for Content Delivery Network.
If you are a software developer, a system architect, or a computer science student, you would need to know about a CDN to design and develop your system such that it is efficient and optimized. If you want to know about CDN, how it works and the scenarios where using a CDN is beneficial, this blog is for you.
Let's dive right in -
What is a CDN (Content Delivery Network)?
A CDN or Content Delivery Network is a network of servers that work together to deliver content faster to end-users.
These CDN servers are distributed around the world. Copies of the content on your main server are stored on them, so users in different parts of the world can access it at similar speeds.
CDNs are basically pseudo servers. Incoming requests are directed to a CDN server and not the main server. This has several benefits:
Reduces load on the main server and hence reduces the chance of the main server going down due to a lot of incoming requests
Reduces the response time of the requests, because the content travels a shorter distance
Reduces bandwidth costs for the website owner, because less data is served from the main server
The goal is for the CDN server to be as close as possible to the user’s physical location in order for them to download content as quickly as possible.
CDNs are so popular that businesses of almost every size use one. CDNs are often used by websites, video hosting services, and cloud storage providers in order to reduce bandwidth costs and improve response times. Netflix and YouTube are very good examples of companies that use CDNs for seamless video streaming.
How does a CDN (Content Delivery Network) work?
First, let's have a look at what the CDN infrastructure includes.
What does a CDN infrastructure include?
Origin Server
The origin server is the main server that contains the original version of the website. It listens to incoming requests and responds to each of them. Without a CDN, every request goes to the origin server, which then sends a response.
The time this takes, called latency, depends largely on the physical distance between the user and the server. Apart from this, time is also spent on creating a secure connection using the SSL/TLS protocol, which needs some extra round trips (RTT, Round Trip Time).
RTT (Round Trip Time) is the total time, in milliseconds, from the moment the browser sends a request until it receives the response. Besides physical distance, it is also affected by the volume of traffic and by the number of intermediate nodes along the network path.
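To get a feel for how distance turns into delay, here is a rough back-of-the-envelope sketch. It assumes signals travel through optical fiber at about 200,000 km/s (roughly two-thirds of the speed of light), counts one round trip for the TCP handshake, one for a TLS 1.3 handshake (two for TLS 1.2) and one for the HTTP request itself, and ignores congestion, routing detours and server processing time. The function names are made up for this illustration.

```python
# Assumption: signals in optical fiber travel at roughly 200,000 km/s.
FIBER_SPEED_KM_PER_S = 200_000


def min_rtt_ms(distance_km: float) -> float:
    """Lower bound on the round-trip time from propagation delay alone."""
    return 2 * distance_km / FIBER_SPEED_KM_PER_S * 1000


def time_to_first_byte_ms(distance_km: float, tls_round_trips: int = 1) -> float:
    """Estimate the delay before the first response byte arrives.

    Counts 1 RTT for the TCP handshake, tls_round_trips for TLS
    (1 for TLS 1.3, 2 for TLS 1.2) and 1 RTT for the HTTP request.
    """
    round_trips = 1 + tls_round_trips + 1
    return round_trips * min_rtt_ms(distance_km)


if __name__ == "__main__":
    for label, km in [("origin 10,000 km away", 10_000), ("edge 100 km away", 100)]:
        print(f"{label}: RTT >= {min_rtt_ms(km):.0f} ms, "
              f"first byte >= {time_to_first_byte_ms(km):.0f} ms")
```

Under these assumptions a server 10,000 km away costs at least about 100 ms per round trip, while one 100 km away costs about 1 ms, and every extra handshake multiplies that difference.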
To know more about SSL/TLS protocol, check out my blog: https://www.thegeekyminds.com/post/complete-guide-to-ssl-tls
The Origin Server is owned, maintained, and updated by the website owner.
Edge Server or PoPs (Point of Presence)
While the Origin Servers store the original data, the Edge server or PoP, also known as the Content Delivery Network Server or CDN PoPs (Point of Presence), stores a copy of the original data. These Edge servers are distributed across various locations around the world. Since these servers are present at the edge of a CDN network, they are called Edge Servers.
The Edge servers are responsible for caching the static data and serving any requests that are directed to them. This way the edge servers take some load off the Origin Server and also reduce latency in serving the requests.
These PoP data centers are strategically placed around the world to serve more people in a more effective way. They speed up load times and keep the website responsive for users wherever they are in the world. Each PoP has several caching servers.
The PoP or Edge server uses a reverse proxy to direct the incoming requests to one of the many caching servers.
If you want to know more about proxy and reverse proxy, check out my blog: https://www.thegeekyminds.com/post/what-is-a-proxy-the-difference-between-a-proxy-and-a-reverse-proxy-use-cases-of-proxies
A PoP or Edge Server is made up of multiple caching servers that serve as a proxy for content like video and other files. They mostly store static data and deliver it locally to users which can reduce bandwidth consumption and make websites more responsive. These cache servers have strong storage and memory capabilities to cache files securely at high speed.
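As a rough illustration of a reverse proxy sitting in front of several caching servers, the sketch below picks a caching server by hashing the request path, so the same file always lands on the same server. The class names and the hashing rule are assumptions made for this example; real PoPs may balance requests in other ways.

```python
import hashlib


class CacheServer:
    """Hypothetical caching server inside a PoP."""

    def __init__(self, name: str):
        self.name = name
        self.store = {}  # path -> cached content


class PoP:
    """Hypothetical PoP: a reverse proxy in front of several caching servers."""

    def __init__(self, cache_servers):
        self.cache_servers = list(cache_servers)

    def pick_server(self, path: str) -> CacheServer:
        # Hash the path so that repeated requests for the same file
        # are always sent to the same caching server.
        digest = hashlib.sha256(path.encode("utf-8")).digest()
        index = int.from_bytes(digest[:8], "big") % len(self.cache_servers)
        return self.cache_servers[index]


if __name__ == "__main__":
    pop = PoP([CacheServer("cache-1"), CacheServer("cache-2"), CacheServer("cache-3")])
    for path in ["/videos/intro.mp4", "/css/site.css", "/videos/intro.mp4"]:
        print(path, "->", pop.pick_server(path).name)
```

Sending a given path to a fixed server means each file is cached once per PoP rather than once per caching server, which is one reason a hash-based rule is a common choice.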
DNS Configuration for CDN
To use a CDN, you have to route all incoming requests for your domain and its sub-domains to the address of the CDN and not the main Origin Server. To achieve this you need to modify your DNS configuration. This can be done in the following way:
Change the domain A record to point to the CDN's IP
Change the subdomain's CNAME record to point to the hostname that the CDN provider assigns to you (a CNAME record holds a hostname, not an IP address).
Different CDN providers may have different ways to configure the DNS, so it's better to check the CDN provider's official documentation.
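The record changes above can be pictured with a small model of DNS resolution. This is only a sketch: the zone data is a plain dictionary, example.com and cdn-provider.example are placeholder names, and 203.0.113.10 comes from an IP range reserved for documentation. In standard DNS, an A record maps a name to an IP address, while a CNAME maps a name to another hostname that is resolved in turn.

```python
# Hypothetical zone data for illustration only.
RECORDS = {
    ("example.com", "A"): "203.0.113.10",  # apex domain -> CDN IP
    ("www.example.com", "CNAME"): "example.cdn-provider.example",  # subdomain -> CDN hostname
    ("example.cdn-provider.example", "A"): "203.0.113.10",  # maintained by the CDN provider
}


def resolve(name: str, max_hops: int = 5) -> str:
    """Follow CNAME records until an A record (an IP address) is found."""
    for _ in range(max_hops):
        if (name, "A") in RECORDS:
            return RECORDS[(name, "A")]
        if (name, "CNAME") in RECORDS:
            name = RECORDS[(name, "CNAME")]
            continue
        raise LookupError(f"no record for {name}")
    raise LookupError("too many CNAME hops")


if __name__ == "__main__":
    for host in ["example.com", "www.example.com"]:
        print(host, "->", resolve(host))
```

Both names end up at the CDN's address, so visitors never contact the Origin Server directly.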
If you want to know more about DNS, check out this article: https://www.thegeekyminds.com/post/dns-and-how-it-works
Now that you are aware of all the components of a CDN infrastructure, let's see how a CDN network actually works.
So how does a CDN actually work?
A CDN is composed of many edge servers or PoP (Points of Presence) located in multiple geographic locations. There can be hundreds of such PoPs strategically located all around the world.
Different CDNs use different technologies to direct a user request to the nearest Edge server. The 2 primary ways are:
DNS-based routing
Anycast routing
In DNS-based routing, each PoP or edge server has its own IP address. When a user looks for the URL address, the DNS returns the IP address of the Edge Server that is geographically closest to them.
In Anycast routing, all PoPs or edge servers share the same IP address. Routers along the path, starting with the user's Internet Service Provider, direct the request to the nearest Edge Server.
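A minimal sketch of the DNS-based approach, assuming the DNS service knows an approximate latitude and longitude for the user and simply returns the IP of the PoP with the shortest great-circle distance. The PoP names, coordinates and IP addresses are invented for the example (the IPs are from a documentation-only range). Real providers may rely on the location of the user's DNS resolver or on measured latency instead.

```python
import math

# Hypothetical PoPs: name -> (IP address, latitude, longitude)
POPS = {
    "frankfurt": ("198.51.100.1", 50.11, 8.68),
    "mumbai": ("198.51.100.2", 19.08, 72.88),
    "new-york": ("198.51.100.3", 40.71, -74.01),
}


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points on Earth, in kilometres."""
    earth_radius_km = 6371.0
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2
    return 2 * earth_radius_km * math.asin(math.sqrt(a))


def nearest_pop(user_lat, user_lon):
    """Return (PoP name, PoP IP) for the PoP closest to the user."""
    name = min(
        POPS,
        key=lambda n: haversine_km(user_lat, user_lon, POPS[n][1], POPS[n][2]),
    )
    return name, POPS[name][0]


if __name__ == "__main__":
    print("Delhi ->", nearest_pop(28.61, 77.21))
    print("Paris ->", nearest_pop(48.86, 2.35))
```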
The location of these edge servers and the content that these edge servers should store is strategically decided based on the target audience, their demographics, and geographical location. For example, a video streaming platform may decide not to store a famous French video in their Indian Caching server because there won't be many requests for it. Similarly, a video that is famous all around the globe can be stored in all of its Edge servers. The decision on content and the physical location of the PoP of the CDN network is highly guided by the business needs and requirements.
Where is a CDN (Content Delivery Network) used?
CDNs help large e-commerce platforms handle heavy traffic and spikes in user activity. Since a CDN acts as a proxy in front of the Origin servers, it can also improve security.
Digital publishers can store static reading materials in a CDN which can be accessed fast from anywhere across the globe.
A CDN is also a good fit for media streaming companies that deliver high-quality content in real time. Users can expect a fast and reliable service on these sites.
One way financial services providers enhance user experiences is by using Content Delivery Networks to cache APIs so they can serve highly dynamic content, like stock prices.
On social media sites, where traffic is high and the content is rich in multimedia, a CDN can help to ensure that the user experience is good and their site is fast.
Pros and Cons of using a CDN
Benefits of using a CDN
Performance Improvement - Improves page load time
Since the edge servers are located near the user location, CDN reduces the latency.
The biggest benefit of a CDN is that it improves the page load time.
As per bigcommerce.com:
"Users who are frustrated by a slow-loading site are likely to "bounce" - that is, visit your store once, leave and never return. The Aberdeen Group found that 40 percent of shoppers abandon a website that takes more than three seconds to load. Loyal customers who slog through a slow experience aren't unaffected either: A one-second delay (or three seconds of waiting) decreases customer satisfaction by 16 percent. Lower satisfaction means your slow-loading pages aren't just impacting that one customer visit - page load time can prevent customers from wanting to return to your site or recommend it to their friends."
Highly Scalable
The CDN servers are built for high-speed and high-volume routing, so they are expected to handle large amounts of load. This makes a CDN highly scalable. If a website is receiving more requests from a particular location, the edge servers present in that location can be scaled up. This makes the CDN arrangement very flexible.
Highly Reliable and Resilient
A CDN is made up of many distributed servers. This makes it highly reliable, resilient, and highly available. Resilient means strong enough to deal with failures. Because the chance of every edge server failing at the same time is very small, an outage of the whole CDN is unlikely, and a well-designed CDN architecture has no single point of failure.
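The intuition behind this can be written as a tiny calculation. It assumes each PoP is unavailable with the same probability and that failures are independent of each other. That is a simplification: a shared configuration or DNS problem can take many PoPs down together, so real-world numbers are less optimistic. The function name and the figures are illustrative only.

```python
def prob_all_down(p_single_down: float, n_pops: int) -> float:
    """Probability that all n_pops are down at once, assuming independent failures."""
    if not 0.0 <= p_single_down <= 1.0:
        raise ValueError("p_single_down must be between 0 and 1")
    if n_pops < 1:
        raise ValueError("n_pops must be at least 1")
    return p_single_down ** n_pops


if __name__ == "__main__":
    # Illustrative figure: each PoP is down 1% of the time.
    for n in (1, 2, 3, 5):
        print(f"{n} PoP(s): chance all are down = {prob_all_down(0.01, n):.2e}")
```

With a 1% chance of any single PoP being down, three independent PoPs are all down together only about one time in a million.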
Protects the Origin Server
A CDN or Content Delivery Network protects the Origin Server from unwanted cyber-attacks. The CDN basically acts as a shield.
Inspect incoming traffic
CDNs inspect requests to filter out web application attacks such as SQL injection and cross-site scripting, among others. This helps protect your website not only from DDoS attacks but also from application-level attacks that try to compromise it or take it down.
Hide the IP address of the origin servers
CDN providers help protect your server from direct-to-IP attacks, such as network layer DDoS. This is done by hiding the origin website’s real IP address and routing domain queries to the IP of CDN providers.
Managing traffic spikes
CDNs are designed to handle spikes in web traffic to your website. Big traffic surges can overwhelm the origin servers which often leads to downtime and service interruptions. CDN provides a way of offloading large traffic spikes by distributing them across its network without creating additional problems for the publisher.
When not to use a CDN?
Here are 5 situations in which you are better off without a CDN:
If you have an extremely localized user base, you could look at the option of setting up your main server in that area and you won't require a CDN.
If you handle a lot of sensitive user information, there will be a complex governance protocol that needs to be followed. In that case, using a CDN can mean additional work, because the same checks also have to be applied on the CDN servers.
If your website does not have frequent visitors, it may not make sense to use a CDN. CDNs mostly remove cached data that has not been requested for some time. If your site has few visitors, the CDN will often have to make an extra request to the origin server for the data, which cancels out the benefit of the CDN in the first place.
If the CDN provider you are using doesn't have an edge server in the location you are operating, having a CDN would not benefit you.
If your website is already very fast, and the load times are good, you are better off not using a CDN as you won't have to pay for it and spend time maintaining it.
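The low-traffic situation in the list above can be made concrete with a small simulation. It assumes a simplified edge cache that drops an object once it has gone unrequested for idle_timeout seconds; the function name, the timeout and the traffic patterns are invented for illustration, and real CDNs use more elaborate eviction policies.

```python
def hit_ratio(request_times, idle_timeout):
    """Fraction of requests served from the edge cache.

    request_times: sorted request timestamps in seconds.
    idle_timeout: the object is evicted after this many seconds without a request.
    """
    if not request_times:
        return 0.0
    hits = 0
    last_access = None
    for t in request_times:
        if last_access is not None and t - last_access < idle_timeout:
            hits += 1  # object is still cached at the edge
        # otherwise it is a miss and the edge re-fetches from the origin
        last_access = t
    return hits / len(request_times)


if __name__ == "__main__":
    busy_site = list(range(0, 3600, 10))    # one request every 10 seconds
    quiet_site = list(range(0, 3600, 600))  # one request every 10 minutes
    print("busy site hit ratio:", round(hit_ratio(busy_site, idle_timeout=300), 3))
    print("quiet site hit ratio:", round(hit_ratio(quiet_site, idle_timeout=300), 3))
```

In this model the busy site is served from cache almost every time, while every request to the quiet site arrives after the object has already been evicted, so each one still travels to the origin.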
CDN (Content Delivery Network) Topology
One of the main features of CDNs is that they offer minimized latency. This is achieved through an optimized architecture, where data hubs are localized to major networking intersections so that your content loads faster.
Based on topology the CDN architectures can be divided into the following types:
Scattered CDN
This topology consists of low to medium-capacity edge servers densely spread across a geographical location. The aim of this topology is to reduce the physical distance between the user and the server. Most of the early CDNs were built on this model, where physical proximity played a very important role in reducing latency.
With the installation of optical fibers, the benefit of minimizing the physical distance went down. Maintaining a scattered CDN topology is also time-consuming, as every change needs to be deployed across many nodes.
Pros
Physical proximity minimizes latency
Effective in low-connectivity regions where optical fibers are not used
Smaller PoPs are easier to deploy
Cons
Higher maintenance costs
RTT (Round Trip Time) prolonged by multiple connection points
Cumbersome to deploy new configurations
Consolidated CDN
Consolidated CDN consists of high-capacity edge servers which are placed strategically at major data centers to cover a large region. Most modern-day CDNs use this approach.
Consolidated topologies are centralized systems with agile management infrastructure. This allows for rapid configuration deployment, which is useful for both the end user and the network operator. It offers more control and a better overall response time than other topologies.
This topology is more resilient, specifically when it comes to DDoS attack mitigation.
Pros
High-capacity servers are better for DDoS mitigation
Enables agile configuration deployment
Lower maintenance costs
Cons
Less effective in low-connectivity regions
High-capacity PoPs are harder to deploy
Types of CDN network
From a user perspective, there is no difference in any type of CDN, but from the perspective of implementation, the CDN networks are classified into the following types:
Peer to Peer (P2P) CDN
As the name suggests, this type of CDN makes use of the peer-to-peer protocol. Yes, you guessed it right! It's the same approach that BitTorrent uses for sharing files. In this technique, no dedicated caching servers are required, because all the users are part of the CDN network. Users who download certain content become part of the CDN network and share parts of that content with others, with minimal impact on their own browsing speed. Since minimal hardware is used in this type of arrangement, some companies offer their P2P (peer-to-peer) services free of cost. PeerCast, PPS.tv, Freecast, etc. are some examples of Peer to Peer CDNs.
Push CDN
In this type of CDN, the content is pushed to the CDN servers ahead of time. As a result, it's possible to think of those servers as secondary servers to your main server. In this setup you can also host files directly on the CDN servers, and they are served from there on behalf of your main server. Amazon CloudFront is an example of Push CDN. The Push CDN is a good choice if you are looking to make downloads faster because it provides personalization, global coverage, and great performance. Push CDN is also called Hosted CDN.
Origin Pull CDN
Origin Pull CDN is very different from Push CDN in that you don't have to distribute any data to the CDN yourself. You only upload your content to the Origin servers, and the Origin Pull CDN nodes fetch it on demand. When a user requests content, the CDN server returns the data if it is present in its cache; otherwise it gets the data from the origin server, stores it in its cache, and then returns it. The files stay in the cache for a certain time before expiring. Pull CDN is also known as Relayed CDN.
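The request flow just described can be sketched as a small pull-through cache. This is a simplified model, not how any particular CDN is implemented: the class name, the fetch_from_origin callback and the fixed time-to-live (ttl_seconds) are assumptions made for the example.

```python
import time


class PullCDNNode:
    """Hypothetical edge node that fetches from the origin only on a cache miss."""

    def __init__(self, fetch_from_origin, ttl_seconds=60.0, clock=time.monotonic):
        self.fetch_from_origin = fetch_from_origin  # callable: path -> content
        self.ttl_seconds = ttl_seconds
        self.clock = clock
        self.cache = {}  # path -> (content, expires_at)

    def get(self, path):
        now = self.clock()
        entry = self.cache.get(path)
        if entry is not None and now < entry[1]:
            return entry[0], "HIT"  # served straight from the edge cache
        # Miss or expired entry: go to the origin, then cache the result.
        content = self.fetch_from_origin(path)
        self.cache[path] = (content, now + self.ttl_seconds)
        return content, "MISS"


if __name__ == "__main__":
    def origin(path):
        # Stand-in for a slow request to the origin server.
        return f"<contents of {path}>"

    node = PullCDNNode(origin, ttl_seconds=60)
    print(node.get("/index.html"))  # first request is fetched from the origin
    print(node.get("/index.html"))  # repeat request is served from the cache
```

The injectable clock is only there to make expiry easy to exercise without waiting for real time to pass.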
In Origin Pull CDN, the first person requesting a piece of data will experience a slower response, as the data needs to be fetched from the origin server. Origin Pull CDN or Relayed CDN can be further divided into 2 categories:
Full Site Content Delivery CDN
In this arrangement, whenever a request is made, the entire website is delivered to the CDN cache.
Partial Site Delivery CDN
Here only certain parts of a webpage (like media files, CSS files, JavaScript files, etc.) are delivered through the CDN.
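One common way to picture partial site delivery is rewriting only the URLs of static assets so that they point at a CDN hostname, while everything else keeps going to the origin. The sketch below assumes a hypothetical CDN hostname (cdn.example.com) and an illustrative list of file extensions; both would depend on your own setup.

```python
import posixpath
from urllib.parse import urlsplit, urlunsplit

CDN_HOST = "cdn.example.com"  # hypothetical CDN hostname
STATIC_EXTENSIONS = {".css", ".js", ".png", ".jpg", ".mp4"}  # illustrative list


def to_cdn_url(url: str) -> str:
    """Point static assets at the CDN host; leave every other URL unchanged."""
    parts = urlsplit(url)
    extension = posixpath.splitext(parts.path)[1].lower()
    if extension in STATIC_EXTENSIONS:
        return urlunsplit(parts._replace(netloc=CDN_HOST))
    return url


if __name__ == "__main__":
    for url in [
        "https://www.example.com/static/app.js?v=3",
        "https://www.example.com/checkout",
    ]:
        print(url, "->", to_cdn_url(url))
```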
Myths related to CDN
Myth No.1 - If I use CDN, my page load speed issue will be solved
A website's page load speed depends on a variety of factors, like the size of the elements used in a webpage, whether good practices were followed while developing the website, and a lot of other factors. Using a CDN may or may not improve your website's loading time.
ProTip: You can use Google Lighthouse to see what may be causing your webpage to load slowly.
Myth No.2 - My hosting provider will provide me with CDN services
Adding a CDN is usually the website owner's responsibility, not something every hosting plan includes. So if you own a website, you may need to do an additional integration for it and pay for it separately.
Myth No.3 - One CDN is good enough
If you are a big business and have visitors from all around the globe, you may need more than one quality CDN placed strategically across the globe.
Myth No.4 - CDN will protect my site from security threats
A CDN helps to mitigate a few cyber security threats, such as DDoS attacks, but your server still needs its own firewall and other defenses against the many other types of cyber threats.
Myth No.5 - More PoPs or edge servers in a CDN are better
If your CDN Provider has a lot of PoPs but does not have any at the place where most of your visitors are from, having a CDN is useless. A good CDN is one that has its edge servers strategically placed near your target audience.
Conclusion - My Thoughts
A CDN is a network provider's best friend when it comes to saving bandwidth, which ultimately saves time and money. CDNs are great for improving the speed and reliability of your website. They are also a good way to cut costs and manage increased traffic at peak hours. A CDN has a distributed caching mechanism, so it doesn't have a single point of failure, which makes it highly available and resilient. A CDN also helps protect your main server from cyber-attacks.
You may not get the most out of a CDN if your users are in a highly localized area, because CDNs are designed to handle traffic from a geographically distributed audience. You should choose a CDN that fulfills your needs. There are a variety of solutions that you can use to implement CDNs. A few popular ones are: Akamai, Amazon CloudFront, Google Cloud CDN, CacheFly, CDN77.com, Cloudflare, StackPath, Yottaa
And that's a wrap! Hi, I am Gourav Dhar, a software developer and I write blogs on Backend Development and System Design. Subscribe to my Newsletter and learn something new every week - https://thegeekyminds.com/subscribe