A Crash Course on Content-Delivery Networks (CDN)

In the era of the modern web, latency can directly impact an organization’s bottom line.

Here are some stats from a study by Portent:

One of the most effective strategies to reduce latency is using a Content-Delivery Network (CDN).

A CDN is a geographically distributed network of servers that work together to deliver content quickly and reliably across the globe.

When a user requests content from a website or application that uses a CDN, the request is redirected to the nearest CDN server, which serves the content to the user. This reduces the distance data travels and improves overall performance.

Think of a CDN like an ATM. If money were available only from one bank branch in town, everyone would have to make a time-consuming trip to that branch. With ATMs in every locality, everyone has fast and easy access to money. 

The market size for CDN-related solutions is expected to reach nearly $38 billion by 2028, with companies like Akamai, Cloudflare, and Amazon CloudFront investing heavily in improving their CDN offerings.

Why the Need for a CDN?

CDNs provide several key benefits:

How Does a CDN Work?

The diagram below shows the content delivery process in detail.

Here’s what happens in each step:

The sequence diagram below shows a basic CDN flow.

CDN Architecture and Components

A typical CDN architecture consists of several key components:

Let’s take a deeper look at each component and its role in the overall architecture of a CDN.

Origin Server

The origin server is the primary source of the content that the CDN distributes. 

It hosts the authoritative version of the content and ensures that the CDN has access to the most up-to-date version. The origin server can be a physical server, a cluster of servers, or even a cloud-based service. 

When a user requests content that is not available on the CDN edge servers, the CDN fetches the content from the origin server, caches it, and then serves it to the user. 
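The fetch-cache-serve behavior described above can be sketched as a small pull-through cache. This is a minimal illustration, not how any particular CDN implements it: the `EdgeCache` class, the 60-second TTL, and the `ORIGIN_FILES` dictionary standing in for the origin server are all assumptions made for the example.

```python
import time
from typing import Callable, Dict, List, Tuple


class EdgeCache:
    """Toy pull-through cache: serve from memory, fall back to the origin on a miss."""

    def __init__(
        self,
        fetch_from_origin: Callable[[str], bytes],
        ttl_seconds: float = 60.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._fetch = fetch_from_origin
        self._ttl = ttl_seconds
        self._clock = clock
        self._store: Dict[str, Tuple[bytes, float]] = {}  # path -> (body, expiry time)

    def get(self, path: str) -> Tuple[bytes, str]:
        """Return (body, status) where status is "HIT" or "MISS"."""
        now = self._clock()
        entry = self._store.get(path)
        if entry is not None and entry[1] > now:
            return entry[0], "HIT"
        # Cache miss or expired entry: fetch from the origin, then cache the result.
        body = self._fetch(path)
        self._store[path] = (body, now + self._ttl)
        return body, "MISS"


# Hypothetical origin: a dict standing in for the real origin server.
ORIGIN_FILES = {"/logo.png": b"png-bytes"}
origin_calls: List[str] = []


def fetch_from_origin(path: str) -> bytes:
    origin_calls.append(path)  # record every trip to the origin
    return ORIGIN_FILES[path]


edge = EdgeCache(fetch_from_origin, ttl_seconds=60.0)
first = edge.get("/logo.png")   # first request goes to the origin
second = edge.get("/logo.png")  # second request is served from the edge cache
```

Only the first request reaches the origin. Every later request within the TTL is answered from the edge, which is where the latency and origin-load savings come from.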

CDN Edge Servers

CDN edge servers, typically grouped into sites known as Points of Presence (PoPs), are the backbone of the CDN infrastructure.

These servers are strategically located in data centers around the world to ensure that content is delivered quickly and efficiently to users in different geographic regions.

The primary role of CDN edge servers is to cache and serve content to the users. When a user requests content, the request is routed to the nearest edge server, which then serves the content from its cache.

Edge servers perform critical functions, such as:

DNS

The DNS plays a crucial role in directing user requests to the appropriate CDN edge server. 

When a user requests content from a website or application that uses a CDN, the DNS resolves the domain name to the IP address of the nearest edge server.

CDN providers typically use specialized DNS services, such as Global Server Load Balancing (GSLB) or Anycast DNS. We will look at both in the next section.
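As a rough illustration of "resolve to the nearest edge server", the sketch below picks the edge with the smallest great-circle distance to the client. The edge names, coordinates, and IP addresses (taken from the RFC 5737 documentation ranges) are invented for the example, and real CDN DNS usually sees the location of the user's resolver rather than the user, so treat this as a simplification.

```python
import math
from typing import Dict, Tuple

Coord = Tuple[float, float]  # (latitude, longitude) in degrees

# Hypothetical edge locations and documentation-range IPs, not real CDN data.
EDGES: Dict[str, Tuple[Coord, str]] = {
    "frankfurt": ((50.11, 8.68), "192.0.2.10"),
    "singapore": ((1.35, 103.82), "198.51.100.10"),
    "virginia": ((38.95, -77.45), "203.0.113.10"),
}


def haversine_km(a: Coord, b: Coord) -> float:
    """Great-circle distance between two coordinates, in kilometers."""
    lat1, lon1, lat2, lon2 = map(math.radians, (a[0], a[1], b[0], b[1]))
    h = (
        math.sin((lat2 - lat1) / 2) ** 2
        + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2
    )
    return 2 * 6371.0 * math.asin(math.sqrt(h))


def resolve(client_location: Coord) -> str:
    """Return the IP of the geographically closest edge server."""
    nearest = min(EDGES.values(), key=lambda edge: haversine_km(client_location, edge[0]))
    return nearest[1]


paris_answer = resolve((48.86, 2.35))      # closest edge here is Frankfurt
jakarta_answer = resolve((-6.21, 106.85))  # closest edge here is Singapore
```

Geographic distance is only a proxy for network latency, which is one reason providers layer load and health information on top of it.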

CDN Control Plane

The CDN control plane is the central management system that oversees the operation of the CDN. 

It is responsible for configuring and managing the edge servers, monitoring performance, and ensuring that the CDN operates efficiently. The control plane typically includes tools and interfaces for:

Monitoring and Analytics

CDN providers use sophisticated monitoring and analytics tools to track the performance of the CDN. 

These tools collect data on various metrics, such as:

This data is used to identify performance bottlenecks, optimize content delivery, and ensure that the CDN meets the needs of the content providers and their users.
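To make this concrete, here is a small sketch of two metrics that are commonly tracked for CDNs: cache hit ratio and a latency percentile. The `RequestLog` record and its fields are hypothetical, and the choice of these two metrics is an assumption for illustration, not a statement of what any provider measures.

```python
import math
from dataclasses import dataclass
from typing import List


@dataclass
class RequestLog:
    cache_status: str  # "HIT" or "MISS"
    latency_ms: float  # time to serve the request


def cache_hit_ratio(logs: List[RequestLog]) -> float:
    """Fraction of requests served from the edge cache."""
    if not logs:
        return 0.0
    hits = sum(1 for log in logs if log.cache_status == "HIT")
    return hits / len(logs)


def percentile_latency(logs: List[RequestLog], pct: float) -> float:
    """Nearest-rank percentile of request latency."""
    if not logs:
        raise ValueError("no requests to measure")
    ordered = sorted(log.latency_ms for log in logs)
    rank = max(1, math.ceil(pct / 100 * len(ordered)))
    return ordered[rank - 1]


sample = [
    RequestLog("HIT", 12.0),
    RequestLog("HIT", 15.0),
    RequestLog("MISS", 180.0),
    RequestLog("HIT", 11.0),
]
ratio = cache_hit_ratio(sample)        # 3 hits out of 4 requests
p50 = percentile_latency(sample, 50)   # median by nearest rank
p95 = percentile_latency(sample, 95)   # dominated by the slow cache miss
```

In this sample, the single cache miss sets the tail latency, which is the kind of bottleneck such data helps surface.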

CDN Request Routing

There are some key concepts to understand when it comes to request routing in a CDN.

Global Server Load Balancing

GSLB is a technique used by CDN providers to distribute incoming traffic across multiple geographically dispersed servers or data centers. 

The primary goal of GSLB is to ensure that user requests are routed to the server that can provide the best performance and availability. It typically considers the following factors:

For a CDN, the GSLB system is usually integrated into the CDN’s overall DNS infrastructure. It works alongside the standard DNS hierarchy, which includes:

In a CDN setup, the GSLB operates at the authoritative DNS level. Here’s how it works:
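A minimal sketch of the decision an authoritative GSLB might make when answering a DNS query is shown here. The factors used (health, load, and estimated latency), the `Site` fields, the `max_load` threshold, and the scoring formula are all assumptions chosen for illustration. Real systems weigh their own signals.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Site:
    name: str
    ip: str
    healthy: bool
    load: float        # 0.0 (idle) to 1.0 (saturated)
    latency_ms: float  # estimated latency from the client's resolver


def pick_site(sites: List[Site], max_load: float = 0.9) -> Optional[Site]:
    """Choose the site whose IP the authoritative DNS server would return."""
    # Availability first: drop unhealthy or overloaded sites.
    candidates = [s for s in sites if s.healthy and s.load < max_load]
    if not candidates:
        return None
    # Performance next: prefer low latency, penalized by current load.
    return min(candidates, key=lambda s: s.latency_ms * (1.0 + s.load))


sites = [
    Site("london", "192.0.2.1", healthy=True, load=0.95, latency_ms=8.0),
    Site("paris", "192.0.2.2", healthy=True, load=0.40, latency_ms=14.0),
    Site("madrid", "192.0.2.3", healthy=False, load=0.10, latency_ms=20.0),
    Site("new_york", "192.0.2.4", healthy=True, load=0.10, latency_ms=75.0),
]
chosen = pick_site(sites)
```

In this example London is the closest site but is filtered out for being overloaded, and Madrid is filtered out for being unhealthy, so the answer is Paris rather than the strictly nearest server.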

Anycast DNS

Anycast DNS is a networking technique that allows multiple servers to share the same IP address. 

In CDNs, Anycast DNS typically routes incoming traffic to the nearest data center with the capacity to process the request efficiently. With an Anycast network, instead of one server handling the load of all traffic, the load is spread across other available data centers.
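The toy model below illustrates the idea: several sites announce the same IP address, and each client network delivers packets to whichever announcing site has the lowest path cost. The `PATH_COST` table, the site names, and the single-number cost are hypothetical simplifications. Real Anycast relies on BGP route selection, which involves policy as well as path length.

```python
from typing import Dict, Optional, Set

ANYCAST_IP = "198.51.100.53"  # documentation-range address shared by every site

# Hypothetical path costs (for example, AS-path length) from each client network to each site.
PATH_COST: Dict[str, Dict[str, int]] = {
    "isp_tokyo": {"tokyo": 1, "sydney": 3, "frankfurt": 5},
    "isp_berlin": {"tokyo": 5, "sydney": 6, "frankfurt": 1},
}


def route(client_network: str, announcing_sites: Set[str]) -> Optional[str]:
    """Return the site that receives packets sent to ANYCAST_IP, or None if no site announces it."""
    reachable = {
        site: cost
        for site, cost in PATH_COST[client_network].items()
        if site in announcing_sites
    }
    if not reachable:
        return None
    return min(reachable, key=lambda site: reachable[site])


all_sites = {"tokyo", "sydney", "frankfurt"}
normal = route("isp_tokyo", all_sites)                # lowest-cost site for this network
failover = route("isp_tokyo", all_sites - {"tokyo"})  # Tokyo withdraws its announcement
berlin = route("isp_berlin", all_sites)
```

The client always sends to the same address. When a site stops announcing it, the network simply delivers traffic to the next-best site, which is how load and failures are absorbed without changing DNS answers.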

The diagram below shows the difference between Anycast and Unicast:

Here’s how it works:

A CDN using Anycast is also well suited to mitigating DDoS attacks, since it increases the surface area of the receiving network.

Unfiltered denial-of-service traffic from a distributed botnet is absorbed by each of the CDN’s data centers. The larger the network size, the harder it becomes to launch an effective DDoS against the CDN.

The Role of Internet Exchange Points

Internet Exchange Points (IXPs) are physical infrastructure facilities where multiple networks can connect and exchange data traffic. 

They enable internet service providers, content providers, and other network operators to exchange traffic directly between their respective networks.

Without IXPs, traffic going from one network to another might rely on an intermediary network, potentially causing “tromboning”, where traffic from one city destined for another ISP in the same city ends up traveling vast distances.

But how do CDNs leverage IXPs?

CDN providers establish a presence at major IXPs to improve their connectivity and reach. 

IXPs allow CDNs to exchange traffic directly with ISPs and other networks, minimizing the distance and time required for data to travel between the CDN and end users. A CDN with an IXP presence optimizes the path through which data flows within its network.
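The effect of tromboning can be sketched with a tiny shortest-path calculation over a made-up topology. The node names and link distances are hypothetical, and real inter-network routing follows BGP policy rather than pure distance, so this only illustrates why a local exchange shortens the path.

```python
import heapq
from typing import Dict, List, Tuple

Graph = Dict[str, List[Tuple[str, float]]]


def add_link(graph: Graph, a: str, b: str, km: float) -> None:
    """Add a bidirectional link of the given length."""
    graph.setdefault(a, []).append((b, km))
    graph.setdefault(b, []).append((a, km))


def shortest_km(graph: Graph, src: str, dst: str) -> float:
    """Dijkstra's algorithm: total length of the shortest path from src to dst."""
    best = {src: 0.0}
    queue: List[Tuple[float, str]] = [(0.0, src)]
    while queue:
        dist, node = heapq.heappop(queue)
        if node == dst:
            return dist
        if dist > best.get(node, float("inf")):
            continue  # stale queue entry
        for neighbor, km in graph.get(node, []):
            candidate = dist + km
            if candidate < best.get(neighbor, float("inf")):
                best[neighbor] = candidate
                heapq.heappush(queue, (candidate, neighbor))
    return float("inf")


# Hypothetical topology: two ISPs in the same city that only meet at a transit hub 900 km away.
without_ixp: Graph = {}
add_link(without_ixp, "isp_a_city", "transit_hub", 900.0)
add_link(without_ixp, "transit_hub", "isp_b_city", 900.0)

# Same topology plus a local IXP that both ISPs connect to.
with_ixp: Graph = {node: list(links) for node, links in without_ixp.items()}
add_link(with_ixp, "isp_a_city", "local_ixp", 5.0)
add_link(with_ixp, "local_ixp", "isp_b_city", 5.0)

tromboned = shortest_km(without_ixp, "isp_a_city", "isp_b_city")
direct = shortest_km(with_ixp, "isp_a_city", "isp_b_city")
```

With these assumed distances, the path shrinks from 1,800 km through the transit hub to 10 km through the exchange.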

Best Practices to Optimize CDN Performance

Optimizing CDN performance is crucial for delivering fast, reliable, and efficient content to users worldwide.

Here are some best practices:

1 - Caching Optimization
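One common caching practice is to set `Cache-Control` response headers per asset type so that edge servers know how long each object may be kept. The sketch below is illustrative only: the file extensions, durations, and the conservative `no-store` default are assumptions, not recommendations taken from this article.

```python
import posixpath
from typing import Dict

ONE_YEAR = 31536000  # seconds
ONE_DAY = 86400      # seconds

# Illustrative policy keyed by file extension (assumed values).
POLICY: Dict[str, str] = {
    ".js": f"public, max-age={ONE_YEAR}, immutable",   # fingerprinted bundles never change
    ".css": f"public, max-age={ONE_YEAR}, immutable",
    ".png": f"public, max-age={ONE_DAY}",
    ".html": "public, max-age=0, must-revalidate",     # always check with the origin
}
DEFAULT_POLICY = "no-store"  # anything unrecognized is not cached


def cache_control_for(path: str) -> str:
    """Return the Cache-Control header value for a request path."""
    extension = posixpath.splitext(path)[1].lower()
    return POLICY.get(extension, DEFAULT_POLICY)


bundle_header = cache_control_for("/static/app.3f9a.js")
page_header = cache_control_for("/index.html")
api_header = cache_control_for("/api/cart")
```

Long lifetimes suit assets whose URL changes whenever their content changes, while pages and API responses need shorter lifetimes or revalidation.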

2 - Content Optimization

3 - Network Optimization

4 - Security Optimization

Can a CDN Deliver Dynamic Content?

Dynamic content refers to content that is generated or personalized in real-time based on user interactions, database queries, or other dynamic factors. 

Some examples of dynamic content include:

Delivering dynamic content through a CDN is challenging because it requires real-time processing and cannot be easily cached like static content.
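A quick simulation shows why personalization hurts cacheability. It assumes a cache that never evicts and a hypothetical per-user URL scheme: a shared static asset maps every request to one cache key, while a personalized page produces a different key per user.

```python
from typing import Iterable, Set


def hit_ratio(cache_keys: Iterable[str]) -> float:
    """Fraction of requests whose cache key has been seen before (no eviction)."""
    seen: Set[str] = set()
    hits = 0
    total = 0
    for key in cache_keys:
        total += 1
        if key in seen:
            hits += 1
        else:
            seen.add(key)
    return hits / total if total else 0.0


users = [f"user{i}" for i in range(100)]

# Static asset: every user requests the same object.
static_keys = ["/logo.png" for _ in users]

# Personalized page: the response differs per user, so the cache key must too.
dynamic_keys = [f"/dashboard?user={user}" for user in users]

static_ratio = hit_ratio(static_keys)    # only the first request misses
dynamic_ratio = hit_ratio(dynamic_keys)  # every request misses
```

With 100 users, the static asset is fetched from the origin once, while the personalized page is fetched 100 times, so caching alone gives no benefit for it.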

CDNs can employ various techniques to optimize dynamic content delivery:

A Real-World Example of CDN

Let’s now look at a prominent real-world example: the key role a CDN played in Netflix’s scaling journey.

Netflix debuted its streaming service in 2007. In the early days, they had 35 million members across 50 countries, streaming more than a billion hours of video each month.

In 2009, Netflix used a third-party CDN to keep costs down and invest its time in higher-priority projects. However, in 2012, Netflix built its own dedicated CDN to maximize network efficiency and viewing experience.

The CDN is called Open Connect.

The Role of OCAs at Netflix

Here’s how it works on a high level:

Summary

Content Delivery Networks have become a key component of modern web applications. 

We’ve discussed many aspects of CDNs, such as:

Here are a few important learning points to take away from this article:
