System Design on AWS by Jayanth Kumar, Mandeep Singh

Chapter 4. Caching Policies and Strategies

A Note for Early Release Readers


With Early Release ebooks, you get books in their earliest form—the authors’ raw and unedited content as they write—so you can take advantage of these technologies long before the official release of these titles.

This will be the 4th chapter of the final book. Please note that the GitHub repo will be made active later on.

If you have comments about how we might improve the content and/or examples in this
book, or if you notice missing material within this chapter, please reach out to the editor
at [email protected].

In computing, a cache is a component or mechanism used to store frequently accessed data or instructions closer to the processor, reducing the latency of retrieving the information from slower storage or external resources. Caches are typically implemented as high-speed, small-capacity memories located closer to the CPU than the main memory. The goal is to improve overall system performance by reducing the time required to access data or instructions.
The concept of a cache revolves around the principle of locality, which suggests that programs tend to access a relatively small portion of their data or instructions repeatedly. By storing this frequently accessed information in a cache, subsequent accesses to the same data can be served faster, resulting in improved performance.

When data is accessed from a cache, there are two possible outcomes: cache hit
and cache miss. A cache hit occurs when the requested data is found in the cache,
allowing for fast retrieval without accessing the slower main memory or external
resources. On the other hand, a cache miss happens when the requested data is
not present in the cache, requiring the system to fetch the data from the main
memory or external storage. Cache hit rates measure the effectiveness of the
cache in serving requests without needing to access slower external storage, while
cache miss rates indicate how often the cache fails to serve requested data.

This chapter will cover important information to help you understand how to use
data caching effectively. We’ll talk about cache eviction policies, which are rules
for deciding when to remove data from the cache to make retrieval of important
data faster. We’ll also cover cache invalidation, which ensures the cached data is
always correct and matches the real underlying data source. The chapter will also
discuss caching strategies for both read- and write-intensive applications. We’ll also
cover how to actually put caching into action, including where to put the caches to
get the best results. You’ll also learn about different ways caching works and why
Content Delivery Networks (CDNs) are important. And finally, you’ll learn about
two popular open-source caching solutions. So, let’s get started with the benefits
of caching.

Caching Benefits
Caches play a crucial role in improving system performance and reducing latency
for several reasons:

Faster Access

Caches offer faster access times compared to main memory or external storage. By keeping frequently accessed data closer to the CPU, cache access times can be significantly lower, reducing the time required to fetch data.
Reduced Latency

Caches help reduce latency by reducing the need to access slower storage resources. By serving data from a cache hit, the system avoids the delay associated with fetching data from main memory or external sources, thereby reducing overall latency.

Bandwidth Optimization

Caches help optimize bandwidth usage by reducing the number of requests sent to slower storage. When data is frequently accessed from the cache, it reduces the demand on the memory bus or external interfaces, freeing up resources for other operations.

Improved Throughput

Caches improve overall system throughput by allowing the CPU to access frequently needed data quickly, without waiting for slower storage access. This enables the CPU to perform more computations in a given amount of time, increasing overall system throughput.

Amdahl’s Law and the Pareto distribution provide further insights into the benefits
of caching:

Amdahl’s Law

Amdahl’s Law states that the overall speedup achieved by optimizing a particular component of a system is limited by the fraction of time that component is utilized. Caches, being a critical optimization component, can have a significant impact on overall system performance, especially when the fraction of cache hits is high. Amdahl’s Law emphasizes the importance of efficient caching to maximize the benefits of performance optimization.

Pareto Distribution

The Pareto distribution, also known as the 80/20 rule, states that a significant portion of the system’s workload is driven by a small fraction of the data. Caching aligns well with this distribution by allowing the frequently accessed data to reside in a fast cache, serving the most critical operations efficiently. By focusing caching efforts on the most accessed data, the Pareto distribution can be leveraged to optimize performance for the most important workloads.
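
To make Amdahl’s Law concrete, the overall speedup can be written as speedup = 1 / ((1 - p) + p / s), where p is the fraction of accesses served by the cache and s is how much faster the cache is than the backing store. For example, if 80% of accesses hit a cache that is 10x faster than the data source, the overall speedup is 1 / (0.2 + 0.8 / 10), roughly 3.6x; even an infinitely fast cache would cap out at 1 / 0.2 = 5x, which is why raising the cache hit ratio usually matters more than shaving a little more latency off the cache itself.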

In summary, caches provide faster access to frequently accessed data, reducing la-
tency and improving overall system performance. They help optimize bandwidth,
increase throughput, and align with principles such as Amdahl’s Law and the
Pareto distribution to maximize performance benefits.

The next section will cover different cache eviction policies, such as least recently used (LRU) and least frequently used (LFU), to help you choose the best policy for different situations.

Cache Eviction Policies


Caching plays a crucial role in improving the performance and efficiency of data retrieval systems by storing frequently accessed data closer to the consumers. Caching policies determine how the cache handles data eviction and replacement when its capacity is reached. Cache eviction policies try to maximize the cache hit ratio—the percentage of time the requested item was found in the cache and served. A higher cache hit ratio reduces the necessity to retrieve data from external storage, resulting in better system performance. In this section, we will explore various caching policies, including Belady’s algorithm, queue-based policies (FIFO, LIFO), recency-based policies (LRU, TLRU, MRU, SLRU), and frequency-based policies (LFU, LFRU).

Belady’s Algorithm
Belady’s algorithm is an optimal caching algorithm that evicts the data item that
will be used furthest in the future. It requires knowledge of the future access pat-
tern, which is usually impractical to obtain. Belady’s algorithm serves as a theoret-
ical benchmark for evaluating the performance of other caching policies.

Queue-Based Policies

Queue-based cache eviction policies involve managing the cache by treating it like a queue. When the cache reaches its capacity, one of the queue-based policies is used to remove data to make space for new data.

FIFO (First-In-First-Out)

FIFO is a simple caching policy that evicts the oldest data item
from the cache. It follows the principle that the first data item
inserted into the cache is the first one to be evicted when the
cache is full. FIFO is easy to implement but may suffer from
the “aging” problem, where recently accessed items are
evicted prematurely.

LIFO (Last-In-First-Out)

LIFO is the opposite of FIFO, where the most recently inserted data item is the first one to be evicted. LIFO does not consider the access pattern and can result in poor cache utilization and eviction decisions.
Recency-Based Policies

Recency-based cache eviction policies focus on the time aspect of data access. These policies prioritize keeping the most recently accessed items in the cache.

LRU (Least Recently Used)

LRU is a popular caching policy that evicts the least recently accessed data item from the cache. It assumes that recently accessed items are more likely to be accessed in the near future. LRU requires tracking access timestamps for each item, making it slightly more complex to implement (a small implementation sketch follows this group of policies).

MRU (Most Recently Used)

MRU evicts the most recently accessed data item from the
cache. It assumes that the most recently accessed item is likely
to be accessed again soon. MRU can be useful in scenarios
where a small subset of items is accessed frequently.
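
To make LRU concrete, here is a minimal sketch in Python (our illustration, not code from the book) built on collections.OrderedDict; the class name and capacity handling are purely illustrative.

from collections import OrderedDict

class LRUCache:
    """Minimal LRU cache: evicts the least recently accessed key when full."""

    def __init__(self, capacity: int):
        self.capacity = capacity
        self.items = OrderedDict()  # keys ordered from least to most recently used

    def get(self, key):
        if key not in self.items:
            return None                  # cache miss
        self.items.move_to_end(key)      # mark as most recently used
        return self.items[key]

    def put(self, key, value):
        if key in self.items:
            self.items.move_to_end(key)
        self.items[key] = value
        if len(self.items) > self.capacity:
            self.items.popitem(last=False)  # evict the least recently used key

cache = LRUCache(2)
cache.put("a", 1)
cache.put("b", 2)
cache.get("a")     # "a" becomes the most recently used key
cache.put("c", 3)  # evicts "b", the least recently used key

Production implementations typically pair a hash map with a doubly linked list to keep both get and put at O(1), which is essentially what OrderedDict provides internally.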

Frequency-Based Policies

Frequency-based cache eviction policies prioritize retaining items in the cache based on how often they are accessed. The cache replaces items that have been accessed the least frequently, assuming that rarely accessed data may not be as critical for performance optimization.

LFU (Least Frequently Used)

LFU evicts the least frequently accessed data item from the
cache. It assumes that items with lower access frequency are
less likely to be accessed in the future. LFU requires maintain-
ing access frequency counts for each item, which can be mem-
ory-intensive.

LFRU (Least Frequently Recently Used)

LFRU combines the concepts of LFU and LRU by considering both the frequency of access and the recency of access. It evicts the item with the lowest frequency count among the least recently used items.

Allowlist Policy
An allowlist policy for cache replacement is a mechanism that defines a set of pri-
oritized items eligible for retention in a cache when space is limited. Instead of us-
ing a traditional cache eviction policy that removes the least recently used or least
frequently accessed items, an allowlist policy focuses on explicitly specifying
which items should be preserved in the cache. This policy ensures that important
or high-priority data remains available in the cache, even during periods of cache
pressure. By allowing specific items to remain in the cache while evicting others,
the allowlist policy optimizes cache utilization and improves performance for criti-
cal data access scenarios.

Caching policies serve different purposes and exhibit varying performance charac-
teristics based on the access patterns and workload of the system. Choosing the
right caching policy depends on the specific requirements and characteristics of
the application.
By understanding and implementing these caching policies effectively, system designers and developers can optimize cache utilization, improve data retrieval performance, and enhance the overall user experience. Let’s discuss different cache invalidation strategies, which are applied after identifying which data to evict based on the above cache eviction policies.

Cache Invalidation
Cache invalidation is a crucial aspect of cache management that ensures the
cached data remains consistent with the underlying data source. Effective cache
invalidation strategies help maintain data integrity and prevent stale or outdated
data from being served. Here are four common cache invalidation techniques: active invalidation, invalidating on modification, invalidating on read, and time-to-live (TTL).

Active Invalidation

Active invalidation involves explicitly removing or invalidating cached data when changes occur in the underlying data source. This approach requires the application or the system to actively notify or trigger cache invalidation operations. For example, when data is modified or deleted in the data source, the cache is immediately updated or cleared to ensure that subsequent requests fetch the latest data. Active invalidation provides precise control over cache consistency but requires additional overhead to manage the invalidation process effectively.

Invalidating on Modification

With invalidating on modification, the cache is invalidated when data in the underlying data source is modified. When a modification operation occurs, such as an update or deletion, the cache is notified or flagged to invalidate the corresponding cached data. The next access to the invalidated data triggers a cache miss, and the data is fetched from the data source, ensuring the cache contains the most up-to-date information. This approach minimizes the chances of serving stale data but introduces a slight delay for cache misses during the invalidation process.

Invalidating on Read

In invalidating on read, the cache is invalidated when the cached data is accessed or read. Upon receiving a read request, the cache checks if the data is still valid or has expired. If the data is expired or flagged as invalid, the cache fetches the latest data from the data source and updates the cache before serving the request. This approach guarantees that fresh data is always served, but it adds overhead to each read operation since the cache must validate the data’s freshness before responding.

Time-to-Live (TTL)

Time-to-Live is a cache invalidation technique that associates a time duration with each cached item. When an item is stored in the cache, it is marked with a TTL value indicating how long the item is considered valid. After the TTL period elapses, the cache treats the item as expired, and subsequent requests for the expired item trigger cache misses, prompting the cache to fetch the latest data from the data source. TTL-based cache invalidation provides a simple and automatic way to manage cache freshness, but it may result in serving slightly stale data until the TTL expires.
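
As a rough illustration of TTL-based invalidation (checked lazily on read), here is a minimal Python sketch; it is our own example, and fetch_fn stands in for whatever call loads fresh data from the data source.

import time

class TTLCache:
    """Minimal TTL cache: entries expire after ttl_seconds and are refetched on the next read."""

    def __init__(self, ttl_seconds: float, fetch_fn):
        self.ttl = ttl_seconds
        self.fetch_fn = fetch_fn   # loads fresh data from the data source on a miss
        self.store = {}            # key -> (value, expires_at)

    def get(self, key):
        entry = self.store.get(key)
        if entry is not None:
            value, expires_at = entry
            if time.monotonic() < expires_at:
                return value        # still within its TTL: cache hit
        value = self.fetch_fn(key)  # expired or missing: fetch and re-cache
        self.store[key] = (value, time.monotonic() + self.ttl)
        return value

# Usage (load_user_from_db is a hypothetical loader):
# users = TTLCache(30, fetch_fn=load_user_from_db)   # entries stay valid for 30 seconds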

The choice of cache invalidation strategy depends on factors such as the nature of
the data, the frequency of updates, the performance requirements, and the de-
sired consistency guarantees. Active invalidation offers precise control but re-
quires active management, invalidating on modification ensures immediate data
freshness, invalidating on read guarantees fresh data on every read operation, and
TTL-based invalidation provides a time-based expiration mechanism.
Understanding the characteristics of the data and the system’s requirements helps
in selecting the appropriate cache invalidation strategy to maintain data consis-
tency and improve overall performance.

The next section covers different caching strategies to keep data consistent between the cache and the underlying data source.

Caching Strategies
Caching strategies define how data is managed and synchronized between the cache and the underlying data source. In this section, we will explore several caching strategies as shown in Figure 4-1, including cache-aside, read-through, refresh-ahead, write-through, write-around, and write-back.

Figure 4-1. Caching Strategies

The left-hand side of the diagram displays read-intensive caching strategies, fo-
cusing on optimizing the retrieval of data that is frequently read or accessed. The
goal of a read-intensive caching strategy is to minimize the latency and improve
the overall performance of read operations by serving the cached data directly
from memory, which is much faster than fetching it from a slower and more dis-
tant data source. This strategy is particularly beneficial for applications where the
majority of operations involve reading data rather than updating or writing data.

Let’s take a look at those in more detail:

Cache-Aside

Cache-aside caching strategy, also known as lazy loading, delegates the responsibility of managing the cache to the application code. When data is requested, the application first checks the cache. If the data is found, it is returned from the cache. If the data is not in the cache, the application retrieves it from the data source, stores it in the cache, and then returns it to the caller. Cache-aside caching offers flexibility as the application has full control over caching decisions but requires additional logic to manage the cache (a short sketch follows this list of read-intensive strategies).

Read-Through
Read-through caching strategy retrieves data from the cache if available; otherwise, it fetches the data from the underlying data source. When a cache miss occurs for a read operation, the cache retrieves the data from the data source, stores it in the cache for future use, and returns the data to the caller. Subsequent read requests for the same data can be served directly from the cache, improving the overall read performance. Unlike the cache-aside strategy, this strategy offloads the responsibility of managing cache lookups from the application, providing a simplified data retrieval process.

Refresh-Ahead

Refresh-ahead caching strategy, also known as prefetching, proactively retrieves data from the data source into the cache before it is explicitly requested. The cache anticipates the future need for specific data items and fetches them in advance. By prefetching data, the cache reduces latency for subsequent read requests and improves the overall data retrieval performance.
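
To illustrate the cache-aside flow referenced above, here is a hedged Python sketch; cache, db, and fetch_product are hypothetical interfaces standing in for a real cache client and data source.

def get_product(product_id, cache, db, ttl_seconds=300):
    """Cache-aside (lazy loading): the application checks the cache first and,
    on a miss, loads from the data source and populates the cache itself."""
    key = f"product:{product_id}"
    value = cache.get(key)
    if value is not None:
        return value                          # cache hit
    value = db.fetch_product(product_id)      # cache miss: go to the data source
    cache.set(key, value, ttl_seconds)        # populate the cache for future reads
    return value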

The right-hand side of the diagram displays the write-intensive strategies, focusing on optimizing the storage and management of data that is frequently updated or written. Unlike read-intensive caching, where the focus is on optimizing data retrieval, a write-intensive caching strategy aims to enhance the efficiency of data updates and writes, while still maintaining acceptable performance levels. In a write-intensive caching strategy, the cache is designed to handle frequent write operations, ensuring that updated data is stored temporarily in the cache before being eventually synchronized with the underlying data source, such as a database or a remote server. This approach can help reduce the load on the primary data store and improve the application’s responsiveness by acknowledging write operations more quickly.

Let’s take a look at those in more detail:

Write-Through

Write-through caching strategy involves writing data to both the cache and the underlying data source simultaneously. When a write operation occurs, the data is first written to the cache and then immediately propagated to the persistent storage synchronously before the write operation is considered complete. This strategy ensures that the data remains consistent between the cache and the data source. However, it may introduce additional latency due to the synchronous write operations (write-through and write-back are sketched after this list of write-intensive strategies).
Write-Around

Write-around caching strategy involves bypassing the cache for write operations. When the application wants to update data, it writes directly to the underlying data source, bypassing the cache. As a result, the written data does not reside in the cache, reducing cache pollution with infrequently accessed data. However, subsequent read operations for the updated data might experience cache misses until the data is fetched again from the data source and cached.

Write-Back

Write-back caching strategy allows write operations to be performed directly on the cache, deferring the update to the underlying data source until a later time. When data is modified in the cache, the change is recorded in the cache itself, and the update is eventually propagated to the data source asynchronously on a schedule or when specific conditions are met (e.g., cache eviction, time intervals). Write-back caching provides faster write operations by reducing the number of immediate disk writes. However, it introduces a potential risk of data loss in the event of a system failure before the changes are flushed to the data source.
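
The two write paths can be contrasted with a short Python sketch (our illustration; cache and db are hypothetical interfaces): write-through touches both stores on every put, while write-back acknowledges immediately and flushes dirty keys later.

class WriteThroughStore:
    """Write-through: every write goes to the cache and the data source synchronously."""

    def __init__(self, cache, db):
        self.cache, self.db = cache, db

    def put(self, key, value):
        self.db.write(key, value)    # synchronous, durable write to the data source
        self.cache.set(key, value)   # keep the cache consistent with the source


class WriteBackStore:
    """Write-back: writes land in the cache and are flushed to the data source later."""

    def __init__(self, cache, db):
        self.cache, self.db = cache, db
        self.dirty = set()           # keys modified in the cache but not yet persisted

    def put(self, key, value):
        self.cache.set(key, value)   # fast acknowledgement; data source not touched yet
        self.dirty.add(key)

    def flush(self):                 # run on a schedule, on eviction, or at shutdown
        for key in list(self.dirty):
            self.db.write(key, self.cache.get(key))
            self.dirty.discard(key)

A crash before flush() runs loses the dirty keys, which is exactly the data-loss risk described for write-back above.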

Each caching strategy has its own advantages and considerations, and the selec-
tion of an appropriate strategy depends on the specific requirements and charac-
teristics of the system.

By understanding these caching strategies, system designers and developers can make informed decisions to optimize data access and improve the overall performance of their applications. Let’s cover different deployment options for a cache in the overall system and how they affect performance and data sharing.

Caching Deployment
When deploying a cache, various deployment options are available depending on the specific requirements and architecture of the system. Here are three common cache deployment approaches: in-process caching, inter-process caching, and remote caching.

In-Process Caching

In in-process caching, the cache resides within the same process or ap-
plication as the requesting component. The cache is typically imple-
mented as an in-memory data store and is directly accessible by the ap-
plication or service. In-process caching provides fast data access and low
latency since the cache is located within the same process, enabling di-
rect access to the cached data. This deployment approach is suitable for
scenarios where data sharing and caching requirements are limited to a
single application or process (a small in-process sketch follows this list of deployment approaches).

Inter-Process Caching

Inter-process caching involves deploying the cache as a separate process or service that runs alongside the applications or services. The cache acts as a dedicated caching layer that can be accessed by multiple applications or processes. Applications communicate with the cache using inter-process communication mechanisms such as shared memory, pipes, sockets, or remote procedure calls (RPC). Inter-process caching allows multiple applications to share and access the cached data, enabling better resource utilization and data consistency across different components. It is well-suited for scenarios where data needs to be shared and cached across multiple applications or processes within a single machine.

Remote Caching

Remote caching involves deploying the cache as a separate service or cluster that runs on a different machine or location than the requesting components. The cache service is accessed remotely over a network using protocols such as HTTP, TCP/IP, or custom communication protocols. Remote caching enables distributed caching and can be used to share and cache data across multiple machines or even geographically distributed locations. It provides scalability, fault-tolerance, and the ability to share cached data among different applications or services running on separate machines. Remote caching is suitable for scenarios that require caching data across a distributed system or when the cache needs to be accessed by components running on different machines.
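
As a concrete example of in-process caching from the list above, Python’s built-in functools.lru_cache keeps results in the calling process’s own memory; fetch_rate_from_api is a hypothetical helper for an expensive external call.

from functools import lru_cache

@lru_cache(maxsize=1024)
def get_exchange_rate(currency: str) -> float:
    # The first call per currency hits the external service; repeats are served
    # from this process's memory until the entry is evicted (LRU, 1024 entries).
    return fetch_rate_from_api(currency)

Because the cache lives inside a single process, other processes and machines never see it, which is precisely the limitation that inter-process and remote caching address.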

The choice of cache deployment depends on factors such as the scale of the sys-
tem, performance requirements, data sharing needs, and architectural considera-
tions. In-process caching offers low latency and direct access to data within a sin-
gle process, inter-process caching enables sharing and caching data across multi-
ple applications or processes, and remote caching provides distributed caching ca-
pabilities across multiple machines or locations. Understanding the specific re-
quirements and characteristics of the system helps in selecting the appropriate
cache deployment strategy to optimize performance and resource utilization. Let’s
cover different caching mechanisms to improve application performance in the
next section.

Caching Mechanisms
In this section, we will explore different caching mechanisms, including client-side
caching, CDN caching, web server caching, application caching, database caching,
query-level caching, and object-level caching.

Client-side Caching

Client-side caching involves storing cached data on the client device, typically in the browser’s memory or local storage. This mechanism allows web applications to store and retrieve static resources, such as HTML, CSS, JavaScript, and images, directly from the client’s device. Client-side caching reduces the need to fetch resources from the server on subsequent requests, leading to faster page load times and improved user experience.

CDN Caching

Content Delivery Network (CDN) caching is a mechanism that involves caching static content on distributed servers strategically located across different geographic regions. CDNs serve cached content to users based on their proximity to the CDN server, reducing the latency and load on the origin server. CDN caching is commonly used to cache static files, media assets, and other frequently accessed content, improving the overall performance and scalability of web applications.
Web Server Caching

Web server caching refers to caching mechanisms implemented on the server side to store frequently accessed content. When a request is made to the server, it first checks if the requested content is already cached. If found, the server serves the cached content directly, avoiding the need to regenerate the content. Web server caching is effective for static web pages, dynamic content with a long expiration time, and content that is expensive to generate.

Application Caching

Application caching involves caching data within the application’s memory or in-memory databases. It is typically used to store frequently accessed data or computation results that are costly to generate or retrieve from other data sources. Application caching improves response times by reducing the need for repeated data retrieval and computation, enhancing the overall performance of the application.

Database Caching

Database caching focuses on improving the performance of database operations by caching frequently accessed data or query results. This caching mechanism can be implemented at different levels: query-level caching and object-level caching.

Query-Level Caching

Query-level caching involves storing the results of frequently executed queries in memory. When the same query is executed again, the cached result is served instead of querying the database again, reducing the database load and improving response times (a small sketch follows this list of mechanisms).

Object-Level Caching
Object-level caching caches individual data objects or records retrieved from the database. This mechanism is useful when accessing specific objects frequently or when the database is relatively static. Object-level caching reduces the need for frequent database queries, improving overall application performance.
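
As a rough sketch of query-level caching (our illustration, assuming an sqlite3-style connection where conn.execute returns a cursor), the query text and parameters are fingerprinted and the result set is reused on an exact match.

import hashlib

query_cache = {}   # maps a query fingerprint to its cached result set

def cached_query(conn, sql, params=(), max_entries=10_000):
    """Query-level caching: reuse the result of an identical query instead of re-running it."""
    key = hashlib.sha256(repr((sql, params)).encode()).hexdigest()
    if key in query_cache:
        return query_cache[key]                   # cache hit: skip the database round trip
    rows = conn.execute(sql, params).fetchall()   # cache miss: run the query
    if len(query_cache) >= max_entries:
        query_cache.clear()                       # crude eviction; a real cache would use LRU or TTL
    query_cache[key] = rows
    return rows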

By employing these caching mechanisms as shown in Figure 4-2, organizations and developers can optimize data retrieval, reduce latency, and improve the scalability and responsiveness of their systems. However, it is essential to carefully consider cache invalidation, cache coherence, and cache management strategies to ensure the consistency and integrity of the cached data.

Figure 4-2. Caching mechanisms employed at different stages

Out of the above mechanisms, Content Delivery Networks (CDNs) play a crucial
role in improving the performance and availability of web content to end-users by
reducing latency and enhancing scalability by caching at edge locations. Let’s
cover CDNs in detail in the next section.

Content Delivery Networks


CDNs employ various strategies and architectural models to efficiently distribute
and cache content across geographically distributed servers. This section explores
different types of CDNs, including push and pull CDNs, optimization techniques for
CDNs, and methods for ensuring content consistency within CDNs.

CDNs can be categorized into two main types: push and pull CDNs.

Push CDN
In a push CDN, content is pre-cached and distributed to edge servers in advance. The CDN provider proactively pushes content to edge locations based on predicted demand or predetermined rules. With push CDNs, content is only uploaded when it is new or changed, reducing traffic while maximizing storage efficiency. This approach ensures faster content delivery as the content is readily available at the edge servers when requested by end-users.

Push CDNs are suitable for websites with low traffic or content that
doesn’t require frequent updates. Instead of regularly pulling content
from the server, it is uploaded to the CDNs once and remains there until
changes occur.

Pull CDN

In a pull CDN, content is cached on-demand. The CDN servers pull the
content from the origin server when the first user requests it. The con-
tent is then cached at the edge servers for subsequent requests, opti-
mizing delivery for future users. The duration for which content is
cached is determined by a time-to-live (TTL) setting. Pull CDNs minimize
storage space on the CDN, but there can be redundant traffic if files are
pulled before their expiration, resulting in unnecessary data transfer.

Pull CDNs are well-suited for websites with high traffic since recently-
requested content remains on the CDN, evenly distributing the load.

CDNs employ different optimization techniques to improve the performance of caching at the edge server. Let’s cover some of these optimization techniques in detail.

Dynamic Content Caching Optimization

CDNs face challenges when caching dynamic content that frequently changes based on user interactions or real-time data. To optimize dynamic content caching, CDNs employ various techniques such as:

Content Fragmentation
Breaking down dynamic content into smaller fragments to enable partial caching and efficient updates.

Edge-Side Includes (ESI)

Implementing ESI tags to separate dynamic and static content, allowing dynamic portions to be processed on-the-fly while caching the static fragments.

Content Personalization

Leveraging user profiling and segmentation techniques to cache personalized or user-specific content at the edge servers.
Multi-tier CDN Architecture

Multi-tier CDN architecture involves the distribution of content across multiple layers or tiers of edge servers. This approach allows for better scalability, fault tolerance, and improved content delivery to geographically diverse regions. It enables efficient content replication and reduces latency by bringing content closer to end-users.

DNS Redirection

DNS redirection is a mechanism employed by CDNs to direct user requests to the nearest or most suitable edge server. By resolving DNS queries to the most appropriate server based on factors like geographic proximity, network conditions, and server availability, CDNs optimize content delivery and reduce latency.

Client Multiplexing

Client multiplexing refers to the technique of combining multiple HTTP requests and responses into a single connection between the client and the edge server. This reduces the overhead of establishing multiple connections and improves efficiency, especially for small object requests, resulting in faster content delivery.

Content Consistency in CDNs

Ensuring content consistency across multiple edge servers within a CDN is crucial to deliver the most up-to-date and accurate content. CDNs employ various methods to maintain content consistency, including:

Periodic Polling

CDNs periodically poll the origin server to check for updates or changes
in content. This ensures that cached content is refreshed to reflect the
latest version.

Time-to-Live (TTL)

CDNs utilize Time-to-Live values, specified in HTTP headers or DNS records, to determine how long cached content remains valid. Once the TTL expires, the CDN fetches updated content from the origin server.

Leases

CDNs use lease-based mechanisms to control the duration of content caching at the edge servers. Leases define a specific time window during which the content remains valid before requiring renewal or revalidation.

Note

AWS offers Amazon CloudFront, a pull CDN offering built for high performance, security, and developer convenience, which we will cover in more detail in Chapter 9, AWS Network Services.

Before ending the section, let’s also understand that using a CDN can come with certain drawbacks related to cost, stale content, and frequent URL changes. CDNs may involve significant costs depending on the amount of traffic. However, it’s important to consider these costs in comparison to the expenses you would incur without utilizing a CDN. If updates are made before the TTL expires, there is a possibility of content being outdated until it is refreshed on the CDN. CDNs also require modifying URLs for static content to point to the CDN, which can be an additional task to manage.

Overall, CDNs offer benefits in terms of performance and scalability but require
careful consideration of these factors and the specific needs of your website. At
the end of this chapter, let’s dive deeper into two popular open-source caching so-
lutions to understand their architecture and how they implement the caching con-
cepts discussed in the chapter.
Open Source Caching Solutions

Open source caching solutions, such as Redis and Memcached, have gained popu-
larity due to their efficiency, scalability, and ease of use. Let’s take a closer look at
Memcached and Redis, two widely adopted open-source caching solutions.

Memcached
Memcached is an open-source, high-performance caching solution widely used in
web applications. It operates as a distributed memory object caching system, stor-
ing data in memory across multiple servers. Here are some key features and bene-
fits of Memcached:

Simple and Lightweight

Memcached is designed to be simple, lightweight, and easy to deploy. It focuses solely on caching and provides a straightforward key-value interface for data storage and retrieval.

Horizontal Scalability

Memcached follows a distributed architecture, allowing it to scale horizontally by adding more servers to the cache cluster. This distributed approach ensures high availability, fault tolerance, and improved performance for growing workloads.

Protocol Compatibility

Memcached adheres to a simple protocol that is compatible with various programming languages. This compatibility makes it easy to integrate Memcached into applications developed in different languages.

Transparent Caching Layer

Memcached operates as a transparent caching layer, sitting between the application and the data source. It helps alleviate database or API load by caching frequently accessed data, reducing the need for repetitive queries.
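
As a brief illustration of that key-value interface (our example, not from the book), the snippet below uses the pymemcache Python client and assumes a Memcached server listening on localhost:11211; clients in other languages follow the same set/get/delete pattern.

# pip install pymemcache
from pymemcache.client.base import Client

client = Client(("localhost", 11211))           # assumes a local memcached on the default port

client.set("user:42:name", "Ada", expire=300)   # cache the value for 5 minutes
name = client.get("user:42:name")               # returns bytes unless a serializer is configured
client.delete("user:42:name")                   # explicit invalidation
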
Let’s take a look at Memcached’s architecture.

Memcached Architecture
Memcached’s architecture consists of a centralized server that coordinates the
storage and retrieval of cached data. When a client sends a request to store or re-
trieve data, the server handles the request and interacts with the underlying mem-
ory allocation strategy.

Memcached follows a multi-threaded architecture that enables it to efficiently handle concurrent requests and scale across multiple CPU cores. In this architecture, Memcached utilizes a pool of worker threads that can simultaneously process client requests. Each worker thread is responsible for handling a subset of incoming requests, allowing for parallel execution and improved throughput. This multi-threaded approach ensures that Memcached can effectively handle high traffic loads and distribute the processing workload across available CPU resources. By leveraging multiple threads, Memcached can achieve better performance and responsiveness, making it suitable for demanding caching scenarios where high concurrency is a requirement.

In terms of memory allocation, Memcached employs a slab allocation strategy. It divides the allocated memory into fixed-size chunks called slabs. Each slab is further divided into smaller units known as pages. These pages are then allocated to store individual cache items. The slab allocation strategy allows Memcached to efficiently manage memory by grouping items of similar sizes together. It reduces memory fragmentation and improves memory utilization.

When a new item is added to the cache, Memcached determines the appropriate
slab size for the item based on its size. If an existing slab with enough free space is
available, the item is stored in that slab. Otherwise, Memcached allocates a new
slab from the available memory pool and adds the item to that slab. The slab allo-
cation strategy enables efficient memory utilization and allows Memcached to
store a large number of items in memory while maintaining optimal performance.

Overall, Memcached’s architecture and memory allocation strategy work together to provide a lightweight and efficient caching solution that can handle high traffic loads and deliver fast data access times. By leveraging memory effectively and employing a scalable architecture, Memcached enables applications to significantly improve performance by caching frequently accessed data in memory.

Redis
Redis, short for Remote Dictionary Server, is a server-based in-memory data structure store that can serve as a high-performance cache. Unlike traditional databases that rely on iterating, sorting, and ordering rows, Redis organizes data in customizable data structures from the ground up, supporting a wide range of data types, including strings, bitmaps, bitfields, lists, sets, hashes, geospatial indexes, HyperLogLogs, and more, making it versatile for various caching use cases. Here are some key features and benefits of Redis:

High Performance

Redis is designed for speed, leveraging an in-memory storage model that allows for extremely fast data retrieval and updates. It can handle a massive number of operations per second, making it suitable for high-demand applications.

Persistence Options

Redis provides persistence options that allow data to be stored on disk, ensuring durability even in the event of system restarts. This feature makes Redis suitable for use cases where data needs to be retained beyond system restarts or cache invalidations.

Advanced Caching Features

Redis offers advanced caching features, such as expiration times, eviction policies, and automatic cache invalidation based on time-to-live (TTL) values. It also supports data partitioning and replication for scalability and fault tolerance.

Pub/Sub and Messaging

Redis includes publish/subscribe (pub/sub) messaging capabilities, enabling real-time messaging and event-driven architectures. This makes it suitable for scenarios involving real-time data updates and notifications.
Redis serves as an in-memory database primarily used as a cache in front of other databases like MySQL or PostgreSQL. By leveraging the speed of memory, Redis enhances application performance and reduces the load on the main database. It is particularly useful for storing data that changes infrequently but is frequently requested, as well as data that is less critical but undergoes frequent updates. Examples of such data include session or data caches, leaderboard information, and roll-up analytics for dashboards.
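
As a hedged example of that pattern, the sketch below uses the redis-py client to serve a leaderboard cache-aside style with a short TTL; the Redis endpoint, key layout, and db.load_leaderboard call are illustrative assumptions.

# pip install redis
import json
import redis

r = redis.Redis(host="localhost", port=6379, decode_responses=True)  # assumes a local Redis

def get_leaderboard(game_id, db):
    """Serve a leaderboard from Redis, falling back to the primary database on a miss."""
    key = f"leaderboard:{game_id}"
    cached = r.get(key)
    if cached is not None:
        return json.loads(cached)              # cache hit
    board = db.load_leaderboard(game_id)       # hypothetical primary-database call
    r.setex(key, 60, json.dumps(board))        # cache with a 60-second TTL
    return board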

Redis architecture is designed for high performance, low latency, and simplicity. It provides a range of deployment options for ensuring high availability based on the requirements and cost constraints. Let’s go over availability in Redis deployments in detail, followed by persistence models for Redis durability and memory management in Redis.

Availability in Redis Deployments


Redis supports different deployment architectures as shown in Figure 4-3, includ-
ing a single Redis instance, Redis HA (High Availability), Redis Sentinel, and Redis
Cluster. Each architecture has its trade-offs and is suitable for different use cases
and scalability needs.

Figure 4-3. Redis Deployment Setups

Single Redis Instance

In a single Redis instance setup, Redis is deployed as a standalone server. While it is straightforward and suitable for small instances, it lacks fault tolerance. If the instance fails or becomes unavailable, all client calls to Redis will fail, affecting overall system performance.

Redis HA (High Availability)

Redis HA involves deploying a main Redis instance with one or more secondary instances that synchronize with replication. The secondary instances can help scale read operations or provide failover in case the main instance is lost. Replication ID and offset play a crucial role in the synchronization process, allowing secondary instances to catch up with the main instance’s data.

Redis Sentinel

Redis Sentinel is a distributed system that ensures high availability for Redis. Sentinel processes coordinate the state and monitor the availability of main and secondary instances. They also serve as a point of discovery for clients, informing them of the current main instance. Sentinel processes can start a failover process if the primary instance becomes unavailable.

Redis Cluster

Redis Cluster enables horizontal scaling by distributing data across multiple machines or shards. Algorithmic sharding is used to determine which Redis instance (shard) holds a specific key. Redis Cluster employs a hash-slotting mechanism to map data to shards and allows for seamless resharding when adding new instances to the cluster. A gossip protocol is used in Redis Cluster to maintain cluster health: nodes constantly communicate to determine the availability of shards and can promote secondary instances to primary if needed.
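
For intuition, Redis Cluster maps each key to one of 16,384 hash slots, slot = CRC16(key) mod 16384, and each shard owns a range of slots. The Python sketch below approximates that mapping (it ignores hash tags, which real clients honor; binascii.crc_hqx implements the CRC-16/XMODEM variant the cluster specification describes).

import binascii

def cluster_slot(key: str) -> int:
    """Approximate Redis Cluster's key-to-slot mapping: CRC16(key) mod 16384."""
    return binascii.crc_hqx(key.encode(), 0) % 16384

print(cluster_slot("user:1000"))   # a slot number in the range 0..16383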

Durability in Redis Deployment

Redis provides two persistence models for data durability: RDB files (Redis
Database Files) and AOF (Append-Only File). These persistence mechanisms en-
sure that data is not lost in case of system restarts or crashes. Let’s explore both
models in more detail:

RDB Files (Redis Database Files)

RDB is the default persistence model in Redis. It periodically creates snapshots of the dataset and saves them as binary RDB files. These files capture the state of the Redis database at a specific point in time. Here are key features and considerations of RDB persistence:

Snapshot-based Persistence

RDB persistence works by periodically taking snapshots of the entire dataset and storing it in a file. The frequency of snapshots can be configured based on requirements.

Efficiency and Speed

RDB files are highly efficient in terms of disk space usage and data load-
ing speed. They are compact and can be loaded back into Redis quickly,
making it suitable for scenarios where fast recovery is essential.

Full Data Recovery

RDB files provide full data recovery as they contain the entire dataset. In
case of system failures, Redis can restore the data by loading the most
recent RDB file available.

However, it’s worth noting that RDB files have some limitations. Since they are
snapshots, they do not provide real-time durability and may result in data loss if a
crash occurs between two snapshot points. Additionally, restoring large RDB files
can take time and impact the system’s performance during the recovery process.

AOF (Append-Only File)

AOF persistence is an alternative persistence model in Redis that logs every write
operation to an append-only file. AOF captures a sequential log of write opera-
tions, enabling Redis to reconstruct the dataset by replaying the log. Here are key
features and considerations of AOF persistence:

Write-ahead Log

AOF persists every write operation to the append-only file as a series of commands or raw data. This log can be used to rebuild the dataset from scratch.

Durability and Flexibility

AOF offers more durability than RDB files since it captures every write operation. It provides the ability to recover data up to the last executed command. Moreover, AOF offers different persistence options (such as every write, every second, or both) to balance durability and performance.

Append-only Nature

AOF appends new write operations to the end of the file, ensuring that
the original dataset is never modified. This approach protects against
data corruption caused by crashes or power failures.

However, AOF persistence comes with its own considerations. The append-only
file can grow larger over time, potentially occupying significant disk space. Redis
offers options for AOF file rewriting to compact the log and reduce its size.
Additionally, AOF persistence typically has a slightly higher performance overhead
compared to RDB files due to the need to write every command to disk.

In practice, Redis users often employ a combination of RDB and AOF persistence
based on their specific requirements and trade-offs between performance, durabil-
ity, and recovery time objectives.
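
As an illustrative way to inspect or combine the two models at runtime, the sketch below uses redis-py’s CONFIG commands against an assumed local instance; the directive values shown are examples that vary by Redis version, and production deployments usually set them in redis.conf instead.

import redis

r = redis.Redis(host="localhost", port=6379, decode_responses=True)

# Inspect current persistence settings (equivalent to CONFIG GET in redis-cli).
print(r.config_get("save"))         # RDB snapshot schedule, e.g. {'save': '3600 1 300 100 60 10000'}
print(r.config_get("appendonly"))   # whether AOF is enabled

# Enable AOF with per-second fsync alongside RDB snapshots (a common combination).
r.config_set("appendonly", "yes")
r.config_set("appendfsync", "everysec")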

It’s important to note that Redis also provides an option to use no persistence
(volatile mode) if durability is not a primary concern or if data can be regenerated
from an external source in the event of a restart or crash.

Memory Management in Redis

Redis leverages forking and copy-on-write (COW) techniques to facilitate data per-
sistence efficiently within its single-threaded architecture. When Redis performs a
snapshot (RDB) or background saving operation, it follows these steps:

1. Forking: Redis uses the fork() system call to create a child process, which is an identical copy of the parent process. Forking is a lightweight operation as it creates a copy-on-write clone of the parent’s memory.
2. Copy-on-Write (COW): Initially, the child process shares the same memory pages with the parent process. However, when either the parent or child process modifies a memory page, COW comes into play. Instead of immediately duplicating the modified page, the operating system creates a new copy only when necessary.

By employing COW, Redis achieves the following benefits:

Memory Efficiency

When the child process is initially created, it shares the same memory
pages with the parent process. This shared memory approach consumes
minimal additional memory. Only the modified pages are copied when
necessary, saving memory resources.

Performance

Since only the modified pages are duplicated, Redis can take advantage
of the COW mechanism to perform persistence operations without incur-
ring a significant performance overhead. This is particularly beneficial for
large datasets where copying the entire dataset for persistence would be
time-consuming.

Fork Safety

Redis uses fork-based persistence to avoid blocking the main event loop
during the snapshot process. By forking a child process, the parent
process can continue serving client requests while the child process per-
forms the persistence operation independently. This ensures high re-
sponsiveness and uninterrupted service.

It’s important to note that while forking and COW provide memory efficiency and
performance benefits, they also have considerations. Forking can result in in-
creased memory usage during the copy-on-write process if many modified pages
need to be duplicated. Additionally, the fork operation may be slower on systems
with large memory footprints.
Overall, Redis effectively utilizes forking and copy-on-write mechanisms within its single-threaded architecture to achieve efficient data persistence. By employing these techniques, Redis can perform snapshots and background saving operations without significantly impacting its performance or memory usage.

Overall, Redis offers developers a powerful and flexible data storage solution with
various deployment options and capabilities.

Both Redis and Memcached are excellent open-source caching solutions with their
unique strengths. The choice between them depends on specific requirements and
use cases. Redis is suitable for scenarios requiring versatile data structures, persis-
tence, pub/sub messaging, and advanced caching features. On the other hand,
Memcached shines in simple, lightweight caching use cases that prioritize scalabil-
ity and ease of integration.

Note

AWS offers Amazon ElastiCache, compatible with both Redis and Memcached, for real-time use cases like caching, session stores, gaming, geospatial services, real-time analytics, and queuing, which we will cover in more detail in Chapter 10, AWS Storage Services.

Conclusion
In concluding this chapter on caching, we have journeyed through a comprehen-
sive exploration of the fundamental concepts and strategies that empower effi-
cient data caching. We’ve covered cache eviction policies, cache invalidation mech-
anisms, and a plethora of caching strategies, equipping you with the knowledge to
optimize data access and storage. We’ve delved into caching deployment, under-
standing how strategic placement can maximize impact, and explored the diverse
caching mechanisms available. Additionally, we’ve touched upon Content Delivery
Networks (CDNs) and open-source caching solutions including Redis and
Memcached, that offer robust options for enhancing performance. By incorporat-
ing Redis or Memcached into your architecture, you can significantly improve ap-
plication performance, reduce response times, and enhance the overall user expe-
rience by leveraging the power of in-memory caching.
As we move forward in our exploration of enhancing system performance, the next chapter will embark on an exploration of scaling and load balancing strategies. Scaling is a pivotal aspect of modern computing, allowing systems to handle increased loads gracefully. We will also delve into strategies for load balancing to distribute incoming traffic efficiently. Together, these topics will empower you to design and maintain high-performing systems that can handle the demands of today’s dynamic digital landscape.
