Advances in Microprocessor Cache Architectures Over The Last 25 Years

Uploaded by

reg2reg

We take content rights seriously. If you suspect this is your content, claim it here.

Available Formats

Download as PDF, TXT or read online on Scribd

0% found this document useful (0 votes)

293 views

Advances in Microprocessor Cache Architectures Over The Last 25 Years

Uploaded by

reg2reg

We take content rights seriously. If you suspect this is your content, claim it here.

Available Formats

Download as PDF, TXT or read online on Scribd

You are on page 1/ 11

THEME ARTICLE: MICROPROCESSOR AT 50

Advances in Microprocessor Cache

Architectures Over the Last 25 Years
Ravi Iyer , Vivek De , Ramesh Illikkal, David Koufaty , Bhushan Chitlur, Andrew Herdrich,
Muhammad Khellah, Fatih Hamzaoglu, and Eric Karl, Intel Corporation, Hillsboro, OR, 97124, USA

Over the last 25 years, the use of caches has advanced significantly in mainstream
microprocessors to address the memory wall challenge. As we transformed
microprocessors from single-core to multicore to manycore, innovations in the
architecture, design, and management of on-die cache hierarchy were critical to
enabling scaling in performance and efficiency. In addition, at the system level, as
input/output (I/O) devices (e.g., networking) and accelerators (domain-specific)
started to interact with general-purpose cores across shared memory,
advancements in caching became important as a way of minimizing data
movement and enabling faster communication. In this article, we cover some of the
major advancements in cache research and development that have improved the
performance and efficiency of microprocessor servers over the last 25 years. We will
reflect upon several techniques including shared and distributed last-level caches
(including data placement and coherence), cache Quality of Service (addressing
interference between workloads), direct cache access (placing I/O data directly into
CPU caches), and extending caching to off-die accelerators (CXL.cache). We will
also outline potential future directions for cache research and development over
the next 25 years.

CACHING CHALLENGES OVER THE (DRAM), architects had to innovate and advance cach-
LAST 25 YEARS ing techniques to facilitate high bandwidth, low latency

O
ver the last 25 years, we have seen signiﬁcant data access from the core as well as from input/output
advances in microprocessors including sub- (I/O) devices and accelerators. Figure 1 illustrates the
stantial improvements in core frequency and compute growth and the memory wall challenge and
performance, multicore, manycore, and more recently highlights some caching innovations that we will cover
heterogenous compute architectures [diverse central in the article. The advancements in caching (as a result)
processing unit (CPU) cores and tightly coupled acceler- are best described by an illustrative example (see
ators/devices]. These have enabled applications to grow Table 1). About 25 years ago, the Intel Pentium Pro was
rapidly from single-threaded to multithreaded on client launched into the market for both client and eventually
and server platforms and furthermore to multitenant server platforms. The Intel Pentium Pro was a single-
and service-oriented microservices scenarios in virtual- core processor running at 150–200 MHz and featured an
ized cloud infrastructure. All these advancements in off-die but on-package nonblocking L2 cache (256 K at
compute performance and increase in application introduction) connected to the core using a backside
demands required improvements in access to data bus to address the memory latency and enable concur-
(both latency and bandwidth). With a slower pace of rent access to cache and memory. Fast forward 25 years
advancements in dynamic random access memory to our current generation server microprocessors,
which have tens of cores (each multithreaded and capa-
ble of running at well over 3 GHz) with on-die cache
capacity at almost 100 MB or more (including both L2
0272-1732 ß 2021 IEEE
and L3) that is physically distributed across an on-die
Digital Object Identiﬁer 10.1109/MM.2021.3114903 interconnect. Table 1 presents the number of cores, fre-
Date of current version 19 November 2021. quency, and cache sizes for the latest third-generation