SlideShare a Scribd company logo
Core 2 Duo die
“Just a few years ago, the idea of putting multiple processors
on a chip was farfetched. Now it is accepted and
commonplace, and virtually every new high performance
processor is a chip multiprocessor of some sort…”
Center for Electronic System Design
Univ. of California Berkeley
Chip Multiprocessors??
“Mowry is working on the development
of single-chip multiprocessors: one large
chip capable of performing multiple
operations at once, using similar
techniques to maximize performance”
-- Technology Review, 1999
Sony's Playstation 3, 2006
CMP Caches: Design Space
• Architecture
– Placement of Cache/Processors
– Interconnects/Routing
• Cache Organization & Management
– Private/Shared/Hybrid
– Fully Hardware/OS Interface
“L2 is the last line of defense before hitting the
memory wall, and is the focus of our talk”
Private L2 Cache
I$ D$ I$ D$
L2 $ L2 $ L2 $ L2 $ L2 $ L2 $
I N T E R C O N N E C T
Coherence Protocol
Offchip Memory
+ Less interconnect traffic
+ Insulates L2 units
+ Hit latency
– Duplication
– Load imbalance
– Complexity of coherence
– Higher miss rate
L1 L1
Proc
Shared-Interleaved L2 Cache
– Interconnect traffic
– Interference between cores
– Hit latency is higher
+ No duplication
+ Balance the load
+ Lower miss rate
+ Simplicity of coherence
I$ D$ I$ D$
I N T E R C O N N E C T
Coherence ProtocolL1
L2
Take Home Message
• Leverage on-chip access time
Take Home Messages
• Leverage on-chip access time
• Better sharing of cache resources
• Isolating performance of processors
• Place data on the chip close to where it is used
• Minimize inter-processor misses (in shared cache)
• Fairness towards processors
On to some solutions…
Jichuan Chang and Gurindar S. Sohi
Cooperative Caching for Chip Multiprocessors
International Symposium on Computer Architecture, 2006.
Nikos Hardavellas, Michael Ferdman, Babak Falsafi, and Anastasia Ailamaki
Reactive NUCA: Near-Optimal Block Placement and Replication in Distributed Caches
International Symposium on Computer Architecture, 2009.
Shekhar Srikantaiah, Mahmut Kandemir, and Mary Jane Irwin
Adaptive Set-Pinning: Managing Shared Caches in Chip Multiprocessors
Architectural Support for Programming Languages and Operating, Systems 2008.
each handles this problem in a different way
Co-operative Caching
(Chang & Sohi)
• Private L2 caches
• Attract data locally to reduce remote on chip access.
Lowers average on-chip misses.
• Co-operation among the private caches for efficient
use of resources on the chip.
• Controlling the extent of co-operation to suit the
dynamic workload behavior
CC Techniques
• Cache to cache transfer of clean data
– In case of miss transfer “clean” blocks from another L2 cache.
– This is useful in the case of “read only” data (instructions) .
• Replication aware data replacement
– Singlet/Replicate.
– Evict singlet only when no replicates exist.
– Singlets can be “spilled” to other cache banks.
• Global replacement of inactive data
– Global management needed for managing “spilling”.
– N-Chance Forwarding.
– Set recirculation count to N when spilled.
– Decrease N by 1 when spilled again, unless N becomes 0.
Set “Pinning” -- Setup
P1
P2
P3
P4
Set 0
Set 1
:
:
Set (S-1)
L1
cache
Processors Shared
L2 cache
I
n
t
e
r
c
o
n
n
e
c
t
Main
Memory
Set “Pinning” -- Problem
P1
P2
P3
P4
Set 0
Set 1
:
:
Set (S-1)
Main
Memory
Set “Pinning”
-- Types of Cache Misses
• Compulsory
(aka Cold)
• Capacity
• Conflict
• Coherence
• Compulsory
• Inter-processor
• Intra-processor
versus
P1
P2
P3
P4
Main
Memory
POP 1
POP 2
POP 3
POP 4
Set
:
:
Set
Owner Other bits Data
R-NUCA: Use Class-Based Strategies
Solve for the common case!
Most current (and future) programs have the following types of accesses
1. Instruction Access – Shared, but Read-Only
2. Private Data Access – Read-Write, but not Shared
3. Shared Data Access – Read-Write (or) Read-Only, but Shared.
R-NUCA: Can do this online!
• We have information from the OS and TLB
• For each memory block, classify it as
– Instruction
– Private Data
– Shared Data
• Handle them differently
– Replicate instructions
– Keep private data locally
– Keep shared data globally
R-NUCA: Reactive Clustering
• Assign clusters based on level of sharing
– Private Data given level-1 clusters (local cache)
– Shared Data given level-16 clusters (16 neighboring machines), etc.
Clusters ≈ Overlapping Sets in Set-Associative Mapping
• Within a cluster, “Rotational Interleaving”
– Load-Balancing to minimize contention on bus and controller
Future Directions
Area has been closed.
Just Kidding…
• Optimize for Power Consumption
• Assess trade-offs between more caches and more cores
• Minimize usage of OS, but still retain flexibility
• Application adaptation to allocated cache quotas
• Adding hardware directed thread level speculation
Questions?
THANK YOU!
Backup
• Commercial and research prototypes
– Sun MAJC
– Piranha
– IBM Power 4/5
– Stanford Hydra
Backup
Ad

More Related Content

What's hot (15)

Introduction to Microkernels
Introduction to MicrokernelsIntroduction to Microkernels
Introduction to Microkernels
Vasily Sartakov
 
Plan 9: Not (Only) A Better UNIX
Plan 9: Not (Only) A Better UNIXPlan 9: Not (Only) A Better UNIX
Plan 9: Not (Only) A Better UNIX
National Cheng Kung University
 
From L3 to seL4: What have we learnt in 20 years of L4 microkernels
From L3 to seL4: What have we learnt in 20 years of L4 microkernelsFrom L3 to seL4: What have we learnt in 20 years of L4 microkernels
From L3 to seL4: What have we learnt in 20 years of L4 microkernels
microkerneldude
 
Linux kernel Architecture and Properties
Linux kernel Architecture and PropertiesLinux kernel Architecture and Properties
Linux kernel Architecture and Properties
Saadi Rahman
 
μ-Kernel Evolution
μ-Kernel Evolutionμ-Kernel Evolution
μ-Kernel Evolution
Sergio Shevchenko
 
Architecture Of The Linux Kernel
Architecture Of The Linux KernelArchitecture Of The Linux Kernel
Architecture Of The Linux Kernel
guest547d74
 
Parallel Computing - Lec 3
Parallel Computing - Lec 3Parallel Computing - Lec 3
Parallel Computing - Lec 3
Shah Zaib
 
L4 Microkernel :: Design Overview
L4 Microkernel :: Design OverviewL4 Microkernel :: Design Overview
L4 Microkernel :: Design Overview
National Cheng Kung University
 
Pacemaker+DRBD
Pacemaker+DRBDPacemaker+DRBD
Pacemaker+DRBD
Dan Frincu
 
[TALK] Exokernel vs. Microkernel
[TALK] Exokernel vs. Microkernel[TALK] Exokernel vs. Microkernel
[TALK] Exokernel vs. Microkernel
Hawx Chen
 
Hints for L4 Microkernel
Hints for L4 MicrokernelHints for L4 Microkernel
Hints for L4 Microkernel
National Cheng Kung University
 
Embedded Hypervisor for ARM
Embedded Hypervisor for ARMEmbedded Hypervisor for ARM
Embedded Hypervisor for ARM
National Cheng Kung University
 
Multicore Processors
Multicore ProcessorsMulticore Processors
Multicore Processors
Smruti Sarangi
 
Implement Runtime Environments for HSA using LLVM
Implement Runtime Environments for HSA using LLVMImplement Runtime Environments for HSA using LLVM
Implement Runtime Environments for HSA using LLVM
National Cheng Kung University
 
Ubuntu
UbuntuUbuntu
Ubuntu
Jacquiline Tabelin
 

Similar to Optimizing shared caches in chip multiprocessors (20)

chapter-6-multiprocessors-and-thread-level (1).ppt
chapter-6-multiprocessors-and-thread-level (1).pptchapter-6-multiprocessors-and-thread-level (1).ppt
chapter-6-multiprocessors-and-thread-level (1).ppt
harishM874937
 
[Harvard CS264] 07 - GPU Cluster Programming (MPI & ZeroMQ)
[Harvard CS264] 07 - GPU Cluster Programming (MPI & ZeroMQ)[Harvard CS264] 07 - GPU Cluster Programming (MPI & ZeroMQ)
[Harvard CS264] 07 - GPU Cluster Programming (MPI & ZeroMQ)
npinto
 
Massively Parallel Architectures
Massively Parallel ArchitecturesMassively Parallel Architectures
Massively Parallel Architectures
Jason Hearne-McGuiness
 
Introduction to multi core
Introduction to multi coreIntroduction to multi core
Introduction to multi core
mukul bhardwaj
 
Cache coherence ppt
Cache coherence pptCache coherence ppt
Cache coherence ppt
ArendraSingh2
 
Introduction to symmetric multiprocessor
Introduction to symmetric multiprocessorIntroduction to symmetric multiprocessor
Introduction to symmetric multiprocessor
myjuni04
 
High_Performance_ComputingforComputers.pptx
High_Performance_ComputingforComputers.pptxHigh_Performance_ComputingforComputers.pptx
High_Performance_ComputingforComputers.pptx
JPrince9
 
Leveraging Structured Data To Reduce Disk, IO & Network Bandwidth
Leveraging Structured Data To Reduce Disk, IO & Network BandwidthLeveraging Structured Data To Reduce Disk, IO & Network Bandwidth
Leveraging Structured Data To Reduce Disk, IO & Network Bandwidth
Perforce
 
Pdc lecture1
Pdc lecture1Pdc lecture1
Pdc lecture1
SyedSafeer1
 
12-6810-12.ppt
12-6810-12.ppt12-6810-12.ppt
12-6810-12.ppt
Lonewolf379705
 
247267395-1-Symmetric-and-distributed-shared-memory-architectures-ppt (1).ppt
247267395-1-Symmetric-and-distributed-shared-memory-architectures-ppt (1).ppt247267395-1-Symmetric-and-distributed-shared-memory-architectures-ppt (1).ppt
247267395-1-Symmetric-and-distributed-shared-memory-architectures-ppt (1).ppt
ssuser5c9d4b1
 
MT48 A Flash into the future of storage….  Flash meets Persistent Memory: The...
MT48 A Flash into the future of storage….  Flash meets Persistent Memory: The...MT48 A Flash into the future of storage….  Flash meets Persistent Memory: The...
MT48 A Flash into the future of storage….  Flash meets Persistent Memory: The...
Dell EMC World
 
PyData Paris 2015 - Closing keynote Francesc Alted
PyData Paris 2015 - Closing keynote Francesc AltedPyData Paris 2015 - Closing keynote Francesc Alted
PyData Paris 2015 - Closing keynote Francesc Alted
Pôle Systematic Paris-Region
 
Webinar: OpenEBS - Still Free and now FASTEST Kubernetes storage
Webinar: OpenEBS - Still Free and now FASTEST Kubernetes storageWebinar: OpenEBS - Still Free and now FASTEST Kubernetes storage
Webinar: OpenEBS - Still Free and now FASTEST Kubernetes storage
MayaData Inc
 
Chap1
Chap1Chap1
Chap1
adisi
 
Preparing OpenSHMEM for Exascale
Preparing OpenSHMEM for ExascalePreparing OpenSHMEM for Exascale
Preparing OpenSHMEM for Exascale
inside-BigData.com
 
Ceg4131 models
Ceg4131 modelsCeg4131 models
Ceg4131 models
anandme07
 
Chorus - Distributed Operating System [ case study ]
Chorus - Distributed Operating System [ case study ]Chorus - Distributed Operating System [ case study ]
Chorus - Distributed Operating System [ case study ]
Akhil Nadh PC
 
module4.ppt
module4.pptmodule4.ppt
module4.ppt
Subhasis Dash
 
Using the big guns: Advanced OS performance tools for troubleshooting databas...
Using the big guns: Advanced OS performance tools for troubleshooting databas...Using the big guns: Advanced OS performance tools for troubleshooting databas...
Using the big guns: Advanced OS performance tools for troubleshooting databas...
Nikolay Savvinov
 
chapter-6-multiprocessors-and-thread-level (1).ppt
chapter-6-multiprocessors-and-thread-level (1).pptchapter-6-multiprocessors-and-thread-level (1).ppt
chapter-6-multiprocessors-and-thread-level (1).ppt
harishM874937
 
[Harvard CS264] 07 - GPU Cluster Programming (MPI & ZeroMQ)
[Harvard CS264] 07 - GPU Cluster Programming (MPI & ZeroMQ)[Harvard CS264] 07 - GPU Cluster Programming (MPI & ZeroMQ)
[Harvard CS264] 07 - GPU Cluster Programming (MPI & ZeroMQ)
npinto
 
Introduction to multi core
Introduction to multi coreIntroduction to multi core
Introduction to multi core
mukul bhardwaj
 
Introduction to symmetric multiprocessor
Introduction to symmetric multiprocessorIntroduction to symmetric multiprocessor
Introduction to symmetric multiprocessor
myjuni04
 
High_Performance_ComputingforComputers.pptx
High_Performance_ComputingforComputers.pptxHigh_Performance_ComputingforComputers.pptx
High_Performance_ComputingforComputers.pptx
JPrince9
 
Leveraging Structured Data To Reduce Disk, IO & Network Bandwidth
Leveraging Structured Data To Reduce Disk, IO & Network BandwidthLeveraging Structured Data To Reduce Disk, IO & Network Bandwidth
Leveraging Structured Data To Reduce Disk, IO & Network Bandwidth
Perforce
 
247267395-1-Symmetric-and-distributed-shared-memory-architectures-ppt (1).ppt
247267395-1-Symmetric-and-distributed-shared-memory-architectures-ppt (1).ppt247267395-1-Symmetric-and-distributed-shared-memory-architectures-ppt (1).ppt
247267395-1-Symmetric-and-distributed-shared-memory-architectures-ppt (1).ppt
ssuser5c9d4b1
 
MT48 A Flash into the future of storage….  Flash meets Persistent Memory: The...
MT48 A Flash into the future of storage….  Flash meets Persistent Memory: The...MT48 A Flash into the future of storage….  Flash meets Persistent Memory: The...
MT48 A Flash into the future of storage….  Flash meets Persistent Memory: The...
Dell EMC World
 
Webinar: OpenEBS - Still Free and now FASTEST Kubernetes storage
Webinar: OpenEBS - Still Free and now FASTEST Kubernetes storageWebinar: OpenEBS - Still Free and now FASTEST Kubernetes storage
Webinar: OpenEBS - Still Free and now FASTEST Kubernetes storage
MayaData Inc
 
Chap1
Chap1Chap1
Chap1
adisi
 
Preparing OpenSHMEM for Exascale
Preparing OpenSHMEM for ExascalePreparing OpenSHMEM for Exascale
Preparing OpenSHMEM for Exascale
inside-BigData.com
 
Ceg4131 models
Ceg4131 modelsCeg4131 models
Ceg4131 models
anandme07
 
Chorus - Distributed Operating System [ case study ]
Chorus - Distributed Operating System [ case study ]Chorus - Distributed Operating System [ case study ]
Chorus - Distributed Operating System [ case study ]
Akhil Nadh PC
 
Using the big guns: Advanced OS performance tools for troubleshooting databas...
Using the big guns: Advanced OS performance tools for troubleshooting databas...Using the big guns: Advanced OS performance tools for troubleshooting databas...
Using the big guns: Advanced OS performance tools for troubleshooting databas...
Nikolay Savvinov
 
Ad

More from Hoang Nguyen (20)

Rest api to integrate with your site
Rest api to integrate with your siteRest api to integrate with your site
Rest api to integrate with your site
Hoang Nguyen
 
How to build a rest api
How to build a rest apiHow to build a rest api
How to build a rest api
Hoang Nguyen
 
Api crash
Api crashApi crash
Api crash
Hoang Nguyen
 
Smm and caching
Smm and cachingSmm and caching
Smm and caching
Hoang Nguyen
 
How analysis services caching works
How analysis services caching worksHow analysis services caching works
How analysis services caching works
Hoang Nguyen
 
Hardware managed cache
Hardware managed cacheHardware managed cache
Hardware managed cache
Hoang Nguyen
 
Directory based cache coherence
Directory based cache coherenceDirectory based cache coherence
Directory based cache coherence
Hoang Nguyen
 
Cache recap
Cache recapCache recap
Cache recap
Hoang Nguyen
 
Python your new best friend
Python your new best friendPython your new best friend
Python your new best friend
Hoang Nguyen
 
Python language data types
Python language data typesPython language data types
Python language data types
Hoang Nguyen
 
Python basics
Python basicsPython basics
Python basics
Hoang Nguyen
 
Programming for engineers in python
Programming for engineers in pythonProgramming for engineers in python
Programming for engineers in python
Hoang Nguyen
 
Learning python
Learning pythonLearning python
Learning python
Hoang Nguyen
 
Extending burp with python
Extending burp with pythonExtending burp with python
Extending burp with python
Hoang Nguyen
 
Cobol, lisp, and python
Cobol, lisp, and pythonCobol, lisp, and python
Cobol, lisp, and python
Hoang Nguyen
 
Object oriented programming using c++
Object oriented programming using c++Object oriented programming using c++
Object oriented programming using c++
Hoang Nguyen
 
Object oriented analysis
Object oriented analysisObject oriented analysis
Object oriented analysis
Hoang Nguyen
 
Object model
Object modelObject model
Object model
Hoang Nguyen
 
Data structures and algorithms
Data structures and algorithmsData structures and algorithms
Data structures and algorithms
Hoang Nguyen
 
Data abstraction the walls
Data abstraction the wallsData abstraction the walls
Data abstraction the walls
Hoang Nguyen
 
Rest api to integrate with your site
Rest api to integrate with your siteRest api to integrate with your site
Rest api to integrate with your site
Hoang Nguyen
 
How to build a rest api
How to build a rest apiHow to build a rest api
How to build a rest api
Hoang Nguyen
 
How analysis services caching works
How analysis services caching worksHow analysis services caching works
How analysis services caching works
Hoang Nguyen
 
Hardware managed cache
Hardware managed cacheHardware managed cache
Hardware managed cache
Hoang Nguyen
 
Directory based cache coherence
Directory based cache coherenceDirectory based cache coherence
Directory based cache coherence
Hoang Nguyen
 
Python your new best friend
Python your new best friendPython your new best friend
Python your new best friend
Hoang Nguyen
 
Python language data types
Python language data typesPython language data types
Python language data types
Hoang Nguyen
 
Programming for engineers in python
Programming for engineers in pythonProgramming for engineers in python
Programming for engineers in python
Hoang Nguyen
 
Extending burp with python
Extending burp with pythonExtending burp with python
Extending burp with python
Hoang Nguyen
 
Cobol, lisp, and python
Cobol, lisp, and pythonCobol, lisp, and python
Cobol, lisp, and python
Hoang Nguyen
 
Object oriented programming using c++
Object oriented programming using c++Object oriented programming using c++
Object oriented programming using c++
Hoang Nguyen
 
Object oriented analysis
Object oriented analysisObject oriented analysis
Object oriented analysis
Hoang Nguyen
 
Data structures and algorithms
Data structures and algorithmsData structures and algorithms
Data structures and algorithms
Hoang Nguyen
 
Data abstraction the walls
Data abstraction the wallsData abstraction the walls
Data abstraction the walls
Hoang Nguyen
 
Ad

Recently uploaded (20)

Increasing Retail Store Efficiency How can Planograms Save Time and Money.pptx
Increasing Retail Store Efficiency How can Planograms Save Time and Money.pptxIncreasing Retail Store Efficiency How can Planograms Save Time and Money.pptx
Increasing Retail Store Efficiency How can Planograms Save Time and Money.pptx
Anoop Ashok
 
TrsLabs - Fintech Product & Business Consulting
TrsLabs - Fintech Product & Business ConsultingTrsLabs - Fintech Product & Business Consulting
TrsLabs - Fintech Product & Business Consulting
Trs Labs
 
Andrew Marnell: Transforming Business Strategy Through Data-Driven Insights
Andrew Marnell: Transforming Business Strategy Through Data-Driven InsightsAndrew Marnell: Transforming Business Strategy Through Data-Driven Insights
Andrew Marnell: Transforming Business Strategy Through Data-Driven Insights
Andrew Marnell
 
Quantum Computing Quick Research Guide by Arthur Morgan
Quantum Computing Quick Research Guide by Arthur MorganQuantum Computing Quick Research Guide by Arthur Morgan
Quantum Computing Quick Research Guide by Arthur Morgan
Arthur Morgan
 
HCL Nomad Web – Best Practices und Verwaltung von Multiuser-Umgebungen
HCL Nomad Web – Best Practices und Verwaltung von Multiuser-UmgebungenHCL Nomad Web – Best Practices und Verwaltung von Multiuser-Umgebungen
HCL Nomad Web – Best Practices und Verwaltung von Multiuser-Umgebungen
panagenda
 
IEDM 2024 Tutorial2_Advances in CMOS Technologies and Future Directions for C...
IEDM 2024 Tutorial2_Advances in CMOS Technologies and Future Directions for C...IEDM 2024 Tutorial2_Advances in CMOS Technologies and Future Directions for C...
IEDM 2024 Tutorial2_Advances in CMOS Technologies and Future Directions for C...
organizerofv
 
Enhancing ICU Intelligence: How Our Functional Testing Enabled a Healthcare I...
Enhancing ICU Intelligence: How Our Functional Testing Enabled a Healthcare I...Enhancing ICU Intelligence: How Our Functional Testing Enabled a Healthcare I...
Enhancing ICU Intelligence: How Our Functional Testing Enabled a Healthcare I...
Impelsys Inc.
 
Heap, Types of Heap, Insertion and Deletion
Heap, Types of Heap, Insertion and DeletionHeap, Types of Heap, Insertion and Deletion
Heap, Types of Heap, Insertion and Deletion
Jaydeep Kale
 
Technology Trends in 2025: AI and Big Data Analytics
Technology Trends in 2025: AI and Big Data AnalyticsTechnology Trends in 2025: AI and Big Data Analytics
Technology Trends in 2025: AI and Big Data Analytics
InData Labs
 
Linux Professional Institute LPIC-1 Exam.pdf
Linux Professional Institute LPIC-1 Exam.pdfLinux Professional Institute LPIC-1 Exam.pdf
Linux Professional Institute LPIC-1 Exam.pdf
RHCSA Guru
 
Semantic Cultivators : The Critical Future Role to Enable AI
Semantic Cultivators : The Critical Future Role to Enable AISemantic Cultivators : The Critical Future Role to Enable AI
Semantic Cultivators : The Critical Future Role to Enable AI
artmondano
 
AI EngineHost Review: Revolutionary USA Datacenter-Based Hosting with NVIDIA ...
AI EngineHost Review: Revolutionary USA Datacenter-Based Hosting with NVIDIA ...AI EngineHost Review: Revolutionary USA Datacenter-Based Hosting with NVIDIA ...
AI EngineHost Review: Revolutionary USA Datacenter-Based Hosting with NVIDIA ...
SOFTTECHHUB
 
ThousandEyes Partner Innovation Updates for May 2025
ThousandEyes Partner Innovation Updates for May 2025ThousandEyes Partner Innovation Updates for May 2025
ThousandEyes Partner Innovation Updates for May 2025
ThousandEyes
 
UiPath Community Berlin: Orchestrator API, Swagger, and Test Manager API
UiPath Community Berlin: Orchestrator API, Swagger, and Test Manager APIUiPath Community Berlin: Orchestrator API, Swagger, and Test Manager API
UiPath Community Berlin: Orchestrator API, Swagger, and Test Manager API
UiPathCommunity
 
Complete Guide to Advanced Logistics Management Software in Riyadh.pdf
Complete Guide to Advanced Logistics Management Software in Riyadh.pdfComplete Guide to Advanced Logistics Management Software in Riyadh.pdf
Complete Guide to Advanced Logistics Management Software in Riyadh.pdf
Software Company
 
Generative Artificial Intelligence (GenAI) in Business
Generative Artificial Intelligence (GenAI) in BusinessGenerative Artificial Intelligence (GenAI) in Business
Generative Artificial Intelligence (GenAI) in Business
Dr. Tathagat Varma
 
Massive Power Outage Hits Spain, Portugal, and France: Causes, Impact, and On...
Massive Power Outage Hits Spain, Portugal, and France: Causes, Impact, and On...Massive Power Outage Hits Spain, Portugal, and France: Causes, Impact, and On...
Massive Power Outage Hits Spain, Portugal, and France: Causes, Impact, and On...
Aqusag Technologies
 
Greenhouse_Monitoring_Presentation.pptx.
Greenhouse_Monitoring_Presentation.pptx.Greenhouse_Monitoring_Presentation.pptx.
Greenhouse_Monitoring_Presentation.pptx.
hpbmnnxrvb
 
Mobile App Development Company in Saudi Arabia
Mobile App Development Company in Saudi ArabiaMobile App Development Company in Saudi Arabia
Mobile App Development Company in Saudi Arabia
Steve Jonas
 
Transcript: #StandardsGoals for 2025: Standards & certification roundup - Tec...
Transcript: #StandardsGoals for 2025: Standards & certification roundup - Tec...Transcript: #StandardsGoals for 2025: Standards & certification roundup - Tec...
Transcript: #StandardsGoals for 2025: Standards & certification roundup - Tec...
BookNet Canada
 
Increasing Retail Store Efficiency How can Planograms Save Time and Money.pptx
Increasing Retail Store Efficiency How can Planograms Save Time and Money.pptxIncreasing Retail Store Efficiency How can Planograms Save Time and Money.pptx
Increasing Retail Store Efficiency How can Planograms Save Time and Money.pptx
Anoop Ashok
 
TrsLabs - Fintech Product & Business Consulting
TrsLabs - Fintech Product & Business ConsultingTrsLabs - Fintech Product & Business Consulting
TrsLabs - Fintech Product & Business Consulting
Trs Labs
 
Andrew Marnell: Transforming Business Strategy Through Data-Driven Insights
Andrew Marnell: Transforming Business Strategy Through Data-Driven InsightsAndrew Marnell: Transforming Business Strategy Through Data-Driven Insights
Andrew Marnell: Transforming Business Strategy Through Data-Driven Insights
Andrew Marnell
 
Quantum Computing Quick Research Guide by Arthur Morgan
Quantum Computing Quick Research Guide by Arthur MorganQuantum Computing Quick Research Guide by Arthur Morgan
Quantum Computing Quick Research Guide by Arthur Morgan
Arthur Morgan
 
HCL Nomad Web – Best Practices und Verwaltung von Multiuser-Umgebungen
HCL Nomad Web – Best Practices und Verwaltung von Multiuser-UmgebungenHCL Nomad Web – Best Practices und Verwaltung von Multiuser-Umgebungen
HCL Nomad Web – Best Practices und Verwaltung von Multiuser-Umgebungen
panagenda
 
IEDM 2024 Tutorial2_Advances in CMOS Technologies and Future Directions for C...
IEDM 2024 Tutorial2_Advances in CMOS Technologies and Future Directions for C...IEDM 2024 Tutorial2_Advances in CMOS Technologies and Future Directions for C...
IEDM 2024 Tutorial2_Advances in CMOS Technologies and Future Directions for C...
organizerofv
 
Enhancing ICU Intelligence: How Our Functional Testing Enabled a Healthcare I...
Enhancing ICU Intelligence: How Our Functional Testing Enabled a Healthcare I...Enhancing ICU Intelligence: How Our Functional Testing Enabled a Healthcare I...
Enhancing ICU Intelligence: How Our Functional Testing Enabled a Healthcare I...
Impelsys Inc.
 
Heap, Types of Heap, Insertion and Deletion
Heap, Types of Heap, Insertion and DeletionHeap, Types of Heap, Insertion and Deletion
Heap, Types of Heap, Insertion and Deletion
Jaydeep Kale
 
Technology Trends in 2025: AI and Big Data Analytics
Technology Trends in 2025: AI and Big Data AnalyticsTechnology Trends in 2025: AI and Big Data Analytics
Technology Trends in 2025: AI and Big Data Analytics
InData Labs
 
Linux Professional Institute LPIC-1 Exam.pdf
Linux Professional Institute LPIC-1 Exam.pdfLinux Professional Institute LPIC-1 Exam.pdf
Linux Professional Institute LPIC-1 Exam.pdf
RHCSA Guru
 
Semantic Cultivators : The Critical Future Role to Enable AI
Semantic Cultivators : The Critical Future Role to Enable AISemantic Cultivators : The Critical Future Role to Enable AI
Semantic Cultivators : The Critical Future Role to Enable AI
artmondano
 
AI EngineHost Review: Revolutionary USA Datacenter-Based Hosting with NVIDIA ...
AI EngineHost Review: Revolutionary USA Datacenter-Based Hosting with NVIDIA ...AI EngineHost Review: Revolutionary USA Datacenter-Based Hosting with NVIDIA ...
AI EngineHost Review: Revolutionary USA Datacenter-Based Hosting with NVIDIA ...
SOFTTECHHUB
 
ThousandEyes Partner Innovation Updates for May 2025
ThousandEyes Partner Innovation Updates for May 2025ThousandEyes Partner Innovation Updates for May 2025
ThousandEyes Partner Innovation Updates for May 2025
ThousandEyes
 
UiPath Community Berlin: Orchestrator API, Swagger, and Test Manager API
UiPath Community Berlin: Orchestrator API, Swagger, and Test Manager APIUiPath Community Berlin: Orchestrator API, Swagger, and Test Manager API
UiPath Community Berlin: Orchestrator API, Swagger, and Test Manager API
UiPathCommunity
 
Complete Guide to Advanced Logistics Management Software in Riyadh.pdf
Complete Guide to Advanced Logistics Management Software in Riyadh.pdfComplete Guide to Advanced Logistics Management Software in Riyadh.pdf
Complete Guide to Advanced Logistics Management Software in Riyadh.pdf
Software Company
 
Generative Artificial Intelligence (GenAI) in Business
Generative Artificial Intelligence (GenAI) in BusinessGenerative Artificial Intelligence (GenAI) in Business
Generative Artificial Intelligence (GenAI) in Business
Dr. Tathagat Varma
 
Massive Power Outage Hits Spain, Portugal, and France: Causes, Impact, and On...
Massive Power Outage Hits Spain, Portugal, and France: Causes, Impact, and On...Massive Power Outage Hits Spain, Portugal, and France: Causes, Impact, and On...
Massive Power Outage Hits Spain, Portugal, and France: Causes, Impact, and On...
Aqusag Technologies
 
Greenhouse_Monitoring_Presentation.pptx.
Greenhouse_Monitoring_Presentation.pptx.Greenhouse_Monitoring_Presentation.pptx.
Greenhouse_Monitoring_Presentation.pptx.
hpbmnnxrvb
 
Mobile App Development Company in Saudi Arabia
Mobile App Development Company in Saudi ArabiaMobile App Development Company in Saudi Arabia
Mobile App Development Company in Saudi Arabia
Steve Jonas
 
Transcript: #StandardsGoals for 2025: Standards & certification roundup - Tec...
Transcript: #StandardsGoals for 2025: Standards & certification roundup - Tec...Transcript: #StandardsGoals for 2025: Standards & certification roundup - Tec...
Transcript: #StandardsGoals for 2025: Standards & certification roundup - Tec...
BookNet Canada
 

Optimizing shared caches in chip multiprocessors

  • 1. Core 2 Duo die “Just a few years ago, the idea of putting multiple processors on a chip was farfetched. Now it is accepted and commonplace, and virtually every new high performance processor is a chip multiprocessor of some sort…” Center for Electronic System Design Univ. of California Berkeley Chip Multiprocessors?? “Mowry is working on the development of single-chip multiprocessors: one large chip capable of performing multiple operations at once, using similar techniques to maximize performance” -- Technology Review, 1999 Sony's Playstation 3, 2006
  • 2. CMP Caches: Design Space • Architecture – Placement of Cache/Processors – Interconnects/Routing • Cache Organization & Management – Private/Shared/Hybrid – Fully Hardware/OS Interface “L2 is the last line of defense before hitting the memory wall, and is the focus of our talk”
  • 3. Private L2 Cache I$ D$ I$ D$ L2 $ L2 $ L2 $ L2 $ L2 $ L2 $ I N T E R C O N N E C T Coherence Protocol Offchip Memory + Less interconnect traffic + Insulates L2 units + Hit latency – Duplication – Load imbalance – Complexity of coherence – Higher miss rate L1 L1 Proc
  • 4. Shared-Interleaved L2 Cache – Interconnect traffic – Interference between cores – Hit latency is higher + No duplication + Balance the load + Lower miss rate + Simplicity of coherence I$ D$ I$ D$ I N T E R C O N N E C T Coherence ProtocolL1 L2
  • 5. Take Home Message • Leverage on-chip access time
  • 6. Take Home Messages • Leverage on-chip access time • Better sharing of cache resources • Isolating performance of processors • Place data on the chip close to where it is used • Minimize inter-processor misses (in shared cache) • Fairness towards processors
  • 7. On to some solutions… Jichuan Chang and Gurindar S. Sohi Cooperative Caching for Chip Multiprocessors International Symposium on Computer Architecture, 2006. Nikos Hardavellas, Michael Ferdman, Babak Falsafi, and Anastasia Ailamaki Reactive NUCA: Near-Optimal Block Placement and Replication in Distributed Caches International Symposium on Computer Architecture, 2009. Shekhar Srikantaiah, Mahmut Kandemir, and Mary Jane Irwin Adaptive Set-Pinning: Managing Shared Caches in Chip Multiprocessors Architectural Support for Programming Languages and Operating, Systems 2008. each handles this problem in a different way
  • 8. Co-operative Caching (Chang & Sohi) • Private L2 caches • Attract data locally to reduce remote on chip access. Lowers average on-chip misses. • Co-operation among the private caches for efficient use of resources on the chip. • Controlling the extent of co-operation to suit the dynamic workload behavior
  • 9. CC Techniques • Cache to cache transfer of clean data – In case of miss transfer “clean” blocks from another L2 cache. – This is useful in the case of “read only” data (instructions) . • Replication aware data replacement – Singlet/Replicate. – Evict singlet only when no replicates exist. – Singlets can be “spilled” to other cache banks. • Global replacement of inactive data – Global management needed for managing “spilling”. – N-Chance Forwarding. – Set recirculation count to N when spilled. – Decrease N by 1 when spilled again, unless N becomes 0.
  • 10. Set “Pinning” -- Setup P1 P2 P3 P4 Set 0 Set 1 : : Set (S-1) L1 cache Processors Shared L2 cache I n t e r c o n n e c t Main Memory
  • 11. Set “Pinning” -- Problem P1 P2 P3 P4 Set 0 Set 1 : : Set (S-1) Main Memory
  • 12. Set “Pinning” -- Types of Cache Misses • Compulsory (aka Cold) • Capacity • Conflict • Coherence • Compulsory • Inter-processor • Intra-processor versus
  • 13. P1 P2 P3 P4 Main Memory POP 1 POP 2 POP 3 POP 4 Set : : Set Owner Other bits Data
  • 14. R-NUCA: Use Class-Based Strategies Solve for the common case! Most current (and future) programs have the following types of accesses 1. Instruction Access – Shared, but Read-Only 2. Private Data Access – Read-Write, but not Shared 3. Shared Data Access – Read-Write (or) Read-Only, but Shared.
  • 15. R-NUCA: Can do this online! • We have information from the OS and TLB • For each memory block, classify it as – Instruction – Private Data – Shared Data • Handle them differently – Replicate instructions – Keep private data locally – Keep shared data globally
  • 16. R-NUCA: Reactive Clustering • Assign clusters based on level of sharing – Private Data given level-1 clusters (local cache) – Shared Data given level-16 clusters (16 neighboring machines), etc. Clusters ≈ Overlapping Sets in Set-Associative Mapping • Within a cluster, “Rotational Interleaving” – Load-Balancing to minimize contention on bus and controller
  • 18. Just Kidding… • Optimize for Power Consumption • Assess trade-offs between more caches and more cores • Minimize usage of OS, but still retain flexibility • Application adaptation to allocated cache quotas • Adding hardware directed thread level speculation
  • 20. Backup • Commercial and research prototypes – Sun MAJC – Piranha – IBM Power 4/5 – Stanford Hydra