Elements of Cache Design Pentium IV Cache Organization
Addressing
Size
Mapping Function
Replacement Algorithm
Write Policy
Block Size
Number of Caches
Cost
More cache is more expensive
Speed
More cache is faster (up to a point)
Checking cache for data takes time
Cache of 64 KBytes
Cache block of 4 bytes
i.e. the cache holds 16K (2^14) lines of 4 bytes
Main memory of 16 MBytes (2^24 = 16M)
Address structure: Tag (s-r) 8 bits | Line or slot (r) 14 bits | Word (w) 2 bits
24-bit address
2-bit word identifier (4-byte block)
22-bit block identifier
8-bit tag (= 22 - 14)
14-bit slot or line
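The tag/line/word split above comes down to a few shifts and masks. Below is a minimal Python sketch that assumes byte addresses and the example's 8/14/2 field widths. The name `split_direct` and the sample address 0x16339C are illustrative choices and do not come from the text.

```python
# Field widths from the direct-mapping example: 24-bit address,
# 64 KByte cache of 4-byte blocks (16K = 2**14 lines).
WORD_BITS = 2                            # selects a byte within the 4-byte block
LINE_BITS = 14                           # selects one of the 2**14 cache lines
TAG_BITS = 24 - LINE_BITS - WORD_BITS    # remaining 8 bits are stored as the tag


def split_direct(addr: int) -> tuple:
    """Split a 24-bit address into its (tag, line, word) fields."""
    if not 0 <= addr < (1 << 24):
        raise ValueError("address must fit in 24 bits")
    word = addr & ((1 << WORD_BITS) - 1)
    line = (addr >> WORD_BITS) & ((1 << LINE_BITS) - 1)
    tag = addr >> (WORD_BITS + LINE_BITS)
    return tag, line, word


tag, line, word = split_direct(0x16339C)
print(f"tag={tag:#04x} line={line:#06x} word={word}")  # tag=0x16 line=0x0ce7 word=0
```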
No two blocks in the same line have the same Tag field
Check contents of cache by finding line and checking Tag
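The lookup rule (find the line, then compare its stored tag) can be modelled by keeping only the tags. The sketch below is a simplification: the class `DirectMappedCache` is hypothetical, it stores no data, and it ignores valid bits apart from using `None` to mark an empty line. It also shows how two blocks that share a line keep evicting each other.

```python
from typing import List, Optional


class DirectMappedCache:
    """Tag-only model of the example direct-mapped cache (no data is stored)."""

    def __init__(self, line_bits: int = 14, word_bits: int = 2) -> None:
        self.line_bits = line_bits
        self.word_bits = word_bits
        # One tag per line; None marks a line that holds no block yet.
        self.tags: List[Optional[int]] = [None] * (1 << line_bits)

    def access(self, addr: int) -> bool:
        """Return True on a hit. On a miss, load the block into its one line."""
        line = (addr >> self.word_bits) & ((1 << self.line_bits) - 1)
        tag = addr >> (self.word_bits + self.line_bits)
        if self.tags[line] == tag:     # find the line, then check the tag
            return True
        self.tags[line] = tag          # no choice: overwrite whatever was there
        return False


cache = DirectMappedCache()
print(cache.access(0x16339C))  # False: cold miss
print(cache.access(0x16339C))  # True: same block, same line, same tag
print(cache.access(0x00339C))  # False: same line (0x0CE7), different tag
print(cache.access(0x16339C))  # False: the previous access evicted this block
```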
Simple
Inexpensive
Fixed location for given block
Associative mapping address structure: Tag 22 bits | Word 2 bits
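In associative mapping the whole 22-bit block number serves as the tag, and a block may sit in any line, so a lookup has to examine every line's tag. The Python sketch below is illustrative. The function `associative_lookup` and the 4-line cache are hypothetical, and the loop is sequential, whereas real hardware compares all tags in parallel.

```python
from typing import List, Optional

WORD_BITS = 2  # 4-byte blocks; the other 22 bits of the 24-bit address form the tag


def associative_lookup(tags: List[Optional[int]], addr: int) -> Optional[int]:
    """Return the index of the line holding addr's block, or None on a miss.

    Any block may sit in any line, so every line's tag must be examined.
    Hardware compares all tags in parallel; this loop compares them one by one.
    """
    tag = addr >> WORD_BITS
    for index, line_tag in enumerate(tags):
        if line_tag == tag:
            return index
    return None


tags: List[Optional[int]] = [None] * 4   # a hypothetical 4-line cache
tags[2] = 0x16339C >> WORD_BITS           # block 0x058CE7 happens to sit in line 2
print(associative_lookup(tags, 0x16339E))  # 2: same block, different word
print(associative_lookup(tags, 0x1633A0))  # None: next block is not cached
```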
Set associative mapping address structure (two-way): Tag 9 bits | Set 13 bits | Word 2 bits
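The 9/13/2 split above implies 2^13 = 8K sets. With the cache's 16K lines, that gives two lines (ways) per set, and a lookup searches only those two lines. The sketch below uses hypothetical names (`split_set_assoc`, `set_assoc_lookup`) and tracks tags only.

```python
from typing import List, Optional, Tuple

WORD_BITS = 2   # 4-byte blocks
SET_BITS = 13   # 2**13 = 8K sets; 16K lines / 8K sets = 2 lines (ways) per set
WAYS = 2


def split_set_assoc(addr: int) -> Tuple[int, int, int]:
    """Split a 24-bit address into (tag, set, word): 9 + 13 + 2 bits."""
    word = addr & ((1 << WORD_BITS) - 1)
    set_index = (addr >> WORD_BITS) & ((1 << SET_BITS) - 1)
    tag = addr >> (WORD_BITS + SET_BITS)
    return tag, set_index, word


def set_assoc_lookup(sets: List[List[Optional[int]]], addr: int) -> Optional[int]:
    """Return the way holding addr's block, or None on a miss.

    Only the WAYS lines of one set are searched, not the whole cache.
    """
    tag, set_index, _ = split_set_assoc(addr)
    for way, line_tag in enumerate(sets[set_index]):
        if line_tag == tag:
            return way
    return None


sets = [[None] * WAYS for _ in range(1 << SET_BITS)]
tag, set_index, word = split_set_assoc(0x16339C)
print(hex(tag), hex(set_index), word)    # 0x2c 0xce7 0
sets[set_index][1] = tag                 # pretend the block was loaded into way 1
print(set_assoc_lookup(sets, 0x16339C))  # 1
```

Here 0x16339C and 0x00339C fall in the same set but carry different tags. Unlike the direct-mapped case, the two blocks can be resident at the same time, one in each way.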
Direct mapping: no choice
Each block only maps to one line
Replace that line
Associative and set associative mapping: choose the line to replace, e.g. at random
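Random replacement inside one set can be sketched as follows. The helpers `choose_victim` and `load_block` are hypothetical. The sketch assumes an empty line is filled before anything is evicted, and only a full set has its victim picked uniformly at random. The generator is seeded solely to make runs repeatable.

```python
import random
from typing import List, Optional


def choose_victim(ways: List[Optional[int]], rng: random.Random) -> int:
    """Pick which line (way) of a set receives an incoming block.

    Assumption: an empty line is used first; only when every line in the
    set is occupied is the victim chosen uniformly at random.
    """
    for way, tag in enumerate(ways):
        if tag is None:
            return way
    return rng.randrange(len(ways))


def load_block(ways: List[Optional[int]], tag: int, rng: random.Random) -> int:
    """Place tag in the set using random replacement; return the way used."""
    way = choose_victim(ways, rng)
    ways[way] = tag
    return way


rng = random.Random(0)          # seeded only so runs are repeatable
one_set = [None, None]          # one set of a two-way cache
load_block(one_set, 0x2C, rng)  # fills way 0
load_block(one_set, 0x00, rng)  # fills way 1
load_block(one_set, 0x15, rng)  # set full: evicts way 0 or way 1 at random
print(one_set)
```

Random replacement keeps no usage history per set. That makes it cheap to implement, at the cost of sometimes evicting a block that is about to be reused.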
Larger blocks
Pentium 4 L1 caches
8 KBytes
64-byte lines
Four-way set associative
L2 cache
L3 cache on chip
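The Pentium 4 L1 figures above (8 KBytes, 64-byte lines, four-way) determine the cache's internal layout. The helper `cache_geometry` below is hypothetical and assumes power-of-two sizes. The 32-bit address width used for the tag is also an assumption, since the text does not state it. The last two calls check the helper against the earlier 64 KByte examples.

```python
def cache_geometry(size_bytes: int, line_bytes: int, ways: int, addr_bits: int) -> dict:
    """Derive lines, sets and address-field widths for a set-associative cache.

    ways=1 gives a direct-mapped cache; sizes are assumed to be powers of two.
    """
    for value in (size_bytes, line_bytes, ways):
        if value <= 0 or value & (value - 1):
            raise ValueError("sizes and associativity must be powers of two")
    lines = size_bytes // line_bytes
    sets = lines // ways
    offset_bits = line_bytes.bit_length() - 1   # log2(line size)
    index_bits = sets.bit_length() - 1          # log2(number of sets)
    return {
        "lines": lines,
        "sets": sets,
        "offset_bits": offset_bits,
        "index_bits": index_bits,
        "tag_bits": addr_bits - index_bits - offset_bits,
    }


# Pentium 4 L1 figures from the text; the 32-bit address width is an assumption.
print(cache_geometry(8 * 1024, 64, 4, 32))
# {'lines': 128, 'sets': 32, 'offset_bits': 6, 'index_bits': 5, 'tag_bits': 21}

# Cross-check against the earlier 64 KByte examples (24-bit addresses).
print(cache_geometry(64 * 1024, 4, 1, 24)["tag_bits"])  # 8  (direct mapped)
print(cache_geometry(64 * 1024, 4, 2, 24)["tag_bits"])  # 9  (two-way)
```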
Problem | Solution | Processor on which feature first appears
External memory slower than the system bus. | Add external cache using faster memory technology. | 386
Increased processor speed results in the external bus becoming a bottleneck for cache access. | Move external cache on-chip, operating at the same speed as the processor. | 486
Internal cache is rather small, due to limited space on chip. | Add external L2 cache using faster technology than main memory. | 486
Contention occurs when both the Instruction Prefetcher and the Execution Unit simultaneously require access to the cache. In that case, the Prefetcher is stalled while the Execution Unit's data access takes place. | Create separate data and instruction caches. | Pentium
Increased processor speed results in the external bus becoming a bottleneck for L2 cache access. | Create a separate back-side bus (BSB) that runs at higher speed than the main (front-side) external bus. The BSB is dedicated to the L2 cache. | Pentium Pro
(same problem as previous row) | Move L2 cache onto the processor chip. | Pentium II
Some applications deal with massive databases and must have rapid access to large amounts of data. The on-chip caches are too small. | Add external L3 cache. | Pentium III
(same problem as previous row) | Move L3 cache on-chip. | Pentium 4
Fetch/Decode Unit
Fetches instructions from L2 cache
Decodes them into micro-ops
Stores micro-ops in L1 cache
Execution units
Execute micro-ops
Data from L1 cache
Results in registers
Memory subsystem
L2 cache and system bus
Core | Cache Type | Cache Size (kB) | Cache Line Size (words) | Associativity | Location | Write Buffer Size (words)
ARM720T | Unified |  |  | 4-way | Logical | 
ARM920T | Split | 16/16 D/I |  | 64-way | Logical | 16
ARM926EJ-S | Split | 4-128/4-128 D/I |  | 4-way | Logical | 16
ARM1022E | Split | 16/16 D/I |  | 64-way | Logical | 16
ARM1026EJ-S | Split | 4-128/4-128 D/I |  | 4-way | Logical | 
Intel StrongARM | Split | 16/16 D/I |  | 32-way | Logical | 32
Intel XScale | Split | 32/32 D/I |  | 32-way | Logical | 32
ARM1136JF-S | Split | 4-64/4-64 D/I |  | 4-way | Physical | 32