Cache Conclusion and I/O Introduction: Storage Devices, Metrics, & Productivity
DAP.F96 1
Architecture
How to turn high memory bandwidth into performance?
Vector?
Extensive Prefetching?
Alpha 21064
Separate Instr & Data TLB & Caches
TLBs fully associative
TLB updates in SW (Priv Arch Libr)
Caches 8KB direct mapped, write thru
Critical 8 bytes first
Prefetch instr. stream buffer
2 MB L2 cache, direct mapped, WB (off-chip)
256 bit path to main memory, 4 x 64-bit modules
Victim Buffer: to give read priority over write
4 entry write buffer between D$ & L2$
[Diagram labels: Instr, Data, Write Buffer, Stream Buffer, Victim Buffer]
[Figure: Miss rate (log scale, 0.01% to 100%) for the 8K I$, 8K D$, and 2M L2 across benchmarks: Su2cor, Spice, Nasa7, Mdljp2, Hydro2d, Wave5, Alvinn, Tomcatv, Doduc, Ear, Swm256, Fpppp, Mdljsp2, Ora, Gcc, Compress, Sc, Li, Eqntott, Espresso, TPC-B (db2), TPC-B (db1), AlphaSort.]
Annotations on the figure:
I$ miss = 6%, D$ miss = 32%, L2 miss = 10%
I$ miss = 2%, D$ miss = 13%, L2 miss = 0.6%
I$ miss = 1%, D$ miss = 21%, L2 miss = 0.3%
[Figure: CPI (0.00 to 3.00) per benchmark, broken down into L2, I$, D$, I Stall, and Other. Benchmarks: Hydro2d, Mdljp2, Wave5, Tomcatv, Alvinn, Doduc, Swm256, Ear, Fpppp, Ora, Mdljsp2, Compress, Gcc, Sc, Eqntott, Li, Espresso, TPC-B (db1), TPC-B (db2), AlphaSort.]
[Figure: Miss rate (0% to 30%) for D: gcc, D: espresso, I: gcc, I: espresso, and I: tomcatv; horizontal axis ticks at 1, 16, 32, 64, 128.]
[Figure: Cumulative average memory access time (1 to 3.5) vs. instructions executed (billions, 0 to 12).]
I$ = 4 KB, B=16B
D$ = 4 KB, B=16B
L2 = 512 KB, B=128B
MP = 12, 200
I/O bottleneck:
Diminishing fraction of time in CPU
Diminishing value of faster CPUs
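A minimal sketch of the bottleneck argument above, assuming a hypothetical workload of 90 s of CPU work and 10 s of I/O (these numbers and the function name `elapsed_time` are illustrative, not from the slides). Only the CPU portion gets faster; the I/O time stays fixed:

```python
def elapsed_time(cpu_time, io_time, cpu_speedup):
    """Total time when only the CPU portion is sped up; I/O time is unchanged."""
    return cpu_time / cpu_speedup + io_time

# Hypothetical workload (assumption): 90 s of CPU work, 10 s of I/O.
cpu_time, io_time = 90.0, 10.0
for speedup in (1, 10, 100):
    total = elapsed_time(cpu_time, io_time, speedup)
    cpu_fraction = (cpu_time / speedup) / total
    print(f"CPU {speedup:>3}x faster: total {total:6.2f} s, CPU share {cpu_fraction:.0%}")
```

Under these assumptions a 100x faster CPU shortens the run by less than 10x, and the CPU share of total time falls from 90% to under 10%.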
I/O Systems
[Diagram labels: Processor, Cache, interrupts, three I/O Controllers, two Disks, Graphics, Network]
Technology Trends
CPU Performance
Mini: 40% increase per year
RISC: 100% increase per year
DRAM Capacity doubles every 3 years
Technology Trends
Disk Capacity doubles every 3 years
The I/O GAP
Data utilities
high capacity, hierarchically managed storage
Historical Perspectives
1956 IBM Ramac to early 1970s Winchester
Developed for mainframe computers
proprietary interfaces
Historical Perspective
1970s developments
5.25 inch floppy disk formfactor
download microcode into mainframe
Historical Perspective
Early 1980s
PCs and first generation workstations
Mid 1980s
Client/server computing
Centralized storage on file server
accelerates disk downsizing
8 inch to 5.25 inch
Historical Perspective
Late 1980s/Early 1990s:
Laptops, notebooks, palmtops
3.5 inch, 2.5 inch, 1.8 inch formfactors
Formfactor plus capacity drives market, not performance
Challenged by DRAM, flash RAM in PCMCIA cards
still expensive, Intel promises but doesn't deliver
unattractive MBytes per cubic inch
Historical Perspective
[Figure: Semiconductor memory revenue (millions of dollars, $0 to $30,000) and world population (millions), by year, 1970 to 1992.]
Historical Perspectives
[Figure: Disk capacity (TBytes) and memory capacity (TBytes), on a 0.0 to 9000.0 TBytes axis, with world population (millions), by year, 1988 to 1992.]
CS 252 Administrivia
Midterm Quiz Wednesday October 8
5:45 - 8:45 PM in 306 Soda
2 sheets with notes
Chapters 4, 5, and Ap B + Lectures
BPI
TPI
12000
22860
104
38
92
3000
minutes
seconds
45 secs
20 secs
15 secs?
1638
1870
71
114
492
183
1880
2235
63
62
3000
4250
18796
454
88
24130
18 ms
20 ms
100 ms
Track
Sector
Characteristics:
Cylinder
positional latency
rotational latency
Transfer rate
Capacity
Gigabytes
Quadruples every 3 years
(aerodynamics)
Head
Platter
Response time = Queue + Controller + Seek + Rot + Xfer
Service time = Controller + Seek + Rot + Xfer
Disk Latency = Queuing Time + Seek Time + Rotation Time + Xfer Time
Order of magnitude times for 4K byte transfers:
Seek: 12 ms or less
Rotate: 4.2 ms @ 7200 rpm (8.3 ms @ 3600 rpm )
Xfer: 1 ms @ 7200 rpm (2 ms @ 3600 rpm)
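The rotation figures above follow from taking the average rotational delay as half a revolution. A minimal sketch (the function name is illustrative, not from the slides):

```python
def avg_rotational_latency_ms(rpm):
    """Average rotational delay, taken as half a revolution, in milliseconds."""
    revolutions_per_second = rpm / 60.0
    return 0.5 / revolutions_per_second * 1000.0

for rpm in (7200, 3600):
    print(f"{rpm} rpm: {avg_rotational_latency_ms(rpm):.1f} ms")
```

This gives roughly 4.2 ms at 7200 rpm and 8.3 ms at 3600 rpm, matching the slide.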
Advantages of Small
Formfactor Disk Drives
Low cost/MB
High MB/volume
High MB/watt
Low cost/Actuator
Cost and Environmental Efficiencies
vs.
removable long strips wound on spool
(sequential access, "unlimited" length, multiple / reader)
R-DAT Technology
2000 RPM
Helical Scan
Tape
Type
5.25"
8mm
Capacity
0.75 GB
5 GB
$8
$3,000
Access
Robot Time 10 - 20 s
10 - 20 s
STC 4400
8 feet
10 feet
6000 x 0.8 GB 3490 tapes = 5 TBytes in 1992
$500,000 O.E.M. Price
6000 x 20 GB D3 tapes = 120 TBytes in 1994
1 Petabyte (1024 TBytes) in 2000
9.1 GB
3.5
4.3 GB
2.5
514 MB
1.1 GB
$2129
$1985
$1199
$999
$299
$345
$0.23/MB
$0.22/MB
$0.27/MB
$0.23/MB
$0.58/MB
$0.33/MB
$1695+199
$1499+189
$0.41/MB
$0.39/MB
$700
$1300
$3600
$175/MB
$32/MB
$20.50/MB
Optical Disks
5.25
4.6 GB
PCMCIA Cards
Static RAM
Flash RAM
4.0 MB
40.0 MB
175 MB
Metrics:
Response Time
Throughput
[Figure: Response time (ms; ticks at 100 and 200) vs. throughput (% of total BW, 0% to 100%).]
[Diagram labels: Queue, Proc, IOC, Device, 2nd transaction]
[Figure: Time per transaction (0.00 to 15.00), broken into entry, resp, and think components; labels include graphics and 1.0s.]
0.7 sec off response time saves 4.9 sec (34%) and 2.0 sec (70%) of total time per transaction => greater productivity
Another study: everyone gets more done with faster response, but a novice with fast response = an expert with slow response
Controller overhead is 2 ms
Assume that disk is idle so no queuing delay
What is Average Disk Access Time for a Sector?
Ave seek + ave rot delay + transfer time + controller overhead
= 12 ms + 0.5/(7200 RPM/60) + 8 KB/(4 MB/s) + 2 ms
= 12 + 4.17 + 2 + 2 = 20.17 ms, about 20 ms
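The calculation above can be checked with a short sketch. The parameter values are the ones that appear in the slide's expression (12 ms seek, 7200 RPM, 8 KB at 4 MB/s, 2 ms controller overhead). The function name and the decimal unit convention (1 MB = 1000 KB, which reproduces the slide's 2 ms transfer time) are assumptions:

```python
def disk_access_time_ms(seek_ms, rpm, transfer_kb, rate_mb_per_s, controller_ms):
    """Average access time with no queuing: seek + half a rotation + transfer + controller."""
    rotation_ms = 0.5 / (rpm / 60.0) * 1000.0
    # Assumption: 1 MB = 1000 KB, so MB/s equals KB/ms.
    transfer_ms = transfer_kb / rate_mb_per_s
    return seek_ms + rotation_ms + transfer_ms + controller_ms

total = disk_access_time_ms(seek_ms=12.0, rpm=7200, transfer_kb=8.0,
                            rate_mb_per_s=4.0, controller_ms=2.0)
print(f"{total:.2f} ms")
```

The result is a little over 20 ms; seek time dominates, with rotation the next largest term.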
Introduction to Queueing Theory
[Diagram labels: Arrivals, server (IOC, Device), Departures]
Little's Law: Lq = r x Tq
Mean number of customers = arrival rate x mean service time
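A minimal sketch of Little's Law as stated above. The function and parameter names are illustrative, and the example numbers (50 requests per second, 20 ms per request) are assumptions chosen only to show the arithmetic:

```python
def littles_law(arrival_rate, mean_time):
    """Mean number of customers = arrival rate x mean time per customer."""
    return arrival_rate * mean_time

# Hypothetical numbers (assumption): 50 requests/s, 0.020 s per request.
r = 50.0
t = 0.020
print(f"mean number of customers = {littles_law(r, t):.2f}")
```

The rate and the time must use the same time unit (here, seconds) for the product to be a dimensionless count.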
Tw = Ts x u x (1 + C) / (2 x (1 - u))
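A hedged sketch of the waiting-time formula above, reading its denominator as 2 x (1 - u). The slides' notation definitions did not survive, so the meanings used here are assumptions drawn from standard queueing theory: Ts is the mean service time, u the server utilization (0 <= u < 1), C the squared coefficient of variation of service time, and Tw the mean time spent waiting in the queue.

```python
def mean_wait_time(ts, u, c):
    """Tw = Ts * u * (1 + C) / (2 * (1 - u)); valid only for 0 <= u < 1."""
    if not 0.0 <= u < 1.0:
        raise ValueError("utilization u must satisfy 0 <= u < 1")
    return ts * u * (1.0 + c) / (2.0 * (1.0 - u))

# Hypothetical example (assumption): 20 ms service time, C = 1.
for u in (0.5, 0.9):
    print(f"u = {u}: Tw = {mean_wait_time(20.0, u, 1.0):.1f} ms")
```

With these assumed inputs the wait grows from about 20 ms at 50% utilization to about 180 ms at 90%, which illustrates how sharply queueing delay rises as u approaches 1.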
Notation:
r
Ts
u
Tw
Tq
Lw
Lq