Advanced Computer Architecture Fundamentals of Computer Design
Outline
Ajou Univ.
Multimedia
Communications
SOC Lab.
[Figure: Performance (vs. VAX-11/780), log scale from 1 to 1000, plotted for 1978 to 2006]
VAX: 25%/year 1978 to 1986
RISC + x86: 52%/year 1986 to 2002
RISC + x86: ??%/year 2002 to present
Outline
software
instruction set
hardware
Example: MIPS
Programmable storage
2^32 x bytes
31 x 32-bit GPRs (R0=0)
HI, LO, PC
[Register figure: r0, r1, ..., r31, PC, lo, hi]
Data types? Format? Addressing Modes? Operations?
Arithmetic logical
Add, AddU, Sub, SubU, And, Or, Xor, Nor, SLT, SLTU,
AddI, AddIU, SLTI, SLTIU, AndI, OrI, XorI, LUI
SLL, SRL, SRA, SLLV, SRLV, SRAV
Memory Access
LB, LBU, LH, LHU, LW, LWL, LWR
SB, SH, SW, SWL, SWR
Control
SOFTWARE
-- Organization of Programmable Storage
-- Data Types & Data Structures: Encodings & Representations
-- Instruction Formats
-- Instruction (or Operation Code) Set
-- Modes of Addressing and Accessing Data Items and Instructions
-- Exceptional Conditions
Outline
Disks, Memory, Network, Processors
Bandwidth: 0.6 MBytes/sec
Latency: 48.3 ms
Cache: none
Ratios shown on the slide: (4X), (2500X), (80X), (60X), (140X), (8X)
[Figure: Relative BW Improvement (log scale, 1 to 10000) plotted against relative latency improvement (log scale, 1 to 100). Series: Disk. Reference line: Latency improvement = Bandwidth improvement]
[Figure: Relative BW Improvement (log scale, 1 to 10000) plotted against relative latency improvement (log scale, 1 to 100). Series: Memory, Disk. Reference line: Latency improvement = Bandwidth improvement]
(latency = simple operation w/o contention, BW = best-case)
Ethernet 802.3ae
Year of Standard: 2003
10,000 Mbits/s link speed (1000X)
Latency: 190 µsec (15X)
Switched media
Category 5 copper wire
[Figure labels: "Copper core", "Twisted Pair"]
[Figure: Relative BW Improvement (log scale, 1 to 10000) plotted against relative latency improvement (log scale, 1 to 100). Series: Network, Memory, Disk. Reference line: Latency improvement = Bandwidth improvement]
[Figure: Relative BW Improvement (log scale, 1 to 10000) plotted against relative latency improvement (log scale, 1 to 100). Series: Processor, Network, Memory, Disk. Reference line: Latency improvement = Bandwidth improvement]
CPU high, Memory low (Memory Wall)
Outline
$$\text{Power}_{\text{dynamic}} = \frac{1}{2} \times \text{CapacitiveLoad} \times \text{Voltage}^2 \times \text{FrequencySwitched}$$
For mobile devices, energy better metric
$$\text{Energy}_{\text{dynamic}} = \text{CapacitiveLoad} \times \text{Voltage}^2$$
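$$\text{Power}_{\text{dynamic}} = \frac{1}{2} \times 0.85 \times \text{CapacitiveLoad} \times (0.85 \times \text{Voltage})^2 \times \text{FrequencySwitched}$$
$$= (0.85)^3 \times \text{OldPower}_{\text{dynamic}} \approx 0.6 \times \text{OldPower}_{\text{dynamic}}$$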
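The scaling on this slide can be checked numerically. The following is a minimal sketch, assuming the dynamic power relation Power = 1/2 x CapacitiveLoad x Voltage^2 x FrequencySwitched and normalized inputs of 1.0; the function names and the normalized values are illustrative assumptions, not part of the slides.

```python
def dynamic_power(capacitive_load, voltage, frequency_switched):
    """Dynamic power: 1/2 * C * V^2 * f (units depend on the inputs)."""
    return 0.5 * capacitive_load * voltage ** 2 * frequency_switched


def dynamic_energy(capacitive_load, voltage):
    """Dynamic energy per transition pair: C * V^2."""
    return capacitive_load * voltage ** 2


# Normalized baseline (assumed values, chosen only to expose the ratio).
old_power = dynamic_power(1.0, 1.0, 1.0)
# Voltage and switching frequency both scaled by 0.85.
new_power = dynamic_power(1.0, 0.85, 0.85)

power_ratio = new_power / old_power
print(round(power_ratio, 3))  # 0.614, i.e. about 0.6 x the old dynamic power
```

Because voltage enters squared, lowering it reduces power more than lowering frequency by the same factor.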
Infrastructure providers now offer Service Level Agreements (SLA)
to guarantee that their networking or power service would be dependable
1.
2.
Outline
Definition: Performance
Performance is in units of things per sec
bigger is better
If we are primarily concerned with response time
$$\text{performance}(x) = \frac{1}{\text{execution\_time}(x)}$$
$$\frac{\text{Performance}(X)}{\text{Performance}(Y)} = \frac{\text{Execution\_time}(Y)}{\text{Execution\_time}(X)}$$
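As a small worked instance of the definition above, the sketch below compares two machines by execution time. The function names and the timings (10 s and 15 s) are hypothetical values chosen for illustration.

```python
def performance(execution_time_s):
    """Performance is the reciprocal of execution time (things per second)."""
    return 1.0 / execution_time_s


def times_faster(execution_time_x, execution_time_y):
    """How many times faster X is than Y: Execution_time(Y) / Execution_time(X)."""
    return execution_time_y / execution_time_x


# Hypothetical measurements: X takes 10 s, Y takes 15 s on the same program.
ratio_from_times = times_faster(10.0, 15.0)
ratio_from_performance = performance(10.0) / performance(15.0)
print(ratio_from_times)  # 1.5
```

Both ways of forming the ratio agree (up to floating-point rounding), which is the point of defining performance as the reciprocal of execution time.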
CPU only, split between integer and floating point programs
SPECint2000 has 12 integer programs, SPECfp2000 has 14 floating-point programs
SPECCPU2006 to be announced Spring 2006
SPECSFS (NFS file server) and SPECWeb (WebServer) added as server benchmarks
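$$1.25 = \frac{\text{SPECRatio}_A}{\text{SPECRatio}_B} = \frac{\dfrac{\text{ExecutionTime}_{\text{reference}}}{\text{ExecutionTime}_A}}{\dfrac{\text{ExecutionTime}_{\text{reference}}}{\text{ExecutionTime}_B}} = \frac{\text{ExecutionTime}_B}{\text{ExecutionTime}_A} = \frac{\text{Performance}_A}{\text{Performance}_B}$$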
Note that when comparing 2 computers as a ratio, execution times
on the reference computer drop out, so choice of reference
computer is irrelevant
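The cancellation of the reference machine can be seen numerically. This is a minimal sketch, assuming SPECRatio is the reference execution time divided by the measured execution time; the timings and the two reference values are made-up numbers for illustration.

```python
def spec_ratio(reference_time_s, measured_time_s):
    """SPECRatio: reference machine time divided by measured machine time."""
    return reference_time_s / measured_time_s


# Hypothetical execution times for computers A and B on one benchmark.
time_a, time_b = 8.0, 10.0

# Try two different (hypothetical) reference machines.
ratios = [
    spec_ratio(reference, time_a) / spec_ratio(reference, time_b)
    for reference in (100.0, 2500.0)
]
print(ratios)  # [1.25, 1.25]
```

The ratio of SPECRatios equals ExecutionTime_B / ExecutionTime_A = 10 / 8 regardless of which reference time is used.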
$$\text{GeometricMean} = \sqrt[n]{\prod_{i=1}^{n} \text{SPECRatio}_i}$$
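A short sketch of the geometric mean follows. It computes the n-th root of the product through logarithms, which is mathematically equivalent and avoids overflow for long lists; the sample SPECRatio values are hypothetical.

```python
import math


def geometric_mean(ratios):
    """n-th root of the product of the ratios, computed via the mean of logs."""
    if not ratios:
        raise ValueError("need at least one ratio")
    return math.exp(sum(math.log(r) for r in ratios) / len(ratios))


# Hypothetical SPECRatios for two machines on the same three benchmarks.
machine_a = [2.0, 8.0, 4.0]
machine_b = [1.0, 4.0, 8.0]

gm_a = geometric_mean(machine_a)
gm_b = geometric_mean(machine_b)

# Ratio of geometric means equals the geometric mean of per-benchmark ratios.
gm_of_ratios = geometric_mean([a / b for a, b in zip(machine_a, machine_b)])
print(round(gm_a, 6), round(gm_b, 6))
```

The last property is why the geometric mean suits ratios: comparing two machines gives the same answer whether the ratios are averaged first or divided first.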
[Figure: SPECfpRatio (y-axis, 0 to 12000) for each SPECfp2000 benchmark: wupwise, swim, mgrid, applu, mesa, galgel, art, equake, facerec, ammp, lucas, fma3d, sixtrack, apsi]
GM = 2712, GSTDEV = 1.98
Marked levels: 1372, 2712, 5362
[Figure: SPECfpRatio (y-axis, 0 to 12000) for each SPECfp2000 benchmark: wupwise, swim, mgrid, applu, mesa, galgel, art, equake, facerec, ammp, lucas, fma3d, sixtrack, apsi]
GM = 2086, GSTDEV = 1.40
Marked levels: 1494, 2086, 2911
Outline
[Figure: pipelined execution. Instructions in program order (vertical axis, "Instr. Order") against Cycle 1 to Cycle 7 (horizontal axis); each instruction passes through Ifetch, Reg, ALU, DMem, Reg in successive cycles]
Limits to pipelining
Hazards prevent next instruction from executing during its
designated clock cycle
Structural hazards: attempt to use the same hardware to do two
different things at once
Data hazards: Instruction depends on result of prior instruction still in
the pipeline
Control hazards: Caused by delay between the fetching of instructions
and decisions about changes in control flow (branches and jumps).
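The data-hazard case can be illustrated with a small checker. This is a simplified sketch, not a pipeline model: it flags read-after-write dependences between an instruction and the next few instructions, where the lookahead `window` of 2 is an assumed parameter, and it ignores forwarding and intervening overwrites. The `Instr` type and register names are hypothetical.

```python
from typing import List, NamedTuple, Optional, Tuple


class Instr(NamedTuple):
    op: str
    dest: Optional[str]      # register written, or None
    srcs: Tuple[str, ...]    # registers read


def raw_hazards(program: List[Instr], window: int = 2):
    """Return (producer_index, consumer_index, register) for each
    read-after-write dependence within `window` following instructions."""
    hazards = []
    for i, producer in enumerate(program):
        if producer.dest is None:
            continue
        for j in range(i + 1, min(i + 1 + window, len(program))):
            if producer.dest in program[j].srcs:
                hazards.append((i, j, producer.dest))
    return hazards


program = [
    Instr("add", "r1", ("r2", "r3")),
    Instr("sub", "r4", ("r1", "r5")),   # reads r1 right after it is written
    Instr("and", "r6", ("r1", "r7")),   # still within the assumed window
    Instr("or", "r8", ("r9", "r10")),   # independent
]
found = raw_hazards(program)
print(found)  # [(0, 1, 'r1'), (0, 2, 'r1')]
```

A real pipeline would resolve many of these with forwarding rather than stalls; the sketch only identifies where the dependence exists.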
[Figure: pipeline diagram. Instructions in program order (vertical axis, "Instr. Order") against Time (clock cycles) on the horizontal axis; each instruction passes through Ifetch, Reg, ALU, DMem, Reg]
Capacity, Access Time, Cost (with staging transfer unit between levels)
Upper Level (faster)
CPU Registers: 100s Bytes, 300-500 ps (0.3-0.5 ns)
    Instr. Operands: 1-8 bytes, managed by prog./compiler
L1 and L2 Cache: 10s-100s K Bytes, ~1 ns - ~10 ns, $1000s/GByte
    L1 Cache Blocks: 32-64 bytes, managed by cache cntl
    L2 Cache Blocks: 64-128 bytes, managed by cache cntl
Main Memory: G Bytes, 80 ns - 200 ns, ~$100/GByte
    Pages: 4K-8K bytes, managed by OS
Disk: 10s T Bytes, 10 ms (10,000,000 ns), ~$1/GByte
    Files: Mbytes, managed by user/operator
Tape: infinite, sec-min, ~$1/GByte
Lower Level (larger)
Frequent case is often simpler and can be done faster than the
infrequent case
E.g., overflow is rare when adding 2 numbers, so improve performance
by optimizing more common case of no overflow
May slow down overflow, but overall performance improved by
optimizing for the normal case
What is the frequent case, and how much is performance improved by
making that case faster? => Amdahl's Law
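4) Amdahl's Law
$$\text{ExTime}_{\text{new}} = \text{ExTime}_{\text{old}} \times \left[ (1 - \text{Fraction}_{\text{enhanced}}) + \frac{\text{Fraction}_{\text{enhanced}}}{\text{Speedup}_{\text{enhanced}}} \right]$$
$$\text{Speedup}_{\text{overall}} = \frac{\text{ExTime}_{\text{old}}}{\text{ExTime}_{\text{new}}} = \frac{1}{(1 - \text{Fraction}_{\text{enhanced}}) + \dfrac{\text{Fraction}_{\text{enhanced}}}{\text{Speedup}_{\text{enhanced}}}}$$
$$\text{Speedup}_{\text{maximum}} = \frac{1}{1 - \text{Fraction}_{\text{enhanced}}}$$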
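$$\text{Speedup}_{\text{overall}} = \frac{1}{(1 - \text{Fraction}_{\text{enhanced}}) + \dfrac{\text{Fraction}_{\text{enhanced}}}{\text{Speedup}_{\text{enhanced}}}} = \frac{1}{(1 - 0.4) + \dfrac{0.4}{10}} = \frac{1}{0.64} = 1.56$$
Apparently, it's human nature to be attracted by 10X
faster, vs. keeping in perspective it's just 1.6X faster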
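The arithmetic of this example can be reproduced with a few lines. This is a minimal sketch of Amdahl's Law using the slide's values (fraction enhanced 0.4, enhancement speedup 10); the function names are illustrative.

```python
def amdahl_speedup(fraction_enhanced, speedup_enhanced):
    """Overall speedup: 1 / ((1 - F) + F / S)."""
    return 1.0 / ((1.0 - fraction_enhanced) + fraction_enhanced / speedup_enhanced)


def amdahl_limit(fraction_enhanced):
    """Upper bound on overall speedup as the enhancement becomes infinitely fast."""
    return 1.0 / (1.0 - fraction_enhanced)


overall = amdahl_speedup(0.4, 10.0)
limit = amdahl_limit(0.4)
print(round(overall, 2))  # 1.56
print(round(limit, 2))    # 1.67
```

Even an infinitely fast enhancement of 40% of the execution time cannot exceed about 1.67X overall, so the 10X figure gives only 1.56X.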
$$\text{CPU time} = \frac{\text{Seconds}}{\text{Program}} = \frac{\text{Instructions}}{\text{Program}} \times \frac{\text{Cycles}}{\text{Instruction}} \times \frac{\text{Seconds}}{\text{Cycle}}$$

                Inst Count    CPI    Clock Rate
Program             X
Compiler            X         (X)
Inst. Set.          X          X
Organization                   X         X
Technology                               X
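A small numeric instance of the CPU time equation is sketched below. It assumes CPU time = instruction count x CPI x cycle time, with cycle time taken as 1 / clock rate; the instruction count, CPI values, and clock rate are hypothetical.

```python
def cpu_time_seconds(instruction_count, cpi, clock_rate_hz):
    """CPU time = instructions x cycles per instruction x seconds per cycle."""
    cycle_time_s = 1.0 / clock_rate_hz
    return instruction_count * cpi * cycle_time_s


# Hypothetical program: 1e9 instructions at CPI 2.0 on a 1 GHz clock.
baseline = cpu_time_seconds(1e9, 2.0, 1e9)
# Same program and clock, but an organization change lowers CPI to 1.5.
improved = cpu_time_seconds(1e9, 1.5, 1e9)
print(baseline, improved)
```

Changing one factor at a time, as here with CPI, mirrors the table: each design level (program, compiler, instruction set, organization, technology) influences specific factors of the product.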
[Figure labels: latch or register, combinational logic]