HPCA - Recap and Moving Forward: Soumyajit Dey, Assistant Professor, CSE, IIT Kharagpur
Classic 5 stage RISC pipeline The Memory Hierarchy Instruction Level Parallelism (ILP)
What is Computer Architecture?
(shall study later)
Ref: Hennessy and Patterson books
Addressing modes
Technology Trends
◮ Transistor density increases by about 35% per year, and die size increases by 10% to 20% per year. The combined effect is a growth rate in transistor count on a chip of about 40% to 55% per year, or doubling every 18 to 24 months. This trend is popularly known as Moore's law.
◮ DRAM - the rate of improvement has continued to slow
Since DRAM chips are primarily shipped in DIMM modules, it is harder to track chip capacity, as DRAM manufacturers typically offer several capacity products at the same time to match DIMM capacity. Capacity per DRAM chip has increased by about 25% to 40% per year recently, doubling roughly every two to three years. This technology is the foundation of main memory, discussed in Chapter 2. Note that the rate of improvement has continued to slow over the editions of the textbook, as Figure 1.8 shows. There is even concern as to whether the growth rate will stop in the middle of this decade due to the increasing difficulty of efficiently manufacturing even smaller DRAM cells [… 2005]. Chapter 2 mentions several other technologies that may replace DRAM if it hits a capacity wall.
Figure 1.8 Change in rate of improvement in DRAM capacity over time. The first two editions called this rate the DRAM Growth Rule of Thumb, since it had been so …
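The growth rates above can be converted into doubling times with T = ln 2 / ln(1 + r). A minimal Python sketch, assuming steady compound annual growth at rate r; the helper name `doubling_time_years` is hypothetical:

```python
import math


def doubling_time_years(annual_rate: float) -> float:
    """Years for a quantity growing at a compound annual rate to double."""
    return math.log(2) / math.log(1 + annual_rate)


# Transistor count per chip: about 40% to 55% per year.
for rate in (0.40, 0.55):
    months = 12 * doubling_time_years(rate)
    print(f"transistors at {rate:.0%}/yr double in {months:.1f} months")

# DRAM capacity per chip: about 25% to 40% per year.
for rate in (0.25, 0.40):
    years = doubling_time_years(rate)
    print(f"DRAM at {rate:.0%}/yr doubles in {years:.1f} years")
```

The results (about 19 to 25 months, and about 2 to 3 years) line up with the slide's "18 to 24 months" and "two to three years" to within rounding.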
Technology Trends
Performance Trends
[Figure: log–log plot of relative bandwidth improvement (1 to 100,000) versus relative latency improvement (1 to 100) for microprocessor, network, memory and disk, with a reference line where latency improvement = bandwidth improvement.]
Figure 1.9 Log–log plot of bandwidth and latency milestones from Figure 1.10 relative to the first milestone. Note that latency improved 6X to 80X while bandwidth improved about 300X to 25,000X. Updated from Patterson [2004].
◮ f : switching frequency
◮ C : capacitive load driven by the transistor
◮ VDD : operating voltage
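The variables above are those of the usual CMOS dynamic-power relation. The slide's own equation is not reproduced in this copy, so the following assumes the standard textbook form:

```latex
P_{\text{dynamic}} \propto \tfrac{1}{2} \times C \times V_{DD}^{2} \times f
```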
The first microprocessors consumed less than a watt and the first
32-bit microprocessors used about 2 watts, while a 3.3 GHz Intel
Core i7 consumes 130 watts. Given that this heat must be
dissipated from a chip that is about 1.5 cm on a side, we have
reached the limit of what can be cooled by air.
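The cooling limit becomes concrete when the quoted power is divided by the die area. A back-of-the-envelope sketch, assuming a square die 1.5 cm on a side with all 130 W dissipated uniformly over it; the function name is hypothetical:

```python
def power_density_w_per_cm2(power_w: float, side_cm: float) -> float:
    """Average heat flux (W/cm^2) for a square die of the given side length."""
    return power_w / (side_cm * side_cm)


# 3.3 GHz Core i7: 130 W over a die about 1.5 cm on a side.
density = power_density_w_per_cm2(130.0, 1.5)
print(f"{density:.1f} W/cm^2")  # 57.8 W/cm^2
```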
Clock rate flattening out
Since dynamic power grows linearly with f and with the square of VDD, you would expect clock frequency growth to slow down if we can't reduce voltage or increase power per chip. Figure 1.11 shows that this has indeed been the case since 2003, even for the microprocessors in Figure 1.1 that were the highest performers each year. Note that this period of flat clock rates corresponds to the period of slow performance improvement in Figure 1.1.
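The argument can be checked numerically. A sketch assuming the standard proportionality P ∝ ½·C·VDD²·f; the function names and the 15% scaling factors are illustrative assumptions, not taken from the slides:

```python
def relative_dynamic_power(v_scale: float, f_scale: float, c_scale: float = 1.0) -> float:
    """Ratio new/old dynamic power when C, VDD and f are scaled by the given factors.

    The constant 1/2 cancels in the ratio.
    """
    return c_scale * v_scale ** 2 * f_scale


def max_f_scale_at_fixed_power(v_scale: float, c_scale: float = 1.0) -> float:
    """Largest frequency scaling that keeps dynamic power unchanged."""
    return 1.0 / (c_scale * v_scale ** 2)


# Raising f with VDD fixed raises power proportionally.
print(relative_dynamic_power(v_scale=1.0, f_scale=2.0))  # 2.0
# Lowering both VDD and f by 15% cuts power to about 61% of the original.
print(round(relative_dynamic_power(v_scale=0.85, f_scale=0.85), 3))  # 0.614
# If VDD cannot be reduced, a fixed power budget allows no frequency growth.
print(max_f_scale_at_fixed_power(v_scale=1.0))  # 1.0
```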
[Figure: clock rate in MHz (log scale) versus year, 1978 to 2012. Labeled points: Digital VAX-11/780, 5 MHz in 1978; Sun-4 SPARC, 16.7 MHz in 1986; Intel Pentium4 Xeon, 3200 MHz in 2003; Intel Nehalem Xeon, 3330 MHz in 2010. Trend annotation: 15%/year.]
Figure 1.11 Growth in clock rate of microprocessors in Figure 1.1. Between 1978 and 1986, the clock rate improved less than 15% per year while performance improved by 25% per year. During the “renaissance period” of 52% performance improvement per year between 1986 and 2003, clock rates shot up almost 40% per year. Since then, the clock rate has been nearly flat, growing at less than 1% per year, while single processor performance improved at less than …
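The caption's rates can be cross-checked against the labeled points in the figure. A sketch assuming compound annual growth between just two labeled processors; the caption's rates summarize the whole set of processors, so the endpoint-only numbers are approximate, and the helper name is hypothetical:

```python
def annual_growth_rate(start: float, end: float, years: float) -> float:
    """Compound annual growth rate between two measurements."""
    return (end / start) ** (1.0 / years) - 1.0


# Labeled points from Figure 1.11 (clock rate in MHz).
renaissance = annual_growth_rate(16.7, 3200.0, 2003 - 1986)  # Sun-4 SPARC -> Pentium4 Xeon
flat_era = annual_growth_rate(3200.0, 3330.0, 2010 - 2003)   # Pentium4 Xeon -> Nehalem Xeon

print(f"1986-2003: {renaissance:.1%} per year")  # 36.2% per year
print(f"2003-2010: {flat_era:.1%} per year")     # 0.6% per year
```

Both figures agree with the caption: "almost 40%" during the renaissance period and "less than 1%" since 2003.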
Optimizations
Dependability
◮ $\mathrm{MTTF} = \frac{1}{\text{failure rate } (\lambda)}$
◮ $\text{Availability} = \frac{\mathrm{MTTF}}{\mathrm{MTTF} + \mathrm{MTTR}}$
◮ The reciprocal of MTTF is a rate of failures, generally
reported as failures per billion hours of operation, or FIT (for
failures in time).
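The definitions above translate directly into code. A minimal sketch; the helper names and the sample MTTF/MTTR values are illustrative assumptions:

```python
HOURS_PER_BILLION = 1_000_000_000


def fit_from_mttf(mttf_hours: float) -> float:
    """Failures per billion hours of operation (FIT) for a given MTTF."""
    return HOURS_PER_BILLION / mttf_hours


def availability(mttf_hours: float, mttr_hours: float) -> float:
    """Fraction of time the module delivers service: MTTF / (MTTF + MTTR)."""
    return mttf_hours / (mttf_hours + mttr_hours)


print(fit_from_mttf(1_000_000))  # 1000.0
print(availability(9_999, 1))    # 0.9999
```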
MTTF
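Assume a disk subsystem with the following components and MTTF: 10 disks, each rated at 1,000,000-hour MTTF; 1 ATA controller, 500,000-hour MTTF; 1 power supply, 200,000-hour MTTF; 1 fan, 200,000-hour MTTF; 1 ATA cable, 1,000,000-hour MTTF.
The component failure rates add:
$\text{Failure rate}_{\text{system}} = 10 \times \frac{1}{1{,}000{,}000} + \frac{1}{500{,}000} + \frac{1}{200{,}000} + \frac{1}{200{,}000} + \frac{1}{1{,}000{,}000} = \frac{10 + 2 + 5 + 5 + 1}{1{,}000{,}000 \text{ hours}} = \frac{23{,}000}{1{,}000{,}000{,}000 \text{ hours}}$
or 23,000 FIT. The MTTF for the system is just the inverse of the failure rate:
$\mathrm{MTTF}_{\text{system}} = \frac{1}{\text{Failure rate}_{\text{system}}} = \frac{1{,}000{,}000{,}000 \text{ hours}}{23{,}000} \approx 43{,}500 \text{ hours}$
or just under 5 years.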
Working assumption: the components are in series, so the failure of any one component fails the system.
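The series calculation above can be reproduced exactly with rational arithmetic. A sketch assuming, as stated, that the system fails when any component fails, so per-hour failure rates simply add; the dictionary and function names are hypothetical:

```python
from fractions import Fraction

# (count, MTTF in hours) for each component type in the example.
COMPONENTS = {
    "disk": (10, 1_000_000),
    "ATA controller": (1, 500_000),
    "power supply": (1, 200_000),
    "fan": (1, 200_000),
    "ATA cable": (1, 1_000_000),
}


def series_failure_rate(components):
    """Sum of per-hour failure rates; valid when any single failure fails the system."""
    return sum(Fraction(count, mttf) for count, mttf in components.values())


rate = series_failure_rate(COMPONENTS)
fit = rate * 1_000_000_000  # failures per billion hours
mttf_system = 1 / rate      # hours

print(fit)                                        # 23000
print(round(float(mttf_system)))                  # 43478
print(round(float(mttf_system) / (24 * 365), 2))  # 4.96 (just under 5 years)
```

Using `Fraction` keeps the sum exact, so the FIT value comes out as exactly 23,000 rather than a rounded float.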
The primary way to cope with failure is redundancy, either in time (repeat the …
Section 1
CISC vs RISC
[Figure: datapath with PC, instruction memory, PC + 4 adder, sign extend, shift left 2 and branch-target adder, register file (Read Reg 1, Read Reg 2, Write Reg, Write data; Data 1, Data 2), ALU (ALU operation, Zero), data memory (Address, Write data, Read data) and multiplexers. Control signals: PCSrc, ALUSrc, ALU operation, RegWrite, MemRead, MemWrite, MemToReg.]
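The control signals named in the figure can be tabulated per instruction class. A sketch assuming a MIPS-style subset (R-type, lw, sw, beq) and the usual single-cycle settings; the `Branch` signal, the class names and the helper are assumptions not shown on the slide, with PCSrc derived as Branch AND Zero:

```python
# Control settings for the datapath signals in the figure; None marks a don't-care.
# "Branch" is an assumed internal signal used to derive PCSrc.
CONTROL = {
    "R-type": {"ALUSrc": 0, "MemToReg": 0, "RegWrite": 1, "MemRead": 0,
               "MemWrite": 0, "Branch": 0, "ALUOp": "funct"},
    "lw":     {"ALUSrc": 1, "MemToReg": 1, "RegWrite": 1, "MemRead": 1,
               "MemWrite": 0, "Branch": 0, "ALUOp": "add"},
    "sw":     {"ALUSrc": 1, "MemToReg": None, "RegWrite": 0, "MemRead": 0,
               "MemWrite": 1, "Branch": 0, "ALUOp": "add"},
    "beq":    {"ALUSrc": 0, "MemToReg": None, "RegWrite": 0, "MemRead": 0,
               "MemWrite": 0, "Branch": 1, "ALUOp": "sub"},
}


def pc_src(instr_class: str, zero: bool) -> bool:
    """PCSrc selects the branch target only for a taken beq (Branch AND Zero)."""
    return bool(CONTROL[instr_class]["Branch"]) and zero


print(pc_src("beq", zero=True))     # True: branch taken
print(pc_src("R-type", zero=True))  # False: PC + 4
```

For example, lw sets ALUSrc so the ALU adds the sign-extended offset, and MemToReg so the register file is written from data memory.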
Multi-cycle instructions
Pipelining
Structural hazard
◮ sub $2, $1, $3; and $12, $2, $5: Read after Write (RAW) hazard on $2
◮ if ‘sub’ is in the IF stage in the (i + 1)-th clock cycle, $2 is updated (WB) in the (i + 5)-th cycle
◮ ‘and’ is in the EX stage in the (i + 4)-th cycle, so the updated value of $2 is not yet ready
◮ Solution: ‘sub’ computes the value for $2 in the (i + 3)-th cycle (its EX stage),
◮ this may be forwarded directly to the execution of ‘and’
◮ need suitable logic to detect the hazard and the forwarding requirement
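The cycle arithmetic and the detection logic above can be sketched as follows, assuming an ideal five-stage pipeline with no stalls. The forwarding check is modeled on the usual EX/MEM-to-ALU condition (writing instruction has RegWrite set, its destination is not $0, and it matches the source register being read); the function names are hypothetical:

```python
STAGES = ("IF", "ID", "EX", "MEM", "WB")


def stage_cycle(if_cycle: int, stage: str) -> int:
    """Cycle in which an instruction that enters IF in if_cycle reaches stage (no stalls)."""
    return if_cycle + STAGES.index(stage)


def forward_from_ex_mem(ex_mem_reg_write: bool, ex_mem_rd: int, id_ex_src_reg: int) -> bool:
    """Forward the EX/MEM result to an ALU input if it writes the register being read ($0 excluded)."""
    return ex_mem_reg_write and ex_mem_rd != 0 and ex_mem_rd == id_ex_src_reg


i = 0                           # base cycle, as in the slide
sub_if, and_if = i + 1, i + 2   # 'and' enters IF one cycle after 'sub'

print("sub EX:", stage_cycle(sub_if, "EX"))  # i + 3: $2 computed
print("sub WB:", stage_cycle(sub_if, "WB"))  # i + 5: $2 written
print("and EX:", stage_cycle(and_if, "EX"))  # i + 4: $2 needed

# sub $2,$1,$3 writes rd = 2; and $12,$2,$5 reads rs = 2 and rt = 5.
print(forward_from_ex_mem(True, 2, 2))  # True: forward to the rs input
print(forward_from_ex_mem(True, 2, 5))  # False: rt needs no forwarding
```

Since the value exists at the end of cycle i + 3 but is needed at the start of cycle i + 4, forwarding from the EX/MEM pipeline register removes the need to wait for write-back in cycle i + 5.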
Control hazards