
Classic 5 stage RISC pipeline The Memory Hierarchy Instruction Level Parallelism (ILP)

HPCA - recap and moving forward

Soumyajit Dey, Assistant Professor,


CSE, IIT Kharagpur

March 21, 2021


What is Computer Architecture?²

◮ Several years ago, the term computer architecture often
referred only to instruction set design; implementation was
considered separately. That view leaves out many critical details.
◮ What people advocate: think of the instruction set
architecture (ISA) as the boundary between HW and SW
◮ ISA: the programmer/compiler-visible instruction set
◮ Kind of ISA: are instructions complex/simple in function? Are
they fast/slow to execute?
◮ Kind of ISA: can instructions access memory directly?
Is access aligned?¹

¹ shall study later
² Ref: Hennessy & Patterson books
HPCA - recap and moving forward Soumyajit Dey, Assistant Professor, CSE, IIT Kharagpur
Classic 5 stage RISC pipeline The Memory Hierarchy Instruction Level Parallelism (ILP)

Addressing modes

◮ Register – the operand is in a register, and the register
number (e.g. $r1) is encoded in the instruction (direct
addressing). Ex: add $t0 $t1 $t2
◮ Displacement – indirect addressing; the effective address is a
constant plus a base register. Ex: lw rd, i(rb)
◮ Immediate – the operand (limited by bit width) is a constant
within the instruction itself. Ex: addi $t1 $t1 1
◮ PC relative – address is the sum of the program counter and a
constant in the instruction.
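As a quick sanity check of the four modes above, the following Python sketch resolves an operand under each one. The register contents, memory contents, and addresses are made-up toy values for illustration, not full MIPS semantics.

```python
# Sketch: how each addressing mode above resolves its operand.
# Toy machine state; register and memory values are illustrative only.
regs = {"$t1": 7, "$rb": 0x1000, "pc": 0x400}
mem = {0x1004: 42}

# Register: the operand comes straight from the register file.
operand_register = regs["$t1"]                 # 7

# Displacement: effective address = constant i + contents of base rb.
i = 4
operand_displacement = mem[i + regs["$rb"]]    # mem[0x1004] = 42

# Immediate: the constant is encoded in the instruction itself.
operand_immediate = 1

# PC relative: target = program counter + constant in the instruction.
branch_target = regs["pc"] + 8                 # 0x408
```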

Technology Trends

◮ Transistor density increases by about 35% per year. Die size
grows 10% to 20% per year. The combined effect is a growth rate
in transistor count on a chip of about 40% to 55% per year, or
doubling every 18 to 24 months. This trend is popularly known
as Moore's law.
◮ DRAM – the rate of improvement has continued to slow. Capacity
per DRAM chip has grown only about 25% to 40% per year
recently, doubling roughly every two to three years.

Year   DRAM growth rate   Characterization of impact on DRAM capacity
1990   60%/year           Quadrupling every 3 years
1996   60%/year           Quadrupling every 3 years
2003   40%–60%/year       Quadrupling every 3 to 4 years
2007   40%/year           Doubling every 2 years
2011   25%–40%/year       Doubling every 2 to 3 years

Technology Trends

◮ Semiconductor Flash – nonvolatile. Flash memory is 15 to 20
times cheaper per bit than DRAM. Capacity doubles roughly
every two years.
◮ Magnetic disk – 15 to 25 times cheaper per bit than Flash.
Density doubles every three years. Central to server and
warehouse-scale storage.


Performance Trends

◮ Bandwidth or throughput is the total amount of work done in


a given time, such as megabytes per second for a disk transfer.
◮ Latency or response time is the time between the start and the
completion of an event, such as milliseconds for a disk access.
◮ A simple rule of thumb: bandwidth grows by at least the
square of the improvement in latency.


Performance Trends

[Figure 1.9: Log–log plot of bandwidth and latency milestones for
microprocessor, network, memory, and disk, relative to the first
milestone. Latency improved 6X to 80X while bandwidth improved about
300X to 25,000X. Updated from Patterson [2004].]

Energy and Power consumption in Microprocessor

Transistor dynamic power consumption

◮ Power_dyn = C × V_DD² × f
◮ f : switching frequency
◮ C : capacitive load driven by the transistor
◮ V_DD : operating voltage

The first microprocessors consumed less than a watt and the first
32-bit microprocessors used about 2 watts, while a 3.3 GHz Intel
Core i7 consumes 130 watts. Given that this heat must be
dissipated from a chip that is about 1.5 cm on a side, we have
reached the limit of what can be cooled by air.
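The power relation above can be checked numerically. The capacitance, voltage, and frequency values below are illustrative, not measurements of any real chip; the point is that voltage enters squared, so scaling voltage and frequency down together cuts power superlinearly.

```python
# Numeric sketch of the relation above: P_dyn = C * V_DD^2 * f.
# All values are illustrative assumptions.
def dynamic_power(c_load, v_dd, freq):
    return c_load * v_dd ** 2 * freq

p_base = dynamic_power(1e-9, 1.0, 3.3e9)       # 3.3 W for this toy load

# Scaling both voltage and frequency down by 15% leaves
# 0.85^2 * 0.85 ~ 61% of the original power.
p_scaled = dynamic_power(1e-9, 0.85, 0.85 * 3.3e9)
ratio = p_scaled / p_base                      # ~0.614
```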

Clock rate flattening out

Given the dynamic power equation, you would expect clock frequency
growth to slow down if we cannot reduce voltage or increase power per
chip. Figure 1.11 shows that this has indeed been the case since 2003,
even for the highest-performing microprocessors of each year. This
period of flat clock rates corresponds to the period of slow
single-processor performance improvement.

[Figure 1.11: Growth in clock rate of microprocessors. Milestones:
Digital VAX-11/780 (5 MHz, 1978), Sun-4 SPARC (16.7 MHz, 1986), MIPS
M2000 (25 MHz, 1989), Digital Alpha 21064 (150 MHz, 1992), Digital
Alpha 21164A (500 MHz, 1996), Intel Pentium III (1000 MHz, 2000),
Intel Pentium 4 Xeon (3200 MHz, 2003), Intel Nehalem Xeon (3330 MHz,
2010). Between 1978 and 1986, the clock rate improved less than 15%
per year while performance improved by 25% per year. During the
"renaissance period" of 52% performance improvement per year between
1986 and 2003, clock rates shot up almost 40% per year. Since then,
the clock rate has been nearly flat, growing at less than 1% per year.]

Optimizations

Modern microprocessors try to improve energy efficiency despite
flat clock rates and constant supply voltages:
◮ Clock gating: if no floating-point instructions are executing,
the clock of the floating-point unit is disabled. If some cores
are idle, their clocks are stopped.
◮ Dynamic Voltage-Frequency Scaling (DVFS)
◮ Mode-based optimizations: DRAMs have a series of
increasingly lower-power modes to extend battery life. Disks
can spin more slowly when idle.


Dependability

◮ MTTF = 1 / failure rate (λ)
◮ Availability = MTTF / (MTTF + MTTR)
◮ The reciprocal of MTTF is a rate of failures, generally
reported as failures per billion hours of operation, or FIT
(failures in time).
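A minimal numeric sketch of the two formulas above. The failure rate and the 24-hour MTTR are illustrative assumptions, not values from the slide.

```python
# Sketch of the dependability formulas above, with illustrative numbers.
def mttf_from_rate(failure_rate):
    # MTTF = 1 / lambda
    return 1.0 / failure_rate

def availability(mttf, mttr):
    # Availability = MTTF / (MTTF + MTTR)
    return mttf / (mttf + mttr)

# A module with lambda = 1e-6 failures/hour has MTTF = 1,000,000 hours.
mttf = mttf_from_rate(1e-6)
fit = 1e9 / mttf                       # failures per 10^9 hours = 1000
avail = availability(mttf, mttr=24.0)  # assumed 24-hour repair time
```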


MTTF

Assume a disk subsystem with the following components and MTTFs: 10
disks, each rated at 1,000,000-hour MTTF; 1 ATA controller,
500,000-hour MTTF; 1 power supply, 200,000-hour MTTF; 1 fan,
200,000-hour MTTF; 1 ATA cable, 1,000,000-hour MTTF.

Answer: The sum of the failure rates is

Failure rate_system = 10 × 1/1,000,000 + 1/500,000 + 1/200,000
                      + 1/200,000 + 1/1,000,000
                    = (10 + 2 + 5 + 5 + 1)/1,000,000 hours
                    = 23/1,000,000 hours = 23,000/1,000,000,000 hours

or 23,000 FIT. The MTTF for the system is just the inverse of the
failure rate:

MTTF_system = 1/Failure rate_system = 1,000,000,000 hours/23,000
            ≈ 43,500 hours

or just under 5 years.
Working assumption: the components are in series.
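The series-system arithmetic above can be reproduced directly; the component counts and MTTFs are taken from the example.

```python
# Series-system failure rate: counts x per-component rates add.
# (count, MTTF in hours) pairs from the disk-subsystem example.
components = [(10, 1_000_000),  # disks
              (1, 500_000),     # ATA controller
              (1, 200_000),     # power supply
              (1, 200_000),     # fan
              (1, 1_000_000)]   # ATA cable

rate = sum(n / mttf for n, mttf in components)  # failures per hour
fit = rate * 1e9                                # 23,000 FIT
mttf_system = 1 / rate                          # ~43,478 hours
```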

MTTF for redundant system

λ_red = rate of first failure × probability of a second failure before repair
      = 2λ × (MTTR / MTTF)
⇒ MTTF_red = 1/λ_red = MTTF² / (2 × MTTR)

◮ If we assume it takes on average 24 hours for a human operator
to notice that a power supply has failed and replace it, the
MTTF of the fault-tolerant pair of power supplies is
200,000²/(2 × 24) ≈ 830,000,000 hours, which is roughly 4150
times more reliable than a single power supply.
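The redundant-pair formula can be checked with the slide's own numbers:

```python
# MTTF_red = MTTF^2 / (2 * MTTR), using the power-supply example above.
mttf = 200_000.0   # hours, single power supply
mttr = 24.0        # hours, assumed human repair time

mttf_red = mttf ** 2 / (2 * mttr)   # ~8.3e8 hours
improvement = mttf_red / mttf       # ~4167x a single supply
```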


Section 1

Classic 5 stage RISC pipeline


Basic RISC architecture

◮ The operation of a processor is characterized by a
fetch ⇒ decode ⇒ execute cycle.
◮ RISC and CISC ⇒ two different philosophies of computing
hardware design
◮ RISC/CISC – Reduced/Complex Instruction Set Computing
◮ CISC approach – complete a task with as few instructions
(instrs) as possible
◮ A CISC instruction: MUL addr1 addr2 addr3
◮ Equivalent RISC: LOAD R2 addr2; LOAD R3 addr3;
MUL R1 R2 R3; STORE addr1 R1


CISC vs RISC

CISC features
◮ Older ISA
◮ Multi-cycle instructions, HW-intensive design
◮ Efficient RAM usage
◮ Instructions – complex and variable length, lots of them
◮ Micro-code support
◮ Compound addressing modes

RISC features
◮ Ideas emerged in the 1980s
◮ Single-cycle instructions, SW-intensive design
◮ Heavy RAM usage, large register file
◮ Small number of simple fixed-length instructions
◮ Fewer addressing modes


Elementary CPU Datapath

[Datapath diagram: the PC addresses the Instruction Memory; the fetched
instruction indexes the register file (read registers 1 and 2, write
register); the ALU operates on register data or a sign-extended
immediate (selected by ALUSrc); Data Memory is accessed under
MemRead/MemWrite; the value written back to the register file is
selected by MemToReg; the next PC is either PC + 4 or PC + 4 plus the
immediate shifted left 2, selected by PCSrc; RegWrite enables the
register-file write.]
◮ The datapath ‘fetches’ an instruction, ‘decodes’ it and ‘executes’ it
◮ Control logic generates suitable activation signals
◮ Different instructions execute with variable delays

Single cycle implementation of datapath

◮ The choice of clock rate is limited by the instruction with
the maximum delay
◮ Options: choose a clock period longer than the latency of the
‘slowest’ instruction, or
◮ choose variable periods for different instructions – not practical!
◮ Alternate possibility – break the instruction execution cycle
into a series of basic steps
◮ Basic steps have less delay; choose a fast clock and use it to
execute one basic step at a time
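The clock-period argument above can be made concrete with numbers. The per-stage latencies below are illustrative assumptions, not figures from the slides.

```python
# Single-cycle vs. multi-cycle clocking sketch. Stage latencies (ps)
# are illustrative assumptions.
stage_ps = {"IF": 200, "ID": 100, "EX": 200, "MEM": 200, "WB": 100}

# A single-cycle design clocks at the latency of the slowest
# instruction; a load uses every stage, so the period is the sum.
single_cycle_period = sum(stage_ps.values())   # 800 ps

# Breaking execution into basic steps lets the clock run at the
# latency of the slowest single step.
multi_cycle_period = max(stage_ps.values())    # 200 ps
```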


Multi-cycle instructions

A basic stage represents one of the following states in the execution
of an instruction:
◮ Fetch (IF): IR ⇐ Memory[PC]; PC = PC + 4
◮ Decode (ID): understand instruction semantics
◮ Execute (EX): based on instruction type –
arithmetic/logical operation, memory address / branch
condition computation
◮ Memory (MEM): for load/store instructions, read/write
data from/to memory
◮ Writeback (WB): update register file


Pipelining

◮ Operate IF→ID→EX→MEM→WB in parallel for a sequence


of instructions
◮ Every basic stage is always processing some instruction
◮ In every clock cycle, one instruction completes - ideal scenario
◮ Practical issues - pipeline hazards
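In the ideal scenario above, with one instruction completing per cycle, n instructions on a k-stage pipeline finish in (k − 1) + n cycles, versus k × n unpipelined. A small sketch (numbers illustrative):

```python
# Ideal pipeline timing: fill the k stages once, then one instruction
# completes per cycle; hazards are ignored in this sketch.
def pipelined_cycles(n_instr, n_stages=5):
    return (n_stages - 1) + n_instr

def unpipelined_cycles(n_instr, n_stages=5):
    return n_stages * n_instr

# For long instruction sequences, speedup approaches the stage count.
speedup = unpipelined_cycles(1000) / pipelined_cycles(1000)  # ~4.98
```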


Structural hazard

◮ Consider a sequence of 4 lw (load-word) instructions
◮ When the first instruction accesses data memory, the fourth
instruction is itself being fetched from memory
◮ This is a structural hazard: the pipeline must stall for lack of
resources if the hardware cannot support multiple memory
accesses in parallel


Data Hazard : MIPS example

◮ sub $2, $1, $3; and $12, $2, $5 – Read After Write
(RAW)
◮ If ‘sub’ is in the IF stage in the (i + 1)-th clock cycle, $2 is
updated in the (i + 5)-th cycle
◮ ‘and’ is in the EX stage in the (i + 4)-th cycle, when the
updated value of $2 is not yet ready
◮ Solution: ‘sub’ has computed the value for $2 by the (i + 3)-th
cycle,
◮ so it may be forwarded directly to the execution of ‘and’
◮ Suitable logic is needed to detect the hazard and the
forwarding requirement
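The cycle numbering in the bullets above can be written out explicitly. The one-stage-per-cycle timing is the ideal 5-stage assumption.

```python
# Cycle numbers for the RAW hazard above: 'sub' is fetched in cycle
# i+1 and advances one stage per cycle; 'and' follows one cycle later.
i = 0
sub_stages = {"IF": i+1, "ID": i+2, "EX": i+3, "MEM": i+4, "WB": i+5}
and_stages = {"IF": i+2, "ID": i+3, "EX": i+4, "MEM": i+5, "WB": i+6}

# 'and' needs $2 in EX (cycle i+4), but 'sub' only writes it back in
# cycle i+5: a RAW hazard.
hazard = and_stages["EX"] < sub_stages["WB"]        # True

# 'sub' has computed $2 by the end of its EX stage (cycle i+3), so the
# value can be forwarded into 'and''s EX stage without stalling.
forwardable = sub_stages["EX"] < and_stages["EX"]   # True
```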


Control hazards

◮ Branch decisions: the branch condition needs evaluation (beq
$1, $2, offset)
◮ The branch decision is known only in the MEM stage
◮ Optimization: assume the branch is not taken and operate the
pipeline normally;
◮ when the decision evaluates to taken, execute the branch and
flush the intermediate instructions from the pipeline
◮ Sophisticated schemes: use branch prediction HW (predict a
branch decision based on branch history table contents)
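One common history-based scheme is a 2-bit saturating counter per branch. The slide does not fix a particular predictor, so this sketch is illustrative; the table indexed by branch PC plays the role of the branch history table.

```python
# Sketch of a 2-bit saturating-counter branch predictor, one counter
# per branch PC. Counter values 0..3; >= 2 means predict "taken".
from collections import defaultdict

table = defaultdict(lambda: 2)   # start in a weakly-taken state

def predict(pc):
    return table[pc] >= 2

def update(pc, taken):
    # Saturate at 0 and 3 so one anomalous outcome cannot flip a
    # strongly biased branch's prediction.
    if taken:
        table[pc] = min(3, table[pc] + 1)
    else:
        table[pc] = max(0, table[pc] - 1)

# A loop branch taken many times, then not taken once at loop exit,
# is still predicted "taken" on the next encounter.
for _ in range(5):
    update(0x400, taken=True)
update(0x400, taken=False)
```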

