0% found this document useful (0 votes)

135 views

Chapter 5 - The Processor: Datapath and Control

This document summarizes key aspects of a processor's datapath and control unit. It describes the goals of processor implementation including balancing instruction supply, execution, and data supply rates. It reviews sequential logic design concepts. It then details the datapath components for instruction fetch, execution including different instruction types, and data memory. It shows how these components are connected and controlled for a single-cycle implementation. Finally, it summarizes the control signals needed and their values for different instruction types. The overall goal is to efficiently implement a MIPS processor datapath and control unit.

Uploaded by

api-26072581

Available Formats

Download as PPT, PDF, TXT or read online on Scribd

0% found this document useful (0 votes)

135 views

Chapter 5 - The Processor: Datapath and Control

Uploaded by

api-26072581

Available Formats

Download as PPT, PDF, TXT or read online on Scribd

You are on page 1/ 101

CSE305N – Computer

Architecture

Chapter 5 – The
Processor: Datapath
and Control
Goals in processor
implementation
 Balance the rate of supply of instructions
and data and the rate at which the
execution core can consume them and can
update memory

instruction supply execution core data supply

Goals in processor
implementation
 Recall from Chapter 2
 CPU Time = INST x CPI x CT
 INST largely a function of the ISA and
compiler
 Objective: minimize CPI x CT within design
constraints (cost, power, etc.)
 Trading off CPI and CT is tricky
multiplier
multiplier
multiplier
logic

logic

logic
Brief review of sequential logic
design
 State elements are clocked devices
 Flip flops, etc
 Combinatorial elements hold no state
 ALU, caches, multiplier, multiplexers, etc.
 In edge triggered clocking, state elements
are only updated on the (rising) edge of
the clock pulse
Brief review of sequential logic
design
 The same state element can be read at the
beginning of a clock cycle and updated at
the end

 Example:
12 incrementing
clock the PC
8
Add input 8
PC

Add Add output 12

PC register 8 12
clock
Our processor design progression
 (1) Instruction fetch, execute, and operand
reads from data memory all take place in a
single clock cycle

 (2) Instruction fetch, execute, and operand

reads from data memory take place in
successive clock cycles

 (3) A pipelined design (Chapter 6)

Pieces of the processor puzzle
 Instruction fetch
 Execution
 Data memory

instruction supply execution core data supply

Instruction fetch datapath
 Memory to hold instructions
 Register to hold the instruction memory
address
 Logic to generate the next instruction
address

PC +4
Execution datapath
 Focus on only a subset of all MIPS
instructions
 add, sub, and, or
 lw, sw
 slt
 beq, j

 For all instructions except j, we

 Read operands from the register file
 Perform an ALU operation

 For all instructions except sw, beq, and j,

we write a result into the register file
Execution datapath
 Register file block diagram

 Read register 1,2: source operand register numbers

 Read data 1,2: source operands (32 bits each)
 Write register: destination operand register number
 Write data: data written into register file
 RegWrite: when asserted, enables the writing of Write
Data
Execution datapath
 Datapath for R-type (add, sub, and, or, slt)

 R-type instruction format:

31 26 25 21 20 16 15 11 10 6 5 0

op rs rt rd shamt funct
Execution datapath
 Datapath for beq instruction

 I-type instruction format:

31 26 25 21 20 16 15 0
 Zero ALU output
op rs indicates
rt if rs=rt (branch is taken/not taken)
immediate
 Branch target address is the sign extended immediate left
shifted two positions, and added to PC+4
Data memory
 Used for lw, sw (I-type format)
 Block diagram

 Address: memory location to be read or written

 Read data: data out of the memory on a load
 Write data: data into the memory on a store
 MemRead: indicates a read operation is to be performed
 MemWrite: indicates a write operation is to be performed
Execution datapath + data
memory
 Datapath for lw, sw

 Address is the sign-extended immediate added to the

source operand read out of the register file
 sw: data written to memory from specified register
 lw: data written to register file from specified memory
address
Putting the pieces together
 Single clock cycle for fetch, execute, and
operand read from data memory

 3 MUXes
 Registerfile operand or sign extended immediate to ALU
 ALU or data memory output written to register file
 PC+4 or branch target address written to PC register
Datapath for R-type instructions

Example: add $4, $18, $30

Datapath for I-type ALU
instructions

Example: slti $7, $4, 100

Datapath for not taken beq
instruction

Example: beq $28, $13, EXIT

Datapath for taken beq
instruction

Example: beq $28, $13, EXIT

Datapath for load instruction

Example: lw $8, 112($2)

Datapath for store instruction

Example: sw $10, 0($3)

Control signals we need to
generate
ALU operation control
 ALU control input codes from Chapter 4

ALU control input ALU operation Used for

000 and and

001 or or

010 add add, lw, sw

110 subtract sub, beq

111 set on less than slt

 Two steps to generate the ALU control input

 Use the opcode to distinguish R-type, lw and sw, and beq
 If R-type, use funct field to determine the ALU control
input
ALU operation control
 Opcode used to generate a 2-bit signal
called ALUOp with the following encodings
 00: lw or sw, perform an ALU add
 01: beq, perform an ALU subtract
 10: R-type, ALU operation is determined by the funct
field
Funct Instruction ALU control input

100000 add 010

100010 sub 110

100100 and 000

100101 or 001

101010 slt 111

Given Datapath: RTL →
Control
Instruction<31:0>

Inst

<21:25>

<16:20>

<11:15>

<0:15>
Memory

Adr
Op Fun Rt Rs Rd Imm16

Control

nPC_sel RegWr RegDst ExtOp ALUSrc ALUctr MemWr MemtoReg Zero

DATA PATH
A Summary of Control
inst Register Transfer
Signals
ADD R[rd] ← R[rs] + R[rt]; PC ← PC + 4
ALUsrc = RegB, ALUctr = “add”, RegDst = rd, RegWr, nPC_sel = “+4”
SUB R[rd] ← R[rs] – R[rt]; PC ← PC + 4
ALUsrc = RegB, ALUctr = “sub”, RegDst = rd, RegWr, nPC_sel = “+4”
ORi R[rt] ← R[rs] + zero_ext(Imm16); PC ← PC + 4
ALUsrc = Im, Extop = “Z”, ALUctr = “or”, RegDst = rt, RegWr, nPC_sel = “+4”
LOAD R[rt] ← MEM[ R[rs] + sign_ext(Imm16)]; PC ← PC + 4
ALUsrc = Im, Extop = “Sn”, ALUctr = “add”,
MemtoReg, RegDst = rt, RegWr, nPC_sel = “+4”
STORE MEM[ R[rs] + sign_ext(Imm16) ] ← R[rs]; PC ← PC + 4
ALUsrc = Im, Extop = “Sn”, ALUctr = “add”, MemWr, nPC_sel = “+4”
BEQ if ( R[rs] == R[rt] ) then PC ← PC + sign_ext(Imm16)] || 00 else PC ← PC + 4
nPC_sel = “Br”, ALUctr = “sub”
A Summary of Control
See
Signals We Don’t Care :-)
func 10 0000 10 0010
Appendix A op 00 0000 00 0000 00 1101 10 0011 10 1011 00 0100 00 0010
add sub ori lw sw beq jump
RegDst 1 1 0 0 x x x
ALUSrc 0 0 1 1 1 0 x
MemtoReg 0 0 0 1 x x x
RegWrite 1 1 1 1 0 0 0
MemWrite 0 0 0 0 1 0 0
nPCsel 0 0 0 0 0 1 0
Jump 0 0 0 0 0 0 1
ExtOp x x 0 1 1 x x
ALUctr<2:0> Add Subtract Or Add Add Subtract xxx

31 26 21 16 11 6 0
R-type op rs rt rd shamt funct add, sub

I-type op rs rt immediate ori, lw, sw, beq

J-type op target address jump

The Concept of Local
op Decoding
00 0000 00 1101 10 0011 10 1011 00 0100 00 0010
R-type ori lw sw beq jump
RegDst 1 0 0 x x x
ALUSrc 0 1 1 1 0 x
MemtoReg 0 0 1 x x x
RegWrite 1 1 1 0 0 0
MemWrite 0 0 0 1 0 0
Branch 0 0 0 0 1 0
Jump 0 0 0 0 0 1
ExtOp x 0 1 1 x x
ALUop<N:0> “R-type” Or Add Add Subtract xxx

func
ALU ALUctr
op Main 6
ALUop Control 3
6 Control (Local)
N

ALU
The Encoding of
ALUop
func
ALU
op Main 6 ALUctr
ALUop Control
6 Control (Local) 3
N

 In
this exercise, ALUop has to be 2 bits wide to
represent:
 “R-type” instructions (1)
 “I-type” instructions that require the ALU to perform:
 (2) Or, (3) Add, and (4) Subtract
 Toimplement the full MIPS ISA, ALUop has to be
3 bits to represent:
 “R-type” instructions (1)
 “I-type” instructions that require the ALU to perform:
 (2) Or, (3) Add, (4) Subtract, and (5) And (Example: andi)

R-type ori lw sw beq jump

ALUop (Symbolic) “R-type” Or Add Add Subtract xxx
ALUop<2:0> 1 00 0 10 0 00 0 00 0 01 xxx
The Decoding of the
op Main
“func”
func
6
Field
ALU ALUctr
ALUop Control
6 Control 3
(Local)
N

R-type ori lw sw beq jump

ALUop (Symbolic) “R-type” Or Add Add Subtract xxx
ALUop<2:0> 1 00 0 10 0 00 0 00 0 01 xxx
31 26 21 16 11 6 0
R-type op rs rt rd shamt funct

P. 286 text:
funct<5:0> Instruction Operation ALUctr ALUctr<2:0> ALU Operation
10 0000 add 000 And
10 0010 subtract 001 Or
10 0100 and
ALU
010 Add
10 0101 or 110 Subtract
10 1010 set-on-less-than 111 Set-on-less-than
The Truth Table for
ALUctr
funct<3:0> Instruction Op.
0000 add
ALUop R-type ori lw sw beq 0010 subtract
(Symbolic) “R-type” Or Add Add Subtract 0100 and
ALUop<2:0> 1 00 0 10 0 00 0 00 0 01 0101 or
1010 set-on-less-than

ALUop func ALU ALUctr

bit<2> bit<1> bit<0> bit<3> bit<2> bit<1> bit<0> Operation bit<2> bit<1> bit<0>
0 0 0 x x x x Add 0 1 0
0 x 1 x x x x Subtract 1 1 0
0 1 x x x x x Or 0 0 1
1 x x 0 0 0 0 Add 0 1 0
1 x x 0 0 1 0 Subtract 1 1 0
1 x x 0 1 0 0 And 0 0 0
1 x x 0 1 0 1 Or 0 0 1
1 x x 1 0 1 0 Set on < 1 1 1
The Logic Equation for
ALUop ALUctr<2>
func
bit<2> bit<1> bit<0> bit<3> bit<2> bit<1> bit<0> ALUctr<2>
0 x 1 x x x x 1
1 x x 0 0 1 0 1
1 x x 1 0 1 0 1

This makes func<3> a don’t care

 ALUctr<2> = !ALUop<2> & ALUop<0> +

ALUop<2> & !func<2> & func<1> & !func<0>
The Logic Equation for
ALUop ALUctr<1>
func
bit<2> bit<1> bit<0> bit<3> bit<2> bit<1> bit<0> ALUctr<1>
0 0 0 x x x x 1
0 x 1 x x x x 1
1 x x 0 0 0 0 1
1 x x 0 0 1 0 1
1 x x 1 0 1 0 1

 ALUctr<1> = !ALUop<2> & !ALUop<0> +

ALUop<2> & !func<2> & !func<0>
The Logic Equation for
ALUop ALUctr<0>
func
bit<2> bit<1> bit<0> bit<3> bit<2> bit<1> bit<0> ALUctr<0>
0 1 x x x x x 1
1 x x 0 1 0 1 1
1 x x 1 0 1 0 1

 ALUctr<0> = !ALUop<2> & ALUop<0>

+ ALUop<2> & !func<3> & func<2> & !func<1>
& func<0>
+ ALUop<2> & func<3> & !func<2> & func<1>
& !func<0>
The ALU Control
func Block
6 ALU ALUctr
Control
ALUop (Local) 3
3

 ALUctr<2> = !ALUop<2> & ALUop<0> +

ALUop<2> & !func<2> & func<1> & !func<0>
 ALUctr<1> = !ALUop<2> & !ALUop<0> +
ALUop<2> & !func<2> & !func<0>
 ALUctr<0> = !ALUop<2> & ALUop<0>
+ ALUop<2> & !func<3> & func<2>
& !func<1> & func<0>
+ ALUop<2> & func<3> & !func<2>
& func<1> & !func<0>
Step 5: Logic for Each
 nPC_sel Control Signal
<= if (OP == BEQ) then “Br” else
“+4”
 ALUsrc <= if (OP == “Rtype”) then “regB”
else “immed”
 ALUctr <= if (OP == “Rtype”) then funct
elseif (OP == ORi) then “OR”
elseif (OP == BEQ) then “sub”
else “add”
 ExtOp <= _____________
 MemWr <= _____________
 MemtoReg <= _____________
 RegWr: <=_____________
 RegDst: <= _____________
Step 5: Logic for each
 nPC_sel
control signal
<= if (OP == BEQ) then “Br” else “+4”
 ALUsrc <= if (OP == “Rtype”) then “regB” else
“immed”
 ALUctr<= if (OP == “Rtype”) then funct elseif (OP ==
ORi) then “OR” elseif (OP == BEQ) then “sub”
else “add”
 ExtOp <= if (OP == ORi) then “zero” else “sign”
 MemWr <= (OP == Store)
 MemtoReg <= (OP == Load)
 RegWr:<= if ((OP == Store) || (OP == BEQ)) then 0
else 1
 RegDst: <= if ((OP == Load) || (OP == ORi)) then 0
else 1
The “Truth Table” for the
Main Control
RegDst
ALUSrc
func
ALUctr
op 6 ALU
Main
6 Control
: Control 3
ALUop (Local)
3

op 00 0000 00 1101 10 0011 10 1011 00 0100 00 0010

R-type ori lw sw beq jump
RegDst 1 0 0 x x x
ALUSrc 0 1 1 1 0 x
MemtoReg 0 0 1 x x x
RegWrite 1 1 1 0 0 0
MemWrite 0 0 0 1 0 0
nPC_sel 0 0 0 0 1 0
Jump 0 0 0 0 0 1
ExtOp x 0 1 1 x x
ALUop (Symbolic) “R-type” Or Add Add Subtract xxx
ALUop <2> 1 0 0 0 0 x
ALUop <1> 0 1 0 0 0 x
ALUop <0> 0 0 0 0 1 x
Comparing instruction fields

31 26 25 21 20 16 15 11 10 6 5 0

R-type 0 rs rt rd shamt funct

31 26 25 21 20 16 15 0

beq 4 rs rt immediate (offset)

31 26 25 21 20 16 15 0

lw (sw) 35 (43) rs rt immediate (offset)

 Opcode, source registers, function code,

and immediate fields always in same place
 Destination register is
 bits 15-11 (rd) for R-type
 bits 20-16 (rt) for lw
 MUX to select the right one
Datapath with instr fields and ALU
control
Main control unit design
Main control unit design
 Truth table

(0)
(34)
(43)
(4)
Adding support for jump
instructions
 J-type format
31 26 25 0

2 target

 Next PC formed by shifting left the 26-bit

target two bits and combining it with the 4
high-order bits of PC+4
 Now the next PC will be one of
 PC+4
 beq target address
 j target address

 We need another MUX and control bit

Adding support for jump
instructions
Evaluation of the simple
implementation
 All instructions take one clock cycle (CPI = 1)
 Assume the following worst case delays
 Instruction memory: 4 time units
 Data memory: 4 time units (read), 2 time units (write)
 ALU: 4 time units
 Adders: 3 time units
 Register file: 2 time units (read), 1 time unit (write)
 MUXes, sign extension, gates, and shifters: 1 time unit

 Large disparity in worst case delays among

instruction types
 R-type: 4+2+1+4+1+1 = 13 time units
 beq: 4+2+1+4+1+1+1 = 14 time units
 j: 4+1+1 = 6 time units
 store: 4+2+4+2 = 12 time units
 load: 4+2+4+4+1+1 = 16 time units
Evaluation of the simple
implementation
 Disparity would be worse in a real machine
 Even slower integer instructions (e.g., multiply/divide
in MIPS)
 Floating point instructions

 Simple instructions take as long as

complex ones
A multicycle implementation
 Instruction fetch, register file access, etc
occur in separate clock cycles
 Different instruction types take different
numbers of cycles to complete
 Clock cycle time should be faster
High level view of datapath

 New registers store results of each step

 Not programmer visible!
 Hardware can be shared
 One ALU for PC+4, branch target calculation, EA calculation,
and arithmetic operations
 One memory for instructions and data
Detailed multi-cycle datapath
Multi-cycle control
First two cycles for all instructions
 Instruction fetch (1st cycle)
 Load the instruction into the IR register
 IR = Memory[PC]
 Increment the PC
 PC = PC+4

 Instruction decode and register fetch (2nd

cycle)
 Readregister file locations rs and rt, results into the A
and B registers
 A=Reg[IR[25-21]]
 B=Reg[IR[20-16]]

 Calculate the branch target address and load into

ALUOut
 ALUOut = PC+(sign-extend (IR[15-0]) <<2)
Instruction fetch

 IR=Mem[PC]
Instruction fetch

 PC=PC+4
Instruction decode and register fetch

 A=Reg[IR[25-21]], B=Reg[IR[20-16]]
Instruction decode and register fetch

 ALUOut = PC+(sign-extend (IR[15-0]) <<2)

Additional cycles for R-type
 Execution
 ALUOut = A op B

 Completion
 Reg[IR[15-11]] = ALUOut
R-type execution cycle

 ALUOut = A op B
R-type completion cycle

 Reg[IR[15-11]] = ALUOut
Additional cycles for store
 Address computation
 ALUOut = A + sign-extend (IR[15-0])

 Memory access
 Memory[ALUOut] =B
Store address computation cycle

 ALUOut = A + sign-extend (IR[15-0])

Store memory access cycle

 Memory[ALUOut] = B
Additional cycles for load
 Address computation
 ALUOut = A + sign-extend (IR[15-0])

 Memory access
 MDR = Memory[ALUOut]

 Read completion
 Reg[IR[20-16]] = MDR
Load memory access cycle

 MDR = Memory[ALUOut]
Load read completion cycle

 Reg[IR[20-16]] = MDR
Additional cycle for beq
 Branch completion
 if (A == B) PC = ALUOut
Branch completion cycle for beq

 if (A == B) PC = ALUOut
Additional cycle for j
 Jump completion
 PC = PC[31-28] || (IR[25-0]<<2)
Jump completion cycle for j

 PC = PC[31-28] || (IR[25-0]<<2)
Control logic design
 Implemented as a Finite State Machine

 Inputs: 6 opcode bits

 Outputs: 16 control signals
 State: 4 bits for 10 states
High-level view of FSM
Instruction fetch cycle
Instruction decode/register fetch
cycle
R-type execution cycle
R-type completion cycle
Memory address computation
cycle
Store memory access cycle
Load memory access cycle
Load read completion cycle
beq branch completion cycle
j jump completion cycle
Complete FSM
Evaluation of the multi-cycle
design
 CPI calculated based on the instruction mix
 For gcc (Figure 4.54)
 23% loads (5 cycles each)
 13% stores (4 cycles each)
 19% branches (3 cycles each)
 2% jumps (3 cycles each)
 43% ALU (4 cycles each)

 CPI = 0.23*5+0.13*4+0.19*3+0.02*3+0.43*4=4.02

 Cycle time is calculated from the longest

delay path assuming the same timing
delays as before
Worst case datapath: branch target

 ALUOut = PC+(sign-extend (IR[15-0]) <<2)

 Delay = 7 time units (delay of simple = 16)
Evaluation of the multi-cycle
design
 Time per instruction of simple and multi-
cycle
 TPI(simple) = CPI(simple) x cycle time(simple) = 16
 TPI(multi-cycle) = 4.02 x 7 = 28.1

 Simple single-cycle implementation is

faster

 Multicycle with pipelining will be

considerably faster than single-cycle
implementation
Exceptions
 An exception is an event that causes a deviation from
the normal execution of instructions
 Types of exceptions
 Operating system call (e.g., read a file, print a file)
 Input/output device request
 Page fault (request for instruction/data not in memory – Ch 7)
 Arithmetic error (overflow, underflow, etc.)
 Undefined instruction
 Misaligned memory access (e.g., word access to odd address)
 Memory protection violation
 Hardware error
 Power failure

 An exception is not usually due to an error!

 We need to be able to restart the program at
the point where the exception was detected
Handling exceptions
 Detect the exception

 Save enough information about the exception

to handle it properly

 Save enough information about the program

to resume it after the exception is handled

 Handle the exception

 Either terminate the program or resume

executing it depending on the exception type
Detecting exceptions
 Performed by hardware

 Overflow: determined from the opcode and

the overflow output of the ALU

 Undefined instruction: determined from

 The opcode in the main control unit
 The function code and ALUop in the ALU control logic
Detecting exceptions

overflow

undefined
instruction
Saving exception information

 Performed by hardware
 We need the type of exception and the PC of
the instruction when the exception occurred
 In MIPS, the Cause register holds the
exception type
 Need an encoding for each exception type
 Need a signal from the control unit to load it into the
Cause register
 and the Exception Program Counter (EPC)
register holds the PC
 Need to subtract 4 from the PC register to get the correct
PC (since we loaded PC+4 into the PC register during the
Instruction Fetch cycle)
 Need a signal from the control unit to load it into EPC
Saving exception information
Saving program information
 Needed in order to restart the program from the
point where the exception occurred

 Performed by hardware and software

 EPC register holds the PC of the instruction that

had the exception (where we will restart the
program)

 The software routine that handles the exception

saves any registers that it will need to the stack
and restores them when it is done
Handling the exception
 Performed by hardware and software

 Need to transfer control to a software routine to

handle the exception (exception handler)

 The exception handler runs in a privileged mode

that allows it to use special instructions and
access all of memory
 Our programs run in user mode

 The hardware enables the privileged mode,

loads PC with the address of the exception
handler, and transitions to the Fetch state
Handling the exception
 Loading the PC with exception handler
address
Exception handler
 Stores the values of the registers that it
will need to the stack
 Handles the particular exception
 Operating system call: calls the subroutine associated with the
call
 Underflow: sets register to zero or uses denormalized numbers
 I/O: handles the particular I/O request, e.g., keyboard input

 Restores registers from the stack (if

program is to be restarted)
 Terminates the program, or resumes
execution by loading the PC with EPC and
transitioning to the Instruction Fetch state
FSM modifications
The Intel Pentium processor
 Introduced in 1993
 Uses a multi-cycle datapath with the
following steps for integer instructions
 Prefetch (PF): read instruction from the instruction
memory
 Decode 1 (D1): first stage of instruction decode
 Decode 2 (D2): second stage of instruction decode
 Execute (E): perform the ALU operation
 Write back (WB): write the result to the register file

 Datapath usage varies by instruction type

 Simple instructions make one pass through the
datapath using state machine control
 Complex instructions make multiple passes, reusing
the same hardware elements under microcode control
The Intel Pentium processor
 The Pentium is a 2-way superscalar design as
two instructions can simultaneously execute

D2 E WB U pipe

PF D1

D2 E WB V pipe

 Ideal CPI for a 2-way superscalar is 0.5

 Conditions for superscalar execution
 Both must be simple instructions
 The result of the first instruction cannot be needed by the
second
 Both instructions cannot write the same register
 The first instruction in program sequence cannot be a jump
The Intel Pentium Pro processor
 Introduced in 1995 as the successor to the
Pentium
 The basis for the Pentium II and Pentium III
 Implements a 14-cycle, 3-way superscalar
integer datapath
 Very high frequency is the goal
 Uses out-of-order execution in that instructions
may execute out of their original program
order
 Completely handled by hardware transparently to software
 Instructions execute as soon as their source operands
become available
 Complicates exception handling
 Some instructions before the excepting one may not have executed,
while some after it may have executed
The Intel Pentium Pro processor
 Pentium Pro designers (and AMD designers
before them) used innovative engineering
to overcome the disadvantages of CISC
ISAs
 Many complex X86 instructions are internally
translated by hardware into RISC-like micro-ops with
state machine control
 Achieves a very low CPI for simple integer operations
even
on programs compiled for older implementations
 Combination of high frequency and low CPI
gave the Pentium Pro extremely
competitive integer performance versus
RISC microprocessors
 Result
has been that RISC CPUs have failed to gain the
desktop market share that had been expected
The Intel Pentium 4 processor
 20 cycle superscalar integer pipeline
 Extremely high frequency (>3GHz)
 Major effort to lower power dissipation
 Clock gating: clock to a unit is turned off when the unit
is not in use
 Trace cache: caches micro-ops of previously decoded
complex instructions to avoid power-consuming
decode operation
Questions?

Portugal Appointment
100% (6)
Portugal Appointment
2 pages
Design A MIPS Processor (II)
No ratings yet
Design A MIPS Processor (II)
10 pages
Aircraft Engineering Planning
100% (2)
Aircraft Engineering Planning
2 pages
A Comparative Analysis of English and Igala Morphological Processes
100% (1)
A Comparative Analysis of English and Igala Morphological Processes
165 pages
Lec09-datapath
No ratings yet
Lec09-datapath
36 pages
L12 Sglcycle Datapath
No ratings yet
L12 Sglcycle Datapath
69 pages
Chap5 Processor Part-II
No ratings yet
Chap5 Processor Part-II
21 pages
Lecture07 Control Signals
No ratings yet
Lecture07 Control Signals
18 pages
Goals of This Chapter: COEN-4710 Computer Hardware Processor Part 1: Datapath and Control (Ch.4)
No ratings yet
Goals of This Chapter: COEN-4710 Computer Hardware Processor Part 1: Datapath and Control (Ch.4)
24 pages
KAIST cs311 05 Proc I
No ratings yet
KAIST cs311 05 Proc I
28 pages
Processor DP Control
No ratings yet
Processor DP Control
44 pages
The Processor: (Datapath and Pipelining)
No ratings yet
The Processor: (Datapath and Pipelining)
144 pages
Control: Control Unit Takes Input From
No ratings yet
Control: Control Unit Takes Input From
29 pages
Simple Implementation Scheme
No ratings yet
Simple Implementation Scheme
16 pages
Datapath3 1up
No ratings yet
Datapath3 1up
26 pages
The Final Datapath: Add M U X
No ratings yet
The Final Datapath: Add M U X
32 pages
Lec12-Processor Control I
No ratings yet
Lec12-Processor Control I
9 pages
CS M151B / EE M116C: Computer Systems Architecture
No ratings yet
CS M151B / EE M116C: Computer Systems Architecture
34 pages
Lect5 Single Cycle Control
No ratings yet
Lect5 Single Cycle Control
29 pages
MOD 4-I Simple Computer - Bottom Up Implementation
No ratings yet
MOD 4-I Simple Computer - Bottom Up Implementation
11 pages
Ch8, Central Processing Unit
No ratings yet
Ch8, Central Processing Unit
61 pages
04 the+Processor
No ratings yet
04 the+Processor
11 pages
Chapter 3 Lecture 1 Central Processing Unit
No ratings yet
Chapter 3 Lecture 1 Central Processing Unit
37 pages
CA04_2024S2_printout
No ratings yet
CA04_2024S2_printout
31 pages
RISC Processor Design: Multi-Cycle Cycle Implementation: Mips
No ratings yet
RISC Processor Design: Multi-Cycle Cycle Implementation: Mips
49 pages
Slide 5
No ratings yet
Slide 5
31 pages
Mips PDF
No ratings yet
Mips PDF
11 pages
Multi Cycle2
No ratings yet
Multi Cycle2
54 pages
Chapter 5: The Processor: Datapath and Control: I. MIPS Implementation
No ratings yet
Chapter 5: The Processor: Datapath and Control: I. MIPS Implementation
6 pages
Nstruction Datapath
No ratings yet
Nstruction Datapath
10 pages
DECA_Unit 4 Complete
No ratings yet
DECA_Unit 4 Complete
103 pages
Computer Organization & Design: Bilgisayar Organizasyonu & Tasarimi
No ratings yet
Computer Organization & Design: Bilgisayar Organizasyonu & Tasarimi
32 pages
L06 - RISCV Datapath Design
100% (1)
L06 - RISCV Datapath Design
78 pages
Chapter 4 Cpu
100% (1)
Chapter 4 Cpu
26 pages
5. Building a data path
No ratings yet
5. Building a data path
61 pages
Unit 2
No ratings yet
Unit 2
69 pages
L8 CPU ALU B Class
No ratings yet
L8 CPU ALU B Class
16 pages
Computer Architecture: Nguyễn Trí Thành
No ratings yet
Computer Architecture: Nguyễn Trí Thành
101 pages
Lecture 4.1 - The Processor
No ratings yet
Lecture 4.1 - The Processor
29 pages
An Example Hardwired CPU
No ratings yet
An Example Hardwired CPU
29 pages
Unit 3.2 Cental Processing Unit
No ratings yet
Unit 3.2 Cental Processing Unit
30 pages
L7 Single Cycle DP
No ratings yet
L7 Single Cycle DP
24 pages
Advanced Microprocessor Designs: Few Important Concepts Related To Advanced Processors
No ratings yet
Advanced Microprocessor Designs: Few Important Concepts Related To Advanced Processors
18 pages
The Processor: Computer Organization and Design
No ratings yet
The Processor: Computer Organization and Design
162 pages
M4-1.RISCV-Datapath
No ratings yet
M4-1.RISCV-Datapath
93 pages
Cs2100 12 Basic Datapath
No ratings yet
Cs2100 12 Basic Datapath
50 pages
Cs3351 - Dp _ Co Unit 4 Reg2021
No ratings yet
Cs3351 - Dp _ Co Unit 4 Reg2021
50 pages
Unit 4 Notes
No ratings yet
Unit 4 Notes
15 pages
Lectures 02, 03 Introduction To Stored Programs": Suggested Reading:" HP Chapters 1.1-1.3"
No ratings yet
Lectures 02, 03 Introduction To Stored Programs": Suggested Reading:" HP Chapters 1.1-1.3"
30 pages
Chapter V Processor Architecture
No ratings yet
Chapter V Processor Architecture
140 pages
Coa Unit 5
No ratings yet
Coa Unit 5
73 pages
L13 Multicycle Datapath
No ratings yet
L13 Multicycle Datapath
62 pages
Chapter 11 Single Cycle Datapath
No ratings yet
Chapter 11 Single Cycle Datapath
17 pages
11 Processor 1
No ratings yet
11 Processor 1
23 pages
coa chapter 6
No ratings yet
coa chapter 6
9 pages
Multi Cycle PDF
No ratings yet
Multi Cycle PDF
16 pages
Computer Architecture: CSCE 350
No ratings yet
Computer Architecture: CSCE 350
41 pages
(Cpre 381) Computer Organization and Assembly-Level Programming, Fall 2018 Project A Report
No ratings yet
(Cpre 381) Computer Organization and Assembly-Level Programming, Fall 2018 Project A Report
10 pages
lecture08_RISCV_Impl_2
No ratings yet
lecture08_RISCV_Impl_2
55 pages
RISC Processor Document
No ratings yet
RISC Processor Document
51 pages
Practical Reverse Engineering: x86, x64, ARM, Windows Kernel, Reversing Tools, and Obfuscation
From Everand
Practical Reverse Engineering: x86, x64, ARM, Windows Kernel, Reversing Tools, and Obfuscation
Bruce Dang
No ratings yet
PLC: Programmable Logic Controller – Arktika.: EXPERIMENTAL PRODUCT BASED ON CPLD.
From Everand
PLC: Programmable Logic Controller – Arktika.: EXPERIMENTAL PRODUCT BASED ON CPLD.
MARIO FRANCO
No ratings yet
Preliminary Specifications: Programmed Data Processor Model Three (PDP-3) October, 1960
From Everand
Preliminary Specifications: Programmed Data Processor Model Three (PDP-3) October, 1960
Digital Equipment Corporation
No ratings yet
Part 1: Bangladesh: Prof. Dr. Chowdhury Mofizur Rahman Pro-Vice Chancellor, UIU
No ratings yet
Part 1: Bangladesh: Prof. Dr. Chowdhury Mofizur Rahman Pro-Vice Chancellor, UIU
41 pages
Instruction Set Architecture: Mips Section 2.6-2.7,2.8,2.9
No ratings yet
Instruction Set Architecture: Mips Section 2.6-2.7,2.8,2.9
26 pages
Instruction Set Architecture: Mips Section 2.1-2.5
No ratings yet
Instruction Set Architecture: Mips Section 2.1-2.5
27 pages
Instructions: Language of The Computer
No ratings yet
Instructions: Language of The Computer
17 pages
Inroduction
No ratings yet
Inroduction
17 pages
Performance
No ratings yet
Performance
42 pages
Large and Fast: Exploiting Memory Hierarchy
No ratings yet
Large and Fast: Exploiting Memory Hierarchy
48 pages
Enhancing Performance With Pipelining
No ratings yet
Enhancing Performance With Pipelining
71 pages
Ethiopian Commercial Code PDF
100% (4)
Ethiopian Commercial Code PDF
2 pages
HW 2 S
No ratings yet
HW 2 S
2 pages
Daily test chapter 5
No ratings yet
Daily test chapter 5
6 pages
Conference Paper: Mustapha Mokhtari Gharbi Yacine Si Lakhdar
No ratings yet
Conference Paper: Mustapha Mokhtari Gharbi Yacine Si Lakhdar
21 pages
PeakPok Club-Whitepaper
No ratings yet
PeakPok Club-Whitepaper
13 pages
2 BesuchInfo EN
No ratings yet
2 BesuchInfo EN
16 pages
Full o Beans Web
No ratings yet
Full o Beans Web
28 pages
unit 3
No ratings yet
unit 3
32 pages
找到最合适的论文导师
100% (1)
找到最合适的论文导师
6 pages
Second Part (Through Part Feeder)
No ratings yet
Second Part (Through Part Feeder)
64 pages
DP Unit 9 & 19. Redox Processes
No ratings yet
DP Unit 9 & 19. Redox Processes
12 pages
Value of Widal Test in Diagnosis of Typhoid Fever Dina A. Al-Roubaeay Abbas A. Al-Ani
No ratings yet
Value of Widal Test in Diagnosis of Typhoid Fever Dina A. Al-Roubaeay Abbas A. Al-Ani
5 pages
Ayra WSM 02 03 Mkii Fog Machine User Manual
No ratings yet
Ayra WSM 02 03 Mkii Fog Machine User Manual
9 pages
Structural and Material Analysis of An Automobile Wheel Rim Using Ansys
No ratings yet
Structural and Material Analysis of An Automobile Wheel Rim Using Ansys
6 pages
Unit I: Web and Internet Technologies (15A05605) III B.Tech II Sem (CSE)
No ratings yet
Unit I: Web and Internet Technologies (15A05605) III B.Tech II Sem (CSE)
50 pages
Fulltext
No ratings yet
Fulltext
35 pages
Uploads - Downloads - Junior Plumber, OP, L-1
No ratings yet
Uploads - Downloads - Junior Plumber, OP, L-1
4 pages
Comet: Communities of Maspeth and Elmhurst Together
No ratings yet
Comet: Communities of Maspeth and Elmhurst Together
8 pages
Lec 2 - Newton's Ring
No ratings yet
Lec 2 - Newton's Ring
8 pages
GAP 4.02 Release: Product Change / Improvement
No ratings yet
GAP 4.02 Release: Product Change / Improvement
2 pages
A Project Synopsis
No ratings yet
A Project Synopsis
25 pages
3012Tag3A 3000 Series: Diesel Engine - Electropak
100% (1)
3012Tag3A 3000 Series: Diesel Engine - Electropak
2 pages
EXAMPLE Feasibility Report For Teldon Facilities Corporation
No ratings yet
EXAMPLE Feasibility Report For Teldon Facilities Corporation
24 pages
Floriculture Breeding PDF
No ratings yet
Floriculture Breeding PDF
20 pages
English: Quarter I - Module I
50% (2)
English: Quarter I - Module I
45 pages
Juggernaut Method Instructions:: Cycle 1
No ratings yet
Juggernaut Method Instructions:: Cycle 1
9 pages
STRU-MISC - Structural Manual - Rev 60 - 2024-03
No ratings yet
STRU-MISC - Structural Manual - Rev 60 - 2024-03
356 pages

Chapter 5 - The Processor: Datapath and Control

Uploaded by

Chapter 5 - The Processor: Datapath and Control

Uploaded by

CSE305N – Computer

instruction supply execution core data supply

Add Add output 12

 (2) Instruction fetch, execute, and operand

 (3) A pipelined design (Chapter 6)

instruction supply execution core data supply

 For all instructions except j, we

 For all instructions except sw, beq, and j,

 Read register 1,2: source operand register numbers

 R-type instruction format:

 I-type instruction format:

 Address: memory location to be read or written

 Address is the sign-extended immediate added to the

Example: add $4, $18, $30

Example: slti $7, $4, 100

Example: beq $28, $13, EXIT

Example: beq $28, $13, EXIT

Example: lw $8, 112($2)

Example: sw $10, 0($3)

ALU control input ALU operation Used for

000 and and

010 add add, lw, sw

110 subtract sub, beq

111 set on less than slt

 Two steps to generate the ALU control input

100000 add 010

100010 sub 110

100100 and 000

101010 slt 111

nPC_sel RegWr RegDst ExtOp ALUSrc ALUctr MemWr MemtoReg Zero

I-type op rs rt immediate ori, lw, sw, beq

J-type op target address jump

R-type ori lw sw beq jump

R-type ori lw sw beq jump

ALUop func ALU ALUctr

This makes func<3> a don’t care

 ALUctr<2> = !ALUop<2> & ALUop<0> +

 ALUctr<1> = !ALUop<2> & !ALUop<0> +

 ALUctr<0> = !ALUop<2> & ALUop<0>

 ALUctr<2> = !ALUop<2> & ALUop<0> +

op 00 0000 00 1101 10 0011 10 1011 00 0100 00 0010

R-type 0 rs rt rd shamt funct

beq 4 rs rt immediate (offset)

lw (sw) 35 (43) rs rt immediate (offset)

 Opcode, source registers, function code,

 Next PC formed by shifting left the 26-bit

 We need another MUX and control bit

 Large disparity in worst case delays among

 Simple instructions take as long as

 New registers store results of each step

 Instruction decode and register fetch (2nd

 Calculate the branch target address and load into

 ALUOut = PC+(sign-extend (IR[15-0]) <<2)

 ALUOut = A + sign-extend (IR[15-0])

 Inputs: 6 opcode bits

 Cycle time is calculated from the longest

 ALUOut = PC+(sign-extend (IR[15-0]) <<2)

 Simple single-cycle implementation is

 Multicycle with pipelining will be

 An exception is not usually due to an error!

 Save enough information about the exception

 Save enough information about the program

 Handle the exception

 Either terminate the program or resume

 Overflow: determined from the opcode and

 Undefined instruction: determined from

 Performed by hardware and software

 EPC register holds the PC of the instruction that

 The software routine that handles the exception

 Need to transfer control to a software routine to

 The exception handler runs in a privileged mode

 The hardware enables the privileged mode,

 Restores registers from the stack (if

 Datapath usage varies by instruction type

 Ideal CPI for a 2-way superscalar is 0.5

You might also like