Ex4 Updated

The document contains exercises related to computer architecture, focusing on processor instructions, control signals, resource utilization, instruction execution, and pipelining. It includes questions on instruction latency, data hazards, branch prediction, and the impact of various predictors on performance. The exercises require detailed analysis of instruction execution in single-cycle and pipelined datapaths, as well as the implications of adding NOP instructions and optimizing branch prediction strategies.

Uploaded by

ndvu
Copyright
© © All Rights Reserved

ICT1.003 – Computer Architecture


Chapter 4: The Processor exercises
1. Consider the following instruction:
Instruction: and rd, rs1, rs2
Interpretation: Reg[rd] = Reg[rs1] AND Reg[rs2]
a. What are the values of control signals generated by the control unit for this
instruction?
b. Which resources (blocks) perform a useful function for this instruction?
c. Which resources (blocks) produce no output for this instruction? Which resources
produce output that is not used?
2. Consider the following instruction mix:

a. What fraction of all instructions use data memory?
b. What fraction of all instructions use instruction memory?
c. What fraction of all instructions use the sign extend?
d. What is the sign extend doing during cycles in which its output is not needed?

3. In this exercise, we examine in detail how an instruction is executed in a single-cycle
datapath. Problems in this exercise refer to a clock cycle in which the processor fetches
the following instruction word: 0x00c6ba23.
a. What are the values of the ALU control unit’s inputs for this instruction?
b. What is the new PC address after this instruction is executed? Highlight the path
through which this value is determined.
c. For each mux, show the values of its inputs and outputs during the execution of this
instruction. List values that are register outputs as Reg[xn].
d. What are the input values for the ALU and the two add units?
e. What are the values of all inputs for the registers unit?
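
The field extraction that Exercise 3 relies on can be checked mechanically. The following is a minimal Python sketch, assuming the standard 32-bit RISC-V base instruction layout; `decode_fields` and `s_type_immediate` are hypothetical helper names introduced here for illustration, not part of the exercise. It only splits the word into raw bit fields; mapping those fields onto control signals and datapath values is still left to the reader.

```python
def decode_fields(word: int) -> dict:
    """Split a 32-bit RISC-V instruction word into its raw bit fields."""
    return {
        "opcode": word & 0x7F,            # bits 6:0
        "bits_11_7": (word >> 7) & 0x1F,  # rd, or imm[4:0] in S-type
        "funct3": (word >> 12) & 0x7,     # bits 14:12
        "rs1": (word >> 15) & 0x1F,       # bits 19:15
        "rs2": (word >> 20) & 0x1F,       # bits 24:20
        "bits_31_25": (word >> 25) & 0x7F,  # funct7, or imm[11:5] in S-type
    }


def s_type_immediate(word: int) -> int:
    """Reassemble the 12-bit signed immediate, assuming an S-type word."""
    imm = (((word >> 25) & 0x7F) << 5) | ((word >> 7) & 0x1F)
    if imm & 0x800:       # sign bit of the 12-bit immediate
        imm -= 0x1000
    return imm


if __name__ == "__main__":
    word = 0x00C6BA23
    for name, value in decode_fields(word).items():
        print(f"{name:>10}: {value:#x} ({value})")
    print("S-type immediate (only meaningful for stores):", s_type_immediate(word))
```

Comparing the printed opcode and funct3 against the instruction tables in the course text identifies the instruction before tracing it through the datapath.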

4. Problems in this exercise assume that the logic blocks used to implement a processor’s
datapath have the following latencies:
“Register read” is the time needed after the rising clock edge for the new register value
to appear on the output. This value applies to the PC only. “Register setup” is the
amount of time a register’s data input must be stable before the rising edge of the clock.
This value applies to both the PC and Register File.
a. What is the latency of an R-type instruction (i.e., how long must the clock period be
to ensure that this instruction works correctly)?
b. What is the latency of lw?
c. What is the latency of sw?
d. What is the latency of beq?
e. What is the latency of an arithmetic, logical, or shift I-type (non-load) instruction?
f. What is the minimum clock period for this CPU?
5. In this exercise, we examine how pipelining affects the clock cycle time of the processor.
Problems in this exercise assume that individual stages of the datapath have the
following latencies:

Also, assume that instructions executed by the processor are broken down as
follows:

a. What is the clock cycle time in a pipelined and non-pipelined processor?
b. What is the total latency of an lw instruction in a pipelined and non-pipelined
processor?
c. If we can split one stage of the pipelined datapath into two new stages, each with
half the latency of the original stage, which stage would you split and what is the
new clock cycle time of the processor?
d. Assuming there are no stalls or hazards, what is the utilization of the data memory?
e. Assuming there are no stalls or hazards, what is the utilization of the write-register
port of the “Registers” unit?
6. What is the minimum number of cycles needed to completely execute n instructions on
a CPU with a k-stage pipeline? Justify your formula.
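
One way to sanity-check a closed-form answer to Exercise 6 is to count cycles with a small simulation. The sketch below assumes an ideal pipeline (one instruction enters per cycle, no stalls or hazards, every instruction passes through all k stages); `simulate_cycles` is a hypothetical helper name, not something defined by the exercise.

```python
def simulate_cycles(n: int, k: int) -> int:
    """Count cycles until n instructions have left an ideal k-stage pipeline."""
    pipeline = [None] * k   # pipeline[0] is the first stage; None marks an empty stage
    issued = 0              # instructions that have entered the pipeline
    retired = 0             # instructions that have finished the last stage
    cycles = 0
    while retired < n:
        # A new instruction enters the first stage if any remain.
        entering = issued if issued < n else None
        if entering is not None:
            issued += 1
        # Everything already in flight moves one stage forward.
        pipeline = [entering] + pipeline[:-1]
        cycles += 1
        # Whatever occupies the last stage completes at the end of this cycle.
        if pipeline[-1] is not None:
            retired += 1
    return cycles


if __name__ == "__main__":
    for n, k in [(1, 5), (3, 5), (10, 5), (10, 8)]:
        print(f"n={n}, k={k}: {simulate_cycles(n, k)} cycles")
```

Evaluating a candidate formula for the same (n, k) pairs and comparing it with the simulated counts shows quickly whether the formula is off by one.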
7. Add NOP instructions to the code below so that it will run correctly on a pipeline that
does not handle data hazards.
addi x11, x12, 5
add x13, x11, x12
addi x14, x11, 15
add x15, x13, x12
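
The hand analysis for Exercise 7 can be cross-checked with a short script. This is a rough Python sketch under stated assumptions: a five-stage pipeline with no forwarding in which the register file is written in the first half of a cycle and read in the second half, so a consumer must sit at least three instruction slots after its producer. That distance is exposed as the `min_distance` parameter because it depends on the pipeline the course assumes. The helper `insert_nops` is hypothetical, and it only understands instructions written as `op rd, rs1, rs2_or_imm`, where the first register is the destination.

```python
import re


def insert_nops(program, min_distance=3):
    """Pad with NOPs so every consumer is at least min_distance slots after its producer."""
    output = []
    last_write = {}  # register name -> slot index of its most recent writer
    for line in program:
        _mnemonic, operands = line.split(None, 1)
        regs = re.findall(r"x\d+", operands)   # immediates such as 5 are ignored
        dest, sources = regs[0], regs[1:]
        # Earliest slot at which every source register value can be read.
        earliest = max(
            (last_write[r] + min_distance for r in sources if r in last_write),
            default=0,
        )
        while len(output) < earliest:
            output.append("nop")
        if dest != "x0":                        # writes to x0 create no dependence
            last_write[dest] = len(output)
        output.append(line)
    return output


if __name__ == "__main__":
    code = [
        "addi x11, x12, 5",
        "add x13, x11, x12",
        "addi x14, x11, 15",
        "add x15, x13, x12",
    ]
    print("\n".join(insert_nops(code)))
```

Changing `min_distance` shows how the NOP count depends on the assumed register-file timing.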
8. Consider a version of the pipeline that does not handle data hazards (i.e., the
programmer is responsible for addressing data hazards by inserting NOP instructions
where necessary). Suppose that (after optimization) a typical n-instruction program
requires an additional 1.4*n NOP instructions to correctly handle data hazards.
a. Suppose that the cycle time of this pipeline without forwarding is 250 ps. Suppose
also that adding forwarding hardware will reduce the number of NOPs from 1.4*n to
1.05*n, but increase the cycle time to 300 ps. What is the speedup of this new
pipeline compared to the one without forwarding?
b. Different programs will require different amounts of NOPs. How many NOPs (as a
percentage of code instructions) can remain in the typical program before that
program runs slower on the pipeline with forwarding?
c. Repeat part b; however, this time let x represent the number of NOP instructions
relative to n. (In part b, x was equal to 1.4.) Your answer will be with respect to x.
d. Can a program with only 1.075*n NOPs possibly run faster on the pipeline with
forwarding? Explain why or why not.
e. At minimum, how many NOPs (as a percentage of code instructions) must a program
have before it can possibly run faster on the pipeline with forwarding?
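
The arithmetic behind Exercise 8 can be set up as a small calculator. The sketch below is an illustration under simplifying assumptions: every instruction slot (code or NOP) costs exactly one cycle, and the few cycles needed to fill the pipeline are ignored. The function names are hypothetical.

```python
def execution_time_ps(n, nop_ratio, cycle_time_ps):
    """Approximate run time: (code instructions + NOPs) * cycle time."""
    return (n + nop_ratio * n) * cycle_time_ps


def speedup(old_time, new_time):
    """Speedup of the new design over the old one (values below 1 mean a slowdown)."""
    return old_time / new_time


def break_even_nop_ratio(x_without, cycle_without_ps, cycle_with_ps):
    """NOP ratio on the forwarding pipeline at which both pipelines take equal time.

    Solves (1 + x_without) * cycle_without = (1 + y) * cycle_with for y.
    """
    return (1 + x_without) * cycle_without_ps / cycle_with_ps - 1


if __name__ == "__main__":
    n = 1_000_000  # any positive n works; it cancels in the ratio
    t_without = execution_time_ps(n, 1.4, 250)
    t_with = execution_time_ps(n, 1.05, 300)
    print("speedup with forwarding:", speedup(t_without, t_with))
    print("break-even NOP ratio:", break_even_nop_ratio(1.4, 250, 300))
```

Because n cancels in every ratio, the same functions can be reused for parts c through e by varying `x_without` rather than the program length.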

9. The importance of having a good branch predictor depends on how often conditional
branches are executed. Together with branch predictor accuracy, this will determine how
much time is spent stalling due to mispredicted branches. In this exercise, assume that
the breakdown of dynamic instructions into various instruction categories is as follows:

Also, assume the following branch predictor accuracies:

a. Stall cycles due to mispredicted branches increase the CPI. What is the extra CPI due
to mispredicted branches with the always-taken predictor? Assume that branch
outcomes are determined in the ID stage and applied in the EX stage, that there are
no data hazards, and that no delay slots are used.
b. What is the CPI for the “always-not-taken” predictor?
c. What is the CPI for the 2-bit predictor?
d. With the 2-bit predictor, what speedup would be achieved if we could convert half of
the branch instructions to some ALU instruction? Assume that correctly and
incorrectly predicted instructions have the same chance of being replaced.
e. With the 2-bit predictor, what speedup would be achieved if we could convert half of
the branch instructions in a way that replaced each branch instruction with two ALU
instructions? Assume that correctly and incorrectly predicted instructions have the
same chance of being replaced.
f. Some branch instructions are much more predictable than others. If we know that
80% of all executed branch instructions are easy-to-predict loop-back branches that
are always predicted correctly, what is the accuracy of the 2-bit predictor on the
remaining 20% of the branch instructions?
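
The CPI bookkeeping in Exercise 9 follows one pattern, sketched below in Python. The instruction-mix and predictor-accuracy tables are not reproduced in this copy of the sheet, so the numbers in the usage lines are placeholders chosen purely for illustration, not the exercise's data. The misprediction penalty is also left as a parameter, since it depends on where the branch outcome is resolved in the assumed pipeline. The function names are hypothetical.

```python
def extra_cpi(branch_fraction, accuracy, penalty_cycles):
    """Extra cycles per instruction caused by mispredicted branches."""
    return branch_fraction * (1.0 - accuracy) * penalty_cycles


def total_cpi(base_cpi, branch_fraction, accuracy, penalty_cycles):
    """Base CPI plus the misprediction stall contribution."""
    return base_cpi + extra_cpi(branch_fraction, accuracy, penalty_cycles)


if __name__ == "__main__":
    # Placeholder values only: substitute the figures from the exercise's tables.
    branch_fraction = 0.20   # hypothetical share of dynamic instructions that are branches
    accuracy = 0.90          # hypothetical predictor accuracy
    penalty_cycles = 1       # hypothetical stall cycles per misprediction
    print("extra CPI:", extra_cpi(branch_fraction, accuracy, penalty_cycles))
    print("total CPI:", total_cpi(1.0, branch_fraction, accuracy, penalty_cycles))
```

For parts d and e, the same functions apply once the branch fraction and total instruction count are adjusted for the converted branches.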
10. This exercise examines the accuracy of various branch predictors for the following
repeating pattern (e.g., in a loop) of branch outcomes: T, NT, T, T, NT.
a. What is the accuracy of always-taken and always-not-taken predictors for this
sequence of branch outcomes?
b. What is the accuracy of the 2-bit predictor for the first four branches in this pattern,
assuming that the predictor starts off in the bottom left state (predict not taken)?
c. What is the accuracy of the 2-bit predictor if this pattern is repeated forever?
d. Design a predictor that would achieve a perfect accuracy if this pattern is repeated
forever. Your predictor should be a sequential circuit with one output that provides a
prediction (1 for taken, 0 for not taken) and no inputs other than the clock and the
control signal that indicates that the instruction is a conditional branch.
e. What is the accuracy of your predictor if it is given a repeating pattern that is the
exact opposite of this one?
f. Design a predictor similar to (d), but now your predictor should be able to eventually
(after a warm-up period during which it can make wrong predictions) start perfectly
predicting both this pattern and its opposite. Your predictor should have an input
that tells it what the real outcome was. Hint: this input lets your predictor determine
which of the two repeating patterns it is given.
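
Predictor accuracy on a fixed outcome pattern, as asked in Exercise 10, is easy to tabulate by simulation. The Python sketch below models the 2-bit predictor as a saturating counter (states 0 and 1 predict not taken, states 2 and 3 predict taken). That encoding is an assumption: the state diagram in the course text may use different transitions, and starting state 0 is used here as a stand-in for the "bottom left (predict not taken)" state. The function names are hypothetical.

```python
from itertools import cycle, islice


def static_accuracy(outcomes, predict_taken):
    """Accuracy of an always-taken (True) or always-not-taken (False) predictor."""
    outcomes = list(outcomes)
    return sum(1 for taken in outcomes if taken == predict_taken) / len(outcomes)


def two_bit_accuracy(outcomes, start_state=0):
    """Accuracy of a 2-bit saturating-counter predictor on a non-empty outcome sequence."""
    state = start_state
    correct = 0
    total = 0
    for taken in outcomes:
        prediction = state >= 2            # states 2 and 3 predict taken
        correct += int(prediction == taken)
        total += 1
        # Move toward 3 on taken and toward 0 on not taken, saturating at the ends.
        state = min(state + 1, 3) if taken else max(state - 1, 0)
    return correct / total


PATTERN = [True, False, True, True, False]   # T, NT, T, T, NT

if __name__ == "__main__":
    print("always taken:     ", static_accuracy(PATTERN, True))
    print("always not taken: ", static_accuracy(PATTERN, False))
    print("2-bit, first four:", two_bit_accuracy(PATTERN[:4]))
    print("2-bit, long run:  ", two_bit_accuracy(islice(cycle(PATTERN), 5000)))
```

Passing the inverted pattern, `[not t for t in PATTERN]`, gives a quick check of how each predictor behaves on the opposite sequence.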
