Cs3351-Digital Principles and Computer Organization-All Units Question Bank With Answers
FIRST HALF TOPIC: Combinational Circuits, Karnaugh Map, Analysis and Design,
Procedures.
PART B
Explain the analysis procedure. Analyze the combinational circuit of the following logic diagram. (May 2015)
Proceed to obtain the truth table for the outputs of those gates which are a function of previously
defined values until the columns for all outputs are determined.
DESIGN PROCEDURE
Explain the procedure involved in designing combinational circuits.
The design of combinational circuits starts from the specification of the design objective and culminates
in a logic circuit diagram or a set of Boolean functions from which the logic diagram can be obtained.
The procedure involves the following steps:
From the specifications of the circuit, determine the required number of inputs and outputs and assign a
symbol to each.
Derive the truth table that defines the required relationship between inputs and outputs.
Obtain the simplified Boolean functions for each output as a function of the input variables.
Draw the logic diagram and verify the correctness of the design.
**************************************************
Half adder:
Construct a half adder with necessary diagrams. (Nov-06,May- 07)
A half-adder is an arithmetic circuit block that can be used to add two bits and produce two outputs
SUM and CARRY.
The Boolean expressions for the SUM and CARRY outputs are given by the equations SUM = A ⊕ B and CARRY = A·B.
Truth Table:
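The half-adder equations above can be checked with a small behavioural sketch in Python (the function name is ours, not from the text):

```python
def half_adder(a, b):
    """Half adder on single bits: SUM = a XOR b, CARRY = a AND b."""
    return a ^ b, a & b

# Enumerate the truth table
for a in (0, 1):
    for b in (0, 1):
        s, c = half_adder(a, b)
        print(f"A={a} B={b} -> SUM={s} CARRY={c}")
```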
*************************
Full adder:
Design a full adder using NAND and NOR gates respectively. (Nov -10)
A Full-adder is an arithmetic circuit block that can be used to add three bits and produce two outputs
SUM and CARRY.
The Boolean expressions for the SUM and CARRY outputs are given by the equations SUM = A ⊕ B ⊕ Cin and CARRY = AB + (A ⊕ B)Cin.
Truth table:
Karnaugh map:
Logic diagram:
The Boolean expressions of S and C are modified as follows
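The full-adder equations can be sketched directly; the carry form below is the two-half-adder version, CARRY = ab + (a ⊕ b)cin (function name is ours):

```python
def full_adder(a, b, cin):
    """Full adder: SUM = a XOR b XOR cin, CARRY = ab + (a XOR b)cin."""
    s = a ^ b ^ cin
    c = (a & b) | ((a ^ b) & cin)
    return s, c
```

This is exactly two half adders plus an OR gate on their carry outputs.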
****************************
Half subtractor:
Design a half subtractor circuit. (Nov-2009)
A half-subtractor is a combinational circuit that can be used to subtract one binary digit from another to produce a DIFFERENCE output and a BORROW output.
The BORROW output here specifies whether a ‘1’ has been borrowed to perform the subtraction. The Boolean expressions for difference and borrow are D = A ⊕ B and Bo = A'B.
Logic diagram:
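A behavioural sketch of the half subtractor (function name is ours, not from the text):

```python
def half_subtractor(a, b):
    """Compute a - b on single bits: DIFFERENCE = a XOR b, BORROW = a' AND b."""
    return a ^ b, (1 - a) & b
```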
*************************************
Full subtractor:
*************************************
The carry output of lower order stage is connected to the carry input of the next higher order stage.
Hence this type of adder is called ripple carry adder.
In a 4-bit binary adder, where each full adder has a propagation delay of tp ns, the output in the fourth
stage will be generated only after 4tp ns.
The magnitude of such delay is prohibitive for high speed computers.
One method of speeding up this process is look-ahead carry addition which eliminates ripple carry
delay.
**********************************
Complement of a number:
1’s complement:
The 1’s complement of a binary number is formed by changing 1 to 0 and 0 to 1.
Example:
1. The 1’s complement of 1011000 is 0100111.
2. The 1’s complement of 0101101 is 1010010.
2’s complement:
The 2’s complement of a binary number is formed by adding 1 with 1’s complement of a binary
number.
Example:
1. The 2’s complement of 1101100 is 0010100
2. The 2’s complement of 0110111 is 1001001
****************************
SECOND HALF TOPIC: Binary Adder, Subtractor, Decimal Adder, Magnitude Comparator,
Decoder, Encoder,Multiplexers , Demultiplexers
The subtraction of unsigned binary numbers can be done most conveniently by means of complements.
The subtraction A - B can be done by taking the 2’s complement of B and adding it to A . The 2’s
complement can be obtained by taking the 1’s complement and adding 1 to the least significant pair of
bits. The 1’s complement can be implemented with inverters, and a 1 can be added to the sum through the
input carry.
The circuit for subtracting A - B consists of an adder with inverters placed between each data input B and
the corresponding input of the full adder. The input carry Cin must be equal to 1 when subtraction
is performed. The operation thus performed becomes A plus the 1’s complement of B, plus 1. This is equal to A plus the 2’s complement of B.
For unsigned numbers, that gives A - B if A >= B or the 2’s complement of B - A if A < B. For signed numbers, the result is A - B, provided that there is no overflow.
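The inverter-plus-input-carry scheme can be modelled directly: masking ~B to the word width gives the 1's complement, and the +1 enters through the input carry (a sketch under our own naming, not from the text):

```python
def subtract(a, b, bits=7):
    """A - B via A + (1's complement of B) + 1; returns (result, end_carry)."""
    mask = (1 << bits) - 1
    ones_complement = ~b & mask      # inverters on the B inputs
    total = a + ones_complement + 1  # the +1 comes in on the input carry Cin
    return total & mask, total >> bits
```

When the end carry is 1, A >= B and the result is A - B; when it is 0, the result is the 2's complement of B - A.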
****************************
Consider the circuit of the full adder shown in Fig. Define two new binary variables Gi = AiBi and Pi = Ai ⊕ Bi.
Gi is called a carry generate, and it produces a carry of 1 when both Ai and Bi are 1, regardless of the input carry Ci. Pi is called a carry propagate, because it determines whether a carry into stage i will propagate into stage i + 1 (i.e., whether an assertion of Ci will propagate to an assertion of Ci+1).
We now write the Boolean functions for the carry outputs of each stage, Ci+1 = Gi + PiCi, and substitute the value of each Ci from the previous equations:
The construction of a four-bit adder with a carry look ahead scheme is shown in Fig.
Each sum output requires two exclusive-OR gates.
The output of the first exclusive-OR gate generates the Pi variable, and the AND gate generates the Gi
variable.
The carries are propagated through the carry look ahead generator and applied as inputs to the second
exclusive-OR gate.
All output carries are generated after a delay through two levels of gates.
Thus, outputs S1 through S3 have equal propagation delay times. The two-level circuit for the output
carry C4 is not shown. This circuit can easily be derived by the equation-substitution method.
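The generate/propagate algebra can be sketched as follows (lists are LSB-first; names are ours). The recurrence Ci+1 = Gi + PiCi is evaluated in a loop here, whereas the hardware flattens the substituted equations into two gate levels so all carries appear in parallel:

```python
def carry_lookahead_add(A, B, c0=0):
    """Add bit-lists using Gi = Ai*Bi (generate) and Pi = Ai^Bi (propagate)."""
    G = [a & b for a, b in zip(A, B)]
    P = [a ^ b for a, b in zip(A, B)]
    C = [c0]
    for i in range(len(A)):
        C.append(G[i] | (P[i] & C[i]))   # Ci+1 = Gi + Pi*Ci
    S = [P[i] ^ C[i] for i in range(len(A))]  # second XOR per sum bit
    return S, C[-1]
```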
******************************
4 bit-Parallel adder/subtractor:
Explain about binary parallel / adder subtractor. [NOV – 2019]
The addition and subtraction operations can be combined into one circuit with one common binary adder
by including an exclusive-OR gate with each full adder. A four-bit adder–subtractor circuit is shown in
Fig.
The mode input M controls the operation. When M = 0, the circuit is an adder, and when M = 1, the
circuit becomes a subtractor.
It performs the operations of both addition and subtraction.
It has two 4-bit inputs A3A2A1A0 and B3B2B1B0.
Each exclusive-OR gate receives input M and one of the inputs of B .
When M = 0, we have B xor 0 = B. The full adders receive the value of B, the input carry is 0, and the circuit performs A plus B. This results in sum S3S2S1S0 and carry C4.
When M = 1, we have B xor 1 = B’ and C0 = 1. The B inputs are all complemented and a 1 is added through the input carry, thus producing the 2’s complement of B.
Now the data A3A2A1A0 will be added with the 2’s complement of B3B2B1B0 to produce the sum, i.e., A-B if A≥B or the 2’s complement of B-A if A<B.
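The XOR-with-mode trick can be sketched in a few lines (names are ours): each B bit is XORed with M, and M also feeds the input carry C0.

```python
def add_sub(A, B, M, bits=4):
    """4-bit adder-subtractor: B XOR M on every bit, M as the input carry."""
    mask = (1 << bits) - 1
    Bx = (B ^ mask) if M else B      # B xor M applied bitwise
    total = (A & mask) + (Bx & mask) + M
    return total & mask, total >> bits   # (S3..S0, C4)
```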
*************************
Comparators
Design a 2 bit magnitude comparator. (May 2006) MAY 2024
It is a combinational circuit that compares two numbers and determines their relative magnitude. The
output of comparator is usually 3 binary variables indicating:
A<B, A=B, A>B
1-bit comparator: Let’s begin with the 1-bit comparator; as the name suggests, this circuit is used to compare two 1-bit binary numbers.
A B A>B A=B A<B
0 0 0 1 0
1 0 1 0 0
0 1 0 0 1
1 1 0 1 0
For a 2-bit comparator we have four inputs A1 A0 and B1 B0 and three outputs: E (is 1 if the two numbers are equal), G (is 1 when A>B) and L (is 1 when A<B). If we use a truth table and K-map, the result is
Truth table:
K-Map:
Logic Diagram:
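The gate-level equations for the 2-bit comparator can be sketched and checked exhaustively (names are ours): E uses per-bit XNORs, and the greater/less terms compare the MSBs first, falling through to the LSBs when the MSBs are equal.

```python
def compare_2bit(a1, a0, b1, b0):
    """2-bit magnitude comparator from gate-level equations."""
    e1 = 1 - (a1 ^ b1)                           # XNOR of the MSBs
    e0 = 1 - (a0 ^ b0)                           # XNOR of the LSBs
    E = e1 & e0                                  # A = B
    G = (a1 & (1 - b1)) | (e1 & a0 & (1 - b0))   # A > B
    L = ((1 - a1) & b1) | (e1 & (1 - a0) & b0)   # A < B
    return G, E, L
```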
********************
*************************
BCD Adder:
Design to perform BCD addition.(or) What is BCD adder? Design an adder to perform arithmetic
addition of two decimal bits in BCD. (May -08)(Apr 2017,2018)[Nov – 2019]
Consider the arithmetic addition of two decimal digits in BCD, together with an input carry from a
previous stage. Since each input digit does not exceed 9, the output sum cannot be greater than 9 + 9 + 1
= 19, the 1 in the sum being an input carry.
Suppose we apply two BCD digits to a four-bit binary adder. The adder will form the sum in binary and
produce a result that ranges from 0 through 19. These binary numbers are listed in Table and are labeled
by symbols K, Z8, Z4, Z2, and Z1. K is the carry, and the subscripts under the letter Z represent the
weights 8, 4, 2, and 1 that can be assigned to the four bits in the BCD code.
A BCD adder that adds two BCD digits and produces a sum digit in BCD is shown in Fig. The two
decimal digits, together with the input carry, are first added in the top four-bit adder to produce the
binary sum.
When the output carry is equal to 0, nothing is added to the binary sum. When it is equal to 1, binary
0110 is added to the binary sum through the bottom four-bit adder.
The condition for a correction and an output carry can be expressed by the Boolean function
C = K + Z8Z4 + Z8Z2
The output carry generated from the bottom adder can be ignored, since it supplies information already
available at the output carry terminal.
A decimal parallel adder that adds n decimal digits needs n BCD adder stages. The output carry from
one stage must be connected to the input carry of the next higher order stage.
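One BCD digit stage can be sketched behaviourally (names are ours): a binary add in the top adder, then a +0110 correction whenever C = K + Z8Z4 + Z8Z2 is true.

```python
def bcd_digit_add(a, b, cin=0):
    """Add two BCD digits with carry-in; returns (BCD sum digit, carry out)."""
    z = a + b + cin                  # top 4-bit binary adder, value 0..19
    k = z >> 4                       # carry K out of the binary adder
    zbits = z & 0xF
    z8 = (zbits >> 3) & 1
    z4 = (zbits >> 2) & 1
    z2 = (zbits >> 1) & 1
    c = k | (z8 & z4) | (z8 & z2)    # C = K + Z8Z4 + Z8Z2
    if c:
        zbits = (zbits + 0b0110) & 0xF   # bottom adder adds 6; its carry ignored
    return zbits, c
```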
******************************
Binary Multiplier:
Explain about binary Multiplier.
Multiplication of binary numbers is performed in the same way as multiplication of decimal numbers.
The multiplicand is multiplied by each bit of the multiplier, starting from the least significant bit. Each
such multiplication forms a partial product.
Successive partial products are shifted one position to the left. The final product is obtained from the
sum of the partial products.
A combinational circuit binary multiplier with more bits can be constructed in a similar fashion.
A bit of the multiplier is ANDed with each bit of the multiplicand in as many levels as there are bits in
the multiplier.
The binary output in each level of AND gates is added with the partial product of the previous level to
form a new partial product. The last level produces the product.
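The AND-then-shift-and-add scheme can be sketched as (names are ours):

```python
def binary_multiply(multiplicand, multiplier, bits=4):
    """AND the multiplicand with each multiplier bit; shift and accumulate."""
    product = 0
    for i in range(bits):
        if (multiplier >> i) & 1:           # AND level for this multiplier bit
            product += multiplicand << i    # partial product shifted left i places
    return product
```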
*************************************
CODE CONVERSION
Design a binary to gray converter. MAY 2024 (Nov-2009)(Nov2017)
Binary to Gray converter
Truth Table
B3 B2 B1 B0 G3 G2 G1 G0
0 0 0 0 0 0 0 0
0 0 0 1 0 0 0 1
0 0 1 0 0 0 1 1
0 0 1 1 0 0 1 0
0 1 0 0 0 1 1 0
0 1 0 1 0 1 1 1
0 1 1 0 0 1 0 1
0 1 1 1 0 1 0 0
1 0 0 0 1 1 0 0
1 0 0 1 1 1 0 1
1 0 1 0 1 1 1 1
1 0 1 1 1 1 1 0
1 1 0 0 1 0 1 0
1 1 0 1 1 0 1 1
1 1 1 0 1 0 0 1
1 1 1 1 1 0 0 0
G3 = B3, G2 = B3’B2 + B3B2’ = B3 ⊕ B2
K-MAP FOR G1: K-MAP FOR G0:
Logic diagram:
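The equations G3 = B3 and Gi = Bi+1 ⊕ Bi collapse into one expression on the whole word, b ^ (b >> 1) (function name is ours):

```python
def binary_to_gray(b):
    """G(MSB) = B(MSB); every other Gi = B(i+1) XOR B(i)."""
    return b ^ (b >> 1)
```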
K-Map:
Logic Diagram:
Truth table:
K-Map:
Logic Diagram
Excess -3 to BCD converter:
Design a combinational circuit to convert Excess-3 to BCD code. (May 2007)
Truth table:
Design Binary to BCD converter.
Truth table:
K-map:
Logic diagram:
******************************
DECODERS AND ENCODERS
Decoder:
Explain about decoders with necessary diagrams. (Apr 2018)(Nov 2018)
A decoder is a combinational circuit that converts binary information from n input lines to a maximum of 2^n unique output lines. If the n-bit coded information has unused combinations, the decoder may have fewer than 2^n outputs.
The purpose of a decoder is to generate the 2^n (or fewer) minterms of n input variables, shown below for two input variables.
2 to 4 decoder:
3 to 8 Decoder:
Design 3 to 8 line decoder with necessary diagram. May -10)
Truth table:
Logic diagram:
Design for 3 to 8 decoder with 2 to 4 decoder:
Note that the two-to-four decoder design shown earlier, with its enable inputs, can be used to build a three-to-eight decoder as follows.
Since the three to eight decoder provides all the minterms of three variables, the realisation of a
function in terms of the sum of products can be achieved using a decoder and OR gates as follows.
Example: Implement full adder using decoder.
Sum is given by ∑m(1, 2, 4, 7) while Carry is given by ∑m(3, 5, 6, 7) as given by the minterms
each of the OR gates are connected to.
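The decoder-plus-OR-gates realisation of the full adder can be sketched as (names are ours):

```python
def decoder_3to8(x, y, z):
    """One-hot minterm outputs m0..m7 of three inputs."""
    idx = (x << 2) | (y << 1) | z
    return [int(i == idx) for i in range(8)]

def full_adder_from_decoder(a, b, cin):
    """Sum = OR of m1,m2,m4,m7; Carry = OR of m3,m5,m6,m7."""
    m = decoder_3to8(a, b, cin)
    s = m[1] | m[2] | m[4] | m[7]
    c = m[3] | m[5] | m[6] | m[7]
    return s, c
```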
Design for 4 to 16 decoder using 3 to 8 decoder: Design 5 to 32 decoder using 3 to 8 and 2 to 4 decoder:
**********************************
Truth table:
K-Map:
Logic Diagram:
The specification above requires that the output be zeroes (none of the segments are lighted up) when
the input is not a BCD digit.
In practical implementations, this may differ to allow representation of hexadecimal digits using the seven segments.
**************************
Encoder:
Explain about encoders. (Nov 2018)
An encoder is a digital circuit that performs the inverse operation of a decoder. An encoder has 2^n (or fewer) input lines and n output lines. The output lines, as an aggregate, generate the binary code corresponding to the input value.
The encoder can be implemented with OR gates whose inputs are determined directly from the truth
table. Output z is equal to 1 when the input octal digit is 1, 3, 5, or 7.
Output y is 1 for octal digits 2, 3, 6, or 7, and output x is 1 for digits 4, 5, 6, or 7. These conditions can
be expressed by the following Boolean output functions:
Truth table:
An ambiguity in the octal-to-binary encoder is that an output with all 0’s is generated when all the inputs are 0; but this output is the same as when D0 is equal to 1. The discrepancy can be resolved by providing one more output to indicate whether at least one input is equal to 1.
Logic Diagram:
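The OR-gate equations for the octal-to-binary encoder can be sketched as (names are ours; D is assumed one-hot):

```python
def octal_to_binary_encoder(D):
    """D is a list of 8 input lines with exactly one line high; returns x,y,z."""
    x = D[4] | D[5] | D[6] | D[7]    # MSB: 1 for digits 4,5,6,7
    y = D[2] | D[3] | D[6] | D[7]    # 1 for digits 2,3,6,7
    z = D[1] | D[3] | D[5] | D[7]    # 1 for digits 1,3,5,7
    return x, y, z
```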
************************
Priority Encoder:
Design a priority encoder with logic diagram.(or) Explain the logic diagram of a 4 – input priority
encoder. (Apr – 2019)
A priority encoder is an encoder circuit that includes the priority function. The operation of the
priority encoder is such that if two or more inputs are equal to 1 at the same time, the input having the
highest priority will take precedence.
Truth table:
K-Map:
Logic Equations:
Logic diagram:
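The priority rule can be sketched behaviourally, assuming a standard 4-input encoder with D3 as the highest-priority line and a valid output V (names are ours):

```python
def priority_encoder_4(D):
    """4-input priority encoder, D[3] highest; returns (x, y, V)."""
    for i in (3, 2, 1, 0):           # highest-priority active input wins
        if D[i]:
            return (i >> 1) & 1, i & 1, 1
    return 0, 0, 0                   # V = 0: no input is active
```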
*******************************************
MULTIPLEXERS AND DEMULTIPLEXERS
Multiplexer: (MUX)
Design a 2:1 and 4:1 multiplexer.
A multiplexer is a combinational circuit that selects binary information from one of many input lines and
directs it to a single output line. The selection of a particular input line is controlled by a set of selection
lines.
Normally, there are 2^n input lines and n selection lines whose bit combinations determine which input is selected.
2 to 1 MUX:
A 2-to-1 line multiplexer is shown in the figure below. Each of the 2 input lines A and B is applied to one input of an AND gate. Selection line S is decoded to select a particular AND gate. The truth table for the 2:1 mux is given in the table below.
To derive the gate level implementation of 2:1 mux we need to have truth table as shown in figure. And
once we have the truth table, we can draw the K-map as shown in figure for all the cases when Y is
equal to '1'.
Truth table:
Logic Diagram:
4 to 1 MUX:
A 4 to 1 line multiplexer is shown in figure below, each of 4 input lines I0 to I3 is applied to one input
of an AND gate.
Selection lines S0 and S1 are decoded to select a particular AND gate.
The truth table for the 4:1 mux is given in the table below.
Logic Diagram:
Truth Table:
SELECT INPUT    OUTPUT
S1  S0          Y
0 0 I0
0 1 I1
1 0 I2
1 1 I3
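The sum-of-products form of the 4:1 mux can be sketched directly from the decoded select terms (names are ours):

```python
def mux_4to1(I, s1, s0):
    """Y = I0 s1' s0' + I1 s1' s0 + I2 s1 s0' + I3 s1 s0."""
    n1, n0 = 1 - s1, 1 - s0          # complemented select lines
    return (I[0] & n1 & n0) | (I[1] & n1 & s0) | (I[2] & s1 & n0) | (I[3] & s1 & s0)
```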
Problems :
Example: Implement the Boolean expression using MUX
F(A,B,C,D) = ∑m(0,1,5,6,8,10,12,15) (Apr 2017, Nov 2017)
F (x, y, z) = Σm (1, 2, 6, 7)
Solution:
Implementation table:
Multiplexer Implementation:
Example: 32:1 Multiplexer using 8:1 Mux (Nov 2018) (Apr – 2019)
DEMULTIPLEXERS:
Explain about demultiplexers.
The de-multiplexer performs the inverse function of a multiplexer; that is, it receives information on one line and transmits it onto one of 2^n possible output lines.
The selection is by n select input lines. Example: 1-to-4 De-multiplexer
INPUT OUTPUT
E D S0 S1 Y0 Y1 Y2 Y3
1 1 0 0 1 0 0 0
1 1 0 1 0 1 0 0
1 1 1 0 0 0 1 0
1 1 1 1 0 0 0 1
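A 1-to-4 demux with enable can be sketched as follows (names are ours; S1 is taken as the MSB of the select here, which may differ from the column order in the table above):

```python
def demux_1to4(d, s1, s0, e=1):
    """Route data d to the output chosen by the select lines when enabled."""
    sel = (s1 << 1) | s0
    return [d & e if i == sel else 0 for i in range(4)]
```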
Example:
1. Implement full adder using De-multiplexer.
2. Implement the following functions using de-multiplexer.
f1 (A,B,C) = ∑m(1,5,7), f2 (A,B,C) = ∑m(3,6,7)
Solution:
***************************
Parity Checker / Generator:
A parity bit is an extra bit included with a binary message to make the number of 1’s either odd or
even. The message, including the parity bit, is transmitted and then checked at the receiving end for
errors. An error is detected if the checked parity does not correspond with the one transmitted.
The circuit that generates the parity bit in the transmitter is called a parity generator. The circuit that
checks the parity in the receiver is called a parity checker.
In even parity system, the parity bit is ‘0’ if there are even number of 1s in the data and the parity bit
is ‘1’ if there are odd number of 1s in the data.
In odd parity system, the parity bit is ‘1’ if there are even number of 1s in the data and the parity bit is
‘0’ if there are odd number of 1s in the data.
Logic Diagram:
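Generation and checking of an even-parity bit are both just XOR trees, which can be sketched as (names are ours):

```python
def even_parity_bit(bits):
    """Parity bit that makes the total count of 1s even: XOR of the data."""
    p = 0
    for b in bits:
        p ^= b
    return p

def parity_check(bits, parity):
    """True when data plus parity has even parity (no single-bit error)."""
    return even_parity_bit(bits) ^ parity == 0
```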
FIRST HALF TOPIC: Combinational Circuits, Karnaugh Map, Analysis and Design,
Procedures.
Truth Table:
5) Draw the logic diagram of half adder using NAND gate. (May 2006,13)
Logic Diagram: Half adder using NAND gate:
6) What is Full adder? Draw the truth table of full adder. (Apr 2018)
A Full-adder is an arithmetic circuit block that can be used to add three bits and produce two
outputs SUM and CARRY.
The Boolean expressions for the SUM and CARRY outputs are given by the equations SUM = A ⊕ B ⊕ Cin and CARRY = AB + (A ⊕ B)Cin.
10) What is Full subtractor? Write the truth table of full subtractor. (Nov 2017)
A full subtractor performs subtraction operation on two bits, a minuend and a subtrahend, and
also takes into consideration whether a ‘1’ has already been borrowed by the previous adjacent lower
minuend bit or not. As a result, there are three bits to be handled at the input of a full subtractor, namely
the two bits to be subtracted and a borrow bit designated as Bin . There are two outputs, namely the
DIFFERENCE output D and the BORROW output Bo. The BORROW output bit tells whether the
minuend bit needs to borrow a ‘1’ from the next possible higher minuend bit. The Boolean expressions for
difference and borrow are:
11) Draw Full subtractor using two half subtractor.
SECOND HALF TOPIC: Binary Adder, Subtractor, Decimal Adder, Magnitude Comparator,
Decoder, Encoder,Multiplexers , Demultiplexers
16) How Subtraction of binary numbers perform using 2’s complement addition?
The subtraction of unsigned binary numbers can be done by means of complements.
Subtraction of A-B can be done by taking the 2’s complement of B and adding it to A.
Check the resulting number. If a carry is present, the result is positive; remove the carry.
If no carry is present, the result is negative; take the 2’s complement of the result and attach a
negative sign.
17) Given the two binary numbers X = 1010100 and Y = 1000011, perform the subtraction
(a) X - Y and (b) Y - X by using 2’s complements.
Solution:
(a) X = 1010100
2’s complement of Y = + 0111101
Sum= 10010001
Discard end carry. Answer: X - Y = 0010001
(b) Y = 1000011
2’s complement of X= + 0101100
Sum= 1101111
There is no end carry. Therefore, the answer is Y - X = -(2’s complement of 1101111) =-0010001.
18) Draw the logic diagram of Parallel Binary Subtractor.
19) Draw 1:8 Demux using two 1:4 demux. (Nov 2018)
20) Draw the logic diagram of 2’s complement adder/subtractor. (May 2013)
The mode input M controls the operation. When M = 0, the circuit is an adder, and when M = 1,
the circuit becomes a subtractor.
The outcome of the comparison is specified by three binary variables that indicate whether A
> B, A = B, or A < B.
Logic Circuits:
A decoder is a combinational circuit that converts binary information from n input lines to a maximum of 2^n unique output lines. If the n-bit coded information has unused combinations, the decoder may have fewer than 2^n outputs.
The purpose of a decoder is to generate the 2^n (or fewer) minterms of n input variables, shown below for two input variables.
Note that the two-to-four decoder design shown earlier, with its enable inputs, can be used to build a three-to-eight decoder as follows.
25) What is Encoder? (May 2012)
An encoder is a digital circuit that performs the inverse operation of a decoder. An encoder has 2^n (or fewer) input lines and n output lines. The output lines, as an aggregate, generate the binary code corresponding to the input value.
26) What is Priority Encoder? (Apr 2017)
A priority encoder is an encoder circuit that includes the priority function. The operation of
the priority encoder is such that if two or more inputs are equal to 1 at the same time, the input having
the highest priority will take precedence.
27) Define Multiplexer (MUX) (or) Data Selector. (Dec 2006, May 2011) [NOV – 2019]
A multiplexer is a combinational circuit that selects binary information from one of many
input lines and directs it to a single output line. The selection of a particular input line is controlled by
a set of selection lines. Normally, there are 2^n input lines and n selection lines whose bit combinations
determine which input is selected.
end for errors. An error is detected if the checked parity does not correspond with the one
transmitted.
30) What is Parity Checker / Generator:
The circuit that generates the parity bit in the transmitter is called a parity generator. The
circuit that checks the parity in the receiver is called a parity checker.
31) What is even parity and odd parity?
In even parity system, the parity bit is ‘0’ if there are even number of 1s in the data and the
parity bit is ‘1’ if there are odd number of 1s in the data.
In odd parity system, the parity bit is ‘1’ if there are even number of 1s in the data and the
parity bit is ‘0’ if there are odd number of 1s in the data.
31) Give the applications of Demultiplexer.
i) It finds its application in Data transmission system with error detection.
ii) One simple application is as a binary-to-decimal decoder.
1. The first (MSB) bit of the Gray code is the same as the MSB of the given binary number.
2. The second bit of the Gray code can be found by performing the Exclusive-OR (EX-OR) operation between the first and second bits of the binary number.
3. The Third bit of the Grey code can be found by performing the Exclusive-OR (EX-OR) operation
between the Third and Second bits of the given Binary Number; and so on
EX-OR Operation:
1. If both bits are the same (both 0 or both 1), the output of the EX-OR gate will be 0.
2. If exactly one of the two bits is 1, the output of the EX-OR gate will be 1.
41) How Gray Code to Binary Conversion done?
Consider g0, g1, g2 and g3 as the Gray code to be converted into a binary number. The steps for binary to Gray code conversion need to be reversed to find the equivalent binary number.
1. The Most Significant Bit (MSB) of the Binary is same as the First MSB of the Gray Code.
2. If the second Gray bit is 0, the second binary bit will be the same as the first binary bit; if the second Gray bit is 1, the second binary bit will be the inverse of the previous binary bit.
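The steps above amount to XOR-ing each Gray bit with all the Gray bits above it, which can be sketched as (function name is ours):

```python
def gray_to_binary(g):
    """Each binary bit is the XOR of the Gray bits at and above its position."""
    b = 0
    while g:
        b ^= g
        g >>= 1
    return b
```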
32) Draw the circuit for 4 to 1 line multiplexer. (Apr 2017) [NOV – 2019]
Logic Diagram:
Truth Table:
SELECT INPUT    OUTPUT
S1  S0          Y
0 0 I0
0 1 I1
1 0 I2
1 1 I3
UNIT II SYNCHRONOUS SEQUENTIAL CIRCUITS
TOPIC WISE POSSIBLE PART B UNIVERSITY QUESTIONS
PART B
SEQUENTIAL CIRCUITS
Sequential circuits:
Sequential circuits employ storage elements in addition to logic gates. Their outputs are a function of
the inputs and the state of the storage elements.
Because the state of the storage elements is a function of previous inputs, the outputs of a sequential
circuit depend not only on present values of inputs, but also on past inputs, and the circuit behavior
must be specified by a time sequence of inputs and internal states.
Realize SR Latch using NOR and NAND gates and explain its operation.
The SR latch is a circuit with two cross-coupled NOR gates or two cross-coupled NAND gates, and
two inputs labeled S for set and R for reset.
The SR latch constructed with two cross-coupled NOR gates is shown in Fig.
The latch has two useful states. When output Q = 1 and Q’= 0, the latch is said to be in the set state .
When Q = 0 and Q’ = 1, it is in the reset state . Outputs Q and Q’ are normally the complement of
each other.
However, when both inputs are equal to 1 at the same time, a condition in which both outputs are
equal to 0 (rather than be mutually complementary) occurs.
If both inputs are then switched to 0 simultaneously, the device will enter an unpredictable or
undefined state or a metastable state. Consequently, in practical applications, setting both inputs to 1
is forbidden.
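The cross-coupled NOR pair can be simulated by iterating the two gate equations from a starting state until the outputs stop changing (names are ours; a fixed iteration cap stands in for settling time):

```python
def sr_latch_nor(S, R, q=0, qbar=1):
    """Iterate the cross-coupled NOR gates until the outputs are stable."""
    for _ in range(4):
        q_new = 1 - (R | qbar)       # Q  = NOR(R, Q')
        qbar_new = 1 - (S | q)       # Q' = NOR(S, Q)
        if (q_new, qbar_new) == (q, qbar):
            break
        q, qbar = q_new, qbar_new
    return q, qbar
```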
FLIP FLOPS
Triggering of Flip Flops:
Explain about triggering of flip flops in detail.
The state of a latch or flip-flop is switched by a change in the control input. This momentary change
is called a trigger, and the transition it causes is said to trigger the flip-flop.
Level Triggering:
The SR, D, JK and T latches have an enable input.
Latches are controlled by the enable signal; they are level triggered, either positive level triggered or
negative level triggered, as shown in figure (a).
The output is free to change according to the input values, when active level is maintained at the
enable input.
Edge Triggering:
A clock pulse goes through two transitions: from 0 to 1 and the return from 1 to 0.
As shown in above Fig (b) and (c)., the positive transition is defined as the positive edge and the
negative transition as the negative edge.
The purpose is to convert a given type A FF to a desired type B FF using some conversion logic.
The key here is to use the excitation table, which shows the necessary triggering signals (S, R, J, K, D and T) required for each state transition.
The table is then completed by writing the values of S and R required to get each Qp+1 from the
corresponding Qp. That is, the values of S and R that are required to change the state of the flip flop from
Qp to Qp+1 are written.
2. JK Flip Flop to SR Flip Flop
This will be the reverse process of the above explained conversion. S and R will be the external
inputs to J and K. As shown in the logic diagram below, J and K will be the outputs of the combinational
circuit. Thus, the values of J and K have to be obtained in terms of S, R and Qp. The logic diagram is
shown below.
A conversion table is to be written using S, R, Qp, Qp+1, J and K. For two inputs, S and R, eight
combinations are made. For each combination, the corresponding Qp+1 outputs are found. The outputs
for the combinations of S=1 and R=1 are not permitted for an SR flip flop. Thus the outputs are
considered invalid and the J and K values are taken as “don’t cares”.
3. SR Flip Flop to D Flip Flop
As shown in the figure, S and R are the actual inputs of the flip flop and D is the external input of
the flip flop. The four combinations, the logic diagram, conversion table, and the K-map for S and R in
terms of D and Qp are shown below.
Moore machine:
In the Moore model, the outputs are a function of present state only.
Mealy machine:
In the Mealy model, the outputs are a function of present state and external inputs.
Example:
A sequential circuit with two ‘D’ Flip-Flops A and B, one input (x) and one output (y).
The Flip-Flop input functions are:
DA = Ax + Bx
DB = A’x and
the circuit output function is, Y = (A + B)x’.
(a) Draw the logic diagram of the circuit, (b) Tabulate the state table, (c) Draw the state diagram.
Solution:
State table:
State diagram:
COUNTERS
Counter:
A counter is a register (a group of Flip-Flops) capable of counting the number of clock pulses
arriving at its clock input.
A counter that follows the binary number sequence is called a binary counter.
Counters are classified into two types,
1. Asynchronous (Ripple) counters.
2. Synchronous counters.
In a ripple counter, a flip-flop output transition serves as the clock to the next flip-flop.
o With an asynchronous circuit, the bits in the count do not all change at the same time.
In a synchronous counter, all flip-flops receive common clock.
o With a synchronous circuit, all the bits in the count change synchronously with the
assertion of the clock
A counter may count up or count down or count up and down depending on the input control.
Uses of Counters:
The most typical uses of counters are
To count the number of times that a certain event takes place; the occurrence of event to be
counted is represented by the input signal to the counter
To control a fixed sequence of actions in a digital system
To generate timing signals
To generate clocks of different frequencies
**********************************
Modulo 16 /4 bit Ripple Down counter/ Asynchronous Down counter
Explain about Modulo 16 /4 bit Ripple Down counter.
The output of down-counter is decremented by one for each clock transition.
A 4-bit asynchronous down-counter consists of 4 JK Flip-Flops.
The external clock signal is connected to the clock input of the first Flip-Flop.
The clock inputs of the remaining Flip-Flops are triggered by the Q output of the previous stage.
We know that in a JK Flip-Flop, if J = 1, K = 1 and the clock is triggered, the past output will be
complemented.
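The ripple-down behaviour can be sketched as follows (names are ours): the first stage toggles on every pulse, and each later stage toggles only when the previous stage's Q makes a 0-to-1 transition, which produces a decrementing count.

```python
def ripple_down_counter(pulses, bits=4):
    """Simulate a JK ripple down-counter; returns the count after each pulse."""
    q = [0] * bits                   # flip-flop outputs, LSB first
    counts = []
    for _ in range(pulses):
        for i in range(bits):
            was = q[i]
            q[i] ^= 1                # J=K=1: toggle
            if was != 0:             # ripple continues only on a rising Q
                break
        counts.append(sum(bit << i for i, bit in enumerate(q)))
    return counts
```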
4-bit Synchronous up-counter:
Explain about 4-bit Synchronous up-counter.
In a JK Flip-Flop, if J=0, K=0 and the clock is triggered, the output never changes. If J=1 and K=1 and
the clock is triggered, the past output will be complemented.
Initially the register is cleared QDQCQBQA= 0000.
During the first clock pulse, JA= KA = 1, QA becomes 1, QB, QC, QD remains 0.
QDQCQBQA= 0001.
During second clock pulse, JA= KA = 1, QA=0.
JB= KB = 1, QB =1, QC, QD remains 0.
QDQCQBQA= 0010.
During third clock pulse, JA= KA = 1, QA=1.
JB= KB = 0, QB =1, QC, QD remains 0.
QDQCQBQA= 0011.
During fourth clock pulse, JA= KA = 1, QA=0.
JB= KB = 1, QB =0
JC= KC = 1, QC=1
QD remains 0
QDQCQBQA= 0100.
The same procedure repeats until the counter counts up to 1111.
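The pulse-by-pulse procedure above can be sketched as follows (names are ours): all stages share the clock, and stage i toggles when the AND of all lower-stage Q outputs is 1, i.e. J = K = QA for stage B, J = K = QA·QB for stage C, and so on.

```python
def sync_up_counter(pulses, bits=4):
    """Simulate a JK synchronous up-counter; returns the count after each pulse."""
    q = [0] * bits                   # flip-flop outputs, LSB first
    counts = []
    for _ in range(pulses):
        enable = 1
        nxt = []
        for i in range(bits):
            nxt.append(q[i] ^ enable)   # toggle when J=K=1
            enable &= q[i]              # AND chain of the lower Q outputs
        q = nxt
        counts.append(sum(bit << i for i, bit in enumerate(q)))
    return counts
```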
*******************************************
4- bit Synchronous down-counter:
Explain about 4-Bit Synchronous down counter.
In a JK Flip-Flop, if J=0, K=0 and the clock is triggered, the output never changes. If J=1 and K=1 and the
clock is triggered, the past output will be complemented.
Initially the register is cleared QDQCQBQA= 0000
QDQCQBQA= 1111
************************************
Modulo 8 Synchronous Up/Down Counter:
Explain about Modulo 8 Synchronous Up/Down Counter.
In the synchronous up-counter, the QA output is given to JB, KB and QA·QB is given to JC, KC. But in the
synchronous down-counter, the QA’ output is given to JB, KB and QA’·QB’ is given to JC, KC.
If Up/Down = 1, the 3-bit synchronous up/down counter will perform up-counting. It will count from
000 to 111. When Up/Down = 1, gates G2 and G4 are disabled and gates G1 and G3 are enabled, so the
circuit behaves as an up-counter.
If Up/Down = 0, the 3-bit synchronous up/down counter will perform down-counting. It will count from
111 to 000. When Up/Down = 0, gates G2 and G4 are enabled and gates G1 and G3 are disabled, so the
circuit behaves as a down-counter.
DESIGN OF RIPPLE COUNTERS
3-Bit Asynchronous Binary Counter / modulo-8 ripple counter:
Design a 3-bit binary counter using T-flip flops. [NOV – 2019]
Explain about 3-Bit Asynchronous binary counter. (Nov -2009)
The following is a three-bit asynchronous binary counter and its timing diagram for one cycle. It
works exactly the same way as a two-bit asynchronous binary counter mentioned above, except it has
eight states due to the third flip-flop.
Asynchronous counters are commonly referred to as ripple counters for the following reason: The
effect of the input clock pulse is first “felt” by FF0. This effect cannot get to FF1 immediately because of
the propagation delay through FF0. Then there is the propagation delay through FF1 before FF2 can be
triggered. Thus, the effect of an input clock pulse “ripples” through the counter, taking some time, due
to propagation delays, to reach the last flip-flop.
**********************************
ANALYSIS OF CLOCKED SEQUENTIAL CIRCUITS
Design and analyze a clocked sequential circuit with an example.
The analysis of a sequential circuit consists of obtaining a table or a diagram for the time sequence of
inputs, outputs and internal states.
Consider the sequential circuit is shown in figure. It consists of two D flip-flops A and B, an input x and
an output y.
A state equation specifies the next state as function of the present state and inputs.
A(n+1) = A(n)x(n) + B(n)x(n)
B(n+1) = A'(n)x(n)
They can be written in simplified form as,
A(n+1) = Ax + Bx
B(n+1) = A'x
The present state value of the output can be expressed algebraically as,
y(n) = (A + B)x'
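The state equations can be evaluated directly to build the state table. The sketch below assumes the standard textbook form of this two-flip-flop example, A(n+1) = Ax + Bx, B(n+1) = A'x, y = (A + B)x', writing a complement as 1 - v:

```python
def next_state(A, B, x):
    """State equations of the two-D-flip-flop circuit:
    A(n+1) = A*x + B*x,  B(n+1) = A'*x,  y = (A + B)*x'."""
    A_next = (A & x) | (B & x)
    B_next = (1 - A) & x
    y = (A | B) & (1 - x)
    return A_next, B_next, y

# One row of the state table: present state A=0, B=0 with input x=1.
print(next_state(0, 0, 1))  # -> (0, 1, 0)
```

Looping over all (A, B, x) combinations reproduces the full eight-row state table.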
State Diagram:
State diagram is the graphical representation of the information available in a state table.
In state diagram, a state is represented by a circle and the transitions between states are indicated by
directed lines connecting the circles.
State Table:
A state table gives the time sequence of inputs, outputs and flip-flop states. The table consists of
four sections labeled present state, next state, input and output.
The present state section shows the states of flip flops A and B at any given time ‘n’. The input
section gives a value of x for each possible present state.
The next state section shows the states of flip flops one clock cycle later, at time n+1.
The state table for the circuit is shown. This is derived using state equations.
The above state table can also be expressed in different forms as follows.
The state diagram for the logic circuit is shown in the figure below.
Excitation table:
K-Map:
Logic Diagram:
*************************************
Truth table:
K-Map:
Logic Diagram:
SHIFT REGISTERS
Explain various types of shift registers. (or) Explain the operation of a 4-bit bidirectional shift register.
(Or) What are registers? Construct a 4 bit register using D-flip flops and explain the operations on
the register. (or) With diagram explain how two binary numbers are added serially using shift
registers. (Apr – 2019)[NOV – 2019]
A register is simply a group of Flip-Flops that can be used to store a binary number.
There must be one Flip-Flop for each bit in the binary number.
For instance, a register used to store an 8-bit binary number must have 8 Flip-Flops.
The Flip-Flops must be connected such that the binary number can be entered (shifted) into the
register and possibly shifted out.
A group of Flip-Flops connected to provide either or both of these functions is called a shift register.
A register capable of shifting the binary information held in each cell to its neighboring cell in a
selected direction is called a shift register.
There are four types of shift registers namely:
1. Serial In Serial Out Shift Register,
2. Serial In Parallel Out Shift Register
3. Parallel In Serial Out Shift Register
4. Parallel In Parallel Out Shift Register
1. Serial In Serial Out Shift Register:
As seen in the figure, it accepts data serially, i.e., one bit at a time on a single input line. It produces the stored
information on its single output also in serial form.
Data may be shifted left using shift left register or shifted right using shift right register.
As shown in the above figure, the clock pulse is applied to all the flip-flops simultaneously.
The output of each flip-flop is connected to D input of the flip-flop at its right.
Each clock pulse shifts the contents of the register one bit position to the right.
New data is entered into stage A, whereas the data present in stage D is shifted out.
For example, consider that all stages are reset and a steady logical 1 is applied to the serial input
line.
When the first clock pulse is applied, flip-flop A is set and all other flip-flops are reset.
When the second clock pulse is applied, the ‘1’ on the data input is shifted into flip-flop A and ‘1’
that was in flip flop A is shifted to flip-flop B.
This continues till all the flip-flops are set.
The data in each stage after each clock pulse is shown in table below
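The shift-right action described above can be sketched in Python (a behavioural model; stage A is the input end):

```python
def shift_right(stages, serial_in):
    """One clock pulse on a serial-in serial-out shift register.
    stages[0] is flip-flop A (input end); the last element shifts out."""
    serial_out = stages[-1]
    new_stages = [serial_in] + stages[:-1]
    return new_stages, serial_out

# All stages reset, a steady logic 1 on the serial input line:
stages = [0, 0, 0, 0]
history = []
for _ in range(4):
    stages, _ = shift_right(stages, 1)
    history.append(stages[:])
print(history)
# -> [[1, 0, 0, 0], [1, 1, 0, 0], [1, 1, 1, 0], [1, 1, 1, 1]]
```

Each row of `history` corresponds to one row of the table: the register fills with 1's one stage per clock pulse.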
The clock is applied to all the flip-flops simultaneously. The output of each flip-flop is connected
to D input of the flip-flop at its left.
Each clock pulse shifts the contents of the register one bit position to the left.
Let us illustrate the entry of the 4-bit binary number 1111 into the register beginning with the
right most bit.
When the first clock pulse is applied, flip flop A is set and all other flip-flops are reset.
When the second clock pulse is applied, the '1' on the data input is shifted into flip-flop A and the '1' that
was in flip-flop A is shifted to flip-flop B. This continues till all the flip-flops are set.
The data in each stage after each clock pulse is shown in table below.
2. Serial in Parallel out shift register:
A 4 bit serial in parallel out shift register is shown in figure.
It consists of one serial input and outputs are taken from all the flip-flops simultaneously.
The output of each flip-flop is connected to D input of the flip-flop at its right. Each clock pulse
shifts the contents of the register one bit position to the right.
For example, consider that all stages are reset and a steady logical ‘1’ is applied to the serial
input line.
When the first clock pulse is applied flip flop A is set and all other flip-flops are reset.
When the second pulse is applied the ‘1’ on the data input is shifted into flip flop A and ‘1’ that
was in flip flop A is shifted into flip-flop B. This continues till all flip-flops are set. The data in
each stage after each clock pulse is shown in table below.
3. Parallel In Serial Out Shift register:
For a register with parallel data inputs, the bits are entered simultaneously into their
respective stages on parallel lines.
A four-bit parallel in serial out shift register is shown in the figure. Let A, B, C and D be the four
parallel data input lines and SHIFT/LOAD a control input that allows the four bits of data to be
entered in parallel or shifted serially.
When SHIFT/LOAD is low, gates G1 through G3 are enabled, allowing the data at the parallel
inputs to reach the D input of the respective flip-flops. When the clock pulse is applied, the flip-flops with
D=1 will set and those with D=0 will reset, thereby storing all four bits simultaneously.
When SHIFT/LOAD is high, AND gates G1 through G3 are disabled and gates G4 through G6 are
enabled, allowing the data bits to shift right from one stage to the next. The OR gates allow either
the normal shifting operation or the parallel data entry operation, depending on which AND gates
are enabled by the level on the SHIFT/LOAD input.
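The SHIFT/LOAD control can be sketched in Python (a behavioural model of the two modes, not of gates G1 through G6):

```python
def piso_clock(stages, shift_load, parallel=None, serial_in=0):
    """One clock of a parallel-in serial-out register.
    SHIFT/LOAD low (0): load the parallel inputs A..D into all stages.
    SHIFT/LOAD high (1): shift right one position."""
    if shift_load == 0:
        return list(parallel)
    return [serial_in] + stages[:-1]

stages = [0, 0, 0, 0]
stages = piso_clock(stages, 0, parallel=[1, 0, 1, 1])  # parallel load
out = []
for _ in range(4):                                     # then shift out serially
    out.append(stages[-1])
    stages = piso_clock(stages, 1)
print(out)  # -> [1, 1, 0, 1]
```

One load pulse captures all four bits at once; four further pulses deliver them on the single serial output, last stage first.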
4. Parallel In Parallel Out Shift Register:
In parallel in parallel out shift register, data inputs can be shifted either in or out of the register in
parallel.
A four-bit parallel in parallel out shift register is shown in the figure. Let A, B, C, D be the four
parallel data input lines and QA, QB, QC and QD be the four parallel data output lines.
SHIFT/LOAD is the control input that allows the four bits of data to be entered in parallel or shifted
serially.
When SHIFT/LOAD is low, gates G1 through G3 are enabled, allowing the data at the parallel inputs
to reach the D input of the respective flip-flops. When the clock pulse is applied, the flip-flops with D=1
will set and those with D=0 will reset, thereby storing all four bits simultaneously. These are
immediately available at the outputs QA, QB, QC and QD.
When SHIFT/LOAD is high, gates G1 through G3 are disabled and gates G4 through G6 are
enabled, allowing the data bits to shift right from one stage to another. The OR gates allow either
the normal shifting operation or the parallel data entry operation, depending on which AND gates
are enabled by the level on the SHIFT/LOAD input.
**************************************
Universal Shift Register:
Explain about universal shift register.( Apr -2018)
A register that can shift data to right and left and also has parallel load capabilities is called
universal shift register.
It has the following capabilities.
1. A clear control to clear the register to 0.
2. A clock input to synchronize the operations.
3. A shift right control to enable the shift right operation and the associated serial input
and output lines.
4. A shift left control to enable the shift left operation and the associated serial input and
output lines.
5. A parallel load control to enable a parallel transfer and the n input lines.
6. n parallel output lines.
7. A control state that leaves the information in the register unchanged in the presence of
the clock.
The diagram of a 4-bit universal shift register that has all the capabilities listed above is shown in
the figure. It consists of four D flip-flops and four multiplexers. All the multiplexers have two
common selection inputs S1 and S0. Input 0 is selected when S1S0=00, input 1 is selected when
S1S0=01, and similarly for the other two inputs.
The selection inputs control the mode of operation of the register. When S1S0=00, the present
value of the register is applied to the D inputs of the flip-flop. The next clock pulse transfers into
each flip-flop the binary value it held previously, and no change of state occurs.
When S1S0=01, terminal 1 of the multiplexer inputs has a path to the D inputs of the flip-flops.
This causes a shift right operation, with the serial input transferred into flip-flop A3.
When S1S0=10, a shift left operation results with the other serial input going into flip-flop A0.
Finally, when S1 S0 = 11, the binary information on the parallel input lines is transferred into the
register simultaneously during the next clock edge. The function table is shown below.
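The four modes selected by S1S0 can be sketched in Python (a behavioural model of the function table; the register is represented as q = [A3, A2, A1, A0]):

```python
def universal_shift(q, s1, s0, ser_r=0, ser_l=0, parallel=None):
    """One clock of a 4-bit universal shift register, q = [A3, A2, A1, A0].
    S1S0 = 00: hold, 01: shift right (serial input into A3),
    10: shift left (serial input into A0), 11: parallel load."""
    if (s1, s0) == (0, 0):
        return q[:]
    if (s1, s0) == (0, 1):
        return [ser_r] + q[:-1]
    if (s1, s0) == (1, 0):
        return q[1:] + [ser_l]
    return list(parallel)

q = universal_shift([0, 0, 0, 0], 1, 1, parallel=[1, 0, 1, 0])  # load 1010
q = universal_shift(q, 0, 1, ser_r=1)                           # shift right
print(q)  # -> [1, 1, 0, 1]
```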
***********************************
**** SHIFT REGISTER COUNTERS:
Explain about Johnson and Ring counter. (Nov 2018)
Most common shift register counters are Johnson counter and ring counter.
Johnson counter:
A 4 bit Johnson counter using D flip-flop is shown in figure. It is also called shift counter or
twisted counter.
The output of each flip-flop is connected to the D input of the next stage. The inverted output of the last
flip-flop, QD', is connected to the D input of the first flip-flop A.
Initially, assume that the counter is reset to 0, i.e., QA QB QC QD = 0000. The values at DB =
DC = DD = 0, whereas DA = 1 since the complement of QD is 1.
When the first clock pulse is applied, the first flip-flop A is set and the other flip-flops are reset.
i.e., QA QB QC QD =1000.
When the second clock pulse is applied, the counter is QA QB QC QD = 1100. This continues and
the counter will fill up with 1's from left to right and then it will fill up with 0's again.
The sequence of states is shown in the table. As observed from the table, a 4-bit shift counter has
8 states. In general, an n-flip-flop Johnson counter will result in 2n states.
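The twisted-ring feedback and the resulting 2n-state sequence can be sketched in Python (a behavioural model; the function name is illustrative):

```python
def johnson_tick(q):
    """One clock of a 4-bit Johnson (twisted-ring) counter:
    each stage feeds the next, and the complement of the last
    stage feeds back into the first."""
    return [1 - q[-1]] + q[:-1]

q = [0, 0, 0, 0]
states = []
for _ in range(8):
    q = johnson_tick(q)
    states.append(''.join(map(str, q)))
print(states)
# -> ['1000', '1100', '1110', '1111', '0111', '0011', '0001', '0000']
```

Eight distinct states are visited before the sequence repeats, matching 2n = 8 for n = 4 flip-flops.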
Ring counter:
As shown in the figure, the true output of flip-flop D, i.e., QD, is connected back to the serial input of flip-flop A.
Initially, a 1 is preset into the first flip-flop and the rest of the flip-flops are cleared, i.e.,
QA QB QC QD = 1000.
When the first clock pulse is applied, the second flip-flop is set to 1 while the other three flip-flops
are reset to 0.
When the second clock pulse is applied, the ‘1’ in the second flip-flop is shifted to the third flip-
flop and so on.
The truth table which describes the operation of the ring counter is shown below.
As seen a 4-bit ring counter has 4 states. In general, an n-bit ring counter has n states. Since a
single ‘1’ in the register is made to circulate around the register, it is called a ring counter. The
timing diagram of the ring counter is shown in figure.
TWO MARK QUESTIONS & ANSWERS
TOPIC WISE POSSIBLE 2 MARKS UNIVERSITY QUESTIONS
SECOND HALF TOPIC: Analysis and design of clocked sequential circuits – Design-Moore/Mealy
models, state minimization, state assignment, circuit implementation - Registers – Counters.
26. What are Mealy and Moore circuits? Or, what are the models used to represent clocked sequential
circuits?
A Mealy circuit is a network where the output is a function of both the present state and the input.
A Moore circuit is a network where the output is a function of only the present state.
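The distinction can be illustrated with a minimal hypothetical one-bit machine (the output functions below are invented for illustration only):

```python
def mealy_output(state, x):
    """Mealy model: output depends on both the present state and the input."""
    return state & x

def moore_output(state):
    """Moore model: output depends on the present state only."""
    return state

# Same present state, different inputs: only the Mealy output can differ.
print(mealy_output(1, 0), mealy_output(1, 1))  # -> 0 1
print(moore_output(1))                          # -> 1
```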
FIRST HALF TOPIC: Functional Units of a Digital Computer: Von Neumann Architecture –
Operation and Operands of Computer Hardware Instruction – Instruction Set Architecture (ISA):
Memory Location, Address
and Operation
Input Unit:
A computer must receive both data and program statements to function properly and
be able to solve problems. The method of feeding data and programs to a computer is
accomplished by an input device.
Computer input devices read data from a source, such as magnetic disks, and translate that data
into electronic impulses for transfer into the CPU. Some typical input devices are a keyboard, a
mouse or a scanner.
Output Unit
The output unit is the counterpart of the input unit. Its function is to send processed results to the
outside world.
The most familiar example of such a device is a printer. Printers employ mechanical impact
heads, inkjet streams, or photocopying techniques, as in laser printers, to perform the printing. Some
printers are capable of printing as many as 10,000 lines per minute.
This is a tremendous speed for a mechanical device but is still very slow compared to the
electronic speed of a processor unit. Monitors, Speakers, Headphones and projectors are also
some of the output devices.
Some units, such as graphic displays, provide both an output function and an input function. The
dual role of such units is referred to with the single name I/O unit in many cases.
Speakers, Headphones and projectors are some of the output devices. Storage devices such
as hard disk, floppy disk, flash drives are also used for input as well as output.
Memory Unit
The function of the memory unit is to store programs and data. There are two classes of storage,
called primary and secondary. Primary storage is a fast memory that operates at electronic
speeds.
Programs must be stored in the memory while they are being executed. The memory contains a
large number of semiconductor storage cells, each capable of storing one bit of information.
These cells are rarely read or written as individual cells but instead are processed in groups of
fixed size called words.
The memory is organized so that the contents of one word, containing n bits, can be stored or
retrieved in one basic operation.
To provide easy access to any word in the memory, a distinct address is associated with each
word location. Addresses are numbers that identify successive locations.
A given word is accessed by specifying its address and issuing a control command that starts the
storage or retrieval process. The number of bits in each word is often referred to as the word
length of the computer.
Typical word lengths range from 16 to 64 bits. The capacity of the memory is one factor that
characterizes the size of a computer.
Programs must reside in the memory during execution. Instructions and data can be written into
the memory or read out under the control of the processor.
It is essential to be able to access any word location in the memory as quickly as possible.
Memory in which any location can be reached in a short and fixed amount of time after specifying
its address is called random-access Memory (RAM).
The time required to access one word is called the memory access time. This time is fixed,
independent of the location of the word being accessed. It typically ranges from a few
nanoseconds (ns) to about 100 ns for modern RAM units.
The memory of a computer is normally implemented as a Memory hierarchy of three or four
levels of semiconductor RAM units with different speeds and sizes.
The small, fast, RAM units are called caches. They are tightly coupled with the processor and are
often contained on the same integrated circuit chip to achieve high performance.
The largest and slowest unit is referred to as the main Memory. Although primary storage is
essential, it tends to be expensive.
Thus additional, cheaper, secondary storage is used when large amounts of data and many programs have
to be stored, particularly for information that is accessed infrequently. A wide selection of secondary
storage devices is available, including magnetic disks, tapes and optical disks.
Arithmetic and Logic Unit(ALU):
ALU is a digital circuit that performs two types of operations: arithmetic and logical.
Arithmetic operations are the fundamental mathematical operations consisting of addition,
subtraction, multiplication and division. Logical operations consist of comparisons, i.e., two
pieces of data are compared to see whether one is equal to, less than, or greater than the other.
The ALU is a fundamental building block of the central processing unit of a computer. Memory
enables a computer to store, at least temporarily, data and programs.
Memory, also known as primary storage or main memory, is a part of the microcomputer that
holds data for processing, instructions for processing the data (the program) and information
(processed data).
Part of the contents of the memory is held only temporarily, i.e., it is stored only as long as the
microcomputer is turned on. When you turn the machine off, the contents are lost.
The control unit instructs the arithmetic-logic unit which operation to perform and then sees that
the necessary numbers are supplied. The control and arithmetic & logic units are many times
faster than other devices connected to a computer system.
Control Unit (CU):
It is the part of a CPU that directs its operation. The control unit instructs the rest of the computer system
how to carry out a program‘s instructions.
It directs the movement of electronic signals between memories, which temporarily holds data,
instructions & processed information and the ALU.
It also directs these control signals between the CPU and input/output devices. The control unit is
the circuitry that controls the flow of information through the processor, and coordinates
the activities of the other units within it.
1.1 VON NEUMANN ARCHITECTURE
In the Von Neumann architecture, also known as the Von Neumann model, the computer consists of
a CPU, memory and I/O devices.
The program is stored in the memory. The CPU fetches an instruction from the memory at a time
and executes it.
Thus, the instructions are executed sequentially, which is a slow process. Von Neumann machines are
called control-flow computers because instructions are executed sequentially as controlled by a program
counter.
To increase the speed, parallel-processing computers have been developed in which several
CPUs are connected in parallel to solve a problem. Even in parallel computers, the basic
building blocks are Von Neumann processors.
The von Neumann architecture is a design model for a stored-program digital computer that uses
a processing unit and a single separate storage structure to hold both instructions and data.
It is named after mathematician and early computer scientist John von Neumann.
Such a computer implements a universal Turing machine, and is the common "referential model" for
specifying sequential architectures, in contrast with parallel architectures.
Answer
The translation from C to MIPS assembly language instructions is performed by the compiler.
Show the MIPS code produced by a compiler.
A MIPS instruction operates on two source operands and places the result in one destination
operand.
Hence, the two simple statements above compile directly into these two MIPS assembly language
instructions:
Answer
The compiler must break this statement into several assembly instructions, since only one
operation is performed per MIPS instruction.
The first MIPS instruction calculates the sum of g and h. We must place the result somewhere, so
the compiler creates a temporary variable, called t0:
Although the next operation is subtract, we need to calculate the sum of i and j before we can
subtract.
Thus, the second instruction places the sum of i and j in another temporary variable created by the
compiler, called t1:
Finally, the subtract instruction subtracts the second sum from the first and places the difference
in the variable f, completing the compiled code:
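The three-instruction sequence for f = (g + h) - (i + j) can be traced in Python, modelling the registers as a dictionary. The $s/$t register assignments and the sample values below are assumptions for illustration:

```python
# Sample register contents: g, h, i, j (values chosen arbitrarily).
regs = {'$s1': 10, '$s2': 20, '$s3': 3, '$s4': 7}

regs['$t0'] = regs['$s1'] + regs['$s2']   # add $t0, $s1, $s2   (t0 = g + h)
regs['$t1'] = regs['$s3'] + regs['$s4']   # add $t1, $s3, $s4   (t1 = i + j)
regs['$s0'] = regs['$t0'] - regs['$t1']   # sub $s0, $t0, $t1   (f = t0 - t1)

print(regs['$s0'])  # -> 20
```

Each line performs exactly one operation on two source operands and one destination, mirroring the one-operation-per-instruction constraint described above.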
Memory Operands
Programming languages have simple variables that contain single data elements, as in these
examples, but they also have more complex data structures—arrays and structures.
These complex data structures can contain many more data elements than there are registers in a
computer.
The processor can keep only a small amount of data in registers, but computer memory contains
billions of data elements.
Hence, data structures (arrays and structures) are kept in memory.
As explained above, arithmetic operations occur only on registers in MIPS instructions; thus,
MIPS must include instructions that transfer data between memory and registers. Such
instructions are called data transfer instructions.
To access a word in memory, the instruction must supply the memory address.
Memory is just a large, single-dimensional array, with the address acting as the index to that
array, starting at 0. For example, in the following Figure, the address of the third data element is
2, and the value of Memory[2] is 10
MIPS Code
In the given statement there is a single operation, but one of the operands is in memory, so we
must carry out this operation in two steps:
Step 1: load the word at memory address ($s3) + 8 into a temporary register.
Step 2: perform the addition with h ($s2), and store the result in g ($s1).
The constant in a data transfer instruction (8) is called the offset, and the register added to form
the address ($s3) is called the base register.
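The base-plus-offset addressing of the load can be traced in Python. Word addressing is assumed here for simplicity (with real MIPS byte addressing, A[8] would use offset 32 = 4 × 8); the memory contents are illustrative:

```python
# Memory as a word-addressed list; a load reads memory[base + offset].
memory = [0] * 16
memory[11] = 42                          # pretend A[8] lives at word address 11

regs = {'$s2': 5, '$s3': 3}              # h in $s2, base address of A in $s3
regs['$t0'] = memory[regs['$s3'] + 8]    # lw  $t0, 8($s3)   (t0 = A[8])
regs['$s1'] = regs['$s2'] + regs['$t0']  # add $s1, $s2, $t0 (g = h + A[8])
print(regs['$s1'])  # -> 47
```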
Example 2:
Compiling Using Load and Store
What is the MIPS assembly code for the C assignment statement below?
Assume variable h is associated with register $s2 and the base address of the array A is in $s3.
MIPS code
The final instruction stores the sum into A[12], using 48 (4 × 12) as the offset and register $s3 as
the base register
Load word and store word are the instructions that copy words between memory and registers in
the MIPS architecture. Other brands of computers use other instructions along with load and store
to transfer data.
Constant or Immediate Operands
Constants are used as one of the operands in many arithmetic operations in the MIPS
architecture.
The constants would have been placed in memory when the program was loaded.
To avoid a load instruction before an arithmetic instruction, we can use an instruction in which one
operand is a constant.
This quick add instruction with one constant operand is called add immediate or addi. To add 4
to register $s3, we just write
addi $s3, $s3, 4    # $s3 = $s3 + 4
Design Principle 3: Make the common case fast.
A byte is always 8 bits, but the word length typically ranges from 16 to 64 bits. It is
impractical to assign distinct addresses to individual bit locations in the memory.
The most practical assignment is to have successive addresses refer to successive byte
locations in the memory. This is the assignment used in most modern computers. The term
byte-addressable memory is used for this assignment. Byte locations have addresses 0, 1, 2,....
Thus, if the word length of the machine is 32 bits, successive words are located at addresses 0,
4, 8, ..., with each word consisting of four bytes.
3.1.2 Big-Endian and Little-Endian Assignments
The name big-endian is used when lower byte addresses are used for the more significant bytes
(the leftmost bytes) of the word.
The name little-endian is used for the opposite ordering, where the lower byte addresses are
used for the less significant bytes (the rightmost bytes) of the word.
The words "more significant" and "less significant" are used in relation to the weights (powers of
2) assigned to bits when the word represents a number.
Both little-endian and big-endian assignments are used in commercial machines. In both cases,
byte addresses 0, 4, 8,..., are taken as the addresses of successive words in the memory of a
computer with a 32-bit word length.
These are the addresses used when accessing the memory to store or retrieve a word.
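The two byte orderings can be demonstrated with Python's standard struct module, packing the same 32-bit value both ways:

```python
import struct

# The same 32-bit value stored under the two byte orderings.
value = 0x01020304
big = struct.pack('>I', value)     # big-endian: MSB at the lowest address
little = struct.pack('<I', value)  # little-endian: LSB at the lowest address

print(big.hex())     # -> 01020304
print(little.hex())  # -> 04030201
```

The most significant byte (01) sits at the lowest address under big-endian ordering, and at the highest address under little-endian ordering.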
UNIT IV PROCESSOR
FIRST HALF TOPIC: Instruction Execution – Building a Data Path – Designing a Control
Unit – Hardwired Control
PART B
1. Briefly explain about Basic MIPS Implementation. Nov / Dec 2015, 2018
A Basic MIPS Implementation:
We will be examining an implementation that includes a subset of the core MIPS
(Microprocessor without Interlocked Pipeline Stages) instruction set:
The memory-reference instructions load word (lw) and store word (sw)
The arithmetic-logical instructions add, sub, AND, OR, and slt
The instructions branch equal (beq) and jump (j), which we add last
This subset does not include all the integer instructions (for example, shift, multiply, and divide
are missing), nor does it include any floating-point instructions.
However, the key principles used in creating a data path and designing the control are illustrated.
The implementation of the remaining instructions is similar. In examining the implementation,
we will have the opportunity to see how the instruction set architecture determines many aspects
of the implementation, and how the choice of various implementation strategies affects the clock
rate and CPI for the computer.
In addition, most concepts used to implement the MIPS subset in this chapter are the same basic
ideas that are used to construct a broad spectrum of computers, from high-performance servers to
general-purpose microprocessors to embedded processors.
An Overview of the Implementation
Consider the MIPS instructions, including the integer arithmetic-logical instructions, the memory-reference
instructions, and the branch instructions.
Much of what needs to be done to implement these instructions is the same, independent of the exact
class of instruction.
For every instruction, the first two steps are identical:
1. Send the program counter (PC) to the memory that contains the code and fetch the instruction
from that memory.
2. Read one or two registers, using fields of the instruction to select the registers to read. For the
load word instruction, we need to read only one register, but most other instructions require that
we read two registers.
After these two steps, the actions required to complete the instruction depend on the instruction
class. Fortunately, for each of the three instruction classes (memory-reference, arithmetic-
logical, and branches), the actions are largely the same, independent of the exact instruction.
The simplicity and regularity of the MIPS instruction set simplifies the implementation by
making the execution of many of the instruction classes similar.
For example,
All instruction classes, except jump, use the arithmetic-logical unit (ALU) after reading the
registers.
The memory-reference instructions use the ALU for an address calculation, the arithmetic-
logical instructions for the operation execution, and branches for comparison. After using the
ALU, the actions required to complete various instruction classes differ.
A memory-reference instruction will need to access the memory either to read data for a load or
write data for a store.
An arithmetic-logical or load instruction must write the data from the ALU or memory back into
a register. Lastly, for a branch instruction, we may need to change the next instruction address
based on the comparison; otherwise, the PC should be incremented by 4 to get the address of the
next instruction.
An abstract view of the implementation of the MIPS subset showing the Major
functional units and the major connections between them
All instructions start by using the program counter to supply the instruction address to the
instruction memory.
After the instruction is fetched, the register operands used by an instruction are specified by
fields of that instruction.
Once the register operands have been fetched, they can be operated on to compute a memory address
(for a load or store), to compute an arithmetic result (for an integer arithmetic-logical instruction), or
to perform a comparison (for a branch).
If the instruction is an arithmetic-logical instruction, the result from the ALU must be written to
a register. If the operation is a load or store, the ALU result is used as an address to either store a
value from the registers or load a value from memory into the registers.
The result from the ALU or memory is written back into the register file. Branches require the
use of the ALU output to determine the next instruction address, which comes either from the
ALU (where the PC and branch offset are summed) or from an adder that increments the current
PC by 4.
The thick lines interconnecting the functional units represent buses, which consist of multiple
signals. The arrows are used to guide the reader in knowing how information flows. Since signal
lines may cross, we explicitly show when crossing lines are connected by the presence of a dot
where the lines cross.
BUILDING A DATA PATH
2. Give detail description about Building a Data path.(or) Build a suitable Data path for branch
instruction. Explain all the blocks with suitable example. Nov/Dec 2021
Data path element: A unit used to operate on or hold data within a processor. In the MIPS
implementation, the data path elements include the instruction and data memories, the register file, the
ALU and adders.
A memory unit to store the instructions of a program and supply instructions given an address. The
program counter (PC), is a register that holds the address of the current instruction. We need an adder to
increment the PC to the address of the next instruction.
Two state elements are needed to store and access instructions, and an adder is needed
to compute the next instruction address.
The state elements are the instruction memory and the program counter. The instruction memory
need only provide read access because the data path does not write instructions.
Since the instruction memory only reads, we treat it as combinational logic: the output at any
time reflects the contents of the location specified by the address input, and no read control
signal is needed. (We will need to write the instruction memory when we load the program; this
is not hard to add, and we ignore it for simplicity.)
The program counter is a 32‑bit register that is written at the end of every clock cycle and thus
does not need a write control signal. The adder is an ALU wired to always add its two 32‑bit
inputs and place the sum on its output.
This is done simply by wiring the control lines so that the control always specifies an add operation. We
will draw such an ALU with the label Add, to indicate that it has been permanently made an adder
and cannot perform the other ALU functions. To execute any instruction, we must start by
fetching the instruction from memory.
To prepare for executing the next instruction, we must also increment the program counter so
that it points at the next instruction, 4 bytes later. The figure shows how to combine the three
elements to form a datapath that fetches instructions and increments the PC to obtain the address of
the next sequential instruction.
Now let’s consider the R-format instructions. They all read two registers, perform an ALU
operation on the contents of the registers, and write the result to a register.
We call these instructions either R-type instructions or arithmetic-logical instructions (since
they perform arithmetic or logical operations). This instruction class includes add, sub, AND,
OR, and slt. Recall that a typical instance of such an instruction is add $t1,$t2,$t3, which reads
$t2 and $t3 and writes $t1.
The processor’s 32 general-purpose registers are stored in a structure called a register file. A
register file is a collection of registers in which any register can be read or written by specifying
the number of the register in the file. The register file contains the register state of the computer.
In addition, we will need an ALU to operate on the values read from the registers.
R-format instructions have three register operands, so we will need to read two data words from
the register file and write one data word into the register file for each instruction. For each data
word to be read from the registers, we need an input to the register file that specifies the register
number to be read and an output from the register file that will carry the value that has been read
from the registers.
A portion of the datapath used for fetching instructions and incrementing the program
counter.
The fetched instruction is used by other parts of the datapath.
The two units needed to implement loads and stores, in addition to the register file and
ALU
The beq instruction has three operands, two registers that are compared for equality, and a 16‑bit
offset used to compute the branch target address relative to the branch instruction address. Its
form is beq $t1,$t2,offset. To implement this instruction, we must compute the branch target
address by adding the sign-extended offset field of the instruction to the PC.
There are two details in the definition of branch instructions:
The instruction set architecture specifies that the base for the branch address calculation is the
address of the instruction following the branch. Since we compute PC + 4 (the address of the
next instruction) in the instruction fetch datapath, it is easy to use this value as the base for
computing the branch target address.
The architecture also states that the offset field is shifted left 2 bits so that it is a word offset; this
shift increases the effective range of the offset field by a factor of 4.
To deal with the latter complication, we will need to shift the offset field left by 2 bits.
Branch taken: a branch where the branch condition is satisfied and the program counter (PC)
becomes the branch target. All unconditional branches are taken branches.
Branch not taken (untaken branch): a branch where the branch condition is false and the
program counter (PC) becomes the address of the instruction that sequentially follows the branch.
The datapath for a branch uses the ALU to evaluate the branch condition and a
separate adder to compute the branch target as the sum of the incremented PC and
the sign-extended, lower 16 bits of the instruction (the branch displacement), shifted
left 2 bits.
The unit labeled Shift left 2 is simply a routing of the signals between input and output that
appends 00 (binary) to the low-order end of the sign-extended offset field; no actual shift
hardware is needed, since the amount of the "shift" is constant.
Since we know that the offset was sign-extended from 16 bits, the shift will throw away only
“sign bits.” Control logic is used to decide whether the incremented PC or branch target should
replace the PC, based on the Zero output of the ALU.
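The branch-target calculation described above can be traced with a short Python sketch. This is an illustrative model (the function name and the plain-integer treatment of the PC are ours, not part of the datapath description):

```python
def branch_target(pc, offset16):
    """Compute the MIPS branch target: PC + 4 plus the sign-extended
    16-bit offset shifted left 2 bits (word offset -> byte offset)."""
    # Sign-extend the 16-bit field to a signed Python integer
    if offset16 & 0x8000:
        offset16 -= 0x10000
    return (pc + 4) + (offset16 << 2)

# Branch 3 instructions forward from the instruction at address 0x1000:
# 0x1000 + 4 + (3 << 2) = 0x1010
print(hex(branch_target(0x1000, 3)))  # 0x1010
```

An offset of 0xFFFF (−1 after sign extension) yields PC + 4 − 4 = PC, which shows why throwing away only "sign bits" during the shift is safe.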
How the ALU control bits are set depends on the ALUOp control bits and the
different function codes for the R-type instruction.
The opcode, listed in the first column, determines the setting of the ALUOp bits. All the
encodings are shown in binary.
Notice that when the ALUOp code is 00 or 01, the desired ALU action does not depend on the
function code field; in this case, we say that we “don’t care” about the value of the function
code, and the funct field is shown as XXXXXX. When the ALUOp value is 10, then the function
code is used to set the ALU control input.
There are several different ways to implement the mapping from the 2‑bit ALUOp field and the
6‑bit funct field to the four ALU operation control bits.
Because only a small number of the 64 possible values of the function field are of interest and
the function field is used only when the ALUOp bits equal 10, we can use a small piece of logic
that recognizes the subset of possible values and causes the correct setting of the ALU control
bits.
As a step in designing this logic, it is useful to create a truth table for the interesting combinations of the
function code field and the ALUOp bits. The truth table below shows how the 4-bit ALU control is set
depending on these two input fields.
The truth table for the 4 ALU control bits (called Operation).
The inputs are the ALUOp and function code field. Only the entries for which the ALU control
is asserted are shown.
Some don’t-care entries have been added. For example, the ALUOp does not use the encoding
11, so the truth table can contain entries 1X and X1, rather than 10 and 01.
Note that when the function field is used, the first 2 bits (F5 and F4) of these instructions are
always 10, so they are don’t-care terms and are replaced with XX in the truth table.
Don’t-care term: An element of a logical function in which the output does not depend on the
values of all the inputs.
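The mapping in the truth table can be expressed as a small piece of decoding logic. The following Python sketch is illustrative; the encodings follow the standard MIPS ALU control table (add = 0010, subtract = 0110, AND = 0000, OR = 0001, slt = 0111):

```python
def alu_control(alu_op, funct):
    """Map the 2-bit ALUOp and 6-bit funct field to the 4-bit ALU
    control (Operation) signal."""
    if alu_op == 0b00:            # lw/sw: always add; funct is don't-care
        return 0b0010
    if alu_op == 0b01:            # beq: always subtract; funct is don't-care
        return 0b0110
    # alu_op == 0b10: R-type, so decode the funct field
    return {0b100000: 0b0010,     # add
            0b100010: 0b0110,     # sub
            0b100100: 0b0000,     # AND
            0b100101: 0b0001,     # OR
            0b101010: 0b0111}[funct]  # slt

print(bin(alu_control(0b10, 0b101010)))  # slt -> 0b111
```

Note that only the five funct values of interest appear, mirroring the observation that the function field is examined only when ALUOp equals 10.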
4. Give detail description about the Design of Main Control Unit.
Designing the Main Control Unit
To understand how to connect the fields of an instruction to the data path, it is useful to review the
formats of the three instruction classes:
The R-type instruction classes,
Branch instruction classes, and
Load-store instruction classes
The three instruction classes (R-type, load and store, and branch) use two different
instruction formats
The jump instructions use another format, which we will discuss shortly.
(a). Instruction format for R-format instructions, which all have an opcode of 0. These
instructions have three register operands: rs, rt, and rd. Fields rs and rt are sources, and rd is the
destination. The ALU function is in the funct field and is decoded by the ALU control design in the
previous section. The R-type instructions that we implement are add, sub, AND, OR, and slt. The shamt
field is used only for shifts; we will ignore it in this chapter.
(b). Instruction format for load (opcode = 35 in decimal) and store (opcode = 43 in decimal) instructions. The
register rs is the base register that is added to the 16‑bit address field to form the memory address. For
loads, rt is the destination register for the loaded value. For stores, rt is the source register whose value
should be stored into memory.
(c). Instruction format for branch equal (opcode = 4). The registers rs and rt are the source
registers that are compared for equality. The 16-bit address field is sign-extended, shifted, and added to
PC + 4 to compute the branch target address.
There are several major observations about this instruction format that we will rely on:
The op field, also called the opcode, is always contained in bits 31:26. We will refer to this field
as Op[5:0].
The two registers to be read are always specified by the rs and rt fields, at positions 25:21 and
20:16. This is true for the R-type instructions, branch equal, and store.
The base register for load and store instructions is always in bit positions 25:21 (rs).
The 16‑bit offset for branch equal, load, and store is always in positions 15:0.
The destination register is in one of two places. For a load it is in bit positions 20:16 (rt), while
for an R-type instruction it is in bit positions 15:11 (rd). Thus, we will need to add a multiplexor
to select which field of the instruction is used to indicate the register number to be written.
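The field positions listed in these observations can be checked with a short Python sketch (the dictionary layout is ours; the bit positions are those stated above):

```python
def decode_fields(instr):
    """Split a 32-bit MIPS instruction word into its fields.
    R-type writes rd (bits 15:11); loads write rt (bits 20:16)."""
    return {
        "op":    (instr >> 26) & 0x3F,   # bits 31:26
        "rs":    (instr >> 21) & 0x1F,   # bits 25:21
        "rt":    (instr >> 16) & 0x1F,   # bits 20:16
        "rd":    (instr >> 11) & 0x1F,   # bits 15:11 (R-type only)
        "funct": instr & 0x3F,           # bits 5:0  (R-type only)
        "imm":   instr & 0xFFFF,         # bits 15:0 (I-type only)
    }

# add $t1,$t2,$t3 -> op=0, rs=$t2(10), rt=$t3(11), rd=$t1(9), funct=32
word = (0 << 26) | (10 << 21) | (11 << 16) | (9 << 11) | 32
f = decode_fields(word)
print(f["rs"], f["rt"], f["rd"], f["funct"])  # 10 11 9 32
```

The two possible destination fields, rt and rd, are exactly why the extra multiplexor on the Write register input is needed.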
Using this information, we can add the instruction labels and extra multiplexor (the Write
register number input of the register file) to the simple datapath.
These additions, plus the ALU control block, the write signals for state elements, the read signal
for the data memory, and the control signals for the multiplexors, complete the datapath. Since all
the multiplexors have two inputs, they each require a single control line.
This gives seven single-bit control lines plus the 2-bit ALUOp control signal. We have already
defined how the ALUOp control signal works, and it is useful to define informally what the seven
other control signals do before we determine how to set these control signals during instruction
execution.
The datapath with all necessary multiplexors and all control lines identified.
The control lines are shown in color. The ALU control block has also been added. The PC does
not require a write control, since it is written once at the end of every clock cycle; the branch
control logic determines whether it is written with the incremented PC or the branch target
address.
The effect of each of the seven control signals:
When the 1-bit control to a two-way multiplexor is asserted, the multiplexor selects the input
corresponding to 1. Otherwise, if the control is deasserted, the multiplexor selects the 0 input.
Remember that the state elements all have the clock as an implicit input and that the clock is
used in controlling writes. Gating the clock externally to a state element can create timing
problems.
5. Briefly explain about Operation of the Data path with neat diagram. Apr. / May
2018,Nov/Dec2020. Nov/Dec 2021
Operation of the Data path:
The active functional units and asserted control lines for a load. We can think of a load instruction as
operating in five steps (similar to the R-type instruction, which executes in four):
1. An instruction is fetched from the instruction memory, and the PC is incremented.
2. A register ($t2) value is read from the register file.
3. The ALU computes the sum of the value read from the register file and the sign-extended, lower 16
bits of the instruction (offset).
4. The sum from the ALU is used as the address for the data memory.
5. The data from the memory unit is written into the register file; the register destination is given by bits
20:16 of the instruction ($t1).
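The five steps can be traced with an illustrative Python model. Registers and memory are modeled as plain dictionaries; this is a sketch of the sequence of steps, not of the hardware:

```python
def execute_lw(pc, regs, mem, rs, rt, offset):
    """Walk through the five steps of a load on the datapath."""
    pc = pc + 4            # 1. instruction fetched, PC incremented
    base = regs[rs]        # 2. base register value read from register file
    addr = base + offset   # 3. ALU adds the (already sign-extended) offset
    value = mem[addr]      # 4. ALU sum used as the data-memory address
    regs[rt] = value       # 5. memory data written into the register file
    return pc

regs = {"$t2": 0x100}
mem = {0x104: 42}
pc = execute_lw(0, regs, mem, "$t2", "$t1", 4)  # lw $t1, 4($t2)
print(pc, regs["$t1"])  # 4 42
```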
The data path in operation for a load instruction:
The control lines, data path units, and connections that are active are highlighted.
A store instruction would operate very similarly. The main difference would be that the memory
control would indicate a write rather than a read, the second register value read would be used
for the data to store, and the operation of writing the data memory value to the register file would
not occur.
The data path in operation for a branch-on-equal instruction.
Finally, we can show the operation of the branch-on-equal instruction, such as beq $t1,$t2,offset
in the same fashion.
It operates much like an R‑format instruction, but the ALU output is used to determine whether
the PC is written with PC + 4 or the branch target address.
The four steps for execution:
1. An instruction is fetched from the instruction memory, and the PC is incremented.
2. Two registers, $t1 and $t2, are read from the register file.
3. The ALU performs a subtract operation on the data values read from the register file. The value of
PC + 4 is added to the sign-extended, lower 16 bits of the instruction (offset) shifted left by two; the
result is the branch target address.
4. The Zero result from the ALU is used to decide which adder result to store into the PC.
The control memory address register specifies the address of the microinstruction, and the control
data register holds the microinstruction read from memory. The microinstruction contains a
control word that specifies one or more microoperations for the data processor.
Once these operations are executed, the control must determine the next address. The location of
the next microinstruction may be the one next in sequence, or it may be located somewhere else in
the control memory. For this reason, it is necessary to use some bits of the present
microinstruction to control the generation of the address of the next microinstruction.
The next address may also be a function of external input conditions. While the micro operations
are being executed, the next address is computed in the next address generator circuit and then
transferred into the control address register to read the next microinstruction.
The next address generator is sometimes called a microprogram sequencer, as it determines the
address sequence that is read from control memory. The address of the next microinstruction can
be specified in several ways, depending on the sequencer inputs.
Typical functions of a microprogram sequencer are incrementing the control address register by
one, loading into the control address register an address from control memory, transferring an
external address, or loading an initial address to start the control operations.
The main advantage of microprogrammed control is that once the hardware configuration is
established, there should be no need for further hardware or wiring changes.
If we want to establish a different control sequence for the system, all we need to do is specify a
different set of microinstructions for control memory.
The hardware configuration need not be changed for different operations; the only thing that
must be changed is the microprogram residing in control memory.
Microinstructions are stored in control memory in groups, with each group specifying a routine.
Each computer instruction has a microprogram routine in control memory to generate the
microoperations that execute the instruction.
The hardware that controls the address sequencing of the control memory must be capable of
sequencing the microinstructions within a routine and be able to branch from one routine to another.
The address sequencing capabilities required in a control memory are:
1. Incrementing of the control address register.
2. Unconditional branch or conditional branch, depending on status bit conditions.
3. A mapping process from the bits of the instruction to an address for control memory.
4. A facility for subroutine call and return.
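The four address-sequencing capabilities can be sketched as a next-address selection function in Python. The field names addr_select, cond, and branch_addr are illustrative and are not taken from any particular microinstruction format:

```python
def next_address(car, microinstr, status_bits, map_addr, sbr):
    """Choose the next control address register (CAR) value from the
    four paths: increment, (conditional) branch, mapping, and return."""
    sel = microinstr["addr_select"]
    if sel == "increment":         # 1. sequential microinstruction
        return car + 1
    if sel == "branch":            # 2. unconditional or conditional branch
        cond = microinstr.get("cond")
        if cond is None or status_bits[cond]:
            return microinstr["branch_addr"]
        return car + 1             # condition false: fall through
    if sel == "map":               # 3. map instruction opcode to a routine
        return map_addr
    if sel == "return":            # 4. subroutine return from the SBR
        return sbr
    raise ValueError(sel)

# Conditional branch taken when the zero status bit is set
mi = {"addr_select": "branch", "cond": "zero", "branch_addr": 0x40}
print(next_address(0x10, mi, {"zero": True}, 0, 0))  # 64
```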
Figure shows a block diagram of control memory and the associated hardware needed for
selecting the next microinstruction address.
The microinstruction in control memory contains a set of bits to initiate micro operations in
computer registers and other bits to specify the method by which the address is obtained.
The diagram shows four different paths from which the control address register (CAR) receives
the address.
The incrementer increments the content of the control address register by one, to select the next
microinstruction in sequence.
Branching is achieved by specifying the branch address in one of the fields of the
microinstruction.
Conditional branching is obtained by using part of the microinstruction to select a specific status
bit in order to determine its condition.
An external address is transferred into control memory via a mapping logic circuit. The return
address for a subroutine is stored in a special register whose value is then used when the micro
program wishes to return from the subroutine.
Selection address for control memory
PIPELINING
6. Explain a 4-stage instruction pipeline. Explain the issues affecting pipeline performance. (Or)
Discus the basic concepts of pipelining. (Apr/May2012) (May/June2013)Nov / Dec 2015,
2016,Nov/Dec 2020.
OVERVIEW:
Role of cache memory
Pipelining Performance
Pipelining is an implementation technique in which multiple instructions are overlapped in
execution; pipelining is key to making processors fast.
A pipeline can be visualized as a collection of processing segments through which binary
information flows.
In computer architecture, pipelining means executing machine instructions concurrently.
Pipelining is used in modern computers to achieve high performance. The speed of execution of
programs is influenced by many factors.
One way to improve performance is to use faster circuit technology to build the processor and
the main memory.
Another possibility is to arrange the hardware so that more than one operation can be performed
at the same time
In this way, the number of operations performed per second is increased even though the elapsed
time needed to perform any one operation is not changed.
Pipelining is a particularly effective way of organizing concurrent activity in a computer system.
The basic idea is very simple. It is frequently encountered in manufacturing plants, where
pipelining is commonly known as an assembly-line operation.
The processor executes a program by fetching and executing instructions, one after the other. Let
Fi and Ei refer to the fetch and execute steps for instruction Ii.
An execution of a program consists of a sequence of fetch and execute steps as shown below.
Performance measures:
The various performance measures of pipelining are
Throughput
CPI
Speedup
Dependencies
Hazards
Throughput
The number of instructions executed per second.
CPI (clock cycles per instruction)
The CPI and MIPS rating are related by the equation
CPI = f / MIPS
where f is the clock frequency in MHz and MIPS is measured in millions of instructions per second.
Speedup
Speedup is defined by
S(m)=T(1)/T (m)
where T(m) is the execution time for some target workload on an m-stage pipeline and T(1) is
the execution time for the same target workload on a non-pipelined processor.
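The two formulas above, CPI = f / MIPS and S(m) = T(1) / T(m), can be evaluated directly. A minimal Python sketch with made-up example numbers:

```python
def speedup(t_nonpipelined, t_pipelined):
    """S(m) = T(1) / T(m)."""
    return t_nonpipelined / t_pipelined

def cpi_from_mips(f_mhz, mips):
    """CPI = f / MIPS, with f in MHz and the MIPS rating in
    millions of instructions per second."""
    return f_mhz / mips

# A workload taking 400 time units unpipelined and 100 pipelined
print(speedup(400.0, 100.0))        # 4.0
# A 500 MHz processor rated at 250 MIPS
print(cpi_from_mips(500.0, 250.0))  # 2.0
```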
Dependencies
If the output of any stage interferes with the execution of another stage, then a dependency exists.
There are two types of dependencies. They are
1. Control dependency
2. Data dependency
8. Give detail description about Pipelined data path and control. (Nov/Dec2014)(Apr/May2012)
(Or) Discuss the modified data path to accommodate pipelined executions with a diagram.
Apr/May 2016, 2017, 2018Nov/Dec 2021
Pipelined data path and control
Consider the three-bus structure suitable for pipelined execution, with a slight modification to support a
4-stage pipeline. The important changes are:
There are separate instruction and data caches that use separate address and data connections to
the processor. This requires two versions of the MAR register: IMAR for accessing the
instruction cache and DMAR for accessing the data cache.
The PC is connected directly to the IMAR, so that the contents of the PC can be transferred to
IMAR at the same time that an independent ALU operation is taking place.
The data address in DMAR can be obtained directly from the register file or from the ALU to
support the register indirect and indexed addressing modes.
Separate MDR registers are provided for read and write operations. Data can be transferred
directly between these registers and the register file during load and store operations without the
need to pass through the ALU.
Buffer registers have been introduced at the inputs and output of the ALU. These are registers
SRC1, SRC2, and RSLT. Forwarding connections may be added if desired.
The instruction register has been replaced with an instruction queue, which is loaded from the
instruction cache.
The output of the instruction decoder is connected to the control signal pipeline. This pipeline
holds the control signals in buffers B2 and B3.
The following operations can be performed independently in the processor:
Reading an instruction from the instruction cache
Incrementing the PC
Decoding the instruction
Reading from or writing into the data cache.
Reading the contents of up to two registers from the register file.
Writing in to one register in the register file
Performing an ALU operation.
The structure provides the flexibility required to implement the four-stage pipeline.
For example: I1, I2, I3, I4
Be the sequence of four instructions.
Write the result of instruction I1 into the register file.
Read the operands of instruction I2 from the register file.
Decode instruction I3
Fetch instruction I4 and increment the PC.
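The overlap described above, where in a single cycle I1 writes its result, I2 reads its operands, I3 is decoded, and I4 is fetched, can be reproduced with a small Python sketch of an ideal 4-stage pipeline (illustrative model, no stalls):

```python
STAGES = ["F", "D", "E", "W"]  # Fetch, Decode, Execute, Write

def pipeline_table(n_instr, n_cycles):
    """Which stage each instruction occupies in each cycle, assuming
    instruction Ii enters Fetch in cycle i (cycles start at 1)."""
    table = {}
    for i in range(1, n_instr + 1):
        for s, stage in enumerate(STAGES):
            cycle = i + s
            if cycle <= n_cycles:
                table[(i, cycle)] = stage
    return table

t = pipeline_table(4, 4)
# In cycle 4: I1 writes, I2 executes, I3 decodes, I4 is fetched
print([t[(i, 4)] for i in (1, 2, 3, 4)])  # ['W', 'E', 'D', 'F']
```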
[Figure: pipelined datapath showing the register file connected by buses a, b, and c to the ALU; the PC, incrementer, instruction decoder, IMAR, instruction queue, instruction cache, and data cache; and the control signal pipeline.]
[Figure: instruction fetch unit (F) with its instruction queue feeding a dispatch unit, which issues instructions to the integer unit and the floating-point unit; a final stage (W) writes results.]
The cycle time of the processor is reduced, increasing the instruction throughput. Pipelining
doesn't reduce the time it takes to complete an instruction; instead, it increases the number of
instructions that can be processed simultaneously ("at once") and reduces the delay between
completed instructions (this rate is the throughput). The more pipeline stages a
processor has, the more instructions it can process "at once" and the smaller the delay
between completed instructions. Every predominant general-purpose microprocessor
manufactured today uses at least 2 pipeline stages, and some use 30 or 40 stages.
If pipelining is used, the CPU Arithmetic logic unit can be designed faster, but more complex.
Pipelining in theory increases performance over an un-pipelined core by a factor of the number
of stages (assuming the clock frequency also increases by the same factor) and the code is ideal
for pipeline execution.
Pipelined CPUs generally work at a higher clock frequency than the RAM clock frequency (as
of 2008 technology, RAMs work at low frequencies compared to CPU frequencies),
increasing the computer's overall performance.
Disadvantages of Pipelining:
Pipelining has many disadvantages, though CPU and compiler designers use a variety of techniques
to overcome most of them; the following is a list of common drawbacks:
The design of a non-pipelined processor is simpler and cheaper to manufacture; a non-pipelined
processor executes only a single instruction at a time.
This prevents branch delays (in pipelining, every branch is delayed) as well as problems that
arise when serial instructions are executed concurrently.
In a pipelined processor, the insertion of flip-flops between modules increases the instruction
latency compared to a non-pipelined processor.
A non-pipelined processor will have a defined instruction throughput. The performance of a
pipelined processor is much harder to predict and may vary widely for different programs.
Many designs include pipelines as long as 7, 10, 20, 31, or even more stages; a disadvantage of
a long pipeline is that when a program branches, the entire pipeline must be flushed (cleared).
The higher throughput of pipelines falls short when the executed code contains many branches:
the processor cannot know in advance where to read the next instruction, and must wait for the
branch instruction to finish, leaving the pipeline behind it empty.
This disadvantage can be reduced by predicting whether the conditional branch instruction will
branch based on previous activity.
After the branch is resolved, the next instruction has to travel all the way through the pipeline
before its result becomes available and the processor resumes "working" again.
In such extreme cases, the performance of a pipelined processor could be worse than non-
pipelined processor.
Unfortunately, not all instructions are independent. In a simple pipeline, completing an
instruction may require 5 stages. To operate at full performance, this pipeline will need to run 4
subsequent independent instructions while the first is completing.
Any of those 4 instructions might depend on the output of the first instruction, causing the
pipeline control logic to wait and insert a stall or wasted clock cycle into the pipeline until the
dependency is resolved.
Fortunately, techniques such as forwarding can significantly reduce the cases where stalling is
required.
Self-modifying programs may fail to execute properly on a pipelined architecture when the
instructions being modified are near the instructions being executed.
This can happen because the modified instructions may already be in the prefetch input queue, so
the modification may not take effect for the upcoming execution of those instructions. Instruction
caches make the problem even worse.
Hazards: When a programmer (or compiler) writes assembly code, they generally assume that
each instruction is executed before the next instruction begins executing.
When this assumption is not validated by pipelining, the program behaves incorrectly; the
situation is known as a hazard. Various techniques for resolving or working around hazards
exist, such as forwarding and delaying (by inserting a stall or a wasted clock cycle).
HAZARD
Any condition that causes the pipeline to stall is called a hazard.
There are three type of hazard
I. Data Hazards
II. Control/instruction hazards
III. Structural Hazard
The operation specified in instruction I2 requires three cycles to complete, from cycle 4 through
cycle 6. Thus, in cycles 5 and 6, the Write stage must be told to do nothing, because it has no
data to work with. Meanwhile, the information in buffer B2 must remain intact until the Execute
stage has completed its operation.
This means that stage 2 and, in turn, stage1 are blocked from accepting new instructions because
the information in B1 cannot be overwritten.
Thus, steps D4 and F5 must be postponed as shown below. Pipelined operation is said to have
been stalled for two clock cycles. Normal pipelined operation resumes in cycle 7.
Clock cycle    1    2    3    4    5    6    7    8    9
I1             F1   D1   E1   W1
I2 (Add)            F2   D2   E2   E2   E2   W2
I3                       F3   D3   D3   D3   E3   W3
I4                            F4   F4   F4   D4   E4   W4

[Figure: instruction A writes into a register that a following instruction B reads.]
This is quite common and is called a read-after-write data hazard. This situation is solved with a
simple hardware technique called operand forwarding.
Example:
Add $s0, $t0, $t1
Sub $t2, $s0, $t3
[Figure: (a) forwarding datapath.]
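The read-after-write check behind operand forwarding can be sketched in Python. Instructions are modeled as illustrative (destination, source1, source2) tuples; the hardware comparison this represents is between the producer's destination register number and the consumer's source register numbers:

```python
def needs_forwarding(producer, consumer):
    """Detect a read-after-write dependence: the consumer reads a
    register that the immediately preceding producer writes."""
    dest = producer[0]
    return dest in consumer[1:]  # compare dest against both sources

add = ("$s0", "$t0", "$t1")  # Add $s0, $t0, $t1
sub = ("$t2", "$s0", "$t3")  # Sub $t2, $s0, $t3
print(needs_forwarding(add, sub))  # True: $s0 must be forwarded
```

When the check is true, the ALU result is routed directly to the dependent instruction instead of waiting for it to be written to, and read back from, the register file.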
This type of hazard arises from the pipelining of branch and other instructions that change the
contents of the PC, i.e., from trying to make a decision based on the result of one instruction
while others are executing.
Unconditional Branch:
The figure below shows a sequence of instructions being executed in a two-stage pipeline. Instructions I1
to I3 are stored at consecutive memory addresses, and instruction I2 is a branch instruction.
Branch timing in the presence of an instruction queue; the branch target address is computed in the D stage.
The fetch unit must have sufficient decoding and processing capability to recognize and execute
branch instructions.
The fetch unit keeps the instruction queue filled at all times.
Fetch unit continues to fetch instructions and add them to the queue.
Similarly, if there is a delay in fetching instructions, the dispatch unit continues to issue
instructions from the instruction queue.
Every fetch operation adds one instruction to the queue and each dispatch operation reduces the
queue length by one. Hence, the queue length remains the same for the first four clock cycles.
Instruction I5 is a branch instruction. Its target instruction, Ik, is fetched in cycle 7, and
instruction I6 is discarded. The branch instruction would normally cause a stall in cycle 7 as a
result of discarding instruction I6. Instead, instruction I4 is dispatched from the queue to the
decoding stage. After discarding I6, the queue length drops to 1 in cycle 8, and it remains at this
value until another stall is encountered.
Consider the sequence of instruction completions: instructions I1, I2, I3, I4, and Ik complete
execution in successive clock cycles. Hence, the branch instruction does not increase the overall
execution time.
This is because the instruction fetch unit has executed the branch instruction concurrently with
the execution of other instructions. This technique is referred to as branch folding.
Branch folding occurs only if, at the time a branch instruction is encountered, at least one
instruction other than the branch instruction is available in the queue.
11. Explain about Conditional Branches: (Apr/May2014)
Conditional Branches:
The conditional branching is a major factor that affects the performance of instruction pipelining.
When a conditional branch is executed, it may or may not change the PC.
If a branch changes the PC to its target address, it is a taken branch; if it falls through, it is not
taken. The branch decision cannot be made until the execution of that instruction has been
completed.
Delayed Branch:
The location following the branch instruction is called branch delay slot. There may be more
than one branch delay slot depending on the time it takes to execute the branch instruction.
The instruction in the delay slot is always fetched and at least partially executed before the branch
decision is made and the branch target address is computed.
A technique called delayed branching can minimize the penalty caused by conditional branch
instruction.
The instruction in the delay slot is always fetched. Therefore, place in the delay slot useful
instructions that should be fully executed whether or not the branch is taken. If no useful
instruction is available, fill the slot with a NOP instruction.
EXAMPLE:
-----------------------------------------------------------------------------------
LOOP shift-Left R1
Decrement R2
Branch = 0 LOOP
NEXT Add R1, R2
(a). Original program loop
-------------------------------------------------------------------------------------
LOOP Decrement R2
Branch = 0 LOOP
shift-Left R1
NEXT Add R1, R2
(b). Reordering instructions
Reordering of instructions for a delayed branch
Register R2 is used as a counter to determine the number of times the contents of R1 are shifted
left. For a processor with one delay slot, the instructions can be reordered as shown in figure (b)
above.
The shift instruction is fetched while the branch instruction is being executed.
After evaluating the branch condition, the processor fetches the instruction at LOOP or at NEXT,
depending on whether the branch condition is true or false, respectively. In either case, it
completes the execution of the shift instruction.
The sequence of events during the last two passes in the loop is illustrated in figure.
Execution timing showing the delay slot being filled during two passes through the loop
Pipelined operation is not interrupted at any time, and there are no idle cycles. Branching takes
place one instruction later than where the branch instruction appears in the sequence, hence the
name "delayed branch".
12. Explain about Branch prediction Algorithm. Nov / Dec 2016
Branch prediction:
Over view:
Speculative execution
Static prediction
Dynamic Branch Prediction
Branch prediction
Prediction techniques can be used to guess whether a branch will be taken or not taken. The
simplest form of branch prediction is to assume that the branch will not take place and to
continue fetching instructions in sequential address order. Until the branch condition is evaluated,
instruction execution along the predicted path must proceed on a speculative basis.
Speculative execution means that instructions are executed before the processor is certain that they are
in the correct execution sequence.
The figure below illustrates an incorrectly predicted branch. It shows a compare instruction
followed by a Branch > 0 instruction. In cycle 3, branch prediction takes place; the fetch unit
predicts that the branch will not be taken, and it continues to fetch instruction I4 as I3 enters the
decode stage.
The result of the compare operation is available at the end of cycle 3. The branch condition is
evaluated in cycle 4. At this point, the instruction fetch unit realizes that the prediction was
incorrect, and the two instructions in the execution pipe are purged.
A new instruction, Ik, is fetched from the branch target address in clock cycle 5. We will examine
two prediction schemes: static prediction and dynamic prediction.
FIRST HALF TOPIC: Instruction Execution – Building a Data Path – Designing a Control
Unit – Hardwired Control
PART-A
1. What is MIPS and write its instruction set?
MIPS is a reduced instruction set computer (RISC) instruction set architecture (ISA) developed
by MIPS Technologies (formerly MIPS Computer Systems). The early MIPS architectures were 32-
bit, with 64-bit versions added later.
MIPS instruction set:
The memory-reference instructions load word (lw) and store word (sw)
The arithmetic-logical instructions add, sub, AND, OR, and slt
The branch instructions branch equal (beq) and jump (j), which we add last.
2. What are R-type instructions? (Apr/May 2015)Nov/Dec 2020
13. What are the two types of branch prediction technique available? (May/June 2009)
The two types of branch prediction techniques are
Static branch prediction
Dynamic branch prediction
14. Define static and dynamic branch prediction.
The branch prediction decision is always the same for every time a given instruction is
executed. This is known as static branch prediction.
Another approach in which the prediction may change depending on execution history is called
dynamic branch prediction.
15. List the two states in the dynamic branch prediction.
LT : Branch is likely to be taken.
LNT : Branch is likely not to be taken.
16.List out the four stages in the branch prediction algorithm.
ST :Strongly likely to be taken
LT :Likely to be taken
LNT :Likely not to be taken
SNT :Strongly not to be taken
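The four states above behave like a 2-bit saturating counter: each actual outcome moves the state one step toward that outcome, and the prediction is "taken" in the ST and LT states. A Python sketch (state names follow the list above; the update rule is the standard saturating-counter one):

```python
# Four-state dynamic branch predictor: SNT, LNT, LT, ST.
STATES = ["SNT", "LNT", "LT", "ST"]  # index 0..3

def update(state, taken):
    """Saturating 2-bit counter update on the actual branch outcome:
    move one step toward taken (up) or not taken (down)."""
    i = STATES.index(state)
    i = min(i + 1, 3) if taken else max(i - 1, 0)
    return STATES[i]

def predict(state):
    """Predict taken in the LT and ST states."""
    return STATES.index(state) >= 2

s = "LNT"
for outcome in (True, True, False):  # taken, taken, not taken
    s = update(s, outcome)
print(s, predict(s))  # LT True
```

Note that a single misprediction in the ST state only moves the predictor to LT, so one atypical outcome (such as a loop exit) does not immediately flip the prediction.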
17. Define Register renaming. (Nov/Dec 2009)
When temporary register holds the contents of the permanent register, the name of permanent
register is given to that temporary register is called as register renaming.
For example, if I2 uses R4 as a destination register, then the temporary register used in step
TW2 is also referred as R4 during cycles 6 and 7 that temporary register used only for
instructions that follow I2 in program order.
For example, if I1 needs to read R4 in cycle 6 or 7, it has to access R4 though it contains
unmodified data be I2.
SECOND HALF TOPIC: Microprogrammed Control – Pipelining – Data Hazard – Control
Hazards.
18. What is pipelining and what are the advantages of pipelining? (Apr/May 2010) Nov /
Dec 2013
Pipelining is the process of handling instructions concurrently.
A pipelined processor overlaps the execution of successive instructions instead of completing one instruction fully before starting the next.
Advantages: May / June 2016
Pipelining improves the overall throughput of an instruction set processor.
It is applied to the design of complex datapath units such as multipliers and floating-point
adders.
19. Draw the hardware organization of two-stage pipeline.
20. Name the four stages of pipelining. (Or)What are the steps in pipelining processor?
Nov/Dec 2020.
Fetch : Read the instruction from the memory.
Decode : Decode the instruction and fetch the source operands.
Execute : Perform the operation specified by the instruction
Write : Store the result in the destination location.
21. Write short notes on instruction pipelining.
The instruction cycle involves several phases: fetch, decode and execute.
These fetch, decode and execute cycles for several instructions are performed simultaneously (overlapped) to
reduce the overall processing time.
This process is referred to as instruction pipelining.
22. What is the role of cache in pipelining? (Or) What is the need to use the cache
memory in pipelining concept?(Nov/Dec 2011)
Each stage in a pipeline is expected to complete its operation in one clock cycle, but the
access time of main memory is high, so a main-memory access would take more than one clock
cycle to complete. Cache memory is therefore used with the pipelining concept.
The accessing speed of the cache memory is very high.
23. What is meant by bubbles in pipeline? Or what is meant by pipeline bubble? Nov / Dec
2016
Any condition that causes the pipeline to be idle is known as a pipeline stall. This is also known as a
bubble in the pipeline. Once a bubble is created as a result of a delay, it moves downstream
until it reaches the last unit.
24. What are the major characteristics of pipeline?
Pipelining cannot be implemented on a single task, as it works by splitting multiple tasks into a
number of subtasks and operating on them simultaneously.
The speedup or efficiency achieved by using a pipeline depends on the number of pipe stages
and the number of available tasks that can be subdivided.
25. Give the features of the addressing mode suitable for pipelining. (Apr/May 2014)
They access an operand from memory in only one cycle.
Only load and store instructions are provided to access memory.
The addressing modes used do not have side effects. (When a location other than the one explicitly
named in an instruction as the destination operand is affected, the instruction is said to have a
side effect.)
Three basic addressing modes that have these features are register, register indirect and
index. The first two require no address computation. In the index mode, the address can be
computed in one cycle, whether the index value is given in the instruction or in a register.
26. What are the disadvantages of increasing the number of stages in pipelined
processing?(Apr/May 2011) (Or) What would be the effect, if we increase the number
of pipelining stages? (Nov/Dec 2011)
Speedup:
Speedup is defined by
S(m)=T(1)/T (m)
Where T(m) is the execution time for some target workload on an m-stage pipeline and T(1) is
the execution time for the same workload on a non-pipelined processor.
27. What is the ideal CPI of a pipelined processor?
The ideal CPI on a pipelined processor is almost always 1. Hence, we can compute the pipelined
CPI:
CPI pipelined = Ideal CPI + Pipeline stall clock cycles per instruction = 1 + Pipeline stall clock
cycles per instruction
28. How can memory access be made faster in a pipelined operation? Which hazards
can be reduced by faster memory access? (Apr/May 2010)
The goal in controlling a pipelined CPU is to maximize its performance with respect to the target
workloads.
Performance measures:
The various performance measures of pipelining are,
Throughput
CPI
Speedup
Dependencies
Hazards
The following Hazards can be reduced by faster memory access:
Structural hazards
Data or Data dependent hazards
Instruction or control hazards
29. Write down the expression for speedup factor in a pipelined architecture. May 2013
The speedup for a pipelined computer is
S = (n x tn) / ((k + n - 1) x tp)
Where,
k - number of segments in the pipeline.
n - number of instructions to be executed.
tp - pipeline cycle time.
tn - time to complete one instruction without pipelining (tn = k x tp when every segment takes tp).
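As a quick numeric check, the speedup S = (n x tn) / ((k + n - 1) x tp) can be computed directly. A small sketch (the stage count, instruction count and cycle time below are assumed example figures):

```python
def pipeline_speedup(k, n, tp, tn=None):
    """Speedup S = (n * tn) / ((k + n - 1) * tp) of a k-stage pipeline
    over non-pipelined execution; tn defaults to k * tp."""
    if tn is None:
        tn = k * tp
    return (n * tn) / ((k + n - 1) * tp)

# Example (figures assumed): 4 stages, 100 instructions, 20 ns cycle time.
# Non-pipelined: 100 * 4 * 20 = 8000 ns; pipelined: (4 + 100 - 1) * 20 = 2060 ns.
print(round(pipeline_speedup(4, 100, 20), 2))   # 3.88, approaching k = 4 as n grows
```

For large n the (k - 1) fill-up term becomes negligible and the speedup approaches the stage count k.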
30. Define Hazard and State different types of hazards that occur in pipeline. Nov / Dec
2015, Apr / May 2017, May 2019 Nov/Dec 2020.
In the domain of central processing unit (CPU) design, hazards are problems with the instruction
pipeline in CPU microarchitectures when the next instruction cannot execute in the following clock
cycle, and can potentially lead to incorrect computation results.
The various pipeline hazards are:
Structural hazards
Data or Data dependent hazards
Instruction or control hazards
31. What is structural hazard?(Nov/Dec 2008) (Apr /May 2014)
When two instructions require the use of a given hardware resource at the same time this hazard
will occur. The most common case of this hazard is memory access.
32. What is data hazard in pipelining? (Nov/Dec 2007, 2008)
A data hazard is any condition in which either the source or the destination operands of an
instruction are not available at the time expected in the pipeline. As a result some operation has to be
delayed and the pipeline stalls.
Data hazards arise when an instruction depends on the result of a previous instruction in a way that is
exposed by the overlapping of instructions in the pipeline.
33. What are instruction hazards (or) control hazards?
They arise while pipelining branch and other instructions that change the contents of program
counter.
The simplest way to handle these hazards is to stall the pipeline. Stalling the pipeline allows
some instructions to proceed to completion while stopping the execution of those which would result
in hazards.
34. How can we eliminate the delay in data hazard?
In pipelining, the result of an instruction is available at the output of the ALU once the execute
stage completes, before it has been written back to the register file.
Hence the delay can be reduced if we arrange for this result to be forwarded
directly for use in the next step. This is known as operand forwarding.
35. How can we eliminate data hazard using software?
The data dependencies can be handled with the software. The compiler can be used for this
purpose. The compiler can introduce the two cycle delays needed between instruction I1 and I2
by inserting NOP (no operation)
I1: MUL R2, R3, R4
NOP
NOP
I2: ADD R5, R4, R6
36. List the techniques used for overcoming hazard.
Data forwarding
Adding sufficient hardware
Stalling instructions
Reordering instructions (executing them in a different order).
37. What are the techniques used to prevent control hazards?
Scheduling instruction in delay slots
Loop unrolling
Conditional execution
Speculation (by both compiler and CPU).
38. List the types of data hazards.
i. RAW (Read After Write)
ii. WAW (Write After Write)
iii. WAR (Write After Read)
iv. RAR (Read After Read)
39. Define stall.
Idle periods are called stalls. They are also often referred to as bubbles in the pipeline.
40. Give 2 examples for instruction hazard.
Cache miss
Hazard in pipeline.
41. A = 5; A <- 3 + A; A <- 4 + A. What hazard do the above two instructions
create when executed concurrently? (Apr/May 2011)
If these operations are performed in the order given, A becomes 8 and then 12. But if they are performed
concurrently, both instructions read the old value A = 5, producing 8 and 9, so the final value (9) is
wrong. This is a data (read-after-write) hazard.
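The arithmetic behind this hazard can be checked directly. A small sketch of the sequential versus concurrent outcomes (variable names are illustrative):

```python
# The two updates from question 41, with A = 5 initially.
A = 5

# Sequential execution: the second update sees the first one's result.
a = 3 + A          # 8
a = 4 + a          # 12
sequential = a

# Concurrent execution with a RAW hazard: both updates read the stale A = 5.
r1 = 3 + A         # 8
r2 = 4 + A         # 9 -- should have used 8
concurrent = r2    # the later write is what remains in A

print(sequential, concurrent)   # 12 9
```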
42. What is meant by speculative execution? (Apr/May 2012) Or what is the need for
speculation? (Nov/Dec 2014), May 2019
A technique that allows a superscalar processor to keep its functional units as busy as possible by
executing instructions before it is known whether they will be needed.
The Intel P6 uses speculative execution.
43. What is meant by hazard in pipelining? Define data and control hazards.
(May/June 2013) (Apr/May 2012)
The idle periods in any of the pipeline stage due to the various dependency relationships among
instructions are said to be stalls.
A data hazard is any condition in which either the source or the destination operands of an
instruction are not available at the time expected in pipeline.
As a result some operation has to be delayed and the pipeline stalls. Data hazards arise when an instruction
depends on the result of a previous instruction in a way that is exposed by the overlapping of
instructions in the pipeline.
Types
1. RAW 2. WAW 3. WAR
Control hazards arise while pipelining branch and other instructions that change the contents of
the program counter. The simplest way to handle these hazards is to stall the pipeline.
Stalling the pipeline allows some instructions to proceed to completion while stopping
the execution of those which would result in hazards.
44. Why is branch prediction algorithm needed? Distinguish between static and
dynamic branch prediction. (May/June 2009) Or Differentiate between the static and
dynamic techniques. (May/June 2013)
Branch Prediction has become essential to getting good performance from scalar instruction streams.
– Underlying algorithm has regularities.
– Data that is being operated on has regularities.
– Instruction sequence has redundancies that are artifacts of way that humans/compilers
think about problems.
S.NO. STATIC BRANCH PREDICTION vs DYNAMIC BRANCH PREDICTION
1. Static: the branch is predicted statically, based on the branch code type. Dynamic: it uses recent branch history during program execution; the information is stored in a buffer called the branch target buffer (BTB).
2. Static: it may not produce an accurate result every time. Dynamic: it allows processing of conditional branches with zero delay.
45. What is Branch Target Buffer?
Branch Target Buffer (BTB): A hardware mechanism that aims at reducing the stall cycles resulting
from correctly predicted taken branches to zero cycles.
46. Define program counter (PC).
The register containing the address of the instruction in the program being executed.
47. What are Sign-extend?
To increase the size of a data item by replicating the high-order sign bit of the original data item
in the high order bits of the larger, destination data item.
48. Define Register file.
A state element that consists of a set of registers that can be read and written by supplying a
register number to be accessed.
49. What is a Don’t-care term?
An element of a logical function in which the output does not depend on the values of all the
inputs.
50. Define Forwarding.
A method of resolving a data hazard by retrieving the missing data element from internal buffers rather
than waiting for it to arrive from programmer visible registers or memory. Also called bypassing.
51. What is a branch prediction buffer? (Apr/May 2015)
A small memory that is indexed by the lower portion of the address of the branch instruction and that
contains one or more bits indicating whether the branch was recently taken or not. It is also called
branch history table.
52. What is an Exception? Nov / Dec 2014, May / June 2016, Apr. / May 2018, Nov. / Dec. 2018
Exceptions and interrupts are unexpected events that disrupt the normal flow of instruction execution.
An exception is an unexpected event from within the processor. An interrupt is an unexpected event
from outside the processor. We have to implement exception and interrupt handling in our multi cycle
CPU design.
53. Give one example for MIPS exception. Apr. / May 2018, Nov. / Dec. 2018
Exceptions in MIPS
EX Arithmetic exception
WB None
56.What is Instruction Level Parallelism? (Dec 2012, Dec 2013, May 2015, May 2016)
The technique which is used to overlap the execution of instructions and improve performance is
called ILP.
57. What are the approaches to exploit ILP? (Dec 2012, Dec 2015)
The two separable approaches to exploit ILP are,
Dynamic or Hardware Intensive Approach
Static or Compiler Intensive Approach.
58. What is Loop Level Parallelism?
Loop level parallelism is a way to increase the amount of parallelism available among
instructions is to exploit parallelism among iterations of loop.
59. Give the methods to enhance performance of ILP.
To obtain substantial performance enhancements, the ILP across multiple basic blocks are exploited
using
Loop Level Parallelism
Vector Instructions
60. Define Dynamic Scheduling. (May 2013) (Or) Explain the idea behind dynamic
scheduling. (Nov/Dec 2016)
Dynamic scheduling is a technique in which the hardware rearranges the instruction execution to
reduce the stalls while maintaining data flow and exception behavior.
61. List the drawbacks of Dynamic Scheduling.
The complexity of the Tomasulo scheme.
Each reservation station must contain an associative buffer.
The performance can be limited by the single CDB.
62. List the advantages of Dynamic Scheduling. (May 2012)
It handles dependences that are unknown at compile time.
It simplifies the compiler.
It allows code compiled for one pipeline to run efficiently on a different pipeline
Uses speculation techniques to improve the performance.
63. Differentiate Static and Dynamic Scheduling.
Static Scheduling vs Dynamic Scheduling
Static Scheduling: The data hazard that prevents a new instruction issue in the next cycle is resolved using a technique called data forwarding, and also by compiler scheduling that separates the dependent instructions; this is called static scheduling.
Dynamic Scheduling: The CPU rearranges the instructions to reduce stalls while preserving dependences. It uses a hardware-based mechanism to rearrange the instruction execution order at runtime, and it enables handling cases where dependences are unknown at compile time.
If the pipeline has not yet completed instructions that are earlier in program order, those earlier
instructions may still cause an exception.
77. What is Register Renaming?
Renaming of register operand is called register renaming.
It can be either done statically by the compiler or dynamically by the hardware.
78. Difference between Static and Dynamic Branch Prediction? (May 2011)
Static Branch Prediction vs Dynamic Branch Prediction
Static Branch Prediction: It is usually carried out by the compiler. It is static because the prediction is already known even before the program is executed.
Dynamic Branch Prediction: It uses the run-time behavior of a branch to make a more accurate prediction. Information about the outcome of previous occurrences of a given branch is used to predict the current occurrence.
79. In a datapath diagram, what is the size of the ALUOp control signal? Nov/Dec 2021
In the basic MIPS datapath, the ALUOp control signal is 2 bits wide.
FIRST HALF TOPIC: Memory Concepts and Hierarchy – Memory Management – Cache
Memories: Mapping and Replacement Techniques
Hard disks are disks that are permanently attached and cannot be removed by an ordinary user.
(iii) Optical Disks: It‘s a laser-based storage medium that can be written to and read. It is reasonably priced
and has a long lifespan. The optical disc can be taken out of the computer by occasional users. Types of
Optical Disks :
(a) CD – ROM:
CD-ROM stands for Compact Disc Read-Only Memory; it can only be read, not written.
Information is written to the disc by using a controlled laser beam to burn pits on the disc surface.
It has a highly reflecting surface, which is usually aluminum.
The diameter of the disc is 5.25 inches.
The track density is 16000 tracks per inch.
The capacity of a CD-ROM is 600 MB, with each sector storing 2048 bytes of data.
The data transfer rate is about 4800 KB/sec, and the average access time is around 80 milliseconds.
(b) WORM-(WRITE ONCE READ MANY):
A user can only write data once.
The information is written on the disc using a laser beam.
It is possible to read the written data as many times as desired.
They keep lasting records of information but access time is high.
It is possible to rewrite updated or new data to another part of the disc.
Data that has already been written cannot be changed.
Usual size – 5.25 inch or 3.5 inch diameter.
The usual capacity of 5.25 inch disk is 650 MB,5.2GB etc.
(c) DVDs:
The term "DVD" stands for "Digital Versatile/Video Disc", and there are two sorts of DVDs:
(i)DVDR (writable) and (ii) DVDRW (Re-Writable)
DVD-ROMs (Digital Versatile Discs): These are read-only memory (ROM) discs that can be used in a
variety of ways. When compared to CD-ROMs, they can store a lot more data. A DVD has a thick
polycarbonate plastic layer that serves as a foundation for the other layers. It is an optical memory
from which data can only be read.
DVD-R: It is a writable optical disc that can be written just once. It is a recordable DVD, a lot
like WORM. DVD-ROMs have capacities ranging from 4.7 to 17 GB. The capacity of a 3.5 inch disk is
1.3 GB.
3. Cache Memory: It is a type of high-speed semiconductor memory that can help the CPU run faster.
Between the CPU and the main memory, it serves as a buffer. It is used to store the data and programs that
the CPU uses the most frequently.
Advantages of cache memory:
It is faster than the main memory.
When compared to the main memory, it takes less time to access it.
It keeps the programs that can be run in a short amount of time.
It stores data for temporary use.
Disadvantages of cache memory:
Because of the semiconductors used, it is very expensive.
The size of the cache (amount of data it can store) is usually small.
Memory unit:
Memories are made up of registers.
Each register in the memory is one storage location.
The storage location is also called a memory location. Memory locations are identified using Address.
The total number of bits a memory can store is its capacity.
A storage element is called a Cell.
Each register is made up of a storage element in which one bit of data is stored.
The data in a memory are stored and retrieved by the process called writing and reading respectively.
A word is a group of bits where a memory unit stores binary information.
A word with a group of 8 bits is called a byte.
A memory unit consists of data lines, address selection lines, and control lines that specify the
direction of transfer. The block diagram of a memory unit is shown below:
Cache memory
Cache Memory is a special very high-speed memory. It is used to speed up and synchronize with the
high-speed CPU. Cache memory is costlier than main memory or disk memory but more economical than CPU
registers. Cache memory is an extremely fast memory type that acts as a buffer between RAM and the
CPU. It holds frequently requested data and instructions so that they are immediately available to the CPU
when needed.
Cache memory is used to reduce the average time to access data from the Main memory. The cache is a
smaller and faster memory which stores copies of the data from frequently used main memory locations.
There are various different independent caches in a CPU, which store instructions and data.
Levels of memory:
Level 1 or Register –
It is a type of memory in which data that is needed immediately by the CPU is stored and accessed. The most
commonly used registers are the accumulator, program counter, address register, etc.
Level 2 or Cache memory –
It is a very fast memory with a short access time, where data is temporarily stored for faster access.
Level 3 or Main Memory –
It is the memory on which the computer currently works. It is small in size compared to secondary
memory, and once power is off, data no longer stays in this memory.
Level 4 or Secondary Memory –
It is external memory which is not as fast as main memory, but data stays in it permanently.
Cache Performance:
When the processor needs to read or write a location in main memory, it first checks for a corresponding
entry in the cache.
If the processor finds that the memory location is in the cache, a cache hit has occurred and data is
read from cache
If the processor does not find the memory location in the cache, a cache miss has occurred. For a cache
miss, the cache allocates a new entry and copies in data from main memory, then the request is fulfilled
from the contents of the cache.
The performance of cache memory is frequently measured in terms of a quantity called Hit ratio.
Hit ratio = hit / (hit + miss) = no. of hits/total accesses
We can improve Cache performance using higher cache block size, higher associativity, reduce miss rate,
reduce miss penalty, and reduce the time to hit in the cache.
Cache Measures
Cache: Cache is small, fast storage used to improve the average access time to slow memory. Caching
applies wherever buffering is employed to reuse commonly occurring items, e.g. file caches, name caches,
and so on.
Cache Hit: The CPU finds a requested data item in the cache.
Cache Miss: The item is not in the cache when accessed.
Block: A fixed-size collection of data, retrieved from memory and placed into the cache.
Advantage of Temporal Locality: If data is accessed from slower memory, move it to faster memory; if
data in faster memory has not been used recently, move it back to slower memory.
Advantage of Spatial Locality: If a word needs to be moved from slower to faster memory, move the
adjacent words at the same time.
Hit Rate (Hit Ratio): Fraction of accesses that are hits at a given level of the hierarchy.
Hit Time: Time required to access a level of the hierarchy, including the time to determine whether
the access is a hit or a miss.
Miss Rate (Miss Ratio): Fraction of accesses that are misses at a given level.
Miss Penalty: Extra time required to fetch a block into some level from the next level down.
The address space is usually broken into fixed size blocks, called pages. At each time, each page
resides either in main memory or on disk.
Average memory access time is a useful measure to evaluate the performance of a memory-hierarchy
configuration.
Average Memory Access Time = Memory Hit Time + Memory Miss Rate x Miss Penalty
Cache Mapping
3. Discuss in detail about various cache mapping techniques.
When the processor needs to read or write a location in main memory, it first checks for a
corresponding entry in the cache.
If the processor finds that the memory location is in the cache, a cache hit has occurred and
data is read from cache
If the processor does not find the memory location in the cache, a cache miss has occurred.
For a cache miss, the cache allocates a new entry and copies in data from main memory, and
then the request is fulfilled from the contents of the cache.
The performance of cache memory is frequently measured in terms of a quantity called Hit ratio.
Hit Ratio = Hit / (Hit + Miss) = No. of Hits / Total Accesses
We can improve Cache performance using higher cache block size, higher associativity, reduce miss
rate, reduce miss penalty and reduce the time to hit in the cache.
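The hit-ratio and average-memory-access-time formulas above can be checked numerically. A minimal sketch (the hit counts and timings are assumed example figures):

```python
def hit_ratio(hits, misses):
    """Hit ratio = hits / (hits + misses)."""
    return hits / (hits + misses)

def amat(hit_time, miss_rate, miss_penalty):
    """Average Memory Access Time = Hit Time + Miss Rate x Miss Penalty."""
    return hit_time + miss_rate * miss_penalty

# Example (figures assumed): 950 hits out of 1000 accesses,
# 1 ns hit time, 50 ns miss penalty.
h = hit_ratio(950, 50)           # 0.95
print(amat(1, 1 - h, 50))        # about 3.5 ns
```

Note how the miss penalty dominates: even a 5% miss rate more than triples the average access time over the 1 ns hit time.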
Cache Mapping
Cache memory mapping is the way in which we map or organize data in cache memory, this is done
for efficiently storing the data which then helps in easy retrieval of the same.
The three different types of mapping used for the purpose of cache memory are as follow,
Direct Mapping
Associative Mapping
Set-Associative Mapping
Direct Mapping:
In direct mapping, each memory block is assigned to a specific line in the cache.
If a line is already occupied by a memory block when a new block needs to be loaded, the old
block is trashed.
An address is split into two parts, an index field and a tag field.
The cache stores the tag field along with the data; the index field selects the cache line.
Direct mapping's performance is directly proportional to the hit ratio.
Associative Mapping:
In this type of mapping, associative memory is used to store both the content and the address of the
memory word. Any block can go into any line of the cache.
This means that the word id bits are used to identify which word in the block is needed, and the tag
becomes all of the remaining bits.
This enables the placement of any word at any place in the cache memory.
It is considered to be the fastest and the most flexible mapping form.
Set-Associative Mapping:
This form of mapping is an enhanced form of the direct mapping where the drawbacks of direct
mapping are removed.
Set associative addresses the problem of possible thrashing in the direct mapping method.
It does this by saying that instead of having exactly one line that a block can map to in the cache; we
will group a few lines together creating a set.
Then a block in memory can map to any one of the lines of a specific set.
Set-associative mapping allows that each word that is present in the cache can have two or more
words in the main memory for the same index address.
Set associative cache mapping combines the best of direct and associative cache mapping techniques.
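As an illustrative sketch (the cache geometry below is an assumption, not from the syllabus text), the line or set a memory block maps to under each scheme:

```python
# Illustrative sketch (cache geometry assumed): where a given memory block may
# be placed under each mapping scheme, for an 8-line cache.

LINES = 8
WAYS = 2                      # set-associative: 2 lines per set
SETS = LINES // WAYS          # 4 sets

def direct_mapped_line(block):
    """Direct mapping: block i always maps to line i mod LINES."""
    return block % LINES

def set_for_block(block):
    """Set-associative mapping: block i may use any line of set i mod SETS."""
    return block % SETS

# Fully associative mapping has no index at all: any block can use any line.
print(direct_mapped_line(25))   # 1 (25 mod 8)
print(set_for_block(25))        # 1 (25 mod 4): either line of set 1
```

Blocks 1, 9, 17, 25, ... all compete for the single line 1 under direct mapping, but under 2-way set-associative mapping two of them can be resident at once, which is exactly the thrashing relief described above.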
Uses of Cache
Usually, the cache memory can store a reasonable number of blocks at any given time, but this
number is small compared to the total number of blocks in the main memory.
The correspondence between the main memory blocks and those in the cache is specified by a
mapping function.
Types of Cache
Primary Cache – A primary cache is always located on the processor chip. This cache is small and
its access time is comparable to that of processor registers.
Secondary Cache – secondary cache is placed between the primary cache and the rest of the
memory. It is referred to as the level 2 (L2) cache. Often, the Level 2 cache is also housed on the
processor chip.
Locality of Reference
Since the size of cache memory is small compared to main memory, which part of main memory
should be given priority and loaded into the cache is decided based on locality of reference.
Types of Locality of Reference
Spatial Locality of reference – If a word is referenced, there is a high chance that words in close
proximity to it will be referenced soon. This is why, on a miss, a complete block (not just the single
missing word) is brought into the faster memory.
Temporal Locality of reference – If a word is referenced, there is a high chance that the same word
will be referenced again soon. Replacement policies such as Least Recently Used (LRU) exploit this by
keeping recently used items in the faster memory.
Cache replacement Techniques:
In an operating system that uses paging for memory management, a page replacement algorithm is needed
to decide which page needs to be replaced when a new page comes in.
Page Fault: A page fault happens when a running program accesses a memory page that is mapped into the
virtual address space but not loaded in physical memory. Since actual physical memory is much smaller than
virtual memory, page faults happen. In case of a page fault, Operating System might have to replace one of
the existing pages with the newly needed page. Different page replacement algorithms suggest different
ways to decide which page to replace. The target for all algorithms is to reduce the number of page faults.
Page Replacement Algorithms:
1. First In First Out (FIFO): This is the simplest page replacement algorithm. In this algorithm, the
operating system keeps track of all pages in the memory in a queue, the oldest page is in the front of the
queue. When a page needs to be replaced page in the front of the queue is selected for removal.
Example 1: Consider page reference string 1, 3, 0, 3, 5, 6, 3 with 3 page frames. Find the number of page
faults.
Initially, all slots are empty, so when 1, 3, 0 come they are allocated to the empty slots —> 3 page faults.
When 3 comes, it is already in memory so —> 0 page faults. Then 5 comes; it is not available in memory, so
it replaces the oldest page slot, i.e. 1 —> 1 page fault. 6 comes; it is also not available in memory, so it
replaces the oldest page slot, i.e. 3 —> 1 page fault. Finally, when 3 comes it is not available, so it
replaces 0 —> 1 page fault. Total: 6 page faults.
Belady’s anomaly proves that it is possible to have more page faults when increasing the number of page
frames while using the First In First Out (FIFO) page replacement algorithm. For example, for the
reference string 3, 2, 1, 0, 3, 2, 4, 3, 2, 1, 0, 4 with 3 frames we get 9 total page faults, but if we
increase the number of frames to 4, we get 10 page faults.
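A short FIFO simulator (a sketch, not part of the syllabus text) reproduces both Example 1 and Belady's anomaly:

```python
from collections import deque

def fifo_faults(refs, frames):
    """Count page faults under FIFO replacement with a fixed frame count."""
    memory = deque()               # oldest resident page at the left
    faults = 0
    for page in refs:
        if page not in memory:
            faults += 1
            if len(memory) == frames:
                memory.popleft()   # evict the oldest page
            memory.append(page)
    return faults

print(fifo_faults([1, 3, 0, 3, 5, 6, 3], 3))           # 6, matching Example 1
belady = [3, 2, 1, 0, 3, 2, 4, 3, 2, 1, 0, 4]
print(fifo_faults(belady, 3), fifo_faults(belady, 4))  # 9 10: Belady's anomaly
```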
2. Optimal Page replacement: In this algorithm, pages are replaced which would not be used for the
longest duration of time in the future.
Example-2: Consider the page references 7, 0, 1, 2, 0, 3, 0, 4, 2, 3, 0, 3, 2, 3 with 4 page frame. Find
number of page fault.
Initially, all slots are empty, so when 7, 0, 1, 2 come they are allocated to the empty slots —> 4 page faults.
0 is already there so —> 0 page fault. When 3 comes it takes the place of 7, because 7 is not used for
the longest duration of time in the future —> 1 page fault. 0 is already there so —> 0 page fault. 4
takes the place of 1 —> 1 page fault.
Now for the further page reference string —> 0 Page fault because they are already available in the
memory.
Optimal page replacement is perfect, but not possible in practice as the operating system cannot know future
requests. The use of Optimal Page replacement is to set up a benchmark so that other replacement algorithms
can be analyzed against it.
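A sketch of an Optimal-replacement counter, usable as the benchmark described above (since the whole reference string is known in advance, the simulator can "see the future" in a way a real operating system cannot):

```python
# Sketch of the Optimal (Belady) policy: on a fault, evict the resident page
# whose next use lies furthest in the future (or that is never used again).

def optimal_faults(refs, frames):
    memory = []
    faults = 0
    for i, page in enumerate(refs):
        if page in memory:
            continue                  # hit: nothing to do
        faults += 1
        if len(memory) < frames:
            memory.append(page)
        else:
            future = refs[i + 1:]
            victim = max(memory, key=lambda p: future.index(p)
                         if p in future else len(future))
            memory[memory.index(victim)] = page
    return faults

refs = [7, 0, 1, 2, 0, 3, 0, 4, 2, 3, 0, 3, 2, 3]
print(optimal_faults(refs, 4))   # 6, matching Example 2
```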
3. Least Recently Used: In this algorithm, page will be replaced which is least recently used.
Example-3: Consider the page reference string 7, 0, 1, 2, 0, 3, 0, 4, 2, 3, 0, 3, 2, 3 with 4 page frames. Find
number of page faults.
Initially, all slots are empty, so when 7, 0, 1, 2 come they are allocated to the empty slots —> 4 page faults.
0 is already there so —> 0 page fault. When 3 comes it takes the place of 7, because 7 is the least
recently used page —> 1 page fault. 0 is already in memory so —> 0 page fault. 4 takes the place of
1 —> 1 page fault. For the rest of the reference string —> 0 page faults, because the pages are already
available in the memory.
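The LRU policy can be sketched the same way (a toy simulator, not from the syllabus text), and it reproduces the 6 faults of Example 3:

```python
def lru_faults(refs, frames):
    """Count page faults under LRU; the most recently used page is kept last."""
    memory = []
    faults = 0
    for page in refs:
        if page in memory:
            memory.remove(page)        # hit: refresh this page's recency
        else:
            faults += 1
            if len(memory) == frames:
                memory.pop(0)          # evict the least recently used page
        memory.append(page)
    return faults

refs = [7, 0, 1, 2, 0, 3, 0, 4, 2, 3, 0, 3, 2, 3]
print(lru_faults(refs, 4))   # 6, matching Example 3
```

On this particular reference string LRU matches the Optimal count, but in general LRU only approximates Optimal by using the past as a predictor of the future.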
4. Most Recently Used (MRU): In this algorithm, the page that has been used most recently will be
replaced. Belady's anomaly can occur in this algorithm.
SECOND HALF TOPIC: Virtual Memory – DMA – I/O – Accessing I/O: Parallel and Serial
Interface – Interrupt I/O – Interconnection Standards: USB, SATA
4. Explain in detail about virtual memory with an example.(Nov/Dec 2019) (Nov/Dec 2021) or
Discuss the concept of virtual memory and explain how a virtual memory system is implemented, pointing
out the hardware and software support. (Nov/Dec 2017)Nov/Dec 2020.
VIRTUAL MEMORY
Virtual memory divides physical memory into blocks (called pages or segments) and allocates them to
different processes.
With virtual memory, the CPU produces virtual addresses that are translated by a combination of
HW and SW to physical addresses, which accesses main memory.
The process is called memory mapping or address translation.
Today, the two memory-hierarchy levels controlled by virtual memory are DRAMs and magnetic
disks.
Virtual Memory manages the two levels of the memory hierarchy represented by main memory and
secondary storage.
Figure below shows the mapping of virtual memory to physical memory for a program with four
pages.
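The mapping of virtual pages to physical frames can be sketched as a toy translation step (the 4 KB page size and the page-table contents below are assumptions made purely for illustration; a real MMU adds validity bits, protection and a TLB):

```python
# Toy page-based address translation. The 4 KB page size and the page-table
# contents below are assumptions made purely for illustration.

PAGE_SIZE = 4096
page_table = {0: 5, 1: 9, 2: 3, 3: 7}      # virtual page number -> frame number

def translate(virtual_address):
    vpn = virtual_address // PAGE_SIZE      # virtual page number
    offset = virtual_address % PAGE_SIZE    # offset is unchanged by translation
    return page_table[vpn] * PAGE_SIZE + offset

print(hex(translate(0x1004)))   # virtual page 1 maps to frame 9 -> 0x9004
```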
While a DMA transfer is taking place, the program that requested the transfer cannot continue, but the
processor can be used to execute another program.
After the DMA transfer is completed, the processor returns to the program that requested the transfer.
R/W -> Determines the direction of transfer.
When R/W = 1, the DMA controller reads data from memory to the I/O device.
When R/W = 0, the DMA controller performs a write operation.
Done Flag=1, the controller has completed transferring a block of data and is ready to receive
another command.
IE=1, it causes the controller to raise an interrupt (interrupt Enabled) after it has completed
transferring the block of data.
IRQ=1, it indicates that the controller has requested an interrupt.
A DMA controller connects a high speed network to the computer bus, and the disk controller for two
disks also has DMA capability and it provides two DMA channels.
To start a DMA transfer of a block of data from main memory to one of the disks, the program writes
the address and the word-count information into the registers of the corresponding channel of the disk
controller.
When DMA transfer is completed, it will be recorded in status and control registers of the DMA
channel (ie) Done bit=IRQ=IE=1.