Part 4
FLYNN'S TAXONOMY OF COMPUTER ARCHITECTURE
Chapter at a Glance
The most popular taxonomy of computer architecture was defined by Flynn in 1966. Flynn's classification scheme is based on the notion of a stream of information. Two types of information flow into a processor: instructions and data. The instruction stream is defined as the sequence of instructions performed by the processing unit. The data stream is defined as the data traffic exchanged between the memory and the processing unit. According to Flynn's classification, either of the instruction or data streams can be single or multiple. Computer architecture can accordingly be classified into the following four distinct categories:
Single-instruction single-data streams (SISD)
Single-instruction multiple-data streams (SIMD)
Multiple-instruction single-data streams (MISD)
Multiple-instruction multiple-data streams (MIMD)
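The four categories are simply the cross product of {single, multiple} instruction streams with {single, multiple} data streams. A minimal sketch (the function name is my own, not from the text) encodes that mapping:

```python
def flynn_category(multiple_instruction_streams: bool,
                   multiple_data_streams: bool) -> str:
    """Map the single/multiple status of the two streams to
    Flynn's category name (SISD, SIMD, MISD or MIMD)."""
    instr = "M" if multiple_instruction_streams else "S"
    data = "M" if multiple_data_streams else "S"
    return f"{instr}I{data}D"

# a conventional uniprocessor is SISD; an array processor is SIMD
```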
COMPUTER ARCHITECTURE
2. Which one of the following has no practical usage? [WBUT 2010, 2014, 2016]
a) SISD b) SIMD c) MISD d) MIMD
Answer: (c)
[Figure: an instruction stream into the processing unit; a data stream between the processing unit and data memory]
Fig: SIMD Architecture
Multiple Instructions, Single Data stream (MISD)
Multiple instructions operate on a single data stream. This is an uncommon architecture which is generally used for fault tolerance. Heterogeneous systems operate on the same data stream and must agree on the result. Examples include the Space Shuttle flight control computer.
[Figure: an instruction memory feeds several control units, each issuing its own instruction stream; all of them operate on a single data stream]
Fig: MISD Architecture
POPULAR PUBLICATIONS
2. Implement the data routing logic of SIMD architecture to compute S(k) = Σ_{i=0}^{k} A_i for k = 0, 1, 2, ..., N−1. [WBUT 2008]
OR,
Why do we need a masking mechanism in SIMD array processors? [WBUT 2015]
In an SIMD array processor of 8 PEs, the sum S(k) of the first k components of vector A is desired for each k from 0 to 7. Let A = (A_0, A_1, ..., A_7). We need to compute S(k) = Σ_{i=0}^{k} A_i for k = 0, 1, ..., 7. Discuss how data-routing and masking are performed in the processor. [WBUT 2015]
Answer:
The masking technique for an SIMD processor is capable of masking a plurality of individual machine operations within a single instruction incorporating a plurality of operations. To accomplish this, each different machine operation within the instruction includes a number of masking bits which address a specific location in a mask register. The mask register includes a mask bit bank. The mask location selected within the mask register is bit-wise ANDed with a mask context bit in order to establish whether the processing element will be enabled or disabled for a particular conditional subroutine which is called.
[Figure: eight PEs, PE 0 to PE 7, each starting with one component A_i; in each routing step d, PE i receives the partial sum from PE i − 2^d (shifts of 1, 2 and 4), so after three steps PE k holds S(k)]
Fig: The calculation of the summation S(k) = Σ_{i=0}^{k} A_i for k = 0, 1, 2, ..., N−1 in an SIMD machine
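The recursive-doubling scheme in the figure can be simulated in software. The sketch below (my own illustrative code, not from the text) models the two mechanisms the question asks about: in step d the routing network shifts partial sums by 2^d PEs, and the mask enables only the PEs with index ≥ 2^d, so the remaining PEs keep their already-complete sums.

```python
def simd_prefix_sum(a):
    """Compute S(k) = A_0 + ... + A_k for every k in log2(N) steps,
    mimicking an SIMD array in which each PE i holds one partial sum."""
    s = list(a)                  # initially PE i holds just A_i (= S(i) for i = 0)
    n = len(s)
    shift = 1
    while shift < n:
        # data routing: PE i receives the partial sum held by PE (i - shift)
        routed = [s[i - shift] if i >= shift else 0 for i in range(n)]
        # masking: only PEs with index >= shift are enabled for the add;
        # masked-off PEs already hold their final S(k) and must not change
        s = [s[i] + routed[i] if i >= shift else s[i] for i in range(n)]
        shift *= 2
    return s
```

For 8 PEs this takes three routing steps (shift = 1, 2, 4), as in the figure.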
CPI = (Σ_{i=1}^{n} CPI_i × I_i) / IC, where I_i is the number of executed instructions of type i, CPI_i is the cycle count for that type, and IC is the total instruction count.
A defining attribute of the multicomputer model is that accesses to local (same-node) memory are less expensive than accesses to remote (different-node) memory. That is, read and write are less costly than send and receive. Hence, it is desirable that accesses to local data be more frequent than accesses to remote data. This property, called locality, is a fundamental requirement for parallel software, in addition to concurrency and scalability.
Another important class of parallel computer is the multiprocessor, or shared-memory MIMD computer. In multiprocessors, all processors share access to a common memory, typically via a bus or a hierarchy of buses. In the idealized Parallel Random Access Machine (PRAM) model, often used in theoretical studies of parallel algorithms, any processor can access any memory element in the same amount of time. In practice, scaling this architecture usually introduces some form of memory hierarchy; in particular, the frequency with which the shared memory is accessed may be reduced by storing copies of frequently used data items in a cache associated with each processor. Access to this cache is much faster than access to the shared memory.
2. a) Describe the distributed and shared memory models of SIMD architecture.
b) Draw the block diagram and explain the functionality of a processing element. [WBUT 2008]
Answer:
a) There are two types of SIMD computer models, described below, based on the memory distribution and addressing scheme used. One is the Distributed-Memory Model and the other is the Shared-Memory Model. Most SIMD computers use a single control unit and distributed memories, except for a few that use associative memories. The instruction set of an SIMD computer is decoded by the array control unit. The processing elements (PEs) in the SIMD array are passive ALUs executing instructions broadcast from the control unit.
Distributed-Memory Model: A distributed-memory SIMD computer consists of an
array of PEs which are controlled by the same array control unit, as shown in Fig: 1.
[Figure: host computer with mass storage and I/O; array control unit with control memory (user program and data) and a scalar processor for scalar instructions; vector instructions and constants broadcast over a bus to PE1, PE2, ..., PEn, each with its local memory LM1, LM2, ..., LMn; the PEs are linked by a data routing network. PE: Processing Element; LM: Local Memory]
Fig: 1 A distributed-memory SIMD computer
[Figure: (top) a shared-memory SIMD computer — host computer with mass storage and I/O, control memory, scalar processor, array control unit, a broadcast bus for vector instructions, and PEs connected to shared memory modules through an alignment network under network control; (bottom) the structure of a processing element PE_i — registers A_i, B_i, C_i, D_i, R_i, an ALU, and its memory PEM_i, linked to other PEs via the interconnection network]
In the earliest computers, only one program ran at a time. A computation-intensive program that took one hour to run and a tape-copying program that took one hour to run would take a total of two hours to run. An early form of parallel processing allowed the interleaved execution of both programs together. The computer would start an I/O operation, and while it was waiting for the operation to complete, it would execute the processor-intensive program. The total execution time for the two jobs would be a little over one hour.
Levels of parallel processing:
We can have parallel processing at four levels.
i) Instruction Level: Most processors have several execution units and can execute several instructions (usually machine level) at the same time. Good compilers can reorder instructions to maximize instruction throughput. Often the processor itself can do this. Modern processors even parallelize execution of micro-steps of instructions within the same pipe.
ii) Loop Level: Here, consecutive loop iterations are candidates for parallel execution. However, data dependences between subsequent iterations may restrict parallel execution of instructions at loop level. There is a lot of scope for parallel execution at loop level.
iii) Procedure Level: Here parallelism is available in the form of parallel executable procedures. Here the design of the algorithm plays a major role. For example, each thread in Java can be spawned to run a function or method.
iv) Program Level: This is usually the responsibility of the operating system, which runs processes concurrently. Different programs are obviously independent of each other. So parallelism can be extracted by the operating system at this level.
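The loop-level case above can be sketched in code: when iterations carry no dependence on one another, they may be handed to a pool of workers. The names below are illustrative, not from the text.

```python
from concurrent.futures import ThreadPoolExecutor

def body(x):
    # an iteration body with no dependence on other iterations
    return x * x

def parallel_loop(data):
    """Run independent loop iterations concurrently. If iteration i
    consumed the result of iteration i - 1 (a loop-carried dependence),
    this transformation would no longer be valid."""
    with ThreadPoolExecutor() as pool:
        return list(pool.map(body, data))
```

In CPython, threads mainly illustrate the structure; for CPU-bound work, processes would be needed for true parallelism.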
5. Write short notes on the following:
a) Array processor [WBUT 2005, 2007, 2010]
b) MMX Technology [WBUT 2005, 2006, 2007]
c) CM-2 machine [WBUT 2008]
d) Flynn's classification [WBUT 2011]
Answer:
a) Array processor:
The SIMD-1 Array Processor consists of a memory, an Array Control Unit (ACU) and a one-dimensional SIMD array of simple processing elements (PEs). The figure shows a 4-processor array, the initial image seen when the model is loaded.
[Figure: memory, the array control unit (with clock and program counter), and the SIMD array of PEs]
Fig: SIMD Array
b) MMX Technology:
MMX technology is an extension to the Intel Architecture (IA) designed to improve the performance of multimedia and communication algorithms. The Pentium processor with MMX Technology is the first microprocessor to implement the new instruction set. MMX consists of two main processor architectural improvements.
Operation of MMX Technology over the non-MMX Pentium microprocessors:
The MMX technology consists of several improvements:
1. There are 57 new instructions added; these have been designed to handle video, audio, and graphical data more efficiently. Programs can use MMX instructions without changing to a new mode or operating-system-visible state.
2. All MMX chips have a larger internal L1 cache than their non-MMX counterparts. This improves the performance of any software running on the chip, regardless of whether it actually uses the MMX-specific instructions or not.
Frequency Speedup
To simplify the design and to meet the core frequency goal, the pipeline of the Pentium processor with MMX was extended with a new pipeline stage (length decode), in order to maintain and improve the CPI (Clocks Per Instruction) in spite of the modifications that increase the clock rate.
As we know,
[Figure: Pentium processor with MMX pipeline — Prefetch, Fetch, D1, D2, Execute and Writeback stages; a 16K code cache with instruction-length decode, a 16K data cache with TLBs, the FPU with FP registers, branch target buffer (BTB), return stack buffer (RSB), shadow registers, microcode ROM, instruction FIFO, and the bus and page units]
c) CM-2 machine:
[Figure: CM-2 organization — a sequencer driving the instruction broadcast bus, an array of processors each with a memory (M), a global result bus with combine logic, a scalar memory bus, and the Router/NEWS/Scanning interconnect]
RISC & CISC ARCHITECTURES
Chapter at a Glance
Non von Neumann architecture characteristics
Any computer architecture in which the underlying model of computation is different from what has come to be called the standard von Neumann model. A non von Neumann machine may thus be without the concept of sequential flow of control (i.e. without any register corresponding to a "program counter" that indicates the current point that has been reached in execution of a program) and/or without the concept of a variable (i.e. without "named" storage locations in which a value may be stored and subsequently referenced or changed). Examples of non von Neumann machines are the dataflow machines and the reduction machines. In both of these cases there is a high degree of parallelism, and instead of variables there are immutable bindings between names and constant values.
Cluster Computer
A cluster computer consists of a set of loosely connected computers that work together so that in
many respects they can be viewed as a single system. The components of a cluster are usually
connected to each other through fast local area networks, each node (a computer used as a server)
running its own instance of an operating system. Computer clusters emerged as a result of
convergence of a number of computing trends including the availability of low cost
microprocessors, high speed networks, and software for high performance distributed computing.
Clusters are usually deployed to improve performance and availability over that of a single
computer, while typically being much more cost-effective than single computers of comparable
speed or availability. Computer clusters have a wide range of applicability and deployment,
ranging from small business clusters with a handful of nodes to some of the fastest
supercomputers in the world.
2. What is a main advantage of classical vector systems (VS) compared with RISC-based systems (RS)? [WBUT 2008, 2009]
a) VS have significantly higher memory bandwidth than RS
b) VS have higher clock rate than RS
c) VS are more parallel than RS
d) None of these
Answer: (a)
3. Difference between RISC and CISC is [WBUT 2010]
a) RISC is more complex b) CISC is more effective
c) RISC is better optimizable d) none of these
Answer: (c)
4. The advantage of RISC over CISC is that [WBUT 2011]
a) RISC can achieve pipeline segments requiring just one clock cycle
b) CISC uses many segments in its pipeline, with the longest segment requiring two or more clock cycles
c) both (a) & (b)
d) none of these
Answer: (d)
5. Which of the following is not a RISC architecture characteristic? [WBUT 2012, 2018]
a) simplified and unified format of code of instructions
b) no specialized registers
c) no storage-storage instructions
d) small register file
Answer: (d)
6. Which of the following architectures corresponds to von Neumann architecture? [WBUT 2012]
a) MISD b) MIMD c) SISD d) SIMD
Answer: (c)
7. The CPI value for RISC processors is [WBUT 2015]
a) 1 b) 2 c) 3 d) more
Answer: (a)
8. In which of the following shared memory multiprocessor models is the time to access shared memory the same for all processors? [WBUT 2019]
a) NORMA b) COMA c) UMA d) NUMA
Answer: (c)
Short Answer Type Questions
1. Compare between RISC and CISC. [WBUT 2010, 2012, 2014, 2015, 2018]
OR,
Compare RISC and CISC architecture in brief. [WBUT 2019]
Answer:
| Characteristics | CISC | RISC |
| Instruction set size and instruction formats | Instruction set is very large and the instruction format is variable (16–64 bits per instruction). | Instruction set is small and the instruction format is fixed. |
| Addressing modes | 12–24 | 3–5 |
| General purpose registers and cache design | 8–24 general purpose registers. A unified cache is used for instructions and data. | Most instructions are register based, so a large number of registers (32–192) is used, and the cache is split into a data cache and an instruction cache. |
| CPI | CPI is between 2 and 15. | In most cases CPI is 1, but the average CPI is less than 1.5. |
| CPU control | CPU is controlled by control memory (ROM) using microprograms. | CPU is controlled by hardwired logic, without control memory. |
2. What are multiprocessor, multi-computer and multi-core systems? [WBUT 2012, 2014]
Answer:
In a multiprocessor system there is more than one processor, and they work simultaneously. In this system there is one master processor and the others are slaves. If one slave processor fails, the master can assign its task to another slave processor; but if the master fails, the entire system fails. The central part of a multiprocessor is the master. All of the processors share the hard disk, memory and other devices.
A multicomputer system consists of more than one computer, usually under the supervision of a master computer, in which smaller computers handle input/output and routine jobs while the large computer carries out the more complex computations.
A multi-core processor is a single computing component with two or more independent actual central processing units (called "cores"), which are the units that read and execute program instructions. The instructions are ordinary CPU instructions such as add, move data, and branch, but the multiple cores can run multiple instructions at the same time, increasing overall speed for programs amenable to parallel computing. Manufacturers typically integrate the cores onto a single integrated circuit die or onto multiple dies in a single chip package.
3. What is Von Neumann architecture? What is a Von Neumann bottleneck? How can this be reduced? [WBUT 2019]
Answer:
Von Neumann architecture was first published by John von Neumann in 1945. His computer architecture design consists of a Control Unit, Arithmetic and Logic Unit (ALU), Memory Unit, Registers and Inputs/Outputs. Von Neumann architecture is based on the stored-program computer concept, where instruction data and program data are stored in the same memory. This design is still used in most computers produced today.
The Central Processing Unit (CPU) is the electronic circuit responsible for executing the instructions of a computer program. It is sometimes referred to as the microprocessor or processor. The CPU contains the ALU, CU and a variety of registers. Registers are high speed storage areas in the CPU. All data must be stored in a register before it can be processed. The Arithmetic and Logic Unit (ALU) allows arithmetic (add, subtract etc.) and logic (AND, OR, NOT etc.) operations to be carried out. The control unit controls the operation of the computer's ALU, memory and input/output devices, telling them how to respond to the program instructions it has just read and interpreted from the memory unit. The control unit also provides the timing and control signals required by other computer components. Buses are the means by which data is transmitted from one part of a computer to another, connecting all major internal components to the CPU and memory. A standard CPU system bus is comprised of a control bus, data bus and address bus.
[Figure: Von Neumann architecture — Control Unit; Registers: PC, CIR, AC, MAR, MDR; Memory Unit]
The CPU's processing speed is much faster in comparison to the main memory (RAM), and as a result the CPU needs to wait longer to obtain a data-word from the memory. The CPU and memory speed disparity is known as the Von Neumann bottleneck. This problem can be solved in two ways:
1. Use of cache memory between CPU and main memory
2. Using RISC computers
This performance problem can be reduced by introducing a cache memory (a special type of fast memory) in between the CPU and the main memory. This is because the speed of the cache memory is almost the same as that of the CPU, so there is no waiting time for the CPU for instructions and data-words to come to it for processing. Another way of solving the problem is by using a special type of computer known as a Reduced Instruction Set Computer (RISC). The main intention of RISC is to reduce the total number of memory references made by the CPU; instead it uses a large number of registers for the same purpose.
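The benefit of the cache remedy can be illustrated by counting how many references actually reach main memory under a toy direct-mapped cache model (the model, sizes and addresses are illustrative assumptions, not from the text):

```python
def main_memory_accesses(addresses, cache_lines=4):
    """Count references that miss a tiny direct-mapped cache and must
    go to slow main memory. Each cache line holds one address, and an
    address maps to line (address % cache_lines)."""
    cache = [None] * cache_lines
    misses = 0
    for addr in addresses:
        line = addr % cache_lines
        if cache[line] != addr:   # miss: the CPU waits on main memory
            misses += 1
            cache[line] = addr    # keep a copy for later reuse
    return misses
```

With good locality (the same few words referenced repeatedly), almost all references hit the cache and the bottleneck is largely avoided.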
Long Answer Type Questions
1. a) What is SPEC rating? Explain. [WBUT 2015]
b) A 50 MHz processor was used to execute a program with the following instruction mix and clock cycle counts:
| Instruction type | Instruction count | Clock cycle count |
| Integer arithmetic | 50000 | 1 |
| Data transfer | 35000 | 2 |
| Floating point arithmetic | 20000 | 2 |
| Branch | 6000 | 3 |
Calculate the effective CPI, MIPS rate and execution time for this program.
Answer:
a) The Standard Performance Evaluation Corporation (SPEC) is an American non-profit
organization that aims to "produce, establish, maintain and endorse a standardized set" of
performance benchmarks for computers. SPEC was founded in 1988. SPEC benchmarks
are widely used to evaluate the performance of computer systems; the test results are
published on the SPEC website. Results are sometimes informally referred to as
"SPECmarks" or just "SPEC". SPEC evolved into an umbrella organization
encompassing four diverse groups; Graphics and Workstation Performance Group
(GWPG), the High Performance Group (HPG), the Open Systems Group (OSG) and the
newest, the Research Group (RG).
b) Total instruction count = 111000
CPI = (50000 × 1 + 35000 × 2 + 20000 × 2 + 6000 × 3) / 111000 = 178000 / 111000 ≈ 1.6
MIPS rate = clock frequency / (CPI × 10^6) = (50 × 10^6) / (1.6 × 10^6) ≈ 31.25
Execution time = CPI × instruction count × clock time = 1.6 × 111000 × (1 / (50 × 10^6)) s ≈ 3.56 ms
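The arithmetic can be checked mechanically; a short script (variable names are my own):

```python
def performance(clock_hz, mix):
    """mix is a list of (instruction_count, cycles_per_instruction).
    Returns effective CPI, MIPS rate, and execution time in seconds."""
    total_instr = sum(n for n, _ in mix)
    total_cycles = sum(n * c for n, c in mix)
    cpi = total_cycles / total_instr
    mips = clock_hz / (cpi * 1_000_000)
    exec_time = total_cycles / clock_hz
    return cpi, mips, exec_time

# the instruction mix from the question, at 50 MHz
mix = [(50000, 1), (35000, 2), (20000, 2), (6000, 3)]
cpi, mips, t = performance(50_000_000, mix)
# cpi ≈ 1.60, t ≈ 3.56 ms; mips ≈ 31.2 with the unrounded CPI
# (rounding CPI to 1.6 gives the 31.25 quoted above)
```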
Answer:
a) The term addressing modes refers to the way in which the operand of an instruction is specified. Information contained in the instruction code is the value of the operand or the address of the result/operand. Following are the main addressing modes that are used on various platforms and architectures.
1) Immediate Mode: The operand is an immediate value stored explicitly in the instruction.
2) Index Mode: The address of the operand is obtained by adding a constant value to the contents of a general register (called the index register). The number of the index register and the constant value are included in the instruction code.
3) Indirect Mode: The effective address of the operand is the contents of a register or main memory location whose address appears in the instruction. Indirection is noted by placing the name of the register or the memory address given in the instruction in parentheses. The register or memory location that contains the address of the operand is a pointer. When an execution takes place in such a mode, the instruction may be told to go to a specific address.
4) Direct Mode: The address of the operand is embedded in the instruction code.
5) Register Mode: The name of the CPU register is embedded in the instruction. The register contains the value of the operand. The number of bits used to specify the register depends on the total number of registers in the processor set.
b) Relative Addressing Mode: In this mode, the content of the program counter (PC) is added to the address part of the instruction to obtain the effective address. When the number is added to the content of the PC, the result is an effective address whose position in memory is relative to the address of the next instruction.
Effective Address (EA) = PC + A
Direct Addressing Mode: In this mode, the address of the memory location which holds
the operand is included in the instruction. The operand resides in memory and its address
is given by the address field of the instruction.
For example LDA 4000H
[Figure: direct addressing — the instruction's opcode and address field, with the address field pointing to the operand's location in memory]
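The EA = PC + A rule, and its contrast with direct and register-indirect addressing, can be captured in a toy helper (names and values below are hypothetical, not from the text):

```python
def effective_address(mode, field, pc=None, registers=None):
    """Toy effective-address calculation for a few of the modes above;
    'field' is the instruction's address/operand field A."""
    if mode == "direct":             # EA is the address field itself
        return field
    if mode == "relative":           # EA = PC + A
        return pc + field
    if mode == "register_indirect":  # EA is the named register's contents
        return registers[field]
    raise ValueError(f"unknown mode: {mode}")

# e.g. LDA 4000H in direct mode yields EA = 0x4000
```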
Answer:
a) Power PC:
The PowerPC microprocessor is a highly integrated single-chip processor that combines a powerful RISC architecture, a superscalar machine organization, and a versatile high-performance bus interface. The processor contains a 32KB unified cache and is capable of dispatching, executing, and completing up to 3 instructions per cycle. The interface configurations provide a wide range of system bus interfaces, including pipelined, non-pipelined, and split transactions. The result is a cost-effective, general purpose microprocessor solution that offers very competitive performance.
[Figure: PowerPC 601 block diagram — instruction queue and dispatch logic feeding the branch unit, fixed point unit and floating point unit; an instruction fetch unit and memory management unit; a memory queue and bus interface unit. Pipeline flows: integer — Fetch, Dispatch/Decode, Execute, Writeback; load/store — Fetch, Dispatch/Decode, Address Gen, Cache, Writeback; floating point — Fetch, Dispatch, Decode, Execute 1, Execute 2, Writeback]
Fig: PowerPC 601 pipeline architecture
d) RISC:
RISC, or Reduced Instruction Set Computer is a type of microprocessor architecture that
utilizes a small, highly-optimized set of instructions, rather than a more specialized set of
instructions often found in other types of architectures. The first RISC projects came
from IBM, Stanford, and UC-Berkeley in the late 70s and early 80s. The IBM 801,
Stanford MIPS, and Berkeley RISC 1 and 2 were all designed with a similar philosophy
which has become known as RISC. Certain design features have been characteristic of
most RSCprocessors:
One cycle execution time: RISC processors have a CPI(clock per instruction) of one
cycle. This is due to the optimization of each instruction on the CPU and a technique
called;
Pipelining: a technique that allows for simultaneous execution of parts, or stages, of
instructions to more efficiently process instructions.
Large number of registers: the RISC design philosophy generally incorporates a
larger number of registers to prevent in large amounts of interactions with memory.
Characteristics of RISC
Simpler instructions, hence simple instruction decoding.
Instructions fit within the size of one word.
Instructions take a single clock cycle to execute.
Larger number of general purpose registers.
Simple addressing modes.
Fewer data types.
Pipelining can be easily achieved.