Unit 2 - Advanced Computer Architecture - WWW - Rgpvnotes.in

This document discusses advanced computer architecture. It covers instruction set architectures including CISC and RISC processors. CISC processors use complex instruction sets that can complete tasks in fewer lines of code, while RISC processors use simpler instruction sets. Superscalar and vector processors are also discussed, with superscalar processors issuing multiple instructions per cycle using multiple pipelines. Vector processors execute repeated operations on arrays of data in a pipelined manner. Memory hierarchy including caches and buses are additional topics covered.

Subject Name: Advanced Computer Architecture

Subject Code: CS-6001


Semester: 6th
Downloaded from be.rgpvnotes.in
Outline
1 Instruction Set Architecture
CISC Scalar Processors
RISC Scalar Processors
2 Superscalar and Vector Processors
Superscalar Processors
The VLIW Architecture
Scalar RISC and Superscalar RISC
3 Memory Hierarchy
Hierarchical Memory Technology
Inclusion, Coherence and Locality
Memory Capacity Planning
4 Shared-Memory Organisation
Interleaved Memory Organisation
Bandwidth and Fault Tolerance
5 Backplane Bus Systems
Addressing and Timing Protocols
Arbitration, Transaction, and Interrupt
Veena Khandelwal (Dept of CSE IIST Indore) Advance Computer Architecture March 27, 2018 2 / 58
Instruction Set Architecture

1 The instruction set of a computer specifies the primitive commands or


machine instructions that a programmer can use in programming the
machine.
2 The complexity of an instruction set is attributed to the instruction
formats, data formats, addressing modes, general purpose registers,
opcode specifications and flow control mechanisms used.
3 Two schools of thoughts on instruction set architectures have
evolved, namely, CISC and RISC.

Complex Instruction Set
1 Most functions are built into the hardware, making the instruction set
very large and complex.
2 Contains approximately 120 to 350 instructions using variable
instruction/data formats.
3 Uses a small set of 8 to 24 general purpose registers.
4 Execute a large set of memory reference operations based on more
than a dozen addressing modes.
5 Single instruction can execute several low-level operations (such as a
load from memory, an arithmetic operation, and a memory store).
6 The primary goal of CISC architecture is to complete a task in as few
lines of assembly code as possible.
7 Many high level language statements are directly implemented in the
hardware/firmware. This may simplify the compiler development,
improve execution efficiency, and allow an extension from scalar
instructions to vector and symbolic instructions.
Reduced Instruction Set
1 Rarely used instructions are pushed into software to vacate chip area
for building more powerful RISC.
2 Contains less than 100 instructions with a fixed instruction format (32
bits).
3 Only 3 to 5 simple addressing modes are used.
4 Most instructions are register-based.
5 Memory access is done by load/store instructions only.
6 A large register file (at least 32) is used to improve fast context
switching among multiple users and most instructions execute in one
cycle with hardwired control.
7 Because of the reduction in instruction-set complexity, the entire
processor is implemented on a single VLSI chip.
8 Higher clock rate and a lower CPI, which leads to higher MIPS
ratings.
Architectural Distinctions I

1. CISC: Small code size; complex instructions taking multiple cycles.
   RISC: Large code size; simple instructions taking one cycle.
2. CISC: Any instruction may reference memory.
   RISC: Only LOAD/STORE instructions reference memory.
3. CISC: Not pipelined, or less pipelined.
   RISC: Highly pipelined.
4. CISC: Use a unified cache for holding both instructions and data, which
   must therefore share the same data/instruction path. Modern CISC also
   use split caches.
   RISC: Separate instruction and data caches are used, with different
   access paths.
5. CISC: Use microprogrammed control (ROM). Modern CISC also use
   hardwired control.
   RISC: Use hardwired control.

Architectural Distinctions II

6. CISC: Single register set.
   RISC: Multiple register sets.
7. CISC: With few general purpose registers, many more instructions access
   memory; therefore the CPI is high.
   RISC: A large register file and separate I- and D-caches benefit from
   internal data forwarding; CPI = 1 if pipelining is carried out perfectly.
8. CISC: Large instruction set with variable formats (16-64 bits).
   RISC: Small instruction set with a fixed 32-bit format; most
   instructions are register based.
9. CISC: 12-24 addressing modes.
   RISC: 3-5 addressing modes.
10. CISC: Used in low-end applications such as security systems, home
    automation, etc.
    RISC: Used in high-end applications such as video processing,
    telecommunications and image processing.

CISC Scalar Processors

1 Execute with scalar data.


2 Capable of executing both integer and floating point operations.
3 Exploit instruction level parallelism.
4 Based on a complex instruction set, can be built either with a single
chip or with multiple chips.
5 Major causes of underpipelined situations include data dependence,
resource conflicts, branch penalties and logic hazards.
6 Representative CISC Processors: VAX 8600, MC68040, i486.

RISC Scalar Processors

1 Designed to issue one instruction per cycle.


2 Less frequently used operations are pushed into the software.
3 The reliance on a good compiler is much more demanding in a RISC
processor.
4 Exploit instruction level parallelism.
5 Representative RISC Processors: Sun SPARC, Intel i860, Motorola
M88100, and AMD 29000.

Superscalar and Vector Processors

1 A CISC or RISC scalar processor can be improved with a superscalar


or vector architecture. Scalar processors are those executing one
instruction per cycle. Only one instruction is issued per cycle and only
one completion of instruction is expected from the pipeline per cycle.
2 In a superscalar processor, multiple instruction pipelines are used per
cycle and multiple results are generated per cycle.
3 A vector processor executes vector instructions on arrays of data.
Thus each instruction involves a string of repeated operations, which
are ideal for pipelining with one result per cycle.

Superscalar Processors I

1 Superscalar processors are designed to exploit more instruction-level


parallelism in user programs. Only independent instructions can be
executed in parallel without causing a wait state.
2 The instruction issue degree is limited to 2 to 5.
3 In order to fully utilize a superscalar processor of degree m, m
instructions must be executable in parallel. This situation may not be
true in all clock cycles.
4 A superscalar processor depends more on an optimizing compiler to
exploit parallelism.
5 Multiple instruction pipelines are used. The instruction cache supplies
multiple instructions per fetch.
6 Multiple functional units are built into the integer and floating point
unit. Multiple data buses exist among the functional units.

The VLIW Architecture I

(Figure: organisation of a typical VLIW processor — multiple functional
units sharing a common large register file, fed from one long instruction
word.)

The VLIW Architecture II


1 The VLIW architecture is generalised from two well established
concepts: horizontal microcoding and superscalar processing.
2 A VLIW machine has instruction words hundreds of bits in length.
3 Multiple functional units are used concurrently in a VLIW processor.
4 All functional units share the use of a common large register file.
5 The operations to be simultaneously executed by the functional units
are synchronized in a VLIW instruction.
6 Different fields of the long instruction word carry the opcodes to be
dispatched to different functional units.
7 Programs written in conventional short instruction words must be
compacted together to form the VLIW instructions.
8 This compaction must be done by a compiler which can predict
branch outcomes.

Pipelining in VLIW processors I

Each instruction specifies multiple operations; with three operations per
instruction word, for example, the effective CPI becomes 0.33. VLIW
machines behave much like superscalar machines, with three differences.
1 The decoding of VLIW instructions is easier than that of superscalar
instructions.
2 The code density of superscalar machine is better when the available
instruction level parallelism is less than that exploitable by the VLIW
machine. This is because the fixed VLIW format includes bits for non
executable operations, while the superscalar processor issues only
executable instructions.
3 A superscalar machine can be object-code compatible with a large
family of non parallel machines. On the contrary, a VLIW machine
exploiting different amounts of parallelism would require different
instruction sets.
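The compaction step described above can be sketched in code. This is an illustrative sketch, not the notes' algorithm: a hypothetical 3-issue machine, a greedy packer, and made-up operation names; it also shows why code density suffers, since slots with no independent operation must be padded with NOPs.

```python
SLOTS = 3  # hypothetical 3-issue VLIW machine

def pack_vliw(ops, deps):
    """ops: list of operation names in program order.
    deps: dict mapping an op to the set of ops it depends on.
    Greedily fills each long instruction word with up to SLOTS
    independent ops; unused slots are padded with NOPs."""
    bundles, done = [], set()
    pending = list(ops)
    while pending:
        word, used = [], set()
        for op in pending:
            if len(word) == SLOTS:
                break
            # an op may issue only if all its producers completed earlier
            if deps.get(op, set()) <= done:
                word.append(op)
                used.add(op)
        if not used:
            raise ValueError("cyclic dependence")
        pending = [op for op in pending if op not in used]
        done |= used
        word += ["NOP"] * (SLOTS - len(word))  # pad unused slots
        bundles.append(word)
    return bundles

# a serial chain a -> c -> d forces NOP padding in every bundle
print(pack_vliw(["a", "b", "c", "d"], {"c": {"a"}, "d": {"c"}}))
```

With the serial chain above only one or two slots per word carry real work, which is exactly the code-density penalty compared with a superscalar machine that issues only executable instructions.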
                            Superscalar               VLIW
Instructions per cycle      N                         1 large instruction
                                                      does the same work
Finding independent         Hardware looks at         Just do the next
instructions                >> N instructions         large instruction
Hardware cost               Expensive                 Less expensive
Help from compiler          Compiler can help         Completely depends
                                                      on the compiler


Scalar RISC and Superscalar RISC I

1 A processor that executes scalar data is called a scalar processor.
Even in its simplest form, a scalar processor executes integer
instructions on fixed-point operands. More powerful scalar
processors usually execute both floating-point and integer operations.
Recently produced scalar processors contain both a floating-point unit
and an integer unit on the same CPU chip, and most of these modern
scalar processors use 32-bit instructions.
2 The superscalar processor, on the other hand, executes multiple
instructions at a time because of its multiple number of pipelines.
This CPU structure implements instruction-level parallelism, which is
a form of parallelism in computer hardware, within a single computer
processor. This allows a CPU throughput that processors which do not
implement instruction-level parallelism cannot match.

Scalar RISC and Superscalar RISC II


3 Instead of executing one instruction at a time, a superscalar processor
uses its redundant functional units in the execution of multiple
instructions. These functional units are not separate CPU cores, but a
single CPU’s extension resources such as multipliers, bit shifters and
arithmetic logic units (ALUs).
4 A scalar processor, considered to be the simplest of all processors,
works on one or two computer data items at a given time. The
superscalar processor works on multiple instructions and several
groups of multiple data items at a time.
5 Scalar and superscalar processors both function the same way in
terms of how they manipulate data, but their difference lies in how
many manipulations and data items they can work on in a given time.
Superscalar processors can handle multiple instructions and data
items, while the scalar processor simply cannot, therefore making the
former a more powerful processor than the latter.

Scalar RISC and Superscalar RISC III

6 Scalar and superscalar processors both have some similarities with


vector processors. Like a scalar processor, a vector processor also
executes a single instruction at a time, but instead of just
manipulating one data item, its single instruction can access multiple
data items.
7 Like the superscalar processor, a vector processor has several
redundant functional units that let it manipulate multiple data items,
but it can only work on a single instruction at a time. In essence, a
superscalar processor is a combination of a scalar processor and a
vector processor.

Memory Hierarchy
Hierarchical Memory Technology I

Storage devices such as registers, caches, main memory, disk devices and
tape units are often organised as a hierarchy. The memory technology and
storage organisation at each level are characterized by five parameters:
access time, memory size, cost per byte, transfer bandwidth and unit of
transfer.
Access time refers to the roundtrip time from the CPU to the ith-level
memory.
The memory size is the number of bytes or words in level i.
The cost of ith-level memory is estimated by the product of cost per
byte and memory size.
The bandwidth refers to the rate at which information is transferred
between adjacent levels.
The unit of transfer refers to the grain size for data transfer between
level i and i + 1.
Hierarchical Memory Technology II

Memory devices at a lower level are faster to access, smaller in size
and more expensive per byte, having a higher bandwidth and using a
smaller unit of transfer as compared with those at a higher level. That is,
ti−1 < ti , si−1 < si , ci−1 > ci , bi−1 > bi , and xi−1 < xi .

Hierarchical Memory Technology III

(Figure: the memory hierarchy — registers, cache, main memory, disk and
tape — with access time increasing and cost per byte decreasing toward
the bottom.)

Inclusion, Coherence and Locality I

1 The inclusion property is stated as M1 ⊂ M2 ⊂ M3 ... ⊂ Mn . The


set inclusion relationship implies that all information items are
originally stored in the outermost level Mn . During the processing,
subsets of Mn are copied into Mn−1 . Information transfers between
the CPU and cache are in terms of words (4 or 8 bytes).
The cache (M1 ) is divided into cache blocks. Each block is typically
32 bytes. Blocks are the unit of data transfer between the cache and
main memory.
The main memory (M2 ) is divided into pages, say 4 Kbytes each.
Each page contains, say, 128 blocks. Pages are the units of
information transferred between disk and main memory.
Scattered pages are organised as a segment in disk memory.
Data transfer between the disk and tape units is handled at the file
level.

Inclusion, Coherence and Locality II


2 The coherence property requires that copies of the same
information item at successive memory levels be consistent. If a word
is modified in the cache, copies of that word must be updated
immediately, or eventually at all higher levels. There are two
strategies for maintaining the coherence in a memory hierarchy.
The first method is called write-through which demands immediate
updates in Mi+1 if a word is modified in Mi .
The second method is write-back which delays the updates in Mi+1
until the word being modified in Mi is replaced or removed from Mi .
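The cost difference between the two strategies can be sketched by counting the updates that reach the next level. This is a simplified illustrative model with a made-up write trace: write-back is charged one flush per dirty word, ignoring replacements in between.

```python
def write_through(writes):
    """Every write to a cached word is propagated to M(i+1) immediately:
    one next-level update per write."""
    return len(writes)

def write_back(writes):
    """M(i+1) is updated only when a dirty word leaves the cache; repeated
    writes to the same word before that cost a single update.  Simplified
    here: each dirty word is flushed exactly once."""
    return len(set(writes))

trace = ["x", "x", "y", "x", "z", "y"]   # 6 writes to 3 distinct words
print(write_through(trace))   # 6 next-level updates
print(write_back(trace))      # 3 next-level updates
```

The trace shows why write-back reduces traffic to the next level when writes cluster on a few words, at the price of delayed (eventual) consistency.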
3 Locality of References The memory hierarchy was developed based
on a program behaviour known as locality of references. Memory
references are generated by the CPU for either instruction or data
access. These accesses tend to be clustered in certain regions in time,
space, and ordering. There are three dimensions of the locality
property: temporal, spatial and sequential. During the lifetime of a
software process, a number of pages are used dynamically. The
references to these pages vary from time to time, however they follow
certain access patterns. These memory reference access patterns are
caused by the following locality properties:
1 Temporal Locality: Recently referenced items are likely to be
referenced again in the near future. This is often caused by the special
program constructs such as iterative loops, process stacks, temporary
variables, or subroutines.
2 Spatial Locality: This refers to the tendency for a process to access
items whose addresses are near one another. For example, operations
on tables or arrays involve accesses of a certain clustered area in the
address space. Program segments such as routines and macros tend to
be stored in the same neighbourhood of the memory space.
3 Sequential Locality: In typical programs, the execution of instructions
follows a sequential order unless branch instructions create out-of-order
execution. Besides, the access of a large data array also follows a
sequential order.
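Spatial and sequential locality in the array example above can be made concrete by listing the addresses a row-major traversal touches, versus a column-order traversal of the same array (the array shape is a made-up example):

```python
ROWS, COLS = 3, 4  # assumed shape of a row-major 2-D array

def addresses(order):
    """Linear addresses touched when visiting a[i][j] for all i, j."""
    if order == "row":      # for i: for j: a[i][j]
        return [i * COLS + j for i in range(ROWS) for j in range(COLS)]
    else:                   # for j: for i: a[i][j]
        return [i * COLS + j for j in range(COLS) for i in range(ROWS)]

print(addresses("row"))   # consecutive addresses: good spatial locality
print(addresses("col"))   # stride of COLS between successive accesses
```

The row-order sequence is strictly sequential, so each cache block fetched is fully used; the column-order sequence jumps by COLS words per access and touches a new block almost every time.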


Memory Capacity Planning I

Hit Ratio: It is a concept defined for any two adjacent levels of a


memory hierarchy. When an information item is found in Mi , we call
it a ”hit”, otherwise a ”miss”. Consider memory levels Mi and Mi−1
in a hierarchy, i = 1, 2, ..., n. The hit ratio hi at Mi is the probability
that an information item will be found in Mi .
It is a function of the characteristics of the two adjacent levels Mi−1
and Mi . The miss ratio at Mi is defined as 1 − hi . Assume h0 =0 and
hn =1. The access frequency to Mi is defined as
fi = (1 − h1 )(1 − h2 )...(1 − hi−1 )(hi ).
Due to the locality property, the access frequencies decrease very
rapidly from low to high levels; that is f1 ≫ f2 ≫ f3 ≫ ... ≫ fn .


Memory Capacity Planning II


Effective Access Time: Every time a miss occurs, a penalty must be
paid to access the next higher level of memory. The misses have been
called block misses in the cache and page faults in the main memory.
The time penalty for a page fault is much longer than that for a block
miss due to the fact that t1 < t2 < t3 .
Teff = f1 t1 + f2 t2 + ... + fn tn
     = h1 t1 + (1 − h1 )h2 t2 + (1 − h1 )(1 − h2 )h3 t3 + ... +
       (1 − h1 )(1 − h2 )...(1 − hn−1 )hn tn (1)
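The access-frequency and effective-access-time formulas above can be transcribed directly into code. The hit ratios and access times below are assumed example values, not taken from the notes:

```python
def access_frequencies(h):
    """f_i = (1 - h1)(1 - h2)...(1 - h_{i-1}) * h_i
    for hit ratios h = [h1, ..., hn] with hn = 1."""
    freqs, miss_so_far = [], 1.0
    for hi in h:
        freqs.append(miss_so_far * hi)
        miss_so_far *= (1 - hi)
    return freqs

def t_eff(h, t):
    """Equation (1): Teff = sum over i of f_i * t_i."""
    return sum(f * ti for f, ti in zip(access_frequencies(h), t))

h = [0.9, 0.98, 1.0]            # assumed hit ratios; hn = 1
t = [10e-9, 100e-9, 10e-6]      # assumed access times (seconds)
print(access_frequencies(h))    # frequencies sum to 1
print(t_eff(h, t))
```

Note how the frequencies fall off sharply level by level (f1 >> f2 >> f3), which is the locality property the hierarchy relies on.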

Hierarchy Optimization: The total cost of a memory hierarchy is
estimated as

Ctotal = c1 .s1 + c2 .s2 + ... + cn .sn (2)

Memory Capacity Planning III


Consider the design of a three-level memory hierarchy with the following
specification for memory characteristics:

Memory Level    Access Time     Capacity          Cost/Kbyte
Cache           t1 = 25 ns      s1 = 512 Kbytes   c1 = $1.25
Main Memory     t2 = unknown    s2 = 32 Mbytes    c2 = $0.2
Disk array      t3 = 4 ms       s3 = unknown      c3 = $0.0002

The design goal is to achieve an effective memory access time t = 10.04µs


with a cache hit ratio h1 =0.98 and a hit ratio h2 = 0.9 in the main
memory. Also the total cost of the memory hierarchy is upper bounded by
$15,000. The main memory hierarchy cost is calculated as

C = c1 .s1 + c2 .s2 + c3 .s3 ≤ 15000 (3)

The maximum capacity of the disk is thus obtained as s3 =39.8 Gbytes


without exceeding the budget.

Memory Capacity Planning IV

The effective memory access time is calculated as

Teff = h1 t1 + (1 − h1 )h2 t2 + (1 − h1 )(1 − h2 )h3 t3 ≤ 10.04µs (4)

10.04 × 10−6 = 0.98 × 25 × 10−9 + 0.02 × 0.9 × t2 + 0.02 × 0.1 × 1 × 4 × 10−3

Thus t2 = 903 ns.

Shared-Memory Organisation

Interleaved Memory Organisation I

1 In order to close up the speed gap between the CPU/cache and main
memory built with RAM modules, an interleaving technique is used.
2 The memory design goal is to broaden the effective memory
bandwidth so that more memory words can be accessed per unit time.
3 The ultimate purpose is to match the memory bandwidth with the
bus bandwidth and with the processor bandwidth.
Memory Interleaving
1 The main memory is built with memory modules connected to a
system bus.
2 Once presented with a memory address, each memory module returns
with one word per cycle.
3 It is possible to present different addresses to different memory
modules so that parallel access of multiple words can be done
simultaneously or in a pipelined fashion.

Interleaved Memory Organisation II

4 Consider a main memory formed with m = 2^a modules, each
containing w = 2^b words of memory cells. The total memory capacity
is m.w = 2^(a+b) words. These memory words are assigned linear
addresses. Different ways of assigning linear addresses result in
different memory organisations.
5 Besides random access, the main memory is often block-accessed at
consecutive addresses.
6 Block access is needed for fetching a sequence of instructions or for
accessing a linearly ordered data structure.
7 There are two address formats for memory interleaving.


Interleaved Memory Organisation III

8 Low-order interleaving spreads contiguous memory locations across


the m modules horizontally. Low order a bits of the memory address
are used to identify the memory module. The high order b bits are
the word addresses within each module. Same word address is applied
to all memory modules simultaneously. A module address decoder is
used to distribute module addresses. The low-order m-way
interleaving supports block access in a pipelined fashion.
9 High-order interleaving uses the high order a bits as the module
address and the low order b bits as the word address within each
module. Contiguous memory locations are thus assigned to the same
memory module. In each memory cycle, only one word is accessed
from each module. Thus the high order interleaving cannot support
block access of contiguous locations.
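The two address formats can be sketched as arithmetic on the linear address; a = 2 and b = 4 below are assumed example field widths (4 modules of 16 words):

```python
A, B = 2, 4            # assumed field widths: module bits, word bits
M, W = 2 ** A, 2 ** B  # m = 4 modules, w = 16 words per module

def low_order(addr):
    """Low a bits select the module: consecutive addresses fall in
    consecutive modules, which supports pipelined block access."""
    return addr % M, addr // M          # (module, word within module)

def high_order(addr):
    """High a bits select the module: consecutive addresses stay in one
    module (easier fault isolation, no block pipelining)."""
    return addr // W, addr % W          # (module, word within module)

print([low_order(x) for x in range(4)])   # spread across modules 0..3
print([high_order(x) for x in range(4)])  # all in module 0
```

The printed sequences show directly why low-order interleaving supports block access while high-order interleaving does not.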

Interleaved Memory Organisation IV

(Figure: the two address formats — low-order and high-order m-way memory
interleaving.)

Pipelined Memory Access I

1 Access of the m memory modules can be overlapped in a pipelined
fashion. For this purpose, the memory cycle is divided into m minor
cycles:

τ = θ/m (5)

The major cycle θ is the total time required to complete the access of
a single word from a module. The minor cycle τ is the actual time
needed to produce one word, with overlapped accesses of successive
memory modules separated by one minor cycle τ . Even though the
total block access time is 2θ, the effective access time of each word
is reduced to τ .
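A small numeric sketch of the timing relations above, with an assumed major cycle of 100 ns and m = 4. The last word of a block completes at θ + (m − 1)τ = θ(2 − 1/m), which approaches the 2θ block-access time quoted above as m grows:

```python
theta = 100e-9      # assumed major cycle: full module access time (100 ns)
m = 4               # assumed degree of interleaving
tau = theta / m     # equation (5): minor cycle

# module i starts its access at i*tau, so word i is ready at i*tau + theta
completion = [i * tau + theta for i in range(m)]

print(tau)              # effective time per word: 25 ns
print(completion[-1])   # last word of the block: theta*(2 - 1/m) = 175 ns
```

After the pipeline fills, one word emerges every τ, which is the whole point of interleaved, overlapped access.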

Pipelined Memory Access II

(Figure: timing of pipelined access to consecutive words in an interleaved
memory.)

Memory Bandwidth I

1 Multiway interleaving increases effective memory bandwidth. A single
memory module is assumed to deliver one word per cycle and thus
has a bandwidth of 1.
2 The memory bandwidth B of an m-way interleaved memory is
upper-bounded by m and lower-bounded by 1.
3 The Hellerman estimate of B is

B = m^0.56 ≈ √m (6)

where m is the number of interleaved memory modules.
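Hellerman's estimate as a one-liner, showing that m^0.56 grows roughly like √m and stays within the bounds 1 and m:

```python
def hellerman_bandwidth(m):
    """Equation (6): estimated bandwidth of m-way interleaved memory."""
    return m ** 0.56

for m in (4, 16, 64):
    # compare the estimate against sqrt(m)
    print(m, round(hellerman_bandwidth(m), 2), round(m ** 0.5, 2))
```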


Fault Tolerance I

1 High and low-order interleaving can be combined to yield many


different interleaved memory organisations.
2 Sequential addresses are assigned in the high-order interleaved
memory in each memory module.
3 This makes it easier to isolate faulty memory modules. When one
module failure is detected, the remaining module can still be used by
opening a window in the address space.
4 This fault isolation cannot be carried out in a low-order interleaved
memory, in which a module failure may paralyze the entire memory
bank. Thus low-order interleaving memory is not fault tolerant.

Backplane Bus Systems I

The system bus of a computer system operates on a contention basis.


Several active devices such as processors may request use of the bus at the
same time. However, only one of them can be granted access at a time.
The effective bandwidth available to each processor is inversely
proportional to the number of processors contending for the bus. For this
reason, most bus-based commercial multiprocessors are small in size
(4 to 16 processors).

Backplane Bus Specifications I

A backplane bus interconnects processors, data storage and peripheral


devices in a tightly coupled hardware configuration.
The system bus must be designed to allow communication between
devices on the bus without disturbing the internal activities of all the
devices attached to the bus.
Timing protocols must be established to arbitrate between multiple
requests.
Operational rules must be sent to ensure orderly data transfers on the
bus.
Signal lines on the backplane are often functionally grouped into
several buses.
Various functional boards are plugged into slots on the backplane.

Backplane Bus Specifications II

Data Transfer Bus (DTB):
Data, address and control lines form the data transfer bus.
The address lines are used to broadcast the data and device
address.
The number of address lines is proportional to the logarithm of the
size of the address space.
The number of data lines is proportional to the memory word length.
Bus Arbitration and Control:
The process of assigning control of the DTB to a requester is called
arbitration.
Dedicated lines are reserved to coordinate the arbitration process
among several requesters. The requester is called a master and the
receiving end is called a slave.
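The two proportionality claims above can be made concrete with a small sketch (the address-space and word sizes are assumed examples, not from the slides):

```python
import math

# Sketch: sizing DTB signal lines from the address space and word length.

def address_lines(address_space_size):
    # One line per address bit: ceil(log2 of the number of addressable units).
    return math.ceil(math.log2(address_space_size))

def data_lines(word_length_bits):
    # One line per bit of the memory word (parallel transfer assumed).
    return word_length_bits

assert address_lines(2**32) == 32        # a 4 GiB address space needs 32 lines
assert address_lines(3 * 2**30) == 32    # a non-power-of-two space rounds up
assert data_lines(64) == 64              # a 64-bit word needs 64 data lines
```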

Backplane Bus Specifications III

Interrupt lines are used to handle interrupts, which are often
prioritized.
Dedicated lines may be used to synchronize parallel activities among
the processor modules.
Utility lines include signals that provide periodic timing and
coordinate the power-up and power-down sequences of the system.
The backplane is made of signal lines and connectors.
Functional Modules:
A special bus controller board is used to house the backplane control
logic such as system clock driver, arbiter, bus timer and power driver.

Addressing and Timing Protocols I

There are two types of printed circuit boards connected to a bus.
Active boards like processors can act as bus masters or as slaves at
different times.
Passive boards like memory boards can act only as slaves.
The master can initiate a bus cycle, and the slaves respond to requests
by a master.
Only one master can control the bus at a time. However, one or more
slaves can respond to the master's request at the same time.


Bus Addressing

A backplane bus is driven by a digital clock with a fixed cycle time
called the bus cycle.
Not all cycles are used for data transfers. To optimize
performance, the bus should be designed to minimize the time
required for request handling, arbitration, addressing and interrupts so
that most bus cycles are used for useful data transfer operations.
Each board can be identified by a slot number. When the slot
number matches the contents of the high-order address lines, the board is
selected as a slave.
This geographical addressing allows the allocation of a logical board
address under software control, which increases application
flexibility.
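The slot-number decode can be sketched as follows (a hypothetical example assuming an 8-bit address with a 4-bit slot field; real buses differ in widths):

```python
# Sketch: geographical addressing. A board is selected as a slave when
# the high-order address bits equal its slot number.

def slave_selected(addr, slot, slot_bits, addr_bits):
    # Extract the top slot_bits of the address and compare with the slot number.
    return (addr >> (addr_bits - slot_bits)) == slot

# Board in slot 0b1010, 8-bit addresses, 4-bit slot field (assumed sizes):
assert slave_selected(0b10100000, 0b1010, 4, 8)       # high bits match: selected
assert not slave_selected(0b10110000, 0b1010, 4, 8)   # high bits differ: ignored
```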


Broadcall and Broadcast I

A broadcall is a read operation involving multiple slaves placing their
data on the bus lines. Special AND or OR operations over these data
are performed on the bus from the selected slaves. Broadcall
operations are used to detect multiple interrupt sources.
A broadcast is a write operation involving multiple slaves.
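The wired-OR combining of a broadcall can be mimicked in software (a hypothetical example, not from the slides, with three slaves each driving one interrupt bit):

```python
from functools import reduce
import operator

# Sketch: a broadcall combines the data driven by all responding slaves
# with a bus-level OR (or AND), letting the master see every interrupt
# source in a single read.

def broadcall_or(slave_data):
    # Bitwise OR over everything the selected slaves place on the bus.
    return reduce(operator.or_, slave_data, 0)

# Slaves 0 and 2 each assert their own interrupt bit; slave 1 is quiet:
assert broadcall_or([0b001, 0b000, 0b100]) == 0b101
# Two interrupt sources detected in one bus operation.
```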


Synchronous Timing I

All bus transactions take place at fixed clock edges.
The clock signals are broadcast to all potential masters and slaves.
The clock cycle time is determined by the slowest device connected to
the bus.
The master uses a data-ready pulse to initiate the transfer.
The slave uses a data-accept pulse to signal completion of the
information being transferred.
A synchronous bus is simple to control, requires less control circuitry,
and thus costs less.
It is suitable for connecting devices having relatively the same speed.
Otherwise, the slowest device will slow down the entire bus operation.


Asynchronous Timing I

Asynchronous timing is based on a handshaking or interlocking
mechanism.
No fixed clock cycle is needed.
The rising edge (1) of the data-ready signal from the master triggers
the rising edge (2) of the data-accept signal from the slave.
The second signal triggers the falling edge (3) of the data-ready signal
and the removal of data from the bus.
The third signal triggers the trailing edge (4) of the data-accept signal.
This four-edge handshaking process is repeated until all the data are
transferred.
The advantage of using an asynchronous bus lies in the freedom of
using variable-length clock signals for different-speed devices.
This does not impose any response-time restrictions on the source
and destination.
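The four-edge handshake can be sketched as an event sequence (an illustrative model of the protocol described above, not an implementation of any particular bus standard):

```python
# Sketch: the four-edge asynchronous handshake, one cycle per word.

def handshake(words):
    events = []
    for _ in words:
        events.append(("data_ready", 1))   # (1) master asserts data-ready
        events.append(("data_accept", 1))  # (2) slave asserts data-accept
        events.append(("data_ready", 0))   # (3) master drops data-ready, removes data
        events.append(("data_accept", 0))  # (4) slave drops data-accept
    return events

trace = handshake([10, 20])
assert len(trace) == 8                       # four edges per word transferred
assert trace[0] == ("data_ready", 1)         # every cycle starts with edge (1)
assert trace[3] == ("data_accept", 0)        # and ends with edge (4)
```

Because each edge waits for the previous one, fast and slow devices pace each other automatically.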

Asynchronous Timing II

It allows fast and slow devices to be connected on the same bus, and
it is less prone to noise.
It offers better application flexibility at the expense of increased
complexity and cost.


Arbitration, Transaction, and Interrupt I

The process of selecting the next bus master is called bus arbitration.
The duration of a master’s control of the bus is called bus tenure.
The arbitration process is designed to restrict tenure of the bus to one
master at a time.
Competing requests must be arbitrated on a fairness or priority basis.


Central Arbitration I

The scheme uses a central arbiter.
Potential masters are daisy-chained in a cascade.
A special signal line is used to propagate a bus-grant signal level from
the first master to the last master.
Each potential master can send a bus request. However, all requests
share the same bus-request line.
The bus-request signal triggers the rise of the bus-grant level, which in turn
raises the bus-busy level.
A fixed priority is set in the daisy chain from left to right. Only when the
devices on the left do not request the bus can a device be granted bus tenure.
When the bus transaction is complete, the bus-busy level is lowered,
which triggers the falling of the bus-grant signal and the subsequent
rising of the bus-request signal.
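The left-to-right grant propagation, including the failure mode discussed on the next slide, can be modeled with a hypothetical helper (an illustrative sketch, not from the slides):

```python
# Sketch: daisy-chained bus grant. The grant enters at device 0 and is
# absorbed by the first requesting device; a failed device breaks the
# chain for everything to its right.

def daisy_chain_grant(requests, failed=None):
    for i, requesting in enumerate(requests):
        if i == failed:
            return None      # broken grant line: no device further right can win
        if requesting:
            return i         # leftmost requester absorbs the grant
    return None

# Fixed left-to-right priority: device 1 beats device 2.
assert daisy_chain_grant([False, True, True]) == 1
# A failure at device 0 blocks every device to its right.
assert daisy_chain_grant([False, True, True], failed=0) is None
# Devices left of the failure are unaffected.
assert daisy_chain_grant([True, False, False], failed=2) == 0
```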

Central Arbitration II

The advantage of this arbitration scheme is its simplicity. Additional
devices can be added anywhere in the daisy chain by sharing the same
set of arbitration lines.
The disadvantage is a fixed-priority sequence that violates the fairness
practice.
Another drawback is slowness in propagating the bus-grant signal
along the daisy chain.
Whenever a higher-priority device fails, all the lower-priority devices
on the right of the daisy chain cannot use the bus.


Independent Requests and Grants I

Multiple bus-request and bus-grant signal lines are provided for each
potential master.
No daisy chaining is used.
The arbitration among potential masters is still carried out by a
central arbiter.
The advantage of using independent requests and grants in bus
arbitration is their flexibility and faster arbitration time compared with
the daisy-chained policy.
The drawback is the large number of arbitration lines required.
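One way to sketch such a central arbiter (the fixed priority policy shown is an assumption; a real arbiter could equally use round-robin or another fairness scheme):

```python
# Sketch: central arbiter with an independent request line per master.
# It sees all requests at once, so no grant propagation delay is involved.

def central_arbiter(requests, priority):
    # requests[i] is True if master i asserts its private request line;
    # priority lists master ids from highest to lowest (assumed policy).
    for master in priority:
        if requests[master]:
            return master    # drive that master's private grant line
    return None

requests = [False, True, True]       # masters 1 and 2 are requesting
assert central_arbiter(requests, priority=[2, 1, 0]) == 2
assert central_arbiter(requests, priority=[0, 1, 2]) == 1
assert central_arbiter([False, False, False], priority=[0, 1, 2]) is None
```

Note that each master needs its own request and grant line, which is exactly the line-count drawback mentioned above.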


Distributed Arbitration I

Each potential master is equipped with its own arbiter and a unique
arbitration number.
The arbitration number is used to resolve the arbitration competition.
When two or more devices compete for the bus, the winner is the one
whose arbitration number is the largest.
Parallel contention arbitration is used to determine which
device has the highest arbitration number.
The winner seizes control of the bus.
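The bitwise wired-OR contention that picks the largest arbitration number can be sketched as follows (an illustrative model; the 8-bit number width is an assumption):

```python
# Sketch: parallel contention arbitration. All competing devices drive
# their arbitration numbers onto shared wired-OR lines; a device that
# sees a 1 on a bit where its own bit is 0 withdraws. The survivor holds
# the largest number.

def parallel_contention(numbers, width=8):
    alive = list(numbers)
    for bit in reversed(range(width)):          # resolve from MSB down
        bus = 0
        for n in alive:
            bus |= (n >> bit) & 1               # wired-OR of this bit
        if bus:
            # devices whose bit is 0 while the bus reads 1 drop out
            alive = [n for n in alive if (n >> bit) & 1]
    return alive[0] if alive else None

assert parallel_contention([5, 12, 9]) == 12    # same result as max(...)
assert parallel_contention([7]) == 7            # sole requester wins trivially
assert parallel_contention([]) is None          # no requesters, no winner
```

Because contention is resolved in parallel on shared lines, this scales better than polling each device in turn.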


Transaction Modes I

Address-only Transfer consists of an address transfer followed by no
data.
Compelled Data Transfer consists of an address transfer followed
by a block of one or more data transfers to one or more contiguous
addresses.
Packet Data Transfer consists of an address transfer followed by a
fixed-length block of data transfers.
A bus transaction consists of a request followed by a response.
Connected Transaction is used to carry out a master's request and a
slave's response in a single bus transaction.
Split Transaction splits the request and response into separate bus
transactions.


Interrupt Mechanisms I

An interrupt is a request from I/O or other devices to a processor for
service or attention.
A priority interrupt bus is used to pass the interrupt signals.
A functional module can be used as an interrupt handler.
Interrupts can also be handled by message passing using the data bus
lines on a time-sharing basis.

We hope you find these notes useful.
You can get previous year question papers at https://qp.rgpvnotes.in.

If you have any queries or you want to submit your study notes,
please write to us at [email protected]