
GYAN GANGA INSTITUTE OF TECHNOLOGY AND SCIENCES, JABALPUR

DEPARTMENT OF COMPUTER SCIENCE & ENGINEERING

LAB MANUAL

Advanced Computer Architecture

(MCSE-103)

NAME:

ENROLLMENT NUMBER:

TEACHER IN CHARGE: -

DECEMBER 2024
Index

S.No.  Experiment                                                Date  Remark  Sign
1      Study of different classification schemes
2      Study of arithmetic pipeline
3      Study of instruction pipeline
4      Study of branch handling in pipeline
5      Study of different types of hazards
6      Implement parallel algorithm for array processor (SIMD)
7      Implement search algorithm
8      Study of scheduling algorithms in distributed context
9      Study of load balancing in distributed systems
10     Study of deadlock avoidance in distributed systems
Experiment 1
Aim-Study of Different Classification Schemes.
There are four classification schemes:

 Flynn's Classification (1966)
 Feng's Classification (1972)
 Handler's Classification (1977)
 Shore's Classification (1973)

Flynn's Classification-
It is based on the multiplicity of instruction streams and data streams in a computer system. This
classification scheme was proposed by Michael J. Flynn.

Digital computers may be classified into four categories according to the multiplicity of
instruction and data streams. The essential computing process is the execution of a sequence of
instructions on a set of data. The term stream is used here to denote a sequence of items
(instructions or data) as executed or operated upon by a single processor. Instructions and data are
defined with respect to a reference machine. An instruction stream is a sequence of instructions as
executed by the machine; a data stream is a sequence of data, including input and partial or
temporary results, called for by the instruction stream.

Flynn's Organization-
 Single instruction stream-single data stream (SISD)
 Single instruction stream-multiple data stream (SIMD)
 Multiple instruction stream-single data stream (MISD)
 Multiple instruction stream-multiple data stream (MIMD)

Instructions and data are fetched from memory modules. Instructions are decoded by the control unit,
which sends the decoded instruction stream to the processor unit for execution. Data streams
flow between the processor and the memory bidirectionally. Multiple memory modules may be
used in the shared memory subsystem. Each instruction stream is generated by an independent
control unit. Multiple data streams originate from the subsystem of shared memory modules.

Single Instruction stream-Single Data stream (SISD):

In SISD computers, instructions are executed sequentially but may be overlapped in their execution
stages (pipelining). An SISD computer may have more than one functional unit in it. All the functional
units are under the supervision of one control unit.
Single Instruction stream-Multiple Data stream (SIMD):
This class corresponds to array processors. There are multiple processing elements supervised
by the same control unit. All PEs receive the same instruction broadcast from the control unit
but operate on different data sets from distinct data streams. The shared memory subsystem
may contain multiple modules.

Multiple Instruction stream-Single Data stream (MISD):

There are n processor units, each receiving distinct instructions but operating over the same data
stream and its derivatives. The results of one processor become the input of the next processor in the
macropipe. No real embodiment of an MISD computer exists.

Multiple Instruction stream-Multiple Data stream (MIMD):

Most multiprocessor and multicomputer systems can be classified in this category. An intrinsic MIMD
computer implies interactions among the n processors because all memory streams are derived
from the same data space shared by all processors. An intrinsic MIMD computer is tightly
coupled if the degree of interaction among the processors is high.
Feng's Classification-
Tse-Yun Feng has suggested the use of the degree of parallelism to classify various computer
architectures. Feng's classification is based on serial versus parallel processing. The maximum number
of binary digits that can be processed within a unit time by a computer system is called the maximum
parallelism degree. According to Feng, computer organization is categorized by word slice and bit
slice. The categories are listed below:

 Word serial & bit serial (WSBS)
 Word parallel & bit serial (WPBS)
 Word serial & bit parallel (WSBP)
 Word parallel & bit parallel (WPBP)

The horizontal axis shows word length (n) and the vertical axis shows bit-slice length (m).

Word Serial & Bit Serial (WSBS):

WSBS has been called bit-serial processing because one bit (n = m = 1) is processed at a time, a
rather slow process. This was done only in first-generation computers.

Word Parallel & Bit Serial (WPBS):

WPBS (n = 1, m > 1) has been called bit-slice processing because an m-bit slice is processed at a
time.

Word Serial & Bit Parallel (WSBP):

WSBP (n > 1, m = 1) has been called word-slice processing because one word of n bits is processed
at a time.

Word Parallel & Bit Parallel (WPBP):

WPBP (n > 1, m > 1) is known as fully parallel processing, in which an array of n × m bits is processed
at a time. It is the fastest processing mode.
Handler's Classification-
Wolfgang Handler has proposed a classification scheme for identifying the parallelism degree and
pipelining degree built into the hardware structure of a computer system. It considers
parallel-pipeline processing at three subsystem levels:

 Processor Control Unit (PCU)
 Arithmetic Logic Unit (ALU)
 Bit Level Circuit (BLC)

Each PCU corresponds to one processor or one CPU. The ALU is equivalent to the processing
element (PE). A computer system C can be characterized by a triple containing six independent
entities, as below:

T(C) = < K×K', D×D', W×W' >

Where,

o K = the number of processors (PCUs) within the computer
o D = the number of ALUs under the control of one PCU
o W = the word length of an ALU or of a PE
o W' = the number of pipeline stages in all ALUs or in a PE
o D' = the number of ALUs that can be pipelined
o K' = the number of PCUs that can be pipelined

Shore's classification-
Unlike Flynn, Shore classified computers on the basis of the organization of the constituent
elements in the computer. Six different kinds of machines were recognized:

Machine 1-
This is the conventional von Neumann architecture with the following units in single quantities:

 Control Unit (CU)
 Processing Unit (PU)
 Instruction Memory (IM)
 Data Memory (DM)
A single DM read produces all bits of any word for processing in parallel by the PU. The PU may
contain multiple functional units.

Machine 2-
Similar to Machine 1, except that the DM fetches a bit slice from all the words in the memory, and the PU is
organized to perform the operations in a bit-serial manner on all the words.
Machine 3-
This is a combination of Machines 1 and 2. It can be characterized as having a memory that is an array
of bits, with both horizontal and vertical reading and processing possible.

Machine 4-
It is obtained by replicating the PU and DM of Machine 1. An ensemble of a PU and a DM is called a
Processing Element (PE). Instructions are issued to the PEs by a single control unit.
Machine 5-
It is similar to Machine 4, with the addition of communication between processing elements, e.g.
ILLIAC IV.

Machine 6-
Machines 1 to 5 maintain a separation between the data memory and the processing unit, with some data
bus or connection unit providing the communication between them. Machine 6, in contrast, includes
the processing logic in the memory itself (a logic-in-memory array).
Experiment 2
Aim- Study of Arithmetic Pipeline.
The complex arithmetic operations like multiplication and floating point operations consume
much of the time of the ALU. These operations can also be pipelined by segmenting the
operations of the ALU and as a consequence, high speed performance may be achieved. Thus,
the pipelines used for arithmetic operations are known as arithmetic pipelines.

The technique of pipelining can be applied to various complex and slow arithmetic operations
to speed up the processing time. The pipelines used for arithmetic computations are called
arithmetic pipelines. In this section, we discuss arithmetic pipelines based on arithmetic
operations. Arithmetic pipelines are constructed for simple fixed-point and complex floating-
point arithmetic operations. These arithmetic operations are well suited to pipelining as they
can be efficiently partitioned into subtasks for the pipeline stages. For implementing
arithmetic pipelines we generally use the following two types of adders:

 Carry propagation adder (CPA):
It adds two numbers such that the carries generated in successive digits are propagated.
 Carry save adder (CSA):
It adds three numbers such that the carries generated are not propagated but are
saved in a carry vector.
Fixed Arithmetic pipelines:

We take the example of multiplication of fixed-point numbers. Two fixed-point numbers are multiplied by
the ALU using repeated add and shift operations. This sequential execution makes the multiplication a
slow process. If we look at the multiplication process carefully, we observe that it is the
process of adding multiple copies of shifted multiplicands, as shown below:
Now, we can identify the following stages for the pipeline:

 The first stage generates the partial products of the numbers, which form the six rows of
shifted multiplicands.
 In the second stage, the six numbers are given to two CSAs, merging them into four
numbers.
 In the third stage, a single CSA merges the four numbers into three numbers.
 In the fourth stage, a single CSA merges the three numbers into two numbers.
 In the fifth stage, the last two numbers are added through a CPA to get the final
product.
These stages have been implemented using a CSA tree.

Arithmetic pipeline for multiplication of two 6 digit numbers
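The CSA reduction above can be checked with a short simulation. The following is a minimal Python
sketch (an illustration, not the manual's circuit): csa() models one carry-save adder as a 3-to-2
compressor on arbitrary-precision integers, and pipelined_multiply() walks the five stages for a
6-bit multiply.

def csa(a, b, c):
    """Carry-save add: merge three numbers into a sum and a carry vector."""
    s = a ^ b ^ c                         # bitwise sum without carries
    k = ((a & b) | (a & c) | (b & c)) << 1  # saved carries, shifted into place
    return s, k

def pipelined_multiply(x, y, bits=6):
    # Stage 1: generate the shifted multiplicands (partial products).
    pp = [(x << i) if (y >> i) & 1 else 0 for i in range(bits)]
    # Stage 2: two CSAs merge six numbers into four.
    s1, c1 = csa(pp[0], pp[1], pp[2])
    s2, c2 = csa(pp[3], pp[4], pp[5])
    # Stage 3: one CSA merges four of the numbers into three.
    s3, c3 = csa(s1, c1, s2)
    # Stage 4: one CSA merges the three remaining numbers into two.
    s4, c4 = csa(s3, c3, c2)
    # Stage 5: a carry-propagate add (CPA) produces the final product.
    return s4 + c4

assert pipelined_multiply(45, 37) == 45 * 37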

Floating point Arithmetic pipelines:

Floating-point computations are among the best candidates for pipelining. Take the example of
the addition of two floating-point numbers. The following stages are identified for the addition of
two floating-point numbers:

• The first stage compares the exponents of the two numbers.
• The second stage aligns the mantissas.
• In the third stage, the mantissas are added.
• In the last stage, the result is normalized.

Arithmetic pipeline for floating point addition of 2 numbers
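The four stages can likewise be traced in software. Below is a minimal Python sketch assuming a toy
(mantissa, exponent) representation with value m * 2**e. To keep the toy lossless it aligns by
left-shifting to the smaller exponent, whereas real hardware right-shifts the smaller operand; the
representation and names here are assumptions of the sketch, not IEEE 754.

def fp_add(m1, e1, m2, e2):
    # Stage 1: compare exponents; the common exponent is the smaller one.
    e = min(e1, e2)
    # Stage 2: align mantissas (left-shift is lossless on Python ints).
    m1 <<= e1 - e
    m2 <<= e2 - e
    # Stage 3: add mantissas.
    m = m1 + m2
    # Stage 4: normalize (strip trailing zero bits into the exponent).
    while m and m % 2 == 0:
        m //= 2
        e += 1
    return m, e

# 3*2**1 + 5*2**3 = 6 + 40 = 46 = 23 * 2**1
assert fp_add(3, 1, 5, 3) == (23, 1)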


Experiment 3
Aim- Study of Instruction Pipeline.
An instruction cycle may consist of many operations: fetch the instruction, decode it, compute operand
addresses, fetch operands, and execute the instruction. These operations of the instruction
execution cycle can be realized through the pipelining concept. Each of these operations forms
one stage of a pipeline. The overlapping of execution of the operations through the pipeline
provides a speedup over the normal execution. Thus, the pipelines used for instruction cycle
operations are known as instruction pipelines.

The stream of instructions in the instruction execution cycle can be realized through a pipeline
where overlapped execution of different operations is performed. The process of executing
an instruction involves the following major steps:

• Fetch the instruction from the main memory


• Decode the instruction
• Fetch the operand
• Execute the decoded instruction
These four steps become the candidates for stages for the pipeline, which we call as instruction
pipeline.
Since, in pipelined execution, there is overlapped execution of operations, the four stages
of the instruction pipeline work in an overlapped manner. First, the instruction address is
fetched from the memory to the first stage of the pipeline. The first stage fetches the
instruction and gives its output to the second stage. While the second stage of the pipeline is
decoding the instruction, the first stage gets another input and fetches the next instruction.
When the first instruction has been decoded in the second stage, its output is fed to the
third stage. While the third stage is fetching the operand for the first instruction, the
second stage gets the second instruction and the first stage gets input for another instruction,
and so on. In this way, the pipeline executes the instructions in an overlapped manner,
increasing the throughput and speed of execution. The scenario of these overlapped operations
in the instruction pipeline can be illustrated through a space-time diagram. In Figure A, first
we show the space-time diagram for non-overlapped execution in a sequential environment
and then for the overlapped pipelined environment. It is clear from the two diagrams that in
non-overlapped execution, a result is achieved only after every 4 cycles, while in overlapped
pipelined execution, after the first 4 cycles we get an output in every cycle. Thus, in the
instruction pipeline, the effective instruction cycle has been reduced to ¼ of that of sequential
execution.

Space-time diagram for Non-pipelined Processor

Space-time diagram for Overlapped Instruction pipelined Processor

Figure A
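A short Python sketch can reproduce the overlapped space-time diagram of Figure A. The stage names
FI, DI, FO and EX are labels assumed here for the four steps listed above; instruction i occupies
stage s during cycle i + s, so after the 4-cycle fill the pipeline completes one instruction per
cycle.

STAGES = ["FI", "DI", "FO", "EX"]

def space_time(n_instructions, cycles=10):
    for s, name in enumerate(STAGES):
        row = []
        for t in range(cycles):
            i = t - s   # instruction occupying stage s during cycle t
            row.append(f"I{i+1}" if 0 <= i < n_instructions else "  ")
        print(name, *row)

space_time(6)   # prints one row per stage, one column per clock cycle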
Instruction buffers:

To take full advantage of pipelining, the pipeline should be filled continuously. Therefore, the
instruction fetch rate should be matched with the pipeline consumption rate. To do this,
instruction buffers are used. An instruction buffer is a high-speed memory in the CPU for storing
instructions. Instructions are pre-fetched into the buffer from the main memory. Another
alternative to the instruction buffer is a cache memory between the CPU and the main
memory. The advantage of cache memory is that it can be used for both instructions and data.
But a cache requires more complex control logic than an instruction buffer. Some pipelined
computers have adopted both.
Experiment 4
Aim: Study of Branch Handling in pipeline.

The performance of pipelined processors is limited by data dependences and branch instructions.
The evaluation of branching strategies can be performed either for a specific pipeline architecture,
using trace data, or by applying analytic models.

Effect of Branching:

Three basic terms are introduced below for the analysis of branching effects:
The action of fetching a non-sequential or remote instruction after a branch instruction is called
branch taken. The instruction to be executed after a branch taken is called the branch target. The
number of pipeline cycles wasted between a branch taken and the fetching of its branch target is
called the delay slot, denoted by b. In general 0 ≤ b ≤ k-1, where k is the number of pipeline
stages. When a branch taken occurs, all the instructions following the branch in the pipeline
become useless and will be drained from the pipeline.

This implies that a branch taken causes the pipeline to be flushed, losing a number of useful cycles.

These terms are illustrated in the figure, where a branch taken causes I_{b+1} through I_{b+k-1} to be
drained from the pipeline. Let p be the probability of a conditional branch instruction in a typical
instruction stream and q the probability of a successfully executed conditional branch instruction
(a branch taken). Typical values of p = 20% and q = 60% have been observed in some programs.

The penalty paid by branching is equal to pqnbτ, because each branch taken costs b extra
pipeline cycles. The total execution time of n instructions, including the effect of branching, is as
follows:

T_eff = kτ + (n-1)τ + pqnbτ

The effective pipeline throughput with the influence of branching is

H_eff = n / T_eff = nf / (k + n - 1 + pqnb)

where f = 1/τ is the clock frequency. When n → ∞ and b = k-1, the tightest upper bound on the
throughput is

H*_eff = f / (pq(k-1) + 1)

When p = q = 0 (no branching), this bound approaches the maximum throughput f = 1/τ.
Suppose p = 0.2, q = 0.6 and b = k-1 = 7. We define the following performance degradation factor:

D = (f - H*_eff) / f = pq(k-1) / (pq(k-1) + 1) = 0.84/1.84 = 0.46

The above analysis implies that performance can be degraded by 46% with branching when the
instruction stream is sufficiently long. This analysis demonstrates the degree of performance
degradation caused by branching in an instruction pipeline.
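The numbers above can be verified directly. A small Python check, assuming the textbook values
p = 0.2, q = 0.6 and b = k - 1 = 7 (an 8-stage pipeline):

p, q, k = 0.2, 0.6, 8
b = k - 1

# Degradation factor D = pq(k-1) / (pq(k-1) + 1)
D = p * q * b / (p * q * b + 1)
print(f"performance degradation = {D:.2f}")   # -> 0.46, i.e. 46%

# Effective time for n instructions: T_eff = k*tau + (n-1)*tau + p*q*n*b*tau
def T_eff(n, tau=1.0):
    return k * tau + (n - 1) * tau + p * q * n * b * tau

print(T_eff(1000))   # 1847 cycles for 1000 instructions, versus 1007 without branching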
BRANCH PREDICTION:

Branches can be predicted either statically, based on branch code types, or dynamically, based on
branch history during program execution. The probability of branching for each branch instruction
type can be used to predict branch behavior. This requires collecting the frequency and probability
of branch taken and branch types across a large number of program traces. Such a static branch
strategy may not always be accurate.

The static prediction direction (taken or not taken) is usually wired into the processor. The
wired-in static prediction cannot be changed once committed to the hardware. However, the scheme
can be modified to allow the programmer or compiler to select the direction of each branch on
a semi-static prediction basis.

A dynamic branch strategy uses recent branch history to predict whether or not the branch will
be taken the next time it occurs. To be accurate, one may need to use the entire history of the
branch to predict the future choice. This is infeasible to implement. Therefore, most dynamic
predictions are determined from recent history.

Cragon (1992) has classified dynamic branch strategies into 3 major classes:

One class predicts the branch direction based upon information found at the decode stage.

The second class uses a cache to store target addresses at the stage where the effective address of
the branch target is computed.

The third scheme uses a cache to store target instructions at the fetch stage. All dynamic
predictions are adjusted dynamically as a program is executed.

Dynamic prediction demands additional hardware to keep track of the past behavior of branch
instructions at run time. The amount of history recorded should be small; otherwise the
prediction logic becomes too costly to implement.

Lee & Smith (1984) have shown the use of a branch target buffer (BTB) to implement branch
prediction. The BTB is used to hold recent branch information, including the address of the branch
target used. The address of the branch instruction locates its entry in the BTB.
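As an illustration of these ideas, here is a minimal Python sketch of a direct-mapped BTB whose
entries carry a 2-bit saturating counter. The table size, indexing and replacement policy are
assumptions of the sketch, not a description of any particular processor.

class BTB:
    def __init__(self, entries=16):
        self.entries = entries
        self.table = {}   # index -> [tag (branch PC), target, 2-bit counter]

    def predict(self, pc):
        e = self.table.get(pc % self.entries)
        if e and e[0] == pc and e[2] >= 2:   # counter states 2,3 predict taken
            return e[1]                      # predicted branch target address
        return pc + 4                        # otherwise predict fall-through

    def update(self, pc, taken, target):
        idx = pc % self.entries
        e = self.table.setdefault(idx, [pc, target, 1])
        if e[0] != pc:                       # tag mismatch: replace the entry
            self.table[idx] = [pc, target, 1]
            return
        e[1] = target
        # saturating counter: increment on taken, decrement on not taken
        e[2] = min(3, e[2] + 1) if taken else max(0, e[2] - 1)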
Experiment 5
Aim- Study of different types of hazards.
The basic requirement of any pipelined processor is that the various stages in the pipeline
should work independently of each other. However, this may not always be the case. There may be
some hindrances which cause disturbance to the smooth flow through the pipeline; such
hindrances, in pipeline terminology, are called hazards. They are the bottlenecks of pipeline
design. These hazards prevent the smooth flow in the pipeline and degrade the performance of
the pipeline. There are 3 types of hazards:

 DATA HAZARDS
 CONTROL HAZARDS
 STRUCTURAL HAZARDS

DATA HAZARDS-
Data hazards occur when instructions that exhibit data dependence modify data in different
stages of a pipeline.

Data Dependence-

A data dependency in computer science is a situation in which a program statement
(instruction) refers to the data of a preceding statement. In compiler theory, the technique
used to discover data dependencies among statements (or instructions) is called dependence
analysis.

There are three types of dependencies: data, name, and control.

 Flow dependency

A flow dependency, also known as a data dependency, true dependency or read-after-write
(RAW), occurs when an instruction depends on the result of a previous instruction:

1. A = 3
2. B = A
3. C = B
Instruction 3 is truly dependent on instruction 2, as the final value of C depends on the
instruction updating B. Instruction 2 is truly dependent on instruction 1, as the final value of B
depends on the instruction updating A. Since instruction 3 is truly dependent upon instruction 2
and instruction 2 is truly dependent on instruction 1, instruction 3 is also truly dependent on
instruction 1.

 Anti-dependency

An anti-dependency, also known as write-after-read (WAR), occurs when an instruction requires a
value that is later updated. In the following example, instruction 2 anti-depends on instruction
3: the ordering of these instructions cannot be changed, nor can they be executed in parallel
(possibly changing the instruction ordering), as this would affect the final value of A.

1. B = 3
2. A = B + 1
3. B = 7

Anti-dependency is an example of a name dependency. That is, renaming of variables could
remove the dependency, as in the next example:

1. B = 3
N. B2 = B
2. A = B2 + 1
3. B = 7

A new variable, B2, has been declared as a copy of B in a new instruction, instruction N. The
anti-dependency between 2 and 3 has been removed, meaning that these instructions may
now be executed in parallel. However, the modification has introduced a new dependency:
instruction 2 is now truly dependent on instruction N, which is truly dependent upon
instruction 1.

 Output dependency

An output dependency, also known as write-after-write (WAW), occurs when the ordering of
instructions will affect the final output value of a variable. In the example below, there is an
output dependency between instructions 3 and 1 — changing the ordering of instructions in
this example will change the final value of A, thus these instructions cannot be executed in
parallel.

1. B = 3
2. A = B + 1
3. B = 7
As with anti-dependencies, output dependencies are name dependencies. That is, they may be
removed through renaming of variables, as in the below modification of the above example:

1. B2 = 3
2. A = B2 + 1
3. B = 7

A commonly used naming convention for data dependencies is the following: Read-after-Write
or RAW (flow dependency), Write-after-Write or WAW (output dependency), and Write-After-
Read or WAR (anti-dependency).

Input/Output Dependence-
Read and write are I/O statements. I/O dependence occurs not because the same variable
is involved but because the same file is referenced by both I/O statements.

A statement S2 is input dependent on S1 if and only if S1 and S2 read the same resource and S1
precedes S2 in execution. The following is an example of an input dependence (RAR: Read-
After-Read):
S1 y := x + 3

S2 z := x + 5

Here, S2 and S1 both access the variable x.

 Unknown Dependence-
The dependence relation between two statements cannot be determined in the following situations:

The subscript of a variable is itself subscripted (indirect addressing).

The subscript does not contain the loop index variable.

A variable appears more than once, with a subscript that is not linear in the loop index variable.

When one or more of these conditions exist, a conservative assumption is to claim unknown
dependence among the statements involved.

DATA DEPENDENCE IN PROGRAM-

Consider the following code fragment of 4 instructions:

S1: Load R1, A    /R1 <- Memory(A)/

S2: Add R2, R1    /R2 <- (R1) + (R2)/

S3: Move R1, R3   /R1 <- (R3)/

S4: Store B, R1   /Memory(B) <- (R1)/

S2 is flow-dependent on S1 because the variable A is passed via the register R1.

S3 is anti-dependent on S2 because of a potential conflict in the register content of R1.

S3 is output-dependent on S1 because they both modify the same register R1.

S2 and S4 are independent.

TYPES OF DATA HAZARDS:-


Read After Write (RAW)

(i2 tries to read a source before i1 writes to it) A read after write (RAW) data hazard refers to a
situation where an instruction refers to a result that has not yet been calculated or retrieved.
This can occur because even though an instruction is executed after a previous instruction, the
previous instruction has not been completely processed through the pipeline.

For example:

i1. R2 <- R1 + R3
i2. R4 <- R2 + R3

The first instruction is calculating a value to be saved in register R2, and the second is going to
use this value to compute a result for register R4. However, in a pipeline, when we fetch the
operands for the 2nd operation, the results from the first will not yet have been saved, and
hence we have a data dependency.

We say that there is a data dependency with instruction i2, as it is dependent on the completion
of instruction i1.

Write After Read (WAR)

(i2 tries to write a destination before it is read by i1) A write after read (WAR) data hazard
represents a problem with concurrent execution.

For example:

i1. R4 <- R1 + R5
i2. R5 <- R1 + R2

If we are in a situation that there is a chance that i2 may be completed before i1 (i.e. with
concurrent execution) we must ensure that we do not store the result of register R5 before i1
has had a chance to fetch the operands.

Write After Write (WAW)

(i2 tries to write an operand before it is written by i1) A write after write (WAW) data hazard
may occur in a concurrent execution environment.

For example:

i1. R2 <- R4 + R7
i2. R2 <- R1 + R3
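The three hazard types can be detected mechanically from the register sets each instruction reads
and writes. A minimal Python sketch follows; the dict-based instruction encoding is an assumption
of the sketch.

def hazards(i1, i2):
    """i1 issues before i2; each is a dict with 'reads' and 'writes' sets."""
    found = []
    if i1["writes"] & i2["reads"]:
        found.append("RAW")   # i2 reads a result i1 has not yet written
    if i1["reads"] & i2["writes"]:
        found.append("WAR")   # i2 overwrites a source i1 still needs
    if i1["writes"] & i2["writes"]:
        found.append("WAW")   # the final value depends on completion order
    return found

i1 = {"reads": {"R1", "R3"}, "writes": {"R2"}}   # R2 <- R1 + R3
i2 = {"reads": {"R2", "R3"}, "writes": {"R4"}}   # R4 <- R2 + R3
print(hazards(i1, i2))   # -> ['RAW'], as in the RAW example above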

Control hazards (branch hazards):-

Branching hazards (also known as control hazards) occur with branches. On many instruction
pipeline microarchitectures, the processor will not know the outcome of the branch when it
needs to insert a new instruction into the pipeline. These types of hazards arise due to control
dependence between various stages of the pipeline. Control dependence refers to the situation
where the order of execution of statements cannot be determined before run time. For example,
conditional branches will not be resolved until run time. Different paths taken after a conditional
branch may introduce or eliminate data dependences among instructions. Dependence may also exist
between operations performed in successive iterations of a looping procedure.
Control hazards often prohibit parallelism from being exploited; compiler techniques are needed
to get around control hazards in order to exploit more parallelism.

Structural hazards

A structural hazard occurs when a part of the processor's hardware is needed by two or more
instructions at the same time. A canonical example is a single memory unit that is accessed
both in the fetch stage, where an instruction is retrieved from memory, and in the memory stage,
where data is written to and/or read from memory. Structural hazards can often be resolved by
separating the component into orthogonal units (such as separate caches) or by bubbling the
pipeline.

Structural hazards arise due to resource dependence. An instruction is resource-dependent on a
previously issued instruction if it requires a hardware resource which is still being used by the
previously issued instruction. If, for instance, only a single non-pipelined division unit is
available, then in the code sequence

div r1,r2,r3
div r4,r2,r5

the second division instruction is resource-dependent on the first one and cannot be executed
in parallel. Resource dependencies are constraints caused by limited resources, such as execution
units. They can reduce the degree of parallelism that can be achieved at different stages of
execution.

Experiment 6
Aim: Implement a parallel algorithm for an array processor (SIMD).
The original motivation for developing SIMD array processors was to perform parallel
computation on vector or matrix types of data. Parallel processing algorithms have been
developed by many computer scientists for SIMD computers. Important SIMD algorithms can be
used to perform matrix multiplication, fast Fourier transform (FFT), matrix transposition,
summation of vector elements, matrix inversion, parallel sorting, linear recurrence and Boolean
matrix operations, and to solve partial differential equations.

The implementation of these parallel algorithms on SIMD machines is described by concurrent
algorithms.

SIMD MATRIX MULTIPLICATION-

Matrix multiplication is frequently needed in solving linear systems of equations. The differences
between SISD and SIMD matrix algorithms are pointed out in their program structures and speed
performance. In general, the inner loop of a multilevel SISD program can be replaced by one or
more SIMD vector instructions.

Let A = [a_ik] and B = [b_kj] be n×n matrices. The multiplication of A and B generates a product
matrix C = A×B = [C_ij] of dimension n×n. The elements of the product matrix C are related to the
elements of A and B by:

C_ij = Σ (k=1 to n) a_ik × b_kj    for 1 ≤ i ≤ n and 1 ≤ j ≤ n        (1)

There are n^3 cumulative multiplications to be performed in equation (1). In a conventional SISD
uniprocessor system, the cumulative multiplications are carried out by a serially coded program:

For i = 1 to n do
  For j = 1 to n do
    C_ij = 0 (initialization)
    For k = 1 to n do
      C_ij = C_ij + a_ik × b_kj (scalar additive multiply)
    End of k loop
  End of j loop
End of i loop
Matrix multiplication on an SIMD computer with n PEs depends heavily on the memory allocation of
the A, B and C matrices in the PEMs. We store each row vector of a matrix across the PEMs and each
column vector within the same PEM. This memory allocation scheme allows parallel access of all the
elements in a row. In the algorithm, the two parallel do operations correspond to a vector load for
initialization and a vector multiply for the inner loop of additive multiplication. The time
complexity has been reduced to O(n^2); i.e., the SIMD algorithm is n times faster than the SISD
algorithm for matrix multiplication.

An O(n^2) algorithm for SIMD matrix multiplication:

For i = 1 to n do
  Par for k = 1 to n do
    C_ik = 0 (vector load)
  For j = 1 to n do
    Par for k = 1 to n do
      C_ik = C_ik + a_ij × b_jk (vector multiply)
  End of j loop
End of i loop

It should be noted that the vector load operation is performed to initialize the row vectors of C,
one row at a time. In the vector multiply operation, the same multiplier a_ij is broadcast from the
CU to all PEs to multiply all n elements of a row vector of B.
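A behavioural NumPy sketch of this scheme is given below. It models the broadcast of a_ij and the
parallel update of a whole row of C as one vectorized step; the function name simd_matmul is ours,
and NumPy's vectorization stands in for the n PEs.

import numpy as np

def simd_matmul(A, B):
    n = A.shape[0]
    C = np.zeros((n, n))      # vector load: initialize each row of C
    for i in range(n):
        for j in range(n):
            # broadcast a[i][j] to all PEs; one parallel multiply-add
            # updates the entire row C[i] against row B[j].
            C[i] += A[i, j] * B[j]
    return C                  # n^2 vector steps instead of n^3 scalar ones

A = np.arange(9.0).reshape(3, 3)
B = np.eye(3)
assert np.allclose(simd_matmul(A, B), A @ B)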

PARALLEL SORTING ON ARRAY PROCESSORS

An SIMD algorithm is presented for sorting n^2 elements on a mesh-connected processor array in
O(n) routing and comparison steps. This shows a speedup of O(log n) over the best sorting
algorithm, which takes O(n log n) steps on a uniprocessor system. We assume an array processor
with N = n^2 identical PEs connected by a mesh network. The absence of wrap-around connections
simplifies the array sorting algorithm. Two time measures are needed to estimate the time
complexity of the parallel sorting algorithm: the routing time t_R to move one item between
neighbouring PEs, and the comparison time t_C for one comparison step.

The M(j,2) sorting algorithm:

J1: Move all odds to the left column and all evens to the right column in 2t_R time.
J2: Use odd-even transposition sort (sketched after this list) to sort each column in
2j·t_R + j·t_C time.
J3: Interchange on each row in 2t_R time.
J4: Perform one comparison-interchange step in 2t_R + t_C time.

The M(j,k) sorting algorithm:

M1: If j > 2, perform a single interchange step on the even rows so that the columns contain
either all evens or all odds; the columns are then segregated, so nothing else needs to be done
(time: 2t_R).
M2: Unshuffle each row (time: (k-2)t_R).
M3: Merge by calling algorithm M(j, k/2) on each half of the array (time: T(j, k/2)).
M4: Shuffle each row (time: (k-2)t_R).
M5: Interchange on even rows (time: 2t_R).
M6: Comparison-interchange adjacent elements (every even with the next odd) (time: 4t_R + t_C).
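The column-sorting primitive used in step J2 is odd-even transposition sort, sketched below in
Python as a sequential simulation; on a real array of n mesh-connected PEs, each round's
compare-interchanges happen in parallel, so the sort finishes in n rounds of neighbour-only
communication.

def odd_even_transposition(a):
    a = list(a)
    n = len(a)
    for step in range(n):
        start = step % 2          # alternate even and odd neighbour pairs
        for i in range(start, n - 1, 2):
            if a[i] > a[i + 1]:   # compare-interchange with the neighbour
                a[i], a[i + 1] = a[i + 1], a[i]
    return a

assert odd_even_transposition([5, 1, 4, 2, 3]) == [1, 2, 3, 4, 5]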
Experiment 7
Aim- Implement search algorithm.
Associative search algorithms:

Associative memories are basically used for the fast search and ordered retrieval of large files of
records. Many researchers have suggested using associative memories for implementing relational
database machines. Every relational database, or relation of records, can be arranged in a tabular
form, as illustrated in the example. The tabulation of records (relations) can be programmed into
the cells of an associative memory. The various associative search operations have been divided
into the following categories by T. Y. Feng (1976).

 Extreme Searches
The Maxima: The largest among a collection of records is searched for.
The Minima: The smallest among a collection of records is searched for.
The Median: The median according to a certain ordering is searched for.

 Equivalence Searches
Equal-to: An exact match is searched for under a certain equality relation.
Not-equal-to: Search for those elements not equal to the given key.
Similar-to: Search for a match within the masked field.
Proximate-to: Search for those records that satisfy a certain proximity (neighbourhood)
condition.

 Threshold Searches

Smaller-than: Search for those records that are strictly smaller than the given key.

Greater-than: Search for those records that are strictly greater than the given key.

Not-smaller-than: Search for those records that are equal to or greater than the given
key.

Not-greater-than: Search for those records that are equal to or smaller than the given
key.

 Adjacency Searches
Near-below: Search for the nearest record which is smaller than the key.
Near-above: Search for the nearest record which is greater than the key.

 Between-limits Searches

[X,Y]: Search for those records within the closed range {Z | X ≤ Z ≤ Y}
(X,Y): Search for those records within the open range {Z | X < Z < Y}
[X,Y): Search for those records within the range {Z | X ≤ Z < Y}
(X,Y]: Search for those records within the range {Z | X < Z ≤ Y}

 Ordered Retrievals

Ascending sort: List all the records in ascending order.

Descending sort: List all the records in descending order.

Listed above are primitive search operations. Of course, one can always combine a
sequence of primitive search operations with Boolean operators to form various
query conjunctions. For example, one may wish to answer queries such as equal to A but
not equal to B, the second largest from below, outside the range, etc. The Boolean
operators AND, OR and NOT can be used to form any query conjunction of predicates. A
predicate consists of one of the above relational operators plus an operand, such as the
pairs {≤, A} or {≠, A}. The above search operations are frequently used in text retrieval
operations.

Example: The minima search. This algorithm searches for the smallest number among a set of
n positive numbers stored in a bit-serial AM array. Every number has f bits, stored in a
field of a word from bit position s to bit position s+f-1.

1. Initialize: k ← 1; I_i(0) ← 1 and T_i(0) ← 0 for all i; j ← s+k-1;
   M ← (0…0 1 1 … 1 0…0), a mask with f bits of 1s.
2. Load T_i(k) = I_i(k-1) ∧ (e_i + B_ij) for all i = 1, 2, …, n.
3. Detect Q(k) = ∪ (i = 1 to n) T_i(k).
4. Reset T by applying T_i(k) = T̄_i(k) ∩ Q(k) for all i = 1, 2, …, n.
5. Increment k ← k+1, then proceed to step 2 if k ≤ f-1, or read out the word W_i
   indicated by I_i(f) = 1 if k = f.
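Behaviourally, the algorithm scans the field one bit position at a time, eliminating candidates
that hold a 1 wherever some candidate holds a 0. A minimal Python model of this behaviour (not of
the AM cell logic; the function name and integer encoding are assumptions of the sketch) follows:

def minima_search(words, f):
    candidates = set(range(len(words)))       # indicator bits I_i = 1
    for k in range(f - 1, -1, -1):            # bit positions, MSB first
        zeros = {i for i in candidates if not (words[i] >> k) & 1}
        if zeros:                             # some candidate has a 0 here:
            candidates = zeros                # drop every candidate with a 1
    return candidates                         # indices holding the minimum

words = [13, 7, 21, 7, 30]
print(minima_search(words, 5))   # -> {1, 3}: both copies of 7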
EXPERIMENT 8
Aim: Study of scheduling algorithms in a distributed context.
Distributed scheduling concentrates on global scheduling because of the architecture of the
underlying system. Casavant and Kuhl define a taxonomy of task placement algorithms for
distributed systems, as shown in the figure. The two major categories of global algorithms are
static and dynamic.

A static algorithm makes scheduling decisions based purely on information available at
compilation time. For example, the typical input to a static algorithm would include the machine
configuration, the number of tasks, and estimates of their running times.

A dynamic algorithm, on the other hand, takes factors into account such as the current load on
each processor. Adaptive algorithms are a special subclass of dynamic algorithms.

In physically non-distributed or centralized scheduling policies, a single processor makes all
decisions regarding task placement; this has obvious implications for the autonomy of the
participating systems.

Under physically distributed algorithms, the logical authority for the decision-making process is
distributed among the processors that constitute the system.

Under non-cooperative distributed scheduling policies, individual processors make scheduling
choices independent of the choices made by other processors. With cooperative scheduling, the
processors subordinate local autonomy to the achievement of a common goal. Both static and
cooperative distributed scheduling have optimal and sub-optimal branches.

Optimal assignments can be reached if complete information describing the system and the
task force is available.

Sub-optimal algorithms are either approximate or heuristic. Heuristic algorithms use guiding
principles, such as assigning tasks with heavy inter-task communication to the same processor, or
placing large jobs first.

Approximate solutions use the same computational methods as optimal solutions but accept
solutions that are within an acceptable range, according to an algorithm-dependent metric.

Approximate and optimal algorithms employ techniques based on the following computational
approaches:

Enumeration of all possible solutions


Graph theory

Mathematical programming

Queuing theory

A Taxonomy of Distributed Scheduling Algorithms


EXPERIMENT 9
Aim: Study of load balancing in distributed systems.
Load balancing is the way of distributing load units (jobs or tasks) across a set of processors
which are connected by a network that may be distributed across the globe. The excess load, or
remaining unexecuted load, of a processor is migrated to other processors which have load below
the threshold load. The threshold load is the amount of load on a processor beyond which no
further load should be assigned to that processor. In a system with multiple nodes there is a very
high chance that some nodes will be idle while others are overloaded. The processors in the system
can thus be classified according to their present load as heavily loaded processors (enough jobs
are waiting for execution) and lightly loaded processors (no jobs waiting for execution). By a load
balancing strategy it is possible to make every processor equally busy and to finish the work
approximately at the same time.

A load balancing operation consist of 3 rules:

 Location rule
 Distribution rule
 Selection rule

The selection rule works either in a pre-emptive or in a non-pre-emptive fashion. A newly
generated process is always picked up by the non-pre-emptive rule, while a running process may be
picked up by the pre-emptive rule. Pre-emptive transfer is costlier than non-pre-emptive transfer,
which is therefore generally preferable. However, pre-emptive transfer outperforms
non-pre-emptive transfer in some instances.

Practically, load balancing decisions are taken jointly by the location and distribution rules. The
balancing domains are of 2 types: local and global. In the local domain the balancing decision is
taken within a group of nearest neighbours by exchanging local workload information, while in the
global domain the balancing decision is taken by triggering transfers between partners across the
whole system, exchanging workload information globally.

Benefits of Load balancing:

 Load balancing improves the performance of each node and hence the overall system
performance.
 Load balancing reduces job idle time.
 Small job do not suffer from long starvation.
 Maximum utilization of resources.
 Response time become shorter.
 Higher throughput.
 Higher reliability
 Low cost but high gain
 Extensibility and incremental growth

For the above benefits load balancing strategy becomes a field of intensive research.

The selection of load balancing depends on application parameter like balancing quality, load
generation, pattern and also between parameter like communication overhead. Generally load
balancing algorithms are of 2 types:

 Static Load Balancing: In a static algorithm the jobs are assigned to the processors at
compile time, according to the performance of the nodes. Once the jobs are assigned, no
change or re-assignment is possible at run time. The number of jobs at each node is fixed
in a static load balancing algorithm. Static algorithms do not collect any information
about the nodes. The assignment of jobs to the processing nodes is done on the basis of
the following factors: incoming time, extent of resources needed, mean execution time,
and inter-process communication requirements. Static load balancing algorithms are also
called probabilistic algorithms. Static load balancing algorithms are divided into 2
subclasses:
 Optimal static load balancing: If all the information and resources related to a system
are known, optimal static load balancing can be done. It is possible to increase the
throughput of a system and to maximize the use of resources by an optimal load
balancing algorithm.
 Sub-optimal static load balancing: A sub-optimal load balancing algorithm will be
necessary for some applications when an optimal solution cannot be found.
 Dynamic Load Balancing:
In static load balancing, too much information about the system and jobs must be known
before execution. This information may not be available in advance, so dynamic load
balancing (DLB) algorithms come into existence. The assignment of jobs is done at run
time. In DLB, jobs are re-assigned at run time depending upon the situation; that is, the
load is transferred from heavily loaded nodes to lightly loaded nodes. In this case
communication overheads occur and grow as the number of processors increases. In DLB no
decision is taken until the process begins execution. The strategy collects information
about the system state and about the jobs, as sketched below.
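As a concrete illustration, here is a minimal Python sketch of threshold-based dynamic balancing;
the threshold value, the lightest-node location rule and the newest-job selection rule are
assumptions of the sketch, not a specific published policy.

THRESHOLD = 4   # assumed per-node queue-length threshold

def balance(queues):
    """queues: list of per-node job queues (lists of job ids)."""
    for src in queues:
        while len(src) > THRESHOLD:            # node is heavily loaded
            dst = min(queues, key=len)         # location rule: lightest node
            if len(dst) >= len(src) - 1:       # no useful transfer partner
                break
            dst.append(src.pop())              # selection rule: migrate newest job
    return queues

nodes = [[1, 2, 3, 4, 5, 6, 7], [], [8]]
print(balance(nodes))   # excess jobs flow from the overloaded node to idle ones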
EXPERIMENT 10
Aim- Study of deadlock avoidance in distributed systems.

Deadlock-
Deadlock is a condition in a system where a process cannot proceed because it needs to obtain a
resource held by another process while it itself is holding a resource that the other process
needs. In a system of processes which communicate only with a single central agent, deadlock can
be detected easily because the central agent has complete information about every process.
Deadlock is a fundamental problem in distributed systems.

Types of deadlock-

 Communication Deadlock-
It occurs when process A is trying to send a message to process B, which is trying to send
a message to process C, which is trying to send a message to process A.
 Resource Deadlock-
It occurs when processes are trying to get exclusive access to devices, files, locks, servers
or other resources.

Conditions for Deadlock

Four conditions have to be met for a deadlock to occur in a system:

 Mutual Exclusion- A resource can be held by at most one process.
 Hold and Wait- A process that already holds a resource can wait for another
resource.
 Non-Preemption- A resource, once granted, cannot be taken away.
 Circular Wait- Two or more processes are each waiting for a resource held by one of
the other processes.

 Handling Deadlock in Distributed Systems-

Deadlock handling in a distributed system is similar to that in a centralized system. Several
strategies can be used to handle deadlock.
 Ignore-
In this approach we assume that deadlock will never occur. This is used when the time
intervals between occurrences of deadlock are large and the data loss incurred each time
is tolerable.
 Detection-
In this approach deadlock is allowed to occur; the state of the system is examined to
detect that a deadlock has occurred, and subsequently it is corrected. After a deadlock is
detected, it can be corrected by the following methods.

STATES OF A PROCESS & THEIR STATE TRANSITIONS

Transition events:
1. activate process
2. run process
3. preempt process
4. block process
5. wake up process
6. terminate process

P1 -> R1 means that resource R1 is allocated to process P1.

P1 <- R1 means that resource R1 is requested by process P1.

 Process Termination-
One or more processes involved in the deadlock may be aborted.
 Resource Preemption-
Resources allocated to various processes may be successively preempted and allocated to
other processes until the deadlock is broken.
 Prevention-
Deadlock prevention works by preventing one of the four Coffman conditions from
occurring.
o Removing the mutual exclusion condition means that no process will have exclusive
access to a resource.
o The hold-and-wait or resource-holding condition may be removed by requiring
processes to request all the resources they will need before starting up.
o The no-preemption condition may also be difficult or impossible to avoid.
o Approaches that avoid circular wait include disabling interrupts during critical
sections and imposing an ordering on resource requests.

 Avoidance- Deadlock can be avoided if certain information about processes is available
to the OS before the allocation of resources, such as which resources a process will
consume in its lifetime. For every resource request, the system checks whether granting
the request would mean that the system enters an unsafe state, meaning a state that could
result in deadlock. The system then only grants requests that lead to safe states. In
order for the system to be able to determine whether the next state will be safe or
unsafe, it must know in advance, at any time:

o the resources currently available;
o the resources currently allocated to each process;
o the resources that will be required and released by these processes in the future.

One known algorithm used for deadlock avoidance is the banker's algorithm, which requires resource
usage limits to be known in advance. However, for many systems it is impossible to know in advance
what every process will require. This means that deadlock avoidance is often impossible.
Two other algorithms are wait/die and wound/wait, each of which uses a symmetry-breaking
technique.
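The safety check at the heart of the banker's algorithm can be sketched briefly. The following
Python model (function and variable names are ours) declares a state safe only if some ordering
lets every process finish with the resources that would remain available:

def is_safe(available, allocation, maximum):
    # need[i] = maximum claim of process i minus what it already holds
    need = [[m - a for m, a in zip(mrow, arow)]
            for mrow, arow in zip(maximum, allocation)]
    work = list(available)
    finished = [False] * len(allocation)
    progress = True
    while progress:
        progress = False
        for i, done in enumerate(finished):
            if not done and all(n <= w for n, w in zip(need[i], work)):
                # process i can run to completion and release its holdings
                work = [w + a for w, a in zip(work, allocation[i])]
                finished[i] = True
                progress = True
    return all(finished)

# one resource type: 3 units free; two processes holding 2 and 1,
# with maximum claims of 5 and 3 respectively
print(is_safe([3], [[2], [1]], [[5], [3]]))   # -> True (safe state)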

[Figure: SISD computer organization. Legend: CU = control unit; PU = processor unit;
MM = memory module; SM = shared memory; IS = instruction stream; DS = data stream.]
