Computer Organization UNIT5
Topics covered in this unit:
RISC and CISC
Parallel Processing and Pipelining
Vector Processing
Multiprocessors and Interconnection Structures (time-shared bus, multiport memory, crossbar switch, multistage switching network, hypercube)
Cache Coherence
Inter-Process Communication and Synchronization
Computer Organization | RISC and CISC
Reduced instruction set architecture (RISC)
The main idea is to make the hardware simpler by using an instruction set composed of a few basic steps for loading, evaluating, and storing operations: a load command loads data, and a store command stores data.
Complex instruction set architecture (CISC)
The main idea is that a single instruction performs all loading, evaluating, and storing operations, just as a multiplication command loads the data, evaluates it, and stores the result; hence it is complex.
CISC: the CISC approach attempts to minimize the number of instructions per program, but at the cost of an increase in the number of cycles per instruction.
Characteristics of RISC:
Simpler instructions, hence simple instruction decoding.
Characteristics of CISC:
An instruction may take more than a single clock cycle to get executed.
A smaller number of general-purpose registers, as operations get performed in memory itself.
Advantages of RISC:
Simpler instructions: RISC processors use a smaller set of simple instructions, which makes them easier to decode and execute quickly. This results in faster processing times.
Faster execution: Because RISC processors have a simpler instruction set, they can
execute instructions faster than CISC processors.
Lower power consumption: RISC processors consume less power than CISC processors,
making them ideal for portable devices.
Disadvantages of RISC:
More instructions required: RISC processors require more instructions to perform complex
tasks than CISC processors.
Increased memory usage: RISC processors require more memory to store the additional
instructions needed to perform complex tasks.
Higher cost: Developing and manufacturing RISC processors can be more expensive than
CISC processors.
Advantages of CISC:
Reduced code size: CISC processors use complex instructions that can perform multiple
operations, reducing the amount of code needed to perform a task.
More memory efficient: Because CISC instructions are more complex, they require fewer
instructions to perform complex tasks, which can result in more memory-efficient code.
Widely used: CISC processors have been in use for a longer time than RISC processors, so
they have a larger user base and more available software.
Disadvantages of CISC:
Slower execution: CISC processors take longer to execute instructions because they have
more complex instructions and need more time to decode them.
More complex design: CISC processors have more complex instruction sets, which makes
them more difficult to design and manufacture.
Higher power consumption: CISC processors consume more power than RISC processors
because of their more complex instruction sets.
What is Parallel Processing ?
The term 'parallel processing' denotes a large class of techniques that perform simultaneous data-processing operations in order to increase the computational speed of a computer system. A parallel processing system is capable of concurrent data processing to achieve faster execution times.
As an example, the next instruction can be read from memory while the current instruction is being executed in the ALU. The system can have two or more ALUs and be able to execute two or more instructions at the same time. Using two or more processors likewise increases the processing capacity of the computer, but with parallel processing the cost of the system also increases. However, technological development has reduced hardware costs to the point where parallel processing methods are economically feasible.
Parallel processing can occur at multiple levels of complexity.
At the lowest level, parallel and serial operations are distinguished by the type of registers used: shift registers work one bit at a time in a serial fashion, while parallel registers work with all bits of the word simultaneously.
At higher levels of complexity, parallel processing derives from having a plurality of functional units that perform identical or different operations simultaneously.
Parallel processing is established by distributing data among several functional units.
As an example, arithmetic, shift, and logic operations can be divided among three units, and the operands are routed to each unit under the supervision of a control unit.
One possible method of dividing the execution unit into eight functional units operating in parallel is shown in figure.
Depending on the operation specified by the instruction, operands in the registers are transferred to one of the units,
associated with the operands. In each functional unit, the operation performed is denoted in each block of the diagram.
The arithmetic operations with integer numbers are performed by the adder and integer multiplier.
Floating-point operations can be divided into three circuits operating in parallel. Logic, shift, and increment
operations are performed concurrently on different data. All units are independent of each other, so one
number can be shifted while another number is being incremented. Generally, a multi-functional organization is
associated with a complex control unit that coordinates all the activities among the several components.
The main advantage of parallel processing is that it provides better utilization of system resources by increasing
resource multiplicity, which improves overall system throughput.
Pipelining : Type 1
To improve the performance of a CPU we have two options: 1) Improve the hardware by introducing faster circuits. 2) Arrange the
hardware such that more than one operation can be performed at the same time. Since there is a limit on the speed of hardware and
the cost of faster circuits is quite high, we have to adopt the second option.
Pipelining is a process of arrangement of hardware elements of the CPU such that its overall performance is increased.
Simultaneous execution of more than one instruction takes place in a pipelined processor.
Space-time diagram for a 4-segment pipeline executing instructions I1 and I2:
Cycle:  1    2    3    4    5
S1      I1   I2
S2           I1   I2
S3                I1   I2
S4                     I1   I2
ETpipeline = k + n – 1 cycles = (k + n – 1) Tp
In the same case, for a non-pipelined processor, the execution time of ‘n’ instructions will be:
ETnon-pipeline = n * k * Tp
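These two formulas can be sanity-checked with a short sketch (the stage count, instruction count, and cycle time below are illustrative values, not from the text):

```python
# Sketch: pipeline vs. non-pipeline execution time, using the formulas
# above (k = number of stages, n = instructions, Tp = cycle time).

def et_pipeline(k, n, tp):
    """Pipelined execution time: (k + n - 1) cycles of Tp each."""
    return (k + n - 1) * tp

def et_non_pipeline(k, n, tp):
    """Non-pipelined execution time: each instruction takes k cycles."""
    return n * k * tp

def speedup(k, n):
    """Speedup = (n * k) / (k + n - 1); approaches k as n grows."""
    return (n * k) / (k + n - 1)

# Example: 4-stage pipeline, 100 instructions, 10 ns cycle time.
print(et_pipeline(4, 100, 10))      # 1030 ns
print(et_non_pipeline(4, 100, 10))  # 4000 ns
print(round(speedup(4, 100), 2))    # 3.88, close to the ideal k = 4
```

Note how the speedup tends toward k (the number of stages) as n becomes large, which is why deep pipelines pay off only for long instruction streams.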
Arithmetic Pipeline and Instruction Pipeline
1. Arithmetic Pipeline :
An arithmetic pipeline divides an arithmetic problem into various sub problems for execution in various pipeline segments. It is used
for floating point operations, multiplication and various other computations. The process or flowchart arithmetic pipeline for floating
point addition is shown in the diagram.
Floating point addition using arithmetic pipeline :
2. Instruction Pipeline :
In an instruction pipeline, a stream of instructions is executed by overlapping the fetch, decode, and execute phases of the instruction
cycle. This technique is used to increase the throughput of the computer system. An instruction pipeline reads the next instruction from
memory while previous instructions are being executed in other segments of the pipeline, so multiple instructions can be executed
simultaneously. The pipeline is more efficient if the instruction cycle is divided into segments of equal duration.
In the most general case, the computer needs to process each instruction in the following sequence of steps:
Fetch the instruction from memory (FI)
Decode the instruction (DA)
Calculate the effective address
Fetch the operands from memory (FO)
Execute the instruction (EX)
Store the result in the proper place
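The overlap of these steps can be sketched as a small simulation; the four-segment FI/DA/FO/EX division follows the list above, while the helper names are my own:

```python
# Sketch: building the space-time table of an ideal 4-segment instruction
# pipeline (FI, DA, FO, EX). Instruction i enters segment s in clock
# cycle i + s (1-indexed).

STAGES = ["FI", "DA", "FO", "EX"]

def schedule(n):
    """Return {cycle: {stage: instruction}} for n instructions."""
    table = {}
    for i in range(n):
        for s, stage in enumerate(STAGES):
            table.setdefault(i + s + 1, {})[stage] = f"I{i + 1}"
    return table

table = schedule(3)
# In cycle 3 the pipeline overlaps three instructions:
print(table[3]["FI"])  # I3 is being fetched
print(table[3]["DA"])  # I2 is being decoded
print(table[3]["FO"])  # I1 is fetching its operands
print(max(table))      # 6 cycles total = k + n - 1 = 4 + 3 - 1
```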
Vector processing
Depending on where the operands are retrieved, pipelined vector computers are classified into two architectural
configurations:
1.Memory to memory architecture –
In memory to memory architecture, source operands, intermediate and final results are retrieved (read) directly from the main
memory. For memory to memory vector instructions, the information of the base address, the offset, the increment, and the vector
length must be specified in order to enable streams of data transfers between the main memory and pipelines. The processors like TI-
ASC, CDC STAR-100, and Cyber-205 have vector instructions in memory to memory formats. The main points about memory to
memory architecture are:
1. There is no limitation of size
2. Speed is comparatively slow in this architecture
2.Register to register architecture –
In register to register architecture, operands and results are retrieved indirectly from the main memory through the use of a large
number of vector registers or scalar registers. Processors like the Cray-1 and the Fujitsu VP-200 use vector instructions in register
to register formats. The main points about register to register architecture are:
1. Register to register architecture has limited size.
2. Speed is very high as compared to the memory to memory architecture.
3. The hardware cost is high in this architecture.
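As a rough illustration of the "limited size" point, a register-to-register machine must process a long vector in register-length strips; the sketch below assumes a hypothetical 64-element vector register (the length the Cray-1 actually used):

```python
# Sketch (hypothetical register length): in a register-to-register vector
# machine the vector registers hold a fixed number of elements, so a long
# vector is processed in register-sized chunks ("strip mining").

VLEN = 64  # vector register length (machine dependent)

def vector_add(a, b):
    """Add two equal-length vectors one register load at a time."""
    result = []
    for start in range(0, len(a), VLEN):
        # load a strip of each operand into the 'vector registers'
        va = a[start:start + VLEN]
        vb = b[start:start + VLEN]
        # one vector instruction adds the whole strip
        result.extend(x + y for x, y in zip(va, vb))
    return result

# A 100-element add takes two strips: one of 64 elements, one of 36.
print(vector_add([1] * 100, [2] * 100) == [3] * 100)  # True
```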
These are two types of Array Processors: Attached Array Processor, and SIMD Array Processor.
1. Attached Array Processor :
An auxiliary processor is attached to a general-purpose host computer to improve its performance in numerical computational tasks.
Attached array processor has two interfaces:
One interface connects with the host computer through an I/O controller, so the computer treats the array
processor as an external interface; the other connects with a local memory, which in turn interconnects with the
main memory. The host computer is a general-purpose computer, and the attached processor is a back-end
machine driven by the host computer.
2. SIMD array processor : This is a computer with multiple processing units operating in parallel. Both types of array processors
manipulate vectors, but their internal organization is different.
The processing units are synchronized to perform the same operation under the control of a common control unit, thus providing a single instruction stream, multiple
data stream (SIMD) organization. As shown in the figure, a SIMD array processor contains a set of identical processing elements (PEs), each having a local memory M.
Processing Element
Each PE includes
ALU
Floating point arithmetic unit
Working Register
• The master control unit controls the operation in the PEs. Its function is to decode the instruction and determine how the instruction
is to be executed. If the instruction is a scalar or program control instruction, it is executed directly within the master control unit.
• The main memory is used for storage of the program, while each PE uses operands stored in its local memory.
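A minimal sketch of this lockstep behavior, with hypothetical PE and instruction names:

```python
# Sketch: SIMD organization. A master control unit broadcasts one decoded
# instruction; every processing element (PE) applies it to operands in
# its own local memory.

class PE:
    def __init__(self, local_memory):
        self.mem = dict(local_memory)   # each PE has a local memory M

    def execute(self, op, dst, src1, src2):
        if op == "ADD":
            self.mem[dst] = self.mem[src1] + self.mem[src2]

# Four PEs, each holding a different element of vector A and B = 10.
pes = [PE({"A": i, "B": 10}) for i in range(4)]

# Master control unit: decode once, then broadcast to all PEs in lockstep.
for pe in pes:
    pe.execute("ADD", "C", "A", "B")

print([pe.mem["C"] for pe in pes])  # [10, 11, 12, 13]
```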
Multiprocessor:
In shared memory multiprocessors, all the CPUs share the common memory, but in a
distributed memory multiprocessor, every CPU has its own private memory.
Applications of Multiprocessor
Enhanced performance.
Multiple applications.
Multi-tasking inside an application.
High throughput and responsiveness.
Hardware sharing among CPUs.
Interconnection Structures
The processors must be able to share a set of main memory modules & I/O devices in a
multiprocessor system. This sharing capability can be provided through interconnection
structures.
The interconnection structures that are commonly used are: time-shared common bus, multiport memory, crossbar switch, multistage switching network, and hypercube system.
1. Time-shared Common Bus
In a multiprocessor system, the time-shared bus interconnection provides a common communication path connecting all the functional units, such as processors, I/O processors,
and memory units. The figure below shows multiple processors with a common communication path (single bus).
To communicate with any functional unit, a processor needs the bus to transfer the
data. To do so, the processor first checks whether the bus is available by examining
its status: busy if the bus is being used by some other functional unit, free otherwise.
A processor can use the bus only when the bus is free. The sender processor puts the
address of the destination on the bus & the destination unit identifies it. In order to
communicate with any functional unit, a command is issued to tell that unit what
work is to be done. The other processors at that time will either be busy with internal
operations or sit idle, waiting for the bus.
We can use a bus controller to resolve conflicts, if any.
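The busy/free bus check can be modeled with a lock standing in for the single shared bus; the names here are illustrative:

```python
# Sketch: time-shared bus arbitration modeled with a lock. A processor
# must find the bus free before transferring; only one transfer proceeds
# at a time, so transfers are serialized.

import threading

bus = threading.Lock()        # busy/free status of the single shared bus
log = []                      # completed transfers, in the order granted

def transfer(processor, destination, data):
    with bus:                 # wait until the bus is free, then seize it
        log.append((processor, destination, data))

threads = [threading.Thread(target=transfer, args=(f"P{i}", "M0", i))
           for i in range(4)]
for t in threads: t.start()
for t in threads: t.join()

print(len(log))  # 4 -> all four processors eventually got the bus
```

This also mirrors the main disadvantage listed below: however many processors exist, only one transfer at a time crosses the bus.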
Advantages
Inexpensive as no extra hardware is required such as switch.
Simple & easy to configure as the functional units are directly connected to the bus.
Disadvantages
The major drawback of this kind of configuration is that if a malfunction occurs in any of the bus interface circuits, the complete
system will fail.
Decreased throughput
At a time, only one processor can communicate with any other functional unit.
2. Crossbar Switch
The crossbar switch organization consists of a number of crosspoints placed at the intersections between processor buses and memory module
paths. The diagram shows a crossbar switch interconnection between four CPUs and four memory modules.
The small square in each crosspoint is a switch that determines the path from a processor to a memory module. Each switch point has
control logic to set up the transfer path between a processor and memory.
It examines the address placed on the bus to determine whether its particular module is being addressed. It also resolves multiple
requests for access to the same memory module on a fixed priority basis.
3. Multiport Memory
A multiport memory system employs separate buses between each memory module and each CPU. A processor bus comprises the address, data, and
control lines necessary to communicate with memory. Each memory module connects to each processor bus. At any given time, the memory module
must have internal control logic to determine which port will have access to memory.
Each memory module can be said to have four ports, and each port accommodates one of the buses. Memory access conflicts are resolved by assigning
fixed priorities to each memory port: the priority for memory access associated with each processor is established by the physical port position that its
bus occupies in each module. Thus CPU 1 has priority over CPU 2, CPU 2 has priority over CPU 3, and CPU 4 has the lowest
priority.
Advantage:-
High transfer rate can be achieved because of multiple paths
Disadvantage:-
It requires expensive memory control logic and a large number of cables and
connectors.
It is only good for systems with a small number of processors.
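The fixed-priority rule at a single memory module can be sketched in a few lines; the `grant` helper is hypothetical:

```python
# Sketch: fixed-priority conflict resolution at one multiport memory
# module. The port number encodes priority: port 1 (CPU 1) beats port 2,
# and so on, matching the physical-port-position rule described above.

def grant(requests):
    """Given the set of requesting port numbers, grant the lowest one."""
    return min(requests) if requests else None

print(grant({2, 4}))     # 2    -> CPU 2 wins over CPU 4
print(grant({1, 2, 3}))  # 1    -> CPU 1 always has the highest priority
print(grant(set()))      # None -> no request this cycle
```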
4. Multistage Switching Network
The 2×2 crossbar switch is the building block of the multistage network. It has 2 inputs (A & B) and 2 outputs (0 & 1). To establish the connection between the input
& output terminals, the control inputs CA & CB are used.
2 * 2 Crossbar Switch
We can construct a multistage network using 2×2 switches, in order to control the communication between a number of sources & destinations.
A binary tree of crossbar switches accomplishes the connections from an input to one of the 8 possible destinations.
In the above diagram, PA & PB are 2 processors connected to 8 memory modules, numbered in binary from 000 (0) to 111 (7), through switches. There are three levels from
a source to a destination, and one bit of the destination number selects the output at each level: the 1st bit determines the switch output in the
1st level, the 2nd bit in the 2nd level, & the 3rd bit in the 3rd level.
Example: If the source is PB & the destination is memory module 011 (as in the figure): a path is formed from PB to output 0
in the 1st level, output 1 in the 2nd level & output 1 in the 3rd level.
Usually, the processor acts as the source and the memory unit acts as a destination in a tightly coupled system. The
destination is a memory module. But, processing units act as both, the source and the destination in a loosely coupled
system.
Many patterns can be made using 2×2 switches such as Omega networks, Butterfly Network, etc.
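The bit-per-level routing rule can be sketched as follows (the `route` helper name is my own):

```python
# Sketch: routing through a 3-level network of 2x2 switches. Each bit of
# the 3-bit destination address selects output 0 or 1 at the
# corresponding level, exactly as in the PB -> module 011 example above.

def route(destination):
    """Return the switch output taken at each level for a destination 0-7."""
    bits = format(destination, "03b")   # e.g. 3 -> '011'
    return [int(b) for b in bits]       # outputs at levels 1, 2, 3

print(route(0b011))  # [0, 1, 1]: output 0, then 1, then 1 (module 3)
print(route(0b111))  # [1, 1, 1]: always the lower output (module 7)
```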
Conclusion :
The interconnection structure can decide the overall system's performance in a multiprocessor environment. The multistage
switching network was introduced to overcome the disadvantage of the common bus system (only one path is available) and to
reduce the complexity of other interconnection structures (a crossbar has complexity O(n²)). It uses smaller switches,
i.e., 2×2 switches, to reduce the complexity. Routing algorithms can be used to set the switches. Its complexity and cost are
less than those of the crossbar interconnection network.
5. Hypercube System
The hypercube (or binary n-cube) multiprocessor structure represents a loosely coupled system made up of N = 2^n processors interconnected in an n-dimensional binary cube.
Each processor forms a node of the cube; in effect, each node contains not only a CPU but also local memory and an I/O interface. Each processor has direct communication
paths to n neighbor processors. These paths correspond to the edges of the cube.
There are 2^n distinct n-bit binary addresses that can be assigned to the processors. Each processor's address differs from that of each of its n neighbors in exactly one bit position.
Hypercube structure for n= 1, 2 and 3.
A one-cube structure has n = 1 and 2^1 = 2.
It has two processors interconnected by a single path.
A two-cube structure has n = 2 and 2^2 = 4.
It has four nodes interconnected as a square.
An n-cube structure has 2^n nodes with a processor residing in each node.
Each node is assigned a binary address in such a manner, that the addresses of two neighbors differ in exactly one bit position. For example, the three neighbors of the node with
address 100 are 000, 110, and 101 in a three-cube structure. Each of these binary numbers differs from address 100 by one bit value.
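The one-bit-difference property means a node's neighbors can be computed by XOR-ing its address with each single-bit mask; a brief sketch:

```python
# Sketch: in an n-cube, the neighbors of a node are found by flipping
# each of the n address bits in turn (XOR with a one-bit mask).

def neighbors(node, n):
    """Addresses of the n neighbors of 'node' in an n-cube."""
    return [node ^ (1 << bit) for bit in range(n)]

# Three-cube example from the text: neighbors of node 100 (binary).
result = sorted(format(x, "03b") for x in neighbors(0b100, 3))
print(result)  # ['000', '101', '110'], matching the text
```

The same XOR trick also gives a routing rule: to travel between two nodes, flip the differing address bits one at a time, each flip crossing one cube edge.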
Interprocess communication and synchronization.
Interprocess communication is the mechanism provided by the operating system that allows processes to communicate with each other. This
communication could involve a process letting another process know that some event has occurred or the transferring of data from one process
to another.
Synchronization is a necessary part of Interprocess communication. It is either provided by the interprocess control mechanism or
handled by the communicating processes.
Semaphore A semaphore is a variable that controls the access to a common resource by multiple processes. The two types of
semaphores are binary semaphores and counting semaphores.
Mutual Exclusion Mutual exclusion requires that only one process thread can enter the critical section at a time. This is useful for
synchronization and also prevents race conditions.
Barrier A barrier does not allow individual processes to proceed until all the processes reach it. Many parallel languages and
collective routines impose barriers.
Spinlock This is a type of lock. The processes trying to acquire this lock wait in a loop while checking if the lock is available or not.
This is known as busy waiting because the process is not doing any useful operation even though it is active.
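A spinlock's busy-wait loop can be sketched in Python, using a non-blocking lock acquire as a stand-in for an atomic test-and-set instruction (the class and worker names are illustrative):

```python
# Sketch: a spinlock built on an atomic test-and-set, here modeled with
# Lock.acquire(blocking=False) as the atomic primitive.

import threading

class SpinLock:
    def __init__(self):
        self._flag = threading.Lock()

    def acquire(self):
        # busy wait: loop until the test-and-set succeeds
        while not self._flag.acquire(blocking=False):
            pass

    def release(self):
        self._flag.release()

counter = 0
lock = SpinLock()

def worker():
    global counter
    for _ in range(1000):
        lock.acquire()        # enter critical section: mutual exclusion
        counter += 1
        lock.release()

threads = [threading.Thread(target=worker) for _ in range(4)]
for t in threads: t.start()
for t in threads: t.join()
print(counter)  # 4000: no updates lost, thanks to mutual exclusion
```

The `while` loop is exactly the busy waiting described above: the process stays active but does no useful work until the lock becomes free.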
Cache coherence : In a multiprocessor system, data inconsistency may occur among adjacent levels or within the same level of the memory hierarchy.
In a shared memory multiprocessor with a separate cache memory for each processor, it is possible to have many copies of any one instruction operand:
one copy in the main memory and one in each cache memory. When one copy of an operand is changed, the other copies of the operand must be
changed also.
Example : Cache and the main memory may have inconsistent copies of the same object.
Suppose there are three processors, each having a cache, and the following scenario:
Processor 1 reads X: obtains 24 from the memory and caches it.
Processor 2 reads X: obtains 24 from memory and caches it.
Then processor 1 writes X = 64: its locally cached copy is updated, while main memory and processor 2's cache still hold the stale value 24.
As multiple processors operate in parallel, and independently multiple caches may possess different copies of the same memory block,
this creates a cache coherence problem. Cache coherence is the discipline that ensures that changes in the values of shared operands are
propagated throughout the system in a timely fashion.
A cache block can be in one of the following coherence states (as in the MOESI protocol). These are:
Modified – the value in the cache is dirty; that is, the value in the current cache differs from that in main memory.
Exclusive – the value present in the cache is the same as that in main memory; that is, the value is clean.
Shared – the cache holds the most recent data copy, which is shared among all the caches and main memory as well.
Owned – the current cache holds the block and is its owner; that is, it has all rights over that particular block.
Invalid – the current cache block is invalid and must be fetched from another cache or from main memory.
Coherency mechanisms
1. Directory-based – In a directory-based system, the data being shared is placed in a common directory that maintains the coherence
between caches. The directory acts as a filter through which the processor must ask permission to load an entry from the primary memory
to its cache. When an entry is changed, the directory either updates or invalidates the other caches with that entry.
2. Snooping – First introduced in 1983, snooping is a process where the individual caches monitor the address lines for accesses to memory
locations that they have cached. In a write-invalidate protocol, when a write operation is observed to a location that a cache has a
copy of, the cache controller invalidates its own copy of the snooped memory location.
3. Snarfing – A mechanism where a cache controller watches both address and data in an attempt to update its own copy of a memory
location when a second master modifies main memory. When a write operation is observed to a location that a cache has a
copy of, the cache controller updates its own copy of the snarfed memory location with the new data.
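The write-invalidate behavior can be sketched with the X = 24 / 64 scenario from the example above (the cache model is deliberately simplified: write-through, with no coherence states):

```python
# Sketch: write-invalidate snooping. On a write, every other cache that
# holds a copy of the block invalidates it; the writer's copy stays valid.

caches = [dict() for _ in range(3)]   # one cache per processor
memory = {"X": 24}

def read(p, addr):
    if addr not in caches[p]:
        caches[p][addr] = memory[addr]      # miss: fetch from memory
    return caches[p][addr]

def write(p, addr, value):
    for q, cache in enumerate(caches):      # snoop: invalidate other copies
        if q != p:
            cache.pop(addr, None)
    caches[p][addr] = value                 # writer keeps the valid copy
    memory[addr] = value                    # write-through for simplicity

read(0, "X"); read(1, "X")                  # both caches hold X = 24
write(0, "X", 64)                           # P0 writes; P1's copy invalidated
print("X" in caches[1])                     # False: stale copy is gone
print(read(1, "X"))                         # 64: fresh copy fetched again
```

Unlike the incoherent scenario in the example, processor 2 can never read the stale 24 here, because its copy is discarded the moment processor 1's write is snooped.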
Thank You