COA - Module-5
4. Parallelism
4.1 Introduction
Parallel Processing
Parallel processing can be described as a class of techniques that enables a
system to carry out simultaneous data-processing tasks to increase the
computational speed of a computer system.
A parallel processing system can carry out simultaneous data-processing to
achieve a faster execution time.
For instance, while an instruction is being processed in the ALU component of the
CPU, the next instruction can be read from memory.
The primary purpose of parallel processing is to enhance the computer's processing
capability and increase its throughput.
A parallel processing system can be achieved by having a multiplicity of
functional units that perform identical or different operations simultaneously.
The data can be distributed among the various functional units.
The following diagram shows one possible way of separating the execution unit
into eight functional units operating in parallel.
The operation performed in each functional unit is indicated in each block of the
diagram:
The adder and integer multiplier perform arithmetic operations on integer
numbers.
The floating-point operations are separated into three circuits operating in parallel.
The logic, shift, and increment operations can be performed concurrently on
different data.
All units are independent of each other, so one number can be shifted while
another number is being incremented.
Parallel computers can be roughly classified according to the level at which the
hardware supports parallelism, with multi-core and multi-processor computers
having multiple processing elements within a single machine.
In some cases parallelism is transparent to the programmer, such as in bit-level or
instruction-level parallelism.
But explicitly parallel algorithms, particularly those that use concurrency, are more
difficult to write than sequential ones, because concurrency introduces several new
classes of potential software bugs, of which race conditions are the most common.
Communication and synchronization between the different subtasks are typically
some of the greatest obstacles to getting optimal parallel program performance.
Types of Parallelism:
1. Bit-level parallelism: This form of parallel computing is based on increasing the
processor word size. It reduces the number of instructions that the system
must execute in order to perform a task on large-sized data.
Example: Consider a scenario where an 8-bit processor must compute the sum of
two 16-bit integers. It must first sum the 8 lower-order bits and then the 8
higher-order bits, thus requiring two instructions to perform the operation. A 16-
bit processor can perform the operation with just one instruction.
2. Instruction-level parallelism: A processor can ordinarily issue less than one
instruction per clock cycle phase. Instructions can be re-ordered and
grouped so that they are later executed concurrently without affecting the result of the
program. This is called instruction-level parallelism.
3. Task parallelism: Task parallelism employs the decomposition of a task into
subtasks and then allocates each of the subtasks for execution. The processors
execute the subtasks concurrently.
4. Data-level parallelism (DLP): Instructions from a single stream operate
concurrently on several data items. It is limited by non-regular data manipulation
patterns and by memory bandwidth.
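The bit-level parallelism example above can be sketched in code. The function names are illustrative; the point is that an 8-bit datapath needs two add steps (low bytes first, then high bytes plus the carry) where a 16-bit datapath needs one:

```python
# Sketch: summing two 16-bit integers on an 8-bit datapath takes two
# 8-bit additions, while a 16-bit datapath does it in one operation.
# Function names are illustrative, not from any real ISA.

def add16_on_8bit(a, b):
    """Add two 16-bit values using only 8-bit operations."""
    lo = (a & 0xFF) + (b & 0xFF)                          # step 1: low-order bytes
    carry = lo >> 8
    hi = ((a >> 8) & 0xFF) + ((b >> 8) & 0xFF) + carry    # step 2: high bytes + carry
    return ((hi & 0xFF) << 8) | (lo & 0xFF), 2            # (result, add steps used)

def add16_on_16bit(a, b):
    """Add two 16-bit values in a single 16-bit operation."""
    return (a + b) & 0xFFFF, 1

result8, steps8 = add16_on_8bit(0x1234, 0x0FCD)
result16, steps16 = add16_on_16bit(0x1234, 0x0FCD)
assert result8 == result16 == 0x2201
assert (steps8, steps16) == (2, 1)
```

The same idea motivated the historical move from 4-bit to 8-, 16-, 32-, and 64-bit datapaths described in the Architectural Trends section below.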
Architectural Trends
When multiple operations are executed in parallel, the number of cycles needed to
execute the program is reduced.
However, resources are needed to support each of the concurrent activities.
Resources are also needed to allocate local storage.
The best performance is achieved by an intermediate action plan that uses
resources to utilize a degree of parallelism and a degree of locality.
Generally, the history of computer architecture has been divided into four
generations based on the following basic technologies:
Vacuum tubes
Transistors
Integrated circuits
VLSI
Until 1985, the period was dominated by growth in bit-level parallelism:
4-bit microprocessors were followed by 8-bit, 16-bit, and so on.
To reduce the number of cycles needed to perform a full 32-bit operation, the
width of the data path was doubled. Later on, 64-bit operations were introduced.
Growth in instruction-level parallelism dominated the mid-80s to mid-90s.
The RISC approach showed that it was simple to pipeline the steps of instruction
processing so that, on average, an instruction is executed in almost every cycle.
Growth in compiler technology has made instruction pipelines more productive.
In the mid-80s, microprocessor-based computers consisted of:
An integer processing unit
A floating-point unit
A cache controller
SRAMs for the cache data
Tag storage
As chip capacity increased, all these components were merged into a single chip.
Thus, a single chip consisted of separate hardware for integer arithmetic, floating-
point operations, memory operations, and branch operations.
Besides pipelining individual instructions, such a processor fetches multiple instructions at a
time and sends them in parallel to different functional units whenever possible.
This type of instruction-level parallelism is called superscalar execution.
FLYNN'S CLASSIFICATION
Flynn's taxonomy is a classification of parallel computer architectures
based on the number of concurrent instruction streams (single or multiple) and data
streams (single or multiple) available in the architecture.
The four categories in Flynn's taxonomy are the following:
1. (SISD) single instruction, single data
2. (SIMD) single instruction, multiple data
3. (MISD) multiple instruction, single data
4. (MIMD) multiple instruction, multiple data
Instruction stream: the sequence of instructions as executed by the machine.
Data stream: a sequence of data, including input and partial or temporary results,
referenced by the instruction stream.
Instructions are decoded by the control unit, which then sends them
to the processing units for execution.
The data stream flows between the processors and memory bidirectionally.
SISD
An SISD computing system is a uniprocessor machine which is capable of executing a
single instruction, operating on a single data stream.
SIMD
• An SIMD system is a multiprocessor machine capable of executing the same
instruction on all the CPUs but operating on different data streams.
Machines based on the SIMD model are well suited to scientific computing since
it involves many vector and matrix operations.
So that the information can be passed to all the processing elements (PEs), the
organized data elements of vectors can be divided into multiple sets (N sets for N-PE
systems), and each PE can process one data set.
A dominant representative of SIMD systems is Cray's vector processing machine.
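The division of a vector into N sets for N processing elements can be sketched as follows. The PEs are simulated one after another here; on real SIMD hardware every slice would be processed at the same time, and the function names are invented for the sketch:

```python
# Sketch: SIMD-style data distribution. One instruction (here, "multiply
# by 2") is applied by every processing element (PE) to its own slice of
# the vector. The sequential loop over PEs stands in for hardware that
# would run all slices simultaneously.

def split_among_pes(data, n_pes):
    """Divide the data elements into n_pes roughly equal sets."""
    size = (len(data) + n_pes - 1) // n_pes
    return [data[i:i + size] for i in range(0, len(data), size)]

def simd_apply(instruction, data, n_pes):
    """Each PE applies the same instruction to its own data set."""
    chunks = split_among_pes(data, n_pes)
    processed = [[instruction(x) for x in chunk] for chunk in chunks]
    return [x for chunk in processed for x in chunk]  # gather the results

vector = [1, 2, 3, 4, 5, 6, 7, 8]
assert simd_apply(lambda x: 2 * x, vector, n_pes=4) == [2, 4, 6, 8, 10, 12, 14, 16]
```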
MISD
An MISD computing system is a multiprocessor machine capable of executing
different instructions on different PEs, with all of them operating on the same
data set.
The system performs different operations on the same data set. Machines built
using the MISD model are not useful in most applications; a few machines
have been built, but none of them are available commercially.
MIMD
An MIMD system is a multiprocessor machine which is capable of executing
multiple instructions on multiple data sets.
Each PE in the MIMD model has separate instruction and data streams; therefore
machines built using this model are capable of handling any kind of application.
Unlike SIMD and MISD machines, PEs in MIMD machines work
asynchronously.
MIMD machines are broadly categorized into
shared-memory MIMD and
distributed-memory MIMD
based on the way PEs are coupled to the main memory.
In the shared-memory MIMD model (tightly coupled multiprocessor systems), all the
PEs are connected to a single global memory and they all have access to it. The
communication between PEs in this model takes place through the shared memory;
modification of the data stored in the global memory by one PE is visible to all other PEs.
Dominant representative shared-memory MIMD systems are Silicon Graphics machines
and Sun/IBM's SMP (Symmetric Multi-Processing) machines.
Pipelining
The term Pipelining refers to a technique of decomposing a sequential process into sub-operations,
with each sub-operation being executed in a dedicated segment that operates concurrently with all
other segments.
The most important characteristic of a pipeline technique is that several computations can be in
progress in distinct segments at the same time. The overlapping of computation is made possible by
associating a register with each segment in the pipeline. The registers provide isolation between each
segment so that each can operate on distinct data simultaneously.
The structure of a pipeline organization can be represented simply by including an input register for
each segment followed by a combinational circuit.
Let us consider an example of combined multiplication and addition operation to get a better
understanding of the pipeline organization.
The combined multiplication and addition operation is performed on a stream of numbers such as Ai * Bi + Ci, for i = 1, 2, ..., 7.
The operation to be performed on the numbers is decomposed into sub-operations with each sub-
operation to be implemented in a segment within a pipeline.
The sub-operations performed in each segment of the pipeline are defined as:
R1 ← Ai, R2 ← Bi (input Ai and Bi)
R3 ← R1 * R2, R4 ← Ci (multiply, and input Ci)
R5 ← R3 + R4 (add Ci to the product)
The following block diagram represents the combined as well as the sub-operations performed in
each segment of the pipeline.
Registers R1, R2, R3, and R4 hold the data and the combinational circuits operate in a particular
segment.
The output generated by the combinational circuit in a given segment is applied as an input register
of the next segment. For instance, from the block diagram, we can see that the register R3 is used as
one of the input registers for the combinational adder circuit.
In the pipelined approach, the stages overlap without disturbing one another.
Each task's waiting time is reduced, and each stage's idle time is reduced.
The same principles apply to processors where the pipeline instruction-execution is applied.
The MIPS instructions classically take five steps or stages:
1. IF: Instruction fetch from memory
2. ID: Instruction decode & register read
3. EX: Execute operation or calculate address
4. MEM: Access memory operand
5. WB: Write result back to register
All the pipeline stages take a single clock cycle, so the clock cycle must be long enough to
accommodate the slowest operation.
If the stages are perfectly balanced, then the time between instructions on the pipelined
processor, under ideal conditions, is equal to:
Time between instructions (pipelined) = Time between instructions (non-pipelined) / Number of pipe stages
Under ideal conditions and with a large number of instructions, the speed-up from
pipelining is approximately equal to the number of pipe stages; a five-stage pipeline is nearly
five times faster.
In the pipelined approach, a sequence of load instructions is executed with the stages
overlapped, as given below.
The total time taken for execution is 1400 ps, while in the non-pipelined approach it is 2400 ps.
But as per the formula:
Time taken (pipelined approach) = Time taken (non-pipelined approach) / Number of stages
= 2400 ps / 5
= 480 ps
The practical result, however, is 1400 ps.
So only when the number of instructions in pipelined execution is high enough can the
theoretical execution speed be achieved or nearly achieved. Pipelining improves performance by
increasing instruction throughput.
In general, the pipeline organization is applicable for two areas of computer design
which includes:
1. Arithmetic Pipeline
2. Instruction Pipeline
Array Processor: Architecture, Types, Working & Its Applications
A supercomputer is a very powerful computer whose architecture,
resources & components give huge computing power to the
consumer. A supercomputer also contains a large number
of processors which perform millions or billions of computations each
second, so these computers can perform numerous tasks in a few
seconds. There are three types of supercomputers: tightly connected cluster
computers that work together like a single unit; commodity computers
connected by low-latency & high-bandwidth LANs; and finally vector
processing computers, which depend on an array processor or vectors. An
array processor is like a CPU that helps in performing mathematical
operations on various data elements. The most famous array processor is
the ILLIAC IV computer, which was designed by the Burroughs Corporation.
This article discusses an overview of an array processor: working, types
& applications.
This processor includes a master control unit and main memory. The
master control unit controls the operation of the processing
elements. It also decodes the instruction & determines how the
instruction is executed. So, if the instruction is program control or scalar,
then it is executed directly in the master control unit. Main memory is
mainly used to store the program, while every processing unit uses
operands that are stored in its local memory.
Advantages
The advantages of an array processor include the following.
• Process migration.
Cache Coherence Protocols
1. MSI Protocol:
This is a basic cache coherence protocol used in multiprocessor systems. The letters of the protocol
name identify the possible states a cache block can be in. So, for MSI each block can have one of the
following possible states:
• Modified –
The block has been modified in the cache, i.e., the data in the cache is inconsistent with the
backing store (memory). So, a cache with a block in the "M" state has the responsibility to write
the block to the backing store when it is evicted.
• Shared –
This block is not modified and is present in at least one cache. The cache can evict the data
without writing it to the backing store.
• Invalid –
This block is invalid and must be fetched from memory or from another cache if it is to be
stored in this cache.
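The three MSI states and the transitions between them can be sketched as a small table-driven state machine. This is a simplified illustration: real protocols also drive bus transactions and write-backs, and the event names here are invented for the sketch.

```python
# Minimal sketch of MSI state transitions for a single cache block,
# as seen by one cache. States: 'M' (Modified), 'S' (Shared), 'I' (Invalid).
# Event names are illustrative, not from any real protocol specification.

TRANSITIONS = {
    # (state, event): new_state
    ('I', 'processor_read'):  'S',  # fetch the block; others may share it
    ('I', 'processor_write'): 'M',  # fetch and take exclusive ownership
    ('S', 'processor_write'): 'M',  # upgrade: other copies get invalidated
    ('S', 'bus_write'):       'I',  # another cache wrote: invalidate our copy
    ('M', 'bus_read'):        'S',  # write the block back, keep a shared copy
    ('M', 'bus_write'):       'I',  # write back, then invalidate
}

def next_state(state, event):
    """Return the new MSI state; events not listed leave the state unchanged."""
    return TRANSITIONS.get((state, event), state)

# A block written locally, then read and written by another processor:
s = 'I'
s = next_state(s, 'processor_write')   # I -> M
assert s == 'M'
s = next_state(s, 'bus_read')          # M -> S (block written back to memory)
assert s == 'S'
s = next_state(s, 'bus_write')         # S -> I (another cache modified it)
assert s == 'I'
```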
2. MOSI Protocol:
This protocol is an extension of the MSI protocol. It adds the following state to the MSI protocol:
• Owned –
It indicates that the present processor owns this block and will service requests from other
processors for the block.
3. MESI Protocol –
It is the most widely used cache coherence protocol. Every cache line is marked with one of the
following states:
• Modified –
This indicates that the cache line is present in the current cache only and is dirty, i.e., its value
differs from main memory. The cache is required to write the data back to main
memory in the future, before permitting any other read of the (now invalid) main memory state.
• Exclusive –
This indicates that the cache line is present in the current cache only and is clean, i.e., its value
matches the main memory value.
• Shared –
It indicates that this cache line may be stored in other caches of the machine.
• Invalid –
It indicates that this cache line is invalid.
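The benefit of the extra Exclusive state over MSI can be sketched as follows. The helper names are invented for the illustration and do not come from any vendor's implementation: a block read by only one cache enters E, and a later write to it can upgrade silently to M without broadcasting an invalidation.

```python
# Sketch: how MESI's Exclusive state avoids bus traffic on a write.
# On a read miss, the block enters 'E' if no other cache holds it,
# otherwise 'S'. A write to an 'E' block upgrades silently to 'M';
# a write to an 'S' or 'I' block must invalidate other copies first.

def on_read_miss(other_caches_have_copy):
    return 'S' if other_caches_have_copy else 'E'

def on_processor_write(state):
    """Return (new_state, needs_bus_invalidate)."""
    if state == 'E':
        return 'M', False   # silent upgrade: nobody else has a copy
    if state in ('S', 'I'):
        return 'M', True    # must invalidate other caches' copies
    return 'M', False       # already Modified

state = on_read_miss(other_caches_have_copy=False)
assert state == 'E'
state, bus = on_processor_write(state)
assert (state, bus) == ('M', False)   # no invalidation broadcast needed
```

In plain MSI the same read-then-write sequence would pass through S and therefore always pay for an invalidation broadcast, even when no other cache ever held the block.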
4. MOESI Protocol:
This is a full cache coherence protocol that encompasses all of the possible states commonly used in
other protocols. Each cache line is in one of the following states:
• Modified –
A cache line in this state holds the most recent, correct copy of the data while the copy in the
main memory is incorrect and no other processor holds a copy.
• Owned –
A cache line in this state holds the most recent, correct copy of the data. It is similar to the
shared state in that other processors can hold a copy of the most recent, correct data; unlike
the shared state, however, the copy in main memory can be incorrect. Only one processor can hold
the data in the owned state, while all other processors must hold the data in the shared state.
• Exclusive –
A cache line in this state holds the most recent, correct copy of the data. The main memory
copy is also the most recent, correct copy of the data, and no other processor holds a copy.
• Shared –
A cache line in this state holds the most recent, correct copy of the data. Other processors in the
system may hold copies of the data in the shared state as well. The main memory copy is also the
most recent, correct copy of the data, provided no other processor holds it in the owned state.
• Invalid –
A cache line in this state does not hold a valid copy of the data. Valid copies of the data can be
either in main memory or in another processor's cache.
Clusters In Computer Organisation
A cluster is a set of loosely or tightly connected computers working together as a unified computing
resource that can create the illusion of being one machine. Computer clusters have each node set to
perform the same task, controlled and produced by the software.
Clustered operating systems work similarly to parallel operating systems as they have many CPUs.
Cluster systems are created when two or more computer systems are merged. Basically, each node is an
independent computer, but the nodes have common storage and work together.
The components of clusters are usually connected using fast local area networks, with each node running
its own instance of an operating system. In most circumstances, all the nodes use the same hardware
and the same operating system, although in some setups different hardware or different operating
systems can be used.
In the field of computer organization, a cluster refers to a set of interconnected computers or servers
that collaborate to provide a unified computing resource. Clustering is an effective method to ensure
high availability, scalability, and fault tolerance in computer systems.
Clusters can be categorized into two major types, namely high-availability clusters and load-balancing
clusters. High-availability clusters guarantee uninterrupted service provision even when one or more
nodes fail. Multiple nodes are configured to provide redundant services, so that in case of failure,
another node takes over the failed node’s services without any interruption to the user. On the other
hand, load-balancing clusters distribute workloads among nodes in the cluster to ensure that no
single node is overburdened.
Several hardware and software technologies can be used to implement clusters, including dedicated
clustering hardware, virtualization technologies, and distributed software frameworks.
Clustering provides several benefits such as high availability, scalability, fault tolerance, and load
balancing. Nevertheless, there are a few challenges associated with clustering, such as complexity,
cost, and management.
To make a cluster more efficient, two kinds of clusters exist:
• Hardware Cluster
• Software Cluster
A hardware cluster enables high-performance disk sharing between systems, while
a software cluster allows all systems to work together.
• Symmetric Cluster: In this type of clustering, all the nodes run applications and monitor
other nodes at the same time. This clustering is more efficient than asymmetric clustering, as
it does not need a hot standby node.
Classification of Clusters:
Computer clusters are arranged to support different purposes, ranging from general-
purpose business needs such as web-service support to computation-intensive scientific calculation.
Basically, there are three types of Clusters, they are:
• Load-Balancing Cluster – A cluster requires an effective capability for balancing the load
among the available computers. Here, cluster nodes share a computational workload to
enhance the overall performance. For example, a high-performance cluster used for
scientific calculation would balance load with different algorithms from a web-server
cluster, which may just use a round-robin method, assigning each new request to a
different node. This type of cluster is used on farms of Web servers (web farms).
• Fail-Over Clusters – The function of switching applications and data resources over from a
failed system to an alternative system in the cluster is referred to as fail-over. These types are
used to cluster mission-critical database, mail, file, and application servers.
• High-Availability Clusters – These are also known as "HA clusters". They offer a high
probability that all the resources will be in service. If a failure does occur, such as a system
going down or a disk volume being lost, then the queries in progress are lost. Any lost query, if
retried, will be serviced by a different computer in the cluster. This type of cluster is widely
used in web, email, news, or FTP servers.
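The round-robin assignment mentioned for web farms can be sketched in a few lines. This is an illustrative toy with invented names; production balancers also weigh node load and health-check their nodes:

```python
# Sketch: round-robin request assignment in a load-balancing cluster.
# Each new request goes to the next node in turn, wrapping around.
import itertools

class RoundRobinBalancer:
    def __init__(self, nodes):
        self._cycle = itertools.cycle(nodes)  # endless rotation over nodes

    def assign(self, request):
        """Pick the next node in rotation and pair it with the request."""
        node = next(self._cycle)
        return node, request

nodes = ['web1', 'web2', 'web3']
lb = RoundRobinBalancer(nodes)
assigned = [lb.assign(f'req{i}')[0] for i in range(6)]
assert assigned == ['web1', 'web2', 'web3', 'web1', 'web2', 'web3']
```

Round-robin ignores how busy each node actually is, which is why the text distinguishes it from the load-aware balancing a scientific-computation cluster would need.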
Benefits:
• Absolute scalability – It is possible to create large clusters that exceed the power of even the
largest standalone machines. A cluster can have dozens of multiprocessor machines.
• Additional scalability – A cluster is configured in such a way that it is possible to add new
systems to the cluster in small increments. Clusters can scale
horizontally: more computers may be added to the cluster to improve its
performance, redundancy, and fault tolerance (the ability of the system to continue working
despite a malfunctioning node).
• High availability – Since each node in a cluster is a standalone computer, the
failure of one node does not mean loss of service. A single node can be taken down for
maintenance while the rest of the cluster takes on the load of that node.
High Performance: Clusters are designed to provide high performance computing by utilizing the
processing power of multiple computers working together.
Scalability: Clusters are scalable, which means that they can easily accommodate new nodes or
computers to increase processing power and performance.
Fault Tolerance: Clusters are designed to be fault-tolerant, which means that they can continue to
operate even if one or more nodes fail. This is achieved through redundant hardware, software, or
both.
Load Balancing: Clusters use load balancing techniques to distribute processing workload across
multiple nodes in a balanced manner. This helps to maximize performance and prevent overloading
of individual nodes.
Interconnectivity: Clusters are interconnected through a high-speed network that allows for efficient
communication and data transfer between nodes.
Shared Resources: Clusters allow for shared access to resources such as storage, memory, and
input/output devices. This makes it easier to manage resources and reduces the need for
duplication.
Versatility: Clusters can be used for a wide range of applications, including scientific computing, data
analysis, and web serving.