Lecture 10 - SIMD Architecture

SIMD (Single Instruction, Multiple Data) architecture allows multiple processing units to execute the same instruction on different data simultaneously, enhancing computational efficiency. It can be implemented as true SIMD, with either distributed or shared memory, or as a pipelined SIMD architecture; each approach has distinct advantages and disadvantages. While SIMD improves performance for large data sets, it may lead to under-utilization of resources and increased programming complexity.



SIMD Architecture
Introduction
● Computer architecture has been defined as "the structure of a computer that a machine language programmer must understand to write a correct program for that machine".
● Computer architecture can be classified into four main categories, defined under Flynn's taxonomy. The classification is based on the number of instruction streams running in parallel and on how the data is managed.
Introduction
● The four categories are:
● 1. SISD: Single Instruction, Single Data
● 2. SIMD: Single Instruction, Multiple Data
● 3. MISD: Multiple Instruction, Single Data
● 4. MIMD: Multiple Instruction, Multiple Data
SIMD ARCHITECTURE
● In a Single Instruction stream, Multiple Data stream (SIMD) processor, one instruction operates on several data items simultaneously, using several processing elements that all carry out the same operation.
● SIMD, or Single Instruction, Multiple Data, is a technology that enables processing of multiple data elements with a single instruction, instead of using scalar operations where one instruction processes each data element (see the sketch below).
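The lecture does not tie SIMD to any particular instruction set, so the following is only a minimal illustrative sketch in C using x86 SSE intrinsics (an assumption: it requires a compiler targeting x86-64, where SSE is available). It shows a single vector instruction adding four data elements at once, in contrast to four scalar additions.

/* Minimal sketch: one SSE instruction adds four floats at once.
 * Assumes an x86-64 C compiler; not taken from the lecture itself. */
#include <immintrin.h>
#include <stdio.h>

int main(void) {
    float a[4] = {1.0f, 2.0f, 3.0f, 4.0f};
    float b[4] = {10.0f, 20.0f, 30.0f, 40.0f};
    float c[4];

    __m128 va = _mm_loadu_ps(a);     /* load four floats into one register */
    __m128 vb = _mm_loadu_ps(b);
    __m128 vc = _mm_add_ps(va, vb);  /* ONE instruction, four additions */
    _mm_storeu_ps(c, vc);

    for (int i = 0; i < 4; i++)
        printf("%.1f ", c[i]);       /* prints 11.0 22.0 33.0 44.0 */
    printf("\n");
    return 0;
}

A scalar version would instead loop over the four elements and issue one add per element; the SIMD version replaces those four adds with a single _mm_add_ps.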
How to View a SIMD Machine
● Think of soldiers all in a unit.
● A commander selects certain soldiers as active – for example, every even-numbered row.
● The commander shouts out an order that all the active soldiers should carry out, and they execute the order synchronously.
What is SIMD
● Single Instruction, Multiple Data.
● Multiple processing units work under the control of a single control unit (CU).
● All PEs get the same instruction, broadcast from the CU.
● The parallel units share the same instruction, but they carry it out on different data elements and operate in lock step with one another.
● Main memory can also be divided into modules that generate multiple data streams, acting as a shared memory.
How Does the SIMD Model Work?
A SIMD computer consists of two types of processors:
● A front end, or control unit:
■ Stores a copy of the program.
■ Has a program control unit to execute the program.
■ Broadcasts the parallel program instructions to the array of processors.
● An array of simple processors, each functionally more like an ALU:
■ Does not store a copy of the program, nor have a program control unit.
■ Executes, in parallel, the commands sent by the front end (a toy model of this split is sketched below).
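As a rough mental model of the front-end / processor-array split described above, the sketch below simulates, in ordinary C, a control unit broadcasting one opcode to an array of PEs, each of which applies it to its own local data. The PE count, opcode names and data layout are illustrative assumptions, not part of the lecture.

/* Toy software model of a CU broadcasting one instruction to an
 * array of PEs; purely conceptual, not real SIMD hardware or code. */
#include <stdio.h>

#define NUM_PES 8

typedef enum { OP_ADD, OP_MUL } Opcode;

typedef struct {
    float local_mem;   /* each PE's small local memory */
    float reg;         /* each PE's working register   */
} PE;

/* The control unit broadcasts one instruction; every PE executes it
 * on its own local data in the same (conceptually simultaneous) step. */
static void broadcast(Opcode op, float operand, PE pe[], int n) {
    for (int i = 0; i < n; i++) {
        switch (op) {
        case OP_ADD: pe[i].reg = pe[i].local_mem + operand; break;
        case OP_MUL: pe[i].reg = pe[i].local_mem * operand; break;
        }
    }
}

int main(void) {
    PE pe[NUM_PES];
    for (int i = 0; i < NUM_PES; i++) pe[i].local_mem = (float)i;

    broadcast(OP_MUL, 2.0f, pe, NUM_PES);   /* one instruction, 8 PEs */

    for (int i = 0; i < NUM_PES; i++)
        printf("PE %d: %.1f\n", i, pe[i].reg);
    return 0;
}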
Processor Array
● The processor array is a set of identical, synchronized processing elements capable of simultaneously performing the same operation on different data.
● Each processor in the array has a small amount of local memory where the distributed data resides while it is being processed in parallel.
● The processor array is connected to the memory bus of the front end so that the front end can randomly access the local processor memories.
Types
● Implementing SIMD architecture:
● Two types of SIMD architectures exist:
● 1. True SIMD (distributed memory or shared memory)
● 2. Pipelined SIMD
True SIMD Architecture
● A true SIMD architecture is characterized by its use of either distributed or shared memory. Both true SIMD variants have similar implementations, but differ in the placement of the processor and memory modules.
Distributed Memory
● In distributed memory, each memory module is uniquely associated with a particular arithmetic unit.
● The synchronized PEs are controlled by one control unit (CU).
● Each PE is basically an arithmetic logic unit with attached working registers and local memory for storing distributed data.
● The CU decodes the instructions and determines where they should be executed: scalar instructions are executed in the CU, whereas vector instructions are broadcast to the PEs.
● If an ALU needs information held in a different ALU, it must put in a request to the CU, and the CU must manage the transfer of that information.
● The advantage of this type of architecture is the ease of adding more memory and PEs to the computer.
● The disadvantage is the time wasted by the CU managing all memory exchanges.
True SIMD with Distributed Memory
Shared Memory
● In shared-memory SIMD machines, the local memories attached to the PEs are replaced by memory modules shared by all PEs through an alignment network.
● This configuration allows the individual PEs to share their memory without going through the CU.
● The disadvantage lies in the difficulty of adding and configuring memory separately.
True SIMD with Shared Memory
Pipelined SIMD Architecture
● Composed of a pipeline of arithmetic units with memory.
● The pipeline takes different streams of instructions and performs all the operations of an arithmetic unit.
● The pipeline is a first-in, first-out type of procedure.
● The data to be evaluated must be stored in different memory modules.
● It is also possible for a single processor to perform the same instruction on a large set of data items.
● One set of operands starts through the pipeline, and before the computation on this set of operands is finished, another set of operands starts flowing through the pipeline (see the timing sketch below).
● The advantages of this architecture are the speed and efficiency of data processing.
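To make the overlap concrete, the sketch below uses a deliberately simplified timing model (an assumption: one clock cycle per pipeline stage and no stalls; the stage and operand counts are made up) to compare how many cycles the same operand sets take with and without pipelining.

/* Simplified timing model of a pipelined arithmetic unit.
 * Assumes one cycle per stage and no stalls; numbers are examples. */
#include <stdio.h>

int main(void) {
    int stages = 4;         /* depth of the arithmetic pipeline    */
    int operand_sets = 100; /* sets of operands flowing through it */

    /* No overlap: each set occupies the whole pipeline by itself.  */
    int sequential_cycles = stages * operand_sets;

    /* Overlap: once the pipe is full, one set completes per cycle. */
    int pipelined_cycles = stages + (operand_sets - 1);

    printf("sequential: %d cycles\n", sequential_cycles);   /* 400 */
    printf("pipelined : %d cycles\n", pipelined_cycles);    /* 103 */
    printf("speedup   : %.2fx\n",
           (double)sequential_cycles / pipelined_cycles);   /* ~3.88x */
    return 0;
}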
Advantages of SIMD
● Less hardware, since there is only one control unit.
● The ability to carry out mathematical operations applied to large sets of data points much more quickly.
● The single instruction stream and synchronization of PEs make SIMD applications easier to program and understand.
● Improved performance. For example, processing 12 data items could take 12 instructions with scalar processing, but would require only three instructions if four data elements are processed per instruction (worked through in the sketch below).
● SIMD offers greater flexibility and opportunities for better performance in video, audio and communications tasks.
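The 12-item example from the list above can be worked through in code. The sketch below is an assumption-laden illustration (x86 SSE, four floats per vector add, counting only the add instructions and ignoring loads and stores); the loop body runs three times, so three vector adds replace twelve scalar adds.

/* Sketch of the slide's example: 12 items, 4 floats per SIMD add,
 * so the additions need 3 vector instructions instead of 12 scalar ones. */
#include <immintrin.h>
#include <stdio.h>

int main(void) {
    float a[12], b[12], c[12];
    for (int i = 0; i < 12; i++) { a[i] = (float)i; b[i] = 100.0f; }

    for (int i = 0; i < 12; i += 4) {              /* 3 iterations */
        __m128 va = _mm_loadu_ps(&a[i]);
        __m128 vb = _mm_loadu_ps(&b[i]);
        _mm_storeu_ps(&c[i], _mm_add_ps(va, vb));  /* one add covers 4 items */
    }

    for (int i = 0; i < 12; i++) printf("%.0f ", c[i]);
    printf("\n");
    return 0;
}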
Disadvantages of SIMD
● Because a single CU provides the instruction stream for all of the array processors, the system is frequently under-utilized when running programs that require only a few PEs.
● SIMD machines also have large register files, which increase power consumption and chip area.
● Implementing SIMD-supported instructions in a program requires significantly more human effort than traditional C or C++ programming, and SIMD cannot be applied to all algorithms.
● Gathering data into SIMD registers and scattering it to the correct destination locations is tricky.
Design Space (Granularity)
● Mapping data to PEs.
● Granularity ranges from fine grain to coarse grain.
Processor complexity
● The design space for processor precision.
Difference between fine-grained and coarse-grained parallelism
Fine-grained parallelism:
● In fine-grained parallelism, a program is broken down into a large number of small tasks.
● The amount of work is evenly distributed. Hence, fine-grained parallelism facilitates load balancing.
● Each processing element processes less data.

Coarse-grained parallelism:
● In coarse-grained parallelism, a program is split into large tasks.
● The amount of work is not evenly distributed. Hence, coarse-grained parallelism does not facilitate load balancing.
● Due to this, a large amount of computation takes place in each processor.

A small sketch contrasting the two mappings follows below.
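As an illustration of the comparison above, the sketch below contrasts a fine-grained mapping (many PEs, one data item each) with a coarse-grained mapping (few PEs, one large block each). The array size and PE counts are invented for the example, not taken from the lecture.

/* Fine-grained vs coarse-grained mapping of an array onto PEs.
 * Sizes and PE counts are illustrative assumptions only. */
#include <stdio.h>

#define N 16   /* data items */

int main(void) {
    /* Fine grain: as many PEs as items, one element per PE, so the
     * work splits into many tiny, evenly sized tasks. */
    printf("fine-grained: %d PEs, 1 item per PE\n", N);
    for (int i = 0; i < N; i++)
        printf("  item %2d -> PE %2d\n", i, i);

    /* Coarse grain: a few PEs, each owning a large contiguous block,
     * so each PE performs much more computation per task. */
    int pes = 2, block = N / pes;
    printf("coarse-grained: %d PEs, %d items per PE\n", pes, block);
    for (int i = 0; i < N; i++)
        printf("  item %2d -> PE %2d\n", i, i / block);

    return 0;
}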