Lecture 10 - SIMD Architecture

SIMD (Single Instruction, Multiple Data) architecture allows multiple processing units to execute the same instruction on different data simultaneously, enhancing computational efficiency. It can be implemented as true SIMD, with either distributed or shared memory, or as a pipelined SIMD architecture; each approach has distinct advantages and disadvantages. While SIMD improves performance for large data sets, it may lead to under-utilization of resources and increased programming complexity.



SIMD Architecture
Introduction
● Computer architecture has been defined as "the structure of a computer that a machine language programmer must understand to write a correct program for that machine".
● Computer architecture can be classified into four main categories, defined under Flynn's taxonomy. The classification is based on the number of instruction streams running in parallel and on how the data is managed.
Introduction
● The four categories are:
● 1. SISD: Single Instruction, Single Data
● 2. SIMD: Single Instruction, Multiple Data
● 3. MISD: Multiple Instruction, Single Data
● 4. MIMD: Multiple Instruction, Multiple Data
SIMD ARCHITECTURE
● In a Single Instruction stream, Multiple Data stream (SIMD) processor, one instruction operates on several data items simultaneously, using several processing elements that all carry out the same operation.
● SIMD, or Single Instruction, Multiple Data, is a technology that enables processing of multiple data elements with a single instruction, instead of using scalar operations where one instruction processes each data element (see the sketch below).
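The lecture does not tie SIMD to any particular instruction set, so the following is only a minimal illustrative sketch in C using x86 SSE intrinsics (an assumption: it requires a compiler targeting x86-64, where SSE is available). It shows a single vector instruction adding four data elements at once, in contrast to four scalar additions.

/* Minimal sketch: one SSE instruction adds four floats at once.
 * Assumes an x86-64 C compiler; not taken from the lecture itself. */
#include <immintrin.h>
#include <stdio.h>

int main(void) {
    float a[4] = {1.0f, 2.0f, 3.0f, 4.0f};
    float b[4] = {10.0f, 20.0f, 30.0f, 40.0f};
    float c[4];

    __m128 va = _mm_loadu_ps(a);     /* load four floats into one register */
    __m128 vb = _mm_loadu_ps(b);
    __m128 vc = _mm_add_ps(va, vb);  /* ONE instruction, four additions */
    _mm_storeu_ps(c, vc);

    for (int i = 0; i < 4; i++)
        printf("%.1f ", c[i]);       /* prints 11.0 22.0 33.0 44.0 */
    printf("\n");
    return 0;
}

A scalar version would instead loop over the four elements and issue one add per element; the SIMD version replaces those four adds with a single _mm_add_ps.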
How to View a SIMD Machine
● Think of soldiers all in a unit.
● A commander selects certain soldiers as active – for example, every even-numbered row.
● The commander shouts out an order that all the active soldiers should carry out, and they execute the order synchronously.
What is SIMD
● Single Instruction, Multiple Data.
● Multiple processing units work under the control of a single control unit (CU).
● All PEs get the same instruction, broadcast from the CU.
● The parallel units share the same instruction, but they carry it out on different data elements and operate in lock step with one another.
● Main memory can also be divided into modules that generate multiple data streams, acting as a shared memory.
How Does the SIMD Model Work?
A SIMD computer consists of two types of processors:
● A front end, or control unit:
■ Stores a copy of the program.
■ Has a program control unit to execute the program.
■ Broadcasts the parallel program instructions to the array of processors.
● An array of simple processors, each functionally more like an ALU:
■ Does not store a copy of the program, nor have a program control unit.
■ Executes, in parallel, the commands sent by the front end (a toy model of this split is sketched below).
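As a rough mental model of the front-end / processor-array split described above, the sketch below simulates, in ordinary C, a control unit broadcasting one opcode to an array of PEs, each of which applies it to its own local data. The PE count, opcode names and data layout are illustrative assumptions, not part of the lecture.

/* Toy software model of a CU broadcasting one instruction to an
 * array of PEs; purely conceptual, not real SIMD hardware or code. */
#include <stdio.h>

#define NUM_PES 8

typedef enum { OP_ADD, OP_MUL } Opcode;

typedef struct {
    float local_mem;   /* each PE's small local memory */
    float reg;         /* each PE's working register   */
} PE;

/* The control unit broadcasts one instruction; every PE executes it
 * on its own local data in the same (conceptually simultaneous) step. */
static void broadcast(Opcode op, float operand, PE pe[], int n) {
    for (int i = 0; i < n; i++) {
        switch (op) {
        case OP_ADD: pe[i].reg = pe[i].local_mem + operand; break;
        case OP_MUL: pe[i].reg = pe[i].local_mem * operand; break;
        }
    }
}

int main(void) {
    PE pe[NUM_PES];
    for (int i = 0; i < NUM_PES; i++) pe[i].local_mem = (float)i;

    broadcast(OP_MUL, 2.0f, pe, NUM_PES);   /* one instruction, 8 PEs */

    for (int i = 0; i < NUM_PES; i++)
        printf("PE %d: %.1f\n", i, pe[i].reg);
    return 0;
}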
Processor Array
● The processor array is a set of identical, synchronized processing elements capable of simultaneously performing the same operation on different data.
● Each processor in the array has a small amount of local memory where the distributed data resides while it is being processed in parallel.
● The processor array is connected to the memory bus of the front end so that the front end can randomly access the local processor memories.
Types
● Implementing SIMD architecture:
● Two types of SIMD architectures exist:
● 1. True SIMD (distributed memory or shared memory)
● 2. Pipelined SIMD
True SIMD Architecture
● A true SIMD architecture is characterized by its use of either distributed or shared memory. Both true SIMD variants have similar implementations, but differ in the placement of the processor and memory modules.
Distributed Memory
● In distributed memory, each memory module is uniquely associated with a particular arithmetic unit.
● The synchronized PEs are controlled by one control unit (CU).
● Each PE is basically an arithmetic logic unit with attached working registers and local memory for storing distributed data.
● The CU decodes the instructions and determines where they should be executed: scalar instructions are executed in the CU, whereas vector instructions are broadcast to the PEs.
● If an ALU needs information held in a different ALU, it must put in a request to the CU, and the CU must manage the transfer of that information.
● The advantage of this type of architecture is the ease of adding more memory and PEs to the computer.
● The disadvantage is the time wasted by the CU managing all memory exchanges.
True SIMD with Distributed Memory
Shared Memory
● In shared-memory SIMD machines, the local memories attached to the PEs are replaced by memory modules shared by all PEs through an alignment network.
● This configuration allows the individual PEs to share their memory without going through the CU.
● The disadvantage lies in the difficulty of adding and configuring memory separately.
True SIMD with Shared Memory
Pipelined SIMD Architecture
● Composed of a pipeline of arithmetic units with memory.
● The pipeline takes different streams of instructions and performs all the operations of an arithmetic unit.
● The pipeline is a first-in, first-out type of procedure.
● The data to be evaluated must be stored in different memory modules.
● It is also possible for a single processor to perform the same instruction on a large set of data items.
● One set of operands starts through the pipeline, and before the computation on this set of operands is finished, another set of operands starts flowing through the pipeline (see the timing sketch below).
● The advantages of this architecture are the speed and efficiency of data processing.
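To make the overlap concrete, the sketch below uses a deliberately simplified timing model (an assumption: one clock cycle per pipeline stage and no stalls; the stage and operand counts are made up) to compare how many cycles the same operand sets take with and without pipelining.

/* Simplified timing model of a pipelined arithmetic unit.
 * Assumes one cycle per stage and no stalls; numbers are examples. */
#include <stdio.h>

int main(void) {
    int stages = 4;         /* depth of the arithmetic pipeline    */
    int operand_sets = 100; /* sets of operands flowing through it */

    /* No overlap: each set occupies the whole pipeline by itself.  */
    int sequential_cycles = stages * operand_sets;

    /* Overlap: once the pipe is full, one set completes per cycle. */
    int pipelined_cycles = stages + (operand_sets - 1);

    printf("sequential: %d cycles\n", sequential_cycles);   /* 400 */
    printf("pipelined : %d cycles\n", pipelined_cycles);    /* 103 */
    printf("speedup   : %.2fx\n",
           (double)sequential_cycles / pipelined_cycles);   /* ~3.88x */
    return 0;
}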
Advantages of SIMD
● Less hardware, since there is only one control unit.
● The ability to carry out mathematical operations applied to large sets of data points much more quickly.
● The single instruction stream and synchronization of PEs make SIMD applications easier to program and understand.
● Improved performance. For example, processing 12 data items could take 12 instructions with scalar processing, but would require only three instructions if four data elements are processed per instruction (worked through in the sketch below).
● SIMD offers greater flexibility and opportunities for better performance in video, audio and communications tasks.
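The 12-item example from the list above can be worked through in code. The sketch below is an assumption-laden illustration (x86 SSE, four floats per vector add, counting only the add instructions and ignoring loads and stores); the loop body runs three times, so three vector adds replace twelve scalar adds.

/* Sketch of the slide's example: 12 items, 4 floats per SIMD add,
 * so the additions need 3 vector instructions instead of 12 scalar ones. */
#include <immintrin.h>
#include <stdio.h>

int main(void) {
    float a[12], b[12], c[12];
    for (int i = 0; i < 12; i++) { a[i] = (float)i; b[i] = 100.0f; }

    for (int i = 0; i < 12; i += 4) {              /* 3 iterations */
        __m128 va = _mm_loadu_ps(&a[i]);
        __m128 vb = _mm_loadu_ps(&b[i]);
        _mm_storeu_ps(&c[i], _mm_add_ps(va, vb));  /* one add covers 4 items */
    }

    for (int i = 0; i < 12; i++) printf("%.0f ", c[i]);
    printf("\n");
    return 0;
}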
Disadvantages of SIMD
● Because a single CU provides the instruction stream for all of the array processors, the system is frequently under-utilized when running programs that require only a few PEs.
● SIMD machines also have large register files, which increase power consumption and chip area.
● Implementing SIMD-supported instructions in a program requires significantly more human effort than traditional C or C++ programming, and SIMD cannot be applied to all algorithms.
● Gathering data into SIMD registers and scattering it to the correct destination locations is tricky.
Design Space (Granularity)
● Mapping data to PEs.
● Granularity ranges from fine grain to coarse grain.
Processor complexity
● The design space for processor precision.
Difference between fine-grained and coarse-grained parallelism
Fine-grained parallelism:
● In fine-grained parallelism, a program is broken down into a large number of small tasks.
● The amount of work is evenly distributed. Hence, fine-grained parallelism facilitates load balancing.
● Each processing element processes less data.

Coarse-grained parallelism:
● In coarse-grained parallelism, a program is split into large tasks.
● The amount of work is not evenly distributed. Hence, coarse-grained parallelism does not facilitate load balancing.
● Due to this, a large amount of computation takes place in each processor.

A small sketch contrasting the two mappings follows below.
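As an illustration of the comparison above, the sketch below contrasts a fine-grained mapping (many PEs, one data item each) with a coarse-grained mapping (few PEs, one large block each). The array size and PE counts are invented for the example, not taken from the lecture.

/* Fine-grained vs coarse-grained mapping of an array onto PEs.
 * Sizes and PE counts are illustrative assumptions only. */
#include <stdio.h>

#define N 16   /* data items */

int main(void) {
    /* Fine grain: as many PEs as items, one element per PE, so the
     * work splits into many tiny, evenly sized tasks. */
    printf("fine-grained: %d PEs, 1 item per PE\n", N);
    for (int i = 0; i < N; i++)
        printf("  item %2d -> PE %2d\n", i, i);

    /* Coarse grain: a few PEs, each owning a large contiguous block,
     * so each PE performs much more computation per task. */
    int pes = 2, block = N / pes;
    printf("coarse-grained: %d PEs, %d items per PE\n", pes, block);
    for (int i = 0; i < N; i++)
        printf("  item %2d -> PE %2d\n", i, i / block);

    return 0;
}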