

U21CS603 – DISTRIBUTED COMPUTING

UNIT - I
PARALLELISM FUNDAMENTALS AND ARCHITECTURE
Dr. Vishnu Kumar Kaliappan
Professor, Dept. of CSE
[email protected], [email protected]

2 January 2024
Reference: Ananth Grama, Anshul Gupta, George Karypis, and Vipin Kumar, “Introduction to Parallel Computing”, 2nd ed., Pearson, 2012.

Google Classroom Code for DC?

u2pkdvp

Course Objective
• To introduce the fundamentals of parallel and distributed computing architectures and paradigms.
• To understand the technologies, system architectures, and communication architectures that propelled the growth of parallel and distributed computing systems.
• To develop and execute basic parallel and distributed applications using basic programming models and tools.


Course Outcome
• CO1: Design and implement distributed computing systems (Apply)

• CO2: Assess models for distributed systems (Apply)

• CO3: Design and implement distributed algorithms (Apply)

• CO4: Experiment with mechanisms such as client/server and P2P algorithms, remote procedure calls, and consistency (Apply)

• CO5: Analyze the requirements for programming parallel systems and critically evaluate the strengths and weaknesses of parallel programming models (Analyze)


Unit 1 – Parallelism Fundamentals


• Motivation
• Key Concepts and Challenges
• Overview of Parallel computing
• Flynn’s Taxonomy
• Multi-core processors
• Shared vs Distributed memory
• Introduction to OpenMP programming
• Instruction Level Support for Parallel Programming
• SIMD
• Vector Processing
• GPUs.

Parallel computing
• Parallel computing is a computing architecture in which multiple processors execute parts of an application or computation simultaneously.


(Slides 7–30: figures only. Slide credit for slides 8–13: Prof. Kayvon, Stanford CS149.)

Takeaways from the video


• How does a supercomputer work?
• What basic concept does it adopt?
• Is the parallel concept adopted?

Let's see...


Assignment
• Google it.
• Find the top 10 supercomputers in the world and compare their performance (mention location, FLOPS, etc.).
• Write a one-pager and submit it in Google Classroom tomorrow.


Motivation
• Development of parallel software has traditionally been thought of as time- and effort-intensive.
• This is largely attributed to the inherent complexity of specifying and coordinating concurrent tasks, and to a lack of portable algorithms, standardized environments, and software development toolkits.
• If it takes two years to develop a parallel application, the underlying hardware and/or software platform may have become obsolete by the time it is finished.
• The need for parallelism itself stems from the limits of serial computing: the lack of implicit parallelism, as well as other bottlenecks such as the datapath and the memory.


Concepts and challenges


• Moore's law has been extrapolated to state that the amount of computing power available at a given cost doubles approximately every 18 months.
• Memory/Disk Speed Argument:
• While clock rates of high-end processors have increased at roughly 40% per year over the past decade, DRAM access times have improved at only about 10% per year over the same interval (a rough worked illustration follows below).
• Data Communication Argument:
• As the networking infrastructure evolves, the vision of using the Internet as one large heterogeneous parallel/distributed computing environment has begun to take shape.
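As a rough worked illustration of the memory-speed gap (an approximation, not a figure from the slides): growth of about 40% per year versus 10% per year means the processor-memory speed ratio widens by roughly (1.40 / 1.10)^10 ≈ 11x over a decade, which is why hiding or tolerating memory latency becomes a central concern for parallel platforms.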


Overview of Parallel computing


• Applications
• Applications in Engineering and Design
• Scientific Applications
• Applications in Computer Systems
• More…..


Flynn’s Taxonomy (Flynn’s classification of computers; a GATE topic for CSE)
• What is Flynn’s classification of computers?
• M. J. Flynn proposed a classification of computer organization based on the number of instruction streams and data streams that are handled simultaneously.
• An instruction stream is a sequence of instructions read from memory.
• A data stream is the sequence of data items (operands and results) that the instructions operate on in the processor.
• The term ‘stream’ refers to the flow of data or instructions.
• Parallel processing can happen in the data stream, the instruction stream, or both.


Flynn’s Classification
• SISD: SISD is an abbreviation for Single Instruction and Single Data stream. It depicts the structure of a single computer, which includes a control unit, a memory unit, and a processor unit.
• Like classic von Neumann computers, most conventional computers use the SISD architecture.
(PE = Processing Element, CU = Control Unit, M = Memory)
➢ Examples: minicomputers, workstations, and computers from previous generations.

Flynn’s Classification
• SIMD: SIMD is an abbreviation for
Single Instruction and Multiple Data
Stream. It symbolizes an organization
with a large number of processing
units overseen by a common control
unit.
• SIMD was created with array
processing devices in mind. Vector
processors, on the other hand, can be
included in this category according to
Flynn’s taxonomy. There are
architectures that are not vector
processors but are SIMD
architectures.

Flynn’s Classification
• MISD: MISD is an abbreviation for
Multiple Instruction and Single Data
stream. Because no real system has
been built using the MISD structure, it
is primarily of theoretical importance.
• Multiple processing units work on a
single data stream in MISD. Each
processing unit works on the data in
its own way, using its own instruction
stream.
Here, M = Memory Modules, P = Processor
Units, and CU = Control Unit


Flynn’s Classification
• MIMD: MIMD is an abbreviation
for Multiple Instruction and
Multiple Data Stream. All
processors of a parallel computer
may execute distinct instructions
and act on different data at the
same time in this organization.
• Each processor in MIMD has its
own program, and each program
generates an instruction stream.
(PE = Processing Element, M = Memory Module, CU = Control Unit)


Multi-Core Processors
• A multicore processor is an integrated circuit that has two or more
processor cores attached for enhanced performance and reduced
power consumption.
• These processors also enable more efficient simultaneous processing
of multiple tasks, such as with parallel processing and multithreading.
• A dual core setup is similar to having multiple, separate processors
installed on a computer. However, because the two processors are
plugged into the same socket, the connection between them is faster.
• The use of multicore processors or microprocessors is one approach
to boost processor performance without exceeding the practical
limitations of semiconductor design and fabrication.
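A quick way to see the cores from software (a minimal OpenMP sketch, not code from the slides) is to query how many logical processors the runtime sees and run one thread per processor:

    // Report the logical processor count and let each thread identify itself.
    #include <stdio.h>
    #include <omp.h>

    int main(void) {
        printf("Logical processors available: %d\n", omp_get_num_procs());

        #pragma omp parallel
        {
            // Output order is nondeterministic; typically one thread per core/hardware thread.
            printf("Hello from thread %d of %d\n",
                   omp_get_thread_num(), omp_get_num_threads());
        }
        return 0;
    }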


Multi-Core Processors
• Multicore processors working concept: The heart of every processor
is an execution engine, also known as a core. The core is designed to
process instructions and data according to the direction of software
programs in the computer's memory.
• Clock speed. One approach was to make the processor's clock faster. The
clock is the "drumbeat" used to synchronize the processing of instructions
and data through the processing engine.
• Hyper-threading. Another approach involved the handling of multiple
instruction threads. Intel calls this hyper-threading. With hyper-threading,
processor cores are designed to handle two separate instruction threads at
the same time.


Multi-Core Processors
• More chips. The next step was to add processor chips -- or dies -- to the
processor package, which is the physical device that plugs into the motherboard.
A dual-core processor includes two separate processor cores.

Multicore processor architecture



Multi-Core Processors
• Types of multicore processors: Different multicore processors often
have different numbers of cores. For example, a quad-core processor
has four cores.
• Core types:
• Homogeneous (symmetric) cores. All of the cores in a homogeneous
multicore processor are of the same type; typically, the core processing units
are general-purpose central processing units that run a single multicore
operating system.
• Heterogeneous (asymmetric) cores. Heterogeneous multicore processors have a mix of core types that often run different operating systems and include graphics processing units.


Multi-Core Processors
• Number and level of caches. Multicore processors vary in terms of
their instruction and data caches, which are relatively small and fast
pools of local memory.
• How cores are interconnected. Multicore processors also vary in
terms of their bus architectures.
• Isolation. The amount, typically minimal, of in-chip support for the
spatial and temporal isolation of cores:
• Physical isolation ensures that different cores cannot access the same
physical hardware (e.g., memory locations such as caches and RAM).
• Temporal isolation ensures that the execution of software on one core does
not impact the temporal behavior of software running on another core.


Multi-Core Processors

Homogeneous Multicore Processor



Multi-Core Processors

Heterogeneous Multicore Processor



Application of Multicore processors


• Multicore processors work on any modern computer hardware platform. Virtually all PCs and laptops today include some multicore processor model.
• Virtualization. A virtualization platform, such as VMware, is designed to
abstract the software environment from the underlying hardware.
Virtualization is capable of abstracting physical processor cores into virtual
processors or central processing units (vCPUs) which are then assigned to
virtual machines (VMs).
• Databases. A database is a complex software platform that frequently needs
to run many simultaneous tasks such as queries.
• Analytics and HPC. Big data analytics, such as machine learning, and high-
performance computing (HPC) both require breaking large, complex tasks into
smaller and more manageable pieces.


Application of Multicore processors

• Cloud. Organizations building a cloud will almost certainly adopt


multicore processors to support all the virtualization needed to
accommodate the highly scalable and highly transactional demands
of cloud software platforms such as OpenStack
• Visualization. Graphics applications, such as games and data-
rendering engines, have the same parallelism requirements as other
HPC applications


Shared Vs Distributed Memory


• Shared Memory: Shared memory is memory that all processors can access. From a hardware point of view, this means all processors have direct access to a common physical memory, typically over a bus-based (wired) interconnect.
• Uniform Memory Access (UMA):
• Most commonly represented today by Symmetric Multiprocessor (SMP) machines
• Identical processors
• Equal access and access times to memory
• Sometimes called CC-UMA (Cache Coherent UMA). Cache coherent means that if one processor updates a location in shared memory, all the other processors know about the update. Cache coherency is accomplished at the hardware level.
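A minimal sketch of what the shared-memory model means for software (an illustrative OpenMP example, not taken from the slides): every thread reads and writes the same variable directly, so concurrent updates must be synchronized.

    // All threads update the same shared counter; the atomic directive prevents lost updates.
    #include <stdio.h>
    #include <omp.h>

    int main(void) {
        long counter = 0;                      // resides in memory visible to every thread

        #pragma omp parallel for
        for (int i = 0; i < 1000000; i++) {
            #pragma omp atomic
            counter++;                         // synchronized read-modify-write
        }

        printf("counter = %ld\n", counter);    // expect 1000000
        return 0;
    }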


Shared Vs Distributed Memory


• Non-Uniform Memory Access (NUMA):
• Often made by physically linking two or more
SMPs
• One SMP can directly access memory of another
SMP
• Not all processors have equal access time to all
memories
• Memory access across link is slower
• If cache coherency is maintained, then may also be
called CC-NUMA - Cache Coherent NUMA


Shared Vs Distributed Memory


• Distributed Memory: Distributed memory, in the hardware sense, refers to the case where a processor can access another processor's memory only through a network. In the software sense, it means each processor can directly see only its local machine's memory and must communicate over the network to access the memory of other processors.
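By contrast, in a distributed-memory program data must be moved with explicit messages over the network. The sketch below is a hypothetical example using MPI (the usual message-passing library; it is not covered on these slides): rank 0 sends a value that rank 1 cannot otherwise see.

    // Distributed-memory illustration: each process has private memory,
    // so the value travels from rank 0 to rank 1 as an explicit message.
    // Typical build/run: mpicc demo.c -o demo && mpirun -np 2 ./demo
    #include <stdio.h>
    #include <mpi.h>

    int main(int argc, char *argv[]) {
        int rank, value;
        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);

        if (rank == 0) {
            value = 42;
            MPI_Send(&value, 1, MPI_INT, 1, 0, MPI_COMM_WORLD);       // send to rank 1
        } else if (rank == 1) {
            MPI_Recv(&value, 1, MPI_INT, 0, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
            printf("Rank 1 received %d over the network\n", value);
        }

        MPI_Finalize();
        return 0;
    }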


Introduction to OpenMP Programming


• OpenMP is a standard parallel programming API for shared-memory environments, available for C, C++, and Fortran.
• It consists of a set of compiler directives with a “lightweight” syntax, library routines, and environment variables that influence run-time behavior.
• OpenMP is governed by the OpenMP Architecture Review Board (OpenMP ARB) and is jointly defined by several hardware and software vendors.


Introduction to OpenMP Programming

OpenMP Solution Stack



Introduction to OpenMP Programming


• Assignment
• Follow the instructions in the material to install OpenMP.
• Create a simple parallel loop with the code given in the material.
• The following are to be submitted as an assignment in Google Classroom:
  • Getting started with OpenMP
  • Introduction to parallel programming
  • Hello world and how threads work
  • The core features of OpenMP
  • Creating threads (the Pi program; a sketch follows after the example code below)
  • Parallel loops (making the Pi program simpler)
  • Working with OpenMP
  • Synchronization: single, master, and related constructs

Example code from the material:

    //%compiler: clang
    //%cflags: -fopenmp
    #include <stdio.h>
    #include <stdlib.h>
    #include <omp.h>

    int main(int argc, char *argv[]) {
        #pragma omp parallel
        printf("%s\n", "Hello World");

        return 0;
    }
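As a minimal sketch of the "Pi program" exercise referenced above (a standard OpenMP example; the step count and structure here are illustrative, not taken from the slides): pi is approximated by numerically integrating 4/(1+x^2) over [0,1], with the loop parallelized and the partial sums combined through a reduction.

    // Approximate pi by the midpoint rule; compile with -fopenmp.
    #include <stdio.h>
    #include <omp.h>

    int main(void) {
        const long num_steps = 100000000;            // illustrative step count
        const double step = 1.0 / (double)num_steps;
        double sum = 0.0;

        // Each thread accumulates a private partial sum; reduction(+) combines them.
        #pragma omp parallel for reduction(+:sum)
        for (long i = 0; i < num_steps; i++) {
            double x = (i + 0.5) * step;             // midpoint of interval i
            sum += 4.0 / (1.0 + x * x);
        }

        printf("pi ~= %.15f\n", step * sum);
        return 0;
    }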

Single Instruction, Multiple Data (SIMD)

• Single instruction, multiple data (SIMD) is a form of parallel execution in which the same operation is performed on multiple data elements independently by hardware vector processing units (VPUs), also called SIMD units.
• The addition of two vectors to form a third vector is a SIMD operation. Many processors have SIMD (vector) units that can simultaneously perform 2, 4, 8, or more executions of the same operation (in a single SIMD unit).
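A small sketch of the vector-addition example just described (an illustration, not code from the slides): the OpenMP simd directive asks the compiler to emit SIMD/vector instructions so several additions are performed at once. Compile with -fopenmp (or an equivalent flag) so the directive is honored.

    // Element-wise vector addition c[i] = a[i] + b[i], a candidate for SIMD execution.
    #include <stdio.h>

    #define N 1024

    int main(void) {
        float a[N], b[N], c[N];

        for (int i = 0; i < N; i++) { a[i] = (float)i; b[i] = 2.0f * (float)i; }

        // Hint to the compiler that iterations are independent and can be vectorized.
        #pragma omp simd
        for (int i = 0; i < N; i++)
            c[i] = a[i] + b[i];

        printf("c[10] = %.1f\n", c[10]);   // expect 30.0
        return 0;
    }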



Vector Processing
• A vector processor is a central processing unit that can execute an operation on an entire vector of input data with a single instruction. More specifically, it is a unit of hardware resources that processes a sequential set of similar data items in memory using a single instruction.



Characteristics of vector processing

• A vector is defined as an ordered, one-dimensional array of data items. A vector V of length n can be represented as a row vector by V = [V1 V2 V3 · · · Vn]; if the data items are listed in a column, it may be represented as a column vector. For a processor with multiple ALUs, it is possible to operate on multiple data elements in parallel using a single instruction. Such instructions are called single-instruction multiple-data (SIMD) instructions; they are also called vector instructions.

Instruction format of the vector processor:
Operation code | Base address (source 1) | Base address (source 2) | Base address (destination) | Vector length
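As an illustrative (hypothetical) instance of this format: a vector add might encode Operation code = ADDV, Base address source 1 = 0x1000 (array A), Base address source 2 = 0x2000 (array B), Base address destination = 0x3000 (array C), and Vector length = 64, so the single instruction computes C[i] = A[i] + B[i] for i = 0 ... 63.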


Classification of vector processors

Register-to-Register Architecture: This architecture is widely used in vector computers. Operands and intermediate results are fetched from main memory indirectly, through vector registers.
Memory-to-Memory Architecture: Operands and results are fetched directly from memory, without going through registers.


Vector Processors
• Advantages of vector processors:
• A vector processor uses vector instructions, which improves the code density of a program.
• The sequential arrangement of data helps the hardware handle the data more efficiently.
• It offers a reduction in instruction bandwidth.
• From the above discussion, we can conclude that the register-to-register architecture is preferable to the memory-to-memory architecture because it offers a reduction in vector access time.


Graphical Processing Units (GPUs)


• GPU stands for graphics processing unit. GPUs were originally designed specifically to accelerate computer graphics workloads, particularly 3D graphics.
• GPU parallel computing, the ability to perform many tasks at once, is now used in a wide range of applications, including graphics and video rendering.
• Parallel computing enables GPUs to break complex problems into thousands or millions of separate tasks and work them out all at once, instead of one by one as a CPU must.
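As a hedged sketch of how such offloading can look in the same OpenMP style used earlier in this unit (assuming a compiler built with GPU offload support, e.g. clang or gcc with target offloading; this example is illustrative, not from the slides): the loop below is handed to the device, where its million independent iterations can run in parallel.

    // Offload a data-parallel loop to an accelerator; falls back to the host if no device exists.
    #include <stdio.h>

    #define N 1000000

    int main(void) {
        static float a[N], b[N], c[N];

        for (int i = 0; i < N; i++) { a[i] = 1.0f; b[i] = 2.0f; }

        #pragma omp target teams distribute parallel for map(to: a, b) map(from: c)
        for (int i = 0; i < N; i++)
            c[i] = a[i] + b[i];

        printf("c[0] = %.1f\n", c[0]);   // expect 3.0
        return 0;
    }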



Recap
Motivation
Key Concepts and Challenges
Overview of Parallel computing
Flynn’s Taxonomy
Multi-core processors
Shared vs Distributed memory
Introduction to OpenMP programming
Instruction Level Support for Parallel Programming
SIMD
Vector Processing
GPUs.

End of Unit 1
