Lecture 2 General Parallelism Terms

The document discusses the fundamentals of parallel and distributed computing, focusing on the von Neumann architecture and various classifications of parallel computers, including Flynn's Taxonomy. It explains different types of parallel processing such as SISD, SIMD, MISD, and MIMD, along with key terminologies like supercomputing, synchronization, and scalability. Additionally, it highlights the necessity of parallel computing due to the rise of multi-core machines and the benefits it offers, including reduced execution time and energy efficiency.


CSC334 Parallel & Distributed Computing

Lecture # 02
General Parallelism Terminologies
von Neumann Architecture
• Authored the general
requirements for an electronic
computer in his 1945 papers
• Also known as "stored-program
computer"
• Both program instructions and
data are kept in electronic
memory
• Since then, virtually all computers
have followed his design
John von Neumann
Von Neumann Architecture
Four Main Components
• Memory: Read/write, random access memory is used
to store both program instructions and data
• Program instructions are coded data which tell
the computer to do something
• Data is simply information to be used by the
program
• Control unit fetches instructions/data from
memory, decodes the instructions and then
sequentially coordinates operations to
accomplish the programmed task.
• Arithmetic Unit performs basic arithmetic operations
• Input/Output is the interface to the human operator
• Parallel computers still follow this basic design, just multiplied in units
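As a rough illustration of the stored-program idea, here is a minimal C sketch of the fetch-decode-execute cycle the control unit carries out; the opcodes, instruction encoding, and memory layout are invented purely for this example.

```c
#include <stdio.h>

/* Minimal sketch of a stored-program machine: program and data share one
 * memory array, and the control loop repeatedly fetches, decodes, and
 * executes instructions. Opcodes and encoding are invented for illustration. */
enum { HALT = 0, LOAD = 1, ADD = 2, STORE = 3 };

int main(void) {
    int memory[16] = {
        LOAD,  8,   /* acc = memory[8]            */
        ADD,   9,   /* acc = acc + memory[9]      */
        STORE, 10,  /* memory[10] = acc           */
        HALT,  0,
        2, 3, 0     /* data: 2, 3, and a result slot at memory[10] */
    };
    int pc = 0, acc = 0;

    for (;;) {
        int opcode  = memory[pc];       /* fetch */
        int operand = memory[pc + 1];
        pc += 2;
        switch (opcode) {               /* decode and execute */
        case LOAD:  acc = memory[operand];  break;
        case ADD:   acc += memory[operand]; break;
        case STORE: memory[operand] = acc;  break;
        case HALT:  printf("result = %d\n", memory[10]); return 0;
        }
    }
}
```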
Flynn's Classical Taxonomy
• Different ways to classify parallel
computers
• One of the more widely used
classifications, in use since 1966,
is called Flynn's Taxonomy
• Distinguishes multi-processor
computer architectures
according to two dimensions
• Instruction Stream
• Data Stream
SISD (Scalar Processing)
• A serial (non-parallel) computer
• Single Instruction: Only one instruction
stream is being acted on by the CPU
during any one clock cycle
• Single Data: Only one data stream is
being used as input during any one clock
cycle
• Deterministic execution
• This is the oldest type of computer
• Examples: older generation mainframes,
minicomputers, workstations and single
processor/core PCs.
SIMD (Vector Processing)
• Single Instruction: All processing units
execute the same instruction at any given
clock cycle
• Multiple Data: Each processing unit can
operate on a different data element
• Best suited for specialized problems such as
graphics/image processing.
• Synchronous and deterministic execution
• Two varieties: Processor Arrays and Vector Pipelines
• Examples:
• Processor Arrays: Thinking Machines CM-2
• Vector Pipelines: IBM 9000
• Most modern computers with GPU units
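As a hedged sketch of how SIMD appears in ordinary code: the loop below applies the same instruction to many data elements, so a vectorizing compiler (e.g. gcc with -O3 -march=native) can typically map it onto vector instructions; the array names and size are illustrative only.

```c
#include <stdio.h>

#define N 1024

/* Each iteration applies the SAME operation (multiply-add) to a DIFFERENT
 * data element -- the SIMD pattern. A vectorizing compiler can typically
 * process several doubles per vector instruction. */
void axpy(double a, const double *x, const double *y, double *z, int n) {
    for (int i = 0; i < n; i++)
        z[i] = a * x[i] + y[i];
}

int main(void) {
    static double x[N], y[N], z[N];
    for (int i = 0; i < N; i++) { x[i] = i; y[i] = 2.0 * i; }
    axpy(3.0, x, y, z, N);
    printf("z[10] = %f\n", z[10]);   /* 3*10 + 2*10 = 50 */
    return 0;
}
```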
Scalar vs Vector
(SIMD)
Multiple Instruction, Single Data
(MISD)
• Multiple Instruction: Each processing
unit operates on the data independently
via separate instruction streams.
• Single Data: A single data stream is fed
into multiple processing units.
• Few (if any) actual examples of this class
of parallel computer have ever existed.
• Some conceivable uses might be:
• multiple frequency filters operating on a
single signal stream
• multiple cryptography algorithms
attempting to crack a single coded
message.
Multiple Instruction, Multiple
Data (MIMD)
• Multiple Instruction: Every processor may
be executing a different instruction stream
• Multiple Data: Every processor may be
working with a different data stream
• Execution can be synchronous or
asynchronous, deterministic or non-
deterministic
• Currently, the most common type of
parallel computer - most modern
supercomputers fall into this category
• Examples: most current supercomputers,
networked parallel computer clusters,
grids, multi-core PCs
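A minimal OpenMP sketch (assuming a compiler with OpenMP support, e.g. gcc -fopenmp) of the MIMD idea on a multi-core PC: different threads execute different instruction streams on different data at the same time; the two workloads are placeholders.

```c
#include <stdio.h>
#include <omp.h>

/* MIMD in miniature: two threads execute DIFFERENT instruction streams
 * (a sum and a product) on DIFFERENT data at the same time.
 * Compile with e.g. `gcc -fopenmp mimd.c`. */
int main(void) {
    int sum = 0;
    long product = 1;

    #pragma omp parallel sections
    {
        #pragma omp section
        {   /* instruction stream 1, data stream 1 */
            for (int i = 1; i <= 100; i++) sum += i;
        }
        #pragma omp section
        {   /* instruction stream 2, data stream 2 */
            for (int i = 1; i <= 10; i++) product *= i;
        }
    }
    printf("sum = %d, product = %ld\n", sum, product);
    return 0;
}
```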
MIMD
Classification
• Shared Memory
• Distributed Memory
Shared vs Distributed
Memory
Some General Parallel Terminology
• Supercomputing / High Performance Computing (HPC)
• Using the world's fastest and largest computers to solve large problems
• Node
• A standalone "computer in a box". Usually comprised of multiple CPUs/processors/cores,
memory, network interfaces, etc. Nodes are networked together to comprise a
supercomputer.
• CPU / Socket / Processor / Core
• In the past, a CPU (Central Processing Unit) was a singular execution component
• Then, multiple CPUs were incorporated into a node
• Then, individual CPUs were subdivided into multiple "cores", each being a unique execution
unit
• CPUs with multiple cores are sometimes called "sockets"
• The result is a node with multiple CPUs, each containing multiple cores
Some General Parallel Terminology
• Task
• A logically discrete section of computational work. A task is typically a program or
program-like set of instructions that is executed by a processor. A parallel program
consists of multiple tasks running on multiple processors.
• Pipelining
• Breaking a task into steps performed by different processor units, with inputs
streaming through, much like an assembly line; a type of parallel computing.
• Shared Memory
• From a strictly hardware point of view, describes a computer architecture where all
processors have direct (usually bus based) access to common physical memory. In a
programming sense, it describes a model where parallel tasks all have the same
"picture" of memory and can directly address and access the same logical memory
locations regardless of where the physical memory actually exists.
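To make the shared-memory programming model above concrete, here is a short OpenMP sketch (compiled with -fopenmp) in which all threads directly address the same logical array with no explicit data movement; the array name and size are illustrative.

```c
#include <stdio.h>
#include <omp.h>

#define N 8

/* Shared-memory model: every thread addresses the SAME array `data`
 * directly; no messages are needed. Compile with `gcc -fopenmp`. */
int main(void) {
    int data[N];

    #pragma omp parallel for
    for (int i = 0; i < N; i++)
        data[i] = i * i;          /* each thread writes its own elements */

    for (int i = 0; i < N; i++)
        printf("%d ", data[i]);
    printf("\n");
    return 0;
}
```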
Some General Parallel Terminology
• Symmetric Multi-Processor (SMP)
• Shared memory hardware architecture where multiple processors share a single
address space and have equal access to all resources.
• Distributed Memory
• In hardware, refers to network based memory access for physical memory that is not
common. As a programming model, tasks can only logically "see" local machine
memory and must use communications to access memory on other machines where
other tasks are executing.
• Communications
• Parallel tasks typically need to exchange data. There are several ways this can be
accomplished, such as through a shared memory bus or over a network, however the
actual event of data exchange is commonly referred to as communications regardless
of the method employed
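As a sketch of the Distributed Memory and Communications bullets above, a minimal MPI program (assuming an MPI installation; build with mpicc and run with mpirun -np 2): each task owns a private copy of `value`, and data moves only through explicit send/receive communication.

```c
#include <stdio.h>
#include <mpi.h>

/* Distributed-memory model: each task has its OWN private `value`;
 * task 1 can only obtain task 0's data through explicit communication.
 * Build with `mpicc comm.c` and run with `mpirun -np 2 ./a.out`. */
int main(int argc, char **argv) {
    int rank, value;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    if (rank == 0) {
        value = 42;                                   /* local data on task 0 */
        MPI_Send(&value, 1, MPI_INT, 1, 0, MPI_COMM_WORLD);
    } else if (rank == 1) {
        MPI_Recv(&value, 1, MPI_INT, 0, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
        printf("task 1 received %d from task 0\n", value);
    }

    MPI_Finalize();
    return 0;
}
```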
Some General Parallel Terminology
• Synchronization
• The coordination of parallel tasks in real time, very often associated with communications.
• Often implemented by establishing a synchronization point within an application where a task may not proceed further until another task(s)
reaches the same or logically equivalent point.
• Synchronization usually involves waiting by at least one task and can therefore cause a parallel application's wall clock execution time to
increase.
• Granularity
• granularity is a qualitative measure of the ratio of computation to communication
• Coarse: relatively large amounts of computational work are done between communication events
• Fine: relatively small amounts of computational work are done between communication events
• Observed Speedup
• Observed speedup of a code which has been parallelized, defined as:
wall-clock time of serial execution
-----------------------------------
wall-clock time of parallel execution
• One of the simplest and most widely used indicators for a parallel program's performance.
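A small, hedged sketch of measuring observed speedup with OpenMP's omp_get_wtime(); the workload (a large summation) is only a placeholder, and real numbers depend on the machine and compiler flags (e.g. gcc -O2 -fopenmp).

```c
#include <stdio.h>
#include <omp.h>

#define N 100000000L

/* Observed speedup = serial wall-clock time / parallel wall-clock time.
 * The summation below is just a placeholder workload. */
int main(void) {
    double sum = 0.0, t0;

    t0 = omp_get_wtime();
    for (long i = 0; i < N; i++) sum += (double)i;
    double t_serial = omp_get_wtime() - t0;
    printf("serial:   sum = %.6g, time = %.3f s\n", sum, t_serial);

    sum = 0.0;
    t0 = omp_get_wtime();
    #pragma omp parallel for reduction(+:sum)
    for (long i = 0; i < N; i++) sum += (double)i;
    double t_parallel = omp_get_wtime() - t0;
    printf("parallel: sum = %.6g, time = %.3f s\n", sum, t_parallel);

    printf("observed speedup = %.2f\n", t_serial / t_parallel);
    return 0;
}
```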
Some General Parallel Terminology
• Parallel Overhead
• The amount of time required to coordinate parallel tasks, as opposed to doing useful work.
Parallel overhead can include factors such as:
• Task start-up time
• Synchronizations
• Data communications
• Software overhead imposed by parallel languages, libraries, operating system, etc.
• Task termination time
• Massively Parallel
• Refers to the hardware that comprises a given parallel system - having many processing
elements. The meaning of "many" keeps increasing, but currently, the largest parallel
computers are comprised of processing elements numbering in the hundreds of thousands
• Embarrassingly Parallel
• Solving many similar, but independent tasks simultaneously; little to no need for coordination
between the tasks
Some General Parallel Terminology
• Scalability
• Refers to a parallel system's (hardware and/or software) ability to
demonstrate a proportionate increase in parallel speedup with the addition of
more resources. Factors that contribute to scalability include:
• Hardware - particularly memory-CPU bandwidths and network communication
properties
• Application algorithm
• Parallel overhead
• Characteristics of your specific application
Why is Parallel Computing Necessary?
• Rise of multi-core computing machines
• Under-utilization of resources
• Hyperthreading, introduced by Intel
Example
• Let’s take a 16-core CPU with
hyperthreading and a 256 bit-wide
vector unit, commonly found in home
desktops. A serial program using a
single core and no vectorization only
uses 0.8% of the theoretical
processing capability of this processor.
• The calculation is
• 16 cores × 2 hyperthreads × (256 bit-
wide vector unit)/(64-bit double) = 128-
way parallelism
where 1 serial path/128 parallel paths
≈ 0.008, or 0.8%. This is a small fraction
of the total CPU processing power.
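The same arithmetic as a tiny C sketch; the core count, hyper-thread count, and vector width are simply the example's numbers, not values queried from real hardware.

```c
#include <stdio.h>

/* Reproduces the example's arithmetic: theoretical parallel paths on a
 * 16-core CPU with 2-way hyper-threading and a 256-bit vector unit
 * operating on 64-bit doubles. Values are the example's, not measured. */
int main(void) {
    int cores = 16;
    int hyperthreads_per_core = 2;
    int vector_width_bits = 256;
    int double_width_bits = 64;

    int parallel_paths = cores * hyperthreads_per_core
                       * (vector_width_bits / double_width_bits);

    printf("parallel paths  = %d\n", parallel_paths);             /* 128  */
    printf("serial fraction = %.1f%%\n", 100.0 / parallel_paths); /* 0.8%% */
    return 0;
}
```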
Hyper-threading
• Hyper-threading is a process by which a CPU divides up its physical
cores into virtual cores that are treated as if they are actually physical
cores by the operating system.
• These virtual cores are also called threads.
• Most Intel CPUs with 2 physical cores use this process to expose 4 threads,
or 4 virtual cores.
• Likewise, Intel CPUs with 4 physical cores use hyper-threading to expose
8 virtual cores, or 8 threads.
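As a rough way to observe hyper-threading from software on Linux/POSIX systems, sysconf reports the number of logical processors the operating system sees; on a hyper-threaded 4-core CPU this would typically print 8.

```c
#include <stdio.h>
#include <unistd.h>

/* On Linux/POSIX systems, sysconf reports the number of LOGICAL
 * processors the OS sees -- i.e. hardware threads. On a 4-core CPU
 * with 2-way hyper-threading this typically prints 8. */
int main(void) {
    long logical_cpus = sysconf(_SC_NPROCESSORS_ONLN);
    printf("logical processors (hardware threads): %ld\n", logical_cpus);
    return 0;
}
```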
Benefits of Parallel Computing
• Lesser execution time with multiple cores
• Solve larger and complex problems
• Energy efficient system
• Cost reduction
