HIGH PERFORMANCE
COMPUTING
INFRASTRUCTURES
NGDT - II
CONTENTS
• Introduction to HPC
• Parallel Architectures
• Multi Cores
• Graphical Processing Units
• Clusters
• Grid Computing
• Cloud Computing
HIGH PERFORMANCE COMPUTING (HPC)
• Ability to process data and perform complex calculations at high speeds
• One of the best-known types of HPC solutions is the supercomputer
• Supercomputer contains thousands of compute nodes that work
together to complete one or more tasks
The IBM Blue Gene/P supercomputer
"Intrepid" at Argonne National Laboratory
runs 164,000 processor cores using
normal data center air conditioning,
grouped in 40 racks/cabinets connected
by a high-speed 3-D torus network
WHEN DO WE NEED HPC?
• Case1: Complete a time-consuming operation in less time
 I am an automotive engineer
 I need to design a new car that consumes less gasoline
 I’d rather have the design completed in 6 months than in 2 years
 I want to test my design using computer simulations rather than building very
expensive prototypes and crashing them
• Case 2: Complete an operation under a tight deadline
 I work for a weather prediction agency
 I am getting input from weather stations/sensors
 I’d like to predict tomorrow’s forecast today
• Case 3: Perform a high number of operations per seconds
 I am an engineer at Amazon.com
 My web server gets 1,000 hits per second
 I’d like my web server and databases to handle 1,000 transactions per second so that customers do not experience bad delays
WHAT DOES HPC INCLUDE?
• High-performance computing is fast computing
• Computations in parallel over lots of compute elements (CPU, GPU)
• Very fast network to connect between the compute elements
• Hardware
• Computer Architecture
• Vector Computers, Distributed Systems, Clusters
• Network Connections
• InfiniBand, Ethernet, Proprietary
• Software
• Programming models
• MPI (Message Passing Interface), SHMEM (Shared Memory), PGAS, etc.
• Applications
• Open source, commercial
HOW DOES HPC WORK?
• HPC solutions have three main components:
Compute
Network
Storage
• To build a high-performance computing architecture, compute servers
are networked together into a cluster
• Software programs and algorithms are run simultaneously on the
servers in the cluster
• Cluster is networked to the data storage to capture the output
• Together, these components operate seamlessly to complete a diverse
set of tasks
PARALLEL ARCHITECTURES
• Traditionally, software has been written for serial computation
• A problem is broken into a discrete series of instructions
• Instructions are executed sequentially one after another
• Executed on a single processor
• Only one instruction may execute at any moment in time
• Parallel computing is the simultaneous use of multiple compute
resources to solve a computational problem
• A problem is broken into discrete parts that can be solved concurrently
• Each part is further broken down to a series of instructions
• Instructions from each part execute simultaneously on different processors
• An overall control/coordination mechanism is employed
PARALLEL ARCHITECTURES
Serial Computing Parallel Computing
PARALLEL ARCHITECTURES
• Virtually all stand-alone computers today are parallel from a
hardware perspective:
Multiple functional units (L1 cache, L2 cache, branch, pre-fetch, decode, floating-point, graphics processing (GPU), integer, etc.)
Multiple execution units/cores
Multiple hardware threads
• Networks connect multiple stand-alone computers (nodes) to make
larger parallel computer cluster
WHY USE PARALLEL
ARCHITECTURES?
• Save time and/or money
• Solve larger / more complex problems
• Provide concurrency
• Take advantage of non-local resources
• Make better use of underlying parallel hardware
TYPES OF PARALLELISM
• Data Parallelism
 Focuses on distributing the data across different parallel computing nodes
 Also called loop-level parallelism
 Example: CPU A could add all elements from the top half of the matrices, while
CPU B could add all elements from the bottom half of the matrices
 Since the two processors work in parallel, the job of performing matrix addition
would take one half the time of performing the same operation in serial using one
CPU alone
• Task Parallelism
 Focuses on distribution of tasks across different processors
 Also known as functional parallelism or control parallelism
 As a simple example, if we are running code on a 2-processor system (CPUs "a" and "b") in a parallel environment and we wish to do tasks "A" and "B", it is possible to tell CPU "a" to do task "A" and CPU "b" to do task "B" simultaneously, reducing the runtime of the execution
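The two kinds of parallelism above can be sketched with Python threads. This is an illustrative sketch only: the function names are mine, and real HPC codes would use processes, MPI, or OpenMP rather than Python threads.

```python
from concurrent.futures import ThreadPoolExecutor

def add_rows(a_rows, b_rows):
    """Element-wise addition of two blocks of matrix rows."""
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a_rows, b_rows)]

def data_parallel_add(A, B):
    """Data parallelism: one worker adds the top half of the matrices,
    another adds the bottom half, concurrently."""
    half = len(A) // 2
    halves = [(A[:half], B[:half]), (A[half:], B[half:])]
    with ThreadPoolExecutor(max_workers=2) as ex:
        top, bottom = ex.map(lambda ab: add_rows(*ab), halves)
    return top + bottom

def task_parallel(task_a, task_b):
    """Task parallelism: two *different* tasks run concurrently."""
    with ThreadPoolExecutor(max_workers=2) as ex:
        fa, fb = ex.submit(task_a), ex.submit(task_b)
        return fa.result(), fb.result()
```

Note that splitting the data in half only halves the runtime when the work divides evenly and communication overhead is negligible.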
PARALLEL ARCHITECTURES
• Flynn's taxonomy distinguishes multi-processor computer architectures
according to how they can be classified along the two independent
dimensions of Instruction Stream and Data Stream
• Flynn’s classical taxonomy of Parallel Architectures:
• SISD – Single Instruction stream Single Data stream
• SIMD – Single Instruction stream Multiple Data stream
• MISD – Multiple Instruction stream Single Data stream
• MIMD – Multiple Instruction stream Multiple Data stream
FLYNN’S CLASSICAL TAXONOMY
SISD
• Serial
• Only one instruction and data stream is acted on during any one clock cycle
• Examples: older generation mainframes, minicomputers, workstations and single processor/core PCs
SIMD
• All processing units execute the same instruction at any given clock cycle
• Each processing unit operates on a different data element
• Most modern computers, particularly those with GPUs, employ SIMD instructions and execution units
MISD
• Different instructions operate on a single data element
• Example: multiple cryptography algorithms attempting to crack a single coded message
MIMD
• Can execute different instructions on different data elements
• Examples: most current supercomputers, networked parallel computer clusters and "grids", multi-processor computers, multi-core PCs
PARALLEL COMPUTER MEMORY ARCHITECTURES: SHARED MEMORY
• All processors access all memory as a single global address space, and data sharing is fast
• Multiple processors can operate independently but share the same memory resources
• Changes in a memory location effected by one processor are visible to all other processors
• Shared memory machines have been classified as UMA and NUMA, based upon memory access times
Uniform Memory Access (UMA)
• Commonly represented today by Symmetric Multiprocessor (SMP) machines
• Identical processors, with equal access and access times to memory
• Sometimes called CC-UMA (Cache Coherent UMA): if one processor updates a location in shared memory, all the other processors know about the update
Non-Uniform Memory Access (NUMA)
• Often made by physically linking two or more SMPs
• One SMP can directly access the memory of another SMP
• Not all processors have equal access time to all memories
• Memory access across the link is slower
• If cache coherency is maintained, may also be called CC-NUMA (Cache Coherent NUMA)
PARALLEL COMPUTER MEMORY ARCHITECTURES: DISTRIBUTED MEMORY
• Each processor has its own local memory and operates independently
• Changes a processor makes to its local memory have no effect on the memory of other processors; hence, the concept of cache coherency does not apply
• When a processor needs access to data on another processor, it is usually the task of the programmer to explicitly define how and when the data is communicated
• The programmer is responsible for many details of communication between processors
• Synchronization between tasks is likewise the programmer's responsibility
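The explicit send/receive style that distributed memory imposes can be emulated with queues standing in for network messages. This is a hedged, single-process sketch (the function names are mine); on a real distributed-memory machine the messages would travel over an interconnect via MPI or similar.

```python
import queue
import threading

def node(inbox, outbox):
    """A 'node' sees only the data explicitly sent to it --
    there is no shared address space in this model."""
    chunk = inbox.get()        # receive a message
    outbox.put(sum(chunk))     # send a result message back

def scatter_sum(chunks):
    """Scatter chunks to nodes, gather partial sums, combine on the 'master'."""
    outbox = queue.Queue()
    threads = []
    for chunk in chunks:
        inbox = queue.Queue()
        inbox.put(chunk)       # explicit send to that node
        t = threading.Thread(target=node, args=(inbox, outbox))
        t.start()
        threads.append(t)
    for t in threads:
        t.join()
    return sum(outbox.get() for _ in chunks)
```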
MULTI CORES
• A multi-core processor is a single computing component with two or more
independent processing units called cores, which read and execute program
instructions
Multi-Core CPU Chip
Single Core CPU Chip
MULTI-CORES
• The cores fit on a single processor socket
• Also called CMP (Chip Multi-Processor)
• The cores run in parallel
• Interaction with OS:
• OS perceives each core as a separate processor
• OS scheduler maps threads/processes to different cores
• Most major operating systems support multi-core today
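A minimal illustration of the OS view described above: query the logical core count and launch one thread per core, leaving the mapping of threads onto cores to the OS scheduler. A sketch only; the function name is mine.

```python
import os
import threading

def run_one_thread_per_core(work):
    """Launch one thread per logical core the OS reports.
    The OS scheduler is free to map each thread onto a different core."""
    n = os.cpu_count() or 1
    threads = [threading.Thread(target=work, args=(i,)) for i in range(n)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return n
```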
WHY MULTI-CORES?
• Difficult to make single-core clock frequencies even higher
• Deeply pipelined circuits:
• heat problems
• speed of light problems
• difficult design and verification
• large design teams necessary
• server farms need expensive air-conditioning
• Many new applications are multithreaded
• General trend in computer architecture (shift towards more parallelism)
MULTI-CORES
• Multi-core processors are MIMD
• Different cores execute different threads (Multiple Instructions), operating on different parts of memory (Multiple Data)
• Multi-core is a shared memory multiprocessor
• All cores share the same memory
WHAT APPLICATIONS
BENEFIT FROM MULTI-CORE?
• Database servers
• Web servers (Web commerce)
• Compilers
• Multimedia applications
• Scientific applications, CAD/CAM
• Editing a photo while recording a TV show through a digital video recorder
• Downloading software while running an anti-virus program
• Anything that can be threaded today will map efficiently to multi-core
MULTI-CORES: CACHE
COHERENCE PROBLEM
• Cache coherence is the uniformity of shared resource data that ends up
stored in multiple local caches
• When clients in a system maintain caches of a common memory resource,
problems may arise with incoherent data, which is particularly the case with
CPUs in a multi-core architecture
• Coherence Mechanism:
• Snooping:
• Snooping based protocols tend to be faster, if enough bandwidth is available, since all transactions are a
request/response seen by all processors
• Snooping isn't scalable. Every request must be broadcast to all nodes in a system
• Directory based
• Tend to have longer latencies but use much less bandwidth since messages are point to point and not broadcast
• For this reason, many of the larger systems (>64 processors) use this type of cache coherence
MULTI-CORES: COHERENCE
PROTOCOLS
• Write-invalidate:
• When a write operation is observed to a location that a cache has a copy of, the cache
controller invalidates its own copy of the snooped memory location, which forces a read from
main memory of the new value on its next access
• Write-update:
• When a write operation is observed to a location that a cache has a copy of, the cache
controller updates its own copy of the snooped memory location with the new data
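The write-invalidate protocol on a snooping bus can be modelled with a toy simulation. The class and method names here are hypothetical; real hardware implements this logic in the cache controllers.

```python
class Bus:
    """A snooping bus: every write is broadcast so other caches can react."""
    def __init__(self):
        self.caches = []

    def broadcast_write(self, addr, writer):
        for cache in self.caches:
            if cache is not writer:
                cache.lines.pop(addr, None)   # write-invalidate: drop stale copy

class Cache:
    def __init__(self, bus, memory):
        self.lines = {}
        self.bus = bus
        self.memory = memory
        bus.caches.append(self)

    def read(self, addr):
        if addr not in self.lines:            # miss: fetch from main memory
            self.lines[addr] = self.memory[addr]
        return self.lines[addr]

    def write(self, addr, value):
        self.bus.broadcast_write(addr, writer=self)
        self.memory[addr] = value             # write-through, for simplicity
        self.lines[addr] = value
```

After one cache writes an address, the other caches' copies are gone, and their next read is forced back to memory for the new value, exactly the behaviour described above.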
GRAPHICAL PROCESSING
UNITS- GPU
• Processor optimized for 2D/3D graphics, video, visual computing, and
display
• Highly parallel, highly multithreaded multiprocessor optimized for visual
computing
• Provide real-time visual interaction with computed objects via graphics
images, and video
• Serves as both a programmable graphics processor and a scalable parallel
computing platform
• Heterogeneous Systems: combine a GPU with a CPU
GPU EVOLUTION
• 1980s – No GPU; the PC used a VGA controller
• 1990s – More functions added into the VGA controller
• 1997 – 3D acceleration functions:
 Hardware for triangle setup and rasterization
 Texture mapping
 Shading
• 2000 – A single-chip graphics processor (beginning of the term "GPU")
• 2005 – Massively parallel programmable processors
• 2007 – CUDA (Compute Unified Device Architecture)
• 2010s – AMD’s Radeon cards, GeForce 10 series
WHY GPU?
• To provide separate, dedicated graphics resources, including a graphics processor and memory
• To relieve some of the burden on the main system resources, namely the Central Processing Unit, Main Memory, and the System Bus, which would otherwise get saturated with graphical operations and I/O requests
GPU VS CPU
• A GPU is tailored for highly parallel operation, while a CPU executes programs serially
• For this reason, GPUs have many parallel execution units, while CPUs have few
• GPUs have significantly faster and more advanced memory interfaces, as they need to shift around a lot more data than CPUs
• GPUs have much deeper pipelines than CPUs, which typically have 10-20 stages
COMPONENTS OF GPU
• Graphics Processor
• Graphics co-processor
• Graphics accelerator
• Frame buffer
• Memory
• Graphics BIOS
• Digital-to-Analog Converter (DAC)
• Display Connector
• Computer (Bus) Connector
CLUSTERS
• A computer cluster is a group of loosely or tightly coupled computers that work together closely, so that in many respects they can be viewed as a single computer
• Connected through a fast LAN
• Deployed to improve speed and reliability over a single computer, while typically being much more cost-effective than a single computer of comparable speed or reliability
• Middleware is required to manage them
CLUSTERS
• In cluster computing each node within a cluster is an independent system,
with its own operating system, private memory, and, in some cases, its own
file system
• Processors on one node cannot directly access the memory on other nodes, so programs run on clusters usually employ a procedure called "message passing" to move data and execution code from one node to another
NEED OF CLUSTERS
• More computing power
• Better reliability: orchestrating a number of low-cost commercial off-the-shelf computers has given rise to a variety of architectures and configurations
• Improve performance and availability over that of a single computer
• More cost-effective than single computers of comparable speed or
availability
• E.g. Big Data
TYPES OF CLUSTERS
High Availability Clusters
• Provide uninterrupted availability of data or services (typically web services) to the end-user community
• In case of node failure, service can be restored without affecting the availability of the services provided by the cluster, though there will be a performance drop due to the missing node
• Implementations: data mining, simulations, mission-critical applications or databases, mail, file and print, web, or application servers
• E.g. Oracle Clusterware
Load Balancing Clusters
• Distribute incoming requests for resources or content among multiple nodes running the same programs or having the same content
• Every node in the cluster is able to handle requests for the same content or application
• Typically seen in a web-hosting environment
• E.g. nginx as an HTTP load balancer
Compute Clusters
• Used for computation-intensive purposes, rather than handling IO-oriented operations such as web services or databases
• Compute clusters vary in the level of coupling:
 Jobs with frequent communication among nodes may require a dedicated network, dense location, and likely homogeneous nodes
 Jobs with infrequent communication between nodes may relax some of these requirements
• E.g. Rocks package on Linux
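The dispatch policy at the heart of a load-balancing cluster can be sketched as a simple round-robin rotation. This is a toy model with names of my choosing; production balancers such as nginx add health checks, weights, and session affinity.

```python
import itertools

def make_round_robin_dispatcher(nodes):
    """Return a function that assigns each incoming request to the next
    node in rotation -- the simplest load-balancing policy."""
    rotation = itertools.cycle(nodes)
    def dispatch(request):
        # Every node can serve the same content, so any node will do.
        return next(rotation)
    return dispatch
```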
BEOWULF CLUSTERS
• Uses parallel processing across multiple computers to create cheap and
powerful supercomputers.
• A cluster has two types of computers:
• Master (also called service or front node): used to interact with users and manage the cluster
• Compute nodes: the group of computers that perform the computation, typically stripped of peripherals such as keyboard, mouse, floppy, and video
• E.g. OSCAR on Linux
• When a large problem or set of data is given to a Beowulf cluster, the
master computer first runs a program that breaks the problem into small
discrete pieces; it then sends a piece to each node to compute. As nodes
finish their tasks, the master computer continually sends more pieces to
them until the entire problem has been computed
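The master/worker flow described above can be sketched with a shared task queue. This is a single-process stand-in (names are mine); in a real Beowulf cluster the pieces travel over the network, e.g. via MPI.

```python
import queue
import threading

def run_beowulf_style(pieces, compute, n_nodes=2):
    """Master puts pieces on a shared task queue; each 'node' pulls the
    next piece as soon as it finishes the previous one, until none remain."""
    tasks, results = queue.Queue(), queue.Queue()
    for piece in pieces:
        tasks.put(piece)

    def node():
        while True:
            try:
                piece = tasks.get_nowait()
            except queue.Empty:
                return                       # no work left
            results.put(compute(piece))

    nodes = [threading.Thread(target=node) for _ in range(n_nodes)]
    for t in nodes:
        t.start()
    for t in nodes:
        t.join()
    return sorted(results.get() for _ in pieces)
```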
CLUSTERS-TECHNOLOGIES
TO IMPLEMENT
• Parallel Virtual Machine (PVM)
• Must be directly installed on every cluster node; provides a set of software libraries that present the nodes as a single "parallel virtual machine"
• Provides a run-time environment for :
• Message-passing
• Task & Resource management
• Fault notification
• Message Passing Interface (MPI)
• Drew on various features available in commercial systems of the time. The MPI
specifications then gave rise to specific implementations
• Implementations typically use TCP/IP & socket connections
• Widely available communications model that enables parallel programs to be written in languages such as C, Fortran, and Python
CLUSTER BENEFITS
• Availability
• Performance
• Low Cost
• Elasticity
• Run Jobs Anytime Anywhere
GRID COMPUTING
• Grid computing combines computers from multiple administrative domains
to reach a common goal
• What distinguishes grid computing from cluster computing is that grids tend to be more loosely coupled, heterogeneous, and geographically dispersed
• Special kind of distributed computing in which different computers within the
same network share one or more resources
TYPES OF GRID COMPUTING
– DATA GRIDS
• Allows you to distribute your data across the grid
• Main goal of Data Grid is to provide as much data as possible from
memory on every grid node and to ensure data coherency
• Characteristics:
a. Data Replication- all data is fully replicated to all nodes in the grid
b. Data Invalidation- Whenever data changes on one of the nodes, then the same data on
all other nodes is purged
c. Distributed Transactions- Transactions are required to ensure Data Coherency
d. Data Backups- Useful for fail-over. Some Data Grid products provide ability to assign
backup nodes for the data
e. Data Affinity/Partitioning- Allows to split/partition whole data set into multiple subsets
and assign every subset to a grid node
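Data affinity/partitioning (point e) is often implemented by hashing the key to pick an owner node, so the same key always lands on the same node. A minimal sketch; the function name is mine.

```python
import hashlib

def owner_node(key, nodes):
    """Hash-based data affinity: deterministically map a key to one node,
    splitting the whole data set into per-node subsets."""
    digest = hashlib.sha256(key.encode()).hexdigest()
    return nodes[int(digest, 16) % len(nodes)]
```

Note that this simple modulo scheme reshuffles most keys when a node joins or leaves; real data grids often use consistent hashing to limit that movement.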
TYPES OF GRID COMPUTING
– COMPUTE GRIDS
• Allows you to take a computation, optionally split it into multiple parts, and execute the parts on different grid nodes in parallel, leading to faster execution. E.g. MapReduce
• Helps to improve overall scalability and fault-tolerance by offloading your computations onto the most available nodes
• Characteristics:
a. Automatic Deployment
b. Topology Resolution - allows provisioning nodes based on any node characteristic or user-specific configuration
c. Collision Resolution - jobs are executed in parallel but synchronization is maintained
d. Load Balancing – proper balancing of your system load within the grid
e. Checkpoints - Long running jobs should be able to periodically store their intermediate state
f. Grid Events - a querying mechanism for all grid events is essential
g. Node Metrics - a good compute grid solution should be able to provide dynamic grid metrics for all grid nodes
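Since MapReduce is the canonical compute-grid pattern, here is a single-process word-count sketch of its two phases. In a real grid each `map_phase` call would run on a different node; the function names are mine.

```python
from collections import Counter

def map_phase(chunk):
    """Runs independently on each grid node's chunk of the input."""
    return Counter(chunk.split())

def reduce_phase(partial_counts):
    """Merges the per-node partial results into the final answer."""
    total = Counter()
    for partial in partial_counts:
        total += partial
    return dict(total)

def grid_word_count(chunks):
    return reduce_phase(map_phase(c) for c in chunks)
```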
GRID COMPUTING
• Advantages:
 Can solve larger, more complex problems in a shorter time
 Easier to collaborate with other organizations
 Make better use of existing hardware
• Disadvantages:
 Grid software and standards are still evolving
 Learning curve to get started
 Non-interactive job submission
CLOUD COMPUTING
• Computing paradigm shift where computing is moved away from personal
computers or an individual application server to a “cloud” of computers.
• Abstraction: Users of the cloud only need to be concerned with the
computing service being asked for, as the underlying details of how it is
achieved are hidden.
• Virtualization: Cloud Computing virtualizes system by pooling and sharing
resources
NIST DEFINITION OF CLOUD
COMPUTING
• Cloud computing is a model for enabling ubiquitous, convenient, on‐demand
network access to a shared pool of configurable computing resources (e.g.,
networks, servers, storage, applications, and services) that can be rapidly
provisioned and released with minimal management effort or service
provider interaction
CHARACTERISTICS OF
CLOUD COMPUTING
1. On‐demand self‐service
2. Broad network access
3. Resource pooling
4. Rapid elasticity
5. Measured service
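Measured service (point 5) means the provider meters resource usage and bills only for what was consumed. A toy sketch with made-up rates, for illustration only; real providers publish their own price sheets.

```python
# Hypothetical per-unit rates -- illustrative, not any provider's actual pricing.
RATES = {"vm_hour": 0.05, "storage_gb_month": 0.02}

def monthly_bill(metered_usage):
    """Pay-per-use billing: charge = sum of (rate x metered quantity)."""
    return round(sum(RATES[resource] * quantity
                     for resource, quantity in metered_usage.items()), 2)
```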
CLOUD COMPONENTS
• Clients
• Data Center (Collection of Servers where the application to which you
subscribe is housed)
• Internet
CLOUD COMPUTING-
BENEFITS
• Lower Costs
• Lower computer costs
• Reduced Software Costs
• By using the Cloud infrastructure on a pay-as-you-go, on-demand basis, all of us can save on capital and operational investment!
• Ease of utilization
• Quality of Service
• Reliability
• Outsourced IT management
• Simplified maintenance and upgrade
• Low Barrier to Entry
• Unlimited storage capacity
• Universal document access
• Latest version availability
CLOUD COMPUTING -
LIMITATIONS
• Requires a constant Internet connection
• Does not work well with low‐speed connections
• Hosted applications may be less customizable than what larger organizations can build in-house
• Security and Privacy issues
• Cloud Service provider may go down
• Latency concerns
RESOURCES
1. https://www.hpcadvisorycouncil.com/pdf/Intro_to_HPC.pdf
2. https://computing.llnl.gov/tutorials/parallel_comp/#Whatis
3. https://www.cs.cmu.edu/~fp/courses/15213-s06/lectures/27-multicore.pdf
4. https://en.wikipedia.org/wiki/Cache_coherence
5. https://www.youtube.com/watch?v=A_i5kOlj_UU
6. https://www.youtube.com/watch?v=bkLVuNfiCVs
Mobile database for your company telemarketing or sms marketing campaigns. Fr...Mobile database for your company telemarketing or sms marketing campaigns. Fr...
Mobile database for your company telemarketing or sms marketing campaigns. Fr...
DataProvider1
 
DNS Resolvers and Nameservers (in New Zealand)
DNS Resolvers and Nameservers (in New Zealand)DNS Resolvers and Nameservers (in New Zealand)
DNS Resolvers and Nameservers (in New Zealand)
APNIC
 
Understanding the Tor Network and Exploring the Deep Web
Understanding the Tor Network and Exploring the Deep WebUnderstanding the Tor Network and Exploring the Deep Web
Understanding the Tor Network and Exploring the Deep Web
nabilajabin35
 
IT Services Workflow From Request to Resolution
IT Services Workflow From Request to ResolutionIT Services Workflow From Request to Resolution
IT Services Workflow From Request to Resolution
mzmziiskd
 
5-Proses-proses Akuisisi Citra Digital.pptx
5-Proses-proses Akuisisi Citra Digital.pptx5-Proses-proses Akuisisi Citra Digital.pptx
5-Proses-proses Akuisisi Citra Digital.pptx
andani26
 
Perguntas dos animais - Slides ilustrados de múltipla escolha
Perguntas dos animais - Slides ilustrados de múltipla escolhaPerguntas dos animais - Slides ilustrados de múltipla escolha
Perguntas dos animais - Slides ilustrados de múltipla escolha
socaslev
 
(Hosting PHising Sites) for Cryptography and network security
(Hosting PHising Sites) for Cryptography and network security(Hosting PHising Sites) for Cryptography and network security
(Hosting PHising Sites) for Cryptography and network security
aluacharya169
 
APNIC Update, presented at NZNOG 2025 by Terry Sweetser
APNIC Update, presented at NZNOG 2025 by Terry SweetserAPNIC Update, presented at NZNOG 2025 by Terry Sweetser
APNIC Update, presented at NZNOG 2025 by Terry Sweetser
APNIC
 
Reliable Vancouver Web Hosting with Local Servers & 24/7 Support
Reliable Vancouver Web Hosting with Local Servers & 24/7 SupportReliable Vancouver Web Hosting with Local Servers & 24/7 Support
Reliable Vancouver Web Hosting with Local Servers & 24/7 Support
steve198109
 
Determining Glass is mechanical textile
Determining  Glass is mechanical textileDetermining  Glass is mechanical textile
Determining Glass is mechanical textile
Azizul Hakim
 
Smart Mobile App Pitch Deck丨AI Travel App Presentation Template
Smart Mobile App Pitch Deck丨AI Travel App Presentation TemplateSmart Mobile App Pitch Deck丨AI Travel App Presentation Template
Smart Mobile App Pitch Deck丨AI Travel App Presentation Template
yojeari421237
 
Top Vancouver Green Business Ideas for 2025 Powered by 4GoodHosting
Top Vancouver Green Business Ideas for 2025 Powered by 4GoodHostingTop Vancouver Green Business Ideas for 2025 Powered by 4GoodHosting
Top Vancouver Green Business Ideas for 2025 Powered by 4GoodHosting
steve198109
 
OSI TCP IP Protocol Layers description f
OSI TCP IP Protocol Layers description fOSI TCP IP Protocol Layers description f
OSI TCP IP Protocol Layers description f
cbr49917
 
highend-srxseries-services-gateways-customer-presentation.pptx
highend-srxseries-services-gateways-customer-presentation.pptxhighend-srxseries-services-gateways-customer-presentation.pptx
highend-srxseries-services-gateways-customer-presentation.pptx
elhadjcheikhdiop
 
project_based_laaaaaaaaaaearning,kelompok 10.pptx
project_based_laaaaaaaaaaearning,kelompok 10.pptxproject_based_laaaaaaaaaaearning,kelompok 10.pptx
project_based_laaaaaaaaaaearning,kelompok 10.pptx
redzuriel13
 
White and Red Clean Car Business Pitch Presentation.pptx
White and Red Clean Car Business Pitch Presentation.pptxWhite and Red Clean Car Business Pitch Presentation.pptx
White and Red Clean Car Business Pitch Presentation.pptx
canumatown
 
APNIC -Policy Development Process, presented at Local APIGA Taiwan 2025
APNIC -Policy Development Process, presented at Local APIGA Taiwan 2025APNIC -Policy Development Process, presented at Local APIGA Taiwan 2025
APNIC -Policy Development Process, presented at Local APIGA Taiwan 2025
APNIC
 
Best web hosting Vancouver 2025 for you business
Best web hosting Vancouver 2025 for you businessBest web hosting Vancouver 2025 for you business
Best web hosting Vancouver 2025 for you business
steve198109
 
Computers Networks Computers Networks Computers Networks
Computers Networks Computers Networks Computers NetworksComputers Networks Computers Networks Computers Networks
Computers Networks Computers Networks Computers Networks
Tito208863
 
Mobile database for your company telemarketing or sms marketing campaigns. Fr...
Mobile database for your company telemarketing or sms marketing campaigns. Fr...Mobile database for your company telemarketing or sms marketing campaigns. Fr...
Mobile database for your company telemarketing or sms marketing campaigns. Fr...
DataProvider1
 
DNS Resolvers and Nameservers (in New Zealand)
DNS Resolvers and Nameservers (in New Zealand)DNS Resolvers and Nameservers (in New Zealand)
DNS Resolvers and Nameservers (in New Zealand)
APNIC
 
Understanding the Tor Network and Exploring the Deep Web
Understanding the Tor Network and Exploring the Deep WebUnderstanding the Tor Network and Exploring the Deep Web
Understanding the Tor Network and Exploring the Deep Web
nabilajabin35
 
IT Services Workflow From Request to Resolution
IT Services Workflow From Request to ResolutionIT Services Workflow From Request to Resolution
IT Services Workflow From Request to Resolution
mzmziiskd
 
5-Proses-proses Akuisisi Citra Digital.pptx
5-Proses-proses Akuisisi Citra Digital.pptx5-Proses-proses Akuisisi Citra Digital.pptx
5-Proses-proses Akuisisi Citra Digital.pptx
andani26
 
Perguntas dos animais - Slides ilustrados de múltipla escolha
Perguntas dos animais - Slides ilustrados de múltipla escolhaPerguntas dos animais - Slides ilustrados de múltipla escolha
Perguntas dos animais - Slides ilustrados de múltipla escolha
socaslev
 
(Hosting PHising Sites) for Cryptography and network security
(Hosting PHising Sites) for Cryptography and network security(Hosting PHising Sites) for Cryptography and network security
(Hosting PHising Sites) for Cryptography and network security
aluacharya169
 
APNIC Update, presented at NZNOG 2025 by Terry Sweetser
APNIC Update, presented at NZNOG 2025 by Terry SweetserAPNIC Update, presented at NZNOG 2025 by Terry Sweetser
APNIC Update, presented at NZNOG 2025 by Terry Sweetser
APNIC
 
Reliable Vancouver Web Hosting with Local Servers & 24/7 Support
Reliable Vancouver Web Hosting with Local Servers & 24/7 SupportReliable Vancouver Web Hosting with Local Servers & 24/7 Support
Reliable Vancouver Web Hosting with Local Servers & 24/7 Support
steve198109
 
Determining Glass is mechanical textile
Determining  Glass is mechanical textileDetermining  Glass is mechanical textile
Determining Glass is mechanical textile
Azizul Hakim
 
Smart Mobile App Pitch Deck丨AI Travel App Presentation Template
Smart Mobile App Pitch Deck丨AI Travel App Presentation TemplateSmart Mobile App Pitch Deck丨AI Travel App Presentation Template
Smart Mobile App Pitch Deck丨AI Travel App Presentation Template
yojeari421237
 
Top Vancouver Green Business Ideas for 2025 Powered by 4GoodHosting
Top Vancouver Green Business Ideas for 2025 Powered by 4GoodHostingTop Vancouver Green Business Ideas for 2025 Powered by 4GoodHosting
Top Vancouver Green Business Ideas for 2025 Powered by 4GoodHosting
steve198109
 
OSI TCP IP Protocol Layers description f
OSI TCP IP Protocol Layers description fOSI TCP IP Protocol Layers description f
OSI TCP IP Protocol Layers description f
cbr49917
 
highend-srxseries-services-gateways-customer-presentation.pptx
highend-srxseries-services-gateways-customer-presentation.pptxhighend-srxseries-services-gateways-customer-presentation.pptx
highend-srxseries-services-gateways-customer-presentation.pptx
elhadjcheikhdiop
 

High performance computing

  • 2. CONTENTS • Introduction to HPC • Parallel Architectures • Multi Cores • Graphical Processing Units • Clusters • Grid Computing • Cloud Computing
  • 3. HIGH PERFORMANCE COMPUTING-HPC• Ability to process data and perform complex calculations at high speeds • One of the best-known types of HPC solutions is the supercomputer • Supercomputer contains thousands of compute nodes that work together to complete one or more tasks The IBM Blue Gene/P supercomputer "Intrepid" at Argonne National Laboratory runs 164,000 processor cores using normal data center air conditioning, grouped in 40 racks/cabinets connected by a high-speed 3-D torus network
  • 4. WHEN DO WE NEED HPC? • Case1: Complete a time-consuming operation in less time  I am an automotive engineer  I need to design a new car that consumes less gasoline  I’d rather have the design completed in 6 months than in 2 years  I want to test my design using computer simulations rather than building very expensive prototypes and crashing them • Case 2: Complete an operation under a tight deadline  I work for a weather prediction agency  I am getting input from weather stations/sensors  I’d like to predict tomorrow’s forecast today • Case 3: Perform a high number of operations per seconds  I am an engineer at Amazon.com  My Web server gets 1,000 hits per seconds  I’d like my web server and databases to handle 1,000 transactions per seconds so that customers do not experience bad delays
  • 5. WHAT DOES HPC INCLUDE? • High-performance computing is fast computing • Computations in parallel over lots of compute elements (CPU, GPU) • Very fast network to connect between the compute elements • Hardware • Computer Architecture • Vector Computers, Distributed Systems, Clusters • Network Connections • InfiniBand, Ethernet, Proprietary • Software • Programming models • MPI (Message Passing Interface), SHMEM (Shared Memory), PGAS, etc. • Applications • Open source, commercial
HOW DOES HPC WORK?
• HPC solutions have three main components: compute, network, and storage
• To build a high-performance computing architecture, compute servers are networked together into a cluster
• Software programs and algorithms run simultaneously on the servers in the cluster
• The cluster is networked to the data storage to capture the output
• Together, these components operate seamlessly to complete a diverse set of tasks
PARALLEL ARCHITECTURES
• Traditionally, software has been written for serial computation
 A problem is broken into a discrete series of instructions
 Instructions are executed sequentially, one after another, on a single processor
 Only one instruction may execute at any moment in time
• Parallel computing is the simultaneous use of multiple compute resources to solve a computational problem
 A problem is broken into discrete parts that can be solved concurrently
 Each part is further broken down into a series of instructions
 Instructions from each part execute simultaneously on different processors
 An overall control/coordination mechanism is employed
PARALLEL ARCHITECTURES
• Virtually all stand-alone computers today are parallel from a hardware perspective:
 Multiple functional units (L1 cache, L2 cache, branch, pre-fetch, decode, floating-point, graphics processing (GPU), integer, etc.)
 Multiple execution units/cores
 Multiple hardware threads
• Networks connect multiple stand-alone computers (nodes) to make larger parallel computer clusters
WHY USE PARALLEL ARCHITECTURES?
• Save time and/or money
• Solve larger / more complex problems
• Provide concurrency
• Take advantage of non-local resources
• Make better use of underlying parallel hardware
TYPES OF PARALLELISM
• Data Parallelism
 Focuses on distributing the data across different parallel computing nodes
 Also called loop-level parallelism
 Example: CPU A could add all elements from the top half of the matrices, while CPU B could add all elements from the bottom half
 Since the two processors work in parallel, the matrix addition takes half the time of performing the same operation serially on one CPU alone
• Task Parallelism
 Focuses on distributing tasks across different processors
 Also known as functional parallelism or control parallelism
 As a simple example: on a 2-processor system (CPUs "a" and "b") in a parallel environment, if we wish to do tasks "A" and "B", we can tell CPU "a" to do task "A" and CPU "b" to do task "B" simultaneously, reducing the runtime of the execution
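Both forms of parallelism above can be sketched with Python's standard library. This is an illustrative sketch only: the helper names are invented, and threads are used for portability, though CPU-bound work would normally use separate processes or a compiled language.

```python
# Data parallelism: split the matrices in half and let two workers
# (standing in for CPU "A" and CPU "B") add their halves concurrently.
from concurrent.futures import ThreadPoolExecutor

def add_rows(args):
    a_rows, b_rows = args
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a_rows, b_rows)]

def matrix_add_parallel(a, b):
    mid = len(a) // 2
    halves = [(a[:mid], b[:mid]), (a[mid:], b[mid:])]   # top half / bottom half
    with ThreadPoolExecutor(max_workers=2) as pool:
        top, bottom = pool.map(add_rows, halves)
    return top + bottom

# Task parallelism: two *different* tasks ("A" and "B") run at the same time.
with ThreadPoolExecutor(max_workers=2) as pool:
    task_a = pool.submit(sum, range(10))    # task "A" on one worker
    task_b = pool.submit(max, [3, 1, 4])    # task "B" on another worker

print(matrix_add_parallel([[1, 2], [3, 4]], [[4, 3], [2, 1]]))  # [[5, 5], [5, 5]]
print(task_a.result(), task_b.result())                         # 45 4
```

Note the difference: data parallelism runs the *same* operation on different slices of the data, while task parallelism runs *different* operations side by side.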
PARALLEL ARCHITECTURES
• Flynn's taxonomy classifies multi-processor computer architectures along two independent dimensions: the instruction stream and the data stream
• Flynn's classical taxonomy of parallel architectures:
 SISD – Single Instruction stream, Single Data stream
 SIMD – Single Instruction stream, Multiple Data streams
 MISD – Multiple Instruction streams, Single Data stream
 MIMD – Multiple Instruction streams, Multiple Data streams
FLYNN'S CLASSICAL TAXONOMY
• SISD
 Serial: only one instruction and data stream is acted on during any one clock cycle
 Examples: older-generation mainframes, minicomputers, workstations, and single-processor/core PCs
• SIMD
 All processing units execute the same instruction at any given clock cycle; each processing unit operates on a different data element
 Most modern computers, particularly those with GPUs, employ SIMD instructions and execution units
• MISD
 Different instructions operate on a single data element
 Example: multiple cryptography algorithms attempting to crack a single coded message
• MIMD
 Can execute different instructions on different data elements
 Examples: most current supercomputers, networked parallel computer clusters and "grids", multi-processor computers, multi-core PCs
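The SISD/SIMD distinction can be made concrete with a plain-Python analogy. Real SIMD uses hardware vector registers; the 4-lane "register" below is purely illustrative.

```python
def sisd_scale(data, k):
    """SISD style: one instruction acts on one data element per step."""
    out = []
    for x in data:
        out.append(x * k)
    return out

def simd_scale(data, k, width=4):
    """SIMD style: the same instruction is applied to every lane of a vector."""
    out = []
    for i in range(0, len(data), width):
        lane = data[i:i + width]            # one 4-lane "vector register"
        out.extend([x * k for x in lane])   # same instruction, multiple data
    return out

print(sisd_scale([1, 2, 3, 4, 5], 10) == simd_scale([1, 2, 3, 4, 5], 10))  # True
```

Both compute the same result; the SIMD version just processes a whole lane of elements per "instruction", which is where the hardware speedup comes from.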
PARALLEL COMPUTER MEMORY ARCHITECTURES: SHARED MEMORY ARCHITECTURE
• All processors access all memory as a single global address space, and data sharing is fast
• Multiple processors can operate independently but share the same memory resources
• Changes in a memory location effected by one processor are visible to all other processors
• Shared memory machines are classified as UMA or NUMA, based upon memory access times
• Uniform Memory Access (UMA)
 Commonly represented today by Symmetric Multiprocessor (SMP) machines
 Identical processors, with equal access and access times to memory
 Sometimes called CC-UMA (Cache Coherent UMA): if one processor updates a location in shared memory, all the other processors know about the update
• Non-Uniform Memory Access (NUMA)
 Often made by physically linking two or more SMPs; one SMP can directly access the memory of another SMP
 Not all processors have equal access time to all memories; memory access across the link is slower
 If cache coherency is maintained, may also be called CC-NUMA (Cache Coherent NUMA)
PARALLEL COMPUTER MEMORY ARCHITECTURES: DISTRIBUTED MEMORY ARCHITECTURE
• Each processor has its own local memory and operates independently
• Changes a processor makes to its local memory have no effect on the memory of other processors; hence, the concept of cache coherency does not apply
• When a processor needs access to data on another processor, it is usually the task of the programmer to explicitly define how and when the data is communicated
• Synchronization between tasks is likewise the programmer's responsibility
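A minimal sketch of this explicit-communication discipline, using two OS processes connected by a pipe in place of real cluster nodes (the function and variable names are illustrative):

```python
# Each process has private memory; data crosses only via explicit messages.
from multiprocessing import Process, Pipe

def worker(conn):
    data = conn.recv()      # explicit receive: there is no shared address space
    conn.send(sum(data))    # explicit send of the result back
    conn.close()

if __name__ == "__main__":
    parent_end, child_end = Pipe()
    p = Process(target=worker, args=(child_end,))
    p.start()
    parent_end.send([1, 2, 3, 4])   # the programmer defines how/when data moves
    print(parent_end.recv())        # 10
    p.join()
```

Every byte the worker sees arrives through `recv`, mirroring how distributed-memory programs must move data between nodes by hand (or via a message-passing library).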
MULTI CORES
• A multi-core processor is a single computing component with two or more independent processing units, called cores, which read and execute program instructions
(Figure: single-core vs. multi-core CPU chip)
MULTI-CORES
• The cores fit on a single processor socket
• Also called a CMP (Chip Multi-Processor)
• The cores run in parallel
• Interaction with the OS:
 The OS perceives each core as a separate processor
 The OS scheduler maps threads/processes to different cores
 Most major OSes support multi-core today
WHY MULTI-CORES?
• Difficult to make single-core clock frequencies even higher
• Deeply pipelined circuits:
 heat problems
 speed-of-light problems
 difficult design and verification; large design teams necessary
 server farms need expensive air-conditioning
• Many new applications are multithreaded
• General trend in computer architecture (shift towards more parallelism)
MULTI-CORES
• Multi-core processors are MIMD: different cores execute different threads (Multiple Instructions), operating on different parts of memory (Multiple Data)
• A multi-core chip is a shared-memory multiprocessor: all cores share the same memory
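The shared-memory point can be seen directly with threads, which, like cores, all see the same memory. In this sketch the lock stands in for the synchronization that hardware coherence alone does not provide:

```python
import threading

counter = 0                  # one memory location visible to every thread
lock = threading.Lock()

def bump(n):
    global counter
    for _ in range(n):
        with lock:           # serialize the read-modify-write sequence
            counter += 1

threads = [threading.Thread(target=bump, args=(10_000,)) for _ in range(2)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(counter)  # 20000: both threads saw and updated the same shared memory
```

Without the lock, the two read-modify-write sequences could interleave and lose updates, which is exactly the kind of hazard shared-memory programming must manage.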
WHAT APPLICATIONS BENEFIT FROM MULTI-CORE?
• Database servers
• Web servers (Web commerce)
• Compilers
• Multimedia applications
• Scientific applications, CAD/CAM
• Editing a photo while recording a TV show through a digital video recorder
• Downloading software while running an anti-virus program
• Anything that can be threaded today will map efficiently to multi-core
MULTI-CORES: CACHE COHERENCE PROBLEM
• Cache coherence is the uniformity of shared resource data that ends up stored in multiple local caches
• When clients in a system maintain caches of a common memory resource, problems may arise with incoherent data, which is particularly the case for CPUs in a multi-core architecture
• Coherence mechanisms:
 Snooping: tends to be faster, if enough bandwidth is available, since every transaction is a request/response seen by all processors; however, snooping isn't scalable, because every request must be broadcast to all nodes in the system
 Directory-based: tends to have longer latencies but uses much less bandwidth, since messages are point-to-point rather than broadcast; for this reason, many larger systems (>64 processors) use this type of cache coherence
MULTI-CORES: COHERENCE PROTOCOLS
• Write-invalidate: when a write to a location that a cache has a copy of is observed, the cache controller invalidates its own copy of the snooped memory location, forcing a read from main memory of the new value on the next access
• Write-update: when a write to a location that a cache has a copy of is observed, the cache controller updates its own copy of the snooped memory location with the new data
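The write-invalidate behaviour can be sketched as a toy simulation. The `Cache` class and addresses below are invented for illustration; a real protocol lives in hardware with per-line state machines.

```python
# Toy write-invalidate simulation: when one cache writes a line, every other
# cache's copy is invalidated and must be re-read from "main memory".
class Cache:
    def __init__(self, memory):
        self.memory = memory
        self.lines = {}                      # addr -> cached value

    def read(self, addr):
        if addr not in self.lines:           # miss: fetch from main memory
            self.lines[addr] = self.memory[addr]
        return self.lines[addr]

    def write(self, addr, value, peers):
        self.memory[addr] = value
        self.lines[addr] = value
        for peer in peers:                   # snooped write: invalidate copies
            peer.lines.pop(addr, None)

memory = {0x10: 1}
c0, c1 = Cache(memory), Cache(memory)
c1.read(0x10)                # c1 caches the old value
c0.write(0x10, 99, [c1])     # c0's write invalidates c1's copy
print(c1.read(0x10))         # 99: forced re-read from main memory
```

A write-update variant would instead push the new value into `peer.lines[addr]` rather than deleting it, trading extra traffic on writes for cheaper subsequent reads.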
GRAPHICAL PROCESSING UNITS - GPU
• Processor optimized for 2D/3D graphics, video, visual computing, and display
• Highly parallel, highly multithreaded multiprocessor optimized for visual computing
• Provides real-time visual interaction with computed objects via graphics, images, and video
• Serves as both a programmable graphics processor and a scalable parallel computing platform
• Heterogeneous systems: combine a GPU with a CPU
GPU EVOLUTION
• 1980s – no GPU; PCs used a VGA controller
• 1990s – more functions added to the VGA controller
• 1997 – 3D acceleration functions:
 hardware for triangle setup and rasterization
 texture mapping
 shading
• 2000 – a single-chip graphics processor (origin of the term "GPU")
• 2005 – massively parallel programmable processors
• 2007 – CUDA (Compute Unified Device Architecture)
• 2010s – AMD's Radeon cards, NVIDIA's GeForce 10 series
WHY GPU?
• To provide separate dedicated graphics resources, including a graphics processor and memory
• To relieve some of the burden on the main system resources, namely the Central Processing Unit, main memory, and the system bus, which would otherwise get saturated with graphical operations and I/O requests
GPU VS CPU
• A GPU is tailored for highly parallel operation, while a CPU executes programs serially
• For this reason, GPUs have many parallel execution units, while CPUs have few
• GPUs have significantly faster and more advanced memory interfaces, as they need to shift around a lot more data than CPUs
• GPUs have much deeper pipelines (several thousand stages vs. 10–20 for CPUs)
COMPONENTS OF GPU
• Graphics processor
• Graphics co-processor
• Graphics accelerator
• Frame buffer
• Memory
• Graphics BIOS
• Digital-to-Analog Converter (DAC)
• Display connector
• Computer (bus) connector
CLUSTERS
• A computer cluster is a group of loosely or tightly coupled computers that work together so closely that in many respects it can be viewed as a single computer
• Connected through a fast LAN
• Deployed to improve speed and reliability over that provided by a single computer, while typically being much more cost-effective than a single computer of comparable speed or reliability
• Middleware is required to manage the nodes
CLUSTERS
• In cluster computing, each node within a cluster is an independent system with its own operating system, private memory, and, in some cases, its own file system
• Because processors on one node cannot directly access the memory on other nodes, programs run on clusters usually employ a procedure called "message passing" to move data and execution code from one node to another
NEED OF CLUSTERS
• More computing power
• Better reliability: orchestrating a number of low-cost, commercial off-the-shelf computers has given rise to a variety of architectures and configurations
• Improve performance and availability over that of a single computer
• More cost-effective than single computers of comparable speed or availability
• E.g. Big Data workloads
TYPES OF CLUSTERS
• High Availability Clusters
 Provide uninterrupted availability of data or services (typically web services) to the end-user community
 In case of node failure, service can be restored without affecting the availability of the services provided by the cluster; there will be a performance drop due to the missing node
 Implementations: data mining, simulations, mission-critical applications or databases, mail, file and print, web, or application servers
 E.g. Oracle Clusterware
• Load Balancing Clusters
 Distribute incoming requests for resources or content among multiple nodes running the same programs or holding the same content
 Every node in the cluster is able to handle requests for the same content or application
 Typically seen in a web-hosting environment
 E.g. nginx as an HTTP load balancer
• Compute Clusters
 Used for computation-intensive purposes, rather than IO-oriented operations such as web services or databases
 Compute clusters vary in their level of coupling: jobs with frequent communication among nodes may require a dedicated network, dense placement, and likely homogeneous nodes; jobs with infrequent communication between nodes may relax some of these requirements
 E.g. the Rocks package on Linux
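The load-balancing idea above, where every node can serve the same content and a front end rotates requests among them, can be sketched as follows. The node names are invented; a real deployment would use a balancer such as nginx, as noted.

```python
# Round-robin load balancer sketch: requests are spread evenly across nodes
# that all serve the same content.
from itertools import cycle

class LoadBalancer:
    def __init__(self, nodes):
        self._nodes = cycle(nodes)

    def route(self, request):
        node = next(self._nodes)     # pick the next node in rotation
        return f"{node} handled {request}"

lb = LoadBalancer(["node-1", "node-2", "node-3"])
for i in range(4):
    print(lb.route(f"GET /page/{i}"))
# node-1, node-2, node-3, then back around to node-1
```

Round-robin is the simplest policy; production balancers also weight by node load or health, which is why a missing node causes only a performance drop rather than an outage.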
BEOWULF CLUSTERS
• Use parallel processing across multiple computers to create cheap and powerful supercomputers
• A cluster has two types of computers:
 Master (service node or front node): used to interact with users and manage the cluster
 Nodes: a group of computing nodes, typically without their own peripherals (keyboard, mouse, video, etc.)
• E.g. OSCAR on Linux
• When a large problem or data set is given to a Beowulf cluster, the master computer first runs a program that breaks the problem into small discrete pieces; it then sends a piece to each node to compute. As nodes finish their tasks, the master continually sends more pieces to them until the entire problem has been computed
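The master/worker flow described above can be sketched locally, with a process pool standing in for the compute nodes. This is a hedged illustration of the pattern, not real Beowulf middleware, and the function names are invented.

```python
# Master/worker sketch: the master splits the problem into pieces, hands each
# piece to a "node", and combines the partial results.
from concurrent.futures import ProcessPoolExecutor

def compute_piece(piece):
    return sum(x * x for x in piece)     # each node's share of the work

def master(data, n_nodes=4):
    size = (len(data) + n_nodes - 1) // n_nodes
    pieces = [data[i:i + size] for i in range(0, len(data), size)]
    with ProcessPoolExecutor(max_workers=n_nodes) as nodes:
        return sum(nodes.map(compute_piece, pieces))  # merge partial results

if __name__ == "__main__":
    print(master(list(range(1000))))   # same answer as the serial sum
```

The pool's `map` also mirrors the "send more pieces as nodes finish" behaviour: it hands out the next pending piece to whichever worker becomes free.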
CLUSTERS - TECHNOLOGIES TO IMPLEMENT
• Parallel Virtual Machine (PVM)
 Must be directly installed on every cluster node, and provides a set of software libraries that present the nodes as a single "parallel virtual machine"
 Provides a run-time environment for message passing, task and resource management, and fault notification
• Message Passing Interface (MPI)
 Drew on various features available in commercial systems of the time; the MPI specifications then gave rise to specific implementations
 Implementations typically use TCP/IP and socket connections
 Widely available communications model that enables parallel programs to be written in languages such as C, Fortran, and Python
CLUSTER BENEFITS
• Availability
• Performance
• Low cost
• Elasticity
• Run jobs anytime, anywhere
GRID COMPUTING
• Grid computing combines computers from multiple administrative domains to reach a common goal
• What distinguishes grid computing from systems such as cluster computing is that grids tend to be more loosely coupled, heterogeneous, and geographically dispersed
• A special kind of distributed computing in which different computers within the same network share one or more resources
TYPES OF GRID COMPUTING - DATA GRIDS
• Allow you to distribute your data across the grid
• The main goal of a data grid is to provide as much data as possible from memory on every grid node and to ensure data coherency
• Characteristics:
 Data replication: all data is fully replicated to all nodes in the grid
 Data invalidation: whenever data changes on one of the nodes, the same data on all other nodes is purged
 Distributed transactions: transactions are required to ensure data coherency
 Data backups: useful for fail-over; some data grid products provide the ability to assign backup nodes for the data
 Data affinity/partitioning: allows splitting/partitioning the whole data set into multiple subsets and assigning every subset to a grid node
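Data affinity/partitioning with a backup node for fail-over can be sketched with a simple hash scheme. The node names and the modulo placement rule are invented for illustration; real data grids use richer placement policies.

```python
# Each key hashes to a primary node (data affinity); its replica goes to the
# next node in the ring (backup for fail-over).
import hashlib

NODES = ["grid-node-0", "grid-node-1", "grid-node-2"]

def partition(key, nodes=NODES):
    """Return the (primary, backup) nodes responsible for a key."""
    h = int(hashlib.sha256(key.encode()).hexdigest(), 16)
    primary = h % len(nodes)
    backup = (primary + 1) % len(nodes)     # replica lives on a different node
    return nodes[primary], nodes[backup]

primary, backup = partition("customer:42")
print(primary, backup)
```

Because the placement is deterministic, any node can compute where a key lives without a central lookup, and if the primary fails, reads fall back to the backup.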
TYPES OF GRID COMPUTING - COMPUTE GRIDS
• Allow you to take a computation, optionally split it into multiple parts, and execute the parts on different grid nodes in parallel, leading to a faster rate of execution (e.g. MapReduce)
• Help improve overall scalability and fault-tolerance by offloading your computations onto the most available nodes
• Characteristics:
 Automatic deployment
 Topology resolution: allows provisioning nodes based on any node characteristic or user-specific configuration
 Collision resolution: jobs are executed in parallel, but synchronization is maintained
 Load balancing: proper balancing of the system load within the grid
 Checkpoints: long-running jobs should be able to periodically store their intermediate state
 Grid events: a querying mechanism for all grid events is essential
 Node metrics: a good compute grid solution should be able to provide dynamic grid metrics for all grid nodes
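The MapReduce example mentioned above can be shown in miniature: the job is split into map tasks that could run on different grid nodes, and a reduce step merges the partial results. This sketch runs locally; a real compute grid would distribute the map tasks.

```python
# Tiny MapReduce-style word count: map per chunk, then reduce across chunks.
from collections import Counter
from functools import reduce

def map_task(chunk):
    return Counter(chunk.split())          # per-node partial word counts

def reduce_task(a, b):
    return a + b                           # Counter addition merges counts

chunks = ["to be or not to be", "to compute is to grid"]
partials = [map_task(c) for c in chunks]   # would run in parallel on grid nodes
totals = reduce(reduce_task, partials)
print(totals["to"])   # 4
```

The map tasks are independent, which is what makes them easy to offload onto whichever nodes are most available, as the slide notes.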
GRID COMPUTING
• Advantages:
 Can solve larger, more complex problems in a shorter time
 Easier to collaborate with other organizations
 Makes better use of existing hardware
• Disadvantages:
 Grid software and standards are still evolving
 Learning curve to get started
 Non-interactive job submission
CLOUD COMPUTING
• A computing paradigm shift in which computing moves away from personal computers or an individual application server to a "cloud" of computers
• Abstraction: users of the cloud only need to be concerned with the computing service being asked for; the underlying details of how it is achieved are hidden
• Virtualization: cloud computing virtualizes systems by pooling and sharing resources
NIST DEFINITION OF CLOUD COMPUTING
• Cloud computing is a model for enabling ubiquitous, convenient, on-demand network access to a shared pool of configurable computing resources (e.g., networks, servers, storage, applications, and services) that can be rapidly provisioned and released with minimal management effort or service provider interaction
CHARACTERISTICS OF CLOUD COMPUTING
1. On-demand self-service
2. Broad network access
3. Resource pooling
4. Rapid elasticity
5. Measured service
CLOUD COMPONENTS
• Clients
• Data center (the collection of servers where the application to which you subscribe is housed)
• Internet
CLOUD COMPUTING - BENEFITS
• Lower costs: lower computer costs and reduced software costs; by using cloud infrastructure on a "pay as used and on demand" basis, organizations can save on both capital and operational investment
• Ease of utilization
• Quality of service
• Reliability
• Outsourced IT management
• Simplified maintenance and upgrades
• Low barrier to entry
• Unlimited storage capacity
• Universal document access
• Latest version availability
CLOUD COMPUTING - LIMITATIONS
• Requires a constant Internet connection
• Does not work well with low-speed connections
• Larger organizations may find hosted applications less customizable than in-house ones
• Security and privacy issues
• The cloud service provider may go down
• Latency concerns
RESOURCES
1. https://www.hpcadvisorycouncil.com/pdf/Intro_to_HPC.pdf
2. https://computing.llnl.gov/tutorials/parallel_comp/#Whatis
3. https://www.cs.cmu.edu/~fp/courses/15213-s06/lectures/27-multicore.pdf
4. https://en.wikipedia.org/wiki/Cache_coherence