High Performance Cluster Computing: Architectures and Systems
Aerospace
Internet & Ecommerce
Life Sciences
Get Help
Computer Analogy
Using faster hardware
Optimized algorithms and techniques used to
SMP
2-64 processors today
Shared-everything architecture
All processors share all the global resources available
Single copy of the OS runs on these systems
Scalable Parallel Computer Architectures
CC-NUMA
a scalable multiprocessor system having a cache-coherent nonuniform
memory access architecture
every processor has a global view of all of the memory
Clusters
a collection of workstations / PCs that are interconnected by a high-speed
network
work as an integrated collection of resources
have a single system image spanning all its nodes
Distributed systems
considered conventional networks of independent computers
have multiple system images as each node runs its own OS
the individual machines could be combinations of MPPs, SMPs, clusters, &
individual computers
Rise and Fall of Computer Architectures
Vector Computers (VC) - proprietary system:
provided the breakthrough needed for the emergence of
computational science, but they were only a partial answer.
Massively Parallel Processors (MPP) - proprietary systems:
high cost and a low performance/price ratio.
Symmetric Multiprocessors (SMP):
suffer from limited scalability
Distributed Systems:
difficult to use and hard to extract parallel performance.
Clusters - gaining popularity:
High Performance Computing - Commodity Supercomputing
High Availability Computing - Mission Critical Applications
Top500 Computers Architecture
(Clusters share is growing)
The Dead Supercomputer Society
http://www.paralogos.com/DeadSuper/
ACRI
Alliant
American Supercomputer
Ametek
Applied Dynamics
Astronautics
BBN
CDC
Convex
Cray Computer
Cray Research (SGI? Tera)
Culler-Harris
Culler Scientific
Cydrome
Dana/Ardent/Stellar
Elxsi
ETA Systems
Evans & Sutherland Computer Division
Floating Point Systems
Galaxy YH-1
Goodyear Aerospace MPP
Gould NPL
Guiltech
Intel Scientific Computers
Intl. Parallel Machines
KSR
MasPar
Meiko
Myrias
Thinking Machines
Saxpy
Scientific Computer Systems (SCS)
Soviet Supercomputers
Suprenum
Convex C4600
Vendors: specialised ones (e.g., TMC) disappeared; new ones emerged
Computer Food Chain: causing the demise of specialised systems
PDA
Clusters
& OS
A cluster:
generally 2 or more computers (nodes) connected together
in a single cabinet, or physically separated & connected via a
LAN
appear as a single system to users and applications
provide a cost-effective way to gain features and benefits
Cluster Architecture
Sequential Applications
Parallel Applications
Parallel Programming Environment
Cluster Middleware (Single System Image and Availability Infrastructure)
UP
Common Cluster Modes
High Performance (dedicated).
High Throughput (idle cycle harvesting).
High Availability (fail-over).
Shared Pool of Computing Resources:
Processors, Memory, Disks
Interconnect
SMPs (CLUMPS)
Grid Computing
System CPUs
Processors
Intel x86-class Processors
Pentium Pro and Pentium Xeon
AMD x86, Cyrix x86, etc.
Digital Alpha – phased out when HP acquired it.
Alpha 21364 processor integrates processing, memory
controller, network interface into a single chip
IBM PowerPC
Sun SPARC
(Scalable Processor Architecture)
SGI MIPS
(Microprocessor without Interlocked Pipeline Stages)
System Disk
Disk and I/O
Overall improvement in disk access time has
been less than 10% per year
Amdahl’s law
Speed-up obtained from faster processors is
limited by the slowest system component
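The limit described by Amdahl's law can be made concrete with a small calculation. The sketch below uses the usual textbook form, overall speed-up = 1 / ((1 - f) + f / s), where f is the fraction of run time that benefits from the faster component and s is that component's speed-up. The function name and the 80% / 20% workload split are illustrative assumptions, not figures from these slides.

```python
def amdahl_speedup(fraction_enhanced: float, component_speedup: float) -> float:
    """Overall speed-up when only part of the run time gets faster.

    fraction_enhanced: share of the original run time that benefits (0..1).
    component_speedup: how many times faster that share becomes (> 0).
    """
    if not 0.0 <= fraction_enhanced <= 1.0:
        raise ValueError("fraction_enhanced must be between 0 and 1")
    if component_speedup <= 0:
        raise ValueError("component_speedup must be positive")
    unchanged = 1.0 - fraction_enhanced
    return 1.0 / (unchanged + fraction_enhanced / component_speedup)


# Hypothetical workload: 80% of the time is CPU-bound, 20% is spent on disk I/O.
# However fast the CPU becomes, the untouched 20% caps the gain below 5x.
for cpu_speedup in (2, 10, 100):
    print(cpu_speedup, round(amdahl_speedup(0.8, cpu_speedup), 2))
```

Under these assumed numbers, faster processors alone approach but never reach a 5x overall gain, which is why slow disk access matters for the system as a whole.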
Parallel I/O
Carry out I/O operations in parallel, supported
by parallel file system based on hardware or
software RAID
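As a rough illustration of parallel I/O over striped storage, the sketch below splits data round-robin across several "disks" (RAID 0 style striping) and reads them back concurrently. The disks are plain in-memory lists and the names stripe and read_striped are hypothetical; a real parallel file system or RAID layer works on block devices and is considerably more involved.

```python
from concurrent.futures import ThreadPoolExecutor


def stripe(data: bytes, n_disks: int, stripe_size: int) -> list:
    """Distribute fixed-size chunks of data round-robin over n_disks."""
    disks = [[] for _ in range(n_disks)]
    for start in range(0, len(data), stripe_size):
        chunk_index = start // stripe_size
        disks[chunk_index % n_disks].append(data[start:start + stripe_size])
    return disks


def read_disk(disk: list) -> list:
    """Stand-in for reading every chunk held by one disk."""
    return list(disk)


def read_striped(disks: list) -> bytes:
    """Read all disks concurrently, then interleave chunks in original order."""
    with ThreadPoolExecutor(max_workers=len(disks)) as pool:
        per_disk = list(pool.map(read_disk, disks))
    chunks = []
    rounds = max(len(d) for d in per_disk)
    for r in range(rounds):
        for d in per_disk:
            if r < len(d):  # trailing disks may hold one chunk fewer
                chunks.append(d[r])
    return b"".join(chunks)


data = bytes(range(10))
disks = stripe(data, n_disks=3, stripe_size=2)
restored = read_striped(disks)
```

Each disk serves only part of every large request, so in a real system the transfers can overlap instead of queuing on a single device.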
Commodity Components for Clusters (II): Operating Systems
Operating Systems
2 fundamental services for users
make the computer hardware easier to use
create a virtual machine that differs markedly from the real
machine
share hardware resources among users
Processor - multitasking
The new concept in OS services
support multiple threads of control in a process itself
parallelism within a process
multithreading
POSIX thread interface is a standard programming environment
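To illustrate multiple threads of control inside one process, here is a minimal sketch using Python's threading module as a stand-in for the POSIX thread interface (creating and joining threads, and protecting shared data with a lock, play roughly the roles of pthread_create, pthread_join, and a pthread mutex). The names worker and run_threads and the shared counter are illustrative assumptions.

```python
import threading


def worker(counter: dict, lock: threading.Lock, increments: int) -> None:
    """Each thread updates the same counter, which lives in shared process memory."""
    for _ in range(increments):
        with lock:  # serialise updates so no increment is lost
            counter["value"] += 1


def run_threads(n_threads: int = 4, increments: int = 1000) -> int:
    counter = {"value": 0}
    lock = threading.Lock()
    threads = [
        threading.Thread(target=worker, args=(counter, lock, increments))
        for _ in range(n_threads)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()  # wait for every thread of control to finish
    return counter["value"]


total = run_threads()
```

All threads share the process's address space, which is what distinguishes parallelism within a process from running several separate processes.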
Trend
Modularity – MS Windows, IBM OS/2
Microkernel – provides only essential OS services
high-level abstraction of the OS for portability
Prominent Components of Cluster Computers
State of the art Operating Systems
Linux (MOSIX, Beowulf, and many more)
Microsoft NT (Illinois HPVM, Cornell Velocity)
SUN Solaris (Berkeley NOW, C-DAC PARAM)
IBM AIX (IBM SP2)
HP UX (Illinois - PANDA)
Mach (Microkernel based OS) (CMU)
Cluster Operating Systems (Solaris MC, SCO Unixware, MOSIX (academic project))
OS gluing layers (Berkeley Glunix)
Operating Systems used in Top500 Powerful Computers
Prominent Components of Cluster Computers (III)
High Performance Networks/Switches
Ethernet (10Mbps),
latency)
ATM (Asynchronous Transfer Mode)
Myrinet (1.28Gbps)
InfiniBand
Prominent Components of Cluster Computers (IV)
Fast Communication Protocols and Services (User Level Communication):
Active Messages (Berkeley)
Fast Messages (Illinois)
U-net (Cornell)
XTP (Virginia)
High Throughput
High Availability
Clusters Classification (I)
Application Target
High Performance (HP) Clusters
Grand Challenge Applications
High Availability (HA) Clusters
Mission Critical applications
Clusters Classification (II)
Node Ownership
Dedicated Clusters
Non-dedicated clusters
Node Hardware
Clusters of PCs (CoPs)
Piles of PCs (PoPs)
Clusters of Workstations (COWs)
Clusters of SMPs (CLUMPs)
Clusters Classification (IV)
Node Operating System
Linux Clusters (e.g., Beowulf)
Solaris Clusters (e.g., Berkeley NOW)
AIX Clusters (e.g., IBM SP2)
HP-UX clusters
Node Configuration
Homogeneous Clusters
All nodes will have similar architectures
and run the same OSs
Heterogeneous Clusters
Nodes may have different
architectures and run different OSs
Clusters Classification (VI)
Levels of Clustering
Group Clusters (#nodes: 2-99)
Nodes are connected by a SAN such as Myrinet
Departmental Clusters (#nodes: 10s to 100s)
Organizational Clusters (#nodes: many 100s)
National Metacomputers (WAN/Internet-based)
International Metacomputers (Internet-based, #nodes: 1000s to many millions)
Grid Computing
Web-based Computing
Peer-to-Peer Computing
Single System Image
Medium grain (control level)
Function (thread): func1() { .... } func2() { .... } func3() { .... }
Threads
Fine grain (data level)
Loop (Compiler): a(0)=.. a(1)=.. a(2)=.. b(0)=.. b(1)=.. b(2)=..
Compilers
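The medium-grain (function/thread) and fine-grain (loop) levels named above can be contrasted in a short sketch. The function bodies, return values, and the doubling loop are placeholders chosen for illustration. A thread pool is used for both parts only to show which units are independent. In practice, loop-level parallelism is typically extracted by the compiler rather than written by hand like this.

```python
from concurrent.futures import ThreadPoolExecutor


def func1() -> str:
    return "func1 done"


def func2() -> str:
    return "func2 done"


def func3() -> str:
    return "func3 done"


def medium_grain() -> list:
    """Control-level parallelism: whole functions run as separate threads."""
    with ThreadPoolExecutor(max_workers=3) as pool:
        futures = [pool.submit(f) for f in (func1, func2, func3)]
        return [f.result() for f in futures]


def double(x: int) -> int:
    return 2 * x


def fine_grain(b: list) -> list:
    """Data-level parallelism: every a[i] = 2 * b[i] is independent of the others."""
    with ThreadPoolExecutor() as pool:
        return list(pool.map(double, b))


results = medium_grain()
a = fine_grain([1, 2, 3])
```

The difference is the size of the unit handed to each worker: an entire function at the medium grain, a single loop iteration at the fine grain.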
processors
In uniprocessor systems
Used to utilize the system resources effectively
Internet Applications:
ASPs (Application Service Providers);
Computing Portals;