Unit-1 (Cloud Computing) 1. Scalable Computing Over the Internet
A P2P system is built over many client machines. Peer machines are
globally distributed in nature.
P2P, cloud computing, and web service platforms are more focused on HTC
applications than on HPC applications.
Clustering and P2P technologies led to the development of computational grids or data grids.
High-Performance Computing
For many years, HPC systems have emphasized raw speed performance.
The speed of HPC systems increased from Gflops in the early 1990s to Pflops by 2010.
High-Throughput Computing
HTC systems pay more attention to high-flux computing, as in Internet searches and web services used by millions of users simultaneously; the performance goal shifts from raw speed to high throughput, measured as the number of tasks completed per unit of time.
Centralized Computing
In centralized computing, all resources (processors, memory, and storage) are fully shared and tightly coupled within one integrated OS (each module depends on the others).
Many data centers and supercomputers are centralized systems, but they are
used in parallel, distributed, and cloud computing applications.
Parallel Computing
In parallel computing, all processors are either tightly coupled with centralized shared memory or loosely coupled with distributed memory.
Distributed Computing
A distributed system consists of multiple autonomous computers, each having its own private memory, communicating through a computer network.
Information exchange in a distributed system is accomplished through
message passing.
A computer program that runs in a distributed system is known as a
distributed program.
The process of writing distributed programs is referred to as distributed
programming.
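To make the message-passing idea concrete, here is a minimal Python sketch in which two processes exchange messages without any shared memory. A multiprocessing pipe on one machine stands in for the network link between nodes, and the process and message names are illustrative, not from the text.

    from multiprocessing import Process, Pipe

    def worker(conn):
        # The "remote" node: receive a request, send back a reply.
        request = conn.recv()
        conn.send(f"processed({request})")
        conn.close()

    if __name__ == "__main__":
        parent_end, child_end = Pipe()
        p = Process(target=worker, args=(child_end,))
        p.start()
        parent_end.send("task-1")   # message passing: no memory is shared
        print(parent_end.recv())    # -> processed(task-1)
        p.join()

In a real distributed program the two endpoints would live on different machines and communicate over sockets or a messaging library, but the pattern of explicit send and receive is the same.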
Cloud Computing
Clouds can be built with physical or virtualized resources over large data
centers that are centralized or distributed.
The following technologies underlie network-based systems:
Processor speed is measured in millions of instructions per second (MIPS) and network
bandwidth is measured in megabits per second (Mbps) or gigabits per second (Gbps). The
unit GE (Gigabit Ethernet) refers to 1 Gbps Ethernet bandwidth.
Advances in CPU Processors
Advanced CPUs or microprocessor chips assume a multicore architecture with dual, quad, six, or more processing cores. These processors exploit parallelism at the ILP (instruction-level parallelism) and TLP (thread-level parallelism) levels.
ILP mechanisms include multiple-issue superscalar architecture (a CPU design that issues several independent instructions per cycle), dynamic branch prediction, and speculative execution, among others. These ILP techniques demand hardware and compiler support.
In addition, DLP (data-level parallelism) and TLP are highly explored in graphics processing units (GPUs), which adopt a many-core architecture with hundreds to thousands of simple cores.
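As a rough illustration of DLP, the short Python sketch below applies one arithmetic expression across a whole array at once instead of looping element by element. The use of NumPy here is an assumption for illustration only; its vectorized operations mirror, on a small scale, what a GPU's many cores do across thousands of data elements.

    import numpy as np

    a = np.arange(1_000_000, dtype=np.float32)
    b = np.arange(1_000_000, dtype=np.float32)

    # Data-level parallelism: one operation is applied element-wise
    # across a million pairs, with no explicit Python loop.
    c = 2.0 * a + b
    print(c[:3])  # [0. 3. 6.]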
Both multi-core CPU and many-core GPU processors can handle multiple instruction
threads at different magnitudes today.
Figure 1.5 shows the architecture of a typical multicore processor.
Each core is essentially a processor with its own private cache (L1 cache).
Multiple cores are housed in the same chip with an L2 cache that is shared by all cores.
In the future, multiple CMPs (chip multiprocessors) could be built on the same CPU chip, with even the L3 cache on the chip.
Many high-end processors, including the Intel i7, Xeon, AMD Opteron, Sun Niagara, IBM Power 6, and X Cell processors, are equipped with multicore and multithreaded CPUs. Each core can also be multithreaded.
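As a minimal sketch of thread-level parallelism on a multicore CPU, the Python example below splits a computation across one worker process per core. The chunking scheme and the squared-sum workload are illustrative choices, not anything prescribed above.

    import os
    from multiprocessing import Pool

    def partial_sum(chunk):
        # Each worker runs on its own core with private state,
        # analogous to an independent hardware thread of work.
        return sum(x * x for x in chunk)

    if __name__ == "__main__":
        data = list(range(1_000_000))
        cores = os.cpu_count() or 1
        chunks = [data[i::cores] for i in range(cores)]
        with Pool(processes=cores) as pool:
            print(sum(pool.map(partial_sum, chunks)))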
Multicore CPU and Many-Core GPU Architectures:
Multicore CPUs may increase from the tens of cores to hundreds or more in the future.
However, the CPU has reached its limit in exploiting massive DLP due to the memory wall problem (the widening gap between processor speed and memory access speed). This has triggered the development of many-core GPUs with hundreds or more thin cores.
Both IA-32 (the 32-bit x86 instruction set architecture designed by Intel and first implemented in the 80386 microprocessor) and IA-64 instruction set architectures are built into commercial CPUs. Today, x86 processors have been extended to serve HPC and HTC systems in some high-end server processors.
Many RISC (reduced instruction set computer, a design whose microprocessor uses fewer cycles per instruction than a complex instruction set computer) processors have been replaced with multicore x86 processors and many-core GPUs in the Top 500 systems. This tendency indicates that x86 upgrades will dominate in data centers and supercomputers.
The GPU (graphics processing unit) is a programmable chip specialized for display functions: it renders images, animations, and video to the computer screen, and is typically located on a plug-in card or in a chipset on the motherboard. The GPU has also been applied in large clusters to build supercomputers in MPPs (massively parallel processors).
In the future, the processor industry is also keen to develop asymmetric or heterogeneous
chip multiprocessors that can house both fat CPU cores and thin GPU cores on the same
chip.
Today, the NVIDIA GPU has been upgraded to 128 cores on a single chip. Furthermore,
each core on a GPU can handle eight threads of instructions.
This translates to up to 1,024 threads (128 cores × 8 threads per core) executing concurrently on a single GPU. This is true massive parallelism, compared to the few threads a conventional CPU can handle.
The CPU is optimized for low latency through caches, while the GPU is optimized to deliver much higher throughput with explicit management of on-chip memory.
The GPU relieves the CPU of data-intensive calculations, not just those related to video processing.
Conventional GPUs are widely used in mobile phones, game consoles, embedded
systems, PCs, and servers.
The NVIDIA CUDA Tesla or Fermi is used in GPU clusters or in HPC systems for parallel processing of massive floating-point data.
GPU Programming Model
Figure 1.7 shows the interaction between a CPU and GPU in performing parallel
execution of floating-point operations concurrently.
The CPU is the conventional multicore processor with limited parallelism to exploit.
The GPU has a many-core architecture that has hundreds of simple processing cores
organized as multiprocessors.
Each core can have one or more threads.
Essentially, the CPU’s floating-point kernel computation role is largely offloaded to the
many-core GPU.
The CPU instructs the GPU to perform massive data processing.
The bandwidth must be matched between the on-board main memory and the on-chip
GPU memory.
This process is carried out in NVIDIA’s CUDA programming using the GeForce 8800 or
Tesla and Fermi GPUs.
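As a hedged sketch of this offload pattern (assuming a CUDA-capable NVIDIA GPU and the CuPy library, neither of which is prescribed above): the CPU prepares floating-point data in main memory, copies it to GPU memory, triggers the data-parallel computation there, and copies the result back.

    import numpy as np
    import cupy as cp  # assumption: CuPy installed, CUDA-capable GPU present

    # CPU side: prepare floating-point data in on-board main memory.
    host_x = np.random.rand(1_000_000).astype(np.float32)

    # The CPU instructs the GPU: data crosses the host-device bus into
    # GPU memory, then the computation runs on the many simple cores.
    dev_x = cp.asarray(host_x)
    dev_y = cp.sqrt(dev_x) * 2.0  # executed on the GPU

    # Copy the result back to main memory for the CPU to consume.
    host_y = cp.asnumpy(dev_y)
    print(host_y[:3])

The two explicit copies (asarray in, asnumpy out) are where the matched bandwidth between on-board main memory and on-chip GPU memory matters in practice.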
A conventional computer has a single OS image. This offers an inflexible architecture that
tightly couples application software to a specific hardware platform.
Some software running well on one machine may not be executable on another platform with
a different instruction set under a fixed OS.
Virtual machines (VMs) offer novel solutions to underutilized resources, application inflexibility, software manageability, and security concerns in existing physical machines.
Currently, to build large clusters, large grids (networks), and large clouds, we need to access
large amounts of computing, storage, and networking resources in a virtualized manner.
We need to aggregate those resources and, ideally, offer a single system image.
In particular, a cloud of provisioned resources must rely on virtualization of processors,
memory, and I/O facilities dynamically.
Virtual Machines
o First, VMs can be multiplexed between hardware machines, as shown in Figure 1.13(a).
o Second, a VM can be suspended and stored in stable storage, as shown in Figure 1.13(b).
o Third, a suspended VM can be resumed or provisioned to a new hardware platform, as shown in Figure 1.13(c).
o Finally, a VM can be migrated from one hardware platform to another, as shown in Figure 1.13(d).
o The VM approach significantly enhances the utilization of server resources.
o Multiple server functions can be consolidated on the same hardware platform to achieve higher system efficiency.
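These four operations correspond to calls in hypervisor management APIs. The following is a minimal Python sketch using the libvirt bindings and assuming a QEMU/KVM host; the connection URI, domain name, and save-file path are placeholders, and production code would add error handling.

    import libvirt  # assumption: libvirt-python installed, hypervisor running

    conn = libvirt.open("qemu:///system")   # placeholder connection URI
    dom = conn.lookupByName("demo-vm")      # placeholder VM name

    dom.suspend()                     # pause the VM in place
    dom.resume()                      # and resume it again

    dom.save("/tmp/demo-vm.sav")      # suspend to stable storage (VM stops)
    conn.restore("/tmp/demo-vm.sav")  # provision it back onto hardware

    # Live migration to another host would use dom.migrate(...) against a
    # second connection opened to the destination hypervisor.
    conn.close()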
Convergence of Technologies
o Essentially, cloud computing is enabled by the convergence of technologies in four areas: (1) hardware virtualization and multicore chips, (2) utility and grid computing, (3) SOA, Web 2.0, and WS mashups, and (4) autonomic computing and data center automation.
Data Center Virtualization for Cloud Computing - Data Center Growth and Cost
Breakdown
A large data center may be built with thousands of servers.
Smaller data centers are typically built with hundreds of servers.
The cost to build and maintain data center servers has increased over the years.
Low-Cost Design Philosophy
High-end switches or routers may be cost-prohibitive for building data centers; thus, high-bandwidth networks may not fit the economics of cloud computing.