
Unit 4

1. Cloud Resource Management and Scheduling

1. Core Role of Resource Management

o Impacts performance, functionality, and cost of a system.

o Inefficient resource management negatively affects performance and cost, indirectly affecting functionality.

2. Challenges in Cloud Resource Management

o Clouds are complex systems with shared resources facing unpredictable requests and
external events.

o Requires complex policies and multi-objective optimization.

o Accurate global state information is unattainable due to system complexity and unpredictable interactions.

3. Cloud Delivery Models and Resource Management

o IaaS, PaaS, and SaaS strategies vary significantly.

o Cloud elasticity is challenged by fluctuating loads.

o Planned spikes can be handled with advance provisioning (e.g., seasonal traffic).

o Unplanned spikes need Auto Scaling, reliant on:

 A pool of on-demand resources.

 A monitoring system with real-time control loops.

4. Auto Scaling and its Limitations

o PaaS services (e.g., Google App Engine) support Auto Scaling.

o IaaS scaling is more challenging due to lack of standardization.

5. Decentralized and Autonomic Policies

o Centralized control is unsuitable for cloud systems due to unpredictability and scale.

o Autonomic policies are essential to manage large-scale systems, high request volumes, and fluctuating loads.

6. Resource Allocation and Scheduling

o Includes mechanisms like control theory, machine learning, utility models, and
combinatorial auctions.

o Specific scheduling algorithms:

 Fair queuing, start-time fair queuing, borrowed virtual time scheduling.


 Scheduling with deadlines.

o The impact of application scaling on resource management is significant.

2. Policies for Resource Management

1. Admission Control:

o Ensures workloads comply with high-level system policies.

o Prevents accepting tasks that hinder ongoing operations.

o Relies on global state information, often outdated in dynamic systems.

2. Capacity Allocation:

o Allocates resources for service instances.

o Requires navigating large search spaces and multiple global optimization constraints.

3. Load Balancing:

o Traditional Approach: Distributes loads evenly among servers.

o Cloud Approach: Consolidates load to minimize the number of active servers, reducing energy consumption by switching idle servers to standby.

4. Energy Optimization:

o Reduces energy costs by optimizing server utilization.

o Uses techniques like dynamic voltage and frequency scaling (DVFS).

5. Quality-of-Service (QoS):

o Ensures performance consistency despite dynamic conditions.

o Critical for cloud service reliability.

Mechanisms for Resource Management

1. Control Theory:

o Uses feedback to ensure system stability and predict transient behaviors.

o Limited to local rather than global predictions.

2. Machine Learning:

o Avoids reliance on detailed performance models.

o Useful for coordinating multiple autonomic managers.

3. Utility-Based Approaches:

o Require performance models to link user-level performance and cost.


4. Market-Oriented Mechanisms:

o Examples: Combinatorial auctions for resource bundles.

o Operate without requiring system models.

Workload Management

1. Interactive Workloads (e.g., Web Services):

o Focus on flow control and dynamic application placement.

2. Noninteractive Workloads:

o Prioritize scheduling tasks efficiently.

3. Heterogeneous Workloads:

o Combine interactive and noninteractive tasks, requiring tailored management.

Energy-Performance Tradeoffs

1. Dynamic Voltage and Frequency Scaling (DVFS):

o Reduces power consumption by lowering voltage and frequency.

o Balances energy savings with minimal performance impact (e.g., 18% energy savings
at 1.8 GHz with only 5% performance loss).
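The tradeoff can be sketched with a simple model. Assuming dynamic power scales roughly as f³ (supply voltage scaling linearly with frequency) and a CPU-bound task's runtime scales as 1/f, energy scales as f². The model and numbers below are illustrative assumptions, not measurements from the text:

```python
# Illustrative DVFS energy model (an assumption, not the chapter's data):
# dynamic power ~ f^3 when voltage scales linearly with frequency,
# and a CPU-bound task's execution time ~ 1/f, so energy ~ f^2.

def relative_energy(f_scaled, f_nominal):
    """Energy at f_scaled relative to energy at f_nominal (CPU-bound task)."""
    power_ratio = (f_scaled / f_nominal) ** 3   # P ~ V^2 * f ~ f^3
    time_ratio = f_nominal / f_scaled           # slower clock, longer run
    return power_ratio * time_ratio             # energy ~ f^2

# Scaling a 2.0 GHz core down to 1.8 GHz:
print(round(relative_energy(1.8, 2.0), 2))  # 0.81 -> roughly 19% energy saved
```

For memory-bound workloads the runtime penalty is smaller than 1/f, which is why the measured performance loss can be as low as 5% while energy savings stay close to this estimate.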

Limitations of Existing Techniques

1. Mechanisms do not scale effectively for large systems.

2. Many focus on a single aspect (e.g., admission control) while ignoring others like energy
optimization.

3. Require complex, often impractical computations within short response times.

4. Based on overly simplistic assumptions (e.g., servers are protected from overload).

5. Rarely include energy or QoS tradeoffs in performance models.

3. Applications of Control Theory to Task Scheduling on a Cloud

1. Overview of Control Theory in Resource Management

o Adaptive resource management applied to power management, task scheduling, QoS adaptation in web servers, and load balancing.

o Uses classical feedback control methods for regulating system parameters.


o Assumes linear time-invariant systems with closed-loop controllers, stability, and
sensitivity constraints.

2. Optimal Resource Management Technique

o Incorporates multiple QoS objectives and operating constraints into a cost function.

o Applicable to systems like web servers, database servers, application servers, and
mobile/embedded systems.

3. Optimal Control Principles

o Optimizes control inputs over a look-ahead horizon.

o Uses a convex cost function with state variables x(k) and control vector u(k).

o Solves discrete-time optimization problems over a horizon to minimize a cost function.

4. Lagrange Multiplier Method

o Finds optimal solutions by introducing a multiplier λ to represent constraints.

o Forms an adjoint system with state and costate equations to solve two-point
boundary problems.

5. Single-Processor System Model

o Uses a queuing model to manage a stream of input requests.

o Assumes processors operate at variable frequencies u(k), with scaling factors, and estimates processing rates N(k).

6. Queue Dynamics and Response Time

o Queue size at step k + 1:

q(k+1) = max[ q(k) + λ̂(k) − ( u(k) / (ĉ(k) · u_max) ) · T_s , 0 ]

o Response time ω(k):

ω(k) = (1 + q(k)) · ĉ(k)

7. Utility and Energy Functions

o Utility: a quadratic function of the response time:

S(q(k)) = (1/2) · s · ( ω(k) − ω₀ )²

o Energy: a quadratic function of the operating frequency:

R(u(k)) = (1/2) · r · u(k)²

8. Cost Function and Constraints

o Total cost:

J = S(q(N)) + Σ_{k=1}^{N−1} [ S(q(k)) + R(u(k)) ]

o Constraints:

q(k+1) = q(k) + λ̂(k) − ( u(k) / (ĉ(k) · u_max) ) · T_s

q(k) ≥ 0,  u_min ≤ u(k) ≤ u_max
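The queue-dynamics and cost equations of items 6–8 can be simulated directly. The sketch below uses made-up values for λ̂, ĉ, u_max, s, r, and ω₀, and sums stage costs over the whole horizon (a slight simplification of the k = 1..N−1 indexing):

```python
# Toy simulation of the single-processor queuing model (illustrative values).
# q(k+1) = max[q(k) + lam(k) - u(k)/(c(k)*u_max) * Ts, 0]
# omega(k) = (1 + q(k)) * c(k)
# J = S(q(N)) + sum over k of [S(q(k)) + R(u(k))]

def simulate(u, lam, c, u_max=1.0, Ts=1.0, s=1.0, r=0.1, omega0=1.0):
    S = lambda q: 0.5 * s * ((1 + q) * c - omega0) ** 2   # response-time cost
    R = lambda uk: 0.5 * r * uk ** 2                      # energy cost
    q, J = 0.0, 0.0
    for k in range(len(u)):
        J += S(q) + R(u[k])
        q = max(q + lam - (u[k] / (c * u_max)) * Ts, 0.0)
    return J + S(q)   # terminal cost S(q(N))

# A higher control input (frequency) drains the queue but costs more energy;
# here the fast setting still wins because queueing penalties dominate:
print(simulate(u=[0.8] * 5, lam=0.5, c=1.0) < simulate(u=[0.4] * 5, lam=0.5, c=1.0))
```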

9. Hamiltonian Formulation

o Includes system dynamics, utility, energy, and boundary constraints.

o Lagrange multipliers reflect sensitivity to cost functions and constraints.

10. Challenges in Scaling to Cloud Systems

o Extending methods from single-server systems to large-scale cloud environments is complex.

o Cloud systems require additional considerations, such as workflow complexity and coordination among multiple servers.

4. Stability of a Two-Level Resource Allocation Architecture

The key points of the two-level resource allocation architecture are:

1. Two-Level Architecture:

o Application Controller: Operates at the application level, managing individual application performance based on specific Service Level Agreements (SLAs).

o Cloud Controller: Operates at the cloud platform level, managing the overall cloud
infrastructure, including resource allocation and energy optimization.

2. Components of the Control System:

o Inputs: Workload, admission control policies, capacity allocation, load balancing, energy optimization, and Quality of Service (QoS) guarantees.

o System Components: Sensors (for performance measurement) and controllers (to implement policies).

o Outputs: Resource allocation to individual applications.

3. Sources of Instability:

o Delays: Reaction delay after a control action.

o Control Granularity: Small controller changes leading to significant output changes.


o Oscillations: Large input changes propagate to the output when control is weak.

4. Types of Policies:

o Threshold-Based Policies: Simple, rely on upper and lower performance bounds for
triggering adaptation.

o Sequential Decision Policies: Use Markovian models to determine actions dynamically.

5. Key Lessons for Stability:

o Controllers must adjust actions only after system performance stabilizes.

o Avoid setting upper and lower thresholds too close together to prevent oscillations.

o Consider stabilization time before making further adjustments.

o Single-VM allocation/deallocation can sometimes cause a threshold crossing, leading to further instability.

This architecture emphasizes feedback control to ensure the cloud system remains stable while
efficiently managing resources.

5. Key Challenges in Control Systems for Cloud Resource Management

1. Control System Components:

o Sensors: Measure parameters of interest (e.g., system load).

o Monitors: Analyze sensor data to decide if actions are needed.

o Actuators: Implement the necessary changes (e.g., scaling resources).

2. Admission Control Policy:

o A typical policy limits additional load when system load reaches a defined threshold
(e.g., 80%).

3. Challenges in Implementation:

o Load Estimation Issues:

 The system load changes rapidly, making accurate measurement difficult.

 The large number of servers in a cloud adds complexity to real-time monitoring.

o Service-Level Agreement (SLA) Constraints:

 Users’ resource demands must be met once SLAs are established.

 Denying user requests within SLA limits is not feasible, even when system
load is high.

Thresholds in Control Systems:

1. Threshold Definition:
A threshold is a predefined value of a system parameter that triggers a change in the
system's behavior. Thresholds are used to keep critical parameters within acceptable limits in
a control system.

2. Types of Thresholds:

o Static Threshold: A fixed value, set once, that doesn't change over time.

o Dynamic Threshold: A flexible threshold that adapts over time, often based on
averages or multiple parameters. For example, it could be the average of
measurements over a specific time period (integral control).

3. High and Low Thresholds:

o High Threshold: When the system parameter exceeds this value, the system may
reduce or limit certain activities to avoid overload.

o Low Threshold: When the system parameter falls below this value, the system may
increase activities or resource usage to meet demand.

4. Control Granularity:

o Fine Control: Uses highly detailed information about system parameters, allowing for
precise adjustments.

o Coarse Control: Involves less detailed information, trading off precision for simplicity
and efficiency. Coarse control is often preferred in large systems like clouds, where
efficiency is a priority.

Proportional Thresholding: These ideas apply to cloud computing, in particular to the IaaS delivery model, through a resource management strategy called proportional thresholding.

1. Two Types of Controllers:

 Application Controllers: Decide when additional resources are needed for an application.

 Cloud Controllers: Manage resource allocation at the cloud level, ensuring that physical resources are appropriately distributed.

2. Proportional Thresholding Algorithm:
The algorithm for resource management using proportional thresholding works as follows:

 Step 1: Calculate the high and low thresholds as averages of the maximum and minimum
CPU utilization, respectively, over a historical period.

 Step 2: If the current average CPU utilization exceeds the high threshold, request additional
VMs.

 Step 3: If the current average CPU utilization falls below the low threshold, release a VM.
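The three steps above can be sketched as a control loop; the history-window format and the VM request/release callbacks are illustrative assumptions:

```python
# Sketch of the proportional-thresholding loop (Steps 1-3 above).
# The window format and VM-change callbacks are illustrative assumptions.

def control_step(history, current_avg, request_vm, release_vm):
    """history: recent windows of per-interval CPU utilizations."""
    high = sum(max(w) for w in history) / len(history)   # avg of window maxima
    low = sum(min(w) for w in history) / len(history)    # avg of window minima
    if current_avg > high:
        request_vm()        # Step 2: overloaded -> ask cloud controller for a VM
    elif current_avg < low:
        release_vm()        # Step 3: underloaded -> give one VM back
    # between the thresholds: do nothing, which avoids oscillations

actions = []
control_step(history=[[0.6, 0.9], [0.5, 0.8]], current_avg=0.95,
             request_vm=lambda: actions.append("request"),
             release_vm=lambda: actions.append("release"))
print(actions)  # ['request']
```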

6. Coordination of Specialized Autonomic Performance Managers

The goal of the research presented in the 2007 IBM paper is to explore whether specialized autonomic performance managers can cooperate to optimize power consumption while ensuring that service-level agreement (SLA) requirements are satisfied.

Key Concepts and Mechanisms:

1. Dynamic Voltage Scaling (DVS):


o Energy-saving mechanism used in modern processors.
o Power dissipation scales quadratically with supply voltage, so adjusting
voltage and frequency can reduce power consumption.
o CPU frequency control impacts instruction execution rates, which may
decrease performance for compute-intensive workloads.
2. Power and Performance Coordination:
o The coordination between performance managers and power managers is
central to achieving both optimal energy consumption and SLA compliance.
o The system architecture involves:
 Performance Manager: Monitors and adjusts performance (e.g.,
response times) to meet SLA requirements.
 Power Manager: Controls power usage and adjusts the CPU clock
frequency based on the energy optimization policy.
3. Joint Utility Function:
   o A joint utility function U_pp(R, P) combines response time (R) and power consumption (P), helping balance the two goals. The utility function can take forms like:
      U_pp(R, P) = U(R) − ε × P, where U(R) is a response-time utility function and ε is a weighting parameter.
      U_pp(R, P) = U(R) / P, where performance is maximized relative to power consumption.
   o This function is used to determine the optimal power cap for each server or blade.
4. Power Cap Optimization:
   o A power cap (p_κ) is used to restrict the power consumption of each server.
   o The optimal power cap p_κ^opt(n_c) depends on the number of clients n_c and aims to maximize the utility function for a given workload.
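A minimal sketch of choosing a power cap by maximizing the joint utility U_pp(R, P) = U(R) − ε·P; the candidate caps, response-time model, utility shape, and ε are all made-up illustrative values, not the paper's models:

```python
# Illustrative search for the power cap that maximizes U_pp(R, P) = U(R) - eps*P.
# Response-time model, utility shape, and eps are assumptions for the example.

def find_power_cap(caps, response_time, eps=1.0):
    """response_time(p): assumed monotone-decreasing model R(p) for cap p."""
    U = lambda R: max(0.0, 1000.0 - R)           # assumed response-time utility
    return max(caps, key=lambda p: U(response_time(p)) - eps * p)

caps = [100, 150, 200, 250, 300]          # candidate caps in watts (hypothetical)
rt = lambda p: 60000.0 / p                # ms; more power -> faster responses
print(find_power_cap(caps, rt))           # 250: past this, power cost outweighs gain
```

Raising the cap beyond the optimum still improves response time, but the ε·P penalty grows faster than the utility, so the interior cap wins.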
5. System Setup:
o The hardware used for the experiments consisted of Goldensbridge blades
with Intel Xeon processors running at 3 GHz, equipped with 1 GB of L2
cache and 2 GB of DRAM. Hyperthreading was enabled.
o The power management system adjusts the processor's frequency based on the
power cap, ranging from 375 MHz at low loads to 3 GHz at high loads.
o The response time target is set to be under 1,000 ms.

7. A utility-based model for cloud-based Web services

This model presents a utility-based approach to managing cloud-based web services, focusing on optimizing the allocation of resources in a cloud environment while meeting service-level agreement (SLA) requirements. Here is a breakdown of the core components:

Utility Function and SLA

The utility function represents the trade-off between the benefits (revenue or penalties) and
the cost of providing services, where the benefit could relate to factors like revenue or SLA-
compliant performance, and the cost often refers to resource consumption (such as power).
The utility function helps optimize this trade-off by guiding resource allocation decisions.

 Utility Function: A typical utility function U(R) used for web services might be piecewise, with the reward increasing as response time decreases until a certain threshold, at which penalties kick in if the response time exceeds an SLA-defined limit. In Figure 6.4, the utility function could be approximated by a quadratic curve, capturing the benefit-versus-cost relationship as response times vary.

Optimization Problem
The goal of the optimization is to maximize profit—the difference between the revenue
guaranteed by the SLA and the total cost of providing services, which involves complex
decisions on server resource allocation, frequency scaling, and server management. The
optimization model involves a mixture of revenues, costs, and constraints:

 Revenue and Penalty Terms: These reflect the income or penalties derived from
maintaining specific performance levels.
 Cost Terms: These capture the costs related to running servers in various states
(active or idle), including the power consumption, memory use, and migration costs
for virtual machines.

This results in a mixed-integer nonlinear programming (MINLP) problem with multiple decision variables (e.g., whether a server is running, at which frequency, and whether requests are assigned to specific servers) and constraints (e.g., capacity limits, SLA requirements).

Decision Variables

Some key decision variables include:

 x_i: whether server i is running or idle.

 y_{i,h}: whether server i is running at frequency h.

 z_{i,k,j}: whether application tier j of a class-k request runs on server i.

 λ_{i,k,j}: the execution rate of class-k requests on server i.

 φ_{i,k,j}: the fraction of server capacity assigned to class-k, tier-j requests.

Constraints

The model has several constraints, which govern the behavior of the cloud system:

1. Traffic and resource allocation constraints (e.g., request load, server capacity).
2. Server state constraints (e.g., a server can only run at one frequency at a time).
3. SLA and availability constraints (ensuring that the minimum availability required by
the SLA is met).

Challenges and Scalability

The optimization problem is computationally expensive due to the large number of servers,
applications, decision variables, and constraints. This lack of scalability makes the approach
impractical for large-scale cloud environments.

For large-scale systems, this approach might require simplifications or more efficient
algorithms (e.g., heuristic or machine learning-based approaches) to handle the complexity
and reduce computation time. Additionally, cloud systems with hundreds or thousands of
servers need a model that can handle distributed resource management dynamically.

This utility-based approach provides a rigorous framework for autonomic resource management in cloud computing, but scalability remains a major challenge, particularly for real-world, large-scale deployments.
8. Resource bundling: Combinatorial auctions for cloud resources


Combinatorial auctions can be applied to resource allocation in cloud computing. In these auctions, users bid on bundles of resources, such as CPU, memory, disk space, and network bandwidth, to capture the benefits of combining certain resources. Resources are allocated to the highest bidders, aiming for efficient, fair, and scalable resource distribution.

Key Concepts:

1. Combinatorial Auctions:

o Participants can bid on bundles or packages of resources, rather than individual resources. This allows users to specify what combinations of resources they require.

o The auction operates with a clock mechanism, where the price for each resource is
displayed, and the auction proceeds in rounds, adjusting prices based on supply and
demand.

2. Bidding Process:

o Users provide bids in the form of a vector B_u = {Q_u, π_u}, where Q_u is a vector specifying the quantity of each resource desired and π_u is the price the user is willing to pay.

o The auction aims to optimize a function f(x, p), which could measure the value of the resources traded or the surplus (the difference between the users' willingness to pay and the price they actually pay).

3. Pricing and Allocation:


o The auction ensures fairness by making sure that all winning participants pay the
same price for the resources they are allocated, and it guarantees that the prices are
non-negative.

o A key aspect is to partition participants into winners and losers based on their bids
and the price threshold.

4. ASCA Algorithm:

o The Ascending Clock Auction (ASCA) algorithm is one such approach for resource
allocation. In this model:

 Participants bid on bundles, and the auctioneer adjusts prices based on demand and supply.

 The auction stops when the demand for resources is met or if there is no
excess demand.

 The auction progresses with bids updated in each round, and prices are
adjusted using a function based on excess demand.

o Excess Demand: The demand for resources is compared to their availability using the excess vector z(t), which indicates whether the demand exceeds the offer. If there is excess demand, prices are increased.
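One price-adjustment round of the ascending clock auction described above can be sketched as follows; the bidder-dropping rule, step size, and example numbers are assumptions for illustration:

```python
# One round of an ascending clock auction (illustrative update rule).

def asca_round(prices, bids, supply, step=0.1):
    """bids: list of (quantities, willingness_to_pay). Drop priced-out
    bidders, then raise the price of any resource with excess demand."""
    active = [(q, pi) for q, pi in bids
              if sum(p * x for p, x in zip(prices, q)) <= pi]
    demand = [sum(q[r] for q, _ in active) for r in range(len(prices))]
    excess = [d - s for d, s in zip(demand, supply)]           # vector z(t)
    new_prices = [p + step * max(z, 0) for p, z in zip(prices, excess)]
    return new_prices, excess

prices = [1.0, 1.0]                        # per-unit prices for (CPU, memory)
bids = [([4, 2], 10.0), ([3, 3], 7.0)]     # (bundle quantities, max payment)
supply = [5, 5]
new_prices, excess = asca_round(prices, bids, supply)
print(excess)   # demand (7, 5) vs supply (5, 5) -> [2, 0]; CPU price rises
```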

5. Challenges:

o Scalability: The algorithm must be computationally tractable to handle large systems.

o Fairness and Efficiency: Prices must be uniform across winning bidders, and all
participants must have a clear understanding of the resource allocation.

o Cloud Elasticity vs Auction Scheduling: A challenge arises from the fact that cloud
services require elasticity—immediate resource allocation—while auctions operate
in scheduled rounds, which can cause delays.

6. Results:

o Preliminary experiments with the ASCA algorithm at Google showed substantial improvements in resource allocation, encouraging users to make their applications more flexible and adaptable to the system's resource management.

This model is promising because it supports resource bundling and works without needing a detailed
model of the system, making it suitable for dynamic and large-scale cloud environments. However,
practical implementation remains challenging due to the need for periodic auctions and potential
delays.
9. Scheduling algorithms for computing clouds

Scheduling algorithms for cloud computing systems determine how resources are allocated and managed to meet the demands of different types of applications. Scheduling is a key component of cloud resource management, ensuring that resources such as CPU, memory, and network bandwidth are allocated efficiently and fairly.

Key Concepts in Scheduling:

1. Types of Scheduling:

o Preemptive: Allows higher-priority tasks to interrupt lower-priority tasks.

o Nonpreemptive: Tasks run without interruption until completion.

o The scheduler must handle different types of applications, ranging from batch jobs to
real-time systems with varying levels of urgency and timing constraints.

2. Scheduling Dimensions:

o The scheduling algorithm must address two key dimensions:

 Resource Allocation: The quantity of resources allocated to an application.

 Timing: When the application can access the resources.

o These dimensions define the broad classes of resource requirements:

 Best-effort: No specific requirements for resource allocation or timing (e.g., batch applications).

 Soft requirements: Applications need guaranteed statistical performance, such as maximum delay and throughput (e.g., multimedia applications).

 Hard requirements: Strict resource and timing constraints (e.g., real-time systems).

3. Fairness in Scheduling:

o Max-min fairness is a common fairness criterion used to allocate resources among users or applications.

 The conditions include ensuring that no user gets more than they requested,
and no user gets less than the minimum required unless all users are treated
equally.

o A fairness criterion for CPU scheduling ensures that the amount of work done by two
threads (over a specific time interval) is distributed proportionally according to their
weights, minimizing any discrepancies in resource allocation between threads.
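Max-min fairness can be computed by progressive filling: repeatedly split the remaining capacity equally among the still-unsatisfied users, capping each user at its demand. A minimal sketch:

```python
# Progressive-filling sketch of max-min fair allocation: no user gets more
# than requested, and unsatisfied users end up with equal shares.

def max_min_fair(capacity, demands):
    alloc = [0.0] * len(demands)
    remaining = capacity
    unsat = set(range(len(demands)))
    while unsat and remaining > 1e-12:
        share = remaining / len(unsat)
        remaining = 0.0
        for i in list(unsat):
            give = min(share, demands[i] - alloc[i])   # cap at the demand
            alloc[i] += give
            remaining += share - give                  # leftover is recycled
            if alloc[i] >= demands[i] - 1e-12:
                unsat.discard(i)
    return alloc

print([round(a, 6) for a in max_min_fair(10.0, [2.0, 2.6, 4.0, 5.0])])
# [2.0, 2.6, 2.7, 2.7]: small demands are met, the rest split the remainder equally
```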

4. Quality of Service (QoS):

o Different types of applications have varying QoS requirements that guide the
scheduling policy:
 Best-effort applications (e.g., batch processing, analytics) don't require guarantees and use simpler scheduling algorithms like round-robin, First-Come-First-Served (FCFS), and Shortest Job First (SJF).

 Soft-real-time applications (e.g., multimedia) require scheduling algorithms that ensure statistically guaranteed maximum delay and throughput, using techniques like Earliest Deadline First (EDF) or Rate Monotonic Algorithms (RMA).

 Hard-real-time systems (which aren't yet commonly used in public clouds) need precise timing and resource allocation.

5. Integrated Scheduling:

o Some algorithms integrate scheduling for various types of applications, such as RAD
(Resource Allocation/Dispatching) and RBED (Rate-Based Earliest Deadline), to
handle a mix of best-effort, soft, and hard real-time applications simultaneously.

Common Scheduling Algorithms:

1. Round-robin: Each thread gets a fixed time slice in a circular order.

2. First-Come-First-Served (FCFS): Threads are executed in the order they arrive.

3. Shortest Job First (SJF): Threads are executed based on the shortest execution time.

4. Earliest Deadline First (EDF): For real-time systems, the thread with the earliest deadline is
executed first.

5. Rate Monotonic Algorithm (RMA): A static priority algorithm used for real-time systems,
where tasks with shorter periods get higher priority.
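As an illustration of EDF from the list above, here is a minimal nonpreemptive dispatcher; the task format and tie-handling are assumptions:

```python
# Minimal nonpreemptive Earliest Deadline First dispatcher.
import heapq

def edf_schedule(tasks):
    """tasks: list of (arrival, duration, deadline).
    At each idle point, run the ready task with the earliest deadline."""
    tasks = sorted(tasks)                      # by arrival time
    ready, order, t, i = [], [], 0, 0
    while i < len(tasks) or ready:
        while i < len(tasks) and tasks[i][0] <= t:
            arrival, dur, dl = tasks[i]
            heapq.heappush(ready, (dl, arrival, dur))
            i += 1
        if not ready:                          # idle until the next arrival
            t = tasks[i][0]
            continue
        dl, arrival, dur = heapq.heappop(ready)
        t += dur                               # run to completion (nonpreemptive)
        order.append((arrival, dl))
    return order                               # (arrival, deadline) in run order

# Two tasks arrive together; the tighter deadline (5) runs first:
print(edf_schedule([(0, 3, 10), (0, 2, 5), (4, 1, 6)]))
# [(0, 5), (0, 10), (4, 6)]
```

A preemptive EDF would instead interrupt the deadline-10 task when the deadline-6 task arrives at t = 4; the nonpreemptive variant keeps the sketch short.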

Challenges and Evolution in Scheduling:

Scheduling in cloud environments is evolving to address the fairness and efficiency needed in the
increasingly complex cloud systems, especially as they support a mix of real-time, batch, and
multimedia applications. The need for fairness, efficiency, and scalability drives the development of
new algorithms that can meet the diverse requirements of cloud-based applications, with a special
focus on minimizing delays and maximizing throughput.
10. Fair Queuing (FQ)

Fair Queuing (FQ) is a scheduling algorithm used in network communication and cloud computing. The primary challenge it addresses is ensuring fair bandwidth allocation among multiple flows in a network, where each flow represents data from a source-destination pair.

Key Concepts of Fair Queuing:

1. Network Congestion and FCFS Limitations:

o Interconnection networks in cloud systems connect servers and users, but these
networks often have limited bandwidth and buffer capacity. When the load exceeds
capacity, packets may be dropped.

o The First-Come-First-Served (FCFS) scheduling algorithm, while simple, does not guarantee fairness, especially if certain flows (e.g., those transmitting larger packets) dominate bandwidth.

2. Fair Queuing Algorithm:

o A fair queuing algorithm ensures that each flow gets a fair share of the network
resources. The key feature is that the switch maintains separate queues for each flow
and processes them in a round-robin manner.

o Round-robin scheduling gives each flow equal opportunity, but the algorithm initially
does not guarantee fair bandwidth allocation for flows with different packet sizes.
Larger packets may still consume more bandwidth, creating imbalance.

3. Bit-by-Bit Round-Robin (BR):

o To address bandwidth fairness, Bit-by-Bit Round-Robin (BR) scheduling was introduced. In BR, the algorithm transmits one bit at a time from each flow's queue in a round-robin fashion.

o This ensures that smaller packets don't get unfairly blocked by larger ones, but it is
inefficient in practice due to its high overhead.

o The scheduling is described mathematically using variables like:

 R(t): the number of rounds of the algorithm by time t.

 Nactive(t): the number of active flows at time t.

 Si_a and Fi_a: the start and finish times of packet i of flow a.

4. Nonpreemptive Scheduling Rule:

o The nonpreemptive version of BR involves selecting the next packet to transmit based on the smallest Fi_a (the finishing time of a packet).

o If a new packet arrives with a shorter finishing time, it will preempt the current
packet in the preemptive version of the algorithm.

5. Bid-based Scheduling:
o To better manage bandwidth allocation and transmission timing, the FQ algorithm
introduces a bid (Bi_a) for each packet, which helps prioritize packets fairly across
flows.

o The bid is calculated as

B_i_a = P_i_a + max[ F_(i−1)_a , R(t_i_a) + δ ]

where:

 P_i_a is the packet size.

 F_(i−1)_a is the finishing time of the previous packet of the same flow.

 R(t_i_a) is the round number when the packet arrives.

 δ is a non-negative parameter that introduces some flexibility.
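The finish-time bookkeeping and bid rule can be sketched per packet; variable names follow the list above, and the formula B = P + max(F_prev, R + δ) is the standard fair-queuing bid, with δ = 0 reducing the bid to the packet's virtual finish time:

```python
# Per-packet fair-queuing bookkeeping (a sketch of the variables above).

def fq_packet(P, F_prev, R_now, delta=0):
    """Returns (finish, bid) for a flow's next packet, in bit-rounds.
    F = P + max(F_prev, R_now)          -- virtual finish time
    B = P + max(F_prev, R_now + delta)  -- bid; delta favors recently idle flows"""
    finish = P + max(F_prev, R_now)
    bid = P + max(F_prev, R_now + delta)
    return finish, bid

# A backlogged flow (previous finish 8) vs. a flow idle until round 10:
print(fq_packet(P=5, F_prev=8, R_now=4))   # (13, 13)
print(fq_packet(P=5, F_prev=0, R_now=10))  # (15, 15)
```

The switch transmits the packet with the smallest bid, so the backlogged flow's packet (bid 13) goes before the newly arrived one (bid 15).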

6. Fairness vs. Timing:

o Fairness in FQ ensures each flow gets a fair share of the available bandwidth, but this
does not directly affect timing (i.e., when packets are transmitted).

o One approach to improve fairness while reducing delay is to allow flows that use less
than their fair share of bandwidth to experience less delay in their transmission.

Visual Explanation:

 Figure 6.8 provides a graphical explanation of how packets from a flow are transmitted. It
shows:

o Si_a: the start time of a packet's transmission.

o Fi_a: the finish time of a packet's transmission.

o Two cases are illustrated where either the previous packet's finish time is earlier or
later than the current round number R(t_i_a).
11. Start-Time Fair Queuing (SFQ)

The Start-Time Fair Queuing (SFQ) algorithm is designed for CPU scheduling in multimedia operating
systems. It ensures fair allocation of CPU time across threads, particularly in environments where
multiple threads or applications are running concurrently, such as in virtual machines (VMs).

Key Concepts of SFQ:

1. Hierarchical Tree Structure:

o The SFQ scheduler organizes threads in a tree structure:

 Root node: The CPU (processor).

 Leaves: Individual threads of applications or virtual machines (VMs).

 A scheduler operates at each level of the hierarchy to manage bandwidth allocation fairly.

2. Bandwidth Allocation:

o The bandwidth B allocated to each node in the tree is proportional to the weights of its children:

B_i = B × w_i / Σ_{j=1}^{n} w_j

where w_i is the weight of node i (e.g., a virtual machine or thread) and the sum in the denominator is the total weight of all sibling nodes. This ensures that nodes with higher weights receive more bandwidth.

3. Virtual Time:

o SFQ uses virtual time to track and schedule thread executions. The virtual time
progresses as threads are scheduled, and each thread has a virtual start time (S) and
virtual finish time (F).

o The virtual time is used to determine the order in which threads are scheduled, with
the scheduler picking the thread with the smallest virtual start time.

4. Thread Scheduling Rules:

o R1: Threads are serviced based on their virtual start time; ties are broken arbitrarily.

o R2: The virtual start time of a thread x is calculated as

S_x(t) = max( v(t), F_x(i−1) )

where v(t) is the current virtual time and F_x(i−1) is the finish time of the last activation of thread x.

o R3: The virtual finish time is computed as

F_x(t) = S_x(t) + q / w_x

where q is the time quantum and w_x is the weight of thread x. The thread is suspended when its time quantum expires.

o R4: The virtual time v(t) is updated based on the maximum finish time of all threads, or the virtual start time of the thread in service at time t. If the CPU is idle, the virtual time remains the same.
Example Walkthrough:

In an example with two threads (a and b) with weights wa=1w_a = 1wa=1 and wb=4w_b = 4wb=4,
and a time quantum q=12q = 12q=12, the scheduling is as follows:

1. Initial State: At time t=0t = 0t=0, both threads are ready to run. Since the virtual start times
are equal, thread b is chosen arbitrarily to run first.

2. Thread b Execution:

o Thread b runs for q/w_b = 12/4 = 3 units of virtual time. Its virtual finish time is F_b = 3.

o At time t = 3, thread a is activated because its virtual start time is less than thread b's next start time.

3. Thread a Execution:

o Thread a runs for q/w_a = 12/1 = 12 units of virtual time. Its virtual finish time is F_a = 12.

o At time t = 15, thread b is resumed and continues its execution, now with a new virtual start time of S_b = 3.

4. Subsequent Scheduling:

o As time progresses, threads a and b alternate based on their virtual start times. If one thread is suspended (e.g., thread b blocks at time t = 24 and resumes at t = 60), the remaining thread continues execution.

The process continues until all threads are executed, and the virtual times ensure fairness in CPU
allocation based on the thread weights.
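Rules R1–R3 above can be condensed into a short simulation (a sketch with hypothetical helper names; it assumes no thread blocks, so the v(t) bookkeeping of R4 is omitted, and ties are broken alphabetically rather than arbitrarily, so thread a runs first here):

```python
def sfq_schedule(weights, quantum, steps):
    """Trace SFQ activations: pick min virtual start (R1), set F = S + q/w (R3)."""
    start = {t: 0.0 for t in weights}   # virtual start time S_x of next activation
    order = []
    for _ in range(steps):
        t = min(weights, key=lambda x: (start[x], x))  # R1, ties broken by name
        finish = start[t] + quantum / weights[t]       # R3: F_x = S_x + q / w_x
        order.append((t, start[t], finish))
        start[t] = finish  # R2 with no blocking: next S_x = previous F_x
    return order

# Example from the walkthrough: weights w_a = 1, w_b = 4, quantum q = 12
for name, s, f in sfq_schedule({"a": 1, "b": 4}, 12, 6):
    print(name, s, f)
```

Over the six activations, b runs four quanta for every one of a, matching the 4:1 weight ratio.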

Properties of SFQ:

 Fairness: SFQ allocates CPU time fairly even when the available bandwidth (or CPU time)
varies dynamically.

 Throughput and Delay Guarantees: SFQ provides both throughput (ensuring that threads
get sufficient CPU time) and delay guarantees (ensuring that the wait times for threads are
kept reasonable).

 No Need for Time Quantum for Scheduling: The scheduler does not require knowledge of
the exact time quantum when deciding which thread to run; the quantum is only used to
calculate the finish time after scheduling decisions.

Conclusion:

The Start-Time Fair Queuing (SFQ) algorithm is a sophisticated CPU scheduling method that ensures
fairness and efficient resource distribution in systems with multiple threads or virtual machines. It
leverages virtual time to prioritize threads based on their needs and weights, offering a solution to
handle CPU scheduling in a fair and efficient manner, especially in multimedia and cloud
environments.
UNIT-1
1.()()()()Distributed System Models and Enabling Technologies

The Age of Internet Computing

1. Internet Usage and Demands:

o Billions of daily users have led to increased demand for high-performance computing
(HPC) and high-throughput computing (HTC) systems.

o Traditional HPC benchmarks like Linpack no longer suffice for measuring system
performance due to these new demands.

2. Emergence of Computing Clouds:

o Transition from HPC to HTC systems focuses on parallel and distributed computing
for enhanced throughput.

o Upgraded infrastructure (e.g., fast servers, storage, high-bandwidth networks) supports cloud-based network computing and web services.

The Platform Evolution

1. Generations of Computing:

o 1950–1970: Mainframes (e.g., IBM 360, CDC 6400) for government and businesses.

o 1960–1980: Mini-computers (e.g., DEC PDP 11) for smaller organizations and
academic use.

o 1970–1990: Personal computers using VLSI microprocessors became widespread.

o 1980–2000: Portable computers and pervasive devices gained traction.

o Since 1990: Hidden HPC and HTC systems drive network-based applications.

2. HPC Evolution:

o Speed increased from gigaflops (1990s) to petaflops (2010s), driven by scientific and
industrial needs.

o Despite advances, less than 10% of users require supercomputers.

3. HTC Paradigm:

o HTC addresses simultaneous, large-scale tasks like internet searches.

o Focuses on cost efficiency, energy savings, security, and reliability.

Definitions of Paradigms:

o Centralized Computing:

 Resources (processors, memory, storage) located in one physical system with a unified OS.

 Examples: Supercomputers, certain data centers.


o Parallel Computing:

 Processors share memory (tightly coupled) or communicate via messages (loosely coupled).

 Programs written for parallel systems enable concurrent task execution.

o Distributed Computing:

 Autonomous systems with private memory, connected via a network, exchanging information through message passing.

 Focuses on decentralization and scalability.

o Cloud Computing:

 Combines parallel and distributed computing over centralized or distributed infrastructures.

 Provides utility-based services, leveraging physical or virtualized resources.

2.()()()()()
Scalable Computing Over the Internet
Cloud Computing Overview
Cloud computing has emerged as a transformative technology that combines software,
services, and infrastructure into a unified system. The term cloud computing refers to both
the applications delivered over the Internet and the underlying hardware and software
systems in data centers that support these services.
A cloud is a distributed, parallel system made up of interconnected and virtualized
computers that are dynamically allocated and presented as unified computing resources.
These resources are made available based on service-level agreements (SLAs), which are
negotiated between the service provider and consumers.
Scalable Computing Over the Internet
Cloud computing enables scalable computing by using a combination of operating systems,
network connectivity, and application workloads distributed across multiple machines.
Instead of relying on a single centralized computer, distributed computing leverages multiple
interconnected systems to tackle large-scale problems over the Internet. This makes the
system data-intensive and network-centric, optimizing the computational process for
handling large datasets and parallel workloads.

Applications and Key Concepts


I. The Age of Internet Computing
1. Demand for High-Performance Computing (HPC):
o Large data centers are required to provide high-performance computing
services to meet the growing demand from millions of Internet users.
o HPC is crucial for applications that require immense computational power,
such as simulations, scientific research, and big data analytics.
2. High-Throughput Computing (HTC):
o In addition to HPC, the rise of the HTC paradigm reflects the growing
importance of processing large volumes of smaller tasks simultaneously,
rather than focusing solely on raw computational speed.
o HTC systems are optimized for high-volume, distributed processing, such as
Internet searches, web services, and online transaction processing.

Types of computing:

 Centralized Computing: All resources (processors, memory, storage) are housed in one
system, tightly coupled within a single OS, often used in data centers and supercomputers.

 Parallel Computing: Multiple processors work simultaneously on a task, using either shared
memory (tightly coupled) or distributed memory (loosely coupled), to improve computational
speed.

 Distributed Computing: Involves multiple autonomous computers, each with its own
memory, that communicate over a network to perform tasks collectively using message passing.

 Cloud Computing: A network-based computing model that provides on-demand resources (e.g., processing power, storage) via the internet, using both centralized and distributed architectures.

3.()()()()()

Technologies for Network-Based Systems (10 Marks Answer)


In modern distributed systems, scalable computing is achieved through a combination of hardware,
software, and network technologies. These advancements are critical for handling massive parallelism
and improving the efficiency of distributed computing environments. Below are the key technologies
that support the design and implementation of scalable and high-performance network-based systems.

1. Multicore CPUs and Multithreading Technologies

o Multicore CPUs: Modern processors have multiple cores (dual, quad, etc.), allowing
them to perform parallel processing, significantly improving performance.

o Instruction-Level Parallelism (ILP) and Thread-Level Parallelism (TLP): ILP helps execute multiple instructions within one core, while TLP runs multiple threads across different cores, increasing the system's efficiency for large-scale tasks.

o Limitations: Processor speed is no longer increasing significantly due to power consumption limits. Future performance improvements rely on parallelism rather than higher clock speeds.
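As a small, generic illustration of thread-level parallelism using Python's standard library (not tied to any system in the notes; in CPython, I/O-bound tasks benefit most from threads):

```python
from concurrent.futures import ThreadPoolExecutor

def task(n):
    # An independent unit of work; many such tasks run across worker threads
    return n * n

# TLP sketch: a pool of 4 worker threads processes 8 independent tasks
with ThreadPoolExecutor(max_workers=4) as pool:
    results = list(pool.map(task, range(8)))
print(results)  # [0, 1, 4, 9, 16, 25, 36, 49]
```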

2. GPU Computing to Exascale and Beyond

o GPUs: Originally designed for graphics, GPUs now perform parallel computations in
tasks like scientific simulations, machine learning, and data analysis.

o Exascale Computing: GPUs are crucial for handling exascale systems (processing
quintillions of calculations per second) due to their high throughput and ability to
manage large, parallel workloads.

o Performance Boost: GPUs excel in tasks like matrix multiplications and data-parallel
operations, often used in scientific and engineering applications.

3. Memory, Storage, and Wide-Area Networking

o Memory and Storage: Efficient memory systems and high-speed storage are
necessary to handle large-scale data, preventing bottlenecks during data processing.

o Wide-Area Networking (WAN): WAN technologies, like fiber-optic networks, enable quick data transmission between geographically dispersed systems and data centers, which is essential for large-scale distributed systems.

o Cloud Integration: Cloud platforms benefit from high-performance storage systems (distributed file systems, object storage) combined with powerful networking to handle big data efficiently.

4. Virtual Machines and Virtualization Middleware

o Virtualization: This technology allows multiple operating systems to run on a single physical machine, enhancing resource management and flexibility.

o Virtualization Middleware: It abstracts hardware resources, ensuring consistent environments across different systems. It also supports workload migration and load balancing, improving efficiency in distributed systems.

o Hypervisors: These are used to isolate virtual machines from each other, enhancing
security by preventing interference between workloads.
5. Data Center Virtualization for Cloud Computing

o Cloud Computing: Relies on virtualized resources in data centers to offer flexible, scalable services. Virtualization enables dynamic allocation of computing resources based on demand, optimizing utilization and reducing hardware costs.

o Elasticity: Virtualization allows resources to scale up or down quickly in response to varying workloads, which is essential for cloud platforms serving diverse user demands.

o Software-Defined Networking (SDN) and Software-Defined Storage (SDS): These technologies automate and manage cloud environments, providing centralized control over network traffic and storage provisioning, enhancing efficiency and scalability.

These technologies form the foundation for scalable and efficient network-based systems, supporting
modern computing needs such as cloud computing, distributed systems, and high-performance
applications.

4.()()()()()
System Models for Distributed and Cloud Computing

Distributed and cloud computing rely on various system models to manage resources, handle tasks,
and ensure efficient computing. These models provide frameworks that allow for scalable, flexible,
and reliable systems across geographically distributed locations. Below are key system models for
both distributed and cloud computing:

1. Centralized System Model

 Description: A centralized system has a single central node (usually a server) that manages
all resources and processes. Clients communicate with the server for processing requests.
 Example: A traditional client-server system where all data is stored and processed on a single
central server.

 Use Case: Suitable for smaller applications or systems where central control and
management are needed.

2. Client-Server Model

 Description: In this model, clients (end-user devices) request services or resources, which
are provided by servers. The server performs all the computational work, and clients only
need to interact with the server for their needs.

 Key Characteristics:

o The server provides services or resources.

o Clients request resources from the server.

o The server manages data and computing processes.

 Example: A web server that handles multiple requests from users accessing a website.

 Use Case: Widely used in enterprise applications, where users access centralized data or
services via the internet.
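The request/response pattern of the client-server model can be sketched with a local socket pair (illustrative only; a real deployment would use network sockets and a listening server):

```python
import socket

# A connected pair of sockets stands in for the client-server network link
server, client = socket.socketpair()

client.sendall(b"GET /status")       # client requests a service
request = server.recv(1024)          # server receives the request
server.sendall(b"OK: " + request)    # server does the work and responds

response = client.recv(1024).decode()
print(response)  # OK: GET /status
```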

3. Peer-to-Peer (P2P) Model

 Description: In a P2P model, each node (peer) acts as both a client and a server. Nodes
communicate directly with each other to share resources or information, without a central
server.

 Key Characteristics:

o All nodes have equal roles (client and server).

o No central authority or server.

o Data or resources are shared among peers.

 Example: File-sharing networks (e.g., BitTorrent) or blockchain networks.

 Use Case: Ideal for decentralized systems where peers interact with one another for resource
sharing and collaboration.

4. Cluster Computing Model

 Description: A cluster computing system uses multiple interconnected computers (nodes) that work together to perform tasks. These nodes function as a single system to improve performance and reliability.

 Key Characteristics:

o Multiple computers work as a single unified system.


o High availability and fault tolerance.

o Distributed storage and computation across nodes.

 Example: Google Search, where numerous servers in a data center process queries and store
data.

 Use Case: Suitable for applications requiring high availability, fault tolerance, and large-scale
computations, such as in data centers or web services.

5. Grid Computing Model

 Description: Grid computing connects geographically dispersed systems and resources to form a unified computing environment. It allows for sharing and distributing large-scale tasks across multiple machines to increase processing power.

 Key Characteristics:

o Resources are geographically distributed.

o A grid system shares resources for problem-solving.

o Used to perform large, complex computations or simulations.

 Example: SETI@home, where volunteers contribute their computing power to process data
for the Search for Extraterrestrial Intelligence (SETI) project.

 Use Case: Ideal for scientific research and other resource-intensive applications that require
significant computational power over distributed resources.

6. Cloud Computing Model

 Description: Cloud computing provides on-demand access to computing resources (such as servers, storage, and applications) via the internet. Cloud service providers maintain large data centers, and users can scale resources up or down as needed.

 Key Characteristics:

o Services are provided over the internet (as a service).

o Resources are virtualized and scalable.

o Flexible billing models (pay-as-you-go or subscription-based).

 Types of Cloud Models:

o IaaS (Infrastructure as a Service): Provides virtualized computing resources over the internet (e.g., Amazon Web Services).

o PaaS (Platform as a Service): Provides a platform for developing and deploying applications without managing the underlying infrastructure (e.g., Google App Engine).
o SaaS (Software as a Service): Provides access to software applications over the
internet (e.g., Google Workspace, Microsoft 365).

 Example: Amazon Web Services (AWS), Microsoft Azure, and Google Cloud Platform.

 Use Case: Ideal for organizations or individuals who need scalable, on-demand resources for
storage, computing power, and applications, without maintaining physical infrastructure.

7. Hybrid Cloud Model

 Description: A hybrid cloud model combines both private and public cloud environments,
allowing data and applications to be shared between them. This model enables businesses to
have more flexibility and optimization in terms of resources.

 Key Characteristics:

o A mix of private and public cloud resources.

o Allows for better security and control with the private cloud.

o Scalable resources available via the public cloud.

 Example: An organization may keep sensitive data in a private cloud while using a public
cloud for running applications.

 Use Case: Suitable for organizations that require flexibility in resource management,
combining control over sensitive data with the scalability of the public cloud.

5.()()()()()()

Performance in Distributed and Cloud Computing

Performance in distributed and cloud computing systems refers to how efficiently and effectively
these systems execute tasks and manage resources. Below are key aspects of performance:

1. Latency and Throughput

o Latency: Time taken to complete a specific task or request in the system. Low latency
is critical for responsive applications.

o Throughput: Number of tasks completed per unit of time, important for systems
handling large-scale workloads.

2. Scalability

o The ability of the system to maintain or improve performance as the workload increases, by adding more resources (horizontal scaling) or enhancing existing ones (vertical scaling).

3. Resource Utilization
o Efficient use of computing, memory, and network resources ensures high
performance. Virtualization and load balancing play a significant role in optimizing
utilization.

4. Fault Tolerance and Availability

o Distributed systems ensure high performance by maintaining operation despite failures. Mechanisms like redundancy, replication, and failover improve availability and reduce downtime.

5. Load Balancing

o Even distribution of workload across servers to prevent bottlenecks and ensure consistent performance. Dynamic load balancing adjusts to changing workloads in real time.

6. Energy Efficiency

o Especially in cloud computing, energy-efficient systems reduce operational costs and environmental impact while maintaining performance.

7. Quality of Service (QoS)

o Systems are evaluated based on meeting performance guarantees such as response time, uptime, and throughput outlined in Service Level Agreements (SLAs).

Performance optimization in these systems often involves:

 Improving network speeds and reducing latency.

 Leveraging parallelism and distributed task execution.

 Employing caching and efficient data management techniques.

6.()()()()()()()

Security and Energy Efficiency in Distributed and Cloud Computing

1. Security in Distributed and Cloud Computing

Security is a critical concern in distributed and cloud systems due to shared and decentralized
environments. Key aspects include:

 Data Confidentiality and Privacy

o Encryption techniques (e.g., AES, RSA) are used to protect sensitive data in transit
and at rest.

o Access control mechanisms restrict data access to authorized users only.

 Authentication and Authorization

o Multi-factor authentication ensures users are verified before accessing the system.

o Role-based and policy-based access control mechanisms define permissions.

 Data Integrity
o Techniques like hashing (e.g., SHA-256) verify that data has not been tampered with
during transmission or storage.

 Network Security

o Firewalls, intrusion detection/prevention systems (IDS/IPS), and secure protocols (e.g., HTTPS, SSL/TLS) protect communication.

 Threat Mitigation

o Distributed Denial-of-Service (DDoS) protection mechanisms, such as traffic filtering and rate limiting, prevent system overloads.

o Regular vulnerability assessments and patch management minimize risks.

 Compliance and Governance

o Cloud providers adhere to standards like GDPR, HIPAA, and ISO/IEC 27001 to ensure
compliance with data protection regulations.
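The hashing-based integrity check described above can be illustrated with Python's standard hashlib (a minimal sketch; real systems also sign or authenticate the stored digest):

```python
import hashlib

data = b"cloud object payload"
stored_digest = hashlib.sha256(data).hexdigest()  # recorded when the data was written

# Later, on retrieval: recompute the digest and compare with the stored one
received = b"cloud object payload"
intact = hashlib.sha256(received).hexdigest() == stored_digest
print("integrity verified:", intact)  # any tampering changes the digest
```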

2. Energy Efficiency in Distributed and Cloud Computing

Energy efficiency aims to minimize the environmental and operational costs of large-scale systems.
Key approaches include:

 Dynamic Resource Allocation

o Techniques like dynamic voltage and frequency scaling (DVFS) reduce power
consumption by adjusting resource usage based on workload.

 Virtualization

o Consolidating multiple virtual machines (VMs) on fewer physical servers reduces idle
resources and energy waste.

 Green Data Centers

o Efficient cooling systems, renewable energy sources, and energy-efficient hardware reduce the carbon footprint.

o Techniques like free cooling (using ambient air) lower cooling costs.

 Workload Scheduling

o Energy-aware scheduling allocates workloads to servers or data centers with lower energy costs, often based on real-time energy pricing.

 Cloud-Based Solutions

o Public clouds leverage economies of scale, sharing infrastructure to optimize resource utilization and reduce energy waste.

 Monitoring and Optimization

o Tools like power monitoring systems and predictive analytics identify inefficiencies
and suggest improvements.
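Energy-aware scheduling as described above can be sketched as a simple cheapest-site placement (site names and prices are hypothetical; real schedulers also weigh latency, capacity, and SLAs):

```python
# Hypothetical real-time energy prices per data center ($/kWh)
prices = {"dc-east": 0.12, "dc-west": 0.09, "dc-eu": 0.15}

def place(workloads, prices):
    """Assign every workload to the site with the lowest current energy price."""
    cheapest = min(prices, key=prices.get)
    return {w: cheapest for w in workloads}

print(place(["batch-1", "batch-2"], prices))  # both land on the cheapest site
```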
Balancing security and energy efficiency is crucial, as ensuring robust security often increases energy
consumption. However, modern techniques strive to achieve both without compromising either
aspect.

UNIT-2
1.()()()()()

Security and Energy Efficiency in Distributed and Cloud Computing

(See Unit-1, Question 6 above — the same answer applies.)

2.()()()()()()

Virtualization Structures, Tools, and Mechanisms (10 Marks Answer)

Virtualization enables multiple operating systems and applications to run simultaneously on the
same hardware by abstracting the underlying physical resources. It introduces a virtualization layer
between the hardware and the operating system. Below are the main virtualization structures and
mechanisms:

1. Classes of VM Architectures

1. Hypervisor Architecture

o The hypervisor (or Virtual Machine Monitor - VMM) directly interacts with physical
hardware and the guest OS.

o It provides virtual hardware to multiple VMs and supports hardware-level virtualization.

o Examples: VMware ESXi, Microsoft Hyper-V, Xen.

2. Para-Virtualization
o Modifies the guest OS to enable direct communication with the hypervisor through
hypercalls.

o Reduces the overhead compared to full virtualization.

o Example: Xen Hypervisor in para-virtualization mode.

3. Host-Based Virtualization

o The virtualization layer runs as an application on the host operating system.

o Easier to implement but less efficient due to additional overhead from the host OS.

o Examples: Oracle VirtualBox, VMware Workstation.

2. Hypervisor and Xen Architecture

 Hypervisor

o Acts as a bridge between hardware and guest OSes, converting physical resources
into virtual ones.

o Supports various virtualization mechanisms like CPU, memory, and disk virtualization.

o Two Types:

 Type 1 (Bare-Metal): Runs directly on hardware. Examples: VMware ESXi, Xen.

 Type 2 (Hosted): Runs on an existing OS. Examples: VirtualBox, VMware Workstation.

 Xen Hypervisor

o A micro-kernel hypervisor that separates the control and user domains:

 Dom0 (Control Domain): Manages device drivers and system administration tasks.

 DomU (User Domains): Runs guest OSes in isolated environments.

3. Virtualization Tools

 VMware Suite (ESXi, vSphere): Advanced server virtualization with features like live
migration.

 Microsoft Hyper-V: A micro-kernel hypervisor offering efficient virtualization for enterprise applications.

 KVM (Kernel-based Virtual Machine): Built into Linux, providing high-performance virtualization.

 Oracle VirtualBox: Host-based virtualization for desktop use.

4. Mechanisms and Benefits


 Resource Abstraction: Virtualizes CPUs, memory, storage, and network interfaces for each
VM.

 Isolation: Ensures that VMs are independent, enhancing security and fault tolerance.

 Flexibility: Supports different operating systems on the same hardware.

 Portability: VMs can be migrated across servers with minimal downtime.

3.()()()()()

Virtualization of CPU, Memory, and I/O Devices (Detailed Explanation)

1. Hardware Support for Virtualization

Modern processors include mechanisms to ensure multiple processes can operate without
conflicts:

 Modes of Operation:
o User Mode: Restricted access to critical hardware.
o Supervisor Mode: Full access to hardware, allowing execution of privileged
instructions.
 Virtualization adds complexity as multiple layers must cooperate to maintain system
correctness.

2. CPU Virtualization

 A Virtual Machine (VM) executes most instructions on the host processor in native mode for
efficiency.
 Critical Instructions:
o Privileged Instructions: Require supervisor mode and trap if executed in user mode.
o Control-Sensitive Instructions: Modify resource configurations.
o Behavior-Sensitive Instructions: Behavior depends on resource configurations (e.g.,
memory operations).
 Virtualizable CPU Architecture: Allows VMs to run in user mode while the VMM operates in
supervisor mode, trapping critical instructions to maintain system stability.

Hardware-Assisted CPU Virtualization:

 Intel and AMD introduce a privilege level (e.g., Ring -1) for the hypervisor, separating it from
guest operating systems (running at Ring 0).
 Simplifies virtualization by trapping sensitive instructions automatically, eliminating the need
for complex binary translation.

3. Memory Virtualization

 Extends traditional virtual memory mechanisms by adding a two-stage mapping process:


o Virtual Memory → Guest Physical Memory (managed by the guest OS).
o Guest Physical Memory → Machine Memory (managed by the VMM).
 Components:
o Shadow Page Tables: Maintain mappings for guest OS page tables, but increase
overhead.
o Nested Page Tables (NPTs): Add another layer of indirection, improving performance
but increasing complexity.
 TLB Optimization: Translation Lookaside Buffers (TLBs) avoid two-stage translations,
enhancing access speed.
 Challenges: Maintaining performance while managing multiple page tables and ensuring
memory isolation among VMs.
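The two-stage mapping can be sketched with toy page tables (dictionaries stand in for the guest and nested page tables; all page numbers are made up):

```python
# Stage 1: guest virtual page -> guest physical page (guest OS page table)
guest_pt = {0: 7, 1: 3}
# Stage 2: guest physical page -> machine page (VMM / nested page table)
nested_pt = {3: 12, 7: 40}

def translate(gva_page):
    """Walk both tables, as a TLB miss would under nested paging."""
    gpa_page = guest_pt[gva_page]   # stage 1: virtual -> guest physical
    return nested_pt[gpa_page]      # stage 2: guest physical -> machine

print(translate(0))  # guest page 0 -> guest physical 7 -> machine page 40
```

A TLB caches the combined virtual-to-machine mapping precisely to skip this double walk on repeated accesses.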

4. I/O Virtualization

Manages I/O requests between virtual devices and shared hardware through:

1. Full Device Emulation:


o Replicates physical device behavior in software.
o Traps I/O requests in the VMM, but with significant performance overhead.
2. Para-Virtualization:
o Uses split drivers:
 Frontend Driver: Manages guest OS I/O requests (runs in the VM).
 Backend Driver: Manages real I/O devices (runs in the VMM or host OS).
o Achieves better performance than emulation but increases CPU usage.
3. Direct I/O:
o Allows VMs direct access to hardware for near-native performance.
o Primarily used in networking, with challenges in adapting to commodity hardware.

5. Virtualization in Multi-Core Processors

Virtualizing multi-core processors introduces complexities such as:

 Parallelization Requirements: Applications must be designed to utilize multiple cores effectively.
 Dynamic Heterogeneity:
o Combining CPU and GPU cores on the same chip.
o Adds complexity in resource management and scheduling.

Key Concepts:

1. Physical vs. Virtual Processor Cores:


o Abstracts low-level core details for software, reducing management inefficiencies.
2. Virtual Hierarchy:
o Adapts cache and coherence hierarchies to match workload demands.
o Improves performance isolation and data access speed through dynamic
adjustments.
4.()()()()()

Virtual Clusters and Resource Management

1. Introduction to Virtual Clusters


Virtual clusters operate by deploying Virtual Machines (VMs) across physical clusters to form logically
interconnected systems. They offer scalability, flexibility, and efficient resource utilization,
distinguishing them from traditional physical clusters. The key characteristics of virtual clusters
include:

 Ability to consolidate multiple VMs with different OSs on a single physical machine.

 Dynamic growth and reduction in size based on workload demands.

 Improved fault tolerance and disaster recovery through VM replication.

 Enhanced resource utilization and flexibility in application deployment.

2. Key Design Issues

1. Live VM Migration:

o Facilitates transferring workloads between nodes to balance loads or handle failures.

o Includes steps for preparation, memory transfer, state copy, and activation at the
destination node.

2. Memory and File Migration:

o Efficiently handles large memory instances and file systems.

o Incorporates techniques like Copy-On-Write (COW), temporal locality, and distributed file systems to minimize data transfer.

3. Dynamic Deployment of Virtual Clusters:

o Ensures rapid deployment and scaling of VMs to meet workload requirements.

Physical vs. Virtual Clusters:


 Physical Clusters: Comprised of physical machines, forming a fixed hardware-based
infrastructure.

 Virtual Clusters: Consist of VMs distributed across physical machines, offering logical
boundaries and flexibility to adjust resources dynamically.

 Virtual clusters offer cost-effective management of computing resources, with advantages like fault tolerance, load balancing, and server consolidation.

Key Features and Techniques in Virtual Cluster Management

1. VM Deployment and Scheduling:

 Automated tools reduce the time to configure and deploy VMs.

 Fast deployment techniques, such as using templates and pre-edited profiles, simplify
configuration.

 Resource scheduling ensures workload balance across nodes.

2. Live VM Migration:

 Involves iterative memory transfers to minimize downtime.

 Pre-Copy Migration: Transfers memory pages iteratively before halting the VM.

 Post-Copy Migration: Transfers memory pages after suspending the VM, reducing total
migration time but increasing downtime.

 Compression techniques leverage multi-core CPUs to reduce data transfer volumes.

3. Storage Management:

 Efficient storage solutions minimize duplication, using hash values and block-level
management.

 Distributed file systems provide location-independent file access for VMs.
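The hash-based, block-level deduplication mentioned above can be sketched as follows: each unique block is stored once, keyed by its content hash, so identical blocks across VM images (e.g., common OS files) share storage. A tiny block size is used here purely for readability.

```python
# Block-level deduplication sketch: store each unique block once,
# keyed by its content hash; identical blocks across VM images share storage.

import hashlib

store = {}                               # content hash -> block bytes

def put_blocks(data, block_size=4):
    refs = []
    for i in range(0, len(data), block_size):
        block = data[i:i + block_size]
        h = hashlib.sha256(block).hexdigest()
        store.setdefault(h, block)       # physically stored only if unseen
        refs.append(h)
    return refs                          # per-image list of block references

img1 = put_blocks(b"AAAABBBBAAAA")       # blocks: AAAA, BBBB, AAAA
img2 = put_blocks(b"AAAACCCC")           # blocks: AAAA, CCCC
print(len(store))                        # 3 unique blocks stored for 5 references
```

Five logical blocks are referenced but only three are stored, which is exactly the saving server-consolidated VM images with shared base OS data rely on.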

4. Network Resource Migration:

 Virtual IP and MAC addresses allow seamless network connectivity during migration.

 Mechanisms such as unsolicited (gratuitous) ARP replies, or the switch detecting the VM's unchanged MAC address on a new port, ensure connection continuity.
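The unsolicited ARP reply mentioned above is simply an ARP packet whose sender and target fields both name the migrated VM, broadcast so that peers and switches re-learn its new location. The sketch below builds such a frame per the ARP packet layout (RFC 826) but does not actually transmit it; the MAC and IP values are made up.

```python
# Sketch of an unsolicited ("gratuitous") ARP reply frame, which a migrated
# VM can broadcast so switches and peers re-learn its new location.
# Illustrative only -- it builds the bytes, it does not put them on the wire.

import struct

def gratuitous_arp(mac, ip):
    # Ethernet header: broadcast destination, our MAC, EtherType 0x0806 (ARP)
    eth = b"\xff" * 6 + mac + struct.pack("!H", 0x0806)
    # ARP header: Ethernet(1)/IPv4(0x0800), hlen=6, plen=4, opcode 2 (reply)
    arp = struct.pack("!HHBBH", 1, 0x0800, 6, 4, 2)
    arp += mac + ip + mac + ip            # sender == target: gratuitous
    return eth + arp

frame = gratuitous_arp(b"\x02\x00\x00\x00\x00\x01", bytes([10, 0, 0, 5]))
print(len(frame))   # 42 bytes: 14-byte Ethernet header + 28-byte ARP payload
```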

Advantages of Virtual Clusters

 Resource Optimization: Higher server utilization and workload flexibility.

 Scalability: Nodes can be added or removed dynamically.

 Fault Tolerance: VMs can be replicated across hosts for resilience.

 Cost Efficiency: Shared infrastructure reduces capital and operational expenses.

Challenges

 Migration Overheads: Live migrations can strain network and memory resources.
 Downtime Management: Maintaining minimal service interruption during migration.

 Complexity in Management: Integrated management tools are needed to handle virtualized and physical resources effectively.

Conclusion:
Virtual clusters enable modern computing environments to achieve high performance, resource
efficiency, and scalability. Their ability to dynamically allocate resources and balance workloads
makes them indispensable in cloud computing and enterprise IT infrastructures. However, addressing
challenges like migration overhead and efficient storage management remains critical for their
optimal performance.

5.()()()()()

VIRTUALIZATION FOR DATA-CENTER AUTOMATION

Data-center automation enables dynamic allocation of hardware, software, and database resources.

It supports simultaneous service to millions of users with guaranteed QoS (Quality of Service) and
cost-effectiveness.

Server Consolidation

 Problem: Underutilized servers waste resources.

 Solution: Virtualization-based server consolidation.

o Benefits:

 Enhanced hardware utilization.

 Reduced TCO (total cost of ownership) through deferred hardware purchases and lower operating costs.

 Improved provisioning agility.

 Better availability and business continuity.

o Challenges:

 Increased complexity in resource utilization.

 Need for multi-level resource scheduling.
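At its core, server consolidation is a bin-packing problem: place VM loads onto as few physical hosts as possible without exceeding host capacity. A common heuristic is first-fit-decreasing, sketched below; real consolidation must also weigh memory, I/O, and migration cost, which this sketch ignores.

```python
# First-fit-decreasing sketch for server consolidation: pack VM loads
# onto as few hosts as possible without exceeding host capacity.

def consolidate(vm_loads, capacity=1.0):
    hosts = []                                # each entry = remaining capacity
    placement = {}
    for vm, load in sorted(vm_loads.items(), key=lambda kv: -kv[1]):
        for i, free in enumerate(hosts):
            if load <= free:                  # first host that still fits
                hosts[i] -= load
                placement[vm] = i
                break
        else:                                 # nothing fits: power on a new host
            hosts.append(capacity - load)
            placement[vm] = len(hosts) - 1
    return len(hosts), placement

vms = {"a": 0.6, "b": 0.5, "c": 0.4, "d": 0.3, "e": 0.2}
n, where = consolidate(vms)
print(n)   # 2 hosts suffice for a total load of 2.0
```

Five underutilized servers collapse onto two fully utilized ones, which is the hardware-utilization and TCO benefit listed above.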

3.5.2 Virtual Storage Management

Storage Virtualization: Refers to how storage is managed by VMMs and guest OSes in a virtualized
environment.

Types of Data:

 VM images (specific to virtualization).


 Application data (similar to traditional OS data).

Encapsulation & Isolation: Multiple VMs can run on one physical machine, each isolated from
others.

Storage Challenges: Virtualization complicates storage management due to the additional layer
between hardware and the OS.

Multiple VMs: Multiple VMs sharing the same hard disk makes storage management more
complex.

3.5.3 Cloud OS for Virtualized Data Centers

VI Managers/OS:

o Nimbus, Eucalyptus, OpenNebula: Open-source options for virtualization.

o vSphere 4: Proprietary software for managing cloud resources.

 Features:

o Nimbus & Eucalyptus: Support virtual networks.

o OpenNebula: Can dynamically provision resources and make advance reservations.

o vSphere 4: Supports virtual storage, networking, and data protection.

 Virtualization Technologies:

o Nimbus, Eucalyptus, OpenNebula: Use Xen and KVM hypervisors.

o vSphere 4: Uses VMware’s ESX and ESXi hypervisors.

3.5.4 Trust Management

 Security risks arise from VMM vulnerabilities and from the reuse of random numbers (e.g., cryptographic nonces) when a VM is rolled back to an earlier snapshot.

 VM-Based Intrusion Detection:

o IDS in each VM or integrated with VMM.

o Isolation of VMs for better security.

o Use of honeypots and honeynets for intrusion analysis.

3.5.5 VM-Based Intrusion Detection:

Intrusion Detection: Identifies unauthorized access to a system.

 HIDS (host-based IDS): Runs on the monitored system itself, so a successful attacker can disable or subvert it.

 NIDS (network-based IDS): Monitors network traffic, but cannot detect attacks whose traffic appears legitimate.

VM-Based Intrusion Detection:

 VMs are isolated from each other, so an attack on one VM doesn't affect others.

 VMM monitors and audits access to resources, offering protection.


VM-Based IDS Methods:

1. IDS in each VM or a high-privileged VM on the VMM.

2. IDS built into the VMM with access to hardware.
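One advantage of placing the IDS at the VMM level is cross-view detection: the process list reported from inside the guest can be compared with what the VMM sees through introspection, and anything visible only to the VMM (e.g., a rootkit-hidden process) is suspicious. The sketch below is a toy model of that comparison, with invented PIDs and names.

```python
# Cross-view detection sketch: compare the process list reported inside
# the VM with the list the VMM sees via introspection; processes hidden
# by a guest-level rootkit appear only in the VMM's view.

def hidden_processes(vmm_view, guest_view):
    """Return PIDs the VMM sees but the guest does not report."""
    return sorted(set(vmm_view) - set(guest_view))

vmm_view = {101: "sshd", 202: "nginx", 666: "rootkit"}
guest_view = {101: "sshd", 202: "nginx"}        # rootkit conceals PID 666
print(hidden_processes(vmm_view, guest_view))   # [666]
```

Because the VMM sits below the guest OS, a compromise inside the VM cannot tamper with this outside view, which is the isolation benefit the notes above emphasize.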
