
Homework 5: The 11 Rule: Good Suggestions Interesting, But

The document discusses thermal issues in data centers caused by heat recirculation and proposes thermal-aware task placement as a solution. It presents concepts like modeling heat recirculation through coefficients, developing fast thermal models, and formulating the task placement problem to minimize peak inlet temperatures and cooling costs. Spatio-temporal job scheduling algorithms aim to further reduce energy use by spreading workloads over time for more efficient placement considering the data center's changing utilization.

Uploaded by

jhampia

Homework 5: the 11th rule
Good suggestions
Use more efficient code
Train the users
Reuse the waste heat
Interesting, but
Use alternative, renewable energy sources
Make predictions
Do monitoring
Repair instead of replace

Thermal-aware Task placement (spatial scheduling) of Data Centers
Overview
Thermal issues in dense computer rooms (i.e. data centers, computer clusters, data warehouses)
Heat recirculation: hot air from the equipment air outlets is fed back to the equipment air inlets
Hot spots (effect of heat recirculation): areas in the data center with alarmingly high temperature
Consequence: cooling has to be set very low to have all inlet temperatures in the safe operating range
Courtesy: Intel Labs

Conceptual overview of thermal-aware task placement
Task placement determines temperature distribution
Temperature distribution determines the equipment peak air inlet temperature
Peak air inlet temperature determines upper bound to CRAC temperature setting
CRAC temperature setting determines its efficiency (Coefficient of Performance)
Bottom line: There is a task placement that maximizes cooling efficiency. Find it and use it!
The lower the peak inlet temperature, the higher the CRAC efficiency
Coefficient of Performance (source: HP)
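
The chain from supply temperature to cooling cost can be illustrated with a small sketch. It assumes the quadratic CoP fit published by HP Labs for a chilled-water CRAC unit, CoP(T) = 0.0068 T^2 + 0.0008 T + 0.458 with T in degrees Celsius. The slide only says "source: HP", so treating that curve as the one plotted here is an assumption, and the function names are illustrative.

```python
def cop(t_sup_c: float) -> float:
    """Coefficient of Performance at supply temperature t_sup_c (deg C).

    Assumed model: quadratic fit published by HP Labs for a
    chilled-water CRAC unit. The slide does not state the formula.
    """
    return 0.0068 * t_sup_c ** 2 + 0.0008 * t_sup_c + 0.458


def cooling_power_kw(it_power_kw: float, t_sup_c: float) -> float:
    """Power the CRAC draws to remove it_power_kw of heat."""
    return it_power_kw / cop(t_sup_c)


if __name__ == "__main__":
    # A higher supply temperature gives a higher CoP and a lower cooling cost.
    for t in (15.0, 20.0, 25.0):
        print(f"T_sup={t:.0f} C  CoP={cop(t):.2f}  "
              f"cooling={cooling_power_kw(100.0, t):.1f} kW")
```

Under this curve, every degree that the placement allows the CRAC set point to rise lowers the power needed to remove the same heat load.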
Prerequisites for thermal management
Task profiling
CPU utilization, I/O activity etc.
Equipment power profiling
CPU consumption, disk consumption etc.
Heat recirculation modeling
Task management technologies

Need for a comprehensive research framework
Thermal management research framework

Thermal-aware job scheduling
On-line job scheduling algorithm to minimize peak air inlet temperature, thus minimizing the cost of cooling.

Thermal Models
To enable on-line real-time thermal-aware job scheduling
fast (analytical, non-CFD-based)
non-invasive (machine learning)

Characterization
Characterize the power consumption of a given workload (CPU, memory, disk etc.) on a given piece of equipment
Model the thermal impact of multicore systems

[Figure: Thermal Management Infrastructure & Services for Data Centers. Services shown: Sensor Data Gathering Service, Performance Monitoring Service, Fast Thermal Evaluation Service, Thermal/Power & Performance Correlation Service, Job Scheduling Service, Thermal Management Policy Enforcement Service, Cooling Control Service, Air-flow Control Service. Other blocks: Job Queues, Resource Queues, Thermal Control Policies. Groupings: Data Center Monitoring, Non-Invasive Thermal Evaluation, Cluster Management, Policy Enforcement, Facility Management, Resource & Server Management, OS-Level Services, Performance Monitoring.]
http://impact.asu.edu/

Sandeep Gupta
Qinghui Tang
Tridib Mukherjee
Michael Jonas
Georgios Varsamopoulos
Power Model and Profiling
Power consumption is mainly affected by the CPU utilization
Power consumption is linear in the CPU utilization
P = a U + b
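
As a minimal sketch of power profiling under this linear model, the parameters a and b can be estimated from measured (utilization, power) pairs with a least-squares line fit. The function names and the sample readings below are hypothetical and only illustrate the procedure; they are not measurements from the slides.

```python
import numpy as np


def fit_power_model(utilization, power_w):
    """Least-squares fit of P = a * U + b; returns (a, b)."""
    u = np.asarray(utilization, dtype=float)
    p = np.asarray(power_w, dtype=float)
    a, b = np.polyfit(u, p, deg=1)  # coefficients, highest degree first
    return float(a), float(b)


def predict_power(a, b, utilization):
    """Evaluate the fitted model at the given utilization value(s)."""
    return a * np.asarray(utilization, dtype=float) + b


if __name__ == "__main__":
    # Hypothetical readings: utilization in [0, 1], power in watts.
    u = [0.0, 0.25, 0.5, 0.75, 1.0]
    p = [200.0, 225.0, 250.0, 275.0, 300.0]
    a, b = fit_power_model(u, p)
    print(f"a = {a:.1f} W per unit utilization, b = {b:.1f} W idle")
```

Here b plays the role of idle power and a the additional power at full utilization.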
A simple thermal model
Basic idea:
We don't need an extensive CFD model
We only need to know the effect of recirculation at specific points
Express recirculation as coefficients
Courtesy: Intel Labs
[Figure: nodes N1 to N5]
Recirculation coefficients: a fast thermal model
Reduce/simplify the thermal map concept to points of interest: equipment air inlets
Can be computed from CFD models/simulations
Matrix A of recirculation coefficients
$a_{ij}$: portion of heat exhausted from node i that directly goes to node j
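Linear Thermal Model
Heat recirculation coefficients
Analytical
Matrix-based
Properties of model
Granularity at air inlets (discrete/simplified)
Assumes steadiness of air flow
$T_{in} = T_{sup} + D P$
where $T_{in}$ is the vector of inlet temperatures, $T_{sup}$ the supplied air temperatures, $D$ the heat distribution matrix and $P$ the power vector
[Figure: recirculation among nodes N1, N2, N3 and the AC unit, labeled with $T_{sup}$, $T_{in}$, $T_{out}$, $T_{ACin}$ and recirculation coefficients indexed 11, 12, 13, 21, 31]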
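
The linear model is a single matrix-vector product, which is why it evaluates so quickly. The sketch below assumes a heat distribution matrix D is already available (for example, derived from CFD runs or measurements); the 3-node matrix and power values are hypothetical.

```python
import numpy as np


def inlet_temperatures(t_sup, D, power):
    """Evaluate T_in = T_sup + D P.

    t_sup: scalar or length-n vector of supplied air temperatures
    D:     n x n heat distribution matrix (temperature rise per unit power)
    power: length-n vector of node power draw
    """
    D = np.asarray(D, dtype=float)
    power = np.asarray(power, dtype=float)
    return np.asarray(t_sup, dtype=float) + D @ power


if __name__ == "__main__":
    # Hypothetical 3-node example, D in deg C per watt.
    D = [[0.010, 0.002, 0.000],
         [0.003, 0.010, 0.001],
         [0.000, 0.004, 0.020]]
    t_in = inlet_temperatures(15.0, D, [100.0, 200.0, 300.0])
    print("inlet temperatures:", t_in, "peak:", t_in.max())
```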
Benefit: fast thermal evaluation
With CFD: give workload (job), run CFD simulation (days), extract temperatures
With the linear model: give workload (job), compute vector $T_{in} = T_{sup} + D P$ (seconds), which yields temperatures
Courtesy: Flometrics
Thermal-aware Task Placement Problem
Given an incoming task consisting of homogeneous processes, find a placement of the processes to minimize the (increase of) peak inlet temperature
$T_{in} = T_{sup} + D\,(a U + \mathbf{b})$
where $T_{in}$ is the vector of inlet temperatures, $T_{sup}$ the supplied air temperatures, $D$ the heat distribution matrix, $U$ the utilization vector and $\mathbf{b} = (b, \dots, b)^T$
Formulation
Given a task that requires $C_{tot}$ servers, a matrix D that describes recirculation, and the power profile parameters a, b (P = a U + b):
$$\text{Minimize } \max_{i} \{ t_i^{in} \}$$
such that:
$$t_i^{in} = t^{sup} + \sum_{j=1}^{n} d_{ij}\,(a\,c_j + b), \quad i = 1, \dots, n$$
$$\sum_{i=1}^{n} c_i = C_{tot}$$
$$0 \le c_i \le m, \quad i = 1, \dots, n$$
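
To make the formulation concrete, here is a minimal greedy sketch: it places the task one unit at a time, each time on the server that gives the lowest resulting peak inlet temperature. This is a simple heuristic chosen for illustration, not the solver used by the authors, and it does not guarantee the optimum. Reading m as a per-server capacity is an assumption, as are all names and numbers in the example.

```python
import numpy as np


def peak_inlet(c, D, a, b, t_sup):
    """Peak of t_in = t_sup + D (a c + b) for assignment vector c."""
    return float(np.max(t_sup + D @ (a * c + b)))


def greedy_placement(D, a, b, t_sup, c_tot, m=1):
    """Place c_tot units, one at a time, on the server with spare
    capacity (c_i < m) that yields the lowest peak inlet temperature.

    Greedy heuristic for illustration only; m is ASSUMED to be the
    per-server capacity. Returns (assignment vector, peak temperature).
    """
    D = np.asarray(D, dtype=float)
    n = D.shape[0]
    if c_tot > n * m:
        raise ValueError("task does not fit: c_tot > n * m")
    c = np.zeros(n)
    for _ in range(c_tot):
        best_i, best_peak = None, float("inf")
        for i in range(n):
            if c[i] >= m:
                continue
            c[i] += 1  # tentatively place one unit on server i
            peak = peak_inlet(c, D, a, b, t_sup)
            c[i] -= 1
            if peak < best_peak:
                best_i, best_peak = i, peak
        c[best_i] += 1
    return c, peak_inlet(c, D, a, b, t_sup)


if __name__ == "__main__":
    # Hypothetical matrix: server 2 heats its own inlet the most.
    D = [[0.01, 0.0, 0.0], [0.0, 0.02, 0.0], [0.0, 0.0, 0.05]]
    c, peak = greedy_placement(D, a=100.0, b=0.0, t_sup=15.0, c_tot=2)
    print("assignment:", c, "peak inlet:", peak)
```

In the example the two units land on the two servers with the smallest coefficients, and the server that recirculates the most heat stays idle.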
Simulation environment
Small-scale data center
One row is equipped with Dell PowerEdge 1955
The other row is equipped with Dell PowerEdge 1855
Due to the heterogeneity of equipment:
There is a difference between minimizing just cooling cost vs. minimizing total cost
Difference is small but can be larger depending on the data center
We now have to minimize total cost
Thermal-aware task placement: simulation results for data centers
For small loads, the task is assigned to the 1955s; therefore the optimization has to sacrifice cooling cost to improve overall power cost
Spatio-Temporal Thermal-aware Job Scheduling Algorithms for (heterogeneous) data centers
Motivation
Past work on spatial-only thermal-aware job scheduling has shown considerable energy savings
Savings from:
Knowing/modeling heat recirculation and controlling the server assignment (i.e. spatial scheduling) to minimize it
Adjusting the CRAC thermostat to the highest yet safe setting to save energy
Onto spatio-temporal job scheduling
Data center utilization changes over time
Job scheduling, though, is mainly a temporal process
Problem: How to incorporate thermal awareness into the temporal dimension?

Spatio-temporal approaches
Based on approach of XInt
SCINT:
Discretize time as well as space
Formulate a discrete spatio-temporal reservation problem to minimize objective function
Solve using a genetic algorithm
Based on extending FCFS w/ back-filling
FCFS-XInt
FCFS temporal, XInt spatial
FCFS-LRH
FCFS temporal, least-recirculated-heat spatial
Based on approximating SCINT behavior
Running SCINT is very time-consuming
SCINT induces savings by temporally spreading workload to allow more energy-efficient spatial placement.
Approximate using earliest-deadline-first (temporal) with LRH (spatial)
Some challenges
The algorithms require a good model of the heat recirculation
Use of the abstract linear heat interference model (ALHI) requires profiling of the heat recirculation, either through measurements or through simulation

The algorithms require a good estimate of the actual execution time
Reservation time (i.e. slack) is not a good estimate; it is almost always a generous over-estimation

Deadline is not specified by the submissions
Use the (submission time + slack) as deadline
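
The earliest-deadline-first (temporal) plus least-recirculated-heat (spatial) idea can be sketched for a single scheduling instant as follows. Deadlines follow the rule above (submission time + slack). Ranking servers by the column sums of the interference matrix D is an assumption about how "least recirculated heat" is scored, and the `Job` class, function names and example values are hypothetical.

```python
from dataclasses import dataclass

import numpy as np


@dataclass
class Job:
    name: str
    submit: float  # submission time
    slack: float   # reservation time requested by the user
    servers: int   # number of servers required

    @property
    def deadline(self) -> float:
        # Deadline is not given by submissions: use submit + slack.
        return self.submit + self.slack


def lrh_order(D):
    """Server indices, least recirculated heat first.

    ASSUMPTION: server j is scored by the j-th column sum of D, i.e.
    the total inlet temperature rise it causes per unit of power.
    """
    scores = np.asarray(D, dtype=float).sum(axis=0)
    return [int(j) for j in np.argsort(scores, kind="stable")]


def edf_lrh_step(queue, free_servers, D):
    """Assign queued jobs at one scheduling instant.

    Jobs are taken strictly in deadline order; each gets the free
    servers with the least recirculated heat. The first job that does
    not fit blocks all later ones. Returns (assignments, waiting).
    """
    free_set = set(free_servers)
    free = [s for s in lrh_order(D) if s in free_set]
    ordered = sorted(queue, key=lambda j: j.deadline)
    assignments = {}
    for k, job in enumerate(ordered):
        if job.servers > len(free):
            return assignments, ordered[k:]
        assignments[job.name] = free[:job.servers]
        free = free[job.servers:]
    return assignments, []
```

A fuller scheduler would repeat this step whenever a job arrives or finishes and would release servers on completion; that bookkeeping is left out to keep the sketch short.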
[Figure: Interference coefficients matrix]
[Figure: Slack vs execution time]
[Figures, repeated over six slides: Submission and execution time; Power profile of schedules; Energy consumption of schedules]
Conclusions from this work
There exist synergies in spatio-temporal scheduling:
Synergy between temporal smoothing and thermal-aware placement (spatial scheduling)
Synergy between proactivity of spatio-temporal scheduling and power scheduling
Near-optimal heuristics are very slow
Fast approximations are preferable

Energy consumption of spatio-temporal job scheduling in a linear cooling environment
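Cooling Models
Constant-value cooling (FloVENT)
$T_{out} = b$
Linear cooling
$T_{out} = a\,T_{in} + b$
Segmented constant-linear cooling (FloVENT)
$$T_{out} = \begin{cases} b, & \text{if } T_{in} \le \text{threshold} \\ a\,T_{in} + b, & \text{if } T_{in} > \text{threshold} \end{cases}$$
Stepwise linear (observed)
$$T_{out} = \begin{cases} a_1 T_{in} + b_1, & \text{if } T_{in} \le \text{threshold} \\ a_2 T_{in} + b_2, & \text{if } T_{in} > \text{threshold} \end{cases}$$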
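
The four cooling models can be written as small functions for side-by-side comparison. This is a sketch under assumptions: the switching threshold is passed in as a parameter because its value is not legible in the slides, the linear branch of the segmented model is assumed to be a*T_in + b, and all names are illustrative.

```python
def constant_cooling(t_in, b):
    """Constant-value cooling: T_out = b regardless of T_in."""
    return b


def linear_cooling(t_in, a, b):
    """Linear cooling: T_out = a * T_in + b."""
    return a * t_in + b


def segmented_cooling(t_in, a, b, threshold):
    """Constant up to the threshold, linear above it (assumed form)."""
    if t_in <= threshold:
        return b
    return a * t_in + b


def stepwise_linear_cooling(t_in, a1, b1, a2, b2, threshold):
    """Two linear pieces that switch at the threshold."""
    if t_in <= threshold:
        return a1 * t_in + b1
    return a2 * t_in + b2


if __name__ == "__main__":
    # Hypothetical parameters, chosen only to exercise both branches.
    for t in (10.0, 30.0):
        print(t,
              constant_cooling(t, b=12.0),
              linear_cooling(t, a=0.5, b=12.0),
              segmented_cooling(t, a=0.5, b=12.0, threshold=20.0),
              stepwise_linear_cooling(t, 1.0, 0.0, 2.0, -20.0, 20.0))
```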
Cooling distribution model
Assume a 3-mode heat-extractor cooling system:
$P_{out}$ = 5 kW cooling until $T_{in}$ = 16
$P_{out}$ = 75 kW cooling until $T_{in}$ = 20
$P_{out}$ = 250 kW when $T_{in}$ > 26
Time delay of 10 minutes to fully switch the mode

Return heat
Total heat minus recirculated heat
$P_{in} = (1 - a_{ij})\,P_i$

Supplied heat
Input heat minus extracted heat
$P_{sup} = P_{in} - P_{out}$
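
A rough sketch of the 3-mode extractor and the supplied-heat balance follows. The slide leaves the range between 20 and 26 unspecified; this sketch assumes the 75 kW mode holds there, and it ignores the 10-minute switching delay. Function names are illustrative.

```python
def extracted_power_kw(t_in):
    """Steady-state heat extraction P_out (kW) for inlet temperature t_in.

    Mode values are taken from the slide. ASSUMPTIONS: the 75 kW mode
    also covers 20 < t_in <= 26, which the slide does not specify, and
    the 10-minute mode-switching delay is not modeled.
    """
    if t_in <= 16:
        return 5.0
    if t_in <= 26:
        return 75.0
    return 250.0


def supplied_heat_kw(p_in_kw, t_in):
    """P_sup = P_in - P_out: heat left in the supplied air."""
    return p_in_kw - extracted_power_kw(t_in)


if __name__ == "__main__":
    for t in (15, 18, 23, 30):
        print(f"T_in={t}: P_out={extracted_power_kw(t)} kW, "
              f"P_sup={supplied_heat_kw(100.0, t)} kW")
```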
Results: FCFS-XInt
Cooling Power and Energy
Assume a 3-mode heat-extractor cooling system: 5 kW cooling until $T_{in}$ = 16, 75 kW cooling until $T_{in}$ = 20, 250 kW when $T_{in}$ > 26

Results: EDF-LRH
Cooling Power and Energy
Assume a 3-mode heat-extractor cooling system: 5 kW cooling until $T_{in}$ = 16, 75 kW cooling until $T_{in}$ = 20, 250 kW when $T_{in}$ > 26

Results: SCINT
Cooling Power and Energy
Assume a 3-mode heat-extractor cooling system: 5 kW cooling until $T_{in}$ = 16, 75 kW cooling until $T_{in}$ = 20, 250 kW when $T_{in}$ > 26

Conclusions from this work
Data center energy consumption is increasing
Benefits emerge if viewed as Cyber-Physical Systems
Thermal-aware scheduling
Need to bridge the gap between simulation results and practice
Non-invasive ways to apply modeling methods in real data centers
Use realistic cooling models
