AVOID RESOURCE CONTENTION
WITH ECO4CLOUD TECHNOLOGY
A PRIMARY TELCO USE CASE
Ph. +39 0984 494276 Piazza Vermicelli
87036 Rende (CS), Italy
www.eco4cloud.com
info@eco4cloud.com
Copyright © 2016 Eco4Cloud. All rights reserved. This product is protected by Italian and international copyright and intellectual property laws.
AVOID RESOURCE CONTENTION WITH E4C TECHNOLOGY
© 2016 Eco4Cloud and/or its affiliates. All rights reserved. This document is Eco4Cloud Public. Page 2
Overcommitment and Contention
1. Introduction
VMware® ESX™ is a hypervisor designed to efficiently manage hardware resources
including CPU, memory, storage and network among multiple concurrent virtual machines
[1]. ESX uses high-level resource management policies to compute a target memory
allocation for each virtual machine (VM), based on the current system load and parameter
settings for the virtual machine (shares, reservation, and limit [2]).
The computed target allocation is used to guide the dynamic adjustment of the memory
allocation for each virtual machine; in case host memory is overcommitted, the target
allocations are achieved by invoking several lower-level mechanisms to reclaim memory
from virtual machines.
VMware ESX enables high memory and CPU consolidation ratios: it allows running VMs
whose total configured resources exceed the amount available on the physical machine.
This is called overcommitment.
Overcommitment raises the consolidation ratio, increases operational efficiency, and lowers
the total cost of operating virtual machines. Left unchecked, however, it leads to resource
contention: several VMs compete for the same resources and wait for the VMware
scheduler to assign them.
Contention is the main cause of performance issues in virtualized environments and, as
such, it is the first key performance indicator to monitor in a virtual farm.
Contention is measured via CPU Ready Time and Memory Ballooning.
2. CPU Ready Time
CPU Ready Time is the time a VM spends in a ready-to-run state (meaning it has work to
do) waiting to be scheduled by the hypervisor on one or more physical CPUs. In other
words, it measures how long a virtual CPU is ready to run but has not yet been scheduled
on a physical CPU of the host. In general, it is normal for VMs to show small values of CPU
Ready Time even when the hypervisor is neither oversubscribed nor under heavy load; this
is simply the nature of shared scheduling in virtualization. For SMP VMs with multiple
vCPUs, the amount of ready time will generally be higher than for VMs with fewer vCPUs,
since it requires more resources to schedule/co-schedule the VM when necessary and each
vCPU accumulates ready time separately; under normal operating conditions, this value
should remain under 5%. If ready time values are higher, virtual machines may perform
poorly.
Even in the best-designed environments there will be some CPU contention, and that is
acceptable. A %ready value below 5% is considered optimal. Once %ready climbs to
between 5% and 10%, be careful when adding more virtual machines or more vCPUs to
existing virtual machines; we can call this the warning area. Once %ready exceeds 10%,
you are in the danger area, and those virtual machines will perform poorly. Note that a host
can show only 50% overall CPU utilization and still suffer strong CPU contention that
degrades the performance of its virtual machines.
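The thresholds above can be sketched as a small classifier. This is a minimal sketch, not part of any VMware or Eco4Cloud tool: the function names are hypothetical, and the conversion from a ready-time summation in milliseconds to a percentage assumes the counter is sampled over a fixed interval (20 seconds is used here as an assumed default; adjust it to the sampling interval of your monitoring source).

```python
def ready_percent(ready_ms: float, interval_s: float = 20.0) -> float:
    """Convert a ready-time summation (ms) over one sample interval to a percentage.

    interval_s is an assumption: use the sampling interval of your data source.
    """
    if interval_s <= 0:
        raise ValueError("interval_s must be positive")
    return ready_ms * 100.0 / (interval_s * 1000.0)


def classify_ready(percent: float) -> str:
    """Map a %ready value to the three areas described in the text."""
    if percent < 5.0:
        return "optimal"   # below 5%
    if percent <= 10.0:
        return "warning"   # between 5% and 10%
    return "danger"        # above 10%


if __name__ == "__main__":
    # Hypothetical samples: ready-time summations in ms over a 20 s interval.
    for sample_ms in (400.0, 1500.0, 3000.0):
        pct = ready_percent(sample_ms)
        print(f"{sample_ms:7.0f} ms -> {pct:5.1f}% ({classify_ready(pct)})")
```

The classifier works per vCPU value; for a multi-vCPU VM, apply it to each vCPU's value rather than to the sum.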
To summarize, CPU contention is one of the hidden issues in your environment unless you
know where to look. The best tools for detecting CPU contention are ESXTOP from the
service console of the host, RESXTOP from the vMA appliance, or third-party tools such as
Eco4Cloud. The best defense against CPU contention is understanding how the scheduler
interacts with multi-processor virtual machines; if you run multi-processor VMs, take this
potential issue into account.
High CPU Ready Time can occur in a number of scenarios, but two are most common. The
first is host oversubscription, where too many vCPUs have been allocated relative to the
number of pCPUs. While ESX 5 supports a maximum of 25 vCPUs per physical CPU, this is
definitely a case where being able to do something does not make it good practice. As
always, your mileage may vary with your specific VM workloads, but typically you begin to
experience problems when a host is 2-2.5x oversubscribed for server workloads.
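The oversubscription ratio mentioned above is simple arithmetic: total allocated vCPUs divided by physical cores. The following is a minimal sketch with hypothetical function names and a made-up VM list; the 2.0 default threshold is taken from the lower end of the 2-2.5x range in the text and is a rule of thumb, not a hard limit.

```python
from typing import Iterable


def vcpu_to_pcpu_ratio(vcpus_per_vm: Iterable[int], physical_cores: int) -> float:
    """Return total allocated vCPUs divided by the host's physical cores."""
    if physical_cores <= 0:
        raise ValueError("physical_cores must be positive")
    return sum(vcpus_per_vm) / physical_cores


def is_oversubscription_risky(ratio: float, threshold: float = 2.0) -> bool:
    """Flag hosts at or above the rule-of-thumb ratio for server workloads."""
    return ratio >= threshold


if __name__ == "__main__":
    # Hypothetical host: 16 physical cores and twelve VMs of various sizes.
    vms = [4, 4, 2, 2, 2, 2, 8, 4, 4, 2, 2, 4]
    ratio = vcpu_to_pcpu_ratio(vms, physical_cores=16)
    print(f"vCPU:pCPU ratio = {ratio:.2f}, risky = {is_oversubscription_risky(ratio)}")
```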
The second most common scenario is a larger SMP VM, for example one with 4-8 vCPUs,
running on a host that also runs many smaller VMs with 1-2 vCPUs, such as application
servers. Depending on the number of physical processors and on the total number of
vCPUs allocated on the host, a larger resource allocation for the VM results in longer
waiting times, because the hypervisor has to preempt the necessary physical CPUs to
schedule/co-schedule the workload. When this happens, the software vendor often
responds to the VM’s performance problems by raising its vCPU requirements.
Unfortunately, if CPU Ready Time is the root cause, increasing the number of vCPUs does
not improve performance; on the contrary, it makes things worse.
3. Memory Ballooning
One of the main benefits of virtualization is virtual machine isolation, which is very useful
for security and risk management. A drawback of this isolation is that the guest operating
system is not aware that it is running inside a virtual machine, nor of the state of other
virtual machines on the same physical host. When the hypervisor runs multiple virtual
machines and the total amount of free host memory gets low, none of the virtual machines
will release guest physical memory, because the guest operating system cannot detect the
host’s memory shortage.
VMware ballooning is a memory reclamation technique used when an ESXi host is running
low on memory. This allows the physical host system to retrieve unused memory from
certain guest virtual machines (VMs) and share it with others [3].
Ballooning makes the guest operating system aware of the low memory status of the host. In
ESX, a balloon driver is loaded into the guest operating system as a pseudo-device driver. It
has no external interface to the guest operating system and communicates with the
hypervisor through a private channel. The balloon driver polls the hypervisor to obtain a
target balloon size. If the hypervisor needs to reclaim virtual machine memory, it sets a
proper target balloon size for the balloon driver, making it “inflate” by allocating guest
physical pages within the virtual machine.
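The inflate/deflate behaviour described above can be illustrated with a toy model. This is only a sketch for building intuition: the class and its fields are hypothetical, memory is counted in abstract pages, and it does not reproduce the real balloon driver or its private channel to the hypervisor.

```python
from dataclasses import dataclass


@dataclass
class ToyGuest:
    """Toy model of a guest with a balloon; all quantities are in pages."""
    total_pages: int        # guest physical memory
    used_pages: int         # pages the guest's own workloads need
    balloon_pages: int = 0  # pages currently pinned by the balloon

    def set_balloon_target(self, target: int) -> int:
        """Inflate or deflate to the target; return the change in pinned pages."""
        target = max(0, min(target, self.total_pages))
        delta = target - self.balloon_pages
        self.balloon_pages = target
        return delta

    @property
    def pages_under_pressure(self) -> int:
        """Pages the guest would have to page out because the balloon took them."""
        return max(0, self.used_pages + self.balloon_pages - self.total_pages)


if __name__ == "__main__":
    guest = ToyGuest(total_pages=1000, used_pages=600)
    for target in (300, 500, 0):  # hypothetical targets set by the hypervisor
        change = guest.set_balloon_target(target)
        print(f"target={target:4d} change={change:+5d} "
              f"pressure={guest.pages_under_pressure}")
```

In this model a moderate balloon only consumes pages the guest was not using, while a larger one pushes the guest into paging out its own working set.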
Ballooned memory is a symptom of memory contention. When host free memory drops
towards the 4% threshold, the hypervisor starts to reclaim memory through ballooning.
VM memory ballooning can degrade performance.
Ballooning is a CPU-intensive process and can eventually lead to memory swapping, when
a balloon driver inflates to the point where the VM no longer has enough memory to run its
processes. This slows down the affected VMs, to a degree that depends on the amount of
memory to reclaim and on the storage IOPS available to them.
4. Why these counters are important
CPU Ready Time and Ballooned Memory are symptoms of contention on CPU and RAM,
respectively. In the IT literature they are widely recognized as the two most significant
indicators that virtual machines are performing poorly.
The generally accepted industry best practice, based on VMware’s guidelines, is that CPU
Ready Time values up to 5% (per vCPU) are acceptable.
Memory Ballooning is the first technique the hypervisor uses to reclaim memory. The
absence of ballooning, or very low levels of it, is a sign of good health for a virtual farm.
Eco4Cloud Workload Consolidation computes the ideal placement of VMs among physical
hosts in order to decrease both CPU Ready Time and Memory Ballooning, enabling higher
performance and higher VM density.
5. Test Workflow
A field test was performed to compare the performance of VMware® Distributed Resource
Scheduler and the Eco4Cloud Workload Consolidation platform.
VMware® Distributed Resource Scheduler (DRS) aggregates computing capacity across a
set of servers into logical resource pools and intelligently allocates available resources
among the VMs, based on pre-defined rules.
VMware Distributed Power Management (DPM), within VMware DRS, automates power
management and minimizes power consumption across a given collection of servers in a
VMware DRS cluster.
The test was performed on a cluster in the production farm of a leading Italian telco; the
cluster contained 6 physical hosts running VMware vSphere 5.0.
The hosts were HP ProLiant DL580 G5 servers, each equipped with 64 GB of RAM and 4
CPU sockets. Three hosts mounted 4x Intel® Xeon® CPU E7320 @ 2.13GHz, while the
other three mounted 4x Intel® Xeon® CPU X7350 @ 2.93GHz. Each CPU had 4 physical
cores, so each host had 16 physical cores in total. The hosts ran about 94 virtual machines,
with between 1 and 8 assigned virtual CPU cores (most had 2 or 4) and between 1 and 16
GB of assigned RAM (most had 2 or 4 GB). The guest operating systems were: 80%
Microsoft Windows (various editions, 32 and 64 bit), 14% Red Hat Enterprise Linux (5 and
6, 32 and 64 bit), and 6% Oracle Solaris 10 64 bit.
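The hardware figures above can be turned into cluster-wide totals with a few lines of arithmetic. This is only a worked example of the numbers stated in the text; it assumes that the figure of about 94 virtual machines refers to the whole cluster rather than to each host, which the text does not state explicitly.

```python
# Figures stated in the text.
HOSTS = 6
SOCKETS_PER_HOST = 4
CORES_PER_SOCKET = 4
RAM_GB_PER_HOST = 64

# Assumption: "about 94 virtual machines" is the cluster-wide total.
TOTAL_VMS = 94

cores_per_host = SOCKETS_PER_HOST * CORES_PER_SOCKET  # physical cores per host
total_cores = HOSTS * cores_per_host                  # physical cores in the cluster
total_ram_gb = HOSTS * RAM_GB_PER_HOST                # physical RAM in the cluster
vms_per_host = TOTAL_VMS / HOSTS                      # average VM density per host

if __name__ == "__main__":
    print(f"cores per host: {cores_per_host}")
    print(f"cluster cores:  {total_cores}")
    print(f"cluster RAM:    {total_ram_gb} GB")
    print(f"VMs per host:   {vms_per_host:.1f} (average, under the assumption above)")
```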
The average host CPU usage during the test was about 28%.
To collect meaningful data, a set of tests using VMware DRS and E4C Workload
Consolidation was performed.
The overall test ran for 6 days, divided into two phases of equal length.
During the first phase (3 days), workload placement was managed by VMware® DRS in
fully automated mode and Eco4Cloud Workload Consolidation was disabled.
The second phase lasted a further 3 days: Eco4Cloud Workload Consolidation was
enabled and VMware® DRS was put in partially automated mode.
The two phases were comparable, because the production workload on the cluster did not
change significantly.
6. Results
At the end of the test, it was clear that overall cluster performance had improved with
Eco4Cloud Workload Consolidation: CPU Ready Time dropped by 23%, and Ballooned
Memory was completely removed, thanks to the intelligent workload placement strategy of
Eco4Cloud Workload Consolidation.
On the memory side, the result is unambiguous: the problem was solved.
On the CPU side, the result is a clear performance gain, although 23% is only an aggregate
value. Let’s see how CPU Ready Time decreases in the most important cases, where it
exceeds the warning and alert thresholds of 5% and 10%, respectively.
As the accompanying exhibit shows, CPU Ready Time warnings decreased by 90.26%,
while CPU Ready Time alerts decreased by 42.86%.
In our evaluation scenario, this means:
- 514 fewer warnings/alerts each day, per cluster
- 3598 fewer warnings/alerts each week, per cluster
Given how long it takes to handle a performance warning or an alert, estimating the time
saved by an intelligent workload placement solution is simple math.
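That math can be sketched as follows. The daily figure comes from the text; the function names are hypothetical, and the handling time per warning or alert is a placeholder to replace with your own operations data, since the paper does not provide one.

```python
def weekly_from_daily(avoided_per_day: int) -> int:
    """Scale a daily count of avoided warnings/alerts to a 7-day week."""
    return avoided_per_day * 7


def hours_saved_per_week(avoided_per_day: int, minutes_per_alert: float) -> float:
    """Estimate weekly operator hours saved.

    minutes_per_alert is a placeholder: it is not a figure from the test.
    """
    if minutes_per_alert < 0:
        raise ValueError("minutes_per_alert must not be negative")
    return weekly_from_daily(avoided_per_day) * minutes_per_alert / 60.0


if __name__ == "__main__":
    avoided_daily = 514  # from the evaluation scenario, per cluster
    print(f"avoided per week: {weekly_from_daily(avoided_daily)}")
    # Hypothetical handling time of 1 minute per warning/alert.
    print(f"hours saved per week: {hours_saved_per_week(avoided_daily, 1.0):.1f}")
```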
References
[1] Carl A. Waldspurger. “Memory Resource Management in VMware ESX Server”.
Proceedings of the Fifth Symposium on Operating Systems Design and Implementation,
Boston, Dec 2002
[2] vSphere Resource Management Guide. VMware.
http://www.vmware.com/pdf/vsphere4/r40/vsp_40_upgrade_guide.pdf
[3] Understanding Memory Resource Management in VMware® ESX™ Server.
http://www.vmware.com/files/pdf/perf-vsphere-memory_management.pdf
For more information
- E4C Workload Consolidation: http://www.eco4cloud.com/workload-consolidation
- Eco4Cloud Workload Consolidation Product Overview
- Eco4Cloud Workload Consolidation FAQ
Ph. +39 0984 494276 Piazza Vermicelli
87036 Rende (CS), Italy
www.eco4cloud.com
info@eco4cloud.com
CPU Ready
Time warnings
and alerts
decreased by
more than 90%
and 42%,
respectively

More Related Content

What's hot (14)

PDF
VMworld 2013: VMware Disaster Recovery Solution with Oracle Data Guard and Si...
VMworld
 
PPTX
Benchmark emc vnx7500, emc fast suite, emc snap sure and oracle rac on v-mware
solarisyougood
 
PDF
Accelerating virtualized Oracle 12c performance with vSphere 5.5 advanced fea...
Principled Technologies
 
PDF
Postgres plus cloud_database_getting_started_guide
ice1oog
 
PDF
Wp intelli cache_reduction_iops_xd5.6_fp1_xs6.1
Nuno Alves
 
PDF
KoprowskiT_2AMaDisasterJustBeganAD2018
Tobias Koprowski
 
PDF
Citrix PVS Advanced memory and storage considerations for provisioning services
Nuno Alves
 
PPT
Ibm aix technical deep dive workshop advanced administration and problem dete...
solarisyougood
 
PDF
VMworld 2013: vSphere Data Protection 5.5 Advanced VMware Backup and Recovery...
VMworld
 
PDF
Virtualization with Lenovo X6 Blade Servers: white paper
Lenovo Data Center
 
PDF
VMworld 2014: Data Protection for vSphere 101
VMworld
 
PDF
A Step-By-Step Disaster Recovery Blueprint & Best Practices for Your NetBacku...
Symantec
 
PPTX
Optimize Oracle On VMware (Sep 2011)
Guy Harrison
 
PPTX
Optimize oracle on VMware (April 2011)
Guy Harrison
 
VMworld 2013: VMware Disaster Recovery Solution with Oracle Data Guard and Si...
VMworld
 
Benchmark emc vnx7500, emc fast suite, emc snap sure and oracle rac on v-mware
solarisyougood
 
Accelerating virtualized Oracle 12c performance with vSphere 5.5 advanced fea...
Principled Technologies
 
Postgres plus cloud_database_getting_started_guide
ice1oog
 
Wp intelli cache_reduction_iops_xd5.6_fp1_xs6.1
Nuno Alves
 
KoprowskiT_2AMaDisasterJustBeganAD2018
Tobias Koprowski
 
Citrix PVS Advanced memory and storage considerations for provisioning services
Nuno Alves
 
Ibm aix technical deep dive workshop advanced administration and problem dete...
solarisyougood
 
VMworld 2013: vSphere Data Protection 5.5 Advanced VMware Backup and Recovery...
VMworld
 
Virtualization with Lenovo X6 Blade Servers: white paper
Lenovo Data Center
 
VMworld 2014: Data Protection for vSphere 101
VMworld
 
A Step-By-Step Disaster Recovery Blueprint & Best Practices for Your NetBacku...
Symantec
 
Optimize Oracle On VMware (Sep 2011)
Guy Harrison
 
Optimize oracle on VMware (April 2011)
Guy Harrison
 

Viewers also liked (10)

PDF
The benefits of operating systems consolidation in corporate datacenters
Eco4Cloud
 
PPTX
Nelson Mandela Home Learning
Jenn_Gray
 
PPTX
Teoría de las necesidades de abraham maslow ...
Ana Cristina Coronel Zavaleta
 
PPTX
huracanes
andre nuñez
 
PDF
Saving energy in data centers through workload consolidation
Eco4Cloud
 
PPTX
Music Video Work
Adam Newton
 
PPTX
Careers our ancestors wouldn't believe
John Woolley
 
PPTX
Eco4Cloud - Company Presentation
Eco4Cloud
 
PDF
How to sell clickbank products fast
takesurveysforcash
 
The benefits of operating systems consolidation in corporate datacenters
Eco4Cloud
 
Nelson Mandela Home Learning
Jenn_Gray
 
Teoría de las necesidades de abraham maslow ...
Ana Cristina Coronel Zavaleta
 
huracanes
andre nuñez
 
Saving energy in data centers through workload consolidation
Eco4Cloud
 
Music Video Work
Adam Newton
 
Careers our ancestors wouldn't believe
John Woolley
 
Eco4Cloud - Company Presentation
Eco4Cloud
 
How to sell clickbank products fast
takesurveysforcash
 
Ad

Similar to Avoid resource contention with e4 c (20)

DOCX
Cpu ready recomendaciones
Cristian Muñoz
 
PPTX
VMworld 2015: Extreme Performance Series - vSphere Compute & Memory
VMworld
 
PPTX
PlovDev 2016: Application Performance in Virtualized Environments by Todor T...
PlovDev Conference
 
PDF
The have no fear guide to virtualizing databases
SolarWinds
 
PPTX
webinar vmware v-sphere performance management Challenges and Best Practices
Metron
 
PPT
ESX performance problems 10 steps
Concentrated Technology
 
PPTX
Optimizing cpu resources
AnithaDevi19
 
PDF
VMworld 2013: Performance and Capacity Management of DRS Clusters
VMworld
 
PDF
Advancedperformancetroubleshootingusingesxtop 101110131727-phpapp02
Suresh Kumar
 
PPTX
Vmwareperformancetroubleshooting 100224104321-phpapp02
Suresh Kumar
 
PPTX
Vmwareperformancetroubleshooting 100224104321-phpapp02 (1)
Suresh Kumar
 
PDF
Understanding VMware Capacity
Precisely
 
PPTX
Master VMware Performance and Capacity Management
Iwan Rahabok
 
PDF
vSphere APIs for performance monitoring
Alan Renouf
 
PDF
Esx mem-osdi02
35146895
 
PDF
Introduction to eNlight Cloud Computing Platform
Milind Koyande
 
PDF
eNlight- Intelligent Cloud Computing Platform
Manisha Daulatani
 
PPTX
Virtualisation Oversubscription - What's so scary?
Metron
 
PDF
Could the “C” in HPC stand for Cloud?
IBM India Smarter Computing
 
PPTX
Cloud Computing 2023 - Lecture 07.pptx
emanamin19
 
Cpu ready recomendaciones
Cristian Muñoz
 
VMworld 2015: Extreme Performance Series - vSphere Compute & Memory
VMworld
 
PlovDev 2016: Application Performance in Virtualized Environments by Todor T...
PlovDev Conference
 
The have no fear guide to virtualizing databases
SolarWinds
 
webinar vmware v-sphere performance management Challenges and Best Practices
Metron
 
ESX performance problems 10 steps
Concentrated Technology
 
Optimizing cpu resources
AnithaDevi19
 
VMworld 2013: Performance and Capacity Management of DRS Clusters
VMworld
 
Advancedperformancetroubleshootingusingesxtop 101110131727-phpapp02
Suresh Kumar
 
Vmwareperformancetroubleshooting 100224104321-phpapp02
Suresh Kumar
 
Vmwareperformancetroubleshooting 100224104321-phpapp02 (1)
Suresh Kumar
 
Understanding VMware Capacity
Precisely
 
Master VMware Performance and Capacity Management
Iwan Rahabok
 
vSphere APIs for performance monitoring
Alan Renouf
 
Esx mem-osdi02
35146895
 
Introduction to eNlight Cloud Computing Platform
Milind Koyande
 
eNlight- Intelligent Cloud Computing Platform
Manisha Daulatani
 
Virtualisation Oversubscription - What's so scary?
Metron
 
Could the “C” in HPC stand for Cloud?
IBM India Smarter Computing
 
Cloud Computing 2023 - Lecture 07.pptx
emanamin19
 
Ad

Recently uploaded (20)

PDF
AI Image Enhancer: Revolutionizing Visual Quality”
docmasoom
 
PPTX
Function & Procedure: Function Vs Procedure in PL/SQL
Shani Tiwari
 
PDF
Message Level Status (MLS): The Instant Feedback Mechanism for UAE e-Invoicin...
Prachi Desai
 
PPTX
Odoo Migration Services by CandidRoot Solutions
CandidRoot Solutions Private Limited
 
PPTX
ChessBase 18.02 Crack + Serial Key Free Download
cracked shares
 
PPTX
UI5con_2025_Accessibility_Ever_Evolving_
gerganakremenska1
 
PDF
Introduction to Apache Iceberg™ & Tableflow
Alluxio, Inc.
 
PDF
Troubleshooting Virtual Threads in Java!
Tier1 app
 
PPTX
Cutting Optimization Pro 5.18.2 Crack With Free Download
cracked shares
 
PDF
Instantiations Company Update (ESUG 2025)
ESUG
 
PDF
Code and No-Code Journeys: The Maintenance Shortcut
Applitools
 
PDF
How Attendance Management Software is Revolutionizing Education.pdf
Pikmykid
 
PDF
Infrastructure planning and resilience - Keith Hastings.pptx.pdf
Safe Software
 
PDF
custom development enhancement | Togglenow.pdf
aswinisuhu
 
PDF
Ready Layer One: Intro to the Model Context Protocol
mmckenna1
 
PDF
Show Which Projects Support Your Strategy and Deliver Results with OnePlan df
OnePlan Solutions
 
PDF
Understanding the EU Cyber Resilience Act
ICS
 
PPTX
SAP Public Cloud PPT , SAP PPT, Public Cloud PPT
sonawanekundan2024
 
PDF
Step-by-Step Guide to Install SAP HANA Studio | Complete Installation Tutoria...
SAP Vista, an A L T Z E N Company
 
PPT
Brief History of Python by Learning Python in three hours
adanechb21
 
AI Image Enhancer: Revolutionizing Visual Quality”
docmasoom
 
Function & Procedure: Function Vs Procedure in PL/SQL
Shani Tiwari
 
Message Level Status (MLS): The Instant Feedback Mechanism for UAE e-Invoicin...
Prachi Desai
 
Odoo Migration Services by CandidRoot Solutions
CandidRoot Solutions Private Limited
 
ChessBase 18.02 Crack + Serial Key Free Download
cracked shares
 
UI5con_2025_Accessibility_Ever_Evolving_
gerganakremenska1
 
Introduction to Apache Iceberg™ & Tableflow
Alluxio, Inc.
 
Troubleshooting Virtual Threads in Java!
Tier1 app
 
Cutting Optimization Pro 5.18.2 Crack With Free Download
cracked shares
 
Instantiations Company Update (ESUG 2025)
ESUG
 
Code and No-Code Journeys: The Maintenance Shortcut
Applitools
 
How Attendance Management Software is Revolutionizing Education.pdf
Pikmykid
 
Infrastructure planning and resilience - Keith Hastings.pptx.pdf
Safe Software
 
custom development enhancement | Togglenow.pdf
aswinisuhu
 
Ready Layer One: Intro to the Model Context Protocol
mmckenna1
 
Show Which Projects Support Your Strategy and Deliver Results with OnePlan df
OnePlan Solutions
 
Understanding the EU Cyber Resilience Act
ICS
 
SAP Public Cloud PPT , SAP PPT, Public Cloud PPT
sonawanekundan2024
 
Step-by-Step Guide to Install SAP HANA Studio | Complete Installation Tutoria...
SAP Vista, an A L T Z E N Company
 
Brief History of Python by Learning Python in three hours
adanechb21
 

Avoid resource contention with e4 c

  • 1. AVOID RESOURCE CONTENTION WITH ECO4CLOUD TECHNOLOGY A PRIMARY TELCO USE CASE Ph. +39 0984 494276 Piazza Vermicelli 87036 Rende (CS), Italy www.eco4cloud.com [email protected] Copyright © 2016 Eco4Cloud. All rights reserved. This product is protected by Italian and international copyright and intellectual property laws. Eco4Cloud — www.eco4cloud.com | Phone +39 0984494276 | E-mail [email protected]
  • 2. AVOID RESOURCE CONTENTION WITH E4C TECHNOLOGY © 2016 Eco4Cloud and/or its affiliates. All rights reserved. This document is Eco4Cloud Public. Page 2 Overcommitment and Contention 1. Introduction VMware® ESX™ is a hypervisor designed to efficiently manage hardware resources including CPU, memory, storage and network among multiple concurrent virtual machines [1]. ESX uses high-level resource management policies to compute a target memory allocation for each virtual machine (VM), based on the current system load and parameter settings for the virtual machine (shares, reservation, and limit [2]). The computed target allocation is used to guide the dynamic adjustment of the memory allocation for each virtual machine; in case host memory is overcommitted, the target allocations are achieved by invoking several lower-level mechanisms to reclaim memory from virtual machines. VMware ESX enables impressive memory and CPU consolidation ratios; ESX allows running VMs with total configured resources that exceed the amount available on the physical machine: this is called overcommitment. Overcommitment raises the consolidation ratio, increases operational efficiency and lowers total cost of operating virtual machines; if out of control, overcommitment leads to Resource Contention, a typical situation where several VMs are competing over the same resources, waiting for the VMware scheduler to assign them. This is the main reason for performance issues in virtualized environment and, as such, it’s the very first key performance indicator to be monitored in a virtual farm. Contention is measured via CPU Ready Time and Memory Ballooning. 2. CPU Ready Time CPU Ready Time is the period of time a VM waits in a ready-to-run state (meaning it has work to do) before being scheduled by the hypervisor on one or more physical CPUs. Therefore, CPU Ready Time is a metric showing how much time virtual CPU is ready to be scheduled on a given physical host. 
In general terms, it is normal for VMs to have small values of CPU Ready Time, even if the hypervisor is not over subscribed, or under heavy activity; it is just the nature of shared scheduling in virtualization. For SMP VMs with multiple vCPUs, the amount of ready time will generally be higher than for VMs with fewer vCPUs, In general terms, it is normal for VMs to have small values of CPU Ready Time
  • 3. AVOID RESOURCE CONTENTION WITH E4C TECHNOLOGY © 2016 Eco4Cloud and/or its affiliates. All rights reserved. This document is Eco4Cloud Public. Page 3 since it requires more resources to schedule/co-schedule the VM when necessary and each CPU accumulates the time separately; under normal operating conditions, this value should remain under 5%. If ready time values are higher, virtual machines experience bad performance. Even in best designed environments there will be some CPU contention and that is okay. Any %ready number less than 5% is considered the optimal area to be in. Once your %ready number climbs in between 5 and 10%, you need to pay attention when adding more virtual machines and/or CPU cores to the virtual machines. We can call this the warning area. Now, once the %ready numbers climb higher than 10%, you will reach the dangerous area and as a consequence bad performance will impact those virtual machines. Your host could show a %50 overall CPU utilization and strong CPU contention in your environment, thus affecting the overall performance of your virtual machines. Just to summarize, CPU contention is one of the hidden issues you might find in your environment, unless you know where looking for. The best tool to use when looking for any CPU contention in your environment is ESXTOP from inside the service console of the host, RESXTOP from the vMA appliance, or other third-party tools, like Eco4Cloud. The best defense against CPU contention is knowledge and comprehension of scheduler interactions with multi-processor virtual machines; if you are using multi-processor systems, take into account that potential issue. While there are a number of scenarios where high values of CPU Ready Time can occur, there are two most common scenarios. 
The first common reason tends to be host over subscription, where too many vCPUs have been allocated per pCPU ratio wise; while ESX 5 supports a maximum of 25 vCPUs per physical CPU, this is definitely the case where just because you can do it, it equals to a good practice. As always, your mileage may vary based on your specific VM workloads, but typically you begin to experience some problems when a host is in the range of 2-2.5X over subscribed for server workloads. The second most common scenario where CPU Ready Time goes higher is when a larger SMP VM, for example a 4-8 vCPUs running on a host having a lot of smaller VMs with 1-2 vCPUs for application servers. Depending on the number of physical processors and on the total number of vCPUs allocated on the host, a larger resource allocation for the VM results in longer waiting time, because the hypervisor has to preempt the necessary physical CPUs to schedule/co-schedule the workload. When this issue occurs, the software vendor increases vCPUs requirements, due to performance problems for the VM. Unfortunately, if CPU Ready Time is the root cause, increasing vCPUs number actually does not improve performance, on the contrary things get worse. The best defense against CPU contention is knowledge
  • 4. AVOID RESOURCE CONTENTION WITH E4C TECHNOLOGY © 2016 Eco4Cloud and/or its affiliates. All rights reserved. This document is Eco4Cloud Public. Page 4 3. Memory Ballooning One of main benefits introduced by virtualization is virtual machines isolation, which is very useful for security and risk management. A drawback of virtual machines isolation is that the guest operating system is not aware it is running inside a virtual machine and is not aware of the states of other virtual machines on the same physical host. When the hypervisor runs multiple virtual machines and the total amount of free host memory gets low, none of the virtual machines will release guest physical memory, since when the guest operating system cannot detect the host’s memory shortage. VMware ballooning is a memory reclamation technique used when an ESXi host is running low on memory. This allows the physical host system to retrieve unused memory from certain guest virtual machines (VMs) and share it with others [3]. Ballooning makes the guest operating system aware of the low memory status of the host. In ESX, a balloon driver is loaded into the guest operating system as a pseudo-device driver. It has no external interface to the guest operating system and communicates with the hypervisor through a private channel. The balloon driver polls the hypervisor to obtain a target balloon size. If the hypervisor needs to reclaim virtual machine memory, it sets a proper target balloon size for the balloon driver, making it “inflate” by allocating guest physical pages within the virtual machine. Ballooned memory is a symptom of RAM memory contention. If host free memory drops towards the 4% threshold, the hypervisor starts to reclaim memory, using ballooning. VM memory ballooning can create performance degradation. 
Ballooning is a CPU intensive process, and can eventually lead to memory swapping, when a balloon driver inflates to the point where the VM no longer has enough memory to run its processes. This will slow down the VMs, depending upon the amount of memory to recoup and/or the quality of the storage IOPS delivered to it. 4. Why these counters are important CPU Ready Time and Ballooned Memory are symptoms of contention on CPU and RAM, respectively. These metrics represent, in IT literature, the universally recognized two most significant indicators of the fact that virtual machines are experiencing bad performance. The generally accepted industry best practice based on VMware’s guidelines is that CPU Ready Time values up to 5% (per vCPU) fall within acceptable parameters.
  • 5. AVOID RESOURCE CONTENTION WITH E4C TECHNOLOGY © 2016 Eco4Cloud and/or its affiliates. All rights reserved. This document is Eco4Cloud Public. Page 5 Memory Ballooning is the first technique the hypervisor uses to reclaim memory. Absence or very low levels of ballooning is a sign of excellent/good health for a virtual farm. Eco4Cloud Workload Consolidation intelligence computes the ideal placement of VMs among physical hosts, in order to decrease both CPU Ready Time and Memory Ballooning, enabling higher performance and VMs density. 5. Test Workflow A field test has been performed in a performance comparison between VMware® Distributed Resource Scheduler and Eco4Cloud Workload Consolidation platform. VMware® Distributed Resource Scheduler (DRS) aggregates computing capacity across a set of servers into logical resource pools and intelligently allocates available resources among the VMs, based on pre-defined rules. VMware Distributed Power Management (DPM), within VMware DRS, automates power management and minimizes power consumption across a given collection of servers in a VMware DRS cluster. The test was performed on a cluster in a production farm of a leader Italian Telco company; the cluster contained 6 physical hosts running vmware vSphere version 5.0. The hosts were HP ProLiant DL580 G5, equipped with 64GB RAM and 4 CPU socket. Three hosts mounted 4x Intel® Xeon® CPU E7320 @ 2.13GHz while the other three mounted 4x Intel® Xeon® CPU X7350 @ 2.93GHz. Each CPU had 4 physical cores, so the total number of physical cores for each host was 16. The hosts ran about 94 virtual machines with a number of virtual CPU assigned cores that range from 1 to 8 (most of them with 2 or 4 virtual cores) and an amount of assigned RAM varying from 1 to 16 GB (most of them with 2 or 4 GB RAM). The guests operating systems were: 80% Microsoft Windows (various editions, 32 and 64 bit), 14% Linux Red Hat Enterprise (5 and 6, 32 and 64 bit), and 6% Oracle Solaris 10 64 bit. 
The average host CPU usage during the test was about 28%.

To collect meaningful data, a set of tests using VMware DRS and E4C Workload Consolidation was performed. The overall test ran for 6 days, divided into two phases of equal length. During the first phase (3 days), workload placement was managed by VMware® DRS in fully automated mode and Eco4Cloud Workload Consolidation was disabled.

Avoiding ballooning is a sign of good health for a virtual farm
The first phase was followed by a second phase of 3 more days, in which Eco4Cloud Workload Consolidation was enabled and VMware® DRS was put in partially automated mode. The two phases were comparable, because the production workload on the cluster did not change significantly.

6. Results

At the end of the test, it was clear that Eco4Cloud Workload Consolidation had improved overall cluster performance: CPU Ready Time dropped by 23%, and Ballooned Memory was completely eliminated, thanks to the intelligent workload placement strategy of Eco4Cloud Workload Consolidation.

On the memory side, the result is unambiguous: problem solved.

Ballooned memory totally removed

On the CPU side, the result positively affects performance, and 23% is just an aggregate value. Consider how CPU Ready Time decreases in the most important cases, where it exceeds the warning and alert thresholds of 5% and 10%, respectively.
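The warning and alert thresholds can be expressed as a simple counting rule. The following is a minimal sketch, not the procedure used in the test: it labels per-vCPU CPU Ready samples as warnings (above 5% and up to 10%) or alerts (above 10%) and computes the relative reduction in each count between two phases. Treating the two bands as mutually exclusive is an assumption, and every name and sample value is hypothetical.

```python
from collections import Counter

WARNING_PCT = 5.0   # warning threshold from the text
ALERT_PCT = 10.0    # alert threshold from the text


def classify(ready_pct: float) -> str:
    """Label one CPU Ready sample (assumed bands: ok / warning / alert)."""
    if ready_pct > ALERT_PCT:
        return "alert"
    if ready_pct > WARNING_PCT:
        return "warning"
    return "ok"


def count_events(samples):
    """Count how many samples fall into each band."""
    counts = Counter(classify(s) for s in samples)
    return {label: counts.get(label, 0) for label in ("ok", "warning", "alert")}


def reduction_pct(before: int, after: int) -> float:
    """Relative decrease from `before` to `after`, in percent."""
    if before <= 0:
        raise ValueError("before must be positive")
    return (before - after) / before * 100.0


# Hypothetical per-vCPU CPU Ready percentages, NOT measurements from the test.
phase_one = [2.0, 6.5, 7.1, 12.3, 5.4, 11.0, 3.2, 8.8]
phase_two = [1.5, 4.0, 6.2, 10.5, 3.9, 4.8, 2.1, 4.4]

before, after = count_events(phase_one), count_events(phase_two)
for label in ("warning", "alert"):
    change = reduction_pct(before[label], after[label])
    print(f"{label}s: {before[label]} -> {after[label]} ({change:.2f}% fewer)")
```

Counting threshold crossings rather than averaging the raw metric is what makes a modest aggregate improvement compatible with a much larger drop in warnings.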
As the following exhibit shows, CPU Ready Time warnings decrease by 90.26%, while CPU Ready Time alerts decrease by 42.86%.

CPU Ready Time warnings and alerts decreased by more than 90% and 42%, respectively

In our evaluation scenario, this means:
- 514 fewer warnings/alerts each day, per cluster
- 3598 fewer warnings/alerts each week, per cluster

Given how much time it takes to manage a performance warning or an alert, estimating how much time an intelligent workload placement solution can save is simple math.

References

[1] Carl A. Waldspurger. "Memory Resource Management in VMware ESX Server". Proceedings of the Fifth Symposium on Operating Systems Design and Implementation, Boston, Dec 2002.
[2] vSphere Resource Management Guide. VMware. http://www.vmware.com/pdf/vsphere4/r40/vsp_40_upgrade_guide.pdf
[3] Understanding Memory Resource Management in VMware® ESX™ Server. http://www.vmware.com/files/pdf/perf-vsphere-memory_management.pdf

For more information

- E4C Workload Consolidation: http://www.eco4cloud.com/workload-consolidation
- Eco4Cloud Workload Consolidation Product Overview
- Eco4Cloud Workload Consolidation FAQ

Ph. +39 0984 494276
Piazza Vermicelli
87036 Rende (CS), Italy
www.eco4cloud.com
info@eco4cloud.com