VMware Performance
  Troubleshooting

 Presented by Chris Kranz
Topics Covered
•   Introduction
•   Root Cause Analysis
•   Performance Characteristics
•   CPU
•   Networking
•   Memory
•   Disk
•   Virtual Machine optimisation
•   ESXTop
•   vm-support
•   Service Console
•   Resource Groups
•   Design Guidelines
•   Capacity Planner limitations and cautions
•   Conclusion
•   Reference Articles
Introduction
Multiple layers of virtualisation are used to
increase service levels, availability and
manageability

However, multiple layers of virtualisation often
mask performance and configuration issues
making it more of a challenge to troubleshoot
and correct

The worst outcome is that performance issues
after a virtualisation project lead to the
perception that VMware reduces performance,
which can damage future confidence in VMware
Performance Basics


• Virtual Machine Resources
  – CPU
  – Memory
  – Disk
  – Networking
Resource Maximums

                           Host           Guest
  Logical Processors       64             N/A
  Virtual CPUs             N/A            8
  Virtual CPUs per Core    20             N/A
  Memory                   1TB            256GB

http://www.vmware.com/pdf/vsphere4/r40/vsp_40_config_max.pdf
Typical Host

                                       vSphere 1U Host
  CPUs                                  2 x Quad Core
  Memory                                32-64GB RAM

      Typically 3 VMs per core = 24 VMs per host
      Each VM has 2GB of RAM = 48GB of RAM in total
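
The sizing arithmetic on this slide can be sketched in a few lines. This is only an illustration: the function names are made up here, and the calculation ignores virtualisation overhead and page sharing.

```python
def vms_per_host(sockets: int, cores_per_socket: int, vms_per_core: int) -> int:
    """Number of VMs a host carries at a given consolidation ratio."""
    return sockets * cores_per_socket * vms_per_core


def ram_needed_gb(vm_count: int, ram_per_vm_gb: int) -> int:
    """Total RAM configured across the VMs (no overhead, no page sharing)."""
    return vm_count * ram_per_vm_gb


# Figures from the slide: 2 x quad core, 3 VMs per core, 2GB per VM.
vm_count = vms_per_host(sockets=2, cores_per_socket=4, vms_per_core=3)
print(vm_count)                    # 24
print(ram_needed_gb(vm_count, 2))  # 48
```

The 48GB result sits inside the 32-64GB range quoted for a typical host, which is why memory rather than CPU is often the first limit reached.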
Root Cause Analysis

http://www.vmware.com/resources/techresources/10066
Root Cause ...
Monitoring Performance

• Do not rely on guest tools alone, although they
  – Can show high CPU and memory utilisation
  – Can measure latency and throughput of disk and
    network interfaces
• Use the virtualisation layer to diagnose the cause:
  – The guest is unaware of the virtualisation workload
  – Guest OSs account for time differently
  – The guest has no visibility of available resources
Performance Analysis Tools

• esxtop (service console only)
• resxtop (remote command line utilities)
• Performance graphs in vCenter
esxtop

• esxtop can be run:
  – Interactively
  – Batch (e.g. esxtop -a -b > analysis.csv)
  – Load batch output into Windows perfmon or MS Excel


• Two keys to remember
  – H : help
  – F : fields to display
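
Besides perfmon and Excel, the batch CSV can be summarised with a short script. The sketch below averages every column whose header ends with a given counter name. The header layout in the sample (perfmon-style paths such as `\\host\Group Cpu(1:vm-a)\% Ready`) is an assumption for illustration, as are the host and VM names; check the header row of your own file before relying on it.

```python
import csv
import io
from statistics import mean
from typing import Dict, List


def average_by_counter(csv_text: str, counter_suffix: str) -> Dict[str, float]:
    """Average every column whose header ends with counter_suffix."""
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader)
    wanted = [i for i, name in enumerate(header) if name.endswith(counter_suffix)]
    samples: Dict[str, List[float]] = {header[i]: [] for i in wanted}
    for row in reader:
        for i in wanted:
            samples[header[i]].append(float(row[i]))
    return {name: mean(values) for name, values in samples.items() if values}


# Tiny hand-made sample (hypothetical names); real files are far wider.
sample = "\n".join([
    r'"Time","\\host\Group Cpu(1:vm-a)\% Ready","\\host\Group Cpu(2:vm-b)\% Ready"',
    '"10:00:00","4.0","12.0"',
    '"10:00:05","6.0","18.0"',
])

for column, avg in average_by_counter(sample, r"\% Ready").items():
    print(column, avg)
```

For a real capture, read the file contents (for example `open("analysis.csv").read()`) and pass them in place of `sample`.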
esxtop basics

[Screenshot: esxtop display, annotated with "Host Resources", "Name of Resource Pool, Virtual Machine or World" and "Number of Worlds"]
Performance Characteristics

    CPU               Memory             Networking      Disk
    Slow Processing   Slow Processing    Packet Loss     Log Stalls
    High CPU Wait     Disk Swapping      Slow Network    Disk Queue

    Resulting in:
      • Slow Application Performance
      • Reduced User Experience
      • Data Loss and Corruption
CPU
ESX Scheduler

[Diagram: the ESX scheduler sharing CPU between the Service Console and Virtual Machines]

   Basic World States:  Ready / Run / Wait
   CPU States:          Ready / Usage / Wait
   Limits / Shares / Reservations
CPU
esxtop
High %RDY + high %USED can imply over-commitment
     • PCPU(%): CPU utilization
     • %USED: Utilization
     • %RDY: Ready Time
     • %RUN: Run Time
     • %WAIT: Wait and idling time
CPU
VI-Client
   Used Time > Ready Time:
   Possible CPU over-commitment

[Chart: VI Client CPU performance graph showing Used Time and Ready Time]
CPU
Further Investigation
   %MLMTD shows this VM has been limited

CPU
Further Investigation
   High ready time caused by CPU resource limit
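
To separate ready time caused by a CPU limit from ready time caused by genuine contention, the two counters can be compared. The sketch below assumes that %RDY already includes the time reported in %MLMTD, so the difference approximates contention; verify that assumption against the esxtop documentation for your ESX version. The function name is hypothetical.

```python
from typing import Dict


def ready_time_breakdown(rdy_pct: float, mlmtd_pct: float) -> Dict[str, float]:
    """Split %RDY into a limit-induced part and a contention part.

    Assumes %RDY includes %MLMTD (an assumption, see the note above).
    """
    contention = max(rdy_pct - mlmtd_pct, 0.0)
    return {"limit_induced": mlmtd_pct, "contention": contention}


# A VM with 20% ready time, 15% of which is due to its CPU limit.
print(ready_time_breakdown(20.0, 15.0))
# {'limit_induced': 15.0, 'contention': 5.0}
```

When most of the ready time is limit-induced, raising or removing the limit is the fix; adding hosts or moving VMs will not help.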
VMware Memory Management
• Transparent Page Sharing
• VMware Tools Balloon Driver to force the VM to swap to disk
• Virtual Machine Page File
Memory
Ballooning vs. Swapping
 The balloon driver causes the
 guest OS to swap pages that it
 chooses to disk

 ESX swapping can swap any
 pages to disk, with no knowledge
 of which ones matter to the guest.
Memory

• Ballooning can be disabled (value of 0) or
  controlled on a per-Virtual Machine basis
  using:
  sched.mem.maxmemctl
• The default is 65% and can be controlled at host
  level.
• It is only an issue in resource contention
  scenarios (or for VMs with low latency requirements, e.g. Citrix)
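
As a rough illustration of what the 65% default means for one VM, the sketch below computes the balloon ceiling and formats a per-VM override line. It assumes the percentage applies to the VM's configured memory and that sched.mem.maxmemctl takes a value in MB; both are assumptions to confirm against VMware's documentation, and the helper names are made up.

```python
def default_balloon_limit_mb(configured_mb: int, fraction: float = 0.65) -> int:
    """Largest balloon size under the default limit (assumed to be a share of configured memory)."""
    return int(configured_mb * fraction)


def maxmemctl_line(limit_mb: int) -> str:
    """Format a per-VM override; 0 disables ballooning (value assumed to be in MB)."""
    return 'sched.mem.maxmemctl = "{}"'.format(limit_mb)


print(default_balloon_limit_mb(2048))  # 1331
print(maxmemctl_line(0))               # sched.mem.maxmemctl = "0"
```

Disabling ballooning leaves host-level swapping as the only reclamation route under contention, so it is usually reserved for latency-sensitive VMs.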
Memory - Host

VI Client shows memory usage of the host. This is calculated as “consumed + overhead
memory + Service Console”.

Performance charts are a very good way of showing the Virtual Machine memory
breakdown.

    • Consumed Memory
    • Ballooned Memory
    • Shared Memory
    • Swapped Memory
Memory - Guest

  Host Memory = Consumed + Overhead Memory
   Guest Memory = Active Memory for Guest OS
Memory – Guest Overhead
Memory          Virtual Machine Memory Metrics – VI Client

Metric                 Description
Memory Active (KB)     Physical pages touched recently by a VM
Memory Usage (%)       Active memory / configured memory
Memory Consumed (KB)   Machine memory mapped to a virtual machine, including its portion of
                       shared pages. Doesn’t include overhead memory
Memory Granted (KB)    Physical pages allocated to a virtual machine. May be less than
                       configured memory. Includes shared pages. Doesn’t include overhead
                       memory.
Memory Shared (KB)     Physical pages shared with other virtual machines
Memory Balloon (KB)    Physical memory ballooned from a virtual machine
Memory Swapped (KB)    Physical memory in swap file (approx. “swap out – swap in”). Swap out
                       and Swap in are cumulative
Overhead Memory (KB)   Machine pages used for virtualisation
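
Two of the metrics in this table are derived from others, which a small sketch makes concrete. The function names and the sample figures are illustrative only.

```python
def memory_usage_pct(active_kb: float, configured_kb: float) -> float:
    """Memory Usage (%) = active memory / configured memory."""
    return 100.0 * active_kb / configured_kb


def approx_swapped_kb(swap_out_kb: float, swap_in_kb: float) -> float:
    """Memory Swapped is roughly cumulative swap out minus cumulative swap in."""
    return max(swap_out_kb - swap_in_kb, 0)


# A 2GB VM that recently touched 512MB of its memory.
print(memory_usage_pct(524288, 2097152))  # 25.0
# 300000 KB swapped out over time, 120000 KB swapped back in.
print(approx_swapped_kb(300000, 120000))  # 180000
```

Because swap in and swap out are cumulative, a non-zero swapped figure can persist long after the contention that caused it has ended.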
Memory                              Host Memory Metrics – VI Client

Metric                  Description
Memory Active (KB)      Physical pages touched recently by the host
Memory Usage (%)        Active memory / configured memory
Memory Consumed (KB)    Total host physical memory – free memory on host. Includes Overhead
                        and Service Console memory
Memory Granted (KB)     Sum of physical pages allocated to all virtual machines. Doesn’t include
                        overhead memory.
Memory Shared (KB)      Physical pages shared by virtual machines on host
Shared Common (KB)      Total machine pages used by shared pages
Memory Balloon (KB)     Machine pages ballooned from virtual machines
Memory Swap Used (KB)   Physical memory in swap file (approx. “swap out – swap in”). Swap out
                        and Swap in are cumulative
Overhead Memory (KB)    Machine pages used for virtualisation
Memory
esxtop
     • PMEM: Total physical memory breakdown
     • VMKMEM: Memory managed by vmkernel
     • COSMEM: Service Console memory breakdown
     • PSHARE: Page sharing statistics
     • SWAP: Swap statistics
     • MEMCTL: Balloon driver data
Memory      esxtop / VI Client metrics : Virtual Machines




VI Client         esxtop
Active Memory     TCHD
Memory Usage      %ACTV
Consumed Memory   N/A
Memory Granted    N/A (SZTGT and CMTTGT represent memory scheduler targets)
Memory Shared     SHRD (+SHRDSVD per VM). Must enable COW stats in ESXTOP
Memory Balloon    MCTLSZ
Memory Swapped    SWCUR (SWR/s & SWW/s are rates)
Overhead Memory   OVHD & OVHDMAX
Memory          esxtop / VI Client metrics : Host Usage



VI Client              esxtop
Memory Active          N/A (try /proc/vmware/sched/mem-verbose)
Memory Usage           N/A (try /proc/vmware/sched/mem-verbose)
Memory Consumed        PMEM total – PMEM free
Memory Granted         N/A (SZTGT and CMTTGT represent memory scheduler targets)
Memory Shared          PSHARE (shared)
Memory Shared Common   PSHARE (common)
Memory Balloon         MEMCTL
Memory Swap Used       SWAP (r/s and w/s are rates)
Overhead Memory        OVHD & OVHDMAX
Memory
         VI Client memory usage graph
Memory
         Troubleshooting Memory usage issues
Networking
                       •Switch Assisted Teaming (IP Hash)
                       •VLAN Trunking
                       •Flow Control (full)
                       •Speed & Duplex (1000Mb / Full)
                       •Port Fast
                       •BPDU Disabled
                       •STP Disabled
                       •Link State Tracking
                       •Jumbo Frames
Network configuration is more likely to blame than resource contention
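Networking
esxtop
[Screenshot: esxtop network view, highlighting transmit and receive in Mb/s and transmit and receive in packets]

Networking
esxtop
[Screenshot: esxtop network view, highlighting dropped packets transmitted and dropped packets received]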
Disk
 Varying Factors
    • File system performance
    • Disk subsystem configuration (SAN, NAS, iSCSI, local disk)
    • Disk caching
    • Disk formats (thick, sparse, thin)

ESX Storage Stack
   •Different latencies for different disks
   •Queuing within the kernel

                                         K: Kernel
                                         D: Device
                                         G: Guest
Disk                                     VI Client statistics

Quite Coarse Statistics
   • Disk read / write rate (KB/s)
   • Disk usage: sum of read BW and write BW (KB/s)
   • Disk read / write requests (per 20s interval)
   • Bus resets / Command aborts (per 20s interval)
   •Per LUN or aggregated stats
Disk                                    esxtop statistics
Aggregated stats similar to VI Client
    • Disk read / write per sec (READS/s, WRITES/s)
    • MB read / write per sec (MBREAD/s, MBWRTN/s)
Latency Statistics
    • Kernel Average / command (KAVG/cmd)
    • Device Average / command (DAVG/cmd)
    • Guest Average / command (GAVG/cmd)
Queuing Information
    • Adapter Queue Length (AQLEN)
    • LUN Queue Length (LQLEN)
    • VMKernel (QUED)
    • Active Queue (ACTV)
    • %Used (%USD = ACTV/LQLEN)
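
The relationships between these counters can be sketched as below. The %USD formula comes from the slide; treating guest latency as the sum of kernel and device latency is an assumption commonly made when reading these counters, and the function names are illustrative.

```python
def lun_queue_used_pct(actv: int, lqlen: int) -> float:
    """%USD = ACTV / LQLEN, expressed as a percentage."""
    return 100.0 * actv / lqlen


def guest_latency_ms(kavg_ms: float, davg_ms: float) -> float:
    """Approximate GAVG/cmd as KAVG/cmd + DAVG/cmd (assumption, see above)."""
    return kavg_ms + davg_ms


print(lun_queue_used_pct(actv=16, lqlen=32))          # 50.0
print(guest_latency_ms(kavg_ms=2.0, davg_ms=8.0))     # 10.0
```

A high KAVG points at queuing inside the ESX kernel, while a high DAVG points at the array or the fabric, so the split matters more than the total.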
Disk
SAN Rough Estimates
 Purely looking at a single ESX host, roughly:
 Throughput (in MBps) = (Outstanding IOs * Block size in KB) / latency in msec

 FC, rough maximums:
 Effective Link Bandwidth = ~80/90% of Real Bandwidth
      Effective (2Gbps) = 200 – 230 MBps
      Effective (4Gbps) = 410 – 460 MBps
      Effective (8Gbps) = 820 – 920 MBps

 iSCSI / NFS / FCoE, rough maximums:
 Effective Link Bandwidth = ~70/80% of Real Bandwidth
      Effective (1GigE) = 90 – 100 MBps
      Effective (10GigE) = 900 – 1000 MBps
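
The rough estimates above can be turned into a small calculator. This sketch follows the slide's formula, which treats 1 KB per millisecond as about 1 MBps; the raw bandwidth figure for 1GigE (125 MBps) and the function names are assumptions added for illustration.

```python
def throughput_mbps(outstanding_ios: int, block_kb: float, latency_ms: float) -> float:
    """Throughput (MBps) ~= (outstanding IOs * block size in KB) / latency in msec."""
    return outstanding_ios * block_kb / latency_ms


def effective_bandwidth_mbps(raw_mbps: float, efficiency: float) -> float:
    """Usable bandwidth after protocol overhead, e.g. efficiency 0.7 to 0.8 for 1GigE."""
    return raw_mbps * efficiency


# 32 outstanding IOs of 32KB each at 10 ms latency.
print(throughput_mbps(32, 32, 10))  # 102.4

# 1GigE at 70-80% efficiency, assuming 125 MBps raw.
low = effective_bandwidth_mbps(125, 0.70)
high = effective_bandwidth_mbps(125, 0.80)
print(round(low, 1), round(high, 1))
```

Comparing the two results shows which side is the bottleneck: if the latency-derived throughput exceeds the effective link bandwidth, the link is the limit rather than the array.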
Disk
Desired Latency Calculations
Desired Latency in msec <= (Outstanding IOs * Block size in KB) / Throughput per host

Example:
Number of Hosts: 16
Effective Link Bandwidth: 90 MBps
Throughput per host: 90 / 16 = 5.6 MBps
Desired Latency: (32 * 32) / (5.6) = 182.86 msec


Workload                         Cached Sequential Read        Cached Sequential Write
Desired Latency (msec)                    182.86                        182.86
Observed Latency (msec)                    ~350                          ~180
Throughput Drop?                            Yes                           No
Throughput (MBps)                           ~45                          ~90
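
The worked example can be reproduced with the sketch below (function names are illustrative). Note that the slide rounds the per-host throughput to 5.6 MBps before dividing, which gives 182.86 msec; using the unrounded 5.625 MBps gives a slightly lower target.

```python
def desired_latency_ms(outstanding_ios: int, block_kb: float, per_host_mbps: float) -> float:
    """Upper bound on latency that still sustains the per-host throughput."""
    return outstanding_ios * block_kb / per_host_mbps


def meets_target(observed_ms: float, desired_ms: float) -> bool:
    """True when observed latency is within the desired latency."""
    return observed_ms <= desired_ms


per_host = 90 / 16                      # effective link bandwidth shared by 16 hosts
rounded = round(per_host, 1)            # 5.6, as on the slide
target = desired_latency_ms(32, 32, rounded)

print(round(target, 2))                                    # 182.86
print(round(desired_latency_ms(32, 32, per_host), 2))      # 182.04
print(meets_target(350, target))                           # False: throughput drops
print(meets_target(180, target))                           # True: throughput holds
```

The two boolean results line up with the "Throughput Drop?" row: the read workload at about 350 msec misses the target, the write workload at about 180 msec meets it.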
Disk
VI Client
[Charts: with the SAN cache enabled, throughput is high; with the SAN cache disabled, throughput is poor]
Disk
esxtop
[Screenshots: latency is quite high; after enabling the cache, latency is reduced]
Virtual Machine Optimisation
 Deploy all machines from an optimised template!
• VMware Tools MUST be installed
• The disks MUST be block aligned to the storage (even when using NFS and SAN)
• Where possible, always separate data disks from OS disks
• Windows performance settings should be optimised for application performance
• Guest operating system timeouts should be set as defined by the SAN vendor
• Pagefile should be separated where appropriate (this can impact VMware SRM however)
• Unused Windows services should be disabled (wireless config, print spooler, audio, etc.)
• Last access time updates should be disabled (unless required)
• Logging of the VM should be disabled (only enabled for troubleshooting)
• Remove any unused virtual hardware (floppy drives, USB, etc.)
• Disable screen savers and power saving features, including logon screen saver
• Enable Remote Desktop, avoid using the VI Client for remote administration
• Install standard applications into template (bginfo, AntiVirus, any host agents, etc.)
• Multiple vCPUs should be allocated sparingly
Virtual Machine Optimisation
Block alignment is vital to good disk performance!
esxtop
Command Options when inside esxtop

  Command  Action
  space    Update the display
  ?        Show the help page
  q        Quit
  f/F      Add or remove columns from the display
  o/O      Change the order the display is sorted
  s        Change the update interval
  #        Change the number of instances to display
  W        Write configuration to file
  e        Expand / Rollup CPU stats
  V        View only VM instances
  L        Change the length of the NAME field
  m        Display memory statistics
  n        Display network statistics
  i        Display interrupt statistics
  d        Display disk adapter statistics
  u        Display disk device statistics
  v        Display disk VM statistics
esxtop
Command Line Options from the console

  Command  Action
  -b       batch mode
  -l       locks the objects available in the first snapshot
  -s       enables secure mode
  -a       show all statistics
  -c       sets the configuration file
  -R       enables replay mode (used with “vm-support -S”)
  -d       sets the update interval
  -n       runs esxtop for n iterations
esxtop
Expand the default window size for your session to get all statistics
vm-support
Creates a packaged zip file containing the following sections:
   • boot
         • contains the grub configuration
   • etc
         • contains the Console OS configuration files (cron, tcpwrappers, syslog, etc)
   • proc
         • contains much of the hardware configuration modules and variables
   • tmp
         • contains a lot of the ESX specific configuration output
   • var
         • contains log files and any core dumps
   • vmfs
         • contains the structure of the VMFS datastores
   • esx3-installation (where appropriate)
         • contains a copy of the previous esx3 configuration variables
vm-support
Using vm-support to extract performance information:

vm-support -S -d <duration> -i <interval>
<duration> and <interval> are in seconds

The output from this can then be replayed in esxtop for review after it has been
extracted.

esxtop -R <path_to_vm-support_output>
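
When the capture is scripted, building the two command lines programmatically avoids typing errors such as a long dash pasted in place of a hyphen. The sketch below only assembles the arguments shown on this slide and prints them; it does not run anything. The helper names and the example path are hypothetical.

```python
import shlex
from typing import List


def vm_support_snapshot_cmd(duration_s: int, interval_s: int) -> List[str]:
    """Arguments for a performance snapshot: vm-support -S -d <duration> -i <interval>."""
    return ["vm-support", "-S", "-d", str(duration_s), "-i", str(interval_s)]


def esxtop_replay_cmd(extracted_path: str) -> List[str]:
    """Arguments for replaying an extracted snapshot: esxtop -R <path>."""
    return ["esxtop", "-R", extracted_path]


def as_shell(argv: List[str]) -> str:
    """Render an argument list as a shell-safe command string."""
    return " ".join(shlex.quote(arg) for arg in argv)


print(as_shell(vm_support_snapshot_cmd(300, 10)))
# vm-support -S -d 300 -i 10
print(as_shell(esxtop_replay_cmd("/tmp/perf snapshot")))
# esxtop -R '/tmp/perf snapshot'
```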
Service Console Performance
• Multiple Service Console networks – for network resiliency
• Increased Service Console memory – up to 800MB
• Use host agents supplied by your vendors
• Apply the tweaks recommended by your storage vendor, such as HBA queue depth
  and I/O timeouts
• Minimal use of the VI Client console – use RDP or SSH instead
• Properly sized vCenter server – 64-bit OS where possible
Resource Groups
   Dynamically reallocate resource shares

   Add a VM: shares allow you to over-commit
   resources and have a graceful re-allocation

   Remove a VM and exploit extra resources
   across all remaining VMs
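
The reallocation behaviour can be illustrated with a simplified proportional-share model. This is a sketch, not the ESX scheduler: it assumes every VM is demanding its full entitlement and ignores reservations and limits. The VM names, share values and capacity are made up.

```python
from typing import Dict


def allocate_by_shares(capacity: float, shares: Dict[str, int]) -> Dict[str, float]:
    """Divide capacity between VMs in proportion to their shares."""
    total = sum(shares.values())
    return {vm: capacity * s / total for vm, s in shares.items()}


pool_mhz = 8000
vms = {"web": 1000, "app": 1000, "db": 2000}
print(allocate_by_shares(pool_mhz, vms))
# {'web': 2000.0, 'app': 2000.0, 'db': 4000.0}

# Adding a VM shrinks every allocation gracefully...
vms["batch"] = 1000
print(allocate_by_shares(pool_mhz, vms))
# {'web': 1600.0, 'app': 1600.0, 'db': 3200.0, 'batch': 1600.0}

# ...and removing one hands the freed capacity back to the rest.
del vms["batch"]
print(allocate_by_shares(pool_mhz, vms))
```

The ratios between VMs stay constant throughout, which is the point of using shares rather than fixed allocations.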
Design Guidelines
• Full Resilience / Multiple paths
• Standard configuration across all aspects (ESX, Storage, Networking, etc.)
• Standard naming conventions
• Learn from others' mistakes
• Follow vendors' best-practice guidelines
• Rule out the basics before requesting support
Capacity Planner & P2V Cautions and Limitations

• Peak CPU usage can sometimes be misleading
• Back-end storage system performance
• P2V machines will require block-aligning to the storage
• P2V machines will still require guest OS optimisation
Conclusion
• Performance issues can often be traced with simple root cause
analysis using basic tools (VI Client / esxtop)
• Performance tools help diagnose issues and help rule out non-issues
• Performance tools are useful in different contexts, not always
either/or
    • Real-time data and troubleshooting: esxtop
    • Historical data: VI Client
    • Coarse resource / cluster usage: VI Client
    • Detailed resource usage: esxtop
• Combine information from various tools to get a complete picture
• Always benchmark your systems first so you know what optimal
performance you can expect
Reference Articles
•   http://www.vmware.com/pdf/esx3_memory.pdf
•   http://www.vmworld.com/docs/DOC-2370
•   http://blogs.vmware.com/performance/
•   http://communities.vmware.com/docs/DOC-5420
•   http://kb.vmware.com/kb/1008205
•   http://communities.vmware.com/community/vmtn/general/performance
•   http://www.vmware.com/products/vmmark/
•   http://www.vmware.com/pdf/vsphere4/r40/vsp_40_san_cfg.pdf
•   http://www.vmware.com/pdf/vsphere4/r40/vsp_40_iscsi_san_cfg.pdf
•   http://www.vmware.com/pdf/vsphere4/r40/vsp_40_resource_mgmt.pdf
•   http://www.vmware.com/pdf/GuestOS_guide.pdf
•   http://www.vmware.com/resources/techresources/10066
•   http://www.vmware.com/resources/techresources/10059
•   http://www.vmware.com/resources/techresources/10062

Vmwareperformancetroubleshooting 100224104321-phpapp02

  • 1. VMware Performance Troubleshooting Presented by Chris Kranz
  • 2. Topics Covered • Introduction • Root Cause Analysis • Performance Characteristics • CPU • Networking • Memory • Disk • Virtual Machine optimisation • ESXTop • vm-support • Service Console • Resource Groups • Design Guidelines • Capacity Planner limitations and cautions • Conclusion • Reference Articles
  • 3. Introduction Multiple layers of virtualisation are used to increase service levels, availability and manageability However, multiple layers of virtualisation often mask performance and configuration issues making it more of a challenge to troubleshoot and correct The worst out come is that performance issues after a virtualisation project lead to the perception that VMware results in reduced performance and future confidence in VMware can be affected
  • 4. Performance Basics • Virtual Machine Resources – CPU – Memory – Disk – Networking
  • 5. Resource Maximums Host Guest Logical Processors 64 N/A Virtual CPUs N/A 8 Virtual CPU’s per Core 20 N/A Memory 1TB 256GB http://www.vmware.com/pdf/vsphere4/r40/vsp_40_config_max.pdf
  • 6. Typical Host vSphere 1U Host CPU’s 2 x Quad Core Memory 32-64GB RAM Typical 3 VMs per core, 24VM’s per Host Each has 2GB of RAM = 48GB of RAM
  • 9. Monitoring Performance • Do not rely on guest tools, but – Can show high CPU, & Memory Utilisation – Measurement of Latency & throughput of Disk & Network Interfaces • Use the virtualisation layer, to diagnose cause: – Guest is unaware of virtualisation workload – The way in which guest OS’s account time is different – No visibility of available resources
  • 10. Performance Analysis Tools • esxtop (service console only) • resxtop (remote command line utilities) • Performance graphs in vCentre
  • 11. esxtop • esxtop can be run: – Interactively – Batch (eg. esxtop -a -b > analysis.csv) – Load batch into windows perfmon or MS Excel • Two keys to remember – H : help – F : fields to display
  • 12. esxtop basics Host Resources Name of Resource Pool, Virtual Number of Worlds Machine or World
  • 13. Performance Characteristics CPU Memory Networking Disk Slow Processing Slow Processing Packet Loss Log Stalls High CPU Wait Disk Swapping Slow Network Disk Queue Slow Application Performance Reduced User Experience Data Loss and Corruption
  • 14. CPU ESX Scheduler Basic World States Read / Run / Wait CPU States Service Virtual Ready / Usage / Wait Console Machine Limits / Shares / Reservations
  • 15. CPU High %RDY + High %User can imply over commitment esxtop •PCPU(%): CPU utilization •%USED: Utilization •%RDY: Ready Time •%RUN: Run Time •%WAIT: Wait and idling time
  • 16. CPU VI-Client Used Time > Ready Time: Possible CPU over-committment Used Time Ready Time
  • 17. CPU Further Investigation %MLMTD shows this VM has been limited
  • 18. CPU Further Investigation High ready time caused by CPU resource limit
  • 19. VMware Memory Management • Transparent Page Sharing • VMware Tools Balloon Driver to force the VM to swap to disk • Virtual Machine Page File
  • 20. Memory Ballooning vs. Swapping Ballooning driver causes the host to swap pages that it chooses to disk ESX Swapping will swap any pages to disk.
  • 21. Memory • Ballooning can be disabled (0 value) or controlled on a per Virtual Machine basis using: sched.mem.maxmemctl • Default is set to 65%, can be controlled at host level. • Only is an issue in resource contention scenarios. (or VM’s with low latency eg Citrix)
  • 22. Memory - Host VI Client shows memory usage of the host. This is calculated as “consumed + overhead memory + Service Console”. Performance charts are a very good way of showing the Virtual Machine memory breakdown. • Consumed Memory • Ballooned Memory • Shared Memory • Swapped Memory
  • 23. Memory - Guest Host Memory = Consumed + Overhead Memory Guest Memory = Active Memory for Guest OS
  • 24. Memory – Guest Overhead
  • 25. Memory Virtual Machine Memory Metrics – VI Client Metric Description Memory Active (KB) Physical pages touched recently by a VM Memory Usage (%) Active memory / configured memory Memory Consumed (KB) Machine memory mapped to a virtual machine, including its portion of shared pages. Doesn’t include overhead memory Memory Granted (KB) Physical pages allocated to a virtual machine. May be less than configured memory. Includes shared pages. Doesn’t include overhead memory. Memory Shared (KB) Physical pages shared with other virtual machines Memory Balloon (KB) Physical memory ballooned from a virtual machine Memory Swapped (KB) Physical memory in swap file (approx. “swap out – swap in”). Swap out and Swap in are cumulative Overhead Memory (KB) Machine pages used for virtualisation
  • 26. Memory Host Memory Metrics – VI Client (Metric: Description)
    – Memory Active (KB): Physical pages touched recently by the host
    – Memory Usage (%): Active memory / configured memory
    – Memory Consumed (KB): Total host physical memory – free memory on host. Includes Overhead and Service Console memory
    – Memory Granted (KB): Sum of physical pages allocated to all virtual machines. Doesn’t include overhead memory
    – Memory Shared (KB): Physical pages shared by virtual machines on host
    – Shared Common (KB): Total machine pages used by shared pages
    – Memory Balloon (KB): Machine pages ballooned from virtual machines
    – Memory Swap Used (KB): Physical memory in swap file (approx. “swap out – swap in”). Swap out and Swap in are cumulative
    – Overhead Memory (KB): Machine pages used for virtualisation
  • 27. Memory esxtop
    – PMEM: Total physical memory breakdown
    – VMKMEM: Memory managed by vmkernel
    – COSMEM: Service Console memory breakdown
    – PSHARE: Page sharing statistics
    – SWAP: Swap statistics
    – MEMCTL: Balloon driver data
  • 28. Memory esxtop / VI Client metrics: Virtual Machines (VI Client: esxtop)
    – Active Memory: TCHD
    – Memory Usage: %ACTV
    – Consumed Memory: N/A
    – Memory Granted: N/A (SZTGT and CMTTGT represent memory scheduler targets)
    – Memory Shared: SHRD (+SHRDSVD per VM). Must enable COW stats in esxtop
    – Memory Balloon: MCTLSZ
    – Memory Swapped: SWCUR (SWR/s & SWW/s are rates)
    – Overhead Memory: OVHD & OVHDMAX
  • 29. Memory esxtop / VI Client metrics: Host Usage (VI Client: esxtop)
    – Memory Active: N/A (try /proc/vmware/sched/mem-verbose)
    – Memory Usage: N/A (try /proc/vmware/sched/mem-verbose)
    – Memory Consumed: PMEM total – PMEM free
    – Memory Granted: N/A (SZTGT and CMTTGT represent memory scheduler targets)
    – Memory Shared: PSHARE (shared)
    – Memory Shared Common: PSHARE (common)
    – Memory Balloon: MEMCTL
    – Memory Swap Used: SWAP (r/s and w/s are rates)
    – Overhead Memory: OVHD & OVHDMAX
  • 30. Memory VI Client memory usage graph
  • 31. Memory Troubleshooting Memory usage issues
  • 32. Networking •Switch Assisted Teaming (IP Hash) •VLAN Trunking •Flow Control (full) •Speed & Duplex (1000Mb / Full) •Port Fast •BPDU Disabled •STP Disabled •Link State Tracking •Jumbo Frames Network configuration is more likely to blame than resource contention
  • 33. Networking esxtop Transmit and Receive in Mb/s Transmit and Receive in Packets
  • 34. Networking esxtop Dropped Packets: packets dropped on transmit, packets dropped on receive
  • 35. Disk Varying Factors • File system performance • Disk subsystem configuration (SAN, NAS, iSCSI, local disk) • Disk caching • Disk formats (thick, sparse, thin) ESX Storage Stack •Different latencies for different disks •Queuing within the kernel K: Kernel D: Device G: Guest
  • 36. Disk VI Client statistics Quite Coarse Statistics • Disk read / write rate (KB/s) • Disk usage: sum of read BW and write BW (KB/s) • Disk read / write requests (per 20s interval) • Bus resets / Command aborts (per 20s interval) •Per LUN or aggregated stats
  • 37. Disk esxtop statistics
    – Aggregated stats similar to VI Client: Disk read / write per sec (READS/s, WRITES/s); MB read / write per sec (MBREAD/s, MBWRTN/s)
    – Latency Statistics: Kernel Average / command (KAVG/cmd); Device Average / command (DAVG/cmd); Guest Average / command (GAVG/cmd)
    – Queuing Information: Adapter Queue Length (AQLEN); LUN Queue Length (LQLEN); VMKernel (QUED); Active Queue (ACTV); %Used (%USD = ACTV/LQLEN)
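The esxtop disk counters relate to each other in simple ways, sketched below. The %USD formula is taken from the slide; the relationship GAVG = KAVG + DAVG is the commonly documented one and is an assumption here, since the slide only lists the three counters. Function names and sample readings are hypothetical.

```python
def percent_used(actv: int, lqlen: int) -> float:
    """%USD = ACTV / LQLEN, expressed as a percentage of the LUN queue depth."""
    return actv * 100.0 / lqlen


def guest_latency_ms(kavg_ms: float, davg_ms: float) -> float:
    """Latency seen by the guest, assumed to be kernel time plus device time."""
    return kavg_ms + davg_ms


if __name__ == "__main__":
    # Hypothetical reading: 24 active commands against a LUN queue length of 32.
    print(percent_used(actv=24, lqlen=32))
    # Hypothetical reading: little time in the kernel, most at the device.
    print(guest_latency_ms(kavg_ms=0.5, davg_ms=12.0))
```

Splitting guest latency this way helps decide where to look: a large kernel component suggests queuing on the host, a large device component suggests the array or fabric.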
  • 38. Disk SAN Rough Estimates
    – Purely looking at a single ESX host, roughly: Throughput (in MBps) = (Outstanding IOs * Block size in KB) / latency in msec
    – FC, rough maximums: Effective Link Bandwidth = ~80-90% of Real Bandwidth; Effective (2Gbps) = 200 – 230 MBps; Effective (4Gbps) = 410 – 460 MBps; Effective (8Gbps) = 820 – 920 MBps
    – iSCSI / NFS / FCoE, rough maximums: Effective Link Bandwidth = ~70-80% of Real Bandwidth; Effective (1GigE) = 90 – 100 MBps; Effective (10GigE) = 900 – 1000 MBps
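The rough throughput estimate on this slide can be sketched as a small calculation. The formula is the slide's; the function names, the sample workload (32 outstanding IOs of 32 KB at 10 msec) and the choice of 90 MBps as the link ceiling are assumptions for illustration.

```python
def throughput_mbps(outstanding_ios: int, block_kb: float, latency_ms: float) -> float:
    """Rough per-host throughput: (outstanding IOs * block size in KB) / latency in msec."""
    return outstanding_ios * block_kb / latency_ms


def link_limited(throughput: float, effective_link_mbps: float) -> float:
    """Cap the estimate at the effective bandwidth of the link."""
    return min(throughput, effective_link_mbps)


if __name__ == "__main__":
    estimate = throughput_mbps(outstanding_ios=32, block_kb=32, latency_ms=10)
    print(f"{estimate:.1f} MBps before the link cap")
    # 90 MBps is the lower end of the slide's rough 1GigE figure.
    print(f"{link_limited(estimate, 90):.1f} MBps on a 1GigE link")
```

The units work out because 1 KB per msec is roughly 1 MB per second, which is why the slide can mix KB and msec and still quote MBps.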
  • 39. Disk Desired Latency Calculations
    – Desired Latency in msec <= (Outstanding IOs * Block size in KB) / Throughput per host
    – Example: Number of Hosts: 16; Effective Link Bandwidth: 90 MBps; Throughput per host: 90 / 16 = 5.6 MBps; Desired Latency: (32 * 32) / (5.6) = 182.86 msec

      Workload                  Cached Sequential Read   Cached Sequential Write
      Desired Latency (msec)    182.86                   182.86
      Observed Latency (msec)   ~350                     ~180
      Throughput Drop?          Yes                      No
      Throughput (MBps)         ~45                      ~90
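The desired-latency example can be reproduced with a few lines of arithmetic. This is a sketch under stated assumptions: the slide's "(32 * 32)" is read as 32 outstanding IOs of 32 KB each, and the function names are hypothetical. Note that the slide rounds the per-host throughput to 5.6 MBps before dividing, which gives 182.86 msec; without that rounding the result is about 182.04 msec.

```python
def desired_latency_ms(outstanding_ios: int, block_kb: float,
                       link_mbps: float, hosts: int) -> float:
    """Latency ceiling per host: (outstanding IOs * block KB) / throughput per host."""
    per_host_mbps = link_mbps / hosts
    return outstanding_ios * block_kb / per_host_mbps


def throughput_drop_expected(observed_ms: float, desired_ms: float) -> bool:
    """A throughput drop is expected once observed latency exceeds the ceiling."""
    return observed_ms > desired_ms


if __name__ == "__main__":
    ceiling = desired_latency_ms(outstanding_ios=32, block_kb=32, link_mbps=90, hosts=16)
    print(f"desired latency <= {ceiling:.2f} msec")
    # Observed latencies from the slide's table, in msec.
    for workload, observed in (("cached sequential read", 350),
                               ("cached sequential write", 180)):
        print(workload, "drop expected:", throughput_drop_expected(observed, ceiling))
```

The two checks agree with the table: the read workload at about 350 msec is over the ceiling, the write workload at about 180 msec is just under it.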
  • 40. Disk SAN Cache enabled VI Client High throughput SAN Cache disabled Poor throughput
  • 41. Disk esxtop Latency is quite high After enabling cache, Latency is reduced
  • 42. Virtual Machine Optimisation Deploy all machines from an optimised template!
    – VMware Tools MUST be installed
    – The disks MUST be block aligned to the storage (even when using NFS and SAN)
    – Where possible, always separate data disks from OS disks
    – Windows performance settings should be optimised for application performance
    – Guest operating system timeouts should be set as defined by the SAN vendor
    – Pagefile should be separated where appropriate (this can impact VMware SRM however)
    – Unused Windows services should be disabled (wireless config, print spooler, audio, etc.)
    – Last access time updates should be disabled (unless required)
    – Logging of the VM should be disabled (only enabled for troubleshooting)
    – Remove any unused virtual hardware (floppy drives, USB, etc.)
    – Disable screen savers and power saving features, including the logon screen saver
    – Enable Remote Desktop, avoid using the VI Client for remote administration
    – Install standard applications into the template (bginfo, AntiVirus, any host agents, etc.)
    – Multiple vCPUs should be allocated sparingly
  • 43. Virtual Machine Optimisation Block alignment is vital to good disk performance!
  • 44. esxtop Command Options when inside esxtop (Command: Action)
    – space: Update the display
    – ?: Show the help page
    – q: quit
    – f/F: Add or Remove columns from the display
    – o/O: Change the order the display is sorted
    – s: change the update interval
    – #: change the number of instances to display
    – W: Write configuration to file
    – e: Expand / Rollup CPU Stats
    – V: View only VM instances
    – L: Change the length of the NAME field
    – m: Display memory statistics
    – n: Display network statistics
    – i: Display interrupt statistics
    – d: Display disk adapter statistics
    – u: Display disk device statistics
    – v: Display disk VM statistics
  • 45. esxtop Command Line Options from the console (Command: Action)
    – -b: batch mode
    – -l: locks the objects available in the first snapshot
    – -s: enables secure mode
    – -a: show all statistics
    – -c: sets the configuration file
    – -R: enables replay mode (used with “vm-support -S”)
    – -d: sets the update interval
    – -n: runs esxtop for n iterations
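Batch mode (-b, usually combined with -d and -n) writes its samples as CSV, which makes offline analysis straightforward. The sketch below averages every column whose header contains a given substring. The helper name `column_means` and the sample header names are hypothetical; real esxtop headers are much longer counter paths, so the substring to match would need to be chosen from an actual capture.

```python
import csv
import io


def column_means(csv_text: str, needle: str) -> dict:
    """Average every column whose header contains `needle`."""
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader)
    wanted = [i for i, name in enumerate(header) if needle in name]
    sums = {i: 0.0 for i in wanted}
    rows = 0
    for row in reader:
        for i in wanted:
            sums[i] += float(row[i])
        rows += 1
    if rows == 0:
        return {}
    return {header[i]: sums[i] / rows for i in wanted}


# Made-up capture with two samples; not real esxtop column names.
SAMPLE = (
    '"Time","vmhba1 GAVG/cmd","vmhba2 GAVG/cmd"\n'
    '"t1","10.0","4.0"\n'
    '"t2","14.0","6.0"\n'
)

if __name__ == "__main__":
    for name, mean in column_means(SAMPLE, "GAVG").items():
        print(name, mean)
```

Matching on a substring rather than an exact header keeps the helper usable across hosts, since adapter and device names differ from one capture to the next.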
  • 46. esxtop Expand the default window size for your session to get all statistics
  • 47. vm-support Creates a packaged zip file containing the following sections:
    – boot: contains the grub configuration
    – etc: contains the Console OS configuration files (cron, tcpwrappers, syslog, etc.)
    – proc: contains much of the hardware configuration modules and variables
    – tmp: contains a lot of the ESX specific configuration output
    – var: contains log files and any core dumps
    – vmfs: contains the structure of the VMFS datastores
    – esx3-installation (where appropriate): contains a copy of the previous esx3 configuration variables
  • 48. vm-support Using vm-support to extract performance information: vm-support -S -d <duration> -i <interval> (<duration> and <interval> are in seconds). The output from this can then be replayed in esxtop for review after it has been extracted: esxtop -R <path_to_vm-support_output>
  • 49. Service Console Performance • Multiple Service Console networks – for network resiliency • Increased Service Console memory – up to 800MB • Use host agents supplied by your vendors • Make storage recommended tweaks such as HBA Queue Depth and IO timeouts • Minimal use of the VI Client console – RDP or SSH instead • Properly sized vCenter server – 64bit OS where possible
  • 50. Resource Groups Dynamically reallocate resource shares. Add a VM: shares allow you to over-commit resources and have a graceful re-allocation. Remove a VM: the extra resources are exploited across all remaining VMs.
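A simplified sketch of how shares divide a contended resource, to illustrate the graceful re-allocation described on this slide. It assumes every VM is demanding its full entitlement and ignores reservations and limits; the function name, VM names, share values and the 8000 MHz capacity are all hypothetical.

```python
def entitlements(shares: dict, capacity: float) -> dict:
    """Split `capacity` between VMs in proportion to their shares."""
    total = sum(shares.values())
    return {vm: capacity * s / total for vm, s in shares.items()}


if __name__ == "__main__":
    capacity_mhz = 8000  # hypothetical contended CPU capacity

    pool = {"vm-a": 1000, "vm-b": 1000}
    print(entitlements(pool, capacity_mhz))

    # Adding a VM shrinks everyone's slice in proportion instead of failing.
    pool["vm-c"] = 2000
    print(entitlements(pool, capacity_mhz))

    # Removing it hands the capacity back to the remaining VMs.
    del pool["vm-c"]
    print(entitlements(pool, capacity_mhz))
```

Because entitlement depends only on the ratio of shares, no per-VM reconfiguration is needed when the population of the group changes.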
  • 51. Design Guidelines • Full Resilience / Multiple paths • Standard configuration across all aspects (ESX, Storage, Networking, etc.) • Standard naming conventions • Learn from others’ mistakes • Follow vendors’ best-practice guidelines • Rule out the basics before requesting support
  • 52. Capacity Planner & P2V Cautions and Limitations • Peak CPU usage can sometimes be misleading • Back-end storage system performance • P2V machines will require block-aligning to the storage • P2V machines will still require guest OS optimisation
  • 53. Conclusion • Performance issues can often be traced with simple root cause analysis using basic tools (VI Client / esxtop) • Performance tools help diagnose issues and help rule out non-issues • Performance tools are useful in different contexts, not always either/or • Real-time data and troubleshooting: esxtop • Historical data: VI Client • Coarse resource / cluster usage: VI Client • Detailed resource usage: esxtop • Combine information from various tools to get a complete picture • Always benchmark your systems first so you know what the optimal performance is that you can achieve
  • 54. Reference Articles • http://www.vmware.com/pdf/esx3_memory.pdf • http://www.vmworld.com/docs/DOC-2370 • http://blogs.vmware.com/performance/ • http://communities.vmware.com/docs/DOC-5420 • http://kb.vmware.com/kb/1008205 • http://communities.vmware.com/community/vmtn/general/performance • http://www.vmware.com/products/vmmark/ • http://www.vmware.com/pdf/vsphere4/r40/vsp_40_san_cfg.pdf • http://www.vmware.com/pdf/vsphere4/r40/vsp_40_iscsi_san_cfg.pdf • http://www.vmware.com/pdf/vsphere4/r40/vsp_40_resource_mgmt.pdf • http://www.vmware.com/pdf/GuestOS_guide.pdf • http://www.vmware.com/resources/techresources/10066 • http://www.vmware.com/resources/techresources/10059 • http://www.vmware.com/resources/techresources/10062