vSphere Troubleshooting - I: Balaji CB, IT Architect, 27 Jun 2013
vSphere Troubleshooting - I
Agenda
Troubleshooting Approach
Troubleshooting VM related issues
Troubleshooting vMotion and HA issues
Troubleshooting vCenter & Host related issues
Troubleshooting Network issues
Troubleshooting Storage Issues
Performance Troubleshooting
Troubleshooting Approach
vSphere Features
Understand the different files that constitute a VM: .vmx, .vswp, .nvram, .vmdk, -flat.vmdk,
-delta.vmdk, etc.
Critical files: .vmx, .vmdk
Commands that run on the vmx file
vmware-cmd (ESX), e.g. vmware-cmd -l
vim-cmd (ESXi), e.g. vim-cmd vmsvc/getallvms
vm-support
esxcli vm process: list, kill, etc.
Unresponsive VM
vmware-cmd to get status, kill, power on
Kill the PID
Resource availability
Locks on virtual machine files: .vmx, .vswp, .vmdk
vmkfstools -D <vmx file>
vm-support -x, -X
Vmdk in use by other VMs
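The world ID needed to kill an unresponsive VM can be read from the output of esxcli vm process list. The following is a minimal sketch of parsing that output, assuming the usual layout of one unindented VM name followed by indented "Key: Value" lines. The sample text, VM names, and the helper names parse_vm_process_list and find_world_id are illustrative assumptions, not captured from a real host.

```python
def parse_vm_process_list(text):
    """Parse text shaped like `esxcli vm process list` output into a list of dicts."""
    vms = []
    current = None
    for line in text.splitlines():
        if not line.strip():
            continue  # blank lines separate VM records
        if not line.startswith((" ", "\t")):
            # An unindented line starts a new VM record.
            current = {"Name": line.strip()}
            vms.append(current)
        elif current is not None and ":" in line:
            # Indented lines carry "Key: Value" pairs for the current VM.
            key, _, value = line.strip().partition(":")
            current[key.strip()] = value.strip()
    return vms


def find_world_id(vms, name):
    """Return the World ID string for the named VM, or None if it is not running."""
    for vm in vms:
        if vm.get("Name") == name:
            return vm.get("World ID")
    return None


# Hypothetical sample output; names, IDs, and paths are made up for illustration.
SAMPLE = """\
web01
   World ID: 1001
   Process ID: 0
   Display Name: web01
   Config File: /vmfs/volumes/datastore1/web01/web01.vmx

db01
   World ID: 1002
   Process ID: 0
   Display Name: db01
   Config File: /vmfs/volumes/datastore1/db01/db01.vmx
"""

if __name__ == "__main__":
    print(find_world_id(parse_vm_process_list(SAMPLE), "db01"))
```

The returned world ID is the value that esxcli vm process kill expects for its world-id option; try the soft kill type before resorting to hard or force.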
Troubleshooting vMotion
Troubleshooting HA issues
Correct configuration
Check that name resolution is configured correctly on the ESX hosts and the vCenter Server
Check time synchronization
Check network connectivity between the hosts themselves and the vCenter Server.
Examine the contents of these three files:
/etc/hosts, /etc/sysconfig/network, /etc/vmware/esx.conf
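As a rough illustration of the /etc/hosts check above, the sketch below parses hosts-file text and reports names that are missing or map to an unexpected address. The host names, IP addresses, and function names are hypothetical; on a real host the text would be read from /etc/hosts.

```python
def parse_hosts(text):
    """Map every hostname and alias in hosts-file text to its IP address."""
    table = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and whitespace
        if not line:
            continue
        fields = line.split()
        ip, names = fields[0], fields[1:]
        for name in names:
            # The first entry for a name wins, as with a normal hosts lookup.
            table.setdefault(name.lower(), ip)
    return table


def check_resolution(hosts_text, expected):
    """Return {name: (expected_ip, found_ip)} for names that are missing or wrong."""
    table = parse_hosts(hosts_text)
    problems = {}
    for name, ip in expected.items():
        found = table.get(name.lower())
        if found != ip:
            problems[name] = (ip, found)
    return problems


# Hypothetical hosts file content for a two-host cluster.
SAMPLE_HOSTS = """\
127.0.0.1   localhost
10.0.0.11   esx01.lab.local esx01   # first host
10.0.0.12   esx02.lab.local esx02
"""

if __name__ == "__main__":
    expected = {"esx01.lab.local": "10.0.0.11", "vc01.lab.local": "10.0.0.5"}
    print(check_resolution(SAMPLE_HOSTS, expected))
```

An empty result means every expected name resolves to the expected address in that file; it says nothing about DNS, which still needs to be checked separately.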
HA advanced configuration
das.isolationaddress, das.allowNetwork[x], and more
Slot size
FDM - Fault Domain Manager
Verify that all the configuration files of the FDM agent were pushed successfully from the vCenter
Server to your ESXi host:
Location: /etc/opt/vmware/fdm
File Names: clusterconfig (cluster configuration), compatlist (host compatibility list for virtual
machines), hostlist (host membership list), and fdm.cfg.
Check logs:
/var/log/fdm.log or /var/run/log/fdm* (one log file for FDM operations)
/var/log/fdm-installer.log (FDM agent installation log)
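The check that the FDM configuration files were pushed to the host can be sketched as a simple existence test. This is a minimal example that uses the directory and file names listed above; the function name missing_fdm_files is an assumption made for illustration.

```python
import os

# Directory and file names as listed on the slide.
FDM_DIR = "/etc/opt/vmware/fdm"
FDM_FILES = ("clusterconfig", "compatlist", "hostlist", "fdm.cfg")


def missing_fdm_files(directory=FDM_DIR, names=FDM_FILES):
    """Return the expected FDM configuration files that are absent from directory."""
    return [n for n in names if not os.path.isfile(os.path.join(directory, n))]


if __name__ == "__main__":
    missing = missing_fdm_files()
    if missing:
        print("Missing FDM files:", ", ".join(missing))
    else:
        print("All FDM configuration files are present.")
```

If any file is reported missing, the FDM installer log is the natural next place to look.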
ESX/ESXi logging
/var/log/messages - OS log
/var/log/vmware/hostd.log - host agent log
/var/log/vmware/vpx/vpxa.log - vCenter agent log
/var/log/vmkernel - vmkernel log
/var/log/vmkwarning - vmkernel warnings
vSphere client connected directly to host - also can display system logs
System logs - accessible from DCUI also
Performance issues
Confirm whether it is really a performance issue or a perception issue.
Ask the user to explain the difference in performance.
Always start from the guest OS level, move to the individual VM, and then to the host.
Causes
CPU constraints
Memory overcommitment
Storage latency
Network latency
esxtop: the best way to check performance
CPU
Load average
A load average of 1.00 means that the ESX/ESXi Server machine's physical CPUs are
fully utilized, and a load average of 0.5 means that they are half utilized.
%READY field
The percentage of time that the virtual machine was ready but could not be scheduled
to run on a physical CPU. Under normal operating conditions, this value should remain
under 5%.
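The two CPU indicators above can be turned into a quick check once the values have been read off esxtop. The sketch below is illustrative only: the function names and the sample readings are assumptions, and the 5% threshold is the guideline stated above rather than a hard limit.

```python
READY_THRESHOLD_PCT = 5.0  # guideline from the slide: %READY should stay under 5%


def cpu_utilization_from_load(load_average):
    """Convert an esxtop CPU load average to percent of physical CPU capacity.

    1.00 means fully utilized and 0.5 means half utilized, so the
    conversion is a simple multiplication by 100.
    """
    return load_average * 100.0


def vms_over_ready_threshold(ready_by_vm, threshold=READY_THRESHOLD_PCT):
    """Return the sorted names of VMs whose %READY is at or above threshold."""
    return sorted(name for name, pct in ready_by_vm.items() if pct >= threshold)


if __name__ == "__main__":
    # Hypothetical readings copied by hand from an esxtop CPU screen.
    readings = {"web01": 1.2, "db01": 8.4, "app01": 5.0}
    print(cpu_utilization_from_load(0.5))
    print(vms_over_ready_threshold(readings))
```

A load average above 1.00 indicates that the host has more runnable work than physical CPU capacity, which is usually when high %READY values start to appear.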