
vSphere Troubleshooting - I: Balaji CB, IT Architect, 27 Jun 2013

The document discusses troubleshooting approaches and common issues in vSphere, including virtual machine, vMotion, HA, vCenter, ESX/ESXi host, network, and storage issues. It provides detailed steps and commands to troubleshoot each issue.

Uploaded by Sreenath Gooty

Balaji CB – IT Architect

27 Jun 2013

vSphere Troubleshooting - I

© 2009 IBM Corporation


Agenda

■ Troubleshooting approach
■ Troubleshooting VM-related issues
■ Troubleshooting vMotion and HA issues
■ Troubleshooting vCenter and host-related issues
■ Troubleshooting network issues
■ Troubleshooting storage issues
■ Performance troubleshooting

Troubleshooting Approach

■ Do not panic. Approach the issue with a calm mind.
■ Collect the history of the problem: what happened before the issue occurred, and any recent changes to the environment. Even a trivial change can cause an issue.
■ Get the exact timestamp of the issue. This is important.
■ Check the logs. They almost always contain information about the issue and possibly its cause.
■ Perform a step only if you know its expected outcome.
■ If you are not sure which direction to take, call for help.
■ Documentation is critical in major outages.

vSphere Features

■ Features (services) provided by the vSphere product.

Troubleshooting VM related issues

■ Understand the different files that make up a VM: .vmx, .vswp, .nvram, .vmdk (descriptor), -flat.vmdk, -delta.vmdk, etc.
– Critical files: .vmx and .vmdk
■ Commands that operate on the .vmx file
– vmware-cmd (ESX), e.g. vmware-cmd -l
– vim-cmd (ESXi), e.g. vim-cmd vmsvc/getallvms
– vm-support
– esxcli vm process list, esxcli vm process kill, etc.
■ Unresponsive VM
– Use vmware-cmd to get the VM's state, stop it, or power it on
– Kill the VM's process (PID)
– Check resource availability
■ Locks on virtual machine files (.vmx, .vswp, .vmdk)
– vmkfstools -D vmxfile
– vm-support -x, -X
– A .vmdk in use by other VMs
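
A minimal Python sketch of reading the lock information that vmkfstools -D reports. It assumes the lock line has the layout shown in the sample. The sample is illustrative, and the exact format varies by ESX version. Following VMware's file-locking guidance, the last 12 hex digits of the owner UUID are treated as the MAC address of the host holding the lock, and an all-zero owner means the file is not locked. The function name parse_lock_line and the mode table are illustrative assumptions.

```python
import re
from typing import Optional

# Assumed mapping of lock modes, as described in VMware's file-locking KB.
LOCK_MODES = {0: "no lock", 1: "exclusive", 2: "read-only", 3: "multi-writer"}

# Captures the mode number and the owner UUID (8-8-4-12 hex groups);
# the last 12 hex digits are taken to be the lock holder's MAC address.
LOCK_RE = re.compile(
    r"mode (\d+), owner [0-9a-f]{8}-[0-9a-f]{8}-[0-9a-f]{4}-([0-9a-f]{12})",
    re.IGNORECASE,
)


def parse_lock_line(line: str) -> Optional[dict]:
    """Return the lock mode and owner MAC from a vmkfstools -D line, or None."""
    m = LOCK_RE.search(line)
    if m is None:
        return None
    mode = int(m.group(1))
    mac_hex = m.group(2).lower()
    owner_mac = None
    # An all-zero owner means no host currently holds the lock.
    if mac_hex != "0" * 12:
        owner_mac = ":".join(mac_hex[i:i + 2] for i in range(0, 12, 2))
    return {"mode": LOCK_MODES.get(mode, f"unknown ({mode})"), "owner_mac": owner_mac}


# Illustrative sample line (hypothetical values).
sample = ("Lock [type 10c00001 offset 54009856 v 11, hb offset 3198976 "
          "gen 9, mode 1, owner 4655cd8b-3c4a19f2-17bc-00145e808070 mtime 114]")
print(parse_lock_line(sample))  # {'mode': 'exclusive', 'owner_mac': '00:14:5e:80:80:70'}
```

Compare the reported MAC with the NICs of each host to identify the one holding the lock.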

Troubleshooting VM related issues

■ Corrupted .vmx file
■ Missing descriptor files
■ Snapshot issues
– CID mismatch: you can verify whether the CID mismatch has been corrected by running this command against the highest-level snapshot .vmdk:
• # vmkfstools -q snapshot_xxxxxx#.vmdk -v10 | less
Note: If the results contain failure messages, the CID mismatch has not been corrected or there is still a mismatch in the snapshot hierarchy.
– Monitor snapshot deletion:
• # watch -d 'ls -luth | grep -E "delta|flat"'
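
A CID mismatch means that a snapshot descriptor's parentCID no longer equals the CID recorded in its parent's descriptor. A base disk carries parentCID=ffffffff. The Python sketch below walks a chain of descriptor texts, ordered from the base disk to the highest-level snapshot, and reports every broken link. It complements vmkfstools -q and is not a replacement for it. Reading the descriptor files from the datastore is left out, and the function names are hypothetical.

```python
import re

# Matches the CID= and parentCID= lines of a .vmdk descriptor file.
CID_RE = re.compile(r"^(CID|parentCID)=([0-9a-fA-F]{8})\s*$", re.MULTILINE)
BASE_PARENT_CID = "ffffffff"  # a base (non-snapshot) disk has no parent


def read_cids(descriptor_text: str) -> tuple:
    """Return (CID, parentCID) from the text of one descriptor file."""
    fields = {key: value.lower() for key, value in CID_RE.findall(descriptor_text)}
    if "CID" not in fields or "parentCID" not in fields:
        raise ValueError("descriptor has no CID/parentCID lines")
    return fields["CID"], fields["parentCID"]


def check_cid_chain(descriptors: list) -> list:
    """descriptors: (name, text) pairs ordered from the base disk to the
    highest-level snapshot. Returns one message per broken parent link."""
    problems = []
    prev_name, prev_cid = None, None
    for name, text in descriptors:
        cid, parent_cid = read_cids(text)
        if prev_cid is None and parent_cid != BASE_PARENT_CID:
            problems.append(f"{name}: base disk has parentCID={parent_cid}")
        elif prev_cid is not None and parent_cid != prev_cid:
            problems.append(
                f"{name}: parentCID={parent_cid} but parent {prev_name} has CID={prev_cid}")
        prev_name, prev_cid = name, cid
    return problems
```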

Troubleshooting vMotion

■ vMotion must be enabled on all participating ESX hosts.
■ Verify VMkernel network connectivity between the hosts using vmkping.
■ Verify that name resolution is valid on the ESX hosts.
■ Ensure that the virtual machine does not have reservations that exceed the available resources of the target ESX host. Also check for VM settings, such as CPU affinity, that can prevent vMotion.
■ vCenter Server only allows vMotion migration between CPUs that have similar features.
■ The CPUs must come from the same manufacturer (AMD or Intel), belong to the same family (Pentium, Xeon, etc.), and share common extended features across the ESX servers involved in the vMotion.
■ The hosts must have the same features enabled in the server BIOS. The two most common features to check are No-Execute memory protection (NX/XD) and hardware virtualization (VT/AMD-V).
■ Check for IP conflicts. Ensure that the IP addresses of your vCenter Servers and ESX/ESXi hosts are unique.
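
The CPU rules above can be expressed as a simple comparison. The Python sketch below uses a hypothetical HostCpu record holding the vendor, the family, a set of feature labels, and the two BIOS settings. It lists what would block a migration from src to dst. This is a simplified model:

- Feature names are illustrative labels.
- Only features the source exposes but the destination lacks are flagged.
- Enhanced vMotion Compatibility (EVC) and CPUID masking are not modeled.

```python
from dataclasses import dataclass, field


@dataclass
class HostCpu:
    # Hypothetical summary of a host's CPU and BIOS settings.
    name: str
    vendor: str                      # "Intel" or "AMD"
    family: str                      # e.g. "Xeon"
    features: set = field(default_factory=set)
    bios_nx: bool = True             # No-Execute (NX/XD) enabled in BIOS
    bios_vt: bool = True             # VT/AMD-V enabled in BIOS


def vmotion_cpu_blockers(src: HostCpu, dst: HostCpu) -> list:
    """List the CPU and BIOS differences that could block vMotion src -> dst."""
    blockers = []
    if src.vendor != dst.vendor:
        blockers.append(f"vendor differs: {src.vendor} vs {dst.vendor}")
    if src.family != dst.family:
        blockers.append(f"family differs: {src.family} vs {dst.family}")
    # A running VM may already use features seen on the source host.
    missing = sorted(src.features - dst.features)
    if missing:
        blockers.append(f"{dst.name} lacks features: {', '.join(missing)}")
    if src.bios_nx != dst.bios_nx:
        blockers.append("NX/XD setting differs in BIOS")
    if src.bios_vt != dst.bios_vt:
        blockers.append("VT/AMD-V setting differs in BIOS")
    return blockers
```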

Troubleshooting vMotion

■ Check for resource starvation on the ESX hosts.
■ Verify that time is synchronized across the environment.
■ If the virtual machine being migrated is a 64-bit virtual machine, verify that the VT option is enabled on both ESX hosts.
■ Select the ESX host -> Configuration -> Advanced Settings (under Software) -> Migrate.
– Change Migrate.Enabled to 0 and click OK. Then change Migrate.Enabled back to 1.
■ If jumbo frames are enabled (MTU of 9000), ensure that vmkping can communicate at that size using this command:
– vmkping -d -s packet_size destination_IP_address
(-d sets the don't-fragment bit; -s is the ICMP payload size, which must leave 28 bytes for the IP and ICMP headers, e.g. 8972 for an MTU of 9000)
■ Disable and re-enable the vMotion setting on the VMkernel port group.
■ Check for packet loss on the vMotion network: have the source host ping (vmkping) the destination host's vMotion vmknic IP address for the duration of the vMotion.
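
The -s value passed to vmkping is the ICMP payload, not the MTU. An IPv4 header (20 bytes) and an ICMP echo header (8 bytes) must fit in the same unfragmented packet. The Python helper below shows the arithmetic. The function names and the example IP address are hypothetical.

```python
IP_HEADER = 20    # bytes, IPv4 header without options
ICMP_HEADER = 8   # bytes, ICMP echo header


def vmkping_payload(mtu: int) -> int:
    """Largest ICMP payload that fits in one unfragmented IPv4 packet at this MTU."""
    if mtu <= IP_HEADER + ICMP_HEADER:
        raise ValueError("MTU too small")
    return mtu - IP_HEADER - ICMP_HEADER


def vmkping_command(mtu: int, dest_ip: str) -> str:
    # -d sets don't-fragment, so a hop with a smaller MTU fails instead of fragmenting.
    return f"vmkping -d -s {vmkping_payload(mtu)} {dest_ip}"


print(vmkping_command(9000, "192.168.10.12"))  # vmkping -d -s 8972 192.168.10.12
print(vmkping_command(1500, "192.168.10.12"))  # vmkping -d -s 1472 192.168.10.12
```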

Troubleshooting HA issues
■ Correct configuration
■ Check that name resolution is configured correctly on the ESX hosts and vCenter Server.
■ Check time synchronization.
■ Check network connectivity between the hosts themselves and between the hosts and vCenter Server.
■ Examine the contents of these three files:
– /etc/hosts, /etc/sysconfig/network, /etc/vmware/esx.conf
■ HA advanced configuration
– das.isolationaddress, das.allowNetwork[x], and more
■ Slot size
FDM (Fault Domain Manager): the vSphere 5.x HA agent
■ Verify that all the FDM agent configuration files were pushed successfully from vCenter Server to the ESXi host:
– Location: /etc/opt/vmware/fdm
– File names: clusterconfig (cluster configuration), compatlist (host compatibility list for virtual machines), hostlist (host membership list), and fdm.cfg
■ Check the logs:
– /var/log/fdm.log or /var/run/log/fdm* (a single log file for all FDM operations)
– /var/log/fdm-installer.log (FDM agent installation log)
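
HA admission control can be based on the number of host failures the cluster tolerates. In that mode, a slot is sized from two values taken across all powered-on VMs: the largest CPU reservation, and the largest memory reservation plus memory overhead. Each host offers as many slots as its scarcer resource allows. The Python sketch below models that calculation. Three inputs are assumptions:

- The default CPU slot, taken here as 32 MHz, the value commonly cited for vSphere 5.x.
- The per-host capacity figures.
- The per-VM overhead values.

Real clusters also subtract host and HA overhead, which this model ignores.

```python
from dataclasses import dataclass


@dataclass
class Vm:
    cpu_reservation_mhz: int
    mem_reservation_mb: int
    mem_overhead_mb: int   # virtualization overhead, assumed known


@dataclass
class Host:
    name: str
    cpu_mhz: int   # CPU capacity available to VMs (assumed input)
    mem_mb: int    # memory capacity available to VMs (assumed input)


def slot_size(powered_on_vms, default_cpu_mhz=32):
    """Slot = (largest CPU reservation, largest memory reservation + overhead).
    default_cpu_mhz is an assumed minimum CPU slot; it differs between releases."""
    cpu = max([default_cpu_mhz] + [vm.cpu_reservation_mhz for vm in powered_on_vms])
    mem = max(vm.mem_reservation_mb + vm.mem_overhead_mb for vm in powered_on_vms)
    return cpu, mem


def slots_per_host(host, slot):
    cpu_slot, mem_slot = slot
    # A host offers as many slots as its scarcer resource allows.
    return min(host.cpu_mhz // cpu_slot, host.mem_mb // mem_slot)


def cluster_slots(hosts, slot, host_failures=1):
    """Return (total slots, slots left after losing the largest hosts)."""
    per_host = sorted((slots_per_host(h, slot) for h in hosts), reverse=True)
    # Admission control plans for losing the hosts that contribute the most slots.
    return sum(per_host), sum(per_host[host_failures:])
```

A single VM with a large reservation inflates the slot for the whole cluster, which is why slot size is worth checking when HA admission control refuses to power on VMs.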

Troubleshooting vCenter issues

■ Connection to the database
■ Blocked ports
■ Health of the database
■ The correct user account is used to start the vCenter Server service
■ vCenter Server logging
– vpxd log
• 5 MB in size; 10 files are maintained and then rolled over
• Note the timestamp and collect the logs immediately, before the relevant entries are rolled over
– vpxd profiler log
• Performance-related information
• Refer to this log file when vCenter operations perform poorly
– Event logs
■ Backtraces in the vpxd logs
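
Once the exact timestamp of the issue is known, only the log entries around it matter. The Python sketch below filters log lines to a window around the incident. It assumes each entry starts with an ISO-8601 timestamp carrying a UTC offset. That layout is an assumption and should be adapted to your vpxd version. Lines without a timestamp, such as backtrace frames, inherit the previous entry's timestamp so that a backtrace stays with the entry that produced it. The function name and the sample lines are hypothetical.

```python
from datetime import datetime, timedelta


def lines_near(lines, incident, minutes=10):
    """Yield log lines within +/- `minutes` of `incident` (a timezone-aware datetime).

    Assumes entries start with an ISO-8601 timestamp with a UTC offset; lines
    without one inherit the timestamp of the previous timestamped line."""
    window = timedelta(minutes=minutes)
    current = None
    for line in lines:
        token = line.split(" ", 1)[0]
        try:
            parsed = datetime.fromisoformat(token)
        except ValueError:
            parsed = None  # continuation line, e.g. a backtrace frame
        if parsed is not None and parsed.tzinfo is not None:
            current = parsed
        if current is not None and abs(current - incident) <= window:
            yield line
```

For example, iterating over an open vpxd log file with lines_near(f, incident) yields only the entries, and their backtraces, from the ten minutes either side of the incident.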

Troubleshooting ESX/ESXi host issues

Host not responding
■ Check for network issues.
■ Check for storage issues: verify that /vmfs/volumes is accessible.
■ Best practice: configure a secondary management network
– an isolated management network, such as a separate VLAN
■ The host agent is the "vmware-hostd" service. It collects, communicates, and executes the actions received from the management interfaces and vCenter Server, and it relies on the "vmware-authd" service to authenticate task requests.
– To check whether it is running: ps | grep hostd
■ The next critical service is "vmware-vpxa", the vCenter agent.
– To check whether it is running: ps | grep vpxa
■ Restart the services (ESXi: /etc/init.d/hostd restart and /etc/init.d/vpxa restart; ESX: service mgmt-vmware restart and service vmware-vpxa restart).
■ Verify that the required ports are open using the netstat command.
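
From a management workstation, you can complement netstat on the host by testing whether the host's management ports accept TCP connections. The Python sketch below uses only the standard socket module. The default ports (443 and 902) are an assumption based on common ESX/ESXi management defaults, so confirm them against the port list for your version. The function names are hypothetical.

```python
import socket


def tcp_port_open(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Refused, unreachable, or timed out.
        return False


def check_host_agent_ports(host: str, ports=(443, 902)) -> dict:
    """Map each (assumed) management port to whether it accepts connections."""
    return {port: tcp_port_open(host, port) for port in ports}
```

A closed port here, while hostd or vpxa shows as running in ps, points towards a firewall or network problem rather than a failed agent.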

Troubleshooting ESX/ESXi host issues

■ ESX/ESXi logging
– /var/log/messages - OS log
– /var/log/vmware/hostd.log - host agent log
– /var/log/vmware/vpx/vpxa.log - vCenter agent log
– /var/log/vmkernel - VMkernel log
– /var/log/vmkwarning - VMkernel warnings
■ A vSphere Client connected directly to the host can also display the system logs.
■ System logs are also accessible from the DCUI.

Troubleshooting Network issues

■ vSwitch (vSS) / Distributed vSwitch (vDS) configuration
– ESX: vswif0
– ESXi: vmk0, created by default on vSwitch0
■ Proper load-balancing algorithms
– IP hash is required if EtherChannel is enabled on the physical switch
■ Check the MTU size
■ VLAN tagging
■ ethtool
■ esxtop
■ The dvSwitch configuration is stored locally in /etc/vmware/dvsdata.db and synchronized with vCenter every 5 minutes.
– This allows ESX to operate even when vCenter is down.
– To read the dvSwitch data on each host, use the net-dvs command: net-dvs -f <filename> dumps the data to a file, which you can then read.
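
With IP hash, the uplink for a packet is derived from the source and destination IP addresses. One VM's MAC address can therefore appear on several physical ports at once, which is why the physical switch ports must be bundled (EtherChannel). The Python sketch below is a simplified model. It assumes the hash is the XOR of the two 32-bit IPv4 addresses modulo the number of active uplinks, and ESXi's exact computation may differ. It shows that a given source/destination pair always maps to the same uplink, so one VM spreads across uplinks only when it talks to several destinations. The function names are hypothetical.

```python
import ipaddress
from collections import Counter


def ip_hash_uplink(src_ip: str, dst_ip: str, uplinks: int) -> int:
    """Simplified model: XOR of the two IPv4 addresses modulo the uplink count."""
    if uplinks < 1:
        raise ValueError("at least one active uplink is required")
    src = int(ipaddress.IPv4Address(src_ip))
    dst = int(ipaddress.IPv4Address(dst_ip))
    return (src ^ dst) % uplinks


def spread(src_ip: str, destinations, uplinks: int) -> Counter:
    """Count how many destination flows from one source land on each uplink."""
    return Counter(ip_hash_uplink(src_ip, d, uplinks) for d in destinations)
```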

Troubleshooting Storage issues

■ Run esxtop and check the DAVG/cmd value (average device latency per command, in ms).
■ Check for SCSI reservation conflicts: "Sync CR" errors in the VMkernel logs.
■ Check the sense codes; refer to VMware KBs 289902, 1030381, and 1029039.
– H:0xA D:0xB P:0xC Possible sense data: 0xD 0xE 0xF
• A = Host status (initiator)
• B = Device status (target)
• C = Plugin (VMware-specific)
• D = Sense key
• E = Additional sense code
• F = Additional sense code qualifier
■ Resolve SCSI reservation conflicts:
– Reduce the number of hosts accessing the LUN
– Check for the latest HBA firmware and driver
– Reduce the number of VMs per LUN
– Perform a rescan
– Check for scheduled backups, antivirus scans on VMs, and OS updates
■ Search the VMkernel logs and the VMware KB, e.g.:
– grep 'reservation conflict' vmkernel* | awk '{print $13}' | sort | uniq -c
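
The H/D/P and sense-data fields above can be split out mechanically before looking them up in the KBs. The Python sketch below parses such a line and names a few values from the SCSI standards. Device status 0x18 is RESERVATION CONFLICT, which ties the sense codes back to the reservation problems listed above. The lookup tables are deliberately partial, the regular expression assumes the "Valid/Possible sense data" layout shown on this slide, and the function name is hypothetical.

```python
import re

# H:host D:device P:plugin, optionally followed by sense key, ASC and ASCQ.
SENSE_RE = re.compile(
    r"H:0x([0-9a-f]+) D:0x([0-9a-f]+) P:0x([0-9a-f]+)"
    r"(?: (?:Valid|Possible) sense data: 0x([0-9a-f]+) 0x([0-9a-f]+) 0x([0-9a-f]+))?",
    re.IGNORECASE,
)

# Partial tables of standard SCSI status codes and sense keys.
DEVICE_STATUS = {0x00: "GOOD", 0x02: "CHECK CONDITION", 0x08: "BUSY",
                 0x18: "RESERVATION CONFLICT", 0x28: "TASK SET FULL"}
SENSE_KEY = {0x0: "NO SENSE", 0x1: "RECOVERED ERROR", 0x2: "NOT READY",
             0x3: "MEDIUM ERROR", 0x4: "HARDWARE ERROR", 0x5: "ILLEGAL REQUEST",
             0x6: "UNIT ATTENTION", 0x7: "DATA PROTECT", 0xB: "ABORTED COMMAND"}


def decode_sense(line: str):
    """Split a vmkernel SCSI failure line into its fields, or return None."""
    m = SENSE_RE.search(line)
    if m is None:
        return None
    host, device, plugin = (int(g, 16) for g in m.group(1, 2, 3))
    result = {"host": host, "device": device, "plugin": plugin,
              "device_status": DEVICE_STATUS.get(device, "other")}
    if m.group(4) is not None:
        key, asc, ascq = (int(g, 16) for g in m.group(4, 5, 6))
        result.update(sense_key=SENSE_KEY.get(key, "other"), asc=asc, ascq=ascq)
    return result
```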

Performance issues
■ Confirm whether it is really a performance issue or a perception issue.
■ Ask the user to explain the difference in performance.
■ Always start at the guest OS level, move to the individual VM, and then to the host.
■ Causes
– CPU constraints
– Memory overcommitment
– Storage latency
– Network latency
■ esxtop: the best way to check performance
– CPU
• Load average
A load average of 1.00 means that the ESX/ESXi server's physical CPUs are fully utilized, and a load average of 0.5 means that they are half utilized.
• %RDY field
The percentage of time that the virtual machine was ready to run but could not be scheduled on a physical CPU. Under normal operating conditions, this value should remain under 5%.
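
esxtop shows %RDY directly. However, the VM-level value is summed over the VM's vCPUs, so a 4-vCPU VM can show up to four times the per-vCPU figure. When only vCenter performance charts are available, CPU ready appears as a summation in milliseconds per sample. The usual conversion divides that value by the sample length. The Python helper below (hypothetical names) assumes the 20-second real-time chart interval by default. Historical charts use longer intervals, which must be passed explicitly.

```python
def ready_percent(ready_ms: float, interval_s: float = 20.0, vcpus: int = 1) -> float:
    """Convert a CPU ready summation (ms of ready time in one sample) to %RDY per vCPU.

    interval_s defaults to 20 s, the sample length of vCenter real-time charts."""
    if interval_s <= 0 or vcpus < 1:
        raise ValueError("interval_s must be positive and vcpus at least 1")
    return ready_ms * 100.0 / (interval_s * 1000.0 * vcpus)


def ready_ok(ready_ms, interval_s=20.0, vcpus=1, threshold_pct=5.0):
    """True if the per-vCPU ready time stays under the threshold (5% per the slide)."""
    return ready_percent(ready_ms, interval_s, vcpus) < threshold_pct
```

For example, 1000 ms of ready time in a 20-second sample is 5% for a single vCPU, but only 2.5% per vCPU on a 2-vCPU VM.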

VMware Resolution Paths

■ Learn the resolution paths.
– There are only so many troubleshooting steps available, and 80% of issues can be resolved using these steps.
– These steps are covered in the resolution paths.
– http://blogs.vmware.com/kb/resolution-paths
