ESXTOP

ESXTOP is a utility for ESX/ESXi hosts that displays real-time resource usage. It has three modes: interactive mode shows live statistics; batch mode saves output to a file; replay mode replays previously collected statistics. ESXTOP allows switching between views of CPU, memory, disk, network, and other usage. Key commands provide filtering and customization options to optimize troubleshooting performance issues.

Uploaded by

Umesh Chavaan

ESXTOP

esxtop is a utility for examining real-time resource usage on an ESX host; resxtop is its counterpart for both ESX and ESXi hosts. esxtop can only be run against the local ESX machine, whereas resxtop can be used remotely to view the resource utilization of ESX/ESXi hosts from other ESX/ESXi servers or from the vMA.
esxtop/resxtop has three modes:
Interactive mode (the default): all statistics are displayed in real time.
Batch mode: statistics are collected and saved to a file (CSV) that can be viewed and analyzed later using Windows perfmon and other tools.
Replay mode: similar to a record-and-replay operation. Data collected by the vm-support command is interpreted and played back as esxtop statistics. You can view the captured performance information for a particular time period as if it were live, to see what was happening during that time. This suits VMware support staff, who can replay the statistics to understand what was happening on the server during that period.
In interactive mode the display is similar to Windows Task Manager. By default the screen refreshes every 5 seconds.
[Screenshot: esxtop showing the memory statistics]

The following single-key commands switch between statistics views while esxtop runs in interactive mode:
c  CPU view (the default screen when you start esxtop): CPU resource utilization of the ESX server
m  Memory view: memory resource utilization of the ESX server
d  Disk adapter view: storage adapter resource utilization of the ESX server
u  Disk device view: storage device resource utilization of the ESX server
v  Virtual disk view: virtual disk resource utilization of the ESX server
n  Network view: network utilization of the ESX server
y  Power management view: power utilization of the ESX server
h  Help screen listing the esxtop commands
q  Quit esxtop interactive mode

f  Add or remove fields in the current view. For example, in the memory view, press f, then press G to add the MEM SIZE field, and press Enter to return to the view with the added field.

o  Change the order of the fields in the current view. Use the letters a-o to pick a field: uppercase moves the field left, lowercase moves it right.

s  Set the delay between screen refreshes. The default is 5 seconds. Press the space bar to refresh immediately.

W  Save the customized fields. Add or remove fields as you wish; to have them loaded every time, save under the default name (/root/.esxtop4rc), or save under a name of your choice.

To load esxtop with your customized fields:

1. Run the esxtop command.
2. Add or remove the fields you want; for example, I pressed m to show the memory details.
3. Press W to save the configuration.
4. When prompted for a config file, type a location other than the default one, for example /home/mohammedk/mystats
5. Quit the current esxtop screen.
6. To load esxtop with your saved custom configuration, type esxtop -c <your configuration file name>, in our case esxtop -c /home/mohammedk/mystats

ESXTOP - Batch Mode

In batch mode, statistics are collected and saved to a file (CSV) that can be viewed and analyzed later using Windows perfmon and other tools.

To run esxtop in batch mode and save the output for future analysis, use the following syntax:

esxtop -b -d 10 -n 5 >/home/mohammedk/esxstats.csv

The -b switch enables batch mode.

The -d switch sets the number of seconds between refreshes.

The -n switch sets the number of iterations esxtop runs.
In the example above, esxtop runs for about 50 seconds: 10 seconds delay * 5 iterations.
The redirection (>/home/mohammedk/esxstats.csv) stores the esxtop statistics as a CSV file at /home/mohammedk/esxstats.csv.

Once the command completes, browse to /home/mohammedk to find the output file esxstats.csv. Transfer the CSV file to your Windows desktop using WinSCP and analyze it with Windows perfmon or esxplot.
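As a rough illustration of what analysis outside perfmon can look like, the sketch below reads an esxtop-style CSV with Python's csv module and averages every column whose header ends with a given counter name. The sample rows, the host name esx01, the VM names and the exact header layout are assumptions made for this example; check the header row of your own capture before relying on it.

```python
import csv
import io

# Hypothetical sample: first column is a timestamp, every other column is one
# counter. A real esxtop capture has many more columns than this.
SAMPLE = (
    '"(PDH-CSV 4.0) (UTC)(0)",'
    '"\\\\esx01\\Group Cpu(100:vm-a)\\% Ready",'
    '"\\\\esx01\\Group Cpu(101:vm-b)\\% Ready"\n'
    '"06/28/2012 13:51:29","1.50","6.00"\n'
    '"06/28/2012 13:51:39","2.50","8.00"\n'
)

def average_counters(csv_text, counter_suffix):
    """Return {column header: mean value} for headers ending with counter_suffix."""
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader)
    wanted = [i for i, name in enumerate(header) if name.endswith(counter_suffix)]
    totals = {i: 0.0 for i in wanted}
    rows = 0
    for row in reader:
        rows += 1
        for i in wanted:
            totals[i] += float(row[i])
    if rows == 0:
        return {}
    return {header[i]: totals[i] / rows for i in wanted}

averages = average_counters(SAMPLE, "% Ready")
for name, value in averages.items():
    print(f"{name}: {value:.2f}")
```

For a real capture you would pass the contents of the CSV file instead of SAMPLE; very wide captures may be easier to handle after trimming the field list before recording.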
ESXTOP - Replay Mode
Replay mode is similar to a record-and-replay operation. Data collected by the vm-support command is interpreted and played back as esxtop statistics. You can view the captured performance information for a particular time period as if it were live, to see what happened during that time.
This is very useful for VMware support engineers who do not have access to your system when troubleshooting performance issues. They can run esxtop against the collected support file to analyze a performance issue that occurred during that particular time. Make sure you have enough free space on your server to save the support file; collecting statistics for a long duration consumes a large amount of disk space.
To use replay mode, first run the vm-support command. I am running it from the directory /home/mohammedk, so the output file is saved in that directory.

vm-support -s -i 5 -d 10

-i is the number of iterations and -d is the delay between refreshes. The command above collects statistics for 50 seconds (10 seconds * 5 iterations). Once vm-support completes, all the files are stored in /home/mohammedk.
Extract the file esx-2012-06-2813.51.29993.tgz into the same directory using the command below:
tar -xzf esx-2012-06-2813.51.29993.tgz

Then run esxtop in replay mode against the extracted directory vm-support-vmware-arena-2012-06-2813.51.29993:
esxtop -R vm-support-vmware-arena-2012-06-2813.51.29993

The output looks similar to the live esxtop display, but here we are replaying the contents of the support file.

ESXTOP is a fantastic tool for the VMware administrator troubleshooting performance issues in a vSphere environment. ESXTOP has a somewhat steep learning curve, but it is well worth it. In this post I want to help you get a head start with ESXTOP. If you want a really good read, I recommend Duncan's very comprehensive post on the same subject.
ESXTOP is available in two ways: through the ESXi Shell, or through the vSphere Management Assistant with the command resxtop. In this article I will focus on ESXTOP from the ESXi Shell. It is very simple to get access to ESXTOP.
Step 1: Get access to the ESXi Shell. Open your vSphere Client, go to the host, then Configuration, then Security Profile, and start the ESXi Shell service on the specific ESXi host.
Step 2: Download PuTTY (or another SSH client) and create an SSH connection on port 22 to your ESXi host. Log in as root with your password.
Step 3: Type the command esxtop and hit Return.
Step 4: You are now looking at ESXTOP.

[Screenshot: esxtop CPU screen]

What you are looking at is the CPU screen in ESXTOP, which shows CPU-specific counters. You can browse through different pages: type m to see memory metrics, n for network, and so on. Type h to see all available commands. By default ESXTOP shows a lot of worlds; a world is similar to a process in Windows Task Manager. To filter out the VMkernel worlds, type an uppercase V. You then see only the virtual machines running on this specific ESXi host.
Now that you are inside ESXTOP, let's focus on some good counters to use for performance troubleshooting.

CPU
When troubleshooting CPU performance for your virtual machines, the following counters are the most important:

%USED, %RDY, %CSTP

%USED tells you how much time the virtual machine spent executing CPU cycles on the physical CPU.
%RDY is a key performance indicator! Always start with this one. It shows how much time your virtual machine wanted to execute CPU cycles but could not get access to the physical CPU; in other words, how much time it spent waiting in a queue. I normally expect this value to be below 5% (this equals 1000 ms in the vCenter Performance Graphs; read about it in the post that follows).

CPU Ready 1000 ms. equals 5%
JANUARY 31, 2011 BY FRANKBRIX

One of the key performance counters in a vSphere environment is:

CPU ready (%RDY in ESXTOP)

CPU ready is the time a virtual CPU is ready to run but is not being scheduled on a physical CPU. Under normal circumstances this indicates that there are not enough physical CPU resources on an ESX/ESXi host. This is the first go-to counter when your users complain about bad performance.
The CPU ready counter is accessible from the vSphere Client and from ESXTOP. I have made two screenshots showing a virtual machine and its ready time:

[Screenshot: vCenter Performance Graphs (value 1035 milliseconds)]

[Screenshot: ESXTOP (value 5.38%)]

What we see is a virtual machine with a ready time of 1035 ms or 5.38%. These numbers are telling us essentially the same thing. In the performance graphs, the graph updates every 20 seconds (20,000 milliseconds). A ready time of 1035 ms can therefore be converted to a percentage:

(1035 ms x 100) / 20000 ms = 5.175%

To interpret ready times it is essential to know the relationship between the percentage in ESXTOP and the milliseconds in the Performance Graphs. You are seeing the same quantity; one is in milliseconds, the other is a percentage.

1% = 200 ms
5% = 1,000 ms
10% = 2,000 ms
100% = 20,000 ms
In general you want to see virtual machines with a ready time lower than 1000 ms or 5%.
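The conversion above is easy to get wrong by hand, so here is a minimal Python sketch of it in both directions. The function names are made up for this example, and the default interval of 20 seconds assumes the 20-second graph update described above; a chart with a different sample interval would need a different value.

```python
def ready_ms_to_percent(ready_ms, interval_s=20):
    """Convert CPU ready time in milliseconds to a percentage of the sample interval."""
    return ready_ms * 100.0 / (interval_s * 1000.0)

def ready_percent_to_ms(percent, interval_s=20):
    """Convert a ready percentage back to milliseconds for the same interval."""
    return percent * interval_s * 1000.0 / 100.0

# The worked example from the text: 1035 ms in a 20,000 ms interval.
print(ready_ms_to_percent(1035))
# The 5% guideline expressed in milliseconds.
print(ready_percent_to_ms(5))
```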
Read more about ESXTOP here.

Just heard of a cool calculator to convert CPU ready times to a percentage: http://www.vmcalc.com/

%CSTP tells you how much time a virtual machine with multiple vCPUs spends waiting for its other vCPUs to catch up. If this number is higher than 3% you should consider lowering the number of vCPUs in your virtual machine.
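The two rules of thumb above (ready time below 5%, co-stop not higher than 3%) can be written down as a small check. This is only a sketch: the function name and the message texts are invented for the example, and the limits are the guideline values from this text, not hard rules.

```python
# Guideline limits quoted in the text; treat them as rules of thumb.
RDY_LIMIT_PCT = 5.0    # ready time should stay below this
CSTP_LIMIT_PCT = 3.0   # co-stop above this suggests too many vCPUs

def cpu_findings(rdy_pct, cstp_pct):
    """Return a list of human-readable findings for one VM's CPU counters."""
    findings = []
    if rdy_pct >= RDY_LIMIT_PCT:
        findings.append(f"%RDY {rdy_pct:.2f} is not below {RDY_LIMIT_PCT}: VM is queuing for physical CPU")
    if cstp_pct > CSTP_LIMIT_PCT:
        findings.append(f"%CSTP {cstp_pct:.2f} is above {CSTP_LIMIT_PCT}: consider fewer vCPUs")
    return findings

# The VM from the screenshots: 5.38% ready, with an assumed low co-stop value.
for line in cpu_findings(5.38, 0.4):
    print(line)
```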
Memory
When troubleshooting memory performance, these are the counters you want to focus on from a virtual machine perspective:

MCTL?, MCTLSZ, SWCUR, SWR/s, SWW/s

MCTLSZ: this column shows how inflated the balloon is in the virtual machine. If it says 500 MB, the balloon driver inside the guest operating system has taken 500 MB from Windows/Linux etc. You would expect to see a value of 0 (zero) in this column.

SWCUR tells you how much of the virtual machine's memory is in the .vswp file. If you see 500 MB here, 500 MB is in the swap file. This does not necessarily mean bad performance; to figure out whether your virtual machine is suffering from hypervisor swapping, look at the next two counters. In a healthy environment you would want this value to be 0 (zero).

SWR/s: this value tells you the read activity to your swap file. If you see a number here, your virtual machine is suffering from hypervisor swapping.

SWW/s: this value tells you the write activity to your swap file. You want to see the number 0 (zero) here. Every number above 0 is BAD.
ESXTOP
Intro
Thresholds
Howto Run
Howto Capture
Howto Analyze
Howto Limit esxtop to specific VMs
References
Changelog
This page is solely dedicated to one of the best tools in the world for ESX: esxtop.
Intro
I am a huge fan of esxtop! I read a couple of pages of the esxtop bible every day before I go to bed. Something I always struggle with, however, is the thresholds of specific metrics. I fully understand that it is not black and white; performance is the perception of a user in the end.
There must be a certain threshold, however. For instance, it must be safe to say that when %RDY constantly exceeds the value of 20, it is very likely that the VM responds sluggishly. I want to use this article to define these thresholds, but I need your help. There are many people reading these articles; together we must know at least a dozen metrics. Let's collect and document them, with possible causes if known.
Please keep in mind that these should only be used as a guideline when doing performance troubleshooting! Also be aware that some metrics are not part of the default view. You can add fields to an esxtop view by pressing f followed by the corresponding character.

I used VMworld presentations, VMware whitepapers, VMware documentation, VMTN topics and of course my own experience as sources, and these are the metrics and thresholds I came up with so far. Please comment and help build the main source for esxtop thresholds.
Metrics and Thresholds
Display | Metric | Threshold | Explanation
CPU | %RDY | | Overprovisioning of vCPUs, excessive usage of vSMP or a limit (check %MLMTD) has been set. See Jason's explanation for vSMP VMs.
CPU | %CSTP | | Excessive usage of vSMP. Decrease the number of vCPUs for this particular VM. This should lead to increased scheduling opportunities.
CPU | %SYS | | The percentage of time spent by system services on behalf of the world. Most likely caused by a high-IO VM. Check other metrics and the VM for possible root cause.
CPU | %MLMTD | | The percentage of time the vCPU was ready to run but deliberately wasn't scheduled because that would violate the CPU limit settings. If larger than 0 the world is being throttled due to the limit on CPU.
CPU | %SWPWT | | VM waiting on swapped pages to be read from disk. Possible cause: memory overcommitment.
MEM | MCTLSZ | | If larger than 0 the host is forcing VMs to inflate the balloon driver to reclaim memory as the host is overcommitted.
MEM | SWCUR | | If larger than 0 the host has swapped memory pages in the past. Possible cause: overcommitment.
MEM | SWR/s | | If larger than 0 the host is actively reading from swap (vswp). Possible cause: excessive memory overcommitment.
MEM | SWW/s | | If larger than 0 the host is actively writing to swap (vswp). Possible cause: excessive memory overcommitment.
MEM | CACHEUSD | | If larger than 0 the host has compressed memory. Possible cause: memory overcommitment.
MEM | ZIP/s | | If larger than 0 the host is actively compressing memory. Possible cause: memory overcommitment.
MEM | UNZIP/s | | If larger than 0 the host is accessing compressed memory. Possible cause: previously the host was overcommitted on memory.
MEM | N%L | | If less than 80 the VM experiences poor NUMA locality. If a VM has a memory size greater than the amount of memory local to each processor, the ESX scheduler does not attempt to use NUMA optimizations for that VM and uses memory remotely via the interconnect. Check GST_ND(X) to find out which NUMA nodes are used.
NETWORK | %DRPTX | | Dropped packets transmitted, hardware overworked. Possible cause: very high network utilization.
NETWORK | %DRPRX | | Dropped packets received, hardware overworked. Possible cause: very high network utilization.
DISK | GAVG | | Look at DAVG and KAVG as the sum of both is GAVG.
DISK | DAVG | | Disk latency most likely to be caused by the array.
DISK | KAVG | | Disk latency caused by the VMkernel; high KAVG usually means queuing. Check QUED.
DISK | QUED | | Queue maxed out. Possibly queue depth set too low. Check with the array vendor for the optimal queue depth value.
DISK | ABRTS/s | | Aborts issued by the guest (VM) because storage is not responding. For Windows VMs this happens after 60 seconds by default. Can be caused, for instance, when paths failed or the array is not accepting any IO for whatever reason.
DISK | RESETS/s | | The number of commands reset per second.
DISK | CONS/s | | SCSI reservation conflicts per second. If many SCSI reservation conflicts occur, performance could be degraded due to the lock on the VMFS.

Running esxtop

Although understanding all the metrics esxtop provides seems impossible, using esxtop is fairly simple. When you get the hang of it you will notice yourself staring at the metrics/thresholds more often than ever. The following keys are the ones I use the most.
Open a console session or SSH to ESX(i) and type:
esxtop
By default the screen is refreshed every 5 seconds; change this by typing s followed by the new delay in seconds, for example:
s2
Changing views is easy; type the following keys for the associated views:
c = cpu
m = memory
n = network
i = interrupts
d = disk adapter
u = disk device (includes NFS as of 4.0 Update 2)
v = disk VM
p = power states
V = only show virtual machine worlds
e = Expand/Rollup CPU statistics, show details of all worlds associated with group (GID)
k = kill world, for tech support purposes only!
l = limit display to a single group (GID), enables you to focus on one VM
# = limit the number of entities, for instance the top 5
2 = highlight a row, moving down
8 = highlight a row, moving up
4 = remove selected row from view
e = statistics broken down per world
6 = statistics broken down per world
Add/Remove fields:
f
<type appropriate character>
Changing the order:
o
<move field by typing appropriate character: uppercase = left, lowercase = right>

Saving all the settings you've changed:
W
Keep in mind that when you don't change the file name, it will be saved and used as the default settings.
Help:
?
In very large environments esxtop can cause high CPU utilization due to the amount of data that needs to be gathered and the calculations that need to be done. If CPU appears to be highly utilized due to the number of entities (VMs / LUNs etc.), a command line option can be used which locks specific entities and keeps esxtop from gathering specific info, limiting the amount of CPU power needed:
esxtop -l
More info about this command line option can be found here.
Capturing esxtop results
First things first. Make sure you only capture relevant info. Ditch the metrics you don't need. In other words, run esxtop and remove or add (f) fields until only the ones you need remain. When you are finished, make sure to write (W) the configuration to disk. You can either write it to the default config file (esxtop4rc) or write the configuration to a new file.
Now that you have configured esxtop as needed, run it in batch mode and save the results to a .csv file:
esxtop -b -d 2 -n 100 > esxtopcapture.csv
Here -b stands for batch mode, -d 2 is a delay of 2 seconds and -n 100 means 100 iterations. In this specific case esxtop will log metrics for 200 seconds. If you want to record all metrics, make sure to add -a to your command.
Or what about directly zipping the output as well? These .csv files can grow fast, and zipping them saves a lot of precious disk space!
esxtop -b -a -d 2 -n 100 | gzip -9c > esxtopoutput.csv.gz
Please note that when a new VM is powered on, a VM is vMotioned to the host or a new world is created, it will not show up within esxtop when -b is used, as the entities are locked! This behavior is similar to starting esxtop with -l.
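When planning a capture it helps to know up front how long a given combination of delay and iterations will run. The Python sketch below assembles the batch command line from the switches described above (-b, -a, -d, -n) and returns the expected duration. The helper name is hypothetical, and the sketch only builds the argument list; it does not run esxtop.

```python
def build_batch_command(delay_s, iterations, all_stats=False):
    """Return (argv, duration_s) for an esxtop batch capture."""
    argv = ["esxtop", "-b"]
    if all_stats:
        argv.append("-a")          # record all statistics, not only configured fields
    argv += ["-d", str(delay_s), "-n", str(iterations)]
    return argv, delay_s * iterations

# The capture from the text: 2-second delay, 100 iterations, all metrics.
argv, duration = build_batch_command(2, 100, all_stats=True)
print(" ".join(argv), "->", duration, "seconds")
```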

Analyzing results
You can use multiple tools to analyze the captured data:
1. VisualEsxtop
2. perfmon
3. excel
4. esxplot

What is VisualEsxtop? It is a relatively new tool (published 1st of July 2013).

VisualEsxtop is an enhanced version of resxtop and esxtop. VisualEsxtop can connect to VMware vCenter Server or ESX hosts, and display ESX server stats with a better user interface and more advanced features.
That sounds nice, right? Let's have a look at how it works. This is what I did to get it up and running:

- Go to http://labs.vmware.com/flings/visualesxtop and click download
- Unzip VisualEsxtop.zip into a folder where you want to store the tool
- Go to the folder
- Double click visualesxtop.bat when running Windows (or follow William's tip for the Mac)
- Click File and Connect to Live Server
- Enter the Hostname, Username and Password and hit Connect
- That is it

Now some simple tips:

- By default the refresh interval is set to 5 seconds. You can change this by hitting Configuration and then Change Interval.
- You can also load Batch Output. This might come in handy when you are a consultant, for instance, and a customer sends you captured data. You can do this under File -> Load Batch Output.
- You can filter output, which is very useful if you are looking for info on a specific virtual machine / world! See the filter section.
- When you click Charts and double click Object Types you will see a list of metrics that you can create a chart with. Just unfold the ones you need and double click them to add them to the right pane.

There are a bunch of other cool features in there, like color-coding of important metrics. Being able to show multiple windows at the same time is also useful, as are the tooltips that provide a description of the counter. If you ask me, it is a tool everyone should download and check out.
Let's continue with my second favorite tool, perfmon. I've used perfmon (part of Windows, also known as Performance Monitor) multiple times, and it's probably the easiest as many people are already familiar with it. You can import a CSV as follows:
1. Run: perfmon
2. Right click on the graph and select Properties.
3. Select the Source tab.
4. Select the Log files: radio button from the Data source section.
5. Click the Add button.
6. Select the CSV file created by esxtop and click OK.
7. Click the Apply button.
8. Optionally: reduce the range of time over which the data will be displayed by using the sliders under the Time Range button.
9. Select the Data tab.
10. Remove all Counters.
11. Click Add and select appropriate counters.
12. Click OK.
13. Click OK.
The result of the above would be:

[Screenshot: perfmon graph of the imported esxtop counters]

With MS Excel it is also possible to import the data as a CSV. Keep in mind, though, that the amount of captured data is very large, so you might want to limit it by first importing it into perfmon, selecting the correct timeframe and counters, and exporting this to a CSV. When you have done so you can import the CSV as follows:

- Run: excel
- Click on Data
- Click Import External Data and click Import Data
- Select Text files as Files of Type
- Select file and click Open
- Make sure Delimited is selected and click Next
- Deselect Tab and select Comma
- Click Next and Finish

All data should be imported and can be shaped / modelled / diagrammed as needed.
Another option is to use a tool called esxplot. It hasn't been updated in a while, and I am not sure what the state of the tool is. You can download the latest version here, but personally I would recommend using VisualEsxtop instead of esxplot, just because it is more recent.
1. Run: esxplot
2. Click File -> Import -> Dataset
3. Select file and click Open
4. Double click host name and click on metric

[Screenshot: esxplot graph with legend]

As you can clearly see in the screenshot above, the legend (right of the graph) is too long. You can modify that as follows:
1. Click on File -> preferences
2. Select Abbreviated legends
3. Enter appropriate value

For those using a Mac, esxplot uses specific libraries which are only available on the 32-bit version of Python. In order for esxplot to function correctly, set the following environment variable:
export VERSIONER_PYTHON_PREFER_32_BIT=yes
Limiting your view
In environments with a very high consolidation ratio (a high number of VMs per host) it can happen that the VM you need performance counters for isn't shown on your screen. This happens purely because the height of the screen limits what can be displayed. Unfortunately there is currently no command line option for esxtop to specify the VMs that need to be displayed. However, you can export the current list of worlds and import it again to limit the number of VMs shown.
esxtop -export-entity filename
Now you can edit your file and comment out the worlds that do not need to be displayed.
esxtop -import-entity filename
I figured that there should be a way to get the info through the command line as well, and this is what I came up with. Please note that <virtualmachinename> needs to be replaced with the name of the virtual machine that you need the GID for.

VMWID=`vm-support -x | grep <virtualmachinename> |awk '{gsub("wid=", "");print $1}'`
VMXCARTEL=`vsish -e cat /vm/$VMWID/vmxCartelID`
vsish -e cat /sched/memClients/$VMXCARTEL/SchedGroupID
Now you can use the outcome within esxtop to limit (l) your view to that single GID. William Lam wrote an article a couple of days after I added the GID section. The following is a lot simpler than what I came up with, thanks William!
VM_NAME=STA202G ;grep "${VM_NAME}" /proc/vmware/sched/drm-stats | awk '{print $1}'

Identify CPU, memory, network, disk device or disk issues using ESXTOP in interactive, batch or replay mode
Determine use cases for and apply esxtop Interactive, Batch and Replay modes
Use vscsiStats to gather storage performance data
Use esxtop/resxtop to collect performance data
Switch display: c:cpu i:interrupt m:memory n:network
d:disk adapter u:disk device v:disk VM p:power mgmt

fF Add or remove fields
oO Change the order of displayed fields
s Set the delay in seconds between updates
# Set the number of instances to display
W Write configuration file ~/.esxtop50rc
k Kill a world
e Expand/Rollup Cpu Statistics
V View only VM instances
L Change the length of the NAME field
l Limit display to a single group
usage: esxtop [-h] [-v] [-b] [-l] [-s] [-a] [-c config file] [-R vm-support-dir-path][-d delay] [-n iterations]
[-export-entity entity-file] [-import-entity entity-file]
-h prints this help menu.
-v prints version.
-b enables batch mode.
-l locks the esxtop objects to those available in the first snapshot.
-s enables secure mode.
-a show all statistics.
-c sets the esxtop configuration file, which by default is .esxtop50rc
-R enables replay mode.
-d sets the delay between updates in seconds.
-n runs esxtop for only n iterations.
Experimental Features
-export-entity writes the entity ids into a file, which can be modified to select interesting entities.
-import-entity reads the file of selected entities. If this option is used, esxtop only shows the data for the
selected entities.
2 = highlight a row, moving down
8 = highlight a row, moving up
4 = remove selected row from view
Run the command below to display all fields, not only the default ones:
~ # esxtop -a

Of course my screen is not large enough to show all of them. The useful part is that pressing h from here takes you to the help screen; my interest is not the help text itself but the sort options it lists for each screen. For the screen above, they are as follows.

CPU (%USED, %RDY, %CSTP)

Press h, as mentioned, to see the keys you can sort by:

Sort by:
U:%USED R:%RDY N:GID
When troubleshooting CPU performance for your virtual machines, the following counters are the most important.
%USED tells you how much time the virtual machine spent executing CPU cycles on the physical CPU.
%RDY is a key performance indicator! Always start with this one. It shows how much time your virtual machine wanted to execute CPU cycles but could not get access to the physical CPU; in other words, how much time it spent waiting in a queue. I normally expect this value to be below 5% (this equals 1000 ms in the vCenter Performance Graphs; read about it here).
%CSTP tells you how much time a virtual machine with multiple vCPUs spends waiting for its other vCPUs to catch up. If this number is higher than 3% you should consider lowering the number of vCPUs in your virtual machine.

Memory (MCTL?, MCTLSZ, SWCUR, SWR/s, SWW/s)

Sort by:
M:MEMSZ B:MCTLSZ N:GID
When troubleshooting memory performance, these are the counters you want to focus on from a virtual machine perspective.

MCTL?: this column is either YES or NO. Yes means that the balloon driver is installed. The balloon driver is installed automatically with VMware Tools and should be in every virtual machine. If this column says No, figure out why.
MCTLSZ: this column shows how inflated the balloon is in the virtual machine. If it says 500 MB, the balloon driver inside the guest operating system has taken 500 MB from Windows/Linux etc. You would expect to see a value of 0 (zero) in this column.
SWCUR tells you how much of the virtual machine's memory is in the .vswp file. If you see 500 MB here, 500 MB is in the swap file. This does not necessarily mean bad performance; to figure out whether your virtual machine is suffering from hypervisor swapping, look at the next two counters. In a healthy environment you would want this value to be 0 (zero).
SWR/s: this value tells you the read activity to your swap file. If you see a number here, your virtual machine is suffering from hypervisor swapping.
SWW/s: this value tells you the write activity to your swap file. You want to see the number 0 (zero) here. Every number above 0 is BAD.
Sequence of memory bottleneck

If an ESXi host is under memory pressure, it reclaims memory in this order:

Page sharing, then ballooning (MCTLSZ), then compression (CACHEUSD and ZIP/s), and finally swapping (SWR/s and SWW/s), which is the worst case.
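The sequence above can be turned into a quick classification of how far a host has progressed. The Python sketch below is an illustration only: the function name and the stage labels are invented for the example, and it simply reports the most severe stage for which the corresponding counter is above zero.

```python
def memory_pressure_stage(mctlsz_mb=0.0, cacheusd_mb=0.0, zip_per_s=0.0,
                          swr_per_s=0.0, sww_per_s=0.0):
    """Return the most severe memory reclamation stage suggested by the counters."""
    if swr_per_s > 0 or sww_per_s > 0:
        return "swapping"        # last resort, worst for performance
    if cacheusd_mb > 0 or zip_per_s > 0:
        return "compression"
    if mctlsz_mb > 0:
        return "ballooning"
    return "none"                # no ballooning, compression or active swapping seen

print(memory_pressure_stage(mctlsz_mb=500))
print(memory_pressure_stage(mctlsz_mb=500, zip_per_s=1.2, sww_per_s=0.3))
```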
Network (SPEED, FDUPLX, UP, PKTTX/s, PKTRX/s, MbTX/s, MbRX/s)

Sort by:
T:MbTX/s R:MbRX/s
t:PKTTX/s r:PKTRX/s
N:Default
SPEED (Mbps): the link speed in megabits per second. This information is only valid for a physical NIC.
FDUPLX: Y implies the corresponding link is operating at full duplex. N implies it is not. This information is only valid for a physical NIC.
UP: Y implies the corresponding link is up. N implies it is not. This information is only valid for a physical NIC.
PKTTX/s: the number of packets transmitted per second.
PKTRX/s: the number of packets received per second.
MbTX/s (Mbps): the megabits transmitted per second.
MbRX/s (Mbps): the megabits received per second.
Q: Why does MbRX/s not match PKTRX/s for different workloads?
A: Because the packet size may not be the same. The average packet size can be computed as follows:
average_packet_size = MbRX/s / PKTRX/s. A large packet size may improve the CPU efficiency of processing the packet. However, it may potentially increase latency.
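The ratio above yields megabits per packet, which is awkward to read, so the sketch below converts it to bytes per packet. It is a minimal Python example with a made-up function name, and it assumes that one megabit means 1,000,000 bits.

```python
def average_packet_size_bytes(mb_rx_per_s, pkt_rx_per_s):
    """Average received packet size in bytes, from MbRX/s and PKTRX/s."""
    if pkt_rx_per_s <= 0:
        return 0.0                           # no packets, nothing to average
    bits_per_s = mb_rx_per_s * 1_000_000     # assumption: 1 Mb = 10**6 bits
    return bits_per_s / 8 / pkt_rx_per_s

# Same throughput, very different packet rates:
print(average_packet_size_bytes(96, 8000))     # large packets
print(average_packet_size_bytes(96, 187500))   # small packets
```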

Storage (d:disk adapter u:disk device v:disk VM vscsiStats)

Disk Adapters:

KAVG/cmd: average ESXi VMkernel latency per command, in milliseconds.
DAVG/cmd: average device latency per command, in milliseconds.
GAVG/cmd: average virtual machine operating system latency per command, in milliseconds.
QAVG/cmd: average queue latency per command, in milliseconds.

Metric | Threshold | What to Check
DAVG/cmd | >20 | Storage processor/array performance for bottleneck.
KAVG/cmd | >1 | Kernel driver firmware and adapter queue length.
GAVG/cmd | >20 | DAVG/KAVG metrics, and Guest OS performance.

Note that:

GAVG/cmd = KAVG/cmd + DAVG/cmd
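To make the relationship and the three thresholds concrete, here is a small Python sketch that derives GAVG/cmd from the other two values and lists what to check. The function name and message wording are hypothetical, and the limits are the guideline values from the table above (in milliseconds), not fixed rules.

```python
# Guideline thresholds in milliseconds, taken from the table above.
DAVG_LIMIT_MS = 20.0
KAVG_LIMIT_MS = 1.0
GAVG_LIMIT_MS = 20.0

def latency_findings(davg_ms, kavg_ms):
    """Return (gavg_ms, findings) for one adapter's latency counters."""
    gavg_ms = kavg_ms + davg_ms   # GAVG/cmd = KAVG/cmd + DAVG/cmd
    findings = []
    if davg_ms > DAVG_LIMIT_MS:
        findings.append("DAVG/cmd high: check storage processor/array")
    if kavg_ms > KAVG_LIMIT_MS:
        findings.append("KAVG/cmd high: check kernel driver, firmware and adapter queue length")
    if gavg_ms > GAVG_LIMIT_MS:
        findings.append("GAVG/cmd high: check DAVG/KAVG and guest OS performance")
    return gavg_ms, findings

gavg, notes = latency_findings(davg_ms=25.0, kavg_ms=0.5)
print(gavg)
for note in notes:
    print(note)
```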

DAVG/cmd is the adapter device Driver Average Latency per Command. This is the round-trip
in milliseconds from the HBA to the storage array and the return acknowledgement. Typically, most admins like to
see around 20ms or less, though it can vary significantly depending on your workload and its sensitivity to
latency.
DAVG/cmd is a good indicator that you need to start your investigation outside of ESX at the fabric and storage
array levels.

KAVG/cmd is the adapter device VMkernel Average Latency per Command. This is the
average latency between when the HBA receives the data from the storage fabric and passes it along to the Guest
OS, or vice versa; basically, the round-trip time in the kernel itself. So it should be a very low value, meaning that
the I/O operation should spend as little time as possible in the kernel (zero or near-zero is ideal).

GAVG/cmd is the adapter device Guest OS Average Latency per Command. This is the round-trip time in
milliseconds from the Guest OS (its perspective) through the HBA to the storage array and back. This is why
this number is the sum of DAVG/cmd + KAVG/cmd. If DAVG & KAVG are within normal thresholds, but
GAVG/cmd is high, this typically indicates that the VMs on that adapter, or at least one of them, are constrained by
another resource and need more ESXi resources in order to process IOs more quickly. In my experience, however, high
GAVG/cmd will typically be accompanied by another high value in either DAVG or KAVG.
If KAVG/cmd is greater than 1ms or so, check a couple of things:
1) Your device drivers are up to date and you are using compatible firmware versions, as outdated or mismatched
versions can slow down the kernel IO path;
2) Your adapter optimization settings, which will be provided by the vendor (some of which we will discuss in the
next post).
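
As a rough illustration of the adapter latency thresholds above (DAVG/cmd > 20 ms, KAVG/cmd > 1 ms, GAVG/cmd > 20 ms), the following Python sketch flags the metrics that exceed them. The function name, the dictionaries, and the sample values are hypothetical; the thresholds are rules of thumb, not hard limits.

```python
# Thresholds in milliseconds, as listed in the disk adapter notes above.
THRESHOLDS_MS = {"DAVG": 20.0, "KAVG": 1.0, "GAVG": 20.0}

ADVICE = {
    "DAVG": "check storage processor/array performance",
    "KAVG": "check kernel driver, firmware and adapter queue length",
    "GAVG": "check DAVG/KAVG metrics and Guest OS performance",
}


def check_adapter_latency(davg, kavg, gavg):
    """Return one finding for each latency metric above its threshold."""
    samples = {"DAVG": davg, "KAVG": kavg, "GAVG": gavg}
    findings = []
    for name, value in samples.items():
        limit = THRESHOLDS_MS[name]
        if value > limit:
            findings.append(f"{name}/cmd {value:.1f} ms > {limit:.0f} ms: {ADVICE[name]}")
    return findings


# Hypothetical sample: slow array, healthy kernel; GAVG = KAVG + DAVG.
slow_array = check_adapter_latency(davg=25.0, kavg=0.3, gavg=25.3)
healthy = check_adapter_latency(davg=5.0, kavg=0.2, gavg=5.2)
for line in slow_array:
    print(line)
```

In the first sample DAVG and GAVG are flagged while KAVG is not, which points the investigation at the fabric and array rather than at the host.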
Disk Device:

Metric      Threshold   What to Check
DQLEN       n/a         For reference; configured device queue length (prior to 5.0, LQLEN)
BLKSZ       n/a         For reference; configured device block size (for alignment issues)
RESETS/s    >0          Check paths and device availability
ABRTS/s     >0          Check storage fabric/array for bottleneck
QUED        >0-1        Check queue depth and storage fabric/array for bottleneck
RESV/s      n/a         Compare to CONS/s
CONS/s      >RESV/s     If >RESV/s, check for reservation conflicts with other ESXi hosts

DQLEN is the configured Device Queue Length. This is really a reference point to make sure you have
configured your devices correctly. A quick glance, as in the screenshot above, and you might notice one queue
misconfigured.

BLKSZ is the configured Device Block Size. This is another reference point to ensure that you have the
correct block size for the type of workload you are running.

RESETS/s is the number of Device SCSI Reset Commands per Second. A SCSI reset command is
issued when the SCSI operation fails to reach the target, and in a SAN environment it is usually indicative of a
path-down or multipathing issue, i.e., ESXi thinks a path is fine but in reality it is faulty. This is commonly seen on
Cisco Nexus fabrics as CRC errors on a port, for example.

ABRTS/s is the number of Device SCSI Abort Commands per Second. A SCSI abort command is
issued from the Guest OS when the command times out waiting for a response acknowledgement. In Windows 2008
and later, this is 60 seconds by default. Typically, if you are encountering a large number of aborts, the storage
fabric/array is causing a bottleneck and is the place to begin your investigation.
If you are using something such as a NetApp FAS, be sure that you run the GOS Timeout Script on your VM or
VM template to make sure you have the proper timeout values set, in order to prevent a SCSI
abort during a path failover or path problem.

QUED is the current Device Commands Queued in the VMkernel. As I explained previously,
this number should be at zero or near zero, otherwise it is indicating that something in the kernel is throttling the IO
throughput between the Guest OS and the HBA/storage fabric/array. Check firmware versions for correct revisions
and other performance tuning options within ESXi, especially vendor recommendations.

RESV/s is the Device SCSI Reservations per Second. SCSI reservations are commonplace; that's how
SCSI commands work. This value is only important as it relates to CONS/s.

CONS/s is the Device SCSI Reservation Conflicts per Second. If this value is greater than RESV/s,
it indicates that other ESXi hosts are holding reservations on this particular path that conflict
with reservations currently held by this host. A very high value could be felt as sluggish performance
in the storage subsystem, due to the kernel constantly requesting SCSI locks, being denied, and
consequently retrying.
Troubleshooting SCSI reservation conflicts can be challenging. Some helpful information can be found in the
VMware KB deep-dive article on Troubleshooting SCSI Reservation Conflicts, as well as in VMware
KB 1005009 and VMware KB 1002293.
From <http://www.datacenterdan.com/blog/vsphere-55-bptroubleshooting06-esxtop-disk-devices>
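
The disk device checks described above can be sketched as a small Python helper. Everything here is illustrative: the function name, the parameter names, and the sample values are hypothetical. The `queued_limit` default of 1 is an assumption based on the remark that QUED should be at or near zero.

```python
def check_disk_device(resets_per_s, aborts_per_s, queued, resv_per_s, cons_per_s,
                      queued_limit=1):
    """Return findings for one sample of disk device counters."""
    findings = []
    if resets_per_s > 0:
        findings.append("RESETS/s: check paths and device availability")
    if aborts_per_s > 0:
        findings.append("ABRTS/s: check storage fabric/array for bottleneck")
    if queued > queued_limit:  # assumption: 'near zero' means at most queued_limit
        findings.append("QUED: check queue depth and storage fabric/array")
    if cons_per_s > resv_per_s:
        findings.append("CONS/s > RESV/s: check for reservation conflicts with other ESXi hosts")
    return findings


# Hypothetical samples: a quiet device and one with aborts, queuing and conflicts.
quiet = check_disk_device(0, 0, 0, resv_per_s=12.0, cons_per_s=0.0)
suspect = check_disk_device(0, 0.5, 8, resv_per_s=3.0, cons_per_s=7.0)
for line in suspect:
    print(line)
```

Note that RESV/s is never flagged by itself; it only serves as the baseline that CONS/s is compared against.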
Virtual Machine Disk

vscsiStats: please review.

You can output your results to a CSV file for further analysis:
vscsiStats -p all -c > /tmp/output.csv

Determine use cases for and apply esxtop/resxtop Interactive, Batch and Replay modes
Use cases:
Troubleshooting poor performance of a specific VM, or identifying issues with storage, network or memory.

Interactive mode (the default mode): All statistics are displayed in real time.

Batch mode: Statistics are collected and saved to a file (csv), which can be viewed and
analyzed later using Windows perfmon and other tools.
~ # esxtop -b -d 20 -n 2 -a > /tmp/20secsnds2intrpts.csv
This runs in batch mode (-b), sampling all counters (-a) every 20 seconds (-d 20) for 2 iterations (-n 2), and writes the output as CSV.
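
Besides perfmon, the batch-mode CSV can be processed with a short script. The Python sketch below averages every column whose header contains a given substring. The sample data is made up: the host name `esx01`, the VM name `vm1`, and the exact counter names are assumptions that only mimic the perfmon-style headers, so check the header row of your own file.

```python
import csv
import io
from statistics import mean


def average_counters(csv_text, needle):
    """Average every column whose header contains the substring `needle`."""
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader)
    wanted = [i for i, name in enumerate(header) if needle in name]
    columns = {header[i]: [] for i in wanted}
    for row in reader:
        for i in wanted:
            if i < len(row) and row[i] != "":
                columns[header[i]].append(float(row[i]))
    return {name: mean(values) for name, values in columns.items() if values}


# Made-up two-sample export; real files contain thousands of columns.
SAMPLE = "\n".join([
    r'"(PDH-CSV 4.0) (UTC)(0)","\\esx01\Group Cpu(1234:vm1)\% Ready","\\esx01\Group Cpu(1234:vm1)\% Used"',
    r'"04/21/2015 04:56:00","4.0","35.0"',
    r'"04/21/2015 04:56:20","6.0","45.0"',
])

ready = average_counters(SAMPLE, "% Ready")
for name, value in ready.items():
    print(name, value)
```

To use it on a real export, read the file into a string first, for example with `open("/tmp/20secsnds2intrpts.csv").read()`.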

Replay mode: This is similar to a record-and-replay operation. Data that was collected by the vm-support command
is interpreted and played back as esxtop statistics. We can view the captured performance information for a
particular time period as if in real time, to see what was happening during that time. It is well suited
for VMware support personnel, who can replay the stats to understand what was happening to the server during that time.
First let us see the vm-support switches:

So I run it with -p to collect the performance data, -d for a duration of 100 seconds, and -i for 2-second intervals:

/vmfs/volumes/4aaa440f-1a187eb4-6f5e-0000c985147e/LoGs # vm-support -p -d 100 -i 2 -w /vmfs/volumes/4aaa440f-1a187eb4-6f5e-0000c985147e/LoGs

Then reconstruct the data:

/vmfs/volumes/4aaa440f-1a187eb4-6f5e-0000c985147e/LoGs # cd esx-esx01.com-2015-04-2104.56/
./reconstruct.sh

