Introduction
One of the many jobs administrators are tasked with is performance monitoring, to ensure that
the environments we are responsible for run as smoothly and efficiently as possible. This
holds true whether we are running in a physical or a virtual environment. While
the graphs displayed in the vSphere Client / vSphere Web Client provide some of
that functionality, the tool most likely to be used should we decide to involve VMware technical support
is ESXTOP.
ESXTOP is one of those commands not covered in the official VMware 5.1 Install, Configure, and
Manage course (it is mentioned in one slide); however, ESXTOP (and its sibling, resxtop) provides us
with a way to view live performance data directly on the host using counters and percentages. For
those administrators familiar with Linux / UNIX, ESXTOP is used the same way the top command is
used. Personally, I believe that for performance troubleshooting, ESXTOP is the best native
tool for monitoring resources such as CPU, memory, disk, and network usage. In this document, I attempt
to outline some of the more common options used with the ESXTOP command, along with their
descriptions, recommended thresholds, and examples of readings that may signal a potential issue
within our environment.
Starting esxtop
Because ESXTOP is a command line tool, we first need to determine how we are going to access the command line.
Depending on the method chosen, the command used to invoke it may change somewhat. If using
SSH or the ESXi shell, we would log in and enter:
esxtop
In the event we would prefer to use the downloadable vCLI (vSphere Command Line Interface)
package or vMA (vSphere Management Assistant), the command changes to:
resxtop --server <server name or IP address>
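When scripting around the tool, the choice between the two invocations above can be captured in a small helper. The following is a minimal Python sketch, not part of VMware's tooling: the function name build_top_command is hypothetical, and it only assembles the argument list shown above without running anything.

```python
from typing import List, Optional


def build_top_command(server: Optional[str] = None) -> List[str]:
    """Return the argument list for esxtop (local) or resxtop (remote).

    Hypothetical helper: with no server we assume an SSH or ESXi shell
    session on the host itself; with a server name or IP address we assume
    the vCLI package or vMA is being used.
    """
    if server is None:
        return ["esxtop"]
    return ["resxtop", "--server", server]


if __name__ == "__main__":
    # The host name below is a placeholder, not a real system.
    print(" ".join(build_top_command()))
    print(" ".join(build_top_command("esxi01.example.com")))
```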
Regardless of the method used to access the tool, the first screen encountered is the CPU statistics
screen (Figure 1). This screen is automatically refreshed every 5 seconds; however, the refresh rate can be
changed by pressing 's' followed by the desired number of seconds between refreshes.
As an example, the following would set the page's refresh rate to every 3 seconds:
s3
We can navigate to screens displaying other resources by using different keystrokes. Although there
are a multitude of available options, the following are the most commonly used and therefore the ones
I'll focus on:
c = cpu
m = memory
n = network
d = disk adapter
u = disk device (includes NFS as of 4.0 Update 2)
v = virtual machine disk activity
CPU
View | Column | Threshold | Description
CPU | %RDY | 10 |
CPU | %CSTP | | The amount of time a vSMP VM spent descheduled to try and equalize the threads processed. Possibly due to high vSMP configuration.
CPU | %MLMTD | |
high %RDY times; therefore, it is important to look at all of a VM's resources when troubleshooting
latency problems in an environment.
Another issue commonly seen in environments has to do with vSMP (virtual symmetric
multiprocessing) VMs. Although it is possible to create VMs with multiple vCPUs, it is recommended
that vSMP VMs be the exception and not the norm. The reason is that when a vSMP
VM requires CPU cycles, it will look for x cores (where 'x' is the number
of vCPUs allocated to the VM) on which to execute. If x cores are not available, a relaxed co-scheduler will allow some of the threads to be placed on the cores that are available. At first this doesn't seem
like a problem; however, at some point, the thread(s) furthest along must be de-scheduled from a CPU
in order to allow the sibling threads to catch up. In Figure 3, this can be seen because a vSMP VM
(WinXP-4) has been introduced to the environment and is now also conducting CPU-intensive
calculations. With all VMs performing these calculations, WinXP-4's vCPUs must be forcibly descheduled, which is reflected in the %CSTP (co-stop) counter.
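As a rough illustration of how the %RDY threshold of 10 from the CPU table could be applied to a set of readings, here is a minimal Python sketch. The function name and the sample values are hypothetical and are not taken from the figures in this document; it simply assumes the readings have already been collected into a dictionary.

```python
from typing import Dict, List

RDY_THRESHOLD = 10.0  # %RDY threshold listed in the CPU table


def vms_over_ready_threshold(
    ready_pct: Dict[str, float], threshold: float = RDY_THRESHOLD
) -> List[str]:
    """Return the names of VMs whose %RDY exceeds the threshold, sorted."""
    return sorted(name for name, pct in ready_pct.items() if pct > threshold)


if __name__ == "__main__":
    # Made-up sample readings for illustration only.
    sample = {"WinXP-1": 4.2, "WinXP-2": 12.7, "WinXP-4": 31.0}
    print(vms_over_ready_threshold(sample))
```

A VM flagged this way is only a starting point; as noted above, all of a VM's resources should be examined before drawing conclusions.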
View | Column | Threshold | Description
DISK | GAVG | 25 |
DISK | DAVG | 25 |
DISK | KAVG | |
DISK | QUED | |
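The three latency counters are related: in general esxtop usage, the latency seen by the guest (GAVG) is approximately the device latency (DAVG) plus the kernel latency (KAVG), with all three reported in milliseconds. That relationship and the unit are assumptions brought in here, since the descriptions in the table did not survive. The sketch below uses a hypothetical function name and applies the threshold of 25 from the table to DAVG and to the combined value.

```python
from typing import Dict

LATENCY_THRESHOLD_MS = 25.0  # GAVG and DAVG threshold from the disk table


def summarize_disk_latency(davg_ms: float, kavg_ms: float) -> Dict[str, object]:
    """Combine device and kernel latency and compare against the threshold."""
    # Assumption: GAVG is approximately DAVG + KAVG.
    gavg_ms = davg_ms + kavg_ms
    return {
        "gavg_ms": gavg_ms,
        "davg_over": davg_ms > LATENCY_THRESHOLD_MS,
        "gavg_over": gavg_ms > LATENCY_THRESHOLD_MS,
    }


if __name__ == "__main__":
    # Made-up readings: the device alone is under the threshold,
    # but kernel latency pushes the guest-visible latency over it.
    print(summarize_disk_latency(davg_ms=20.0, kavg_ms=8.0))
```

Splitting the figure this way hints at where to look: a high DAVG points toward the storage device or array, while a high KAVG points toward the host itself.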
As with disk latency, virtual machine latency may also be caused by
network problems. These problems may be the result of dropped packets, which in turn may be the
result of a saturated uplink. Figure 6 is a view of the activity on the virtual network. In this
screenshot, network activity is minimal; Figure 7, however, shows the same network under higher utilization.
View | Column | Threshold | Description
Network | %DRPTX | | Percentage of packets dropped during transmission. Possible network saturation.
Network | %DRPRX | | Percentage of packets dropped on receipt. Possible network saturation.
immediate action to mitigate the situation (e.g. vMotion VMs to another host or add another host to the
cluster). The memory screen can be accessed using the 'm' key (Figure 8).
Because we're interested in seeing the ballooning statistics, the counter we choose is j (MCTL). We
can then exit this screen and are returned to the memory statistics screen (Figure 10), which now
contains the ballooning counters (MCTL). By looking at these counters, we can determine whether VMs are
borrowing memory from each other. Ballooning begins when the host is running
low on memory: pages are reclaimed from one VM so that they can be given to another, in effect
letting one VM borrow memory from another. If the memory
being borrowed wasn't being used by the VM being victimized, there may be no performance
impact at all; however, if the memory being taken consisted of hot pages, this may force the victimized VM
to swap pages to disk, which brings a performance penalty with it.
View | Column | Threshold | Description
Memory | MCTLSZ | |
Memory | SWCUR | |
Memory | SWW/s | |
Memory | ZIP/s | |
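The progression described above, from ballooning to swapping, can be sketched as a simple classification. This is a hedged illustration only: the function name is hypothetical, and reading MCTLSZ as the current balloon size and SWCUR as the current swap usage (both in MB) is an assumption based on general esxtop usage, since the descriptions in the table are empty.

```python
def memory_pressure(mctlsz_mb: float, swcur_mb: float) -> str:
    """Classify a VM's memory state from two memory-screen counters.

    Assumptions: mctlsz_mb is the balloon size (MCTLSZ) and swcur_mb is
    the current swap usage (SWCUR), both in megabytes.
    """
    if swcur_mb > 0:
        # Swapping carries the larger performance penalty, so report it first.
        return "swapping"
    if mctlsz_mb > 0:
        return "ballooning"
    return "none"


if __name__ == "__main__":
    # Made-up readings for illustration only.
    print(memory_pressure(mctlsz_mb=0, swcur_mb=0))
    print(memory_pressure(mctlsz_mb=512, swcur_mb=0))
    print(memory_pressure(mctlsz_mb=512, swcur_mb=128))
```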
It is possible for some problems to manifest themselves during off hours (every morning at 0200, for
example). If no one is available to monitor at that time, troubleshooting such issues can be a
challenge. In these cases, we can schedule ESXTOP to run and capture performance
snapshots, which can be saved and replayed at our convenience or sent to VMware technical support
for further analysis. The command used for this can vary between ESXi versions (see
http://kb.vmware.com/kb/1967); however, one example would be:
vm-support -p -d 600 -i 20 > perfsnap.csv
This command would run the vm-support command specifically for performance (-p) for ten minutes (-d is the duration in seconds) in 20-second intervals (-i). It will then export that into the perfsnap.csv file for
later analysis.
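To sanity-check the arithmetic behind those two values, the short Python sketch below works out how many snapshots a given duration and interval would produce. The function name is hypothetical, and it assumes one snapshot per full interval; whether the tool also records a sample at time zero is not stated here.

```python
def snapshot_count(duration_s: int, interval_s: int) -> int:
    """Return the number of full intervals that fit in the duration."""
    if duration_s <= 0 or interval_s <= 0:
        raise ValueError("duration and interval must be positive")
    return duration_s // interval_s


if __name__ == "__main__":
    # Values from the example command: -d 600 and -i 20.
    print(snapshot_count(600, 20))
```

With the values from the example command, 600 seconds divided into 20-second intervals gives 30 snapshots.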
ESXTOP is a command that provides us with a plethora of options, and each of these options could be
discussed at great length. In this document, I attempted to detail some of the more common
areas where issues may creep up; however, it was not intended to be exhaustive. For
further reading on this topic, I've provided links to VMware documentation and VMworld presentations
that you may find useful.
http://www.vmware.com/pdf/esx2_using_esxtop.pdf
http://kb.vmware.com/kb/1008205
http://communities.vmware.com/docs/DOC-9279
http://communities.vmware.com/docs/DOC-5240
http://www.vmware.com/pdf/vsphere4/r41/vsp_41_resource_mgmt.pdf
VMworld 2011 - ESXTOP for Advanced Users
VMworld 2011 - Performance Best Practices and Troubleshooting