
CPU and Memory Considerations for Virtual Machines on VMware ESXi

This is one of the great mysteries still prevalent even amongst the top
echelons of virtualization consultants, bloggers and the like. The source of
all our trouble in understanding exactly how an ESXi host will handle a
virtual machine under different circumstances comes down to two very basic things.
1. VMware has never documented, at least in the public domain, a lot of the
technical details of how their algorithms work, sometimes not even the
thresholds for particular operations.
2. ESXi uses a vast array of complicated algorithms, and at any point in
time a multitude of these algorithms may be in action.
The first thing one has to understand is that CPU and memory management are
not two different tasks but rather two sides of the same coin. However, let's
talk about them as separate entities at first before funnelling down to the
co-design aspects of CPU and memory management in the ESXi hypervisor.
Note: All of the material here is either my own opinion or directly borrowed
from sites like frankdenneman.nl, yellow-bricks.com and others.
CPU Management

A typical fallacy of virtualization is rooted in how IT departments are
structured. Anyone who has spent time in operations knows that the
specification for a virtual machine is typically generated by an application
management team. In almost all cases, such teams have no idea of how
virtualization works, so their specs are verbatim copies of what they had on a
physical server for the same, or maybe an earlier, version of the application.
IT administrators also typically do not look into the application's details or
discuss the VM specs with the application team. This results in numerous
4- and 8-vCPU servers in most environments. Sometimes the CPU allocated to a
machine is increased in response to performance issues as well. These
behemoths chug along for years on end, hogging precious resources
unnecessarily, waiting for some smart guy to figure out that things are
very wrong.
The first question is: How many CPUs should you have in your virtual
machines?
The answer is really simple and constant: 1.
Most users are somewhat taken aback by this. But one needs to realize that
they are not dealing with physical hardware anymore. Rather, they are
literally slicing up physical hardware and dividing it amongst many
operating systems. The virtual machine shell is really for the operating
system and not for the user, and the slicing happens both in space and in
time. Think of this as statistical multiplexing. So if the number of vCPUs in
a VM is increased, then the total number of vCPUs in the system increases,
which in turn increases contention and latency. And the latency is not limited
to scheduling latency. Two vCPUs belonging to the same VM are like two close
brothers: they like to run together and won't leave each other behind. So if
one vCPU falls behind because there is no work for it, ESXi forces the
other vCPU of the same VM to stop long enough for the slow brother to catch
up. This happens even if there is a CPU-intensive load running on the VM.
But then one may wonder: if there is a CPU-intensive load, shouldn't it be
using both vCPUs to run faster, leaving no question of one vCPU falling
behind?
This is not entirely true. The application may be a single-threaded, in other
words sequential, application. It will then not use more than one CPU and is
not at all bothered about how many CPUs are present in the system. And though
most applications today are multi-threaded, administrators need to know how
many CPUs they can effectively use. Increasing the CPU count to deal with
performance issues only complicates the matter further.
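To make the co-stop idea concrete, here is a minimal Python sketch of the behaviour described above. The skew threshold and the tick-based accounting are illustrative assumptions, not ESXi's actual values.

```python
# Minimal sketch of the "co-stop" idea behind co-scheduling sibling vCPUs.
# The threshold and time accounting are illustrative, not ESXi's real values.

COSTOP_SKEW_MS = 3  # hypothetical skew threshold between sibling vCPUs

def schedule_tick(progress, runnable, tick_ms=1):
    """Advance each runnable vCPU by one tick, co-stopping any vCPU
    that gets too far ahead of its slowest sibling."""
    slowest = min(progress)
    for i, can_run in enumerate(runnable):
        skew = progress[i] - slowest
        if can_run and skew < COSTOP_SKEW_MS:
            progress[i] += tick_ms  # vCPU makes progress
        # else: vCPU is co-stopped (or idle) until the sibling catches up
    return progress

# A 2-vCPU VM where only vCPU0 has work (a single-threaded guest load):
progress = [0, 0]
for _ in range(10):
    progress = schedule_tick(progress, runnable=[True, False])
print(progress)  # vCPU0 stops advancing once it is COSTOP_SKEW_MS ahead
```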
Similarly, when sizing a VM a lot of administrators are guilty of planning
for maximum load rather than average load. This means that most resources sit
idle most of the time. In the case of physical-to-virtual (P2V) conversions,
one can use Capacity Planner from VMware to calculate the average load.
Otherwise it becomes a little more difficult and one has to rely on previous
experience. Generally, a scale-out policy is better than a scale-up policy
when it comes to designing a virtual datacenter. However, this requires the
administrator to have a detailed understanding of the workload.
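As a rough illustration of why sizing for average rather than peak demand matters, here is a small sketch; the demand samples, the clock speed and the ceiling-based rule are all made-up assumptions.

```python
import math

# Hypothetical hourly CPU demand samples for one workload, in MHz.
samples_mhz = [400, 600, 550, 700, 3200, 650, 500, 480]

core_mhz = 2400  # assumed clock speed of one physical core
avg = sum(samples_mhz) / len(samples_mhz)
peak = max(samples_mhz)

print(f"sized for peak:    {math.ceil(peak / core_mhz)} vCPU(s)")
print(f"sized for average: {math.ceil(avg / core_mhz)} vCPU(s)")
# Sizing for the rare peak here doubles the vCPU count, and the extra
# vCPU sits idle most of the time.
```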
CPU Resource Management

VMware provides several configurable settings that allow the administrator to
express the relative importance of VMs. The most important amongst these are
Reservation and Shares; bursts of CPU activity can additionally be capped with
the Limit setting.
A reservation tells the hypervisor how much CPU should be permanently set aside
for the machine, whereas shares tell it the relative importance of a machine.
The reservation setting is typically set only if required. Shares are a much
better way of dividing resources amongst machines.
In the context of CPU management, a reservation is not so damaging. But even
then, under certain conditions the VM for which a CPU reservation is set may
get choked if it requires more CPU than specified in its reservation. This is
because the hypervisor uses a metric called MHzPerShare to determine fairness
in CPU allocation across VMs. A reservation set on a VM with a CPU-intensive
application drives up this metric, and the hypervisor will then offer other
VMs the chance to catch up before CPU cycles are offered to this VM again. To
gain further insight into this, please look here.
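To make the fairness idea a little more concrete, here is a simplified sketch of an MHzPerShare-style calculation. The exact formula ESXi uses is not public, so the plain division below is an assumption for illustration only.

```python
# Simplified sketch of a MHzPerShare-style fairness metric (the exact
# formula ESXi uses is not public; this plain division is an assumption).
vms = {
    "db01":  {"active_mhz": 2600, "shares": 1000},  # reservation let it burn many cycles
    "web01": {"active_mhz": 400,  "shares": 1000},
    "app01": {"active_mhz": 900,  "shares": 2000},
}

def mhz_per_share(vm):
    return vm["active_mhz"] / vm["shares"]

# VMs with the lowest value are considered "behind" and are offered CPU first,
# so db01 waits while the others catch up.
for name, vm in sorted(vms.items(), key=lambda kv: mhz_per_share(kv[1])):
    print(f"{name}: {mhz_per_share(vm):.2f}")
```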
It is generally not advised to set CPU limits either, especially in an
under-loaded system. To understand the impact of CPU limits, please read this KB.
Memory Management

As far as virtualization is concerned, memory is a more expensive resource
than CPU. It is a general experience that the memory of a physical system is
exhausted much faster than its CPU and/or storage. The virtualization density
of any server is effectively limited by the amount of memory available on the
system. And though VMware is the market leader in this space by miles, it is
still a major area of concern.
VMware uses the following algorithms to handle memory:
1. Transparent Page Sharing (TPS): This is used to deduplicate pages in
the physical memory space. This feature cannot be disabled.
2. Memory Compression: This is used to compress pages in physical memory.
It can be disabled at the system level or at the individual VM level.
3. Ballooning: This has been, for a long time, and still is, VMware's crown
jewel in memory management. It comes along with VMware Tools and is
installed in the guest operating system as a driver. Whenever there is
a request for memory that cannot be satisfied by the current amount of
free physical memory pages, the balloon driver (memctl) starts
inflating, in other words it starts requesting memory from the guest
operating system. Once the driver has received this memory, the
hypervisor frees up the corresponding physical memory at the backend and
gives it to the VM requesting memory. However, please bear in mind that
due to TPS the amount of memory given up may not be equal to the amount
of physical pages freed up. Ballooning can be disabled, but that is not
advised. Ballooning is used to pass the resource contention in the
hypervisor on to the guest operating systems, so that currently active
pages are not freed up.
4. Swap: This is the last resort for handling memory contention. The
hypervisor starts swapping out a VM's memory to the swap file created
for it. However, this usually results in terrible VM performance,
because the hypervisor has no idea which pages are currently active for
any VM running on it; the swap-out can thus hit active pages. As a rule
of thumb, administrators should look at redistributing VMs at the first
sign of hypervisor-level swapping.
TPS and memory compression allow overcommitment of memory, while ballooning
and swapping are reclamation techniques. Usually ESXi starts reclamation
through ballooning when less than 4% of memory is free, and resorts to
swapping when only 2% is free.
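A rough sketch of how those free-memory thresholds could map to reclamation techniques is below; the real ESXi memory states have more levels and nuance than this.

```python
# Rough sketch of free-memory thresholds driving reclamation, based on the
# 4% / 2% figures quoted above (real ESXi memory states are more nuanced).
def reclamation_action(free_pct):
    if free_pct >= 4.0:
        return "none (TPS runs opportunistically in the background)"
    if free_pct >= 2.0:
        return "ballooning (ask the guests to give memory back)"
    return "compression + hypervisor-level swapping (last resort)"

for free in (6.0, 3.0, 1.5):
    print(f"{free:>4}% free -> {reclamation_action(free)}")
```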
Memory Resource Management

Similar to the CPU settings, we again have Reservation, Shares and Limit. In
the context of memory management, setting a reservation can be disastrous.
Unlike reserved CPU, ESXi does not redistribute reserved physical memory
pages once the VM has touched them. This is especially bad in the case of
Windows VMs, as Windows zeroes out its entire memory at boot. Reservations
also have a negative impact on High Availability (HA) clusters set up with
the host-failures admission control policy. HA in this case uses the maximum
memory and CPU reservations to calculate the slot size for admission control
purposes.
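Here is an illustrative slot-size calculation along the lines described above; the VM numbers, the per-VM overhead and the fallback CPU value are assumptions, not HA's exact defaults.

```python
# Illustrative slot-size calculation for reservation-based HA admission
# control (the numbers, fallback and overhead values are assumptions).
vm_reservations = [
    # (cpu_reservation_mhz, mem_reservation_mb)
    (0,    0),
    (500,  1024),
    (2000, 8192),   # one big reservation inflates the slot size for everyone
]

CPU_FALLBACK_MHZ = 32     # assumed fallback when no CPU reservation is set
MEM_OVERHEAD_MB = 100     # assumed per-VM memory overhead

slot_cpu = max(cpu or CPU_FALLBACK_MHZ for cpu, _ in vm_reservations)
slot_mem = max(mem + MEM_OVERHEAD_MB for _, mem in vm_reservations)

host_cpu_mhz, host_mem_mb = 24000, 131072
slots = min(host_cpu_mhz // slot_cpu, host_mem_mb // slot_mem)
print(f"slot = {slot_cpu} MHz / {slot_mem} MB -> {slots} slots per host")
```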
However, a reservation does have one positive effect. When a virtual machine
is set up, it creates a swap file on the storage equal in size to its
configured memory minus its memory reservation. So if memory is plentiful on
a system where you have run out of storage, memory reservations can be set to
shrink the swap files and free up storage, increasing virtualization density.
Typically, though, this is not a good idea for physical hosts running a
variety of applications.
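The storage saving works because the swap file only has to cover the unreserved part of the VM's memory. A tiny sketch of that sizing rule:

```python
# Sketch of the swap-file sizing rule implied above:
# swap file size = configured memory - memory reservation (per VM).
def vswp_size_gb(configured_gb, reservation_gb=0):
    return max(configured_gb - reservation_gb, 0)

print(vswp_size_gb(16))       # 16 GB of datastore space consumed
print(vswp_size_gb(16, 16))   # full reservation -> no swap file space needed
```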
Again, it is the Shares setting that comes to the rescue. Similar to CPU,
share settings decide the proportional importance of a VM when the hypervisor
is allocating memory. Please read this article for a better understanding.
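A minimal sketch of proportional division by shares under contention follows; it deliberately ignores the idle memory tax, reservations and limits that ESXi also factors in.

```python
# Minimal sketch of proportional allocation by shares during contention
# (idle memory tax, reservations and limits are ignored here).
def divide_by_shares(contended_mb, shares):
    total = sum(shares.values())
    return {vm: contended_mb * s / total for vm, s in shares.items()}

print(divide_by_shares(8192, {"prod-db": 2000, "test-web": 1000, "dev-box": 1000}))
# prod-db gets half of the contended memory, the other two a quarter each.
```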
NUMA

This is the technology that has made life harder for virtualization
administrators. AMD had been using the NUMA architecture with their Opteron
processors for years, and for a few years now even Intel has moved over to
NUMA from their earlier shared Front Side/Back Side Bus (FSB/BSB)
architecture, also known as the North Bridge/South Bridge design.
So basically, in NUMA each CPU package has its own bank of associated memory,
also known as local memory. In order to access memory associated with other
processors, i.e. remote memory, the package has to communicate with the owning
CPU over the system board interconnect. Remote access is thus much slower than
local access.
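A back-of-the-envelope way to see the cost of poor locality is to weight local and remote latencies by the fraction of local accesses; the latency numbers below are placeholders, not measurements of any real CPU.

```python
# Rough effect of NUMA locality on average memory access latency.
# The latency values are placeholders for illustration only.
LOCAL_NS, REMOTE_NS = 80, 140

def avg_latency_ns(local_fraction):
    return local_fraction * LOCAL_NS + (1 - local_fraction) * REMOTE_NS

for loc in (1.0, 0.8, 0.5):
    print(f"{loc:.0%} local -> {avg_latency_ns(loc):.0f} ns average")
```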
NUMA and VMware

VMware ESXi is a NUMA-aware hypervisor. This means that on a NUMA architecture
it has special optimization algorithms that kick in. It is the goal of ESXi
to keep at least 80% of a VM's working set localized, i.e. in local memory.
This is achieved by assigning a soft package affinity to that VM.
If less than 80% of the memory accessed is local, it is considered poor
localization. If this value falls too far, ESXi has algorithms to determine
whether it makes more sense to move the CPU affinity of the VM to another
package.
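The locality check can be sketched as below, using the 80% figure from the text; the second threshold and the suggested actions are assumed for illustration.

```python
# Sketch of the locality check described above. The 80% target comes from
# the text; the second threshold and the actions are assumed illustrations.
GOOD_LOCALITY = 0.80
MIGRATE_BELOW = 0.50   # hypothetical point where moving the home node pays off

def assess_locality(local_pages, total_pages):
    ratio = local_pages / total_pages
    if ratio >= GOOD_LOCALITY:
        return "ok"
    if ratio >= MIGRATE_BELOW:
        return "poor locality: migrate pages toward the home node"
    return "very poor locality: consider moving the VM's home node"

print(assess_locality(9000, 10000))   # ok
print(assess_locality(4000, 10000))   # very poor locality
```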
ESXi will always prefer to place all vCPUs belonging to the same VM on the
same NUMA node. This is why it makes more sense to buy a physical server
with more cores per package than with more packages. This not only improves
physical RAM locality but also vastly improves cache locality, at least up to
the L2 cache. Also, servers with fewer packages but more cores have fewer
NUMA nodes, which reduces the need for wide VMs.
As a side note, there is no difference between multiple sockets and multiple
cores at the vCPU level, apart from the fact that Hot-Add works only for
sockets.
Wide VMs

A wide VM is a VM that has so many vCPUs that it cannot fit into a
single NUMA node. Please note that ESXi ignores hyperthreading when doing
this calculation. Once this is determined and the vCPUs are scheduled, the
NUMA optimizations kick in and memory is localized as far as possible.
However, when wide VMs are deployed the memory is not fully localized
but interleaved. This means that the total memory will be distributed evenly
by ESXi amongst the various NUMA nodes used by the VM, which implies that a
vCPU scheduled on node 1 may be accessing memory from node 2. Wide VMs
should thus be used only when absolutely required.
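A small sketch of that wide-VM split, counting only physical cores per node and interleaving memory evenly across the NUMA clients (the function and the numbers are illustrative):

```python
import math

# Sketch of the wide-VM split described above: physical cores per node only
# (hyperthreads ignored), memory interleaved evenly across the nodes used.
def split_wide_vm(vcpus, mem_gb, cores_per_node):
    nodes = math.ceil(vcpus / cores_per_node)   # NUMA clients needed
    return {
        "wide": nodes > 1,
        "numa_clients": nodes,
        "mem_gb_per_node": mem_gb / nodes,      # interleaved, so some access is remote
    }

print(split_wide_vm(vcpus=12, mem_gb=96, cores_per_node=8))
# {'wide': True, 'numa_clients': 2, 'mem_gb_per_node': 48.0}
```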
TPS with NUMA

TPS, as mentioned earlier, deduplicates memory pages. However, deduplication
of memory pages across NUMA nodes can induce a performance hit, so this
behavior is disabled by default: TPS works only within each NUMA node.
This per-node restriction can be lifted, and in memory-constrained
environments the extra sharing from system-wide TPS may free up enough memory
to justify the performance hit. However, most environments do not need this.
Similarly, other memory techniques are also localized to the NUMA node.
Please read this article for greater insight.
