
Application: Active Directory

Design Qualifier: Availability

Recommendations:
1-) Try to separate the FSMO roles between multiple AD DC VMs and separate these VMs onto different hosts
using VM-VM anti-affinity rules.
2-) Try to place your DC VMs on separate back-end storage arrays. If that isn't possible, try to host one DC
VM on a local datastore as protection against a back-end shared-storage failure. Keep in mind that a
DC VM on a local datastore can't use features like HA or vMotion.
3-) Try to spread your DC VMs across multiple clusters on different physical racks or blade chassis using soft
("should") VM-Host anti-affinity rules. At a minimum, dedicate a management cluster separated from the
production cluster.
4-) Make sure to set the HA Restart Priority of all DC VMs in your HA cluster(s) to High so that they are
restarted before any other VMs in case of a host failure.
5-) Try to use VM Monitoring to monitor the activity of the AD DC VMs and restart them in case of Guest OS failure.

Design Qualifier: Performance

1-) CPU Sizing:
Site Size               | No. of vCPUs
<500 Users per Site     | 1 vCPU
<10,000 Users per Site  | 2 vCPUs
>10,000 Users per Site  | 3+ vCPUs
This assumes that the primary work of the directory is user authentication. For any additional workload,
like Exchange Server, additional vCPUs may be required. Capacity monitoring is helpful to determine the
correct number of vCPUs required.
2-) Memory Sizing:
The memory of an AD DC can boost performance by caching the AD database in RAM, like any other database
application. The ideal case is to cache the entire AD database in RAM for maximum performance. This is preferred in
environments where AD is integrated with other solutions, like Exchange servers. The following guideline
is a starting point:

Site Size                            | Min. RAM Size
<500 Users per domain per Site       | 512 MB
500-1,000 Users per domain per Site  | 1 GB
>1,000 Users per domain per Site     | 2 GB

For correct RAM sizing, start with the minimum required and use Windows Performance Monitor to monitor the
Database Cache % Hit counter for the lsass process over an extended period after deploying the DC. Add RAM if
required using the vSphere Memory Hot Add feature (keep in mind that you have to enable it before powering on
the DC VM). When the RAM is sized correctly, large enough to cache the proper portion of the database, this
ratio should be near 100%.
Keep in mind that this covers only AD Domain Services, i.e. additional RAM is required for the Guest OS.

3-) Storage Sizing:


The following equations give a general estimate of the storage required for a DC:

Storage Required = OS Storage + AD DB Storage + AD DB Logs Storage + SYSVOL Folder Storage +
Global Catalog Storage + Any data stored in Application Partitions + Any 3rd-Party Storage

- AD DB Storage = 0.4 GB per 1,000 users (0.4 MB * Total No. of Users)
- AD DB Logs Storage = 25% of AD DB Storage
- SYSVOL Folder = 500 MB+ (may increase with a high number of GPOs)
- Global Catalog Storage = 50% of AD DB Storage for each additional domain
- Any data stored in Application Partitions is to be estimated separately
- Any 3rd-Party Storage includes installed OS patches, the paging file, backup agents or anti-virus agents
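As an illustration only, the following Python sketch applies the sizing rules above; the OS size, application-partition size and 3rd-party figures used in the example are hypothetical placeholders, not recommendations.

    # Hypothetical sizing example applying the storage equations above.
    def dc_storage_gb(users, extra_domains=0, os_gb=40.0, app_partition_gb=0.0, third_party_gb=5.0):
        """Estimate AD DC storage (GB) from the rules above."""
        ad_db = 0.4 * users / 1000.0        # 0.4 GB per 1,000 users
        ad_logs = 0.25 * ad_db              # logs = 25% of the DB
        sysvol = 0.5                        # ~500 MB, more with many GPOs
        gc = 0.5 * ad_db * extra_domains    # 50% of the DB per additional domain
        return os_gb + ad_db + ad_logs + sysvol + gc + app_partition_gb + third_party_gb

    # Example: 10,000 users and one additional domain in the forest
    print(round(dc_storage_gb(users=10000, extra_domains=1), 1), "GB")
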
The following table shows the Read/Write behavior of each AD DC component:

AD DC Component | Read/Write       | Recommended RAID
AD DB           | Read intensive   | RAID 5
AD DB Logs      | Write intensive  | RAID 1/10
OS              | Read/Write       | RAID 1
Keep in mind that for large environments with many solutions integrated with AD, separating the OS, AD DB
and AD DB Logs onto different disks and vSCSI adapters is recommended for IO separation. In such
environments, caching most of the AD DB in RAM will give a performance boost.

4-) Network Sizing:
The AD DC VM should have a VMXNET3 vNIC, which gives the maximum network performance with the least CPU load,
and this should be sufficient on a 1 Gb physical network. The port group to which the AD DC VM is connected
should have teamed pNICs for redundancy.

Design Qualifier: Manageability
1-) Time Sync:
Time synchronization is one of the most important things in AD DS environments. As stated by VMware
Best Practices to Virtualize AD DS on Windows 2012 here, it's recommended to follow a hierarchical
time sync as follows:
- Sync the PDC in the Forest Root Domain to an external Stratum 1 NTP server.
- Sync all other PDCs in the other child domains of the forest to the Root PDC or any other DC in the Root
Domain.
- Sync all ESXi Hosts in the VI to the same Stratum 1 NTP server.
- Sync all workstations in every domain to the nearest DC in their respective domains.
To configure the PDC to time-sync with an external NTP server using GPO:
http://blogs.technet.com/b/askds/archive/2008/11/13/configuring-an-authoritative-time-server-withgroup-policy-using-wmi-filtering.aspx
Also, it's recommended to completely disable time-sync between VMs and Hosts through VMware Tools (even
after unchecking the box in the VM settings page, the VM can still sync with the Host using VMware Tools on
startup, resume, snapshotting, etc.) according to the following KB:
http://kb.vmware.com/selfservice/microsites/search.do?language=en_US&cmd=displayKC&externalId=1189
This will make the PDC VM sync its time only with the time source configured through GPO.
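For reference, a minimal pyVmomi sketch of applying that KB's advice; the time.synchronize.* keys below are the ones commonly listed in the KB, so verify them against the KB before use, and the VM lookup itself is assumed to have been done already:

    # Hedged sketch: disable residual VMware Tools time sync on a DC VM.
    # "vm" is a vim.VirtualMachine object already retrieved from vCenter.
    from pyVmomi import vim

    def disable_tools_time_sync(vm):
        keys = [
            "tools.syncTime",
            "time.synchronize.continue",
            "time.synchronize.restore",
            "time.synchronize.resume.disk",
            "time.synchronize.shrink",
            "time.synchronize.tools.startup",
        ]
        spec = vim.vm.ConfigSpec(
            extraConfig=[vim.option.OptionValue(key=k, value="0") for k in keys]
        )
        return vm.ReconfigVM_Task(spec)  # returns a task; writes the settings to the VM's .vmx
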
2-) Use the Best Practices Analyzer (BPA):
It's recommended to use the BPA for AD DCs to make sure that your configuration is consistent with Microsoft's
recommended configuration. In some cases and for valid reasons, you may deviate from Microsoft's
recommendations.
AD DS: http://technet.microsoft.com/en-us/library/dd391875(v=ws.10).aspx
3-) Use the AD DS Replication Tool:
This tool, offered by Microsoft for free, can help detect any replication issues between all DCs in your
environment and shows them along with the related KB articles to solve these issues. It's the next generation
of the REPADMIN CLI tool.
Download it from: http://www.microsoft.com/en-us/download/details.aspx?id=30005out

4-) Snapshots:
With an AD DC on Windows 2012, you can use snapshots without worrying about reverting to an old
snapshot and the related USN Rollback issue. An AD DC on Windows 2012 leverages the new VM-Generation ID
feature that makes the AD DC virtualization-aware; hence, any hot/cold snapshot can be reverted to safely.
Check VMware Best Practices to Virtualize AD DS on Windows 2012 here for more information about
VM-Generation ID and the related virtualization safeguards.

Design Qualifier: Recoverability

1-) Try to use backup software that is VSS-aware to safely back up your AD DB. An AD DC on Windows
2012 can be backed up with backup software that uses VSS to back up and restore the entire DC VM,
because an AD DC on Windows 2012 leverages the new VM-Generation ID feature that makes the AD DC
virtualization-aware; hence, a restore of the entire DC VM can be done safely. Check VMware
Best Practices to Virtualize AD DS on Windows 2012 here for more information about VM-Generation ID
and the related virtualization safeguards.
2-) Make sure to back up the System State of every DC. The System State contains the AD DB, AD DB Logs, the SYSVOL
folder and other OS-critical components such as the registry files.
3-) For DR, you can use native AD DC replication to replicate the AD DB between the main site and the
DR site. This approach requires minimal management overhead and gives good DR capability. Its only
shortcoming is that it can't protect the five FSMO role holders.

4-) Another approach for DR is to leverage VMware SRM with the VM-Generation ID capability on Windows
2012. This approach continuously replicates the AD DC VMs using SRM with vSphere Replication or array-based
replication and fails them over in case of disaster. This allows you to protect the FSMO role holders as well as
provide AD infrastructure to the failed-over VMs in the DR site.

Design Qualifier: Scalability

1-) For greater scalability, try to upgrade your AD DCs to Windows Server 2012. An AD DC on Windows
2012 leverages the new VM-Generation ID feature that makes the AD DC virtualization-aware and, hence,
it can be cloned easily and any cloning process can be done safely. Check VMware Best Practices
to Virtualize AD DS on Windows 2012 here for more information and a step-by-step cloning guide.
Cloning can help in case of an urgently needed expansion of the AD DC infrastructure, a DR process or testing. It
also cuts down the heavy network utilization caused by AD DCs replicating the entire DB to newly
promoted-from-scratch DCs.
Keep in mind that cold cloning is the only method supported by both VMware and Microsoft. Hot cloning
isn't supported in production by either VMware or Microsoft.

Design Qualifier: Security

1-) All security procedures used to secure physical DCs should also be applied to DC VMs, like role-based
access policies and hard-drive encryption.
2-) Follow the VMware Hardening Guide (v5.1/v5.5) for more security procedures to secure both your VMs
and vCenter Server.

Application: MS Clustering Solutions

Design Qualifier: Availability
1-) Use vSphere HA with MSCS to provide an additional level of availability to your
protected application.
2-) Use vSphere DRS in Partially Automated mode with MSCS to provide automatic
placement of clustered VMs at power-on only. Clustered VMs use SCSI bus
sharing, which requires that these VMs not be migrated with vMotion; hence,
automatic DRS load balancing can't be used. If the vSphere cluster hosting the
clustered VMs is configured with fully automated DRS, change the VM-specific DRS
configuration of the clustered VMs to Partially Automated.
3-) Affinity Rules:
With a Cluster-in-a-box configuration, use a VM-VM affinity rule to keep all clustered
VMs together on the same host. With Cluster-across-boxes or Physical-Virtual
clusters, use a VM-VM anti-affinity rule to separate the VMs across different hosts
(see the sketch after this list). HA doesn't respect VM affinity/anti-affinity rules,
and when a host fails, HA may violate these rules. In vSphere 5.1, configure the
vSphere cluster with the ForceAffinePowerOn option set to 1 to respect all VM
affinity/anti-affinity rules. In vSphere 5.5, configure the vSphere cluster with both
ForceAffinePowerOn and das.respectVmVmAntiAffinityRules set to 1 to respect VM
affinity and anti-affinity rules respectively.

4-) Try to use VM Monitoring to monitor the activity of the clustered VMs and restart them
in case of Guest OS failure.
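As referenced in item 3, a minimal pyVmomi sketch of creating such a VM-VM anti-affinity rule; the rule name is a hypothetical example, and the cluster and VM objects are assumed to have been looked up already:

    # Hedged sketch: add a VM-VM anti-affinity rule for the clustered node VMs.
    # "cluster" is a vim.ClusterComputeResource and "vms" a list of vim.VirtualMachine.
    from pyVmomi import vim

    def add_anti_affinity_rule(cluster, vms, name="mscs-nodes-separate"):
        rule = vim.cluster.AntiAffinityRuleSpec(name=name, enabled=True, vm=vms)
        spec = vim.cluster.ConfigSpecEx(
            rulesSpec=[vim.cluster.RuleSpec(operation="add", info=rule)]
        )
        # modify=True merges this change into the existing cluster configuration
        return cluster.ReconfigureComputeResource_Task(spec, modify=True)
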
Design Qualifier: Performance

1-) Memory Sizing:
Don't use memory over-commitment on ESXi hosts hosting clustered VMs.
Memory over-commitment may cause small pauses in these VMs, which are
sensitive to any time delay.
2-) SCSI Driver:

SCSI Driver Supported | OS (Windows)
LSI Logic Parallel    | 2003 SP1 or SP2 32/64-bit
LSI Logic SAS         | 2008 SP2 or 2008 R2 SP1 32/64-bit
LSI Logic SAS         | 2012 (vSphere 5.5.x) or 2012 R2 (vSphere 5.5 U1 or later)

Keep in mind that you have to use different SCSI controllers for the Guest OS disk
and the shared Quorum disk, i.e. SCSI (0:x) and SCSI (1:x).
3-) Storage Supported for OS Disks:
For OS disks in clustered VMs, it's recommended to use thick-provisioned disks
instead of thin ones for maximum performance.
4-) Storage Supported for Shared Quorum Disk:

vSphere 5.x:
- Cluster-in-a-box (Recommended Configuration) | Windows 2003 SP1 or SP2, 2008 SP2 or 2008 R2 SP1 |
  Eager-zeroed thick-provisioned virtual disk (.vmdk): local or on FC SAN | SCSI bus sharing: Virtual
- Cluster-in-a-box | Windows 2003 SP1 or SP2, 2008 SP2 or 2008 R2 SP1 |
  Virtual-mode RDM disk on FC SAN | SCSI bus sharing: Virtual
- Cluster-across-boxes (Recommended Configuration) | Windows 2003 SP1 or SP2, 2008 SP2 or 2008 R2 SP1 |
  Physical-mode RDM disk on FC SAN | SCSI bus sharing: Physical
- Cluster-across-boxes | Windows 2003 SP1 or SP2 |
  Virtual-mode RDM disk on FC SAN | SCSI bus sharing: Physical
- Physical-Virtual | Windows 2003 SP1 or SP2, 2008 SP2 or 2008 R2 SP1 |
  Physical-mode RDM disk on FC SAN | SCSI bus sharing: Physical

vSphere 5.5 only:
- Cluster-in-a-box (Recommended Configuration) | Windows 2008 SP2 or 2008 R2 SP1 or 2012 or 2012 R2
  (2012 R2 requires vSphere 5.5 U1) | Eager-zeroed thick-provisioned virtual disk (.vmdk): local or on
  iSCSI/FCoE SAN | SCSI bus sharing: Virtual
- Cluster-in-a-box | Windows 2008 SP2 or 2008 R2 SP1 or 2012 or 2012 R2 (2012 R2 requires vSphere 5.5 U1) |
  Virtual-mode RDM disk on iSCSI/FCoE SAN | SCSI bus sharing: Virtual
- Cluster-across-boxes (Recommended Configuration) | Windows 2008 SP2 or 2008 R2 SP1 or 2012 or 2012 R2
  (2012 R2 requires vSphere 5.5 U1) | Physical-mode RDM disk on iSCSI/FCoE SAN | SCSI bus sharing: Physical
- Physical-Virtual | Windows 2008 SP2 or 2008 R2 SP1 or 2012 or 2012 R2 (2012 R2 requires vSphere 5.5 U1) |
  Physical-mode RDM disk on iSCSI/FCoE SAN | SCSI bus sharing: Physical
In-Guest iSCSI target sharing for Quorum Disk is supported for any type of
clustering configuration and any OS.
vSphere 5.5.x supports in-guest FCoE target sharing for Quorum Disk.
Keep in mind that mixing Cluster-across-boxes and Cluster-in-a-box configurations
isn't supported, as well as mixing different versions of vSphere in a single
cluster.
Mixing different types of storage protocols to connect the Quorum disk isn't
supported, i.e. the first node connected to the Quorum disk using iSCSI and the second
connected using FC.
Mixing different types of initiators for a storage protocol is supported
only on vSphere 5.5.x, i.e. Host 1 can connect using the Software iSCSI initiator and Host 2 can
connect using a HW iSCSI initiator. The same goes for FCoE.
5-) Multi-pathing Policy:
For clustered-VM configurations on vSphere 5.1 that require FC SAN, a specific
multi-pathing policy must be set to configure how the ESXi Hosts connect to that FC
SAN:

Multi-pathing Plugin       | SAN Type                                        | Path Selection Policy
NMP                        | Generic                                         | Round Robin
NMP using SATP: ALUA_CX    | EMC Clariion, EMC VNX                           | Fixed
NMP using SATP: ALUA       | IBM 2810XIV                                     | MRU
NMP using SATP: Default_AA | IBM 2810XIV, Hitachi, NETAPP Data ONTAP 7-Mode  | Fixed
NMP using SATP: SYMM       | EMC Symmetrix                                   | Fixed

In vSphere 5.5 or above, this issue was resolved according to both KB1 & KB2.

6-) Guest Disk IO Timeout:
From the Guest OS, it's recommended to change the disk IO timeout to more than 60
seconds using the following registry key:
HKEY_LOCAL_MACHINE\System\CurrentControlSet\Services\Disk\TimeOutValue.
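A minimal sketch of that registry change, run inside the clustered Windows guest; the 120-second value is an example assumption to adjust per your storage vendor's guidance:

    # Raise the disk I/O timeout named above (requires local admin rights in the guest).
    import winreg

    def set_disk_timeout(seconds=120):
        key_path = r"System\CurrentControlSet\Services\Disk"
        with winreg.OpenKey(winreg.HKEY_LOCAL_MACHINE, key_path, 0, winreg.KEY_SET_VALUE) as key:
            winreg.SetValueEx(key, "TimeOutValue", 0, winreg.REG_DWORD, seconds)

    set_disk_timeout()  # takes effect after the guest is rebooted
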
7-) Set the shared LUN on which the Quorum disk is placed (RDM) as Perennially
Reserved on each participating host, to prevent long boot times for any ESXi host
participating in or hosting a clustered VM.
Check the following KB for more information:
http://kb.vmware.com/selfservice/microsites/search.do?language=en_US&cmd=displayKC&externalId=1016106

8-) Network:
- You should choose the latest vNICs available for the Guest OS. The most preferred
is VMXNET3 for both private and public networks.
- Try to back your port groups with at least 2 physical NICs for redundancy and NIC
teaming capabilities. Connect each physical NIC to a different physical switch for
maximum redundancy.
- Consider network separation between the different types of networks, like vMotion,
Management, Production, Fault Tolerance, etc. Network separation is either
physical or virtual using VLANs.
- Clustered VMs should have two vNICs, one for the public network and the other
for the heartbeat network. For Cluster-across-boxes, configure the heartbeat network
with two physical NICs for redundancy.

Design Qualifier: Manageability
1-) Time Sync:
Time synchronization is one of the most important things in clustered environments.
It's recommended to do the following:
- Let all your clustered VMs sync their time with the DCs only, not with VMware Tools.
- Completely disable time-sync between the clustered VMs and Hosts through VMware Tools (even
after unchecking the box in the VM settings page, the VM can still sync with the Host using
VMware Tools on startup, resume, snapshotting, etc.) according to the
following KB:
http://kb.vmware.com/selfservice/microsites/search.do?language=en_US&cmd=displayKC&externalId=1189
- Sync all ESXi Hosts in the VI to the same Stratum 1 NTP server, which is the
same time source as your forest/domain.

2-) Supported OSs and Number of Nodes:

No. of Nodes | OS (Windows) / Configuration
2 Nodes      | 2003 SP1 or SP2 32/64-bit (per vSphere 5.1 hypervisors)
2 Nodes      | FCoE SAN hosting the Quorum disk with vSphere 5.1 U2 and Windows 2008/2012
5 Nodes      | 2008 SP2 or 2008 R2 SP1 32/64-bit
5 Nodes      | Windows 2012 (vSphere 5.5.x) or Windows 2012 R2 (vSphere 5.5 U1 or later)
5 Nodes      | FC SAN hosting the Quorum disk with vSphere 5.1 U2 and Windows 2012
Design Qualifier: Recoverability

1-) Try to maintain a proper backup/restore plan. This helps in case of total corruption
of a cluster node, which requires a full restore on bare metal or a VM. Keep in mind also
to continuously test restoring your backup sets to verify their effectiveness.
2-) Try to maintain a proper DR/BC plan. Clustering configurations will not help
much in case of a total data center failure. Try to test your DR/BC plan from
time to time, at least twice per year.
Design Qualifier: Scalability

1-) For greater scalability, try to upgrade your clustered VMs to Windows Server
2012. With vSphere 5.5.x and Windows Server 2012, the Quorum disk can be hosted
on an iSCSI or FCoE SAN. The issue of using the Round Robin PSP is solved (under
certain conditions mentioned in this KB).
Design Qualifier: Security

1-) All security procedures used to secure physical Microsoft clusters should also be
applied to clustered VMs, like role-based access policies.
2-) Follow the VMware Hardening Guide (v5.1/v5.5) for more security procedures to
secure both your VMs and vCenter Server.

Application: Exchange

Design Qualifier: Availability

1-) Try to use vSphere HA in addition to Exchange DAG to provide the highest level of availability. Adopt a
protection policy of N+1, where N is the number of DAG member VMs in a vSphere cluster of N+1 hosts. In
case of an ESXi host failure, vSphere HA powers on the failed virtual machine on another host in the
background, restoring the failed DAG member and bringing the newly passive database up to date and ready to
take over in case of a failure, or to be manually reactivated as the primary active database.
2-) Try to separate your DAG MBX VMs across different racks, blade chassis and storage arrays if available,
using VM affinity/anti-affinity rules and storage affinity/anti-affinity rules, for the highest availability.
3-) For MBX VMs, use VM anti-affinity rules to separate them over different hosts. When HA restarts a VM,
it'll not respect the anti-affinity rule, but on the following DRS invocation the VM will be migrated to
respect the rule. In vSphere 5.1, configure the vSphere cluster with the ForceAffinePowerOn option set to 1
to respect all VM affinity/anti-affinity rules. In vSphere 5.5, configure the vSphere cluster with both
ForceAffinePowerOn & das.respectVmVmAntiAffinityRules set to 1 to respect VM affinity and anti-affinity
rules respectively. For CAS VMs, use a VM-Host "should" affinity rule to load-balance them across different
groups of hosts, even if two CAS VMs end up on the same host.
4-) As Microsoft supports vMotion of Exchange VMs, use DRS clusters in Fully Automated mode.
5-) vMotion:
- Leverage the Multi-NIC vMotion feature to make the vMotion process much faster, even for a large MBX VM.
- Try to enable Jumbo Frames on the vMotion network to reduce overhead and increase throughput.
- DAG members are very sensitive to any latency or drop in their heartbeat network and, hence, any vMotion
operation may cause a false failover during the switch between the source and destination hosts,
although this drop is really negligible and characterized by a single ping-packet drop. That's why we need
to set the DAG cluster heartbeat setting SameSubnetDelay to 2 seconds (2000 milliseconds). For a step-by-step
guide on how to change this value, refer to:
http://www.vmware.com/files/pdf/Exchange_2013_on_VMware_Best_Practices_Guide.pdf. In some cases,
when leveraging Multi-NIC vMotion and enabling Jumbo Frames on the vMotion network, changing the heartbeat
setting isn't necessary.
6-) Try to leverage VM Monitoring to mitigate the risk of Guest OS failure. VMware Tools inside the Exchange
VMs will send heartbeats to the HA driver on the host. If they stop because of a Guest OS failure, the host will
monitor the IO and network activity of the VM for a certain period. If there's also no activity, the host will
restart the VM. This adds an additional layer of availability for Exchange VMs.
7-) Try to leverage the Symantec Application HA Agent for Exchange with vSphere HA & Exchange DAG for
maximum availability. Using Application HA, the monitoring agent will monitor the Exchange services, sending
heartbeats to the HA driver on the ESXi host. In case of application failure, it may restart services or mount
databases. If the Application HA Agent can't recover the application from that failure, it'll stop sending
heartbeats and the host will initiate a VM restart as an HA action.

Design Qualifier: Performance

1-) Try to leverage the Building Blocks approach. Divide your required number of user mailboxes across
equally-sized MBX VMs of either 500, 1,000, 2,000 or 4,000 user mailboxes per VM. Calculate the required
resources per VM and then decide the number of VMs required to serve your requirement. Keep in mind
that deploying your MBX VMs in a Standalone configuration is somewhat different in its calculations than a
DAG configuration.
2-) It's recommended to distribute all user mailboxes evenly across all of your DAG members to load-balance
the user load between all MBX VMs. Keep in mind that additional compute capacity on MBX VMs is
needed for passive DBs for failover. In addition, distribute these DAG member VMs evenly across all hosts
using a DRS VM anti-affinity rule for more load-balancing and higher availability.
3-) Use the Exchange Server Calculator (2010:
http://blogs.technet.com/b/exchange/archive/2010/01/22/updates-to-the-exchange-2010-mailbox-serverrole-requirements-calculator.aspx) & (2013: https://gallery.technet.microsoft.com/Exchange-2013-ServerRole-f8a61780) to calculate the required resources for your MBX VMs according to your chosen building
block sizes.
4-) For Exchange 2010, follow these links (http://technet.microsoft.com/en-us/library/dd346700.aspx &
http://technet.microsoft.com/en-us/library/dd346701.aspx) for CAS/HUB sizing relative to the MBX
server sizing. For Exchange 2013, follow this link (http://blogs.technet.com/b/exchange/archive/2013/05/06/askthe-perf-guy-sizing-exchange-2013-deployments.aspx) for CAS sizing.
5-) CPU, memory and IOps sizing guide for Exchange 2010: (http://technet.microsoft.com/enus/library/ee712771.aspx & http://technet.microsoft.com/en-us/library/ee832793.aspx).
6-) A complete sizing guide for Exchange 2013:
http://blogs.technet.com/b/exchange/archive/2013/05/06/ask-the-perf-guy-sizing-exchange-2013deployments.aspx
7-) CPU Sizing:
- Exchange is an SMP application that can use all VM vCPUs. Assign vCPUs as required and don't over-allocate
them to the VM, to prevent CPU scheduling issues at the hypervisor level and high RDY time.
- Don't over-commit CPUs. The ratio of virtual to physical cores should be 2:1 max (better to keep it nearly 1:1)
to be under the MS support umbrella. In some cases, like small environments, over-commitment is allowed
after establishing a performance baseline.
- Enable Hyperthreading when available. It won't double the processing power -in opposition to what is shown
on the ESXi host as a doubled number of logical cores- but it'll give a CPU processing boost of up to 20-25% in
some cases. Don't consider it when calculating the virtual-to-physical core ratio.
- Exchange VMs should have CPU utilization less than 70-75% if used in a Standalone configuration. If
DAG is to be implemented, CPU utilization should be less than 80% even in case of failover of a failed
MBX DB. The MBX role shouldn't use more than 40% CPU utilization in case of a multi-role deployment.
- Exchange 2010/2013 aren't vNUMA-aware, so there's no need to configure large Exchange VMs to match the
underlying physical NUMA topology. On the other side, the ESXi hypervisor is NUMA-aware and it leverages
the NUMA topology to gain a significant performance boost. Try to size your Exchange VMs to fit inside a
single NUMA node to gain the performance boost of NUMA node locality.
- Use the equations and variables listed at the end of this Exchange section.
8-) Memory Sizing:
- Don't over-commit memory, as Exchange is a memory-intensive application. If needed, reserve the
configured memory to provide the required performance level, especially for MBX VMs. Keep in mind that
memory reservation affects other aspects, like HA slot size and vMotion chances and time. In addition,
reserving memory removes the VM swapfiles from the datastores and, hence, their space is usable for adding
more VMs.
- Don't disable the balloon driver installed with VMware Tools. It's the ESXi Host's last line of defense against
memory contention before the compression and swapping-to-disk processes.
- Exchange VMs should always be monitored for memory performance, and the configured -and reserved,
if any- memory values adjusted to meet their requirements.
- Adding memory on the fly to Exchange VMs -especially Exchange 2013- will not add any performance
gain until the VM is rebooted. That's why enabling hot add won't be necessary.
9-) Storage Sizing:
- Always consider any storage space overhead while calculating the VM space size required. Overhead can
be: swapfiles, VM logs or snapshots. It's recommended to add 20-30% of space as overhead.
- Separate Exchange VM disks on different -dedicated if needed- datastores, as Exchange is an IO-intensive
application, to avoid IO contention.
- Provide at least 4 paths, through two HBAs, between each ESXi host and the storage array for maximum
availability.
- For IP-based storage, enable Jumbo Frames on its network end-to-end. Jumbo Frames reduce network
overhead and increase throughput.
- RDM can be used in many cases, like an Exchange P2V migration or to leverage a 3rd-party array-based
backup tool. Choosing RDM disks or VMFS-based disks is based on your technical requirements. There is no
performance difference between these two types of disks.
- Microsoft supports virtualized Exchange VMs only on FC, FCoE or iSCSI SANs, or using in-guest iSCSI
targets. NAS arrays aren't supported, either as an NFS datastore or accessed through a UNC path from
inside the Guest OS.
- For heavy workloads, dedicate a LUN/datastore per MBX VM for maximum performance, although it'll add
high management and maintenance overhead.
- Use the Paravirtual SCSI driver in all of your Exchange VMs, especially for disks used for DB and Logs, for
maximum performance and the least latency and CPU overhead.
- Distribute each Exchange VM's disks across the four allowed SCSI adapters for maximum IO parallelism and
higher IOps. Try to use eager-zeroed thick disks for DB and Logs disks.
- ESXi 5.x Hosts may need their VMFS heap size increased to allow hosting Exchange VMs with large
disks of several TBs, according to this link:
http://kb.vmware.com/selfservice/microsites/search.do?language=en_US&cmd=displayKC&externalId=1004424
This issue is mitigated in vSphere 5.1 U1 and later.
- Partition alignment gives a performance boost to your backend storage, as spindles will not make two
reads or writes to process a single request. VMFS5 datastores created using the vSphere (Web) Client will be
aligned automatically, as will any disks formatted using newer versions of Windows. Any upgraded VMFS
datastores or upgraded versions of Windows guests will require a partition alignment process. For
upgraded VMFS, this is done by migrating the VM disks to another datastore using Storage vMotion, then
recreating and reformatting the datastore as VMFS5.
10-) Network:
- Use the VMXNET3 vNIC in all Exchange VMs for maximum performance and throughput and the least CPU overhead.
- The Exchange VM port group should have at least 2 physical NICs for redundancy and NIC teaming
capabilities. Connect each physical NIC to a different physical switch for maximum redundancy.
- Consider network separation between the different types of networks, like vMotion, Management, Exchange
production, Fault Tolerance, etc. Network separation is either physical or virtual using VLANs.
- DAG member VMs should have two vNICs, one for the public network and the other for the heartbeat and
replication network. Keep in mind that configuring DAG members with one vNIC is supported, but it's not
considered a best practice.
11-) Monitoring:
- Always try to monitor your environment using in-guest tools and the ESXi and vCenter performance charts.
The following are some counters that may help in monitoring your Exchange VMs' performance, as well as
your hosting ESXi hosts, using the ESXTOP tool and Windows Performance Monitor respectively:

Subsystem | ESXTOP Counters | vCenter Counter
CPU       | %RDY            | Ready (milliseconds in a 20,000 ms window)
CPU       | %USED           | Usage
Memory    | %ACTV           | Active
Memory    | SWW/s           | Swap-in Rate
Memory    | SWR/s           | Swap-out Rate
Storage   | ACTV            | Commands
Storage   | DAVG/cmd        | Device Latency
Storage   | KAVG/cmd        | Kernel Latency
Network   | MbRX/s          | packets-Rx
Network   | MbTX/s          | packets-Tx

Subsystem    | Win PerfMon Counters | Description
VM Processor | % Processor Time     | Processor usage across all vCPUs.
VM Memory    | Memory Ballooned     | Amount of memory in MB reclaimed by the balloon driver.
VM Memory    | Memory Swapped       | Amount of memory in MB forcibly swapped to ESXi host swap.
VM Memory    | Memory Used          | Physical memory in use by the virtual machine.
Design Qualifier: Manageability

1-) Try to leverage vSphere Templates in your environment. Create a golden template for every tier of
your VMs. This reduces the time required for deploying or scaling your Exchange environment as well as
preserving consistency of configuration throughout your environment.
2-) Use vCenter Operations Manager to monitor your environment's performance trends, establish a
dynamic baseline of your VMs' performance to prevent false static alerts, estimate the capacity required
for further scaling, and proactively protect your environment against sudden peaks in VM performance
that need immediate scaling-up of resources.
3-) Microsoft support statement for Exchange in virtual environments:
http://technet.microsoft.com/en-us/library/jj619301(v=exchg.150).aspx
4-) Time synchronization is one of the most important things in Exchange environments. It's
recommended to do the following:
- Let all your Exchange VMs sync their time with the DCs only, not with VMware Tools.
- Completely disable time-sync between the Exchange VMs and Hosts through VMware Tools (even after
unchecking the box in the VM settings page, the VM can still sync with the Host using VMware Tools on
startup, resume, snapshotting, etc.) according to the following KB:
http://kb.vmware.com/selfservice/microsites/search.do?language=en_US&cmd=displayKC&externalId=1189
- Sync all ESXi Hosts in the VI to the same Stratum 1 NTP server, which is the same time source as your
forest/domain.
Design Qualifier: Recoverability
1-) Try to leverage any backup software that uses the Microsoft Volume Shadow Copy Service (VSS). These are
Exchange-aware and don't cause any corruption in the mailbox DB, because the DB is quiesced during the
backup operation. Of course, one of them is vSphere Data Protection Advanced. Check the following link:
http://www.vmware.com/files/pdf/products/vsphere/VMware-vSphere-Data-Protection-Product-FAQ.pdf
2-) If available, you can use any backup software that depends on array-based snapshots, if it's Exchange-aware.
3-) Use VMware Site Recovery Manager (SRM), if available, for DR. With SRM, an automated failover to a
replicated copy of the VMs in your DR site can be carried out in case of a disaster, or even a failure of a
single MBX VM -for example- in your environment.
4-) If SRM isn't available for any reason, try to leverage any 3rd-party replication software to replicate your
Exchange VMs to a DR site for recovery in case of any disaster.
5-) Other approaches for DR:
- You can use a stretched DAG configuration with automatic failover.
- You can use a stretched DAG configuration with a lagged copy.
http://www.vmware.com/files/pdf/Exchange_2013_on_VMware_Availability_and_Recovery_Options.pdf
Design Qualifier: Scalability

1-) The Scale-up approach for DAG members requires large ESXi Hosts with many sockets and much RAM. It
reduces the number of VMs required to serve a certain number of mailbox users and, hence, a single failed
VM will affect a large portion of users. That's why the Scale-up approach needs careful attention to the
availability of the DAG VMs. The Scale-out approach requires smaller ESXi Hosts and gives more flexibility in
designing a DAG VM, but requires a high number of ESXi hosts to provide the required level of availability. A
single VM failure has less effect using the Scale-out approach, and it requires less time for migration using
vMotion; hence, DRS will be more effective. There's no best approach here. It all depends on your
environment and your requirements.

The following variables are used in the sizing equations below:

Variable                                                                            | Representation
Adjusted Megacycles per Core                                                        | AMC
Baseline Megacycles per Core                                                        | BMC
Megacycles for Certain Mailbox Usage                                                | MMC
Total Required Megacycles                                                           | TMC
Required Megacycles for Active DB Copy                                              | RMCADB
Required Megacycles for Active DB Copy (Worst Case Scenario - Single Node Failure)  | RMCDADB
Required Megacycles for Passive DB Copy                                             | RMCPDB
Required Megacycles for Passive DB Copy (Worst Case Scenario - Single Node Failure) | RMCDPDB
Required Megacycles for Effect of Passive DB on Active DB Copy                      | RMCPADB
Total Number of Users' Mailboxes                                                    | NT
Total Number of Users' Mailboxes per MBX Server (Building Block Size)               | NS
Total Number of Active Mailboxes (Worst Case Scenario - Single Node Failure)        | NA
Total Number of Passive Mailboxes (Worst Case Scenario - Single Node Failure)       | NP
Utilization                                                                         | U
No. of Virtual Cores of CAS                                                         | NVCC
No. of Virtual Cores of MBX                                                         | NVCM
Total Cache Memory Required on a Mailbox Server                                     | MT
Cache Memory Required per Mailbox Usage                                             | MMBX
Memory Required for CAS Server                                                      | MC

Exchange 2013:
AMC = BMC * (New Rating / Baseline Rating)

Standalone Configuration (U = 70-75%):
TMC = (MMC * NT) / U
NVCM = TMC / AMC
MT = NS * MMBX
NVCC = 0.375 * U * NVCM
MC = 2 + 2 * NVCC (8 GB minimum)

For Multi-role Deployment:
No. of vCPUs = 1.375 * NVCM
Memory Required = MC + MT
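As an illustration only, the following Python sketch strings the formulas above together for one building block; every input value is a hypothetical example, applying TMC to a single block's NS mailboxes is an interpretation, and the per-mailbox megacycle and cache figures would normally come from Microsoft's Exchange 2013 sizing guidance for your chosen user profile:

    # Rough Exchange 2013 building-block sizing using the formulas above (illustrative only).
    def size_building_block(ns, mmc, mmbx_gb, bmc, new_rating, baseline_rating, u=0.70):
        """Sizes one MBX building block of NS mailboxes (standalone configuration)."""
        amc = bmc * (new_rating / baseline_rating)   # AMC = BMC * (New Rating / Baseline Rating)
        tmc = (mmc * ns) / u                         # TMC = (MMC * NT) / U, applied to one block (NT = NS)
        nvcm = tmc / amc                             # NVCM = TMC / AMC  (MBX vCPUs)
        mt = ns * mmbx_gb                            # MT = NS * MMBX    (MBX cache memory, GB)
        nvcc = 0.375 * u * nvcm                      # NVCC = 0.375 * U * NVCM (CAS vCPUs)
        mc = max(2 + 2 * nvcc, 8)                    # MC = 2 + 2 * NVCC, with the 8 GB minimum
        return {
            "MBX vCPUs": round(nvcm, 1),
            "MBX cache GB": round(mt, 1),
            "CAS vCPUs": round(nvcc, 1),
            "CAS memory GB": round(mc, 1),
            "Multi-role vCPUs": round(1.375 * nvcm, 1),    # No. of vCPUs = 1.375 * NVCM
            "Multi-role memory GB": round(mc + mt, 1),     # Memory Required = MC + MT
        }

    # Hypothetical example: a 2,000-mailbox block, 2.93 megacycles and ~12 MB cache per mailbox,
    # on CPUs rated 60 against a 33.75 baseline, targeting 70% utilization.
    print(size_building_block(ns=2000, mmc=2.93, mmbx_gb=0.012,
                              bmc=2000, new_rating=60, baseline_rating=33.75, u=0.70))
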

Application: SQL Server 2012

Design Qualifier: Availability
1-) Try to use vSphere HA in addition to SQL AAG to provide the highest level of availability. Adopt a
protection policy of N+1, where N is the number of AAG member VMs in a vSphere cluster of N+1 hosts. In case
of an ESXi failure, vSphere HA powers on the failed virtual machine on another host in the background,
restoring the failed AAG member and bringing the newly passive database up to date and ready to take
over in case of a failure, or to be manually reactivated as the primary active database.
2-) Try to separate your SQL AAG VMs across different racks, blade chassis and storage arrays if available,
using VM affinity/anti-affinity rules and storage affinity/anti-affinity rules, for the highest availability.
3-) For SQL AAG VMs, use VM anti-affinity rules to separate them over different hosts. When HA restarts a
VM, it'll not respect the anti-affinity rule, but on the following DRS invocation the VM will be migrated to
respect the rule. In vSphere 5.1, configure the vSphere cluster with the ForceAffinePowerOn option set to 1 to
respect all VM affinity/anti-affinity rules. In vSphere 5.5, configure the vSphere cluster with both
ForceAffinePowerOn & das.respectVmVmAntiAffinityRules set to 1 to respect VM affinity and anti-affinity
rules respectively. For licensing limits, use a VM-Host "must" affinity rule to force VMs to run on licensed hosts
only. Keep in mind that "must" rules are respected even in case of HA invocation, i.e. HA will not restart a
failed VM if it can't respect a "must" rule. In this case, it's recommended to separate your licensed hosts across
different racks, blade chassis and power supplies for more availability.
4-) If SQL AAG isn't available, there are many other high-availability technologies that can be used in SQL clusters:

- Always-on Availability Group (AAG) - Synchronous Mode:
  Automatic Failover: Yes | No. of Secondaries: Max. 2 | Readable Secondaries: Yes | RPO: 0 | RTO: Seconds
  vSphere Compatibility: Totally compatible and supported with vSphere HA, DRS and vMotion.
- Always-on Availability Group (AAG) - Asynchronous Mode:
  Automatic Failover: No | No. of Secondaries: Max. 4 | Readable Secondaries: Yes | RPO: Seconds | RTO: Minutes
  vSphere Compatibility: Totally compatible and supported with vSphere HA, DRS and vMotion.
- Always-on Failover Cluster Instances (vSphere-hosted clusters):
  Automatic Failover: Yes | No. of Secondaries: Max. 4 | Readable Secondaries: No | RPO: 0 | RTO: Seconds
  vSphere Compatibility: Requires certain configuration to be supported on vSphere clusters. Totally supported
  with HA, but isn't supported for vMotion and automatic DRS.
- Data Mirroring - High Safety Mode with automatic failover:
  Automatic Failover: Yes | No. of Secondaries: Max. 1 | Readable Secondaries: No | RPO: 0 | RTO: Seconds
  vSphere Compatibility: Totally compatible and supported with vSphere HA, DRS and vMotion.
- Data Mirroring - High Safety Mode without automatic failover:
  Automatic Failover: No | No. of Secondaries: Max. 1 | Readable Secondaries: No | RPO: 0 | RTO: Minutes
  vSphere Compatibility: Totally compatible and supported with vSphere HA, DRS and vMotion.
- Data Mirroring - High Performance Mode:
  Automatic Failover: No | No. of Secondaries: Max. 1 | Readable Secondaries: No | RPO: Seconds | RTO: Minutes
  vSphere Compatibility: Totally compatible and supported with vSphere HA, DRS and vMotion.
- Log Shipping:
  Automatic Failover: No | No. of Secondaries: N/A | Readable Secondaries: No | RPO: Minutes | RTO: Minutes-Hrs
  vSphere Compatibility: Totally compatible and supported with vSphere HA, DRS and vMotion.
- Backup/Restore:
  Automatic Failover: No | No. of Secondaries: N/A | Readable Secondaries: No | RPO: Minutes-Hrs | RTO: Hrs-Days
  vSphere Compatibility: Totally compatible and supported with vSphere HA, DRS and vMotion.

Keep in mind that all of AAG, Log Shipping and Data Mirroring require the DB Full Recovery Mode, which may
lead to higher storage growth.
5-) As Microsoft supports vMotion of SQL VMs, use DRS clusters in Fully Automated mode. It'll always load-balance
your SQL VMs across the cluster, respecting all of your configured affinity/anti-affinity rules.
6-) vMotion:
- Leverage the Multi-NIC vMotion feature to make the vMotion process much faster, even for a large SQL VM.
- Try to enable Jumbo Frames on the vMotion network to reduce overhead and increase throughput.
- SQL AAG members are very sensitive to any latency or drop in their heartbeat network and, hence, any
vMotion operation may cause a false failover during the switch between the source and destination hosts,
although this drop is really negligible and characterized by a single ping-packet drop. That's why we need to
set the SQL AAG cluster heartbeat setting SameSubnetDelay to 2 seconds (2000 milliseconds). In some
cases, when leveraging Multi-NIC vMotion and enabling Jumbo Frames on the vMotion network, changing the
heartbeat setting isn't necessary.
7-) Try to leverage VM Monitoring to mitigate the risk of Guest OS failure. VMware Tools inside the SQL VMs will
send heartbeats to the HA driver on the host. If they stop because of a Guest OS failure, the host will monitor
the IO and network activity of the VM for a certain period. If there's also no activity, the host will restart the VM.
This adds an additional layer of availability for SQL VMs.
8-) Try to leverage the Symantec Application HA Agent for SQL with vSphere HA for maximum availability. Using
Application HA, the monitoring agent will monitor the SQL instance and its services, sending heartbeats to the HA
driver on the ESXi host. In case of application failure, it may restart services or mount databases. If the Application
HA Agent can't recover the application from that failure, it'll stop sending heartbeats and the host will
initiate a VM restart as an HA action.
Design Qualifier: Performance

1-) Try to leverage Resource Governor limits and pools in conjunction with vSphere Resource Pools
for more control over the compute resources presented to SQL Server. Generally, it's better to adjust
resource settings from the SQL Server Resource Governor itself, rather than from vSphere Resource Pools. SQL
Server Resource Governor can now create up to 64 resource pools and can set affinity rules for which CPUs
or NUMA nodes to use for processing, similar to vSphere's resource-controlling capabilities.
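As a hedged illustration of the Resource Governor capability described above, the pool and group names and the 30% CPU cap below are hypothetical examples, and the connection string is a placeholder:

    # Create a Resource Governor pool and workload group over pyodbc (illustrative only).
    import pyodbc

    DDL = [
        "CREATE RESOURCE POOL ReportingPool WITH (MAX_CPU_PERCENT = 30)",
        "CREATE WORKLOAD GROUP ReportingGroup USING ReportingPool",
        "ALTER RESOURCE GOVERNOR RECONFIGURE",
    ]

    def create_reporting_pool(conn_str):
        conn = pyodbc.connect(conn_str, autocommit=True)
        cur = conn.cursor()
        for stmt in DDL:
            cur.execute(stmt)   # each DDL statement runs in its own batch
        conn.close()

    # Example (hypothetical DSN): create_reporting_pool("DSN=sqlvm01;Trusted_Connection=yes")
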
2-) Disable virus scanning on the SQL DB and logs, as it may lead to a performance impact on your SQL DB.
3-) CPU Sizing:
- As most SQL VMs underutilize their vCPUs (95% of SQL servers are under 30% utilization, as stated by the
VMware Capacity Planner), it's preferred -if available- to choose physical CPUs with a higher clock speed and a
lower number of cores. This reduces the license costs, as the number of physical cores is reduced, and doesn't
affect core performance due to the higher clock speed and more MHz to share between the low-utilization
SQL VMs. Keep in mind that you have to do your homework of capacity planning and performance
monitoring to maintain your performance baseline before deciding. You can also use SPECint2006 results
to compare CPUs.
- Don't over-commit CPUs. It's better to keep the ratio nearly 1:1. In some cases, like small environments,
over-commitment is allowed after establishing a performance baseline of normal-state utilization.
- Enable Hyperthreading when available. It won't double the processing power -in opposition to what is shown
on the ESXi host as a doubled number of logical cores- but it'll give a CPU processing boost of up to 20-25% in
some cases, and these added logical cores won't be taken into licensing consideration (Max. Virtualization
per-core licensing). Don't consider it when calculating the virtual-to-physical core ratio. In some cases of Tier-1
SQL Servers, setting the VM setting "Hyperthreading Sharing" to None -disabling Hyperthreading for this VM-
may lead to better performance than enabling Hyperthreading. Test both options and choose what fits your
performance requirements.
- The ESXi hypervisor is NUMA-aware and leverages the NUMA topology to gain a significant performance
boost. Try to size your SQL VMs to fit inside a single NUMA node to gain the performance boost of NUMA node
locality (see the sketch at the end of this item).
- In the case of large SQL VMs that won't fit inside a single NUMA node, keep in mind that SQL 2012/2014 are
vNUMA-aware, so it's recommended to expose the underlying physical NUMA topology to the SQL VMs. It's
applicable for large VMs with 8 or more vCPUs, or it can be enabled for smaller VMs using the advanced VM
setting numa.vcpu.min.
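A small illustrative check of the NUMA-locality advice above; the host socket, core and memory figures are hypothetical examples to replace with your own hosts' layout:

    # Does a proposed SQL VM fit within one NUMA node of the host? (illustrative only)
    def fits_in_numa_node(vm_vcpus, vm_mem_gb, cores_per_socket, mem_per_node_gb):
        """A VM keeps NUMA locality if its vCPUs and memory fit inside one physical node."""
        return vm_vcpus <= cores_per_socket and vm_mem_gb <= mem_per_node_gb

    # Example host: 2 sockets x 10 cores, 256 GB RAM -> roughly 128 GB per NUMA node
    print(fits_in_numa_node(vm_vcpus=8, vm_mem_gb=96, cores_per_socket=10, mem_per_node_gb=128))   # True
    print(fits_in_numa_node(vm_vcpus=12, vm_mem_gb=160, cores_per_socket=10, mem_per_node_gb=128)) # False -> rely on vNUMA
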
4-) Memory Sizing:
- Don't over-commit memory, as SQL 2012/2014 is a memory-intensive application. If needed, reserve the
configured memory to provide the required performance level. Keep in mind that memory reservation
affects other aspects, like HA slot size and vMotion chances and time. In addition, reserving memory
removes the VM swapfiles from the datastores and, hence, their space is usable for adding more VMs. In some
cases, where there are a lot of underutilized SQL Servers, over-commitment is allowed to get higher
consolidation ratios. Performance monitoring is mandatory in this case to maintain a baseline of normal-state
utilization.
- Don't disable the balloon driver installed with VMware Tools. Ballooning is the last line of defense of the ESXi
Host before compression and swapping to disk when its memory runs too low. When ballooning is needed,
the balloon driver will force the Guest OS to swap idle memory pages to disk to return the freed memory pages
to the Host, i.e. swapping is done according to Guest OS techniques. Swapping to disk, in contrast, is done by
the ESXi Host itself: the host swaps memory pages from physical memory to the VM swap file on disk without
knowing what these pages contain or whether these pages are required or idle. The performance hit of
ballooning is much lower than that of swapping to disk. Generally speaking, as mentioned in the previous point,
don't over-commit memory for business-critical SQL VMs, and if you'll do some over-commitment, don't push it
to these extreme limits.
- SQL Server 2012/2014 tends to use all the configured memory of the VM. This may lead to some
performance issues because any other installed application -like a backup agent or AV agent- wouldn't find
adequate memory to operate. It's recommended to use the "Max Server Memory" parameter from inside SQL
itself and set it to the configured memory minus 4 GB, to leave some memory for the Guest OS and any
3rd-party applications (see the sketch at the end of this item).
- Set "Min Server Memory" to define a minimum amount of memory for SQL Server to acquire for processing.
Min Server Memory will not immediately be allocated on startup; however, after memory usage has
reached this value due to client load, SQL Server will not free memory unless the minimum server memory
value is reduced. This can help in high-consolidation-ratio scenarios: when the Host is memory-contended
and the balloon driver is in full swing freeing memory pages from the Guest OSes, Min Server Memory will
prevent the balloon driver from inflating and taking this defined memory space.
- Use large memory pages for Tier-1 SQL VMs. SQL Server supports the concept of large pages when
allocating memory for some internal structures and the buffer pool, when the following conditions are met:
  - You are using SQL Server Enterprise Edition.
  - The VM has 8 GB or more of physical RAM.
  - The "Lock Pages in Memory" privilege is set for the service account. Check:
    http://msdn.microsoft.com/en-us/library/ms190730(v=sql.110).aspx
To enable the entire SQL Server buffer pool to use large memory pages, you should start your 64-bit SQL Server
with trace flag 834. To enable trace flag 834, check: http://support.microsoft.com/kb/920093. This will lead to:
  - With large pages enabled in the guest operating system, and the virtual machine running on a
    host that supports large pages, vSphere does not perform Transparent Page Sharing on the VM's
    memory unless the host reaches the Memory Hard state.
  - With trace flag 834 enabled, SQL Server startup behavior changes. Instead of allocating memory
    dynamically at runtime, SQL Server allocates all buffer pool memory during startup. Therefore, SQL
    Server startup time can be significantly delayed.
  - With trace flag 834 enabled, SQL Server allocates memory in 2 MB contiguous blocks instead of 4 KB
    blocks. After the host has been running for a long time, it might be difficult to obtain contiguous
    memory due to fragmentation. If SQL Server is unable to allocate the amount of contiguous memory
    it needs, it can try to allocate less, and SQL Server might then run with less memory than you
    intended.
This should only be enabled on Tier-1 dedicated SQL VMs with memory reservation and no memory
over-commitment. Any memory over-commitment technique can lead to performance issues with trace flag
834. For more information: http://blogs.msdn.com/b/psssql/archive/2009/06/05/sql-server-and-large-pagesexplained.aspx
- SQL Server uses all the configured memory as a cache for its queries and manages its configured
memory with its own techniques. The Active Memory counter in the vSphere (Web) Client may not reflect the
actual usage of SQL Server memory. Use in-guest memory counters and SQL Server memory counters instead.
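As referenced above, a hedged sketch of setting Max/Min Server Memory over pyodbc; the connection string, memory values and the 4 GB Guest OS reservation are example assumptions to adjust to the VM's configured memory:

    # Apply the Max/Min Server Memory advice above via sp_configure (illustrative only).
    import pyodbc

    def set_sql_memory(conn_str, vm_memory_mb, os_reserve_mb=4096, min_mb=8192):
        conn = pyodbc.connect(conn_str, autocommit=True)
        cur = conn.cursor()
        cur.execute("EXEC sp_configure 'show advanced options', 1; RECONFIGURE;")
        # Max Server Memory = configured VM memory minus headroom for the Guest OS and agents
        cur.execute("EXEC sp_configure 'max server memory (MB)', %d; RECONFIGURE;"
                    % (vm_memory_mb - os_reserve_mb))
        # Min Server Memory keeps a floor that SQL Server will not release under ballooning
        cur.execute("EXEC sp_configure 'min server memory (MB)', %d; RECONFIGURE;" % min_mb)
        conn.close()

    # Example with a hypothetical DSN and a 64 GB VM:
    # set_sql_memory("DSN=sqlvm01;Trusted_Connection=yes", vm_memory_mb=65536)
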
5-) Storage Sizing:
- If possible, use the SQLIOSIM tool -found in the Binn folder under your SQL instance installation path- to
simulate your SQL DB IO load on your backend storage and establish a performance baseline of your
backend storage array. It's recommended to run it in non-business hours, as it loads your HBAs, NICs and
back-end array. Check: http://support.microsoft.com/kb/231619
- Always consider any storage space overhead while calculating the VM space size required. Overhead can be:
swapfiles, VM logs or snapshots. It's recommended to add 20-30% of space as overhead.
- Follow the Microsoft SQL storage best practices: http://technet.microsoft.com/en-us/library/cc966534.aspx
- Separate different SQL VM disks on different -dedicated if needed- datastores to avoid IO contention, as
SQL is an IO-intensive application.
- Provide at least 4 paths, through two HBAs, between each ESXi host and the storage array for maximum
availability.
- For IP-based storage, enable Jumbo Frames on its network end-to-end. Jumbo Frames reduce network
overhead and increase throughput.
- RDM can be used in many cases, like a SQL P2V migration or to leverage a 3rd-party array-based backup tool.
Choosing RDM disks or VMFS-based disks is based on your technical requirements. There is no performance
difference between these two types of disks.
- Use the Paravirtual SCSI driver in all of your SQL VMs, especially for disks used for DB and Logs, for maximum
performance, the least latency and the least CPU overhead.
- Distribute each SQL VM's disks across the four allowed SCSI adapters for maximum IO parallelism and higher
IOps. It's recommended to use eager-zeroed thick disks for DB and Logs disks.
- The following table shows the Read/Write behavior of each SQL DB component:

DB Component | Read/Write       | Recommended RAID
DB           | Read intensive   | RAID 5
Logs         | Write intensive  | RAID 1/10
tempdb       | Write intensive  | RAID 10

- Allocate a tempdb file for each of your SQL VM's vCPUs. It's recommended to place these tempdb files on a
fast SSD array for better overall performance.
- Partition alignment gives a performance boost to your backend storage, as spindles will not make two
reads or writes to process a single request. VMFS5 datastores created using the vSphere (Web) Client will be
aligned automatically, as will any disks formatted using newer versions of Windows. Any upgraded VMFS
datastores or upgraded versions of Windows guests will require a partition alignment process. For
upgraded VMFS, this is done by migrating the VM disks to another datastore using Storage vMotion, then
recreating and reformatting the datastore as VMFS5.
- Enable the Instant File Initialization feature to speed up growing or creating new databases. It lets
Windows initialize the DB file without zeroing all of the blocks in its allocated size; Windows then pads
zeros to it as new writes occur. Log file disks can't be instantly initialized. Check:
http://technet.microsoft.com/en-us/library/ms175935(v=sql.110).aspx
6-) Network:
- Use the VMXNET3 vNIC in all SQL VMs for maximum performance and throughput and the least CPU overhead.
- The SQL VM port group should have at least 2 physical NICs for redundancy and NIC teaming capabilities.
Connect each physical NIC to a different physical switch for maximum redundancy.
- Consider network separation between the different types of networks, like vMotion, Management, SQL
production, SQL replication, Fault Tolerance, etc. Network separation is either physical or virtual using
VLANs.
- Clustered SQL VMs should have two vNICs, one for the public network and the other for the heartbeat and
replication network. It's better to dedicate a physical NIC on the ESXi hosts to the replication network between
clustered SQL VMs, especially when using synchronous-commit mode AAGs.


7-) Monitoring:
Try to establish a performance baseline for your SQL VMs and VI by monitoring the following:
- ESXi Host and VM counters:

Resource | Metric (esxtop/resxtop) | Metric (vSphere Client)  | Host/VM | Description
CPU      | %USED                   | Used                     | Both    | CPU used over the collection interval (%)
CPU      | %RDY                    | Ready                    | VM      | CPU time spent in ready state
CPU      | %SYS                    | System                   | Both    | Percentage of time spent in the ESX/ESXi Server VMkernel
Memory   | Swapin, Swapout         | Swapinrate, Swapoutrate  | Both    | Memory the ESX/ESXi host swaps in/out from/to disk (per virtual machine, or cumulative over host)
Memory   | MCTLSZ (MB)             | vmmemctl                 | Both    | Amount of memory reclaimed from the resource pool by way of ballooning
Disk     | READs/s, WRITEs/s       | NumberRead, NumberWrite  | Both    | Reads and writes issued in the collection interval
Disk     | DAVG/cmd                | deviceLatency            | Both    | Average latency (ms) of the device (LUN)
Disk     | KAVG/cmd                | KernelLatency            | Both    | Average latency (ms) in the VMkernel, also known as queuing time
Network  | MbRX/s, MbTX/s          | Received, Transmitted    | Both    | Amount of data received/transmitted per second
Network  | PKTRX/s, PKTTX/s        | PacketsRx, PacketsTx     | Both    | Packets received/transmitted per second
Network  | %DRPRX, %DRPTX          | DroppedRx, DroppedTx     | Both    | Receive/transmit dropped packets per second

- In-guest counters:
http://www.vmware.com/files/pdf/solutions/SQL_Server_on_VMware-Best_Practices_Guide.pdf
Design Qualifier: Manageability

1-) SQL Server 2012 support statement for virtualization: http://support.microsoft.com/kb/956893
2-) SQL Server 2012/2014 Editions: http://www.microsoft.com/en-us/server-cloud/products/sql-servereditions/
3-) SQL Server 2012/2014 Licensing:
2012:
http://download.microsoft.com/documents/china/sql/SQL_Server_2012_Licensing_Reference_Guide.pdf
2014: http://www.microsoft.com/en-us/server-cloud/products/sql-server/buy.aspx
There are two licensing approaches: licensing your VMs or licensing your physical hosts (Max. Virtualization
approach).
- Licensing your VMs:
Usually used for small deployments. Two models: licensing your virtual cores or Server/CAL licensing.

Licensing Virtual Cores:
- Min. 4 virtual cores per VM.
- The Core Factor doesn't apply. Hyperthreading is taken into consideration as additional vCPUs that
  must be licensed if used.
- Available only for Standard Edition.
- Allows VM mobility across different hosts using VMware vMotion with Software Assurance (SA) benefits.

Licensing Server/CAL:
- Each VM needs a Server license, and each connection requires a CAL license.
- Available for Standard and Business Intelligence Editions.
- Allows VM mobility across different hosts using VMware vMotion with Software Assurance (SA) benefits.

- Licensing your Physical Hosts (Max. Virtualization approach):
Used for large virtual environments. You count only physical CPU cores, with consideration of the Core Factor,
and Hyperthreading isn't taken into consideration. With SA benefits, it allows an unlimited number of SQL
Server VMs and allows license mobility as much as possible in your datacenter. Without SA, you're
limited to deploying a total number of SQL VM vCPUs less than or equal to the number of physical cores
licensed for SQL per host.
4-) SQL Server 2012/2014 Maximums: http://msdn.microsoft.com/en-us/library/ms143432.aspx
5-) Try to have a policy for change management in your environment to control SQL Server VM sprawl
due to the high demand for SQL Servers for different purposes in your environment, like testing,
development, supporting a single production application, etc.
6-) Try to leverage monitoring and capacity-planning tools, like the Microsoft MAP Tool. It helps significantly in
monitoring all your SQL VMs' performance and utilization, your DBs' sizes and performance trends, as well as
creating a complete inventory of all your SQL servers, their editions and licenses (this helps a lot in case of
P2V, DB migration to SQL Servers or SQL Server migration to a new version).
7-) Try to use the Upgrade Advisor to perform a full check-up on the old SQL Servers before upgrading to SQL
2012/2014. Keep in mind that it can't check the OS or some installed 3rd-party applications that may not
allow an upgrade. For more information:
http://msdn.microsoft.com/en-us/library/ms144256.aspx
8-) Try to leverage the Microsoft SQL Best Practices Analyzer to make sure you follow the best practices for
deploying SQL Server in your environment.
9-) Try to leverage VMware Converter for faster migration from physical SQL Servers. It's recommended to
use cold P2V conversion to preserve DB consistency. Keep in mind to remove unneeded HW drivers from
P2V'd VMs after successful conversions. This can be done by:
- Changing the Windows environment variable devmgr_show_nonpresent_devices to 1.
- From Device Manager, after all non-present physical devices are shown, removing all unneeded old
drivers.
- Restarting the VM.
10-) An availability group listener is a virtual network name (VNN) that directs read-write requests to the
primary replica and read-only requests to the read-only secondary replica. Always create an availability
group listener when deploying an AAG on vSphere. It enables application clients to connect to an
availability replica without knowing the name of the physical instance of the SQL Server installation. For
more information: http://technet.microsoft.com/en-us/library/hh213417.aspx
11-) Time synchronization is one of the most important things in SQL environments. It's recommended to
do the following:
- Let all your SQL VMs sync their time with the DCs only, not with VMware Tools.
- Completely disable time-sync between the SQL VMs and Hosts through VMware Tools (even after unchecking
the box in the VM settings page, the VM can still sync with the Host using VMware Tools on startup, resume,
snapshotting, etc.) according to the following KB:
http://kb.vmware.com/selfservice/microsites/search.do?language=en_US&cmd=displayKC&externalId=1189
- Sync all ESXi Hosts in the VI to the same Stratum 1 NTP server, which is the same time source as your
forest/domain.
12-) Make sure that you enable Full Recovery Mode on all DBs that will be included in your SQL AAGs, and
also make sure that at least a single full backup is taken.

Design Qualifier: Recoverability
1-) Try to leverage any backup software that uses the Microsoft Volume Shadow Copy Service (VSS). These are
SQL-aware and don't cause any corruption in the DB, because the DB is quiesced during the backup operation.
Of course, one of them is vSphere Data Protection Advanced. Check the following link:
http://www.vmware.com/files/pdf/products/vsphere/VMware-vSphere-Data-Protection-Product-FAQ.pdf
2-) If available, you can use any backup software that depends on array-based snapshots, if it's SQL-aware.
3-) Use VMware Site Recovery Manager (SRM), if available, for DR. With SRM, an automated failover to a
replicated copy of the VMs in your DR site can be carried out in case of a disaster, or even a failure of a
single SQL VM -for example- in your environment.
4-) If VMware SRM isn't available, you can leverage some availability features of SQL itself for more
recoverability and DR. You can either use a mix of AAG synchronous/asynchronous replicas or Data
Mirroring in High Safety Mode with Log Shipping to the DR site. This approach leads to lower cost, but with
higher management overhead and higher RPO/RTO results than using VMware SRM.
Design Qualifier: Scalability

- Leverage CPU/Memory Hot Add with SQL VMs to scale them as needed. All you need is to run the
RECONFIGURE statement using Management Studio to force SQL Server to use the newly added resources.
- Try to leverage the SQL Sysprep tool to provide a golden template of your SQL VM. It'll reduce the time
required for deploying or scaling your SQL environment as well as preserve consistency of configuration
throughout your environment. For more information, check: http://msdn.microsoft.com/enus/library/ee210754.aspx & http://msdn.microsoft.com/en-us/library/ee210664.aspx
- The Scale-up approach for SQL VMs requires large ESXi Hosts with many sockets and much RAM. It reduces
the number of VMs required to serve a certain number of DBs and, hence, a single failed VM will affect a large
portion of users. That's why the Scale-up approach needs careful attention to the availability of the SQL VMs.
At the same time, it reduces the cost of software licenses and physical hosts. The Scale-out approach requires
smaller ESXi Hosts and gives more flexibility in designing a SQL VM, but requires a high number of ESXi hosts to
provide the required level of availability, and more software licenses and, hence, more cost. A single VM
failure has less effect using the Scale-out approach, and it requires less time for migration using vMotion;
hence, DRS will be more effective. There's no best approach here. It all depends on your environment and
your requirements.

Design Qualifier: Security

Oracle DB

Availability
1-) Try to use vSphere HA in addition to Oracle RAC Cluster to provide the highest level of availability. Adopt a protection policy of N+1, where N is the number of Oracle RAC Cluster member VMs in a vSphere Cluster of N+1 hosts. In case of an ESXi failure, vSphere HA powers on the failed virtual machine on another host in the background, restoring the failed Oracle RAC Cluster member and bringing the newly passive member up and ready to take over in case of a failure, or to be manually reactivated as the primary active instance.
2-) Try to separate your Oracle RAC Cluster VMs across different racks, blade chassis and storage arrays if available, using VM Affinity/Anti-affinity rules and Storage Affinity/Anti-affinity rules for maximum availability.
3-) For Oracle RAC Cluster VMs, use VM anti-affinity rules to separate them over different hosts. When HA restarts a VM, it'll not respect the anti-affinity rule, but on the following DRS invocation the VM will be migrated to respect the rule. In vSphere 5.1, configure the vSphere Cluster with the ForeAffinePowerOn option set to 1 to respect all VM Affinity/Anti-affinity rules. In vSphere 5.5, configure the vSphere Cluster with both ForeAffinePowerOn & das.respectVmVmAntiAffinityRules set to 1 to respect all VM Affinity/Anti-affinity rules respectively.
4-) For zero downtime, deploy an Oracle RAC Multi-Node Cluster leveraging vSphere HA, but it'll lead to high license and physical host costs to host many Oracle VMs. For minimum (near-zero) downtime, you can deploy an Oracle RAC One Node cluster leveraging vSphere HA, which costs much less than a Multi-Node cluster but suffers from some downtime while moving the Oracle instance from a failed node to another node using Omotion. Eventually, it's your choice according to your technical requirements and SLA.
5-) Try to leverage VM Monitoring to mitigate the risk of Guest OS failure. VMware Tools inside Oracle VMs will send heartbeats to the HA driver on the host. If they stop because of a Guest OS failure, the host will monitor IO and network activity of the VM for a certain period. If there's also no activity, the host will restart the VM. This adds an additional layer of availability for Oracle VMs.
6-) Try to leverage the Symantec Application HA Agent for Oracle with vSphere HA for max. availability. Using Application HA, the monitoring agent will monitor the Oracle instance and its services, sending heartbeats to the HA driver on the ESXi host. In case of application failure, it may restart services and any dependent resources. If the Application HA Agent can't recover the application from that failure, it'll stop sending heartbeats and the host will initiate a VM restart as an HA action.

Performance
1-) Configure the following BIOS settings on each ESXi host:
Settings | Recommended Value | Description
Virtualization Technology | Yes | Necessary to run 64-bit guest operating systems.
Turbo Mode | Yes | Balanced workload over unused cores.
Node Interleaving | No | Disables NUMA benefits if set to Yes.
VT-x, AMD-V, EPT, RVI | Yes | Hardware-based virtualization support.
C1E Halt State | No | Disable if performance is more important than saving power.
Power-Saving | No | Disable if performance is more important than saving power.
Virus Warning | No | Disables warning messages when writing to the master boot record.
Hyperthreading | Yes | For use with some Intel processors. Hyperthreading is always recommended with Intel's newer Core i7 processors such as the Xeon 5500 series.
Video BIOS Cacheable | No | Not necessary for database virtual machine.
Wake On LAN | Yes | Required for VMware vSphere Distributed Power Management feature.
Execute Disable | Yes | Required for vMotion and VMware vSphere Distributed Resource Scheduler (DRS) features.
Video BIOS Shadowable | No | Not necessary for database virtual machine.
Video RAM Cacheable | No | Not necessary for database virtual machine.
On-Board Audio | No | Not necessary for database virtual machine.
On-Board Modem | No | Not necessary for database virtual machine.
On-Board Firewire | No | Not necessary for database virtual machine.
On-Board Serial Ports | No | Not necessary for database virtual machine.
On-Board Parallel Ports | No | Not necessary for database virtual machine.
On-Board Game Port | No | Not necessary for database virtual machine.
2-) Remove unnecessary services from the Guest OS, whether it's Windows or Linux. For example, on Windows: Indexing Service, System Restore and Remote Desktop; on Linux: iptables, autofs and cups.
3-) Set VM settings to Automatically Choose Best CPU/MMU Virtualization Mode.
4-) CPU Sizing:
- Don't over-commit CPUs; it's better to keep the ratio nearly 1:1. In some cases, like small environments, over-commitment is allowed after establishing a performance baseline of normal-state utilization.
- Enable Hyperthreading when available. It won't double the processing power (despite the doubled number of logical cores shown on the ESXi host), but it'll give a CPU processing boost of up to 20-25% in some cases.
- The ESXi hypervisor is NUMA-aware and leverages the NUMA topology to gain a significant performance boost. Try to size your Oracle VMs to fit inside a single NUMA node to gain the performance boost of NUMA node locality (see the sketch below).
- For large Oracle VMs that won't fit inside a single NUMA node, keep in mind that Oracle 11g and above support NUMA (they are virtual-NUMA-aware), but it's disabled by default. It's recommended to test whether enabling NUMA support and exposing vNUMA to Oracle increases performance before applying it in production.
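As a rough illustration of the NUMA fit check described above, the sketch below compares a proposed VM size against a single NUMA node. The host layout used (two sockets, 10 cores and 128 GB per node) is an assumption for the example only.

    def fits_in_numa_node(vm_vcpus, vm_mem_gb, cores_per_node, mem_per_node_gb):
        # True if the VM can be scheduled entirely inside one NUMA node.
        return vm_vcpus <= cores_per_node and vm_mem_gb <= mem_per_node_gb

    # Assumed host: 2 sockets x 10 cores, 256 GB RAM, i.e. 128 GB per NUMA node.
    print(fits_in_numa_node(8, 96, cores_per_node=10, mem_per_node_gb=128))    # True
    print(fits_in_numa_node(16, 192, cores_per_node=10, mem_per_node_gb=128))  # False: wide VM, consider vNUMA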
5-) Memory Sizing:
- Don't over-commit memory, as Oracle is a memory-intensive application. If needed, reserve the configured memory to provide the required performance level. Keep in mind that memory reservation affects other aspects, like the HA slot size and vMotion chances and time. In addition, reserving memory removes VM swapfiles from datastores, and hence their space is usable for adding more VMs. For some cases, where there are a lot of underutilized test Oracle servers, over-commitment is allowed to get higher consolidation ratios. Performance monitoring is mandatory in this case to maintain a baseline of normal-state utilization.
- Don't disable the balloon driver installed with VMware Tools. Ballooning is the last line of defense of the ESXi host before compression and swapping to disk when its memory runs too low. When ballooning is needed, the balloon driver will force the Guest OS to swap idle memory pages to disk and return the freed pages to the host, i.e. swapping is done according to Guest OS techniques. Swapping to disk, in contrast, is done by the ESXi host itself: the host swaps memory pages from physical memory to the VM swap file on disk without knowing what these pages contain or whether they are required or idle. The performance hit from ballooning is therefore much lower than from host swapping to disk. Generally speaking, as mentioned in the previous point, don't over-commit memory for business-critical Oracle VMs, and if you do some over-commitment, don't push it to these extreme limits.
- Use Large Memory Pages for Tier-1 Oracle VMs. Oracle Server supports the concept of large pages when
allocating memory since version 9i R2 for Linux and 10g R2 for Windows.
6-) Storage Sizing:
- Always consider any storage space overhead while calculating the VM space required. Overhead can be: swapfiles, VM logs or snapshots. It's recommended to add 20-30% of space as overhead (see the sizing sketch after this storage list).
- Separate different Oracle VM disks onto different (dedicated, if needed) datastores to avoid IO contention, as Oracle is an IO-intensive application.
- Provide at least 4 paths, through two HBAs, between each ESXi host and the storage array for max. availability.
- For IP-based storage, enable Jumbo Frames on its network end-to-end. Jumbo Frames reduce network overhead and increase throughput.
- RDM can be used in many cases, like Oracle P2V migration or to leverage a 3rd-party array-based backup tool. Choosing RDM disks or VMFS-based disks is based on your technical requirements; there is no performance difference between these two types of disks. Keep in mind that Oracle Real Application Clusters (RAC) supports vMotion when shared VMFS datastores host its disks.
- Use the Paravirtual SCSI driver in all of your Oracle VMs, especially for disks used for DB and logs, for max. performance, least latency and least CPU overhead.
- Distribute any Oracle VM disks across the four available virtual SCSI adapters for max. parallelism and higher IOps. It's recommended to use eager-zeroed thick disks for DB and log disks.
- The following table shows the read/write behavior of each of the Oracle DB components:
DB Component | Read/Write | RAID Recommended
DB | Read Intensive | RAID 5
Logs | Write Intensive | RAID 1/10
OS Disk | Read/Write | RAID 1/10
- Partition alignment gives a performance boost to your backend storage, as spindles will not make two reads or writes to process a single request. VMFS5 datastores created using the vSphere (Web) Client are aligned automatically, as are disks formatted using newer versions of Windows. Any upgraded VMFS datastores or upgraded Windows guests will require a partition alignment process. For upgraded VMFS, it's done by migrating VM disks to another datastore using Storage vMotion, then reformatting and recreating the datastore as VMFS5.
- Don't use Oracle Cluster File System (OCFS). It is the predecessor of Oracle Automatic Storage Management (ASM) and doesn't allow for single-instance Oracle Server. Oracle ASM performs better and allows for both single-instance and clustered Oracle RAC deployments.
- Don't use Oracle Automatic Storage Management (ASM) failure groups. They cost additional CPU overhead and may behave unexpectedly after the failure of one disk in virtual environments. For disk redundancy, use external RAID (or a similar technology) that's done at the storage-array level, transparently to the Oracle stack.
- Create Oracle ASM disk groups on similar disks and storage arrays, as a disk group performs only as fast as the slowest disk inside the group.
- For an Oracle Multi-Node RAC Cluster, you can use either VMDK disks on VMFS datastores or RDM LUNs for the shared OCR/Voting Disks. If you use VMDK disks on VMFS datastores as OCR/Voting Disks, you have to disable the simultaneous-write protection provided by VMFS using the multi-writer flag (http://kb.vmware.com/kb/1034165) to share these disks between nodes, as well as configure the VMDK disks as Independent Persistent disks. For RDM disks as OCR/Voting Disks, use SCSI Bus Sharing in Physical Mode to share these disks between nodes.
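To make the 20-30% overhead guideline above concrete, here is a small sizing helper. The example figures (disk sizes, memory, 25% overhead) are placeholders, not recommendations.

    def datastore_size_gb(vmdk_sizes_gb, vm_memory_gb, overhead_ratio=0.25):
        # Rough datastore sizing: VMDKs + swapfile (worst case, no memory reservation) + overhead.
        base = sum(vmdk_sizes_gb) + vm_memory_gb
        return base * (1 + overhead_ratio)

    # Example Oracle VM: 60 GB OS disk, 500 GB DB disk, 100 GB redo/log disk, 64 GB RAM.
    print(round(datastore_size_gb([60, 500, 100], 64), 1))   # 905.0 GB with 25% overhead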
7-) Network:
- Use VMXNet3 vNIC in all Oracle VMs for max. performance and throughput and least CPU overhead.
- Oracle VMs port group should have at least 2 physical NICs for redundancy and NIC teaming capabilities.
Connect each physical NIC to a different physical switch for max. redundancy.
- Consider network separation between different types of networks, like: vMotion, Management, Oracle
production, Oracle Replication, Fault Tolerance, etc. Network separation is either physical or virtual using
VLANs.
- Clustered Oracle VMs should have two vNICs: one for the public network and the other for the heartbeat and replication network. It's better to dedicate a physical NIC on the ESXi hosts to the replication network between clustered Oracle RAC VMs.
8-) Monitoring:
Try to establish a performance baseline for your Oracle VMs and VI by monitoring the following:
- ESXi Hosts and VMs counters:
Resource | Metric (esxtop/resxtop) | Metric (vSphere Client) | Host/VM | Description
CPU | %USED | Used | Both | CPU used over the collection interval (%)
CPU | %RDY | Ready | VM | CPU time spent in ready state
CPU | %CSTP | Co-Stop | VM | Percentage of time a vCPU spent in ready, co-descheduled state. Only meaningful for SMP virtual machines.
CPU | %MLMTD | - | VM | Percentage of time a vCPU was ready to run but was deliberately not scheduled due to CPU limits.
CPU | %SYS | System | Both | Percentage of time spent in the ESX/ESXi Server VMkernel
Memory | Swapin, Swapout | Swapinrate, Swapoutrate | Both | Memory the ESX/ESXi host swaps in/out from/to disk (per virtual machine, or cumulative over host)
Memory | MCTLSZ (MB) | vmmemctl | Both | Amount of memory reclaimed from the resource pool by way of ballooning
Disk | READs/s, WRITEs/s | NumberRead, NumberWrite | Both | Reads and writes issued in the collection interval
Disk | DAVG/cmd | deviceLatency | Both | Average latency (ms) of the device (LUN)
Disk | KAVG/cmd | KernelLatency | Both | Average latency (ms) in the VMkernel, also known as queuing time
Network | MbRX/s, MbTX/s | Received, Transmitted | Both | Amount of data received/transmitted per second
Network | PKTRX/s, PKTTX/s | PacketsRx, PacketsTx | Both | Received/transmitted packets per second
Network | %DRPRX, %DRPTX | DroppedRx, DroppedTx | Both | Receive/transmit dropped packets per second
In-guest monitoring is important as well, as CPU and memory usage is more accurately obtained using in-guest counters.
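As a starting point for baselining, the sketch below flags counters that exceed commonly cited warning thresholds. The threshold values are illustrative assumptions, not official limits, and should be tuned to your own baseline.

    # Illustrative warning thresholds per esxtop counter (assumed values).
    THRESHOLDS = {
        "%RDY": 10.0,      # per-vCPU ready time (%)
        "%CSTP": 3.0,      # co-stop (%)
        "DAVG/cmd": 25.0,  # device latency (ms)
        "KAVG/cmd": 2.0,   # kernel latency (ms)
        "MCTLSZ": 0.0,     # any ballooning deserves a look
        "%DRPRX": 0.0,     # dropped receive packets
    }

    def check_sample(sample):
        # Return the counters in an esxtop-style sample that exceed their thresholds.
        return {k: v for k, v in sample.items() if k in THRESHOLDS and v > THRESHOLDS[k]}

    print(check_sample({"%RDY": 14.2, "%CSTP": 0.5, "DAVG/cmd": 31.0, "MCTLSZ": 0.0}))
    # {'%RDY': 14.2, 'DAVG/cmd': 31.0}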
Manageability
1-) Oracle Support statement for VMware:


2-) VMware Expanded support for Oracle DB:
https://www.vmware.com/support/policies/oracle-support
3-) Try to leverage a golden template to provide a base for your Oracle VMs. It'll reduce the time required for deploying or scaling your Oracle environment as well as preserve configuration consistency throughout your environment.
4-) Time synchronization is one of the most important things in Oracle environments. It's recommended to do the following:
- Let all your Oracle VMs sync their time according to the following best practices:
Windows: http://kb.vmware.com/selfservice/microsites/search.do?language=en_US&cmd=displayKC&externalId=1318
Linux: http://kb.vmware.com/selfservice/microsites/search.do?language=en_US&cmd=displayKC&externalId=1006427
- Disable time-sync between Oracle VMs and hosts via VMware Tools completely (even after unchecking the box on the VM settings page, the VM can still sync with the host through VMware Tools on startup, resume, snapshot operations, etc.) according to the following KB:
http://kb.vmware.com/selfservice/microsites/search.do?language=en_US&cmd=displayKC&externalId=1189
- Sync all ESXi hosts in the VI to the same Stratum 1 NTP server, which is the same time source as your forest/domain.
5-) Leverage CPU/Memory Hot Add with Oracle VMs to scale them as needed; see the Scalability section below for how Oracle consumes hot-added resources.

Scalability
- vSphere supports sizing Oracle VMs using a scale-out or a scale-up approach. The scale-up approach for Oracle VMs requires large ESXi hosts with many sockets and much RAM. It reduces the number of VMs required to serve a certain number of transactions, and hence a single failed VM will affect a large portion of users. That's why the scale-up approach needs careful attention to the availability of the Oracle VMs. At the same time, it reduces the cost of software licenses and physical hosts. The scale-out approach requires smaller ESXi hosts and gives more flexibility in designing an Oracle VM, but requires a higher number of ESXi hosts to provide the required level of availability and more software licenses, and hence more cost. A single VM failure has less effect using the scale-out approach, and smaller VMs require less time for migration using vMotion, so DRS will be more effective. There's no single best approach here; it all depends on your environment and your requirements.

- Leverage CPU/Memory Hot Add with Oracle VMs to scale them as needed. Oracle (mainly the 64-bit versions) is hot-add aware and there's no need to reboot a VM after a hot-add operation; it'll use the added resources immediately.

SharePoint 2013

Availability
1-) For the Web Server role, it's recommended to deploy many Web Server VMs behind load balancers (hardware or virtual appliances) to provide load balancing and high availability. For additional availability, leverage vSphere HA with VM Monitoring to restart any failed Web Server VMs on other hosts for better availability and minimum downtime.
2-) For the Application Server role, it's recommended to deploy many Application Server VMs to provide load balancing and high availability. The SharePoint 2013 Farm will automatically balance the user load between all Application Server VMs. For additional availability, leverage vSphere HA with VM Monitoring to restart any failed Application Server VMs on other hosts for better availability and minimum downtime.
3-) For DB Servers, it's recommended to use SQL Server native availability techniques combined with vSphere HA, VM Monitoring and ApplicationHA for max. availability. Some SQL native availability techniques can be used with all types of SharePoint 2013 Farm DBs while other techniques can't. For example:
Availability Technique | Configuration DB | Central Administration DB | Content DB
DB Mirroring High Safety Mode | Yes | Yes | Yes
DB Mirroring High Performance Mode / Log Shipping | No | Yes | Yes
SQL 2012 AAG Synchronous-commit Mode | Yes | Yes | Yes
SQL 2012 AAG Asynchronous-commit Mode | No | No | Yes
For more information: http://technet.microsoft.com/en-us/library/jj841106(v=office.15).aspx
4-) For Search Service availability:
- Deploy two or more Query Servers, each with a main index partition and a mirror of the other partition, for load balancing and redundancy.
- Deploy two or more Crawl DB Servers, each one holding a Crawl DB for a Crawl Server, for more redundancy and load balancing.
- Protect the Crawl DB with a suitable SQL native availability technique for higher availability if needed. For more information: http://technet.microsoft.com/en-us/library/jj841106(v=office.15).aspx
- Deploy two or more Crawl Servers, each with two or more Crawler Services, each of them connected to a different Crawl DB.
This provides the highest level of redundancy and load balancing for large environments with the Enterprise Search Service. For more information: http://technet.microsoft.com/en-us/library/gg502595.aspx
5-) As Microsoft supports vMotion of SQL VMs, use DRS clusters in Fully Automated Mode. It'll always load-balance your SharePoint VMs across the cluster, respecting all of your configured affinity/anti-affinity rules. For the SQL Servers in your SharePoint Farms, consider all best practices while deploying the vMotion network to support large SQL VM migrations.

6-) For Web Server/Application Server VMs, use DRS VM anti-affinity rules to separate them over different hosts. When HA restarts a VM, it'll not respect the anti-affinity rule, but on the following DRS invocation the VM will be migrated to respect the rule. In vSphere 5.1, configure the vSphere Cluster with the ForeAffinePowerOn option set to 1 to respect all VM Affinity/Anti-affinity rules. In vSphere 5.5, configure the vSphere Cluster with both ForeAffinePowerOn & das.respectVmVmAntiAffinityRules set to 1 to respect all VM Affinity/Anti-affinity rules respectively.
7-) Try to leverage VM Monitoring to mitigate the risk of Guest OS failure. VMware Tools inside SharePoint VMs will send heartbeats to the HA driver on the host. If they stop because of a Guest OS failure, the host will monitor IO and network activity of the VM for a certain period. If there's also no activity, the host will restart the VM. This adds an additional layer of availability for SharePoint VMs.
8-) Try to leverage the Symantec Application HA Agent for SharePoint with vSphere HA for max. availability. Using Application HA, the monitoring agent will monitor the SQL instance and SharePoint services, sending heartbeats to the HA driver on the ESXi host. In case of application failure, it may restart services or mount databases. If the Application HA Agent can't recover the application from that failure, it'll stop sending heartbeats and the host will initiate a VM restart as an HA action.
Performance
1-) Distributed Cache Application VMs should have their configured memory reserved. They heavily depend on their memory as a cache for the entire SharePoint Farm, so they must not participate in any memory reclamation techniques, like ballooning.
2-) You should have a good knowledge of the DBs used in a SharePoint 2013 Farm and their performance characteristics before virtualizing. SharePoint 2013 supports only SQL Server 2008 R2 or 2012. For more information about SharePoint 2013 DBs: http://technet.microsoft.com/en-us/library/cc678868(v=office.15).aspx
For a graphical poster: http://www.microsoft.com/en-us/download/confirmation.aspx?id=30363
3-) As it's hard to give performance recommendations for green-field deployments of SharePoint 2013 Farms, Microsoft has published its own performance tests for some scenarios and recommendations based on these tests: http://technet.microsoft.com/en-us/library/ff608068(v=office.15).aspx
4-) For capacity planning of a SharePoint 2013 Farm, it's recommended to follow Microsoft's recommendations, as it's really difficult to provide standard guidance: http://technet.microsoft.com/en-us/library/ff758645(v=office.15).aspx
You can also check Microsoft's published case studies about different deployment scenarios with different capacities: http://technet.microsoft.com/en-us/library/cc261716(v=office.14).aspx
5-) Follow Microsoft best practices for the backend SQL Server: http://technet.microsoft.com/en-us/library/hh292622(v=office.15).aspx
6-) CPU Sizing:
- Assign vCPUs as required (using the Hot Add feature) and don't over-allocate to the VM, to prevent CPU scheduling issues at the hypervisor level and high RDY time. This approach can be applied to the three roles in your SharePoint Farm: Web, Application and DB VMs. For Application and Web Server VMs, it's sometimes easier and better to follow a scale-out approach by creating additional VMs to serve more load than a scale-up approach. Besides, vSphere DRS balances smaller VMs across the cluster more easily than larger VMs. Generally speaking, better CPU utilization in your SharePoint Farm means higher throughput and lower latency.
- Don't over-commit CPUs. The ratio of virtual to physical cores should be 2:1 max (better to keep it nearly 1:1) for mission-critical SharePoint VMs. In some cases, like small environments, over-commitment is allowed after establishing a performance baseline.
- Enable Hyperthreading when available. It won't double the processing power (despite the doubled number of logical cores shown on the ESXi host), but it'll give a CPU processing boost of up to 20-25% in some cases. Don't consider it when calculating the virtual-to-physical core ratio.
- The ESXi hypervisor is NUMA-aware and leverages the NUMA topology to gain a significant performance boost. Try to size your SharePoint VMs to fit inside a single NUMA node to gain the performance boost of NUMA node locality.
- On your SQL Server, set the Maximum Degree of Parallelism setting to 1 in the SQL Server advanced properties to control how SQL Server divides incoming requests between the VM's vCPUs.
- Generally speaking, SQL Server, Web Server and Application Server VMs can start with 4 vCPUs and then be scaled up, down or out according to your environment.
7-) Memory Sizing:
- Don't over-commit memory, as SharePoint 2013 is a memory-intensive application. If needed, reserve the configured memory to provide the required performance level (more memory equals more caching, better throughput and lower latency). Keep in mind that memory reservation affects other aspects, like the HA slot size and vMotion chances and time. In addition, reserving memory removes VM swapfiles from datastores, and hence their space is usable for adding more VMs. For some cases, where there are a lot of underutilized SharePoint servers, over-commitment is allowed to get higher consolidation ratios. Performance monitoring is mandatory in this case to maintain a baseline of normal-state utilization.
- Leverage the Memory Hot-Add feature to scale your VMs quickly. Keep in mind that some SharePoint servers, like the Distributed Cache, can't use the added memory until a reboot.
- For Web Server and Application Server VMs, the minimum recommended memory is 8GB for small environments, scaling up to 16GB for large ones. Adding more user load on your Web Servers or more applications on your Application Servers will require adding more memory than the recommended minimum.
- For Content DB memory sizing:
Combined size of content databases | RAM recommended for computer running SQL Server
Minimum for small production deployments | 8 GB
Minimum for medium production deployments | 16 GB
Recommendation for up to 2 terabytes | 32 GB
Recommendation for the range of 2 terabytes to 5 terabytes | 64 GB
Recommendation for more than 5 terabytes | >64 GB (estimated according to your DB size to provide enough cache to improve your SQL Server performance)
Keep in mind that leveraging any SQL availability technique that creates additional secondary copies of the DB will require sizing the secondary SQL Server node with the same memory size to provide the same performance in case of failover. When using SQL 2012 AAGs with readable secondaries, sizing the secondary SQL Server node properly will improve read performance. A small lookup based on the table above is sketched below.
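The sketch below turns the size-keyed rows of the table above into a lookup; it ignores the small/medium minimums, which are not keyed to content DB size, and all figures simply restate the table.

    def sql_ram_for_content_gb(total_content_db_gb):
        # Recommended RAM (GB) for the SQL Server hosting SharePoint content DBs.
        if total_content_db_gb <= 2048:      # up to 2 TB
            return 32
        if total_content_db_gb <= 5120:      # 2 TB to 5 TB
            return 64
        return None                          # >5 TB: estimate case by case to cache enough of the DB

    print(sql_ram_for_content_gb(1500))   # 32
    print(sql_ram_for_content_gb(4000))   # 64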
8-) Storage Sizing:
- Always consider any storage space overhead while calculating the VM space required. Overhead can be: swapfiles, VM logs or snapshots. It's recommended to add 20-30% of space as overhead.
- All recommended best practices for deploying SQL Server storage must be applied when deploying the SharePoint Farm backend DB Servers.
- Separate different SharePoint VM disks onto different (dedicated, if needed) datastores to avoid IOps contention, as SharePoint is an IO-intensive application with many components, each with different IOps requirements.
- Provide at least 4 paths, through two HBAs, between each ESXi host and the storage array for max. availability.
- RDM can be used in many cases, like P2V migration or to leverage a 3rd-party array-based backup tool. Choosing RDM disks or VMFS-based disks is based on your technical requirements; there is no performance difference between these two types of disks.
- Partition alignment gives a performance boost to your backend storage, as spindles will not make two reads or writes to process a single request. VMFS5 datastores created using the vSphere (Web) Client are aligned automatically, as are disks formatted using newer versions of Windows. Any upgraded VMFS datastores or upgraded Windows guests will require a partition alignment process. For upgraded VMFS, it's done by migrating VM disks to another datastore using Storage vMotion, then reformatting and recreating the datastore as VMFS5.
- Use the Paravirtual SCSI driver in all of your SharePoint VMs, especially for disks used for DB and logs, for max. performance, least latency and least CPU overhead.
9-) Network Sizing:
- Your Application and Web Servers should have two vNICs: one for public communication with users and the other for backend communication with the SQL DB VMs.
- Use the VMXNET3 vNIC in all SharePoint VMs for max. performance and throughput and least CPU overhead.
- The SharePoint VMs' port group should have at least 2 physical NICs for redundancy and NIC teaming capabilities. Connect each physical NIC to a different physical switch for max. redundancy.
- Consider network separation between different types of networks, like vMotion, Management, SharePoint production, SharePoint backend communication, Fault Tolerance, etc. Network separation is either physical or virtual using VLANs.
- It's better to dedicate a physical NIC on the ESXi hosts to the backend communication network between the Application/Web VMs and the SQL DB VMs.
- Provided that your design depends on creating multiple redundant instances of all SharePoint roles, you can keep one Web, one Application and one backend DB Server VM as one unit on a single ESXi host. This makes all their backend communication local to the host, which provides much more throughput than your network and much lower latency. Use DRS VM affinity rules to keep these VMs together. Create many units of the three VMs and distribute them over your ESXi hosts for higher availability. Use VM-Host Should Affinity rules to control which unit runs on which host.
- For your DB Servers, dedicate a physical NIC on the ESXi hosts hosting them to the replication traffic between redundant instances to keep them in tight lockstep for better availability and better RPO.
10-) Monitoring:
Try to establish a performance baseline for your SharePoint VMs and VI by monitoring the following:
- ESXi Hosts and VMs counters:
Resource | Metric (esxtop/resxtop) | Metric (vSphere Client) | Host/VM | Description
CPU | %USED | Used | Both | CPU used over the collection interval (%)
CPU | %RDY | Ready | VM | CPU time spent in ready state
CPU | %CSTP | Co-Stop | VM | Percentage of time a vCPU spent in ready, co-descheduled state. Only meaningful for SMP virtual machines.
CPU | %MLMTD | - | VM | Percentage of time a vCPU was ready to run but was deliberately not scheduled due to CPU limits.
CPU | %SYS | System | Both | Percentage of time spent in the ESX/ESXi Server VMkernel
Memory | Swapin, Swapout | Swapinrate, Swapoutrate | Both | Memory the ESX/ESXi host swaps in/out from/to disk (per virtual machine, or cumulative over host)
Memory | MCTLSZ (MB) | vmmemctl | Both | Amount of memory reclaimed from the resource pool by way of ballooning
Disk | READs/s, WRITEs/s | NumberRead, NumberWrite | Both | Reads and writes issued in the collection interval
Disk | DAVG/cmd | deviceLatency | Both | Average latency (ms) of the device (LUN)
Disk | KAVG/cmd | KernelLatency | Both | Average latency (ms) in the VMkernel, also known as queuing time
Network | MbRX/s, MbTX/s | Received, Transmitted | Both | Amount of data received/transmitted per second
Network | PKTRX/s, PKTTX/s | PacketsRx, PacketsTx | Both | Received/transmitted packets per second
Network | %DRPRX, %DRPTX | DroppedRx, DroppedTx | Both | Receive/transmit dropped packets per second
- In-guest counters:
For all the in-guest counters that need to be monitored, see: http://technet.microsoft.com/en-us/library/ff758658.aspx
Recoverability
1-) Use VMware Site Recovery Manager (SRM) if available for DR. With SRM, automated failover to a replicated copy of the VMs in your DR site can be carried out in case of a disaster, or even a failure of a single VM in your SharePoint Farm.
2-) If VMware SRM isn't available, you can leverage some availability features of SQL itself for more recoverability of your backend DB infrastructure. You can either use a mix of AAG Synchronous/Asynchronous replicas or Database Mirroring in High Safety Mode with Log Shipping in the DR site. This approach leads to lower cost, but with higher management overhead and higher RPO/RTO than using VMware SRM.
3-) For the least protection, use warm clones of your VMs in the DR site that are ready to be powered up and deployed in case of disaster. This approach requires a consistent backup/restore cycle of your SharePoint Farm VMs. For more information about SharePoint Farm DR:
http://technet.microsoft.com/en-us/library/ff628971.aspx
4-) Try to leverage the native backup techniques in SharePoint. For more information:
http://technet.microsoft.com/en-us/library/ee428315(v=office.15).aspx
5-) Try to leverage any backup software that uses the Microsoft Volume Shadow Copy Service (VSS). Such software is SQL-aware and doesn't cause any corruption in the DB because it quiesces the DB during the backup operation. In addition, it is SharePoint-aware and uses VSS writers to back up any Application or Web Server without any interruption. Of course, one of them is vSphere Data Protection Advanced. Check the following link: http://www.vmware.com/files/pdf/products/vsphere/VMware-vSphere-Data-Protection-Product-FAQ.pdf
Manageability
1-) Microsoft support for SharePoint 2013 virtualization: http://technet.microsoft.com/en-us/library/ff607936(v=office.15).aspx
2-) Try to leverage the vApp feature in vSphere. It can be really helpful in packaging and exporting a group of SharePoint VMs with certain reserved resources for development or testing.
3-) Use vCenter Operations Manager to monitor your environment's performance trends, establish a dynamic baseline of your VMs' performance to prevent false static alerts, estimate the capacity required for further scaling, and proactively protect your environment against sudden peaks of VM load that need immediate scaling-up of resources.
4-) Use the SharePoint Products Preparation Tool found on the SharePoint media to install all prerequisites on your SharePoint Servers.
5-) Install the SharePoint Server binaries on all required Application and Web Server VMs before performing any configuration on any one of them, to achieve configuration consistency and a stable SharePoint Farm.
6-) Time synchronization is one of the most important things in SharePoint environments. It's recommended to do the following:
- Let all your SharePoint VMs sync their time with DCs only, not with VMware Tools.
- Disable time-sync between SharePoint VMs and hosts via VMware Tools completely (even after unchecking the box on the VM settings page, the VM can still sync with the host through VMware Tools on startup, resume, snapshot operations, etc.) according to the following KB:
http://kb.vmware.com/selfservice/microsites/search.do?language=en_US&cmd=displayKC&externalId=1189
- Sync all ESXi hosts in the VI to the same Stratum 1 NTP server, which is the same time source as your forest/domain.
7-) Make sure that you enable the Full recovery model on all SharePoint DBs that will be included in your SQL AAGs, and also make sure that at least a single full backup is taken.
Scalability

1-) A SharePoint 2013 Farm contains many SQL DBs that differ in their scalability approaches according to the number of each allowed in the farm, their performance characteristics and the max. recommended size. Generally speaking, the Configuration DB and Central Administration DB must be co-located and both will never grow beyond 1 GB. A SharePoint 2013 Farm must have only one of each of them, and hence, in the rare case that you need to expand either, you should scale them up, not out. The Content DB will grow according to the deployment of your SharePoint 2013 Farm and can grow beyond 1 TB. It's recommended to keep it below 200GB for max. performance. For more scalability, scale out your Web Servers and add another Content DB that should also be kept below 200GB, and so on (see the sketch below). For more information:
http://technet.microsoft.com/en-us/library/cc678868(v=office.15).aspx#
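As a quick illustration of the 200GB-per-content-DB guideline above, the sketch below estimates how many content databases to plan for. The 200 GB cap restates the guideline above; the example total is illustrative only.

    import math

    def content_dbs_needed(total_content_gb, max_db_size_gb=200):
        # Number of content DBs to plan for if each is kept below ~200 GB.
        return math.ceil(total_content_gb / max_db_size_gb)

    print(content_dbs_needed(750))   # 4 content databases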
2-) Microsoft has released reference topologies for different sizes of SharePoint environments with the required components. These can be a starting point for sizing your environment to achieve the required performance, scalability and availability levels. Check SharePoint 2010: http://technet.microsoft.com/en-us/library/cc263044.aspx
SharePoint 2013: http://www.microsoft.com/en-eg/download/details.aspx?id=30377
3-) Leverage CPU/Memory Hot Add with SharePoint VMs to scale them as needed. Some VMs, like SQL Server, may use added resources without a reboot, while others, like the Distributed Cache Server, will need a reboot to use them.
4-) Try to leverage vSphere Templates in your environment. Create a golden template for every tier of your VMs. This reduces the time required for deploying or scaling your SharePoint environment as well as preserving configuration consistency throughout your environment.

SAP HANA

Availability

1-) Leverage vMotion with your SAP HANA VMs. Make sure that the destination host has the required resources to run the migrated VM.
2-) Make sure to enable DRS in Fully Automated Mode on the cluster hosting SAP HANA VMs. SAP HANA supports migration of its VMs using vMotion.
3-) Use DRS anti-affinity rules to keep SAP HANA VMs apart, and use VM-Host Should Affinity rules to keep SAP HANA VMs on their certified ESXi hosts only.
4-) Try to leverage the different SAP HANA high availability solutions together with vSphere HA. Check:
http://www.saphana.com/servlet/JiveServlet/previewBody/2775-102-4-9467/HANA_HA_2.1.pdf
5-) Make sure to add the automatic SAP HANA start parameter to the SAP HANA configuration file to enable SAP HANA automatic restart after a reboot in case of an HA event.
6-) Try to leverage VM Monitoring to mitigate the risk of Guest OS failure. VMware Tools inside SAP HANA VMs will send heartbeats to the HA driver on the host. If they stop because of a Guest OS failure, the host will monitor IO and network activity of the VM for a certain period. If there's also no activity, the host will restart the VM. This adds an additional layer of availability for SAP HANA VMs.
7-) Try to leverage the Symantec Application HA Agent for SAP HANA with vSphere HA for max. availability. Using Application HA, the monitoring agent will monitor SAP HANA instance-related services, sending heartbeats to the HA driver on the ESXi host. In case of application failure, it may restart services. If the Application HA Agent can't recover the application from that failure, it'll stop sending heartbeats and the host will initiate a VM restart as an HA action.

Performance
1-) Follow all VMware best practices for latency-sensitive applications:
http://www.vmware.com/files/pdf/techpaper/VMW-Tuning-Latency-Sensitive-Workloads.pdf
2-) Configure the following BIOS settings on each ESXi host:
Settings | Recommended Value | Description
Virtualization Technology | Yes | Necessary to run 64-bit guest operating systems.
Turbo Mode | Yes | Balanced workload over unused cores.
Node Interleaving | No | Disables NUMA benefits if set to Yes.
VT-x, AMD-V, EPT, RVI | Yes | Hardware-based virtualization support.
C1E Halt State | No | Disable if performance is more important than saving power.
Power-Saving | No | Disable if performance is more important than saving power.
Virus Warning | No | Disables warning messages when writing to the master boot record.
Hyperthreading | Yes | For use with some Intel processors. Hyperthreading is always recommended with Intel's newer Core i7 processors such as the Xeon 5500 series.
Video BIOS Cacheable | No | Not necessary for database virtual machine.
Wake On LAN | Yes | Required for VMware vSphere Distributed Power Management feature.
Execute Disable | Yes | Required for vMotion and VMware vSphere Distributed Resource Scheduler (DRS) features.
Video BIOS Shadowable | No | Not necessary for database virtual machine.
Video RAM Cacheable | No | Not necessary for database virtual machine.
On-Board Audio | No | Not necessary for database virtual machine.
On-Board Modem | No | Not necessary for database virtual machine.
On-Board Firewire | No | Not necessary for database virtual machine.
On-Board Serial Ports | No | Not necessary for database virtual machine.
On-Board Parallel Ports | No | Not necessary for database virtual machine.
On-Board Game Port | No | Not necessary for database virtual machine.
3-) Remove unnecessary services from the Guest OS, which is SUSE Linux; for example: IPTables, Autofs and cups.
4-) Turn off the SLES kernel dump function (kdump) if it is not needed for specific reasons, for example a root cause analysis.
5-) Configure the SLES kernel parameter as described below:
net.ipv4.tcp_slow_start_after_idle=0
6-) Adhere to the shared memory (kernel.shmmni) settings as described below:
Deployment Size | shmmni Value | Physical Memory Size
Small | 4096 | 24 GB to 64 GB
Medium | 65536 | 64 GB to 256 GB
Large | 524288 | > 256 GB
7-) Set the VM to Automatically Choose Best CPU/MMU Virtualization Mode.
8-) CPU Sizing:
- Assign vCPUs as required (using the Hot Add feature) and don't over-allocate to the VM, to prevent CPU scheduling issues at the hypervisor level and high RDY time.
- Don't over-commit CPUs. It's better to keep the virtual-to-physical core ratio nearly 1:1 for mission-critical SAP HANA VMs. In some cases, like test environments, over-commitment is allowed after establishing a performance baseline.
- Enable Hyperthreading when available. It won't double the processing power (despite the doubled number of logical cores shown on the ESXi host), but it'll give a CPU processing boost of up to 10-20%. Don't consider it when calculating the virtual-to-physical core ratio.
- The ESXi hypervisor is NUMA-aware and leverages the NUMA topology to gain a significant performance boost. Try to size your SAP HANA VMs to fit inside a single NUMA node to gain the performance boost of NUMA node locality.
- For large SAP HANA VMs, SAP HANA is NUMA-aware, so enabling vNUMA on wide VMs (those that span multiple NUMA nodes) will give better performance. In addition, pin each vCPU to its NUMA node to prevent migrations from one physical NUMA node to another by setting the following advanced settings in the VM Configuration Parameters:
sched.vcpu0.affinity = 0-19
sched.vcpu1.affinity = 0-19
...
sched.vcpu9.affinity = 0-19
sched.vcpu10.affinity = 20-39
sched.vcpu11.affinity = 20-39
...
sched.vcpu19.affinity = 20-39
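To avoid typing these entries by hand, the small helper below prints the same pattern. The 20-vCPU, two-node layout mirrors the example above and is an assumption, not a sizing recommendation.

    def vcpu_affinity_lines(vcpus_per_node, threads_per_node, nodes=2):
        # Emit sched.vcpuN.affinity entries that pin each vCPU to one NUMA node's threads.
        lines = []
        for node in range(nodes):
            lo, hi = node * threads_per_node, (node + 1) * threads_per_node - 1
            for v in range(node * vcpus_per_node, (node + 1) * vcpus_per_node):
                lines.append(f"sched.vcpu{v}.affinity = {lo}-{hi}")
        return lines

    for line in vcpu_affinity_lines(vcpus_per_node=10, threads_per_node=20):
        print(line)   # sched.vcpu0.affinity = 0-19 ... sched.vcpu19.affinity = 20-39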
9-) Memory Sizing:
- Don't over-commit memory, as SAP HANA is a memory-intensive application. If needed, reserve the configured memory to provide the required performance level. Keep in mind that memory reservation affects other aspects, like the HA slot size and vMotion chances and time. In addition, reserving memory removes VM swapfiles from datastores, and hence their space is usable for adding more VMs. For some cases, like test environments, over-commitment is allowed to get higher consolidation ratios. Performance monitoring is mandatory in this case to maintain a baseline of normal-state utilization.
- Use Large Memory Pages (the HugePages feature in SUSE Linux 11) to give a roughly 10% performance boost to your SAP HANA VMs. It's enabled by default since SUSE Linux 11 SP2.
- As Linux VMs only touch the memory pages they need while booting, setting a memory reservation won't allocate all the reserved memory during the boot process; it'll only allocate and reserve the touched memory. For SAP HANA Linux VMs, all configured memory should be pre-allocated using the following advanced settings in the VM Configuration Parameters:
sched.mem.prealloc=True
sched.swap.vmxSwapEnabled=False
- In order to achieve the absolute lowest possible latency for SAP HANA, it's recommended to set the Latency Sensitivity setting to High in the VM's advanced settings.
- As SAP HANA instances usually need large memory reservations, don't forget to calculate and account for memory overhead. For large-memory VMs, memory overhead can be several GBs of memory.
10-) Storage Sizing:
- Separate different SAP HANA VM disks onto different (dedicated, if needed) datastores to avoid IOps contention, as SAP HANA is an IO-intensive application with many components, each with different IOps requirements.
- Provide at least 4 paths, through two HBAs, between each ESXi host and the storage array for max. availability.
- RDM can be used in many cases, like P2V migration or to leverage a 3rd-party array-based backup tool. Choosing RDM disks or VMFS-based disks is based on your technical requirements; there is no performance difference between these two types of disks.
- Don't use IBM GPFS with your virtualized SAP HANA instances, as it won't support the following:
- VMware vMotion, Distributed Resource Scheduler (DRS), Fault Tolerance (FT) and Cloning.
- N_Port ID Virtualization (NPIV).
- Running on mixed VMware ESXi versions.
IBM GPFS supports running only with physical-mode RDMs.
- Use the Paravirtual SCSI driver in all of your SAP HANA VMs for max. performance, least latency and least CPU overhead.
- Distribute any SAP HANA VM disks across the four available virtual SCSI adapters for max. parallelism and higher IOps. It's recommended to use eager-zeroed thick disks for DB and log disks.
- Partition alignment gives a performance boost to your backend storage, as spindles will not make two reads or writes to process a single request. Datastores created using the vSphere (Web) Client are natively aligned.
- It's recommended to use the NOOP scheduler as the IO scheduler in your SAP HANA Linux VMs. For more information: http://kb.vmware.com/selfservice/microsites/search.do?language=en_US&cmd=displayKC&externalId=2011861
11-) Network Sizing:
- Use the VMXNET3 vNIC in all SAP HANA VMs for max. performance and throughput and least CPU overhead.
- Try to leverage the vSphere Distributed Switch (vDS) to preserve consistency in your network configuration between all ESXi hosts. vDS also provides many advanced features that don't exist in the Standard Switch, like Private VLANs and NetFlow.
12-) Monitoring:
Try to establish a performance baseline for your SAP HANA VMs and VI by monitoring the following:
- ESXi Hosts and VMs counters:
Resource | Metric (esxtop/resxtop) | Metric (vSphere Client) | Host/VM | Description
CPU | %USED | Used | Both | CPU used over the collection interval (%)
CPU | %RDY | Ready | VM | CPU time spent in ready state
CPU | %CSTP | Co-Stop | VM | Percentage of time a vCPU spent in ready, co-descheduled state. Only meaningful for SMP virtual machines.
CPU | %MLMTD | - | VM | Percentage of time a vCPU was ready to run but was deliberately not scheduled due to CPU limits.
CPU | %SWPWT | - | VM | Virtual machine waiting on swapped pages to be read from disk. This can indicate overcommitted memory.
CPU | %SYS | System | Both | Percentage of time spent in the ESX/ESXi Server VMkernel
Memory | Swapin, Swapout | Swapinrate, Swapoutrate | Both | Memory the ESX/ESXi host swaps in/out from/to disk (per virtual machine, or cumulative over host)
Memory | MCTLSZ (MB) | vmmemctl | Both | Amount of memory reclaimed from the resource pool by way of ballooning
Memory | N%L | - | VM | If less than 80, the virtual machine is experiencing poor NUMA locality. If the virtual machine has a memory size greater than the amount of memory local to each processor, the ESXi scheduler does not attempt to use NUMA optimizations for that virtual machine.
Disk | READs/s, WRITEs/s | NumberRead, NumberWrite | Both | Reads and writes issued in the collection interval
Disk | DAVG/cmd | deviceLatency | Both | Average latency (ms) of the device (LUN)
Disk | KAVG/cmd | KernelLatency | Both | Average latency (ms) in the VMkernel, also known as queuing time
Disk | ABRTS/s | - | VM | Aborts issued by the virtual machine because the storage is not responding. For Windows virtual machines, this happens after a 60-second default. This can be caused by path failure, or when the storage array is not accepting I/O.
Disk | RESET/s | - | VM | The number of command resets per second.
Network | MbRX/s, MbTX/s | Received, Transmitted | Both | Amount of data received/transmitted per second
Network | PKTRX/s, PKTTX/s | PacketsRx, PacketsTx | Both | Received/transmitted packets per second
Network | %DRPRX, %DRPTX | DroppedRx, DroppedTx | Both | Receive/transmit dropped packets per second

Manageability
1-) SAP HANA instance virtualization is supported for production with vSphere 5.5 and SAP HANA SPS 7.
2-) SAP has released the use of multiple parallel SAP HANA VMs on VMware vSphere 5.5 into controlled availability, allowing selected customers, depending on their scenarios and system sizes, to go live with this configuration.
3-) It's recommended to use vSphere Host Profiles while configuring the ESXi hosts that will host SAP HANA instances. Host Profiles preserve configuration consistency between the ESXi hosts in the cluster, which is crucial for a cluster hosting SAP HANA instances to achieve high performance.
Recoverability

1-) Use VMware Site Recovery Manager (SRM) if available for DR. With SRM, automated failover to a replicated copy of the VMs in your DR site can be carried out in case of a disaster, or even a failure of a single VM in your SAP HANA environment.

Scalability
1-) Try to leverage vSphere Templates in your environment. Create a golden template for every tier of your VMs. This reduces the time required for deploying or scaling your SAP HANA environment as well as preserving configuration consistency throughout your environment.

Java Enterprise Applications
Availability
1-) Try to use vSphere HA with your Java VMs to provide a decent level of availability.
2-) Try to separate your Java VMs from the same tier across different racks, blade chassis and storage arrays if available, using VM Affinity/Anti-affinity rules and Storage Affinity/Anti-affinity rules for maximum availability. Keep in mind that when HA restarts a VM it'll not respect the anti-affinity rule, but on the following DRS invocation the VM will be migrated to respect the rule. In vSphere 5.1, configure the vSphere Cluster with the ForeAffinePowerOn option set to 1 to respect all VM Affinity/Anti-affinity rules. In vSphere 5.5, configure the vSphere Cluster with both ForeAffinePowerOn & das.respectVmVmAntiAffinityRules set to 1 to respect all VM Affinity/Anti-affinity rules respectively.
3-) Try to leverage VM Monitoring to mitigate the risk of Guest OS failure. VMware Tools inside Java VMs will send heartbeats to the HA driver on the host. If they stop because of a Guest OS failure, the host will monitor IO and network activity of the VM for a certain period. If there's also no activity, the host will restart the VM. This adds an additional layer of availability for Java VMs.
4-) Try to leverage the Symantec Application HA Agent with vSphere HA for max. availability. Using Application HA, the monitoring agent will monitor the application instance and its services, sending heartbeats to the HA driver on the ESXi host. In case of application failure, it may restart services and any dependent resources. If the Application HA Agent can't recover the application from that failure, it'll stop sending heartbeats and the host will initiate a VM restart as an HA action.

Performance
1-) Follow all VMware best practices for latency-sensitive applications:
http://www.vmware.com/files/pdf/techpaper/VMW-Tuning-Latency-Sensitive-Workloads.pdf
2-) Configure the following BIOS settings on each ESXi host:
Settings | Recommended Value | Description
Virtualization Technology | Yes | Necessary to run 64-bit guest operating systems.
Turbo Mode | Yes | Balanced workload over unused cores.
Node Interleaving | No | Disables NUMA benefits if set to Yes.
VT-x, AMD-V, EPT, RVI | Yes | Hardware-based virtualization support.
C1E Halt State | No | Disable if performance is more important than saving power.
Power-Saving | No | Disable if performance is more important than saving power.
Virus Warning | No | Disables warning messages when writing to the master boot record.
Hyperthreading | Yes | For use with some Intel processors. Hyperthreading is always recommended with Intel's newer Core i7 processors such as the Xeon 5500 series.
Video BIOS Cacheable | No | Not necessary for database virtual machine.
Wake On LAN | Yes | Required for VMware vSphere Distributed Power Management feature.
Execute Disable | Yes | Required for vMotion and VMware vSphere Distributed Resource Scheduler (DRS) features.
Video BIOS Shadowable | No | Not necessary for database virtual machine.
Video RAM Cacheable | No | Not necessary for database virtual machine.
On-Board Audio | No | Not necessary for database virtual machine.
On-Board Modem | No | Not necessary for database virtual machine.
On-Board Firewire | No | Not necessary for database virtual machine.
On-Board Serial Ports | No | Not necessary for database virtual machine.
On-Board Parallel Ports | No | Not necessary for database virtual machine.
On-Board Game Port | No | Not necessary for database virtual machine.
3-) During the testing and planning phase, try to establish a baseline of the ratio of HTTP requests to Java heap size to DB connections required.
4-) CPU Sizing:
- CPU over-commitment is allowed for Java Enterprise Application VMs as long as total physical CPU utilization doesn't exceed 80%. Performance monitoring is mandatory in this case to maintain a baseline of normal-state utilization.
- Assign vCPUs as required and don't over-allocate to the VM, to prevent CPU scheduling issues at the hypervisor level and high RDY time.
- Enable Hyperthreading when available. It won't double the processing power (despite the doubled number of logical cores shown on the ESXi host), but it'll give a CPU processing boost of up to 20-25% in some cases. Don't consider it when calculating the virtual-to-physical core ratio.
- The ESXi hypervisor is NUMA-aware and leverages the NUMA topology to gain a significant performance boost. Try to size your Java VMs to fit inside a single NUMA node to gain the performance boost of NUMA node locality.
5-) Memory Sizing:
- Don't over-commit memory, as Java applications are memory-intensive. If needed, reserve the configured memory to provide the required performance level. Keep in mind that memory reservation affects other aspects, like the HA slot size and vMotion chances and time. In addition, reserving memory removes VM swapfiles from datastores, and hence their space is usable for adding more VMs. For some cases, like testing environments, over-commitment is allowed to get higher consolidation ratios. Performance monitoring is mandatory in this case to maintain a baseline of normal-state utilization.

- Dont disable Ballon Driver installed with VMware Tools. Balloning is the last line of defense of ESXi
Host before compression and swapping to disk when its memory becomes too low. When Balloning is
needed, Ballon Driver will force Guest OS to swap the idle memory pages to disk to return the free
memory pages to the Host, i.e. swapping is done according to Guest OS techniques. Swapping to disk
is done by ESXi Host itself. Hostd swap memory pages from physical memory to VM swap file on disk
without knowing what these pages contain or if these pages are required or idle. Balloning hit to the
performance is somehow much lower than Swapping to disk. Generally speaking as mentioned in the
previous point, dont over-commit memory for business critical Java VMs and if youll do some overcommitment, dont push it to these extreme limits.
- Configure Memory Reservation on your Java VMs according to:
Reserved memory= VM Memory= Guest OS Memory+ Java Memory (JVM Memory)
- Leverage Large Memory Pages feature. It can give a performance boost for your Java VMs. Keep in
mind not to configure all Java VM configured memory as Large Pages and leave some memory to be
used by small pages for processes that cant leverage Large Pages.
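The sketch below is a rough worked version of the reservation formula above. The JVM memory breakdown used here (heap + permanent generation + per-thread stacks) and all example numbers are assumptions to illustrate the arithmetic, not tuning guidance.

    def vm_reservation_mb(guest_os_mb, heap_mb, perm_gen_mb, threads, stack_kb=512):
        # Reserved memory = Guest OS memory + JVM memory (heap + perm gen + thread stacks).
        jvm_mb = heap_mb + perm_gen_mb + (threads * stack_kb) / 1024
        return guest_os_mb + jvm_mb

    # Example: 1 GB for the guest OS, 4 GB heap, 256 MB perm gen, 200 threads with 512 KB stacks.
    print(vm_reservation_mb(guest_os_mb=1024, heap_mb=4096, perm_gen_mb=256, threads=200))  # 5476.0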
6-) Monitoring:
Try to establish a performance baseline for your Java VMs and VI by monitoring the following:
- ESXi Hosts and VMs counters:
Resource | Metric (esxtop/resxtop) | Metric (vSphere Client) | Host/VM | Description
CPU | %USED | Used | Both | CPU used over the collection interval (%)
CPU | %RDY | Ready | VM | CPU time spent in ready state
CPU | %CSTP | Co-Stop | VM | Percentage of time a vCPU spent in ready, co-descheduled state. Only meaningful for SMP virtual machines.
CPU | %MLMTD | - | VM | Percentage of time a vCPU was ready to run but was deliberately not scheduled due to CPU limits.
CPU | %SWPWT | - | VM | Virtual machine waiting on swapped pages to be read from disk. This can indicate overcommitted memory.
CPU | %SYS | System | Both | Percentage of time spent in the ESX/ESXi Server VMkernel
Memory | Swapin, Swapout | Swapinrate, Swapoutrate | Both | Memory the ESX/ESXi host swaps in/out from/to disk (per virtual machine, or cumulative over host)
Memory | MCTLSZ (MB) | vmmemctl | Both | Amount of memory reclaimed from the resource pool by way of ballooning
Memory | N%L | - | VM | If less than 80, the virtual machine is experiencing poor NUMA locality. If the virtual machine has a memory size greater than the amount of memory local to each processor, the ESXi scheduler does not attempt to use NUMA optimizations for that virtual machine.
Disk | READs/s, WRITEs/s | NumberRead, NumberWrite | Both | Reads and writes issued in the collection interval
Disk | DAVG/cmd | deviceLatency | Both | Average latency (ms) of the device (LUN)
Disk | KAVG/cmd | KernelLatency | Both | Average latency (ms) in the VMkernel, also known as queuing time
Disk | ABRTS/s | - | VM | Aborts issued by the virtual machine because the storage is not responding. For Windows virtual machines, this happens after a 60-second default. This can be caused by path failure, or when the storage array is not accepting I/O.
Disk | RESET/s | - | VM | The number of command resets per second.
Network | MbRX/s, MbTX/s | Received, Transmitted | Both | Amount of data received/transmitted per second
Network | PKTRX/s, PKTTX/s | PacketsRx, PacketsTx | Both | Received/transmitted packets per second
Network | %DRPRX, %DRPTX | DroppedRx, DroppedTx | Both | Receive/transmit dropped packets per second
Manageability
1-) Time synchronization is one of the most important things in Java environments. It's recommended to do the following:
- Let all your Java VMs sync their time with DCs only, not with VMware Tools.
- Disable time-sync between Java VMs and hosts via VMware Tools completely (even after unchecking the box on the VM settings page, the VM can still sync with the host through VMware Tools on startup, resume, snapshot operations, etc.) according to the following KB:
http://kb.vmware.com/selfservice/microsites/search.do?language=en_US&cmd=displayKC&externalId=1189
- Sync all ESXi hosts in the VI to the same Stratum 1 NTP server, which is the same time source as your forest/domain.
2-) Try to leverage vSphere-aware load balancers that can integrate with the vSphere API and automatically add a newly added VM to its corresponding load-balancing pool.
3-) Use load balancers with known algorithms that you understand, so that you can test and configure them efficiently to make sure that each VM gets an equal share of requests and the load is balanced.

Scalability

1-) Leverage CPU/Memory Hot Add with Java VMs to scale them up as needed.
2-) When increasing your VM's Java heap size, increase its vCPUs as well for max. performance.
3-) When adopting a scale-out approach for your Java VMs, try to use symmetric VMs, so that load balancers can effectively balance requests between them. Load balancers aren't aware of VM sizing, and hence non-symmetric VMs would lead to inefficient load balancing unless you configure the VM sizes on your load balancers, which is time consuming.

http://www.vmware.com/files/pdf/solutions/VMware-Virtualizing-Business-Critical-Apps-on-VMware_en-wp.pdf
