
Java Application Performance Best Practices

Agenda

Design and Sizing Java Platforms
Performance Best Practices and Tuning

Java Platforms Design and Sizing

Conventional Java Platforms

Java platforms are multitier and multi-org:

Load Balancer Tier: Load Balancers, owned by the IT Operations Network Team
Web Server Tier: Web Servers, owned by the IT Operations Server Team
Java App Tier: Java Applications, owned by the IT Apps Java Dev Team
DB Server Tier: DB Servers, owned by the IT Ops & Apps Dev Team

These are the organizational key stakeholder departments.

Design and Sizing of Java Platforms

Step 2: Establish Benchmark

Scale Up Test (ESTABLISH BUILDING BLOCK VM)
Establish vertical scalability:
Establish how many JVMs fit on a VM.
Establish how large a VM should be in terms of vCPU and memory.
If the SLA is not met, investigate the bottlenecked layer (network, storage, application configuration, and vSphere).
If it is a building-block app/VM configuration problem, adjust and iterate.
Once the bottlenecked layer is removed by scaling out, iterate the scale-out test; the test is complete when the SLA is met.

Scale Out Test (DETERMINE HOW MANY VMs)
Establish horizontal scalability:
How many building block VMs do you need to meet your response time SLAs without reaching 70%-80% CPU saturation?
Establish your horizontal scalability factor before bottlenecks appear in your application.

HotSpot JVMs on vSphere

VM memory layout: VM Memory contains the Guest OS Memory plus the JVM Memory.
JVM Memory contains the JVM Max Heap (-Xmx, with -Xms as the initial heap), Perm Gen (-XX:MaxPermSize), the Java thread stacks (-Xss per thread), and other memory such as direct native (off-the-heap) and non-direct memory.

HotSpot JVMs on vSphere

VM Memory = Guest OS Memory + JVM Memory
JVM Memory = JVM Max Heap (-Xmx value) + JVM Perm Size (-XX:MaxPermSize) + NumberOfConcurrentThreads * (-Xss) + other mem

Guest OS Memory is approx 1G (depends on OS/other processes).
Perm Size is an area additional to the -Xmx (Max Heap) value and is not GC-ed because it contains class-level information.
"other mem" is additional memory required for NIO buffers, JIT code cache, classloaders, socket buffers (receive/send), JNI, and GC internal info.

If you have multiple JVMs (N JVMs) on a VM, then:

VM Memory = Guest OS Memory + N * JVM Memory

Sizing Example

Set the VM memory reservation to 5088m.

VM Memory (5088m) = Guest OS Memory (approx 500m used by OS) + JVM Memory (4588m)
JVM Memory (4588m) = JVM Max Heap -Xmx (4096m) + -XX:MaxPermSize (256m) + -Xss per thread (256k * 100 threads) + other mem (approx 217m)
Initial heap -Xms is set to 4096m, equal to -Xmx.
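To make the example concrete, the building-block JVM above would be launched with flags along these lines (a minimal sketch; app.jar and the count of roughly 100 threads are illustrative assumptions, not from the slide):

    # Fixed 4096m heap, 256m PermGen, 256k stack per thread (approx 100 threads assumed)
    java -Xms4096m -Xmx4096m -Xss256k -XX:MaxPermSize=256m -jar app.jar
    # JVM memory = 4096m + 256m + (100 * 256k) + other mem (~217m), roughly 4.6g
    # VM memory  = ~500m guest OS + JVM memory, hence the 5088m reservation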

Larger JVMs for In-Memory Data Grids

Set the VM memory reservation to 34g.

VM Memory for SQLFire (34g) = Guest OS Memory (0.5-1g used by OS) + JVM Memory for SQLFire (32g)
JVM Memory (32g) = JVM Max Heap -Xmx (30g) + -XX:MaxPermSize (0.5g) + -Xss per thread (1M * 500 threads) + other mem (approx 1g)
Initial heap -Xms is set to 30g, equal to -Xmx.
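A comparable sketch for the data-grid JVM above (illustrative only: SQLFire is normally started through its own scripts, and sqlfire-server.jar plus the roughly 500 threads are assumptions mirroring the slide numbers):

    # Fixed 30g heap, 0.5g PermGen, 1m stack per thread (approx 500 threads assumed)
    java -Xms30g -Xmx30g -Xss1m -XX:MaxPermSize=512m -jar sqlfire-server.jar
    # JVM memory = 30g + 0.5g + (500 * 1m) + other mem (~1g), roughly 32g
    # VM memory  = 0.5-1g guest OS + JVM memory, hence the 34g reservation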

8 vCPU VMs with less than 47GB RAM on each VM

If a VM is sized greater than 47GB or 8 vCPUs, then NUMA interleaving occurs and can cause a 30% drop in memory throughput performance.

[Diagram: two-socket server with 96GB RAM, scheduled by the ESXi scheduler; each NUMA node has 94/2 = 47GB of local memory.]

2vCPU VMs with less than 20GB RAM on each VM

[Diagram: two-socket server with 128GB RAM and the ESXi scheduler; VMs 1-4 are 2vCPU VMs, and a 5th VM with 4 vCPUs and 40GB RAM is added.]

If the 5th VM is 4vCPU, it is NOT NUMA wide and will be scheduled on one NUMA node, i.e. all 4 vCPUs of the VM are scheduled on 1 socket.

7vCPU VM with 40GB RAM as the 5th VM

[Diagram: two-socket server with 128GB RAM and the ESXi scheduler; VMs 1-4 are 2vCPU VMs, the 5th VM has 7 vCPUs.]

The 5th VM is split by ESXi into 2 NUMA clients if it has more than 6 vCPUs.
The NUMA client calculation is controlled by numa.vcpu.maxPerClient, which can be set under Advanced Host Attributes -> Advanced Virtual NUMA Attributes. The default for this server is 6.
To split the 5th VM across 2 NUMA clients, set numa.vcpu.maxPerClient = 2.
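One way this is commonly applied, shown here only as a hedged sketch since the slide configures it through the vSphere Client's advanced attributes, is the equivalent per-VM advanced configuration entry in the VM's .vmx file (the value 2 matches the slide's example):

    numa.vcpu.maxPerClient = "2"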

What happens if 3 x 4vCPU VMs are deployed in sequence? Does this cause continuous migration/imbalance?

[Diagram: two-socket server with 128GB RAM and the ESXi scheduler; existing 2vCPU VMs with less than 20GB RAM on each VM, plus 3 x 4vCPU VMs with 40GB RAM on each.]

NUMA Local Memory with Overhead Adjustment

NUMA Local Memory = (Physical RAM on vSphere host - vSphere RAM overhead) / Number of Sockets on vSphere host

vSphere RAM overhead is approximately 1% of physical RAM per VM, i.e. Number of VMs on vSphere host * 1% * Physical RAM on vSphere host.

NUMA Local Memory with Overhead Adjustment

For production environments you obviously don't want to run this close to the NUMA local memory ceiling; instead, stay within 95% of the NUMA Local Memory computed above:

Prod NUMA Local Memory = 0.95 * NUMA Local Memory
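A worked instance using the 96GB two-socket host from the earlier NUMA slide (assuming two VMs on the host, which matches the 94GB figure shown there):

    NUMA Local Memory      = (96GB - 2 * 1% * 96GB) / 2 sockets ~= 94GB / 2 = 47GB per node
    Prod NUMA Local Memory = 0.95 * 47GB ~= 44.6GB per node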

Java Platform Categories: Category 1 (many smaller JVMs)

Smaller JVMs: < 4GB heap, approx 4.5GB Java process, and approx 5GB per VM.
vSphere hosts with < 96GB RAM are more suitable, because by the time you stack the many JVM instances you are likely to reach the CPU boundary before you can consume all of the RAM. For example, if you instead chose a vSphere host with 256GB RAM, then 256GB / 4.5GB => about 57 JVMs, which would clearly hit the CPU boundary first.
Multiple JVMs per VM.
Use resource pools to manage different LOBs (e.g. Resource Pool 1: Gold, LOB 1; Resource Pool 2: Silver, LOB 2).

Category 1: 100s to 1000s of JVMs.

Java Platform Categories: Category 1 (continued)

Consider using 4-socket servers instead of 2-socket servers to get more cores.

Category 1: 100s to 1000s of JVMs.

Java Platform Categories: Category 2 (fewer, larger JVMs)

Fewer JVMs, < 20.
Very large JVMs, 32GB to 128GB.
Always deploy 1 VM per NUMA node and size it to fit the node exactly.
1 JVM per VM.
Choose 2-socket vSphere hosts and install ample memory, 128GB to 512GB.
Examples are in-memory databases such as SQLFire and GemFire.
Apply latency-sensitive best practices, e.g. disable interrupt coalescing on the pNIC and vNIC.
Use a dedicated vSphere cluster.
Use 2-socket servers to get larger NUMA nodes.

Category 2: a dozen very large JVMs.

Java Platform Categories: Category 3

Many smaller JVMs accessing information from fewer large JVMs.
Use resource pools to manage different LOBs (e.g. Resource Pool 1: Gold, LOB 1; Resource Pool 2: Silver, LOB 2).

Category 3: Category 1 accessing data from Category 2.

Most Common Sizing and Configuration Question

Option 1: Scale out VM and JVM (best). E.g. four 2vCPU VMs (JVM-1 through JVM-4), each with one 2GB JVM.
Option 2: Scale up the JVM heap size (2nd best). E.g. two 2vCPU VMs (JVM-1A, JVM-2A), each with one 4GB JVM.
Option 3: Scale up VM memory and JVM memory (3rd best).

What else to consider when sizing?

Mixed workloads (job scheduler vs web) in the same app require different GC tuning:
Job schedulers care about throughput.
Web apps care about minimizing latency and response time.
You can't have both reduced response time and increased throughput without compromise.
Separate the concerns for optimal tuning; the web and job workloads can be split into their own JVMs either vertically (within the same VM) or horizontally (across VMs).

Java Platforms Best Practices and Tuning Concepts

Most Common VM Size for Java Workloads

Category 1 type of workloads:
2 vCPU VM with 1 JVM for tier-1 production workloads.
Maintain this ratio as you scale out or scale up, i.e. 1 JVM : 2 vCPU.
Scale-out is preferred over scale-up, but both can work.
You can diverge from this ratio for less critical workloads.

Building block: 2 vCPU VM, 1 JVM (-Xmx 4096m), approx 5GB RAM reservation.

However, for Large JVMs + CMS

Category 2 type of workloads:
Start with a 4+ vCPU VM with 1 JVM for tier-1 in-memory data management systems in production.
You are more likely to increase the JVM size than to launch a second JVM instance.
A 4+ vCPU VM allows ParallelGCThreads to be allocated 50% of the available vCPUs to the JVM, i.e. 2+ GC threads.
The ability to increase ParallelGCThreads is critical to YoungGen scalability for large JVMs.
ParallelGCThreads should be allocated 50% of the vCPUs available to the JVM and not more; you want to make sure other vCPUs remain available for other work.
For the IBM JVM, use the gencon GC policy.

For large JVMs: 4+ vCPU VM, 1 JVM (8-128GB heap).
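As an illustration of this guidance on a HotSpot JVM of this era, a large-heap CMS JVM on a 4 vCPU VM might be launched along these lines (a minimal sketch; the heap size is borrowed from the earlier data-grid example and datagrid-server.jar is a placeholder):

    # CMS with parallel GC threads capped at 50% of the 4 available vCPUs
    java -Xms30g -Xmx30g -XX:+UseConcMarkSweepGC -XX:+UseParNewGC \
         -XX:ParallelGCThreads=2 -jar datagrid-server.jar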

Which GC?

ESXi doesn't care which GC you select, because of the degree of independence of Java from the OS and of the OS from the hypervisor.

Tuning GC: Art Meets Science!

You either tune for throughput or for latency, one at the cost of the other.

Reduce latency (Web): improved response time, reduced latency impact, slightly reduced throughput.
Increase throughput (Job): improved throughput, longer response times, increased latency impact.
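On a HotSpot JVM, the two tuning directions typically map onto different collector families; the flags below are a hedged sketch of that mapping (jar names and heap sizes are placeholders), not settings taken from the slide:

    # Throughput-oriented: parallel stop-the-world collectors for job/batch workloads
    java -XX:+UseParallelGC -XX:+UseParallelOldGC -Xms4g -Xmx4g -jar batch-jobs.jar
    # Latency-oriented: concurrent collector for web workloads, shorter pauses at some throughput cost
    java -XX:+UseConcMarkSweepGC -XX:+UseParNewGC -Xms4g -Xmx4g -jar web-app.jar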

Middleware on VMware: Best Practices

Enterprise Java Applications on VMware Best Practices Guide: http://www.vmware.com/resources/techresources/1087
Best Practices for Performance Tuning of Latency-Sensitive Workloads in vSphere VMs: http://www.vmware.com/resources/techresources/10220
vFabric SQLFire Best Practices Guide: http://www.vmware.com/resources/techresources/10327
vFabric Reference Architecture: http://tinyurl.com/cjkvftt

Middleware on VMware: Best Practices Summary

Follow the design and sizing examples discussed thus far.
Set an appropriate memory reservation.
Leave hyperthreading enabled; size based on vCPU = 1.25 pCPU if needed.
RHEL 6 and SLES 11 SP1 have a tickless kernel that does not rely on a high-frequency interrupt-based timer and is therefore much friendlier to virtualized latency-sensitive workloads.
Do not overcommit memory.
Locator/heartbeat processes should not be migrated with vMotion; doing so could otherwise lead to network split-brain problems.
Use vMotion over 10Gbps when doing scheduled maintenance.
Use affinity and anti-affinity rules to avoid placing redundant copies on the same VMware ESX/ESXi host.

Middleware on VMware: Best Practices

Disable NIC interrupt coalescing on the physical and virtual NIC; this is extremely helpful in reducing latency for latency-sensitive virtual machines.
Disable virtual interrupt coalescing for VMXNET3.
These settings can lead to some performance penalties for other virtual machines on the ESXi host, as well as higher CPU utilization to deal with the higher rate of interrupts from the physical NIC.
This implies it is best to use a dedicated ESXi cluster for middleware platforms: all hosts are configured the same way for latency sensitivity, and this ensures non-middleware workloads, such as other enterprise applications, are not negatively impacted.
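As a hedged illustration of the VMXNET3 setting above (described in the latency-sensitive workloads paper referenced earlier; the adapter index ethernet0 is an assumption for the VM's first vNIC), the per-VM advanced configuration entry is:

    ethernet0.coalescingScheme = "disabled"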

Middleware on VMware: Benefits

Flexibility to change compute resources, VM sizes, and add more hosts.
Ability to apply hardware and OS patches while minimizing downtime.
Create a more manageable system through reduced middleware sprawl.
Ability to tune the entire stack within one platform.
Ability to monitor the entire stack within one platform.
Ability to handle seasonal workloads, committing resources when they are needed and removing them when they are not.
