
Java Application Performance Best Practices

Agenda

Design and Sizing Java Platforms
Performance Best Practices and Tuning

Java Platforms Design and Sizing

Conventional Java Platforms

Java platforms are multitier and multi-org:

Load Balancer Tier: Load Balancers, owned by the IT Operations Network Team
Web Server Tier: Web Servers, owned by the IT Operations Server Team
Java App Tier: Java Applications, owned by the IT Apps Java Dev Team
DB Server Tier: DB Servers, owned by the IT Ops & Apps Dev Team

These are the organizational key stakeholder departments.

Design and Sizing of Java Platforms

Step 2: Establish Benchmark

Scale Up Test (ESTABLISH BUILDING BLOCK VM)
Establish vertical scalability:
Establish how many JVMs fit on a VM.
Establish how large a VM should be in terms of vCPU and memory.
If the SLA is not met, investigate the bottlenecked layer (network, storage, application configuration, and vSphere).
If it is a building-block app/VM configuration problem, adjust and iterate.
Once the bottlenecked layer is removed by scaling out, iterate the scale-out test; the test is complete when the SLA is met.

Scale Out Test (DETERMINE HOW MANY VMs)
Establish horizontal scalability:
How many building block VMs do you need to meet your response time SLAs without reaching 70%-80% CPU saturation?
Establish your horizontal scalability factor before bottlenecks appear in your application.

HotSpot JVMs on vSphere

VM memory layout: VM Memory contains the Guest OS Memory plus the JVM Memory.
JVM Memory contains the JVM Max Heap (-Xmx, with -Xms as the initial heap), Perm Gen (-XX:MaxPermSize), the Java thread stacks (-Xss per thread), and other memory such as direct native (off-the-heap) and non-direct memory.

HotSpot JVMs on vSphere

VM Memory = Guest OS Memory + JVM Memory
JVM Memory = JVM Max Heap (-Xmx value) + JVM Perm Size (-XX:MaxPermSize) + NumberOfConcurrentThreads * (-Xss) + other mem

Guest OS Memory is approx 1G (depends on OS/other processes).
Perm Size is an area additional to the -Xmx (Max Heap) value and is not GC-ed because it contains class-level information.
"other mem" is additional memory required for NIO buffers, JIT code cache, classloaders, socket buffers (receive/send), JNI, and GC internal info.

If you have multiple JVMs (N JVMs) on a VM, then:

VM Memory = Guest OS Memory + N * JVM Memory

Sizing Example

Set the VM memory reservation to 5088m.

VM Memory (5088m) = Guest OS Memory (approx 500m used by OS) + JVM Memory (4588m)
JVM Memory (4588m) = JVM Max Heap -Xmx (4096m) + -XX:MaxPermSize (256m) + -Xss per thread (256k * 100 threads) + other mem (approx 217m)
Initial heap -Xms is set to 4096m, equal to -Xmx.
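To make the example concrete, the building-block JVM above would be launched with flags along these lines (a minimal sketch; app.jar and the count of roughly 100 threads are illustrative assumptions, not from the slide):

    # Fixed 4096m heap, 256m PermGen, 256k stack per thread (approx 100 threads assumed)
    java -Xms4096m -Xmx4096m -Xss256k -XX:MaxPermSize=256m -jar app.jar
    # JVM memory = 4096m + 256m + (100 * 256k) + other mem (~217m), roughly 4.6g
    # VM memory  = ~500m guest OS + JVM memory, hence the 5088m reservation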

Larger JVMs for In-Memory Data Grids

Set the VM memory reservation to 34g.

VM Memory for SQLFire (34g) = Guest OS Memory (0.5-1g used by OS) + JVM Memory for SQLFire (32g)
JVM Memory (32g) = JVM Max Heap -Xmx (30g) + -XX:MaxPermSize (0.5g) + -Xss per thread (1M * 500 threads) + other mem (approx 1g)
Initial heap -Xms is set to 30g, equal to -Xmx.
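A comparable sketch for the data-grid JVM above (illustrative only: SQLFire is normally started through its own scripts, and sqlfire-server.jar plus the roughly 500 threads are assumptions mirroring the slide numbers):

    # Fixed 30g heap, 0.5g PermGen, 1m stack per thread (approx 500 threads assumed)
    java -Xms30g -Xmx30g -Xss1m -XX:MaxPermSize=512m -jar sqlfire-server.jar
    # JVM memory = 30g + 0.5g + (500 * 1m) + other mem (~1g), roughly 32g
    # VM memory  = 0.5-1g guest OS + JVM memory, hence the 34g reservation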

8 vCPU VMs with less than 47GB RAM on each VM

If a VM is sized greater than 47GB or 8 vCPUs, then NUMA interleaving occurs and can cause a 30% drop in memory throughput performance.

[Diagram: two-socket server with 96GB RAM, scheduled by the ESXi scheduler; each NUMA node has 94/2 = 47GB of local memory.]

2vCPU VMs with less than 20GB RAM on each VM

[Diagram: two-socket server with 128GB RAM and the ESXi scheduler; VMs 1-4 are 2vCPU VMs, and a 5th VM with 4 vCPUs and 40GB RAM is added.]

If the 5th VM is 4vCPU, it is NOT NUMA wide and will be scheduled on one NUMA node, i.e. all 4 vCPUs of the VM are scheduled on 1 socket.

7vCPU VM with 40GB RAM as the 5th VM

[Diagram: two-socket server with 128GB RAM and the ESXi scheduler; VMs 1-4 are 2vCPU VMs, the 5th VM has 7 vCPUs.]

The 5th VM is split by ESXi into 2 NUMA clients if it has more than 6 vCPUs.
The NUMA client calculation is controlled by numa.vcpu.maxPerClient, which can be set under Advanced Host Attributes -> Advanced Virtual NUMA Attributes. The default for this server is 6.
To split the 5th VM across 2 NUMA clients, set numa.vcpu.maxPerClient = 2.
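One way this is commonly applied, shown here only as a hedged sketch since the slide configures it through the vSphere Client's advanced attributes, is the equivalent per-VM advanced configuration entry in the VM's .vmx file (the value 2 matches the slide's example):

    numa.vcpu.maxPerClient = "2"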

What happens if 3 x 4vCPU VMs are deployed in sequence? Does this cause continuous migration/imbalance?

[Diagram: two-socket server with 128GB RAM and the ESXi scheduler; existing 2vCPU VMs with less than 20GB RAM on each VM, plus 3 x 4vCPU VMs with 40GB RAM on each.]

NUMA Local Memory with Overhead Adjustment

NUMA Local Memory = (Physical RAM on vSphere host - vSphere RAM overhead) / Number of Sockets on vSphere host

vSphere RAM overhead is approximately 1% of physical RAM per VM, i.e. Number of VMs on vSphere host * 1% * Physical RAM on vSphere host.

NUMA Local Memory with Overhead Adjustment

For production environments you obviously don't want to run this close to the NUMA local memory ceiling; instead, stay within 95% of the NUMA Local Memory computed above:

Prod NUMA Local Memory = 0.95 * NUMA Local Memory
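A worked instance using the 96GB two-socket host from the earlier NUMA slide (assuming two VMs on the host, which matches the 94GB figure shown there):

    NUMA Local Memory      = (96GB - 2 * 1% * 96GB) / 2 sockets ~= 94GB / 2 = 47GB per node
    Prod NUMA Local Memory = 0.95 * 47GB ~= 44.6GB per node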

Java Platform Categories: Category 1 (many smaller JVMs)

Smaller JVMs: < 4GB heap, approx 4.5GB Java process, and approx 5GB per VM.
vSphere hosts with < 96GB RAM are more suitable, because by the time you stack the many JVM instances you are likely to reach the CPU boundary before you can consume all of the RAM. For example, if you instead chose a vSphere host with 256GB RAM, then 256GB / 4.5GB => about 57 JVMs, which would clearly hit the CPU boundary first.
Multiple JVMs per VM.
Use resource pools to manage different LOBs (e.g. Resource Pool 1: Gold, LOB 1; Resource Pool 2: Silver, LOB 2).

Category 1: 100s to 1000s of JVMs.

Java Platform Categories: Category 1 (continued)

Consider using 4-socket servers instead of 2-socket servers to get more cores.

Category 1: 100s to 1000s of JVMs.

Java Platform Categories: Category 2 (fewer, larger JVMs)

Fewer JVMs, < 20.
Very large JVMs, 32GB to 128GB.
Always deploy 1 VM per NUMA node and size it to fit the node exactly.
1 JVM per VM.
Choose 2-socket vSphere hosts and install ample memory, 128GB to 512GB.
Examples are in-memory databases such as SQLFire and GemFire.
Apply latency-sensitive best practices, e.g. disable interrupt coalescing on the pNIC and vNIC.
Use a dedicated vSphere cluster.
Use 2-socket servers to get larger NUMA nodes.

Category 2: a dozen very large JVMs.

Java Platform Categories: Category 3

Many smaller JVMs accessing information from fewer large JVMs.
Use resource pools to manage different LOBs (e.g. Resource Pool 1: Gold, LOB 1; Resource Pool 2: Silver, LOB 2).

Category 3: Category 1 accessing data from Category 2.

Most Common Sizing and Configuration Question

Option 1: Scale out VM and JVM (best). E.g. four 2vCPU VMs (JVM-1 through JVM-4), each with one 2GB JVM.
Option 2: Scale up the JVM heap size (2nd best). E.g. two 2vCPU VMs (JVM-1A, JVM-2A), each with one 4GB JVM.
Option 3: Scale up VM memory and JVM memory (3rd best).

What else to consider when sizing?

Mixed workloads (job scheduler vs web) in the same app require different GC tuning:
Job schedulers care about throughput.
Web apps care about minimizing latency and response time.
You can't have both reduced response time and increased throughput without compromise.
Separate the concerns for optimal tuning; the web and job workloads can be split into their own JVMs either vertically (within the same VM) or horizontally (across VMs).

Java Platforms Best Practices and Tuning Concepts

Most Common VM Size for Java Workloads

Category 1 type of workloads:
2 vCPU VM with 1 JVM for tier-1 production workloads.
Maintain this ratio as you scale out or scale up, i.e. 1 JVM : 2 vCPU.
Scale-out is preferred over scale-up, but both can work.
You can diverge from this ratio for less critical workloads.

Building block: 2 vCPU VM, 1 JVM (-Xmx 4096m), approx 5GB RAM reservation.

However, for Large JVMs + CMS

Category 2 type of workloads:
Start with a 4+ vCPU VM with 1 JVM for tier-1 in-memory data management systems in production.
You are more likely to increase the JVM size than to launch a second JVM instance.
A 4+ vCPU VM allows ParallelGCThreads to be allocated 50% of the available vCPUs to the JVM, i.e. 2+ GC threads.
The ability to increase ParallelGCThreads is critical to YoungGen scalability for large JVMs.
ParallelGCThreads should be allocated 50% of the vCPUs available to the JVM and not more; you want to make sure other vCPUs remain available for other work.
For the IBM JVM, use the gencon GC policy.

For large JVMs: 4+ vCPU VM, 1 JVM (8-128GB heap).
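As an illustration of this guidance on a HotSpot JVM of this era, a large-heap CMS JVM on a 4 vCPU VM might be launched along these lines (a minimal sketch; the heap size is borrowed from the earlier data-grid example and datagrid-server.jar is a placeholder):

    # CMS with parallel GC threads capped at 50% of the 4 available vCPUs
    java -Xms30g -Xmx30g -XX:+UseConcMarkSweepGC -XX:+UseParNewGC \
         -XX:ParallelGCThreads=2 -jar datagrid-server.jar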

Which GC?

ESXi doesn't care which GC you select, because of the degree of independence of Java from the OS and of the OS from the hypervisor.

Tuning GC: Art Meets Science!

You either tune for throughput or for latency, one at the cost of the other.

Reduce latency (Web): improved response time, reduced latency impact, slightly reduced throughput.
Increase throughput (Job): improved throughput, longer response times, increased latency impact.
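On a HotSpot JVM, the two tuning directions typically map onto different collector families; the flags below are a hedged sketch of that mapping (jar names and heap sizes are placeholders), not settings taken from the slide:

    # Throughput-oriented: parallel stop-the-world collectors for job/batch workloads
    java -XX:+UseParallelGC -XX:+UseParallelOldGC -Xms4g -Xmx4g -jar batch-jobs.jar
    # Latency-oriented: concurrent collector for web workloads, shorter pauses at some throughput cost
    java -XX:+UseConcMarkSweepGC -XX:+UseParNewGC -Xms4g -Xmx4g -jar web-app.jar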

Middleware on VMware: Best Practices

Enterprise Java Applications on VMware Best Practices Guide: http://www.vmware.com/resources/techresources/1087
Best Practices for Performance Tuning of Latency-Sensitive Workloads in vSphere VMs: http://www.vmware.com/resources/techresources/10220
vFabric SQLFire Best Practices Guide: http://www.vmware.com/resources/techresources/10327
vFabric Reference Architecture: http://tinyurl.com/cjkvftt

Middleware on VMware: Best Practices Summary

Follow the design and sizing examples discussed thus far.
Set an appropriate memory reservation.
Leave hyperthreading enabled; size based on vCPU = 1.25 pCPU if needed.
RHEL 6 and SLES 11 SP1 have a tickless kernel that does not rely on a high-frequency interrupt-based timer and is therefore much friendlier to virtualized latency-sensitive workloads.
Do not overcommit memory.
Locator/heartbeat processes should not be migrated with vMotion; doing so could otherwise lead to network split-brain problems.
Use vMotion over 10Gbps when doing scheduled maintenance.
Use affinity and anti-affinity rules to avoid placing redundant copies on the same VMware ESX/ESXi host.

Middleware on VMware: Best Practices

Disable NIC interrupt coalescing on the physical and virtual NIC; this is extremely helpful in reducing latency for latency-sensitive virtual machines.
Disable virtual interrupt coalescing for VMXNET3.
These settings can lead to some performance penalties for other virtual machines on the ESXi host, as well as higher CPU utilization to deal with the higher rate of interrupts from the physical NIC.
This implies it is best to use a dedicated ESXi cluster for middleware platforms: all hosts are configured the same way for latency sensitivity, and this ensures non-middleware workloads, such as other enterprise applications, are not negatively impacted.
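As a hedged illustration of the VMXNET3 setting above (described in the latency-sensitive workloads paper referenced earlier; the adapter index ethernet0 is an assumption for the VM's first vNIC), the per-VM advanced configuration entry is:

    ethernet0.coalescingScheme = "disabled"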

Middleware on VMware: Benefits

Flexibility to change compute resources, VM sizes, and add more hosts.
Ability to apply hardware and OS patches while minimizing downtime.
Create a more manageable system through reduced middleware sprawl.
Ability to tune the entire stack within one platform.
Ability to monitor the entire stack within one platform.
Ability to handle seasonal workloads, committing resources when they are needed and removing them when they are not.
