
Java Internals

Course Details
Objectives
Understand the JVM
Understand the Garbage Collection
Understand the JVM Parameters
How to analyze the garbage collection logs

Intended audience
Project Managers
Architects
Performance engineers
Testers
ELT

Pre-requisites
Basic Concepts of Performance Engineering
Operating System Basics
Web architecture Basics
This course deals with Java Internals and explains how the JVM works
Introduction
Java is a programming language originally developed by James Gosling at Sun Microsystems
Java is a general-purpose, concurrent, class-based, object-oriented language that is specifically
designed to have as few implementation dependencies as possible. It is intended to let application
developers "write once, run anywhere".
Advantages of JAVA
Simple: Java was designed to be easy to use and is therefore easier to write, compile, debug, and learn than
many other programming languages. One reason why Java is much simpler than C++ is that Java uses
automatic memory allocation and garbage collection, whereas C++ requires the programmer to allocate
memory and to collect garbage.
Object-oriented: Java is object-oriented because programming in Java is centered on creating objects,
manipulating objects, and making objects work together. This allows you to create modular programs and
reusable code.
Platform-independent: One of the most significant advantages of Java is its ability to move easily from one
computer system to another. The ability to run the same program on many different systems is crucial to World
Wide Web software, and Java succeeds at this by being platform-independent at both the source and binary
levels.



Introduction
Distributed: Distributed computing involves several computers on a network working together. Java is designed to
make distributed computing easy with the networking capability that is inherently integrated into it. Writing network
programs in Java is like sending and receiving data to and from a file. For example, three programs running
on three different systems can communicate with each other to perform a joint task.
Interpreted: An interpreter is needed in order to run Java programs. The programs are compiled into Java Virtual
Machine code called bytecode. The bytecode is machine independent and is able to run on any machine that has a Java
interpreter. With Java, the program need only be compiled once, and the bytecode generated by the Java compiler can
run on any platform.
Secure: Java is one of the first programming languages to consider security as part of its design. The Java language,
compiler, interpreter, and runtime environment were each developed with security in mind.
Robust: Robust means reliable and no programming language can really assure reliability. Java puts a lot of emphasis
on early checking for possible errors, as Java compilers are able to detect many problems that would first show up
during execution time in other languages.
Multithreaded: Multithreading is the capability of a program to perform several tasks simultaneously.
Multithreaded programming is smoothly integrated into Java, while in other languages, operating system-
specific procedures have to be called in order to enable multithreading. Multithreading is a necessity in visual and
network programming.



Java Internals
IBM JDK 1.6
JVM
The IBM Virtual Machine for Java (JVM) is a core component of the Java Runtime Environment (JRE) from
IBM. The JVM is a virtualized computing machine that follows a well-defined specification for the runtime
requirements of the Java programming language
JVM is called Virtual because it provides a machine interface that does not depend on the underlying
operating system and machine hardware architecture
Java programs are compiled into bytecodes (Class file) which are then executed in the JVM
The JVM is specific to an operating system and hardware combination
All JVMs:
Execute code that is defined by a standard known as the class file format (bytecode)
Provide fundamental runtime security such as bytecode verification
Provide intrinsic operations such as performing arithmetic and allocating new objects
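To make the compile-to-bytecode step concrete, here is a minimal sketch (file and class names are illustrative): compiling the class with javac produces a platform-independent .class file, and the JDK's javap -c tool disassembles its bytecode for inspection.

// Hello.java - compile with "javac Hello.java", run with "java Hello",
// and inspect the generated bytecode with "javap -c Hello".
public class Hello {
    public static void main(String[] args) {
        System.out.println("Hello from the JVM");
    }
}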

Java Application Stack
Components of JVM
The JVM API encapsulates all the interaction between external programs and the JVM
The diagnostics component provides Reliability, Availability, and Serviceability (RAS) facilities
to the JVM. The IBM Virtual Machine for Java is distinguished by its extensive RAS
capabilities. The JVM is designed to be deployed in business-critical operations and includes
several trace and debug utilities to assist with problem determination.
The memory management component is responsible for the efficient use of the Java Heap

Components of JVM
The class loader component is responsible for supporting Java's dynamic code loading facilities. The
dynamic code loading facilities include:
Reading standard Java .class files
Resolving class definitions in the context of the current runtime environment
Verifying the bytecodes defined by the class file to determine whether the bytecodes are
language-legal
Initializing the class definition after it is accepted into the managed runtime environment
Various reflection APIs for introspection on the class and its defined members.
The interpreter is the implementation of the stack-based bytecode machine that is defined in the JVM
specification. The bytecodes define the logic of the application. It can switch between running
bytecodes and handing control to the platform-specific machine-code produced by the JIT compiler
(The Just-In-Time (JIT) compiler is a component of the Java Runtime Environment. It improves the
performance of Java applications by compiling bytecodes to native machine code at run time.)
The platform port layer is an abstraction of the native platform functions that are required by the
JVM. Other components of the JVM are written in terms of the platform-neutral platform port layer
functions. Further porting of the JVM requires the provision of implementations of the platform port
layer facilities.
Classloader
Class loading loads, verifies, prepares and resolves, and initializes a class from a Java class file
Loading involves obtaining the byte array representing the Java class file.
Verification of a Java class file is the process of checking that the class file is structurally well-
formed and then inspecting the class file contents to ensure that the code does not attempt to
perform operations that are not permitted.
Preparation involves the allocation and default initialization of storage space for static class
fields. Preparation also creates method tables, which speed up virtual method calls, and object
templates, which speed up object creation.
Initialization involves the processing of the class's class initialization method, if defined, at
which time static class fields are initialized to their user-defined initial values (if specified).
The parent-delegation model requires that any request for a class loader to load a given class is first
delegated to its parent class loader before the requested class loader tries to load the class itself
The JVM has three class loaders, each possessing a different scope from which it can load classes
Bootstrap - responsible for loading only the classes that are from the core Java API
Extensions - responsible for loading standard extension packages from the extensions directory
Application - responsible for loading classes from the local file system and from the CLASSPATH
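As a small illustration of the parent-delegation chain described above (a sketch using the standard java.lang.ClassLoader API; the printed loader names vary by JDK vendor and version):

// LoaderChain.java - prints the chain of class loaders from the application
// class loader upwards. The bootstrap class loader is represented by null.
public class LoaderChain {
    public static void main(String[] args) {
        ClassLoader cl = LoaderChain.class.getClassLoader(); // application class loader
        while (cl != null) {
            System.out.println(cl);
            cl = cl.getParent(); // delegation goes upwards: application -> extensions -> bootstrap
        }
        System.out.println("<bootstrap>");
    }
}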
JIT Compiler
The Just-In-Time (JIT) compiler is a component of the Java Runtime Environment which improves the
performance of Java applications by compiling bytecodes to native machine code at run time.
The JIT compiler is enabled by default, and is activated when a Java method is called.
When a method has been compiled, the JVM calls the compiled code of that method directly instead of
interpreting it.
Methods are not compiled the first time they are called. For each method, the JVM maintains a call
count, which is incremented every time the method is called. The JVM interprets a method until its call
count exceeds a JIT compilation threshold.
To help the JIT compiler analyze the method, its bytecodes are first reformulated in an internal
representation called trees, which resembles machine code more closely than bytecodes. Analysis and
optimizations are then performed on the trees of the method. At the end, the trees are translated into
native code.
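A hedged sketch of the call-count behaviour described above: the method below is invoked many times so that its invocation count crosses a typical JIT threshold and it becomes a candidate for compilation. Options for observing compilation activity (for example -Xjit:verbose on IBM JVMs or -XX:+PrintCompilation on HotSpot) are vendor-specific and are mentioned here only as an assumption.

// HotMethod.java - square() is called repeatedly; once its call count exceeds
// the JIT threshold, the JVM compiles it and subsequent calls run native code.
public class HotMethod {
    static long square(long x) {
        return x * x; // small, frequently called method: a typical JIT candidate
    }
    public static void main(String[] args) {
        long sum = 0;
        for (int i = 0; i < 1000000; i++) {
            sum += square(i);
        }
        System.out.println(sum);
    }
}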
JIT Compiler
The compilation consists of the following phases:
Inlining - Inlining is the process by which the trees of smaller methods are merged, or "inlined",
into the trees of their callers
Local optimizations -Local optimizations analyze and improve a small section of the code at a
time (Ex. Register Usage, Local data flow optimization)
Control flow optimizations - Control flow optimizations analyze the flow of control inside a
method and rearrange code paths to improve their efficiency.
Global optimizations -Global optimizations work on the entire method at once (Ex.
Synchronization optimization, GC optimization)
Native code generation - The trees of a method are translated into machine code instructions;
some small optimizations are performed specific to the platform's architectural characteristics.
All phases except native code generation are cross-platform code.
The compiled code is placed into a part of the JVM process space called the code cache; the location of
the method in the code cache is recorded, so that future calls to it will call the compiled code.
The JVM process consists of the JVM executable files and a set of JIT-compiled code that is linked
dynamically to the bytecode interpreter in the JVM.
Remote Method Invocation
Java Remote Method Invocation (Java RMI) enables you to create distributed Java technology-based
applications that can communicate with other such applications.
Methods of remote Java objects can be run from other Java virtual machines (JVMs), possibly on
different hosts.
The RMI implementation consists of three abstraction layers.
The Stub and Skeleton layer, which intercepts method calls made by the client to the interface reference variable
and redirects these calls to a remote RMI service.
The Remote Reference layer understands how to interpret and manage references made from clients to the
remote service objects.
The Transport layer, which is based on TCP/IP connections between machines in a network. It provides basic
connectivity, as well as some firewall penetration strategies.
RMI uses object serialization to marshal and unmarshal parameters and does not truncate types,
supporting object-oriented polymorphism.
The RMI registry is a simple name lookup service through which clients obtain references to remote objects.
Distributed garbage collection:
The RMI subsystem implements reference counting based Distributed Garbage Collection (DGC)
to provide automatic memory management facilities for remote server objects.
When the client creates (unmarshals) a remote reference, it calls dirty() on the server-side DGC. The
call returns a lease guaranteeing that the server-side DGC will not collect the remote object for a
certain time.
After the client has finished with the remote reference, it calls the corresponding clean() method.
The call indicates that the server does not need to keep the remote object alive for this client
RMI provides an easy way to distribute objects, but does not allow for interoperability between
programming languages.
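A minimal RMI sketch of the pieces described above (the interface, server, registry name, and port are illustrative assumptions, not part of the course material):

// Greeter.java - the remote interface; it extends java.rmi.Remote and every
// remote method declares java.rmi.RemoteException.
import java.rmi.Remote;
import java.rmi.RemoteException;

public interface Greeter extends Remote {
    String greet(String name) throws RemoteException;
}

// GreeterServer.java - exports the remote object (creating its stub) and binds
// it in the RMI registry so that clients can look it up by name.
import java.rmi.registry.LocateRegistry;
import java.rmi.registry.Registry;
import java.rmi.server.UnicastRemoteObject;

public class GreeterServer implements Greeter {
    public String greet(String name) {
        return "Hello, " + name;
    }
    public static void main(String[] args) throws Exception {
        Greeter stub = (Greeter) UnicastRemoteObject.exportObject(new GreeterServer(), 0);
        Registry registry = LocateRegistry.createRegistry(1099); // default registry port
        registry.bind("Greeter", stub);                          // register for client lookup
        System.out.println("Greeter bound in the RMI registry");
    }
}

A client would call LocateRegistry.getRegistry(host, 1099).lookup("Greeter") to obtain the stub and then invoke greet() as if it were a local method.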
Remote Method Invocation
The Common Object Request Broker Architecture (CORBA) is an open, vendor-independent
specification for distributed computing. It is published by the Object Management Group (OMG).
CORBA enables objects on various platforms and operating systems to interoperate, using the Internet
Inter-ORB Protocol (IIOP).
RMI-IIOP is an extension of traditional Java RMI that uses the IIOP protocol.
This protocol allows RMI objects to communicate with CORBA objects. Java programs can therefore
interoperate transparently with objects that are written in other programming languages, provided that
those objects are CORBA-compliant.
Objects can still be exported to traditional RMI (JRMP) and the two protocols can communicate.
In RMI (JRMP), the server objects are called skeletons; in RMI-IIOP, they are called ties. Client objects
are called stubs in both protocols.
CORBA
Java Native Interface
The Java Native Interface (JNI) establishes a well-defined and platform-independent interface between
the Java code and the Native code.
Native code can be used together with Java in two distinct ways: as "native methods" in a running JVM
and as the code that creates a JVM using the "Invocation API".
Native Methods - Java native methods are declared in Java, implemented in another language
(such as C or C++), and loaded by the JVM as necessary.
Invocation API - The aspect of the JNI used for creating the JVM is called the JNI Invocation API
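A hedged sketch of a native method declaration (the library and method names are illustrative; the matching C/C++ implementation, built against the JNI headers, is not shown):

// NativeDemo.java - declares a native method and loads the native library that
// implements it. The JVM resolves getTimestamp() to a function in that library.
public class NativeDemo {
    static {
        System.loadLibrary("nativedemo"); // loads libnativedemo.so / nativedemo.dll
    }
    public static native long getTimestamp(); // implemented in C/C++ via JNI
    public static void main(String[] args) {
        System.out.println("native timestamp: " + getTimestamp());
    }
}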
Java Internals
IBM JDK 1.6
Memory Management
Memory Management
Memory management contains the Garbage Collector and the Allocator. It is responsible for allocating
memory in addition to collecting garbage

[Figure: JVM memory layout - the native heap (thread stacks, buffers, JIT-compiled code, Motif structures) alongside the Java heap (a free list of chunks, each with a size field and a next pointer, terminated by NULL; the active area of the heap; the wilderness/Large Object Area; kCluster and pCluster) and the system heap.]
Java Heap

The heap is a contiguous area of storage that is obtained from the
operating system at JVM initialization
heapbase is the address of the start of the heap
heaptop is the address of the end of the heap
heaplimit is the address of the top of the currently-used part of
the heap. heaplimit can expand and shrink
The -Xmx option controls the size from heapbase to heaptop
The -Xms option controls the initial size from heapbase to
heaplimit
Default Value for Xmx
Windows: Half the real storage with a minimum of 16 MB and a
maximum of 2 GB-1
OS/390 and AIX: 64 MB
Linux: Half the real storage with a minimum of 16 MB and a
maximum of 512 MB-1
Default Value for Xms
Windows, AIX, and Linux: 4 MB.
OS/390: 1 MB
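As a small, hedged illustration of how these settings are visible to a running program (exact values depend on the JVM implementation and the options used):

// HeapSizes.java - run, for example, with "java -Xms32m -Xmx128m HeapSizes".
public class HeapSizes {
    public static void main(String[] args) {
        Runtime rt = Runtime.getRuntime();
        System.out.println("current heap size   : " + rt.totalMemory() + " bytes"); // grows up to the maximum
        System.out.println("maximum heap size   : " + rt.maxMemory() + " bytes");   // governed by -Xmx
        System.out.println("free in current heap: " + rt.freeMemory() + " bytes");
    }
}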
Object
Layout of an object on the heap
size + flags slot
The main purpose of this slot is to contain the length of the object
The size + flags slot is four bytes on 32-bit architecture and eight bytes on 64-bit
architecture
Mptr
The mptr slot is four bytes on 32-bit architecture and eight bytes on 64-bit architecture.
Locknflags : Its main use is to contain data for the LK component when locking. (The LK
component handles locking in the JVM)
Object Data
This is where the object data starts, the layout of which is object dependent


The size + flags, mptr, and locknflags are sometimes known collectively as the header.
Object
size + flags :
The bottom three bits are not used for the size, so the Garbage Collector uses them for some
flags to indicate different states of the object
As the size of objects is limited, the top two bits can be used for flags
Bit 1 has several purposes. It is the swapped bit, and is used during compaction. Bit 1 is also the
multipinned bit. It is used to indicate that this object has been pinned multiple times. During a garbage
collection cycle, the multipinned bit is removed and restored to allow the other uses of this multipurpose
bit.
Bit 2 is the dosed bit. The dosed bit is set on if the object is referenced from the stack or registers. (root
objects) . Referenced means that the object cannot be moved in this garbage collection cycle (because
the Garbage Collector cannot fix up the reference because it might not be a real reference but an integer
that happens to have the same value that an object on the heap has).
Bit 3 is the pinned bit. Pinned objects cannot be moved, usually because they are referenced from
outside the heap. Examples of this are Thread and ClassClass objects.
Bit 31 in 32-bit architecture, or bit 63 in 64-bit architecture, is the flat locked contention (flc) bit and is
used by the locking (LK) component.
Bit 32 in 32-bit architecture, or bit 64 in 64-bit architecture, is the hashed bit and is used to denote an
object that has returned its hashed value. This is required because the hash value is the address of the
object and the Garbage Collector needs to maintain this if it moves the object.


Object
Mptr : The mptr has one of two functions:
If this is not an array, the mptr points to the method table, from where the Garbage
Collector can get to the class block. In this way, the Garbage Collector can tell of what
class an object is an instantiation. The method table and class block are allocated by the
class loader (CL) component and are not in the heap
If this is an array, the mptr contains a count of how many array entries are in this object.
Locknflags also contains these flags:
Bit 2 is the array flag. If this bit is set on, the object is an array and the mptr field contains
a count of how many elements are in the array.
Bit 3 is the hashed and moved bit. If this bit is set on, it indicates that this object has been
moved after it was hashed, and that the hash value can be found in the last slot of the
object
The locknflags slot is four bytes on 32-bit architecture and eight bytes on 64-bit architecture,
although only the lower four bytes are used.



Object Allocation
Object Allocation : Object allocation is driven by requests by applications, class libraries, and the JVM for storage of
Java objects, which can vary in size and require different handling
Every allocation requires a heap lock to be acquired to prevent concurrent thread access
To optimize this allocation, particular areas of the heap are dedicated to a thread, known as the TLH (thread
local heap), and that thread can allocate from its TLH without having to lock out other threads. This technique
delivers the best possible allocation performance for small objects. Objects are allocated directly from a thread
local heap.
All objects less than 512 bytes (768 bytes on 64-bit JVMs) are allocated from the cache
Large Object Allocation :
All objects >= 64 KB are termed large from the VM perspective
In practice, objects of 10 MB or more are usually considered large
The Large Object Area is 5% of the active heap by default.
Allocation of any object is first attempted from the free list of the main heap; if there is not enough
contiguous space in the main heap to satisfy the allocation request for an object >= 64 KB, it is
allocated in the Large Object Area (wilderness)
Objects < 64 KB can only be allocated in the main heap and never in the Large Object Area
The LOA boundary is calculated when the heap is initialized, and recalculated after every garbage collection.
The size of the LOA can be controlled using command-line options: -Xloainitial (default 0.05, i.e. 5%),
-Xloaminimum (default 0), and -Xloamaximum (default 0.5, i.e. 50%). The options take values between 0 and
0.95 (0% through 95% of the current tenured heap size).

Types of Allocation
Cache allocation is specifically designed to deliver the best possible allocation performance for small
objects
Objects are allocated directly from a thread local allocation buffer that the thread has previously
allocated from the heap. A new object is allocated from the end of this cache without the need to grab
the heap lock; therefore, cache allocation is very efficient. The criterion for using cache allocation is:
Use cache allocation if the size of the object is less than 512 bytes, or if the object can be
contained in the current cache block
The cache block is sometimes called a thread local heap (TLH)

Heap lock allocation occurs when the allocation request cannot be satisfied from the existing cache:
that is, when the request is greater than 512 bytes or cannot be contained in the current cache block.
Heap lock allocation requires the heap lock to be acquired, and is avoided where possible by using
the cache instead.

System Heap

The system heap contains only objects that have a life-expectancy of the life of the JVM
The objects that are in this heap are the class objects for system and shareable middleware and
application classes
The Garbage Collector never collects the system heap because all objects that are in the heap are
either reachable for the lifetime of the JVM, or, in the case of shareable application classes, have
been selected to be reused during the lifetime of the JVM
The system heap is a chain of noncontiguous areas of storage. The initial size of the system heap is
128 KB in 32-bit architecture, and 8 MB in 64-bit architecture. If the system heap fills, it obtains
another extent and chains the extents together.
Reachable Objects & Free List

Reachable Objects
The active state of the JVM is made up of the set of stacks that represents the threads, the
statics that are inside Java classes, and the set of local and global JNI references
All functions that are invoked inside the JVM itself cause a frame on the C stack. This
information is used to find the roots
These roots are then used to find references to other objects. This process is repeated until all
reachable objects are found
Free List
The head of the list is in global storage and points to the first free chunk that is on the heap.
Each chunk of free storage has a size field and a pointer that points to the next free chunk. The
free chunks are in address sequence. The last free chunk has a NULL pointer
Alloc bits and mark bits
Alloc bits and mark bits
These two bit vectors indicate the state of objects that are on the heap. Because all objects that
are on the heap start on an 8-byte boundary, both vectors have one bit to represent eight bytes
of the heap. Therefore, each of these vectors is 1/64 of the heap size
When objects are allocated in the heap, a bit is set on in allocbits to indicate the start of the
object
During the mark phase, a bit is set on in markbits to indicate the start of a live object
Java Internals
IBM JDK 1.6
Garbage Collection
Terminology
Pinned objects are those that cannot be moved because the JNI has given native code direct access
to the contents of the object, e.g., An array or ClassClass objects.
Dosed Objects
References from a stack or registers to an object cause the Garbage Collector not to move that
object during the compaction phase. Such objects that are temporarily fixed in position are referred to
as dosed objects.
When the method calls complete and the references from those method frames on the stack are
cleared, the object can be moved again.
Dark Matter
Any free chunk of storage that is larger than 512 bytes is treated as free space and is available to
mutators (object allocators). Chunks smaller than 512 bytes are termed dark matter
and are not available as free space.
Garbage Collection Basics
Garbage Collection is performed when there is:
An allocation failure in the heap lock allocation
Specific call to System.gc
Garbage collection has three phases:
Mark
Sweep
Compaction (optional)
Garbage Collection is a stop-the-world (STW) operation, because all application threads are stopped
while the garbage is collected.
GC occurs in the thread that handled the request
Requested object allocation that caused allocation failure
Programmatically requested GC
On heap lock allocation failure, if at least 30% of the heap has been allocated since the last garbage
collection (the 30% threshold can be changed with the -Xminf parameter) and the size of the allocation
request is less than 64 KB, the Garbage Collector runs
2 types of Garbage Collector
Mark and Sweep Collector
Generational Collector
Mark Sweep Collector
Obtain locks and suspend threads
Mark phase
Process of identifying all objects reachable from the root set
All live objects are marked by setting a mark bit in the mark bit vector
Reference handling
Enqueuing of finalizers
Sweep phase
Sweep phase identifies all the objects that have been allocated but are no longer referenced
Compaction (optional)
Once garbage has been removed, we consider compacting the resulting set of objects to
remove spaces between them
Release locks and resume threads
Mark Phase
Process of identifying all objects reachable from the root set
The Garbage Collector performs the scan of a thread stack to identify the slot that can be a potential pointer to
an object
Objects that are referenced in this way are known as roots, and have their dosed bit set on to indicate that they
cannot be moved.
All live objects are marked by setting a mark bit in the mark bit vector
Parallel Mark
The majority of garbage collection time is spent marking objects. Therefore, a parallel version of Garbage
Collector Mark has been developed
The time spent marking objects is decreased through the addition of helper threads and a facility that shares
work between those threads
A single application thread is used as the master coordinating thread, often known as the main gc thread. This
thread has the responsibility for scanning C-stacks to identify root pointers for the collection
A platform with N processors also has N-1 new helper threads that work with the master thread to complete the
marking phase of garbage collection
The default number of helper threads can be overridden with the -Xgcthreads<n> parameter, where n represents the
number of threads
Mark Phase
Concurrent Mark
Concurrent mark gives reduced garbage collection pause times when heap sizes increase
It starts a concurrent marking phase before the heap is full. In the concurrent phase, the Garbage Collector
scans the roots by asking each thread to scan its own stack. These roots are then used to trace live objects
concurrently.
Tracing is done by a low-priority background thread and by each application thread when it does a heap lock
allocation
A STW (Stop The World) collection is started when one of the following occurs:
Allocation failure
System.gc
Concurrent mark completes all the marking that it can do
The garbage collection policy, and with it concurrent mark, is selected with -Xgcpolicy:<gencon | optavgpause | optthruput | subpool>
Gencon - Requests the combined use of concurrent and generational GC to help minimize the time that is spent in
any garbage collection pause.
Optthruput disables concurrent mark. This is the default setting.
Optavgpause enables concurrent mark
Subpool - Disables concurrent mark. It uses an improved object allocation algorithm to achieve better performance
when allocating objects on the heap. This option might improve performance on SMP systems with 16 or more
processors. The subpool option is available only on AIX, Linux PPC and zSeries, z/OS, and i5/OS
Sweep Phase
The sweep phase identifies the intersection of the allocbits and markbits vectors; that is, objects that
have been allocated but are no longer referenced
In the bitsweep technique, the Garbage Collector examines the markbits vector directly and looks for
long sequences of zeros, which probably identify free space. When such a long sequence is found,
the Garbage Collector checks the length of the object at the start of the sequence to determine the
amount of free space that is to be released. If this amount of free space is greater than 512 bytes plus
the header size, this free chunk is put on the freelist
The small areas of storage that are not on the freelist are known as "dark matter", and they are
recovered when the objects that are next to them become free, or when the heap is compacted
The markbits are copied to the allocbits so that on completion, the allocbits correctly represent the
allocated objects that are on the heap

Sweep Phase
Parallel Sweep
Parallel Bitwise Sweep improves sweep time by using all available processors
In Parallel Bitwise Sweep, the Garbage Collector uses the same helper threads that are used in
Parallel Mark, so the default number of helper threads is also the same.
The heap is divided into sections. The number of sections is significantly larger than the number
of helper threads. The calculation for the number of sections is as follows:
32 x the number of helper threads, or the maximum heap size / 16 MB whichever is larger
The helper threads take a section at a time and scan it, performing a modified bitwise sweep.
The results of this scan are stored for each section. When all sections have been scanned, the
freelist is built.
Sweep Phase
Concurrent Sweep
Like concurrent mark, concurrent sweep gives reduced garbage collection pause times when
heap sizes increase
Concurrent sweep starts immediately after a stop-the-world (STW) collection
The mark map used for concurrent mark is also used for sweeping.
The concurrent sweep process is split into two types of operations:
Sweep analysis: Sections of data in the mark map (mark bit array) are analyzed for
ranges of free or potentially free memory.
Connection: The analyzed sections of the heap are connected into the free list.
To enable concurrent sweep, use the -Xgcpolicy:optavgpause parameter. Concurrent sweep becomes active
along with concurrent mark.
The modes optthruput, subpool, and gencon do not support concurrent sweep
Compaction Phase
When objects are freed by garbage collection, the heap becomes fragmented. This fragmentation can
cause a state in which enough free space is still available in the heap, but the free space is not
contiguous, so it cannot be used for further object allocations.
Compaction defragments the Java heap. The process of compaction is complicated because, if any
object is moved, the Garbage Collector must change all the references that exist to it.
[Figure: heap compaction; the highlighted blocks represent pinned or dosed objects, which are not moved.]
Compaction Phase
Compaction occurs if any one of the following is true and -Xnocompactgc has not been
specified:
-Xcompactgc has been specified.
Following the sweep phase, not enough free space is available to satisfy the allocation request.
A System.gc() has been requested, and the last allocation failure garbage collection did not
compact or -Xcompactexplicitgc has been specified
At least half the previously available memory has been consumed by TLH allocations (ensuring an
accurate sample) and the average TLH size falls below 1024 bytes
The scavenger is enabled, and the largest object that the scavenger failed to tenure in the most
recent scavenge is larger than the largest free entry in tenured space.
The heap is fully expanded and less than 4% of old space is free.
Less than 128 KB of the active heap is free.

Compaction Avoidance
Compaction avoidance focuses on correct object placement
and is done using a concept called Wilderness Preservation
Wilderness preservation attempts to keep a region of the
heap in an unused state by focusing allocation activity
elsewhere
The wilderness (Large Object Area) is consumed only when
necessary to satisfy a large allocation, or when not enough
allocation progress has been made since the previous
garbage collection
The wilderness is allocated at the end of the active part of
the heap. Its initial size is 5% of the active part of the heap,
and it expands and shrinks depending on usage
Incremental Compaction
The process of compaction can cause a considerable increase in the pause time of a garbage
collection cycle (Ex. 40 seconds for 1 GB heap). Long pause times are unacceptable for real-world
applications
Incremental compaction is a way of spreading compaction work across garbage collection cycles,
thereby reducing pause times
In incremental compaction, the Garbage Collector splits the heap into sections and compacts each
section in the same way in which it does a full compaction. That is, the Garbage Collector moves all
the moveable objects down the heap
This action retrieves all the dark matter and leaves large areas of free space
Incremental compaction has two main steps:
Identify and remember all references that point into the compaction region; this action is done during the mark
phase. At the end of this stage, all free space that is in the sections can be identified.
Compute the new locations of objects and move them in the compaction region. Then set up pointers to those
objects.
Incremental Compaction
Individual sections on which incremental compaction runs are of fixed size, which therefore constrains
the time required for compaction
Incremental compaction is done only if the heap size is greater than a minimum value (128 MB)
Incremental compaction operates in a cycle. An incremental compaction cycle is a cycle of successive
garbage collection cycles that incrementally compacts the whole heap, a region at a time.
The compaction spans multiple garbage collection cycles, therefore spreading compaction time over
multiple garbage collections and reducing pause times.
Incremental compaction is ON by default (-Xpartialcompactgc enables incremental compaction;
-Xnopartialcompactgc disables incremental compaction)
The heap is divided into regions
The regions are further divided into sections
Each section is handled by one helper thread
A region is divided into (number of helper threads + 1) or 8 sections, whichever is less
The whole heap is covered in a few GC cycles.

Reference Objects
A reference object encapsulates a reference to some other object, which is called the referent. Reference objects enable
all references to be handled and processed in the same way. Therefore, two separate objects are created on the heap:
the object itself and a separate reference object.
Objects that are associated with a finalizer are 'registered' with the Finalizer class on creation. The result is the creation
of a Final Reference object that is associated with the Finalizer queue and that refers to the object that is to be finalized.
A Reference Queue is a simple data structure onto which the garbage collector places reference objects when the
reference field is cleared (set to null).
Soft and weak references are automatically cleared by the garbage collector, if the referent objects are not strongly
reachable. Unlike soft and weak references, phantom references are not automatically cleared by the garbage collector
as they are en-queued. An object that is reachable via phantom references will remain so until all such references are
cleared or themselves become unreachable
The Garbage Collector is required to clear all soft
references before throwing an OutOfMemoryError.
Soft references are considered young until they have
survived 32 GC cycles (age = 32), and they are not
eligible for collection while they are young.
The -Xsoftrefthreshold parameter can be used to adjust
the frequency of collection.
Reference Objects
Going from strongest to weakest, the different levels of reachability reflect the life cycle of an object.
An object is strongly reachable if it can be reached by some thread without traversing any reference objects. A newly-
created object is strongly reachable by the thread that created it.
An object is softly reachable if it is not strongly reachable but can be reached by traversing a soft reference. Soft
reference objects are cleared at the discretion of the garbage collector in response to memory demand
An object is weakly reachable if it is neither strongly nor softly reachable but can be reached by traversing a weak
reference. When the weak references to a weakly-reachable object are cleared, the object becomes eligible for
finalization.
An object is phantom reachable if it is neither strongly, softly, nor weakly reachable, it has been finalized, and some
phantom reference refers to it.
Finally, an object is unreachable, and therefore eligible for reclamation, when it is not reachable in any of the above
ways.
During garbage collection, the referent field is not traced during the marking phase. When marking is complete, the
references are processed in sequence:
Soft - Soft references are for implementing memory-sensitive caches
Weak - Weak references are for implementing canonicalizing mappings that do not prevent their keys (or values) from
being reclaimed,
Final
Phantom - Phantom references are for scheduling pre-mortem cleanup actions in a more flexible way than is possible with
the Java finalization mechanism.
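A minimal sketch of reference processing using java.lang.ref (behaviour depends on the collector; System.gc() is only a hint, so the reference may not be enqueued immediately):

// WeakRefDemo.java - a WeakReference registered with a ReferenceQueue. Once the
// only strong reference is dropped and a collection runs, the collector clears
// the weak reference and places it on the queue.
import java.lang.ref.ReferenceQueue;
import java.lang.ref.WeakReference;

public class WeakRefDemo {
    public static void main(String[] args) throws InterruptedException {
        ReferenceQueue<Object> queue = new ReferenceQueue<Object>();
        Object referent = new Object();
        WeakReference<Object> ref = new WeakReference<Object>(referent, queue);

        referent = null;   // the object is now only weakly reachable
        System.gc();       // request a collection (a hint, not a guarantee)

        // remove() blocks until a cleared reference is enqueued, or the timeout expires
        System.out.println("enqueued: " + (queue.remove(1000) == ref));
    }
}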
Heap Expansion & Shrinkage
Heap expansion occurs after garbage collection and after all the threads have been restarted, but while
the HEAP_LOCK is still held. The active part of the heap is expanded up to the maximum if any one of
the following is true:
The Garbage Collector did not free enough storage to satisfy the allocation request.
Free space is less than the minimum free space, which you can set by using the -Xminf parameter. The default is
30%.
More than the maximum time threshold is being spent in garbage collection, set using the -Xmaxt parameter. The
default is 13%.
The amount by which the heap is expanded is rounded to the nearest 512-byte boundary on 32-bit JVMs or a
1024-byte boundary on 64-bit JVMs
Heap Expansion & Shrinkage
Heap shrinkage occurs after garbage collection, but when all the threads are still suspended.
Shrinkage does not occur if any one of the following is true:
The Garbage Collector did not free enough space to satisfy the allocation request.
The maximum free space, which can be set by the -Xmaxf parameter (default is 60%), is set to 100%.
The heap has been expanded in the last three garbage collections.
This is a System.gc() and the amount of free space at the beginning of the garbage collection was less than -
Xminf (default is 30%) of the live part of the heap.
If none of the above is true and more than -Xmaxf free space exists, the Garbage Collector must
calculate by how much to shrink the heap to get it to -Xmaxf free space, without going below the initial (-
Xms) value. This value is rounded to the nearest 512-byte boundary on 32-bit JVMs or a 1024-byte
boundary on 64-bit JVMs
A compaction occurs before the shrink if all the following are true:
A compaction was not done on this garbage collection cycle.
No free chunk is at the end of the heap, or the size of the free chunk that is at the end of the heap is less than
10% of the required shrinkage amount.
The Garbage Collector did not shrink and compact on the last garbage collection cycle
Generational Concurrent GC
A generational garbage collection strategy is well suited to an application that creates many short-
lived objects (typically transactional applications). It can be enabled using -Xgcpolicy:gencon.
The Java heap is split into two areas, a new (or nursery) area and an old (or tenured) area.
Objects are created in the new area and, if they continue to be reachable for long enough, they are
moved into the old area. Objects are moved when they have been reachable for enough garbage
collections (known as the tenure age).
The new area is split into two logical spaces: allocate and survivor
Objects are allocated into the Allocate Space. When that space is filled, a garbage collection process called
scavenge is triggered.
During a scavenge, reachable objects are copied either into the Survivor Space or into the Tenured Space if they
have reached the tenured age.
Generational Concurrent GC
When all the reachable objects have been copied, the spaces in the new area switch roles. The new
Survivor Space is now entirely empty of reachable objects and is available for the next scavenge.

Tenure age is a measure of the object age at which it should be promoted to the tenure area
This age is dynamically adjusted by the JVM and reaches a maximum value of 14
An object's age is incremented on each scavenge. A tenure age of x means that an object is promoted to the tenure
area after it has survived x flips between survivor and allocate space.
The threshold is adaptive and adjusts the tenure age based on the percentage of space used in the new area.
Tenured space is concurrently traced with an approach similar to the one used for -Xgcpolicy:optavgpause
Java Internals
Sun JDK 1.5
JVM
Sun's Java Virtual Machine (JVM) is a core component of the Java Runtime Environment (JRE) from Sun
Microsystems. The JVM is a virtualized computing machine that follows a well-defined specification for the runtime
requirements of the Java programming language
JVM is called Virtual because it provides a machine interface that does not depend on the underlying operating system and
machine hardware architecture
Java programs are compiled into bytecodes (Class file) which are then executed in the JVM
The JVM is specific to an operating system and hardware combination
All JVMs:
Execute code that is defined by a standard known as the class file format (bytecode)
Provide fundamental runtime security such as bytecode verification
Provide intrinsic operations such as performing arithmetic and allocating new objects

Sun's JVM is called the HotSpot JVM
It has two flavors: a Client VM for client-side applications and a Server VM tuned for
server applications
Previous versions of the JVM, such as the Classic VM, used indirect handles to represent object references.
This made relocating objects easier during garbage collection
It represented a significant performance bottleneck, because accesses to the instance variables of objects required two
levels of indirection.
In the Java HotSpot VM, no handles are used by Java code. Object references are implemented as direct
pointers.
This provides C-speed access to instance variables.
When an object is relocated during memory reclamation, the garbage collector is responsible for finding and
updating all references to the object in place.
Layout of an Object on the Heap
Object contains the following components
Object data
Header
Java HotSpot VM uses a two machine-word object header
First header word contains information such as the identity hash code and GC status information
Second header word is a reference to the object's class
Only arrays have a third header word, for the array size
Reflective data are represented as objects (e.g., Class objects). This enables the same GC to collect such objects
Memory Model



Native Thread Support (including Preemption and Multiprocessing)
Per-thread method activation stacks are represented using the host operating system's stack
and thread model.
Both Java programming language methods and native methods share the same stack, allowing
fast calls between the native code and Java code.
Fully preemptive Java threads are supported using the host operating system's thread
scheduling mechanism.
A major advantage of using native OS threads and scheduling is the ability to take advantage of
native OS multiprocessing support transparently
Memory Model



Memory management is the process of recognizing when allocated objects are no longer needed,
deallocating (freeing) the memory used by such objects, and making it available for subsequent
allocations
Explicit Memory Management
The programmer is responsible for allocating and freeing memory
Common errors
Dangling references - Object is removed from heap but the pointer to the object is not
removed
Application memory leaks (Ex. De-allocating only the first element of a linked list, causing the
other elements to go out of reach)
Automatic Memory Management
Performed by a program called garbage collector
Garbage collection avoids the dangling reference problem, because an object that is still
referenced somewhere will never be garbage collected and so will not be considered free.
Garbage collection also solves the space leak problem since it automatically frees all memory
no longer referenced.
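A small sketch of the reachability rule stated above (class and field names are illustrative): an object that is still referenced, for example from a static collection, is never collected, while an object with no remaining references becomes eligible for reclamation.

// ReachabilityDemo.java
import java.util.ArrayList;
import java.util.List;

public class ReachabilityDemo {
    private static final List<byte[]> retained = new ArrayList<byte[]>();

    public static void main(String[] args) {
        retained.add(new byte[1024 * 1024]); // strongly reachable via the static list: never collected
        byte[] temp = new byte[1024 * 1024]; // reachable only through a local variable
        temp = null;                         // no references remain: eligible for garbage collection
    }
}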

Memory Management



Garbage Collection Basics
Garbage Collection is performed when there is:
An allocation failure in the heap lock allocation
Specific call to System.gc [can be disabled using -XX:+DisableExplicitGC parameter]
Garbage collector is responsible for
Allocating memory
Ensuring that any referenced objects remain in memory
Recovering memory used by objects that are no longer reachable from references in executing
code.
Garbage collection has three phases:
Mark
Sweep
Compaction (optional)
GC occurs in the thread that handled the request which triggered GC
Beginning with the J2SE Platform version 1.2, the virtual machine incorporated a number of different
garbage collection algorithms that are combined using generational collection
While naive garbage collection examines every live object in the heap, generational collection exploits
several empirically observed properties of most applications to avoid extra work.
Garbage Collector



Infant Mortality
In a typical plot of object lifetimes, the sharp peak at the left represents objects
that can be reclaimed (i.e., have "died") shortly after being
allocated. Iterator objects, for example, are often alive for the
duration of a single loop.
Efficient collection is made possible by focusing on the fact
that a majority of objects "die young".

HotSpot Generations
Memory in the Java HotSpot virtual machine is organized into three generations: a young generation,
an old generation, and a permanent generation
The young generation consists of an area called Eden plus two smaller survivor spaces
Most objects are initially allocated in Eden.
The survivor spaces hold objects that have survived at least one young generation collection
and have thus been given additional chances to die before being considered old enough to be
promoted to the old generation
At any given time, one of the survivor spaces holds such objects, while the other is empty and
remains unused until the next collection.
HotSpot Generations
User Heap (Young and Tenured Generation)
The sizes of the initial heap and maximum heap are calculated based on the size of the physical
memory.
If phys_mem is the size of the physical memory on the platform, the initial heap size will
be set to phys_mem / DefaultInitialRAMFraction.
DefaultInitialRAMFraction is a command line option with a default value of 64.
The maximum heap size will be set to phys_mem / DefaultMaxRAMFraction.
DefaultMaxRAMFraction has a default value of 4.
The minimum and maximum heap size can be set using the -Xms and -Xmx parameters
respectively
System Heap (Permanent Generation)
Permanent Generation space is reserved for long-term objects. (mostly Class objects that are
part of the native JVM or created by the application as loaded by ClassLoaders).
The minimum and maximum permanent generation size can be set using the -XX:PermSize=<n> and
-XX:MaxPermSize=<n> parameters (the default maximum is 64 MB) respectively
The space occupied by permanent generation is in addition to the space used by user heap
(Ex. PermGen=128MB, User Heap=512MB, Total RAM occupied = 640MB)


Fast Allocation
Allocations from large contiguous blocks are efficient, using a simple bump-the-pointer technique (i.e., the
end of the previously allocated object is always tracked)
For multithreaded applications, allocation operations need to be multithread-safe. If global locks were
used to ensure this, then allocation into a generation would become a bottleneck and degrade
performance.
Thread-Local Allocation Buffers
Improves multithreaded allocation throughput by giving each thread its own buffer
Only one thread can be allocating into each TLAB, allocation can take place quickly by utilizing
the bump-the-pointer technique, without requiring any locking
A thread acquires a TLAB at its first object allocation after a GC scavenge
The TLAB is released when it is full (or nearly so), or the next GC scavenge occurs
TLABs are allocated only in Eden, never from Survivor-Space or the OldGen.
Size of the TLAB can be specified using -XX:TLABSize flag, the initial size of a TLAB is computed
as: init_size = size_of_eden / (allocating_thread_count * target_refills_per_epoch)
Allocating_thread_count is the expected number of threads which will be actively allocating during the next
epoch (an epoch is the mutator time between GC scavenges.)
Target_refills_per_epoch is the desired number of tlab allocations per thread during an epoch
Types of Collector
Garbage Collection Types
When the young generation fills up, a young generation collection (sometimes referred to as a minor collection) of
just that generation is performed
When the old or permanent generation fills up, what is known as a full collection (sometimes referred to as a
major collection) is typically done. All generations are collected as part of full collection.
Types of Collector
Serial Collector: Young Generation
With the serial collector, both young and old collections are done serially (using a single CPU), in a stop-
the-world fashion.
Live objects from the Eden space and From survivor space are copied to the To survivor space
Ones that are too large to fit comfortably in the To survivor space are directly copied to the old generation
Once GC is complete, both Eden and the From survivor space are empty. Only the To survivor space
contains live objects. At this point, the survivor spaces swap roles.
[Figure: the young generation before and after a GC scavenge.]
Types of Collector
Serial Collector: Old Generation
The old and permanent generations are collected via a mark-sweep-compact collection algorithm
In the mark phase, the collector identifies which objects are still live.
The sweep phase sweeps over the generations, identifying garbage.
The collector then performs sliding compaction, sliding the live objects towards the beginning of the old
generation space, leaving any free space in a single contiguous chunk at the opposite end.
The serial collector is automatically chosen as the default garbage collector on machines that are not
server-class machines
The serial collector can be chosen explicitly using the -XX:+UseSerialGC parameter
The serial collector is the collector of choice for most applications that are run on client-style machines and that do not
have a requirement for low pause times

Types of Collector
Parallel Collector
The parallel collector, also known as the throughput collector, was developed in order to take advantage of
available CPUs rather than leaving most of them idle while only one does garbage collection work.
The young generation parallel collector uses a parallel version of the young generation collection algorithm
utilized by the serial collector.
It is still a stop-the-world and copying collector
Performing the young generation collection in parallel, using many CPUs, decreases garbage collection
overhead and hence increases application throughput.
Old generation garbage collection for the parallel collector is done using the same serial mark-sweep-compact
collection algorithm as the serial collector.
The parallel collector is automatically chosen as the default
garbage collector on server-class machines
Parallel collector can be explicitly chosen using
-XX:+UseParallelGC parameter
Parallel collector is the choice for applications that do not have
pause time constraints since infrequent, but potentially long, old
generation collections will still occur
Types of Collector
Parallel Compacting Collector
Young generation garbage collection for the parallel compacting collector is done using the same algorithm as that
for young generation collection using the parallel collector.
With the parallel compacting collector, the old and permanent generations are collected in a stop-the-world, mostly
parallel fashion with sliding compaction
The collector utilizes three phases.
First, each generation is logically divided into fixed-sized regions.
In the marking phase,
The initial set of live objects directly reachable from the application code is divided among garbage
collection threads, and then all live objects are marked in parallel.
As an object is identified as live, the data for the region it is in is updated with information about the
size and location of the object
Summary phase
The summary phase operates on regions, not objects.
Examines the density of the regions, starting with the leftmost one, until it reaches a point where the
space that could be recovered from a region and those to the right of it is worth the cost of
compacting those regions
The regions to the right of that point will be compacted, eliminating all dead space. The new location
of the first byte of live data for each compacted region will be calculated and stored.
The summary phase is currently implemented as a serial phase
Types of Collector
In the compaction phase,
The garbage collection threads use the summary data to identify regions that need to be filled,
and the threads can independently copy data into the regions.
This produces a heap that is densely packed on one end, with a single large empty block at the
other end.
As with the parallel collector, the parallel compacting collector is beneficial for applications that are run
on machines with more than one CPU.
The parallel operation of old generation collections reduces pause times and makes the parallel
compacting collector more suitable than the parallel collector for applications that have pause time
constraints
The parallel compacting collector might not be suitable for applications run on large shared machines
(such as SunRays), where no single application should monopolize several CPUs for extended periods
of time
In such cases, the number of threads to be used can be controlled using the -XX:ParallelGCThreads=<n>
parameter
Parallel compacting collector can be explicitly chosen using -XX:+UseParallelOldGC parameter
Types of Collector
Concurrent Mark-Sweep (CMS) Collector
Young generation collections do not typically cause long pauses. However, old generation collections, though
infrequent, can impose long pauses, especially when large heaps are involved. To address this issue, the HotSpot
JVM includes a collector called the concurrent mark-sweep (CMS) collector, also known as the low-latency
collector.
The CMS collector collects the young generation in the same manner as the parallel collector.
Initial Mark - identifies the initial set of live objects directly reachable from the application code.
Concurrent marking phase - marks all live objects that are transitively reachable from this set. This
happens when the application is running.
Remark Phase - finalizes marking by revisiting any objects that were modified during the concurrent
marking phase. Efficiency is increased by running multiple threads
Concurrent sweep phase - reclaims all the garbage that has been identified
Types of Collector
The CMS collector is the only collector that is non-compacting. Hence the free space is not contiguous, and free lists have
to be maintained.
Floating garbage - Objects that become garbage during the mark phase will not be reclaimed until the next old
generation collection.
Fragmentation - Garbage collector tracks popular object sizes, estimates future demand, and may split or join free
blocks to meet demand
As it is not a stop-the-world process, the CMS collector starts at a time based on statistics regarding previous
collection times and how quickly the old generation becomes occupied.
The CMS collector will also start a collection if the occupancy of the old generation exceeds something called the
initiating occupancy.
The value of the initiating occupancy is set by the command line option -XX:CMSInitiatingOccupancyFraction=<n>,
where n is a percentage of the old generation size. The default is 68.
For machines with a smaller number of CPUs,
The CMS collector can be used in a mode in which the concurrent phases are done incrementally
The work done by the collector is divided into small chunks of time that are scheduled between young
generation collections
CMS collector can be used if the application needs shorter garbage collection pauses and can afford to share
processor resources with the garbage collector when the application is running. (Ex. Web Servers)
CMS collector can be explicitly chosen using -XX:+UseConcMarkSweepGC parameter
CMS incremental mode can be explicitly chosen using the -XX:+CMSIncrementalMode parameter

Reference Objects
Soft references are cleared less aggressively in the server virtual machine than the client.
The rate of clearing can be slowed by increasing -XX:SoftRefLRUPolicyMSPerMB=<n> parameter
SoftRefLRUPolicyMSPerMB is a measure of the time that a soft reference survives for a given amount of free space in
the heap.
The default value is 1000 ms per megabyte. This can be read to mean that a soft reference will survive (after the last
strong reference to the object has been collected) for 1 second for each megabyte of free space in the heap.
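A hedged sketch of the memory-sensitive cache use case that soft references are intended for (class and method names are illustrative):

// SoftCacheEntry.java - the cached data can be reclaimed by the collector under
// memory pressure; the getter transparently reloads it when that happens.
import java.lang.ref.SoftReference;

public class SoftCacheEntry {
    private SoftReference<byte[]> cached;

    public byte[] get() {
        byte[] data = (cached != null) ? cached.get() : null;
        if (data == null) {                        // cleared by the GC, or never loaded
            data = loadExpensiveData();
            cached = new SoftReference<byte[]>(data);
        }
        return data;
    }

    private byte[] loadExpensiveData() {           // placeholder for a costly computation or I/O
        return new byte[1024 * 1024];
    }
}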

Tuning the Garbage Collector
Maximum Pause Time Goal
The maximum pause time goal is specified with the command line option -XX:MaxGCPauseMillis=<n> where n
represents the desired pause times in milliseconds
Heap size and other garbage collection-related parameters will be adjusted in an attempt to keep garbage
collection pauses shorter than n milliseconds.
May reduce overall throughput of the application
By default no maximum pause time goal is set.
Throughput Goal
The throughput goal is measured as the time spent doing garbage collection / the time spent outside of
garbage collection (application time). The time spent in garbage collection is the total time for all generations.
The goal is specified by the command line option
-XX:GCTimeRatio=<n>
The ratio of garbage collection time to application time is 1 / (1 + n). For example, -XX:GCTimeRatio=19 sets a
goal of 1/20, i.e. 5% of the total time, for garbage collection.
If the goal is not met, the sizes of the generations are increased in an effort to increase the time the application
can run between collections.
The default goal is 1% (i.e. n = 99).
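For example, a command line (jar name purely illustrative) asking the parallel collector for pauses of at most 200 ms and at most 5% of total time in GC could be:
java -XX:+UseParallelGC -XX:MaxGCPauseMillis=200 -XX:GCTimeRatio=19 -jar app.jar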

Tuning the Garbage Collector
Footprint Goal
If the throughput and maximum pause time goals have been met, the garbage collector reduces
the size of the heap until one of the goals (invariably the throughput goal) cannot be met.
Then, the goal that is not being met will be addressed
Goal Priorities
The parallel collector prioritizes the goals in the following order
Maximum pause time
Throughput
Footprint
The statistics (e.g., average pause time) are updated at the end of each collection
The collector then checks whether the goals are met and makes any needed adjustments to the size of a generation
Explicit garbage collections (calls to System.gc()) are ignored in terms of keeping statistics and
making adjustments to the sizes of generations


Tuning the Garbage Collector
Adjusting the Size of Generations
Growing and shrinking the size of a generation is done by increments that are a fixed percentage of the size of
the generation
By default a generation grows in increments of 20% and shrinks in increments of 5%.
Growth percentage is adjusted using the following parameters
-XX:YoungGenerationSizeIncrement=<n> for the young generation
-XX:TenuredGenerationSizeIncrement=<n> for the tenured generation
Shrink percentage is adjusted using the -XX:AdaptiveSizeDecrementScaleFactor=<n> parameter
If the size of an increment for growing is X percent, the size of the decrement for shrinking will be X / n percent.
At startup, there is a supplemental percentage added to the growth percentage.
Supplement decays with the number of collections
The intent of the supplement is to increase startup performance.
There is no supplement used for shrink percentage
Maximum pause time goal
If not met, the size of only one generation is shrunk at a time
If not met for both generations, the size of the generation with the larger pause time is shrunk first.
Throughput goal
If not met, the sizes of both generations are increased. Each is increased in proportion to its respective
contribution to the total garbage collection time. (ex.) if young generation collection time is 25% of the total
collection time and if growth percentage is 20%, then the young generation would be increased by 5%.
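As a sketch, the default behaviour described above corresponds roughly to the following settings (values shown only for illustration):
-XX:YoungGenerationSizeIncrement=20 -XX:TenuredGenerationSizeIncrement=20 -XX:AdaptiveSizeDecrementScaleFactor=4
i.e. grow by 20% and shrink by 20 / 4 = 5%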

Key settings related to GC
Java Internals
IBM JDK 1.6
Verbose GC
Verbose GC
The -verbose:gc option can be used to understand what is happening during garbage collection
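For example, output such as the sample below can be written to the console with -verbose:gc, or redirected to a file with the IBM -Xverbosegclog option (file and jar names are only illustrations):
java -verbose:gc -jar app.jar
java -Xverbosegclog:gc.log -jar app.jar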

<gc type="global" id="5" totalid="5" intervalms="18.880">
<compaction movecount="9282" movebytes="508064" reason="forced compaction" />
<classunloading classloaders="0" classes="0" timevmquiescems="0.000" timetakenms="0.064"/>
<expansion type="tenured" amount="1048576" newsize="3145728" timetaken="0.011"
reason="insufficient free space following gc" />
<refs_cleared soft="0" weak="0" phantom="0" />
<finalization objectsqueued="0" />
<timesms mark="7.544" sweep="0.088" compact="9.992" total="17.737" />
<tenured freebytes="1567256" totalbytes="3145728" percent="49" >
<soa freebytes="1441816" totalbytes="3020288" percent="47" />
<loa freebytes="125440" totalbytes="125440" percent="100" />
</tenured>
</gc>
<gc> Indicates that a garbage collection was triggered on the heap.
Type="global" indicates that the collection was global (mark, sweep, possibly compact).
The id attribute gives the occurrence number of this global collection.
The totalid indicates the total number of garbage collections (of all types) that have taken place.
intervalms gives the number of milliseconds since the previous global collection.
Verbose GC
<compaction>
Shows the number of objects that were moved during compaction and the total number of bytes
these objects represented. The reason for the compaction is also shown. In this case, the
compaction was forced, because -Xcompactgc was specified on the command line.
This line is displayed only if compaction occurred during the collection.
<classunloading>
Lists the number of class loaders unloaded in this garbage collection and how many actual
classes were unloaded by that operation.
timevmquiescems gives the number of milliseconds that the GC had to wait for the VM to stop so
that it could begin unloading the classes
timetakenms gives the number of milliseconds taken to perform the actual unload.
This tag is only present if a class unloading attempt was made.
<expansion>
Indicates that during the handling of the allocation (but after the garbage collection), a heap
expansion was triggered
The area expanded, the amount by which the area was increased (in bytes), its new size, the time
taken to expand, and the reason for the expansion are shown.
Verbose GC
<refs_cleared>
Provides information relating to the number of Java Reference objects that were cleared during
the collection. In this example, no references were cleared.
<finalization>
Provides information detailing the number of objects containing finalizers that were enqueued for
VM finalization during the collection.
The number of objects is not equal to the number of finalizers that were run during the collection,
because finalizers are scheduled by the VM.
<timesms>
Provides information detailing the times taken for the mark, sweep, and compact phases along with
the total time taken.
When compaction was not triggered, the number returned for compact is zero.
<tenured>
Indicates the status of the tenured area following the collection
Shows the occupancy levels of the different heap areas after the garbage collection - both the
small object area (SOA) and the large object area (LOA).
Verbose GC
<sys id="1" timestamp="Jul 15 12:56:26 2005" intervalms="0.000">
<time exclusiveaccessms="0.018" />
<refs soft="7" weak="8" phantom="0" />
<tenured freebytes="821120" totalbytes="4194304" percent="19" >
<soa freebytes="611712" totalbytes="3984896" percent="15" />
<loa freebytes="209408" totalbytes="209408" percent="100" />
</tenured>
<gc type="global" id="1" totalid="1" intervalms="0.000">
<classunloading classloaders="0" classes="0" timevmquiescems="0.000" timetakenms="0.064" />
<refs_cleared soft="0" weak="4" phantom="0" />
<finalization objectsqueued="6" />
<timesms mark="3.065" sweep="0.138" compact="0.000" total="3.287" />
<tenured freebytes="3579072" totalbytes="4194304" percent="85" >
<soa freebytes="3369664" totalbytes="3984896" percent="84" />
<loa freebytes="209408" totalbytes="209408" percent="100" />
</tenured>
</gc>
<tenured freebytes="3579072" totalbytes="4194304" percent="85" >
<soa freebytes="3369664" totalbytes="3984896" percent="84" />
<loa freebytes="209408" totalbytes="209408" percent="100" />
</tenured>
<refs soft="7" weak="4" phantom="0" />
<time totalms="3.315" />
</sys>
-verbose:gc output when a System.gc() is executed
Verbose GC
<sys>
Indicates that a System.gc() has occurred.
The id attribute gives the number of this System.gc() call; in this case, this is the first such call in
the life of this VM.
timestamp gives the UTC timestamp when the System.gc() call was made
intervalms gives the number of milliseconds that have elapsed since the previous System.gc()
call. In this case, because this is the first such call, the number returned is zero.
<time exclusiveaccessms=>
Shows the amount of time taken to obtain exclusive VM access.
optional line <warning details="exclusive access time includes previous garbage collections" />
might occasionally be displayed, to inform you that the following garbage collection was queued
because the allocation failure was triggered while another thread was already performing a
garbage collection
<time>
Shows the total amount of time taken to handle the System.gc() call (in milliseconds).
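A trivial program such as the following sketch (illustrative only) produces a <sys> section like the sample above each time System.gc() is called:

public class ExplicitGcDemo {
    public static void main(String[] args) {
        // Create some garbage so the collection has work to do
        for (int i = 0; i < 10000; i++) {
            String s = new String("temporary-" + i);
        }
        // Explicit collection request; with -verbose:gc this is reported as a <sys> event
        System.gc();
    }
}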
Verbose GC
<gc type="scavenger" id="11" totalid="11" intervalms="46.402">
<failed type="tenured" objectcount="24" bytes="43268" />
<flipped objectcount="523" bytes="27544" />
<tenured objectcount="0" bytes="0" />
<refs_cleared soft="0" weak="0" phantom="0" />
<finalization objectsqueued="0" />
<scavenger tiltratio="67" />
<nursery freebytes="222208" totalbytes="353280" percent="62" tenureage="2" />
<tenured freebytes="941232" totalbytes="1572864" percent="59" >
<soa freebytes="862896" totalbytes="1494528" percent="57" />
<loa freebytes="78336" totalbytes="78336" percent="100" />
</tenured>
<time totalms="0.337" />
</gc>
-verbose:gc output when a scavenger GC occurs
<gc>
Indicates that a garbage collection has been triggered. The type="scavenger" attribute indicates that the collection is
a scavenger collection.
<failed type="tenured">
Indicates that the scavenger failed to move some objects into the old or tenured area during the collection. The
output shows the number of objects that were not moved, and the total bytes represented by these objects.
If <failed type="flipped"> is shown, the scavenger failed to move or flip certain objects into the survivor space.
Verbose GC
<flipped>
Shows the number of objects that were flipped into the survivor space during the scavenger
collection, together with the total number of bytes flipped.
<scavenger tiltratio="x" />
Shows the percentage of the tilt ratio following the last scavenge event and space adjustment.
The scavenger redistributes memory between the allocate and survivor areas using a process
called tilting. Tilting controls the relative sizes of the allocate and survivor spaces, and the tilt
ratio is adjusted to maximize the amount of time between scavenges
<tenured>
Shows the number of objects that were moved into the tenured area during the scavenger
collection, together with the total number of bytes tenured.
<nursery>
Shows the amount of free and total space in the nursery area after a scavenge event. The output
also shows the number of times an object must be flipped in order to be tenured. This number is
the tenure age, and is adjusted dynamically.
<time>
Shows the total time taken to perform the scavenger collection, in milliseconds
Verbose GC
<af type="nursery" id="28" timestamp="Jul 15 13:11:45 2005" intervalms="65.0
16">
<minimum requested_bytes="520" />
<time exclusiveaccessms="0.018" />
<refs soft="7" weak="4" phantom="0" />
<nursery freebytes="0" totalbytes="8239104" percent="0" />
<tenured freebytes="5965800" totalbytes="21635584" percent="27" >
<soa freebytes="4884456" totalbytes="20554240" percent="23" />
<loa freebytes="1081344" totalbytes="1081344" percent="100" />
</tenured>
<gc type="scavenger" id="28" totalid="30" intervalms="65.079">
<expansion type="nursery" amount="1544192" newsize="9085952" timetaken="0.017"
reason="excessive time being spent scavenging" />
<flipped objectcount="16980" bytes="2754828" />
<tenured objectcount="12996" bytes="2107448" />
<refs_cleared soft="0" weak="0" phantom="0" />
<finalization objectsqueued="0" />
<scavenger tiltratio="70" />
<nursery freebytes="6194568" totalbytes="9085952" percent="68" tenureage="1" />
<tenured freebytes="3732376" totalbytes="21635584" percent="17" >
<soa freebytes="2651032" totalbytes="20554240" percent="12" />
<loa freebytes="1081344" totalbytes="1081344" percent="100" />
</tenured>
<time totalms="27.043" />
</gc>
<nursery freebytes="6194048" totalbytes="9085952" percent="68" />
<tenured freebytes="3732376" totalbytes="21635584" percent="17" >
<soa freebytes="2651032" totalbytes="20554240" percent="12" />
<loa freebytes="1081344" totalbytes="1081344" percent="100" />
</tenured>
<refs soft="7" weak="4" phantom="0" />
<time totalms="27.124" />
</af>
-verbose:gc output when allocation failure occurs in new area (nursery)
Verbose GC
<af type="nursery">
Indicates that an allocation failure has occurred when attempting to allocate to the new area.
The id attribute shows the index of the type of allocation failure that has occurred.
timestamp shows a local timestamp at the time of the allocation failure.
intervalms shows the number of milliseconds elapsed since the previous allocation failure of that
type.
<minimum>
Shows the number of bytes requested by the allocation that triggered the failure. Following the
garbage collection, freebytes might drop by more than this amount. The reason is that the free list
might have been discarded or the Thread Local Heap (TLH) refreshed.
<nursery> and <tenured>
The first set of <nursery> and <tenured> tags show the status of the heaps at the time of the
allocation failure that triggered garbage collection.
The second set of tags shows the status of the heaps after the garbage collection has occurred.
The third set of tags shows the status of the different heap areas following the successful
allocation.
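An allocation failure of this kind can be provoked by any allocation-heavy code; the following sketch (purely illustrative) keeps allocating short-lived arrays so that the nursery fills up and scavenges are triggered:

public class AllocationDemo {
    public static void main(String[] args) {
        // Each iteration allocates a short-lived 1 KB array; once the nursery is full,
        // the next allocation fails and triggers a scavenge (<af type="nursery">).
        for (int i = 0; i < 1000000; i++) {
            byte[] temp = new byte[1024];
        }
    }
}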
Verbose GC
<af type="tenured" id="2" timestamp="Jul 15 13:17:11 2005" intervalms="450.0
57">
<minimum requested_bytes="32" />
<time exclusiveaccessms="0.015" />
<refs soft="7" weak="4" phantom="0" />
<tenured freebytes="104448" totalbytes="2097152" percent="4" >
<soa freebytes="0" totalbytes="1992704" percent="0" />
<loa freebytes="104448" totalbytes="104448" percent="100" />
</tenured>
<gc type="global" id="4" totalid="4" intervalms="217.002">
<expansion type="tenured" amount="1048576" newsize="3145728" timetaken="0.008"
reason="insufficient free space following gc" />
<refs_cleared soft="0" weak="0" phantom="0" />
<finalization objectsqueued="5" />
<timesms mark="4.960" sweep="0.113" compact="0.000" total="5.145" />
<tenured freebytes="1612176" totalbytes="3145728" percent="51" >
<soa freebytes="1454992" totalbytes="2988544" percent="48" />
<loa freebytes="157184" totalbytes="157184" percent="100" />
</tenured>
</gc>
<tenured freebytes="1611632" totalbytes="3145728" percent="51" >
<soa freebytes="1454448" totalbytes="2988544" percent="48" />
<loa freebytes="157184" totalbytes="157184" percent="100" />
</tenured>
<refs soft="7" weak="4" phantom="0" />
<time totalms="5.205" />
</af>
-verbose:gc output when allocation failure occurs in old area (tenured)
Concurrent Garbage Collection
Concurrent kickoff
The below output is produced when the concurrent mark process is triggered.
<con event="kickoff" timestamp="Nov 25 10:18:52 2005">
<stats tenurefreebytes="2678888" tracetarget="21107394" kickoff="2685575" tracerate="8.12" />
</con>
tenurefreebytes - the amount of free space in the tenured area
tracetarget - the target amount of tracing to be performed by concurrent mark
kickoff - the kickoff threshold at which concurrent mark is triggered
tracerate - the initial trace rate. The trace rate represents the amount of tracing each mutator thread should perform
relative to the amount of space it is attempting to allocate in the heap.
If running in generational mode, an additional nurseryfreebytes= attribute is displayed, showing the status of the
new area as concurrent mark was triggered.
Concurrent sweep completed
This output shows that the concurrent sweep process (started after the previous garbage collection completed)
has finished. The amount of bytes swept and the amount of time taken is shown.
<con event="completed sweep" timestamp="Fri Jul 15 13:52:08 2005">
<stats bytes="0" time="0.004" />
</con>
Concurrent Garbage Collection
Allocation failures
<con event="aborted"> shows that, as a result of the allocation failure, concurrent mark tracing was aborted.
The below output is produced when concurrent mark is halted.
<con event="halted" mode="trace only">
<stats tracetarget="2762287">
<traced total="137259" mutators="137259" helpers="0" percent="4" />
<cards cleaned="0" kickoff="115809" />
</stats>
</con>
<con event="final card cleaning">
<stats cardscleaned="16" traced="2166272" durationms="22.601" />
</con>
<con event="halted">
Shows that concurrent mark tracing was halted as a result of the allocation failure.
The tracing target is shown, together with the amount that was performed, both by mutator threads and the
concurrent mark background thread.
The number of cards cleaned during concurrent marking is also shown, with the free-space trigger level for card
cleaning. Card cleaning occurs during concurrent mark after all available tracing has been exhausted.
<con event="final card cleaning>
Indicates that final card cleaning occurred before the garbage collection was triggered. The number of cards cleaned
during the process and the number of bytes traced is shown, along with the total time taken by the process.
Concurrent Garbage Collection
If concurrent mark completes all tracing and card cleaning, a concurrent collection is triggered.
<con event="collection" id="15" timestamp="Jul 15 15:13:18 2005 intervalms="1875.113>
-
-
<stats tracetarget="26016936">
<traced total="21313377" mutators="21313377" helpers="0" percent="81" />
<cards cleaned="14519" kickoff="1096607" />
</stats>
<con event="completed full sweep" timestamp="Jul 15 15:13:18 2005">
<stats sweepbytes="0" sweeptime="0.009" connectbytes="5826560 connecttime="0.122" />
</con>
<con event="final card cleaning">
<stats cardscleaned="682" traced="302532" durationms="3.053" />
</con>
<con event="collection">
Shows that a concurrent collection has been triggered. The id attribute shows the number of this
concurrent collection, next is a local timestamp, and the number of milliseconds since the
previous concurrent collection is displayed.
Concurrent Garbage Collection
<stats>
The target amount of tracing is shown, together with the amount that took place (both by mutators
threads and helper threads).
Information is displayed showing the number of cards in the card table that were cleaned during
the concurrent mark process, and the heap occupancy level at which card cleaning began.
<con event="completed full sweep">
Shows that the full concurrent sweep of the heap was completed. The number of bytes of the
heap swept is displayed with the amount of time taken, the amount of bytes swept that were
connected together, and the time taken to do this.
<con event="final card cleaning">
Shows that final card cleaning has been triggered. The number of cards cleaned is displayed,
together with the number of milliseconds taken to do so.
Java Internals
Sun JDK 1.5
Verbose GC
Verbose GC
GC details can be printed using the -XX:+PrintGC, -XX:+PrintGCTimeStamps, and -XX:+PrintGCDetails options
Aging information for objects in the young generation can be logged using the -XX:+PrintTenuringDistribution switch
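A typical way to enable this output (log file and jar names are only illustrations) is:
java -Xloggc:gc.log -XX:+PrintGCDetails -XX:+PrintGCTimeStamps -XX:+PrintTenuringDistribution -jar app.jar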


Verbose GC CMS Collector
[GC 39.910: [ParNew: 261760K->0K(261952K), 0.2314667 secs] 262017K->26386K(1048384K), 0.2318679 secs] - Young
generation (ParNew) collection
[GC [1 CMS-initial-mark: 26386K(786432K)] 26404K(1048384K), 0.0074495 secs] - This is the initial marking phase of
CMS where all the objects directly reachable from roots are marked. This is a stop-the-world process.
[CMS-concurrent-mark-start] & [CMS-concurrent-mark: 0.521/0.529 secs] - Marking of live objects. Concurrent mark is
a concurrent phase performed with all other threads running.
[CMS-concurrent-preclean-start] & [CMS-concurrent-preclean: 0.017/0.018 secs] - Precleaning is also a concurrent
phase. This phase identifies the objects in CMS heap which were updated by promotions from the young generation or new
allocations, or were updated by mutators while the concurrent marking phase is ON.
[GC 40.704: [Rescan (parallel) , 0.1790103 secs]
[weak refs processing, 0.0100966 secs]
[1 CMS-remark: 26386K(786432K)] 52644K(1048384K), 0.1897792 secs]
Stop-the-world phase. This phase rescans any residual updated objects in CMS heap, retraces from the roots and also
processes reference objects.
[CMS-concurrent-sweep-start] & [CMS-concurrent-sweep: 0.126/0.126 secs] - Sweeping of dead/non-marked objects.
Sweeping is a concurrent phase performed with all other threads running.
[CMS-concurrent-reset-start] & [CMS-concurrent-reset: 0.127/0.127 secs] - Reset phase re-initializes the CMS data
structures so that a new cycle may begin at a later time
Wherever the time is specified as x/y secs, x denotes the CPU time and y denotes the wall time (includes the yield to other threads also)
HotSpot GC Log Samples - 1
Young Generation Too Small
Overall Heap Size : 32 Mb
Young Heap Size : 4 Mb






Increasing young generation size
Overall Heap Size : 32 Mb
Young Heap Size : 8 Mb




HotSpot GC Log Samples - 2
Small Tenured Generation size
Overall Heap Size : 32 Mb
Young Heap Size : 8 Mb
Major collection pause : 0.13 secs
Major collections occur every : 10 secs

Large Tenured Generation size
Overall Heap Size : 64 Mb
Young Heap Size : 8 Mb
Major collection pause : 0.21 secs
Major collections occur every : 30 secs
Java Internals
IBM JDK 1.5
OutOfMemory
OutOfMemoryError
An OutOfMemoryError exception results from running out of space on the Java heap or the native heap
An OutOfMemoryError does not necessarily indicate a memory leak; it may simply mean that the steady state of
memory use required is higher than that available.
The first step is to determine which heap is being exhausted and increase the size of that heap
If the problem is occurring because of a real memory leak, increasing the heap size does not solve the problem,
but does delay the onset of the OutOfMemoryError exception or error conditions. That delay can be helpful on
production systems.
The maximum size of an object that can be allocated is limited only by available memory.
The maximum number of array elements supported is 2^31 - 1. In practice, such huge arrays may run into issues because of
unavailability of memory.
These limits apply to both 32-bit and 64-bit JVMs.
Java Heap Exhaustion
The Java heap becomes exhausted when garbage collection cannot free enough objects to make a new object
allocation
Java heap exhaustion can be identified from the -verbose:gc output by garbage collection occurring more and
more frequently, with less memory being freed
If the Java heap is being exhausted, and increasing the Java heap size does not solve the problem, the next
stage is to examine the objects that are on the heap, and look for suspect data structures that are referencing
large numbers of Java objects that should have been released.
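A deliberately leaky sketch like the following (illustrative only) reproduces the pattern described above: the list keeps strong references, so each collection frees less memory until the heap is exhausted:

import java.util.ArrayList;
import java.util.List;

public class HeapExhaustionDemo {
    public static void main(String[] args) {
        List<byte[]> retained = new ArrayList<byte[]>();
        while (true) {
            // Every 1 MB block stays strongly reachable, so the collector cannot
            // reclaim it; -verbose:gc shows increasingly frequent collections
            // freeing less and less space until OutOfMemoryError is thrown.
            retained.add(new byte[1024 * 1024]);
        }
    }
}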

OutOfMemoryError
Native Heap Exhaustion
Native memory OutOfMemoryError exceptions might occur when loading classes, starting threads, or using
monitors
Native heap exhaustion can be monitored using the svmon snapshot output on AIX and the -Xdump:heap option
The java.lang.OutOfMemoryError: Failed to create a thread message
occurs when the system does not have enough resources to create a new thread.
There are two possible causes
There are too many threads running and the system has run out of internal resources to create new
threads.
The system has run out of native memory to use for the new thread. Threads require native memory for
internal JVM structures, a Java stack, and a native stack.
To correct the problem, either:
Increase the amount of native memory available by lowering the size of the Java heap using the -Xmx
option.
Lower the number of threads in your application.

OutOfMemoryError
Information required for diagnosing the OutOfMemory condition:
The error itself with any message or stack trace that accompanied it.
-verbose:gc output. (Even if the problem is determined to be native heap exhaustion, it can be
useful to see the verbose gc output.)
As appropriate:
The Heapdump output
The javacore.txt file - contains the threads and their stack traces along with other
information such as locks, monitors, deadlocks, storage/memory, shared classes, classloaders,
and classes
Generating Dumps
Generating Heap Dumps
In order to generate a Java core dump, system core dump, heap dump, and snap dump at a user signal, the dump
agents must be configured through JVM options as follows: -Xdump:java+heap+system+snap:events=user
When the JVM process command window is available, generate dumps as follows.
Windows - press CTRL+Break in the command window to generate the dumps
Linux or AIX - press CTRL+\ in the shell window
When the JVM process command window is not available, generate dumps as follows
Windows - use the SendSignal utility
Linux or AIX - use the kill -3 <PID> command
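For example (the jar name is hypothetical; the PID is obtained from the operating system):
java -Xdump:java+heap+system+snap:events=user -jar app.jar
kill -3 <PID>   (Linux or AIX, issued from another shell)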
Heap dump format
-Xdump:heap:opts=PHD (default)
-Xdump:heap:opts=CLASSIC
-Xdump:heap:opts=PHD+CLASSIC
JAVA_DUMP_OPTS = "ON<condition1>(<dumptype1>,<dumptype2>,...),ON<condition2>(...)"
condition can be: ANYSIGNAL / DUMP / ERROR / INTERRUPT / EXCEPTION / OUTOFMEMORY
dumptype can be: ALL / NONE / JAVADUMP / SYSDUMP / HEAPDUMP / CEEDUMP (z/OS specific)

The following environment variables can also be used.
The value set for JAVA_DUMP_OPTS takes the highest precedence.
Java Internals
Sun JDK 1.5
OutOfMemory
OutOfMemoryError
An OutOfMemoryError does not necessarily imply a memory leak.
Can be thrown if the heap is not sized properly
Can be thrown if the external handles are not closed properly (Ex. DB Connections, File Handles, Reference to
EJB objects in remote JVM etc.)
Exception in thread "main" java.lang.OutOfMemoryError: PermGen space
Thrown if the permanent space is not sized properly
Memory Leak
An object can hold a reference that prevents a class from being collected even though it's no longer in
active use
Usually, this is due to a class stored in a shared directory (a directory holding common libraries - the lib
directory - used by all applications running on the server)
For tracing class loading, use -XX:+TraceClassLoading and -XX:+TraceClassUnloading parameters,
and find the classes that were loaded but not unloaded.
-XX:+TraceClassResolution parameter will help us track the class resolution
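For example, a command line for tracing class load and unload activity (jar name purely illustrative) might be:
java -XX:+TraceClassLoading -XX:+TraceClassUnloading -XX:+TraceClassResolution -jar app.jar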
OutOfMemoryError When There's Still Memory Available
Thrown when the JVM is unable to find contiguous space for allocating an object
Identify which of the generations is relatively empty (Ex. Applications dealing with large objects will have
relatively empty young generation)
Initial fix could be to use -XX:NewRatio parameter so as to provide more space to the old generation
Permanent fix will be to reduce the size of the objects created by the application




OutOfMemoryError
Exception in thread "main" java.lang.OutOfMemoryError: unable to create new native thread
Thrown if there is not enough memory to create a thread
Each thread takes about half a megabyte for its stack (e.g. for a 2 GB process [2 GB
includes user and system heap], a maximum of roughly 5,000 threads can be created, assuming all
the memory is used for threads); see the sketch after this list
Operating system has limitations on the number of threads that can be created by a
process.
Heap could be fragmented
Heap could have been used by application objects and hence there might not be enough
contiguous space for creating thread. In such cases, increase the heap size.
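The following sketch (illustrative only) reproduces the failure by starting threads that never exit; each new thread consumes native stack space until the error is thrown:

public class ThreadExhaustionDemo {
    public static void main(String[] args) {
        int count = 0;
        try {
            while (true) {
                // Each thread parks forever, holding on to its native stack
                Thread t = new Thread(new Runnable() {
                    public void run() {
                        try {
                            Thread.sleep(Long.MAX_VALUE);
                        } catch (InterruptedException ignored) {
                        }
                    }
                });
                t.start();
                count++;
            }
        } catch (OutOfMemoryError e) {
            System.out.println("Threads created before failure: " + count);
            throw e;
        }
    }
}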
To find the root cause of out-of-memory errors,
Looking at the stack trace will help in identifying whether the exception is caused by a large
object
Looking at the GC logs will help in identifying the heap growth pattern. If there has been
continuous growth in the heap until the exception and the GC is unable to collect enough memory,
then there could be a memory leak. Otherwise, it might be a genuine memory requirement.


Heap Dump
For Memory Leak analysis
Identify the frequency of the memory leak from the start of the server (Ex. 8 hours after the
server restart)
Use -XX:+HeapDumpOnOutOfMemoryError parameter to get the heap dump when
OutOfMemory happens
Use the Heap Analysis Tool (HAT), the jconsole management tool, or the jmap tool with the -histo
option to understand the objects in the heap
Restart the application and generate periodic heap dumps using
<JDK Path>/bin/jmap -dump:live,format=b,file=heap.dump.out <pid> until OutOfMemory occurs
Analyze the heap dumps to identify the potential objects that could have caused the memory
leak
Thread dumps can be generated using kill -3 <pid> command for unix and SendSignal utility for
windows (For applications running in command window, Ctrl+\ for unix and Ctrl+Break for windows)

Another option to generate a heap dump on Unix is to generate a core dump using the gcore [-pgF] [-o filename] [-c content]
<pid> command. jmap can then be used to extract the heap dump from the core file.
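Put together, a hedged end-to-end workflow (jar name and file paths are only illustrations) could look like:
java -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=/tmp/dumps -jar app.jar
<JDK Path>/bin/jmap -dump:live,format=b,file=heap.dump.out <pid>   (periodic snapshots)
kill -3 <pid>   (thread dump on Unix)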
Java Internals
IBM JDK 1.6
Tools
Tools
Health Center - http://www.ibm.com/developerworks/java/jdk/tools/healthcenter/
Using Health Center will enable us to:
Identify if native or heap memory is leaking
Discover which methods are taking most time to run
Pin down I/O bottlenecks
Visualize and tune garbage collection
View any lock contentions
Analyse unusual WebSphere Real Time events
Memory Analyzer - http://www.ibm.com/developerworks/java/jdk/tools/memoryanalyzer/
Using Memory Analyzer will enable us to:
Diagnose and resolve memory leaks involving the Java heap
Derive architectural understanding of your Java application through footprint analysis
Improve application performance by tuning memory footprint and optimizing Java collections and Java
cache usage
Produce analysis plug-ins with capabilities specific to your application

Tools
Garbage Collection and Memory Visualizer - http://www.ibm.com/developerworks/java/jdk/tools/gcmv/
Using Garbage Collection and Memory Visualizer will enable us to:
Monitor and fine tune Java heap size and garbage collection performance
Check for memory leaks
Size the Java heap correctly
Select the best garbage collection policy
Dump Analyzer - http://www.ibm.com/developerworks/java/jdk/tools/dumpanalyzer/
Dump Analyzer will help us quickly diagnose typical problems such as:
Out of memory
Deadlocks
Java Virtual Machine (JVM) or Java Native Interface (JNI) crashes
IBM Thread and Monitor Dump Analyzer for Java - http://www.alphaworks.ibm.com/tech/jca
Analyzes each thread information and provides diagnostic information, such as current thread
information, the signal that caused the javacore, Java heap information (maximum Java heap
size, initial Java heap size, garbage collector counter, allocation failure counter, free Java heap
size, and allocated Java heap size), number of runnable threads, total number of threads, number
of monitors locked, and deadlock information


Tools
Diagnostics Collector - http://www.ibm.com/developerworks/java/jdk/tools/diagnosticscollector/
Use the Diagnostics Collector to:
Automatically capture diagnostic information associated with a problem event.
Avoid having to use the jextract tool to obtain platform specific information associated with the system
dump
Reduce manual work to collect dumps
Save searching for Java dumps
Allow easier management of dump files
Capture problem context information as well as dump files.
Avoid ulimit problems and overlooked settings that disable dumps


Java Internals
Sun JDK 1.5 - Tools

Tools
If the application crashes because of an application or JRE bug, these are the options and tools that can be used to obtain
additional information (either at the time of the crash or later using information from the crash dump)


Tools
Tools that can help in scenarios involving a hung or deadlocked process


Tools
Tools that can help in Monitoring


Tools
Other Tools & Options


Memory Leak Graph - Sample
The below GC graph depicts the pattern for a memory leak (Tool used : GCViewer)


Java Internals
Native OS Tools

Native Tools - Linux
Native Tools - Windows
Native Tools - Windows
Native Tools - Solaris
Java Internals
Appendix

Appendix
SUN JDK 1.5
For complete set of JVM parameters refer to
http://www.oracle.com/technetwork/java/javase/tech/vmoptions-jsp-140102.html
Reference:
http://www.oracle.com/technetwork/java/javase/tech/index-jsp-136373.html
IBM JDK 1.6
Reference:
http://publib.boulder.ibm.com/infocenter/javasdk/v5r0/index.jsp?topic=/com.ibm.java.doc.diagnostics.50/diag/preface/jvm_meaning.html
