Troubleshooting E1 Kernels
Including:
Types of Kernel Problems
Kernel Error Troubleshooting Procedure
Getting and Using an OS Core File
OS Tools for Obtaining a Call Stack from Running Code
Copyright Oracle 2011. All rights reserved
5/18/2011
Table of Contents
CHAPTER 1 - INTRODUCTION
  Intended Audience
  Structure of this Document
  Related Materials
CHAPTER 2 - TYPES OF KERNEL PROBLEMS
  Hung Kernel with Low CPU
  Hung Kernel with High CPU
  Zombie Process / Zombie Kernel
  Out of Memory Kernel / Memory Leak Kernel
CHAPTER 3 - KERNEL ERROR TROUBLESHOOTING PROCEDURE
  General Troubleshooting Philosophy
  Troubleshooting Procedure - Identify Product Area of Problem
  Interactive Problems
  Enterprise Server Problem / Batch Problem
  Batch Problem
CHAPTER 4 - ZOMBIE KERNELS
  Call Object Kernels (COBK)
  Metadata Kernel
CHAPTER 5 - HUNG KERNELS WITH HIGH CPU
CHAPTER 6 - HUNG KERNELS WITH LOW CPU
  Is a Package Deployment Currently Underway?
  Troubleshooting Low-CPU Hung Kernels
CHAPTER 7 - OUT OF MEMORY / MEMORY LEAK KERNELS
  Memory Leaks
  Overly-Aggressive Caching
  Troubleshooting Out-of-Memory Issues
APPENDIX A - VALIDATION AND FEEDBACK
  Customer Validation
  Field Validation
APPENDIX B - GLOSSARY
APPENDIX C - GETTING AND USING AN OS CORE FILE
  Windows
  AS400 iSeries
  UNIX
    HP
    LINUX
    AIX
    SUN
APPENDIX D - OS TOOLS FOR OBTAINING A CALL STACK FROM RUNNING CODE
  Unix
  Windows
  AS400
Chapter 1 - Introduction
JD Edwards EnterpriseOne kernels consist of several types of processes. The process definitions can be found in JDE.INI. On
the enterprise server, two process names are registered: JDENET_N and JDENET_K. The JDENET_N process services
incoming and outgoing requests for the JDENET_K processes.
The number of JDENET_N processes needed on an EnterpriseOne server can be calculated based on the number of connections
and the maximum number of net processes. For a detailed JDENET calculation, please refer to the document JD Edwards
EnterpriseOne Tools #### System Administration Guide, where #### refers to the tools GA release. The calculation is
described in the section Understanding the jde.ini File Settings, [JDENET].
For example, the base guides for 8.98 are located here: http://download.oracle.com/docs/cd/E13780_01/jded/html/docset.html
The minimum and maximum numbers of each type of JDENET_K process are defined in JDE.INI. For each type of
JDENET_K kernel, there is a section titled [JDENET_KERNEL_DEF#], where # stands for 1, 2, etc. As of tools release 8.97,
there are 32 JDENET_KERNEL_DEF definitions. (Two new definitions, JDENET_KERNEL_DEF31 and
JDENET_KERNEL_DEF32, were introduced in 8.97; they correspond to the XMLPublisher and Management Kernels
respectively.) For detailed definitions of the JDENET_K processes, please refer to the document JD Edwards
EnterpriseOne Tools #### System Administration Guide, where #### refers to the tools GA release. The necessary
calculations are described in the section Understanding the jde.ini File Settings, [JDENET_KERNEL_DEF#].
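As an illustration of how the [JDENET_KERNEL_DEF#] sections can be inspected, the following minimal Python sketch lists the kernel definitions found in a JDE.INI-style text. The sample text, the key names (krnlName, maxNumberOfProcesses, numberOfAutoStartProcesses), and the function name kernel_definitions are assumptions made for this example; check them against your own JDE.INI and the System Administration Guide before relying on them.

```python
import configparser

# Hypothetical excerpt for illustration only; key names and values are assumptions.
SAMPLE_INI = """
[JDENET_KERNEL_DEF31]
krnlName=XML PUBLISHER KERNEL
maxNumberOfProcesses=1
numberOfAutoStartProcesses=0

[JDENET_KERNEL_DEF32]
krnlName=MANAGEMENT KERNEL
maxNumberOfProcesses=1
numberOfAutoStartProcesses=1
"""

SECTION_PREFIX = "JDENET_KERNEL_DEF"


def kernel_definitions(ini_text):
    """Return {definition number: {key: value}} for every kernel definition section."""
    parser = configparser.ConfigParser(strict=False, interpolation=None)
    parser.optionxform = str  # keep the original case of the key names
    parser.read_string(ini_text)
    definitions = {}
    for section in parser.sections():
        suffix = section[len(SECTION_PREFIX):]
        if section.startswith(SECTION_PREFIX) and suffix.isdigit():
            definitions[int(suffix)] = dict(parser[section])
    return definitions


if __name__ == "__main__":
    for number, settings in sorted(kernel_definitions(SAMPLE_INI).items()):
        print(number, settings)
```

A real JDE.INI may contain lines that a generic INI parser rejects, so treat this as a starting point rather than a complete reader.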
INTENDED AUDIENCE
This document is intended for use by three different groups: Customers, Consultants, and Oracle Global Customer Support
(GCS).
This document is primarily concerned with debugging kernel issues for tools releases prior to 8.98.3.0. Tools release 8.98.3.0
introduces several new utilities to aid in troubleshooting kernel issues. While the information in this document will still be
correct when applied to releases beyond 8.98.3.0, it provides only minimal coverage of the improved troubleshooting utilities
and methodologies that are available in newer tools releases.
STRUCTURE OF THIS DOCUMENT

This document provides guidance for self-diagnosing kernel issues based on the pre-KRM methodology (pre-898_2.0).
The KRM documentation is available here:
OU Recording: http://oukc.oracle.com/static09/opn/login/?t=checkusercookies|r=-1|c=839298384
Documentation: https://support.oracle.com/CSP/main/article?cmd=show&id=1090646.1&type=NOT
Keep in mind that Oracle updates this document as needed so that it reflects the most current feedback we receive from the
field. Therefore, the structure, headings, content, and length of this document are likely to vary with each posted version. To see
if the document has been updated since you last downloaded it, compare the date of your version to the date of the version
posted on My Oracle Support.
RELATED MATERIALS
We assume that our readers are experienced IT professionals, with a good understanding of JD Edwards EnterpriseOne. To
take full advantage of the information covered in this document, we recommend that you have a basic understanding of system
administration, basic Internet architecture, relational database concepts/SQL, and how to use Oracle JD Edwards applications.
This document is not intended to replace the documentation delivered with the CRM PeopleBooks. We recommend that before
you read this document, you read the PIA related information in the PeopleTools PeopleBooks to ensure that you have a well-rounded understanding of our PIA technology.
Note: Much of the information in this document will eventually be
incorporated into subsequent versions of the PeopleBooks.
Many of the fundamental concepts related to PIA are discussed in the following PeopleSoft PeopleBooks:
PeopleSoft Internet Architecture Administration (PeopleTools|Administration Tools|PeopleSoft Internet Architecture
Administration)
Application Designer (Development Tools|Application Designer)
Application Messaging (Integration Tools|Application Messaging)
PeopleCode (Development Tools|PeopleCode Reference)
Customers using tools release 8.98.3.0 or newer should also read KRM documentation for information on additional
troubleshooting techniques that are available to users of those releases as a supplement to the techniques described in this
document.
KRM Docs: OU Recording: http://oukc.oracle.com/static09/opn/login/?t=checkusercookies|r=-1|c=839298384
Chapter 2 - Types of Kernel Problems

This document refers to several specific types of kernel issues that a customer may encounter. The most important categories of
kernel problems are explained below.
HUNG KERNEL WITH LOW CPU

Definition:
A hung kernel with low CPU refers to a kernel that has stopped functioning correctly but whose process continues to run with
very little CPU activity. Generally, this points to a root cause related to deadlock.
HUNG KERNEL WITH HIGH CPU

Definition:
A hung kernel with high CPU refers to a kernel that has stopped functioning correctly but whose process continues to run with
significant CPU activity. Generally, this points to a root cause related to an infinite loop.
ZOMBIE PROCESS / ZOMBIE KERNEL

Definition:
When an E1 server process crashes due to a programming error in some piece of code that it is running, the kernel stops
running from the perspective of the OS. The process is flagged as a zombie kernel within the E1 Enterprise Server, where some
of the process IPC data is saved in shared memory. The process is listed in Server Manager as a zombie process. There are
many potential causes of a zombie process, including but not limited to null or invalid pointer dereferences, heap memory
corruption, stack memory corruption, and race conditions.
OUT OF MEMORY KERNEL / MEMORY LEAK KERNEL

Definition:
An out of memory kernel is a kernel that has crashed because its memory footprint exceeded the maximum amount it is allowed
to allocate. Generally, this points to a memory leak or the caching of overly large quantities of data.
Chapter 3 - Kernel Error Troubleshooting Procedure
GENERAL TROUBLESHOOTING PHILOSOPHY

Oracle JD Edwards EnterpriseOne is a highly complex system with many interacting components. The remainder of this
chapter and the chapters that follow group similar problems together into a few broad categories and provide generalized
techniques to handle any problem in one of these categories. However, in many cases, a more specific troubleshooting
procedure may be necessary for a complex problem/issue.
Whenever a problem is encountered, the very first action on the part of the troubleshooter should be to examine any relevant
logfiles. Generally speaking, this means consulting jde_####.log, where #### is the Process ID (PID) of the relevant jdenet_k
and/or jdenet_n, and also jas.log. If there is a clear error message at or near the end of any of these logfiles, acting on that
message may be more efficient than following the procedure below.
Similarly, the procedure below is designed to guide a troubleshooter until he or she finds something that reveals the root cause
of the problem. If, at any point while following this procedure, the troubleshooter should find some clue to the root cause that
is too specific to be discussed below, he or she should go off-script and pursue that clue; if this search results in a dead-end,
the troubleshooter may resume the scripted procedure where he or she left off.
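Since the first action is to look at the end of the relevant logfile, a small helper can make that check repeatable. The sketch below builds the jde_####.log name from a PID and returns the last few lines of the file. The function names (kernel_log_path, tail) and the default of 50 lines are choices made for this example, and the log directory must be supplied by the reader.

```python
from collections import deque
from pathlib import Path


def kernel_log_path(log_dir, pid):
    """Build the path of jde_<pid>.log inside the given log directory."""
    return Path(log_dir) / f"jde_{pid}.log"


def tail(log_path, line_count=50):
    """Return the last line_count lines of a logfile without loading the whole file."""
    with open(log_path, "r", errors="replace") as handle:
        # deque with maxlen keeps only the most recent lines as the file is read.
        return [line.rstrip("\n") for line in deque(handle, maxlen=line_count)]
```

For example, tail(kernel_log_path(log_dir, 1234), 20) would return the last 20 lines of jde_1234.log in log_dir.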
TROUBLESHOOTING PROCEDURE - IDENTIFY PRODUCT AREA OF PROBLEM

There are several types of issues that can cause an E1 user to receive a time-out message or a Web Exception. The following
sections provide a question-and-answer decision tree to help identify the root cause of the problem.
First, the E1 admin needs to determine whether the problem is an Interactive Problem, an Enterprise Server Problem, or a Batch
Problem.
INTERACTIVE PROBLEMS
General:
1) Did the user receive a Web Exception with the following message: "There was a problem with the server while running
business function <SomeFunctionName>"?
   Yes: Continue
   No: Go to Transaction Processing
2) Get the jas.log file.
   a. Search within the jas logfile for the phrase "Associated kernel <####> not found", where <####> is the
      process ID of the COBK.
   b. Does the jas logfile contain the above phrase?
      Yes: Continue
      No
3) Log in to SM and go to the Management Dashboard.
4) Select the Enterprise Server from the list of Managed Instances.
5) Select Runtime Metrics->Process Detail.
6) Does the process ID #### exist in the process detail list for the Enterprise Server?
   Yes: Continue
   No: Go to COBK Zombies
7) From SM, does the process ID #### (COBK) have a status of zombie?
   Yes: Continue
   No
8) Is the process ID #### (COBK) the only kernel with a status of zombie?
   Yes: Go to COBK Zombies
   No: Go to Multiple COBK Zombies
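The jas.log search in step 2 can be automated. The following Python sketch scans log lines for the phrase quoted above and collects the process IDs it mentions. Whether the angle brackets appear literally in a real jas.log is not stated in this document, so the pattern accepts the PID with or without them; the function name missing_kernel_pids is hypothetical.

```python
import re

# Accepts "Associated kernel 1234 not found" and "Associated kernel <1234> not found".
MISSING_KERNEL = re.compile(r"Associated kernel\s+<?(\d+)>?\s+not found")


def missing_kernel_pids(jas_log_lines):
    """Return the distinct COBK process IDs reported as not found, in order of appearance."""
    pids = []
    for line in jas_log_lines:
        match = MISSING_KERNEL.search(line)
        if match and match.group(1) not in pids:
            pids.append(match.group(1))
    return pids
```

Each PID returned is a candidate for the #### looked up in Server Manager in steps 3 through 8.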
Transaction Processing:
1) Did the user receive a Transaction Rollback message?
   Yes: Go to Chapter 6 - Hung Kernels with Low CPU
   No: Go to High CPU
High CPU:
1) Determine how much CPU the COBK process is using. Platform-specific instructions follow. (Note that, beginning in
Tools Release 8.98.2.0, this information is also available from Server Manager in the Runtime Metrics->Process Detail
page for the Enterprise Server.)
   a. Windows
      i. Launch Windows Task Manager. On the Performance tab, there is a graph showing overall CPU activity.
      ii. To see CPU activity specific to the COBK process, first select the Processes tab.
      iii. Go to View->Select Columns and check the box for the PID column if it is not already enabled.
           (The CPU Usage column should already be enabled, but if it is not, check that box as well.)
      iv. Click OK, and when you return to the table of processes, click on the PID column to sort by that value.
      v. Find the PID of the COBK, and check the value of the CPU Usage for that row.
   b. AS/400 iSeries: From the terminal, type the command wrkactjob. This will show a table of processes running
      on that machine. If you know the name of the specific library/subsystem, you may view relevant processes only
      via the command wrkactjob sbs(<E1_SYSTEM_LIBRARY>), where <E1_SYSTEM_LIBRARY> is the
      appropriate library.
   c. Unix: SSH to the machine hosting the Enterprise Server and type the command top -p <####>, where <####>
      is the Process ID (PID) of the COBK. Consult the %CPU column.
2) Is the COBK to which the user is connected using significant CPU?
   Yes: Go to Chapter 5 - Hung Kernels with High CPU
   No: Continue to Memory Leaks.
Memory Leaks:
1) Answer yes if any of the following are true:
   - The process's memory usage keeps increasing. (This can be observed by using any OS-supplied tool, such as
     Perfmon on Windows or Glance on HP-UX.)
   - The process's amount of allocated memory is already extremely large.
   - An out-of-memory error has been observed.
   Yes: Go to Chapter 7 - Out of Memory / Memory Leak Kernels
   No: Continue to Metadata Kernel
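To make "memory usage keeps increasing" less subjective, readings taken at regular intervals with an OS tool can be checked programmatically. The sketch below is one possible heuristic, not an Oracle-defined rule: it treats usage as steadily increasing when the last reading exceeds the first and most intervals show growth. The function name and the 0.8 threshold are assumptions for this example.

```python
def is_steadily_increasing(samples, min_growth_fraction=0.8):
    """Heuristic check on memory readings (same unit, regular intervals).

    Returns True when the last reading exceeds the first and at least
    min_growth_fraction of the intervals show an increase.
    """
    if len(samples) < 3:
        return False  # too few readings to call it a trend
    steps = [later - earlier for earlier, later in zip(samples, samples[1:])]
    growing_steps = sum(1 for step in steps if step > 0)
    return samples[-1] > samples[0] and growing_steps / len(steps) >= min_growth_fraction
```

A process whose usage rises and falls around a stable level returns False, while one that climbs at nearly every reading returns True.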
Metadata Kernel:
1) Are there any Metadata Zombie Kernels listed in Server Manager?
   Yes: Go to Chapter 4 - Zombie Kernels: Metadata Kernel
   No: Go to Chapter 4 - Zombie Kernels: Call Object Kernels (COBK)
ENTERPRISE SERVER PROBLEM / BATCH PROBLEM

1) Are there any outstanding requests for jdenet_k or jdenet_n from SM or NetWM? (If this is a UBE problem, or if this is a
multi-threaded kernel, answer no.)
   Yes: Go to Outstanding Requests.
   No: Continue
2) Are there one or more COBK / RUNBATCH zombies?
   Yes: Go to Chapter 4 - Zombie Kernels: COBK Zombies.
   No: Continue
3) Is the process using a significant amount of CPU?
   Yes: Go to Chapter 5 - Hung Kernels with High CPU
   No: Continue
4) Is the process's memory usage continuously and steadily increasing?
   Yes: Go to Chapter 7 - Out of Memory / Memory Leak Kernels
   No: Continue
5) Is the process's memory usage constant but extremely large?
   Yes: Go to Chapter 7 - Out of Memory / Memory Leak Kernels
   No: Continue
6) Is the process otherwise hanging or not responding?
   Yes: Go to Chapter 6 - Hung Kernels with Low CPU
   No: Continue
7) It appears you have a very unusual issue. Contact Oracle GCS with as much information as is available. Especially make
sure to include any of the following that are available:
   a) steps to reproduce the issue
   b) jde_####.log for the kernel
   c) jde_####.log for the kernel's jdenet_n parent process
   d) jdedebug_####.log for the kernel
   e) jdedebug_####.log for the kernel's jdenet_n parent process
   f) dumpfile, core file, or callstack
   g) jas log
   h) java logs for enterprise server
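The seven questions above form a simple ordered decision tree. The following Python sketch restates that order as a function so the routing is easy to see at a glance. The function and parameter names are invented for this example, and each answer still has to be established by the troubleshooter as described in the steps.

```python
def next_step(outstanding_requests=False, zombies=False, high_cpu=False,
              memory_growing=False, memory_large=False, hung=False):
    """Return the section to consult, checking the questions in the documented order."""
    if outstanding_requests:
        return "Outstanding Requests"
    if zombies:
        return "Chapter 4 - Zombie Kernels"
    if high_cpu:
        return "Chapter 5 - Hung Kernels with High CPU"
    if memory_growing or memory_large:
        return "Chapter 7 - Out of Memory / Memory Leak Kernels"
    if hung:
        return "Chapter 6 - Hung Kernels with Low CPU"
    return "Contact Oracle GCS"
```

Because the questions are evaluated in order, a zombie kernel that also shows high CPU is routed to Chapter 4 first, matching the sequence of the numbered steps.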
Outstanding Requests
1) Is the number of processed requests increasing over time?
   Yes: The kernel is still processing requests, but it is unable to keep up with the rate at which new requests are coming
        in, resulting in a backlog of queued operations. There may be a misconfiguration, or your hardware resources
        may be insufficient to meet the demands of your userbase.
   No: Continue
2) Observe the trend in the number of outstanding requests over time. Is the number increasing, decreasing, or constant?
Return to Step 2 of Enterprise Server Problem above, but include this information if you end up contacting Oracle GCS.
BATCH PROBLEM
Refer to the corresponding knowledge experts or documentation for the batch area.
Chapter 4 - Zombie Kernels

There are a myriad of programming errors that can cause a kernel to crash (resulting in a zombie kernel), including
but not limited to null or invalid pointer dereferences, heap memory corruption, stack memory corruption, and race
conditions. Furthermore, the crash may not occur until some time after the code containing the logic error executes.
The main focus of this chapter will be on localizing the crash to a specific business function (BSFN) containing the
error. Once the BSFN has been identified, the code can be examined for any programming errors.
CALL OBJECT KERNELS (COBK)

Determining the cause of the zombie status:
COBK Zombies:
1) Open the log file for the COBK/UBE to which the user is connected.
   - Prior to tools release 8.98.3.0, this file will be named jde_####.log, where #### is either the Process ID
     (Windows and Unix) or the Job ID (iSeries) of the relevant COBK/UBE.
   - From tools release 8.98.3.0 onward, you will be looking for a file with a name of the form jde_*_dmp.log.
     (This file is created when a kernel crashes, and * represents the PID of the kernel and the timestamp of the
     crash.)
2) Go to the end of the log file. Is there a call stack?
   Yes: Continue
   No: Go to JDENET_N Parent Process Log
3) Does the call stack show the BSFN?
   Yes: Continue
   No: Go to JDENET_N Parent Process Log
4) Can the issue be reproduced?
   Yes: Go to Reproducing the Issue.
   No: Continue to JDENET_N Parent Process Log
JDENET_N Parent Process Log
1) Obtain the jde_####.log where #### is the PID of the parent jdenet_n that spawned the zombie COBK/UBE. If
you need instructions on finding the file, consult Obtaining the Logfile for the Parent JDENET_N Process.
2) Search the logfile for the keywords "zombie" and "died". (If there are no hits on either search term, try searching
for the Process ID of the COBK/UBE.)
3) Is there a callstack associated with any of the search terms?
   No: Go to Getting an OS Core File.
   Yes: Continue
4) Does the call stack contain a BSFN?
   No: Go to Getting an OS Core File.
   Yes: Continue
   No: Go to Multiple COBK Zombies.
   Yes: Continue to Reproducing the Issue.
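The keyword search in step 2, including the fallback to the PID, can be sketched in Python as follows. The sketch scans every jde_*.log file in a directory rather than only the jdenet_n logs, and it matches the keywords case-insensitively; both are choices made for this example. The function names are hypothetical, and the log directory is supplied by the reader.

```python
import re
from pathlib import Path


def search_logs(log_dir, pattern):
    """Return (file name, line number, line text) for every line matching pattern."""
    hits = []
    for log_file in sorted(Path(log_dir).glob("jde_*.log")):
        with open(log_file, "r", errors="replace") as handle:
            for line_number, line in enumerate(handle, start=1):
                if pattern.search(line):
                    hits.append((log_file.name, line_number, line.rstrip("\n")))
    return hits


def find_crash_mentions(log_dir, cobk_pid):
    """Search for 'zombie' or 'died'; if nothing is found, fall back to the COBK PID."""
    hits = search_logs(log_dir, re.compile(r"zombie|died", re.IGNORECASE))
    if not hits:
        # Match the PID only as a whole number, not as part of a longer number.
        pid_pattern = re.compile(r"(?<!\d)" + re.escape(str(cobk_pid)) + r"(?!\d)")
        hits = search_logs(log_dir, pid_pattern)
    return hits
```

The returned line numbers indicate where to start reading in each file when looking for an associated callstack.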
Reproducing the Issue
1) Turn on dynamic debugging before reproducing the issue.
2) Can the issue be reproduced with debugging turned on?
   No: Go to Check Tools Release
   Yes: Continue
3) Reproduce the problem with debugging on.
4) Open the resulting debug logfile (jdedebug_####.log) and scroll to the end of the file.
5) Search upwards for the string BSFNLevel; this should tell you the last BSFN to run before the kernel crashed.
Continue to Trouble with a Specific BSFN.
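Searching upwards from the end of the debug log for BSFNLevel is equivalent to finding the last line in the file that contains that string. A minimal Python sketch follows; the function name is hypothetical, and the exact layout of a BSFNLevel line is not described in this document, so the whole line is returned for the reader to interpret.

```python
def last_bsfn_line(debug_log_path, marker="BSFNLevel"):
    """Return the last line of the debug log containing marker, or None if absent."""
    last_match = None
    with open(debug_log_path, "r", errors="replace") as handle:
        for line in handle:
            if marker in line:
                last_match = line.rstrip("\n")  # later matches overwrite earlier ones
    return last_match
```

Reading the file once from the top keeps memory use low even for very large debug logs.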
Trouble with a Specific BSFN
1) Is this a customized BSFN?
   Yes: Go to Trouble Involving a Customized BSFN
   No: Continue
2) Is there an ESU for this BSFN?
   Yes: Apply the ESU. Generally, this will resolve the issue. If it persists, go to Contacting Oracle GCS.
   No: Go to Contacting Oracle GCS
Trouble Involving a Customized BSFN
1) Is it possible to try replacing the BSFN with the original code from the release?
   Yes: Continue.
   No: Consult with the developers who customized the BSFN for your purposes.
2) Try replacing the BSFN with the original code from the release. Does the problem disappear?
   Yes: Consult with the developers who customized the BSFN for your purposes.
   No: Continue
3) Is there an ESU for this BSFN?
   Yes: Continue
4) When the ESU is applied, does the problem go away?
   Yes: You will need to merge the changes you made to the original BSFN into the version of the BSFN
        supplied by the ESU.
Contacting Oracle GCS
1) Contact Oracle GCS with as much information as is available. Especially make sure to include any of the
following that are available:
   a) the name of the BSFN
   b) whether the BSFN is customized
   c) whether there are any ESUs for the BSFN
   d) what tools release is in use
   e) steps to reproduce the issue
   f) jde_####.log for the kernel
   g) jde_####.log for the kernel's jdenet_n parent process
   h) jdedebug_####.log for the kernel
   i) jdedebug_####.log for the kernel's jdenet_n parent process
   j) dumpfile, core file, or callstack
   k) jas log
   l) java logs for enterprise server
Multiple COBK Zombies:

1) Open all of the jde_####.log files for all jdenet_n parent processes. There are two ways to do this:
   a) Option 1: If you have easy access to the machine hosting the Enterprise Server.
      i) On the hosting machine, navigate to the log folder for your Enterprise Server.
      ii) Grep (search within the text of these files) for the strings "zombie" and "died".
      iii) Open up any files that contain either of these expressions.
   b) Option 2: If you have easy access to the Server Manager for your Enterprise Server.
      i) Log in to SM and go to the Management Dashboard.
      ii) Select the Enterprise Server from the list of Managed Instances.
      iii) Select Runtime Metrics->Process Detail.
      iv) Sort by Process Name.
      v) For any jdenet_n (Network Listener) processes, click the link in the JDELOG File Size column for that
          row to view the logfile.
2) In each jde_####.log for a jdenet_n, locate the Business Function (BSFN) call stack.
3) Is there a pattern in which one BSFN stands out more than the others in the call stack?
   Yes: Continue
   No: Go to Consult the OS Core File
   Yes: Go to Reproducing the Issue
   No: Go to Consult the OS Core File
Check Tools Release
1) Is the customer on a supported release?
   Yes: Continue
   No: The customer should upgrade to a supported release or provide a compelling reason why this is not
       possible.
2) Is the customer on the current release?
   Yes: Skip to step 4.
   No: Continue
3) Can the customer upgrade to the current release?
   Yes: The customer should upgrade to the current release and see if the problem is resolved. If the problem
        persists, then continue.
   No: Continue
4) Is there a Solution Document or announcements document in the My Oracle Support Knowledge Base for the
customer's issue?
   Yes: Follow the instructions in the document for resolving the issue.
   No: Go to Contacting Oracle GCS.
Obtaining the Logfile for the Parent JDENET_N Process
1) If a COBK kernel has crashed, and there is no useful information in its log, there may be helpful information in
the logfile for the parent JDENET_N process. This section provides instructions on obtaining the file.
2) Log in to Server Manager and go to the Management Dashboard.
3) Select your Enterprise Server from the list of Managed Instances.
4) Select Runtime Metrics->Process Detail.
5) Is the zombie COBK listed?
   Yes: Continue
   No: The list of zombies has already been cleared. Skip to step 10.
6) Click the name (CALL OBJECT KERNEL) of the COBK that has crashed (the zombie COBK).
7) Under General Information, find Parent Process ID. Is the Parent PID non-zero?
   Yes: Continue
   No: Skip to step 10.
8) Return to the Runtime Metrics->Process Detail page, and find the JDENET_N process whose PID matches the
Parent PID. Click on the size of its log file (the entry under JDELOG File Size for that row) to view the logfile.
9) Return to JDENET_N Parent Process Log.
10) If there is more than one JDENET_N, you will have to find all JDENET_N logfiles and grep (search within the
text of these files) for the PID of the zombie COBK to determine the appropriate logfile.
   - If you have access to the machine hosting the Enterprise Server, the easiest way to do this is to connect to
     that machine, navigate to the log folder for the Enterprise Server, and search within jde_*.log.
   - Alternatively, the JDENET_N logfiles can be accessed one at a time from the Runtime Metrics->Process
     Detail page of Server Manager by clicking on the JDELOG File Size for each process that is a Network
     Listener.
11) Once you have identified the correct logfile, return to JDENET_N Parent Process Log.
Consult the OS Core File
If it has proven impossible to obtain a (useful) callstack from any of the EnterpriseOne log files, it may still be
possible to obtain a callstack from an OS-generated core file. If you are unfamiliar with generating and working with
OS core dumps on your platform, information on doing so is available in Appendix C - Getting and Using an OS
Core File.
Once you have examined the callstack, if you can determine which BSFN was running at the time of the crash, go to
Trouble with a Specific BSFN above.
If you cannot isolate a specific BSFN, you should consult Oracle GCS.
METADATA KERNEL
There are historical issues with the Metadata Kernel, particularly in terms of out-of-memory errors and
UBE-not-processing errors. It is believed that these issues were all resolved by Tools Release 8.98.2.0.
If a customer is experiencing crashes of the Metadata Kernel, the customer should attempt to upgrade to a newer
tools release.
If the customer is already running a recent release, or an upgrade is not practical, the customer should contact
Oracle GCS. It will be helpful to Oracle GCS to have:
   - Any available logfiles for the kernel
   - Steps to reproduce the issue
   - A copy of the Java heap dump (see Enabling a Java Heap Dump)
Enabling a Java Heap Dump

Enabling a Java heap dump requires a set of instructions specific to the JDK and OS. Because better and more recent
methods are being created at a very rapid pace, it is best to contact Kernel Support or the Dev SMEs for the latest
means of creating a Java heap dump.
Chapter 5 - Hung Kernels with High CPU

A non-responsive kernel with high CPU has not crashed per se. While the kernel is no longer performing its
required duties, code continues to execute, most likely in some form of infinite loop. The first step in resolving this
issue is to identify where in the code execution is taking place.
One can determine what code is running by examining a callstack. Since the kernel has not crashed in the sense of
encountering a fatal error, there will NOT be a callstack written out to a file. Instead, a callstack can be obtained
using OS tools such as procstack and cstack. These tools are discussed in Appendix D - OS Tools for Obtaining a Call
Stack from Running Code. Note that customers running tools release 8.98.3.0 and beyond can obtain such a callstack
through Server Manager.
It is important to note that, while a high-CPU hung kernel is most likely engaged in some sort of infinite loop, that
loop will generally not be contained in the inner-most executing function of the callstack you obtain. Rather, the
inner-most functions are likely to be contained within the infinite loop. Therefore, it is necessary to repeat the
process of obtaining a callstack several (five to ten) times. The outermost entries in the callstack will remain the
same across all the callstacks collected while the innermost entries will vary. The infinite loop most likely resides at
the level of the inner-most function that is common to all of the collected callstacks.
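The comparison of several callstacks described above amounts to finding the longest run of frames, starting from the outermost, that is identical in every sample. The following Python sketch shows the idea with each callstack given as a list of function names ordered outermost first. Many OS tools print the innermost frame first, so the lists may need reversing before use; the function name and the sample frame names are invented for this example.

```python
def innermost_common_frame(callstacks):
    """Return the innermost frame shared by all callstacks, or None if none is shared.

    Each callstack is a list of function names ordered from outermost to innermost.
    """
    common = None
    for frames in zip(*callstacks):  # walk all stacks level by level from the outside in
        if all(frame == frames[0] for frame in frames):
            common = frames[0]
        else:
            break  # the stacks diverge here; deeper frames are inside the loop body
    return common


if __name__ == "__main__":
    samples = [
        ["main", "dispatch", "ProcessOrder", "strlen"],
        ["main", "dispatch", "ProcessOrder", "memcpy", "copy_block"],
        ["main", "dispatch", "ProcessOrder"],
    ]
    print(innermost_common_frame(samples))
```

In the sample data the stacks agree down to ProcessOrder and differ below it, so ProcessOrder is where the search for the loop should begin.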
Chapter 6 - Hung Kernels with Low CPU

IS A PACKAGE DEPLOYMENT CURRENTLY UNDERWAY?
When a package is currently being deployed to the Enterprise Server, the kernels temporarily suspend normal operation,
mimicking the behavior of a hung kernel with low CPU usage. Generally, package deployments are fairly quick to complete,
but under certain circumstances, deployments can require extended time. Once the package deployment completes or times out,
normal kernel operations will resume.
If a package deployment is not underway, proceed to the next section.
TROUBLESHOOTING LOW-CPU HUNG KERNELS

Similar to a hung kernel with high CPU, a non-responsive kernel with low CPU has also not crashed in the traditional sense.
Although the kernel is no longer performing its required duties, the process continues to run, most likely blocked in some
form of deadlock.
A program is said to be in deadlock when two or more operations are each waiting for the other to finish, creating a situation in
which neither operation ever completes and both wait forever. Though not technically deadlock, a situation with similar
symptoms can arise when a single operation is waiting to obtain a lock on a resource, but that lock was not properly released
when a previous operation finished using the resource.
While UBE kernels are not multi-threaded, it is important to note that they are not immune from deadlock. Two separate UBEs
executing simultaneously (or, more likely, the same UBE being executed multiple times simultaneously) can compete for locks
on shared resources and end up in deadlock.
As in the previous chapter, the first step in resolving this issue is to identify where in the code execution is taking place. One can
determine what code is running by examining a callstack. Since the kernel has not crashed in the sense of encountering a fatal
error, there will NOT be a callstack written out to a file. Instead, a callstack can be obtained using OS tools such as procstack
and cstack. The tools are discussed in Appendix D - OS Tools for Obtaining a Call Stack from Running Code. Note that
customers running tools release 8.98.3.0 and beyond can obtain such a callstack through Server Manager.
After obtaining a call stack for all low-CPU hung kernels, the troubleshooter should examine the executing code to identify
what resource locks are currently held and what locks are pending. The troubleshooter should then study the remainder of the
code to determine where else these locks are obtained / released, and where the logical flaw resides.
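Once the held and pending locks have been written down, a deadlock shows up as a cycle: each process waits for a lock held by the next, and the last waits for a lock held by the first. The Python sketch below illustrates that check on hand-collected data. The two dictionaries, the function name, and the process and lock labels are all hypothetical; EnterpriseOne does not produce this data in this form.

```python
def find_deadlock(holders, waiters):
    """Return the list of processes forming a wait cycle, or None if there is none.

    holders maps a lock name to the process currently holding it.
    waiters maps a blocked process to the single lock it is waiting for.
    """
    for start in waiters:
        seen = []
        current = start
        # Follow "waits for the holder of" links until they end or repeat.
        while current in waiters and current not in seen:
            seen.append(current)
            current = holders.get(waiters[current])
        if current in seen:
            return seen[seen.index(current):]
    return None


if __name__ == "__main__":
    holders = {"LOCK_A": "UBE_1", "LOCK_B": "UBE_2"}
    waiters = {"UBE_1": "LOCK_B", "UBE_2": "LOCK_A"}
    print(find_deadlock(holders, waiters))
```

If the function returns None while a process is still blocked, the lock it wants may be one that a finished operation never released, which is the second situation described in this chapter.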
Chapter 7 - Out of Memory / Memory Leak Kernels

MEMORY LEAKS
Generally speaking, a kernel suffering from a memory leak is discovered after it has crashed. The kernel crashes when a
memory allocation attempt fails because the process has reached its maximum allowed memory. [1] Sometimes examining the
callstack at the time of the crash can indicate where this failed memory allocation occurred, but that may or may not provide
useful information. Often, the failed memory allocation is merely the unrelated victim of a programming error elsewhere in the
code that prevents no-longer-needed memory from being recycled.
OVERLY-AGGRESSIVE CACHING
An out-of-memory error does not necessarily imply the existence of a memory leak per se. Misuse of the JDB cache is a
common source of out-of-memory errors. The JDB cache can be used to store the result of a frequent database query in
memory for improved performance. However, if the cache is used too liberally with large tables, free memory will fill up with
JDB cache entries.
Overly-aggressive caching can be an issue with call object kernels, but it more often causes problems in batch jobs, simply due
to the much higher volume of data that batch jobs generally manipulate. If an out-of-memory error is encountered, the
troubleshooter should investigate what information is being stored in the JDB cache and verify that no unreasonably large
queries are being cached.
There are two ways that a query result may be stored in the JDB cache.
1. If the table over which the query is made has been registered in the F98613 table, then the query result will be placed
   in the JDB cache. To check which tables' queries are being cached through this method, examine the F98613 table.
2. A BSFN can use the JDB_AddTableToDBCache API to have a table's query results added to the cache. To check
   whether this has happened, debug logging must be enabled, and the debug log should be searched for messages of
   the form:
   Entering JDB_AddTableToDBCache (Table =<E1 table name>)
Small, unchanging tables such as company constants are prime candidates for caching in the JDB cache. Except in very
unusual circumstances, tables containing business data should never be cached.
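The debug-log search described in item 2 above can be scripted. The following is a minimal sketch, assuming the log lines contain the message exactly in the form shown above. The function name list_cached_tables is illustrative and is not an E1 utility; the log file names passed to it are whatever debug logs are being examined.

```shell
#!/bin/sh
# list_cached_tables: hypothetical helper (not part of E1).
# Reads one or more debug logs and prints each table named in an
# "Entering JDB_AddTableToDBCache (Table =...)" message, preceded by the
# number of times it appears, most frequent first.
# Assumes the message format shown above; adjust the pattern if the logs differ.
list_cached_tables() {
    sed -n 's/.*JDB_AddTableToDBCache *(Table *= *\([^) ]*\).*/\1/p' "$@" |
        sort | uniq -c | sort -rn
}
```

For example, list_cached_tables jdedebug_*.log prints one line per cached table. Any table holding business data that shows up in the output is worth investigating.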
TROUBLESHOOTING OUT-OF-MEMORY ISSUES

If an out-of-memory error does not appear to be related to overly-aggressive caching, the best way to troubleshoot a kernel that
is running out of memory is to recreate the issue while using a memory profiling tool such as Purify, Valgrind, or Pex.
(Customers using tools release 8.98.3.0 and beyond have the additional options of using BMD or Jade.) Memory profiling
tools such as these will show the user what memory has been allocated and never freed (reclaimed).
[1] Even when there is plentiful total free memory, an attempt to allocate a large block of memory will still fail if there is no
adequately large block of contiguous free memory.
It is important to note that using any of the above profiling tools will incur a heavy performance penalty. If it is at all possible,
this should be done on a non-production server.
Appendix A Validation and Feedback

This section documents the real-world validation that this document has received.
CUSTOMER VALIDATION
Oracle is working with PeopleSoft customers to get feedback and validation on this document. Lessons learned from these
customer experiences will be posted here.
FIELD VALIDATION
Oracle Consulting has provided feedback and validation on this document. Additional lessons learned from field experience
will be posted here.
Appendix B Glossary
Term                  Definition
BSFN                  Business Function
COBK                  Call Object Kernel
E1                    Oracle JD Edwards EnterpriseOne
ESU                   Electronic Software Update
GCS                   Global Customer Support
MDK                   Metadata Kernel
PID                   Process Identifier (Process ID)
SAR                   Software Action Request
SM                    Server Manager
NetWM                 Network Work Management standalone utility shipped with Enterprise Server
                      that shows queues, outstanding requests, etc.
Callstack             A list of currently executing functions organized hierarchically to show parent
                      (caller) to child (callee) relationships
UBE                   Universal Batch Engine
OS                    Operating System
Infinite Loop         A program is said to be in an infinite loop when it continues to execute the same
                      section of code repeatedly forever.
Deadlock              A program is said to be in deadlock when two or more operations are each
                      waiting for the other to finish, creating a situation where neither operation ever
                      completes and both wait forever. While not technically deadlock, a situation with
                      similar symptoms can arise when a single operation is waiting to obtain a lock on
                      a resource and that lock was not properly released when a previous operation
                      finished with the resource.
Management Dashboard  The entry page to Server Manager (SM). The page has the title "Managed
                      Homes and Managed Instances" and can be reached by clicking a link in the
                      upper left corner of most SM pages.
Appendix C Getting and Using an OS Core File

In Tools Release 8.98.3.0, several new features were added to streamline the debugging of kernel issues. This document is
primarily intended for users of Tools Releases in the 8.98.2 family and earlier. Users of Tools Release 8.98.3 and beyond will
find a simpler, platform-independent set of instructions in the KRM documentation, which is available here:
OU Recording: http://oukc.oracle.com/static09/opn/login/?t=checkusercookies|r=-1|c=839298384
This appendix provides instructions for obtaining a call stack and a dump file on the following platforms:
Windows Server
AS400 - iSeries
UNIX
WINDOWS
Prerequisite (Windows platform only)
1) The machine should have Debugging Tools for Windows installed. If it is not installed, download and install it from the
following URL:
http://www.microsoft.com/whdc/devtools/debugging/installx86.mspx
Note: The above package installs windbg. Note the path of windbg.exe; it will be used to capture the crash dump.
2) Have the customer download this version:
Current Release version 6.11.1.402 - February 6, 2009

Install 32-bit version 6.6.7.5 [15.2 MB]
Steps to install UserDump:

1. Download Site (version 8.1)
http://www.microsoft.com/downloads/details.aspx?FamilyID=E089CA41-6A87-40C8-BF6928AC08570B7E&displaylang=en&displaylang=en
a) Click Download
b) Click Run
c) After the download completes, a new folder, C:\kktools\userdump8.1, will be created.
2. Setup
http://support.microsoft.com/kb/241215
a) In C:\kktools\userdump8.1\x86, run setup.exe
b) A folder C:\WINDOWS\system32\kktools will be created after the setup.
3. Capturing E1 COBK
a) Go to Control Panel->Process Dumper
b) Click New
c) Enter jdenet_k.exe and click OK
d) Click Rules:
e) Select Use custom rules
- Point the Dump file folder to a folder that is easily accessible. Make sure the folder exists.
- Keep all other settings as shown.
- Check Kill process after dumping
- Click OK
f) Optional (unless instructed otherwise):

1) Check All Exceptions OR
2) Select specific exceptions
i) Access violation
ii) Array bounds exceeded
iii) Stack Overflow
iv) Invalid handle
v) Overflow
vi) Stack Check
g) Click Apply or OK
Getting Page Heap: (Optional)

http://support.microsoft.com/kb/267802
1. From the command line, go to the drive where the Debugging Tools for Windows folder is installed.
2. To enable full page heap, run from the command line:
>gflags /p /enable runbatch.exe /full
/full = full page heap; this will use a lot of memory and resources.
3. To target specific DLLs:
>gflags /p /enable jdenet_k.exe /dlls callbsfn.dll cruntime.dll
4. From the GUI interface of GFLAGS:
a) Go to Start > All Programs
b) Select Debugging Tools for Windows > Global Flags
c) Click the Image File tab page
d) Enter an executable name and TAB OUT - DO NOT HIT ENTER
- Check the options as shown.
e) To remove the settings, follow instructions 4a through 4d but uncheck all options.
AS400 ISERIES
When a C2M1211 or C2M1212 message is generated from a single-level store heap routine, the code checks for a *DTAARA
named QGPL/QC2M1211 or QGPL/QC2M1212. If the data area exists, the program stack is dumped. If the data area does not
exist, no dump is performed.
Set up a data area to capture the call stack for the C2M1212 heap error message:
CRTDTAARA DTAARA(QGPL/QC2M1212) TYPE(*CHAR) LEN(1)
Set up a data area to capture the call stack for the C2M1211 heap error message.
Setting up the C2M1211 data area requires PTFs SI27412 and SI28640 on V5R4.
Once the data area is in place, a spool file named QPRINT is created with dump information for every C2M1211 or
C2M1212 message. The spool file can be read to determine which tools, application, or OS API code is causing the memory
overwrite; some of the dump information may need to be interpreted by IBM.
The spool file is created for the user running the job that gets the message. For example, if the job getting the C2M1211
message or C2M1212 message is a server job or batch job running under userid ABC123, then the spool file is created in the
output queue for userid ABC123. Once the spool files containing stack tracebacks are obtained, the data area can be removed,
and the tracebacks analyzed.
To disable the dumps, delete the data area(s).
For further information, read "Diagnosing and Debugging Memory Problems: C2M1211 and C2M1212 Messages" on the
IBM website.
When a C2M1211 message or C2M1212 message is generated from a teraspace heap routine, the code checks for a *DTAARA
named QGPL/QC2M1211 or QGPL/QC2M1212. If the data area exists and contains at least 50 characters of data, a 50
character string is retrieved from the data area. If the string within the data area matches one of the following strings, special
behavior is triggered.
_C_TS_dump_stack
_C_TS_dump_stack_vfy_heap
_C_TS_dump_stack_vfy_heap_wabort
_C_TS_dump_stack_vfy_heap_wsleep
If the data area does not exist, no dump or heap verification is performed. For further information, read
"Enablement for teraspace heap memory managers" on the IBM website.
The following is an example of a data area value that causes _C_TS_malloc_debug to be called to verify the heap whenever a
C2M1211 or C2M1212 message is generated.
On IBM i 6.1 (with PTF SI33945) and IBM i 7.1, the following value can be placed in the data area:
VALUE('_C_TS_dump_stack_vfy_heap_wabort')
This re-validates the heap and aborts the job if memory corruption is detected.
Caution: This should be used only in a test environment, because it can generate a large number of errors/exceptions, and with
the abort option more zombie processes will appear.
UNIX
1) In the JDE.INI config file, under the [JDENET] section, set the following: HandleKrnlSignals=0 and krnlCoreDump=1.
This will cause a core file to be dumped, provided the operating system allows it.
2) If the Oracle client is being used to connect to an Oracle database, log in as the oracle userid that owns the Oracle Client
install. Add the following line to the $ORACLE_HOME/network/admin/sqlnet.ora file:
DIAG_SIGHANDLER_ENABLED=false
3) Next, you must ensure that the operating system allows the creation of core files.
a) On the command line type the command: ulimit -c. This will show the current maximum size for core files.
b) If the size is 0 (or very small), then no core file will be created.
c) To change the size for the core file, on the command line, type: ulimit -c <####> where <####> is the size in
blocks (512 or 1024 bytes each, depending on the shell), or the keyword unlimited.
d) Confirm the ulimit change by rerunning ulimit -c on the command line. If the value from step c above is not
displayed, the hard limit may need to be raised by the root user. Changes to the /etc/security/limits
e) If E1 Enterprise Server services are to be started from the command line using RunOneWorld.sh, start the E1
Enterprise Server services from a login session where ulimit -c <####> was run. The ulimit command has to be run
for each new login session on the server that is used to run the RunOneWorld.sh script. If the E1 Enterprise Server
needs to be stopped and restarted often, adding the ulimit -c <####> command to the bottom of the
$SYSTEM/bin32/toolsenv.sh script will ensure the ulimit command is run each time a new login session is opened.
f) If the E1 Enterprise Server is to be stopped and restarted remotely via Server Manager, the Server Manager client on
the Enterprise Server must be restarted from a login session where ulimit -c <####> has been run. Run the ulimit
command, then go to the jde_home/bin directory and run the command: restartAgent
g) Test that core files are being created properly by selecting a jdenet_k process PID and running the following command:
kill -15 <process-PID>
This should generate a core file.
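The checks in steps 3a through 3d can be run as a quick pre-start test in the login session that will start the services. The following is a minimal sketch; the function names core_limit_status and check_core_limit are illustrative and are not part of E1 or of the operating system.

```shell
#!/bin/sh
# core_limit_status: hypothetical helper that classifies a value reported by
# "ulimit -c". Prints "unlimited", "disabled" (0, so no core file is written),
# "limited" (a finite, nonzero size that may truncate large core files),
# or "unknown" for anything that is not a number.
core_limit_status() {
    case "$1" in
        unlimited)    echo "unlimited" ;;
        0)            echo "disabled" ;;
        *[!0-9]*|'')  echo "unknown" ;;
        *)            echo "limited" ;;
    esac
}

# check_core_limit: report the soft core file limit of the current login session.
check_core_limit() {
    limit=$(ulimit -c)
    echo "core file limit: $limit ($(core_limit_status "$limit"))"
}
```

Running check_core_limit immediately before RunOneWorld.sh (or before restartAgent) confirms that the session starting the services has a usable limit. A result of "disabled" means the ulimit -c <####> command still needs to be run in that session.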
4) When a core file is generated, it is written to the $SYSTEM/bin32 directory under the same name every time, unless the
operating system is actively managing core file names and locations. The server may already be configured to put all core
files in a central location. If so, the server may be reconfigured, or the core files can be copied to the $SYSTEM/bin32
directory to be read. The following options generate core files with unique names:
a) On Sun Solaris, put the coreadm command in the user profile:
coreadm -p core.%f.%p $$
The above command will generate the core file with the following format name:
core.<executable_name>.<process_ID>
b) On Linux, log in as root, edit the /etc/sysctl.conf file, and add the following line:
kernel.core_uses_pid = 1
Anytime the /etc/sysctl.conf file is changed, the root user must run the following command to make the change effective
immediately: sysctl -p
Once this is run, every new login session will get the new settings. Stop and restart E1 following
the directions in step 3e or 3f.
c) If no other core naming options are available, create a script to detect the core file and rename it. See the following
example. Run the script from the $SYSTEM/bin32 directory in the background with nohup using this command:
nohup rename_core &
rename_core script sample

#!/bin/ksh
# This script just hangs around waiting for a core file to appear, and if
# one does, renames it to a name based on the current date and time.
while true
do
    sleep 30
    if [ -f core ]
    then
        cname="core.$(date +%Y%m%d%H%M%S)"
        echo "renaming core to $cname"
        mv core "$cname"
    fi
done
5) Once the core files are captured, the core files must be opened at the customer site to get the call stack.
6) Which platform is the customer using?
HP
LINUX
AIX
SUN
HP
1) Do you know which executable created the core file? (Yes / No)
2) If not, on the command line type:
file <core filename>
3) The above command will give you the executable name to be used in Get HP Callstack (step 4).
Get HP Callstack
4) Getting the callstack
Command line:
>gdb <prog name> <core file name>
Example:
>gdb jdenet_k core.xxxx.12345
Once the core file is open, do the following:
>info thread    This will give you a list of threads that were created within the jdenet_k process.
>thread #       Open thread number #.
>where          List the callstack within that thread.
>quit           Exit gdb.
LINUX
Linux core files generally must be read on the same server they were created. Displaying the core file on a different server can
produce incorrect output.
3) The file <core filename> command (see step 2 under HP) will give you the executable name to be used in Get Linux
Callstack (step 4).
Get Linux Callstack

Command line:
>gdb <prog name> <core file name>
Example:
>gdb jdenet_k core.12345
Once the core file is open, do the following:
>info thread    This will give you a list of threads that were created within the jdenet_k process.
>thread #       Open thread number #.
>where          List the callstack within that thread.
>quit           Exit gdb.
Some optional information can be collected along with the stack:
show charset        Show the effective character set when the process crashed.
show environment    Show the environment variables when the process crashed.
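The interactive gdb session above can also be run unattended, which is convenient when the customer has to send the output back. The following is a minimal sketch, assuming GNU gdb with its -batch and -x options. The function names and the temporary file name are illustrative. It uses the gdb command "thread apply all where" in place of the manual thread # / where sequence, so the callstack of every thread is captured in one pass.

```shell
#!/bin/sh
# write_gdb_commands <file>: hypothetical helper that writes a gdb command
# file covering the commands listed above, for every thread in the core file.
write_gdb_commands() {
    cat > "$1" <<'EOF'
info threads
thread apply all where
show charset
show environment
quit
EOF
}

# dump_core_stacks <prog name> <core file name> <output file>
# Runs gdb non-interactively (-batch) with the command file (-x) and saves
# everything gdb prints to the output file. Returns gdb's exit status.
dump_core_stacks() {
    cmds="${TMPDIR:-/tmp}/gdb_cmds.$$"
    write_gdb_commands "$cmds"
    gdb -batch -x "$cmds" "$1" "$2" > "$3" 2>&1
    rc=$?
    rm -f "$cmds"
    return $rc
}
```

For example, dump_core_stacks jdenet_k core.12345 callstack.txt leaves the thread list, all callstacks, and the optional charset and environment information in callstack.txt.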
AIX
3) The file <core filename> command (see step 2 under HP) will give you the executable name to be used in Get AIX
Callstack (step 4).
Get AIX Callstack

Command line:
dbx <prog name> <core_file_name>
This will bring up the dbx prompt; the user may have to press the Enter or Return key several times.
>where    List the callstack
SUN
1) Simply type the following in the command line:
Command Line:
pstack <core_file_name>
This will list the callstack
Appendix D OS Tools for Obtaining a Call Stack from Running Code

The procstack / pstack commands below are to be used when a process is either hung or consuming high CPU. Note that
they are intended for systems on Tools Releases earlier than 8.98.3; in 8.98.3 and beyond, the same call stacks can be obtained
from CPU Diagnostics in Server Manager (simply press CPU Diagnostics in Server Manager).
Caution: This document may contain information, software, products, services which are not supported by Oracle Support
Services and are being provided as is without warranty. Please refer to the following site for My Oracle Support Terms of
use: https://support.oracle.com/CSP/ui/TermsOfUse.html
UNIX
Run the following on the various UNIX platforms to dump call stacks:
HP-UX: /usr/ccs/bin/pstack <pid>
AIX: /usr/bin/procstack <pid>
SUN: /usr/bin/pstack <pid>
LINUX: /usr/bin/pstack <pid>
More information on procstack can be found in the IBM documentation for the procstack command.
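When the same procedure has to be followed on several platforms, the choice of tool can be wrapped in a small script. The following is a minimal sketch; the function names are illustrative, the platform names are the values reported by uname -s, and the paths are the ones from the list above, which may differ on a particular installation.

```shell
#!/bin/sh
# stack_tool_for <uname -s value>: print the call-stack tool listed above
# for the given platform. Returns 1 for a platform that is not in the list.
stack_tool_for() {
    case "$1" in
        HP-UX) echo "/usr/ccs/bin/pstack" ;;
        AIX)   echo "/usr/bin/procstack" ;;
        SunOS) echo "/usr/bin/pstack" ;;
        Linux) echo "/usr/bin/pstack" ;;
        *)     echo "unsupported platform: $1" >&2; return 1 ;;
    esac
}

# dump_stack <pid>: run the current platform's tool against a running process.
dump_stack() {
    tool=$(stack_tool_for "$(uname -s)") || return 1
    "$tool" "$1"
}
```

For a hung or high-CPU kernel, running dump_stack <pid> several times a few seconds apart and comparing the output shows whether the process is stuck in one place or still moving.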
WINDOWS
Use the ADPlus tool to collect the call stack information on the Windows platform. For more information on how to use the
tool, see the Microsoft article How to use ADPlus to troubleshoot "hangs" and "crashes".
AS400
The process below can be used to retrieve the program stack for a job with a single thread or the first thread of a multithreaded
job.
cmd: ADDLIBLE E900SYS

cmd: SAW | Option 2 Work with Server Processes | Option 3 Display OneWorld Processes
The following creates a spool file containing the program stack (call stack):
cmd: DSPJOB JOB(072347/ONEWORLD/JDENET_K) OUTPUT(*PRINT) OPTION(*PGMSTK)
The same command without OUTPUT(*PRINT) uses the default output setting:
cmd: DSPJOB JOB(072347/ONEWORLD/JDENET_K) OPTION(*PGMSTK)
1. Create a library and output queue to move the previously generated spool file items.
cmd: CRTLIB JDETEMP

cmd: CRTOUTQ JDETEMP/JDETEMP
2. Copy the items found in output queue JDETEMP/JDETEMP (viewable with WRKOUTQ JDETEMP/JDETEMP) via iSeries
Navigator to a local Windows folder.
a. Expand the host name node. Log in to the system. Expand the Basic Operations node. Right-click
Printer Output, highlight Customize this View, and select Include.
Change the Users value to All. Type JDETEMP/JDETEMP in the Output queues field.
b. Highlight all of the spool files found in the right-hand window pane. Press Ctrl-C (to copy) and paste these
files into a local Windows Explorer folder, e.g. SND2DENVER.