
ALERT: Hang During Startup/Shutdown on Unix When System Uptime > 248 Days [ID 118228.1]
Modified 21-OCT-2005 Type ALERT Status PUBLISHED

This alert was modified:


30-August-2000 by adding the Q&A section.
31-August-2000 by removing 7.x from the Versions Affected.
Q&A section updated with new information regarding
affected platforms and product versions.
21-Sep-2000 by adding reference to BUG 1399885
25-Sep-2000 by adding a Q&A on corruption and revision of the
affected platforms Q&A.
28-Nov-2000 to add more affected platforms.
23-Mar-2001 to add information about Fujitsu Siemens
20-Apr-2001 Added Sun Solaris Intel as known to be affected.
16-May-2001 Added information on similar DG/UX bug.
HANG DURING STARTUP OR SHUTDOWN
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Versions Affected
~~~~~~~~~~~~~~~~~
Oracle Server and Oracle Enterprise Server 8.0.X through 8.1.6 inclusive
Platforms Affected
~~~~~~~~~~~~~~~~~~
UNIX GENERIC
Description
~~~~~~~~~~~
Due to Oracle Bug:1084273, a timer overflow causes background
processes to loop indefinitely. A timer overflow occurs when the number
of clock ticks exceeds the largest positive value that can be
represented by the datatype in which the value is stored.
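
As an illustration of the overflow described above, the following minimal
Python sketch reinterprets a growing tick count as a signed 32-bit value.
The helper name to_int32 is hypothetical, and the 32-bit width is an
assumption taken from the wrap values quoted later in this alert
(2147483647 to -2147483648); it does not reproduce Oracle's timer code.

```python
INT32_MAX = 2**31 - 1  # largest value a signed 32-bit counter can hold


def to_int32(ticks: int) -> int:
    """Interpret an unbounded tick count as a signed 32-bit integer."""
    ticks &= 0xFFFFFFFF  # keep only the low 32 bits
    # Values with the top bit set represent negative numbers.
    return ticks - 0x100000000 if ticks >= 0x80000000 else ticks


print(to_int32(INT32_MAX))      # last tick before the overflow
print(to_int32(INT32_MAX + 1))  # one tick later the value is negative
```

One tick past the maximum, the stored value jumps from the largest
positive number to the most negative one, which is the condition the
affected timer check does not handle.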
Likelihood of Occurrence
~~~~~~~~~~~~~~~~~~~~~~~~
Operating systems with clock ticks set to milliseconds will see this
problem after 24.8 days but typically systems have clock ticks set
to centiseconds which means the problem would not be seen for 248 days.
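
The 24.8-day and 248-day figures follow from dividing the largest signed
32-bit value by the tick rate. The sketch below reproduces that
arithmetic; the function name days_until_overflow is hypothetical and the
signed 32-bit counter is an assumption based on the wrap values quoted in
this alert.

```python
SECONDS_PER_DAY = 86_400
INT32_MAX = 2**31 - 1  # assumed capacity of the tick counter


def days_until_overflow(ticks_per_second: int) -> float:
    """Days of uptime before a signed 32-bit tick counter overflows."""
    return INT32_MAX / ticks_per_second / SECONDS_PER_DAY


for rate in (1000, 100):  # millisecond and centisecond clock ticks
    print(f"{rate} ticks/s: {days_until_overflow(rate):.2f} days")
```

The results are a little over 24.8 days at 1000 ticks per second and a
little over 248 days at 100 ticks per second, matching the figures above.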
Possible Symptoms
~~~~~~~~~~~~~~~~~
After the number of clock ticks since the machine was last rebooted
overflows, you will not be able to shut down or start up the affected
Oracle RDBMS products. If your operating system provides a system call
trace utility, such as Sun's truss, you can check for this behavior:
% truss -af -o <output_file> -p <pid_of_pmon_process>
If you see output similar to the following you are looping and may be
experiencing this problem:
24369: semop(720897, 0xEFFFE7A0, 1) (sleeping...)
24448: Received signal #14, SIGALRM, in semop() [caught]
24448: semop(720897, 0xEFFFDAA8, 1) Err#91 ERESTART
24448: sigprocmask(SIG_BLOCK, 0xEFFFD6C0, 0x00000000) = 0
24448: times(0xEFFFD650) = -2117821797
24448: setitimer(ITIMER_REAL, 0xEFFFD650, 0x00000000) = 0
24448: sigprocmask(SIG_UNBLOCK, 0xEFFFD6C0, 0x00000000) = 0
24448: setcontext(0xEFFFD790)
24377: semop(720897, 0xEFFFE7A0, 1) (sleeping...)
24398: Received signal #14, SIGALRM, in semop() [caught]
24398: semop(720897, 0xEFFFE7A0, 1) Err#91 ERESTART
24398: sigprocmask(SIG_BLOCK, 0xEFFFE3B8, 0x00000000) = 0
24398: times(0xEFFFE348) = -2117821686
Please note that the above function calls are normal. The IMPORTANT
point is that the "times" call is returning a negative value of large
magnitude. It is also important to understand that a negative value
returned by "times" is not itself the problem; the issue is how the
Oracle timer checks against it. It is normal to see negative values
returned by "times" on systems that have been up over 248 days.
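
For a long trace, the check for negative "times" return values can be
scripted. The following Python sketch scans lines of a truss output file
for times() calls that returned a negative value. The regular expression
is an assumption based only on the sample lines shown above, and the
function name negative_times_calls is hypothetical. A match only
indicates that the tick counter has wrapped; the hang itself shows up as
the repeating semop/SIGALRM loop.

```python
import re

# Matches lines such as "24448: times(0xEFFFD650) = -2117821797".
TIMES_CALL = re.compile(
    r"^\s*(\d+):\s+times\(0x[0-9A-Fa-f]+\)\s+=\s+(-?\d+)\s*$"
)


def negative_times_calls(truss_lines):
    """Return (pid, return_value) pairs for times() calls that returned < 0."""
    hits = []
    for line in truss_lines:
        match = TIMES_CALL.match(line)
        if match and int(match.group(2)) < 0:
            hits.append((int(match.group(1)), int(match.group(2))))
    return hits


sample = [
    "24448: sigprocmask(SIG_BLOCK, 0xEFFFD6C0, 0x00000000) = 0",
    "24448: times(0xEFFFD650) = -2117821797",
    "24398: times(0xEFFFE348) = -2117821686",
]
print(negative_times_calls(sample))
```

To scan a real trace, pass the open output file instead of the sample
list, since iterating over a file yields its lines.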
Any running instances will need to be aborted with a "shutdown abort"
before the system is shut down.

Questions & Answers
~~~~~~~~~~~~~~~~~~~
Q. Can this bug cause database corruptions?
A. No. This bug does NOT cause database corruptions. It causes a hang. To
resolve the hang, reboot the system. Do not restore the database from a
backup or recreate the controlfile in an attempt to resolve the hang.
Q. What about the reports of controlfile corruption?
A. In an attempt to resolve a hang on ALTER DATABASE MOUNT, a user recreated
the controlfile of their database. Even though this allowed the MOUNT to
proceed, it didn't solve the hang at later points. There was nothing wrong
with the controlfile. After recreating it, the MOUNT process followed a
path which did not result in a hang. This is NOT the correct workaround
for this problem, and may result in loss of valuable metadata. See the
Workaround section below for the correct workaround.
Q. Which platforms are known to be affected by this bug?
A. This bug is known to affect:
Sun Solaris SPARC platform, 32bit and 64bit releases
Fujitsu UXP/DS
NEC EWS4800/UP4800
Fujitsu-Siemens RM200-600 & RM600E Reliant
(OS versions earlier than 5.45)
Sun Solaris V2 Intel
We have checked the following platforms and found that they are NOT
affected by this bug:
IBM AIX/SP (uses post/wait driver),
HPUX (uses gettimeofday),
Compaq Tru64 (uses gettimeofday),
Linux (uses gettimeofday).
Platforms not mentioned here have not been verified yet.
There is a very similar problem on DG/UX 4.11mu05 and 4.20mu06
on both Intel and Motorola processors where the times() call
gets stuck and continually returns the same value after 248 days
machine uptime. This is an OS bug which produces the same symptoms
as the Oracle bug described in this alert except that the times()
calls in the truss output keep returning 2147483647.
Contact Data General for details of this issue.
Q. Does this problem affect Oracle7?
A. This problem does not affect Oracle7 on any platform.
Q. How do I know whether my system will be impacted 24 days or 248 days
after reboot?
A. On Solaris, the command "/usr/bin/getconf CLK_TCK" will return the
number of clock ticks per second. If the number is 1000, the system is
impacted 24 days after reboot. If the number is 100, the system is
impacted 248 days after reboot.
Another way to determine the clock ticks of your system is to use the
command:
"truss -tsysconfig time true"
On Solaris this will show:
sysconfig(_CONFIG_CLK_TCK) = 1000
if your system clock ticks 1000 times per second.
Q. How do I know whether my system is close to being impacted?
A. Use the uptime command to determine how long the system has been
running. Based on the number of clock ticks per second, you will be able
to tell when the system will be impacted. Alternatively, you can see the
current return value of the times system call using this command:
"truss -ttimes time true"
On Solaris this will show:
times(0xEFFFFC20) = 962090
At 1000 clock ticks per second, this system was rebooted 962.09 seconds
ago. After the return value of the times() system call reaches
2147483647, it will wrap and become -2147483648. When times() returns a
negative value, your system has been impacted.
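
The remaining margin can be estimated from the two values discussed
above: the current times() return value and the number of clock ticks per
second. The Python sketch below is a minimal illustration; the function
name days_until_wrap is hypothetical, and it assumes the counter wraps at
2147483647 as described in this answer.

```python
SECONDS_PER_DAY = 86_400
INT32_MAX = 2**31 - 1  # value at which times() is described as wrapping


def days_until_wrap(times_value: int, ticks_per_second: int) -> float:
    """Estimate days left before the times() return value wraps negative."""
    if times_value < 0:
        return 0.0  # already wrapped: the system has been impacted
    return (INT32_MAX - times_value) / ticks_per_second / SECONDS_PER_DAY


# Values from the Solaris example: times() = 962090 at 1000 ticks per second.
print(f"{days_until_wrap(962090, 1000):.2f} days remaining")
```

For the example system, which had been up for only about 16 minutes,
nearly the full 24.8-day window remains.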
Q. How do I know that the patch fixed the problem?
A. Once your system has passed the impact date without the patch
installed, perform a shutdown abort on a test database on the system
and try to start it up. If the startup hangs during the mount phase, you
have encountered the problem. Install the patch on the test database and
retry the startup. It should now start up normally.
Q. Can the fix for 8.0.6.0 also be applied to 8.0.6.1?
A. Yes. This patch is for the CORE library, which is not part of the
patchsets.

Q. Can the fix for 8.1.6.0 also be applied to 8.1.6.1 and 8.1.6.2?
A. Yes. This patch is for the CORE library, which is not part of the
patchsets.
Workaround
~~~~~~~~~~
The fix for this issue is incorporated into the 8.1.7 and newer
releases of Oracle Server and Oracle Enterprise Server.
If a patch is not available for your operating system or Oracle
version (see the patch list in the next section), the workaround
is to do the following:
1) Stop all running databases on the server
You may have to issue a SHUTDOWN ABORT command.
2) Reboot the server
Follow the normal OS procedure to reboot the server; do not simply
power off the machine.
This resets the timer and restarts the tick count from "0".
Patches
~~~~~~~
Please note that some of these bugs may not be viewable via MetaLink.
As of September 25, 2000 there are four patches available for Sun Solaris
32 bit:
8.0.5: BUG:1400358
8.0.6: BUG:1265297
8.1.5: BUG:1400327
8.1.6: BUG:1227119

As of September 25, 2000 there is one patch available for Sun Solaris
64 bit:
8.1.6: BUG:1399885
Fujitsu Siemens RM600 Reliant Unix - See Bug:1504291
The issue is fixed by an OS change in OS release 5.45 onwards.
There is a patch available from Fujitsu Siemens for OS 5.43.
The patch (EKSNAME) for this is SIY3C390+ which at the time of
writing is available from
http://its.sni.de/lobs/its/its_sc/eks_en/index_en.htm

References
~~~~~~~~~~
DATABASE HANGS AFTER 24 DAYS, LOOPING ON SEMOP CALL Bug:1084273
@ DO NOT USE TIMES() RETURN VALUE Bug:1185824
