
Memory Diagnosability TOI

Mike Buckley
Platforms TRE Sun Microsystems

Goals
* Improve Customer Satisfaction by reducing time to resolution through increased technical proficiency
* Reduce the incidence of inaccurate Onsite Action Plans and wrong parts ordered
* Replace memory dimms and other parts only when necessary
* Correctly identify proper dimm size and part number
* Accurately diagnose various memory issues
* Save SUN money $$$

Sun Proprietary/Confidential: Internal Use Only

Topics
* Types of memory errors
* Sun's Best Practices regarding memory errors (review)
* Dimm size and part number identification techniques
* Diagnostic tools and utilities (some new ones)
* Troubleshooting tips and techniques
* Examples of error messages and tool usage
* Known memory issues (PLL dimm chip)
* Resources

CE: Correctable (single bit) errors
Because ECC can correct single bit flips, single bit errors are referred to as Correctable Errors. These are detected and corrected and generally do not impact performance.
Types of CE's:
* Intermittent
* Persistent
* Sticky Bit

UE: Uncorrectable (multi bit) errors
Multi-bit errors are referred to as Uncorrectable Errors. These are detected, but not corrected. These will result in machine reset (panic or reboot).

Correctable Errors (CE):

When a CE is detected, the device that read the word and detected the error can correct the data and continue unimpeded. However, this does not address the fact that the referenced word may still be resident in memory uncorrected (i.e. a subsequent read of this word could result in another CE event). If this word in memory is never corrected, the possibility arises over time that another bit may flip in the same word. That would lead to a UE event, which results in a loss of system service. To avoid this possibility, the detection of a CE causes a trap to Solaris. The Solaris error handling code logs the error and scrubs the affected memory word by writing the corrected word back into memory.

Intermittent: Means the error was not detected on a reread of the affected memory word. "Intermittent" is not the best choice of words because it implies that this same error can be expected to manifest itself at irregular intervals. This CE is also known as a transient soft error. No DIMM with this sort of error should be considered for replacement without first examining the soft error rate (SER) of this DIMM.

Persistent: Means the error was detected again on a re-read of the affected memory word but the scrub operation corrected it. This CE is also known as a temporary soft error. No DIMM with this sort of error should be considered for replacement without first examining the SER of this DIMM.

Sticky (aka Sticky Bit): Means that the error still exists in memory even after the scrub operation. This CE is also known as a stuck-at hard error. No DIMM with this sort of error should be considered for replacement without first examining the SER of this DIMM.
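The three CE classes differ only in what happens on the re-read and after the scrub. The sketch below restates that decision as a small Python function. It is an illustration of the definitions above, not the Solaris error handling code; the names `CEClass` and `classify_ce` are hypothetical.

```python
from enum import Enum


class CEClass(Enum):
    INTERMITTENT = "Intermittent"
    PERSISTENT = "Persistent"
    STICKY = "Sticky"


def classify_ce(error_on_reread: bool, error_after_scrub: bool) -> CEClass:
    """Classify a correctable error from two observations.

    error_on_reread:   the error was seen again when the word was re-read
    error_after_scrub: the error was still present after the corrected
                       word was written back (scrubbed)
    """
    if not error_on_reread:
        # Not reproduced on re-read: transient soft error.
        return CEClass.INTERMITTENT
    if not error_after_scrub:
        # Reproduced on re-read, but the scrub fixed it: temporary soft error.
        return CEClass.PERSISTENT
    # Still present after the scrub: stuck-at hard error.
    return CEClass.STICKY
```

In every case the class alone does not justify replacing the DIMM; the soft error rate still has to be examined.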

# cat messages | grep -i memory
May 24 16:07:34 smro97 SUNW,UltraSPARC-III+: [ID 631608 kern.info] [AFT0] errID 0x00055be6.99821550 Corrected Memory Error on /N0/SB3/P2/B0/D2 J15500 is Persistent
May 24 16:07:34 smro97 SUNW,UltraSPARC-III+: [ID 631608 kern.info] [AFT0] errID 0x00055be6.99821550 Corrected Memory Error on /N0/SB3/P2/B0/D2 J15500 is Persistent
May 24 16:12:40 smro97 SUNW,UltraSPARC-III+: [ID 910566 kern.info] [AFT0] errID 0x00055c2d.d4cf4320 Corrected Memory Error on /N0/SB3/P2/B0/D2 J15500 is Sticky
May 24 16:12:40 smro97 SUNW,UltraSPARC-III+: [ID 910566 kern.info] [AFT0] errID 0x00055c2d.d4cf4320 Corrected Memory Error on /N0/SB3/P2/B0/D2 J15500 is Sticky
May 24 16:12:40 smro97 unix: [ID 752700 kern.warning] WARNING: [AFT0] Sticky Softerror encountered on Memory Module /N0/SB3/P2/B0/D2 J15500
May 24 16:12:40 smro97 unix: [ID 752700 kern.warning] WARNING: [AFT0] Sticky Softerror encountered on Memory Module /N0/SB3/P2/B0/D2 J15500

Here is an example of a memory error which does not involve a CPU module. A PCI controller was reading data from memory.

May 17 18:45:01 j2kweb06 unix: WARNING: correctable error from pci0 (upa mid 1f) during dvma read transaction
May 17 18:45:01 j2kweb06 unix: AFSR=40f10000.9f800000 AFAR=00000000.4fbc1e60,
May 17 18:45:01 j2kweb06 <U0402> port id 31. double word offset=4, Memory Module

If this is just a single event, there is nothing to worry about: it was simply a correctable ECC event on a read from memory. The only difference from the "normal" CE events is that this one happened to be detected by the PCI controller (since it was doing the read) instead of a CPU. That is the reason for having ECC protected memory. The system is doing its job and functioning normally.

Uncorrectable Errors (UE): If a UE is detected, the device that read the word and detected the error cannot correct the data and continue. A UE will cause Solaris to panic if the UE is in kernel memory, or to kill the particular user process that contained the memory in error and then issue an orderly shutdown and reboot to protect the other processes in the domain. Either way, whether via panic or shutdown and reboot, the customer is considerably impacted (and will likely call for support).

Memory Scrubber: The Solaris OS runs a memory "scrubber" routine as part of its normal operation. The time interval is 12 hours for scrubbing stale (unused, idle) memory pages. The scrubber does nothing special beyond ensuring that every memory location is accessed at least once every 12 hours. If the access finds a CE, the normal trap to the Solaris OS that occurs for any CE scrubs the affected memory word by writing the corrected word back into memory, and logs the event. This ensures that multiple CEs do not have time to build up and form a UE at memory locations that are infrequently accessed.

Correctable memory errors reported EXACTLY every twelve hours are a result of the memory scrubber (see Infodoc 74049: How often does the memory scrubber run?). The normal rules apply: a DIMM should only be replaced if it meets the criteria described in the Sun DIMM Replacement Policy (Infodoc 79928: Sun Enhanced Memory DIMM Replacement Policy).
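A quick way to spot the scrubber signature is to check whether the CEs logged for a DIMM are spaced twelve hours apart. The following is a minimal sketch under stated assumptions: the function name `looks_like_scrubber` and the one-minute tolerance are hypothetical choices for illustration, and the timestamps are assumed to have been extracted from the messages file already.

```python
from datetime import datetime, timedelta

SCRUB_INTERVAL = timedelta(hours=12)  # interval stated for the memory scrubber


def looks_like_scrubber(timestamps, tolerance=timedelta(minutes=1)):
    """Return True if every gap between consecutive CE timestamps is
    12 hours, within the given tolerance (the tolerance is an assumption)."""
    ordered = sorted(timestamps)
    if len(ordered) < 2:
        return False  # a single event cannot show a 12-hour pattern
    gaps = [later - earlier for earlier, later in zip(ordered, ordered[1:])]
    return all(abs(gap - SCRUB_INTERVAL) <= tolerance for gap in gaps)
```

A True result only says the scrubber found the errors; whether the DIMM should be replaced is still decided by the replacement policy.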

BADWRITERS:

1. Sometimes multiple memory DIMMs within a system can start reporting soft errors. Examining the messages may reveal that the same data bit (or error syndrome) is in error on each DIMM. This indicates that some other component is actually writing the bad data to RAM and consistently creating errors at the same bit address, regardless of the physical DIMM. Recognizing this pattern and troubleshooting further can prevent much wasted downtime and cost, and the replacement of perfectly good memory DIMMs.

2. When a DIMM is replaced and the errors persist, or return with the same data bit in error, some other component in the system is likely causing the memory errors. Again, recognizing this possibility can head off assumptions that replacing memory will solve the problem.

3. In terms of CE/ECC, a system may only reveal errors when the failing address range is utilized by a particular application or combination of applications. This is almost always a hardware fault. In very rare instances, bad code may generate errors that appear to be hardware.

A good first step when troubleshooting a reproducible CE memory issue is to isolate or disable the suspect memory component(s) via asr-disable, setenv disabled-memory-list X or setenv disabled-board-list X (all under OBP), or psradm -f X or cfgadm (under the OS). If disabling the suspect memory components is not possible, it may be advisable (especially on lower-end machines) to swap the suspect DIMM with another DIMM in the same bank. If the problem follows the DIMM, replace it. If the problem persists in the same location, it is not a bad DIMM issue.

Note: FINDAFT is especially useful when diagnosing Bad Writer scenarios. Look for a common CPU (the one implicated more often than the other CPUs) as the possible Bad Writer.
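The pattern in item 1 (the same data bit in error on more than one DIMM) can be checked mechanically once the DIMM name and data bit have been pulled out of each CE message. The sketch below assumes that extraction has already been done; `suspect_bad_writer_bits` is a hypothetical helper, not part of FINDAFT.

```python
from collections import defaultdict


def suspect_bad_writer_bits(events):
    """events: iterable of (dimm, data_bit) pairs taken from CE messages.

    Returns {data_bit: sorted list of DIMMs} for every data bit that was
    reported in error on more than one DIMM, which is the Bad Writer hint.
    """
    dimms_by_bit = defaultdict(set)
    for dimm, bit in events:
        dimms_by_bit[bit].add(dimm)
    return {
        bit: sorted(dimms)
        for bit, dimms in dimms_by_bit.items()
        if len(dimms) > 1
    }
```

A bit that shows up on only one DIMM is not flagged; that case is handled by the normal DIMM replacement rules.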

Sun's Enhanced Sparc/Solaris DIMM Replacement Policy


Note: The rules detailed in this Policy apply to the following architectures: UltraSPARC II, UltraSPARC III, UltraSPARC IV, UltraSPARC IV+ and T1 Systems.

Replace a DIMM when:

1. POST (when run at a level which actually tests memory) fails it.

2. For systems with Predictive Self-Healing (Solaris 10 and later, except on UltraSPARC II-based platforms), when the system tells you to.

3. For all UltraSPARC II-based systems and all other systems without Predictive Self-Healing (Solaris 9 and earlier), whenever Solaris reports a UE or DUE, and investigation shows that the UE or DUE truly originated from memory, and not from a transfer from some CPU's cache, as determined by a qualified Sun Support specialist.

4A. For all UltraSPARC II-based systems and all other systems without Predictive Self-Healing (Solaris 9 and earlier), whenever Solaris reports two or more CEs from two or more different physical addresses on each of two or more different bit positions from the same DIMM within 24 hours of each other, and all the addresses are in the same relative checkword (that is, the AFARs are all the same modulo 64). [Note: This means at least 4 CEs; two from one bit position, with unique addresses, and two from another, also with unique addresses, and the lower 6 bits of all the addresses are the same.]

4B. For all UltraSPARC II-based systems and all other systems without Predictive Self-Healing (Solaris 9 and earlier), whenever Solaris reports two or more CEs from two or more different physical addresses on each of three or more different outputs from the same DRAM within 24 hours of each other, as long as the three outputs do not all correspond to the same relative bit position in their respective checkwords. [Note: This means at least 6 CEs; two from one DRAM output signal, with unique addresses, two from another output from the same DRAM, also with unique addresses, and two more from yet another output from the same DRAM, again with unique addresses, as long as the three outputs do not all correspond to the same relative bit position in their respective checkwords.]

5. For Solaris 8 and 9 systems with page retirement (Solaris 8, patch level 108528-24 or later; Solaris 9, patch level 112233-11 or later), as well as for UltraSPARC II-based systems running Solaris 10 and later, when the system indicates that the page retirement limit of 0.1% of physical memory has been reached and denotes one and only one DIMM as suspect (i.e., it has accumulated 130 or more non-intermittent CEs). If more than one DIMM is marked as suspect, then other possible causes of CEs have to be ruled out by a qualified Sun Support specialist before replacing any DIMMs. [Note: Determining these factors is aided by the CEDIAG diagnostic tool set.] In the unlikely event that the system indicates that the page retirement limit has been reached but no DIMM is marked as suspect, contact a Sun Support specialist for assistance in determining any necessary action.

Example:
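connole 73 =>uname -a
SunOS connole 5.9 Generic_112233-12 sun4u sparc SUNW,Ultra-5_10

(Solaris 9 at patch level 112233-12, which is later than 112233-11, so this system has page retirement.)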

6. For older Solaris releases and patch levels, when Solaris reports more than 24 non-intermittent CEs in 24 hours from a single DIMM. If more than one DIMM has experienced more than 24 non-intermittent CEs in 24 hours, then other possible causes of CEs have to be ruled out by a qualified Sun Support specialist before replacing any DIMMs.

Limitations: Prior to Solaris 10, retired pages are returned to service whenever a system is rebooted, and will be re-retired if and when Solaris encounters CEs from them again. POST may fail a DIMM that contained retired pages; if it does, replace the DIMM at that time.

----------------------------------------------(end of official policy)------------------------------------

Note: Exceptions MAY be made to the Policy in the interest of Customer Satisfaction. Consult with your lead, backline or manager if necessary. When making an exception, always record it in the case notes. Example:

Advised customer of Sun's Enhanced Memory DIMM Replacement Policy and suggested that they employ the cediag utility. Referenced Infodocs 79928 & 82264 which explain more about Sun's Enhanced Memory DIMM Replacement Policy and the recommended CEDIAG utility. Customer declined to follow recommendations and insists upon DIMM replacement.
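Rule 4A is the hardest of the policy rules to apply by eye, because it combines three conditions: the same address modulo 64, at least two bit positions, and at least two distinct addresses per bit position, all inside 24 hours. The sketch below is one reading of that rule in Python, offered as an aid to understanding and not as a replacement for CEDIAG or for a Sun Support specialist's judgment. The function name `meets_rule_4a` and the `(timestamp, afar, bit)` tuple layout are assumptions for illustration, and the caller is assumed to pass CEs from a single DIMM only.

```python
from collections import defaultdict
from datetime import datetime, timedelta

WINDOW = timedelta(hours=24)


def meets_rule_4a(events):
    """events: list of (timestamp, afar, bit) tuples for CEs on ONE DIMM.

    True if, within some 24-hour window, at least two bit positions each
    show CEs at two or more different addresses, and all of those
    addresses are equal modulo 64 (same relative checkword).
    """
    # Group by AFAR modulo 64, i.e. by the lower 6 bits of the address.
    by_residue = defaultdict(list)
    for ts, afar, bit in events:
        by_residue[afar % 64].append((ts, afar, bit))

    for group in by_residue.values():
        group.sort()
        # Try a 24-hour window starting at each event in the group.
        for start, _, _ in group:
            addrs_by_bit = defaultdict(set)
            for ts, afar, bit in group:
                if start <= ts <= start + WINDOW:
                    addrs_by_bit[bit].add(afar)
            qualifying = [b for b, addrs in addrs_by_bit.items() if len(addrs) >= 2]
            if len(qualifying) >= 2:
                return True
    return False
```

The minimum case that satisfies the check is the four-CE example given in the note to rule 4A: two addresses on one bit position and two on another, all with the same lower 6 bits.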

Identifying the correct dimm size / part number

Variables that need to be known:
* dimm size
* dimm type (speed)
* dimm quantity (some dimms are always replaced in pairs, e.g. V440)

Useful utilities to identify dimm size:
* prtdiag -v (full path: /usr/platform/sun4u/sbin/prtdiag -v)
* prtfru -x output (applies to newer machines)
* POST diagnostic output (when available)
* memconf utility (now able to be run against Explorer output)
* showfru ALOM command, displays all FRU info

Depending upon the machine platform, the prtdiag output may report only the total memory installed, the physical bank size, the logical bank size, or the actual dimm size.

Sun Microsystems machine platforms have varying memory layouts.

* Some have ALL the memory dimms installed on a single, common system board (aka motherboard). Examples: most Sun Desktop machines and E250, E450, 280R, V210 & V240
* Some have half of the dimms located on the system board and half on a Memory Riser Board. Specifically: Ultra 80 / Enterprise 420R / Netra t 1400/1405
* Other machines use Mezzanine Memory modules. Specifically: Netra t / ct 400/800 / SPARCengine CP1500
* Some have multiple CPU memory boards, each comprised of CPU modules AND memory dimms. Examples: older machines like E3500/4500/5500/6500 and newer ones like V480, V880 & V440

The examples listed above are by no means all-inclusive. When in doubt, ALWAYS refer to the online Sun System Handbook. There you may also find helpful notes regarding:
* Minimum memory dimm slot population requirements
* Memory dimm / bank installation order
* Whether dimms must be installed as matched pairs
* etc.

E250 Prtdiag output example (partial)


e250-hw 41 =>/usr/platform/sun4u/sbin/prtdiag -v
System Configuration: Sun Microsystems sun4u Sun (TM) Enterprise 250 (2 X UltraSPARC-II 296MHz)
System clock frequency: 99 MHz
Memory size: 1792 Megabytes     (total amount of memory installed in system)

========================= CPUs =========================

                    Run   Ecache   CPU    CPU
Brd  CPU   Module   MHz     MB    Impl.   Mask
---  ---  -------  -----  ------  ------  ----
SYS   0      0      296     2.0   US-II    2.0
SYS   1      1      296     2.0   US-II    2.0

========================= Memory =========================

       Interlv.  Socket   Size
Bank    Group     Name    (MB)  Status
----    -----    ------   ----  ------
  0     none     U0701     64     OK      (64 meg dimm)
  0     none     U0801     64     OK
  0     none     U0901     64     OK
  0     none     U1001     64     OK
  1     none     U0702    128     OK      (128 meg dimm)
  1     none     U0802    128     OK
  1     none     U0902    128     OK
  1     none     U1002    128     OK
  2     none     U0703    128     OK
  2     none     U0803    128     OK
  2     none     U0903    128     OK
  2     none     U1003    128     OK
  3     none     U0704    128     OK
  3     none     U0804    128     OK
  3     none     U0904    128     OK
  3     none     U1004    128     OK

* Each dimm is shown individually
* 4 Banks of memory
* 3 Banks of 128 meg dimms, 1 Bank of 64 meg dimms

E4500 prtdiag (excerpt)


System Configuration: Sun Microsystems sun4u 8-slot Sun Enterprise E4500/E5500
System clock frequency: 100 MHz
Memory size: 12288Mb

========================= CPUs =========================

                    Run   Ecache   CPU    CPU
Brd  CPU   Module   MHz     MB    Impl.   Mask
---  ---  -------  -----  ------  ------  ----
 0     0     0      400     8.0   US-II   10.0
 0     1     1      400     8.0   US-II   10.0
 2     4     0      400     8.0   US-II   10.0
 2     5     1      400     8.0   US-II   10.0
 4     8     0      400     8.0   US-II   10.0
 4     9     1      400     8.0   US-II   10.0
 5    10     0      400     8.0   US-II   10.0

========================= Memory =========================

                                              Intrlv.  Intrlv.
Brd   Bank   MB    Status   Condition  Speed   Factor   With
---  -----  ----  -------  ----------  -----  -------  -------
 0     0    2048   Active      OK       60ns    4-way     A
 0     1    1024   Active      OK       60ns    8-way     B
 2     0    2048   Active      OK       60ns    4-way     A
 2     1    1024   Active      OK       60ns    8-way     B
 4     0    2048   Active      OK       60ns    4-way     A
 4     1    1024   Active      OK       60ns    8-way     B
 5     0    2048   Active      OK       60ns    4-way     A
 5     1    1024   Active      OK       60ns    8-way     B

Board 0 / Bank 0 (8 dimms per bank): 2048 / 8 = 256 meg dimms
Board 0 / Bank 1 (8 dimms per bank): 1024 / 8 = 128 meg dimms
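When prtdiag reports only the bank size, as on the E4500, the dimm size is the bank size divided by the number of dimms in the bank. The short sketch below repeats the arithmetic from the annotations for the two banks on board 0. The helper name `dimm_size_mb` is hypothetical, and the default of 8 dimms per bank is the figure quoted for this platform; other platforms need their own value from the Sun System Handbook.

```python
def dimm_size_mb(bank_mb: int, dimms_per_bank: int = 8) -> int:
    """Return the size of each dimm in a bank, in megabytes."""
    if bank_mb % dimms_per_bank:
        # A bank size that does not divide evenly suggests the wrong
        # dimms-per-bank figure was used for this platform.
        raise ValueError("bank size is not a multiple of dimms per bank")
    return bank_mb // dimms_per_bank


# (board, bank, bank size in MB) rows taken from the E4500 excerpt
banks = [(0, 0, 2048), (0, 1, 1024)]
for board, bank, size in banks:
    print(f"Board {board} / Bank {bank}: {dimm_size_mb(size)} meg dimms")
```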

memconf is a perl script that reports the size of each SIMM/DIMM memory module that is installed in a Sun system. It also reports the system type and any empty memory sockets. In verbose mode, it also reports:
* banner name, model, and CPU/system frequencies
* address range and bank numbers for each module

External url (for customers):
http://www.sunfreeware.com/
http://myweb.cableone.net/4schmidts/memconf.html

Usage: memconf [ -v | -D | -h ] [ explorer_dir ]
    -v            verbose mode
    -D            send results to memconf maintainer
    -h            print help
    explorer_dir  Sun Explorer output directory

Notice that the prtdiag output from this Ultra 10 shows only the TOTAL memory installed, NOT how many dimms or which size.

# prtdiag -v
System Configuration: Sun Microsystems sun4u Sun Ultra 5/10 UPA/PCI (UltraSPARC-IIi 360MHz)
System clock frequency: 90 MHz
Memory size: 1024 Megabytes

========================= CPUs =========================

                    Run   Ecache   CPU    CPU
Brd  CPU   Module   MHz     MB    Impl.   Mask
---  ---  -------  -----  ------  ------  ----
 0     0     0      360     0.2     12     9.1

========================= IO Cards =========================

     Bus#  Freq
Brd  Type  MHz   Slot  Name                              Model
---  ----  ----  ----  --------------------------------  ----------------------
 0   PCI-1  33     1   ebus
 0   PCI-1  33     1   network-SUNW,hme
 0   PCI-1  33     2   SUNW,m64B                         ATY,GT-C
 0   PCI-1  33     3   ide-pci1095,646.1095.646.3

No failures found in System

========================= HW Revisions =========================

ASIC Revisions:
---------------
Cheerio: ebus Rev 1

System PROM revisions:
----------------------
OBP 3.31.0 2001/07/25 20:36 POST 3.1.0 2000/06/27 13:56

Ultra 10 memconf examples

connole 167 =>./memconf
hostname: connole
Sun Ultra 5/10 UPA/PCI (UltraSPARC-IIi 360MHz)
Memory Interleave Factor = 2-way
socket DIMM1 has a 256MB DIMM
socket DIMM2 has a 256MB DIMM
socket DIMM3 has a 256MB DIMM
socket DIMM4 has a 256MB DIMM
empty sockets: None
total memory = 1024MB (1GB)

connole 168 =>./memconf -v     (verbose mode)
memconf: V1.65 13-Feb-2006 http://www.4schmidts.com/unix.html
hostname: connole
banner: Sun Ultra 5/10 UPA/PCI (UltraSPARC-IIi 360MHz)
model: Ultra-5_10
Sun development name: Darwin/Otter (Ultra 5), Darwin/SeaLion (Ultra 10)
Solaris 9 4/04 s9s_u6wos_08a SPARC, 64-bit kernel, SunOS 5.9
1 UltraSPARC-IIi 360MHz cpu, system freq: 90MHz
CPU Units:
========================= CPUs =========================

                    Run   Ecache   CPU    CPU
Brd  CPU   Module   MHz     MB    Impl.   Mask
---  ---  -------  -----  ------  ------  ----
 0     0     0      360     0.2     12     9.1

Memory Units:
Memory Interleave Factor = 2-way
socket DIMM1 has a 256MB DIMM (bank 0L, address 0x00000000-0x0fffffff, 0x20000000-0x2fffffff)
socket DIMM2 has a 256MB DIMM (bank 0H, address 0x00000000-0x0fffffff, 0x20000000-0x2fffffff)
socket DIMM3 has a 256MB DIMM (bank 1L, address 0x10000000-0x1fffffff, 0x30000000-0x3fffffff)
socket DIMM4 has a 256MB DIMM (bank 1H, address 0x10000000-0x1fffffff, 0x30000000-0x3fffffff)
empty sockets: None
total memory = 1024MB (1GB)
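The memconf "socket ... has a ...MB DIMM" lines are regular enough to be read by a script when a case needs a quick per-socket summary. The sketch below parses saved memconf output with a regular expression. It relies only on the line format visible in these examples; the function name `dimm_sizes` is hypothetical, and other memconf versions or platforms may print the lines differently.

```python
import re

# Matches lines such as "socket DIMM1 has a 256MB DIMM", with or without
# the trailing "(bank ..., address ...)" detail that verbose mode adds.
SOCKET_RE = re.compile(r"socket\s+(\S+)\s+has a\s+(\d+)MB DIMM")


def dimm_sizes(memconf_text):
    """Return {socket name: dimm size in MB} from memconf output text."""
    return {m.group(1): int(m.group(2)) for m in SOCKET_RE.finditer(memconf_text)}


SAMPLE = """\
socket DIMM1 has a 256MB DIMM
socket DIMM2 has a 256MB DIMM
socket DIMM3 has a 256MB DIMM
socket DIMM4 has a 256MB DIMM
empty sockets: None
total memory = 1024MB (1GB)
"""

sizes = dimm_sizes(SAMPLE)
print(sizes, "total:", sum(sizes.values()), "MB")
```

Comparing the summed sizes against the "total memory" line is a cheap sanity check that no socket line was missed.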

420R prtdiag

System Configuration: Sun Microsystems sun4u Sun Enterprise 420R (4 X UltraSPARC-II 450MHz)
System clock frequency: 113 MHz
Memory size: 4096 Megabytes     (only TOTAL memory reported)

========================= CPUs =========================

                    Run   Ecache   CPU    CPU
Brd  CPU   Module   MHz     MB    Impl.   Mask
---  ---  -------  -----  ------  ------  ----
 0     0     0      450     4.0   US-II   10.0
 0     1     1      450     4.0   US-II   10.0
 0     2     2      450     4.0   US-II   10.0
 0     3     3      450     4.0   US-II   10.0

========================= IO Cards =========================

     Bus   Freq
Brd  Type  MHz   Slot  Name                              Model
---  ----  ----  ----  --------------------------------  ----------------------
 0   PCI    33     0   SUNW,qfe-pci108e,1001             SUNW,pci-qfe
 0   PCI    33     1   network-SUNW,hme
 0   PCI    33     1   SUNW,qfe-pci108e,1001             SUNW,pci-qfe
 0   PCI    33     2   fibre-channel-pci10df,f800.10df.+
 0   PCI    33     2   SUNW,qfe-pci108e,1001             SUNW,pci-qfe
 0   PCI    33     3   scsi-glm/disk (block)             Symbios,53C875
 0   PCI    33     3   scsi-glm/disk (block)             Symbios,53C875
 0   PCI    33     3   SUNW,qfe-pci108e,1001             SUNW,pci-qfe
 0   PCI    33     4   fibre-channel-pci10df,f800.10df.+

========================= HW Revisions =========================

ASIC Revisions:
---------------
PCI: pci Rev 4
PCI: pci Rev 4
Cheerio: ebus Rev 1

System PROM revisions:
----------------------
OBP 3.31.0 2001/07/25 20:35 POST 1.2.8 2000/08/22 19:50

connole 170 =>memconf /home/mbuckley/Explorers/64834462_420R/explorer.80e8b7a9.njocsprd2-2005.12.05.17.04     (individual dimm size reported)
hostname: njocsprd2
Sun Explorer directory: /home/mbuckley/Explorers/64834462_420R/explorer.80e8b7a9.njocsprd2-2005.12.05.17.04
Sun Enterprise 420R (4 X UltraSPARC-II 450MHz)
socket U0301 has a 256MB DIMM
socket U0302 has a 256MB DIMM
socket U1301 has a 256MB DIMM
socket U1302 has a 256MB DIMM
socket U0401 has a 256MB DIMM
socket U0402 has a 256MB DIMM
socket U1401 has a 256MB DIMM
socket U1402 has a 256MB DIMM
socket U0303 has a 256MB DIMM
socket U0304 has a 256MB DIMM
socket U1303 has a 256MB DIMM
socket U1304 has a 256MB DIMM
socket U0403 has a 256MB DIMM
socket U0404 has a 256MB DIMM
socket U1403 has a 256MB DIMM
socket U1404 has a 256MB DIMM
empty sockets: None
total memory = 4096MB (4GB)
WARNING: Layout of memory sockets not completely recognized on this system.
The memory configuration displayed should be correct though since this is a fully stuffed system.
This is a known bug due to Sun's 'prtconf', 'prtdiag' and 'prtfru' commands not providing enough detail for the memory layout of this SunOS 5.8 SUNW,Ultra-80 system to be accurately determined.
This is a bug in Sun's OBP, not a bug in memconf. The latest release (OBP 3.33.0 2003/10/07) still has this bug.
This system is using OBP 3.31.0 2001/07/25 20:35

V880 POST output (excerpt)


Probing Memory............
Probing CPU0 memory configuration
NGDIMM#0 part# 501-5030-03 serial# 235446, 256MB + 256MB, SC#0     (512 meg dimm)
NGDIMM#1 part# 501-5030-03 serial# 235457, 256MB + 256MB, SC#0
NGDIMM#2 part# 501-5030-03 serial# 241586, 256MB + 256MB, SC#0
NGDIMM#3 part# 501-5030-03 serial# 241589, 256MB + 256MB, SC#0
NGDIMM#4 part# 501-5030-03 serial# 241581, 256MB + 256MB, SC#0
NGDIMM#5 part# 501-5030-03 serial# 241579, 256MB + 256MB, SC#0
NGDIMM#6 part# 501-5030-03 serial# 241573, 256MB + 256MB, SC#0
NGDIMM#7 part# 501-5030-03 serial# 241577, 256MB + 256MB, SC#0
Probing CPU1 memory configuration
NGDIMM#0 part# 501-5030-03 serial# 241516, 256MB + 256MB, SC#0
NGDIMM#1 part# 501-5030-03 serial# 241522, 256MB + 256MB, SC#0
NGDIMM#2 part# 501-5030-03 serial# 241601, 256MB + 256MB, SC#0
NGDIMM#3 part# 501-5030-03 serial# 241507, 256MB + 256MB, SC#0
NGDIMM#4 part# 501-5030-03 serial# 243281, 256MB + 256MB, SC#0
NGDIMM#5 part# 501-5030-03 serial# 243486, 256MB + 256MB, SC#0
NGDIMM#6 part# 501-5030-03 serial# 241594, 256MB + 256MB, SC#0
NGDIMM#7 part# 501-5030-03 serial# 241588, 256MB + 256MB, SC#0

SubTool output:
Part#: 501-5030
Desc: FRU,ASSY,SDRAM,DIMM,512MB
Category: Boards
Is a FRU but has no substitutable parts.

Prtfru -x output from V880:

<Location name="dimm-slot?Label=J8001">
  <Container name="dimm-module">
    <ContainerData>
      <Segment name="SD">
        <ManR>
          <UNIX_Timestamp32 value="Mon Mar 3 19:39:25 MST 2003"/>
          <Fru_Description value="256 MB NG SDRAM DIMM"/>
          <Manufacture_Loc value="ONYANG,KOREA"/>
          <Sun_Part_No value="5015401"/>
          <Sun_Serial_No value="A4663A"/>
          <Vendor_Name value="Samsung"/>
          <Initial_HW_Dash_Level value="03"/>
          <Initial_HW_Rev_Level value="50"/>
          <Fru_Shortname value="DIMM"/>
        </ManR>
        <Fru_Type value="256 MB DIMM"/>
        <DIMM_R>
          <DIMM_Speed value="75"/>
          <DIMM_Size value="256"/>
        </DIMM_R>
      </Segment>
    </ContainerData>
  </Container> <!-- dimm-module -->
</Location> <!-- dimm-slot?Label=J8001 -->

SubTool output:
Part#: 501-5401
Desc: FRU,ASSY,SDRAM,DIMM,256MB,18X8MX16
Category: Boards
Is a FRU but has no substitutable parts.
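The Sun part number and dimm size can be lifted out of a prtfru -x dimm-slot fragment with a standard XML parser. The sketch below uses a trimmed copy of the V880 fragment as input. It is an illustration only: the helper name `dimm_info` is hypothetical, it handles a single Location fragment and not a complete prtfru -x file, and the 3+4 digit split of the part number is inferred from this one example (5015401 in the FRU data, 501-5401 in SubTool).

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the V880 dimm-slot fragment from prtfru -x.
SAMPLE = """<Location name="dimm-slot?Label=J8001">
  <Container name="dimm-module">
    <ContainerData>
      <Segment name="SD">
        <ManR>
          <Fru_Description value="256 MB NG SDRAM DIMM"/>
          <Sun_Part_No value="5015401"/>
        </ManR>
        <DIMM_R>
          <DIMM_Size value="256"/>
        </DIMM_R>
      </Segment>
    </ContainerData>
  </Container>
</Location>"""


def dimm_info(location_xml):
    """Return slot name, dashed part number and size for one dimm slot."""
    root = ET.fromstring(location_xml)
    part = root.find(".//Sun_Part_No").get("value")
    size = root.find(".//DIMM_Size").get("value")
    return {
        "slot": root.get("name"),
        # Assumption: the first three digits form the part number prefix.
        "part_no": f"{part[:3]}-{part[3:]}",
        "size_mb": int(size),
    }


print(dimm_info(SAMPLE))
```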

Showfru is a command-line script that summarizes prtfru -x output. It is available online from: http://pts-appl-z1.holland/showfru.html

The Showfru script aims to provide a concise summary of FRU data from a prtfru -x output. This allows quick identification of the FRUs installed; depending on the platform, additional information is available.

From the command line, showfru needs to be run on Solaris 10 FCS or later, where the XML perl modules are installed by default. If you are stuck on a Solaris 9 Sun Ray, use the online version at the URL above.

NOTE: Please link to the script rather than taking a private copy.

Latest version 0.74: /net/cores.uk/export/hotline/hotlocal/bin/showfru
Report bugs and RFEs, or send questions, to doug.baker@sun.com
Further details and example outputs: http://pts-platform/twiki/bin/view/Tools/ToolPageShowfru
Non-RoHS example: http://gmpweb.uk/~db124859/showfru/v240_mixed_dimm_sizes.html
More on RoHS: http://sunsolve2.central.sun.com/handbook_internal/Systems/commondocs/RoHS_Communication.html#meaning

$ /net/cores.uk/export/hotline/hotlocal/bin/showfru prtfru_-x.out
################################################################################
FRU part and serial number info, use -v for install date and vendor
################################################################################
MB            MOTHERBOARD   375-3346 RoHS   H00ORF
PS0           PS            300-1846 RoHS   005530
IFB           CHASSIS       371-0796 RoHS   E2JB13
PS1           PS            300-1846 RoHS   005529
MB.P0.B0.D0   1 GB
MB.P0.B0.D1   1 GB
MB.P0.B1.D0   1 GB
MB.P0.B1.D1   1 GB

################################################################################
SPD DIMM info - FRU, vendor name, vendor part and serial number
################################################################################
MB.P0.B0.D0   Infineon (formerly Siemens)   72D128320GBR6C   0403E910
MB.P0.B0.D1   Infineon (formerly Siemens)   72D128320GBR6C   0403EA10
MB.P0.B1.D0   Infineon (formerly Siemens)   72D128320GBR6C   0403EA12
MB.P0.B1.D1   Infineon (formerly Siemens)   72D128320GBR6C   0409FD27

sc> showfru
FRU_PROM at PS0.SEEPROM
  Manufacturer Record
  Timestamp: TUE JUL 01 19:53:52 UTC 2003
  Description: P/S,SSI MPS,680W,HOT PLUG
  Manufacture Location: DELTA ELECTRONICS THAILAND
  Sun Part No: 3001501
  Sun Serial No: T00541
  Vendor: Delta Electronics
  Initial HW Dash Level: 06
  Initial HW Rev Level: 50
  Shortname: A42_PSU
FRU_PROM at C0.P0.B0.D0.SEEPROM
  Timestamp: MON JUN 02 12:00:00 UTC 2003
  Description: SDRAM DDR, 512 MB
  Manufacture Location:
  Vendor: Samsung
  Vendor Part No: M3 12L6420DT0-CA2
FRU_PROM at C0.P0.B0.D1.SEEPROM
  Timestamp: MON JUN 02 12:00:00 UTC 2003
  Description: SDRAM DDR, 512 MB
  Manufacture Location:
  Vendor: Samsung
  Vendor Part No: M3 12L6420DT0-CA2

The Findaft script aims to provide a concise summary of AFT, CPU and PCI ECC errors found in the Solaris Operating System /var/adm/messages files. This summary can then be used to assist in diagnosing a customer's hardware fault.

Note: Findaft is Sun Internal only and cannot be sent to customers.

* Provides a concise summary of all CPU/Memory/PCI/ECC errors found in the messages. (Makes an ideal case note or start point for an SGR template.)
* Assists with identification of memory UE errors.
* Features highlighting of E-Cache events.
* Directs TSEs towards Best Practices, including when to do "nothing".
* Features highlighting of Datapath faults.
* Helps to identify the true, root cause of errors.
* Helps to prevent mis-diagnosis which could result in "wrong" parts being replaced.

Findaft is a standalone perl script. The latest version is runnable from here:
/net/cores.uk/export/hotline/hotlocal/bin/findaft
(always use latest available versions of tools)

Or downloadable from here:
http://gmpweb.uk/~db124859/findaft/

Findaft is always a good starting step to troubleshooting and diagnosing memory issues. Read the docs that findaft suggests; these will usually assist diagnosis.

Reference: Infodoc 80270: "Findaft an AFT, CPU, Memory and PCI ECC error message summary script"
http://pts-platform/twiki/bin/view/Tools/ToolPageFindaft

An alias is available to provide tool support: findaft-interest@sun.com

May 28 22:43:55 cht1ds004 SUNW,UltraSPARC-II: [ID 862595 kern.info] [AFT0] Corrected Memory Error detected by CPU0, errID 0x0015bd43.7f9abdc2
May 28 22:43:55 cht1ds004     AFSR 0x00000000.00100000<CE> AFAR 0x00000000.55955a30
May 28 22:43:55 cht1ds004     AFSR.PSYND 0x0000(Score 05) AFSR.ETS 0x00 Fault_PC 0xff07b2b4
May 28 22:43:55 cht1ds004     UDBH Syndrome 0x58 Memory Module U0302
May 28 22:43:55 cht1ds004 SUNW,UltraSPARC-II: [ID 339206 kern.info] [AFT0] errID 0x0015bd43.7f9abdc2 Corrected Memory Error on U0302 is Intermittent
May 28 22:43:55 cht1ds004 SUNW,UltraSPARC-II: [ID 368593 kern.info] [AFT0] errID 0x0015bd43.7f9abdc2 ECC Data Bit 31 was in error and corrected

May 28 22:43:55 cht1ds004 SUNW,UltraSPARC-II: [ID 748639 kern.info] [AFT0] Corrected Memory Error detected by CPU0, errID 0x0015bd43.7fa13b30
May 28 22:43:55 cht1ds004     AFSR 0x00000000.00100000<CE> AFAR 0x00000000.55955a30
May 28 22:43:55 cht1ds004     AFSR.PSYND 0x0000(Score 05) AFSR.ETS 0x00 Fault_PC 0xff07b2ac
May 28 22:43:55 cht1ds004     UDBH Syndrome 0x58 Memory Module U0302
May 28 22:43:55 cht1ds004 SUNW,UltraSPARC-II: [ID 233778 kern.info] [AFT0] errID 0x0015bd43.7fa13b30 Corrected Memory Error on U0302 is Intermittent
May 28 22:43:55 cht1ds004 SUNW,UltraSPARC-II: [ID 315879 kern.info] [AFT0] errID 0x0015bd43.7fa13b30 ECC Data Bit 31 was in error and corrected

May 28 22:43:55 cht1ds004 SUNW,UltraSPARC-II: [ID 712106 kern.info] [AFT0] Corrected Memory Error detected by CPU0, errID 0x0015bd43.7fa59597
May 28 22:43:55 cht1ds004     AFSR 0x00000000.00100000<CE> AFAR 0x00000000.55955a30
May 28 22:43:55 cht1ds004     AFSR.PSYND 0x0000(Score 05) AFSR.ETS 0x00 Fault_PC 0xff07b2b4
May 28 22:43:55 cht1ds004     UDBH Syndrome 0x58 Memory Module U0302
May 28 22:43:55 cht1ds004 SUNW,UltraSPARC-II: [ID 957346 kern.info] [AFT0] errID 0x0015bd43.7fa59597 Corrected Memory Error on U0302 is Intermittent
May 28 22:43:55 cht1ds004 SUNW,UltraSPARC-II: [ID 356654 kern.info] [AFT0] errID 0x0015bd43.7fa59597 ECC Data Bit 31 was in error and corrected

May 28 22:43:55 cht1ds004 SUNW,UltraSPARC-II: [ID 585299 kern.info] [AFT0] Corrected Memory Error detected by CPU0, errID 0x0015bd43.7fa993cf
May 28 22:43:55 cht1ds004     AFSR 0x00000000.00100000<CE> AFAR 0x00000000.55955a30
May 28 22:43:55 cht1ds004     AFSR.PSYND 0x0000(Score 05) AFSR.ETS 0x00 Fault_PC 0xff07b2a8
May 28 22:43:55 cht1ds004     UDBH Syndrome 0x58 Memory Module U0302
May 28 22:43:55 cht1ds004 SUNW,UltraSPARC-II: [ID 342081 kern.info] [AFT0] errID 0x0015bd43.7fa993cf Corrected Memory Error on U0302 is Persistent
May 28 22:43:55 cht1ds004 SUNW,UltraSPARC-II: [ID 499012 kern.info] [AFT0] errID 0x0015bd43.7fa993cf ECC Data Bit 31 was in error and corrected
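Raw AFT messages like these are easier to reason about once they are tallied by date, memory module and CE class. The sketch below shows the idea in Python on the four classification lines from this excerpt. It is not Findaft and makes no claim about how Findaft works internally; the names `CE_RE`, `tally` and `LOG` are hypothetical, and the pattern relies only on the "Corrected Memory Error on ... is ..." wording shown in these messages.

```python
import re
from collections import Counter

# Captures the date (e.g. "May 28"), the memory module and the CE class.
CE_RE = re.compile(
    r"^(\w{3}\s+\d+) .*Corrected Memory Error on (.+?) is "
    r"(Intermittent|Persistent|Sticky)"
)


def tally(lines):
    """Count CE classification messages per (date, module, class)."""
    counts = Counter()
    for line in lines:
        match = CE_RE.search(line)
        if match:
            counts[match.groups()] += 1
    return counts


LOG = """\
May 28 22:43:55 cht1ds004 SUNW,UltraSPARC-II: [ID 339206 kern.info] [AFT0] errID 0x0015bd43.7f9abdc2 Corrected Memory Error on U0302 is Intermittent
May 28 22:43:55 cht1ds004 SUNW,UltraSPARC-II: [ID 233778 kern.info] [AFT0] errID 0x0015bd43.7fa13b30 Corrected Memory Error on U0302 is Intermittent
May 28 22:43:55 cht1ds004 SUNW,UltraSPARC-II: [ID 957346 kern.info] [AFT0] errID 0x0015bd43.7fa59597 Corrected Memory Error on U0302 is Intermittent
May 28 22:43:55 cht1ds004 SUNW,UltraSPARC-II: [ID 342081 kern.info] [AFT0] errID 0x0015bd43.7fa993cf Corrected Memory Error on U0302 is Persistent
""".splitlines()

for (date, module, kind), count in sorted(tally(LOG).items()):
    print(f"{count:3d} {date} {module} is {kind}")
```

Lines that do not carry a classification (the "detected by CPU0" and "ECC Data Bit" lines) are skipped, so each CE is counted once.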


# /net/cores.uk/export/hotline/hotlocal/bin/findaft /home/mbuckley/Explorers/65041327_420R/explorer.80e93f26.cht1ds004-2006.06.02.05.22/messages/messages
################################################################################
This script looks for Hardware errors including all AFT and pci ECC events
Written for 108528-16/112233-01 or above. Some tests may fail on other revisions
Report bugs,RFEs or if you have questions email findaft-interest@sun.com
Version 2.00 homepage http://pts-platform/twiki/bin/view/Tools/ToolPageFindaft
Or runnable from /net/cores.uk/export/hotline/hotlocal/bin/findaft
Infodoc 80270 Findaft an AFT CPU Memory and PCI ECC error message summary script
################################################################################
Input file /home/mbuckley/Explorers/65041327_420R/explorer.80e93f26.cht1ds004-2006.06.02.05.22/messages/messages is 0.1 MB
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Syndrome errors CE and UE errors are included
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
10 Syndrome 0x58 Memory Module U0302
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Other AFT Events
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
5 [AFT0] Corrected Memory Error detected by CPU0,
1 [AFT0] Corrected Memory Error detected by CPU1,
2 [AFT0] Corrected Memory Error detected by CPU2,
2 [AFT0] Corrected Memory Error detected by CPU3,
3 [AFT0] Sticky Softerror encountered on Memory Module U0302
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Main Memory Correctable ECC events sorted by date
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
3 May 28 U0302 is Intermittent
1 May 28 U0302 is Persistent
3 May 30 U0302 is Sticky
2 May 31 U0302 is Persistent
1 Jun 01 U0302 is Persistent
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Panics, Reboots, Fatal errors etc
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
1 Jun 01 cht1ds004 SunOS Release 5.8 Version Generic_117000-03 64-bit
###############################################################################
Correctable memory errors found, use cediag to determine if a DIMM needs to be replaced,
see Infodoc 83216 for examples of the cediag rule failure messages
Infodoc 79928: Sun Enhanced Memory DIMM Replacement Policy
################################################################################
cediag -e explorer_directory/
cediag -c SunOS,cht1ds004,5.8,sparc -k 117000-03 -u 2 /home/mbuckley/Explorers/65041327_420R/explorer.80e93f26.cht1ds004-2006.06.02.05.22/messages/messages
################################################################################
Start of Ultrasparc II CE specific checks
Unique Simms total 1
################################################################################
U0302
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Unique Syndromes total 1
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
CE Event Syndrome 0x58 Data Bit 31
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
USII CE Event type reported by each CPU
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Reporting CPU   Intermittent   Persistent   Sticky
CPU0            3              2            0
CPU1            0              0            1        <<
CPU2            0              1            1        <<
CPU3            0              1            1        <<
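A per-CPU breakdown like the one at the end of the findaft output can be approximated by joining each "detected by CPUn" message to the later classification message that carries the same errID. The sketch below assumes that pairing and that the detection line appears first, as it does in the log excerpt earlier in this deck. It is not findaft's actual implementation, and `ce_types_by_cpu` is a hypothetical name.

```python
import re
from collections import Counter, defaultdict

# Assumed patterns, based on the AFT message wording shown in this deck.
DETECTED = re.compile(r"Corrected Memory Error detected by (CPU\d+), errID (\S+)")
CLASSIFIED = re.compile(r"errID (\S+) Corrected Memory Error on \S+ is (\w+)")


def ce_types_by_cpu(lines):
    """Return {cpu: Counter({classification: count})} joined on errID."""
    cpu_of_errid = {}
    table = defaultdict(Counter)
    for line in lines:
        detected = DETECTED.search(line)
        if detected:
            cpu, errid = detected.groups()
            cpu_of_errid[errid] = cpu
            continue
        classified = CLASSIFIED.search(line)
        if classified:
            errid, kind = classified.groups()
            cpu = cpu_of_errid.get(errid)
            if cpu is not None:  # skip events whose detection line is missing
                table[cpu][kind] += 1
    return dict(table)


# Lines from the May 28 excerpt (syslog prefix trimmed). The first errID has
# no detection line in the excerpt, so it is left out of the table.
sample = [
    "[AFT0] errID 0x0015bd43.7f9abdc2 Corrected Memory Error on U0302 is Intermittent",
    "[AFT0] Corrected Memory Error detected by CPU0, errID 0x0015bd43.7fa13b30",
    "[AFT0] errID 0x0015bd43.7fa13b30 Corrected Memory Error on U0302 is Intermittent",
    "[AFT0] Corrected Memory Error detected by CPU0, errID 0x0015bd43.7fa59597",
    "[AFT0] errID 0x0015bd43.7fa59597 Corrected Memory Error on U0302 is Intermittent",
    "[AFT0] Corrected Memory Error detected by CPU0, errID 0x0015bd43.7fa993cf",
    "[AFT0] errID 0x0015bd43.7fa993cf Corrected Memory Error on U0302 is Persistent",
]

for cpu, kinds in sorted(ce_types_by_cpu(sample).items()):
    print(cpu, dict(kinds))
```

Events whose detection line falls outside the input are skipped rather than guessed, so a truncated messages file undercounts instead of misattributing.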

CEDIAG is a memory error analysis tool consisting of shell scripts and a few binary executables. It currently runs on Solaris SPARC architectures only.
Reference: http://onestop/qco/dimm/tools/cediag.shtml
Internally runnable from: /net/cores.uk/export/hotline/hotlocal/bin/cediag
Usage:
# cediag -e unpacked_explorer_dir
cediag may also be run in verbose mode to gather additional information (such as the total number of memory pages retired).
Syntax example:
# /net/cores.uk/export/hotline/hotlocal/bin/cediag -v -e /explorer-directory
Customers may download cediag from http://sunsolve.sun.com (Diagnostic Tools): Memory DIMM Replacement Management Tool (download and install cediag 1.2.1).

# /net/cores.uk/export/hotline/hotlocal/bin/cediag -v -e /home/mbuckley/Explorers/65041327_420R/explorer.80e93f26.cht1ds004-2006.06.02.05.22/
cediag: Revision: 1.78 @ 2005/02/11 15:54:29 UTC
cediag: info: cediag directory: /net/cores.uk/export/hotline/hotlocal/bin
cediag: info: Explorer directory: /home/mbuckley/Explorers/65041327_420R/explorer.80e93f26.cht1ds004-2006.06.02.05.22/
cediag: info: UltraSPARC Version: 2 (2)
cediag: info: OS Type: SunOS
cediag: info: OS Version: 5.8
cediag: info: Hostname: cht1ds004
cediag: info: Memory size: 524032 (8KB pages)
cediag: info: MPR (deduced) PRL pages: 497 (8KB pages)
cediag: info: MPR-capable OS: true
cediag: info: KJP: 117000-03
cediag: info: MPR-aware kernel in-use: true
cediag: info: MPR enabled: true
cediag: info: MPR disabled in /etc/system: false
cediag: info: MPR force mode: n/a
cediag: info: MPR state: active
cediag: info: Rule#3 check: true
cediag: info: Rule#4 check: true
cediag: info: Rule#5 check: true
cediag: info: Rule#5 check via cestat: false
cediag: info: Rule#6 check: false
cediag: #### CE Summary prior to reboot at Jun 1 14:55:33 ###################
cediag: info: DIMM U0302 had 10 CE(s)
cediag: info: DIMM U0302 had 7 non-intermittent CE(s)
cediag: info: DIMM U0302 @ Data Bit 31 had 10 CE(s)
cediag: info: DIMM U0302 @ Data Bit 31 @ AFAR%64=48 had 10 CE(s) across 1 AFARs

(MPR = Memory Page Retirement)

cediag: info: messages files: 1 pages scheduled for retirement
cediag: info: messages files: 1 pages successfully retired
cediag: info: messages files: 0 pages scheduled for clearing
cediag: info: messages files: 0 pages successfully cleared
cediag: info: PRL deduced status: PRL reached = false
cediag: findings: 0 datapath fault message(s) found
cediag: findings: 0 UE(s) found - there is no rule#3 match
cediag: findings: 0 DIMMs with a failure pattern matching rule#4
cediag: findings: 0 DIMMs with a failure pattern matching rule#5
cediag: #### CE Summary since last detected reboot ###########################
cediag: #### last detected reboot at Jun 1 14:55:33 #########################
cediag: info: messages files: 0 pages scheduled for retirement
cediag: info: messages files: 0 pages successfully retired
cediag: info: messages files: 0 pages scheduled for clearing
cediag: info: messages files: 0 pages successfully cleared
cediag: info: PRL deduced status: PRL reached = false
cediag: findings: 0 datapath fault message(s) found
cediag: findings: 0 UE(s) found - there is no rule#3 match
cediag: findings: 0 DIMMs with a failure pattern matching rule#4
cediag: findings: 0 DIMMs with a failure pattern matching rule#5
#
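The "AFAR%64=48" figure in the cediag summary can be checked by hand against the AFAR value logged in the messages file. The sketch below assumes the logged form `0x00000000.55955a30` is the upper and lower 32-bit words of one 64-bit address separated by a dot. The helper name `parse_aft_register` is hypothetical.

```python
def parse_aft_register(text):
    """Combine a 'HIGH.LOW' pair of 32-bit hex words into one integer.

    Assumption: the dot separates the upper and lower 32-bit words,
    e.g. '0x00000000.55955a30' -> 0x0000000055955a30.
    """
    high, low = text.split(".")
    return (int(high, 16) << 32) | int(low, 16)


afar = parse_aft_register("0x00000000.55955a30")  # AFAR from the log excerpt
print(hex(afar), afar % 64)
```

The low six bits of 0x55955a30 are 0x30, which is 48 decimal and agrees with the value cediag reports.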


Example cediag messages when a single DIMM needs to be replaced.


Rule 4 can identify UE DIMMs before they cause an outage.
cediag: findings: 1 DIMMs with a failure pattern matching rule#4
cediag: findings: DIMM 'Slot A: J8101' matched rule#4 failure pattern
cediag: advice:HIGH: replace DIMM 'Slot A: J8101' [A]s [S]oon [A]s [P]ossible

Rule 5 failures are low risk and should not cause an outage.
cediag: findings: 1 DIMMs with a failure pattern matching rule#5
cediag: findings: DIMM 'Slot B: J3101' matched rule#5 failure pattern
cediag: advice:MEDIUM: replace DIMM 'Slot B: J3101' during next maintenance period

Rule 6 applies when Solaris is not patched to a level that provides MPR; it is low risk.
cediag: findings: 1 DIMMs with a failure pattern matching rule#6
cediag: findings: DIMM 'Slot C: J8200' matched rule#6 (24 in 24) failure pattern
cediag: advice:MEDIUM: replace DIMM 'Slot C: J8200' during next maintenance period


Example cediag messages for the more complex faults.

Uncorrectable errors (UEs) are often seen as a result of single-DIMM Rule 4 failures.
cediag: findings: 1 UE(s) found - potential rule#3 match
cediag: advice:HIGH: refer UE(s) to Sun Support [A]s [S]oon [A]s [P]ossible

Datapath fault: see Infodocs 70134 and 80288 for diagnosis of bad writers and datapath faults from Solaris messages.
cediag: findings: 4 datapath fault message(s) found
cediag: findings: 8 DIMM(s) having CEs with Esynd of 0x0010 found
cediag: advice:HIGH: possible datapath fault - refer to Sun Support ASAP

Whenever more than one DIMM fails rules 4, 5, or 6, you will get the following message. Make sure you really do have multiple failures before replacing any DIMMs.
cediag: advice:MEDIUM: consult Sun Support to rule out other causes of CEs before replacing any DIMMs
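When several cediag runs need to be triaged, the advice lines shown in these examples can be pulled out mechanically. The sketch below is an illustration only; the regular expressions are assumptions based on the sample messages in this deck, and `parse_advice` is a hypothetical helper, not part of cediag.

```python
import re

# Assumed patterns, based on the example advice lines in this deck.
ADVICE = re.compile(r"^cediag: advice:([A-Z]+): (.*)$")
REPLACE_DIMM = re.compile(r"replace DIMM '([^']+)'")


def parse_advice(lines):
    """Return (severity, dimm_or_None, advice_text) for each advice line."""
    results = []
    for line in lines:
        match = ADVICE.match(line.strip())
        if not match:
            continue
        severity, text = match.groups()
        dimm = REPLACE_DIMM.search(text)
        results.append((severity, dimm.group(1) if dimm else None, text))
    return results


sample = [
    "cediag: findings: DIMM 'Slot A: J8101' matched rule#4 failure pattern",
    "cediag: advice:HIGH: replace DIMM 'Slot A: J8101' [A]s [S]oon [A]s [P]ossible",
    "cediag: advice:MEDIUM: replace DIMM 'Slot B: J3101' during next maintenance period",
    "cediag: advice:HIGH: possible datapath fault - refer to Sun Support ASAP",
]

for severity, dimm, text in parse_advice(sample):
    print(severity, dimm, text, sep=" | ")
```

Advice that does not name a DIMM, such as the datapath fault message, is returned with `None` in the DIMM position so it is not mistaken for a replacement instruction.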


FIND_UE Utility:
FindUE is a command-line syndrome decoder. It is used to:
Identify those UE errors where a single DIMM from a memory bank can be reliably identified as the cause of the fault, or at least narrow down the number of suspect DIMMs. (Enhanced algorithms are being implemented to reduce the number of suspect components.)
Identify those UEs that are likely to have been caused by FCO A0258.
Field Change Order A0258-1: Mitsubishi 256MB DIMMs (Sun p/n 501-5658) showing significantly lower than expected reliability.
Info available at: http://pts-platform/twiki/bin/view/Tools/ToolPageFindUE
An alias list is available for tool support: findue-interest@sun.com
Usage: /net/cores.uk/export/hotline/hotlocal/bin/findUE messages

###################################################################
FindUE was written to assist in ECC syndrome history analysis, the script
will understand Solaris messages, console logs, msgbuf, showlogs and wfail outputs
Supported systems include USIII and USIV systems the E3000-6500s but not USIIIi.
Version 1.33 from /net/cores.uk/export/hotline/hotlocal/bin/findUE
Infodocs 75538 and 74624 have further details.
If you find bugs in the script email doug.baker@sun.com
for syndrome decode bugs benoit.baguette@sun.com
##################################################################


Infodoc 80346: Using the fin954 script to diagnose main memory versus L2SRAM errors
The fin954 script was written by Mike Arnott in 2003. The aim was to automate the diagnosis of main memory versus L2SRAM errors on UltraSPARC III systems using the errors found in a Solaris messages file.
The fin954 script implements the rules described in FIN I0954-1. These rules apply to all USIII and USIV systems, including the VSP systems, SunBlade 1000/2000, 280R, Netra 20, 480/490 and the 880/890, but not the USIIIi-based systems.
The latest version of fin954 is available from http://fde.aus/tools/fin954. It is a Perl script and needs to be run from the command line.
Further information is available from:
FIN I0760-2: Sun Enhanced Memory DIMM Replacement Policy
Infodoc 52427: L2SRAM/DIMM Misdiagnosis Issues
Infodoc 75538: Sun Fire[TM] Server: Using ECC Syndrome History to Troubleshoot Uncorrectable Errors (UE) in Memory

When to use fin954:


fin954 is a special-purpose diagnosis script and should not be used as an initial scan of messages. If you need a general summary of all AFT events found in messages, use findaft. When cediag finds UE errors but cannot identify a single faulty DIMM, fin954 can be used to help diagnose the faulty FRU.

Example:
$ fin954 sample.4.messages.txt
============================================================================
Findings: from analysing sample.4.messages.txt
Total "Events" logged: 167
Total *significant* "Events" logged: 28
Total insignificant "Events" logged: 138
.
FRU "SB10/P1/B1 J14301 J14401 J14501 J14601" implicated as error source 22 times.
FRU "SB10/P1/B1/D0 J14301" implicated as error source 6 times.
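One way to read findings like these is to list every implicated FRU with its count and then look for DIMMs that appear in all of them. The sketch below is a reading aid only, not fin954's own rule set. It assumes that tokens beginning with "J" inside the quoted FRU string are DIMM slot identifiers, which is inferred from this sample output, and the function names are hypothetical.

```python
import re

# Assumed pattern, based on the two FRU lines in the sample output above.
FRU_LINE = re.compile(r'FRU "([^"]+)" implicated as error source (\d+) times')


def implicated_frus(lines):
    """Return [(fru_description, count)] in the order the lines appear."""
    found = []
    for line in lines:
        for name, count in FRU_LINE.findall(line):
            found.append((name, int(count)))
    return found


def common_dimms(frus):
    """DIMM identifiers (assumed: tokens starting with 'J') named by every FRU."""
    dimm_sets = [
        {token for token in name.split() if token.startswith("J")}
        for name, _ in frus
    ]
    return set.intersection(*dimm_sets) if dimm_sets else set()


sample = [
    'FRU "SB10/P1/B1 J14301 J14401 J14501 J14601" implicated as error source 22 times.',
    'FRU "SB10/P1/B1/D0 J14301" implicated as error source 6 times.',
]

frus = implicated_frus(sample)
print(frus)
print(sorted(common_dimms(frus)))
```

In this sample the only DIMM named by both the bank-level and the single-DIMM finding is J14301. Treat that as a lead to confirm against the fin954 rules, not as a verdict.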

fin954 is a standalone script. The latest version is runnable from: /net/cores.uk/export/hotline/hotlocal/bin/fin954

Dimm PLL chip Issues:


Sun has determined that a limited subset of memory DIMMs shipped in 2001 and 2002 (less than one percent of the installed base) may begin to show reduced reliability after approximately two years of operation.
This reliability issue manifests itself as UEs (Uncorrectable Errors), sometimes accompanied by CEs (Correctable Errors), originating from the DIMMs.
The reliability of these DIMMs is normal for approximately the first two years of use, after which it may start to degrade below the expected level.
The root cause of this issue is related to a PLL device on the DIMMs. This sub-population of DIMMs has PLL devices with a date code between 0049 and 0215 inclusive.
The issue produces no unique symptom other than higher than expected numbers of UEs and CEs.
A DIMM lookup tool has been developed to assist in identifying suspect DIMMs (manual inspection is sometimes still required).
Impacted Platforms: the following platforms could be impacted if shipped between 1/01/2001 and 12/31/2002: SB1000, SB2000, Netra20, 280R, V480, V490, V880, V890, V1280, SF3800, SF4800, SF4810, SF6800, F12K, F15K
References:
FCO A0253-1: A sub-population of DIMMs that shipped between 2001 and 2002 on the below platforms are showing significantly lower reliability than expected.
FAQ: http://onestop/qco/plldimm/index_plldimm.shtml
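The date code range can be checked mechanically once a code has been read off the PLL device. The sketch below assumes the four-digit code is in YYWW form (two-digit year followed by week of year), which is a common component marking convention but is not stated in this deck. The helper names are hypothetical, and this is not the DIMM lookup tool.

```python
def parse_date_code(code):
    """Split a four-digit date code into (year, week).

    Assumption: the code is YYWW, e.g. '0049' = year 00, week 49.
    """
    if len(code) != 4 or not (code.isascii() and code.isdigit()):
        raise ValueError(f"expected four digits, got {code!r}")
    year, week = int(code[:2]), int(code[2:])
    if not 1 <= week <= 53:
        raise ValueError(f"week out of range in {code!r}")
    return year, week


# Range from the text above: 0049 to 0215 inclusive.
FIRST_SUSPECT = parse_date_code("0049")
LAST_SUSPECT = parse_date_code("0215")


def is_suspect_pll(code):
    """True if the date code falls inside the suspect range (inclusive)."""
    return FIRST_SUSPECT <= parse_date_code(code) <= LAST_SUSPECT


for code in ("0048", "0049", "0130", "0215", "0216"):
    print(code, is_suspect_pll(code))
```

Comparing (year, week) tuples keeps both endpoints inclusive and rejects malformed codes up front. Two-digit years are compared as written, so the check is only meaningful for codes from 2000 onward.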

How this issue may uniquely affect V480/V490/V880/V890 platforms:


In some cases during UE DIMM errors, incorrect memory banks can be called out, masking the true location of the faulty DIMM. POST cannot help here, since it is affected by the same bug and will also call out the wrong DIMM location.
The kernel or POST reports a memory group as the source of a CE or UE error, which might lead the engineer to believe there is a defective DIMM within that group. However, on a system experiencing BugID #5034665, the reported group is USUALLY NOT the location of the defective DIMM.
You will know that the system is experiencing this bug because there will be multiple UE error messages calling out different groups of memory on the same CPU/Memory board.
Note: When this bug is exhibited, the UE DIMM errors are confined to a single CPU/Memory board; the false errors do not span different CPU/Memory boards.
In light of the information uncovered by BugID #5034665, we no longer look for one specific DIMM. Instead, we remove all DIMMs containing a specific PLL chip within a range of date codes.
References:
Infodoc 77110: Sun Fire[TM] Server (V480, V880, V880z, V490, V890): How to Troubleshoot "Dimm PLL chip failure causes CE/UEs to be called out by POST & Solaris[TM] in any dimm location" BugID #5034665
SunAlert 101667 (formerly 57757): A Limited Subset of DIMMs (less than 1%) Shipped in 2001-2002 May Have a Reliability Issue
PLL Lookup Tool: http://pts-appl-z1.holland/pll.html (used to scan Explorer outputs)
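The tell-tale pattern described above, several UE callouts naming different memory groups on one CPU/Memory board, can be expressed as a simple grouping check. The sketch below assumes the UE callouts have already been extracted into (board, group) pairs. That input shape, the sample labels, and the function name are all hypothetical, since this deck does not show the raw message format for these platforms.

```python
from collections import defaultdict


def boards_with_scattered_ues(callouts):
    """Return {board: sorted groups} for boards whose UEs name >1 memory group.

    `callouts` is an iterable of (board, group) pairs, one per UE message.
    """
    groups_by_board = defaultdict(set)
    for board, group in callouts:
        groups_by_board[board].add(group)
    return {
        board: sorted(groups)
        for board, groups in groups_by_board.items()
        if len(groups) > 1
    }


# Hypothetical callouts: board "A" has UEs in two groups, board "B" in one.
sample = [
    ("A", "group 0"),
    ("A", "group 1"),
    ("A", "group 0"),
    ("B", "group 1"),
]

print(boards_with_scattered_ues(sample))
```

A board that shows up in the result is only a candidate for the bug; confirmation still relies on the PLL date code check and the referenced Infodoc.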

Memory reference site: http://onestop/qco/dimm/
https://onestop.sfbay.sun.com/qco/dimm/index_dimm.shtml
Infodoc 70361: "Introduction to Solaris[TM] Operating System CE/UE/ECC/CBB/CBI/DBB/DBI Error Messages"
Infodoc 72846: Event Messages for UltraSPARC-III[R], UltraSPARC-III+[R], UltraSPARC-IIIi[R], and UltraSPARC-IV[R] CPU Modules
Infodoc 72775: "How to determine if a correctable error (CE) on a memory DIMM should result in replacement of FRU"
Infodoc 70134: Diagnosis of bad writers and datapath faults from Solaris messages
Infodoc 79928: "Sun Enhanced Memory DIMM Replacement Policy"
Infodoc 82264: Memory DIMM Replacement Management Tool - cediag 1.2.1 FAQ
FIN 100271 (formerly I0760-2): Sun Enhanced Memory DIMM Replacement Policy

Misc Memory Resources

Memory Diagnosibility TOI

Mike Buckley michael.buckley@sun.com 781-442-1222


revision: H (6/22/06)
