Вы находитесь на странице: 1из 8


gc lost blocks diagnostics [ID 563566.1]

Modified 10-JUN-2009 Type PROBLEM Status PUBLISHED

In this Document
Global Cache Block Loss Diagnostic Guide

Applies to:

Oracle Server - Enterprise Edition - Version: to

This problem can occur on any platform.
Oracle Clusterware & Oracle Real Application Clusters



In Oracle RAC environments, RDBMS gathers global cache work load statistics which are reported in STATSPACK,
AWRs and GRID CONTROL. Global cache lost blocks statistics ("gc cr block lost" and/or "gc current block lost") for
each node in the cluster as well as aggregate statistics for the cluster represent a problem or inefficiencies in packet
processing for the interconnect traffic. These statistics should be monitored and evaluated regularly to guarantee
efficient interconnect Global Cache and Enqueue Service (GCS/GES) and cluster processing. Any block loss
indicates a problem in network packet processing and should be investigated.

The vast majority of escalations attributed to RDBMS global cache lost blocks can be directly related to faulty or
mis-configured interconnects. This document serves as guide for evaluating and investigating common (and
sometimes obvious) causes.

Even though much of the discussion focuses on Performance issues, it is possible to get a node/instance eviction
due to these problems. Oracle Clusterware & Oracle RAC instances rely on heartbeats for node memberships. If
network Heartbeats are consistently dropped, Instance/Node eviciton may occur. The Symptoms below are
therefore relevant for Node/Instance evictions.



‘gc cr block lost’ / ‘gc current block lost’ top 5 or significant wait event


SQL traces report multiple gc cr requests / gc current request /

gc cr multiblock requests with long and uniform elapsed times
Poor application performance / throughput
Packet send/receive errors as displayed in ifconfig or vendor supplied utility
Netstat reports errors/retransmits/reassembly failures
Node failures and node integration failures
Abnormal cpu consumption attributed to network processing

1 of 8 18.10.2010 15:54

NOTE: lost block problems often occur with gc cr multiblock waits, i.e. waits for scans of consecutive blocks


Probable causes are noted in the Diagnostic Guide below (ordered by most likely to least likely cause)

Global Cache Block Loss Diagnostic Guide

1. Poorly sized UDP receive (rx) buffer sizes / UDP buffer socket overflows

Description: Oracle RAC Global cache block processing is ‘bursty’ in nature and, consequently, the OS may
need to buffer receive(rx) packets while waiting for CPU. Unavailable buffer space may lead to silent packet
loss and global cache block loss. `netstat –s` or `netstat –su` on most UNIX will help determine
UDPInOverflows, packet receive errors, dropped frames, or packets dropped due to buffer full errors.
Action: Packet loss is often attributed to inadequate( rx) UDP buffer sizing on the recipient server, resulting
in buffer overflows and global cache block loss. The UDP receive (rx) buffer size for a socket is set to 128k
when Oracle opens the socket when the OS setting is less than 128k. If the OS setting is larger than 128k
Oracle respects the value and leaves it unchanged. The UDP receive buffer size will automatically increase
according to database block sizes greater than 8k, but will not increase beyond the OS dependent limit. UDP
buffer overflows, packet loss and lost blocks may be observed in environments where there are excessive
timeouts on ‘global cache cr requests’ due to inadequate buffer setting when
DB_FILE_MULTIBLOCK_READ_COUNT is greater than 4. To alleviate this problem, increase the UDP buffer
size and decrease the DB_FILE_MULTIBLOCK_READ_COUNT for the system or active session.

To determine if you are experiencing UDP socket buffer overflow and packet loss, on most UNIX platforms,

`netstat –s` or `netstat –su` and look for either “udpInOverflows”, “packet receive
errors” or “fragments dropped” depending on the platform.

NOTE: UDP packet loss usually results in increased latencies,decreased bandwidth,increased cpu
utilization (kernel and user), and memory consumption to deal with packet retransmission.

2. Poor interconnect performance and high cpu utilization. `netstat –s` reports packet reassembly

Description: Large UDP datagrams may be fragmented and sent in multiple frames based on Medium
Transmission Unit (MTU) size. These fragmented packets need to be reassembled on the receiving node.
High CPU utilization (sustained or frequent spikes), inadequate reassembly buffers and UDP buffer space can
cause packet reassembly failures. `netstat -s` reports a large number of Internet Protocol (IP) ‘reassembles
failed’ and ‘fragments dropped after timeout’ in the "IP Statistics" section of the output on the receiving node.
Fragmented packets have a time-to-live for reassembly. Packets that are not reassembled are dropped and
requested again. Fragments that arrive and there is no space for reassembly are silently dropped

`netstat –a` IP stat counters:

3104582 fragments dropped after timeout
34550600 reassemblies required
8961342 packets reassembled ok
3104582 packet reassembles failed.

Action: Increase fragment reassembly buffers, allocating more space for reassembly. Increase the time to
reassemble packet fragment., increase udp receive buffers to accommodate network processing latencies
that aggravate reassembly failures and identify CPU utilization that negatively impacts network stack

2 of 8 18.10.2010 15:54

To modify reassembly buffer space, change the following thresholds:

/proc/sys/net/ipv4/ipfrag_low_thresh (default = 196608)

/proc/sys/net/ipv4/ipfrag_high_thresh (default = 262144)

To modify packet fragment reassembly times, modify:

/proc/sys/net/ipv4/ipfrag_time (default = 30)

see your OS for the equivalent command syntax.

3. Network packet corruption resulting from UDP checksum errors and/or send (tx) / receiv e (rx)
transmission errors

Description: UDP includes a checksum field in the packet header which is read on receipt. Any corruption of
the checksum results in silent dropped packets. Checksum corruptions result in packet retransmissions,
additional cpu overhead for the additional request and latencies in packet processing.
Action: Use tcpdump/snoop/network utilities sniffer utility to capture packet dumps to identify checksum
errors and confirm checksum corruption. Engage sysadmins and network engineers for root cause. Checksum
offloading on NICs have been known to create checksum errors. Consider disabling the NIC checksum
offloading, if configured, and test. On LINUX ethtool –K <IF> rx off tx off disables the checksum offloading.

4. Mismatched MTU sizes in the communication path

Description: Mismatched MTU sizes cause ‘packet too big’ failures and silent packet loss resulting in global
cache block loss and excessive packet retransmission requests.
Action: The MTU is the "Maximum Transmission Unit" or frame size configured for the interconnect
interfaces. The default standard for most UNIX is 1500 bytes for Ethernet. MTU definitions should be identical
for all devices in the interconnect communication path. Identify and monitor all devices in the interconnect
communication path. Use large, non-default sized, ICMP probe packets for` ping`, `tracepath` or `traceroute`
to detect mismatched MTUs in the path. Use `ifconfig` or vendor recommended utilities to determine and set
MTU sizes for the server NICS. See Jumbo Frames #12 below. Note: Mismatched MTU sizes for the
interconnect will inhibit nodes joining the cluster in 10g and 11g.

5. Faulty or poorly seated cables/cards

Description: Faulty network cable connections, the wrong cable, poorly constructed cables, excessive
length and wrong port assignments can result in inferior bit rates, corrupt frames, dropped packets and poor
Action: CAT 5 grade cables or better should be deployed for interconnect links. All cables should be
securely seated and labeled according to LAN/port and aggregation, if applicable. Cable lengths should
conform to vendor ethernet specifics.

6. Interconnect LAN non-dedicated

Description: Shared public IP traffic and/or shared NAS IP traffic, configured on the interconnect LAN will
result in degraded application performance, network congestion and, in extreme cases, global cache block
Action: The interconnect/clusterware traffic should be on a dedicated LAN defined by a non-routed subnet.
Interconnect traffic should be isolated to the adjacent switch(es), e.g. interconnect traffic should not extend
beyond the access layer switch(es) to which the links are attached. The interconnect traffic should not be
shared with public or NAS traffic. If Virtual LANs (VLANS) are used, the interconnect should be on a single,
dedicated VLAN mapped to a dedicated, non-routed subnet, which is isolated from public or NAS traffic.

7. Lack of Server/Switch Adjacency

Description: Network devices are said to be ‘adjacent’ if they can reach each other with a single hop across

3 of 8 18.10.2010 15:54

a link layer. Multiple hops add latency and introduce unnecessary complexity and risk when other network
devices are in the communication path.
Action: All GbE server interconnect links should be (OSI) layer 2 direct attach to the switch or switches (if
redundant switches are configured). There should be no intermediary network device, such as a router, in the
interconnect communication path. The unix command `traceroute` will help identify adjacency issues.

8. IPFILTER configured

Description:IPFILTER (IPF) is a host based firewall or Network Address Translation (NAT) software
package that has been identified to create problems for interconnect traffic. IPF may contribute to severe
application performance degradation, packet loss and global cache block loss.
Action: disable IPFILTER

9. Outdated Network driver or NIC firmware

Description: Outdated NIC drivers or firmware have been known to cause problems in packet processing
across the interconnect. Incompatible NIC drivers in inter-node communication may introduce packet
processing latencies, skewed latencies and packet loss.
Action: Server NICs should be the same make/model and have identical performance characteristics on all
nodes and should be symmetrical in slot id. Firmware and NIC drivers should be at the same (latest) rev. for
all server interconnect NICs in the cluster.

10. Proprietary interconnect link transport and network protocol

Description: Non-standard, proprietary protocols, such as LLT, HMP,etc, have proven to be unreliable and
difficult to debug. Miss-configured proprietary protocols have caused application performance degradation,
dropped packets and node outages.
Action: Oracle has standardized on 1GbE UDP as the transport and protocol. This has proven stable,
reliable and performant. Proprietary protocols and substandard transports should be avoided. IP and RDS on
Inifiniband are available and supported for interconnect network deployment and 10GbE has been certified for
some platforms (see OTN for details) - certification in this area is ongoing.

11. Misconfigured bonding/link aggregation

Description: Failure to correctly configure NIC Link Aggregation or Bonding on the servers or failure to
configure aggregation on the adjacent switch for interconnect communication can result in degraded
performance and block loss due to ‘port flapping’, interconnect ports on the switch forming an aggregated link
frequently change ‘UP’/’DOWN’ state.
Action: If using link aggregation on the clustered servers, the ports on the switch should also support and be
configured for link aggregation for the interconnect links. Failure to correctly configure aggregation for
interconnect ports on the switch would result in 'port flapping', switch ports randomly dropping, resulting in
packet loss.
Bonding/Aggregation should be correctly configured per driver documentation and tested under load. There
are a number of public domain utilities that help to test and measure link bandwidth and latency performance
(see iperf). OS, network and network driver statistics should be evaluated to determine efficiency of bonding.

12. Misconfigured Jumbo Frames

Description: Misconfigured Jumbo Frames may create mismatched MTU sizes described above.
Action: Jumbo Frames are IEEE non-standard and as a consequence, care should be taken when
configuring. A Jumbo Frame is a frame size around 9000bytes. Frame size may vary depending on network
device vendor and may not be consistent between communicating devices. An identical maximum transport
unit (MTU) size should be configured for all devices in the communication path if the default is not 9000 bytes.
All the network devices , switches/NICS/line cards, in operation must be configured to support the same
frame size (MTU size). Mismatched MTU sizes, where the switch may be configured to be MTU:1500 but the
server interconnect interfaces are configured to be MTU:9000 will lead to packet loss, packet fragmentation
and reassembly errors which cause severe performance degradation and cluster node outages. The IP stats
in `netstat –s` on most platforms will identify frame fragmentation and reassembly errors. The command
`ifconfig -a`, on most platforms, will identify the frame size in use (MTU:1500). See the switch vendors
documentation to identify Jumbo Frames support.

4 of 8 18.10.2010 15:54

13. NIC force full duplex and duplex mode mismatch

Description: Duplex mode mismatch is when two nodes in a communication channel are operating at
half-duplex on one end and full duplex on the other. This may be manually misconfigured duplex modes or, one
end configured manually to be half-duplex while the communication partner is autonegotiate. Duplex mode
mismatch results in severely degraded interconnect communication.
Action: Duplex mode should be set to autonegotiate for all Server NICs in the cluster *and* line cards on the
switch(es) servicing the interconnect links. Gigabit ethernet standards require autonegotiation set to "on" in
order to operate. Duplex mismatches can cause severe network degradation, collisions and dropped packets.
Autonegotiate duplex modes should be confirmed after every hardware/software upgrade affecting the
network interfaces. Autonegotiate on all interfaces will operate at 1000 full duplex.

14. Flow-control mismatch in the interconnect communication path

Description: Flow control is the situation when a server may be transmitting data faster than a network peer
(or network device in the path) can accept it. The receiving device may send a PAUSE frame requesting the
sender to temporarily stop transmitting.
Action: Flow-control mismatches can result in lost packets and severe interconnect network performance

tx flow control should be turned off

rx flow control should be turned on
tx/rx flow control should be turned on for the switch(es)

NOTE: flow control definitions may change after firmware/network driver upgrades. NIC settings should
be verified after any upgrade

15. Packet loss at the OS, NIC or switch layer

Description: Any packet loss as reported by OS, NIC or switch should be thoroughly investigated and
resolved. Packet loss can result in degraded interconnect performance, cpu overhead and network/node
Action: Specific tools will help identify which layer you are experiencing the packet/frame loss (process/OS
/Network/NIC/switch). netstat, ifconfig, ethtool, kstat (depending on the OS) and switch port stats would be
the first diagnostics to evaluate.You may need to use a network sniffer to trace end-to-end packet
communication to help isolate the problem (see public domain tools such as snoop/wireshare/ethereal). Note,
understanding packet loss at the lower layers may be essential to determining root cause. Under sized ring
buffers or receive queues on a network interface are known to cause silent packet loss, e.g. packet loss that
is not reported at any layer. See NIC Driver Issues and Kernel queue lengths below. Engage your systems
administrator and network engineers to determine root cause.

16. NIC Driver/Firmware Configuration

Description: Misconfigured or inadequate default settings for tunable NIC public properties may result in
silent packet loss and increased retransmission requests.
Action: Default factory settings should be satisfactory for the majority of network deployments. However,
there have been issues with some vendor NICs and the nature of interconnect traffic that have required
modifying interrupt coalescence settings and the number of descriptors in the ring buffers associated with the
device. Interrupt coalescence is the CPU interrupt rate for send (tx) and receive (rx) packet processing. The
ring buffers hold rx packets for processing between CPU interrupts. Misconfiguration at this layer often results
in silent packet loss. Diagnostics at this layer require sysadmin and OS/Vendor intervention.

17. NIC send (tx) and receive (rx) queue lengths

Description: Inadequately sized NIC tx/rx queue lengths may silently drop packets when queues are full.
This results in gc block loss, increased packet retransmission and degraded interconnect performance.

5 of 8 18.10.2010 15:54

Action: As packets move between the kernel network subsystem and the network interface device driver,
send (tx) and receive (rx) queues are implemented to manage packet transport and processing. The size of
these queues are configurable. If these queues are underconfigured or misconfigured for the amount of
network traffic generated or MTU size configured, full queues will cause overflow and packet loss. Depending
on the driver and quality of statistics gathered for the device, this packet loss may not be easy to detect.
Diagnostics at this layer require sysadmin and OS/Vendor intervention. (c.f. iftxtqueue and
netdev_max_backlog on linux)

18. Limited capacity and over-saturated bandwidth

Description: Oversubscribed network usage will result in interconnect performance degradation and packet
Action: An interconnect deployment best practice is to know your interconnect usage and bandwidth. This
should be monitored regularly to identify usage trends, transient or constant. Increasing demands on the
interconnect may be attributed to scaling the application or aberrant usage such a bad sql or unexpected
traffic skew. Assess the cause of bandwidth saturation and address it.

19. Over subscribed CPU and scheduling latencies

Description: Sustained high load averages and network stack scheduling latencies can negatively effect
interconnect packet processing and result in interconnect performance degradation, packet loss, gc block loss
and potential node outages.
Action: Scheduling delays when the system is under high CPU utilization can cause delays in network packet
processing. Excessive, sustained latencies will cause severe performance degradation and may cause cluster
node failure. It is critical that sustained elevated CPU utilization be investigated. The `uptime` command will
display load average information on most platforms. Excessive CPU interrupts associated with network stack
processing may be mitigated through NIC interrupt coalescence and/or binding network interrupts to a single
CPU. Please work with NIC vendors for these types of optimizations. Scheduling latencies can result in
reassembly errors. See #2 above.

20. Switch related packet processing problems

Description: Buffer overflows on the switch port, switch congestion and switch misconfiguration such as
MTU size, aggregation and Virtual Land definitions (VLANs) can lead to inefficiencies in packet processing
resulting in performance degradation or cluster node outage.
Action: The Oracle interconnect requires a switched Ethernet network. The switch is a critical component in
end-to-end packet communication of interconnect traffic. As a network device, the switch may be subject to
many factors or conditions that can negatively impact interconnect performance and availability. It is critical
that the switch be monitored for abnormal, packet processing events, temporary or sustained traffic
congestion and efficient throughput. Switch statistics should be evaluated at regular intervals to assess trends
in interconnect traffic and to identify anomalies.

21. QoS which negatively impacts the interconnect packet processing

Description: Quality of Service definitions on the switch which shares interconnect traffic may negatively
impact interconnect network processing resulting in severe performance degradation
Action: If the interconnect is deployed on a shared switch segmented by VLANs, any QoS definitions on the
shared switch should be configured such that prioritization of service does not negatively impact interconnect
packet processing. Any QoS definitions should be evaluated prior to deployment and impact assessed.

22. Spanning tree brownouts during reconvergence.

Description: Ethernet networks use a Spanning Tree Protocol (STP) to ensure a loop-free topology where
there are redundant routes to hosts. An outage of any network device participating in an STP topology is
subject to a reconvergence of the topology which recalculates routes to hosts. If STP is enabled in the LAN
and misconfigured or unoptimized, a network reconvergence event can take up to 1 minute or more to
recalculate (depending on size of network and participating devices). Such latencies can result in interconnect
failure and cluster wide outage.
Action: Many switch vendors provide optimized extensions to STP enabling faster network reconvergence
times. Optimizations such as Rapid Spanning Tree (RSTP), Per-VLAN Spanning Tree (PVST), and Multi-

6 of 8 18.10.2010 15:54

Spanning Tree (MSTP) should be deployed to avoid a cluster wide outage.

23. sq_max_size inadequate for STREAMS queing

Description: AWR reports high waits for "gc cr block lost" and/or "gc current block lost". netstat output does
not reveal any packet processing errors. `kstat -p -s '*nocanput*` returns non-zero values. nocanput indicates
that queues for streaming messages are full and packets are dropped. Customer is running STREAMS in a
RAC in a Solaris env.
Action: Increasing the udp max buffer space and defining unlimited STREAMS queuing should relieve the
problem and eliminate ‘nocanput’ lost messages. The following are the Solaris commands to make these

`ndd -set /dev/udp udp_max_buf <NUMERIC VALUE>`

set sq_max_size to 0 (unlimited) in /etc/system. Default = 2

udp_max_buf controls how large send and receive buffers (in bytes) can be for a UDP socket. The default
setting, 262,144 bytes, may be inadequate for STREAMS applications. sq_max_size is the depth of the
message queue.


As explained above, Lost blocks are generally caused by unreliable Private network. This can be caused by a bad
patch or faulty network configuration or hardware issue.


In most cases, gc buffer lost has been attributed to (a) A missing OS patch (b) Bad network card (c) Bad cable (d)
Bad switch (e) One of the network settings. Customer should open a ticket with their OS vendor and provide them
the OSwatcher output


Customers should start collecting OSwatcher data.

This data should be shared with OS vendor to find root cause of the problem.


NOTE:297623.1 - UDP configuration on Solaris and block lost waits



Oracle Database Products > Oracle Database > Oracle Database > Oracle Server - Enterprise Edition




Back to top

7 of 8 18.10.2010 15:54

Rate this document

8 of 8 18.10.2010 15:54