Вы находитесь на странице: 1из 84

RAC Internals

Julian Dyke Independent Consultant


CERN Geneva - November 2008
1

2008 Julian Dyke

juliandyke.com

About me...
20 years Oracle experience as DBA, developer and consultant Independent Consultant specializing in Kernel Performance Tuning RAC and High Availability Chair of UKOUG RAC & HA SIG Regular presenter at conferences, seminars and user group meetings in UK, Europe and USA Member of Oak Table Network Website http://www.juliandyke.com specializing in Oracle internals
2

2008 Julian Dyke

juliandyke.com

About the book...


Pro Oracle Database 10g RAC on Linux Co-authored with Steve Shaw of Intel Corporation Published by Apress Available August 2006 ISBN: 1-59059-524-6 New edition planned for 2009 (Oracle 11gR2)

2008 Julian Dyke

juliandyke.com

1010111 0101 010110101 0110101


10101 101010010 01010010101 010101010101 1001010 110 10101 10010 1101010 1010101 0001 100101010 010 101010 11111000000 0000011000 101 0101010100 1010010 10101 1001010 1010 10101 101001 01101010 '1011011'; 10101 101010010 01010010101 010101010101
11101101110 0110 1001011 1010 11001 10001 00100110 10101 1000110 00101 10101 1010 1001 1111 1001 0101 1000101 111011 101110

2008 Julian Dyke

juliandyke.com

Agenda

Interconnect RAC Background Processes Global Cache Services

2008 Julian Dyke

juliandyke.com

RAC 4-node cluster


Public Network Private Network (Interconnect)
Node 1 Instance 1 Node 2 Instance 2 Node 3 Instance 3 Node 4 Instance 4

Storage Network Shared Storage

2008 Julian Dyke

juliandyke.com

Interconnect Overview
Instances communicate with each other over the interconnect (network) Information transferred between instances includes data blocks locks SCNs Typically 1Gb Ethernet UDP protocol Often teamed in pairs to avoid SPOFs Can also use Infiniband Fewer levels in stack Other proprietary protocols are available
7

2008 Julian Dyke

juliandyke.com

Interconnect TCP/IP Five Layer Model


All messages travel down through layers, across physical layer then up again

5 Application 4 Transport 3 Network 2 Data Link 1Physical

5 Application 4 Transport 3 Network 2 Data Link 1Physical

2008 Julian Dyke

juliandyke.com

Interconnect TCP/IP Five Layer Model


TCP/IP has a four or five layer model Five-layer model shown below
Layer 5 Application 4 Transport 3 Network 2 Data Link 1 Physical TCP/IP Suite DHCP, DNS, FTP, HTTP, SSH, NFS, NTP, SMTP, SNMP, TELNET, RPC, SOAP TCP, UDP IP (IPv4, IPv6), ICMP, ARP, RARP Ethernet, Token Ring, 802.11, Wi-Fi, FDDI, PPP 10BASE-T, 100BASE-T, 1000BASE-T, Optical Fibre, Twisted Pair

Four-layer model combines data link and physical layers

2008 Julian Dyke

juliandyke.com

Interconnect TCP/IP Transport Layer


Transport Layer Connection-oriented (TCP) Connectionless (UDP)

Clusterware

TCP

UDP

RAC

IP

Ethernet

Physical Layer

10

2008 Julian Dyke

juliandyke.com

Interconnect Encapsulation
Data UDP Header IP Header Ethernet Header IP Header UDP Header UDP Header

Data

Data Ethernet Trailer

Data

14 bytes

20 bytes

8 bytes

4 bytes

MTU Size

11

2008 Julian Dyke

juliandyke.com

Oracle Clusterware Node Heartbeat Messages


Sent to each node in cluster every second in both directions Checks nodes are still members of cluster Sent by ocssd.bin using TCP well-known port 49895 Outgoing message is 134 bytes (80 byte payload) Incoming message is 66 bytes (12 byte payload)

Node 1 Node 2

Node 3 Node 4

Outgoing Incoming 12

2008 Julian Dyke

juliandyke.com

Oracle Clusterware Node Status Messages


Number of packets exchanged by a node is determined by number of nodes in cluster Number of packets per node per hour is (#nodes - 1) * 4 messages * 3600 seconds
Number of nodes 2 3 4 5 6 7 8 16 32
13

Packets per hour 14,400 28,800 43,200 57,600 72,000 86,400 100,800 216,000 446,400

2008 Julian Dyke

juliandyke.com

Global Services Overview


Resource Object to which access must be controlled at instance level Enqueue Memory structure that serializes access to a resource Global Resources Object to which access must be controlled at cluster level Global Enqueue Locks and enqueues which need to be consistent between all instances

14

2008 Julian Dyke

juliandyke.com

Global Services Overview


Global Resource Directory (GRD) Records current state and owner of each resource Contains convert and write queues Distributed across all instances in cluster Maintained by GCS and GES Global Cache Services (GCS) Implements cache coherency for database Coordinates access to database blocks for instances Global Enqueue Services (GES) Controls access to other resources (locks) including library cache and dictionary cache Performs deadlock detection
15

2008 Julian Dyke

juliandyke.com

RAC Background Processes Overview


Node 1 DIAG LMON LCK0 LMD0 LMSn Buffer Cache CKPT ARCn LGWR DBWR DBWR LGWR Buffer Cache CKPT ARCn Shared Pool Shared Pool PMON SMON SMON Node 2 PMON DIAG LMON LCK0 LMD0 LMSn

Instance 1

Instance 2

Datafiles Controlfiles Redo Logs


16

Redo Logs

2008 Julian Dyke

juliandyke.com

RAC Background Processes LMSn


LMSn Global Cache Service Process Manage requests for data access across cluster Up to 20 in Oracle 10.1 LMS0-LMS9 LMSa-LMSj Up to 36 in Oracle 10.2 LMS0-LMS9 LMSa-LMSz In Oracle 10.1 and above, number of GCS server processes can be configured using gcs_server_processes parameter Default value is 1 (single CPU system) Can also be configured using _lm_lms parameter
17

2008 Julian Dyke

juliandyke.com

RAC Background Processes LMSn


In Oracle 10.2 and above LMS processes run in real-time mode Remaining processes run in time-share mode Check using:
[oracle@server3 ~]$ ps -eo pid,user,opri,cmd | grep ora_lm 8596 oracle 75 ora_lmon_TEST1 8598 oracle 75 ora_lmd0_TEST1 8601 oracle 58 ora_lms0_TEST1

58 is real time; 75 or 76 is time share You can also check process scheduling policies using chrt
oracle@server3 ~]$ chrt -p 8601 Time pid 8601's current scheduling policy: SCHED_RR pid 8601's current scheduling priority: 1 [oracle@server3 ~]$ chrt -p 8596 Share pid 8596's current scheduling policy: SCHED_OTHER pid 8596's current scheduling priority: 0 18 # lms0 - Real

# lmon - Time

2008 Julian Dyke

juliandyke.com

RAC Background Processes LCK0


LCK0 Instance Enqueue Process Part of KCL (Kernel Cache Library) Manages instance resource requests cross-instance call operations Assists LMS processes Formerly known as lock process One LCK0 process per instance In 9.0.1 and below, number of lock processes may be configurable using _gc_lck_procs parameter
19

2008 Julian Dyke

juliandyke.com

RAC Background Processes LMD0


LMD0 Global Enqueue Service Daemon Manages requests for global enqueues Updates status of enqueues when granted to / revoked from an instance Responsible for deadlock detection One LMD0 process per instance In 8.1.7 and below number of lock daemons may be configurable using _lm_dlmd_processes parameter

20

2008 Julian Dyke

juliandyke.com

RAC Background Processes LMON


LMON Global Enqueue Service Monitor One LMON process per instance Monitors cluster to maintain global enqueues and resources Manages instance and process expirations recovery processing for cluster enqueues

21

2008 Julian Dyke

juliandyke.com

RAC Background Processes DIAG


DIAG - Diagnosability Process Collects diagnostic data in the event of a failure Creates subdirectories in BACKGROUND_DUMP_DEST directory In Oracle 9.0.1 and above can be disabled using _diag_daemon parameter Do not try this on a production system

22

2008 Julian Dyke

juliandyke.com

Global Cache Services Introduction


Global Cache Services exist to implement Cache Fusion Cache Fusion allows blocks to be updated by multiple instances Only one instance can have the updatable (current) version of a block GCS must ensure that only one instance can update a block at any time Many instances can have read-only (consistent read) versions of a block Instances can have multiple copies of same block at different SCNs

23

2008 Julian Dyke

juliandyke.com

Global Cache Services 2 way Consistent Read


N S 1 Request shared resource Resource Master Instance 3

Instance 2

2 Request granted

3 Read request Instance 1 4 Block returned Instance 4

Instance 2 requests current read on block

1318

24

STOP

2008 Julian Dyke

juliandyke.com

Global Cache Services 3-way Current Read


N S N 3 Block and resource status 2 Transfer block to Instance 1 for exclusive access 1 Request exclusive resource 4 Resource status Instance 4 Resource Master

1318
Instance 2

Instance 3

N X

1320
Instance 1 Instance 1 requests exclusive read on block

1318

25

STOP

2008 Julian Dyke

juliandyke.com

Global Cache Services 3-way Current Read (Dirty Block)


N S N 2 Transfer block to Instance 4 in exclusive mode Resource Master 1 Request block in exclusive mode

1318
Instance 2

Instance 3 4 Resource status

N X N

N X

1320
3 Block and resource status Instance 1 Instance 4 requests exclusive read on block

1323
Instance 4

1318

Note that Instance 1 will create a past image (PI) of the dirty block

26

STOP

2008 Julian Dyke

juliandyke.com

Global Cache Services 3-way Current (Without Downgrade)


N 1 Request block in shared mode Resource Master 2 Transfer block to Instance 2 in shared mode

Instance 2

4 Resource status

Instance 3

N X N

N X 3 Block and resource status

1320
Instance 1 Instance 2 requests current read on block

1323
Instance 4

1318

In Oracle 8.1.5 and above _fairness_threshold is used to avoid unnecessary lock conversions

27

STOP

2008 Julian Dyke

juliandyke.com

Global Cache Services 3-way Current (With Downgrade)


S 1 Request block in shared mode Resource Master 2 Transfer block to Instance 2 in shared mode

Instance 2

4 Resource status

Instance 3

N X N

N X S 3 Block and resource status

1320
Instance 1 Instance 2 requests current read on block

1323
Instance 4

1318

In Oracle 8.1.5 and above _fairness_threshold is used to avoid unnecessary lock conversions

28

STOP

2008 Julian Dyke

juliandyke.com

Global Cache Services Wait Events


Wait events show reads where messages have been exchanged with other instances Can include: gc cr grant 2-way gc cr block 2-way gc cr block 3-way gc cr multi block request gc current grant 2-way gc current block 2-way gc current block 3-way gc current multi block request

29

2008 Julian Dyke

juliandyke.com

Global Cache Services Cache Fusion Example


Resource Master RAC2 1
1,42 2,44

RAC3 UPDATE t1 SET c2 = 42 WHERE c1 = 1;

1,42 2,50

RAC1 2
1,40 1318 2,44

RAC4 UPDATE t1 SET c2 = 50 WHERE c1 = 2;

30

2008 Julian Dyke

juliandyke.com

Global Cache Services Cache Fusion Example


RAC4 executes
UPDATE t1 SET c2 = 42 WHERE c1 = 2;

No statistics so dynamic sampling required No indexes so full table scan required Steps are:
Dynamic Sampling 3-way 2-way 2-way 3-way 2-way Current Read 3-way
31

Consistent Read Consistent Read Consistent Read Consistent Read Consistent Read Current Read

Table block 15 Undo block 89 Undo block 239 Table block 15 Undo block 89 Table block 15

Consistent Read

2008 Julian Dyke

juliandyke.com

Global Cache Services Cache Fusion Example


Dynamic Sampling - 10046/8
PARSING IN CURSOR #4 len=433 dep=1 uid=55 oct=3 lid=55 hv=574971495 ad='2b8da360' SELECT /* OPT_DYN_SAMP */ /*+ ALL_ROWS IGNORE_WHERE_CLAUSE NO_PARALLEL(SAMPLESUB) opt_param('parallel_execution_enabled', 'false') NO_PARALLEL_INDEX(SAMPLESUB) NO_SQL_TUNE */ NVL(SUM(C1),:"SYS_B_0"), NVL(SUM(C2),:"SYS_B_1") FROM (SELECT /*+ IGNORE_WHERE_CLAUSE NO_PARALLEL("T7") FULL("T7") NO_PARALLEL_INDEX("T7") */ :"SYS_B_2" AS C1, CASE WHEN "T7"."C1"=:"SYS_B_3" THEN :"SYS_B_4" ELSE :"SYS_B_5" END AS C2 FROM "T7" "T7") SAMPLESUB END OF STMT PARSE #4:c=0,e=423,p=0,cr=0,cu=0,mis=1,r=0,dep=1,og=1 EXEC #4:c=1999,e=10615,p=0,cr=0,cu=0,mis=1,r=0,dep=1,og=1 WAIT #4: nam='gc cr block 3-way' ela= 836 p1=8 p2=15 p3=1 obj#=51836 WAIT #4: nam='gc cr block 2-way' ela= 442 p1=6 p2=89 p3=67 obj#=51836 WAIT #4: nam='gc cr block 2-way' ela= 453 p1=6 p2=239 p3=68 obj#=51836 FETCH #4:c=0,e=2540,p=0,cr=10,cu=0,mis=0,r=1,dep=1,og=1 STAT #4 id=1 cnt=1 pid=0 pos=1 obj=0 op='SORT AGGREGATE (cr=10 pr=0 pw=0 time=3903 us)' STAT #4 id=2 cnt=32 pid=1 pos=1 obj=51836 op='TABLE ACCESS FULL T7 (cr=10 pr=0 pw=0 time=2650 us)' 32

2008 Julian Dyke

juliandyke.com

Global Cache Services Cache Fusion Example


UPDATE statement - 10046/8
PARSING IN CURSOR #1 len=34 dep=0 uid=55 oct=6 lid=55 tim=1168417842291309 hv=3829255502 ad='2b8d04dc' UPDATE t7 SET c2 = 20 WHERE c1 = 5 END OF STMT PARSE #1:c=10998,e=61121,p=0,cr=11,cu=0,mis=1,r=0,dep=0,og=1 WAIT #1: nam='gc cr block 3-way' ela= 702 p1=8 p2=15 p3=1 obj#=51836 WAIT #1: nam='gc cr block 2-way' ela= 447 p1=6 p2=89 p3=67 obj#=0 WAIT #1: nam='gc current block 3-way' ela= 650 p1=8 p2=15 p3=33554433 obj#=51836 EXEC #1:c=0,e=2931,p=0,cr=10,cu=1,mis=0,r=1,dep=0,og=1 WAIT #1: nam='SQL*Net message to client' ela= 5 driver id=1650815232 #bytes=1 p3=0 obj#=51836 WAIT #1: nam='SQL*Net message from client' ela= 7807082 driver id=1650815232 #bytes=1 p3=0 obj#=51836 STAT #1 id=1 cnt=0 pid=0 pos=1 obj=0 op='UPDATE T7 (cr=10 pr=0 pw=0 time=2875 us)' STAT #1 id=2 cnt=1 pid=1 pos=1 obj=51836 op='TABLE ACCESS FULL T7 (cr=10 pr=0 pw=0 time=1665 us)'

33

2008 Julian Dyke

juliandyke.com

Global Cache Services gc cr block 3-way wait event


Source RAC4 - Server RAC2 - LMS1 RAC2 - LMS1 RAC3 - LMS1 RAC3 - LMS1 RAC3 - LMS1 RAC3 - LMS1 RAC3 - LMS1 RAC3 - LMS1 RAC3 - LMS1 Destination RAC2 - LMS1 RAC4 - Server RAC3 - LMS1 RAC2 - LMS1 RAC4 - Server RAC4 - Server RAC4 - Server RAC4 - Server RAC4 - Server RAC4 - Server Description Request file 8 block 15 OK Send file 8 block 15 to RAC4 OK Block file 8 block 15 part 1 Block file 8 block 15 part 2 Block file 8 block 15 part 3 Block file 8 block 15 part 4 Block file 8 block 15 part 5 Block file 8 block 15 part 6 Bytes 456 212 480 212 1500 1500 1500 1500 1500 868

34

2008 Julian Dyke

juliandyke.com

Global Cache Services gc cr block 3-way wait event


Resource Master 4 RAC2 2 1 RAC1
1,40 1318 2,44 1,42 2,44

3
1,42 2,44

5 RAC3

10

RAC4 UPDATE t1 SET c2 = 50 WHERE c1 = 2;

35

2008 Julian Dyke

juliandyke.com

Global Cache Services gc cr block 2-way wait event


2-way Consistent Read
Source RAC4 - Server RAC3 - LMS1 RAC3 - LMS1 RAC3 - LMS1 RAC3 - LMS1 RAC3 - LMS1 RAC3 - LMS1 RAC3 - LMS1 Destination RAC3 - LMS1 RAC4 - Server RAC4 - Server RAC4 - Server RAC4 - Server RAC4 - Server RAC4 - Server RAC4 - Server Description Request file 6 block 69 OK Block file 6 block 69 part 1 Block file 6 block 69 part 2 Block file 6 block 69 part 3 Block file 6 block 69 part 4 Block file 6 block 69 part 5 Block file 6 block 69 part 6 Bytes 400 212 1500 1500 1500 1500 1500 868

36

2008 Julian Dyke

juliandyke.com

Global Cache Services gc cr block 2-way wait event


1,40 2,44

Resource Master 3 4 5 6 7 8

RAC2 1

RAC3 2

1,40 2,44

RAC1
1,40 1318 2,44

RAC4 UPDATE t1 SET c2 = 50 WHERE c1 = 2;

37

STOP

2008 Julian Dyke

juliandyke.com

Global Cache Services gc current block 3-way wait event


3-way Current Read
Source RAC4 - Server RAC2 - LMS1 RAC2 - LMS1 RAC3 - LMS1 RAC3 - LMS1 RAC3 - LMS1 RAC3 - LMS1 RAC3 - LMS1 RAC3 - LMS1 RAC3 - LMS1 RAC4 - LMS1 RAC2 - LMS1
38

Destination RAC2 - LMS1 RAC4 - Server RAC3 - LMS1 RAC2 - LMS1 RAC4 - Server RAC4 - Server RAC4 - Server RAC4 - Server RAC4 - Server RAC4 - Server RAC2 - LMS1 RAC4 - LMS1

Description Request file 8 block 15 OK Send file 8 block 15 to RAC4 OK Block file 8 block 15 part 1 Block file 8 block 15 part 2 Block file 8 block 15 part 3 Block file 8 block 15 part 4 Block file 8 block 15 part 5 Block file 8 block 15 part 6 Received file 8 block 15 OK

Bytes 456 212 480 212 1500 1500 1500 1500 1500 868 244 212

2008 Julian Dyke

juliandyke.com

Global Cache Services gc current block 3-way wait event


Resource Master 4 RAC2 2 12 RAC1 11 1
1,42 2,50 2,44

3
1,42 2,44

RAC3

UPDATE t1 SET c2 = 42 WHERE c1 = 1; 5 6 7 8 9 10

RAC4
1,40 1318 2,44

UPDATE t1 SET c2 = 50 WHERE c1 = 2;

RAC3 saves past image of the dirty block until RAC4 writes the block to disk

39

STOP

2008 Julian Dyke

juliandyke.com

Global Cache Services Past Images


When an instance passes a dirty block to another instance it Flushes redo buffer to redo log Retains past image (PI) of block in buffer cache PI is retained until another instance writes block to disk Used to reduce recovery times Recorded in V$BH.STATUS as PI Based on X$BH.STATE (value 8 in Oracle 10.2)

40

2008 Julian Dyke

juliandyke.com

Global Cache Services Past Images


Buffer Cache UPDATE t1 SET c1 = 7124; 7128; 7127; 7126; 7125; COMMIT; Buffer Cache

7123 7124 7125 7126 7127 7128 7129

7128 7129

UPDATE t1 SET c1 = 7129; COMMIT;

Instance 1

Instance 2

7123 7124 7125 7126 7127

7124 7125 7126 7127 7128

7128

7129

7129 7123

Redo Log 1
41
STOP

BlockUndo/Redoappliedchanges DBWR hasis updatedcolumn a Instance 42updates perform to Instance 1subsequentlybuffer Block 42 1table t1 contains to Block 422 1 1is2 writtenfrom Assumeis needs recovery Instance notmust column Undo/redoupdated in from GCS transferswritten to 42 Block updates block Instance makes in buffer Undo/Redo Crashes 42 is Undo/redo block written Instance written Undo/Redo written to Block 42 is read from to disk ContentsPastdisk Instance 2 lost to recovery1for by block yet Instance 42cachePast Image Instance cachecache 42 backof 1 uses block are blockRedo Logto disk single row Log 1 a tobuffer 1 Redo in 2 Image DBWR 7127 7126 2 7124 1 1329 7128 7125 back to Log Redo

Redo Log 2

2008 Julian Dyke

juliandyke.com

Global Cache Services gc cr grant 2-way wait event


2-way Consistent Read
Source RAC4 - Server RAC3 - LMS1 RAC3 - LMS1 RAC4 - Server Destination RAC3 - LMS1 RAC4 - Server RAC4 - Server RAC3 - LMS1 Description Request file 6 block 69 OK Grant read file 6 block 69 OK Bytes 400 212 276 212

42

2008 Julian Dyke

juliandyke.com

Global Cache Services gc cr grant 2-way wait event


Resource Master

RAC2 1

RAC3 2

1,40 2,44

RAC1
1,40 1318 2,44

RAC4 SELECT c2 FROM t1 WHERE c1 = 1;

43

STOP

2008 Julian Dyke

juliandyke.com

Global Cache Services gc cr multi block request wait event


Source Destination Description Request file 8 blocks 69-73 Bytes 1872 212 772 212

RAC4 - Server RAC3 - LMS1 RAC3 - LMS1 RAC3 - LMS1

RAC4 - Server OK RAC4 - Server Grant file 8 blocks 69-73 to RAC4 OK

RAC4 - Server RAC3 - LMS1

44

2008 Julian Dyke

juliandyke.com

Global Cache Services gc cr multi block request wait event


Resource Master

RAC2 1

RAC3 2

1,40 1,40 1,40 2,44 1,40 2,44 1,40 2,44 2,44 2,44

RAC1
1,40 1,40 1,40 2,44 1318 1,40 2,44 1,40 2,44 2,44 2,44

RAC4 SELECT c2 FROM t1 WHERE c1 = 1;

45

STOP

2008 Julian Dyke

juliandyke.com

Global Cache Services gc cr multi block request wait event


The following 10046/8 trace is for a gc cr multi block request
WAIT #2: nam='gc cr multi block request' ela= 722 file#=4 block#=248 class#=1 obj#=51866 tim=1169728375495574 WAIT #2: nam='db file scattered read' ela= 10437 file#=4 block#=244 blocks=5 obj#=51866 tim=1169728375506092

This trace can be misleading because: the gc cr multi block request specifies the LAST block in the range the gc cr multi block request does not specify how many blocks should be read the gc cr multi block request does not specify how many blocks have been returned from another instance

46

2008 Julian Dyke

juliandyke.com

Global Cache Services UDP Messages


There are two types of message exchanged within RAC These are PROBABLY defined as follows Synchronous These messages require an acknowledgement for each packet In some cases the acknowledgement packet can be larger than the original request e.g. SCN synchronization Asynchronous These messages do not require an individual acknowledgement for each packet e.g. block transfers between instances
47

2008 Julian Dyke

juliandyke.com

Global Cache Services Lock Modes


Lock modes can be: Null Another instance can hold an exclusive or shared lock Shared Another instance can hold a shared lock but not an exclusive lock Exclusive No other instances can hold shared or exclusive locks Locks can also be: Local No other instance has held an exclusive lock Global Another instance has held an exclusive lock in the past
48

2008 Julian Dyke

juliandyke.com

Global Cache Services Fairness Threshold


Intended to prevent unnecessary lock downgrades when other instances only require read-only copies For write to read transfers Writing instance retains X lock Reading instance retains null lock If _fairness_threshold reached then Writing instance downgrades X lock to S lock Reading instance receives S lock _fairness_threshold default value is 4

49

2008 Julian Dyke

juliandyke.com

Global Cache Services Lock Elements


Lock elements are externalized in the V$LOCK_ELEMENT dynamic performance view Based on X$LE Additional information is available in the X$LE view Past image buffers do not have a lock element In OPS one lock element could manage a contiguous range of blocks Still can in RAC using GC_FILES_PER_LOCK parameter Disables Cache Fusion

50

2008 Julian Dyke

juliandyke.com

Global Cache Services Lock Elements


Contain embedded GCS Client structures (KJBL)

Buffer Header

Buffer Header

Buffer Header

Buffer Header

Lock Element GCS Client

Lock Element GCS Client

Lock Element GCS Client

51

2008 Julian Dyke

juliandyke.com

Global Cache Services Memory Structures


Block Header BH BH Lock Element

GCS Client

LE KJBL

LE KJBL KJBL

GCS Shadow

GCS Resource KJBR KJBR

GCS Shadow describes blocks held by other instances, but mastered locally

52

2008 Julian Dyke

juliandyke.com

Global Cache Services Memory Structures


GCS Resources (KJBR) Stored in segmented array Number of GCS resource structures determined by _gcs_resources parameter Externalized in X$KJBR Number of free GCS resource structures in X$KJBRFX GCS Enqueues (Clients / Shadows) (KJBL) GCS clients embedded in lock elements GCS shadows stored in segmented array Number of GCS shadow structures determined by _gcs_shadow_locks parameter Externalized in X$KJBL Number of free GCS shadow structures in X$KJBLFX
53

2008 Julian Dyke

juliandyke.com

Global Cache Services Dumps


To dump the contents of the global cache use:
ALTER SESSION SET EVENTS 'IMMEDIATE TRACE NAME GC_ELEMENTS LEVEL 1';
GLOBAL CACHE ELEMENT DUMP (address: 0x21fecd18): id1: 0x3591 id2: 0x10000 obj: 181 block: (1/13713) lock: SL rls: 0x0000 acq: 0x0000 latch: 0 flags: 0x41 fair: 0 recovery: 0 fpin: 'kdswh05: kdsgrp' bscn: 0x0.18a9c bctx: (nil) write: 0 scan: 0x0 xflg: 0 xid: 0x0.0.0 GCS CLIENT 0x21fecd60,1 sq[(nil),(nil)] resp[(nil),0x3591.10000] pkey 181 grant 1 cvt 0 mdrole 0x21 st 0x20 GRANTQ rl LOCAL master 1 owner 0 sid 0 remote[(nil),0] hist 0x7c history 0x3c.0x1.0x0.0x0.0x0.0x0. cflag 0x0 sender 2 flags 0x0 replay# 0 disk: 0x0000.00000000 write request: 0x0000.00000000 pi scn: 0x0000.00000000 msgseq 0x1 updseq 0x0 reqids[1,0,0] infop 0x0 pkey 181 hv 107 [stat 0x0, 1->1, wm 32767, RMno 0, reminc 6, dom 0] kjga st 0x4, step 0.0.0, cinc 8, rmno 10, flags 0x0 lb 0, hb 0, myb 178, drmb 178, apifrz 0

54

2008 Julian Dyke

juliandyke.com

Global Cache Services Dumps


Continued
GLOBAL CACHE ELEMENT DUMP (address: 0x237f4358): id1: 0x6a39 id2: 0x10000 obj: 74 block: (1/27193) lock: SL rls: 0x0000 acq: 0x0000 latch: 0 flags: 0x41 fair: 0 recovery: 0 fpin: 'kdswh05: kdsgrp' bscn: 0x0.26992 bctx: (nil) write: 0 scan: 0x0 xflg: 0 xid: 0x0.0.0 GCS SHADOW 0x237f43a0,1 sq[0x2ee64e8c,0x2eff3858] resp[0x2ee64e74,0x6a39.10000] pkey 74 grant 1 cvt 0 mdrole 0x21 st 0x40 GRANTQ rl LOCAL master 0 owner 0 sid 0 remote[(nil),0] hist 0x12a5 ..... GCS RESOURCE 0x2ee64e74 hashq [0x2ee61894,0x2ff57390] name[0x6a39.10000] pkey 74 grant 0x2eff3858 cvt (nil) send (nil),0 write (nil),0@65535 flag 0x0 mdrole 0x1 mode 1 scan 0 role LOCAL ..... GCS SHADOW 0x2eff3858,1 sq[0x237f43a0,0x2ee64e8c] resp[0x2ee64e74,0x6a39.10000] pkey 74 grant 1 cvt 0 mdrole 0x21 st 0x40 GRANTQ rl LOCAL master 0 owner 1 sid 0 remote[0x23fea160,1] hist 0x65f ..... GCS SHADOW 0x237f43a0,1 sq[0x2ee64e8c,0x2eff3858] resp[0x2ee64e74,0x6a39.10000] pkey 74 grant 1 cvt 0 mdrole 0x21 st 0x40 GRANTQ rl LOCAL master 0 owner 0 sid 0 remote[(nil),0] hist 0x12a5 .....

55

2008 Julian Dyke

juliandyke.com

Global Cache Services Block Mastering


Each block is mastered on one instance Block DBA is reported by X$KJBR.KJBRNAME Names have the format:
[<block_number>][<file_number>][BL]

For example
[0x137][0x40000][BL]

is file# 4, block# 311 Ordering by X$KJBR.KJBRNAME is difficult because the resource names do not collate when sorted e.g.:
[0x12E][0x40000][BL] [0x12F][0x40000][BL] [0x13][0x40000][BL] [0x130][0x40000][BL] [0x131][0x40000][BL] etc...
56

2008 Julian Dyke

juliandyke.com

Global Cache Services Block Mastering


Some useful functions
CREATE OR REPLACE FUNCTION get_file_number (p_resource_name VARCHAR2) RETURN INTEGER IS pos1 INTEGER := INSTR (p_resource_name,'x',1,2); pos2 INTEGER := INSTR (p_resource_name,']',1,2); s VARCHAR2(30) := SUBSTR (p_resource_name,pos1+1,pos2-pos1-1); BEGIN RETURN TO_NUMBER (s,'XXXXXXXX') / 65536; END; / CREATE OR REPLACE FUNCTION get_block_number (p_resource_name VARCHAR2) RETURN INTEGER IS pos1 INTEGER := INSTR (p_resource_name,'x',1,1); pos2 INTEGER := INSTR (p_resource_name,']',1,1); s VARCHAR2(30) := SUBSTR (p_resource_name,pos1+1,pos2-pos1-1); BEGIN RETURN TO_NUMBER (s,'XXXXXXXX'); END; / 57

2008 Julian Dyke

juliandyke.com

Global Cache Services Block Mastering


In Oracle 10.2 block mastering is determined by _lm_contiguous_res_count Specifies number of contiguous blocks that will hash to the same HV bucket Defaults to 128 For example Instance 1 Instance 0
Start 0x080 0x180 0x280 0x380 0x480 0x580 etc
58

End 0x0FF 0x1FF 0x2FF 0x3FF 0x4FF 0x5FF etc

Start 0x000 0x100 0x200 0x300 0x400 0x500 etc

End 0x07F 0x17F 0x27F 0x37F 0x47F 0x57F etc

2008 Julian Dyke

juliandyke.com

Global Cache Services Block Mastering


In Oracle 10.1 and below block mastering is determined by a hash function Algorithm applied to groups of 1289 contiguous blocks In two node cluster Instance 0 has 645 blocks Instance 1 has 644 blocks etc In three node cluster Instance 0 has 430 blocks Instance 2 has 215 blocks Instance 1 has 430 blocks Instance 2 has 214 blocks etc Beware of small hot tables and indexes....
59

2008 Julian Dyke

juliandyke.com

Global Cache Services Block Mastering


The following table shows that masters are still assigned to ranges of 128 contiguous blocks in a four-node cluster
Start Block 0 128 256 384 512 640 768 896 1024 1280 60 End Block 127 255 383 511 639 767 895 1023 1279 1407 Master 1 2 2 3 3 3 1 0 2 1

2008 Julian Dyke

juliandyke.com

Global Cache Services Dynamic Remastering


In Oracle 9.2 documentation describes dynamic remastering not implemented in code In Oracle 10.1 work at data file level very high threshold so difficult to test does occur on some customer sites In Oracle 10.2 works at segment level thresholds are relatively low

61

2008 Julian Dyke

juliandyke.com

Global Cache Services Dynamic Remastering


Example
SELECT data_object_id FROM dba_objects WHERE owner = 'US01'AND object_name = 'T1'; OBJECT_ID --------52084

To remaster object at current instance use:


ORADEBUG LKDEBUG -m pkey 52084

All blocks now mastered by the current instance To redistribute masters to all available instances use:
ORADEBUG LKDEBUG -m dpkey 52084

Blocks mastered by both (all) instances again


62

2008 Julian Dyke

juliandyke.com

Global Cache Services Dynamic Remastering


Object remastering is recorded in V$GCSPFMASTER_INFO Instances are internally numbered 0, 1 etc Initially contains no rows After remastering object 52084 to instance 0
SELECT object_id, current_master, previous_master FROM v$gcspfmaster_info;

Object ID 52084

Current Master 0

Previous Master 32767

After remastering object 52084 to instance 1


Object ID 52084
63

Current Master 1

Previous Master 0

2008 Julian Dyke

juliandyke.com

Global Cache Services Dynamic Remastering


In Oracle 10.2 and above, information about Dynamic Remastering operations is also reported in the following fixed views X$KJDRMREQ Dynamic Remastering Requests X$KJDRMAFNSTATS File Remastering Statistics X$KJDRMHVSTATS Hash Value Statistics

64

2008 Julian Dyke

juliandyke.com

Global Cache Services Dynamic Remastering


In Oracle 11.1 and above, Dynamic Remastering statistics are reported in V$DYNAMIC_REMASTER_STATS
Col;umn Name REMASTER_OPS REMASTER_TIME REMASTERED_OBJECTS QUIESCE_TIME FREEZE_TIME CLEANUP_TIME REPLAY_TIME FIXWRITE_TIME SYNC_TIME RESOURCES_CLEANED REPLAYED_LOCKS_SENT REPLAYED_LOCKS_RECEIVED CURRENT_OBJECTS 65 Data Type NUMBER NUMBER NUMBER NUMBER NUMBER NUMBER NUMBER NUMBER NUMBER NUMBER NUMBER NUMBER NUMBER

2008 Julian Dyke

juliandyke.com

Global Cache Services Dynamic Remastering


Dynamic remastering is coordinated by the LMD0 background The LMD0 process background process includes limited details of dynamic remastering operations Excessive dynamic remastering can cause instance freezes Observed in both Oracle 10.1 and 10.2 Oracle Support occasionally recommends that dynamic remastering is disabled using the following parameters:
_gc_affinity_time = 0 _gc_undo_affinity=FALSE

66

2008 Julian Dyke

juliandyke.com

Global Cache Services System Change Number


In RAC clusters SCN must be maintained across all nodes in cluster SCN propagation scheme differs according to version In Oracle 10.1and below defaults to Lamport algorithm Lamport in alert.log SCN piggy-backed on GCS/GES messages Recorded in redo log Default delay of 7 seconds In Oracle 10.2 and above defaults to Broadcast on Commit algorithm SCN negotiated immediately Apparently no delay
67

2008 Julian Dyke

juliandyke.com

Global Cache Services System Change Number


System Change Number algorithm is determined by the MAX_COMMIT_PROPAGATION_DELAY parameter In Oracle 10.1 and below Initialization parameter specified in centriseconds Default value is 700 centiseconds (7 seconds) Specifies maximum time taken for a COMMIT on one node to be reflected on other nodes in the cluster For some applications performing rapid updates and queries of the same data from different instances, value must be set to 0 (Broadcast on commit) Examples include: E-Business suite SAP
68

2008 Julian Dyke

juliandyke.com

Global Cache Services System Change Number


In Oracle 10.2 and above Default value of MAX_COMMIT_PROPAGATION_DELAY parameter is 0 SCN broadcast on commit method is used SCN updates are synchronized immediately SCN is synchronized after current read before block updated This ensures correct SCN is written to block

69

2008 Julian Dyke

juliandyke.com

Global Cache Services Broadcast on Commit


Ethernet broadcast is not used SCN is synchronized by updating instance Sends UDP SCN synchronization message to each remote instance Remote instances respond with their current SCN Another round of messages may be required if remote SCNs are more recent than local SCN Synchronization occurs every time an instance needs a new SCN Synchronization is always performed by the updating instance Number of messages = 4 x (number of instances - 1)
70

2008 Julian Dyke

juliandyke.com

Global Cache Services Broadcast on Commit


In a 4-node cluster 12 messages are exchanged
Source RAC4-LMS0 RAC1-LMS0 RAC4-LMS0 RAC2-LMS0 RAC4-LMS0 RAC3-LMS0 RAC1-LMS0 RAC4-LMS0 RAC2-LMS0 RAC4-LMS0 RAC3-LMS0 RAC4-LMS0
71

Destination RAC1-LMS0 RAC4-LMS0 RAC2-LMS0 RAC4-LMS0 RAC3-LMS0 RAC4-LMS0 RAC4-LMS0 RAC1-LMS0 RAC4-LMS0 RAC2-LMS0 RAC4-LMS0 RAC3-LMS0

Description Send current SCN OK Send current SCN OK Send current SCN OK Send current SCN OK Send current SCN OK Send current SCN OK

Bytes 192 212 192 212 192 212 192 212 192 212 192 212

2008 Julian Dyke

juliandyke.com

Global Cache Service Read Consistency


When a read consistent version of a block is requested it may be necessary to apply undo to a more recent version of that block Undo can be applied by LMSn background process in Remote instance Local instance If undo applied by remote instance, any outstanding redo must first be flushed from redo buffer of remote instance to redo log Can have significant performance impact on consistent reads Particularly on extended clusters

72

2008 Julian Dyke

juliandyke.com

Global Cache Service Read Consistency


Statistics on inter-instance consistent reads are reported in V$CR_BLOCK_SERVER Reports statistics for blocks served by local instances to remote instances including Number of consistent reads served Number of current reads served Number of data blocks served Number of undo blocks served Number of undo headers served Number of fairness down converts Number of log flushes Number of times light works rule invoked

73

2008 Julian Dyke

juliandyke.com

Global Cache Service Read Consistency


In theory, once a block has been written to disk, the LMS process will not attempt to read it again when responding to a consistent read request Light Works Rule Prevents LMS processes from going to disk when responding to CR requests for data, undo or undo segment blocks Can prevent LMS process from completing its response to a CR request

74

2008 Julian Dyke

juliandyke.com

Global Cache Service Read Consistency


Uncommitted changes MUST be flushed to the redo log before the LMS process can ship a consistent block to another instance Reading process must wait until redo log changes have been written to redo log by LMS process Bad for standard RAC databases Reads must wait for redo log writes Worse for extended / stretch RAC clusters Increased latency of cross site disk communications

75

2008 Julian Dyke

juliandyke.com

Global Cache Service Read Consistency


For each block on which a consistent read is performed, a redo log flush must first be performed Number of redo log flushes is recorded in the FLUSHES column of V$CR_BLOCK_SERVER Redo log flush time is recorded in the gc cr block flush time statistic for the LMS process will increase time taken to serve consistent block will increase time taken to perform consistent read If LMS processes become very busy, consistent reads will experience high wait times e.g. for a full table scan gc cr multi block request
76

2008 Julian Dyke

juliandyke.com

Global Cache Services Read Consistency


Committed transaction on RAC2 - All blocks still in buffer cache

2 110 3 110

109

108 Buffer Cache Redo Buffer RAC1 1 108 Buffer Cache Redo Buffer RAC2

Redo Log

77

STOP

2008 Julian Dyke

juliandyke.com

Global Cache Services Read Consistency


Committed transaction on RAC2 - Some blocks written to disk

3 110 110

109

108 Buffer Cache Redo Buffer RAC1 1 110 4 2 Buffer Cache Redo Buffer RAC2

Redo Log

78

STOP

2008 Julian Dyke

juliandyke.com

Global Cache Services Read Consistency


Uncommitted transaction on RAC2 - All blocks still in buffer cache

1 3 108 6 108 109 110 110

4 5

109

108 Buffer Cache Redo Buffer RAC1 2 108 Buffer Cache Redo Buffer RAC2

Redo Log

79

STOP

2008 Julian Dyke

juliandyke.com

Global Cache Services Read Consistency


Uncommitted transaction on RAC2 - Some blocks written to disk

2 108 109 110 5 7 6 8 108 Buffer Cache Redo Buffer RAC1 3 110 4 1 Buffer Cache Redo Buffer RAC2 110

109

Redo Log

80

STOP

2008 Julian Dyke

juliandyke.com

Global Cache Services Jumbo Frames


By default Maximum Transmission Unit (MTU) is 1500 MTU includes IP header UDP header Data Requires six packets to transmit one 8192 byte block On some adapters MTU can be increased to around 9000 e.g. Intel PRO/1000 At command line
ifconfig eth1 mtu 9000 up

or in /etc/sysconfig/ifcfg-eth<x>
MTU=9000

81

2008 Julian Dyke

juliandyke.com

Global Cache Services Jumbo Frames


Example - cost of sending on 8192 byte block MTU=1500 (default)
Frame# 1 2 3 4 5 6 Total Ethernet Header 14 14 14 14 14 14 84 IP Header 20 20 20 20 20 20 120 UDP Header 8 8 8 8 8 8 48 Data 1472 1472 1472 1472 1472 840 8200 Ethernet Trailer 4 4 4 4 4 4 24 Total 1518 1518 1518 1518 1518 886 8476

MTU=9000
Frame# 1 Total 82 Ethernet Header 14 14 IP Header 20 20 UDP Header 8 8 Data 8200 8200 Ethernet Trailer 4 4 Total 8246 8246

2008 Julian Dyke

juliandyke.com

Global Cache Services Jumbo Frames


Not all network adapter drivers support jumbo frames Particularly cheap ones.... All network adapters in private interconnect must have same MTU size Switch must also be configured to support jumbo frames Lots of bugs and compatibility issues e.g. Bug 4447620: RAC UDP MTU size restricted to 1500 or 9000 affects 10.1.0.5, 10.2,0.1 fixed in 10.2.0.2 and above

83

2008 Julian Dyke

juliandyke.com

Thank you for listening

Any questions?

info@juliandyke.com
84

2008 Julian Dyke

juliandyke.com

Вам также может понравиться