

DB2 pureScale : A Technology Preview

Oct 21, 2009 ibm.com/db2/labchats

1 © 2009 IBM Corporation


> Executive’s Message

Sal Vella
Vice President, Development,
Distributed Data Servers and Data Warehousing
IBM



> Featured Speaker

Matt Huras
Distinguished Engineer,
DB2 for Linux, UNIX, and Windows
IBM

> Featured Speaker

Aamer Sachedina
Senior Technical Staff Member,
DB2 for Linux, UNIX, and Windows
IBM

Agenda

 Introduction
   Goals & Value Propositions
   Technology Overview

 Technology In-Depth
   Key Concepts & Internals
   Efficient scaling
   Failure modes & recovery automation
   Stealth Maintenance

 Configuration, Monitoring, Tooling
   Cluster configuration and operational status
   Monitoring data
   Client configuration and load balancing
   Solution Packaging


DB2 pureScale : Goals

 Unlimited Capacity
   Any transaction processing or ERP workload
   Start small
   Grow easily, with your business

 Application Transparency
   Avoid the risk and cost of tuning your applications to the database topology

 Continuous Availability
   Maintain service across planned and unplanned events


DB2 pureScale : Technology Overview
Leverage IBM's System z Sysplex Experience and Know-How

[Diagram: clients connect through a single database view to four members; integrated cluster services (CS) run on every host; a cluster interconnect links the members to primary and secondary PowerHA pureScale servers, with per-member logs and the database on shared storage]

 Clients connect anywhere, and see a single database
   Clients connect into any member
   Automatic load balancing and client reroute may change the underlying physical member to which a client is connected

 DB2 engine runs on several host computers
   Members co-operate with each other to provide coherent access to the database from any member

 Integrated cluster services
   Failure detection, recovery automation, cluster file system
   In partnership with STG (GPFS, RSCT) and Tivoli (SA MP)

 Low latency, high speed interconnect
   Special optimizations provide significant advantages on RDMA-capable interconnects (eg. InfiniBand)

 PowerHA pureScale technology from STG
   Efficient global locking and buffer management
   Synchronous duplexing to secondary ensures availability

 Data sharing architecture
   Shared access to database
   Members write to their own logs
   Logs accessible from another host (for recovery)


Scale with Ease

 Without changing applications
   Efficient coherency protocols designed to scale without application change
   Applications automatically and transparently workload balanced across members

 Without administrative complexity
   No data redistribution required

 To 128 members in initial release

[Diagram: single database view across multiple DB2 members, each writing to its own log]


Online Recovery

 A key DB2 pureScale design point is to maximize availability during failure recovery processing

 When a database member fails, only data in-flight on the failed member remains locked during the automated recovery
   In-flight = data being updated on the member at the time it failed

[Chart: % of data available vs. time during a member failure – only in-flight updates are locked during recovery, and full availability returns within ~seconds]


Stealth System Maintenance

 Goal: allow DBAs to apply system maintenance without negotiating an outage window

 Procedure:
   Drain (aka Quiesce)
   Remove & Maintain
   Re-integrate
   Repeat until done

[Diagram: one member at a time is drained and removed from the single database view while the others continue serving clients]


Agenda

 Introduction
   Goals & Value Propositions
   Technology Overview

 Technology In-Depth
   Key Concepts & Internals
   Efficient scaling
   Failure modes & recovery automation
   Stealth Maintenance

 Configuration, Monitoring, Tooling
   Cluster configuration and operational status
   Monitoring data
   Client configuration and load balancing
   Installation


What is a Member?

 A DB2 engine address space
   i.e. a db2sysc process and its threads

 Members share data
   All members access the same shared database
   Aka "data sharing"

 Each member has its own …
   Bufferpools
   Memory regions
   Log files

 Members are logical. Can have …
   1 per machine or LPAR (recommended)
   >1 per machine or LPAR (not recommended)

[Diagram: two members, each a db2sysc process with db2 agents and other threads, log buffer, dbheap and other heaps, bufferpool(s), and its own log, sharing a single database partition]


What is a PowerHA pureScale?

 Software technology that assists in global buffer coherency management and global locking
   Derived from System z Parallel Sysplex & Coupling Facility technology
   Software based

 Services provided include
   Group Bufferpool (GBP)
   Global Lock Management (GLM)
   Shared Communication Area (SCA)

 Members duplex GBP, GLM, SCA state to both a primary and secondary
   Done synchronously
   Duplexing is optional (but recommended)
   Set up automatically, by default

[Diagram: two members duplex GBP/GLM/SCA state to primary and secondary PowerHA pureScale servers, over the shared database (single database partition)]
The Role of the GBP

Scenario: Client A (member 0): Select from T1 where C2=Y. Client B (member 0): Update T1 set C1=X where C2=Y; Commit. Client C (member 1): Select from T1 where C2=Y.

 GBP acts as fast disk cache
   Dirty pages stored in GBP, then later written to disk
   Provides fast retrieval of such pages when needed by other members

 GBP includes a "Page Registry"
   Keeps track of what pages are buffered in each member and at what memory address
   Used for fast invalidation of such pages when they are written to the GBP

 Force-at-Commit (FAC) protocol ensures coherent access to data across members
   DB2 "forces" (writes) updated pages to GBP at COMMIT (or before)
   GBP synchronously invalidates any copies of such pages on other members
     – New references to the page on other members will retrieve new copy from GBP
     – In-progress references to page can continue

[Diagram: client B's commit on member 0 writes the updated page to the GBP, which "silently" invalidates the registered copy on member 1; client C's read on member 1 then retrieves the new page from the GBP]
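To make the page-registry and force-at-commit behaviour above concrete, here is a minimal toy model in Python. This is an illustrative sketch only, not DB2 code: every class and method name is hypothetical, and real invalidation happens via RDMA into member memory rather than a method call.

```python
class GroupBufferPool:
    def __init__(self):
        self.pages = {}      # page_id -> latest committed page image
        self.registry = {}   # page registry: page_id -> members caching it

    def register(self, member, page_id):
        # Record who has this page buffered (real DB2 also records the
        # memory address, enabling silent RDMA invalidation)
        self.registry.setdefault(page_id, set()).add(member)

    def force_at_commit(self, writer, page_id, new_image):
        # Writer forces the updated page to the GBP at COMMIT; the GBP
        # then synchronously invalidates every other cached copy.
        self.pages[page_id] = new_image
        for member in self.registry.get(page_id, set()) - {writer}:
            member.invalidate(page_id)

class Member:
    def __init__(self, gbp):
        self.gbp = gbp
        self.local_pool = {}  # page_id -> (image, valid?)

    def read(self, page_id):
        image, valid = self.local_pool.get(page_id, (None, False))
        if valid:
            return image                  # local bufferpool hit
        image = self.gbp.pages[page_id]   # fast retrieval from the GBP
        self.local_pool[page_id] = (image, True)
        self.gbp.register(self, page_id)
        return image

    def invalidate(self, page_id):
        image, _ = self.local_pool[page_id]
        self.local_pool[page_id] = (image, False)  # next read refetches

gbp = GroupBufferPool()
m0, m1 = Member(gbp), Member(gbp)
gbp.pages["T1:page7"] = "C1=old"
print(m1.read("T1:page7"))                   # member 1 caches the old page
gbp.force_at_commit(m0, "T1:page7", "C1=X")  # member 0 commits its update
print(m1.read("T1:page7"))                   # stale copy invalidated -> C1=X
```

The key point the sketch shows: member 1 never polls for changes; its copy is marked invalid at the writer's commit, and only the next reference pays the cost of refetching from the GBP.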
The Role of the GLM

Scenario: clients A, B, and C issue the same statements as on the previous chart.

 Grants locks to members upon request
   If not already held by another member, or held in a compatible mode

 Maintains global lock state
   Which member has what lock, in what mode
   Also an interest list of pending lock requests for each lock

 Grants pending lock requests when available
   Via asynchronous notification

 Notes
   When a member owns a lock, it may grant further, locally
   "Lock Avoidance": DB2 avoids lock requests when the log sequence number in the page header indicates no update on the page could be uncommitted

[Diagram: member 0 requests and is granted an X lock on a row (eg. R33: M1-X, with M2-S pending in the GLM's lock table), writes the page at commit, and releases the lock; the GLM then grants the pending request to member 1 via asynchronous notification while the GBP "silently" invalidates member 1's page copy. On member 1, an old page LSN means a row lock is not needed; a recent page LSN means it is.]
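The grant/queue/notify cycle above can be sketched as a toy global lock manager. Again this is an illustrative model, not DB2 internals: the mode table, class names, and `lock_needed` helper are all hypothetical, and only two lock modes (S, X) are modelled.

```python
COMPATIBLE = {("S", "S")}  # shared/shared is compatible; any X conflicts

class GlobalLockManager:
    def __init__(self):
        self.holders = {}  # resource -> list of (member, mode) granted
        self.waiters = {}  # resource -> interest list of (member, mode)

    def request(self, member, resource, mode):
        # Grant immediately if compatible with all current holders,
        # otherwise queue the request on the interest list.
        held = self.holders.setdefault(resource, [])
        if all((held_mode, mode) in COMPATIBLE for _, held_mode in held):
            held.append((member, mode))
            return "granted"
        self.waiters.setdefault(resource, []).append((member, mode))
        return "queued"

    def release(self, member, resource):
        # Drop this member's lock, then retry the interest list; in DB2
        # the grant would arrive via asynchronous notification.
        self.holders[resource] = [h for h in self.holders[resource]
                                  if h[0] != member]
        for w_member, w_mode in self.waiters.pop(resource, []):
            self.request(w_member, resource, w_mode)

def lock_needed(page_lsn, oldest_possibly_uncommitted_lsn):
    """Lock avoidance: an old page LSN proves every update on the page
    is already committed, so a reader can skip the lock request."""
    return page_lsn >= oldest_possibly_uncommitted_lsn

glm = GlobalLockManager()
print(glm.request("M1", "row:R33", "X"))  # granted
print(glm.request("M2", "row:R33", "S"))  # queued behind the X lock
glm.release("M1", "row:R33")              # pending S request now granted
print(glm.holders["row:R33"])             # [('M2', 'S')]
```

Note how the queued S request is granted only on release, matching the asynchronous-notification path on the slide, while `lock_needed` captures why most reads never reach the GLM at all.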


Achieving Efficient Scaling : Key Design Points

 Deep RDMA exploitation over low latency fabric
   Enables round-trip response time of ~10-15 microseconds

 Silent Invalidation
   Informing members of page updates requires no CPU cycles on those members
   No interrupt or other message processing required
   Increasingly important as cluster grows

 Hot pages available without disk I/O from GBP memory
   RDMA and dedicated threads enable read page operations in ~10s of microseconds

[Diagram: lock managers and buffer managers on each member exchange RDMA requests with the GBP/GLM/SCA – "Can I have the lock?" / "Yup, here you are." / "Read page" / "Here is the page image"]


Scalability : Example

 Transaction processing workload modeling warehouse & ordering process
   Write transaction rate of 20%
   Typical read/write ratio of many OLTP workloads

 No cluster awareness in the application
   No affinity
   No partitioning
   No routing of transactions to members
   Testing key DB2 pureScale design point

 Configuration
   12 8-core p550 members, 64 GB, 5 GHz each
   Duplexed PowerHA pureScale across 2 additional 8-core p550s, 64 GB, 5 GHz each
   DS8300 storage, 576 15K disks, two 4Gb FC switches
   IBM 20Gb/s IB HCAs
   7874-024 IB switch

[Diagram: clients (2-way x345) connect over 1Gb Ethernet to the p550 members, which connect over the 20Gb IB pureScale interconnect through the 7874-024 switch to the duplexed PowerHA pureScale servers, and over two 4Gb FC switches to DS8300 storage]


Scalability : Example

Throughput vs. 1 member:

  # Members   Speedup
          2     1.98x
          4      3.9x
          8      7.6x
         12    10.4x
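The chart's speedups can be restated as per-member scaling efficiency (speedup divided by member count), which is the usual way to judge how close a cluster comes to linear scaling:

```python
# Speedups reported on this chart, relative to a single member
speedups = {2: 1.98, 4: 3.9, 8: 7.6, 12: 10.4}

for n_members, speedup in speedups.items():
    efficiency = speedup / n_members
    print(f"{n_members:>2} members: {speedup:>5}x speedup, "
          f"{efficiency:.0%} efficiency")
```

At 12 members the reported 10.4x works out to roughly 87% efficiency, with no affinity, partitioning, or routing in the application.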
Member SW Failure : "Member Restart on Home Host"

 kill -9 erroneously issued to a member

 DB2 Cluster Services automatically detects member's death
   Informs other members & PowerHA pureScale servers
   Initiates automated member restart on same ("home") host
   Member restart is like database crash recovery in a single-system database, but is much faster
     • Redo limited to in-flight transactions (due to FAC)
     • Benefits from page cache in GBP

 In the meantime, client connections are transparently re-routed to healthy members
   Based on least load (by default), or
   Pre-designated failover member

 Other members remain fully available throughout – "Online Failover"
   Primary retains update locks held by the member at the time of failure
   Other members can continue to read and update data not locked for write access by the failed member

 Member restart completes
   Retained locks released and all data fully available

[Diagram: Automatic; Ultra Fast; Online – the killed member restarts on its home host, replaying online log records and pages, while updated pages and global locks remain duplexed on the secondary and primary over the shared data]
Member HW Failure : "Member Restart on Guest Host (aka Restart Light)"

 Power cord tripped over accidentally

 DB2 Cluster Services loses heartbeat and declares member down
   Informs other members & PowerHA pureScale servers
   Fences member from logs and data
   Initiates automated member restart on another ("guest") host
     Using reduced, and pre-allocated, memory model
   Member restart is like database crash recovery in a single-system database, but is much faster
     • Redo limited to in-flight transactions (due to FAC)
     • Benefits from page cache in PowerHA pureScale

 In the meantime, client connections are automatically re-routed to healthy members
   Based on least load (by default), or
   Pre-designated failover member

 Other members remain fully available throughout – "Online Failover"
   Primary retains update locks held by the member at the time of failure
   Other members can continue to read and update data not locked for write access by the failed member

 Member restart completes
   Retained locks released and all data fully available

[Diagram: Automatic; Ultra Fast; Online – the failed member is fenced from its log and restarted light on a guest host, replaying log records and pages, while updated pages and global locks remain duplexed on the secondary and primary over the shared data]


Member Failback

 Power restored and system re-booted

 DB2 Cluster Services automatically detects system availability
   Informs other members and PowerHA pureScale servers
   Removes fence
   Brings up member on home host

 Client connections automatically re-routed back to member

[Diagram: the recovered member rejoins the single database view on its home host, with updated pages and global locks still duplexed on the secondary and primary over the shared data]


Primary PowerHA pureScale Failure

 Power cord tripped over accidentally

 DB2 Cluster Services loses heartbeat and declares primary down
   Informs members and secondary
   PowerHA pureScale service momentarily blocked
   All other database activity proceeds normally
     Eg. accessing pages in bufferpool, existing locks, sorting, aggregation, etc

 Members send missing data to secondary
   Eg. read locks

 Secondary becomes primary
   PowerHA pureScale service continues where it left off
   No errors are returned to DB2 members

[Diagram: Automatic; Ultra Fast; Online – the secondary, holding duplexed updated pages and global locks, takes over as primary while members continue against the shared data]


PowerHA pureScale Re-integration

 Power restored and system re-booted

 DB2 Cluster Services automatically detects system availability
   Informs members and primary

 New system assumes secondary role in 'catchup' state
   Members resume duplexing
   Members asynchronously send lock and other state information to secondary
   Members asynchronously castout pages from primary to disk

 Catchup complete
   Secondary in peer state (contains same lock and page state as primary)

[Diagram: members duplex updated pages and global locks to the primary (peer state) while the re-integrated secondary works through catchup state]


Secondary PowerHA pureScale Failure

 Power cord tripped over accidentally

 DB2 Cluster Services loses heartbeat and declares secondary down
   Informs members and primary
   Members stop duplexing

 (Re-integration similar to previous chart)

[Diagram: Automatic; Ultra Fast; Online – members continue against the primary alone until the secondary is re-integrated]


Summary (Single Failures)

For each single failure mode – member, primary PowerHA pureScale, secondary PowerHA pureScale – the other members remain online, and recovery is automatic & transparent. Connections to a failed member transparently move to another member.

[Diagram: three rows of four DB2 members with two CFs, marking in turn a failed member, a failed primary, and a failed secondary]


Simultaneous Failures

For each of the simultaneous failure combinations shown, the other members likewise remain online and recovery is automatic & transparent; connections to failed members transparently move to another member.

[Diagram: three rows of four DB2 members with two CFs, each marking a different combination of simultaneous failures]


"Stealth" Maintenance : Example

 Ensure automatic load balancing is enabled (it is by default)

 db2stop member 3 quiesce

 db2stop instance on host <hostname>

 Perform desired maintenance, eg. install AIX PTF

 db2start instance on host <hostname>

 db2start member 3

[Diagram: member 3 is drained out of the single database view while the remaining members continue serving clients]
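Rolling this procedure across a whole cluster is just the same five steps repeated per host. A small sketch that generates the command sequence (the db2stop/db2start syntax is taken from the slide; the helper itself is hypothetical, not a DB2 tool, and real automation would check each step's result before proceeding):

```python
def stealth_maintenance_plan(members_by_host):
    """Yield the drain / maintain / re-integrate steps, one host at a time."""
    for host, member_id in members_by_host.items():
        yield f"db2stop member {member_id} quiesce"       # drain (quiesce)
        yield f"db2stop instance on host {host}"          # remove
        yield f"# perform maintenance on {host}, eg. install AIX PTF"
        yield f"db2start instance on host {host}"         # re-integrate
        yield f"db2start member {member_id}"

for step in stealth_maintenance_plan({"host3": 3}):
    print(step)
```

Because automatic load balancing is on, the quiesced member's work shifts to the others while each host is out, so no outage window has to be negotiated.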


Agenda

 Introduction
   Goals & Value Propositions
   Technology Overview

 Technology In-Depth
   Key Concepts & Internals
   Efficient scaling
   Failure modes & recovery automation
   Stealth Maintenance

 Configuration, Monitoring, Tooling
   Cluster configuration and operational status
   Monitoring data
   Client configuration and load balancing
   Installation


db2nodes.cfg

[Diagram: four member hosts (host0–host3) and two CF hosts (host4, host5) under a single database view over shared data]

db2nodes.cfg:

0 host0 0 host0ib MEMBER
1 host1 0 host1ib MEMBER
2 host2 0 host2ib MEMBER
3 host3 0 host3ib MEMBER
4 host4 0 host4ib CF
5 host5 0 host5ib CF
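The five columns per line on this slide are node id, hostname, logical port, interconnect netname, and role. A small parser for exactly this format (the helper is illustrative, not a DB2 utility, and real db2nodes.cfg files can carry additional optional fields):

```python
def parse_db2nodes(text):
    """Parse db2nodes.cfg lines of the form shown on the slide."""
    nodes = []
    for line in text.strip().splitlines():
        node_id, hostname, port, netname, role = line.split()
        nodes.append({"id": int(node_id), "host": hostname,
                      "port": int(port), "netname": netname, "role": role})
    return nodes

cfg = """\
0 host0 0 host0ib MEMBER
1 host1 0 host1ib MEMBER
2 host2 0 host2ib MEMBER
3 host3 0 host3ib MEMBER
4 host4 0 host4ib CF
5 host5 0 host5ib CF
"""

nodes = parse_db2nodes(cfg)
members = [n["host"] for n in nodes if n["role"] == "MEMBER"]
cfs = [n["host"] for n in nodes if n["role"] == "CF"]
print(members)  # ['host0', 'host1', 'host2', 'host3']
print(cfs)      # ['host4', 'host5']
```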


Instance and Host Status

> db2start
08/24/2008 00:52:59   0   0   SQL1063N  DB2START processing was successful.
08/24/2008 00:53:00   1   0   SQL1063N  DB2START processing was successful.
08/24/2008 00:53:01   2   0   SQL1063N  DB2START processing was successful.
08/24/2008 00:53:01   3   0   SQL1063N  DB2START processing was successful.
SQL1063N  DB2START processing was successful.

> db2instance -list
ID  TYPE    STATE    HOME_HOST  CURRENT_HOST  ALERT
0   MEMBER  STARTED  host0      host0         NO
1   MEMBER  STARTED  host1      host1         NO
2   MEMBER  STARTED  host2      host2         NO
3   MEMBER  STARTED  host3      host3         NO
4   CF      PRIMARY  host4      host4         NO
5   CF      PEER     host5      host5         NO

HOST_NAME  STATE   INSTANCE_STOPPED  ALERT
host0      ACTIVE  NO                NO
host1      ACTIVE  NO                NO
host2      ACTIVE  NO                NO
host3      ACTIVE  NO                NO
host4      ACTIVE  NO                NO
host5      ACTIVE  NO                NO


Instance Status

Reading the member/CF section of the db2instance -list output:

 ID : node number from db2nodes.cfg
 TYPE : node type (member, CF)
 STATE : node state. For members, typically started, stopped, restarting, or waiting_for_failback; for CFs, typically primary, peer, stopped, catchup(##%), or restarting
 HOME_HOST : target host for the member (the member tries to run on this host when it is available)
 CURRENT_HOST : where the member or CF is currently running. Normally the same as the home host; when it differs, it usually indicates the home host failed and the member is restarting elsewhere
 ALERT : does the member or CF require attention? (Example: member restart failed)

(Same db2start and db2instance -list output as the previous chart.)


Host Status

Reading the host section of the db2instance -list output:

 HOST_NAME : a host that is defined in the instance
 STATE : host state. ACTIVE indicates the host is up and available; INACTIVE indicates the host is down and not available
 INSTANCE_STOPPED : has the instance been disabled on this host? DBAs can stop (aka disable) the instance on a host for the purposes of maintenance (eg. upgrades). While disabled, member restart and other DB2 activity is prevented on the host
 ALERT : does the host require attention? (Examples: power failure, can't communicate with host)

(Same db2start and db2instance -list output as the previous chart.)


Client Connectivity and Workload Balancing

 Run-time load information used to automatically balance load across the cluster (as in System z sysplex)
   Load information of all members kept on each member
   Piggy-backed to clients regularly
   Used to route next connection (or optionally next transaction) to least loaded member
   Routing occurs automatically (transparent to application)

 Failover
   Load of failed member evenly distributed to surviving members automatically

 Fallback
   Once the failed member is back online, fallback does the reverse
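The balancing policy above can be sketched as a toy model: route each new connection to the least loaded member, and spread a failed member's share across the survivors. Purely illustrative; real clients use the piggy-backed server load list rather than these hypothetical helpers.

```python
def route_connection(member_loads):
    """Pick the member reporting the lowest load."""
    return min(member_loads, key=member_loads.get)

def fail_member(member_loads, failed):
    """Distribute the failed member's load evenly over the survivors."""
    share = member_loads.pop(failed) / len(member_loads)
    for member in member_loads:
        member_loads[member] += share
    return member_loads

loads = {"member0": 40.0, "member1": 25.0, "member2": 35.0}
print(route_connection(loads))  # member1 (least loaded)
fail_member(loads, "member0")
print(loads)                    # member0's 40 split 20/20 over survivors
print(route_connection(loads))  # member1 again (45.0 vs 55.0)
```

Fallback would reverse `fail_member`: once the member is online and reporting low load again, `route_connection` naturally starts sending new work its way.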


Optional Affinity-based Routing

 Allows you to target different groups of clients or workloads to different members in the cluster

 Maintained after failover …
   … and fallback

 Example use cases
   Consolidate separate workloads/applications on same database infrastructure
   Minimize total resource requirements for disjoint workloads

 Easily configured through client configuration
   db2dsdriver.cfg file

[Diagram: app server groups A–D each routed to a different member]


Operational Monitoring

 New monitoring views and SQL functions
   Global locking and global bufferpool statistics
   Drill down into other PowerHA pureScale internal statistics
   Cluster communications time
   Cross-member page access statistics

 Drill down per member … or get global view
   Available from any member

 Event monitors "always available" mode
   DB2 pureScale chooses initial member automatically
   Fails over automatically if member fails

 Various new monitoring elements
   Example, GBP tuning related elements (partial list):
     – DATA_GBP_L_READS
     – DATA_GBP_P_READS
     – INDEX_GBP_L_READS
     – INDEX_GBP_P_READS

[Diagram: member 0 does 100 logical reads with a 95% LBP hit ratio (5 GBP logical reads, 4 pages returned from the GBP, 1 GBP physical read from disk); member 1 does 50 logical reads with an 80% LBP hit ratio (10 GBP logical reads, 8 pages returned, 2 GBP physical reads); overall GBP hit ratio = 80%]
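The hit ratios on this chart can be recomputed from counters like those listed above. In this toy model, a GBP logical read is a local-bufferpool miss, and a GBP physical read happens when the GBP does not hold the page either; the variable names are illustrative, not the actual monitor element names.

```python
# Per-member figures from the chart
member_stats = {
    "member0": {"lbp_l_reads": 100, "gbp_l_reads": 5,  "gbp_p_reads": 1},
    "member1": {"lbp_l_reads": 50,  "gbp_l_reads": 10, "gbp_p_reads": 2},
}

# Local bufferpool hit ratio: fraction of logical reads satisfied locally
for name, s in member_stats.items():
    lbp_hit = 1 - s["gbp_l_reads"] / s["lbp_l_reads"]
    print(f"{name}: LBP hit ratio = {lbp_hit:.0%}")   # 95% and 80%

# Overall GBP hit ratio: GBP requests satisfied without a disk read
gbp_l = sum(s["gbp_l_reads"] for s in member_stats.values())
gbp_p = sum(s["gbp_p_reads"] for s in member_stats.values())
print(f"Overall GBP hit ratio = {1 - gbp_p / gbp_l:.0%}")  # 80%
```

This matches the chart: 15 GBP logical reads cluster-wide, of which 12 pages came back from GBP memory and only 3 needed disk I/O.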


DB2 pureScale : A Complete Solution

 DB2 pureScale is a complete software solution
   Comprised of tightly integrated subcomponents

 Single install invocation
   Installs all components across desired hosts
   Automatically configures best practices

 No cluster manager scripting or configuration required
   This is set up automatically, upon installation

[Diagram: clients, single database view, members with cluster services (CS), cluster interconnect, per-member logs, and the database on shared storage]


DB2 pureScale

 Unlimited Capacity
   Start small
   Grow easily, with your business

 Application Transparency
   Avoid the risk and cost of tuning your applications to the database topology

 Continuous Availability
   Maintain service across planned and unplanned events


> Questions

Thank You!

ibm.com/db2/labchats

Thank you for attending!
