

DB2 pureScale : A Technology Preview

Oct 21, 2009 ibm.com/db2/labchats

1 © 2009 IBM Corporation


> Executive’s Message

Sal Vella
Vice President, Development,
Distributed Data Servers and Data Warehousing
IBM



> Featured Speaker

Matt Huras
Distinguished Engineer,
DB2 for Linux, UNIX, and Windows
IBM

> Featured Speaker

Aamer Sachedina
Senior Technical Staff Member,
DB2 for Linux, UNIX, and Windows
IBM

Agenda

 Introduction
   Goals & Value Propositions
   Technology Overview

 Technology In-Depth
   Key Concepts & Internals
   Efficient scaling
   Failure modes & recovery automation
   Stealth Maintenance

 Configuration, Monitoring, Tooling
   Cluster configuration and operational status
   Monitoring data
   Client configuration and load balancing
   Solution Packaging


DB2 pureScale : Goals

 Unlimited Capacity
   Any transaction processing or ERP workload
   Start small
   Grow easily, with your business

 Application Transparency
   Avoid the risk and cost of tuning your applications to the database topology

 Continuous Availability
   Maintain service across planned and unplanned events


DB2 pureScale : Technology Overview
Leverage IBM's System z Sysplex Experience and Know-How

[Diagram: clients connect through a single database view to four members; integrated cluster services (CS) run on every host; a cluster interconnect links the members to primary and secondary PowerHA pureScale servers, with per-member logs and the database on shared storage]

 Clients connect anywhere, and see a single database
   Clients connect into any member
   Automatic load balancing and client reroute may change the underlying physical member to which a client is connected

 DB2 engine runs on several host computers
   Members co-operate with each other to provide coherent access to the database from any member

 Integrated cluster services
   Failure detection, recovery automation, cluster file system
   In partnership with STG (GPFS, RSCT) and Tivoli (SA MP)

 Low latency, high speed interconnect
   Special optimizations provide significant advantages on RDMA-capable interconnects (eg. InfiniBand)

 PowerHA pureScale technology from STG
   Efficient global locking and buffer management
   Synchronous duplexing to secondary ensures availability

 Data sharing architecture
   Shared access to database
   Members write to their own logs
   Logs accessible from another host (for recovery)


Scale with Ease

 Without changing applications
   Efficient coherency protocols designed to scale without application change
   Applications automatically and transparently workload balanced across members

 Without administrative complexity
   No data redistribution required

 To 128 members in initial release

[Diagram: single database view across multiple DB2 members, each writing to its own log]


Online Recovery

 A key DB2 pureScale design point is to maximize availability during failure recovery processing

 When a database member fails, only data in-flight on the failed member remains locked during the automated recovery
   In-flight = data being updated on the member at the time it failed

[Chart: % of data available vs. time during a member failure – only in-flight updates are locked during recovery, and full availability returns within ~seconds]


Stealth System Maintenance

 Goal: allow DBAs to apply system maintenance without negotiating an outage window

 Procedure:
   Drain (aka Quiesce)
   Remove & Maintain
   Re-integrate
   Repeat until done

[Diagram: one member at a time is drained and removed from the single database view while the others continue serving clients]


Agenda

 Introduction
   Goals & Value Propositions
   Technology Overview

 Technology In-Depth
   Key Concepts & Internals
   Efficient scaling
   Failure modes & recovery automation
   Stealth Maintenance

 Configuration, Monitoring, Tooling
   Cluster configuration and operational status
   Monitoring data
   Client configuration and load balancing
   Installation


What is a Member?

 A DB2 engine address space
   i.e. a db2sysc process and its threads

 Members share data
   All members access the same shared database
   Aka "data sharing"

 Each member has its own …
   Bufferpools
   Memory regions
   Log files

 Members are logical. Can have …
   1 per machine or LPAR (recommended)
   >1 per machine or LPAR (not recommended)

[Diagram: two members, each a db2sysc process with db2 agents and other threads, log buffer, dbheap and other heaps, bufferpool(s), and its own log, sharing a single database partition]


What is a PowerHA pureScale?

 Software technology that assists in global buffer coherency management and global locking
   Derived from System z Parallel Sysplex & Coupling Facility technology
   Software based

 Services provided include
   Group Bufferpool (GBP)
   Global Lock Management (GLM)
   Shared Communication Area (SCA)

 Members duplex GBP, GLM, SCA state to both a primary and secondary
   Done synchronously
   Duplexing is optional (but recommended)
   Set up automatically, by default

[Diagram: two members duplex GBP/GLM/SCA state to primary and secondary PowerHA pureScale servers, over the shared database (single database partition)]
The Role of the GBP

Scenario: Client A (member 0): Select from T1 where C2=Y. Client B (member 0): Update T1 set C1=X where C2=Y; Commit. Client C (member 1): Select from T1 where C2=Y.

 GBP acts as fast disk cache
   Dirty pages stored in GBP, then later written to disk
   Provides fast retrieval of such pages when needed by other members

 GBP includes a "Page Registry"
   Keeps track of what pages are buffered in each member and at what memory address
   Used for fast invalidation of such pages when they are written to the GBP

 Force-at-Commit (FAC) protocol ensures coherent access to data across members
   DB2 "forces" (writes) updated pages to GBP at COMMIT (or before)
   GBP synchronously invalidates any copies of such pages on other members
     – New references to the page on other members will retrieve new copy from GBP
     – In-progress references to page can continue

[Diagram: client B's commit on member 0 writes the updated page to the GBP, which "silently" invalidates the registered copy on member 1; client C's read on member 1 then retrieves the new page from the GBP]
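To make the page-registry and force-at-commit behaviour above concrete, here is a minimal toy model in Python. This is an illustrative sketch only, not DB2 code: every class and method name is hypothetical, and real invalidation happens via RDMA into member memory rather than a method call.

```python
class GroupBufferPool:
    def __init__(self):
        self.pages = {}      # page_id -> latest committed page image
        self.registry = {}   # page registry: page_id -> members caching it

    def register(self, member, page_id):
        # Record who has this page buffered (real DB2 also records the
        # memory address, enabling silent RDMA invalidation)
        self.registry.setdefault(page_id, set()).add(member)

    def force_at_commit(self, writer, page_id, new_image):
        # Writer forces the updated page to the GBP at COMMIT; the GBP
        # then synchronously invalidates every other cached copy.
        self.pages[page_id] = new_image
        for member in self.registry.get(page_id, set()) - {writer}:
            member.invalidate(page_id)

class Member:
    def __init__(self, gbp):
        self.gbp = gbp
        self.local_pool = {}  # page_id -> (image, valid?)

    def read(self, page_id):
        image, valid = self.local_pool.get(page_id, (None, False))
        if valid:
            return image                  # local bufferpool hit
        image = self.gbp.pages[page_id]   # fast retrieval from the GBP
        self.local_pool[page_id] = (image, True)
        self.gbp.register(self, page_id)
        return image

    def invalidate(self, page_id):
        image, _ = self.local_pool[page_id]
        self.local_pool[page_id] = (image, False)  # next read refetches

gbp = GroupBufferPool()
m0, m1 = Member(gbp), Member(gbp)
gbp.pages["T1:page7"] = "C1=old"
print(m1.read("T1:page7"))                   # member 1 caches the old page
gbp.force_at_commit(m0, "T1:page7", "C1=X")  # member 0 commits its update
print(m1.read("T1:page7"))                   # stale copy invalidated -> C1=X
```

The key point the sketch shows: member 1 never polls for changes; its copy is marked invalid at the writer's commit, and only the next reference pays the cost of refetching from the GBP.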
The Role of the GLM

Scenario: clients A, B, and C issue the same statements as on the previous chart.

 Grants locks to members upon request
   If not already held by another member, or held in a compatible mode

 Maintains global lock state
   Which member has what lock, in what mode
   Also an interest list of pending lock requests for each lock

 Grants pending lock requests when available
   Via asynchronous notification

 Notes
   When a member owns a lock, it may grant further, locally
   "Lock Avoidance": DB2 avoids lock requests when the log sequence number in the page header indicates no update on the page could be uncommitted

[Diagram: member 0 requests and is granted an X lock on a row (eg. R33: M1-X, with M2-S pending in the GLM's lock table), writes the page at commit, and releases the lock; the GLM then grants the pending request to member 1 via asynchronous notification while the GBP "silently" invalidates member 1's page copy. On member 1, an old page LSN means a row lock is not needed; a recent page LSN means it is.]
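The grant/queue/notify cycle above can be sketched as a toy global lock manager. Again this is an illustrative model, not DB2 internals: the mode table, class names, and `lock_needed` helper are all hypothetical, and only two lock modes (S, X) are modelled.

```python
COMPATIBLE = {("S", "S")}  # shared/shared is compatible; any X conflicts

class GlobalLockManager:
    def __init__(self):
        self.holders = {}  # resource -> list of (member, mode) granted
        self.waiters = {}  # resource -> interest list of (member, mode)

    def request(self, member, resource, mode):
        # Grant immediately if compatible with all current holders,
        # otherwise queue the request on the interest list.
        held = self.holders.setdefault(resource, [])
        if all((held_mode, mode) in COMPATIBLE for _, held_mode in held):
            held.append((member, mode))
            return "granted"
        self.waiters.setdefault(resource, []).append((member, mode))
        return "queued"

    def release(self, member, resource):
        # Drop this member's lock, then retry the interest list; in DB2
        # the grant would arrive via asynchronous notification.
        self.holders[resource] = [h for h in self.holders[resource]
                                  if h[0] != member]
        for w_member, w_mode in self.waiters.pop(resource, []):
            self.request(w_member, resource, w_mode)

def lock_needed(page_lsn, oldest_possibly_uncommitted_lsn):
    """Lock avoidance: an old page LSN proves every update on the page
    is already committed, so a reader can skip the lock request."""
    return page_lsn >= oldest_possibly_uncommitted_lsn

glm = GlobalLockManager()
print(glm.request("M1", "row:R33", "X"))  # granted
print(glm.request("M2", "row:R33", "S"))  # queued behind the X lock
glm.release("M1", "row:R33")              # pending S request now granted
print(glm.holders["row:R33"])             # [('M2', 'S')]
```

Note how the queued S request is granted only on release, matching the asynchronous-notification path on the slide, while `lock_needed` captures why most reads never reach the GLM at all.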


Achieving Efficient Scaling : Key Design Points

 Deep RDMA exploitation over low latency fabric
   Enables round-trip response time of ~10-15 microseconds

 Silent Invalidation
   Informing members of page updates requires no CPU cycles on those members
   No interrupt or other message processing required
   Increasingly important as cluster grows

 Hot pages available without disk I/O from GBP memory
   RDMA and dedicated threads enable read page operations in ~10s of microseconds

[Diagram: lock managers and buffer managers on each member exchange RDMA requests with the GBP/GLM/SCA – "Can I have the lock?" / "Yup, here you are." / "Read page" / "Here is the page image"]


Scalability : Example

 Transaction processing workload modeling warehouse & ordering process
   Write transaction rate of 20%
   Typical read/write ratio of many OLTP workloads

 No cluster awareness in the application
   No affinity
   No partitioning
   No routing of transactions to members
   Testing key DB2 pureScale design point

 Configuration
   12 8-core p550 members, 64 GB, 5 GHz each
   Duplexed PowerHA pureScale across 2 additional 8-core p550s, 64 GB, 5 GHz each
   DS8300 storage, 576 15K disks, two 4Gb FC switches
   IBM 20Gb/s IB HCAs
   7874-024 IB switch

[Diagram: clients (2-way x345) connect over 1Gb Ethernet to the p550 members, which connect over the 20Gb IB pureScale interconnect through the 7874-024 switch to the duplexed PowerHA pureScale servers, and over two 4Gb FC switches to DS8300 storage]


Scalability : Example

Throughput vs. 1 member:

  # Members   Speedup
          2     1.98x
          4      3.9x
          8      7.6x
         12    10.4x
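The chart's speedups can be restated as per-member scaling efficiency (speedup divided by member count), which is the usual way to judge how close a cluster comes to linear scaling:

```python
# Speedups reported on this chart, relative to a single member
speedups = {2: 1.98, 4: 3.9, 8: 7.6, 12: 10.4}

for n_members, speedup in speedups.items():
    efficiency = speedup / n_members
    print(f"{n_members:>2} members: {speedup:>5}x speedup, "
          f"{efficiency:.0%} efficiency")
```

At 12 members the reported 10.4x works out to roughly 87% efficiency, with no affinity, partitioning, or routing in the application.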
Member SW Failure : "Member Restart on Home Host"

 kill -9 erroneously issued to a member

 DB2 Cluster Services automatically detects member's death
   Informs other members & PowerHA pureScale servers
   Initiates automated member restart on same ("home") host
   Member restart is like database crash recovery in a single-system database, but is much faster
     • Redo limited to in-flight transactions (due to FAC)
     • Benefits from page cache in GBP

 In the meantime, client connections are transparently re-routed to healthy members
   Based on least load (by default), or
   Pre-designated failover member

 Other members remain fully available throughout – "Online Failover"
   Primary retains update locks held by the member at the time of failure
   Other members can continue to read and update data not locked for write access by the failed member

 Member restart completes
   Retained locks released and all data fully available

[Diagram: Automatic; Ultra Fast; Online – the killed member restarts on its home host, replaying online log records and pages, while updated pages and global locks remain duplexed on the secondary and primary over the shared data]
Member HW Failure : "Member Restart on Guest Host (aka Restart Light)"

 Power cord tripped over accidentally

 DB2 Cluster Services loses heartbeat and declares member down
   Informs other members & PowerHA pureScale servers
   Fences member from logs and data
   Initiates automated member restart on another ("guest") host
     Using reduced, and pre-allocated, memory model
   Member restart is like database crash recovery in a single-system database, but is much faster
     • Redo limited to in-flight transactions (due to FAC)
     • Benefits from page cache in PowerHA pureScale

 In the meantime, client connections are automatically re-routed to healthy members
   Based on least load (by default), or
   Pre-designated failover member

 Other members remain fully available throughout – "Online Failover"
   Primary retains update locks held by the member at the time of failure
   Other members can continue to read and update data not locked for write access by the failed member

 Member restart completes
   Retained locks released and all data fully available

[Diagram: Automatic; Ultra Fast; Online – the failed member is fenced from its log and restarted light on a guest host, replaying log records and pages, while updated pages and global locks remain duplexed on the secondary and primary over the shared data]


Member Failback

 Power restored and system re-booted

 DB2 Cluster Services automatically detects system availability
   Informs other members and PowerHA pureScale servers
   Removes fence
   Brings up member on home host

 Client connections automatically re-routed back to member

[Diagram: the recovered member rejoins the single database view on its home host, with updated pages and global locks still duplexed on the secondary and primary over the shared data]


Primary PowerHA pureScale Failure

 Power cord tripped over accidentally

 DB2 Cluster Services loses heartbeat and declares primary down
   Informs members and secondary
   PowerHA pureScale service momentarily blocked
   All other database activity proceeds normally
     Eg. accessing pages in bufferpool, existing locks, sorting, aggregation, etc

 Members send missing data to secondary
   Eg. read locks

 Secondary becomes primary
   PowerHA pureScale service continues where it left off
   No errors are returned to DB2 members

[Diagram: Automatic; Ultra Fast; Online – the secondary, holding duplexed updated pages and global locks, takes over as primary while members continue against the shared data]


PowerHA pureScale Re-integration

 Power restored and system re-booted

 DB2 Cluster Services automatically detects system availability
   Informs members and primary

 New system assumes secondary role in 'catchup' state
   Members resume duplexing
   Members asynchronously send lock and other state information to secondary
   Members asynchronously castout pages from primary to disk

 Catchup complete
   Secondary in peer state (contains same lock and page state as primary)

[Diagram: members duplex updated pages and global locks to the primary (peer state) while the re-integrated secondary works through catchup state]


Secondary PowerHA pureScale Failure

 Power cord tripped over accidentally

 DB2 Cluster Services loses heartbeat and declares secondary down
   Informs members and primary
   Members stop duplexing

 (Re-integration similar to previous chart)

[Diagram: Automatic; Ultra Fast; Online – members continue against the primary alone until the secondary is re-integrated]


Summary (Single Failures)

For each single failure mode – member, primary PowerHA pureScale, secondary PowerHA pureScale – the other members remain online, and recovery is automatic & transparent. Connections to a failed member transparently move to another member.

[Diagram: three rows of four DB2 members with two CFs, marking in turn a failed member, a failed primary, and a failed secondary]


Simultaneous Failures

For each of the simultaneous failure combinations shown, the other members likewise remain online and recovery is automatic & transparent; connections to failed members transparently move to another member.

[Diagram: three rows of four DB2 members with two CFs, each marking a different combination of simultaneous failures]


"Stealth" Maintenance : Example

 Ensure automatic load balancing is enabled (it is by default)

 db2stop member 3 quiesce

 db2stop instance on host <hostname>

 Perform desired maintenance, eg. install AIX PTF

 db2start instance on host <hostname>

 db2start member 3

[Diagram: member 3 is drained out of the single database view while the remaining members continue serving clients]
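Rolling this procedure across a whole cluster is just the same five steps repeated per host. A small sketch that generates the command sequence (the db2stop/db2start syntax is taken from the slide; the helper itself is hypothetical, not a DB2 tool, and real automation would check each step's result before proceeding):

```python
def stealth_maintenance_plan(members_by_host):
    """Yield the drain / maintain / re-integrate steps, one host at a time."""
    for host, member_id in members_by_host.items():
        yield f"db2stop member {member_id} quiesce"       # drain (quiesce)
        yield f"db2stop instance on host {host}"          # remove
        yield f"# perform maintenance on {host}, eg. install AIX PTF"
        yield f"db2start instance on host {host}"         # re-integrate
        yield f"db2start member {member_id}"

for step in stealth_maintenance_plan({"host3": 3}):
    print(step)
```

Because automatic load balancing is on, the quiesced member's work shifts to the others while each host is out, so no outage window has to be negotiated.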


Agenda

 Introduction
   Goals & Value Propositions
   Technology Overview

 Technology In-Depth
   Key Concepts & Internals
   Efficient scaling
   Failure modes & recovery automation
   Stealth Maintenance

 Configuration, Monitoring, Tooling
   Cluster configuration and operational status
   Monitoring data
   Client configuration and load balancing
   Installation


db2nodes.cfg

[Diagram: four member hosts (host0–host3) and two CF hosts (host4, host5) under a single database view over shared data]

db2nodes.cfg:

0 host0 0 host0ib MEMBER
1 host1 0 host1ib MEMBER
2 host2 0 host2ib MEMBER
3 host3 0 host3ib MEMBER
4 host4 0 host4ib CF
5 host5 0 host5ib CF
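The five columns per line on this slide are node id, hostname, logical port, interconnect netname, and role. A small parser for exactly this format (the helper is illustrative, not a DB2 utility, and real db2nodes.cfg files can carry additional optional fields):

```python
def parse_db2nodes(text):
    """Parse db2nodes.cfg lines of the form shown on the slide."""
    nodes = []
    for line in text.strip().splitlines():
        node_id, hostname, port, netname, role = line.split()
        nodes.append({"id": int(node_id), "host": hostname,
                      "port": int(port), "netname": netname, "role": role})
    return nodes

cfg = """\
0 host0 0 host0ib MEMBER
1 host1 0 host1ib MEMBER
2 host2 0 host2ib MEMBER
3 host3 0 host3ib MEMBER
4 host4 0 host4ib CF
5 host5 0 host5ib CF
"""

nodes = parse_db2nodes(cfg)
members = [n["host"] for n in nodes if n["role"] == "MEMBER"]
cfs = [n["host"] for n in nodes if n["role"] == "CF"]
print(members)  # ['host0', 'host1', 'host2', 'host3']
print(cfs)      # ['host4', 'host5']
```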


Instance and Host Status

> db2start
08/24/2008 00:52:59   0   0   SQL1063N  DB2START processing was successful.
08/24/2008 00:53:00   1   0   SQL1063N  DB2START processing was successful.
08/24/2008 00:53:01   2   0   SQL1063N  DB2START processing was successful.
08/24/2008 00:53:01   3   0   SQL1063N  DB2START processing was successful.
SQL1063N  DB2START processing was successful.

> db2instance -list
ID  TYPE    STATE    HOME_HOST  CURRENT_HOST  ALERT
0   MEMBER  STARTED  host0      host0         NO
1   MEMBER  STARTED  host1      host1         NO
2   MEMBER  STARTED  host2      host2         NO
3   MEMBER  STARTED  host3      host3         NO
4   CF      PRIMARY  host4      host4         NO
5   CF      PEER     host5      host5         NO

HOST_NAME  STATE   INSTANCE_STOPPED  ALERT
host0      ACTIVE  NO                NO
host1      ACTIVE  NO                NO
host2      ACTIVE  NO                NO
host3      ACTIVE  NO                NO
host4      ACTIVE  NO                NO
host5      ACTIVE  NO                NO


Instance Status

Reading the member/CF section of the db2instance -list output:

 ID : node number from db2nodes.cfg
 TYPE : node type (member, CF)
 STATE : node state. For members, typically started, stopped, restarting, or waiting_for_failback; for CFs, typically primary, peer, stopped, catchup(##%), or restarting
 HOME_HOST : target host for the member (the member tries to run on this host when it is available)
 CURRENT_HOST : where the member or CF is currently running. Normally the same as the home host; when it differs, it usually indicates the home host failed and the member is restarting elsewhere
 ALERT : does the member or CF require attention? (Example: member restart failed)

(Same db2start and db2instance -list output as the previous chart.)


Host Status

Reading the host section of the db2instance -list output:

 HOST_NAME : a host that is defined in the instance
 STATE : host state. ACTIVE indicates the host is up and available; INACTIVE indicates the host is down and not available
 INSTANCE_STOPPED : has the instance been disabled on this host? DBAs can stop (aka disable) the instance on a host for the purposes of maintenance (eg. upgrades). While disabled, member restart and other DB2 activity is prevented on the host
 ALERT : does the host require attention? (Examples: power failure, can't communicate with host)

(Same db2start and db2instance -list output as the previous chart.)


Client Connectivity and Workload Balancing

 Run-time load information used to automatically balance load across the cluster (as in System z sysplex)
   Load information of all members kept on each member
   Piggy-backed to clients regularly
   Used to route next connection (or optionally next transaction) to least loaded member
   Routing occurs automatically (transparent to application)

 Failover
   Load of failed member evenly distributed to surviving members automatically

 Fallback
   Once the failed member is back online, fallback does the reverse
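The balancing policy above can be sketched as a toy model: route each new connection to the least loaded member, and spread a failed member's share across the survivors. Purely illustrative; real clients use the piggy-backed server load list rather than these hypothetical helpers.

```python
def route_connection(member_loads):
    """Pick the member reporting the lowest load."""
    return min(member_loads, key=member_loads.get)

def fail_member(member_loads, failed):
    """Distribute the failed member's load evenly over the survivors."""
    share = member_loads.pop(failed) / len(member_loads)
    for member in member_loads:
        member_loads[member] += share
    return member_loads

loads = {"member0": 40.0, "member1": 25.0, "member2": 35.0}
print(route_connection(loads))  # member1 (least loaded)
fail_member(loads, "member0")
print(loads)                    # member0's 40 split 20/20 over survivors
print(route_connection(loads))  # member1 again (45.0 vs 55.0)
```

Fallback would reverse `fail_member`: once the member is online and reporting low load again, `route_connection` naturally starts sending new work its way.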


Optional Affinity-based Routing

 Allows you to target different groups of clients or workloads to different members in the cluster

 Maintained after failover …
   … and fallback

 Example use cases
   Consolidate separate workloads/applications on same database infrastructure
   Minimize total resource requirements for disjoint workloads

 Easily configured through client configuration
   db2dsdriver.cfg file

[Diagram: app server groups A–D each routed to a different member]


Operational Monitoring

 New monitoring views and SQL functions
   Global locking and global bufferpool statistics
   Drill down into other PowerHA pureScale internal statistics
   Cluster communications time
   Cross-member page access statistics

 Drill down per member … or get global view
   Available from any member

 Event monitors "always available" mode
   DB2 pureScale chooses initial member automatically
   Fails over automatically if member fails

 Various new monitoring elements
   Example, GBP tuning related elements (partial list):
     – DATA_GBP_L_READS
     – DATA_GBP_P_READS
     – INDEX_GBP_L_READS
     – INDEX_GBP_P_READS

[Diagram: member 0 does 100 logical reads with a 95% LBP hit ratio (5 GBP logical reads, 4 pages returned from the GBP, 1 GBP physical read from disk); member 1 does 50 logical reads with an 80% LBP hit ratio (10 GBP logical reads, 8 pages returned, 2 GBP physical reads); overall GBP hit ratio = 80%]
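The hit ratios on this chart can be recomputed from counters like those listed above. In this toy model, a GBP logical read is a local-bufferpool miss, and a GBP physical read happens when the GBP does not hold the page either; the variable names are illustrative, not the actual monitor element names.

```python
# Per-member figures from the chart
member_stats = {
    "member0": {"lbp_l_reads": 100, "gbp_l_reads": 5,  "gbp_p_reads": 1},
    "member1": {"lbp_l_reads": 50,  "gbp_l_reads": 10, "gbp_p_reads": 2},
}

# Local bufferpool hit ratio: fraction of logical reads satisfied locally
for name, s in member_stats.items():
    lbp_hit = 1 - s["gbp_l_reads"] / s["lbp_l_reads"]
    print(f"{name}: LBP hit ratio = {lbp_hit:.0%}")   # 95% and 80%

# Overall GBP hit ratio: GBP requests satisfied without a disk read
gbp_l = sum(s["gbp_l_reads"] for s in member_stats.values())
gbp_p = sum(s["gbp_p_reads"] for s in member_stats.values())
print(f"Overall GBP hit ratio = {1 - gbp_p / gbp_l:.0%}")  # 80%
```

This matches the chart: 15 GBP logical reads cluster-wide, of which 12 pages came back from GBP memory and only 3 needed disk I/O.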


DB2 pureScale : A Complete Solution

 DB2 pureScale is a complete software solution
   Comprised of tightly integrated subcomponents

 Single install invocation
   Installs all components across desired hosts
   Automatically configures best practices

 No cluster manager scripting or configuration required
   This is set up automatically, upon installation

[Diagram: clients, single database view, members with cluster services (CS), cluster interconnect, per-member logs, and the database on shared storage]


DB2 pureScale

 Unlimited Capacity
   Start small
   Grow easily, with your business

 Application Transparency
   Avoid the risk and cost of tuning your applications to the database topology

 Continuous Availability
   Maintain service across planned and unplanned events


> Questions

Thank You!

ibm.com/db2/labchats

Thank you for attending!
