
DB2 pureScale

Steve Rees, IBM Canada Ltd. -- March 15, 2010


Agenda

- Introduction to pureScale
  - 10,000 ft view
  - Goals & value for users
  - Technology overview
- Key Concepts & Internals
  - Major components & moving parts
  - Efficient scaling over low-latency interconnect
- Interconnect issues
  - Requirements, futures, etc.

DB2 pureScale evolution from 10,000 ft

- Before pureScale: DB2 for Linux/Unix/Windows (LUW) available in
  - SMP configurations
  - Shared-nothing clusters
- DB2 on the mainframe
  - SMP configurations
  - Shared-data clusters with System z Sysplex
  - Very well known for continuous availability and OLTP scalability
- DB2 pureScale merges interconnect and database technologies to create scale-out, highly available database clusters
- Technology cross-pollination within the DB2 family

[Figure: "Converging DB2 Technologies" -- timeline showing DB2 for System z Sysplex and DB2 LUW converging into DB2 LUW pureScale]

DB2 pureScale: Goals

- Unlimited Capacity
  - Any transaction processing or ERP workload
  - Start small, grow easily
- Application Transparency
  - Avoids the risk and cost of tuning applications to the database topology
- Continuous Availability
  - Maintain service across planned and unplanned events

DB2 pureScale: Technology Overview

Leverages System z Sysplex experience and know-how.

- Single database view
  - Clients connect anywhere and see a single database
  - Clients connect into any member
  - Automatic load balancing and client reroute may change the underlying physical member to which a client is connected
- DB2 engine runs on several host machines (members)
  - Members co-operate with each other to provide coherent access to the database from any member
- Low-latency, high-speed interconnect
  - Special optimizations provide significant advantages on RDMA-capable interconnects (e.g. InfiniBand)
- PowerHA pureScale technology
  - Efficient global locking and buffer management
  - Synchronous duplexing to a secondary ensures availability
- Data sharing architecture
  - Shared access to the database
  - Members write to their own logs
  - Logs accessible from another host (used during recovery)
- Integrated cluster services
  - Failure detection, recovery automation (TSA / RSCT)
  - Cluster file system (GPFS)

[Figure: clients connect to any of several members; members connect over the IB cluster interconnect to primary and secondary cluster-services (CS) hosts; members and their logs sit on shared storage holding the single database]

Easy scale-out

- Up to 128 members in the initial release
  - High IOPS & bandwidth of the interconnect make this possible
- Single database view
- Efficient coherency protocols exploit the low-latency interconnect to scale without application change
- Application workload automatically and transparently balanced across members
- No data redistribution required

[Figure: multiple DB2 members on the IB interconnect with primary and secondary cluster-services hosts; each member writes its own log]

Online Recovery

- A key pureScale design point is to maximize availability during failure recovery processing
- When a database member fails, only data in-flight on the failed member remains locked during the automated recovery
- The high-speed interconnect means the duration of even this partial inaccessibility is limited

[Chart: % of data available vs. time (~seconds) -- on a database member failure, only the in-flight updates are locked during recovery, so availability dips only briefly and partially before returning to 100%]

Agenda

- Introduction to pureScale
  - 10,000 ft view
  - Goals & value for users
  - Technology overview
- Key Concepts & Internals
  - Major components & moving parts
  - Efficient scaling over low-latency interconnect
- Interconnect issues
  - Requirements, futures, etc.

What is a Member?

- A DB2 engine address space
  - i.e. a DB2 engine process (db2sysc) and its threads (DB2 agents and others)
- Each member has its own
  - Bufferpool(s)
  - Memory regions (log buffer, dbheap, & other heaps)
  - Log files
- Members coordinate with each other via the PowerHA pureScale systems, through uDAPL (a sketch of this user-mode RDMA plumbing follows below)
- Members share data
  - All members access the same shared database (a single database partition)
  - Also known as "data sharing"

[Figure: two members, each a DB2 engine process with agents and other threads, bufferpools, and heaps, communicating via uDAPL and HCAs over the IB fabric with primary and secondary PowerHA pureScale hosts; each member writes its own log against the shared database]
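The members talk to the PowerHA pureScale servers through uDAPL, but the idea is easier to see in the lower-level IB verbs API. Below is a minimal, illustrative C sketch of the kind of user-mode RDMA setup that a transport layer like uDAPL wraps: open the HCA, register memory so the adapter can DMA to and from it directly, and create a reliable-connected queue pair. This is not DB2 code; buffer sizes and queue depths are arbitrary, and connection establishment to the peer is omitted.

/* Illustrative only: pureScale itself goes through uDAPL; this sketch
 * uses libibverbs to show the underlying plumbing. Link with -libverbs. */
#include <stdio.h>
#include <stdlib.h>
#include <infiniband/verbs.h>

int main(void)
{
    int num;
    struct ibv_device **devs = ibv_get_device_list(&num);
    if (!devs || num == 0) { fprintf(stderr, "no RDMA devices\n"); return 1; }

    struct ibv_context *ctx = ibv_open_device(devs[0]);   /* HCA handle */
    struct ibv_pd *pd = ibv_alloc_pd(ctx);                /* protection domain */

    /* Register a buffer so the HCA can DMA into/out of it without
     * involving the CPU on the remote side. */
    size_t len = 4096;
    void *buf = malloc(len);
    struct ibv_mr *mr = ibv_reg_mr(pd, buf, len,
                                   IBV_ACCESS_LOCAL_WRITE |
                                   IBV_ACCESS_REMOTE_READ |
                                   IBV_ACCESS_REMOTE_WRITE);

    /* Completion queue + queue pair: the endpoints that RDMA reads and
     * writes are posted to (connection setup to the peer omitted). */
    struct ibv_cq *cq = ibv_create_cq(ctx, 16, NULL, NULL, 0);
    struct ibv_qp_init_attr qpa = {
        .send_cq = cq, .recv_cq = cq,
        .cap = { .max_send_wr = 16, .max_recv_wr = 16,
                 .max_send_sge = 1, .max_recv_sge = 1 },
        .qp_type = IBV_QPT_RC,                            /* reliable connected */
    };
    struct ibv_qp *qp = ibv_create_qp(pd, &qpa);

    printf("mr lkey=%u rkey=%u qp=%u\n", mr->lkey, mr->rkey, qp->qp_num);

    ibv_destroy_qp(qp); ibv_destroy_cq(cq);
    ibv_dereg_mr(mr); free(buf);
    ibv_dealloc_pd(pd); ibv_close_device(ctx);
    ibv_free_device_list(devs);
    return 0;
}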

What is a PowerHA pureScale?

- Software technology that assists in global buffer coherency management and global locking
  - Shared lineage with System z Parallel Sysplex
  - Software based
- Services provided include
  - Group Bufferpool (GBP)
  - Global Lock Management (GLM)
  - Shared Communication Area (SCA)
- Members duplex GBP, GLM, and SCA state to both a primary and a secondary
  - Done synchronously (see the duplexing sketch below)
  - Duplexing is optional (but recommended)
  - Set up automatically, by default

[Figure: members connect via uDAPL and HCAs over IB to primary and secondary PowerHA pureScale hosts, each running worker threads and holding GBP, GLM, and SCA state; the logs and the shared database (single database partition) sit on shared storage]
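To give a feel for what synchronous duplexing involves, here is a hedged C sketch (again plain libibverbs rather than DB2's actual uDAPL code): the same state update is RDMA-written to both the primary and the secondary, and the caller waits for both completions before considering the update done. The function name duplex_write and all parameters are hypothetical; the queue pairs, remote addresses, and rkeys are assumed to come from connection setup not shown, and both queue pairs are assumed to share one completion queue.

/* Hypothetical sketch of synchronous duplexing of CF state. */
#include <stdint.h>
#include <infiniband/verbs.h>

int duplex_write(struct ibv_qp *qp_pri, struct ibv_qp *qp_sec,
                 struct ibv_cq *cq, struct ibv_mr *mr,
                 void *local, size_t len,
                 uint64_t raddr_pri, uint32_t rkey_pri,
                 uint64_t raddr_sec, uint32_t rkey_sec)
{
    struct ibv_sge sge = {
        .addr = (uintptr_t)local, .length = (uint32_t)len, .lkey = mr->lkey,
    };
    struct ibv_send_wr wr = {
        .opcode = IBV_WR_RDMA_WRITE,
        .send_flags = IBV_SEND_SIGNALED,   /* want a completion for each write */
        .sg_list = &sge, .num_sge = 1,
    };
    struct ibv_send_wr *bad;

    /* Same payload, two destinations: primary and secondary. */
    wr.wr.rdma.remote_addr = raddr_pri; wr.wr.rdma.rkey = rkey_pri;
    if (ibv_post_send(qp_pri, &wr, &bad)) return -1;
    wr.wr.rdma.remote_addr = raddr_sec; wr.wr.rdma.rkey = rkey_sec;
    if (ibv_post_send(qp_sec, &wr, &bad)) return -1;

    /* Wait until both writes have completed: this is what makes the
     * duplexing synchronous from the member's point of view. */
    int done = 0;
    while (done < 2) {
        struct ibv_wc wc;
        int n = ibv_poll_cq(cq, 1, &wc);
        if (n < 0 || (n > 0 && wc.status != IBV_WC_SUCCESS)) return -1;
        done += n;
    }
    return 0;
}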

Walking through some key activities

- RDMA exploitation via uDAPL over a low-latency fabric
  - Enables round-trip response times of ~10-15 microseconds
- Silent page invalidation (see the sketch after this slide)
  - Informs members of page updates
  - Requires no CPU cycles on those members: no interrupt or other message processing required
  - Increasingly important as the cluster grows
- Hot pages available to members from GBP memory without disk I/O
  - RDMA and dedicated worker threads enable read-page operations in tens of microseconds

[Figure: a member's lock manager and buffer manager exchange lock requests ("Can I have this lock?" / "Yup, here you are") and page reads ("New page image") with the GBP, GLM, and SCA worker threads]
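A hedged sketch of how an invalidation can cost the receiving member zero CPU: the sender RDMA-writes a one-byte "stale" flag directly into a page-validity array registered in the member's memory, and the member checks that flag locally the next time it touches its copy of the page. The validity-array layout and all names here are hypothetical, not pureScale's actual data structures, and libibverbs again stands in for uDAPL.

/* Hypothetical sketch of silent page invalidation via RDMA write. */
#include <stdint.h>
#include <stddef.h>
#include <infiniband/verbs.h>

static uint8_t STALE = 1;

/* Post one RDMA write of a single byte into the remote member's
 * page-validity array (assumed layout: 1 byte per page slot). */
int post_invalidate(struct ibv_qp *qp, struct ibv_mr *flag_mr,
                    uint64_t remote_valid_array, uint32_t rkey,
                    uint32_t page_slot)
{
    struct ibv_sge sge = {
        .addr = (uintptr_t)&STALE, .length = 1, .lkey = flag_mr->lkey,
    };
    struct ibv_send_wr wr = {
        .opcode = IBV_WR_RDMA_WRITE,
        /* Tiny payload: inline it into the work request (assumes the QP
         * was created with max_inline_data > 0). No handler, interrupt,
         * or thread runs on the target member -- the HCA writes the byte. */
        .send_flags = IBV_SEND_INLINE,
        .sg_list = &sge, .num_sge = 1,
    };
    wr.wr.rdma.remote_addr = remote_valid_array + page_slot;
    wr.wr.rdma.rkey = rkey;

    struct ibv_send_wr *bad;
    return ibv_post_send(qp, &wr, &bad);
}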

Scalability: Example

- Transaction processing workload modeling warehouse & ordering
  - Write transaction rate of ~20%, a typical read/write ratio for many OLTP workloads
- No cluster awareness in the app
  - No affinity
  - No partitioning
  - No routing of transactions to members
- Configuration
  - 12 8-core p550 members, 64 GB, 5 GHz
  - IBM 20 Gb/s IB HCAs + 7874-024 IB switch (pureScale interconnect)
  - Duplexed PowerHA pureScale across 2 additional 8-core p550s, 64 GB, 5 GHz
  - DS8300 storage: 576 15K disks, two 4 Gb FC switches
  - Clients: 2-way x345 machines, 1 Gb Ethernet client connectivity

Scalability: Example (results)

Measured throughput relative to a single member:

  Members   Throughput vs. 1 member   Scaling efficiency
     2             1.98x                     99%
     4              3.9x                    ~98%
     8              7.6x                     95%
    12             10.4x                    ~87%

(Scaling efficiency = speedup divided by member count.)

Agenda

- Introduction to pureScale
  - 10,000 ft view
  - Goals & value propositions for customers
  - Technology overview
- Key Concepts & Internals
  - Major components & moving parts
  - Efficient scaling over low-latency interconnect
- Interconnect issues
  - Requirements, futures, etc.

Transport requirements / outlook

- RDMA is a key technology in DB2 pureScale
  - uDAPL plus the IB spec & its maturity make it a natural choice
  - The initial pureScale release platform is well-specified, but fundamentally it is transport-agnostic
- Low latency / high bandwidth is king!
  - QDR / EDR / HDR IB promise improved performance & scalability
  - Stable minimum latencies needed, especially with increased traffic
- Multicast RDMA write
- InfiniBand roadmap from www.infinibandta.org

Converged fabrics

- Large-scale databases have huge capacity requirements for the interconnect, FC disk access, and client network connections
- IT policy sometimes frowns on multiple network infrastructures
- Successful convergence needs
  - Sufficient transport capacity: EDR+, 80 Gbps+
  - Fine-grained traffic prioritization
  - Solid QoS mechanisms
- Broad adoption will need demonstrated reliability in heterogeneous, unpredictable IT environments

Link aggregation

- Interconnect fault tolerance and load balancing
  - Provides needed additional headroom (especially IOPS) & high availability for the most demanding configurations
  - Standard practice in conventional Ethernet & FC networks
  - Can be handled at the application (pureScale) level, but preferably below it

Fit and finish

- 20 years of easy Ethernet raises expectations
- A fairly high level of expertise is required to configure & manage low-latency interconnects
  - Very manageable in a lab environment
  - Trickier in a mass-market IT shop
- Stack integration into O/S distributions is a great step forward
- Full integration into system management tools is still to come

Summary

- A confluence of great technologies:
  - High-bandwidth, low-latency interconnect
  - Efficient user-mode transport access via uDAPL
  - RDMA for ultra-low-cost cache coherency
  - DB2 Sysplex heritage
  ...helps enable pureScale to deliver on its scalability and availability targets
- In addition:
  - Maturity / standardization of converged transports
  - Continued IOPS / Gbps growth
  - Improved HA and manageability
  ...will continue to promote commercial deployment of advanced interconnect stacks
