Академический Документы
Профессиональный Документы
Культура Документы
n Introduction
n Background
n Distributed DBMS Architecture
n Distributed Database Design
n Semantic Data Control
n Distributed Query Processing
n Distributed Transaction Management
Data server approach
Parallel architectures
Parallel DBMS techniques
Parallel execution models
o Parallel Database Systems
o Distributed Object DBMS
o Database Interoperability
o Concluding Remarks
Distributed DBMS 1998 M. Tamer zsu & Patrick Valduriez Page 13.1
n Predictions
(Micro-) processor speed growth : 50 % per year
Distributed DBMS 1998 M. Tamer zsu & Patrick Valduriez Page 13.2
Page 1
The Solution
Distributed DBMS 1998 M. Tamer zsu & Patrick Valduriez Page 13.3
Multiprocessor Objectives
n High-performance with better cost-performance
than mainframe or vector supercomputer
n Use many nodes, each with good cost-
performance, communicating through network
Good cost via high-volume components
Good performance via bandwidth
n Trends
Microprocessor and memory (DRAM): off-the-shelf
Network (multiprocessor edge): custom
n The real chalenge is to parallelize applications to
run with good load balancing
Distributed DBMS 1998 M. Tamer zsu & Patrick Valduriez Page 13.4
Page 2
Data Server Architecture
Client
client interface
Application query parsing
server
data server interface
communication channel
database
Distributed DBMS 1998 M. Tamer zsu & Patrick Valduriez Page 13.5
Distributed DBMS 1998 M. Tamer zsu & Patrick Valduriez Page 13.6
Page 3
Data Server Approach: Assessment
n Advantages
Integrated data control by the server (black box)
Increased performance by dedicated system
Can better exploit parallelism
Fits well in distributed environments
n Potential problems
Communication overhead between application and data
server
u High-level interface
High cost with mainframe servers
Distributed DBMS 1998 M. Tamer zsu & Patrick Valduriez Page 13.7
Distributed DBMS 1998 M. Tamer zsu & Patrick Valduriez Page 13.8
Page 4
Data-based Parallelism
n Inter-operation
p operations of the same query in parallel
op.3
op.1 op.2
n Intra-operation
the same operation in parallel on different data partitions
Distributed DBMS 1998 M. Tamer zsu & Patrick Valduriez Page 13.9
Parallel DBMS
Distributed DBMS 1998 M. Tamer zsu & Patrick Valduriez Page 13.10
Page 5
Parallel DBMS - Objectives
n Much better cost / performance than
mainframe solution
n High-performance through parallelism
High throughput with inter-query parallelism
Low response time with intra-operation parallelism
Distributed DBMS 1998 M. Tamer zsu & Patrick Valduriez Page 13.11
Linear Speed-up
Linear increase in performance for a constant DB
size and proportional increase of the system
components (processor, memory, disk)
ideal
new perf.
old perf.
components
Distributed DBMS 1998 M. Tamer zsu & Patrick Valduriez Page 13.12
Page 6
Linear Scale-up
Distributed DBMS 1998 M. Tamer zsu & Patrick Valduriez Page 13.13
Barriers to Parallelism
n Startup
The time needed to start a parallel operation may
dominate the actual computation time
n Interference
When accessing shared resources, each new process slows
down the others (hot spot problem)
n Skew
The response time of a set of parallel processes is the time
of the slowest one
n Parallel data management techniques intend
to overcome these barriers
Distributed DBMS 1998 M. Tamer zsu & Patrick Valduriez Page 13.14
Page 7
Parallel DBMS
Functional Architecture
User User
task 1 task n
Session Mgr
RM Request Mgr RM
task 1 task n
DM DM DM DM
Data Mgr task n2
task 11 task 12 task n1
Distributed DBMS 1998 M. Tamer zsu & Patrick Valduriez Page 13.15
Distributed DBMS 1998 M. Tamer zsu & Patrick Valduriez Page 13.16
Page 8
Parallel System Architectures
Shared disk
n Hybrid architectures
Hierarchical (cluster)
Distributed DBMS 1998 M. Tamer zsu & Patrick Valduriez Page 13.17
Shared-Memory Architecture
P1 Pn
interconnect
Global Memory
Distributed DBMS 1998 M. Tamer zsu & Patrick Valduriez Page 13.18
Page 9
Shared-Disk Architecture
P1 Pn
M1 Mn
interconnect
Shared-Nothing Architecture
interconnect
P1 Pn
D1 Dn
M1 Mn
Distributed DBMS 1998 M. Tamer zsu & Patrick Valduriez Page 13.20
Page 10
Hierarchical Architecture
P1 Pn P1 Pn
interconnect interconnect
D D
Distributed DBMS 1998 M. Tamer zsu & Patrick Valduriez Page 13.21
Shared-Memory vs.
Distributed Memory
n Mixes two different aspects : addressing and
memory
Addressing
Single address space : Sequent, Encore, KSR
u
Multiple address spaces : Intel, Ncube
u
Physical memory
u Central : Sequent, Encore
u Distributed : Intel, Ncube, KSR
Distributed DBMS 1998 M. Tamer zsu & Patrick Valduriez Page 13.22
Page 11
NUMA Architectures
n Cache Coherent NUMA (CC-NUMA)
statically divide the main memory among the nodes
n Cache Only Memory Architecture (COMA)
convert the per-node memory into a large cache of the
shared address space
Distributed DBMS 1998 M. Tamer zsu & Patrick Valduriez Page 13.23
COMA Architecture
P1 P2 Pn
Cache
Memory
Cache
Memory Cache
Memory
Distributed DBMS 1998 M. Tamer zsu & Patrick Valduriez Page 13.24
Page 12
Parallel DBMS Techniques
n Data placement
Physical placement of the DB onto multiple nodes
Static vs. Dynamic
n Transaction management
Similar to distributed transaction management
Distributed DBMS 1998 M. Tamer zsu & Patrick Valduriez Page 13.25
Data Partitioning
n Each relation is divided in n partitions
(subrelations), where n is a function of
relation size and access frequency
n Implementation
Round-robin
u Maps i-th element to node i mod n
u Simple but only exact-match queries
B-tree index
u Supports range queries but large index
Hash function
u Only exact-match queries but small index
Distributed DBMS 1998 M. Tamer zsu & Patrick Valduriez Page 13.26
Page 13
Partitioning Schemes
Round-Robin Hashing
Interval
Distributed DBMS 1998 M. Tamer zsu & Patrick Valduriez Page 13.27
Distributed DBMS 1998 M. Tamer zsu & Patrick Valduriez Page 13.28
Page 14
Interleaved Partitioning
Node 1 2 3 4
Primary copy R1 R2 R3 R4
Distributed DBMS 1998 M. Tamer zsu & Patrick Valduriez Page 13.29
Chained Partitioning
Node 1 2 3 4
Primary copy R1 R2 R3 R4
Backup copy r4 r1 r2 r3
Distributed DBMS 1998 M. Tamer zsu & Patrick Valduriez Page 13.30
Page 15
Placement Directory
Distributed DBMS 1998 M. Tamer zsu & Patrick Valduriez Page 13.31
Join Processing
n Three basic algorithms for intra-operator
parallelism
Parallel nested loop join: no special assumption
Parallel associative join: one relation is declustered on join
attribute and equi-join
Parallel hash join: equi-join
Distributed DBMS 1998 M. Tamer zsu & Patrick Valduriez Page 13.32
Page 16
Parallel Nested Loop Join
node 1 node 2
R1: R2:
send
partition
S1 S2
node 3 node 4
R S i=1,n(R Si)
Distributed DBMS 1998 M. Tamer zsu & Patrick Valduriez Page 13.33
S1 S2
node 3 node 4
R S i=1,n(Ri Si)
Distributed DBMS 1998 M. Tamer zsu & Patrick Valduriez Page 13.34
Page 17
Parallel Hash Join
node node node node
R1: R2: S1: S2:
node 1 node 2
R S i=1,P(Ri Si)
Distributed DBMS 1998 M. Tamer zsu & Patrick Valduriez Page 13.35
Distributed DBMS 1998 M. Tamer zsu & Patrick Valduriez Page 13.36
Page 18
Execution Plans as Operators Trees
Result Result
j3 j6
Left-deep
j2 R4 R4 j5 Right-deep
R3 R3 j4
j1
R1 R2 R1 R2
Result
Result
j9
j12
R4 j8 Bushy
Zig-zag j10 j11
j7 R3
R1 R2 R3 R4
R1 R2
Distributed DBMS 1998 M. Tamer zsu & Patrick Valduriez Page 13.37
Temp2 Temp2
R4 R4
Build2 Probe2 Build2 Probe2
Temp1 Temp1
R3 R3
Build1 Probe1 Build1 Probe1
R1 R2 R1 R2
Distributed DBMS 1998 M. Tamer zsu & Patrick Valduriez Page 13.38
Page 19
Load Balancing
n Problems arise for intra-operator parallelism
with skewed data distributions
attribute data skew (AVS)
tuple placement skew (TPS)
selectivity skew (SS)
redistribution skew (RS)
join product skew (JPS)
n Solutions
sophisticated parallel algorithms that deal with skew
dynamic processor allocation (at execution time)
Distributed DBMS 1998 M. Tamer zsu & Patrick Valduriez Page 13.39
JPS
JPS
Res2
Res1
AVS/TPS AVS/TPS
Join1 Join2
S2 S1
RS/SS RS/SS
R2 R1
Distributed DBMS 1998 M. Tamer zsu & Patrick Valduriez Page 13.40
Page 20
Some Parallel DBMSs
n Prototypes
EDS and DBS3 (ESPRIT)
Gamma (U. of Wisconsin)
Bubba (MCC, Austin, Texas)
XPRS (U. of Berkeley)
GRACE (U. of Tokyo)
n Products
Teradata (NCR)
NonStopSQL (Tandem-Compac)
DB2 (IBM), Oracle, Informix, Ingres, Navigator
(Sybase) ...
Distributed DBMS 1998 M. Tamer zsu & Patrick Valduriez Page 13.41
Distributed DBMS 1998 M. Tamer zsu & Patrick Valduriez Page 13.42
Page 21