Chapter 1.

Introduction

• What is a distributed DBMS
• Promises of DDB
• Distributed DBMS issues

M.H. Kim, KAIST


What is a Distributed Database System?

• Distributed database (DDB)
  – a collection of multiple, logically interrelated databases
    » distributed over a computer network

• Distributed database management system (DDBMS)
  – software that manages the DDB, and
    » provides an access mechanism that makes this distribution transparent to the users

• Distributed database system (DDBS) = DDB + DDBMS

What is a DDBS? (cont’d)

• Implicit assumptions of DDB
  – data is stored at a number of sites
    » each site logically consists of a single processor
  – processors at different sites are interconnected by a computer network
    » not a multiprocessor, not simply a parallel database system
  – a distributed database is a database
    » not a collection of files
  – a DDBMS is a full-fledged DBMS
    » not a remote file system, not simply a (distributed) TP system

What is a DDBS? (cont’d)

• Applications
  – manufacturing
    » especially multi-plant manufacturing
  – military command and control
  – corporate MIS
  – airlines
  – hotel chains
  – any organization with a decentralized organizational structure

Promises of DDBS

• Transparent management of
  – distributed, fragmented, and replicated data
• Improved reliability/availability
  – distributed transactions
• Improved performance
• Easier and more economical system expansion

Promises of DDBS (cont’d)

• Transparency
  » separation of the higher-level semantics of a system
    · from the lower-level implementation issues
  » fundamental issue in transparency
    · provides data independence in the distributed environment
  – data independence
    » immunity of user applications to changes in the definition and organization of data, and vice versa
    » distributed query processing support

Promises of DDBS (cont’d)

  – network transparency
    · also called distribution transparency
    » location transparency
    » naming transparency
    · also called access transparency
  – replication transparency
  – fragmentation transparency
    » horizontal fragmentation
    » vertical fragmentation
    » hybrid
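
The fragmentation types listed above can be sketched in a few lines of Python. This is only an illustration, not a DDBMS mechanism: the PROJ rows come from the example relation used in these slides, while the budget threshold, the attribute split, and all function names are assumptions made for the sketch. Horizontal fragments are produced by selection and rebuilt by union; vertical fragments are produced by projection (keeping the key) and rebuilt by a join on the key.

```python
# Rows of the PROJ example relation used in these slides.
PROJ = [
    {"PNO": "P1", "PNAME": "Instrumentation", "BUDGET": 150000, "LOC": "Montreal"},
    {"PNO": "P2", "PNAME": "DB Develop.", "BUDGET": 135000, "LOC": "New York"},
    {"PNO": "P3", "PNAME": "CAD/CAM", "BUDGET": 250000, "LOC": "New York"},
    {"PNO": "P4", "PNAME": "Maintenance", "BUDGET": 310000, "LOC": "Paris"},
    {"PNO": "P5", "PNAME": "CAD/CAM", "BUDGET": 500000, "LOC": "Boston"},
]


def horizontal_fragment(rows, predicate):
    """Selection: keep whole tuples that satisfy the predicate."""
    return [row for row in rows if predicate(row)]


def vertical_fragment(rows, key, attributes):
    """Projection: keep the key plus a subset of the attributes."""
    return [{name: row[name] for name in (key, *attributes)} for row in rows]


def rebuild_horizontal(*fragments):
    """The union of the horizontal fragments restores the relation."""
    merged = [row for fragment in fragments for row in fragment]
    return sorted(merged, key=lambda row: row["PNO"])


def rebuild_vertical(left, right, key):
    """A join of the vertical fragments on the key restores the relation."""
    right_by_key = {row[key]: row for row in right}
    return [{**row, **right_by_key[row[key]]} for row in left]


# Hypothetical split points, chosen only for illustration.
low = horizontal_fragment(PROJ, lambda row: row["BUDGET"] <= 200000)
high = horizontal_fragment(PROJ, lambda row: row["BUDGET"] > 200000)
names = vertical_fragment(PROJ, "PNO", ["PNAME", "LOC"])
budgets = vertical_fragment(PROJ, "PNO", ["BUDGET"])
```

A hybrid scheme would apply one kind of fragmentation to the result of the other, for example splitting `low` vertically. Fragmentation transparency means the user never sees `low`, `high`, `names`, or `budgets`, only PROJ.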

Promises of DDBS (cont’d)

[Figure: Distributed Database - User View]

Promises of DDBS (cont’d)

[Figure: Distributed Database - Reality. Several sites, each running DBMS software and serving user queries and user applications, connected by a communication subsystem.]

Example
EMP
ENO  ENAME      TITLE
E1   J. Doe     Elect. Eng.
E2   M. Smith   Syst. Anal.
E3   A. Lee     Mech. Eng.
E4   J. Miller  Programmer
E5   B. Casey   Syst. Anal.
E6   L. Chu     Elect. Eng.
E7   R. Davis   Mech. Eng.
E8   J. Jones   Syst. Anal.

PROJ
PNO  PNAME            BUDGET  LOC
P1   Instrumentation  150000  Montreal
P2   DB Develop.      135000  New York
P3   CAD/CAM          250000  New York
P4   Maintenance      310000  Paris
P5   CAD/CAM          500000  Boston

Example (cont’d)

ASG
ENO  PNO  RESP        DUR
E1   P1   Manager     12
E2   P1   Analyst     24
E2   P2   Analyst     6
E3   P3   Consultant  10
E3   P4   Engineer    48
E4   P2   Programmer  18
E5   P2   Manager     24
E6   P4   Manager     48
E7   P3   Engineer    36
E7   P5   Engineer    23
E8   P3   Manager     40

PAY
TITLE        SAL
Elect. Eng.  40000
Syst. Anal.  34000
Mech. Eng.   27000
Programmer   24000

Promises of DDBS (cont’d)

• Fragmentation and replication
  – for performance and reliability

[Figure: five sites (Tokyo, Paris, Boston, New York, Montreal) connected by a communication network, with the data stored at each site]
  Paris:    Paris projects, Paris employees, Paris assignments, Boston employees
  Boston:   Boston projects, Boston employees, Boston assignments
  New York: Boston projects, New York employees, New York projects, New York assignments
  Montreal: Montreal projects, Paris projects, New York projects with budget > 20000, Montreal employees, Montreal assignments
  Tokyo:    (no data shown)
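
One way to picture location and replication transparency for the placement in the figure is a catalog that maps each fragment to the sites holding a copy. The sketch below is hypothetical: the placement is as read from the figure, the partial copy of New York projects held at Montreal is left out for simplicity, and `CATALOG` and `resolve` are names invented for the example. The policy shown (prefer a local copy, otherwise take the first listed site) is an assumption, not something the slides specify.

```python
# Fragment placement as read from the figure; treat it as illustrative.
# The partial copy "New York projects with budget > 20000" at Montreal
# is omitted to keep the sketch small.
CATALOG = {
    "Paris projects": ["Paris", "Montreal"],
    "Paris employees": ["Paris"],
    "Paris assignments": ["Paris"],
    "Boston employees": ["Paris", "Boston"],
    "Boston projects": ["Boston", "New York"],
    "Boston assignments": ["Boston"],
    "New York employees": ["New York"],
    "New York projects": ["New York"],
    "New York assignments": ["New York"],
    "Montreal projects": ["Montreal"],
    "Montreal employees": ["Montreal"],
    "Montreal assignments": ["Montreal"],
}


def resolve(fragment, requesting_site):
    """Pick the site to read a fragment from (hypothetical policy).

    A local replica is preferred; otherwise the first listed site is used.
    """
    sites = CATALOG[fragment]
    if requesting_site in sites:
        return requesting_site
    return sites[0]


print(resolve("Boston employees", "Paris"))   # local replica exists
print(resolve("Montreal projects", "Tokyo"))  # must go remote
```

The user names only the fragment (or, with fragmentation transparency, only the relation); the lookup from name to site is what the system hides.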

Promises of DDBS (cont’d)

(Ex) Find the names and salaries of employees who worked on a project for more than 12 months

With fully transparent access:

  SELECT ENAME, SAL
  FROM   EMP, ASG, PAY
  WHERE  DUR > 12
  AND    EMP.ENO = ASG.ENO
  AND    EMP.TITLE = PAY.TITLE
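
The query can be tried against the example relations with a small script. This is a single-site sketch only: an in-memory SQLite database (an assumption of this example, not part of the slides) stands in for the whole distributed system, so it shows what the user writes under full transparency, not how a DDBMS would locate fragments and replicas.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE EMP (ENO TEXT PRIMARY KEY, ENAME TEXT, TITLE TEXT);
CREATE TABLE ASG (ENO TEXT, PNO TEXT, RESP TEXT, DUR INTEGER);
CREATE TABLE PAY (TITLE TEXT PRIMARY KEY, SAL INTEGER);
""")

# Data copied from the EMP, ASG, and PAY example relations.
conn.executemany("INSERT INTO EMP VALUES (?, ?, ?)", [
    ("E1", "J. Doe", "Elect. Eng."), ("E2", "M. Smith", "Syst. Anal."),
    ("E3", "A. Lee", "Mech. Eng."), ("E4", "J. Miller", "Programmer"),
    ("E5", "B. Casey", "Syst. Anal."), ("E6", "L. Chu", "Elect. Eng."),
    ("E7", "R. Davis", "Mech. Eng."), ("E8", "J. Jones", "Syst. Anal."),
])
conn.executemany("INSERT INTO ASG VALUES (?, ?, ?, ?)", [
    ("E1", "P1", "Manager", 12), ("E2", "P1", "Analyst", 24),
    ("E2", "P2", "Analyst", 6), ("E3", "P3", "Consultant", 10),
    ("E3", "P4", "Engineer", 48), ("E4", "P2", "Programmer", 18),
    ("E5", "P2", "Manager", 24), ("E6", "P4", "Manager", 48),
    ("E7", "P3", "Engineer", 36), ("E7", "P5", "Engineer", 23),
    ("E8", "P3", "Manager", 40),
])
conn.executemany("INSERT INTO PAY VALUES (?, ?)", [
    ("Elect. Eng.", 40000), ("Syst. Anal.", 34000),
    ("Mech. Eng.", 27000), ("Programmer", 24000),
])

# The query exactly as written on the slide.
QUERY = """
SELECT ENAME, SAL
FROM EMP, ASG, PAY
WHERE DUR > 12
  AND EMP.ENO = ASG.ENO
  AND EMP.TITLE = PAY.TITLE
"""

rows = sorted(conn.execute(QUERY).fetchall())
for name, salary in rows:
    print(name, salary)
```

Because the query has no DISTINCT, an employee with two qualifying assignments (E7, R. Davis) appears twice in the result.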

Promises of DDBS (cont’d)

• Who should provide transparency?
  – access layer
    · middleware service layer
    » a compiler or interpreter translates the requested services into the required operations
  – DBMS layer
    » the DBMS should be responsible for a high level of data independence
      · together with replication and fragmentation transparency

Promises of DDBS (cont’d)

  – OS layer
    » distributed operating system (DOS)
      · usually provides network transparency
      · e.g., distributed file services, naming, timing, etc.
    » services provided by a DOS are limited in general

Promises of DDBS (cont’d)

[Figure: Layers of Transparency. From the outermost layer to the innermost: language transparency, fragmentation transparency, replication transparency, network transparency, data independence, data.]

Promises of DDBS (cont’d)

• Full transparency may not be a universally accepted objective
  » full transparency makes the management of distributed data very difficult
  – applications coded with transparent access to geographically distributed data
    » poor manageability
    » poor message performance

Promises of DDBS (cont’d)

• Potentially improved performance
  – proximity of data to its points of use
    · requires some support for fragmentation and replication
    » distribution of data reduces the contention for CPU and I/O
    » localization reduces remote access delay
  – parallelism in execution
    » inter-query parallelism
    » intra-query parallelism
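
Intra-query parallelism can be sketched as one query split into per-fragment subqueries that run concurrently and are then combined. In the sketch below, threads stand in for sites, the three-way placement of the ASG durations is hypothetical, and the function names are invented for the example.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical placement of the ASG durations (in months) at three sites.
FRAGMENTS = {
    "site_a": [12, 24, 6, 10],
    "site_b": [48, 18, 24, 48],
    "site_c": [36, 23, 40],
}


def count_long_assignments(durations, threshold=12):
    """Local subquery: count assignments longer than the threshold."""
    return sum(1 for months in durations if months > threshold)


def run_intra_query(fragments):
    """Run the same subquery on every fragment concurrently, then combine."""
    with ThreadPoolExecutor(max_workers=len(fragments)) as pool:
        partial_counts = list(pool.map(count_long_assignments, fragments.values()))
    return sum(partial_counts)


total = run_intra_query(FRAGMENTS)
print(total)
```

Inter-query parallelism would instead submit several independent queries to the pool at once, each possibly touching a different site.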

Promises of DDBS (cont’d)

• Parallelism requirements
  – as much as possible of the data required by each application should be
    · at the site where the application executes
    » full or partial replication/fragmentation
  – how about updates?
    » updates to distributed/replicated data require
      · a distributed concurrency control protocol
      · a distributed reliability (i.e., commit) protocol and
      · a distributed recovery protocol
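
The slide does not name a particular commit protocol. As one well-known example, two-phase commit can be sketched as follows; the class and function names are hypothetical, and the sketch deliberately ignores timeouts, logging, and site failures, which a real protocol and the accompanying recovery protocol must handle.

```python
class Participant:
    """A site taking part in a distributed transaction (illustrative only)."""

    def __init__(self, name, can_commit=True):
        self.name = name
        self.can_commit = can_commit
        self.state = "INITIAL"

    def prepare(self):
        # Phase 1: vote yes (READY) or no (ABORTED).
        self.state = "READY" if self.can_commit else "ABORTED"
        return self.can_commit

    def commit(self):
        self.state = "COMMITTED"

    def abort(self):
        self.state = "ABORTED"


def two_phase_commit(participants):
    """Coordinator logic: commit only if every participant votes yes."""
    # Phase 1: collect a vote from every site.
    votes = [site.prepare() for site in participants]
    decision = all(votes)
    # Phase 2: broadcast the global decision.
    for site in participants:
        if decision:
            site.commit()
        else:
            site.abort()
    return "COMMIT" if decision else "ABORT"


replicas = [Participant("Paris"), Participant("Boston")]
print(two_phase_commit(replicas))

with_failure = [Participant("Paris"), Participant("Boston", can_commit=False)]
print(two_phase_commit(with_failure))
```

The point of the sketch is atomicity across sites: an update to replicated data takes effect at every copy or at none.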

Promises of DDBS (cont’d)

• Easier system expansion
  – easier to accommodate increasing DB sizes
  – more economical to scale up DCS
    · rather than purchasing a new single large mainframe
• Emergence of microprocessor and workstation technologies
  » client-server model of computing
• Local processing cost vs. telecommunication cost

Complicating Factors

• Major factors for additional complexity in DDBS
  – data may be replicated
    » mainly for reliability and efficiency
  – failure recovery
    » consistency when some sites or communication links fail
  – synchronization among tasks on multiple sites

Complicating Factors (cont’d)

• Potential problems
  – increased complexity
  – increased cost
    » especially increased personnel cost
  – distribution of control
    · can be an advantage in certain respects
    » problems of synchronization and coordination
  – difficulty in ensuring security

Distributed DBMS Issues

• Distributed database design
  – how to distribute the database
  – replicated & non-replicated database distribution

• Distributed query processing
  – converts user transactions to data manipulation instructions
  – optimization problem
    » min {cost = data transmission + local processing}
    » general formulation is NP-hard
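
The cost function above can be illustrated by comparing a handful of candidate strategies for joining EMP and ASG when they sit at different sites. Everything numeric here is an assumption for the sketch: the per-tuple weights are made up, and the tuple counts are simply derived from the example relations (8 EMP tuples, 11 ASG tuples, 8 ASG tuples with DUR > 12).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Strategy:
    name: str
    tuples_shipped: int    # tuples sent over the network
    tuples_processed: int  # tuples touched by local operators


def cost(strategy, per_tuple_transfer=10.0, per_tuple_processing=1.0):
    """cost = data transmission + local processing (hypothetical weights)."""
    transmission = strategy.tuples_shipped * per_tuple_transfer
    processing = strategy.tuples_processed * per_tuple_processing
    return transmission + processing


strategies = [
    # Ship all 8 EMP tuples, then join 8 + 11 tuples at the ASG site.
    Strategy("ship EMP to the ASG site", tuples_shipped=8, tuples_processed=19),
    # Ship all 11 ASG tuples, then join 11 + 8 tuples at the EMP site.
    Strategy("ship ASG to the EMP site", tuples_shipped=11, tuples_processed=19),
    # Scan 11 ASG tuples locally, ship the 8 survivors, join 8 + 8 tuples.
    Strategy("filter ASG, then ship to the EMP site", tuples_shipped=8, tuples_processed=27),
]

best = min(strategies, key=cost)
for strategy in strategies:
    print(f"{strategy.name}: {cost(strategy)}")
print("chosen:", best.name)
```

Enumerating every strategy as done here only works for tiny examples; since the general formulation is NP-hard, practical optimizers rely on heuristics or restricted search spaces.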

Distributed DBMS Issues (cont’d)

• Distributed concurrency control
  – synchronization of concurrent accesses
    » integrity of a single DB
    » consistency of multiple copies of the DB
  – consistency and isolation of transactions’ effects
• Distributed deadlock management
• Reliability
  – atomicity and durability
    » DBs at the operational sites remain consistent
    » consistent recovery at the failed sites
  – how to make the system resilient to failures
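
For distributed deadlock management, the slides do not prescribe a method. A common textbook approach, assumed here, is to merge the wait-for graphs of the individual sites and look for a cycle; the sketch shows why a deadlock can be invisible to each site on its own. All names in the code are hypothetical.

```python
def merge_wait_for_graphs(local_graphs):
    """Union the per-site wait-for graphs into one global graph."""
    merged = {}
    for graph in local_graphs:
        for waiter, holders in graph.items():
            merged.setdefault(waiter, set()).update(holders)
    return merged


def has_cycle(graph):
    """Depth-first search for a cycle in a wait-for graph."""
    WHITE, GREY, BLACK = 0, 1, 2
    nodes = set(graph) | {h for holders in graph.values() for h in holders}
    colour = {node: WHITE for node in nodes}

    def visit(node):
        colour[node] = GREY
        for nxt in graph.get(node, ()):
            if colour[nxt] == GREY:
                return True  # back edge: a cycle of waiting transactions
            if colour[nxt] == WHITE and visit(nxt):
                return True
        colour[node] = BLACK
        return False

    return any(colour[node] == WHITE and visit(node) for node in nodes)


# At site 1, T1 waits for T2; at site 2, T2 waits for T1.
site_1 = {"T1": {"T2"}}
site_2 = {"T2": {"T1"}}

print(has_cycle(site_1), has_cycle(site_2))
print(has_cycle(merge_wait_for_graphs([site_1, site_2])))
```

Neither local graph contains a cycle, but the merged graph does, which is what makes deadlock management a distinct problem in the distributed setting.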

Distributed DBMS Issues (cont’d)

• Operating system support
  – proper support for
    » distributed processing
    » database processing

• Heterogeneous DBS and interoperability
  » heterogeneity in data model and data language
  » constructs a DBMS from a number of autonomous and centralized DBMSs
    · more probable scenario
  – distributed multidatabase systems

Relationship between Issues

[Figure: Relationship between Issues. Nodes: directory management, distribution design, query processing, reliability, concurrency control, deadlock management.]