
Parallel and Distributed Databases
CS263 Lecture 16
LECTURE PLAN

Parallel DBMS - What and Why?

What is a Client/Server DBMS?

Why do we need Distributed DBMSs?

Date's rules for a Distributed DBMS

Benefits of a Distributed DBMS

Issues associated with a Distributed DBMS

Disadvantages of a Distributed DBMS

PARALLEL DATABASE SYSTEM

PARALLEL DBMSs
WHY DO WE NEED THEM?
More and More Data!
We have databases that hold a very large amount of
data, on the order of $10^{12}$ bytes:
1,000,000,000,000 bytes!
Faster and Faster Access!
We have data applications that need to process
data at very high speeds:
10,000s transactions per second!
SINGLE-PROCESSOR DBMSs AREN'T UP TO THE JOB!

INTERQUERY PARALLELISM

It is possible to process a number of transactions in
parallel with each other.

Improves Throughput.

INTRAQUERY PARALLELISM

It is possible to process sub-tasks of a transaction in
parallel with each other.

Improves Response Time.
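
The two forms of parallelism can be contrasted with a minimal Python sketch. The table `RECORDS`, the predicate in `scan`, and both function names are hypothetical and chosen only for illustration. Threads are used to keep the example short; in CPython they do not give real CPU parallelism for this kind of work, so read this as a picture of how the work is divided rather than as a benchmark.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical table: 10,000 integer "records".
RECORDS = list(range(10_000))


def scan(partition):
    """One sub-task: count the records in a partition that match a predicate."""
    return sum(1 for record in partition if record % 7 == 0)


def intraquery_count(records, workers=4):
    """Intraquery: split ONE query into sub-tasks and merge the partial results."""
    chunk = (len(records) + workers - 1) // workers
    parts = [records[i:i + chunk] for i in range(0, len(records), chunk)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return sum(pool.map(scan, parts))


def interquery_run(queries, workers=4):
    """Interquery: run SEVERAL independent queries side by side."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(lambda query: query(RECORDS), queries))
```

For example, `intraquery_count(RECORDS)` returns the same count as a single sequential `scan(RECORDS)`, while `interquery_run([len, max, scan])` answers three unrelated queries concurrently.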

PARALLEL DBMSs
BENEFITS OF A PARALLEL DBMS

Speed-Up.

As you multiply resources by a certain factor, the time taken
to execute a transaction should be reduced by the same factor:
10 seconds to scan a DB of 10,000 records using 1 CPU
1 second to scan a DB of 10,000 records using 10 CPUs
PARALLEL DBMSs
HOW TO MEASURE THE BENEFITS

Scale-up.

As you multiply resources the size of a task that can be executed
in a given time should be increased by the same factor.
1 second to scan a DB of 1,000 records using 1 CPU
1 second to scan a DB of 10,000 records using 10 CPUs
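
The two measures can be written as simple ratios. The sketch below uses the timings quoted on the slides; the function names are hypothetical, and the scale-up ratio shown here (small task on the small system divided by large task on the large system) is one common way of expressing it, where 1.0 corresponds to the linear (ideal) case.

```python
def speed_up(time_on_small_system, time_on_large_system):
    """Same task, more resources: how many times faster did it get?"""
    return time_on_small_system / time_on_large_system


def scale_up(time_small_task_small_system, time_large_task_large_system):
    """Bigger task, proportionally more resources: 1.0 means linear scale-up."""
    return time_small_task_small_system / time_large_task_large_system


# Speed-up: 10,000 records take 10 s on 1 CPU and 1 s on 10 CPUs.
linear_speed_up = speed_up(10.0, 1.0)   # 10.0, equal to the resource factor

# Scale-up: 1,000 records on 1 CPU and 10,000 records on 10 CPUs both take 1 s.
linear_scale_up = scale_up(1.0, 1.0)    # 1.0, the ideal case
```

A speed-up below the resource factor, or a scale-up below 1.0, is the sub-linear behaviour usually seen in practice.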

[Figure: speed-up graph. x-axis: Number of CPUs; y-axis: Number of transactions/second. Curves: Linear speed-up (ideal) and Sub-linear speed-up. Point labels: 1000/Sec, 2000/Sec, 1600/Sec; 5 CPUs, 10 CPUs, 16 CPUs.]
PARALLEL DBMSs
SPEED-UP
[Figure: scale-up graph. x-axis: Number of CPUs, Database size; y-axis: Number of transactions/second. Curves: Linear scale-up (ideal) and Sub-linear scale-up. Point labels: 1000/Sec, 900/Sec; 5 CPUs with a 1 GB Database, 10 CPUs with a 2 GB Database.]
PARALLEL DBMSs
SCALE-UP

[Figure: Shared Memory Parallel Database Architecture. Several CPUs share a single MEMORY.]
[Figure: Shared Disk Parallel Database Architecture. Each CPU has its own memory (M); the disks are shared.]
[Figure: Shared Nothing Parallel Database Architecture. Each CPU has its own memory (M) and its own disk.]

MAINFRAME DATABASE
SYSTEM

[Figure: DUMB TERMINALS linked by a SPECIALISED NETWORK CONNECTION to a MAINFRAME COMPUTER, which holds the PRESENTATION LOGIC, BUSINESS LOGIC and DATA LOGIC.]

CLIENT/SERVER DATABASE
SYSTEM

CLIENT/SERVER DBMS



Manages user interface
Accepts user data
Processes application/business logic
Generates database requests (SQL)
Transmits database requests to server
Receives results from server
Formats results according to application logic
Presents results to the user
CLIENT PROCESS
CLIENT/SERVER DBMS



Accepts database requests
Processes database requests
Performs integrity checks
Handles concurrent access
Optimises queries
Performs security checks
Enacts recovery routines
Transmits result of database request to client
SERVER PROCESS


[Figure: CLIENT/SERVER DBMS ARCHITECTURE (FAT CLIENT). CLIENT #1, CLIENT #2 and CLIENT #3 send Data Requests to the D/BASE SERVER and receive Data Responses. Labels: PRESENTATION LOGIC, BUSINESS LOGIC, DATA LOGIC.]


[Figure: CLIENT/SERVER DBMS ARCHITECTURE (THIN CLIENT). CLIENT #1, CLIENT #2 and CLIENT #3 send Data Requests to the D/BASE SERVER and receive Data Responses. Labels: PRESENTATION LOGIC, BUSINESS LOGIC, DATA LOGIC.]
[Figure: DISTRIBUTED PROCESSING ARCHITECTURE. Clients on LANs at Leyton, Stratford, Barking and Leytonstone all use a single DBMS.]

DISTRIBUTED DATABASE
SYSTEM


A distributed database system is a collection of
logically related databases that co-operate in a
transparent manner.

Transparent implies that each user within the
system may access all of the data within all of the
databases as if they were a single database.

There should be location independence, i.e. because
the user is unaware of where the data is located, it
is possible to move the data from one physical
location to another without affecting the user.
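
Location independence can be illustrated with a toy Python sketch. The class `DistributedDatabase` and its methods are hypothetical and model no real DDBMS interface; the point is only that users name a relation, never a site, so moving the data leaves their queries unchanged.

```python
class DistributedDatabase:
    """Toy model: relations live at sites, but users never name the site."""

    def __init__(self):
        self._sites = {}      # site name -> {relation name -> rows}
        self._location = {}   # relation name -> site name (a tiny catalog)

    def store(self, site, relation, rows):
        """Place a relation at a site and record where it lives."""
        self._sites.setdefault(site, {})[relation] = list(rows)
        self._location[relation] = site

    def move(self, relation, new_site):
        """Relocate a relation; only the catalog entry changes for users."""
        old_site = self._location[relation]
        rows = self._sites[old_site].pop(relation)
        self.store(new_site, relation, rows)

    def query(self, relation):
        """Return all rows of a relation, wherever it is currently stored."""
        return self._sites[self._location[relation]][relation]


db = DistributedDatabase()
db.store("Stratford", "Account", [(200, "JONES"), (456, "KHAN")])
before = db.query("Account")
db.move("Account", "Barking")
after = db.query("Account")   # same rows, same call, different physical site
```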
DISTRIBUTED DATABASES
WHAT IS A DISTRIBUTED DATABASE?
[Figure: DISTRIBUTED DATABASE ARCHITECTURE. Four sites (Leytonstone, Stratford, Barking and Leyton), each with its own clients and its own DBMS. Other labels: LAN.]
[Figure: M:N CLIENT/SERVER DBMS ARCHITECTURE. CLIENT #1, CLIENT #2 and CLIENT #3 connect to D/BASE SERVER #1 and D/BASE SERVER #2.]
NOT TRANSPARENT!

[Figure: Site 1 and Site 2 connected by a Computer Network. Labels: DB, GSC, DDBMS, DC, LDBMS.]

LDBMS = Local DBMS
DC = Data Communications
GSC = Global Systems Catalog
DDBMS = Distributed DBMS
COMPONENTS OF A DDBMS
Reduced Communication Overhead
Most data access is local, which is less expensive and
performs better.

Improved Processing Power
Instead of one server handling the full database, we now
have a collection of machines handling the same database.

Removal of Reliance on a Central Site
If a server fails, then the only part of the system that is
affected is the relevant local site. The rest of the system
remains functional and available.

DISTRIBUTED DATABASES
ADVANTAGES
Expandability
It is easier to accommodate increasing the size of the
global (logical) database.

Local autonomy
The database is brought nearer to its users. This can effect
a cultural change, as it potentially allows greater control
over local data.

DISTRIBUTED DATABASES
ADVANTAGES

A distributed system looks exactly like
a non-distributed system to the user!
1. Local autonomy
2. No reliance on a central site
3. Continuous operation
4. Location independence
5. Fragmentation independence
6. Replication independence
7. Distributed query processing
8. Distributed transaction processing
9. Hardware independence
10. Operating system independence
11. Network independence
12. Database independence
DISTRIBUTED DATABASES
DATE'S TWELVE RULES FOR A DDBMS

Data Allocation

Data Fragmentation

Distributed Catalogue Management

Distributed Transactions

Distributed Queries (see chapter 20)
DISTRIBUTED DATABASES
ISSUES

1. Locality of reference
Is the data near to the sites that need it?

2. Reliability and availability
Does the strategy improve fault tolerance and accessibility?
3. Performance
Does the strategy result in bottlenecks or under-utilisation of resources?
4. Storage costs
How does the strategy affect the availability and cost of data storage?
5. Communication costs
How much network traffic will result from the strategy?
DISTRIBUTED DATABASES
DATA ALLOCATION METRICS

CENTRALISED

DISTRIBUTED DATABASES
DATA ALLOCATION STRATEGIES
Locality of Reference: Lowest
Reliability/Availability: Lowest
Storage Costs: Lowest
Performance: Unsatisfactory
Communication Costs: Highest

PARTITIONED/FRAGMENTED

DISTRIBUTED DATABASES
DATA ALLOCATION STRATEGIES
Locality of Reference: High
Reliability/Availability: Low (item), High (system)
Storage Costs: Lowest
Performance: Satisfactory
Communication Costs: Low

COMPLETE REPLICATION

DISTRIBUTED DATABASES
DATA ALLOCATION STRATEGIES
Locality of Reference: Highest
Reliability/Availability: Highest
Storage Costs: Highest
Performance: High
Communication Costs: High (update), Low (read)

SELECTIVE REPLICATION

DISTRIBUTED DATABASES
DATA ALLOCATION STRATEGIES
Locality of Reference: High
Reliability/Availability: Low (item), High (system)
Storage Costs: Average
Performance: Satisfactory
Communication Costs: Low

Usage
Applications are usually interested in views not whole relations.

Efficiency
It's more efficient if data is close to where it is frequently used.

Parallelism
It is possible to run several sub-queries in tandem.

Security
Data not required by local applications is not stored at the local
site.

DISTRIBUTED DATABASES
WHY FRAGMENT DATA?
DISTRIBUTED DATABASES
HORIZONTAL DATA FRAGMENTATION
ACCOUNT  CUSTOMER  BRANCH     BALANCE
200      JONES     STRATFORD  1000.00
324      GRAY      BARKING     200.00
345      SMITH     STRATFORD    23.17
350      GREEN     BARKING     340.14
400      ONO       BARKING     500.00
456      KHAN      STRATFORD   333.00
Horizontal Fragmentation: Consists of a Restriction on a Relation.

e.g., $\sigma_{\text{branch} = \text{'Stratford'}}(\text{Account})$
DISTRIBUTED DATABASES
HORIZONTAL DATA FRAGMENTATION
STRATFORD BRANCH
ACCT NO.  CUSTOMER  BRANCH     BALANCE
200       JONES     STRATFORD  1000.00
345       SMITH     STRATFORD    23.17
456       KHAN      STRATFORD   333.00

BARKING BRANCH
ACCT NO.  CUSTOMER  BRANCH     BALANCE
324       GRAY      BARKING     200.00
350       GREEN     BARKING     340.14
400       ONO       BARKING     500.00
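
Horizontal fragmentation of the Account relation can be sketched in Python as a restriction applied once per branch. The helper name `restrict` and the dictionary keys are illustrative assumptions; the rows are the ones shown on the slides. The last step shows that the union of the fragments gives back the original relation.

```python
ACCOUNT = [
    {"account": 200, "customer": "JONES", "branch": "STRATFORD", "balance": 1000.00},
    {"account": 324, "customer": "GRAY",  "branch": "BARKING",   "balance": 200.00},
    {"account": 345, "customer": "SMITH", "branch": "STRATFORD", "balance": 23.17},
    {"account": 350, "customer": "GREEN", "branch": "BARKING",   "balance": 340.14},
    {"account": 400, "customer": "ONO",   "branch": "BARKING",   "balance": 500.00},
    {"account": 456, "customer": "KHAN",  "branch": "STRATFORD", "balance": 333.00},
]


def restrict(relation, predicate):
    """Restriction (selection): keep only the rows that satisfy the predicate."""
    return [row for row in relation if predicate(row)]


# One horizontal fragment per branch.
stratford = restrict(ACCOUNT, lambda row: row["branch"] == "STRATFORD")
barking = restrict(ACCOUNT, lambda row: row["branch"] == "BARKING")

# Reconstruction: the union of the fragments is the original relation.
rebuilt = sorted(stratford + barking, key=lambda row: row["account"])
```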
DISTRIBUTED DATABASES
VERTICAL DATA FRAGMENTATION
S#   NAME   SITE       PHONE NO       LOGIN    PASSWORD
200  JONES  STRATFORD  0208-500-9000  JON200T  XXYY22
324  GRAY   BARKING    0208-545-7528  GRA324S  ZZEE56
456  KHAN   STRATFORD  0208-500-5821  KHA456T  KJTR78
Vertical Fragmentation: Consists of a Projection on a Relation.

e.g., $\Pi_{\text{S\#, NAME, SITE, PHONE NO}}(\text{Student})$
DISTRIBUTED DATABASES
VERTICAL DATA FRAGMENTATION
STUDENT ADMINISTRATION
S#   NAME   SITE       PHONE NO.
200  JONES  STRATFORD  0208-500-9000
324  GRAY   BARKING    0208-545-7528
456  KHAN   STRATFORD  0208-500-5821

NETWORK ADMINISTRATION
S#   LOGIN-ID  PASSWORD
200  JON200T   XXYY22
324  GRA324S   ZZEE56
456  KHA456T   KJTR78
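
Vertical fragmentation of the Student relation can be sketched as two projections that both keep the key S#. The helper names `project` and `join_on_key` and the dictionary keys are illustrative assumptions; the rows are those on the slides. Because each fragment keeps the key, joining the fragments on it rebuilds the original relation.

```python
STUDENT = [
    {"s_no": 200, "name": "JONES", "site": "STRATFORD",
     "phone": "0208-500-9000", "login": "JON200T", "password": "XXYY22"},
    {"s_no": 324, "name": "GRAY", "site": "BARKING",
     "phone": "0208-545-7528", "login": "GRA324S", "password": "ZZEE56"},
    {"s_no": 456, "name": "KHAN", "site": "STRATFORD",
     "phone": "0208-500-5821", "login": "KHA456T", "password": "KJTR78"},
]


def project(relation, attributes):
    """Projection: keep only the named attributes of every row."""
    return [{name: row[name] for name in attributes} for row in relation]


def join_on_key(left, right, key):
    """Join two fragments on their shared key attribute."""
    index = {row[key]: row for row in right}
    return [{**row, **index[row[key]]} for row in left if row[key] in index]


# Each vertical fragment keeps the key S# so the relation can be rebuilt.
student_admin = project(STUDENT, ["s_no", "name", "site", "phone"])
network_admin = project(STUDENT, ["s_no", "login", "password"])

rebuilt = join_on_key(student_admin, network_admin, "s_no")
```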
DISTRIBUTED DATABASES
DISTRIBUTED CATALOG MANAGEMENT
Centralised Global Catalog

One site maintains the full global catalog. All changes to
any local system catalog have to be propagated to the site
maintaining the global catalog. This gives poor performance,
creates a single point of failure and compromises site autonomy.


Dispersed Catalog

There is no physical global catalog. Each time a remote
data item is required, the catalogues from ALL other sites
are examined for the item. This has severe performance
penalties.
DISTRIBUTED DATABASES
DISTRIBUTED CATALOG MANAGEMENT
Replicated Global Catalog

Each site maintains its own global catalog. Although this
greatly speeds up remote data location, it is very
inefficient to maintain. Details of every data item added,
changed or deleted locally have to be propagated to ALL
other sites.

Local-Master Catalog

Each site maintains both its local system catalog as well
as a catalog of all of its data items that are replicated at
other sites. This avoids compromising site autonomy, is
fairly efficient, and is not a single point of failure.
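
The trade-off between the first three catalog strategies can be made concrete with a rough message-count model in Python. The function name `catalog_messages` and the cost assumptions are hypothetical: one message per site contacted, replies ignored, and every lookup issued from a site that does not hold the catalog. The local-master scheme is left out because the description above does not give enough detail to cost it.

```python
def catalog_messages(strategy, n_sites, remote_lookups, local_changes):
    """Rough count of inter-site messages caused by catalog traffic."""
    if strategy == "centralised":
        # Every lookup and every local change goes to the one catalog site.
        return remote_lookups + local_changes
    if strategy == "dispersed":
        # Every lookup examines the catalogs of ALL other sites.
        return remote_lookups * (n_sites - 1)
    if strategy == "replicated":
        # Lookups are local; every change is propagated to ALL other sites.
        return local_changes * (n_sites - 1)
    raise ValueError(f"unknown strategy: {strategy}")


# Hypothetical workload: 4 sites, 100 remote lookups, 10 local catalog changes.
costs = {
    name: catalog_messages(name, n_sites=4, remote_lookups=100, local_changes=10)
    for name in ("centralised", "dispersed", "replicated")
}
```

Under this workload the dispersed catalog is the most expensive because lookups dominate; with a change-heavy workload the replicated catalog becomes the costly one.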
ATOMIC DISTRIBUTED TRANSACTION

DISTRIBUTED DATABASES
DISTRIBUTED TRANSACTIONS
[Figure: three Stratford Clients, the Stratford DBMS, Barking DBMS and Leyton DBMS, and their databases (Stratford DB, Barking DB, Leyton DB). Arrows (a), (b) and (c) mark the three steps of the global transaction.]

Global Transaction

(a) Debit Stratford A/C 500
(b) Credit Barking A/C 350
(c) Credit Leyton A/C 150
TWO-PHASE COMMIT (2PC) - OK
TWO-PHASE COMMIT (2PC) - ABORT
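
The OK and ABORT outcomes of two-phase commit can be sketched in Python using the global transaction above. This is a deliberately simplified illustration, not a full protocol: the class `Participant`, the function `two_phase_commit`, the starting balances and the "no overdraft" voting rule are all assumptions, and logging, timeouts and failure recovery are omitted.

```python
class Participant:
    """One site taking part in the global transaction."""

    def __init__(self, name, balance):
        self.name = name
        self.balance = balance
        self._pending = None

    def prepare(self, delta):
        """Phase 1: vote True (ready to commit) or False (must abort)."""
        if self.balance + delta < 0:   # assumed local rule: no overdrafts
            return False
        self._pending = delta
        return True

    def commit(self):
        """Phase 2, OK case: make the pending change permanent."""
        if self._pending is not None:
            self.balance += self._pending
        self._pending = None

    def abort(self):
        """Phase 2, ABORT case: discard the pending change."""
        self._pending = None


def two_phase_commit(work):
    """work is a list of (participant, delta). Returns True if committed."""
    votes = [site.prepare(delta) for site, delta in work]   # phase 1: voting
    if all(votes):                                          # phase 2: decision
        for site, _ in work:
            site.commit()
        return True
    for site, _ in work:
        site.abort()
    return False


# Hypothetical starting balances for the three accounts.
stratford = Participant("Stratford", 1000)
barking = Participant("Barking", 200)
leyton = Participant("Leyton", 0)
ok = two_phase_commit([(stratford, -500), (barking, 350), (leyton, 150)])
```

If any site votes no, for instance because the Stratford account holds less than 500, no site applies its change, which is what makes the distributed transaction atomic.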

Architectural complexity.

Cost.

Security.

Integrity control more difficult.

Lack of standards.

Lack of experience.

Database design more complex.
DISTRIBUTED DATABASES
DISADVANTAGES OF DDBMSs
