
Chap. 5 Distributed Database Design

• Introduction
– Alternative design strategies
• Distribution design issues
• Data fragmentation
• Data allocation

M.H. Kim, KAIST



Introduction: Design Strategies

• Alternative design strategies
» approaches to distribution design
– Top-down design (main focus of this chapter)
» suitable when designing systems from scratch
» mostly used in homogeneous systems
– Bottom-up design (discussed in Chapter 15)
» suitable when databases already exist at a number of sites
» mostly used in heterogeneous systems

Introduction: Design Strategies (cont’d)
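Figure: Top-down design process
Requirements Analysis → Objectives → View Design and Conceptual Design (both with user input, linked by View Integration) → External Schema and Access Information (from view design), GCS (from conceptual design) → Distribution Design (with user input) → LCS → Physical Design → LIS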

Introduction: Design Strategies (cont’d)

• Distribution design
» design the local conceptual schemas (LCSs)
◦ by distributing the entities over the sites of the DCS
– fragmentation
– allocation

Distribution Design Issues (cont’d)

• Reasons for fragmentation: a whole relation may not be a suitable unit of distribution
– application views are usually subsets of relations
» i.e., locality or proximity of access
– fragmentation permits a number of transactions to execute concurrently
» inter-query concurrency
· i.e., transactions that access different portions of a relation
» intra-query concurrency
· i.e., parallel execution of a single query


Distribution Design Issues (cont’d)

• Disadvantages of fragmentation
– may require extra processing, e.g., joins
» for views that cannot be defined on a single fragment
– semantic data control is more difficult
» especially integrity enforcement

Distribution Design Issues (cont’d)

• Fragmentation alternatives
– horizontal fragmentation
– vertical fragmentation


Distribution Design Issues (cont’d)

(Ex) Horizontal fragmentation

PROJ
PNO  PNAME              BUDGET  LOC
P1   Instrumentation    150000  Montreal
P2   Database Develop.  135000  New York
P3   CAD/CAM            250000  New York
P4   Maintenance        310000  Paris
P5   CAD/CAM            500000  Boston

PROJ1: projects with budgets less than $200,000

PROJ2: projects with budgets greater than or equal to $200,000

Distribution Design Issues (cont’d)

(Example cont’d)

PROJ1
PNO  PNAME              BUDGET  LOC
P1   Instrumentation    150000  Montreal
P2   Database Develop.  135000  New York

PROJ2
PNO  PNAME              BUDGET  LOC
P3   CAD/CAM            250000  New York
P4   Maintenance        310000  Paris
P5   CAD/CAM            500000  Boston


Distribution Design Issues (cont’d)

(Ex) Vertical fragmentation

PROJ
PNO  PNAME              BUDGET  LOC
P1   Instrumentation    150000  Montreal
P2   Database Develop.  135000  New York
P3   CAD/CAM            250000  New York
P4   Maintenance        310000  Paris
P5   CAD/CAM            500000  Boston

PROJ1: information about project budgets


PROJ2: information about project names and locations
Distribution Design Issues (cont’d)

(Example cont’d)

PROJ1
PNO  BUDGET
P1   150000
P2   135000
P3   250000
P4   310000
P5   500000

PROJ2
PNO  PNAME              LOC
P1   Instrumentation    Montreal
P2   Database Develop.  New York
P3   CAD/CAM            New York
P4   Maintenance        Paris
P5   CAD/CAM            Boston


Distribution Design Issues (cont’d)

• Degree of fragmentation
– a large number of alternatives, ranging from individual tuples or attributes to the whole relation
– find the suitable level of partitioning within this range
» can only be defined w.r.t. the applications that will run on the DB

Distribution Design Issues (cont’d)

• Correctness of fragmentation
– Completeness
» decomposition of relation R into fragments R1, R2, …, Rn is complete iff each data item in R can also be found in some Ri
– Disjointness
» if relation R is decomposed into fragments R1, R2, …, Rn, and data item di is in Rj, then di should not be in any other fragment Rk (k ≠ j)
– Reconstruction
» if relation R is decomposed into fragments R1, R2, …, Rn, then there should exist some relational operator ∇ such that
$R = \nabla_{1 \le i \le n} R_i$
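The three rules can be checked mechanically on a concrete instance. The Python sketch below is a minimal illustration, assuming relations are held as lists of dicts, that ∇ is union for the horizontal split of PROJ by budget and a join on the key PNO for the vertical split; the helper names (`as_set`, `is_complete`, `is_disjoint`, `union_all`, `join_on_key`) are hypothetical.

```python
# Minimal sketch: relations are lists of dicts (an assumption of this example).
PROJ = [
    {"PNO": "P1", "PNAME": "Instrumentation", "BUDGET": 150000, "LOC": "Montreal"},
    {"PNO": "P2", "PNAME": "Database Develop.", "BUDGET": 135000, "LOC": "New York"},
    {"PNO": "P3", "PNAME": "CAD/CAM", "BUDGET": 250000, "LOC": "New York"},
    {"PNO": "P4", "PNAME": "Maintenance", "BUDGET": 310000, "LOC": "Paris"},
    {"PNO": "P5", "PNAME": "CAD/CAM", "BUDGET": 500000, "LOC": "Boston"},
]


def as_set(rel):
    """Hashable form of a relation: a set of sorted (attribute, value) tuples."""
    return {tuple(sorted(t.items())) for t in rel}


def is_complete(rel, fragments):
    """Every tuple of R appears in at least one fragment."""
    covered = set().union(*(as_set(f) for f in fragments))
    return as_set(rel) <= covered


def is_disjoint(fragments):
    """No tuple appears in two different fragments (horizontal case)."""
    seen = set()
    for f in fragments:
        s = as_set(f)
        if seen & s:
            return False
        seen |= s
    return True


def union_all(fragments):
    """Reconstruction operator for horizontal fragments."""
    return set().union(*(as_set(f) for f in fragments))


def project(rel, attrs):
    return [{a: t[a] for a in attrs} for t in rel]


def join_on_key(left, right, key):
    """Reconstruction operator for vertical fragments: natural join on the key."""
    by_key = {t[key]: t for t in right}
    return [{**t, **by_key[t[key]]} for t in left if t[key] in by_key]


# Horizontal fragmentation of PROJ by budget.
PROJ1 = [t for t in PROJ if t["BUDGET"] < 200000]
PROJ2 = [t for t in PROJ if t["BUDGET"] >= 200000]
print(is_complete(PROJ, [PROJ1, PROJ2]), is_disjoint([PROJ1, PROJ2]),
      union_all([PROJ1, PROJ2]) == as_set(PROJ))

# Vertical fragmentation of PROJ; the key PNO is replicated in both fragments.
V1 = project(PROJ, ["PNO", "BUDGET"])
V2 = project(PROJ, ["PNO", "PNAME", "LOC"])
print(as_set(join_on_key(V1, V2, "PNO")) == as_set(PROJ))
```

Both print calls output True. For the vertical split, disjointness is judged on the non-key attributes ({BUDGET} vs. {PNAME, LOC}), since the replicated key PNO is not counted as overlap.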


Distribution Design Issues (cont’d)

• Allocation alternatives
– non-replicated
» partitioned: each fragment resides at only one site
– replicated
· good for reliability and for the efficiency of read-only queries
· complicates updates, since every copy must be updated
» fully replicated: each fragment at all sites
» partially replicated: each fragment at some of the sites

Distribution Design Issues (cont’d)

• Comparison of replication alternatives

                      Full replication      Partial replication  Partitioning
Query processing      easy                  same difficulty      same difficulty
Directory management  easy or non-existent  same difficulty      same difficulty
Concurrency control   moderate              difficult            easy
Reliability           very high             high                 low
Reality               possible application  realistic            possible application

Distribution Design Issues (cont’d)

• Information requirements
– Database information
– Application information
– Communication network information
– Site information
» i.e., computer system information
• Database and application information are needed for fragmentation; all four categories are needed for allocation

Fragmentation

• Horizontal Fragmentation (HF)
– Primary Horizontal Fragmentation (PHF)
– Derived Horizontal Fragmentation (DHF)
• Vertical Fragmentation (VF)
• Hybrid Fragmentation


Fragmentation (cont’d)

• Horizontal fragmentation (HF)


– Information requirements
» database information
» application information

Fragmentation (cont’d)
• Database information
– Join graph
» equi-join relationships among relations
Figure: Join graph (each link Li connects an owner relation to a member relation, 1 : n)
L1: PAY(TITLE, SAL) → EMP(ENO, ENAME, TITLE)
L2: EMP(ENO, ENAME, TITLE) → ASG(ENO, PNO, RESP, DUR)
L3: PROJ(PNO, PNAME, BUDGET, LOC) → ASG(ENO, PNO, RESP, DUR)



Fragmentation (cont’d)

• Application information
– qualitative information
» minterm predicates
· denote the access patterns of user applications
– quantitative information
» minterm selectivity
· given a minterm predicate, how many tuples are accessed
» access frequency of a query
· how frequently the query is issued

Fragmentation (cont’d)

• 80/20 rule
» analyzing all the user applications may not be possible
– the most active 20% of user queries account for
» 80% of the total data accesses
– may be used as a guideline


Fragmentation (cont’d)

• Simple predicate
– given relation R(A1, A2, …, An),
» a simple predicate pj has the form
Ai θ Value
» where θ ∈ {=, <, ≤, >, ≥, ≠} and Value is chosen from the domain of Ai
(ex) a single comparison in an SQL WHERE clause
– Pr = {p1, p2, …, pm} denotes the set of all simple predicates defined on R.

(Ex) simple predicates
◦ PNAME = “Maintenance”
◦ BUDGET ≤ 200000
Fragmentation (cont’d)

• Minterm predicate
– given relation R and Pr = {p1, p2, …, pm},
» define M = {m1, m2, …, mr} as
$M = \{\, m_i \mid m_i = \bigwedge_{p_j \in Pr} p_j^* \,\},\quad 1 \le j \le m,\ 1 \le i \le r$
» where $p_j^* = p_j$ or $p_j^* = \neg p_j$
· i.e., each simple predicate occurs in a minterm predicate either in its natural form or in its negated form

(Ex) minterm predicates


m1: PNAME=“Maintenance” ∧ BUDGET≤200000
m2: NOT(PNAME=“Maintenance”) ∧ BUDGET≤200000
m3: PNAME=“Maintenance” ∧ NOT(BUDGET≤200000)
m4: NOT(PNAME=“Maintenance”) ∧ NOT(BUDGET≤200000)

Fragmentation (cont’d)

• Access frequency of a query: acc(qi)
– frequency with which a user application qi accesses data
• Minterm selectivity: sel(mi)
– number of tuples of the relation
» that would be accessed by a user query specified according to a given minterm predicate mi
· e.g., for PROJ and the minterms above, sel(m1) = 0 and sel(m2) = 2

Fragmentation: PHF

• Primary horizontal fragmentation (PHF)
– defined by a selection operation on the owner relations
· of the join graph
– given relation R, its horizontal fragments are
$R_j = \sigma_{F_j}(R),\quad 1 \le j \le w$
» where Fj is a selection formula,
· which is (preferably) a minterm predicate mi


Fragmentation: PHF (cont’d)

– horizontal fragment Ri of relation R consists of
» all the tuples of R that satisfy a minterm predicate mi
– there are as many horizontal fragments of relation R
» as there are minterm predicates
• Minterm fragment
» horizontal fragment defined by a minterm predicate

Fragmentation: PHF (cont’d)

• Outline of PHF
– given
» a relation R and the set of simple predicates Pr
– output
» the set of fragments of R, {R1, R2, …, Rw}
· which obey the fragmentation rule


Fragmentation: PHF (cont’d)

• Rule 1 (fragmentation rule)
» a relation or fragment is partitioned into at least two parts that are accessed differently
· by at least one application
» i.e., the access frequency af1 of the tuples in fragment ft1 and the access frequency af2 of the tuples in fragment ft2 differ for at least one application

Fragmentation: PHF (cont’d)

• Requirements for the set of simple predicates Pr
– Pr should be complete
– Pr should be minimal


Fragmentation: PHF (cont’d)

• Completeness of simple predicates
– a set of simple predicates Pr is said to be complete
» if and only if any two tuples of the same minterm fragment defined by Pr have the same probability of being accessed by every application
» i.e., any two tuples in the fragment have the same access frequency for every application

Fragmentation: PHF (cont’d)

(Ex) Completeness of simple predicates
» assume relation PROJ(PNO, PNAME, BUDGET, LOC) has two applications defined on it
– application 1
» Find the budgets of projects at “Montreal”.
» Find the budgets of projects at “New York”.
» Find the budgets of projects at “Paris”.
– application 2
» Find projects with budgets less than or equal to $200000.
» Find projects with budgets greater than $200000.


Fragmentation: PHF (cont’d)

(Example cont’d)

– according to application 1,
» Pr = {LOC=“Montreal”, LOC=“New York”, LOC=“Paris”}
· but this is not complete with respect to application 2
– thus, modify it to
» Pr = {LOC=“Montreal”, LOC=“New York”, LOC=“Paris”, BUDGET≤200000, BUDGET>200000}
– then, it is complete.

Fragmentation: PHF (cont’d)

• Minimality of simple predicates
– if a predicate influences the fragmentation
· i.e., causes a fragment f to be further fragmented
◦ into, say, fi and fj
» then there should be at least one application that accesses fi and fj differently
– in other words, each simple predicate should be relevant in determining a fragmentation
– if all the predicates of a set Pr are relevant,
» then Pr is minimal


Fragmentation: PHF (cont’d)

(Ex) Minimality of simple predicates
• Pr = {LOC=“Montreal”, LOC=“New York”, LOC=“Paris”, BUDGET≤200000, BUDGET>200000}
– is minimal (in addition to being complete)
• However, if we add PNAME = “Instrumentation” to Pr,
» then Pr is not minimal
· because no application would access the resulting fragments any differently

Fragmentation: PHF (cont’d)

• Algorithm COM_MIN
– input
» a relation R and a set of simple predicates Pr
– output
» a complete and minimal set of simple predicates Pr´ for Pr
• F: set of minterm fragments

• Rule 1 (fragmentation by a relevant predicate)
» a relation or fragment is partitioned into at least two parts that are accessed differently
· by at least one application

Fragmentation: PHF (cont’d)

1. Initialization
– find a pi ∈ Pr such that pi partitions R according to Rule 1
– set Pr´ ← {pi}; Pr ← Pr − {pi}; F ← {fi}
/* fi: fragment defined according to a minterm predicate over the predicates of Pr´ */
2. Iteratively add predicates to Pr´ until it is complete
– find a pj ∈ Pr such that pj partitions some fk according to Rule 1
– set Pr´ ← Pr´ ∪ {pj}; Pr ← Pr − {pj}; F ← F ∪ {fj}

Fragmentation: PHF (cont’d)

• Primary horizontal fragmentation algorithm
» uses COM_MIN to perform the fragmentation
– input
» a relation R and a set of simple predicates Pr
– output
» a set of minterm predicates M
· according to which relation R is to be fragmented


Fragmentation: PHF (cont’d)


1. Pr´ ← COM_MIN(R, Pr)
2. determine the set M of minterm predicates
3. determine the set I of implications among pi ∈ Pr´
» (ex) LOC=“Montreal” implies NOT(LOC=“New York”) ∧ NOT(LOC=“Paris”)
4. eliminate the contradictory minterms from M
◦ by using the set of implications I
9 by using the set of implications I

Fragmentation: PHF (cont’d)

(Ex) Primary horizontal fragmentation for PAY and PROJ
• Fragmentation of relation PAY
– application: “Check the salary info and determine raise.”
» employee records are managed at two sites
· one site handles the records with salaries greater than 30,000
· the other site handles the rest of the records
» this application can run at both sites
– simple predicates
p1 : SAL ≤ 30000, p2 : SAL > 30000
» Pr = {p1, p2}, which is complete and minimal; thus Pr´ = Pr


Fragmentation: PHF (cont’d)

(Example cont’d)

– Minterm predicates
m1 : (SAL ≤ 30000)
m2 : (SAL > 30000), i.e., NOT(SAL ≤ 30000)

PAY1
TITLE        SAL
Mech. Eng.   27000
Programmer   24000

PAY2
TITLE        SAL
Elect. Eng.  40000
Syst. Anal.  34000

Fragmentation: PHF (cont’d)

(Example cont’d)

• Fragmentation of relation PROJ
– applications
1. Given a location, find the name and budget of projects in that location.
◦ issued at three sites
2. Access project information according to budget.
◦ one site accesses projects with BUDGET ≤ 200000,
◦ the other site accesses projects with BUDGET > 200000


Fragmentation: PHF (cont’d)

(Example cont’d)

– simple predicates
» for application 1,
p1 : LOC = “Montreal”
p2 : LOC = “New York”
p3 : LOC = “Paris”
» for application 2,
p4 : BUDGET ≤ 200000
p5 : BUDGET > 200000

» Pr = Pr´ = {p1, p2, p3, p4, p5}

Fragmentation: PHF (cont’d)

(Example cont’d)

– Minterm predicates
◦ remaining after eliminating the contradictory ones
» m1 : (LOC = “Montreal”) ∧ (BUDGET ≤ 200000)
» m2 : (LOC = “Montreal”) ∧ (BUDGET > 200000)
» m3 : (LOC = “New York”) ∧ (BUDGET ≤ 200000)
» m4 : (LOC = “New York”) ∧ (BUDGET > 200000)
» m5 : (LOC = “Paris”) ∧ (BUDGET ≤ 200000)
» m6 : (LOC = “Paris”) ∧ (BUDGET > 200000)


Fragmentation: PHF (cont’d)

(Example cont’d)

PROJ1
PNO  PNAME              BUDGET  LOC
P1   Instrumentation    150000  Montreal

PROJ3
PNO  PNAME              BUDGET  LOC
P2   Database Develop.  135000  New York

PROJ4
PNO  PNAME              BUDGET  LOC
P3   CAD/CAM            250000  New York

PROJ6
PNO  PNAME              BUDGET  LOC
P4   Maintenance        310000  Paris

» PROJ2 and PROJ5 (defined by m2 and m5) are empty and are not shown.
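Steps 2 to 4 of the PHF algorithm can be replayed on this example. The Python sketch below is illustrative only: it encodes the implications I for this particular Pr´ by hand (exactly one LOC predicate and exactly one BUDGET predicate can hold), which assumes LOC only takes the values Montreal, New York and Paris, so project P5 (Boston) from the earlier PROJ table is left out. `phf` and `consistent` are hypothetical names.

```python
from itertools import product

# PROJ restricted to the locations covered by Pr´ (assumption: LOC is drawn
# from {Montreal, New York, Paris}, so project P5 in Boston is left out).
PROJ = [
    {"PNO": "P1", "PNAME": "Instrumentation", "BUDGET": 150000, "LOC": "Montreal"},
    {"PNO": "P2", "PNAME": "Database Develop.", "BUDGET": 135000, "LOC": "New York"},
    {"PNO": "P3", "PNAME": "CAD/CAM", "BUDGET": 250000, "LOC": "New York"},
    {"PNO": "P4", "PNAME": "Maintenance", "BUDGET": 310000, "LOC": "Paris"},
]

# Pr´ = {p1, ..., p5} from the example, as (label, predicate) pairs.
PR = [
    ('LOC = "Montreal"', lambda t: t["LOC"] == "Montreal"),
    ('LOC = "New York"', lambda t: t["LOC"] == "New York"),
    ('LOC = "Paris"', lambda t: t["LOC"] == "Paris"),
    ("BUDGET <= 200000", lambda t: t["BUDGET"] <= 200000),
    ("BUDGET > 200000", lambda t: t["BUDGET"] > 200000),
]


def consistent(signs):
    """Implications I (hand-coded for this Pr´): one LOC and one BUDGET predicate hold."""
    return sum(signs[:3]) == 1 and sum(signs[3:]) == 1


def phf(rel, preds):
    """Enumerate all minterms (step 2), drop contradictory ones (step 4), and fragment rel."""
    fragments = []
    for signs in product((True, False), repeat=len(preds)):
        if not consistent(signs):
            continue
        # The negated predicates are implied by the positive ones, so only the latter are shown.
        label = " AND ".join(name for (name, _), s in zip(preds, signs) if s)
        frag = [t for t in rel if all(p(t) == s for (_, p), s in zip(preds, signs))]
        fragments.append((label, frag))
    return fragments


for i, (label, frag) in enumerate(phf(PROJ, PR), start=1):
    print(f"PROJ{i}: {label} -> {[t['PNO'] for t in frag]}")
```

This prints six minterm fragments in the order m1 to m6; P1, P2, P3 and P4 land in PROJ1, PROJ3, PROJ4 and PROJ6, while PROJ2 and PROJ5 are empty.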

Fragmentation: PHF (cont’d)

• Correctness of the PHF algorithm
– Completeness
» clear from the method
· every tuple is in some minterm fragment
– Reconstruction
» if relation R is fragmented into FR = {R1, R2, …, Rr}, then
$R = \bigcup_{R_i \in F_R} R_i$
– Disjointness
» the minterm predicates that form the basis of fragmentation should be mutually exclusive


Fragmentation: DHF

• Derived horizontal fragmentation (DHF)
– defined on a member relation of a link
· according to a selection operation specified on its owner
Figure: Join graph, with owner and member marked for link L1
L1: PAY(TITLE, SAL) [owner] → EMP(ENO, ENAME, TITLE) [member]
L2: EMP(ENO, ENAME, TITLE) → ASG(ENO, PNO, RESP, DUR)
L3: PROJ(PNO, PNAME, BUDGET, LOC) → ASG(ENO, PNO, RESP, DUR)

Fragmentation: DHF (cont’d)

– each link is an equijoin
» the tuples of a member relation that participate in an equijoin
· can be obtained by means of semijoins
– given a link L where owner(L) = S and member(L) = R,
» the derived horizontal fragments of R are defined as
$R_i = R \ltimes S_i,\quad 1 \le i \le w$
· where $S_i = \sigma_{F_i}(S)$
» here, Fi is the formula according to which the primary horizontal fragment Si is defined
• main objective: efficient joins



Fragmentation: DHF (cont’d)

(Ex) Derived horizontal fragmentation
– Consider L1
· owner(L1) = PAY and member(L1) = EMP
» $EMP_1 = EMP \ltimes PAY_1$, where $PAY_1 = \sigma_{SAL \le 30000}(PAY)$
» $EMP_2 = EMP \ltimes PAY_2$, where $PAY_2 = \sigma_{SAL > 30000}(PAY)$

EMP1
ENO  ENAME      TITLE
E3   A. Lee     Mech. Eng.
E4   J. Miller  Programmer
E7   R. Davis   Mech. Eng.

EMP2
ENO  ENAME     TITLE
E1   J. Doe    Elect. Eng.
E2   M. Smith  Syst. Anal.
E5   B. Casey  Syst. Anal.
E6   L. Chu    Elect. Eng.
E8   J. Jones  Syst. Anal.
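Derived fragments are semijoins, so EMP1 and EMP2 can be recomputed from the PAY fragments in a few lines of Python. This is a minimal sketch, assuming list-of-dict relations with the PAY and EMP contents shown in the tables above; `semijoin` is a hypothetical helper, not a library function.

```python
PAY = [
    {"TITLE": "Elect. Eng.", "SAL": 40000},
    {"TITLE": "Syst. Anal.", "SAL": 34000},
    {"TITLE": "Mech. Eng.", "SAL": 27000},
    {"TITLE": "Programmer", "SAL": 24000},
]
EMP = [
    {"ENO": "E1", "ENAME": "J. Doe", "TITLE": "Elect. Eng."},
    {"ENO": "E2", "ENAME": "M. Smith", "TITLE": "Syst. Anal."},
    {"ENO": "E3", "ENAME": "A. Lee", "TITLE": "Mech. Eng."},
    {"ENO": "E4", "ENAME": "J. Miller", "TITLE": "Programmer"},
    {"ENO": "E5", "ENAME": "B. Casey", "TITLE": "Syst. Anal."},
    {"ENO": "E6", "ENAME": "L. Chu", "TITLE": "Elect. Eng."},
    {"ENO": "E7", "ENAME": "R. Davis", "TITLE": "Mech. Eng."},
    {"ENO": "E8", "ENAME": "J. Jones", "TITLE": "Syst. Anal."},
]


def semijoin(r, s, attr):
    """R semijoin S on attr: the tuples of R that match at least one tuple of S."""
    keys = {t[attr] for t in s}
    return [t for t in r if t[attr] in keys]


# Primary horizontal fragments of the owner PAY (link L1 joins on TITLE).
PAY1 = [t for t in PAY if t["SAL"] <= 30000]
PAY2 = [t for t in PAY if t["SAL"] > 30000]

# Derived horizontal fragments of the member EMP.
EMP1 = semijoin(EMP, PAY1, "TITLE")
EMP2 = semijoin(EMP, PAY2, "TITLE")
print([t["ENO"] for t in EMP1])
print([t["ENO"] for t in EMP2])
```

It prints ['E3', 'E4', 'E7'] and ['E1', 'E2', 'E5', 'E6', 'E8']. Because every EMP.TITLE matches exactly one PAY tuple, the derived fragments here are complete and disjoint, and their union gives back EMP.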

Fragmentation: DHF (cont’d)

• Complication in DHF
» there can be multiple links into the target (i.e., member) relation
· i.e., there can be several possible DHFs
– criteria for choosing a DHF
» the fragmentation used by more applications
· try to focus on the heavy users
» the fragmentation with better join characteristics
· joins can be performed on smaller relations
· joins can be performed in a distributed fashion
◦ i.e., distributed join


Fragmentation: DHF (cont’d)

• Distributed join
» sub-joins between horizontally fragmented relations
– efficiency of a distributed join
» affected by the nature of the join graph
· simple join graph between fragments
· complex join graph between fragments

Fragmentation: DHF (cont’d)

• Simple join graph between fragments
– each fragment has only one link, i.e., it joins with exactly one fragment of the other relation
◦ occurs when the link in the join graph is a one-to-many relationship
» sub-joins can proceed independently and in parallel
» allocating the fragments of the owner and the member to the same site may be very effective


Fragmentation: DHF (cont’d)

• Complex join graph between fragments
– some sub-graph is not a simple join graph
◦ occurs when the link in the join graph is a many-to-many relationship
» efficient parallel processing is difficult

Fragmentation: DHF (cont’d)

Figure: Simple join graph between fragments (Ri joins only Si, for R1–R4 and S1–S4) vs. complex join graph between fragments (fragments R1–R4 and S1–S3)


Fragmentation: DHF (cont’d)

• Correctness of the DHF algorithm
– Completeness
» if the DHF is based on foreign keys, the proof is simple
· thus, referential integrity must be preserved
» otherwise, it is difficult
– Reconstruction
» reconstruction can be performed by the union operator
– Disjointness
» guaranteed if the join graph between fragments is simple
· so simple join graphs between fragments should be produced
» otherwise, it is difficult

Fragmentation: VF

• Vertical fragmentation
– has been studied within the centralized context
» physical clustering of the most active sub-relations
– the number of alternatives is very large
» for m non-primary-key attributes, the possible number of vertical fragmentations is B(m), the m-th Bell number
» for large m, B(m) grows roughly like $m^m$
· e.g., $B(10) = 115{,}975$, $B(15) \approx 1.4 \times 10^{9}$, $B(30) \approx 8.5 \times 10^{23}$
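The Bell numbers above can be checked with the Bell triangle: each row starts with the last entry of the previous row, and each further entry is the sum of its left neighbour and the entry above that neighbour. A minimal Python sketch; `bell` is an illustrative name.

```python
def bell(m):
    """Return the m-th Bell number B(m) using the Bell triangle."""
    row = [1]  # row 0 of the triangle; B(0) = 1
    for _ in range(m):
        # A new row starts with the last entry of the previous row ...
        nxt = [row[-1]]
        # ... and each further entry adds the entry above its left neighbour.
        for above in row:
            nxt.append(nxt[-1] + above)
        row = nxt
    return row[0]  # B(m) is the first entry of row m


for m in (10, 15, 30):
    print(m, bell(m))
```

For example, bell(10) and bell(15) return 115975 and 1382958545; Python's arbitrary-precision integers handle B(30) without overflow.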


Fragmentation: VF (cont’d)

• Two types of heuristic approaches


– grouping
» from a set of attributes to fragments
– splitting
» from an entire relation to fragments

Fragmentation: VF (cont’d)

• Replication of the global relation’s key
– makes completeness, i.e., lossless decomposition, easier to ensure
» necessary for reconstructing the global relation
– makes it easier to enforce many functional dependencies
◦ for integrity checking, etc.
» most dependencies involve key attributes


Fragmentation: VF (cont’d)

• Information requirements
– attribute affinity
· a measure indicating how closely the attributes are related
◦ can be obtained from more primitive usage data
» attribute usage values: use(qi, Aj)
· given a set of queries Q = {q1, q2, …, qq} that will run on the relation R(A1, A2, …, An),
$$\mathrm{use}(q_i, A_j) = \begin{cases} 1 & \text{if attribute } A_j \text{ is referenced by query } q_i \\ 0 & \text{otherwise} \end{cases}$$

Fragmentation: VF (cont’d)

• Attribute affinity measure: aff(Ai, Aj)
» measures the bond between two attributes of a relation
· according to how they are accessed by applications
– the attribute affinity measure between two attributes Ai and Aj of a relation R(A1, A2, …, An) with respect to the set of applications Q = {q1, q2, …, qq} is defined as follows:
$$\mathrm{aff}(A_i, A_j) = \sum_{q_k:\ \mathrm{use}(q_k, A_i) = 1 \,\wedge\, \mathrm{use}(q_k, A_j) = 1} \mathrm{query\_accesses}(q_k)$$
» where
$$\mathrm{query\_accesses}(q_k) = \sum_{\text{all sites}} (\text{frequency of } q_k) \times (\#\text{ of accesses to } A_i \text{ and } A_j \text{ together per execution of } q_k)$$


Fragmentation: VF (cont’d)

(Ex) Attribute affinity
– Consider the following 4 queries for relation PROJ:

q1: SELECT BUDGET FROM PROJ WHERE PNO = Value
q2: SELECT PNAME, BUDGET FROM PROJ
q3: SELECT PNAME FROM PROJ WHERE LOC = Value
q4: SELECT SUM(BUDGET) FROM PROJ WHERE LOC = Value

Fragmentation: VF (cont’d)

(Example cont’d)
– Let A1 = PNO, A2 = PNAME, A3 = BUDGET, A4 = LOC.

Attribute usage matrix
     A1  A2  A3  A4
q1   1   0   1   0
q2   0   1   1   0
q3   0   1   0   1
q4   0   0   1   1


Fragmentation: VF (cont’d)

(Example cont’d)
– assume
» each query accesses the attributes once during each execution
» the following frequencies of the queries at three sites

Matrix of query frequencies at three sites
     S1  S2  S3
q1   15  20  10
q2   5   0   0
q3   25  25  25
q4   3   0   0

Fragmentation: VF (cont’d)

(Example cont’d)
– then, the attribute affinity (AA) matrix is

     A1  A2  A3  A4
A1   45  0   45  0
A2   0   80  5   75
A3   45  5   53  3
A4   0   75  3   78

» e.g., aff(A1, A3) = 15*1 + 20*1 + 10*1 = 45 (only q1 accesses both A1 and A3)
» combining the attribute usage and query frequency matrices, the per-query access totals over all sites are q1 = 45, q2 = 5, q3 = 75, q4 = 3



Fragmentation: VF-Clustering

• BEA algorithm: a clustering algorithm
» takes the attribute affinity matrix AA, and
» reorganizes the attribute order to form clusters
· where the attributes in each cluster demonstrate high affinity to one another
– the Bond Energy Algorithm (BEA) finds an ordering of the attributes such that the global affinity measure
$AM = \sum_i \sum_j (\text{affinity of } A_i \text{ and } A_j \text{ with their neighbors})$
is maximized.

Fragmentation: VF-Clustering (cont’d)

• Global affinity measure AM
$$AM = \sum_{i=1}^{n} \sum_{j=1}^{n} \mathrm{aff}(A_i, A_j)\,[\mathrm{aff}(A_i, A_{j-1}) + \mathrm{aff}(A_i, A_{j+1}) + \mathrm{aff}(A_{i-1}, A_j) + \mathrm{aff}(A_{i+1}, A_j)]$$
» where $\mathrm{aff}(A_0, A_j) = \mathrm{aff}(A_i, A_0) = \mathrm{aff}(A_{n+1}, A_j) = \mathrm{aff}(A_i, A_{n+1}) = 0$

Since the AA matrix is symmetric, we can simply define:
$$AM = \sum_{i=1}^{n} \sum_{j=1}^{n} \mathrm{aff}(A_i, A_j)\,[\mathrm{aff}(A_i, A_{j-1}) + \mathrm{aff}(A_i, A_{j+1})]$$
» maximizing AM groups large values with large ones
· and small values with small ones


Fragmentation: VF-Clustering (cont’d)

• Bond Energy Algorithm (BEA)
» input: AA matrix
» output: clustered affinity matrix CA
1. Initialization
– place and fix one of the columns of AA in CA
2. Iteration
» suppose i columns have already been placed in CA
– take each of the remaining n − i columns in turn, and try it in each of the i + 1 possible positions in the CA matrix
– for each column, choose the placement that makes the largest contribution to the global affinity measure
3. Row ordering
– order the rows according to the column ordering
Fragmentation: VF-Clustering (cont’d)

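• Contribution of placing attribute Ak between Ai and Aj
$$\mathrm{cont}(A_i, A_k, A_j) = 2\,\mathrm{bond}(A_i, A_k) + 2\,\mathrm{bond}(A_k, A_j) - 2\,\mathrm{bond}(A_i, A_j)$$
where
$$\mathrm{bond}(A_x, A_y) = \sum_{z=1}^{n} \mathrm{aff}(A_z, A_x)\,\mathrm{aff}(A_z, A_y)$$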
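The bond, cont and BEA definitions translate directly into Python. This is a minimal sketch, not a reference implementation: it starts from the first two columns (as in the worked example that follows), uses `None` for the empty neighbours A0 and An+1, and keeps the leftmost position on ties; `bond`, `cont`, `bea` and `reorder` are illustrative names, and indices 0 to 3 stand for A1 to A4.

```python
AA = [
    [45, 0, 45, 0],
    [0, 80, 5, 75],
    [45, 5, 53, 3],
    [0, 75, 3, 78],
]


def bond(aa, x, y):
    """bond(Ax, Ay) = sum_z aff(Az, Ax) * aff(Az, Ay); None stands for the empty A0 / An+1."""
    if x is None or y is None:
        return 0
    return sum(aa[z][x] * aa[z][y] for z in range(len(aa)))


def cont(aa, i, k, j):
    """Contribution to AM of placing attribute k between attributes i and j."""
    return 2 * bond(aa, i, k) + 2 * bond(aa, k, j) - 2 * bond(aa, i, j)


def bea(aa):
    """Bond Energy Algorithm: return the clustered column order (0-based indices)."""
    order = [0, 1]  # start with the first two columns, as in the worked example
    for k in range(2, len(aa)):
        best_pos, best_val = 0, None
        for pos in range(len(order) + 1):  # the i + 1 possible positions
            left = order[pos - 1] if pos > 0 else None
            right = order[pos] if pos < len(order) else None
            val = cont(aa, left, k, right)
            if best_val is None or val > best_val:
                best_pos, best_val = pos, val
        order.insert(best_pos, k)
    return order


def reorder(aa, order):
    """Row ordering step: permute rows and columns the same way."""
    return [[aa[r][c] for c in order] for r in order]


order = bea(AA)
print([f"A{c + 1}" for c in order])
for row in reorder(AA, order):
    print(row)
```

On the example AA matrix this places A3 between A1 and A2 (contribution 10150) and then A4 to the right of A2, giving the order A1 A3 A2 A4.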


Fragmentation: VF-Clustering (cont’d)

– definition of AM
$$AM = \sum_{i=1}^{n} \sum_{j=1}^{n} \mathrm{aff}(A_i, A_j)[\mathrm{aff}(A_i, A_{j-1}) + \mathrm{aff}(A_i, A_{j+1})] = \sum_{j=1}^{n} \Big[\sum_{i=1}^{n} \mathrm{aff}(A_i, A_j)\,\mathrm{aff}(A_i, A_{j-1}) + \sum_{i=1}^{n} \mathrm{aff}(A_i, A_j)\,\mathrm{aff}(A_i, A_{j+1})\Big]$$
– bond between two attributes
» $\mathrm{bond}(A_x, A_y) = \sum_{z=1}^{n} \mathrm{aff}(A_z, A_x)\,\mathrm{aff}(A_z, A_y)$
– then,
$$AM = \sum_{j=1}^{n} [\mathrm{bond}(A_j, A_{j-1}) + \mathrm{bond}(A_j, A_{j+1})]$$

Fragmentation: VF-Clustering (cont’d)
– Consider the following ordering of n attributes, where Aj = Ai+1:
A1 A2 … Ai-1 Ai Aj Aj+1 … An
then, the global affinity measure can be written as:
$$AM_{old} = \sum_{l=1}^{i-1}[\mathrm{bond}(A_l, A_{l-1}) + \mathrm{bond}(A_l, A_{l+1})] + \sum_{l=i+2}^{n}[\mathrm{bond}(A_l, A_{l-1}) + \mathrm{bond}(A_l, A_{l+1})] + 2\,\mathrm{bond}(A_i, A_j) + \mathrm{bond}(A_i, A_{i-1}) + \mathrm{bond}(A_{i+1}, A_{i+2})$$
» the last three terms collect the terms for l = i and l = i + 1:
l = i: bond(Ai-1, Ai) + bond(Ai, Ai+1)
l = i+1: bond(Ai, Ai+1) + bond(Ai+1, Ai+2)


Fragmentation: VF-Clustering (cont’d)

– Consider placing a new attribute Ak between Ai and Aj
A1 A2 … Ai-1 Ai Ak Aj Aj+1 … An
$$AM_{new} = AM_{old} + 2\,\mathrm{bond}(A_i, A_k) + 2\,\mathrm{bond}(A_k, A_j) - 2\,\mathrm{bond}(A_i, A_j)$$
– thus, the contribution of placing Ak between Ai and Aj is
$$\mathrm{cont}(A_i, A_k, A_j) = AM_{new} - AM_{old} = 2[\mathrm{bond}(A_i, A_k) + \mathrm{bond}(A_k, A_j) - \mathrm{bond}(A_i, A_j)]$$

Fragmentation: VF-Clustering (cont’d)

(Ex) Attribute clustering by BEA
– Consider the following AA matrix and the CA matrix
» where A1 and A2 have been placed.
– Now try to place A3:

AA
     A1  A2  A3  A4
A1   45  0   45  0
A2   0   80  5   75
A3   45  5   53  3
A4   0   75  3   78

CA
     A1  A2
A1   45  0
A2   0   80
A3   45  5
A4   0   75


Fragmentation: VF-Clustering (cont’d)

(Example cont’d)

Ordering (0-3-1):
cont(A0, A3, A1) = 2bond(A0, A3) + 2bond(A3, A1) - 2bond(A0, A1)
= 0 + 2(45*45 + 45*53) - 0 = 2*4410 = 8820
Ordering (1-3-2):
cont(A1, A3, A2) = 2bond(A1, A3) + 2bond(A3, A2) - 2bond(A1, A2)
= 2*4410 + 2(80*5 + 5*53 + 75*3) - 2(45*5)
= 2*4410 + 2*890 - 2*225 = 10150
Ordering (2-3-4):
cont(A2, A3, A4) = 2bond(A2, A3) + 2bond(A3, A4) - 2bond(A2, A4)
= 2*890 + 0 - 0 = 1780
» A4 has not been placed yet, so the bonds involving position 4 are 0
» ordering (1-3-2) gives the largest contribution, so A3 is placed between A1 and A2

Fragmentation: VF-Clustering (cont’d)

(Example cont’d)

Therefore, the CA matrix has the form (rows are reordered only in the last step)

     A1  A3  A2
A1   45  45  0
A2   0   5   80
A3   45  53  5
A4   0   3   75


Fragmentation: VF-Clustering (cont’d)

(Example cont’d)

– Now, try to place A4
» by a similar calculation, A4 should be placed to the right of A2
» thus, the CA matrix has the form

     A1  A3  A2  A4
A1   45  45  0   0
A2   0   5   80  75
A3   45  53  5   3
A4   0   3   75  78
Fragmentation: VF-Clustering (cont’d)

(Example cont’d)

– Row ordering
» the final form of the CA matrix (after row ordering) is

     A1  A3  A2  A4
A1   45  45  0   0
A3   45  53  5   3
A2   0   5   80  75
A4   0   3   75  78


Fragmentation: VF-Partitioning

• Partitioning algorithm
– divide a set of clustered attributes {A1, A2, …, An} into two (or more) sets {A1, A2, …, Ai} and {Ai+1, …, An}
» such that these sets of attributes are accessed
· solely, or
· for the most part, by distinct applications

Fragmentation: VF-Partitioning (cont’d)

Figure: the clustered matrix CA (rows and columns A1, A2, …, An) is split at a point on the diagonal into TA (top attributes: A1, …, Ai) and BA (bottom attributes: Ai+1, …, An)


Fragmentation: VF-Partitioning (cont’d)

• Sets of applications
TQ = set of applications that access only TA
BQ = set of applications that access only BA
OQ = set of applications that access both TA and BA
• Cost for the applications
CTQ = total number of accesses to attributes by applications that access only TA
CBQ = total number of accesses to attributes by applications that access only BA
COQ = total number of accesses to attributes by applications that access both TA and BA

Fragmentation: VF-Partitioning (cont’d)

• Then, find the point along the diagonal that maximizes
$z = CTQ \cdot CBQ - COQ^2$
– this defines two fragments
· such that CTQ and CBQ are as nearly equal as possible while COQ stays small
» which balances the processing loads
· when the fragments are distributed
– the partitioning algorithm has complexity O(n)
» n: number of attributes
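The split-point search can be sketched in Python on the running example. Assumptions of this sketch: CTQ, CBQ and COQ are computed as sums of each query's accesses over all sites (one access per attribute per execution, as in the affinity example), the attribute order A1 A3 A2 A4 comes from the BEA example, and `best_split` is an illustrative name.

```python
# Attribute usage (rows q1..q4, columns A1..A4) and per-query accesses summed over
# all sites, both taken from the attribute affinity example.
USE = [
    [1, 0, 1, 0],
    [0, 1, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 1],
]
ACC = [45, 5, 75, 3]
ORDER = [0, 2, 1, 3]  # clustered order A1 A3 A2 A4 produced by BEA


def best_split(order, use, acc):
    """Try every split point on the diagonal and keep the one maximizing CTQ*CBQ - COQ^2."""
    best = None
    for cut in range(1, len(order)):
        top, bottom = set(order[:cut]), set(order[cut:])
        ctq = cbq = coq = 0
        for q, row in enumerate(use):
            attrs = {a for a, used in enumerate(row) if used}
            if attrs <= top:
                ctq += acc[q]  # q accesses only TA
            elif attrs <= bottom:
                cbq += acc[q]  # q accesses only BA
            else:
                coq += acc[q]  # q accesses both TA and BA
        z = ctq * cbq - coq ** 2
        if best is None or z > best[0]:
            best = (z, order[:cut], order[cut:])
    return best


z, ta, ba = best_split(ORDER, USE, ACC)
print(z, [f"A{a + 1}" for a in ta], [f"A{a + 1}" for a in ba])
```

It reports z = 45 * 75 - 8^2 = 3311 for the split after A3, i.e., the fragments (A1, A3) and (A2, A4); the other two split points give negative values.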


Fragmentation: VF-Partitioning (cont’d)

(Ex) Partitioning the CA matrix
– Consider the CA matrix in the previous example.
– after partitioning,
» F = {F1, F2}
· F1 = (A1, A3)
· F2 = (A2, A4)

Fragmentation: VF-Partitioning (cont’d)

• Problems in the partitioning algorithm
– design of m-way partitioning
» try 1, 2, …, m − 1 split points along the diagonal, and
· check the best point for each of these
◦ problem: the cost has complexity $O(2^m)$
» recursive application of the binary partitioning
· apply the binary partitioning algorithm to each of the fragments obtained in the previous iteration, recursively
· may be a better alternative


Fragmentation: VF-Partitioning (cont’d)

– Cluster forming in the middle of the CA matrix
» shift a row up and a column left and apply the algorithm to find the “best” partitioning point
» do this for all possible shifts
• Refer to Algorithm PARTITION on page 144 of the text: $O(n^2)$

Fragmentation: VF-Partitioning (cont’d)

• Correctness of the partitioning algorithm
» consider a relation R, defined over attribute set A and key K, that generates the vertical partitioning FR = {R1, R2, …, Rr}
– Completeness
» the following should be true for A:
$A = \bigcup_{R_i \in F_R} A_{R_i}$
– Reconstruction
» reconstruction can be achieved by
$R = \bowtie_K R_i,\ \forall R_i \in F_R$
– Disjointness
» duplicated keys (or TIDs) are not considered to be overlapping


Fragmentation: Hybrid

• Hybrid fragmentation
– VF may be followed by HF, or vice versa
» producing a tree-structured partitioning
Figure: R is split by HF into R1 and R2; R1 is then split by VF into R11 and R12, and R2 by VF into R21, R22, and R23

Fragment Allocation

• Allocation problem
» allocation of resources across a network has been studied extensively
· however, most of this work concerns placing files
◦ rather than DDB design
– Given
F = {F1, F2, …, Fn}: fragments
S = {S1, S2, …, Sm}: network sites
Q = {q1, q2, …, qq}: applications
– Find the “optimal” distribution of F to S.


Fragment Allocation (cont’d)

• Definition of optimality
1. Minimal cost
» communication cost +
» storage cost +
» processing cost (read & update)
2. Performance
» response time and/or
» throughput
– the optimality measure should include both performance and cost factors

Fragment Allocation (cont’d)

• File allocation (FAP) vs. database allocation (DAP)
» the FAP problem is NP-complete
» the DAP problem is more complex
– fragments are not individual files
· relationships between fragments have to be considered
– access to databases is more complicated
· relationship between allocation and query processing
» the simple “remote file access model” may not be applicable
– the cost of integrity enforcement should be considered
– the cost of concurrency control should be considered


Fragment Allocation (cont’d)

• Information requirements
– Database information
» selectivity of fragment Fj with respect to query qi
· # of tuples in Fj that need to be accessed for qi
» size of a fragment
– Application information
» number of read accesses of a query to a fragment
» number of update accesses of a query to a fragment
» a matrix indicating which queries update which fragments
» a similar matrix for retrievals
» originating site of each query

Fragment Allocation (cont’d)

– Site information
» storage capacity
» processing capacity
» unit cost of storing data at a site
» unit cost of processing at a site
– Network information
» communication cost per frame between two sites
» frame size


Fragment Allocation (cont’d)

• Allocation model
» minimize the total cost of processing and storage
· while trying to meet response time constraints
– min(Total cost)
» subject to
· response time constraint
· storage constraint
· processing constraint
• Decision variable
$$x_{ij} = \begin{cases} 1 & \text{if fragment } F_i \text{ is stored at site } S_j \\ 0 & \text{otherwise} \end{cases}$$
Fragment Allocation (cont’d)

• Total cost
$$\text{Total cost} = \sum_{\text{all queries}} (\text{query processing cost}) + \sum_{\text{all sites}} \sum_{\text{all fragments}} (\text{storage cost of a fragment at a site})$$
– storage cost of fragment Fj at site Sk
» $(\text{unit storage cost at } S_k) \times (\text{size of } F_j) \times x_{jk}$
– query processing cost
» processing component + transmission component


Fragment Allocation (cont’d)

• Query processing cost for one application
1. Processing component
» access cost + integrity enforcement cost + concurrency control cost
– access cost
$$\sum_{\text{all sites}} \sum_{\text{all fragments}} (\#\text{ of read accesses} + \#\text{ of update accesses}) \times x_{ij} \times (\text{local processing cost at the site})$$
– simple assumption
· read cost = update cost
– integrity enforcement and concurrency control costs
» can be calculated similarly
Fragment Allocation (cont’d)

2. Transmission component
» cost for updates + cost for retrievals
– cost for updates
$$\sum_{\text{all sites}} \sum_{\text{all fragments}} (\text{cost of update message}) \times x_{ij} + \sum_{\text{all sites}} \sum_{\text{all fragments}} (\text{cost of acknowledgment}) \times x_{ij}$$
– cost for retrievals
$$\sum_{\text{all fragments}} \min_{\text{sites with } x_{ij} = 1} (\text{cost of retrieval request} + \text{cost of sending back the result})$$


Fragment Allocation (cont’d)

• Constraints
– response time constraint
» (execution time of a query) ≤ (maximum allowable response time for that query)
– storage capacity constraint (for a site)
$$\sum_{\text{all fragments}} (\text{storage requirement of a fragment at that site}) \le (\text{storage capacity at that site})$$
– processing capacity constraint (for a site)
$$\sum_{\text{all queries}} (\text{processing load of a query at that site}) \le (\text{processing capacity of that site})$$

Fragment Allocation (cont’d)

• Solution methods
· FAP is NP-complete
· DAP is also NP-complete
» so heuristic methods must be used
– heuristics commonly adopted for FAP and DAP
» knapsack problem solutions
» branch and bound techniques
» network flow problem solutions


Fragment Allocation (cont’d)

• Attempts to reduce the complexity of the problem
– find an optimal non-replicated solution in the first step
◦ i.e., ignore replication at first
» handle replication in the second step
· by applying a greedy algorithm to improve the initial feasible solution
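The two-step heuristic can be illustrated on a toy instance. Everything in this Python sketch is a hypothetical assumption rather than part of the model above: the instance has 2 fragments and 3 sites with made-up sizes, capacities and access counts; the cost keeps only storage, reads served by the nearest copy, and updates sent to every copy (update message and acknowledgment folded into one per-copy cost); processing, integrity and concurrency control costs and the response-time constraint are omitted. Step 1 uses exhaustive search, which only works for tiny instances given that the real problem is NP-complete.

```python
from itertools import product

# Hypothetical instance: 2 fragments (F1, F2) and 3 sites (S1, S2, S3).
SIZE = [10, 20]                            # size of each fragment
UNIT_STORAGE = [1, 1, 1]                   # unit storage cost at each site
CAPACITY = [15, 25, 30]                    # storage capacity of each site
COMM = [[0, 2, 4], [2, 0, 3], [4, 3, 0]]   # COMM[s][k]: cost of one access from site s to site k
READS = [[10, 0, 8], [0, 6, 1]]            # READS[j][s]: reads of Fj issued at site s
UPDATES = [[1, 0, 0], [0, 2, 0]]           # UPDATES[j][s]: updates of Fj issued at site s


def total_cost(alloc):
    """alloc[j] is the set of sites k with x_jk = 1 (sites holding a copy of Fj)."""
    cost = 0
    for j, sites in enumerate(alloc):
        cost += sum(UNIT_STORAGE[k] * SIZE[j] for k in sites)        # storage cost
        for s in range(len(COMM)):
            cost += READS[j][s] * min(COMM[s][k] for k in sites)     # read the nearest copy
            cost += UPDATES[j][s] * sum(COMM[s][k] for k in sites)   # update every copy
    return cost


def feasible(alloc):
    """Storage capacity constraint at every site."""
    load = [0] * len(CAPACITY)
    for j, sites in enumerate(alloc):
        for k in sites:
            load[k] += SIZE[j]
    return all(used <= cap for used, cap in zip(load, CAPACITY))


def best_non_replicated():
    """Step 1: exhaustive search over allocations with exactly one copy per fragment."""
    candidates = ([{k} for k in choice]
                  for choice in product(range(len(CAPACITY)), repeat=len(SIZE)))
    return min((a for a in candidates if feasible(a)), key=total_cost)


def add_replicas_greedily(alloc):
    """Step 2: repeatedly add the single replica that lowers the total cost the most."""
    alloc = [set(sites) for sites in alloc]
    while True:
        best, best_cost = None, total_cost(alloc)
        for j in range(len(SIZE)):
            for k in range(len(CAPACITY)):
                if k in alloc[j]:
                    continue
                trial = [set(sites) for sites in alloc]
                trial[j].add(k)
                if feasible(trial) and total_cost(trial) < best_cost:
                    best, best_cost = trial, total_cost(trial)
        if best is None:
            return alloc
        alloc = best


step1 = best_non_replicated()
print("non-replicated:", step1, total_cost(step1))
step2 = add_replicas_greedily(step1)
print("with replicas:", step2, total_cost(step2))
```

On this made-up instance, step 1 places F1 at S1 and F2 at S2 (cost 65); step 2 then adds a copy of F1 at S3, where F1 is read often and never updated, which lowers the cost to 47.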
