For example, when multiple instances of a given piece of information occur in a table, the
possibility exists that these instances will not be kept consistent when the data within the table
is updated, leading to a loss of data integrity. A table that is sufficiently normalized is less
vulnerable to problems of this kind.
Higher degrees of normalization typically involve more tables and create the need for
a larger number of joins, which can reduce performance. Accordingly, more highly
normalized tables are typically used in database applications involving many isolated transactions
(e.g. an automated teller machine), while less normalized tables tend to be used in database
applications that need to map complex relationships between data entities and data attributes (e.g.
a reporting application).
Figure 1
EMPLOYEE
Ename Ssn Bdate Address Dnumber
Smith, John B 123456789 1965-01-09 731 Fondren, Houston, TX 5
Wong, Franklin T. 333445555 1955-12-08 638 Voss, Houston, TX 5
Zelyala, Aliciya J. 999887777 1968-07-19 3321 Castle, Spring, TX 4
Wallance, Jennifer S. 987654321 1941-06-20 291 Berry, Belliaire, TX 4
Sudipta Saha Page 2 5/6/2011
DEPARTMENT
Dname Dnumber Dmgr_ssn
Research 5 333445555
Administrator 4 987654321
Headquarters 1 888665555
Figure 2
EMP_DEPT
Ename Ssn Bdate Address Dnumber Dname Dmgr_ssn
Smith, John B 123456789 1965-01-09 731 Fondren, Houston, TX 5 Research 333445555
Wong, Franklin T. 333445555 1955-12-08 638 Voss, Houston, TX 5 Research 333445555
Zelyala, Aliciya J. 999887777 1968-07-19 3321 Castle, Spring, TX 4 Administrator 987654321
Wallance, Jennifer S. 987654321 1941-06-20 291 Berry, Belliaire, TX 4 Administrator 987654321
Narayan, Ramesh K. 666884444 1962-09-15 975 Fire Oak, Humble, TX 5 Research 333445555
English, Joyce A. 453453453 1972-07-31 5631 Rice, Houston, TX 5 Research 333445555
Jabbar, Ahamed V. 987987987 1969-03-29 980 Dallas, Houston, TX 4 Administrator 987654321
Borg, James E 888665555 1937-11-10 450 Stone, Houston, TX 1 Headquarters 888665555
♦Insertion anomaly – There are circumstances in which certain facts cannot be recorded at all.
The insertion anomaly occurs when we want to insert a new record into the relation: the
user cannot insert a fact about one entity until an additional fact about another entity
is available.
• To insert a new employee tuple into EMP_DEPT, we must include either the attribute values for
the department that the employee works for, or nulls (if the employee does not work for a
department as yet). For example, to insert a new tuple for an employee who works in department
number 5, we must enter the attribute values of department 5 correctly so that they are
consistent with values for department 5 in other tuples in EMP_DEPT. In the design of Figure 1
we do not have to worry about this consistency problem because we enter only the department
number in the employee tuple; all other attribute values of department 5 are recorded only once in
the database, as a single tuple in the DEPARTMENT relation.
• It is difficult to insert a new department that has no employees as yet in the EMP_DEPT
relation. The only way to do this is to place null values in the attributes for employee. This
causes a problem because SSN is the primary key of EMP_DEPT, and each tuple is supposed to
represent an employee entity—not a department entity. Moreover, when the first employee is
assigned to that department, we do not need the tuple with null values any more. This problem
does not occur in the design of Figure 1, because a department is entered in the DEPARTMENT
relation whether or not any employees work for it, and whenever an employee is assigned to that
department, a corresponding tuple is inserted in EMPLOYEE.
♦Deletion anomaly – The deletion anomaly occurs when a record is deleted from the relation.
In this anomaly, deleting the facts about one entity also deletes the facts about
another entity.
For example, if we delete from EMP_DEPT an employee tuple that happens to represent the last
employee working for a particular department, the information concerning that department is lost
from the database. This problem does not occur in the database of Figure 1 because
DEPARTMENT tuples are stored separately.
♦Modification anomaly – The modification anomaly occurs when a record is modified in
the relation. In this anomaly, changing the value of a specific attribute requires
changing every record in which that value occurs.
The same information can be stored in multiple records; therefore updates to the table may
result in logical inconsistencies. If an update is not applied to every such record, the table
is left in an inconsistent state.
For example, in EMP_DEPT, if we change the value of one of the attributes of a particular
department—say, the manager of department 5—we must update the tuples of all employees who
work in that department; otherwise, the database will become inconsistent. If we fail to update
some tuples, the same department will be shown to have two different values for manager in
different employee tuples, which should not be the case.
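The modification anomaly can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the rows are a three-tuple subset of EMP_DEPT, the helper name inconsistent_departments is ours, and the new manager value 666884444 is a hypothetical update, not one taken from the figures.

```python
# Illustrative subset of EMP_DEPT (Figure 2); only the columns needed here.
emp_dept = [
    {"Ssn": "123456789", "Dnumber": 5, "Dmgr_ssn": "333445555"},
    {"Ssn": "333445555", "Dnumber": 5, "Dmgr_ssn": "333445555"},
    {"Ssn": "999887777", "Dnumber": 4, "Dmgr_ssn": "987654321"},
]


def inconsistent_departments(rows):
    """Return department numbers recorded with more than one manager."""
    managers = {}
    for row in rows:
        managers.setdefault(row["Dnumber"], set()).add(row["Dmgr_ssn"])
    return sorted(d for d, ssns in managers.items() if len(ssns) > 1)


before = inconsistent_departments(emp_dept)

# Hypothetical partial update: the new manager value reaches only one
# of the two department-5 tuples.
emp_dept[0]["Dmgr_ssn"] = "666884444"
after = inconsistent_departments(emp_dept)

# In the Figure 1 design the manager is stored once per department,
# so a single assignment changes it for every employee of that department.
department = {5: {"Dname": "Research", "Dmgr_ssn": "333445555"}}
department[5]["Dmgr_ssn"] = "666884444"

print(before, after)
```

After the partial update, department 5 is reported as inconsistent, which is exactly the situation described above; the separate DEPARTMENT structure has no second copy that could disagree.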
Ideally, a relational database table should be designed in such a way as to exclude the possibility
of update, insertion, and deletion anomalies. The normal forms of relational database theory
provide guidelines for deciding whether a particular design will be vulnerable to such
anomalies. It is possible to correct an un-normalized design so as to make it adhere to the
demands of the normal forms: this is called normalization. Removing redundancy from a
table typically splits it into several tables, with referential integrity constraints between them.
1NF - A relation is in first normal form (1NF) if and only if the domain of every attribute
includes only atomic (simple, indivisible) values and the value of any attribute in a tuple
is a single value from the domain of that attribute. Hence, 1NF disallows having a set
of values, a tuple of values, or a combination of both as an attribute value for a single tuple.
Consider the DEPARTMENT relation schema shown in figure 3, whose primary key is
DNUMBER. Each department can have a number of locations. As we can see, the DEPARTMENT
relation is not in 1NF because DLOCATIONS is not a single-valued attribute, as illustrated by the
first tuple in the following figure.
Figure 3
DEPARTMENT
DNAME DNUMBER DMGRENO DLOCATIONS
Research 5 333445555 {Bellarie, Sugarland, Houston}
Administration 4 987654321 {Stafford}
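There are three main techniques to achieve first normal form for such a relation:
1. Removing the attribute DLOCATIONS that violates 1NF and placing it in a separate
relation DEPT_LOCATIONS along with the primary key DNUMBER of DEPARTMENT, as
shown in figure 4. The primary key of DEPT_LOCATIONS is the combination {DNUMBER,
DLOCATION}.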
Figure 4
DEPARTMENT
DNAME DNUMBER DMGRENO
Research 5 333445555
Administration 4 987654321
Headquarters 1 888665555
DEPT_LOCATIONS
DNUMBER DLOCATION
5 Bangalore
5 New Delhi
5 Hyderabad
4 Chennai
1 Hyderabad
2. Expanding the key so that there will be a separate tuple in the original
DEPARTMENT relation for each location of a DEPARTMENT, as shown in
figure 5. In this case, the primary key becomes the combination {DNUMBER,
DLOCATION}. This solution has the disadvantage of introducing redundancy in
the relation.
Figure 5 - DEPARTMENT
DNAME DNUMBER DMGRENO DLOCATION
Research 5 333445555 Bangalore
Research 5 333445555 New Delhi
Research 5 333445555 Hyderabad
Administration 4 987654321 Chennai
Headquarters 1 888665555 Hyderabad
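3. If a maximum number of values is known for the attribute (here, at most three locations
per department), replacing DLOCATIONS with that many atomic attributes DLOCATION1,
DLOCATION2, and DLOCATION3, as shown in figure 6. This solution has the disadvantage
of introducing null values when a department has fewer locations.
Figure 6 - DEPARTMENT
DNAME DNUMBER DMGRENO DLOCATION1 DLOCATION2 DLOCATION3
Research 5 333445555 Bangalore New Delhi Hyderabad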
Administration 4 987654321 Chennai Null Null
Headquarters 1 888665555 Hyderabad Null Null
Of the three solutions above, the first is generally considered best because it does not suffer from
redundancy and is completely general, having no limit placed on a maximum number of values.
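The first technique can be sketched in Python as follows. This is a minimal illustration, not a definitive implementation: the helper names is_1nf and split_multivalued are ours, the rows are the two tuples of figure 3, and a Python set stands in for the multivalued attribute.

```python
def is_1nf(rows):
    """True when no attribute value is a collection (set, list, tuple, dict)."""
    return all(
        not isinstance(value, (set, frozenset, list, tuple, dict))
        for row in rows
        for value in row.values()
    )


def split_multivalued(rows, key, multi_attr, single_attr):
    """Move a multivalued attribute into its own relation keyed by `key`."""
    base, detail = [], []
    for row in rows:
        base.append({k: v for k, v in row.items() if k != multi_attr})
        for value in sorted(row[multi_attr]):
            detail.append({key: row[key], single_attr: value})
    return base, detail


# The two tuples of figure 3; DLOCATIONS holds a set of values.
department = [
    {"DNAME": "Research", "DNUMBER": 5, "DMGRENO": "333445555",
     "DLOCATIONS": {"Bellarie", "Sugarland", "Houston"}},
    {"DNAME": "Administration", "DNUMBER": 4, "DMGRENO": "987654321",
     "DLOCATIONS": {"Stafford"}},
]

dept, dept_locations = split_multivalued(
    department, "DNUMBER", "DLOCATIONS", "DLOCATION"
)
print(is_1nf(department), is_1nf(dept), is_1nf(dept_locations))
```

The split produces one DEPT_LOCATIONS tuple per (DNUMBER, DLOCATION) pair, so a department may have any number of locations without nulls or repeated department data.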
First normal form also disallows multivalued attributes that are themselves
composite. These are called nested relations because each tuple can have a relation within it.
Figure 7 shows how the EMP_PROJ relation could appear if nesting is allowed. Each tuple
represents an employee entity, and a relation PROJS (PNUMBER, HOURS) within each tuple
represents the employee's projects and the hours per week that employee works on each project.
The schema of this EMP_PROJ relation can be represented as follows
EMP_PROJ (ENO, ENAME, {PROJS (PNUMBER, HOURS)}). The set braces {} identify the
attribute PROJS as multivalued and we list the component attributes that form PROJS between
parentheses ( ).
Figure 7 - EMP_PROJ
Ssn        Ename                  PNUMBER  HOURS
123456789  Smith, John B          1        32.5
                                  2        7.5
666884444  Narayan, Ramesh K.     3        40.0
453453453  English, Joyce A.      1        20.0
                                  2        20.0
333445555  Wong, Franklin T.      2        10.0
                                  3        10.0
                                  10       10.0
                                  20       10.0
999887777  Zelyala, Aliciya J.    30       30.0
                                  10       10.0
987987987  Jabbar, Ahamed V.      10       35.5
                                  30       5.5
987654321  Wallance, Jennifer S.  30       20.0
                                  20       15.5
888665555  Borg, James E          20       NULL
Ssn is the primary key of the EMP_PROJ relation in figure 7, while PNUMBER is the partial key of the
nested relation; that is, within each tuple, the nested relation must have unique values of PNUMBER.
To normalize this into 1NF, we remove the nested relation attributes into a new relation and propagate the
primary key into it; the primary key of the new relation will combine the partial key with the primary key
of the original relation. Decomposition and primary key propagation yield the schemas EMP_PROJ1 and
EMP_PROJ2 shown in figure 8.
Figure 8
EMP_PROJ1
SSN ENAME
EMP_PROJ2
SSN PNUMBER HOURS
This procedure can be applied recursively to a relation with multiple-level nesting to unnest the
relation into a set of 1NF relations.
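The decomposition with primary key propagation can be sketched as below. The function name unnest and the dictionary layout are assumptions made for illustration; the data are the first two employees of figure 7.

```python
def unnest(rows, key, nested_attr):
    """Split a nested relation into an outer relation and a flat inner one.

    The primary key of the outer tuple is propagated into every inner tuple.
    """
    outer, inner = [], []
    for row in rows:
        outer.append({k: v for k, v in row.items() if k != nested_attr})
        for sub in row[nested_attr]:
            inner.append({key: row[key], **sub})
    return outer, inner


emp_proj = [
    {"SSN": "123456789", "ENAME": "Smith, John B",
     "PROJS": [{"PNUMBER": 1, "HOURS": 32.5}, {"PNUMBER": 2, "HOURS": 7.5}]},
    {"SSN": "666884444", "ENAME": "Narayan, Ramesh K.",
     "PROJS": [{"PNUMBER": 3, "HOURS": 40.0}]},
]

emp_proj1, emp_proj2 = unnest(emp_proj, "SSN", "PROJS")

# In the flat relation the key is the combination {SSN, PNUMBER}.
keys = {(row["SSN"], row["PNUMBER"]) for row in emp_proj2}
print(len(emp_proj1), len(emp_proj2), len(keys))
```

For deeper nesting, the same function could be applied again to the inner relation, which mirrors the recursive procedure described above.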
Functional Dependencies
A functional dependency (FD) is a constraint between two sets of attributes in a relation
from a database. Functional dependencies play a key role in differentiating good database designs
from bad database designs.
Definition of Functional Dependency- A functional dependency, denoted by X →Y, between two sets of
attributes X and Y that are subsets of R specifies a constraint on the possible tuples that can form a relation
state r of R. The constraint is that, for any two tuples t1 and t2 in r that have t1[X] = t2[X], we must also have t1[Y]
= t2[Y].
This means that the values of the Y component of a tuple in r depend on, or are determined by, the values of the X
component; or alternatively, the values of the X component of a tuple uniquely (or functionally) determine the values
of the Y component. We also say that there is a functional dependency from X to Y or that Y is functionally dependent
on X. The abbreviation for functional dependency is FD or f.d. The set of attributes X is called the left-hand side of the
FD, and Y is called the right-hand side.
For example, Figure 9 shows a particular state of the TEACH relation schema. Although at first
glance we may think that TEXT → COURSE, we cannot confirm this unless we know that it is
true for all possible legal states of TEACH.
Similarly, we can say if a constraint on R states that there cannot be more than one tuple
with a given X-value in any relation instance r(R)—that is, X is a candidate key of R—
this implies that X →Y for any subset of attributes Y of R (because the key constraint
implies that no two tuples in any legal state r(R) will have the same value of X).
Functional dependencies are used in two ways:
1. To test relations to see whether they are legal under a given set of functional
dependencies. If a relation r is legal under a set F of functional dependencies, we say that
r satisfies F.
2. To specify constraints on the set of legal relations.
Examples: -
A B C D
a1 b1 c1 d1
a1 b2 c1 d2
a2 b2 c2 d2
a2 b3 c2 d3
a3 b3 c2 d4
Sample relation r
Here, A -> C is satisfied. There are two tuples that have an A value of a1. These tuples have the
same C value—namely, c1. Similarly, the two tuples with an A value of a2 have the same C
value, c2. There are no other pairs of distinct tuples that have the same A value.
The functional dependency C -> A is not satisfied, however. To see that it is not, consider the
tuples t1 = (a2, b3, c2, d3) and t2 = (a3, b3, c2, d4). These two tuples have the same C value,
c2, but they have different A values, a2 and a3, respectively. Thus, we have found a pair of tuples
t1 and t2 such that t1[C] = t2[C], but t1[A] ≠ t2[A].
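The definition can be checked mechanically on a relation instance. The sketch below assumes a relation is a list of dictionaries and uses a helper name, satisfies, of our own choosing; it tests one instance only, so a True result says nothing about other legal states of the schema.

```python
from itertools import combinations


def satisfies(rows, lhs, rhs):
    """True if every pair of tuples that agrees on lhs also agrees on rhs."""
    for t1, t2 in combinations(rows, 2):
        agree_lhs = all(t1[a] == t2[a] for a in lhs)
        agree_rhs = all(t1[b] == t2[b] for b in rhs)
        if agree_lhs and not agree_rhs:
            return False
    return True


# Sample relation r from the example above.
r = [dict(zip("ABCD", t)) for t in [
    ("a1", "b1", "c1", "d1"),
    ("a1", "b2", "c1", "d2"),
    ("a2", "b2", "c2", "d2"),
    ("a2", "b3", "c2", "d3"),
    ("a3", "b3", "c2", "d4"),
]]

print(satisfies(r, "A", "C"))  # A -> C
print(satisfies(r, "C", "A"))  # C -> A
```

The first call reports that A -> C is satisfied and the second that C -> A is not, matching the pairwise reasoning in the text.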
satisfied by all relations involving attributes A and B. In general, a functional dependency of the
form α -> β is trivial if β ⊆ α, where α and β are sets of attributes of a relation R.
However, in the real world, two cities can have streets with the same name. Thus, it is possible, at
some time, to have an instance of the customer relation in which customer-street -> customer-city
is not satisfied. So, we would not include customer-street -> customer-city in the set of
functional dependencies that hold on Customer-schema.
In the loan relation of the following figure, the dependency loan-number -> amount is satisfied. In
contrast to the case of customer-city and customer-street in Customer-schema, the real-world
enterprise requires each loan to have only one amount. Therefore, we want to require that
loan-number -> amount be satisfied by the loan relation at all times. In other words, the constraint
loan-number -> amount holds on Loan-schema.
Loan relation
loan-number branch-name amount
L-17 Downtown 1000
L-23 Redwood 2000
L-15 Perryridge 1500
L-14 Downtown 1500
L-93 Mianus 500
CG-> H
CG -> I
B->H
The functional dependency
A->H
is logically implied. That is, whenever the given set of functional dependencies holds on a relation,
A -> H must also hold on the relation.
Proof:
Axioms-
Given F, we can compute F+ directly from the formal definition of functional dependency. If F
were large, this process would be lengthy and difficult.
Axioms, or rules of inference, provide a simpler technique for reasoning about functional
dependencies.
Armstrong's Axioms-
We can use the following three rules to find logically implied functional dependencies. By
applying these rules repeatedly, we can find all of F+, given F. This collection of rules is called
Armstrong's axioms in honor of the person who first proposed it.
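Reflexivity rule- If α is a set of attributes and β ⊆ α, then α -> β holds.
Augmentation rule- If α -> β holds and γ is a set of attributes, then γ α -> γ β holds.
Transitivity rule- If α -> β holds and β -> γ holds, then α -> γ holds.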
Armstrong's axioms are sound, because they do not generate any incorrect functional
dependencies. They are complete, because, for a given set F of functional dependencies, they
allow us to generate all F+.
Although Armstrong's axioms are complete, it is tiresome to use them directly for the
computation of F+. To simplify matters further, we list additional rules. It is possible to use
Armstrong's axioms to prove that these rules are correct.
Union rule: If α -> β holds and α -> γ holds, then α -> β γ holds.
Decomposition rule: If α -> β γ holds, then α -> β holds and α -> γ holds.
Pseudo-transitivity rule: If α -> β holds and γ β -> δ holds, then α γ -> δ holds.
Reflexivity rule (proof)
Suppose that X ⊇ Y and that two tuples t1 and t2 exist in some relation instance r of R such that
t1[X] = t2[X]. Then t1[Y] = t2[Y] because X ⊇ Y; hence, X → Y must hold in r.
Augmentation rule (proof by contradiction)
Let us assume that X → Y holds in a relation instance r of R but that XZ → YZ does not hold. Then
there must exist two tuples t1 and t2 in r such that (1) t1[X] = t2[X], (2) t1[Y] = t2[Y], (3) t1[XZ] =
t2[XZ], and (4) t1[YZ] ≠ t2[YZ]. This is not possible because from (1) and (3) we deduce (5) t1[Z]
= t2[Z], and from (2) and (5) we deduce (6) t1[YZ] = t2[YZ], contradicting (4).
Union rule (using Armstrong's axioms)
1. X → Y (given).
2. X → Z (given).
3. X → XY (using augmentation rule on 1 by augmenting with X; notice that XX = X).
4. XY → YZ (using augmentation rule on 2 by augmenting with Y).
5. X →YZ (using transitivity on 3 and 4).
Pseudo-transitivity rule (using Armstrong's axioms)
1. X →Y (given).
2. WY →Z (given).
3. WX →WY (using augmentation rule on 1 by augmenting with W).
4. WX → Z (using transitivity rule on 3 and 2).
A procedure to compute F+
F+ = F
repeat
    for each functional dependency f in F+
        apply reflexivity and augmentation rules on f
        add the resulting functional dependencies to F+
    for each pair of functional dependencies f1 and f2 in F+
        if f1 and f2 can be combined using transitivity
            add the resulting functional dependency to F+
until F+ does not change any further
Let us apply our rules to the example of schema R (A, B, C, G, H, I) and the set F of functional
dependencies {A-> B, A-> C, CG-> H, CG-> I, B-> H}. We list several members of F+ here:
A -> H. Since A -> B and B -> H hold, we apply the transitivity rule.
CG-> HI. Since CG -> H and CG ->I, the union rule implies that CG->HI.
AG->I. Since A->C and CG -> I, the pseudo-transitivity rule implies that AG -> I holds.
Another way of finding that AG -> I holds is as follows:
We use the augmentation rule on A -> C to infer AG -> CG. Applying the transitivity rule
to this dependency and CG-> I, we infer AG -> I.
The left-hand and right-hand sides of a functional dependency are both subsets of R. Since
a set of size n has 2^n subsets, there are a total of 2^n × 2^n = 2^(2n) possible functional
dependencies, where n is the number of attributes in R.
Typically, database designers first specify the set of functional dependencies F that can easily be
determined from the semantics of the attributes of R; then reflexivity, augmentation, and
transitivity are used to infer additional functional dependencies that will also hold on R. A
systematic way to determine these additional functional dependencies is first to determine
each set of attributes α that appears as a left-hand side of some functional dependency in F
and then to determine the set of all attributes that are dependent on α. Thus for each such
set of attributes α, we determine the set of attributes that are functionally determined by α
based on F; this set is called the closure of α under F.
Let α be a set of attributes. We call the set of all attributes functionally determined by α
under a set F of functional dependencies the closure of α under F; we denote it by α+.
result := α;
while (changes to result) do
    for each functional dependency β -> γ in F do
    begin
        if β ⊆ result then result := result ∪ γ;
    end
Attribute closure algorithm
Here the input is a set F of functional dependencies and the set α of attributes. The output is
stored in the variable result.
To illustrate how the algorithm works, we shall use it to compute (AG)+ with the set F of
functional dependencies {A-> B, A-> C, CG-> H, CG-> I, B-> H}.
We start with result = AG. The first time that we execute the while loop to test each
functional dependency, we find that
A -> B causes us to include B in result. To see this fact, we observe that A -> B is in F, A
⊆ result (which is AG), and so result := result ∪ B.
A -> C causes result to become ABCG.
CG -> H causes result to become ABCGH.
CG -> I causes result to become ABCGHI.
The second time that we execute the while loop, no new attributes are added to result, and
the algorithm terminates.
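The attribute closure algorithm translates almost line for line into Python. In this sketch, attribute sets are written as strings of single-letter attribute names, which is our own encoding; the helper implies relies on the standard fact that α -> β is in F+ exactly when β ⊆ α+.

```python
def closure(attrs, fds):
    """Return the closure of the attribute set `attrs` under the FDs `fds`."""
    result = set(attrs)
    changed = True
    while changed:                      # "while (changes to result)"
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)      # result := result ∪ γ
                changed = True
    return result


def implies(fds, lhs, rhs):
    """True if lhs -> rhs is logically implied by fds."""
    return set(rhs) <= closure(lhs, fds)


F = [("A", "B"), ("A", "C"), ("CG", "H"), ("CG", "I"), ("B", "H")]

print("".join(sorted(closure("AG", F))))
print(implies(F, "A", "H"), implies(F, "AG", "I"), implies(F, "C", "A"))
```

Running it on F gives (AG)+ = ABCGHI, the same result as the hand computation, and confirms that A -> H and AG -> I are in F+ while C -> A is not.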
Canonical Cover
We can reduce the effort spent in checking for violations by testing a simplified
set of functional dependencies that has the same closure as the given set. Any database that
satisfies the simplified set of functional dependencies will also satisfy the original set, and vice
versa, since the two sets have the same closure. However, the simplified set is easier to test.
Explanation 1:
If F contains
1) ABC -> XYZ (call the left-hand side α and the right-hand side β)
2) BC -> XYZ
then A is extraneous in α,
because F logically implies (F - {ABC -> XYZ}) ∪ {BC -> XYZ}.
Conversely, by the augmentation rule, if BC -> XYZ, then ABC -> AXYZ,
and by the decomposition rule, if ABC -> AXYZ, then ABC -> XYZ.
Explanation 2:
If F contains
1) ABC -> XYZ (call the left-hand side α and the right-hand side β)
2) ABC -> YZ
3) YZ -> X
then X is extraneous in β,
because (F - {ABC -> XYZ}) ∪ {ABC -> YZ} logically implies F.
By the transitivity rule, if ABC -> YZ and YZ -> X, then ABC -> X,
and by the union rule, if ABC -> YZ and ABC -> X, then ABC -> XYZ.
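Both explanations can be tested with attribute closures. The sketch below assumes the usual closure-based tests: an attribute is extraneous on the left if the remaining left-hand attributes still determine β under F, and extraneous on the right if it is still derivable after being dropped from β. The function names are ours, and attributes are single letters.

```python
def closure(attrs, fds):
    """Closure of `attrs` under `fds`, with FDs given as (lhs, rhs) strings."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def extraneous_in_lhs(attr, lhs, rhs, fds):
    """attr is extraneous in lhs if (lhs - attr) already determines rhs under F."""
    return set(rhs) <= closure(set(lhs) - {attr}, fds)


def extraneous_in_rhs(attr, lhs, rhs, fds):
    """attr is extraneous in rhs if lhs still determines it once it is dropped."""
    reduced = [fd for fd in fds if fd != (lhs, rhs)]
    reduced.append((lhs, "".join(a for a in rhs if a != attr)))
    return attr in closure(lhs, reduced)


F1 = [("ABC", "XYZ"), ("BC", "XYZ")]                 # Explanation 1
F2 = [("ABC", "XYZ"), ("ABC", "YZ"), ("YZ", "X")]    # Explanation 2

print(extraneous_in_lhs("A", "ABC", "XYZ", F1))
print(extraneous_in_rhs("X", "ABC", "XYZ", F2))
print(extraneous_in_lhs("B", "ABC", "XYZ", F1))
```

The first two calls confirm the two explanations; the third shows that B is not extraneous in ABC -> XYZ, because AC alone determines nothing further under F1.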
Canonical Cover - A canonical cover Fc for F is a set of dependencies such that F logically
implies all dependencies in Fc, and Fc logically implies all dependencies in F. Furthermore, Fc
must have the following properties:
• No functional dependency in Fc contains an extraneous attribute.
• Each left side of a functional dependency in Fc is unique.
The canonical cover Fc of F can be shown to have the same closure as F; hence, testing
whether Fc is satisfied is equivalent to testing whether F is satisfied.
Fc = F
repeat
    Use the union rule to replace any dependencies in Fc of the form α1 -> β1 and α1 -> β2 with α1 -> β1 β2
    Find a functional dependency α -> β in Fc with an extraneous attribute either in α or in β
    /* Note: the test for extraneous attributes is done using Fc, not F */
    If an extraneous attribute is found, delete it from α -> β
until Fc does not change.
• There are two functional dependencies with the same set of attributes on the left
side of the arrow:
A-> BC
A->B
We combine these functional dependencies into A -> BC
• C is extraneous in A -> BC, since A-> BC is logically implied by A -> B and B-> C.
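The canonical cover procedure can be sketched in Python as follows. Several things here are assumptions: the input set F = {A -> BC, B -> C, A -> B, AB -> C} is chosen only to be consistent with the dependencies mentioned in this example, attributes are single letters, and the order in which extraneous attributes are tried (alphabetical, left side first) is our own choice. A different order can produce a different, equally valid cover.

```python
def closure(attrs, fds):
    """Closure of `attrs` under `fds`, with FDs given as (frozenset, frozenset)."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def canonical_cover(fds):
    fc = [(frozenset(l), frozenset(r)) for l, r in fds]
    changed = True
    while changed:
        changed = False
        # Union rule: merge dependencies that share a left-hand side.
        merged = {}
        for lhs, rhs in fc:
            merged[lhs] = merged.get(lhs, frozenset()) | rhs
        fc = list(merged.items())
        # Remove one extraneous attribute, then start over (tested against Fc).
        for i, (lhs, rhs) in enumerate(fc):
            reduced = None
            for a in sorted(lhs):
                if rhs <= closure(lhs - {a}, fc):
                    reduced = (lhs - {a}, rhs)
                    break
            if reduced is None:
                for a in sorted(rhs):
                    trial = (lhs, rhs - {a})
                    if a in closure(lhs, fc[:i] + [trial] + fc[i + 1:]):
                        reduced = trial
                        break
            if reduced is not None:
                fc[i] = reduced
                changed = True
                break
        fc = [(l, r) for l, r in fc if r]   # drop dependencies with empty right side
    return set(fc)


F = [("A", "BC"), ("B", "C"), ("A", "B"), ("AB", "C")]
cover = canonical_cover(F)
print(sorted(("".join(sorted(l)), "".join(sorted(r))) for l, r in cover))
```

For this input the sketch first merges A -> BC and A -> B, then drops C from A -> BC and A from AB -> C, leaving A -> B and B -> C.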
Another example –
2NF - Second normal form (2NF) is based on the concept of full functional dependency.
A functional dependency X -> Y is a full functional dependency if removal of any attribute
A from X means that the dependency does not hold any more; that is, for any attribute A ∈
X, (X – {A}) does not functionally determine Y. A relation schema R is in 2NF if every
nonprime attribute of R is fully functionally dependent on the primary key of R.
The test for 2NF involves testing for functional dependencies whose left-hand side attributes are
part of the primary key. If the primary key contains a single attribute, the test need not be applied at
all.
Example1
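Consider the following relation in figure 10:
Figure 10 - EMP_PROJ
SSN PNUMBER HOURS ENAME PNAME PLOCATION
FD1: {SSN, PNUMBER} -> HOURS
FD2: SSN -> ENAME
FD3: PNUMBER -> {PNAME, PLOCATION}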
The EMP_PROJ relation in the above figure is in 1NF but is not in 2NF. The nonprime attribute
ENAME violates 2NF because of FD2, as do the nonprime attributes PNAME and
PLOCATION because of FD3. The functional dependencies FD2 and FD3 make ENAME,
PNAME, and PLOCATION partially dependent on the primary key {SSN, PNUMBER} of
EMP_PROJ, thus violating the 2NF test.
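The 2NF test can be sketched by computing the closure of every proper subset of the primary key and looking for nonprime attributes in it. The dependencies below are our reading of the text: FD2 and FD3 are named explicitly, while FD1 ({SSN, PNUMBER} -> HOURS) is an assumption. The function name partial_dependencies is ours.

```python
from itertools import combinations


def closure(attrs, fds):
    """Closure of `attrs` under `fds`, with FDs given as (set, set) pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def partial_dependencies(key, fds):
    """Map each nonprime attribute to the proper key subsets that determine it."""
    found = {}
    for size in range(1, len(key)):
        for subset in combinations(sorted(key), size):
            for attr in closure(subset, fds) - set(key):
                found.setdefault(attr, set()).add(subset)
    return found


KEY = {"SSN", "PNUMBER"}
FDS = [
    ({"SSN", "PNUMBER"}, {"HOURS"}),          # FD1 (assumed)
    ({"SSN"}, {"ENAME"}),                     # FD2
    ({"PNUMBER"}, {"PNAME", "PLOCATION"}),    # FD3
]

violations = partial_dependencies(KEY, FDS)
print(sorted(violations))
```

ENAME, PNAME, and PLOCATION are reported as partially dependent on the key, while HOURS is not, so only HOURS can stay with the full key {SSN, PNUMBER}.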
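Figure 11
EP1
SSN PNUMBER HOURS
FD1: {SSN, PNUMBER} -> HOURS
EP2
SSN ENAME
FD2: SSN -> ENAME
EP3
PNUMBER PNAME PLOCATION
FD3: PNUMBER -> {PNAME, PLOCATION}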
Partial and full functional dependencies will now be considered with respect to all candidate
keys of a relation.
Example 2
Consider the relation schema LOTS shown in following figure which describes parcels of land
for sale in various counties of a state. Suppose that there are two candidate keys:
PROPERTY_ID# and {COUNTY_NAME, LOT#}; that is, lot numbers are unique only
within each county, but PROPERTY-ID numbers are unique across counties for the entire
state.
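Based on the two candidate keys PROPERTY_ID# and {COUNTY_NAME, LOT#}, we know
that the functional dependencies FD1 and FD2 hold. The dependencies FD3 and FD4 also hold in LOTS.
LOTS
PROPERTY_ID# COUNTY_NAME LOT# AREA PRICE TAX_RATE
FD1: PROPERTY_ID# -> {COUNTY_NAME, LOT#, AREA, PRICE, TAX_RATE}
FD2: {COUNTY_NAME, LOT#} -> {PROPERTY_ID#, AREA, PRICE, TAX_RATE}
FD3: COUNTY_NAME -> TAX_RATE
FD4: AREA -> PRICE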
The LOTS relation schema violates the general definition of 2NF because TAX_RATE is
partially dependent on the candidate key {COUNTY_NAME, LOT#}, due to FD3. To
normalize LOTS into 2NF, we decompose it into the two relations LOTS1 and LOTS2, shown in
following figure. We construct LOTS1 by removing the attribute TAX_RATE that violates 2NF
from LOTS and placing it with COUNTY-NAME (the left-hand side of FD3 that causes the
partial dependency) into another relation LOTS2. Both LOTS1 and LOTS2 are in 2NF. Notice
that FD4 does not violate 2NF and is carried over to LOTS1.
LOTS1
PROPERTY-ID COUNTY_NAME LOT# AREA PRICE
FD1
FD2
FD4
LOTS2
COUNTY_NAME TAX_RATE
FD3
**The test for 2NF involves testing for functional dependencies whose left-hand side attributes
are part of a candidate key. If every candidate key contains a single attribute, the test need not be applied at all.
Third normal form (3NF) is based on the concept of transitive dependency. A functional
dependency X-> Y in a relation schema R is a transitive dependency if there is a set of
attributes Z that is neither a candidate key nor a subset of any key of R, and both X -> Z
and Z ->Y hold.
3NF- According to Codd's original definition, a relation schema R is in 3NF with respect to
a set F of functional dependencies if it satisfies 2NF and no nonprime attribute of R is
transitively dependent on the primary key.
Or
(General Definition)
A 3NF relation should not have a nonkey attribute functionally determined by another
nonkey attribute (or by a set of nonkey attributes).
A relation schema R is in third normal form (3NF) with respect to a set F of functional
dependencies if, whenever a nontrivial functional dependency X -> A holds in R, either (a)
X is a superkey of R, or (b) A is a prime attribute of R.
Example1:
Consider the relation schema EMP_DEPT in the figure 12.
Figure 12
EMP_DEPT
ENAME SSN DOB ADDRESS DNUMBER DNAME DMGRENO
FD1: SSN -> {ENAME, DOB, ADDRESS, DNUMBER}
FD2: DNUMBER -> {DNAME, DMGRENO}
The above relation schema is in 2NF, since no partial dependencies on a key exist.
However, EMP_DEPT is not in 3NF because of the transitive dependency of DMGRENO (and
also DNAME) on SSN via DNUMBER. We can normalize EMP_DEPT by decomposing it into
the two 3NF relation schemas ED1 and ED2.
ED1
ENAME SSN DOB ADDRESS DNUMBER
FD1
ED2
DNUMBER DNAME DMGRENO
FD2
Intuitively, we see that ED1 and ED2 represent independent entity facts about employees
and departments.
Intuitively, we can see that any functional dependency in which the left-hand side is part (proper
subset) of the candidate key, or any functional dependency in which the left-hand side is a nonkey
attribute, is a problematic FD. 2NF and 3NF normalization remove these problem FDs by
decomposing the original relation into new relations. In terms of the normalization process, it is
not necessary to remove the partial dependencies before the transitive dependencies, but
historically, 3NF has been defined with the assumption that a relation is tested for 2NF first
before it is tested for 3NF.
Example –2
LOTS1
PROPERTY-ID COUNTY_NAME LOT# AREA PRICE
FD1
FD2
FD4
LOTS2
COUNTY_NAME TAX_RATE
FD3
Here LOTS2 is in 3NF. However, FD4 in LOTS1 violates 3NF because AREA is not a
superkey and PRICE is not a prime attribute in LOTS1. To normalize LOTS1 into 3NF we
decompose it into relation schemas LOTS1A and LOTS1B shown in the following figure. We
construct LOTS1A by removing the attribute PRICE that violates 3NF from LOTS1 and
placing it with AREA (the left-hand side of FD4 that causes the transitive dependency) into
another relation LOTS1B. Both LOTS1A and LOTS1B are in 3NF.
LOTS1A
PROPERTY-ID COUNTY_NAME LOT# AREA
FD1
FD2
LOTS1B
AREA PRICE
FD4
A relation schema R violates the general definition of 3NF if a functional dependency X->A
holds in R that violates both conditions (a) and (b). Violating (b) means that A is a nonprime
attribute. Violating (a) means that X is not a superset of any key of R; hence, X could be
nonprime or it could be a proper subset of a key of R. If X is nonprime, we typically have a
transitive dependency that violates 3NF, whereas if X is a proper subset of a key of R, we have a
partial dependency that violates 3NF (and also 2NF).
Boyce-Codd normal form (BCNF) was proposed as a simpler form of 3NF, but it was found to be
stricter than 3NF. That is, every relation in BCNF is also in 3NF; however, a relation in 3NF
is not necessarily in BCNF.
The formal definition of BCNF differs slightly from the definition of 3NF. The only difference
between definitions of BCNF and 3NF is that condition (b) of 3NF, which allows A to be prime,
is absent from BCNF.
LOTS1A
PROPERTY ID# COUNTY_NAME LOT# AREA
FD1
FD2
FD5: AREA -> COUNTY_NAME
Here FD5 violates BCNF in LOTS1A because AREA is not a superkey of LOTS1A.
Note that FD5 satisfies 3NF in LOTS1A because COUNTY_NAME is a prime attribute
(condition b), but this condition does not exist in the definition of BCNF. We can decompose
LOTS1A into two BCNF relations LOTS1AX and LOTS1AY, shown in the following figure. This
decomposition loses the functional dependency FD2 because its attributes no longer coexist in
the same relation after decomposition.
LOTS1AX
PROPERTY ID# AREA LOT#
LOTS1AY
AREA COUNTY_NAME
In practice, most relation schemas that are in 3NF are also in BCNF. Only if X ->A holds
in a relation schema R with X not being a superkey and A being a prime attribute will R be
in 3NF but not in BCNF.
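The general 3NF definition and the BCNF condition can be compared with a small checker. This is a sketch under assumptions: the dependency sets for LOTS1 and LOTS1A are our reading of FD1, FD2, FD4, and FD5 restricted to each relation, candidate keys are found by brute force (practical only for a handful of attributes), and only the dependencies listed in F are tested.

```python
from itertools import combinations


def closure(attrs, fds):
    """Closure of `attrs` under `fds`, with FDs given as (set, set) pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def candidate_keys(attributes, fds):
    """Minimal attribute sets whose closure is the whole relation."""
    keys = []
    for size in range(1, len(attributes) + 1):
        for combo in combinations(sorted(attributes), size):
            candidate = set(combo)
            if any(key <= candidate for key in keys):
                continue                      # contains a smaller key
            if closure(candidate, fds) >= attributes:
                keys.append(candidate)
    return keys


def violations(attributes, fds):
    """Return (3NF violations, BCNF violations) as (lhs, attribute) pairs."""
    prime = set().union(*candidate_keys(attributes, fds))
    third_nf, bcnf = [], []
    for lhs, rhs in fds:
        if closure(lhs, fds) >= attributes:
            continue                          # condition (a): lhs is a superkey
        for attr in sorted(rhs - lhs):        # nontrivial part only
            bcnf.append((sorted(lhs), attr))
            if attr not in prime:             # condition (b) fails as well
                third_nf.append((sorted(lhs), attr))
    return third_nf, bcnf


LOTS1 = {"PROPERTY_ID#", "COUNTY_NAME", "LOT#", "AREA", "PRICE"}
F_LOTS1 = [
    ({"PROPERTY_ID#"}, {"COUNTY_NAME", "LOT#", "AREA", "PRICE"}),   # FD1
    ({"COUNTY_NAME", "LOT#"}, {"PROPERTY_ID#", "AREA", "PRICE"}),   # FD2
    ({"AREA"}, {"PRICE"}),                                          # FD4
]
LOTS1A = {"PROPERTY_ID#", "COUNTY_NAME", "LOT#", "AREA"}
F_LOTS1A = [
    ({"PROPERTY_ID#"}, {"COUNTY_NAME", "LOT#", "AREA"}),            # FD1
    ({"COUNTY_NAME", "LOT#"}, {"PROPERTY_ID#", "AREA"}),            # FD2
    ({"AREA"}, {"COUNTY_NAME"}),                                    # FD5
]

print(violations(LOTS1, F_LOTS1))
print(violations(LOTS1A, F_LOTS1A))
```

LOTS1 fails both tests because of AREA -> PRICE, whereas LOTS1A passes the 3NF test and fails only the BCNF test because of AREA -> COUNTY_NAME, which is the case described above: X is not a superkey but A is prime.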
Decomposition
To bring a relation schema from a lower normal form to a higher one, we decompose it
into several schemas with fewer attributes. Careless decomposition, however, may lead
to another form of bad design. We will discuss some required properties of relational
decomposition.
1) We must make sure that each attribute in schema R, which is decomposed into
D = {R1, R2, ..., Rm}, appears in at least one relation schema Ri in the
decomposition so that no attributes are "lost"; formally, R1 ∪ R2 ∪ ... ∪ Rm = R.
This is called the attribute preservation condition of decomposition.
Lending schema
Branch-Name Branch-City Assets Customer-Name Loan-Number Amount
Downtown Brooklyn 9000000 Jones L-17 1000
Redwood Palo Alto 2100000 Smith L-23 2000
Perryridge Horseneck 1700000 Hayes L-15 1500
Downtown Brooklyn 9000000 Jackson L-14 1500
Mianus Horseneck 400000 Jones L-93 500
Round Hill Horseneck 8000000 Turner L-11 900
Pownal Bennington 300000 Williams L-29 1200
North Town Rye 3700000 Hayes L-16 1300
Downtown Brooklyn 9000000 Johnson L-18 2000
Perryridge Horseneck 1700000 Glenn L-25 2500
Brighton Brooklyn 7100000 Brooks L-10 2200
FD1
FD2
Branch-customer
Branch-Name Branch-City Assets Customer-Name
Downtown Brooklyn 9000000 Jones
Redwood Palo Alto 2100000 Smith
Perryridge Horseneck 1700000 Hayes
Downtown Brooklyn 9000000 Jackson
Mianus Horseneck 400000 Jones
Round Hill Horseneck 8000000 Turner
Pownal Bennington 300000 Williams
North Town Rye 3700000 Hayes
Downtown Brooklyn 9000000 Johnson
Perryridge Horseneck 1700000 Glenn
Brighton Brooklyn 7100000 Brooks
Customer-loan
Customer-Name Loan-Number Amount
Jones L-17 1000
Smith L-23 2000
Hayes L-15 1500
Jackson L-14 1500
Jones L-93 500
Turner L-11 900
Williams L-29 1200
Hayes L-16 1300
Johnson L-18 2000
Glenn L-25 2500
Brooks L-10 2200
Of course, there are cases in which we need to reconstruct the lending relation. For example,
suppose that we wish to find all branches that have loans with amounts less than $1000. No
relation in our alternative database contains these data, so we need to reconstruct the lending
relation by computing the natural join Branch-customer ⋈ Customer-loan. The result looks like
the following relation.
Branch-customer ⋈ Customer-loan
Branch-Name Branch-City Assets Customer-Name Loan-Number Amount
Downtown Brooklyn 9000000 Jones L-17 1000
Downtown Brooklyn 9000000 Jones L-93 500
Redwood Palo Alto 2100000 Smith L-23 2000
Perryridge Horseneck 1700000 Hayes L-15 1500
When we compare Branch-customer ⋈ Customer-loan and the lending relation with which we
started, we notice a difference: although every tuple that appears in the lending
relation appears in Branch-customer ⋈ Customer-loan, there are tuples in
Branch-customer ⋈ Customer-loan that are not in lending. In our example,
Branch-customer ⋈ Customer-loan has the following additional tuples:
Now we consider the query, "Find all bank branches that have made a loan in an amount less than
$1000." If we look back at the lending relation, we see that the only branches with loan amounts less
than $1000 are Mianus and Round Hill. However, when we apply the expression
Branch-customer ⋈ Customer-loan, we obtain three branch names: Mianus, Round Hill, and Downtown.
***The word loss in lossy refers to loss of information, not to loss of tuples. If a decomposition
does not have the lossless join property, we may get additional spurious tuples after the
NATURAL JOIN (⋈) operations are applied; these additional tuples represent erroneous
information. We prefer the term nonadditive join because it describes the situation more
accurately; if the property holds on a decomposition, we are guaranteed that no spurious tuples
bearing wrong information are added to the result after the NATURAL JOIN operations are
applied.
Let us consider another alternative design, in which we decompose Lending-schema into the
following two schemas:
Branch-schema = (branch-name, branch-city, assets)
Loan-info-schema = (branch-name, customer-name, loan-number, amount)
Branch
Branch-Name Branch-City Assets
Downtown Brooklyn 9000000
Redwood Palo Alto 2100000
Perryridge Horseneck 1700000
Mianus Horseneck 400000
Round Hill Horseneck 8000000
Pownal Bennington 300000
North Town Rye 3700000
Brighton Brooklyn 7100000
Loan-info
Branch-Name Customer-Name Loan-Number Amount
Downtown Jones L-17 1000
Redwood Smith L-23 2000
Perryridge Hayes L-15 1500
Downtown Jackson L-14 1500
Mianus Jones L-93 500
Round Hill Turner L-11 900
Pownal Williams L-29 1200
North Town Hayes L-16 1300
Downtown Johnson L-18 2000
Perryridge Glenn L-25 2500
Brighton Brooks L-10 2200
Branch ⋈ Loan-info
Branch-Name Branch-City Assets Customer-Name Loan-Number Amount
Downtown Brooklyn 9000000 Jones L-17 1000
Redwood Palo Alto 2100000 Smith L-23 2000
Perryridge Horseneck 1700000 Hayes L-15 1500
Downtown Brooklyn 9000000 Jackson L-14 1500
Mianus Horseneck 400000 Jones L-93 500
Round Hill Horseneck 8000000 Turner L-11 900
Pownal Bennington 300000 Williams L-29 1200
North Town Rye 3700000 Hayes L-16 1300
Downtown Brooklyn 9000000 Johnson L-18 2000
Perryridge Horseneck 1700000 Glenn L-25 2500
Brighton Brooklyn 7100000 Brooks L-10 2200
When we compare Branch ⋈ Loan-info and the lending relation with which
we started, we notice no difference.
The difference between this example and the preceding one is that for a given branch-name, there
is exactly one assets value and exactly one branch-city (branch-name is a superkey of Branch),
whereas a similar statement cannot be made for customer-name. That is, the functional
dependency branch-name -> assets, branch-city holds.
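Whether an instance is consistent with a functional dependency can be checked directly. The sketch below assumes a hypothetical helper `fd_holds` and shortened attribute names; the tuples are a few rows from the Branch and Loan-info tables above. A check like this can only refute a dependency on a given instance; it cannot prove that the dependency holds for the schema in general.

```python
def fd_holds(relation, lhs, rhs):
    """Return True if no two tuples agree on lhs but differ on rhs."""
    seen = {}
    for t in relation:
        key = tuple(t[a] for a in lhs)
        value = tuple(t[a] for a in rhs)
        # setdefault stores the first value seen for this key and returns it.
        if seen.setdefault(key, value) != value:
            return False
    return True


branch = [
    {"branch": "Downtown", "city": "Brooklyn", "assets": 9000000},
    {"branch": "Mianus", "city": "Horseneck", "assets": 400000},
]
loan_info = [
    {"branch": "Downtown", "customer": "Jones", "loan": "L-17", "amount": 1000},
    {"branch": "Mianus", "customer": "Jones", "loan": "L-93", "amount": 500},
    {"branch": "Downtown", "customer": "Jackson", "loan": "L-14", "amount": 1500},
]

print(fd_holds(branch, ["branch"], ["city", "assets"]))  # True
print(fd_holds(loan_info, ["customer"], ["loan"]))       # False
```

Jones has two different loans, so customer-name does not determine loan-number, which is why joining on customer-name produced spurious tuples in the earlier design.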
Input to the algorithm: a universal relation R, a decomposition D = {R1, R2, R3, ..., Rm} of R,
and a set F of functional dependencies.
The algorithm is as follows:
1. Create an initial matrix S with one row i for each relation Ri in D, and one column j for each
attribute Aj in R.
2. Set S(i, j) := bij for all matrix entries.
3. For each row i representing relation schema Ri
   {for each column j representing attribute Aj
      {if (relation Ri includes attribute Aj) then set S(i, j) := aj;};};
4. Repeat the following loop until a complete loop execution results in no changes to S
   {for each functional dependency X → Y in F
      {for all rows in S that have the same symbols in the columns corresponding to
       attributes in X
         {make the symbols in each column that corresponds to an attribute in Y the
          same in all these rows, as follows: if any of the rows has an "a" symbol for the
          column, set the other rows to that same "a" symbol in the column. If no "a" symbol
          exists for the attribute in any of the rows, choose one of the "b" symbols that
          appear in one of the rows for the attribute and set the other rows to that same
          "b" symbol in the column;};
       if a row is made up entirely of "a" symbols, go to step 5;};};
5. If a row is made up entirely of "a" symbols, then the decomposition has the lossless join
property; otherwise it does not.
Example 1-
Answer:
The initial Matrix S looks as follows:
SSN ENAME PNUMBER PNAME PLOCATIONS HOURS
EMP_LOCS b11 a2 b13 b14 a5 b16
EMP_PROJ1 a1 b22 a3 a4 a5 a6
The loop in step 4 of the algorithm cannot change any "b" symbols to "a"
symbols; hence the resulting matrix S does not have a row with all "a"
symbols, and so the decomposition does not have the lossless join property.
Example 2-
Answer
The initial Matrix S looks as follows:
SSN ENAME PNUMBER PNAME PLOCATIONS HOURS
EMP a1 a2 b13 b14 b15 b16
PROJ b21 b22 a3 a4 a5 b26
WORKS_ON a1 b32 a3 b34 b35 a6
After the loop in step 4 is applied, one row of matrix S is made up entirely of "a" symbols; hence
the decomposition has the lossless join property.
Example 3
Let R = ABCDE, R1 = AD, R2 = AB, R3 = BE, R4 = CDE, and R5 = AE. Let the functional
dependencies be: A -> C, B -> C, C -> D, DE -> C, CE -> A. Test if the decomposition of R into
{R1,..,R5} is a lossless join decomposition.
Answer:
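Applying the functional dependencies in the order given (A -> C, B -> C, C -> D, DE -> C,
CE -> A), the matrix S becomes:
     A    B    C    D    E
R1   a1   b12  b13  a4   b15
R2   a1   a2   b13  a4   b25
R3   a1   a2   a3   a4   a5
R4   a1   b42  a3   a4   a5
R5   a1   b52  a3   a4   a5
A -> C and B -> C first equate the C symbols of R1, R2, R3 and R5 to b13; C -> D then sets
their D symbols to a4; DE -> C sets the C symbols of R3, R4 and R5 to a3; and CE -> A sets
their A symbols to a1.
Now the row for R3 is made up entirely of "a" symbols; hence the decomposition
has the lossless join property.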
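The matrix test can be sketched in Python and checked against Example 3. This is an illustrative sketch under stated assumptions: the function name `is_lossless` and the tuple encoding of the symbols (`("a", j)` for aj, `("b", i, j)` for bij) are not from the notes, and the all-"a" check is made once per pass rather than after every dependency, which does not change the answer.

```python
def is_lossless(attrs, decomposition, fds):
    """Matrix test for the lossless (nonadditive) join property.

    attrs: list of the attribute names of R
    decomposition: list of attribute sets, one per Ri
    fds: list of (lhs, rhs) pairs of attribute sets
    """
    # Steps 1-3: an "a" symbol where Ri contains Aj, a distinct "b" symbol elsewhere.
    S = [{A: ("a", j) if A in Ri else ("b", i, j)
          for j, A in enumerate(attrs)}
         for i, Ri in enumerate(decomposition)]

    def all_a(row):
        return all(symbol[0] == "a" for symbol in row.values())

    # Step 4: repeat until a pass changes nothing or some row is all "a".
    changed = True
    while changed and not any(all_a(row) for row in S):
        changed = False
        for lhs, rhs in fds:
            # Group the rows that agree on every attribute of the left-hand side.
            groups = {}
            for row in S:
                key = tuple(row[A] for A in sorted(lhs))
                groups.setdefault(key, []).append(row)
            for rows in groups.values():
                for A in rhs:
                    # ("a", ...) sorts before ("b", ...), so min() prefers an
                    # "a" symbol and otherwise picks one of the "b" symbols.
                    target = min(row[A] for row in rows)
                    for row in rows:
                        if row[A] != target:
                            row[A] = target
                            changed = True
    # Step 5.
    return any(all_a(row) for row in S)


# Example 3: R = ABCDE, R1 = AD, R2 = AB, R3 = BE, R4 = CDE, R5 = AE.
attrs = list("ABCDE")
decomposition = [set("AD"), set("AB"), set("BE"), set("CDE"), set("AE")]
fds = [(set("A"), set("C")), (set("B"), set("C")), (set("C"), set("D")),
       (set("DE"), set("C")), (set("CE"), set("A"))]

print(is_lossless(attrs, decomposition, fds))  # True
```

With no functional dependencies at all, a decomposition of R(A, B, C) into AB and BC fails the same test, because no "b" symbol can ever be replaced.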
Dependency Preservation
There is another goal in relational-database design: dependency preservation. When an update is
made to the database, the system should be able to check that the update will not create an illegal
relation—that is, one that does not satisfy all the given functional dependencies.
If we want to check updates efficiently, we should design relational-database schemas that allow
update validation without the computation of joins. In other words, we should design
relational-database schemas in such a way that dependencies are preserved.
To decide whether joins must be computed to check an update, we need to determine what
functional dependencies can be tested by checking each relation individually.
Let F be a set of functional dependencies on a schema R, and let R1, R2, ..., Rm be a
decomposition of R. The projection of F to Ri (denoted by ΠRi(F)) is the set of all functional
dependencies in F+ that include only attributes of Ri. Since all functional dependencies in a
projection involve attributes of only one relation schema, it is possible to test such a dependency
for satisfaction by checking only one relation.
Note that the definition of the projection (also called the restriction) of F uses all
dependencies in F+, not just those in F. For instance, suppose F = {A → B, B → C}, and we have a
decomposition into AC and AB. The restriction of F to AC is then A → C, since A → C is in F+,
even though it is not in F.
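The restriction in this instance can be computed by brute force from attribute closures. The sketch below is illustrative only: the names `closure` and `project_fds` are assumptions, only nontrivial dependencies with a single attribute on the right-hand side are listed, and enumerating every subset of Ri is exponential, so it suits small schemas only.

```python
from itertools import combinations


def closure(attrs, fds):
    """Attribute closure of attrs under fds, a list of (lhs, rhs) set pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def project_fds(fds, Ri):
    """Nontrivial dependencies X -> A in F+ whose attributes all lie in Ri."""
    projected = []
    for size in range(1, len(Ri) + 1):
        for X in combinations(sorted(Ri), size):
            for A in sorted((closure(X, fds) & Ri) - set(X)):
                projected.append((X, A))
    return projected


F = [({"A"}, {"B"}), ({"B"}, {"C"})]
print(project_fds(F, {"A", "C"}))  # [(('A',), 'C')]
print(project_fds(F, {"A", "B"}))  # [(('A',), 'B')]
```

The first result is A → C, which is in F+ but not in F.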
The set of projections ΠR1(F), ΠR2(F), …, ΠRm(F) is the set of dependencies that can be checked
efficiently. We now must ask whether testing only the projections is sufficient. Let F' =
ΠR1(F) ∪ ΠR2(F) ∪ … ∪ ΠRm(F). F' is a set of functional dependencies on schema R,
but, in general, F' ≠ F. However, even if F' ≠ F, it may be that F'+ = F+. If the latter is true, then
every dependency in F is logically implied by F', and, if we verify that F' is satisfied, we
have verified that F is satisfied. We say that a decomposition having the property F'+ = F+ is
a dependency-preserving decomposition.
We now give a more efficient test for dependency preservation, which avoids computing
F+. The idea is to test each functional dependency α → β in F by using a modified form
of attribute closure to see if it is preserved by the decomposition. We apply the following
procedure to each α → β in F.
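result = α
while (changes to result) do
    for each Ri in the decomposition
        t = (result ∩ Ri)+ ∩ Ri
        result = result ∪ t
Here the attribute closure (result ∩ Ri)+ is computed with respect to F. If result contains all
attributes in β, then the functional dependency α → β is preserved. The decomposition is
dependency preserving if and only if every dependency in F is preserved.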
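The procedure translates almost line for line into Python. The following is a minimal sketch; the function names `closure`, `preserves` and `is_dependency_preserving` are illustrative assumptions, and the two decompositions tried at the end are those of R(A, B, C) with F = {A → B, B → C} used in the examples of this section.

```python
def closure(attrs, fds):
    """Attribute closure of attrs under fds, a list of (lhs, rhs) set pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def preserves(alpha, beta, decomposition, fds):
    """Test whether alpha -> beta is preserved, without computing F+."""
    result = set(alpha)
    changed = True
    while changed:
        changed = False
        for Ri in decomposition:
            # t = (result ∩ Ri)+ ∩ Ri, with the closure taken under F.
            t = closure(result & Ri, fds) & Ri
            if not t <= result:
                result |= t
                changed = True
    return beta <= result


def is_dependency_preserving(decomposition, fds):
    return all(preserves(a, b, decomposition, fds) for a, b in fds)


F = [({"A"}, {"B"}), ({"B"}, {"C"})]
print(is_dependency_preserving([{"A", "B"}, {"A", "C"}], F))  # False
print(is_dependency_preserving([{"A", "B"}, {"B", "C"}], F))  # True
```

In the first decomposition B → C fails: starting from {B}, neither {A, B} nor {A, C} contributes any new attribute, so C is never reached.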
Example 1:
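Suppose R (A, B, C) is a relation schema with a set of functional dependencies F = {A -> B,
B -> C}. R (A, B, C) is decomposed into
R1 = {A, B} and R2 = {A, C}. Check whether the decomposition is dependency
preserving.
Answer:
First we compute F+ and F'+. The nontrivial dependencies in F+ follow from A -> B, B -> C and
A -> C. Projecting F+ onto R1 gives A -> B, and projecting it onto R2 gives A -> C, so
F' = {A -> B, A -> C}. The dependency B -> C is in F+ but is not implied by F', because the
closure of B under F' is just {B}.
Since F+ ≠ F'+, the decomposition of R(A, B, C) into R1 = {A, B} and R2 = {A, C} is not
dependency preserving.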
Example 2:
Suppose R (A, B, C) is a relation schema with a set of functional dependencies F = {A -> B,
B -> C}. R (A, B, C) is decomposed into
R1 = {A, B} and R2 = {B, C}. Check whether the decomposition is dependency
preserving.
Answer:
First we compute F+ and F'+. Projecting F+ onto R1 gives A -> B, and projecting it onto R2
gives B -> C, so F' contains every dependency in F and therefore F'+ = F+.
Since F+ = F'+, the decomposition of R(A, B, C) into R1 = {A, B} and R2 = {B, C} is
dependency preserving.
Example
Suppose R (branch_name, customer_name, banker_name) is a relation schema with a set
of functional dependencies F = {banker_name -> branch_name, (branch_name,
customer_name)-> banker_name}.
So, from this we can say that not every BCNF decomposition is dependency
preserving.
R (branch_name, customer_name, banker_name) with the set of functional dependencies F
= {banker_name -> branch_name, (branch_name, customer_name) -> banker_name} is
already in 3NF, because every attribute is prime. It is not in BCNF, because banker_name is
not a superkey.
3NF versus BCNF
Definition
3NF: A relation schema R is in third normal form (3NF) with respect to a set F of functional
dependencies if, whenever a nontrivial functional dependency X -> A holds in R, either (a) X is a
superkey of R, or (b) A is a prime attribute of R.
BCNF: A relation schema R is in BCNF with respect to a set F of functional dependencies if,
whenever a nontrivial functional dependency X -> A holds in R, X is a superkey of R.
3NF: ... meaningful relationships among data items, and there is the problem of repetition of
information.
BCNF: ... null values to represent some of the possible meaningful relationships among data
items, and there is no problem of repetition of information.
Which is preferable
3NF: Since it is not always possible to satisfy both BCNF and dependency preservation, we may be
forced to choose between BCNF and dependency preservation with 3NF. If we choose dependency
preservation with 3NF, the application programmer needs to worry about writing code to keep
redundant data consistent on updates.
BCNF: If we choose BCNF and dependencies are not preserved, we have to consider each dependency
in a minimum cover Fc that is not preserved in the decomposition. For each such dependency α → β
we define a materialized view that computes a join of all relations in the decomposition and
projects the result on αβ. The functional dependency can be easily tested on this materialized view.
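The two definitions can be checked mechanically for a small schema. The sketch below is an illustration under stated assumptions, not a general tool: the names `closure`, `candidate_keys` and `normal_form_checks` are hypothetical, candidate keys are found by exhaustive search, and only the dependencies listed in F are examined, which is enough when testing the original schema R itself.

```python
from itertools import combinations


def closure(attrs, fds):
    """Attribute closure of attrs under fds, a list of (lhs, rhs) set pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def candidate_keys(R, fds):
    """Minimal attribute sets whose closure is all of R (smallest first)."""
    keys = []
    for size in range(1, len(R) + 1):
        for X in combinations(sorted(R), size):
            X = set(X)
            if closure(X, fds) >= R and not any(k <= X for k in keys):
                keys.append(X)
    return keys


def normal_form_checks(R, fds):
    """Return (is_3nf, is_bcnf) for schema R under the dependencies in fds."""
    prime = set().union(*candidate_keys(R, fds))
    is_3nf = is_bcnf = True
    for lhs, rhs in fds:
        if closure(lhs, fds) >= R:
            continue                  # X is a superkey: both conditions hold
        for A in rhs - lhs:           # nontrivial X -> A with X not a superkey
            is_bcnf = False
            if A not in prime:
                is_3nf = False
    return is_3nf, is_bcnf


R = {"branch_name", "customer_name", "banker_name"}
F = [({"banker_name"}, {"branch_name"}),
     ({"branch_name", "customer_name"}, {"banker_name"})]

print(normal_form_checks(R, F))  # (True, False)
```

For the banker schema the candidate keys are {banker_name, customer_name} and {branch_name, customer_name}, so every attribute is prime and the schema is in 3NF, while banker_name -> branch_name violates BCNF.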