Database Normalization Revised

Sudipta Saha Page 1 5/6/2011
Database Normalization Theory (Up to BCNF)

The goal of a relational-database design is to generate a set of relation schemas
that allows us to store information without unnecessary redundancy, yet also allows us to retrieve
information easily. One approach is to design schemas that are in an appropriate normal form.
Definition: Database normalization (sometimes referred to as canonical synthesis) is a
technique for designing relational database tables to minimize duplication of information
and, in so doing, to safeguard the database against certain types of logical or structural
problems, namely data anomalies.
For example, when multiple instances of a given piece of information occur in a table, the
possibility exists that these instances will not be kept consistent when the data within the table
is updated, leading to a loss of data integrity. A table that is sufficiently normalized is less
vulnerable to problems of this kind.
Higher degrees of normalization typically involve more tables and create the need for
a larger number of joins, which can reduce performance. Accordingly, more highly
normalized tables are typically used in database applications involving many isolated transactions
(e.g. an Automated teller machine), while less normalized tables tend to be used in database
applications that need to map complex relationships between data entities and data attributes (e.g.
a reporting application).
Pitfalls in Relational-Database Design
(Why is normalization required? / Why is normalization adopted for
database design? / What are the advantages of normalized relations over
non-normalized relations?)
A table that is not sufficiently normalized is a bad design. It shows the following
drawbacks.
• Repetition of information, which wastes space.
For example, we can compare the space used by the two relations EMPLOYEE and
DEPARTMENT in Figure 1 with the space for an EMP_DEPT relation in Figure 2. In
EMP_DEPT, the attribute values pertaining to a particular department (DNUMBER,
DNAME, DMGR_SSN) are repeated for every employee who works for that department,
which wastes space. In contrast, each department’s information appears only once in the
DEPARTMENT relation in Figure 1. Only the department number (DNUMBER) is repeated
in the EMPLOYEE relation for each employee who works in that department.
Figure 1
EMPLOYEE
Ename                  Ssn        Bdate       Address                    Dnumber
Smith, John B          123456789  1965-01-09  731 Fondren, Houston, TX   5
Wong, Franklin T.      333445555  1955-12-08  638 Voss, Houston, TX      5
Zelyala, Aliciya J.    999887777  1968-07-19  3321 Castle, Spring, TX    4
Wallance, Jennifer S.  987654321  1941-06-20  291 Berry, Belliaire, TX   4
Narayan, Ramesh K.     666884444  1962-09-15  975 Fire Oak, Humble, TX   5
English, Joyce A.      453453453  1972-07-31  5631 Rice, Houston, TX     5
Jabbar, Ahamed V.      987987987  1969-03-29  980 Dallas, Houston, TX    4
Borg, James E          888665555  1937-11-10  450, Stone, Houston, TX    1
DEPARTMENT
Dname          Dnumber  Dmgr_ssn
Research       5        333445555
Administrator  4        987654321
Headquarters   1        888665555
Figure 2
EMP_DEPT
Ename                  Ssn        Bdate       Address                    Dnumber  Dname          Dmgr_ssn
Smith, John B          123456789  1965-01-09  731 Fondren, Houston, TX   5        Research       333445555
Wong, Franklin T.      333445555  1955-12-08  638 Voss, Houston, TX      5        Research       333445555
Zelyala, Aliciya J.    999887777  1968-07-19  3321 Castle, Spring, TX    4        Administrator  987654321
Wallance, Jennifer S.  987654321  1941-06-20  291 Berry, Belliaire, TX   4        Administrator  987654321
Narayan, Ramesh K.     666884444  1962-09-15  975 Fire Oak, Humble, TX   5        Research       333445555
English, Joyce A.      453453453  1972-07-31  5631 Rice, Houston, TX     5        Research       333445555
Jabbar, Ahamed V.      987987987  1969-03-29  980 Dallas, Houston, TX    4        Administrator  987654321
Borg, James E          888665555  1937-11-10  450, Stone, Houston, TX    1        Headquarters   888665555
• Update anomalies: logical inconsistencies of various types, called update anomalies, result from
data operations on such a table. We can see the following update anomalies:
♦Insertion anomaly – There are circumstances in which certain facts cannot be recorded at all.
The insertion anomaly occurs when we want to insert a new record into the relation: the
user cannot insert a fact about one entity until an additional fact about another entity
is available.
To insert a new employee tuple into EMP_DEPT, we must include either the attribute values for
the department that the employee works for, or nulls (if the employee does not work for a
department as yet). For example, to insert a new tuple for an employee who works in department
number 5, we must enter the attribute values of department 5 correctly so that they are
consistent with values for department 5 in other tuples in EMP_DEPT. In the design of Figure 1
we do not have to worry about this consistency problem because we enter only the department
number in the employee tuple; all other attribute values of department 5 are recorded only once in
the database, as a single tuple in the DEPARTMENT relation.
• It is difficult to insert a new department that has no employees as yet in the EMP_DEPT
relation. The only way to do this is to place null values in the attributes for employee. This
causes a problem because SSN is the primary key of EMP_DEPT, and each tuple is supposed to
represent an employee entity—not a department entity. Moreover, when the first employee is
assigned to that department, we do not need the tuple with null values any more. This problem
does not occur in the design of Figure 1, because a department is entered in the DEPARTMENT
relation whether or not any employees work for it, and whenever an employee is assigned to that
department, a corresponding tuple is inserted in EMPLOYEE.
♦Deletion anomaly – The deletion anomaly occurs when a record is deleted from the relation.
In this anomaly, deleting facts about one entity automatically deletes facts about
another entity.
For example, if we delete from EMP_DEPT an employee tuple that happens to represent the last
employee working for a particular department, the information concerning that department is lost
from the database. This problem does not occur in the database of Figure 1 because
DEPARTMENT tuples are stored separately.
♦Modification anomaly – The modification anomaly occurs when a record is modified in
the relation. In this anomaly, changing the value of a specific attribute requires
changing all records in which that value occurs.
The same information can be expressed in multiple records; therefore updates to the table may
result in logical inconsistencies. If the update is not applied to every such record, the table
is left in an inconsistent state.
For example, in EMP_DEPT, if we change the value of one of the attributes of a particular
department—say, the manager of department 5—we must update the tuples of all employees who
work in that department; otherwise, the database will become inconsistent. If we fail to update
some tuples, the same department will be shown to have two different values for manager in
different employee tuples, which should not be the case.
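The modification anomaly described above can be sketched in a few lines of Python. This is only an illustration: the rows are a hypothetical three-tuple subset of EMP_DEPT with lower-case column names, and the helper managers_per_department is an assumed name, not part of the original notes.

```python
# Hypothetical subset of EMP_DEPT (Figure 2), reduced to three columns.
emp_dept = [
    {"ssn": "123456789", "dnumber": 5, "dmgr_ssn": "333445555"},
    {"ssn": "333445555", "dnumber": 5, "dmgr_ssn": "333445555"},
    {"ssn": "999887777", "dnumber": 4, "dmgr_ssn": "987654321"},
]


def managers_per_department(rows):
    """Map each department number to the set of manager SSNs recorded for it."""
    seen = {}
    for row in rows:
        seen.setdefault(row["dnumber"], set()).add(row["dmgr_ssn"])
    return seen


# Partial update: only one tuple of department 5 receives the new manager.
emp_dept[0]["dmgr_ssn"] = "666884444"

# Departments that now show more than one manager are inconsistent.
inconsistent = {
    dnum: mgrs
    for dnum, mgrs in managers_per_department(emp_dept).items()
    if len(mgrs) > 1
}

# In the design of Figure 1 the manager is stored once per department,
# so a single assignment updates it everywhere.
department = {5: {"dname": "Research", "dmgr_ssn": "333445555"}}
department[5]["dmgr_ssn"] = "666884444"
```

After the partial update, department 5 is associated with two different manager values in the unnormalized rows, whereas the separate DEPARTMENT structure cannot get into that state.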
Ideally, a relational database table should be designed in such a way as to exclude the possibility
of update, insertion, and deletion anomalies. The normal forms of relational database theory
provide guidelines for deciding whether a particular design will be vulnerable to such
anomalies. It is possible to correct an un-normalized design so as to make it adhere to the
demands of the normal forms: this is called normalization. Removal of redundancies of the
tables will lead to several tables, with referential integrity restrictions between them.
Define 1NF with example
1NF - A relation is in first normal form (1NF) if and only if the domain of every attribute
includes only atomic (simple, indivisible) values and the value of any attribute in a tuple
is a single value from the domain of that attribute. Hence, 1NF disallows having a set
of values, a tuple of values, or a combination of both as an attribute value for a single tuple.
Consider the DEPARTMENT relation schema shown in Figure 3, whose primary key is
DNUMBER. Each department can have a number of locations. As we can see, the DEPARTMENT
relation is not in 1NF because DLOCATIONS is not a single-valued attribute, as illustrated by the
first tuple in Figure 3.
Figure 3
DEPARTMENT
DNAME           DNUMBER  DMGRENO    DLOCATIONS
Research        5        333445555  {Bellarie, Sugarland, Houston}
Administration  4        987654321  {Stafford}
Headquarters    1        888665555  {Houston}
There are three main techniques to achieve first normal form for such a relation:
1. Removing the attribute DLOCATIONS that violates 1NF and placing it in a
separate relation DEPT_LOCATIONS along with the primary key DNUMBER
of DEPARTMENT. The primary key of this relation will be the combination
{DNUMBER, DLOCATION}, shown in Figure 4. A distinct tuple in
DEPT_LOCATIONS exists for each location of a department. This decomposes the
non-1NF relation into two 1NF relations.
Figure 4
DEPARTMENT
DNAME           DNUMBER  DMGRENO
Research        5        333445555
Administration  4        987654321
Headquarters    1        888665555
DEPT_LOCATIONS
DNUMBER  DLOCATION
5        Bangalore
5        New Delhi
5        Hyderabad
4        Chennai
1        Hyderabad
2. Expanding the key so that there will be a separate tuple in the original
DEPARTMENT relation for each location of a DEPARTMENT, as shown in
figure 5. In this case, the primary key becomes the combination {DNUMBER,
DLOCATION}. This solution has the disadvantage of introducing redundancy in
the relation.
Figure 5 - DEPARTMENT
DNAME           DNUMBER  DMGRENO    DLOCATION
Research        5        333445555  Bangalore
Research        5        333445555  New Delhi
Research        5        333445555  Hyderabad
Administration  4        987654321  Chennai
Headquarters    1        888665555  Hyderabad
3. If a maximum number of values is known for the attribute (for example, if it is
known that at most three locations can exist for a department), replacing the
DLOCATIONS attribute by three atomic attributes DLOCATION1,
DLOCATION2, and DLOCATION3, as shown in Figure 6. This solution has the
disadvantage of introducing null values if most departments have fewer than
three locations, which wastes space.
Figure 6 - DEPARTMENT
DNAME           DNUMBER  DMGRENO    DLOCATION1  DLOCATION2  DLOCATION3
Research        5        333445555  Bangalore   New Delhi   Hyderabad
Administration  4        987654321  Chennai     Null        Null
Headquarters    1        888665555  Hyderabad   Null        Null
Of the three solutions above, the first is generally considered best because it does not suffer from
redundancy and is completely general, having no limit placed on a maximum number of values.
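The first technique can be sketched in Python as follows. This is a minimal illustration, assuming relations are held as lists of dictionaries and the multivalued attribute is a Python set; the function name split_multivalued is hypothetical, and the location values are taken from Figure 4.

```python
def split_multivalued(rows, key, multi_attr, single_name):
    """Split a set-valued attribute into a separate relation.

    Returns (base_rows, detail_rows): base_rows drop the set-valued
    attribute, detail_rows hold one (key, value) tuple per member.
    """
    base_rows, detail_rows = [], []
    for row in rows:
        base_rows.append({k: v for k, v in row.items() if k != multi_attr})
        for value in sorted(row[multi_attr]):
            detail_rows.append({key: row[key], single_name: value})
    return base_rows, detail_rows


# Non-1NF DEPARTMENT relation with a set-valued DLOCATIONS attribute.
department = [
    {"DNAME": "Research", "DNUMBER": 5,
     "DLOCATIONS": {"Bangalore", "New Delhi", "Hyderabad"}},
    {"DNAME": "Administration", "DNUMBER": 4, "DLOCATIONS": {"Chennai"}},
    {"DNAME": "Headquarters", "DNUMBER": 1, "DLOCATIONS": {"Hyderabad"}},
]

dept, dept_locations = split_multivalued(
    department, "DNUMBER", "DLOCATIONS", "DLOCATION"
)
```

Each tuple of dept_locations is identified by the pair (DNUMBER, DLOCATION), and no limit is placed on the number of locations per department.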
First normal form also disallows multivalued attributes that are themselves
composite. These are called nested relations because each tuple can have a relation within it.
Figure 7 shows how the EMP_PROJ relation could appear if nesting is allowed. Each tuple
represents an employee entity, and a relation PROJS (PNUMBER, HOURS) within each tuple
represents the employee's projects and the hours per week that employee works on each project.
The schema of this EMP_PROJ relation can be represented as follows:
EMP_PROJ (SSN, ENAME, {PROJS (PNUMBER, HOURS)}). The set braces {} identify the
attribute PROJS as multivalued, and we list the component attributes that form PROJS between
parentheses ( ).
Figure 7 - EMP_PROJ
Ssn        Ename                  PROJS
                                  PNUMBER  HOURS
123456789  Smith, John B          1        32.5
                                  2        7.5
666884444  Narayan, Ramesh K.     3        40.0
453453453  English, Joyce A.      1        20.0
                                  2        20.0
333445555  Wong, Franklin T.      2        10.0
                                  3        10.0
                                  10       10.0
                                  20       10.0
999887777  Zelyala, Aliciya J.    30       30.0
                                  10       10.0
987987987  Jabbar, Ahamed V.      10       35.5
                                  30       5.5
987654321  Wallance, Jennifer S.  30       20.0
                                  20       15.5
888665555  Borg, James E          20       NULL
Ssn is the primary key of the EMP_PROJ relation in Figure 7, while PNUMBER is the partial key of the
nested relation; that is, within each tuple, the nested relation must have unique values of PNUMBER.
To normalize this into 1NF, we move the nested relation attributes into a new relation and propagate the
primary key into it; the primary key of the new relation combines the partial key with the primary key
of the original relation. Decomposition and primary key propagation yield the schemas EMP_PROJ1 and
EMP_PROJ2 shown in Figure 8.
Figure 8
EMP_PROJ1
SSN  ENAME
EMP_PROJ2
SSN  PNUMBER  HOURS
This procedure can be applied recursively to a relation with multiple-level nesting to unnest the
relation into a set of 1NF relations.
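The decomposition into EMP_PROJ1 and EMP_PROJ2 can be sketched for a single level of nesting. The sketch assumes the nested relation is stored as a list of dictionaries under the key PROJS and uses only two employees from Figure 7.

```python
# Nested EMP_PROJ relation: each tuple carries its own PROJS relation.
emp_proj = [
    {"SSN": "123456789", "ENAME": "Smith, John B",
     "PROJS": [{"PNUMBER": 1, "HOURS": 32.5}, {"PNUMBER": 2, "HOURS": 7.5}]},
    {"SSN": "666884444", "ENAME": "Narayan, Ramesh K.",
     "PROJS": [{"PNUMBER": 3, "HOURS": 40.0}]},
]

# EMP_PROJ1 keeps the non-nested attributes; its primary key is SSN.
emp_proj1 = [{"SSN": e["SSN"], "ENAME": e["ENAME"]} for e in emp_proj]

# EMP_PROJ2 receives the nested attributes plus the propagated key SSN;
# its primary key is the combination (SSN, PNUMBER).
emp_proj2 = [{"SSN": e["SSN"], **p} for e in emp_proj for p in e["PROJS"]]
```

For deeper nesting, the same step would be repeated on each relation that still contains a nested attribute.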
Functional Dependencies
A functional dependency (FD) is a constraint between two sets of attributes in a relation
from a database. Functional dependencies play a key role in differentiating good database design
from bad database designs.
Definition of Functional Dependency - A functional dependency, denoted by X → Y, between two sets of
attributes X and Y that are subsets of R specifies a constraint on the possible tuples that can form a relation
state r of R. The constraint is that, for any two tuples t1 and t2 in r that have t1[X] = t2[X], we must also have
t1[Y] = t2[Y].
This means that the values of the Y component of a tuple in r depend on, or are determined by, the values of the X
component; or alternatively, the values of the X component of a tuple uniquely (or functionally) determine the values
of the Y component. We also say that there is a functional dependency from X to Y or that Y is functionally dependent
on X. The abbreviation for functional dependency is FD or f.d. The set of attributes X is called the left-hand side of the
FD, and Y is called the right-hand side.
As an example, consider the following schema:
Lending-schema = (branch-name, branch-city, assets, customer-name, loan-number,
amount).
We know that a bank branch has a unique value of assets, so given a branch name we can
uniquely identify the assets value. On the other hand, we know that a branch may make many
loans, so given a branch name we cannot uniquely determine a loan number. In other words, we
say that the functional dependency branch-name -> assets holds on Lending schema but we do
not expect the functional dependency branch-name-> loan-number to hold. The fact that a
branch has a particular value of assets, and the fact that a branch makes a loan are independent.
Some important concepts regarding FDs
• Relation extensions r(R) that satisfy the functional dependency constraints are called
legal extensions (or legal relation states) of R, because they obey the functional
dependency constraints. A functional dependency is a property of the relation
schema (intension) R, not of a particular legal relation state (extension) r of R.
Hence, an FD cannot be inferred automatically from a given relation extension r but must
be defined explicitly by someone who knows the semantics of the attributes of R.
Figure 9 - TEACH
Teacher Course Text
Smith Data Structure Bartram
Smith Data Management Martin
Hall Compilers Hoffman
Brown Data structure Horowitz
For example, Figure 9 shows a particular state of the TEACH relation schema. Although at first
glance we may think that TEXT → COURSE, we cannot confirm this unless we know that it is
true for all possible legal states of TEACH.
• A functional dependency is a type of constraint that is a generalization of the notion of
key.
Definition of Superkey - Let R be a relation schema. A subset K of R is a superkey of R if, in any
legal relation r(R), for all pairs t1 and t2 of tuples in r such that t1 ≠ t2, we have t1[K] ≠ t2[K]. That is,
no two tuples in any legal relation r(R) may have the same value on attribute set K.
The functional dependency X → Y says that X functionally determines Y in a relation
schema R if and only if, whenever two tuples of r(R) agree on their X-value, they must
necessarily agree on their Y-value. Using the functional-dependency notation, we say
that X is a superkey of R if X -> R. That is, X is a superkey if, whenever t1[X] = t2[X], it is
also the case that t1[R] = t2[R] (that is, t1 = t2).
Similarly, if a constraint on R states that there cannot be more than one tuple
with a given X-value in any relation instance r(R) (that is, X is a superkey of R, and a
candidate key if it is also minimal), this implies that X → Y for any subset of attributes Y of R
(because the key constraint implies that no two tuples in any legal state r(R) will have the
same value of X).
• If X →Y in R, this does not say whether or not Y →X in R.
Uses of functional dependencies:
1. To test relations to see whether they are legal under a given set of functional
dependencies. If a relation r is legal under a set F of functional dependencies, we say that
r satisfies F.
2. To specify constraints on the set of legal relations.
Examples: -
A B C D
a1 b1 c1 d1
a1 b2 c1 d2
a2 b2 c2 d2
a2 b3 c2 d3
a3 b3 c2 d4
Sample relation r
Here, A -> C is satisfied. There are two tuples that have an A value of a1. These tuples have the
same C value—namely, c1. Similarly, the two tuples with an A value of a2 have the same C
value, c2. There are no other pairs of distinct tuples that have the same A value.
The functional dependency C -> A is not satisfied, however. To see that it is not, consider the
tuples t1 = (a2, b3, c2, d3) and t2 = (a3, b3, c2, d4). These two tuples have the same C values,
c2, but they have different A values, a2 and a3, respectively. Thus, we have found a pair of tuples
t1 and t2 such that t1[C]= t2 [C], but t1 [A] ≠ t2 [A].
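The check carried out by hand above can be sketched in Python. The function name satisfies and the list-of-dictionaries representation are assumptions for illustration; the data is the sample relation r. Note that the function only tests whether one relation instance satisfies a dependency; it cannot show that the dependency holds on the schema.

```python
from itertools import combinations


def satisfies(rows, lhs, rhs):
    """Return True if the relation instance `rows` satisfies lhs -> rhs.

    Every pair of tuples that agrees on all attributes in `lhs`
    must also agree on all attributes in `rhs`.
    """
    for t1, t2 in combinations(rows, 2):
        agree_lhs = all(t1[a] == t2[a] for a in lhs)
        agree_rhs = all(t1[b] == t2[b] for b in rhs)
        if agree_lhs and not agree_rhs:
            return False
    return True


# Sample relation r from the text.
r = [
    {"A": "a1", "B": "b1", "C": "c1", "D": "d1"},
    {"A": "a1", "B": "b2", "C": "c1", "D": "d2"},
    {"A": "a2", "B": "b2", "C": "c2", "D": "d2"},
    {"A": "a2", "B": "b3", "C": "c2", "D": "d3"},
    {"A": "a3", "B": "b3", "C": "c2", "D": "d4"},
]

a_determines_c = satisfies(r, ["A"], ["C"])  # A -> C
c_determines_a = satisfies(r, ["C"], ["A"])  # C -> A
```

On this instance the first check succeeds and the second fails, matching the discussion of the tuples t1 and t2.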
Trivial Functional Dependency

Some functional dependencies are said to be trivial because all relations satisfy
them. For example, A -> A is satisfied by all relations involving attribute A. Similarly, AB -> A is
satisfied by all relations involving attributes A and B. In general, a functional dependency of the
form α -> β is trivial if β ⊆ α, where α and β are sets of attributes of a relation R.
There is a difference between the concepts of a relation satisfying a dependency and
a dependency holding on a schema.
If we consider the customer relation in following figure, we see that
customer-street-> customer-city is satisfied.
Customer relation
customer-name customer-street customer-city
Jones Main Harrison
Smith North Rye
Hayes Main Harrison
Curry North Park Rye
Lindsay Putnam Pittsfield
However, in the real world, two cities can have streets with the same name. Thus, it is possible, at
some time, to have an instance of the customer relation in which customer-street -> customer-city
is not satisfied. So, we would not include customer-street -> customer-city in the set of
functional dependencies that hold on Customer-schema.
In the loan relation of the following figure, the dependency loan-number -> amount is satisfied. In
contrast to the case of customer-city and customer-street in Customer-schema, the real-world
enterprise requires each loan to have only one amount. Therefore, we want to require that
loan-number -> amount be satisfied by the loan relation at all times. In other words, the constraint
loan-number -> amount holds on Loan-schema.
Loan relation
loan-number branch-name amount
L-17 Downtown 1000
L-23 Redwood 2000
L-15 Perryridge 1500
L-14 Downtown 1500
L-93 Mianus 500
****Closure of a set of Functional Dependencies (Important)
It is not sufficient to consider the given set of functional dependencies. Rather, we
need to consider all functional dependencies that hold. For a given set F of functional
dependencies, it can be proved that certain other functional dependencies hold. We say that such
functional dependencies are "logically implied" by F.
Logically Implied (or inferred) Functional Dependency - For a given relational schema R, a
functional dependency f on R is logically implied by a set of functional dependencies F on R if
every relation instance r(R) that satisfies F also satisfies f.
Example: - There is a relation schema R = (A, B, C, G, H, I) and the set of functional
dependencies
A-> B
A ->C
CG-> H
CG -> I
B->H
The functional dependency
A->H
is logically implied. That is, whenever the given set of functional dependencies holds on a relation,
A -> H must also hold on the relation.
Proof:
Suppose that t1 and t2 are tuples such that
t1 [A] = t2 [A]
Since we are given that A-> B, it follows from the definition of functional dependency that
t1[B] =t2[B]
Then, since we are given that B -> H, it follows from the definition of functional dependency that
t1 [H] = t2[H]
Therefore, we have shown that, whenever t1, and t2 are tuples such that t1 [A] = t2 [A], it must be
that t1 [H] = t2 [H]. But that is exactly the definition of A -> H.
Definition of Closure - Let F be a set of functional dependencies. The closure of F, denoted
by F+, is the set of all functional dependencies logically implied by F; it includes F itself.
Axioms-
Given F, we can compute F+ directly from the formal definition of functional dependency. If F
were large, this process would be lengthy and difficult.
Axioms, or rules of inference, provide a simpler technique for reasoning about functional
dependencies.
Armstrong's Axioms-
We can use the following three rules to find logically implied functional dependencies. By
applying these rules repeatedly, we can find all of F+, given F. This collection of rules is called
Armstrong's axioms in honor of the person who first proposed it.
Reflexivity rule - If α is a set of attributes and β ⊆ α, then α -> β holds.
Augmentation rule - If α -> β holds and γ is a set of attributes, then γα -> γβ holds.
Transitivity rule - If α -> β holds and β -> γ holds, then α -> γ holds.
Armstrong's axioms are sound, because they do not generate any incorrect functional
dependencies. They are complete, because, for a given set F of functional dependencies, they
allow us to generate all F+.
Although Armstrong's axioms are complete, it is tiresome to use them directly for the
computation of F+. To simplify matters further, we list additional rules. It is possible to use
Armstrong's axioms to prove that these rules are correct.
Union rule: If α -> β holds and α -> γ holds, then α -> β γ holds.
Decomposition rule: If α -> β γ holds, then α -> β holds and α -> γ holds.
Pseudo-transitivity rule: If α -> β holds and γ β -> δ holds, then α γ -> δ holds.
Proof of Axioms (inference rules)

Each of the preceding inference rules can be proved from the definition of functional dependency,
either by direct proof or by contradiction. A proof by contradiction assumes that the rule does
not hold and shows that this is not possible.
Proof of Reflexivity rule
Suppose that X ⊇ Y and that two tuples t1 and t2 exist in some relation instance r of R such that
t1[X] = t2[X]. Then t1[Y] = t2[Y] because X ⊇ Y; hence, X→Y must hold in r.
Proof of Augmentation rule
Let us assume that X→Y holds in a relation instance r of R but that XZ→YZ does not hold. Then
there must exist two tuples t1 and t2 in r such that (1) t1[X] = t2[X], (2) t1[Y] = t2[Y], (3) t1[XZ] =
t2[XZ], and (4) t1[YZ] ≠ t2[YZ]. This is not possible because from (1) and (3) we deduce (5) t1[Z]
= t2[Z], and from (2) and (5) we deduce (6) t1[YZ] = t2[YZ], contradicting (4).
Proof of Transitivity rule

Let us assume that (1) X →Y and (2) Y→ Z both hold in a relation r. Then for any two tuples
t1and t2 in r such that t1 [X] = t2 [X], we must have (3) t1[Y] = t2[Y], from assumption (1); hence
we must also have (4) t1 [Z] = t2 [Z], from (3) and assumption (2); hence X →Z must hold in r.
Proof of Decomposition rule (Using Armstrong’s Axioms)
1. X → YZ (given).
2. YZ → Y (using the reflexivity rule and knowing that YZ ⊇ Y).
3. X → Y (using the transitivity rule on 1 and 2).
The same argument with YZ → Z gives X → Z.
Proof of Union rule (Using Armstrong’s Axioms)
1. X → Y (given).
2. X → Z (given).
3. X → XY (using the augmentation rule on 1 by augmenting with X; notice that XX = X).
4. XY → YZ (using the augmentation rule on 2 by augmenting with Y).
5. X → YZ (using the transitivity rule on 3 and 4).
Proof of Pseudo-transitivity rule (Using Armstrong’s Axioms)
1. X →Y (given).
2. WY →Z (given).
3. WX →WY (using augmentation rule on 1 by augmenting with W).
4. WX → Z (using transitivity rule on 3 and 2).
A procedure to compute F+
F+ = F
repeat
    for each functional dependency f in F+
        apply reflexivity and augmentation rules on f
        add the resulting functional dependencies to F+
    for each pair of functional dependencies f1 and f2 in F+
        if f1 and f2 can be combined using transitivity
            add the resulting functional dependency to F+
until F+ does not change any further
Let us apply our rules to the example of schema R (A, B, C, G, H, I) and the set F of functional
dependencies {A-> B, A-> C, CG-> H, CG-> I, B-> H}. We list several members of F+ here:
A -> H. Since A -> B and B -> H hold, we apply the transitivity rule.
CG-> HI. Since CG -> H and CG ->I, the union rule implies that CG->HI.
AG->I. Since A->C and CG -> I, the pseudo-transitivity rule implies that AG -> I holds.
Another way of finding that AG -> I holds is as follows:
We use the augmentation rule on A -> C to infer AG -> CG. Applying the transitivity rule
to this dependency and CG-> I, we infer AG -> I.
The left-hand and right-hand sides of a functional dependency are both subsets of R. Since
a set of size n has 2^n subsets, there are a total of 2^n × 2^n = 2^(2n) possible functional
dependencies, where n is the number of attributes in R.
Typically, database designers first specify the set of functional dependencies F that can easily be
determined from the semantics of the attributes of R; then reflexivity, augmentation, and
transitivity are used to infer additional functional dependencies that will also hold on R. A
systematic way to determine these additional functional dependencies is first to determine
each set of attributes α that appears as a left-hand side of some functional dependency in F
and then to determine the set of all attributes that are dependent on α. Thus, for each such
set of attributes α, we determine the set of attributes that are functionally determined by α
based on F; this set is called the closure of α under F.
*****Definition of Closure of Attribute Sets (Important)
Let α be a set of attributes. The set of all attributes functionally determined by α
under a set F of functional dependencies is called the closure of α under F; we denote it by α+.
An algorithm to compute α+ the closure of α under F
result := α;
while (changes to result) do
    for each functional dependency β -> γ in F do
    begin
        if β ⊆ result then result := result ∪ γ;
    end
Attribute closure algorithm
Here the input is a set F of functional dependencies and the set α of attributes. The output is
stored in the variable result.
To illustrate how the algorithm works, we shall use it to compute (AG)+ with the set F of
functional dependencies {A-> B, A-> C, CG-> H, CG-> I, B-> H}.
We start with result = AG. The first time that we execute the while loop to test each
functional dependency, we find that
A -> B causes us to include B in result. To see this fact, we observe that A -> B is in F, A
⊆ result (which is AG), and so result := result ∪ B.
A ->C causes result to become ABCG.
CG->H causes result to become ABCGH.
CG ->I causes result to become ABCGHI
The second time that we execute the while loop, no new attributes are added to result, and
the algorithm terminates.
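The attribute closure algorithm translates almost line for line into Python. The sketch below assumes that every attribute is a single character, so that a string such as "CG" stands for the attribute set {C, G}; the function name closure and the pair representation of dependencies are illustrative choices.

```python
def closure(attrs, fds):
    """Compute the closure of the attribute set `attrs` under `fds`.

    `fds` is a list of (lhs, rhs) pairs; each side is an iterable of
    single-character attribute names.
    """
    result = set(attrs)
    changed = True
    while changed:  # "while (changes to result) do"
        changed = False
        for lhs, rhs in fds:
            # if lhs is contained in result, add rhs to result
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


# F = {A -> B, A -> C, CG -> H, CG -> I, B -> H}
F = [("A", "B"), ("A", "C"), ("CG", "H"), ("CG", "I"), ("B", "H")]

ag_closure = closure("AG", F)
```

For the set F above, ag_closure contains all six attributes A, B, C, G, H and I, in agreement with the trace.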
Another example to illustrate Closure of Attribute Sets

Consider the following relations:
EMP_PROJ
SSN PNUMBER HOURS ENAME PNAME PLOCATIONS

Here
SSN → ENAME
PNUMBER→ PNAME, PLOCATIONS
SSN, PNUMBER → HOURS
Hence, {SSN}+ = {SSN, ENAME}

{PNUMBER}+ = {PNUMBER, PNAME, PLOCATIONS}
{SSN, PNUMBER}+ = {SSN, PNUMBER, HOURS, ENAME, PNAME, PLOCATIONS}
There are several uses of the attribute closure algorithm:

i) To test if α is a superkey, we compute α+, and check if α+ contains all attributes of R
ii) We can check if a functional dependency α -> β holds (or, in other words, is in F+),
by checking if β ⊆ α+. That is, we compute α + by using attribute closure, and then
check if it contains β.
iii) It gives us an alternative way to compute F+: For each γ ⊆ R, we find the closure γ+ ,
and for each S ⊆ γ+ we output a functional dependency γ->S.
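The three uses listed above can be sketched on top of an attribute closure function. As before, the code assumes single-character attribute names, and the helper names is_superkey, implies and fd_closure are hypothetical. Enumerating F+ this way is exponential in the number of attributes, so it is only practical for small schemas.

```python
from itertools import combinations


def closure(attrs, fds):
    """Attribute closure of `attrs` under the list of (lhs, rhs) pairs `fds`."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def is_superkey(attrs, schema, fds):
    """Use (i): attrs is a superkey if its closure contains all of R."""
    return closure(attrs, fds) >= set(schema)


def implies(fds, lhs, rhs):
    """Use (ii): lhs -> rhs is in F+ if rhs is contained in the closure of lhs."""
    return set(rhs) <= closure(lhs, fds)


def subsets(attrs):
    """All subsets of `attrs`, as frozensets."""
    items = sorted(attrs)
    return [frozenset(c)
            for n in range(len(items) + 1)
            for c in combinations(items, n)]


def fd_closure(schema, fds):
    """Use (iii): enumerate F+ as a set of (lhs, rhs) frozenset pairs."""
    return {(g, s) for g in subsets(schema) for s in subsets(closure(g, fds))}


R = "ABCGHI"
F = [("A", "B"), ("A", "C"), ("CG", "H"), ("CG", "I"), ("B", "H")]
F_plus = fd_closure(R, F)
```

With this F, AG is a superkey of R while A alone is not, and A -> H and AG -> I are both members of F_plus.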
Canonical Cover
Suppose that we have a set of functional dependencies F on a relation schema. Whenever

a user performs an update on the relation, the database system must ensure that the update does
not violate any functional dependencies, that is, all the functional dependencies in F are satisfied
in the new database state. The system must roll back the update if it violates any functional
dependencies in the set F.
We can reduce the effort spent in checking for violations by testing a simplified
set of functional dependencies that has the same closure as the given set. Any database that
satisfies the simplified set of functional dependencies will also satisfy the original set, and vice
versa, since the two sets have the same closure. However, the simplified set is easier to test.
Extraneous Attributes of a Functional Dependency - An attribute of a functional dependency is
said to be extraneous if we can remove it without changing the closure of the set of functional
dependencies. The formal definition of extraneous attributes is as follows. Consider a set F of
functional dependencies and the functional dependency α -> β in F.
• Attribute A is extraneous in α if A ∈ α, and F logically implies (F - {α -> β}) ∪ {(α - A) -> β}.
• Attribute A is extraneous in β if A ∈ β, and the set of functional dependencies
(F - {α -> β}) ∪ {α -> (β - A)} logically implies F.
Testing extraneous attributes

• If A ∈ β, to check if A is extraneous, consider the set F´ = (F - {α → β})
∪ {α → (β – A)} and check if α → A can be inferred from F´. To do so, compute
α+ (the closure of α) under F´; if α+ includes A, then A is extraneous in β.
• If A ∈ α, to check if A is extraneous, let γ = α - {A}, and check if γ → β can be
inferred from F. To do so, compute γ+ (the closure of γ) under F; if γ+ includes all
attributes in β, then A is extraneous in α.
Explanation 1:
If F contains
1) ABC -> XYZ (say α = ABC and β = XYZ)
2) BC -> XYZ
then A is extraneous in α,
because F logically implies (F - {ABC -> XYZ}) ∪ {BC -> XYZ}; indeed, BC -> XYZ is already in F.
Conversely, the reduced set implies F: by the augmentation rule, if BC -> XYZ, then ABC -> AXYZ,
and by the decomposition rule, if ABC -> AXYZ, then ABC -> XYZ.
Explanation 2:
If F contains
1) ABC -> XYZ (say α = ABC and β = XYZ)
2) ABC -> YZ
3) YZ -> X
then X is extraneous in β,
because (F - {ABC -> XYZ}) ∪ {ABC -> YZ} logically implies F:
by the transitivity rule, if ABC -> YZ and YZ -> X, then ABC -> X,
and by the union rule, if ABC -> YZ and ABC -> X, then ABC -> XYZ.
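The two tests for extraneous attributes can be checked mechanically on the sets used in the explanations. The sketch assumes single-character attribute names and string-valued sides; the function names extraneous_in_lhs and extraneous_in_rhs are hypothetical.

```python
def closure(attrs, fds):
    """Attribute closure of `attrs` under the list of (lhs, rhs) pairs `fds`."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def extraneous_in_lhs(fds, lhs, rhs, attr):
    """attr is extraneous in lhs if (lhs - attr)+ under F contains rhs."""
    gamma = lhs.replace(attr, "")
    return set(rhs) <= closure(gamma, fds)


def extraneous_in_rhs(fds, lhs, rhs, attr):
    """attr is extraneous in rhs if lhs+ under F' contains attr, where
    F' is F with lhs -> rhs replaced by lhs -> (rhs - attr)."""
    reduced = [fd for fd in fds if fd != (lhs, rhs)]
    reduced.append((lhs, rhs.replace(attr, "")))
    return attr in closure(lhs, reduced)


# Explanation 1: F = {ABC -> XYZ, BC -> XYZ}
F1 = [("ABC", "XYZ"), ("BC", "XYZ")]
a_is_extraneous = extraneous_in_lhs(F1, "ABC", "XYZ", "A")

# Explanation 2: F = {ABC -> XYZ, ABC -> YZ, YZ -> X}
F2 = [("ABC", "XYZ"), ("ABC", "YZ"), ("YZ", "X")]
x_is_extraneous = extraneous_in_rhs(F2, "ABC", "XYZ", "X")
```

Both checks succeed, while for example B is not extraneous in ABC -> XYZ under F1, because the closure of AC is just AC.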
Canonical Cover - A canonical cover Fc for F is a set of dependencies such that F logically
implies all dependencies in Fc, and Fc logically implies all dependencies in F. Furthermore, Fc
must have the following properties:
i) No functional dependency in Fc contains an extraneous attribute.
ii) Each left side of a functional dependency in Fc is unique. That is, there are no two
dependencies α1 -> β1 and α2 -> β2 in Fc such that α1 = α2.
The canonical cover Fc of F can be shown to have the same closure as F; hence, testing
whether Fc is satisfied is equivalent to testing whether F is satisfied.
Is the canonical cover unique?

No, canonical cover of a set of functional dependencies is not unique. A canonical cover of a set
of functional dependencies F is a minimal set of dependencies that is equivalent to F.
Unfortunately there can be several canonical covers for a set of functional dependencies.
Computing canonical cover
Fc = F
repeat
    Use the union rule to replace any dependencies in Fc of the form α1 -> β1 and α1 -> β2 with α1 -> β1β2
    Find a functional dependency α -> β in Fc with an extraneous attribute either in α or in β
    /* Note: the test for extraneous attributes is done using Fc, not F */
    If an extraneous attribute is found, delete it from α -> β
until Fc does not change.
Consider the following set F of functional dependencies on schema (A, B, C):

A-> BC
B->C
A->B
AB-> C
Let us compute the canonical cover for F.
• There are two functional dependencies with the same set of attributes on the left
side of the arrow:
A-> BC
A->B
We combine these functional dependencies into A -> BC
• A is extraneous in AB -> C because F logically implies (F - {AB ->C}) ∪ {B-> C}.

This assertion is true because B-> C is already in our set of functional dependencies.
• C is extraneous in A -> BC, since A-> BC is logically implied by A -> B and B-> C.
Thus, our canonical cover is

A-> B
B->C
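The procedure above can be sketched in Python. This is a minimal sketch, not a definitive implementation: the names closure, canonical_cover and fd are hypothetical helpers, attributes are assumed to be single letters, and the order in which extraneous attributes are tried (alphabetical) is an arbitrary choice, so for other inputs it may return a different but equally valid canonical cover.

```python
def closure(attrs, fds):
    """Return the closure of the attribute set `attrs` under `fds`.

    `fds` is an iterable of (lhs, rhs) pairs of frozensets.
    """
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def canonical_cover(fds):
    """Compute one canonical cover of `fds` (there may be several)."""
    fc = [(frozenset(l), frozenset(r)) for l, r in fds]
    changed = True
    while changed:
        changed = False
        # Union rule: merge dependencies that share a left side.
        merged = {}
        for lhs, rhs in fc:
            merged[lhs] = merged.get(lhs, frozenset()) | rhs
        fc = list(merged.items())
        # Remove at most one extraneous attribute per pass, always testing against Fc.
        for i, (lhs, rhs) in enumerate(fc):
            found = None
            for a in sorted(lhs):
                # a is extraneous in lhs if (lhs - a)+ under Fc already covers rhs.
                # Simplification (assumption): a left side is never reduced to the empty set.
                if len(lhs) > 1 and rhs <= closure(lhs - {a}, fc):
                    found = (lhs - {a}, rhs)
                    break
            if found is None:
                for a in sorted(rhs):
                    # a is extraneous in rhs if lhs+ still reaches a after a is dropped.
                    reduced = fc[:i] + [(lhs, rhs - {a})] + fc[i + 1:]
                    if a in closure(lhs, reduced):
                        found = (lhs, rhs - {a})
                        break
            if found is not None:
                fc[i] = found
                changed = True
                break
        fc = [(l, r) for l, r in fc if r]  # drop dependencies whose right side became empty
    return set(fc)


def fd(lhs, rhs):
    """Hypothetical helper: fd("AB", "C") stands for AB -> C."""
    return (frozenset(lhs), frozenset(rhs))


# The set F from the example above.
F1 = [fd("A", "BC"), fd("B", "C"), fd("A", "B"), fd("AB", "C")]
cover1 = canonical_cover(F1)
```

For F1 the sketch merges A -> BC and A -> B, drops C from A -> BC, and reduces AB -> C to B -> C, which matches the hand computation.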
Another example –
Consider the following set F of functional dependencies on schema (A, B, D):

B-> A
D->A
AB ->D
Let us compute the canonical cover for F.
Step 1: Fc = F, hence Fc = {B -> A, D -> A, AB -> D}.

Step 2: No two functional dependencies have the same set of attributes on the left side of the
arrow.
Step 3: A is extraneous in AB -> D because Fc logically implies (Fc - {AB -> D}) ∪ {B -> D}.
This assertion is true because B -> A is already in Fc. Augmenting both sides with B gives
BB -> AB, that is, B -> AB (i).
Since AB -> D (ii) is in Fc, applying the transitivity rule to (i) and (ii) gives B -> D (iii).
So AB -> D is replaced by B -> D in Fc.
No further left-side reduction is possible, since every FD now has a single attribute on its left-hand side.
Step 4: Fc has changed. Now Fc = {B -> A, D -> A, B -> D}.
Step 5: Two functional dependencies have the same left side: B -> D and B -> A. We combine
them into B -> AD (iv).
Step 6: A is extraneous in B -> AD, since B -> AD is logically implied by B -> D (v) and D -> A
(vi).
No further left-side reduction is possible, since every FD has a single attribute on its left-hand side.
Step 7: Fc has changed. Now Fc = {D -> A, B -> D}.
Step 8: No two functional dependencies have the same set of attributes on the left side of the
arrow.
Step 9: No further reduction is possible: each FD has a single attribute on each side, and neither FD is implied by the other.
Step 10: Fc does not change, so the canonical cover is Fc = {D -> A, B -> D}.
2NF - Second normal form (2NF) is based on the concept of full functional dependency.
A functional dependency X -> Y is a full functional dependency if removal of any attribute
A from X means that the dependency does not hold any more; that is, for any attribute A ∈
X, (X – {A}) does not functionally determine Y.
A functional dependency X -> Y is a partial dependency if some attribute A ∈ X can be
removed from X and the dependency still holds; that is, for some A ∈ X, (X – {A}) -> Y.
2NF-A relation schema R is in 2NF with respect to a set F of functional dependencies if

it satisfies 1NF and every nonprime attribute A in R is fully functionally dependent on
the primary key of R.
Or General definition is
A relation schema R is in second normal form (2NF) with respect to a set F of functional
dependencies if it satisfies 1NF & every nonprime attribute A in R is not partially
dependent on any candidate key of R.
The test for 2NF involves testing for functional dependencies whose left-hand side attributes are
part of the primary key. If the primary key contains a single attribute, the test need not be applied at
all.
Example1
Consider the following relations in figure 10:
Figure 10 - EMP_PROJ
EMP_PROJ (SSN, PNUMBER, HOURS, ENAME, PNAME, PLOCATION)
FD1: {SSN, PNUMBER} -> HOURS
FD2: SSN -> ENAME
FD3: PNUMBER -> {PNAME, PLOCATION}
The EMP_PROJ relation in the figure above is in 1NF but is not in 2NF. The nonprime attribute
ENAME violates 2NF because of FD2, as do the nonprime attributes PNAME and
PLOCATION because of FD3. The functional dependencies FD2 and FD3 make ENAME,
PNAME, and PLOCATION partially dependent on the primary key {SSN, PNUMBER} of
EMP_PROJ, thus violating the 2NF test.
To bring a relation schema from a lower normal form to a higher one, we
decompose it into several schemas with fewer attributes. Here we
decompose EMP_PROJ into the three relation schemas shown in figure 11.
Figure 11
EP1 (SSN, PNUMBER, HOURS)
FD1: {SSN, PNUMBER} -> HOURS
EP2 (SSN, ENAME)
FD2: SSN -> ENAME
EP3 (PNUMBER, PNAME, PLOCATION)
FD3: PNUMBER -> {PNAME, PLOCATION}
Partial and full functional dependencies are now considered with respect to all candidate
keys of a relation.
Example 2
Consider the relation schema LOTS shown in the following figure, which describes parcels of land
for sale in various counties of a state. Suppose that there are two candidate keys:
PROPERTY_ID# and {COUNTY_NAME, LOT#}; that is, lot numbers are unique only
within each county, but PROPERTY_ID# numbers are unique across counties for the entire
state.
Based on the two candidate keys PROPERTY_ID# and {COUNTY_NAME, LOT#}, we know
that the functional dependencies FD1 and FD2 of the figure hold.
We choose PROPERTY_ID# as the primary key, so it is underlined in the figure, but no special
consideration is given to this key over the other candidate key. Suppose that the following two
additional functional dependencies hold in LOTS:
FD3: COUNTY_NAME -> TAX_RATE
FD4: AREA -> PRICE
In words, the dependency FD3 says that the tax rate is fixed for a given county (does not vary lot
by lot within the same county), while FD4 says that the price of a lot is determined by its area
regardless of which county it is in.
LOTS (PROPERTY_ID#, COUNTY_NAME, LOT#, AREA, PRICE, TAX_RATE)
FD1: PROPERTY_ID# -> {COUNTY_NAME, LOT#, AREA, PRICE, TAX_RATE}
FD2: {COUNTY_NAME, LOT#} -> {PROPERTY_ID#, AREA, PRICE, TAX_RATE}
FD3: COUNTY_NAME -> TAX_RATE
FD4: AREA -> PRICE
The LOTS relation schema violates the general definition of 2NF because TAX_RATE is
partially dependent on the candidate key {COUNTY_NAME, LOT#}, due to FD3. To
normalize LOTS into 2NF, we decompose it into the two relations LOTS1 and LOTS2, shown in
the following figure. We construct LOTS1 by removing the attribute TAX_RATE that violates 2NF
from LOTS and placing it with COUNTY_NAME (the left-hand side of FD3 that causes the
partial dependency) into another relation LOTS2. Both LOTS1 and LOTS2 are in 2NF. Notice
that FD4 does not violate 2NF and is carried over to LOTS1.
LOTS1 (PROPERTY_ID#, COUNTY_NAME, LOT#, AREA, PRICE)
FD1: PROPERTY_ID# -> {COUNTY_NAME, LOT#, AREA, PRICE}
FD2: {COUNTY_NAME, LOT#} -> {PROPERTY_ID#, AREA, PRICE}
FD4: AREA -> PRICE
LOTS2 (COUNTY_NAME, TAX_RATE)
FD3: COUNTY_NAME -> TAX_RATE
**The test for 2NF involves testing for functional dependencies whose left-hand side attributes
are part of a candidate key. If every candidate key contains a single attribute, the test need not be applied at all.
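The 2NF test can be sketched in Python as a brute-force search for partial dependencies. This is only an illustrative sketch: closure, candidate_keys and partial_dependencies are hypothetical helper names, the attribute names drop the "#" suffix, and the enumeration of candidate keys is exponential, so it is suitable only for small textbook schemas.

```python
from itertools import combinations


def closure(attrs, fds):
    """Closure of `attrs` under `fds`, a list of (lhs, rhs) pairs of sets."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def candidate_keys(attributes, fds):
    """All minimal attribute sets whose closure is the whole schema."""
    attributes = frozenset(attributes)
    keys = []
    for size in range(1, len(attributes) + 1):
        for combo in combinations(sorted(attributes), size):
            s = frozenset(combo)
            if any(k <= s for k in keys):
                continue  # contains a smaller key, so it is not minimal
            if closure(s, fds) >= attributes:
                keys.append(s)
    return keys


def partial_dependencies(attributes, fds):
    """Pairs (part_of_key, nonprime_attribute) that violate the general 2NF definition."""
    keys = candidate_keys(attributes, fds)
    prime = set().union(*keys)
    violations = set()
    for key in keys:
        for size in range(1, len(key)):  # proper, nonempty subsets of the key
            for combo in combinations(sorted(key), size):
                part = frozenset(combo)
                for a in closure(part, fds) - prime:
                    violations.add((part, a))
    return violations


# LOTS from Example 2, with FD1 to FD4.
LOTS = {"PROPERTY_ID", "COUNTY_NAME", "LOT", "AREA", "PRICE", "TAX_RATE"}
LOTS_FDS = [
    ({"PROPERTY_ID"}, {"COUNTY_NAME", "LOT", "AREA", "PRICE", "TAX_RATE"}),
    ({"COUNTY_NAME", "LOT"}, {"PROPERTY_ID", "AREA", "PRICE", "TAX_RATE"}),
    ({"COUNTY_NAME"}, {"TAX_RATE"}),
    ({"AREA"}, {"PRICE"}),
]
lots_violations = partial_dependencies(LOTS, LOTS_FDS)
```

For LOTS the only violation reported is TAX_RATE depending on COUNTY_NAME alone, which is the partial dependency caused by FD3.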
Third normal form (3NF) is based on the concept of transitive dependency. A functional
dependency X-> Y in a relation schema R is a transitive dependency if there is a set of
attributes Z that is neither a candidate key nor a subset of any key of R, and both X -> Z
and Z ->Y hold.
3NF- According to Codd's original definition, a relation schema R is in 3NF with respect to
a set F of functional dependencies if it satisfies 2NF and no nonprime attribute of R is
transitively dependent on the primary key.
Or
(General Definition)
Informally, a 3NF relation should not have a nonkey attribute functionally determined by another
nonkey attribute (or by a set of nonkey attributes).
Formally, a relation schema R is in third normal form (3NF) with respect to a set F of functional
dependencies if, whenever a nontrivial functional dependency X -> A holds in R, either (a)
X is a superkey of R, or (b) A is a prime attribute of R.
Example1:
Consider the relation schema EMP_DEPT in figure 12.
Figure 12
EMP_DEPT (ENAME, SSN, DOB, ADDRESS, DNUMBER, DNAME, DMGRNO)
FD1: SSN -> {ENAME, DOB, ADDRESS, DNUMBER}
FD2: DNUMBER -> {DNAME, DMGRNO}
The above relation schema is in 2NF, since no partial dependencies on a key exist.
However, EMP_DEPT is not in 3NF because of the transitive dependency of DMGRNO (and
also DNAME) on SSN via DNUMBER. We can normalize EMP_DEPT by decomposing it into
the two 3NF relation schemas ED1 and ED2.
ED1 (ENAME, SSN, DOB, ADDRESS, DNUMBER)
FD1: SSN -> {ENAME, DOB, ADDRESS, DNUMBER}
ED2 (DNUMBER, DNAME, DMGRNO)
FD2: DNUMBER -> {DNAME, DMGRNO}
Intuitively, we see that ED1 and ED2 represent independent entity facts about employees
and departments.
Intuitively, we can see that any functional dependency in which the left-hand side is part (proper
subset) of the candidate key, or any functional dependency in which the left-hand side is a nonkey
attribute, is a problematic FD. 2NF and 3NF normalization remove these problem FDs by
decomposing the original relation into new relations. In terms of the normalization process, it is
not necessary to remove the partial dependencies before the transitive dependencies, but
historically, 3NF has been defined with the assumption that a relation is tested for 2NF first
before it is tested for 3NF.
Example –2
Consider the following two relation schemas:
LOTS1 (PROPERTY_ID#, COUNTY_NAME, LOT#, AREA, PRICE)
FD1: PROPERTY_ID# -> {COUNTY_NAME, LOT#, AREA, PRICE}
FD2: {COUNTY_NAME, LOT#} -> {PROPERTY_ID#, AREA, PRICE}
FD4: AREA -> PRICE
LOTS2 (COUNTY_NAME, TAX_RATE)
FD3: COUNTY_NAME -> TAX_RATE
Here LOTS2 is in 3NF. However, FD4 in LOTS1 violates 3NF because AREA is not a
superkey and PRICE is not a prime attribute in LOTS1. To normalize LOTS1 into 3NF we
decompose it into the relation schemas LOTS1A and LOTS1B shown in the following figure. We
construct LOTS1A by removing the attribute PRICE that violates 3NF from LOTS1 and
placing it with AREA (the left-hand side of FD4 that causes the transitive dependency) into
another relation LOTS1B. Both LOTS1A and LOTS1B are in 3NF.
LOTS1A (PROPERTY_ID#, COUNTY_NAME, LOT#, AREA)
FD1: PROPERTY_ID# -> {COUNTY_NAME, LOT#, AREA}
FD2: {COUNTY_NAME, LOT#} -> {PROPERTY_ID#, AREA}
LOTS1B (AREA, PRICE)
FD4: AREA -> PRICE
Interpreting the General Definition of Third Normal Form
A relation schema R violates the general definition of 3NF if a functional dependency X->A
holds in R that violates both conditions (a) and (b). Violating (b) means that A is a nonprime
attribute. Violating (a) means that X is not a superset of any key of R; hence, X could be
nonprime or it could be a proper subset of a key of R. If X is nonprime, we typically have a
transitive dependency that violates 3NF, whereas if X is a proper subset of a key of R, we have a
partial dependency that violates 3NF (and also 2NF).
Boyce-Codd normal form (BCNF) was proposed as a simpler form of 3NF, but it was found to be
stricter than 3NF. That is, every relation in BCNF is also in 3NF; however, a relation in 3NF
is not necessarily in BCNF.
BOYCE-CODD NORMAL FORM (BCNF)-A relation schema R is in BCNF with respect to

a set F of functional dependencies if whenever a nontrivial functional dependency X -> A
holds in R, then X is a superkey of R.
The formal definition of BCNF differs slightly from the definition of 3NF. The only difference
between definitions of BCNF and 3NF is that condition (b) of 3NF, which allows A to be prime,
is absent from BCNF.
Consider the following relation schema, in which an additional dependency FD5 holds:
LOTS1A (PROPERTY_ID#, COUNTY_NAME, LOT#, AREA)
FD1: PROPERTY_ID# -> {COUNTY_NAME, LOT#, AREA}
FD2: {COUNTY_NAME, LOT#} -> {PROPERTY_ID#, AREA}
FD5: AREA -> COUNTY_NAME
Here FD5 violates BCNF in LOTS1A because AREA is not a superkey of LOTS1A.
Note that FD5 satisfies 3NF in LOTS1A because COUNTY_NAME is a prime attribute
(condition b), but this condition does not exist in the definition of BCNF. We can decompose
LOTS1A into two BCNF relations LOTS1AX and LOTS1AY, shown in the following figure.
LOTS1AY holds the attributes of FD5, and LOTS1AX keeps the remaining attributes together with
AREA. This decomposition loses the functional dependency FD2 because its attributes no longer
coexist in the same relation after decomposition.
LOTS1AX (PROPERTY_ID#, AREA, LOT#)
LOTS1AY (AREA, COUNTY_NAME)
In practice, most relation schemas that are in 3NF are also in BCNF. Only if X ->A holds
in a relation schema R with X not being a superkey and A being a prime attribute will R be
in 3NF but not in BCNF.
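The two general definitions can be checked mechanically on small schemas. The sketch below is illustrative only: the helper names (closure, candidate_keys, is_bcnf, is_3nf) are hypothetical, attribute names drop the "#" suffix, and it relies on the standard fact that for the undecomposed schema R it is enough to examine the dependencies listed in F rather than all of F+.

```python
from itertools import combinations


def closure(attrs, fds):
    """Closure of `attrs` under `fds`, a list of (lhs, rhs) pairs of sets."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def candidate_keys(attributes, fds):
    """All minimal attribute sets whose closure is the whole schema (brute force)."""
    attributes = frozenset(attributes)
    keys = []
    for size in range(1, len(attributes) + 1):
        for combo in combinations(sorted(attributes), size):
            s = frozenset(combo)
            if not any(k <= s for k in keys) and closure(s, fds) >= attributes:
                keys.append(s)
    return keys


def is_superkey(attrs, attributes, fds):
    return closure(attrs, fds) >= set(attributes)


def is_bcnf(attributes, fds):
    """Every nontrivial X -> A in F must have a superkey X."""
    return all(rhs <= lhs or is_superkey(lhs, attributes, fds) for lhs, rhs in fds)


def is_3nf(attributes, fds):
    """Every nontrivial X -> A in F must have a superkey X or a prime attribute A."""
    prime = set().union(*candidate_keys(attributes, fds))
    return all(
        is_superkey(lhs, attributes, fds) or (rhs - lhs) <= prime
        for lhs, rhs in fds
    )


# LOTS1A with FD1, FD2 and FD5 (AREA -> COUNTY_NAME).
LOTS1A = {"PROPERTY_ID", "COUNTY_NAME", "LOT", "AREA"}
LOTS1A_FDS = [
    ({"PROPERTY_ID"}, {"COUNTY_NAME", "LOT", "AREA"}),
    ({"COUNTY_NAME", "LOT"}, {"PROPERTY_ID", "AREA"}),
    ({"AREA"}, {"COUNTY_NAME"}),
]
lots1a_result = (is_3nf(LOTS1A, LOTS1A_FDS), is_bcnf(LOTS1A, LOTS1A_FDS))
```

With these inputs the pair lots1a_result records that LOTS1A passes the 3NF test but fails the BCNF test, in line with the discussion of FD5.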
Decomposition
To bring a relation schema from a lower normal form to a higher one, we
decompose it into several schemas with fewer attributes. Careless
decomposition, however, may lead to another form of bad design. We will discuss
some required properties of relational decomposition.
1) We must make sure that each attribute in schema R, which is decomposed into
D = {R1, R2, ..., Rm}, appears in at least one relation schema Ri in the
decomposition so that no attributes are "lost"; formally, R1 ∪ R2 ∪ ... ∪ Rm = R.
This is called the attribute preservation condition of decomposition.
2) We must make sure that the decomposition has the lossless (nonadditive) join
property.
Definition of lossless (nonadditive) join decomposition - Let F be a set of
functional dependencies on relation schema R. A decomposition D = {R1,
R2, ..., Rm} of R is a lossless-join decomposition if, for every relation state r on schema R
that is legal under F, the following holds:
r = ΠR1(r) ⋈ ΠR2(r) ⋈ ... ⋈ ΠRm(r)
3) We must make sure that the decomposition is dependency preserving.
We say that a decomposition of relation schema R into D = {R1, R2, ..., Rm} is
dependency preserving with respect to F if the union of the projections of F on each Rj in
D is equivalent to F, that is, ((ΠR1(F)) ∪ (ΠR2(F)) ∪ ... ∪ (ΠRm(F)))+ = F+
Given a set of dependencies F on R, the projection of F on Rj, denoted by ΠRj(F), where Rj is
a subset of R, is the set of dependencies X → Y in F+ such that the attributes in X ∪ Y are all
contained in Rj. Hence the projection of F on each relation schema Rj in the
decomposition D = {R1, R2, ..., Rm} is the set of functional dependencies in F+, the
closure of F, such that their left- and right-hand side attributes are in Rj.
We will explain lossless-join-decomposition using following example.
Lending schema
Branch-Name Branch-City Assets Customer-Name Loan-Number Amount
Downtown Brooklyn 9000000 Jones L-17 1000
Redwood Palo Alto 2100000 Smith L-23 2000
Perryridge Horseneck 1700000 Hayes L-15 1500
Downtown Brooklyn 9000000 Jackson L-14 1500
Mianus Horseneck 400000 Jones L-93 500
Round Hill Horseneck 8000000 Turner L-11 900
Pownal Bennington 300000 Williams L-29 1200
North Town Rye 3700000 Hayes L-16 1300
Downtown Brooklyn 9000000 Johnson L-18 2000
Perryridge Horseneck 1700000 Glenn L-25 2500
Brighton Brooklyn 7100000 Brooks L-10 2200
Because of the transitive dependency marked FD2 in the figure (Loan-Number -> Branch-Name
and Branch-Name -> Branch-City, Assets, hence Loan-Number -> Branch-City, Assets),
Lending is not in 3NF.
We decompose Lending-schema into the following two schemas:

i. Branch-customer=Π branch-name, branch-city, assets, customer-name (Lending)
ii. Customer-loan = Π customer-name, loan-number, amount (Lending)
The resulting schemas are:

Branch-customer-schema = (branch-name, branch-city, assets, customer-name)
Customer-loan-schema = (customer-name, loan-number, amount)
The resulting branch-customer and customer-loan relations are shown below:
Branch-customer
Branch-Name Branch-City Assets Customer-Name
Downtown Brooklyn 9000000 Jones
Redwood Palo Alto 2100000 Smith
Perryridge Horseneck 1700000 Hayes
Downtown Brooklyn 9000000 Jackson
Mianus Horseneck 400000 Jones
Round Hill Horseneck 8000000 Turner
Pownal Bennington 300000 Williams
North Town Rye 3700000 Hayes
Downtown Brooklyn 9000000 Johnson
Perryridge Horseneck 1700000 Glenn
Brighton Brooklyn 7100000 Brooks
Customer-loan
Customer-Name Loan-Number Amount
Jones L-17 1000
Smith L-23 2000
Hayes L-15 1500
Jackson L-14 1500
Jones L-93 500
Turner L-11 900
Williams L-29 1200
Hayes L-16 1300
Johnson L-18 2000
Glenn L-25 2500
Brooks L-10 2200
Of course, there are cases in which we need to reconstruct the lending relation. For example,
suppose that we wish to find all branches that have loans with amounts less than $1000. No
relation in our alternative database contains these data. We need to reconstruct the lending
relation by computing the natural join
Branch-customer ⋈ Customer-loan

When we compare Branch-customer ⋈ Customer-loan and the lending relation with which we
started, we notice a difference: although every tuple that appears in the lending
relation appears in Branch-customer ⋈ Customer-loan, there are tuples in
Branch-customer ⋈ Customer-loan that are not in lending. In our example,
Branch-customer ⋈ Customer-loan has the following additional tuples:
(Downtown, Brooklyn, 9000000, Jones, L-93, 500)

(Perryridge, Horseneck, 1700000, Hayes, L-16, 1300)
(Mianus, Horseneck, 400000, Jones, L-17, 1000)
(North Town, Rye, 3700000, Hayes, L-15, 1500)
Now we consider the query, "Find all bank branches that have made a loan in an amount less than
$1000." If we look back at the Lending relation, we see that the only branches with loan amounts less
than $1000 are Mianus and Round Hill. However, when we apply the expression
Branch-customer ⋈ Customer-loan, we obtain three branch names: Mianus, Round Hill, and Downtown.
A closer examination of this example shows why.

If a customer happens to have several loans from different branches, we cannot tell
which loan belongs to which branch from branch-customer and customer-loan. Thus, when we
join branch-customer and customer-loan, we obtain not only the tuples we had originally in
lending, but also several additional tuples.
Although we have more tuples in Branch-customer ⋈ Customer-loan, we actually have
less information. We are no longer able, in general, to represent in the database information about
which loans are taken from which branch. Because of this loss of information, we call the
decomposition of Lending-schema into Branch-customer-schema and Customer-loan-schema
a lossy decomposition, or a lossy-join decomposition.
A decomposition that is not a lossy-join decomposition is a lossless-join decomposition. It

should be clear from our example that a lossy-join decomposition is, in general, a bad
database design.
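The spurious tuples can be reproduced with a few lines of Python. This is a small sketch that models each relation as a set of tuples and computes the natural join on customer-name by hand; the variable names are illustrative, and the data is copied from the Lending table above.

```python
# Columns: branch-name, branch-city, assets, customer-name, loan-number, amount
lending = [
    ("Downtown", "Brooklyn", 9000000, "Jones", "L-17", 1000),
    ("Redwood", "Palo Alto", 2100000, "Smith", "L-23", 2000),
    ("Perryridge", "Horseneck", 1700000, "Hayes", "L-15", 1500),
    ("Downtown", "Brooklyn", 9000000, "Jackson", "L-14", 1500),
    ("Mianus", "Horseneck", 400000, "Jones", "L-93", 500),
    ("Round Hill", "Horseneck", 8000000, "Turner", "L-11", 900),
    ("Pownal", "Bennington", 300000, "Williams", "L-29", 1200),
    ("North Town", "Rye", 3700000, "Hayes", "L-16", 1300),
    ("Downtown", "Brooklyn", 9000000, "Johnson", "L-18", 2000),
    ("Perryridge", "Horseneck", 1700000, "Glenn", "L-25", 2500),
    ("Brighton", "Brooklyn", 7100000, "Brooks", "L-10", 2200),
]

# Projections onto the two schemas of the decomposition.
branch_customer = {row[:4] for row in lending}   # ends with customer-name
customer_loan = {row[3:] for row in lending}     # starts with customer-name

# Natural join on the shared attribute customer-name.
joined = {
    bc + cl[1:]
    for bc in branch_customer
    for cl in customer_loan
    if bc[3] == cl[0]
}

# Tuples produced by the join that were never in the original relation.
spurious = joined - set(lending)

# Branches with a loan under 1000, before and after the decomposition.
branches_original = {row[0] for row in lending if row[5] < 1000}
branches_joined = {row[0] for row in joined if row[5] < 1000}
```

Every original tuple survives the join, but the customers with two loans (Jones and Hayes) each contribute two extra combinations, which is why Downtown wrongly appears among the branches with a loan under 1000.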
***The word loss in lossy refers to loss of information, not to loss of tuples. If a decomposition
does not have the lossless join property, we may get additional spurious tuples after the
NATURAL JOIN (⋈) operations are applied; these additional tuples represent erroneous
information. We prefer the term nonadditive join because it describes the situation more
accurately; if the property holds on a decomposition, we are guaranteed that no spurious tuples
bearing wrong information are added to the result after the NATURAL JOIN operations are
applied.
Why is the decomposition lossy?
There is one attribute in common between Branch-customer-schema and Customer-loan-schema:

Branch-customer-schema ∩ Customer-loan-schema = {customer-name}
The only way that we can represent a relationship between, for example, loan-number and
branch-name is through customer-name. This representation is not adequate because a
customer may have several loans (customer-name is not a superkey of Customer-loan), yet
these loans are not necessarily obtained from the same branch (customer-name is not a
superkey of Branch-customer).
Let us consider another alternative design, in which we decompose Lending-schema into the
following two schemas:
Branch-schema = (branch-name, branch-city, assets)
Loan-info-schema = (branch-name, customer-name, loan-number, amount)
The resulting Branch and Loan-info relations are shown below:
Branch
Branch-Name Branch-City Assets
Downtown Brooklyn 9000000
Redwood Palo Alto 2100000
Perryridge Horseneck 1700000
Mianus Horseneck 400000
Round Hill Horseneck 8000000
Pownal Bennington 300000
North Town Rye 3700000
Brighton Brooklyn 7100000
Loan-info
Branch-Name Customer-Name Loan-Number Amount
Downtown Jones L-17 1000
Redwood Smith L-23 2000
Perryridge Hayes L-15 1500
Downtown Jackson L-14 1500
Mianus Jones L-93 500
Round Hill Turner L-11 900
Pownal Williams L-29 1200
North Town Hayes L-16 1300
Downtown Johnson L-18 2000
Perryridge Glenn L-25 2500
Brighton Brooks L-10 2200
We reconstruct the lending relation by computing the natural join
Branch ⋈ Loan-info
When we compare Branch ⋈ Loan-info and the lending relation with which
we started, we notice no difference.
There is one attribute in common between these two schemas:

Branch-schema ∩ Loan-info-schema = {branch-name}
Thus, the only way that we can represent a relationship between, for example, customer-name
and assets is through branch-name.
The difference between this example and the preceding one is that for a given branch-name, there
is exactly one assets value and exactly one branch-city (branch-name is superkey of Branch);
whereas a similar statement cannot be made for customer-name. That is, the functional
dependency branch-name -> assets, branch-city holds.
Testing for Lossless-join decomposition

If a relation is decomposed into two relations
Let R be a relation schema, and let F be a set of functional dependencies on R. Let R1 and R2
form a decomposition of R. This decomposition is a lossless-join decomposition of R if at least
one of the following functional dependencies is in F+:
R1 ∩ R2 -> R1
R1 ∩ R2 -> R2
In other words, if R1 ∩ R2 forms a superkey of either R1 or R2, the decomposition of R is a
lossless-join decomposition.
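The two-relation test reduces to one attribute-closure computation, as the following Python sketch shows. The function names are hypothetical. The dependency loan-number -> amount, branch-name is an assumption about the Lending schema (it is consistent with the sample data and with the transitive dependency described earlier, but the figure defining FD1 and FD2 is not reproduced here).

```python
def closure(attrs, fds):
    """Closure of `attrs` under `fds`, a list of (lhs, rhs) pairs of sets."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def is_lossless_binary(r1, r2, fds):
    """True if R1 ∩ R2 is a superkey of R1 or of R2 under `fds`."""
    common_closure = closure(set(r1) & set(r2), fds)
    return set(r1) <= common_closure or set(r2) <= common_closure


# Assumed dependencies on Lending-schema (see the note above).
LENDING_FDS = [
    ({"branch-name"}, {"branch-city", "assets"}),
    ({"loan-number"}, {"amount", "branch-name"}),
]

branch_customer = {"branch-name", "branch-city", "assets", "customer-name"}
customer_loan = {"customer-name", "loan-number", "amount"}
branch = {"branch-name", "branch-city", "assets"}
loan_info = {"branch-name", "customer-name", "loan-number", "amount"}

first_design = is_lossless_binary(branch_customer, customer_loan, LENDING_FDS)
second_design = is_lossless_binary(branch, loan_info, LENDING_FDS)
```

In the first design the common attribute customer-name determines nothing else, so the test fails; in the second design branch-name determines all of Branch-schema, so the test succeeds.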
If a relation is decomposed into more than two relations

There is an algorithm for testing the lossless join property of a decomposition in which a relation is
decomposed into two or more relations.
Input of the algorithm: a universal relation R, a decomposition of R into D = {R1, R2,
R3, ..., Rm}, and a set F of functional dependencies.
Algorithm is as follows:
1. Create an initial matrix S with one row i for each relation Ri in D, and one column j for each
attribute Aj in R.
2. Set S(i, j) := bij for all matrix entries.
3. For each row i representing relation schema Ri
{for each column j representing attribute Aj
{if (relation Ri includes attribute Aj) then set S(i, j) := aj;};};
4. Repeat the following loop until a complete loop execution results in no changes to S
{for each functional dependency X →Y in F
{for all rows in S which have the same symbols in the columns corresponding to
attributes in X
{make the symbols in each column that correspond to an attribute in Y be the
same in all these rows as follows: if any of the rows has an "a" symbol for the column, set the
other rows to that same "a" symbol in the column. If no "a" symbol exists for the attribute in any
of the rows, choose one of the "b" symbols that appear in one of the rows for the attribute and set
the other rows to that same "b" symbol in the column ;}
If a row is made up entirely of "a" symbols, goto step 5;};};
5. If a row is made up entirely of "a" symbols, then the decomposition has the lossless join
property; otherwise it does not.
Example 1-
Let EMP_PROJ = {SSN, ENAME, PNUMBER, PNAME, PLOCATIONS, HOURS} be decomposed into
EMP_LOCS = {ENAME, PLOCATIONS}
EMP_PROJ1 = {SSN, PNUMBER, HOURS, PNAME, PLOCATIONS}
Set of functional dependencies F = {SSN → ENAME,
PNUMBER → (PNAME, PLOCATIONS), (SSN, PNUMBER) → HOURS}
Show whether the decomposition is lossless (has the nonadditive join property) or not.
Answer:
The initial matrix S looks as follows:
            SSN   ENAME   PNUMBER   PNAME   PLOCATIONS   HOURS
EMP_LOCS    b11   a2      b13       b14     a5           b16
EMP_PROJ1   a1    b22     a3        a4      a5           a6
Then we apply the functional dependencies SSN → ENAME,
PNUMBER → (PNAME, PLOCATIONS), (SSN, PNUMBER) → HOURS
one by one according to step 4.
The two rows do not agree on the left-hand side of any of these dependencies (they share an
"a" symbol only in the PLOCATIONS column), so the loop in step 4 of the algorithm cannot change
any "b" symbols to "a" symbols. Hence the resulting matrix S does not have a row with all "a"
symbols, and so the decomposition does not have the lossless join property.
Example 2-
Let EMP_PROJ = {SSN, ENAME, PNUMBER, PNAME, PLOCATIONS, HOURS} be decomposed into
EMP = {SSN, ENAME}
PROJ = {PNAME, PNUMBER, PLOCATIONS}
WORKS_ON = {SSN, PNUMBER, HOURS}
Set of functional dependencies F = {SSN → ENAME,
PNUMBER → (PNAME, PLOCATIONS), (SSN, PNUMBER) → HOURS}
Show whether the decomposition is lossless (has the nonadditive join property) or not.
Answer:
The initial matrix S looks as follows:
           SSN   ENAME   PNUMBER   PNAME   PLOCATIONS   HOURS
EMP        a1    a2      b13       b14     b15          b16
PROJ       b21   b22     a3        a4      a5           b26
WORKS_ON   a1    b32     a3        b34     b35          a6
After applying SSN → ENAME (EMP and WORKS_ON agree on SSN, so b32 becomes a2), S will be
           SSN   ENAME   PNUMBER   PNAME   PLOCATIONS   HOURS
EMP        a1    a2      b13       b14     b15          b16
PROJ       b21   b22     a3        a4      a5           b26
WORKS_ON   a1    a2      a3        b34     b35          a6
After applying PNUMBER → (PNAME, PLOCATIONS) (PROJ and WORKS_ON agree on PNUMBER, so b34
becomes a4 and b35 becomes a5), S will be
           SSN   ENAME   PNUMBER   PNAME   PLOCATIONS   HOURS
EMP        a1    a2      b13       b14     b15          b16
PROJ       b21   b22     a3        a4      a5           b26
WORKS_ON   a1    a2      a3        a4      a5           a6
Now the WORKS_ON row of matrix S is made up entirely of "a" symbols, hence the decomposition has
the lossless join property.
Example 3
Let R = ABCDE, R1 = AD, R2 = AB, R3 = BE, R4 = CDE, and R5 = AE. Let the functional
dependencies be: A -> C, B -> C, C -> D, DE -> C, CE -> A. Test if the decomposition of R into
{R1,..,R5} is a lossless join decomposition.
Answer:

The initial matrix S looks as follows:
      A     B     C     D     E
R1    a1    b12   b13   a4    b15
R2    a1    a2    b23   b24   b25
R3    b31   a2    b33   b34   a5
R4    b41   b42   a3    a4    a5
R5    a1    b52   b53   b54   a5
After applying A -> C (R1, R2 and R5 agree on A, so b23 and b53 become b13), S will be
      A     B     C     D     E
R1    a1    b12   b13   a4    b15
R2    a1    a2    b13   b24   b25
R3    b31   a2    b33   b34   a5
R4    b41   b42   a3    a4    a5
R5    a1    b52   b13   b54   a5
After applying B -> C (R2 and R3 agree on B, so b33 becomes b13), S will be
      A     B     C     D     E
R1    a1    b12   b13   a4    b15
R2    a1    a2    b13   b24   b25
R3    b31   a2    b13   b34   a5
R4    b41   b42   a3    a4    a5
R5    a1    b52   b13   b54   a5
After applying C -> D (R1, R2, R3 and R5 agree on C, so b24, b34 and b54 become a4), S will be
      A     B     C     D     E
R1    a1    b12   b13   a4    b15
R2    a1    a2    b13   a4    b25
R3    b31   a2    b13   a4    a5
R4    b41   b42   a3    a4    a5
R5    a1    b52   b13   a4    a5
After applying DE -> C (R3, R4 and R5 agree on D and E, so b13 becomes a3 in R3 and R5), S will be
      A     B     C     D     E
R1    a1    b12   b13   a4    b15
R2    a1    a2    b13   a4    b25
R3    b31   a2    a3    a4    a5
R4    b41   b42   a3    a4    a5
R5    a1    b52   a3    a4    a5
After applying CE -> A (R3, R4 and R5 agree on C and E, so b31 and b41 become a1), S will be
      A     B     C     D     E
R1    a1    b12   b13   a4    b15
R2    a1    a2    b13   a4    b25
R3    a1    a2    a3    a4    a5
R4    a1    b42   a3    a4    a5
R5    a1    b52   a3    a4    a5
Now row R3 of matrix S is made up entirely of "a" symbols, hence the decomposition
has the lossless join property.
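The matrix algorithm and the three examples can be checked with a short Python sketch. The function name has_lossless_join is hypothetical, and there is one deliberate difference from step 4 as stated above: when two symbols are equated, the sketch renames the symbol everywhere in that column (the usual chase formulation), not only in the rows that triggered the change. The verdict for the three examples is the same either way.

```python
def has_lossless_join(attributes, decomposition, fds):
    """Chase test: True if the decomposition has the lossless (nonadditive) join property."""
    # Steps 1-3: one row per Ri; ("a", A) where Ri contains A, else a row-specific "b" symbol.
    rows = [
        {A: ("a", A) if A in ri else ("b", i, A) for A in attributes}
        for i, ri in enumerate(decomposition)
    ]
    changed = True
    while changed:  # step 4: repeat until a full pass changes nothing
        changed = False
        for lhs, rhs in fds:
            # Group the rows that carry identical symbols in all columns of X.
            groups = {}
            for row in rows:
                key = tuple(row[A] for A in sorted(lhs))
                groups.setdefault(key, []).append(row)
            for group in groups.values():
                for A in rhs:
                    symbols = {row[A] for row in group}
                    if len(symbols) > 1:
                        target = min(symbols)  # ("a", ...) sorts before ("b", ...)
                        for row in rows:       # rename throughout the column
                            if row[A] in symbols:
                                row[A] = target
                        changed = True
    # Step 5: lossless iff some row consists only of "a" symbols.
    return any(all(row[A] == ("a", A) for A in attributes) for row in rows)


EMP_ATTRS = ["SSN", "ENAME", "PNUMBER", "PNAME", "PLOCATIONS", "HOURS"]
EMP_FDS = [
    ({"SSN"}, {"ENAME"}),
    ({"PNUMBER"}, {"PNAME", "PLOCATIONS"}),
    ({"SSN", "PNUMBER"}, {"HOURS"}),
]

example1 = has_lossless_join(
    EMP_ATTRS,
    [{"ENAME", "PLOCATIONS"}, {"SSN", "PNUMBER", "HOURS", "PNAME", "PLOCATIONS"}],
    EMP_FDS,
)
example2 = has_lossless_join(
    EMP_ATTRS,
    [{"SSN", "ENAME"}, {"PNAME", "PNUMBER", "PLOCATIONS"}, {"SSN", "PNUMBER", "HOURS"}],
    EMP_FDS,
)
example3 = has_lossless_join(
    list("ABCDE"),
    [set("AD"), set("AB"), set("BE"), set("CDE"), set("AE")],
    [(set("A"), set("C")), (set("B"), set("C")), (set("C"), set("D")),
     (set("DE"), set("C")), (set("CE"), set("A"))],
)
```

The results agree with the hand-worked matrices: Example 1 has no all-"a" row, while Examples 2 and 3 do.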
Dependency Preservation
There is another goal in relational-database design: dependency preservation. When an update is
made to the database, the system should be able to check that the update will not create an illegal
relation—that is, one that does not satisfy all the given functional dependencies.
If we want to check updates efficiently, we should design relational-database schemas that allow
update validation without the computation of joins. In other words, we should design
relational-database schemas in such a way that dependencies are preserved.
To decide whether joins must be computed to check an update, we need to determine what
functional dependencies can be tested by checking each relation individually.
Let F be a set of functional dependencies on a schema R, and let R1, R2, ..., Rm be a
decomposition of R. The projection of F to Ri (denoted by ΠRi(F) )is the set of all functional
dependencies in F+ that include only attributes of Ri. Since all functional dependencies in a
projection involve attributes of only one relation schema, it is possible to test such a dependency
for satisfaction by checking only one relation.
Note that the definition of projection uses all dependencies in F+, not just those in F. For
instance, suppose F = {A → B, B → C}, and we have a decomposition into AC and AB. The
projection of F on AC is then A → C, since A → C is in F+, even though it is not in F.
The set of projections ΠR1(F), ΠR2(F), ..., ΠRm(F) is the set of dependencies that can be checked
efficiently. We now must ask whether testing only the projections is sufficient. Let F´ =
((ΠR1(F)) ∪ (ΠR2(F)) ∪ ... ∪ (ΠRm(F))). F´ is a set of functional dependencies on schema R,
but, in general, F´ ≠ F. However, even if F´ ≠ F, it may be that F´+ = F+. If the latter is true, then
every dependency in F is logically implied by F´, and, if we verify that F´ is satisfied, we
have verified that F is satisfied. We say that a decomposition having the property F´+ = F+ is
a dependency-preserving decomposition.
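The projection of F on a subschema can be computed by brute force from attribute closures, as in the Python sketch below. The names closure and project are hypothetical, attributes are assumed to be single letters, and the output lists one dependency X -> Y per left side X with Y as large as possible, which is a compact generator of the projection rather than every dependency in it.

```python
from itertools import combinations


def closure(attrs, fds):
    """Closure of `attrs` under `fds`, a list of (lhs, rhs) pairs of sets."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def project(fds, ri):
    """Nontrivial dependencies X -> Y of F+ with X and Y inside Ri (Y maximal for each X)."""
    ri = frozenset(ri)
    projected = set()
    for size in range(1, len(ri) + 1):
        for combo in combinations(sorted(ri), size):
            x = frozenset(combo)
            # Everything X determines, restricted to Ri, minus the trivial part.
            y = (closure(x, fds) & ri) - x
            if y:
                projected.add((x, frozenset(y)))
    return projected


F = [(frozenset("A"), frozenset("B")), (frozenset("B"), frozenset("C"))]
on_ac = project(F, "AC")
on_ab = project(F, "AB")
```

Projecting F = {A → B, B → C} on AC yields A → C even though that dependency is not listed in F, because the closure of A is taken under F before restricting to AC.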
An algorithm to test dependency preservation without computing the closure F+ of the set of
functional dependencies F
Checking dependency preservation directly, i.e., checking whether
((ΠR1(F)) ∪ (ΠR2(F)) ∪ ... ∪ (ΠRm(F)))+ = F+, is expensive, since it requires
computation of F+ and F´+.
We now give a more efficient test for dependency preservation, which avoids computing
F+. The idea is to test each functional dependency α → β in F by using a modified form
of attribute closure to see if it is preserved by the decomposition. We apply the following
procedure to each α → β in F.
result = α
while (changes to result) do
    for each Ri in the decomposition
        t = (result ∩ Ri)+ ∩ Ri
        result = result ∪ t
Here the attribute closure (result ∩ Ri)+ is computed with respect to F.
If result contains all attributes in β, then the functional dependency α → β is preserved.

The decomposition is dependency preserving if and only if all the dependencies in F are
preserved.
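The procedure translates almost line by line into Python. The sketch below uses hypothetical function names (closure, is_preserved, is_dependency_preserving) and represents each schema and each side of a dependency as a set of attribute names.

```python
def closure(attrs, fds):
    """Closure of `attrs` under `fds`, a list of (lhs, rhs) pairs of sets."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def is_preserved(alpha, beta, decomposition, fds):
    """Test one dependency alpha -> beta without computing F+."""
    result = set(alpha)
    changed = True
    while changed:
        changed = False
        for ri in decomposition:
            # t = (result ∩ Ri)+ ∩ Ri, with the closure taken under F.
            t = closure(result & ri, fds) & ri
            if not t <= result:
                result |= t
                changed = True
    return set(beta) <= result


def is_dependency_preserving(decomposition, fds):
    """The decomposition is dependency preserving iff every dependency in F is preserved."""
    return all(is_preserved(lhs, rhs, decomposition, fds) for lhs, rhs in fds)


F = [({"A"}, {"B"}), ({"B"}, {"C"})]
split_ab_ac = is_dependency_preserving([{"A", "B"}, {"A", "C"}], F)
split_ab_bc = is_dependency_preserving([{"A", "B"}, {"B", "C"}], F)
```

For R(A, B, C) with F = {A → B, B → C}, the split into AB and AC fails because B → C cannot be reached inside either schema, while the split into AB and BC succeeds.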
Example 1:
Suppose R (A, B, C) is a relation schema with a set of functional dependencies
F = {A -> B, B -> C}. R (A, B, C) is decomposed into
R1 = {A, B} and R2 = {A, C}. Check whether the decomposition is dependency
preserving.
Answer:
First we compute F+.
Since A -> B and B -> C, by applying the transitivity rule we get A -> C.
Now F+ contains {A -> B, B -> C, A -> C, A -> BC} (trivial dependencies and those obtained
by augmentation are omitted).
The projection of F on R1 is ΠR1(F) = {A -> B}.
The projection of F on R2 is ΠR2(F) = {A -> C}.
F´ = ΠR1(F) ∪ ΠR2(F) = {A -> B, A -> C}
F´+ contains {A -> B, A -> C, A -> BC}, but it does not contain B -> C.
Since F+ ≠ F´+, the decomposition of R (A, B, C) into R1 = {A, B} and R2 = {A, C} is not
dependency preserving.
Example 2:
Suppose R (A, B, C) is a relation schema with a set of functional dependencies
F = {A -> B, B -> C}. R (A, B, C) is decomposed into
R1 = {A, B} and R2 = {B, C}. Check whether the decomposition is dependency preserving.
Answer:
First we compute F+.
Since A -> B and B -> C, by applying the transitivity rule we get A -> C.
Now F+ contains {A -> B, B -> C, A -> C, A -> BC} (trivial dependencies and those obtained
by augmentation are omitted).
The projection of F on R1 is ΠR1(F) = {A -> B}.
The projection of F on R2 is ΠR2(F) = {B -> C}.
F´ = ΠR1(F) ∪ ΠR2(F) = {A -> B, B -> C}
Since A -> B and B -> C, by applying the transitivity rule we get A -> C.
F´+ contains {A -> B, B -> C, A -> C, A -> BC}.
Since F+ = F´+, the decomposition of R (A, B, C) into R1 = {A, B} and R2 = {B, C} is
dependency preserving.
Not every BCNF decomposition is dependency preserving, but a dependency-preserving
(and lossless-join) decomposition into 3NF always exists.
Example
Suppose R (branch_name, customer_name, banker_name) is a relation schema with a set
of functional dependencies F = {banker_name -> branch_name, (branch_name,
customer_name)-> banker_name}.
Here the primary key is (branch_name, customer_name).

Since the nontrivial functional dependency banker_name -> branch_name holds,
banker_name is not a superkey, and branch_name is a prime attribute, R (branch_name,
customer_name, banker_name) is not in BCNF but it is in 3NF.
To bring it into BCNF, we decompose it into R1 = {banker_name, branch_name}
and R2 = {banker_name, customer_name}.
F+ contains, among others, the dependencies of F itself: {banker_name -> branch_name,
(branch_name, customer_name) -> banker_name}.
Projection of F on R1: ΠR1(F) = {banker_name -> branch_name}
Projection of F on R2: ΠR2(F) = φ (only trivial dependencies)
F´ = ΠR1(F) ∪ ΠR2(F) = {banker_name -> branch_name}
F´+ contains banker_name -> branch_name but not (branch_name, customer_name) -> banker_name.
Since F+ ≠ F´+, the decomposition of R (branch_name, customer_name, banker_name) into
R1 = {banker_name, branch_name} and R2 = {banker_name, customer_name} is not
dependency preserving.
So, from this we can say that not every BCNF decomposition is dependency
preserving.
R (branch_name, customer_name, banker_name) with the set of functional dependencies F
= {banker_name -> branch_name, (branch_name, customer_name) -> banker_name} is
already in 3NF, so it need not be decomposed and both dependencies are preserved.
Compare between 3NF and BCNF with example
3NF BCNF
Definition A relation schema R is in third normal form A relation schema R is in BCNF
(3NF) with respect to a set F of functional with respect to a set F of functional
dependencies if, whenever a nontrivial functional dependencies if whenever a
dependency X -> A holds in R, either (a) X is a nontrivial functional dependency X
superkey of R, or (b) A is a prime attribute of R. -> A holds in R, then X is a
superkey of R.
Example R (branch_name, customer_name, banker_name) R (branch_name, customer_name,

is a relation schema with a set of functional banker_name) is a relation schema
dependencies F = {banker_name -> branch_name, with a set of functional
(branch_name, customer_name)-> banker_name} dependencies F = {banker_name ->
is in 3NF, because here superkey is branch_name, (branch_name,
(branch_name, customer_name) and right hand customer_name)-> banker_name}.
side of nontrivial functional dependency is not in BCNF, because here
banker_name -> branch_name is a prime attribute. superkey is (branch_name,
customer_name) and right hand
side of nontrivial functional
dependency banker_name ->
branch_name is a prime attribute.
Which is BCNF is more stricter than 3NF BCNF is more stricter than 3NF
stricter
Weather we It is always possible to obtain a 3NF design It is always possible to obtain a
have lossless without sacrificing a lossless join or dependency BCNF design without sacrificing
join property preservation. dependency preservation.
and
Dependency is
preserved or
not?
Disadvantages After achieving 3NF, since all transitive After achieving BCNF, since all
dependencies are not eliminated, we may have to transitive dependencies are
use null values to represent some of the possible eliminated, we do not have to use
33
meaningful relationships among data items, and null values to represent some of the
there is the problem of repetition of information. possible meaningful relationships
among data items, and there is no
the problem of repetition of
information.
Which is preferable
3NF: Since it is not always possible to satisfy both BCNF and dependency preservation, we may
be forced to choose between BCNF and dependency preservation with 3NF. If we choose
dependency preservation with 3NF, the application programmer needs to write code to keep
redundant data consistent on updates.
BCNF: If we choose BCNF and a dependency is not preserved, we have to consider each
dependency in a minimum cover Fc that is not preserved in the decomposition. For each such
dependency α→β we define a materialized view that computes a join of all relations in the
decomposition and projects the result on αβ. The functional dependency can be easily tested
on this materialized view.
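
The comparison above can be checked mechanically on the banker example. The following
Python sketch is illustrative only: the function names (closure, candidate_keys, violations,
natural_join, fd_holds) and the sample tuples are assumptions made for this example and do
not come from the notes. It first tests each dependency in F against the 3NF and BCNF
conditions. It then assumes the decomposition of R into (banker_name, branch_name) and
(customer_name, banker_name), and tests the unpreserved dependency
(branch_name, customer_name) -> banker_name on the join of the two relations, in the spirit
of the materialized-view approach.

```python
from itertools import combinations


def closure(attrs, fds):
    """Attribute closure of attrs under fds, a list of (lhs, rhs) frozensets."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return frozenset(result)


def candidate_keys(schema, fds):
    """Minimal attribute sets whose closure is the whole schema (brute force)."""
    keys = []
    for size in range(1, len(schema) + 1):
        for combo in combinations(sorted(schema), size):
            candidate = frozenset(combo)
            if any(key <= candidate for key in keys):
                continue  # contains a smaller key, so it is not minimal
            if closure(candidate, fds) == schema:
                keys.append(candidate)
    return keys


def violations(schema, fds, allow_prime_rhs):
    """Pairs (X, A) from the given FDs that break BCNF, or 3NF if allow_prime_rhs.

    Only the dependencies listed in fds are examined, not the full closure F+.
    """
    prime = set().union(*candidate_keys(schema, fds))
    bad = []
    for lhs, rhs in fds:
        for attr in rhs - lhs:  # attributes already in X form a trivial part
            if closure(lhs, fds) == schema:
                continue  # X is a superkey
            if allow_prime_rhs and attr in prime:
                continue  # condition (b) of 3NF
            bad.append((lhs, attr))
    return bad


def natural_join(r1, r2):
    """Natural join of two relations given as lists of dicts."""
    out = []
    for t1 in r1:
        for t2 in r2:
            if all(t1[a] == t2[a] for a in t1.keys() & t2.keys()):
                out.append({**t1, **t2})
    return out


def fd_holds(rows, lhs, rhs):
    """True if no two rows agree on lhs but differ on rhs."""
    seen = {}
    for row in rows:
        key = tuple(row[a] for a in lhs)
        val = tuple(row[a] for a in rhs)
        if seen.setdefault(key, val) != val:
            return False
    return True


SCHEMA = frozenset({"branch_name", "customer_name", "banker_name"})
FDS = [
    (frozenset({"banker_name"}), frozenset({"branch_name"})),
    (frozenset({"branch_name", "customer_name"}), frozenset({"banker_name"})),
]

# Hypothetical sample tuples for the assumed BCNF decomposition of R.
banker_branch = [
    {"banker_name": "Johnson", "branch_name": "Perryridge"},
    {"banker_name": "Smith", "branch_name": "Perryridge"},
]
customer_banker = [
    {"customer_name": "Jones", "banker_name": "Johnson"},
    {"customer_name": "Jones", "banker_name": "Smith"},
]

if __name__ == "__main__":
    print("3NF violations:", violations(SCHEMA, FDS, allow_prime_rhs=True))
    print("BCNF violations:", violations(SCHEMA, FDS, allow_prime_rhs=False))
    joined = natural_join(banker_branch, customer_banker)
    print("FD holds on join:",
          fd_holds(joined, ["branch_name", "customer_name"], ["banker_name"]))
```

In this sample data each decomposed relation is consistent on its own, yet the join contains
two tuples for (Perryridge, Jones) with different bankers. A violation of this kind can only
be detected on the joined and projected result, which is why a materialized view is suggested
when BCNF is chosen over dependency preservation.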