Normalization

Overview
Normalization is the earliest formalized database design
technique and was at one time the starting point
for logical database design.
Today it is used more as a check on database
structures produced from E-R diagrams.
The data normalization process is another way of
demonstrating and learning about such
important topics as data redundancy, foreign
keys, and other ideas that are central to a solid
understanding of database management.
In 1972, Dr E.F. Codd developed the technique
of normalization to support the design of
databases based on the relational model.
Normalization is often performed as a series of
tests on a table to determine whether it satisfies
or violates the rules for a given normal form.
There are several normal forms, although the
most commonly used ones are called first
normal form (1NF), second normal form (2NF),
and third normal form (3NF).
All these normal forms are based on rules about
relationships among the columns of a table.
Definition
Normalization
A technique for producing a set of tables with desirable properties
that support the requirements of a user or company. [Connolly,
Thomas M.]
A methodology for organizing attributes into tables so that
redundancy among the nonkey attributes is eliminated. [Gillenson,
Mark L.]
With Normalization
Each resultant table will describe a single entity type or a single
many-to-many relationship.
Foreign keys will appear exactly where they are needed.
The output of the data normalization process is a properly structured
relational database.
Data redundancy and
update anomalies
A major aim of relational database design
is to group columns into tables to minimize
data redundancy and reduce the file
storage space required by the
implemented base tables.
For example, the structure of these tables is described
using a Database Definition Language (DDL):
Staff (staffNo, name, position, salary, branchNo)
Primary Key staffNo
Foreign Key branchNo references
Branch(branchNo)
Branch (branchNo, branchAddress, telNo)
Primary Key branchNo
StaffBranch (staffNo, name, position, salary,
branchNo, branchAddress, telNo)
Primary Key staffNo
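The three table descriptions above can be sketched as SQL tables. This is a minimal illustration using Python's built-in sqlite3 module; the column types are assumptions, since the descriptions above give only column names and keys.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite enforces foreign keys only when this pragma is switched on.
conn.execute("PRAGMA foreign_keys = ON")

# Column types (TEXT, INTEGER) are assumed for illustration.
conn.executescript("""
CREATE TABLE Branch (
    branchNo      TEXT PRIMARY KEY NOT NULL,
    branchAddress TEXT,
    telNo         TEXT
);
CREATE TABLE Staff (
    staffNo  TEXT PRIMARY KEY NOT NULL,
    name     TEXT,
    position TEXT,
    salary   INTEGER,
    branchNo TEXT REFERENCES Branch(branchNo)
);
CREATE TABLE StaffBranch (
    staffNo       TEXT PRIMARY KEY NOT NULL,
    name          TEXT,
    position      TEXT,
    salary        INTEGER,
    branchNo      TEXT,
    branchAddress TEXT,
    telNo         TEXT
);
""")

# List the tables that were created.
tables = sorted(row[0] for row in conn.execute(
    "SELECT name FROM sqlite_master WHERE type = 'table'"))
print(tables)
```

Staff and Branch together hold the same columns as StaffBranch, but the branch columns are stored in one place only.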
The StaffBranch table
In the StaffBranch table there is redundant
data:
the details of a branch are repeated for every member
of staff located at that branch.
In contrast, the details of each branch appear only
once in the Branch table and only the branch number
(branchNo) is repeated in the Staff table, to represent
where each member of staff is located.
Tables that have redundant data may have
problems called update anomalies, which are
classified as insertion, deletion, or modification
anomalies.
Insertion anomalies
There are two main types of insertion anomalies:
1. To insert the details of a new member of staff located
at a given branch into the StaffBranch table, we must
also enter the correct details for that branch. For
example, to insert the details of a new member of staff
at branch B002, we must enter the correct details of
branch B002 so that the branch details are consistent
with values for branch B002 in other records of the
StaffBranch table.
2. To insert details of a new branch that currently has no
members of staff into the StaffBranch table, it is
necessary to enter nulls into the staff-related columns,
such as staffNo. However, as staffNo is the primary
key for the StaffBranch table, attempting to enter nulls
for staffNo violates entity integrity, and is not allowed.
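The second insertion anomaly can be reproduced in a small sketch using Python's sqlite3 module. The branch B004 and its details are hypothetical, and the column types are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# NOT NULL is written explicitly because SQLite, unlike the SQL standard,
# accepts NULL in a non-INTEGER PRIMARY KEY column unless told otherwise.
conn.execute("""
CREATE TABLE StaffBranch (
    staffNo TEXT PRIMARY KEY NOT NULL,
    name TEXT, position TEXT, salary INTEGER,
    branchNo TEXT, branchAddress TEXT, telNo TEXT
)""")

def try_insert_branch_without_staff(conn):
    """Try to record a branch with no staff; return the error text, or None."""
    try:
        conn.execute(
            "INSERT INTO StaffBranch VALUES (NULL, NULL, NULL, NULL, ?, ?, ?)",
            ("B004", "hypothetical address", "hypothetical number"),
        )
    except sqlite3.IntegrityError as exc:
        return str(exc)
    return None

error = try_insert_branch_without_staff(conn)
print(error)  # the database rejects the row because staffNo is null
```

With a separate Branch table, the same branch could be inserted without any staff-related columns at all.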
Deletion anomalies
If we delete a record from the StaffBranch
table that represents the last member of
staff located at a branch, the details about
that branch are also lost from the
database.
For example, if we delete the record for
staff Art Peters (S0415) from the
StaffBranch table, the details relating to
branch B003 are lost from the database.
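A minimal sketch of the deletion anomaly in plain Python. Only Art Peters (S0415) and branch B003 come from the example above; the other rows and all addresses are hypothetical placeholders.

```python
# Hypothetical StaffBranch rows, reduced to the columns needed here.
staff_branch = [
    {"staffNo": "S0001", "name": "Staff A", "branchNo": "B001", "branchAddress": "Address 1"},
    {"staffNo": "S0002", "name": "Staff B", "branchNo": "B001", "branchAddress": "Address 1"},
    {"staffNo": "S0415", "name": "Art Peters", "branchNo": "B003", "branchAddress": "Address 3"},
]

def known_branches(rows):
    """Return the set of branch numbers that the table still knows about."""
    return {row["branchNo"] for row in rows}

before = known_branches(staff_branch)
remaining = [row for row in staff_branch if row["staffNo"] != "S0415"]
lost = before - known_branches(remaining)
print(lost)  # branches that vanished together with the deleted staff record
```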
Modification anomalies
If we want to change the value of one of the
columns of a particular branch in the StaffBranch
table, for example the telephone number for
branch B001, we must update the records of all
staff located at that branch.
If this modification is not carried out on all the
appropriate records of the StaffBranch table, the
database will become inconsistent.
In this example, branch B001 would have
different telephone numbers in different staff
records.
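The inconsistency described above can be detected mechanically. The sketch below uses hypothetical rows and telephone numbers, and the helper name inconsistent_branches is made up for illustration.

```python
from collections import defaultdict

# Hypothetical StaffBranch rows, reduced to the columns needed here.
staff_branch = [
    {"staffNo": "S0001", "branchNo": "B001", "telNo": "555-0100"},
    {"staffNo": "S0002", "branchNo": "B001", "telNo": "555-0100"},
    {"staffNo": "S0003", "branchNo": "B002", "telNo": "555-0200"},
]

def inconsistent_branches(rows):
    """Return branch numbers that appear with more than one telephone number."""
    numbers = defaultdict(set)
    for row in rows:
        numbers[row["branchNo"]].add(row["telNo"])
    return {branch for branch, tels in numbers.items() if len(tels) > 1}

# Incomplete modification: only the first B001 record gets the new number.
staff_branch[0]["telNo"] = "555-0199"
print(inconsistent_branches(staff_branch))
```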
Introduction to
the Data Normalization Technique
The input required by the data normalization
process:
1. A list of all attributes that must be incorporated
into the database: all of the attributes in all of the
entities involved in the business environment
under discussion, plus all of the intersection
data attributes in all of the many-to-many
relationships between these entities.
2. A list of all the defining associations between
the attributes, that is, the functional dependencies.
Functional Dependencies
A means of expressing that the value of
one particular attribute is associated with a
single, specific value of another attribute.
If one of these attributes has a particular
value, then the other attribute must have
one specific, corresponding value.

Example of Functional Dependencies
For a particular Salesperson number, 137, there
is exactly one Salesperson Name, Baker,
associated with it.
Why is this true?
Because a Salesperson Number uniquely identifies a
salesperson, and a person can have only one
name. This is true for every person!
These defining associations are written with a
right-pointing arrow:
Salesperson Number → Salesperson Name
Salesperson Number is the determinant; Salesperson Name is
functionally dependent on it.
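A functional dependency can be tested against sample data with a short function. In this sketch only salesperson 137 (Baker) comes from the example above; the function name fd_holds and the other rows are hypothetical. Sample data can only show that a dependency is violated; whether it truly holds is a rule of the business environment.

```python
def fd_holds(rows, determinant, dependent):
    """True if every combination of determinant values maps to one dependent value."""
    seen = {}
    for row in rows:
        key = tuple(row[col] for col in determinant)
        value = tuple(row[col] for col in dependent)
        # setdefault stores the first value seen for this key and returns it.
        if seen.setdefault(key, value) != value:
            return False
    return True

salespeople = [
    {"number": 137, "name": "Baker"},
    {"number": 186, "name": "Adams"},   # hypothetical row
    {"number": 204, "name": "Baker"},   # hypothetical row: a second Baker
]

number_determines_name = fd_holds(salespeople, ["number"], ["name"])
name_determines_number = fd_holds(salespeople, ["name"], ["number"])
print(number_determines_name, name_determines_number)
```

The arrow runs in one direction only: two salespeople may share a name, so the name does not determine the number.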
First normal form (1NF)
A table in which the intersection of every
column and record contains only one
value.
Only first normal form (1NF) is critical in creating
appropriate tables for relational databases.
All the subsequent normal forms are optional.
However, to avoid the update anomalies, it is
normally recommended that you proceed to third
normal form (3NF).
Converting to 1NF
To convert this version of the Branch table to
1NF:
create a separate table called BranchTelephone to
hold the telephone numbers of branches, by removing
the telNos column from the Branch table along with a
copy of the primary key of the Branch table
(branchNo).
The primary key for the new BranchTelephone table
is the new telNo column.
The Branch and BranchTelephone tables are in
1NF as there is a single value at the intersection
of every column with every record for each table.
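The conversion to 1NF can be sketched in plain Python, assuming the unnormalized Branch table holds several telephone numbers in its telNos column. All row values and the function name to_1nf are hypothetical.

```python
# Hypothetical unnormalized Branch rows: telNos holds more than one value.
branch_unnormalized = [
    {"branchNo": "B001", "branchAddress": "Address 1", "telNos": ["555-0100", "555-0101"]},
    {"branchNo": "B002", "branchAddress": "Address 2", "telNos": ["555-0200"]},
]

def to_1nf(rows):
    """Split the repeating telNos column out into a BranchTelephone table."""
    branch = [
        {"branchNo": row["branchNo"], "branchAddress": row["branchAddress"]}
        for row in rows
    ]
    # One record per telephone number, with a copy of branchNo as foreign key.
    branch_telephone = [
        {"telNo": tel, "branchNo": row["branchNo"]}
        for row in rows
        for tel in row["telNos"]
    ]
    return branch, branch_telephone

branch, branch_telephone = to_1nf(branch_unnormalized)
for record in branch_telephone:
    print(record)
```

Every column of every record in both results now holds a single value.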
Partial dependency
Full functional dependency indicates that if A
and B are columns of a table, B is fully
functionally dependent on A if B is functionally
dependent on A but not on any proper subset of A.
If B is dependent on a proper subset of A, this is
referred to as a partial dependency.
If a partial dependency exists on the primary
key, the table is not in 2NF. The partial
dependency must be removed for a table to
achieve 2NF.
Second normal form (2NF)
Definition:
A table that is in first normal form and in which every
non-primary-key column is fully functionally
dependent on the primary key.
Put another way, a table that is already in 1NF and in which
the values in each non-primary-key column can
be worked out from the values in all the columns
that make up the primary key.


Second normal form (2NF)
Second normal form applies only to tables
with composite primary keys, that is tables
with a primary key composed of two or
more columns.
A 1NF table with a single column primary
key is automatically in at least 2NF.
A table that is not in 2NF may suffer from
the update anomalies.
TempStaffAllocation table is not in 2NF.
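Partial dependencies can be searched for by testing each non-primary-key column against every proper subset of the primary key. The sketch below assumes TempStaffAllocation has the primary key (staffNo, branchNo) and the columns name, position, branchAddress, and hoursPerWeek; all row values and function names are hypothetical.

```python
from itertools import combinations

def fd_holds(rows, determinant, dependent):
    """True if every combination of determinant values maps to one dependent value."""
    seen = {}
    for row in rows:
        key = tuple(row[col] for col in determinant)
        value = tuple(row[col] for col in dependent)
        if seen.setdefault(key, value) != value:
            return False
    return True

def partial_dependencies(rows, key, non_key):
    """List (part_of_key, column) pairs where column depends on only part of the key."""
    found = []
    for size in range(1, len(key)):            # proper subsets only
        for part in combinations(key, size):
            for col in non_key:
                if fd_holds(rows, part, [col]):
                    found.append((part, col))
    return found

# Hypothetical sample rows.
temp_staff_allocation = [
    {"staffNo": "S4555", "branchNo": "B002", "name": "Staff A", "position": "Assistant",
     "branchAddress": "Address 2", "hoursPerWeek": 16},
    {"staffNo": "S4555", "branchNo": "B004", "name": "Staff A", "position": "Assistant",
     "branchAddress": "Address 4", "hoursPerWeek": 9},
    {"staffNo": "S4612", "branchNo": "B002", "name": "Staff B", "position": "Supervisor",
     "branchAddress": "Address 2", "hoursPerWeek": 14},
    {"staffNo": "S4612", "branchNo": "B004", "name": "Staff B", "position": "Supervisor",
     "branchAddress": "Address 4", "hoursPerWeek": 10},
]

partials = partial_dependencies(
    temp_staff_allocation,
    key=["staffNo", "branchNo"],
    non_key=["name", "position", "branchAddress", "hoursPerWeek"],
)
for part, col in partials:
    print(part, "->", col)
```

In this sample, hoursPerWeek is the only non-primary-key column that needs the whole key.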
Converting to 2NF (1)
Remove the non-primary-key columns that can be
worked out using only part of the primary key.
Remove the columns that can be worked out from
either the staffNo or the branchNo column but do not
require both.
Remove the branchAddress, name, and position
columns and place them in new tables.
Create two new tables called Branch and TempStaff.
The Branch table will hold the columns describing the details of
branches
The TempStaff table will hold the columns describing the details
of temporary staff.
Converting to 2NF (2)
1. The Branch table is created by removing the
branchAddress column from the
TempStaffAllocation table along with a copy of
the part of the primary key that the column is
related to, which in this case is the branchNo
column.
2. In a similar way, the TempStaff table is created
by removing the name and position columns
from the TempStaffAllocation table along with a
copy of the part of the primary key that the
columns are related to, which in this case is the
staffNo column.
Converting to 2NF (3)
It is not necessary to remove the hoursPerWeek column as the
presence of this column in the TempStaffAllocation table does not
break the rules of 2NF.
To ensure that we maintain the relationship between a temporary
member of staff and the branches at which he or she works for a set
number of hours, leave a copy of the staffNo and branchNo
columns to act as foreign keys in the TempStaffAllocation table.
The primary key for the new Branch table is branchNo and the
primary key for the new TempStaff table is staffNo.
The TempStaff and Branch tables must be in 2NF because the
primary key for each table is a single column.
The altered TempStaffAllocation table is also in 2NF because the
non-primary-key column hoursPerWeek is related to both the
staffNo and branchNo columns.
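The three steps above amount to projecting the original table onto three column sets and dropping duplicate records. A minimal sketch in plain Python follows; the row values and the helper name project are hypothetical.

```python
def project(rows, columns):
    """Project rows onto the given columns, keeping one copy of each distinct record."""
    seen = {}
    for row in rows:
        values = tuple(row[col] for col in columns)
        seen.setdefault(values, dict(zip(columns, values)))
    return list(seen.values())

# Hypothetical sample rows for the original TempStaffAllocation table.
temp_staff_allocation = [
    {"staffNo": "S4555", "branchNo": "B002", "name": "Staff A", "position": "Assistant",
     "branchAddress": "Address 2", "hoursPerWeek": 16},
    {"staffNo": "S4555", "branchNo": "B004", "name": "Staff A", "position": "Assistant",
     "branchAddress": "Address 4", "hoursPerWeek": 9},
    {"staffNo": "S4612", "branchNo": "B002", "name": "Staff B", "position": "Supervisor",
     "branchAddress": "Address 2", "hoursPerWeek": 14},
    {"staffNo": "S4612", "branchNo": "B004", "name": "Staff B", "position": "Supervisor",
     "branchAddress": "Address 4", "hoursPerWeek": 10},
]

# Columns that depend on only part of the key move to their own tables.
temp_staff = project(temp_staff_allocation, ["staffNo", "name", "position"])
branch = project(temp_staff_allocation, ["branchNo", "branchAddress"])
# hoursPerWeek stays with the full composite key.
allocation = project(temp_staff_allocation, ["staffNo", "branchNo", "hoursPerWeek"])

print(len(temp_staff), len(branch), len(allocation))
```

Joining the three results on staffNo and branchNo gives back the original records, so no information is lost by the split.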
Third normal form (3NF)
A table that is already in 1NF and 2NF,
and in which the values in all non-primary-
key columns can be worked out from only
the primary key column(s) and no other
columns.
Transitively dependent
The formal definition for third normal form (3NF) is a table that is in
first and second normal forms and in which no non-primary-key
column is transitively dependent on the primary key.
Transitive dependency is a type of functional dependency that
occurs when a particular type of relationship holds between columns
of a table.
For example, consider a table with columns A, B, and C. If B is
functionally dependent on A (A → B) and C is functionally
dependent on B (B → C), then C is transitively dependent on A via B
(provided that A is not functionally dependent on B or C).
If a transitive dependency exists on the primary key, the table is not
in 3NF.
The transitive dependency must be removed for a table to achieve
3NF.
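The definition above can be checked directly on sample data. This sketch tests one chain A → B → C at a time, including the proviso that A is not functionally dependent on B or C. The rows are hypothetical apart from Art Peters (S0415) at branch B003, and the function names are made up for illustration.

```python
def fd_holds(rows, determinant, dependent):
    """True if every combination of determinant values maps to one dependent value."""
    seen = {}
    for row in rows:
        key = tuple(row[col] for col in determinant)
        value = tuple(row[col] for col in dependent)
        if seen.setdefault(key, value) != value:
            return False
    return True

def is_transitive(rows, a, b, c):
    """True if c depends on a via b: a -> b, b -> c, and a depends on neither b nor c."""
    return (
        fd_holds(rows, a, b)
        and fd_holds(rows, b, c)
        and not fd_holds(rows, b, a)
        and not fd_holds(rows, c, a)
    )

# Hypothetical StaffBranch rows, reduced to four columns.
staff_branch = [
    {"staffNo": "S0001", "name": "Staff A", "branchNo": "B001", "branchAddress": "Address 1"},
    {"staffNo": "S0002", "name": "Staff B", "branchNo": "B001", "branchAddress": "Address 1"},
    {"staffNo": "S0415", "name": "Art Peters", "branchNo": "B003", "branchAddress": "Address 3"},
]

via_branch = is_transitive(staff_branch, ["staffNo"], ["branchNo"], ["branchAddress"])
via_name = is_transitive(staff_branch, ["staffNo"], ["name"], ["branchNo"])
print(via_branch, via_name)
```

The second check fails the proviso: in this sample every name is unique, so staffNo is itself functionally dependent on name.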
Converting to 3NF (1)
Remove the non-primary-key columns that can be worked out using
another non-primary-key column.
Remove the columns that describe the branch at which the member
of staff works.
Remove the branchAddress and telNo columns and take a copy of
the branchNo column.
Create a new table called Branch to hold these columns and
nominate branchNo as the primary key for this table.
The branchAddress and telNo columns are candidate keys in the
Branch table as these columns can be used to uniquely identify a
given branch.
The relationship between a member of staff and the branch at which
he or she works is maintained as the copy of the branchNo column
in the StaffBranch table acts as a foreign key.
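The conversion can be sketched with Python's sqlite3 module. Assumptions: the reduced table is named Staff here, column types are chosen for illustration, the candidate keys are expressed as UNIQUE constraints, and all row values other than S0415, Art Peters, and the branch numbers are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE StaffBranch (
    staffNo TEXT PRIMARY KEY NOT NULL,
    name TEXT, position TEXT, salary INTEGER,
    branchNo TEXT, branchAddress TEXT, telNo TEXT
);
INSERT INTO StaffBranch VALUES
    ('S0001', 'Staff A',    'Manager',   100, 'B001', 'Address 1', '555-0100'),
    ('S0002', 'Staff B',    'Assistant',  50, 'B001', 'Address 1', '555-0100'),
    ('S0003', 'Staff C',    'Manager',   100, 'B002', 'Address 2', '555-0200'),
    ('S0415', 'Art Peters', 'Manager',   100, 'B003', 'Address 3', '555-0300');

-- Branch columns move to their own table, one record per branch.
CREATE TABLE Branch (
    branchNo      TEXT PRIMARY KEY NOT NULL,
    branchAddress TEXT UNIQUE,
    telNo         TEXT UNIQUE
);
INSERT INTO Branch
    SELECT DISTINCT branchNo, branchAddress, telNo FROM StaffBranch;

-- The staff columns keep a copy of branchNo as a foreign key.
CREATE TABLE Staff (
    staffNo TEXT PRIMARY KEY NOT NULL,
    name TEXT, position TEXT, salary INTEGER,
    branchNo TEXT REFERENCES Branch(branchNo)
);
INSERT INTO Staff
    SELECT staffNo, name, position, salary, branchNo FROM StaffBranch;
""")

# A telephone number is now changed in exactly one record.
conn.execute("UPDATE Branch SET telNo = ? WHERE branchNo = ?", ("555-0199", "B001"))
rows = conn.execute("""
    SELECT s.staffNo, b.telNo
    FROM Staff AS s JOIN Branch AS b ON s.branchNo = b.branchNo
    WHERE b.branchNo = 'B001'
    ORDER BY s.staffNo
""").fetchall()
print(rows)
```

Because the telephone number is stored once, every member of staff at B001 sees the new value after a single update.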
Converting to 3NF (2)
The new Branch table is in 3NF as all of the non-primary-key
columns can be worked out from the primary key, branchNo.
Although the other two non-primary-key columns in this table,
branchAddress and telNo, can also be used to work out the details
of a given branch, this does not violate 3NF because these columns
are candidate keys for the Branch table.
This example illustrates that the definition for 3NF can be
generalized to include all candidate keys of a table, if any exist.
Therefore, for tables with more than one candidate key we can use
the generalized definition for 3NF, which is a table that is in 1NF and
2NF, and in which the values in all the non-primary-key columns can
be worked out from only candidate key column(s) and no other
columns.
Furthermore, this generalization is also true for the definition of 2NF,
which is a table that is in 1NF and in which the values in each
non-primary-key column can be worked out from the values in all the
columns that make up a candidate key.
Summary
Normalization is a technique for producing a set of tables with
desirable properties that support the requirements of a user or
company.
Tables that have redundant data may have problems called update
anomalies, which are classified as insertion, deletion, or
modification anomalies.
The definition for first normal form (1NF) is a table in which the
intersection of every column and record contains only one value.
The definition for second normal form (2NF) is a table that is
already in 1NF and in which the values in each non-primary-key
column can be worked out from the values in all the column(s) that
make up the primary key.
The definition for third normal form (3NF) is a table that is already
in 1NF and 2NF, and in which the values in all non-primary-key
columns can be worked out from only the primary key column(s) and
no other columns.