Normalization

A technique for producing a set of relations with desirable properties,
given the data requirements of an enterprise.
Purpose of Normalization
The purpose of normalization is to identify a suitable set of relations that
support the data requirements of an enterprise. The characteristics of a
suitable set of relations include the following:
• the minimal number of attributes necessary to support the data
requirements of the enterprise
• attributes with a close logical relationship (described as functional
dependency) are found in the same relation
• minimal redundancy with each attribute represented only once with the
important exception of attributes that form all or part of foreign keys, which
are essential for the joining of related relations.
The benefit of using a database that has a suitable set of relations is that
the data will be easier for the user to access and maintain, and the database
will take up minimal storage space on the computer.
Normalization supports Database Design

Normalization is a formal technique that can be used at any stage of
database design. However, in this section we highlight two main approaches
for using normalization.
Data Redundancy
• Data redundancy is a condition created within a database or data
storage technology in which the same piece of data is held in two or more
separate places.
• This can mean two different fields within a single database, or two
different locations across multiple software environments or platforms.
• Whenever the same data is stored more than once, that is data
redundancy. It can occur by accident, but it is also introduced
deliberately for backup and recovery purposes.
Data Anomalies
Normalization is the process of splitting relations into well-structured
relations that allow users to insert, delete, and update tuples without
introducing database inconsistencies. Without normalization, many
problems can occur when trying to load an integrated conceptual
model into the DBMS. These problems, which arise from relations that are
generated directly from user views, are called anomalies. There are
three types of anomalies:
i. Insertion anomalies
ii. Update anomalies
iii. Deletion anomalies
Insertion Anomaly
An insertion anomaly is the inability to add data to the database due to
the absence of other data. For example, assume Student_Group is defined
so that null values are not allowed. If a new employee is hired but not
immediately assigned to a Student_Group, then this employee cannot
be entered into the database. This results in database
inconsistencies due to omission.
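
The insertion anomaly can be reproduced with a minimal sketch using Python's built-in sqlite3 module. The table name `employee_group`, the column types, and the new hire `789 / C. Example / Finance` are assumptions made for illustration; they are not part of the original example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical single-table design in which Student_Group may not be null.
conn.execute(
    """
    CREATE TABLE employee_group (
        employee_id   INTEGER NOT NULL,
        name          TEXT    NOT NULL,
        department    TEXT    NOT NULL,
        student_group TEXT    NOT NULL
    )
    """
)


def try_insert(row):
    """Return True if the row was stored, False if a constraint rejected it."""
    try:
        conn.execute("INSERT INTO employee_group VALUES (?, ?, ?, ?)", row)
        return True
    except sqlite3.IntegrityError:
        return False


# An employee who already belongs to a group can be stored.
stored_with_group = try_insert((123, "J. Longfellow", "Accounting", "Beta Alpha Psi"))

# A new hire with no group yet (hypothetical data) is rejected by NOT NULL.
stored_without_group = try_insert((789, "C. Example", "Finance", None))

print(stored_with_group, stored_without_group)
```

The second insert fails only because an unrelated fact (group membership) is missing, which is the anomaly described above.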
Update Anomaly
An update anomaly is a data inconsistency that results from data redundancy and a partial
update. For example, each employee in a company has a department associated with them as
well as the student groups they participate in, as in the table below.
If A. Bruchs' department is in error, it must be corrected in both of that employee's rows, or the
database will hold inconsistent data. If the user performing the update does not realize the data is
stored redundantly, the update will not be done properly.

Employee_ID  Name           Department  Student_Group
123          J. Longfellow  Accounting  Beta Alpha Psi
234          B. Rech        Marketing   Marketing Club
234          B. Rech        Marketing   Management Club
456          A. Bruchs      CIS         Technology Org.
456          A. Bruchs      CIS         Beta Alpha Psi
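
A partial update on this table can be sketched in a few lines of Python. The helper names `update_first_match` and `departments_of`, and the replacement department "Finance", are hypothetical and chosen only for illustration.

```python
# The table from the example, one dict per row.
rows = [
    {"Employee_ID": 123, "Name": "J. Longfellow", "Department": "Accounting", "Student_Group": "Beta Alpha Psi"},
    {"Employee_ID": 234, "Name": "B. Rech", "Department": "Marketing", "Student_Group": "Marketing Club"},
    {"Employee_ID": 234, "Name": "B. Rech", "Department": "Marketing", "Student_Group": "Management Club"},
    {"Employee_ID": 456, "Name": "A. Bruchs", "Department": "CIS", "Student_Group": "Technology Org."},
    {"Employee_ID": 456, "Name": "A. Bruchs", "Department": "CIS", "Student_Group": "Beta Alpha Psi"},
]


def update_first_match(table, employee_id, new_department):
    """Naive update that stops after the first matching row (a partial update)."""
    for row in table:
        if row["Employee_ID"] == employee_id:
            row["Department"] = new_department
            return


def departments_of(table, employee_id):
    """Collect every department value stored for one employee."""
    return {row["Department"] for row in table if row["Employee_ID"] == employee_id}


# "Finance" is a made-up correction; only one of the two rows is changed.
update_first_match(rows, 456, "Finance")
conflicting = departments_of(rows, 456)
print(conflicting)  # two different departments for the same employee
```

Because the department is stored once per group membership, a correct update has to touch every copy; missing one leaves the employee with two departments.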
Deletion Anomaly
A deletion anomaly is the unintended loss of data due to deletion of
other data. For example, if the student group Beta Alpha Psi disbanded
and its rows were deleted from the table above, all record of
J. Longfellow and the Accounting department would be lost. This results
in database inconsistencies and is an example of how combining
information that does not really belong together into one table can
cause problems.
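
The sketch below replays that deletion on the single table and then on one possible decomposition into an employee relation and a membership relation. The decomposition and the variable names are assumptions for illustration; the source does not prescribe a particular design.

```python
# (Employee_ID, Name, Department, Student_Group) rows from the example table.
rows = [
    (123, "J. Longfellow", "Accounting", "Beta Alpha Psi"),
    (234, "B. Rech", "Marketing", "Marketing Club"),
    (234, "B. Rech", "Marketing", "Management Club"),
    (456, "A. Bruchs", "CIS", "Technology Org."),
    (456, "A. Bruchs", "CIS", "Beta Alpha Psi"),
]

# Single-table design: deleting the group's rows also deletes employee facts.
remaining = [r for r in rows if r[3] != "Beta Alpha Psi"]
names_after_delete = {r[1] for r in remaining}
departments_after_delete = {r[2] for r in remaining}

# Assumed decomposition: employee facts and memberships are kept separately.
employees = {r[0]: (r[1], r[2]) for r in rows}   # Employee_ID -> (Name, Department)
memberships = {(r[0], r[3]) for r in rows}       # (Employee_ID, Student_Group)

# Deleting the group now touches only the membership relation.
memberships_after_delete = {m for m in memberships if m[1] != "Beta Alpha Psi"}

print(sorted(names_after_delete))
print(employees[123])
```

In the decomposed form, J. Longfellow and the Accounting department survive the deletion because they no longer depend on a membership row existing.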
Functional Dependency
• The attributes of a table are said to be dependent on each other when
one attribute of the table uniquely identifies another attribute of the
same table.
For example: Suppose we have a student table with the attributes Stu_Id,
Stu_Name, Stu_Age. Here the Stu_Id attribute uniquely identifies the
Stu_Name attribute of the student table, because if we know the student id
we can tell the student name associated with it. This is known as a
functional dependency and can be written as Stu_Id->Stu_Name; in
words, Stu_Name is functionally dependent on Stu_Id.
Example of Functional Dependency
• Formally:
If column A of a table uniquely identifies column B of the same table,
then this can be represented as A->B (attribute B is functionally dependent
on attribute A).
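
The definition can be turned into a small check over sample rows. This is a sketch only: the function name `fd_holds` and the student rows are invented for illustration, and a check against sample data can show that a dependency is violated but cannot prove that it holds for every legal state of the table.

```python
def fd_holds(rows, lhs, rhs):
    """Return True if no two rows agree on lhs but differ on rhs."""
    seen = {}
    for row in rows:
        key = tuple(row[a] for a in lhs)
        value = tuple(row[a] for a in rhs)
        # setdefault stores the first value seen for this key and returns it.
        if seen.setdefault(key, value) != value:
            return False
    return True


# Hypothetical sample data for the student table.
students = [
    {"Stu_Id": 1, "Stu_Name": "Ann", "Stu_Age": 20},
    {"Stu_Id": 2, "Stu_Name": "Bob", "Stu_Age": 21},
    {"Stu_Id": 3, "Stu_Name": "Ann", "Stu_Age": 22},
]

id_determines_name = fd_holds(students, ["Stu_Id"], ["Stu_Name"])
name_determines_id = fd_holds(students, ["Stu_Name"], ["Stu_Id"])
print(id_determines_name, name_determines_id)
```

Here Stu_Id->Stu_Name is not violated, while Stu_Name->Stu_Id is, because two students share the name "Ann".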
Determinant
Refers to the attribute, or group of attributes, on the left-hand side of
the arrow of a functional dependency. For example, in Stu_Id->Stu_Name
the determinant is Stu_Id.
Types of Functional Dependencies
• Trivial functional dependency
• Non-trivial functional dependency
• Multivalued dependency
• Transitive dependency
Trivial functional dependency
• The dependency of an attribute on a set of attributes is known as a trivial
functional dependency if the set of attributes includes that attribute.
• Symbolically: A->B is a trivial functional dependency if B is a subset of A.
• The following dependencies are also trivial: A->A and B->B.
• For example: Consider a table with two columns, Student_Id and
Student_Name.
• {Student_Id, Student_Name} -> Student_Id is a trivial functional
dependency, as Student_Id is a subset of {Student_Id,
Student_Name}. That makes sense because if we know the values of
Student_Id and Student_Name then the value of Student_Id can be
uniquely determined.
• Also, Student_Id -> Student_Id and Student_Name -> Student_Name are
trivial dependencies.
Non-trivial functional dependency
• If a functional dependency X->Y holds where Y is not a subset of
X, then the dependency is called a non-trivial functional dependency.
• If an FD X->Y holds where the intersection of X and Y is empty, then the
dependency is said to be a completely non-trivial functional dependency.
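
The three cases (trivial, non-trivial, completely non-trivial) depend only on the attribute sets, so they can be sketched with Python set operations. The function name `classify_fd` is hypothetical, and the function classifies a dependency that is assumed to hold; it does not check the data.

```python
def classify_fd(lhs, rhs):
    """Classify X->Y by comparing the attribute sets X (lhs) and Y (rhs)."""
    lhs, rhs = set(lhs), set(rhs)
    if rhs <= lhs:
        # Y is a subset of X.
        return "trivial"
    if lhs & rhs:
        # Y is not a subset of X, but the two sets share attributes.
        return "non-trivial"
    # X and Y have no attributes in common.
    return "completely non-trivial"


print(classify_fd({"Student_Id", "Student_Name"}, {"Student_Id"}))
print(classify_fd({"Student_Id"}, {"Student_Id", "Student_Name"}))
print(classify_fd({"Student_Id"}, {"Student_Name"}))
```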
Multivalued dependency
A multivalued dependency occurs when there are two or more
independent multivalued attributes in a table.
For example: Consider a bike manufacturing company that produces two colors (Black
and Red) of each model every year.
Here the columns manuf_year and color are independent of each other and dependent on
bike_model. In this case these two columns are said to be multivalued dependent on
bike_model. These dependencies can be represented like this:

bike_model ->> manuf_year
bike_model ->> color

bike_model  manuf_year  color
M1001       2007        Black
M1001       2007        Red
M2012       2008        Black
M2012       2008        Red
M2222       2009        Black
M2222       2009        Red
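
One way to test the two dependencies against this table is to check that, for each bike_model, every manuf_year appears with every color. The sketch below assumes that reading of "independent"; the function name `mvd_holds` and the extra 2008 row used as a counterexample are hypothetical.

```python
from itertools import product


def mvd_holds(rows, x, y, z):
    """Check X ->> Y, where z lists the attributes outside X and Y."""
    groups = {}
    for row in rows:
        key = tuple(row[a] for a in x)
        pair = (tuple(row[a] for a in y), tuple(row[a] for a in z))
        groups.setdefault(key, set()).add(pair)
    for pairs in groups.values():
        y_values = {p[0] for p in pairs}
        z_values = {p[1] for p in pairs}
        # Independence: every Y value must be paired with every Z value.
        if pairs != set(product(y_values, z_values)):
            return False
    return True


bikes = [
    {"bike_model": "M1001", "manuf_year": 2007, "color": "Black"},
    {"bike_model": "M1001", "manuf_year": 2007, "color": "Red"},
    {"bike_model": "M2012", "manuf_year": 2008, "color": "Black"},
    {"bike_model": "M2012", "manuf_year": 2008, "color": "Red"},
    {"bike_model": "M2222", "manuf_year": 2009, "color": "Black"},
    {"bike_model": "M2222", "manuf_year": 2009, "color": "Red"},
]

table_ok = mvd_holds(bikes, ["bike_model"], ["manuf_year"], ["color"])

# Hypothetical extra row: M1001 built in 2008 in Black only, with no Red row.
broken = bikes + [{"bike_model": "M1001", "manuf_year": 2008, "color": "Black"}]
broken_ok = mvd_holds(broken, ["bike_model"], ["manuf_year"], ["color"])

print(table_ok, broken_ok)
```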


Transitive dependency
• A functional dependency is said to be transitive if it is indirectly formed by
two functional dependencies.
• For example, X->Z is a transitive dependency if the following three
conditions hold:
• X->Y
• Y does not determine X (Y->X does not hold)
• Y->Z
• Note: A transitive dependency can only occur in a relation of three or more
attributes. Identifying transitive dependencies helps us normalize the
database to 3NF (3rd Normal Form).
• Example:

Book                Author               Author_age
Game of Thrones     George R. R. Martin  66
Harry Potter        J. K. Rowling        49
Dying of the Light  George R. R. Martin  66

• {Book} -> {Author} (if we know the book, we know the author's name)
• {Author} does not determine {Book} ({Author} -> {Book} does not hold)
• {Author} -> {Author_age}
• Therefore, by the rule of transitive dependency, {Book} -> {Author_age}
holds. That makes sense because if we know the book name we can
determine the author's age.
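
The three conditions can be checked against the book table with a short sketch. The helper `fd_holds` and the split into `book_author` and `author_age` are assumptions for illustration; the split is one common way to remove a transitive dependency, not a step stated in the source.

```python
def fd_holds(rows, lhs, rhs):
    """Return True if no two rows agree on lhs but differ on rhs."""
    seen = {}
    for row in rows:
        key = tuple(row[a] for a in lhs)
        value = tuple(row[a] for a in rhs)
        if seen.setdefault(key, value) != value:
            return False
    return True


books = [
    {"Book": "Game of Thrones", "Author": "George R. R. Martin", "Author_age": 66},
    {"Book": "Harry Potter", "Author": "J. K. Rowling", "Author_age": 49},
    {"Book": "Dying of the Light", "Author": "George R. R. Martin", "Author_age": 66},
]

book_to_author = fd_holds(books, ["Book"], ["Author"])       # X->Y
author_to_book = fd_holds(books, ["Author"], ["Book"])       # Y->X, should fail
author_to_age = fd_holds(books, ["Author"], ["Author_age"])  # Y->Z

is_transitive = book_to_author and not author_to_book and author_to_age
book_to_age = fd_holds(books, ["Book"], ["Author_age"])      # X->Z

# Assumed decomposition: store each author's age once instead of once per book.
book_author = {(r["Book"], r["Author"]) for r in books}
author_age = {(r["Author"], r["Author_age"]) for r in books}

print(is_transitive, book_to_age, len(author_age))
```

After the split, George R. R. Martin's age appears in a single row of `author_age` rather than in every row for one of his books.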
