
PRESENTATION ON NORMALIZATION
Supervised By:-
Mr. Swapnil Sir
Presented By:-
Priya Mittal
M.Sc.[CS]
CONTENTS
What is Normalization
History
Objective
Data Anomalies
Normal Forms
Advantages and Disadvantages
Conclusion
References



Normalization

Normalization is the process of restructuring the
logical data model of a database to eliminate
redundancy, organize data efficiently and reduce
the potential for anomalies during data operations.

Data normalization may also improve data consistency
and simplify future extension of the logical data model.

The objective of normalization is to create relations
where every dependency is on the key, the whole key,
and nothing but the key.

HISTORY


• Edgar F. Codd first proposed the process of
normalization, and what came to be known as the first
normal form, in his 1970 paper "A Relational Model of Data
for Large Shared Data Banks".
• Codd stated:
"There is, in fact, a very simple elimination procedure
which we shall call normalization. Through
decomposition non-simple domains are replaced by
domains whose elements are atomic (non-decomposable) values."
OBJECTIVE
• The main goal of database normalization
is to restructure the logical data model of
a database to:
  - Eliminate redundancy
  - Organize data efficiently
  - Reduce the potential for data anomalies.
Data Anomalies
Normalization is the process of removing potential
anomalies from the database design.

These anomalies include:

Insertion anomalies
Update anomalies
Deletion anomalies

Without normalization, it becomes difficult to
handle and update the database without facing data
loss. Insertion, update and deletion anomalies
are very frequent if the database is not normalized.

To understand these anomalies let us take an
example of Student table.



S_id S_Name S_Address Subject_opted
401 Adam Noida Bio
402 Alex Panipat Maths
403 Stuart Jammu Maths
404 Adam Noida Physics
Update anomaly: To update the address of a student who occurs two
or more times in the table, we have to update the S_Address
column in all of those rows; otherwise the data becomes inconsistent.
Insertion anomaly: Suppose that for a new admission we have the student's
id (S_id), name and address, but the student has not yet opted for
any subject. Then we have to insert NULL in Subject_opted, which is an
insertion anomaly.
Deletion anomaly: If student 401 (S_id) has only one subject and temporarily
drops it, deleting that row deletes the entire student record
along with it.
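The three anomalies can be reproduced on the Student table with a few lines of Python. This is a minimal sketch, not part of the original slides: it follows the slide in treating both "Adam" rows as the same student (so it groups addresses by S_Name), and the new address "Delhi" and the new student "Ravi"/"Agra"/405 are hypothetical values chosen for illustration.

```python
# The Student table from the slide: (S_id, S_Name, S_Address, Subject_opted)
students = [
    (401, "Adam", "Noida", "Bio"),
    (402, "Alex", "Panipat", "Maths"),
    (403, "Stuart", "Jammu", "Maths"),
    (404, "Adam", "Noida", "Physics"),
]


def update_address_first_row_only(rows, name, new_address):
    """A careless update that changes S_Address in only the first row for `name`."""
    result, updated = [], False
    for s_id, s_name, address, subject in rows:
        if s_name == name and not updated:
            result.append((s_id, s_name, new_address, subject))
            updated = True
        else:
            result.append((s_id, s_name, address, subject))
    return result


def addresses_of(rows, name):
    """All distinct addresses recorded for a student name, sorted."""
    return sorted({address for _, s_name, address, _ in rows if s_name == name})


# Update anomaly: "Delhi" is a hypothetical new address for Adam.
after_update = update_address_first_row_only(students, "Adam", "Delhi")
print(addresses_of(after_update, "Adam"))  # ['Delhi', 'Noida']: inconsistent

# Insertion anomaly: a hypothetical new student with no subject yet
# can only be stored with None (NULL) in Subject_opted.
after_insert = students + [(405, "Ravi", "Agra", None)]

# Deletion anomaly: deleting the only subject row of S_id 401 also
# deletes the only record of that student's name and address.
after_delete = [row for row in students if row[0] != 401]
print(any(row[0] == 401 for row in after_delete))  # False
```

Storing each student's address in its own table, keyed by the student, would turn the update into a single write, which is what the normal forms that follow aim for.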

Normal Forms
• Edgar F. Codd originally established three
normal forms: 1NF, 2NF and 3NF. Others are
now generally accepted as well, but 3NF is widely
considered sufficient for most applications. Most
tables that reach 3NF are also in
BCNF (Boyce-Codd Normal Form).
Normalization
[Figure: nested normal forms, with BCNF inside 3NF inside 2NF inside 1NF]
A relation in BCNF is also in 3NF.
A relation in 3NF is also in 2NF.
A relation in 2NF is also in 1NF.


I am going to show you a simple E-R model of a
database.



Functional Dependencies

We say an attribute B has a functional dependency on
another attribute A if any two records that have
the same value for A must also have the same value
for B. We illustrate this as:
A -> B
Example: Suppose we keep track of employee email addresses,
and we only track one email address for each employee. Suppose
each employee is identified by their unique employee number. We
say there is a functional dependency of email address on employee
number:

employee number -> email address
Functional Dependencies
EmpNum EmpEmail EmpFname EmpLname
123 jdoe@abc.com John Doe
456 psmith@abc.com Peter Smith
555 alee1@abc.com Alan Lee
633 pdoe@abc.com Peter Doe
787 alee2@abc.com Alan Lee
If EmpNum is the PK, then the FDs
EmpNum -> EmpEmail
EmpNum -> EmpFname
EmpNum -> EmpLname
must exist.
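Whether a given set of rows is consistent with an FD can be checked directly from the definition: no two rows may agree on the left-hand side but differ on the right-hand side. The sketch below uses the employee table from the slide; the function name `fd_holds` is hypothetical. Note that a sample of data can only refute an FD. Passing the check means the rows are consistent with it, while the FD itself comes from the meaning of the data.

```python
employees = [
    {"EmpNum": 123, "EmpEmail": "jdoe@abc.com", "EmpFname": "John", "EmpLname": "Doe"},
    {"EmpNum": 456, "EmpEmail": "psmith@abc.com", "EmpFname": "Peter", "EmpLname": "Smith"},
    {"EmpNum": 555, "EmpEmail": "alee1@abc.com", "EmpFname": "Alan", "EmpLname": "Lee"},
    {"EmpNum": 633, "EmpEmail": "pdoe@abc.com", "EmpFname": "Peter", "EmpLname": "Doe"},
    {"EmpNum": 787, "EmpEmail": "alee2@abc.com", "EmpFname": "Alan", "EmpLname": "Lee"},
]


def fd_holds(rows, lhs, rhs):
    """True if no two rows agree on every attribute in `lhs`
    while disagreeing on some attribute in `rhs`."""
    seen = {}
    for row in rows:
        key = tuple(row[a] for a in lhs)
        value = tuple(row[a] for a in rhs)
        # setdefault stores the first value seen for this key and returns it
        if seen.setdefault(key, value) != value:
            return False
    return True


print(fd_holds(employees, ["EmpNum"], ["EmpEmail", "EmpFname", "EmpLname"]))  # True
print(fd_holds(employees, ["EmpFname"], ["EmpLname"]))  # False: Peter Smith, Peter Doe
print(fd_holds(employees, ["EmpFname", "EmpLname"], ["EmpEmail"]))  # False: two Alan Lees
```

The last check shows why the employee's name cannot serve as the key: two different employees share the name Alan Lee.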
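Functional Dependencies
3 different ways you might see FDs depicted:
1. EmpNum -> EmpEmail, EmpNum -> EmpFname, EmpNum -> EmpLname
2. EmpNum -> EmpEmail, EmpFname, EmpLname (one determinant, several dependents)
3. [Diagram: arrows drawn from EmpNum to the other attributes in EmpNum EmpEmail EmpFname EmpLname]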
Determinant
Functional dependency: EmpNum -> EmpEmail
The attribute on the LHS is known as the determinant:
EmpNum is a determinant of EmpEmail, and
EmpEmail is the attribute it determines.
Likewise, in Marks -> Grade, Marks is the determinant and Grade is the attribute it determines.

REMARK: In the RESULT relation, the Marks attribute is not a key attribute, yet it is a determinant (Marks -> Grade).
Hence key attributes are determinants, but not all determinants are key attributes.

Full functional dependency: In the above example, Marks is fully
functionally dependent on Student#Course# and not on any subset of
Student#Course#.
This means Marks cannot be determined by either Student# or Course#
alone; it can be determined only by using Student# and Course# together. Hence
Marks is fully functionally dependent on Student#Course#.

• CourseName is not fully functionally dependent on Student#Course#, because one of its subsets, Course#,
determines CourseName by itself, and Student# plays no role in deciding CourseName.







REMARK: Formal definition of full functional dependency: In a given
relation R, let X and Y be attributes. Y is fully functionally dependent
on X if X -> Y holds and Y is not functionally dependent on any proper subset of X.
X may be composite in nature.



Transitive dependency
Consider attributes A, B, and C, where
A -> B and B -> C.
Functional dependencies are transitive, which
means that we also have the functional dependency
A -> C.
We say that C is transitively dependent on A
through B.
Transitive Dependency: In the above example,
Room# depends on Instructor Name (IName), which in turn
depends on Course#. Hence Room# is transitively
dependent on Course#.
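Following transitive dependencies by hand gets tedious as the number of FDs grows. The standard attribute-closure algorithm (a textbook technique, not something shown in these slides) repeatedly applies every FD whose left-hand side is already covered until nothing new is added. A minimal Python sketch, assuming the FDs Course# -> IName and IName -> Room# from the example:

```python
def closure(attrs, fds):
    """Attribute closure X+: every attribute functionally determined by `attrs`
    under the FDs in `fds`, given as (lhs_set, rhs_set) pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            # apply an FD once its whole left-hand side is already determined
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


# FDs from the slide's example (IName = instructor name).
fds = [
    ({"Course#"}, {"IName"}),
    ({"IName"}, {"Room#"}),
]

print(sorted(closure({"Course#"}, fds)))  # ['Course#', 'IName', 'Room#']
```

Room# appears in the closure of Course# even though no FD links them directly, which is exactly the transitive dependency described above.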








• Similarly, Grade depends on Marks, which in turn
depends on Student#Course#; hence Grade depends fully,
but transitively, on Student#Course#.





Partial Dependency: In the above relation, CourseName, IName and Room#
are partially dependent on the composite attribute Student#Course#, because
Course# alone can determine CourseName, IName and Room#.










• Formal definition of partial dependency: In a given relation R, let X and Y be
attributes. Attribute Y is partially dependent on X if X -> Y holds and Y is also
functionally dependent on some proper subset of X. X must therefore be composite.
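The full and partial definitions can be applied mechanically: compute what each proper subset of the composite key determines, and call a dependency partial if any subset already determines the attribute. The Python sketch below is an illustration, not the author's method; it assumes the FDs stated in the slides (Student#Course# -> Marks, Course# -> CourseName and IName, IName -> Room#, Marks -> Grade), and the helper name `dependency_kind` is hypothetical.

```python
from itertools import combinations


def closure(attrs, fds):
    """Attribute closure: every attribute determined by `attrs` under `fds`."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def dependency_kind(key, attr, fds):
    """'partial' if a proper subset of the composite `key` determines `attr`,
    'full' if only the whole key does, None if the key does not determine it."""
    if attr not in closure(key, fds):
        return None
    for size in range(1, len(key)):
        for subset in combinations(sorted(key), size):
            if attr in closure(subset, fds):
                return "partial"
    return "full"


key = {"Student#", "Course#"}
fds = [
    ({"Student#", "Course#"}, {"Marks"}),
    ({"Course#"}, {"CourseName", "IName"}),
    ({"IName"}, {"Room#"}),
    ({"Marks"}, {"Grade"}),
]
for attr in ["Marks", "Grade", "CourseName", "IName", "Room#"]:
    print(attr, dependency_kind(key, attr, fds))
```

With these FDs, Marks and Grade come out as full dependencies (Grade only transitively), while CourseName, IName and Room# come out as partial, matching the classification in the slides.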


First Normal Form
First normal form (1NF) lays the groundwork for
an organized database design:
Ensure that each table has a primary key: a minimal
set of attributes that can uniquely identify a
record.
Eliminate repeating groups (categories of data which
would seem to be required a different number of
times on different records) by defining keyed and
non-keyed attributes appropriately.
Atomicity: Each attribute must contain a single
value, not a set of values.

First Normal Form
The following is not in 1NF:
EmpNum  EmpPhone  EmpDegrees
123     233-9876
333     233-1231  BA, BSc, PhD
679     233-1231  BSc, MSc
EmpDegrees is a multi-valued field:
employee 679 has two degrees: BSc and MSc
employee 333 has three degrees: BA, BSc, PhD
First Normal Form
Employee
EmpNum  EmpPhone
123     233-9876
333     233-1231
679     233-1231

EmployeeDegree
EmpNum  EmpDegree
333     BA
333     BSc
333     PhD
679     BSc
679     MSc

An outer join between Employee and EmployeeDegree will
produce the information we saw before.
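The 1NF decomposition and the outer join can be tried out with Python's built-in sqlite3 module. This is a sketch under stated assumptions: the column types and the composite primary key on EmployeeDegree are choices made here for illustration, not taken from the slides. The code first splits the multi-valued EmpDegrees field into one atomic value per row, then rebuilds the original view with a LEFT OUTER JOIN.

```python
import sqlite3

# Un-normalized rows from the slide: EmpDegrees holds a comma-separated list.
raw = [
    (123, "233-9876", ""),
    (333, "233-1231", "BA, BSc, PhD"),
    (679, "233-1231", "BSc, MSc"),
]

# 1NF: phone stays in Employee; each degree becomes its own EmployeeDegree row.
employee = [(num, phone) for num, phone, _ in raw]
employee_degree = [
    (num, degree.strip())
    for num, _, degrees in raw
    for degree in degrees.split(",")
    if degree.strip()  # employee 123 has no degrees, so contributes no rows
]

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE Employee (EmpNum INTEGER PRIMARY KEY, EmpPhone TEXT)")
con.execute(
    "CREATE TABLE EmployeeDegree ("
    " EmpNum INTEGER REFERENCES Employee(EmpNum),"
    " EmpDegree TEXT,"
    " PRIMARY KEY (EmpNum, EmpDegree))"
)
con.executemany("INSERT INTO Employee VALUES (?, ?)", employee)
con.executemany("INSERT INTO EmployeeDegree VALUES (?, ?)", employee_degree)

# The outer join keeps every employee, including 123, whose degree comes back as NULL (None).
rows = con.execute(
    "SELECT e.EmpNum, e.EmpPhone, d.EmpDegree"
    " FROM Employee e LEFT OUTER JOIN EmployeeDegree d ON e.EmpNum = d.EmpNum"
    " ORDER BY e.EmpNum, d.EmpDegree"
).fetchall()
for row in rows:
    print(row)
```

An inner join would silently drop employee 123, which is why the slide specifies an outer join.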
Second Normal Form
A relation is in 2NF if it is in 1NF and every non-key
attribute is fully dependent on each candidate key. (That
is, there is no partial functional dependency.)
Second normal form (2NF): if a table has a
composite key, all non-key attributes must depend on the
whole key:
No partial dependency exists between non-key
attributes and key attributes.
The database must meet all the requirements of
first normal form.
Data that is redundantly duplicated across
multiple rows of a table is moved out to a separate
table.


Continued
• To remove these partial dependencies, we split the
Student_Course_Result table into four separate tables,
STUDENT, COURSE, RESULT and EXAM_DATE,
as shown in the figure.

1. In the first table (STUDENT), the key attribute is Student#, and the
non-key attributes StudentName and DateOfBirth are fully
functionally dependent on it.

2. In the second table (COURSE), Course# is the key attribute, and
the non-key attributes CourseName and DurationInDays are fully
functionally dependent on it.

3. In the third table (RESULT), Student# and Course# together form the key,
and the non-key attributes Marks and Grade are fully functionally
dependent on this composite key.

4. In the fourth table (EXAM_DATE), Course# is the key attribute, and the
non-key attribute DateOfExam is fully functionally dependent on the
key attribute.
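The four 2NF tables translate naturally into SQL DDL with primary and foreign keys. The following is a minimal sketch using Python's sqlite3 module; the column types, the foreign keys, and the sample values ('S1', 'Asha', 'C1', 'DBMS', 90, and so on) are assumptions for illustration, since the slides list only attribute names. Names containing # are kept by quoting them as SQL identifiers.

```python
import sqlite3

# Column types and foreign keys are assumptions; the slides give only attribute names.
SCHEMA = """
CREATE TABLE STUDENT (
    "Student#"     TEXT PRIMARY KEY,
    StudentName    TEXT,
    DateOfBirth    TEXT
);
CREATE TABLE COURSE (
    "Course#"      TEXT PRIMARY KEY,
    CourseName     TEXT,
    DurationInDays INTEGER
);
CREATE TABLE RESULT (
    "Student#"     TEXT REFERENCES STUDENT("Student#"),
    "Course#"      TEXT REFERENCES COURSE("Course#"),
    Marks          INTEGER,
    Grade          TEXT,
    PRIMARY KEY ("Student#", "Course#")
);
CREATE TABLE EXAM_DATE (
    "Course#"      TEXT PRIMARY KEY REFERENCES COURSE("Course#"),
    DateOfExam     TEXT
);
"""

con = sqlite3.connect(":memory:")
con.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled
con.executescript(SCHEMA)

# Hypothetical sample data: student and course details are stored exactly once.
con.execute("INSERT INTO STUDENT VALUES ('S1', 'Asha', '2000-01-15')")
con.execute("INSERT INTO COURSE VALUES ('C1', 'DBMS', 90)")
con.execute("INSERT INTO RESULT VALUES ('S1', 'C1', 78, 'B')")

try:
    # A second result for the same (Student#, Course#) violates the RESULT key.
    con.execute("INSERT INTO RESULT VALUES ('S1', 'C1', 85, 'A')")
except sqlite3.IntegrityError as exc:
    print("rejected:", exc)
```

Because CourseName now lives only in COURSE, renaming a course is a single-row update rather than one update per enrolled student.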

Third Normal Form

Third normal form (3NF) requires that data stored
in a table be dependent only on the primary key, and
not on any other field in the table.
The database must meet all the requirements of the
second normal form.
Any field which is dependent not only on the
primary key but also on another field is moved out to
a separate table.
All non-key attributes depend non-transitively on the
primary key.

A relation R is said to be in 3NF if and
only if
It is in 2NF
No transitive dependency exists between non-key attributes
and key attributes.

In the above RESULT table, Student# and Course# are the key
attributes. All other attributes except Grade are fully and
non-transitively dependent on the key attributes. The Grade
attribute depends on Marks, and Marks in turn depends on
Student#Course#. To bring the table into 3NF we need to remove
this transitive dependency.
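The 3NF condition can be tested per dependency: X -> A violates 3NF when X is not a superkey and A is not a prime attribute (part of some candidate key). The Python sketch below applies this standard textbook test, which is not taken from the slides, to the RESULT table; it assumes the only FDs are Student#Course# -> Marks and Marks -> Grade, and the function name `third_nf_violations` is hypothetical.

```python
def closure(attrs, fds):
    """Attribute closure: every attribute determined by `attrs` under `fds`."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def third_nf_violations(attributes, candidate_keys, fds):
    """FDs X -> A (A not in X) that break 3NF: X is not a superkey
    and A is not a prime attribute (not part of any candidate key)."""
    prime = set().union(*candidate_keys)
    violations = []
    for lhs, rhs in fds:
        if closure(lhs, fds) >= attributes:
            continue  # X is a superkey, so X -> A is allowed
        for attr in sorted(rhs - lhs):
            if attr not in prime:
                violations.append((sorted(lhs), attr))
    return violations


attributes = {"Student#", "Course#", "Marks", "Grade"}
fds = [
    ({"Student#", "Course#"}, {"Marks"}),
    ({"Marks"}, {"Grade"}),
]
print(third_nf_violations(attributes, [{"Student#", "Course#"}], fds))
# [(['Marks'], 'Grade')]
```

The reported dependency Marks -> Grade is the one to move out. One possible decomposition, assumed here rather than taken from the slides, keeps RESULT(Student#, Course#, Marks) and puts (Marks, Grade) in a separate table.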


Boyce-Codd Normal Form (BCNF)

• A relation is in BCNF if and only if every
determinant is a candidate key.

• The difference between 3NF and BCNF is that, for
a functional dependency A -> B, 3NF allows this
dependency in a relation if B is a prime attribute
(part of a candidate key) even when A is not a candidate key,

• whereas BCNF insists that for this dependency to
remain in a relation, A must be a candidate key.


The ClientInterview relation has the following functional dependencies:

FD1: clientNo, interviewDate -> interviewTime, staffNo, roomNo (Primary Key)

FD2: staffNo, interviewDate, interviewTime -> clientNo (Candidate key)

FD3: roomNo, interviewDate, interviewTime -> clientNo, staffNo (Candidate key)

FD4: staffNo, interviewDate -> roomNo (not a candidate key)

• Because the determinant of FD4 is not a candidate key, the ClientInterview relation is not in BCNF and may suffer from update anomalies.

• For example, two tuples have to be updated if the roomNo needs to be changed for staffNo SG5 on
13-May-02.


• To transform the ClientInterview relation to BCNF, we must
remove the violating functional dependency by creating two new
relations called Interview and StaffRoom as shown below:
1. Interview (clientNo, interviewDate, interviewTime, staffNo)
2. StaffRoom (staffNo, interviewDate, roomNo)
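The difference between 3NF and BCNF shows up clearly when both tests are run on the ClientInterview FDs. In this Python sketch the FDs and candidate keys are taken from the slide's labels; the superkey test via attribute closure and the 3NF "prime attribute" condition are standard textbook definitions applied here for illustration.

```python
def closure(attrs, fds):
    """Attribute closure: every attribute determined by `attrs` under `fds`."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


attributes = {"clientNo", "interviewDate", "interviewTime", "staffNo", "roomNo"}
fds = {
    "FD1": ({"clientNo", "interviewDate"}, {"interviewTime", "staffNo", "roomNo"}),
    "FD2": ({"staffNo", "interviewDate", "interviewTime"}, {"clientNo"}),
    "FD3": ({"roomNo", "interviewDate", "interviewTime"}, {"clientNo", "staffNo"}),
    "FD4": ({"staffNo", "interviewDate"}, {"roomNo"}),
}
# Candidate keys as labelled on the slide: the determinants of FD1, FD2 and FD3.
candidate_keys = [fds[name][0] for name in ("FD1", "FD2", "FD3")]
prime = set().union(*candidate_keys)  # attributes that belong to some candidate key

results = {}
for name, (lhs, rhs) in fds.items():
    is_superkey = closure(lhs, fds.values()) >= attributes
    bcnf_ok = is_superkey                              # BCNF: the determinant must be a key
    third_nf_ok = is_superkey or (rhs - lhs) <= prime  # 3NF also accepts prime dependents
    results[name] = (bcnf_ok, third_nf_ok)
    print(name, "BCNF ok" if bcnf_ok else "violates BCNF",
          "|", "3NF ok" if third_nf_ok else "violates 3NF")
```

Only FD4 is reported as violating BCNF, and it still passes the 3NF test because roomNo belongs to the candidate key of FD3. That is why ClientInterview is in 3NF but has to be split into Interview and StaffRoom to reach BCNF.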




Advantages

Greater overall database organization
Reduction of redundant data
Data consistency within the database
A much more flexible database design
A better handle on database security
Enforces the concept of referential integrity



Disadvantages

There is one substantial drawback of a normalized
database: reduced database performance.

The factors compromised include CPU usage, memory
usage, and input/output (I/O). In other words, because
queries must often join data spread across several tables, a
normalized database can require more CPU,
memory, and I/O to process transactions and database
queries.
Conclusion
• We have seen how database
normalization can decrease redundancy,
increase efficiency and reduce anomalies
by implementing three of the seven
levels of normalization, called normal
forms, along with BCNF. The first three NFs are usually
sufficient for most small to medium size
applications.
References
• Hillyer, Mike (MySQL AB). An Introduction to Database
Normalization. http://dev.mysql.com/tech-resources/articles/intro-to-normalization.html,
accessed October 17, 2006.


• Microsoft. Description of the database normalization
basics. http://support.microsoft.com/kb/283878, accessed
October 17, 2006.


• Wikipedia. Database Normalization.
http://en.wikipedia.org/wiki/Database_normalization.html,
accessed October 17, 2006.


May 2012 91.2814 40
