Вы находитесь на странице: 1из 45

Database Systems

Lecture 06
Schema Refinement and
Normalization

Chapter 7, Database System Concepts, Silberschatz, Korth,


Sudarshan
Chapter 10, Database Systems; concepts, designs and
applications, S.K.Singh
Instructor: Sana Yousuf
Fatima Jinnah Women University

1
Goals of Normalization
Decide whether a particular relation R is in
good form.
In the case that a relation R is not in good
form, decompose it into a set of relations {R1,
R2, ..., Rn} such that
Each relation is in good form
The decomposition is a lossless-join decomposition

2
Decomposition
Suppose that relation R contains attributes
A1 ... An. A decomposition of R consists of
replacing R by two or more relations such
that:
Each new relation scheme contains a subset of the
attributes of R (and no attributes that do not
appear in R), and
Every attribute of R appears as an attribute of one
of the new relations.

3
Example of Redundancy

4
Decomposition

Eliminates Redundancy
To get back the original relation: Natural Join

5
Unnecessary Decomposition

Join returns original relation: Fine


No redundancy is removed and SID is stored twice: Unnecessary

6
Bad Decomposition

Association between CID and grade is lost


Join returns more rows than original
7
Database Normalization
Database normalization is the process of
removing redundant data from your tables in to
improve storage efficiency, data integrity, and
scalability.
In the relational model, methods exist for
quantifying how efficient a database is. These
classifications are called normal forms (or NF),
and there are algorithms for converting a given
database between them.
Normalization generally involves splitting existing
tables into multiple ones, which must be re-joined
or linked each time a query is issued.
History
Edgar F. Codd first proposed the process of
normalization and what came to be known
as the 1st normal form in his paper A
Relational Model of Data for Large Shared
Data Banks Codd stated:
There is, in fact, a very simple elimination
procedure which we shall call normalization.
Through decomposition nonsimple domains
are replaced by domains whose elements
are atomic (nondecomposable) values.
Normal Form
Edgar F. Codd originally established three
normal forms: 1NF, 2NF and 3NF. There are
now others that are generally accepted, but
3NF is widely considered to be sufficient for
most applications. Most tables when reaching
3NF are also in BCNF (Boyce-Codd Normal
Form).
Normalization

We discuss four normal forms: first, second, third, and Boyce-


Codd normal forms
1NF, 2NF, 3NF, and BCNF

Normalization is a process that improves a database design


by generating relations that are of higher normal forms.

The objective of normalization:

to
tocreate
createrelations
relationswhere
whereevery
everydependency
dependencyisison
onthe
thekey,
key,the
thewhole
wholekey,
key,
and
andnothing
nothingbut
butthe
thekey
key

11
Normalization

There is a sequence to normal forms:


1NF is considered the weakest,
2NF is stronger than 1NF,
3NF is stronger than 2NF, and
BCNF is considered the strongest

12
Normalization

1NF a relation in BCNF, is also in


3NF

2NF a relation in 3NF is also in


2NF

3NF a relation in 2NF is also in


1NF
BCNF

13
First Normal Form 1NF

We say a relation is in 1NF if all values stored in the


relation are single-valued and atomic
Values are simple and cannot be broken down further

EmpNum EmpPhone EmpDegrees


123 233-9876 BE, MS, PhD
333 233-1231 BA, BSc, PhD
679 233-1231 BSc, MSc

Is
Isthe
theabove
aboverelation
relationin
in INF?
INF?

14
First Normal Form 1NF
NewStu
Stuid lastName major credits status socSecNo
S1001 Smith History 90 Senior 100429500
S1003 Jones Math 95 Senior 010124567
S1006 Lee Math 15 Freshman 088520876
CSC

S1010 Burns English 63 Junior 099320985


Art

NewStu(StuId,
S1060 Jones
NewStu(StuId, lastName,
CSC 25major, Freshman
lastName, major, credits, status,
status, socSe
credits,064624738 socS

Assume students can have more than one major


The major attribute is not single-valued for each
tuple
15
First Normal Form 1NF

For
Foreach
eachmulti-valued
multi-valuedattribute,
attribute,create
createaanew
newtable,
table,ininwhich
whichyou
youplace
placethe
thekey
key
ofofthe
theoriginal
originaltable
tableand
andthe
themulti-valued
multi-valuedattribute.
attribute. Keep
Keepthetheoriginal
originaltable,
table,with
with
its
itskey
key

NewStu(StuId, lastName, major, credits, status, socSecNo)

NewStu2(stuId, lastName, credits,status,


socSecNo)
Majors(stuId, major)
Majors
NewStu2 stuId major
Stuid lastName credits status socSecNo
S1001 History
S1001 Smith 90 Senior 100429500 S1003 Math
S1003 Jones 95 Senior 010124567 S1006 CSC
S1006 Lee 15 Freshman 088520876 S1006 Math
S1010 Art
S1010 English
S1010 Burns 63 Junior 099320985 S1060 CSC
16
First Normal Form 1NF

Another Method

Flatten
Flattenthe
theoriginal
originaltable
tableby
bymaking
makingthe
themulti-valued
multi-valuedattribute
attributepart
partofofthe
thekey
key

Student(stuId, lastName, major, credits, status, socSecNo)

Stuid lastName major credits status socSecNo


S1001 Smith History 90 Senior 100429500
S1003 Jones Math 95 Senior 010124567
S1006 Lee CSC 15 Freshman 088520876

S1006 Lee Math 15 Freshman 088520876

NewStu Table Rewritten in 1NF, with {StuId, major} as primary key


S1010 Burns Art 63 Junior 099320985

17
First Normal Form 1NF

Employee

Unnormalized
emp_no name dept_no dept_name skills
1 Kevin Jacobs 201 R&D C, Perl, Java
2 Barbara Jones 224 IT Linux, Mac
3 Jake Rivera 201 R&D DB2, Oracle, Java

Employee

Normalized to INF
emp_no name dept_no dept_name skills
1 Kevin Jacobs 201 R&D C
1 Kevin Jacobs 201 R&D Perl
1 Kevin Jacobs 201 R&D Java
2 Barbara Jones 224 IT Linux
2 Barbara Jones 224 IT Mac
3 Jake Rivera 201 R&D DB2
3 Jake Rivera 201 R&D Oracle
3 Jake Rivera 201 R&D Java

18
Second Normal Form 2NF

Full Functional Dependency

CLASS
CLASS (Course#,
(Course#, StudID,
StudID, StuLastName,
StuLastName, FacID,
FacID, Sched,
Sched, Ro
R

Only one faculty member teaches each class

Course#, StudID StuLastName, FacID, Sched, Room,


Composite Key

We also have the following functional dependencies


Attributes
Attributesare
arefunctionally
Course# FacID, Sched, Room dependent
functionally
dependent on thecombination
on the combination
Course#
Course# and StudIDbut
and StudID butalso
StudID StuLastName functionally
functionally dependent onaa
dependent on
also
subset
subsetofofthat
thatcombination.
combination.
19
Second Normal Form 2NF

Full Functional Dependence

In
Inaarelation
relationR,
R,attribute
attributeBBofofRRisisfully
fullyfunctionally
functionallydependent
dependenton
onananattribute
attributeor
or
set
setofofattributes
attributesAAofofRRififBBisisfunctionally
functionallydependent
dependenton onAAbut
butnot
notfunctionally
functionally
dependent
dependenton onany
anyproper
propersubset
subsetofofAA

Course#, StudID StuLastName, FacID, Sched, Room,


Course# FacID, Sched, Room
(Partial FD)

StudID StuLastName
(Partial FD)

Grade
Gradeisisfully
fullyfunctionally
functionallydependent
dependenton
onCourse#,
Course#,StudID
StudID

20
Second Normal Form 2NF

AArelation
relationisisinin2NF
2NFififand
andonly
onlyififititisisinin1NF
1NFand
andall
allthe
thenonkey
nonkeyattributes
attributesare
arefully
fully
functionally
functionallydependent
dependenton onthe
thekey
key

If a relation is in 1NF and the key consists of a


single attribute:
The relation is automatically in 2NF

2NF Only when the key is composite

21
Second Normal Form 2NF

i.i. Identify
Identifyeach
eachpartial
partialFD
FD
ii.ii. Remove
Removethe
theattributes
attributesthat
thatdepend
dependon
oneach
eachofofthe
thedeterminants
determinantsso
so
identified
identified
iii.
iii. Place
Placethese
thesedeterminants
determinantsininseparate
separaterelations
relationsalong
alongwith
withtheir
theirdependent
dependent
attributes
attributes
iv.
iv. InInoriginal
originalrelation
relationkeep
keepthe
thecomposite
compositekey
keyand
andany
anyattributes
attributesthat
thatare
arefully
fully
functionally
functionallydependent
dependenton
onall
allofofitit
v.v. Even
Evenififthe
thecomposite
compositekey
keyhas
hasno
nodependent
dependentattributes,
attributes,keep
keepthat
thatrelation
relation
totoconnect
connectlogically
logicallythe
theothers
others

22
Second Normal Form 2NF

CLASS
CLASS (Course#,
(Course#, StudID,
StudID, StuName,
StuName, FacID,
FacID, Sched,
Sched, Room
Room
FDs grouped by determinant:
Course#, StudID StuLastName, FacID, Sched, Room, G
Course# FacID, Sched, Room
StudID StuLastName
Create tables grouped by determinants:

Student (StudID, StuLastName)


Course (Course#, FacID, Sched, Room)
Keep relation with original composite key, with attributes FD on it, if any

Class2 (Course#, StudID, Grade)


23
Second Normal Form 2NF
courseNo stuId stuLastName facId schedule room grade

ART103A S1001 Smith F101 MWF9 H221 A

ART103A S1010 Burns F101 MWF9 H221

ART103A S1006 Lee F101 MWF9 H221 B First Normal Form Relation
CSC201A S1003 Jones F105 TUTHF10 M110 A

CSC201A S1006 Lee F105 TUTHF10 M110 C

HST205A S1001 Smith F202 MWF11 H221

Course Student Class2


courseNo stuId grade stuId stuLastName courseNo facId schedule room

ART103A S1001 A S1001 Smith ART103A F101 MWF9 H221

ART103A S1010 S1010 Burns CSC201A F105 TUTHF10 M110

ART103A S1006 B S1006 Lee HST205A F202 MWF11 H221

CSC201A S1003 A S1003 Jones

CSC201A S1006 C

HST205A S1001
Second Normal Form Relations

24
Third Normal Form 3NF

Transitive Dependency
If A, B, and C are attributes of relation R, such that A B, and B
C, then C is transitively dependent on A
Example:

NewStudent
NewStudent (stuId,
(stuId, lastName,
lastName, major,
major, credits,
credits,
status)
status)

FD:creditsstatus
FD:creditsstatus

By
By transitivity:
transitivity:
stuIdcredits creditsstatus
stuIdcredits creditsstatus implies
implies
25
stuIdstatus
Third Normal Form 3NF

AArelation
relationisisin
in3NF
3NFifif itit isisin
in 2NF
2NF and
andno
no nonkey
nonkey attribute
attributeisis
transitively
transitivelydependent
dependenton onthe thekey
key

Example

NewStudent
NewStudent (stuId,
(stuId, lastName,
lastName, major,
major,
credits,
credits, status)
status)
With FD:creditsstatus

Is NOT in 3NF

26
Third Normal Form 3NF

NewStudent (stuId, lastName, major,


credits, status)
creditsstatus
Remove
Removethethedependent
dependentattribute,
attribute, status,
status,from
fromthe
therelation
relation
Create a new table with the dependent attribute and its
Create a new table with the dependent attribute and its
determinant,
determinant,credits
credits
Keep
Keepthe
thedeterminant
determinantin
inthe
theoriginal
originaltable
table

NewStu2 (stuId, lastName, major,


credits)
Stats (credits, status)

27
Third Normal Form 3NF
NewStudent
Stuid lastName Major Credits Status
S1001 Smith History 90 Senior
S1003 Jones Math 95 Senior
S1006 Lee CSC 15 Freshman

S1010 Burns Art 63 Junior

TransitiveJones
S1060
Dependency
CSC 25 Freshman

NewStu2 Stats
Stuid lastName Major Credits Credits Status
S1001 Smith History 90 15 Freshman
S1003 Jones Math 95
S1006 Lee CSC 15 25 Freshman
63 Junior

S1010 Burns Art 63 90 Senior


95 Senior

Removed
S1060 Transitive
Jones Dependency
CSC 25

28
Stages of Normalisation
Unnormalised
(UDF)
Remove repeating groups
First normal form
(1NF)
Remove partial dependencies
Second normal form
(2NF)
Remove transitive dependencies
Third normal form
(3NF)
Remove remaining functional
dependency anomalies
Boyce-Codd normal
form (BCNF)
Remove multivalued dependencies
Fourth normal form
(4NF)
Remove remaining anomalies
Fifth normal form
(5NF)
Unnormalised Normal Form (UNF)
Definition: A relation is unnormalised when it has not had any
normalisation rules applied to it, and it suffers from various anomalies.

This only tends to occur where the relation has been designed
using a bottom-up approach. i.e., the capturing of attributes to a
Universal Relation from a screen layout, manual report, manual
document, etc...
Recap up to 3NF

Example

inv_no line_no prod_no prod_desc qty

Is the relation in 2NF ? Since


Sinceprod_no
prod_noisisnot
notaacandidate
candidate
key
keyand
andwewehave:
have:
Is the relation in 3NF ?
prod_no
prod_no prod_desc
prod_desc

31
Recap up to 3NF

Example

ename ssn bdate address dnumber dname

Is the relation in 2NF ? Since


Sincednumber
dnumberisisnot
notaacandidate
candidatekey
keyand
and
we have:
we have:
Is the relation in 3NF ? dnumber
dnumber dname
dname

32
Boyce-Codd Normal Form

AArelation
relationisis in
inBCNF
BCNF ifif and
andonly
onlyififevery
everydeterminant
determinant isisaa
candidate
candidatekey key

For a relation with only one candidate key, 3NF


and BCNF are equivalent
We need to check for BNF if:
The candidate keys in the relation are composite keys
(that is, they are not single attributes)
There is more than one candidate key in the relation
The keys are not disjoint, that is, some attributes in the
keys are common

33
BCNF and 3NF

The difference between 3NF and BCNF is


that for a functional dependency A B,
3NF allows this dependency in a relation if
B is a primary-key attribute and A is not a
candidate key

BCNF insists that for this dependency to


remain in a relation, A must be a
candidate key

34
BCNF: Example

NewFac (facName, dept, office, rank, dateHired)

FDs:
office dept
facName,dept office, rank, dateHired Candidate Keys
facName,office dept, rank, dateHired
Primary Key
NewFac is not BCNF because office is not a candidate key

To make it BCNF, remove the dependent attributes to a new relation,


with the determinant as the key
Project into
Fac1 (office, dept)
Fac2 (facName, office, rank, dateHired)

Note we have lost a functional dependency in Fac2 no longer able


to see that {facName, dept} is a determinant, since they are in
different relations
35
Faculty
facName dept office rank dateHired
Adams Art A101 Professor 1975
Byrne Math M201 Assistant 2000
Davis Art A101 Associate 1992
Gordon Math M201 Professor 1982
Hughes Mth M203 Associate 1990
Smith CSC C101 Professor 1980
Smith History H102 Associate 1990
Tanaka CSC C101 Instructor 2001
Vaughn CSC C101 Associate 1995

Fac1 Fac2
office dept facName office rank dateHired
A101 Art Adams A101 Professor 1975

C101 CSC Byrne M201 Assistant 2000


C105 CSC Davis A101 Associate 1992

H102 History Gordon M201 Professor 1982


M201 Math Hughes M203 Associate 1990

M203 Math Smith C101 Professor 1980


Smith H102 Associate 1990
Tanaka C101 Instructor 2001
Vaughn C101 Associate 1995

36
Normalization: Example 1

Consider this Employee relation

EmpNum EmpName DeptNum DeptName

EmpName, DeptNum, and DeptName are non-key


attributes.
Is the relation in 2NF ?

Is the relation in 3NF ?

37
Normalization: Example 1

EmpNum EmpName DeptNum DeptName

We correct the situation by decomposing the original relation


into two 3NF relations.

EmpNum EmpName DeptNum DeptNum DeptName


Verify these two relations are in 3NF
Are they in BCNF?
38
Normalization: Example 2

Relation that stores information


about projects in large business

Work (projName, projMgr, empId, hours, empName, budget, startDate,


salary, empMgr, empDept, rating)

prijName projMgr empId hours empName budget startDate salary empMgr empDept rating

Jupiter Smith E101 25 Jones 100000 01/15/04 60000 Levine 10 9

Jupiter Smith E105 40 Adams 100000 01/15/04 55000 Jones 12

Jupiter Smith E110 10 Rivera 100000 01/15/04 43000 Levine 10 8

Maxima Lee E101 15 Jones 200000 03/01/04 60000 Levine 10

Maxima Lee E110 30 Rivera 200000 03/01/04 43000 Levine 10

Maxima Lee E120 15 Tanaka 200000 03/01/04 45000 Jones 15

39
Normalization: Example 2
1. Each project has a unique name.
2. Although project names are unique, names of employees and
managers are not.
3. Each project has one manager, whose name is stored in projMgr.
4. Many employees can be assigned to work on each project, and an
employee can be assigned to more than one project. The attribute
hours tells the number of hours per week a particular employee is
assigned to work on a particular project.
5. budget stores the amount budgeted for a project, and startDate
gives the starting date for a project.
6. salary gives the annual salary of an employee.
7. empMgr gives the name of the employees manager, who might not
be the same as the project manager.
8. empDept gives the employees department. Department names are
unique. The employees manager is the manager of the employees
department.
9. rating gives the employees rating for a particular project. The
project manager assigns the rating at the end of the employees
work on the project. 40
Normalization: Example 2
Functional dependencies
projName projMgr, budget, startDate
empId empName, salary, empMgr, empDept
projName, empId hours, rating
empDept empMgr
empMgr does not functionally determine empDept since people's
names were not unique (different managers may have same
name and manage different departments or a manager may
manage more than one department
projMgr does not determine projName
Primary Key
projName, empId since every member depends on that
combination

41
projName
projNameprojMgr,
projMgr,budget,
budget,startDate
startDate
empId
empId empName, salary, empMgr,em
empName, salary, empMgr, e
projName,
projName,empId
empIdhours,
hours,rating
rating

Normalization: Example 2
empDept
empDeptempMgr
empMgr

First Normal Form


Each cell is single valued, the relation Work is in 1NF

Second Normal Form


Pratial dependencies
projName projMgr, budget, startDate

empId empName, salary, empMgr, empDept

Transform to:
Proj
Proj(projName,
(projName,projMgr,
projMgr,budget,
budget,startDate)
startDate)
Emp
Emp(empId,
(empId,empName,
empName,salary,
salary,empMgr,
empMgr,
empDept)
empDept)
Work1
Work1(projName,
(projName,empId,
empId,hours,
hours,rating)
rating)

42
Proj
Proj(projName,
(projName,projMgr,
projMgr,budget,
budget,startDate)
startDate)
Emp (empId, empName, salary, empMgr,
Emp (empId, empName, salary, empMgr,
empDept)
empDept)
Work1
Work1(projName,
(projName,empId,
empId,hours,
hours,rating)
rating)

Normalization: Example 2
Second Normal Form
Proj Work1
prijName empId hours rating
prijName projMgr budget startDate
Jupiter E101 25 9
Jupiter Smith 100000 01/15/04
Jupiter E105 40
Maxima Lee 200000 03/01/04
Jupiter E110 10 8
Maxima E101 15
Maxima E110 30
Maxima E120 15
Emp
empId empName salary empMgr empDept
E101 Jones 60000 Levine 10
E105 Adams 55000 Jones 12
E110 Rivera 43000 Levine 10
E101 Jones 60000 Levine 10
E110 Rivera 43000 Levine 10
E120 Tanaka 45000 Jones 15

43
Proj
Proj(projName,
(projName,projMgr,
projMgr,budget,
budget,startDate)
startDate)
Emp
Emp (empId, empName, salary, empMgr,empDept)
(empId, empName, salary, empMgr, empDept)
Work1 (projName, empId, hours, rating)
Work1 (projName, empId, hours, rating)

Normalization: Example 2 empDept empM


empDept empM

Third Normal Form


Proj in 3NF no non-key atrribute functionally
determines another non-key attribute
Work1 in 3NF no transitive dependency involving
hours or rating
Emp not in 3NF transitive dependency
empDept empMgr
Need two relations:
Emp1 (empId, empName, salary, empDept)
Dep (empDept, empMgr)

44
Normalization: Example 2

Third Normal Form


Dept
empDept empMgr
empId empNam salary empDept 10 Levine
e 12 Jones
E101 Jones 60000 10
15 Jones
E105 Adams 55000 12
E110 Rivera 43000 10
E120 Tanaka 45000 15
Work1
prijName empId hours rating
Proj Jupiter E101 25 9
prijName projMgr budget startDate
Jupiter E105 40
Jupiter Smith 100000 01/15/04
Jupiter E110 10 8
Maxima Lee 200000 03/01/04 Maxima E101 15
Maxima E110 30
Maxima E120 15

This is also BCNF since the only determinant in each relation is


the primary key
45

Вам также может понравиться