
Week 8

Normalization

Learning Outcomes
• Describe the process, strengths and weaknesses of Data Normalisation, and so demonstrate an understanding of when and to what extent the technique should be applied in practice
• Explain and apply the concepts of Functional Dependency and Determinant through the understanding and construction of Determinacy Diagrams
• Describe and apply understanding of three Normal Forms for relations:
– convert un-normalised data into First Normal Form relations, so that data items contain only single, simple values
– derive Second Normal Form relations by eliminating part-key dependencies
– derive Third Normal Form relations by removing transitive dependencies
• Convert a set of relations to Boyce-Codd Normal Form


Functional Dependencies and Normalization for Relational Databases

Each relation schema consists of a number of attributes (a grouping of attributes)

Each relational database consists of a number of relation schemas

We need a formal measure of why one grouping of attributes into a relation schema may be better than another

Formal measure: Normalization


The purpose of normalization is to avoid bad schema designs.
Specific characteristics to be avoided are:

• Highly redundant data

• Excessive NULL values in tuples

• Inability to uniquely identify records

• Insertion, deletion and modification anomalies

• Lossy designs
Redundant Data

Storing the department details with every student record results in high redundancy for the Dept_ID, Dept_Name and Chair attributes.

• These values should only need to be stored once for all students in the same department.
Null Values
• Keep the number of NULL values in tuples to a minimum
– Students not assigned to any department leave the department attributes NULL, wasting storage space
– NULLs also cause other difficulties, e.g. with aggregate operations (such as COUNT) and joins

Anomalies - Modification

• A modification anomaly occurs if we change the chair of a department in only some of its records: the department ends up with more than one chair. Below, the chair of department 3 has been changed from E3 to E1 in only one of its records.

Dept_ID  Dept_Name  Chair  Student_ID  Student_Name  Address
1        D1         E1     12          ABC           A1
1        D1         E1     13          DEF           A2
2        D2         E2     21          IJK           A3
1        D1         E1     14          LMO           A4
3        D3         E1     31          PQR           A5
3        D3         E3     98          STU           A6

Department 3 now has two different chairs (E1 and E3), and E1 appears as chair of both D1 and D3.
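As a minimal sketch of how this anomaly arises, the Python below holds the table as a list of dictionaries (an illustrative representation, not a DBMS; Student_Name and Address are omitted) and writes the new chair E1 into only one of department 3's records. The helper `chairs_by_dept` is hypothetical; it groups the recorded chairs by department so the inconsistency becomes visible.

```python
# Dept_students as a list of dicts; only the columns needed here are kept.
rows = [
    {"Dept_ID": 1, "Dept_Name": "D1", "Chair": "E1", "Student_ID": 12},
    {"Dept_ID": 1, "Dept_Name": "D1", "Chair": "E1", "Student_ID": 13},
    {"Dept_ID": 2, "Dept_Name": "D2", "Chair": "E2", "Student_ID": 21},
    {"Dept_ID": 1, "Dept_Name": "D1", "Chair": "E1", "Student_ID": 14},
    {"Dept_ID": 3, "Dept_Name": "D3", "Chair": "E3", "Student_ID": 31},
    {"Dept_ID": 3, "Dept_Name": "D3", "Chair": "E3", "Student_ID": 98},
]


def chairs_by_dept(table):
    """Collect every Chair value recorded for each Dept_ID."""
    result = {}
    for record in table:
        result.setdefault(record["Dept_ID"], set()).add(record["Chair"])
    return result


# Partial update: the new chair is written to only one of department 3's records.
for record in rows:
    if record["Student_ID"] == 31:
        record["Chair"] = "E1"

# Any department with more than one chair is now inconsistent (Dept_ID -> Chair is broken).
inconsistent = {dept: c for dept, c in chairs_by_dept(rows).items() if len(c) > 1}
print(inconsistent)
```

Because the chair is stored once per student rather than once per department, nothing in this structure prevents the partial update.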


Anomalies - Deletion

Dept_ID  Dept_Name  Chair  Student_ID  Student_Name  Address
1        D1         E1     12          ABC           A1
1        D1         E1     13          DEF           A2
2        D2         E2     21          IJK           A3
1        D1         E1     14          LMO           A4
3        D3         E3     31          PQR           A5
3        D3         E3     98          STU           A6

Deleting student 21 also deletes the only record of department 2, so the data about department 2 (name D2, chair E2) is lost as well.
Anomalies - Insertion

Dept_ID  Dept_Name  Chair  Student_ID  Student_Name  Address
1        D1         E1     12          ABC           A1
1        D1         E1     13          DEF           A2
2        D2         E2     21          IJK           A3
1        D1         E1     14          LMO           A4
3        D3         E3     31          PQR           A5
3        D3         E3     98          STU           A6
4        D4         E4     NULL        NULL          NULL

Creating a new department that has no students yet forces NULL values into the student attributes.
Lossy Design

• To solve the problems described above, we can decompose the relation into multiple relations.

– Care must be taken not to lose information in the process; a decomposition that loses information is called a Lossy Decomposition

– We should achieve a Lossless Decomposition

Consider the following database schema:

Dept_students(Dept_ID, Dept_name, Chair, Student_ID, Student_name, Address)

Decomposed into:

Department(Dept_ID, Dept_name, Chair)

Student(Student_ID, Student_name, Address)

This kind of design
• eliminates the need for NULL values
• gets rid of the types of anomalies we have come across

BUT
we have LOST the information about the relationship between each student and the department in which the student is enrolled
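The loss can be checked mechanically. This sketch assumes the first three rows of the Dept_students table and a hypothetical `project` helper; it projects the relation onto Department and Student and joins them again. With no common attribute the join degenerates into a Cartesian product, so it produces spurious tuples. The last lines assume one common remedy that is not stated on this slide: keeping Dept_ID in Student as a foreign key, after which the join reproduces the original relation exactly.

```python
from itertools import product

dept_students = [
    # (Dept_ID, Dept_name, Chair, Student_ID, Student_name, Address)
    (1, "D1", "E1", 12, "ABC", "A1"),
    (1, "D1", "E1", 13, "DEF", "A2"),
    (2, "D2", "E2", 21, "IJK", "A3"),
]


def project(rows, idx):
    """Relational projection: keep the listed column positions, drop duplicates."""
    return {tuple(r[i] for i in idx) for r in rows}


department = project(dept_students, (0, 1, 2))  # Department(Dept_ID, Dept_name, Chair)
student = project(dept_students, (3, 4, 5))     # Student(Student_ID, Student_name, Address)

# No common attribute, so joining back is a Cartesian product.
rejoined = {d + s for d, s in product(department, student)}
spurious = rejoined - set(dept_students)
print(len(rejoined), len(spurious))  # 6 3

# Assumed remedy: Student keeps Dept_ID as a foreign key (stored last here).
student_fk = project(dept_students, (3, 4, 5, 0))
lossless = {d + s[:3] for d in department for s in student_fk if s[3] == d[0]}
print(lossless == set(dept_students))  # True
```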
• Top-down database design: mini-world -> requirements -> conceptual schema (entities E1 and E2 linked by relationship R) -> relation schemas
Functional Dependencies
• Dependencies between different attributes, expressed through determinants
• Direct dependency
• Indirect (transitive) dependency
• Partial dependency
Determinant
– X is a determinant of Y if each value of X has precisely one (possibly NULL) associated value of Y
– Equivalently: X is a determinant of Y if and only if, whenever two tuples agree on their X value, they also agree on their Y value
– We say X determines Y, or Y is functionally dependent on X
– Formal definition:
• A functional dependency, denoted by X -> Y, between two sets of attributes X and Y that are subsets of R specifies a constraint on the possible tuples that can form a relation state r of R. The constraint is that, for any two tuples t1 and t2 in r that have t1[X] = t2[X], we must also have t1[Y] = t2[Y]
– A determinacy diagram shows the dependencies visually, as an arrow from X to Y (X -> Y)
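The formal definition translates almost directly into a test on a relation state. In the sketch below, the function name `fd_holds` and the sample rows are illustrative assumptions: it remembers, for each X value, the first Y value seen with it and fails as soon as two tuples agree on X but differ on Y. A sample of data can only refute a dependency; whether X -> Y holds is a constraint on every legal state of R, decided from the meaning of the attributes.

```python
from typing import Iterable, Mapping, Sequence


def fd_holds(rows: Iterable[Mapping[str, object]], X: Sequence[str], Y: Sequence[str]) -> bool:
    """Return True if X -> Y holds in this relation state: any two tuples
    that agree on X also agree on Y."""
    seen = {}
    for t in rows:
        x_val = tuple(t[a] for a in X)
        y_val = tuple(t[a] for a in Y)
        # The first tuple with this X value fixes the Y value it must map to.
        if seen.setdefault(x_val, y_val) != y_val:
            return False
    return True


performers = [
    {"PerID": 100, "PerType": "S", "Fee": 75},
    {"PerID": 101, "PerType": "D", "Fee": 100},
    {"PerID": 104, "PerType": "S", "Fee": 75},
]

print(fd_holds(performers, ["PerID"], ["PerType", "Fee"]))  # True
print(fd_holds(performers, ["Fee"], ["PerID"]))             # False: Fee 75 maps to 100 and 104
```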
Determinacy Diagram – Direct Dependency

Direct dependency (full functional dependency)

All attributes in a relation R must be fully functionally dependent on the primary key (i.e., the PK is a determinant of every attribute in R)
Performer-id -> Performer-name, Performer-type, Performer-location
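One way to confirm that a key is a determinant of every attribute is to compute its attribute closure, the set of all attributes it determines under the given dependencies. This is a minimal sketch of the standard closure algorithm, with each dependency written as a (left side, right side) pair of Python sets; that representation is an assumption for illustration.

```python
def closure(attrs, fds):
    """Attribute closure X+: every attribute functionally determined by attrs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            # If the whole left-hand side is already in X+, add the right-hand side.
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


performer_attrs = {"Performer-id", "Performer-name", "Performer-type", "Performer-location"}
fds = [({"Performer-id"}, {"Performer-name", "Performer-type", "Performer-location"})]

# The primary key determines every attribute of the relation.
print(closure({"Performer-id"}, fds) == performer_attrs)  # True
```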
Determinacy Diagram – Indirect Dependency

• Indirect dependency (transitive dependency)
– The value of an attribute is not determined directly by the primary key, but by another non-key attribute that the primary key determines
Performer-id -> Performer-name, Performer-type, Performer-location
Performer-type -> Fee
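A transitive dependency shows up in the attribute closure: Performer-id reaches Fee, but only through Performer-type. The sketch below repeats the standard closure algorithm and adds a hypothetical, simplified helper `transitive_fds` that reports dependencies whose determinant contains no key attribute, is itself determined by the key, and is not a key.

```python
def closure(attrs, fds):
    """Attribute closure: all attributes determined by attrs under fds."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def transitive_fds(key, fds):
    """Return (determinant, dependents) pairs where a non-key determinant,
    itself determined by the key, determines further non-key attributes.
    Simplified test: determinants containing key attributes are skipped."""
    reachable = closure(key, fds)
    found = []
    for lhs, rhs in fds:
        if lhs & key:
            continue  # direct or partial dependency on the key, not transitive
        if lhs <= reachable and not key <= closure(lhs, fds):
            extra = rhs - lhs - key
            if extra:
                found.append((lhs, extra))
    return found


key = {"Performer-id"}
fds = [
    ({"Performer-id"}, {"Performer-name", "Performer-type", "Performer-location"}),
    ({"Performer-type"}, {"Fee"}),
]

print("Fee" in closure(key, fds))  # True: the key reaches Fee, but only via Performer-type
print(transitive_fds(key, fds))
```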
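Determinacy Diagram – Partial Dependency

A functional dependency A -> B is a partial dependency if some attribute can be removed from A and the dependency still holds.

Diagram: Performer-id -> Performer-name, Performer-type, Performer-location; Agent-id -> Agent-name, Agent-location; Performer-id, Agent-id -> Booking-date

For example, Performer-id, Agent-id -> Performer-name holds: each value of (Performer-id, Agent-id) is associated with a single value of Performer-name. However, it is not a full functional dependency, because Performer-name is also functionally dependent on a subset of (Performer-id, Agent-id), namely Performer-id. By contrast, Performer-id, Agent-id -> Booking-date is a full functional dependency, since neither a performer nor an agent on its own fixes the booking date (performer 100, for instance, has bookings in Jan10, Feb10 and Mar10).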
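The definition suggests a direct test: try every proper subset of the left-hand side and see whether it still determines the right-hand side. This sketch assumes the dependencies in the diagram (Performer-id and Agent-id each determine their own attributes, and only the pair determines Booking-date); `is_partial` is a hypothetical helper built on the attribute closure.

```python
from itertools import combinations


def closure(attrs, fds):
    """Attribute closure: all attributes determined by attrs under fds."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def is_partial(lhs, rhs, fds):
    """A -> B is partial if some proper subset of A still determines B."""
    for size in range(len(lhs)):  # sizes 0 .. len(lhs) - 1, i.e. proper subsets
        for subset in combinations(sorted(lhs), size):
            if rhs <= closure(set(subset), fds):
                return True
    return False


fds = [
    ({"Performer-id"}, {"Performer-name", "Performer-type", "Performer-location"}),
    ({"Agent-id"}, {"Agent-name", "Agent-location"}),
    ({"Performer-id", "Agent-id"}, {"Booking-date"}),
]
composite = {"Performer-id", "Agent-id"}

print(is_partial(composite, {"Performer-name"}, fds))  # True
print(is_partial(composite, {"Booking-date"}, fds))    # False
```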
Normalization
Normalization is a bottom-up approach to database design that begins by examining the relationships between attributes.

It is a technique for producing a set of relations with desirable properties, given the data requirements of an enterprise.

It can also be used to validate a design produced by the top-down approach of first identifying the main entities and relationships.

It was first developed by E. F. Codd (1972b) as a series of tests on a relation to determine whether it satisfies or violates the requirements of a given normal form.

Three forms were proposed initially: First Normal Form (1NF), Second Normal Form (2NF) and Third Normal Form (3NF).

A stronger definition of 3NF is Boyce-Codd Normal Form (BCNF), introduced by R. Boyce and E. F. Codd.

Higher normal forms that go beyond BCNF, namely 4NF and 5NF, were introduced later (Fagin, 1977, 1979).
Normalization – 1NF

Data entry produces an Unnormalized Form: a table with repeating groups (more than one value in a cell)

xxxx  yyyy  222  666
      zzzz  333  777
aaaa  Xxxx  444  888
      www   555  999

First Normal Form: contains atomic values in each cell, but with redundancy ('flattening' the table)

xxxx  yyyy  222  666
xxxx  zzzz  333  777
aaaa  Xxxx  444  888
aaaa  www   555  999
Unnormalized Form

PerID  PerName  PerLoc  PerType  Fee  AgID  AgName  AgLoc  EvID  EvName   EvType  BDate
100    Baron    Mu      S        75   A10   Blue    PL     E01   Gold     Mus     Jan10
                                      A09   Black   Cpe    E02   Silver   Drama   Feb10
                                      A05   Green   Qb     E03   Diamond  Opera   Mar10
101    Steed    Fr      D        100  A11   Red     Pr     E04   Copper   Show    Apr10
                                      A12   White   Bd     E05   Bronze   Ballet  Feb10
102    Jones    Ind     A        50   A06   Blk     By     E06   Jupiter  Drama   Mar10
103    Eagle    Uk      C        60   A07   Pink    Lon    E07   Ruby     Ballet  May10
                                      A08   Yello   Liv    E08   Plati    Concer  Apr10
104    Stoke    Hk      S        75   A01   Wht     Zy     E09   Gem      Mus     Feb10
105    Brass    Aus     C        60   A02   Brw     Sy     E10   Choc     Show    Jun10
                                      A03   Purpl   Mel    E11   Coffee   Mus     Apr10
106    Peter    Gm      D        100  A04   Grey    Ams    E12   Crystal  Opera   Aug10
Determinacy Diagram – 1NF

Performer-id -> Performer-name, Performer-location, Performer-type
Performer-type -> Fee
Agent-id -> Agent-name, Agent-location
Event-id -> Event-name, Event-type
Performer-id, Agent-id, Event-id -> Booking-date
Normalized Form (1NF)

PerID  PerName  PerLoc  PerType  Fee  AgID  AgName  AgLoc  EvID  EvName   EvType  BDate
100    Baron    Mu      S        75   A10   Blue    PL     E01   Gold     Mus     Jan10
100    Baron    Mu      S        75   A09   Black   Cpe    E02   Silver   Drama   Feb10
100    Baron    Mu      S        75   A05   Green   Qb     E03   Diamond  Opera   Mar10
101    Steed    Fr      D        100  A11   Red     Pr     E04   Copper   Show    Apr10
101    Steed    Fr      D        100  A12   White   Bd     E05   Bronze   Ballet  Feb10
102    Jones    Ind     A        50   A06   Blk     By     E06   Jupiter  Drama   Mar10
103    Eagle    Uk      C        60   A07   Pink    Lon    E07   Ruby     Ballet  May10
103    Eagle    Uk      C        60   A08   Yello   Liv    E08   Plati    Concer  Apr10
104    Stoke    Hk      S        75   A01   Wht     Zy     E09   Gem      Mus     Feb10
105    Brass    Aus     C        60   A02   Brw     Sy     E10   Choc     Show    Jun10
105    Brass    Aus     C        60   A03   Purpl   Mel    E11   Coffee   Mus     Apr10
106    Peter    Gm      D        100  A04   Grey    Ams    E12   Crystal  Opera   Aug10

Each cell contains one value; the data is redundant; no primary key has been identified.
• First normal form (1NF): there is only one value at the intersection of each row and column of a relation, so there are no set-valued attributes in 1NF
• 1NF is part of the formal definition of a relation in the basic (flat) relational model
• Problem with 1NF: no primary key has yet been identified for the relation
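Flattening a repeating group into 1NF can be sketched as follows. The nested `Bookings` list is an assumed representation of the repeating group, only two performers from the unnormalized table are used, and agent and event names are left out for brevity. Each booking becomes its own row, and the performer's single-valued attributes are copied into every one of those rows, which is where the redundancy in 1NF comes from.

```python
# Unnormalized rows: single-valued performer attributes plus a repeating group.
unnormalized = [
    {"PerID": 100, "PerName": "Baron", "PerLoc": "Mu", "PerType": "S", "Fee": 75,
     "Bookings": [("A10", "E01", "Jan10"), ("A09", "E02", "Feb10"), ("A05", "E03", "Mar10")]},
    {"PerID": 102, "PerName": "Jones", "PerLoc": "Ind", "PerType": "A", "Fee": 50,
     "Bookings": [("A06", "E06", "Mar10")]},
]


def flatten(rows):
    """Produce 1NF rows: one output row per element of the repeating group,
    copying the single-valued attributes into each (the source of redundancy)."""
    flat = []
    for row in rows:
        base = {k: v for k, v in row.items() if k != "Bookings"}
        for ag_id, ev_id, b_date in row["Bookings"]:
            flat.append({**base, "AgID": ag_id, "EvID": ev_id, "BDate": b_date})
    return flat


first_nf = flatten(unnormalized)
for r in first_nf:
    print(r["PerID"], r["PerName"], r["AgID"], r["EvID"], r["BDate"])
```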
2NF determinacy diagram: Performer-id -> Performer-name, Performer-type, Performer-location; Performer-type -> Fee

Relation in 2NF:
PerID  PerName  PerLoc  PerType  Fee
100    Baron    Mu      S        75
101    Steed    Fr      D        100
102    Jones    Ind     A        50
103    Eagle    Uk      C        60
104    Stoke    Hk      S        75
105    Brass    Aus     C        60
106    Peter    Gm      D        100

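2NF determinacy diagram: Event-id -> Event-name, Event-type

Relation in 2NF:
EvID  EvName   EvType
E01   Gold     Mus
E02   Silver   Drama
E03   Diamond  Opera
E04   Copper   Show
E05   Bronze   Ballet
E06   Jupiter  Drama
E07   Ruby     Ballet
E08   Plati    Concer
E09   Gem      Mus
E10   Choc     Show
E11   Coffee   Mus
E12   Crystal  Opera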


2NF determinacy diagram: Agent-id -> Agent-name, Agent-location

Relation in 2NF:
AgID  AgName  AgLoc
A10   Blue    PL
A09   Black   Cpe
A05   Green   Qb
A11   Red     Pr
A12   White   Bd
A06   Blk     By
A07   Pink    Lon
A08   Yello   Liv
A01   Wht     Zy
A02   Brw     Sy
A03   Purpl   Mel
A04   Grey    Ams


2NF determinacy diagram: Performer-id, Agent-id, Event-id -> Booking-date

Relation in 2NF:
PerID  AgID  EvID  BookingDate
100    A10   E01   Jan10
100    A09   E02   Feb10
100    A05   E03   Mar10
101    A11   E04   Apr10
101    A12   E05   Feb10
102    A06   E06   Mar10
103    A07   E07   May10
103    A08   E08   Apr10
104    A01   E09   Feb10
105    A02   E10   Jun10
105    A03   E11   Apr10
106    A04   E12   Aug10

Partial dependencies are identified and each is presented in a separate relation.
• A table is in Second normal form (2NF) if
– it is in 1NF, and
– it includes no partial dependencies, i.e. no non-key attribute is dependent on only a portion of the primary key.
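Removing partial dependencies can be automated once the key and the dependencies are known. In this sketch, `decompose_2nf` (a hypothetical helper) tries each proper subset of the composite key (PerID, AgID, EvID), smallest first, moves the non-key attributes that subset determines into a new relation, and leaves the remaining attributes with the full key. It is a simplified illustration rather than a complete normalization procedure; with the dependencies from the 1NF determinacy diagram it yields the Performer, Agent, Event and Booking relations.

```python
from itertools import combinations


def closure(attrs, fds):
    """Attribute closure: all attributes determined by attrs under fds."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def decompose_2nf(attrs, key, fds):
    """Split off partial dependencies: each proper subset of the key that
    determines non-key attributes becomes a relation with those attributes."""
    non_key = attrs - key
    placed = set()
    relations = []
    for size in range(1, len(key)):  # proper, non-empty subsets, smallest first
        for part in combinations(sorted(key), size):
            determined = (closure(set(part), fds) & non_key) - placed
            if determined:
                relations.append(set(part) | determined)
                placed |= determined
    # Attributes not placed yet depend on the whole key and stay with it.
    relations.append(key | (non_key - placed))
    return relations


attrs = {"PerID", "PerName", "PerLoc", "PerType", "Fee",
         "AgID", "AgName", "AgLoc", "EvID", "EvName", "EvType", "BDate"}
key = {"PerID", "AgID", "EvID"}
fds = [
    ({"PerID"}, {"PerName", "PerLoc", "PerType"}),
    ({"PerType"}, {"Fee"}),
    ({"AgID"}, {"AgName", "AgLoc"}),
    ({"EvID"}, {"EvName", "EvType"}),
    ({"PerID", "AgID", "EvID"}, {"BDate"}),
]

for rel in decompose_2nf(attrs, key, fds):
    print(sorted(rel))
```

Fee stays with the performer attributes here, because the transitive dependency PerType -> Fee is only removed in the 3NF step.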
3NF determinacy diagram: Performer-id -> Performer-name, Performer-type, Performer-location

Relation in 3NF:
PerID  PerName  PerLoc  PerType
100    Baron    Mu      S
101    Steed    Fr      D
102    Jones    Ind     A
103    Eagle    Uk      C
104    Stoke    Hk      S
105    Brass    Aus     C
106    Peter    Gm      D

3NF determinacy diagram: Performer-type -> Fee

Relation in 3NF:
PerType  Fee
A        50
C        60
D        100
S        75

The transitive dependency is identified and treated as a separate relation.
• Third normal form (3NF): the table is in 2NF and every non-key attribute is directly dependent on the primary key
• 3NF removes the indirect (transitive) dependencies that can remain in 1NF and 2NF tables
• Method: identify all transitive dependencies; each transitive dependency forms a new relation whose attributes are the determining attribute (which becomes the primary key of the new relation) and the non-prime attributes it determines. The determining attribute also stays in the original relation, so the two relations can be joined again.
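The method can be sketched on the Performer relation from 2NF. The fees for performers 104 to 106 are taken as 75, 60 and 100, as in the unnormalized table and the TypeFee relation, because PerType -> Fee only holds with those values. The determinant PerType becomes the key of the new TypeFee relation, Fee is removed from Performer, and joining the two on PerType gives back the original rows.

```python
performer_2nf = [
    # (PerID, PerName, PerLoc, PerType, Fee)
    (100, "Baron", "Mu", "S", 75),
    (101, "Steed", "Fr", "D", 100),
    (102, "Jones", "Ind", "A", 50),
    (103, "Eagle", "Uk", "C", 60),
    (104, "Stoke", "Hk", "S", 75),
    (105, "Brass", "Aus", "C", 60),
    (106, "Peter", "Gm", "D", 100),
]

# The transitive dependency PerType -> Fee becomes its own relation, keyed on PerType...
type_fee = {(per_type, fee) for (_, _, _, per_type, fee) in performer_2nf}
# ...and Fee is dropped from Performer, which keeps PerType for the join.
performer_3nf = {(pid, name, loc, per_type) for (pid, name, loc, per_type, _) in performer_2nf}

# Each PerType now appears exactly once, so each fee is stored only once.
fee_of = dict(type_fee)
assert len(fee_of) == len(type_fee)

# Joining on PerType reconstructs the 2NF relation exactly (lossless).
rejoined = {(pid, name, loc, t, fee_of[t]) for (pid, name, loc, t) in performer_3nf}
print(sorted(type_fee))
print(rejoined == set(performer_2nf))  # True
```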
Relational Database Schema Produced

Performer(PerID, PerName, PerType, PerLoc)

TypeFee(PerType, Fee)

Agent(AgID, AgName)

AgLocation(AgID, AgLoc)

Event(EvID, EvName, EvType)

Booking(PerID, AgID, EvID, BDate)
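As an illustration, the schema can be declared in SQLite through Python's built-in `sqlite3` module. Column types, the primary keys (the composite key of Booking in particular) and the foreign-key references are assumptions, since the schema as listed does not mark keys. One booking is inserted, and the joins that reassemble a 1NF-style row show that no information was lost.

```python
import sqlite3

# Types, keys and references below are assumptions made for this sketch.
DDL = """
CREATE TABLE TypeFee    (PerType TEXT PRIMARY KEY, Fee INTEGER NOT NULL);
CREATE TABLE Performer  (PerID INTEGER PRIMARY KEY, PerName TEXT,
                         PerType TEXT REFERENCES TypeFee(PerType), PerLoc TEXT);
CREATE TABLE Agent      (AgID TEXT PRIMARY KEY, AgName TEXT);
CREATE TABLE AgLocation (AgID TEXT PRIMARY KEY REFERENCES Agent(AgID), AgLoc TEXT);
CREATE TABLE Event      (EvID TEXT PRIMARY KEY, EvName TEXT, EvType TEXT);
CREATE TABLE Booking (
    PerID INTEGER REFERENCES Performer(PerID),
    AgID  TEXT    REFERENCES Agent(AgID),
    EvID  TEXT    REFERENCES Event(EvID),
    BDate TEXT,
    PRIMARY KEY (PerID, AgID, EvID)
);
"""

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled
conn.executescript(DDL)
conn.execute("INSERT INTO TypeFee VALUES ('S', 75)")
conn.execute("INSERT INTO Performer VALUES (100, 'Baron', 'S', 'Mu')")
conn.execute("INSERT INTO Agent VALUES ('A10', 'Blue')")
conn.execute("INSERT INTO Event VALUES ('E01', 'Gold', 'Mus')")
conn.execute("INSERT INTO Booking VALUES (100, 'A10', 'E01', 'Jan10')")

# Reassemble a 1NF-style row by joining the relations back together.
row = conn.execute("""
    SELECT b.PerID, p.PerName, p.PerType, t.Fee, b.AgID, a.AgName, b.EvID, e.EvName, b.BDate
    FROM Booking b
    JOIN Performer p ON p.PerID = b.PerID
    JOIN TypeFee   t ON t.PerType = p.PerType
    JOIN Agent     a ON a.AgID = b.AgID
    JOIN Event     e ON e.EvID = b.EvID
""").fetchone()
print(row)  # (100, 'Baron', 'S', 75, 'A10', 'Blue', 'E01', 'Gold', 'Jan10')
```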


Summary

1NF – Atomic values only; look for fully functional (direct) dependencies. Redundant data still occurs.

2NF – Remove partial dependencies from the table in 1NF.

3NF – Remove the transitive dependencies remaining in the tables in 1NF and 2NF.
