
Normalization ..1..

P. M. Jat @ DAIICT

Normalization
Normalization is the process of analyzing a given relation schema, based on its FDs and candidate keys, to achieve two desirable properties: minimal redundancy and minimal update anomalies. It defines various normal forms as measures of the goodness of a relation.

10/17/2011

Database Systems

Normal forms
Initially Codd proposed three normal forms, which he called First, Second, and Third normal forms. A stronger definition of 3NF, called Boyce-Codd Normal Form (BCNF), was proposed later by Boyce and Codd. All these normal forms are based on the functional dependencies among the attributes of a relation. Later, a Fourth normal form (4NF) and a Fifth normal form (5NF) were proposed, based on multivalued dependencies and join dependencies respectively.

First Normal Form


The definition of first normal form is a bit messy. The original definition of 1NF says that attributes have atomic values, but defining atomicity is subtle. Is Name atomic (it can be broken into FName and LName)? Is Date atomic (it can be broken into Day, Month, and Year)? If they are, why can an Address not be atomic, and why can't Skills (a list of values) be atomic?

First Normal Form


As per C. J. Date, 1NF requires that the value of every attribute in a tuple is a single value of the appropriate type. The type could be anything: scalar, complex, arrays, user-defined types, and even relations; this is what modern DBMSs (ORDBMSs) support. (Examples on the next slide.)
Note: DLocations (in Company) and Residents (in HoR) could be stored in arrays, if we permit such types for an attribute.

However, permitting multiple values (like arrays) or relations (nested relations) as attributes goes against the basic understanding of the relational model, and you may need to perform non-relational operations to retrieve data.

Examples Using List and Relations as attributes


Multi-values:
Department(DNo, DName, MgrSSN, DLocations), where the locations are stored as a set or list, e.g. {Delhi, Mumbai, Pune}

Nested Relations:
Invoice(InvNo, Date, CustID, InvoiceItems), where InvoiceItems is a relation, typically InvoiceItems(ItemCode, ItemName, Qty, Rate)


A popular definition of First Normal Form


A relation having no repeating groups is in First Normal Form. For example, the following are repeating values and are not allowed in First Normal Form: a value for DLocation given as a set or list such as {Delhi, Mumbai, Pune}, and the columns Resident1 and Resident2 in the HoR database.
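A minimal sketch of removing a repeating group, assuming rows are held as Python dictionaries. The department names and the helper `split_repeating_group` are made up for illustration; only the attribute names come from the Company schema.

```python
# Hypothetical sample data: each department row carries a list of locations,
# which is the repeating group that First Normal Form does not allow.
department = [
    {"dno": 1, "dname": "Research", "dlocations": ["Delhi", "Mumbai", "Pune"]},
    {"dno": 2, "dname": "Admin", "dlocations": ["Delhi"]},
]


def split_repeating_group(rows, key, group, value_name):
    """Move a multi-valued attribute into its own (key, value) relation."""
    # Base relation: every attribute except the repeating group.
    base = [{k: v for k, v in row.items() if k != group} for row in rows]
    # Detail relation: one row per (key, single value) pair.
    detail = [
        {key: row[key], value_name: value}
        for row in rows
        for value in row[group]
    ]
    return base, detail


department_1nf, dept_locations = split_repeating_group(
    department, key="dno", group="dlocations", value_name="dlocation"
)
for row in dept_locations:
    print(row)
```

The second result has the same shape as dept_locations(dno, dlocation) in the Company database.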


Two ways of understanding Normal Forms


Begin with BCNF (the stricter form) and come down to 2NF, or
do the reverse, i.e. start with 2NF.
These days the first approach is preferred, for the following reasons:
First, most relations that you design using common sense, or by mapping an ER model to relations, are likely to be in BCNF already.
Second, 2NF and 3NF are discussed more for historical (or possibly academic) reasons.

FDs in canonical form


FDs in canonical form:
Have an irreducible (or minimal) left side. For example, {ssn, fname} -> salary holds in the Company database but is not irreducible, because it can be reduced to ssn -> salary; whereas {ssn, pno} -> hours is irreducible, because you cannot drop either attribute.
Have a single attribute on the right-hand side.
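One way to test whether a left side is irreducible is to try dropping each attribute and check, via attribute closure, whether the right side is still determined. The sketch below assumes FDs are written as (left set, right set) pairs; `closure` and `reduce_left_side` are illustrative helper names, not part of any library.

```python
def closure(attrs, fds):
    """Return every attribute determined by `attrs` under the FDs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            # Apply an FD when its left side is already covered.
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def reduce_left_side(lhs, rhs, fds):
    """Drop attributes from `lhs` as long as it still determines `rhs`."""
    reduced = set(lhs)
    for attr in sorted(lhs):
        candidate = reduced - {attr}
        if candidate and rhs <= closure(candidate, fds):
            reduced = candidate
    return reduced


# The two FDs used in the example above.
fds = [
    ({"ssn"}, {"salary"}),
    ({"ssn", "pno"}, {"hours"}),
]

print(sorted(reduce_left_side({"ssn", "fname"}, {"salary"}, fds)))  # ['ssn']
print(sorted(reduce_left_side({"ssn", "pno"}, {"hours"}, fds)))     # ['pno', 'ssn']
```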

BCNF
A relation R is in Boyce-Codd Normal Form when the determinant of every non-trivial FD that holds on R is a super-key of R. In other words, for every non-trivial FD A -> B that holds on R, A is a super-key of R.


Relations in Company Database


employee(ssn, fname, salary, superssn, dno)
department(dno, dname, mgrssn, mgrstartdate)
dept_locations(dno, dlocation)
project(pno, pname, plocation, dno)
works_on(essn, pno, hours)
dependent(essn, dep_name, dep_bdate, relationship)

Are these relations in BCNF (a desirable form)?



FDs in Company Database


ssn -> fname, salary, superssn, dno, dname, mgrssn, mgrstartdate
dno -> dname, mgrssn, mgrstartdate
pno -> pname, plocation, dno, dname, mgrssn, mgrstartdate
{essn, pno} -> hours
{essn, dep_name} -> dep_bdate, relationship
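The BCNF question for the Company relations can be checked mechanically. The sketch below is a brute-force test, assuming attribute names are compared literally (so essn and ssn are different attributes) and that the FDs listed above are the only ones given. For every proper subset X of a relation it computes the closure of X and requires that X either determines nothing new inside the relation or determines all of it. The helper names are illustrative.

```python
from itertools import combinations


def closure(attrs, fds):
    """Return every attribute determined by `attrs` under the FDs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def is_bcnf(relation, fds):
    """True if every non-trivial determinant inside `relation` is a super-key."""
    rel = set(relation)
    for size in range(1, len(rel)):
        for combo in combinations(sorted(rel), size):
            x = set(combo)
            determined = closure(x, fds) & rel
            # Violation: x determines something new but not the whole relation.
            if determined != x and determined != rel:
                return False
    return True


FDS = [
    ({"ssn"}, {"fname", "salary", "superssn", "dno", "dname", "mgrssn", "mgrstartdate"}),
    ({"dno"}, {"dname", "mgrssn", "mgrstartdate"}),
    ({"pno"}, {"pname", "plocation", "dno", "dname", "mgrssn", "mgrstartdate"}),
    ({"essn", "pno"}, {"hours"}),
    ({"essn", "dep_name"}, {"dep_bdate", "relationship"}),
]

RELATIONS = {
    "employee": {"ssn", "fname", "salary", "superssn", "dno"},
    "department": {"dno", "dname", "mgrssn", "mgrstartdate"},
    "dept_locations": {"dno", "dlocation"},
    "project": {"pno", "pname", "plocation", "dno"},
    "works_on": {"essn", "pno", "hours"},
    "dependent": {"essn", "dep_name", "dep_bdate", "relationship"},
}

results = {name: is_bcnf(attrs, FDS) for name, attrs in RELATIONS.items()}
for name, ok in results.items():
    print(name, "is in BCNF" if ok else "is NOT in BCNF")
```

The subset enumeration is exponential in the number of attributes, which is acceptable only because these relations are small.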


BCNF
The use of the term key or super-key in the definition of BCNF may cause confusion. Consider the definition (for every FD A -> B that holds on relation R, A is a super-key of R) and the FDs in Company: is the FD {ssn, fname} -> salary acceptable in BCNF?


BCNF
To accept FDs like {ssn, fname} -> salary (which are basically reducible FDs), the definition uses the term super-key; if the input set of FDs is in canonical form, using the term key is also fine. Therefore, informally you can say that a relation is in BCNF if, for every FD that holds on R, the determinant is always a key.


Relations in TGMC
Member(MembID, MembName, MembEmail, TeamID)
Team(TeamID, TeamPWD, MentorID)
Mentor(MentorID, MentorName, Email, InstID)
Institute(InstID, InstName, City, PIN, State)

Are these relations in BCNF (a desirable form)?


FDs in TGMC
MembID -> MembName, MembEmail, TeamID, TeamPWD, MentorID, MentorName, Email, InstID, InstName, City, PIN, State
TeamID -> TeamPWD, MentorID, MentorName, Email, InstID, InstName, City, PIN, State
MentorID -> MentorName, Email, InstID
InstID -> InstName, City, PIN, State
PIN -> City, State

Relations in TGMC
Suppose that, for some reason, InstID is placed in the Team relation too:
Member(MembID, MembName, MembEmail, TeamID)
Team(TeamID, TeamPWD, MentorID, InstID)
Mentor(MentorID, MentorName, Email, InstID)
Institute(InstID, InstName, City, PIN, State)

Now relation Team is not in BCNF. Why?


3NF
3NF is less restrictive than BCNF; it relaxes the BCNF condition for prime attributes (attributes that are part of some candidate key). A relation is in 3NF if, for every FD A -> B that holds on relation R, A is a super-key of R, or B is a prime attribute.


BCNF and 3NF


Suppose we have WORKS_ON as follows:
WORKS_ON(ESSN, PNo, PName*, Hours)
FDs (suppose):
{ESSN, PNo} --> Hours
{ESSN, PName} --> Hours
PNo --> PName

Keys: {ESSN, PNo} and {ESSN, PName}
Note: {ESSN, PName} is a key only if PName --> PNo also holds, i.e. project names are unique.
*Sometimes students find their own reasons and allow redundancies like this.

BCNF and 3NF


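Below is a typical situation where a relation is in 3NF but not in BCNF: a relation R(A, B, C) with FDs {A, B} -> C and C -> A, and {A, B} as a key. The FD C -> A violates BCNF because C is not a super-key, but 3NF accepts it because A is a prime attribute.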
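The R(A, B, C) case can be verified with a small brute-force check. This is a sketch under the assumption that FDs are (left set, right set) pairs; `candidate_keys` and `normal_form_checks` are illustrative names. It first finds the candidate keys, derives the prime attributes from them, and then tests the 3NF and BCNF conditions for every determinant.

```python
from itertools import combinations


def closure(attrs, fds):
    """Return every attribute determined by `attrs` under the FDs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def subsets(attrs):
    """Yield the non-empty subsets of `attrs`, smallest first."""
    items = sorted(attrs)
    for size in range(1, len(items) + 1):
        for combo in combinations(items, size):
            yield set(combo)


def candidate_keys(rel, fds):
    """Minimal attribute sets whose closure covers the whole relation."""
    keys = []
    for x in subsets(rel):
        if closure(x, fds) >= rel and not any(k <= x for k in keys):
            keys.append(x)
    return keys


def normal_form_checks(rel, fds):
    """Return (is_3nf, is_bcnf) for relation `rel`."""
    prime = set().union(*candidate_keys(rel, fds))
    is_3nf = is_bcnf = True
    for x in subsets(rel):
        determined = closure(x, fds) & rel
        if determined == rel:
            continue  # x is a super-key, acceptable in both forms
        for b in determined - x:
            is_bcnf = False       # non-trivial FD with a non-super-key determinant
            if b not in prime:
                is_3nf = False    # 3NF tolerates it only for prime attributes
    return is_3nf, is_bcnf


R = {"A", "B", "C"}
FDS = [({"A", "B"}, {"C"}), ({"C"}, {"A"})]

print(candidate_keys(R, FDS))
print(normal_form_checks(R, FDS))  # (True, False)
```

The key search also reports {B, C} as a second candidate key, which is why every attribute of R is prime.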


2NF
Consider our EMP-DEP relation (and the FDs in the Company database), which had a lot of redundancy:
EMP_DEP(ssn, fname, salary, superssn, dno, dname, mgrssn, mgrstartdate)

Concerned FDs:
ssn -> fname, salary, superssn, dno, dname, mgrssn, mgrstartdate
dno -> dname, mgrssn, mgrstartdate

The relation EMP-DEP is in neither BCNF nor 3NF.



2NF
And the culprit FDs are the following:
dno -> dname, mgrssn, mgrstartdate

In 2NF, we permit an FD X -> B when we also have an FD A -> X and A is a super-key. That means the FD A -> B is transitively inferred from the FDs A -> X and X -> B, so B is still determined by A (a super-key).
Example: dno -> mgrssn is an acceptable FD in 2NF, because you also have the FD ssn -> dno, and ssn is a key.

Decomposition of EMP-DEP
Decomposition strategy based on transitive FDs:
Identify X (the transitivity pivot) in R, and create another relation R2 that has X and all attributes determined by X; X becomes the key of R2.
The remaining attributes of R are put in R1, and X becomes a foreign key in R1 referring to R2.
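The strategy can be sketched for EMP-DEP with dno as the pivot X. The function name `decompose_on_pivot` is an illustrative assumption; it only splits the attribute sets and does not create keys or constraints.

```python
def closure(attrs, fds):
    """Return every attribute determined by `attrs` under the FDs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def decompose_on_pivot(relation, pivot, fds):
    """Split `relation` into (R1, R2) around the transitivity pivot X."""
    rel = set(relation)
    # R2: the pivot X plus every attribute of R that X determines.
    r2 = closure(pivot, fds) & rel
    # R1: the remaining attributes, keeping X as a foreign key to R2.
    r1 = (rel - r2) | set(pivot)
    return r1, r2


EMP_DEP = {"ssn", "fname", "salary", "superssn", "dno", "dname", "mgrssn", "mgrstartdate"}
FDS = [
    ({"ssn"}, {"fname", "salary", "superssn", "dno", "dname", "mgrssn", "mgrstartdate"}),
    ({"dno"}, {"dname", "mgrssn", "mgrstartdate"}),
]

r1, r2 = decompose_on_pivot(EMP_DEP, {"dno"}, FDS)
print(sorted(r1))  # ['dno', 'fname', 'salary', 'ssn', 'superssn']
print(sorted(r2))  # ['dname', 'dno', 'mgrssn', 'mgrstartdate']
```

R1 and R2 have the same attributes as the employee and department relations of the Company database.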


Definition of 2NF

A relation R is in 2NF if every non-prime attribute is irreducibly dependent on (i.e. determined by) a key (we do not say super-key), whether the dependency is direct or inferred through transitivity.
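A sketch of the 2NF test, assuming the usual reading of "irreducibly dependent": no proper subset of a candidate key may determine a non-prime attribute. It is applied here to EMP-DEP and its two FDs from the Company database; the helper names are illustrative.

```python
from itertools import combinations


def closure(attrs, fds):
    """Return every attribute determined by `attrs` under the FDs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def subsets(attrs):
    """Yield the non-empty subsets of `attrs`, smallest first."""
    items = sorted(attrs)
    for size in range(1, len(items) + 1):
        for combo in combinations(items, size):
            yield set(combo)


def candidate_keys(rel, fds):
    """Minimal attribute sets whose closure covers the whole relation."""
    keys = []
    for x in subsets(rel):
        if closure(x, fds) >= rel and not any(k <= x for k in keys):
            keys.append(x)
    return keys


def is_2nf(rel, fds):
    """True if no proper part of a candidate key determines a non-prime attribute."""
    keys = candidate_keys(rel, fds)
    prime = set().union(*keys)
    for key in keys:
        for part in subsets(key):
            if part == key:
                continue  # only proper subsets can cause a partial dependency
            if (closure(part, fds) & rel) - prime:
                return False
    return True


EMP_DEP = {"ssn", "fname", "salary", "superssn", "dno", "dname", "mgrssn", "mgrstartdate"}
FDS = [
    ({"ssn"}, {"fname", "salary", "superssn", "dno", "dname", "mgrssn", "mgrstartdate"}),
    ({"dno"}, {"dname", "mgrssn", "mgrstartdate"}),
]

print(candidate_keys(EMP_DEP, FDS))  # [{'ssn'}]
print(is_2nf(EMP_DEP, FDS))          # True
```

Because the only key, ssn, has a single attribute, it has no proper subset, so a partial dependency is impossible and EMP-DEP passes the 2NF test even though dno -> dname keeps it out of 3NF.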


Summary
1NF: attributes have atomic values, or the relation has no repeating groups.
BCNF: the only acceptable FDs are those whose determinant is a super-key.
3NF: relaxes BCNF. An FD X -> Y is acceptable if either X is a super-key or Y is a prime attribute.
2NF: non-prime attributes are irreducibly dependent on the key.

Which normal form is the following relation W in?
W(ssn, pno, pname, hours)
Concerned FDs:
pno -> pname
{ssn, pno} -> hours

Start with BCNF and move down until you reach a form that accepts the given FDs.
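The "start with BCNF and move down" procedure can be sketched as one function that tests the three conditions from the summary and reports the strongest form that holds. It assumes the relation is already in 1NF and that the listed FDs are complete; `highest_normal_form` is an illustrative name.

```python
from itertools import combinations


def closure(attrs, fds):
    """Return every attribute determined by `attrs` under the FDs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def subsets(attrs):
    """Yield the non-empty subsets of `attrs`, smallest first."""
    items = sorted(attrs)
    for size in range(1, len(items) + 1):
        for combo in combinations(items, size):
            yield set(combo)


def candidate_keys(rel, fds):
    """Minimal attribute sets whose closure covers the whole relation."""
    keys = []
    for x in subsets(rel):
        if closure(x, fds) >= rel and not any(k <= x for k in keys):
            keys.append(x)
    return keys


def highest_normal_form(rel, fds):
    """Return 'BCNF', '3NF', '2NF' or '1NF' for a relation assumed to be in 1NF."""
    rel = set(rel)
    keys = candidate_keys(rel, fds)
    prime = set().union(*keys)
    bcnf = third = second = True
    for x in subsets(rel):
        determined = closure(x, fds) & rel
        if determined == rel:
            continue  # super-key determinant: accepted by every form
        for b in determined - x:
            bcnf = False
            if b not in prime:
                third = False
                # Non-prime attribute determined by a proper part of a key.
                if any(x < key for key in keys):
                    second = False
    if bcnf:
        return "BCNF"
    if third:
        return "3NF"
    if second:
        return "2NF"
    return "1NF"


W = {"ssn", "pno", "pname", "hours"}
FDS = [({"pno"}, {"pname"}), ({"ssn", "pno"}, {"hours"})]

print(candidate_keys(W, FDS))
print(highest_normal_form(W, FDS))  # 1NF
```

For W the only key is {ssn, pno}, and pno alone determines the non-prime attribute pname, so the FD pno -> pname is rejected by BCNF, 3NF, and 2NF alike.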

Exercises

Which normal form is the relation in?


Which normal form are the relations in now?

In every decomposed relation, are all non-prime attributes fully functionally dependent on the primary key?


Example-2
Consider relation (S#, SNAME, P#, QTY), where SNAME is unique. The candidate keys are {S#, P#} and {SNAME, P#}. The following FDs exist:
1. {S#, P#} -> QTY
2. {SNAME, P#} -> QTY
3. S# -> SNAME
4. SNAME -> S#
In which normal form is the relation?
2NF: Yes/No? Yes
3NF: Yes/No? Yes
BCNF: Yes/No? No, because of FDs 3 and 4

Example-2 contd..
Obviously there are redundancy and anomalies in relation (S#, SNAME, P#, QTY). To bring it into BCNF, it can be decomposed into:
1. (S#, SNAME), where both are candidate keys
2. (S#, P#, QTY), where {S#, P#} is the candidate key
OR
1. (S#, SNAME), where both are candidate keys
2. (SNAME, P#, QTY), where {SNAME, P#} is the candidate key
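The redundancy and the effect of the first decomposition can be seen on a few rows. The data below is invented purely for illustration; the sketch projects the relation onto (S#, SNAME) and (S#, P#, QTY) and then joins the projections on S# to confirm that, for this sample, no information is lost.

```python
# Hypothetical sample rows for (S#, SNAME, P#, QTY); the values are made up.
rows = [
    ("S1", "Smith", "P1", 300),
    ("S1", "Smith", "P2", 200),  # the name "Smith" is stored again: redundancy
    ("S2", "Jones", "P1", 100),
]

# Projection onto (S#, SNAME): each supplier name is now stored once.
suppliers = {(s, name) for s, name, _, _ in rows}

# Projection onto (S#, P#, QTY).
shipments = {(s, p, qty) for s, _, p, qty in rows}

# Natural join of the two projections on S#.
rejoined = {
    (s, name, p, qty)
    for s, name in suppliers
    for s2, p, qty in shipments
    if s == s2
}

print(len(suppliers))          # 2
print(rejoined == set(rows))   # True
```

A single sample cannot prove the decomposition is safe in general; here that follows from S# being a key of (S#, SNAME).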


Which normal form is the following relation in?

Book(ISBN, Title, Author, Publisher, Price, AccessonNo);


Which normal form is the following relation in?

Member(ID, Name, Type, NoOfBooks, IssueDuration);


IssueLog( IssueDate, MemberID, AccessonNo, DueDate, ReturnDate)


How would you normalize the following relation R?


R (StudentID, SPI, CPI_UptoDate, CPI_UptoASem, AcadYear, Semester, ProgramCode, CourseNo, Grade)


References

Elmasri/Navathe, 5th ed.
Relational Theory for SQL, by C. J. Date

