DBMS UNIT-4 Schema Refinement

UNIT-4
SCHEMA REFINEMENT AND NORMALIZATION
• Database design is a collection of processes that facilitate the design, development, implementation and maintenance of enterprise data management systems.
• Database design involves the following steps:
1. Requirement Analysis: Discover the requirements of the users and analyze them.
2. Conceptual Design: Produce a conceptual data model, such as an ER diagram, that fulfils the requirements.
3. Logical Design: Map the conceptual model onto a data model such as the relational, network or hierarchical model.
4. Schema Refinement: Refine the schema of each relation, typically by decomposing it, to remove redundancy and anomalies.
5. Physical Design: Translate the schemas into actual database structures.

REDUNDANCY:
• Redundancy means having multiple copies of the same data in the database.
• Redundancy can occur at the table level or at the field level.
Example:
Sid  Sname  Cid  Cname  Fid  Fname  Fee
S1   A      C1   JAVA   F1   X      5K
S2   B      C2   DBMS   F2   Y      7K
S3   C      C1   JAVA   F1   X      5K
S4   B      C1   JAVA   F1   X      5K

In the above example, rows 1, 3 and 4 store the same values (JAVA, F1, X, 5K) in the fields Cname, Fid, Fname and Fee, because all three rows belong to course C1.
• Disadvantages of data redundancy:
o Increases the size of the database unnecessarily.
o Causes data inconsistency.
o Decreases the efficiency of the database.
o May cause data corruption.

ANOMALIES:
• Anomalies are problems that can occur in poorly planned, unnormalized databases where all the data is stored in one table.
• There are three basic types of anomalies: insertion anomaly, deletion anomaly and update anomaly.

Prepared by T V N Sudheer, Assistant Professor, CSE, QISCET Page |1

• Update Anomaly: An update anomaly is a data inconsistency that results from data redundancy combined with a partial update.
Ex: In the example table of redundancy, if the fee of course C1 needs to be updated to 7K, then 3 rows must be updated. If any one of these rows is not updated, the database becomes inconsistent.
• Deletion Anomaly: A deletion anomaly is the unintended loss of data caused by the deletion of other data.
Ex: In the example table of redundancy, S2 is the only student registered for course C2. If S2 completes the course and the entire row of S2 is deleted, the database also loses the details of course C2.
• Insertion Anomaly: An insertion anomaly is the inability to add data to the database due to the absence of other data.
Ex: In the example table of redundancy, if a new course C3 is to be added before any student registers for it, the fields Sid and Sname must be filled with NULL (and if Sid is the primary key, the row cannot be inserted at all, since a primary key cannot be NULL).
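To make the update and deletion anomalies concrete, here is a minimal Python sketch that holds the redundancy example as a list of dictionaries. The in-memory representation and the helper name `fees_for_course` are assumptions for illustration only; no DBMS is involved.

```python
# Redundancy example table: one dict per row.
rows = [
    {"Sid": "S1", "Sname": "A", "Cid": "C1", "Cname": "JAVA", "Fid": "F1", "Fname": "X", "Fee": "5K"},
    {"Sid": "S2", "Sname": "B", "Cid": "C2", "Cname": "DBMS", "Fid": "F2", "Fname": "Y", "Fee": "7K"},
    {"Sid": "S3", "Sname": "C", "Cid": "C1", "Cname": "JAVA", "Fid": "F1", "Fname": "X", "Fee": "5K"},
    {"Sid": "S4", "Sname": "B", "Cid": "C1", "Cname": "JAVA", "Fid": "F1", "Fname": "X", "Fee": "5K"},
]


def fees_for_course(table, cid):
    """Return the set of distinct fees stored for one course (hypothetical helper)."""
    return {row["Fee"] for row in table if row["Cid"] == cid}


# Update anomaly: change the fee of C1 in the first matching row only.
partially_updated = [dict(row) for row in rows]  # copy so the original stays intact
for row in partially_updated:
    if row["Cid"] == "C1":
        row["Fee"] = "7K"
        break  # the other two C1 rows are "forgotten"

print(sorted(fees_for_course(partially_updated, "C1")))  # ['5K', '7K']: inconsistent

# Deletion anomaly: deleting student S2 also deletes the only record of course C2.
after_delete = [row for row in rows if row["Sid"] != "S2"]
print(any(row["Cid"] == "C2" for row in after_delete))  # False
```

If each course's fee were stored once, in a separate course table, a single-row update could never leave two different fees for C1.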

Anomalies are avoided by the process of normalization.

FUNCTIONAL DEPENDENCIES:
• A functional dependency is a relationship that exists when one attribute (or set of attributes) uniquely determines another attribute.
• Definition: If R is a relation with attribute sets X and Y, a functional dependency between them is written X->Y, which specifies that Y is functionally dependent on X. Here X is the determinant and Y is the dependent attribute set.
• Each value of X is associated with exactly one value of Y.
• A functional dependency serves as a constraint between two sets of attributes.
• Defining functional dependencies is an important part of relational database design and is the basis of normalization.
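Example:
In a STUDENT relation with attributes STUD_NO, STUD_NAME and STUD_ADDR, where each student number identifies exactly one student, the FDs STUD_NO->STUD_NAME and STUD_NO->STUD_ADDR hold:
STUD_NO determines STUD_NAME
STUD_NO determines STUD_ADDR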
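Since a functional dependency is a constraint on the data, one way to see the definition in action is to test it against a table instance. The sketch below assumes rows are Python dictionaries; the function name `fd_holds` and the STUDENT rows are hypothetical sample data. Note that an instance can only show that an FD is violated; it cannot prove that the FD will hold for all future data.

```python
def fd_holds(rows, lhs, rhs):
    """Return True if lhs -> rhs is satisfied by this instance.

    rows: list of dicts; lhs, rhs: tuples of attribute names.
    The FD is violated when two rows agree on lhs but differ on rhs.
    """
    seen = {}
    for row in rows:
        x = tuple(row[a] for a in lhs)
        y = tuple(row[a] for a in rhs)
        # setdefault stores y for a new x, or returns the y seen earlier for x.
        if seen.setdefault(x, y) != y:
            return False
    return True


# Hypothetical STUDENT instance (sample values only).
student = [
    {"STUD_NO": 1, "STUD_NAME": "Ram", "STUD_ADDR": "Delhi"},
    {"STUD_NO": 2, "STUD_NAME": "Ram", "STUD_ADDR": "Haryana"},
    {"STUD_NO": 3, "STUD_NAME": "Sujit", "STUD_ADDR": "Delhi"},
]

print(fd_holds(student, ("STUD_NO",), ("STUD_NAME",)))    # True
print(fd_holds(student, ("STUD_NO",), ("STUD_ADDR",)))    # True
print(fd_holds(student, ("STUD_NAME",), ("STUD_ADDR",)))  # False: two Rams, two addresses
```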


• Functional Dependency Set: The functional dependency set, or FD set, of a relation is the set of all FDs present in the relation.
Ex: For the Student relation,
{ STUD_NO->STUD_NAME, STUD_NO->STUD_PHONE, STUD_NO->STUD_STATE, STUD_NO->STUD_COUNTRY,
STUD_NO->STUD_AGE, STUD_STATE->STUD_COUNTRY }

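ARMSTRONG'S AXIOMS:
• Armstrong's axioms are a set of inference rules.
• They provide a simple technique for reasoning about functional dependencies.
• They were developed by William W. Armstrong in 1974.
• They are used to infer all the functional dependencies implied by a given FD set of a relational database.
• Rules 1 to 3 below are the axioms proper; Rules 4 to 6 are derived rules that follow from them.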

A holds B, written {A → B}, means that A functionally determines B.
Rule 1 Reflexivity
If A is a set of attributes and B is a subset of A, then A holds B. {A → B}
Rule 2 Augmentation
If A holds B and C is a set of attributes, then AC holds BC. {AC → BC}
It means that adding the same attributes to both sides of a dependency does not invalidate it.
Rule 3 Transitivity
If A holds B and B holds C, then A holds C.
If {A → B} and {B → C}, then {A → C}
Rule 4 Union
If A holds B and A holds C, then A holds BC.
If {A → B} and {A → C}, then {A → BC}

Rule 5 Decomposition
If A holds BC, then A holds B and A holds C.
If {A → BC}, then {A → B} and {A → C}

Rule 6 Pseudo Transitivity
If A holds B and BC holds D, then AC holds D.
If {A → B} and {BC → D}, then {AC → D}
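The six rules can be written as small functions over FDs. In this sketch (an illustration, not a library API), an FD is a pair of frozensets of single-letter attribute names; Rules 1 to 3 are implemented directly, and Rules 4 to 6 are built only by calling Rules 1 to 3, which shows how the extra rules follow from the axioms. All function names are hypothetical.

```python
def fd(lhs, rhs):
    """Build an FD lhs -> rhs from strings of single-letter attribute names."""
    return (frozenset(lhs), frozenset(rhs))


def reflexivity(a, b):
    """Rule 1: if B is a subset of A, then A -> B."""
    a, b = frozenset(a), frozenset(b)
    if not b <= a:
        raise ValueError("reflexivity needs B to be a subset of A")
    return (a, b)


def augmentation(f, c):
    """Rule 2: from A -> B infer AC -> BC."""
    a, b = f
    c = frozenset(c)
    return (a | c, b | c)


def transitivity(f, g):
    """Rule 3: from A -> B and B -> C infer A -> C."""
    (a, b), (b2, c) = f, g
    if b != b2:
        raise ValueError("transitivity needs the RHS of f to equal the LHS of g")
    return (a, c)


def union(f, g):
    """Rule 4 (derived): from A -> B and A -> C infer A -> BC."""
    (a, _), (a2, c) = f, g
    if a != a2:
        raise ValueError("union needs both FDs to have the same LHS")
    a_to_ac = augmentation(g, a)   # A -> C augmented with A gives A -> AC
    ac_to_bc = augmentation(f, c)  # A -> B augmented with C gives AC -> BC
    return transitivity(a_to_ac, ac_to_bc)


def decomposition(f, b):
    """Rule 5 (derived): from A -> BC infer A -> B, for any B inside the RHS."""
    _, rhs = f
    return transitivity(f, reflexivity(rhs, b))  # BC -> B holds by reflexivity


def pseudo_transitivity(f, g):
    """Rule 6 (derived): from A -> B and BC -> D infer AC -> D."""
    _, b = f
    bc, _ = g
    c = bc - b
    return transitivity(augmentation(f, c), g)  # A -> B augmented with C gives AC -> BC


print(union(fd("A", "B"), fd("A", "C")) == fd("A", "BC"))                 # True
print(decomposition(fd("A", "BC"), "B") == fd("A", "B"))                  # True
print(pseudo_transitivity(fd("A", "B"), fd("BC", "D")) == fd("AC", "D"))  # True
```

`fd("A", "BC")` builds A → BC because a Python string iterates over its characters; that is why the sketch assumes single-letter attribute names.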

• Trivial Functional Dependency: If A holds B {A → B}, where B is a subset of A, then it is called a Trivial Functional Dependency.
• Non-Trivial Functional Dependency: If A holds B {A → B}, where B is not a subset of A, then it is called a Non-Trivial Functional Dependency.
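An FD A → B is trivial exactly when B is a subset of A, so classifying an FD is a subset test. A minimal sketch (the function name `is_trivial` and the sample FDs are hypothetical):

```python
def is_trivial(lhs, rhs):
    """A -> B is trivial when every attribute of B already appears in A."""
    return set(rhs) <= set(lhs)


examples = [
    ({"STUD_NO", "STUD_NAME"}, {"STUD_NAME"}),  # B is a subset of A
    ({"STUD_NO"}, {"STUD_NAME"}),               # B is not a subset of A
    ({"STUD_NO"}, {"STUD_NO", "STUD_NAME"}),    # B is only partly inside A
]
for lhs, rhs in examples:
    kind = "trivial" if is_trivial(lhs, rhs) else "non-trivial"
    print(sorted(lhs), "->", sorted(rhs), ":", kind)
```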


ATTRIBUTE CLOSURE:
• The attribute closure of an attribute set X, written X+, is the set of all attributes that can be functionally determined from X.
• Procedure to Find Attribute Closure:
To find the attribute closure of an attribute set:
o Add the elements of the attribute set to the result set.
o Repeatedly add to the result set every attribute that can be functionally determined from the elements already in the result set, until no more attributes can be added.

Example:

R(ABCDE) with FDs {AB->C, B->D, C->E, D->A}. To find (B)+, we add attributes to the set using the FDs, as shown in the table below.

Attributes Added in Closure    Functional Dependency used
{B}                            Trivial (reflexivity)
{B,D}                          B->D
{B,D,A}                        D->A
{B,D,A,C}                      AB->C
{B,D,A,C,E}                    C->E

(B)+ = {B,D,A,C,E}
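The closure procedure is a fixed-point loop: keep applying FDs whose left-hand side is already inside the result until a full pass adds nothing. A minimal Python sketch, assuming single-letter attribute names so that strings like "AB" can stand for attribute sets, reproduces the (B)+ example:

```python
def closure(attrs, fds):
    """Return the closure attrs+ under fds.

    attrs: iterable of attribute names.
    fds: list of (lhs, rhs) pairs, each an iterable of attribute names.
    """
    result = set(attrs)  # start with the attribute set itself
    changed = True
    while changed:       # repeat until a full pass adds nothing new
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


F = [("AB", "C"), ("B", "D"), ("C", "E"), ("D", "A")]
print(sorted(closure("B", F)))  # ['A', 'B', 'C', 'D', 'E']
print(sorted(closure("C", F)))  # ['C', 'E']
```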

DATABASE KEYS:
• Keys are used to establish and identify relationships between tables.
• Keys are used to uniquely identify any record or row of data inside a table.
• A key can be a single attribute or a group of attributes whose combination acts as a key.
• Types of Keys:
o Super Key
- A super key is a set of attributes within a table that can uniquely identify each record within the table.
- A super key is a superset of a candidate key.
o Candidate Key
- A candidate key is a minimal set of attributes that can uniquely identify each record in a table.
- A table can have more than one candidate key.
- A candidate key can be a combination of more than one column (attribute).
o Primary Key
A primary key is the candidate key that is chosen as the main key of the table. It uniquely identifies each record in the table.


CANDIDATE KEYS AND SUPER KEYS USING ATTRIBUTE CLOSURE
• If the attribute closure of an attribute set contains all attributes of the relation, the attribute set is a super key of the relation.
• If, in addition, no proper subset of this attribute set can functionally determine all attributes of the relation, the set is also a candidate key.

Example:

EMPLOYEE
E-ID  E-NAME  E-CITY  E-STATE
E001  John    Delhi   Delhi
E002  Mary    Delhi   Delhi
E003  John    Noida   U.P.

The FD set given for the EMPLOYEE relation is:
{E-ID->E-NAME, E-ID->E-CITY, E-ID->E-STATE, E-CITY->E-STATE}

Calculate the attribute closure of different sets of attributes:
(E-ID)+ = {E-ID, E-NAME, E-CITY, E-STATE}
(E-ID, E-NAME)+ = {E-ID, E-NAME, E-CITY, E-STATE}
(E-ID, E-CITY)+ = {E-ID, E-NAME, E-CITY, E-STATE}
(E-ID, E-STATE)+ = {E-ID, E-NAME, E-CITY, E-STATE}
(E-ID, E-CITY, E-STATE)+ = {E-ID, E-NAME, E-CITY, E-STATE}
(E-NAME)+ = {E-NAME}
(E-CITY)+ = {E-CITY, E-STATE}

Super Keys: {E-ID}, {E-ID, E-NAME}, {E-ID, E-CITY}, {E-ID, E-STATE}, {E-ID, E-CITY, E-STATE}, and in general every attribute set that contains E-ID.

Candidate Key: A minimal set of attributes whose attribute closure is the set of all attributes of the relation is called a candidate key of the relation.
So E-ID is the candidate key of the EMPLOYEE relation.
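The super-key and candidate-key tests can be combined into a brute-force search: compute the closure of every attribute subset, keep the super keys, then keep only the minimal ones. This is a sketch for small schemas only (it examines all 2^n subsets), and the function names are hypothetical.

```python
from itertools import combinations


def closure(attrs, fds):
    """Attribute closure of attrs under fds, a list of (lhs, rhs) lists."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def find_keys(schema, fds):
    """Return (super_keys, candidate_keys) by checking every attribute subset."""
    super_keys = []
    for size in range(1, len(schema) + 1):
        for combo in combinations(schema, size):
            if closure(combo, fds) >= set(schema):
                super_keys.append(frozenset(combo))
    # Candidate key: a super key with no smaller super key inside it.
    candidate_keys = [k for k in super_keys if not any(other < k for other in super_keys)]
    return super_keys, candidate_keys


schema = ["E-ID", "E-NAME", "E-CITY", "E-STATE"]
fds = [(["E-ID"], ["E-NAME"]), (["E-ID"], ["E-CITY"]),
       (["E-ID"], ["E-STATE"]), (["E-CITY"], ["E-STATE"])]
super_keys, candidate_keys = find_keys(schema, fds)
print(len(super_keys))  # 8
print(candidate_keys)   # [frozenset({'E-ID'})]
```

The count of 8 reflects that every one of the 2^3 attribute sets containing E-ID is a super key.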

PRIME ATTRIBUTES:
• Attributes that form a candidate key of a relation, i.e. attributes that belong to some candidate key, are called prime attributes.
• Prime attributes are also referred to as key attributes.
Ex: E-ID is the prime attribute of the EMPLOYEE relation.
NON-PRIME ATTRIBUTES:
• Attributes other than prime attributes are called non-prime attributes of the relation.
Ex: E-NAME, E-CITY and E-STATE are non-prime attributes.
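Given the candidate keys, prime and non-prime attributes follow directly from a set union. A small sketch using the EMPLOYEE example (candidate key {E-ID}); the function name is hypothetical:

```python
def split_attributes(schema, candidate_keys):
    """Return (prime, non_prime) lists, keeping the schema's column order."""
    prime_set = set().union(*candidate_keys)  # every attribute in some candidate key
    prime = [a for a in schema if a in prime_set]
    non_prime = [a for a in schema if a not in prime_set]
    return prime, non_prime


schema = ["E-ID", "E-NAME", "E-CITY", "E-STATE"]
prime, non_prime = split_attributes(schema, [{"E-ID"}])
print(prime)      # ['E-ID']
print(non_prime)  # ['E-NAME', 'E-CITY', 'E-STATE']
```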


NORMALIZATION:
Normalization is the process of organizing the data in a database to minimize redundancy and the update, deletion and insertion anomalies.

• The objective of normalization:
“to create relations where every dependency is on the key, the whole key, and nothing but the key”.
Normalization rules are divided into the following normal forms:
1. First Normal Form (1NF)
2. Second Normal Form (2NF)
3. Third Normal Form (3NF)
4. Boyce-Codd Normal Form (BCNF)
5. Fourth Normal Form (4NF)

1. FIRST NORMAL FORM (1NF):
• A table is in First Normal Form iff the domain of each attribute contains only atomic values, and
• the value of each attribute contains only a single value from that domain.
• For a table to be in First Normal Form, it should follow these 4 rules:
o It should only have single-valued (atomic) attributes/columns.
o Values stored in a column should be of the same domain.
o All the columns in a table should have unique names.
o The order in which data is stored does not matter.
Example:

ROLL_NO  NAME  SUBJECT
101      Akon  OS, CN
103      Ckon  Java
102      Bkon  C, C++
Table 1: UNNORMALIZED

ROLL_NO  NAME  SUBJECT
101      Akon  OS
101      Akon  CN
103      Ckon  Java
102      Bkon  C
102      Bkon  C++
Table 2: FIRST NORMAL FORM

• In Table 1, two students (Akon and Bkon) have opted for more than one subject, and their subject names are stored in a single column. But as per First Normal Form, each column must contain an atomic value.
• Converting a table to First Normal Form:
An unnormalized table can be converted to First Normal Form by introducing new rows (or columns) for the values of the multivalued columns.
Table 2 is an example of a table in First Normal Form.
• 1NF is mostly used for small and medium-sized applications.
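The conversion of Table 1 into Table 2 (one new row per value) can be sketched in Python. The sketch assumes the multivalued SUBJECT column is stored as a comma-separated string, which is only one possible representation; the function name `to_1nf` is hypothetical.

```python
# Table 1 (unnormalized): SUBJECT holds several values in one string.
unnormalized = [
    (101, "Akon", "OS, CN"),
    (103, "Ckon", "Java"),
    (102, "Bkon", "C, C++"),
]


def to_1nf(rows):
    """Return one (ROLL_NO, NAME, SUBJECT) row per subject, so every value is atomic."""
    result = []
    for roll_no, name, subjects in rows:
        for subject in subjects.split(","):
            result.append((roll_no, name, subject.strip()))
    return result


for row in to_1nf(unnormalized):
    print(row)
```

The five printed rows match Table 2, one subject per row.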


2. SECOND NORMAL FORM (2NF):

A table is in Second Normal Form (2NF) iff
• it is in 1NF, and
• it has no partial dependency.

Partial Dependency: If a proper subset of a candidate key determines a non-prime attribute, the dependency is called a partial dependency.

Example: SCORE Table

SCORE_ID  STUDENT_ID  SUBJECT_ID  MARKS  TEACHER
1         10          1           70     Java Teacher
2         10          2           75     C++ Teacher
3         11          1           80     Java Teacher

{STUDENT_ID, SUBJECT_ID} forms a candidate key.

The primary key for this table is the combination of the two columns STUDENT_ID and SUBJECT_ID, but the teacher's name depends only on the subject, i.e. on SUBJECT_ID, and has nothing to do with STUDENT_ID.

This is a partial dependency: an attribute in the table depends on only a part of the primary key and not on the whole key.

Steps to decompose a non-2NF relation to a 2NF relation

Step 1: Create a separate relation for each partial dependency.
Step 2: Remove the right-hand side attribute of the partial dependency from the relation that is being decomposed.
Example 1:

FLIGHT_ID  FLIGHT_DAY  PILOT  BOARDING_GATE
IC123      Monday      Kesav  2
IC123      Tuesday     Mark   2
IC217      Wednesday   Kesav  3
IA156      Monday      Steve  1

For this Flight_Schedule table,
F = { Flight_ID Flight_Day → Pilot Boarding_Gate,
Flight_ID → Boarding_Gate }
In our example, the key is (Flight_ID, Flight_Day).
This table is in 1NF but not in 2NF, because in the FD Flight_ID → Boarding_Gate a proper subset of the key (Flight_ID) determines the non-prime attribute Boarding_Gate.
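Detecting a partial dependency means checking whether an FD's left-hand side is a proper subset of a candidate key while its right-hand side contains a non-prime attribute. The sketch below applies that test to the Flight_Schedule FDs. It is an assumption-laden simplification: it only inspects the FDs listed, not every FD implied by them (F+), and the function name is hypothetical.

```python
def partial_dependencies(fds, candidate_keys):
    """Return (X, non-prime part of Y) for each listed FD X -> Y where X is a
    proper subset of a candidate key and Y contains a non-prime attribute."""
    prime = set().union(*candidate_keys)
    found = []
    for lhs, rhs in fds:
        lhs, rhs = set(lhs), set(rhs)
        non_prime_rhs = rhs - prime
        if non_prime_rhs and any(lhs < key for key in candidate_keys):
            found.append((lhs, non_prime_rhs))
    return found


fds = [
    ({"Flight_ID", "Flight_Day"}, {"Pilot", "Boarding_Gate"}),
    ({"Flight_ID"}, {"Boarding_Gate"}),
]
keys = [{"Flight_ID", "Flight_Day"}]
print(partial_dependencies(fds, keys))  # [({'Flight_ID'}, {'Boarding_Gate'})]
```

The first FD is not reported because its left-hand side is the whole key, not a proper subset of it.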


Conversion of Relation to 2NF:

Step 1: Create a separate relation for each partial dependency.
In our example, Flight_ID → Boarding_Gate is the partial dependency. Hence, we need to create a separate relation for this FD. Let us name this relation Boarding.
Boarding (Flight_ID, Boarding_Gate)

Step 2: Remove the right-hand side attribute of the partial dependency from the relation that is being decomposed.
In the relation Flight_Schedule (Flight_ID, Flight_Day, Pilot, Boarding_Gate), the attribute Boarding_Gate should be removed as per this step, because Boarding_Gate is the right-hand side (RHS) attribute of the partial dependency Flight_ID → Boarding_Gate.
Hence, Flight_Schedule (Flight_ID, Flight_Day, Pilot).

Thus, Flight_Schedule (Flight_ID, Flight_Day, Pilot, Boarding_Gate) is decomposed into
Flight_Schedule (Flight_ID, Flight_Day, Pilot) and
Boarding (Flight_ID, Boarding_Gate).

FLIGHT_ID  FLIGHT_DAY  PILOT
IC123      Monday      Kesav
IC123      Tuesday     Mark
IC217      Wednesday   Kesav
IA156      Monday      Steve
Flight_Schedule Table

FLIGHT_ID  BOARDING_GATE
IC123      2
IC217      3
IA156      1
Boarding Table

Example 2:

Assume a relation R (A, B, C, D, E) with the following set of functional dependencies:

F = {AB → C, B → D, E → D}

The key for this relation is ABE: A, B and E never appear on the right-hand side of an FD, and (ABE)+ = {A, B, C, D, E}. Then all three given FDs are partial dependencies, viz., AB → C, B → D and E → D.

Step 1: Create separate tables for the partial dependencies; hence, R1 (ABC), R2 (BD) and R3 (ED).

Step 2: Remove the RHS attributes of these three partial FDs (C and D) from R; hence, R4 (A, B, E).

Thus, we have four tables R1 (ABC), R2 (BD), R3 (ED) and R4 (ABE).
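Once the partial dependencies are known, the two decomposition steps can be applied mechanically. This sketch (hypothetical helper `decompose_2nf`) follows Step 1 and Step 2 literally and reproduces both Example 2 and the Flight_Schedule decomposition:

```python
def decompose_2nf(schema, partial_fds):
    """Apply Step 1 and Step 2 to a relation, given its partial dependencies."""
    # Step 1: one new relation per partial dependency, holding its LHS and RHS.
    relations = [set(lhs) | set(rhs) for lhs, rhs in partial_fds]
    # Step 2: remove every RHS attribute of a partial FD from the original relation.
    removed = set()
    for _, rhs in partial_fds:
        removed |= set(rhs)
    relations.append(set(schema) - removed)
    return relations


result = decompose_2nf("ABCDE", [("AB", "C"), ("B", "D"), ("E", "D")])
print(["".join(sorted(r)) for r in result])  # ['ABC', 'BD', 'DE', 'ABE']

flight = decompose_2nf(
    ["Flight_ID", "Flight_Day", "Pilot", "Boarding_Gate"],
    [(["Flight_ID"], ["Boarding_Gate"])],
)
print([sorted(r) for r in flight])  # [['Boarding_Gate', 'Flight_ID'], ['Flight_Day', 'Flight_ID', 'Pilot']]
```

"DE" is R3 (ED) with its attributes printed in sorted order.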


3. THIRD NORMAL FORM (3NF):

A table is in Third Normal Form (3NF) iff
• it is in 2NF, and
• it has no transitive dependency.

Transitive dependency: If A->B and B->C are two FDs, then A->C is called a transitive dependency (where B is not a super key and C is a non-prime attribute).
Equivalently, a relation is in 3NF iff at least one of the following conditions holds for every non-trivial functional dependency X → Y:
1. X is a super key.
2. Y is a prime attribute (each element of Y is part of some candidate key).
Example:

EMP_ID  EMP_NAME   EMP_ZIP  EMP_STATE  EMP_CITY
222     Harry      201010   UP         Noida
333     Stephan    02228    US         Boston
444     Lan        60007    US         Chicago
555     Katharine  06389    UK         Norwich
666     John       462007   MP         Bhopal
Super keys in the table above:
{EMP_ID}, {EMP_ID, EMP_NAME}, {EMP_ID, EMP_NAME, EMP_ZIP}, and so on
Candidate key: {EMP_ID}
Non-prime attributes: In the given table, all attributes except EMP_ID are non-prime.

Here EMP_ID → EMP_ZIP and EMP_ZIP → {EMP_STATE, EMP_CITY}, so the non-prime attributes EMP_STATE and EMP_CITY are transitively dependent on the super key EMP_ID through EMP_ZIP. This violates the rule of Third Normal Form.
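The "X is a super key or Y is prime" condition can be checked directly with attribute closure. In this sketch, the FDs for the EMPLOYEE table are an assumption read from the example (EMP_ID determines EMP_NAME and EMP_ZIP, and EMP_ZIP determines EMP_STATE and EMP_CITY); the prime test is applied to the attributes of Y that are not already in X, and the function names are hypothetical.

```python
def closure(attrs, fds):
    """Attribute closure of attrs under fds, a list of (lhs, rhs) lists."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def violates_3nf(schema, fds, candidate_keys):
    """Return the listed non-trivial FDs X -> Y where X is not a super key
    and some attribute of Y - X is non-prime."""
    prime = set().union(*candidate_keys)
    bad = []
    for lhs, rhs in fds:
        lhs, rhs = set(lhs), set(rhs)
        if rhs <= lhs:
            continue  # trivial FDs never violate 3NF
        is_superkey = closure(lhs, fds) >= set(schema)
        if not is_superkey and not (rhs - lhs) <= prime:
            bad.append((lhs, rhs))
    return bad


schema = ["EMP_ID", "EMP_NAME", "EMP_ZIP", "EMP_STATE", "EMP_CITY"]
fds = [
    (["EMP_ID"], ["EMP_NAME", "EMP_ZIP"]),
    (["EMP_ZIP"], ["EMP_STATE", "EMP_CITY"]),  # assumed from the example
]
print(violates_3nf(schema, fds, [{"EMP_ID"}]))  # reports EMP_ZIP -> {EMP_STATE, EMP_CITY}

zip_table = ["EMP_ZIP", "EMP_STATE", "EMP_CITY"]
print(violates_3nf(zip_table, fds[1:], [{"EMP_ZIP"}]))  # []
```

After EMP_STATE and EMP_CITY move to a table keyed by EMP_ZIP, the same check reports no violation.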

Steps to decompose a non-3NF relation to a 3NF relation

Step 1: Identify any dependencies between non-key attributes within each table.
Step 2: Remove them to form a new table.
Step 3: Promote one of the attributes to be the key of the new table.

Example: Consider the EMPLOYEE table above, which is not in 3NF.
Move EMP_CITY and EMP_STATE to the new EMPLOYEE_ZIP table, with EMP_ZIP as the primary key.
EMPLOYEE table
EMP_ID  EMP_NAME   EMP_ZIP
222     Harry      201010
333     Stephan    02228
444     Lan        60007
555     Katharine  06389
666     John       462007

EMPLOYEE_ZIP table
EMP_ZIP  EMP_STATE  EMP_CITY
201010   UP         Noida
02228    US         Boston
60007    US         Chicago
06389    UK         Norwich
462007   MP         Bhopal


4. BOYCE-CODD NORMAL FORM (BCNF):

A table is in Boyce-Codd Normal Form (BCNF) iff
• it is in Third Normal Form (3NF), and
• for every non-trivial functional dependency X → Y, X is a super key.
Example: EMPLOYEE table
EMP_ID  EMP_COUNTRY  EMP_DEPT    DEPT_TYPE  EMP_DEPT_NO
264     India        Designing   D394       283
264     India        Testing     D394       300
364     UK           Stores      D283       232
364     UK           Developing  D283       549
In the above table, the functional dependencies are as follows:
EMP_ID → EMP_COUNTRY
EMP_DEPT → {DEPT_TYPE, EMP_DEPT_NO}
Candidate key: {EMP_ID, EMP_DEPT}

The table is not in BCNF because neither EMP_DEPT nor EMP_ID alone is a super key, yet each of them is the determinant of a functional dependency.
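For BCNF the test is stricter than for 3NF: every non-trivial FD needs a super key on its left-hand side, with no exception for prime attributes. A sketch using attribute closure on the EMPLOYEE FDs (function names hypothetical; only the listed FDs are examined):

```python
def closure(attrs, fds):
    """Attribute closure of attrs under fds, a list of (lhs, rhs) lists."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def bcnf_violations(schema, fds):
    """Return the listed non-trivial FDs whose LHS is not a super key."""
    return [
        (set(lhs), set(rhs))
        for lhs, rhs in fds
        if not set(rhs) <= set(lhs) and not closure(lhs, fds) >= set(schema)
    ]


schema = ["EMP_ID", "EMP_COUNTRY", "EMP_DEPT", "DEPT_TYPE", "EMP_DEPT_NO"]
fds = [
    (["EMP_ID"], ["EMP_COUNTRY"]),
    (["EMP_DEPT"], ["DEPT_TYPE", "EMP_DEPT_NO"]),
]
for lhs, rhs in bcnf_violations(schema, fds):
    print(sorted(lhs), "->", sorted(rhs), "violates BCNF")
```

Both FDs are reported, while the candidate key {EMP_ID, EMP_DEPT} does determine every attribute.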

Steps to decompose a non-BCNF relation to a BCNF relation

1. Find a non-trivial functional dependency X → Y that violates the BCNF condition (i.e. X is not a super key).
2. Split the table into two tables:
o one with the attributes XY (all attributes from the dependency),
o one with the X attributes together with the remaining attributes of the original relation (all attributes except those in Y).
Repeat these steps on the resulting tables until every table is in BCNF.

Example:

EMP_COUNTRY table
EMP_ID  EMP_COUNTRY
264     India
364     UK

EMP_DEPT table
EMP_DEPT    DEPT_TYPE  EMP_DEPT_NO
Designing   D394       283
Testing     D394       300
Stores      D283       232
Developing  D283       549

EMP_DEPT_MAPPING table
EMP_ID  EMP_DEPT
264     Designing
264     Testing
364     Stores
364     Developing

Functional dependencies:
EMP_ID → EMP_COUNTRY
EMP_DEPT → {DEPT_TYPE, EMP_DEPT_NO}
Candidate keys:
For the EMP_COUNTRY table: EMP_ID
For the EMP_DEPT table: EMP_DEPT
For the EMP_DEPT_MAPPING table: {EMP_ID, EMP_DEPT}
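The split-and-repeat procedure can be sketched as a loop. To find violations inside each smaller table, this sketch projects the FDs by computing the closure of every attribute subset of that table, which is exponential in the number of attributes but fine for examples of this size. It is an illustration under those assumptions with hypothetical function names; in general a BCNF decomposition produced this way is lossless but need not preserve every dependency.

```python
from itertools import combinations


def closure(attrs, fds):
    """Attribute closure of attrs under fds, a list of (lhs, rhs) lists."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def find_violation(rel, fds):
    """Return (X, Y) where X -> Y holds on rel, Y is non-empty and disjoint
    from X, and X is not a super key of rel; return None if rel is in BCNF."""
    for size in range(1, len(rel)):
        for combo in combinations(sorted(rel), size):
            x = set(combo)
            y = (closure(x, fds) & rel) - x
            if y and (x | y) != rel:
                return x, y
    return None


def bcnf_decompose(schema, fds):
    """Split relations on BCNF violations until none remain."""
    done, pending = [], [set(schema)]
    while pending:
        rel = pending.pop()
        violation = find_violation(rel, fds)
        if violation is None:
            done.append(rel)
        else:
            x, y = violation
            pending.append(x | y)    # table with all attributes of X -> Y
            pending.append(rel - y)  # X plus the remaining attributes
    return done


schema = ["EMP_ID", "EMP_COUNTRY", "EMP_DEPT", "DEPT_TYPE", "EMP_DEPT_NO"]
fds = [
    (["EMP_ID"], ["EMP_COUNTRY"]),
    (["EMP_DEPT"], ["DEPT_TYPE", "EMP_DEPT_NO"]),
]
for table in bcnf_decompose(schema, fds):
    print(sorted(table))
```

The three printed attribute sets correspond to the EMP_COUNTRY, EMP_DEPT and EMP_DEPT_MAPPING tables.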


5. FOURTH NORMAL FORM (4NF)

A table is in Fourth Normal Form (4NF) iff
• it is in Boyce-Codd Normal Form (BCNF), and
• it has no non-trivial multivalued dependency X →→ Y in which X is not a super key.
Multivalued Dependency: A multivalued dependency X →→ Y exists when, for a single value of X, there are multiple values of Y, and this set of Y values is independent of the values of the remaining attributes of the relation.

Example:

STUDENT
STU_ID  COURSE     HOBBY
21      Computer   Dancing
21      Math       Singing
34      Chemistry  Dancing
74      Biology    Cricket
59      Physics    Hockey

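The given STUDENT table is in 3NF (in fact in BCNF, since it has no non-trivial functional dependencies), but COURSE and HOBBY are two independent entities: there is no relationship between COURSE and HOBBY.

In the STUDENT relation, the student with STU_ID 21 has two courses, Computer and Math, and two hobbies, Dancing and Singing.

So there are multivalued dependencies STU_ID →→ COURSE and STU_ID →→ HOBBY. Because course and hobby are independent, every course of a student has to be stored with every hobby of that student (for student 21: Computer/Dancing, Computer/Singing, Math/Dancing and Math/Singing), which leads to unnecessary repetition of data.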

So to make the above table into 4NF, we can decompose it into two tables:

STUDENT_COURSE

STU_ID COURSE
21 Computer
21 Math
34 Chemistry
74 Biology
59 Physics

STUDENT_HOBBY

STU_ID HOBBY
21 Dancing
21 Singing
34 Dancing
74 Cricket
59 Hockey
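An MVD can be tested on an instance with its formal definition: whenever two rows agree on STU_ID, the row combining the first row's COURSE with the second row's HOBBY must also exist. In the sketch below (hypothetical function names, rows as tuples), the printed STUDENT table fails this test because it pairs Computer only with Dancing and Math only with Singing. The version with all four course/hobby combinations for student 21, which is what independence of COURSE and HOBBY implies, passes, and joining its STUDENT_COURSE and STUDENT_HOBBY projections on STU_ID gives it back exactly.

```python
from itertools import product


def mvd_holds(rows):
    """Check STU_ID ->-> COURSE on rows of (stu_id, course, hobby)."""
    rows = set(rows)
    for (s1, c1, _), (s2, _, h2) in product(rows, repeat=2):
        # Rows agreeing on STU_ID must also appear with their hobbies swapped.
        if s1 == s2 and (s1, c1, h2) not in rows:
            return False
    return True


def split(rows):
    """4NF decomposition into STUDENT_COURSE and STUDENT_HOBBY."""
    return {(s, c) for s, c, _ in rows}, {(s, h) for s, _, h in rows}


def join(course_rows, hobby_rows):
    """Natural join of the two tables on STU_ID."""
    return {(s, c, h) for s, c in course_rows for s2, h in hobby_rows if s == s2}


printed = {
    (21, "Computer", "Dancing"), (21, "Math", "Singing"),
    (34, "Chemistry", "Dancing"), (74, "Biology", "Cricket"), (59, "Physics", "Hockey"),
}
full = printed | {(21, "Computer", "Singing"), (21, "Math", "Dancing")}

print(mvd_holds(printed))          # False
print(mvd_holds(full))             # True
print(join(*split(full)) == full)  # True
```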


COMPARISON OF NORMAL FORMS:

1NF eliminates multiple values in a single attribute (non-atomic values)
2NF eliminates partial dependencies
3NF eliminates transitive dependencies
BCNF eliminates dependencies whose determinant is not a super key
4NF eliminates multivalued dependencies

DECOMPOSITION OF RELATIONS
Decomposition: The process of breaking up or dividing a single relation into two or more sub-relations is called decomposition of a relation.
• Decomposition helps in eliminating some of the problems of bad design, such as redundancy, inconsistencies and anomalies.
Properties of Decomposition:
The following two properties should hold when decomposing a given relation:

1. Lossless Join Decomposition

Lossless decomposition ensures that
• no information is lost from the original relation during decomposition;
• when the sub-relations are joined back, exactly the relation that was decomposed is obtained.
• Every decomposition must always be lossless.


2. Dependency Preservation

Dependency preservation ensures that
• none of the functional dependencies that hold on the original relation are lost;
• the sub-relations still hold or satisfy the functional dependencies of the original relation.
LOSSLESS JOIN DECOMPOSITION:

A decomposition {R1, R2, …, Rn} of a relation R is called a lossless decomposition of R if the natural join of R1, R2, …, Rn produces exactly the relation R.

A decomposition of R into two relations R1 and R2 (with R1 ∪ R2 = R) is lossless if at least one of the following holds:
• R1 ∩ R2 → R1, that is, the attributes common to R1 and R2 functionally determine ALL the attributes in R1.
• R1 ∩ R2 → R2, that is, the attributes common to R1 and R2 functionally determine ALL the attributes in R2.

Example:

R = (A, B, C, D, E). We decompose it into R1 = (A, B, C) and R2 = (A, D, E). The set of functional dependencies is: A → BC, CD → E, B → D, E → A.

R1 ∩ R2 = A. Since A → BC, augmentation with A gives A → ABC, i.e. R1 ∩ R2 → R1.

Hence, this is a lossless-join decomposition.
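For a decomposition into two relations, the lossless test reduces to one closure computation: the common attributes must determine all of R1 or all of R2. A sketch with single-letter attributes and a hypothetical function name; the second call uses a split (R1 = ABC, R2 = CDE) that is not in the notes, chosen only to show a lossy case.

```python
def closure(attrs, fds):
    """Attribute closure of attrs under fds, a list of (lhs, rhs) pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def is_lossless(schema, r1, r2, fds):
    """Binary test: the intersection of R1 and R2 must determine all of R1 or all of R2."""
    r1, r2 = set(r1), set(r2)
    if r1 | r2 != set(schema):
        return False  # some attributes of R would be lost outright
    common = closure(r1 & r2, fds)
    return r1 <= common or r2 <= common


F = [("A", "BC"), ("CD", "E"), ("B", "D"), ("E", "A")]
print(is_lossless("ABCDE", "ABC", "ADE", F))  # True: A+ contains ABC
print(is_lossless("ABCDE", "ABC", "CDE", F))  # False: C+ = {C}
```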

DEPENDENCY PRESERVING DECOMPOSITION

A decomposition D = {R1, R2, R3, …, Rn} of R is dependency preserving with respect to a set F of functional dependencies if
(F1 ∪ F2 ∪ … ∪ Fn)+ = F+,
where Fi is the set of FDs in F+ that contain only attributes of Ri.
• Consider a relation R with FD set F decomposed into R1 with FD set F1 and R2 with FD set F2. Then there can be three cases:
(F1 ∪ F2)+ = F+ -----> Decomposition is dependency preserving.
(F1 ∪ F2)+ is a proper subset of F+ -----> Not dependency preserving.
(F1 ∪ F2)+ is a proper superset of F+ -----> This case is not possible, because every FD in F1 and F2 already belongs to F+.
Example:

Let a relation R(A, B, C, D) and a set of FDs F = { A -> B, A -> C, C -> D } be given.

R is decomposed into
R1 = (A, B, C) with FDs F1 = {A -> B, A -> C}, and
R2 = (C, D) with FDs F2 = {C -> D}.

F1 ∪ F2 = {A -> B, A -> C, C -> D}, so F1 ∪ F2 = F, and therefore (F1 ∪ F2)+ = F+.

Hence, the decomposition is dependency preserving.
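Computing F1 and F2 explicitly requires projecting F+, which can be large. A commonly taught alternative, sketched here, checks each FD X → Y of F by growing X using only what each sub-relation Ri can determine from its own attributes; Y must end up inside the result. The function name is hypothetical, and the second call uses a decomposition not in the notes, R1 = (A, B, C) and R2 = (A, D), which loses C -> D.

```python
def closure(attrs, fds):
    """Attribute closure of attrs under fds, a list of (lhs, rhs) pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def preserves(decomposition, fds):
    """Return True if every FD in fds can be enforced from the sub-relations."""
    for lhs, rhs in fds:
        result = set(lhs)
        changed = True
        while changed:
            changed = False
            for ri in decomposition:
                ri = set(ri)
                # What the attributes of result inside Ri determine, restricted to Ri.
                gained = closure(result & ri, fds) & ri
                if not gained <= result:
                    result |= gained
                    changed = True
        if not set(rhs) <= result:
            return False
    return True


F = [("A", "B"), ("A", "C"), ("C", "D")]
print(preserves(["ABC", "CD"], F))  # True
print(preserves(["ABC", "AD"], F))  # False: C -> D is lost
```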
