UNIT-4
SCHEMA REFINEMENT AND NORMALIZATION
Database design is a collection of processes that facilitate the design, development,
implementation and maintenance of enterprise data management systems.
Database design involves the following steps:
1. Requirement Analysis: Discover requirements from the users and analyze
them.
2. Conceptual Design: To produce a conceptual data model like ER Diagrams
that fulfils the requirements.
3. Logical Design: Map the conceptual model to a data model such as the relational,
network or hierarchical model.
4. Schema Refinement: Schema refinement is the process of redefining the
schema of a relation to remove redundancy and anomalies.
5. Physical Design: Physical design translates the schemas into actual database
structures.
REDUNDANCY:
Redundancy means having multiple copies of the same data in the database.
Redundancy can occur either at the table level or at the field level.
Example:
Sid  Sname  Cid  Cname  Fid  Fname  Fee
S1   A      C1   JAVA   F1   X      5K
S2   B      C2   DBMS   F2   Y      7K
S3   C      C1   JAVA   F1   X      5K
S4   B      C1   JAVA   F1   X      5K
In the above example, rows 1, 3 and 4 store the same values for the fields Cname, Fid,
Fname and Fee, so this information is redundant.
Disadvantages Of Data Redundancy
o Increases the size of the database unnecessarily.
o Causes data inconsistency.
o Decreases efficiency of database.
o May cause data corruption.
ANOMALIES:
Anomalies are problems that can occur in poorly planned, un-normalised databases
where all the data is stored in one table.
There are basically 3 types of anomalies:
Insertion anomaly, Deletion anomaly, and Update anomaly
Prepared by T V N Sudheer, Assistant Professor, CSE, QISCET Page |1
DBMS UNIT-4 Schema Refinement
Update Anomaly: An update anomaly is a data inconsistency that results from data
redundancy and a partial update.
Ex: In the example table of redundancy,
if the fee of the course with Cid=C1 needs to be updated to 7K, then 3 rows need to be
updated. If any one of these rows is not updated, the database becomes inconsistent.
Deletion Anomaly: A deletion anomaly is the unintended loss of data due to deletion
of other data.
Ex: In the example table of redundancy,
S2 is the only student registered for course C2. If S2 completes the course, the entire
row of S2 needs to be deleted, and the database then loses the details of course C2.
Insertion Anomaly: An insertion anomaly is the inability to add data to the database
due to absence of other data.
Ex: In the example table of redundancy,
if a new course with Cid=C3 is to be added to the database before any student has
registered for it, then the fields Sid and Sname must be filled with NULL.
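The update and deletion anomalies described above can be reproduced on the example table with a short Python sketch. The rows are copied from the redundancy example; the list-of-dicts layout and the helper names `make_table` and `fees_for_course` are assumptions made only for illustration.

```python
# Rows copied from the redundancy example; the in-memory layout is an assumption.
COLUMNS = ("Sid", "Sname", "Cid", "Cname", "Fid", "Fname", "Fee")
ROWS = [
    ("S1", "A", "C1", "JAVA", "F1", "X", "5K"),
    ("S2", "B", "C2", "DBMS", "F2", "Y", "7K"),
    ("S3", "C", "C1", "JAVA", "F1", "X", "5K"),
    ("S4", "B", "C1", "JAVA", "F1", "X", "5K"),
]


def make_table():
    """Return a fresh copy of the example table as a list of dicts."""
    return [dict(zip(COLUMNS, row)) for row in ROWS]


def fees_for_course(table, cid):
    """Return the set of distinct fees stored for one course."""
    return {row["Fee"] for row in table if row["Cid"] == cid}


# Update anomaly: only the first C1 row is updated, the other two are missed.
partial = make_table()
partial[0]["Fee"] = "7K"
inconsistent = len(fees_for_course(partial, "C1")) > 1
print("C1 has conflicting fees:", inconsistent)

# Deletion anomaly: removing student S2 also removes every trace of course C2.
after_delete = [row for row in make_table() if row["Sid"] != "S2"]
c2_survives = any(row["Cid"] == "C2" for row in after_delete)
print("C2 still known after deleting S2:", c2_survives)
```

In a normalized design the fee would be stored once per course, so a single-row update could not leave conflicting values behind.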
FUNCTIONAL DEPENDENCIES:
Functional dependency is a relationship that exists when one attribute uniquely
determines another attribute.
Definition: If R is a relation with attributes X and Y, a functional dependency between
the attributes is represented as X->Y, which specifies that Y is functionally dependent on X.
Here X is the determinant set and Y is the dependent attribute.
Each value of X is associated with precisely one value of Y.
A functional dependency in a database serves as a constraint between two sets of
attributes.
Defining functional dependencies is an important part of relational database design and
forms the basis of normalization.
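As a minimal sketch of the definition, the following Python function checks whether a given table instance is consistent with a functional dependency X->Y. The function name `fd_holds` and the data layout are assumptions; the rows are taken from the redundancy example earlier in these notes. Note that a table instance can only refute a dependency: an FD is a constraint on all legal instances, so a passing check on one instance does not prove it.

```python
def fd_holds(rows, lhs, rhs):
    """Return True if all rows that agree on the lhs columns also agree on the rhs columns."""
    seen = {}
    for row in rows:
        x = tuple(row[a] for a in lhs)
        y = tuple(row[a] for a in rhs)
        # Remember the first rhs value seen for this lhs value and compare later ones.
        if seen.setdefault(x, y) != y:
            return False
    return True


COLUMNS = ("Sid", "Sname", "Cid", "Cname", "Fid", "Fname", "Fee")
TABLE = [dict(zip(COLUMNS, r)) for r in [
    ("S1", "A", "C1", "JAVA", "F1", "X", "5K"),
    ("S2", "B", "C2", "DBMS", "F2", "Y", "7K"),
    ("S3", "C", "C1", "JAVA", "F1", "X", "5K"),
    ("S4", "B", "C1", "JAVA", "F1", "X", "5K"),
]]

print(fd_holds(TABLE, ["Cid"], ["Cname", "Fee"]))  # True: each Cid has one name and fee
print(fd_holds(TABLE, ["Sname"], ["Cid"]))         # False: Sname B appears with C2 and C1
```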
Example:
ARMSTRONG'S AXIOMS:
Armstrong's Axioms are a set of inference rules.
They provide a simple technique for reasoning about functional dependencies.
They were developed by William W. Armstrong in 1974.
They are used to infer all the functional dependencies that hold on a relational database.
Notation: {A → B} means that A functionally determines B.
Rule 1 Reflexivity
If A is a set of attributes and B is a subset of A, then A determines B. {A → B}
Rule 2 Augmentation
If A determines B and C is a set of attributes, then AC determines BC.
If {A → B}, then {AC → BC}
It means that adding the same attributes to both sides of a dependency does not invalidate it.
Rule 3 Transitivity
If A determines B and B determines C, then A determines C.
If {A → B} and {B → C}, then {A → C}
Rule 4 Union
If A determines B and A determines C, then A determines BC.
If {A → B} and {A → C}, then {A → BC}
Rule 5 Decomposition
If A determines BC, then A determines B and A determines C.
If {A → BC}, then {A → B} and {A → C}
Rules 4 and 5 are not independent axioms; they can be derived from Rules 1 to 3.
ATTRIBUTE CLOSURE:
Attribute closure of an attribute set is the set of all attributes that can be
functionally determined from it.
Procedure to Find Attribute Closure:
To find the attribute closure of an attribute set:
o Add the elements of the attribute set to the result set.
o Repeatedly add to the result set the attributes that can be functionally determined
from the attributes already in the result set, until no more can be added.
Example:
R(A, B, C, D, E) with F = {AB->C, B->D, C->E, D->A}. To find (B)+, we add attributes
to the set step by step using the FDs, as shown in the table below.
Result set       FD used
{B}              Triviality (B->B)
{B,D}            B->D
{B,D,A}          D->A
{B,D,A,C}        AB->C
{B,D,A,C,E}      C->E
(B)+ = {B,D,A,C,E}
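The procedure above can be sketched in Python as follows. The function name `closure` and the representation of each FD as a (left side, right side) pair of attribute strings are assumptions made for this illustration.

```python
def closure(attrs, fds):
    """Return the attribute closure of attrs under the functional dependencies fds."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            # Apply an FD when its whole left side is already in the result set.
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


# FDs of the example: AB->C, B->D, C->E, D->A
FDS = [("AB", "C"), ("B", "D"), ("C", "E"), ("D", "A")]
print(sorted(closure("B", FDS)))  # ['A', 'B', 'C', 'D', 'E']
```

The loop stops as soon as a full pass over the FDs adds nothing new, which mirrors the "until no more can be added" condition.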
DATABASE KEYS:
Keys are used to establish and identify relationships between tables.
Keys are used to uniquely identify any record or row of data inside a table.
A Key can be a single attribute or a group of attributes, where the combination may act
as a key.
Types of Keys:
o Super Key
Super Key is defined as a set of attributes within a table that can uniquely
identify each record within a table.
Every super key is a superset of some candidate key.
o Candidate Key
Candidate keys are defined as the minimal set of fields which can
uniquely identify each record in a table.
There can be more than one candidate key.
A candidate key can be a combination of more than one column
(attributes).
o Primary Key
Primary key is a candidate key that is most appropriate to become the main key
for any table. It is a key that can uniquely identify each record in a table.
Example:
EMPLOYEE (E-ID, E-NAME, E-CITY, E-STATE)
Candidate Key: A minimal set of attributes whose attribute closure is the set of all
attributes of the relation is called a candidate key of the relation.
So E-ID is a candidate key for the Employee relation.
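The closure-based definition of a candidate key can be turned into a brute-force search, sketched below in Python. The helper names are assumptions, and so is the dependency E-ID -> {E-NAME, E-CITY, E-STATE}, since the notes do not list the FDs of the Employee relation. The search tries attribute subsets from smallest to largest, so it is only practical for small schemas.

```python
from itertools import combinations


def closure(attrs, fds):
    """Return the attribute closure of attrs under fds."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def candidate_keys(attributes, fds):
    """Return all minimal attribute sets whose closure is the whole relation."""
    attributes = set(attributes)
    keys = []
    for size in range(1, len(attributes) + 1):
        for combo in combinations(sorted(attributes), size):
            candidate = set(combo)
            if any(key <= candidate for key in keys):
                continue  # contains a smaller key, so it is not minimal
            if closure(candidate, fds) == attributes:
                keys.append(candidate)
    return keys


# Assumed FD for the Employee relation (not given in the notes).
EMPLOYEE = ["E-ID", "E-NAME", "E-CITY", "E-STATE"]
EMPLOYEE_FDS = [(("E-ID",), ("E-NAME", "E-CITY", "E-STATE"))]
print(candidate_keys(EMPLOYEE, EMPLOYEE_FDS))  # [{'E-ID'}]

# The relation R(ABCDE) from the attribute closure example.
R_FDS = [("AB", "C"), ("B", "D"), ("C", "E"), ("D", "A")]
print(candidate_keys("ABCDE", R_FDS))  # [{'B'}]
```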
PRIME ATTRIBUTES:
Attributes that belong to some candidate key of a relation are called Prime attributes.
Prime attributes are also referred to as Key attributes.
Ex: E-ID is the prime attribute of the Employee relation.
NON-PRIME ATTRIBUTES:
Attributes other than prime attributes are called Non-Prime Attributes of the relation.
NORMALIZATION:
Normalization is the process of arranging data in the data base to minimize redundancy,
update anomalies, deletion anomalies and insertion anomalies.
In Table 1:
Two of the students have opted for more than one subject, and the subject names are
stored in a single column. But as per the First Normal Form, each column must contain
atomic values.
Converting a table to First Normal Form:
An unnormalized table can be converted to First Normal Form by introducing new
rows or columns for the multi-valued columns.
Table 2 is an example of a table in the First Normal Form.
1NF is most used for Small and Medium size applications
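A conversion to First Normal Form by introducing new rows can be sketched in Python as below. The sample rows, column names and the function name `to_first_normal_form` are hypothetical; they do not reproduce Table 1 or Table 2 of these notes.

```python
def to_first_normal_form(rows, column, separator=","):
    """Split a multi-valued column so that each output row holds one atomic value."""
    flat = []
    for row in rows:
        for value in row[column].split(separator):
            new_row = dict(row)           # copy the other columns unchanged
            new_row[column] = value.strip()
            flat.append(new_row)
    return flat


# Hypothetical unnormalized data: two of the three students have several subjects.
UNNORMALIZED = [
    {"roll_no": 101, "name": "Akon", "subject": "OS, CN"},
    {"roll_no": 103, "name": "Ckon", "subject": "Java"},
    {"roll_no": 102, "name": "Bkon", "subject": "C, C++"},
]

for row in to_first_normal_form(UNNORMALIZED, "subject"):
    print(row["roll_no"], row["name"], row["subject"])
```

Each output row now carries exactly one subject, at the cost of repeating the student name, which is the redundancy that the higher normal forms address.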
The primary key for this table is a composite of two columns, student_id and subject_id,
but the teacher's name depends only on the subject, hence on subject_id, and has nothing
to do with student_id.
This is a Partial Dependency, where an attribute in a table depends on only a part of the
primary key and not on the whole key.
Step 2: Remove the right-hand side attribute of the partial dependency from the relation
that is being decomposed.
In the relation Flight_Schedule (Flight_ID, Flight_Day, Pilot, Boarding_Gate), the
attribute Boarding_Gate should be removed as per this condition.
The reason is that Boarding_Gate is the right-hand side (RHS) attribute of the partial
dependency Flight_ID → Boarding_Gate.
Hence, the relation becomes
Flight_Schedule (Flight_ID, Flight_Day, Pilot).
Example 2:
F = {AB → C, B → D, E → D}
The key for this relation is ABE. Then all three given FDs are partial dependencies, viz.,
AB → C, B → D, and E → D.
Step 1: create separate tables for the partial dependencies; hence, R1 (ABC), R2 (BD) and R3 (ED).
Step 2: remove the RHS attributes of these partial FDs from R; hence, R4 (ABE).
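The two steps of Example 2 can be sketched in Python as below. The function names are assumptions, the relation is assumed to be R(A, B, C, D, E), and the sketch handles the simplified case of a single known key; it is not a general 2NF algorithm.

```python
def partial_dependencies(key, fds):
    """Return FDs whose left side is a proper subset of the key and whose
    right side contains at least one attribute outside the key."""
    key = set(key)
    return [(lhs, rhs) for lhs, rhs in fds
            if set(lhs) < key and not set(rhs) <= key]


def decompose_2nf(attributes, key, fds):
    """Step 1: one relation per partial FD. Step 2: drop their RHS attributes from R."""
    key = set(key)
    partial = partial_dependencies(key, fds)
    relations = [sorted(set(lhs) | set(rhs)) for lhs, rhs in partial]
    removed = set()
    for _, rhs in partial:
        removed |= set(rhs) - key
    relations.append(sorted(set(attributes) - removed))
    return relations


# Example 2: F = {AB -> C, B -> D, E -> D}, key ABE
FDS = [("AB", "C"), ("B", "D"), ("E", "D")]
for relation in decompose_2nf("ABCDE", "ABE", FDS):
    print("".join(relation))
```

The printed relations are ABC, BD, DE and ABE, which correspond to R1, R2, R3 and R4 of the example (attribute order inside a relation is sorted and carries no meaning).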
Transitive dependency: If A->B and B->C are two FDs, then the implied dependency A->C is
called a transitive dependency.
A relation is in 3NF iff at least one of the following conditions holds for every non-trivial
functional dependency X -> Y:
1. X is a super key.
2. Y is a prime attribute (each element of Y is part of some candidate key).
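The 3NF condition can be checked mechanically, as in the Python sketch below. All function names are assumptions, and the candidate keys are found by brute force, so the sketch only suits small schemas. It examines the given FDs rather than every implied dependency. The first relation, R(A, B, C) with A->B and B->C, is the transitive dependency described above; the second relation is a hypothetical one chosen so that condition 2 applies.

```python
from itertools import combinations


def closure(attrs, fds):
    """Return the attribute closure of attrs under fds."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def candidate_keys(attributes, fds):
    """Return all minimal attribute sets whose closure is the whole relation."""
    attributes = set(attributes)
    keys = []
    for size in range(1, len(attributes) + 1):
        for combo in combinations(sorted(attributes), size):
            candidate = set(combo)
            if any(key <= candidate for key in keys):
                continue
            if closure(candidate, fds) == attributes:
                keys.append(candidate)
    return keys


def violations_3nf(attributes, fds):
    """Return the given FDs X -> Y where X is not a super key and Y is not all prime."""
    attributes = set(attributes)
    prime = set().union(*candidate_keys(attributes, fds))
    bad = []
    for lhs, rhs in fds:
        dependent = set(rhs) - set(lhs)   # ignore the trivial part of the FD
        if not dependent:
            continue
        is_super_key = closure(lhs, fds) == attributes
        if not is_super_key and not dependent <= prime:
            bad.append((lhs, rhs))
    return bad


print(violations_3nf("ABC", [("A", "B"), ("B", "C")]))   # [('B', 'C')]
print(violations_3nf("ABC", [("AB", "C"), ("C", "B")]))  # []
```

In the second relation C is not a super key, but B belongs to the candidate key AB, so condition 2 holds and the relation is in 3NF.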
Example:
Step1: Identify any dependencies between non-key attributes within each table
Step2: Remove them to form a new table
Step3: Promote one of the attributes to be the key of the new table
The table is not in BCNF because neither EMP_DEPT nor EMP_ID alone is a key.
Example:
EMP_DEPT table
EMP_DEPT     DEPT_TYPE  EMP_DEPT_NO
Designing    D394       283
Testing      D394       300
Stores       D283       232
Developing   D283       549
Functional dependencies:
EMP_ID → EMP_COUNTRY
EMP_DEPT → {DEPT_TYPE, EMP_DEPT_NO}
Candidate keys:
For the first table: EMP_ID
For the second table: EMP_DEPT
For the third table: {EMP_ID, EMP_DEPT}
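A BCNF check requires the left side of every non-trivial FD to be a super key, which can be sketched in Python as below. The function names are assumptions, and the undecomposed relation is assumed to consist of exactly the five attributes that appear in the functional dependencies listed above.

```python
def closure(attrs, fds):
    """Return the attribute closure of attrs under fds."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def bcnf_violations(attributes, fds):
    """Return the non-trivial FDs whose left side is not a super key."""
    attributes = set(attributes)
    return [(lhs, rhs) for lhs, rhs in fds
            if not set(rhs) <= set(lhs) and closure(lhs, fds) != attributes]


# Assumed attribute set of the original, undecomposed table.
ALL_ATTRS = ["EMP_ID", "EMP_COUNTRY", "EMP_DEPT", "DEPT_TYPE", "EMP_DEPT_NO"]
FD_COUNTRY = (("EMP_ID",), ("EMP_COUNTRY",))
FD_DEPT = (("EMP_DEPT",), ("DEPT_TYPE", "EMP_DEPT_NO"))

# Both FDs violate BCNF in the combined table.
print(len(bcnf_violations(ALL_ATTRS, [FD_COUNTRY, FD_DEPT])))  # 2

# After decomposition, each table is checked with the FDs that apply to it.
print(bcnf_violations(["EMP_ID", "EMP_COUNTRY"], [FD_COUNTRY]))            # []
print(bcnf_violations(["EMP_DEPT", "DEPT_TYPE", "EMP_DEPT_NO"], [FD_DEPT]))  # []
print(bcnf_violations(["EMP_ID", "EMP_DEPT"], []))                          # []
```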
Example:
STUDENT
The given STUDENT table is in 3NF, but COURSE and HOBBY are two independent
entities; there is no relationship between COURSE and HOBBY.
So, to bring the above table into 4NF, we can decompose it into two tables:
STUDENT_COURSE
STU_ID  COURSE
21      Computer
21      Math
34      Chemistry
74      Biology
59      Physics
STUDENT_HOBBY
STU_ID  HOBBY
21      Dancing
21      Singing
34      Dancing
74      Cricket
59      Hockey
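To see why the decomposition helps, the two tables can be joined back on STU_ID, as in the Python sketch below. The rows are taken from the STUDENT_COURSE and STUDENT_HOBBY tables above; the function name `natural_join` and the tuple layout are assumptions. Because COURSE and HOBBY are independent, a single combined table has to hold every course and hobby combination of a student to represent the same facts.

```python
STUDENT_COURSE = [
    (21, "Computer"), (21, "Math"), (34, "Chemistry"), (74, "Biology"), (59, "Physics"),
]
STUDENT_HOBBY = [
    (21, "Dancing"), (21, "Singing"), (34, "Dancing"), (74, "Cricket"), (59, "Hockey"),
]


def natural_join(courses, hobbies):
    """Join the two tables on STU_ID, pairing every course with every hobby of a student."""
    return [(sid, course, hobby)
            for sid, course in courses
            for hid, hobby in hobbies
            if sid == hid]


joined = natural_join(STUDENT_COURSE, STUDENT_HOBBY)
for row in joined:
    print(row)
print(len(joined))  # 7 rows; student 21 alone accounts for 2 x 2 = 4 of them
```

Student 21 needs four rows in the joined form but only two rows in each of the decomposed tables.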
DECOMPOSITION OF RELATIONS
Decomposition: The process of breaking up or dividing a single relation into two or more
sub-relations is called decomposition of a relation.
Decomposition helps in eliminating some of the problems of bad design, such as
redundancy, inconsistencies and anomalies.
Properties of Decomposition:
The following two properties must be satisfied when decomposing a given relation:
2. Dependency Preservation-
(Lossless-join test) A decomposition of R into R1 and R2 is lossless if at least one of the
following holds:
R1 ∩ R2 -> R1, that is: all attributes common to both R1 and R2 functionally determine
ALL the attributes in R1.
R1 ∩ R2 -> R2, that is: all attributes common to both R1 and R2 functionally determine
ALL the attributes in R2.
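The two conditions above are the standard test for whether a decomposition into two relations is lossless: the closure of the common attributes must cover R1 or R2. A minimal Python sketch follows. The function names are assumptions, and so is the decomposition R1(A, B, C), R2(C, D); the relation R(A, B, C, D) with F = {A -> B, A -> C, C -> D} is the one used in these notes.

```python
def closure(attrs, fds):
    """Return the attribute closure of attrs under fds."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def is_lossless(r1, r2, fds):
    """Binary lossless-join test: (R1 ∩ R2)+ must contain all of R1 or all of R2."""
    r1, r2 = set(r1), set(r2)
    common_closure = closure(r1 & r2, fds)
    return r1 <= common_closure or r2 <= common_closure


FDS = [("A", "B"), ("A", "C"), ("C", "D")]
print(is_lossless("ABC", "CD", FDS))  # True: C -> D, so C determines all of R2
print(is_lossless("AB", "CD", FDS))   # False: R1 and R2 share no attribute
```

The test assumes that R1 and R2 together contain every attribute of R; it does not apply directly to decompositions into three or more relations.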
Example:
R = (A, B, C, D, E). We decompose it into R1 = (A, B, C), R2 = (A, D, E). The set of
Let a relation R(A, B, C, D) and a set of FDs F = {A -> B, A -> C, C -> D} be given.
so, F1 U F2 = F.
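Dependency preservation can be tested without computing F1 and F2 explicitly, using the following sketch of a common textbook procedure: for each FD X -> Y in F, grow X using only what can be derived inside each sub-relation, then check that Y is reached. The function names are assumptions, and both decompositions of R(A, B, C, D) below are hypothetical, since the notes do not state which decomposition is used.

```python
def closure(attrs, fds):
    """Return the attribute closure of attrs under fds."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def preserves_dependencies(decomposition, fds):
    """Return True if every FD in fds can be enforced using the sub-relations alone."""
    for lhs, rhs in fds:
        result = set(lhs)
        changed = True
        while changed:
            changed = False
            for relation in decomposition:
                relation = set(relation)
                # Attributes derivable from result while staying inside this relation.
                gained = closure(result & relation, fds) & relation
                if not gained <= result:
                    result |= gained
                    changed = True
        if not set(rhs) <= result:
            return False
    return True


FDS = [("A", "B"), ("A", "C"), ("C", "D")]
print(preserves_dependencies(["ABC", "CD"], FDS))  # True
print(preserves_dependencies(["ABC", "AD"], FDS))  # False: C -> D spans both relations
```

In the second decomposition C and D never appear together in one sub-relation and C -> D cannot be derived from the projected dependencies, so it can only be checked by joining the tables.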