
The Evolution of DBMS

Major disadvantages when different departments have large amounts of data to share:
Data redundancy
Risk to data integrity
Data isolation
Difficult access to data
Unsatisfactory security measures
Concurrent access

Concept of a Database
A database is a collection of interrelated data from which some information can be extracted. The collection of data must be logically coherent with some inherent meaning.
A database is designed and built for specific purpose, keeping in mind the needs of the applications that are going to use it and the end users of those applications. It is managed by a software package known as a Database Management System (DBMS).

Database Models
Hierarchical

Network
Relational

Hierarchical Model
This model is like a hierarchical tree structure, used to construct a hierarchy of records in the form of nodes and branches. The data elements present in the structure have a parent-child relationship. Closely related information in the parent-child structure is stored together as a logical unit. A parent unit may have many child units, but a child is restricted to only one parent. This leads to the repetition of the same child record under different parents. In the example, each supplier occurrence includes the corresponding shipment quantity.
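[Figure: hierarchical example. Part records (P01 Screw 8 Cal; P02 Nut 16 Chenn) are the parents; supplier records (S01 Ram 15 Cal; S02 Shyam 20 Chenn; S03 Shiva 25 Hyd) are repeated as children under the parts they supply.]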

Drawbacks of Hierarchical Model

It is not flexible enough to represent all the relationship types that occur in the real world.
It cannot represent a many-to-many relationship.
It is suitable only when the data has a clearly hierarchical character with a single root, for example the DOS directory structure.
The hierarchical database approach is relatively fast as long as you only want to access the data from the top. The most serious problem is the difficulty of searching for items in the middle or at the bottom of the hierarchy. For example, to find all of the customers who ordered a specific item, the database would have to inspect each customer, every order and each item.
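The search problem can be sketched in a few lines of Python. This is only an illustration: the nested dictionary, the customer and order codes, and the function name customers_who_ordered are all hypothetical and not part of the original material.

```python
# Hypothetical hierarchy: customer -> orders -> items (illustrative data only).
customers = {
    "C1": {"O1": ["Pepsi", "Butter"], "O2": ["Biscuit"]},
    "C4": {"O3": ["Bread"]},
    "C7": {"O4": ["Pepsi"]},
}


def customers_who_ordered(tree, wanted_item):
    """Answer a bottom-of-the-hierarchy question by walking down from the root."""
    found = []
    visited = 0  # number of item records inspected
    for customer, orders in tree.items():
        hit = False
        for items in orders.values():
            for item in items:
                visited += 1
                if item == wanted_item:
                    hit = True
        if hit:
            found.append(customer)
    return found, visited


result, visited = customers_who_ordered(customers, "Pepsi")
print(result, visited)
```

Every item record is visited even though only two customers qualify, because the only access path starts at the top of the tree.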

Network Model
It is like the hierarchical model, but here a child can take part in multiple parent-child relationships. Rapid and easy access to data is possible in this model because there are multiple access paths to the data elements.

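[Figure: network example. Supplier records (S01 Ram 15 Cal; S02 Shyam 20 Chenn) and part records (P01 Screw Cal; P02 Nut 16 Chenn) are linked through connector records holding the shipment quantities 150, 200, 250 and 100.]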

Network Model Description

A connector occurrence represents the association (shipment) between one supplier and one part, and contains data describing the association. All connector occurrences for a given supplier or part are placed on a chain starting from and returning to that supplier or part. Each connector is thus on exactly two chains, one supplier chain and one part chain. For example, supplier S02 supplies 250 of part P02 and 100 of part P01. The associations are maintained using pointers, and having to trace these pointers is the drawback of this design.

Relational Model

Supplier
Supplier Code  Supplier Name  Status  City
S01            Ram            15      Calcutta
S02            Shyam          20      Chennai
S03            Amit           25      Hyderabad

Parts
Pcode  PartName  Weight  City
P01    Screw     8       Calcutta
P02    Nut       16      Chennai
P03    Bolt      20      Hyderabad
P04    Nut       16      Mumbai

Shipment
Scode  Pcode  Qty
S01    P01    150
S01    P02    200
S02    P03    250
S02    P04    100
S03    P01    300

Relational Model
Data is organized in terms of rows and columns in tables known as relations.
The position of a row in a table is of no importance.
The intersection of a row and a column must give a single value and not a set of values.
All values appearing in a column are derived from an underlying domain (a set of values).
Rows must be unique.
All column values are atomic.
In a relational database, there are no hard-coded relationships defined between tables. A relationship can be specified at any time using any column name.

Properties of a Table
No duplicate rows or columns
No significance of row or column order
Values should be atomic

Relational Operators
Restriction
Projection
Product
Join
Union
Intersect
Difference
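Three of these operators can be sketched in Python on the supplier and shipment data used in this deck. This is a minimal illustration, assuming a relation is held as a list of dictionaries; the function names restrict, project and natural_join and the lowercase column names are hypothetical.

```python
supplier = [
    {"scode": "S01", "sname": "Ram", "status": 15, "city": "Calcutta"},
    {"scode": "S02", "sname": "Shyam", "status": 20, "city": "Chennai"},
    {"scode": "S03", "sname": "Amit", "status": 25, "city": "Hyderabad"},
]
shipment = [
    {"scode": "S01", "pcode": "P01", "qty": 150},
    {"scode": "S01", "pcode": "P02", "qty": 200},
    {"scode": "S02", "pcode": "P03", "qty": 250},
]


def restrict(rel, predicate):
    """Restriction: keep the rows that satisfy the predicate."""
    return [row for row in rel if predicate(row)]


def project(rel, cols):
    """Projection: keep the named columns and drop duplicate rows."""
    seen, out = set(), []
    for row in rel:
        key = tuple(row[c] for c in cols)
        if key not in seen:
            seen.add(key)
            out.append(dict(zip(cols, key)))
    return out


def natural_join(r, s):
    """Join: combine rows that agree on all commonly named columns."""
    if not r or not s:
        return []
    common = [c for c in r[0] if c in s[0]]
    return [{**a, **b} for a in r for b in s
            if all(a[c] == b[c] for c in common)]


high = restrict(supplier, lambda row: row["status"] >= 20)
cities = project(supplier, ["city"])
joined = natural_join(supplier, shipment)
print([row["scode"] for row in high], len(cities), len(joined))
```

Each operator takes relations as input and returns a relation, which is what allows them to be combined freely.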

Decomposition
Break a relation into two or more relations.
Example: R(Rollno, Name, CourseNo, Grade) is decomposed into R1(Rollno, Name) and R2(Rollno, CourseNo, Grade).
All attributes are preserved. Is the information preserved?

Loss Less Join Decomposition


A decomposition is said to be a lossless join decomposition if the original relation can be recovered by joining the parts.
That is, if R is decomposed into R1 and R2 and R = R1 join R2, then {R1, R2} is a lossless join decomposition of R.

Loss Less Join

Consider the relation R:

RollNo  Name  CourseNo  Grade
1       A     CS1       A
2       B     CS2       B
1       A     CS2       B

If R is decomposed into

RollNo  Name
1       A
2       B

and

RollNo  CourseNo  Grade
1       CS1       A
2       CS2       B
1       CS2       B

then we have a lossless join: joining the two parts on RollNo gives back exactly R.

But if R is broken into

RollNo  Name  Grade
1       A     A
2       B     B
1       A     B

and

RollNo  CourseNo
1       CS1
2       CS2
1       CS2

we have a lossy join decomposition. The output of the join is the following table:

RollNo  Name  Course  Grade
1       A     CS1     A
2       B     CS2     B
1       A     CS2     B
1       A     CS2     A
1       A     CS1     B

The join is not lossless since there are extra tuples. The extra tuples mean a loss of information: it is no longer known which grade belongs to which course.
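The lossless-join test can be checked mechanically: project the relation onto each part, join the parts again, and compare with the original. The sketch below does this in Python for the roll number example. The helper names are hypothetical, and the second (lossy) split is assumed to be (RollNo, Name, Grade) and (RollNo, CourseNo), which reproduces the five-tuple join output shown above.

```python
def project(rows, attrs):
    """Projection: keep only attrs and drop duplicate tuples."""
    return {tuple(row[a] for a in attrs) for row in rows}


def natural_join(r1, attrs1, r2, attrs2):
    """Natural join of two sets of tuples on their shared attribute names."""
    common = [a for a in attrs1 if a in attrs2]
    out_attrs = attrs1 + [a for a in attrs2 if a not in common]
    joined = set()
    for t1 in r1:
        row1 = dict(zip(attrs1, t1))
        for t2 in r2:
            row2 = dict(zip(attrs2, t2))
            if all(row1[a] == row2[a] for a in common):
                merged = {**row1, **row2}
                joined.add(tuple(merged[a] for a in out_attrs))
    return out_attrs, joined


def rejoin(rows, attrs, attrs1, attrs2):
    """Decompose rows into two projections, join them, restore column order."""
    out_attrs, joined = natural_join(project(rows, attrs1), attrs1,
                                     project(rows, attrs2), attrs2)
    return {tuple(t[out_attrs.index(a)] for a in attrs) for t in joined}


ATTRS = ["rollno", "name", "courseno", "grade"]
R = [dict(zip(ATTRS, t)) for t in [
    (1, "A", "CS1", "A"),
    (2, "B", "CS2", "B"),
    (1, "A", "CS2", "B"),
]]
original = project(R, ATTRS)

good = rejoin(R, ATTRS, ["rollno", "name"], ["rollno", "courseno", "grade"])
bad = rejoin(R, ATTRS, ["rollno", "name", "grade"], ["rollno", "courseno"])

print(good == original)                     # lossless: nothing added
print(len(bad), sorted(bad - original))     # lossy: spurious tuples appear
```

The second split produces two spurious tuples, which is exactly the loss of information described in the text.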

Relational Model
User Level: External View
Administrator Level: Conceptual View
Hardware Level: Internal View

Properties of Relational Database


Represents data in the form of tables.
Does not hard-code relationships between tables.
Does not require the user to understand its physical implementation.
Provides information about its content and structure in the form of tables (system tables).
Supports the concept of NULL values. Null values are supported for the representation of missing and inapplicable information, independently of the data type (for example, a null value in a char field means the same as in an integer field).

Database Design
Requirement Formulation and Analysis
  Collection and documentation of requirements
  Analysis of requirements
Conceptual Design
  Data Modeling
    First level: E-R Modeling
    Second level: Normalization

Requirement Formulation and Analysis

What are the user views of the data (present and future)?
What data elements (or attributes) are required in these user views?
What are the entities and their primary keys?
What are the operational requirements regarding security, integrity and response time?

Conceptual Design
The aim is to identify the entities and relationships that affect the organization's data. The objective of this step is to specify the conceptual structure of the data, and it is often referred to as data modeling.

Data Modeling

Describes the relationships between data objects.
First level: Entity Relationship modeling
Second level: Normalization

Entity Relationship Diagram


It is a technique for the analysis and logical modeling of a system's data requirements. It uses three basic concepts: entities, their attributes and the relationships that exist between the entities. It uses graphical notations for representing these.

Entity Relationship Diagram

Entity: a data object in the system, uniquely identifiable by its identifier, with attributes that describe it.
Attribute: describes an entity.
Relationship: relates two entities and is uniquely identified by its identifier.

Entity

An entity is an object, place, person, concept or activity about which an enterprise records data. It is an object which can have instances or occurrences. Each instance should be capable of being uniquely identified. Each entity has certain properties, or attributes, associated with it and operations applicable to it. An entity is represented by a rectangle in the E-R model.

Kind of entity    Examples
Physical object   Employee, Machine, Book, Client, Student, Item
Abstract object   Account, Department
Event             Application, Reservation, Invoice, Contract
Location          Building, City, State

Attributes
Attributes are data elements that describe an entity. If the attribute of an entity has more attributes that describe it, then it is not an attribute of that entity, but another entity. Attributes can either be listed next to the entities, or placed in circles and attached to the entities.

Entity     Attributes
Customer   Name, Address, Status
Book       ISBN, Title, Author, Price
Order      Order Number, Order Date, Placed by

Relationship
This is an association between entities. It is represented by a diamond in the E-R diagram.

STUDENT-----<ENROLLS IN>-----COURSE
EMP-----<MANAGE>-----DEPT
EMP-----<WORK IN>-----DEPT
EMP-----<Formerly Worked In>-----DEPT

Degree of Relationship
One to One (1:1)
One to Many (1:N)
Many to Many (M:N)

One to One Relationship (1:1)

Order Requisition--1--<RAISES>--1--Purchase Order

One order requisition raises one purchase order.
One purchase order is raised by one order requisition.

One to Many Relationship (1:N)

Department--1--<WORKS IN>--N--Employee

One employee works in at most one department.
One department can have many employees.

Many to Many Relationship (M:N)

ORDER--M--<CONTAINS>--N--ITEM

One order may contain many items.
One item can be contained in many orders.

E-R Model
An organization has employees assigned to specific departments. The employees may work on several projects at the same time. The project uses parts which are supplied by different suppliers, and stored in various warehouses.

PERSON--M--<BELONGS TO>--1--DEPT
PERSON-----<WORKS ON>-----PROJECT
PROJECT-----<USE>-----PARTS
PARTS-----<SUPPLIED BY>-----SUPPLIER
PARTS-----<STORED IN>-----WAREHOUSE

Un-normalized Data structure


An un-normalized data structure contains redundant and disorganized data. The data needs to be organized by dividing it over several tables to avoid redundancy. This is achieved by going through the process of normalization.

The raw table containing invoice details:

Invoice  Invoice   Order  Challan  Cust  Customer    Item  Item     Qty   Rate  Dis.  Inv
No       date      no     no       no    Name        No    Desc.    sold              Value
112      12/08/97  1      1        C1    S. Srikant  I1    Pepsi    2     8     Nil   38
112      12/08/97  2      1        C1    S. Srikant  I2    Butter   1     22    Nil   38
113      16/08/97  1      1        C4    Kavita      I4    Bread    1     12    Nil   12
114      16/08/97  1      1        C1    S. Srikant  I8    Biscuit  2     22    Nil   54
114      16/08/97  2      1        C1    S. Srikant  I2    Pepsi    4     8     Nil   54

Sup  Name   Status  City      Part  Name   Weight  Qty  Destination
S01  Ram    15      Calcutta  101   Screw  8       500  Mumbai
S02  Shyam  20      Madras    102   Nut    16      200  Hyderabad
S03  Amit   25      Hyd       103   Bolt   20      250  Chennai
S01  Ram    15      Calcutta  102   Nut    8       500  Mumbai

The table presents several difficulties in operations, such as:

Insertion of fields: A new piece of information cannot be added to the database until it becomes related to another existing field. For example, if a new supplier is introduced, its details cannot be entered in the table unless a part is present for that supplier.

Updation of fields: If the supplier code of a supplier is to be modified, it has to be changed throughout the table, in all occurrences of the supplier record. Missing even a single correction would result in inaccurate data.

Deletion of fields: If information related to a specific column is to be deleted, the entire row has to be deleted, which results in the loss of required information. For example, if the row of the supplier ABC is deleted, the information about the part name is lost.
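The deletion and update anomalies can be reproduced with Python's built-in sqlite3 module on a cut-down copy of the supplier and part table. This is a sketch only: the table name supply, its column names and the replacement name 'Ramesh' are assumptions made for the illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE supply (sup TEXT, sname TEXT, part INTEGER, pname TEXT, qty INTEGER)"
)
conn.executemany("INSERT INTO supply VALUES (?, ?, ?, ?, ?)", [
    ("S01", "Ram", 101, "Screw", 500),
    ("S02", "Shyam", 102, "Nut", 200),
    ("S03", "Amit", 103, "Bolt", 250),
    ("S01", "Ram", 102, "Nut", 500),
])

# Deletion anomaly: removing supplier S03 also removes all knowledge of part 103.
conn.execute("DELETE FROM supply WHERE sup = 'S03'")
bolt_rows = conn.execute(
    "SELECT COUNT(*) FROM supply WHERE pname = 'Bolt'"
).fetchone()[0]

# Update anomaly: correcting only one of S01's rows leaves two names for one supplier.
conn.execute("UPDATE supply SET sname = 'Ramesh' WHERE sup = 'S01' AND part = 101")
names_for_s01 = {row[0] for row in conn.execute(
    "SELECT sname FROM supply WHERE sup = 'S01'"
)}

print(bolt_rows, sorted(names_for_s01))
```

After the delete no row mentions Bolt any more, and after the partial update supplier S01 carries two different names.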

Need for Normalization:

Improves database design
Ensures minimum redundancy
Reduces the need to reorganize data when the design is modified or enhanced
Removes anomalies for database activities

Steps in Normalization

First Normal Form
Identify repeating groups of fields.
Remove repeating groups to a separate table.
Identify the keys for the tables.
The key of the parent table is brought in as part of the concatenated key of the second table.

Invoice Table
Invoice No  Invoice date  Order no  Challan no  Cust no  Customer Name  Invoice Value
112         12/08/97      1         1           C1       S. Srikant     38
113         16/08/97      1         1           C4       Kavita         12
114         16/08/97      1         1           C1       S. Srikant     54

Invoice Items
Invoice No  Item No  Item Desc  Qty Sold  Rate  Discount
112         I1       Pepsi      2         8     Nil
112         I2       Butter     1         22    Nil
113         I4       Bread      1         12    Nil
114         I8       Biscuit    2         22    Nil
114         I2       Pepsi      4         8     Nil

Second Normal Form

Check if all fields are dependent on the whole key.
Remove fields that depend on part of the key.
Group partially dependent fields as a separate table.
Name the tables.
Identify the key(s) of the table(s).

Invoice Table
Invoice No  Invoice date  Order No.  Challan No.  Cust No  Customer Name  Invoice Value
112         12/08/99      1          1            C1       Rochna         38
113         14/09/99      2          1            D1       Kavita         12
114         16/08/99      3          1            C2       Anupam         54

Invoice Items
Invoice No  Item No.  Qty Sold  Discount
112         I1        2         Nil
112         I2        1         Nil
113         I4        1         Nil
114         I8        2         Nil
114         I2        4         Nil

Item Table
Item No.  Item Desc.  Rate
I1        Coke        8
I2        Burger      22
I4        Pizza       12
I8        Patties     22

Third Normal Form

Remove fields that:
  depend on other non-key attributes
  can be calculated or derived from logic
Group interdependent fields as separate tables; identify the key and the name of each table.

Invoice Table
Invoice No  Invoice date  Order No.  Challan No.  Cust No  Invoice Value
200         12/10/00      1          1            A1       50
201         13/10/00      2          1            A2       34
202         14/10/00      2          1            A4       89

Customer Table
Cust No.  Customer Name
A1        Anupam
A2        Samir
A3        Ali
A4        Rochna

Invoice Items
Invoice No  Item No.  Qty Sold  Discount
200         It1       3         Nil
201         It2       1         Nil
202         It3       2         Nil

Items
Item No.  Item Desc.  Rate
It1       Patties     8
It2       Cake        100
It3       Pizza       50
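The normalized tables can be written down as a schema and joined back together when a full invoice line is needed. The sketch below uses Python's sqlite3 module with the sample rows from the third normal form tables. The table and column names are assumptions chosen for the example, and the line total is computed in the query rather than stored, since it can be derived.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customer (cust_no TEXT PRIMARY KEY, cust_name TEXT);
CREATE TABLE invoice (
    invoice_no   INTEGER PRIMARY KEY,
    invoice_date TEXT,
    cust_no      TEXT REFERENCES customer(cust_no)
);
CREATE TABLE item (item_no TEXT PRIMARY KEY, item_desc TEXT, rate INTEGER);
CREATE TABLE invoice_item (
    invoice_no INTEGER REFERENCES invoice(invoice_no),
    item_no    TEXT REFERENCES item(item_no),
    qty_sold   INTEGER,
    PRIMARY KEY (invoice_no, item_no)
);
INSERT INTO customer VALUES ('A1','Anupam'), ('A2','Samir'), ('A3','Ali'), ('A4','Rochna');
INSERT INTO invoice VALUES (200,'12/10/00','A1'), (201,'13/10/00','A2'), (202,'14/10/00','A4');
INSERT INTO item VALUES ('It1','Patties',8), ('It2','Cake',100), ('It3','Pizza',50);
INSERT INTO invoice_item VALUES (200,'It1',3), (201,'It2',1), (202,'It3',2);
""")

# Rebuild the wide invoice view by joining on the keys; derive the amount on the fly.
rows = conn.execute("""
    SELECT i.invoice_no, c.cust_name, it.item_desc, ii.qty_sold * it.rate AS line_total
    FROM invoice i
    JOIN customer c      ON c.cust_no = i.cust_no
    JOIN invoice_item ii ON ii.invoice_no = i.invoice_no
    JOIN item it         ON it.item_no = ii.item_no
    ORDER BY i.invoice_no
""").fetchall()

for row in rows:
    print(row)
```

Each fact (a customer name, an item rate) is stored once, and the join supplies it wherever it is needed.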

Client/Server

[Figure: clients send SQL requests to the Database Server, which manages the Database.]

NORMAL FORMS SUMMARY

1NF
A relation is in 1NF if every attribute is atomic (not a set of values).
Prime attributes: An attribute is called a prime attribute if it is part of the key. It is a non-prime attribute if it is not part of the key.

2NF
A relation R is in 2NF if no non-prime attribute is partially dependent on the key of R; that is, every non-key attribute is fully dependent on the key.

3NF
A relation is in 3NF if every non-key attribute is fully dependent on the key and no non-key attribute depends on another non-key attribute.

Consider the relation R (Rollno, CourseNo, CourseName, Grade). The dependencies are:
Rollno, CourseNo -> Grade
CourseNo -> CourseName
The key is (Rollno, CourseNo).
CourseName depends on only part of the key (a partial dependency), so R is not in 2NF.
Decompose the table into (Rollno, CourseNo, Grade) and (CourseNo, CourseName).
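The 2NF check in this example can be expressed as a small Python function. It is a simplified sketch: it looks only at the functional dependencies exactly as listed (it does not compute closures), it assumes a single candidate key, and the function name partial_dependencies is hypothetical.

```python
def partial_dependencies(key, fds):
    """Return the listed dependencies whose left side is a proper subset of the
    key and whose right side contains non-prime attributes."""
    key = frozenset(key)
    violations = []
    for lhs, rhs in fds:
        lhs = frozenset(lhs)
        non_prime = set(rhs) - key
        if lhs < key and non_prime:  # '<' is the proper-subset test
            violations.append((sorted(lhs), sorted(non_prime)))
    return violations


key = {"rollno", "courseno"}
fds = [
    ({"rollno", "courseno"}, {"grade"}),   # full dependency on the key
    ({"courseno"}, {"coursename"}),        # depends on part of the key only
]

violations = partial_dependencies(key, fds)
print(violations)
```

Only CourseNo -> CourseName is reported, which is the dependency that the decomposition moves into its own table.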

Consider the relation (User#, UserName, Designation, Entitlement, Book#, BookName, Date of Issue). The functional dependencies are:
User# -> UserName, Designation, Entitlement
Book# -> BookName
User#, Book# -> Date of Issue
The key is (User#, Book#).

There is a lot of duplicate information; for example, the user name is repeated.
Book information cannot be stored without associated issue information (NULL values would have to be used otherwise, which is not advisable).
This schema is in 1NF.
There are partial as well as transitive dependencies (hence it is not in 2NF or 3NF).

Convert to 2NF by eliminating partial dependencies. Decompose into:
(User#, UserName, Designation, Entitlement)
(Book#, BookName)
(User#, Book#, Issue Date)
All of these are in 2NF.
Are there more redundancies left?

In the relation (User#, UserName, Designation, Entitlement), the dependency Designation -> Entitlement gives rise to a transitive dependency. An example extension could be:

User#  UserName  Designation  Entitlement
1      A         Lect         20
2      B         Prof         10
3      C         Lect         20

This schema does not have partial dependencies but has a transitive dependency.
Eliminate it by decomposing the relation into:
(User#, UserName, Designation)
(Designation, Entitlement)
These are in 3NF, with no redundancies.
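The effect of this decomposition can be checked on the example extension. The Python sketch below tests that Designation -> Entitlement holds in the sample rows, splits the relation, and joins it back. The function name fd_holds and the lowercase attribute names are assumptions made for the illustration.

```python
def fd_holds(rows, lhs, rhs):
    """True if equal values on lhs always come with equal values on rhs."""
    seen = {}
    for row in rows:
        left = tuple(row[a] for a in lhs)
        right = tuple(row[a] for a in rhs)
        if seen.setdefault(left, right) != right:
            return False
    return True


users = [
    {"user": 1, "username": "A", "designation": "Lect", "entitlement": 20},
    {"user": 2, "username": "B", "designation": "Prof", "entitlement": 10},
    {"user": 3, "username": "C", "designation": "Lect", "entitlement": 20},
]

holds = fd_holds(users, ["designation"], ["entitlement"])

# Decompose: the entitlement of each designation is now stored exactly once.
user_rel = {(r["user"], r["username"], r["designation"]) for r in users}
entitlement_rel = {(r["designation"], r["entitlement"]) for r in users}

# Joining on designation gives back the original rows, so nothing is lost.
rejoined = {(u, n, d, e)
            for (u, n, d) in user_rel
            for (d2, e) in entitlement_rel if d == d2}
original = {(r["user"], r["username"], r["designation"], r["entitlement"])
            for r in users}

print(holds, len(entitlement_rel), rejoined == original)
```

The pair (Lect, 20) appears twice in the original extension but only once after the split.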

Why NORMAL FORM

A relation in 3NF will not have redundancies arising out of partial and transitive dependencies.
Hence the anomalies caused by these redundancies are avoided:
Insertion anomaly
Deletion anomaly
Updation anomaly

E-R to Relational Conversion Principles

Each entity E maps into a relation R. All simple components become attributes of the relation.
Choose one of the key attributes of E as the primary key of R.
Example: Employee (Employee Number, Date of Birth, First Name, Last Name, Designation). The key attribute of E is Employee Number.
We now give simple rules to convert a conceptual schema in an E-R diagram to a logical schema.

The first rule suggests that each entity can be represented by a relation. If there are several keys for the entity, designate any one of them as the key of the relation.
A 1:1 relationship between two entities A and B can be represented by augmenting one of them with the key of the other.

INSTRUCTOR--1--<Teaches>--1--COURSE
Instructor (Instructor Info, Course No)
Course (Course Info)
Either entity may be chosen for augmenting.

A 1:N relationship between two entities A and B can be handled by adding the key of A (the 1-side relation) to the attributes of B (the N-side relation).

DEPARTMENT---1--<Offers>---N---COURSE
Department (Department Number, Name, Location)
Course (Course Number, Course Name), augmented with Department Number
The N side is augmented with the key of the 1 side.
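The 1:N rule can be sketched with Python's sqlite3 module: the course table carries the department key as a foreign key. The column names and the sample rows are hypothetical. Note that SQLite enforces foreign keys only after PRAGMA foreign_keys = ON has been issued on the connection.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite does not enforce FKs by default
conn.executescript("""
CREATE TABLE department (
    dept_no  INTEGER PRIMARY KEY,
    name     TEXT,
    location TEXT
);
CREATE TABLE course (
    course_no   INTEGER PRIMARY KEY,
    course_name TEXT,
    dept_no     INTEGER NOT NULL REFERENCES department(dept_no)  -- key of the 1 side
);
INSERT INTO department VALUES (10, 'Physics', 'Block A');
INSERT INTO course VALUES (101, 'Mechanics', 10), (102, 'Optics', 10);
""")

# One department, many courses.
courses_in_dept = conn.execute(
    "SELECT COUNT(*) FROM course WHERE dept_no = 10"
).fetchone()[0]

# A course that points at a non-existent department is rejected.
try:
    conn.execute("INSERT INTO course VALUES (103, 'Orphan course', 99)")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True

print(courses_in_dept, rejected)
```

Placing the key on the N side keeps one row per course, whereas placing course keys in the department row would need a repeating group.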

An M:N relationship can be represented by creating a new relation with the keys of the entities involved.

STUDENT--M-----<Register>---N---COURSE
Student (key: rollno)
Register (rollno, courseno)
Course (key: courseno)

An M:N relationship, when translated, results in three relations: one for each entity and one for the relationship (Student relation, Register relation, Course relation).
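A minimal sqlite3 sketch of the M:N rule follows, with Register as the relation that holds both keys. The non-key columns and the sample rows are assumptions made for the illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE student (rollno INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE course (courseno TEXT PRIMARY KEY, title TEXT);
CREATE TABLE register (
    rollno   INTEGER REFERENCES student(rollno),
    courseno TEXT REFERENCES course(courseno),
    PRIMARY KEY (rollno, courseno)   -- keys of both entities together
);
INSERT INTO student VALUES (1, 'A'), (2, 'B');
INSERT INTO course VALUES ('CS1', 'Course one'), ('CS2', 'Course two');
INSERT INTO register VALUES (1, 'CS1'), (1, 'CS2'), (2, 'CS2');
""")

# One student, many courses ...
courses_of_1 = [r[0] for r in conn.execute(
    "SELECT courseno FROM register WHERE rollno = 1 ORDER BY courseno")]
# ... and one course, many students.
students_in_cs2 = [r[0] for r in conn.execute(
    "SELECT rollno FROM register WHERE courseno = 'CS2' ORDER BY rollno")]

print(courses_of_1, students_in_cs2)
```

The composite primary key also prevents the same student from being registered for the same course twice.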

An n-ary relationship (n > 2) can be represented by a new relation with the keys of each of the participating entities as attributes.

SUPPLIER---------<Supplies>---------PARTS
                     |
                 DEPARTMENT
Supplies (Supplier_id, Parts_id, Department_id)

A multivalued attribute of an entity A is represented as a new relation with the key of A as an extra attribute.

EMPLOYEE: Empno, Phone No (multivalued attribute)
Employee relation: (empno, ename, sal)
Phone relation: (empno, Phone No)

Assignment on E-R Diagram

Consider train ticket reservation software. The system allows passengers to reserve berths on trains. Passengers are identified by name, address, age and sex. Passengers travel between stations in a particular class. Trains originate at one station and terminate at another station. They are classified as slow, fast and superfast.
A reservation is made for a passenger on a train for a particular seat or berth against payment of the cost of the ticket.
The cost (fare) is calculated from a database which stores the cost of each class of ticket for each type of train from every location to every other location.

Entities and Attributes
PASSENGERS: Name, Address (composite), Age, Sex
TRAIN: Train Number, Train Name, Origin, Dest, Type of Train
SEAT: Bogie Number, Berth Number, Class

Relationships
Relationship   Between                              Attributes
HAS            Train (1), Seat (N)                  Nil
Reservation    Train (1), Seat (1), Passenger (1)   From, To, Date of Journey, Date of Reservation

Conceptual Data Model

TRAIN (Train No, type, name, origin, destination)
TRAIN-----<Has>-----SEAT
TRAIN, SEAT, PASSENGER-----<Reservation>

Rule 1. The Information Representation Rule

All information in a relational database is represented explicitly at the logical level by data values in tables.

Rule 2. The Guaranteed Access Rule


Each and every datum (atomic value) in a relational database is guaranteed to be logically accessible by resorting to a combination of table name, primary key value and column name.

Rule 3. Systematic Treatment of Null Values

Null values (distinct from the empty character string or a string of blank characters, and distinct from zero or any other number) are supported in a fully relational DBMS for representing missing information and inapplicable information in a systematic way, independent of data type.
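The distinction between NULL, the empty string and zero can be observed with Python's sqlite3 module. The emp table and its three rows are hypothetical, and the behaviour shown is that of SQLite.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE emp (empno INTEGER, ename TEXT, sal INTEGER)")
conn.executemany("INSERT INTO emp VALUES (?, ?, ?)", [
    (1, "Ram", 0),      # a salary of zero is a known value
    (2, "", 1000),      # an empty name is a known (empty) string
    (3, None, None),    # None is stored as NULL: the values are missing
])


def count(where):
    """Count the rows that satisfy the given WHERE clause."""
    return conn.execute("SELECT COUNT(*) FROM emp WHERE " + where).fetchone()[0]


null_names = count("ename IS NULL")
empty_names = count("ename = ''")
zero_salaries = count("sal = 0")
equals_null = count("sal = NULL")   # a comparison with NULL is never true
avg_sal = conn.execute("SELECT AVG(sal) FROM emp").fetchone()[0]

print(null_names, empty_names, zero_salaries, equals_null, avg_sal)
```

The same IS NULL test works for the text column and the integer column, and the aggregate ignores the missing salary instead of treating it as zero.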

Rule 4. Database Description Rule

A description of the database is stored and maintained in the form of tables, just as the data itself is. The database description is represented at the logical level just like ordinary data, so that authorized users can apply the same relational language to query it as they apply to regular data.

Rule 5. Comprehensive Data Sublanguage Rule

The RDBMS must be completely manageable through its own extension of SQL. The SQL should support DDL, views, DML, integrity constraints, authorization and transaction boundaries.

Rule 6. View Updating Rule


All views that are theoretically updatable are also updatable by the system. A view is a logical window onto part or all of one or more tables

Rule 7. High-level Insert, Update and Delete

An RDBMS must do more than just be able to retrieve relational data sets. It has to be capable of inserting, updating and deleting data as a relational set. A database cannot be called relational if it uses a single-record-at-a-time procedural technique when it comes to manipulating the data.
SQL processes sets of records rather than one record at a time, and it provides automatic navigation to the data. All SQL statements accept sets as input and return sets as output.
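Set-at-a-time processing can be illustrated with a single UPDATE statement issued through Python's sqlite3 module. The supplier rows come from the relational model example in this deck; the rule "raise every status below 25 by 5" is a hypothetical change invented for the sketch.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE supplier (scode TEXT PRIMARY KEY, sname TEXT, status INTEGER, city TEXT)"
)
conn.executemany("INSERT INTO supplier VALUES (?, ?, ?, ?)", [
    ("S01", "Ram", 15, "Calcutta"),
    ("S02", "Shyam", 20, "Chennai"),
    ("S03", "Amit", 25, "Hyderabad"),
])

# One statement describes the whole set of rows to change; there is no loop
# over records and no explicit navigation in the application code.
cur = conn.execute("UPDATE supplier SET status = status + 5 WHERE status < 25")
rows_changed = cur.rowcount

statuses = conn.execute(
    "SELECT scode, status FROM supplier ORDER BY scode"
).fetchall()
print(rows_changed, statuses)
```

The statement names a condition, and the DBMS finds and updates every row that meets it.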

Rule 8: Physical Data Independence

Application programs and terminal activities remain logically unimpaired whenever any changes are made in either storage representations or access methods.
Application programs must deal with the logical aspects only. The database files are not accessed by an application program directly. Applications interact with the Oracle server, which controls access to the database. Modifications can be made at the server level to improve performance, to modify the physical storage representation, or for any other reason, without requiring logical modification of user applications.

Rule 9: Logical Data Independence

Application programs and terminal activities remain logically unimpaired when information-preserving changes of any kind that theoretically permit unimpairment are made to the base tables.
Examples of such changes are splitting a table into two tables, either by rows or by columns, and combining two tables into one by means of a join.

Rule 10: Integrity Independence


Integrity constraints specific to a particular relational database must be definable in the relational data sublanguage and storable in the catalog (not in the application programs).
In addition to entity integrity and referential integrity , there is a requirement to be able to specify additional integrity constraints reflecting business policies and government regulations. These constraints are defined in terms of the high level data sublanguage and are stored in the catalog not in the application programs.
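A constraint declared in the data sublanguage is stored with the table definition and enforced by the DBMS for every application. The sqlite3 sketch below shows this with a CHECK constraint; the business rule "quantity sold must be positive" and the table layout are assumptions made for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
CREATE TABLE invoice_item (
    invoice_no INTEGER NOT NULL,
    item_no    TEXT NOT NULL,
    qty_sold   INTEGER CHECK (qty_sold > 0),   -- hypothetical business rule
    PRIMARY KEY (invoice_no, item_no)          -- uniqueness of the key
)
""")
conn.execute("INSERT INTO invoice_item VALUES (112, 'I1', 2)")

# The DBMS, not the application, rejects a row that breaks the rule.
try:
    conn.execute("INSERT INTO invoice_item VALUES (112, 'I2', 0)")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True

# The constraint lives in the catalog (sqlite_master in SQLite).
ddl = conn.execute(
    "SELECT sql FROM sqlite_master WHERE name = 'invoice_item'"
).fetchone()[0]

print(rejected, "CHECK" in ddl)
```

Because the rule is part of the stored table definition, any program that inserts into the table is subject to it.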

Rule 11: Distribution Independence

A relational DBMS has distribution independence.
The DBMS must have a data sublanguage that enables application programs and terminal activities to remain logically unimpaired when data distribution is first introduced and when data is redistributed.

Rule 12. Non-Subversion Rule

If a relational system has a low-level (single-record-at-a-time) language, that low-level language cannot be used to subvert or bypass the integrity rules and constraints expressed in the high-level relational language (multiple-records-at-a-time).
An RDBMS product has to satisfy at least 6 of Codd's rules to be accepted as a full-fledged RDBMS.
