Вы находитесь на странице: 1из 55

DATABASE MANAGEMENT SYSTEM

Session I
Introduction to Relational Model

Krishnendu Rarhi ©
APEX Institute of Technology
Roadmap
• Introduction
• Traditional File-based system
• Database Approach
• Three-Level ANSI-SPARC Architecture
• Multi-user DBMS Architecture
• Relational Model

Krishnendu Rarhi © APEX Institute of Technology


Examples of Database Applications

■ Purchases from the Supermarket


■ Purchases using your Debit/credit card
■ Booking a holiday at the travel agent
■ Taking out insurance
■ Renting a book
■ Using the Internet
■ Studying at University

Krishnendu Rarhi © APEX Institute of Technology


Name the largest database been maintained by Indian
Government?
Aadhaar (World’s largest biometric Database)

IRCTC (World’s largest Ticketing Database)

Largest Database around the world


• Library of Congress
• Central Intelligence Agency (CIA)
• Amazon
• YouTube
• Google (Size unknown)
• Sprint, AT&T (World’s largest telecommunication company)
• National Energy Research Scientific Computing Center (NERSC) in Oakland, California
(operated by the Lawrence Berkeley National Laboratory)
• World Data Centre for Climate (Operated by the Max Planck Institute for Meteorology and
German Climate Computing Centre)

Krishnendu Rarhi © APEX Institute of Technology


Section I
Preliminaries

Krishnendu Rarhi © APEX Institute of Technology


Introduction

■ Data are unprocessed values/ facts


■ Information is the result of processing raw data to reveal its meaning
■ Database is a collection of data, typically describing the activities of one or more
related organizations
■ Database Management System (DBMS) is a software designed to assist in maintaining
and utilizing large collection of data

Krishnendu Rarhi © APEX Institute of Technology


Traditional File-based System

A collection of application programs that perform services for the end-user such as
production of reports. Each program defines and manages its own data.

Krishnendu Rarhi © APEX Institute of Technology


Limitations of File-based Approach

■ Separation & Isolation of data


– Each program maintains its own set of data
– Users of one program may be unaware of potentially useful data held by other
programs.
■ Duplication of data
– Same data is held by different programs.
– Wasted space and potentially different values and/or different formats for the
same item.

Krishnendu Rarhi © APEX Institute of Technology


Limitations of File-based Approach

■ Data dependence
– File structure is defined in the program code
■ Incompatible file formats
– Programs are written in different languages, and so cannot easily access each
other’s files
■ Fixed queries/ proliferation of application programs Other Limitations:
• Security
– Programs are written to satisfy particular functions. • Integrity
– Any new requirement needs a new program • Recovery
■ Data Portability • Restricted one-user access at a time

Krishnendu Rarhi © APEX Institute of Technology


Database Approach

■ Arose because:
– Definition of data was embedded in application programs, rather than being
stored separately and independently.
– No control over access and manipulation of data beyond that imposed by
application programs.

■ Result:
– the database and Database Management System (DBMS).

Krishnendu Rarhi © APEX Institute of Technology


Formal Definition

Database is a shared collection of logically related data (and a


description of this data), designed to meet the information needs of an
organization.
■ System catalog (metadata) provides description of data to enable program–data independence (data
abstraction).

■ Logically related data comprises entities, attributes, and relationships of an organization’s information.

DBMS is a software that enables users to define, create, maintain and


control access to the database.

Krishnendu Rarhi © APEX Institute of Technology


Database Management System

Krishnendu Rarhi © APEX Institute of Technology


Components of DBMS Environment

■ Hardware
– Can range from a PC to a network of computers.
■ Software
– DBMS, operating system, network software (if necessary) and also the application
programs.
■ Data
– Used by the organization and a description of this data called the schema.
■ Procedures
– Instructions and rules that should be applied to the design and use of the database and
DBMS.
■ People

Krishnendu Rarhi © APEX Institute of Technology


Roles in Database Environment

■ Data Administrator (DA)

■ Database Administrator (DBA)

■ Database Designers (Logical and Physical)

■ Application Programmers

■ End Users (naive and sophisticated)

Krishnendu Rarhi © APEX Institute of Technology


History of Database System

■ First-generation
– Hierarchical and Network

■ Second generation
– Relational

■ Third generation
– Object-Relational
– Object-Oriented

Krishnendu Rarhi © APEX Institute of Technology


Advantages of DBMS

■ Control of data redundancy ■ Balance conflicting requirements


■ Data consistency ■ Improved data accessibility and
■ More information from the same responsiveness
amount of data ■ Increased productivity
■ Sharing of data ■ Improved maintenance through
■ Improved data integrity data independence
■ Improved security ■ Increased concurrency
■ Enforcement of standards ■ Improved backup and recovery
■ Economy of scale services

Krishnendu Rarhi © APEX Institute of Technology


Disadvantages of DBMS

■ Complexity
■ Size
■ Cost of DBMS
■ Additional hardware costs
■ Cost of conversion
■ Performance
■ Higher impact of a failure

Krishnendu Rarhi © APEX Institute of Technology


Section II
Database Environment

Krishnendu Rarhi © APEX Institute of Technology


Objective of Three-Level Architecture

■ All users should be able to access same data


■ A user’s view is immune to changes made in other views
■ Users should not need to know physical database storage details
■ DBA should be able to change database storage structures without affecting the
users’ views
■ Internal structure of database should be unaffected by changes to physical aspects
of storage
■ DBA should be able to change conceptual structure of database without affecting
all users

Krishnendu Rarhi © APEX Institute of Technology


Three-Level ANSI-SPARC Architecture

■ ANSI: American National Standard Institute

■ SPARC: Standards Planning and Requirements


Committee

■ DBTG: Database Task Group

■ CODASYL: Conference on Data Systems and


Languages

Krishnendu Rarhi © APEX Institute of Technology


Three-Level ANSI-SPARC Architecture

■ External Level
– Users’ view of the database.
– Describes that part of database that is relevant to a particular user.
■ Conceptual Level
– Community view of the database.
– Describes what data is stored in database and relationships among the data
■ Internal Level
– Physical representation of the database on the computer.
– Describes how the data is stored in the database

Krishnendu Rarhi © APEX Institute of Technology


Three-Level ANSI-SPARC Architecture

Krishnendu Rarhi © APEX Institute of Technology


Data Independence

■ Logical Data Independence


– Refers to immunity of external schemas to changes in conceptual schema.
– Conceptual schema changes (e.g. addition/removal of entities).
– Should not require changes to external schema or rewrites of application programs
■ Physical Data Independence
– Refers to immunity of conceptual schema to changes in the internal schema.
– Internal schema changes (e.g. using different file organizations, storage
structures/devices).
– Should not require change to conceptual or external schemas

Krishnendu Rarhi © APEX Institute of Technology


Data Independence

Krishnendu Rarhi © APEX Institute of Technology


Data Model

Integrated collection of concepts for describing data, relationships between data, and
constraints on the data in an organization, to represent data in an understandable way
■ Data Model comprises:
– a structural part;
– a manipulative part;
– possibly a set of integrity rules
■ Categories of data models include:
– Object-based
– Record-based
– Physical

Krishnendu Rarhi © APEX Institute of Technology


Data Model

■ Object-Based Data Models


– Entity-Relationship
– Semantic
– Functional
– Object-Oriented

■ Record-Based Data Models


– Relational Data Model
– Network Data Model
– Hierarchical Data Model

■ Physical Data Models

Krishnendu Rarhi © APEX Institute of Technology


Relational Data Model

■ Object-Based Data Models


– Entity-Relationship
– Semantic
– Functional
– Object-Oriented

■ Record-Based Data Models


– Relational Data Model
– Network Data Model
– Hierarchical Data Model

■ Physical Data Models

Krishnendu Rarhi © APEX Institute of Technology


Network Data Model

■ Object-Based Data Models


– Entity-Relationship
– Semantic
– Functional
– Object-Oriented

■ Record-Based Data Models


– Relational Data Model
– Network Data Model
– Hierarchical Data Model

■ Physical Data Models

Krishnendu Rarhi © APEX Institute of Technology


Hierarchical Data Model

■ Object-Based Data Models


– Entity-Relationship
– Semantic
– Functional
– Object-Oriented

■ Record-Based Data Models


– Relational Data Model
– Network Data Model
– Hierarchical Data Model

■ Physical Data Models

Krishnendu Rarhi © APEX Institute of Technology


Conceptual Modeling

■ Conceptual schema is the core of a system supporting all user views.


■ Should be complete and accurate representation of an organization’s
data requirements.
■ Conceptual modeling is process of developing a model of information
use that is independent of implementation details.
■ Result is a conceptual data model

Krishnendu Rarhi © APEX Institute of Technology


Functions of DBMS

■ Data Storage, Retrieval, and ■ Authorization Services.


Update.
■ Support for Data Communication.
■ A User-Accessible Catalog.
■ Integrity Services.
■ Transaction Support.
■ Services to Promote Data
■ Concurrency Control Services. Independence.
■ Recovery Services. ■ Utility Services

Krishnendu Rarhi © APEX Institute of Technology


System Catalog

■ Repository of information (metadata) describing the data in the database.


■ One of the fundamental components of DBMS.
■ Typically stores:
– names, types, and sizes of data items;
– constraints on the data;
– names of authorized users;
– data items accessible by a user and the type of access;
– usage statistics

Krishnendu Rarhi © APEX Institute of Technology


Multi-user DBMS Architecture

■ Teleprocessing

■ File-server

■ Client-server

Krishnendu Rarhi © APEX Institute of Technology


Multi-user DBMS Architecture

■ Teleprocessing
– Traditional architecture.
– Single mainframe with a number of terminals attached.
– Trend is now towards downsizing

Krishnendu Rarhi © APEX Institute of Technology


Multi-user DBMS Architecture

■ File Server
– File-server is connected to several workstations across a network.

– Database resides on file-server.

– DBMS and applications run on each workstation.

– Disadvantages include:
 Significant network traffic.
 Copy of DBMS on each workstation.
 Concurrency, recovery and integrity control more complex

Krishnendu Rarhi © APEX Institute of Technology


Multi-user DBMS Architecture

■ Traditional Two-tier Client-Server


– Client (tier 1) manages user interface and runs applications.
– Server (tier 2) holds database and DBMS.

– Advantages include:
■ wider access to existing databases;
■ increased performance;
■ possible reduction in hardware costs;
■ reduction in communication costs;
■ increased consistency

Krishnendu Rarhi © APEX Institute of Technology


Multi-user DBMS Architecture

■ Three-Tier Client-Server
– Client side presented two problems preventing true scalability:
 ‘Fat’ client, requiring considerable resources on client’s computer to run effectively.
 Significant client side administration overhead
– Advantages:
 ‘Thin’ client, requiring less expensive hardware.
 Application maintenance centralized.
 Easier to modify or replace one tier without affecting others.
 Separating business logic from database functions makes it easier to implement load
balancing.
 Maps quite naturally to Web environment

Krishnendu Rarhi © APEX Institute of Technology


Multi-user DBMS Architecture

■ Other Architectures
– Middleware Oriented Architecture
– Service Oriented Architecture
– Cloud oriented Architecture
– Distributed Database Architecture

■ Transaction Processing Monitor is a program that controls data transfer between clients and servers in
order to provide a consistent environment, particularly for online transaction processing (OLTP).

Krishnendu Rarhi © APEX Institute of Technology


Section III
Relational Model

Krishnendu Rarhi © APEX Institute of Technology


Terminology

■ A Relation is a table with columns and rows.


– Only applies to logical structure of the database, not the physical structure.
■ Attribute is a named column of a relation.
■ Domain is the set of allowable values for one or more attributes
■ Tuple is a row of a relation.
■ Degree is the number of attributes in a relation.
■ Cardinality is the number of tuples in a relation.
■ Relational Database is a collection of normalized relations with distinct relation
names.

Krishnendu Rarhi © APEX Institute of Technology


Relational Model
Instances of Branch & Staff Relations

Krishnendu Rarhi © APEX Institute of Technology


Relational Model
Example of Attribute Domains

Krishnendu Rarhi © APEX Institute of Technology


Relational Model
Alternative Terminology

Krishnendu Rarhi © APEX Institute of Technology


Relational Model
Mathematical Definition of Relation

 Consider two sets, D1 & D2, where D1 = {2, 4} and D2 = {1, 3,


5}.
 Cartesian product, D1 ´ D2, is set of all ordered pairs, where
first element is member of D1 and second element is member of
D2.

D1 ´ D2 = {(2, 1), (2, 3), (2, 5), (4, 1), (4, 3), (4, 5)}

 Alternative way is to find all combinations of elements with


first from D1 and second from D2

Krishnendu Rarhi © APEX Institute of Technology


Relational Model
Mathematical Definition of Relation

 Any subset of Cartesian product is a relation; e.g.


R = {(2, 1), (4, 1)}
 May specify which pairs are in relation using some condition
for selection; e.g.
o second element is 1:
R = {(x, y) | x ÎD1, y ÎD2, and y = 1}
o first element is always twice the second:
S = {(x, y) | x ÎD1, y ÎD2, and x = 2y}

Krishnendu Rarhi © APEX Institute of Technology


Relational Model
Mathematical Definition of Relation

 Consider three sets D1, D2, D3 with Cartesian Product D1 ´ D2 ´


D3; e.g.

D1 = {1, 3}D2 = {2, 4}D3 = {5, 6}


D1 ´ D2 ´ D3 = {(1,2,5), (1,2,6), (1,4,5), (1,4,6), (3,2,5),
(3,2,6), (3,4,5), (3,4,6)}
 Any subset of these ordered triples is a relation

Krishnendu Rarhi © APEX Institute of Technology


Relational Model
Mathematical Definition of Relation

Cartesian product of n sets (D1, D2, . . ., Dn) is:


D1 ´ D2 ´ . . . ´ Dn = {(d1, d2, . . . , dn) | d1 ÎD1, d2 ÎD2, . . . , dnÎDn}

usually written as:


n
XD i
i=1

 Any set of n-tuples from this Cartesian product is a relation on


the n sets

Krishnendu Rarhi © APEX Institute of Technology


Relational Model

 Relation schema
Database Relations

o Named relation defined by a set of attribute and domain name


pairs.

 Relational database schema


o Set of relation schemas, each with a distinct name

Krishnendu Rarhi © APEX Institute of Technology


Relational Model

 Relation name is distinct from all other relation names in


relational schema.
Properties of Relations

 Each cell of relation contains exactly one atomic (single) value.


 Each attribute has a distinct name.
 Values of an attribute are all from the same domain
 Each tuple is distinct; there are no duplicate tuples.
 Order of attributes has no significance.
 Order of tuples has no significance, theoretically

Krishnendu Rarhi © APEX Institute of Technology


Relational Model
 Superkey
o An attribute, or set of attributes, that uniquely identifies a tuple within a
Relational Keys

relation.
o A Super key may have additional attributes that are not needed for unique
identification
 Candidate Key
o Superkey (K) such that no proper subset is a superkey within the relation.
o In each tuple of R, values of K uniquely identify that tuple (uniqueness).
o No proper subset of K has the uniqueness property (irreducibility)

Krishnendu Rarhi © APEX Institute of Technology


Relational Model
 Primary Key
o candidate key selected to identify tuples uniquely within relation
Relational Keys

o can't appear more than once in the table, and cannot contain NULL
o table cannot have more than one primary key
 Alternate Keys
o Candidate keys that are not selected to be primary key.
 Foreign Key
o Attribute, or set of attributes, within one relation that matches candidate
key of some (possibly same) relation
 Surrogate Key
o unique identifier in a database, for example, incremental key, GUID
o has no actual meaning
o has an existence only for data analysis.

Krishnendu Rarhi © APEX Institute of Technology


Relational Model
 Null
o Represents value for an attribute that is currently unknown or not
applicable for tuple that deals with incomplete or exceptional data.
Integrity Constrains

o Represents the absence of a value and is not the same as zero or spaces,
which are values
 Entity Integrity
o In a base relation, no attribute of a primary key can be null
 Referential Integrity
o If foreign key exists in a relation, either foreign key value must match a
candidate key value of some tuple in its home relation or foreign key value
must be wholly null
 General Constraints
o Additional rules specified by users or database administrators that define
or constrain some aspect of the enterprise
Krishnendu Rarhi © APEX Institute of Technology
Relational Model
 Base Relation
o Named relation corresponding to an entity in conceptual schema, whose
tuples are physically stored in database
 View
o Dynamic result of one or more relational operations operating on base
Views

relations to produce another relation


 A virtual relation that does not necessarily actually exist in the
database but is produced upon request, at time of request
 Contents of a view are defined as a query on one or more base
relations
 Views are dynamic, meaning that changes made to base
relations that affect view attributes are immediately reflected in
the view
Krishnendu Rarhi © APEX Institute of Technology
Relational Model
 Provides powerful and flexible security mechanism by hiding
parts of database from certain users
Purpose of Views

 Permits users to access data in a customized way, so that same


data can be seen by different users in different ways, at same
time

 Can simplify complex operations on base relations

Krishnendu Rarhi © APEX Institute of Technology


Relational Model
 All updates to a base relation should be immediately reflected in
all views that reference that base relation
 If view is updated, underlying base relation should reflect
Updating Views

change
 There are restrictions on types of modifications that can be
made through views:
o Updates are allowed if query involves a single base relation and contains a
candidate key of base relation.
o Updates are not allowed involving multiple base relations.
o Updates are not allowed involving aggregation or grouping operations

Krishnendu Rarhi © APEX Institute of Technology

Вам также может понравиться