Unit 1: Introduction To Databases

• Introduction
• Requirements of Databases
• Characteristics of the database
• Codd rules to convert a DBMS into RDBMS
• Database system concept
• Data Models
• Schemas and Instances
• Database Architecture
• Data Independence
• Database system environment
• Classification of DBMS system
• Database system utilities
INTRODUCTION
Information Age
The present time is known as the information age because so much human activity involves handling
data and information related to a business or organization. People have manipulated data and
exchanged information since the beginning of civilization, but this has been considered an
important discipline only for the last few decades. Today, data manipulation and information
processing have become major tasks of any organization, small or big, whether it is an educational
institution, a government concern, or a scientific, commercial or other body. Thus we can say that
information is an essential requirement of any business or organization.
Data
Data is the plural of the Latin word datum. It means raw facts or figures, such as numbers, events,
letters or transactions, from which we cannot by themselves reach any conclusion. Data becomes
useful after processing. For example, 78 is simply a number (data), but if we say that the marks in
physics are 78, it becomes information: it tells us that somebody earned a distinction in physics.
Information
Information is processed data. The user can take decisions based on information.
Data → Processing → Information
Information systems, through their central role in information economy, bring about the following
changes:
• Global exposure of the industry.
• Actively working people.
• Precedence of idea and information over money.
• Growth in the business size.
• Globalization – changing technologies.
• Integration among different components based on information flow.
• Need for optimum utilization of resources.
• Deciding loss/benefit of business.
• Future oriented Information.
• External Interfaces.
Figure: The organization as an information system. A corporate database sits at the center of the
diagram. The surrounding labels are: Management; Marketing; Product Planning Control; Sales
Development; Accounting; A/c Payable; A/c Receivable; Material Requirement Planning;
Manufacturing; Purchasing; Production Scheduling.
An organization can be regarded as a mechanism for processing information, so the traditional
management of an organization can be viewed in terms of information and process. The manager may
be considered as a planning and decision center. Established routes of information flow are used
to determine the effectiveness of the organization in achieving its objectives. Thus, information
is often described as the key to success in business.
Information Quality
We expect information to be ‘reliable’ and ‘accurate’. These features can be measured by the
degree of completeness, precision and timeliness.
Completeness
The user of information should receive all the details necessary to aid decision-making. It is
important for all information to be supplied before decisions are made. For example, new stock
should not be ordered until full details of current stock levels are known. This is a simple
example, since we know what information is required and where to obtain it. Difficulties begin
when we are not sure of the completeness of the information received. Business analysts and
economic advisors are well aware of these problems when devising strategies and fiscal plans.
Precision
Inaccurate information can be more damaging than incomplete information to a business. The
degree of accuracy required depends on the recipient’s position in the management hierarchy. In
general terms, the higher the position, the less accuracy required. Decisions made at the top
management level are based on annual summaries of items such as sales, purchases and capital
spending. Middle managers would require a greater degree of accuracy, perhaps weekly or
monthly totals. Junior management requires the greatest degree of accuracy to aid decision-
making. Daily up-to-date information is often necessary, with accuracy to the nearest percentage
point or unit.
Timeliness
This is described as ‘the provision of prepared information as soon as it is required’. We also need
to consider the case where accurate information is produced, but not used immediately, rendering
it out-of-date. Some systems demand timely information and cannot operate without it. Airline
reservation systems are one example: passengers and airline staff depend on timely information
concerning flight times, reservations and hold-ups.
Data Processing
This is a traditional term used to describe the processing of function-related data within a
business organization. Sales order processing is a typical example of data processing. Note that
processing may be carried out manually or using a computer. Some systems employ a combination of
both manual and computerized processing techniques. In both cases, the data processing is
essentially the same. Differences can be described in terms of:
Speed
Computers can process data much quicker than any human. Hence, a computer system has a
potentially higher level of productivity and, therefore, it is cheaper for high-volume data
processing. Speed allows more timely information to be generated.
Accuracy
Computers have a reputation for accuracy, assuming that correct data has been input and that
procedures define processing steps correctly. The errors in computer systems are thus human
errors (software, or input), or less likely, machine errors (hardware failure).
Volume
As processing requirements increase, possibly due to business expansion, managers require more
information processing. Human systems cannot cope with these demands. Banking is a prime
example where the dependency on computers is total.
Decision-Making
There are some tasks that computers cannot perform. These activities usually have a high degree
of non-procedural thinking in which the rules of processing are difficult to define - it would be
extremely difficult to produce a set of ‘rules’ even for safety in crossing a busy road. Many
management posts still rely to a great degree on human decision-making. Top management
decisions on policy and future business are still determined by a board of directors and not by a
computer.
Having understood the basic concept and significance of information and database, let us now get
into the basics:
• Data: As we described earlier, data are the ‘raw’ facts used for information
processing. Data must be collected and then ‘input’ ready for processing.
  - Each item of data must be clearly labeled, formatted and its size determined.
    For example, a customer account number may be labeled ‘A/C’, in numeric
    format, with a size of five digits.
  - Data may enter a system in one form and then be changed as it is processed or
    calculated. Customer order data, for example, may be converted to electronic
    form by keying in the orders from specially prepared data entry forms. The
    order data may then be used to update both customer and stock files.
• Input: The transaction is the primary data input which leads to system action, e.g., the
input of a customer order to the sales order processing system. The volume and
frequency of transactions will often determine the structure of an organization.
  - In addition to transaction data, a business system will also need reference to
    stored data, known as standing or fixed data. Within a sales order processing
    system we have standing data in the form of customer names and addresses,
    stock records and price lists. The transactions contain some standing data, for
    referencing, but mainly variable data, such as items and quantities ordered.
• Output: Output from a business system is often seen as planning or control information, or as
input to another system. This can be understood if we consider a stock control system. Output
will be stock level information (slow- and fast-moving items, for example) and stock orders for
items whose quantities fall below their reorder level. Stock movement information would be
used to plan stock levels and reorder levels, whilst stock order requirements would be used as
input to the purchasing system.
• Files: A file is an ordered collection of data records, stored for retrieval or amendment, as
required. When files are amended from transaction data, this is referred to as updating. In
order to aid information flow, files may be shared between sub systems. For example, a stock
file may be shared between the sales function and the purchasing function.
• Processes: Data is converted to output or information by processing. Processing examples
include sorting, calculating and extracting.
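The flow from input through files and processes to output can be sketched in a few lines of Python. This is only an illustration of the stock control example: the item codes, quantities and the function name process_orders are assumptions made for the sketch, not part of any real system.

```python
# Hypothetical standing data: a stock file keyed by item code.
stock_file = {
    "P1": {"quantity": 40, "reorder_level": 10},
    "P2": {"quantity": 12, "reorder_level": 10},
}

# Hypothetical transaction data: customer orders as (item code, quantity ordered).
orders = [("P1", 5), ("P2", 4)]


def process_orders(stock, transactions):
    """Update the stock file from the transactions and return items to reorder."""
    for item, quantity_ordered in transactions:
        stock[item]["quantity"] -= quantity_ordered  # updating the file
    # Output: items whose quantity has fallen below the reorder level.
    return sorted(
        item for item, rec in stock.items()
        if rec["quantity"] < rec["reorder_level"]
    )


reorder_list = process_orders(stock_file, orders)
print(reorder_list)
```

The returned list plays the role of the stock order requirements that would be passed on as input to the purchasing system.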
DATABASE
A database is a collection of related data or operational data extracted from any firm or
organization. For example, consider the names, telephone numbers and addresses of the people you
know. You may have recorded this data in an indexed address book, or you may have stored it on
a disk, using a personal computer and software such as Microsoft Access (part of MS Office),
Oracle or SQL Server.
The common use of the term database is usually more restricted.
A database has the following implicit properties:
• A database represents some aspect of the real world, sometimes called the miniworld or the
Universe of Discourse (U.D.). Changes to the miniworld are reflected in the database.
• A database is a logically coherent collection of data with some inherent meaning. A random
assortment of data cannot correctly be referred to as a database.
• A database is designed, built and populated with data for a specific purpose. It has an intended
group of users and some preconceived applications in which these users are interested.
In other words, a database has some source from which data is derived, some degree of interaction
with events and an audience that is actively interested in the contents of the database. A database
can be of any size and of varying complexity. For example, the list of names and addresses
referred to earlier may consist of only a few hundred records, each with a simple structure. On the
other hand, the card catalog of a large library may contain half a million cards stored under
different categories – by primary author’s last name, by subject, by book titles – with each
category organized in alphabetic order.
Here are several examples of databases.
1. Manufacturing company
2. Bank
3. Hospital
4. University
5. Government department
In general, a database is a collection of files (tables).
Entity: A person, place, thing or event about which information must be kept.
Attribute: Pieces of information describing a particular entity. These are mainly the
characteristics about the individual entity. Individual attributes help to identify and distinguish
one entity from another.
Student (Database Name)
Entity      Attributes
Personal    Name, Age, Address, Father’s Name
Academic    Name, Roll No., Course, Dept. Name
Hierarchy of Database
Bit (0 or 1)
Byte (8 bits, e.g. 10101011)
Field (an attribute such as Name, Age, Address)
Record (one row in a table)
File (a table: a collection of records)
Database (a collection of files or tables)

e.g.
Student (Database Name)

Personal (Table Name)
Name      Father Name   Age          <- field names (attribute names)
John      Albert        24           <- record
Ramesh    Suresh        18

Academic (Table Name)
Name      Roll No   Course   Dept. Name
John      12        MSC      Computer
Ramesh    15        BCA      Computer
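The hierarchy of field, record, file and database can be mirrored with ordinary Python data structures. This is a rough sketch only, using the sample values from the Student example; a real DBMS stores data very differently.

```python
# A field holds one value; a record groups the fields of one row.
record = {"Name": "John", "Father Name": "Albert", "Age": 24}

# A file (table) is a collection of records with the same fields.
personal = [
    record,
    {"Name": "Ramesh", "Father Name": "Suresh", "Age": 18},
]
academic = [
    {"Name": "John", "Roll No": 12, "Course": "MSC", "Dept. Name": "Computer"},
    {"Name": "Ramesh", "Roll No": 15, "Course": "BCA", "Dept. Name": "Computer"},
]

# A database is a collection of related files (tables).
student = {"Personal": personal, "Academic": academic}

# Reading one field of one record of one file:
john_age = student["Personal"][0]["Age"]
```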
Why Database?
The records of a small shop can be handled manually, but if you have a large volume of data and
multiple users, you have to maintain a computerized database. The advantages of a database system
over traditional, paper-based methods of record-keeping will perhaps be more readily apparent
from the following points.
• Compactness: No need for possibly voluminous paper files.
• Speed: The machine can retrieve and change data faster than a human can.
• Accuracy: Accurate, up-to-date information is available on demand at any time.
Benefits of the Database Approach

The database approach has the following benefits:
• Redundancy and duplication can be reduced. In the database approach, the
views of different user groups are integrated during database design. For
consistency, we should have a database design that stores each logical data item,
such as a student’s name or birth date, in only one place in the database. This
prevents inconsistency, and it saves time. However, in some cases, controlled
redundancy may be useful for improving the performance of queries.
• Inconsistency can be avoided (to some extent). Suppose that a given fact about the
real world, say that employee E4 works in department D5, is represented by two
distinct entries in the stored database. Suppose also that the DBMS is not aware of
this duplication (i.e. redundancy is not controlled). Then there will necessarily be
an occasion on which the two entries will not agree, i.e., when one of the two has
been updated and the other has not. At such times the database is said to be
inconsistent.
• The data can be shared. The same database can be used by a variety of users, for
their different objectives, simultaneously.
• Security restrictions can be applied. It is likely that some users will not
be authorized to access all information in the database. For example, financial data
is often considered confidential, and hence only authorized persons are allowed to
access such data. In addition, some users may be permitted only to retrieve data,
whereas others are allowed both to retrieve and to update.
• Integrity can be maintained. The problem of integrity is the problem of ensuring
that the data in the database is accurate. For example, if the data type of a field is
number, then we cannot insert a text string into it.
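The inconsistency described in the list above can be reproduced with a small Python sketch. The file names payroll_file and personnel_file and the new department D7 are assumptions made purely for illustration.

```python
# Uncontrolled redundancy: the same fact is stored in two separate files.
payroll_file = {"E4": {"dept": "D5"}}
personnel_file = {"E4": {"dept": "D5"}}

payroll_file["E4"]["dept"] = "D7"  # only one copy is updated
inconsistent = payroll_file["E4"]["dept"] != personnel_file["E4"]["dept"]

# Database approach: the fact is stored once and shared by both uses.
employee = {"E4": {"dept": "D5"}}
payroll_view = employee      # both names refer to the same stored data
personnel_view = employee
employee["E4"]["dept"] = "D7"
consistent = payroll_view["E4"]["dept"] == personnel_view["E4"]["dept"]
```

With two independent copies, a single update leaves the files in disagreement. With one shared copy there is nothing to get out of step.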
DATABASE SYSTEM
A DBMS is a sophisticated piece of software that supports the creation, manipulation and
administration of a database system. A database system comprises a database of operational data
together with the processing functionality required to access and manage that data. Typically, this
means a computerized record-keeping system whose overall purpose is to maintain information
and to make that information available on demand.
The DBMS as an Interface between Physical Database and User Requests
The DBMS responds to a query by invoking the appropriate sub-programs, each of
which performs its special function to interpret the query, or to locate the desired data in the
database and present it in the desired order. Thus the DBMS shields database users from the
tedious programming they would otherwise have to do to organize data for storage, or to gain
access to it once it has been stored.
Figure: User requests (query language, record operations, other interfaces) pass through the
database management system and the operating system to reach the database.
As already mentioned, a database consists of a group of related files of different record types and
the DBMS allows users to access data anywhere in the database, without the knowledge of how
data are actually organized on the storage device.
The DBMS (database approach) tries to overcome the following shortcomings of the pre-database
approach:
• Data Validation Problems: If many programs manipulate a particular type of information,
then validation of its correctness must be carried out by each of those programs to guard
against entry of any illegal values. Consequently, program code may need to be duplicated and,
if the validation conditions change, each program must (at least) be recompiled.
• Data Sharing Problems: Perhaps more seriously, if a file is used by several programs and
there is a need to change its structure in some way, perhaps to add a new type of information
object that is required by a new program, then each program will need to be recompiled, unless
one maintains duplicate information in different structures, in which case there is a
synchronization problem.
• Manipulation Problems: When writing a program using a conventional programming
language and operating system facilities, a programmer uses record-level commands (i.e. reads
and writes) on each file to perform the required functions; this is laborious and hence an
unproductive use of the programmer’s time.
• Data Redundancy: The same piece of information may be stored in two or more files. For
example, the particulars of an individual who may be a customer and an employee may be
stored in two or more files.
• Program/Data Dependency: In the traditional approach, if a data field is to be added to a
master file, all such programs that access the master file would have to be changed to allow for
this new field which would have been added to the master record.
• Lack of Flexibility: In view of the strong coupling between the program and the data, most
information retrieval possibilities would be limited to well-anticipated and predetermined
requests for data. The system would normally be capable of producing only the scheduled
reports and queries which it had been programmed to create. In the fast-moving and competitive
business environment of today, apart from such regularly scheduled reports there is a need for
responding to unanticipated queries and for some kind of investigative analysis which cannot
be envisaged in advance.
So let us now try to appreciate how DBMS solves some of the issues.
• Data Validation: In principle, validation rules for data objects can be held in the schema and
enforced on entry by the DBMS. This reduces the amount of application code that is needed.
Changes to these rules need be made exactly once because they are not duplicated.
• Data Sharing: Changes to the structures of data objects are registered by modifications
to the schema. Existing application programs need not be aware of any differences, because a
correspondence between their view of the data and the view that is now supported
can also be held in the schema and interpreted by the DBMS. This concept is often referred
to as data independence; applications are independent of the actual representation of
their data.
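Data independence can be illustrated with a view, using Python's built-in sqlite3 module. This is a minimal sketch under assumed names (customer_view, customer_name, customer_address); the "application" is just a function that queries the view and never sees the underlying tables.

```python
import sqlite3

conn = sqlite3.connect(":memory:")


def application_query(connection):
    """The 'application': it only knows the name customer_view."""
    return connection.execute(
        "SELECT name, city FROM customer_view ORDER BY name"
    ).fetchall()


# Original structure: one table, exposed through a view.
conn.executescript("""
    CREATE TABLE customer (name TEXT, city TEXT);
    INSERT INTO customer VALUES ('Johnson', 'Palo Alto'), ('Smith', 'Rye');
    CREATE VIEW customer_view AS SELECT name, city FROM customer;
""")
before = application_query(conn)

# Structural change: names and cities move to two separate tables.
# Only the schema (the view definition) is updated; the application is not.
conn.executescript("""
    DROP VIEW customer_view;
    CREATE TABLE customer_name (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE customer_address (id INTEGER, city TEXT);
    INSERT INTO customer_name (name) SELECT name FROM customer;
    INSERT INTO customer_address
        SELECT n.id, c.city
        FROM customer_name AS n JOIN customer AS c ON c.name = n.name;
    DROP TABLE customer;
    CREATE VIEW customer_view AS
        SELECT n.name, a.city
        FROM customer_name AS n JOIN customer_address AS a ON a.id = n.id;
""")
after = application_query(conn)
```

The same function returns the same rows before and after the restructuring, because the correspondence between its view of the data and the stored tables is held in the schema.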
One of the main reasons for using DBMS is to have central control of both the data and the
processes that access those data. The person who has such central control over the system is called
the database administrator (DBA). The functions of the DBA include the following:
• Schema Definition: The DBA creates the original database schema by writing a set of
definitions that is translated by the DDL (data definition language) compiler to a set of tables
that is stored permanently in the data dictionary.
• Storage Structure and Access-Method Definition: The DBA creates appropriate storage
structures and access methods by writing a set of definitions, which is translated by the DDL
compiler.
• Schema and Physical-Organization Modification: Programmers accomplish the relatively
rare modifications either to the database schema or to the description of the physical storage
organization by writing a set of definitions that is used by either the DDL compiler or the
data-storage and data-definition language compiler to generate modifications to the appropriate
internal system tables (for example, the data dictionary).
• Granting of Authorization for Data Access: The granting of different types of
authorization allows the DBA to regulate which parts of the database various users can
access.
• Integrity-Constraint Specification: The data values stored in the database must satisfy
certain consistency constraints. For example, the number of hours an employee may work in 1
week may not exceed a pre-specified limit (say, 80 hours).
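A constraint such as the 80-hour limit can be stated once in the schema and then enforced by the DBMS on every insert. The sketch below uses Python's sqlite3 module; the table name timesheet and its columns are assumptions made for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# The limit is part of the table definition, not of any application program.
conn.execute(
    "CREATE TABLE timesheet ("
    " employee TEXT NOT NULL,"
    " hours_per_week INTEGER NOT NULL"
    " CHECK (hours_per_week BETWEEN 0 AND 80))"
)
conn.execute("INSERT INTO timesheet VALUES ('E4', 40)")  # within the limit

try:
    conn.execute("INSERT INTO timesheet VALUES ('E5', 95)")  # exceeds 80 hours
    violation_caught = False
except sqlite3.IntegrityError:
    violation_caught = True  # the DBMS rejected the row

stored = conn.execute(
    "SELECT employee, hours_per_week FROM timesheet"
).fetchall()
```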
CODD RULES
Rule 1 : The information Rule.
"All information in a relational data base is represented explicitly at the logical level and in
exactly one way - by values in tables."
Everything within the database exists in tables and is accessed via table access routines.
Rule 2 : Guaranteed access Rule.

"Each and every datum (atomic value) in a relational data base is guaranteed to be logically
accessible by resorting to a combination of table name, primary key value and column name."
To access any data-item you specify which column within which table it exists, there is no
reading of characters 10 to 20 of a 255 byte string.
Rule 3 : Systematic treatment of null values.

"Null values (distinct from the empty character string or a string of blank characters and distinct
from zero or any other number) are supported in fully relational DBMS for representing missing
information and inapplicable information in a systematic way, independent of data type."
If data does not exist or does not apply, then a value of NULL is stored. The RDBMS understands
NULL as meaning missing or inapplicable data, regardless of the data type of the column.
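The difference between NULL, the empty string and zero can be seen in a short sqlite3 session. This is only a sketch with an assumed person table; other RDBMS products follow the same basic rules for IS NULL, but details may vary.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE person (name TEXT, phone TEXT, age INTEGER)")
conn.executemany(
    "INSERT INTO person VALUES (?, ?, ?)",
    [
        ("John", "", 0),         # empty string and zero are ordinary values
        ("Ramesh", None, None),  # Python None is stored as SQL NULL
    ],
)

missing_phone = conn.execute(
    "SELECT name FROM person WHERE phone IS NULL"
).fetchall()
empty_phone = conn.execute(
    "SELECT name FROM person WHERE phone = ''"
).fetchall()
# Comparing with NULL using '=' never matches, not even for NULL itself.
equals_null = conn.execute(
    "SELECT name FROM person WHERE phone = NULL"
).fetchall()
```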
Rule 4 : Dynamic on-line catalog based on the relational model.

"The data base description is represented at the logical level in the same way as ordinary data, so
that authorized users can apply the same relational language to its interrogation as they apply to
the regular data."
The data dictionary is held within the RDBMS, so there is no need for off-line volumes to tell
you the structure of the database.
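As one concrete illustration, SQLite keeps its catalog in a table called sqlite_master, which can be queried with an ordinary SELECT. The sketch below assumes two example tables; other RDBMS products expose their catalogs under different names.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE account (account_number TEXT PRIMARY KEY, balance INTEGER)"
)
conn.execute("CREATE TABLE customer (name TEXT, city TEXT)")

# The catalog is itself a table, queried with the same language as user data.
tables = conn.execute(
    "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
).fetchall()
```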
Rule 5 : Comprehensive data sub-language Rule.

"A relational system may support several languages and various modes of terminal use (for
example, the fill-in-the-blanks mode). However, there must be at least one language whose
statements are expressible, per some well-defined syntax, as character strings and that is
comprehensive in supporting all the following items:
• Data Definition
• View Definition
• Data Manipulation (Interactive and by program)
• Integrity Constraints
• Authorization
• Transaction boundaries (begin, commit and rollback)"
Every RDBMS should provide a language to allow the user to query the contents of the RDBMS
and also manipulate the contents of the RDBMS.
Rule 6 : View updating Rule.

"All views that are theoretically updatable are also updatable by the system."
If a change made through a view can be mapped unambiguously onto the underlying base tables, the
RDBMS must be able to carry out that change.
Rule 7 : High-level insert, update and delete.

"The capability of handling a base relation or a derived relation as a single operand applies not
only to the retrieval of data but also to the insertion, update and deletion of data."
The user should be able to insert, update or delete a whole set of rows in a table or view with
a single statement, rather than having to process one record at a time.
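A set-at-a-time update looks like this in SQL, run here through Python's sqlite3 module. The account table and the 50-unit adjustment are assumptions chosen for the sketch.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE account (account_number TEXT PRIMARY KEY, balance INTEGER)"
)
conn.executemany(
    "INSERT INTO account VALUES (?, ?)",
    [("A-101", 500), ("A-215", 700), ("A-305", 350)],
)

# One statement updates every qualifying row; no record-by-record loop.
cursor = conn.execute(
    "UPDATE account SET balance = balance + 50 WHERE balance < 600"
)
rows_changed = cursor.rowcount

balances = dict(conn.execute("SELECT account_number, balance FROM account"))
```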
Rule 8 : Physical data independence.

"Application programs and terminal activities remain logically unimpaired whenever any
changes are made in either storage representations or access methods."
The user should not need to be aware of where or upon which media the data files are stored.
Rule 9 : Logical data independence.

"Application programs and terminal activities remain logically unimpaired when information-
preserving changes of any kind that theoretically permit un-impairment are made to the base
tables."
User programs and the user should not be aware of any changes to the structure of the tables
(such as the addition of extra columns).
Rule 10 : Integrity independence.

"Integrity constraints specific to a particular relational data base must be definable in the
relational data sub-language and storable in the catalog, not in the application programs."
If a column only accepts certain values, then it is the RDBMS which enforces these constraints
and not the user program. This means that an invalid value can never be entered into this column,
whereas if the constraints were enforced via programs there would always be a chance that a buggy
program might allow incorrect values into the system.
Rule 11 : Distribution independence.

"A relational DBMS has distribution independence."
The RDBMS may be spread across more than one system and across several networks; however, to
the end user the tables should appear no different from those that are local.
Rule 12 : Non-subversion Rule.

"If a relational system has a low-level (single-record-at-a-time) language, that low level cannot
be used to subvert or bypass the integrity rules and constraints expressed in the higher level
relational language (multiple-records-at-a-time)."
Rule 0 : Foundation Rule.
Codd also defined a Rule 0 for relational database systems.
"For any system that is advertised as, or claimed to be, a relational database management system,
that system must be able to manage databases entirely through its relational capabilities, no
matter what additional capabilities the system may support." (Codd, 1990)
That means that, no matter what additional features a system supports, it can be called
relational only if it manages its data entirely through its relational capabilities. The 12 rules
above spell out what that requires.
DATA MODELS
Underlying the structure of a database is the data model: a collection of conceptual tools for
describing data, data relationships, data semantics, and consistency constraints. The various data
models that have been proposed fall into three different groups: object-based logical models,
record-based logical models, and physical models.
Object-Based Logical Models
Object-based logical models are used in describing data at the logical and view levels. They are
characterized by the fact that they provide fairly flexible structuring capabilities and allow data
constraints to be specified explicitly. There are many different models, and more are likely to
come. Several of the more widely known ones are
• The entity-relationship model

• The object-oriented model
The Entity-Relationship Model
The entity-relationship (E-R) data model is based on a perception of the real world that consists
of a collection of basic objects, called entities, and of relationships among these objects. An
entity is a “thing” or “object” in the real world that is distinguishable from other objects. For
example, each person is an entity, and bank accounts can be considered to be entities. Entities are
described in a database by a set of attributes. For example, the attributes account-number and
balance describe one particular account in a bank. A relationship is an association among several
entities. For example, a Depositor relationship associates a customer with each account that she
has. The set of all entities of the same type, and the set of all relationships of the same type,
are termed an entity set and a relationship set, respectively.
Figure 1. A sample E-R diagram: the entity set Customer (attributes Social-Security,
Customer-name, Customer-street, Customer-city) is connected through the relationship set
Depositor to the entity set Account (attributes Account-number, Balance).
In addition to entities and relationships, the E-R model represents certain constraints to which the
contents of a database must conform. One important constraint is mapping cardinalities, which
express the number of entities to which another entity can be associated via a relationship set.
The overall logical structure of a database can be expressed graphically by an E-R diagram, which
is built up from the following components:
• Rectangles, which represent entity sets

• Ellipses, which represent attributes
• Diamonds, which represent relationships among entity sets
• Lines, which link attributes to entity sets and entity sets to relationships
Each component is labeled with the entity or relationship that it represents.
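The E-R concepts can be mimicked in Python to make them concrete. In this sketch, which is only one of many possible encodings, each entity set becomes a class, each entity an instance, and the relationship set Depositor a set of (customer, account) pairs. The sample values are taken from the bank example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Customer:  # entity set Customer
    social_security: str
    name: str
    street: str
    city: str


@dataclass(frozen=True)
class Account:  # entity set Account
    account_number: str
    balance: int


johnson = Customer("192-83-7465", "Johnson", "Alma", "Palo Alto")
smith = Customer("019-28-3746", "Smith", "North", "Rye")
a101 = Account("A-101", 500)
a201 = Account("A-201", 900)

# Relationship set Depositor: pairs (customer, account).
depositor = {(johnson, a101), (johnson, a201), (smith, a201)}

# Many-to-many mapping cardinality: a customer may hold several accounts
# and an account may have several holders.
johnson_accounts = sorted(a.account_number for c, a in depositor if c == johnson)
a201_holders = sorted(c.name for c, a in depositor if a == a201)
```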
The Object-Oriented Model
Like the E-R model, the object-oriented model is based on a collection of objects. An object
contains values stored in instance variables within the object. An object also contains bodies of
code that operate on the object. These bodies of code are called methods.
Objects that contain the same types of values and the same methods are grouped together into
classes. A class may be viewed as a type definition for objects. This combination of data and
methods comprising a type definition is similar to a programming-language abstract data type.
The only way in which one object can access the data of another object is by invoking a method of
that other object. This action is called sending a message to the object. Thus, the call interface of
the methods of an object defines that object’s externally visible part. The internal part of the
object (the instance variables and method code) is not visible externally. The result is two levels
of data abstraction.
To illustrate the concept, let us consider an object representing a bank account. Such an object
contains instance variables account-number and balance. It contains a method pay-interest, which
adds interest to the balance. Assume that the bank had been paying 6 percent interest on all
accounts, but now is changing its policy to pay 5 percent if the balance is less than $1000 or 6
percent if the balance is $1000 or greater. Under most data models, making this adjustment would
involve changing code in one or more application programs. Under the object-oriented model, the
only change is made within the pay-interest method. The external interface to the objects remains
unchanged.
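The bank account example can be written as a small Python class. This is a sketch under simplifying assumptions: balances are whole dollars, interest is rounded down to a whole dollar, and the method names balance and pay_interest are chosen for the example.

```python
class Account:
    def __init__(self, account_number, balance):
        # Instance variables: the internal part of the object.
        self._account_number = account_number
        self._balance = balance

    def balance(self):
        return self._balance

    def pay_interest(self):
        # New policy from the text: 5% below $1000, 6% at $1000 or above.
        # Changing the policy only means changing this method body.
        rate_percent = 5 if self._balance < 1000 else 6
        self._balance += self._balance * rate_percent // 100


small = Account("A-101", 500)
large = Account("A-201", 2000)
small.pay_interest()  # callers use the same message as before the change
large.pay_interest()
```

Code that calls pay_interest does not need to know which rate applies, so the external interface of the object stays the same when the policy changes.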
Figure 2: Class hierarchy (classes shown, from top to bottom: bank; person and branch; employee
and customer; teller and secretary).
Unlike entities in the E-R model, each object has its own unique identity, independent of the
values that it contains. Thus, two objects containing the same values are nevertheless distinct. The
distinction among individual objects is maintained in the physical level through the assignment of
distinct object identifiers.
Record-Based Logical Models
Record-based logical models are used in describing data at the logical and view levels. In contrast
to object-based data models, they are used both to specify the overall logical structure of the
database and to provide a higher-level description of the implementation.
Record-based models are so named because the database is structured in fixed-format records of
several types. Each record type defines a fixed number of fields, or attributes, and each field is
usually of a fixed length. The use of fixed-length records simplifies the physical-level
implementation of the database. This simplicity is in contrast to many of the object-based models,
whose richer structure often leads to variable-length records at the physical level.
The three most widely accepted record-based data models are the relational, network, and
hierarchical models. The relational model has gained favor over the other two in recent
years. The network and hierarchical models are still used in a large number of older databases.
Here, we present a brief overview of each model.
Relational Model
The relational model uses a collection of tables to represent both data and the relationships among
those data. Each table has multiple columns, and each column has a unique name. Figure 3
presents a sample relational database comprising two tables: one shows bank customers, and the
other shows the accounts that belong to those customers. It shows, for example, that customer
Johnson, with social-security number 192-83-7465, lives on Alma in Palo Alto, and has two
accounts: A-101, with a balance of $500, and A-201, with a balance of $900. Note that customers
Johnson and Smith share account number A-201 (they may share a business venture).

Customer-name   Social-security   Customer-street   Customer-city   Account-number
Johnson         192-83-7465       Alma              Palo Alto       A-101
Smith           019-28-3746       North             Rye             A-215
Hayes           677-89-9011       Main              Harrison        A-102
Turner          182-73-6091       Putnam            Stamford        A-305
Johnson         192-83-7465       Alma              Palo Alto       A-201
Jones           321-12-3123       Main              Harrison        A-217
Lindsay         336-66-9999       Park              Pittsfield      A-222
Smith           019-28-3746       North             Rye             A-201

Account-number   Balance
A-101            500
A-215            700
A-102            400
A-305            350
A-201            900
A-217            750
A-222            700

Figure 3. A sample relational database
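Part of the sample database can be loaded into SQLite through Python's sqlite3 module to show how the two tables are combined. This is a sketch only: the column names are adapted to SQL naming rules, and only the rows for Johnson and Smith are included.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customer (
        customer_name TEXT, social_security TEXT,
        customer_street TEXT, customer_city TEXT, account_number TEXT);
    CREATE TABLE account (account_number TEXT PRIMARY KEY, balance INTEGER);
    INSERT INTO customer VALUES
        ('Johnson', '192-83-7465', 'Alma',  'Palo Alto', 'A-101'),
        ('Johnson', '192-83-7465', 'Alma',  'Palo Alto', 'A-201'),
        ('Smith',   '019-28-3746', 'North', 'Rye',       'A-215'),
        ('Smith',   '019-28-3746', 'North', 'Rye',       'A-201');
    INSERT INTO account VALUES
        ('A-101', 500), ('A-201', 900), ('A-215', 700);
""")

# The two tables are related only by the shared account_number values.
johnson_accounts = conn.execute(
    """SELECT a.account_number, a.balance
       FROM customer AS c JOIN account AS a
         ON a.account_number = c.account_number
       WHERE c.customer_name = 'Johnson'
       ORDER BY a.account_number"""
).fetchall()
```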
Network Model
Data in the network model are represented by collections of records (in the Pascal sense), and
relationships among data are represented by links, which can be viewed as pointers. The records in
the database are organized as collections of arbitrary graphs. Figure 4 presents a sample network
database using the same information as in figure 3.
Figure 4. Network Model
Hierarchical Model
The hierarchical model is similar to the network model in the sense that data and relationships
among data are represented by records and links, respectively. It differs from the network model in
that the records are organized as collections of trees rather than arbitrary graphs. Figure 5 presents
a sample hierarchical database with the same information as in Figure 4.
Figure 5. A sample hierarchical database.
Difference Among the Models
The relational model differs from the network and hierarchical models in that it does not use
pointers or links. Instead, the relational model relates records by the values that they contain.
This freedom from the use of pointers allows a formal mathematical foundation to be defined.
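The contrast can be sketched in Python. In this hedged example, object references stand in for the links of the network and hierarchical models, while plain tuples matched on account number stand in for relational rows. The class names and the row layout are assumptions made for illustration.

```python
from dataclasses import dataclass, field


# Link-based style: a customer record holds direct references to its accounts.
@dataclass
class AccountRecord:
    number: str
    balance: int


@dataclass
class CustomerRecord:
    name: str
    accounts: list = field(default_factory=list)  # links to AccountRecord objects


a101 = AccountRecord("A-101", 500)
a201 = AccountRecord("A-201", 900)
johnson = CustomerRecord("Johnson", [a101, a201])

# Navigating the links: follow each reference from the customer record.
linked_total = sum(acc.balance for acc in johnson.accounts)

# Value-based style: rows are plain tuples with no references between them.
depositor_rows = [("Johnson", "A-101"), ("Johnson", "A-201"), ("Smith", "A-201")]
account_rows = [("A-101", 500), ("A-201", 900)]

# Rows are related only because the account numbers they contain are equal.
value_total = sum(
    balance
    for name, acc_no in depositor_rows
    for number, balance in account_rows
    if name == "Johnson" and acc_no == number
)
```

Both computations give the same total for Johnson, but the second one depends only on the values stored in the rows.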
Physical Data Models
Physical data models are used to describe data at the lowest level. In contrast to logical data
models, there are few physical data models in use. Two of the widely known ones are the unifying
model and the frame-memory model.
Physical data models capture aspects of database-system implementation that are not covered in
this book.
DATABASE LANGUAGES
A database system provides two different types of languages: one to specify the database schema,
and the other to express database queries and updates.
Data-Definition Language
A database schema is specified by a set of definitions expressed in a special language called a
data-definition language (DDL). The result of compiling DDL statements is a set of tables that
is stored in a special file called the data dictionary, or data directory.
A data dictionary is a file that contains metadata, that is, data about data.
This file is consulted before actual data are read or modified in the database system.
The storage structure and access methods used by the database system are specified by a set of
definitions in a special type of DDL called a data storage and definition language. The result of
compiling these definitions is a set of instructions that specify the implementation details of the
database schemas; these details are usually hidden from the users.
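A small sketch with Python's built-in sqlite3 module shows a DDL statement leaving metadata behind in a catalog. SQLite is used only as a convenient stand-in: its catalog is a table named sqlite_master rather than a separate file, and the account table is an assumed example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# A DDL statement: it defines structure and stores no account data.
conn.execute(
    "CREATE TABLE account (account_number TEXT PRIMARY KEY, balance INTEGER)"
)

# SQLite records the definition in its catalog table, sqlite_master.
catalog = conn.execute(
    "SELECT type, name FROM sqlite_master WHERE type = 'table'"
).fetchall()

# PRAGMA table_info returns one row per column: (cid, name, type, notnull, default, pk).
columns = [(row[1], row[2]) for row in conn.execute("PRAGMA table_info(account)")]

# The table itself is still empty: only metadata has been written so far.
row_count = conn.execute("SELECT COUNT(*) FROM account").fetchone()[0]
```

Here catalog and columns hold data about data, while row_count confirms that no actual data has been stored yet.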
Data-Manipulation Language
The levels of abstraction apply not only to the definition or structuring of data, but also to the
manipulation of data. By data manipulation, we mean
• The retrieval of information stored in the database
• The insertion of new information into the database
• The deletion of information from the database
• The modification of information stored in the database
At the physical level, we must define algorithms that allow efficient access to data. At higher
levels of abstraction, we emphasize ease of use. The goal is to provide efficient human interaction
with the system.
A data-manipulation language (DML) is a language that enables users to access or manipulate data
as organized by the appropriate data model. There are basically two types:
• Procedural DMLs require a user to specify what data are needed and how to get those data.
• Nonprocedural DMLs require a user to specify what data are needed without specifying
how to get those data.
A query is a statement requesting the retrieval of information. The portion of a DML that involves
information retrieval is called a query language. Although technically incorrect, it is common
practice to use the terms query language and data-manipulation language synonymously.
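The difference between the two styles of DML can be sketched in Python. The loop below spells out how to find the data, while the SQL query states only what is wanted. The function names, the threshold parameter, and the sample rows are assumptions for illustration; SQL stands in for a nonprocedural DML.

```python
import sqlite3

accounts = [("A-101", 500), ("A-215", 700), ("A-102", 400), ("A-305", 350)]


def procedural_over(rows, threshold):
    """Procedural style: say what is needed and how to get it, row by row."""
    result = []
    for number, balance in rows:
        if balance > threshold:
            result.append(number)
    result.sort()
    return result


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE account (account_number TEXT, balance INTEGER)")
conn.executemany("INSERT INTO account VALUES (?, ?)", accounts)


def nonprocedural_over(threshold):
    """Nonprocedural style: say what is needed; the system decides how."""
    rows = conn.execute(
        "SELECT account_number FROM account "
        "WHERE balance > ? ORDER BY account_number",
        (threshold,),
    )
    return [row[0] for row in rows]
```

Both functions return the same account numbers for a given threshold; only the first commits the caller to a particular access strategy.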
INSTANCES AND SCHEMAS

A database changes over time as information is inserted and deleted. The collection of
information stored in the database at a particular moment is called an instance of the database. The
overall design of the database is called the database schema. Schemas are changed infrequently, if
at all.
Analogies to the concepts of data types, variables and values in programming languages are useful
here. Consider the customer record type defined in the section on data abstraction: in declaring
the type customer, we have not declared any variables. To declare such a variable in a Pascal-like
language, we write
var customer2 : customer;
Variable customer2 now corresponds to an area of storage containing a customer type record.
A database schema corresponds to the programming-language type definition. A variable of a
given type has a particular value at a given instant. Thus, the value of a variable in programming
languages corresponds to an instance of a database schema. In other words, “the description of a
database is called the database schema, which is specified during database design and is not
expected to change frequently”. A displayed schema is called a schema diagram.
E.g. student-schema:
Student:  Name | Roll No | Class | Major
Course:   Course No | Department
Schema diagram
A schema diagram displays only some aspects of a schema, such as the names of record types and
data items, and some types of constraints. Other aspects are not specified in the schema diagram.
For example, the above diagram shows neither the data type of each data item nor the
relationships among the various files.
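The schema/instance distinction can be sketched with a Python dataclass. The class below uses a subset of the student fields, and the student names and class label are invented sample values, not data from the text.

```python
from dataclasses import dataclass, fields


@dataclass
class Student:
    # The type definition plays the role of the schema.
    name: str
    roll_no: int
    student_class: str


schema_before = [f.name for f in fields(Student)]

# The list of records at a given moment plays the role of the instance.
instance = [Student("Asha", 1, "B.Sc. I")]  # hypothetical sample record
size_before = len(instance)

# Inserting a record changes the instance...
instance.append(Student("Ravi", 2, "B.Sc. I"))  # hypothetical sample record
size_after = len(instance)

# ...but leaves the schema exactly as it was.
schema_after = [f.name for f in fields(Student)]
```

The instance grows from one record to two, while schema_before and schema_after are identical.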
DATA ABSTRACTION
For the system to be usable, it must retrieve data efficiently. This concern has led to the design of
complex data structures for the representation of data in the database. Since many database-system
users are not computer trained, developers hide the complexity from users through several levels
of abstraction, to simplify users’ interactions with the systems:
• Physical level. The lowest level of abstraction describes how the data are actually
stored. At the physical level, complex low-level data structures are described in detail.
• Logical level. The next higher level of abstraction describes what data are stored in the
database, and what relationships exist among those data. The entire database is thus
described in terms of a small number of relatively simple structures. Although
implementation of the simple structures at the logical level may involve complex
physical-level structures, the user of the logical level does not need to be aware of this
complexity. Database administrators, who must decide what information is to be kept
in the database, use the logical level of abstraction.
• View level. The highest level of abstraction describes only part of the entire database.
Despite the use of simpler structures at the logical level, some complexity remains,
because of the large size of the database. Many users of the database system will not be
concerned with all this information. Instead, such users need to access only a part of the
database. So that their interaction with the system is simplified, the view level of
abstraction is defined. The system may provide many views for the same database.
The interrelationship among these three levels of abstraction is illustrated in the figure below.
The Three Levels of Data Abstraction
An analogy to the concept of data types in programming languages may clarify the distinction
among levels of abstraction. Most high-level programming languages support the notion of a
record type. For example, in a Pascal-like language, we may declare a record as follows:
type customer = record
    customer-name : string;
    social-security : string;
    customer-street : string;
    customer-city : string;
end;
This code defines a new record called customer with four fields. Each field has a name and a type
associated with it. A banking enterprise may have several such record types, including
• Account, with fields account-number and balance
• Employee, with fields employee-name and salary

At the physical level, a customer, account, or employee record can be described as a block of
consecutive storage locations (for example, words or bytes). The language compiler hides this
level of detail from programmers. Similarly, the database system hides many of the lowest-level
storage details from database programmers. Database administrators may be aware of certain
details of the physical organization of the data.
At the logical level, each such record is described by a type definition, as illustrated in the
previous code segment, and the interrelationship among these record types is defined.
Programmers using a programming language work at this level of abstraction. Similarly, database
administrators usually work at this level of abstraction.
Finally, at the view level, computer users see a set of application programs that hide details of the
data types. Similarly, at the view level, several views of the database are defined, and database
users see these views. In addition to hiding details of the logical level of the database, the views
also provide a security mechanism to prevent users from accessing parts of the database. For
example, tellers in a bank see only that part of the database that has information on customer
accounts; they cannot access information concerning salaries of employees.
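The teller example can be sketched with a view in Python's built-in sqlite3 module. The view name teller_view and the employee row are assumptions for illustration. SQLite has no per-user permissions, so the sketch shows only what a view exposes; in a multi-user DBMS the teller would additionally be granted access to the view alone.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE account  (account_number TEXT, balance INTEGER);
    CREATE TABLE employee (employee_name TEXT, salary INTEGER);
    INSERT INTO account  VALUES ('A-101', 500), ('A-201', 900);
    INSERT INTO employee VALUES ('Lee', 42000);  -- hypothetical employee

    -- The view exposes account data only; salaries are not part of it.
    CREATE VIEW teller_view AS
        SELECT account_number, balance FROM account;
""")

cursor = conn.execute("SELECT * FROM teller_view ORDER BY account_number")
view_columns = [col[0] for col in cursor.description]
teller_rows = cursor.fetchall()
```

A program written against teller_view sees the account columns and nothing about employees.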
OVERALL SYSTEM STRUCTURE
A database system is partitioned into modules that deal with each of the responsibilities of the
overall system. Some of the functions of the database system may be provided by the computer’s
operating system. In most cases, the computer’s operating system provides only the most basic
services, and the database system must build on that base. Thus, the design of a database system
must include consideration of the interface between the database system and the operating system.
The functional components of a database system can be broadly divided into query processor
components and storage manager components. The query processor components include:
• DML compiler, which translates DML statements in a query language into low-level
instructions that the query evaluation engine understands. In addition, the DML compiler
attempts to transform a user’s request into an equivalent but more efficient form, thus
finding a good strategy for executing the query.
• Embedded DML precompiler, which converts DML statements embedded in an
application program to normal procedure calls in the host language. The precompiler must
interact with the DML compiler to generate the appropriate code.
• DDL interpreter, which interprets DDL statements and records them in a set of tables
containing metadata.
• Query evaluation engine, which executes low-level instructions generated by the DML
compiler.
The storage manager components provide the interface between the low-level data stored in the
database and the application programs and queries submitted to the system. The storage manager
components include:
• Authorization and integrity manager, which tests for the satisfaction of integrity constraints
and checks the authority of users to access data.
• Transaction manager, which ensures that the database remains in a consistent (correct)
state despite system failures, and that concurrent transaction executions proceed without
conflicting.
• File manager, which manages the allocation of space on disk storage and the data
structures used to represent information stored on disk.
• Buffer manager, which is responsible for fetching data from disk storage into main
memory, and deciding what data to cache in memory.
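The buffer manager's job can be sketched as a toy page cache in Python. The text does not say which replacement policy is used; least-recently-used (LRU) eviction is an assumption here, as are the class name, the dictionary standing in for disk storage, and the page contents.

```python
from collections import OrderedDict


class BufferManager:
    """Toy buffer pool that caches pages in memory and evicts the LRU page."""

    def __init__(self, disk, capacity):
        self.disk = disk            # dict standing in for disk: page_id -> data
        self.capacity = capacity    # maximum number of pages held in memory
        self.pool = OrderedDict()   # cached pages, least recently used first
        self.disk_reads = 0

    def get_page(self, page_id):
        if page_id in self.pool:
            # Cache hit: mark the page as most recently used.
            self.pool.move_to_end(page_id)
            return self.pool[page_id]
        # Cache miss: fetch from disk and cache the page.
        self.disk_reads += 1
        data = self.disk[page_id]
        self.pool[page_id] = data
        if len(self.pool) > self.capacity:
            self.pool.popitem(last=False)  # evict the least recently used page
        return data


disk = {1: "page-1", 2: "page-2", 3: "page-3"}
bm = BufferManager(disk, capacity=2)
for pid in (1, 2, 1, 3, 2):
    bm.get_page(pid)
```

With room for two pages, the third request for page 1 is served from memory, while the later requests for pages 3 and 2 each go to disk.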
In addition, several data structures are required as part of the physical system implementation:
• Data files, which store the database itself.
• Data dictionary, which stores metadata about the structure of the database. The data dictionary is
used heavily. Therefore, great emphasis should be placed on developing a good design and
efficient implementation of the dictionary.
• Indices, which provide fast access to data items that hold particular values.
• Statistical data, which store statistical information about the data in the database. This information
is used by the query processor to select efficient ways to execute a query.
Figure 6 shows these components and the connections among them.
[Figure 6: diagram of the users (naive users such as tellers and agents, application programmers, sophisticated users, database administrator), the query processor components, the storage manager components, and the data files, data dictionary, indices, and statistical data on disk storage.]
DATA INDEPENDENCE
The ability to modify a schema definition in one level without affecting a schema definition in the
next higher level is called data independence. There are two levels of data independence:
1. Physical data independence is the ability to modify the physical schema without causing
application programs to be rewritten. Modifications at the physical level are occasionally
necessary to improve performance.
2. Logical data independence is the ability to modify the logical schema without causing
application programs to be rewritten. Modifications at the logical level are necessary
whenever the logical structure of the database is altered (for example, when money-market
accounts are added to a banking system).
Logical data independence is more difficult to achieve than is physical data independence,
since application programs are heavily dependent on the logical structure of the data that
they access.
The concept of data independence is similar in many respects to the concept of abstract
data types in modern programming languages. Both hide implementation details from the
users, to allow users to concentrate on the general structure, rather than on low-level
implementation details.
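Physical data independence can be sketched with Python's built-in sqlite3 module: adding an index changes how the data are reached, but the query text and its result stay the same. The index name idx_account_balance and the sample rows are assumptions for illustration; the sketch does not cover logical data independence.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE account (account_number TEXT, balance INTEGER)")
conn.executemany(
    "INSERT INTO account VALUES (?, ?)",
    [("A-101", 500), ("A-215", 700), ("A-222", 700)],
)

# The application's query: it is never edited in this example.
QUERY = "SELECT account_number FROM account WHERE balance = 700 ORDER BY account_number"


def run(query):
    return [row[0] for row in conn.execute(query)]


def plan(query):
    # The last column of each EXPLAIN QUERY PLAN row describes one access step.
    return " ".join(row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + query))


result_before = run(QUERY)
plan_before = plan(QUERY)

# A change at the physical level: a new access path for the balance column.
conn.execute("CREATE INDEX idx_account_balance ON account (balance)")

result_after = run(QUERY)
plan_after = plan(QUERY)
```

The plan text mentions the new index only after it is created, while result_before and result_after are equal.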
DATABASE USERS
The primary goal of a database system is to provide an environment for retrieving information
from and storing new information into the database. There are four different types of database
system users, differentiated by the way that they expect to interact with the system.
• Application programmers are computer professionals who interact with the system through
DML (Data Manipulation Language) calls, which are embedded in a program written in a host
language (for example, Cobol, PL/I, Pascal, C). These programs are commonly referred to as
application programs. For example, a banking system includes programs that generate payroll
checks, that debit accounts, that credit accounts, or that transfer funds between accounts.
• Sophisticated users: Such users interact with the system without writing programs. Instead,
they form their requests in a database query language. Each such query is submitted to a query
processor, whose function is to break down DML statements into instructions that the storage
manager understands. Analysts who submit queries to explore data in the database fall in this category.
• Specialized users: Such users write specialized database applications that do
not fit into the traditional data-processing framework, e.g. computer-aided design systems,
knowledge-base and expert systems, and systems that store data with complex data types (for
example, graphics data and audio data).
• Naive users: These are unsophisticated users who interact with the system by invoking one of
the permanent application programs that have been written previously. For example, a bank teller
who needs to transfer $50 from account A to account B invokes a program called transfer.
This program asks the teller for the amount of money to be transferred, the account from which
the money is to be transferred, and the account to which the money is to be transferred.
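A program like transfer might look roughly like the following sketch, written with Python's built-in sqlite3 module. The table layout, the opening balances, and the insufficient-funds check are assumptions for illustration; the text specifies only the three inputs.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE account (account_number TEXT PRIMARY KEY, balance INTEGER)")
conn.executemany("INSERT INTO account VALUES (?, ?)", [("A", 100), ("B", 20)])
conn.commit()


def balance(number):
    row = conn.execute(
        "SELECT balance FROM account WHERE account_number = ?", (number,)
    ).fetchone()
    return row[0]


def transfer(amount, source, target):
    """Move amount from source to target; both updates commit or neither does."""
    with conn:  # commits on success, rolls back if an exception is raised
        conn.execute(
            "UPDATE account SET balance = balance - ? WHERE account_number = ?",
            (amount, source),
        )
        if balance(source) < 0:
            # Assumed business rule: refuse to overdraw the source account.
            raise ValueError("insufficient funds")
        conn.execute(
            "UPDATE account SET balance = balance + ? WHERE account_number = ?",
            (amount, target),
        )


transfer(50, "A", "B")
```

The teller supplies only the amount and the two account numbers; the DML statements stay hidden inside the program.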
THE DATABASE SYSTEM ENVIRONMENT
A DBMS is a complex software system. In this section we discuss the types of software
components that constitute a DBMS and the types of computer system software with which the
DBMS interacts.
DBMS Component Modules
Figure 7 illustrates, in a simplified form, the typical DBMS components. The database and the
DBMS catalog are usually stored on disk. Access to the disk is controlled primarily by the
operating system (OS), which schedules disk input/output. A higher-level stored data manager
module of the DBMS controls access to DBMS information that is stored on disk, whether it is
part of the database or the catalog. The dotted lines and circles marked A, B, C, D, and E in Figure 7
illustrate accesses that are under the control of this stored data manager. The stored data manager
may use basic OS services for carrying out low-level data transfer between the disk and computer
main storage, but it controls other aspects of data transfer, such as handling buffers in main
memory. Once the data is in main memory buffers, it can be processed by other DBMS modules,
as well as by application programs.
Figure 7. Typical component modules of a DBMS. Dotted lines show accesses that are under the
control of the stored data manager.
The DDL compiler processes schema definitions, specified in the DDL, and stores descriptions of the
schemas (meta-data) in the DBMS catalog. The catalog includes information such as the names of
files, data items, storage details of each file, mapping information among schemas, and
constraints, in addition to many other types of information that are needed by the DBMS modules.
DBMS software modules then look up the catalog information as needed.
The run-time database processor handles database accesses at run time; it receives retrieval or
update operations and carries them out on the database. Access to disk goes through the stored
data manager. The query compiler handles high-level queries that are entered interactively. It
parses, analyzes, and compiles or interprets a query by creating database access code, and then
generates calls to the run-time processor for executing the code.
The pre-compiler extracts DML commands from an application program written in a host
programming language. These commands are sent to the DML compiler for compilation into
object code for database access. The rest of the program is sent to the host language compiler. The
object codes for the DML commands and the rest of the program are linked, forming a canned
transaction whose executable code includes calls to the run-time database processor.
Figure 7 is not meant to describe a specific DBMS; rather it illustrates typical DBMS modules.
The DBMS interacts with the operating system when disk accesses (to the database or to the
catalog) are needed. If the computer system is shared by many users, the OS will schedule DBMS
disk access requests and DBMS processing along with other processes. The DBMS also interfaces
with compilers for general-purpose host programming languages. User-friendly interfaces to the
DBMS can be provided to help any of the user types shown in Figure 7 (DBA staff, casual users,
application programmers, and parametric users) to specify their requests.
CLASSIFICATION OF DATABASE MANAGEMENT SYSTEM
Several criteria are normally used to classify DBMSs. The first is the data model on which the
DBMS is based. The two types of data models used in many current commercial DBMSs are the
relational data model and the object data model. Many legacy applications still run on database
systems based on the hierarchical and network data models. The relational DBMSs are evolving
continuously, and, in particular, have been incorporating many of the concepts that were
developed in object databases. This has led to a new class of DBMSs that are being called object-
relational DBMSs. We can hence categorize DBMSs based on the data model: relational, object,
object-relational, hierarchical, network, and other.
Another criterion is the number of sites over which the database is distributed. A DBMS is centralized if the data is stored at a single computer site. A centralized DBMS can
support multiple users, but the DBMS and the database themselves reside totally at a single
computer site. A distributed DBMS (DDBMS) can have the actual database and DBMS software
distributed over many sites, connected by a computer network. Homogeneous DDBMSs use the
same DBMS software at multiple sites. A recent trend is to develop software to access several
autonomous preexisting databases stored under heterogeneous DBMSs. This leads to a federated
DBMS (or multidatabase system), where the participating DBMSs are loosely coupled and have a
degree of local autonomy. Many DDBMSs use a client-server architecture.
Another criterion is the cost of the DBMS. The majority of DBMS packages cost between $10,000
and $100,000. Single-user low-end systems that work with microcomputers cost between $100 and
$3000. At the other end, a few elaborate packages cost more than $100,000.
We can also classify a DBMS on the basis of the types of access path options for storing files. One
well-known family of DBMSs is based on inverted file structures. Finally, a DBMS can be
general-purpose or special-purpose. When performance is a primary consideration, a special-
purpose DBMS can be designed and built for a specific application; such a system cannot be used
for other applications without major changes. Many airline reservations and telephone directory
systems developed in the past are special-purpose DBMSs. These fall into the category of on-line
transaction processing (OLTP) systems, which must support a large number of concurrent
transactions without imposing excessive delays.
DATABASE SYSTEM UTILITIES
In addition to possessing the software modules just described, most DBMSs have database utilities
that help the DBA in managing the database system. Common utilities have the following types of
functions:
1. Loading: A loading utility is used to load existing data files (such as text files or sequential
files) into the database. Usually, the current (source) format of the data file and the desired
(target) database file structure are specified to the utility, which then automatically
reformats the data and stores it in the database. With the proliferation of DBMSs,
transferring data from one DBMS to another is becoming common in many organizations.
Some vendors are offering products that generate the appropriate loading programs, given
the existing source and target database storage descriptions (internal schemas). Such tools
are also called conversion tools.
2. Backup: A backup utility creates a backup copy of the database, usually by dumping the
entire database onto tape. The backup copy can be used to restore the database in case of
catastrophic failure. Incremental backups are also often used, where only changes since the
previous backup are recorded. Incremental backup is more complex but it saves space.
3. File reorganization: This utility can be used to reorganize a database file into a different
file organization to improve performance.
4. Performance monitoring: Such a utility monitors database usage and provides statistics to
the DBA. The DBA uses the statistics in making decisions such as whether or not to
reorganize files to improve performance.
Other utilities may be available for sorting files, handling data compression, monitoring access by
users, and performing other functions.
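The loading and backup utilities can be sketched with Python's standard csv and sqlite3 modules. The comma-separated source format, the account table, and the function names load and backup are assumptions for illustration. The backup copies the whole database into a second in-memory database rather than onto tape.

```python
import csv
import io
import sqlite3

# Stands in for an existing text file in the source format.
SOURCE_TEXT = "account_number,balance\nA-101,500\nA-215,700\n"


def load(conn, text_file):
    """Loading utility: reformat text records and store them in the database."""
    reader = csv.DictReader(text_file)
    rows = [(rec["account_number"], int(rec["balance"])) for rec in reader]
    conn.execute(
        "CREATE TABLE IF NOT EXISTS account (account_number TEXT, balance INTEGER)"
    )
    conn.executemany("INSERT INTO account VALUES (?, ?)", rows)
    conn.commit()
    return len(rows)


def backup(conn):
    """Backup utility: dump the entire database into a second database."""
    copy = sqlite3.connect(":memory:")
    conn.backup(copy)  # sqlite3.Connection.backup, available from Python 3.7
    return copy


db = sqlite3.connect(":memory:")
loaded = load(db, io.StringIO(SOURCE_TEXT))
backup_db = backup(db)
```

After the load, backup_db holds an independent copy that could be used to restore the data if db were lost.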
