
DATABASE-1: Definitions and Concepts

Overview
History of Database
What is a Database?
Components of a Database
Different Database Users
Data Abstraction
Database Schema and Instances
Database Languages (DDL, DML, DQL, DCL)
Data Models
Transaction Management
Database Management System
Data Administrator (DA) & Database Administrator (DBA)
Advantages and Disadvantages of a DBMS

History of Database
Data may be defined as a collection of related information. For example, during a census, information about each person such as name, age, occupation and income may be stored together, forming a list of related data. To make use of this huge amount of data, it must be stored in a way that allows it to be accessed efficiently whenever needed.
[Figure: Book Index Cards (No. CJ2ed/0221, Cobol Made Easy, J. K. Jones, Second Edition, $10.95) and Member Index Cards (A. Banerjee, 2/3 Park Circus, Calcutta 700019; 1. MK1ed/2341, 2. CG1ed/0542)]

Before the advent of computers, paper index cards were used to store information and maintain a catalogue of different types of related data. For example, a library would use different sets of index cards: one set to keep a record of its books and another set to keep a record of its members. When a new book was bought, a separate index card with information about the new book was added to the existing set. Similarly, when a new member joined the library, a new card was added to the member set with information about the member and the books borrowed.

With the emergence of computers, data were stored in separate computer files in the form of records instead of on paper index cards. Each file contained a group of related records. Thus the library would now have a file containing records of the different books and a separate file containing records of the different members. The accompanying figure shows one such application, in which a manufacturing company uses three separate files for its business.
[Figure: File processing system. Supplier File User -> Supplier Processing Application -> Supplier Data File; Order File User -> Order Processing Application -> Order Data File; Payment File User -> Payment Processing Application -> Payment Data File]

The first file, the Supplier file, stores a list of names, addresses, telephone numbers, and contact persons of the different suppliers who supply raw materials to the company for making its products. The second file, the Order file, stores details of the purchase orders placed with the different suppliers for raw materials. The third file, the Payment file, stores details of the payments made to the different suppliers against their purchase orders, based on their terms of payment.

The users of the different files interact with the files by means of specific application programs. Thus the Supplier file user interacts with the Supplier file through the Supplier Processing Application program, the Order file user uses the Order Processing Application program to access the Order file, and the Payment file user runs the Payment Processing Application program to access the Payment file.

DB01 Definitions and Concepts Page 1 of 18 Joyrup Bhattacharya

Though File Processing Systems were an improvement over the earlier manual system of keeping records, they had some major drawbacks:

1. Data Duplication: The same data may need to be stored in different files. For example, in the above application, when an order is placed with a supplier, the Order file should also contain the Supplier Name and Address. Thus the same data, i.e. the Supplier Name and Address, occurs in the Supplier file and also in the Order file. Moreover, if several orders are placed with the same supplier at different times, the same data occurs even more often. Two problems arise from this. Firstly, space is wasted in storing the same data at different places. The second problem is that of data integrity. A collection of data is said to have integrity if it is logically consistent everywhere. For example, if a supplier changes its address, the change must be made in all the files where the address of the supplier is stored. However, chances are that due to manual or other mistakes the data is not updated everywhere, leading to data inconsistency and data integrity problems.

2. Separated and Isolated Data: Usually different files are used to contain different information. If data needs to be combined from these different files, the programmer must know which data to select from which file before combining them to form a third file. For example, the Supplier file, Order file and Payment file contain different information. If the user wants to find the payments that are due in a particular month, he has to refer to both the Order file and the Payment file to get the required result. If data need to be combined from many files, the process becomes complex for the programmer.

3. File Format Dependent Applications: In file processing systems, application programs are written around the data files on which they work. Thus the actual formats of the data in the files are an integral part of the application program code, i.e. there is a dependency between the data files and the application programs that work on those files. If the data type of an item in a file is modified, then all application programs that use that data file must also be modified. For example, all application programs that process files containing phone numbers need to be modified if the phone number changes from 7 to 8 digits. Modifying several programs is a time consuming and error prone job.

4. File Incompatibility: If different programming platforms are used to develop the application programs, the formats of the data files on which these programs act may also differ for each of the programming languages used. Thus a C program data file may be different from a Visual Basic program data file. In such a situation, if data from the different data files must be combined, the data files first need to be converted to a common format. This is both a complex and time-consuming task.

5. User Unfriendly Data Representation: In a file processing system it is difficult to combine data from different files and display them in a user friendly manner based on the specific requirements of the end user. This is because it is difficult to process relationships between data held in different files.

Database technologies were developed to overcome these difficulties. Unlike File Processing systems, in Database Processing systems the user application programs do not interact directly with the stored data, but go through an intermediate system called the Database Management System or DBMS. In doing so, the application programs become independent of the way in which the data is actually stored.
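The data duplication and integrity problem described in drawback 1 can be illustrated with a minimal Python sketch. The two lists below are hypothetical in-memory stand-ins for the Supplier and Order files; the order numbers, items and the new address are assumptions made up for illustration.

```python
# Hypothetical stand-ins for the Supplier and Order data files.
supplier_file = [
    {"SupID": 1, "SupName": "Godrej", "Address": "Mumbai"},
]
order_file = [
    # Each order record repeats the supplier's name and address.
    {"OrdNo": 101, "SupName": "Godrej", "Address": "Mumbai", "Item": "Steel"},
    {"OrdNo": 102, "SupName": "Godrej", "Address": "Mumbai", "Item": "Paint"},
]


def change_address_in_supplier_file(name, new_address):
    """Update the address in the Supplier file only, forgetting the Order file."""
    for rec in supplier_file:
        if rec["SupName"] == name:
            rec["Address"] = new_address


def addresses_on_record(name):
    """Collect every distinct address stored for a supplier across both files."""
    found = {r["Address"] for r in supplier_file if r["SupName"] == name}
    found |= {r["Address"] for r in order_file if r["SupName"] == name}
    return found


change_address_in_supplier_file("Godrej", "Pune")
# More than one address is now on record for the same supplier.
print(addresses_on_record("Godrej"))
```

Because the address lives in two files, an update applied to only one of them leaves the data inconsistent. Storing the address once and referring to it by a supplier ID removes this risk.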


What is a Database?
To overcome the shortfalls of File Processing systems, Database technologies were developed during the 1960s, with major contributions from IBM Corporation. Unlike File Processing systems, whose main components are data files tied to particular application programs, the main component of a Database Processing System is a Database.

A database is usually a collection of information or data related to a particular topic or subject. This data can be purely textual in nature, like the name, address and contact number of a person in a telephone directory. It can contain graphical data, like a photograph of the person along with his name, or it can be a collection of audio/video files as in a music album database. Moreover, in a database the structure of the stored data should be independent of the application programs that may use this data. In general a database should have the following characteristics:

1. Data Independence: In file processing systems, the application programs are dependent on the data structures of the data files on which they act. Thus if a data format is changed for better efficiency or accuracy, or new data items are added, the application programs also need to be changed. In a database system, data items are stored independently of any application program. Application programs interact with the Database Management System, which in turn interacts with the database, making the application programs independent of changes made to the database. This ability to modify the schema at one level without affecting the schema at the next higher level is called data independence. There are two kinds of data independence: Physical Data Independence (also called program data independence) and Logical Data Independence.

Physical Data Independence: It is the ability to modify the physical schema of data storage without the need to rewrite the application programs that access the database. It leaves the users' views and methods of accessing the information unaffected by changes made to the organisation of data at the physical or internal level. Such modifications are usually made to improve the overall performance of a database. Application programs do not depend much on the physical structure of the data, hence physical data independence is relatively easy to achieve. Example: Changing the file organisation from sequential to random access to improve performance.

Logical Data Independence: It is the ability to modify the underlying logical or conceptual schema of a database without the need to rewrite application programs. It leaves the users' views and methods of accessing the information unaffected by changes made to the structure of the database at the conceptual level. It allows the logical structure of the database to be altered when a change is required. Since application programs are highly dependent on the logical structure of the data, logical data independence is difficult to achieve. Example: Adding a new field, like the mobile phone number, to an existing record of a person in a company.


2. Data Integrity: In a database, a particular data item is kept at a single place, avoiding duplication of data. This ensures that an update needs to be made at a single point only, which removes the risk of the same item holding different values in different places.

3. Data Flexibility: In a database, the same data can be accessed from many places simultaneously and in different ways, based on the requirements of the respective application programs.


4. User Friendly Interface: The end user is not required to bother about the actual or physical storage of data. Highly technical software called the Database Management System takes care of the low level data structures and the relationships between the different data. Thus the complexity of the data and the implementation details are hidden by the DBMS, and the end user can access the required data with a minimum of technical knowledge.

We can thus formally define a Database as stated below.

A database is a self-describing, shared collection of interrelated data from which users can efficiently retrieve information in response to specific queries.
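The data independence characteristic listed above can be illustrated with a minimal sketch using Python's built-in sqlite3 module. The Person table, its columns, the sample values and the index name are assumptions chosen for illustration. SQLite stands in here for a DBMS in general.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Person (PersonID INTEGER PRIMARY KEY, Name TEXT, Phone TEXT)"
)
conn.execute("INSERT INTO Person (Name, Phone) VALUES ('A. Banerjee', '5550100')")


def phone_of(name):
    """The 'application program': it names only the columns it needs."""
    row = conn.execute(
        "SELECT Phone FROM Person WHERE Name = ?", (name,)
    ).fetchone()
    return row[0] if row else None


before = phone_of("A. Banerjee")

# Logical change: add a new field to the existing record structure.
conn.execute("ALTER TABLE Person ADD COLUMN Mobile TEXT")
# Physical change: add an access path that the DBMS may use for lookups by name.
conn.execute("CREATE INDEX PersonByName ON Person (Name)")

after = phone_of("A. Banerjee")
print(before == after)
```

The function phone_of is not edited between the two calls, yet it keeps working after both changes. A program that read fixed-width records directly from a file would have needed rewriting after the new field was added.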

Components of a Database

Supplier
SupID   SupName   SupCity   SupPhone
0001    Godrej    Mumbai    21365871
0002    Bajaj     Kolkata   24175552
0003    Steelco   Chennai   32651478
0004    Philips   Kolkata   2422369

The self-describing nature of a database implies that we do not have to rely on any external information to find out what the data in the database represent or how the different data components are related. In a File Processing system, the data files contain only the data, and the description of the data is a part of the Application Programs that access the data files. In a Database Processing system, on the other hand, the data description is built into the database along with the data. In the set of data shown above in the form of a table, the data part consists of the values 0001, Godrej, Mumbai, 21365871 etc., whereas the headings SupID, SupName, SupCity and SupPhone describe the database structure and the meaning of each data item. Both of these are built into the database.

The data describing the structure of a database is called Metadata or the Data Dictionary. For example, Metadata includes the table names, the column names, and the properties of the columns in the tables, like the data-type and the length of each data-type. This Metadata is usually stored in the form of tables called System Tables. In our previous example of the manufacturing company, we basically had three separate tables of data, viz. Supplier, Order, and Payment, and each table contained several columns in which data was divided and stored. The System Tables that can be used to describe the database for this company are shown below. There are two System Tables containing the Metadata: one describing the different TABLES (first table) and one describing the different COLUMNS in the various tables (second table). The first table contains the names of the tables that make up the database, the number of columns in each table and the Primary Keys. The second table contains details about the columns that form each of the tables: the column names, the data-types of the data they contain and the length of each data type.

TABLES
Table Name   Number of Columns   Primary Key
Supplier     4                   SupID
Order        6                   OrdNo
Payment      4                   ReceiptNo

COLUMNS
Column Name   Table Name   Data Type   Length
SupID         Supplier     Integer     2
SupName       Supplier     Text        20
SupCity       Supplier     Text        20
SupPhone      Supplier     Integer     2
OrdNo         Order        Integer     2
Date          Order        Text        10
SupIDF        Order        Integer     2
Item          Order        Text        30
Rate          Order        Float       4
Qty           Order        Integer     2
ReceiptNo     Payment      Integer     2
OrdNoF        Payment      Integer     2
Date          Payment      Text        10
Amount        Payment      Float       4

To improve database performance, a database contains another kind of data called indexes. Suppose it is required to list the names of all suppliers in a particular city. Since the data may be stored as a file sorted by the ID-number of the suppliers, it would be a time consuming process to examine each and every record to find the suppliers located in a particular city. To speed up the process, a special data structure called an index may also be maintained by the database. Using it is similar to finding a name in a telephone directory. The index stores the different cities in alphabetical order and relates each city to the respective supplier ID, as shown below. It is easier to find the city in alphabetical order in the index and then find the supplier name from the SupID given against each city name. The DBMS looks up the supplier name in the Supplier table by matching the SupID given in the index. We will discuss more about indexes in a later section.

CityIndex
SupCity   SupID
Chennai   0003
Kolkata   0002
Kolkata   0004
Mumbai    0001
Mumbai    0005

A database may also contain data about the applications that use the database. These may include the structure of the different data entry forms, the different types of reports or queries etc. This last category of data is called Application Metadata.

We can summarise the components of a database by the following diagram. The basic units of information stored in a database are bits. These bits combine to form characters (both strings and numbers). These strings and numbers are collected to form different fields, which in turn form records. Several records are collected to form data files. Data files, along with other special data structures like Metadata, Application Metadata and Indexes, form the Database.

[Figure: Components of a database, from top to bottom: DATABASE; FILES + METADATA + INDEXES; RECORDS; FIELDS; CHARACTERS; BITS]
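The ideas of metadata and indexes described above can be tried out in a minimal sketch using Python's built-in sqlite3 module. SQLite keeps its own system table, named sqlite_master, which plays the role of the TABLES system table here. Storing SupPhone as text and including only three supplier rows are assumptions made for this illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """CREATE TABLE Supplier (
           SupID INTEGER PRIMARY KEY,
           SupName TEXT,
           SupCity TEXT,
           SupPhone TEXT)"""
)
conn.executemany(
    "INSERT INTO Supplier VALUES (?, ?, ?, ?)",
    [
        (1, "Godrej", "Mumbai", "21365871"),
        (2, "Bajaj", "Kolkata", "24175552"),
        (3, "Steelco", "Chennai", "32651478"),
    ],
)
# An index on the city column, comparable to CityIndex in the text.
conn.execute("CREATE INDEX CityIndex ON Supplier (SupCity)")

# Metadata: names of the tables and indexes, read from SQLite's system table.
objects = conn.execute(
    "SELECT type, name FROM sqlite_master ORDER BY type, name"
).fetchall()

# Metadata: column names and declared data types of the Supplier table.
columns = [(row[1], row[2]) for row in conn.execute("PRAGMA table_info(Supplier)")]

# A query that the DBMS may answer through CityIndex instead of reading every record.
kolkata = [
    row[0]
    for row in conn.execute(
        "SELECT SupName FROM Supplier WHERE SupCity = ?", ("Kolkata",)
    )
]

print(objects)
print(columns)
print(kolkata)
```

The application never states how the index is to be used. It only asks for suppliers in a city, and the DBMS decides whether the index helps.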

The term shared collection in the description of a database implies that all data is stored centrally in the database. This central data is then shared by every individual who has access to it. Data is not stored in separate files according to the needs of different individuals, with the same data repeated in more than one file, as in a file processing system. Different application programs fetch the data from the central database, where a particular data item is stored only once.

The next term, interrelated data, implies that the data stored as different relations or tables are not independent but are related to each other. For example, in the above example the Order table is related to the Supplier table through the SupID attribute. If we know the SupID from the Order table, we can find the phone number of the corresponding supplier in the Supplier table by matching the SupID numbers in the two tables. The data stored in different tables are therefore related to each other by means of special attributes or keys (discussed in detail in later sections).

The final part of the definition indicates that the information stored in a database can be efficiently updated and retrieved by users writing specific queries in a data query language like SQL. The queries are submitted to the database management system, which responds by combining data from different tables and presenting the required data to the end-user in a convenient manner.
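As a minimal sketch of interrelated data, the following Python example (built-in sqlite3 module) finds a supplier's phone number starting from an order by matching supplier IDs. The table is named Orders here because ORDER is a reserved word in SQL. The order numbers and items are hypothetical, and SupPhone is stored as text for simplicity.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE Supplier (
        SupID INTEGER PRIMARY KEY,
        SupName TEXT,
        SupPhone TEXT);
    CREATE TABLE Orders (
        OrdNo INTEGER PRIMARY KEY,
        SupIDF INTEGER REFERENCES Supplier(SupID),
        Item TEXT);
    INSERT INTO Supplier VALUES (1, 'Godrej', '21365871');
    INSERT INTO Supplier VALUES (2, 'Bajaj', '24175552');
    INSERT INTO Orders VALUES (10, 2, 'Sheet metal');
    INSERT INTO Orders VALUES (11, 1, 'Paint');
    """
)


def supplier_phone_for_order(ord_no):
    """Follow the supplier ID stored in the order to the matching Supplier row."""
    row = conn.execute(
        """SELECT s.SupPhone
           FROM Orders AS o
           JOIN Supplier AS s ON s.SupID = o.SupIDF
           WHERE o.OrdNo = ?""",
        (ord_no,),
    ).fetchone()
    return row[0] if row else None


print(supplier_phone_for_order(10))
```

The phone number is stored only once, in the Supplier table. Each order carries just the supplier ID, and the DBMS combines the two tables when the query is run.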

Database Users
The main aim of a database is to provide ways of storing and retrieving information in an efficient manner. To do this, different kinds of people may need to access or handle the database, both during development and during implementation. These users include the general public accessing a public database like a railway reservation database. They may include company executives handling confidential data in the company database. At the other end we have the computer professionals engaged in developing a database and the data-entry operators engaged in entering the raw data into the database. Depending upon the type of use, we can classify database users into the following categories:

1. Application Programmers: These are people who are engaged in developing general application programs to access databases. The application program is usually written in a base or host language (like C, Visual Basic etc.). Commands in a special Data Manipulation Language (like SQL) are then embedded within the host language code to access the database and perform data manipulations.

2. Sophisticated Users: These are people who interact with a database without writing application programs, by requesting information through queries written in Data Manipulation Languages like SQL. These queries are processed by a query processor and submitted to a database storage manager to produce the required output. Analysts who need to analyse data based on certain criteria and generate special reports fall under this category.

3. Specialised Users: These people are engaged in writing specialised database application programs involving complex data such as graphics, audio or video, or in writing special application programs such as computer aided design systems.

4. Inexperienced Users: These are end users who interact with a database through permanent application programs, like the menu driven interfaces of a railway enquiry system or an automated bank teller machine.

5. Database Administrators: In an organisation, the Database Administrator (DBA) is the person responsible for overall control and fine tuning of the database to get the best performance. The DBA is responsible for maintaining the database server and for providing users with access to the information they require, as and when required.

Data Abstraction
In a database, the stored data needs to be retrieved and manipulated efficiently. Complex algorithms and data structures have been developed to do this. However not all users of the database are computer experts and hence may not be expected to understand these complex data structures to manipulate the data.
[Figure: Levels of data abstraction. View Level (View 1, View 2, ..., View n) at the top, Logical Level in the middle, Physical Level at the bottom]

To overcome this difficulty, the database approach provides some level of data abstraction, i.e. the developers of the database hide from the database users the details of how the data is actually stored. Instead, the database presents to the user a view of the data that is readily understandable by him. This simplifies the user's interaction with the database system, as it allows the user to manipulate the data without being concerned about the underlying mechanism by which the data is actually stored.

A database system thus uses different levels of data abstraction to simplify the final data representation, i.e. to connect the raw stored data to the final user view of the data. These levels include:

1. Physical Level: This is the lowest level of data abstraction. At this level the complex low level data structures used to store the data are described. For example, at the byte level, the different records that comprise a database may be stored as a linear linked list, as a binary tree structure, as fixed length records or as variable length records. The data representation at the physical level thus describes how blocks of raw data bytes are stored in consecutive storage locations. The database system hides many of these lowest level storage details from the database programmers and the end users.

2. Logical Level: This is the next higher level. At this level the data and the relationships that exist between those data are defined. The entire database is described in terms of relatively simple structures like data tables, though at the physical level this may involve manipulation of complex data structures. The logical level of abstraction is used by database administrators, who decide what information needs to be kept in the database and how the different data are related. For example, in a student database the different aspects related to a student, like the student's personal data, accounts related data and academic performance data, need to be defined, and the relationships that exist between these aspects are established at the logical level.

3. View Level: This is the highest level of data abstraction. In the case of a large database, some complexity may still remain at the logical level. Moreover, the majority of users do not need to access the entire database, but are concerned with only a part of it. Accordingly, depending upon the nature of use and the type of user, different user-friendly views of the database are defined. Apart from providing appropriate database views, this level also provides security to the database by giving selective access to different users. For example, in a student database, different views or forms may be provided at the view level, like the student personal data entry view, the student fees entry view, and the student marks entry and report card generation view. Of these, teachers may be given access to only the marks entry view, while the accounts department may be given access to the fees related view, thus providing data security through the different views.
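The view level described above can be sketched with SQL views, here through Python's built-in sqlite3 module. The Student table, its columns, the sample rows and the view names are assumptions for illustration. SQLite itself has no user accounts, so granting each view to a particular group of users is left to a full DBMS and is not shown.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE Student (
        StudentID INTEGER PRIMARY KEY,
        Name TEXT,
        Address TEXT,
        FeesDue REAL,
        Marks INTEGER);
    INSERT INTO Student VALUES (1, 'Asha', '12 Lake Road', 500.0, 82);
    INSERT INTO Student VALUES (2, 'Ravi', '7 Hill Street', 0.0, 67);

    -- What a teacher would see: marks, but no fees or address.
    CREATE VIEW MarksView AS SELECT StudentID, Name, Marks FROM Student;
    -- What the accounts department would see: fees, but no marks.
    CREATE VIEW FeesView AS SELECT StudentID, Name, FeesDue FROM Student;
    """
)


def columns_of(view_name):
    """Return the column names that a view exposes."""
    cursor = conn.execute(f"SELECT * FROM {view_name}")
    return [description[0] for description in cursor.description]


marks_rows = conn.execute(
    "SELECT Name, Marks FROM MarksView ORDER BY StudentID"
).fetchall()

print(columns_of("MarksView"))
print(columns_of("FeesView"))
print(marks_rows)
```

Both views are defined over the same stored table, so there is still only one copy of the data. Each user group simply sees the part of it that concerns them.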

Database Schema and Instance


The overall design and description of a database is in general called the database schema. The schema is used to define the following:

a) The physical structure of the database, i.e. the data structures used to store the data physically in the database. It also specifies the character sets or symbols used to encode the data. ASCII is the best known character set used.
b) The logical structure of the database, i.e. the different relations or tables that comprise the database, the relationships between those tables and the different attributes of the relations.
c) The different constraints or business rules that govern different transactions.
d) Rules to determine who has access to the schema.

Though the contents of a database change over time, its schema, as determined at design time, is rarely changed. A database may have different types of schema at the different levels of data abstraction discussed earlier. Based on these, the Three-Schema Architecture has been developed to construct a database system. It consists of the following:

1. Physical/Internal schema: This corresponds mainly to the physical data abstraction level and deals with the physical organisation of data. It forms the lowest level and describes the different data structures used and how the raw data is stored at the byte level.

2. Logical/Conceptual schema: This corresponds mainly to the logical data abstraction level. It is used to describe the logical structure of the database based on the different data types and the relationships that exist between those data types. It describes the different data operations possible and any constraint or business rule to be imposed on those data. The logical schema hides the details of physical storage structures from the developer or database administrator.

3. Sub/External View schema: This corresponds to the view level of data abstraction and deals with the way a particular user application views the data from the database. It forms the highest level. Each view or external schema describes the part of the database that a particular user group is interested in and hides the rest of the database from that user group.

[Figure: Three-Schema Architecture. User-1, User-2, ..., User-n see View-1, View-2, ..., View-n (Sub/External View Schema), which rest on the Logical/Conceptual Schema, which rests on the Physical/Internal Schema and the Stored Database]

In general a database system supports one physical schema, one logical schema and several sub-schemas, as shown in the diagram.

When a new database is defined, we only specify the database schema to the DBMS. At this stage the state of the database is empty, as it contains no data. We get the initial state of the database only when the database is first filled with the initial data. Whereas a database schema describes the structure of a database, the database state or database instance is the collection of information stored in the database at a particular moment. At any point in time, a database has a current state or instance. It is the responsibility of the database management system to ensure that every instance of a database is a valid instance satisfying the various constraints specified in the schema. For example, if a bank requires a minimum account balance of Rs. 1000, then the DBMS should enforce this constraint to ensure that at no instant can a bank account have a balance of less than Rs. 1000.

Unlike a database schema, an instance can change frequently, as and when data is added, updated or removed from the database. However, changes may need to be applied to a schema once in a while. For example, the mobile phone number or the email address may need to be added to the existing database of customers in a bank. This is known as schema evolution and is allowed by most modern DBMSs while a database is operational.
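The difference between schema and instance, the minimum balance rule and schema evolution can be put together in a minimal sketch using Python's built-in sqlite3 module. The Account table, its columns and the sample amounts are assumptions for illustration. The rule itself is expressed as a CHECK constraint, which is one common way to state such a rule in SQL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Schema: the structure plus the minimum-balance rule. The instance is still empty.
conn.execute(
    """CREATE TABLE Account (
           AccNo INTEGER PRIMARY KEY,
           Holder TEXT NOT NULL,
           Balance REAL NOT NULL CHECK (Balance >= 1000))"""
)
empty_state = conn.execute("SELECT COUNT(*) FROM Account").fetchone()[0]

# Initial state: the database is filled with its first data.
conn.execute("INSERT INTO Account VALUES (1, 'A. Banerjee', 2500)")


def withdraw(acc_no, amount):
    """Return True if the new instance is valid, False if the DBMS rejects it."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "UPDATE Account SET Balance = Balance - ? WHERE AccNo = ?",
                (amount, acc_no),
            )
        return True
    except sqlite3.IntegrityError:
        return False


ok = withdraw(1, 1000)        # 2500 becomes 1500, which is allowed
rejected = withdraw(1, 1000)  # would leave 500, which violates the rule

# Schema evolution: add an e-mail column while the stored data stays in place.
conn.execute("ALTER TABLE Account ADD COLUMN Email TEXT")
balance = conn.execute("SELECT Balance FROM Account WHERE AccNo = 1").fetchone()[0]

print(empty_state, ok, rejected, balance)
```

The application does not check the balance itself. The rule is part of the schema, so the DBMS refuses any update that would produce an invalid instance.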


Database Languages (DDL, DML, DQL, DCL)


To implement and use a database, three different classes of programming languages are used in general. These can be broadly divided into Data Definition Languages or DDL, Data Manipulation Languages or DML, and Data Control Languages or DCL. The functions and examples of these are described below:

1. Data Definition Language (DDL): The design and structure of a database is usually specified in a specific language called a Data Definition Language. The DDL forms a link between the logical and physical structure of a database, i.e. between the way the user views the data and the way the data is physically stored. Once the DDL statements are written and compiled, they produce a set of relations (tables), which are stored in a special file called a Data Dictionary or Data Directory. The major functions of the DDL are thus:

a) To describe or create the logical schema or different relations in a database.
b) To describe the data fields or attributes of each record, i.e. to describe each field's logical name, data-type, field length, etc.
c) To describe the relationships between the different relations.
d) To describe the integrity constraints.
e) To describe the specific keys and indexes for accessing the data.
f) To provide means of data security and data restrictions.
g) To provide means of logical and physical data independence.

Examples of DDL statements in SQL include CREATE to establish a new table, ALTER to alter the structure of the database, DROP to delete tables from the database, TRUNCATE to remove all records from a table etc.

2. Data Manipulation Language (DML): Once the general structure of a database is formed using a DDL, the database can be accessed, filled and manipulated by the user using a Data Manipulation Language. The Data Query Language or DQL is a subset of DML and is used to write specific queries to retrieve specific data. DQL is very flexible and can be used to express quite complicated queries, sometimes very concisely. The different functions and characteristics of a DML include:

a) Insert new information into the database.
b) Retrieve existing information from the database based on certain criteria.
c) Delete information from the database.
d) Modify, sort, and update information in the database.
e) Enable users and application programs to process data on a logical basis rather than bother about how the data is physically organised.
f) Support high-level languages (like COBOL, VB etc.) in which application programs are generally written. In general, DML statements are embedded within the high-level host languages in which application programs are written.

There are basically two types of DML. These are:

a) Procedural DMLs: In a procedural DML, to retrieve particular information the user has to specify both what data is required and how to get that data. Because the user spells out the access strategy, procedural DMLs can be more efficient than non-procedural languages. An example of a procedural approach is Relational Algebra, which can be used to manipulate data organised in relations (tables) using the various relational operators. However, relational algebra is hard to use, and due to its complexity it is generally not used directly in commercial databases.

b) Non-procedural DMLs: In a non-procedural DML, to retrieve particular information the user has to specify only what data is required, without specifying the means of getting that data. Since the user does not specify the means of getting the data, these languages may not generate very efficient code. Examples of non-procedural DMLs include Relational Calculus, Transform-Oriented Languages (e.g. SEQUEL, SQL), Query-by-Example and Query-by-Form (e.g. MS-Access). Of these, due to its complexity, Relational Calculus is generally not used directly in commercial database processing. In Transform-Oriented Languages like SQL, the input data may be expressed as several relations (tables), which are then transformed to express the required result as a single relation (table). Query-by-Example and Query-by-Form are graphical languages. In these, the user is presented with a graphical interface in the form of a Data-Entry Form. The database management system analyses the entries made by the user and generates the required queries.

Examples of DML statements in SQL include SELECT to retrieve rows of data, INSERT to place new rows of data in the database, UPDATE to replace existing values in the database with new values, DELETE to delete rows of data etc.

3. Data Control Language (DCL): The Data Control Language covers activities that are not part of DDL or DML. DCL commands are used to control the distribution of access privileges to users. The DCL also defines when proposed changes to a database are made irreversible. DCL commands are normally executed by the database administrator or by users who hold the necessary privileges. Examples of DCL statements in SQL include GRANT to give access privileges to a user and REVOKE to withdraw them. COMMIT, which makes changes permanent, and ROLLBACK, which undoes them, are often listed here as well, though some references treat them as a separate transaction control group. Some references also list procedural control statements under this heading, such as CALL to execute an SQL procedure and RETURN to return a value from an SQL function.
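The statement classes described above can be seen side by side in a minimal sketch using Python's built-in sqlite3 module, with Python acting as the host language and SQL as the embedded DML. The table contents are assumptions for illustration. SQLite has no user accounts, so DCL statements such as GRANT and REVOKE are not shown. The loop marked as procedural only imitates the record-by-record style of a procedural DML and is not relational algebra.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# DDL: define the structure of the database.
conn.execute(
    "CREATE TABLE Supplier (SupID INTEGER PRIMARY KEY, SupName TEXT, SupCity TEXT)"
)

# DML: fill and change the contents.
conn.executemany(
    "INSERT INTO Supplier VALUES (?, ?, ?)",
    [(1, "Godrej", "Mumbai"), (2, "Bajaj", "Kolkata"), (3, "Steelco", "Chennai")],
)
conn.execute("UPDATE Supplier SET SupCity = 'Kolkata' WHERE SupID = 3")
conn.execute("DELETE FROM Supplier WHERE SupID = 1")

# Non-procedural retrieval (DQL): state only what is wanted.
declarative = [
    row[0]
    for row in conn.execute(
        "SELECT SupName FROM Supplier WHERE SupCity = 'Kolkata' ORDER BY SupName"
    )
]

# Procedural style: spell out how to get it, record by record.
procedural = []
for sup_id, name, city in conn.execute("SELECT SupID, SupName, SupCity FROM Supplier"):
    if city == "Kolkata":
        procedural.append(name)
procedural.sort()

# DDL again: remove the table from the database.
conn.execute("DROP TABLE Supplier")

print(declarative, procedural)
```

Both retrievals give the same answer. In the non-procedural form the DBMS is free to choose how to find the rows, for example by using an index if one exists.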

Data Models
Data models are a collection of conceptual tools for describing data, the relationships between data items, the constraints applicable to the data, etc. There are various data models available, which can be broadly classified into the following:
1. Physical Models: These data models describe data at the lowest level of data abstraction, i.e. the way the data is physically stored in the database. Two popular data models used to describe the physical architecture are:
a) Unifying Model
b) Frame Memory Model
2. Record Based Logical Models: These data models describe data at the logical and view levels. They use concepts that end users can understand and that are, at the same time, not too far from the way data is actually organised within the computer. In these models, the database is formed using fixed-format records of several types, with each record type containing a fixed number of fixed-length fields. The different record based models include:
a) Relational Model: In this model, data and the relationships between them are represented as a collection of tables. Each table has multiple columns and rows, with each column having a unique name. All the columns in a particular row of a table form a record. The tables below show a relational database consisting of the tables Items (2 columns) and Supply (3 columns). The Relational model is discussed in detail in a later section.

Items
ItemCode   Item
I0001      Fridge
I0002      Almirah
I0003      Table

Supply
SupName    SupCity    ItemCode
Godrej     Mumbai     I0001
Steelco    Kolkata    I0002
Steelco    Kolkata    I0003

b) Hierarchical Model: The Hierarchical model is the oldest of the database models. Here records are logically organised into a hierarchy of relationships forming an inverted tree pattern. All records in a hierarchy are called nodes, with each node related to the next in a Parent-Child relationship. Records that own other records are called parent records. The top parent record (here SupplyData) is called the root record. Each parent record can have one or more child records, but any child record can have only a single parent record.
[Figure: hierarchy with the root record SupplyData, the supplier records Modern|Kolkata, Godrej|Mumbai and Steelco|Kolkata below it, and the item records I0001|Fridge, I0002|Almirah and I0003|Table at the lowest level; the I0003|Table record appears twice.]
c) Network Model: This model stores data using a parent-child relationship similar to that of the hierarchical model. However, unlike the hierarchical model, it allows a record to be a child of more than one parent record. The relationships between different records are then represented by links in the form of pointers, as shown in the figure. In the example, the I0003|Table record can be seen to be a child of both the Steelco|Kolkata and the Modern|Kolkata records.
[Figure: network in which the supplier records Godrej|Mumbai, Steelco|Kolkata and Modern|Kolkata are linked by pointers to the item records I0001|Fridge, I0002|Almirah and I0003|Table.]
3. Object Based Logical Models: These data models are used to describe data at the logical and view levels. These models are closer to human perception and farther from system perception. The different object based logical models include:
a) The Entity Relationship (ER) Model: The ER model views the real world as a collection of basic objects called entities, with relationships existing between those entities. Each entity in turn is described by a set of attributes. Entities and relationships of the same type are grouped together to form an entity set and a relationship set. Several graphical shapes are used to construct an ER diagram to express the overall logical structure of a database.
[Figure: ER diagram in which the entity sets Supplier and Items are joined by the relationship set Supplies; attributes such as the supplier name and city and the item name and code are attached to the entity sets.]
b) The Object Oriented Model: By the middle of the 1980s it was observed that relational databases were not practical for storing data in fields like medicine, multimedia and high energy physics, all of which needed more flexibility in how their data was represented and accessed. This led to object oriented databases, where users could define their own methods of accessing the data and how it was represented and manipulated. The model is based on a collection of objects and pieces of code, called methods, that operate on these objects. Objects that contain the same types of values and the same methods are grouped together into classes. Multimedia Databases, used for storing several different types of files (text, audio, video and images) in a single database, fall under this category.
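To make the difference between the three record based models concrete, the sketch below stores the supplier and item records of the examples in three ways. It is only an illustration in plain Python: the variable names and the helper suppliers_of are hypothetical, and the parent-child links are an assumption read off the Supply table and the network example (Godrej supplies I0001, Steelco supplies I0002 and I0003, Modern supplies I0003).

```python
# Hierarchical model: a tree. Each child has exactly one parent, so the
# I0003|Table record has to be stored twice (once under each supplier).
hierarchy = {
    "SupplyData": {
        ("Godrej", "Mumbai"): [("I0001", "Fridge")],
        ("Steelco", "Kolkata"): [("I0002", "Almirah"), ("I0003", "Table")],
        ("Modern", "Kolkata"): [("I0003", "Table")],
    }
}

# Network model: each item record is stored once, and the supplier records
# hold links (here dictionary keys acting as pointers) to shared item records.
item_records = {"I0001": "Fridge", "I0002": "Almirah", "I0003": "Table"}
network_links = {
    ("Godrej", "Mumbai"): ["I0001"],
    ("Steelco", "Kolkata"): ["I0002", "I0003"],
    ("Modern", "Kolkata"): ["I0003"],
}

# Relational model: two tables of rows; the relationship is carried by the
# shared ItemCode value rather than by nesting or pointers.
# The Modern row is an assumption added so that all three forms hold the same data.
items = [("I0001", "Fridge"), ("I0002", "Almirah"), ("I0003", "Table")]
supply = [
    ("Godrej", "Mumbai", "I0001"),
    ("Steelco", "Kolkata", "I0002"),
    ("Steelco", "Kolkata", "I0003"),
    ("Modern", "Kolkata", "I0003"),
]


def suppliers_of(item_code):
    """Return the names of the suppliers of an item, using the relational rows."""
    return sorted(name for name, city, code in supply if code == item_code)


# How often is I0003 physically stored in the tree?
copies_in_hierarchy = sum(
    1
    for children in hierarchy["SupplyData"].values()
    for code, _ in children
    if code == "I0003"
)
# Which parents point at the single I0003 record in the network?
parents_in_network = sorted(
    name for (name, city), codes in network_links.items() if "I0003" in codes
)

print(copies_in_hierarchy)
print(parents_in_network)
print(suppliers_of("I0003"))
```

The tree needs two copies of the shared item record, the network needs one record with two incoming links, and the relational form answers the same question by matching ItemCode values.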

Transaction Management
When working with a database, situations arise in which a particular task involves two or more separate operations that together form one logical unit of work. For example, consider a stock transfer. Suppose x units of item-t are transferred from the store in a factory to the showroom for sale. For a valid transfer, the stock of item-t in the factory should be reduced by x units and, at the same time, the stock of item-t in the showroom should be increased by x units, so that the total number of units is the same before and after the transfer. The transfer will be incomplete and erroneous if either the factory stock or the showroom stock is not updated because of some error during the transfer. Thus either both the operations should occur or neither should occur. This all-or-none requirement is called atomicity.

A similar situation arises in the case of a money transfer from one bank account to another. There the debit from one account must be accompanied by a credit to the other account. Moreover, the total amount of money involved should remain constant: an increase in account A should correspond to an equal decrease in account B, i.e. the sum of the money in account A and that in account B should be preserved. This requirement to maintain the correctness of the transfer is called consistency. After a particular transfer is over, the database should be able to preserve the new values in spite of any system snag or failure. This property is called durability.

We call such a collection of separate operations that form a single logical unit of work a transaction. Each transaction forms one unit of both atomicity and consistency. In the above example, the change of records in the two accounts was carried out by two separate operations or programs. Each program by itself does not take the database from one consistent state to another. Hence each program by itself does not carry out a transaction, as the atomicity property is not satisfied by such an operation. If all the operations in a transaction cannot be completed because of a system failure or any other mishap, the failed transaction should have no effect on the state of the database, and the database must be restored to the state it was in before the transaction started. It is the responsibility of the Transaction Management Module of a DBMS to preserve the state of the database in case of any failure. Moreover, it is the responsibility of the database programmer to design the database in such a manner that these two properties, atomicity and consistency, are maintained in a transaction.
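The stock transfer described above can be sketched with Python's built-in sqlite3 module. The table stock, the functions transfer and snapshot, and the simulated failure are assumptions made for illustration; the point is only that the two updates are either committed together or rolled back together.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE stock (place TEXT PRIMARY KEY, units INTEGER)")
conn.executemany("INSERT INTO stock VALUES (?, ?)", [("factory", 100), ("showroom", 20)])
conn.commit()


def transfer(conn, x, fail_midway=False):
    """Move x units from the factory to the showroom as one transaction."""
    try:
        conn.execute("UPDATE stock SET units = units - ? WHERE place = 'factory'", (x,))
        if fail_midway:
            # Simulated error between the two operations (hypothetical).
            raise RuntimeError("failure during transfer")
        conn.execute("UPDATE stock SET units = units + ? WHERE place = 'showroom'", (x,))
        conn.commit()      # both updates become permanent together
        return True
    except RuntimeError:
        conn.rollback()    # neither update survives: atomicity
        return False


def snapshot(conn):
    """Return the current stock as a dictionary {place: units}."""
    return dict(conn.execute("SELECT place, units FROM stock"))


failed = transfer(conn, 30, fail_midway=True)
after_failure = snapshot(conn)   # same as before the failed transfer
done = transfer(conn, 30)
after_success = snapshot(conn)

print(failed, after_failure)
print(done, after_success)
```

In both outcomes the total number of units stays at 120, which is the consistency requirement of the example.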


Database Management System (DBMS)


A Database Management System or DBMS is a collection of software programs that enables users to define, create, maintain and manipulate a database for various applications. The first step in handling a database is to define the database. This includes specifying the physical and logical structure of the database, defining the data types, the constraints imposed on the data, etc. This is usually done using Data Definition Languages (DDL). Once the logical and physical structure of the database is defined, the next step is creating the database. This implies populating the database, i.e. actually entering data into a storage medium to form the database. The final step is to manipulate the database to enter, retrieve or update data using special application programs that incorporate statements in special Data Manipulation Languages (DML).

[Figure: parts of a DBMS. Users and Programmers work through Application Programs / Queries; the DBMS Software comprises the Query Processor, the Transaction Manager and the Storage Manager (File Manager + Buffer Manager); the Database holds the Metadata (Database Definition), the Data (Database Data) and the Indexes.]

We can thus summarise the different functions of a DBMS as:
1. Perform data storage and retrieval functions and handle user queries
2. Implement data manipulation procedures developed by the administrators
3. Enforce database security at the physical and logical level
4. Interface with the OS to allocate computer resources like printers etc. to users
5. Implement back up and recovery in case of system crashes, power outages etc.


The above figure shows the essential parts of a Database Management System. These are now described in detail.
1) Database: The lowest level forms the database, where the raw data is stored. At this level, the metadata and indexes are stored along with the data. As discussed earlier, metadata is information about the structure of the data. Indexes are data structures that are used to find data items quickly in a database and hence help to improve database performance.
2) DBMS Software: The next higher level is the DBMS software. It consists of several modules used to manipulate and process the data in the database. The different modules include the following:
a) Storage Manager: The function of this module is to modify the information in the database and retrieve information from the database when requested by the higher levels. It thus serves as an interface between the low level data stored in the database and the application programs and queries submitted to the database system. It translates the various DML statements into low level file system commands. Thus in a simple database, the storage manager may be the file system of the underlying operating system itself. However, in larger databases it may consist of the following components:
i) Authorisation and Integrity Manager: It checks whether a user is authorised to access the database. It is also responsible for maintaining the integrity of the system. To maintain integrity, it interacts with the Query Processor to find out what data is being operated upon by the current queries. When several queries are running in the system, it ensures that no two queries interfere with each other.
ii) Transaction Manager: It keeps track of the changes made to the data, so that lost data can be recovered after a system failure and the database kept in a consistent state. It maintains a log containing a record of the changes made, so that changes that had not been carried out can be executed after the system has recovered from a failure. It also manages the simultaneous execution of different transactions so that they do not conflict.
iii) Buffer Manager: The buffer manager handles main memory. It obtains blocks of data from the disk and places them in a portion of the main memory. The buffer manager keeps a block in main memory as long as it is required and returns the block to the disk when the memory it occupies is needed for another block.
iv) File Manager: The file manager keeps track of file locations on the disk. A file is stored on the storage device as a collection of disk blocks. When requested by the buffer manager, the file manager obtains the required block or blocks of a particular file.
b) Query Manager/Processor: The job of the Query Processor is to convert a query submitted by the user and expressed in a high level language (like SQL) into a sequence of low level commands to the Storage Manager, which retrieves the appropriate information. It also handles requests for modification of data and metadata. It is usually made up of the following modules:
i) DML Compiler: This module translates DML statements in a query language (like SQL) into a low level language that the query evaluation engine understands. The DML compiler also optimises the user queries to make them more efficient.
ii) Embedded DML Pre-compiler: This module interacts with the DML Compiler to generate the appropriate code for DML statements embedded in an application program.
iii) DDL Interpreter: This module is responsible for interpreting DDL statements and recording them in a set of tables, called system tables, that contain the metadata.
iv) Query Evaluation Engine: This module receives low level instructions from the DML Compiler and executes them to retrieve the required data from the database.
3) Application Programs: Users interact with a database through application program interfaces. A typical DBMS allows programmers to write application programs that manipulate data in a database through system calls to the DBMS. The most frequent interaction with a database is to query it. Apart from queries, application programs are also written to modify data or to modify the database schema. However, only database administrators are given access to modify an existing schema or create a new database. There may be several application programs that are used by different types of users.
4) Users: At the outermost level are the end users as described earlier, who are responsible for maintaining and accessing the database. These include the database administrator, the sophisticated users, the specialised users and the inexperienced users.
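The interplay between the buffer manager and the file manager described above can be sketched in a few lines of Python. Everything in the sketch is an assumption made for illustration: the class names FileManager and BufferManager, the block numbers, and in particular the least-recently-used (LRU) replacement policy, which the text does not specify.

```python
from collections import OrderedDict


class FileManager:
    """Hypothetical file manager: maps block numbers to block contents on 'disk'."""

    def __init__(self, blocks):
        self.disk = dict(blocks)
        self.reads = 0

    def read_block(self, block_no):
        self.reads += 1
        return self.disk[block_no]

    def write_block(self, block_no, data):
        self.disk[block_no] = data


class BufferManager:
    """Keeps at most `capacity` blocks in main memory (LRU policy assumed)."""

    def __init__(self, file_manager, capacity):
        self.fm = file_manager
        self.capacity = capacity
        self.pool = OrderedDict()   # block_no -> data, least recently used first

    def get(self, block_no):
        if block_no in self.pool:
            self.pool.move_to_end(block_no)      # mark as recently used
            return self.pool[block_no]
        if len(self.pool) >= self.capacity:
            victim, data = self.pool.popitem(last=False)
            self.fm.write_block(victim, data)    # return the block to the disk
        self.pool[block_no] = self.fm.read_block(block_no)
        return self.pool[block_no]


fm = FileManager({1: "block-1", 2: "block-2", 3: "block-3"})
bm = BufferManager(fm, capacity=2)
for n in (1, 2, 1, 3, 1):
    bm.get(n)

resident = list(bm.pool)
print(resident, fm.reads)
```

Five requests cause only three disk reads here, because block 1 stays in main memory while it is still in use and block 2 is the one returned to the disk.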

Data Administrator (DA) & Database Administrator (DBA)


It is the job of a special category of people in an organisation to determine whether a database technology has been successfully developed and implemented. These people are termed as Data Administrators (DA). The job of a DA is to look after the following:

1. Strategic Planning: The DA is the key person involved in strategic planning of data resources and determines the major business areas or processes the database should serve.
2. Determine Data Requirement: The DA decides what data will be stored in the database to carry out these processes, and the corresponding data sources.
3. Determine Access Policies: The DA lays down policies for accessing and maintaining the database and determines the access rights of the different database users.
4. The DA plays a business oriented role in determining the business strategies and policies involved in using a DBMS. To do so he should have access to the top-level management and should be granted a wide range of authority in connection with the database.

A Database Administrator (DBA), on the other hand, is a technical person who is responsible for defining the internal model of a database. He is the person who creates and maintains a database. To design a database, the DBA first has to consult the users to determine their specific requirements. He then determines the physical storage requirement of the data, the accuracy requirement, the frequency of data access, the search strategies, and the security levels of different data. The DBA also identifies the different data sources and the persons responsible for entering and updating the data. Finally, with all the specifications available, the DBA converts these requirements into a physical design which specifies the hardware requirements of the database. Based on the above functions, we can classify the different jobs of a DBA as:

1. Schema Definition: The original database schema is created by the DBA by writing a set of definitions. These are then translated by the DDL compiler to form a set of tables consisting of metadata that is stored in the data dictionary.
2. Storage Structure & Access Method Definition: The storage structures of different data types and their access mechanisms are defined, guided by the need to store and retrieve the data efficiently. These definitions are then translated by the DDL compiler to form the actual data structures.
3. Schema Modification: In the rare case that the logical or the physical schema needs to be modified, the DBA is responsible for writing a set of definitions that are translated by the DDL compiler to make the required modification to the internal system tables.
4. Data Access Authorisation: Not every user of the database needs to access the entire database. Moreover, some users may be allowed to modify data while others may be allowed only to view it. It is the DBA who is responsible for granting rights to different classes of users. This authorisation data is kept in a special system file which is consulted by the DBMS whenever a user wants to access the database.
5. Integrity Constraint Specification: Based on certain business rules or other criteria, there may be constraints on certain data items. For example, a bank may require a minimum account balance, below which a customer will not be able to withdraw money. The DBA is required to specify all such integrity constraints explicitly.
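The minimum-balance rule mentioned under Integrity Constraint Specification can be written as a declarative constraint. The sketch below uses Python's built-in sqlite3 module; the table name account, the column names and the minimum balance of 1000 are assumptions made for illustration. It also reads SQLite's own catalogue table, sqlite_master, to show the kind of metadata that a schema definition leaves behind in the data dictionary.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# DDL: the CHECK clause is the integrity constraint (minimum balance 1000, assumed).
conn.execute(
    "CREATE TABLE account ("
    " acc_no TEXT PRIMARY KEY,"
    " balance INTEGER NOT NULL CHECK (balance >= 1000))"
)
conn.execute("INSERT INTO account VALUES ('A1', 5000)")
conn.commit()


def withdraw(conn, acc_no, amount):
    """Try to withdraw; the DBMS itself rejects a violation of the constraint."""
    try:
        conn.execute(
            "UPDATE account SET balance = balance - ? WHERE acc_no = ?",
            (amount, acc_no),
        )
        conn.commit()
        return True
    except sqlite3.IntegrityError:
        conn.rollback()
        return False


first = withdraw(conn, "A1", 3000)    # leaves 2000, allowed
second = withdraw(conn, "A1", 1500)   # would leave 500, rejected
balance = conn.execute(
    "SELECT balance FROM account WHERE acc_no = 'A1'"
).fetchone()[0]

# Metadata recorded by the CREATE TABLE statement (SQLite's system catalogue).
tables = [
    name
    for (name,) in conn.execute("SELECT name FROM sqlite_master WHERE type = 'table'")
]
print(first, second, balance, tables)
```

Because the rule is stored with the schema, every application program that updates the table is subject to it, without having to repeat the check in its own code.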


Advantages and Disadvantages of using a DBMS


The advantages of using a DBMS over a File Processing System are:
1. Minimised data duplication: In a DBMS, a particular item of data is stored in one place only. Whenever an application needs the data, the DBMS retrieves it for the application from that place. Since each item of data is stored in a single place, storage space is saved. Moreover, when an update is required, the data needs to be updated in one place only. This avoids the integrity problem that arises when different copies of the same data disagree with each other.
[Figure: the Supplier, Order and Payment Processing Applications, their users and the Supplier, Order and Payment Files, together with the DBMS and the Database.]
2. Data remains together: In a Database system, all data is stored in a single place called a database. Whenever an application program requires some data, the DBMS retrieves the data from the database. If data from multiple locations needs to be combined, the DBMS does this by retrieving the required data from the database.
3. File format independent application programs: In a Database system, the application programs that access the data interact with the data through the DBMS and not directly with the database. If any change occurs in the data formats, the DBMS takes care of it. Thus physical and logical data independence makes application programs independent of schema modifications.
4. Compatibility between different files: In a Database Processing system, the application programs do not interact directly with the data files; instead they interact with the DBMS. The DBMS in turn interacts with the database files to generate the required results. Hence, even if different programming platforms are used to develop the application programs, they need to interact only with the DBMS and not with the different data files. Thus the question of compatibility between the formats of different data files does not arise.
5. User Friendly Interfaces: Database technology makes it easier to present data in a user friendly manner by combining data from different tables as required.

In spite of the success of the DBMS over the conventional file processing system, the DBMS approach has certain limitations, as described below:

1. Concurrency Problems: If a DBMS package is not designed for multiple users, problems can arise when more than one user wants to access the database simultaneously. This problem of concurrently accessing the same record in a database is known as the concurrency problem. For example, let two persons A and B have a joint bank account. Suppose both of them simultaneously view their bank balance from two different ATMs, and the balance shown is Rs. 40,000/-. Suppose A withdraws Rs. 20,000/- and closes the transaction, whereupon the DBMS program writes back a balance of Rs. 20,000/-. However, B still sees the balance as Rs. 40,000/-, as no change is made to B's screen after the transaction by A. B now withdraws Rs. 25,000/- and closes the transaction, which writes back the balance as (40,000 - 25,000) = Rs. 15,000/-, overwriting the balance of Rs. 20,000/- entered by A. Thus at the end of the two transactions the account shows a balance of Rs. 15,000/- when actually there is a negative balance of Rs. 5,000/-.


One can avoid a concurrency problem by locking a file when it is used by one person, so that it is not available for another person at the same time. Another method is to lock the particular record that is accessed by one user, so that the file may be available for another user for accessing other records.

2. Ownership Problem: In a file based system, the data in a particular file is generally handled by a particular individual. When a database is created from those files, the data is no longer the property of an individual application user but is owned by the entire company. Any user with an access right should be able to access or use the data. Giving up ownership of data may be difficult for the employees and managers concerned.
3. Resource Problem: When a DBMS is implemented, the amount of data that needs to be accessed and manipulated also increases. To handle the new database and run the DBMS programs, extra resources or upgrades of existing resources may be required. Thus extra terminals, printers, storage devices, servers, communication devices, etc. may need to be purchased. This adds to the cost of setting up a DBMS.
4. Security Problem: The DBMS should give access to the database to authorised personnel only. Security considerations should include means of controlling physical access to terminals and storage devices, and specific interface forms for updating or deleting records.
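The joint-account example given under Concurrency Problems, and the record-locking remedy suggested for it, can be replayed with a small simulation. The sketch is plain Python and deliberately single-threaded so that the order of events is fixed; the class Account, its locked_by field and the two functions are hypothetical names introduced only for this illustration.

```python
class Account:
    """A single database record with a simple record lock (illustrative)."""

    def __init__(self, balance):
        self.balance = balance
        self.locked_by = None


def lost_update(account, withdraw_a, withdraw_b):
    """Both sessions read first, then each writes back its own stale result."""
    seen_by_a = account.balance
    seen_by_b = account.balance                # B reads before A has written
    account.balance = seen_by_a - withdraw_a
    account.balance = seen_by_b - withdraw_b   # overwrites A's update
    return account.balance


def with_record_lock(account, withdraw_a, withdraw_b):
    """Each session locks the record for its whole read-modify-write cycle."""
    refused = []
    for user, amount in (("A", withdraw_a), ("B", withdraw_b)):
        assert account.locked_by is None   # the other session has finished
        account.locked_by = user           # lock the record
        current = account.balance          # read the up-to-date value
        if amount <= current:
            account.balance = current - amount
        else:
            refused.append(user)           # insufficient funds
        account.locked_by = None           # release the lock
    return account.balance, refused


wrong = lost_update(Account(40000), 20000, 25000)
right, refused = with_record_lock(Account(40000), 20000, 25000)
print(wrong)
print(right, refused)
```

With the lock, B's session reads the balance only after A's update has been written, so the withdrawal of Rs. 25,000/- is refused instead of silently overwriting A's result.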

Questions from this Section


1. State the major differences between a file processing system and a DBMS. [4]
2. What are the disadvantages of a conventional file system? [4]
3. What is integrity problem? [3]
4. What is a Database? [3]
5. What are the levels of data abstraction? Explain each of them briefly. [2+4]
6. What is a Database Schema? What is a DB instance? [3]
7. Describe the three schema architecture of a database. [3]
8. What is the difference between logical and physical data independence? [4]
9. What are the different types of Database users? [2]
10. State the different database languages. [3]
11. Distinguish between DDL and DML. [4]
12. What are the basic characteristics of DML? What are the types of DML? [3+3]
13. What do you mean by atomicity and consistency? [2+2]
14. What do you mean by a transaction? [3]
15. Name different types of database models. [2]
16. What is a DBMS? State the advantages and disadvantages of a DBMS. [2+4]
17. What are the components of a Query Processor? [4]
18. What are the components of a Storage Manager? [4]
19. What are the major functions of a DBA? [4]
20. What are the responsibilities of DBA and that of a database designer? [4]

DB01 Definitions and Concepts

Page 18 of 18

Joyrup Bhattacharya
