
Chapter 1, Problem 1RQ

Problem

Define the following terms: data, database, DBMS, database system, database catalog,
program-data independence, user view, DBA, end user, canned transaction, deductive database
system, persistent object, meta-data, and transaction-processing application.

Step-by-step solution

Step 1 of 14

Data

The word data is derived from the Latin datum, meaning 'something given'; data are given facts from
which additional facts can be inferred. Data is a collection of known facts that can be recorded
and that have implicit meaning.


Step 2 of 14

Database
A database is a collection of related data, or operational data, extracted from a firm or
organization. In other words, a collection of organized data is called a database.


Step 3 of 14

DBMS (Database Management System)

A DBMS is a collection of programs that enables users to create, maintain, and manipulate a
database. The DBMS is a general-purpose software system that facilitates the process of
defining, constructing, and manipulating databases.


Step 4 of 14

Database Systems

A database system comprises a database of operational data together with the processing
functionality required to access and manage that data. The combination of the DBMS and the
database is called a database system.


Step 5 of 14

Database Catalog
A database catalog contains a complete description of the databases that are stored: the
database objects, the database structure, details of users, constraints, and so on.
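
For example, most relational DBMSs let the catalog itself be queried; the standard information_schema views are a minimal sketch of this (the STUDENT table name is illustrative, and catalog coverage varies by product):

    -- Ask the catalog for the structure of a table, i.e. for its metadata.
    SELECT column_name, data_type
    FROM information_schema.columns
    WHERE table_name = 'STUDENT';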


Step 6 of 14

Program-data independence

In traditional file processing, the structure of the data files is 'hard-coded' into the programs. To
change the structure of a data file, one or more programs that access that file must be
changed, and the process of changing them can introduce errors. In contrast to this more
traditional approach, a DBMS stores the structure in a catalog, separating the DBMS programs
from the data definition. Storing the data definition separately from the programs is known as
program-data independence.


Step 7 of 14

User View

The way in which the database appears to a particular user is called a user view.


Step 8 of 14
DBA (Database Administrator)

The DBA is the person responsible for authorizing access to the database, coordinating and
monitoring its use, and acquiring software and hardware resources as needed.


Step 9 of 14

End User

End users are the people who access the database for different purposes such as querying,
updating, and generating reports.


Step 10 of 14

Canned Transactions

Canned transactions are standardized queries and updates on the database, executed through
carefully programmed and tested programs.
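
As a sketch, a canned transaction is typically a fixed, pre-tested statement in which only a few parameters vary between runs (the account table and the :amount/:account_no placeholders below are hypothetical):

    -- A bank-teller deposit: the statement never changes; only the
    -- parameter values supplied at run time do.
    UPDATE account
    SET balance = balance + :amount
    WHERE account_no = :account_no;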


Step 11 of 14
Deductive Database System

A deductive database system is a database system that supports the proof-theoretic view of a
database and, in particular, is capable of deducing or inferring additional facts from the facts
given in the extensional database by applying specified deductive axioms or rules of inference
to those given facts.


Step 12 of 14

Persistent object

Object-oriented database systems are compatible with programming languages such as C++ and
Java. An object that is stored in such a way that it survives the termination of the DBMS
program is persistent.


Step 13 of 14

Meta Data

Information about the data is called metadata; the information stored in the catalog is
metadata. The schema of a table is an example of metadata.

Step 14 of 14

Transaction processing application

A transaction is a logical unit of database processing that includes one or more database
operations such as insertion, deletion, modification, and retrieval. The database operations that
form a transaction can either be embedded within an application program or be specified
interactively via a high-level query language such as SQL.
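
A minimal sketch of such a transaction in SQL (the account table and the amounts are illustrative):

    -- Transfer 100 from account 1 to account 2 as one logical unit:
    -- either both updates take effect, or neither does.
    START TRANSACTION;
    UPDATE account SET balance = balance - 100 WHERE account_no = 1;
    UPDATE account SET balance = balance + 100 WHERE account_no = 2;
    COMMIT;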

Chapter 1, Problem 2RQ

Problem

What four main types of actions involve databases? Briefly discuss each.

Step-by-step solution

Step 1 of 5

The four main types of actions that involve databases are as follows:

• Database Administration

• Database Designing

• Database Usage by end users.

• System Analysis and Application Programming


Step 2 of 5

• Database Administration:

• Database administration is the process of administering the database resources, such as the
application programs and the database management system.

• The database administrator (DBA) is responsible for granting permission to access the
database.

• The administrative work also includes acquiring software and hardware resources.

• The security of the database is also managed by database administration.


Step 3 of 5

• Database designing:

• Database designing is the process of designing the database, which includes identifying the
data to be stored in the database and choosing the data structures required to store that data.

• The database design should fulfill the requirements of all the user groups of the organization.


Step 4 of 5

• Database Usage by end users:

• End users are the users who directly access the database for querying, updating, and
generating reports. The following are the types of end users:

o Casual end users: users who access the database occasionally. Middle- and
high-level managers are examples of casual end users.

o Parametric end users: users who constantly access the database. Bank tellers
are examples of parametric end users.

o Sophisticated end users: engineers and scientists who implement applications to meet
complex requirements.

o Standalone users: users who maintain personal databases by using ready-made
program packages.


Step 5 of 5

• System Analysis and Application Programming:

• System analysis is the process that determines the requirements of the end users.

• System analysis is done by system analysts, who develop specifications for the canned
transactions that meet the requirements of the end users.

• The implementation of these specifications is done by application programmers.

Chapter 1, Problem 3RQ

Problem

Discuss the main characteristics of the database approach and how it differs from traditional file
systems.

Step-by-step solution

Step 1 of 4

Characteristics of Database:

Self – Describing nature of a database system:

A fundamental characteristic of the database approach is that the database system contains not
only the database itself but also a complete definition or description of the database structure
and constraints.

• The information stored in the catalog is called metadata, and it describes the structure of the
primary database.

• In traditional file processing, data definition is typically part of the application programs
themselves.

Those programs are constrained to work with only one specific database, whose structure is
declared in the application programs.

Step 2 of 4

Insulation between programs and data, and data abstraction:

In traditional file processing, the structure of data files is embedded in the application programs,
so any changes to the structure of a file may require changing all programs that access that file.

• DBMS access programs do not require such changes in most cases.

• The structure of data files is stored in the DBMS catalog separately from the access programs.


Step 3 of 4

Support of multiple views of the data:

A database typically has many users, each of whom may require a different perspective or view
of the database.

• A multi-user DBMS whose users have a variety of distinct applications must provide facilities
for defining multiple views.

• The traditional file-processing approach does not support multiple views of the data.


Step 4 of 4

Sharing of data and multi-user transaction processing:

A multi-user DBMS must allow multiple users to access the database at the same time. The
DBMS must include concurrency control software to ensure that several users trying to update
the same data do so in a controlled manner, so that the result of the updates is correct.

• In traditional file systems, no such data sharing is possible, and no such concurrency control
software is available.

Chapter 1, Problem 4RQ

Problem

What are the responsibilities of the DBA and the database designers?

Step-by-step solution

Step 1 of 2

Responsibilities of DBA:

DBA stands for Database Administrator. The DBA's role is highly technical; he or she is
responsible for managing the database used in the organization.

• The database administrator has the responsibility to build the physical design of the database.

• The database administrator deals with technical responsibilities such as:

o Security enforcement

o Performance of the database

o Providing access to the database

o Acquiring resources such as hardware and software components

o Backing up the data in the database

o Recovering lost data in the database

o Monitoring and coordinating the use of the database

o Monitoring response time and security breaches.



Step 2 of 2

Responsibilities of Database Designer:

The database designer is the architect of the database. The designer's work is versatile, and
he/she works with everyone in the organization. The responsibilities of the database designer are
as follows:

• The data to be stored in the database is identified by the database designers.

• Appropriate structures to store the data are chosen by the database designers.

• The database designer studies and understands the business needs.

• They communicate the architecture to business and management, and may also
participate in business development as advisors.

• They ensure consistency across databases.

• They create and enforce database development standards and processes.

Chapter 1, Problem 5RQ

Problem

What are the different types of database end users? Discuss the main activities of each.

Step-by-step solution

Step 1 of 2

The end users perform various database operations like querying, updating, and generating
reports.

The different types of end users are as follows:

• Casual end users

• Naive or parametric end users

• Sophisticated end users

• Standalone Users


Step 2 of 2
Casual end users:

• The Casual end users access the database occasionally.

• Each time they access the database, their request will vary.

• They use sophisticated database query language to retrieve the data from the database.

Naive or parametric end users:

• Naïve or parametric end users spend most of their time in querying and updating the database
using standard types of queries.

Sophisticated end users:

• The sophisticated end users access the database to implement their own applications to meet
their specific goals.

• The sophisticated end users are engineers, scientists, and business analysts.

Standalone Users:

• The standalone end users maintain their own databases by creating them using ready-made
program packages that provide a graphical user interface.

Chapter 1, Problem 7RQ

Problem

Discuss the differences between database systems and information retrieval systems.

Step-by-step solution

Step 1 of 14

Database approach: A database is more than a file; it contains information about more than
one entity and information about the relationships among the entities.

Information retrieval systems: In an information retrieval system, data are stored in files; this is a
very old but often used approach to system development.


Step 2 of 14

Database approach: Data about a single entity (e.g., product, customer, department) are
stored in a "table" in the database.

Step 3 of 14

Information retrieval systems: Each program (system) often had its own unique set of files.


Step 4 of 14

Database approach: Databases are designed to meet the needs of multiple users and to be used
in multiple applications.


Step 5 of 14

Information retrieval systems: Users of information retrieval systems are almost always at the
mercy of the information department to write programs that manipulate stored data and produce
the needed information.


Step 6 of 14
Database approach: Database systems are relatively complex to design, implement, and
maintain.


Step 7 of 14

Information retrieval systems: Information retrieval systems are very simple to design and
implement as they are normally based on a single application or information system.


Step 8 of 14

Database approach: The processing speed is slow in comparison to information retrieval systems.


Step 9 of 14

Information retrieval systems: The processing speed is faster than with other ways of storing data.

Step 10 of 14

Other differences:

Database systems provide program-data independence, whereas in information retrieval
systems the programs and data are dependent on each other.


Step 11 of 14

Database systems offer minimal data redundancy, improved data consistency, enforcement of
standards, and improved data quality, whereas in information retrieval systems duplication of
data is present.


Step 12 of 14

Improved data sharing is available in databases, whereas information retrieval systems offer only
limited data sharing.


Step 13 of 14
Databases provide flexibility and scalability, whereas in information retrieval systems the data
are not flexible or scalable.


Step 14 of 14

Databases reduce data redundancy, whereas in information retrieval systems data redundancy
is one of the important problems.

Chapter 1, Problem 8E

Problem

Identify some informal queries and update operations that you would expect to apply to the
database shown in Figure 1.2.
Step-by-step solution

Step 1 of 2

Informal queries:

a) Retrieve the transcript (a list of all courses and grades) of 'Smith'.

b) List the names of students who took the section of the 'Database' course offered in fall 2005,
and their grades in that section.

c) List the prerequisites of the 'Database' course.


Step 2 of 2

Update operations:

a) Change the class of 'Smith' to sophomore.

b) Create a new section for the 'Database' course for this semester.

c) Enter a grade of 'A' for 'Smith' in the 'Database' section of last semester.
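
For illustration, informal query (b) and update (a) could be written in SQL against the Figure 1.2 tables (a sketch; the column names follow that figure, and the class code 2 for sophomore is an assumption):

    -- Informal query (b): students and grades in the fall 2005
    -- 'Database' section.
    SELECT s.Name, g.Grade
    FROM STUDENT s, COURSE c, SECTION sec, GRADE_REPORT g
    WHERE c.Course_name = 'Database'
      AND sec.Course_number = c.Course_number
      AND sec.Semester = 'Fall' AND sec.Year = 2005
      AND g.Section_identifier = sec.Section_identifier
      AND g.Student_number = s.Student_number;

    -- Update operation (a): change the class of 'Smith' to sophomore.
    UPDATE STUDENT SET Class = 2 WHERE Name = 'Smith';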

Chapter 1, Problem 9E

Problem

What is the difference between controlled and uncontrolled redundancy? Illustrate with
examples.

Step-by-step solution

Step 1 of 3

Storing the same facts or data at multiple places in the database is considered redundancy. In
other words, duplication of data is known as redundancy.

Some of the problems with redundant data are as follows:

• Inconsistency of data

• Wastage of memory space


Step 2 of 3

The differences between controlled redundancy and uncontrolled redundancy are as follows:

• Controlled redundancy is redundancy that the designer introduces deliberately (for example, to
speed up frequent queries) and that the DBMS keeps consistent, typically by enforcing
constraints or propagating updates.

• Uncontrolled redundancy arises when the same data is duplicated without any mechanism to
keep the copies consistent, so updates can leave the copies contradicting one another.

Step 3 of 3

An example illustrating controlled and uncontrolled redundancy is as follows:

Consider the following tables.

Employee(empno, ename, job, salary, dob)

Department(deptno, dname, location)

Project (pno, pname, description)

works(empno, deptno, pno)

Assume that an employee can work on multiple projects. So, in works table, empno and deptno
are redundant if an employee works on two or more projects.
In the controlled case, the deptno for empno 100 is the same in all three of that employee's
works records.

In the uncontrolled case, the deptno for empno 100 is inconsistent across that employee's works
records.
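
One way to turn the uncontrolled case into the controlled one is to let the DBMS enforce the duplicated fact (a sketch; it assumes a helper table emp_dept holding each employee's single authoritative department, which is not among the tables above):

    -- Hypothetical helper table: one authoritative deptno per employee.
    CREATE TABLE emp_dept (
        empno  INT PRIMARY KEY,
        deptno INT,
        UNIQUE (empno, deptno)
    );

    -- The composite foreign key forces every (empno, deptno) pair in
    -- works to match the authoritative pair, so the duplicated deptno
    -- can never become inconsistent.
    CREATE TABLE works (
        empno  INT,
        deptno INT,
        pno    INT,
        PRIMARY KEY (empno, pno),
        FOREIGN KEY (empno, deptno) REFERENCES emp_dept (empno, deptno)
    );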

Chapter 1, Problem 10E

Problem

Specify all the relationships among the records of the database shown in Figure 1.2.
Step-by-step solution

Step 1 of 2

Relationships in the database specify how the data tables are related to each other.


Step 2 of 2

The relationships between the tables are as follows:

• Consider the tables COURSE and SECTION. The two tables have common column
“Course_number”.

Hence, the table SECTION is related to COURSE through Course_number.

• Consider the tables STUDENT and GRADE_REPORT. The two tables have common column
“Student_number”.

Hence, the table GRADE_REPORT is related to STUDENT through Student_number.

• Consider the tables COURSE and PREREQUISITE. The two tables have common column
“Course_number”.

Hence, the table PREREQUISITE is related to COURSE through Course_number.

• Consider the tables SECTION and GRADE_REPORT. The two tables have common column
“Section_identifier”.

Hence, the table GRADE_REPORT is related to SECTION through Section_identifier.
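
In SQL, these relationships would typically be declared as foreign keys (a sketch; it assumes the tables already exist with these column names):

    -- SECTION is related to COURSE through Course_number.
    ALTER TABLE SECTION
      ADD FOREIGN KEY (Course_number) REFERENCES COURSE (Course_number);

    -- GRADE_REPORT is related to STUDENT and to SECTION.
    ALTER TABLE GRADE_REPORT
      ADD FOREIGN KEY (Student_number) REFERENCES STUDENT (Student_number);
    ALTER TABLE GRADE_REPORT
      ADD FOREIGN KEY (Section_identifier) REFERENCES SECTION (Section_identifier);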


Chapter 1, Problem 11E

Problem

Give some additional views that may be needed by other user groups for the database shown in
Figure 1.2.
Step-by-step solution

Step 1 of 2

Additional views for the given database:

A new view can be created that lists, for each student, each section taken and the grade
obtained in that section.

GRADE_SEC_REPORT

Student_number Section_identifier Course_number Grade

This view is helpful for the university's administration when printing each section's grade report.


Step 2 of 2

Another view can be created that lists the courses taken by a student and the grade achieved
by the student in each of those courses.

COURSE_GRADE_REPORT

Student_number Course_number Grade GPA

This view is helpful for the university's administration when determining students' honours.
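
A sketch of the first view in SQL (it assumes the Figure 1.2 column names and that Course_number is reachable through SECTION):

    CREATE VIEW GRADE_SEC_REPORT AS
    SELECT g.Student_number,
           g.Section_identifier,
           s.Course_number,
           g.Grade
    FROM GRADE_REPORT g, SECTION s
    WHERE g.Section_identifier = s.Section_identifier;
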
Chapter 1, Problem 12E

Problem

Cite some examples of integrity constraints that you think can apply to the database shown in
Figure 1.2.
Step-by-step solution

Step 1 of 1

A few constraints that can be imposed on the database are:

1. A grade can be given only to enrolled students.

2. Each section must belong to an existing course.

3. Each course must be part of an existing department.

4. The prerequisite of each course must have been offered in the past or must be an existing
course.

5. A student must be part of a section for which he or she is graded.
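
Some of these constraints can be enforced declaratively in SQL (a sketch; the grade letters in the CHECK are assumed for illustration):

    -- Constraint 4: a prerequisite must itself be an existing course.
    ALTER TABLE PREREQUISITE
      ADD FOREIGN KEY (Prerequisite_number) REFERENCES COURSE (Course_number);

    -- A domain constraint on grades.
    ALTER TABLE GRADE_REPORT
      ADD CHECK (Grade IN ('A', 'B', 'C', 'D', 'F', 'I'));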

Chapter 1, Problem 13E

Problem

Give examples of systems in which it may make sense to use traditional file processing instead
of a database approach.

Step-by-step solution

Step 1 of 2

Despite the advantages of using a database approach, there are some situations in which a
DBMS may involve unnecessary overhead costs that would not be incurred in traditional file
processing.


Step 2 of 2

The following are examples of systems in which it may make sense to use traditional file
processing instead of a database approach.

• Many computer-aided design (CAD) tools used by mechanical and civil engineers have
proprietary file and data management software that is geared to the internal manipulation of
drawings and 3D objects.

• Similarly, communication and switching systems, such as those designed by companies like AT&T.

• GIS implementations often implement their own data organization schemes for efficiently
implementing functions related to processing maps, physical contours, lines, polygons, and so
on. General-purpose DBMSs are inadequate for their purpose.

• Small single-user applications.

• Real-time navigation systems that require little data.

Chapter 1, Problem 14E

Problem

Consider Figure 1.2.

a. If the name of the ‘CS’ (Computer Science) Department changes to ‘CSSE’ (Computer
Science and Software Engineering) Department and the corresponding prefix for the course
number also changes, identify the columns in the database that would need to be updated.

b. Can you restructure the columns in the COURSE, SECTION, and PREREQUISITE tables so
that only one column will need to be updated?
Step-by-step solution

Step 1 of 2

a) The following columns need to be updated when the name of the department changes along
with the course-number prefix:

In the STUDENT table, Major has to be updated. In the COURSE table, Course_number and
Department should be updated. In the SECTION table, Course_number should be updated. In
the PREREQUISITE table, Course_number and Prerequisite_number have to be modified.


Step 2 of 2

b) The columns of the tables are split as follows:

[Figure: the restructured COURSE, SECTION, and PREREQUISITE tables after the split]
Chapter 2, Problem 1RQ

Problem

Define the following terms: data model, database schema, database state, internal schema,
conceptual schema, external schema, data independence, DDL, DML, SDL, VDL, query
language, host language, data sublanguage, database utility, catalog, client/server architecture,
three-tier architecture, and n-tier architecture.

Step-by-step solution

Step 1 of 19

Data model

The data model describes the logical structure of the database and it introduces abstraction in
the DBMS (Database Management System). The data model provides a tool to describe the data
and their relationships.


Step 2 of 19

Database Schema
The database schema describes the overall design of the database. It is a basic structure to
define how the data is organized in the database. The database schema can be depicted by the
schema diagrams.


Step 3 of 19

Database state

The actual data stored in the database at a particular moment in time is called the database state.


Step 4 of 19

Internal Schema

It is also referred to as the physical-level schema. The internal schema represents the structure of
the data as viewed by the DBMS, and it describes the physical storage structure of the database.


Step 5 of 19

Conceptual Schema
It is also referred to as the Logical level schema. It describes the logical structure of the whole
database for a group of users. It hides the internal details of the physical storage structure.


Step 6 of 19

External Schema

The external schema is also referred to as the user-level schema. It describes the data as viewed
by the end users. This schema describes the part of the database relevant to a user group and
hides the rest of the database from that user group.


Step 7 of 19

Data independence

The capacity to change the schema at the physical level of a database system without affecting
the schema at the conceptual or external level is called data independence.


Step 8 of 19

DDL
DDL stands for Data Definition Language. It is used to create, alter, and drop the database
tables, views, and indexes.


Step 9 of 19

DML

DML stands for Data Manipulation Language. It is used to insert, retrieve, update, and delete the
records in the database.
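
For instance (a minimal sketch; the table and values are illustrative):

    -- DDL: define a table.
    CREATE TABLE student (
        student_number INT PRIMARY KEY,
        name           VARCHAR(30)
    );

    -- DML: insert, update, retrieve, and delete records.
    INSERT INTO student VALUES (17, 'Smith');
    UPDATE student SET name = 'Smith, J.' WHERE student_number = 17;
    SELECT name FROM student WHERE student_number = 17;
    DELETE FROM student WHERE student_number = 17;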


Step 10 of 19

SDL

SDL stands for Storage Definition Language. It is used to specify the internal schema of the
database and the mappings between the schemas.


Step 11 of 19

VDL
VDL stands for View Definition Language. It specifies the user views and their mappings to the
logical schema in the database.


Step 12 of 19

Query Language

The query language is a high-level language used to retrieve the data from the database.


Step 13 of 19

Host Language

The host language is used for application programming in a database. The DML commands are
embedded in a general-purpose language to manipulate the data in the database.


Step 14 of 19

Data Sublanguage

When data manipulation language commands are embedded in a general-purpose language to
perform operations such as insert, update, and delete on the database, the DML is referred to
as a data sublanguage.


Step 15 of 19

Database utility

The database utility is a software module to help the DBA (Database Administrator) to manage
the database.


Step 16 of 19

Catalog

The catalog stores the complete description of the database structure and its constraints.


Step 17 of 19

Client/server architecture

The client/server architecture is a database architecture that contains two modules. A client
module, usually running on a PC or workstation, provides the user interface. A server module
responds to user queries and provides services to the client machines.

Step 18 of 19

Three-tier architecture

The three-tier architecture consists of three layers: client, application server, and database
server. The client machine usually contains the user interface; the intermediate layer
(application server) runs the application programs and stores the business rules; and the
database layer stores the data.


Step 19 of 19

n-tier architecture

The n-tier architecture consists of four or five tiers; the intermediate layer, or business logic
layer, is divided into multiple layers, distributing programming and data throughout a network.

Chapter 2, Problem 2RQ

Problem

Discuss the main categories of data models. What are the basic differences among the relational
model, the object model, and the XML model?

Step-by-step solution

Step 1 of 2

The three main categories of data models are as follows:

• High-level or Conceptual data model

• Representational or implementational data model

• Low-level or Physical data model


Step 2 of 2

The differences between the relational model, the object model, and the XML model are as
follows:

Relational Model:

• Data is represented logically, together with information about the relationship types.

• Data is defined in columns with a field name, and all the data in a column must be of the
same type.

• Relational databases use a high-level query language.

• Example: SQL

Object Model:

• This model deals with how applications will interact with resources from any external source.

• It also deals with the relationships between classes and with the methods and properties of
the classes. It is closer to conceptual data models.

• The classes in the object model are designed in an acyclic-graph manner.

• Example: Document Object Model (DOM)

XML Model:

• Data in the XML model is hierarchical; different types of data can be defined in a single XML
document.

• The data in an XML document does not have any inherent ordering.

• Data is represented in the form of tags known as elements.

• Example: Stylus Studio

Chapter 2, Problem 3RQ

Problem

What is the difference between a database schema and a database state?

Step-by-step solution

Step 1 of 1

Difference between a database schema and a database state:

The database schema is a description of the database, and the database state is the database
itself.

The description of a database is called the database schema, which is specified during
database design and is not expected to change frequently. Most data models have certain
conventions for displaying schemas as diagrams. A displayed schema is called a schema diagram.
A schema diagram displays the structure of each record type but not the actual instances of
records. A schema diagram displays only some aspects of a schema, such as the names of
record types and data items, and some types of constraints.

The data in the database at a particular moment in time is called a database state. It is also
called the current set of occurrences or instances in the database. In a given database state,
each schema construct has its own current set of instances; many database states can be
constructed to correspond to a particular database schema. Every time we insert or delete a
record, or change the value of a data item in a record, we change one state of the database into
another state.

When we define a new database, we specify its database schema only to the DBMS. At this
point, the corresponding database state is the empty state with no data. The DBMS is partly
responsible for ensuring that every state of the database is a valid state, that is, a state that
satisfies the structure and constraints specified in the schema.

The schema is sometimes called the intension, and a database state is called an extension of
the schema.
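
To make this concrete (a minimal sketch; the table is illustrative):

    -- Defining the schema (the intension); the state is still empty.
    CREATE TABLE student (
        student_number INT PRIMARY KEY,
        name           VARCHAR(30)
    );

    -- Each insert or delete changes the state (the extension) of the
    -- database; the schema itself is unchanged.
    INSERT INTO student VALUES (8, 'Brown');
    DELETE FROM student WHERE student_number = 8;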

Chapter 2, Problem 4RQ

Problem

Describe the three-schema architecture. Why do we need mappings among schema levels? How
do different schema definition languages support this architecture?

Step-by-step solution

Step 1 of 3

Three-schema architecture:

The goal of the three-schema architecture is to separate the user applications from the physical
database. In this architecture, schemas can be defined at the following three levels:

(1) Internal level:

It has an internal schema, which describes the physical storage structure of the database.

(2) Conceptual level:

It has a conceptual schema, which describes the structure of the whole database for a
community of users. The conceptual schema hides the details of the physical storage structures
and concentrates on describing entities, data types, relationships, user operations, and constraints.

Step 2 of 3

(3) External level:

It includes a number of external schemas or user views. Each external schema describes the
part of the database that a particular user group is interested in and hides the rest of the
database from that group. A high-level data model or an implementation data model can be used
at this level.

Need for mappings:

The processes of transforming requests and results between levels are called mappings.

The conceptual/internal mapping defines the correspondence between the conceptual view and
the stored database. It specifies how conceptual records and fields are represented at the
internal level.

An external/conceptual mapping defines the correspondence between a particular external view
and the conceptual view.


Step 3 of 3

Different schema definition languages:

DDL:

The data definition language is used to specify the conceptual and internal schemas for the
database and any mappings between the two. The DBMS will have a DDL compiler whose
function is to process DDL statements in order to identify descriptions of the schema constructs
and to store the schema description in the DBMS catalog.

SDL:

The storage definition language is used to specify the internal schema. The mappings between
the two schemas may be specified in either one of these languages. In most relational DBMSs
today, there is no specific language that performs the role of SDL; instead, the internal schema
is specified by a combination of parameters and specifications related to storage.

VDL:

The view definition language is used to specify user views and their mappings to the conceptual
schema, but in most DBMSs the DDL is used to define both conceptual and external schemas. In
relational DBMSs, SQL is used in the role of VDL to define user or application views as results
of predefined queries.

Chapter 2, Problem 5RQ

Problem

What is the difference between logical data independence and physical data independence?
Which one is harder to achieve? Why?

Step-by-step solution

Step 1 of 3

Data independence refers to the capacity to change the schema at one level of a database
system without affecting the schemas at the higher levels.

There are two ways in which data independence is achieved:

• Logical data independence

• Physical data independence


Step 2 of 3

Logical data independence is the capacity to change the conceptual schema without changing
the external schema. This only requires changing the view definition and the mappings. For
example, changing the constraints of an attribute that does not affect the external schema,
insertion and deletion of data items that changes the table size but does not affect the external
schema.

Physical data independence is the capacity to change the internal schema without changing the
conceptual schema or the external schema. For example, reorganization of files on the physical
storage to enhance the operations on the database and since the data is the same and only the
files are relocated, the conceptual/external schema remains unaffected.
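
For instance, a typical internal-level change that leaves the conceptual and external schemas untouched (a sketch; the index and table names are illustrative):

    -- A physical reorganization: adding an access path. Existing
    -- queries and view definitions over STUDENT are unaffected; they
    -- may simply run faster.
    CREATE INDEX student_name_idx ON STUDENT (Name);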


Step 3 of 3

Logical data independence is harder to achieve. Changing the constraints of attributes or the
structure of a table might result in invalid data for the changed attributes, and the views or
application programs that reference the modified table can be affected, which should not be the
case under logical data independence.

Chapter 2, Problem 6RQ

Problem

What is the difference between procedural and nonprocedural DMLs?

Step-by-step solution

Step 1 of 2

Difference between procedural and nonprocedural DMLs:

Procedural DML:

A procedural data manipulation language is called a low-level DML. A procedural DML must be
embedded in a general-purpose programming language. This type of DML typically retrieves
individual records or objects from the database and processes each separately. Therefore, it
needs to use programming language constructs, such as looping, to retrieve and process each
record from a set of records.

Procedural DMLs are also called record-at-a-time DMLs.


Step 2 of 2
Non-procedural DML:

A non-procedural DML is called a high-level DML. A non-procedural DML can be used on its own
to specify complex database operations concisely. Many DBMSs allow high-level DML
statements either to be entered interactively from a display monitor or terminal, or to be
embedded in a general-purpose programming language.

A query in a high-level DML often specifies which data to retrieve rather than how to retrieve it.
Therefore, such languages are also called declarative.

A non-procedural DML requires a user to specify what data are needed without specifying how
to get those data.
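
The contrast, sketched with an illustrative employee table (the cursor fragment is embedded-SQL style, the record-at-a-time processing a procedural DML forces on the host program):

    -- Non-procedural (declarative): state WHAT is wanted.
    SELECT name FROM employee WHERE salary > 50000;

    -- Procedural, record-at-a-time style: the host program loops,
    -- fetching and testing one record at a time.
    DECLARE c CURSOR FOR SELECT name, salary FROM employee;
    OPEN c;
    -- loop: FETCH c INTO :name, :salary; test :salary > 50000 in the
    -- host language; repeat until no more records
    CLOSE c;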

Chapter 2, Problem 7RQ

Problem

Discuss the different types of user-friendly interfaces and the types of users who typically use
each.

Step-by-step solution

Step 1 of 7

User-friendly interfaces provided by the DBMS are as follows:

(a)

Menu-Based interfaces:

• These interfaces contain the lists of options through which the user can send the request.

• Pull-down menus are a very popular technique in web-based user interfaces.

Users who use the interface:

• These types of interfaces are used by web-browsing users and web clients.


Step 2 of 7
(b)

Forms-based interfaces:

• These types of interfaces display a form to each user.

• The user can fill the entries to insert new data.

• These forms are usually designed and programmed for naive users as interfaces to canned
transactions.

Users who use the interface:

• Users who want to submit information online by filling in and submitting the form.

• Mostly used to create accounts on a website, enroll into an institution, etc.


Step 3 of 7

(c)

Graphical user interfaces:

• A graphical user interface displays a schema to the user in diagrammatic form.

• The user can formulate a query by manipulating the diagram.

• These interfaces use a mouse as a pointing device to pick certain parts of the displayed schema
diagram.

Users who use the interface:

• Mostly users of electronic gadgets such as mobile phones and touch screens.

• Users of applications that are operated with pointing devices.

Step 4 of 7

(d)

Natural language interfaces:

• These interfaces accept a request from the user and try to interpret it.

• A natural language interface has its own schema, which is similar to the database conceptual
schema.

Users who use the interface:

• Search engines nowadays use natural language interfaces.

• Users can use these search engines, which accept words and retrieve the related
information.


Step 5 of 7

(e)

Speech input and output:

• These interfaces accept speech as input and produce speech as output.

Users who use the interface:

• These types of interfaces are used for telephone directory inquiries or to get flight
information over smart gadgets, etc.

Step 6 of 7

(f)

Interfaces for parametric users:

• Parametric users, such as bank tellers, have a small set of operations that they must perform
repeatedly.

• These interfaces provide commands to perform a request with a minimum of keystrokes.

Users who use the interface:

• These can be used in bank transactions to deposit or withdraw money.


Step 7 of 7

(g)

Interfaces for the DBA:

• These interfaces provide commands for creating accounts, manipulating the database, and
performing other operations on the database.

Users who use the interface:

• These interfaces are used specifically by database administrators.

Chapter 2, Problem 8RQ

Problem

With what other computer system software does a DBMS interact?

Step-by-step solution

Step 1 of 7

Database management system (DBMS):

A database management system (DBMS) is a set of programs that empowers users to build and
maintain a database.

It is a general-purpose software system that enables the processes of defining, constructing,
manipulating, and sharing databases among various applications and users.


Step 2 of 7

List of other computer system software a database management system (DBMS) interacts
with:
The following are the list of other computer system software a database management system
(DBMS) interacts with:

• Computer-Aided Software Engineering (CASE) tools.

• Data dictionary systems.

• Application development environments.

• Information repository systems.

• Communication software.


Step 3 of 7

CASE tools:

The design phase of the database system often employs the CASE tools.


Step 4 of 7

Data dictionaries:

Data dictionaries are similar to the database management system catalog; however, they include
a wider variety of information.

• Typically, data dictionaries can be accessed directly by the database administrator (DBA)
whenever required.

Step 5 of 7

Application development environments:

Application development environments typically provide an environment for developing
database applications, and they have facilities that aid in many aspects of database systems,
including graphical user interface (GUI) development, database design, querying, updating, and
application program development.

• Examples of application development environments are listed below:

o JBuilder (Borland)

o PowerBuilder (Sybase)


Step 6 of 7

Information repository systems:

• The information repository is a kind of data dictionary that also stores information such as
design decisions, application program descriptions, usage standards, and user information.

• Like data dictionaries, the information repository can be accessed directly by the database
administrator.

Step 7 of 7

Communication software:

• The database management system also requires an interface with communication software.

• The main function of the communication software is to enable users located remotely from the
database system to access the database through personal computers or workstations.

• The communication software is connected to the database system through communications
hardware such as routers, local networks, phone lines, or satellite communication devices.

Chapter 2, Problem 9RQ

Problem

What is the difference between the two-tier and three-tier client/server architectures?

Step-by-step solution

Step 1 of 2

The difference between a two-tier architecture and a three-tier architecture lies in the layers
through which data and queries pass at processing time in the database system.

In a two-tier architecture there are two layers: the client layer (user interface) and the query
server or transaction server. Application programs run on the client side, and when data
processing is required a connection is established with the server (DBMS), where the data is
stored. Once the connection is established, transaction and query requests are sent using Open
Database Connectivity (ODBC) APIs, which are then processed on the server side. It may also
happen that the client side takes care of user interaction and query processing while the server
stores data, manages disks, and so on. The exact distribution of functionality differs, but a
two-tier architecture has two layers.

Step 2 of 2
In a three-tier architecture there are three layers: a new application or web layer sits between
the client and the database server layer. The idea behind the three-tier architecture is to
partition roles into different layers, with each layer having a specific task. The user or client
layer provides the user interface, from which the user can run queries. A query gets processed
at the application or web server layer. This layer also checks any business constraints that may
be imposed on the type of query a user can send, and verifies the user's credentials and access
permissions. This layer can also be called the business logic layer. Finally, the database server
manages the storage of data in the system.

Chapter 2, Problem 10RQ

Problem

Discuss some types of database utilities and tools and their functions.

Step-by-step solution

Step 1 of 2

A few categories of database utilities and tools and their functions are:

1. Loading:

Loads existing data files, such as text files, into the database.

• Used in many organizations to transfer data easily from one DBMS to another.

• Vendors offer conversion tools for this purpose; such tools are, in effect, loading programs.

2. Backup:

A utility that creates a backup copy of the database.

• It typically copies the entire database onto tape, and these backup copies can be used to
recover the system state in the case of catastrophic loss.

Step 2 of 2

3. Database storage reorganization:

A utility that can be used to restructure a set of database files into a different file organization
to improve the performance of the database.

4. CASE tools:

CASE tools are used to produce a design plan for a database application.

5. Data dictionary system:

The information repository plays the main role in a data dictionary system.

• It is a repository used to store design decisions, user information, and application
program descriptions.

• This information can be accessed by users when required.

• The information repository contains additional information beyond the DBMS catalog.

6. Performance monitoring:

Used to monitor database usage and maintain statistics.

• These statistics are used by the DBA in making decisions, such as restructuring files or
adding indexes to improve the performance of the database.

Several other utilities are available, for example:

• Sorting the files in the database.

• Handling data compression in the database.

Chapter 2, Problem 11RQ

Problem

What is the additional functionality incorporated in n-tier architecture (n > 3)?

Step-by-step solution

Step 1 of 1

It is customary to divide the layer between the user and the stored data in the three-tier
architecture into finer components, thereby giving rise to an n-tier architecture, where n may be
4 or 5. Typically, the business logic layer is divided into multiple layers.

1. An n-tier architecture distributes data and programming over the network.

2. Each tier can run on the appropriate processor or operating-system platform and can be
handled independently.

Another layer typically used by vendors of ERP and CRM packages is the middleware layer,
which accounts for the front-end modules communicating with a number of back-end
databases.

Chapter 2, Problem 13E

Problem

Choose a database application with which you are familiar. Design a schema and show a sample
database for that application, using the notation of Figures 1.2 and 2.1. What types of additional
information and constraints would you like to represent in the schema? Think of several users of
your database, and design a view for each.
Step-by-step solution

Step 1 of 2

Consider a flight reservation system.

• Each FLIGHT is identified by a flight Number, consists of one or more FLIGHT_LEGs with
Leg_no, and flies on certain weekdays.

• Each FLIGHT_LEG has a scheduled arrival and departure time, an arrival and departure airport,
and one or more LEG_INSTANCEs, one for each Date on which the flight travels.

• A FARE is kept for each flight, and there is a certain set of restrictions on each FARE.

• For each FLIGHT_LEG instance, SEAT_RESERVATIONs are kept, as are the AIRPLANE used on
the leg and the actual arrival and departure times and airports.

• An AIRPLANE is identified by an airplane id and is of a particular AIRPLANE_TYPE. It has a fixed
number of seats.

• CAN_LAND relates AIRPLANE_TYPEs to the AIRPORTs at which they can land.

• An AIRPORT is identified by an airport code.


Step 2 of 2

The following constraints hold on the schema:

a. The requested flight number or flight leg must be available on the given date; this can be
checked from the LEG_INSTANCE table.

b. A non-reserved seat must exist for the specified date and flight. The total number of seats
available can be obtained from AIRPLANE.

c. A Flight_leg must correspond to an existing flight number.

d. Arrival and departure codes must be those of existing airports.

e. LEG_INSTANCE can have entries only for a valid Flight_number and Leg_number combination.

f. Flight_number in any relation must be that of a valid flight that has an entry in the FLIGHT table.

g. Airplane_type_name in CAN_LAND must be a valid name from AIRPLANE_TYPE.

Chapter 2, Problem 14E

Problem

If you were designing a Web-based system to make airline reservations and sell airline tickets,
which DBMS architecture would you choose from Section 2.5? Why? Why would the other
architectures not be a good choice?

Step-by-step solution

Step 1 of 4

There are four architectures discussed in section 2.5 in the textbook. They are

1. Centralized DBMS architecture

2. Basic Client/Server Architecture

3. Two-Tier Client/Server Architecture

4. Three-Tier Client/Server Architecture


Step 2 of 4

For designing a Web-based system to make airline reservations and sell airline tickets, the
three-tier client/server architecture will be the best choice.

• A web user interface is necessary, as different types of users, such as naive or casual users,
will interact with the system.

• The web user interface is placed on the client system.

• Users interact with the user interface and submit transactions.

• The web server handles those transactions, validates the data, and manipulates the database
accordingly.

• The web server/application server handles the application logic of the system.

• The database server contains the DBMS.

Step 3 of 4

In the centralized DBMS architecture, the DBMS functionality and the user interface are
executed on the same system, but for a Web-based system they must be on different systems.

Hence, the centralized DBMS architecture is not appropriate for a web-based system.


Step 4 of 4

In the three-tier client/server architecture, the business logic is placed in the application server
or web server.

The basic client/server architecture or the two-tier client/server architecture could be considered
for a web-based system only if the business logic were placed in the database server or the
client, but placing the business logic there would be a burden.

Hence, the basic client/server architecture and the two-tier client/server architecture are not
appropriate for a web-based system.

Chapter 2, Problem 15E

Problem

Consider Figure 2.1. In addition to constraints relating the values of columns in one table to
columns in another table, there are also constraints that impose restrictions on values in a
column or a combination of columns within a table. One such constraint dictates that a column or
a group of columns must be unique across all rows in the table. For example, in the STUDENT
table, the Student_number column must be unique (to prevent two different students from having
the same Student_number). Identify the column or the group of columns in the other tables that
must be unique across all rows in the table.

Step-by-step solution

Step 1 of 2

The database tables are constructed by using the schema diagram of the database. Each
database table contains a column or a group of columns whose values must be unique.


Step 2 of 2

The columns or groups of columns that must be unique in each table are:

1. STUDENT: Student_number.

2. COURSE: Course_number. If the course name is unique for each course, Course_name can
also be such a column.

3. PREREQUISITE: Course_number can be a unique identifier only if a course has a single
prerequisite; otherwise, Course_number and Prerequisite_number together form the unique
combination.

4. SECTION: Section_identifier.

• This assumes that no two sections can have the same Section_identifier.

• Otherwise, Section_identifier is unique only within a given course offering in a given term.

5. GRADE_REPORT: Section_identifier and Student_number together.

• The Section_identifier will be different if a student takes the same course or a different course
in another term.
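
In SQL, such uniqueness requirements are declared with PRIMARY KEY or UNIQUE constraints (a sketch; the tables are assumed to exist with these column names):

    -- Single-column uniqueness.
    ALTER TABLE STUDENT ADD UNIQUE (Student_number);

    -- Composite uniqueness: the pair must be unique, not each column
    -- on its own.
    ALTER TABLE GRADE_REPORT
      ADD UNIQUE (Student_number, Section_identifier);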

Chapter 3, Problem 1RQ

Problem

Discuss the role of a high-level data model in the database design process.

Step-by-step solution

Step 1 of 2

A high-level data model provides concepts for presenting data in a way that is close to how
users perceive data. It helps to capture the data requirements of the users in a detailed
description of the entity types, relationships, and constraints.


Step 2 of 2

The role of a high-level data model in the database design process is as follows:

• The design process of the High-level data model is easy to understand and useful in
communicating with non-technical users.

• This model acts as a reference to ensure that all the user requirements are met and do not
conflict with each other.

• High-level data model helps to concentrate on specifying the properties of data to the database
designers, without being concerned with storage details in the database design process.

• This data model helps in conceptual design.

Chapter 3, Problem 2RQ

Problem

List the various cases where use of a NULL value would be appropriate.

Step-by-step solution

Step 1 of 2

Use of NULL values is appropriate in two situations:

1. When the value of an attribute is irrelevant for an entity.

For example, in a schema that stores information about a person, suppose there is an attribute
called Company, which stores the name of the company where the person works. For a student
who is not working, this attribute value is irrelevant, so we can put a NULL value in its place.


Step 2 of 2

2. When the value of a particular attribute is not known, either because it is not known whether a
value exists or because the existing value itself is unknown.

For example, with the same Company attribute, it is possible that a person is not working, or the
name of the company where the person works may simply be unknown; in either case we can
put a NULL value in its place.
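
A minimal sketch in SQL (the person table is illustrative):

    CREATE TABLE person (
        person_id INT PRIMARY KEY,
        name      VARCHAR(30),
        company   VARCHAR(30)    -- NULLs permitted: irrelevant or unknown
    );

    -- A student who is not working: Company is irrelevant.
    INSERT INTO person VALUES (1, 'Ann', NULL);

    -- NULL needs special treatment in queries: use IS NULL, not '='.
    SELECT name FROM person WHERE company IS NULL;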

Chapter 3, Problem 3RQ

Problem

Define the following terms: entity, attribute, attribute value, relationship instance, composite
attribute, multivalued attribute, derived attribute, complex attribute, key attribute, and value set
(domain).

Step-by-step solution

Step 1 of 5

1. Entity: An entity is an object (thing) with an independent physical (car, home, person) or
conceptual (company, university course) existence in the real world.

2. Attribute: Each real-world entity (thing) has certain properties that represent its significance in
the real world or describe it. These properties of an entity are known as attributes.

For example, consider a car: the various things that describe a car can be its model,
manufacturer, color, cost, etc.

All of these are relevant in a miniworld and are important in describing a car. They are attributes
of a CAR.


Step 2 of 5

3. Attribute value: Associated with each real-world entity are certain attributes that describe that
entity. The value of these attributes for a given entity is called an attribute value.

For example, the attribute value of the color attribute of a car entity can be Red.

4. Relationship instance: Each relationship instance rj in R is an association of entities, where
the association includes exactly one entity from each participating entity type. Each such
relationship instance rj represents the fact that the entities participating in rj are related in some
way in the corresponding miniworld situation.

For example, in the relationship type WORKS_FOR between the two entity types EMPLOYEE and
DEPARTMENT, each employee is associated with the department for which the employee
works. Each relationship instance in the relationship set WORKS_FOR associates one
EMPLOYEE and one DEPARTMENT.

Step 3 of 5

5. Composite attribute: An attribute that can be divided into smaller subparts, which represent
more basic attributes with independent meanings, is called a composite attribute.

For example, consider an attribute called phone number in relation to an employee of a
company. One can treat the phone number as a single attribute or as two attributes, viz., area
code and number. Since the phone number can be broken into two independent attributes, it is a
composite attribute.

Whether to keep a composite attribute whole or divide it into basic attributes depends on how
the attribute is used in the miniworld.

6. Multivalued attribute: For a real-world entity, an attribute may have more than one value. For
example, consider the phone number attribute of a person. A person may have one, two, or
three phones, so there is a possibility of more than one value for this attribute. Any attribute
that can have more than one value is a multivalued attribute.


Step 4 of 5

7. Derived attribute: For a real-world entity, an attribute may have a value that is independent of
other attributes or cannot be derived from other attributes; such attributes are called stored
attributes. There are also certain attributes whose values can be derived using the values of
other attributes; such attributes are known as derived attributes.

For example, if the date of birth of a person is a stored attribute, then using the DOB attribute
and the current date the age of the person can be calculated; so age is a derived attribute.

8. Complex attribute: Composite and multivalued attributes can be nested arbitrarily. Arbitrary
nesting can be represented by grouping the components of a composite attribute between
parentheses () and separating the components with commas, and by displaying multivalued
attributes between braces {}. Such attributes are called complex attributes.

For example, if a person has more than one residence and each residence has multiple phones,
an Address_phone attribute can be specified as:

{Address_phone({Phone(Area_code, Phone_number)}, Address(Street_address(Number, Street,
Apartment_number), City, State, Zip))}


Step 5 of 5
9. Key Attribute: Each real-world entity is unique in itself. There are certain attributes whose value is different for every entity of a given type. These attributes are called key attributes, and they are used to specify the uniqueness constraint on an entity type.

For example, consider the entity type CAR. For all cars, the attributes registration number and car number will have different values; each of these is a key of the CAR entity type.

It is also possible for a set of attributes to jointly form a key.

10. Value Set (domain): For an attribute of a real-world entity, there is a range of values from which the attribute can take a value. For example, if the Age attribute of an employee must have a value from 18 to 70, then the set of integers in the range 18-70 is the domain of the attribute Age. In most programming languages, basic data types such as integer, string, float, date, etc. are used to specify the domain of a particular attribute.
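A short sketch of enforcing a value set in code (the 18-70 bound follows the Age example above; the function name is made up):

AGE_DOMAIN = range(18, 71)  # value set of Age: the integers 18..70

def set_age(employee, age):
    # Reject any value that lies outside the attribute's domain.
    if age not in AGE_DOMAIN:
        raise ValueError(f"Age {age} is outside the domain 18-70")
    employee["Age"] = age

employee = {"Name": "Bob"}
set_age(employee, 35)    # accepted: 35 is in the value set
# set_age(employee, 80)  # would raise ValueError: outside the domain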

Chapter 3, Problem 4RQ

Problem

What is an entity type? What is an entity set? Explain the differences among an entity, an entity
type, and an entity set.

Step-by-step solution

Step 1 of 4

Entity type: An entity type defines a collection (or set) of entities that have the same attributes. A database usually contains a group of entities that are similar; these entities have the same attributes but different attribute values. A collection of such entities is an entity type.

For example, a car dealer might like to store details of all cars in his showroom in a car database. The collection of all car entities is called an entity type.

Each entity type in a database is represented by its name and its attributes.


Step 2 of 4

For example, CAR can be the name of the entity type, and Reg_num, Car_num, Manufacturer, Model, Cost, and Color can be its attributes.

Entity Set: At a particular time the dealer might have a set of eight cars, and at some other time he might have a different set of four cars.

The collection of all entities of a particular entity type in a database, at a given point in time, is called an entity set. It is referred to by the same name as the entity type.


Step 3 of 4

For example, if we have four entities (four cars), the entity set will include:

Name: CAR

Entities:

e1(reg_1, DL_1, ford, 1870, 2000000, white)
e2(reg_2, DL_3, ford, 1830, 1000000, white)
e3(reg_3, DL_3, ford, 1877, 2100000, red)
e4(reg_4, DL_4, ford, 1970, 2500000, white)
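In programming terms the distinction can be sketched like this (illustrative Python): the class definition plays the role of the entity type, while the list of instances held at a given moment plays the role of the entity set.

class Car:
    """Entity type: the name CAR plus its attributes."""
    def __init__(self, reg_num, car_num, manufacturer, model, cost, color):
        self.reg_num = reg_num
        self.car_num = car_num
        self.manufacturer = manufacturer
        self.model = model
        self.cost = cost
        self.color = color

# Entity set: the CAR entities in the database at this point in time.
car_entity_set = [
    Car("reg_1", "DL_1", "ford", 1870, 2000000, "white"),
    Car("reg_2", "DL_3", "ford", 1830, 1000000, "white"),
    Car("reg_3", "DL_3", "ford", 1877, 2100000, "red"),
    Car("reg_4", "DL_4", "ford", 1970, 2500000, "white"),
]

print(len(car_entity_set))  # 4 entities in the current entity set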


Step 4 of 4

An entity is a real-world object or thing that has independent physical or conceptual existence.

Often there are many entities of a similar type about which information needs to be stored in the database. The name of this collection and the attributes of its entities jointly form an entity type; in other words, an entity type is a collection of entities that have the same attributes. At two instants of time, the entities in the miniworld about which information is stored in the database can differ. The collection of entities of an entity type at a given instant of time is called an entity set.

Chapter 3, Problem 5RQ

Problem

Explain the difference between an attribute and a value set.

Step-by-step solution

Step 1 of 2

Attribute:

Every entity has certain properties that represent its significance in the real world. These properties of entities are known as attributes.

Example:

Consider a bus: the properties that describe a bus can be model, color, manufacture date, year, country, etc.

Value set:

For attribute of an entity, there is a range of values from which an attribute can take a value.

Example:

The Age attribute of an employee must have a value. If Age is restricted to the range 16-60, then the set of all integers from 16 to 60 is the value set of the attribute Age.


Step 2 of 2

The differences between an attribute and a value set are as follows:

Attribute:

• When a table groups data into rows and columns, the columns are known as the attributes of that table.

• An attribute represents certain properties of an entity.

Value set:

• The value set is the group of values that may be allowed for that attribute for each entity.

• A value set is the range of values from which an attribute can take a value.
Chapter 3, Problem 6RQ

Problem

What is a relationship type? Explain the differences among a relationship instance, a relationship
type, and a relationship set.

Step-by-step solution

Step 1 of 3

Relationship type:

A relationship type expresses the kind of association that occurs between entity types, and it also determines the possible number of relationships between their entities.


Step 2 of 3

Consider the following diagram.

Explanation:

STUDENT and COURSE are entities and ENROLL refers to the relationship.

S1, S2, S3, ... are the instances of the entity type STUDENT.

C1, C2, C3, ... are the instances of the entity type COURSE.

r1, r2, r3, ... are the relationship instances between the entities.

A relationship type is the association between the entity types. In the above diagram, ENROLL is the relationship type.

A relationship instance relates exactly one entity from each participating entity type. S1 is related to C1 through r1; (S1, C1) is one instance, (S2, C2) is one instance, (S3, C1) is another, and so on.

A relationship set is the set of all instances of a relationship type: {(S1, C1), (S2, C2), (S3, C1), ...}
form the relationship set.
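A sketch of the ENROLL relationship set in Python (instances made up to match the discussion): each tuple is one relationship instance and pairs exactly one STUDENT entity with one COURSE entity.

# Relationship set ENROLL: a set of relationship instances.
enroll = {("S1", "C1"), ("S2", "C2"), ("S3", "C1")}

for student, course in sorted(enroll):
    # Each instance relates exactly one student and one course.
    print(f"{student} is enrolled in {course}")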


Step 3 of 3

Differences between relationship instance, type and set:

• Relationship instance: refers to exactly one entity from each participating entity type.

• Relationship type: refers to the association between the entity types.

• Relationship set: the collection of all instances of a relationship type.

Chapter 3, Problem 7RQ

Problem

What is a participation role? When is it necessary to use role names in the description of
relationship types?

Step-by-step solution

Step 1 of 3

The participation role is the role that a participating entity plays in a relationship.

• It is important to use role names in the description of a relationship type when the same entity type participates more than once in the relationship type, in different roles.

• Role names are necessary in recursive relationships.

Example:

An employee is related to the department in which he works in a company.

So a relationship may exist between various entities (of the same or of different entity types).

Each entity type that participates in a relationship type plays a role in the relationship.


Step 2 of 3

The participation role, or role name, signifies the role that a participating entity from the entity type plays in each relationship instance, and it helps explain what the relationship means.

Example:

In the WORKS_FOR relationship type, EMPLOYEE plays the role of worker and DEPARTMENT plays the role of department, or employer. In the figure below, an employee works for a department: E1 and E3 work for D1, and E2 works for D2.


Step 3 of 3

Using role names is not necessary in the description of relationship types where all participating entity types are distinct, as in the example above, because in such cases the name of each entity type generally specifies the role it plays.

But when one entity type participates in a relationship type in more than one role, as in recursive relationships, it becomes necessary to use role names in the description of the relationship type.

Example:

Consider the entity type EMPLOYEE. One employee can supervise another employee. In this case the roles cannot be described using the entity type name alone, because this is a relationship of an entity type with itself; using role names becomes important. In the figure below, the SUPERVISION relationship type relates an employee and a supervisor.

E1 supervises E2. Here each relationship instance ri in SUPERVISION associates two employees, ei and ej, one playing the role of supervisor and the other playing the role of supervisee.

Chapter 3, Problem 8RQ

Problem

Describe the two alternatives for specifying structural constraints on relationship types. What are
the advantages and disadvantages of each?

Step-by-step solution

Step 1 of 3

The two alternatives for specifying structural constraints on relationship types are as follows:

• Cardinality ratio

• Participation constraint


Step 2 of 3

Cardinality Ratio:

• An entity can participate in multiple relationship instances.

• The cardinality ratio specifies the maximum number of relationship instances in which an entity can participate.

• For a binary relationship, the cardinality ratios can be 1:1, 1:N, N:1, and M:N.

• The cardinality ratio is represented on an ER diagram by writing 1, M, and N on the left and right sides of the relationship diamond.

Participation constraint:

• The participation constraint specifies the minimum number of relationship instances in which each entity can participate.

• Because it specifies the minimum participation of the entity, it is also called the minimum cardinality constraint.

• There are two types of participation constraints: total and partial.

• A participation constraint is represented in an ER diagram by the line joining the participating entity type and the relationship: total participation is represented by a double line, whereas partial participation is represented by a single line.


Step 3 of 3

Advantages and disadvantages:

• The cardinality ratio and the participation constraint together specify how an entity participates in relationship instances.

• They are useful for describing binary relationship types.

• However, some entities and relationships are difficult or costly to express using only these two modeling constructs.

Chapter 3, Problem 9RQ

Problem

Under what conditions can an attribute of a binary relationship type be migrated to become an
attribute of one of the participating entity types?

Step-by-step solution

Step 1 of 2

• The attributes of a relationship type with cardinality ratio 1:1 or 1:N can be migrated to become attributes of one of the participating entity types.

• In the case of a 1:1 cardinality ratio, the attribute can be moved to either of the entity types in the binary relationship.

• In the case of a 1:N cardinality ratio, the attribute can be migrated only to the entity type on the N-side of the relationship.


Step 2 of 2

Example

• Consider a binary relationship type WORKS_FOR between EMPLOYEE and DEPARTMENT.

• This relationship between DEPARTMENT and EMPLOYEE has cardinality ratio 1:N.

• Each employee works in one department, but there can be several employees in a single department.

• In this scenario, an attribute Start_date of the relationship type WORKS_FOR can be migrated to the EMPLOYEE entity type; it records the date on which the employee started working for that department.

Chapter 3, Problem 10RQ

Problem

When we think of relationships as attributes, what are the value sets of these attributes? What
class of data models is based on this concept?

Step-by-step solution

Step 1 of 3

Solution:

Relationships as attributes:

• Whenever an attribute of one entity type refers to another entity type, a relationship exists.

• Relationship types can have attributes, just as entity types can.

• An attribute of a relationship type whose cardinality ratio is 1:1 or 1:N can be migrated to one of the participating entity types.

• When migrated in this way, the relationship is represented as an attribute of that entity type.

For example:

Take the scenario as follows:

There is a relationship between the EMPLOYEE and DEPARTMENT.

• The relationship DEPARTMENT:EMPLOYEE has cardinality ratio 1:N.

• Here, each employee is in one department, and several employees can be in a single department.

The Start_date attribute of the WORKS_FOR relationship type can be migrated to the EMPLOYEE entity type.

It then records the Start_date on which the EMPLOYEE started working for that department.

Date remains the domain, or value set, of Start_date for an EMPLOYEE in any department; migrating the attribute does not change its value set.


Step 2 of 3

The value sets of these attributes:

When we think of a relationship as an attribute, the value set (domain) of that attribute is the entity set of the related entity type.

In the conceptual design phase of the data model, all entity types, relationships, and constraints are specified as follows:

• The DEPARTMENT entity type contains attributes such as Name, Locations, Number, Manager, and Manager_start_date.

• Here, Locations is a multivalued attribute; Name and Number are both key attributes.

• The PROJECT entity type contains attributes such as Name, Number, Location, and Controlling_department.

• Name and Number are both key attributes.

• The EMPLOYEE entity type contains attributes such as Name, Sex, Ssn, Salary, Department, Address, Birth_date, and Supervisor.

• Name and Address are both composite attributes.

• The DEPENDENT entity type contains attributes such as Employee, Dependent_name, Sex, Relationship, and Birth_date.


Step 3 of 3

The functional data model is based on this concept.

Chapter 3, Problem 11RQ

Problem

What is meant by a recursive relationship type? Give some examples of recursive relationship
types.

Step-by-step solution

Step 1 of 2

Recursive relationship:

A relationship between two entities of the same entity type is called a recursive relationship.

• In other words, a recursive relationship relates two occurrences of the same entity type, each participating in a different role.


Step 2 of 2

Example of recursive relationship:

The following is an example of a recursive relationship.

Consider an entity type PERSON, with an attribute MOTHER whose value is itself a person.

Here a recursive relationship exists, because one row in the PERSON table refers to another row in the same PERSON table.
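A small self-referencing sketch in Python (names invented): each PERSON object may refer to another object of the same PERSON type.

class Person:
    def __init__(self, name, mother=None):
        self.name = name
        # Recursive relationship: 'mother' points to another Person
        # (another row of the same PERSON table), or None if not recorded.
        self.mother = mother

eve = Person("Eve")
alice = Person("Alice", mother=eve)
print(alice.mother.name)  # prints: Eve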

Chapter 3, Problem 12RQ

Problem

When is the concept of a weak entity used in data modeling? Define the terms owner entity type,
weak entity type, identifying relationship type, and partial key.

Step-by-step solution

Step 1 of 5

The concept of a weak entity is used in the conceptual phase of data modeling, when we model entity types that do not have key attributes of their own.

Example:

Consider the entity types DEPENDENT and EMPLOYEE.

• A DEPENDENT exists only in relation to an EMPLOYEE of the company.

• The DEPENDENT attributes can be the same for relatives of two different employees, so there is no unique way of distinguishing between two DEPENDENT records on their own. Such entity types are called weak entity types.


Step 2 of 5

Owner entity type

Entities belonging to a weak entity type are identified by being associated with specific entities of another entity type, in combination with some of their own attribute values. This other entity type is called the owner entity type.


Step 3 of 5

Weak Entity Type

Entity types that do not have key attributes of their own are called weak entity types.


Step 4 of 5

Identifying Relationship Type

A relationship type that relates a weak entity to its owner entity type is called identifying
relationship type.


Step 5 of 5

Partial key

A partial key is a set of attributes in weak entity types that can uniquely identify weak entities that
are related to the same owner entity.

Chapter 3, Problem 13RQ

Problem

Can an identifying relationship of a weak entity type be of a degree greater than two? Give
examples to illustrate your answer.

Step-by-step solution

Step 1 of 4

Identifying relationship: The relationship type that relates a weak entity type to its owner (strong) entity type is known as the identifying relationship.


Step 2 of 4

The degree of an identifying relationship of a weak entity can be two or greater than two.


Step 3 of 4

Consider the following ER diagram:

Here,

• STUDENT and COMPANY are the two strong entities, and INTERVIEW is the weak entity.

• SELECTION_PROCESS is an identifying relationship.

• The degree of the identifying relationship (SELECTION_PROCESS) is 3.

• In the above ER diagram, the student applies for a job in a company, and the interview is the selection process for the student to get a job in the company.


Step 4 of 4

Therefore, from the above ER diagram, it can be concluded that the degree of an identifying
relationship of a weak entity can be greater than 2.

Chapter 3, Problem 14RQ

Problem

Discuss the conventions for displaying an ER schema as an ER diagram.

Step-by-step solution

Step 1 of 1

The conventions for displaying an ER schema as an ER diagram are as follows:

• Entity types are displayed as rectangles; weak entity types as double rectangles.

• Relationship types are displayed as diamonds connected to the participating entity types; the identifying relationship of a weak entity type is shown as a double diamond.

• Attributes are displayed as ovals attached to their entity type or relationship type; multivalued attributes as double ovals; composite attributes as ovals connected to ovals for their components; derived attributes as dashed ovals.

• Key attributes have their names underlined; the partial key of a weak entity type is underlined with a dashed line.

• Total participation of an entity type in a relationship type is shown as a double line, partial participation as a single line; cardinality ratios (1:1, 1:N, M:N) are written beside the relationship diamond.
Chapter 3, Problem 15RQ

Problem

Discuss the naming conventions used for ER schema diagrams.

Step-by-step solution

Step 1 of 1

The naming conventions used for ER schema diagrams are as follows:

• Entity type names should be singular.

• The names of entity types and relationship types should be written in uppercase letters.

• Attribute names have their initial letter capitalized.

• Role names are written in lowercase letters.

Chapter 3, Problem 16E

Problem

Which combinations of attributes have to be unique for each individual SECTION entity in the
UNIVERSITY database shown in Figure 3.20 to enforce each of the following miniworld
constraints:

a. During a particular semester and year, only one section can use a particular classroom at a
particular DaysTime value.

b. During a particular semester and year, an instructor can teach only one section at a particular
DaysTime value.

c. During a particular semester and year, the section numbers for sections offered for the same
course must all be different.

Can you think of any other similar constraints?

Step-by-step solution

Step 1 of 4

a.

Consider the following miniworld constraint:

A particular classroom can be used by only one section at a particular DaysTime value during a particular semester and year.

The combination of attributes that must be unique to enforce the above constraint is as follows:

Sem, Year, CRoom, DaysTime

(SecID is excluded: since SecID alone is unique for every section, including it would make any combination trivially unique, and the constraint would not be enforced.)


Step 2 of 4

b.

Consider the following miniworld constraint:

Only one section can be taught by an instructor at a particular DaysTime value during a particular semester and year.

The combination of attributes that must be unique to enforce the above constraint is as follows:

Sem, Year, DaysTime, Id (of the INSTRUCTOR teaching the SECTION)


Step 3 of 4
c.

Consider the following miniworld constraint:

The section numbers of sections offered for the same course must all be different during a particular semester and year.

The combination of attributes that must be unique to enforce the above constraint is as follows:

Sem, Year, SecNo, CCode (of the COURSE related to the SECTION)


Step 4 of 4

Some other similar constraints related to the SECTION entity are as follows:

• In a particular semester and year, a student can take only one section at a particular DaysTime value.

• In a particular semester and year, an instructor of a particular rank cannot teach two sections at the same DaysTime value.

• Each section of a particular course can use only one classroom during a particular semester and year.
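To make these uniqueness constraints concrete, here is a small sketch (attribute names follow the answer above; the records are invented) that checks whether a combination of attributes is unique across all SECTION records:

sections = [
    {"Sem": "Fall", "Year": 2023, "CRoom": "R101", "DaysTime": "MWF 9-9:50"},
    {"Sem": "Fall", "Year": 2023, "CRoom": "R102", "DaysTime": "MWF 9-9:50"},
]

def combination_is_unique(records, attrs):
    # The given combination of attributes must identify each record uniquely.
    seen = set()
    for record in records:
        key = tuple(record[a] for a in attrs)
        if key in seen:
            return False  # two sections share the same combination
        seen.add(key)
    return True

# Constraint (a): no two sections share a classroom at the same DaysTime
# in the same semester and year.
print(combination_is_unique(sections, ("Sem", "Year", "CRoom", "DaysTime")))  # True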

Chapter 3, Problem 17E

Problem

Composite and multivalued attributes can be nested to any number of levels. Suppose we want
to design an attribute for a STUDENT entity type to keep track of previous college education.
Such an attribute will have one entry for each college previously attended, and each such entry
will be composed of college name, start and end dates, degree entries (degrees awarded at that
college, if any), and transcript entries (courses completed at that college, if any). Each degree
entry contains the degree name and the month and year the degree was awarded, and each
transcript entry contains a course name, semester, year, and grade. Design an attribute to hold
this information. Use the conventions in Figure 3.5.

Step-by-step solution

Step 1 of 3

Complex attributes are attributes formed by nesting multivalued attributes and composite attributes.

• Curly braces {} are used to group the components of multivalued attributes.

• Parentheses () are used to group the components of composite attributes.


Step 2 of 3

A multivalued attribute PreviousCollege is used to hold the colleges previously attended by the student.

• The components of PreviousCollege are CollegeName, StartDate, and EndDate.

A multivalued attribute Degree is used to hold the details of the degrees awarded to the student.

• The components of Degree are DegreeName, Month, and Year.

A multivalued attribute Transcript is used to hold the details of the student's transcript.

• The components of Transcript are CourseName, Semester, Year, and Grade.


Step 3 of 3

An attribute that holds the details of PreviousCollege, Degree and Transcript of the STUDENT
entity is as follows:

{PreviousCollege (CollegeName, StartDate, EndDate,

{Degree (DegreeName, Month, Year)},

{Transcript (CourseName, Semester, Year, Grade)})}
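A possible value of this attribute for one student, sketched as nested Python data (lists for the multivalued parts, dicts for the composite parts; all values are invented):

previous_college = [
    {
        "CollegeName": "State College",
        "StartDate": "2015-08-01",
        "EndDate": "2019-05-31",
        "Degree": [  # degrees awarded at that college, if any
            {"DegreeName": "BSc", "Month": "May", "Year": 2019},
        ],
        "Transcript": [  # courses completed at that college, if any
            {"CourseName": "Databases", "Semester": "Fall", "Year": 2018, "Grade": "A"},
            {"CourseName": "Algorithms", "Semester": "Spring", "Year": 2019, "Grade": "B"},
        ],
    },
]

print(previous_college[0]["Degree"][0]["DegreeName"])  # prints: BSc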

Chapter 3, Problem 18E

Problem

Show an alternative design for the attribute described in Exercise that uses only entity types
(including weak entity types, if needed) and relationship types.

Exercise

Composite and multivalued attributes can be nested to any number of levels. Suppose we want
to design an attribute for a STUDENT entity type to keep track of previous college education.
Such an attribute will have one entry for each college previously attended, and each such entry
will be composed of college name, start and end dates, degree entries (degrees awarded at that
college, if any), and transcript entries (courses completed at that college, if any). Each degree
entry contains the degree name and the month and year the degree was awarded, and each
transcript entry contains a course name, semester, year, and grade. Design an attribute to hold
this information. Use the conventions in Figure 3.5.

Step-by-step solution

Step 1 of 3

The alternative design for the STUDENT entity, with the attribute that keeps track of previous college education as discussed in the previous problem, is as shown below:


Step 2 of 3

The strong entities are as given below:

• STUDENT

• COLLEGE

• DEGREE

The weak entities are as given below:

• TRANSCRIPT

• ATTENDANCE


Step 3 of 3

Relationships between the entities are as given below:

• There exists a binary 1:N relationship PREVIOUS_ATTENDED_COLLEGE between STUDENT and ATTENDANCE.

• There exists a binary 1:N relationship ATTENDED between COLLEGE and ATTENDANCE.

• There exists a binary M:N relationship DEGREE_AWARDED between ATTENDANCE and DEGREE.

• There exists a binary 1:N relationship MAINTAIN_ATTENDANCE between ATTENDANCE and TRANSCRIPT.

Chapter 3, Problem 19E

Problem

Consider the ER diagram in Figure, which shows a simplified schema for an airline reservations
system. Extract from the ER diagram the requirements and constraints that produced this
schema. Try to be as precise as possible in your requirements and constraints specification.

Figure An ER diagram for an AIRLINE database schema

Step-by-step solution

Step 1 of 2

Refer to the ER diagram of the AIRLINE database schema given in Figure 3.21.

The requirements and the constraints that produced from the schema are as follows:

AIRPORT

• The database represents information about each AIRPORT.

• Each AIRPORT has a unique Airport_code, a Name, and the City and State where it is located.

• Each AIRPORT is identified by its Airport_code.

FLIGHT

• Each FLIGHT is identified by a unique number.

• The database also specifies information about the airline for the FLIGHT and the days on which it is scheduled.

FLIGHT_LEG

• Each FLIGHT consists of one or more FLIGHT_LEGs, each with a Leg_no.

• A FARE is kept for each flight, and there is a certain set of restrictions on each FARE.

• Each FLIGHT_LEG has the details of its scheduled arrival time and departure time, and of its departure airport and arrival airport.


Step 2 of 2

LEG_INSTANCE

• Each FLIGHT_LEG has one or more LEG_INSTANCEs.

• A LEG_INSTANCE is an instance of a FLIGHT_LEG on a specific date on which the flight travels.

• The information about the AIRPLANE used and the number of available seats is kept in the LEG_INSTANCE.

RESERVATION

• For each LEG_INSTANCE, the RESERVATIONs of every customer include the Customer Name, Phone, and Seat Number(s).

AIRPLANE, AIRPLANE_TYPE, CAN_LAND

• All the information about the AIRPLANEs and AIRPLANE_TYPEs is included.

• An AIRPLANE is identified by an airplane id and is of a particular AIRPLANE_TYPE.

• Each AIRPLANE_TYPE has a fixed number of seats and a particular manufacturing company name.

• CAN_LAND relates each AIRPLANE_TYPE to the AIRPORTs at which airplanes of that type can land.

Chapter 3, Problem 20E

Problem

In Chapters 1 and 2, we discussed the database environment and database users. We can
consider many entity types to describe such an environment, such as DBMS, stored database,
DBA, and catalog/data dictionary. Try to specify all the entity types that can fully describe a
database system and its environment; then specify the relationship types among them, and draw
an ER diagram to describe such a general database environment.

Step-by-step solution

Step 1 of 1

Entity types that can fully describe a database environment and its users are:

1. USERS(User_name, User_id, Kind_of_user): User_name gives the name of the user, User_id is a unique identifier for each user, and Kind_of_user tells whether the user is DBA staff, a casual user, an application programmer, or a parametric user. (The list can be expanded to include menu-based application users, form-based application users, and so on.)

2. COMMAND_INTERFACE_TYPE(Interface_identifier, User_group, Next_tool): Interface_identifier tells which interfaces the user can use, viz. DDL statements, privileged commands, interactive queries, application programs, compiled transactions, menu-based interfaces, form-based interfaces, and so on. User_group tells which user group will use this interface, so that other users cannot carry out instructions they do not have access to. Next_tool gives the Tool_id of the tool that the interface will use for further processing.

3. TOOLS(Tool_id, Tool_type, Next_tool): Tool_id uniquely identifies the tool; Tool_type tells whether the tool is a compiler, an optimizer, or a storage tool; Next_tool gives the Tool_id of the next tool that this tool will use to complete the transaction.

E-R diagram:

Chapter 3, Problem 21E

Problem

Design an ER schema for keeping track of information about votes taken in the U.S. House of
Representatives during the current two-year congressional session. The database needs to keep
track of each U.S. STATE’s Name (e.g., ‘Texas’, ‘New York’, ‘California’) and include the Region
of the state (whose domain is {‘Northeast’, ‘Midwest’, ‘Southeast’, ‘Southwest’, ‘West’}). Each
CONGRESS_PERSON in the House of Representatives is described by his or her Name, plus
the District represented, the Start_date when the congressperson was first elected, and the
political Party to which he or she belongs (whose domain is {‘Republican’, ‘Democrat’,
‘Independent’, ‘Other’}). The database keeps track of each BILL (i.e., proposed law), including
the Bill_name, the Date_of_vote on the bill, whether the bill Passed_or_failed (whose domain is
{‘Yes’, ‘No’}), and the Sponsor (the congressperson(s) who sponsored—that is, proposed—the
bill). The database also keeps track of how each congressperson voted on each bill (domain of
Vote attribute is {‘Yes’, ‘No’, ‘Abstain’, ‘Absent’}). Draw an ER schema diagram for this
application. State clearly any assumptions you make.

Step-by-step solution

Step 1 of 2


Step 2 of 2

ASSUMPTIONS:

1. Each CONGRESS_PERSON represents one district, and each district is represented by one CONGRESS_PERSON.

2. Each BILL is sponsored by one CONGRESS_PERSON.

3. Every BILL has a different name.

The above schema has three entity types:

1. US_STATE_REGION: represents the states and regions in the U.S.

2. CONGRESS_PERSON: congresspersons are elected from various districts and are related to US_STATE_REGION by the relationship REPRESENTATIVE.

3. BILL: each bill is related to the CONGRESS_PERSON who sponsors it, and it is voted on by all CONGRESS_PERSONs.

Chapter 3, Problem 22E

Problem

A database is being constructed to keep track of the teams and games of a sports league. A
team has a number of players, not all of whom participate in each game. It is desired to keep
track of the players participating in each game for each team, the positions they played in that
game, and the result of the game. Design an ER schema diagram for this application, stating any
assumptions you make. Choose your favorite sport (e.g., soccer, baseball, football).

Step-by-step solution

Step 1 of 2

Consider a soccer league in which various teams participate to win the title. The following is the
ER diagram for the database of a sports league.


Step 2 of 2

Assumptions:

• Only two teams participate in each game.

• Each player in a team has a unique number.

• Only one game takes place on a given date.

• A player can play in many games.

Chapter 3, Problem 23E

Problem

Consider the ER diagram shown in Figure for part of a BANK database. Each bank can have
multiple branches, and each branch can have multiple accounts and loans.

a. List the strong (nonweak) entity types in the ER diagram.

b. Is there a weak entity type? If so, give its name, partial key, and identifying relationship.

c. What constraints do the partial key and the identifying relationship of the weak entity type
specify in this diagram?

d. List the names of all relationship types, and specify the (min, max) constraint on each
participation of an entity type in a relationship type. Justify your choices.

e. List concisely the user requirements that led to this ER schema design.

f. Suppose that every customer must have at least one account but is restricted to at most two
loans at a time, and that a bank branch cannot have more than 1,000 loans. How does this show
up on the (min, max) constraints?

An ER diagram for a BANK database schema.

Step-by-step solution

Step 1 of 6

(a)

The strong (nonweak) entity types are:

• LOAN

• CUSTOMER

• ACCOUNT

• BANK


Step 2 of 6

(b)

Yes, there is a weak entity type: BANK_BRANCH. Its partial key is Branch_no, and its identifying relationship is BRANCHES.


Step 3 of 6

(c)

• The partial key specifies that no two branches of the same bank have the same branch number.

• The identifying relationship specifies that a bank can have any number of branches, but each branch belongs to exactly one bank.

Step 4 of 6

(d)

Relationship types are (here the (min, max) pair shown with an entity type bounds how many entities of that type are related to one entity of the other type):

• BRANCHES: BANK (min, max) = (1, 1) and BANK_BRANCH (min, max) = (1, N). A bank can have any number of branches, but a branch is owned by a single bank.

• ACCTS: ACCOUNT (min, max) = (1, N) and BANK_BRANCH (min, max) = (1, 1). An account is with one branch, but a branch can have many accounts.

• LOANS: LOAN (min, max) = (1, N) and BANK_BRANCH (min, max) = (1, 1). A branch can give any number of loans, but a loan is given by one branch only.

• A_C: ACCOUNT (min, max) = (1, N) and CUSTOMER (min, max) = (1, 1). A customer can have any number of accounts, but an account is owned by only one customer.

• L_C: CUSTOMER (min, max) = (1, 1) and LOAN (min, max) = (1, N). A customer can take any number of loans, but a loan is given to only one customer.


Step 5 of 6

(e)

Consider a banking system:

• Each BANK has a unique code, a name, and an address.

• A bank can have any number of BANK_BRANCHes. Each BANK_BRANCH has a branch number that is unique among the branches of that bank.

• Each BANK_BRANCH opens accounts and gives loans to customers.

• Each account (loan) is identified by an account (loan) number, has a balance (amount), and is of a particular type.

• Each customer is identified by Ssn; the name, address, and phone of the customer are stored.


Step 6 of 6

(f)

The revised relationship type constraints are:

• BRANCHES: BANK (min, max) = (1, 1) and BANK_BRANCH (min, max) = (1, N)

• ACCTS: ACCOUNT (min, max) = (1, 500) and BANK_BRANCH (min, max) = (1, 1) (assuming, additionally, that a branch holds at most 500 accounts)

• LOANS: LOAN (min, max) = (1, 1000) and BANK_BRANCH (min, max) = (1, 1), since a branch cannot have more than 1,000 loans

• A_C: ACCOUNT (min, max) = (1, N) and CUSTOMER (min, max) = (1, 1), since every customer must have at least one account

• L_C: CUSTOMER (min, max) = (1, 1) and LOAN (min, max) = (1, 2), since a customer is restricted to at most two loans at a time
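As an illustration of how such a (min, max) constraint could be checked over relationship instances, here is a sketch for the two-loans-per-customer limit (data and names invented):

from collections import Counter

# L_C relationship instances as (customer, loan) pairs; invented data.
l_c_instances = [("cust1", "loan1"), ("cust1", "loan2"), ("cust2", "loan3")]

loans_per_customer = Counter(customer for customer, _ in l_c_instances)

# Constraint from part (f): a customer holds at most two loans at a time.
violations = {c: n for c, n in loans_per_customer.items() if n > 2}
print(violations)  # {} means no customer exceeds the two-loan maximum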

Chapter 3, Problem 24E

Problem

Consider the ER diagram in Figure. Assume that an employee may work in up to two departments or may not be assigned to any department. Assume that each department must have one and may have up to three phone numbers. Supply (min, max) constraints on this diagram. State clearly any additional assumptions you make. Under what conditions would the relationship HAS_PHONE be redundant in this example?

Part of an ER diagram for a COMPANY database.

Step-by-step solution

Step 1 of 2

Consider the ER diagram for the COMPANY database. The employee may work in up to two
departments or may not be a part of any department. The (min, max) constraint in this case is (0,
2). Each department must have one phone number and may have up to three phone numbers.
The (min, max) constraint in this case is (1, 3).

The following are the other assumptions made for the COMPANY database:

• Each department must have one employee and may have up to twenty employees. The (min,
max) constraint in this case is (1, 20).

• Each phone is used by only one department. The (min, max) constraint in this case is (1, 1).

• Each phone is assigned to at least one employee and may be assigned to up to five employees. The (min, max) constraint in this case is (1, 5).

• Each employee must have one phone and may have up to 3 phones. The (min, max) constraint
in this case is (1, 3).


Step 2 of 2

The following is the ER diagram after supplying the (min, max) constraints for the COMPANY
database:

The relationship HAS_PHONE would be redundant under the following condition:

• If every EMPLOYEE is assigned all the PHONEs of his or her DEPARTMENT, and no phones of any other department.

Chapter 3, Problem 25E

Problem

Consider the ER diagram in Figure. Assume that a course may or may not use a textbook, but
that a text by definition is a book that is used in some course. A course may not use more than
five books. Instructors teach from two to four courses. Supply (min, max) constraints on this
diagram. State clearly any additional assumptions you make. If we add the relationship ADOPTS,
to indicate the textbook(s) that an instructor uses for a course, should it be a binary relationship
between INSTRUCTOR and TEXT, or a ternary relationship among all three entity types? What
(min, max) constraints would you put on the relationship? Why?

Part of an ER diagram or a COURSES database.

Step-by-step solution

Step 1 of 1

The relationship type constraints are:

TEACHES: INSTRUCTOR (min, max) = (1, 1) and COURSE (min, max) = (2, 4). Assumption: each course is taught by a single instructor.

USES: TEXT (min, max) = (0, 5) and COURSE (min, max) = (1, 1). Assumption: each text is used by a single course.

If the relationship ADOPTS is added between INSTRUCTOR and TEXT, its (min, max) constraints would be:

INSTRUCTOR (min, max) = (1, 1) and TEXT (min, max) = (0, 20).

Since each instructor teaches two to four courses and can use up to five texts for each course (or none), the (min, max) constraints follow as above.

Chapter 3, Problem 26E

Problem

Consider an entity type SECTION in a UNIVERSITY database, which describes the section
offerings of courses. The attributes of SECTION are Section_number, Semester, Year.
Course_number, Instructor, Room_no (where section is taught), Building (where section is
taught), Weekdays (domain is the possible combinations of weekdays in which a section can be
offered {‘MWF’, ‘MW’, ‘TT’, and so on}), and Hours (domain is all possible time periods during
which sections are offered {‘9–9:50 a.m.’, ‘10–10:50 a.m.’, …, ‘3:30–4:50 p.m.’, ‘5:30–6:20 p.m.’,
and so on}). Assume that Section_number is unique for each course within a particular
semester/year combination (that is, if a course is offered multiple times during a particular
semester, its section offerings are numbered 1, 2, 3, and so on). There are several composite
keys for section, and some attributes are components of more than one key. Identify three
composite keys, and show how they can be represented in an ER schema diagram.

Step-by-step solution

Step 1 of 4

The attributes of the SECTION entity are as follows:

• Section_number

• Semester

• Year

• Course_number

• Instructor

• Room_no

• Building

• Weekdays

• Hours


Step 2 of 4

Since Section_number is unique for each course within a particular semester and year, {Section_number, Semester, Year, Course_number} can be considered a composite key for the SECTION entity.

Since a room can be allocated to only one section for a specific set of weekdays and hours in a particular semester and year, {Semester, Year, Room_no, Weekdays, Hours} can be considered a composite key for the SECTION entity.

Since an instructor can teach only one section during a specific set of weekdays and hours in a particular semester and year, {Semester, Year, Instructor, Weekdays, Hours} can be considered a composite key for the SECTION entity.


Step 3 of 4

Hence, the composite keys for the SECTION entity are as follows:

• Key 1: Section_number, Semester, Year, Course_number

• Key 2: Semester, Year, Room_no, Weekdays, Hours

• Key 3: Semester, Year, Instructor, Weekdays, Hours
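One way to picture Key 1 in code: index SECTION records by the composite key tuple, so that inserting two sections with the same (Section_number, Semester, Year, Course_number) fails (a sketch with invented data):

sections_by_key1 = {}

def add_section(section):
    # Key 1: the combination below must be unique for every SECTION.
    key = (section["Section_number"], section["Semester"],
           section["Year"], section["Course_number"])
    if key in sections_by_key1:
        raise ValueError(f"duplicate composite key: {key}")
    sections_by_key1[key] = section

add_section({"Section_number": 1, "Semester": "Fall", "Year": 2023,
             "Course_number": "CS101", "Instructor": "Smith",
             "Room_no": 104, "Building": "SCI", "Weekdays": "MWF",
             "Hours": "9-9:50 a.m."})
print(len(sections_by_key1))  # 1 section stored under its composite key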


Step 4 of 4

The ER schema diagram is as follows:


Chapter 3, Problem 27E

Problem

Cardinality ratios often dictate the detailed design of a database. The cardinality ratio depends on
the real-world meaning of the entity types involved and is defined by the specific application. For
the following binary relationships, suggest cardinality ratios based on the common-sense
meaning of the entity types. Clearly state any assumptions you make.

Entity 1 Cardinality Ratio Entity 2

1. STUDENT ______________ SOCIAL_SECURITY_CARD

2. STUDENT ______________ TEACHER

3. CLASSROOM ______________ WALL

4. COUNTRY ______________ CURRENT_PRESIDENT

5. COURSE ______________ TEXTBOOK

6. ITEM (that can be found in an order) ______________ ORDER

7. STUDENT ______________ CLASS

8. CLASS ______________ INSTRUCTOR

9. INSTRUCTOR ______________ OFFICE

10. EBAY_AUCTION_ITEM ______________ EBAY_BID

Step-by-step solution

Step 1 of 3

1. Each student has a unique social security card, and each card belongs to one student. So there exists a 1:1 cardinality ratio between the STUDENT and SOCIAL_SECURITY_CARD entities.

2. A student can be taught by many teachers, and a teacher can teach many students. So there exists an M:N cardinality ratio between the STUDENT and TEACHER entities.

3. A classroom can have four walls, and a wall can be common to two classrooms. So there exists a 2:4 cardinality ratio between the CLASSROOM and WALL entities.

4. Each country has only one current president, and a person can be president of only one country. So there exists a 1:1 cardinality ratio between the COUNTRY and CURRENT_PRESIDENT entities.

5. A course can have any number of textbooks, but a textbook can belong to only one course. So there exists a 1:N cardinality ratio between the COURSE and TEXTBOOK entities.


Step 2 of 3

6. An order can consist of many items, and an item can belong to more than one order. So there exists an M:N cardinality ratio between the ITEM and ORDER entities.

7. A student can belong to one class, but a class can consist of many students. So there exists an N:1 cardinality ratio between the STUDENT and CLASS entities.

8. A class can have many instructors, and an instructor can teach more than one class. So there exists an M:N cardinality ratio between the CLASS and INSTRUCTOR entities.

9. An instructor can belong to one office, but an office can house more than one instructor. So there exists an N:1 cardinality ratio between the INSTRUCTOR and OFFICE entities.

10. An eBay auction item can have any number of bids, but each bid is on one item. So there exists a 1:N cardinality ratio between the EBAY_AUCTION_ITEM and EBAY_BID entities.


Step 3 of 3

Summary of cardinality ratios:

1. STUDENT 1:1 SOCIAL_SECURITY_CARD
2. STUDENT M:N TEACHER
3. CLASSROOM 2:4 WALL
4. COUNTRY 1:1 CURRENT_PRESIDENT
5. COURSE 1:N TEXTBOOK
6. ITEM M:N ORDER
7. STUDENT N:1 CLASS
8. CLASS M:N INSTRUCTOR
9. INSTRUCTOR N:1 OFFICE
10. EBAY_AUCTION_ITEM 1:N EBAY_BID
Chapter 3, Problem 28E

Problem

Consider the ER schema for the MOVIES database in Figure.

Assume that MOVIES is a populated database. ACTOR is used as a generic term and includes
actresses. Given the constraints shown in the ER schema, respond to the following statements
with True, False, or Maybe. Assign a response of Maybe to statements that, although not
explicitly shown to be True, cannot be proven False based on the schema as shown. Justify each
answer.

a. There are no actors in this database that have been in no movies.

b. There are some actors who have acted in more than ten movies.

c. Some actors have done a lead role in multiple movies.

d. A movie can have only a maximum of two lead actors.

e. Every director has been an actor in some movie.

f. No producer has ever been an actor.

g. A producer cannot be an actor in some other movie.

h. There are movies with more than a dozen actors.

i. Some producers have been a director as well.

j. Most movies have one director and one producer.

k. Some movies have one director but several producers.

l. There are some actors who have done a lead role, directed a movie, and produced a movie.

m. No movie has a director who also acted in that movie.

Figure An ER diagram for a MOVIES database schema.

Step-by-step solution

Step 1 of 13

a.

There exists a many to many (M: N) relationship named PERFORMS_IN between ACTOR and
MOVIE. ACTOR and MOVIE have full participation in relationship PERFORMS_IN.

Hence, the given statement is TRUE.


Step 2 of 13

b.
There exists a many-to-many (M:N) relationship named PERFORMS_IN between ACTOR and MOVIE. The maximum cardinality M or N indicates that there is no fixed maximum, so some actors may have acted in more than ten movies, but the schema does not guarantee it.

Hence, the answer is Maybe.


Step 3 of 13

c.

There exists a 2 to N relationship named LEAD_ROLE between ACTOR and MOVIE. The
maximum cardinality for an actor to act in a movie as a lead role is N. N can be 2 or more.

Hence, the given statement is TRUE.


Step 4 of 13

d.

There exists a 2-to-N relationship named LEAD_ROLE between ACTOR and MOVIE. The maximum cardinality 2 indicates that a movie can have at most two lead actors.

Hence, the given statement is TRUE.


Step 5 of 13

e.

There exists a one-to-one (1:1) relationship named ALSO_A_DIRECTOR between ACTOR and DIRECTOR. DIRECTOR does not have total participation in ALSO_A_DIRECTOR, so an actor may also be a director, but not every director need have been an actor.

Hence, the given statement is FALSE.


Step 6 of 13

f.

There exists a one-to-one (1:1) relationship named ACTOR_PRODUCER between ACTOR and PRODUCER. The existence of this relationship means there may be a producer who has also been an actor.

Hence, the given statement is FALSE.


Step 7 of 13

g.

Nothing in the schema prevents a producer from being an actor in some other movie; the ACTOR_PRODUCER relationship allows it.

Hence, the given statement is FALSE.


Step 8 of 13

h.

There exists a many-to-many (M:N) relationship named PERFORMS_IN between ACTOR and MOVIE. The maximum cardinality M indicates that there is no fixed maximum, so a movie can have more than a dozen actors performing in it, but the schema does not guarantee it.

Hence, the answer is Maybe.


Step 9 of 13

i.

There exists a one-to-one (1:1) relationship named ALSO_A_DIRECTOR between ACTOR and DIRECTOR, and a one-to-one (1:1) relationship named ACTOR_PRODUCER between ACTOR and PRODUCER.

So an actor can be a director as well as a producer, and some producers may have been directors as well.

Hence, the given statement is TRUE.


Step 10 of 13

j.

There exists a one-to-many relationship named DIRECTS between DIRECTOR and MOVIE: a director can direct N movies.

There exists a many-to-many relationship named PRODUCES between PRODUCER and MOVIE: a producer can produce any number of movies.

So a movie may have one director and one producer, but it may also have several producers.

Hence, the answer is Maybe.


Step 11 of 13

k.

There exists a one-to-many relationship named DIRECTS between DIRECTOR and MOVIE: a director can direct N movies, and each movie has one director.

There exists a many-to-many relationship named PRODUCES between PRODUCER and MOVIE: a movie can have any number of producers.

So there can be movies with one director and several producers.

Hence, the given statement is TRUE.


Step 12 of 13

l.

There exists a 2-to-N relationship named LEAD_ROLE between ACTOR and MOVIE.

There exists a one-to-one (1:1) relationship named ALSO_A_DIRECTOR between ACTOR and DIRECTOR.

There exists a one-to-one (1:1) relationship named ACTOR_PRODUCER between ACTOR and PRODUCER.

So there may be an actor who is a producer and a director and has performed a lead role in a movie.

Hence, the given statement is TRUE.


Step 13 of 13

m.

There may be a movie whose director also acted in that movie; the schema does not prevent this.

Hence, the given statement is FALSE.

Chapter 3, Problem 29E

Problem

Given the ER schema for the MOVIES database in Figure, draw an instance diagram using three
movies that have been released recently. Draw instances of each entity type: MOVIES,
ACTORS, PRODUCERS, DIRECTORS involved; make up instances of the relationships as they
exist in reality for those movies.

An ER diagram for a MOVIES database schema.

Step-by-step solution

Step 1 of 2


Step 2 of 2

Aamir Khan: produced a movie he acted in, and also directed the movie.

Chapter 3, Problem 30E

Problem

Illustrate the UML diagram for Exercise. Your UML design should observe the following
requirements:

a. A student should have the ability to compute his/her GPA and add or drop majors and minors.

b. Each department should be able to add or delete courses and hire or terminate faculty.

c. Each instructor should be able to assign or change a student’s grade for a course.

Note: Some of these functions may be spread over multiple classes.

Reference Problem 16

Which combinations of attributes have to be unique for each individual SECTION entity in the
UNIVERSITY database shown in Figure 3.20 to enforce each of the following miniworld
constraints:

a. During a particular semester and year, only one section can use a particular classroom at a
particular DaysTime value.

b. During a particular semester and year, an instructor can teach only one section at a particular
DaysTime value.

c. During a particular semester and year, the section numbers for sections offered for the same
course must all be different.

Can you think of any other similar constraints?

Step-by-step solution

Step 1 of 5

The UML diagram consists of classes; a class is equivalent to an entity type in an ER diagram. A class consists of the following three sections:

• Class name: It is the top section of the UML class diagram. The class name is similar to the entity type name in an ER diagram.

• Attributes: It is the middle section of the UML class diagram. The attributes are the same as the attributes of an entity in the ER diagram.

• Operations: It is the last section of the UML class diagram. It indicates the operations that can be performed on individual objects, where each object is similar to an entity in the ER diagram.


Step 2 of 5

a.

The operations that give the student the ability to compute his/her GPA and to add or drop majors and minors are specified in the last section of the UML class diagram. The operations are as follows:

• compute_gpa

• add_major

• drop_major

• add_minor

• drop_minor


Step 3 of 5

b.

The operations that give each department the ability to add or delete courses and to hire or terminate faculty are specified in the last section of the UML class diagram. The operations are as follows:

• add_course

• delete_course

• hire_faculty

• terminate_faculty


Step 4 of 5

c.

The operations that give each instructor the ability to assign or change a student's grade for a particular course are specified in the last section of the UML class diagram. The operations are as follows:

• assign_grade

• change_grade


Step 5 of 5

The UML diagram corresponding to the above requirements is as follows:

Chapter 3, Problem 31LE

Problem

Consider the UNIVERSITY database described in Exercise 16. Build the ER schema for this
database using a data modeling tool such as ERwin or Rational Rose.

Reference Exercise 16

Which combinations of attributes have to be unique for each individual SECTION entity in the
UNIVERSITY database shown in Figure 3.20 to enforce each of the following miniworld
constraints:

a. During a particular semester and year, only one section can use a particular classroom at a
particular DaysTime value.

b. During a particular semester and year, an instructor can teach only one section at a particular
DaysTime value.

c. During a particular semester and year, the section numbers for sections offered for the same
course must all be different.

Can you think of any other similar constraints?

Step-by-step solution

Step 1 of 1

Refer to Exercise 3.16 for the UNIVERSITY database. Use the Rational Rose tool to create the ER schema for the database as follows:

• In the options available on left, right click on the option Logical view, go to New and select the
option Class Diagram.

• Name the class diagram as UNIVERSITY. Select the option Class available in the toolbar and
then click on empty space of the Class Diagram file. Name the class as COLLEGE.

Right click on the class, select the option New Attribute, and name the attribute as CName.
Similarly, create the other attributes COffice and CPhone.

• Now right-click on the attribute CName, available on the left under the class COLLEGE, and select the option Open Specification. Select the Protected option under Export Control. This will make CName the primary key.

• Similarly create another class INSTRUCTOR; its attributes Id, Rank, IName, IOffice and
IPhone; and Id as the primary key.

• Select the option Unidirectional Association from the toolbar, for creating relationships
between the two classes. Now click on the class COLLEGE; while holding the click drag the
mouse towards the class INSTRUCTOR and release the click. This will create the relationship
between the two selected classes.

Name the association as DEAN. Since the structural constraints in the ER diagram are specified using the (min, max) notation, specify the structural constraints using the Rational Rose tool as follows:

• Right click on the association close to the class COLLEGE and select 1 from the option
Multiplicity.
• Again, right click on the association close to the class INSTRUCTOR and select Zero or One
from the option Multiplicity.

• Similarly, create other classes and their associated attributes. Specify the relationships and
structural constraints between the classes, as mentioned above.

The ER schema may be specified using an alternative diagrammatic notation, namely the class diagram, through the use of the Rational Rose tool, as follows:

Chapter 3, Problem 32LE

Problem

Consider a MAIL_ORDER database in which employees take orders for parts from customers.
The data requirements are summarized as follows:

■ The mail order company has employees, each identified by a unique employee number, first
and last name, and Zip Code.

■ Each customer of the company is identified by a unique customer number, first and last name,
and Zip Code.

■ Each part sold by the company is identified by a unique part number, a part name, price, and
quantity in stock.

■ Each order placed by a customer is taken by an employee and is given a unique order number.
Each order contains specified quantities of one or more parts. Each order has a date of receipt
as well as an expected ship date. The actual ship date is also recorded.

Design an entity-relationship diagram for the mail order database and build the design using a
data modeling tool such as ERwin or Rational Rose.

Step-by-step solution

There is no solution to this problem yet.


Chapter 3, Problem 35LE

Problem

Consider the ER diagram for the AIRLINE database shown in Figure. Build this design using a data modeling tool such as ERwin or Rational Rose.

An ER diagram for an AIRLINE database schema

Step-by-step solution

Step 1 of 1

Refer to Figure 3.21 for the ER schema of the AIRLINE database. Use the Rational Rose tool to create the ER schema for the database as follows:

• In the options available on left, right click on the option Logical view, go to New and select the
option Class Diagram.

• Name the class diagram as AIRLINE. Select the option Class available in the toolbar and then
click on empty space of the Class Diagram file. Name the class as AIRPORT.

Right click on the class, select the option New Attribute, and name the attribute as Airport_code.
Similarly, create the other attributes City, State and Name.

• Now right-click on the attribute Airport_code, available on the left under the class AIRPORT, and select the option Open Specification. Select the Protected option under Export Control. This will make Airport_code the primary key.

• Similarly, create another class FLIGHT_LEG and its attribute Leg_no.

• Select the option Unidirectional Association from the toolbar, for creating relationships
between the two classes. Now click on the class AIRPORT; while holding the click drag the
mouse towards the class FLIGHT_LEG and release the click. This will create the relationship
between the two selected classes.

Name the association as DEPARTURE_AIRPORT. Since the structural constraints in the ER diagram are specified using the (min, max) notation, specify the structural constraints using the Rational Rose tool as follows:

• Right click on the association close to the class AIRPORT and select 1 from the option
Multiplicity.

• Again, right click on the association close to the class FLIGHT_LEG and select n from the
option Multiplicity.

• Similarly, create other classes and their associated attributes. Specify the relationships and
structural constraints between the classes, as mentioned above.

The ER schema may be specified using an alternative diagrammatic notation, namely the class diagram, through the use of the Rational Rose tool, as follows:
Chapter 4, Problem 1RQ

Problem

What is a subclass? When is a subclass needed in data modeling?

Step-by-step solution

Step 1 of 3

Subclass:

A subclass is also called a derived class. It extends another class (the parent class), inheriting the protected and public members of the parent class.

An entity in a subclass is the same entity as in the superclass, but in a distinct, specific role.


Step 2 of 3

An entity is an object (thing) with an independent physical (car, home, person) or conceptual (company, university course) existence in the real world.

Each real-world entity has certain properties that represent its significance in the real world or describe it; these properties are known as attributes. An entity type defines a collection (or set) of entities that have the same attributes.

A database usually contains a group of entities that are similar. These entities have the same attributes but different attribute values; a collection of such entities is an entity type.

Within each entity type there may exist smaller groupings based on one attribute or relationship or another. Such attributes or relationships may not apply to all entities of the entity type, but they are of significant value for that group. Each such group can be represented as a separate class or entity type, forming a subclass of the bigger entity type.

Example:

Consider an entity type VEHICLE. All vehicles have the properties manufacturer, number_plate, registration_number, colour, etc., but there are certain properties that apply only to goods (carrier) vehicles, such as load_capacity and size (the width and height of the products the vehicle can carry), and certain attributes that apply only to passenger vehicles, such as sitting_capacity and ac/non-ac. So we can define subclasses of the entity type VEHICLE: PASSENGER_VEHICLE and GOODS_VEHICLE are subclasses of the VEHICLE superclass.
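A compact Python rendering of the superclass/subclass idea (attribute names mirror the example; this is a sketch, not the textbook's notation):

class Vehicle:
    """Superclass: attributes shared by all vehicles."""
    def __init__(self, manufacturer, number_plate, registration_number, colour):
        self.manufacturer = manufacturer
        self.number_plate = number_plate
        self.registration_number = registration_number
        self.colour = colour

class GoodsVehicle(Vehicle):
    """Subclass with specific (local) attributes for carrier vehicles."""
    def __init__(self, load_capacity, size, **common):
        super().__init__(**common)
        self.load_capacity = load_capacity
        self.size = size

class PassengerVehicle(Vehicle):
    """Subclass with specific (local) attributes for passenger vehicles."""
    def __init__(self, sitting_capacity, has_ac, **common):
        super().__init__(**common)
        self.sitting_capacity = sitting_capacity
        self.has_ac = has_ac

truck = GoodsVehicle(load_capacity=5000, size="large",
                     manufacturer="Acme", number_plate="KA-01-1234",
                     registration_number="R123", colour="blue")
print(isinstance(truck, Vehicle))  # True: a GoodsVehicle "is a" Vehicle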


Step 3 of 3

Subclass needed in data modeling:

A subclass is needed in data modeling to define an inheritance relationship between two classes.

The concept of a subclass is used in data modeling to represent data more meaningfully: it represents clearly those attributes and relationships that apply to a group of the entities in a superclass rather than to all of its entities.
Chapter 4, Problem 2RQ

Problem

Define the following terms: superclass of a subclass, superclass/subclass relationship, IS-A relationship, specialization, generalization, category, specific (local) attributes, and specific relationships.

Step-by-step solution

Step 1 of 9

1. Superclass of a subclass: Within each entity type there may exist smaller groupings based on one attribute or relationship or another. Such attributes or relationships may not apply to all entities of the entity type, but they are of significant value for that particular group. Each such group can be represented as a separate class or entity type; these form subclasses of the bigger entity type, and the bigger entity type is known as the superclass.

For example, consider an entity type VEHICLE. All vehicles have the properties manufacturer, number_plate, registration_number, colour, etc., but certain properties, such as load_capacity and size, apply only to goods (carrier) vehicles, and certain attributes, such as sitting_capacity and ac/non-ac, apply only to passenger vehicles. So we can define subclasses of the entity type VEHICLE: PASSENGER_VEHICLE and GOODS_VEHICLE are subclasses of the VEHICLE superclass.

Step 2 of 9

2. Superclass/subclass relationship: The relationship between a superclass and any one of its
subclasses is known as a superclass/subclass relationship.

Step 3 of 9

3. Is-a relationship: A superclass/subclass relationship is often called an is-a relationship
because of the way in which the concept is referred to.

For example: Consider the entity type VEHICLE with the subclasses PASSENGER_VEHICLE and
GOODS_VEHICLE described in point 1 above.

Or, we can say, a GOODS_VEHICLE is a VEHICLE.

Step 4 of 9

4. Specialization: Specialization is the process of defining a set of subclasses of an entity
type (the superclass of the specialization). The set of subclasses that forms a specialization is
defined on the basis of some distinguishing characteristic of the entities in the superclass.

For example: the set {PASSENGER_VEHICLE, GOODS_VEHICLE} is a specialization of the
superclass VEHICLE that distinguishes among vehicle entities on the basis of the purpose each
vehicle serves. There can be several specializations of the same entity type based on different
distinguishing characteristics.

For example: on the basis of whether a vehicle is commercial or not, we can have another
specialization {COMMERCIAL, PRIVATE}.

Specialization is a process that allows the user to do the following (a minimal SQL sketch of
this mapping follows the list):

a. Define a set of subclasses of an entity type.

b. Establish additional specific attributes with each subclass.

c. Establish additional specific relationship types between each subclass and other entity types
or other subclasses.
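One standard way to map such a specialization to SQL tables is sketched below. This is not part
of the original answer; the Vehicle_id surrogate key and the column types are assumptions made
for the sketch:

-- Superclass table: attributes common to all vehicles.
CREATE TABLE VEHICLE (
    Vehicle_id          INT PRIMARY KEY,   -- assumed surrogate key
    Manufacturer        VARCHAR(40),
    Number_plate        VARCHAR(15),
    Registration_number VARCHAR(20),
    Colour              VARCHAR(20)
);

-- Subclass tables hold only the specific (local) attributes (point b above),
-- plus the key of the superclass.
CREATE TABLE PASSENGER_VEHICLE (
    Vehicle_id       INT PRIMARY KEY REFERENCES VEHICLE(Vehicle_id),
    Sitting_capacity INT,
    Ac               CHAR(1)               -- 'Y' for AC, 'N' for non-AC
);

CREATE TABLE GOODS_VEHICLE (
    Vehicle_id    INT PRIMARY KEY REFERENCES VEHICLE(Vehicle_id),
    Load_capacity DECIMAL(8,2),
    Size          VARCHAR(20)              -- width/height of product it can take
);

Specific relationship types (point c) would similarly be represented by foreign keys or
relationship tables that reference only the subclass table concerned.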

Step 5 of 9
5. Generalization: This is the reverse process, an abstraction in which the differences between
several entity types are suppressed, their common features are identified, and they are
generalized into a single superclass of which the original entity types are special subclasses.

For example: PASSENGER_VEHICLE and GOODS_VEHICLE are two classes that have certain
attributes in common, viz., number_plate, reg_number, colour, etc.; these common attributes can
be factored out and a new superclass VEHICLE can be created. This is called generalization.

Step 6 of 9

6. Category: Sometimes the need arises for modeling a single superclass/subclass relationship
with more than one superclass, where the superclasses represent different entity types. In this
case, the subclass represents a collection of objects that is a subset of the union of the distinct
entity types; such a subclass is called a union type or a category.

Step 7 of 9

7. Specific (local) attributes:

Step 8 of 9

Consider an entity type VEHICLE. All vehicles have properties such as manufacturer,
number_plate, registration_number, colour, etc., but there are certain properties that we may link
only to the GOODS_VEHICLE subclass, like load_capacity and size (for the width and height of
the product it can take), and certain attributes that can be attached to the PASSENGER_VEHICLE
subclass only: sitting_capacity, ac/non-ac, etc. These attributes that are part of only the
subclasses, and not of the superclass, are called local or specific attributes.

Step 9 of 9

8. Specific relationships: Like local attributes, there are certain relationship types that hold only
for a particular subclass of the superclass, and not for all subclasses or for the superclass. Such
relationships are called specific relationships.

For example: CARRIES_GOODS can be a relationship between GOODS_VEHICLE and
COMPANY, but not between PASSENGER_VEHICLE and COMPANY.

Chapter 4, Problem 3RQ

Problem

Discuss the mechanism of attribute/relationship inheritance. Why is it useful?

Step-by-step solution

Step 1 of 2

The Enhanced Entity Relationship (EER) model is an extension of the ER model. The EER model
includes some new concepts in addition to the concepts of the ER model: the concepts of
subclass, superclass, specialization, generalization, and category or union type. With all these
additional concepts comes the mechanism of attribute and relationship inheritance.

Step 2 of 2

The type of an entity is defined by the set of attributes it possesses and the relationship types in
which it participates. The members of a subclass inherit all the attributes of the superclass entity
type, as well as all the relationship types in which the superclass participates. This mechanism is
useful because the common attributes and relationships need not be redefined for every
subclass; each subclass entity automatically possesses the characteristics of the superclass in
addition to its own specific ones.
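As a sketch of how this inheritance plays out once such a schema is mapped to relations (this
reuses the VEHICLE and GOODS_VEHICLE tables sketched under Problem 2RQ above; the
layout and names are assumptions, not a prescribed mapping), a subclass entity "inherits" the
superclass attributes by joining on the shared key:

-- Each GOODS_VEHICLE entity possesses the general VEHICLE attributes
-- (inherited) in addition to its own specific attributes.
CREATE VIEW GOODS_VEHICLE_FULL AS
SELECT v.Vehicle_id,
       v.Manufacturer,        -- inherited from the superclass
       v.Number_plate,        -- inherited from the superclass
       g.Load_capacity,       -- specific to the subclass
       g.Size                 -- specific to the subclass
FROM   VEHICLE v
JOIN   GOODS_VEHICLE g ON g.Vehicle_id = v.Vehicle_id;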

Chapter 4, Problem 4RQ

Problem

Discuss user-defined and predicate-defined subclasses, and identify the differences between the
two.

Step-by-step solution

Step 1 of 1

Predicate-defined subclasses: When we determine the entities that will become members of
each subclass of a specialization by placing a condition (predicate) on the value of some attribute
of the superclass, such subclasses are called predicate-defined subclasses.

User-defined subclasses: When there is no condition for determining membership in a
subclass, the subclass is called user-defined. Membership in such a subclass is determined by
the database users when they apply the operation to add an entity to the subclass; hence,
membership is specified individually for each entity by the user, not by any condition that may be
evaluated automatically.

The difference between predicate-defined and user-defined subclasses is:

1. Membership in a predicate-defined subclass can be decided automatically, whereas
membership in a user-defined subclass cannot.
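A minimal SQL sketch of this difference, assuming a Vehicle_type attribute is added to the
VEHICLE table from the earlier example (the attribute and all names are assumptions for the
sketch):

-- Predicate-defined: membership follows automatically from a defining
-- predicate on a superclass attribute, here Vehicle_type = 'GOODS'.
ALTER TABLE VEHICLE ADD Vehicle_type VARCHAR(10)
    CHECK (Vehicle_type IN ('PASSENGER', 'GOODS'));

CREATE VIEW GOODS_VEHICLE_PD AS
    SELECT * FROM VEHICLE WHERE Vehicle_type = 'GOODS';

-- User-defined: no such predicate exists; a user makes an entity a member
-- by explicitly inserting it into the subclass table, e.g.:
-- INSERT INTO GOODS_VEHICLE (Vehicle_id) VALUES (42);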

Chapter 4, Problem 5RQ

Problem

Discuss user-defined and attribute-defined specializations, and identify the differences between
the two.

Step-by-step solution

Step 1 of 5

User-defined specialization:

Step 2 of 5

If there is no condition for determining membership in the subclasses, then the specialization is
called user-defined.

Step 3 of 5

Membership in such a specialization is determined by the database users when an operation is
performed to add an entity to a subclass.

Step 4 of 5

Hence, membership is specified individually for each entity by the user.

Attribute-defined specialization:

If the entities that become members of each subclass of the specialization are determined by
placing a condition on the value of some attribute of the superclass, the specialization is called
attribute-defined.

Step 5 of 5

The difference between user-defined specialization and attribute-defined specialization is as
follows:

User-defined specialization:

• The user is responsible for identifying the proper subclass for each entity.

• Membership in a subclass cannot be decided automatically.

Attribute-defined specialization:

• The value of the same attribute is used in the defining predicate of all the subclasses.

• Membership in a subclass can be decided automatically.

Chapter 4, Problem 6RQ

Problem

Discuss the two main types of constraints on specializations and generalizations.

Step-by-step solution

Step 1 of 1

The two main constraints on specialization and generalization are:

1. Disjointness constraint: This specifies that the subclasses of the specialization must be
disjoint, which means that an entity can be a member of at most one of the subclasses of the
specialization. A specialization that is attribute-defined implies the disjointness constraint if the
attribute used to define the membership predicate is single-valued.

If the disjointness constraint does not hold, an entity may be a member of more than one
subclass; this is the condition of overlap.

2. Completeness constraint: This may be total or partial. A total specialization constraint
specifies that every entity in the superclass must be a member of at least one of the subclasses
in the specialization. A partial specialization allows an entity not to belong to any of the
subclasses.
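A minimal SQL sketch of how these two constraints can be enforced with a single-valued type
attribute (the EMPLOYEE example and all names are assumptions for this sketch):

CREATE TABLE EMPLOYEE (
    Ssn      CHAR(9) PRIMARY KEY,
    -- Disjointness: a single-valued Emp_type means an employee can satisfy
    -- at most one defining predicate, so the subclasses cannot overlap.
    -- Totality: NOT NULL forces every employee into some subclass; dropping
    -- NOT NULL would make the specialization partial.
    Emp_type VARCHAR(10) NOT NULL
        CHECK (Emp_type IN ('SALARIED', 'HOURLY'))
);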

Chapter 4, Problem 7RQ

Problem

What is the difference between a specialization hierarchy and a specialization lattice?

Step-by-step solution

Step 1 of 1

A subclass may itself have further subclasses specified on it, forming a hierarchy or a lattice of
specializations. A specialization hierarchy has the constraint that every subclass participates
as a subclass in only one class/subclass relationship; that is, each subclass has only one parent,
which results in a tree structure.

In contrast, in a specialization lattice, a subclass can be a subclass in more than one
class/subclass relationship.
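As a sketch, a shared subclass in a lattice can be mapped so that membership in every parent is
enforced; the classic textbook example is ENGINEERING_MANAGER, a shared subclass of both
ENGINEER and MANAGER (table names and keys are assumptions for this sketch):

CREATE TABLE ENGINEER (Ssn CHAR(9) PRIMARY KEY);
CREATE TABLE MANAGER  (Ssn CHAR(9) PRIMARY KEY);

-- The two foreign keys enforce that a shared-subclass entity exists in
-- all of its superclasses.
CREATE TABLE ENGINEERING_MANAGER (
    Ssn CHAR(9) PRIMARY KEY,
    FOREIGN KEY (Ssn) REFERENCES ENGINEER(Ssn),
    FOREIGN KEY (Ssn) REFERENCES MANAGER(Ssn)
);

In a hierarchy, by contrast, each subclass table would carry exactly one such foreign key, since
each subclass has exactly one parent.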

Chapter 4, Problem 8RQ

Problem

What is the difference between specialization and generalization? Why do we not display this
difference in schema diagrams?

Step-by-step solution

Step 1 of 2

Specialization is the process of defining a set of subclasses of an entity type (the superclass of
the specialization). The set of subclasses that forms a specialization is defined on the basis of
some distinguishing characteristic of the entities in the superclass.

For example: the set {PASSENGER_VEHICLE, GOODS_VEHICLE} is a specialization of the
superclass VEHICLE that distinguishes among vehicle entities on the basis of the purpose each
vehicle serves. There can be several specializations of the same entity type based on different
distinguishing characteristics.

For example: on the basis of whether a vehicle is commercial or not, we can have another
specialization {COMMERCIAL, PRIVATE}.

Specialization is a process that allows the user to do the following:

a. Define a set of subclasses of an entity type.

b. Establish additional specific attributes with each subclass.

c. Establish additional specific relationship types between each subclass and other entity types
or other subclasses.

Step 2 of 2

Generalization: This is the reverse process, an abstraction in which the differences between
several entity types are suppressed, their common features are identified, and they are
generalized into a single superclass of which the original entity types are special subclasses.

For example: PASSENGER_VEHICLE and GOODS_VEHICLE are two classes that have certain
attributes in common, viz., number_plate, reg_number, colour, etc.; these common attributes can
be factored out and a new superclass VEHICLE can be created. This is called generalization.

Specialization and generalization can be viewed as functionally reverse processes of each other.
We do not generally display this difference in schema diagrams because the decision as to which
process was more appropriate in a particular situation is often subjective.

Chapter 4, Problem 9RQ

Problem

How does a category differ from a regular shared subclass? What is a category used for?
Illustrate your answer with examples.

Step-by-step solution

Step 1 of 3

A category is different from a regular shared subclass because:

1. A category has two or more superclasses that may represent distinct entity types, whereas
each of the superclass/subclass relationships in which a regular shared subclass participates
has a single superclass.

Regular shared subclass fig:

Category fig:

Step 2 of 3

2. An entity that is a member of a shared subclass must exist in all of its superclasses; that is,
the shared subclass is a subset of the intersection of the superclasses. In the case of a category,
a member entity can be part of any one of the superclasses; that is, the category is a subset of
the union of the superclasses.

3. Attribute inheritance works selectively in the case of categories. The attributes of only one
superclass are inherited by each entity, depending on the superclass to which the entity belongs.
On the other hand, a shared subclass inherits all the attributes of all its superclasses.

Step 3 of 3

Use: Sometimes the need arises for modeling a single superclass/subclass relationship with
more than one superclass, where the superclasses represent different entity types. In this case,
the subclass represents a collection of objects that is a subset of the union of the distinct entity
types; in such cases a union type or category is used.

For example: Consider a piece of property. This can be owned by a person, a business firm, a
charitable institution, a bank, etc. All these entities are of different types but jointly form the total
set of land owners. The figure above illustrates this example.
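A minimal SQL sketch of the usual mapping for such a category, using the land-owner example
above (the surrogate key Owner_id and all column types are assumptions for this sketch):

CREATE TABLE OWNER (Owner_id INT PRIMARY KEY);   -- surrogate key for the category

-- Each superclass records the surrogate key for those of its entities
-- that happen to be owners (the category is a subset of the UNION).
CREATE TABLE PERSON (
    Ssn      CHAR(9) PRIMARY KEY,
    Owner_id INT REFERENCES OWNER(Owner_id)
);

CREATE TABLE BANK (
    Bname    VARCHAR(40) PRIMARY KEY,
    Owner_id INT REFERENCES OWNER(Owner_id)
);

-- PROPERTY can now reference OWNER no matter which entity type the owner is.
CREATE TABLE PROPERTY (
    Property_id INT PRIMARY KEY,
    Owner_id    INT REFERENCES OWNER(Owner_id)
);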

Chapter 4, Problem 10RQ

Problem

For each of the following UML terms (see Sections 3.8 and 4.6) discuss the corresponding term
in the EER model, if any: object, class, association, aggregation, generalization, multiplicity,
attributes, discriminator, link, link attribute, reflexive association, and qualified association.

Step-by-step solution

Step 1 of 1

S.No UML Term EER model Term

1 Object Entity

2 Class Entity type

3 Association Relationship types

4 Aggregation Relationship between a whole object and component part

5 Generalization Generalization

6 Multiplicity (min, max) notation

7 Attributes Attributes

8 Discriminator Partial key

9 Link Relationship instances

10 Link Attribute Relationship attribute

11 Reflexive association Recursive relationship

12 Qualified association Weak entity

Chapter 4, Problem 11RQ

Problem

Discuss the main differences between the notation for EER schema diagrams and UML class
diagrams by comparing how common concepts are represented in each.

Step-by-step solution

Step 1 of 1

Some of the differences between the notation for EER schema diagrams and the notation for
UML class diagrams are as follows:

Chapter 4, Problem 12RQ

Problem

List the various data abstraction concepts and the corresponding modeling concepts in the EER
model.

Step-by-step solution

Step 1 of 3

The list of the four abstraction concepts relevant to the EER (Enhanced Entity-Relationship)
model is as follows:

• Classification and instantiation

• Identification

• Specialization and generalization

• Aggregation and association

Step 2 of 3

Classification and instantiation

• Classification is used to assign similar entities or objects to an entity type or object type.

• Instantiation is the opposite of classification; it refers to the specific examination of distinct
objects of a class.

Identification

• Identification is the abstraction process whereby classes and objects are made uniquely
identifiable by means of some identifier.

• Identification is needed at two levels:

o To tell the difference between classes and objects.

o To identify database objects and to relate them to their real-world counterparts.

Specialization and generalization

• Specialization is used to categorize a class of objects into subclasses.

• Generalization is the opposite of specialization; it is used to combine several classes into a
higher-level class.

Aggregation and association

• Aggregation is used to build composite objects from their component objects.

• Association is used to associate objects from several independent classes.

Step 3 of 3

The following are the modeling concepts of the EER model:

• The modeling concepts of the EER model include all the modeling concepts of the ER model. In
addition, the EER model contains the subclass and superclass concepts, which correspond to the
abstraction concepts of specialization and generalization.

• Another modeling concept in the EER model is the category or union type, for which there is no
standard corresponding terminology among the abstraction concepts.
Chapter 4, Problem 13RQ

Problem

What aggregation feature is missing from the EER model? How can the EER model be further
enhanced to support it?

Step-by-step solution

Step 1 of 2

Missing feature:

The EER (Enhanced Entity Relationship) model lacks an explicit aggregation construct: the
possibility of combining objects that are related by a specific relationship instance into a
higher-level aggregate object.

• This is sometimes useful because the higher-level aggregate object may itself need to be
related to some other object.

• This type of relationship between a primitive object and an aggregate object is referred to as
IS-A-PART-OF, and its inverse is called IS-A-COMPONENT-OF.

Step 2 of 2

Enhancement:

The EER model can be further enhanced to support this feature by representing the aggregate
correctly as an additional entity type, which is related to its component entity types.
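A small sketch of this enhancement in SQL terms (the CAR/ENGINE/WARRANTY example and all
names are assumptions for the sketch): the aggregate is created as an additional entity type of
its own, which its components reference and which can itself participate in relationships:

CREATE TABLE CAR (Car_id INT PRIMARY KEY);       -- the aggregate object

CREATE TABLE ENGINE (
    Engine_id INT PRIMARY KEY,
    Car_id    INT REFERENCES CAR(Car_id)         -- IS-A-PART-OF the aggregate
);

-- The higher-level aggregate can now be related to other objects as a
-- whole, e.g. a warranty covering the entire car rather than one component.
CREATE TABLE WARRANTY (
    Warranty_id INT PRIMARY KEY,
    Car_id      INT REFERENCES CAR(Car_id)
);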

Chapter 4, Problem 14RQ

Problem

What are the main similarities and differences between conceptual database modeling
techniques and knowledge representation techniques?

Step-by-step solution

Step 1 of 2

Major similarities and differences between conceptual database modeling techniques and
knowledge representation techniques:

1. Both the disciplines use an abstraction process to identify common properties and important
aspects of objects in the miniworld while suppressing insignificant differences and unimportant
details.

2. Both disciplines provide concepts, constraints, operations, and languages for defining data
and representing knowledge.

3. KR is generally broader in scope than semantic data models. Different forms of knowledge,
such as rules, incomplete and default knowledge, temporal and spatial knowledge, are
represented in KR schemes.

Step 2 of 2

4. KR schemes include reasoning mechanisms that can deduce additional facts from those
stored in a database. Hence, whereas most current database systems are limited to answering
direct queries, knowledge-based systems using KR schemes can answer queries that involve
inferences over the stored data.

5. Whereas most data models concentrate on the representation of database schemas, or meta-
knowledge, KR schemes often mix up the schemas with the instances themselves in order to
provide flexibility in representing exceptions. This often leads to inefficiencies when KR schemes
are implemented, in comparison to databases, especially when a large amount of data needs to
be stored.

Chapter 4, Problem 15RQ

Problem

Discuss the similarities and differences between an ontology and a database schema.

Step-by-step solution

Step 1 of 1

A similarity is that both an ontology and a database schema describe a set of concepts, together
with their properties and the relationships among them. The difference between an ontology and
a database schema is that the schema is usually limited to describing a small subset of a
miniworld from reality in order to store and manage data, whereas an ontology is usually
considered to be more general: it attempts to describe a part of reality or a domain of interest
(e.g., medical terms, electronic-commerce applications) as completely as possible.

Chapter 4, Problem 16E

Problem

Design an EER schema for a database application that you are interested in. Specify all
constraints that should hold on the database. Make sure that the schema has at least five entity
types, four relationship types, a weak entity type, a superclass/subclass relationship, a category,
and an n-ary (n > 2) relationship type.

Step-by-step solution

Step 1 of 2

Step 2 of 2

Here, the weak entity type INTERVIEW has a ternary identifying relationship involving
JOB_OFFER, CANDIDATE, and EMPLOYER. An interview is related to the candidate who gives it,
the employer that conducts it, and the job offer for which it is taken.

An employer can be a government organization or a private firm, and hires for a department to
which a candidate can apply or in which the candidate wants to work.

A candidate can be a fresher or may have some work experience.
Chapter 4, Problem 17E

Problem

Consider the BANK ER schema in Figure, and suppose that it is necessary to keep track of
different types of ACCOUNTS (SAVINGS_ACCTS, CHECKING_ACCTS, …) and LOANS
(CAR_LOANS, HOME_LOANS, …). Suppose that it is also desirable to keep track of each
ACCOUNT’S TRANSACTIONS (deposits, withdrawals, checks, …) and each LOAN's
PAYMENTS; both of these include the amount, date, and time. Modify the BANK schema, using
ER and EER concepts of specialization and generalization. State any assumptions you make
about the additional requirements.

An ER diagram for an AIRLINE database schema

Step-by-step solution

Step 1 of 2

The following are the assumptions:

• There are only three types of accounts: SAVINGS, CURRENT, and CHECKING accounts.

• There are only three types of loans: CAR loans, HOME loans, and PERSONAL loans.

• Each user can perform any number of transactions on an account.

• A loan can be repaid in any number of payments.

• Each transaction and each payment has a unique id.
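A minimal SQL sketch consistent with these assumptions (the column names and types are
assumptions made for this sketch; the transaction table is renamed because TRANSACTION is a
reserved word in SQL):

CREATE TABLE ACCOUNT (
    Acct_no   INT PRIMARY KEY,
    Balance   DECIMAL(12,2),
    Acct_type VARCHAR(10) NOT NULL                 -- total, disjoint specialization
        CHECK (Acct_type IN ('SAVINGS', 'CURRENT', 'CHECKING'))
);

CREATE TABLE SAVINGS_ACCT (
    Acct_no       INT PRIMARY KEY REFERENCES ACCOUNT(Acct_no),
    Interest_rate DECIMAL(4,2)
);

-- Each transaction has a unique id (per the assumptions) and records the
-- amount, date, and time required by the problem statement.
CREATE TABLE ACCT_TRANSACTION (
    Trans_id INT PRIMARY KEY,
    Acct_no  INT NOT NULL REFERENCES ACCOUNT(Acct_no),
    Amount   DECIMAL(12,2),
    Tdate    DATE,
    Ttime    TIME
);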

Step 2 of 2

The modified enhanced entity relationship diagram is as follows:


Chapter 4, Problem 18E

Problem

The following narrative describes a simplified version of the organization of Olympic facilities
planned for the summer Olympics. Draw an EER diagram that shows the entity types, attributes,
relationships, and specializations for this application. State any assumptions you make. The
Olympic facilities are divided into sports complexes. Sports complexes are divided into one-sport
and multisport types. Multisport complexes have areas of the complex designated for each sport
with a location indicator (e.g., center, NE corner, and so on). A complex has a location, chief
organizing individual, total occupied area, and so on. Each complex holds a series of events
(e.g., the track stadium may hold many different races). For each event there is a planned date,
duration, number of participants, number of officials, and so on. A roster of all officials will be
maintained together with the list of events each official will be involved in. Different equipment is
needed for the events (e.g., goal posts, poles, parallel bars) as well as for maintenance. The two
types of facilities (one-sport and multisport) will have different types of information. For each type,
the number of facilities needed is kept, together with an approximate budget.

Step-by-step solution

Step 1 of 3

In the EER diagram,

• “Rectangle box” denotes entity.

• “Diamond-shaped” symbol represents the relationship.

• “Oval” symbol connected with attribute represents the attribute.

Step 2 of 3

The following is the EER diagram for the organization of Olympic facilities planned for the
summer Olympics.

Step 3 of 3

Explanation:

• The Olympic facilities are divided into sports complexes. The sports complexes are divided into
one-sport and multisport types.

• There exists a HOLDS relationship between the Complex and Event entities: a complex holds a
number of events.

• Officials are assigned to each event.

• Both complexes and events have equipment: a complex maintains maintenance equipment and
an event has event equipment.

Chapter 4, Problem 19E

Problem

Identify all the important concepts represented in the library database case study described
below. In particular, identify the abstractions of classification (entity types and relationship types),
aggregation, identification, and specialization/generalization. Specify (min, max) cardinality
constraints whenever possible. List details that will affect the eventual design but that have no
bearing on the conceptual design. List the semantic constraints separately. Draw an EER
diagram of the library database.

Case Study: The Georgia Tech Library (GTL) has approximately 16,000 members, 100,000
titles, and 250,000 volumes (an average of 2.5 copies per book). About 10% of the volumes are
out on loan at any one time. The librarians ensure that the books that members want to borrow
are available when the members want to borrow them. Also, the librarians must know how many
copies of each book are in the library or out on loan at any given time. A catalog of books is
available online that lists books by author, title, and subject area. For each title in the library, a
book description is kept in the catalog; the description ranges from one sentence to several
pages. The reference librarians want to be able to access this description when members
request information about a book. Library staff includes chief librarian, departmental associate
librarians, reference librarians, check-out staff, and library assistants.

Books can be checked out for 21 days. Members are allowed to have only five books out at a
time. Members usually return books within three to four weeks. Most members know that they
have one week of grace before a notice is sent to them, so they try to return books before the
grace period ends. About 5% of the members have to be sent reminders to return books. Most
overdue books are returned within a month of the due date. Approximately 5% of the overdue
books are either kept or never returned. The most active members of the library are defined as
those who borrow books at least ten times during the year. The top 1% of membership does 15%
of the borrowing, and the top 10% of the membership does 40% of the borrowing. About 20% of
the members are totally inactive in that they are members who never borrow.

To become a member of the library, applicants fill out a form including their SSN, campus and
home mailing addresses, and phone numbers. The librarians issue a numbered, machine-
readable card with the member's photo on it. This card is good for four years. A month before a
card expires, a notice is sent to a member for renewal. Professors at the institute are considered
automatic members. When a new faculty member joins the institute, his or her information is
pulled from the employee records and a library card is mailed to his or her campus address.
Professors are allowed to check out books for three-month intervals and have a two-week grace
period. Renewal notices to professors are sent to their campus address.

The library does not lend some books, such as reference books, rare books, and maps. The
librarians must differentiate between books that can be lent and those that cannot be lent. In
addition, the librarians have a list of some books they are interested in acquiring but cannot
obtain, such as rare or out-of-print books and books that were lost or destroyed but have not
been replaced. The librarians must have a system that keeps track of books that cannot be lent
as well as books that they are interested in acquiring. Some books may have the same title;
therefore, the title cannot be used as a means of identification. Every book is identified by its
International Standard Book Number (ISBN), a unique international code assigned to all books.
Two books with the same title can have different ISBNs if they are in different languages or have
different bindings (hardcover or softcover). Editions of the same book have different ISBNs.

The proposed database system must be designed to keep track of the members, the books, the
catalog, and the borrowing activity.

Step-by-step solution

Step 1 of 2

Entity Types:

1. LIBRARY_MEMBER

2. BOOK

3. STAFF_MEMBER

Relationship types:

1. ISSUE_CARD

2. ISSUE_NOTICE

3. ISSUE_BOOK

4. GET_DESCRIPTION

Aggregation:

1. All entity types are aggregations of their constituent attributes, as can be seen from the EER
diagram.

2. Relationship types that have member attributes (see figure) are also aggregations.

Identification:

1. All entity types and relationship types are identified by their names.

2. The entities of each entity type are identified by:

a. LIBRARY_MEMBER: Ssn

b. BOOK: Key(Title, Bind, Language, ISBN)

c. STAFF_MEMBER: Ssn

Specialization/ generalization:

1. Specialization of STAFF_MEMBER on the basis of Designation. This is a partial disjoint
specialization.

2. Specialization of BOOK on the basis of In_Library. This is a total disjoint specialization.

3. Specialization of IN_LIBRARY_BOOK on the basis of Can_be_rented. This is a total disjoint
specialization.

Other constraints that may arise in the future:

1. Fine that will be charged for a lost card.

2. Expiry period of a lost card.

Step 2 of 2

3. Privileges that may be entitled to a particular group of users.

4. Book description might change with new issues.

5. Fine that will be charged for a damaged book.

Chapter 4, Problem 20E

Problem

Design a database to keep track of information for an art museum. Assume that the following
requirements were collected:

■ The museum has a collection of ART_OBJECTS. Each ART_OBJECT has a unique Id_no, an
Artist (if known), a Year (when it was created, if known), a Title, and a Description. The art
objects are categorized in several ways, as discussed below.

■ ART_OBJECTS are categorized based on their type. There are three main types—PAINTING,
SCULPTURE, and STATUE—plus another type called OTHER to accommodate objects that do
not fall into one of the three main types.

■ A PAINTING has a Paint_type (oil, watercolor, etc.), material on which it is Drawn_on (paper,
canvas, wood, etc.), and Style (modern, abstract, etc.).

■ A SCULPTURE or a statue has a Material from which it was created (wood, stone, etc.),
Height, Weight, and Style.

■ An art object in the OTHER category has a Type (print, photo, etc.) and Style.

■ ART_OBJECTs are categorized as either PERMANENT_COLLECTION (objects that are
owned by the museum) or BORROWED. Information captured about objects in the
PERMANENT_COLLECTION includes Date_acquired, Status (on display, on loan, or stored), and
Cost. Information captured about BORROWED objects includes the Collection from which it was
borrowed, Date_borrowed, and Date_returned.

■ Information describing the country or culture of Origin (Italian, Egyptian, American, Indian, and
so forth) and Epoch (Renaissance, Modern, Ancient, and so forth) is captured for each
ART_OBJECT.

■ The museum keeps track of ARTIST information, if known: Name, Date_born (if known),
Date_died (if not living), Country_of_origin, Epoch, Main_style, and Description. The Name is
assumed to be unique.

■ Different EXHIBITIONS occur, each having a Name, Start_date, and End_date. EXHIBITIONS
are related to all the art objects that were on display during the exhibition.

■ Information is kept on other COLLECTIONS with which the museum interacts; this information
includes Name (unique), Type (museum, personal, etc.), Description, Address, Phone, and
current Contact_person.

Draw an EER schema diagram for this application. Discuss any assumptions you make, and then
justify your EER design choices.

Step-by-step solution

Step 1 of 2

Consider the following museum database to create the ER diagram:

The following are the assumptions:

• An ARTIST can create any number of ART_OBJECTS.

• ART_OBJECT will be displayed in the exhibition.

• Many ART_OBJECTS can be displayed in many EXHIBITIONS.

Step 2 of 2

The EER schema diagram for the art museum database is as follows:
Chapter 4, Problem 21E

Problem

Figure shows an example of an EER diagram for a small-private-airport database; the database
is used to keep track of airplanes, their owners, airport employees, and pilots. From the
requirements for this database, the following information was collected: Each AIRPLANE has a
registration number [Reg#], is of a particular plane type [OF_TYPE], and is stored in a particular
hangar [STORED_IN]. Each PLANE_TYPE has a model number [Model], a capacity [Capacity],
and a weight [Weight]. Each HANGAR has a number [Number], a capacity [Capacity], and a
location [Location]. The database also keeps track of the OWNERs of each plane [OWNS] and
the EMPLOYEES who have maintained the plane [MAINTAIN]. Each relationship instance in
OWNS relates an AIRPLANE to an OWNER and includes the purchase date [Pdate]. Each
relationship instance in MAINTAIN relates an EMPLOYEE to a service record [SERVICE]. Each
plane undergoes service many times; hence, it is related by [PLANE_SERVICE] to a number of
SERVICE records. A SERVICE record includes as attributes the date of maintenance [Date], the
number of hours spent on the work [Hours], and the type of work done [Work_code]. We use a
weak entity type [SERVICE] to represent airplane service, because the airplane registration
number is used to identify a service record. An OWNER is either a person or a corporation.
Hence, we use a union type (category) [OWNER] that is a subset of the union of corporation
[CORPORATION] and person [PERSON] entity types. Both pilots [PILOT] and employees
[EMPLOYEE] are subclasses of PERSON. Each PILOT has specific attributes license number
[Lic_num] and restrictions [Restr]; each EMPLOYEE has specific attributes salary [Salary] and
shift worked [Shift]. All PERSON entities in the database have data kept on their Social Security
number [Ssn], name [Name], address [Address], and telephone number [Phone]. For
CORPORATION entities, the data kept includes name [Name], address [Address], and telephone
number [Phone]. The database also keeps track of the types of planes each pilot is authorized to
fly [FLIES] and the types of planes each employee can do maintenance work on [WORKS_ON].
Show how the SMALL_AIRPORT EER schema in Figure 4.12 may be represented in UML
notation. (Note: We have not discussed how to represent categories (union types) in UML, so
you do not have to map the categories in this and the following question.)

EER schema for a SMALL_AIRPORT database.

Step-by-step solution

Step 1 of 2

Consider the EER schema for a SMALL_AIRPORT database. The following is the UML diagram
that represents the SMALL_AIRPORT database.

Step 2 of 2
Each entity and relationships are shown in the UML diagram. In the provided EER diagram, there
is a union type (category) specified for OWNER. The OWNER is a subset of the union of
CORPORATION and PERSON. The categories are not mapped in the UML as specified.

Chapter 4, Problem 22E

Problem

Show how the UNIVERSITY EER schema in Figure 4.9 may be represented in UML notation.

Step-by-step solution

Step 1 of 2

• An entity relationship diagram is a diagram that represents the relationships between different
entities and their attributes.

The entities can be people, objects, etc.

• UML refers to the Unified Modeling Language, which is a language used to model systems in
software engineering.

It is very helpful for understanding the design of a system.

Step 2 of 2

For the given ER diagram, the UML diagram is shown below:


Chapter 4, Problem 23E

Problem

Consider the entity sets and attributes shown in the following table. Place a checkmark in one
column in each row to indicate the relationship between the far left and far right columns.

a. The left side has a relationship with the right side.

b. The right side is an attribute of the left side.

c. The left side is a specialization of the right side.

d. The left side is a generalization of the right side.

(The four checkmark columns, one per row, are: (a) Has a Relationship with, (b) Has an Entity
Attribute that is, (c) Is a Specialization of, (d) Is a Generalization of.)

     Entity Set           Entity Set or Attribute
1.   MOTHER               PERSON
2.   DAUGHTER             MOTHER
3.   STUDENT              PERSON
4.   STUDENT              Student_id
5.   SCHOOL               STUDENT
6.   SCHOOL               CLASSROOM
7.   ANIMAL               HORSE
8.   HORSE                Breed
9.   HORSE                Age
10.  EMPLOYEE             SSN
11.  FURNITURE            CHAIR
12.  CHAIR                Weight
13.  HUMAN                WOMAN
14.  SOLDIER              PERSON
15.  ENEMY_COMBATANT      PERSON

Step-by-step solution

Step 1 of 2

Relationship between Entity Sets and Attributes

Specialization: Specialization is the process of classifying a class of objects into more
specialized subclasses. For example, consider a “PERSON” class; we can classify the objects of
this class into more specialized subclasses like MOTHER, STUDENT, SOLDIER, and so on.

Generalization: Generalization is a relationship in which the child class is based on the parent
class. Both child and parent class elements in a generalization relationship must be of the same
type.

Aggregation: It specifies a whole/part relationship between the aggregate (whole) and a
component part. When a class is formed as a collection of other classes, it is called an
aggregation relationship between these classes. It is also called a “has a” relationship.

Inheritance: A child class derives its properties from its parent class. It is also called an “is a”
relationship.

Step 2 of 2

Consider the entity sets and attributes, and apply one of the relationships to each row.

Entity Sets and Attributes Relationship Table

Chapter 4, Problem 24E

Problem

Draw a UML diagram for storing a played game of chess in a database. You may look at
http://www.chessgames.com for an application similar to what you are designing. State clearly
any assumptions you make in your UML diagram. A sample of assumptions you can make about
the scope is as follows:

1. The game of chess is played between two players.

2. The game is played on an 8 x 8 board like the one shown below:

3. The players are assigned a color of black or white at the start of the game.

4. Each player starts with the following pieces (traditionally called chessmen):

a. king

b. queen

c. 2 rooks

d. 2 bishops

e. 2 knights

f. 8 pawns

5. Every piece has its own initial position.

6. Every piece has its own set of legal moves based on the state of the game. You do not need to
worry about which moves are or are not legal except for the following issues:

a. A piece may move to an empty square or capture an opposing piece.

b. If a piece is captured, it is removed from the board.

c. If a pawn moves to the last row, it is “promoted” by converting it to another piece (queen, rook,
bishop, or knight).

Note: Some of these functions may be spread over multiple classes.

Step-by-step solution

Step 1 of 1

Assumptions:

1. In any move, at most two pieces can be affected.

2. A player can promote a piece.

3. A pawn that moves to the last row gets promoted by being converted to another piece.

4. After a move, a captured piece is removed from the board.

Chapter 4, Problem 25E

Problem

Draw an EER diagram for a game of chess as described in Exercise. Focus on persistent
storage aspects of the system. For example, the system would need to retrieve all the moves of
every game played in sequential order.

Exercise

Draw a UML diagram for storing a played game of chess in a database. You may look at
http://www.chessgames.com for an application similar to what you are designing. State clearly
any assumptions you make in your UML diagram. A sample of assumptions you can make about
the scope is as follows:

1. The game of chess is played between two players.

2. The game is played on an 8 x 8 board like the one shown below:

3. The players are assigned a color of black or white at the start of the game.

4. Each player starts with the following pieces (traditionally called chessmen):

a. king

b. queen

c. 2 rooks

d. 2 bishops

e. 2 knights

f. 8 pawns

5. Every piece has its own initial position.

6. Every piece has its own set of legal moves based on the state of the game. You do not need to
worry about which moves are or are not legal except for the following issues:

a. A piece may move to an empty square or capture an opposing piece.

b. If a piece is captured, it is removed from the board.

c. If a pawn moves to the last row, it is “promoted” by converting it to another piece (queen, rook,
bishop, or knight).

Note: Some of these functions may be spread over multiple classes.

Step-by-step solution

Step 1 of 1

EER diagram for chess game

An Enhanced Entity Relationship diagram adds the concepts of superclass and subclass entity
types to the ER model.

Here, the main entity types are PLAYER, MOVE, and PIECE, with attributes such as Name, Color,
Cur_position, Initial_position, Piece_name, Position_before_move, and Changed_position.

Sequence order for game play:

Step 1: A PLAYER makes the first move.

Step 2: A PIECE gets moved, and the turn passes to the other player.

Step 3: The other PLAYER takes the turn and makes the next move.

Step 4: PIECEs change position in response to each PLAYER's move.

Step 5: This process continues until the end of the game.
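A minimal SQL sketch of the persistent-storage aspect stressed by the problem, namely
retrieving all the moves of a game in sequential order (all table and column names are
assumptions for this sketch):

CREATE TABLE GAME (
    Game_id      INT PRIMARY KEY,
    White_player VARCHAR(40),
    Black_player VARCHAR(40)
);

CREATE TABLE MOVE (
    Game_id     INT REFERENCES GAME(Game_id),
    Move_no     INT,                 -- 1, 2, 3, ... within one game
    Piece_name  VARCHAR(10),
    From_square CHAR(2),             -- e.g. 'e2'
    To_square   CHAR(2),             -- e.g. 'e4'
    PRIMARY KEY (Game_id, Move_no)   -- a move is identified through its game
);

-- Retrieve every move of game 7 in the order in which it was played:
-- SELECT * FROM MOVE WHERE Game_id = 7 ORDER BY Move_no;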

Chapter 4, Problem 26E

Problem

Which of the following EER diagrams is/are incorrect and why? State clearly any assumptions
you make.

a.

b.

c.

Step-by-step solution

Step 1 of 3

a.

The given EER diagram is correct.

• E is a superclass, and E1 and E2 are subclasses of the entity type E.

• E1 and E2 are overlapping subclasses of E. This indicates that an entity of E may be a member
of E1 or E2 or both.

• There exists a one-to-many relationship R between E2 and E3.

Step 2 of 3

b.

The given EER diagram is correct.

• E is a superclass, and E1 and E2 are subclasses of the entity type E.

• E1 and E2 are disjoint subclasses of E. This indicates that an entity of E may be a member of at
most one of E1 and E2.

• There exists a one-to-one relationship R between E1 and E2.

Step 3 of 3

c.

The given EER diagram is incorrect.

• E1 and E3 are overlapping subclasses of some entity type E. This indicates that an entity of E
may be a member of E1 or E3 or both.

• The overlapping subclasses E1 and E3 cannot share the relationship R as drawn, so there
cannot be a many-to-many relationship R between E1 and E3.

Hence, the given EER diagram is not possible.

Chapter 4, Problem 27E

Problem

Consider the following EER diagram that describes the computer systems at a company. Provide
your own attributes and key for each entity type. Supply max cardinality constraints justifying
your choice. Write a complete narrative description of what this EER diagram represents.

Step-by-step solution

Step 1 of 5

S.No Entity Type Attributes Key

1 COMPUTER RAM, ROM, Processor, S_no, Manufacturer, Cost S_no

2 ACCESSORY S_no, cost, type S_no

3 LAPTOP Weight, Screen_size NA

4 DESKTOP Color NA

5 SOFTWARE Lic_no, Cost, Manufacturer, Is_system_software, Year_of_manufacturing, Version, Author Lic_no

6 OPERATING_SYSTEM Name, Size NA

7 COMPONENT Manufacturer, S_no, Cost, Type S_no

8 KEYBOARD Type NA

9 MEMORY Size NA

10 MONITOR Size, Resolution, Type NA

11 MOUSE Type, Is_wired NA

12 SOUND_CARD Type NA

13 VIDEO_CARD Type NA

Step 2 of 5

S.No Relationship name Entity type 1 (min,max) constraint Entity type 2 (min,max) constraint

(The reasons for the constraints are given below the table.)

1 SOLD_WITH COMPUTER (1,1) ACCESSORY (1,N)

2 INSTALLED COMPUTER (1,1) SOFTWARE (1,M)


3 INSTALLED_OS COMPUTER (1,1) OPERATING_SYSTEM (1,N)

4 MEM_OPTIONS LAPTOP (1,1) MEMORY (1,N)

5 OPTIONS DESKTOP (1,1) COMPONENT (1,N)

6 SUPPORTS SOFTWARE (1,N) COMPONENT (1,M)

Step 3 of 5

Since all components and accessories are identified by S_no, and all software copies are
identified by Lic_no, each can belong to a single LAPTOP/DESKTOP/COMPUTER. On the
contrary, a computer can have any number of ACCESSORY/SOFTWARE/OPERATING_SYSTEM/
COMPONENT/MEMORY instances.

A SOFTWARE may need many supporting COMPONENTs, and a COMPONENT can SUPPORT
many SOFTWAREs.

Step 4 of 5

Narrative description: A database is needed to maintain all the computer systems in a company.
Each COMPUTER in the company has a unique S_no. It has a fixed RAM, ROM, Processor,
Manufacturer, and Cost. A COMPUTER can be a LAPTOP or a DESKTOP. Each LAPTOP has a
Screen_size and Weight. Each DESKTOP has a Colour. A COMPUTER has many SOFTWARE
items INSTALLED. Each SOFTWARE has a unique Lic_no; it also has an associated Cost,
Manufacturer, Is_system_software, Year_of_manufacturing, Version, and Author. An
OPERATING_SYSTEM is also software that is related to COMPUTER and has an associated
name and the size of memory it consumes.
Step 5 of 5

With a COMPUTER one can get an ACCESSORY. Each ACCESSORY has a cost, S_no, and type
(audio/video/input/output). ACCESSORY is categorized into KEYBOARD (type), MOUSE (type,
Is_wired), and MONITOR (size, resolution, type).

Associated with DESKTOP and SOFTWARE we have various COMPONENTs (Manufacturer,
S_no, Cost, Type). COMPONENTs are further divided into MEMORY (size), SOUND_CARD
(type), and VIDEO_CARD (type). A LAPTOP can also have memory options (MEM_OPTIONS).
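As a small SQL sketch of two of the tables implied by the constraints above (the column types
are assumptions for this sketch): since each SOFTWARE copy is identified by its Lic_no and is
INSTALLED on exactly one COMPUTER, the 1-side of the relationship can be folded into the
SOFTWARE table:

CREATE TABLE COMPUTER (
    S_no         INT PRIMARY KEY,
    Ram          INT,
    Processor    VARCHAR(30),
    Manufacturer VARCHAR(30),
    Cost         DECIMAL(10,2)
);

CREATE TABLE SOFTWARE (
    Lic_no             INT PRIMARY KEY,
    Cost               DECIMAL(10,2),
    Is_system_software CHAR(1),
    Computer_s_no      INT NOT NULL REFERENCES COMPUTER(S_no)  -- INSTALLED (1,1)
);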

Chapter 4, Problem 29LE

Consider an ONLINE AUCTION database system in which members (buyers and sellers) participate in the
sale of items. The data requirements for this system are summarized as follows: The online site has
members, each of whom is identified by a unique member number and is described by an e-mail address,
name, password, home address, and phone number. A member may be a buyer or a seller. A buyer has a
shipping address recorded in the database. A seller has a bank account number and routing number
recorded in the database. Items are placed by a seller for sale and are identified by a unique item number
assigned by the system. Items are also described by an item title, a description, starting bid price, bidding
increment, the start date of the auction, and the end date of the auction. Items are also categorized based
on a fixed classification hierarchy (for example, a modem may be classified as COMPUTER → HARDWARE
→ MODEM). Buyers make bids for items they are interested in. The bid price and time of bid are recorded.
The bidder at the end of the auction with the highest bid price is declared the winner, and a transaction
between buyer and seller may then proceed. The buyer and seller may record feedback regarding their
completed transaction. Feedback contains a rating of the other party participating in the transaction (1-10)
and a comment.

EER diagram for Online Auction Database

Chapter 4, Problem 30LE
Consider a database system for a baseball organization such as the major leagues. The data requirements
are summarized as follows:
The personnel involved in the league include players, coaches, managers, and umpires. Each is identified
by a unique personnel id. They are also described by their first and last names along with the date and
place of birth.
Players are further described by other attributes such as their batting orientation (left, right, or switch) and
have a lifetime batting average (BA).
Within the players group is a subset of players called pitchers. Pitchers have a lifetime ERA (earned run
average) associated with them.
Teams are uniquely identified by their names. Teams are also described by the city in which they are
located and the division and league in which they play (such as the Central division of the American League).
Teams have one manager, a number of coaches, and a number of players.
Games are played between two teams, with one designated as the home team and the other the visiting
team on a particular date. The score (runs, hits, and errors) is recorded for each team. The team with the
most runs is declared the winner of the game.
With each finished game, a winning pitcher and a losing pitcher are recorded. In case there is a save
awarded, the save pitcher is also recorded.
With each finished game, the number of hits (singles, doubles, triples, and home runs) obtained by each
player is also recorded.
Design an enhanced entity–relationship diagram for the BASEBALL database
Using that EER diagram, model the database in Microsoft Access.
Populate each table with appropriate data.
Your populated Access database with all relationships added (with referential integrity, of course).
If you are a completist, you can find data at ESPN or MLB sites.


The EER model of the Baseball Database is as follows:

Below are the Database tables designed in MS Access for Teams, Managers, Umpires, Players and Pitchers:
For Managers:

For Players:

For Pitchers:

For Umpires:
These all the related tables used to manage the Baseball game with a Master Database as follows:

Master DB: Part1:

Master DB Part2:
Chapter 4, Problem 31LE

Problem

Consider the EER diagram for the UNIVERSITY database shown in Figure 4.9. Enter this design
using a data modeling tool such as ERwin or Rational Rose. Make a list of the differences in
notation between the diagram in the text and the corresponding equivalent diagrammatic notation
you end up using with the tool.

Step-by-step solution

Step 1 of 1

Refer to the figure 4.9 for the EER diagram of the UNIVERSITY database. Use the Rational Rose
tool to create the ER schema for the database as follows:

• In the options available on left, right click on the option Logical view, go to New and select the
option Class Diagram.

• Name the class diagram as UNIVERSITY. Select the option Class available in the toolbar and
then click on empty space of the Class Diagram file. Name the class as FACULTY.

Right click on the class, select the option New Attribute, and name the attribute as Rank.
Similarly, create the other attributes Foffice, Fphone and Salary.

• Similarly create another class GRANT and its attributes Title, No, Agency and St_date.

• Now right click on the attribute No, available on the left under the class GRANT, and select the
option Open Specification. Select the Protected option under Export Control. This will make
the attribute No the primary key.

• Select the option Unidirectional Association from the toolbar, for creating relationships
between the two classes. Now click on the class FACULTY; while holding the click drag the
mouse towards the class GRANT and release the click. This will create the relationship between
the two selected classes.

Name the association as PI. Since the structural constraint in the EER diagram is specified using
cardinality ratio, so specify the structural constraints using the Rational Rose tool as follows:

• Right click on the association close to the class FACULTY and select 1 from the option
Multiplicity.

• Again, right click on the association close to the class GRANT and select n from the option
Multiplicity.

• Similarly, create other classes and their associated attributes. Specify the relationships and
structural constraints between the classes, as mentioned above.

The ER schema may be specified using an alternative diagrammatic notation, that is, a class
diagram, through the use of the Rational Rose tool, as follows:
The list of differences in notation between the EER diagram used in the figure 4.9 and its
equivalent diagrammatic notation, drawn through the Rational Rose tool, are as follows:

• In the EER diagram the entities are specified in a rectangle. However, the class diagram in
Rational Rose makes use of top section of the class diagram for specifying the entities.

• The attributes are specified in the EER diagram using the oval. The class diagram in the
Rational Rose makes use of the middle section, for specifying the attributes.

• The primary keys in the EER diagram are specified by underlining the attribute in an oval.

An attribute can be made a primary key in the class diagram in the Rational Rose by selecting
the option Open Specification; followed by selecting the Protected option under Export
Control. A yellow color key against the attribute in the class diagram in the Rational Rose
indicates primary key.

• The relationship between two entities is specified in the diamond shaped box. For example, in
figure 4.9 PI is the relationship between FACULTY and GRANT.

The class diagram in Rational Rose makes use of option Unidirectional Association for
specifying the relation or association between two entities. For example, in the above class
diagram, the association named PI is specified on the line joining the two entities.

• The structural constraint in the EER diagram is specified using cardinality ratio. For example, in
the PI relationship, FACULTY: GRANT is of cardinality ratio 1:N.

In the class diagram made using Rational Rose, the Multiplicity option is used for specifying the
cardinality ratio.

Chapter 4, Problem 32LE

Problem

Consider the EER diagram for the small AIRPORTdatabase shown in Figure. Build this design
using a data modeling tool such as ERwin or Rational Rose. Be careful how you model the
category OWNER in this diagram. (Hint: Consider using CORPORATION_IS_OWNER and
PERSON_IS_OWNER as two distinct relationship types.)

EER schema for a SMALL_AIRPORT database.

Step-by-step solution

Step 1 of 2

Refer to the figure 4.12 for the EER schema of the SMALL_AIRPORT database. Use the Rational
Rose tool to create the EER schema for the database as follows:

• In the options available on left, right click on the option Logical view, go to New and select the
option Class Diagram.

• Name the class diagram as SMALL_AIRPORT. Select the option Class available in the toolbar
and then click on empty space of the Class Diagram file. Name the class as PLANE_TYPE.

Right click on the class, select the option New Attribute, and name the attribute as Model.
Similarly, create the other attributes Capacity and Weight.

• Now right click on the attribute Model, available on the left under the class PLANE_TYPE, and
select the option Open Specification. Select the Protected option under Export Control. This
will make Model the primary key.

• Similarly create another class EMPLOYEE and its attribute Salary and Shift.

• Select the option Unidirectional Association from the toolbar for creating relationships
between two classes. Click on the class PLANE_TYPE and, holding the mouse button down, drag
towards the class EMPLOYEE and release. This will create the relationship
between the two selected classes.

Name the association WORKS_ON. Since the structural constraint in the EER diagram is
specified using a cardinality ratio, specify the structural constraints using the Rational Rose tool
as follows:

• Right click on the association close to the class PLANE_TYPE and select n from the option
Multiplicity.

• Again, right click on the association close to the class EMPLOYEE and select n from the option
Multiplicity.

• Similarly, create other classes and their associated attributes. Specify the relationships and
structural constraints between the classes, as mentioned above.

The ER schema may thus be specified using the alternative class-diagram notation,
through the use of the Rational Rose tool.

Step 2 of 2

In the above class diagram, OWNER is the superclass, and PERSON and CORPORATION are
the subclasses. The subclasses can further participate in specific relationship types.

For example, in the above class diagram the PERSON subclass participates in the
OWNER_TYPE relationship. The subclass PERSON is further related to an entity type
PERSON_IS_OWNER via the OWNER_TYPE relationship.

Similarly, the subclass CORPORATION is related to CORPORATION_IS_OWNER via the
OWNER_TYPE relationship.

The relationship types can be specified using the Rational Rose as follows:

• Create the subclass PERSON_IS_OWNER of the class PERSON as explained above. Also
create the association between the class PERSON and its subclass PERSON_IS_OWNER and
name it as OWNER_TYPE, as explained above.

• Similarly, create the subclass CORPORATION_IS_OWNER of the class CORPORATION and
name the association between them as OWNER_TYPE.

Chapter 4, Problem 33LE

Problem

Consider the UNIVERSITY database described in Exercise 3.16. You already developed an ER
schema for this database using a data modeling tool such as ERwin or Rational Rose in Lab
Exercise 3.31. Modify this diagram by classifying COURSES as either
UNDERGRAD_COURSES or GRAD_COURSES and INSTRUCTORS as either
JUNIOR_PROFESSORS or SENIOR_PROFESSORS. Include appropriate attributes for these
new entity types. Then establish relationships indicating that junior instructors teach
undergraduate courses whereas senior instructors teach graduate courses.

Reference Exercise 3.31

Consider the EER diagram for the UNIVERSITY database shown in Figure 4.9. Enter this design
using a data modeling tool such as ERwin or Rational Rose. Make a list of the differences in
notation between the diagram in the text and the corresponding equivalent diagrammatic notation
you end up using with the tool.

Reference Problem 3.16

Which combinations of attributes have to be unique for each individual SECTION entity in the
UNIVERSITY database shown in Figure 3.20 to enforce each of the following miniworld
constraints:

a. During a particular semester and year, only one section can use a particular classroom at a
particular DaysTime value.

b. During a particular semester and year, an instructor can teach only one section at a particular
DaysTime value.

c. During a particular semester and year, the section numbers for sections offered for the same
course must all be different.

Can you think of any other similar constraints?


Step-by-step solution

Step 1 of 1

Refer to the Exercise 3.16 for the UNIVERSITY database and the ER schema developed for this
database through Rational Rose tool. Using Rational Rose, make the required changes and
create the ER schema as follows:

• COURSE is the superclass, and UNDERGRAD_COURSES and GRAD_COURSES are its
subclasses. The subclasses are introduced in the class diagram developed in Lab Exercise 3.31,
using the Rational Rose tool, as follows:

• Consider the class COURSE developed in Exercise 3.31. Select the option Class available in
the toolbar and then click on empty space of the Class Diagram file. Name the subclass as
UNDERGRAD_COURSES.

Right click on the class, select the option New Attribute, and name the attribute as Title.
Similarly, create the other attribute Department.

Similarly, create another subclass GRAD_COURSES of the class COURSE and its attributes
Title and Department.

• Similarly, create the subclasses JUNIOR_PROFESSORS and SENIOR_PROFESSORS of the
superclass INSTRUCTOR. Also create the attributes Specialization, Designation, and
Qualification for these subclasses, as described above.

• The subclass JUNIOR_PROFESSORS is further related to another subclass
UNDERGRAD_COURSES via the TEACHES relationship. Also, the subclass
SENIOR_PROFESSORS is further related to another subclass GRAD_COURSES via the
TEACHES relationship.

The relationship types between the subclass and superclass can be specified using the Rational
Rose as follows:

• Select the option Unidirectional Association from the toolbar, for creating relationships
between the two classes. Now click on the class JUNIOR_PROFESSORS; while holding the
click drag the mouse towards the class UNDERGRAD_COURSES and release the click. This will
create the relationship between the two selected classes.

Name the association as TEACHES.

• Similarly, create the relationship between the classes SENIOR_PROFESSORS and
GRAD_COURSES.

The modified ER schema may thus be specified using the alternative class-diagram notation,
through the use of the Rational Rose tool.

Chapter 5, Problem 1RQ

Problem

Define the following terms as they apply to the relational model of data: domain, attribute, n-
tuple, relation schema, relation state, degree of a relation, relational database schema, and
relational database state.

Step-by-step solution

Step 1 of 7


1. Domain: Domain is a set of atomic (indivisible) values that can appear in a particular column
in a relational schema. A common method of specifying domain is to specify a data type (integer,
character, floating point, etc...) from which the data values forming a domain can be drawn.

For example: Consider a relational schema called Student that may have facts about students
in a particular course. Consider a fact to be name of the student. Name of a student must be a
char string. So we can say domain of name is char string.


Step 2 of 7

2. Attribute: An Attribute is a role played by some domain in the relational schema.

For example: In the relation schema STUDENT, NAME can be one of the attributes of the relation.

NOTATIONS:

• Relation schema: R(A1, A2, ..., An)

• Attributes: A1, A2, ..., An

• Domain of attribute A1: dom(A1)

• Tuple: t


Step 3 of 7

3. N-tuple: If a relation schema consists of n attributes, i.e., the degree of the relation schema is n,
then an n-tuple is an ordered list of n values that represents a tuple, t = <v1, v2, ..., vn>, where each
value vi, 1 <= i <= n, is an element of dom(Ai) or is a special NULL value.

For example: In a relation schema STUDENT with four attributes Name, Roll No., Class, and Rank,
the n-tuple for a student can be <Ram, 1, Two, 5>, meaning that

student Ram has roll number 1, studies in class two, and got rank 5 in the class.

4. Relation schema: A relation schema is a collection of attributes that define the facts about a
real-world entity, together with a name. In other words, a relation schema R, denoted by
R(A1, A2, ..., An), is made up of a name and a list of attributes A1, A2, ..., An.

For example: STUDENT can be the name of a relation schema, and Name, Roll No., Class, and
Rank can be its four attributes.


Step 4 of 7

5. Relation state: A relation state, r, of a relation schema R(A1, A2, ..., An) is a set of n-tuples. In
other words, a relation state of a relation schema is a collection of tuples, where each tuple
represents information about a single entity.

For example: In the relation schema STUDENT, the collection of the tuples for two students is a
relation state.

Formal Definition: A relation state, r(R), is a mathematical relation of degree n on the domains
of all attributes, which is a subset of the cartesian product of the domains that define R:

r(R) ⊆ (dom(A1) × dom(A2) × ... × dom(An))


Step 5 of 7

6. Degree of a Relation: The degree (or arity) of a relation is the number of attributes n of its
relational schema.

Step 6 of 7

7. Relational database schema: A relational database schema S is a set of relation schemas
S = {R1, R2, ..., Rn} and a set of integrity constraints IC.


Step 7 of 7

8. Relational database state: A relational database state DB of S is a set of relation states
DB = {r1, r2, ..., rn}, such that each ri is a state of Ri and such that the ri relation states satisfy
the integrity constraints specified in IC.

Chapter 5, Problem 2RQ

Problem

Why are tuples in a relation not ordered?

Step-by-step solution

Step 1 of 2

A relation in database management is defined as a set of tuples.

And mathematically, the elements of a set have no order among them.


Step 2 of 2

Hence, the tuples in a relation are not ordered.

Chapter 5, Problem 3RQ

Problem

Why are duplicate tuples not allowed in a relation?

Step-by-step solution

Step 1 of 1

Duplicate tuples are not allowed in a relation because they violate the relational integrity constraints.

• A key constraint states that there must be an attribute or combination of attributes in a relation
whose values are unique.

• No two tuples in a relation may have the same values for all their attributes.

• If a relation contains duplicate tuples, it violates the key constraint.

Hence, duplicate tuples are not allowed in a relation.

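As a minimal sketch of how a DBMS enforces this rule (the table and column names here are illustrative, not from the text), declaring a primary key makes the DBMS reject a duplicate insertion:

-- Ssn is declared as the primary key, so duplicate tuples are rejected.
CREATE TABLE EMP (
    Ssn  CHAR(9) NOT NULL,
    Name VARCHAR(30),
    PRIMARY KEY (Ssn)
);

INSERT INTO EMP VALUES ('123456789', 'John');  -- accepted
INSERT INTO EMP VALUES ('123456789', 'John');  -- rejected: duplicate primary key value
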
Chapter 5, Problem 4RQ

Problem

What is the difference between a key and a superkey?

Step-by-step solution

Step 1 of 2

A super key SK is a set of attributes that uniquely identifies the tuples of a relation. It satisfies the
uniqueness constraint.

A key K is an attribute or set of attributes that uniquely identifies the tuples of a relation. It is a
minimal superkey. In other words, if any attribute is removed from a key, what remains will no
longer be a superkey.


Step 2 of 2

The differences between a key and a superkey are as follows:

• A superkey satisfies only the uniqueness constraint, whereas a key satisfies both uniqueness and minimality.

• Every key is a superkey, but not every superkey is a key.

• A relation generally has many more superkeys than keys, since adding any attribute to a key yields another superkey.

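For illustration, consider a hypothetical PERSON relation (the names are assumptions, not from the text) in which Ssn alone is unique:

-- {Ssn} is minimal and unique, so it is a key (and therefore also a superkey).
-- {Ssn, Name} is a superkey (it is unique because Ssn is), but it is not a key:
-- removing Name still leaves a superkey, so {Ssn, Name} is not minimal.
CREATE TABLE PERSON (
    Ssn  CHAR(9) NOT NULL,
    Name VARCHAR(30),
    PRIMARY KEY (Ssn)
);
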
Chapter 5, Problem 5RQ

Problem

Why do we designate one of the candidate keys of a relation to be the primary key?

Step-by-step solution

Step 1 of 1

Every relation must contain an attribute or combination of attributes that can be used to uniquely
identify each tuple in the relation.

• An attribute or combination of attributes that can be used to uniquely identify each tuple in a
relation is known as a candidate key.

• A relation can have more than one candidate key.

• Among several candidate keys, one candidate key, which is usually single-attribute and simple, is chosen
as the primary key.

• The primary key is the attribute (or attribute combination) that uniquely identifies each tuple in the relation.

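A sketch in SQL (the table and column names are assumptions): a relation with two candidate keys, where the single and simple Ssn is designated as the primary key, and the remaining candidate key is declared UNIQUE as an alternate key:

CREATE TABLE MEMBER (
    Ssn   CHAR(9)     NOT NULL,  -- candidate key chosen as the primary key
    Email VARCHAR(50) NOT NULL,  -- remaining candidate key (alternate key)
    Name  VARCHAR(30),
    PRIMARY KEY (Ssn),
    UNIQUE (Email)
);
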
Chapter 5, Problem 6RQ

Problem

Discuss the characteristics of relations that make them different from ordinary tables and files.

Step-by-step solution

Step 1 of 2

Tables, relations, and files are closely related concepts. A relation resembles a table, but it has
some added constraints that allow the links between two tables to be used efficiently.

A file is basically a collection of records, or a table stored on a physical device.


Step 2 of 2

Even though both a relation and a table are used to store and represent data, there are differences
between them:

• A relation contains no duplicate tuples, whereas a table (or file) may contain duplicate rows or records.

• The tuples in a relation have no order among them, whereas the rows of a table and the records of a file are physically ordered.

• Each attribute value in a tuple of a relation is atomic, whereas a table or file may hold composite or multivalued entries.

• Every value in a tuple of a relation must come from the domain of its attribute or be NULL, a restriction that ordinary tables and files do not necessarily enforce.

Chapter 5, Problem 7RQ

Problem

Discuss the various reasons that lead to the occurrence of NULL values in relations.

Step-by-step solution

Step 1 of 2

NULL value:

A NULL value represents the absence of data, that is, "nothing", stored as an empty value.

• A NULL value can itself be considered a kind of data.

• It is not the same as "zero", "blank", or "none"; it marks missing information.

For example, suppose a relation records the pen and pencil a student brings to an exam:

• For a student who has neither a pen nor a pencil, the values of those attributes are defined as NULL.

• A NULL value can mean that the value does not exist, that the value is unknown, or that the value is not yet
available.


Step 2 of 2

The occurrence of NULL values in relations:

• An attribute value is marked NULL when the value does not apply to the tuple (for instance, an Apartment_number attribute applies only to addresses in apartment buildings).

• An attribute value is marked NULL when the value exists but is unknown.

• An attribute value is marked NULL when it is not known whether the value exists at all.

• An attribute value may also be marked NULL when the value exists but is not available or recorded at present.

• Since the same NULL can carry these different meanings, different codes are sometimes used to convey them.

• Operations on relations must therefore account for the lack of a value (NULL) wherever it is found.

Chapter 5, Problem 8RQ

Problem

Discuss the entity integrity and referential integrity constraints. Why is each considered
important?

Step-by-step solution

Step 1 of 2

Entity Integrity Constraint: It states that no primary key value can be NULL.

Importance: Primary key values are used to identify a tuple in a relation. Having NULL value for
primary key will mean that we cannot identify some tuples.

Referential integrity constraint: It states that a tuple in one relation that refers to another
relation must refer to an existing tuple in that relation.


Step 2 of 2

Definition using foreign key: For two relation schemas R1 and R2, a set of attributes FK in
relation schema R1 is a foreign key of R1 that references relation R2 if it satisfies the following
conditions:

• The attributes in FK have the same domain(s) as the primary key attributes PK of R2; the attributes FK are
said to reference the relation R2.

• A value of FK in a tuple t1 of the current state r1(R1) either occurs as a value of PK for some
tuple t2 in the current state r2(R2) or is NULL. In the former case (t1[FK] = t2[PK]), tuple t1 is said to
refer to the tuple t2.

When these two conditions hold between R1, the referencing relation, and R2, the referenced
relation, the referential integrity constraint is said to hold.

Importance: Referential Integrity constraints are specified among two relations and are used to
maintain consistency among tuples in two relations.

Chapter 5, Problem 9RQ

Problem

Define foreign key. What is this concept used for?

Step-by-step solution

Step 1 of 2

A foreign key is an attribute (or composite of attributes) in one relation that matches the primary key of
another relation and is used to maintain the relationship between the two relations.

• A relation can have more than one foreign key.

• A foreign key can contain null values.


Step 2 of 2

The concept of foreign key is used to maintain referential integrity constraint between two
relations and hence in maintaining consistency among tuples in two relations.

• The value of a foreign key should match a value of the primary key in the referenced relation (or be NULL).

• A value that does not exist in the primary key of the referenced relation cannot be added to a
foreign key.

• It is not possible to delete a tuple from the referenced relation if there is any matching record in
the referencing relation.

Chapter 5, Problem 10RQ

Problem

What is a transaction? How does it differ from an Update operation?

Step-by-step solution

Step 1 of 2

A transaction is a program in execution that involves various operations that can be done on the
database.

The operations that are included in a transaction are as follows:

• Reading data from the database.

• Deleting a tuple from the database.

• Inserting new tuples into the database.

• Updating values of existing tuples in the database.


Step 2 of 2

The main differences between an update operation and a transaction are as follows:

• In an update operation, only a single attribute value can be changed at one time.

• In a transaction, more than one update operation along with reading data from the database,
insertion and deletion operations can be done.

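A minimal sketch of such a transaction, assuming the COMPANY schema of Figure 5.6 (EMPLOYEE with attributes Ssn and Dno); the DEPT_CHANGE_LOG table is hypothetical, added only for illustration:

START TRANSACTION;
-- read: look up the employee's current department
SELECT Dno FROM EMPLOYEE WHERE Ssn = '123456789';
-- update: move the employee to department 4
UPDATE EMPLOYEE SET Dno = 4 WHERE Ssn = '123456789';
-- insert: record the change in a (hypothetical) log table
INSERT INTO DEPT_CHANGE_LOG VALUES ('123456789', 4);
COMMIT;  -- all of the above succeeds or fails as one unit
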
Chapter 5, Problem 11E

Problem

Suppose that each of the following Update operations is applied directly to the database state
shown in Figure 5.6. Discuss all integrity constraints violated by each operation, if any, and the
different ways of enforcing these constraints.

a. Insert <‘Robert’, ‘F’ ‘Scott’, ‘943775543’, ‘1972-06-21’, ‘2365 Newcastle Rd, Bellaire, TX’, M,
58000, ‘888665555’, 1> into EMPLOYEE.

b. Insert <‘ProductA’, 4, ‘Bellaire’, 2> into PROJECT.

c. Insert <‘Production’, 4, ‘943775543’, ‘2007-10-01’> into DEPARTMENT.

d. Insert <‘677678989’, NULL, ‘40.0’> into WORKS_ON.

e. Insert <‘453453453’, ‘John’, ‘M’, ‘1990-12-12’, ‘spouse’> into DEPENDENT.

f. Delete the WORKS_ON tuples with Essn = ‘333445555’.

g. Delete the EMPLOYEE tuple with Ssn = ‘987654321’.

h. Delete the PROJECT tuple with Pname = ‘ProductX’.

i. Modify the Mgr_ssn and Mgr_start_date of the DEPARTMENT tuple with Dnumber = 5 to
‘123456789’ and ‘2007-10-01’, respectively.

j. Modify the Super_ssn attribute of the EMPLOYEE tuple with Ssn = ‘999887777’ to
‘943775543’.

k. Modify the Hours attribute of the WORKS_ON tuple with Essn = ‘999887777’ and Pno = 10 to
‘5.0’.

Step-by-step solution
Step 1 of 11

(a)

Acceptable operation.


Step 2 of 11

(b)

Not acceptable. Violates the referential integrity constraint, because the value of the department
number (the foreign key Dnum = 2) is not present in the DEPARTMENT relation.

Ways of enforcing it are as follows:

• Reject the operation and explain the reason to the user.

• Insert a NULL value in the department field and perform the operation.

• Prompt the user to insert a department with number 2 into the DEPARTMENT relation first, and then
perform the operation.

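For instance, the second option could be carried out as follows (a sketch assuming the PROJECT attribute order given in the exercise, PROJECT(Pname, Pnumber, Plocation, Dnum), and assuming Dnum permits NULLs):

-- The original insert is rejected because Dnum = 2 has no matching DEPARTMENT tuple.
-- Inserting NULL for Dnum instead satisfies the referential integrity constraint:
INSERT INTO PROJECT (Pname, Pnumber, Plocation, Dnum)
VALUES ('ProductA', 4, 'Bellaire', NULL);
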

Step 3 of 11

(c)

Not acceptable. Violates the key constraint: a department with number 4 already exists. The way of
enforcing it is as follows:

• Reject the operation and explain the reason to the user.


Step 4 of 11

(d)

Not acceptable. Violates the entity integrity constraint and the referential integrity constraint: the value of
one of the primary key attributes (Pno) is NULL, and the value of Essn is not present in the referenced relation,
i.e., EMPLOYEE.

Ways of enforcing it are as follows:

• Reject the operation and explain the reason to the user.

• Prompt the user to specify correct values for the primary key and then perform the operation.


Step 5 of 11

(e)

Acceptable


Step 6 of 11

(f)

Acceptable


Step 7 of 11

(g)

Not acceptable.

Violates the referential integrity constraint, because the value of Ssn is used as a foreign key in the
WORKS_ON, EMPLOYEE (Super_ssn), DEPENDENT, and DEPARTMENT relations; deleting the record with Ssn
= ‘987654321’ will leave dangling references, for example in the WORKS_ON relation.

Ways of enforcing it are as follows:

• Reject the operation and explain the reason to the user.

• Delete the corresponding records in the referencing tables as well (cascade).

Step 8 of 11

(h)

Not acceptable.

Violates the referential integrity constraint, because the value of Pnumber is used as a foreign key in the
WORKS_ON relation. Deleting the record with Pname = ‘ProductX’ deletes the project with
Pnumber = ‘1’, and since this value is used in the WORKS_ON table, the deletion would
violate the referential integrity constraint. Ways of enforcing it are as follows:

• Reject the operation and explain the reason to the user.

• Delete the corresponding records in the referencing tables as well (cascade).


Step 9 of 11

(i)

Acceptable.


Step 10 of 11

(j)

Not acceptable.

Violates the referential integrity constraint, because Super_ssn is also a foreign key referencing the EMPLOYEE
relation itself. Since no employee with Ssn = ‘943775543’ exists, the Super_ssn of an employee cannot
be set to ‘943775543’.

Ways of enforcing it are as follows:

• Reject the operation and explain the reason to the user.

• Prompt the user either to add a record with Ssn = ‘943775543’ to the EMPLOYEE relation first, or to
change the Super_ssn to some valid value.


Step 11 of 11

(k)

Acceptable.

Chapter 5, Problem 12E

Problem

Consider the AIRLINE relational database schema shown in Figure, which describes a database
for airline flight information. Each FLIGHT is identified by a Flight_number, and consists of one or
more FLIGHT_LEGs with Leg_numbers 1, 2, 3, and so on. Each FLIGHT_LEG has scheduled
arrival and departure times, airports, and one or more LEG_INSTANCEs—one for each Date on
which the flight travels. FAREs are kept for each FLIGHT. For each FLIGHT_LEG instance,
SEAT_RESERVATIONs are kept, as are the AIRPLANE used on the leg and the actual arrival
and departure times and airports. An AIRPLANE is identified by an Airplane_id and is of a
particular AIRPLANE_TYPE. CAN_LAND relates AIRPLANE_TYPEs to the AIRPORTs at which
they can land. An AIRPORT is identified by an Airport_code. Consider an update for the AIRLINE
database to enter a reservation on a particular flight or flight leg on a given date.

a. Give the operations for this update.

b. What types of constraints would you expect to check?

c. Which of these constraints are key, entity integrity, and referential integrity constraints, and
which are not?

d. Specify all the referential integrity constraints that hold on the schema shown in Figure.

The AIRLINE relational database schema.

Step-by-step solution

Step 1 of 4

a.

First, it is necessary to check whether seats are available on the particular flight or flight leg on the
given date. This can be done by checking the LEG_INSTANCE relation.

SELECT Number_of_available_seats FROM LEG_INSTANCE

WHERE Flight_number = 'FL01' AND Leg_number = '1' AND Date = '2000-06-07';

If Number_of_available_seats > 0, then perform the following operation to reserve a seat.

INSERT INTO SEAT_RESERVATION VALUES

('FL01', '1', '2000-06-07', '1', 'John','9910110110');


Step 2 of 4

b.

The constraints that need to be checked into to perform the update are as follows:

• Check whether Number_of_available_seats in the LEG_INSTANCE relation for the particular flight on the
particular date is greater than zero.

• Check whether the particular SEAT_NUMBER for the particular flight on the particular date is available or
not.

Step 3 of 4

c.

Checking the Number_of_available_seats in the LEG_INSTANCE relation does not come under
the entity or referential integrity constraints.

Checking for SEAT_NUMBER particular flight on the particular date comes under entity integrity
constraint.


Step 4 of 4

d.

A referential integrity constraint specifies that each value of a foreign key must match a value
of the primary key in the referenced (primary) relation.

The referential integrity constraints that hold are as follows:

• Flight_number of the FLIGHT_LEG relation is a foreign key that references Flight_number of the FLIGHT relation.

• Flight_number of LEG_INSTANCE is a foreign key that references Flight_number of the FLIGHT relation.

• Flight_number of FARE is a foreign key that references Flight_number of the FLIGHT relation.

• Flight_number of SEAT_RESERVATION is a foreign key that references Flight_number of the FLIGHT relation.

• Departure_airport_code and Arrival_airport_code of FLIGHT_LEG are foreign keys that reference Airport_code of the AIRPORT relation.

• Departure_airport_code and Arrival_airport_code of LEG_INSTANCE are foreign keys that reference Airport_code of the AIRPORT relation.

• Airport_code of CAN_LAND is a foreign key that references Airport_code of the AIRPORT relation.

• Flight_number and Leg_number of LEG_INSTANCE are foreign keys that reference Flight_number and Leg_number of FLIGHT_LEG.

• Airplane_id of LEG_INSTANCE is a foreign key that references Airplane_id of the AIRPLANE relation.

• Flight_number, Leg_number, and Date of SEAT_RESERVATION are foreign keys that reference Flight_number, Leg_number, and Date of the LEG_INSTANCE relation.

• Airplane_type_name of CAN_LAND is a foreign key that references Airplane_type_name of the AIRPLANE_TYPE relation.

Chapter 5, Problem 13E

Problem

Consider the relation CLASS(Course#, Univ_Section#, Instructor_name, Semester,


Building_code, Room#, Time_period, Weekdays, Credit_hours). This represents classes taught
in a university, with unique Univ_section#s. Identify what you think should be various candidate
keys, and write in your own words the conditions or assumptions under which each candidate
key would be valid.

Step-by-step solution

Step 1 of 2

The relation CLASS describes the classes taught in a university, with unique Univ_Section# values.

As per the CLASS relation, the following are the possible candidate keys:

1. {Univ_Section#}: if the section number is unique throughout all the semesters.

2. {Instructor_name, Semester}: if at most one course is taught by an instructor in each
semester.

3. {Building_code, Room#, Time_period, Weekdays, Semester}: if, at the same time, for a
specific semester, the same room cannot be used by more than one course.


Step 2 of 2

4. {Course#, Univ_Section#, Semester}: these would be the candidate keys if Univ_Section#
is not unique by itself. In this case, more than one numbering scheme is possible,
depending on the rules the university uses for assigning section numbers.

5. Otherwise, {Univ_Section#, Semester}: if Univ_Section# is unique within each semester, then all the sections
are assigned unique numbers throughout the semester.

Chapter 5, Problem 14E

Problem

Consider the following six relations for an order-processing database application in a company:

CUSTOMER(Cust#, Cname, City)

ORDER(Order#, Odate, Cust#, Ord_amt)

ORDER_ITEM(Order#, Item#, Qty)

ITEM(Item#, Unit_price)

SHIPMENT(Order#, Warehouse#, Ship_date)

WAREHOUSE(Warehouse#, City)

Here, Ord_amt refers to total dollar amount of an order; Odate is the date the order was placed;
and Ship_date is the date an order (or part of an order) is shipped from the warehouse. Assume
that an order can be shipped from several warehouses. Specify the foreign keys for this schema,
stating any assumptions you make. What other constraints can you think of for this database?

Step-by-step solution

Step 1 of 2

Foreign Keys:

a. Cust# of ORDER is FK for CUSTOMER: orders are taken from recognized customers only.

b. Order# of ORDER_ITEM is FK of ORDER.

c. Item# of ORDER_ITEM is FK of ITEM: Orders are taken only for items in stock.

d. Order# of SHIPMENT is FK of ORDER: Shipment is done only for orders taken.

e. Warehouse# of SHIPMENT is FK of WAREHOUSE: shipment is done only from the company's
warehouses.


Step 2 of 2

Other Constraints:

• Ship_date must be greater than (a later date than) Odate in ORDER: an order must be taken before it is
shipped.

• Ord_amt must be greater than Unit_price.

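The first constraint spans two relations, so it cannot be stated as a key or foreign key constraint. A sketch in standard SQL follows (CREATE ASSERTION is part of the SQL standard but is not supported by most DBMSs, which would use triggers instead; ORDER may also need quoting, as it is a reserved word):

CREATE ASSERTION SHIP_AFTER_ORDER
CHECK (NOT EXISTS (
    SELECT *
    FROM SHIPMENT S, ORDER O
    WHERE S.Order# = O.Order#
      AND S.Ship_date < O.Odate  -- no shipment may precede its order date
));
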
Chapter 5, Problem 15E

Problem

Consider the following relations for a database that keeps track of business trips of salespersons
in a sales office:

SALESPERSON(Ssn, Name, Start_year, Dept_no)

TRIP(Ssn, From_city, To_city, Departure_date, Return_date, Trip id)

EXPENSE(Trip id, Account#, Amount)

A trip can be charged to one or more accounts. Specify the foreign keys for this schema, stating
any assumptions you make.

Step-by-step solution

Step 1 of 3

A foreign key is a column (or combination of columns) in one table that matches the primary key of
another table and is used to maintain the relationship between the two tables.

• A foreign key is mainly used for establishing a relationship between two tables.

• A table can have more than one foreign key.


Step 2 of 3

The foreign keys in the given relations are as follows:

• Ssn is a foreign key in TRIP relation. It references the Ssn of SALESPERSON relation.

• Trip_id is a foreign key in EXPENSE relation. It references the Trip_id of TRIP relation.


Step 3 of 3

Assume that there are additional tables that stores the department information and account
details. Then possible foreign keys are as follows:

• Dept_no is a foreign key in SALESPERSON relation.

• Account# is a foreign key in EXPENSE relation.

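A DDL sketch making these foreign keys explicit (the data types are assumptions, Trip_id is assumed to be the primary key of TRIP, and the name Account# simply follows the exercise schema; many DBMSs would require such a name to be quoted):

CREATE TABLE SALESPERSON (
    Ssn        CHAR(9) NOT NULL,
    Name       VARCHAR(30),
    Start_year INT,
    Dept_no    INT,
    PRIMARY KEY (Ssn)
);

CREATE TABLE TRIP (
    Ssn            CHAR(9) NOT NULL,
    From_city      VARCHAR(30),
    To_city        VARCHAR(30),
    Departure_date DATE,
    Return_date    DATE,
    Trip_id        INT NOT NULL,
    PRIMARY KEY (Trip_id),
    FOREIGN KEY (Ssn) REFERENCES SALESPERSON (Ssn)
);

CREATE TABLE EXPENSE (
    Trip_id  INT NOT NULL,
    Account# INT NOT NULL,
    Amount   DECIMAL(10,2),
    PRIMARY KEY (Trip_id, Account#),
    FOREIGN KEY (Trip_id) REFERENCES TRIP (Trip_id)
);
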
Chapter 5, Problem 16E

Problem

Consider the following relations for a database that keeps track of student enrollment in courses
and the books adopted for each course:

STUDENT(Ssn, Name, Major, Bdate)

COURSE(Course#, Cname, Dept)

ENROLL(Ssn, Course#, Quarter, Grade)

BOOK ADOPTION(Course#, Quarter, Book_isbn)

TEXT(Book_isbn, Book_title, Publisher, Author)

Specify the foreign keys for this schema, stating any assumptions you make.

Step-by-step solution

Step 1 of 2

A foreign key is a column (or combination of columns) in one table that matches the primary key of
another table and is used to maintain the relationship between the two tables.

• A foreign key is mainly used for establishing a relationship between two tables.

• A table can have more than one foreign key.


Step 2 of 2

The foreign keys in the given relations are as follows:

• Ssn is a foreign key in the ENROLL table that references Ssn of the STUDENT table. Ssn is the
primary key of the STUDENT table.

• Course# is a foreign key in the ENROLL table that references Course# of the COURSE table.
Course# is the primary key of the COURSE table.

• Course# is a foreign key in the BOOK_ADOPTION table that references Course# of the
COURSE table.

• Book_isbn is a foreign key in the BOOK_ADOPTION table that references Book_isbn of the
TEXT table. Book_isbn is the primary key of the TEXT table.

Chapter 5, Problem 17E

Problem

Consider the following relations for a database that keeps track of automobile sales in a car
dealership (OPTION refers to some optional equipment installed on an automobile):

CAR(Serial no, Model, Manufacturer, Price)

OPTION(Serial_no, Option_name, Price)

SALE(Salesperson_id, Serial_no, Date, Sale_price)

SALESPERSON(Salesperson_id, Name, Phone)

First, specify the foreign keys for this schema, stating any assumptions you make. Next, populate
the relations with a few sample tuples, and then give an example of an insertion in the SALE and
SALESPERSON relations that violates the referential integrity constraints and of another
insertion that does not.

Step-by-step solution

Step 1 of 4

Foreign keys are:

a. Serial_no of OPTION is a FK for CAR: optional equipment can be added only to cars whose serial numbers exist.

b. Serial_no of SALE is a FK for CAR: only a car with a valid serial number can be put up for sale.

c. Salesperson_id of SALE is a FK for SALESPERSON: only a recognized salesperson can sell a car.


Step 2 of 4

Consider a relation schema state:

CAR:

Serial_no  Model  Manufacturer  Price (lakh)
1          1987   Ford          7
2          1998   Tata          4
3          1988   Ferrari       20
4          1952   Ford          2

OPTION:

Serial_no  Option_name  Price
2          Abc          200
4          def          400


Step 3 of 4

SALESPERSON:

Salesperson_id  Name   Phone
Sl1             Ram    9910101010
Sl2             John   9999999999
Sl3             Mario  9090909090

SALE:

Salesperson_id  Serial_no  Date        Sale_price (lakh)
Sl1             1          2000-06-07  7.5
Sl2             2          2000-06-08  4.1


Step 4 of 4

Insertion into SALE that violates the referential integrity constraint:

Insert <'Sl4', '5', '2000-07-07', '21'> into SALE.

Both the Salesperson_id and the Serial_no are invalid: neither 'Sl4' nor a car with serial number 5 exists.

Insertion into SALE that does not violate the referential integrity constraint:

Insert <'Sl1', '4', '2000-09-07', '2.1'> into SALE.

An insertion into SALESPERSON cannot violate the referential integrity constraint. A valid insertion for
SALESPERSON can be:

Insert <'Sl4', 'Jack', '9190000000'> into SALESPERSON.

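Written as SQL INSERT statements against the state above (a sketch; the column order follows the schema):

-- Violates referential integrity: neither salesperson 'Sl4' nor a car with Serial_no 5 exists.
INSERT INTO SALE VALUES ('Sl4', 5, '2000-07-07', 21);

-- Does not violate referential integrity: 'Sl1' and Serial_no 4 both exist.
INSERT INTO SALE VALUES ('Sl1', 4, '2000-09-07', 2.1);

-- An insertion into SALESPERSON cannot violate referential integrity:
INSERT INTO SALESPERSON VALUES ('Sl4', 'Jack', '9190000000');
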
Chapter 5, Problem 18E

Problem

Database design often involves decisions about the storage of attributes. For example, a Social
Security number can be stored as one attribute or split into three attributes (one for each of the
three hyphen-delineated groups of numbers in a Social Security number—XXX-XX-XXXX).
However, Social Security numbers are usually represented as just one attribute. The decision is
based on how the database will be used. This exercise asks you to think about specific situations
where dividing the SSN is useful.

Step-by-step solution

Step 1 of 2

Usually during the database design, the social security number (SSN) is stored as single
attribute.

• SSN is made up of 9 digits divided into three parts.

• The format of SSN is XXX-XX-XXXX.

• Each part is separated by a hyphen.

• The first part represents the area number.

• The second part represents the group number.

• The third part represents the serial number.


Step 2 of 2

The situations where it is preferable to store the SSN as three parts instead of as a single attribute are as
follows:

• Area number determines the location or state. In some cases, it is necessary to group the data
based on the location to generate some statistical information.

• The area code (or city code) is required and sometimes country code is needed for dialing the
international phone numbers.

• Every part has its own independent existence.

Chapter 5, Problem 19E

Problem

Consider a STUDENT relation in a UNIVERSITY database with the following attributes (Name,
Ssn, Local_phone, Address, Cell_phone, Age, Gpa). Note that the cell phone may be from a
different city and state (or province) from the local phone. A possible tuple of the relation is
shown below:

Name: George Shaw William Edwards
Ssn: 123-45-6789
Local_phone: 555-1234
Address: 123 Main St., Anytown, CA 94539
Cell_phone: 555-4321
Age: 19
Gpa: 3.75

a. Identify the critical missing information from the Local_phone and Cell_phone attributes. (Hint:
How do you call someone who lives in a different state or province?)

b. Would you store this additional information in the Local_phone and Cell_phone attributes or
add new attributes to the schema for STUDENT?

c. Consider the Name attribute. What are the advantages and disadvantages of splitting this field
from one attribute into three attributes (first name, middle name, and last name)?

d. What general guideline would you recommend for deciding when to store information in a
single attribute and when to split the information?

e. Suppose the student can have between 0 and 5 phones. Suggest two different designs that
allow this type of information.

Step-by-step solution

Step 1 of 5

a. State, province or city code is missing from phone number information.


Step 2 of 5

b. Since cell phone and local phone can be of different city or state, additional information must
be added in Local_phone and Cell_phone attributes.


Step 3 of 5

c. If Name is split into First_name, Middle_name, and Last_name attributes, there can be the following
advantages:

• Sorting can be done on the basis of first name, last name, or middle name.

Disadvantages:

• By splitting a single attribute into three attributes, NULL values may increase in the database (if
some students do not have a middle name).

• Extra memory will be consumed for storing the NULL values of attributes that may not exist for a
particular student (e.g., the middle name).


Step 4 of 5

d. General guidelines for deciding when to store information in a single attribute:

• When storing the information in different attributes would create NULL values, a single attribute should be
preferred.

• When atomicity cannot be maintained with a single attribute, separate attributes should be used.

• When the information needs to be sorted on the basis of some sub-field of an attribute, or when a
sub-field is needed for decision making, the single attribute should be split into several.

e.

Step 5 of 5

First Design

• STUDENT(Name, Ssn, Phone_number_count, Address, Age, Gpa)

Phone (Ssn, Phone_number)

Second Design:

• STUDENT(Name, Ssn, Phone_number1, Phone_number2, Phone_number3, Phone_number4,


Phone_number5, Address, Age, Gpa)

Although the schema can be designed in either of the two ways, the first design is better than the second
because it leaves a smaller number of NULL values.

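A DDL sketch of the first design (the data types are assumptions); the PHONE table stores zero or more numbers per student without NULLs, and the upper limit of five phones would be enforced with a CHECK constraint or a trigger:

CREATE TABLE STUDENT (
    Ssn                CHAR(9) NOT NULL,
    Name               VARCHAR(30),
    Phone_number_count INT,
    Address            VARCHAR(50),
    Age                INT,
    Gpa                DECIMAL(3,2),
    PRIMARY KEY (Ssn)
);

CREATE TABLE PHONE (
    Ssn          CHAR(9)     NOT NULL,
    Phone_number VARCHAR(15) NOT NULL,
    PRIMARY KEY (Ssn, Phone_number),
    FOREIGN KEY (Ssn) REFERENCES STUDENT (Ssn)
);
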
Chapter 5, Problem 20E

Problem

Recent changes in privacy laws have disallowed organizations from using Social Security
numbers to identify individuals unless certain restrictions are satisfied. As a result, most U.S.
universities cannot use SSNs as primary keys (except for financial data). In practice, Student_id,
a unique identifier assigned to every student, is likely to be used as the primary key rather than
SSN since Student_id can be used throughout the system.

a. Some database designers are reluctant to use generated keys (also known as surrogate keys)
for primary keys (such as Student_id) because they are artificial. Can you propose any natural
choices of keys that can be used to identify the student record in a UNIVERSITY database?

b. Suppose that you are able to guarantee uniqueness of a natural key that includes last name.
Are you guaranteed that the last name will not change during the lifetime of the database? If last
name can change, what solutions can you propose for creating a primary key that still includes
last name but remains unique?

c. What are the advantages and disadvantages of using generated (surrogate) keys?

Step-by-step solution

Step 1 of 1

(a)

Some operation on the student's name and the local and cell phone numbers
(the originals) can jointly be used to generate an id for the student.

For example:

first name + initials of the name + '_' + last name + '_' + digits of the
local phone number + sum of the digits of the cell phone number + '_' + an increasing
record counter.

For example, for the record above:

Let it be the 57th entry into the system. We can then have a unique identifier such as:
GeorgeGWE_Edwards_555-123430_57.

Assumptions: each student has a different local phone number unless they share the same
address, and two students with the same address do not have the same name.

Hash operations on various fields can also be used for generation of the
key.

(b)

If the natural key includes the last name, and the last name can change, we can add a
column that stores the original last name; that original, unchanging value can then be used for identification.

(c)

Advantages of Surrogate keys:

Immutability:

• Surrogate keys do not change while the row exists. This has two advantages:

• Database applications won't lose their "handle" on the row when the data changes.

• Many database systems do not support cascading updates of keys across foreign keys of
related tables, which makes modifying primary key data difficult; an unchanging surrogate key avoids this.

Flexibility for changing requirements

Because of changing requirements, the attributes that uniquely identify an entity might change. In
that case, the attribute(s) initially chosen as the natural key will no longer be a suitable natural
key.

Example :

An employee ID is chosen as the natural key of an employee DB. Because of a merger with
another company, new employees from the merged company must be inserted, who have
conflicting IDs (as their IDs were independently generated when the companies were separate).
In these cases, generally a new attribute must be added to the natural key (e.g. an attribute
"original_company"). With a surrogate key, only the table that defines the surrogate key must be
changed. With natural keys, all tables (and possibly other, related software) that use the natural
key will have to change. More generally, in some problem domains it is simply not clear what
might be a suitable natural key. Surrogate keys avoid problems from choosing a natural key that
later turns out to be incorrect.

Performance

Often surrogate keys are composed of a compact data type, such as a four-byte integer. This
allows the database to query faster than it could over multiple columns.

• A non-redundant distribution of keys causes the resulting B-tree index to be completely
balanced.

• If the natural key is a compound key, joining is more expensive as there are multiple columns to
compare. Surrogate keys are always contained in a single column.

Compatibility

Several database application development systems, drivers, and object-relational mapping
systems, such as Ruby on Rails or Hibernate (Java), depend on the use of integer or GUID
surrogate keys in order to support database-system-agnostic operations and object-to-row
mapping.

Disadvantages of surrogate keys:

Disassociation

Because the surrogate key is completely unrelated to the data of the row to
which it is attached, the key is disassociated from that row. Disassociated
keys are unnatural to the application's world, resulting in an additional level of
indirection from which to audit.

Query Optimization

Relational databases assume a unique index is applied to a table's primary
key. The unique index serves two purposes: 1) to enforce entity integrity, since
primary key data must be unique across rows, and 2) to quickly search for
rows queried. Since surrogate keys replace a table's identifying attributes (the
natural key), and since the identifying attributes are likely to be those queried,
the query optimizer is forced to perform a full table scan when fulfilling
likely queries. The remedy to the full table scan is to apply a (non-unique)
index on each of the identifying attributes. However, these additional indexes
will take up disk space, slow down inserts, and slow down deletes.

Normalization

The presence of a surrogate key can result in the database administrator
forgetting to establish, or accidentally removing, a secondary unique index on
the natural key of the table. Without a unique index on the natural key,
duplicate rows are likely to appear and are difficult to identify.

Business Process Modeling

Because surrogate keys are unnatural, flaws can appear when modeling the
business requirements. Business requirements, relying on the natural key,
then need to be translated to the surrogate key.

Inadvertent Disclosure

Proprietary information may be leaked if sequential key generators are used.

By subtracting a previously generated sequential key from a recently
generated sequential key, one could learn the number of rows inserted during
that time period. This could expose, for example, the number of transactions
or new accounts per period. The solution to the inadvertent disclosure
problem is to generate a random primary key. However, a randomly generated
primary key must be queried before assigned to prevent duplication and cause
an insert rejection.

Inadvertent Assumptions

Sequentially generated surrogate keys create the illusion that events with a
higher primary key value occurred after events with a lower primary key value.
This illusion would appear when an event is missed during the normal data
entry process and is, instead, inserted after subsequent events were
previously inserted. The solution to the inadvertent assumption problem is to
generate a random primary key. However, a randomly generated primary key
must be queried before assigned to prevent duplication and cause an insert
rejection.

Chapter 6, Problem 1RQ

Problem

How do the relations (tables) in SQL differ from the relations defined formally in Chapter 3?
Discuss the other differences in terminology. Why does SQL allow duplicate tuples in a table or in
a query result?

Step-by-step solution

Step 1 of 1

SQL allows a table (relation) to have two or more tuples that are identical in all their attribute
values. Hence, in general, an SQL table is not a set of tuples, because a set does not allow two
identical members; rather, it is a multiset of tuples. Some SQL relations are constrained to be
sets because a key constraint has been declared or because the DISTINCT option has been used
in the SELECT statement.

By contrast, a relation defined formally is a set of tuples; that is, identical tuples are
not allowed.

Correspondence between ER and Relational Model can help in understanding other differences
in terminology:

ER Model Relational Model

Entity type Entity relation

1:1 or 1:N relationship type Foreign key(or relationship type)

M:N relationship type Relationship relation and two foreign keys

n-ary relationship type Relationship relation and n foreign keys

Simple Attributes Attribute

Composite attributes Set of simple component attribute

Multivalued attributes Relation and foreign keys

Value set Domain

Key attributes Primary(or secondary) key

SQL allows duplicate tuples for the following reasons:

1. Duplicate elimination is an expensive operation.

2. The user may want to see duplicate tuples in the result of a query.

3. When an aggregate function is applied to tuples, in most cases the user does not want
duplicates to be eliminated.

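A small sketch of this multiset behavior (the table and column names are illustrative):

CREATE TABLE T (A INT, B INT);  -- no key declared, so T may hold duplicates
INSERT INTO T VALUES (1, 2);
INSERT INTO T VALUES (1, 2);    -- accepted: an identical tuple is allowed

SELECT B FROM T;                -- returns 2 twice (duplicates are kept)
SELECT DISTINCT B FROM T;       -- returns 2 once (duplicates are eliminated)
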
Chapter 6, Problem 2RQ

Problem

List the data types that are allowed for SQL attributes.

Step-by-step solution

Step 1 of 1

List of data types allowed for SQL attributes:

The basic data types available for attributes are:

• Numeric data types

• Character strings

• Bit strings

• Boolean

• Date and time

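A sketch showing one attribute of each basic type (the table and column names are illustrative, and exact type support varies by DBMS):

CREATE TABLE SAMPLE_TYPES (
    Id       INT,          -- numeric: INT, SMALLINT, FLOAT, DECIMAL(i,j)
    Name     VARCHAR(30),  -- character string: CHAR(n), VARCHAR(n), CLOB
    Flags    BIT(8),       -- bit string: BIT(n), BIT VARYING(n), BLOB
    Active   BOOLEAN,      -- boolean: TRUE or FALSE (UNKNOWN in the presence of NULLs)
    Hired    DATE,         -- date: YYYY-MM-DD
    Clock_in TIME          -- time: HH:MM:SS (TIMESTAMP combines date and time)
);
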
Chapter 6, Problem 3RQ

Problem

How does SQL allow implementation of the entity integrity and referential integrity constraints
described in Chapter 3? What about referential triggered actions?

Step-by-step solution

Step 1 of 6

An entity integrity constraint specifies that every table must have a primary key and the primary
key should contain unique values and cannot contain null values.

SQL allows implementation of the entity integrity constraint using PRIMARY KEY clause.

• The PRIMARY KEY clause must be specified at the time of creating a table.

• It ensures that no duplicate values are inserted into the table.


Step 2 of 6

Following are the examples to illustrate how the entity integrity constraint is implemented in SQL:

CREATE TABLE BOOKS

(BOOK_CODE INT PRIMARY KEY,

BOOK_TITLE VARCHAR(20),

BOOK_PRICE INT );

In the table BOOKS, BOOK_CODE is a primary key.

CREATE TABLE AUTHOR

(AUTHOR_ID INT PRIMARY KEY,

AUTHOR_NAME VARCHAR(20));

In the table AUTHOR, AUTHOR_ID is a primary key.


Step 3 of 6

A foreign key is an attribute (or a set of two or more attributes) in one table that matches the primary key
of another table and is used to maintain the relationship between the two tables.

A referential integrity constraint specifies that each value of a foreign key must match a value
of the primary key in the referenced (primary) table.

SQL allows implementation of the referential integrity constraint using FOREIGN KEY clause.

• The FOREIGN KEY clause must be specified at the time of creating a table.

• It ensures that it is not possible to add a value to a foreign key which does not exist in the
primary key of the primary/linked table.


Step 4 of 6

Following is the example to illustrate how the referential integrity constraint is implemented in
SQL:

CREATE TABLE BOOKSTORE

(BOOK_CODE INT FOREIGN KEY REFERENCES BOOKS(BOOK_CODE),

AUTHOR_ID INT FOREIGN KEY REFERENCES AUTHOR(AUTHOR_ID),

BOOK_TYPE VARCHAR(20),

PRIMARY KEY(BOOK_CODE, AUTHOR_ID));

In the table BOOKSTORE, BOOK_CODE, AUTHOR_ID together form the primary key.

BOOK_CODE is a foreign key which refers the BOOK_CODE of table BOOKS.

AUTHOR_ID is a foreign key which refers the AUTHOR_ID of table AUTHOR.

The use of the foreign key BOOK_CODE is that it is not possible to add a tuple to BOOKSTORE
table unless there is a valid BOOK_CODE in the BOOKS table.

The use of the foreign key AUTHOR_ID is that it is not possible to add a tuple to BOOKSTORE
table unless there is a valid AUTHOR_ID in the AUTHOR table.

Step 5 of 6

When a foreign key constraint is violated, the default action of SQL is to reject the operation.

• Instead of rejecting the operation, it is possible to add a referential triggered action
clause to the foreign key, which will automatically set the foreign key to NULL, set it to a default value, or cascade the change.

• The options provided for the referential triggered action are SET NULL, SET
DEFAULT, and CASCADE.

• A qualifier, ON DELETE or ON UPDATE, must be specified along with the options.


Step 6 of 6

Following is the example to illustrate how the referential triggered action is implemented in SQL:

CREATE TABLE EMPLOYEE

(EMPNO INT PRIMARY KEY,

ENAME VARCHAR(20),

JOB VARCHAR(20),

SALARY INT,

MANAGER INT FOREIGN KEY REFERENCES EMPLOYEE(EMPNO)

ON DELETE SET NULL);

Chapter 6, Problem 4RQ

Problem

Describe the four clauses in the syntax of a simple SQL retrieval query. Show what type of
constructs can be specified in each of the clauses. Which are required and which are optional?

Step-by-step solution

Step 1 of 1

The four clauses in the syntax of a simple SQL retrieval query:

The following are the four clauses of a simple SQL retrieval query.

Select:

• It lists the attributes to be retrieved and is used together with the FROM clause to extract data from
the database in a human-readable format.

• The SELECT clause is required.

From:

• The FROM clause is used in combination with the SELECT statement for retrieving the
data.

• It tells the database which table (or tables) the data should be retrieved from; multiple
tables can be listed in the FROM clause.

• It is required.

Where:

• It is used to impose conditions on the query and filter out the rows (tuples) that do not
satisfy the conditions.

• More than one condition can be combined in the WHERE clause.

• It is optional.

Order By:

• This clause is used to sort the values of the output either in ascending order or descending
order.

• The default ordering of ORDER BY is ascending.

• This clause is also optional.

Example of a simple SQL query using all four clauses:

SELECT * FROM EMPLOYEE WHERE SALARY > 10000 ORDER BY ENAME DESC;

Chapter 6, Problem 5E

Problem

Consider the database shown in Figure 1.2, whose schema is shown in Figure 2.1. What are the
referential integrity constraints that should hold on the schema? Write appropriate SQL DDL
statements to define the database.

Step-by-step solution

Step 1 of 2

From the figure 1.2 in the textbook, the referential integrity constraints that should hold are listed below,
using the following notation:

R.(A1, ..., An) --> S.(B1, ..., Bn)

This represents a foreign key from the attributes A1, ..., An of the referencing relation R
to S (the referenced relation):
to S (the referenced relation)):

PREREQUISITE.(CourseNumber) --> COURSE.(CourseNumber)

PREREQUISITE.(PrerequisiteNumber) --> COURSE.(CourseNumber)

SECTION.(CourseNumber) --> COURSE.(CourseNumber)

GRADE_REPORT.(StudentNumber) --> STUDENT.(StudentNumber)

GRADE_REPORT.(SectionIdentifier) --> SECTION.(SectionIdentifier)


Step 2 of 2

SQL DDL statements defining the above database:

CREATE TABLE STUDENT ( Name VARCHAR(30) NOT NULL,

StudentNumber INTEGER NOT NULL, Class CHAR NOT NULL, Major CHAR(4),

PRIMARY KEY (StudentNumber) );


CREATE TABLE COURSE ( CourseName VARCHAR(30) NOT NULL,

CourseNumber CHAR(8) NOT NULL, CreditHours INTEGER, Department CHAR(4),

PRIMARY KEY (CourseNumber), UNIQUE (CourseName) );

CREATE TABLE PREREQUISITE ( CourseNumber CHAR(8) NOT NULL,

PrerequisiteNumber CHAR(8) NOT NULL, PRIMARY KEY (CourseNumber,


PrerequisiteNumber), FOREIGN KEY (CourseNumber) REFERENCES

COURSE (CourseNumber), FOREIGN KEY (PrerequisiteNumber) REFERENCES

COURSE (CourseNumber) );

CREATE TABLE SECTION ( SectionIdentifier INTEGER NOT NULL,

CourseNumber CHAR(8) NOT NULL, Semester VARCHAR(6) NOT NULL,

Year CHAR(4) NOT NULL, Instructor VARCHAR(15), PRIMARY KEY (SectionIdentifier),


FOREIGN KEY (CourseNumber) REFERENCES

COURSE (CourseNumber) );

CREATE TABLE GRADE_REPORT ( StudentNumber INTEGER NOT NULL,

SectionIdentifier INTEGER NOT NULL, Grade CHAR, PRIMARY KEY (StudentNumber,


SectionIdentifier), FOREIGN KEY (StudentNumber) REFERENCES

STUDENT (StudentNumber), FOREIGN KEY (SectionIdentifier) REFERENCES

SECTION (SectionIdentifier) );

Chapter 6, Problem 6E

Problem

Repeat Exercise, but use the AIRLINE database schema of Figure.

Exercise

Consider the database shown in Figure 1.2, whose schema is shown in Figure 2.1. What are the
referential integrity constraints that should hold on the schema? Write appropriate SQL DDL
statements to define the database.

The AIRLINE relational database.

Step-by-step solution

Step 1 of 10

The referential integrity constraints below for the AIRLINE database schema are based on the figure
2.1 from the textbook.

FLIGHT_LEG.(FLIGHT_NUMBER) --> FLIGHT.(NUMBER)

FLIGHT_LEG.(DEPARTURE_AIRPORT_CODE) --> AIRPORT.(AIRPORT_CODE)

FLIGHT_LEG.(ARRIVAL_AIRPORT_CODE) --> AIRPORT.(AIRPORT_CODE)

LEG_INSTANCE.(FLIGHT_NUMBER, LEG_NUMBER) -->

FLIGHT_LEG.(FLIGHT_NUMBER, LEG_NUMBER)

LEG_INSTANCE.(AIRPLANE_ID) --> AIRPLANE.(AIRPLANE_ID)

LEG_INSTANCE.(DEPARTURE_AIRPORT_CODE) --> AIRPORT.(AIRPORT_CODE)

LEG_INSTANCE.(ARRIVAL_AIRPORT_CODE) --> AIRPORT.(AIRPORT_CODE)

FARES.(FLIGHT_NUMBER) --> FLIGHT.(NUMBER)

CAN_LAND.(AIRPLANE_TYPE_NAME) --> AIRPLANE_TYPE.(TYPE_NAME)

CAN_LAND.(AIRPORT_CODE) --> AIRPORT.(AIRPORT_CODE)

AIRPLANE.(AIRPLANE_TYPE) --> AIRPLANE_TYPE.(TYPE_NAME)

SEAT_RESERVATION.(FLIGHT_NUMBER, LEG_NUMBER, DATE) -->

LEG_INSTANCE.(FLIGHT_NUMBER, LEG_NUMBER, DATE)


Step 2 of 10

CREATE TABLE statements for the database is,

CREATE TABLE AIRPORT (AIRPORT_CODE CHAR (3) NOT NULL, NAME VARCHAR (30) NOT NULL, CITY
VARCHAR (30) NOT NULL, STATE VARCHAR (30), PRIMARY KEY (AIRPORT_CODE) );


Step 3 of 10

CREATE TABLE FLIGHT (NUMBER VARCHAR (6) NOT NULL, AIRLINE VARCHAR (20) NOT
NULL, WEEKDAYS VARCHAR (10) NOT NULL, PRIMARY KEY (NUMBER));

Step 4 of 10

CREATE TABLE FLIGHT_LEG (FLIGHT_NUMBER VARCHAR (6) NOT NULL,

LEG_NUMBER INTEGER NOT NULL, DEPARTURE_AIRPORT_CODE CHAR (3) NOT NULL,


SCHEDULED_DEPARTURE_TIME TIMESTAMP WITH TIME ZONE,

ARRIVAL_AIRPORT_CODE CHAR (3) NOT NULL, SCHEDULED_ARRIVAL_TIME TIMESTAMP


WITH TIME ZONE, PRIMARY KEY (FLIGHT_NUMBER, LEG_NUMBER), FOREIGN KEY
(FLIGHT_NUMBER) REFERENCES FLIGHT (NUMBER), FOREIGN KEY
(DEPARTURE_AIRPORT_CODE) REFERENCES

AIRPORT (AIRPORT_CODE), FOREIGN KEY (ARRIVAL_AIRPORT_CODE) REFERENCES AIRPORT
(AIRPORT_CODE));


Step 5 of 10

CREATE TABLE LEG_INSTANCE (FLIGHT_NUMBER VARCHAR (6) NOT NULL,

LEG_NUMBER INTEGER NOT NULL, LEG_DATE DATE NOT NULL,


NO_OF_AVAILABLE_SEATS INTEGER, AIRPLANE_ID INTEGER,

DEPARTURE_AIRPORT_CODE CHAR(3), DEPARTURE_TIME TIMESTAMP WITH TIME


ZONE, ARRIVAL_AIRPORT_CODE CHAR(3), ARRIVAL_TIME TIMESTAMP WITH TIME ZONE,
PRIMARY KEY (FLIGHT_NUMBER, LEG_NUMBER, LEG_DATE), FOREIGN KEY
(FLIGHT_NUMBER, LEG_NUMBER) REFERENCES FLIGHT_LEG (FLIGHT_NUMBER,
LEG_NUMBER), FOREIGN KEY (AIRPLANE_ID) REFERENCES

AIRPLANE (AIRPLANE_ID), FOREIGN KEY (DEPARTURE_AIRPORT_CODE) REFERENCES AIRPORT
(AIRPORT_CODE),

FOREIGN KEY (ARRIVAL_AIRPORT_CODE) REFERENCES AIRPORT (AIRPORT_CODE) );


Step 6 of 10

CREATE TABLE FARES (FLIGHT_NUMBER VARCHAR (6) NOT NULL,

FARE_CODE VARCHAR (10) NOT NULL, AMOUNT DECIMAL (8, 2) NOT NULL,

RESTRICTIONS VARCHAR (200), PRIMARY KEY (FLIGHT_NUMBER, FARE_CODE),


FOREIGN KEY (FLIGHT_NUMBER) REFERENCES FLIGHT (NUMBER) );


Step 7 of 10

CREATE TABLE AIRPLANE_TYPE (TYPE_NAME VARCHAR (20) NOT NULL,

MAX_SEATS INTEGER NOT NULL, COMPANY VARCHAR (15) NOT NULL,

PRIMARY KEY (TYPE_NAME) );


Step 8 of 10

CREATE TABLE CAN_LAND (AIRPLANE_TYPE_NAME VARCHAR (20) NOT NULL,


AIRPORT_CODE CHAR (3) NOT NULL, PRIMARY KEY (AIRPLANE_TYPE_NAME,
AIRPORT_CODE), FOREIGN KEY (AIRPLANE_TYPE_NAME) REFERENCES
AIRPLANE_TYPE (TYPE_NAME),

FOREIGN KEY (AIRPORT_CODE) REFERENCES AIRPORT (AIRPORT_CODE) );


Step 9 of 10

CREATE TABLE AIRPLANE (AIRPLANE_ID INTEGER NOT NULL,

TOTAL_NUMBER_OF_SEATS INTEGER NOT NULL, AIRPLANE_TYPE VARCHAR (20) NOT


NULL, PRIMARY KEY (AIRPLANE_ID),

FOREIGN KEY (AIRPLANE_TYPE) REFERENCES AIRPLANE_TYPE (TYPE_NAME) );


Step 10 of 10

CREATE TABLE SEAT_RESERVATION (FLIGHT_NUMBER VARCHAR (6) NOT NULL,


LEG_NUMBER INTEGER NOT NULL, LEG_DATE DATE NOT NULL,

SEAT_NUMBER VARCHAR (4), CUSTOMER_NAME VARCHAR (30) NOT NULL,


CUSTOMER_PHONE CHAR (12), PRIMARY KEY (FLIGHT_NUMBER, LEG_NUMBER,
LEG_DATE, SEAT_NUMBER), FOREIGN KEY (FLIGHT_NUMBER, LEG_NUMBER,
LEG_DATE) REFERENCES

LEG_INSTANCE (FLIGHT_NUMBER, LEG_NUMBER, LEG_DATE) );

Chapter 6, Problem 7E

Problem

Consider the LIBRARY relational database schema shown in Figure. Choose the appropriate
action (reject, cascade, set to NULL, set to default) for each referential integrity constraint, both
for the deletion of a referenced tuple and for the update of a primary key attribute value in a
referenced tuple. Justify your choices.

A relational database scheme for a LIBRARY database.

Step-by-step solution

Step 1 of 7

The appropriate actions of the LIBRARY relational database schema are as follows:

• The REJECT action will not permit the automatic changes in the LIBRARY database.

• If the BOOK is deleted the CASCADE on DELETE action is automatically propagated to the
rows of the referenced relation BOOK_AUTHORS.

• If the BOOK is updated the CASCADE on UPDATE action is automatically propagated to the
rows of the referenced relation BOOK_AUTHORS.

Therefore, the CASCADE on DELETE and CASCADE on UPDATE actions are chosen for the
above referential integrity.


Step 2 of 7

• It is not possible to delete the rows in the PUBLISHER relation because it is referenced to the
rows in the BOOK table.

• If the PUBLISHER’s name is updated, the CASCADE on UPDATE action is automatically
propagated to the rows of the referenced relation BOOK.

Therefore, the ON DELETE REJECT and CASCADE on UPDATE actions are chosen for the
above referential integrity.
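These choices correspond to a foreign key declaration of roughly the following form (a sketch; the complete DDL appears in the next problem):

FOREIGN KEY (PublisherName) REFERENCES PUBLISHER (Name) ON UPDATE CASCADE

Because no ON DELETE clause is given, deletions of referenced PUBLISHER rows are rejected by default.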

Comment

Step 3 of 7

• If a BOOK is deleted, the ON DELETE CASCADE action automatically propagates the deletion to the matching rows of the referencing relation BOOK_LOANS.

• If a BOOK's primary key is updated, the ON UPDATE CASCADE action automatically propagates the new value to the matching rows of the referencing relation BOOK_LOANS.

• Alternatively, deletions of BOOK rows can be rejected while they are referenced by rows in the BOOK_LOANS table.

Therefore, CASCADE on DELETE, CASCADE on UPDATE, and ON DELETE REJECT are the candidate actions for this referential integrity constraint.

Comment
Step 4 of 7

• If a BOOK is deleted, all its associated rows in the relation BOOK_COPIES should be deleted as well.

• The ON DELETE CASCADE action automatically propagates such a deletion to the matching rows of the referencing relation BOOK_COPIES.

• If a BOOK's primary key is updated, the ON UPDATE CASCADE action automatically propagates the new value to the matching rows of the referencing relation BOOK_COPIES.

Therefore, CASCADE on DELETE and CASCADE on UPDATE are chosen for this referential integrity constraint.

Comment

Step 5 of 7

• If rows are deleted from the BORROWER table, the ON DELETE CASCADE action automatically propagates the deletion to the matching rows of the referencing relation BOOK_LOANS.

• If a CardNo is updated in the BORROWER table, the ON UPDATE CASCADE action automatically propagates the new value to the matching rows of the referencing relation BOOK_LOANS.

• Alternatively, deletions of BORROWER rows can be rejected while they are referenced by rows in the BOOK_LOANS table.

Therefore, CASCADE on DELETE, CASCADE on UPDATE, and ON DELETE REJECT are the candidate actions for this referential integrity constraint.

Comment

Step 6 of 7

• If rows are deleted from the LIBRARY_BRANCH table, the ON DELETE CASCADE action automatically propagates the deletion to the matching rows of the referencing relation BOOK_COPIES.

• If a Branch_id is updated in the LIBRARY_BRANCH table, the ON UPDATE CASCADE action automatically propagates the new value to the matching rows of the referencing relation BOOK_COPIES.

• Alternatively, deletions of LIBRARY_BRANCH rows can be rejected while they are referenced by rows in the BOOK_COPIES table.

Therefore, CASCADE on DELETE, CASCADE on UPDATE, and ON DELETE REJECT are the candidate actions for this referential integrity constraint.

Comment

Step 7 of 7

• If rows are deleted from the LIBRARY_BRANCH table, the ON DELETE CASCADE action automatically propagates the deletion to the matching rows of the referencing relation BOOK_LOANS.

• If a Branch_id is updated in the LIBRARY_BRANCH table, the ON UPDATE CASCADE action automatically propagates the new value to the matching rows of the referencing relation BOOK_LOANS.

• Alternatively, deletions of LIBRARY_BRANCH rows can be rejected while they are referenced by rows in the BOOK_LOANS table.

Therefore, CASCADE on DELETE, CASCADE on UPDATE, and ON DELETE REJECT are the candidate actions for this referential integrity constraint.

Comment
Chapter 6, Problem 8E

Problem

Write appropriate SQL DDL statements for declaring the LIBRARY relational database schema of
Figure. Specify the keys and referential triggered actions.

A relational database scheme for a LIBRARY database.

Step-by-step solution

Step 1 of 7

The following set of SQL DDL statements declares the LIBRARY relational schema from Figure 6.14 in the textbook. The CREATE TABLE statements are as follows:

CREATE TABLE BOOK ( BookId CHAR(20) NOT NULL, Title VARCHAR(30) NOT NULL,
PublisherName VARCHAR(20), PRIMARY KEY (BookId), FOREIGN KEY (PublisherName)
REFERENCES PUBLISHER (Name) ON UPDATE CASCADE );

Comment

Step 2 of 7

CREATE TABLE BOOK_AUTHORS ( BookId CHAR(20) NOT NULL, AuthorName


VARCHAR(30) NOT NULL, PRIMARY KEY (BookId, AuthorName), FOREIGN KEY (BookId)
REFERENCES BOOK (BookId) ON DELETE CASCADE ON UPDATE CASCADE );

Comment

Step 3 of 7

CREATE TABLE PUBLISHER ( Name VARCHAR(20) NOT NULL, Address VARCHAR(40) NOT
NULL, Phone CHAR(12), PRIMARY KEY (Name) );

Comment

Step 4 of 7

CREATE TABLE BOOK_COPIES ( BookId CHAR(20) NOT NULL, BranchId INTEGER NOT
NULL, No_Of_Copies INTEGER NOT NULL, PRIMARY KEY (BookId, BranchId), FOREIGN KEY
(BookId) REFERENCES BOOK (BookId)

ON DELETE CASCADE ON UPDATE CASCADE, FOREIGN KEY (BranchId) REFERENCES

LIBRARY_BRANCH (BranchId) ON DELETE CASCADE ON UPDATE CASCADE );

Comment

Step 5 of 7

CREATE TABLE BORROWER ( CardNo INTEGER NOT NULL, Name VARCHAR(30) NOT
NULL, Address VARCHAR(40) NOT NULL, Phone CHAR(12),

PRIMARY KEY (CardNo) );

Comment
Step 6 of 7

CREATE TABLE BOOK_LOANS ( CardNo INTEGER NOT NULL, BookId CHAR(20) NOT NULL,
BranchId INTEGER NOT NULL, DateOut DATE NOT NULL,

DueDate DATE NOT NULL, PRIMARY KEY (CardNo, BookId, BranchId),

FOREIGN KEY (CardNo) REFERENCES BORROWER (CardNo) ON DELETE CASCADE ON


UPDATE CASCADE, FOREIGN KEY (BranchId) REFERENCES LIBRARY_BRANCH (BranchId)
ON DELETE CASCADE ON UPDATE CASCADE,

FOREIGN KEY (BookId) REFERENCES BOOK (BookId) ON DELETE CASCADE ON UPDATE


CASCADE );

Comment

Step 7 of 7

CREATE TABLE LIBRARY_BRANCH ( BranchId INTEGER NOT NULL, BranchName


VARCHAR(20) NOT NULL, Address VARCHAR(40) NOT NULL,

PRIMARY KEY (BranchId) );

Comment
Chapter 6, Problem 9E

Problem

How can the key and foreign key constraints be enforced by the DBMS? Is the enforcement
technique you suggest difficult to implement? Can the constraint checks be executed efficiently
when updates are applied to the database?

Step-by-step solution

Step 1 of 3

Enforcement of key constraint in DBMS (Database management System):

Key constraint:

The technique that is often used to check efficiently for the key constraint is to create an index on
the combination of attributes that form each key (primary or secondary).

• Before a new record (tuple) is inserted, each index is searched to check that no existing value in the index matches the corresponding key value of the new record.

• If no matching value is found, the record is inserted; otherwise, the insertion is rejected.

Foreign key constraint:

Foreign key constraints can be checked efficiently by using the index on the primary key of each referenced relation.

Whenever a new record is inserted in a referencing relation, its foreign key value is used to
search the index for the primary key of the referenced relation, and if the referenced record
exists, then the new record can be successfully inserted in the referencing relation.

For deletion of a referenced record, it is useful to have an index on the foreign key of each
referencing relation so as to be able to determine efficiently whether any records reference the
record being deleted.
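As a sketch, the access structures described above could be declared as follows (the index names are illustrative, not prescribed by the text):

CREATE UNIQUE INDEX EMP_SSN_IDX ON EMPLOYEE (Ssn);
-- the unique index turns every key-constraint check into a single index lookup

CREATE INDEX EMP_SUPER_IDX ON EMPLOYEE (Super_ssn);
-- an index on the foreign key makes it cheap to find referencing records
-- when a referenced record is deleted or its key is updated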

Comment

Step 2 of 3

Implementation of enforcement technique:

No, the enforcement technique is not difficult to implement: using an index makes it easy to identify duplicate data records.

• Without an index (or a comparable access structure) on the key attributes, a linear search over the whole file would be needed to check each constraint, which makes the checks quite inefficient.

Comment

Step 3 of 3

Efficient constraint checks:

Yes, the constraint checks can be executed efficiently when records are inserted into or deleted from the database.

• Using the index to enforce the key constraint avoids duplication of data records, and the checks remain efficient as the amount of stored data grows.

Thus, the constraint checks using the index is efficient.

Comment
Chapter 6, Problem 10E

Problem

Specify the following queries in SQL on the COMPANY relational database schema shown in
Figure 5.5. Show the result of each query if it is applied to the COMPANY database in Figure 5.6.

a. Retrieve the names of all employees in department 5 who work more than 10 hours per week
on the ProductX project.

b. List the names of all employees who have a dependent with the same first name as
themselves.

c. Find the names of all employees who are directly supervised by ‘Franklin Wong’.

Step-by-step solution

Step 1 of 9

a)

Query:

Select emp.Fname, emp.Lname

from employee emp, works_on w, project p

where emp.Dno = 5 and emp.ssn = w.Essn and w.Pno = p.pnumber and p.pname = 'ProductX'
and w.hours > 10
Comment

Step 2 of 9

Result:

Fname Lname

John Smith

Joyce English

Comment

Step 3 of 9

Explanation:

The above query displays the names of all employees of department 5 who work more than 10 hours per week on the project 'ProductX'.

Comment

Step 4 of 9

b)

Query:

Select emp.Fname, emp.Lname

from employee emp, dependent d

where emp.ssn= d.essn and emp.Fname = d.Dependent_name

Comment

Step 5 of 9

Result: (empty)

Fname Lname

Comment

Step 6 of 9

Explanation:

The above query displays the names of all employees who have a dependent with the same first name as themselves.

• Here, the result is empty, because no dependent has the same first name as the corresponding employee.

Comment

Step 7 of 9

c)

Query:

Select emp.Fname, emp.Lname

from employee emp, employee emp1

where emp1.Fname = 'Franklin' and emp1.Lname = 'Wong' and emp.superssn = emp1.ssn

Comment

Step 8 of 9

Fname Lname

John Smith

Ramesh Narayan

Joyce English

Comment

Step 9 of 9
Explanation:

The above query uses a self-join to display the names of all the employees who are directly supervised by Franklin Wong.

Comment
Chapter 6, Problem 11E

Problem

Specify the updates of the following Exercise using the SQL update commands.

Exercise

What is meant by a recursive relationship type? Give some examples of recursive relationship types.

Step-by-step solution

Step 1 of 1

If the same entity type participates more than once in a relationship type, in different roles, the relationship type is called a recursive relationship. It occurs within unary relationships: the degree of the relationship is one, while its connectivity may be 1:1, 1:N, or M:N.

For example, in the figure below, REPORTS_TO is a recursive relationship, as the Employee entity type plays two roles: 1) Supervisor and 2) Subordinate.

This relationship can also be described as the relationship between a manager and an employee: an employee may be a manager as well as an employee.

To implement a recursive relationship, a foreign key holding the employee number of the employee's manager is kept in each employee record:

Emp_entity (Emp_no, Emp_Fname, Emp_Lname, Emp_DOB, Emp_NI_Number, Manager_no);

Manager_no is the employee number of the employee's manager.

Comments (1)



Chapter 6, Problem 12E

Problem

Specify the following queries in SQL on the database schema of Figure 1.2.

a. Retrieve the names of all senior students majoring in ‘cs’ (computer science).

b. Retrieve the names of all courses taught by Professor King in 2007 and 2008.

c. For each section taught by Professor King, retrieve the course number, semester, year, and
number of students who took the section.

d. Retrieve the name and transcript of each senior student (Class = 4) majoring in CS. A
transcript includes course name, course number, credit hours, semester, year, and grade for
each course completed by the student.

Step-by-step solution

Step 1 of 4

a.

The query to display the names of senior students majoring in CS is as follows:

Query:

SELECT Name FROM STUDENT

WHERE Major = “CS” AND Class = “4”;

Output:

Explanation:

• There are no rows in the database where Class is 4 (senior) and Major is 'CS', so the result is empty.

• SELECT is used to query the database and get back the specified fields.

o Name is the columns of STUDENT table.

• FROM is used to query the database and get back the preferred information by specifying the
table name.

o STUDENT is a table name.

• WHERE is used to specify a condition based on which the data is to be retrieved. In the
database, Seniors are represented by Class 4. The condition is as follows:
o Major = 'CS' AND Class = 4

Comment

Step 2 of 4

b.

The query to get the course name that are taught by professor King in year 2007 and 2008 is as
follows:

Query:

SELECT Course_name

FROM COURSE, SECTION

WHERE COURSE.Course_number = SECTION.Course_number

AND Instructor = 'King'

AND (Year='07' or Year='08');

Output :

Explanation:

• SELECT is used to query the database and get back the specified fields.

o Course_name is the columns of COURSE table.

• FROM is used to query the database and get back the preferred information by specifying the
table name.

o COURSE, SECTION are table names.

• WHERE is used to specify a condition based on which the data is to be retrieved. The
conditions are as follows:

o COURSE.Course_number = SECTION.Course_number

o Instructor = 'King'

o (Year='07' or Year='08')

• The conditions are concatenated with AND operator. All the conditions must be satisfied.

Comment

Step 3 of 4

c.

The query to retrieve the course number, Semester, Year and number of students who took the
section taught by professor King is as follows:

Query:

SELECT Course_number, Semester, Year, COUNT (G.Student_number) AS 'Number of Students'

FROM SECTION AS S, GRADE_REPORT AS G

WHERE S.Instructor = 'King'

AND S.Section_identifier = G.Section_identifier

GROUP BY Course_number, Semester, Year;

Output :

Explanation:

• SELECT is used to query the database and get back the specified fields.

o Course_number, Semester, Year are the columns of SECTION table.

• FROM is used to query the database and get back the preferred information by specifying the
table name.

o GRADE_REPORT, SECTION are table names.


• WHERE is used to specify a condition based on which the data is to be retrieved. The
conditions are as follows:

o S.Instructor = 'King'

o S.Section_identifier = G.Section_identifier

• GROUP BY groups the result by Course_number, Semester, and Year so that the count of students is computed for each section.

Comment

Step 4 of 4

d.

The query to display the name and transcript of each senior students majoring in CS is as
follows:

Query:

SELECT ST.Name, C.Course_name, C.Course_number, C.Credit_hours, S.Semester, S.Year,


G.Grade

FROM STUDENT AS ST, COURSE AS C, SECTION AS S, GRADE_REPORT As G

WHERE Class = 4 AND Major='CS'

AND ST.Student_number= G.Student_number

AND G.Section_identifier= S.Section_identifier

AND S.Course_number= C.Course_number;

Output :

No rows selected.

Explanation:

• SELECT is used to query the database and get back the specified fields.

o Course_number, Course_number, Credit_hours are the columns of COURSE table.

o Semester, Year are the columns of SECTION table.

o Name is the columns of STUDENT table.

o Grade is the columns of GRADE_REPORT table.

• FROM is used to query the database and get back the preferred information by specifying the
table name.

o STUDENT, COURSE, GRADE_REPORT, SECTION are table names.

o ST is the alias name for STUDENT table.

o G is the alias name for GRADE_REPORT table.

o S is the alias name for SECTION table.

o C is the alias name for COURSE table.

• WHERE is used to specify a condition based on which the data is to be retrieved. The
conditions are as follows:

o Class = 4

o Major='CS'

o ST.Student_number= G.Student_number

o G.Section_identifier= S.Section_identifier

o S.Course_number= C.Course_number

Comment
Chapter 6, Problem 13E

Problem

Write SQL update statements to do the following on the database schema shown in Figure 1.2.

a. Insert a new student, <'Johnson', 25, 1, 'MATH'>, in the database.

b. Change the class of student ‘Smith’ to 2.

c. Insert a new course, <’Knowledge Engineering’, ‘cs4390’, 3, ‘cs’>.

d. Delete the record for the student whose name is ‘Smith’ and whose student number is 17.

Step-by-step solution

Step 1 of 4

a.

The query to insert a new student into STUDENT relation is as follows:

Query:

INSERT INTO STUDENT VALUES ('Johnson', 25, 1, 'MATH');

Explanation:

• INSERT command is used to insert a row into a relation.

• STUDENT is the name of the relation.

Output:

Comment

Step 2 of 4

b.

The query to update the class of a student with name Smith to 2 is as follows:

Query:
UPDATE STUDENT

SET CLASS = 2

WHERE Name='Smith';

Explanation:

• UPDATE command is used to modify the data in a relation.

• STUDENT is the name of the relation.

• SET is used to specify the new value for a column.

• WHERE is used to specify a condition based on which the data is to be retrieved.

Output:

Comment

Step 3 of 4

c.

Query:

INSERT INTO COURSE VALUES

('Knowledge Engineering','cs4390', 3,'cs');

Explanation:

• INSERT command is used to insert a row into a relation.

• COURSE is the name of the relation.

Output:

Comment

Step 4 of 4

d.

Query:

DELETE FROM STUDENT

WHERE Name='Smith' AND Student_number=17;

Explanation:

• DELETE command is used to delete a row from the specified relation.

• STUDENT is the name of the relation.

• WHERE is used to specify a condition based on which the data is to be retrieved.

Output:
Chapter 6, Problem 14E

Problem

Design a relational database schema for a database application of your choice.

a. Declare your relations using the SQL DDL.

b. Specify a number of queries in SQL that are needed by your database application.

c. Based on your expected use of the database, choose some attributes that should have
indexes specified on them.

d. Implement your database, if you have a DBMS that supports SQL.

Step-by-step solution

Step 1 of 6

Consider a student database that stores the information about students, courses and faculty.

a.

The DDL statement to create the relation STUDENT is as follows:

CREATE TABLE STUDENT (

StudentID int(11) NOT NULL,

FirstName varchar(20) NOT NULL,

LastName varchar(20) NOT NULL,

Address varchar(30) NOT NULL,

DOB date,

Gender char

);

The DDL statement to add a primary key to the relation STUDENT is as follows:

ALTER TABLE STUDENT

ADD PRIMARY KEY (StudentID);

The DDL statement to create the relation COURSE is as follows:

CREATE TABLE COURSE (

CourseID varchar(30) NOT NULL,

CourseName varchar(30) NOT NULL,

PRIMARY KEY (CourseID)

);

The DDL statement to create the relation FACULTY is as follows:

CREATE TABLE FACULTY (

FacultyID int(11) NOT NULL,

FacultyName varchar(30) NOT NULL,

PRIMARY KEY (FacultyID)

);

The DDL statement to create the relation REGISTRATION is as follows:

CREATE TABLE REGISTRATION (

StudentID int(11) NOT NULL,

CourseID varchar(30) NOT NULL,

PRIMARY KEY (StudentID, CourseID)

);

The DDL statement to create the relation TEACHES is as follows:

CREATE TABLE TEACHES (

FacultyID int(11) NOT NULL,

CourseID varchar(30) NOT NULL,

DateQualified varchar(12),

PRIMARY KEY (FacultyID,CourseID)

);

The DDL statement to add a column GradePoints to the relation COURSE is as follows:

ALTER TABLE COURSE

ADD COLUMN GradePoints int(2);

Comment

Step 2 of 6
b.

A wide range of queries can be written using the five relations, based on the requirements of the user, so the number of queries is not fixed and will vary.

Some of the possible queries that are needed by the database application are as follows:

The query to retrieve the details of the students is as follows:

SELECT *

FROM STUDENT;

The query to retrieve the details of the faculties is as follows:

SELECT *

FROM FACULTY;

The query to retrieve the details of the courses offered is as follows:

SELECT *

FROM COURSE;

The query to retrieve which course is taught by which faculty is as follows:

SELECT *

FROM TEACHES;

The query to retrieve the names of the students who have registered for a course is as follows:

SELECT FirstName, LastName

FROM STUDENT, REGISTRATION

WHERE STUDENT.StudentID=REGISTRATION.StudentID;

Comment

Step 3 of 6

The query to retrieve the details of the male students is as follows:

SELECT * FROM STUDENT

WHERE GENDER= 'M';

The query to retrieve the courses with grade point 3 and above is as follows:

SELECT * FROM COURSE

WHERE GradePoints >=3;

Comment

Step 4 of 6

c.

Indexes are used for faster retrieval of data. Some of the attributes that can have indexes specified on them are as follows:

• An index can be specified on FirstName in STUDENT relation.

• An index can be specified on LastName in STUDENT relation.

• An index can be specified on CourseName in COURSE relation.

• An index can be specified on FacultyName in FACULTY relation.

Comment

Step 5 of 6

d.

The implementation of the student database is as follows:


Comment

Chapter 6, Problem 15E

Problem

Consider that the EMPLOYEE table’s constraint EMPSUPERFK as specified in Figure 6.2 is
changed to read as follows:

CONSTRAINT EMPSUPERFK FOREIGN KEY (Super_ssn) REFERENCES EMPLOYEE(Ssn) ON DELETE CASCADE ON UPDATE CASCADE

Answer the following questions:

a. What happens when the following command is run on the database state shown in Figure 5.6?

DELETE EMPLOYEE WHERE Lname = ‘Borg’

b. Is it better to CASCADE or SET NULL in case of EMPSUPERFK constraint ON DELETE?

Step-by-step solution

Step 1 of 2

a)

With the EMPLOYEE table constraint changed to

CONSTRAINT EMPSUPERFK FOREIGN KEY (Super_ssn) REFERENCES EMPLOYEE (Ssn)

ON DELETE CASCADE ON UPDATE CASCADE,

the result on the database state of Figure 5.6 is as follows:

The James E. Borg entry is deleted from the table, and each employee with him as supervisor is also deleted (and their supervisees, and so on). In total, 8 rows are deleted and the table is left empty.

Comment

Step 2 of 2

b)

It is better to SET NULL, since an employee is not fired (deleted) when his or her supervisor is deleted. Instead, the employee's Super_ssn should be set to NULL so that a new supervisor can be assigned later.
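The SET NULL choice corresponds to a declaration like the following (this matches the triggered actions shown for EMPSUPERFK in Figure 6.2 of the text):

CONSTRAINT EMPSUPERFK FOREIGN KEY (Super_ssn) REFERENCES EMPLOYEE (Ssn)

ON DELETE SET NULL ON UPDATE CASCADE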

Comment
Chapter 6, Problem 16E

Problem

Write SQL statements to create a table EMPLOYEE_BACKUP to back up the EMPLOYEE table
shown in Figure 5.6.

Step-by-step solution

Step 1 of 4

Step1:

Create a table EMPLOYEE is as follows:

CREATE TABLE EMPLOYEE (

Fname varchar(15) NOT NULL,

Minit char(1) DEFAULT NULL,

Lname varchar(15) NOT NULL,

Ssn char(9) NOT NULL,

Bdate date DEFAULT NULL,

Address varchar(30) DEFAULT NULL,

Sex char(1) DEFAULT NULL,

Salary decimal(10,2) DEFAULT NULL,

Super_ssn char(9) DEFAULT NULL,

Dno int(11) NOT NULL,

PRIMARY KEY ( Ssn )

);

Step2:

Insert the data into the EMPLOYEE table using INSERT command.

INSERT INTO EMPLOYEE VALUES ('James', 'E', 'Borg', '888665555', DATE '1937-11-10', '450
Stone, Houston, TX', 'M', 55000, NULL, 1);

INSERT INTO EMPLOYEE VALUES ('Jennifer', 'S', 'Wallace', '987654321', DATE '1941-06-20',
'291 Berry, Bellaire, Tx', 'F', 37000, '888665555', 4);

INSERT INTO EMPLOYEE VALUES ('Franklin', 'T', 'Wong', '333445555', DATE '1955-12-08',
'638 Voss, Houston, TX', 'M', 40000, '888665555', 5);

INSERT INTO EMPLOYEE VALUES ('John', 'B', 'Smith', '123456789', DATE '1965-01-09', '731
Fondren, Houston, TX', 'M', 30000, '333445555', 5);
INSERT INTO EMPLOYEE VALUES ('Alicia', 'J', 'Zelaya', '999887777', DATE '1968-01-19', '3321
castle, Spring, TX', 'F', 25000, '987654321', 4);

INSERT INTO EMPLOYEE VALUES ('Ramesh', 'K', 'Narayan', '666884444', DATE '1920-09-15',
'975 Fire Oak, Humble, TX', 'M', 38000, '333445555', 5);

INSERT INTO EMPLOYEE VALUES ('Joyce', 'A', 'English', '453453453', DATE '1972-07-31',
'5631 Rice, Houston, TX', 'F', 25000, '333445555', 5);

INSERT INTO EMPLOYEE VALUES ('Ahmad', 'V', 'Jabbar', '987987987', DATE '1969-03-29',
'980 Dallas, Houston, TX', 'M', 22000, '987654321', 4);

INSERT INTO EMPLOYEE VALUES ('Melissa', 'M', 'Jones', '808080808', DATE '1970-07-10',
'1001 Western, Houston, TX', 'F', 27500, '333445555', 5);

Step3:

Now, select the EMPLOYEE table to display all the rows.

select * from EMPLOYEE;

Sample Output:

Comment

Step 2 of 4

The SQL statements to create a table EMPLOYEE_BACKUP that stores the backup data of the EMPLOYEE table are as follows:

The SQL statement to create the EMPLOYEE_BACKUP table:

CREATE TABLE EMPLOYEE_BACKUP LIKE EMPLOYEE;

Explanation:

• The SQL statement will create the table EMPLOYEE_BACKUP with the same structure as the
table EMPLOYEE.

• CREATE TABLE is the command to create a table.

• LIKE is the keyword used to copy the structure of the table EMPLOYEE.
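Alternatively, many DBMSs allow the structure and the data to be copied in a single statement (the exact syntax varies by system):

CREATE TABLE EMPLOYEE_BACKUP AS (SELECT * FROM EMPLOYEE);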

Comment

Step 3 of 4

The SQL statement to insert the data into the EMPLOYEE_BACKUP:

INSERT INTO EMPLOYEE_BACKUP (SELECT * FROM EMPLOYEE);

Explanation:

• The SQL statement inserts all the data from the table EMPLOYEE into the table EMPLOYEE_BACKUP.

Comment

Step 4 of 4

SELECT * FROM EMPLOYEE will fetch all the records from the table EMPLOYEE.

Sample Output:

Comment
Chapter 7, Problem 1RQ

Problem

Describe the six clauses in the syntax of an SQL retrieval query. Show what type of constructs
can be specified in each of the six clauses. Which of the six clauses are required and which are
optional?

Step-by-step solution

Step 1 of 3

A query in SQL consists of up to six clauses. The clauses are specified in following order.

• SELECT < attribute list >

• FROM < table list >

• [ WHERE < condition > ]

• [ GROUP BY < grouping attributes (S) > ]

• [ HAVING < group condition > ]

• [ ORDER BY < attribute list > ]

Comment

Step 2 of 3

The SELECT clause lists the attributes or functions whose values are to be returned by the query.

The FROM clause specifies the tables (and joined tables) from which the data is to be retrieved.

The WHERE clause is a conditional clause; it restricts the tuples retrieved to those satisfying the condition.

The GROUP BY clause groups the result of the query according to the specified grouping attributes.

The HAVING clause restricts the groups formed by the GROUP BY clause to those satisfying a group condition.

The ORDER BY clause sorts the values returned by the query in a specified order.

Comment

Step 3 of 3

The SELECT and FROM clauses are the required clauses and the clauses like WHERE,
GROUP BY, HAVING and ORDER BY are optional clauses.
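A sketch on the COMPANY schema used elsewhere in the text illustrates all six clauses at once:

SELECT Dno, COUNT (*) AS Emp_count -- attributes and aggregate functions
FROM EMPLOYEE                      -- the table(s) involved, possibly joined
WHERE Salary > 20000               -- tuple selection and join conditions
GROUP BY Dno                       -- grouping attributes
HAVING COUNT (*) > 2               -- condition on the groups
ORDER BY Emp_count DESC;           -- sort order of the result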

Comment
Chapter 7, Problem 2RQ

Problem

Describe conceptually how an SQL retrieval query will be executed by specifying the conceptual
order of executing each of the six clauses.

Step-by-step solution

Step 1 of 1

A retrieval query in SQL can consist of up to six clauses, but only the first two, SELECT and FROM, are mandatory. The clauses are specified in the following order, with the clauses between square brackets [...] being optional:

SELECT

FROM

[WHERE]

[GROUP BY]

[HAVING]

[ORDER BY ]

The SELECT clause lists the attributes or functions to be retrieved. The FROM clause specifies all relations needed in the query, including joined relations, but not those in nested queries. The WHERE clause specifies the conditions for selecting tuples from these relations, including join conditions if needed. GROUP BY specifies grouping attributes, and HAVING specifies a condition on the groups being selected rather than on individual tuples. ORDER BY specifies an order for displaying the result of a query.

A query is evaluated conceptually by first applying the FROM clause, followed by the WHERE clause, and then GROUP BY and HAVING. ORDER BY is applied at the end to sort the query result. Finally, the values of the attributes specified in the SELECT clause are shown in the result.
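The conceptual order can be annotated on a sketch query over the EMPLOYEE table (attribute names assumed from the COMPANY schema):

SELECT Dno, AVG (Salary)  -- 5. project the output values
FROM EMPLOYEE             -- 1. form the source table
WHERE Salary > 20000      -- 2. filter individual tuples
GROUP BY Dno              -- 3. form the groups
HAVING COUNT (*) > 1      -- 4. filter the groups
ORDER BY Dno;             -- 6. sort the final result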

Comment
Chapter 7, Problem 3RQ

Problem

Discuss how NULLs are treated in comparison operators in SQL. How are NULLs treated when
aggregate functions are applied in an SQL query? How are NULLs treated if they exist in
grouping attributes?

Step-by-step solution

Step 1 of 1

In SQL, NULL is treated as an UNKNOWN value, and SQL uses a three-valued logic with the truth values TRUE, FALSE, and UNKNOWN.

For comparison operators in SQL, a NULL can only be tested with the IS NULL or IS NOT NULL operator. SQL considers each NULL to be distinct from every other value, so =, <, and > cannot be used to compare NULLs; such comparisons evaluate to UNKNOWN.

In general, NULL values are discarded when aggregate functions are applied to a particular column.

If NULLs exist in the grouping attribute, then a separate group is created for all tuples with a NULL value in the grouping attribute.
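A brief sketch on the EMPLOYEE table shows the practical consequence:

SELECT Fname, Lname
FROM EMPLOYEE
WHERE Super_ssn IS NULL;    -- correct test for NULL

-- WHERE Super_ssn = NULL would evaluate to UNKNOWN for every tuple
-- and therefore return no rows.

SELECT COUNT (Super_ssn) FROM EMPLOYEE;  -- NULLs in the column are not counted
SELECT COUNT (*) FROM EMPLOYEE;          -- counts all tuples, NULLs included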

Comment
Chapter 7, Problem 4RQ

Problem

Discuss how each of the following constructs is used in SQL, and discuss the various options for
each construct. Specify what each construct is useful for.

a. Nested queries

b. Joined tables and outer joins

c. Aggregate functions and grouping

d. Triggers

e. Assertions and how they differ from triggers

f. The SQL WITH clause

g. SQL CASE construct

h. Views and their updatability

i. Schema change commands

Step-by-step solution

Step 1 of 11

a.

Nested Queries:

A nested query is an SQL query that appears inside another SQL query, typically within its WHERE clause. It is also known as a subquery or inner query.

Options:

It can be used with the SELECT, INSERT, UPDATE, and DELETE statements. These statements
are used with the operators <, >, <=, >=, =, IN, BETWEEN.

SYNTAX:

For example, to get the employee ids of all employees who work in the same department as the employee with salary 35000 (using the COMPANY schema):

SELECT * FROM EMPLOYEE

WHERE Dno IN (SELECT Dno FROM EMPLOYEE WHERE Salary = 35000);

Use:

It is used to return values after comparison from the selected values.

Comment

Step 2 of 11

b.

Joined Tables:

A joined table is the resultant table generated by an inner join, an outer join, or a cross join.

Uses of Joined Tables:

A joined table can be used in any context where a SELECT statement is used.

Outer Join:

Types of outer join:

1) Left outer join: when a left outer join is applied, all rows of the left table are returned, together with the matching rows of the right table; non-matching left rows are padded with NULLs. It is denoted by the symbol ⟕.

Syntax:

SELECT column FROM table_A LEFT JOIN table_B ON table_A.column_1 = table_B.column_2;

2) Right outer join: when a right outer join is applied, all rows of the right table are returned, together with the matching rows of the left table. It is denoted by the symbol ⟖.

Syntax:

SELECT column FROM table_A RIGHT JOIN table_B ON table_A.column_1 = table_B.column_2;

3) Full outer join: when a full outer join is applied, all rows from both the left and the right table are returned, matched where possible. It is denoted by the symbol ⟗.

Syntax:

SELECT column FROM table_A FULL OUTER JOIN table_B ON table_A.column_1 = table_B.column_2;

Options:

It is used with the SELECT, FROM and ON clauses.

Use:

A join is used to produce a result table by combining rows of two different tables.

Comment

Step 3 of 11

c.

Aggregate Functions:

An aggregate function takes multiple input values from a column and produces a single summary value as output.

Common aggregate functions are: AVG, COUNT, MAX, MIN, and SUM (some systems also provide FIRST and LAST).

Option:

It can be used with the SELECT and FROM clauses.

Use:

It is used to compute mathematical summaries of data easily.

Grouping:

In many cases the aggregate functions must be applied to subgroups of tuples in a relation, where the subgroups depend on some attribute values. Applying the GROUP BY clause partitions the table into such groups.

Syntax for using the GROUP BY clause:

SELECT column_name, function (column_name)

FROM table_name

WHERE column_name operator value

GROUP BY column_name;

Options:

It can be used with the SELECT, FROM and WHERE clauses.

Use:

The GROUP BY clause is applied when the table must be divided into groups according to attribute values.
Comment

Step 4 of 11

d.

Triggers:

A database trigger is procedural code that automatically executes (fires) when a specified event (INSERT, DELETE, or UPDATE) occurs.

Syntax for a trigger (a generic skeleton; the exact form varies by DBMS):
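CREATE TRIGGER trigger_name

{ BEFORE | AFTER } { INSERT | DELETE | UPDATE [OF column_name] } ON table_name

[REFERENCING OLD ROW AS old, NEW ROW AS new]

[FOR EACH { ROW | STATEMENT }]

[WHEN (condition)]

triggered_action;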

Options:

It can be used with the INSERT, DELETE and UPDATE statements.

Use:

Triggers can be used for the following purposes:

1. To maintain derived column values automatically.

2. To enforce security authorizations.

3. To prevent invalid transactions.

Comment

Step 5 of 11

e.

Assertions:

An assertion is an expression that should always be true. Once the assertion is created, the DBMS checks it after any change that may violate the expression.

Syntax for Assertions:

CREATE ASSERTION assertion_name CHECK (predicate);

A predicate always returns a result of either true or false.

Option:

It can be used with the CREATE, CHECK and FROM statements.

Use:

It is used to specify a constraint on the database state, possibly involving several tables, that must always hold.
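For example, on the COMPANY schema used elsewhere in the text, an assertion stating that no employee may earn more than the manager of his or her department can be written as follows (note that few DBMSs actually accept CREATE ASSERTION):

CREATE ASSERTION SALARY_CONSTRAINT

CHECK (NOT EXISTS (SELECT * FROM EMPLOYEE E, EMPLOYEE M, DEPARTMENT D

WHERE E.Salary > M.Salary AND E.Dno = D.Dnumber AND D.Mgr_ssn = M.Ssn));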

The following points show the difference between ASSERTIONS and TRIGGERS:

• Assertions only check conditions; they do not modify data. Triggers check the condition and, if required, also change the data.

• An assertion is linked neither to a particular table nor to a particular event in the database. A trigger is linked to both a particular table and a particular event in the database.

• Every assertion can be implemented as a trigger, but not all triggers can be implemented as assertions.

• The Oracle database does not implement assertions, but it does implement triggers.

Comment

Step 6 of 11

f.

The SQL WITH clause:

This clause was introduced as a convenience in SQL:1999 and was added to Oracle's SQL syntax in Oracle 9.2; it may not be available in all SQL-based DBMSs. It allows the user to define a table that is used only in a particular query, somewhat like creating a view that is used in a single query and then dropped.

Syntax for the SQL WITH clause:

WITH temp_table AS (subquery)

SELECT column_name

FROM temp_table

WHERE condition

GROUP BY column_name;

Option:

It can be used with the SELECT, FROM, WHERE and GROUP BY statements.

Use:

It can be used to structure a complex statement as several simpler ones: breaking down complex SQL queries in this way makes them easier to debug and process.
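A sketch on the COMPANY schema (the department-size and salary thresholds here are arbitrary):

WITH BIGDEPTS (Dno) AS

(SELECT Dno FROM EMPLOYEE GROUP BY Dno HAVING COUNT (*) > 5)

SELECT Dno, COUNT (*)

FROM EMPLOYEE

WHERE Salary > 40000 AND Dno IN (SELECT Dno FROM BIGDEPTS)

GROUP BY Dno;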

Comment

Step 7 of 11

g.

SQL CASE construct:

The SQL CASE construct plays the role that if-else-then plays in Java. It is used when a value must be computed differently depending on a particular condition, and it can appear in any SQL query where such a conditional value has to be produced.

Syntax of the SQL CASE construct:

CASE expression

WHEN condition_a THEN result_1

WHEN condition_b THEN result_2

WHEN condition_c THEN result_3

ELSE result

END;
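A sketch on the COMPANY schema, giving different raises depending on the department (the amounts are illustrative):

UPDATE EMPLOYEE

SET Salary = CASE WHEN Dno = 5 THEN Salary + 2000

WHEN Dno = 4 THEN Salary + 1500

ELSE Salary + 0

END;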

Comment

Step 8 of 11

Option:

It can be used with the SELECT and FROM statement.

Comment

Step 9 of 11

Use:

It is used to compute a value, or perform an update, whose result depends on which condition holds.

Comment

Step 10 of 11

h.

Views and their updatability:

A view is a virtual table derived from other tables, called the base tables. The base tables physically exist, and their tuples are stored in the database.

Syntax for creating view:

CREATE VIEW virtual table

AS SELECT attributes

FROM different tables

WHERE conditions;

The CREATE VIEW statement names the view; the AS SELECT part defines the attributes that make up the virtual table; the FROM clause names the tables from which the attributes are extracted; and the WHERE clause gives the condition that the rows of the virtual table must satisfy.

Option:

It can be used with the AS SELECT, FROM, and WHERE clauses.

Use:

A view is created when a derived table needs to be referenced frequently. A view defined on a single base table that contains the table's primary key is generally updatable; views defined with joins, grouping, or aggregate functions are generally not updatable.
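For example, a frequently referenced view could be defined as follows (a sketch; the attribute names come from the COMPANY schema used elsewhere in the text):

CREATE VIEW WORKS_ON1

AS SELECT Fname, Lname, Pname, Hours

FROM EMPLOYEE, PROJECT, WORKS_ON

WHERE Ssn = Essn AND Pnumber = Pno;

Because this view joins three tables, it is an example of a view that is not updatable.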

Comment

Step 11 of 11

i.

Schema change Commands:

Schema change commands are used in SQL to alter a schema by adding or dropping attributes, tables, constraints, and other schema elements. This can be done while the database is operational and does not require recompilation of the database schema.

The different Schema change Commands are as follows:

• The drop command

• The alter command

DROP command:

The DROP command can be used to drop named schema elements, such as tables, domains, and constraints. The whole schema can be dropped with the DROP SCHEMA command.

Syntax of drop command:

DROP SCHEMA employee CASCADE;

ALTER command:

The schema can be changed with the help of the ALTER command, for example by renaming a column or by adding or dropping attributes.

Syntax of alter command:

ALTER TABLE employee ADD COLUMN phone_no VARCHAR (15);

Use:

It can be used to change the schema or to drop the schema.

Comment
Chapter 7, Problem 5E

Problem

Specify the following queries on the database in Figure 5.5 in SQL. Show the query results if
each query is applied to the database state in Figure 5.6.

a. For each department whose average employee salary is more than $30,000, retrieve the
department name and the number of employees working for that department.

b. Suppose that we want the number of male employees in each department making more than
$30,000, rather than all employees (as in Exercise a). Can we specify this query in SQL? Why or
why not?

Step-by-step solution

Step 1 of 2

a)

The query to retrieve the department name and the number of employees, for each department whose average employee salary is greater than $30,000, is as follows:

Query:

SELECT Dname, COUNT(*) FROM DEPARTMENT, EMPLOYEE


WHERE DEPARTMENT.Dnumber=EMPLOYEE.DNo

GROUP BY Dname

HAVING AVG(Salary) > 30000;

Output:

Explanation:

• SELECT is used to query the database and get back the specified fields.

o Dname is an attribute of the DEPARTMENT table.

• FROM is used to query the database and get back the preferred information by specifying the
table name.

o EMPLOYEE and DEPARTMENT are table names.

• WHERE is used to specify a condition based on which the data is to be retrieved.

The conditions are as follows:

o DEPARTMENT.Dnumber=EMPLOYEE.DNo

• GROUP BY is used to group the result of a SELECT statement done on a table where the tuple
values are similar for more than one column.

o Dname is the group by attribute.

• HAVING clause is used to specify the condition based on group by function.

o AVG(Salary) > 30000 is the condition.

• COUNT(*) is used to count the number of tuples that satisfy the conditions.

Comment

Step 2 of 2

(b)

Strictly speaking, this cannot be combined with the average-salary condition of part (a) in one basic SQL block: filtering with WHERE (for example, Sex = 'M') removes tuples before grouping, so AVG(Salary) would no longer be computed over all employees of the department; a nested query would be required for that. The following query retrieves, for each department, the number of male employees making more than $30,000:

Query:

SELECT Dname, COUNT(*) FROM DEPARTMENT, EMPLOYEE

WHERE DEPARTMENT.Dnumber=EMPLOYEE.DNo

AND Sex='M'

AND Salary > 30000

GROUP BY Dname;

Output:

Explanation:

• SELECT is used to query the database and get back the specified fields.

o Dname is an attribute of the DEPARTMENT table.

• FROM is used to query the database and get back the preferred information by specifying the
table name.

o EMPLOYEE and DEPARTMENT are table names.

• WHERE is used to specify a condition based on which the data is to be retrieved.

The conditions are as follows:

o DEPARTMENT.Dnumber=EMPLOYEE.DNo

o Sex='M'

o Salary > 30000

• GROUP BY is used to group the result of a SELECT statement done on a table where the tuple
values are similar for more than one column.

o Dname is the group by attribute.

Comments (1)
Chapter 7, Problem 6E

Problem

Specify the following queries in SQL on the database schema in Figure 1.2.

a. Retrieve the names and major departments of all straight-A students (students who have a
grade of A in all their courses).

b. Retrieve the names and major departments of all students who do not have a grade of A in
any of their courses.

Step-by-step solution

Step 1 of 2

a.

The query to retrieve the names and major departments of the students who got A grade in all
the courses is as follows:

Query:

SELECT Name, Major FROM STUDENT

WHERE NOT EXISTS (SELECT * FROM GRADE_REPORT

WHERE Student_number= STUDENT.Student_number

AND NOT (GRADE='A'));

Explanation:

• SELECT is used to query the database and get back the specified fields.

o Name, Major are columns of STUDENT table.

• FROM is used to query the database and get back the preferred information by specifying the
table name.

o STUDENT is a table name.

• WHERE is used to specify a condition based on which the data is to be retrieved.

• The inner query retrieves the details of the student who got other than A grade for any courses.

• The outer query retrieves the name and major of the student who got A grade for all courses.

• NOT EXISTS is used to retrieve only those students which are not retrieved by inner query.

Output:
Comment

Step 2 of 2

b.

The query to retrieve the names and major departments of the students who do not have a grade of A in any of their courses is as follows:

Query:

SELECT Name, Major FROM STUDENT

WHERE NOT EXISTS (SELECT * FROM GRADE_REPORT

WHERE Student_number= STUDENT.Student_number

AND (GRADE= 'A'));

Explanation:

• SELECT is used to query the database and get back the specified fields.

o Name, Major are columns of STUDENT table.

• FROM is used to query the database and get back the preferred information by specifying the
table name.

o STUDENT is a table name.

• WHERE is used to specify a condition based on which the data is to be retrieved.

• The inner query retrieves the details of the student who got A grade for any courses.

• The outer query retrieves the name and major of the student who did not get A grade for any
courses.

• NOT EXISTS is used to retrieve only those students which are not retrieved by inner query.

Output:

Comment
Chapter 7, Problem 7E

Problem

In SQL, specify the following queries on the database in Figure 5.5 using the concept of nested
queries and other concepts described in this chapter.

a. Retrieve the names of all employees who work in the department that has the employee with
the highest salary among all employees.

b. Retrieve the names of all employees whose supervisor’s supervisor has ‘888665555’ for Ssn.

c. Retrieve the names of employees who make at least $10,000 more than the employee who is
paid the least in the company.

Step-by-step solution

Step 1 of 4

SQL:

Structured Query Language (SQL) is a database language for managing and accessing the data
in a relational database.

• SQL consists of queries to insert, update, delete, and retrieve records from a database. It even
creates a new database and database table.

Nested query:

Some queries require existing values to be fetched from the database and then used in a comparison condition. Such a query is called a nested query: a complete "select-from-where" block appears inside the WHERE clause of another query, which is referred to as the outer query.
The format of “ select ” statement is:

SELECT attribute-list FROM table-list WHERE condition

o Here, “SELECT”, “FROM”, and “WHERE” are the keywords.

o “attribute-list” is the list of attributes.

• To retrieve all the attributes of a table, instead of giving all attributes in the table, asterisk (*) can
be used.

o “table-list” is the list of tables.

o Condition is optional.

Comment

Step 2 of 4

a)

Query:

SELECT LNAME FROM EMPLOYEE WHERE DNO = (SELECT DNO FROM

EMPLOYEE WHERE SALARY = (SELECT MAX(SALARY) FROM EMPLOYEE) )

Explanation:

The first nested (outer) query selects all employee names. While the second query selects
department number with the employee of highest salary among all the employees.

Comment

Step 3 of 4

b)
Query:

SELECT LNAME FROM EMPLOYEE WHERE SUPERSSN IN (SELECT SSN

FROM EMPLOYEE WHERE SUPERSSN = ‘888665555’)

Explanation:

The first nested (outer) query selects the employee names where the supervisor’s supervisor
serial number in the second query matches with the number “888665555”.

Comments (1)

Step 4 of 4

c)

Query:

SELECT LNAME FROM EMPLOYEE WHERE SALARY >= 10000 + ( SELECT MIN(SALARY)
FROM EMPLOYEE)

Explanation:

The outer query selects the names of employees whose salary is at least $10,000 more than the minimum salary in the company, which the inner query computes.

Comment
Chapter 7, Problem 8E

Problem

Specify the following views in SQL on the COMPANY database schema shown in Figure 5.5.

a. A view that has the department name, manager name, and manager salary for every
department

b. A view that has the employee name, supervisor name, and employee salary for each
employee who works in the ‘Research’ department

c. A view that has the project name, controlling department name, number of employees, and
total hours worked per week on the project for each project

d. A view that has the project name, controlling department name, number of employees, and
total hours worked per week on the project for each project with more than one employee
working on it

Step-by-step solution

Step 1 of 4

a.

A view that has the department name along with the name and salary of the manager for every
department is as follows:

CREATE VIEW MANAGER_INFORMATION

AS SELECT Dname, Fname AS Manager_First_name, Salary

FROM DEPARTMENT, EMPLOYEE

WHERE DEPARTMENT.Mgr_ssn = EMPLOYEE.Ssn;

Explanation:

• CREATE VIEW will create a view with the MANAGER_INFORMATION.

• SELECT is used to query the database and get back the specified fields.

o Dname is an attribute of DEPARTMENT table.

o Fname and Salary are attributes of EMPLOYEE table.

• FROM is used to query the database and get back the preferred information by specifying the
table name.

o DEPARTMENT, EMPLOYEE are table names.

• WHERE is used to specify a condition based on which the data is to be retrieved.

o DEPARTMENT.Mgr_ssn = EMPLOYEE.Ssn is the condition.

Comment

Step 2 of 4

b.

A view that has the employee name, supervisor name and employee salary for each employee
who works in the Research department is as follows:

CREATE VIEW EMPLOYEE_INFORMATION

AS SELECT e.Fname AS Employee_first_name,

e.Minit AS Employee_middle_init,

e.Lname AS Employee_last_name,
s.Fname AS Manager_fname,

s.Minit AS Manager_minit,

s.Lname AS Manager_Lname, Salary

FROM EMPLOYEE AS e, EMPLOYEE AS s,

DEPARTMENT AS d

WHERE e.Super_ssn = s.Ssn

AND e.Dno = d.Dnumber

AND d.Dname = 'Research';

Explanation:

• CREATE VIEW will create a view with the EMPLOYEE_INFORMATION.

• SELECT is used to query the database and get back the specified fields.

o Dname is an attribute of DEPARTMENT table.

o Fname, Lname, Minit and Salary are attributes of EMPLOYEE table.

• FROM is used to query the database and get back the preferred information by specifying the
table name.

o DEPARTMENT, EMPLOYEE are table names.

o e, s are the alias names of EMPLOYEE table.

o d is alias name of DEPARTMENT table.

• WHERE is used to specify a condition based on which the data is to be retrieved. The
conditions specified in the query are

o e.Super_ssn = s.Ssn checks

o e.Dno = d.Dnumber

o d.Dname = 'Research'

Comment

Step 3 of 4

c.

A view that has the project name, controlling department name, number of employees, and total
hours worked per week on the project is as follows:

CREATE VIEW PROJECT_INFORMATION

AS SELECT Pname, Dname, COUNT(WO.Essn), SUM(WO.Hours)

FROM PROJECT AS P, WORKS_ON AS WO,

DEPARTMENT AS D

WHERE P.Dnum = D.Dnumber

AND P.Pnumber = WO.Pno

GROUP BY Pname, Dname;

Explanation:

• CREATE VIEW will create a view with the PROJECT_INFORMATION.

• SELECT is used to query the database and get back the specified fields.

o Dname is an attribute of DEPARTMENT table.

o Pname is an attribute of PROJECT table.

o Essn and Hours are attributes of WORKS_ON table.

• FROM is used to query the database and get back the preferred information by specifying the
table name.

o DEPARTMENT, EMPLOYEE and WORKS_ON are table names.

o P is the alias name for PROJECT table.

o D is alias name of DEPARTMENT table.

o WO is alias name of WORKS_ON table.

• WHERE is used to specify a condition based on which the data is to be retrieved. The
conditions specified in the query are

o P.Dnum = D.Dnumber

o P.Pnumber = WO.Pno

• GROUP BY is used to group the result of a SELECT statement done on a table where the tuple
values are similar for more than one column.

o Pname and Dname are the group by attributes.

Comment

Step 4 of 4

d.

The following is the view that has the project name, controlling department name, number of
employees, and total hours worked per week on the project for each project with more than one
employee working on it.

CREATE VIEW PROJECT_INFO

AS SELECT Pname, Dname, COUNT(WO.Essn), SUM(WO.Hours)

FROM PROJECT AS P, WORKS_ON AS WO,

DEPARTMENT AS D
WHERE P.Dnum = D.Dnumber

AND P.Pnumber = WO.Pno

GROUP BY Pname, Dname

HAVING COUNT(WO.Essn) > 1;

Explanation:

• CREATE VIEW will create a view with the PROJECT_INFO.

• SELECT is used to query the database and get back the specified fields.

o Dname is an attribute of DEPARTMENT table.

o Pname is an attribute of PROJECT table.

o Essn and Hours are attributes of WORKS_ON table.

• FROM is used to query the database and get back the preferred information by specifying the
table name.

o DEPARTMENT, EMPLOYEE and WORKS_ON are table names.

o P is the alias name for PROJECT table.

o D is alias name of DEPARTMENT table.

o WO is alias name of WORKS_ON table.

• WHERE is used to specify a condition based on which the data is to be retrieved. The
conditions specified in the query are

o P.Dnum = D.Dnumber

o P.Pnumber = WO.Pno

• GROUP BY is used to group the result of a SELECT statement done on a table where the tuple
values are similar for more than one column.

o Pname and Dname are the group by attributes.

• HAVING clause is used to specify the condition based on group by function.

o COUNT(WO.Essn) > 1 is the condition.

Comment
Chapter 7, Problem 9E

Problem

Consider the following view, DEPT_SUMMARY, defined on the COMPANY database in Figure
5.6:

CREATE VIEW DEPT_SUMMARY (D, C, Total_s, Average_s) AS SELECT Dno, COUNT (*), SUM (Salary), AVG (Salary) FROM EMPLOYEE GROUP BY Dno;

State which of the following queries and updates would be allowed on the view. If a query or
update would be allowed, show what the corresponding query or update on the base relations
would look like, and give its result when applied to the database in Figure 5.6.

a.

SELECT * FROM DEPT_SUMMARY;

b.

SELECT D,C FROM DEPT_SUMMARY WHERE TOTAL_S > 100000;

c.

SELECT D, AVERAGE_S FROM DEPT_SUMMARY WHERE C > ( SELECT C FROM DEPT_SUMMARY WHERE D = 4 );

d.

UPDATE DEPT_SUMMARY SET D=3 WHERE D = 4;

e.

DELETE FROM DEPT_SUMMARY WHERE C > 4;

Step-by-step solution

Step 1 of 5

a) Allowed

D C Total_s Average_s

5 4 133000 33250
4 3 93000 31000

1 1 55000 55000

Comments (1)

Step 2 of 5

b) Allowed

D C

5 4

Comment

Step 3 of 5

c) Allowed

D Average_s

5 33250

Comment

Step 4 of 5

d) Not allowed, because the view attributes are computed with aggregate functions, so an update on them cannot be mapped unambiguously to updates on the base relation.

Comment

Step 5 of 5

e) Not allowed, because a deletion from the view is ambiguous: there is no unique set of base-relation deletions that corresponds to it.

Comment
Chapter 8, Problem 1RQ

Problem

List the operations of relational algebra and the purpose of each.

Step-by-step solution

Step 1 of 6

The operations of relational algebra are as follows:

• SELECT

• PROJECT

• THETA JOIN

• EQUI JOIN

• NATURAL JOIN

• UNION

• INTERSECTION

• MINUS or DIFFERENCE

• CARTESIAN PRODUCT

• DIVISION

Comment

Step 2 of 6

SELECT operation:

• It is used to obtain a subset of tuples of a relation based on a condition. In other words, it retrieves only those tuples that satisfy the condition.

• The symbol used to denote the SELECT operation is σ.

• The notation of the SELECT operation is σ<condition>(R).

• σJob = 'Clerk'(Employee) retrieves the tuples from relation Employee whose job is clerk.

PROJECT operation:

• It is used to obtain certain attributes/columns of a relation. The attributes to be retrieved must be specified as a list separated by commas.

• The symbol used to denote the PROJECT operation is π.

• The notation of the PROJECT operation is π<attribute list>(R).

• πLname, Fname, Empno(Employee) retrieves only the employee's last name, first name and employee number of all employees in relation Employee.

Comment

Step 3 of 6

THETA JOIN operation:

• The THETA JOIN operation combines related tuples from two relations and outputs each combination as a single tuple.

• The symbol used to denote the THETA JOIN operation is ⋈θ.

• The notation of the THETA JOIN between the relations R and S is given as R ⋈θ S.

EQUI JOIN operation:

• An EQUIJOIN operation combines all the tuples of relations R and S that satisfy the join condition, where the only comparison operator used is =.

• The notation of the EQUI JOIN between the relations R and S is given as R ⋈<join condition> S, with every comparison in the condition using =.

Comment

Step 4 of 6

NATURAL JOIN operation:

• It is similar to EQUIJOIN. The only difference is that the duplicate join attributes of relation S are not included in the resultant relation.

• The notation of the NATURAL JOIN between the relations R and S is given as R * S.

UNION operation:

• When the UNION operation is applied on relations R and S, the resultant relation consists of all the tuples in relation R or S or both R and S.

• If similar tuples are in both R and S relations, then only one copy of the tuple appears in the resultant relation.

• The UNION operation can be applied on relations R and S only if the relations are union compatible.

• The symbol used to denote the UNION operation is ∪.

• The notation of the UNION between the relations R and S is given as R ∪ S.

Comment

Step 5 of 6

INTERSECTION operation:

• When the INTERSECTION operation is applied on relations R and S, the resultant relation consists of only the tuples that are in both R and S.

• The symbol used to denote the INTERSECTION operation is ∩.

• The notation of the INTERSECTION between the relations R and S is given as R ∩ S.

MINUS or DIFFERENCE operation:

• When the DIFFERENCE operation is applied on relations R and S, the resultant relation consists of only the tuples that are in R but not in S.

• The symbol used to denote the DIFFERENCE operation is −.

• The notation of the DIFFERENCE between the relations R and S is given as R − S.

Comment

Step 6 of 6

CARTESIAN PRODUCT operation:

• When the CARTESIAN PRODUCT operation is applied on relations R and S, the resultant relation consists of all the attributes of relations R and S along with all possible combinations of the tuples of R and S.

• The symbol used to denote the CARTESIAN PRODUCT operation is ×.

• The notation of the CARTESIAN PRODUCT between the relations R and S is given as R × S.

DIVISION operation:

• Applied as R(Z) ÷ S(X), where X ⊆ Z, it produces a relation T(Y) with Y = Z − X; a tuple t appears in T only if t appears in R in combination with every tuple of S.

• The symbol used to denote the DIVISION operation is ÷.

• The notation of the DIVISION between R and S is given as R ÷ S.

Comments (1)
Chapter 8, Problem 2RQ

Problem

What is union compatibility? Why do the UNION, INTERSECTION, and DIFFERENCE


operations require that the relations on which they are applied be union compatible?

Step-by-step solution

Step 1 of 2

Union compatibility: Two relations are said to be union compatible if both relations have the same number of attributes and the domains of corresponding attributes are the same.
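Formally, R1(A1, A2, ..., An) and R2(B1, B2, ..., Bn) are union compatible if they have the same degree n and dom(Ai) = dom(Bi) for 1 <= i <= n.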

Comment

Step 2 of 2

The UNION, INTERSECTION and DIFFERENCE operations require that the relations on which they are applied be union compatible because all these operations are binary set operations: the tuples of the two relations are compared directly, so they must have the same number of attributes, and the domains of corresponding attributes must be the same.

Comment
Chapter 8, Problem 3RQ

Problem

Discuss some types of queries for which renaming of attributes is necessary in order to specify
the query unambiguously.

Step-by-step solution

Step 1 of 1

When a query uses a NATURAL JOIN operation, renaming the join (foreign key) attribute is necessary if its name is not already the same in both relations, since NATURAL JOIN matches attributes by name. In an EQUIJOIN, after the operation is performed, two attributes (the ones compared in the join condition) have identical values in every tuple; in a NATURAL JOIN one of them is removed, so only a single attribute remains.

DIVISION is another such operation: division takes place on the basis of common attributes, so the attribute names must be the same in both relations.
Comment
Chapter 8, Problem 4RQ

Problem

Discuss the various types of inner join operations. Why is theta join required?

Step-by-step solution

Step 1 of 6

Various types of inner join operations:

When data from multiple relations is combined so that the related information can be presented in a single table, the operation is known as an inner join.

Inner join operations are two types. They are:

• EQUI JOIN operations

• NATURAL JOIN operations

Comment

Step 2 of 6

EQUI JOIN operation:

• In this operation, all comparisons in the join condition use the equality operator; a join in which = is the only comparison operator used is called an EQUIJOIN.

• The result of an EQUIJOIN always has one or more pairs of attributes that have identical values in every tuple.

Example syntax:

table_expression [INNER] JOIN table_expression

{ON boolean_expression}

Comment

Step 3 of 6

NATURAL JOIN operation:

• Because one of each pair of attributes with identical values is superfluous, the NATURAL JOIN operation, denoted by *, was created to get rid of the second (superfluous) attribute in an EQUIJOIN condition.

Definition:

• The standard definition of the NATURAL JOIN operation requires that each pair of join attributes

Comment

Step 4 of 6

has the same name in both relations.

• If this is not the case, a RENAME operation is applied first.

Example syntax:

R * S

Comment

Step 5 of 6

Theta join operation:

• The theta join is the most general form of the JOIN operation; the join condition may consist of any comparisons, not just equalities of shared attributes.

• When tuples from two relations must be combined by a condition that is not a simple equality of shared attributes, it is convenient for the JOIN operation to have this more general form.

• The symbol used to represent the comparison operator in a theta join is θ.

• The θ-JOIN operation is a binary operation. It is denoted as

R ⋈ Ai θ Bj S

Where,

Ai is an attribute of relation R,

Bj is an attribute of relation S,

• Ai and Bj have the same domain, and θ is one of the comparison operators {=, <, ≤, >, ≥, ≠}.

• Tuples whose join attributes are NULL, or for which the join condition evaluates to FALSE, do not appear in the result.

• So, joining the two relations results in a subset of their Cartesian product: the subset determined by the join condition.

Example syntax:

The result of PROFESSIONS ⋈ Job = Career CAREERS is shown below.

Name Job Career Pays

Haney Mechanic Mechanic 6,500

David Archaeologist Archaeologist 40,000

Comment

Step 6 of 6

John Doctor Doctor 50,000

Comment
Chapter 8, Problem 5RQ

Problem

What role does the concept of foreign key play when specifying the most common types of
meaningful join operations?

Step-by-step solution

Step 1 of 3

A foreign key is a column or a combination of columns whose values match the primary key of another table; it is used to maintain a relationship between two tables.

• A foreign key is mainly used for establishing relationship between two tables.

• A table can have more than one foreign key.

Comment

Step 2 of 3

The JOIN operation is used to combine related tuples from two relations into a single tuple.

• In order to perform JOIN operation, there should exist relationship between two tables.

• The relationship is maintained through the concept of foreign key.

• If there is no foreign key, then JOIN operation may not lead to meaningful results.

Hence, a foreign key concept is needed to establish relationship between two tables.

Comment

Step 3 of 3

Example:

Consider the following relational database.

EMPLOYEE(Name, Ssn, Manager_ssn, Job, Salary, Address, DeptNum)

DEPARTMENT(Dno,Dname, Mgr_ssn)

DeptNum in the EMPLOYEE relation is a foreign key referencing DEPARTMENT (Dno). The JOIN operation can be performed on the two relations based on this foreign key.

To retrieve employee name, DeptNum, Dname, the JOIN is as follows:
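One possible formulation (a sketch using the schema above):

π Name, DeptNum, Dname (EMPLOYEE ⋈ DeptNum=Dno DEPARTMENT)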

Comment
Chapter 8, Problem 6RQ

Problem

What is the FUNCTION operation? For what is it used?

Step-by-step solution

Step 1 of 3

FUNCTION operation:

• The FUNCTION operation, also known as the AGGREGATE FUNCTION operation, is used to apply mathematical aggregate functions to numeric data.

• It also allows grouping of data/tuples based on some attributes of the relation.

• The aggregate functions are SUM, AVERAGE, MAXIMUM, MINIMUM and COUNT.

Comment

Step 2 of 3

The syntax of the FUNCTION operation is as follows:

<grouping attributes> ℑ <function list> (R)

where,

<grouping attributes> is a list of attributes of R on which grouping is to be performed,

ℑ is the symbol used for the aggregate function operation, and

<function list> is a list of (function, attribute) pairs.

Comment

Step 3 of 3

FUNCTION operation is used for obtaining the summarized data from the relations.

Example: ℑ MAXIMUM Salary, MINIMUM Salary (EMPLOYEE)

The above query finds the maximum and the minimum salary in the EMPLOYEE relation.
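A grouped variant of the same operation (a sketch on the COMPANY schema):

Dno ℑ COUNT Ssn, AVERAGE Salary (EMPLOYEE)

groups the EMPLOYEE tuples by department number and returns, for each Dno, the number of employees and their average salary.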

Comment
Chapter 8, Problem 7RQ

Problem

How are the OUTER JOIN operations different from the INNER JOIN operations? How is the
OUTER UNION operation different from UNION?

Step-by-step solution

Step 1 of 2

OUTER JOIN and INNER JOIN: Consider two relations R and S. When the user wants to keep all the tuples in R, or all those in S, or those in both relations in the result of the JOIN, regardless of whether or not they have matching tuples in the other relation, the set of operations called outer joins is used. This satisfies the need of queries in which tuples from two tables are to be combined by matching corresponding rows, but without losing any tuples for lack of matching values.

When only matching tuples (based on the join condition) are contained in the resultant relation, the join is an INNER JOIN (EQUIJOIN and NATURAL JOIN).

In an OUTER JOIN, if matching values from the other relation are not present, the fields are padded with NULL values.
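A small illustration (not from the text): if R = {(1, a), (2, b)} over (P, Q) and S = {(1, x)} over (P, V), then the LEFT OUTER JOIN of R with S on R.P = S.P is {(1, a, 1, x), (2, b, NULL, NULL)}, whereas an INNER JOIN on the same condition returns only (1, a, 1, x).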

Comment

Step 2 of 2

OUTER UNION and UNION: For the UNION operation, the relations have to be union compatible, i.e., they must have the same number of attributes and each corresponding pair of attributes must have the same domain.

The OUTER UNION operation was developed to take the union of tuples from two relations that are not union compatible. This operation takes the UNION of tuples in two relations R(X, Y) and S(X, Z) that are partially compatible, meaning that only some of their attributes, say X, are union compatible. The resultant relation has the form RESULT(X, Y, Z).

Two tuples t1 in R and t2 in S are said to match if t1[X] = t2[X]; they are considered to represent the same entity instance and are combined into a single tuple.

For the rest of the tuples, NULL values are padded.
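For example, assuming STUDENT(Name, Ssn, Department) and INSTRUCTOR(Name, Ssn, Rank): the two relations are partially compatible on (Name, Ssn), so their OUTER UNION has the schema RESULT(Name, Ssn, Department, Rank). A person who is both a student and an instructor appears in a single tuple with all four attributes filled; a person who is only a student has NULL in Rank, and one who is only an instructor has NULL in Department.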

Comment
Chapter 8, Problem 8RQ

Problem

In what sense does relational calculus differ from relational algebra, and in what sense are they
similar?

Step-by-step solution

Step 1 of 2

Difference between relational calculus and relational algebra:

Relational calculus | Relational algebra

It is a non-procedural (declarative) language. | It is a procedural language.

The query specifies what output is to be retrieved. | The query specifies how the desired output is retrieved.

The order of the operations to be followed for getting the result is not specified. | The order of the operations to be followed for getting the result is specified.

The evaluation of the query does not depend on the order of the operations. | The evaluation of the query depends on the order of the operations.

New relations are not created by performing operations on the existing relations; formulas are applied directly on the existing relations. | New relations can be obtained by performing operations on the existing relations.

The queries may be domain dependent, so safety must be ensured. | The queries are domain independent.

Comment

Step 2 of 2

Similarities between relational calculus and relational algebra:

• Relational algebra and relational calculus are formal query languages for relational model.

• They are used for retrieving information from database.

Comment
Chapter 8, Problem 9RQ

Problem

How does tuple relational calculus differ from domain relational calculus?

Step-by-step solution

Step 1 of 2

The relational calculus is a non-procedural query language that uses predicates.

• The query in relational calculus specifies what output is to be retrieved.

• The order of the operations to be followed for getting the result is not specified.

• In other words, the evaluation of the query does not depend on the order of the operations.

• The two variations of relational calculus are:

o Tuple relational calculus

o Domain relational calculus

Comment

Step 2 of 2

The differences between tuple relational calculus and domain relational calculus are as follows:

• In the tuple relational calculus, variables range over tuples of a relation; a query has the form {t | COND(t)}, where t is a tuple variable.

• In the domain relational calculus, variables range over single values drawn from the domains of attributes; a query has the form {x1, x2, …, xn | COND(x1, x2, …, xn)}, where each xi is a domain variable.
Comment
Chapter 8, Problem 10RQ

Problem

Discuss the meanings of the existential quantifier (∃) and the universal quantifier (∀).

Step-by-step solution

Step 1 of 2

Quantifiers are of two types:

(1) Existential quantifiers

(2) Universal quantifiers

(1) Existential quantifiers:-

The existential quantifier is a logical symbol, written ∃ ("there exists").

Based on the formula rule for the existential quantifier: if F is a formula, then so is (∃t)(F),

where t is a tuple variable.

If the formula F evaluates to TRUE for some (at least one) tuple assigned to the free occurrences of t in F, then the formula (∃t)(F) is TRUE. Otherwise, it is FALSE.

Comment

Step 2 of 2

(2) Universal quantifiers:-

The universal quantifier is a logical symbol, written ∀ ("for all").

Based on the formula rule for the universal quantifier: if F is a formula, then so is (∀t)(F).

Here t is a tuple variable. If the formula F evaluates to TRUE for every tuple assigned to the free occurrences of t in F, then (∀t)(F) is TRUE; otherwise it is FALSE.
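Two small examples, assuming the COMPANY schema:

(∃t)(EMPLOYEE(t) AND t.Dno=5) is TRUE if at least one employee tuple belongs to department 5.

(∀t)(NOT(PROJECT(t)) OR t.Dnum=5) is TRUE if every project tuple is controlled by department 5; the NOT(PROJECT(t)) part makes the formula trivially TRUE for tuples that are not in PROJECT.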

Comment
Chapter 8, Problem 11RQ

Problem

Define the following terms with respect to the tuple calculus: tuple variable, range relation,atom,
formula, and expression.

Step-by-step solution

Step 1 of 3

Tuple relational calculus: The tuple relational calculus is a non-procedural language. It contains
a declarative expression that specifies what is to be retrieved.

Comment

Step 2 of 3

Tuple variable: A query in the tuple relational calculus is represented as {t | P(t)}. Here, t is a

tuple variable for which the predicate P is true.

Range Relation: In the tuple relational calculus, every tuple variable ranges over a relation, called its range relation. The variable takes any tuple of that relation as its value.

Atom: The atom in the tuple relational calculus identifies the range of the tuple variable. The
condition in the tuple relational calculus is made of atoms.

Comment

Step 3 of 3

Formula: A formula or condition is made of atoms. These atoms in the formula are connected
via the logical operators like AND, OR, NOT. Every atom in the formula is treated as a formula
i.e., the formula may or may not have multiple atoms.

Expression: The tuple relational calculus contains a declarative expression that specifies what is
to be retrieved.

Example:

Consider the expression {t | EMPLOYEE(t) AND t.Salary > 30000} (the relation and the constant are illustrative). In this

expression, t is the tuple variable; EMPLOYEE(t) AND t.Salary > 30000 is the formula;
EMPLOYEE(t) and t.Salary > 30000 are atoms; and EMPLOYEE(t) is the range relation atom that
specifies the range of the tuple variable t over the EMPLOYEE relation.

Comment
Chapter 8, Problem 12RQ

Problem

Define the following terms with respect to the domain calculus: domain variable, range relation,
atom, formula, and expression.

Step-by-step solution

Step 1 of 3

Domain variable:-

A domain variable is a variable whose value is drawn from the domain of an attribute.

To form a relation of degree n for a query result, n domain variables are used.

Ex:

The domain of the domain variable Crs might be the set of possible values of the Crs_code attribute of the relation TEACHING.

Comment

Step 2 of 3

Range relation:-

In the domain calculus, the variables used in formulas range over single values

from the domains of attributes, rather than ranging over tuples.

ATOM:-

An atom has the form R(x1, x2, …, xj),

where R is the name of a relation of degree j and each xi, 1 ≤ i ≤ j, is a domain variable.

Comment

Step 3 of 3

Formula:-

In the domain relational calculus, a formula is defined recursively, starting with simple atomic formulas and building larger formulas using the logical connectives.

A formula is made up of atoms.

Expression:-

An expression of the domain relational calculus has the form

{x1, x2, …, xn | COND(x1, x2, …, xn, xn+1, …, xn+m)}

Here

x1, …, xn+m are domain variables and COND is a formula of the domain relational calculus.

Comment
Chapter 8, Problem 13RQ

Problem

What is meant by a safe expression in relational calculus?

Step-by-step solution

Step 1 of 3

An expression in relational calculus is said to be a safe expression if it is guaranteed to output a finite set of tuples.

Comment

Step 2 of 3

The relational calculus expression that generates all the tuples from the universe that are not student tuples is as follows:

{t | NOT STUDENT(t)}

It generates an infinite number of tuples, since there are infinitely many possible tuples other than student tuples.

Expressions in relational calculus that do not guarantee a finite set of tuples are known as unsafe expressions.

Comment

Step 3 of 3

The tuples generated by a safe expression must come from the domain of the expression, that is, the values that appear in the relations mentioned in the expression; otherwise the expression is considered unsafe.
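For instance, the unsafe expression above becomes safe once t is restricted to an existing relation: {t | EMPLOYEE(t) AND t.Salary > 30000} (the relation and constant are illustrative), because every result tuple is drawn from the EMPLOYEE relation.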

Comment
Chapter 8, Problem 14RQ

Problem

When is a query language called relationally complete?

Step-by-step solution

Step 1 of 2

A relational query language is said to be relationally complete if any query that can be expressed in the relational calculus can also be expressed in that query language.

• The expressive power of such a query language is at least equivalent to that of relational algebra.

• Relational completeness is a criterion by which the expressive strength of a language can be measured.

Comment

Step 2 of 2

• Some queries, such as recursive (transitive closure) queries, cannot be expressed in relational calculus or basic relational algebra.

• Almost all relational query languages (for example, SQL) are relationally complete; in fact, they are more expressive than relational algebra or relational calculus.

Comment
Chapter 8, Problem 15E

Problem

Show the result of each of the sample queries in Section 8.5 as it would apply to the database
state in Figure 5.6.

Step-by-step solution

Step 1 of 6

Query 1:-

Result

FNAME LNAME ADDRESS

John Smith 731 Fondren, Houston, TX

Franklin Wong 638 Voss, Houston, TX

Ramesh Narayan 975 Fire Oak, Humble, TX

Joyce English 5631 Rice, Houston, TX

Comment

Step 2 of 6

Query 2:-

PNUMBER DNUM LNAME ADDRESS BDATE

10 4 Wallace 291 Berry, Bellaire, TX 1941-06-20

30 4 Wallace 291 Berry, Bellaire, TX 1941-06-20


Comment

Step 3 of 6

Query 3:-

Result:

The result is empty, because no tuples satisfy the condition.

LNAME FNAME

Query 4:

Result:

PNO

Comment

Step 4 of 6

Query 5:-

Result:

LNAME FNAME

Smith John

Wong Franklin

Comment

Step 5 of 6

Query 6:-

Result :-

LNAME FNAME

Zelaya Alicia

Narayan Ramesh

English Joyce

Jabbar Ahmad

Borg James

Comment

Step 6 of 6

Query 7:-

Result:

LNAME FNAME

Wallace Jennifer

Wong Franklin

Comment
Chapter 8, Problem 16E

Problem

Specify the following queries on the COMPANY relational database schema shown in Figure 5.5
using the relational operators discussed in this chapter. Also show the result of each query as it
would apply to the database state in Figure 5.6.

a. Retrieve the names of all employees in department 5 who work more than 10 hours per week
on the ProductX project.

b. List the names of all employees who have a dependent with the same first name as
themselves.

c. Find the names of all employees who are directly supervised by ‘Franklin Wong’.

d. For each project, list the project name and the total hours per week (by all employees) spent
on that project.

e. Retrieve the names of all employees who work on every project.

f. Retrieve the names of all employees who do not work on any project.

g. For each department, retrieve the department name and the average salary of all employees
working in that department.

h. Retrieve the average salary of all female employees.

i. Find the names and addresses of all employees who work on at least one project located in
Houston but whose department has no location in Houston.

j. List the last names of all department managers who have no dependents.
Step-by-step solution

Step 1 of 10
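Minimal sketches of two of these queries in relational algebra, assuming the standard COMPANY schema of Figure 5.5 (the remaining parts combine the same SELECT, PROJECT, JOIN, set difference, and aggregate function operations):

a. π Fname, Lname (σ Pname='ProductX' AND Dno=5 AND Hours>10 ((EMPLOYEE ⋈ Ssn=Essn WORKS_ON) ⋈ Pno=Pnumber PROJECT))

e. π Fname, Lname (EMPLOYEE ⋈ Ssn=Essn ((π Essn, Pno (WORKS_ON)) ÷ (ρ (Pno) (π Pnumber (PROJECT)))))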

Comment

Step 2 of 10
Comment

Step 3 of 10

Comment

Step 4 of 10

Comments (1)

Step 5 of 10
Comment

Step 6 of 10

Comment

Step 7 of 10

Comments (1)

Step 8 of 10

Comment
Step 9 of 10

Comments (2)

Step 10 of 10

Comments (2)
Chapter 8, Problem 17E

Problem

Consider the AIRLINE relational database schema shown in Figure, which was described in
Exercise. Specify the following queries in relational algebra:

a. For each flight, list the flight number, the departure airport for the first leg of the flight, and the
arrival airport for the last leg of the flight.

b. List the flight numbers and weekdays of all flights or flight legs that depart from Houston
Intercontinental Airport (airport code ‘iah’) and arrive in Los Angeles International Airport (airport
code ‘lax’).

c. List the flight number, departure airport code, scheduled departure time, arrival airport code,
scheduled arrival time, and weekdays of all flights or flight legs that depart from some airport in
the city of Houston and arrive at some airport in the city of Los Angeles.

d. List all fare information for flight number ‘col97’.

e. Retrieve the number of available seats for flight number ‘col97’ on ‘2009-10-09’.

The AIRLINE relational database scheme.

Exercise

Consider the AIRLINE relational database schema shown in Figure, which describes a database
for airline flight information. Each FLIGHT is identified by a Flight_number, and consists of one or
more FLIGHT_LEGs with Leg_numbers 1, 2, 3, and so on. Each FLIGHT_LEG has scheduled
arrival and departure times, airports, and one or more LEG_INSTANCEs— one for each Date on
which the flight travels. FAREs are kept for each FLIGHT. For each FLIGHT_LEG instance,
SEAT_RESERVATIONs are kept, as are the AIRPLANE used on the leg and the actual arrival
and departure times and airports. An AIRPLANE is identified by an Airplane_id and is of a
particular AIRPLANE_TYPE. CAN_LAND relates AIRPLANE_TYPEs to the AIRPORTs at which
they can land. An AIRPORT is identified by an Airport_code. Consider an update for the AIRLINE
database to enter a reservation on a particular flight or flight leg on a given date.

a. Give the operations for this update.

b. What types of constraints would you expect to check?

c. Which of these constraints are key, entity integrity, and referential integrity constraints, and
which are not?

d. Specify all the referential integrity constraints that hold on the schema shown in Figure.

Step-by-step solution

Step 1 of 4

The following symbols are used to write a relational algebra query: σ (SELECT), π (PROJECT), ⋈ (JOIN), ∪ (UNION), − (SET DIFFERENCE), ÷ (DIVISION), ℑ (aggregate function), and ← (assignment).


Comment

Step 2 of 4

a.

Following is the query to list, for each flight, the flight number, the departure airport of the flight's first leg, and the arrival airport of the flight's last leg:

Explanation:

• FLIGHT_LEG_IN holds the combinations of FLIGHT and FLIGHT_LEG tuples in which the FLIGHT's Flight_number is equal to the FLIGHT_LEG's Flight_number.

• MAX_FLIGHT_LEG holds the data about Flight_numbers whose Leg_number is maximum in


the FLIGHT_LEG_IN.

• MIN_FLIGHT_LEG holds the data about Flight_numbers whose Leg_number is minimum in the
FLIGHT_LEG_IN.

• In RESULT1, the data about the Flight_number, Leg_number and Arrival_airport_code of


MAX_FLIGHT_LEG is stored.

• In RESULT2, the data about the Flight_number, Leg_number and Arrival_airport_code of


MIN_FLIGHT_LEG is stored.

• RESULT will display the resultant tuples of the Union of the Set Algebra of RESULT1 and
RESULT2.

Comments (1)

Step 3 of 4

b.

Following is the query to retrieve the flight numbers and weekdays of all flights or flight legs that depart from Houston Intercontinental Airport (code 'iah') and arrive at Los Angeles International Airport (code 'lax'):
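A sketch consistent with the explanation below:

FLIGHT_LEG_IN ← FLIGHT ⋈ Flight_number FLIGHT_LEG

RESULT1 ← σ Departure_airport_code='iah' AND Arrival_airport_code='lax' (FLIGHT_LEG_IN)

RESULT ← π Flight_number, Weekdays (RESULT1)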

Explanation:

• FLIGHT_LEG_IN holds the combinations of FLIGHT and FLIGHT_LEG tuples in which the FLIGHT's Flight_number is equal to the FLIGHT_LEG's Flight_number.

• In RESULT1, the data about the FLIGHT_LEG is stored whose Departure_airport_code is


‘iah’and Arrival_airport_code is ‘lax’.

• RESULT will display the Flight_number, Weekdays of RESULT1.

c.

Following is the query to retrieve the flight number, departure airport code and scheduled departure time, arrival airport code and scheduled arrival time, and weekdays of all flights or flight legs that depart from some airport in the city of Houston and arrive at some airport in the city of Los Angeles:

Explanation:

• FLIGHT_LEG_IN holds the combinations of FLIGHT and FLIGHT_LEG tuples in which the FLIGHT's Flight_number is equal to the FLIGHT_LEG's Flight_number.

• The DEPART_CODE will hold the data about the Airport_code of AIRPORT whose City =
‘Houston’.
• The ARRIVE_CODE will hold the data about the Airport_code of AIRPORT whose City = ‘Los
Angeles’.

• The HOUST_DEPART holds the resultant of the relation obtained when the JOIN operation is
applied between the relations DEPART_CODE and FLIGHT_LEG_IN which satisfies condition
that Airport_Code = Departure_airport_code.

• The HOUST_TO_LA holds the resultant of the relation obtained when the JOIN operation is
applied between the relations ARRIVE_CODE and HOUST_DEPART which satisfies condition
that Airport_Code = Arrival_airport_code.

• RESULT will display the Flight_number, Departure_airport_code, Scheduled_departure_time,


Arrival_airport_code, Scheduled_arrival_time and Weekdays of HOUST_TO_LA.

d.

Following is the query to retrieve the fare information for the flight whose flight number is 'col97':
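A sketch consistent with the explanation below:

RESULT ← σ Flight_number='col97' (FARE)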

Explanation:

RESULT will hold the data about the all the FARE’s whose Flight_number is ‘col97’.

Comment

Step 4 of 4

e.

Following is the query to get the number of available seats whose flight number is ‘col97’ and
dated on ‘2009-10-09’:
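A sketch consistent with the explanation below:

LEG_INST_INFO ← σ Flight_number='col97' AND Date='2009-10-09' (LEG_INSTANCE)

RESULT ← π Number_of_available_seats (LEG_INST_INFO)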

Explanation:

• LEG_INST_INFO holds the data about LEG_INSTANCE whose Flight_number is ‘col97’ and
Date is ‘2009-10-09’.

• RESULT will display the Number_of_available_seats information of the LEG_INST_INFO.

Comment
Chapter 8, Problem 18E

Problem

Consider the LIBRARY relational database schema shown in Figure, which is used to keep track
of books, borrowers, and book loans. Referential integrity constraints are shown as directed arcs
in Figure, as in the notation of Figure 5.7. Write down relational expressions for the following
queries:

a. How many copies of the book titled The Lost Tribe are owned by the library branch whose
name is ‘Sharpstown’?

b. How many copies of the book titled The Lost Tribe are owned by each library branch?

c. Retrieve the names of all borrowers who do not have any books checked out.

d. For each book that is loaned out from the Sharpstown branch and whose Due_date is today,
retrieve the book title, the borrower’s name, and the borrower’s address.

e. For each library branch, retrieve the branch name and the total number of books loaned out
from that branch.

f. Retrieve the names, addresses, and number of books checked out for all borrowers who have
more than five books checked out.

g. For each book authored (or coauthored) by Stephen King, retrieve the title and the number of
copies owned by the library branch whose name is Central.

A relational database scheme for a LIBRARY database.

Step-by-step solution

Step 1 of 7

a.

Following is the relational expression to find the number of copies of the book whose title is ‘The
Lost Tribe’ in the library branch whose name is ‘Sharpstown’:
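One possible expression (a sketch, assuming the LIBRARY schema of the figure: BOOK, BOOK_AUTHORS, BOOK_COPIES, BOOK_LOANS, LIBRARY_BRANCH, BORROWER):

LOST_TRIBE ← σ Title='The Lost Tribe' (BOOK)

SHARPSTOWN ← σ Branch_name='Sharpstown' (LIBRARY_BRANCH)

RESULT ← π No_of_copies ((LOST_TRIBE ⋈ Book_id BOOK_COPIES) ⋈ Branch_id SHARPSTOWN)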

Comment

Step 2 of 7

b.

Following is the relational expression to find the number of copies of the book whose title is ‘The
Lost Tribe’ is available at each branch of the library:
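One possible expression (same schema assumptions as in part a):

RESULT ← π Branch_name, No_of_copies (((σ Title='The Lost Tribe' (BOOK)) ⋈ Book_id BOOK_COPIES) ⋈ Branch_id LIBRARY_BRANCH)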

Comment
Step 3 of 7

c.

Following is the relational expression to retrieve the names of the borrowers who have no books
checked out:
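One possible expression (same schema assumptions as in part a):

NO_LOANS ← π Card_no (BORROWER) − π Card_no (BOOK_LOANS)

RESULT ← π Name (BORROWER ⋈ Card_no NO_LOANS)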

Comment

Step 4 of 7

d.

Following is the relational expression to retrieve the book title and the borrower's name and address for each book that is loaned out from the branch whose name is 'Sharpstown' and whose Due_date is today:
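One possible expression (same schema assumptions as in part a; 'today' stands for the current date):

SHARP_LOANS ← BOOK_LOANS ⋈ Branch_id (σ Branch_name='Sharpstown' (LIBRARY_BRANCH))

DUE_TODAY ← σ Due_date='today' (SHARP_LOANS)

RESULT ← π Title, Name, Address ((DUE_TODAY ⋈ Book_id BOOK) ⋈ Card_no BORROWER)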

Comment

Step 5 of 7

e.

Following is the relational expression to retrieve the branch name and the total number of books
loaned out from that branch:
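One possible expression (same schema assumptions as in part a):

LOAN_COUNTS(Branch_id, Total_loans) ← Branch_id ℑ COUNT Book_id (BOOK_LOANS)

RESULT ← π Branch_name, Total_loans (LOAN_COUNTS ⋈ Branch_id LIBRARY_BRANCH)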

Comments (1)

Step 6 of 7

f.

Following is the relational expression to retrieve the name, address and total number of books for
all borrowers who have more than five books checked out:
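One possible expression (same schema assumptions as in part a):

BORROWER_COUNTS(Card_no, Books_out) ← Card_no ℑ COUNT Book_id (BOOK_LOANS)

RESULT ← π Name, Address, Books_out (BORROWER ⋈ Card_no (σ Books_out>5 (BORROWER_COUNTS)))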

Comment

Step 7 of 7

g.

Following is the relational expression to retrieve the title and number of copies of each book
authored or coauthored by Stephen King in library branch whose name is Central:
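One possible expression (same schema assumptions as in part a):

KING_BOOKS ← π Book_id (σ Author_name='Stephen King' (BOOK_AUTHORS))

CENTRAL ← σ Branch_name='Central' (LIBRARY_BRANCH)

RESULT ← π Title, No_of_copies (((KING_BOOKS ⋈ Book_id BOOK) ⋈ Book_id BOOK_COPIES) ⋈ Branch_id CENTRAL)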

Comment
Chapter 8, Problem 19E

Problem

Specify the following queries in relational algebra on the database schema given in Exercise:

a. List the Order# and Ship_date for all orders shipped from Warehouse# W2.

b. List the WAREHOUSE information from which the CUSTOMER named Jose Lopez was
supplied his orders. Produce a listing: Order#, Warehouse#.

c. Produce a listing Cname, No_of_orders, Avg_order_amt, where the middle column is the total
number of orders by the customer and the last column is the average order amount for that
customer.

d. List the orders that were not shipped within 30 days of ordering.

e. List the Order# for orders that were shipped from all warehouses that the company has in New
York.

Exercise

Consider the following six relations for an order-processing database application in a company:

CUSTOMER(Cust#, Cname, City)

ORDER(Order#, Odate, Cust#, Ord_amt)

ORDER_ITEM(Order#, Item#, Qty)

ITEM(Item#, Unit_price)

SHIPMENT(Order#, Warehouse#, Ship_date)

WAREHOUSE(Warehouse#, City)

Here, Ord_amt refers to total dollar amount of an order; Odate is the date the order was placed;
and Ship_date is the date an order (or part of an order) is shipped from the warehouse. Assume
that an order can be shipped from several warehouses. Specify the foreign keys for this schema,
stating any assumptions you make. What other constraints can you think of for this database?

Step-by-step solution

Step 1 of 6

Relational Algebra

It is a procedural language to perform various queries on the database.

The operations of the relational algebra are as follows:

• Select: used to select tuples; denoted by σ.

• Project: used to project columns; denoted by π.

• Union: denoted by ∪.

• Set difference: denoted by −.

• Cartesian product: denoted by ×.

• Rename: denoted by ρ.

Comment

Step 2 of 6

a.

Query to retrieve the order number and shipping date for all the orders that are shipped from
Warehouse "W2":

Explanation:

• First projects the Order# and Ship_date and then select the Warehouse# "W2" for all orders.

• The above query will select the fields Order# and Ship_date from the table SHIPMENT whose
Warehouse number = "W2" for all the orders.

Comment

Step 3 of 6

b.

Query to retrieve the order number and warehouse number for all the orders of customer named
"Jose Lopez":

Explanation:
• First select the customer named "Jose Lopez" and the orders supplied to him, and then project the listing of Order#, Warehouse#.

• TEMP will give the details of the ORDER and the CUSTOMER table whose Cname is ‘Jose
Lopez’. The details of Jose Lopez will be the output.

• The above query will display only the Order# and Warehouse# and perform natural join on
SHIPMENT and the TEMP table whose Order# is same as the Order# number of TEMP.

Comment

Step 4 of 6

c.

Query to retrieve the Cname and total number of orders and average order amount of each
customer:
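A sketch matching the explanation below:

TEMP(Cust#, No_of_orders, Avg_order_amt) ← Cust# ℑ COUNT Order#, AVERAGE Ord_amt (ORDER)

RESULT ← π Cname, No_of_orders, Avg_order_amt (CUSTOMER ⋈ Cust# TEMP)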

Explanation:

• The relation TEMP specifies the list of attributes between parenthesis in the RENAME
operation.

• The aggregate functions in the query are defined using the syntax <grouping attributes> ℑ <function list> (R).

• The number of orders and the average order amount are grouped by the customer (Cust#).

• The above query will display only the Customer name, number of orders, and average order
amount and perform natural join on CUSTOMER and the TEMP table whose Cust# is same as
the Cust# number of TEMP.

Comment

Step 5 of 6

d.

Query to list the orders that are not shipped within 30 days of ordering:
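A sketch matching the explanation below:

RESULT ← π Order#, Odate, Cust#, Ord_amt (σ Ship_date − Odate > 30 (ORDER ⋈ Order# SHIPMENT))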

Explanation:

• First project the Order#, Odate, Cust#, and Ord_amt, and then select the orders that were not shipped within thirty days.

• The number of days is calculated by subtracting the order date from the shipping date; a natural join is performed between ORDER and SHIPMENT on the Order# attribute.

Comment

Step 6 of 6

e.

Query to list the order# of the orders shipped from the warehouses located in New York:
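A sketch matching the explanation below:

TEMP ← π Warehouse# (σ City='New York' (WAREHOUSE))

RESULT ← π Order#, Warehouse# (SHIPMENT) ÷ TEMP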

Explanation:

• TEMP holds the Warehouse# values of the WAREHOUSE tuples whose City is 'New York'.

• Project Order# and Warehouse# from the SHIPMENT table and divide the result by TEMP.

• The division operator returns the Order# values that appear in SHIPMENT in combination with every row of relation TEMP.

Comment
Chapter 8, Problem 20E

Problem

Specify the following queries in relational algebra on the database schema given in Exercise:

a. Give the details (all attributes of trip relation) for trips that exceeded $2,000 in expenses.

b. Print the Ssns of salespeople who took trips to Honolulu.

c. Print the total trip expenses incurred by the salesperson with SSN = ‘234-56-7890’.

Exercise

Consider the following relations for a database that keeps track of business trips of salespersons
in a sales office:

SALESPERSON(Ssn, Name, Start_year, Dept_no)

TRIP(Ssn, From_city, To_city, Departure_date, Return_date, Trip id)

EXPENSE(Trip id, Account#, Amount)

A trip can be charged to one or more accounts. Specify the foreign keys for this schema, stating
any assumptions you make.

Step-by-step solution

Step 1 of 4

The relational database schema is:

SALESPERSON (Ssn, Name, Start_year, Dept_no)

TRIP (Ssn, From_city, To_city, Departure_date, Return_date, Trip_id)

EXPENSE(Trip_id, Account#, Amount)

Comment

Step 2 of 4

a) Details for trips that exceeded $2000 in expenses.
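One possible expression (a sketch; since a trip can be charged to several accounts, the expenses are summed per trip first):

TRIP_TOTAL(Trip_id, Total_amount) ← Trip_id ℑ SUM Amount (EXPENSE)

RESULT ← π Ssn, From_city, To_city, Departure_date, Return_date, Trip_id (TRIP ⋈ Trip_id (σ Total_amount>2000 (TRIP_TOTAL)))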

Comment

Step 3 of 4

b) Print the Ssns of salespeople who took trips to ‘Honolulu’.
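One possible expression:

RESULT ← π Ssn (σ To_city='Honolulu' (TRIP))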

Comment

Step 4 of 4

c) Print the total trip expenses incurred by the salesperson with SSN = ‘234-56-7890’.
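One possible expression:

RESULT ← ℑ SUM Amount ((σ Ssn='234-56-7890' (TRIP)) ⋈ Trip_id EXPENSE)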

Comment
Chapter 8, Problem 21E

Problem

Specify the following queries in relational algebra on the database schema given in Exercise:

a. List the number of courses taken by all students named John Smith in Winter 2009 (i.e.,
Quarter=W09).

b. Produce a list of textbooks (include Course#, Book_isbn, Book_title) for courses offered by the
‘CS’ department that have used more than two books.

c. List any department that has all its adopted books published by ‘Pearson Publishing’.

Exercise

Consider the following relations for a database that keeps track of student enrollment in courses
and the books adopted for each course:

STUDENT(Ssn, Name, Major, Bdate)

COURSE(Course#, Cname, Dept)

ENROLL(Ssn, Course#, Quarter. Grade)

BOOK ADOPTION(Course#, Quarter, Book_isbn)

TEXT(Book_isbn, Book_title, Publisher, Author)

Specify the foreign keys for this schema, stating any assumptions you make.

Step-by-step solution

Step 1 of 3

a.

Π Course# (σ Quarter=’W09’ ((σ Name=’John Smith’ (STUDENT)) ⋈ ENROLL))

Explanation:

• This query will give the courses taken by the student named ‘John Smith’ in Winter 2009.

• Here, ‘Π’ denotes the projection operation, ‘σ’ denotes the selection operation, and ‘⋈’ denotes the natural join operation.

Comment

Step 2 of 3

b.

CS_BOOKS ← (σ Dept=’CS’ (COURSE)) ⋈ BOOK_ADOPTION ⋈ TEXT

COURSE_COUNT(Course#, No_of_books) ← Course# ℑ COUNT Book_isbn (CS_BOOKS)

RESULT ← Π Course#, Book_isbn, Book_title (CS_BOOKS ⋈ (σ No_of_books > 2 (COURSE_COUNT)))

Explanation:

• The above query retrieves the list of textbooks for courses offered by the ‘CS’ department with the use of natural joins.

• The aggregate COUNT operation keeps only those courses that have adopted more than two books.
Comment

Step 3 of 3

c.

BOOK_ALL_DEPTS ← π Dept (BOOK_ADOPTION ⋈ COURSE)

BOOK_OTHER_DEPTS ← π Dept ((σ Publisher <> ‘Pearson Publishing’ (BOOK_ADOPTION ⋈ TEXT)) ⋈ COURSE)

BOOK_ANY_DEPTS ← BOOK_ALL_DEPTS − BOOK_OTHER_DEPTS

Explanation:

• The above query will list the departments which have all the adopted books published by
“Pearson publishing”.

• In this query ‘<>’ operator is used for “not equal to” operation.

Comment
Chapter 8, Problem 22E

Problem

Consider the two tables T1 and T2 shown in Figure 8.15. Show the results of the following
operations:

Step-by-step solution

Step 1 of 7

Operations of relational algebra

The two tables T1 and T2 represent database states.

TABLE T1 TABLE T2

P Q R A B C

10 a 5 10 b 6

15 b 8 25 c 3

25 a 6 10 b 5

Comment

Step 2 of 7

a) The operation is T1 ⋈ T1.P = T2.A T2, a THETA JOIN (in fact an EQUIJOIN). It produces all the combinations of tuples that satisfy the join condition T1.P = T2.A. The following table is the result of the operation.

P Q R A B C

10 a 5 10 b 6

10 a 5 10 b 5

25 a 6 25 c 3

Comment

Step 3 of 7

b) The operation is T1 ⋈ T1.Q = T2.B T2, a THETA JOIN. It produces all the combinations of tuples that satisfy the join condition T1.Q = T2.B. The following table is the result of the operation.

P Q R A B C

15 b 8 10 b 6

15 b 8 10 b 5

Comment
Step 4 of 7

c) The operation is T1 ⟕ T1.P = T2.A T2, a LEFT OUTER JOIN. It produces the tuples that are in the first (left) relation T1 joined under the condition T1.P = T2.A; if no matching tuple is found in T2, the T2 attributes are filled with NULL values. The following table is the result of the operation.

P Q R A B C

10 a 5 10 b 6

10 a 5 10 b 5

15 b 8 NULL NULL NULL

25 a 6 25 c 3

Comment

Step 5 of 7

d) The operation is T1 ⟖ T1.Q = T2.B T2, a RIGHT OUTER JOIN. It produces the tuples that are in the second (right) relation T2 joined under the condition T1.Q = T2.B; if no matching tuple is found in T1, the T1 attributes are filled with NULL values. The following table is the result of the operation.

P Q R A B C

15 b 8 10 b 6

NULL NULL NULL 25 c 3

15 b 8 10 b 5

Comment

Step 6 of 7

e) The operation is T1 ∪ T2, a UNION. It produces a relation that includes all the tuples that are in T1, in T2, or in both. The operation is possible since T1 and T2 are union compatible. The following table is the result of the operation.

P Q R

10 a 5

15 b 8

25 a 6

10 b 6

25 c 3

10 b 5

Comment

Step 7 of 7

f) The operation is T1 ⋈ (T1.P = T2.A AND T1.R = T2.C) T2, a THETA JOIN with a conjunctive condition. It produces all the combinations of tuples that satisfy the join condition T1.P = T2.A AND T1.R = T2.C. The following table is the result of the operation.

P Q R A B C

10 a 5 10 b 5

Comment
Chapter 8, Problem 23E

Problem

Specify the following queries in relational algebra on the database schema in Exercise:

a. For the salesperson named ‘Jane Doe’, list the following information for all the cars she sold:
Serial#, Manufacturer, Sale_price.

b. List the Serial# and Model of cars that have no options.

c. Consider the NATURAL JOIN operation between SALESPERSON and SALE. What is the
meaning of a left outer join for these tables (do not change the order of relations)? Explain with
an example.

d. Write a query in relational algebra involving selection and one set operation and say in words
what the query does.

Exercise

Consider the following relations for a database that keeps track of automobile sales in a car
dealership (OPTION refers to some optional equipment installed on an automobile):

CAR(Serial no, Model, Manufacturer, Price)

OPTION(Serial_no, Option_name, Price)

SALE(Salesperson_id, Serial_no, Date, Sale_price)

SALESPERSON(Salesperson_id, Name, Phone)

First, specify the foreign keys for this schema, stating any assumptions you make. Next, populate
the relations with a few sample tuples, and then give an example of an insertion in the SALE and
SALESPERSON relations that violates the referential integrity constraints and of another
insertion that does not.

Step-by-step solution

Step 1 of 4

(a)
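One possible expression (a sketch, using the schema given in the exercise):

JANE_SALES ← (σ Name='Jane Doe' (SALESPERSON)) ⋈ Salesperson_id SALE

RESULT ← π Serial_no, Manufacturer, Sale_price (JANE_SALES ⋈ Serial_no CAR)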

Comment

Step 2 of 4

(b)
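One possible expression (same assumptions as in part a):

RESULT ← π Serial_no, Model (CAR) − π Serial_no, Model (CAR ⋈ Serial_no OPTION)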

Comment

Step 3 of 4

(c)

The meaning of the LEFT OUTER JOIN operation between SALESPERSON and SALE is that all the record combinations for which the JOIN condition evaluates to true are displayed, and in addition all the records from SALESPERSON that do not match the condition are also displayed, with the attribute values corresponding to the SALE table marked as NULL.

For example: Consider records for two salespersons:

a) ID_1, ABC, 9999999

b) ID_2, DEF, 8888888

and SALE having the single tuple:

a) ID_1, 111, 2-08-2008, 500000

The result of the left outer join operation will have two tuples:

a) ID_1, ABC, 9999999, 111, 2-08-2008, 500000

b) ID_2, DEF, 8888888, NULL, NULL, NULL

Comment

Step 4 of 4

(d)

This query gives information about the Doe couple, who happen to work at the same place.

Comment
Chapter 8, Problem 24E

Problem

Specify queries a, b, c, e, f, i, and j of Exercise 8.16 in both tuple and domain relational calculus.

Reference Exercise 8.16

Specify the following queries on the COMPANY relational database schema shown in Figure 5.5
using the relational operators discussed in this chapter. Also show the result of each query as it
would apply to the database state in Figure 5.6.

a. Retrieve the names of all employees in department 5 who work more than 10 hours per week
on the ProductX project.

b. List the names of all employees who have a dependent with the same first name as
themselves.

c. Find the names of all employees who are directly supervised by ‘Franklin Wong’.

d. For each project, list the project name and the total hours per week (by all employees) spent
on that project.

e. Retrieve the names of all employees who work on every project.

f. Retrieve the names of all employees who do not work on any project.

g. For each department, retrieve the department name and the average salary of all employees
working in that department.

h. Retrieve the average salary of all female employees.

i. Find the names and addresses of all employees who work on at least one project located in
Houston but whose department has no location in Houston.

j. List the last names of all department managers who have no dependents.
Step-by-step solution

Step 1 of 10

Tuple relational calculus

The tuple relational calculus is based on the use of tuple variables. A tuple variable "ranges over" a particular named relation.

Domain relational calculus

The variables in the domain relational calculus take their values from domains of attributes rather than from tuples of relations.

Comment

Step 2 of 10

a.

• To specify the range of a tuple variable e as the EMPLOYEE relation.

• Select the LNAME, FNAME attributes of the EMPLOYEE relation where DNO=5 work for
HOURS>10.

Tuple relational calculus:

Explanation:

• In the provided Tuple Relational calculus, the EMPLOYEE considers as the a, the PROJECT
considers as the b and the WORKS_ON considers as the c.

• In the above tuple relational calculus, there is a free variable a and these appear to the left of
the bar (|).

• The variables are retrieved which come before the bar (|), for all those tuples which satisfy the
conditions provided after the bar.

• The conditions EMPLOYEE (a) and WORKS_ON (c) specify the range relations for a and c.
The condition a.ssn=c.ESSN is a join condition.

Domain relational calculus:


Explanation:

• Ten variables are needed for the EMPLOYEE relation, q, r, s, …, z. Only q and s are free, because they appear to the left of the bar.

• Firstly, there is a specification of the requested attributes, the employee's first and last name, by the free domain variables q and s for the name fields.

• There is a condition for selecting a tuple after the bar (|).

• A condition relating two domain variables from relations t=e is a join condition.

Comment

Step 3 of 10

b.

• To specify the range of a tuple variable e as the EMPLOYEE relation.

• Select the LNAME, FNAME attributes of the EMPLOYEE relation who have a dependent with
the same first name as themselves.

Tuple relational calculus:

Explanation:

• In the provided Tuple Relational calculus, the EMPLOYEE considers as the a and the
DEPENDENT considers as the b.

• In the above tuple relational calculus, there is a free variable a and these appear to the left of
the bar (|).

• The variables are retrieved which come before the bar (|), for all those tuples which satisfy the
conditions provided after the bar.

• The conditions EMPLOYEE (a) and DEPENDENT (b) specify the range relations for a and b.
The condition a.ssn=b.ESSN is a join condition.

Domain relational calculus:

Explanation:

• Ten variables are needed for the EMPLOYEE relation, q, r, s, …, z. Only q and s are free, because they appear to the left of the bar.

• Firstly, there is a specification of the requested attributes, the employee's first and last name, by the free domain variables q and s for the name fields.

• There is a condition for selecting a tuple after the bar (|).

• A condition relating two domain variables from relations a=t and b=q is a join condition.

Comment

Step 4 of 10

c.

• To specify the range of a tuple variable e as the EMPLOYEE relation.

• Select the LNAME, FNAME attributes of the EMPLOYEE relation to find the names of
employees that are directly supervised by 'Franklin Wong'.

Tuple relational calculus:

Explanation:

• In the provided Tuple Relational calculus, the EMPLOYEE considers as the a and the
EMPLOYEE considers as the b by using self-join.

• In the above tuple relational calculus, there is a free variable a and these appear to the left of
the bar (|).

• The variables are retrieved which come before the bar (|), for all those tuples which satisfy the
conditions provided after the bar.

• The conditions EMPLOYEE (a) and EMPLOYEE (b) specify the range relations for a and b. The condition a.Super_ssn = b.Ssn is a self-join condition.

Domain relational calculus:

Explanation:
• Ten variables are needed for the EMPLOYEE relation, q, r, s, …, z. Only q and s are free, because they appear to the left of the bar.

• Firstly, there is a specification of the requested attributes, the employee's first and last name, by the free domain variables q and s for the name fields.

• There is a condition for selecting a tuple after the bar (|).

• A condition relating two domain variables from relations y=d and S.FNAME='Franklin' AND
S.LNAME='Wong' is a join condition.

Comment

Step 5 of 10

e.

• To specify the range of a tuple variable e as the EMPLOYEE relation.

• Select the LNAME, FNAME attributes of the EMPLOYEE relation to retrieve the names of
employees who work on every project.

Tuple relational calculus:

Explanation:

• In the provided Tuple Relational calculus, the EMPLOYEE considers as the a and the FORALL
PROJECT considers as the b.

• In the above tuple relational calculus, there is a free variable a and these appear to the left of
the bar (|).

• The variables are retrieved which come before the bar (|), for all those tuples which satisfy the
conditions provided after the bar.

• The conditions EMPLOYEE (a) and FORALL PROJECT (b) specify the range relations for a
and b. The condition WHERE PNUMBER=PNO AND ESSN=SSN.

Domain relational calculus:

Explanation:

• Ten variables are needed for the EMPLOYEE relation, q, r, s, …, z. Only q and s are free, because they appear to the left of the bar.

• Firstly, there is a specification of the requested attributes, the employee's first and last name, by the free domain variables q and s for the name fields.

• There is a condition for selecting a tuple after the bar (|).

• A condition relating two domain variables from relations e=t and PNUMBER=PNO AND
ESSN=SSN is a join condition.

Comment

Step 6 of 10

f.

• To specify the range of a tuple variable e as the EMPLOYEE relation.

• Select the LNAME, FNAME attributes of the EMPLOYEE relation to retrieve the names of
employees who do not work on any project.

Tuple relational calculus:

Comment

Step 7 of 10

Explanation:

• In the provided Tuple Relational calculus, the EMPLOYEE considers as the a and the
WORKS_ON considers as the b.

• In the above tuple relational calculus, there is a free variable a and these appear to the left of
the bar (|).

• The variables are retrieved which come before the bar (|), for all those tuples which satisfy the
conditions provided after the bar.

• The conditions EMPLOYEE (a) and WORKS_ON (b) specify the range relations for a and b. The condition ESSN = SSN relates the two relations.

Domain relational calculus:

Explanation:
• Ten variables are needed for the EMPLOYEE relation, q, r, s, …, z. Only q and s are free, because they appear to the left of the bar.

• Firstly, there is a specification of the requested attributes, the employee's first and last name, by the free domain variables q and s for the name fields.

• There is a condition for selecting a tuple after the bar (|).

• A condition relating two domain variables from relations a=t WHERE ESSN=SSN is a join
condition.

Comment

Step 8 of 10

i.

• To specify the range of a tuple variable e as the EMPLOYEE relation.

• Select the LNAME, FNAME, and ADDRESS attributes of the EMPLOYEE relation employees
who work on at least one project located in Houston.

Tuple relational calculus:

Explanation:

• In the provided Tuple Relational calculus, the EMPLOYEE considers as the a, the PROJECT
considers as the b and the WORKS_ON considers as the c.

• In the above tuple relational calculus, there is a free variable a and these appear to the left of
the bar (|).

• The variables are retrieved which come before the bar (|), for all those tuples which satisfy the
conditions provided after the bar.

• The conditions EMPLOYEE (a) and WORKS_ON (c) specify the range relations for a and c.
The condition a.ssn=c.ESSN and PNO=PNUMBER AND PLOCATION='Houston' is a join
condition.

Domain relational calculus:

Explanation:

• Ten variables are needed for the EMPLOYEE relation, q, r, s, …, z. Only q, s, and v are free, because they appear to the left of the bar.

• Firstly, there is a specification of the requested attributes, the employee's name and address, by the free domain variables q, s, and v for the name and address fields.

• There is a condition for selecting a tuple after the bar (|).

• A condition relating two domain variables from relations t=e and e.ssn=w.ESSN and
PNO=PNUMBER AND PLOCATION='Houston' is a join condition.

Comment

Step 9 of 10

j.

• To specify the range of a tuple variable e as the EMPLOYEE relation.

• Select the LNAME attribute of the EMPLOYEE relation of department managers who have no
dependents.

Tuple relational calculus:

Explanation:

• In the provided Tuple Relational calculus, the EMPLOYEE considers as the a, the
DEPARTMENT considers as the b and the DEPENDENT considers as the c.

Comment

Step 10 of 10

In the above tuple relational calculus, there is a free variable a and these appear to the left of the
bar (|).

• The variables are retrieved which come before the bar (|), for all those tuples which satisfy the
conditions provided after the bar.
• The conditions EMPLOYEE (a) and DEPARTMENT (b) specify the range relations for a and b. The condition a.Ssn = b.Mgr_ssn and SSN = ESSN is a join condition.

Domain relational calculus:

Explanation:

• Ten variables are needed for the EMPLOYEE relation, q, r, s, …, z. Only s is free, because it appears to the left of the bar.

• Firstly, there is a specification of the requested attribute, the manager's last name, by the free domain variable s for the name field.

• There is a condition for selecting a tuple after the bar (|).

• A condition relating two domain variables from relations e=t and e.ssn=d.MGRSSN and
SSN=ESSN is a join condition.

Comment
Chapter 8, Problem 25E

Problem

Specify queries a, b, c, and d of Exercise 1 in both tuple and domain relational calculus.

Exercise 1

Consider the AIRLINE relational database schema shown in Figure, which was described in
Exercise 2. Specify the following queries in relational algebra:

a. For each flight, list the flight number, the departure airport for the first leg of the flight, and the
arrival airport for the last leg of the flight.

b. List the flight numbers and weekdays of all flights or flight legs that depart from Houston
Intercontinental Airport (airport code ‘iah’) and arrive in Los Angeles International Airport (airport
code ‘lax’).

c. List the flight number, departure airport code, scheduled departure time, arrival airport code,
scheduled arrival time, and weekdays of all flights or flight legs that depart from some airport in
the city of Houston and arrive at some airport in the city of Los Angeles.

d. List all fare information for flight number ‘col97’.

e. Retrieve the number of available seats for flight number ‘col97’ on ‘2009-10-09’.

The AIRLINE relational database scheme.

Exercise 2

Consider the AIRLINE relational database schema shown in Figure, which describes a database
for airline flight information. Each FLIGHT is identified by a Flight_number, and consists of one or
more FLIGHT_LEGs with Leg_numbers 1, 2, 3, and so on. Each FLIGHT_LEG has scheduled
arrival and departure times, airports, and one or more LEG_INSTANCEs— one for each Date on
which the flight travels. FAREs are kept for each FLIGHT. For each FLIGHT_LEG instance,
SEAT_RESERVATIONs are kept, as are the AIRPLANE used on the leg and the actual arrival
and departure times and airports. An AIRPLANE is identified by an Airplane_id and is of a
particular AIRPLANE_TYPE. CAN_LAND relates AIRPLANE_TYPEs to the AIRPORTs at which
they can land. An AIRPORT is identified by an Airport_code. Consider an update for the AIRLINE
database to enter a reservation on a particular flight or flight leg on a given date.

a. Give the operations for this update.

b. What types of constraints would you expect to check?

c. Which of these constraints are key, entity integrity, and referential integrity constraints, and
which are not?

d. Specify all the referential integrity constraints that hold on the schema shown in Figure.

Step-by-step solution

Step 1 of 5

a.

Tuple Relational Calculus:


In the provided Tuple Relational calculus the FLIGHT consider as the f and the FLIGHT_LEG
consider as the l.

• In the above tuple relational calculus there are two free variables, f and l, and these appear to the left of the bar (|).

• The variables that come before the bar (|) are retrieved for all those tuples which satisfy the conditions provided after the bar.

• The conditions FLIGHT (f) and FLIGHT_LEG (l) specify the range relations for f and l. The condition f.Flight_number = l.Flight_number is a join condition, whose purpose is similar to the INNER JOIN operation.

Domain Relational Calculus:

• Ten variables are needed for the FLIGHT relation; of the ten variables q, r, s, …, z, only q and v are free, because they appear to the left of the bar.

• Firstly, the requested attributes are specified: the flight number, the departure airport for the first leg of the flight, and the arrival airport for the last leg of the flight.

• There is a condition for selecting a tuple after the bar (|).

• A condition relating two domain variables from the two relations, m = z, is a join condition.

Comment

Step 2 of 5

b.

Tuple Relational Calculus:

In the provided Tuple Relational calculus the FLIGHT consider as the f and the FLIGHT_LEG
consider as the l.

• In the created tuple relational calculus there is a single free variable, f, which appears to the left of the bar (|).

• The variables that come before the bar (|) are retrieved for all those tuples which satisfy the conditions provided after the bar.

• The condition l.Departure_airport_code='iah' AND l.Arrival_airport_code='lax' is a selection condition, which is similar to the SELECT operation in relational algebra.

• The conditions FLIGHT (f) and FLIGHT_LEG (l) specify the range relations for f and l. The condition f.Flight_number = l.Flight_number is a join condition, whose purpose is similar to the INNER JOIN operation.

Domain Relational Calculus:

• Ten variables are needed for the FLIGHT relation; of the ten variables q, r, s, …, z, only u and v are free, because they appear to the left of the bar.

• Firstly, the requested attributes are specified: the flight number and weekdays of all flights or flight legs that depart from Houston Intercontinental Airport and arrive at Los Angeles International Airport.

• The values assigned to the variables q, r, s, t, u, v, w, x, y, z become a tuple of the FLIGHT relation; the values for q (Departure_airport_code) and r (Arrival_airport_code) are equal to 'iah' and 'lax' respectively.

• Then there is a condition for selecting a tuple after the bar (|).

• A condition relating two domain variables from the two relations, m = z, is a join condition.

Comment

Step 3 of 5

c.

Tuple Relational Calculus:

In the provided Tuple Relational calculus the FLIGHT consider as the f and the FLIGHT_LEG
consider as the l.

• In the created tuple relational calculus there are two free variables, f and l, and these appear to the left of the bar (|).

• The variables that come before the bar (|) are retrieved for all those tuples which satisfy the conditions provided after the bar.

• The selection condition restricts the departure airports to the city of Houston and the arrival airports to the city of Los Angeles, which is similar to the SELECT operation in relational algebra.

Comment

Step 4 of 5

• The conditions FLIGHT (f) and FLIGHT_LEG (l) specify the range relations for f and l. The condition f.Flight_number = l.Flight_number is a join condition, whose purpose is similar to the INNER JOIN operation.

Domain Relational Calculus:

• Ten variables are needed for the FLIGHT relation and five for the FLIGHT_LEG relation; of the fifteen variables k, l, …, q, r, s, …, z, only u, l, m, n, o and v are free, because they appear to the left of the bar.

• Firstly, the requested attributes are specified: the flight number, Departure_airport_code, Scheduled_departure_time, Arrival_airport_code, Scheduled_arrival_time and Weekdays for all flights that depart from some airport in the city of Houston and arrive at some airport in the city of Los Angeles.

• The values assigned to the variables q, r, s, t, u, v, w, x, y, z and j, k, l, m, n, o, p become tuples of the FLIGHT and FLIGHT_LEG relations.

• Then there is a condition for selecting a tuple after the bar (|).

• A condition relating two domain variables from the two relations, m = z, is a join condition.

Comment

Step 5 of 5

d.

Tuple Relational Calculus:

• In the created tuple relational calculus there are two free variables, f and r, and these appear to the left of the bar (|).

• The variables that come before the bar (|) are retrieved for all those tuples which satisfy the conditions provided after the bar.

• The condition r.Flight_number='col97' is a selection condition, which is similar to the SELECT operation in relational algebra.

• The conditions FLIGHT (f) and FARE (r) specify the range relations for f and r. The condition r.Flight_number = f.Flight_number is a join condition, whose purpose is similar to the INNER JOIN operation.

Domain Relational Calculus:

• Ten variables are needed for the FARE relation and five for the FLIGHT relation; of the fifteen variables k, l, …, q, r, s, …, z, only s, t, u, v, and m are free, because they appear to the left of the bar.

• Firstly, the requested attributes are specified: the flight number, Fare_code, Amount, Restrictions, and Airline, for all fare information for flight number 'col97'.

• The values assigned to the variables q, r, s, t, u, v, w, x, y, z and l, m, n, o, p become tuples of the FARE and FLIGHT relations; the value for q (Flight_number) is equal to 'col97'.

• Then there is a condition for selecting a tuple after the bar (|).

• A condition relating two domain variables from the two relations, m = z, is a join condition.

Comment
Chapter 8, Problem 26E

Problem

Specify queries c, d, and f of Exercise in both tuple and domain relational calculus.

Exercise

Consider the LIBRARY relational database schema shown in Figure, which is used to keep track
of books, borrowers, and book loans. Referential integrity constraints are shown as directed arcs
in Figure, as in the notation of Figure 5.7. Write down relational expressions for the following
queries:

a. How many copies of the book titled The Lost Tribe are owned by the library branch whose
name is ‘Sharpstown’?

b. How many copies of the book titled The Lost Tribe are owned by each library branch?

c. Retrieve the names of all borrowers who do not have any books checked out.

d. For each book that is loaned out from the Sharpstown branch and whose Due_date is today,
retrieve the book title, the borrower’s name, and the borrower’s address.

e. For each library branch, retrieve the branch name and the total number of books loaned out
from that branch.

f. Retrieve the names, addresses, and number of books checked out for all borrowers who have
more than five books checked out.

g. For each book authored (or coauthored) by Stephen King, retrieve the title and the number of
copies owned by the library branch whose name is Central.

A relational database scheme for a LIBRARY database.

Step-by-step solution

Step 1 of 3

c.

Following is the relational expression to retrieve the names of the borrowers who have no books
checked out:

Tuple Relational calculus:

Explanation:

• In the provided tuple relational calculus, the BORROWER relation is referred to by b and the BOOK_LOANS relation by l.

• In the above tuple relational calculus there are two tuple variables, b and l; the free variable b
appears to the left of the bar (|).

• The variables that appear before the bar (|) are retrieved for all those tuples that satisfy the
conditions provided after the bar.

• The conditions Borrower(b) and Book_Loans(l) specify the range relations for b and l. The
condition b.Card_No = l.Card_No is a join condition.
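
A possible formulation (a sketch; the attribute names Name and Card_no are assumed from the LIBRARY schema):

{b.Name | BORROWER(b) AND (NOT (∃l)(BOOK_LOANS(l) AND b.Card_no = l.Card_no))}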

Domain Relational Calculus:


Explanation:

• Ten domain variables q, r, s, ..., z are used for the Borrower relation; only q is free, because it
appears to the left of the bar.

• First, there is a specification of the requested attribute, the name of the borrower, by the free
domain variable q for the Name field.

• There is a condition for selecting a tuple after the bar (|).

• A condition relating two domain variables from different relations is a join condition.
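
A possible domain calculus formulation, assuming the attribute orders BORROWER(Card_no, Name, Address, Phone) and BOOK_LOANS(Book_id, Branch_id, Card_no, Date_out, Due_date) (assumed, not confirmed by this page):

{q | (∃p)(∃r)(∃s)(BORROWER(p, q, r, s) AND NOT (∃u)(∃v)(∃w)(∃x)(∃y)(BOOK_LOANS(u, v, w, x, y) AND w = p))}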

Step 2 of 3

d.

Following is the relational expression to retrieve the book title and the borrower's name and
address for each book that is loaned out from the branch named ‘Sharpstown’ and whose due
date is today:

Tuple Relational calculus:

Explanation:

• In the provided tuple relational calculus, b is the tuple variable for Borrower and c is the tuple
variable for Book_Loans.

• In the above tuple relational calculus, there are two free variables, b and c, and these appear to
the left of the bar (|).

• The variables that appear before the bar (|) are retrieved for all those tuples that satisfy the
conditions provided after the bar.

• The conditions Borrower(b) and Book_Loans(c) specify the range relations for b and c. The
condition Branch_name = ‘Sharpstown’ is a selection condition, and c.Card_No = b.Card_No is
a join condition.
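
A possible formulation (a sketch; BOOK, LIBRARY_BRANCH, and the join attributes are assumptions taken from the LIBRARY schema):

{t.Title, b.Name, b.Address | BOOK(t) AND BORROWER(b) AND (∃c)(∃h)(BOOK_LOANS(c) AND LIBRARY_BRANCH(h) AND c.Book_id = t.Book_id AND c.Card_no = b.Card_no AND c.Branch_id = h.Branch_id AND h.Branch_name = ‘Sharpstown’ AND c.Due_date = ‘today’)}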

Domain Relational Calculus:

• Sixteen domain variables are needed across the BOOK, BOOK_LOANS, and BORROWER relations;
only a, e, and f are free, because they appear to the left of the bar.

• First, there is a specification of the requested attributes: Title from BOOK, and the Name and
Address fields from BORROWER.

• The values assigned to the variables i, j, k, l, m become a tuple of the Book_Loans relation; the
value of i (Card_no) must equal o (Card_no), and Branch_name must equal ‘Sharpstown’.

• Then there is a condition for selecting a tuple after the bar (|).

• Conditions relating two domain variables from different relations, such as i = o and j = f, are join conditions.

Step 3 of 3

f.

Following is the relational expression to retrieve the name, address and the total number of
books for all borrowers who have more than five books checked out:

Tuple Relational calculus:

Explanation:

• In the provided tuple relational calculus, b is the tuple variable for Borrower and a is the tuple
variable for Book_Loans.

• In the above tuple relational calculus, there are two free variables, b and a, and these appear to
the left of the bar (|).

• The variables that appear before the bar (|) are retrieved for all those tuples that satisfy the
conditions provided after the bar.

• The conditions Borrower(b) and Book_Loans(a) specify the range relations for b and a. The
condition b.Card_No = a.Card_No is a join condition, and the total number of books for each
borrower is obtained with a count() aggregate, which is an extension to the basic calculus.
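
A possible formulation, relying on a count() aggregate extension of the kind discussed in Problem 31E (this notation is an assumption, not part of the standard calculus):

{b.Name, b.Address, count(a) | BORROWER(b) AND BOOK_LOANS(a) AND b.Card_no = a.Card_no AND count(a) > 5}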

Domain Relational Calculus:

Explanation:

• Ten domain variables q, r, s, ..., z are used for the Borrower relation; only q, s, and v are free,
because they appear to the left of the bar.

• First, there is a specification of the requested attributes by the free domain variables: q for the
Name field, s for the Address field, and v for the total number of books.

• There is a condition for selecting a tuple after the bar (|).

• A condition relating two domain variables from different relations, such as m = z, is a join condition,
together with the requirement that the count be greater than 5.

Chapter 8, Problem 27E

Problem

In a tuple relational calculus query with n tuple variables, what would be the typical minimum
number of join conditions? Why? What is the effect of having a smaller number of join
conditions?

Step-by-step solution

Step 1 of 1

A tuple relational calculus query with n tuple variables should have at least (n - 1) join
conditions; with fewer, the Cartesian product with one of the range relations would be taken, that is,
every combination of tuples would appear in the result, which usually does not make sense.
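
For example, with two tuple variables e and d ranging over EMPLOYEE and DEPARTMENT (COMPANY schema attribute names assumed), a meaningful query has the single join condition e.Dno = d.Dnumber; if it is omitted, every employee tuple is paired with every department tuple:

{e.Lname, d.Dname | EMPLOYEE(e) AND DEPARTMENT(d) AND e.Dno = d.Dnumber}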

Chapter 8, Problem 28E

Problem

Rewrite the domain relational calculus queries that followed Q0 in Section 8.7 in the style of the
abbreviated notation of Q0A, where the objective is to minimize the number of domain variables
by writing constants in place of variables wherever possible.

Step-by-step solution

Step 1 of 5

Q1: {qsv | (EXISTS z)(EXISTS l)(EXISTS m)(EMPLOYEE(qrstuvwxyz) AND DEPARTMENT(lmno) AND l = ‘Research’ AND m = z)}

Step 2 of 5

One condition relates two domain variables that range over attributes from two relations, m = z in Q1,
and another equates a domain variable to a constant, l = ‘Research’. The abbreviated domain
relational calculus for the query above is:

Step 3 of 5

Q1A: {qsv | (EXISTS z)(EXISTS m)(EMPLOYEE(qrstuvwxyz) AND DEPARTMENT(‘Research’, m, n, o) AND m = z)}

Step 4 of 5

Q2: {iksuv | (EXISTS j)(EXISTS m)(EXISTS n)(EXISTS t)(PROJECT(hijk) AND EMPLOYEE(qrstuvwxyz) AND DEPARTMENT(lmno) AND k = m AND n = t AND j = ‘Stafford’)}

The abbreviated domain relational calculus is:

Step 5 of 5

Q2A: {iksuv | (EXISTS m)(EXISTS n)(EXISTS t)(PROJECT(h, i, ‘Stafford’, k) AND EMPLOYEE(qrstuvwxyz) AND DEPARTMENT(lmno) AND k = m AND n = t)}

The remaining queries, Q6 and Q7, do not change under the abbreviated notation, since they contain no constants.

Chapter 8, Problem 29E

Problem

Consider this query: Retrieve the Ssns of employees who work on at least those projects on
which the employee with Ssn = 123456789 works. This may be stated as (FORALL x) (IF P
THEN Q), where

■ x is a tuple variable that ranges over the PROJECT relation.

■ P ≡ employee with Ssn = 123456789 works on project x.

■ Q ≡ employee e works on project x.

Express the query in tuple relational calculus, using the rules

■ (∀ x)(P(x)) = NOT (∃x) ( NOT(P(x))).

■ (IF P THEN Q) ≡ (NOT(P) OR Q).

Step-by-step solution

Step 1 of 1

{e.Ssn | EMPLOYEE(e) AND (∀x)(NOT(PROJECT(x)) OR NOT((∃y)(WORKS_ON(y) AND y.Essn = ‘123456789’ AND x.Pnumber = y.Pno)) OR ((∃w)(WORKS_ON(w) AND w.Essn = e.Ssn AND x.Pnumber = w.Pno)))}

Applying the rule (∀x)(P(x)) = NOT(∃x)(NOT(P(x))), the same query can be written with only existential quantifiers:

{e.Ssn | EMPLOYEE(e) AND NOT((∃x)(PROJECT(x) AND (∃y)(WORKS_ON(y) AND y.Essn = ‘123456789’ AND x.Pnumber = y.Pno) AND NOT((∃w)(WORKS_ON(w) AND w.Essn = e.Ssn AND x.Pnumber = w.Pno))))}

Chapter 8, Problem 30E

Problem

Show how you can specify the following relational algebra operations in both tuple and domain
relational calculus.

a. σA=C(R(A, B, C))

b. π<A, B>(R(A, B, C))

c. R(A, B, C) * S(C, D, E)

d. R(A, B, C) ⋃ S(A, B, C)

e. R(A, B, C) ∩ S(A, B, C)

f. R(A, B, C) − S(A, B, C)

g. R(A, B, C) × S(D, E, F)

h. R(A, B) ÷ S(A)

Step-by-step solution

Step 1 of 7

(a)

Tuple calculus expression followed by the domain calculus expression is
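
One standard way to write them is:

Tuple calculus: {t | R(t) AND t.A = t.C}

Domain calculus: {xyz | R(xyz) AND x = z}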

Step 2 of 7

(b)

Tuple calculus followed by the domain calculus is
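
One standard way to write them is:

Tuple calculus: {t.A, t.B | R(t)}

Domain calculus: {xy | (∃z)(R(xyz))}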

Step 3 of 7

(c)

Tuple calculus expression followed by the domain calculus is
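
One standard way to write them, where the result has attributes (A, B, C, D, E), is:

Tuple calculus: {t | (∃r)(∃s)(R(r) AND S(s) AND r.C = s.C AND t.A = r.A AND t.B = r.B AND t.C = r.C AND t.D = s.D AND t.E = s.E)}

Domain calculus: {xyzuv | R(xyz) AND S(zuv)}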

Step 4 of 7

(d)

Tuple calculus expression followed by the domain calculus is
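
One standard way to write them, since R and S are union compatible, is:

Tuple calculus: {t | R(t) OR S(t)}

Domain calculus: {xyz | R(xyz) OR S(xyz)}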

Step 5 of 7
(e)

Tuple calculus expression
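
One standard tuple calculus formulation is:

{t | R(t) AND S(t)}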

(f)

Tuple calculus expression
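
One standard tuple calculus formulation is:

{t | R(t) AND NOT(S(t))}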

Step 6 of 7

(g)

Tuple calculus expression
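
One standard tuple calculus formulation, where the result has attributes (A, B, C, D, E, F), is:

{t | (∃r)(∃s)(R(r) AND S(s) AND t.A = r.A AND t.B = r.B AND t.C = r.C AND t.D = s.D AND t.E = s.E AND t.F = s.F)}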

Step 7 of 7

(h)

Tuple relation calculus expression is
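
One standard tuple calculus formulation is:

{t.B | R(t) AND (∀s)(NOT(S(s)) OR (∃r)(R(r) AND r.A = s.A AND r.B = t.B))}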

Chapter 8, Problem 31E

Problem

Suggest extensions to the relational calculus so that it may express the following types of
operations that were discussed in Section 8.4: (a) aggregate functions and grouping; (b) OUTER
JOIN operations; (c) recursive closure queries.

Step-by-step solution

Step 1 of 3

1. We can define a relation AGGREGATE with attributes Sum, Minimum, Maximum, Average,
Count, and so on. Using such a relation we can write, for example:

{t.Sum | AGGREGATE(t) AND t.Sum = Σ x.Salary over all x such that EMPLOYEE(x)}

This retrieves the sum of the salaries of all employees. Similar functions can be included for the
other aggregate operations.

Step 2 of 3

2. For OUTER JOIN, a special operation, say with the symbol δ, can be introduced.

A query may then look like:

{t | (EMPLOYEE δ DEPARTMENT)(t)}

Step 3 of 3

3. For recursive closure, a special operation, say with the symbol Φ, can be introduced.

A query may then look like:

{t | EMPLOYEE(t) AND t.Ssn Φ t.Mgr_ssn}

By marking the query as a recursive closure operation, we instruct the system to compute its
result recursively.

Chapter 8, Problem 32E

Problem

A nested query is a query within a query. More specifically, a nested query is a parenthesized
query whose result can be used as a value in a number of places, such as instead of a relation.
Specify the following queries on the database specified in Figure 5.5 using the concept of nested
queries and the relational operators discussed in this chapter. Also show the result of each query
as it would apply to the database state in Figure 5.6.

a. List the names of all employees who work in the department that has the employee with the
highest salary among all employees.

b. List the names of all employees whose supervisor’s supervisor has ‘888665555’ for Ssn.

c. List the names of employees who make at least $10,000 more than the employee who is paid
the least in the company.

Step-by-step solution

Step 1 of 3

Consider the COMPANY database specified in Figure 5.5.


a. List the names of all employees who work in the department that has the employee with the
highest salary among all employees.

The query using the relational operators is as follows:
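
One possible formulation (a sketch, assuming the COMPANY attribute names of Figure 5.5 and the ℱ aggregate-function notation of relational algebra):

T1(Max_salary) ← ℱ MAX Salary (EMPLOYEE)
T2(Dnum) ← π Dno (EMPLOYEE ⋈ Salary=Max_salary (T1))
RESULT ← π Fname, Lname (EMPLOYEE ⋈ Dno=Dnum (T2))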

Result:

Step 2 of 3

b. List the names of all employees whose supervisor’s supervisor has '888665555' for SSN.

The query using the relational operators is as follows:
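
One possible formulation (a sketch; Super_ssn is the supervisor attribute in Figure 5.5):

T1(S2ssn) ← π Ssn (σ Super_ssn=‘888665555’ (EMPLOYEE))
RESULT ← π Fname, Lname (EMPLOYEE ⋈ Super_ssn=S2ssn (T1))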

Result:

Step 3 of 3

c. List the names of employees who make at least $10,000 more than the employee who is paid
the least in the company.

The query using the relational operators is as follows:
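
One possible formulation (a sketch using the ℱ aggregate-function notation):

T1(Min_salary) ← ℱ MIN Salary (EMPLOYEE)
RESULT ← π Fname, Lname (σ Salary >= Min_salary + 10000 (EMPLOYEE × T1))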

Result:

Chapter 8, Problem 33E

Problem

State whether the following conclusions are true or false:

a. NOT(P(x) OR Q(x)) → (NOT(P(x)) AND NOT(Q(x)))

b. NOT(∃x)(P(x)) → (∀x)(NOT(P(x)))

c. (∃x)(P(x)) → (∀x)(P(x))

Step-by-step solution

Step 1 of 3

(a) TRUE

Step 2 of 3

(b) TRUE

Step 3 of 3

(c) FALSE

Chapter 8, Problem 34LE

Problem

Specify and execute the following queries in relational algebra (RA) using the RA interpreter on
the COMPANY database schema in Figure 5.5.

a. List the names of all employees in department 5 who work more than 10 hours per week on
the ProductX project.

b. List the names of all employees who have a dependent with the same first name as
themselves.

c. List the names of employees who are directly supervised by Franklin Wong.

d. List the names of employees who work on every project.

e. List the names of employees who do not work on any project.

f. List the names and addresses of employees who work on at least one project located in
Houston but whose department has no location in Houston.

g. List the names of department managers who have no dependents.

Step-by-step solution

Step 1 of 7

a)

EMP_WORK_PRODUCT ← (σ Pname=‘ProductX’ (PROJECT)) ⋈ Pnumber=Pno (WORKS_ON)

EMP_W_10 ← (EMPLOYEE) ⋈ Ssn=Essn (σ Hours>10 (EMP_WORK_PRODUCT))

π Lname, Fname, Minit (σ Dno=5 (EMP_W_10))

Explanation: The above query displays the names of all employees of department 5 who work
more than 10 hours per week on the ProductX project. The query uses a theta join (⋈), σ for
selection, and π for projection, which eliminates duplicates.

Step 2 of 7

b)

EMP ← (EMPLOYEE) ⋈ Ssn=Essn AND Fname=Dependent_name (DEPENDENT)

π Lname, Fname, Minit (EMP)

Explanation: The above query displays the names of all employees who have a dependent with
the same first name as themselves.

Step 3 of 7
c)

Wong_S ← π Ssn (σ Fname=‘Franklin’ AND Lname=‘Wong’ (EMPLOYEE))

Emp_wong ← (EMPLOYEE) ⋈ Super_ssn=Ssn (Wong_S)

π Lname, Fname, Minit (Emp_wong)

Explanation: The query uses a self join to display the names of all employees who are directly
supervised by Franklin Wong.

Step 4 of 7

d)

Emp_proj(Pno, Ssn) ← π Pno, Essn (WORKS_ON)

All_proj(Pno) ← π Pnumber (PROJECT)

All_proj_emp ← Emp_proj ÷ All_proj

π Lname, Fname, Minit (EMPLOYEE * All_proj_emp)

Explanation: The above query gives the names of employees who work on every project; the
division operator keeps only those employees that appear with every project number.

Step 5 of 7

e)

Emps ← π Ssn (EMPLOYEE)

Emps_Working(Ssn) ← π Essn (WORKS_ON)

Emp_Non_work_Project ← Emps - Emps_Working

π Lname, Fname, Minit (EMPLOYEE * Emp_Non_work_Project)

Explanation: The above query gives the names of employees who do not work on any project;
the minus (set difference) operator removes from the set of all employees those that appear in
WORKS_ON.

Step 6 of 7

f)

Emp_proj_Hou(Ssn) ← π Essn (WORKS_ON ⋈ Pno=Pnumber (σ Plocation=‘Houston’ (PROJECT)))

Dept_NoLoc_Hou ← π Dnumber (DEPARTMENT) - π Dnumber (σ Dlocation=‘Houston’ (DEPT_LOCATIONS))

Emp_Dept_No_Hou ← π Ssn (EMPLOYEE ⋈ Dno=Dnumber (Dept_NoLoc_Hou))

Emps_Result ← Emp_proj_Hou ∩ Emp_Dept_No_Hou

π Lname, Fname, Minit, Address (EMPLOYEE * Emps_Result)

Explanation: The above query gives the names and addresses of employees who work on at least
one project located in Houston (Emp_proj_Hou) and whose department has no location in
Houston (Emp_Dept_No_Hou); the intersection keeps only the employees satisfying both conditions.

Step 7 of 7

g)

Managers_Dept(Ssn) ← π Mgr_ssn (DEPARTMENT)

Dependents_Of_Emps(Ssn) ← π Essn (DEPENDENT)

Emps_Result ← Managers_Dept - Dependents_Of_Emps

π Lname, Fname, Minit (EMPLOYEE * Emps_Result)

Explanation: The above query gives the names of department managers who have no
dependents; the minus (set difference) operator removes the managers that appear in DEPENDENT.

Chapter 8, Problem 35LE

Problem

Consider the following MAILORDER relational schema describing the data for a mail order
company.

PARTS(Pno, Pname, Qoh, Price, Olevel)

CUSTOMERS(Cno, Cname, Street, Zip, Phone)

EMPLOYEES(Eno, Ename, Zip, Hdate)

ZIP_CODES(Zip, City)

ORDERS(Ono, Cno, Eno, Received, Shipped)

ODETAILS(Ono, Pno, Qty)

Qoh stands for quantity on hand; the other attribute names are self-explanatory. Specify and
execute the following queries using the RA interpreter on the MAILORDER database schema.

a. Retrieve the names of parts that cost less than $20.00.

b. Retrieve the names and cities of employees who have taken orders for parts costing more
than $50.00.

c. Retrieve the pairs of customer number values of customers who live in the same ZIP Code.

d. Retrieve the names of customers who have ordered parts from employees living in Wichita.

e. Retrieve the names of customers who have ordered parts costing less than $20.00.

f. Retrieve the names of customers who have not placed an order.

g. Retrieve the names of customers who have placed exactly two orders.

Step-by-step solution

Step 1 of 7

MAILORDER Relational Schema

a)

The following command is used to retrieve the names of parts that cost less than $20.00:

SELECT Pname FROM PARTS WHERE Price < 20.00;

Step 2 of 7

b)

The following command is used to retrieve the names and cities of employees who have
taken orders for parts costing more than $50.00; the ORDERS relation is needed to link the
order details to the employee who took the order:

SELECT E.Ename, Z.City
FROM PARTS P, EMPLOYEES E, ZIP_CODES Z, ORDERS O, ODETAILS OT
WHERE P.Pno = OT.Pno AND OT.Ono = O.Ono AND O.Eno = E.Eno
AND E.Zip = Z.Zip AND P.Price > 50.00;

Step 3 of 7

c)

The following command is used to retrieve the pairs of customer number values of customers
who live in the same ZIP code (the condition C.Cno < C1.Cno lists each pair only once):

SELECT C.Cno, C1.Cno FROM CUSTOMERS C, CUSTOMERS C1
WHERE C.Zip = C1.Zip AND C.Cno < C1.Cno;

Step 4 of 7

d)
The following command is used to retrieve the names of customers who have ordered parts from
employees living in Wichita:

SELECT DISTINCT C.Cname
FROM CUSTOMERS C, ORDERS O, EMPLOYEES E, ZIP_CODES Z
WHERE C.Cno = O.Cno AND O.Eno = E.Eno AND E.Zip = Z.Zip AND Z.City = ‘Wichita’;

Step 5 of 7

e)

The following command is used to retrieve the names of customers who have ordered parts
costing less than $20.00:

SELECT DISTINCT C.Cname
FROM CUSTOMERS C, ORDERS O, ODETAILS OT, PARTS P
WHERE C.Cno = O.Cno AND O.Ono = OT.Ono AND OT.Pno = P.Pno
AND P.Price < 20.00;

Step 6 of 7

f)

The following command is used to retrieve the names of customers who have not placed an
order:

SELECT C.Cname FROM CUSTOMERS C
WHERE NOT EXISTS (SELECT * FROM ORDERS O WHERE O.Cno = C.Cno);

Step 7 of 7

g)

The following command is used to retrieve the names of customers who have placed exactly
two orders:

SELECT C.Cname
FROM CUSTOMERS C, ORDERS O
WHERE C.Cno = O.Cno
GROUP BY C.Cno, C.Cname
HAVING COUNT(*) = 2;

Chapter 8, Problem 36LE

Problem

Consider the following GRADEBOOK relational schema describing the data for a grade book of a
particular instructor. (Note: The attributes A, B, C, and D of COURSES store grade cutoffs.)

CATALOG(Cno, Ctitle)

STUDENTS(Sid, Fname, Lname, Minit)

COURSES(Term, Sec_no, Cno, A, B, C, D)

ENROLLS(Sid, Term, Sec_no)

Specify and execute the following queries using the RA interpreter on the GRADEBOOK
database schema.

a. Retrieve the names of students enrolled in the Automata class during the fall 2009 term.

b. Retrieve the Sid values of students who have enrolled in CSc226 and CSc227.

c. Retrieve the Sid values of students who have enrolled in CSc226 or CSc227.

d. Retrieve the names of students who have not enrolled in any class.

e. Retrieve the names of students who have enrolled in all courses in the CATALOG table.

Step-by-step solution

Step 1 of 5

GRADEBOOK Database

a)

The following command is used to retrieve the names of students enrolled in the Automata class
during the fall 2009 term.

• SELECT Fname, Minit, Lname FROM STUDENTS, ENROLLS, COURSES, CATALOG
WHERE STUDENTS.Sid = ENROLLS.Sid AND COURSES.Cno = CATALOG.Cno
AND COURSES.Term = ENROLLS.Term AND COURSES.Sec_no = ENROLLS.Sec_no
AND CATALOG.Ctitle = ‘Automata’ AND ENROLLS.Term = ‘Fall 2009’;

Step 2 of 5

b)

The following command is used to retrieve the Sid values of students who have enrolled in
CSc226 and CSc227.

• SELECT Sid FROM STUDENTS
WHERE Sid IN (SELECT Sid FROM ENROLLS, COURSES
WHERE COURSES.Term = ENROLLS.Term AND COURSES.Sec_no = ENROLLS.Sec_no
AND COURSES.Cno = ‘CSc226’)
AND Sid IN (SELECT Sid FROM ENROLLS, COURSES
WHERE COURSES.Term = ENROLLS.Term AND COURSES.Sec_no = ENROLLS.Sec_no
AND COURSES.Cno = ‘CSc227’);

Step 3 of 5

c)

The following command is used to retrieve the Sid values of students who have enrolled in
CSc226 or CSc227.

• SELECT Sid FROM STUDENTS
WHERE Sid IN (SELECT Sid FROM ENROLLS, COURSES
WHERE COURSES.Term = ENROLLS.Term AND COURSES.Sec_no = ENROLLS.Sec_no
AND COURSES.Cno = ‘CSc226’)
OR Sid IN (SELECT Sid FROM ENROLLS, COURSES
WHERE COURSES.Term = ENROLLS.Term AND COURSES.Sec_no = ENROLLS.Sec_no
AND COURSES.Cno = ‘CSc227’);

Step 4 of 5

d)

The following command is used to retrieve the names of students who have not enrolled in any
class.

• SELECT Fname, Minit, Lname FROM STUDENTS
WHERE NOT EXISTS (SELECT * FROM ENROLLS WHERE ENROLLS.Sid = STUDENTS.Sid);
Step 5 of 5

e)

The following command is used to retrieve the names of students who have enrolled in all
courses in the CATALOG table.

• SELECT Fname, Minit, Lname FROM STUDENTS
WHERE NOT EXISTS (
(SELECT Cno FROM CATALOG)
MINUS
(SELECT Cno FROM COURSES, ENROLLS
WHERE COURSES.Term = ENROLLS.Term AND COURSES.Sec_no = ENROLLS.Sec_no
AND STUDENTS.Sid = ENROLLS.Sid));

Chapter 8, Problem 37LE

Consider a database that consists of the following relations.


SUPPLIER(Sno, Sname)
PART(Pno, Pname)
PROJECT(Jno, Jname)
SUPPLY(Sno, Pno, Jno)
The database records information about suppliers, parts, and projects and includes a ternary relationship
between suppliers, parts, and projects. This relationship is a many-many-many relationship. Specify the
following queries in relational algebra.
1. Retrieve the part numbers that are supplied to exactly two projects.
2. Retrieve the names of suppliers who supply more than two parts to project J1.
3. Retrieve the part numbers that are supplied by every supplier.
4. Retrieve the project names that are supplied by supplier S1 only.
5. Retrieve the names of suppliers who supply at least two different parts each to at least two different
projects.
Chapter 9, Problem 1RQ

Problem

(a) Discuss the correspondences between the ER model constructs and the relational model
constructs. Show how each ER model construct can be mapped to the relational model and
discuss any alternative mappings.

(b) Discuss the options for mapping EER model constructs to relations, and the conditions under
which each option could be used.

Step-by-step solution

Step 1 of 3

A model representing the data in a conceptual and abstract way is called an ER model. It is used
in database modeling to reduce the complexity of the database schema and to produce a
semantic data model of a system.

In a relational schema, relationship types are not represented explicitly; they are represented by
two attributes, one a primary key and the other a foreign key.

a.

Some of the correspondences between the ER model and the relational model are as follows:

ER MODEL → RELATIONAL MODEL

• Entity types and the relationships among them → Relations consisting of attributes; relationships are established through foreign keys.

• Strong entity type (drawn as a rectangle) → An entity relation is constructed for each strong entity.

• Weak entity type (drawn as a double rectangle) → An entity relation is constructed for each weak entity.

• Binary 1:1 or 1:N relationship type (drawn as a connecting line) → A foreign key, or a relationship relation with two foreign keys, each referencing a participating entity relation.

• Binary M:N relationship type (drawn as a connecting line) → A relationship relation with two foreign keys.

• n-ary relationship type with n > 2 (drawn as connecting lines) → A relationship relation with n foreign keys.

• Simple attributes → Attributes of the corresponding relations.

• Composite attributes → A set of simple component attributes.

• Multivalued attributes → A separate relation with a foreign key.

• Derived attributes → Not included.

• Value set (the set of values that may be assigned to an attribute) → Domain (the value scope of a particular attribute).

• Key attributes (underlined) → Primary key; other candidate, composite, or foreign keys.
Follow the following steps to map ER model into relational model efficiently:

1. Ignore derived attributes.

Derived attributes are attributes that can be derived from other attributes, such as age or full name.
If the ER diagram has any derived attributes, remove them all to make the schema simpler. For
example, a full name can be calculated by concatenating the first, middle, and last names, so it is
not required to store the full name separately.

2. Mapping of all strong Entities into tables.

• Map all strong entities into tables. Create a separate relation for each strong entity including all
simple attributes in the ER diagram and choose key attribute of ER diagram as primary key of
relation.

• For an entity type T in the ER schema E, create a relation R that includes all the simple attributes of
T, and choose a unique (key) attribute of T as the primary key of relation R.

• If multiple keys exist for T in E during the analysis of the design, then keep all of them to
describe specific information about the attributes. Keys can also be used for indexing the
database and also for other analysis.

3. Mapping of weak Entities.

• Map all weak entities into tables. Create a separate relation for each weak entity including all
simple attributes. Include all primary keys of the relations to which weak entity is related as
foreign key, to establish connection among the relations.

• Weak entity does not have its own candidate key. Here candidate key of the relation R is
composed of the primary key(s) of the participating entity(s) and the partial key of the weak
entity.

4. Binary 1:1 Mapping.

• For each binary 1:1 relationship in the ER schema, identify the relations corresponding to the two
participating entities. The relationship can be represented in the form of a foreign key in one of
the relations, or by merging the two relations into one.

• Also add the attributes which come under relationship. This can also be done by creating a new
relation R that includes primary keys of both participating relations as foreign key.

5. Binary 1: N Mapping.

• Identify all 1: N relationships in ER diagram. For each binary 1: N relationship in relation R, the
primary key present on the 1-side of the relationship becomes a foreign key on the N-side
relation.

• Another approach is to create a new relation S that includes primary keys of both participating
entities. Both primary keys work as foreign keys in S.

6. Binary M: N Mapping.

• Identify all M: N relationship in ER diagram. Create new relation S, corresponding to each


binary M: N relationship, to represent relationship R. Include both primary key attributes of
participating relations as foreign keys in the relation S. Also include the simple attributes of the
relationship.

• The combination of the foreign keys forms the primary key of S. Unlike a 1:1 or 1:N relationship,
an M:N relationship cannot be represented by a single foreign key attribute in one of the
participating relations.


Step 2 of 3

7. Mapping of multivalued attributes.

• Create a new relation R, corresponding to each multivalued attribute A present in the ER


diagram. All simple attributes corresponding to A, would be present in relation R.

• The relation will comprise of primary key attribute K, such that the attribute K belongs to the
relation representing the relationship type containing A as a multivalued attribute. The primary
key of R would be the combination of A and K.

8. Mapping of N-ary relationship.

• For each n-ary relationship with n > 2, represent the relationship through a new relation R.
Include the primary key attributes of all participating relations as foreign key attributes, and also
include the simple attributes of the n-ary relationship.

• Since more than two entities participate, the relationship cannot be mapped without creating a new
relation, as the sketch below shows. The combination of all the foreign keys is generally used as the primary key of relation R.
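
For instance, the ternary SUPPLY relationship among SUPPLIER, PART, and PROJECT (seen in Chapter 8, Problem 37LE) would map as follows; the column types are illustrative assumptions:

CREATE TABLE SUPPLY
( Sno CHAR(5) NOT NULL,
  Pno CHAR(5) NOT NULL,
  Jno CHAR(5) NOT NULL,
  PRIMARY KEY (Sno, Pno, Jno),
  FOREIGN KEY (Sno) REFERENCES SUPPLIER(Sno),
  FOREIGN KEY (Pno) REFERENCES PART(Pno),
  FOREIGN KEY (Jno) REFERENCES PROJECT(Jno) );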

Step 3 of 3

b.

Method of mapping EER model into relational model.

Mapping of Enhanced Entity Relationship (EER) model to relations includes all the 8 steps
followed in part (a). EER model is an extended model, used to map extended elements of the ER
model. Extended elements in the EER model are specialization or generalization and shared
subclasses.

The following steps can also be used for EER to relation mapping:

Mapping specialization or generalization.

• A number of subclasses, that constitutes a specialization, can be mapped into relational


schema using several options.

First option is to map the whole specialization into a single table. Second option is to map it into
multiple tables. In each option, variations may occur that depends upon the constraints on the
specialization or generalization.
• Each specialization containing m subclasses {S1, S2, ..., Sm} of a generalized superclass C,
where the attributes of C are {k, a1, ..., an} and k is the primary key, is converted into relational
schemas using one of the following options:

Option 9a: Multiple relations—superclass and subclasses.

• Create a relation R for the superclass C that includes all the attributes of C, with primary key k.
For each subclass Si, 1 ≤ i ≤ m, create a separate relation Ri whose attributes are {k} together
with the attributes of Si; here k works as the primary key of each relation Ri.

Option 9b: Multiple relations—subclasses relations only.

• Create a relation Ri corresponding to every subclass Si, 1 ≤ i ≤ m, whose attributes are
{k, a1, ..., an} together with the attributes of Si, where k is the primary key of each relation Ri.
Only a specialization whose subclasses are total can use this option; a specialization is total
when every entity of the superclass belongs to at least one subclass.

• A specialization with the disjointness constraint maps cleanly through this option. For an
overlapping specialization, the same entity may be replicated in several relations, which causes
redundancy in the relational schema.

Option 9c: Single relation with one type attribute.

• Create a single relation R that includes the attributes {k, a1, ..., an}, the attributes of all the
subclasses, and a type attribute t. The attribute t is called a type attribute or discriminating
attribute; it indicates the subclass to which each tuple/record belongs. The attribute k is the
primary key.

• This option is applicable for a specialization whose subclasses are disjoint. This option
generates many NULL values if independent attributes exist in the subclasses.

Option 9d: Single relation with multiple type attributes.

• Create a single relation R that includes the attributes {k, a1, ..., an}, the attributes of all the
subclasses, and Boolean type attributes {t1, t2, ..., tm}. The attribute k is the primary key for the
relation R.

• Each attribute ti is a Boolean type attribute indicating whether a tuple belongs to subclass Si.
This option can be used for a specialization having overlapping subclasses.
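
A minimal SQL sketch of option 9c, using a hypothetical specialization of EMPLOYEE into SECRETARY and ENGINEER (all names and column types here are illustrative assumptions):

CREATE TABLE EMPLOYEE
( Ssn          CHAR(9) NOT NULL,
  Name         VARCHAR(30),
  Typing_speed INT,          -- SECRETARY-specific attribute; NULL otherwise
  Eng_type     VARCHAR(20),  -- ENGINEER-specific attribute; NULL otherwise
  Job_type     VARCHAR(12),  -- discriminating (type) attribute t
  PRIMARY KEY (Ssn) );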

Chapter 9, Problem 2E

Problem

Map the UNIVERSITY database schema shown in Figure 3.20 into a relational database
schema.

Step-by-step solution

Step 1 of 3

Refer to Figure 3.20 of Chapter 3 in the textbook for the UNIVERSITY database schema.

Step 2 of 3

Basic steps to map ER diagram into Relational Database Schema are as follows:

1. Ignore derived attributes.

If the ER diagram has any derived attributes, remove them all to make the schema simpler.
Derived attributes are attributes that can be derived from other attributes, such as age or full name;
age can be calculated as the difference between the current date and the date of birth.

2. Mapping of all strong Entities into tables.

Map all strong entities into tables. Create a relation R that includes all single attributes in the ER
diagram and choose key attribute of ER diagram as primary key of relation R.

COLLEGE

CName COffice CPhone

INSTRUCTOR

Id IName Rank IOffice IPhone

DEPT

DCode DName DOffice DPhone

STUDENT

Sid DOB FName MName LName Addr Phone Major

COURSE

CCode Credits CoName Level CDesc


SECTION

SecId SecNo Sem Year Bldg RoomNo DaysTime

3. Mapping of weak Entities.

For each weak entity create a separate relation R. Add all the simple attributes of weak entity in
relation R. Include all primary keys of the relations to which weak entity is related as foreign key,
to establish connection among the relations. Since the provided ER diagram has no weak entity,
so there is no need to map weak entities.

4. 1:1 Mapping.

For each binary 1:1 relationship in the ER schema, identify the relations corresponding to the two
participating entities. The relationship can be represented in the form of a foreign key, or by
merging the two relations into one. Also add the attributes that belong to the relationship.

COLLEGE

CName COffice CPhone DeanId

INSTRUCTOR

Id IName Rank IOffice IPhone DCode CStartDate

5. 1: N Mapping.

Identify all 1:N relationships in ER diagram. For each regular binary 1:N relationship in relation R,
add primary key of participating relation of 1-side as foreign keys to the N-side relation.

COLLEGE

CName COffice CPhone DeanId DCode

DEPT

DCode DName DOffice DPhone CCode InstId SId

INSTRUCTOR

Id IName Rank IOffice IPhone DCode CStartDate SecId

COURSE

CCode Credits CoName Level CDesc SecId

6. M: N Mapping.

Identify all M:N relationship in ER diagram. For each M:N relationship, create new relation S to
represent relationship. Include all primary key attributes of participating relation as foreign key in
the relation S.

TAKES

Sid Grade SecId

7. Mapping of Multivalued attributes.

For each multivalued attribute in the ER diagram, create a new relation R. R will include all
attributes corresponding to multivalued attribute. Add primary key attribute as a foreign key in R.

Since the provided ER diagram has no multivalued attributes, so there is no need to map
multivalued attributes.

8. Mapping of N-ary relationship.

For each n-ary relationship, where n > 2, create a new relation R to represent the relationship.
Include the primary key attributes of all participating relations as foreign key attributes and also
include the simple attributes of the n-ary relationship.

Since the maximum value of n is 2 in the ER diagram provided, so there is no n-ary relationship.

Step 3 of 3

Final relational schema, for the ER diagram provided in Figure 3.20, can be generated as follows:

The final schema has seven relations: six from the strong entities and one from the binary M:N
relationship. Each relational table has primary and foreign keys. The TAKES table represents the
relationship between the STUDENT and SECTION tables; a SQL sketch of two of the tables is
given after the list below.

Also, Grade can be retrieved with the help of Sid and SecId for the corresponding semester, year, or
section.
• In COLLEGE table, CName is primary key and DeanId and DCode are foreign keys for
INSTRUCTOR and DEPT tables respectively. DeanId is the projection of Id attribute in
INSTRUCTOR table.

• In INSTRUCTOR table, Id is working as primary key. DCode and SecId are working as foreign
key for DEPT and SECTION tables respectively.

• In DEPT table, DCode is unique for each department and it is working as primary key. To
establish connection with COURSE, INSTRUCTOR and STUDENT, their primary keys can be
used as foreign keys. InstId of DEPT table is primary key (Id) attribute in INSTRUCTOR table
and it is working as foreign key here.

• STUDENT table has primary key only. To get the personal information of student SId will be
used. But to retrieve academic information connection is required with DEPT and TAKES table.

• Each course has its unique CCode in COURSE table. COURSE table is logically connected
with SECTION table and DEPT table to particulate the course in department and section.

• TAKES table is created using binary M: N relationship between STUDENT and SECTION. This
is normalized form of both tables.

• In SECTION table SecId is primary key.
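
A minimal SQL sketch of two of these tables (column types are illustrative assumptions):

CREATE TABLE COLLEGE
( CName   VARCHAR(30) NOT NULL,
  COffice VARCHAR(10),
  CPhone  CHAR(10),
  DeanId  CHAR(9),
  DCode   CHAR(4),
  PRIMARY KEY (CName),
  FOREIGN KEY (DeanId) REFERENCES INSTRUCTOR(Id),
  FOREIGN KEY (DCode)  REFERENCES DEPT(DCode) );

CREATE TABLE TAKES
( Sid   CHAR(9) NOT NULL,
  SecId CHAR(6) NOT NULL,
  Grade CHAR(2),
  PRIMARY KEY (Sid, SecId),            -- one grade per student per section
  FOREIGN KEY (Sid)   REFERENCES STUDENT(Sid),
  FOREIGN KEY (SecId) REFERENCES SECTION(SecId) );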

Chapter 9, Problem 3E

Problem

Try to map the relational schema in Figure 6.14 into an ER schema. This is part of a process
known as reverse engineering, where a conceptual schema is created for an existing
implemented database. State any assumptions you make.

Step-by-step solution

Step 1 of 3

Take the relational schema from the textbook, Figure 6.14; it shows the relations resulting from
mapping the EER categories. Based on this, the ER schema can be constructed.

Step 2 of 3

Step 3 of 3

Here, BOOK_AUTHORS holds a multivalued attribute, so it can be represented as a weak entity
type.

Chapter 9, Problem 4E

Problem

Figure shows an ER schema for a database that can be used to keep track of transport ships
and their locations for maritime authorities. Map this schema into a relational schema and specify
all primary keys and foreign keys.

Figure

An ER schema for a SHIP_TRACKING database.

Step-by-step solution

Step 1 of 6

Following are the steps to convert the given ER scheme into a relational schema:

Step 1: Mapping the regular entity types:

Identify the regular entities in the given ER scheme and create a relation for each regular entity.
Include all the simple attributes of regular entities into relations.

The relations are SHIP, SHIP_TYPE, STATE_COUNTRY, and SEA/OCEAN/LAKE.

Step 2 of 6

Step 2: Mapping the weak entity types:

The weak entities in the given ER scheme are SHIP_MOVEMENT, PORT, and PORT_VISIT.

Create a relation for each weak entity. Include all the simple attributes of weak entities into
relations and include the primary key of the strong entity that corresponds to the owner entity
type as a foreign key.

Step 3 of 6

Step 3: Mapping of binary 1:1 relationship types:

There exists one binary 1:1 relationship, SHIP_AT_PORT, in the given ER scheme.

Step 4: Mapping of binary 1: N relationship types:

1: N relationship types in given ER scheme are HISTORY, TYPE, IN, ON, HOME_PORT.

For HISTORY 1: N relationship type, include the primary key of SHIP in SHIP_MOVEMENT. That
is handled in step 2.

For TYPE 1:N relationship type, include the primary key of SHIP_TYPE in SHIP.

For IN 1: N relationship type, include the primary key of STATE_COUNTRY in PORT.

For ON 1: N relationship type, include the primary key of SEA/OCEAN/LAKE in PORT.

For the HOME_PORT 1:N relationship type, include the primary key of PORT in SHIP.

Step 4 of 6

Step 5: Mapping of binary M: N relationship types:

There are no binary M: N relationship types in the given ER scheme.

Step 6: Mapping of multivalued attributes:

There are no multivalued attributes in the given ER scheme.

The relational schema is shown below:

Step 5 of 6

The primary keys in the schema are:

SHIP: Sname
SHIP_TYPE: Type
SHIP_MOVEMENT: Statename, Date, Time (compound key)
SEA/OCEAN/LAKE: SeaName
PORT: Pname
STATE_COUNTRY: Name
PORT_VISIT: VSname, Start_date (compound key)

Step 6 of 6

The foreign keys in the schema are:


SHIP: Ship_type, P_name
SHIP_TYPE: None
SHIP_MOVEMENT: Statename
SEA/OCEAN/LAKE: None
PORT: None
STATE_COUNTRY: Name
PORT_VISIT: VSname

Chapter 9, Problem 5E

Problem

Map the BANK ER schema of Exercise 1 (shown in Figure 1) into a relational schema. Specify all
primary keys and foreign keys. Repeat for the AIRLINE schema (Figure 2) of Exercise 2 and
for the other schemas for Exercises 1 through 9.

Exercise 1

Consider the ER diagram shown in Figure 1 for part of a BANK database. Each bank can have
multiple branches, and each branch can have multiple accounts and loans.

a. List the strong (nonweak) entity types in the ER diagram.

b. Is there a weak entity type? If so, give its name, partial key, and identifying relationship.

c. What constraints do the partial key and the identifying relationship of the weak entity type
specify in this diagram?

d. List the names of all relationship types, and specify the (min, max) constraint on each
participation of an entity type in a relationship type. Justify your choices.

Figure 1

An ER diagram for a BANK database schema.

Exercise 2

Consider the ER diagram in Figure 2, which shows a simplified schema for an airline
reservations system. Extract from the ER diagram the requirements and constraints that
produced this schema. Try to be as precise as possible in your requirements and constraints
specification.

Figure 2

An ER diagram for an AIRLINE database schema.

Exercise 3

Which combinations of attributes have to be unique for each individual SECTION entity in the
UNIVERSITY database shown in Figure 3.20 to enforce each of the following miniworld
constraints:

a. During a particular semester and year, only one section can use a particular classroom at a
particular DaysTime value.

b. During a particular semester and year, an instructor can teach only one section at a particular
DaysTime value.

c. During a particular semester and year, the section numbers for sections offered for the same
course must all be different.

Can you think of any other similar constraints?


Exercise 4

Composite and multivalued attributes can be nested to any number of levels. Suppose we want
to design an attribute for a STUDENT entity type to keep track of previous college education.
Such an attribute will have one entry for each college previously attended, and each such entry
will be composed of college name, start and end dates, degree entries (degrees awarded at that
college, if any), and transcript entries (courses completed at that college, if any). Each degree
entry contains the degree name and the month and year the degree was awarded, and each
transcript entry contains a course name, semester, year, and grade. Design an attribute to hold
this information. Use the conventions in Figure 3.5.

Exercise 5

Show an alternative design for the attribute described in Exercise 4 that uses only entity types
(including weak entity types, if needed) and relationship types.

Exercise 6

In Chapters 1 and 2, we discussed the database environment and database users. We can
consider many entity types to describe such an environment, such as DBMS, stored database,
DBA, and catalog/data dictionary. Try to specify all the entity types that can fully describe a
database system and its environment; then specify the relationship types among them, and draw
an ER diagram to describe such a general database environment.

Exercise 7

Design an ER schema for keeping track of information about votes taken in the U.S. House of
Representatives during the current two-year congressional session. The database needs to keep
track of each U.S. STATE's Name (e.g., 'Texas', 'New York', 'California') and include the
Region of the state (whose domain is {'Northeast', 'Midwest', 'Southeast', 'Southwest',
'West'}). Each CONGRESS_PERSON in the House of Representatives is described by his or her
Name, plus the District represented, the Start_date when the congressperson was first elected,
and the political Party to which he or she belongs (whose domain is {'Republican', 'Democrat',
'Independent', 'Other'}). The database keeps track of each BILL (i.e., proposed law), including
the Bill_name, the Date_of_vote on the bill, whether the bill Passed_or_failed (whose domain is
{'Yes', 'No'}), and the Sponsor (the congressperson(s) who sponsored, that is, proposed, the
bill). The database also keeps track of how each congressperson voted on each bill (domain of
Vote attribute is {'Yes', 'No', 'Abstain', 'Absent'}). Draw an ER schema diagram for this
application. State clearly any assumptions you make.

Exercise 8

A database is being constructed to keep track of the teams and games of a sports league. A
team has a number of players, not all of whom participate in each game. It is desired to keep
track of the players participating in each game for each team, the positions they played in that
game, and the result of the game. Design an ER schema diagram for this application, stating any
assumptions you make. Choose your favorite sport (e.g., soccer, baseball, football).

Exercise 9

Consider the ER diagram in Figure 3. Assume that an employee may work in up to two
departments or may not be assigned to any department. Assume that each department must
have one and may have up to three phone numbers. Supply (min, max) constraints on this
diagram. State clearly any additional assumptions you make. Under what conditions would the
relationship HAS_PHONE be redundant in this example?

Figure 3

Part of an ER diagram for a COMPANY database.

Figure 3.20
Step-by-step solution

There is no solution to this problem yet.


Chapter 9, Problem 6E

Problem

Map the EER diagrams in Figures 4.9 and 4.12 into relational schemas. Justify your choice of
mapping options.

Step-by-step solution

Step 1 of 7

The relational schema diagram for the EER diagram in figure 4.9 is as shown below:
Step 2 of 7

Explanation:

• The regular entity types are PERSON, DEPARTMENT, COLLEGE, COURSE and SECTION.
So, create a relation for each entity with their respective attributes.

• The FACULTY and STUDENT are sub classes of the entity PERSON. So, two relations one for
FACULTY and one for STUDENT are created and the primary key of PERSON is included in
both the relations along with their respective attributes.

• An entity INSTRUCTOR_RESEARCHER is created with Instructor_id as an attribute. This


attribute is included as a foreign key in the relations FACULTY and GRAD_STUDENT.

• There exists a binary 1:1 relationship CHAIRS between FACULTY and DEPARTMENT. So,
include the primary key of Faculty as a foreign key in relation DEPARTMENT.

• There exists a binary 1:N relationship CD between COLLEGE and DEPARTMENT. So, include
the primary key of COLLEGE as a foreign key in relation DEPARTMENT.

Step 3 of 7

• There exists a binary 1:N relationship DC between DEPARTMENT and COURSE. So, include
the primary key of DEPARTMENT as a foreign key in relation COURSE.

• There exists a binary 1:N relationship CS between COURSE and SECTION. So, include the
primary key of COURSE as a foreign key in relation SECTION.

• There exists a binary 1:N relationship ADVISOR between FACULTY and GRAD_STUDENT.
So, include the primary key of FACULTY as a foreign key in relation GRAD_STUDENT.

• There exists a binary 1:N relationship PI between FACULTY and GRANT. So, include the
primary key of FACULTY as a foreign key in relation GRANT.

• There exists a binary 1:N relationship TEACH between SECTION and


INSTRUCTOR_RESEARCHER. Create a relation TEACH and include the primary keys of
SECTION and INSTRUCTOR_RESEARCHER as attributes of TEACH.
• There exists a binary 1:N relationship MAJOR between STUDENT and DEPARTMENT. Create
a relation MAJOR and include the primary keys of STUDENT and DEPARTMENT as attributes of
MAJOR.

Step 4 of 7

• There exists a binary 1:N relationship MINOR between STUDENT and DEPARTMENT. Create a
relation MINOR and include the primary keys of STUDENT and DEPARTMENT as attributes of
MINOR.

• There exists a binary M:N relationship COMMITTEE between FACULTY and


GRAD_STUDENT. Create a relation COMMITTEE and include the primary keys of FACULTY
and GRAD_STUDENT as attributes of COMMITTEE.

• There exists a binary M:N relationship BELONGS between FACULTY and DEPARTMENT.
Create a relation BELONGS and include the primary keys of FACULTY and DEPARTMENT as
attributes of BELONGS.

• There exists a binary M:N relationship REGISTERED between STUDENT and


CURRENT_SECTION. Create a relation REGISTERED and include the primary keys of
STUDENT and CURRENT_SECTION as attributes of REGISTERED.


• There exists a binary M:N relationship TRANSCRIPT between SECTION and STUDENT.
Create a relation TRANSCRIPT and include the primary keys of SECTION and STUDENT as
attributes of TRANSCRIPT along with additional attributes of relation TRANSCRIPT.

Step 5 of 7

The relational schema diagram for the EER diagram in figure 4.12 is as shown below:

Step 6 of 7

Explanation:

• The regular entity types are PLANE_TYPE, AIRPLANE and HANGAR. So, create a relation for
each entity with their respective attributes.

• Create two relations CORPORATION and PERSON and include their respective attributes.

• Owner category is a subset of the union of two entities CORPORATION and PERSON. So, a
relation OWNER is created with Owner_id as an attribute. This attribute is included as a foreign
key in the relations CORPORATION and PERSON.

• The EMPLOYEE and PILOT are sub classes of the entity PERSON. So, two relations one for
EMPLOYEE and one for PILOT are created and the primary key of PERSON is included as
primary key in both the relations along with their respective attributes.

• An entity SERVICE is a weak entity. So, create a relation SERVICE and include as attributes
the primary key of AIRPLANE along with the attributes of SERVICE.

• There exists a binary N:1 relationship OF_TYPE between AIRPLANE and PLANE_TYPE. So,
include the primary key of PLANE_TYPE as a foreign key in relation AIRPLANE.

• There exists a binary N:1 relationship STORED_IN between AIRPLANE and HANGAR. So,
include the primary key of HANGAR as a foreign key in relation AIRPLANE.

• There exists a binary M:N relationship WORKS_ON between PLANE_TYPE and EMPLOYEE.
Create a relation WORKS_ON and include the primary keys of PLANE_TYPE and EMPLOYEE
as attributes of WORKS_ON.

Step 7 of 7

• There exists a binary M:N relationship FLIES between PLANE_TYPE and PILOT. Create a
relation FLIES and include the primary keys of PLANE_TYPE and PILOT as attributes of FLIES.

• There exists a binary M:N relationship OWNS between AIRPLANE and OWNER. Create a
relation OWNS and include the primary keys of AIRPLANE and OWNER as attributes of OWNS
along with the attribute Pdate.

• There exists a binary M:N relationship MAINTAIN between SERVICE and EMPLOYEE. Create
a relation MAINTAIN and include the primary keys of SERVICE and EMPLOYEE as attributes of
MAINTAIN.

Chapter 9, Problem 7E

Problem

Is it possible to successfully map a binary M : N relationship type without requiring a new


relation? Why or why not?

Step-by-step solution

Step 1 of 3

When there exists a many-to-many relationship between two entities, the relationship type is
known as a binary M:N relationship type.

Step 2 of 3

The steps to map a binary M: N relationship type R into relation is as follows:

• Create a new relation R1 to represent the relationship type R.

• Include the primary keys of the two participating entities as foreign keys in new relation R1.

• The primary keys of the two participating entities also become the composite primary key of
relation R1.

• Also include any simple attributes of the relationship type R.
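
A minimal SQL sketch of these steps, mapping the M:N relationship WORKS_ON between EMPLOYEE and PROJECT of the COMPANY schema (column types are illustrative assumptions):

CREATE TABLE WORKS_ON
( Essn  CHAR(9) NOT NULL,
  Pno   INT     NOT NULL,
  Hours DECIMAL(4,1),           -- simple attribute of the relationship type
  PRIMARY KEY (Essn, Pno),      -- composite key from both participants
  FOREIGN KEY (Essn) REFERENCES EMPLOYEE(Ssn),
  FOREIGN KEY (Pno)  REFERENCES PROJECT(Pnumber) );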

Step 3 of 3

Hence, it is not possible to map a binary M: N relationship type without requiring a new relation.

Chapter 9, Problem 8E

Problem

Consider the EER diagram in Figure for a car dealer.

Map the EER schema into a set of relations. For the VEHICLE to CAR/TRUCK/SUV
generalization, consider the four options presented in Section 9.2.1 and show the relational
schema design under each of those options.

Figure

EER diagram for a car dealer.

Step-by-step solution

Step 1 of 8

Option multiple relations – superclass and subclasses:

Following are the set of relations for the VEHICLE to CAR/TRUCK/SUV generalization using the
option multiple relations – superclass and subclasses:

Step 2 of 8

Using the option multiple relations – superclass and subclasses, a separate relation is created for
super class and each sub class in the generalization.

• A relation VEHICLE is created with attributes Vin, Model and Price.

• A relation CAR is created with attribute Vin and Engine_size.

• A relation TRUCK is created with attribute Vin and Tonnage.

• A relation SUV is created with attribute Vin and No_seats.
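
A minimal SQL sketch of this option for VEHICLE and the CAR subclass (column types are illustrative assumptions):

CREATE TABLE VEHICLE
( Vin   CHAR(17) NOT NULL,
  Model VARCHAR(20),
  Price DECIMAL(10,2),
  PRIMARY KEY (Vin) );

CREATE TABLE CAR
( Vin         CHAR(17) NOT NULL,
  Engine_size DECIMAL(4,1),
  PRIMARY KEY (Vin),
  FOREIGN KEY (Vin) REFERENCES VEHICLE(Vin) );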

Step 3 of 8

The relational schema for a car dealer EER diagram (refer figure 9.9) using the option multiple
relations – superclass and subclasses is as shown below:
Step 4 of 8

Option multiple relations –subclass relations only:

Following are the set of relations for the VEHICLE to CAR/TRUCK/SUV generalization using the
option multiple relations –subclass relations only:

Using the option multiple relations –subclass relations only, a separate relation is created for
each sub class in the generalization.

• A relation CAR is created with attribute Vin, Model, Price and Engine_size.

• A relation TRUCK is created with attribute Vin, Model, Price and Tonnage.

• A relation SUV is created with attribute Vin, Model, Price and No_seats.

Step 5 of 8

Option single relation with one type attribute:

Following are the set of relations for the VEHICLE to CAR/TRUCK/SUV generalization using the
option single relation with one type attribute:

Using the option single relation with one type attribute, a single relation is created for super class
as well as the sub class.

• The attributes of the relation will be the union of attributes of super class and sub classes.
• An attribute Vehicle_Type is added to specify the type of the vehicle

• A relation Vehicle is created with attributes Vin, Model, Price, Engine_size, Tonnage, No_seats
and Vehicle_Type.

Step 6 of 8

The relational schema for a car dealer EER diagram (refer figure 9.9) using the option single
relation with one type attribute is as shown below:

Step 7 of 8

Option single relation with multiple type attributes:

Following are the set of relations for the VEHICLE to CAR/TRUCK/SUV generalization using the
option single relation with multiple type attributes:

Using the option single relation with multiple type attributes, a single relation is created for super
class as well as the sub class.

• The attributes of the relation will be the union of attributes of super class and sub classes.

• A Boolean attribute Car_Type is added to indicate that the type of the vehicle is car.

• A Boolean attribute Truck_Type is added to indicate that the type of the vehicle is truck.

• A Boolean attribute SUV_Type is added to indicate that the type of the vehicle is SUV.

• A relation Vehicle is created with attributes Vin, Model, Price, Car_Type, Engine_size,
Truck_Type, Tonnage, SUV_Type, No_seats.

Step 8 of 8

The relational schema for a car dealer EER diagram (refer figure 9.9) using the option single
relation with multiple type attributes is as shown below:

Chapter 9, Problem 9E

Problem

Using the attributes you provided for the EER diagram in Exercise, map the complete schema
into a set of relations. Choose an appropriate option out of 8A thru 8D from Section 9.2.1 in doing
the mapping of generalizations and defend your choice.

Exercise

Consider the following EER diagram that describes the computer systems at a company. Provide
your own attributes and key for each entity type. Supply max cardinality constraints justifying
your choice. Write a complete narrative description of what this EER diagram represents.

Step-by-step solution

Step 1 of 2

EER diagram represents:

EER diagram represents the computer systems at a company.

• The EER diagram starts with the relation computer.

• The relation computer has the attributes RAM, ROM, Processor, S_no, Manufacturer, and
Cost.

• It has the primary key S_no and a cardinality of 1:M.

• The relation computer participates in several relationships: Accessory, Installed, and d.

• Accessory has one-to-many cardinality and relates the computer to the keyboard, monitor, and
mouse.

• The Installed and Installed_OS relationships connect the computer to the software and
operating_system entities that support its operation.

• The relationship d relates laptop and desktop to all the other components.

• The other related components are memory, video_card, and sound_card.

Cardinality:

• One-to-one cardinality describes an entity occurrence that is related to only one occurrence of
another entity.

• One-to-many cardinality describes an entity occurrence that is related to many occurrences of
another entity.

• Many-to-many cardinality describes many entity occurrences that are related to many
occurrences of another entity.

Step 2 of 2

The following table describes the attributes, primary key, and cardinality of each relation:
Chapter 9, Problem 10LE

Problem

Consider the ER design for the UNIVERSITY database that was modeled using a tool like ERwin
or Rational Rose in Laboratory Exercise 3.31. Using the SQL schema generation feature of the
modeling tool, generate the SQL schema for an Oracle database.

Reference Exercise 3.31

Consider the UNIVERSITY database described in Exercise 16. Build the ER schema for this
database using a data modeling tool such as ERwin or Rational Rose.

Reference Exercise 16

Which combinations of attributes have to be unique for each individual SECTION entity in the
UNIVERSITY database shown in Figure 3.20 to enforce each of the following miniworld
constraints:

a. During a particular semester and year, only one section can use a particular classroom at a
particular DaysTime value.

b. During a particular semester and year, an instructor can teach only one section at a particular
DaysTime value.

c. During a particular semester and year, the section numbers for sections offered for the same
course must all be different.

Can you think of any other similar constraints?

Step-by-step solution

Step 1 of 1

Refer to the ER schema for UNIVERSITY database, generated using Rational Rose tool in
Laboratory Exercise 3.31. Use Rational Rose tool to create the SQL schema for an Oracle
database as follows:

• Open the ER schema generated using Rational Rose tool in Laboratory Exercise 3.31. In the
options available on left, right click on the option Component view, go to Data Modeler, then go
to New and select the option Database.
• Name the database as Oracle Database.

• Right click on Oracle Database and select the option Open Specification. In the field Target
select Oracle 7.x and click on OK.

• Import the ER schema, generated using the Rational Rose tool in Laboratory Exercise 3.31, into the
Oracle Database as follows:

• Right click on the Oracle Database, then go to New and select the option File.

• Now browse and select the ER schema generated using Rational Rose tool in Laboratory
Exercise 3.31. Selecting the file would import the ER schema for the UNIVERSITY database,
generated using Rational Rose tool in Laboratory Exercise 3.31.

• Click on File option in menu bar, followed by clicking on Save as option. Save the ER schema
by the file name 714374-9-10LE.

• This will generate the SQL schema of the UNIVERSITY database for the Oracle database.

Comment
Chapter 9, Problem 11LE

Problem

Consider the ER design for the MAIL_ORDER database that was modeled using a tool like
ERwin or Rational Rose in Laboratory Exercise. Using the SQL schema generation feature of the
modeling tool, generate the SQL schema for an Oracle database.

Exercise

Consider a MAIL_ORDER database in which employees take orders for parts from customers.
The data requirements are summarized as follows:

■ The mail order company has employees, each identified by a unique employee number, first
and last name, and Zip Code.

■ Each customer of the company is identified by a unique customer number, first and last name,
and Zip Code.

■ Each part sold by the company is identified by a unique part number, a part name, price, and
quantity in stock.

■ Each order placed by a customer is taken by an employee and is given a unique order number.
Each order contains specified quantities of one or more parts. Each order has a date of receipt
as well as an expected ship date. The actual ship date is also recorded.

Design an entity-relationship diagram for the mail order database and build the design using a
data modeling tool such as ERwin or Rational Rose.

Step-by-step solution

Step 1 of 1

Refer to the ER schema for MAIL_ORDER database, generated using Rational Rose tool in
Laboratory Exercise 3.32. Use Rational Rose tool to create the SQL schema for an Oracle
database as follows:

• Open the ER schema generated using the Rational Rose tool in Laboratory Exercise 3.32. In the
options available on the left, right-click on the option Component View, go to Data Modeler, then go
to New and select the option Database.

• Name the database as Oracle Database.

• Right click on Oracle Database and select the option Open Specification. In the field Target
select Oracle 7.x and click on OK.

• Import the ER schema, generated using the Rational Rose tool in Laboratory Exercise 3.32, into the
Oracle Database as follows:

• Right click on the Oracle Database, then go to New and select the option File.
• Now browse and select the ER schema generated using Rational Rose tool in Laboratory
Exercise 3.32. Selecting the file would import the ER schema for the MAIL_ORDER database.

• Click on File option in menu bar, followed by clicking on Save as option. Save the ER schema
by the file name 714374-9-11LE.

• This will generate the SQL schema of the MAIL_ORDER database for the Oracle database.

Comment
Chapter 10, Problem 1RQ

Problem

What is ODBC? How is it related to SQL/CLI?

Step-by-step solution

Step 1 of 1

ODBC:

Open Database Connectivity (ODBC) is a standardized application programming interface (API)
for accessing a database.

ODBC was originally developed and promoted by Microsoft, and ODBC drivers that provide this
programming support are available for most DBMS products.

SQL/CLI

SQL/CLI is part of the SQL standard. SQL/CLI stands for Call Level Interface. It was developed as
a follow-up to the earlier technique known as ODBC.

SQL/CLI is a set of function calls for accessing a database from a host program.

Comment
Chapter 10, Problem 2RQ

Problem

What is JDBC? Is it an example of embedded SQL or of using function calls?

Step-by-step solution

Step 1 of 2

JDBC:

JDBC stands for Java Database Connectivity. It is a registered trademark of Sun Microsystems.

JDBC is a function-call interface for accessing databases from Java.

A JDBC driver is basically an implementation of the function calls specified in the JDBC
application programming interface (API). It is designed to allow a single Java program to connect
to several different databases.

Comment

Step 2 of 2

JDBC is not an example of embedded SQL; it uses function calls that are specified in the JDBC API.
JDBC function calls can access any RDBMS for which a JDBC driver is available, so the
function libraries for this kind of access are known collectively as JDBC. A minimal sketch of this style follows.
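
The following is a minimal sketch of the function-call style, assuming a hypothetical connection URL and account, and the STUDENT table used in the exercises later in this chapter:

import java.sql.*;

public class JdbcSketch {
  public static void main(String[] args) throws SQLException {
    // The URL, user name, and password below are placeholders.
    String url = "jdbc:oracle:thin:@//dbhost:1521/orcl";
    try (Connection conn = DriverManager.getConnection(url, "user", "pw");
         PreparedStatement ps = conn.prepareStatement(
             "SELECT Name FROM STUDENT WHERE Student_number = ?")) {
      ps.setInt(1, 17);                  // bind the parameter marker
      try (ResultSet rs = ps.executeQuery()) {
        while (rs.next())                // iterate over the query result
          System.out.println(rs.getString("Name"));
      }
    }
  }
}

Note that the SQL text is an ordinary string passed to a library function; this is exactly what distinguishes the function-call approach from embedded SQL.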

Comment
Chapter 10, Problem 3RQ

Problem

List the three main approaches to database programming. What are the advantages and
disadvantages of each approach?

Step-by-step solution

Step 1 of 3

Main approaches to database programming:-

The main approaches for database programming are

(1) Embedding database commands in a general-purpose programming language:

Here, database statements are embedded into the host programming language. They are
identified by a special prefix, and a precompiler or preprocessor scans the source program code to
identify the database statements and extract them for processing by the DBMS.

Comment

Step 2 of 3

(2) Using a library of database functions:-

A library of functions is made available to the host programming language for database calls.

Comment

Step 3 of 3

(3) Designing a brand-new language:

A database programming language is designed from scratch to be compatible with the database
model and query language. Loops and conditional statements are added to the database
language to turn it into a full-fledged programming language.

Advantages and disadvantages of the approaches:

The first two approaches are the most common, since many applications are written in
general-purpose programming languages and require only some database access. Their main
disadvantage is the impedance mismatch between the database model and the programming language model.

The third approach is more appropriate for applications that have intensive database
interaction; the impedance mismatch problem does not occur there, but a new language must be learned by the programmer.

Comment
Chapter 10, Problem 4RQ

Problem

What is the impedance mismatch problem? Which of the three programming approaches
minimizes this problem?

Step-by-step solution

Step 1 of 2

Impedance mismatch:

Impedance mismatch is a term used to refer to the problems that occur because of differences
between the database model and the programming language model.

It is less of a problem when a special database programming language is designed
that uses the same data model and data types as the database.

The relational model has three main constructs:

attributes,

tuples, and

tables.

Comment

Step 2 of 2

First problem:

The data types of the programming language differ from the attribute data types of the data model.

Hence, a binding is necessary for each programming language, because different
languages have different data types.

Second problem:

The results of most queries are sets or multisets of tuples, and each tuple is formed of a sequence of
attribute values.

So a binding is needed to map the query result data structure, which is a table, to an appropriate
data structure in the programming language; a sketch of this mapping is shown below.

The third approach to database programming, that is, designing a brand-new language,
minimizes this impedance mismatch problem.
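
As a minimal sketch of the second kind of binding (the Student record is a hypothetical host-language structure, and conn is assumed to be an open JDBC connection), each row of the table-shaped query result is copied, column by column, into a data structure of the programming language:

import java.sql.*;
import java.util.*;

class BindingSketch {

  // A host-language record standing in for one STUDENT tuple (hypothetical).
  record Student(int number, String name) {}

  // Maps the table-shaped query result to a Java list, column by column.
  static List<Student> loadStudents(Connection conn) throws SQLException {
    List<Student> result = new ArrayList<>();
    try (Statement st = conn.createStatement();
         ResultSet rs = st.executeQuery(
             "SELECT Student_number, Name FROM STUDENT")) {
      while (rs.next())   // one tuple of the result at a time
        result.add(new Student(rs.getInt(1), rs.getString(2)));
    }
    return result;
  }
}

The explicit getInt/getString calls are the binding: they convert each SQL attribute value into a data type of the host language.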

Comment
Chapter 10, Problem 5RQ

Problem

Describe the concept of a cursor and how it is used in embedded SQL.

Step-by-step solution

Step 1 of 2

A cursor is a pointer that points to a single tuple (row) from the result of a query that retrieves
multiple tuples.

It is declared when the SQL query is declared in the program.

In the program, the cursor is processed with two commands:

the OPEN CURSOR command, and

the FETCH command.

The cursor variable acts as an iterator over the query result.

Comments (1)

Step 2 of 2

In embedded SQL, UPDATE and DELETE commands can use the condition WHERE
CURRENT OF <cursor name>, which specifies the current tuple, that is, the tuple the cursor points to.
The general form for declaring a cursor in embedded SQL is:

DECLARE <cursor name> [ INSENSITIVE ] [ SCROLL ] CURSOR

[ WITH HOLD ] FOR <query specification>

[ ORDER BY <ordering specification> ]

[ FOR READ ONLY | FOR UPDATE [ OF <attribute list> ] ] ;
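
For comparison, in JDBC (covered later in this chapter) the ResultSet object plays the role of a cursor: executeQuery corresponds to OPEN, next to FETCH, and close to CLOSE. A minimal sketch, assuming an open Connection conn and the GRADE_REPORT table used in the exercises:

import java.sql.*;

class CursorAnalogy {
  static void printGrades(Connection conn, int studentNumber) throws SQLException {
    try (PreparedStatement ps = conn.prepareStatement(
             "SELECT Grade FROM GRADE_REPORT WHERE Student_number = ?")) {
      ps.setInt(1, studentNumber);
      try (ResultSet rs = ps.executeQuery()) {   // OPEN the cursor
        while (rs.next())                        // FETCH the next tuple
          System.out.println(rs.getString("Grade"));
      }                                          // CLOSE happens on exit
    }
  }
}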

Comment
Chapter 10, Problem 6RQ

Problem

What is SQLJ used for? Describe the two types of iterators available in SQLJ.

Step-by-step solution

Step 1 of 2

SQLJ

SQLJ is a standard that has been adopted by several vendors for embedding SQL in Java; it is
used, for example, in the Oracle DBMS. SQLJ statements are translated into Java code that
accesses the database through the JDBC interface.

In SQLJ, an iterator is associated with the tuples and attributes of a query result. There are two types
of iterators:

(1) a named iterator, and

(2) a positional iterator.

Comment

Step 2 of 2

A named iterator is associated with a query result by listing both the attribute names and the types
that appear in the query result.

A positional iterator lists only the attribute types that appear in the query result.

In both cases, the list must be in the same order as the attributes listed
in the SELECT clause of the query. Looping over a query result is different for the two types of
iterators.

With a named iterator, the attributes are retrieved through accessor methods named after them;
with a positional iterator, only the attribute types are known, so the attributes are retrieved by
position, using a FETCH ... INTO style.

The positional iterator therefore behaves much like embedded SQL; a sketch of both declarations is shown below.
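
A sketch of the two declarations in SQLJ notation, using the GRADE_REPORT table of the exercises (this is SQLJ source, so it must be processed by the SQLJ translator before it is plain Java; the iterator names are hypothetical):

// Named iterator: attribute names and types are listed; attributes are
// retrieved through accessor methods named after them.
#sql iterator GradesByName(int studentNumber, String grade);

// Positional iterator: only the attribute types are listed, in the same
// order as the SELECT clause; attributes are retrieved by position.
#sql iterator GradesByPos(int, String);

// Looping over the named iterator:
//   GradesByName g;
//   #sql g = { SELECT Student_number AS "studentNumber", Grade AS "grade"
//              FROM GRADE_REPORT };
//   while (g.next()) { System.out.println(g.grade()); }
//
// Looping over the positional iterator (embedded-SQL style):
//   GradesByPos p; int num; String grd;
//   #sql p = { SELECT Student_number, Grade FROM GRADE_REPORT };
//   #sql { FETCH :p INTO :num, :grd };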

Comment
Chapter 10, Problem 7E

Problem

Consider the database shown in Figure 1.2, whose schema is shown in Figure 2.1. Write a
program segment to read a student’s name and print his or her grade point average, assuming
that A = 4, B = 3, C = 2, and D = 1 points. Use embedded SQL with C as the host language.

Step-by-step solution

Step 1 of 1

Assuming all required host variables have been declared already (in an EXEC SQL declare
section) and assuming that Name in STUDENT is unique, the code will look like this:

float total_grade_avg = 0; int total_course_count = 0;

prompt("Enter name of student: ", Sname);

EXEC SQL
  SELECT Student_number, Name
  INTO :number, :name
  FROM STUDENT
  WHERE Name = :Sname;

/* cursor over all grades of the selected student */
EXEC SQL DECLARE GR CURSOR FOR
  SELECT Grade
  FROM GRADE_REPORT
  WHERE Student_number = :number;

EXEC SQL OPEN GR;
EXEC SQL FETCH FROM GR INTO :grade;
while (SQLCODE == 0) {            /* SQLCODE == 0 means a tuple was fetched */
  switch (grade) {
    case 'A': total_grade_avg += 4; break;
    case 'B': total_grade_avg += 3; break;
    case 'C': total_grade_avg += 2; break;
    case 'D': total_grade_avg += 1; break;
  }
  total_course_count++;
  EXEC SQL FETCH FROM GR INTO :grade;
}
EXEC SQL CLOSE GR;

if (total_course_count != 0)
  total_grade_avg /= total_course_count;
printf("Grade point average of student is %.2f\n", total_grade_avg);

Comment
Chapter 10, Problem 8E

Problem

Repeat Exercise 10.7, but use SQLJ with Java as the host language.

Reference 10.7

Consider the database shown in Figure 1.2, whose schema is shown in Figure 2.1. Write a
program segment to read a student’s name and print his or her grade point average, assuming
that A = 4, B = 3, C = 2, and D = 1 points. Use embedded SQL with C as the host language.

Step-by-step solution

Step 1 of 1

Assuming all required variables have been declared already, the needed classes have been
imported, and Name in STUDENT is unique, the code will look like this:

int total_grade_avg = 0, total_course_count = 0;

Sname = readEntry("Enter student name: ");

try {
  #sql { SELECT Student_number, Name
         INTO :number, :name
         FROM STUDENT
         WHERE Name = :Sname };
} catch (SQLException se) {
  System.out.println("No student with the name " + Sname);
  return;
}

// named iterator over the grades of the selected student
#sql iterator Grades(String grade);

Grades s = null;

#sql s = { SELECT Grade AS "grade"
           FROM GRADE_REPORT
           WHERE Student_number = :number };

while (s.next()) {
  switch (s.grade().charAt(0)) {
    case 'A': total_grade_avg += 4; break;
    case 'B': total_grade_avg += 3; break;
    case 'C': total_grade_avg += 2; break;
    case 'D': total_grade_avg += 1; break;
  }
  total_course_count++;
}
s.close();

if (total_course_count != 0)
  total_grade_avg = total_grade_avg / total_course_count;

System.out.println("Grade point average of student is " + total_grade_avg);

Comment
Chapter 10, Problem 9E

Problem

Consider the library relational database schema in Figure. Write a program segment that
retrieves the list of books that became overdue yesterday and that prints the book title and
borrower name for each. Use embedded SQL with C as the host language.

Figure

A relational database schema for a LIBRARY database.

Step-by-step solution

Step 1 of 1

Assuming all required host variables have been declared already (a book that became overdue
yesterday is one whose due date was yesterday):

EXEC SQL DECLARE DB CURSOR FOR
  SELECT B.Book_id, B.Title, BW.Name
  FROM BOOK B, BORROWER BW, BOOK_LOANS BL
  WHERE BL.Due_date = CurDate() - 1
    AND BL.Card_no = BW.Card_no
    AND BL.Book_id = B.Book_id;

EXEC SQL OPEN DB;
EXEC SQL FETCH FROM DB INTO :bookId, :bookTitle, :borrowerName;
while (SQLCODE == 0) {
  printf("Book id: %s\n", bookId);
  printf("Book title: %s\n", bookTitle);
  printf("Borrower name: %s\n", borrowerName);
  EXEC SQL FETCH FROM DB INTO :bookId, :bookTitle, :borrowerName;
}
EXEC SQL CLOSE DB;

Comment
Chapter 10, Problem 10E

Problem

Repeat Exercise, but use SQLJ with Java as the host language.

Exercise

Consider the library relational database schema in Figure. Write a program segment that
retrieves the list of books that became overdue yesterday and that prints the book title and
borrower name for each. Use embedded SQL with C as the host language.

Figure

A relational database schema for a LIBRARY database.

Step-by-step solution

Step 1 of 1

Assuming all required variables have been declared already and the needed classes have been imported:

#sql iterator DB(String bookId, String bookTitle, String borrowerName);

DB d = null;

#sql d = { SELECT B.Book_id AS "bookId", B.Title AS "bookTitle", BW.Name AS "borrowerName"
           FROM BOOK B, BORROWER BW, BOOK_LOANS BL
           WHERE BL.Due_date = CurDate() - 1
             AND BL.Card_no = BW.Card_no
             AND BL.Book_id = B.Book_id };

while (d.next()) {
  System.out.println("Book id: " + d.bookId() + ", book title: " + d.bookTitle()
      + ", borrower name: " + d.borrowerName());
}

d.close();

Comment
Chapter 10, Problem 11E

Problem

Repeat Exercise 10.7 and 10.9, but use SQL/CLI with C as the host language.

Reference 10.7

Consider the database shown in Figure 1.2, whose schema is shown in Figure 2.1. Write a
program segment to read a student’s name and print his or her grade point average, assuming
that A = 4, B = 3, C = 2, and D = 1 points. Use embedded SQL with C as the host language.

Reference 10.9

Consider the library relational database schema in Figure 6.6. Write a program segment that
retrieves the list of books that became overdue yesterday and that prints the book title and
borrower name for each. Use embedded SQL with C as the host language.
Step-by-step solution

Step 1 of 4

Exercise 10.7 using SQL/CLI:

#include <sqlcli.h>

void printGPA() {

  SQLHSTMT stmt1;

  SQLHDBC con1;

  SQLHENV env1;

  SQLRETURN ret1, ret2, ret3, ret4;

  int total_grade_avg = 0, total_course_count = 0;

  /* allocate the environment, connection, and statement handles */
  ret1 = SQLAllocHandle(SQL_HANDLE_ENV, SQL_NULL_HANDLE, &env1);

  if (!ret1) ret2 = SQLAllocHandle(SQL_HANDLE_DBC, env1, &con1); else return;

  if (!ret2) ret3 = SQLConnect(con1, "dbs", SQL_NTS, "js", SQL_NTS, "xyz", SQL_NTS); else return;

  if (!ret3) ret4 = SQLAllocHandle(SQL_HANDLE_STMT, con1, &stmt1); else return;

  /* prepare the first query, with a parameter marker for the name */
  SQLPrepare(stmt1, "SELECT Student_number, Name FROM STUDENT WHERE Name = ?", SQL_NTS);

  prompt("Enter student name: ", Sname);

  SQLBindParameter(stmt1, 1, SQL_STRING, &Sname, 15, &fetchlen1);

  ret1 = SQLExecute(stmt1);

  if (!ret1) {

    /* bind the result columns to host variables; assuming Name is
       unique, fetch the single matching student */
    SQLBindCol(stmt1, 1, SQL_INT, &number, 4, &fetchlen1);

    SQLBindCol(stmt1, 2, SQL_STRING, &name, 15, &fetchlen2);

    ret2 = SQLFetch(stmt1);

    /* prepare the second query: all grades of this student */
    SQLPrepare(stmt1, "SELECT Grade FROM GRADE_REPORT WHERE Student_number = ?", SQL_NTS);

    SQLBindParameter(stmt1, 1, SQL_INT, &number, 4, &fetchlen1);

    ret1 = SQLExecute(stmt1);

    if (!ret1) {

      SQLBindCol(stmt1, 1, SQL_STRING, &grade, 2, &fetchlen1);

      ret2 = SQLFetch(stmt1);

      while (!ret2) {

        switch (grade[0]) {

          case 'A': total_grade_avg += 4; break;

          case 'B': total_grade_avg += 3; break;

          case 'C': total_grade_avg += 2; break;

          case 'D': total_grade_avg += 1; break;

        }

        total_course_count++;

        ret2 = SQLFetch(stmt1);

Comment

Step 2 of 4

      }

      if (total_course_count != 0)
        printf("Grade point average of student is %f\n",
               (float) total_grade_avg / total_course_count);
    }

Comment

Step 3 of 4

  }
  else printf("No student with the name %s exists\n", Sname);
}

Exercise 10.9 using SQL/CLI:

#include <sqlcli.h>

void printDueBookRecord() {

  SQLHSTMT stmt1;

  SQLHDBC con1;

  SQLHENV env1;

  SQLRETURN ret1, ret2, ret3, ret4;

  ret1 = SQLAllocHandle(SQL_HANDLE_ENV, SQL_NULL_HANDLE, &env1);

  if (!ret1) ret2 = SQLAllocHandle(SQL_HANDLE_DBC, env1, &con1); else return;

  if (!ret2) ret3 = SQLConnect(con1, "dbs", SQL_NTS, "js", SQL_NTS, "xyz", SQL_NTS); else return;

  if (!ret3) ret4 = SQLAllocHandle(SQL_HANDLE_STMT, con1, &stmt1); else return;

  /* books whose due date was yesterday, i.e., that became overdue yesterday */
  SQLPrepare(stmt1, "SELECT B.Book_id, B.Title, BW.Name \
      FROM BOOK B, BORROWER BW, BOOK_LOANS BL \
      WHERE BL.Due_date = CurDate() - 1 \
        AND BL.Card_no = BW.Card_no \
        AND BL.Book_id = B.Book_id", SQL_NTS);

  ret1 = SQLExecute(stmt1);

  if (!ret1) {

    SQLBindCol(stmt1, 1, SQL_STRING, &Book_id, 4, &fetchlen1);

    SQLBindCol(stmt1, 2, SQL_STRING, &Title, 30, &fetchlen2);

Comment

Step 4 of 4

    SQLBindCol(stmt1, 3, SQL_STRING, &Borrower_name, 30, &fetchlen3);

    ret2 = SQLFetch(stmt1);

    while (!ret2) {

      printf("%s  %s  %s\n", Book_id, Title, Borrower_name);

      ret2 = SQLFetch(stmt1);
    }
  }
}

Comment
Chapter 10, Problem 12E

Problem

Repeat Exercise 10.7 and 10.9, but use JDBC with Java as the host language.

Reference 10.7

Consider the database shown in Figure 1.2, whose schema is shown in Figure 2.1. Write a
program segment to read a student’s name and print his or her grade point average, assuming
that A = 4, B = 3, C = 2, and D = 1 points. Use embedded SQL with C as the host language.

Reference 10.9

Consider the library relational database schema in Figure 6.6. Write a program segment that
retrieves the list of books that became overdue yesterday and that prints the book title and
borrower name for each. Use embedded SQL with C as the host language.
Step-by-step solution

Step 1 of 2

Exercise 10.7 using JDBC:

import java.io.*;

import java.sql.*;

…..

class PrintGPA {

  public static void main(String[] args)
      throws SQLException, IOException {

    try {
      Class.forName("oracle.jdbc.driver.OracleDriver");
    } catch (ClassNotFoundException x) {
      System.out.println("Driver could not be loaded");
    }

    String dbacct, passwrd, Sname, name;
    int number;
    int total_grade_avg = 0, total_course_count = 0;

    dbacct = readEntry("Enter database account: ");
    passwrd = readEntry("Enter password: ");
    Connection conn = DriverManager.getConnection(
        "jdbc:oracle:oci8:" + dbacct + "/" + passwrd);

    Sname = readEntry("Enter student name: ");

    // a prepared statement avoids quoting problems in the WHERE clause
    PreparedStatement s = conn.prepareStatement(
        "SELECT Student_number, Name FROM STUDENT WHERE Name = ?");
    s.setString(1, Sname);
    ResultSet r = s.executeQuery();
    while (r.next()) {
      number = r.getInt(1);
      name = r.getString(2);

      PreparedStatement g = conn.prepareStatement(
          "SELECT Grade FROM GRADE_REPORT WHERE Student_number = ?");
      g.setInt(1, number);
      ResultSet rs = g.executeQuery();
      while (rs.next()) {
        switch (rs.getString(1).charAt(0)) {
          case 'A': total_grade_avg += 4; break;
          case 'B': total_grade_avg += 3; break;
          case 'C': total_grade_avg += 2; break;
          case 'D': total_grade_avg += 1; break;
        }
        total_course_count++;
      }
    }

    if (total_course_count != 0)
      total_grade_avg = total_grade_avg / total_course_count;
    System.out.println("Grade point average of student is " + total_grade_avg);
  }
}

Comment

Step 2 of 2

Exercise 10.9 using JDBC:

import java.io.*;

import java.sql.*;

…..

class PrintOverdueBooks {

  public static void main(String[] args)
      throws SQLException, IOException {

    try {
      Class.forName("oracle.jdbc.driver.OracleDriver");
    } catch (ClassNotFoundException x) {
      System.out.println("Driver could not be loaded");
    }

    String dbacct, passwrd;
    String Book_id, Book_title, Borrower_name;

    dbacct = readEntry("Enter database account: ");
    passwrd = readEntry("Enter password: ");
    Connection conn = DriverManager.getConnection(
        "jdbc:oracle:oci8:" + dbacct + "/" + passwrd);

    // due date was yesterday, i.e., the book became overdue yesterday
    String q = "SELECT B.Book_id, B.Title, BW.Name "
        + "FROM BOOK B, BORROWER BW, BOOK_LOANS BL "
        + "WHERE BL.Due_date = CurDate() - 1 "
        + "AND BL.Card_no = BW.Card_no "
        + "AND BL.Book_id = B.Book_id";

    Statement s = conn.createStatement();
    ResultSet r = s.executeQuery(q);
    while (r.next()) {
      Book_id = r.getString(1);
      Book_title = r.getString(2);
      Borrower_name = r.getString(3);
      System.out.println("Book id: " + Book_id + ", book title: " + Book_title
          + ", borrower name: " + Borrower_name);
    }
  }
}

Comment
Chapter 10, Problem 13E

Problem

Repeat Exercise 10.7, but write a function in SQL/PSM.

Reference 10.7

Consider the database shown in Figure 1.2, whose schema is shown in Figure 2.1. Write a
program segment to read a student’s name and print his or her grade point average, assuming
that A = 4, B = 3, C = 2, and D = 1 points. Use embedded SQL with C as the host language.

Step-by-step solution

Step 1 of 2

Consider the following SQL/PSM function to determine the average grade point of student.

//Function PSM2:

1. CREATE FUNCTION Average_grad ( IN in_name CHAR(20))

//Declare variables to store intermediate values

2. DECLARE total_avg INTEGER DEFAULT 0;

3. DECLARE std_no INTEGER;

4. DECLARE count INTEGER DEFAULT 0;

5. DECLARE temp_grd CHAR(1); DECLARE final_avg FLOAT;

//Query to find the student number of the user-entered student name.

6. SELECT Student_number INTO std_no FROM STUDENT WHERE

Name = in_name;

//Declare a cursor to process the multiple rows returned by the query

7. CURSOR grd IS SELECT Grade FROM GRADE_REPORT WHERE

Student_number = std_no;

8. OPEN grd;

9. LOOP

10. FETCH grd INTO temp_grd;

11. EXIT WHEN grd%NOTFOUND;

12. count := count + 1;

//Use an IF/ELSEIF statement to accumulate the total points of the student.

13. IF temp_grd = 'A' THEN total_avg := total_avg + 4;

14. ELSEIF temp_grd = 'B' THEN total_avg := total_avg + 3;

15. ELSEIF temp_grd = 'C' THEN total_avg := total_avg + 2;

16. ELSE total_avg := total_avg + 1; //the remaining grade is 'D'

17. END IF;

18. END LOOP;

//Calculate the average

19. final_avg := total_avg / count;

//Display the student's average points

20. dbms_output.put_line('The average is: ' || final_avg);

Comment

Step 2 of 2

Explanation of the above function:

• First a function Average_grad is created which takes the name as an input.

• Now, from the line number 2 to line number 5, variables are declared to store intermediate
values.

• Now, Query in line number 6 is used to find the student number of user entered student name.

• In the line number 7, cursor is declared to process the multiple row returned by the query.

• Now, from line number 8 to line number 18, a loop is used to count the number of rows.
Also, an IF/ELSEIF statement is used inside the loop to accumulate the total points of the student.

• At the end, in line number 19 the average is calculated.

• In the line number 20, Dbms_output.put_line is used to display the average.

Comment
Chapter 10, Problem 14E

Problem

Create a function in PSM that computes the median salary for the EMPLOYEE table shown in
Figure 5.5.

Step-by-step solution

Step 1 of 2

Following is the function in Persistent Stored Module (PSM ) to calculate the median salary for
the EMPLOYEE table:

//Function PSM1:

0) CREATE FUNCTION Emp_Median_Sal(IN Salary INTEGER)

1) RETURNS INTEGER

2) DECLARE median_salary INTEGER;

3) SELECT MEDIAN(Salary) INTO median_salary

4) FROM EMPLOYEE;

5) RETURN median_salary;

Comment

Step 2 of 2

Explanation:

Line 0: CREATE FUNCTION is used to create a function. The name of the function

created is Emp_Median_Sal. It computes its result over the salaries of the

EMPLOYEE table.

Line 1: RETURNS is used to return the median salary among the inputs.

Line 2: DECLARE is used to declare local variables. median_salary is a

variable declared to hold the value of median salary.

Line 3: MEDIAN(Salary) will give the median value among the salaries. INTO

clause will assign the value returned by MEDIAN(Salary) into local

variable median_salary.

Line 4: FROM is used to specify from which table the data is to be considered.

Line 5: RETURN is used to return the median_salary.
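
MEDIAN is not part of standard SQL (it is available in some systems, such as Oracle). Where it is missing, the median can also be computed in the host language; the following is a minimal JDBC sketch, assuming an open Connection conn and the EMPLOYEE table of the figure:

import java.sql.*;
import java.util.*;

class MedianSalary {
  static double medianSalary(Connection conn) throws SQLException {
    List<Integer> salaries = new ArrayList<>();
    try (Statement st = conn.createStatement();
         ResultSet rs = st.executeQuery(
             "SELECT Salary FROM EMPLOYEE ORDER BY Salary")) {
      while (rs.next())
        salaries.add(rs.getInt(1));
    }
    int n = salaries.size();
    if (n == 0) throw new SQLException("EMPLOYEE table is empty");
    // Odd count: take the middle value; even count: average the two middle values.
    return (n % 2 == 1)
        ? salaries.get(n / 2)
        : (salaries.get(n / 2 - 1) + salaries.get(n / 2)) / 2.0;
  }
}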

Comment
Chapter 14, Problem 1RQ

Problem

Discuss attribute semantics as an informal measure of goodness for a relation schema.

Step-by-step solution

Step 1 of 2

The semantics of a relation refers to the way the attribute values in a tuple are to be interpreted, that is, the meaning of an attribute value in a tuple.

Comment

Step 2 of 2

• The semantics of an attribute should be considered in such a way that they can be interpreted
easily.

• Once the semantics of an attribute are clear, it will be easy to interpret a relation.

• The relation that is easy to interpret will indeed result in a good schema design.

Thus, the semantics of the attributes serve as an informal measure of goodness for a relation schema design.

Comment
Chapter 14, Problem 2RQ

Problem

Discuss insertion, deletion, and modification anomalies. Why are they considered bad? Illustrate
with examples.

Step-by-step solution

Step 1 of 6

Insertion anomaly refers to the situation where it is not possible to enter data of certain attributes
into the database without entering data of other attributes.

Deletion anomaly refers to the situation where data of certain attributes are lost as the result of
deletion of some of the attributes.

Modification anomaly refers to the situation where partial update of redundant data leads to
inconsistency of data.

Comment

Step 2 of 6

Insertion, deletion and modification anomalies are considered bad due to the following reasons:

• It will be difficult to maintain consistency of data in the database.

• It leads to redundant data.

• It causes unnecessary updates of data.

• Memory space will be wasted at the storage level.

Comment

Step 3 of 6

Consider the following relation named Emp_Proj:

Insertion Anomalies:

• Assume that there is an employee E11 who is not yet working in a project. Then it is not
possible to enter details of employee E11 into the relation Emp_Proj.

• Similarly assume there is a project P7 with no employees assigned to it. Then it is not possible
to enter details of project P7 into the relation Emp_Proj.

• Therefore, it is possible to enter an employee's details into the relation Emp_Proj only if he or she
is assigned to a project.

• Similarly, it is possible to enter details of a project into relation Emp_Proj only if an employee is
assigned to a project.

Comment

Step 4 of 6

Deletion Anomalies:

• Assume that an employee E07 has left the company. So, it is necessary to delete employee
E07's details from the relation Emp_Proj.

• If employee E07's details are deleted from the relation Emp_Proj, then the details of project P5
will also be lost.

Update anomalies:

• Assume that the location of project P1 is changed from Atlanta to New Jersey. Then the update
should be done at three places.

• If the update is reflected for two tuples and is not done for the third tuple, then inconsistency of
data occurs.

Comment

Step 5 of 6
In order to remove insertion, deletion and modification anomalies, decompose the relation
Emp_Proj into three relations as shown below:

Comment

Step 6 of 6

Insertion Anomalies:

• It is possible to enter the details of employee E11 into relation Employee even though he is not
yet working in a project.

• It is possible to enter the details of project P7 into relation Project even though there are no
employees assigned to it.

Deletion Anomalies:

• If employee E07 details are deleted from the relation Employee, still the details of project P5 will
not be lost.

Update anomalies:

• If the location of project P1 is changed from Atlanta to New Jersey, then the update should be
done in relation Project at only one place.

Comment
Chapter 14, Problem 3RQ

Problem

Why should NULLs in a relation be avoided as much as possible? Discuss the problem of
spurious tuples and how we may prevent it.

Step-by-step solution

Step 1 of 4

Nulls values should be avoided in a relation as much as possible for the following reasons:

• Memory space will be wasted at the storage level.

• Meaning and purpose of the attributes is not communicated well.

Comment

Step 2 of 4

• When aggregate operations such as SUM, AVG, etc. are performed on an attribute which has
null values, the result will be incorrect.

• When JOIN operation involves an attribute with null values, the result may be unpredictable.

• The NULL value has different meanings. It may be unknown, not applicable or absent.

Comment

Step 3 of 4

Spurious tuples are generated as the result of bad design or improper decomposition of the base
table.

• Spurious tuples are the tuples generated when a JOIN operation is performed on badly
designed relations. The resultant will have more tuples than the original set of tuples.

• The main problem with spurious tuples is that they are considered invalid as they do not appear
in the base tables.

Comment

Step 4 of 4

Spurious tuples can be avoided by taking care while designing relational schemas.

• The relations should be designed in such a way that when a JOIN operation is performed, the
attributes involved in the JOIN operation must be a primary key in one table and foreign key in
another table.

• While decomposing a base table into two tables, the tables must have a common attribute. The
common attribute must be primary key in one table and foreign key in another table.

Comment
Chapter 14, Problem 4RQ

Problem

State the informal guidelines for relation schema design that we discussed. Illustrate how
violation of these guidelines may be harmful.

Step-by-step solution

Step 1 of 1

Informal guidelines for relational schema design:

For designing a relational database schema, there are four informal guidelines (measures of
goodness):

(1) Making sure the semantics of the attributes are clear.

(2) Reducing the redundant information in tuples.

(3) Reducing the NULL values in tuples.

(4) Disallowing the possibility of generating spurious tuples.

Violating these guidelines may be harmful in the following ways:

(1) Anomalies that cause redundant work to be done during insertion into and modification of a
relation, and that may cause accidental loss of information during a deletion from a relation.

(2) Waste of storage space due to NULLs, and the difficulty of performing selections, aggregation
operations, and joins due to NULL values.

(3) Generation of invalid and spurious data during joins on improperly related base relations.

These problems can be pointed out and detected without additional tools of analysis.

Comment
Chapter 14, Problem 5RQ

Problem

What is a functional dependency? What are the possible sources of the information that defines
the functional dependencies that hold among the attributes of a relation schema?

Step-by-step solution

Step 1 of 3

Functional dependency: A functional dependency describes a relationship between the
attributes in a table. A functional dependency X -> Y between two attributes (or attribute sets) X and Y in a
relation R exists if the value of X uniquely determines the value of Y.

Comment

Step 2 of 3

A functional dependency is a property of the semantics of the attributes; that is, it
represents a semantic association between the attributes of the relation schema R. The semantics of
the attributes, i.e., the designers' understanding of the miniworld, is therefore the main source of the
information that defines the functional dependencies. A functional dependency describes the relation
schema R by specifying constraints that must hold on all legal relation states (extensions) of R.

Comment

Step 3 of 3

In the functional dependency X -> Y, the value of Y is determined by the value of X; that is, X

determines Y.

Full functional dependency indicates that if A and B are attributes of the relation R, then B is fully
functionally dependent on A if B is functionally dependent on A but not on any proper subset of A.

Partial functional dependency indicates that if A and B are attributes of the relation R, then B is
partially dependent on A if there is some attribute that can be removed from A and yet the
dependency still holds among the attributes of the relational schema.

Comment
Chapter 14, Problem 6RQ

Problem

Why can we not infer a functional dependency automatically from a particular relation state?

Step-by-step solution

Step 1 of 1

Certain FDs can be specified without referring to a specific relation state, as a property of the
attributes given their generally understood meaning. It is also possible that certain functional
dependencies may cease to exist in the real world if the relationships change. Some tuples in a state
may agree with a supposed FD, but a newly inserted tuple may not agree with it. Since a
functional dependency is a property of the relation schema R, and not of a particular legal
relation state r of R, an FD cannot be inferred automatically from a particular relation state; a state
can only disprove an FD, never prove it.
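
A relation state can, however, be used to check that a proposed FD is not violated. The following is a minimal sketch of such a check (the encoding of tuples as attribute-name-to-value maps is a hypothetical one chosen for illustration):

import java.util.*;

class FdCheck {

  // Returns true if X -> Y is NOT violated by the given relation state:
  // any two tuples that agree on all attributes of X must agree on all of Y.
  static boolean notViolated(List<Map<String, Object>> state,
                             List<String> x, List<String> y) {
    Map<List<Object>, List<Object>> seen = new HashMap<>();
    for (Map<String, Object> tuple : state) {
      List<Object> xVals = new ArrayList<>();
      for (String a : x) xVals.add(tuple.get(a));
      List<Object> yVals = new ArrayList<>();
      for (String a : y) yVals.add(tuple.get(a));
      List<Object> prev = seen.putIfAbsent(xVals, yVals);
      if (prev != null && !prev.equals(yVals))
        return false;   // two tuples agree on X but differ on Y
    }
    return true;        // not violated by this state -- but not proven, either
  }
}

A true result only means the state does not disprove the FD; as discussed above, no state can prove that the FD holds in general.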

Comment
Chapter 14, Problem 7RQ

Problem

What does the term unnormalized relation refer to? How did the normal forms develop historically
from first normal form up to Boyce-Codd normal form?

Step-by-step solution

Step 1 of 1

An unnormalized relation refers to a relation that does not meet the conditions of any normal form.

The normalization process, first proposed by Codd (1972), takes a relation schema through a
series of tests to certify whether it satisfies a certain normal form. The process, which proceeds
in a top-down fashion by evaluating each relation against the criteria for the normal forms and
decomposing relations as necessary, can thus be considered relational design by analysis.
Initially, Codd proposed three normal forms: 1NF, 2NF, and 3NF. A stronger definition of 3NF, called
Boyce-Codd normal form (BCNF), was proposed later by Boyce and Codd. All these normal forms
are based on a single analytical tool: the functional dependencies among the attributes of a relation.
1NF splits a relation schema into schemas in which every attribute's domain contains only atomic
values, so that no attribute value is a set of values. 2NF removes all partial dependencies of nonprime
attributes A in R on the key and ensures that all nonprime attributes are fully functionally dependent
on the key of R. 3NF removes all transitive dependencies on the key of R and ensures that no
nonprime attribute is transitively dependent on the key.

Comment
Chapter 14, Problem 8RQ

Problem

Define first, second, and third normal forms when only primary keys are considered. How do the
general definitions of 2NF and 3NF, which consider all keys of a relation, differ from those that
consider only primary keys?

Step-by-step solution

Step 1 of 2

Definition of normal forms when only primary keys are considered

First Normal Form: It states that the domain of an attribute must include only atomic values and
that the value of any attribute in a tuple must be a single value from the domain of that attribute.
In other words, first normal form does not allow relations within relations, or relations as attribute
values within tuples.

Second Normal Form: It is based on the concept of full functional dependency. A dependency X -> Y
is a full functional dependency if, after removing any attribute A from X, the dependency no longer
holds; otherwise, it is called a partial dependency.

A relation schema R is said to be in 2NF if every nonprime attribute A in R is fully functionally

dependent on the primary key of R.

Third Normal Form: It is based on the concept of transitive dependency. A functional dependency
X -> Y in a relation schema R is a transitive dependency if there is a set of attributes Z that is
neither a candidate key nor a subset of any key of R, and both X -> Z and Z -> Y hold.

A relation schema is said to be in third normal form if it satisfies second normal form and no
nonprime attribute of R is transitively dependent on the primary key.

Comment

Step 2 of 2

The general definitions of 2NF and 3NF differ from the definitions above because the general
definitions take candidate keys into account as well. Under the general definition of a prime attribute,
an attribute that is part of any candidate key is considered prime. Partial and full functional
dependencies and transitive dependencies are then considered with respect to all candidate keys
of a relation.

General definition of 2NF: A relation schema R is in second normal form if every nonprime
attribute A in R is not partially dependent on any key of R.

General definition of 3NF: A relation schema R is in 3NF if, whenever a nontrivial
functional dependency X -> A holds in R, either (a) X is a superkey of R, or (b) A is a prime
attribute of R.

A functional dependency X -> Y is trivial if X is a superset of Y; otherwise, the dependency is nontrivial.

Comment
Chapter 14, Problem 9RQ

Problem

What undesirable dependencies are avoided when a relation is in 2NF?

Step-by-step solution

Step 1 of 1

2NF removes all partial dependencies of nonprime attributes A in R on the key, and ensures that all
nonprime attributes are fully functionally dependent on the key of R.

Comment
Chapter 14, Problem 10RQ

Problem

What undesirable dependencies are avoided when a relation is in 3NF?

Step-by-step solution

Step 1 of 1

3NF removes all transitive dependencies on the key of R, and ensures that no nonprime attribute is
transitively dependent on the key.

Comment
Chapter 14, Problem 11RQ

Problem

In what way do the generalized definitions of 2NF and 3NF extend the definitions beyond primary
keys?

Step-by-step solution

Step 1 of 1

The generalized definitions of second normal form and third normal form extend beyond primary
key by taking into consideration all the candidate keys of a relation.

• These definitions do not depend/revolve around only the primary key of a relation.

• These definitions take into consideration all the attributes that can be part of a possible key for the
relation.

• These definitions also consider the partial and transitive dependencies on the candidate keys.

Comment
Chapter 14, Problem 12RQ

Problem

Define Boyce-Codd normal form. How does it differ from 3NF? Why is it considered a stronger
form of 3NF?

Step-by-step solution

Step 1 of 3

Boyce – Codd Normal Form (BCNF):

• A relation is said to be in BCNF if and only if every determinant is a candidate key.

• In a functional dependency X -> Y, if the attribute set Y is fully functionally dependent on X, then X is
said to be a determinant.

• A determinant can be composite or single attribute.

• BCNF is a stronger form of third normal form (3NF).

• A relation that is in BCNF will also be in third normal form.

Comment

Step 2 of 3

Following are the differences between 3NF and BCNF:

• BCNF is a stronger normal form than 3NF, while 3NF is a weaker normal form than BCNF.

• In a functional dependency X -> Y, BCNF does not require Y to be a prime attribute (X must be a
superkey instead), whereas 3NF requires Y to be a prime attribute whenever X is not a superkey.

• BCNF does not allow non-key attributes as determinants, whereas 3NF allows non-key attributes
as determinants (when the determined attribute is prime).

Comment

Step 3 of 3

BCNF is a stronger form of third normal form (3NF).

• In BCNF, every determinant must be a candidate key.

• BCNF does not allow some dependencies which are allowed in 3NF.

• A relation that is in BCNF will also be in third normal form.

• A relation that is in third normal form need not be in BCNF.

Comment
Chapter 14, Problem 13RQ

Problem

What is multivalued dependency? When does it arise?

Step-by-step solution

Step 1 of 3

Multivalued Dependency:

• It is defined as a constraint between two different sets of attributes in a relation.

• Because a relation cannot have a set of values in a single tuple, the set of values must be
represented by multiple tuples in the relation.

Comment

Step 2 of 3

Occurrence of Multivalued dependency:

• A multivalued dependency arises when a relation has a constraint that cannot be specified as a
functional dependency.

• It also occurs when two independent multivalued facts about the same entity are represented by
multiple tuples in the same table of a database.

Comment

Step 3 of 3

Example of the occurrence of multivalued dependency:

The Employee table has two multivalued dependencies listed below.

Ename ->> Pname

Ename ->> Dname

Here Ename indicates employee name, Pname indicates project name, and Dname indicates
dependent’s name.

This is a multivalued dependency because; an employee can work in more than one project and
can have more than one dependent.

Comment
Chapter 14, Problem 14RQ

Problem

Does a relation with two or more columns always have an MVD? Show with an example.

Step-by-step solution

Step 1 of 2

In a relation, when one attribute has multiple independent values referring to another attribute, then there
is a multivalued dependency (MVD) in the relation.

An example of a relation with three attributes that has an MVD is as follows:

In the above relation, there exist two MVDs:

In order to remove the MVDs, decompose the relation into two relations as shown below:

Comment

Step 2 of 2

A relation with two or more columns will not always have a multivalued dependency (MVD).

An example of a relation with two attributes that does not have an MVD is as follows:

An example of a relation with three attributes that does not have an MVD is as follows:

Comment
Chapter 14, Problem 15RQ

Problem

Define fourth normal form. When is it violated? When is it typically applicable?

Step-by-step solution

Step 1 of 2

Definition: A relation schema R is in fourth normal form (4NF) if, for every nontrivial multivalued
dependency X ->> Y that holds in R, X is a superkey of R.

Violation of fourth normal form:

• The fourth normal form is violated when the relation has a nontrivial multivalued dependency whose
left-hand side is not a superkey; such multivalued dependencies are used to identify and decompose
the relations in the relational schema R.

Comment

Step 2 of 2

Conditions for applying fourth normal form:

• A relation in fourth normal form is also in Boyce-Codd normal form (and hence in third normal form).

• Every nontrivial multivalued dependency X ->> Y must have a superkey X of R as its left-hand side.
4NF is typically applicable when independent multivalued attributes are mixed in the same relation.
Comment
Chapter 14, Problem 16RQ

Problem

Define join dependency and fifth normal form.

Step-by-step solution

Step 1 of 2

Join dependency:

• It is a constraint specified on a relation schema R, denoted by JD(R1, R2, R3,
..., Rn), where each Ri is a subset of the attributes of R.

• A join dependency is said to be a trivial join dependency if one of the relation schemas Ri in the
join dependency is equal to R.

• It is a constraint on the set of legal relation states over a database schema: every legal state of R
must have a lossless (nonadditive) join decomposition into R1, R2, ..., Rn.

Comment

Step 2 of 2

Fifth normal form:

• It is a database normalization technique which is used to reduce the redundancy (duplicate

values) of relational databases recording multivalued facts.

• A relation in fifth normal form must also satisfy fourth normal form.

• It is also called project-join normal form because, if there is any decomposition of the relation
schema R specified by a join dependency, it will be a lossless join decomposition.

• The fifth normal form is defined in terms of join dependencies.

Comment
Chapter 14, Problem 17RQ

Problem

Why is 5NF also called project-join normal form (PJNF)?

Step-by-step solution

Step 1 of 2

Fifth Normal Form (5NF):

• A relation schema is said to be in fifth normal form if it is in fourth normal form and, for the given set of
functional and join dependencies, every nontrivial join dependency is implied by the candidate keys.

• The fifth normal form is defined in terms of join dependencies.

If there is any decomposition of the relation schema R specified by a join dependency, it will be a

lossless join decomposition. That is why 5NF is also called project-join normal form
(PJNF).

Comment

Step 2 of 2

Examples of Project join normal form:

The following is the example of the project join normal form:

Consider when supplier(S) supplies the parts (p) to the projects (j).

The relationships are derived as follows:

• Supplier(S) supplies part (p).

• Project(j) uses the part (p) and

• Supplier(S) supplies at least one part (p) to the project (j).

Therefore, there is a join dependency in the relation, which is decomposed into the three relations
listed above, and each resulting relation is in 5NF.

Comment
Chapter 14, Problem 18RQ

Problem

Why do practical database designs typically aim for BCNF and not aim for higher normal forms?

Step-by-step solution

Step 1 of 1

Boyce-Codd normal form (BCNF): A relation schema R is in BCNF if, whenever a
nontrivial functional dependency X -> A holds in R, X is a superkey of the relational
schema R.

The practical database design users prefer to use BCNF rather than going for the higher normal
forms because of the following reasons:

• It is simpler form of 3NF (third normal form).

• It reduces the redundancy (or duplicate) of the information in the thousands of tuples.

• The data model can be easily understood by using the BCNF normalization technique.

• It also improves the performance of the database when compared to the other normal forms.

• It is stronger than 3NF, because a relation in BCNF is also a relation in 3NF, but not vice
versa.

• In most practical cases, the dependencies that would violate the normal forms beyond BCNF
(multivalued and join dependencies) are not present, or are difficult to discover.

The above points clearly say that database design users practically use BCNF when compared
to other higher normal forms which improve the consistency, performance and quality of the
database.

Comment
Chapter 14, Problem 19E

Problem

Suppose that we have the following requirements for a university database that is used to keep
track of students’ transcripts:

a. The university keeps track of each student’s name (Sname), student number (Snum), Social
Security number (Ssn), current address (Sc_addr) and phone (Sc_phone), permanent address
(Sp_addr) and phone (Sp_phone), birth date (Bdate), sex (Sex), class (Class) (‘freshman’,
‘sophomore’, …, ‘graduate’), major department (Major_code), minor department (Minor_code) (if
any), and degree program (Prog) (‘b.a.’, ‘b.s.’, ..., ‘ph.d.’). Both Ssn and student number have
unique values for each student.

b. Each department is described by a name (Dname), department code (Dcode), office number
(Doffice), office phone (Dphone), and college (Dcollege). Both name and code have unique
values for each department.

c. Each course has a course name (Cname), description (Cdesc), course number (Cnum),
number of semester hours (Credit), level (Level), and offering department (Cdept). The course
number is unique for each course.

d. Each section has an instructor (Iname), semester (Semester), year (Year), course
(Sec_course), and section number (Sec_num). The section number distinguishes different
sections of the same course that are taught during the same semester/year; its values are 1,2, 3,
..., up to the total number of sections taught during each semester.

e. A grade record refers to a student (Ssn), a particular section, and a grade (Grade).

Design a relational database schema for this database application. First show all the functional
dependencies that should hold among the attributes. Then design relation schemas for the
database that are each in 3NF or BCNF. Specify the key attributes of each relation. Note any
unspecified requirements, and make appropriate assumptions to render the specification
complete.

Step-by-step solution

Step 1 of 4

Functional Dependency:

Functional dependency exists when one attribute in a relation uniquely determines another
attribute. Functional dependency is represented as XY. X and Y can be composite.

The functional dependencies from the given information are as follows (FD 7 assumes that a
student may enroll in only one section of a given course in a particular semester and year):

FD 1: Ssn -> Sname, Snum, Sc_addr, Sc_phone, Sp_addr, Sp_phone, Bdate, Sex, Class, Major_code, Minor_code, Prog

FD 2: Snum -> Ssn, Sname, Sc_addr, Sc_phone, Sp_addr, Sp_phone, Bdate, Sex, Class, Major_code, Minor_code, Prog

FD 3: Dname -> Dcode, Doffice, Dphone, Dcollege

FD 4: Dcode -> Dname, Doffice, Dphone, Dcollege

FD 5: Cnum -> Cname, Cdesc, Credit, Level, Cdept

FD 6: Sec_course, Sec_num, Semester, Year -> Iname

FD 7: Ssn, Sec_course, Semester, Year -> Sec_num

FD 8: Ssn, Sec_course, Sec_num, Semester, Year -> Grade
Comment

Step 2 of 4

From the functional dependencies FD 1 and FD 2, the relation STUDENT can be defined. Either
Ssn or Snum can be primary key.

From the functional dependencies FD 3 and FD 4, the relation DEPARTMENT can be defined.
Either Dname or Dcode can be primary key.

From the functional dependencies FD 5, the relation COURSE can be defined. Cnum is the
primary key.

From the functional dependencies FD 6, the relation SECTION can be defined. Sec_num,
Sec_course, Semester, Year will be the composite primary key.
From the functional dependencies FD 7 and FD 8, the relation GRADE can be defined. {Ssn,
Sec_course, Semester, Year} will be the composite primary key.

Comment

Step 3 of 4

The relations that are in third normal form are as follows:

STUDENT(Snum, Ssn, Sname, Sc_addr, Sc_phone, Sp_addr, Sp_phone, Bdate, Sex, Class, Major_code, Minor_code, Prog)

DEPARTMENT(Dcode, Dname, Doffice, Dphone, Dcollege)

COURSE(Cnum, Cname, Cdesc, Credit, Level, Cdept)

SECTION(Sec_num, Sec_course, Semester, Year, Iname)

GRADE(Ssn, Sec_course, Sec_num, Semester, Year, Grade)

Explanation:

• In STUDENT relation, either Ssn or Snum can be primary key. Either keys can be used to
retrieve the data from the STUDENT table.

• In DEPARTMENT relation, either Dname or Dcode can be primary key. Either keys can be used
to retrieve the data from the DEPARTMENT table.

• In COURSE table, Cnum is the primary key.

• The primary key for the SECTION table is {Sec_num, Sec_course, Semester, Year} which is a
composite primary key.

• The primary key for the GRADE table is {Ssn, Sec_course, Semester, Year} which is a
composite primary key.

Comment

Step 4 of 4

The relational schema is as follows:

Comment
Chapter 14, Problem 20E

Problem

What update anomalies occur in the EMP_PROJ and EMP_DEPT relations of Figures 14.3 and
14.4?

Step-by-step solution

Step 1 of 2

In EMP_PROJ, the partial dependencies can cause anomalies, namely

{SSN} -> {ENAME} and {PNUMBER} -> {PNAME, PLOCATION}

Consider, for example, a PROJECT that temporarily has no EMPLOYEEs working on it:

when the last EMPLOYEE working on it is removed, the project's information (PNAME, PNUMBER,
PLOCATION) will no longer be represented in the database. Moreover, a new PROJECT cannot be
added unless at least one EMPLOYEE is assigned to work on it.

Inserting a new tuple relating an existing EMPLOYEE to an existing PROJECT requires

checking both partial dependencies;

for example, if a different value is entered for PLOCATION than the values in other tuples
with the same value for PNUMBER, we get an update anomaly. Similar comments apply
to EMPLOYEE information. The reason is that EMP_PROJ represents the relationship between
EMPLOYEEs and PROJECTs, and at the same time represents information concerning the
EMPLOYEE and PROJECT entities.

Comment

Step 2 of 2

In EMP_DEPT, the transitive dependency can cause anomalies. That is

{SSN}->{DNUMBER}->{DNAME, DMGRSSN}

Consider, for example, a DEPARTMENT that temporarily has no EMPLOYEEs working for it: its
information (DNAME, DNUMBER, DMGRSSN) will not be represented in the database when the
last EMPLOYEE working for it is removed. A new DEPARTMENT cannot be added unless at
least one EMPLOYEE is assigned to work in it.

Inserting a new tuple relating a new EMPLOYEE to an existing DEPARTMENT requires checking
the transitive dependencies; for example, if a different value is entered for DMGRSSN than those
values in other tuples with the same value for DNUMBER, we get an update anomaly. The
reason is that EMP_DEPT represents the relationship between EMPLOYEEs and
DEPARTMENTs, and at the same time represents information concerning EMPLOYEE and
DEPARTMENT entities.

Comment
Chapter 14, Problem 21E

Problem

In what normal form is the LOTS relation schema in Figure 14.12(a) with respect to the restrictive
interpretations of normal form that take only the primary key into account? Would it be in the
same normal form if the general definitions of normal form were used?

Step-by-step solution

Step 1 of 1

With respect to the restrictive interpretation of the normal forms, the LOTS relation schema is in 2NF,
since there are no partial dependencies on the primary key. However, it is not in 3NF, since the following
two transitive dependencies on the primary key exist:

Property_id# -> County_name -> Tax_rate, and

Property_id# -> Area -> Price.

Now, if we take all keys into account and use the general definitions of 2NF and 3NF, then the
LOTS relation schema will only be in 1NF, because there is a partial dependency

County_name -> Tax_rate on the candidate key {County_name, Lot#}, which violates 2NF.

Comment
Chapter 14, Problem 22E

Problem

Prove that any relation schema with two attributes is in BCNF.

Step-by-step solution

Step 1 of 2

BCNF:

• A relation R is said to be in BCNF if it contains a FD (functional dependencies) of the form a->b.

• Here, either a->b is a trivial FD or {a} is a super key of the relation R.

Comment

Step 2 of 2

Take the relation schema R = {a, b} with two attributes. The only possible nontrivial FDs are

{a} -> {b} and {b} -> {a}.

The functional dependencies follow the cases below:

Case 1: No FD holds in R.

In this case, the key is {a, b} and the relation satisfies BCNF.

Case 2: Only {a} -> {b} holds.

In this case, the key is {a} and the relation satisfies BCNF.

Case 3: Only {b} -> {a} holds.

In this case, the key is {b} and the relation satisfies BCNF.

Case 4: Both {a} -> {b} and {b} -> {a} hold.

In this case, there are two keys, {a} and {b}, and the relation satisfies BCNF.

Hence, any relation with two attributes is in BCNF.

Comment
Chapter 14, Problem 23E

Problem

Why do spurious tuples occur in the result of joining the EMP_PROJ1 and EMP_ LOCS relations
in Figure 14.5 (result shown in Figure 14.6)?

Step-by-step solution

Step 1 of 1

The spurious tuples are those tuples that are not valid. The spurious tuples occur in the result of
joining the EMP_PROJ1 and EMP_LOCS relations because the natural joining is based on the
common attribute Plocation.

• In EMP_LOCS, the primary key is {Ename, Plocation}.

• In EMP_PROJ1, the primary key is {Ssn, Pnumber}.

• The attribute Plocation is not a primary key or a foreign key in the relations EMP_PROJ1 and
EMP_LOCS.

• As Plocation is neither a primary key nor a foreign key in the relations EMP_PROJ1 and
EMP_LOCS, the join generates spurious tuples, as the sketch below illustrates.
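
A small sketch makes the effect concrete (the rows are hypothetical; the join is on the non-key attribute Plocation):

import java.util.*;

class SpuriousJoin {
  record EmpLoc(String ename, String plocation) {}
  record EmpProj1(String ssn, int pnumber, String plocation) {}

  public static void main(String[] args) {
    // Hypothetical extensions: two employees and two projects, all in Houston.
    List<EmpLoc> empLocs = List.of(
        new EmpLoc("Smith", "Houston"), new EmpLoc("Wong", "Houston"));
    List<EmpProj1> empProj1 = List.of(
        new EmpProj1("123", 1, "Houston"), new EmpProj1("456", 2, "Houston"));

    // Natural join on Plocation alone: every employee at a location is
    // paired with every project tuple at that location.
    int count = 0;
    for (EmpProj1 p : empProj1)
      for (EmpLoc e : empLocs)
        if (p.plocation().equals(e.plocation())) {
          System.out.println(e.ename() + " " + p.ssn() + " " + p.pnumber());
          count++;
        }
    System.out.println(count + " joined tuples");
    // If only two original facts existed (Smith works on project 1 and
    // Wong works on project 2), the join yields four tuples, and the two
    // extra ones are spurious.
  }
}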

Comment
Chapter 14, Problem 24E

Problem

Consider the universal relation R = {A, B, C, D, E, F, G, H, I, J} and the set of functional


dependencies F = {{A, B}→{C}, {A}→{D, E}, {B}→{F}, {F}→{G, H}, {D}→{I, J}}. What is the key for
R? Decompose R into 2NF and then 3NF relations.

Step-by-step solution

Step 1 of 1

Let R = {A, B, C, D, E, F, G, H, I, J} and the set of functional dependencies

F = { {A, B}-> {C}, {A}->{D, E}, {B}->{F}, {F}->{G, H}, {D}->{I, J} }

A minimal set of attributes whose closure includes all the attributes in R is a key. Since the
closure of {A, B} is {A, B}+ = R,

one key of R is {A, B}.

Decompose R into 2NF and then 3NF

For this normalize R intuitively into 2NF then 3NF, we may follow below steps

Step 1:

Identify partial dependencies that violate 2NF. These are attributes that are

functionally dependent on either part of the key, {A} or {B}, alone.

Now we can calculate the closures {A}+ and {B}+ to determine partially dependent attributes:

{A}+ = {A, D, E, I, J}. Hence {A} -> {D, E, I, J} ({A} -> {A} is a trivial dependency).

{B}+ = {B, F, G, H}. Hence {B} -> {F, G, H} ({B} -> {B} is a trivial dependency).

For normalizing into 2NF, we may remove the attributes that are functionally dependent on part of
the key (A or B) from R and place them in separate relations R1 and R2, along with the part of
the key they depend on (A or B), which are copied into each of these relations but also remains
in the original relation, which we call R3 below:

R1 = {A, D, E, I, J}, R2 = {B, F, G, H}, R3 = {A, B, C}

The keys of the new relations are {A} for R1, {B} for R2, and {A, B} for R3. Next, we look for transitive dependencies

in R1, R2, R3.

The relation R1 has the transitive dependency {A} -> {D} -> {I, J}, so we remove the transitively
dependent attributes {I, J} from R1 into a relation R11 and copy the attribute D they are
dependent on into R11. The remaining attributes are kept in a relation R12. Hence, R1 is
decomposed into R11 and R12 as follows:

R11 = {D, I, J}, R12 = {A, D, E}

The relation R2 is similarly decomposed into R21 and R22 based on the transitive dependency
{B} -> {F} -> {G, H}:

R21 = {F, G, H}, R22 = {B, F}

The final set of relations in 3NF are {R11, R12, R21, R22, R3}
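
The closure computation used above can be mechanized. The following is a small sketch of the standard attribute-closure algorithm (attributes are encoded as single characters and the FD set as string pairs, a hypothetical encoding chosen for brevity):

import java.util.*;

class AttributeClosure {

  // FDs encoded as left-hand/right-hand attribute strings, e.g. "AB" -> "C".
  static Set<Character> closure(String attrs, Map<String, String> fds) {
    Set<Character> result = new HashSet<>();
    for (char c : attrs.toCharArray()) result.add(c);
    boolean changed = true;
    while (changed) {               // repeat until no FD adds a new attribute
      changed = false;
      for (Map.Entry<String, String> fd : fds.entrySet()) {
        boolean lhsContained = fd.getKey().chars()
            .allMatch(c -> result.contains((char) c));
        if (lhsContained)
          for (char c : fd.getValue().toCharArray())
            changed |= result.add(c);
      }
    }
    return result;
  }

  public static void main(String[] args) {
    Map<String, String> f = Map.of(
        "AB", "C", "A", "DE", "B", "F", "F", "GH", "D", "IJ");
    System.out.println(closure("AB", f));  // prints all of A..J, so {A, B} is a key
  }
}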

Comments (1)
Chapter 14, Problem 25E

Problem

Repeat Exercise for the following different set of functional dependencies G = {{A, B}, → {C}, {B,
D}→ {E, F}, {A, D}→{G, H}, {A}→{I},{H} → {J}}.

Exercise

Consider the universal relation R = {A, B, C, D, E, F, G, H, I, J} and the set of functional


dependencies F = {{A, B}→{C}, {A}→{D, E}, {B}→{F}, {F}→{G, H}, {D}→{I, J}}. What is the key for
R? Decompose R into 2NF and then 3NF relations.

Step-by-step solution

Step 1 of 6

The relation R={ A, B, C, D, E, F, G, H, I, J}

The set of functional dependencies are as follows:

{A, B} -> {C}

{B, D} -> {E, F}

{A, D} -> {G, H}

{A} -> {I}

{H} -> {J}

Step 1: Find the closure of single attributes:

{A}+ = {A, I}

{B}+ = {B}

{C}+ = {C}

{D}+ = {D}

{E}+ = {E}

{F}+ = {F}

{G}+ = {G}

{H}+ = {H, J}

{I}+ = {I}

{J}+ = {J}

From the above closures of single attributes, it is clear that the closure of any single attribute
does not represent relation R. So, no single attribute forms the key for the relation R.

Comment

Step 2 of 6

Step 2: Find the closure of pairs of attributes that are in the set of functional
dependencies.

The closure of {A, B} is as shown below:

From the functional dependencies {A, B} -> {C} and {A} -> {I},

{A, B}+ = {A, B, C, I}

The closure of {B, D} is as shown below:

From the functional dependency {B, D} -> {E, F},

{B, D}+ = {B, D, E, F}

The closure of {A, D} is as shown below:

From the functional dependencies {A, D} -> {G, H}, {A} -> {I} and {H} -> {J},

{A, D}+ = {A, D, G, H, I, J}

From the above closures of pairs of attributes, it is clear that the closure of any pair of attributes
does not represent relation R. So, no pair of attributes forms the key for the relation R.

Comment

Step 3 of 6

Step 3: Find the closure of union of the three pairs of attributes that are in the set of
functional dependencies.

The closure of {A, B, D} is as shown below:

From the functional dependencies {A, B} -> {C}, {B, D} -> {E, F} and {A, D} -> {G, H},

{A, B, D}+ = {A, B, C, D, E, F, G, H}

From the functional dependency {A} -> {I}, the attribute I is added to {A, B, D}+.

Hence, {A, B, D}+ = {A, B, C, D, E, F, G, H, I}

From the functional dependency {H} -> {J}, the attribute J is added to {A, B, D}+.

Hence, {A, B, D}+ = {A, B, C, D, E, F, G, H, I, J}

The closure of {A, B, D} includes all the attributes of relation R.

Hence, the key for relation R is {A, B, D}.


Step 4 of 6

Decomposing the relation R into second normal form (2NF):

According to the second normal form, every non-key attribute must be fully functionally dependent on the whole primary key; there must be no partial dependencies.

• The key for relation R is {A, B, D}.

• {A} is a partial key that functionally determines the attribute I.

• {A, B} is a partial key that functionally determines the attribute C.

• {B, D} is a partial key that functionally determines the attributes E and F.

• {A, D} is a partial key that functionally determines the attributes G and H.

So, decompose the relation R into the following relations.

R1{A, I}

The key for R1 is {A}.

R2{A, B, C}

The key for R2 is {A, B}.

R3{B, D, E, F}

The key for R3 is {B, D}.

R4{A, B, D}

The key for R4 is {A, B, D}.

R5{A, D, G, H, J}

The key for R5 is {A, D}.

The relations R1, R2, R3, R4, R5 are in second normal form.


Step 5 of 6

Decomposing the relation R into third normal form (3NF):

According to the third normal form, the relation must be in second normal form and any non-key
attribute should not describe any non-key attribute.

• H is a non-key attribute that functionally determines the attribute J.

So, decompose the relation R5 into the following relations.

R6{A, D, G, H}

The key for R6 is {A, D}.

R7{H, J}

The key for R7 is {H}.


Step 6 of 6

The final set of relations that are in third normal form is as follows:

R1{A, I}

R2{A, B, C}

R3{B, D, E, F}

R4{A, B, D}

R6{A, D, G, H}

R7{H, J}

Chapter 14, Problem 26E

Problem

Consider the following relation:

A B C TUPLE#

10 b1 c1 1

10 b2 c2 2

11 b4 c1 3

12 b3 c4 4

13 b1 c1 5

14 b3 c4 6

a. Given the previous extension (state), which of the following dependencies may hold in the
above relation? If the dependency cannot hold, explain why by specifying the tuples that cause
the violation.

i. A → B,

ii. B → C,

iii. C → B,

iv. B → A,

v. C → A

b. Does the above relation have a potential candidate key? If it does, what is it? If it does not,
why not?

Step-by-step solution

Step 1 of 2

a)

i. A → B does not hold in the current state of the relation: tuples 1 and 2 have the same A value (10) but different B values (b1 and b2).

ii. B → C may hold in the current state: every B value appears with a single C value (b1 with c1, b2 with c2, b3 with c4, b4 with c1).

iii. C → B does not hold: tuples 1 and 3 have the same C value (c1) but different B values (b1 and b4).

iv. B → A does not hold: b1 appears with A values 10 and 13 (tuples 1 and 5), and b3 appears with A values 12 and 14 (tuples 4 and 6).

v. C → A does not hold: c1 appears with A values 10, 11, and 13 (tuples 1, 3, and 5), and c4 appears with A values 12 and 14.


Step 2 of 2

b) Since the value of the attribute TUPLE# is different for every tuple in the relation, TUPLE# can act as a candidate key.
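
Checks like those in part (a) can be automated: a dependency X → Y is violated by a relation state as soon as two tuples agree on X but differ on Y. Below is a small Java sketch (illustrative, not from the text) that runs the five tests over the given extension:

import java.util.*;

public class FdExtensionCheck {

    // Returns true if column lhs functionally determines column rhs in rows.
    static boolean holds(String[][] rows, int lhs, int rhs) {
        Map<String, String> seen = new HashMap<>();
        for (String[] row : rows) {
            String prev = seen.put(row[lhs], row[rhs]);
            if (prev != null && !prev.equals(row[rhs])) return false; // violation found
        }
        return true;
    }

    public static void main(String[] args) {
        // Columns: 0 = A, 1 = B, 2 = C (TUPLE# omitted; it is unique anyway).
        String[][] r = {
            {"10", "b1", "c1"}, {"10", "b2", "c2"}, {"11", "b4", "c1"},
            {"12", "b3", "c4"}, {"13", "b1", "c1"}, {"14", "b3", "c4"}
        };
        System.out.println("A -> B may hold: " + holds(r, 0, 1));  // false
        System.out.println("B -> C may hold: " + holds(r, 1, 2));  // true
        System.out.println("C -> B may hold: " + holds(r, 2, 1));  // false
        System.out.println("B -> A may hold: " + holds(r, 1, 0));  // false
        System.out.println("C -> A may hold: " + holds(r, 2, 0));  // false
    }
}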

Chapter 14, Problem 27E

Problem

Consider a relation R(A, B, C, D, E) with the following dependencies:

AB → C, CD → E, DE→ B

Is AB a candidate key of this relation? If not, is ABD? Explain your answer.

Step-by-step solution

Step 1 of 3

The candidate key is the minimal field or the combination of fields in a relation that can be used
to uniquely identify all the other fields of the given relation.

The candidate key is checked using the closure property of the set and the functional
dependencies of the given relation.


Step 2 of 3

Consider the given relation R(A, B, C, D, E) and the following functional dependencies:

AB → C, CD → E, DE → B

To check whether AB is a candidate key of the given relation R, find the closure of AB as shown below:

{A, B}+ = {A, B, C}

Since all the attributes of the relation R cannot be identified using AB (D and E are missing from the closure), AB is not a candidate key for the given relation R.


Step 3 of 3

To check whether ABD is a candidate key of the given relation R, find the closure of ABD as shown below:

{A, B, D}+ = {A, B, D, C, E} (AB → C adds C, and then CD → E adds E)

Since all the attributes of the relation R can be identified using ABD, and no proper subset of ABD has this property, ABD is a candidate key for the given relation R.

Hence, AB is not a candidate key, but ABD is.
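
A closure computation of this kind is easy to script. The following minimal Java sketch (illustrative, not from the text) hardcodes the three FDs and prints {A, B}+ and {A, B, D}+, confirming the argument above:

import java.util.*;

public class CandidateKeyCheck {
    public static void main(String[] args) {
        // FDs of R(A, B, C, D, E): AB -> C, CD -> E, DE -> B
        String[][] fds = {{"AB", "C"}, {"CD", "E"}, {"DE", "B"}};
        for (String start : new String[]{"AB", "ABD"}) {
            Set<Character> cl = new TreeSet<>();
            for (char c : start.toCharArray()) cl.add(c);
            boolean changed = true;
            while (changed) {              // repeat until no FD adds a new attribute
                changed = false;
                for (String[] fd : fds) {
                    boolean lhsIn = true;
                    for (char c : fd[0].toCharArray()) if (!cl.contains(c)) lhsIn = false;
                    if (lhsIn) for (char c : fd[1].toCharArray()) changed |= cl.add(c);
                }
            }
            System.out.println("{" + start + "}+ = " + cl);
        }
        // Prints {AB}+ = [A, B, C] (not a key) and {ABD}+ = [A, B, C, D, E] (a key).
    }
}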

Chapter 14, Problem 28E

Problem

Consider the relation R, which has attributes that hold schedules of courses and sections at a
university; R = {Course_no, Sec_no, Offering_dept, Credit_hours, Course_level, Instructor_ssn,
Semester, Year, Days_hours, Room_no, No_of_students}. Suppose that the following functional
dependencies hold on R:

{Course_no} → {Offering_dept, Credit_hours, Course_level}

{Course_no, Sec_no, Semester, Year} → {Days_hours, Room_no, No_of_students, Instructor_ssn}

{Room_no, Days_hours, Semester, Year} → {Instructor_ssn, Course_no, Sec_no}

Try to determine which sets of attributes form keys of R. How would you normalize this relation?

Step-by-step solution

Step 1 of 5

Consider the relation R = {Course_no, Sec_no, Offering_dept, Credit_hours, Course_level,
Instructor_ssn, Semester, Year, Days_hours, Room_no, No_of_students} and the three functional
dependencies given in the problem statement.

Step 2 of 5

The closure of Course_no is as shown below:

{Course_no}+ = {Course_no, Offering_dept, Credit_hours, Course_level}

The attributes Offering_dept, Credit_hours, and Course_level are added to the closure of Course_no
because Course_no functionally determines them. Since the closure does not include all attributes
of R, Course_no alone is not a key.


Step 3 of 5

The closure of {Course_no, Sec_no, Semester, Year} is as shown below:

{Course_no, Sec_no, Semester, Year}+ = {Course_no, Sec_no, Semester, Year, Days_hours, Room_no, No_of_students, Instructor_ssn, Offering_dept, Credit_hours, Course_level}

This closure contains all the attributes of R, so {Course_no, Sec_no, Semester, Year} is a candidate key of R.

Step 4 of 5

The closure of {Room_no, Days_hours, Semester, Year} is as shown below:

{Room_no, Days_hours, Semester, Year}+ also contains all the attributes of R: the third FD yields Instructor_ssn, Course_no, and Sec_no, after which the first two FDs yield the remaining attributes. Hence {Room_no, Days_hours, Semester, Year} is a second candidate key of R.

Step 5 of 5

To normalize R, remove the partial dependency {Course_no} → {Offering_dept, Credit_hours, Course_level} on the candidate key {Course_no, Sec_no, Semester, Year} by decomposing R into:

R1(Course_no, Offering_dept, Credit_hours, Course_level)

R2(Course_no, Sec_no, Semester, Year, Days_hours, Room_no, No_of_students, Instructor_ssn)

Both relations are then in 3NF: in R2, {Room_no, Days_hours, Semester, Year} is also a candidate key, so the third FD does not violate 3NF.
Chapter 14, Problem 29E

Problem

Consider the following relations for an order-processing application database at ABC, Inc.

ORDER (O#, Odate, Cust#, Total_amount)

ORDER_ITEM (O#, I#, Qty_ordered, Total_price, Discount%)

Assume that each item has a different discount. The Total_price refers to one item, Odate is the
date on which the order was placed, and the Total_amount is the amount of the order. If we apply
a natural join on the relations ORDER_ITEM and ORDER in this database, what does the
resulting relation schema RES look like? What will be its key? Show the FDs in this resulting
relation. Is RES in 2NF? Is it in 3NF? Why or why not? (State assumptions, if you make any.)

Step-by-step solution

Step 1 of 4

The natural join of two relations can be performed only when the relations have a common
attribute with the same name.

The relations ORDER and ORDER_ITEM have O# as a common attribute. So, based on the
attribute O#, the natural join of two relations ORDER and ORDER_ITEM can be performed.

The resulting relation RES, when a natural join is applied on relations ORDER and ORDER_ITEM,
is as follows:

RES(O#, I#, Odate, Cust#, Total_amount, Qty_ordered, Total_price, Discount%)

The key of the relation RES will be {O#, I#}.


Step 2 of 4

The functional dependencies in the relation RES are as given below:

O# → {Odate, Cust#, Total_amount}

{O#, I#} → {Qty_ordered, Total_price, Discount%}


Step 3 of 4

The relation RES is not in second normal form, as partial dependencies exist in the relation:

• The key of the relation RES is {O#, I#}.

• O# is a partial key, and it functionally determines Odate, Cust#, and Total_amount.


Step 4 of 4

According to the third normal form, the relation must be in second normal form and no non-key
attribute may be determined by another non-key attribute.

The relation RES is not in third normal form because it is not even in second normal form.

Chapter 14, Problem 30E

Problem

Consider the following relation:

CAR_SALE(Car#, Date_sold, Salesperson#, Commission%, Discount_amt)

Assume that a car may be sold by multiple salespeople, and hence {Car#, Salesperson#} is the
primary key. Additional dependencies are

Date_sold → Discount_amt and

Salesperson# → Commission%

Based on the given primary key, is this relation in 1NF, 2NF, or 3NF? Why or why not? How would
you successively normalize it completely?

Step-by-step solution

Step 1 of 4

The relation CAR_SALE is in first normal form (1NF) but not in second normal form.

• According to the first normal form, the relation should contain only atomic values.

• The primary key is {Car#, Salesperson#}.

• As the relation CAR_SALE contains only atomic values, the relation CAR_SALE is in the first
normal form.


Step 2 of 4

The relation CAR_SALE is not in second normal form as partial dependencies exist in the
relation.

• According to the second normal form, every non-key attribute must be fully functionally dependent on the whole primary key; there must be no partial dependencies.

• Salesperson# is a partial primary key and it functionally determines Commission%.

• As partial dependency exists in the relation, the relation CAR_SALE is not in second normal
form.

• In order to satisfy second normal form, remove the partial dependencies by decomposing the
relation as shown below:

CAR_SALE1(Car#, Date_sold, Salesperson#, Discount_amt)

CAR_SALE2 (Salesperson#, Commission%)

• The relations CAR_SALE1, and CAR_SALE2 are in second normal form.


Step 3 of 4

The relation CAR_SALE2 is in third normal form but the relation CAR_SALE1 is not in third
normal form as transitive dependencies exist in the relation.

• According to the third normal form, the relation must be in second normal form and no non-key
attribute may be determined by another non-key attribute (no transitive dependencies).

• In relation CAR_SALE1, Date_sold is a non-key attribute which functionally determines Discount_amt.

• As transitive dependency exists in the relation, the relation CAR_SALE1 is not in third normal
form.

• In order to satisfy third normal form, remove the transitive dependencies by decomposing the
relation CAR_SALE1as shown below:

CAR_SALE3 (Car#, Date_sold, Salesperson#)

CAR_SALE4 (Date_sold, Discount_amt)

• The relations CAR_SALE3 and CAR_SALE4 are now in third normal form.


Step 4 of 4

The final set of relations that are in third normal form is as follows:

CAR_SALE2 (Salesperson#, Commission%)

CAR_SALE3 (Car#, Date_sold, Salesperson#)

CAR_SALE4 (Date_sold, Discount_amt)

Chapter 14, Problem 31E

Problem

Consider the following relation for published books:

BOOK (Book_title, Author_name, Book_type, List_price, Author_affil, Publisher)

Author_affil refers to the affiliation of author. Suppose the following dependencies exist:

Book_title → Publisher, Book_type

Book_type → List_price

Author_name → Author_affil

a. What normal form is the relation in? Explain your answer.

b. Apply normalization until you cannot decompose the relations further. State the reasons
behind each decomposition.

Step-by-step solution

Step 1 of 4

a.

The relation Book is in first normal form (1NF) but not in second normal form.

Explanation:

• According to the first normal form, the relation should contain only atomic values.

• The primary key is {Book_title, Author_name}.

• As the relation Book contains only atomic values, the relation Book is in the first normal form.

• According to the second normal form, every non-key attribute must be fully functionally dependent on the whole primary key; there must be no partial dependencies.

• Author_Name is a partial primary key and it functionally determines Author_affil.

• Book_title is a partial primary key and it functionally determines Publisher and Book_type.

• As partial dependency exists in the relation, the relation Book is not in second normal form.


Step 2 of 4

b.

The relation Book is in first normal form. It is not in second normal form as partial dependencies
exist in the relation.

In order to satisfy second normal form, remove the partial dependencies by decomposing the
relation as shown below:

Book_author (Book_title, Author_name)

Book_publisher (Book_title, Publisher, Book_type, List_price)

Author(Author_name, Author_affil)

The relations Book_author, Book_publisher and Author are in second normal form.


Step 3 of 4

According to the third normal form, the relation must be in second normal form and no non-key
attribute may be determined by another non-key attribute.

• The relations Book_author and Author are in third normal form.

• The relation Book_publisher is not in third normal form, as a transitive dependency exists in the relation.

• Book_type is a non-key attribute which functionally determines List_price.

• In order to satisfy third normal form, remove the transitive dependencies by decomposing the
relation Book_publisher as shown below:

Book_details (Book_title, Publisher, Book_type)

Book_price (Book_type, List_price)

The relations Book_author, Book_details, Book_price and Author are in third normal form.


Step 4 of 4

The final set of relations that are in third normal form is as follows:
Book_author (Book_title, Author_name)

Book_details (Book_title, Publisher, Book_type)

Book_price (Book_type, List_price)

Author(Author_name, Author_affil)

Chapter 14, Problem 32E

Problem

This exercise asks you to convert business statements into dependencies. Consider the relation
DISK_DRIVE (Serial_number, Manufacturer, Model, Batch, Capacity, Retailer). Each tuple in the
relation DISK_DRIVE contains information about a disk drive with a unique Serial_number, made
by a manufacturer, with a particular model number, released in a certain batch, which has a
certain storage capacity and is sold by a certain retailer. For example, the tuple Disk_drive
(‘1978619’, ‘WesternDigital’, ‘A2235X’, ‘765234’, 500, ‘CompUSA’) specifies that WesternDigital
made a disk drive with serial number 1978619 and model number A2235X, released in batch
765234; it is 500GB and sold by CompUSA.

Write each of the following dependencies as an FD:

a. The manufacturer and serial number uniquely identifies the drive.

b. A model number is registered by a manufacturer and therefore can't be used by another manufacturer.

c. All disk drives in a particular batch are the same model.

d. All disk drives of a certain model of a particular manufacturer have exactly the same capacity.

Step-by-step solution

Step 1 of 1

a)

{Manufacturer, Serial_number} → {Model, Batch, Capacity, Retailer}

b)

{Model} → {Manufacturer}

c)

{Manufacturer, Batch} → {Model}

d)

{Manufacturer, Model} → {Capacity} (equivalently, given b, {Model} → {Capacity})

Chapter 14, Problem 33E

Problem

Consider the following relation:

R(Doctor#, Patient#, Date, Diagnosis, Treat_code, Charge)

In the above relation, a tuple describes a visit of a patient to a doctor along with a treatment code
and daily charge. Assume that diagnosis is determined (uniquely) for each patient by a doctor.
Assume that each treatment code has a fixed charge (regardless of patient). Is this relation in
2NF? Justify your answer and decompose if necessary. Then argue whether further
normalization to 3NF is necessary, and if so, perform it.

Step-by-step solution

Step 1 of 1

Let the relation be R(Doctor#, Patient#, Date, Diagnosis, Treat_code, Charge).

The functional dependencies of relation R are:

{Doctor#, Patient#, Date} → {Diagnosis, Treat_code, Charge}

{Treat_code} → {Charge}

There are no partial dependencies here, so the given relation is in 2NF. It is not in 3NF, because
Charge is a nonkey attribute that is determined by another nonkey attribute, Treat_code.

We must decompose it as:

R (Doctor#, Patient#, Date, Diagnosis, Treat_code)

R1 (Treat_code, Charge)

We could further infer that the treatment for a given diagnosis is functionally determined
(Diagnosis → Treat_code), but we should be careful before enforcing it: leaving it out allows the
doctor some flexibility when prescribing cures.

Chapter 14, Problem 34E

Problem

Consider the following relation:

CAR_SALE (Car_id, Option_type, Option_listprice, Sale_date, Option_discountedprice)

This relation refers to options installed in cars (e.g., cruise control) that were sold at a dealership,
and the list and discounted prices of the options.

If Car_id → Sale_date, Option_type → Option_listprice, and {Car_id, Option_type} →
Option_discountedprice, argue using the generalized definition of the 3NF that this relation is not
in 3NF. Then argue from your knowledge of 2NF, why it is not even in 2NF.

Step-by-step solution

Step 1 of 3

The relation CAR_SALE is as shown below:

CAR_SALE(Car_id, Option_type, Option_listprice, Sale_date, Option_discountedprice)

The functional dependencies are as given below:

Car_id → Sale_date

Option_type → Option_listprice

{Car_id, Option_type} → Option_discountedprice


Step 2 of 3

For a relation to be in third normal form (generalized definition), in every nontrivial functional
dependency X → A, either X must be a superkey or A must be a prime attribute; in particular there
must be no partial and no transitive dependencies of nonprime attributes on the key.

• For the relation CAR_SALE, {Car_id, Option_type} is the primary key.

• In the functional dependency Car_id → Sale_date, Car_id is a partial key that determines Sale_date, and Sale_date is not prime. Hence a partial dependency exists in the relation.

• In the functional dependency Option_type → Option_listprice, Option_type is a partial key that determines Option_listprice. Hence another partial dependency exists in the relation.

Therefore, the relation CAR_SALE is not in third normal form.


Step 3 of 3

According to the second normal form, the relation must be in first normal form and every non-key
attribute must be fully functionally dependent on the whole primary key; there must be no partial
dependency.

• For the relation CAR_SALE, {Car_id, Option_type} is the primary key.

• In the functional dependency Car_id → Sale_date, the partial key Car_id determines Sale_date, which is a partial dependency.

• In the functional dependency Option_type → Option_listprice, the partial key Option_type determines Option_listprice, which is another partial dependency.

Therefore, the relation CAR_SALE is not in second normal form.

Chapter 14, Problem 35E

Problem

Consider the relation:

BOOK (Book_Name, Author, Edition, Year)

with the data:

Book_Name Author Edition Copyright_Year

DB_fundamentals Navathe 4 2004

DB_fundamentals Elmasri 4 2004

DB_fundamentals Elmasri 5 2007

DB_fundamentals Navathe 5 2007

a. Based on a common-sense understanding of the above data, what are the possible candidate
keys of this relation?

b. Justify that this relation has the MVD {Book} ↠ {Author} | {Edition, Year}.

c. What would be the decomposition of this relation based on the above MVD? Evaluate each
resulting relation for the highest normal form it possesses.

Step-by-step solution

Step 1 of 3

Candidate Key

A candidate key may be a single attribute or a set of attribute that uniquely identify tuples or
record in a database. Subset of candidate key are called prime attributes and rest of the
attributes in the table are called non-prime attributes.

Book_Name Author Edition Copyright_Year

DB_fundamentals Navathe 4 2004

DB_fundamentals Elmasri 4 2004

DB_fundamentals Elmasri 5 2007

DB_fundamentals Navathe 5 2007

Book_Name has the same value in all rows, so it cannot distinguish tuples and cannot be the deciding part of a candidate key.

a.

Possible candidate keys:

Based on this extension, (Author, Edition) and (Author, Copyright_Year) each uniquely identify
every tuple, and each is minimal, so they are the candidate keys. Larger combinations such as
(Book_Name, Author, Edition) or (Author, Edition, Copyright_Year) also identify tuples uniquely,
but they are supersets of the keys above and are therefore superkeys rather than candidate keys.
Either (Author, Edition) or (Author, Copyright_Year) would be a reasonable choice of primary key.


Step 2 of 3

b.

Multi Valued Dependency (MVD):

An MVD occurs when the presence of one or more tuples in a table implies the presence of one or
more other tuples in the same table. If at least two tuples agree on all the implying attributes,
then their remaining components may be swapped, and the resulting tuples must also be in the
table. MVDs play a very important role in 4NF.

Consider the MVD {Book_Name} ↠ {Author} | {Edition, Copyright_Year}.

The MVD indicates that the relationship between Book_Name and Author is independent of the
relationship between Book_Name and (Edition, Copyright_Year).

By the definition of MVD, each Book_Name value is associated with a set of Author values and,
independently, a set of (Edition, Copyright_Year) values. If the Author and (Edition,
Copyright_Year) components of any two tuples that agree on Book_Name are swapped, the resulting
tuples are again present in the table, as the four rows above confirm. Therefore, the relation has
the MVD {Book_Name} ↠ {Author} | {Edition, Copyright_Year}.
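
The swap condition can also be verified mechanically. The Java sketch below (illustrative, not from the text) checks, for every pair of tuples that agree on Book_Name, that the tuple built from the first tuple's Author and the second tuple's (Edition, Copyright_Year) is also present:

import java.util.*;

public class MvdCheck {
    public static void main(String[] args) {
        // Columns: 0 = Book_Name, 1 = Author, 2 = Edition, 3 = Copyright_Year
        String[][] book = {
            {"DB_fundamentals", "Navathe", "4", "2004"},
            {"DB_fundamentals", "Elmasri", "4", "2004"},
            {"DB_fundamentals", "Elmasri", "5", "2007"},
            {"DB_fundamentals", "Navathe", "5", "2007"}
        };
        Set<String> tuples = new HashSet<>();
        for (String[] t : book) tuples.add(String.join("|", t));

        boolean mvdHolds = true;
        for (String[] t1 : book)
            for (String[] t2 : book)
                if (t1[0].equals(t2[0])) {
                    // Swap: t1's Author with t2's (Edition, Copyright_Year).
                    String swapped = String.join("|", t1[0], t1[1], t2[2], t2[3]);
                    if (!tuples.contains(swapped)) mvdHolds = false;
                }
        System.out.println("{Book_Name} ->> {Author} holds: " + mvdHolds); // true
    }
}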


Step 3 of 3

c.

Decomposition on the basis of the MVD:

If a relation is all-key because of an MVD (no nontrivial functional dependency holds on it), it is
technically in BCNF, yet the MVD still forces redundant combinations of values in the tuples.
Using the FD Edition → Copyright_Year visible in this data, the relation can first be decomposed
into the following relations:

BOOK1 (Book_Name, Author, Edition)

BOOK2 (Edition, Copyright_Year)

BOOK1 still contains the MVD {Book_Name} ↠ {Author} | {Edition}. Decomposing it further yields the
final schema, in which each relation is in the highest normal form (4NF):

BOOK1_1 (Book_Name, Author)

BOOK1_2 (Book_Name, Edition)

BOOK2 (Edition, Copyright_Year)

Chapter 14, Problem 36E

Problem

Consider the following relation:

TRIP (Trip_id, Start_date, Cities_visited, Cards_used)

This relation refers to business trips made by company salespeople. Suppose the TRIP has a
single Start_date but involves many Cities and salespeople may use multiple credit cards on the
trip. Make up a mock-up population of the table.

a. Discuss what FDs and/or MVDs exist in this relation.

b. Show how you will go about normalizing the relation.

Step-by-step solution

Step 1 of 2

Relation TRIP has a unique attribute Trip_id, and a particular Trip_id has a single Start_date for
the trip; hence Start_date is fully functionally dependent on Trip_id.

a.

FDs and MVDs that exist in the relation are:

FD1: Trip_id → Start_date

As a mock-up population, suppose trip T1 starts on date d1, visits cities C1 and C2, and uses cards
K1 and K2; the relation must then contain all four combinations (T1, d1, C1, K1), (T1, d1, C1, K2),
(T1, d1, C2, K1), and (T1, d1, C2, K2). Cities_visited and Cards_used may thus repeat for a
particular Trip_id. They are independent of each other and both are multivalued, so the MVDs
present in the relation are as follows:

MVD1: Trip_id ↠ Cities_visited

MVD2: Trip_id ↠ Cards_used


Step 2 of 2

b.

Normalizing the relation:

The relation has one FD and two MVDs, so first split the relation to remove the functional
dependency FD1:

TRIP1 (Trip_id, Start_date)

Now split the remaining attributes to remove the multivalued dependencies. Cities_visited and
Cards_used are independent of each other; if their components are swapped, the relation state
remains unchanged. On the basis of Trip_id, the relation can be decomposed as follows:

TRIP2 (Trip_id, Cities_visited)

TRIP3 (Trip_id, Cards_used)

Following is the final schema for the table provided.

TRIP1 (Trip_id, Start_date)

TRIP2 (Trip_id, Cities_visited)

TRIP3 (Trip_id, Cards_used)

Chapter 15, Problem 1RQ

Problem

What is the role of Armstrong’s inference rules (inference rules IR1 through IR3) in the
development of the theory of relational design?

Step-by-step solution

Step 1 of 1

There are six inference rules (IR) for functional dependencies (FD), of which the first three, the
reflexive, augmentation, and transitive rules, are referred to as Armstrong's axioms.

Inference Rule 1 (reflexive rule)

If X ⊇ Y, then X → Y.

The reflexive rule states that any set of attributes functionally determines itself (and any subset of itself).

Inference Rule 2 (augmentation rule)

If X → Y, then XZ → YZ.

The augmentation rule states that augmenting both sides of an FD with the same set of attributes results in another valid FD.

Inference Rule 3 (transitive rule)

The transitive rule states that if X determines Y and Y determines Z, then X determines Z.

Database designers specify the set F of functional dependencies that hold among the attributes of a
relation R; IR1, IR2, and IR3 are then used to infer additional functional dependencies that hold
on R. These three inference rules are sound and complete for inferring new functional dependencies
(additional rules can also be derived from them). Hence they are used to derive new facts and are
preferred by database designers in relational database design.

Chapter 15, Problem 2RQ

Problem

What is meant by the completeness and soundness of Armstrong’s inference rules?

Step-by-step solution

Step 1 of 1

The inference rules (IR) for functional dependencies (FD) known as the reflexive, augmentation,
and transitive rules are referred to as Armstrong's inference rules.

Inference Rule 1 (reflexive rule)

If X ⊇ Y, then X → Y.

The reflexive rule states that any set of attributes functionally determines itself.

Inference Rule 2 (augmentation rule)

If X → Y, then XZ → YZ.

The augmentation rule states that augmenting both sides of an FD with the same set of attributes results in another valid FD.

Inference Rule 3 (transitive rule)

If X → Y and Y → Z, then X → Z; that is, if X determines Y and Y determines Z, then X determines Z.

As given by Armstrong, the inference rules IR1, IR2, and IR3 are sound and complete.

Sound

It means that, for any given set of functional dependencies F specified on a relation schema R,
any dependency inferred from F by using IR1 through IR3 holds in every relation state r of R that
satisfies the dependencies in F.

Complete

It means that repeatedly applying IR1 through IR3 to infer dependencies, until no more
dependencies can be inferred, results in the complete set of all possible dependencies that can
be inferred from F (the closure F+).

Chapter 15, Problem 3RQ

Problem

What is meant by the closure of a set of functional dependencies? Illustrate with an example.

Step-by-step solution

Step 1 of 2

The closure of a set of functional dependencies F is the set of dependencies that consists of the
functional dependencies in F together with all the functional dependencies that can be inferred
from (implied by) F.

The closure of a set of functional dependencies F of a relation R is denoted by F+.


Step 2 of 2

Example:

Consider a relation Student with attributes StudentNo, Sname, address, DOB, CourseNo, CourseName,
Credits, and Duration.

Assuming StudentNo identifies a student and CourseNo identifies a course, the functional
dependencies of Student are as follows:

StudentNo → {Sname, address, DOB}

CourseNo → {CourseName, Credits, Duration}

The set of these functional dependencies of Student is denoted by F.

Functional dependencies that can be inferred from F include, for example:

StudentNo → Sname (by the decomposition rule)

{StudentNo, CourseNo} → {Sname, CourseName} (by augmentation and additivity)

Hence, F+ consists of F together with all such inferred dependencies.

Chapter 15, Problem 4RQ

Problem

When are two sets of functional dependencies equivalent? How can we determine their
equivalence?

Step-by-step solution

Step 1 of 1

• Two sets of functional dependencies (FD) A and B are equivalent if A+ = B+. Hence equivalence
means that every FD in A can be inferred from B, and every FD in B can be inferred from A; A is
equivalent to B if both conditions, A covers B and B covers A, hold.

• A set of functional dependencies A is said to cover another set of functional dependencies B if
every FD in B is also in A+; that is, if every dependency in B can be inferred from A, then B is
covered by A.

• Whether A covers B is determined by calculating X+ with respect to A for each FD X → Y in B,
and then checking whether this X+ includes the attributes in Y. If this holds true for every FD in
B, then A covers B. The same test with the roles reversed determines whether B covers A; if both
hold, A and B are equivalent.
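
Both cover checks reduce to attribute-closure computations. The following Java sketch (illustrative; the two FD sets are sample inputs over single-letter attributes) tests the equivalence of F = {A → C, AC → D, E → AD, E → H} and G = {A → CD, E → AH}:

import java.util.*;

public class FdEquivalence {

    // X+ under fds; each FD is a pair {lhs, rhs} of single-letter attribute strings.
    static Set<Character> closure(String x, List<String[]> fds) {
        Set<Character> cl = new TreeSet<>();
        for (char c : x.toCharArray()) cl.add(c);
        boolean changed = true;
        while (changed) {
            changed = false;
            for (String[] fd : fds) {
                boolean lhsIn = true;
                for (char c : fd[0].toCharArray()) if (!cl.contains(c)) lhsIn = false;
                if (lhsIn) for (char c : fd[1].toCharArray()) changed |= cl.add(c);
            }
        }
        return cl;
    }

    // A covers B if, for every FD X -> Y in B, the closure of X under A contains Y.
    static boolean covers(List<String[]> a, List<String[]> b) {
        for (String[] fd : b) {
            Set<Character> cl = closure(fd[0], a);
            for (char c : fd[1].toCharArray()) if (!cl.contains(c)) return false;
        }
        return true;
    }

    public static void main(String[] args) {
        List<String[]> f = List.of(new String[]{"A", "C"}, new String[]{"AC", "D"},
                                   new String[]{"E", "AD"}, new String[]{"E", "H"});
        List<String[]> g = List.of(new String[]{"A", "CD"}, new String[]{"E", "AH"});
        System.out.println("F and G equivalent: " + (covers(f, g) && covers(g, f))); // true
    }
}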

Chapter 15, Problem 5RQ

Problem

What is a minimal set of functional dependencies? Does every set of dependencies have a
minimal equivalent set? Is it always unique?

Step-by-step solution

Step 1 of 1

A set of functional dependencies F is minimal if it satisfies the following conditions:

1. Every dependency in F has a single attribute on its right-hand side.

2. We cannot replace any dependency X → A in F with a dependency Y → A, where Y is a proper subset
of X, and still have a set of dependencies that is equivalent to F.

3. We cannot remove any dependency from F and still have a set of dependencies that is equivalent
to F.

Condition 1 states that every dependency is in a canonical form with a single attribute on the
right-hand side.

Conditions 2 and 3 ensure that there are no redundancies, either redundant attributes on the
left-hand side of a dependency or a whole dependency that can be inferred from the remaining FDs
in F.

A minimal cover of a set of functional dependencies E is a minimal set of dependencies F that is
equivalent to E (every dependency in E is in the closure of F), without redundancy and in a
standard canonical form. Every set of dependencies has at least one minimal equivalent set, but it
is not always unique: a set of functional dependencies can have more than one minimal cover.

Chapter 15, Problem 6RQ

Problem

What is meant by the attribute preservation condition on a decomposition?

Step-by-step solution

Step 1 of 1

Attribute preservation condition on decomposition:

Decomposition:

Replacing an unnormalized relation schema by a set of normalized relation schemas. If R is the
universal relation schema, then D = {R1, R2, ..., Rm} is a decomposition of R.

Attribute preservation:

Every attribute must appear in some relation; all attributes must be preserved through the process
of normalization.

We start with a universal relation schema R = {A1, A2, ..., An} that includes all the attributes
of the database, where every attribute name is unique.

Using the functional dependencies, the design algorithms decompose the universal relation schema R
into a set of relation schemas D = {R1, R2, ..., Rm} that will become the relational database
schema; D is called a decomposition of R.

Such that R1 ∪ R2 ∪ ... ∪ Rm = R.

Each attribute in R appears in at least one relation schema Ri in the decomposition, so that no
attributes are lost; this is the attribute preservation condition of decomposition.

Chapter 15, Problem 7RQ

Problem

Why are normal forms alone insufficient as a condition for a good schema design?

Step-by-step solution

Step 1 of 1

Normal forms alone are insufficient as a condition for a good schema design because of two further
properties of decompositions:

1) the lossless (nonadditive) join property, and

2) the dependency preservation property.

Both are used by the design algorithms to achieve desirable decompositions. It is insufficient to
test the relation schemas independently of one another for compliance with higher normal forms
like 2NF, 3NF, and BCNF. The resulting relations must collectively satisfy these two additional
properties, dependency preservation and lossless join, to qualify as a good design.

Chapter 15, Problem 8RQ

Problem

What is the dependency preservation property for a decomposition? Why is it important?

Step-by-step solution

Step 1 of 2

Dependency preservation property for a decomposition:

Let F be a set of functional dependencies on schema R, and let D = {R1, R2, ..., Rm} be a
decomposition of R. The projection of F on Ri, denoted πRi(F), is the set of all functional
dependencies X → Y in F+ such that the attributes in X ∪ Y are all contained in Ri. Hence the
projection of F on each relation schema Ri in the decomposition is the set of functional
dependencies in F+ such that all their left- and right-hand-side attributes are in Ri.

The decomposition D is dependency preserving with respect to F if the union of the projections,
πR1(F) ∪ ... ∪ πRm(F), is equivalent to F; that is, the closure of the dependencies that hold on
each Ri must be equivalent to the closure of F.


Step 2 of 2

Importance:

1) With this property, we can easily check that updates to the database do not result in illegal
relation states, by enforcing each projected dependency on its own relation.

2) Our design then allows us to check updates without having to compute natural joins of the
decomposed relations.

3) We want to preserve dependencies because each dependency in F represents a constraint on the
database; a dependency that is not preserved in any single relation can only be enforced by
joining relations.

4) It is always possible to find a dependency-preserving decomposition D with respect to F such
that each relation Ri in D is in 3NF.

Chapter 15, Problem 9RQ

Problem

Why can we not guarantee that BCNF relation schemas will be produced by dependency-
preserving decompositions of non-BCNF relation schemas? Give a counterexample to illustrate
this point.

Step-by-step solution

Step 1 of 3

We cannot guarantee that BCNF relation schemas will be produced by dependency-preserving
decompositions of non-BCNF relation schemas.

As a counterexample, consider the relation TEACH(Student, Course, Instructor) with two functional
dependencies:

fd1: {Student, Course} → {Instructor}

fd2: {Instructor} → {Course}

Here {Student, Course} is a candidate key, so this relation is in 3NF but not in BCNF, because the
determinant Instructor in fd2 is not a superkey.


Step 2 of 3

Any BCNF decomposition of TEACH, for example into R1(Instructor, Course) and R2(Instructor,
Student), fails to preserve fd1: neither relation contains all the attributes {Student, Course,
Instructor}, so {Student, Course} → {Instructor} can only be checked by joining R1 and R2.

Step 3 of 3

A relation that is not in BCNF should therefore be decomposed so as to meet the lossless join
property, while possibly forgoing the preservation of all functional dependencies in the
decomposed relations.
Chapter 15, Problem 10RQ

Problem

What is the lossless (or nonadditive) join property of a decomposition? Why is it important?

Step-by-step solution

Step 1 of 1

Lossless join property of decomposition:

This is one of the two key properties of a decomposition. The word loss in lossless refers to loss
of information, not to loss of tuples (information is lost when spurious tuples appear).

Basic definition of lossless join:

A decomposition D = {R1, R2, ..., Rm} of R has the lossless join property with respect to the set
of dependencies F on R if, for every relation state r of R that satisfies F, the following holds:

⋈(πR1(r), ..., πRm(r)) = r

where ⋈ is the natural join of all the relations in D.

Example: consider EMP_PROJ(Ssn, Pnum, Hours, Ename, Pname, Plocation) decomposed into:

EMP(Ssn, Ename)

PROJECT(Pnum, Pname, Plocation)

WORKS_ON(Ssn, Pnum, Hours)

This decomposition has the lossless join property.

Importance:

The important feature of a lossless decomposition is that it avoids the problem of spurious
tuples. If the chosen relations do not jointly carry the complete information about the entities
and relationships, then joining the relations produces tuples that do not actually belong in the
result; these spurious tuples represent wrong information. To avoid this type of problem, we
require the lossless join property.
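
The failure mode is easy to demonstrate concretely. The following Java sketch (illustrative data, not from the text) projects a two-tuple relation r(A, B, C) onto R1(A, B) and R2(B, C) and joins the projections back, printing the spurious tuples that appear because B determines neither A nor C:

import java.util.*;

public class SpuriousTupleDemo {
    public static void main(String[] args) {
        // r(A, B, C): both tuples share the same B value.
        String[][] r = {{"a1", "b", "c1"}, {"a2", "b", "c2"}};

        // Project r onto R1(A, B) and R2(B, C).
        Set<List<String>> r1 = new LinkedHashSet<>(), r2 = new LinkedHashSet<>();
        for (String[] t : r) {
            r1.add(List.of(t[0], t[1]));
            r2.add(List.of(t[1], t[2]));
        }

        // Natural join of r1 and r2 on the common attribute B.
        Set<List<String>> joined = new LinkedHashSet<>();
        for (List<String> t1 : r1)
            for (List<String> t2 : r2)
                if (t1.get(1).equals(t2.get(0)))
                    joined.add(List.of(t1.get(0), t1.get(1), t2.get(1)));

        Set<List<String>> original = new LinkedHashSet<>();
        for (String[] t : r) original.add(Arrays.asList(t));

        // Two of the four joined tuples were never in r.
        for (List<String> t : joined)
            System.out.println(t + (original.contains(t) ? "" : "   <-- spurious"));
    }
}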

Chapter 15, Problem 11RQ

Problem

Between the properties of dependency preservation and losslessness, which one must definitely
be satisfied? Why?

Step-by-step solution

Step 1 of 1

Dependency preservation and losslessness are both properties of decompositions, and both are used
by the design algorithms to achieve desirable decompositions.

Property of dependency preservation: it allows us to enforce every constraint of the original
relation on the corresponding instances of the smaller relations.

Property of lossless join: it ensures that any instance of the original relation can be
reconstructed from the corresponding instances of the smaller relations; no spurious rows are
generated when the relations are reunited through the natural join operation.

Of the two, the lossless join property must definitely be satisfied. Without it, the join of the
decomposed relations contains spurious tuples, so the database no longer represents the original
information correctly, and no amount of extra checking can repair this. Dependency preservation,
in contrast, is desirable but may have to be sacrificed (for example, when decomposing into BCNF),
since a lost dependency can still be checked, at extra cost, by computing joins.
Chapter 15, Problem 12RQ

Problem

Discuss the NULL value and dangling tuple problems.

Step-by-step solution

Step 1 of 2

NULL values and dangling tuple problems.

When designing a relational database schema, we must consider the problems associated with NULLs.

NULLs can have multiple interpretations:

1) The attribute does not apply to this tuple

2) The attribute value for this tuple is unknown.

3) The value is known but absent, that is, it has not been recorded yet.


Step 2 of 2

Dangling tuples:

Tuples that "disappear" when computing a join.

Consider a pair of relations r and s and the natural join r ⋈ s. A tuple t in r is a dangling
tuple if it does not join with any tuple in s; that is, there is no tuple u in s such that t and u
agree on their common attributes. Dangling tuples may or may not be acceptable.

Example: suppose there is a tuple in the ACCOUNT relation with branch name "Town1", but no
matching tuple in the BRANCH relation for the "Town1" branch. This is undesirable, as every
account should refer to a branch that exists. Now suppose instead there is a tuple in the BRANCH
relation with branch name "Town2", but no matching tuple in the ACCOUNT relation for the "Town2"
branch. This simply means that a branch exists for which no accounts exist yet, which can happen,
for instance, when a branch is newly opened.

Chapter 15, Problem 13RQ

Problem

Illustrate how the process of creating first normal form relations may lead to multivalued
dependencies. How should the first normalization be done properly so that MVDs are avoided?

Step-by-step solution

Step 1 of 2

Multivalued dependencies are a consequence of first normal form, which disallows an attribute in a
tuple to have a set of values. If we have two or more multivalued independent attributes in the
same relation schema, we get into the problem of having to repeat every value of one of the
attributes with every value of the other attribute to keep the relation state consistent and to
maintain the independence of the attributes involved. This constraint is specified by a
multivalued dependency.

For example, consider an EMP relation with attributes Ename, Project_name, and Dependent_name.

The relation has the following tuples:

1.) ('a','x','n')

2.) ('a','x','m')

3.) ('a','y','n')

4.) ('a','y','m')


Step 2 of 2

Here the employee named 'a' has two dependents and works on two projects. Since each attribute
value must be atomic, all four combinations must be stored, and the problem of multivalued
dependency arises in the relation.

Informally, whenever two independent 1:N relationships A:B and A:C are mixed in the same relation
R(A, B, C), an MVD may arise.

Whenever a relation schema R is decomposed into R1 = (X ∪ Y) and R2 = (R − Y) based on an MVD
X ↠ Y that holds in R, the decomposition has the nonadditive join property.

The property NJB': the relation schemas R1 and R2 form a nonadditive join decomposition of R with
respect to a set of functional and multivalued dependencies if and only if

(R1 ∩ R2) ↠ (R1 − R2)

(or, by symmetry, (R1 ∩ R2) ↠ (R2 − R1)). Decomposing on such an MVD deals with the problem of
MVDs directly; by using this property, the first normalization produces relations that are in 1NF
without carrying nontrivial MVDs and their redundancy.
Chapter 15, Problem 14RQ

Problem

What types of constraints are inclusion dependencies meant to represent?

Step-by-step solution

Step 1 of 1

Inclusion dependencies are meant to represent two types of interrelational constraints that cannot
be expressed using functional dependencies or multivalued dependencies:

Referential integrity constraints:

These relate attributes across relations, so a foreign key (referential integrity) constraint
cannot be specified as a functional or multivalued dependency.

Class/subclass relationships:

The relationship between a class and its subclass likewise has no formal definition in terms of
the functional, multivalued, and join dependencies.

Chapter 15, Problem 15RQ

Problem

How do template dependencies differ from the other types of dependencies we discussed?

Step-by-step solution

Step 1 of 2

Template dependencies differ from the other types of dependencies as follows.

Template dependencies:

They are a general technique for representing constraints in relations. Based on the semantics of
the attributes within a relation, some peculiar constraint may arise for which none of the named
dependency types fits. The basic idea of template dependencies is to specify a template, or
example, that defines each constraint or dependency.

There are two types of templates:

(1) tuple-generating templates

(2) constraint-generating templates

A template consists of a number of hypothesis tuples that are assumed to appear in one or more
relations.


Step 2 of 2

The other part of the template is the template conclusion. For tuple-generating templates, the
conclusion is a set of tuples that must also exist in the relations if the hypothesis tuples are
there. For constraint-generating templates, the conclusion is a condition that must hold on the
hypothesis tuples.

For example, take a relation R(A, B, C, D). We may express the functional dependency A → B as a
constraint-generating template: the hypothesis consists of two tuples <a, b1, c1, d1> and
<a, b2, c2, d2> that agree on A, and the conclusion is the condition b1 = b2. This is what
distinguishes template dependencies from the other dependency types discussed: those types can be
seen as special cases of templates, whereas a template can also state constraints that correspond
to no FD, MVD, or join dependency.
Chapter 15, Problem 16RQ

Problem

Why is the domain-key normal form (DKNF) known as the ultimate normal form?

Step-by-step solution

Step 1 of 1

Domain-key normal form (DKNF) is known as the ultimate normal form.

The idea behind DKNF is to specify one ultimate normal form that takes into account all possible
types of dependencies and constraints: a relation is in DKNF if all constraints that should hold
on the valid relation states can be enforced simply by the domain constraints and the key
constraints.

- A relation in DKNF has no modification anomalies, and conversely.

- DKNF is the ultimate normal form in the sense that there is no higher normal form related to
modification anomalies.

- In domain-key normal form, every constraint on the relation is a logical consequence of the
definition of keys and domains.

Key: the unique identifier of a tuple.

Domain: the physical and logical description of an attribute's permitted values.

Chapter 15, Problem 17E

Problem

Show that the relation schemas produced by Algorithm 15.4 are in 3NF.

Step-by-step solution

Step 1 of 1

Assume that one of the relation schemas Ri produced by Algorithm 15.4 is not in 3NF. Then a
functional dependency M → A holds in Ri where:

• M is not a superkey of Ri, and

• A is not a prime attribute of Ri.

However, by step 2 of the algorithm, Ri consists of a set of attributes {X, A1, A2, ..., Ak},
where X → Aj appears in the minimal cover G for j = 1, ..., k, implying that X is a key of Ri and
that A1, ..., Ak are the only nonprime attributes of Ri.

Thus, if a functional dependency M → A holds in Ri where A is not prime and M is not a superkey of
Ri, then M must be a proper subset of X; otherwise M would contain X and would therefore be a
superkey.

If both M → A and X → A hold and M is a proper subset of X, this contradicts the condition that
X → A is a functional dependency in a minimal cover of the functional dependencies, because
removing an attribute from the key side X (shrinking it to M) still leaves a valid functional
dependency, so X would contain an extraneous attribute.

This infringes one of the minimality conditions; hence no such M → A can exist, and every relation
schema Ri produced by the algorithm must be in 3NF.

Chapter 15, Problem 18E

Problem

Show that, if the matrix S resulting from Algorithm 15.3 does not have a row that is all a symbols,
projecting S on the decomposition and joining it back will always produce at least one spurious
tuple.

Step-by-step solution

Step 1 of 2

Consider the universal relation R = {A1, A2, ..., An}, a decomposition D = {R1, R2, ..., Rm} of R,
and a set F of functional dependencies.

Based on Algorithm 15.3 (given in the text):

The matrix S is considered to be some relation state r of R (from step 1 of the algorithm). Row i
in S represents a tuple ti, corresponding to Ri, that has 'a' symbols in the columns that
correspond to the attributes of Ri and 'b' symbols in the remaining columns.

During the loop of step 4 of the algorithm, the rows of this matrix are transformed so that they
represent tuples satisfying all the functional dependencies in F: any two rows in S that agree in
their values for the left-hand-side attributes X of a functional dependency X → Y in F are also
made to agree in their values for the right-hand-side attributes Y.

If any row in S ends up with all 'a' symbols, then the decomposition D has the nonadditive join
property with respect to F.

On the other hand, if no row ends up being all 'a' symbols, the decomposition D does not satisfy
the lossless join property.


Step 2 of 2

In the latter case, the relation state r represented by S at the end of the loop satisfies all the
dependencies in F, yet fails the nonadditive join condition: the loop of step 4 only equates
symbols and cannot create an all-'a' row if none ever arises.

Now project r on each Ri. Since row i of S has 'a' symbols in all the columns of Ri, each
projection πRi(r) contains the tuple consisting of the 'a' values for the attributes of Ri.
Joining these projections therefore produces the tuple with all 'a' symbols in
⋈(πR1(r), ..., πRm(r)). But no row of S is all 'a' symbols, so this all-'a' tuple is not in r; it
is a spurious tuple.

Hence, if the resulting matrix S does not have a row with all 'a' symbols, projecting S on the
decomposition and joining back always produces at least one spurious tuple, and the decomposition
does not have the lossless join property.
Chapter 15, Problem 19E

Problem

Show that the relation schemas produced by Algorithm 15.5 are in BCNF.

Step-by-step solution

Step 1 of 2

We show that the relation schemas produced by Algorithm 15.5 are in BCNF. In this algorithm, the
loop continues until all relation schemas in D are in BCNF.

Algorithm 15.5 (sketch):

Input: a universal relation R and a set of functional dependencies F on the attributes of R.

Step 1: set D := {R};

Step 2: while there is a relation schema Q in D that is not in BCNF, do:

choose the relation schema Q in D that is not in BCNF;

find a functional dependency X → Y in Q that violates BCNF; replace Q in D by the two relation
schemas (Q − Y) and (X ∪ Y);


Step 2 of 2

According to this algorithm, each iteration decomposes one relation schema Q that is not in BCNF
into two relation schemas. By the lossless join property for binary decompositions (Property NJB)
and Claim 2 (preservation of nonadditivity in successive decompositions), both mentioned in the
text, the decomposition D retains the nonadditive join property throughout. The loop terminates
only when no schema in D violates BCNF, so at the end of the algorithm all relation schemas in D
are in BCNF.

Example of the working of this algorithm:

Take one relation (for example) which is not in BCNF, with the dependencies Company_name →
Tax_rate, Area → Price, and Area → Company_name:

(Project_ID, Company_name, Plot#, Area, Price, Tax_rate)

First loop, decomposing on Company_name → Tax_rate:

(Project_ID, Company_name, Plot#, Area, Price)

(Company_name, Tax_rate)

Second loop; the first schema is still not in BCNF, so decompose on Area → Price:

(Project_ID, Company_name, Plot#, Area)

(Area, Price)

(Company_name, Tax_rate)

Final loop, decomposing on Area → Company_name, after which every schema is in BCNF:

(Project_ID, Area, Plot#)

(Area, Company_name)

(Area, Price)

(Company_name, Tax_rate)

Chapter 15, Problem 20E

Problem

Write programs that implement Algorithms 15.4 and 15.5.

Step-by-step solution

Step 1 of 6

Program to implement Algorithm 15.4

The following program converts a relational schema into 3NF. SynthesisAlgorithm is a public class
whose main method starts execution. For clarity, the program works on single-letter attribute
names and hardcodes a small sample relation and its functional dependencies at the top of main
(edit them as needed) rather than reading them from the keyboard.

The first step computes a minimal cover of the functional dependencies: right-hand sides are
reduced to single attributes, extraneous left-hand-side attributes are removed, and redundant
dependencies are dropped. The second step groups the minimal cover by left-hand side and creates
one 3NF relation per group. The third step checks whether a key of the relation is contained in
any of the created relations and, if not, adds one more relation holding a key.

The class begins with two helpers, the attribute-closure routine and a test of whether a set of
FDs implies a given dependency; the main method follows in the next step.

import java.util.*;

public class SynthesisAlgorithm {

    // X+ under fds; every FD is a pair {lhs, rhs} of single-letter attribute strings.
    static Set<Character> closure(String x, List<String[]> fds) {
        Set<Character> cl = new TreeSet<>();
        for (char c : x.toCharArray()) cl.add(c);
        boolean changed = true;
        while (changed) {
            changed = false;
            for (String[] fd : fds) {
                boolean lhsIn = true;
                for (char c : fd[0].toCharArray()) if (!cl.contains(c)) lhsIn = false;
                if (lhsIn) for (char c : fd[1].toCharArray()) changed |= cl.add(c);
            }
        }
        return cl;
    }

    // Does fds imply the dependency lhs -> a?
    static boolean implies(List<String[]> fds, String lhs, char a) {
        return closure(lhs, fds).contains(a);
    }


Step 2 of 6

The main method applies the three minimal-cover steps and then synthesizes the 3NF relations:

    public static void main(String[] args) {
        String rAttrs = "ABD";  // attributes of R (sample input; edit as needed)
        // Sample FDs: B -> A, D -> A, AB -> D
        String[][] input = {{"B", "A"}, {"D", "A"}, {"AB", "D"}};

        // Step 1(a): canonical form, a single attribute on each right-hand side.
        List<String[]> g = new ArrayList<>();
        for (String[] fd : input)
            for (char a : fd[1].toCharArray())
                g.add(new String[]{fd[0], String.valueOf(a)});

        // Step 1(b): remove extraneous left-hand-side attributes.
        for (String[] fd : g)
            for (int i = 0; fd[0].length() > 1 && i < fd[0].length(); ) {
                String reduced = fd[0].substring(0, i) + fd[0].substring(i + 1);
                if (implies(g, reduced, fd[1].charAt(0))) fd[0] = reduced; else i++;
            }

        // Step 1(c): remove FDs implied by the rest; g is now a minimal cover.
        for (Iterator<String[]> it = g.iterator(); it.hasNext(); ) {
            String[] fd = it.next();
            List<String[]> rest = new ArrayList<>(g);
            rest.remove(fd);
            if (implies(rest, fd[0], fd[1].charAt(0))) it.remove();
        }

        // Step 2: one relation per distinct left-hand side X, holding X plus
        // every attribute X determines in the minimal cover; X is that relation's key.
        Map<String, TreeSet<Character>> relations = new LinkedHashMap<>();
        for (String[] fd : g) {
            TreeSet<Character> attrs =
                relations.computeIfAbsent(fd[0], k -> new TreeSet<>());
            for (char c : fd[0].toCharArray()) attrs.add(c);
            attrs.add(fd[1].charAt(0));
        }

        // Step 3: if no left-hand side is a key of R, add one relation holding
        // a key of R, found by shrinking the full attribute set.
        boolean keyCovered = false;
        for (String lhs : relations.keySet())
            if (closure(lhs, g).size() == rAttrs.length()) keyCovered = true;
        if (!keyCovered) {
            String key = rAttrs;
            for (int i = 0; i < key.length(); ) {
                String reduced = key.substring(0, i) + key.substring(i + 1);
                if (closure(reduced, g).size() == rAttrs.length()) key = reduced;
                else i++;
            }
            TreeSet<Character> attrs = new TreeSet<>();
            for (char c : key.toCharArray()) attrs.add(c);
            relations.put(key, attrs);
        }

        int k = 1;
        for (Map.Entry<String, TreeSet<Character>> rel : relations.entrySet())
            System.out.println("R" + (k++) + rel.getValue() + "  key: " + rel.getKey());
    }
}

For the sample input R(A, B, D) with F = {B → A, D → A, AB → D}, the minimal cover computed is
{B → D, D → A}, and the program prints R1[A, D] with key D and R2[B, D] with key B; since B is a
key of R and appears as a left-hand side, no extra key relation is added.


Step 3 of 6

Program to implement Algorithm 15.5

The following program converts a relation into BCNF using the relational decomposition algorithm
(Algorithm 15.5). It starts with all attributes of R in a single relation schema and then
repeatedly looks for a functional dependency X -> Y that violates BCNF in some schema Q of the
current decomposition (X contained in Q, some of Y in Q outside X, and X not a superkey of Q);
whenever one is found, Q is replaced by (Q - Y) and (X u Y). As before, attributes are single
letters, and the sample relation and FDs are hardcoded and can be edited.

The class again begins with the attribute-closure helper plus a small string-to-set conversion;
the main method follows in the next step.

import java.util.*;

public class DecompositionIntoBCNF {

    // X+ under fds; every FD is a pair {lhs, rhs} of single-letter attribute strings.
    static Set<Character> closure(Set<Character> x, List<String[]> fds) {
        Set<Character> cl = new TreeSet<>(x);
        boolean changed = true;
        while (changed) {
            changed = false;
            for (String[] fd : fds) {
                boolean lhsIn = true;
                for (char c : fd[0].toCharArray()) if (!cl.contains(c)) lhsIn = false;
                if (lhsIn) for (char c : fd[1].toCharArray()) changed |= cl.add(c);
            }
        }
        return cl;
    }

    // Turns "ABC" into the attribute set {A, B, C}.
    static Set<Character> toSet(String s) {
        Set<Character> set = new TreeSet<>();
        for (char c : s.toCharArray()) set.add(c);
        return set;
    }


Step 4 of 6

The main method holds the decomposition loop:

    public static void main(String[] args) {
        String rAttrs = "ABCDE";  // attributes of R (sample input; edit as needed)
        // Sample FDs: A -> B, C -> DE
        List<String[]> fds = List.of(new String[]{"A", "B"},
                                     new String[]{"C", "DE"});

        // Step 1: D := {R}.
        List<Set<Character>> d = new ArrayList<>();
        d.add(toSet(rAttrs));

        // Step 2: while some Q in D has an FD X -> Y violating BCNF,
        // replace Q in D by (Q - Y) and (X u Y).
        boolean changed = true;
        while (changed) {
            changed = false;
            for (int i = 0; i < d.size() && !changed; i++) {
                Set<Character> q = d.get(i);
                for (String[] fd : fds) {
                    Set<Character> x = toSet(fd[0]);
                    Set<Character> y = toSet(fd[1]);
                    y.retainAll(q);               // only the part of Y inside Q matters
                    y.removeAll(x);
                    if (!q.containsAll(x) || y.isEmpty()) continue; // FD not applicable in Q
                    if (closure(x, fds).containsAll(q)) continue;   // X is a superkey: no violation
                    Set<Character> q1 = new TreeSet<>(q);           // Q - Y
                    q1.removeAll(y);
                    Set<Character> q2 = new TreeSet<>(x);           // X u Y
                    q2.addAll(y);
                    d.set(i, q1);
                    d.add(q2);
                    changed = true;                                 // rescan the decomposition
                    break;
                }
            }
        }

        System.out.println("Following are the decomposed relations:");
        int k = 1;
        for (Set<Character> q : d)
            System.out.println("R" + (k++) + q);
    }
}

For the sample input R(A, B, C, D, E) with F = {A → B, C → DE}, neither determinant is a superkey,
and the program decomposes R into R1[A, C], R2[A, B], and R3[C, D, E], each of which is in BCNF.


Step 5 of 6

A sample session compiling and running the synthesis program looks as follows:

E:\Tom\java, c & c++ code>javac SynthesisAlgorithm.java

E:\Tom\java, c & c++ code>java SynthesisAlgorithm
R1[A, D]  key: D
R2[B, D]  key: B

Comment

Step 6 of 6

Enter the attribute names of RHS[0]

Number of attributes in LHS of functional dependency[1]

Enter the attribute names of LHS[1]

Number of attributes in RHS of functional dependency[1]

Enter the attribute names of RHS[1]

Number of attributes in LHS of functional dependency[2]

2
Enter the attribute names of LHS[2]

Number of attributes in RHS of functional dependency[2]

Enter the attribute names of RHS[2]

Following are the decomposed relations:

MyRelation1(ABC)

MyRelation2(CDE)

MyRelation3(BDE)

E:\Tom\java, c & c++ code>

Comment
Chapter 15, Problem 21E

Problem

Consider the relation REFRIG(Model#, Year, Price, Manuf_plant, Color), which is abbreviated as
REFRIG(M, Y, P, MP, C), and the following set F of functional dependencies: F = {M → MP, {M,
Y}→ P, MP → C}

a. Evaluate each of the following as a candidate key for REFRIG, giving reasons why it can or
cannot be a key: {M}, {M, Y}, {M, C}.

b. Based on the above key determination, state whether the relation REFRIG is in 3NF and in
BCNF, and provide proper reasons.

c. Consider the decomposition of REFRIG into D = {R1 (M, Y, P), R2(M, MP, C)}. Is this
decomposition lossless? Show why. (You may consult the test under Property NJB in Section
14.5.1.)

Step-by-step solution

Step 1 of 3

Consider the relation schema REFRIG and the functional dependencies F provided in the
question.

a.

Consider the key {M}.

{M} cannot be a candidate key as it cannot determine the attributes P and Y.

Consider the key {M, Y}.

It is provided that {M, Y} → P.

Since {M, Y} is a superset of {M}, by IR1 (reflexivity), {M, Y} → M.

Since {M, Y} → M and M → MP, by IR3 (transitivity), {M, Y} → MP.

Since {M, Y} → MP and MP → C, by IR3, {M, Y} → C.

Therefore, {M, Y} is a candidate key as it determines the attributes P, MP and C.

Consider the key {M, C}.

{M, C} cannot be a candidate key as it cannot determine the attributes P and Y.

Comment

Step 2 of 3

b.

REFRIG is not in 2NF because of the functional dependency M → MP, in which the nonprime attribute MP is partially

dependent on the key {M, Y}. Since REFRIG is not in 2NF, it is not in 3NF either.

Since M is not a superkey in M → MP, REFRIG is not in BCNF either.

Comment

Step 3 of 3

c.

Consider the decomposition of REFRIG as follows:

D = {R1(M, Y, P), R2(M, MP, C)}

Applying the test for binary decomposition (property NJB), compute R1 ∩ R2 = {M}.

Now it is provided that M → MP.

Since M → MP and MP → C, by IR3, M → C.

Hence, M → {MP, C}.

In the above decomposition, R1 ∩ R2 is {M} and R2 − R1 is {MP, C}.

Since M → (R2 − R1) is in F+, the NJB test is satisfied and hence the decomposition is lossless.
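To make the NJB test concrete, the following is a minimal, self-contained Java sketch (not from the textbook; the class name NJBTest and the hard-coded sets are illustrative, and Java 9+ is assumed for Set.of). It computes the attribute closure of R1 ∩ R2 under F and checks whether it covers R1 − R2 or R2 − R1:

import java.util.*;

// A minimal sketch of the NJB test for the binary decomposition
// D = {R1(M, Y, P), R2(M, MP, C)} of REFRIG, with the FDs
// M -> MP, MY -> P, MP -> C hard-coded for illustration.
public class NJBTest {
    static Map<Set<String>, Set<String>> fds = new HashMap<>();

    // Computes the attribute closure X+ under the FDs above.
    static Set<String> closure(Set<String> x) {
        Set<String> result = new HashSet<>(x);
        boolean changed = true;
        while (changed) {
            changed = false;
            for (Map.Entry<Set<String>, Set<String>> fd : fds.entrySet())
                if (result.containsAll(fd.getKey()) && result.addAll(fd.getValue()))
                    changed = true;
        }
        return result;
    }

    public static void main(String[] args) {
        fds.put(Set.of("M"), Set.of("MP"));
        fds.put(Set.of("M", "Y"), Set.of("P"));
        fds.put(Set.of("MP"), Set.of("C"));

        Set<String> r1 = Set.of("M", "Y", "P"), r2 = Set.of("M", "MP", "C");
        Set<String> common = new HashSet<>(r1);            // R1 ∩ R2
        common.retainAll(r2);
        Set<String> r1MinusR2 = new HashSet<>(r1);
        r1MinusR2.removeAll(r2);
        Set<String> r2MinusR1 = new HashSet<>(r2);
        r2MinusR1.removeAll(r1);

        Set<String> cl = closure(common);
        boolean lossless = cl.containsAll(r1MinusR2) || cl.containsAll(r2MinusR1);
        System.out.println("Lossless: " + lossless);       // prints true: M -> {MP, C}
    }
}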

Comment
Chapter 15, Problem 22E

Problem

Specify all the inclusion dependencies for the relational schema in Figure 5.5.

Step-by-step solution

Step 1 of 1

Inclusion dependencies

Inclusion dependencies are used to define two types of interrelational constraints:

- Referential integrity constraints

- Class/subclass relationships

Definition of inclusion dependency: An inclusion dependency R.X < S.Y, between two sets of attributes, X of relation schema R and Y of relation schema S, specifies the constraint that at any specific time, when r is a relation state of R and s is a relation state of S, we must have πX(r(R)) ⊆ πY(s(S)).

From the figure 5.5 in the text book, we can specify the following inclusion dependencies on the
relational schema.

DEPENDENT.Essn < EMPLOYEE.Ssn

WORKS_ON.Pno < PROJECT.Pnumber

DEPT_LOCATIONS.Dnumber < DEPARTMENT.Dnumber

All the preceding inclusion dependencies represent referential integrity constraints.

We can also use inclusion dependencies to represent class/subclass relationships.
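As an illustration (not from the textbook), the following minimal Java sketch checks one of these inclusion dependencies on sample in-memory data; the class name InclusionCheck and the sample values are assumptions:

import java.util.*;

// A minimal sketch of checking the inclusion dependency
// DEPENDENT.Essn < EMPLOYEE.Ssn: every Essn value appearing in
// DEPENDENT must also appear as an Ssn value in EMPLOYEE.
public class InclusionCheck {
    public static void main(String[] args) {
        List<String> dependentEssn = List.of("123", "123", "456"); // sample data
        Set<String> employeeSsn = Set.of("123", "456", "789");     // sample data
        boolean holds = employeeSsn.containsAll(dependentEssn);
        System.out.println("DEPENDENT.Essn < EMPLOYEE.Ssn holds: " + holds);
    }
}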

Comment
Chapter 15, Problem 23E

Problem

Prove that a functional dependency satisfies the formal definition of multivalued dependency.

Step-by-step solution

Step 1 of 2

A functional dependency satisfies the formal definition of a multivalued dependency, as shown below.

Functional dependencies

A functional dependency X → Y specifies that whenever two tuples of a relation state r agree on the attributes of X, they must also agree on the attributes of Y. Here X and Y are sets of attributes of a relation schema R.

Multivalued dependencies

A multivalued dependency X ↠ Y specifies that whenever two tuples t1 and t2 of r agree on X (that is, t1[X] = t2[X]), there must also exist tuples t3 and t4 in r such that:

t3[X] = t4[X] = t1[X];

t3[Y] = t1[Y] and t4[Y] = t2[Y]; and

t3[R − (X ∪ Y)] = t2[R − (X ∪ Y)] and t4[R − (X ∪ Y)] = t1[R − (X ∪ Y)].

Comments (2)

Step 2 of 2

As with functional dependencies (FDs), inference rules for multivalued dependencies (MVDs) have been developed. A functional dependency is a multivalued dependency because it follows the replication rule; i.e., if X → Y then X ↠ Y

holds.

This can be verified directly from the formal definition: if X → Y holds and t1[X] = t2[X], then t1[Y] = t2[Y], so choosing t3 = t2 and t4 = t1 satisfies all the conditions required of the MVD X ↠ Y.

Now assume that all attributes are included in the universal relation schema R = {A1, A2, ..., An}

and that X, Y, Z, and W are subsets of R.

The following rules hold:

Complementation rule: If X ↠ Y, then X ↠ (R − (X ∪ Y)), where R − (X ∪ Y) is

all attributes in R except those in X and Y.

Coalescence rule:

If X ↠ Y and there exists W with the properties

that (a) W ∩ Y is empty,

(b) W → Z,

(c) and Y ⊇ Z, then X → Z.

Here Y and W have to be disjoint, and Z has to be a subset of, or equal to, Y.

So, by the above rules, every functional dependency is also a multivalued dependency,
because it satisfies the formal definition of a multivalued dependency.

Comment
Chapter 15, Problem 24E

Problem

Consider the example of normalizing the LOTS relation in Sections 14.4 and 14.5. Determine
whether the decomposition of LOTS into {LOTS1AX, LOTS1AY, LOTS1B, LOTS2} has the
lossless join property by applying Algorithm 15.3 and also by using the test under property NJB
from Section 14.5.1.

Step-by-step solution

Step 1 of 8

Consider the example of normalizing the LOTS relation given in the textbook.

Comment

Step 2 of 8

(The figure from the textbook referenced in this step is not reproduced here.)

Comment

Step 3 of 8

Consider the relation

LOTS(Property_id#, County_name, Lot#, Area, Price, Tax_rate).

Suppose we decompose the above relation into two relations, LOTS1AX and LOTS1AY, as follows:

LOTS1AX(Property_id#, County_name, Lot#, Area, Price)

LOTS1AY(County_name, Tax_rate)

There is a problem with this decomposition, but we wish to focus on one aspect at the moment.
Let an instance of the relation LOTS be the one shown in the next step.

Comment

Step 4 of 8

(The instance of the LOTS relation is not reproduced here.)

Comment

Step 5 of 8

Now let the instances of the decomposed relations LOTS1AX and LOTS1AY be as given in the textbook (the two instance tables are not reproduced here).

Comment

Step 6 of 8

Comment

Step 7 of 8

All the information that was in the relation LOTS appears to be still available in LOTS1AX and
LOTS1AY, but we must verify that this is so.

Suppose we construct LOTS1AX by removing the attribute Tax_rate, which violates 2NF, from LOTS, and placing it with County_name into another relation, LOTS1AY.

(The resulting instances are not reproduced here.)

Comment

Step 8 of 8

Now suppose we need to retrieve Lot#. Then we would need to join LOTS1AX and LOTS1AY; whether the join reproduces exactly the original tuples (with no spurious tuples) is determined by the nonadditive join test.

A decomposition D = {R1, R2} of a relation R is a lossless (nonadditive) join

decomposition with respect to a set of functional dependencies F on R if and only if either the functional dependency (R1 ∩ R2) → (R1 − R2) or the functional dependency (R1 ∩ R2) → (R2 − R1) is in F+. This is the test under property NJB.

By the above relations, let

R1 = LOTS1AX(Property_id#, County_name, Lot#, Area, Price)

and R2 = LOTS1AY(County_name, Tax_rate).

Now, applying the property NJB, we get

R1 ∩ R2 = {County_name} and R2 − R1 = {Tax_rate},

so the functional dependency County_name → Tax_rate is in F, and it is also in F+. Hence the test is satisfied and the decomposition is lossless.

Comment
Chapter 15, Problem 25E

Problem

Show how the MVDs Ename ↠ Pname and Ename ↠ Dname in Figure 14.15(a) may arise during
normalization into 1NF of a relation, where the attributes Pname and Dname are multivalued.

Step-by-step solution

Step 1 of 2

The given multivalued dependencies are:

Ename ↠ Pname and Ename ↠ Dname

According to the figure given in the textbook, the relation

EMP(Ename, Pname, Dname)

is in first normal form.

Now, we need to show that the attributes Pname and Dname are multivalued and that the two MVDs hold in the EMP relation.

Consider the example instance of EMP given in the textbook figure:

EMP

Ename   Pname   Dname

Smith   X       John

Smith   Y       Anna

Smith   X       Anna

Smith   Y       John

Comment

Step 2 of 2

The relation EMP shows that an employee whose name is Ename works on the project named Pname and has a dependent whose name is Dname.

An employee may work on several projects and may have several dependents, and the employee's projects and dependents are independent of one another.

To keep this relation state consistent, we must have a separate tuple to represent every combination of an employee's project and dependent. Based on this, the EMP relation is decomposed into two 4NF relations, EMP_PROJECTS and EMP_DEPENDENTS:

EMP_PROJECTS

Ename   Pname

Smith   X

Smith   Y

EMP_DEPENDENTS

Ename   Dname

Smith   John

Smith   Anna

This shows how the MVDs Ename ↠ Pname and Ename ↠ Dname arise in the EMP relation.
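As a check (not from the textbook), the following minimal Java sketch verifies the MVD Ename ↠ Pname on the four-tuple EMP instance above; the class name MvdCheck is illustrative:

import java.util.*;

// A minimal sketch that checks Ename ->> Pname on the EMP instance above.
// For every pair of tuples t1, t2 agreeing on Ename, a tuple t3 with
// t3[Ename] = t1[Ename], t3[Pname] = t1[Pname], t3[Dname] = t2[Dname]
// must also be present in the relation.
public class MvdCheck {
    public static void main(String[] args) {
        String[][] emp = {
            {"Smith", "X", "John"}, {"Smith", "Y", "Anna"},
            {"Smith", "X", "Anna"}, {"Smith", "Y", "John"}
        };
        Set<List<String>> rel = new HashSet<>();
        for (String[] t : emp) rel.add(List.of(t));

        boolean holds = true;
        for (String[] t1 : emp)
            for (String[] t2 : emp)
                if (t1[0].equals(t2[0])                              // same Ename
                        && !rel.contains(List.of(t1[0], t1[1], t2[2])))
                    holds = false;                                   // required t3 missing
        System.out.println("Ename ->> Pname holds: " + holds);       // prints true
    }
}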

Comment
Chapter 15, Problem 26E

Problem

Apply Algorithm 15.2(a) to the relation in Exercise to determine a key for R. Create a minimal set
of dependencies G that is equivalent to F, and apply the synthesis algorithm (Algorithm 15.4) to
decompose R into 3NF relations.

Exercise

Consider the universal relation R = {A, B, C, D, E, F, G, H, I, J} and the set of functional


dependencies F = {{A, B}→{C}, {A}→{D, E}, {B}→{F}, {F}→{G, H}, {D}→{I, J}}. What is the key for
F? Decompose R into 2NF and then 3NF relations.

Step-by-step solution

Step 1 of 4

Refer to the Exercise 14.24 for the set of functional dependencies F and the relation R. The functional dependencies in F are as follows:

F = {{A, B} → {C}, {A} → {D, E}, {B} → {F}, {F} → {G, H}, {D} → {I, J}}

• The combination of all the attributes is always a candidate key for a relation, so ABCDEFGHIJ will be a candidate key for the relation R.

• Reduce unnecessary attributes from the key as follows:

• Since C can be determined by {A, B}, remove it from the key.

• Attributes D and E can be removed because they are determined by {A}.

• Attribute F can be removed because it can be determined by {B}.

• Attributes G and H can be removed because they are determined by {F}.

• Attributes I and J can be removed because they are determined by {D}.

Therefore, the attribute set AB is a candidate key for relation R.

Comment

Step 2 of 4

Minimal set of dependencies (Minimal cover)

If functional dependencies of a relation are not in canonical form then first convert them into
canonical form using decomposition rule of inference.

Refer to the Exercise 14.24 for the set of functional dependencies F, and convert them into canonical form as follows:

{A, B} → {C}, {A} → {D}, {A} → {E}, {B} → {F}, {F} → {G}, {F} → {H}, {D} → {I}, {D} → {J}

If there exists any extraneous attribute or redundant functional dependency, remove it.

Determine the minimal set of dependencies G, using the tests as follows:

• Test for a minimal LHS (only functional dependencies with ≥ 2 attributes on the LHS need testing)

1. Testing whether B is extraneous in {A, B} → {C}:

Test the functional dependency {A} → {C}.

Since C ∉ {A}+ = {A, D, E, I, J}, the attribute B is necessary.

2. Testing whether A is extraneous in {A, B} → {C}:

Test the functional dependency {B} → {C}.

Since C ∉ {B}+ = {B, F, G, H}, the attribute A is necessary.

• Test for a minimal RHS

1. Testing {A, B} → {C}:

Since C ∉ {A, B}+ when {A, B} → {C} is removed, {A, B} → {C} is necessary.

2. Testing {A} → {D}:

Since D ∉ {A}+ when {A} → {D} is removed, {A} → {D} is necessary.

3.

Comment

Step 3 of 4

Testing {A} → {E}:

Since E ∉ {A}+ when {A} → {E} is removed, {A} → {E} is necessary.

4. Testing {B} → {F}:

Since F ∉ {B}+ when {B} → {F} is removed, {B} → {F} is necessary.

5. Testing {F} → {G}:

Since G ∉ {F}+ when {F} → {G} is removed, {F} → {G} is necessary.

6. Testing {F} → {H}:

Since H ∉ {F}+ when {F} → {H} is removed, {F} → {H} is necessary.

7. Testing {D} → {I}:

Since I ∉ {D}+ when {D} → {I} is removed, {D} → {I} is necessary.

8. Testing {D} → {J}:

Since J ∉ {D}+ when {D} → {J} is removed, {D} → {J} is necessary.

Therefore, the necessary functional dependencies are as follows:

{A, B} → {C}, {A} → {D}, {A} → {E}, {B} → {F}, {F} → {G}, {F} → {H}, {D} → {I}, {D} → {J}

After applying the composition rule of inference, the minimal set of dependencies is:

{A, B} → {C}, {A} → {D, E}, {B} → {F}, {F} → {G, H}, {D} → {I, J}

Hence, the minimal set of dependencies G, that is equivalent to F, is:

G = {{A, B} → {C}, {A} → {D, E}, {B} → {F}, {F} → {G, H}, {D} → {I, J}}
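To make the minimality tests above concrete, the following is a minimal Java sketch (not from the textbook; the class name RedundancyTest and the array encoding of the FDs are assumptions). It tests whether B → F is redundant by computing {B}+ under G with that dependency removed:

import java.util.*;

// A minimal sketch of the redundancy test used in the minimal-cover
// computation: an FD X -> Y is redundant if Y is still contained in X+
// computed under the remaining dependencies.
public class RedundancyTest {
    // Computes the attribute closure of x under the given FDs,
    // where each FD is encoded as {LHS chars, RHS chars}.
    static Set<Character> closure(Set<Character> x, List<char[][]> fds) {
        Set<Character> r = new HashSet<>(x);
        boolean changed = true;
        while (changed) {
            changed = false;
            for (char[][] fd : fds) {
                boolean lhsIn = true;
                for (char c : fd[0]) lhsIn &= r.contains(c);
                if (lhsIn) for (char c : fd[1]) changed |= r.add(c);
            }
        }
        return r;
    }

    public static void main(String[] args) {
        // G = {AB->C, A->DE, B->F, F->GH, D->IJ}
        List<char[][]> g = new ArrayList<>(List.of(
            new char[][]{{'A','B'},{'C'}}, new char[][]{{'A'},{'D','E'}},
            new char[][]{{'B'},{'F'}}, new char[][]{{'F'},{'G','H'}},
            new char[][]{{'D'},{'I','J'}}));
        // Test whether B -> F is redundant: remove it, then check F in {B}+.
        char[][] test = g.remove(2);
        Set<Character> cl = closure(Set.of(test[0][0]), g);
        System.out.println("B -> F redundant: " + cl.contains('F'));  // prints false
    }
}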

Comment

Step 4 of 4

Following steps must be used to decompose R into 3NF relations, using synthesis algorithm:

Step 1: Calculate the minimal cover

The set of functional dependencies G above is a minimal cover of R.

Step 2: Create a relation for each functional dependency

There are five functional dependencies in the minimal cover of R. Create five relations
R1, R2, R3, R4, and R5, each having the corresponding attributes, as follows:

R1(A, B, C), R2(A, D, E), R3(B, F), R4(F, G, H), R5(D, I, J)

Step 3: Create a relation for the key attributes

• AB is the candidate key of relation R. Since the attributes A and B already exist in relation R1,
there is no need to create another relation for the key attributes.

• If another relation were created containing the candidate key AB, it would result in
redundancy, and step 4 could be used for removing the redundant relation.

Therefore, the final 3NF relations obtained after decomposing R are as follows:

R1(A, B, C), R2(A, D, E), R3(B, F), R4(F, G, H), and R5(D, I, J).

Comment
Chapter 15, Problem 27E

Problem

Repeat Exercise 1 for the functional dependencies in Exercise 2.

Exercise 1

Apply Algorithm 15.2(a) to the relation in Exercise to determine a key for R. Create a minimal set
of dependencies G that is equivalent to F, and apply the synthesis algorithm (Algorithm 15.4) to
decompose R into 3NF relations.

Exercise

Consider the universal relation R = {A, B, C, D, E, F, G, H, I, J} and the set of functional


dependencies F = {{A, B}→{C}, {A}→{D, E}, {B}→{F}, {F}→{G, H}, {D}→{I, J}}. What is the key for
F? Decompose R into 2NF and then 3NF relations.

Exercise 2

Repeat Exercise for the following different set of functional dependencies G = {{A, B} → {C}, {B, D} →
{E, F}, {A, D} → {G, H}, {A} → {I}, {H} → {J}}.

Exercise

Consider the universal relation R = {A, B, C, D, E, F, G, H, I, J} and the set of functional


dependencies F = {{A, B}→{C}, {A}→{D, E}, {B}→{F}, {F}→{G, H}, {D}→{I, J}}. What is the key for
F? Decompose R into 2NF and then 3NF relations.

Step-by-step solution

Step 1 of 5

Refer to the Exercise 14.25 for the set of functional dependencies G and the relation R. The functional dependencies in G are as follows:

G = {{A, B} → {C}, {B, D} → {E, F}, {A, D} → {G, H}, {A} → {I}, {H} → {J}}

• The combination of all attributes is always a candidate key for a relation, so ABCDEFGHIJ will be a candidate key for the relation R. Reduce unnecessary attributes from the key. Since C can be determined by {A, B}, remove it from the key.

• Since the attributes B and D determine the attributes E and F, both E and F should be removed from the candidate key.

• Since the attributes A and D determine the attributes G and H, both G and H should be removed from the candidate key.

• Since the attribute A determines the attribute I, I should be removed from the candidate key.

• Since the attribute H determines the attribute J, J should be removed from the candidate key.

Therefore, the attribute set ABD is a candidate key for the relation R.

Comment

Step 2 of 5

Minimal set of dependencies (Minimal cover)

If functional dependencies of a relation are not in canonical form then first convert them into
canonical form using decomposition rule of inference.

Refer to the Exercise 14.25 for the set of functional dependencies G, and convert them into canonical form as follows:

{A, B} → {C}, {B, D} → {E}, {B, D} → {F}, {A, D} → {G}, {A, D} → {H}, {A} → {I}, {H} → {J}

If there exists any extraneous attribute or redundant functional dependency, remove it.

Determine the minimal set of dependencies using the tests as follows:

• Test for a minimal LHS (only functional dependencies with ≥ 2 attributes on the LHS need testing)

1. Testing whether B is extraneous in {A, B} → {C}:

Test the functional dependency {A} → {C}.

Since C ∉ {A}+ = {A, I}, the attribute B is necessary.

2. Testing whether A is extraneous in {A, B} → {C}:

Test the functional dependency {B} → {C}.

Since C ∉ {B}+ = {B}, the attribute A is necessary.

3. Testing whether D is extraneous in {B, D} → {E, F}:

Test the functional dependency {B} → {E, F}.

Since E, F ∉ {B}+ = {B}, the attribute D is necessary.

4. Testing whether B is extraneous in {B, D} → {E, F}:

Test the functional dependency {D} → {E, F}.

Since E, F ∉ {D}+ = {D}, the attribute B is necessary.

5. Testing whether D is extraneous in {A, D} → {G, H}:

Test the functional dependency {A} → {G, H}.

Since G, H ∉ {A}+ = {A, I}, the attribute D is necessary.

6. Testing whether A is extraneous in {A, D} → {G, H}:

Test the functional dependency {D} → {G, H}.

Since G, H ∉ {D}+ = {D}, the attribute A is necessary.

Comment

Step 3 of 5

• Test for a minimal RHS

1. Testing {A, B} → {C}:

Since C ∉ {A, B}+ when {A, B} → {C} is removed, {A, B} → {C} is necessary.

2. Testing {B, D} → {E}:

Since E ∉ {B, D}+ when {B, D} → {E} is removed, {B, D} → {E} is necessary.

3. Testing {B, D} → {F}:

Since F ∉ {B, D}+ when {B, D} → {F} is removed, {B, D} → {F} is necessary.

4. Testing {A, D} → {G}:

Since G ∉ {A, D}+ when {A, D} → {G} is removed, {A, D} → {G} is necessary.

5. Testing {A, D} → {H}:

Since H ∉ {A, D}+ when {A, D} → {H} is removed, {A, D} → {H} is necessary.

6. Testing {A} → {I}:

Since I ∉ {A}+ when {A} → {I} is removed, {A} → {I} is necessary.

7. Testing {H} → {J}:

Since J ∉ {H}+ when {H} → {J} is removed, {H} → {J} is necessary.

Therefore, the necessary functional dependencies are as follows:

{A, B} → {C}, {B, D} → {E}, {B, D} → {F}, {A, D} → {G}, {A, D} → {H}, {A} → {I}, {H} → {J}

After applying the composition rule of inference to the above canonical functional dependencies, the
minimal set of functional dependencies obtained is:

{A, B} → {C}, {B, D} → {E, F}, {A, D} → {G, H}, {A} → {I}, {H} → {J}

Hence, the minimal set of functional dependencies, equivalent to the given set, is:

Comment

Step 4 of 5

Following steps must be followed to decompose the relation R into 3NF relation using synthesis
algorithm. Refer Exercise 14.25 for the functional dependencies.

Step 1: Calculate the minimal cover

The minimal cover of the given functional dependencies, computed above, is as follows:

{A, B} → {C}, {B, D} → {E, F}, {A, D} → {G, H}, {A} → {I}, {H} → {J}

This set of functional dependencies is a minimal cover of R.

Step 2: Create a relation for each functional dependency

There are five functional dependencies in the relation R, so create five relations,
each having the corresponding attributes:

R1(A, B, C), R2(B, D, E, F), R3(A, D, G, H), R4(A, I), R5(H, J)

Comment

Step 5 of 5

Step 3: Create a relation for the key attributes

ABD is the candidate key of relation R. Create a new relation R6 containing the attributes A, B, and
D. Therefore, all six relations with their corresponding attributes are as follows:

R1(A, B, C), R2(B, D, E, F), R3(A, D, G, H), R4(A, I), R5(H, J), R6(A, B, D)

Step 4: Eliminate redundant relations

Remove all relations that are redundant. A relation R is redundant if R is a projection of another
relation S in the same schema. Since there is no redundant relation in the schema,
there is no need to remove any relation.

Therefore, the final 3NF relations obtained after decomposing R are as follows:

R1(A, B, C), R2(B, D, E, F), R3(A, D, G, H), R4(A, I), R5(H, J), and R6(A, B, D).

Comment
Chapter 15, Problem 29E

Problem

Apply Algorithm 15.2(a) to the relations in Exercises 1 and 2 to determine a key for R. Apply the
synthesis algorithm (Algorithm 15.4) to decompose R into 3NF relations and the decomposition
algorithm (Algorithm 15.5) to decompose R into BCNF relations.

Exercise 1

Consider a relation R(A, B, C, D, E) with the following dependencies:

AB → C, CD → E, DE→ B

Is AB a candidate key of this relation? If not, is ABD? Explain your answer.

Exercise 2

Consider the relation R, which has attributes that hold schedules of courses and sections at a
university; R = {Course_no, Sec_no, Offering_dept, Credit_hours, Course_level, lnstructor_ssn,
Semester, Year, Days_hours, Room_no, No_of_students}. Suppose that the following functional
dependencies hold on R:

{Course_no} → {Offering_dept, Credit_hours, Course_level}

{Course_no, Sec_no, Semester, Year} → {Days_hours, Room_no, No_of_students,


lnstructor_ssn}

{Room_no, Days_hours, Semester, Year} → {lnstructor_ssn, Course_no, Sec_no}

Try to determine which sets of attributes form keys of R. How would you normalize this relation?

Step-by-step solution

Step 1 of 6

Refer to the Exercise 14.27 for the set of functional dependencies and the relation R(A, B, C, D, E). The functional
dependencies are as follows:

AB → C, CD → E, DE → B

Canonical functional dependency

A functional dependency having only one attribute on its right-hand side.

• The combination of all attributes is always a candidate key for a relation, so ABCDE will be a
candidate key for the relation R. Since all the functional dependencies are already in canonical form, there
is no need to convert them into canonical form.

• Reduce unnecessary attributes from the key as follows:

• Since C can be determined by {A, B}, remove it from the key.

The attribute set ABDE can be considered as a candidate key.

• Since E can be determined by {C, D} (and C is in turn determined by {A, B}), remove E from the key.

The attribute set ABD can be considered as a candidate key.

Therefore, ABD is a candidate key for the relation R.

Comment

Step 2 of 6

Refer to the Exercise 14.27 for the set of functional dependencies and relation R. Following steps
must be used to decompose R into 3NF relations, using synthesis algorithm:

Step 1: Find the minimal cover

The set of functional dependencies AB → C, CD → E, DE → B is a minimal cover of R.

Step 2: Create a relation for each functional dependency

There are three functional dependencies, and their corresponding relations are as follows:

R1(A, B, C), R2(C, D, E), R3(D, E, B)

Step 3: Create a relation for the key attributes

• ABD is the candidate key of relation R. Since the attributes A, B, and D already exist in the above
relations, there is no need to create another relation for the key attributes.

• If another relation were created containing the candidate key ABD, it would result in
redundancy, and step 4 could be used for removing the redundant relation.

Therefore, the final 3NF relations obtained after decomposing R are as follows:
R1(A, B, C), R2(C, D, E), and R3(D, E, B).

Comment

Step 3 of 6

Refer to the Exercise 14.27 for the set of functional dependencies and relation R. Following steps
must be used to decompose R into BCNF relations, using decomposition algorithm:

Step 1: Initialize the decomposition algorithm:

D = {R}, where R = (A, B, C, D, E)

Step 2: Check whether or not any functional dependency violates BCNF. If yes, then decompose
the relation.

Since AB → C violates BCNF (AB is not a superkey of R), and DE → B then violates BCNF in the
remaining relation, R is decomposed into three relations having the following attributes:

R1(A, B, C), R2(D, E, B), R3(A, D, E)

Therefore, the final BCNF relations obtained after decomposing R are as follows:

R1(A, B, C), R2(D, E, B), and R3(A, D, E).

Comment

Step 4 of 6

Refer to the Exercise 14.28 for the set of functional dependencies and the relation R. Since the
functional dependencies are not in canonical form, convert them into canonical functional
dependencies as follows:

FD1: {Course_no} → {Offering_dept}

FD2: {Course_no} → {Credit_hours}

FD3: {Course_no} → {Course_level}

FD4: {Course_no, Sec_no, Semester, Year} → {Days_hours}

FD5: {Course_no, Sec_no, Semester, Year} → {Room_no}

FD6: {Course_no, Sec_no, Semester, Year} → {No_of_students}

FD7: {Course_no, Sec_no, Semester, Year} → {Instructor_ssn}

FD8: {Room_no, Days_hours, Semester, Year} → {Instructor_ssn}

FD9: {Room_no, Days_hours, Semester, Year} → {Course_no}

FD10: {Room_no, Days_hours, Semester, Year} → {Sec_no}

The entire attribute set of the relation R is a candidate key. Since Days_hours, Room_no,
No_of_students, and Instructor_ssn can be determined by the functional dependencies FD4, FD5,
FD6, and FD7, respectively, remove them from the candidate key. The remaining attributes in the
candidate key are as follows:

{Course_no, Sec_no, Offering_dept, Credit_hours, Course_level, Semester, Year}

Since Offering_dept, Credit_hours, and Course_level can be determined by FD1, FD2, and FD3,
respectively, remove them from the candidate key. The remaining attributes in the candidate key are as
follows:

{Course_no, Sec_no, Semester, Year}

Therefore, {Course_no, Sec_no, Semester, Year} would be the minimal candidate key for the

relation R.

Comment

Step 5 of 6

Refer to the Exercise 14.28 for the set of functional dependencies and relation R. Following steps
must be used to decompose R into 3NF relations, using synthesis algorithm:

Step 1: Find the minimal cover

Since the functional dependencies are not in canonical form, convert them into the canonical
functional dependencies FD1 through FD10, as in the previous step.

Since Instructor_ssn, Course_no, and Sec_no have already been determined by the other
dependencies, FD8, FD9, and FD10 are treated as extraneous and dropped. The minimal cover for the relation R is then:

{Course_no} → {Offering_dept}, {Course_no} → {Credit_hours}, {Course_no} → {Course_level},
{Course_no, Sec_no, Semester, Year} → {Days_hours}, {Course_no, Sec_no, Semester, Year} → {Room_no},
{Course_no, Sec_no, Semester, Year} → {No_of_students}, {Course_no, Sec_no, Semester, Year} → {Instructor_ssn}

The composed form of the above functional dependencies is as follows:

{Course_no} → {Offering_dept, Credit_hours, Course_level}

{Course_no, Sec_no, Semester, Year} → {Days_hours, Room_no, No_of_students, Instructor_ssn}

Step 2: Create a relation for each composed functional dependency

There are two composed functional dependencies, and their corresponding relations are as follows:

R1(Course_no, Offering_dept, Credit_hours, Course_level)

R2(Course_no, Sec_no, Semester, Year, Days_hours, Room_no, No_of_students, Instructor_ssn)

Step 3: Create a relation for the key attributes

• The relation R has the candidate key {Course_no, Sec_no, Semester, Year}. Since the attributes
Course_no, Sec_no, Semester, and Year already exist in relation R2, there is no need to
create another relation for the key attributes.

• If another relation were created containing the candidate key {Course_no, Sec_no, Semester,
Year}, it would result in redundancy, and step 4 could be used for removing the redundant
relation.

Therefore, the final 3NF relations obtained after decomposing R are as follows:

R1(Course_no, Offering_dept, Credit_hours, Course_level) and

R2(Course_no, Sec_no, Semester, Year, Days_hours, Room_no, No_of_students, Instructor_ssn)

Comment

Step 6 of 6

Refer to the Exercise 14.28 for the set of functional dependencies and relation R. Following steps
must be used to decompose R into BCNF relations, using decomposition algorithm:

Step 1: Initialize the decomposition algorithm:

D = {R}

Step 2: Check whether or not any functional dependency violates BCNF. If yes, then decompose
the relation.

Since {Course_no} → {Offering_dept, Credit_hours, Course_level} violates BCNF (Course_no is not a superkey of R), relation R is decomposed into two relations, R1 and R2.

Therefore, the final BCNF relations obtained after decomposing R are as follows:

R1(Course_no, Offering_dept, Credit_hours, Course_level) and

R2(Course_no, Sec_no, Semester, Year, Days_hours, Room_no, No_of_students, Instructor_ssn)

Comment
Chapter 15, Problem 31E

Problem

Consider the following decompositions for the relation schema R of Exercise. Determine whether
each decomposition has (1) the dependency preservation property, and (2) the lossless join
property, with respect to F. Also determine which normal form each relation in the decomposition
is in.

a. D1 = {R1, R2, R3, R4, R5}; R1 = {A, B, C}, R2 = {A, D, E}, R3 = {B, F}, R4 = {F, G, H}, R5 =
{D, I, J}

b. D2 = {R1, R2, R3}; R1 = {A, B, C, D, E}, R2 = {B, F, G, H}, R3 = {D, I, J}

c. D3 = {R1, R2, R3, R4, R5}; R1= {A, B, C, D}, R2= {D, E}, R3 = {B, F}, R4 = {F, G, H}, R5= {D, I,
J}

Exercise

Consider the universal relation R = {A, B, C, D, E, F, G, H, I, J} and the set of functional


dependencies F = {{A, B}→{C}, {A}→{D, E}, {B}→{F}, {F}→{G, H}, {D}→{I, J}}. What is the key for
F? Decompose R into 2NF and then 3NF relations.

Step-by-step solution

Step 1 of 10

Consider the relation R and the functional dependencies as follows:

R = {A, B, C, D, E, F, G, H, I, J}

F = {{A, B} → {C}, {A} → {D, E}, {B} → {F}, {F} → {G, H}, {D} → {I, J}}

Comment

Step 2 of 10

a.

The decomposition for the relation schema R is:

D1 = {R1, R2, R3, R4, R5}; R1 = {A, B, C}, R2 = {A, D, E}, R3 = {B, F}, R4 = {F, G, H}, R5 = {D, I, J}

Relation R1 satisfies the functional dependency {A, B} → {C}.

Relation R2 satisfies the functional dependency {A} → {D, E}.

Relation R3 satisfies the functional dependency {B} → {F}.

Relation R4 satisfies the functional dependency {F} → {G, H}.

Relation R5 satisfies the functional dependency {D} → {I, J}.

Hence, the decomposition D1 satisfies the dependency preservation property.

Comment

Step 3 of 10

In order to determine whether D1 satisfies the nonadditive join property, apply Algorithm 15.3. Please
refer to Algorithm 15.3 (testing for the nonadditive join property) given in the textbook.
After applying the algorithm, the first row of the matrix consists of "a" symbols in all the cells. Hence, the decomposition D1 satisfies the
nonadditive join property.

Comment

Step 4 of 10

In relation R1, {A, B} is the primary key and also a superkey. It satisfies Boyce-Codd normal
form.

In relation R2, {A} is the primary key and also a superkey. It satisfies Boyce-Codd normal form.

In relation R3, {B} is the primary key and also a superkey. It satisfies Boyce-Codd normal form.

In relation R4, {F} is the primary key and also a superkey. It satisfies Boyce-Codd normal form.

In relation R5, {D} is the primary key and also a superkey. It satisfies Boyce-Codd normal form.

All the relations of the decomposition D1 are in Boyce-Codd normal form.

Comment

Step 5 of 10

b.

The decomposition for the relation schema R is:

D2 = {R1, R2, R3}; R1 = {A, B, C, D, E}, R2 = {B, F, G, H}, R3 = {D, I, J}

Relation R1 satisfies the functional dependencies {A, B} → {C} and {A} → {D, E}.

Relation R2 satisfies the functional dependencies {B} → {F} and {F} → {G, H}.

Relation R3 satisfies the functional dependency {D} → {I, J}.

Hence, the decomposition D2 satisfies the dependency preservation property.

Comment

Step 6 of 10

In order to determine whether D2 satisfies the nonadditive join property, apply Algorithm 15.3. Please
refer to Algorithm 15.3 (testing for the nonadditive join property) given in the textbook.
After applying the algorithm, the first row of the matrix consists of "a" symbols in all the cells. Hence, the decomposition D2 satisfies the
nonadditive join property.

Comment

Step 7 of 10

In relation R1, {A, B} is the primary key. The relation R1 is only in first normal form because there is a partial
dependency: the attribute A is a partial key and it determines the attributes D and E.

In relation R2, {B} is the primary key. The relation R2 is only in second normal form because there is a
transitive dependency: the attribute F is a non-key attribute that functionally determines the
attributes G and H.

In relation R3, {D} is the primary key and also a superkey. It satisfies Boyce-Codd normal form.

Comment

Step 8 of 10

c.

The decomposition for the relation schema R is:

D3 = {R1, R2, R3, R4, R5}; R1 = {A, B, C, D}, R2 = {D, E}, R3 = {B, F}, R4 = {F, G, H}, R5 = {D, I, J}

Relation R1 satisfies the functional dependency {A, B} → {C}.

Relation R3 satisfies the functional dependency {B} → {F}.

Relation R4 satisfies the functional dependency {F} → {G, H}.

Relation R5 satisfies the functional dependency {D} → {I, J}.

The functional dependency {A} → {D, E} is not satisfied (in particular, {A} → {E} is not preserved by any relation of D3).

Hence, the decomposition D3 does not satisfy the dependency preservation property.

Comment

Step 9 of 10

In order to determine whether D3 satisfies the nonadditive join property, apply Algorithm 15.3. Please
refer to Algorithm 15.3 (testing for the nonadditive join property) given in the textbook.
After applying the algorithm, there is no row in the matrix that consists of "a" symbols in all the cells. Hence, the
decomposition D3 does not satisfy the nonadditive join property.

Comment

Step 10 of 10

The normal form of relation R1 cannot be determined, as it satisfies only the functional dependency
{A, B} → {C}; nothing can be said about the attribute D of relation R1.

The normal form of relation R2 cannot be determined, as it does not satisfy any functional
dependency.

In relation R3, {B} is the primary key and also a superkey. It satisfies Boyce-Codd normal form.

In relation R4, {F} is the primary key and also a superkey. It satisfies Boyce-Codd normal form.

In relation R5, {D} is the primary key and also a superkey. It satisfies Boyce-Codd normal form.
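For reference, the following is a minimal Java sketch (not from the textbook; the class name ChaseTest and the string encodings are assumptions) of the matrix-based test of Algorithm 15.3, applied to the decomposition D1 above. A cell value of 0 encodes the "a" symbol for that column, and i + 1 encodes the "b" symbol of row i; the symbol-equating step is a simplified pairwise variant that suffices for this example:

import java.util.*;

// A minimal sketch of the matrix (chase) test of Algorithm 15.3 for D1.
public class ChaseTest {
    static String attrs = "ABCDEFGHIJ";
    static String[][] fds = { {"AB","C"}, {"A","DE"}, {"B","F"}, {"F","GH"}, {"D","IJ"} };
    static String[] d1 = { "ABC", "ADE", "BF", "FGH", "DIJ" };

    public static void main(String[] args) {
        int n = d1.length, m = attrs.length();
        int[][] s = new int[n][m];
        for (int i = 0; i < n; i++)
            for (int j = 0; j < m; j++)             // "a" if Ri contains the attribute
                s[i][j] = d1[i].indexOf(attrs.charAt(j)) >= 0 ? 0 : i + 1;

        boolean changed = true;
        while (changed) {
            changed = false;
            for (String[] fd : fds)
                for (int i = 0; i < n; i++)
                    for (int k = 0; k < n; k++) {
                        if (!agree(s, i, k, fd[0])) continue;   // rows must agree on LHS
                        for (char y : fd[1].toCharArray()) {    // equate the RHS columns
                            int j = attrs.indexOf(y);
                            int v = Math.min(s[i][j], s[k][j]); // prefer "a" (encoded as 0)
                            if (s[i][j] != v || s[k][j] != v) { s[i][j] = v; s[k][j] = v; changed = true; }
                        }
                    }
        }
        for (int i = 0; i < n; i++)                 // a row of all "a" symbols => lossless
            if (Arrays.stream(s[i]).allMatch(v -> v == 0))
                System.out.println("Row " + (i + 1) + " is all a's: lossless");
    }

    static boolean agree(int[][] s, int i, int k, String x) {
        for (char c : x.toCharArray())
            if (s[i][attrs.indexOf(c)] != s[k][attrs.indexOf(c)]) return false;
        return true;
    }
}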

Comment
Chapter 16, Problem 1RQ

Problem

What is the difference between primary and secondary storage?

Step-by-step solution

Step 1 of 1

Following are the differences between primary and secondary storage:

Primary storage:

• The CPU can directly access primary storage devices.

• Primary storage devices provide fast access to data.

• The storage capacity is limited.

• The cost of primary storage devices is higher than that of secondary storage devices.

• Examples of primary storage are main memory and cache memory.

Secondary storage:

• The CPU cannot directly access secondary storage devices.

• Secondary storage devices provide slower access to data.

• The storage capacity is larger.

• The cost of secondary storage devices is lower than that of primary storage devices.

• Examples of secondary storage are hard disk drives, magnetic disks, magnetic tapes, optical disks, and flash memory.

Comment
Chapter 16, Problem 2RQ

Problem

Why are disks, not tapes, used to store online database files?

Step-by-step solution

Step 1 of 1

Disks, not tapes, are used to store online database files because a disk is a random-access addressable secondary storage device: data is stored and retrieved in units called disk blocks, and any block can be accessed directly by its address.

Tapes, in contrast, are sequential-access addressable devices, so reaching a given record may require scanning a large part of the tape; this makes them unsuitable for online access.
Comment
Chapter 16, Problem 3RQ

Problem

Define the following terms: disk, disk pack, track, block, cylinder, sector, interblock gap, and
read/write head.

Step-by-step solution

Step 1 of 3

Disk: The disk is the secondary storage device that is used to store the huge amount of data.
The disk stores the data in the digital form i.e., 0’s and 1’s. The most basic unit of data that can
be stored in the disk is bit.

Disk pack: The disk pack contains the layers of hard disks to increase the storage capacity i.e.,
it includes many disks.

Comment

Step 2 of 3

Track: In the disk, the information is stored on the surface in the form of circles with various
diameters. Each circle of the surface is called a track.

Block: Each track of the disk is divided into equal sized slices. One or more such slices are
grouped together to form a disk block. The block may contain single slice (sector). The size of
the block is fixed at the time of disk formatting.

Comment

Step 3 of 3

Cylinder: In the disk pack, the tracks with the same diameter forms a cylinder.

Sector: Each track of the disk is divided into small slices. Each slice is called as sector.

Interblock gap: The interblock gap separates the disk blocks. The data cannot be stored in the
interblock gap.

Read/write head: The read/write head is used to read or write the block.

Comment
Chapter 16, Problem 4RQ

Problem

Discuss the process of disk initialization.

Step-by-step solution

Step 1 of 1

Process of disk initialization:

In the disk formatting/initialization process, the tracks are divided into equal-sized blocks; the block size is set by the operating system.

Initialization thus means the process of defining the tracks, sectors, and blocks, so that data and programs can be stored and retrieved.

During initialization of the disk, the block size is fixed, and it cannot be changed dynamically.

Comment
Chapter 16, Problem 5RQ

Problem

Discuss the mechanism used to read data from or write data to the disk.

Step-by-step solution

Step 1 of 1

The disk drive begins to rotate the disk whenever a particular read or write request is initiated. Once the read/write head is positioned on the right track and the block specified in the block address moves under the read/write head, the electronic component of the read/write head is activated to transfer the data.

The following procedure is followed when data is read from or written to the disk:

(1) The head seeks to the correct track.

(2) The correct head is turned on.

(3) The correct sector is located.

(4) The data is read from the hard disk and transferred to a buffer in RAM (or written from the buffer to the disk).

Comment
Chapter 16, Problem 6RQ

Problem

What are the components of a disk block address?

Step-by-step solution

Step 1 of 1

Disk block address:

Data is stored and retrieved in units called disk blocks or pages.

Address of a block:

A disk block address consists of a combination of a cylinder number, a track number (the surface number within the cylinder on which the track is located), and a block number (within the track). This address is supplied to the disk I/O controller.

Comment
Chapter 16, Problem 7RQ

Problem

Why is accessing a disk block expensive? Discuss the time components involved in accessing a
disk block.

Step-by-step solution

Step 1 of 4

Arranging the data in order and storing it in blocks of the disk is known as blocking. The data is transferred between the disk and the main memory in units of blocks.

Accessing data in main memory is far less expensive than accessing data on the disk. The cost of a disk access is made up of the following components:

• Seek time.

• Rotational latency.

• Block transfer time.

Comment

Step 2 of 4

The access of the data in the disk is more expensive because of the time components. The time
components are explained as follows:

• Seek time:

o The disk contains a set of tracks. Each disk surface has a read/write head, called the disk
head. Each track is formed of sectors of fixed size.

o A sector is said to be known as a small sub-division of a track present on the disk.

o Each sector can store up to 512 bytes of data in which the user can access the data.

o For reading the data present in the disk, there is an arm on the disk. This is used to read a
record from the disk.

o Seek time is the total time taken to position the arm with the read/write head over the correct
track.

o Accessing a disk block takes more seek time. Therefore, this is one of the major reasons for
the expensiveness of accessing a disk block.

Comment

Step 3 of 4

• Rotational latency:

o Latency is said to be known as time delay.

o Rotational latency is the total amount of time taken, after the request, for the disk to rotate until the sector containing the required data is positioned under the read/write head.

o This is also said to be a waiting time in which if this time increases, then the expensiveness of
accessing a disk block will also increase.

Comment

Step 4 of 4

• Block transfer time:

o Block transfer time is the time needed to transfer the data of a block between the disk and main memory once the head is positioned over the block.

o At the time of accessing a block of data from the disk, this transfer time adds to the total cost and so contributes to the expensiveness of accessing a
disk block.
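As a rough worked example, with assumed, illustrative numbers: for a disk with an average seek time of 8 ms, a rotation speed of 7,200 rpm (average rotational latency = 0.5 × 60,000/7,200 ≈ 4.17 ms), and a transfer rate of 100 MB/s, reading one 16 KB block costs about 8 + 4.17 + 0.16 ≈ 12.3 ms, whereas reading the same 16 KB from main memory takes on the order of microseconds. This gap is why accessing a disk block is considered expensive.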

Comment
Chapter 16, Problem 8RQ

Problem

How does double buffering improve block access time?

Step-by-step solution

Step 1 of 1

Improving block access time using double buffering:

Double buffering is used to read a continuous stream of blocks from disk into memory.

Double buffering permits continuous reading or writing of data on consecutive
disk blocks, which eliminates the seek time and rotational delay for all but the first block transfer.
Moreover, data is kept ready for processing, which reduces the waiting time in the programs.

With double buffering, the total processing time for n blocks approaches n × P,

where n is the number of blocks

and P is the processing time per block, because block transfer overlaps with processing.

Comment
Chapter 16, Problem 9RQ

Problem

What are the reasons for having variable-length records? What types of separator characters are
needed for each?

Step-by-step solution

Step 1 of 1

Variable-length records (reasons): A file is a sequence of records. Often, all records in a file are of the
same record type and the same size. If different records in the file have different sizes, the file is said to be
made up of variable-length records.

A file may have variable-length records for several reasons:

* The file records are of the same record type, but one or more of the fields are of varying size.

* The file records are of the same record type, but one or more of the fields are optional;

that is, they may have values for some records but not for all.

* The file contains records of different record types and hence of varying size. This would occur if related records of different
types were placed together on disk blocks.

Types of separator characters needed:

With variable-length fields, each record has a value for each field, but we do not know the exact
length of some field values. To determine the bytes within a record that represent each field,
we can use special separator characters (such as ?, %, or $) that do not appear in any field value.

Three types of separators are needed: one to separate the field name from the field value, one to separate one field from the next, and one to mark the end of the record.

Example

A record could, for instance, be stored as

Name=Smith, Ssn=123456789, Dept=Research$

where '=' separates a field name from its value, ',' separates one field from the next,

and '$' terminates the record.

Comment
Chapter 16, Problem 10RQ

Problem

Discuss the techniques for allocating file blocks on disk.

Step-by-step solution

Step 1 of 1

Techniques for allocating file blocks on disk:

There are several techniques for allocating the blocks of a file on disk:

Contiguous allocation

Linked allocation

Clusters

Indexed allocation

Contiguous allocation:-

File blocks are allocated to consecutive disk blocks. This makes reading the whole file very fast
using double buffering.

Linked allocation:-

Each file block contains a pointer to the next file block. It is easy to expand the file but makes it
slow to read the whole file.

Clusters:-

A combination of the two preceding techniques: file blocks are allocated in clusters of consecutive disk blocks, and the clusters are linked. Clusters are sometimes called file
segments or extents.

Indexed allocation:-

One or more index blocks contain pointers to the actual file blocks.
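As a small illustration (not from the textbook; the class and field names are assumptions), the following Java sketch models linked allocation:

// A minimal sketch of linked allocation: each file block holds a
// reference ("pointer") to the next block, so the file is easy to
// expand but must be traversed block by block to read it all.
class FileBlock {
    byte[] data = new byte[512];  // the block's payload
    FileBlock next;               // pointer to the next block of the file

    static int countBlocks(FileBlock head) {
        int n = 0;
        for (FileBlock b = head; b != null; b = b.next)
            n++;                  // sequential traversal, block by block
        return n;
    }
}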

Comment
Chapter 16, Problem 11RQ

Problem

What is the difference between a file organization and an access method?

Step-by-step solution

Step 1 of 1

Difference between a file organization and an access method

File organization:-

- It determines how the physical records in a file are arranged on the disk.

- A file organization refers to the organization of the data of a file into records, blocks, and access
structures.

- In this arrangement, records and blocks are placed on the storage medium and interlinked.

Access method:-

- An access method specifies how the data can be retrieved, based on the file organization.

- It provides a group of operations that can be applied to a file.

- Some access methods can be applied only to files organized in a certain way, while others can be applied to any file organization.

Comment
Chapter 16, Problem 12RQ

Problem

What is the difference between static and dynamic files?

Step-by-step solution

Step 1 of 1

Difference between static and dynamic files:-

Static files: in this file organization, update operations are rarely performed on the file.

Dynamic files, in contrast,

may change frequently,

because update operations are constantly applied to them.

Comment
Chapter 16, Problem 13RQ

Problem

What are the typical record-at-a-time operations for accessing a file? Which of these depend on
the current file record?

Step-by-step solution

Step 1 of 1

Typical record at a time operations are:

1.) Reset: Set the file pointer to the beginning of file.

2.) Find (locate): Searches for the first record that satisfies a search condition. Transfer the block
containing that record into memory buffer. The file pointer points to the record in buffer and it
becomes the current record.

3.) Read (Get): Copies current record from the buffer to the program variable in the user
program. This command may also advance the current record pointer to the next record in the
file, which may necessitate reading the next file block from disk.

4.) FindNext: Searches for next record in file that satisfies the search condition. Transfer the
block containing that record into main memory buffer. The record is located in the buffer and
becomes current record.

5.) Delete: Delete current record and updates file on disk to reflect the deletion.

6.) Modify: Modifies some field values for current record and eventually update file on disk to
reflect the modification.

7.) Insert: Insert new record in the file by locating the block where record is to be inserted,
transferring the block into main memory buffer, writing the record into the buffer, and eventually
writing buffer to disk to reflect insertion.

Operations that are dependent on current record are:

1.) Read

2.) FindNext

3.) Delete

4.) Modify

Comment
Chapter 16, Problem 14RQ

Problem

Discuss the techniques for record deletion.

Step-by-step solution

Step 1 of 1

Techniques for record deletion:-

A record may be deleted from a file using the following techniques.

(1) A program must first find the record's block,

copy the block into a buffer,

delete the record from the buffer, and then

rewrite the block back to the disk.

This leaves unused space in the disk block. Using this technique to
delete a large number of records results in wasted storage space.

(2) Another technique for record deletion is the deletion marker (an extra byte or
bit stored with each record). In this technique:

Deletion sets the deletion marker to a certain ("deleted") value.

A different value of the marker indicates a valid record.

Search programs consider only the valid records in a block.

Both deletion techniques require periodic reorganization of the file.

During this reorganization, the file blocks are accessed consecutively and the records are packed by
removing the deleted records.

For an unordered file, either spanned or unspanned organization can be used, with either
fixed-length or variable-length records.
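As a small illustration (not from the textbook; the class names and sample values are assumptions), the following Java sketch models the deletion-marker technique:

import java.util.*;

// A minimal sketch of the deletion-marker technique: deletion sets a
// flag on the record; searches skip marked records, and a periodic
// reorganization pass packs the file by dropping them.
public class DeletionMarker {
    static class Record {
        boolean deleted;          // the deletion marker
        String value;
        Record(String v) { value = v; }
    }

    public static void main(String[] args) {
        List<Record> block = new ArrayList<>(List.of(new Record("r1"), new Record("r2")));
        block.get(0).deleted = true;        // "delete" r1 by marking it
        block.removeIf(r -> r.deleted);     // periodic reorganization packs the block
        System.out.println(block.size());   // prints 1
    }
}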

Comment
Chapter 16, Problem 15RQ

Problem

Discuss the advantages and disadvantages of using (a) an unordered file, (b) an ordered file,
and (c) a static hash file with buckets and chaining. Which operations can be performed
efficiently on each of these organizations, and which operations are expensive?

Step-by-step solution

Step 1 of 4

(a) An unordered file:

An unordered (heap) file is a collection of records placed in the file in the same order as they
are inserted.

Advantages:

• Insertion is fast and simple: new records are added at the end of the last page of the file.

• It is easy to retrieve all the records from the file.

Disadvantages:

• Searching for a particular record is expensive, since a linear search through the blocks is required.

• Unused spaces may appear in the file after deletions, and it takes time to sort the records of an unordered file.
• It will take time to sort the records in unordered file.

Comment

Step 2 of 4

b) An ordered file:

An ordered file stores the records in sorted order of some field and maintains that order when records are inserted.

Advantages:

• Reading the records in order of the ordering field is efficient, because the records are stored in that order.

• It is helpful when a large volume of data must be searched, since binary search can be used.

Disadvantage:

• Rearranging the file is needed when records are inserted, modified, or deleted.

Comment

Step 3 of 4

c) A static hash file:

Advantages:

• Speed is the biggest advantage: retrieval by equality on the hash key is efficient, even when a huge volume of data is present.

Disadvantages:

• Static hashing is difficult to implement, the number of buckets is fixed, and ordered or range access is expensive.

Comment

Step 4 of 4

The hashing technique is the most efficient for equality retrievals, but it is an expensive process to implement because of
its sophisticated structure.

• Extendible hashing is a type of dynamic hashing, which splits and coalesces buckets as the
database size changes, so that the hash function adjusts on a dynamic
basis.

• The directory, which can be cached, is an added advantage for faster retrieval of information.

Comment
Chapter 16, Problem 16RQ

Problem

Discuss the techniques for allowing a hash file to expand and shrink dynamically. What are the
advantages and disadvantages of each?

Step-by-step solution

Step 1 of 5

Hashing techniques that allow dynamic growth and shrinking of the number of file records include dynamic hashing, extendible hashing, and linear hashing.

In static hashing, the primary pages are fixed and allocated sequentially; pages are never de-allocated, and overflow pages are used if needed.

The dynamic techniques use the binary representation of the hash value h(K).

In dynamic hashing, the directory is a binary tree; the directories can be stored on disk, and
they expand or shrink dynamically. Directory entries point to the disk blocks that contain the
stored records.

Dynamic hashing is good for a database that grows and shrinks in size, because the hash structure
adapts dynamically.

Comment

Step 2 of 5

Extendible hashing is one form of dynamic hashing. The hash function generates values over a large range, typically b-bit integers, with b = 32 being a common choice.

Only a prefix of the hash value is used to index into a table of bucket addresses (the directory).

Example

Let the length of the prefix be i bits,

where i must lie in the limits between 0 and 32.

The bucket address table then holds 2^i entries; initially, i = 0.

The value of i grows and shrinks with the size of the database.
Comment

Step 3 of 5

The number of buckets also changes dynamically because of the coalescing and splitting of
buckets.

Advantages and disadvantages of the hashing techniques:

(1) Static hashing:

Advantages:

Static hashing uses a fixed address space and performs its computation on the internal binary
representation of the search key.

Bucket overflow can be reduced by allocating extra buckets, but it cannot be eliminated.

Disadvantages:

The database grows over time, and if the initial number of buckets is too small, then
performance will degrade.

If the database shrinks, space will again be wasted.

Comment

Step 4 of 5

Extendible hashing:-

Advantages:-

It uses a directory.

Hash performance does not degrade with the growth of the file.

The space overhead is minimal.

Disadvantages:-

The bucket address table may itself become big.

Changing the size of the bucket address table is an expensive operation.

Comment
Step 5 of 5

Linear hashing:-

Advantages:-

It avoids a directory by splitting the buckets in a fixed, linear order.

Overflow chains are not likely to be long.

Duplicates are handled easily.

It allows a hash file to expand and shrink its number of buckets dynamically without a
directory file.

Disadvantages:-

Because buckets are split in a fixed linear order rather than where overflow actually occurs,
overflow chains can still build up at buckets that have not yet been split, even though linear
hashing generally handles the problem of long overflow chains without using a directory.

Comment
Chapter 16, Problem 17RQ

Problem

What is the difference between the directories of extendible and dynamic hashing?

Step-by-step solution

Step 1 of 1

The differences between the directories of extendible and dynamic hashing are as follows:

• In extendible hashing, the directory is an array of 2^d bucket addresses, where d is the global depth; the first d bits of the hash value index directly into this array, and the directory doubles or halves in size as d changes.

• In dynamic hashing, the directory is maintained as a binary tree; the bits of the hash value are used one at a time to traverse internal nodes until a leaf pointing to the bucket is reached, and the tree expands or shrinks by adding or removing nodes.

Comment
Chapter 16, Problem 18RQ

Problem

What are mixed files used for? What are other types of primary file organizations?

Step-by-step solution

Step 1 of 3

A mixed file refers to a file in which contains records of different file types.

• An additional field known as record type field is added as the first field along with the fields of
the records to distinguish the file to which it belongs.

• The records in a mixed file will be of varying size.

Comment

Step 2 of 3

The uses of mixed file are as follows:

• To place related records of different record types together on disk block.

• To increase the efficiency of the system while retrieving related records.

Comment

Step 3 of 3

The other types of primary file organization are as follows:

• Unordered (heap) file organization

• Ordered (sorted) file organization

• Hashed file organization

• Indexed (B-tree) file organization

Comment
Chapter 16, Problem 19RQ

Problem

Describe the mismatch between processor and disk technologies.

Step-by-step solution

Step 1 of 2

In computer systems, the collection of data can be stored physically in the storage
medium.

• From the DBMS (DataBase Management System), the data can be processed, retrieved, and
updated whenever it is needed.

• The storage medium structure in a computer will have some storage hierarchy to make the
process of collections of data.

There are two main divisions in the storage hierarchy of a computer system.

• Primary storage

• Secondary and tertiary storage

Primary storage:

• This storage medium in the computer can be directly accessed by the CPU (Central
Processing Unit), it can be stored only as temporarily.

• The primary storage is also called as main memory (RAM).

• In main memory, the data can be accessed faster with faster cache memories but less storage
capacity and cost-effective.

• Please note that in case of any power failure or system crash, the contents of the main
memory will be erased automatically.

Secondary and tertiary storage:

• This storage medium in the computer can be stored permanently in the way of disks, tapes,
CD-ROMs, or DVDs.

• Secondary storage is also called secondary memory; hard disk drives are a common example.

• In today’s world, the data can be stored in offline considered as removable media, it is called as
tertiary storage.

• It will store the data as a permanent medium of choice.

• The data cannot be accessed directly in this type of storage, at first it will be copied to primary
storage and then the CPU processes the data.

Comment

Step 2 of 2

The Mismatch between processor and disk technologies:

In computer systems, the processing can be done by RAM which is having a series of
chips.

• For an efficient performance, the faster memory is provided to the processor.

• Also, the processor has the support of cache memory to retrieve the information faster which
will be an added advantage.

In computer systems, the disk technologies need the space to accumulate the data.

• In disk technologies, the collection of data can be stored physically.

• The data cannot be accessed directly in disk type technologies, at first it will be copied to
primary storage and then the CPU processes the data.

• When disk access is compared with the processor, the time consumed is far greater: processor and
main-memory speeds have improved much faster than disk access times.

Hence, the processor can process data much faster than the disks can supply it; this growing speed gap is the mismatch between processor and disk technologies.

Comment
Chapter 16, Problem 20RQ

Problem

What are the main goals of the RAID technology? How does it achieve them?

Step-by-step solution

Step 1 of 2

The main goals of RAID (Redundant Array of Independent Disks) technology are to improve performance, by spreading data across multiple disks (data striping), and to increase the reliability of the database, by introducing redundancy.

Disk mirroring:-

Mirroring (also called shadowing) is the technique for introducing redundancy in a database: the data is

stored redundantly on two identical physical disks that are treated as one logical disk.

With mirrored data, a data item can be read from either disk, but a write must be applied to both disks.
When data is read, it can be retrieved from the disk with the shorter
queuing, seek, and rotational delays.

If one disk fails, the other disk is still there to continue providing the data; this improves the
reliability.

Comment

Step 2 of 2

Quantitative example from the book:-

The mean time to failure of a mirrored disk depends on the mean time to failure of the individual
disks, as well as on the mean time to repair, which is the time it takes (on average) to replace a
failed disk and to restore the data on it. Suppose that the failures of the two disks are
independent; that is, there is no connection between the failure of one disk and the failure of the
other.

Suppose the system has 100 disks in the array, the mean time to repair is 24 hours, and the MTTF is 200,000
hours for each disk.

The mean time to data loss of a mirrored pair is then

MTTF^2 / (2 × MTTR) = (200,000)^2 / (2 × 24) ≈ 8.33 × 10^8 hours,

that is, roughly 95,000 years.

Comment
Chapter 16, Problem 21RQ

Problem

How does disk mirroring help improve reliability? Give a quantitative example.

Step-by-step solution

Step 1 of 1

RAID uses the technique of data striping to achieve higher transfer rates and to improve the performance of
the disks. Striping is done at two levels: (i) bit-level data striping and (ii) block-level data striping.

Comment
Chapter 16, Problem 22RQ

Problem

What characterizes the levels in RAID organization?

Step-by-step solution

Step 1 of 2

Raid Levels:-

In the RAID organization, one solution that presents itself because of the increased size and
reduced cost of hard drives is to build redundancy into the array. RAID can be implemented in hardware or
software, and it is a set of physical disk drives viewed by the operating system as a single logical
drive.

Levels:-

The RAID levels are characterized by the kind of data redundancy introduced and the correctness-checking technique used in the
scheme.

Level 0:-

Uses data striping; it has no redundancy and no correctness checking.

Level 1:-

Redundancy through mirroring, with no additional correctness checking.

Level 2:-

Uses memory-style correctness checking: bit-level striping combined with error-correcting (Hamming) codes built from parity bits.

Various versions of level 2 are possible.

Comment

Step 2 of 2

Level 3:-

Level 3 is similar to level 2, but uses a single disk for parity. Level 3 is sometimes called
bit-interleaved parity. The disk controller can detect whether a sector has been read correctly, so a single
parity bit can be used for error correction as well as detection.

Level 4:-

Block-level data striping with parity, like level 3, but parity is kept for blocks.

Level 5:-

Block-level data striping, but the data and parity are distributed across all disks.

Level 6:-

Uses the P+Q redundancy scheme, with Reed-Solomon codes, to recover
from multiple disk failures.
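As a small illustration (not from the textbook; the class name ParityDemo and the sample byte values are assumptions), the following Java sketch shows how a single parity disk, as used by RAID levels 3 to 5, recovers a failed disk:

// A minimal sketch of parity-based recovery: the parity block is the
// bitwise XOR of the data blocks, so any one lost block equals the XOR
// of all the surviving blocks plus the parity.
public class ParityDemo {
    public static void main(String[] args) {
        byte d0 = 0b0110, d1 = 0b1010, d2 = 0b0011;        // data on three disks
        byte parity = (byte) (d0 ^ d1 ^ d2);               // stored on the parity disk
        byte recoveredD1 = (byte) (d0 ^ d2 ^ parity);      // suppose disk 1 failed
        System.out.println(recoveredD1 == d1);             // prints true
    }
}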

Comment
Chapter 16, Problem 23RQ

Problem

What are the highlights of the popular RAID levels 0, 1, and 5?

Step-by-step solution

Step 1 of 1

Different RAID (Redundant Array of Inexpensive Disks) organizations were defined based on
different combinations of the two factors,

1. Granularity of data interleaving (striping)

2. Pattern used to compute redundant information.

There are various levels of RAID from 0 to 6. The popularly used RAID organization is level 0
with striping, level 1 with mirroring, and level 5 with an extra drive for parity.

RAID level 0

• It uses data striping.

• It has no redundant data and hence it provides best write performance as updates are not
required to be duplicated.

• It splits data evenly across multiple disks.

RAID level 1

• It provides good read performance as it uses mirrored disks.

• Performance improvement is possible by scheduling a read request to the disk with shortest
expected seek and rotational delay.

RAID level 5

• It uses block level data striping.

• Data and parity information are distributed across all the disks. If any one disk fails, the lost
data is reconstructed by using the parity information available on the
remaining disks.

Comment
Chapter 16, Problem 24RQ

Problem

What are storage area networks? What flexibility and advantages do they offer?

Step-by-step solution

Step 1 of 1

As data are integrated across organizations, there is a growing demand for storing and managing all of this data at reasonable cost. It has become necessary to move from a static, fixed, server-centered architecture to a more flexible and dynamic infrastructure for meeting information-processing requirements; therefore, most organizations have moved to the better criterion of storage area
networks (SANs).

• In SAN, online storage peripherals are configured as nodes on a high-speed network and can
be attached and removed from servers in a very flexible manner.

• They allow storage systems to be placed at longer distances from the servers and provide good
performance and different connectivity options

• It provides point-to-point connections between servers and storage systems through Fibre Channel, and it allows multiple RAID
systems and tape libraries to be connected to servers.

Advantages

1. It is more flexible as it provides flexible connection with many devices that is many-to-many
connectivity among servers and storage devices using fiber channel hubs and switches.

2. Between a server and storage system there is a distance separation of up to 10km provided
by using fiber optic cables.

3. It provides better isolation capabilities by allowing non-interruptive addition of new peripheral


devices and servers.

Comment
Chapter 16, Problem 25RQ

Problem

Describe the main features of network-attached storage as an enterprise storage solution.

Step-by-step solution

Step 1 of 1

In enterprise applications it is necessary to maintain solutions at a very low cost to provide high
performance. Network-attached storage (NAS) devices are used for this purpose. It does not
provide any of the services common to the server, but it allows the addition of storage for file
sharing.

Features

• It provides a very large amount of hard-disk storage space that is attached to a network;
multiple servers can make use of that space without being shut down, which eases
maintenance and improves performance.

• It can be located at anywhere in the local area network (LAN) and used with different
configuration.

• A hardware device called as NAS box or NAS head acts as a gateway between the NAS
systems and clients who are connected in the network.

• It does not use any of the devices such as monitor, keyboard, or mouse, disk drives that are
connected to many NAS systems to increase total capacity.

• It can store any data that appears in the form of files, such as e-mails, web content includes
text, image or videos, and remote system backups.

• It works to provide reliable operation and for easy maintenance.

• It includes built-in features such as security (authenticate the access) or automatic sending of
alerts through mail in case of error occurred on the device that are connected.

• It contributes to provide high degree of scalability, reliability, flexibility and performance.

Comment
Chapter 16, Problem 26RQ

Problem

How have new iSCSI systems improved the applicability of storage area networks?

Step-by-step solution

Step 1 of 1

Internet SCSI (iSCSI) is a protocol proposed to issue commands that allows clients (initiators) to
send SCSI commands to SCSI storage devices through remote channels.

• Its main feature is that it does not require the special cabling needed by Fibre
Channel, and it can run over longer distances using the existing network infrastructure.

• iSCSI allows data transfers over intranets and manages storage over long distances.

• It can transfer data over a variety of networks, including local area networks (LANs), wide area
networks (WANs), and the Internet.

• It is bidirectional; when the request is given, it is processed and the resultant data is sent in
response to the original request.

• It combines simplicity and low cost, and the functionality of iSCSI devices allows good upgrade
paths; hence it is applied in small and medium-sized business applications.

Comment
Chapter 16, Problem 27RQ

Problem

What are SATA, SAS, and FC protocols?

Step-by-step solution

Step 1 of 3

SATA Protocol:

SATA stands for serial ATA, wherein ATA represents AT attachment; SATA is therefore serial
AT attachment.

SATA is a modern storage protocol that has largely replaced the previously common SCSI (small
computer system interface) and parallel ATA interfaces in laptops and small personal computers.
SATA overcomes the design limitations of those earlier storage protocols.

• SATA is suitable for tiered storage environments.

• SATA can be used in small and medium-sized enterprises.

• SATA supports interchangeability.

Comment

Step 2 of 3

SAS Protocol:

SAS stands for serial attached SCSI. SAS overcomes the design limitations of earlier storage
protocols and is also considered superior to SATA.

• SAS was designed to replace parallel SCSI interfaces in storage area networks (SANs).

• SAS drives are faster than SATA drives and have dual-port capability.

• SAS can likewise be used in small and medium-sized enterprises.

• SAS supports interchangeability: SATA drives can also be used on a SAS backplane.

Comment

Step 3 of 3

FC Protocol:

FC stands for the Fiber Channel protocol. Fiber Channel is used to connect multiple RAID
systems and tape libraries, which may have different configurations.

• Fiber Channel supports point-to-point connections between a server and a storage system. It
also provides the flexibility of many-to-many connections between servers and storage devices.

• Fiber Channel has almost the same performance as SAS. It uses fiber optic cables, so
high-speed data transfer is supported.

• It has effectively no distance limitation and is a low-cost alternative for connecting many
devices.

Comment
Chapter 16, Problem 28RQ

Problem

What are solid-state drives (SSDs) and what advantage do they offer over HDDs?

Step-by-step solution

Step 1 of 2

Solid-state drives (SSD):

SSD is the abbreviation for solid-state drive, a device that uses integrated circuit assemblies as
storage to hold data persistently. It is nonvolatile memory, meaning it does not lose the data it
holds when the system is turned off.

SSDs are based on flash memory technology, which is why they are sometimes simply called
flash memory. They do not require a continuous power supply to retain data on secondary
storage, so they are known as solid state disks or solid state drives.

An SSD does not have a read/write head like a traditional electromagnetic disk; instead it has a
controller (an embedded processor) for its various operations. This makes data retrieval faster
than on magnetic disks. SSDs commonly use interconnected NAND flash memory chips.

SSDs use a wear-leveling technique that extends the life of the drive by writing data to a
different NAND cell instead of repeatedly overwriting the same one.

Comment

Step 2 of 2

Advantages of SSDs over HDDs are as follows:

• Faster access time and higher transfer rate:

In an SSD, data can be accessed directly at any location in flash memory, so access time can be
on the order of 100 times faster than an HDD's and latency is low; consequently, the data
transfer rate is high and system boot time is short.

• More reliable:

An SSD has no moving mechanical arm for read and write operations; data is stored on
integrated circuit chips. The SSD's controller manages all operations on the flash cells. A flash
cell can be written and erased only a limited number of times before it fails, and the controller
manages this wear so that the SSD can work for many years under normal use.

• No moving components (durable):

Because an SSD has no moving components, data on an SSD is safer, even when the
equipment is handled roughly.

• Uses less power:

Since an SSD has no spinning platters or head movement for reading and writing data, power
consumption is lower than an HDD's, which saves battery life. An SSD uses only 2-3 watts
whereas an HDD uses 6-7 watts of power.

• No noise and less heat:

With no moving parts, an SSD generates less heat and makes no noise, which helps to increase
the life and reliability of the drive.

• Light weight:

SSDs are mounted on circuit boards and have no moving head or spindle, so they are light in
weight and small in size.

Comment
Chapter 16, Problem 29RQ

Problem

What is the function of a buffer manager? What does it do to serve a request for data?

Step-by-step solution

Step 1 of 2

The buffer manager is a software module of the DBMS that serves all data requests, decides
which buffer to use, and manages page replacement.

The main functions of the buffer manager are:

• To speed up processing and increase efficiency.

• To increase the likelihood that a requested page is found in main memory.

• To find an appropriate replacement victim when a new disk block must be read in, choosing a
page that will not be required again soon.

• To ensure that the number of buffers fits in main memory.

• To apply the buffer replacement policy, selecting the buffers to be emptied when the requested
amount of data exceeds the available buffer space.

Comment

Step 2 of 2

The buffer manager maintains two pieces of bookkeeping information per buffer page to fulfill its
functionality:

1. Pin count: a counter that tracks the number of outstanding requests for a page, i.e., the
number of users currently using it. The counter is initially zero; while it is zero the page is
unpinned, and only unpinned pages may be chosen for replacement and written back to disk.
When the counter is incremented, the page is said to be pinned.

2. Dirty bit: initially set to zero for every page; when the page is updated, the bit is set to 1.

The buffer manager processes a page request in the following steps (see the sketch below):

• It checks the availability of the page in the buffer. If the page is available, it increments the pin
count and returns the page.

• If the page is not in the buffer, the buffer manager takes the following steps:

• It chooses a replacement page according to the replacement policy and increments the
requested page's pin count.

• If the dirty bit of the replacement page is on, the buffer manager writes that page to disk,
replacing its old copy there.

• If the dirty bit is not on, the replacement page is simply discarded without being written back.

• The buffer manager reads the new page into the freed buffer and conveys the memory location
of the page to the requesting application.
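The request-handling steps above can be made concrete with a small Python sketch. This is
only an illustration of the logic, not an actual DBMS module; all names (BufferManager,
fetch_page, and the disk object's read/write methods) are invented for the example.

class Frame:
    def __init__(self, page_id, data):
        self.page_id = page_id
        self.data = data
        self.pin_count = 0    # number of outstanding requests for this page
        self.dirty = False    # set to True when the page is updated

class BufferManager:
    def __init__(self, capacity, disk):
        self.capacity = capacity
        self.disk = disk      # assumed to expose read(page_id) and write(page_id, data)
        self.pool = {}        # page_id -> Frame

    def fetch_page(self, page_id):
        # 1. If the page is already buffered, pin it and return it.
        if page_id in self.pool:
            frame = self.pool[page_id]
            frame.pin_count += 1
            return frame
        # 2. Otherwise choose a replacement victim if the pool is full.
        if len(self.pool) >= self.capacity:
            victim = self._choose_victim()     # policy-dependent (LRU, clock, ...)
            if victim.dirty:                   # write back only if modified
                self.disk.write(victim.page_id, victim.data)
            del self.pool[victim.page_id]      # a clean victim is simply dropped
        # 3. Read the new page from disk and hand its location to the caller.
        frame = Frame(page_id, self.disk.read(page_id))
        frame.pin_count = 1
        self.pool[page_id] = frame
        return frame

    def _choose_victim(self):
        # Simplest legal policy: evict any unpinned frame.
        for frame in self.pool.values():
            if frame.pin_count == 0:
                return frame
        raise RuntimeError('all frames are pinned')

A caller that finishes with a page would decrement frame.pin_count (and set frame.dirty after an
update), which is what allows the frame to become a replacement candidate again.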

Comment
Chapter 16, Problem 30RQ

Problem

What are some of the commonly used buffer replacement strategies?

Step-by-step solution

Step 1 of 2

Buffer replacement strategies:

In large DBMSs, files contain many pages, and it is not possible to keep all the data in memory
at the same time. To cope with this and improve the efficiency of DBMS transactions, the buffer
manager uses buffer replacement strategies that decide which buffer to use and which pages to
replace in order to make room for newly requested pages.

Comment

Step 2 of 2

Some commonly used buffer replacement strategies are as follows:

• LRU (least recently used):

The LRU strategy keeps track of page usage over time and removes the page that has gone
unused the longest.

LRU works on the principle that pages used recently are most likely to be used again in further
processing. To implement the strategy, the buffer manager maintains a table in which the time of
last use is recorded for every page. This is a very common and simple policy.

It suffers from sequential flooding: under a repeated sequential scan, every page is evicted just
before it is needed again, so each page access requires an I/O.

• Clock policy:

This is an approximate LRU technique, similar to a round-robin strategy. In the clock
replacement policy, buffers are arranged in a circle like a clock face with a single clock hand.
The buffer manager sets a "use bit" on each reference. If the use bit of a buffer is not set (flag 0),
the buffer has not been used in a long time and is vulnerable to replacement. Clock thus
replaces an old page, though not necessarily the oldest.

• FIFO (First In First Out):

This is the simplest buffer replacement technique. When a buffer is required for a new page, the
oldest arrived page is swapped out. The pages are arranged in the buffer as a queue such that
the most recent arrival is at the tail and the oldest arrival is at the head.

During replacement, the page at the head of the queue is replaced first. This strategy is simple
and easy to implement but not always desirable, because the oldest page may be a frequently
used one that will soon be needed and swapped in again, creating processing overhead.

• MRU (Most recently used):

It removes the most recently used page first. This is also called fetch-and-discard. It is useful in
sequential scanning, where the most recently used page will not be needed again for some time.

In sequential-scan situations the LRU and clock strategies do not perform well. FIFO's
performance can be enhanced by pinning certain blocks, such as a root index block, ensuring
they cannot be replaced and always remain in the buffer. A minimal sketch of the LRU policy
follows.
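As a rough illustration of the LRU policy described above, the sketch below keeps pages in a
Python OrderedDict so that eviction order mirrors recency of use; the class name and the
stand-in for a disk read are invented for the example.

from collections import OrderedDict

class LRUBuffer:
    # Minimal LRU buffer: the least recently used page is evicted
    # when the buffer is full.
    def __init__(self, capacity):
        self.capacity = capacity
        self.pages = OrderedDict()   # page_id -> page contents

    def access(self, page_id):
        if page_id in self.pages:
            self.pages.move_to_end(page_id)        # mark as most recently used
            return self.pages[page_id]
        if len(self.pages) >= self.capacity:
            self.pages.popitem(last=False)         # evict the least recently used
        self.pages[page_id] = 'page-%s' % page_id  # stand-in for a disk read
        return self.pages[page_id]

A sequential scan over more pages than the capacity makes every access a miss, which is
exactly the sequential flooding problem noted above.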

Comment
Chapter 16, Problem 31RQ

Problem

What are optical and tape jukeboxes? What are the different types of optical media served by
optical drives?

Step-by-step solution

Step 1 of 2

Optical jukeboxes:

An optical jukebox is an intelligent data storage device that uses an array of optical disk platters
and automatically loads and unloads these disks according to storage needs. Jukeboxes have
high storage capacity, supporting up to terabytes and even petabytes of tertiary storage.

• Optical jukeboxes can have up to 2,000 disk slots. Because a jukebox keeps moving among
different disks according to data requirements, it incurs a time overhead that affects processing.

• Jukeboxes are cost effective and provide random access to data.

• The process of dynamically loading and unloading disks is called migration.

Magnetic tape jukeboxes:

A magnetic tape jukebox uses a number of tapes as storage and automatically loads and
unloads tapes on its tape drives. This is a popular tertiary storage medium that can handle data
up to terabytes.

Comment

Step 2 of 2

Optical media used by optical drives:

Optical media store data in digital form. Optical media can store all types of data: audio, video,
software, images, and text.

To read and write data on optical media, an optical drive is used. An optical drive reads and
writes data using a laser, an electromagnetic wave whose specific wavelength is matched to the
type of media being read.

The following optical media are served by optical drives:

• CD (compact disk): according to use and recording type there are three types of CDs:

Read-only: CD-ROM

Writable: CD-R

Re-writable: CD-RW

• DVD (digital versatile disk): higher-capacity disks.

• Blu-ray disk: most commonly used to store video.

Comment
Chapter 16, Problem 32RQ

Problem

What is automatic storage tiering? Why is it useful?

Step-by-step solution

Step 1 of 1

Automated storage tiering (AST):

AST is a storage mechanism that dynamically classifies and moves data among different types
of storage, such as SATA, SAS, and SSDs, based on storage requirements.

The automated tiering mechanism is managed by the storage administrator. According to the
tiering policy, less-used data is moved to SATA drives, which are slower and less expensive,
while frequently used data is moved to high-speed SAS or solid-state drives.

Automated tiering greatly improves the performance of the DBMS.

EMC implements FAST (fully automated storage tiering). It automatically monitors data activity
and moves active data to high-performance storage such as SSDs and inactive data to
inexpensive, slower storage such as SATA. AST is therefore useful because it yields high
performance at low cost.

Comment
Chapter 16, Problem 33RQ

Problem

What is object-based storage? How is it superior to conventional storage systems?

Step-by-step solution

Step 1 of 2

Object-based storage:

In an object-based storage system, data is organized in units called objects rather than in file
blocks. Data is not stored in a hierarchy; all data is stored in the form of objects, and a required
object can be located directly using its unique global identifier, without traversal overhead.

Every object in object-based storage has three parts:

• Data: the information that is to be stored in the object.

• Variable metadata: information about the main data, such as its location, usability,
confidentiality, and whatever else is required to manage the data.

• Unique global identifier: the identifier through which the object is addressed, so that the data
can be located easily.
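A toy Python sketch of the object triple (data, metadata, unique global identifier) follows; the
dictionary store and the function names are invented for illustration, whereas a real object store
distributes and replicates objects across nodes.

import uuid

store = {}   # stand-in for the distributed object store

def put_object(data, metadata):
    oid = str(uuid.uuid4())          # unique global identifier
    store[oid] = {'data': data, 'metadata': metadata}
    return oid

def get_object(oid):
    return store[oid]                # direct lookup, no hierarchy traversal

oid = put_object(b'hello', {'type': 'text', 'owner': 'alice'})
print(get_object(oid)['metadata']['type'])   # prints: text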

Comment

Step 2 of 2

An object storage system is better than a conventional storage system in the following ways:

• As organizations expand, their data grows day by day. If a file system is used as the storage
system and data is stored in blocks, it becomes very difficult to manage huge amounts of data:
in conventional file systems, data is stored in hierarchical fashion, spread over blocks, each with
its own unique address.

To avoid this management overhead, object storage keeps data in the form of objects with
additional metadata.

• Object-based storage aids the security of data. In object-based systems, objects can be
accessed directly by applications through the unique global identifier, while in a file storage
system data must be searched in linear or binary fashion, which generates processing overhead
and is time consuming.

• An object-based storage system supports features such as replication, encapsulation, and
distribution of objects, which make data secure, manageable, and easily accessible. A
conventional file-based storage system does not support replication and distribution in this way.

Comment
Chapter 16, Problem 34E

Problem

Consider a disk with the following characteristics (these are not parameters of any particular disk
unit): block size B = 512 bytes; interblock gap size G = 128 bytes; number of blocks per track =
20; number of tracks per surface = 400. A disk pack consists of 15 double-sided disks.

a. What is the total capacity of a track, and what is its useful capacity (excluding interblock
gaps)?

b. How many cylinders are there?

c. What are the total capacity and the useful capacity of a cylinder?

d. What are the total capacity and the useful capacity of a disk pack?

e. Suppose that the disk drive rotates the disk pack at a speed of 2,400 rpm (revolutions per
minute); what are the transfer rate (tr) in bytes/msec and the block transfer time (btt) in msec?
What is the average rotational delay (rd) in msec? What is the bulk transfer rate? (See Appendix
B.)

f. Suppose that the average seek time is 30 msec. How much time does it take (on the average)
in msec to locate and transfer a single block, given its block address?

g. Calculate the average time it would take to transfer 20 random blocks, and compare this with
the time it would take to transfer 20 consecutive blocks using double buffering to save seek time
and rotational delay.

Step-by-step solution

Step 1 of 8

Given data

Block size B = 512 bytes

Interblock gap size G = 128 bytes

Number of blocks per track = 20

Number of tracks per surface = 400

The disk pack consists of 15 double-sided disks, i.e., 30 recording surfaces.

Comment

Step 2 of 8

(a) Total track capacity = blocks per track × (block size + interblock gap size)

= 20 × (512 + 128) = 12,800 bytes

= 12.8 Kbytes

Useful capacity of a track = blocks per track × block size

= 20 × 512 = 10,240 bytes = 10.24 Kbytes

Comment

Step 3 of 8

(b) Number of cylinders = number of tracks per surface

= 400

Comment

Step 4 of 8

(c) Total cylinder capacity = 30 surfaces × 12,800 bytes = 384,000 bytes.

Useful cylinder capacity = 30 × 10,240 bytes = 307,200 bytes.
Comment

Step 5 of 8

(d) Total capacity of the disk pack = 400 cylinders × 384,000 bytes = 153,600,000 bytes

= 153.6 Mbytes

Useful capacity of the disk pack = 400 × 307,200 = 122,880,000 bytes = 122.88 Mbytes
Comment

Step 6 of 8

(e) At 2,400 rpm, one revolution takes 60,000/2,400 = 25 msec.

Transfer rate tr = track capacity / revolution time = 12,800/25 = 512 bytes/msec

Block transfer time btt = B/tr = 512/512 = 1 msec

Average rotational delay rd = 25/2 = 12.5 msec

Bulk transfer rate btr = tr × (B/(B + G)) = 512 × (512/640) = 409.6 bytes/msec
Comment

Step 7 of 8

(f) Average time to locate and transfer a block = s + rd + btt = 30 + 12.5 + 1 = 43.5 msec
Comment

Step 8 of 8

(g) Time to transfer 20 random blocks = 20 × (s + rd + btt) = 20 × 43.5 = 870 msec.

Time to transfer 20 consecutive blocks using double buffering = s + rd + 20 × btt

= 30 + 12.5 + 20 = 62.5 msec.
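The arithmetic above can be checked with a few lines of Python (a sketch; the variable names
follow the exercise):

B, G = 512, 128              # block and interblock gap size (bytes)
blocks_per_track = 20
tracks_per_surface = 400
surfaces = 15 * 2            # 15 double-sided disks
rpm, s = 2400, 30            # rotation speed and average seek time (ms)

track_total = blocks_per_track * (B + G)     # 12800 bytes
track_useful = blocks_per_track * B          # 10240 bytes
cyl_total = surfaces * track_total           # 384000 bytes
pack_total = tracks_per_surface * cyl_total  # 153,600,000 bytes
rev = 60000 / rpm                            # 25 ms per revolution
tr = track_total / rev                       # 512 bytes/ms
btt = B / tr                                 # 1 ms
rd = rev / 2                                 # 12.5 ms
btr = (B / (B + G)) * tr                     # 409.6 bytes/ms
locate = s + rd + btt                        # 43.5 ms
print(20 * locate, s + rd + 20 * btt)        # 870.0 ms vs. 62.5 ms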

Comment
Chapter 16, Problem 35E

Problem

A file has r = 20,000 STUDENT records of fixed length. Each record has the following fields:
Name (30 bytes), Ssn (9 bytes), Address (40 bytes), PHONE (10 bytes), Birth_date (8 bytes),
Sex (1 byte), Major_dept_code (4 bytes), Minor_dept_code (4 bytes), Class_code (4 bytes,
integer), and Degree_program (3 bytes). An additional byte is used as a deletion marker. The file
is stored on the disk whose parameters are given in Exercise.

a. Calculate the record size R in bytes.

b. Calculate the blocking factor bfr and the number of file blocks b, assuming an unspanned
organization.

c. Calculate the average time it takes to find a record by doing a linear search on the file if (i) the
file blocks are stored contiguously, and double buffering is used; (ii) the file blocks are not stored
contiguously.

d. Assume that the file is ordered by Ssn; by doing a binary search, calculate the time it takes to
search for a record given its Ssn value.

Exercise

What are SATA, SAS, and FC protocols?

Step-by-step solution

Step 1 of 6

Use the disk parameters of the previous exercise: B = 512 bytes, s = 30 msec, rd = 12.5 msec,
and btt = 1 msec, together with the given field sizes.

Comment

Step 2 of 6

a. Record size R = 30 + 9 + 40 + 10 + 8 + 1 + 4 + 4 + 4 + 3 + 1 (deletion marker) = 114 bytes.

Comment

Step 3 of 6

b. Blocking factor bfr = floor(B/R) = floor(512/114) = 4 records per block.

Number of file blocks b = ceiling(r/bfr) = ceiling(20,000/4) = 5,000 blocks.

Comments (1)

Step 4 of 6

c. (i) If the file blocks are stored contiguously and double buffering is used, a linear search reads
on average b/2 = 2,500 blocks:

time = s + rd + (b/2) × btt = 30 + 12.5 + 2,500 × 1 = 2,542.5 msec.

Comment

Step 5 of 6

(ii) If the file blocks are not stored contiguously, each block read costs a seek and a rotational
delay:

time = (b/2) × (s + rd + btt) = 2,500 × 43.5 = 108,750 msec.

Comment

Step 6 of 6

d. With the file ordered by Ssn, a binary search accesses about ceiling(log2 b) =
ceiling(log2 5,000) = 13 blocks:

time = 13 × (s + rd + btt) = 13 × 43.5 = 565.5 msec.

Comment
Chapter 16, Problem 36E

Problem

Suppose that only 80% of the STUDENT records from Exercise have a value for Phone, 85% for
Major_dept_code, 15% for Minor_dept_code, and 90% for Degree_program; and suppose that
we use a variable-length record file. Each record has a 1-byte field type for each field in the
record, plus the 1-byte deletion marker and a 1-byte end-of-record marker. Suppose that we use
a spanned record organization, where each block has a 5-byte pointer to the next block (this
space is not used for record storage).

a. Calculate the average record length R in bytes.

b. Calculate the number of blocks needed for the file.

Exercise

What are solid-state drives (SSDs) and what advantage do they offer over HDDs?

Step-by-step solution

Step 1 of 3

Assume that a variable-length record file is being used.

It is given that each record has a 1-byte field type for each field present, along with a 1-byte
deletion marker and a 1-byte end-of-record marker.

The fixed portion of the record consists of the fields that every record contains: Name, Ssn,
Address, Birth_date, Sex, and Class_code.

Therefore, the fixed portion occupies 30 + 9 + 40 + 8 + 1 + 4 = 92 bytes.

For the remaining variable-length (optional) fields, Phone, Major_dept_code, Minor_dept_code,
and Degree_program, the expected number of bytes per record is

0.80 × 10 + 0.85 × 4 + 0.15 × 4 + 0.90 × 3 = 8 + 3.4 + 0.6 + 2.7 = 14.7 bytes,

and the expected number of field-type bytes is 6 (for the fixed fields) + (0.80 + 0.85 + 0.15 +
0.90) = 6 + 2.7 = 8.7 bytes.

Comment

Step 2 of 3

a.

Therefore, the average record length R is

R = 92 + 14.7 + 8.7 + 1 (deletion marker) + 1 (end-of-record marker) = 117.4 bytes.

The average record length is 117.4 bytes.

Comment

Step 3 of 3

b.

Since a spanned record organization is being used and each block has an unused 5-byte
pointer, the usable bytes in each block are B − 5 = 512 − 5 = 507 bytes.

The number of blocks required for the file is

b = ceiling((r × R)/507) = ceiling((20,000 × 117.4)/507) = ceiling(2,348,000/507) = 4,632.

The number of blocks required for the file is 4,632.

Comment
Chapter 16, Problem 37E

Problem

Suppose that a disk unit has the following parameters; seek time s = 20 msec; rotational delay rd
= 10 msec; block transfer time btt= 1 msec; block size B = 2400 bytes; interblock gap size G =
600 bytes. An EMPLOYEE file has the following fields: Ssn, 9 bytes; Last_name, 20 bytes;
First_name, 20 bytes; Middle_init, 1 byte; Birth_date, 10 bytes; Address, 35 bytes; Phone, 12
bytes; Supervisor_ssn, 9 bytes; Department, 4 bytes; Job_code, 4 bytes; deletion marker, 1 byte.
The EMPLOYEE file has r = 30,000 records, fixed-length format, and unspanned blocking. Write
appropriate formulas and calculate the following values for the above EMPLOYEE file:

a. Calculate the record size R (including the deletion marker), the blocking factor bfr, and the
number of disk blocks b.

b. Calculate the wasted space in each disk block because of the unspanned organization.

c. Calculate the transfer rate tr and the bulk transfer rate btr for this disk unit (see Appendix B for
definitions of tr and btr).

d. Calculate the average number of block accesses needed to search for an arbitrary record in
the file, using linear search.

e. Calculate in msec the average time needed to search for an arbitrary record in the file, using
linear search, if the file blocks are stored on consecutive disk blocks and double buffering is
used.

f. Calculate in msec the average time needed to search for an arbitrary record in the file, using
linear search, if the file blocks are not stored on consecutive disk blocks.

g. Assume that the records are ordered via some key field. Calculate the average number of
block accesses and the average time needed to search for an arbitrary record in the file, using
binary search.

Step-by-step solution

Step 1 of 7

Consider the following parameter of a disk:

Seek time s = 20 msec

Rotational delay rd = 10 msec

Block transfer time btt = 1 msec

Block size B = 2400 bytes

Inter block gap size G = 600 bytes

Consider a file EMPLOYEE is having records such that r = 30,000.

Different fields common in each record are as follows:

Field name Size (in bytes)

Ssn 9

First_name 20

Last_name 20

Middle_init 1

Address 35

Phone 12

Birth_date 10

Supervisor_ssn 9

Department 4

Job_code 4
deletion marker 1

Comment

Step 2 of 7

The record size R can be calculated as

R = 9 + 20 + 20 + 1 + 10 + 35 + 12 + 9 + 4 + 4 + 1 = 125 bytes.

The record size is 125 bytes.

Since the file is unspanned, the blocking factor bfr can be calculated as

bfr = floor(B/R) = floor(2,400/125) = 19 records per block.

The blocking factor is 19.

In an unspanned organization of records, the number of file blocks is

b = ceiling(r/bfr) = ceiling(30,000/19) = 1,579 blocks.

The number of file blocks is 1,579.

Comment

Step 3 of 7

As the file has an unspanned organization, the wasted space in each block is

B − (bfr × R) = 2,400 − (19 × 125) = 2,400 − 2,375 = 25 bytes.

The wasted space in each disk block is 25 bytes.

Comments (1)

Step 4 of 7

The transfer rate tr can be calculated as

tr = B/btt = 2,400/1 = 2,400 bytes/msec.

The transfer rate for the disk is 2,400 bytes/msec.

The bulk transfer rate btr can be calculated as

btr = tr × (B/(B + G)) = 2,400 × (2,400/3,000) = 1,920 bytes/msec.

The bulk transfer rate for the disk is 1,920 bytes/msec.

Comment

Step 5 of 7

When searching for an arbitrary record in a file using linear search, the average number of block
accesses is found as follows:

• The record is searched for on a key field.

If a record satisfies the search condition, on average half of the blocks must be searched, that
is, b/2 = 1,579/2 = 789.5 blocks.

If no record satisfies the search condition, all blocks must be searched, that is, b = 1,579 blocks.

• The record is searched for on a non-key field.

In this case all blocks must be searched, that is, b = 1,579 blocks.

To calculate the average time to find a record by linear search on the file, the search is
performed on average over half of the file blocks.

Half of 1,579 file blocks is approximately 1,579/2 = 789.5 blocks.

If the blocks are stored on consecutive disk blocks and double buffering is used, the average
time taken to read 789.5 blocks is

s + rd + 789.5 × btt = 20 + 10 + 789.5 × 1 = 819.5 msec.

If the file blocks are stored consecutively and double buffering is used, then the average time
taken to find a record by linear search on the file is 819.5 msec.

Comment

Step 6 of 7

If the file blocks are not stored in consecutive disk blocks, the time taken to read 789.5 blocks is

789.5 × (s + rd + btt) = 789.5 × 31 = 24,474.5 msec.

If the file blocks are not stored consecutively, then the average time taken to find a record by
linear search on the file is 24,474.5 msec.

Comment

Step 7 of 7

When the records are ordered on some key field and a binary search is used, the number of
block accesses is found as follows:

• A binary search accesses approximately ceiling(log2 b) blocks, whether or not the record is
found, that is, ceiling(log2 1,579) = 11 block accesses.

Assuming the records are ordered on the key field, the average time taken to search for a
record by binary search is

11 × (s + rd + btt) = 11 × 31 = 341 msec.

The average time taken to search for a record via the key field is 341 msec.
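For reference, all of the numbers above can be reproduced with a short Python sketch (the
variable names follow the exercise):

import math

B, G = 2400, 600
s, rd, btt = 20, 10, 1
r = 30000
R = 9 + 20 + 20 + 1 + 10 + 35 + 12 + 9 + 4 + 4 + 1     # 125 bytes
bfr = B // R                                            # 19 records/block
b = math.ceil(r / bfr)                                  # 1579 blocks
wasted = B - bfr * R                                    # 25 bytes/block
tr = B / btt                                            # 2400 bytes/ms
btr = (B / (B + G)) * tr                                # 1920 bytes/ms
linear_buffered = s + rd + (b / 2) * btt                # 819.5 ms
linear_random = (b / 2) * (s + rd + btt)                # 24474.5 ms
binary = math.ceil(math.log2(b)) * (s + rd + btt)       # 11 * 31 = 341 ms
print(R, bfr, b, wasted, tr, btr)
print(linear_buffered, linear_random, binary)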

Comment
Chapter 16, Problem 39E

Problem

Load the records of Exercise into expandable hash files based on extendible hashing. Show the
structure of the directory at each step, and the global and local depths. Use the hash function
h(K) = K mod 128.

Exercise

What are optical and tape jukeboxes? What are the different types of optical media served by
optical drives?

Step-by-step solution

Step 1 of 10

Consider the following records:

2369, 3760, 4692, 4871, 5659, 1821, 1074, 7115, 1620, 2428, 3943, 4750, 6975, 4981 and
9208.

The hash function is h(K) = K mod 128.

Comment

Step 2 of 10

Calculate the hash value (bucket number) and the 7-bit binary value of each record as follows:

Key     K mod 128   Binary
2369    65          1000001
3760    48          0110000
4692    84          1010100
4871    7           0000111
5659    27          0011011
1821    29          0011101
1074    50          0110010
7115    75          1001011
1620    84          1010100
2428    124         1111100
3943    103         1100111
4750    14          0001110
6975    63          0111111
4981    117         1110101
9208    120         1111000
Comment

Step 3 of 10

Now, perform the extendible hashing starting with local depth 0 and global depth 0. Here, each
bucket can hold two records.

The third record, 4692, cannot be inserted because two records have already been inserted.
Increase the global depth to one to insert more elements; now the global depth is 1 and the local
depth is 1.

Check the binary value of each record. Map the record to 0 if its binary value starts with 0, and
to 1 if it starts with 1. For example, the binary value of the bucket number for 2369 is 1000001;
the first bit is 1, so it is mapped to 1. The binary value of the bucket number for 3760 is 0110000;
the first bit is 0, so it is mapped to 0.
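The bucket numbers and 7-bit strings used throughout this trace can be regenerated with a
couple of lines of Python:

keys = [2369, 3760, 4692, 4871, 5659, 1821, 1074, 7115,
        1620, 2428, 3943, 4750, 6975, 4981, 9208]
for k in keys:
    h = k % 128                       # h(K) = K mod 128
    print(k, h, format(h, '07b'))     # key, bucket number, 7-bit binary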
Comment

Step 4 of 10

The next record cannot be inserted because all the blocks are filled.

Comment

Step 5 of 10

Now, increase the global depth to 2. Thus, check for the first two bits of the binary value of the
bucket number.

Now, insert the next record.


Comment

Step 6 of 10

The record 1821 cannot be inserted. Thus, increase the global depth to 3.

Comment

Step 7 of 10

Now, insert other records. The record 1074 can be inserted easily because there is a space in
the bucket.
Now, insert 7115.

Comment

Step 8 of 10

The record 7115 cannot be inserted. Now, increase the local depth to 3 for the last bucket and
insert the elements.
The records left are 6975, 4981 and 9208. The record 6975 cannot be inserted. Increase the
global depth to 4 and insert the elements.

Comment

Step 9 of 10
The last record cannot be inserted. Insert 9208 by increasing the local depth to 4 in the
corresponding block. The final table is as follows:

Comment

Step 10 of 10
Comment
Chapter 16, Problem 40E

Problem

Load the records of Exercise into an expandable hash file, using linear hashing. Start with a
single disk block, using the hash function h0 = K mod 2^0, and show how the file grows and how
the hash functions change as the records are inserted. Assume that blocks are split whenever an
overflow occurs, and show the value of n at each stage.

Exercise

What are optical and tape jukeboxes? What are the different types of optical media served by
optical drives?

Step-by-step solution

Step 1 of 1

When we apply the hash function h0 = K mod 2^0, every key hashes to 0, so we get a single
bucket.

When this bucket overflows, we split it into two buckets with the new function h1 = K mod 2^1:

Bucket 1 (K mod 2 = 1): 2369, 4871, 5659, 1821, 7115, 3943, 6975, 4981

Bucket 0 (K mod 2 = 0): 3760, 4692, 1074, 1620, 2428, 4750, 9208

Overflowing buckets are then split into four buckets with h2 = K mod 2^2:

K mod 4 = 1: 2369, 1821, 4981

K mod 4 = 2: 1074, 4750

K mod 4 = 3: 4871, 5659, 7115, 3943, 6975

K mod 4 = 0: 3760, 4692, 1620, 2428, 9208

Since some buckets hold more than two records, they are split using h3 = K mod 2^3:

K mod 8 = 1: 2369

K mod 8 = 5: 1821, 4981

K mod 8 = 7: 4871, 3943, 6975

K mod 8 = 3: 5659, 7115

K mod 8 = 0: 3760, 9208

K mod 8 = 4: 4692, 1620, 2428

K mod 8 = 2: 1074

K mod 8 = 6: 4750

Some buckets are still over capacity, so we apply h4 = K mod 2^4 to them:

K mod 16 = 1: 2369

K mod 16 = 5: 4981

K mod 16 = 7: 4871, 3943

K mod 16 = 8: 9208

K mod 16 = 15: 6975

K mod 16 = 4: 4692, 1620

K mod 16 = 11: 5659, 7115

K mod 16 = 12: 2428

K mod 16 = 13: 1821

K mod 16 = 0: 3760

K mod 16 = 14: 4750

K mod 16 = 2: 1074

Now all buckets are within the two-record capacity.
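The bucket contents at each stage can be verified by grouping the keys on K mod 2^i in Python:

from collections import defaultdict

keys = [2369, 3760, 4692, 4871, 5659, 1821, 1074, 7115,
        1620, 2428, 3943, 4750, 6975, 4981, 9208]
for i in range(5):                        # h_i(K) = K mod 2^i
    buckets = defaultdict(list)
    for k in keys:
        buckets[k % (2 ** i)].append(k)
    print('mod 2^%d:' % i, dict(buckets))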

Comment
Chapter 16, Problem 41E

Problem

Compare the file commands listed in Section 16.5 to those available on a file access method you
are familiar with.

Step-by-step solution

Step 1 of 1

File commands listed in Section 16.5 compared with those of the unordered (heap) file access
method:

Records are placed in the file in the order in which they are inserted; new records are inserted
at the end of the file. This record organization is called a heap (or pile) file. The main operations
on files of unordered records are:

Inserting a new record

Deleting a record

External sorting

Inserting a record:

Insertion of a new record is very efficient. When a new record is inserted, the last block of the file
is copied into a buffer, the new record is added, and the block is rewritten back to disk.

Deleting a record:

The program must first find the record's block, copy the block into a buffer, delete the record
from the buffer, and finally rewrite the block back to disk. For record deletion, the technique of
deletion markers can be used.

External sorting:

When we want to read all records in order of the values of some field, we create a sorted copy
of the file. For a large disk file this is expensive, so external sorting is used.

Comment
Chapter 16, Problem 42E

Problem

Suppose that we have an unordered file of fixed-length records that uses an unspanned record
organization. Outline algorithms for insertion, deletion, and modification of a file record. State any
assumptions you make.

Step-by-step solution

Step 1 of 1

Compare the heap file (unordered file) with file access methods.

Heap file:

- The simplest and most basic type of organization.

- Records are placed in the file in the order in which they are inserted.

- Inserting a new record is very efficient: new records are inserted at the end of the file.

- Searching requires a linear search through the file, block by block, which is an expensive
procedure.

File access methods:

- A file organization refers to the organization of the data of a file into records, blocks, and
access structures.

- Records and blocks are placed on the storage medium and interlinked. Example: a sorted file.

Access methods:

- An access method provides a group of operations that can be applied to a file.

Example: open, find, delete, modify, insert, close, etc.

- Several access methods can be applied to a single file organization.

- Some access methods can be applied only to files organized in certain ways, namely:

records organized serially (sequential organization);

organization based on relative record number (relative organization);

index-based organization (indexed organization).

Access method refers to the way in which records are accessed. A file with an indexed or
relative organization may still have its records accessed sequentially, but records in a file with a
sequential organization cannot be accessed directly.

Comment
Chapter 16, Problem 43E

Problem

Suppose that we have an ordered file of fixed-length records and an unordered overflow file to
handle insertion. Both files use unspanned records. Outline algorithms for insertion, deletion, and
modification of a file record and for reorganizing the file. State any assumptions you make.

Step-by-step solution

Step 1 of 2

For an ordered file of fixed-length records:

Algorithms: Consider a file named abc that is ordered on the field Key, a numeric field, in
increasing order.

For insertion: Let the value of the Key field of the record to be inserted be n.

1. Open file abc and take file pointer in variable fp

2. Find record where fp.key>n

3. Insert current record at this position.

4. Save the file data

5. Close file

For deletion: Let the record to be deleted have Key field value n.

1. Open file abc and take file pointer in variable fp

2. Find record where fp.key = n

3. Delete the record.

4. Save result

5. Close file

For modification: Let the record to be modified have Key field value n, and let its Name value be
changed to xyz.

1. Open file abc and take file pointer in variable fp

2. Find record where fp.key=n

3. Set fp.name = ‘xyz’

4. Save result

5. Close file.

For an unordered file:

Comment

Step 2 of 2

For insertion: Let the value of the Key field of the record to be inserted be n.

1. Open file abc and take file pointer in variable fp

2. Seek end of file

3. Insert current record at this position.

4. Save the file data

5. Close file

For deletion: Let the record to be deleted have Key field value n.

1. Open file abc and take file pointer in variable fp

2. Find record where fp.key = n

3. Delete the record.

4. Save result

5. Close file

For modification: Let the record to be modified have Key field value n, and let its Name value be
changed to xyz.

1. Open file abc and take file pointer in variable fp

2. Find record where fp.key = n

3. Set fp.name = ‘xyz’

4. Save result

5. Close file.
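The problem's actual combination, an ordered main file plus an unordered overflow file, can be
sketched in a few lines of Python; the in-memory lists stand in for the two files, and all names
are illustrative.

import bisect

main_file = [(1, 'a'), (3, 'b'), (7, 'c')]    # ordered on the key
overflow = []                                 # unordered overflow area

def insert(record):
    overflow.append(record)                   # cheap: append to the overflow file

def find(key):
    i = bisect.bisect_left(main_file, (key,))          # binary search main file
    if i < len(main_file) and main_file[i][0] == key:
        return main_file[i]
    for rec in overflow:                               # then scan the overflow
        if rec[0] == key:
            return rec
    return None

def reorganize():
    # Periodically merge the overflow records back into sorted order.
    global main_file, overflow
    main_file = sorted(main_file + overflow)
    overflow = []

Deletion and modification would locate the record with find and then either mark it with a
deletion marker or rewrite it in place, exactly as in the outlines above.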

Comment
Chapter 16, Problem 44E

Problem

Can you think of techniques other than an unordered overflow file that can be used to make
insertions in an ordered file more efficient?

Step-by-step solution

Step 1 of 1


Yes. One possibility is to use an overflow file in which the records are chained together, in a
manner similar to the overflow handling for static hash files. The overflow records destined for
each block of the ordered file are linked together in the overflow file, and a pointer to the first
record in the linked list is kept in the corresponding block of the main file.

The list may or may not be kept ordered.

Comment
Chapter 16, Problem 45E

Problem

Suppose that we have a hash file of fixed-length records, and suppose that overflow is handled
by chaining. Outline algorithms for insertion, deletion, and modification of a file record. State any
assumptions you make.

Step-by-step solution

Step 1 of 2

Overflow is handled by chaining: within a bucket, multiple blocks are chained together, with a
number of overflow buckets attached.

In this hash structure, insertion is done as follows:

Step 1:

Each bucket j stores a value i(j); all the entries that point to the same bucket have the same
values in their first i(j) hash bits.

Step 2:

To locate the bucket containing search key K:

Compute h(K).

Use the first i high-order bits of h(K) as a displacement into the bucket address table and
follow the pointer to the appropriate bucket.

Step 3: To insert a record with search key value K:

Follow the lookup procedure above to locate the bucket, say j.

If there is room in bucket j, insert the record.

Otherwise, chain a new overflow block onto bucket j (or split the bucket) and reattempt the
insertion.

Comment

Step 2 of 2

Deletion in a hash file:-

To delete a key value:

Step 1.

Locate it in its bucket and remove it.

Step 2.

The bucket itself can be removed if it becomes empty.

Step 3.

Coalescing of buckets is possible: a bucket can only coalesce with a "buddy" bucket having the
same value of i(j) and the same i(j) − 1 bit prefix, if one such bucket exists.

Assumptions (a minimal sketch of these operations appears below):-

Each key in the file is unique.

The data file is open.

The overflow file is open.

A bucket record structure has been defined.
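Under these assumptions, the three operations can be sketched in Python with bucket chains
modeled as lists (M and all function names are illustrative):

M = 8                                   # number of primary buckets
buckets = [[] for _ in range(M)]        # each bucket is a chain of records

def insert(key, value):
    buckets[key % M].append((key, value))    # append to the bucket's chain

def find(key):
    for k, v in buckets[key % M]:
        if k == key:
            return v
    return None

def delete(key):
    chain = buckets[key % M]
    for i, (k, _) in enumerate(chain):
        if k == key:
            del chain[i]                     # unchain the record
            return True
    return False

def modify(key, value):
    chain = buckets[key % M]
    for i, (k, _) in enumerate(chain):
        if k == key:
            chain[i] = (key, value)          # rewrite the record in place
            return True
    return False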

Comment
Chapter 16, Problem 46E

Problem

Can you think of techniques other than chaining to handle bucket overflow in external hashing?

Step-by-step solution

Step 1 of 1

To handle bucket overflow in external hashing, there are techniques other than chaining, such
as open addressing, multiple hashing, and trie-based (dynamic) hashing.

• Open addressing: when a bucket is full, the record is placed in the next available position
following the full bucket.

• Multiple hashing: when the primary hash function leads to a full bucket, a second hash function
is applied; if that bucket is also full, another method such as chaining is used.

• Dynamic (trie-based) hashing:

- allows the number of allocated buckets to grow and shrink as needed;

- distributes records among buckets based on the values of the leading bits in their hash values.

When a bucket at some disk address overflows, it is split in two based on the first binary digit of
the hash address: records whose hash values begin with 0 go to one bucket and those
beginning with 1 to the other. If a resulting bucket overflows in turn, it is split further on the
second bit of the hash address, and so on. (The accompanying figures showed the directory of
bucket disk addresses before and after these successive splits.)
Comment
Chapter 16, Problem 47E

Problem

Write pseudocode for the insertion algorithms for linear hashing and for extendible hashing.

Step-by-step solution

Step 1 of 2

Pseudocode for the insertion algorithm (hashing with open-addressing probing):

We assume that the elements in the hash table T are keys with no satellite information, so the
key k is identical to the element containing it. Every slot contains either a key or NIL, and m is
the table size.

HASH-INSERT(T, k)

  i = 0

  repeat

    j = h(k, i)

    if T[j] == NIL

      T[j] = k

      return j

    else

      i = i + 1

  until i == m

  error "hash table overflow"

Comment

Step 2 of 2

Pseudocode for the insertion algorithm for a hash file whose buckets are chained lists:

Insertion

Algorithm: initialize(numBuckets)

Input: desired number of buckets

1. Initialize an array table of numBuckets empty linked lists;

Algorithm: insert(key, value)

Input: a key-value pair

// compute table entry:

entry = key.hashCode() mod numBuckets

if table[entry] is null

  // no list present, so create one

  table[entry] = new linked list

  table[entry].add(key, value)

else

  // otherwise, add to the existing list

  table[entry].add(key, value)

end if.

Comment
Chapter 16, Problem 48E

Problem

Write program code to access individual fields of records under each of the following
circumstances. For each case, state the assumptions you make concerning pointers, separator
characters, and so on. Determine the type of information needed in the file header in order for
your code to be general in each case.

a. Fixed-length records with unspanned blocking

b. Fixed-length records with spanned blocking

c. Variable-length records with variable-length fields and spanned blocking

d. Variable-length records with repeating groups and spanned blocking

e. Variable-length records with optional fields and spanned blocking

f. Variable-length records that allow all three cases in parts c, d, and e

Step-by-step solution

Step 1 of 6

a.

Consider the following program code for fixed length records with unspanned blocking.

// initialize the starting address of the file area

starting_location = 200;

// record_to_access

int x, y;

// x is the fifth record in the file

x = 5;

// y is the second field of the fifth record

y = 2;

//record_size

R=25;

// the address of the requested field: skip x records of R bytes each

// from the start of the file, then y bytes into the record

field_address = starting_location + (R * x) + y;

• In the above code, assume that the starting memory address of the file is 200.

• In computer memory, records are stored in blocks.

• When the record size is less than the block size, each block stores more than one record.

• The block size is denoted by B bytes and the record size by R.
Comment

Step 2 of 6

b.

Consider the following program code for fixed length records with spanned blocking.

// initialize the starting address of the file area

starting_location = 200;

// record_to_access

int x, y;

// x is the fifth record in the file

x = 5;

//y is the second field of the fifth record

y = 2;

//record_size

R=25;

//initialize the value of i


int i=0;

//B is the block size

int B;

//a is field size

int a = 1;

// compute the field position as in part (a)

current_location = starting_location + (R * x) + y;

// with spanned blocking, a record may continue in the next block;

// whenever the separator character '$' is read at a block boundary,

// advance current_location past the boundary (B is the block size)

while (read_byte(current_location) == '$')

    current_location = current_location + B;

// finally, update the offset i of the requested field within the record

i = i + 2 * (a + 1);

• In the above code, $ is used as the separator character.

• When the loop encounters the separator symbol, the value of current_location is advanced
past the block boundary.

• The value of the variable i is then updated to the offset of the requested field.

Comment

Step 3 of 6

c.

Consider the following code for variable length records with variable length fields and spanned
blocking.

// initialize the starting address of the file area

starting_location = 200;

// record_to_access

int x, y;

// x is the fifth record in the file

x = 5;

//y is the second field of the fifth record

y = 2;

//record_size

R=25;

//a is field size

int a = 1;

// ReadFirstByte reads the first byte of the current record and

// returns true if it indicates an empty record

empty = ReadFirstByte(a);

//if statement is used to check the condition

if (! empty)

// update the value of crnt_Rcrd_Length

crnt_Rcrd_Length += a.length ();

//if statement is used to check the value of crnt_Rcrd_Length

if (crnt_Rcrd_Length!= R)

empty = false;

// if statement is used to check whether crnt_Rcrd_Length exceeds R

if (crnt_Rcrd_Length > R)

// not efficient, nor thread safe - deep copy occurs here

records.push_back(*this);

• In the above code, assume that each record ends with an end-of-record byte.

• The code moves byte by byte to access the records.

• The if statements check the conditions on the current record length.


Comment

Step 4 of 6

d.

Consider the following code for variable-length records with repeating groups and spanned
blocking.

if (! empty)

// update the value of crnt_Rcrd_Length

crnt_Rcrd_Length += a.length ();

• The highlighted code is removed from part (c) to handle variable-length records with repeating
groups and spanned blocking.

• Since spanned blocking involves records spanning more than one block, the record-length
update is not required.

Comment

Step 5 of 6

e.

Consider the following code for variable-length records with optional fields and spanned blocking.

if (crnt_Rcrd_Length!= R)

empty = false;

• The highlighted code is removed from part (c) to handle variable-length records with optional
fields and spanned blocking.

• As some of the fields in the file records are optional, the record-length check on the records
present in the file can be skipped.

Comment

Step 6 of 6

f.

Consider the following code for variable-length records that allow all three cases in parts c, d,
and e.

if (crnt_Rcrd_Length > R)

// not efficient, nor thread safe - deep copy occurs here

records.push_back(*this);

• The highlighted code is removed from part (c) to allow all three cases of parts c, d, and e.

• One or more fields of the records in the file are of varying size, so their size need not be
compared with R; hence this part of the code can be skipped.

Comment
Chapter 16, Problem 49E

Problem

Suppose that a file initially contains r = 120,000 records of R = 200 bytes each in an unsorted
(heap) file. The block size B = 2,400 bytes, the average seek time s = 16 ms, the average
rotational latency rd = 8.3 ms, and the block transfer time btt = 0.8 ms. Assume that 1 record is
deleted for every 2 records added until the total number of active records is 240,000.

a. How many block transfers are needed to reorganize the file?

b. How long does it take to find a record right before reorganization?

c. How long does it take to find a record right after reorganization?

Step-by-step solution

Step 1 of 4

Let X = the number of records deleted and 2X = the number of records added.

Total active records = 240,000

= 120,000 − X + 2X,

so X = 120,000.

The number of records physically present before reorganization (including the deleted ones) is
120,000 + 2X = 360,000.

Comment

Step 2 of 4

(a)

Number of block transfers for reorganization = blocks read + blocks written.

- 200 bytes/record and 2,400 bytes/block give us 12 records per block.

- Reading involves 360,000 records: 360,000/12 = 30K blocks.

- Writing involves 240,000 records: 240,000/12 = 20K blocks.

Total blocks transferred during reorganization = 30K + 20K

= 50K blocks.

Comment

Step 3 of 4

(b)

On average, we assume that half the file must be read. Right before reorganization the file has
30,000 blocks, so

Time = (b/2) × btt = 15,000 × 0.8 ms

= 12,000 ms

= 12 sec.

Comment

Step 4 of 4

(c)

Right after reorganization the file has 20,000 blocks, so the time to locate a record is

(b/2) × btt = 10,000 × 0.8 ms = 8,000 ms

= 8 sec.

Comment
Chapter 16, Problem 50E

Problem

Suppose we have a sequential (ordered) file of 100,000 records where each record is 240 bytes.
Assume that B = 2,400 bytes, s = 16 ms, rd = 8.3 ms, and btt = 0.8 ms. Suppose we want to
make X independent random record reads from the file. We could make X random block reads or
we could perform one exhaustive read of the entire file looking for those X records. The question
is to decide when it would be more efficient to perform one exhaustive read of the entire file than
to perform X individual random reads. That is, what is the value for X when an exhaustive read of
the file is more efficient than random X reads? Develop this as a function of X.

Step-by-step solution

Step 1 of 3

The records in the file are ordered sequentially.

Total number of records in the file (Tr) = 100000.

Size of each record (rs) = 240 bytes.

Size of each block (B) = 2400 bytes.

Average seek time (s) = 16 ms.

Average rotational latency (rd) = 8.3 ms.

Block transfer time (btt) = 0.8 ms.

Calculate the total number of blocks (TB) in the file using the formula TB = Tr/(B/rs):

TB = 100,000/(2,400/240) = 100,000/10.

Hence, the total number of blocks in the file (TB) = 10,000 blocks.

Comment

Step 2 of 3

Calculate the time required for an exhaustive read (er), with double buffering, using the formula
er = s + rd + TB × btt:

er = 16 + 8.3 + 10,000 × 0.8 = 8,024.3 ms.

Hence, the time required for the exhaustive read (er) = 8,024.3 ms.

Comment

Step 3 of 3

Let X be the number of records that need to be read.

The condition under which one exhaustive read of the entire file is more efficient than
performing X individual random reads is:

time required to perform X individual random reads > time required for the exhaustive read

X × (s + rd + btt) > er

X × 25.1 ms > 8,024.3 ms, i.e., X > 319.7.

Therefore, when 320 or more individual random reads are required, it is better to read the file
exhaustively.

The function in X that relates the individual random reads and the exhaustive read is given by
the following equation:

cost_random(X) = 25.1X ms, compared with cost_exhaustive = 8,024.3 ms.
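That comparison can be written directly as a pair of Python functions (a sketch using the
exercise's parameters):

s, rd, btt = 16, 8.3, 0.8                # ms
blocks = 100000 // (2400 // 240)         # 10,000 blocks

def cost_random(x):
    return x * (s + rd + btt)            # x independent random block reads

def cost_exhaustive():
    return s + rd + blocks * btt         # one sequential scan: 8024.3 ms

# smallest X for which scanning the whole file wins:
x = 1
while cost_random(x) <= cost_exhaustive():
    x += 1
print(x)                                 # 320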

Comment
Chapter 16, Problem 51E

Problem

Suppose that a static hash file initially has 600 buckets in the primary area and that records are
inserted that create an overflow area of 600 buckets. If we reorganize the hash file, we can
assume that most of the overflow is eliminated. If the cost of reorganizing the file is the cost of
the bucket transfers (reading and writing all of the buckets) and the only periodic file operation is
the fetch operation, then how many times would we have to perform a fetch (successfully) to
make the reorganization cost effective? That is, the reorganization cost and subsequent search
cost are less than the search cost before reorganization. Support your answer. Assume s = 16
msec, rd = 8.3 msec, and btt = 1 msec.

Step-by-step solution

Step 1 of 1

Primary area = 600 buckets

Overflow area = 600 buckets

Total reorganization cost = buckets read + buckets written = (600 + 600) + (600 + 600)

= 2,400 bucket transfers

= 2,400 × (1 ms)

= 2,400 ms

Let X = number of random fetches from the file.

Average search time per fetch = time to access (1 + 1/2) buckets, since 50% of the time we
must also access the overflow bucket.

Access time for one bucket access = (s + rd + btt)

= 16 + 8.3 + 1

= 25.3 ms

Time with reorganization for the X fetches

= 2,400 + X × (25.3) ms

Time without reorganization for X fetches = X × (25.3) × (1 + 1/2) ms

= 1.5 × X × (25.3) ms.

So, 2,400 + X × (25.3) < (25.3) × (1.5X)

2,400/12.65 < X

So, 189.7 < X.

If we perform at least 190 (successful) fetches, then the reorganization is worthwhile.

Comment
Chapter 16, Problem 52E

Problem

Suppose we want to create a linear hash file with a file load factor of 0.7 and a blocking factor of
20 records per bucket, which is to contain 112,000 records initially.

a. How many buckets should we allocate in the primary area?

b. What should be the number of bits used for bucket addresses?

Step-by-step solution

Step 1 of 2


(a)

Number of buckets in the primary area = 112,000/(20 × 0.7)

= 8,000.

Comment

Step 2 of 2

(b)

Let K be the number of bits used for bucket addresses. Then 2^K ≤ 8,000 ≤ 2^(K+1).

2^12 = 4,096

2^13 = 8,192

K = 12

Boundary value = 8,000 − 2^12

= 8,000 − 4,096

= 3,904

Comment
Chapter 17, Problem 1RQ

Problem

Define the following terms: indexing field, primary key field, clustering field, secondary key field,
block anchor, dense index, and nondense (sparse) index.

Step-by-step solution

Step 1 of 1

Define the following terms:-

Indexing field:-

A record structure consists of several fields, and record fields are used to construct an index. An
index access structure is usually defined on a single field of a file; any field in a file can be used
to create an index, and multiple indexes on different fields can be constructed for the same file.

Primary key field:-

A primary key field is the ordering key field of the file: a field that uniquely identifies a record.

Clustering field:-

A clustering field is a non-key ordering field of a file: the records are physically ordered on it, but
its values are not distinct for every record. A clustering index is defined on such a field.

Secondary key field:-

A secondary index is also an ordered file with two fields (like a primary index); the first field is of
the same data type as some non-ordering field of the data file, which is the indexing field. If the
secondary access structure uses a key field, one that has a distinct value for every record, that
field is called a secondary key field.

Block anchor:-

For a primary index, the total number of entries in the index is the same as the number of disk
blocks in the ordered data file.

The first record in each block of the data file is called the block anchor.

Dense index:

A dense index has an index entry for every search key value (and hence every record) in the
data file. Each index entry contains the search key value and a pointer to the record on disk.

Non-dense:-

An index has entries for only some of the search values.

Comment
Chapter 17, Problem 2RQ

Problem

What are the differences among primary, secondary, and clustering indexes? How do these
differences affect the ways in which these indexes are implemented? Which of the indexes are
dense, and which are not?

Step-by-step solution

Step 1 of 1

Differences among primary, secondary, and clustering indexes (reconstructing the original
comparison table):

Index type   Defined on                               Number per file   Dense or nondense
Primary      the physical ordering key field          at most one       nondense (one entry
             of an ordered file                                         per block anchor)
Clustering   a non-key ordering (clustering)          at most one       nondense
             field of an ordered file
Secondary    any non-ordering field, key or           many allowed      dense on a key field; a
             non-key                                                    non-key field needs an
                                                                        extra indirection level

Because primary and clustering indexes rely on the physical ordering of the file, they can be
implemented nondensely using block anchors, but only one of them can exist per file. A
secondary index does not control the physical order, so it must include an entry for every record
(or a level of indirection for repeated values), and several secondary indexes can be
implemented on the same file. Primary and clustering indexes are nondense; a secondary index
on a key field is dense.

Comment
Chapter 17, Problem 3RQ

Problem

Why can we have at most one primary or clustering index on a file, but several secondary
indexes?

Step-by-step solution

Step 1 of 2

A primary index is defined on an ordered file of fixed-size records whose physical ordering field
is a key. A clustering index is defined on an ordered file whose ordering field is not a key; each
of its entries has a clustering-field value and a block pointer.

Adding or removing records in the file cannot be done easily, because the data records are
physically ordered on the indexing field.

To lessen this problem, a whole block (or cluster of blocks) can be reserved for each value of
the clustering field.

Comment

Step 2 of 2

A secondary index is defined on a non-ordering field. It can be defined on a key field with unique
values or on a non-key field with repeated values.

The reason why a file can have at most one primary or clustering index but several secondary
indexes is the following:

• A file can be physically ordered on only one field, and both the primary index and the clustering
index rely on that physical ordering, so at most one of them can exist for a file. A secondary
index does not depend on the physical order, so many secondary indexes can be created: each
can use a key field with distinct values, or a non-key field with repeated values, in which case
the index entries point to blocks of record pointers that in turn point to the repeated values.

Comment
Chapter 17, Problem 4RQ

Problem

How does multilevel indexing improve the efficiency of searching an index file?

Step-by-step solution

Step 1 of 4

Solution:

Multilevel indexing improves the efficiency of searching an index file.

• The main idea is to reduce the part of the index that must be searched at each step by a factor
of fo, the fan-out, which is the blocking factor of the index.

• The search space is therefore reduced much faster than by the factor of 2 of a binary search.

A multilevel index treats the first-level index as an ordered file with a distinct value for each key
K(i).

• Using the single-level scheme, a primary index is created on the first level, then a second
level, a third level, and so on.

• In this way, the multilevel index is built out of single-level index blocks.

Comment

Step 2 of 4

For improving the efficiency of searching the index file, multilevel index in is follows the
following steps:

Step 1:

• The multilevel index considers the first (base) level to be an index file with a distinct value for
each key K(i).

Step 2:

• On the first level, a primary index is created; this becomes the second level of the multilevel
index.

• Because it uses block anchors, the second level has one entry for each block of the first level.

• Hence, the blocking factor of the second level, bfr = fo, is the same fan-out as the first level's.

• If the first level has r1 entries and the blocking factor is fo, then the first level needs
ceiling(r1/fo) = r2 blocks, which is therefore the number of entries needed at the second level.

Step 3:

• At the next level, a primary index to the second level has one entry for each second-level
block.

• So, the number of entries at the third level is r3 = ceiling(r2/fo).

• The process is repeated until all the entries of some index level t fit in a single block; this is the
top index level.

• Each level reduces the number of entries at the previous level by a factor of fo.

Comment

Step 3 of 4

Using the formula, the number of levels t is

t = ceiling(log_fo(r1)).

Hence, the multilevel index has approximately ceiling(log_fo(r1)) levels,

where fo is the fan-out and r1 is the number of first-level entries.

From the above steps, we can see how multilevel indexing improves the efficiency of searching
an index file; a small numeric check follows.
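A quick numeric check of t = ceiling(log_fo(r1)) in Python; the values of fo and r1 are made up
for the illustration:

import math

fo = 34          # fan-out: index entries per block (illustrative)
r1 = 100000      # entries at the first index level (illustrative)
t = math.ceil(math.log(r1, fo))
print(t)         # 4: entries shrink 100000 -> 2942 -> 87 -> 3 blocks,
                 # and the 3 top entries fit in one block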

Comment

Step 4 of 4

The ways in which multilevel indexing improves the efficiency of searching an index file are:

• While searching for a record with a given indexing field value, it reduces the number of blocks
accessed.

• The benefits of multilevel indexing include the reduction of insertion and deletion problems in
indexes.

• Leaving some space in blocks for the insertion of new entries is what leads developers to
adopt dynamic multilevel indexing.

• It is often implemented using B-trees and B+-trees.

Comment
Chapter 17, Problem 5RQ

Problem

What is the order p of a B-tree? Describe the structure of B-tree nodes.

Step-by-step solution

Step 1 of 2

Order p of a B-tree:-

In a B-tree of order p, each node contains at most p − 1 search values and at most p pointers, in
the order

<P1, <K1, Pr1>, P2, <K2, Pr2>, ..., <K(q−1), Pr(q−1)>, Pq>, where q <= p.

Here each Pi is a pointer to a child node, each Pri is a data pointer, and each Ki is a search
value from some ordered set of values.
Comment

Step 2 of 2

Structure of the B-tree

The structure of a B-tree follows the steps below.

Step 1:

Each node in the B-tree is of the form

<P1, <K1, Pr1>, P2, <K2, Pr2>, ..., <K(q−1), Pr(q−1)>, Pq>, with q <= p,

where each Pi is a tree pointer and each Pri is a data pointer to the record whose search key
field value is equal to Ki.

Step 2:

Within each node, K1 < K2 < ... < K(q−1).

Step 3:

For all search key field values X in the subtree pointed at by Pi: K(i−1) < X < K(i) for 1 < i < q;
X < K1 for i = 1; and K(q−1) < X for i = q.

Step 4: Each node has at most p tree pointers.

Step 5: Each node, except the root and leaf nodes, has at least ceiling(p/2) tree pointers; the
root has at least two tree pointers unless it is the only node in the tree.

Step 6: A node with q tree pointers, q <= p, has q − 1 search key field values (and hence q − 1
data pointers).

Step 7: All leaf nodes are at the same level. Leaf nodes have the same structure as internal
nodes except that all of their tree pointers Pi are null.

This node layout takes the place of the structure figure; a rough data-structure sketch follows.
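As a rough sketch (not the textbook's figure), a B-tree node of order p can be represented in
Python as follows; the class name and fields are chosen for illustration only.

class BTreeNode:
    # Node of a B-tree of order p: q - 1 keys, q - 1 data pointers,
    # and q tree pointers, with ceil(p/2) <= q <= p except at the root.
    def __init__(self, p):
        self.p = p
        self.keys = []         # K1 < K2 < ... < K(q-1)
        self.data_ptrs = []    # Pr_i: pointer to the record with key K_i
        self.children = []     # P1 ... Pq; empty (all null) in a leaf node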

Comment
Chapter 17, Problem 6RQ

Problem

What is the order p of a B+-tree? Describe the structure of both internal and leaf nodes of a B+-
tree.

Step-by-step solution

Step 1 of 4

Order p of a B+-tree:-

A B+-tree is a variation of the B-tree data structure used to implement a dynamic multilevel
index; its order p is the maximum number of tree pointers that fit in an internal node.

Structure of the internal nodes of a B+-tree:-

Comment

Step 2 of 4

Comment

Step 3 of 4

The internal nodes satisfy the following:

Step 1:

Each internal node is of the form

<P1, K1, P2, K2, ..., K(q−1), Pq>, where q <= p and each Pi is a tree pointer.

Step 2:

Within each internal node, K1 < K2 < ... < K(q−1).

Step 3:

For all search field values X in the subtree pointed at by Pi, we have K(i−1) < X <= K(i) for
1 < i < q; X <= K1 for i = 1; and K(q−1) < X for i = q.

Step 4: Each internal node has at most p tree pointers.

Step 5:

Each internal node, except the root, has at least ceiling(p/2) tree pointers.

The root node has at least two tree pointers, if it is an internal node.

Step 6:

An internal node with q pointers, q <= p, has q − 1 search field values.

Structure of the leaf nodes of a B+-tree:-

Comment

Step 4 of 4

From the above figure:

Step 1:

Each leaf node is of the form <<K1, Pr1>, <K2, Pr2>, ..., <Kq-1, Prq-1>, Pnext>,

where q <= pleaf, each Pri is a data pointer, and Pnext points to the next leaf node of the B+-tree.

Step 2:

Within each leaf node, K1 < K2 < ... < Kq-1.

Step 3:

Each Pri is a data pointer that points to the record whose search field value is Ki,

or to a file block containing the record (or to a block of record pointers, if the search field is not a key).

Step 4:

Each leaf node has at least ceil(pleaf/2) values.

Step 5:

All leaf nodes are at the same level.

Comment
Chapter 17, Problem 7RQ

Problem

How does a B-tree differ from a B+-tree? Why is a B+-tree usually preferred as an access
structure to a data file?

Step-by-step solution

Step 1 of 1

The main difference between a B-tree and a B+-tree is:

a B-tree has data pointers in both internal and leaf nodes, whereas

a B+-tree has only tree pointers in internal nodes, and all data pointers are in the leaf
nodes.

A B+-tree is preferred as an access structure to a data file because its internal nodes hold more
entries per block, leading to fewer levels and improving the search time.

In addition, the entire tree can be traversed in order using the Pnext pointers that link the leaf nodes.

Comment
Chapter 17, Problem 8RQ

Problem

Explain what alternative choices exist for accessing a file based on multiple search keys.

Step-by-step solution

Step 1 of 3

Choices for accessing file based on multiple fields are:

1. Ordered Index on Multiple Attributes: In this case an index is created on a search key field that is a
combination of attributes. If an index is created on attributes <A1, A2, ..., An>, the search key values are tuples
with n values: <v1, v2, ..., vn>.

A lexicographic ordering of these tuple values establishes an order on this composite search key.
Lexicographic ordering works similarly to the ordering of character strings. An index on a composite
key of n attributes works similarly to primary or secondary indexing.

Comment

Step 2 of 3

2. Partitioned Hashing: Partitioned hashing is an extension of static external hashing that
allows access on multiple keys. It is only suitable for equality comparisons; range queries are not
supported. In partitioned hashing, for a key consisting of n components, the hash function is
designed to produce a result with n separate hash addresses. The bucket address is the
concatenation of these n addresses. It is then possible to search for the required composite
search key by looking up the appropriate buckets that match the parts of the address in which
we are interested.

For example, consider a composite search key (Dno, Age) in which Dno is hashed to 3 bits and Age to 5 bits, so we
get 8 bits of bucket address. Suppose that the given Dno value has hash address '100' and Age = 59 has hash address
'10101'; to search for the combination, use bucket address 10010101.

An advantage of partitioned hashing is that it can be easily extended to any number of
attributes. The bucket addresses can be designed so that high-order bits in the address correspond
to more frequently accessed attributes. Additionally, no separate access structure needs to be maintained
for the individual attributes. The main drawback of partitioned hashing is that it cannot handle range
queries on any of the component attributes.
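A minimal Python sketch of the bucket-address construction (Python's built-in hash() stands in for the per-component hash functions; the bit widths follow the example above):

def partitioned_hash(component_values, bits):
    # One small hash per key component, concatenated into a single bucket address.
    address = ""
    for value, width in zip(component_values, bits):
        h = hash(value) % (1 << width)            # keep only `width` bits
        address += format(h, "0{}b".format(width))  # fixed-width binary
    return address

# Equality search on the full composite key (Dno, Age) needs one bucket:
print(partitioned_hash((5, 59), bits=(3, 5)))  # -> one 8-bit address (3 + 5 bits)
# A search on Dno alone must examine all 2 ** 5 buckets sharing the Dno prefix.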

Comment

Step 3 of 3

3. Grid Files: We can construct a grid array with one linear scale for each of the search attributes. This
method is particularly useful for range queries, which map into a set of cells corresponding to
a group of values along the linear scales. Conceptually, the grid file concept may be applied to
any number of search keys. For n search keys, the grid array has n dimensions, one per
search attribute, and provides access by combinations of values along those dimensions.
Grid files perform well in terms of reducing the time for multiple-key access. However, they
impose a space overhead in terms of the grid array structure. Moreover, with dynamic files, a
frequent reorganization of the file adds to the maintenance cost.
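A minimal Python sketch of the linear-scale lookup (the scale bounds and the dictionary below mirror the EMPLOYEE grid example in a later answer; all names are illustrative):

import bisect

AGE_BOUNDS = [21, 26, 31, 41, 51]   # age < 21 -> cell 0, 21-25 -> 1, ..., >= 51 -> 5

def age_cell(age):
    # bisect counts how many cutpoints lie at or below `age`: its cell index.
    return bisect.bisect_right(AGE_BOUNDS, age)

DNO_CELL = {1: 0, 2: 0, 3: 1, 4: 1, 5: 2, 6: 3, 7: 3, 8: 4, 9: 5, 10: 5}

def grid_cell(dno, age):
    # Map a (Dno, Age) pair to its cell in the 6 x 6 grid array; the cell holds
    # a pointer to the bucket storing the matching records.
    return (DNO_CELL[dno], age_cell(age))

print(grid_cell(5, 40))   # -> (2, 3)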

Comment
Chapter 17, Problem 9RQ

Problem

What is partitioned hashing? How does it work? What are its limitations?

Step-by-step solution

Step 1 of 1

Partitioned hashing:

It is an extension of static external hashing that allows access on multiple keys: the hash
value is split into segments, one depending on each attribute of the search key.

Consider an example: a CUSTOMER file whose search key is (customer-street, customer-city), with 3 bits of the bucket address drawn from each component:

Search-key value -> hash value

(Main, Harrison) -> 101 111

(Main, Brooklyn) -> 101 001

(Park, Palo Alto) -> 010 010

(Spring, Brooklyn) -> 001 001

(Alma, Palo Alto) -> 110 010

Working of partitioned hashing:

In partitioned hashing, for a key consisting of n components, the hash function is designed to
produce a result with n separate hash addresses.

The bucket address is the concatenation of these n addresses.

It is then possible to search for the required composite search key by looking up the
appropriate buckets that match the parts of the address in which we are interested.

Limitations of partitioned hashing:

Although it can be easily extended to any number of attributes,

it has no separate access structure for the individual attributes, and

it cannot handle range queries on any of the component attributes.

Comment
Chapter 17, Problem 11RQ

Problem

Show an example of constructing a grid array on two attributes on some file.

Step-by-step solution

Step 1 of 2

Take a grid array for the EMPLOYEE file with one linear scale for the Dno attribute and another for the Age
attribute.

Linear scale for Dno

Scale value -> Dno values

0 -> 1, 2

1 -> 3, 4

2 -> 5

3 -> 6, 7

4 -> 8

5 -> 9, 10

Comment

Step 2 of 2

Linear scale for Age

0 1 2 3 4 5

<20 21-25 26-30 31-40 41-50 >50

Through this data we want to show that, on the linear scale for Dno, the values Dno = 1 and Dno = 2 are combined into the single
value 0 on the scale, while Dno = 5 corresponds to the value 2 on the scale, and Age is divided
into its scale of 0 to 5 by grouping ages so as to distribute the employees uniformly by age.

The grid array therefore has 6 x 6 = 36 cells, and each cell points to a bucket address where the
records corresponding to that cell are stored.

Now a request for, say, Dno = 5 and Age = 40 maps into cell (2, 3) of the grid array,
and the matching records will be found in the corresponding bucket. For n search keys, the grid array would have n
dimensions.

[Figure: grid array on the Dno and Age attributes of the EMPLOYEE file - not reproduced.]

Comment
Chapter 17, Problem 12RQ

Problem

What is a fully inverted file? What is an indexed sequential file?

Step-by-step solution

Step 1 of 2

Fully inverted file:

If the indexes are all secondary and new records are inserted at the end of the file, then the data
file itself is an unordered file. A file that has a secondary index on every one of its fields is
called a fully inverted file. Usually, the indexes are implemented as B+-trees and are updated
dynamically to reflect insertion or deletion of records.

Comment

Step 2 of 2

Indexed sequential file:

An indexed sequential file is a sequential file that has an index.

A sequential file is one stored in order of a key field.

Indexed sequential files are important for applications where data needs to be accessed both
sequentially and randomly using the index.

An indexed sequential file allows fast access to a specific record.

Consider an example:

an organization may store the details about its employees as an indexed sequential file,
and the file is accessed in two ways.

Sequentially:

For example, when the whole of the file is processed to produce pay slips at the end of the
month.

Randomly:

For example, when an employee changes address, or an employee who gets married changes her surname,
and that single record must be updated. Because of this, an indexed sequential file can only be stored on a random access device,

for example a magnetic disk or CD.

Comment
Chapter 17, Problem 13RQ

Problem

How can hashing be used to construct an index?

Step-by-step solution

Step 1 of 1

Hashing is used for searching where fast retrieval of records is necessary. The
file organized for this purpose is known as a hash file. The search condition is evaluated using the hash
key, that is, the field value being looked up.

Functions of hashing:

• A hash function 'f', or randomizing function, is applied to the hash field value of a record and
yields the address of the disk block in which the record is stored.

• It is also used as an internal search function within a program whenever a group of records is
accessed using the value of only one field.

• Access structures similar to indexes can be based on hashing: a hash index
is a secondary structure for accessing the file, built by applying a hash function to a search key.

• The index entries contain the key (K) and a pointer (P) that points either to the record containing
the key or to the block containing the record for that key.

• The index file holding these entries can be organized as a dynamically expandable
hash file, using dynamic, linear, or extendible hashing techniques; searching for an entry is
performed by applying the hash search algorithm to K.

• Once an entry is identified, the pointer (P) is used to locate the corresponding record in the data
file.
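A minimal Python sketch of a hash index over <K, P> entries (a static table with a fixed bucket count; names such as HashIndex are illustrative, and a real system would use extendible or linear hashing so the index can grow):

class HashIndex:
    def __init__(self, n_buckets=8):
        # Each bucket holds a list of (key, record_pointer) index entries.
        self.buckets = [[] for _ in range(n_buckets)]

    def _bucket(self, key):
        return self.buckets[hash(key) % len(self.buckets)]

    def insert(self, key, record_ptr):
        self._bucket(key).append((key, record_ptr))

    def search(self, key):
        # Hash to a single bucket, then scan it for matching entries.
        return [ptr for k, ptr in self._bucket(key) if k == key]

idx = HashIndex()
idx.insert("123-45-6789", 42)     # 42 stands in for a block address or offset
print(idx.search("123-45-6789"))  # -> [42]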

Comment
Chapter 17, Problem 14RQ

Problem

What is bitmap indexing? Create a relation with two columns and sixteen tuples and show an
example of a bitmap index on one or both.

Step-by-step solution

Step 1 of 1

The bitmap index is a data structure that allows efficient querying on one or more keys.

• It is used for relations that contain a large number of rows, so that the rows having a specific
key value can be identified quickly.

• It creates an index for one or more columns, and each value or value range selected in those columns
is indexed.

• A bitmap index is created for columns that contain only a small number of unique
values.

Construction

• To create a bitmap index for a set of records in a relation or table, the records are
numbered from 0 to n - 1 with a record id that can be mapped to a physical address made up of a
block number and a record offset within the block.

• The bitmap index is created for one particular value of a particular field (or column) as an array of bits.

• For example, suppose a bitmap index is constructed for a column F and a value V of that column. For a
relation with n rows, the index contains n bits. The jth bit is set to 1 if row j has the
value V for column F; otherwise it is set to 0.

Example

S.No Customer Name Gender

1 Alan M

2 Clara F

3 John M

4 Benjamin M

5 Marcus M

6 Alice F

7 Joule F

8 Louis M

9 Samuel M

10 Lara F

11 Andy F

12 Martin M

13 Catherine F

14 Fuji F

15 Zarain F

16 Ford M

Bitmap index for Gender

For M: 1011100110010001; the bits of the rows whose Gender value is M are set to 1,
and the others are set to 0.

For F: 0100011001101110; the bits of the rows whose Gender value is F are set to 1,
and the others are set to 0.
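The two bitmaps can be reproduced with a short Python sketch over the Gender column of the table above:

genders = ["M", "F", "M", "M", "M", "F", "F", "M",
           "M", "F", "F", "M", "F", "F", "F", "M"]   # rows 1..16 of the relation

def bitmap(column, value):
    # One bit per row: 1 where the row holds `value`, 0 elsewhere.
    return "".join("1" if v == value else "0" for v in column)

print(bitmap(genders, "M"))   # -> 1011100110010001
print(bitmap(genders, "F"))   # -> 0100011001101110

# Boolean selections become bitwise operations on the bitmaps:
m = int(bitmap(genders, "M"), 2)
f = int(bitmap(genders, "F"), 2)
assert m & f == 0                # no row is both M and F
assert m | f == (1 << 16) - 1    # every row is one or the other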

Comment
Chapter 17, Problem 15RQ

Problem

What is the concept of function-based indexing? What additional purpose does it serve?

Step-by-step solution

Step 1 of 1

Function-based indexing is a type of indexing that was developed for and is used in the
Oracle DBMS, as well as in some other commercial products.

The index is created by applying a function to the value of a field (or of a collection of fields); the result
of the function becomes the key of the index.

It ensures that the Oracle database system will use this index for the search, instead of performing a
full table scan, even when a function is applied to the search column in the query.

Example,

The following statement creates an index using the function LOWER (CustomerName):

CREATE INDEX lower_ix ON Customer (LOWER (CustomerName));

It returns the customer name in lowercase letters; LOWER ("MARTIN") results in "martin". The
query given below uses the index:

SELECT CustomerName

FROM Customer

WHERE Lower (CustomerName) = "martin".

If function-based indexing is not used, the Oracle database system performs a scan of the
entire table, since a B+-tree index searches directly on the stored column values only; any
function applied to a column in a query therefore prevents such an index from being used.

Comment
Chapter 17, Problem 16RQ

Problem

What is the difference between a logical index and a physical index?

Step-by-step solution

Step 1 of 1

Physical index

• The index entries consist of the key (K) and a physical pointer (P) that specifies the physical
address of the record stored on disk, as a block number and a record offset within the block. This is referred to as a
physical index.

• For example, suppose the primary file organization is based on extendible or linear hashing; then, each
time a bucket is split, some of the records are allocated to a newer bucket and hence
acquire new physical addresses.

• If a secondary index is used on the file, the pointers that point to a moved record must be
found and updated (a pointer must be changed whenever its record moves to another location), which
is a difficult task.

Logical index

• The index entries of a logical index are pairs of keys <Ks, K>.

• Each entry contains one value Ks of the secondary indexing field, matched with the value K of the
field used for the primary organization of the file.

• While searching the secondary index on a value of Ks, a program can identify the corresponding
value of K and use this matching key to access the record through the primary organization of the
file; thus the logical index introduces an extra level of indirection between the access structure and the
data.

Comment
Chapter 17, Problem 17RQ

Problem

What is column-based storage of a relational database?

Step-by-step solution

Step 1 of 1

The traditional way of storing a relation is by row, one record after another. Column-based
storage instead stores each column of data in the relational database individually; this provides
performance advantages, especially for the read-only queries typical of read-mostly
databases.

Advantages

• The table is partitioned vertically, column by column, so that a two-column table <row-id,
attribute-value> is constructed for each attribute of the table; thus only the columns that are needed
have to be accessed.

• Column-wise indexes and join indexes on multiple tables are used to answer
queries without accessing the data tables.

• Materialized views are used to support queries on multiple columns.

Column-wise storage of data also offers an extra option in index creation: because the same column
can appear in a number of projections, indexes can be created on each projection. For storing
the values in the same column, various strategies are used, such as data compression, null-value suppression,
and various encoding techniques.

Comment
Chapter 17, Problem 18E

Problem

Consider a disk with block size B = 512 bytes. A block pointer is P = 6 bytes long, and a record
pointer is PR = 7 bytes long. A file has r = 30,000 EMPLOYEE records of fixed length. Each
record has the following fields: Name (30 bytes), Ssn (9 bytes), Department_code (9 bytes),
Address (40 bytes), Phone (10 bytes), Birth_date (8 bytes), Sex (1 byte), Job_code (4 bytes),
and Salary (4 bytes, real number). An additional byte is used as a deletion marker.

a. Calculate the record size R in bytes.

b. Calculate the blocking factor bfr and the number of file blocks b, assuming an unspanned
organization.

c. Suppose that the file is ordered by the key field Ssn and we want to construct a primary index
on Ssn. Calculate (i) the index blocking factor bfri (which is also the index fan-out fo); (ii) the
number of first-level index entries and the number of first-level index blocks; (iii) the number of
levels needed if we make it into a multilevel index; (iv) the total number of blocks required by the
multilevel index; and (v) the number of block accesses needed to search for and retrieve a record
from the file—given its Ssn value—using the primary index.

d. Suppose that the file is not ordered by the key field Ssn and we want to construct a secondary
index on Ssn. Repeat the previous exercise (part c) for the secondary index and compare with
the primary index.

e. Suppose that the file is not ordered by the nonkey field Department_code and we want to
construct a secondary index on Department_code, using option 3 of Section 17.1.3, with an extra
level of indirection that stores record pointers. Assume there are 1,000 distinct values of
Department_code and that the EMPLOYEE records are evenly distributed among these values.
Calculate (i) the index blocking factor bfri (which is also the index fan-out fo); (ii) the number of
blocks needed by the level of indirection that stores record pointers; (iii) the number of first-level
index entries and the number of first-level index blocks; (iv) the number of levels needed if we
make it into a multilevel index; (v) the total number of blocks required by the multilevel index and
the blocks used in the extra level of indirection; and (vi) the approximate number of block
accesses needed to search for and retrieve all records in the file that have a specific
Department_code value, using the index.

f. Suppose that the file is ordered by the nonkey field Department_code and we want to construct
a clustering index on Department_code that uses block anchors (every new value of
Department_code starts at the beginning of a new block). Assume there are 1,000 distinct values
of Department_code and that the EMPLOYEE records are evenly distributed among these
values. Calculate (i) the index blocking factor bfri (which is also the index fan-out fo); (ii) the
number of first-level index entries and the number of first-level index blocks; (iii) the number of
levels needed if we make it into a multilevel index; (iv) the total number of blocks required by the
multilevel index; and (v) the number of block accesses needed to search for and retrieve all
records in the file that have a specific Department_code value, using the clustering index
(assume that multiple blocks in a cluster are contiguous).

g. Suppose that the file is not ordered by the key field Ssn and we want to construct a B+-tree
access structure (index) on Ssn. Calculate (i) the orders p and pleaf of the B+-tree; (ii) the
number of leaf-level blocks needed if blocks are approximately 69% full (rounded up for
convenience); (iii) the number of levels needed if internal nodes are also 69% full (rounded up for
convenience); (iv) the total number of blocks required by the B+-tree; and (v) the number of block
accesses needed to search for and retrieve a record from the file?given its Ssn value?using the
B+-tree.

h. Repeat part g, but for a B-tree rather than for a B+-tree. Compare your results for the B-tree
and for the B+-tree.

Step-by-step solution

Step 1 of 31

Disk operations on file using primary, secondary, clustering, B+ tree and B-tree methods

(a) Calculation of Record Size

Record size is calculated as follows:

Record size = Name (30 bytes) + Ssn (9 bytes) + Department_code (9 bytes)

+ Address (40 bytes) + Phone (10 bytes) + Birth_date (8 bytes)

+ Sex (1 byte) + Job_code (4 bytes) + Salary (4 bytes)

+ 1 (one byte for the deletion marker)

Record size R = 30 + 9 + 9 + 40 + 10 + 8 + 1 + 4 + 4 + 1 = 116 bytes

Comment
Step 2 of 31

(b) Calculation of Blocking factor and number of file blocks

Blocking factor bfr = floor(B / R) = floor(512 / 116) = 4 records per block

Number of file blocks b = ceil(r / bfr) = ceil(30,000 / 4) = 7,500 blocks
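Parts (a) and (b) can be verified with a few lines of Python:

import math

B = 512                                        # block size in bytes
R = 30 + 9 + 9 + 40 + 10 + 8 + 1 + 4 + 4 + 1   # record size incl. deletion marker
bfr = B // R                                   # floor(512 / 116) = 4 records/block
b = math.ceil(30000 / bfr)                     # unspanned organization: 7,500 blocks
print(R, bfr, b)                               # -> 116 4 7500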

Comment

Step 3 of 31

(c) Operations on file ordered by key field Ssn

(i) Calculation of the index blocking factor bfri (which is also the index fan-out fo)

Index record length Ri = Ssn (9 bytes) + block pointer (6 bytes) = 15 bytes

Index blocking factor bfri = fo = floor(B / Ri) = floor(512 / 15) = 34 entries per block

Comment

Step 4 of 31

(ii) Calculation of the number of first-level index entries and first-level index blocks

Number of first-level index entries r1

= number of file blocks b (one entry per block anchor)

Number of first-level index entries r1 = 7,500 entries

Number of first-level index blocks b1 = ceil(7,500 / 34) = 221 blocks

Comment

Step 5 of 31

(iii) Calculation of the number of levels for the multilevel index

Number of second-level index entries r2 =

number of first-level blocks b1 = 221 entries

Number of second-level index blocks b2 = ceil(221 / 34) = 7 blocks

Number of third-level index entries r3 = number of second-level index blocks b2

= 7 entries

Number of third-level index blocks b3 = ceil(7 / 34) = 1 block

It is the top index level because the third level has only one block.
Hence, the index has x = 3 levels.

(iv) Calculation of the number of blocks for the multilevel index

Total number of blocks for the index = b1 + b2 + b3

From (ii), the number of first-level index blocks b1 = 221 blocks

From (iii), the number of second-level index blocks b2 = 7 blocks

and the number of third-level index blocks b3 = 1 block

Therefore, the total number of blocks = 221 + 7 + 1 = 229 blocks
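The numbers in (ii)-(iv) follow mechanically from the fan-out; a small Python sketch:

import math

fo = 512 // (9 + 6)           # fan-out: 34 index entries per block
entries, per_level = 7500, []
while True:                   # one iteration per index level
    blocks = math.ceil(entries / fo)
    per_level.append(blocks)
    if blocks == 1:
        break
    entries = blocks
print(per_level, len(per_level), sum(per_level))
# -> [221, 7, 1] 3 229  (blocks per level, x levels, total index blocks)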

Comment

Step 6 of 31

(v) Calculation of the number of block accesses needed to search for and retrieve a record using the primary
index on the file

For a primary index, the number of block accesses equals

one block access at each index level plus one block access for the data file.

Therefore, the number of block accesses = x + 1.

Since the file is ordered by the single key field Ssn, this is a primary index.

Number of block accesses to search for a record = x + 1 = 3 + 1 = 4
Comment

Step 7 of 31

(d) Repetition of part (c) for the secondary index

(i) Index record length and index blocking factor bfri

Part (c) assumed that index entries contain block pointers; here it is assumed instead that the
first-level entries contain record pointers (PR = 7 bytes). The first-level index record length is then Ri = 9 + 7 = 16 bytes.

So, the leaf-level (first-level) index blocking factor is bfri = floor(512 / 16) = 32

index records per block. For the internal levels, block pointers are always used, so the fan-out for
internal nodes is fo = floor(512 / 15) = 34.
Comment

Step 8 of 31

(ii) Number of first-level index entries

= number of file records r = 30,000 entries

(one entry per record, since this is a secondary index on a key field).

Number of first-level index blocks

b1 = ceil(30,000 / 32) = 938 blocks


Comment

Step 9 of 31

(iii) Calculate the number of levels

Number of second-level index entries r2

= number of first-level index blocks b1 = 938 entries

Number of second-level index blocks

b2 = ceil(938 / 34) = 28 blocks

Number of third-level index blocks b3 = ceil(28 / 34) = 1 block

So, the third level has one block and it is the top level.

So, the index has a total of x = 3 levels.

(iv) Total number of blocks for the index = 938 + 28 + 1 = 967 blocks

(v) Number of block accesses to search for a record = x + 1 = 4

Compared with the primary index of part (c), the secondary index needs far more first-level blocks (938 versus 221) and total blocks (967 versus 229), since it holds one entry per record rather than one per block; the number of block accesses per lookup is the same.
Comment

Step 10 of 31

(e) Operations on the file which is constructed using secondary index on


Department_code

(i) Calculation of the index blocking factor

Index record size Ri = Department_code (9 bytes) + block pointer (6 bytes) = 15 bytes

Index blocking factor bfri = fo = floor(512 / 15) = 34
Comment

Step 11 of 31

(ii) Calculation of the number of blocks for the level of indirection

Here there are 1,000 distinct values of Department_code.

The number of records for each value is 30,000 / 1,000 = 30.

We know that the record pointer size is PR = 7 bytes.

The number of bytes needed at the level of indirection for each value of

Department_code is 30 * 7 = 210 bytes.

This fits in one block (210 <= 512).

So, 1,000 blocks are needed for the level of indirection.

Comment

Step 12 of 31
(iii) Calculation of the number of first-level index entries and first-level index blocks

Number of first-level index entries r1

= number of distinct values of Department_code = 1,000 entries

Number of first-level index blocks b1 = ceil(1,000 / 34) = 30 blocks

Comment

Step 13 of 31

(iv) Calculation of the number of levels for the multilevel index

We can calculate the number of levels from the number of second-level index entries:

r2 = number of first-level index blocks b1

= 30 entries

Number of second-level index blocks

b2 = ceil(30 / 34) = 1 block

The second level is the top level, so the index has x = 2 levels.

Comments (1)

Step 14 of 31

(v) Calculation of the number of blocks for the multilevel index

Total number of blocks = b1 + b2 + blocks at the level of indirection = 30 + 1 + 1,000 = 1,031 blocks

Comment

Step 15 of 31

(vi) Calculation of the number of block accesses needed to search for and retrieve all records in the file that have
a given Department_code value

Number of block accesses to search for and retrieve the block containing the record pointers at
the level of indirection = x + 1 = 3.

If the 30 records are distributed over 30 distinct blocks, we need an additional 30 block accesses.

So, the total block accesses needed on average to retrieve all the records with a given value of
Department_code = x + 1 + 30 = 33.

Comment

Step 16 of 31

(f) Operations on the file which is constructed using clustering index on Department_code

(i) Calculation of the index blocking factor

Index blocking factor bfri = fo = floor(B / Ri) = floor(512 / 15) = 34,

where Ri = Department_code (9 bytes) + block pointer (6 bytes) = 15 bytes.
Comment

Step 17 of 31

(ii) Calculation of the number of first-level index entries and first-level index blocks

Number of first-level index entries r1

= number of distinct Department_code values

= 1,000 entries.

Number of first-level index blocks b1 = ceil(1,000 / 34) = 30 blocks

Comment

Step 18 of 31

(iii) Calculation of the number of levels for the multilevel index

Calculate the number of levels from the number of second-level index entries:

r2 = number of first-level index blocks b1 = 30 entries

Number of second-level index blocks

b2 = ceil(30 / 34) = 1 block

The second level has one block and it is the top index level.

So the index has x = 2 levels.

Comments (1)

Step 19 of 31

(iv) Calculation of the number of blocks for the multilevel index

Total number of blocks for the index = b1 + b2 = 30 + 1 = 31 blocks

Comment

Step 20 of 31

(v) Calculation of the number of block accesses needed to search for and retrieve all records in the file that have a
given Department_code value

Number of block accesses to search for and retrieve the first block in the cluster of blocks = x + 1 = 3.

The 30 records with a given value are clustered in ceil(30 / bfr) = ceil(30 / 4) = 8 contiguous blocks.

So, the total block accesses needed on average to retrieve all the records with a given

Department_code value = (x + 1) + 7 = 10 (the first block of the cluster plus its 7 remaining blocks).

Comment
Step 21 of 31

(g) Operations on the B+-tree

(i) Calculation of the orders p and pleaf of the B+-tree

Each internal node has at most p tree pointers and p - 1 search key values, so p must satisfy

p * P + (p - 1) * V <= B, that is, 6p + 9(p - 1) <= 512, or 15p <= 521.

So, p = 34.

For leaf nodes, the record pointers are included in the leaf nodes, so pleaf must satisfy

pleaf * (V + PR) + P <= B, that is, 16 * pleaf + 6 <= 512, giving pleaf = 31.

Comments (2)

Step 22 of 31

(ii) Calculation of the number of leaf-level blocks if the blocks are 69 percent full

Nodes are 69% full on the average, so the average number of key values in a leaf node is 0.69 * pleaf = 0.69 * 31 = 21.39.

If we round this up for convenience, we get 22 key values and 22 record pointers per leaf node.

The file has r = 30,000 records and hence 30,000 values of Ssn, so the number of leaf-level nodes
needed is ceil(30,000 / 22) = 1,364.

Comment

Step 23 of 31

(iii) Calculation of the number of levels if internal nodes are also 69 percent full

Calculate the number of levels: the average fan-out for the internal nodes is fo = ceil(0.69 * 34) = 24 (rounded up for convenience).

Number of second-level tree blocks b2 = ceil(1,364 / 24) = 57

Number of third-level tree blocks b3 = ceil(57 / 24) = 3

Number of fourth-level tree blocks b4 = ceil(3 / 24) = 1

So, the fourth level has one block (the root), and the tree has x = 4 levels, counting the leaf level.
Comment

Step 24 of 31

(iv) Calculation of the total number of blocks

Total number of blocks for the tree = 1,364 + 57 + 3 + 1 = 1,425 blocks

Comment

Step 25 of 31

(v) Calculation of the number of block accesses needed to search for and retrieve a record, given its Ssn value, using the B+-tree

Number of block accesses to search for a record = x + 1 = 4 + 1 = 5 (one access per tree level, plus one for the data block).

Comment

Step 26 of 31

(h) Repetition of part (g) for a B-tree

(i) Order p of the B-tree

Each internal node has at most p tree pointers, p - 1 search key values, and p - 1 data pointers.

Choose the largest p value that satisfies the inequality

p * P + (p - 1) * (V + PR) <= B, that is, 6p + 16(p - 1) <= 512, or 22p <= 528, giving p = 24.

In practice each node also carries a few bytes of bookkeeping information (such as the number of entries q and a leaf/non-leaf flag), so p = 23 is used below. In a B-tree, the leaf nodes have the same layout as internal nodes, with all tree pointers null, so no separate pleaf is needed.
Comment

Step 27 of 31

(ii) Each node of the B-tree is 69% full, so the average number of tree pointers per node is 0.69 * 23 = 15.87.

If we round this up for convenience, we get 16 tree pointers, and hence about 15 key values and 15 data pointers, per node.

Since a B-tree holds entries at every level, roughly ceil(30,000 / 15) = 2,000 nodes
are needed in total to hold the 30,000 entries.

Comment

Step 28 of 31

(iii) Calculate the number of levels using the average fan-out of 16 for the internal nodes:

the root holds 15 entries;

the second level has 16 nodes holding 240 entries;

the third level has 256 nodes holding 3,840 entries.

The first three levels together hold only 15 + 240 + 3,840 = 4,095 entries, so a fourth level is needed for the remaining 25,905 entries, requiring ceil(25,905 / 15) = 1,727 nodes.

So, the tree has x = 4 levels.

Comment

Step 29 of 31

(iv) Total number of blocks for the tree = 1 + 16 + 256 + 1,727 = 2,000 blocks (approximately)

Comments (1)

Step 30 of 31

(v) Number of block accesses to search for a record = at most x + 1 = 5; on average fewer, since a B-tree search may find the record's data pointer at an internal level. Both trees have 4 levels here, but the B-tree needs about 2,000 blocks versus 1,425 for the B+-tree.

Comment

Step 31 of 31

Comparison of the B+-tree and the B-tree

Calculation of the approximate number of entries in the B+-tree (all nodes full)

At every internal level, each full node has 34 pointers and 33 (= p - 1) search field values.

Root: 1 node, 33 entries, 34 pointers

Level 1: 34 nodes, 1,122 entries, 1,156 pointers

Level 2: 1,156 nodes, 38,148 entries, 39,304 pointers

Leaf level: 39,304 nodes, each holding pleaf = 31 <key, record pointer> entries, i.e. 1,218,424 record pointers

Calculation of the approximate number of entries in the B-tree (all nodes full)

At every level, each full node has 23 pointers and 22 (= p - 1) search field values, each with its data pointer.

Root: 1 node, 22 entries, 23 pointers

Level 1: 23 nodes, 506 entries, 529 pointers

Level 2: 529 nodes, 11,638 entries, 12,167 pointers

Leaf level: 12,167 nodes, 267,674 entries

For the given block size, pointer size, and search key field size, a B+-tree with three levels above
the leaves holds up to 1,218,424 record pointers in its leaf level, while a B-tree of the same
height holds at most 22 + 506 + 11,638 + 267,674 = 279,840 entries in total. Therefore, a B+-tree stores
far more entries than a B-tree with the same number of levels.
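These capacities follow from two short formulas, sketched here in Python (the function names are illustrative):

def btree_capacity(p, levels):
    # Every node of a full B-tree carries p - 1 <key, data pointer> entries.
    nodes = sum(p ** i for i in range(levels))    # 1 + p + p^2 + ... nodes
    return nodes * (p - 1)

def bplus_capacity(p, pleaf, levels):
    # Only the leaves of a B+-tree carry data pointers; with `levels` levels
    # there are p ** (levels - 1) leaves, each holding pleaf entries.
    return (p ** (levels - 1)) * pleaf

print(bplus_capacity(34, 31, 4))   # -> 1218424 record pointers
print(btree_capacity(23, 4))       # -> 279840 entries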

Comment
Chapter 17, Problem 19E

Problem

A PARTS file with Part# as the key field includes records with the following Part# values: 23, 65,
37, 60, 46, 92, 48, 71, 56, 59, 18, 21, 10, 74, 78, 15, 16, 20, 24, 28, 39,43,47, 50,69, 75, 8,49,
33, 38. Suppose that the search field values are inserted in the given order in a B+-tree of order
p = 4 and pleaf = 3; show how the tree will expand and what the final tree will look like.

Step-by-step solution

Step 1 of 34

B+-Tree Insertion:

Here, a given set of keys is to be inserted into a B+-tree of order p = 4 and pleaf = 3.

• The order p = 4 implies that each node in the tree can have at most 4 pointers.

• The leaf order pleaf = 3 means the leaf nodes hold at most 3 keys and at least 2 keys.

• Insertion starts from the root; when the root or any node overflows its capacity, it must split.

• When a leaf node overflows, the first ceil((pleaf + 1) / 2) = 2 keys are kept in that node and the remaining keys
form a new node to its right (see the sketch after this list).

• The key at the rightmost position of the left partition propagates up to the parent node.

• If the propagation is from a leaf node, a copy of the key is kept in the leaf; otherwise, the key is
simply moved up to the parent node.

• All the keys in the key list appear in the leaf nodes.
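A minimal Python sketch of this leaf-split rule (the function name split_leaf is mine; pleaf = 3 as in this exercise):

import math

def split_leaf(keys, pleaf=3):
    # An overflowing leaf holds pleaf + 1 keys. Keep the first
    # ceil((pleaf + 1) / 2) keys in the left leaf, move the rest to a new right
    # leaf, and *copy* the rightmost key of the left leaf up to the parent
    # (leaf keys are never lost in a B+-tree).
    cut = math.ceil((pleaf + 1) / 2)
    left, right = keys[:cut], keys[cut:]
    return left, right, left[-1]

# Inserting 60 into the full root leaf [23, 37, 65]:
print(split_leaf(sorted([23, 37, 65] + [60])))   # -> ([23, 37], [60, 65], 37)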

Comment

Step 2 of 34

The problem gives a set of keys to be inserted into the B+-tree in order.

The given list is:

23, 65, 37, 60, 46, 92, 48, 71, 56, 59, 18, 21, 10, 74, 78, 15, 16, 20, 24, 28, 39, 43, 47, 50, 69, 75,
8, 49, 33, 38.

First, insert the first three keys into the root; this does not result in an overflow, since the capacity of the
node is 3.

The resultant B+-tree is the single leaf node 23, 37, 65; since this node is also the root, there are as yet no tree pointers.

Comment

Step 3 of 34

Insert 60:

Inserting 60 into this node results in an overflow, so the node is split into
two and a new level is created:

Level 1: 37

Level 2: 23, 37: 60, 65

Comment

Step 4 of 34

Insert 46:

Inserting 46 does not violate the capacity constraint of the second node in level 2.

The resultant tree is:

Level 1: 37

Level 2: 23, 37: 46, 60, 65

Comment

Step 5 of 34

Insert 92:

Insertion of next key, 92 will results in the overflow of the second node in the level2, it will be
46,60,65,92.

• Therefore, we need to split that node from 60 and create one new node in level2 and duplicate
60 in the parent node as below:

Comment

Step 6 of 34

Comment

Step 7 of 34

Insert 48:

Inserting 48 does not cause any overflow; it is inserted into the second node in level 2:

Level 1: 37, 60

Level 2: 23, 37: 46, 48, 60: 65, 92

Comment

Step 8 of 34

Insert 71:

Inserting 71 into the B+-tree also causes no overflow.

• It can be inserted into the third node of level 2 without violating the order constraints.

• Therefore, the updated tree is:

Level 1: 37, 60

Level 2: 23, 37: 46, 48, 60: 65, 71, 92


Comment

Step 9 of 34

Insert 56:

The next insertion is 56.

• Clearly 56 belongs in the second node of level 2, but inserting it results in an overflow.

• So that node (46, 48, 56, 60) needs to be split.

• The first two keys (46, 48) form the left node of the split and (56, 60) form the right node; the last
element of the first set (48) propagates up.

• Since this is a leaf node, 48 is only duplicated (a copy remains in the leaf).

The resultant B+-tree is:

Level 1: 37, 48, 60

Level 2: 23, 37: 46, 48: 56, 60: 65, 71, 92

Comment

Step 10 of 34

The remaining insertion operations proceed as follows.

• Levels are counted from the root to the leaves: the root is level 1, and the level number increases by 1
downwards.

Insert 59:

Level 1: 37, 48, 60

Level 2: 23, 37: 46, 48: 56, 59, 60: 65, 71, 92

Comment

Step 11 of 34

Insert 18:

Level 1: 37, 48, 60

Level 2: 18, 23, 37: 46, 48: 56, 59, 60: 65, 71, 92

Comment

Step 12 of 34

Insert 21:

Level 1: 37, 48, 60

Level 2: 18, 21, 23, 37: 46, 48: 56, 59, 60: 65, 71, 92

Overflow. Split (18, 21, 23, 37) and propagate 21 to the level above.

Level 1: 21, 37, 48, 60

Level 2: 18, 21: 23, 37: 46, 48: 56, 59, 60: 65, 71, 92

Again there is an overflow, now in level 1. Split and move 37 up; since this is not a leaf node, no copy
of 37 is kept. This results in a new level in the tree.

Level 1: 37

Level 2: 21: 48, 60

Level 3: 18, 21: 23, 37: 46, 48: 56, 59, 60: 65, 71, 92
Comment

Step 13 of 34

Insert 10:

Level 1: 37

Level 2: 21: 48, 60

Level 3: 10, 18, 21: 23, 37: 46, 48: 56, 59, 60: 65, 71, 92

Comment

Step 14 of 34

Insert 74:

Level 1: 37

Level 2: 21: 48, 60

Level 3: 10, 18, 21: 23, 37: 46, 48: 56, 59, 60: 65, 71, 74, 92

Overflow in level 3. Split the overflowing node at 71.

Level 1: 37

Level 2: 21: 48, 60, 71

Level 3: 10, 18, 21: 23, 37: 46, 48: 56, 59, 60: 65, 71: 74, 92

Comment

Step 15 of 34

Insert 78:

Level 1: 37

Level 2: 21: 48, 60, 71

Level 3: 10, 18, 21: 23, 37: 46, 48: 56, 59, 60: 65, 71: 74, 78, 92

Comment

Step 16 of 34

Insert 15:

Level 1: 37

Level 2: 21: 48, 60, 71

Level 3: 10, 15, 18, 21: 23, 37: 46, 48: 56, 59, 60: 65, 71: 74, 78, 92

Overflow in the first node of level 3, split it at 15 and propagate 15 up.

Level 1: 37

Level 2: 15, 21: 48, 60, 71

Level 3: 10, 15: 18, 21: 23, 37: 46, 48: 56, 59, 60: 65, 71: 74, 78, 92

Comment

Step 17 of 34

Comment

Step 18 of 34

Insert 16:

Level 1: 37

Level 2: 15, 21: 48, 60, 71

Level 3: 10, 15: 16, 18, 21: 23, 37: 46, 48: 56, 59, 60: 65, 71: 74, 78, 92

Comment

Step 19 of 34

Insert 20:

Level 1: 37
Level 2: 15, 21: 48, 60, 71

Level 3: 10, 15: 16, 18, 20, 21: 23, 37: 46, 48: 56, 59, 60: 65, 71: 74, 78, 92

Overflow at the inserted node; split it at 18 and propagate 18 up.

Level 1: 37

Level 2: 15, 18, 21: 48, 60, 71

Level 3: 10, 15: 16, 18: 20, 21: 23, 37: 46, 48: 56, 59, 60: 65, 71: 74, 78, 92

Comment

Step 20 of 34

Insert 24:

Level 1: 37

Level 2: 15, 18, 21: 48, 60, 71

Level 3: 10, 15: 16, 18: 20, 21: 23, 24, 37: 46, 48: 56, 59, 60: 65, 71: 74, 78, 92

Comment

Step 21 of 34

Insert 28:

Level 1: 37

Level 2: 15, 18, 21: 48, 60, 71

Level 3: 10, 15: 16, 18: 20, 21: 23, 24, 28, 37: 46, 48: 56, 59, 60: 65, 71: 74, 78, 92

Overflow in the fourth node of level 3; split it at 24 and propagate 24 up, as below.

Level 1: 37

Level 2: 15, 18, 21, 24: 48, 60, 71

Level 3: 10, 15: 16, 18: 20, 21: 23, 24: 28, 37: 46, 48: 56, 59, 60: 65, 71: 74, 78, 92

Again, overflow at level 2; one more split, at 18, is needed, as below.

Level 1: 18, 37

Level 2: 15: 21, 24: 48, 60, 71

Level 3: 10, 15: 16, 18: 20, 21: 23, 24: 28, 37: 46, 48: 56, 59, 60: 65, 71: 74, 78, 92

Comment

Step 22 of 34

Insert 39:

Level 1: 18, 37

Level 2: 15: 21, 24: 48, 60, 71

Level 3: 10, 15: 16, 18: 20, 21: 23, 24: 28, 37: 39, 46, 48: 56, 59, 60: 65, 71: 74, 78, 92

Comment

Step 23 of 34

Insert 43:

Level 1: 18, 37

Level 2: 15: 21, 24: 48, 60, 71

Level 3: 10, 15: 16, 18: 20, 21: 23, 24: 28, 37: 39, 43, 46, 48: 56, 59, 60: 65, 71: 74, 78, 92

Overflow at the inserted node, so split that node at the second element, 43, as below.

Level 1: 18, 37

Level 2: 15: 21, 24: 43, 48, 60, 71

Comment

Step 24 of 34

Level 3: 10, 15: 16, 18: 20, 21: 23, 24: 28, 37: 39, 43: 46, 48: 56, 59, 60: 65, 71: 74, 78, 92

Again, overflow at level 2.

Level 1: 18, 37, 48

Level 2: 15: 21, 24: 43: 60, 71

Level 3: 10, 15: 16, 18: 20, 21: 23, 24: 28, 37: 39, 43: 46, 48: 56, 59, 60: 65, 71: 74, 78, 92

Comment
Step 25 of 34

Insert 47:

Level 1: 18, 37, 48

Level 2: 15: 21, 24: 43: 60, 71

Level 3: 10, 15: 16, 18: 20, 21: 23, 24: 28, 37: 39, 43: 46, 47, 48: 56, 59, 60: 65, 71: 74, 78, 92

Comment

Step 26 of 34

Insert 50:

Level 1: 18, 37, 48

Level 2: 15: 21, 24: 43: 60, 71

Level 3: 10, 15: 16, 18: 20, 21: 23, 24: 28, 37: 39, 43: 46, 47, 48: 50, 56, 59, 60: 65, 71: 74, 78, 92

Overflow at the inserted node. Split the node at 56, the second element and propagate it up as
below.

Level 1: 18, 37, 48

Level 2: 15: 21, 24: 43: 56, 60, 71

Level 3: 10, 15: 16, 18: 20, 21: 23, 24: 28, 37: 39, 43: 46, 47, 48: 50, 56: 59, 60: 65, 71: 74, 78, 92

Comment

Step 27 of 34

Insert 69:

Level 1: 18, 37, 48

Level 2: 15: 21, 24: 43: 56, 60, 71

Level 3: 10, 15: 16, 18: 20, 21: 23, 24: 28, 37: 39, 43: 46, 47, 48: 50, 56: 59, 60: 65, 69, 71: 74, 78, 92

Comment

Step 28 of 34

Insert 75:

Level 1: 18, 37, 48

Level 2: 15: 21, 24: 43: 56, 60, 71

Level 3: 10, 15: 16, 18: 20, 21: 23, 24: 28, 37: 39, 43: 46, 47, 48: 50, 56: 59, 60: 65, 69, 71: 74, 75, 78, 92

Overflow at the inserted node; split it at the second element, 75, and propagate 75 up.

Level 1: 18, 37, 48

Level 2: 15: 21, 24: 43: 56, 60, 71, 75

Level 3: 10, 15: 16, 18: 20, 21: 23, 24: 28, 37: 39, 43: 46, 47, 48: 50, 56: 59, 60: 65, 69, 71: 74, 75: 78, 92

Again, overflow at the level-2 node (56, 60, 71, 75); split it at 60 and move 60 up.

Level 1: 18, 37, 48, 60

Level 2: 15: 21, 24: 43: 56: 71, 75

Level 3: 10, 15: 16, 18: 20, 21: 23, 24: 28, 37: 39, 43: 46, 47, 48: 50, 56: 59, 60: 65, 69, 71: 74, 75: 78, 92

Again there is overflow, now at the root (18, 37, 48, 60). Split it at 37 and move 37 up into a new level.

Level 1: 37

Level 2: 18: 48, 60

Level 3: 15: 21, 24: 43: 56: 71, 75

Level 4: 10, 15: 16, 18: 20, 21: 23, 24: 28, 37: 39, 43: 46, 47, 48: 50, 56: 59, 60: 65, 69, 71: 74, 75: 78, 92

Comment

Step 29 of 34

Insert 8:

Level 1: 37

Level 2: 18: 48, 60

Level 3: 15: 21, 24: 43: 56: 71, 75

Level 4: 8, 10, 15: 16, 18: 20, 21: 23, 24: 28, 37: 39, 43: 46, 47, 48: 50, 56: 59, 60: 65, 69, 71: 74, 75: 78, 92

Comment

Step 30 of 34

Insert 49:

Level 1: 37

Level 2: 18: 48, 60

Level 3: 15: 21, 24: 43: 56: 71, 75

Level 4: 8, 10, 15: 16, 18: 20, 21: 23, 24: 28, 37: 39, 43: 46, 47, 48: 49, 50, 56: 59, 60: 65, 69, 71: 74, 75: 78, 92

Comment

Step 31 of 34

Insert 33:

Level 1: 37

Level 2: 18: 48, 60

Level 3: 15: 21, 24: 43: 56: 71, 75

Level 4: 8, 10, 15: 16, 18: 20, 21: 23, 24: 28, 33, 37: 39, 43: 46, 47, 48: 49, 50, 56: 59, 60: 65, 69, 71: 74, 75: 78, 92

Comment

Step 32 of 34

Insert 38:

Level 1: 37

Level 2: 18: 48, 60

Level 3: 15: 21, 24: 43: 56: 71, 75

Level 4: 8, 10, 15: 16, 18: 20, 21: 23, 24: 28, 33, 37: 38, 39, 43: 46, 47, 48: 49, 50, 56: 59, 60: 65, 69, 71: 74, 75: 78, 92

Comment

Step 33 of 34

The tree after the insertion of the last key, 38, is the final B+-tree.

• From each node except the leaf nodes, tree pointers lead to the child nodes: the pointer to the left of a
key points to the node holding keys less than that key, and the pointer to its right points to the node
holding keys greater than that key.

• Each set in the tree levels above forms one node, and the set elements are the keys present in that
node.

Comment

Step 34 of 34

Graphically, the final tree after the insertion of all the keys is shown below. [Figure: final B+-tree - not reproduced.]

Comment
Chapter 17, Problem 20E

Problem

Repeat Exercise, but use a B-tree of order p = 4 instead of a B+-tree.

Exercise

A PARTS file with Part# as the key field includes records with the following Part# values: 23, 65,
37, 60, 46, 92, 48, 71, 56, 59, 18, 21, 10, 74, 78, 15, 16, 20, 24, 28, 39,43,47, 50,69, 75, 8,49,
33, 38. Suppose that the search field values are inserted in the given order in a B+-tree of order
p = 4 and pleaf = 3; show how the tree will expand and what the final tree will look like.

Step-by-step solution

Step 1 of 1

Insertion takes place in the steps represented in the diagram. [Figure: the B-tree after each insertion - not reproduced.]

Comment
Chapter 17, Problem 21E

Problem

Suppose that the following search field values are deleted, in the given order, from the B+-tree of
Exercise; show how the tree will shrink and show the final tree. The deleted values are 65, 75,
43, 18, 20, 92, 59, 37.

Exercise

A PARTS file with Part# as the key field includes records with the following Part# values: 23, 65,
37, 60, 46, 92, 48, 71, 56, 59, 18, 21, 10, 74, 78, 15, 16, 20, 24, 28, 39,43,47, 50,69, 75, 8,49,
33, 38. Suppose that the search field values are inserted in the given order in a B+-tree of order
p = 4 and pleaf = 3; show how the tree will expand and what the final tree will look like.

Step-by-step solution

Step 1 of 10

In the B+-tree deletion algorithm, two cases arise when a key value is deleted from a leaf node:

(1) The leaf node becomes less than half full.

In this case, we may combine it with the next leaf node.

Comment

Step 2 of 10

(2) The deleted key value is the rightmost value in its leaf node; in that case its value also appears in an internal
node.

In this case, the key value to the left of the deleted key in the leaf node replaces the deleted key
value in the internal node.

From the given data, deleting 65 affects only the leaf node.

Deleting 75 causes a leaf node to become less than half full, so it is combined with the next node, and
75 is also removed from the internal node.

Comment

Step 3 of 10

Comment

Step 4 of 10

Deleting 43 causes a leaf node to become less than half full, and it is combined with the next node.

The combined node then has 3 entries; its rightmost entry, 48, replaces 43 in both the leaf and
internal nodes.

Comment

Step 5 of 10
Comment

Step 6 of 10

In the next step we delete 18; it is the rightmost entry in a leaf node and therefore also appears in an
internal node of the B+-tree. The leaf node becomes less than half full and is combined with the
next node.

The value 18 must then be removed from the internal node, causing underflow in that internal node.

One approach to dealing with an underflowing internal node is to redistribute the values of the
underflowing node with its child nodes; here, 21 is moved up into the underflowing node, leading to the
following tree.

Comment

Step 7 of 10

Comment

Step 8 of 10

Deleting 20 and 92 does not cause underflow.

Deleting 59 causes underflow, and the remaining value, 60, is combined with the next leaf node.
Hence 60 is no longer a rightmost entry in a leaf node. This would normally be handled by moving 56 up to
replace 60 in the internal node, but since that leads to underflow in the node that used to
contain 56, the nodes can be reorganized as follows.

Comment

Step 9 of 10

Comment

Step 10 of 10

Finally, removing 37 causes serious underflow, leading to a reorganization of the whole tree. One
approach to deleting the value in the root node is to use the rightmost value in the next leaf node to
replace the root and move that leaf node into the left subtree. In this case, the resulting tree may look as follows.
Comment
Chapter 17, Problem 22E

Problem

Repeat Exercise 1, but for the B-tree of Exercise 3.

Exercise 1

Suppose that the following search field values are deleted, in the given order, from the B+-tree of
Exercise 2; show how the tree will shrink and show the final tree. The deleted values are 65, 75,
43, 18, 20, 92, 59, 37.

Exercise 2

A PARTS file with Part# as the key field includes records with the following Part# values: 23, 65,
37, 60, 46, 92, 48, 71, 56, 59, 18, 21, 10, 74, 78, 15, 16, 20, 24, 28, 39,43,47, 50,69, 75, 8,49,
33, 38. Suppose that the search field values are inserted in the given order in a B+-tree of order
p = 4 and pleaf = 3; show how the tree will expand and what the final tree will look like.

Exercise 3

Repeat Exercise 2, but use a B-tree of order p = 4 instead of a B+-tree.

Step-by-step solution

Step 1 of 1

Deletion will take place in following order:

Comment
Chapter 17, Problem 23E

Problem

Algorithm 17.1 outlines the procedure for searching a nondense multilevel primary index to
retrieve a file record. Adapt the algorithm for each of the following cases:

a. A multilevel secondary index on a nonkey nonordering field of a file. Assume that option 3 of
Section 17.1.3 is used, where an extra level of indirection stores pointers to the individual records
with the corresponding index field value.

b. A multilevel secondary index on a nonordering key field of a file.

c. A multilevel clustering index on a nonkey ordering field of a file.

Step-by-step solution

Step 1 of 3

a. [The adapted algorithm for a multilevel secondary index on a nonkey, nonordering field, using an extra level of indirection, was given as a figure and is not reproduced here.]

Comment

Step 2 of 3

b. [The adapted algorithm for a multilevel secondary index on a nonordering key field was given as a figure and is not reproduced here.]

Comment

Step 3 of 3

c. [The adapted algorithm for a multilevel clustering index on a nonkey ordering field was given as a figure and is not reproduced here.]

Comment
Chapter 17, Problem 24E

Problem

Suppose that several secondary indexes exist on nonkey fields of a file, implemented using
option 3 of Section 17.1.3; for example, we could have secondary indexes on the fields
Department_code, Job_code, and Salary of the EMPLOYEE file of Exercise. Describe an
efficient way to search for and retrieve records satisfying a complex selection condition on these
fields, such as (Department_code = 5 AND Job_code =12 AND Salary = 50,000), using the
record pointers in the indirection level.

Exercise

Consider a disk with block size B = 512 bytes. A block pointer is P = 6 bytes long, and a record
pointer is PR = 7 bytes long. A file has r = 30,000 EMPLOYEE records of fixed length. Each
record has the following fields: Name (30 bytes), Ssn (9 bytes), Department_code (9 bytes),
Address (40 bytes), Phone (10 bytes), Birth_date (8 bytes), Sex (1 byte), Job_code (4 bytes),
and Salary (4 bytes, real number). An additional byte is used as a deletion marker.

a. Calculate the record size R in bytes.

b. Calculate the blocking factor bfr and the number of file blocks b, assuming an unspanned
organization.

c. Suppose that the file is ordered by the key field Ssn and we want to construct a primary index
on Ssn. Calculate (i) the index blocking factor bfri (which is also the index fan-out fo); (ii) the
number of first-level index entries and the number of first-level index blocks; (iii) the number of
levels needed if we make it into a multilevel index; (iv) the total number of blocks required by the
multilevel index; and (v) the number of block accesses needed to search for and retrieve a record
from the file—given its Ssn value—using the primary index.

d. Suppose that the file is not ordered by the key field Ssn and we want to construct a secondary
index on Ssn. Repeat the previous exercise (part c) for the secondary index and compare with
the primary index.

e. Suppose that the file is not ordered by the nonkey field Department_code and we want to
construct a secondary index on Department_code, using option 3 of Section 17.1.3, with an extra
level of indirection that stores record pointers. Assume there are 1,000 distinct values of
Department_code and that the EMPLOYEE records are evenly distributed among these values.
Calculate (i) the index blocking factor bfri (which is also the index fan-out fo); (ii) the number of
blocks needed by the level of indirection that stores record pointers; (iii) the number of first-level
index entries and the number of first-level index blocks; (iv) the number of levels needed if we
make it into a multilevel index; (v) the total number of blocks required by the multilevel index and
the blocks used in the extra level of indirection; and (vi) the approximate number of block
accesses needed to search for and retrieve all records in the file that have a specific
Department_code value, using the index.

f. Suppose that the file is ordered by the nonkey field Department_code and we want to construct
a clustering index on Department_code that uses block anchors (every new value of
Department_code starts at the beginning of a new block). Assume there are 1,000 distinct values
of Department_code and that the EMPLOYEE records are evenly distributed among these
values. Calculate (i) the index blocking factor bfri (which is also the index fan-out fo); (ii) the
number of first-level index entries and the number of first-level index blocks; (iii) the number of
levels needed if we make it into a multilevel index; (iv) the total number of blocks required by the
multilevel index; and (v) the number of block accesses needed to search for and retrieve all
records in the file that have a specific Department_code value, using the clustering index
(assume that multiple blocks in a cluster are contiguous).

g. Suppose that the file is not ordered by the key field Ssn and we want to construct a B+-tree
access structure (index) on Ssn. Calculate (i) the orders p and pleaf of the B+-tree; (ii) the
number of leaf-level blocks needed if blocks are approximately 69% full (rounded up for
convenience); (iii) the number of levels needed if internal nodes are also 69% full (rounded up for
convenience); (iv) the total number of blocks required by the B+-tree; and (v) the number of block
accesses needed to search for and retrieve a record from the file—given its Ssn value—using the
B+-tree.

h. Repeat part g, but for a B-tree rather than for a B+-tree. Compare your results for the B-tree
and for the B+-tree.

Step-by-step solution

Step 1 of 2

The EMPLOYEE file contains the fields Name, Ssn, Department_code, Address, Phone,
Birth_date, Sex, Job_code, Salary.

The primary index is maintained on the key field Ssn.

Consider that the secondary indexes are maintained on the fields Department_code, Job_code
and Salary. The fields Department_code, Job_code and Salary are non-key fields.

Step 2 of 2

The steps to retrieve records satisfying the complex condition (Department_code = 5 AND
Job_code = 12 AND Salary = 50,000), using the record pointers in the indirection level, are as follows:

1. First retrieve the record pointers of the records that satisfy the condition Department_code = 5, using the secondary index on Department_code.

2. Then, among the record pointers retrieved in step 1, keep the record pointers of the records that satisfy the condition Job_code = 12, using the secondary index on Job_code.

3. Then, among the record pointers retrieved in step 2, keep the record pointers of the records that satisfy the condition Salary = 50,000, using the secondary index on Salary. The records reached by the remaining pointers satisfy the entire conjunction and are retrieved.
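A minimal sketch of this intersection strategy in Python, assuming each secondary index can return the set of record pointers for a given value (the index contents and pointer values below are hypothetical stand-ins):

# Hypothetical record-pointer sets returned by the three secondary indexes.
# In a real DBMS these would come from the indirection blocks of each index.
dept_ptrs   = {101, 102, 205, 310, 412}   # Department_code = 5
job_ptrs    = {102, 205, 300, 412}        # Job_code = 12
salary_ptrs = {205, 212, 412}             # Salary = 50,000

# Intersect the smallest sets first to keep intermediate results small.
candidates = set.intersection(
    *sorted([dept_ptrs, job_ptrs, salary_ptrs], key=len)
)

# Only the records whose pointers survive all three intersections are fetched.
for ptr in candidates:
    print("fetch record at pointer", ptr)   # a block access in practice

Intersecting the pointer sets first means each data block is accessed at most once, rather than once per matching condition.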

Chapter 17, Problem 25E

Problem

Adapt Algorithms 17.2 and 17.3, which outline search and insertion procedures for a B+-tree, to a
B-tree.
Step-by-step solution

Step 1 of 2

Searching for a record in a B-tree with key field value = K:

n <- block containing root node of B-tree;
read block n;
while (n is not a leaf node of the tree) do
begin
q <- number of tree pointers in node n;
if K <= n.K1 (* n.Ki refers to the ith search field value in node n *)
then n <- n.P1 (* n.Pi refers to the ith tree pointer in node n *)
else if K > n.Kq-1
then n <- n.Pq
else
begin
search node n for an entry i such that n.Ki = K;
if (n.Ki = K)
then use the data pointer to access the file record; exit
else search node n for an entry i such that n.Ki-1 < K <= n.Ki;
n <- n.Pi
end;
read block n
end;
(* n is now a leaf node *)
search leaf node n for an entry i such that n.Ki = K;
if (n.Ki = K)
then use the data pointer to access the file record
else the value K does not exist in the tree (* reaching a leaf without a match means the search fails *);
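The same descent logic as a minimal runnable sketch in Python (the Node class and its fields are illustrative assumptions; keys are kept sorted within each node):

from bisect import bisect_left

class Node:
    # keys: sorted search-field values; data: record pointers, one per key;
    # children: tree pointers (len(keys) + 1 of them), empty for a leaf.
    def __init__(self, keys, data, children=()):
        self.keys, self.data, self.children = keys, data, list(children)

def btree_search(node, k):
    """Return the record pointer for key k, or None if k is absent."""
    while node is not None:
        i = bisect_left(node.keys, k)        # first position with key >= k
        if i < len(node.keys) and node.keys[i] == k:
            return node.data[i]              # found in this internal or leaf node
        if not node.children:                # leaf reached without a match
            return None
        node = node.children[i]              # descend into the ith subtree
    return None

left  = Node([3], ["r3"])
right = Node([8, 9], ["r8", "r9"])
root  = Node([5], ["r5"], [left, right])
print(btree_search(root, 8))   # -> "r8"
print(btree_search(root, 4))   # -> None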


Step 2 of 2

Insertion of key field value K in a B-tree of order p:

n <- block containing root node of the tree;
read block n; set stack S to empty;
while (n is not a leaf node of the tree) do
begin
push address of n on stack S;
q <- number of tree pointers in node n;
if K <= n.K1
then n <- n.P1
else if K > n.Kq-1
then n <- n.Pq
else
begin
search node n for an entry i such that n.Ki-1 < K <= n.Ki;
n <- n.Pi
end;
read block n
end;
search leaf node n for an entry (Ki, Pri) with K = Ki;
if found
then the record is already in the file; the insertion is rejected
else
begin
create entry (K, Pr), where Pr points to the new record;
if leaf node n is not full
then insert (K, Pr) in its correct position in leaf node n
else
begin (* leaf node n is full; split it *)
copy n to temp;
insert entry (K, Pr) in temp in its correct position;
new <- a new empty leaf node for the tree;
j <- ceil((pleaf + 1)/2);
n <- first j entries in temp (up to entry (Kj, Prj));
new <- remaining entries in temp;
K <- Kj; (* the jth key value is propagated to the parent *)
finished <- false;
repeat
if stack S is empty
then (* no parent exists: a new root must be created *)
begin
root <- a new empty internal node for the tree;
root <- <n, K, new>; finished <- true
end
else
begin
n <- pop stack S;
if internal node n is not full
then
begin
insert (K, new) in its correct position in internal node n;
finished <- true
end
else
begin (* internal node n is full; split it *)
copy n to temp;
insert (K, new) in temp in its correct position;
new <- a new empty internal node for the tree;
j <- ceil((p + 1)/2);
n <- entries in temp up to tree pointer Pj;
new <- entries in temp from tree pointer Pj+1 onward;
K <- Kj (* the jth key value is propagated to the parent *)
end
end
until finished
end
end;

Chapter 17, Problem 26E

Problem

It is possible to modify the B+-tree insertion algorithm to delay the case where a new level is
produced by checking for a possible redistribution of values among the leaf nodes. Figure 17.17
illustrates how this could be done for our example in Figure 17.12; rather than splitting the
leftmost leaf node when 12 is inserted, we do a left redistribution by moving 7 to the leaf node to
its left (if there is space in this node). Figure 17.17 shows how the tree would look when
redistribution is considered. It is also possible to consider right redistribution. Try to modify the
B+-tree insertion algorithm to take redistribution into account.

Step-by-step solution

Step 1 of 1
Refer to Figure 17.17 for the redistribution of values among the leaf nodes. The figure shows the insertion of the values 12, 9, and 6; value 12 is inserted by moving 7 to the left leaf node through left redistribution instead of splitting.

Right redistribution can be handled analogously, and insertion then proceeds as follows:

• When a new value is inserted, the tree consists of leaf nodes and internal nodes. Every value that appears in an internal node also appears at the leaf level as the rightmost value of some leaf node, and the tree pointer to the left of that internal value leads to it.

• If a new value must be inserted into a full leaf node, redistribution is tried first: if the right (or left) sibling has free space, the rightmost (or leftmost) entry of the full node is moved into that sibling and the separating value in the parent node is adjusted accordingly.

• Only when the siblings are also full is the leaf node split. The first j = ceil((pleaf + 1)/2) values, where pleaf denotes the order of the leaf nodes, are retained in the original node and the rest of the values are moved to a new leaf node. A duplicate of the jth search value is kept at the parent node, together with a pointer to the new node.

• This new entry is inserted into the parent node. If the parent node is full, it is split as well: the jth search value, with j = ceil((p + 1)/2), is moved up to its parent, the entries up to tree pointer Pj are kept, and the entries from Pj+1 to the last entry in the node are moved to a new internal node. Splitting of parent and leaf nodes continues upward in this way and, in the worst case, results in a new level for the tree.

A modified insertion step based on right redistribution is sketched below.
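A minimal sketch of the redistribution check in Python, under simplifying assumptions (standalone leaf objects with sibling links; maintenance of the parent's separator key is noted in comments but not implemented):

from bisect import insort

P_LEAF = 3  # assumed maximum number of keys per leaf node (p_leaf)

class Leaf:
    def __init__(self, keys):
        self.keys = sorted(keys)
        self.left = self.right = None   # sibling links

def link(a, b):
    a.right, b.left = b, a

def insert_with_redistribution(leaf, key):
    """Insert key into leaf; try redistribution to a sibling before splitting."""
    insort(leaf.keys, key)
    if len(leaf.keys) <= P_LEAF:
        return None                                 # fits without restructuring
    if leaf.left is not None and len(leaf.left.keys) < P_LEAF:
        leaf.left.keys.append(leaf.keys.pop(0))     # left redistribution
        return None                                 # (parent separator updated here)
    if leaf.right is not None and len(leaf.right.keys) < P_LEAF:
        leaf.right.keys.insert(0, leaf.keys.pop())  # right redistribution
        return None                                 # (parent separator updated here)
    j = (P_LEAF + 2) // 2                           # ceil((p_leaf + 1) / 2)
    new = Leaf(leaf.keys[j:])                       # both siblings full: split
    leaf.keys = leaf.keys[:j]
    if leaf.right is not None:
        link(new, leaf.right)
    link(leaf, new)
    return new        # caller must insert the separator for the new node in the parent

a, b = Leaf([1, 5]), Leaf([7, 8, 12])
link(a, b)
insert_with_redistribution(b, 9)    # b is full: 7 moves left instead of splitting
print(a.keys, b.keys)               # -> [1, 5, 7] [8, 9, 12]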

Chapter 17, Problem 27E

Problem

Outline an algorithm for deletion from a B+-tree.

Step-by-step solution

Step 1 of 1

Delete the entry with key value = K:

n <- block containing root node of B+-tree;
read block n;
while (n is not a leaf node of the B+-tree) do
begin
q <- number of tree pointers in node n;
if K <= n.K1 (* n.Ki refers to the ith search field value in node n *)
then n <- n.P1 (* n.Pi refers to the ith tree pointer in node n *)
else if K > n.Kq-1
then n <- n.Pq
else
begin
search node n for an entry i such that n.Ki = K;
if (n.Ki = K)
then
begin
access the leftmost value in the subtree pointed to by n.Pi+1;
store this value in temp;
delete this value from the tree;
replace K in node n with temp;
exit
end
else search node n for an entry i such that n.Ki-1 < K <= n.Ki;
n <- n.Pi
end;
read block n
end;
search leaf node n for an entry (Ki, Pri) with K = Ki;
if not found
then the value does not exist in the tree
else if it is the only entry in leaf node n and Pnext is not null
then
begin
temp.K1, temp.Pr <- Pnext.K1, Pnext.Pr;
delete Pnext.K1;
exchange the record value of the parent record with temp;
replace the value of n by temp;
exit
end
else if it is not the only entry
then delete it
else (* it is the only entry and Pnext = NULL *)
begin
access the rightmost value in the subtree pointed to by the parent;
store the value in temp;
exchange the parent value with temp;
access the record to be deleted;
replace this record by temp;
exit
end;

Chapter 17, Problem 28E

Problem

Repeat Exercise for a B-tree.

Exercise

Outline an algorithm for deletion from a B+-tree.

Step-by-step solution

Step 1 of 2

Algorithm for deletion from a B-tree:

B-Tree-Delete(x, k)

// x is the root of the subtree and k is the key to be deleted.
// If k is deleted successfully, B-Tree-Delete returns true; otherwise it returns false.

Note: the function is designed so that whenever it is called recursively, the node x has at least t keys, where t is the minimum degree of the tree.

If x is a leaf then
if k is in x then
delete k from x and return true
else return false // k is not in the subtree
Else // x is an internal node
If k is in x then
y <- the child of x that precedes k
If y has at least t keys then
k' <- the predecessor of k
(use B-Tree-Find-Largest on y)
Copy k' over k // replace k with k'
B-Tree-Delete(y, k') // recursive call
else // y has t-1 keys
z <- the child of x that follows k
If z has at least t keys then
k' <- the successor of k
Copy k' over k // replace k with k'
B-Tree-Delete(z, k') // recursive call
else // both y and z have t-1 keys
Merge k and all of z into y
// y now contains 2t-1 keys;
// k and the pointer to z are deleted from x.
B-Tree-Delete(y, k) // recursive call
else // k is not in internal node x
c <- the root of the subtree of x that must contain k
If c has t-1 keys then
If c has an immediate left/right sibling z with t or more keys then
let k1 be the key in x that precedes/follows c
Move k1 into c as the first/last key in c
Let k2 be the last/first key in the immediate left/right sibling z
Replace k1 in x with k2 from z
(i.e., move k2 up into x)
Move the last/first child subtree of z into c
Else // c and both of its immediate siblings have t-1 keys
// we cannot descend to a child node with only t-1 keys, so


Step 2 of 2

Merge c with one of its immediate siblings and move the appropriate key of x down into the merged node, then call

B-Tree-Delete(c, k)

Chapter 20, Problem 1RQ

Problem

What is meant by the concurrent execution of database transactions in a multiuser system?


Discuss why concurrency control is needed, and give informal examples.

Step-by-step solution

Step 1 of 1

Multiuser system

A multiuser system is one that many users can use, accessing the data at the same time; the concurrent execution of database transactions means that the operations of several transactions are interleaved in time.

Concurrency control is needed to avoid the following problems:

The lost update problem

Two or more transactions read a record at the same time, but when the records are saved, only the last save is reflected; all other changes are lost.

The temporary update (or dirty read) problem

One transaction updates a database item and then fails for some reason. Before the update is undone, another transaction reads the temporarily updated item; the data read by that transaction is therefore incorrect, or "dirty."

The incorrect summary problem

Consider the airline seat reservation example from the book: a customer wants to buy a ticket, so the system computes a summary of how many seats on the flight are open. Between the time the summary action starts and finishes, other seats are reserved by other agents, so the summary returned to the customer is inaccurate: it does not reflect the true number of available seats.

Chapter 20, Problem 2RQ

Problem

Discuss the different types of failures. What is meant by catastrophic failure?

Step-by-step solution

Step 1 of 1

Types of Failures

Failures in database management system are categorized as transaction, system, and media
failures.

There are many possible reasons for transaction to fail during execution:

Computer failure:

During transaction execution, the computer hardware, media, software, or network may crash. Such a crash causes the database management system to fail.

Transaction or system error:

Operations such as division by zero or integer overflow will cause the transaction to fail. A logical programming error or erroneous parameter values will also cause failures, and the user may interrupt the system during the transaction's execution.

Local errors:

Errors or exception conditions detected by the transaction itself will cause failures; the transaction then halts and cancels its changes, because something along the way prevents it from proceeding.

Concurrency control enforcement:

Several transactions become deadlocked and are aborted.

Disk failure:

The data stored in the disk blocks may be lost because of a read-write error or a read/write head
crash. This could occur during a read or a write operation of the transaction.

Physical problems and catastrophes:

Power failure, robbery, fire accident, destruction and many more refer to physical problems.

Catastrophic failure:

Catastrophic failures occur very rarely. A catastrophic failure is a physical misfortune that affects the database as a whole; the list of such problems is endless. Examples include:

• The hard drive holding all the data may be completely damaged.

• Fire accident that may cause the loss of physical devices and data loss.

• Power or air-conditioning failures.

• Destruction of physical devices.

• Theft of storage media and physical devices.

• Overwriting disks or tapes by mistake.

Chapter 20, Problem 3RQ

Problem

Discuss the actions taken by the read_item and write_item operations on a database.

Step-by-step solution

Step 1 of 1

The read_item and write_item operations transfer data between disk and main memory.

Actions taken by the read_item operation on a database (assume the read operation is performed on data item X):

1. Find the address of the disk block that contains item X.

2. Copy that disk block into a buffer in main memory, if it is not already in some main memory buffer.

3. Copy item X from the buffer to the program variable named X.

Actions taken by the write_item operation on a database (assume the write operation is performed on data item X):

1. Find the address of the disk block that contains item X.

2. Copy that disk block into a buffer in main memory, if it is not already in some main memory buffer.

3. Copy item X from the program variable named X into its correct location in the buffer.

4. Store the updated block from the buffer back to disk (either immediately or at some later point in time).
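A minimal sketch of these steps in Python, using a dictionary as a stand-in buffer pool (the block addressing and the in-memory "disk" are simplifying assumptions):

# Simplified model: the "disk" maps block addresses to dicts of items,
# and the buffer pool caches whole blocks in main memory.
disk = {0: {"X": 100, "Y": 50}, 1: {"Z": 7}}
buffer_pool = {}

def block_of(item):
    return 0 if item in disk[0] else 1        # stand-in for the block lookup

def read_item(item):
    blk = block_of(item)                       # 1. find the block address
    if blk not in buffer_pool:                 # 2. bring the block into a buffer
        buffer_pool[blk] = dict(disk[blk])
    return buffer_pool[blk][item]              # 3. copy item to program variable

def write_item(item, value, flush=True):
    blk = block_of(item)                       # 1. find the block address
    if blk not in buffer_pool:                 # 2. bring the block into a buffer
        buffer_pool[blk] = dict(disk[blk])
    buffer_pool[blk][item] = value             # 3. update the item in the buffer
    if flush:                                  # 4. write the block back to disk
        disk[blk] = dict(buffer_pool[blk])

X = read_item("X")
write_item("X", X - 10)
print(disk[0]["X"])   # -> 90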

Chapter 20, Problem 4RQ

Problem

Draw a state diagram and discuss the typical states that a transaction goes through during
execution.

Step-by-step solution

Step 1 of 2

The state diagram of a transaction shows the transitions among the active, partially committed, committed, failed, and terminated states (the diagram itself is not reproduced here).

Step 2 of 2

The typical states that a transaction goes through during execution:

begin_transaction - the transaction enters the active state and starts executing.

read or write - while active, the transaction reads, changes, or deletes records.

end_transaction - execution finishes and the transaction becomes partially committed.

commit_transaction - the changes or deletes are completed permanently; the transaction is committed.

rollback - a change or delete was unsuccessful; the transaction fails and all of its changes are reset.

Chapter 20, Problem 5RQ

Problem

What is the system log used for? What are the typical kinds of records in a system log? What are
transaction commit points, and why are they important?

Step-by-step solution

Step 1 of 2

System log:

The system log is used to recover from failures that affect transactions.

The system maintains a log to keep track of all transaction operations that affect the values of database items; in essence, it records all the meaningful changes made to the database.


Step 2 of 2

Typical kinds of records in a system log:

[start_transaction, T] - transaction T has started execution.

[commit, T] - transaction T has completed successfully, and its effect can be committed.

[read_item, T, X] - transaction T has read the value of item X.

[write_item, T, X, old_value, new_value] - transaction T has changed the value of item X from old_value to new_value.

[abort, T] - transaction T has been aborted; none of its changes are applied.

Importance of transaction commit points

A transaction reaches its commit point when all of its database operations, the reads and writes that go along with it, have completed successfully and their effects are recorded in the log. Commit points matter because, after a failure, transactions that have passed their commit point can be redone from the log, while uncommitted transactions must be rolled back.

Chapter 20, Problem 6RQ

Problem

Discuss the atomicity, durability, isolation, and consistency preservation properties of a database
transaction.

Step-by-step solution

Step 1 of 4

Atomicity:

• This property states that a transaction must be treated as an atomic unit, that is, either all its
operations are executed or none.

• There must be no state in a database where a transaction is left partially completed.

• States should be defined either before the execution of the transaction or after the
execution/abortion/failure of the transaction.

• This property requires that a transaction be executed to completion.

• If the transaction fails midway, because the user explicitly cancels the operation or an internal error occurs, the database must ensure that no partial state from the leftover operations remains.

• The database can UNDO or ROLLBACK all the changes so that the database is restored to the state it was in before the transaction started.

• If a transaction fails to complete for some reason, such as a system crash during its execution, the recovery technique must undo any effects of the transaction on the database.


Step 2 of 4

Durability or permanency:

• The changes applied to the database by a committed transaction must persist in the database,
and must not be lost if failure occurs.

• It is the responsibility of the recovery subsystem of the DBMS.

• If a transaction updates a chunk of data in a database and commits, then the database holds
the modified data.

• Even if a transaction commits but the system fails before the data could be written on to the
disk, then the data will be updated once the system springs back into action.


Step 3 of 4

Isolation:

• A transaction should appear as though it is being executed in isolation from other transactions
simultaneously or in parallel.

• That is, the execution of a transaction should not be interfered with by any other transactions
executing concurrently.

• Isolation is enforced by the concurrency control subsystem of the DBMS.

• If every transaction refrains from making its updates visible to other transactions until it is committed, this level of isolation solves the temporary update problem and eliminates cascading rollbacks.

• In simple terms, one transaction cannot read data written by another transaction until that transaction has completed.

• If two transactions are executing concurrently, and one wants to see the changes made by the other, it must wait until the other has finished.


Step 4 of 4

Consistency preservation:

• The consistency property ensures that the database remains in a consistent state before the
start of the transaction and after the transaction is over (whether it is successful or not).

• It states that when transaction is finished the data will remain in a consistent state.

• A transaction either creates a new and valid state of data, or, if any failure occurs, returns all
data to its state before the transaction was started.

• Execution of transaction should take the database from one consistent state to another.

Chapter 20, Problem 7RQ

Problem

What is a schedule (history)? Define the concepts of recoverable, cascade-less, and strict
schedules, and compare them in terms of their recoverability.

Step-by-step solution

Step 1 of 4

Schedule (or history)

A schedule (or history) S of n transactions T1, T2 , ...,Tn is an ordering of the operations of the
transactions subject to the constraint that, for each transaction Ti that participates in S, the
operations of Ti in S must appear in the same order in which they occur in Ti.

Recoverability requires that a committed transaction T never has to be rolled back; this requirement demarcates the recoverable schedules from the nonrecoverable ones. Schedules determined to be nonrecoverable should not be permitted.

Among the recoverable schedules, transaction failures generate a spectrum of recoverability,


from easy to complex.


Step 2 of 4

Recoverable:

A schedule S is recoverable if no transaction T in S commits until all transactions T’, that have
written an item that T reads, have committed.

A transaction T reads from transaction T’ in a schedule S if some item X is first written by T’ and
read later by T.

In addition, T' should not be aborted before T reads item X, and there should be no transactions that write X after T' writes it and before T reads it (unless those transactions, if any, have aborted before T reads X).


Step 3 of 4

Cascadeless schedule:

A schedule is said to avoid cascading rollback if every transaction in the schedule reads only
items that were written by committed transactions. This guarantees that read items will not be
discarded.

An uncommitted transaction may have to be rolled back because it read an item from a transaction that failed.

This form of rollback is undesirable, since it can lead to undoing a significant amount of work. It is
desirable to restrict the schedules to those where cascading rollbacks cannot occur.


Step 4 of 4

Strict schedule:

Transactions can neither read nor write an item X until the last transaction that wrote X has
committed or aborted.

Strict schedules simplify the recovery process.

The process of undoing a write_item(X) operation of an aborted transaction is simply to restore the before image (the old value) of X.

Though this always works correctly for strict schedules, it may not work for recoverable or
cascadeless schedules.

Every strict schedule is cascadeless, and every cascadeless schedule is recoverable; the reverse implications do not always hold.

Chapter 20, Problem 8RQ

Problem

Discuss the different measures of transaction equivalence. What is the difference between
conflict equivalence and view equivalence?

Step-by-step solution

Step 1 of 3

Different measures of transaction equivalence are:

1.) Conflict equivalence: Two schedules are said to be conflict equivalent if the order of any two conflicting operations is the same in both schedules. Two operations in a schedule are said to conflict if they belong to different transactions, access the same database item, and at least one of the two operations is a write_item operation. If two conflicting operations are applied in different orders in two schedules, the effect can be different on the database or on other transactions in the schedules, and hence the schedules are not conflict equivalent.


Step 2 of 3

2.) View equivalence: Another, less restrictive definition of schedule equivalence is view equivalence. Two schedules S and S' are said to be view equivalent if the following three conditions hold:

1.) The same set of transactions participates in S and S', and S and S' include the same operations of those transactions.

2.) For any operation ri(X) of Ti in S, if the value of X read by the operation has been written by an operation wj(X) of Tj, the same condition must hold for the value of X read by operation ri(X) of Ti in S'.

3.) If the operation wk(Y) of Tk is the last operation to write item Y in S, then wk(Y) of Tk must also be the last operation to write item Y in S'.

The idea behind view equivalence is that, as long as each read operation of a transaction reads the result of the same write operation in both schedules, the write operations of each transaction must produce the same results. The read operations are thus said to have the same view in both schedules. Condition 3 ensures that the final write operation on each data item is the same in both schedules, so the database state should be the same at the end of both schedules.


Step 3 of 3

The difference between view equivalence and conflict equivalence arises under the unconstrained write assumption. View serializability is less restrictive under this assumption, where the value written by an operation wi(X) can be independent of the item's old value in the database. Such a write is called a blind write, and it is illustrated by the following schedule Sg of three transactions T1: r1(X); w1(X); T2: w2(X); and T3: w3(X):

Sg: r1(X); w2(X); w1(X); w3(X); c1; c2; c3;

In Sg, the operations w2(X) and w3(X) are blind writes, since T2 and T3 do not read the value of X. The schedule Sg is view serializable but not conflict serializable. Conflict-serializable schedules are view serializable, but not vice versa. Testing for view serializability has been shown to be NP-hard, meaning that finding an efficient polynomial-time algorithm for this problem is highly unlikely.

Chapter 20, Problem 9RQ

Problem

What is a serial schedule? What is a serializable schedule? Why is a serial schedule considered
correct? Why is a serializable schedule considered correct?

Step-by-step solution

Step 1 of 4

Serial schedule:

A schedule S is referred to as serial if, for each transaction T participating in the schedule, the operations of T are executed consecutively in the schedule.

• From this perspective, only one transaction at a time is active, and committing that transaction initiates the execution of the next transaction.


Step 2 of 4

Serializable schedule:

A schedule S of a set of n transactions (T1, T2, ..., Tn) is referred to as serializable if it is equivalent to some serial schedule of the same n transactions.

There are n! possible serial schedules of the n transactions, and many more possible nonserial schedules. The nonserial schedules form two disjoint groups: those that are equivalent to one or more of the serial schedules, which are the serializable schedules, and those that are not equivalent to any serial schedule, which are not serializable.


Step 3 of 4

Reason for the correctness of serial schedule:

A serial schedule is considered correct on the assumption that the transactions are independent of one another. By the consistency preservation property, a transaction that runs in isolation, executed from beginning to end without interference from other transactions, leaves the database in a correct state.

Therefore a set of transactions executed one at a time is correct.


Step 4 of 4

Reason for the correctness of serializable schedule:

A serializable schedule is considered correct because it is equivalent to some serial schedule: for every initial database state, the two schedules produce the same final state, and the serial schedule is correct by the argument above.

Therefore, executing the interleaved (serializable) schedule gives the same result as executing the transactions one at a time in the equivalent serial order.

Chapter 20, Problem 10RQ

Problem

What is the difference between the constrained write and the unconstrained write assumptions?
Which is more realistic?

Step-by-step solution

Step 1 of 1

The constrained write assumption states that any write operation wi(X) in Ti is preceded by an ri(X) in Ti and that the value written by wi(X) in Ti depends only on the value of X read by ri(X). This assumes that computation of the new value of X is a function f(X) based on the old value of X read from the database.

The unconstrained write assumption states that the value written by an operation wi(X) can be independent of the item's old value in the database. Such a write is called a blind write, and it is illustrated by the following schedule Sg of three transactions T1: r1(X); w1(X); T2: w2(X); and T3: w3(X):

Sg: r1(X); w2(X); w1(X); w3(X); c1; c2; c3;

In Sg, the operations w2(X) and w3(X) are blind writes, since T2 and T3 do not read the value of X.

The constrained write assumption is more realistic, since an application or query usually needs to take the old value of a data item into account before updating it.

Chapter 20, Problem 11RQ

Problem

Discuss how serializability is used to enforce concurrency control in a database system. Why is
serializability sometimes considered too restrictive as a measure of correctness for schedules?

Step-by-step solution

Step 1 of 4

The concept of serializability of schedules is used to identify which schedules are correct when
transaction executions have interleaving of their operations in the schedules. A schedule S of n
transactions is serializable if it is equivalent to some serial schedule of the same n transactions.

Saying that a nonserial schedule S is serializable is equivalent to saying that it is correct, because it is equivalent to a serial schedule, which is considered correct.

There are several ways of defining when two schedules are equivalent:

Two schedules are result equivalent if they produce the same final state of the database. However, two schedules may accidentally produce the same final state, so result equivalence alone cannot be used to define equivalence of schedules.


Step 2 of 4

Conflict equivalence: Two schedules are said to be conflict equivalent if the order of any two conflicting operations is the same in both schedules. Two operations in a schedule are said to conflict if they belong to different transactions, access the same database item, and at least one of the two operations is a write_item operation. If two conflicting operations are applied in different orders in two schedules, the effect can be different on the database or on other transactions in the schedules, and hence the schedules are not conflict equivalent.


Step 3 of 4

View equivalence: Another, less restrictive definition of schedule equivalence is view equivalence. Two schedules S and S' are said to be view equivalent if the following three conditions hold:

1.) The same set of transactions participates in S and S', and S and S' include the same operations of those transactions.

2.) For any operation ri(X) of Ti in S, if the value of X read by the operation has been written by an operation wj(X) of Tj, the same condition must hold for the value of X read by operation ri(X) of Ti in S'.

3.) If the operation wk(Y) of Tk is the last operation to write item Y in S, then wk(Y) of Tk must also be the last operation to write item Y in S'.


Step 4 of 4

Serializability of schedules is sometimes considered to be too restrictive as a condition for


ensuring the correctness of concurrent executions. Some applications can produce schedules
that are correct by satisfying conditions less stringent than either conflict serializability or view
serializability.

An example is the class of transactions known as debit-credit transactions, for example, those that apply deposits and withdrawals to a data item whose value is the current balance of a bank account. The semantics of debit-credit operations is that they update the value of a data item X by either adding to or subtracting from its current value; both operations are commutative, and so it is possible to produce correct schedules that are not serializable.

With the additional knowledge, or semantics, that the operations between each ri(I) and wi(I) are commutative, we know that the order of executing the sequences consisting of (read, update, write) is not important, as long as each (read, update, write) sequence by a particular transaction Ti on a particular item I is not interrupted by conflicting operations. Hence a nonserializable schedule can also be considered correct. Researchers have been working on extending concurrency control theory to deal with cases where serializability is considered too restrictive as a condition for correctness of schedules.

Chapter 20, Problem 12RQ

Problem

Describe the four levels of isolation in SQL. Also discuss the concept of snapshot isolation and
its effect on the phantom record problem.

Step-by-step solution

Step 1 of 2

The statement ISOLATION LEVEL is used to specify the isolation value, where the value can be SERIALIZABLE, REPEATABLE READ, READ COMMITTED, or READ UNCOMMITTED. SERIALIZABLE is the default isolation level, but some systems use READ COMMITTED as the default level.

The four isolation levels are as follows:

1. Level 0: A transaction has level 0 isolation if it does not overwrite the dirty reads of higher-level transactions.

This isolation level corresponds to the value READ UNCOMMITTED. It lets a transaction read data modified by another statement whether or not that transaction has committed; such a read is called a dirty read.

Example:

Statement 1:

Begin tran

UPDATE stu SET marks = 200 WHERE rollno = 34

waitfor delay ’00:00:20’

COMMIT;

Statement 2:

SET TRANSACTION ISOLATION LEVEL READ UNCOMMITTED

SELECT * FROM stu;

Statement 2 executes while statement 1's update of the stu table is still uncommitted, and it displays the modified records before the transaction commits (a dirty read).

2. Level 1: The transaction having this isolation level has no lost updates. Such isolation level
has the value READ COMMITTED.

In this isolation level, an SQL query reads only committed values. If the data is locked by an incomplete transaction, the SELECT statement waits until that transaction completes.

3. Level 2: The transaction having this isolation level has no dirty reads as well as no lost
updates. Such isolation level has the value REPEATABLE READ.

Repeatable read is an extension of read committed. It ensures that if the same query is executed again within the transaction, it will not see changes in the data values made by other queries. No other user can modify those data values until the transaction is committed or rolled back by the previous user.

4. Level 3: In addition to the properties of level 2, isolation level 3 gives true repeatable reads. This isolation level corresponds to the value SERIALIZABLE. It works like REPEATABLE READ except that it also prevents phantoms when the same query is executed twice. It operates with range locks, and it locks the whole table if no condition is specified on an indexed field.


Step 2 of 2

Snapshot isolation:

Snapshot isolation is used in concurrency control protocols and in some commercial DBMSs. Under snapshot isolation, a transaction reads data items based on the committed values of the items in the database snapshot taken when the transaction starts.

Snapshot isolation ensures that the phantom record problem does not occur, since the transaction sees only the records that were committed in the database at the time it started.
Chapter 20, Problem 13RQ

Problem

Define the violations caused by each of the following: dirty read, nonrepeatable read, and
phantoms.

Step-by-step solution

Step 1 of 1

Violations caused by each:

Dirty read –

A transaction reads a value written by another, uncommitted transaction. If the reading transaction commits while the writing transaction aborts, the value used by the reading transaction becomes incorrect.

Nonrepeatable read –

A transaction reads a value from a record. Another transaction then changes the value of that record. When the initial transaction reads the record again, the value is different.

Phantoms –

A transaction reads a set of rows from a table based on a condition specified in the SQL WHERE clause. Another transaction inserts a new row satisfying that condition while the first transaction is still in progress. The new "phantom" row shows up if the initial transaction repeats its query.

Chapter 20, Problem 14E

Problem

Change transaction T2 in Figure 20.2(b) to read

read_item(X);

X := X + M;

if X > 90 then exit

else write_item(X);

Discuss the final result of the different schedules in Figures 20.3(a) and (b), where M = 2 and N =
2, with respect to the following questions: Does adding the above condition change the final
outcome? Does the outcome obey the implied consistency rule (that the capacity of X is 90)?

Step-by-step solution

Step 1 of 1

Let the modified transaction T2 be:

read_item(X);

X:= X+M;

if X > 90 then exit

else write_item(X);

Adding this condition does not change the final outcome unless the initial value of X is greater than 88 (since M = 2, the test X > 90 only suppresses the write when X + 2 exceeds 90).

The outcome does obey the implied consistency rule that the capacity of X is 90, since the value of X is not updated whenever it would become greater than 90.

Chapter 20, Problem 15E

Problem

Repeat Exercise 20.14, adding a check in T1 so that Y does not exceed 90.

Reference Exercise 20.14

Change transaction T2 in Figure 20.2(b) to read

read_item(X);

X := X + M;

if X > 90 then exit

else write_item(X);

Discuss the final result of the different schedules in Figures 20.3(a) and (b), where M = 2 and N =
2, with respect to the following questions: Does adding the above condition change the final
outcome? Does the outcome obey the implied consistency rule (that the capacity of X is 90)?

Step-by-step solution

Step 1 of 1

With the check added in T1 (so that Y does not exceed 90) and the modified T2 from Exercise 20.14, the two transactions are:

T1:
read_item(X);
X := X - N;
write_item(X);
read_item(Y);
Y := Y + N;
if Y > 90 then exit
else write_item(Y);

T2:
read_item(X);
X := X + M;
if X > 90 then exit
else write_item(X);

This condition does not change the final outcome unless the initial value of X > 88 or Y > 88.

The outcome obeys the implied consistency rule that the capacity of X is 90 and the capacity of Y is 90.
Chapter 20, Problem 16E

Problem

Add the operation commit at the end of each of the transactions T1 and T2 in Figure 20.2, and
then list all possible schedules for the modified transactions. Determine which of the schedules
are recoverable, which are cascade-less, and which are strict.

Step-by-step solution

Step 1 of 6

The two transactions of Figure 20.2, with a commit operation added at the end of each, are:

T1:
read_item(X);
X := X - N;
write_item(X);
read_item(Y);
Y := Y + N;
write_item(Y);
commit T1;

T2:
read_item(X);
X := X + M;
write_item(X);
commit T2;

In shorthand notation, the transactions can be written as:

T1: r1(X); w1(X); r1(Y); w1(Y); C1;
T2: r2(X); w2(X); C2;


Step 2 of 6

Given m transactions with n1, n2, ..., nm operations respectively, the number of possible schedules is

(n1 + n2 + ... + nm)! / (n1! * n2! * ... * nm!),

where ! is the factorial function.

In our case, Let us consider

m =2

n1 = 5

n2 = 3,

so the number of possible schedules is

(5+3)! / (5! * 3!) = 8*7*6*5*4*3*2*1/ 5*4*3*2*1*3*2*1 = 56.
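A quick sanity check of this count in Python:

from math import comb, factorial

# Number of interleavings of two transactions with 5 and 3 operations:
print(factorial(5 + 3) // (factorial(5) * factorial(3)))   # -> 56
print(comb(8, 3))   # equivalently, choose positions for T2's 3 operations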


Step 3 of 6

The 56 possible schedules, and the type of each schedule, are:

S 1 : r 1 (X); w 1 (X); r 1 (Y); w 1 (Y); C 1 ; r 2 (X); w 2 (X); C 2 ; strict (and hence

cascadeless)

S 2 : r 1 (X); w 1 (X); r 1 (Y); w 1 (Y); r 2 (X); C 1 ; w 2 (X); C 2 ; recoverable

S 3 : r 1 (X); w 1 (X); r 1 (Y); w 1 (Y); r 2 (X); w 2 (X); C 1 ; C 2 ; recoverable

S 4 : r 1 (X); w 1 (X); r 1 (Y); w 1 (Y); r 2 (X); w 2 (X); C 2 ; C 1 ; non-recoverable

S 5 : r 1 (X); w 1 (X); r 1 (Y); r 2 (X); w 1 (Y); C 1 ; w 2 (X); C 2 ; recoverable

S 6 : r 1 (X); w 1 (X); r 1 (Y); r 2 (X); w 1 (Y); w 2 (X); C 1 ; C 2 ; recoverable

S 7 : r 1 (X); w 1 (X); r 1 (Y); r 2 (X); w 1 (Y); w 2 (X); C 2 ; C 1 ; non-recoverable

S 8 : r 1 (X); w 1 (X); r 1 (Y); r 2 (X); w 2 (X); w 1 (Y); C 1 ; C 2 ; recoverable

S 9 : r 1 (X); w 1 (X); r 1 (Y); r 2 (X); w 2 (X); w 1 (Y); C 2 ; C 1 ; non-recoverable

S 10 : r 1 (X); w 1 (X); r 1 (Y); r 2 (X); w 2 (X); C 2 ; w 1 (Y); C 1 ; non-recoverable

S 11 : r 1 (X); w 1 (X); r 2 (X); r 1 (Y); w 1 (Y); C 1 ; w 2 (X); C 2 ; recoverable

S 12 : r 1 (X); w 1 (X); r 2 (X); r 1 (Y); w 1 (Y); w 2 (X); C 1 ; C 2 ; recoverable


S 13 : r 1 (X); w 1 (X); r 2 (X); r 1 (Y); w 1 (Y); w 2 (X); C 2 ; C 1 ; non-recoverable

S 14 : r 1 (X); w 1 (X); r 2 (X); r 1 (Y); w 2 (X); w 1 (Y); C 1 ; C 2 ; recoverable

S 15 : r 1 (X); w 1 (X); r 2 (X); r 1 (Y); w 2 (X); w 1 (Y); C 2 ; C 1 ; non-recoverable

S 16 : r 1 (X); w 1 (X); r 2 (X); r 1 (Y); w 2 (X); C 2 ; w 1 (Y); C 1 ; non-recoverable

S 17 : r 1 (X); w 1 (X); r 2 (X); w 2 (X); r 1 (Y); w 1 (Y); C 1 ; C 2 ; recoverable

S 18 : r 1 (X); w 1 (X); r 2 (X); w 2 (X); r 1 (Y); w 1 (Y); C 2 ; C 1 ; non-recoverable

S 19 : r 1 (X); w 1 (X); r 2 (X); w 2 (X); r 1 (Y); C 2 ; w 1 (Y); C 1 ; non-recoverable

S 20 : r 1 (X); w 1 (X); r 2 (X); w 2 (X); C 2 ; r 1 (Y); w 1 (Y); C 1 ; non-recoverable


Step 4 of 6

S 21 : r 1 (X); r 2 (X); w 1 (X); r 1 (Y); w 1 (Y); C 1 ; w 2 (X); C 2 ; strict (and hence

cascadeless)

S 22 : r 1 (X); r 2 (X); w 1 (X); r 1 (Y); w 1 (Y); w 2 (X); C 1 ; C 2 ; cascadeless

S 23 : r 1 (X); r 2 (X); w 1 (X); r 1 (Y); w 1 (Y); w 2 (X); C 2 ; C 1 ; cascadeless

S 24 : r 1 (X); r 2 (X); w 1 (X); r 1 (Y); w 2 (X); w 1 (Y); C 1 ; C 2 ; cascadeless

S 25 : r 1 (X); r 2 (X); w 1 (X); r 1 (Y); w 2 (X); w 1 (Y); C 2 ; C 1 ; cascadeless

S 26 : r 1 (X); r 2 (X); w 1 (X); r 1 (Y); w 2 (X); C 2 ; w 1 (Y); C 1 ; cascadeless

S 27 : r 1 (X); r 2 (X); w 1 (X); w 2 (X); r 1 (Y); w 1 (Y); C 1 ; C 2 ; cascadeless

S 28 : r 1 (X); r 2 (X); w 1 (X); w 2 (X); r 1 (Y); w 1 (Y); C 2 ; C 1 ; cascadeless

S 29 : r 1 (X); r 2 (X); w 1 (X); w 2 (X); r 1 (Y); C 2 ; w 1 (Y); C 1 ; cascadeless

S 30 : r 1 (X); r 2 (X); w 1 (X); w 2 (X); C 2 ; r 1 (Y); w 1 (Y); C 1 ; cascadeless

S 31 : r 1 (X); r 2 (X); w 2 (X); w 1 (X); r 1 (Y); w 1 (Y); C 1 ; C 2 ; cascadeless

S 32 : r 1 (X); r 2 (X); w 2 (X); w 1 (X); r 1 (Y); w 1 (Y); C 2 ; C 1 ; cascadeless

S 33 : r 1 (X); r 2 (X); w 2 (X); w 1 (X); r 1 (Y); C 2 ; w 1 (Y); C 1 ; cascadeless

S 34 : r 1 (X); r 2 (X); w 2 (X); w 1 (X); C 2 ; r 1 (Y); w 1 (Y); C 1 ; cascadeless

S 35 : r 1 (X); r 2 (X); w 2 (X); C 2 ; w 1 (X); r 1 (Y); w 1 (Y); C 1 ; strict (and hence

cascadeless)

S 36 : r 2 (X); r 1 (X); w 1 (X); r 1 (Y); w 1 (Y); C 1 ; w 2 (X); C 2 ; strict (and hence

cascadeless)

S 37 : r 2 (X); r 1 (X); w 1 (X); r 1 (Y); w 1 (Y); w 2 (X); C 1 ; C 2 ; cascadeless

S 38 : r 2 (X); r 1 (X); w 1 (X); r 1 (Y); w 1 (Y); w 2 (X); C 2 ; C 1 ; cascadeless

S 39 : r 2 (X); r 1 (X); w 1 (X); r 1 (Y); w 2 (X); w 1 (Y); C 1 ; C 2 ; cascadeless

S 40 : r 2 (X); r 1 (X); w 1 (X); r 1 (Y); w 2 (X); w 1 (Y); C 2 ; C 1 ; cascadeless


Step 5 of 6

S 41 : r 2 (X); r 1 (X); w 1 (X); r 1 (Y); w 2 (X); C 2 ; w 1 (Y); C 1 ; cascadeless

S 42 : r 2 (X); r 1 (X); w 1 (X); w 2 (X); r 1 (Y); w 1 (Y); C 1 ; C 2 ; cascadeless

S 43 : r 2 (X); r 1 (X); w 1 (X); w 2 (X); r 1 (Y); w 1 (Y); C 2 ; C 1 ; cascadeless

S 44 : r 2 (X); r 1 (X); w 1 (X); w 2 (X); r 1 (Y); C 2 ; w 1 (Y); C 1 ; cascadeless

S 45 : r 2 (X); r 1 (X); w 1 (X); w 2 (X); C 2 ; r 1 (Y); w 1 (Y); C 1 ; cascadeless

S 46 : r 2 (X); r 1 (X); w 2 (X); w 1 (X); r 1 (Y); w 1 (Y); C 1 ; C 2 ; cascadeless

S 47 : r 2 (X); r 1 (X); w 2 (X); w 1 (X); r 1 (Y); w 1 (Y); C 2 ; C 1 ; cascadeless

S 48 : r 2 (X); r 1 (X); w 2 (X); w 1 (X); r 1 (Y); C 2 ; w 1 (Y); C 1 ; cascadeless

S 49 : r 2 (X); r 1 (X); w 2 (X); w 1 (X); C 2 ; r 1 (Y); w 1 (Y); C 1 ; cascadeless

S 50 : r 2 (X); r 1 (X); w 2 (X); C 2 ; w 1 (X); r 1 (Y); w 1 (Y); C 1 ; cascadeless


Step 6 of 6

S 51 : r 2 (X); w 2 (X); r 1 (X); w 1 (X); r 1 (Y); w 1 (Y); C 1 ; C 2 ; non-recoverable

S 52 : r 2 (X); w 2 (X); r 1 (X); w 1 (X); r 1 (Y); w 1 (Y); C 2 ; C 1 ; recoverable

S 53 : r 2 (X); w 2 (X); r 1 (X); w 1 (X); r 1 (Y); C 2 ; w 1 (Y); C 1 ; recoverable

S 54 : r 2 (X); w 2 (X); r 1 (X); w 1 (X); C 2 ; r 1 (Y); w 1 (Y); C 1 ; recoverable

S 55 : r 2 (X); w 2 (X); r 1 (X); C 2 ; w 1 (X); r 1 (Y); w 1 (Y); C 1 ; recoverable

S 56 : r 2 (X); w 2 (X); C 2 ; r 1 (X); w 1 (X); r 1 (Y); w 1 (Y); C 1 ; strict (and hence

cascadeless)
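These classifications can be checked mechanically. Below is a minimal Python sketch under the definitions used in this chapter (recoverable: a transaction commits only after every transaction it read from has committed; cascadeless: every read is of a committed value; strict: no read or write of an item until its last writer has committed). The parsing format rN(X)/wN(X)/CN follows the shorthand above; the helper itself is an illustration, not part of the textbook:

import re

def classify(schedule):
    """Return 'strict', 'cascadeless', 'recoverable', or 'non-recoverable'."""
    ops = re.findall(r'([rwc])(\d)(?:\((\w)\))?', schedule.lower())
    strict = cascadeless = recoverable = True
    last_writer = {}      # item -> transaction that wrote it last
    committed = set()
    reads_from = {}       # txn -> set of txns whose writes it has read
    for op, t, item in ops:
        if op == 'c':
            committed.add(t)
            if not reads_from.get(t, set()) <= committed:
                recoverable = False          # t commits before a writer it read from
        elif op == 'r':
            w = last_writer.get(item)
            if w is not None and w != t:
                reads_from.setdefault(t, set()).add(w)
                if w not in committed:
                    strict = cascadeless = False   # dirty read
        else:  # write
            w = last_writer.get(item)
            if w is not None and w != t and w not in committed:
                strict = False               # write over an uncommitted write
            last_writer[item] = t
    if strict:
        return 'strict'
    if cascadeless:
        return 'cascadeless'
    return 'recoverable' if recoverable else 'non-recoverable'

print(classify('r1(X); w1(X); r1(Y); w1(Y); C1; r2(X); w2(X); C2;'))  # S1: strict
print(classify('r1(X); w1(X); r1(Y); w1(Y); r2(X); w2(X); C2; C1;'))  # S4: non-recoverable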

Chapter 20, Problem 17E

Problem

List all possible schedules for transactions T1 and T2 in Figure 20.2, and determine which are
conflict serializable (correct) and which are not.

Step-by-step solution

Step 1 of 3

The two transactions T1 and T2 of Figure 20.2 are:

T1: read_item(X); X := X - N; write_item(X); read_item(Y); Y := Y + N; write_item(Y);
T2: read_item(X); X := X + M; write_item(X);

Step 2 of 3

In shorthand notation, the two transactions are:

T1: r1(X); w1(X); r1(Y); w1(Y);
T2: r2(X); w2(X);

Step 3 of 3

There are 6! / (4! * 2!) = 15 possible schedules. The conflicting pairs of operations are those on X: (r1(X), w2(X)), (w1(X), r2(X)), and (w1(X), w2(X)). A schedule is therefore conflict serializable exactly when both operations of T2 occur before r1(X) (equivalent to the serial order T2; T1) or both occur after w1(X) (equivalent to the serial order T1; T2); any schedule that places an operation of T2 between r1(X) and w1(X) produces a cycle between T1 and T2 and is not serializable.

The 15 possible schedules and the type of each:

S1: r2(X); w2(X); r1(X); w1(X); r1(Y); w1(Y); serializable (serial, T2 then T1)
S2: r2(X); r1(X); w2(X); w1(X); r1(Y); w1(Y); not serializable
S3: r2(X); r1(X); w1(X); w2(X); r1(Y); w1(Y); not serializable
S4: r2(X); r1(X); w1(X); r1(Y); w2(X); w1(Y); not serializable
S5: r2(X); r1(X); w1(X); r1(Y); w1(Y); w2(X); not serializable
S6: r1(X); r2(X); w2(X); w1(X); r1(Y); w1(Y); not serializable
S7: r1(X); r2(X); w1(X); w2(X); r1(Y); w1(Y); not serializable
S8: r1(X); r2(X); w1(X); r1(Y); w2(X); w1(Y); not serializable
S9: r1(X); r2(X); w1(X); r1(Y); w1(Y); w2(X); not serializable
S10: r1(X); w1(X); r2(X); w2(X); r1(Y); w1(Y); serializable (equivalent to T1; T2)
S11: r1(X); w1(X); r2(X); r1(Y); w2(X); w1(Y); serializable (equivalent to T1; T2)
S12: r1(X); w1(X); r2(X); r1(Y); w1(Y); w2(X); serializable (equivalent to T1; T2)
S13: r1(X); w1(X); r1(Y); r2(X); w2(X); w1(Y); serializable (equivalent to T1; T2)
S14: r1(X); w1(X); r1(Y); r2(X); w1(Y); w2(X); serializable (equivalent to T1; T2)
S15: r1(X); w1(X); r1(Y); w1(Y); r2(X); w2(X); serializable (serial, T1 then T2)

In total, 7 of the 15 schedules are conflict serializable (correct) and 8 are not.
Chapter 20, Problem 18E

Problem

How many serial schedules exist for the three transactions in Figure 20.8(a)? What are they?
What is the total number of possible schedules?

Step-by-step solution

Step 1 of 2

The three transactions of Figure 20.8(a) are:

T1: read_item(X); write_item(X); read_item(Y); write_item(Y);
T2: read_item(Z); read_item(Y); write_item(Y); read_item(X); write_item(X);
T3: read_item(Y); read_item(Z); write_item(Y); write_item(Z);


Step 2 of 2

By the definition of a serial schedule, the possible serial schedules of the three transactions are:

T1 T2 T3
T1 T3 T2
T2 T1 T3
T2 T3 T1
T3 T1 T2
T3 T2 T1

Total number of serial schedules for the three transactions = 6. In general, the number of serial schedules for n transactions is n! (factorial of n).

The total number of possible schedules (serial and nonserial) is the number of interleavings of the 13 operations: 13! / (4! * 5! * 4!) = 90,090.
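Both counts can be checked quickly in Python:

from itertools import permutations
from math import factorial

print(list(permutations(['T1', 'T2', 'T3'])))   # the 6 possible serial orders
print(factorial(3))                             # -> 6
print(factorial(13) // (factorial(4) * factorial(5) * factorial(4)))  # -> 90090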

Chapter 20, Problem 19E

Problem

Write a program to create all possible schedules for the three transactions in Figure 20.8(a), and
to determine which of those schedules are conflict serializable and which are not. For each
conflict-serializable schedule, your program should print the schedule and list all equivalent serial
schedules.

Step-by-step solution

Step 1 of 1

Program for finding the serializable schedules, rewritten here in runnable Python (the original pseudocode sampled random schedules and contained several errors; this version enumerates all 13! / (4! * 5! * 4!) = 90,090 schedules exhaustively, which is what the exercise asks for):

from itertools import permutations

# Operations of the three transactions of Figure 20.8(a), as (op, txn, item).
T = {
    1: [('r', 1, 'X'), ('w', 1, 'X'), ('r', 1, 'Y'), ('w', 1, 'Y')],
    2: [('r', 2, 'Z'), ('r', 2, 'Y'), ('w', 2, 'Y'), ('r', 2, 'X'), ('w', 2, 'X')],
    3: [('r', 3, 'Y'), ('r', 3, 'Z'), ('w', 3, 'Y'), ('w', 3, 'Z')],
}

def schedules():
    """Yield every interleaving that preserves each transaction's own order."""
    def extend(prefix, remaining):
        if not any(remaining.values()):
            yield prefix
            return
        for t, ops in remaining.items():
            if ops:
                rest = dict(remaining)
                rest[t] = ops[1:]
                yield from extend(prefix + [ops[0]], rest)
    yield from extend([], dict(T))

def edges(schedule):
    """Precedence-graph edges Ti -> Tj, one per pair of conflicting operations."""
    e = set()
    for i, (op1, t1, x1) in enumerate(schedule):
        for op2, t2, x2 in schedule[i + 1:]:
            if t1 != t2 and x1 == x2 and 'w' in (op1, op2):
                e.add((t1, t2))
    return e

def serial_orders(e):
    """All serial orders consistent with the edges; empty if the graph has a cycle."""
    return [p for p in permutations(T)
            if not any((b, a) in e for i, a in enumerate(p) for b in p[i + 1:])]

count = 0
for s in schedules():
    orders = serial_orders(edges(s))
    if orders:                          # acyclic precedence graph: serializable
        count += 1
        text = '; '.join('%s%d(%s)' % op for op in s)
        print(text, ' equivalent serial schedules:', orders)
print('conflict-serializable schedules:', count)

The schedule is serializable only if its precedence graph has no cycle; for each conflict-serializable schedule, the program prints the schedule together with all equivalent serial schedules (the topological orders of the graph).

Chapter 20, Problem 20E

Problem

Why is an explicit transaction end statement needed in SQL but not an explicit begin statement?

Step-by-step solution

Step 1 of 1

A transaction is an atomic unit of work. In SQL there is no explicit begin statement: a transaction begins implicitly when the first SQL statement is executed. The general pattern is:

<first SQL statement>; (* transaction begins implicitly *)
- - - - - ; (* reads or writes *)
- - - - - ;
COMMIT; (* or ROLLBACK *)

A transaction can end in one of two ways:

it successfully installs its updates in the database (COMMIT),

or

it removes its partial (and possibly incorrect) updates from the database (ROLLBACK, i.e., abort).

Because there are two possible outcomes, the database system must be told explicitly which way the transaction ends. It is for this reason that an explicit end statement (COMMIT or ROLLBACK) is needed in SQL, while an explicit begin statement is not.

Chapter 20, Problem 21E

Problem

Describe situations where each of the different isolation levels would be useful for transaction
processing.

Step-by-step solution

Step 1 of 2

Transaction isolation levels measure the influence of other concurrent transactions on a given transaction. This influence is highest at READ UNCOMMITTED and lowest at SERIALIZABLE.

Isolation level Serializable:

This level preserves consistency in all situations and is thus the safest execution mode. It is recommended for execution environments where every update is crucial for a correct result, for example, airline reservations, debit-credit applications, salary increases, and so on.

Isolation level Repeatable Read:

This level is similar to SERIALIZABLE except that the phantom problem may occur. Thus, with record-level locking (finer granularity), this isolation level must be used with care. It can be used in all types of environments except those where accurate summary information (e.g., computing the total balance over all the different accounts of a bank customer) is required.


Step 2 of 2

Isolation level Read Committed:

At this level a transaction may see two different values of the same data item during its execution. A transaction at this level applies write locks and keeps them until it commits. It also applies read (shared) locks, but each read lock is released as soon as the data item has been read. This isolation level may be used for queries such as account balances, weather reports, departure or arrival times, and so on.

Isolation level Read Uncommitted:

At this level a transaction applies neither shared locks nor write locks, and it is not allowed to write any data item. It may therefore give rise to dirty reads, unrepeatable reads, and phantoms. It may be used in environments where only a statistical average over a large amount of data is required.

Chapter 20, Problem 22E

Problem

Which of the following schedules is (conflict) serializable? For each serializable schedule,
determine the equivalent serial schedules.

a. r1(X); r3(X); w1(X); r2(X); w3(X);

b. r1(X); r3(X); w3(X); w1(X); r2(X);

c. r3(X); r2(X); w3(X); r1(X); w1(X);

d. r3(X); r2(X); r1(X); w3(X); w1(X);

Step-by-step solution

Step 1 of 5

Serializable schedule:

A conflict (precedence) graph corresponding to a schedule decides whether the schedule is conflict serializable or not: if the graph contains a cycle, the schedule is not serializable. The graph is constructed as follows:

1) Create a node labeled Ti for each transaction Ti that participates in schedule S.

2) Create an edge from Ti to Tj whenever Ti executes a write_item(X) and Tj later executes a read_item(X).

3) Create an edge from Ti to Tj whenever Ti executes a read_item(X) and Tj later executes a write_item(X).

4) Create an edge from Ti to Tj whenever Ti executes a write_item(X) and Tj later executes a write_item(X).

5) If no cycle is present in the conflict graph, the schedule is serializable.
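A small Python sketch of this construction, applied to the four given schedules (the parsing format follows the shorthand used in the problem; the two-step cycle test below is sufficient only because there are just three transactions):

import re

def precedence_edges(schedule):
    ops = re.findall(r'([rw])(\d)\((\w)\)', schedule)
    return {(a, b) for i, (o1, a, x1) in enumerate(ops)
            for o2, b, x2 in ops[i + 1:]
            if a != b and x1 == x2 and 'w' in (o1, o2)}

def has_cycle(edges):
    # With three nodes, any cycle is a two-node cycle or a directed triangle.
    return any((b, a) in edges for a, b in edges) or \
           any((a, b) in edges and (b, c) in edges and (c, a) in edges
               for a, b in edges for _, c in edges)

for s in ['r1(X); r3(X); w1(X); r2(X); w3(X)',
          'r1(X); r3(X); w3(X); w1(X); r2(X)',
          'r3(X); r2(X); w3(X); r1(X); w1(X)',
          'r3(X); r2(X); r1(X); w3(X); w1(X)']:
    e = precedence_edges(s)
    print(s, '->', 'not serializable' if has_cycle(e) else 'serializable', e)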


Step 2 of 5

(a)

Given schedule: r1(X); r3(X); w1(X); r2(X); w3(X);

Conflict graph edges: T1 → T3 (r1(X) before w3(X), w1(X) before w3(X)), T3 → T1 (r3(X) before w1(X)), T1 → T2 (w1(X) before r2(X)), and T2 → T3 (r2(X) before w3(X)).

The conflict graph has a cycle between T1 and T3. Hence, the given schedule is not conflict serializable.

Step 3 of 5

(b)

Given schedule: r1(X); r3(X); w3(X); w1(X); r2(X);

Conflict graph edges: T1 → T3 (r1(X) before w3(X)), T3 → T1 (r3(X) before w1(X), w3(X) before w1(X)), T3 → T2 (w3(X) before r2(X)), and T1 → T2 (w1(X) before r2(X)).

The conflict graph has a cycle between T1 and T3. Hence, the schedule is not conflict serializable.

Step 4 of 5

(c)

Given schedule: r3(X); r2(X); w3(X); r1(X); w1(X);

Conflict graph edges: T3 → T1 (r3(X) before w1(X), w3(X) before r1(X) and w1(X)), T2 → T3 (r2(X) before w3(X)), and T2 → T1 (r2(X) before w1(X)).

The graph contains no cycles. Hence, the schedule is conflict serializable.

• The equivalent serial schedule is T2 → T3 → T1, that is, r2(X); r3(X); w3(X); r1(X); w1(X);

Step 5 of 5

(d)

Given schedule: r3(X); r2(X); r1(X); w3(X); w1(X);

Conflict graph edges: T1 → T3 (r1(X) before w3(X)), T3 → T1 (r3(X) before w1(X), w3(X) before w1(X)), T2 → T3 (r2(X) before w3(X)), and T2 → T1 (r2(X) before w1(X)).

The conflict graph has a cycle between T1 and T3. Hence, the schedule is not conflict serializable.


Chapter 20, Problem 23E

Problem

Consider the three transactions T1 T2, and T3, and the schedules S1 and S2 given below. Draw
the serializability (precedence) graphs for S1 and S2 and state whether each schedule is
serializable or not. If a schedule is serializable, write down the equivalent serial schedule(s).

T1: r1 (X); r1 (Z); w1 (X);

T2: r2 (Z); r2 (Y); w2 (Z); w2(Y);

T3: r3 (X); r3 (Y); w3 (Y);

S1: r1 (X); r2 (Z); r1 (Z); r3 (X); r3 (Y); w1 (X); w3 (Y); r2 (Y); w2 (Z); w2 (Y);

S2: r1 (X); r2 (Z); r3 (X); r1 (Z); r2 (Y); r3 (Y); w1 (X); w2 (Z); w3 (Y); w2 (Y);

Step-by-step solution

Step 1 of 2

The schedule S1 is as follows:

S1: r1(X); r2(Z); r1(Z); r3(X); r3(Y); w1(X); w3(Y); r2(Y); w2(Z); w2(Y)

The precedence graph for S1 has the edges T3 → T1, T1 → T2, and T3 → T2 (the figure is not reproduced here).

The schedule S1 is a serializable schedule, as there is no cycle in the precedence graph:

• T3 reads X before X is modified by T1, giving the edge T3 → T1.

• T1 reads Z before Z is modified by T2, giving the edge T1 → T2.

• T2 reads Y, and writes it, only after T3 has written it, giving the edge T3 → T2.

The equivalent serial schedule is T3 → T1 → T2.


Step 2 of 2

The schedule S2 is as follows:

S2: r1(X); r2(Z); r3(X); r1(Z); r2(Y); r3(Y); w1(X); w2(Z); w3(Y); w2(Y)

The precedence graph for S2 contains, among other edges, both T2 → T3 and T3 → T2, which form a cycle (the figure is not reproduced here).

The schedule S2 is not a serializable schedule, as there is a cycle in the precedence graph:

• T2 reads Y before T3 modifies it, giving the edge T2 → T3.

• T3 reads Y, which is later modified by T2, giving the edge T3 → T2.

Chapter 20, Problem 24E

Problem

Consider schedules S3, S4, and S5 below. Determine whether each schedule is strict,
cascadeless, recoverable, or nonrecoverable. (Determine the strictest recoverability condition
that each schedule satisfies.)

S3: r1 (X); r2 (Z); r1 (Z); r3 (X); r3 (Y); w1 (X); c1; w3 (Y); c3; r2(Y); w2(Z); w2(Y); c2;

S4: r1 (X); r2 (Z); r1 (Z); r3 (X); r3 (Y); w1 (X); w3 (Y); r2(Y); w2(Z); w2(Y); c1; c2; c3;

S5: r1 (X); r2 (Z); r3 (X); r1 (Z); r2 (Y); r3 (Y); w1 (X); c1; w2(Z); w3(Y); w2(Y); c3; c2;

Step-by-step solution

Step 1 of 5

Strict schedule: A schedule is a strict schedule if no transaction reads or writes an item X until the last transaction that wrote X has committed (or aborted).

The schedule S3 is strict because of the following reasons:

• After w1(X), no transaction reads or writes X, and c1 follows immediately.

• r2(Y) and w2(Y) occur only after w3(Y) has been followed by c3.

• Item Z is written only by T2, so no conflict arises on Z.

• (Note that r3(X) occurring before w1(X) does not violate strictness; only operations that follow an uncommitted write matter.)

The schedule S4 is not a strict schedule because of the following reason:

• The operations r2(Y) and w2(Y) occur after w3(Y) but before T3 commits (c3 is the last commit in S4).

The schedule S5 is not a strict schedule because of the following reason:

• The operation w2(Y) occurs after w3(Y) but before T3 commits.


Step 2 of 5

Cascadeless schedule: A schedule is a cascadeless schedule if every transaction reads only items that were written by already committed transactions.

The schedule S3 is cascadeless because of the following reason:

• The only value read from another transaction is Y, and r2(Y) occurs after w3(Y) has been followed by c3.

The schedule S4 is not a cascadeless schedule because of the following reason:

• The operation r2(Y) occurs after w3(Y) but before T3 commits; T2 performs a dirty read of Y.

The schedule S5 is cascadeless because of the following reason:

• Every read operation in S5 occurs before any other transaction writes the item read, so no transaction reads an uncommitted value.


Step 3 of 5

Recoverable and nonrecoverable schedules:

A schedule is recoverable if no transaction T commits until every transaction T' that wrote a value read by T has committed.

Schedule S3:

• The only reads-from relationship in S3 is T2 reading Y from T3 (r2(Y) occurs after w3(Y)).

• T3 commits (c3) before T2 commits (c2), so the condition is satisfied and S3 is recoverable.

• Strictest condition satisfied by S3: it is strict (and hence also cascadeless and recoverable).


Step 4 of 5

Schedule S4:

• T2 reads Y from T3 (r2(Y) occurs after w3(Y)), but the commit order is c1, c2, c3, so T2 commits before T3.

• If T3 were to fail after c2, the already committed transaction T2 would have read a value that must be undone, which is not allowed.

• Therefore S4 is nonrecoverable; it satisfies none of the recoverability conditions.


Step 5 of 5

Schedule S5:

• In S5, every read precedes every write, so no transaction reads a value written by another transaction; in particular, the value of X written by T1 is read by neither T2 nor T3.

• Rolling back any transaction therefore never invalidates a read of another transaction, so S5 is recoverable (indeed cascadeless) regardless of the order of the commits.

• The only conflict involving uncommitted data is the write-write conflict w3(Y) followed by w2(Y) before c3, which affects strictness but not recoverability.
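To make the three properties concrete, the following is a minimal sketch (in Python, not part of the textbook solution) that classifies a schedule given as a list of (operation, transaction, item) tuples. The encoding and helper names are illustrative assumptions, and aborts are ignored for simplicity.

# Minimal sketch: classify a schedule as strict / cascadeless / recoverable.
# Assumed encoding: ('r'|'w'|'c', txn, item), with item None for commits.
def classify(schedule):
    committed = set()
    last_writer = {}           # item -> txn that most recently wrote it
    reads_from = set()         # (reader, writer) pairs
    strict = cascadeless = recoverable = True
    for op, txn, item in schedule:
        if op == 'c':
            committed.add(txn)
            for reader, writer in reads_from:
                # recoverable: a txn commits only after all txns it read from
                if reader == txn and writer not in committed:
                    recoverable = False
        else:
            w = last_writer.get(item)
            if w is not None and w != txn and w not in committed:
                strict = False              # access before the writer commits
                if op == 'r':
                    cascadeless = False     # dirty read
            if op == 'r' and w is not None and w != txn:
                reads_from.add((txn, w))
            if op == 'w':
                last_writer[item] = txn
    return strict, cascadeless, recoverable

# Example: S5 from the problem statement.
S5 = [('r',1,'X'), ('r',2,'Z'), ('r',3,'X'), ('r',1,'Z'), ('r',2,'Y'),
      ('r',3,'Y'), ('w',1,'X'), ('c',1,None), ('w',2,'Z'), ('w',3,'Y'),
      ('w',2,'Y'), ('c',3,None), ('c',2,None)]
print(classify(S5))    # expected: (False, True, True)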

Chapter 21, Problem 1RQ

Problem

What is the two-phase locking protocol? How does it guarantee serializability?

Step-by-step solution

Step 1 of 2

Two-phase locking:

Two-phase locking is a locking protocol in which a transaction may not request any new lock after it has released a lock. Every transaction thus passes through two phases:

• a locking phase, and

• an unlocking phase.

Locking phase:

This is the expanding or growing phase in which the new locks are acquired but none is
released.

Unlocking phase:

This is the second phase, referred to as the shrinking phase, in which the transaction releases its existing locks and does not acquire any new locks.
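As a minimal illustration (not from the textbook), the sketch below enforces the two-phase rule in Python; the lock-manager interface lm.lock/lm.unlock is an assumed placeholder.

# Minimal sketch: a transaction wrapper that enforces the two-phase rule.
class TwoPhaseTxn:
    def __init__(self, lock_mgr):
        self.lm = lock_mgr
        self.shrinking = False       # becomes True at the first unlock

    def lock(self, item):
        if self.shrinking:
            # the growing phase is over: no new locks may be requested
            raise RuntimeError("2PL violation: lock after unlock")
        self.lm.lock(item)           # growing phase: acquire only

    def unlock(self, item):
        self.shrinking = True        # transaction enters its shrinking phase
        self.lm.unlock(item)         # shrinking phase: release only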


Step 2 of 2

Guarantee of serializability:

The appeal of the two-phase algorithm derives from a theorem which states that the two-phase locking protocol always leads to serializable schedules.

It can be proved that if every transaction in a schedule follows the two-phase locking protocol, then the schedule is guaranteed to be serializable.

With the two-phase locking protocol, serializability is guaranteed because the protocol prevents interference among concurrent transactions: when two-phase locking is enforced, it avoids the lost update, uncommitted dependency (dirty read), and inconsistent analysis problems.

Chapter 21, Problem 2RQ

Problem

What are some variations of the two-phase locking protocol? Why is strict or rigorous two-phase
locking often preferred?

Step-by-step solution

Step 1 of 2

Variations of the two-phase locking protocol:

In two-phase locking, locks are handled by the transactions themselves, and there are a number of variations of the protocol, including:

(1) Conservative (static) 2PL

It requires a transaction to lock all the items it accesses before the transaction begins execution, by predeclaring its read-set and write-set. It is a deadlock-free protocol.

(2) Basic 2PL

In this technique, a transaction locks data items incrementally. This may cause deadlocks, which must then be dealt with.


Step 2 of 2

Strict or rigorous two-phase locking is preferred because:

In strict 2PL, a transaction T does not release any of its exclusive (write) locks until after it commits or aborts. Hence no other transaction can read or write an item that is written by T unless T has committed. Strict 2PL is not deadlock-free.

A more restrictive variation of strict 2PL is rigorous 2PL, which also guarantees strict schedules. In rigorous 2PL, a transaction T does not release any of its locks (shared or exclusive) until after it commits or aborts, which makes it easier to implement than strict 2PL.

Chapter 21, Problem 3RQ

Problem

Discuss the problems of deadlock and starvation, and the different approaches to dealing with
these problems.

Step-by-step solution

Step 1 of 4

Deadlock:

• A deadlock refers to a situation in which a transaction Ti waits for an item that is locked by
transaction Tj. The transaction Tj in turn waits for an item that is locked by transaction Tk.

• When each transaction in a set of transactions is waiting for an item that is locked by other
transaction, then it is called deadlock.

Example:

Suppose there are two transactions T1 and T2 and two items X and Y.

• Initially, transaction T1 holds a lock on item X and transaction T2 holds a lock on item Y.

• For transaction T1 to complete its execution, it needs item Y, which is locked by transaction T2.

• For transaction T2 to complete its execution, it needs item X, which is locked by transaction T1.

Such a situation is known as a deadlock, because neither transaction T1 nor T2 can complete its execution.


Step 2 of 4

The different approaches to dealing with deadlock are as follows (a detection sketch follows this list):

• Deadlock prevention: A transaction acquires locks on all the items it needs before starting execution. If it cannot acquire a lock on some item, it locks no items at all, waits, and then tries to acquire all the locks again.

• Deadlock detection: A wait-for graph is maintained and checked for cycles.

• Timeouts: A transaction is aborted if it waits for a period longer than a system-defined timeout.
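A minimal sketch of deadlock detection with a wait-for graph follows (Python; the graph encoding is an illustrative assumption): a deadlock exists exactly when the graph contains a cycle.

# waits_for maps each transaction to the set of transactions it waits on.
def has_deadlock(waits_for):
    visiting, done = set(), set()
    def dfs(t):
        if t in visiting:
            return True              # back edge found: a cycle exists
        if t in done:
            return False
        visiting.add(t)
        if any(dfs(u) for u in waits_for.get(t, ())):
            return True
        visiting.remove(t)
        done.add(t)
        return False
    return any(dfs(t) for t in list(waits_for))

print(has_deadlock({'T1': {'T2'}, 'T2': {'T1'}}))   # True: mutual waiting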


Step 3 of 4

Starvation:

• Starvation refers to a situation in which a low priority transaction waits indefinitely while other
high priority transactions execute normally.

• The starvation problem can occur when the locking or waiting scheme is unfair, for example if some transactions are consistently given priority over others.


Step 4 of 4

The different approaches to dealing with starvation are as follows:

• Use a first-come, first-served queue to hold the waiting transactions; transactions acquire the lock on an item in the order in which they were placed in the queue.

• Increase the priority of the transactions that are waiting longer so that at some point of time it
becomes the transaction with highest priority and proceeds to execute.

Chapter 21, Problem 4RQ

Problem

Compare binary locks to exclusive/shared locks. Why is the latter type of locks preferable?

Step-by-step solution

Step 1 of 2

Binary locks:

A binary lock has only two states (locked and unlocked). It is simple, but it is too restrictive, and it is not used in practice.

Exclusive/shared locks:

Exclusive/shared locks provide more general locking capabilities and are used in practical database locking schemes. Here the read lock is a shared lock and the write lock is an exclusive lock.

Of the two, exclusive/shared locking is preferable because a read-locked item is shared: other transactions are still allowed to read the item, whereas a write-locked item is held exclusively by a single transaction. There are three locking operations:

read_lock(X),

write_lock(X), and

unlock(X)


Step 2 of 2

If the shared/exclusive locking scheme is used, the system must enforce the following rules:

(1) A transaction T must issue the operation read_lock(X) or write_lock(X) before any read_item(X) operation is performed in T.

(2) A transaction T must issue the operation write_lock(X) before any write_item(X) operation is performed in T.

(3) A transaction T must issue the operation unlock(X) after all read_item(X) and write_item(X) operations are completed in T.

(4) A transaction T will not issue a read_lock(X) operation if it already holds a read (shared) lock or a write (exclusive) lock on item X. This rule may be relaxed, as shown in the sketch below.
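The following is a minimal sketch (Python; the state encoding is an illustrative assumption) of how a lock manager can enforce these shared/exclusive rules, including relaxed upgrading and downgrading for the lone holder:

# One lock per item; many readers OR one writer.
class SXLock:
    def __init__(self):
        self.mode, self.holders = None, set()    # None means unlocked

    def read_lock(self, txn):
        if self.mode == 'write' and self.holders != {txn}:
            return False             # conflicting exclusive lock: must wait
        self.mode = 'read'           # acquire shared (or downgrade own write)
        self.holders.add(txn)        # shared: several readers may hold it
        return True

    def write_lock(self, txn):
        if self.mode is not None and self.holders != {txn}:
            return False             # others hold the lock: must wait
        self.mode, self.holders = 'write', {txn}   # fresh lock or upgrade
        return True

    def unlock(self, txn):
        self.holders.discard(txn)
        if not self.holders:
            self.mode = None         # last holder gone: item is unlocked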

Chapter 21, Problem 5RQ

Problem

Describe the wait-die and wound-wait protocols for deadlock prevention.

Step-by-step solution

Step 1 of 2

Wait-die and wound-wait protocols:

Transactions are ordered by their timestamps: if transaction Ti starts before transaction Tj, then TS(Ti) < TS(Tj), so the older transaction has the smaller timestamp value.

Two schemes that prevent deadlock are called wait-die and wound-wait.

Suppose transaction Ti tries to lock an item X but is not able to, because X is locked by some other transaction Tj with a conflicting lock. The following rules are applied by the two schemes.


Step 2 of 2

Wait-die:

If TS(Ti) < TS(Tj), then Ti (the older transaction) is allowed to wait; otherwise (Ti is younger than Tj) abort Ti (Ti dies) and restart it later with the same timestamp.

In wait-die, an older transaction is allowed to wait on a younger transaction, whereas a younger transaction requesting an item held by an older transaction is aborted and restarted.

Wound-wait is the opposite of wait-die:

If TS(Ti) < TS(Tj), then abort Tj (Ti wounds Tj) and restart it later with the same timestamp; otherwise (Ti is younger than Tj) Ti is allowed to wait.

That is, a younger transaction is allowed to wait on an older one, whereas an older transaction requesting an item held by a younger transaction preempts the younger transaction by aborting it.
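A minimal sketch of the two decision rules follows (Python; ts is an assumed function giving each transaction's timestamp, smaller meaning older):

def wait_die(requester, holder, ts):
    # older requester waits; younger requester dies (abort, restart later
    # with the SAME timestamp so it eventually becomes the oldest)
    return 'wait' if ts(requester) < ts(holder) else 'abort requester'

def wound_wait(requester, holder, ts):
    # older requester wounds (aborts) the younger holder; younger waits
    return 'abort holder' if ts(requester) < ts(holder) else 'wait'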

Chapter 21, Problem 6RQ

Problem

Describe the cautious waiting, no waiting, and timeout protocols for deadlock prevention.

Step-by-step solution

Step 1 of 3

Deadlock can be prevented by using the following protocols.

Cautious waiting:

Suppose a transaction Ti tries to lock an item X but is not able to do so, because X is locked by some other transaction Tj with a conflicting lock. The rule is:

If Tj is not blocked (that is, Tj is not itself waiting for some other locked item), then Ti is blocked and allowed to wait; otherwise abort Ti.

In other words, if Ti is waiting for Tj, it is allowed to wait only as long as Tj is not itself waiting for some other transaction to release an item.


Step 2 of 3

No waiting:

If a transaction is unable to obtain a lock, it is aborted immediately (it does not wait at all) and is resubmitted after a time delay.


Step 3 of 3

Timeout:

If a transaction waits for a period longer than a system-defined timeout period, the system assumes that the transaction may be deadlocked and aborts it, regardless of whether a deadlock actually exists.

If the timeout protocol is used for deadlock prevention, some transactions that were not deadlocked may nevertheless be aborted and have to be resubmitted.

Chapter 21, Problem 7RQ

Problem

What is a timestamp? How does the system generate timestamps?

Step-by-step solution

Step 1 of 1

Timestamp:

A timestamp is a unique identifier created by the DBMS to identify a transaction; timestamp values are assigned in the order in which the transactions are submitted to the system.

A timestamp can be viewed as a monotonically increasing value indicating the age of a transaction (or operation).

The system can generate timestamps in several ways:

• One way is to use a counter that is incremented each time its value is assigned to a transaction; the transaction timestamps are numbered 1, 2, 3, and so on. Since a computer counter has a finite maximum value, the system must periodically reset the counter to zero, at a moment when no transactions are executing for some short period of time.

• Another way is to use the current date/time value of the system clock, while ensuring that no two timestamp values are generated during the same tick of the clock.
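A minimal sketch of the counter-based scheme (Python; illustrative only):

import itertools

_counter = itertools.count(1)        # yields 1, 2, 3, ... forever

def new_timestamp():
    # one monotonically increasing value per submitted transaction;
    # a real DBMS must also handle counter overflow by periodic resets
    return next(_counter)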

Chapter 21, Problem 8RQ

Problem

Discuss the timestamp ordering protocol for concurrency control. How does strict timestamp
ordering differ from basic timestamp ordering?

Step-by-step solution

Step 1 of 3

The timestamp ordering protocol for concurrency control:

The protocol manages concurrent execution such that the timestamps of the transactions determine the serializability order.

The protocol maintains two timestamp values for each data item Q:

(1) write_TS(Q): the largest timestamp of any transaction that executed write(Q) successfully.

(2) read_TS(Q): the largest timestamp of any transaction that executed read(Q) successfully.

The timestamp ordering protocol ensures that any conflicting read and write operations are executed in timestamp order.


Step 2 of 3

How strict timestamp ordering differs from basic timestamp ordering:

Strict timestamp ordering (strict TO):

When a transaction T issues a read_item(X) or write_item(X) operation such that TS(T) > write_TS(X), the read or write operation is delayed until the transaction T′ that wrote the current value of X has committed or aborted. This variation guarantees strict schedules in addition to enforcing timestamp order.

Step 3 of 3

Basic timestamp ordering:

When a transaction T issues a write_item(X) operation:

• If read_TS(X) > TS(T) or write_TS(X) > TS(T), then a younger transaction has already read or written the item, so abort and roll back T and reject the operation.

• Otherwise, execute write_item(X) of T and set write_TS(X) to TS(T).

When a transaction T issues a read_item(X) operation:

• If write_TS(X) > TS(T), then a younger transaction has already written the item, so abort and roll back T and reject the operation.

• If write_TS(X) ≤ TS(T), then execute read_item(X) of T and set read_TS(X) to the larger of TS(T) and the current read_TS(X).

Unlike strict TO, basic TO never delays an operation to wait for a commit: it either executes the operation immediately or aborts the issuing transaction.
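The two basic TO checks can be summarized in a short sketch (Python; the read_ts/write_ts dictionaries are an illustrative stand-in for the per-item timestamps):

def to_write(ts_t, item, read_ts, write_ts):
    if read_ts.get(item, 0) > ts_t or write_ts.get(item, 0) > ts_t:
        return 'abort'               # a younger txn already read/wrote item
    write_ts[item] = ts_t
    return 'execute'

def to_read(ts_t, item, read_ts, write_ts):
    if write_ts.get(item, 0) > ts_t:
        return 'abort'               # a younger txn already wrote item
    read_ts[item] = max(read_ts.get(item, 0), ts_t)
    return 'execute'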

Chapter 21, Problem 9RQ

Problem

Discuss two multiversion techniques for concurrency control. What is a certify lock? What are the
advantages and disadvantages of using certify locks?

Step-by-step solution

Step 1 of 4

Multiversion concurrency control techniques are the ones that retain the old values (versions) of data items when the items are updated. The purpose of holding the older versions is to increase concurrency while maintaining serializability: some read operations that would otherwise be rejected or delayed can be satisfied from an older version that is compatible with the reading transaction.

Two multiversion techniques for concurrency control are as follows:

1. Multiversion Technique Based on Timestamp Ordering.

2. Multiversion Two-Phase Locking Using Certify Locks.


Step 2 of 4

Consider the description of the two multiversion techniques for concurrency control discussed
above:

1. Multiversion Technique Based on Timestamp Ordering:

In this technique, several versions of each data item X are maintained. For each version, two timestamps are kept:

• read_TS: the read timestamp of the version, which is the largest of the timestamps of all transactions that have successfully read the version.

• write_TS: the write timestamp of the version, which is the timestamp of the transaction that wrote the value of the version.

Whenever a transaction T is allowed to execute a write operation on item X, a new version of X is created with both its read_TS and write_TS set to TS(T), while the previous versions are retained.
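A minimal sketch of how a read selects a version under multiversion timestamp ordering (Python; the version encoding is an illustrative assumption):

# versions: list of (write_ts, value) pairs for one item.
def read_version(versions, ts_t):
    # a reader with timestamp ts_t sees the version with the largest
    # write_ts <= ts_t; in this scheme reads never wait and never abort
    eligible = [(wts, val) for wts, val in versions if wts <= ts_t]
    return max(eligible, key=lambda v: v[0])[1] if eligible else None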


Step 3 of 4

2. Multiversion Two-Phase Locking Using Certify Locks:

In this technique, there are three locking modes for each item:

• read

• write

• certify

A locked item may therefore be held in any of these three modes.

• In the standard locking scheme, if a transaction holds a write lock on an item, no other transaction may access the item at all. Here, by contrast, other transactions T′ are allowed to read an item X while a single transaction T holds a write lock on X.

• For this purpose, two versions of X are maintained. When the writing transaction is ready to commit, it must upgrade its write lock on X to a certify lock.


Step 4 of 4

Certify Lock:

A certify lock is the lock a writing transaction must obtain on an item when all of its updates are to be finalized, so that the new version can be installed as the stable, committed version. It plays a role similar to a commit step: the updates of a successfully completed transaction are made permanent once its certify locks are granted.

Advantages of certify locks:

• While a transaction holds only a write lock on an item, other transactions can still read the committed version of the item, so reads can proceed concurrently with a single writer.

• When the transaction is ready to commit, the certify lock gives it exclusive control of the item, so the updated version can be installed securely, protected from any interference.

Disadvantage of certify locks:

• While a transaction holds a certify lock on an item, no other transaction can access the item at all, not even to read it; moreover, the committing transaction may be delayed until all current readers release their read locks on the item.

Chapter 21, Problem 10RQ

Problem

How do optimistic concurrency control techniques differ from other concurrency control
techniques? Why are they also called validation or certification techniques? Discuss the typical
phases of an optimistic concurrency control method.

Step-by-step solution

Step 1 of 2

In all other concurrency control techniques, a certain degree of checking is done before a database operation can be executed. For example, in locking, a check is made to determine whether the item being accessed is locked; in timestamp ordering, the transaction timestamp is checked against the read and write timestamps of the item. Such checking represents overhead during transaction execution. In optimistic concurrency control techniques, also known as validation or certification techniques, no checking is done while the transaction is executing. In one validation scheme, updates in the transaction are not applied directly to the database items until the transaction reaches its end. During transaction execution, all updates are made to local copies of the data items that are kept for the transaction. At the end of transaction execution, a validation phase checks whether any of the transaction's updates violate serializability. Certain information needed by the validation phase must be kept in the system. If serializability is not violated, the transaction is committed and the database is updated from the local copies; otherwise, the transaction is aborted and restarted later.


Step 2 of 2

Phases of an optimistic concurrency control protocol:

1.) Read phase: A transaction can read values of committed data items from the database.
However, updates are applied only to local copies of the data items kept in the transaction
workspace.

2.) Validation phase: Checking is performed to ensure that serializability will not be violated if
the transaction updates are applied to the database.

3.) Write phase: If the validation phase is successful, the transaction updates are applied to the
database; otherwise, the updates are discarded and the transaction restarted.

The idea behind optimistic concurrency control is to do all the checks at once, so transaction execution proceeds with minimum overhead until the validation phase is reached. Since the validation phase decides whether the transaction can be committed or must be aborted, the method is also called a validation or certification technique.
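A minimal sketch of a simplified validation test follows (Python; the transaction fields start_ts, finish_ts, read_set, and write_set are illustrative assumptions, and a full validation algorithm distinguishes more cases):

def validate(t, committed):
    for other in committed:
        if other.finish_ts < t.start_ts:
            continue                 # finished before t started: no conflict
        if other.write_set & (t.read_set | t.write_set):
            return False             # overlap with a concurrent txn: abort t
    return True                      # safe to enter the write phase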

Chapter 21, Problem 11RQ

Problem

What is snapshot isolation? What are the advantages and disadvantages of concurrency control
methods that are based on snapshot isolation?

Step-by-step solution

Step 1 of 1

Snapshot isolation:

Snapshot isolation is used as the concurrency control method in some commercial DBMSs. Under snapshot isolation, a transaction reads the committed values of the data items as they existed in the database snapshot (state) at the time the transaction started.

Snapshot isolation ensures that the phantom record problem does not occur, because a transaction sees only the records that were committed in the database at the moment it began.

Advantages of concurrency control methods based on snapshot isolation are as follows:

• Since a statement (or the whole transaction) sees only the records that were committed in the database when the transaction started, snapshot isolation ensures that the phantom record problem does not arise.

• Snapshot isolation also ensures that the problems of nonrepeatable read and dirty read do not occur during transaction execution.

• Concurrency control methods based on snapshot isolation have lower overhead than two-phase locking, since there is no need to acquire read locks on items for read operations.

Disadvantages of concurrency control methods based on snapshot isolation are as follows:

• Nonserializable schedules can occur with snapshot-isolation-based concurrency control. A few anomalies, such as the write-skew anomaly and the read-only transaction anomaly, violate serializability; such anomalies can result in a corrupted or inconsistent database.

Chapter 21, Problem 12RQ

Problem

How does the granularity of data items affect the performance of concurrency control? What
factors affect selection of granularity size for data items?

Step-by-step solution

Step 1 of 3

The size of data item is often referred to as data item granularity. Smaller the size of data item it
is fine granularity, larger size is course granularity.


Step 2 of 3

How does it affect the performance of concurrency control?

1.) The larger the data item size, the lower the degree of concurrency permitted. For example, if the data item size is a disk block, a transaction T that needs to lock a record B must lock the whole disk block X that contains B, because a lock is associated with the whole data item (block). Now, if another transaction S wants to lock a different record C that happens to reside in the same block X in a conflicting lock mode, it is forced to wait. If the data item size were a single record, transaction S would be able to proceed, because it would be locking a different data item (record).

2.) The smaller the data item size, the larger the number of items in the database. Because every item is associated with a lock, the system will have a larger number of active locks to be handled by the lock manager, and more lock and unlock operations will be performed, causing higher overhead. In addition, more storage space will be required for the lock table. For timestamps, storage is required for the read_TS and write_TS of each item, and there is similar overhead in handling a large number of items.


Step 3 of 3

Factors affecting selection of granularity size for data items:

The best item size depends on the transactions involved. If a typical transaction accesses a small number of records, it is advantageous to have the data item granularity be one record. On the other hand, if a transaction typically accesses many records in the same file, it may be better to have block or file granularity, so that the transaction treats all those records as one (or a few) data items.

Chapter 21, Problem 13RQ

Problem

What type of lock is needed for insert and delete operations?

Step-by-step solution

Step 1 of 2

Types of locks needed for insert and delete operations:

When a new item is inserted into (or deleted from) the database, it cannot be accessed by other transactions until the item is created and the insert operation is completed. The locking techniques used for this purpose are:

(1) two-phase locking, and (2) index locking.

Under two-phase locking, a delete operation may be performed only if the transaction deleting the tuple holds an exclusive lock on the tuple to be deleted.


Step 2 of 2

A transaction that inserts a new tuple into the database is automatically given an exclusive lock
on the inserted tuple.

Insertion and deletion can lead to the phantom phenomenon.

Consider a transaction that scans a relation and a transaction that inserts a tuple into the relation: if only tuple locks are used, non-serializable schedules can result. The transaction scanning the relation reads information that indicates which tuples the relation contains, while the transaction inserting a tuple updates that same information.

Transactions inserting or deleting a tuple acquire an exclusive lock on the data item; however, this protocol provides low concurrency for insertions and deletions.

Index locking protocols provide higher concurrency while preventing the phantom problem, by requiring locks on certain index buckets.

Chapter 21, Problem 14RQ

Problem

What is multiple granularity locking? Under what circumstances is it used?

Step-by-step solution

Step 1 of 1

Multiple granularity locking is a locking scheme in which a lock may be set on an object that contains other objects, exploiting the hierarchical (containment) relationships in the database: the database contains files, files contain pages, and pages contain records.

The scheme must make locking decisions for transactions whose data containers are nested, and it is used where the appropriate granularity level differs for the various mixes of transactions.

• Multiple granularity locking improves concurrency control performance while ensuring correctness and efficiency.

• To make multiple granularity locking practical, some extra types of locks are required; these locks are termed intention locks.
Chapter 21, Problem 15RQ

Problem

What are intention locks?

Step-by-step solution

Step 1 of 2

Intention locks:

To make locking at multiple granularity levels practical, additional types of locks, called intention locks, are needed.

The main idea behind intention locks is for a transaction to indicate, on an ancestor node (such as a table), which type of lock it will require later on some descendant (such as a row in that table): not locking the whole object, but declaring the intention to lock part of the object. There are three types of intention locks.

Step 2 of 2

(1) Intention-shared (IS):

Indicates that a shared lock (S) will be requested on some descendant node(s).

(2) Intention-exclusive (IX):

Indicates that an exclusive lock (X) will be requested on some descendant node(s).

(3) Shared-intention-exclusive (SIX):

Indicates that the current node is locked in shared mode, but an exclusive lock (X) will be requested on some descendant node(s).

The intention locking protocol requires, among other rules:

(1) Before a transaction can acquire an S lock on a given row, it must first acquire an IS or stronger lock on the table containing that row.

(2) Before a transaction can acquire an X lock on a given row, it must first acquire an IX lock on the table containing that row.
Chapter 21, Problem 16RQ

Problem

When are latches used?

Step-by-step solution

Step 1 of 1

Latches are used to guarantee the physical integrity of a page, for example when that page is being written from the buffer to disk.

A latch is acquired for the page, the page is written to disk, and then the latch is released.

Latches are thus locks that are held only for a short duration; they do not need to follow the full concurrency control protocol.

Chapter 21, Problem 17RQ

Problem

What is a phantom record? Discuss the problem that a phantom record can cause for
concurrency control.

Step-by-step solution

Step 1 of 2

Phantom record:

Suppose a new record is inserted by some transaction T and it satisfies a condition that a set of records accessed by another transaction T′ must satisfy. If the equivalent serial order is T followed by T′, the new record should be seen by T′; if it is T′ followed by T, it should not be. The two transactions logically conflict on the new record, yet in the latter case there may be no actual record in common between them, since T′ may have locked all the existing qualifying records before T inserted the new one.

The record that causes this conflict is called a phantom record.


Step 2 of 2

The problem that a phantom record can cause for concurrency control:

Consider an example. Suppose transaction T inserts a new EMPLOYEE record whose Dno = 5, while transaction T′ is accessing all EMPLOYEE records whose Dno = 5. If the equivalent serial order is T followed by T′, then T′ must read the new EMPLOYEE record and include its salary in its sum calculation. If the equivalent serial order is T′ followed by T, the new salary should not be included.

In the latter case there is really no record in common between the two transactions, since T′ may have locked all the records with Dno = 5 before T inserted the new record. This is because the record that causes the conflict is a phantom record: it suddenly appeared in the database upon being inserted.

If the other operations in the two transactions do not conflict, the conflict due to the phantom record may not be recognized by the concurrency control protocol.

Chapter 21, Problem 18RQ

Problem

How does index locking resolve the phantom problem?

Step-by-step solution

Step 1 of 2

Index locking:

An index includes entries that have an attribute value, plus a set of pointers to all records in the file with that value. If the index entry is locked before the record itself can be accessed, then the conflict on the phantom record can be detected: transaction T′ would request a read lock on the index entry, and transaction T would request a write lock on the same entry, before either could place locks on the actual records.

Since the index locks conflict, the phantom conflict is detected.


Step 2 of 2

Example:

An index on Dno of EMPLOYEE would include an entry for each distinct Dno value, plus a set of pointers to all EMPLOYEE records with that value.

If the index entry is locked before the record itself can be accessed, then the conflict on the phantom record can be detected: transaction T′ would request a read lock on the index entry for Dno = 5, and transaction T would request a write lock on the same entry, before either could place locks on the actual records.

Since the index locks conflict, the phantom conflict is detected.

Chapter 21, Problem 19RQ

Problem

What is a predicate lock?

Step-by-step solution

Step 1 of 1

Predicate lock:

A predicate lock locks all records that satisfy an arbitrary logical predicate. Index locking is a special case of predicate locking, for which an index supports an efficient implementation of the predicate lock.

In general, predicate locking has a lot of locking overhead and is too expensive, so fancier index-locking tricks are used in practice.

Chapter 21, Problem 20E

Problem

Prove that the basic two-phase locking protocol guarantees conflict serializability of schedules.
(Hint: Show that if a serializability graph for a schedule has a cycle, then at least one of the
transactions participating in the schedule does not obey the two-phase locking protocol.)

Step-by-step solution

Step 1 of 1

The proof is by contradiction; assume binary locks for simplicity.

Consider n transactions T1, T2, ..., Tn such that they all obey the basic two-phase locking rule, i.e., no transaction has an unlock operation followed by a lock operation. Suppose that a non-(conflict-)serializable schedule S for T1, T2, ..., Tn does occur; then the precedence (serialization) graph for S must have a cycle. Hence, there must be some sequence within the schedule of the form:

S: ...; [o1(X); ...; o2(X);] ...; [ o2(Y); ...; o3(Y);] ... ; [on(Z); ...; o1(Z);]...

where each pair of operations between square brackets [o,o] is conflicting (either [w,w], [w,r], or [r,w]), in order to create an arc in the serialization graph. This implies that in transaction T1 a sequence of the following form occurs:

T1: ...; o1(X); ... ; o1(Z); ...

Furthermore, T1 has to unlock item X (so T2 can lock it before applying o2(X) to follow

the rules of locking) and has to lock item Z (before applying o1(Z), but this must occur

after Tn has unlocked it). Hence, a sequence in T1 of the following form occurs:

T1: ...; o1(X); ...; unlock(X); ... ; lock(Z); ...; o1(Z); ...

This implies that T1 does not obey the two-phase locking protocol (since lock(Z) follows

unlock(X)), contradicting our assumption that all transactions in S follow the two-phase

locking protocol.

Chapter 21, Problem 21E

Problem

Modify the data structures for multiple-mode locks and the algorithms for read_lock(X),
write_lock(X), and unlock(X) so that upgrading and downgrading of locks are possible. (Hint: The
lock needs to check the transaction id(s) that hold the lock, if any.)

Step-by-step solution

Step 1 of 1

A list of the transaction ids that have read-locked an item is maintained, as well as the (single) transaction id that has write-locked an item. Only read_lock and write_lock are shown below.

read_lock(X, Tn):
B:  if lock(X) = "unlocked"
        then begin
            lock(X) <- "read_locked, List(Tn)";
            no_of_reads(X) <- 1
        end
    else if lock(X) = "read_locked, List"
        then begin
            (* add Tn to the list of transactions that have a read_lock on X *)
            lock(X) <- "read_locked, Append(List, Tn)";
            no_of_reads(X) <- no_of_reads(X) + 1
        end
    else if lock(X) = "write_locked, Tn"
        (* downgrade the lock if the write_lock on X is held by Tn itself *)
        then begin
            lock(X) <- "read_locked, List(Tn)";
            no_of_reads(X) <- 1
        end
    else begin
        wait (until lock(X) = "unlocked" and the lock manager wakes up the transaction);
        goto B
    end;

write_lock(X, Tn):
B:  if lock(X) = "unlocked"
        then lock(X) <- "write_locked, Tn"
    else if ( (lock(X) = "read_locked, List") and (no_of_reads(X) = 1)
              and (the only transaction in List = Tn) )
        (* upgrade the lock if the read_lock on X is held only by Tn itself *)
        then lock(X) <- "write_locked, Tn"
    else begin
        wait (until ( [ lock(X) = "unlocked" ] or
                      [ (lock(X) = "read_locked, List") and (no_of_reads(X) = 1)
                        and (the only transaction in List = Tn) ] )
              and the lock manager wakes up the transaction);
        goto B
    end;

Chapter 21, Problem 22E

Problem

Prove that strict two-phase locking guarantees strict schedules.

Step-by-step solution

Step 1 of 1

Strict two-phase locking guarantees strict schedules: since a transaction T holds its exclusive locks until it commits or aborts, no other transaction can read or write an item written by T until T has committed, and the condition for a strict schedule is satisfied for every item and every transaction.

Chapter 21, Problem 23E

Problem

Prove that the wait-die and wound-wait protocols avoid deadlock and starvation.

Step-by-step solution

Step 1 of 2

Two schemes that prevent deadlocks are called wait-die and wound-wait. Suppose that transaction Ti tries to lock an item X but is not able to because X is locked by some other transaction Tj with a conflicting lock. The rules followed by these schemes are as follows:

• Wait-die: If TS(Ti) < TS(Tj), then (Ti older than Tj) Ti is allowed to wait; otherwise (Ti younger than Tj) abort Ti (Ti dies) and restart it later with the same timestamp.

• Wound-wait: If TS(Ti) < TS(Tj), then (Ti older than Tj) abort Tj (Ti wounds Tj) and restart it later with the same timestamp; otherwise (Ti younger than Tj) Ti is allowed to wait.


Step 2 of 2

In wait-die, an older transaction is allowed to wait on a younger transaction, whereas a younger transaction requesting an item held by an older transaction is aborted and restarted. The wound-wait approach does the opposite: a younger transaction is allowed to wait on an older one, whereas an older transaction requesting an item held by a younger transaction preempts the younger transaction by aborting it. Both schemes end up aborting the younger of the two transactions that may be involved in a deadlock. It can be shown that these two techniques are deadlock-free: in wait-die, transactions only wait on younger transactions, and in wound-wait, transactions only wait on older transactions, so no cycle is ever created in the wait-for graph. However, both techniques may cause some transactions to be aborted and restarted needlessly, even though those transactions may never actually have caused a deadlock.

Chapter 21, Problem 24E

Problem

Prove that cautious waiting avoids deadlock.

Step-by-step solution

Step 1 of 1

Cautious waiting avoids deadlock:

In cautious waiting, a transaction Ti can wait on a transaction Tj (and hence Ti becomes blocked)
only if Tj is not blocked at that time, say time b(Ti), when Ti waits.

Later, at some time b(Tj) > b(Ti), Tj can itself become blocked and wait on another transaction Tk only if Tk is not blocked at that time. However, Tj cannot be blocked by waiting on an already blocked transaction, since this is not allowed by the protocol. Hence, the wait-for graph among the blocked transactions in this system follows the order of the blocking times and can never have a cycle, so deadlock cannot occur.

Chapter 21, Problem 27E

Problem

Why is two-phase locking not used as a concurrency control method for indexes such as B+-
trees?

Step-by-step solution

Step 1 of 1

Two phase locking can also be applied to indexes such as B+ trees, where the nodes of an index
correspond to disk pages. However, holding locks on index pages until the shrinking phase of
2PL could cause an undue amount of transaction blocking because searching an index always
starts at the root. Therefore, if a transaction wants to insert a record (write operation), the root
would be locked in exclusive mode, so all other conflicting lock requests for the index must wait
until the transaction enters the shrinking phase. This blocks all other transactions from accessing
the index, so in practice other approaches to locking an index must be used.

Chapter 21, Problem 28E

Problem

The compatibility matrix in Figure 21.8 shows that IS and IX locks are compatible. Explain why
this is valid.

Step-by-step solution

Step 1 of 1

IS and IX are compatible. When transaction T holds an IS lock on a node and T′ requests an IX lock on the same node, T only intends to take shared locks on some descendants, while T′ intends to take an exclusive lock on some descendant that may well be different from the ones T accesses; any real conflict will be detected lower in the hierarchy, where the actual S and X locks are requested.

Similarly, T′ might hold IX and T might request IS: since the two transactions may intend to access entirely different descendants, the two intention modes are compatible.

Chapter 21, Problem 29E

Problem

The MGL protocol states that a transaction T can unlock a node N, only if none of the children of
node N are still locked by transaction T. Show that without this condition, the MGL protocol would
be incorrect.

Step-by-step solution

Step 1 of 2

The rule that a node N can be unlocked only when none of N's children are still locked by transaction T enforces a two-phase discipline on the lock tree, which is what makes MGL schedules serializable. If this rule were not followed, a schedule need not be serializable; a non-serializable schedule can produce incorrect results, and thus the protocol would be incorrect.


Step 2 of 2

This rule ensures serializability by governing the order in which a transaction T locks, manipulates, and releases data items. Suppose a transaction T wants to insert data into a leaf node, and suppose T unlocks the parent (say, the root) before the insertion is complete. If the leaf node turns out to be full, the insertion requires a split that must propagate up to the parent; but the parent may meanwhile have been locked by another transaction T′, so the operation cannot proceed correctly. Hence, without the condition, the protocol fails.

Chapter 22, Problem 1RQ

Problem

Discuss the different types of transaction failures. What is meant by catastrophic failure?

Step-by-step solution

Step 1 of 1

Types of failures:

Computer failure:

A hardware, software, or network error occurs during transaction execution, for example a main memory failure. Anything that was not committed to disk is lost, and the system must be restarted.

Transaction or system error:

An operation in the transaction fails, such as a divide-by-zero or an integer overflow; this kind of transaction failure may also occur because of erroneous parameter values or a logical programming error.

Logical errors:

Errors or exception conditions are detected by the transaction itself: the transaction halts and cancels all of its work because some condition (such as data not found) prevents it from proceeding.

Concurrency control enforcement:

The concurrency control method aborts transactions, for example when several transactions become deadlocked.

Disk failure:

Some disk blocks may lose their data because of a read or write malfunction or because of a read/write head crash.

Catastrophic failure:

This includes the many forms of physical misfortune that can befall the database server and its storage media, such as power failure, fire, theft, sabotage, or destruction of the disk holding the database; in such cases the data on disk is lost.
Chapter 22, Problem 2RQ

Problem

Discuss the actions taken by the read_item and write_item operations on a database.

Step-by-step solution

Step 1 of 1

In a database, the read_item and write_item operations proceed as follows.

Actions taken by the read_item operation on a database (assume the read operation is performed on data item X):

1. Find the address of the disk block that contains item X.

2. Copy that disk block into a buffer in main memory, if the block is not already in some main memory buffer.

3. Copy item X from the buffer to the program variable named X.

Actions taken by the write_item operation on a database (assume the write operation is performed on data item X):

1. Find the address of the disk block that contains item X.

2. Copy that disk block into a buffer in main memory, if the block is not already in some main memory buffer.

3. Copy item X from the program variable named X into its correct location in the buffer.

4. Store the updated block from the buffer back to disk (either immediately or at some later point in time).
Chapter 22, Problem 3RQ

Problem

What is the system log used for? What are the typical kinds of entries in a system log? What are
checkpoints, and why are they important? What are transaction commit points, and why are they
important?

Step-by-step solution

Step 1 of 4

System log: Recovery from transaction failures usually means that the database is restored to
the most recent consistent state just before the time of failure. To do this, the system must keep
information about changes that were applied to data items by various transactions. This
information is typically kept in the system log. Thus system logs help in data recovery in case of
failures.


Step 2 of 4

A typical strategy for recovery may be summarized informally as follows:

1.) If there is extensive damage to a wide portion of the database due to catastrophic failure,
such as a disk crash, the recovery method restores a past copy of the database that was backed
up to archival storage and reconstructs a more current state by reapplying or redoing the
operations of committed transactions from the backed up log, up to the time of failure.

2.) When the database is not physically damaged, but has become inconsistent due to non-
catastrophic failures the strategy is to reverse any changes that caused inconsistency by undoing
some operations. It may also be necessary to re-do some operations in order to restore a
consistent state of database. In this case, we do not need a complete archival copy of the
database. Rather, the entries kept in the online system log are consulted during recovery.

Typical kinds of entries in a system log include:

1.) [start_transaction, T]

2.) [write_item, T, X, old value, new value]

3.) [read_item, T, X] (used for checking accesses to the database)

4.) [commit, T] (and similarly [abort, T])

5.) [checkpoint]


Step 3 of 4

Checkpoint: This is a type of entry in the system log. A [checkpoint] record is written into the log periodically, at a point when the system writes out to the database on disk all DBMS buffers that have been modified. As a consequence, all transactions that have their [commit, T] entries in the log before a [checkpoint] entry do not need to have their WRITE operations redone in the case of a system crash, since all their updates will have been recorded in the database on disk during checkpointing. A checkpoint record may also include additional information, such as a list of active transaction ids and the locations of the first and most recent log records for each active transaction; this can facilitate undoing transaction operations in the event that a transaction must be rolled back.

Step 4 of 4

Commit point: A transaction reaches its commit point when all of its operations that access the database have executed successfully and the effect of all of them has been recorded in the log; the transaction is then committed, its effect is assumed to be permanently recorded in the database, and it cannot be rolled back.

Commit points are important, in particular, for recovery techniques based on deferred update. A typical deferred update protocol states:

1.) A transaction cannot change the database on disk until it reaches its commit point.

2.) A transaction does not reach its commit point until all its update operations are recorded in the log and the log is force-written to disk.
Chapter 22, Problem 4RQ

Problem

How are buffering and caching techniques used by the recovery subsystem?

Step-by-step solution

Step 1 of 1

Buffering and caching techniques in the recovery subsystem:

The recovery process is often closely intertwined with operating system functions. In general, one or more disk pages that include the data items to be updated are cached into main memory buffers and updated in memory before being written back to disk.

As the performance gap between disk and CPU grows, disk I/O has become a major performance bottleneck for data-intensive applications; disk I/O latency, in particular, is much more difficult to improve than disk bandwidth. Buffering and caching in main memory are therefore used extensively, by the DBMS and its recovery subsystem, to bridge the performance gap between CPU and disk.
Chapter 22, Problem 5RQ

Problem

What are the before image (BFIM) and after image (AFIM) of a data item? What is the difference
between in-place updating and shadowing, with respect to their handling of BFIM and AFIM?

Step-by-step solution

Step 1 of 3

BFIM and AFIM:

Before image (BFIM):

The old value of a data item before updating is called the before image (BFIM).

After image (AFIM):

The new value of a data item after updating is called the after image (AFIM).


Step 2 of 3

When flushing a modified buffer back to disk, one of two strategies is followed:

• in-place updating

• shadowing


Step 3 of 3

Difference between in-place updating and shadowing:

In-place updating writes the buffer back to the same original disk location, overwriting the old value of any changed data items on disk. Hence a single copy of each database disk block is maintained, and the old value (the BFIM) must be recorded in the log for recovery purposes.

Shadowing writes the updated buffer at a different disk location, so multiple versions of data items can be maintained: the old location keeps the BFIM while the new location holds the AFIM.

With shadowing, both the BFIM and the AFIM are kept on disk, so it is not strictly necessary to maintain a log for recovery.
Chapter 22, Problem 6RQ

Problem

What are UNDO-type and REDO-type log entries?

Step-by-step solution

Step 1 of 2

UNDO-type and REDO-type log entries:

In database recovery techniques, recovery is achieved by performing only UNDOs, only REDOs, or a combination of the two. These operations are recorded in the log as they happen, and the log entry information recorded for a write command is what UNDO and REDO need.

UNDO-type log entries:

An UNDO-type log entry includes the old value(s) (BFIM) of the item, recorded before a write operation is executed in the database.

UNDO-type log entries are necessary for rollback operations. They are used to restore all BFIMs onto the disk, that is, to remove all AFIMs of the transactions being rolled back.


Step 2 of 2

REDO-type log entries:

A REDO-type log entry includes the new value (AFIM) of the item written by the operation, recorded in the log when the write is executed.

REDO-type entries are necessary for repeating the operations of already committed transactions, for example in case of a disk or system failure. They are used to restore all AFIMs onto the disk.

Chapter 22, Problem 7RQ

Problem

Describe the write-ahead logging protocol.

Step-by-step solution

Step 1 of 1

Write-ahead logging (WAL) protocol:

When in-place updating (immediate or deferred) is used, a log is necessary for recovery, and it must be available to the recovery manager.

For example, the BFIM of a data item must be recorded in the appropriate log entry, and that log entry must be flushed to disk before the BFIM is overwritten with the AFIM in the database on disk. This is achieved by the write-ahead logging (WAL) protocol.

The write-ahead logging protocol states that:

(1) For UNDO: before a data item's AFIM is flushed to the database on disk (overwriting the BFIM), its BFIM must be written to the log, and the log must be saved to stable storage (the log disk).

(2) For REDO: before a transaction executes its commit operation, all of its AFIMs must be written to the log, and the log must be saved to stable storage.
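A minimal sketch of the ordering that the WAL rules impose (Python; the log object with append/force is an assumed interface):

def wal_update(txn, item, old_value, new_value, log, db):
    log.append(('update', txn, item, old_value, new_value))  # UNDO+REDO info
    log.force()                      # the log entry reaches stable storage FIRST
    db[item] = new_value             # only then may the AFIM overwrite the BFIM

def wal_commit(txn, log):
    log.append(('commit', txn))      # all of txn's REDO entries precede this
    log.force()                      # the transaction is durable once forced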

Chapter 22, Problem 8RQ

Problem

Identify three typical lists of transactions that are maintained by the recovery subsystem.

Step-by-step solution

Step 1 of 1

Lists of transactions maintained by the recovery subsystem:

For good performance of the recovery process, the DBMS recovery subsystem may need to maintain several lists of transactions. The three main typical lists are:

(1) a list of active transactions,

(2) a list of committed transactions (since the last checkpoint), and

(3) a list of aborted transactions (since the last checkpoint).

These three lists make the recovery process more efficient.

Chapter 22, Problem 9RQ

Problem

What is meant by transaction rollback? What is meant by cascading rollback? Why do practical
recovery methods use protocols that do not permit cascading rollback? Which recovery
techniques do not require any rollback?

Step-by-step solution

Step 1 of 4

Transaction rollback:

Transaction rollback means that if a transaction has failed after some of its writes reached disk, those writes need to be undone.

That is, to maintain atomicity, a transaction's operations are redone or undone:

Undo: restore all BFIMs onto disk (remove all AFIMs).

Redo: restore all AFIMs onto disk.

Database recovery is achieved either by performing only undos, only redos, or a combination of the two. These operations are recorded in the log as they happen.


Step 2 of 4

Cascading rollback:

Cascading rollback occurs when the failure and rollback of one transaction forces the rollback of other, uncommitted transactions, because they have read updates made by the failed transaction.

In the meanwhile, any values derived from the values that were rolled back must also be undone.


Step 3 of 4

Practical recovery methods use protocols that do not permit cascading rollback because cascading rollback is complex and time-consuming; instead, practical recovery methods guarantee cascadeless or strict schedules.


Step 4 of 4

The deferred update technique (NO-UNDO/REDO) is the recovery technique that does not require any rollback.

Chapter 22, Problem 10RQ

Problem

Discuss the UNDO and REDO operations and the recovery techniques that use each.

Step-by-step solution

Step 1 of 1

UNDO and REDO operations:

To describe a protocol for write-ahead logging, one must distinguish between two types of log entry information included for a write command: UNDO-type and REDO-type.

An UNDO-type log entry includes the old value (BFIM) of the item, since this is needed to undo the effect of the operation from the log.

A REDO-type log entry includes the new value (AFIM) of the item written by the operation, since this is needed to redo the effect of the operation from the log.

In the UNDO/REDO algorithm, both types of log entries are combined, and cascading rollback is possible when the read_item entries in the log are considered to be UNDO-type entries. Deferred update uses only REDO-type entries (NO-UNDO/REDO); immediate update with all AFIMs forced to disk before commit uses only UNDO-type entries (UNDO/NO-REDO); general immediate update uses both (UNDO/REDO).
Chapter 22, Problem 11RQ

Problem

Discuss the deferred update technique of recovery. What are the advantages and disadvantages
of this technique? Why is it called the NO-UNDO/REDO method?

Step-by-step solution

Step 1 of 5

Deferred update technique of recovery:

The main idea of this technique is to defer or postpone any actual updates to the database until the transaction completes its execution successfully and reaches its commit point.

With this technique, the updates are recorded only in the log and in the cache buffers. After the transaction reaches its commit point and the log is force-written to disk, the updates are recorded in the database.

The deferred update technique is also called NO-UNDO/REDO recovery.

The deferred update protocol maintains two main rules:

1. A transaction cannot change any items in the database (on disk) until it commits.

2. A transaction may not commit until all of its write operations are successfully recorded in the log; this means the system must check that the log is actually force-written to disk before the commit.

Example:


Step 2 of 5

Log file (with four transactions T1, T2, T3, and T4, named here in the order of their start entries):

[start_transaction, T1]

[write_item, T1, ...]

[commit, T1]

[checkpoint]

[start_transaction, T2]

[write_item, T2, ...]

[write_item, T2, ...]

[commit, T2]

[start_transaction, T3]

[write_item, T3, ...]

[start_transaction, T4]

[write_item, T4, ...]

...... system crash ......


Step 3 of 5

From this example:

Since T1 and T2 committed, their changes were written to disk (T1's updates were flushed at the checkpoint; T2's logged writes are redone).

However, T3 and T4 did not commit; hence their changes were never written to disk.

To recover, we simply ignore the transactions that did not commit.

Step 4 of 5

Advantages and disadvantages of the deferred update technique:

Advantages:

Recovery is made easier: any transaction that reached its commit point (according to the log) has its writes applied to the database (REDO), and all other transactions are ignored.

Cascading rollback does not occur, because no transaction sees the work of another until it is committed (there are no dirty reads).

Disadvantages:

Concurrency is limited: the technique must employ strict 2PL, which limits concurrency.

Step 5 of 5

The deferred update technique is called the NO-UNDO/REDO recovery method because the second rule of the protocol (a transaction does not reach its commit point until all its update operations are recorded in the log and the log is force-written to disk) is a restatement of the write-ahead logging (WAL) protocol, and because the database is never updated on disk until after the transaction commits, so there is never a need to UNDO any operations.

Hence this is known as the NO-UNDO/REDO method.
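A minimal sketch of NO-UNDO/REDO restart recovery (Python; the log record encoding is an illustrative assumption):

def recover_deferred(log, db):
    committed = {rec[1] for rec in log if rec[0] == 'commit'}
    for rec in log:                  # scan forward, in log order
        if rec[0] == 'write' and rec[1] in committed:
            _, txn, item, new_value = rec
            db[item] = new_value     # REDO the writes of committed txns
    # nothing to UNDO: uncommitted writes never reached the database on disk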

Chapter 22, Problem 12RQ

Problem

How can recovery handle transaction operations that do not affect the database, such as the
printing of reports by a transaction?

Step-by-step solution

Step 1 of 1

If a transaction has actions that do not affect the database, such as generating and printing messages or reports from information retrieved from the database, and it fails before completion, we do not want the user to get these reports, since the transaction has failed to complete. If such erroneous reports are produced, part of the recovery process would have to inform the user that these reports are wrong, since the user may take an action based on them that affects the database. Hence, such reports should be generated only after the transaction reaches its commit point. A common method of dealing with such actions is to issue the commands that generate the reports but keep them as batch jobs, which are executed only after the transaction reaches its commit point; if the transaction fails, the batch jobs are canceled.

Chapter 22, Problem 13RQ

Problem

Discuss the immediate update recovery technique in both single-user and multiuser
environments. What are the advantages and disadvantages of immediate update?

Step-by-step solution

Step 1 of 2

Immediate update technique:

Immediate update applies write operations to the database while the transaction is executing. When the transaction issues an update command, the database can be updated without waiting for the transaction to reach its commit point; however, the update must still be recorded in the log (on disk) before it is applied to the database, using the write-ahead logging protocol. Two kinds of log entries are maintained:

(1) REDO log entries: a record of each updated data item's new value.

(2) UNDO log entries: a record of each updated data item's old value.

The technique follows two rules:

(1) A transaction T may not update the database until all UNDO entries for those updates have been written to the log (on disk).

(2) A transaction T is not allowed to commit until all of its REDO and UNDO log entries are written to the log (on disk).


Step 2 of 2

Advantages and disadvantages of immediate update:

Advantages:

Immediate update allows higher concurrency, because transactions write to the database continuously rather than waiting until the commit point.

Disadvantages:

It can lead to cascading rollbacks, which are time-consuming and can be problematic.

Chapter 22, Problem 14RQ

Problem

What is the difference between the UNDO/REDO and the UNDO/NO-REDO algorithms for
recovery with immediate update? Develop the outline for an UNDO/NO-REDO algorithm.

Step-by-step solution

Step 1 of 3

Difference between the UNDO/REDO and UNDO/NO-REDO algorithms:

UNDO/REDO algorithm:

This recovery technique is based on immediate update; it can be used even in a single-user environment, where no concurrency control is required but a log is still maintained under WAL.

The recovery manager performs:

• UNDO of a transaction if it is in the active (uncommitted) table.

• REDO of a transaction if it is in the committed table.

Recovery schemes of this category apply both undo and redo to recover the database from failure.

Step 2 of 3

UNDO/NO-REDO algorithm:

In this algorithm, the AFIMs of a transaction are flushed to the database on disk, under WAL, before the transaction commits.

For this reason, the recovery manager only undoes transactions during recovery; no transaction is redone, because every committed transaction already has all of its updates in the database on disk.

It is possible that a transaction has completed execution and is ready to commit, but because its commit record has not yet reached the log, it is also undone.


Step 3 of 3

Outline of an UNDO/NO-REDO algorithm:

1. Before a transaction commits, force all of its AFIMs to the database on disk (after their UNDO-type log entries have reached the log, per WAL).

2. On recovery, scan the log backwards and undo (restore the BFIM for) every write of every transaction that has no commit record in the log.

3. No REDO pass is needed, since committed transactions already have all their updates on disk.

Chapter 22, Problem 15RQ

Problem

Describe the shadow paging recovery technique. Under what circumstances does it not require a
log?

Step-by-step solution

Step 1 of 3

Shadow paging recovery technique:

Shadow paging considers the database to be made up of a number of fixed-size disk pages (or disk blocks), say n, for recovery purposes.

To manage the access of data items (pages) by transactions, two directories, a current directory and a shadow directory, are used: each directory has n entries, where the ith entry points to the ith database page on disk. The shadow directory, saved on disk, continues to point to the old pages, while the current directory points to the updated copies. The directory arrangement is illustrated below.


Step 2 of 3

[Figure: the current directory, after data items (pages) 2 and 5 are updated, points to the new copies of those pages, while the shadow directory (not updated) still points to the old versions.]


Step 3 of 3

Here, the data items are pages. Shadow paging does not require a log in a single-user environment; in a multiuser environment, a log may still be needed for the concurrency control method.
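
A minimal Python sketch of the two-directory arrangement; the disk map, page names, and update calls are assumptions made for illustration:

disk = {i: f"page{i}" for i in range(1, 7)}  # disk location -> page image
next_free = 7

shadow = {i: i for i in range(1, 7)}   # page number -> disk location
current = dict(shadow)                 # copied when the transaction starts

def update_page(page_no, new_image):
    # Write the new version to an unused disk location; the old version,
    # still referenced by the shadow directory, is left untouched.
    global next_free
    disk[next_free] = new_image
    current[page_no] = next_free
    next_free += 1

update_page(2, "page2-new")
update_page(5, "page5-new")   # pages 2 and 5 updated, as in the figure

# Commit: atomically make the current directory the new shadow directory.
shadow = dict(current)
# On abort or crash before commit, `current` is simply discarded; the
# shadow directory still describes a consistent database, so no log-based
# undo or redo is required.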

Comment
Chapter 22, Problem 16RQ

Problem

Describe the three phases of the ARIES recovery method.

Step-by-step solution

Step 1 of 2

Three phases of the ARIES recovery method:

The ARIES recovery algorithm consists of three phases:

(1) Analysis phase

(2) Redo phase

(3) Undo phase.

Comment

Step 2 of 2

In the analysis phase, ARIES identifies the dirty pages in the buffer and the set of transactions active at the time of the crash. The appropriate point in the log where the redo operation should start is also determined.

In the redo phase, redo operations are applied from that point forward. In the undo phase, the log is scanned backwards and the operations of transactions that were active at the time of the crash are undone in reverse order.
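
A compact Python sketch of the three phases over a toy log; the log format, its entries, and the pass placeholders are illustrative assumptions, not ARIES code:

log = [
    (1, "update", "T1", "P1"),   # (LSN, operation, transaction, page)
    (2, "update", "T2", "P2"),
    (3, "commit", "T1", None),
    # system crashes here
]

# (1) Analysis: find the transactions active at the crash, the dirty
#     pages, and the log point where the redo pass must start.
updates = [(lsn, t, p) for lsn, op, t, p in log if op == "update"]
committed = {t for _, op, t, _ in log if op == "commit"}
active = {t for _, t, _ in updates} - committed          # {'T2'}
dirty_pages = {p for _, _, p in updates}                 # {'P1', 'P2'}
redo_start = min(lsn for lsn, _, _ in updates)           # LSN 1

# (2) Redo: reapply logged updates from redo_start forward.
for lsn, op, txn, page in log:
    if op == "update" and lsn >= redo_start:
        pass  # reapply the change to `page` if it is not already on disk

# (3) Undo: scan backwards, rolling back updates of active transactions.
for lsn, op, txn, page in reversed(log):
    if op == "update" and txn in active:
        pass  # undo the change and write a compensating log record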

Comment
Chapter 22, Problem 17RQ

Problem

What are log sequence numbers (LSNs) in ARIES? How are they used? What information do the
Dirty Page Table and Transaction Table contain? Describe how fuzzy checkpointing is used in
ARIES.

Step-by-step solution

Step 1 of 4

Log sequence numbers in ARIES:-

In ARIES, every log record has an associated log sequence number (LSN) that is monotonically increasing and indicates the address of the log record on disk.

A log record is written for each of the following actions:

(1) data update

(2) transaction commit

(3) transaction abort

(4) undo

(5) transaction end.

Comment

Step 2 of 4

In the case of undo, a compensating log record is written.

Dirty page table and Transaction table:-

For efficient recovery, two tables are needed. These tables are stored in the log during checkpointing.

(1) Transaction Table:

This table contains an entry for each active transaction, with information such as the transaction ID, the transaction status, and the LSN of the most recent log record for the transaction.

(2) Dirty Page Table:

This table contains an entry for each dirty page in the buffer, which includes the page ID and the LSN corresponding to the earliest update to that page.

Comment

Step 3 of 4

Fuzzy checkpointing:

Fuzzy checkpointing is used to reduce the cost of checkpointing and to allow the system to continue executing transactions while a checkpoint is taken. ARIES performs fuzzy checkpointing as follows:

(1) Write a begin_checkpoint record to the log.

(2) Write an end_checkpoint record to the log. With this record, the contents of the Transaction Table and the Dirty Page Table are appended to the end of the log.

(3) Write the LSN of the begin_checkpoint record to a special file. This special file is accessed during recovery to locate the last checkpoint information.

Comment

Step 4 of 4

In practice, with fuzzy checkpointing the system can resume transaction processing as soon as the begin_checkpoint record is written to the log, without having to wait for the checkpoint action of force-writing all modified memory buffers to disk to finish. Until that step is completed, the previous checkpoint record remains valid.

To accomplish this, the system maintains a pointer to the valid checkpoint, which continues to point to the previous checkpoint record in the log. Once the new checkpoint is concluded, the pointer is changed to point to the new checkpoint in the log.

Comment
Chapter 22, Problem 18RQ

Problem

What do the terms steal/no-steal and force/no-force mean with regard to buffer management for
transaction processing?

Step-by-step solution

Step 1 of 1

In transaction processing, the buffer manager operates under the following policies.

(1) Steal / no-steal:

A system is said to steal buffers if it allows buffers that contain dirty data (data that is updated but uncommitted) to be swapped out to physical storage.

If steal is allowed in the buffer management, UNDO operations may be necessary during recovery.

(2) Force / no-force:

A system is said to force buffers if every committed update is guaranteed to be forced onto the disk at commit time.

If force is not used, REDO operations may be necessary during recovery.

Comment
Chapter 22, Problem 19RQ

Problem

Describe the two-phase commit protocol for multidatabase transactions.

Step-by-step solution

Step 1 of 1

Prepare phase:

The global coordinator (initiating node) asks all participants to prepare, that is, to promise to commit or roll back the transaction even if a failure occurs.

Commit phase:

If all participants respond to the coordinator that they are prepared, the coordinator asks all nodes to commit the transaction. If any participant cannot prepare, the coordinator asks all nodes to roll back the transaction.
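
A minimal Python sketch of the coordinator's decision logic; the Participant class is an illustrative stand-in for a remote node and its network calls:

class Participant:
    def __init__(self, can_commit):
        self.can_commit = can_commit

    def prepare(self):
        # Phase 1: promise to commit (or refuse), even across failures.
        return "yes" if self.can_commit else "no"

    def commit(self):
        print("participant committed")

    def rollback(self):
        print("participant rolled back")

def two_phase_commit(participants):
    votes = [p.prepare() for p in participants]   # prepare phase
    if all(v == "yes" for v in votes):            # commit phase
        for p in participants:
            p.commit()
        return "commit"
    for p in participants:                        # any "no" vote: roll back
        p.rollback()
    return "rollback"

print(two_phase_commit([Participant(True), Participant(True)]))    # commit
print(two_phase_commit([Participant(True), Participant(False)]))   # rollback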

Comment
Chapter 22, Problem 20RQ

Problem

Discuss how disaster recovery from catastrophic failures is handled.

Step-by-step solution

Step 1 of 1

Catastrophic failures are handled by disaster recovery: the entire database, along with the log, is periodically copied to an inexpensive, large-capacity storage device. When a catastrophe strikes, the most recent backup copy is restored to where the database used to be.

Comment
Chapter 22, Problem 21E

Problem

Suppose that the system crashes before the [read_item, T3, A] entry is written to the log in
Figure 22.1(b). Will that make any difference in the recovery process?

Step-by-step solution

Step 1 of 1

Consider the data from textbook Figure 22.1(b).

If the system crashes before the [read_item, T3, A] entry is written to the log, there will be no difference in the recovery process, because read_item operations are needed only for determining whether cascading rollback of additional transactions is necessary.

Comment
Chapter 22, Problem 22E

Problem

Suppose that the system crashes before the [write_item, T2, D, 25, 26] entry is written to the log
in Figure 22.1(b). Will that make any difference in the recovery process?

Step-by-step solution

Step 1 of 2

When the system crashes before the transaction T2 performs the write operation on item D, there will be a difference in the recovery process.

Comment

Step 2 of 2

During the recovery process, the following transactions must be rolled back.

• The transaction T3 has not reached its commit point, so T3 has to be rolled back.

• Also, the transaction T2 has not reached its commit point, so T2 has to be rolled back.

Hence, the transactions T2 and T3 have to be rolled back in the recovery process.

Comment
Chapter 22, Problem 23E

Problem

Figure shows the log corresponding to a particular schedule at the point of a system crash for
four transactions T1 T2, T3, and T4. Suppose that we use the immediate update protocol with
checkpointing. Describe the recovery process from the system crash. Specify which transactions
are rolled back, which operations in the log are redone and which (if any) are undone, and
whether any cascading rollback takes place.

Figure A sample schedule and its corresponding log.

Step-by-step solution

Step 1 of 5

The recovery process from the system crash will be as follows:

• Undo all the write operations of the transactions that are not committed.

• Redo all the write operations of the transactions that committed after the checkpoint.

• Neither redo nor undo the transactions that committed before the checkpoint.

Comment

Step 2 of 5

The transactions that need to be rolled back are as follows:

• The transaction T3 has not reached its commit point, so T3 has to be rolled back.

• Also, the transaction T2 has not reached its commit point, so T2 has to be rolled back.

Comment

Step 3 of 5

The operations that are to be redone are as follows:

• write_item, T4, D, 25, 15: The transaction T4 must redo the write operation on item D.

• write_item, T4, A, 30, 20: The transaction T4 must redo the write operation on item A.

Comment

Step 4 of 5

The operations that are to be undone are as follows:

• write_item, T2, D, 15, 25

• write_item, T3, C, 30, 40

• write_item, T2, B, 12, 18

Comment
Step 5 of 5

As no transaction has read an item written by an uncommitted transaction, no cascading rollbacks occur in the schedule.

Comment
Chapter 22, Problem 24E

Problem

Suppose that we use the deferred update protocol for the example in Figure 22.6. Show how the
log would be different in the case of deferred update by removing the unnecessary log entries;
then describe the recovery process, using your modified log. Assume that only REDO operations
are applied, and specify which operations in the log are redone and which are ignored.

Step-by-step solution

Step 1 of 2

With deferred update, the unnecessary log entries are removed because the write operations of uncommitted transactions are never recorded in the database until those transactions commit. So the write operations of T2 and T3 would not have been applied to the database, and T4 would have read the previous values of items A and B, thus leading to a recoverable schedule.

By using the procedure RDU_M (deferred update with concurrent execution in a multiuser environment), the following result is obtained:

Comment

Step 2 of 2

The list of committed transactions T since the last checkpoint contains only transaction T4. The list of active transactions T' contains transactions T2 and T3.

Only the WRITE operations of the committed transactions are to be redone. Hence, REDO is applied to:

[write_item, T4, B, 15]

[write_item, T4, A, 20]

The transactions that are active and did not commit, i.e., transactions T2 and T3, are canceled and must be resubmitted. Their operations do not have to be undone since they were never applied to the database.

Comments (1)
Chapter 22, Problem 25E

Problem

How does checkpointing in ARIES differ from checkpointing as described in Section 22.1.4?

Step-by-step solution

Step 1 of 1

As described in Section 22.1.4 of the textbook, the main difference is that with ARIES, main memory buffers that have been modified are not flushed to disk when a checkpoint occurs. ARIES, however, writes additional information to the log in the form of a Transaction Table and a Dirty Page Table when a checkpoint occurs.

Comment
Chapter 22, Problem 26E

Problem

How are log sequence numbers used by ARIES to reduce the amount of REDO work needed for
recovery? Illustrate with an example using the information shown in Figure 22.5. You can make
your own assumptions as to when a page is written to disk.

Step-by-step solution

Step 1 of 1

ARIES can be used to reduce the amount of REDO work through log sequence numbers as
follows:

• ARIES reduces the amount of REDO work by starting the redo after the point up to which all prior changes have already been applied to the database. ARIES performs REDO starting at the position in the log that corresponds to the smallest LSN, M, in the Dirty Page Table.

• In the Figure 22.5, REDO must start at the log position 1 as the smallest LSN in Dirty Page
Table is 1.

• A page is changed and propagated to the database during redo only when the LSN stored for that page is smaller than the LSN of the log record being applied.

• In Figure 22.5, a transaction performs an update of page C, and page C has an LSN of 7.

• When REDO starts at log position 1, page C is propagated to the database. But the page C is
not changed as its LSN (7) is greater than the LSN of current log position (1).

• Now consider the LSN 2. Page B is associated with this LSN and it would be propagated to the
database. The page B would be updated if its LSN is less than 2. Similarly, the page
corresponding to LSN 6 would be updated.

• However, the page corresponding to the log record with LSN 7 need not be updated, as the LSN of page C, that is 7, is not less than the current log position (7).
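
A minimal Python sketch of this LSN comparison; the log entries and page LSNs below are made up to loosely follow the discussion, not the exact Figure 22.5 data:

log = [(1, "C"), (2, "B"), (6, "A"), (7, "C")]  # (LSN, page updated)
page_lsn = {"A": 0, "B": 0, "C": 7}             # LSN already on disk per page

redo_start = min(lsn for lsn, _ in log)  # smallest LSN (M) to start REDO at

for lsn, page in log:
    if lsn < redo_start:
        continue              # all prior changes are already in the database
    if page_lsn[page] >= lsn:
        continue              # page already reflects this change, so the
                              # REDO is skipped (e.g., page C at LSN 1 and 7)
    page_lsn[page] = lsn      # otherwise re-apply the logged update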

Comment
Chapter 22, Problem 27E

Problem

What implications would a no-steal/force buffer management policy have on checkpointing and
recovery?

Step-by-step solution

Step 1 of 1

No-Steal/Force Buffer Management Policy Implications

• No-steal means that a cache or buffer page that has been updated by a transaction cannot be written to disk before the transaction commits.

• Force means that pages updated by a transaction are written to disk when the transaction commits.

• Under no-steal, the checkpointing step that flushes modified main memory buffers to disk cannot write pages updated by uncommitted transactions; those pages must remain in the buffer until their transactions commit.

• With force, once a transaction is done, its updates are written to disk; REDO may still be needed if a failure occurs while the updates are being forced at commit. UNDO is not needed, since uncommitted updates are never written to disk.

Comment
Chapter 22, Problem 28E

Problem

Choose the correct answer for each of the following multiple-choice questions:

Incremental logging with deferred updates implies that the recovery system must

a. store the old value of the updated item in the log

b. store the new value of the updated item in the log

c. store both the old and new value of the updated item in the log

d. store only the Begin Transaction and Commit Transaction records in the log

Step-by-step solution

Step 1 of 1

Incremental logging with deferred updates implies that the recovery system must necessarily store the new value of the updated item in the log.

Option (b): store the new value of the updated item in the log.

Comment
Chapter 22, Problem 29E

Problem

Choose the correct answer for each of the following multiple-choice questions:

The write-ahead logging (WAL) protocol simply means that

a. writing of a data item should be done ahead of any logging operation

b. the log record for an operation should be written before the actual data is written

c. all log records should be written before a new transaction begins execution

d. the log never needs to be written to disk

Step-by-step solution

Step 1 of 1

The write ahead logging (WAL) protocol simply means that the log record for an operation should
be written before the actual data is written.

Option (b)

The log record for an operation should be written before the actual data is written.

Comment
Chapter 22, Problem 30E

Problem

Choose the correct answer for each of the following multiple-choice questions:

In case of transaction failure under a deferred update incremental logging scheme, which of the
following will be needed?

a. an undo operation

b. a redo operation

c. an undo and redo operation

d. none of the above

Step-by-step solution

Step 1 of 1

In case of transaction failure under a deferred update incremental logging scheme, neither operation is needed: writes are not applied to the database until after a transaction commits, so a failed (uncommitted) transaction requires neither an undo nor a redo operation.

Option (d): none of the above.

Comments (1)
Chapter 22, Problem 31E

Problem

Choose the correct answer for each of the following multiple-choice questions:

For incremental logging with immediate,updates, a log record for a transaction would contain

a. a transaction name, a data item name, and the old and new value of the item

b. a transaction name, a data item name, and the old value of the item

c. a transaction name, a data item name, and the new value of the item

d. a transaction name and a data item name

Step-by-step solution

Step 1 of 1

For incremental logging with immediate updates, a log record for a transaction would contain the transaction name, the data item name, and the old and new values of the item.

Option (a): a transaction name, a data item name, and the old and new value of the item.

Comment
Chapter 22, Problem 32E

Problem

Choose the correct answer for each of the following multiple-choice questions:

For correct behavior during recovery, undo and redo operations must be

a. commutative

b. associative

c. idempotent

d. distributive

Step-by-step solution

Step 1 of 1

For correct behavior during recovery, undo and redo operations must be idempotent: executing an operation multiple times must have the same effect as executing it once, since recovery itself may be interrupted and restarted.

Option (c): idempotent.

Comment
Chapter 22, Problem 33E

Problem

Choose the correct answer for each of the following multiple-choice questions:

When a failure occurs, the log is consulted and each operation is either undone or redone. This
is a problem because

a. searching the entire log is time consuming

b. many redos are unnecessary

c. both (a) and (b)

d. none of the above

Step-by-step solution

Step 1 of 1

When a failure occurs, the log is consulted and each operation is either undone or redone. This is a problem both because searching the entire log is time consuming and because many of the redos are unnecessary; checkpointing is used to limit both costs.

Option (c): both (a) and (b).

Comment
Chapter 22, Problem 34E

Problem

Choose the correct answer for each of the following multiple-choice questions:

Using a log-based recovery scheme might improve performance as well as provide a recovery
mechanism by

a. writing the log records to disk when each transaction commits

b. writing the appropriate log records to disk during the transaction’s execution

c. waiting to write the log records until multiple transactions commit and writing them as a batch

d. never writing the log records to disk

Step-by-step solution

Step 1 of 1

Using a log-based recovery scheme might improve performance as well as provide a recovery mechanism by waiting to write the log records until multiple transactions commit and writing them out as a batch (group commit).

Option (c): waiting to write the log records until multiple transactions commit and writing them as a batch.

Comment
Chapter 22, Problem 35E

Problem

Choose the correct answer for each of the following multiple-choice questions:

There is a possibility of a cascading rollback when

a. a transaction writes items that have been written only by a committed transaction

b. a transaction writes an item that is previously written by an uncommitted transaction

c. a transaction reads an item that is previously written by an uncommitted transaction

d. both (b) and (c)

Step-by-step solution

Step 1 of 1

There is a possibility of a cascading rollback when a transaction reads an item that was previously written by an uncommitted transaction; if the writing transaction fails and is rolled back, the reading transaction must be rolled back as well.

Option (c): a transaction reads an item that is previously written by an uncommitted transaction.

Comment
Chapter 22, Problem 36E

Problem

Choose the correct answer for each of the following multiple-choice questions:

To cope with media (disk) failures, it is necessary

a. for the DBMS to only execute transactions in a single user environment

b. to keep a redundant copy of the database

c. to never abort a transaction

d. all of the above

Step-by-step solution

Step 1 of 1

To cope with media (disk) failures, it is necessary

Option (b)

To keep a redundant copy of the database.

Comment
Chapter 22, Problem 37E

Problem

Choose the correct answer for each of the following multiple-choice questions:

If the shadowing approach is used for flushing a data item back to disk, then

a. the item is written to disk only after the transaction commits

b. the item is written to a different location on disk

c. the item is written to disk before the transaction commits

d. the item is written to the same disk location from which it was read

Step-by-step solution

Step 1 of 1

If the shadowing approach is used for flushing a data item back to disk, then the item is written to a different location on disk, so the old version of the item remains intact.

Option (b): the item is written to a different location on disk.

Comment
Chapter 30, Problem 1RQ

Problem

Discuss what is meant by each of the following terms: database authorization, access control,
data encryption, privileged (system) account, database audit, audit trail.

Step-by-step solution

Step 1 of 1

Database authorization

Database authorization ensures the security of the portions of the database against unauthorized
access.

Access control

The most common security problem is preventing unauthorized persons from accessing the system, either to obtain information or to make malicious changes that modify the database. The DBMS must include security mechanisms that restrict access to the entire database system. This is done by creating user accounts and passwords for the login process, so that the DBMS can keep unauthorized users out.

Data encryption

Sensitive data, such as ATM or credit card numbers issued by a bank, must be protected when transmitted through a communications network; encryption provides this additional protection for the database. The data is encoded so that unauthorized users who intercept it will have difficulty decoding it.

Privileged account

The DBA account is a privileged (system) account that provides powerful capabilities. Its privileged commands include those for granting and revoking privileges to individual accounts, users, or user groups, and for performing the following actions:

• Account creation

• Privilege granting

• Privilege revocation

• Security level assignment

Database audit

If tampering with the database is suspected, a database audit is performed. It consists of reviewing the log to examine all accesses and operations applied to the database during a certain period of time.

Audit trail

A database log that is used mainly for security purposes, recording details of all accesses and operations, is referred to as an audit trail.

Comment
Chapter 30, Problem 2RQ

Problem

Which account is designated as the owner of a relation? What privileges does the owner of a
relation have?

Step-by-step solution

Step 1 of 1

The owner account is designated as the owner of a relation; this is typically the account that was used when the relation was created in the first place. The owner of a relation is given all privileges on that relation. The owner account holder can pass privileges on any of the owned relations to other users by granting privileges to their accounts.

Comment
Chapter 30, Problem 3RQ

Problem

How is the view mechanism used as an authorization mechanism?

Step-by-step solution

Step 1 of 1

The view mechanism is an important discretionary authorization mechanism in its own right.

For example:-

If the owner A of a relation R wants another account B to be able to retrieve only some fields of R, then A can create a view V of R that includes only those attributes and then grant SELECT on V to B. The same applies to limiting B to retrieving only certain tuples of R: a view V can be created by defining it by means of a query that selects only those tuples from R that A wants to allow B to access.

Comment
Chapter 30, Problem 4RQ

Problem

Discuss the types of privileges at the account level and those at the relation level.

Step-by-step solution

Step 1 of 1

There are two levels at which privileges can be assigned for using the database system: the account level and the relation (or table) level.

• At the account level, each account holds privileges specified by the database administrator, independently of the relations in the database.

• At the relation level, the database administrator controls the privileges to access each individual relation or view in the database.

Account level

It includes,

1. CREATE SCHEMA or CREATE TABLE privilege, to create a schema.

2. CREATE VIEW privilege.

3. ALTER privilege, to perform changes such as adding or removing attributes.

4. DROP privilege, to delete relations or views.

5. MODIFY privilege, to insert, delete, or update tuples.

6. SELECT privilege, to retrieve information from the database.

Relation level

• It refers to either base relation or view (virtual) relation.

• Each type of command can be applied for each user by specifying the individual relation.

Access matrix model, an authorization model is used for granting and revoking of privileges.

Comment
Chapter 30, Problem 5RQ

Problem

What is meant by granting a privilege? What is meant by revoking a privilege?

Step-by-step solution

Step 1 of 1

Granting and revoking of privileges should be performed so that it ensures secure and authorized
access and hence both of them should be controlled on each relation R in a database.

It is carried out by assigning an owner account, which is the account that was used when the
relation was created. The owner of the relation is the one who uses all privileges on that relation.

Granting of privileges

The owner account holder can transfer the privileges on any of the relations owned to other
users by issuing GRANT command (granting privileges) to their accounts. Types of privileges
granted on each individual relation R by using GRANT command are as follows,

• SELECT privilege on some relation, gives the privilege to retrieve the information (tuples) from
that relation.

• Modification privilege is provided to do insert, delete, and update operations that modify the
database.

• References privilege is granted to refer a relation based on integrity constraints specified.

Revoking of privileges

In some cases, a privilege is granted only temporarily; it is then necessary to cancel the privilege after the task has been completed. The REVOKE command is used in SQL for canceling privileges that were granted.

Comment
Chapter 30, Problem 6RQ

Problem

Discuss the system of propagation of privileges and the restraints imposed by horizontal and
vertical propagation limits.

Step-by-step solution

Step 1 of 2

Propagation of privileges: whenever the owner A of a relation R grants a privilege on R to another account B, the privilege can be given to B with or without the GRANT OPTION. If the GRANT OPTION is given, B can also grant that privilege on R to other accounts. Suppose that B is given the GRANT OPTION by A and that B then grants the privilege on R to a third account C, also with the GRANT OPTION. In this way, privileges on R can propagate to other accounts without the knowledge of the owner of R. If the owner account A now revokes the privilege granted to B, all the privileges that B propagated based on that privilege should automatically be revoked by the system.

It is possible for a user to receive a certain privilege from two or more sources. For example, A' may receive a certain privilege from both B' and C'. If B' now revokes the privilege from A', A' will still have it by virtue of C'. If C' also revokes the privilege, A' loses it permanently. A DBMS that allows propagation of privileges must keep track of how all privileges were granted, so that revoking of privileges can be done correctly and completely.

Comment

Step 2 of 2

Since propagation of privileges can lead to many accounts holding a privilege on a relation without the knowledge of the owner, there must be ways to restrict the number of accounts that can hold privileges on a relation. This can be done by limiting horizontal propagation and by limiting vertical propagation.

Limiting horizontal propagation to an integer number i means that an account B given the GRANT OPTION can grant the privilege to at most i other accounts.

Vertical propagation limits the depth of the granting of privileges. Granting of privileges with
vertical propagation zero is equivalent to granting the privileges with no GRANT OPTION. If
account A grants privileges to account B with vertical propagation set to j>0, this means that the
account B has GRANT OPTION on the privilege, but B can grant the privilege to other accounts
only with a vertical propagation less than j. In effect, vertical propagation limits the sequence of GRANT OPTIONs that can be given from one account to the next based on a single original grant of the privilege.

For example, suppose that A grants SELECT to B on the EMPLOYEE relation with horizontal propagation = 1 and vertical propagation = 2. B can grant SELECT to at most one other account because the horizontal propagation is 1. Additionally, B can grant the privilege to another account only with vertical propagation set to 0 or 1. Thus we can limit propagation by using these two methods, as the sketch below illustrates.
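
A minimal Python sketch of the bookkeeping for these limits, following the A-to-B example above (the data structure and error handling are illustrative assumptions):

grants = {}   # grantee -> (horizontal grants remaining, vertical limit)

def grant(grantor, grantee, horizontal, vertical):
    if grantor in grants:                 # the owner itself has no limits
        h_left, v_limit = grants[grantor]
        if h_left <= 0:
            raise PermissionError("horizontal propagation limit reached")
        if vertical >= v_limit:
            raise PermissionError("vertical limit must shrink at each level")
        grants[grantor] = (h_left - 1, v_limit)
    grants[grantee] = (horizontal, vertical)

grant("A", "B", horizontal=1, vertical=2)  # A is the owner of EMPLOYEE
grant("B", "C", horizontal=1, vertical=1)  # allowed: 1 grant left, 1 < 2
# A further grant("B", "D", horizontal=1, vertical=1) would now raise an
# error, because B's horizontal limit of 1 is already used up.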

Comment
Chapter 30, Problem 7RQ

Problem

List the types of privileges available in SQL.

Step-by-step solution

Step 1 of 1

Following type of privileges can be granted on each individual relation R:

1.) Select (retrieval or read) privilege on R: Gives the account retrieval privilege. In SQL this
gives the account the privilege to use SELECT statement to retrieve the tuples from R

2.) Modify privilege on R: This gives the account the capability to modify tuples of R. In SQL
this privilege is further divided into UPDATE, DELETE, and INSERT privileges to apply
corresponding SQL commands to R. Additionally, both the INSERT and UPDATE privileges can
specify that only certain attributes of R can be updated by the account.

3.) Reference privileges on R: This gives the account the capability to reference relation R
when specifying integrity constraints. This privilege can also be restricted to specific attributes of
R.

To create a view an account must have SELECT privilege on all relations involved in view
definition.

Comment
Chapter 30, Problem 8RQ

Problem

What is the difference between discretionary and mandatory access control?

Step-by-step solution

Step 1 of 2

a. Discretionary Access Control (DAC) policies are characterized by a high degree of flexibility,
which makes them suitable for a large variety of application domains.

By contrast, Mandatory Access Control (MAC) policies have the drawback of being too rigid, in that they require a strict classification of subjects and objects into security levels; therefore they are applied in very few environments.

Comment

Step 2 of 2

b. The main drawback of DAC models is their vulnerability to malicious attacks, such as Trojan horses embedded in application programs. The reason is that discretionary authorization models do not impose any control on how information is propagated and used once it has been accessed by users authorized to do so.

By contrast, Mandatory Access Control policies ensure a high degree of protection; in a way, they prevent any illegal flow of information.

Comment
Chapter 30, Problem 9RQ

Problem

What are the typical security classifications? Discuss the simple security property and the *-
property, and explain the justification behind these rules for enforcing multilevel security.

Step-by-step solution

Step 1 of 1

Typical security classes are top secret (TS), secret (S), confidential (C), and unclassified (U), where TS is the highest level and U the lowest: TS ≥ S ≥ C ≥ U.

Simple security property: a subject S is not allowed read access to an object O unless class(S) ≥ class(O). This is known as the simple security property.

*-Property: a subject S is not allowed to write an object O unless class(S) ≤ class(O). This is known as the star property.

The first rule states that no subject can read an object whose security classification is higher than the subject's security clearance.

The second restriction is less intuitive: it prohibits a subject from writing an object at a lower security classification than the subject's security clearance. Violation of this rule would allow information to flow from higher to lower classifications, which violates a basic tenet of multilevel security.
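
A minimal Python sketch of the two rules, with the classes TS > S > C > U encoded as integers (the function names are illustrative):

LEVEL = {"U": 0, "C": 1, "S": 2, "TS": 3}

def can_read(subject_clearance, object_classification):
    # Simple security property: read only at or below your clearance.
    return LEVEL[subject_clearance] >= LEVEL[object_classification]

def can_write(subject_clearance, object_classification):
    # *-property: write only at or above your clearance (no write-down).
    return LEVEL[subject_clearance] <= LEVEL[object_classification]

assert can_read("S", "C") and not can_read("C", "S")
assert can_write("C", "S") and not can_write("S", "C")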

Comment
Chapter 30, Problem 10RQ

Problem

Describe the multilevel relational data model. Define the following terms: apparent key,
polyinstantiation, filtering.

Step-by-step solution

Step 1 of 3

Define:

1.) Apparent key: The apparent key of a multilevel relation is the set of attributes that would have formed the primary key in a regular (single-level) relation.

Comment

Step 2 of 3

2.) Filtering: A multilevel relation will appear to contain different data to subjects with different clearance levels. In some cases, it is possible to store a single tuple in the relation at a higher classification level and produce the corresponding tuples at a lower-level classification through a process known as filtering.

Comment

Step 3 of 3

3.) Polyinstantiation: In some cases, it is necessary to store two or more tuples at different classification levels with the same value for the apparent key. This leads to the concept of polyinstantiation, where several tuples can have the same apparent key value but different attribute values for users at different classification levels.

Comment
Chapter 30, Problem 11RQ

Problem

What are the relative merits of using DAC or MAC?

Step-by-step solution

Step 1 of 1

Discretionary access control (DAC) policies are characterized by a high degree of flexibility, which makes them suitable for a large variety of application domains. The main drawback of DAC models is their vulnerability to malicious attacks, such as Trojan horses embedded in application programs.

Mandatory access control (MAC) policies, on the other hand, ensure a high degree of protection; in a way, they prevent any illegal flow of information. However, MAC has the drawback of being too rigid, and it is applicable only in limited environments.

In many practical situations, discretionary policies are preferred because they offer a better trade-off between security and applicability.

Comment
Chapter 30, Problem 12RQ

Problem

What is role-based access control? In what ways is it superior to DAC and MAC?

Step-by-step solution

Step 1 of 1

Role-based access control (RBAC) is a technology for managing and enforcing security in large-scale, enterprise-wide systems. The basic notion is that permissions are associated with roles, and users are assigned to appropriate roles.

Roles can be created using the CREATE ROLE and DESTROY ROLE commands, and GRANT and REVOKE are used to assign privileges to roles and to revoke them.

RBAC appears to be a viable alternative to traditional DAC and MAC; it ensures that only authorized users are given access to certain data or resources. Many DBMSs support the concept of roles, where privileges can be assigned to roles.

The role hierarchy in RBAC is a natural way of organizing roles to reflect the organization's lines of authority and responsibility.

Using an RBAC model is a highly desirable goal for addressing the key security requirements of Web-based applications. DAC and MAC models lack the capabilities needed to support the security requirements of emerging enterprises and Web-based applications.

Comment
Chapter 30, Problem 13RQ

Problem

What are the two types of mutual exclusion in role-based access control?

Step-by-step solution

Step 1 of 1

Separation of duties is an important requirement in various database management systems. It is necessary to prevent one user from doing work that requires the participation of two or more people, so that collusion can be prevented. To implement this successfully, mutual exclusion of roles is used.

Two roles are said to be mutually exclusive if a user cannot use both roles at the same time.

Mutual exclusion of roles can be classified in to two types.

1. Authorization time exclusion.

2. Runtime exclusion.

Authorization time exclusion

It is a static process: two mutually exclusive roles cannot both be included in a user's authorization at the same time.

Runtime exclusion

It is a dynamic process: two mutually exclusive roles may both be authorized to one user, but the user can activate only one of them at a time; that is, both roles cannot be activated in the same session.

Comment
Chapter 30, Problem 14RQ

Problem

What is meant by row-level access control?

Step-by-step solution

Step 1 of 1

In row-level access control, as the name indicates, access control rules are implemented on the data row by row.

Each row is given a label, where data sensitivity information is stored.

• It ensures data security by allowing the permissions to be set not only for column or table but
also for each row.

• Database administrator provides the user with the default session label initially.

• Row-level contains levels of hierarchy of sensitivity of data to maintain privacy or security.

• Unauthorized users are prevented from viewing or altering certain data by using labels
assigned.

• A user with low-level authorization is represented by a low number; such a user is denied access to data having a higher-level number.

• If the label is not given to a row, it is automatically assigned depending upon the user’s session
label.

Comment
Chapter 30, Problem 15RQ

Problem

What is label security? How does an administrator enforce it?

Step-by-step solution

Step 1 of 1

A Label Security policy is a policy defined by the administrator. The policy is invoked automatically whenever data affected by the policy is accessed through an application. When the policy is implemented, a new column is added to each row.

The new column contains the label for each row that is considered to be the sensitivity of the row
as per the policy. Each user has an identity in label-based security; it is compared to the label
assigned to each row to determine whether the user has rights to access to view the contents of
that row.

The database administrator has the privilege to set an initial label for the row.

Label security administrator defines the security labels for data and authorizations that govern
access to specified projects for users.

Example

If a user has SELECT privilege on the table, Label Security will automatically evaluate each row
returned by the query to determine whether the user is provided with the rights to view the data.

If the user is assigned with sensitivity level 25, the user can view all rows that have a security
level of 25 or lower.

Label security can be used to perform security checks on statements that include insert, delete,
and update.

Comment
Chapter 30, Problem 16RQ

Problem

What are the different types of SQL injection attacks?

Step-by-step solution

Step 1 of 1

SQL injection attacks are among the most common threats to database systems. Types of injection attacks include:

• SQL Manipulation

• Code injection

• Function Call injection

Explanation

SQL Manipulation

A manipulation attack changes an SQL command in the application, for example by extending a query with additional query components using set operations such as UNION, INTERSECT, or MINUS.

Example

The query used to check authentication is:

SELECT * FROM loginusers
WHERE username = 'john' AND password = 'johnpwd';

The application checks whether any rows are returned by this query.

The hacker can try to change or manipulate the SQL statement as follows:

SELECT * FROM loginusers
WHERE username = 'john' AND password = 'johnpwd' OR 'a' = 'a';

Because the condition OR 'a' = 'a' is always true, a hacker who knows that 'john' is a valid login is able to log into the database system without knowing the password.

Code Injection

• It allows the addition of extra SQL statements or commands to the original SQL statement by exploiting a computer bug that is caused by processing invalid data.

• The attacker injects the code into a computer program to change the course of action.

• It is one of the methods used for hacking a system to obtain information without authorization.

Function call Injection

• A database or operating system (OS) function call is injected into the SQL statements to
change the data or to make a system call that is considered to be privileged.

• It is possible to inject a function that performs some network communication operation into SQL queries that are constructed dynamically and executed at runtime.

Example

The query given below makes the server request a page from a web server:

SELECT TRANSLATE ("||HTTP.REQUEST ('http://129.107.12.1/')||", '97876763', '9787') FROM dual;

The attacker can supply, as the input string, the URL of a web page or a call that performs other illegal operations.

Comment
Chapter 30, Problem 17RQ

Problem

What risks are associated with SQL injection attacks?

Step-by-step solution

Step 1 of 1

Risks associated with SQL injection attacks include the following.

Database Fingerprinting

The attacker identifies the type of back-end database in order to craft attacks specific to that DBMS; this is possible when the DBMS has weaknesses that reveal it.

Denial of Service

The attacker can flood the server with requests, consume excessive resources, or delete data, thus denying service to the intended users.

Bypassing Authentication

The attacker can access the database system as an authorized user and perform all the desired
operations.

Identifying Injectable Parameters

The attacker obtains sensitive information, such as the type and structure of the back-end database of a web application. This is possible because the default error pages returned by application servers are often overly descriptive.

Executing Remote Commands

Here the attacker uses tools to execute commands on the database remotely. For example, the attacker can execute stored procedures and functions from a remote SQL interface.

Performing Privilege Escalation

This attack makes use of logical flaws within the database to escalate the level of access.

Comment
Chapter 30, Problem 18RQ

Problem

What preventive measures are possible against SQL injection attacks?

Step-by-step solution

Step 1 of 1

SQL injection attacks can be prevented by applying certain programming rules to all procedures and functions that are accessible through the Web. Some of the techniques include:

Bind Variables

• Bind variables (parameters) protect against injection attacks and also improve performance.

• For example, consider the following code using Java and JDBC:

PreparedStatement st = con.prepareStatement(
    "SELECT * FROM employee WHERE empid = ? AND pwd = ?");
st.setString(1, empid);
st.setString(2, pwd);

• User input should be bound to a parameter instead of being concatenated into the statement; in this example, the first input is assigned to the bind variable for ‘empid’ instead of being passed directly as part of the query string.

Filtering Input

• Input filtering removes escape characters from input strings, for example by using the SQL Replace function.

• For instance, the single-quote delimiter (') is replaced by two single quotes ('').

Function Security

Access to standard and custom database functions should be restricted, since such functions can be exploited in SQL function injection attacks.

Comment
Chapter 30, Problem 19RQ

Problem

What is a statistical database? Discuss the problem of statistical database security.

Step-by-step solution

Step 1 of 1

Statistical databases are used mainly to produce statistics on various populations. The database may contain confidential data on individuals, which should be protected from user access. Users are permitted to retrieve only statistical information on the populations, such as averages, sums, counts, minimums, maximums, and standard deviations.

A population is a set of tuples of a relation (table) that satisfy some selection condition.

Statistical queries involve applying statistical functions to a population of tuples.

Statistical database security techniques fail to protect individual data in some situations. For example, a user may retrieve the number of individuals in a population and the average income in the population; if the selection condition narrows the population to a single individual, the average reveals that individual's income.

Comment
Chapter 30, Problem 20RQ

Problem

How is privacy related to statistical database security? What measures can be taken to ensure
some degree of privacy in statistical databases?

Step-by-step solution

Step 1 of 2

Statistical database are used mainly to produce statistics about various populations. The
database may contain confidential data about individuals, which should be protected from user
access. However, users are permitted to retrieve statistical information about the populations,
such as averages, sums, counts, maximums, minimums, and standard deviations. Since there
can be ways to retrieve private information using aggregate function when much information is
available about a person, statistical database that store information impose potential threats to
privacy.

Consider an example: a PERSON relation with attributes Name, Ssn, Income, Address, City, Zip, Sex, and Last_degree.

A population is set of tuples of a relation that satisfy some selection condition. Hence, each
selection condition on the PERSON relation will specify a particular population of PERSON
tuples. For example Sex = 'F' or Last_degree = 'M.Tech'.

Statistical queries involve applying statistical functions to a population of tuples. For example:
Avg Income. However, access to personal information is not allowed. Statistical database
security techniques must prohibit queries that retrieve attribute values and by allowing only
queries that involve aggregate functions such as ADD, MIN,,MAX, AVG, COUNT and
STANDATRD DEVIATION. Such queries are sometime called statistical queries.

Comment

Step 2 of 2

It is the responsibility of a database management system to ensure the confidentiality of information about individuals, while still providing useful statistical summaries of data about those individuals to users. Provision of privacy protection is paramount; how it can be violated is illustrated by the following statistical queries:

Q1: SELECT COUNT(*) FROM PERSON
WHERE Sex='F' AND Last_degree='M.S.' AND City='Houston';

Q2: SELECT AVG(Income) FROM PERSON
WHERE Sex='F' AND Last_degree='M.S.' AND City='Houston';

Suppose someone is interested in finding the income of Jane Smith, who is female, holds an M.S. as her last degree, and lives in Houston. Adding all these conditions may make the result of Q1 equal to 1. Using the same condition in Q2 then gives Jane Smith's income. Even if the result of Q1 is not 1, the MAX and MIN functions can still be used to narrow down the range of her income.

Measures taken to ensure privacy:

1.) No statistical queries are permitted whenever the number of tuples in the population specified by the selection condition falls below some threshold (a sketch of this measure appears below).

2.) Prohibit sequences of queries that repeatedly refer to the same population of tuples.

3.) Introduce slight noise (inaccuracy) into the results of statistical queries.

4.) Partition the database into groups; any query may refer only to complete groups, never to subsets of records within a group.
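
A minimal Python sketch of measure (1), refusing a statistical query whose selected population is smaller than a threshold; the threshold value and the data are illustrative:

MIN_POPULATION = 5

def avg_income(person_rows, condition):
    population = [r for r in person_rows if condition(r)]
    if len(population) < MIN_POPULATION:
        raise PermissionError("population too small: query refused")
    return sum(r["Income"] for r in population) / len(population)

people = [{"Sex": "F", "City": "Houston", "Income": 50000 + i}
          for i in range(6)]
print(avg_income(people,
                 lambda r: r["Sex"] == "F" and r["City"] == "Houston"))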

Comment
Chapter 30, Problem 21RQ

Problem

What is flow control as a security measure? What types of flow control exist?

Step-by-step solution

Step 1 of 3

Flow control regulates the distribution or flow of information among accessible objects. A flow
between object X and object Y occurs when a program reads values from X and writes values
into Y. Flow control checks that information contained in some object does not flow explicitly or
implicitly into less protected objects. Thus, a user cannot get indirectly from Y what he or she cannot get directly from X. Most flow controls employ some concept of security class; the transfer of information from a sender to a receiver is allowed only if the receiver's security class is at least as privileged as the sender's.

Examples of a flow control program include preventing a service program from leaking a
customer's confidential data, and blocking the transmission of secret military data to an unknown
classified user.

A flow policy specifies the channels along which information is allowed to move. The simplest flow policy specifies just two classes of information, confidential (C) and nonconfidential (N), and allows all flows except those from class C to class N. This policy can solve the confidentiality problem that arises when a service program handles data, such as customer information, some of which may be confidential.

Comment

Step 2 of 3

Access control mechanisms are responsible for checking users' authorizations for resource access: only granted operations are executed. Flow controls can be enforced by an extended access control mechanism, which involves assigning a security class to each running program. The program is allowed to read a particular memory segment only if its class is as high as that of the segment. It is allowed to write into a segment only if its class is as low as that of the segment. This automatically ensures that no information handled by the program can move from a higher to a lower class. For example, a military program with secret clearance can only read from objects that are unclassified and confidential and can only write into objects that are secret or top secret.

Two types of flows exist:

1.) Explicit flows: occurring as a consequence of assignment instructions, such as Y := f(X1, ..., Xn).

2.) Implicit flows: generated by conditional instructions, such as: if f(Xm+1, ..., Xn) then Y := f(X1, ..., Xm).

Comment

Step 3 of 3

Flow control mechanisms must verify that only authorized flows, both explicit and implicit, are
executed. A set of rules must be satisfied to ensure secure information flows. Rules may be
expressed using flow relations among classes and assigned to information, stating the
authorized flow within the system. This relation can define, for a class, the set of classes where
information can flow, or can state the specific relations to be verified between two classes to
allow information to flow from one to another. In general, flow control mechanisms implement the
control by assigning a label to each object and by specifying the security class of the object.
Labels are then used to verify the flow relations defined in the model.

Comment
Chapter 30, Problem 22RQ

Problem

What are covert channels? Give an example of a covert channel.

Step-by-step solution

Step 1 of 2

A covert channel allows a transfer of information that violates the security policy. Specifically, a covert channel allows information to pass from a higher classification level to a lower classification level through improper means. Covert channels can be classified into two broad categories:

1.) Timing channels: information is conveyed by the timing of events or processes.

2.) Storage channels: no temporal synchronization is required; information is conveyed by accessing system information that is otherwise inaccessible to the user.

Comment

Step 2 of 2

As a simple example of a covert channel, consider a distributed database system in which two nodes have user security levels of secret (S) and unclassified (U). In order for a transaction to commit, both nodes must agree to commit, and they may only perform operations that are consistent with the *-property, which states that in any transaction, the S site cannot write or pass information to the U site. However, if these two sites collude to set up a covert channel between them, a transaction involving secret data may be committed unconditionally by the U site, while the S site commits in some predefined, agreed-upon way, so that certain information is passed from the S site to the U site. Measures such as locking prevent concurrent writing of information into the same objects by users with different security levels, preventing storage-type covert channels. Operating systems and distributed databases provide control over the multiprogramming of operations, allowing resources to be shared without the possibility of one program or process encroaching on another's memory or other resources in the system, thus preventing timing-oriented covert channels. In general, covert channels are not a major problem in well-implemented, robust database implementations. However, certain schemes may be contrived by clever users that implicitly transfer information.

Some security experts believe that one way to avoid covert channels is to prevent programmers from gaining access to sensitive data that a program will process after the program has been put into operation.

Comment
Chapter 30, Problem 23RQ

Problem

What is the goal of encryption? What process is involved in encrypting data and then recovering
it at the other end?

Step-by-step solution

Step 1 of 1

Suppose data is communicated via some communication channel but falls into the wrong hands. In this situation, by using encryption we can disguise the message so that even if the transmission is diverted, the message will not be revealed. Encryption is thus a means of maintaining secure data in an insecure environment.

Encryption consists of applying an encryption algorithm to data using some predefined encryption key. The resulting data then has to be decrypted using a decryption key to recover the original data.

Comment
Chapter 30, Problem 24RQ

Problem

Give an example of an encryption algorithm and explain how it works.

Step-by-step solution

Step 1 of 3

Public key encryption: Public key encryption is based on mathematical functions rather than
operations on bit patterns. They also involve the use of two separate keys, in contrast to
conventional encryption, which uses one key only. The use of two keys can have profound
consequences in the areas of confidentiality, key distribution, and authentication. The two keys
used for public key encryption are referred to as the public key and the private key. Invariably, the
private key is kept secret, but it is referred to as private key rather than secret key to avoid
confusion with conventional encryption.

Comment

Step 2 of 3

A public key encryption scheme, or infrastructure, has six ingredients:

1.) Plaintext: data that is to be transmitted (encrypted).

2.) Encryption algorithm: Algorithm that will perform transformations on plain text.

3. and 4.) Public key and Private Key: If one of these is used for encryption the other is used for
decryption.

5.) Cipher text: Encrypted data or scrambled text for a given plaintext and set of keys.

6.) Decryption algorithms: This algorithm accepts the cipher text and the matching key and
produces the original plain text.

Comment

Step 3 of 3

Public key is made public for others to use, whereas the private key is known only to its owner. It
relies on one key for encryption and other for decryption.

Essential steps are as follows:

1.) Each user generates a pair of keys to be used for the encryption and decryption of messages.

2.) Each user places one of the keys in a public register or other accessible file. This is the public
key. The companion key is kept private.

3.) If a sender wishes to send a private message to a receiver, the sender encrypts the message
using the receiver's public key.

4.) When the receiver receives the message, he or she decrypts it using the receiver's private
key. No other recipient can decrypt the message because only the receiver knows his or her
private key.

Comment
Chapter 30, Problem 25RQ

Problem

Repeat the question for the popular RSA algorithm.

Question

Give an example of an encryption algorithm and explain how it works.

Step-by-step solution

Step 1 of 1

The RSA encryption algorithm incorporates results from number theory, combined with the difficulty of determining the prime factors of a large number. The RSA algorithm also operates with modular arithmetic, mod n.

Two keys, e and d, are used for encryption and decryption. An important property is that they can be interchanged. n is chosen as a large integer that is a product of two large distinct prime numbers, a and b. The encryption key e is a randomly chosen number between 1 and n that is relatively prime to (a-1)*(b-1). The plaintext block P is encrypted as P^e mod n. Because the exponentiation is performed mod n, factoring P^e to uncover the encrypted plaintext is difficult. However, the decryption key d is carefully chosen so that (P^e)^d mod n = P. The key d can be computed from the condition that d*e = 1 mod ((a-1)*(b-1)). Thus, the legitimate receiver who knows d simply computes (P^e)^d mod n = P and recovers P without having to factor P^e.
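
A tiny numeric illustration of these formulas in Python, using toy primes (real keys use primes hundreds of digits long; pow(e, -1, phi) needs Python 3.8+):

a, b = 61, 53            # two distinct primes (the a and b above)
n = a * b                # the modulus: n = 3233
phi = (a - 1) * (b - 1)  # (a-1)*(b-1) = 3120

e = 17                   # encryption key, relatively prime to phi
d = pow(e, -1, phi)      # decryption key: d*e = 1 mod phi, so d = 2753

P = 65                   # plaintext block, with P < n
C = pow(P, e, n)         # encryption: C = P^e mod n
assert pow(C, d, n) == P # decryption: (P^e)^d mod n recovers P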

Comment
Chapter 30, Problem 26RQ

Problem

What is a symmetric key algorithm for key-based security?

Step-by-step solution

Step 1 of 1

A symmetric key algorithm uses the same key for both encryption and decryption; this characteristic makes encryption and decryption fast enough to be used for sensitive data in the database.

• The message is encrypted with a secret key and can be decrypted with the same secret key.

• An algorithm used for symmetric key encryption is called a symmetric key algorithm; since such algorithms are mostly used for encrypting the content of a message, they are also called content encryption algorithms.

• The secret key can be derived from a password string used by the user, by applying the same function to the string at both the sender and the receiver; such a scheme is also referred to as a password-based encryption algorithm.

• Content encrypted with a longer key is harder to break than content encrypted with a shorter key, since the security of the encryption depends entirely on the key.
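
A minimal sketch in Python, assuming the third-party cryptography package is installed; its Fernet construct is one widely used symmetric (content encryption) scheme in which the same secret key both encrypts and decrypts:

from cryptography.fernet import Fernet

key = Fernet.generate_key()   # the shared secret key
cipher = Fernet(key)

token = cipher.encrypt(b"account balance: 1000")          # ciphertext
assert cipher.decrypt(token) == b"account balance: 1000"  # same key decrypts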

Comment
Chapter 30, Problem 27RQ

Problem

What is the public key infrastructure scheme? How does it provide security?

Step-by-step solution

Step 1 of 2

Public key encryption scheme,

1. Plain text: This is the data or readable message that is fed into the algorithm as input.

2. Encryption algorithm: This algorithm performs various transformations on the plaintext.

3. Public and private keys: These are a pair of keys that have been selected so that if one is used for encryption, the other is used for decryption. The exact transformations performed by the encryption algorithm depend on the public or private key that is provided as input.

4. Cipher text: This is the scrambled message produced as output. It depends on the plain text
and the key. For a given message two different keys will produce two different cipher texts.

5. Decryption algorithm: This algorithm accepts the ciphertext and the matching key and produces the original plaintext. A general-purpose public key cryptosystem works with one key for encryption and a different but related key for decryption.

Comment

Step 2 of 2

The steps are as follows:

1. Each user generates a pair of keys to be used for the encryption and decryption of messages.

2. Each user places one of the two keys in a public register or other accessible file (the public key); the companion key is kept private.

3. If a user wishes to send a private message to a receiver, the sender encrypts it using the receiver's public key.

4. When the receiver receives the message, he or she decrypts it using the receiver's private key. No other user can decrypt the message, and this provides security for the data.

Comment
Chapter 30, Problem 28RQ

Problem

What are digital signatures? How do they work?

Step-by-step solution

Step 1 of 1

A digital signature is a means of associating a mark unique to an individual with a body of text.
The mark should be unforgeable; that is, others must be able to verify that the signature comes
from the originator.

A digital signature consists of a string of symbols.

- The signature must be different for each use. This can be achieved by making each digital
signature a function of the message that it is signing, together with a timestamp.

- Public key techniques are the means of creating digital signatures (see the sketch below).
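
A minimal sketch with the third-party Python cryptography package (assumed installed), using the Ed25519 signature scheme as one such public key technique; note the timestamp folded into the signed message, as described above:

from cryptography.hazmat.primitives.asymmetric.ed25519 import Ed25519PrivateKey

signing_key = Ed25519PrivateKey.generate()
message = b"approve transfer of $100 | 2024-01-01T12:00:00"  # message + timestamp

signature = signing_key.sign(message)  # the signature is a function of the message

# Anyone holding the public key can check the signature's origin;
# verify() raises InvalidSignature if the message or signature was altered.
signing_key.public_key().verify(signature, message)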

Comment
Chapter 30, Problem 29RQ

Problem

What type of information does a digital certificate include?

Step-by-step solution

Step 1 of 1

A digital certificate binds the public key to the identity of the person who holds the
corresponding private key, in a statement that is digitally signed. Certificates are issued
and signed by a certification authority (CA).

The following are the list of information included in the certificate:

1. The certificate owner's information, given as a unique identifier known as the distinguished
name (DN) of the owner. It includes the owner's name, organization, and other related
information.

2. The public key of the owner.

3. The date of issue of the certificate.

4. The validity period is specified by ‘Valid From’ and ‘Valid To’ dates.

5. Identifying information about the issuer.

6. Digital signature of the certification authority (CA) who issues the certificate.

All of this information is run through a message-digest function, and the resulting digest is signed to create the CA's signature (a sketch of reading these fields follows).
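
As an illustration, these fields can be read from a PEM-encoded certificate with the third-party Python cryptography package (assumed installed, a recent version; the file name is hypothetical):

from cryptography import x509

with open("server-cert.pem", "rb") as fh:  # hypothetical certificate file
    cert = x509.load_pem_x509_certificate(fh.read())

print(cert.subject)           # 1. owner information (distinguished name)
print(cert.public_key())      # 2. the owner's public key
print(cert.not_valid_before)  # 3./4. date of issue / 'Valid From'
print(cert.not_valid_after)   # 4. 'Valid To'
print(cert.issuer)            # 5. issuer identifier
print(cert.signature.hex())   # 6. the CA's digital signature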

Comment
Chapter 30, Problem 30E

Problem

How can privacy of data be preserved in a database?

Step-by-step solution

Step 1 of 3

Protecting data from unauthorized access is referred to as data privacy. Data warehouses, in
which large amounts of data are stored, must be kept private and secure.

There are many challenges associated with data privacy. Some of them are as follows:

• In order to preserve data privacy, data mining and analysis over personal data should be
minimized. Usually, a large amount of data is collected and stored in a centralized location, so
violating one security policy can expose all of the data. It is therefore better to avoid storing
all data in a single central warehouse.

Comment

Step 2 of 3

• The database contains personal data of individuals, which must be kept secure and private.

• Many people inside and outside the organization access the data, so it must be protected from
illegal access and attacks.

Comment

Step 3 of 3

Some of the measures to provide data privacy are as follows:

• A good security mechanism should be imposed to protect the data from unauthorized users. This
includes physical security, i.e., protecting the location where the data is stored.

• Provide controlled and limited access to the data. Ensure that only authorized users can access
it, using biometrics, passwords, etc., and impose mechanisms so that users can access only the
data they need.

• It is better to avoid storing data in a central warehouse; instead, distribute the data across
different locations.

• Anonymize the data and remove all personal information (a minimal sketch follows).
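
As one concrete illustration of the last point, the sketch below (all field names are hypothetical) drops direct identifiers and replaces the record key with a salted one-way hash, so records remain linkable to one another but not to a person:

import hashlib

SALT = b"per-dataset secret salt"  # kept separate from the published data

def anonymize(record):
    # Replace the identifying key with a salted one-way hash
    # and drop the remaining personal fields.
    pseudonym = hashlib.sha256(SALT + record["ssn"].encode()).hexdigest()
    return {"id": pseudonym, "dept": record["dept"], "salary": record["salary"]}

print(anonymize({"name": "Smith", "ssn": "123-45-6789",
                 "dept": 3, "salary": 40000}))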

Comment
Chapter 30, Problem 31E

Problem

What are some of the current outstanding challenges for database security?

Step-by-step solution

Step 1 of 3

Challenges in database security:

1.) Data Quality: The database community needs techniques and organizational solutions to
assess and attest the quality of data. Techniques may be as simple as quality stamps posted on
Web sites. We also need techniques that provide more efficient integrity-semantics verification
and tools for the assessment of data quality, based on techniques such as record linkage.

Comment

Step 2 of 3

2.) Intellectual Property Rights: With the widespread use of the Internet and intranets, legal and
informational aspects of data are becoming major concerns for organizations. To address these
concerns, watermarking techniques for relational data have recently been proposed. The main
purpose of digital watermarking is to protect content from unauthorized duplication and
distribution by enabling provable ownership of the content. It has traditionally relied upon the
availability of a large noise domain within which the object can be altered while retaining its
essential properties. However, research is needed to assess the robustness of such techniques
and to investigate different approaches aimed at preventing intellectual property rights violations.

Comment

Step 3 of 3

3.) Data Survivability: Database systems need to operate and continue their functions, even
with reduced capabilities, despite disruptive events such as information warfare attacks. A
DBMS, in addition to making every effort to prevent an attack and to detect one if it occurs,
should be able to do the following:

1.) Confinement: Take immediate action to eliminate the attacker's access to the system and to
isolate or contain the problem to prevent further spread.

2.) Damage assessment: Determine the extent of the problem, including failed functions and
corrupted data.

3.) Reconfiguration: Reconfigure to allow operation to continue in a degraded mode while
recovery proceeds.

4.) Repair: Recover corrupted or lost data and repair or reinstall failed system functions to
reestablish a normal level of operation.

5.) Fault treatment: To the extent possible, identify the weaknesses exploited in the attack and
take steps to prevent a recurrence.

The goal of the information warfare attacker is to damage the organization's operation and the
fulfillment of its mission through disruption of its information systems. The specific target of an
attack may be the system itself or its data. While attacks that bring the system down outright are
severe and dramatic, they must also be well timed to achieve the attacker's goal, since such attacks
receive immediate and concentrated attention in order to bring the system back to operational
condition, diagnose how the attack took place, and install preventive measures.

Comment
Chapter 30, Problem 32E

Problem

Consider the relational database schema in Figure 5.5. Suppose that all the relations were
created by (and hence are owned by) user X, who wants to grant the following privileges to user
accounts A, B, C, D, and E:

a. Account A can retrieve or modify any relation except DEPENDENT and can grant any of these
privileges to other users.

b. Account B can retrieve all the attributes of EMPLOYEE and DEPARTMENT except for Salary,
Mgr_ssn, and Mgr_start_date.

c. Account C can retrieve or modify WORKS_ON but can only retrieve the Fname, Minit, Lname,
and Ssn attributes of EMPLOYEE and the Pname and Pnumber attributes of PROJECT.

d. Account D can retrieve any attribute of EMPLOYEE or DEPENDENT and can modify
DEPENDENT.

e. Account E can retrieve any attribute of EMPLOYEE but only for EMPLOYEE tuples that have
Dno = 3.

f. Write SQL statements to grant these privileges. Use views where appropriate.

Step-by-step solution

Step 1 of 6

(a) GRANT SELECT, UPDATE

ON EMPLOYEE, DEPARTMENT, DEPT_LOCATIONS, PROJECT, WORKS_ON

TO USER_A

WITH GRANT OPTION;

Comment

Step 2 of 6

(b) CREATE VIEW EMPS AS

SELECT FNAME, MINIT, LNAME, SSN, BDATE, ADDRESS, SEX,

SUPERSSN, DNO

FROM EMPLOYEE ;

GRANT SELECT ON EMPS

TO USER_B;

CREATE VIEW DEPTS AS

SELECT DNAME, DNUMBER FROM DEPARTMENT;

GRANT SELECT ON DEPTS

TO USER_B;

Comment

Step 3 of 6

(c) GRANT SELECT, UPDATE ON WORKS_ON TO USER_C;

CREATE VIEW EMP1 AS

SELECT FNAME, MINIT, LNAME, SSN

FROM EMPLOYEE ;

GRANT SELECT ON EMP1

TO USER_C;

CREATE VIEW PROJ1 AS
SELECT PNAME, PNUMBER

FROM PROJECT;

GRANT SELECT ON PROJ1

TO USER_C;

Comment

Step 4 of 6

(d) GRANT SELECT ON EMPLOYEE, DEPENDENT TO USER_D;

GRANT UPDATE ON DEPENDENT TO USER_D;

Comment

Step 5 of 6

(e) CREATE VIEW DNO3_EMPLOYEES AS

SELECT * FROM EMPLOYEE

WHERE DNO = 3;

GRANT SELECT ON DNO3_EMPLOYEES TO USER_E;

Comment

Step 6 of 6

(f) The SQL statements shown in parts (a) through (e) grant all of the requested privileges; views are used in parts (b), (c), and (e) to restrict access to specific attributes or tuples.

Comment
Chapter 30, Problem 33E

Problem

Suppose that privilege (a) of Exercise is to be given with GRANT OPTION but only so that
account A can grant it to at most five accounts, and each of these accounts can propagate the
privilege to other accounts but without the GRANT OPTION privilege. What would the horizontal
and vertical propagation limits be in this case?

Reference Problem 30.32

Consider the relational database schema in Figure 5.5. Suppose that all the relations were
created by (and hence are owned by) user X, who wants to grant the following privileges to user
accounts A, B, C, D, and E:

a. Account A can retrieve or modify any relation except DEPENDENT and can grant any of these
privileges to other users.

b. Account B can retrieve all the attributes of EMPLOYEE and DEPARTMENT except for Salary,
Mgr_ssn, and Mgr_start_date.

c. Account C can retrieve or modify WORKS_ON but can only retrieve the Fname, Minit, Lname,
and Ssn attributes of EMPLOYEE and the Pname and Pnumber attributes of PROJECT.

d. Account D can retrieve any attribute of EMPLOYEE or DEPENDENT and can modify
DEPENDENT.

e. Account E can retrieve any attribute of EMPLOYEE but only for EMPLOYEE tuples that have
Dno = 3.

f. Write SQL statements to grant these privileges. Use views where appropriate.

Step-by-step solution

Step 1 of 1

The horizontal propagation limit granted to USER_A is 5.

The vertical propagation limit granted to USER_A is level 1.

Thus, user A can grant the privilege with a level-0 vertical limit (i.e., without the GRANT
OPTION) to at most five users, who then cannot propagate the privilege any further.

Comment
Chapter 30, Problem 34E

Problem

Consider the relation shown in Figure 30.2(d). How would it appear to a user with classification
U? Suppose that a classification U user tries to update the salary of ‘Smith’ to $50,000; what
would be the result of this action?

Step-by-step solution

Step 1 of 1

EMPLOYEE would appear to users with classification U as follows:

Name       Salary     Job_performance   TC
Smith U    NULL U     NULL U            U

If a classification U user tried to update the salary of Smith to $50,000, a third
polyinstantiation of the Smith tuple would result, as follows:

Name       Salary     Job_performance   TC
Smith U    40000 C    Fair S            S
Smith U    40000 C    Excellent C       C
Smith U    50000 U    NULL U            U
Brown C    80000 S    Good C            S

Comment
