SEMESTER : VI
YEAR : 2010
Prepared by
R.
Saravanan
Sr. Lect, IT
UNIT I
Introduction to Database Systems: Overview – Data Models – Database System
Architecture – History of Database Systems. Entity-Relationship Model: Basic
Concepts – Constraints – Keys – Design Issues – Entity Relationship Diagram –
Weak Entity Sets – Extended E-R Features – Design of an E-R Database Schema –
Reduction of E-R Schema to Tables – The Unified Modeling Language UML.
UNIT II
Relational Model: Structure of Relational Databases – Relational Algebra –
Extended Relational Algebra Operations – Modification of Database – Views –
Tuple Relational Calculus – Domain Relational Calculus. SQL: Background – Basic
Structure – Set Operations – Aggregate Functions – Null Values – Nested
Subqueries – Views – Complex Queries – Modification of the database – Joined
Relations – Data-Definition Language – Embedded SQL –Dynamic SQL – Other
SQL Features. Other Relational Languages: Query-by-Example – Datalog – User
Interfaces and Tools
UNIT III
Integrity and Security: Domain Constraints – Referential Integrity – Assertions –
Triggers – Security and Authorization – Authorization in SQL – Encryption and
Authentication. Relational-Database Design: First Normal Form – Pitfalls in
Relational-Database Design – Functional Dependencies – Decomposition –
Desirable Properties of Decomposition – Boyce-Codd Normal Form – Third Normal
Form – Fourth Normal Form – More Normal Forms – Overall Database Design
Process.
UNIT IV
Storage and File Structures: Overview of Physical Storage Media – Magnetic Disks
– RAID – Tertiary Storage – Storage Access – File Organization – Organization of
Records in Files – Data-Dictionary Storage. Indexing and Hashing: Basic Concepts
– Ordered Indices – B+-Tree Index Files – B-Tree Index Files – Static Hashing –
Dynamic Hashing – Comparison of Ordered Indexing and Hashing – Index
Definition in SQL – Multiple-Key Access
UNIT V
Transactions: Transaction concept – Transaction State – Implementation of
Atomicity and Durability – Concurrent Executions – Serializability – Recoverability
– Implementation of Isolation – Transaction Definition in SQL – Testing for
Serializability
TEXT BOOK
REFERENCE BOOKS
UNIT I
2 mark Q & A:
18. Define Entity, Entity Set, and extensions of entity set. Give one
example for each.
• Entity – an object or thing in the real world. E.g., each person, each book, etc.
• Entity set – a set of entities of the same type that share the same
properties or attributes. E.g., customer, loan, account, etc.
• Extension of an entity set – the individual entities that constitute the set. E.g.,
the individual bank customers.
An attribute whose values are distinct for each individual entity in the collection
is called a key attribute.
28. Define weak Entity set. Give an example and explain why it is weak
entity set.
An entity set that does not have sufficient attributes to form a primary key.
• The payment entity set is weak because payment numbers are duplicated
across different loans, so payment_number alone cannot serve as a primary key.
Syntax:
create table <table_name>(column_name datatype(size) constraint
constraint_name references parent_table_name);
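As a hedged illustration of this syntax in action, the sketch below uses SQLite from Python. The branch and account tables and their columns are examples (not a schema from the notes), and SQLite enforces foreign keys only after the PRAGMA shown:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite checks FKs only when enabled
conn.execute("create table branch(branch_name varchar(20) primary key)")
conn.execute(
    "create table account(acc_no varchar(10) primary key,"
    " branch_name varchar(20) references branch(branch_name))"
)
conn.execute("insert into branch values ('Downtown')")
conn.execute("insert into account values ('A-101', 'Downtown')")  # parent exists: OK

# Inserting a child row whose parent is missing violates referential integrity.
try:
    conn.execute("insert into account values ('A-102', 'Nowhere')")
    fk_violation = False
except sqlite3.IntegrityError:
    fk_violation = True
```

The rejected insert shows the point of the references clause: the child table cannot name a parent row that does not exist.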
Two types:
Procedural DML:
It requires a user to specify what data are needed and how to get
those data.
Non-Procedural DML:
It requires a user to specify what data are needed without specifying
how to get those data.
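The contrast can be sketched with SQLite from Python: the SQL query is non-procedural (it states only what is wanted), while the loop is procedural (it spells out how to scan and test each row). The account table and its rows are invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table account(acc_no text, balance integer)")
conn.executemany("insert into account values (?, ?)",
                 [("A-101", 500), ("A-102", 1500), ("A-103", 2000)])

# Non-procedural: state WHAT is wanted; the system decides how to fetch it.
declarative = [row[0] for row in
               conn.execute("select acc_no from account where balance > 1000")]

# Procedural: spell out HOW, step by step (scan every row, test, collect).
procedural = []
for acc_no, balance in conn.execute("select acc_no, balance from account"):
    if balance > 1000:
        procedural.append(acc_no)
```

Both forms produce the same answer; they differ only in who chooses the access strategy.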
Partial:
If only some entities in E participate in relationships in R, the
participation of entity set E in relationship set R is said to be partial.
File manager:
File manager manages the allocation of space on disk storage and
the data structures used to represent information stored on disk.
Buffer manager:
Buffer manager is responsible for fetching data from the disk
storage into the main memory, and deciding what data to cache in
memory.
Lecture notes:
10 | P a g e Department of Information Technology
Database Management Systems
Database Applications
In the early days, database applications were built directly on top of file
systems
Integrity problems
• Integrity constraints (e.g. account balance > 0) become
“buried” in program code rather than being stated explicitly
• Hard to add new constraints or change existing ones
Atomicity of updates
• Failures may leave database in an inconsistent state with
partial updates carried out
• Example: Transfer of funds from one account to another
should either complete or not happen at all
• If a program transfers Rs. 200 from account A to account B and a
system failure occurs midway, Rs. 200 may be removed from A but not credited to B
Security problems
o Hard to provide user access to some, but not all, data
Files vs DBMS
VIEW OF DATA
Data Abstraction
The need for efficiency has led designers to use complex data structures to
represent data in the database. Developers hide this complexity from users
through several levels of abstraction
Physical level
The lowest level of abstraction describes how the data are actually stored.
Logical level
The next-higher level of abstraction describes what data are stored
in the database, and what relationships exist among those data.
The logical level thus describes the entire database in terms of a
small number of relatively simple structures.
View level
The highest level of abstraction describes only part of the entire database.
Even though the logical level uses simpler structures, complexity remains
because of the variety of information stored in a large database.
The physical schema describes the database design at the physical level,
The logical schema describes the database design at the logical level.
A database may also have several schemas at the view level, sometimes
called subschema, that describe different views of the database.
The logical schema is by far the most important, in terms of its effect on
application programs, since programmers construct applications by using the
logical schema.
The physical schema is hidden beneath the logical schema, and can
usually be changed easily without affecting application programs.
DATA MODELS
The entity-relationship model and the relational model both provide a way to
describe the design of a database at the logical level.
Entity-Relationship Model
For eg., each person is an entity; and bank accounts can be considered as
entities: Entities are described in a database by a set of attributes.
For eg., the attributes acc_no and bal may describe one particular account
in a bank and they form attributes of the account entity set. Similarly, attributes
cust_name, cust_street address and cust_id may describe a customer entity.
Relational Model
The relational model uses a collection of tables to represent both data and
the relationships among those data. Each table has multiple columns, and each
column has a unique name.
The relational data model is the most widely used data model, and a vast
majority of current database systems are based on the relational model.
The relational model is at a lower level of abstraction than the E-R model.
Database designs are often carried out in the E-R model, and then
translated to the relational model.
For eg., it is easy to see customer and account correspond to the entity
sets of the same name, while the table depositor corresponds to the relationship
set depositor.
The object-oriented data model is another data model that has seen
increasing attention. The object-oriented model can be seen as extending the E-R
model with notions of encapsulation, methods and object identity.
The network and hierarchical data models preceded the relational data model.
These models were tied closely to the underlying implementation, and
complicated the task of modeling data.
Database Languages
Database System provides a
DDL (Data Definition Language) – to specify the database schema and
Data dictionary/ Data Directory – Contains Metadata i.e data about data
Eg: schema of a table
The system consults the data dictionary before reading or modifying the
actual data.
DML
Retrieval
Insertion
Deletion
Modification
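The four DML operations listed above can be sketched as SQL statements, here run through Python's sqlite3 against a hypothetical student table:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table student(sid integer primary key, name text, gpa real)")

conn.execute("insert into student values (1, 'Asha', 3.2)")          # insertion
conn.execute("insert into student values (2, 'Ravi', 2.8)")
conn.execute("update student set gpa = 3.0 where sid = 2")           # modification
conn.execute("delete from student where sid = 1")                    # deletion
rows = conn.execute("select sid, name, gpa from student").fetchall() # retrieval
```

After the four statements only Ravi's (updated) row remains.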
Database Users
There are different types of user that play different roles in a database
environment. Following is a brief description of these users:
Application Programmers
End Users
End users are those persons who interact with the application directly.
They are responsible to insert, delete and update data in the database. They get
information from the system as and when required. Different types of end users
are as follows:
Naive Users: Naive users are those users who do not have any technical
knowledge about the DBMS. They use the database through application
programs by using simple user interface. They perform all operations by
using simple commands provided in the user interface.
Example:
The data entry operator in an office is responsible for entering
records in the database. He performs this task by using menus and buttons
etc. He does not know anything about database or DBMS. He interacts with
the database through the application program.
Sophisticated Users: Sophisticated users are the users who are familiar
with the structure of database and facilities of DBMS. Such users can use a
query language such as SQL to perform the required operations on
databases. Some sophisticated users can also write application programs.
Database Administrator
A database system is partitioned into modules that deal with each of the
responsibilities of the overall system.
The functional components of a database system can be broadly divided
into
The storage manager and
The query processor components.
The database systems structure the data as to minimize the need to move
data between disk and main memory.
The job of the database system is to translate updates and queries written
in a nonprocedural language, at the logical level, into an efficient sequence of
operations at the physical level.
Storage Manager
Query Processor
Magnetic tapes were developed for data storage. Data processing tasks
such as payroll were automated, with data stored on tapes.
Tapes could be read only sequentially and data sizes were much larger
than main memory, thus data processing programs were forced to process
data in a particular order, by reading and merging data from tapes and card
decks.
Widespread use of hard disks allowed direct access to data. With disks,
network and hierarchical databases could be created that allowed data
structures such as lists and trees to be stored on disk.
1980s
That changed with System R: a fully functional System R prototype led to
IBM's first relational database product, SQL/DS.
Early 1990s:
The SQL language was designed primarily for decision-support applications,
which are query-intensive. Decision support and querying re-emerged as a
major application area for databases. Tools for analyzing large amounts of
data saw large growth in usage.
Many databases vendors introduced parallel database products in this
period.
Late 1990s:
The major event was the explosive growth of the World Wide Web.
Databases were deployed much more extensively than ever before.
Database Systems had to support very high transaction processing rates
and high reliability. Database system also had to support Web interfaces to
data.
Entity
-It is an object that exists in the real world.
Example:
- Person, Employee, Car, Home etc..
Entity Set
- A set of entities of the same type.
Attributes
- A set of properties that describe an entity.
Types of Attributes
Stored vs. Derived
A stored attribute is one whose value is stored directly in the database,
whereas a derived attribute is one whose value is computed from stored
attributes.
E.g. stored attribute: DOB;
derived attribute: Age, derived from DOB.
Key Attribute
An attribute which is used to uniquely identify records.
E.g.. eid, sno, dno
Relationship
It is an association among several entities. It specifies what type of
relationship exists between entities.
Relationship set:
Constraints
Participation constraints
a. Mapping Cardinalities:
Express the no.of entities to which another entity can be associated via a
relationship set
Types
One-to-One
An entity in set A is associated with at most one entity in set B and vice
versa.
One-to-many
Many-to-One
Many-to-Many
An entity in set A is associated with any number of entities in set B,
and an entity in set B is associated with any number of entities in set A.
b. Participation Constraints:
Total Participation
The participation of an entity set E in a relationship set R is said to be total
if every entity in E participates in atleast one relationship in R.
Partial Participation:
The participation of an entity set E in a relationship set R is said to be
partial if only a few of the entities in E participated in relationship in R.
Keys
Primary Key:
– It is a key used to uniquely identify an entity in the entity set.
– E,g eno,rno,dno etc…
Super Key:
It is a set of one or more attributes that, taken together, allow us to
uniquely identify an entity in the entity set. Any superset of a super key
is also a super key.
E.g. {eid} is a super key; eid and ename together also identify an entity
in the entity set, so {eid, ename} is a super key as well.
Candidate key:
They are minimal super keys for which no proper subset is a superkey.
E.g.. Ename and eaddr can be sufficient to identify an employee in
employee set.
{eid} and {ename,eaddr} – Candidate keys
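These key notions can be sketched in SQLite: eid is declared the primary key (the chosen candidate key), while a unique constraint on (ename, eaddr) models the other candidate key. Table contents are invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "create table employee("
    " eid integer primary key,"   # primary key: the chosen candidate key
    " ename text, eaddr text,"
    " unique (ename, eaddr))"     # {ename, eaddr}: another candidate key
)
conn.execute("insert into employee values (1, 'Kumar', 'Chennai')")

duplicate_pk_rejected = False
try:
    conn.execute("insert into employee values (1, 'Mala', 'Madurai')")  # eid repeats
except sqlite3.IntegrityError:
    duplicate_pk_rejected = True

duplicate_candidate_rejected = False
try:
    conn.execute("insert into employee values (2, 'Kumar', 'Chennai')")  # pair repeats
except sqlite3.IntegrityError:
    duplicate_candidate_rejected = True
```

Both rejected inserts show that each candidate key must be unique across the entity set.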
FOREIGN Keys
Domain
KEYS:
Participation Constraints:
Consider the above E-R diagram, which consists of two entity sets,
customer and loan, related through a binary relationship set borrower.
From the above diagram, we see that the relationship set borrower is
many-to-many.
MAPPING CARDINALITIES:
One-to-one relationship:
One-to-Many relationship:
Many-to-One relationships:
Many-to-many Relationship:
If a relationship set has also some attributes associated with it, then we
link these attributes to that relationship set.
For example, in the below diagram, we have the access_date descriptive
attribute attached to the relationship set depositor to specify the most
recent date on which a customer accessed that account.
Roles:
PARTICIPATION CONSTRAINTS:
Partial participation:
Ternary relationship consists of the three entity sets employee, job, and
branch, related through the relationship set works_on.
We can specify some types of many-to-one relationships in the case of
nonbinary relationship sets.
The edge between loan and borrower has a cardinality constraint of 1..1,
meaning the minimum and the maximum cardinality are both 1.
The limit 0..* on the edge from customer to borrower indicates that a
customer can have zero or more loans.
An entity set that does not have sufficient attributes to form a primary key is
referred to as a weak entity set.
The existence of a weak entity set depends on the existence of an
identifying entity set.
It must relate to the identifying entity set via a total, one-to-many
relationship set from the identifying to the weak entity set
Identifying relationship depicted using a double diamond
The discriminator (or partial key) of a weak entity set is the set of
attributes that distinguishes among all the entities of a weak entity set.
The primary key of a weak entity set is formed by the primary key of the
strong entity set on which the weak entity set is existence dependent, plus
the weak entity set’s discriminator.
We depict a weak entity set by double rectangles.
We underline the discriminator of a weak entity set with a dashed line.
Example: payment-number – discriminator of the payment entity set
The primary key of the strong entity set is not explicitly stored with the
weak entity set, since it is implicit in the identifying relationship.
If loan-number were explicitly stored, payment could be made a strong
entity, but then the relationship between payment and loan would be
duplicated by an implicit relationship defined by the attribute loan-number
common to payment and loan.
Specialization
Generalization
Attribute Inheritance
Constraints on Generalizations
Aggregation
Specialization:
o Officer
o Teller
o Secretary
Generalization:
Attribute inheritance:
The crucial property of the higher and lower level entities created by
specialization and generalization is attribute inheritance.
A lower-level entity set inherits all the attributes and relationship
participation of the higher-level entity set to which it is linked.
Example: The officer, teller, and secretary entity sets can participate in
the works_for relationship set, since the super class employee participates in
the works_for relationship.
The attributes of the higher-level entity sets are said to be inherited by the
lower-level entity sets.
Example: customer and employee entity inherit the attributes of person.
Constraints on Generalizations:
Condition-defined:
Membership in a lower-level entity set is determined by whether an entity
satisfies an explicit condition or predicate.
Disjointness:
A constraint on whether or not entities may belong to more than one lower-
level entity set within a single generalization.
Aggregation
For each entity set and relationship set there is a unique table which is
assigned the name of the corresponding entity set or relationship set.
A weak entity set becomes a table that includes a column for the
primary key of the identifying strong entity set.
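Following that rule, the payment example from earlier in the notes can be sketched in SQLite: the payment table's primary key combines loan_number (the identifying strong entity's key) with the discriminator payment_number. The column types and sample rows are assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.execute("create table loan(loan_number text primary key, amount integer)")
conn.execute(
    "create table payment("
    " loan_number text references loan(loan_number),"
    " payment_number integer,"    # discriminator (partial key)
    " payment_date text, payment_amount integer,"
    " primary key (loan_number, payment_number))"
)
conn.execute("insert into loan values ('L-17', 1000)")
conn.execute("insert into loan values ('L-23', 2000)")

# The same payment_number may recur under different loans ...
conn.execute("insert into payment values ('L-17', 1, '2010-01-05', 100)")
conn.execute("insert into payment values ('L-23', 1, '2010-01-06', 200)")

# ... but not twice under the same loan.
try:
    conn.execute("insert into payment values ('L-17', 1, '2010-02-05', 100)")
    dup_allowed = True
except sqlite3.IntegrityError:
    dup_allowed = False
```

This is exactly why the weak entity's primary key must include the strong entity's key: payment_number alone is not unique.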
Redundancy of Tables
Method 1:
Method 2:
Form a table for each entity set with all local and inherited attributes
Single unit without concern for the details of its internal structure.
o Dependent_name – multivalued attribute.
o Start_date – base attribute.
o Employment_length – derived attribute.
Two account entities: account={account_number, balance}
o Saving _account={interest_rate}
o Checking_account={overdraft_amount}
Loan={loan_number, amount, originating_branch}
Loan_payment={payment_number, payment_date, payment_amount}
The above diagram includes the entity sets, relationship sets, and mapping
cardinalities arrived at through the design processes.
The attributes of the entity set are shown in one occurrence of the entity
and all other occurrence of the entity is shown without any attributes.
What is UML?
Use Cases
• TYPE
o Primary - represents major common processes
o Secondary - represents minor or rare processes
o Optional - processes that may not be implemented in the system
depending upon resources
• DESCRIPTION - short and essential
Borrow books
Overview: A student goes to the library and borrows books. The student brings a
few books to the counter. The librarian asks for the student id. The student shows
the student id and the books that he/she wishes to borrow. The librarian scans the
student id and the books through the bar code scanner.
Cross-References: None
Alternative Courses:
Line 3: The librarian tells the student to pay the fine. Use case cancelled.
Line 5: The system notifies the librarian that the borrowing limit has been
reached. Use case cancelled.
Include relationship
An include relationship connects a base use case (i.e. borrow books) to an
inclusion use case (i.e. check Fine). An include relationship specifies how
behaviour in the inclusion use case is used by the base use case.
Extend relationship
An extend relationship specifies how the functionality of one use case can
be inserted into the functionality of another use case. The base use case
implicitly incorporates the behaviour of another use case at a location specified
indirectly by the extending use case. The extend relationships are important
because they show optional functionality or system behavior.
Notice the extend relationship between Request a book and Search. The
extend relationship is significant because it shows optional functionality. If the
student desires, he/she can search the book through the system. However, the
student may only Request a book through the system without searching the book
if the student knows the call number.
Generalisation relationship
Class Diagrams
• Conceptual
o used in the initial analysis (not a model of software design)
o shows real world concepts
o better to overspecify than underspecify it
• Specification
o does describe software
o describes the interface of a software, not the implementation
o types rather than classes
• Implementation
o describes the actual software in the system
o classes
1. Generalisations (subtyping)
2. Associations
Each association has two association ends. Each end is attached to one of the
classes in the association. An end can be explicitly named with a label. This label
is called a role name. An association end also has multiplicity, which is an
indication of how many objects may participate in the given relationship.
The following are the additional model elements covered in a class diagram:
Interfaces <<interface>>
Q-BANK
7. a) Define and explain the terms database and database management system
with example. (7) - May 2007
b) Compare and contrast the conventional file system with DBMS. (8) - April 2008
8) What is a data model? Compare and contrast three popular data models with
example. (15) - April 2008
12) Draw an E-R diagram for a hospital with a set of patients and a set of
doctors. Associate with each patient a log of the various tests and examinations
conducted. (15) - May 2007
13) Explain the distinctions among the super key, candidate key and primary key.
(7) - May 2007
UNIT - II
2Mark Q & A
1. Give the syntax for creating the table with composite primary key.
Multicolumn primary key is called composite primary key.
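The syntax the question asks for can be sketched as follows; the enrollment table and its columns are invented for illustration, run here through Python's sqlite3:

```python
import sqlite3

# Composite primary key syntax:
#   create table enrollment(
#       sid integer, course_no text, grade text,
#       primary key (sid, course_no));
conn = sqlite3.connect(":memory:")
conn.execute("create table enrollment(sid integer, course_no text, grade text,"
             " primary key (sid, course_no))")
conn.execute("insert into enrollment values (1, 'CS101', 'A')")
conn.execute("insert into enrollment values (1, 'CS102', 'B')")  # same sid, new course: OK
try:
    conn.execute("insert into enrollment values (1, 'CS101', 'C')")  # duplicate pair
    dup_allowed = True
except sqlite3.IntegrityError:
    dup_allowed = False
```

Only the combination (sid, course_no) must be unique; each column may repeat on its own.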
Query: select loan_no, branch_name from loan where amount >= 500 and
amount <= 1000;
3. Find the names of all branches with customers who have an account
in the bank and who live in the Harrison city using Equi-join.
4. Find the names of all branches with customers who have an account
in the bank and who live in the Harrison city using Sub-Queries.
5. Select the rows from borrower such that the loan numbers are
lesser than any loan number where the branch name is Downtown.
Query: select * from borrower where loan_no < any (Select loan_no
from loan where branch_name='downtown');
6. Define self-join
Joining of a table to itself is called self-join. i.e., it joins one row in a
table to another row.
Degree:
The number of attributes n of its relation schema is called the degree of a
relation. E.g. the account table's degree is 3, since there are three
attributes in that relation.
Domain:
Set of permitted values for each attribute (or) data type describing
the types of values that can appear in each column is called a domain of a
relation. eg. Set of all account numbers of the account table.
20. Define super key and give example to illustrate the super key.
A set of one or more attributes that, taken collectively, allow us to
identify uniquely an entity in the entity set.
21. Define candidate key and give example to illustrate the candidate
key.
Super keys for which no proper subset is itself a super key are called
candidate keys; in other words, a candidate key is a minimal super key.
The primary key used in SQL is chosen from among the candidate keys.
Eg1. {SSN} is the candidate key for the super keys {SSN} and
{SSN, Cust_name} of customer table.
Eg2. {Branch_name} is the candidate key for the super keys
{Branch_name} and {Branch_name, Branch_city} of Branch table.
22. List out the six fundamental operators and 4 additional operators in
relational algebra.
Six fundamental operators:
Selection (σ)
Projection (Π)
Union (∪)
Set Difference (−)
Cartesian Product (×)
Rename (ρ)
Four additional operators:
Set Intersection (∩)
Natural Join (⋈)
Division (÷)
Assignment (←)
23. Which operators are called as unary operators and explain why they
are called so.
Unary operators:
• Selection (σ)
• Projection (Π)
• Rename (ρ)
These operators are called as unary operators because they operate
on only one relation.
24. Which operators are called as binary operators and explain why
they are called so.
Binary operators:
• Union (∪)
• Set Difference (−)
• Cartesian Product (×)
These operators are called as binary operators because they
operate on pairs of relations.
26. Explain the use of set difference operator and give an example to
illustrate the same.
Use of set difference operator:
Allows finding tuples that are in one relation but are not in another
relation.
Example: Find all customers of the bank who have an account but
not a loan.
27. Explain the two conditions needed for the set difference operation
(union operation) to be valid.
28. Explain with one example why the additional operators are
separated from the fundamental operators?
Additional operators are used instead of fundamental operators to
reduce the complexity of long relational algebra expressions.
E.g. r ∩ s = r − (r − s)
Intersection can be expressed using repeated set difference operations.
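The identity can be checked directly on two small relations represented as Python sets of tuples (the sample data is invented for illustration):

```python
# Checking the identity r ∩ s = r − (r − s) on two small relations.
r = {("A-101", "Downtown"), ("A-102", "Perryridge"), ("A-103", "Brighton")}
s = {("A-102", "Perryridge"), ("A-104", "Redwood")}

via_difference = r - (r - s)   # intersection built from set difference alone
direct = r & s                 # Python's built-in set intersection
```

Both expressions yield the single tuple common to r and s.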
29. Define and give the general format used for generalized projection.
Give one example expression to illustrate the same.
Generalized projection extends the projection operation by allowing
arithmetic functions to be used in the projection list.
General format used for generalized projection is
Π F1, F2, …, Fn (E) where
E is the relational algebra expression, and
F1, F2, …, Fn are arithmetic expressions involving constants and
attributes in the schema of E. As a special case, each may be
simply an attribute or a constant.
Example expression for illustration:
Π acc_no, branch_name, balance + 100 (account)
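In SQL, this generalized projection corresponds to computing balance + 100 in the select list. A small sketch via Python's sqlite3, with sample account rows invented for illustration:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table account(acc_no text, branch_name text, balance integer)")
conn.executemany("insert into account values (?, ?, ?)",
                 [("A-101", "Downtown", 500), ("A-102", "Perryridge", 700)])

# Π acc_no, branch_name, balance + 100 (account) as SQL:
result = conn.execute(
    "select acc_no, branch_name, balance + 100 from account order by acc_no"
).fetchall()
```

Each returned row carries the computed value rather than the stored balance.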
30. What is the use of outer join and list out the three types of outer
join with the notations used in relational algebra?
Natural join keeps only tuples with matching values in the common columns,
so tuples that have no match in the other relation are lost. The outer join
avoids this loss of information.
Three types of outer join:
Left outer join
Right outer join
Full outer join
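A left outer join can be sketched in SQLite: a borrower whose loan number has no matching loan row is still kept, with the missing loan columns padded with nulls. The tables and rows here are invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table borrower(customer_name text, loan_no text)")
conn.execute("create table loan(loan_no text, amount integer)")
conn.executemany("insert into borrower values (?, ?)",
                 [("Jones", "L-170"), ("Smith", "L-999")])
conn.execute("insert into loan values ('L-170', 3000)")

rows = conn.execute(
    "select b.customer_name, b.loan_no, l.amount"
    " from borrower b left outer join loan l on b.loan_no = l.loan_no"
    " order by b.customer_name"
).fetchall()
# Smith's loan L-999 has no match, so amount comes back as NULL (None in Python).
```

An inner (natural) join would silently drop Smith's row; the outer join preserves it.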
33. Define materialized views and explain the use of such views.
Definitions:
Certain database systems allow view relations to be stored, but they
make sure that if the actual relations used in the view definition change
then the view is kept up to date. Such views are called materialized
views. The process of keeping views up to date is called view maintenance.
Lecture notes:
Relational Model
Definitions
Example:
Students(sid: string, name: string, login: string, age: integer, gpa: real)
Relational database:
Example of relation:
Attribute:
The set of allowed values for each attribute is called the domain of the attribute.
E.g. the value of an attribute can be an account number, but cannot be a set of
account numbers. Domain is said to be atomic if all its members are atomic. The
special value null is a member of every domain. The null value causes
complications in the definition of many operations.
Database schema:
Relation schema:
integrity constraints.
Relation Instance:
attributes
Database:
E.g.
Keys:
K ⊆ R
Types of keys:
3) Foreign Keys:
Relation Algebra
1. Select
2. Project
3. Union
4. Set difference
5. Cartesian product and
6. Rename.
Example
A  B  C   D
α  α  1   7
α  β  5   7
β  β  12  3
β  β  23  10
Selection Operation
Notation: σp(r)
p is called the selection predicate.
Defined as:
σp(r) = {t | t ∈ r and p(t)}
where p is a formula in propositional calculus consisting of
terms connected by: ∧ (and), ∨ (or), ¬ (not)
Each term is one of:
<attribute> op <attribute> or <constant>
where op is one of: =, ≠, <, ≤, >, ≥
Select
σ (sigma) is used.
Syntax
σcondition(tablename)
Example
σsal>1000(emp)
Project Operation
Syntax
Π<attribute list>(Table name)
Example
Πeid,sal(employee)
Πeid,sal(σsal>1000(employee))
Selects tuples where sal>1000 & from them only eid and salary attributes
are selected.
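The two operators can be sketched as toy Python functions over a relation modelled as a list of dicts; the employee data below is invented for illustration.

```python
def select(predicate, relation):
    """sigma_predicate(relation): keep only tuples satisfying the predicate."""
    return [t for t in relation if predicate(t)]

def project(attributes, relation):
    """pi_attributes(relation): keep only the named attributes, removing duplicates."""
    seen, result = set(), []
    for t in relation:
        row = tuple(t[a] for a in attributes)
        if row not in seen:
            seen.add(row)
            result.append(dict(zip(attributes, row)))
    return result

employee = [{"eid": 1, "name": "Asha", "sal": 900},
            {"eid": 2, "name": "Ravi", "sal": 1500},
            {"eid": 3, "name": "Mala", "sal": 2000}]

# pi_{eid,sal}(sigma_{sal>1000}(employee))
high_paid = project(["eid", "sal"], select(lambda t: t["sal"] > 1000, employee))
```

Note that projection removes duplicate rows, mirroring the set semantics of relational algebra.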
Union Operation
Consider a query to find the names of all bank customers who have either an
account or a loan or both. Note that the customer relation does not contain the
information, since a customer does not need to have either an account or a loan
at the bank.
1. We know how to find the names of all customers with a loan in the
bank:
Πcustomer-name (borrower)
2. We also know how to find the names of all customers with an account in
the bank:
Πcustomer-name (depositor)
The answer is the union of these two sets:
Πcustomer-name (borrower) ∪ Πcustomer-name (depositor)
For a union operation r ∪ s to be valid, we require that two conditions hold:
1. The relations r and s must be of the same arity. That is, they must have the
same number of attributes.
2. The domains of the ith attribute of r and the ith attribute of s must be the
same, for all i.
Suppose that we want to find the names of all customers who have a
loan at the Perryridge branch. We need the information in both the loan
relation and the borrower relation to do so.
Consider the query "Find the names of all customers who live on the same street
and in the same city as Smith." We can obtain Smith's street and city by writing
Πcustomer-street, customer-city (σcustomer-name = "Smith" (customer))
However, in order to find other customers with this street and city, we must
reference the customer relation a second time.In the following query, we use the
rename operation on the preceding expression to give its result the name smith-
addr, and to rename its attributes to street and city, instead of customer-street
and customer-city:
Πcustomer.customer-name
(σcustomer.customer-street = smith-addr.street ∧ customer.customer-city = smith-addr.city
(customer × ρsmith-addr(street, city)
(Πcustomer-street, customer-city (σcustomer-name = "Smith" (customer)))))
Additional Operations:
We define additional operations that do not add any power to the algebra,
but simplify common queries.
Consider the query “Find the names of all customers who have a loan at the
bank, along with the loan number and the loan amount”
The natural join is a binary operation that allows us to combine certain selections
and Cartesian product into one operation.
Suppose that we wish to find all customers who have an account at all the
branches located in Brooklyn.
Step 1:
We can obtain all branches in Brooklyn by the expression
Step 2:
We can find all (customer-name, branch-name) pairs for which the customer has
an account at a branch by writing
Step 3:
Now, we need to find a customer who appears in r2 with every branch name in
r1. The operation that provides exactly those customers is the division
operation. We formulate the query by writing
r2 ÷ r1
The evaluation of an assignment does not result in any relation being displayed
to the user. Rather, the result of the expression to the right of the ← is assigned
to the relation variable on the left of the←. This relation variable may be used in
subsequent expressions.
It formed the inspiration for the database query languages QUEL and SQL
of which the latter, although far less faithful to the original relational model and
calculus, is now used in almost all relational database management systems as
the ad-hoc query language.
Along with the tuple calculus Codd also introduced the domain calculus
which is closer to first-order logic and showed that these two calculi (and the
relational algebra) are equivalent in expressive power. Subsequently query
languages for the relational model were called relationally complete if they could
express at least all these queries.
Relational database
h : R → 2^C
a function that associates a header with each relation name in R. (Note that this
is a simplification from the full relational model where there is more than one
domain and a header is not just a set of column names but also maps these
column names to a domain.)
t : C → D
The set of all tuples over D is denoted as T_D. The subset of C for which a tuple t is
defined is called the domain of t (not to be confused with the domain in the
schema) and denoted as dom(t).
db : R → 2^(T_D)
that maps the relation names in R to finite subsets of T_D, such that for every
relation name r in R and tuple t in db(r) it holds that
dom(t) = h(r).
The latter requirement simply says that all the tuples in a relation should contain
the same column names, namely those defined for it in the schema.
Atoms
For the construction of the formulas we will assume an infinite set V of tuple
variables. The formulas are defined given a database schema S = (D, R, h) and a
partial function type : V -> 2C that defines a type assignment that assigns
headers to some tuple variables. We then define the set of atomic fomulas
A[S,type] with the following rules:
1. if v and w in V, a in type(v) and b in type(w) then the formula " v.a = w.b "
is in A[S,type],
2. if v in V, a in type(v) and k denotes a value in D then the formula " v.a = k
" is in A[S,type], and
3. if v in V, r in R and type(v) = h(r) then the formula " r(v) " is in A[S,type].
• (t.name = "Codd") -- tuple t has a name attribute and its value is "Codd"
• (t.age = s.age) -- t has an age attribute and s has an age attribute with the
same value
• Book(t) -- tuple t is present in relation Book.
The formal semantics of such atoms is defined given a database db over S and a
tuple variable binding val : V -> TD that maps tuple variables to tuples over the
domain in S:
Formulas
The atoms can be combined into formulas, as is usual in first-order logic, with the
logical operators ∧ (and), ∨ (or) and ¬ (not), and we can use the existential
quantifier (∃) and the universal quantifier (∀) to bind the variables. We define the
set of formulas F[S,type] inductively with the following rules:
1. every atom in A[S,type] is also in F[S,type],
2. if f1 and f2 are in F[S,type] then the formulas " f1 ∧ f2 ", " f1 ∨ f2 " and " ¬ f1 " are also in F[S,type], and
3. if v is a tuple variable, H a header and f a formula in F[S,type'], where type' is type extended with (v, H), then " ∃ v : H ( f ) " and " ∀ v : H ( f ) " are also in F[S,type].
Examples of formulas:
• t.name = "C. J. Date" ∨ t.name = "H. Darwen"
• Book(t) ∨ Magazine(t)
• ∀ t : {author, subject} ( ¬ ( Book(t) ∧ t.author = "C. J. Date" ∧ ¬ ( t.subject = "relational model" ) ) )
Note that the last formula states that all books that are written by C. J. Date have as their subject the relational model. As usual we omit brackets if this causes no ambiguity about the semantics of the formula.
We will assume that the quantifiers quantify over the universe of all tuples over
the domain in the schema. This leads to the following formal semantics for
formulas given a database db over S and a tuple variable binding val : V -> TD:
1. " f1 ∧ f2 " is true if and only if " f1 " is true and " f2 " is true,
2. " f1 ∨ f2 " is true if and only if " f1 " is true or " f2 " is true or both are true,
3. " ¬ f " is true if and only if " f " is not true,
4. " ∃ v : H ( f ) " is true if and only if there is a tuple t over D such that dom(t)
= H and the formula " f " is true for val[v->t], and
5. " ∀ v : H ( f ) " is true if and only if for all tuples t over D such that dom(t) =
H the formula " f " is true for val[v->t].
Queries
Finally we define what a query expression looks like given a schema S = (D, R,
h):
{ v : H | f(v) }
where v is a tuple variable, H a header and f(v) a formula in F[S,type] where type
= { (v, H) } and with v as its only free variable. The result of such a query for a
given database db over S is the set of all tuples t over D with dom(t) = H such
that f is true for db and val = { (v, t) }.
Domain-independent queries
Because the semantics of the quantifiers is such that they quantify over all the
tuples over the domain in the schema it can be that a query may return a
different result for a certain database if another schema is presumed. For
example, consider the two schemas S1 = ( D1, R, h ) and S2 = ( D2, R, h ) with
domains D1 = { 1 }, D2 = { 1, 2 }, relation names R = { "r1" } and headers h =
{ ("r1", {"a"}) }. Both schemas have a common instance:
db = { ( "r1", { ("a", 1) } ) }
Consider, for example, the query expression { t : {"a"} | ¬ r1(t) }. Under schema S1 its result is empty, but under schema S2 it contains the tuple ("a", 2), so the result depends on which schema is presumed. Worse, if we allow an infinite domain, the result of the query will also be infinite. To solve these problems we will restrict our attention to those queries that are domain independent, i.e., the queries that return the same result for a database under all of its schemas.
Safe queries
In order to limit the query expressions such that they express only domain-
independent queries a syntactical notion of safe query is usually introduced. To
determine whether a query expression is safe we will derive two types of
information from a query. The first is whether a variable-column pair t.a is bound
to the column of a relation or a constant, and the second is whether two variable-
column pairs are directly or indirectly equated (denoted t.v == s.w).
Equatedness of variable-column pairs is derived from the equalities that appear in the formula, together with the usual reasoning rules for equivalence relations (reflexivity, symmetry and transitivity). Given these two notions, we call a query expression { v : H | f(v) } safe if:
• for every column name a in H we can derive that v.a is equated with a
bound pair in f,
• for every subexpression of f of the form " ∀ w : G ( g ) " we can derive that
for every column name a in G we can derive that w.a is equated with a
bound pair in g, and
• for every subexpression of f of the form " ∃ w : G ( g ) " we can derive that
for every column name a in G we can derive that w.a is equated with a
bound pair in g.
The restriction to safe query expressions does not limit the expressiveness since
all domain-independent queries that could be expressed can also be expressed
by a safe query expression. This can be proven by showing that for a schema S =
(D, R, h), a given set K of constants in the query expression, a tuple variable v
and a header H we can construct a safe formula for every pair v.a with a in H that
states that its value is in the active domain. For example, assume that K = {1, 2}, R = {"r"} and h = { ("r", {"a", "b"}) }; then the corresponding safe formula for v.b is:
v.b = 1 ∨ v.b = 2 ∨ ∃ w : {a, b} ( r(w) ∧ ( v.b = w.a ∨ v.b = w.b ) )
This formula, then, can be used to rewrite any unsafe query expression to an
equivalent safe query expression by adding such a formula for every variable v
and column name a in its type where it is used in the expression. Effectively this
means that we let all variables range over the active domain, which, as was
already explained, does not change the semantics if the expressed query is
domain independent.
A domain relational calculus query has the form
{ <X1, X2, ..., Xn> | p(<X1, X2, ..., Xn>) }
where each Xi is either a domain variable or a constant, and p(<X1, X2, ..., Xn>) denotes a DRC formula. The result of the query is the set of tuples of values for X1 to Xn that make the DRC formula true.
This language uses the same operators as tuple calculus: the logical operators ∧ (and), ∨ (or) and ¬ (not). The existential quantifier (∃) and the universal quantifier (∀) can be used to bind the variables.
Its computational expressivity is equivalent to that of relational algebra.
Examples
In this example, A, B and C denote both the result set and columns in the table Enterprise.
In this example, we're only looking for the name, so <B> denotes the column
Name. F = C is a requirement, because we need to find Enterprise crew members
AND they are in the Stellar Cartography Department.
In this example, the value of the requested F domain is directly placed in the
formula and the C domain variable is re-used in the query for the existence of a
department, since it already holds a crew member's id.
SQL
History
IBM Sequel language developed as part of System R project at the IBM San
Jose Research Laboratory
Renamed Structured Query Language (SQL)
ANSI and ISO standard SQL:
o SQL86
o SQL89
o SQL92
o SQL:1999 (language name became Y2K compliant!)
o SQL:2003
Commercial systems offer most, if not all, SQL92 features, plus varying
feature sets from later standards and special proprietary features.
o Not all examples here may work on your particular system.
DQL: SELECT
DML: DELETE, INSERT, UPDATE
DDL: CREATE, DROP, TRUNCATE, ALTER
Allows the specification of not only a set of relations but also information about
each relation, including
The schema for each relation.
The domain of values associated with each attribute.
Integrity constraints
The set of indices to be maintained for each relation.
Security and authorization information for each relation.
The physical storage structure of each relation on disk.
Select
The SELECT statement, as the name says, is used to extract data from the Oracle database. The simplest SELECT statement has the following syntax:
SELECT column_list
FROM table_name;
Example:
SELECT *
FROM emp;
This command will display all the fields of the table emp and all of the records.
Example:
SELECT ename, sal
FROM emp
WHERE sal > 2000;
The result of this statement will be only two columns of the emp table, and only those records where the salary is greater than 2000.
Example:
SELECT ename, sal
FROM emp
WHERE sal > 2000
ORDER BY ename;
86 | P a g e Department of Information Technology
Database Management Systems
The output of this statement will be exactly the same as the one above except
that the output will be sorted based on ename column.
SQL Operators
Logical operators, and the operators in the Other category, are best understood by looking at real-world examples.
Example:
SELECT sal
FROM emp
WHERE deptno = 30 AND sal > 2000;
The output of the query will be only one column, i.e. sal, and only those records will be displayed where the department number is 30 and the salary is greater than 2000. So when you use the AND operator, both conditions need to be satisfied for a record to appear in the output; with OR, either the first condition or the second needs to be true, e.g.
SELECT sal
FROM emp
WHERE deptno = 30 OR sal > 2000;
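The difference between AND and OR is easy to verify with a small in-memory database. The sketch below uses Python's sqlite3 module as a stand-in for Oracle; the three emp rows are invented for illustration.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE emp (ename TEXT, deptno INTEGER, sal INTEGER)")
con.executemany("INSERT INTO emp VALUES (?, ?, ?)",
                [("BLAKE", 30, 2850), ("JAMES", 30, 950), ("FORD", 20, 3000)])

# AND: both conditions must hold, so only BLAKE qualifies.
and_rows = con.execute(
    "SELECT sal FROM emp WHERE deptno = 30 AND sal > 2000").fetchall()

# OR: either condition may hold, so all three rows qualify.
or_rows = con.execute(
    "SELECT sal FROM emp WHERE deptno = 30 OR sal > 2000").fetchall()

print(and_rows)  # [(2850,)]
print(or_rows)   # [(2850,), (950,), (3000,)]
```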
Example:
SELECT *
FROM emp
WHERE job IN ('CLERK','ANALYST');
The output of the query will be all the columns of the emp table, but only those records where the job column contains either "CLERK" or "ANALYST". You can also use the IN operator with a subquery, as follows.
SELECT *
FROM emp
WHERE sal IN (SELECT sal
FROM emp
WHERE deptno = 30);
Putting NOT before IN completely inverts the result, as in the following example. Such queries fall under the category called "sub-queries", which we will discuss further ahead in this chapter; there is a special technique for interpreting them.
SELECT *
FROM emp
WHERE sal NOT IN (SELECT sal
FROM emp
WHERE deptno = 30);
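The IN/NOT IN pair can be demonstrated the same way. The SMITH row below is invented to show that the subquery form also picks up employees outside department 30 who happen to earn a department-30 salary.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE emp (ename TEXT, deptno INTEGER, sal INTEGER)")
con.executemany("INSERT INTO emp VALUES (?, ?, ?)",
                [("BLAKE", 30, 2850), ("JAMES", 30, 950),
                 ("FORD", 20, 3000), ("SMITH", 20, 950)])

# IN with a subquery: everyone whose salary equals some dept-30 salary.
# SMITH (dept 20, sal 950) matches because JAMES in dept 30 earns 950.
in_rows = con.execute(
    "SELECT ename FROM emp "
    "WHERE sal IN (SELECT sal FROM emp WHERE deptno = 30)").fetchall()

# NOT IN inverts the result set.
not_in_rows = con.execute(
    "SELECT ename FROM emp "
    "WHERE sal NOT IN (SELECT sal FROM emp WHERE deptno = 30)").fetchall()

print(in_rows)      # [('BLAKE',), ('JAMES',), ('SMITH',)]
print(not_in_rows)  # [('FORD',)]
```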
Example:
SELECT *
FROM emp
WHERE sal BETWEEN 2000 AND 3000;
Only those records will be displayed where the salary is between 2000 and 3000
including both 2000 and 3000.
Example:
SELECT sal
FROM emp
WHERE ename LIKE 'SM%';
The output will be only those salaries from emp (employee) table where ename
(employee name) begins with "SM". Another variation of the above query is as follows:
SELECT sal
FROM emp
WHERE ename LIKE 'SMITH_';
The output will be only those records where ename begins with "SMITH" and there is exactly one more character after it (the underscore wildcard matches a single character).
Example:
SELECT ename, deptno
FROM emp
WHERE comm IS NULL;
The output will be ename and deptno, but only those records where the comm field is NULL. NULL is a special value; keep in mind that it is not zero. It can be visualized as an empty field occupying zero bytes.
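The BETWEEN, LIKE and NULL behaviours above can be checked together. Note the last point especially: comparing a column to NULL with = never succeeds; you must use IS NULL. The sample rows are invented.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE emp (ename TEXT, sal INTEGER, comm INTEGER)")
con.executemany("INSERT INTO emp VALUES (?, ?, ?)",
                [("SMITH", 2000, None), ("SMYTHE", 3000, 300),
                 ("ALLEN", 3500, 500)])

# BETWEEN is inclusive: both endpoints, 2000 and 3000, are returned.
between = con.execute(
    "SELECT ename FROM emp WHERE sal BETWEEN 2000 AND 3000").fetchall()

# LIKE 'SM%' matches any name beginning with SM.
like = con.execute(
    "SELECT ename FROM emp WHERE ename LIKE 'SM%'").fetchall()

# NULL is not a value: '= NULL' never matches; IS NULL must be used.
eq_null = con.execute("SELECT ename FROM emp WHERE comm = NULL").fetchall()
is_null = con.execute("SELECT ename FROM emp WHERE comm IS NULL").fetchall()

print(between, like)     # [('SMITH',), ('SMYTHE',)] twice
print(eq_null, is_null)  # [] and [('SMITH',)]
```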
Functions
Single-row functions are further subdivided into five categories: character, date, numeric, conversion and other functions. First we will start with character functions, or more precisely "single-row character functions."
Character Functions
CHR, ASCII, CONCAT, INITCAP, LOWER, UPPER, LPAD, RPAD, LTRIM, RTRIM, TRIM, REPLACE, SOUNDEX, SUBSTR, SUBSTRB, INSTR, INSTRB, LENGTH, LENGTHB
Example:
SELECT CHR(67)||CHR(65)||CHR(84) "Pet"
FROM DUAL;
Output:
Pet
---
CAT
Example:
SELECT ASCII('Q')
FROM DUAL;
Output:
ASCII('Q')
----------
81
Example:
SELECT CONCAT('JAMES', ' is a good boy') "Result"
FROM DUAL;
Output:
Result
-----------------
JAMES is a good boy
Example:
SELECT INITCAP('the king') "Capitals"
FROM DUAL;
Output:
Capitals
---------
The King
Example:
SELECT LOWER('THE KING') "Lowercase"
FROM DUAL;
Output:
Lowercase
-------------
the king
Example:
SELECT LPAD('Page 1', 15, '*+') "LPAD example"
FROM DUAL;
Output:
LPAD example
---------------
*+*+*+*+*Page 1
Example:
Output:
Result
--------
SPIDERMAN
Example:
Output:
Result
--------
1234567
Output:
Result
--------
1234567000
Example:
Output:
Changes
--------------
HONG KONG
Example:
SELECT ename
FROM emp
WHERE SOUNDEX(ename) = SOUNDEX('SMYTHE');
Output:
ENAME
----------
SMITH
This function allows you to compare words that are spelled differently but sound alike in English. You may have noticed that if you search Google (www.google.com) with a misspelled word, e.g. "Neus", it comes back with "Did you mean News?". That is basically the idea behind this function.
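Under the hood, SOUNDEX computes a phonetic code: the first letter, followed by digits for groups of similar-sounding consonants, padded to four characters. A simplified Python sketch of the idea (it omits some refinements of the full algorithm, such as the H/W separator rule):

```python
def soundex(word):
    """Simplified Soundex: first letter plus three consonant-group digits."""
    codes = {}
    for letters, digit in [("BFPV", "1"), ("CGJKQSXZ", "2"), ("DT", "3"),
                           ("L", "4"), ("MN", "5"), ("R", "6")]:
        for ch in letters:
            codes[ch] = digit

    word = word.upper()
    result = word[0]
    prev = codes.get(word[0], "")
    for ch in word[1:]:
        code = codes.get(ch, "")
        if code and code != prev:  # skip vowels and repeated codes
            result += code
        prev = code
    return (result + "000")[:4]   # pad or cut to four characters

print(soundex("SMITH"), soundex("SMYTHE"))  # S530 S530
```

Both spellings collapse to the same code, which is exactly why the WHERE clause above matched SMITH when given 'SMYTHE'.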
Example:
Result
---------
MAN
Output:
Result
--------
DE
Example:
Output:
Result
----------
14
Similarly we can use INSTRB; for a single-byte database character set, INSTRB is equivalent to INSTR. Now let's suppose a double-byte database character set.
Output:
Result
--------
27
Example:
Output:
Result
--------
9
Output:
Result
--------
14
Date Functions:
ADD_MONTHS
MONTHS_BETWEEN
LAST_DAY
ROUND
SYSDATE
TRUNC
Example:
SELECT ADD_MONTHS(hiredate,1)
FROM emp
WHERE ename = 'SMITH';
Example:
SELECT SYSDATE,
LAST_DAY(SYSDATE) "Last",
LAST_DAY(SYSDATE) - SYSDATE "Days Left"
FROM DUAL;
Output:
Example:
SELECT ROUND(TO_DATE('27-OCT-92'), 'YEAR') "New Year"
FROM DUAL;
Output:
New Year
---------
01-JAN-93
Example:
SELECT TRUNC(TO_DATE('27-OCT-92'), 'YEAR') "New Year"
FROM DUAL;
Output:
New Year
---------
01-JAN-92
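ADD_MONTHS and LAST_DAY are easy to mimic in Python, which also shows the arithmetic behind the "Days Left" column of the SYSDATE example above. The dates below are made up; note how ADD_MONTHS clamps the day when the target month is shorter.

```python
import calendar
from datetime import date

def add_months(d, n):
    """Sketch of Oracle's ADD_MONTHS: shift by n months, clamping the day."""
    months = d.month - 1 + n
    year, month = d.year + months // 12, months % 12 + 1
    return date(year, month, min(d.day, calendar.monthrange(year, month)[1]))

def last_day(d):
    """Sketch of Oracle's LAST_DAY: the last day of d's month."""
    return d.replace(day=calendar.monthrange(d.year, d.month)[1])

hiredate = date(1980, 12, 17)          # made-up hire date
today = date(1981, 2, 10)              # pretend SYSDATE
print(add_months(hiredate, 1))         # 1981-01-17
print(last_day(today))                 # 1981-02-28
print((last_day(today) - today).days)  # the "Days Left" arithmetic: 18
```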
Numeric Functions:
ABS
ROUND
SIGN
TRUNC
CEIL
SQRT
FLOOR
MOD
Example:
Output:
Result
----------
25
Example:
Output:
Result
----------
-1
Example:
Output:
Result
----------
26
Example:
FROM DUAL;
Output:
Result
----------
25
Example:
SELECT ROUND(25.29, 1) "Round"
FROM DUAL;
Output:
Round
----------
25.3
SELECT ROUND(25.29, -1) "Round"
FROM DUAL;
Output:
Round
----------
30
Example:
SELECT TRUNC(25.29, 1) "Truncate"
FROM DUAL;
Output:
Truncate
----------
25.2
SELECT TRUNC(25.29, -1) "Truncate"
FROM DUAL;
Output:
Truncate
----------
20
A precision of -1 truncates (zeroes) the first digit to the left of the decimal point, so truncating 25.29 with -1 gives 20.
Example:
Output:
Modulus
----------
3
Example:
SELECT SQRT(25) "Square root"
FROM DUAL;
Output:
Square root
-----------
5
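ROUND and TRUNC with a precision argument, including the negative precision used above, can be sketched in Python; CEIL, FLOOR, SQRT and MOD map directly onto math functions and the % operator. These are illustrative re-implementations, not Oracle's exact code.

```python
import math

def trunc(x, digits=0):
    """Sketch of Oracle's TRUNC: drop digits past the given decimal position."""
    if digits >= 0:
        factor = 10 ** digits
        return math.trunc(x * factor) / factor
    factor = 10 ** (-digits)
    return math.trunc(x / factor) * factor

def round_ora(x, digits=0):
    """Sketch of Oracle's ROUND (half away from zero), negative digits too."""
    sign = 1 if x >= 0 else -1
    if digits >= 0:
        factor = 10 ** digits
        return sign * math.floor(abs(x) * factor + 0.5) / factor
    factor = 10 ** (-digits)
    return sign * math.floor(abs(x) / factor + 0.5) * factor

print(round_ora(25.29, 1), round_ora(25.29, -1))  # 25.3 30
print(trunc(25.29, 1), trunc(25.29, -1))          # 25.2 20
print(math.ceil(25.29), math.floor(25.29))        # 26 25
print(math.sqrt(25), 15 % 4)                      # 5.0 3
```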
Conversion Functions:
TO_CHAR
TO_DATE
TO_NUMBER
Example:
Output:
Result
------------------
May 01, 1981
Example:
Output:
Result
--------------
$10,000.00-
Example:
Example:
Output:
Result
---------
-1000
Other Functions:
NVL
VSIZE
Example:
Output:
ENAME COMMISSION
---------- -------------------------
ALLEN 300
WARD 500
MARTIN 1400
Example:
SELECT ename, VSIZE(ename) "BYTES"
FROM emp
WHERE deptno = 10;
Output:
ENAME BYTES
---------- ----------
CLARK 5
KING 4
MILLER 6
Group Functions:
A group function as the name says, gets implemented on more than one record
within a column. They can be better understood by looking at their real world
examples. There are five very important group functions
AVG
COUNT
MAX
MIN
SUM
Example:
Output:
Average
----------
2077.21429
Example:
Output:
Total
----------
18
Example:
Output:
Maximum
----------
5000
Along the same lines, we can find the minimum value using the MIN group function.
Example:
Output:
Total
----------
29081
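All five group functions can be exercised in one statement; the salaries below are invented.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE emp (ename TEXT, sal INTEGER)")
con.executemany("INSERT INTO emp VALUES (?, ?)",
                [("KING", 5000), ("BLAKE", 2850), ("SMITH", 800)])

# Each group function collapses the whole sal column into a single value.
row = con.execute(
    "SELECT AVG(sal), COUNT(*), MAX(sal), MIN(sal), SUM(sal) FROM emp"
).fetchone()
print(row)  # average, row count, maximum, minimum and total salary
```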
BASIC STRUCTURE
SQL keywords
Data retrieval
• SELECT is used to retrieve zero or more rows from one or more tables in a
database. In most applications, SELECT is the most commonly used Data
Manipulation Language command. In specifying a SELECT query, the user
specifies a description of the desired result set, but they do not specify
what physical operations must be executed to produce that result set.
Translating the query into an efficient query plan is left to the database
system, more specifically to the query optimizer.
o Commonly available keywords related to SELECT include:
FROM is used to indicate from which tables the data is to be
taken, as well as how the tables JOIN to each other.
WHERE is used to identify which rows are to be retrieved, or applied to GROUP BY. WHERE is evaluated before the GROUP BY.
GROUP BY is used to combine rows with related values into
elements of a smaller set of rows.
HAVING is used to identify which of the "combined rows"
(combined rows are produced when the query has a GROUP
BY keyword or when the SELECT part contains aggregates),
are to be retrieved. HAVING acts much like a WHERE, but it
operates on the results of the GROUP BY and hence can use
aggregate functions.
ORDER BY is used to identify which columns are used to sort
the resulting data.
Example 1:
SELECT * FROM books WHERE price > 100.00 AND price < 150.00 ORDER BY title;
This is an example that could be used to get a list of expensive books. It retrieves the records from the books table that have a price field greater than 100.00 and less than 150.00. The result is sorted alphabetically by book title. The asterisk (*) means to
show all columns of the books table. Alternatively, specific columns could be
named.
Example 2:
SELECT books.title, COUNT(*) AS Authors
FROM books JOIN book_authors ON books.isbn = book_authors.isbn
GROUP BY books.title;
Example 2 shows both the use of multiple tables in a join, and aggregation (grouping). This example shows how many authors there are per book. Example output may resemble:
Title Authors
---------------------- -------
SQL Examples and Guide 3
The Joy of SQL 1
How to use Wikipedia 2
Pitfalls of SQL 1
How SQL Saved my Dog 1
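A runnable version of this join-plus-grouping pattern, using sqlite3 and invented books/book_authors tables (the column names isbn, title and author are assumptions for the sketch):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE books (isbn TEXT PRIMARY KEY, title TEXT);
CREATE TABLE book_authors (isbn TEXT, author TEXT);
INSERT INTO books VALUES ('1', 'SQL Examples and Guide'),
                         ('2', 'The Joy of SQL');
INSERT INTO book_authors VALUES ('1', 'A'), ('1', 'B'), ('1', 'C'),
                                ('2', 'D');
""")

# The join pairs each book with its author rows; GROUP BY then collapses
# those pairs per title, so COUNT(*) yields the number of authors.
rows = con.execute("""
    SELECT books.title, COUNT(*) AS Authors
    FROM books JOIN book_authors ON books.isbn = book_authors.isbn
    GROUP BY books.title
    ORDER BY books.title
""").fetchall()
print(rows)  # [('SQL Examples and Guide', 3), ('The Joy of SQL', 1)]
```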
Data manipulation
First there are the standard Data Manipulation Language (DML) elements. DML is
the subset of the language used to add, update and delete data.
Example:
INSERT INTO my_table (field1, field2, field3) VALUES ('test', 'N', NULL);
UPDATE my_table SET field1 = 'updated value' WHERE field2 = 'N';
DELETE FROM my_table WHERE field2 = 'N';
Data transaction
COMMIT and ROLLBACK interact with areas such as transaction control and
locking. Strictly, both terminate any open transaction and release any locks held
on data. In the absence of a BEGIN WORK or similar statement, the semantics of
SQL are implementation-dependent.
Example:
BEGIN WORK;
UPDATE inventory SET quantity = quantity - 3 WHERE item = 'pants';
COMMIT;
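The effect of COMMIT versus ROLLBACK can be seen with sqlite3, where the DB-API calls con.commit() and con.rollback() play the role of the SQL statements; the inventory row is invented.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE inventory (item TEXT, quantity INTEGER)")
con.execute("INSERT INTO inventory VALUES ('pants', 10)")
con.commit()

# Committed work becomes permanent ...
con.execute("UPDATE inventory SET quantity = quantity - 3 "
            "WHERE item = 'pants'")
con.commit()

# ... whereas rollback undoes everything since the last commit.
con.execute("UPDATE inventory SET quantity = 0 WHERE item = 'pants'")
con.rollback()

qty = con.execute(
    "SELECT quantity FROM inventory WHERE item = 'pants'").fetchone()[0]
print(qty)  # 7: the -3 survived the commit, the 0 was rolled back
```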
Data definition
The second group of keywords is the Data Definition Language (DDL). DDL allows
the user to define new tables and associated elements. Most commercial SQL
databases have proprietary extensions in their DDL, which allow control over
nonstandard features of the database system.
The most basic items of DDL are the CREATE and DROP commands.
Some database systems also have an ALTER command, which permits the user
to modify an existing object in various ways -- for example, adding a column to
an existing table.
Example:
CREATE TABLE my_table (
my_field1 INT,
my_field2 VARCHAR (50),
my_field3 DATE NOT NULL,
PRIMARY KEY (my_field1, my_field2)
)
All DDL statements are auto-committed, so before dropping a table take a close look at whether it will be needed in future.
Data control
The third group of SQL keywords is the Data Control Language (DCL). DCL
handles the authorization aspects of data and permits the user to control who
has access to see or manipulate data within the database.
Example:
GRANT SELECT, UPDATE ON my_table TO some_user, another_user
Other
Example:
SELECT * FROM inventory -- Retrieve everything from inventory table
The set operators in SQL are based on the same principles, except that they don't have a complement, and they can determine the 'difference' between two sets. The operators we apply to combine two queries are union, intersect and except (called minus in some SQLs, such as Oracle).
These are powerful ways of manipulating information, but take note: you can only apply them if the results of the two queries being combined have the same format - that is, the same number of columns, and identical column types! (Many SQLs try to be helpful by, for example, coercing one data type into another, an idea which is superficially helpful but fraught with potential for errors.) The general format of such queries is illustrated by:
SELECT columns FROM table1
UNION
SELECT columns FROM table2;
Different strokes..
An outer join could be used (with modification for NULLs if these little
monstrosities are present) to achieve the same result as except.
Similarly, an inner join (with select distinct) can do what intersect does.
Set operators can be combined (as you would expect when playing around with
sets) to achieve results that simply cannot be obtained using a single set
operator.
Note that there are some restrictions on using order by with set operators -
order by may only be used once, no matter how big the compound statement,
and the select list must contain the columns being used for the sort.
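The three set operators, with the single trailing ORDER BY described above, can be tried out directly (tables and values invented):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE a (x INTEGER);
CREATE TABLE b (x INTEGER);
INSERT INTO a VALUES (1), (2), (3);
INSERT INTO b VALUES (2), (3), (4);
""")

# UNION removes duplicates; ORDER BY may appear only once, at the end
# of the whole compound statement.
union = con.execute(
    "SELECT x FROM a UNION SELECT x FROM b ORDER BY x").fetchall()
intersect = con.execute(
    "SELECT x FROM a INTERSECT SELECT x FROM b ORDER BY x").fetchall()
difference = con.execute(
    "SELECT x FROM a EXCEPT SELECT x FROM b ORDER BY x").fetchall()

print(union)       # [(1,), (2,), (3,), (4,)]
print(intersect)   # [(2,), (3,)]
print(difference)  # [(1,)]
```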
Pseudoset operators
Not content with implementing set operators, SQL database creators have also
introduced what are called "pseudoset operators". These operators don't fit
conveniently into set theory, because they allow multiple rows (redundancies)
which are forbidden in true sets.
We use the pseudoset operator union all to combine the outputs of two queries
(all that is done is that the results of the second query are appended to the
results of the first). Union all does exactly what we required from a FULL OUTER
JOIN, which as we've already mentioned, is not implemented in many nominally
"SQL-92 compliant" databases!
NULL VALUES
SQL uses a special mark, NULL, to indicate missing information. This special mark can appear instead of a value wherever a value can appear in SQL, in particular in place of a column value in some row. The deviation from the
relational model arises from the fact that the implementation of this ad hoc
concept in SQL involves the use of three-valued logic, under which the
comparison of NULL with itself does not yield true but instead yields the third
truth value, unknown; similarly the comparison NULL with something other than
itself does not yield false but instead yields unknown. It is because of this
behaviour in comparisons that NULL is described as a mark rather than a value.
The relational model depends on the law of excluded middle under which
anything that is not true is false and anything that is not false is true; it also
requires every tuple in a relation body to have a value for every attribute of that
relation. This particular deviation is disputed by some if only because E.F. Codd
himself eventually advocated the use of special marks and a 4-valued logic, but
this was based on his observation that there are two distinct reasons why one
might want to use a special mark in place of a value, which led opponents of the
use of such logics to discover more distinct reasons and at least as many as 19
have been noted, which would require a 21-valued logic. SQL itself uses NULL for
several purposes other than to represent "value unknown". For example, the sum
of the empty set is NULL, meaning zero, the average of the empty set is NULL,
meaning undefined, and NULL appearing in the result of a LEFT JOIN can mean
"no value because there is no matching row in the right-hand operand".
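Both NULL behaviours described here are easy to observe with sqlite3, which reports NULL as Python's None:

```python
import sqlite3

con = sqlite3.connect(":memory:")

# Comparing NULL with anything, itself included, yields unknown
# (shown as NULL/None), not true or false: three-valued logic.
cmp = con.execute("SELECT NULL = NULL, NULL <> 1").fetchone()
print(cmp)  # (None, None)

con.execute("CREATE TABLE t (x INTEGER)")
# Over the empty set, SUM and AVG return NULL while COUNT returns 0.
agg = con.execute("SELECT SUM(x), AVG(x), COUNT(x) FROM t").fetchone()
print(agg)  # (None, None, 0)
```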
NESTED SUBQUERIES
Subqueries
Wouldn't it be nice if you could perform one query, temporarily store the
result(s), and then use this result as part of another query? You can, and the
trickery used is called a subquery. The basic idea is that instead of a 'static'
condition, you can insert a query as part of a where clause! An example is:
SELECT ename, sal
FROM emp
WHERE sal > (SELECT AVG(sal) FROM emp);
Note that in the above query, the inner select statement must return just one
value (for example, an average). There are other restrictions - the subquery must
be in parenthesis, and it must be on the right side of the conditional operator
(here, a greater than sign). You can use such subqueries with =, >, <, >=, <=
and <>, but not {to the best of my knowledge?} with between .. and.
Multiple select subqueries can be combined (using logical operators) in the same
statement, but avoid complex queries if you possibly can!
The assumption is that the nested select statement returns a list. The outer shell
of the statement can then use in to get cracking on the list, looking for a match
of value within the list! There is a surprisingly long list of operators that resemble
in, and can be used in a similar fashion. Here it is:
Operator - What it does
> any - succeeds if the value is greater than at least one of the values returned by the subquery
>= any - as for > any, but greater than or equal
> all - compare this with > any: here the value must be greater than every value returned by the subquery
>= all, < all, <= all - if you understand > all, these should present no problem!
= all - you're not likely to use this one much; it implies that (to succeed) all the values returned by the inner subquery are equal to one another and to the value being tested!
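sqlite3, used for the sketches in this chapter, does not support the any/all forms directly, but for a non-empty, NULL-free subquery result "> all" is equivalent to comparing against the subquery's MAX, and "> any" against its MIN. The rows are invented.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE emp (ename TEXT, deptno INTEGER, sal INTEGER)")
con.executemany("INSERT INTO emp VALUES (?, ?, ?)",
                [("BLAKE", 30, 2850), ("JAMES", 30, 950), ("FORD", 20, 3000)])

# sal > ALL (dept-30 salaries)  <=>  sal > their maximum (2850)
gt_all = con.execute(
    "SELECT ename FROM emp "
    "WHERE sal > (SELECT MAX(sal) FROM emp WHERE deptno = 30)").fetchall()

# sal > ANY (dept-30 salaries)  <=>  sal > their minimum (950)
gt_any = con.execute(
    "SELECT ename FROM emp "
    "WHERE sal > (SELECT MIN(sal) FROM emp WHERE deptno = 30)").fetchall()

print(gt_all)  # [('FORD',)]
print(gt_any)  # [('BLAKE',), ('FORD',)]
```

The equivalence breaks down if the subquery can return NULL or no rows at all, which is one more reason to be wary of these operators.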
Correlated Subqueries
Whew, we're nearly finished with the subqueries, but there is one more distinct
flavour! The correlated subquery is a nested select statement that can (using
trickery) refer to the outer select statement containing it. By so doing, we can
successively apply the inner select statement to each line generated by the outer
statement! The trick that we use is to create an alias for the outer select
statement, and then refer to this alias in the inner select statement, thus
constraining the inner select to dealing with the relevant row. For an example, we
return to our tedious drug table:
DrugDosing
Drug DoseMg Frequency
D1   30     OD
D2   10     OD
D3   200    TDS
Let's say we wanted (for some obscure reason) all doses that are greater than the average dose for each dosing frequency. [Meaningless, but it serves the purposes of illustration.]
SELECT DoseMg, Frequency
FROM DrugDosing fred
WHERE DoseMg > (SELECT AVG(DoseMg)
                FROM DrugDosing
                WHERE Frequency = fred.Frequency);
The sense of this statement should be clear - we use the outer select to choose
a row (into which we put DoseMg, and Frequency). We then check whether this
row is a candidate (or not) using the where statement. What does the where
statement check? Well, it makes sure that DoseMg is greater than a magic
number. The magic number is the average dose for a particular Frequency, the
frequency associated with the current row. The only real trickery is how we use
the label fred to refer to the current line from within the inner select statement.
This label fred is called an alias. We'll learn a lot more about aliases later (Some
will argue that aliases are so important you should have encountered them long
before, but we disagree).
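A correlated subquery of this shape runs unchanged in sqlite3 against the drug table (the column name Drug is an assumption; DoseMg and Frequency come from the text). Only D1 exceeds the average dose for its own frequency.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute(
    "CREATE TABLE DrugDosing (Drug TEXT, DoseMg INTEGER, Frequency TEXT)")
con.executemany("INSERT INTO DrugDosing VALUES (?, ?, ?)",
                [("D1", 30, "OD"), ("D2", 10, "OD"), ("D3", 200, "TDS")])

# The alias fred lets the inner select refer to the row that the outer
# select is currently examining.
rows = con.execute("""
    SELECT Drug, DoseMg, Frequency
    FROM DrugDosing fred
    WHERE DoseMg > (SELECT AVG(DoseMg)
                    FROM DrugDosing
                    WHERE Frequency = fred.Frequency)
""").fetchall()
print(rows)  # [('D1', 30, 'OD')]: 30 beats the OD average of 20
```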
Correlated subqueries are not the only way of doing the above. Using a
temporary view is often more efficient, but it's worthwhile knowing both
techniques. We discuss views next.
VIEWS
As the name suggests, a view gives a particular user access to selected portions
of a table. A view is however more than this - it can limit the ability of a user to
update parts of a table, and can even amalgamate rows, or throw in additional
columns derived from other columns. Even more complex applications of views
allow several tables to be combined into a single view!
A view is made with a statement of the form create or replace view viewname as select ... (Without the or replace instruction you have to explicitly destroy a view first - SQL won't simply overwrite an existing view, but will instead give you an irritating error.)
Remember that if one alters the view, you alter the underlying table at
the same time!
One cannot use an order by statement (or something else called a for update
clause) within a view. There is a whole lot of other convenient things you can do
to views. Where you include summary statistics (eg count, sum, etc) in a view it
is termed an aggregate view. Likewise, using distinct, you can have a view on
the possible values of a column or grouping of columns.
You can even create a view that is derived from several tables (A multi-table
view). This is extremely sneaky, as you can largely avoid complex join
statements in code which pulls data out of several tables! Ladányi puts things
rather well:
"Pre-joined and tested views reduce errors .. that subtly or obviously undermine
the accuracy of reports, and thus the credibility and subsequent professional
well-being of the people creating them".
This is implicit in the way we sneakily use a select statement - to limit access to
certain columns, for example, we just select the column-names we want access
to, and ignore the rest! There is of course a catch (Isn't there always?) - if you
insert a row, then SQL doesn't know what to put into the column entry that's not
represented in the view, so it will insert either NULL, or the default value for that
column. (Likewise, with delete, the entire row will be deleted, even the column
entry that is invisible).
It is also obvious how we limit access to certain rows - we use a where clause
that only includes the rows we want in the view. Note that (depending on your
selection criterion) it is possible to insert a row into a view (and thus the
underlying database) and then not be able to see this row in the view! With
(in)appropriate selection criteria for the view, one can also alter the properties of
rows visible in the view so that they now become hidden!
More draconic constraints are possible. The option with read only prevents any
modifications to the view (or the underlying database); while with check option
prevents you from creating rows in the view that cannot be selected (seen) in the
view itself. If you use the "with check option", then you should follow this with a
name, otherwise SQL will create an arbitrary and quite meaningless name for the
constraint that will only confuse you when an error occurs!
An under-utilised but rather attractive use of views is to make them based on set
(or pseudo-set) operators, for example the union of two sets.
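Several of the view flavours mentioned here can be sketched at once with sqlite3 (table and data invented): one view restricts both columns and rows, the other is an aggregate view built on summary statistics.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE emp (ename TEXT, deptno INTEGER, sal INTEGER);
INSERT INTO emp VALUES ('BLAKE', 30, 2850), ('JAMES', 30, 950),
                       ('FORD', 20, 3000);

-- Restrict both columns (no salaries) and rows (department 30 only).
CREATE VIEW dept30 AS
    SELECT ename, deptno FROM emp WHERE deptno = 30;

-- An aggregate view: summary statistics per department.
CREATE VIEW dept_totals AS
    SELECT deptno, COUNT(*) AS heads, SUM(sal) AS payroll
    FROM emp GROUP BY deptno;
""")

restricted = con.execute("SELECT * FROM dept30 ORDER BY ename").fetchall()
totals = con.execute("SELECT * FROM dept_totals ORDER BY deptno").fetchall()
print(restricted)  # [('BLAKE', 30), ('JAMES', 30)]
print(totals)      # [(20, 1, 3000), (30, 2, 3800)]
```

Queries against either view read exactly like queries against a table, which is the point: the complexity is hidden once, in the view definition.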
COMPLEX QUERIES
Order-independent queries
Remember the tables we discussed when we were talking about foreign keys,
and even earlier when we talked about common sense in normalising data, (not
that we are displaying conspicuous amounts of this with the following tables
which are, after all, only for demonstration purposes)! How does one amalgamate
the data in the tables? (In dataspeak, we call the relationship between the tables
master-detail, or sometimes, parent-child). Well, let's look at the tables..
DrugRegimen                    DrugDosing
R1 Carbimazole   D1            D1 30  OD
R2 Carbimazole   D2            D2 10  OD
R3 Carbamazepine D3            D3 200 TDS
You might think that a natural extension of the good old select statement is the
following:
and you would be perfectly correct, but what does the above statement give us
when we actually use it? Here we go..
R1 CARBIMAZOLE   D1  D1 30  OD
R2 CARBIMAZOLE   D2  D1 30  OD
R3 CARBAMAZEPINE D3  D1 30  OD
R1 CARBIMAZOLE   D1  D2 10  OD
R2 CARBIMAZOLE   D2  D2 10  OD
R3 CARBAMAZEPINE D3  D2 10  OD
R1 CARBIMAZOLE   D1  D3 200 TDS
R2 CARBIMAZOLE   D2  D3 200 TDS
R3 CARBAMAZEPINE D3  D3 200 TDS
Every single row of the first table has been joined with each and every row of the
second table, not just the rows that we think should correspond! (This is called a
Cartesian join or cross join, and can rapidly generate enormous tables - as
Ladányi points out, if you perform a cross join on three tables, each with a
thousand rows, then - voila - you have 1000 * 1000 * 1000 = one billion rows,
enough to bring most databases to their knees).
For our purposes, most of the rows in the above cross join are meaningless, but
we can easily reduce the rows to only those we are interested in. We simply use a where statement to join the tables on the Dosing column, thus:
SELECT *
FROM DrugRegimen, DrugDosing
WHERE DrugRegimen.Dosing = DrugDosing.Dosing;
R1 CARBIMAZOLE   D1  D1 30  OD
R2 CARBIMAZOLE   D2  D2 10  OD
R3 CARBAMAZEPINE D3  D3 200 TDS
It's so important to always have a where condition with your Cartesian joins, let's
make it into a rule:
If the from clause in a select statement has a comma in it, check the where
clause.
Then check the where clause again. And again! ... Rule #4.
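The size explosion and its cure can be counted directly (sqlite3; the tables mirror the drug data above, with assumed column names):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE DrugRegimen (Regimen TEXT, Drug TEXT, Dosing TEXT);
CREATE TABLE DrugDosing (Dosing TEXT, DoseMg INTEGER, Frequency TEXT);
INSERT INTO DrugRegimen VALUES ('R1', 'CARBIMAZOLE', 'D1'),
    ('R2', 'CARBIMAZOLE', 'D2'), ('R3', 'CARBAMAZEPINE', 'D3');
INSERT INTO DrugDosing VALUES ('D1', 30, 'OD'), ('D2', 10, 'OD'),
    ('D3', 200, 'TDS');
""")

# A comma in the from clause with no where: every row pairs with every
# row, 3 x 3 = 9 rows (the Cartesian, or cross, join).
cross = con.execute("SELECT * FROM DrugRegimen, DrugDosing").fetchall()

# Joining on the shared Dosing column keeps only the matching pairs.
joined = con.execute(
    "SELECT * FROM DrugRegimen, DrugDosing "
    "WHERE DrugRegimen.Dosing = DrugDosing.Dosing").fetchall()

print(len(cross), len(joined))  # 9 3
```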
Also note that the two tables each had a column with the same name - "Dosing".
We easily sidestepped this one by simply talking about DrugRegimen.Dosing =
DrugDosing.Dosing, rather than, say, Dosing = Dosing, which would have forced
an error! Needless to say, you can select individual columns from the cross join,
rather than having to say select *.
The above is an example of an inner join. What this means is that if, for every
value in the DrugRegimen.Dosing column, there's a corresponding value in the
DrugDosing.Dosing column, and vice versa, then everything's fine. However, if
(due to some silly person not enforcing relational integrity) there is no matching
value in the corresponding column, the whole row with its unmatched value will
disappear from the final report - it will softly and suddenly vanish away! Apart
from being a goad to ensure relational integrity in all of your databases, this
should alert you to the possibility that you might trustingly run a query on a
database, and get complete garbage out, because you used an inner join! The
solution is an outer join.
Needless to say, few vendors have stuck to the SQL-92 standard as regards outer
joins. For example, Oracle sneaks three tiny characters into the where statement thus:
SELECT *
FROM DrugRegimen, DrugDosing
WHERE DrugRegimen.Dosing = DrugDosing.Dosing (+);
The (+) tells SQL to "join in a NULL row if you can't find anything that matches a problem row in DrugDosing.Dosing" - all very convenient, but not standard SQL-92. Also note that the (+) is on the same side of the equals sign as the table that is 'augmented', not the one that's causing the problem. It should be clear why this is called a left outer join; putting the (+) on the other side gives a right outer join.
Many vendor SQLs also do not implement the SQL standard for a full outer join, which lists all rows (whether matched or not) from all tables. The SQL-92 syntax for the from clause is:
FROM table1 FULL OUTER JOIN table2 ON ( condition )
There are other ways of achieving a full outer join in most SQLs. Remember
that the way to avoid all this is to meticulously enforce constraints on
integrity! Also note that an outer join will NOT help you if there are duplicate
entries in one of the tables you are using for the join (which can only occur in a
'relationally challenged' database).
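The danger of the silently vanishing row, and the outer-join cure, can be shown with deliberately broken referential integrity (sqlite3; the unmatched R4/PHENYTOIN row is invented):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE DrugRegimen (Regimen TEXT, Drug TEXT, Dosing TEXT);
CREATE TABLE DrugDosing (Dosing TEXT, DoseMg INTEGER, Frequency TEXT);
-- R4 refers to dosing D4, for which there is no DrugDosing row.
INSERT INTO DrugRegimen VALUES ('R1', 'CARBIMAZOLE', 'D1'),
    ('R4', 'PHENYTOIN', 'D4');
INSERT INTO DrugDosing VALUES ('D1', 30, 'OD');
""")

# The inner join silently drops the unmatched R4 row ...
inner = con.execute(
    "SELECT * FROM DrugRegimen JOIN DrugDosing USING (Dosing)").fetchall()

# ... while the left outer join keeps it, padded with NULLs (None).
outer = con.execute(
    "SELECT * FROM DrugRegimen LEFT JOIN DrugDosing USING (Dosing)"
).fetchall()

print(len(inner), len(outer))  # 1 2
print(outer)
```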
For example, in Oracle the DDL statements include CREATE, DROP, ALTER, etc. DBCC (Database Console Commands) statements, found in SQL Server, check the physical and logical consistency of a database.
EMBEDDED SQL
In embedded SQL, SQL statements are written inside a host-language program and converted by a preprocessor into host-language calls before the application program is compiled. There are two types of embedded SQL: static and dynamic.
The SQL standard defines the embedding of SQL as embedded SQL, and the language in which the SQL queries are embedded is referred to as the host language.
EXEC SQL
declare c cursor for select cname, ccity from deposit, customer where
deposit.cname = customer.cname and deposit.balance > :amount
END-EXEC
The open statement (EXEC SQL open c END-EXEC) causes the DB system to execute the query and to save the results within a temporary relation. A series of fetch statements are then executed to make the tuples of the result available to the program.
The program can then manipulate the variable cn and cc using the features of
the host programming language. A single fetch request returns only one tuple.
We need to use a while loop (or equivalent) to process each tuple of the result until no further tuples remain (signalled by a variable in the SQLCA). We then use the close statement to tell the DB system to delete the temporary relation that held the result of the query.
• Embedded SQL can execute any valid update, insert, or delete statements.
• Dynamic SQL component allows programs to construct and submit SQL
queries at run time
• SQL-92 also contains a module language, which allows procedures to be
defined in SQL.
Dynamic SQL allows you to write SQL that will then write and execute more SQL
for you. This can be a great time saver because you can:
Most dynamic SQL is driven off the system tables, and all the examples I use here
will use system tables, but if you have suitable data in user tables, you can write
dynamic SQL from the contents of these tables, too.
These two statements generate Update Stats and DBCC commands for each user
table in the database. You can cut and paste the results into a SQL window and
run them, or save the output to a file for later use.
For the first example there is an even quicker way of achieving this--you can use
the built-in SP sp_msforeachtable like this:
sp_msforeachtable "UPDATE STATISTICS ?"
The system stored procedure loops through the tables in the current database
and executes the command "Update statistics" with each table name substituted
where you see the '?' character.
You can use cursors and the EXEC or sp_executesql commands to execute your
dynamic SQL at the same time as you generate it. These commands take a char
or varchar parameter that contains a SQL statement to execute, and runs the
command in the current database. This script will execute the DBCC REINDEX
command for all the user tables in the current database.
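The same catalog-driven idea can be sketched outside SQL Server (an illustration using Python's sqlite3, with its sqlite_master catalog table in place of sysobjects and REINDEX standing in for DBCC DBREINDEX):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE accounts (acc_no TEXT PRIMARY KEY, balance REAL);
    CREATE TABLE branches (bname TEXT PRIMARY KEY, city TEXT);
""")
# Generate one statement per user table from the catalog...
tables = [r[0] for r in conn.execute(
    "SELECT name FROM sqlite_master WHERE type = 'table'")]
statements = [f'REINDEX "{t}"' for t in tables]
# ...then execute each generated statement.
for stmt in statements:
    conn.execute(stmt)
print(statements)
```

The generated text is ordinary SQL, so it could just as well be saved to a file and run later, exactly as described above.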
This short query interrogates the master database table sysdatabases, and
executes a DBCC command against every database. This sort of thing is
especially useful for carrying out database maintenance and backups with earlier
versions of SQL Server where the maintenance wizards are not very well
developed, or where you want to do some kind of non-standard task across all
databases.
If you were to expand this routine using the cursor method above you could write
a code block that would, for example, dynamically back up all databases, with
new and deleted databases catered for automatically.
Query by Example (QBE) is a powerful search tool that allows anyone to search a
system for documents by entering an element such as a text string or
document name, quickly searching through documents to match the entered
criteria. It is commonly believed that QBE is far easier to learn than other, more
formal query languages (e.g. SQL), while still letting users perform powerful
searches.
Searching for documents based on matching text is easy with QBE; the user
simply enters (or copies and pastes) the target text into the search form field. When
118 | P a g e Department of Information Technology
Database Management Systems
the user clicks search (or hits enter), the input is passed to the QBE parser for
processing. The query is created and the search begins, using keywords from
the input the user provided. Mundane words such as "and", "is", "or", "the" are
automatically eliminated, to make the search more efficient and to avoid
barraging the user with results. However, compared with a formal query, the
results in a QBE system will be more variable.
The user can also search for similar documents based on the text of a full
document that he or she may have. This is accomplished by submitting one or
more documents to the QBE results template. The QBE parser analyzes these
documents, generates the required query, and submits it to the search engine,
which then searches for relevant and similar material matching the specified
criteria.
Example
A simple example using the Suppliers and Parts database is given here, just to
give you a feel for how QBE works.
This "query" selects all supplier numbers (S#) where the owner of the supplier
company is "J. Doe" and the supplier is located in "Rome".
Other commands like the "P." (print) command are: "U." (update), "I." (insert) and
"D." (delete).
The result of this query depends on the values in your Suppliers and Parts
database.
2.2.11 DATALOG
Datalog is a query and rule language for deductive databases that syntactically
is a subset of Prolog. Its origins date back to around 1978 when Hervé Gallaire
and Jack Minker organized a workshop on logic and databases. The term Datalog
was coined in the mid-1980s by a group of researchers interested in database
theory.
Query evaluation with Datalog is sound and complete and can be done efficiently
even for large databases. Query evaluation is usually done using bottom up
strategies. For restricted forms of datalog that don't allow any function symbols,
safety of query evaluation is guaranteed.
In contrast to Prolog, it disallows complex terms (function symbols) as
arguments of predicates, imposes stratification restrictions on the use of
negation and recursion, and only allows range-restricted (safe) variables.
Example
parent(bill,mary).
parent(mary,john).
ancestor(X,Y) :- ancestor(X,Z),ancestor(Z,Y).
ancestor(X,Y) :- parent(X,Y).
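The bottom-up evaluation mentioned above can be sketched directly: repeatedly apply both rules to the known facts until a fixpoint is reached (plain Python; naive evaluation, not an optimized implementation):

```python
# Facts of the program above.
parent = {("bill", "mary"), ("mary", "john")}
ancestor = set()
while True:
    new = set(parent)                                   # ancestor(X,Y) :- parent(X,Y).
    new |= {(x, y) for (x, z) in ancestor               # ancestor(X,Y) :-
                   for (z2, y) in ancestor if z == z2}  #   ancestor(X,Z), ancestor(Z,Y).
    if new <= ancestor:                                 # fixpoint: nothing new derived
        break
    ancestor |= new
print(sorted(ancestor))
```

Because there are no function symbols, the set of derivable facts is finite, so the loop is guaranteed to terminate, which is exactly the safety property noted above.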
• XSB, is a Logic Programming and Deductive Database system for Unix and
Windows.
The SmallSQL Database is a 100% pure Java DBMS for desktop applications. It
has a JDBC 3.0 driver and supports SQL-92 and SQL-99 standards. It has a very
small footprint of approx. 200KB for driver and database together. This is very
small for a JDBC 3.0 interface.
The difference from other 100% pure Java databases is that it has no network
interface and no user management. The target applications are Java desktop
applications. There is no installation required.
Enterprise Manager is the primary administrative tool for Microsoft SQL Server
2000 and provides a MMC–compliant user interface that allows users to:
MMC is a tool that presents a common interface for managing different server
applications in a Microsoft Windows network. Server applications provide a
component called an MMC snap-in that presents MMC users with a user interface
for managing the server application. SQL Server Enterprise Manager is the
Microsoft SQL Server 2000 MMC snap-in.
Q-BANK
2.a)Write notes on the various set operations in relational algebra. (8)-nov 2008
a)QBE (8)
Employee database
Employee(emp_name,street,city,phone,pin)
Works(emp_name,company_name,salary)
Company(company_name,city)
Manages(emp_name,manager_name)
(b) Find all the employee names who do not work for the ICICI insurance
company. (3)
(c) Find all the employees who live in the same cities and on the same streets
as their managers. (4)
(d) Assume that the ICICI insurance company may be located in several cities.
Find all companies located in every city in which the UTI insurance company is
located. (4)
UNIT - III
2Marks Q & A:
Denormalization:
Process of storing the join of higher normal form relations as a base
relation, which is in a lower normal form.
17. What are aggregate functions? And list the aggregate functions
supported by SQL?
Aggregate functions are functions that take a collection of
values as input and return a single value. SQL offers five built-in
aggregate functions: avg, min, max, sum and count.
37. List some security violations (or) name any forms of malicious access.
Unauthorized reading of data
Unauthorized modification of data
Unauthorized destruction of data.
40. List out various user authorization to modify the database schema.
Index authorization
Resource authorization
Alteration authorization
Drop authorization
Lecture Notes:
Relational-Database Design
Example
An account balance cannot be null.
No two accounts can have the same acc_no.
Every acc_no in the depositor relation must have a matching acc_no in the
account relation.
1. Domain Constraints
2. Referential Integrity
3. Assertions
4. Triggers
5. Security and Authorization
6. Authorization in SQL
7. Encryption and Authentication
1. Domain Constraints:
Domain constraints are the most elementary form of integrity
constraint.
They are tested easily by the system whenever a new data item is
entered into the database.
The domain HourlyWage has a constraint that ensures that the hourly
wage is greater than 4.00. The clause constraint wage-value-test is optional,
and is used to give the name wage-value-test to the constraint. The name is used
to indicate which constraint an update violated.
As another example, the domain can be restricted to contain only a specified set
of values by using the in clause:
create domain AccountType char(10)
    constraint account-type-test
    check (value in ('Checking', 'Saving'))
The preceding check conditions can be tested quite easily, when a tuple is
inserted or modified.
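Such a check can be exercised directly (a sketch using Python's sqlite3; SQLite has no create domain, so the same check is attached to a column, and the payroll table is invented for illustration):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE payroll (
        emp_name    TEXT,
        hourly_wage REAL CONSTRAINT wage_value_test CHECK (hourly_wage > 4.00)
    )
""")
conn.execute("INSERT INTO payroll VALUES ('Jones', 9.50)")      # accepted
rejected = False
try:
    conn.execute("INSERT INTO payroll VALUES ('Smith', 2.00)")  # violates check
except sqlite3.IntegrityError:
    rejected = True                     # the named constraint stopped the insert
print(rejected)
```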
2. Referential Integrity:
It is used to ensure that a value that appears in one relation for a given set
of attributes also appears for a certain set of attributes in another relation.
Example:
create table department (
    dept_no number(5) primary key,
    dept_name varchar2(10),
    location varchar2(10)
);
create table employee (
    employee_name varchar2(10),
    salary number(5),
    doj date,
    dept_no number(5) references department(dept_no)
);
The field "Dept_no" in the employee table is declared a foreign key, and it
refers to the field "Dept_no" in the Department table which is declared a primary
key.
Consider the relationship set R between entity sets E1 and E2. The
relational schema for R includes the primary keys K1 of E1 and K2 of E2.
Then K1 and K2 form foreign keys on the relational schemas for E1 and E2
respectively.
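The foreign-key behaviour can be demonstrated as follows (a sketch using Python's sqlite3, which requires the foreign_keys pragma; the column names follow the department/employee example above):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")   # SQLite enforces FKs only when asked
conn.executescript("""
    CREATE TABLE department (dept_no INTEGER PRIMARY KEY, dept_name TEXT);
    CREATE TABLE employee (
        employee_name TEXT,
        dept_no INTEGER REFERENCES department(dept_no)
    );
    INSERT INTO department VALUES (10, 'Sales');
""")
conn.execute("INSERT INTO employee VALUES ('Jones', 10)")      # dept 10 exists
fk_violation = False
try:
    conn.execute("INSERT INTO employee VALUES ('Smith', 99)")  # no such dept
except sqlite3.IntegrityError:
    fk_violation = True               # referential integrity rejected the row
print(fk_violation)
```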
3.Assertions
• The sum of all loan amounts for each branch must be less than the
sum of all account balances at the branch.
• Every loan has at least one customer who maintains an account
with a minimum balance of $1000.00.
Since SQL does not provide a “for all X, P(X)” construct (where P is a
predicate), we are forced to implement the construct by the equivalent
“not exists X such that not P(X)” construct, which can be written in SQL.
The database stores triggers just as if they were regular data, so that they
are persistent and are accessible to all database operations.
Once we enter a trigger into the database, the database system takes on
the responsibility of executing it whenever the specified event occurs and
the corresponding condition is satisfied.
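A concrete trigger in this spirit might look as follows (SQLite trigger syntax via Python's sqlite3; the overdraft-to-loan rule and the figures are illustrative, not taken from the text):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE account (acc_no TEXT PRIMARY KEY, balance REAL);
    CREATE TABLE loan (acc_no TEXT, amount REAL);
    -- Event: an update of balance; condition: the new balance is negative;
    -- action: turn the overdraft into a loan and reset the balance to zero.
    CREATE TRIGGER overdraft AFTER UPDATE OF balance ON account
    WHEN new.balance < 0
    BEGIN
        INSERT INTO loan VALUES (new.acc_no, -new.balance);
        UPDATE account SET balance = 0 WHERE acc_no = new.acc_no;
    END;
    INSERT INTO account VALUES ('A-102', 400);
""")
conn.execute("UPDATE account SET balance = balance - 500 WHERE acc_no = 'A-102'")
balances = conn.execute("SELECT acc_no, balance FROM account").fetchall()
loans = conn.execute("SELECT acc_no, amount FROM loan").fetchall()
print(balances, loans)
```

The application only issued the withdrawal; the stored trigger fired automatically because its condition was satisfied.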
Another Example
1. DB system level
2. Operating system level
3. Network level
4. Physical level
5. Human level:
Protection from stolen passwords, etc.
Solution:
* Frequent change of passwords.
* Data audits
The data stored in the database need protection from unauthorized access
and malicious destruction or alteration, in addition to the protection
against accidental introduction of inconsistency that integrity constraints
provide.
Security Violations
Among the forms of malicious access are:
• Unauthorized reading of data (theft of information)
• Unauthorized modification of data
• Unauthorized destruction of data
Authorization:
We may assign a user several forms of authorization on parts of the
database. For
example,
o Read authorization allows reading, but not modification, of data.
o Insert authorization allows insertion of new data, but not modification of
existing data.
o Update authorization allows modification, but not deletion, of data.
o Delete authorization allows deletion of data.
Authorization in SQL
• The SQL language offers a fairly powerful mechanism for defining
authorizations.
Privileges in SQL:
• The SQL standard includes the privileges delete, insert, select and
update.
• The select privilege corresponds to the read privilege.
• SQL also includes a references privilege that permits a user/role to
declare foreign keys when creating relations.
• If the relation to be created includes a foreign key that references
attributes of another relation, the user/role must have been granted
references privilege on those attributes.
grant <privilege list> on <relation name or view name> to <user/role list>
The privilege list allows the granting of several privileges in one command.
Public-key encryption:
Authentication:
• Authentication refers to the task of verifying the identity of a
person/software connecting to a database. The simplest form of
authentication consists of a secret password which must be presented
when a connection is opened to a database.
• Password-based authentication is used widely by operating systems as
well as databases.
• A more secure scheme involves a challenge-response system. The
database system sends a challenge string to the user. The user encrypts
the challenge string using a secret password as encryption key, and then
returns the result. The database system can verify the authenticity of the
user by decrypting the string with the same secret password, and checking
the result with the original challenge string. This scheme ensures that no
passwords travel across the network.
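The challenge-response exchange can be sketched in a few lines (the use of HMAC-SHA256 as the keyed transformation is an assumption; the text only requires that the challenge be transformed with the shared secret password as the key):

```python
import hashlib
import hmac
import os

secret = b"s3cret-password"            # shared by the user and the DB system

challenge = os.urandom(16)             # DB system sends a random challenge
# User's side: transform the challenge with the secret and send it back.
response = hmac.new(secret, challenge, hashlib.sha256).digest()

# DB system's side: recompute with its own copy of the secret and compare.
expected = hmac.new(secret, challenge, hashlib.sha256).digest()
print(hmac.compare_digest(response, expected))   # identity verified
```

Note that only the challenge and the keyed digest cross the network; the password itself never does.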
Design Goals:
a. Avoid redundant data.
b. Ensure that relationships among attributes are represented.
c. Facilitate the checking of updates for violation of db integrity
constraints.
Here the branch Adayar's details are represented twice. This leads to a
redundancy problem.
Redundancy:
Data for branch_name, branch_city and assets are repeated for each loan
that a branch makes. This:
a. wastes space
b. complicates updating, introducing the possibility of an inconsistent assets value.
Decomposition:
* Decompose the relation-schema, lending schema into,
Branch-schema=(branch_name,branch_city,assets)
Loan-schema = (customer_name,loan_no,branch_name,amount)
All attributes of the original schema R must appear in the decomposition (R1, R2):
R = R1 U R2
Lossless-join decomposition: the original relation must be recoverable as the
natural join of its projections on R1 and R2.
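Lossless-join behaviour can be verified mechanically: project the relation onto the two schemas and check that their natural join reproduces the original (plain Python; the lending rows are invented for illustration):

```python
# (branch_name, branch_city, assets, customer_name, loan_no, amount)
lending = [
    ("Adayar", "Chennai", 900000, "Jones", "L-17", 1000),
    ("Adayar", "Chennai", 900000, "Smith", "L-23", 2000),
]
# Project onto the two decomposed schemas.
branch = {(b, c, a) for (b, c, a, _, _, _) in lending}          # Branch-schema
loan   = {(cu, l, b, am) for (b, _, _, cu, l, am) in lending}   # Loan-schema

# Natural join on the common attribute branch_name.
rejoined = {(b, c, a, cu, l, am)
            for (b, c, a) in branch
            for (cu, l, b2, am) in loan if b == b2}
print(rejoined == set(lending))   # lossless: nothing lost, nothing spurious
```

The join is lossless here because the common attribute branch_name determines branch_city and assets; without such a dependency the join could produce spurious tuples.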
It requires that the value for a certain set of attributes determines uniquely
the value for another set of attributes.
X → Y
E.g., Marks → Grade
Types:
A. Full Dependencies:
B. Partial Dependencies:
C. Transitive Dependencies:
X → Y
Y → Z
⟹ X → Z
For e.g., Grade depends on Marks, and in turn Marks depends on {student_no,
course_no}; hence Grade depends fully transitively on {student_no, course_no}.
Definition[single attribute]:
X → Y
For example, given the value of item code, there is only one value of item
name for it. Thus item name is functionally dependent on item code. This is
shown as: item code → item name.
Similarly in table 1, given an order number, the date of the order is known.
Definition[composite attribute]:
X, Z → Y
In the table mentioned below, for example, Order no and Item code together
determine Qty and Price. Thus: Order no, Item code → Qty, Price.
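Whether a functional dependency holds in a given relation instance can be checked mechanically (plain Python; the order/item column names and rows are invented for illustration):

```python
def fd_holds(rows, X, Y):
    """True iff no two rows agree on the X attributes but differ on Y."""
    seen = {}
    for row in rows:
        key = tuple(row[a] for a in X)
        val = tuple(row[a] for a in Y)
        if seen.setdefault(key, val) != val:   # same X, different Y: violated
            return False
    return True

orders = [
    {"order_no": 1, "item_code": "A", "qty": 2, "price": 10.0},
    {"order_no": 1, "item_code": "B", "qty": 1, "price": 25.0},
    {"order_no": 2, "item_code": "A", "qty": 5, "price": 10.0},
]
ok = fd_holds(orders, ["order_no", "item_code"], ["qty", "price"])
bad = fd_holds(orders, ["order_no"], ["qty"])
print(ok, bad)
```

Note this only tests one instance; a dependency that happens to hold in today's data is not necessarily a constraint of the schema.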
Observations:
Name
Address
Roll no
Department
Year of study
Relation key:
For example, using Vendor code, the Vendor name and address are uniquely
determined.
Example:
In this Vendor code and Item code together form the key. This dependency is
shown in the following diagram ( figure 2).
(Figure 2: Vendor code and Item code together determine Date of supply,
Quantity supplied, and Price/unit.)
JOIN DEPENDENCIES:
• However, there are join dependencies that are not equivalent to any
multivalued dependency.
Example:
o Consider Loan-info-schema = (branch-name, customer-name, loan-
number, amount).
o Each loan has one or more customers, is in one or more branches and has
a loan amount; these relationships are independent, hence we have the
join dependency
o *((loan-number, branch-name), (loan-number, customer-name), (loan-
number, amount))
(loan-number, branch-name)
(loan-number, customer-name)
(loan-number, amount)
Normal Forms
Normalization
Purpose of Normalization
1NF
2NF
3NF
BCNF
R is in 4NF if it is in BCNF and, for every nontrivial multivalued dependency
X →→ Y, X is a superkey.
R is in BCNF if, for every nontrivial functional dependency X → Y, X is a superkey.
R is in 3NF if, for every nontrivial functional dependency X → Y, X is a superkey
or every attribute of Y is contained in some candidate key.
R is in 2NF if every non-prime attribute is fully functionally dependent on every
candidate key.
R is in 1NF if the domains of all its attributes are atomic.
Decomposition
E.G .. DEPT
• The test for 2NF involves testing for functional dependencies whose left
hand side attributes are part of primary key. If the primary key contains a
single attribute, the test need not be applied at all.
• A relation schema R is in 2NF if every non-prime attribute A in R is
fully functionally dependent on the primary key of R.
(Diagram: functional dependencies FD1, FD2 and FD3 drawn on an employee
schema with attributes such as SSN and ENAME.)
RULES:
Consider a relation,
Name
Department
Rollno
Year
Hostel_name
If it is decided to ask all first year students to move to Kaveri hostel and
all second year students to Ganga hostel, this change has to be made in many
places in the first table. Also, when a student's year of study changes, his hostel
change must be noted in the first table as well. This is undesirable.
The first table is thus in 2NF but not in 3NF. To transform it to 3NF, we should
introduce another relation which includes the functionally related non-key
attributes. This is shown below:
Year    Hostel_name
1       Ganga
2       Kaveri
3       Krishna
4       Godavari
Assumptions:
2. Assume further that the composite keys have a common attribute in that
relation.
Example:
It is assumed that
Keys:
1. Professor Code
2. Department
By combining these two keys we can find out what percentage of hours
each professor has spent in each department.
Using Professor code and Head of Dept we can likewise find out what
percentage of hours a professor has spent in a particular department; with
either attribute alone we cannot find this out.
The relationship diagram for the above relation is given in fig. Table gives
the relation attributes. The two possible composite keys are professor code and
Head of Dept. observe that department as well as Head of Dept are not non-key
attributes. They are a part of a composite key
Professor code    Department     Head of Dept    Percent time
P1                Physics        Ghosh           50
P1                Mathematics    Krishnan        50
P2                Chemistry      Rao             25
P2                Physics        Ghosh           75
The relation given in the table is in 3NF. Observe, however, that the names of
Dept and Head of Dept are duplicated. Further, if professor P2 resigns, rows 3 and
4 are deleted, and we lose the information that Rao is the Head of Dept of Chemistry.
(Table: the decomposed relation with attributes Professor code, Department
and Percent time.)
o BCNF.
o Lossless-join.
o Dependency-preservation.
5. If we cannot achieve this, we accept
o 3NF
o Lossless-join.
o Dependency-preservation.
6. A final point: there is a price to pay for decomposition. When we
decompose a relation, we have to use natural joins or Cartesian products
to put the pieces back together. This takes computational time.
This process shows how normalization fits into the overall database design. Let us
assume that a relation schema R is given, and that we proceed to normalize it. There
are several ways in which we could have come up with the schema R:
When we carefully define an E-R diagram, identifying all entities correctly, the
tables generated from the E-R diagram should not need further normalization.
However, there can be functional dependencies between attributes of an entity.
For instance, suppose an employee entity had attributes department-number and
department-address, and there is a functional dependency department-number
→ department-address. We would then need to normalize the relation generated
from employee.
Most examples of such dependencies arise out of poor E-R diagram design.
Functional dependencies can help us detect poor E-R design. If the generated
relations are not in desired normal form, the problem can be fixed in the E-R
diagram. That is, normalization can be done formally as part of data modeling.
Alternatively, normalization can be left to the designer’s intuition during E-R
modeling, and can be done formally on the relations generated from the E-R
model.
Q - BANK
(a)Referential Integrity(7)
4.Explain the two normal forms for Relational database schemes and also
compare it (15)-April/May 2009
9. Give an algorithm for testing lossless join and describe the pipeline in detail
(15)- Nov2007
10. Explain good and bad decomposition of normalization with examples (15)-
Nov2007
UNIT - IV
2Marks Q & A:
22. What are the factors to be taken into account when choosing a RAID level?
• Monetary cost of extra disk storage requirements.
• Performance requirements in terms of number of I/O operations
• Performance when a disk has failed.
• Performances during rebuild.
26. Distinguish between fixed length records and variable length records?
Fixed length records
Every record has the same fields and field lengths are fixed.
Variable length records
File records are of the same type, but one or more of the fields are of
varying size.
27. What are the ways in which the variable-length records arise in database
systems?
o Storage of multiple record types in a file.
o Record types that allow variable lengths for one or more fields.
o Record types that allow repeating fields.
30. What are the two types of blocks in the fixed-length representation?
Define them.
Anchor block: Contains the first record of a chain.
Overflow block: Contains the records other than those that are the
first record of a chain.
38. What are the techniques to be evaluated for both ordered indexing and
hashing?
Access types
Access time
Insertion time
Deletion time
Space overhead
52. What can be done to reduce the occurrences of bucket overflows in a hash
file organization?
To reduce bucket overflow, the number of buckets is chosen to be
(nr/fr)*(1+d), where nr is the number of records, fr the number of records
that fit in a bucket, and d a fudge factor (typically around 0.2).
We handle bucket overflow by using
• Overflow chaining(closed hashing)
• Open hashing
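The bucket-count rule of thumb above, worked through for illustrative numbers (nr records, fr records per bucket, fudge factor d):

```python
nr, fr, d = 10_000, 20, 0.2        # records, records per bucket, fudge factor
n_buckets = (nr / fr) * (1 + d)    # the (nr/fr)*(1+d) formula from the answer
print(n_buckets)                   # allow ~20% headroom over the minimum nr/fr
```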
Lecture notes:
Basic Concepts – Ordered Indices – B+-Tree Index Files – B-Tree Index Files
– Static Hashing – Dynamic Hashing – Comparison of Ordered Indexing and
Hashing – Index Definition in SQL – Multiple-Key Access.
Cache
The fastest and most costly form of storage; volatile; managed by the computer
system hardware.
(Note: "Cache" is pronounced as "cash".)
Main memory
• Fast access (10s to 100s of nanoseconds; 1 nanosecond = 10^-9 seconds)
• Generally too small (or too expensive) to store the entire database;
capacities of up to a few gigabytes are widely used currently
• Capacities have gone up and per-byte costs have decreased steadily
and rapidly (roughly a factor of 2 every 2 to 3 years)
Flash memory
Data survives power failure
Data can be written at a location only once, but location can be erased
and written to again
Can support only a limited number (10K – 1M) of write/erase
cycles.
Erasing of memory has to be done to an entire bank of
memory
Reads are roughly as fast as main memory
But writes are slow (few microseconds), erase is slower
NOR Flash
Fast reads, very slow erase, lower capacity
Used to store program code in many embedded devices
NAND Flash
Page-at-a-time read/write, multi-page erase
High capacity (several GB)
Widely used as data storage mechanism in portable devices
Magnetic-disk
Data is stored on spinning disk, and read/written magnetically
Primary medium for the long-term storage of data; typically stores entire
database.
Data must be moved from disk to main memory for access, and written
back for storage
direct-access – possible to read data on disk in any order, unlike
magnetic tape
Survives power failures and system crashes
• disk failure can destroy data: it is rare, but it does happen
• Optical storage
  o non-volatile; data is read optically from a spinning disk using a laser
  o CD-ROM (640 MB) and DVD (4.7 to 17 GB) are the most popular forms
  o Write-once, read-many (WORM) optical disks used for archival
storage (CD-R, DVD-R, DVD+R)
  o Multiple-write versions also available (CD-RW, DVD-RW, DVD+RW,
and DVD-RAM)
  o Reads and writes are slower than with magnetic disk
  o Juke-box systems, with large numbers of removable disks, a few
drives, and a mechanism for automatic loading/unloading of disks,
available for storing large volumes of data
• Tape storage
  o non-volatile; used primarily for backup (to recover from disk failure)
and for archival data
  o sequential-access: much slower than disk
  o very high capacity (40 to 300 GB tapes available)
  o tape can be removed from drive ⇒ storage costs much cheaper than
disk, but drives are expensive
  o Tape jukeboxes available for storing massive amounts of data:
hundreds of terabytes (1 terabyte = 10^12 bytes) to even a
petabyte (1 petabyte = 10^15 bytes)
Storage Hierarchy
Magnetic Disk
• Read-write head
  o Positioned very close to the platter surface (almost touching it)
  o Reads or writes magnetically encoded information.
• Surface of platter divided into circular tracks
  o Over 50K-100K tracks per platter on typical hard disks
• Each track is divided into sectors.
  o Sector size typically 512 bytes
  o Typical sectors per track: 500 (on inner tracks) to 1000 (on outer
tracks)
• To read/write a sector
  o disk arm swings to position head on the right track
  o platter spins continually; data is read/written as the sector passes under
the head
• Head-disk assemblies
  o multiple disk platters on a single spindle (1 to 5 usually)
  o one head per platter, mounted on a common arm.
• Cylinder i consists of the ith track of all the platters
• Earlier generation disks were susceptible to "head-crashes" leading to loss
of all data on disk
  o Current generation disks are less susceptible to such disastrous
failures, but individual sectors may get corrupted
Disk controller – interfaces between the computer system and the disk drive
hardware.
  o accepts high-level commands to read or write a sector
  o initiates actions such as moving the disk arm to the right track and
actually reading or writing the data
  o computes and attaches checksums to each sector to verify that
data is read back correctly; if data is corrupted, with very high
probability the stored checksum won't match the recomputed checksum
  o ensures successful writing by reading back the sector after writing it
  o performs remapping of bad sectors
Disk Subsystem
• Access time – the time it takes from when a read or write request is
issued to when data transfer begins. Consists of:
  o Seek time – time it takes to reposition the arm over the correct
track.
    Average seek time is 1/2 the worst-case seek time (it would be
    1/3 if all tracks had the same number of sectors, and we ignore
    the time to start and stop arm movement).
    4 to 10 milliseconds on typical disks.
  o Rotational latency – time it takes for the sector to be accessed to
appear under the head.
    Average latency is 1/2 of the worst-case latency.
    4 to 11 milliseconds on typical disks (5400 to 15000 r.p.m.).
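The rotational-latency figures above follow directly from the spindle speed: the worst case is one full rotation, and the average is half of that.

```python
latency_ms = {}
for rpm in (5400, 15000):
    full_rotation = 60_000 / rpm       # one rotation, in milliseconds
    latency_ms[rpm] = (round(full_rotation, 1), round(full_rotation / 2, 1))
print(latency_ms)                      # {rpm: (worst case, average)}
```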
• Data-transfer rate – the rate at which data can be retrieved from or
stored to the disk.
  o 25 to 100 MB per second max rate; lower for inner tracks
  o Multiple disks may share a controller, so the rate the controller can
handle is also important, e.g. ATA-5: 66 MB/sec, SATA: 150 MB/sec,
Ultra 320 SCSI: 320 MB/s, Fiber Channel (FC2Gb): 256 MB/s
• Mean time to failure (MTTF) – the average time the disk is expected to
run continuously without any failure.
  o Typically 3 to 5 years
  o Probability of failure of new disks is quite low, corresponding to a
theoretical MTTF of 500,000 to 1,200,000 hours for a new disk;
e.g., an MTTF of 1,200,000 hours for a new disk means that,
given 1000 relatively new disks, on average one will fail
every 1200 hours
  o MTTF decreases as the disk ages
Disk arm scheduling- algorithms order pending accesses to tracks so that disk
arm movement is minimized
Elevator algorithm – move the disk arm in one direction (from outer to inner
tracks or vice versa), processing the next request in that direction, reversing
when no requests remain in that direction.
File organization – optimize block access time by organizing blocks to
correspond to how data will be accessed, e.g. store related information
on the same or nearby blocks/cylinders.
Bit level striping – split the bits of each byte across multiple disks
In an array of eight disks, write bit i of each byte to disk i. Each access can read
data at eight times the rate of a single disk. But seek/access time worse than for
a single disk. Bit level striping is not used much any more
Block level striping – stripes blocks across multiple disks: with n disks, block i of
a file goes to disk (i mod n) + 1. Requests for different blocks can run in parallel if
the blocks reside on different disks, and a request for a long sequence of blocks can
utilize all disks in parallel.
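The block-to-disk mapping just described, as a one-liner (n = 4 disks is an arbitrary choice):

```python
n = 4                                            # number of disks in the array
placement = {i: (i % n) + 1 for i in range(8)}   # block i -> disk (i mod n) + 1
print(placement)                                 # blocks spread round-robin
```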
RAID Levels:
Mirroring provides high reliability, but it is expensive. Striping provides high data-
transfer rates, but doesn't improve reliability. Various schemes provide
redundancy at lower cost by combining disk striping with "parity" bits. These
schemes are classified into RAID levels. They are as follows:
RAID Level 1: Mirrored disks – redundancy by duplicating every disk.
RAID Level 2: Memory-style error-correcting codes, with bit-level striping.
RAID Level 3: Bit-interleaved parity – a single parity disk suffices, since disk
controllers can detect which sector has failed.
RAID Level 4:
• Block-Interleaved Parity;
• uses block level striping, and keeps a parity block on a separate disk for
corresponding blocks from N other disks.
• When writing data block, corresponding block of parity bits must also be
computed and written to parity disk
• To find value of a damaged block, compute XOR of bits from corresponding
blocks (including parity block) from other disks.
• Provides higher I/O rates for independent block reads than Level 3
• block read goes to a single disk, so blocks stored on different disks can be
read in parallel
• Provides high transfer rates for reads of multiple blocks than no striping
• Before writing a block, parity data must be computed Can be done by
using old parity block, old value of current block and new value of current
block (2 block reads + 2 block writes)
• Or by re-computing the parity value using the new values of blocks
corresponding to the parity block
• More efficient for writing large amounts of data sequentially
• Parity block becomes a bottleneck for independent block writes since
every block write also writes to parity disk
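The XOR-based recovery described above can be demonstrated in a few lines (illustrative 4-byte blocks on three data disks plus one parity disk):

```python
# Corresponding blocks from three data disks.
data_blocks = [b"\x0f\x10\x33\x7c", b"\xa0\x01\x02\x03", b"\x55\xaa\x00\xff"]
# The parity disk stores the bytewise XOR of the corresponding blocks.
parity = bytes(a ^ b ^ c for a, b, c in zip(*data_blocks))

# Suppose the disk holding block 1 fails: XOR the surviving blocks with the
# parity block to reconstruct the lost data.
recovered = bytes(a ^ c ^ p
                  for a, c, p in zip(data_blocks[0], data_blocks[2], parity))
print(recovered == data_blocks[1])
```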
RAID Level 5: Block-interleaved distributed parity – parity blocks are spread
across all the disks rather than kept on a single disk, avoiding the Level 4
parity-disk bottleneck.
RAID Level 6: P+Q redundancy – stores extra redundant information so the
array can survive the failure of two disks.
Hardware Issues:
• Many systems maintain spare disks which are kept online, and used as
replacements for failed disks immediately on detection of failure
• Reduces time to recovery greatly
• Many hardware RAID systems ensure that a single point of failure will not
stop the functioning of the system by using Redundant power supplies with
battery backup
• Multiple controllers and multiple interconnections to guard against
controller/interconnection failures
Tertiary storage:
In large database system some of the data may have to reside on tertiary
storage. The two main tertiary storage media are:
• Optical disks
• Magnetic tapes
Optical Disks
• Compact disk read-only memory (CD-ROM): removable disks, 640 MB per disk
  o Seek time about 100 msec (the optical read head is heavier and slower)
  o Higher latency (3000 RPM) and lower data-transfer rates (3-6 MB/s)
compared to magnetic disks
• Digital Video Disk (DVD): DVD-5 holds 4.7 GB and DVD-9 holds 8.5 GB;
DVD-10 and DVD-18 are double-sided formats with capacities of 9.4 GB and 17 GB
  o Slow seek time, for the same reasons as CD-ROM
• Record-once versions (CD-R and DVD-R) are popular: data can only be written
once and cannot be erased; high capacity and long lifetime; used for archival
storage
• Multi-write versions (CD-RW,
Magnetic Tapes
• Capacities of many GB with the DLT (Digital Linear Tape) format, 100-400 GB+
with the Ultrium format, and 330 GB with the Ampex helical scan format;
transfer rates from a few to 10s of MB/s
• Currently the cheapest storage medium: tapes are cheap, but the cost of
drives is very high
• Very slow access time in comparison to magnetic disks and optical disks;
limited to sequential access; some formats (Accelis) provide faster seek
(10s of seconds) at the cost of lower capacity
• Used mainly for backup, for storage of infrequently used information, and
as an offline medium for transferring information from one system to another
• Tape jukeboxes used for very large capacity storage (terabytes (10^12
bytes) to petabytes (10^15 bytes))
Storage Access
A database file is partitioned into fixed-length storage units called blocks.
Blocks are units of both storage allocation and data transfer.
Buffer Manager
Programs call on the buffer manager when they need a block from disk.
The buffer manager does the following:
• If the block is already in the buffer, it returns the address of the block in
main memory.
• If the block is not in the buffer, it
  o allocates space in the buffer for the block, replacing (throwing out)
some other block, if required, to make space for the new block
(the replaced block is written back to disk only if it was modified
since the most recent time that it was written to/fetched from the disk), and
  o reads the block from the disk into the buffer, and returns the
address of the block in main memory to the requester.
Buffer-Replacement Policies
Most operating systems replace the block least recently used (the LRU
strategy). The idea behind LRU is to use the past pattern of block references
as a predictor of future references. Queries, however, have well-defined access
patterns (such as sequential scans), and a database system can use the
information in a user's query to predict future references, e.g. when computing
the join of two relations r and s by a nested loop:
for each tuple tr of r do
    for each tuple ts of s do
        if the tuples tr and ts match ...
A mixed strategy, with hints on the replacement strategy provided by the query
optimizer, is preferable.
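A minimal LRU buffer pool can be sketched as follows (plain Python; the capacity and block names are illustrative, and pinning is omitted):

```python
from collections import OrderedDict

class BufferPool:
    """LRU replacement: evict the block that was used least recently."""
    def __init__(self, capacity):
        self.capacity = capacity
        self.frames = OrderedDict()              # block_id -> contents

    def get(self, block_id):
        if block_id in self.frames:              # hit: touch the block
            self.frames.move_to_end(block_id)
        else:                                    # miss: may need to evict
            if len(self.frames) >= self.capacity:
                self.frames.popitem(last=False)  # throw out the LRU block
            self.frames[block_id] = f"<data of {block_id}>"  # "read from disk"
        return self.frames[block_id]

pool = BufferPool(capacity=2)
for b in ["B1", "B2", "B1", "B3"]:   # B2 is least recently used when B3 arrives
    pool.get(b)
print(list(pool.frames))
```

For the nested-loop join above, this policy is exactly wrong for the inner relation, which is why the toss-immediate and MRU strategies below exist.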
Buffer-Replacement Policies (Cont.)
• Pinned block – a memory block that is not allowed to be written back to disk.
• Toss-immediate strategy – frees the space occupied by a block as soon as
the final tuple of that block has been processed.
• Most recently used (MRU) strategy – the system must pin the block
currently being processed. After the final tuple of that block has been
processed, the block is unpinned, and it becomes the most recently used
block.
• The buffer manager can use statistical information regarding the
FILE ORGANISATION:
*FIXED-LENGTH RECORDS:
For an example, let us consider a file of account records for our bank
database. Each record of this file is defined as:
type deposit=record
account-number:char(10);
branch-name:char(22);
balance:real;
end
• If we assume that each character occupies 1 byte and that a real occupies 8
bytes, our account record is 40 bytes long. A simple approach is to use the
first 40 bytes for the first record, the next 40 bytes for the second record, and
so on.
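The fixed-length layout can be sketched with Python's struct module (field sizes as in the deposit record above; record i begins at byte offset 40 * i):

```python
import struct

# 10-byte account number, 22-byte branch name, 8-byte real: 40 bytes total.
RECORD = struct.Struct("<10s22sd")
assert RECORD.size == 40

def pack_record(account_number, branch_name, balance):
    return RECORD.pack(account_number.encode().ljust(10),
                       branch_name.encode().ljust(22), balance)

def read_record(file_bytes, i):
    """Record i lives at byte offset 40 * i."""
    acct, branch, bal = RECORD.unpack_from(file_bytes, RECORD.size * i)
    return acct.decode().rstrip(), branch.decode().rstrip(), bal

# A two-record "file": simple offset arithmetic finds any record.
f = pack_record("A-102", "Perryridge", 400.0) + pack_record("A-305", "Round Hill", 350.0)
```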
187 | P a g e Department of Information Technology
Database Management Systems
• It might be easier simply to move the final record of the file into the space occupied by the deleted record (fig 1.3).
[Fig: a file of fixed-length account records after deletions; a header at the start of the file points to the chain of freed record slots, while records such as (A-102, Perryridge, 400) and records 1, 4, 6 remain in place.]
♦ NOTE: Insertion and deletion for files of fixed-length records are simple to implement, because the space made available by a deleted record is exactly the space needed to insert a record. (For variable-length records, by contrast, an inserted record may not fit in the space left free by a deleted record, or it may fill only part of that space.)
*VARIABLE-LENGTH RECORDS:
Two techniques for implementing them are:
*BYTE-STRING REPRESENTATION.
*FIXED-LENGTH REPRESENTATION.
Variable-length records arise, for example, from repeating fields, as in the account-list type:
type account-list = record
branch-name:char(22);
account-info:array[1…. ]of
record
account-number:char(10);
balance:real;
end
end
*BYTE-STRING REPRESENTATION:
[Fig: byte-string representation of the account file; each record holds a branch name followed by its (account-number, balance) pairs and ends with the end-of-record symbol ⊥, e.g. Perryridge with accounts A-102, A-201 and A-218; Downtown with A-101 and A-110; Redwood with A-222; Brighton with A-217.]
• The actual records are allocated contiguously in the block, starting from the end of the block.
• The free space in the block is contiguous, between the final entry in the header array and the first record.
• If a record is inserted, space is allocated for it at the end of the free space, and an entry containing its size and location is added to the header.
• If a record is deleted, the space that it occupies is freed, and its entry is set to deleted (its size set to -1, for example).
• Records can be grown or shrunk by similar techniques, as long as there is space in the block. The cost of moving the records is not too high, since the size of a block is limited: a typical value is 4 kilobytes.
• The slotted-page structure requires that there be no pointers that point directly to records; pointers must instead point to the entry in the header that contains the record's location.
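The header bookkeeping described above can be sketched as follows (offsets only, with no real byte buffer; the slot-entry sizes are made-up constants for illustration):

```python
class SlottedPage:
    """Slotted-page bookkeeping: the header keeps an (offset, size) entry
    per slot, records grow from the end of the block, and size -1 marks a
    deleted entry, as in the description above."""

    def __init__(self, block_size=4096):
        self.block_size = block_size
        self.slots = []               # header array: (offset, size) entries
        self.free_end = block_size    # records are allocated from the end

    def header_size(self):
        return 8 + 8 * len(self.slots)   # entry count + free ptr + slots

    def insert(self, record):
        # Free space lies between the header and free_end; the new record
        # also needs room for one more 8-byte header entry.
        if self.free_end - len(record) < self.header_size() + 8:
            raise ValueError("block full")
        self.free_end -= len(record)     # allocate at the end of free space
        self.slots.append((self.free_end, len(record)))
        return len(self.slots) - 1       # the slot number names the record

    def delete(self, slot):
        off, _ = self.slots[slot]
        self.slots[slot] = (off, -1)     # mark deleted; space can be reclaimed
```

External pointers refer to slot numbers, never to byte offsets, so records can be moved within the block without breaking them.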
NOTE: the reserved-space method uses fixed-length records to represent a variable-length record.
• A record in this file is of the account-list type, but with the array containing exactly three elements. Those branches with fewer than three accounts (for example, Round Hill) have records with null fields.
• We use the symbol ⊥ to represent this situation in fig 1.7.
• The reserved-space method is useful when most records have a length close to the maximum.
• In our bank example, some branches may have many more accounts than others. This situation leads us to consider the linked-list method.
• To represent the file by the linked-list method, we add a pointer field as we did in fig 1.4. The resulting structure appears in fig 1.8.
[Fig 1.8 (excerpt): records chained together by pointer fields, e.g. record 5 (A-201, 900), record 7 (A-110, 600) and record 8 (A-218, 700).]
The file structures of fig 1.4 and fig 1.8 both use pointers; the difference is that, in fig 1.4, we use pointers to chain together only deleted records, whereas in fig 1.8, we chain together all records pertaining to the same branch.
A disadvantage of the structure of fig 1.8 is that we waste space in all records except the first in a chain. This wasted space is significant, since we expect, in practice, that each branch has a large number of accounts.
To deal with this problem, we allow two kinds of blocks in our file:
1. Anchor block, which contains the first record of each chain.
2. Overflow block, which contains records other than those that are the first record of a chain.
Thus, all records within a block have the same length, even though not all records in the file have the same length. Fig 1.8 shows this file structure.
• Heap file organization: any record can be placed anywhere in the file where there is space for the record. There is no ordering of records. Typically, there is a single file for each relation.
• Sequential file organization: records are stored in sequential order, according to the value of a "search key" of each record.
• Hashing file organization: a hash function is computed on some attribute of each record. The result of the hash function specifies in which block of the file the record should be placed.
• Generally, a separate file is used to store the records of each relation.
• However, in a clustering file organization, records of several different relations are stored in the same file; further, related records of the different relations are stored in the same block, so that one I/O operation fetches related records from all the relations.
• Fig 1.11 shows a sequential file of account records taken from our banking example. In that example, the records are stored in search-key order, using branch name as the search key.
• It is difficult, however, to maintain physical sequential order as records are inserted and deleted, since it is costly to move many records as a result of a single insertion or deletion.
• We can manage deletion by using pointer chains, as we saw previously.
• For insertion, we apply the following rules:
1. Locate the record in the file that comes before the record to be inserted in search-key order.
2. If there is a free record (i.e., space left after a deletion) within the same block as this record, insert the new record there. Otherwise, insert the new record in an overflow block. In either case, adjust the pointers so as to chain together the records in search-key order.
• Fig 1.12 shows the file of fig 1.11 after the insertion of the record (NorthTown, A-888, 800). The structure in fig 1.12 allows fast insertion of new records, but forces sequential file-processing applications to process records in an order that does not match the physical order of the records.
• If relatively few records need to be stored in overflow blocks, this approach works well.
• Eventually, however, the correspondence between search-key order and physical order may be largely lost; at this point, the file should be reorganized so that it is once again physically in sequential order. Such reorganizations are costly, and must be done during times when the system load is low.
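The insertion rules above can be sketched with an in-memory stand-in for the file (list indices play the role of record addresses; the structure is illustrative only):

```python
class SequentialFile:
    """Records are chained in search-key order; an insert that does not fit
    in the sorted area goes to an overflow slot at the end, and only the
    pointers change - physical order then differs from logical order."""

    def __init__(self, sorted_records):
        # Each record is [key, value, next_index]; initial physical order
        # equals search-key order.
        self.recs = [[k, v, i + 1] for i, (k, v) in enumerate(sorted_records)]
        if self.recs:
            self.recs[-1][2] = None
        self.head = 0 if self.recs else None

    def insert(self, key, value):
        new = len(self.recs)                 # overflow area: append at the end
        self.recs.append([key, value, None])
        prev, cur = None, self.head
        while cur is not None and self.recs[cur][0] <= key:
            prev, cur = cur, self.recs[cur][2]
        self.recs[new][2] = cur              # chain in search-key order
        if prev is None:
            self.head = new
        else:
            self.recs[prev][2] = new

    def scan(self):
        """Follow the chain: logical (search-key) order."""
        cur, out = self.head, []
        while cur is not None:
            out.append(self.recs[cur][0])
            cur = self.recs[cur][2]
        return out
```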
The database system stores all relations in this one file, and manages the file itself.
To see the advantage of storing many relations in one file, consider the following SQL query for the bank database:
select account-number, customer-name, customer-street, customer-city
from depositor, customer
where depositor.customer-name = customer.customer-name
• Fig 1.15 shows a file structure designed for efficient execution of queries involving depositor and customer.
• The depositor tuples for each customer-name are stored near the customer tuple for the corresponding customer-name.
• This structure mixes together tuples of two relations, but allows for efficient processing of the join.
• When a tuple of the customer relation is read, the entire block containing that tuple is copied from disk into main memory.
• Since the depositor tuples are stored on the disk near the customer tuple, the block containing the customer tuple contains tuples of the depositor relation needed to process the query. If a customer has so many accounts that the depositor records do not fit in one block, the remaining records appear on nearby blocks.
• However, this clustering organization requires more block accesses for queries on a single relation than the scheme under which we stored each relation in a separate file.
DATA-DICTIONARY STORAGE:
Among the types of information that the system must store are these:
Further, the database may store statistical and descriptive data about the relations, such as:
The data dictionary may also note the storage organization (sequential, hash, or heap) of relations, and the location where each relation is stored:
• If relations are stored in operating-system files, the dictionary would note the names of the file (or files) containing each relation.
• If the database stores all relations in a single file, the dictionary may note the blocks containing records of each relation in a data structure such as a linked list.
For indices, we shall need to store information about each index on each of the relations:
The exact choice of how to represent system data by relations must be made by the system designers. One possible representation, with primary keys underlined, is:
View-metadata(view-name, definition)
Basic Concepts
Ordered Indices
B+-Tree Index Files
B-Tree Index Files
Static Hashing
Dynamic Hashing
Comparison of Ordered Indexing and Hashing
Index Definition in SQL
Multiple-Key Access
Basic Concepts
Index files are typically much smaller than the original file
Two basic kinds of indices:
Ordered indices: search keys are stored in sorted order
Hash indices: search keys are distributed uniformly across “buckets”
using a “hash function”.
1. Notice how we would find records for Perryridge branch using both
methods. (Do it!)
2. Dense indices are faster in general, but sparse indices require less space
and impose less maintenance for insertions and deletions. (Why?)
3. A good compromise: to have a sparse index with one entry per block.
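The dense/sparse trade-off can be sketched on a toy sorted file with one sparse entry per block, as in the compromise above (the data below is made up for illustration):

```python
import bisect

# Toy data file: blocks of records sorted on the search key (branch name).
blocks = [[("Brighton", 750), ("Downtown", 500)],
          [("Mianus", 700), ("Perryridge", 400)],
          [("Redwood", 700), ("Round Hill", 350)]]

# Dense index: one entry per record; sparse index: one entry per block.
dense = sorted((rec[0], b) for b, blk in enumerate(blocks) for rec in blk)
sparse = [(blk[0][0], b) for b, blk in enumerate(blocks)]

def sparse_lookup(key):
    """Find the last sparse entry with key <= the search key, then scan
    forward from that block until the block keys pass the search key."""
    i = bisect.bisect_right(sparse, (key, len(blocks))) - 1
    out = []
    for b in range(max(i, 0), len(blocks)):
        if blocks[b][0][0] > key:        # sorted file: no later matches
            break
        out += [rec for rec in blocks[b] if rec[0] == key]
    return out
```

The sparse index here has only 3 entries versus 6 for the dense one, at the cost of scanning inside the chosen block.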
Multi-Level Indices
1. Even with a sparse index, index size may still grow too large. For 100,000
records, 10 per block, at one index record per block, that's 10,000 index
records! Even if we can fit 100 index records per block, this is 100 blocks.
2. If index is too large to be kept in main memory, a search results in several
disk reads.
o If there are no overflow blocks in the index, we can use binary
search.
o This will read as many as ⌈log2(b)⌉ blocks (as many as 7 for our 100
blocks).
o If index has overflow blocks, then sequential search typically used,
reading all b index blocks.
3. Solution: Construct a sparse index on the index (Figure 12.4).
Use binary search on outer index. Scan index block found until correct
index record found. Use index record as before - scan block pointed to for
desired record.
Index Update
Regardless of what form of index is used, every index must be updated whenever
a record is either inserted into or deleted from the file.
1. Deletion:
o Find (look up) the record
o If the last record with a particular search key value, delete that
search key value from index.
o For dense indices, this is like deleting a record in a file.
o For sparse indices, delete a key value by replacing key value's entry
in index by next search key value. If that value already has an index
entry, delete the entry.
2. Insertion:
o Find place to insert.
o Dense index: insert search key value if not present.
o Sparse index: no change unless new block is created. (In this case,
the first search key value appearing in the new block is inserted into
the index).
Secondary Indices
3. Secondary indices must be dense, with an index entry for every search-
key value, and a pointer to every record in the file.
4. Secondary indices improve the performance of queries on non-primary
keys.
5. They also impose serious overhead on database modification: whenever a
file is updated, every index must be updated.
6. Designer must decide whether to use secondary indices or not.
4. Each nonleaf node in the tree must have between ⌈n/2⌉ and n children, where n is fixed for a particular tree.
Structure of a B+-Tree
Pointer Pn (the nth pointer in the leaf node) is used to chain leaf nodes together in linear order (search-key order). This allows efficient sequential processing of the file.
4. Figures 11.7 (textbook Fig. 11.8) and textbook Fig. 11.9 show B+-trees for the deposit file with n=3 and n=5.
Queries on B+-Trees
In processing a query, we traverse a path in the tree from the root to some leaf node. If there are K search-key values in the file, this path is no longer than ⌈log⌈n/2⌉(K)⌉. This means that the path is not long, even in large files. For a 4 KB disk block with a search-key size of 12 bytes and a disk pointer of 8 bytes, n is around 200. Even if n = 100, a look-up of 1 million search-key values may take only ⌈log50(1,000,000)⌉ = 4 nodes to be accessed. Since the root is usually in the buffer, typically it takes only 3 or fewer disk reads.
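The arithmetic above can be checked directly (assuming each node is at least half full, so the effective fanout is ⌈n/2⌉):

```python
import math

# Rough fanout for a 4 KB block: each entry needs a 12-byte search key
# plus an 8-byte pointer, so n = 4096 // (12 + 8) = 204, i.e. around 200.
n = 4096 // (12 + 8)

def btree_levels(num_keys, fanout):
    """Worst-case path length: every node has at least ceil(fanout/2)
    children, so the height is at most log, base ceil(fanout/2), of the
    number of search-key values."""
    return math.ceil(math.log(num_keys, math.ceil(fanout / 2)))

# With n = 100, a lookup among 1,000,000 search keys touches at most
# ceil(log50(1,000,000)) = 4 nodes.
levels = btree_levels(1_000_000, 100)
```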
Updates on B+-Trees
When insertion causes a leaf node to be too large, we split that node. In Figure 11.8, assume we wish to insert a record with a bname value of ``Clearview''.
o We put the first ⌈n/2⌉ values in the existing node, and the remainder into a new node.
o Figure 11.10 shows the result.
o The new node must be inserted into the B+-tree.
o We also need to update search-key values for the parent (or higher) nodes of the split leaf node. (Except if the new node is the leftmost one.)
o Order must be preserved among the search key values in each
node.
o If the parent was already full, it will have to be split.
o When a non-leaf node is split, the children are divided among the
two new nodes.
o In the worst case, splits may be required all the way up to the root.
(If the root is split, the tree becomes one level deeper.)
o Note: when we start a B+-tree, we begin with a single node that is both the root and a single leaf. When it gets full and another insertion occurs, we split it into two leaf nodes, requiring a new root.
3. Deletions Causing Combining:
Deleting records may cause tree nodes to contain too few pointers. Then
we must combine nodes.
4. To summarize:
o Insertion and deletion are complicated, but require relatively few
operations.
o Number of operations required for insertion and deletion is
proportional to logarithm of number of search keys.
o B+-trees are fast index structures for databases.
1. The B+-tree structure is used not only as an index but also as an organizer for records into a file.
2. In a B+-tree file organization, the leaf nodes of the tree store records instead of storing pointers to records, as shown in Fig. 11.17.
3. Since records are usually larger than pointers, the maximum number of records that can be stored in a leaf node is less than the maximum number of pointers in a nonleaf node.
4. However, the leaf nodes are still required to be at least half full.
5. Insertion and deletion from a B+-tree file organization are handled in the same way as in a B+-tree index.
6. When a B+-tree is used for file organization, space utilization is particularly important. We can improve the space utilization by involving more sibling nodes in redistribution during splits and merges.
7. In general, if m nodes are involved in redistribution, each node can be kept at least ⌊(m-1)/m⌋ full.
2. Advantages:
3. Disadvantages:
o Leaf and non-leaf nodes are of different size (complicates storage)
o Deletion may occur in a non-leaf node (more complicated)
STATIC HASHING
Hash Functions
Hash Indices
1. A hash index organizes the search keys with their associated pointers into
a hash file structure.
2. We apply a hash function on a search key to identify a bucket, and store
the key and its associated pointers in the bucket (or in overflow buckets).
3. Strictly speaking, hash indices are only secondary index structures, since if
a file itself is organized using hashing, there is no need for a separate hash
index structure on it.
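A hash index can be sketched in a few lines (the hash function and bucket count below are arbitrary choices for illustration):

```python
class HashIndex:
    """Static hash index: a fixed number of buckets, each holding
    (search-key, record-pointer) pairs; a plain Python list stands in
    for a bucket and its overflow chain."""

    def __init__(self, nbuckets=8):
        self.nbuckets = nbuckets
        self.buckets = [[] for _ in range(nbuckets)]

    def _bucket(self, key):
        # The hash function maps a search key to a bucket number.
        return sum(key.encode()) % self.nbuckets

    def insert(self, key, rec_ptr):
        self.buckets[self._bucket(key)].append((key, rec_ptr))

    def lookup(self, key):
        # Different keys may share a bucket, so filter on the key itself.
        return [p for k, p in self.buckets[self._bucket(key)] if k == key]
```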
DYNAMIC HASHING
o If the bucket is full, we must split the bucket, and redistribute the
records.
o If bucket is split we may need to increase the number of bits we use
in the hash.
6. Two cases exist:
1. If i = i_j, then only one entry in the bucket address table points to bucket j. In this case the bucket address table must first be doubled (increasing i by 1) before the split.
2. If i > i_j, then more than one entry in the bucket address table points to bucket j.
o In either case we can then split bucket j without (further) increasing the size of the bucket address table.
o Note that all entries that point to bucket j correspond to hash prefixes that have the same value on the leftmost i_j bits.
o We allocate a new bucket z, and set i_j and i_z to the original i_j value plus 1.
o Now adjust the entries in the bucket address table that previously pointed to bucket j.
o Leave the first half pointing to bucket j, and make the rest point to bucket z.
o Rehash each record in bucket j as before.
o Reattempt the new insert.
7. Note that in both cases we only need to rehash records in bucket j.
8. Deletion of records is similar. Buckets may have to be coalesced, and
bucket address table may have to be halved.
9. Insertion is illustrated for the example deposit file in Figures 12.25 through 12.31.
10.Advantages:
o Extendable hashing provides performance that does not degrade as
the file grows.
o Minimal space overhead - no buckets need be reserved for future
use. Bucket address table only contains one pointer for each hash
value of current prefix length.
11.Disadvantages:
o Extra level of indirection in the bucket address table
o Added complexity
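The case analysis above can be made concrete with a minimal extendable-hashing sketch (illustrative only: it uses the low-order bits of the hash, fixes the bucket size at 2, and assumes a single split always suffices, never splitting recursively):

```python
class Bucket:
    def __init__(self, depth):
        self.depth = depth      # local depth i_j
        self.items = []

class ExtendableHash:
    BUCKET_SIZE = 2

    def __init__(self):
        self.global_depth = 0               # i: table has 2**i entries
        self.table = [Bucket(0)]

    def _bucket(self, key):
        return self.table[hash(key) & (2 ** self.global_depth - 1)]

    def insert(self, key, value):
        b = self._bucket(key)
        if len(b.items) < self.BUCKET_SIZE:
            b.items.append((key, value))
            return
        if b.depth == self.global_depth:    # case 1: only one entry points to b
            self.table = self.table + self.table    # double the table
            self.global_depth += 1
        # Now more than one entry points to b: split without growing further.
        bit = 1 << b.depth
        b.depth += 1
        new = Bucket(b.depth)
        for k in range(len(self.table)):    # repoint half of b's entries
            if self.table[k] is b and k & bit:
                self.table[k] = new
        old_items, b.items = b.items + [(key, value)], []
        for k, v in old_items:              # rehash only the records of bucket b
            self._bucket(k).items.append((k, v))

    def lookup(self, key):
        return [v for k, v in self._bucket(key).items if k == key]
```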
An index is created by:
create index <index-name> on r (<attribute-list>)
The attribute list is the list of attributes in relation r that form the search key for the index. For example:
create index b-index on branch (bname)
If the search key is a candidate key, we add the word unique to the definition:
create unique index b-index on branch (bname)
If the index creation succeeds, any attempt to insert a tuple violating this requirement will fail.
The unique keyword is redundant if primary keys have been defined with integrity constraints already.
Multiple-Key Access:
Until now, we have assumed implicitly that only one index (or hash table) is used to process a query. Assume that the account file has two indices: one for branch-name and one for balance.
Consider the following query: "Find all account numbers at the Perryridge branch with balances equal to $1000."
select loan-number
from account
where branch-name = 'Perryridge' and balance = 1000
There are three strategies possible for processing this query:
1. Use the index on branch-name to find all records pertaining to the Perryridge branch; examine each such record to see whether balance = 1000.
2. Use the index on balance to find all records pertaining to accounts with balances of $1000; examine each such record to see whether branch-name = 'Perryridge'.
3. Use the index on branch-name to find pointers to all records pertaining to the Perryridge branch. Also, use the index on balance to find pointers to all records pertaining to accounts with a balance of $1000. Take the intersection of the two sets of pointers. Those pointers that are in the intersection point to records satisfying both conditions.
The third strategy is the only one of the three that takes advantage of the existence of multiple indices. However, even this strategy may be a poor choice if all of the following hold:
• There are many records pertaining to the Perryridge branch.
• There are many records pertaining to accounts with a balance of $1000.
• There are only a few records pertaining to both the Perryridge branch and a balance of $1000.
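The pointer-intersection idea behind the third strategy can be sketched with toy indexes (the index contents below are made up for illustration):

```python
# Toy indexes: search-key value -> set of record pointers into the
# account file (here plain integers stand in for block/record addresses).
branch_index = {"Perryridge": {1, 3, 5, 9}, "Downtown": {2, 4}}
balance_index = {1000: {3, 7, 9}, 500: {2, 5}}

def lookup_conjunction(branch, balance):
    """Pointers in the intersection identify exactly the records that
    satisfy both conditions (branch-name = branch and balance = balance)."""
    return branch_index.get(branch, set()) & balance_index.get(balance, set())
```

If both pointer sets are large but the intersection is small, much work is wasted fetching the two sets, which is exactly the bad case listed above.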
Q – BANK
7. Explain DBTG data structure and architecture. Also discuss the DBTG data retrieval facility. (15) - May 2007
UNIT - V
Transactions
Concurrence Control
Recovery System
2Marks Q & A:
1. What is a transaction?
Collections of operations that form a single logical unit of work are
called transactions.
Growing phase: a transaction may obtain locks but not release any lock
Shrinking phase: a transaction may release locks but may not obtain
any new locks.
17. What are the two methods for dealing with the deadlock problem?
The two methods for dealing with the deadlock problem are deadlock prevention, and deadlock detection and recovery.
Nonvolatile storage
34. What are the timestamps associated with each data item?
• W-timestamp(Q) denotes the largest timestamp of any transaction that executed write(Q) successfully.
• R-timestamp(Q) denotes the largest timestamp of any transaction that executed read(Q) successfully.
37. When does the transaction go into an active state and partially committed
state?
A transaction goes into an active state immediately after it starts
execution, where it can issue read and write operations. When the
transaction ends, it moves into the partially committed state.
Lecture Notes:
TRANSACTIONS
• These properties are often called the ACID properties; the acronym is
derived from the first letter of each of the four properties.
Transaction State:
1. Active, the initial state; the transaction stays in this state while it is
executing.
2. Partially committed, after the final statement has been executed.
3. Failed, after the discovery that normal execution can no longer proceed.
4. Aborted, after the transaction has been rolled back and the database has
been restored to its state prior to the start of the transaction.
5. Committed, after successful completion.
Concurrent Executions:
Suppose the current values of accounts A and B are $1000 and $2000,
respectively.
Suppose also that the two transactions are executed one at a time in the
order T1 followed by T2. This execution sequence appears as follows:
The final values of accounts A and B, after the execution in this figure
takes place, are $855 and $2145, respectively.
After this execution takes place, we arrive at the same state as the one in
which the transactions are executed serially in the order T1 followed by T2.
The sum A + B is indeed preserved.
Serializability
For scheduling purposes, we do not interpret the type of operations that a transaction can perform on a data item; we consider only two operations: read and write. We thus assume that, between a read(Q) instruction and a write(Q) instruction on a data item Q, a transaction may perform an arbitrary sequence of operations on the copy of Q that is residing in the local buffer of the transaction.
T1            T2
read(A)
write(A)
              read(A)
              write(A)
read(B)
write(B)
              read(B)
              write(B)
Schedule 3 - showing only the read and write instructions
Conflict Serializability
However, if Ii and Ij refer to the same data item Q, then the order of the two steps may matter. Since we are dealing with only read and write instructions, there are four cases that we need to consider:
1. Ii = read(Q), Ij = read(Q). The order of Ii and Ij does not matter, since the same value of Q is read by Ti and Tj, regardless of the order.
2. Ii = read(Q), Ij = write(Q). If Ii comes before Ij, then Ti does not read the value of Q that is written by Tj in instruction Ij. If Ij comes before Ii, then Ti reads the value of Q that is written by Tj. Thus, the order of Ii and Ij matters.
3. Ii = write(Q), Ij = read(Q). The order of Ii and Ij matters, for reasons similar to those of the previous case.
4. Ii = write(Q), Ij = write(Q). Since both instructions are write operations, the order of Ii and Ij directly affects the final value of Q in the database state that results from schedule S.
However, the write(A) instruction of T2 does not conflict with the read(B)
instruction of T1, because the two instructions access different data items.
Since the write(A) instruction of T2 in schedule 3 does not conflict with the
read(B) instruction of T1, we can swap these instructions to generate an
equivalent schedule.
T1            T2
read(A)
write(A)
              read(A)
read(B)
              write(A)
write(B)
              read(B)
              write(B)

T1            T2
read(A)
write(A)
read(B)
write(B)
              read(A)
              write(A)
              read(B)
              write(B)

T3            T4
read(Q)
              write(Q)
write(Q)
Schedule 7.
View Serializability
Consider two schedules S and S′, where the same set of transactions participates in both schedules. The schedules S and S′ are said to be view equivalent if three conditions are met:
1. For each data item Q, if transaction Ti reads the initial value of Q in schedule S, then transaction Ti must, in schedule S′, also read the initial value of Q.
2. For each data item Q, if transaction Ti reads a value of Q that was produced by transaction Tj in schedule S, then Ti must, in schedule S′, also read the value of Q that was produced by Tj.
3. For each data item Q, the transaction (if any) that performs the final write(Q) operation in schedule S must perform the final write(Q) operation in schedule S′.
T1            T2
read(A)
A := A - 50
write(A)
              read(B)
              B := B + 50
              write(B)
read(B)
B := B - 10
write(B)
              read(A)
              A := A + 10
              write(A)
Schedule 8.
T3            T4            T6
read(Q)
              write(Q)
write(Q)
                            write(Q)
Recoverability
Recoverable Schedules
Cascadeless Schedules
T8            T9
read(A)
              read(A)
write(A)
read(B)
Schedule 10

T10           T11           T12
read(A)
read(B)
write(A)
              read(A)
              write(A)
                            read(A)
Schedule 11
Transaction T11 writes a value of A that is read by transaction T12. Suppose that,
at this point, T10 fails. T10 must be rolled back. Since T11 is dependent on T10,
T11 must be rolled back. Since T12 is dependent on T11, T12 must be rolled
back.
Implementation of Isolation
There are various concurrency-control schemes that we can use to ensure that,
even when multiple transactions are executed concurrently, only acceptable
schedules are generated, regardless of how the operating-system time-shares
resources (such as CPU time) among the transactions.
As a result of the locking policy, only one transaction can execute at a time.
Therefore, only serial schedules are generated. These are trivially serializable,
and it is easy to verify that they are cascadeless as well.
The SQL standard specifies that a transaction begins implicitly. Transactions are ended by one of these SQL statements:
• Commit work commits the current transaction and begins a new one.
• Rollback work causes the current transaction to abort.
The standard also specifies that the system must ensure both
serializability and freedom from cascading rollback.
The definition of serializability used by the standard is that a schedule must have
the same effect as would some serial schedule. Thus, conflict and view
serializability are both acceptable.
This graph consists of a pair G = (V, E), where V is a set of vertices and E is
a set of edges. The set of vertices consists of all the transactions participating in
the schedule.
The set of edges consists of all edges Ti → Tj for which one of three conditions holds:
1. Ti executes write(Q) before Tj executes read(Q).
2. Ti executes read(Q) before Tj executes write(Q).
3. Ti executes write(Q) before Tj executes write(Q).
[Fig: precedence graphs containing (a) the single edge T1 → T2 and (b) the single edge T2 → T1.]
If the precedence graph for S has a cycle, then schedule S is not conflict
serializable.
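Building the precedence graph and testing it for a cycle can be sketched as follows (a schedule is given as (transaction, operation, item) triples; the representation is an illustration, not a standard API):

```python
def precedence_graph(schedule):
    """Edges Ti -> Tj: Ti has an operation that conflicts with a later
    operation of Tj (read/write, write/read or write/write on one item)."""
    edges = set()
    for i, (ti, op_i, q_i) in enumerate(schedule):
        for tj, op_j, q_j in schedule[i + 1:]:
            if ti != tj and q_i == q_j and "write" in (op_i, op_j):
                edges.add((ti, tj))
    return edges

def has_cycle(edges):
    """The schedule is conflict serializable iff this returns False (DFS)."""
    adj = {}
    for a, b in edges:
        adj.setdefault(a, []).append(b)
    state = {}                       # node -> 1 (on DFS stack) or 2 (done)
    def dfs(u):
        state[u] = 1
        for v in adj.get(u, []):
            if state.get(v) == 1 or (state.get(v) is None and dfs(v)):
                return True          # back edge: cycle found
        state[u] = 2
        return False
    return any(state.get(u) is None and dfs(u) for u in adj)
```

For schedule 3 the only edge is T1 → T2 (no cycle), while a write-write conflict back from T4 to T3 as in schedule 7 creates a cycle.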
[Fig: a precedence graph for T1 and T2 containing a cycle; and (a) a precedence graph over Ti, Tj, Tk and Tm, with (b), (c) two topological orderings of it.]
Concurrency Control
Lock-Based Protocols
Graph-Based Protocols
Timestamp-Based Protocols
Multiple Granularity
Multiversion Protocols
Deadlock Handling
Lock-Based Protocols:
Data items can be locked in two modes:
1. exclusive mode (X): the data item can be both read and written. An X-lock is requested using the lock-X(A) instruction.
2. shared mode (S): the data item can only be read. An S-lock is requested using the lock-S(A) instruction.
Locks can be released: U-lock(A).
Locking protocol: a transaction may be granted a lock on an item only if the requested lock is compatible with all locks already held on the item by other transactions.
Lock-compatibility matrix (requested mode vs. held mode):
      S     X
S     yes   no
X     no    no
Example (T1 transfers 50 from B to A; T2 displays A + B; releasing locks early allows T2 to see an inconsistent state):
T1                  T2
1. X-lock(B)
2. read B
3. B := B - 50
4. write B
5. U-lock(B)
                    6. S-lock(A)
                    7. read A
                    8. U-lock(A)
                    9. S-lock(B)
                    10. read B
                    11. U-lock(B)
                    12. display A + B
13. X-lock(A)
14. read A
15. A := A + 50
16. write A
17. U-lock(A)
T1                  T2
1. X-lock(B)
2. read(B)
3. B := B - 50
4. write(B)
                    5. S-lock(A)
                    6. read(A)
                    7. S-lock(B) (waits for T1)
8. X-lock(A) (waits for T2)
Neither transaction can proceed: deadlock.
• When the first lock is released, the transaction moves from phase 1 to
phase 2.
• Properties of the two-phase locking protocol: it ensures serializability. It can be shown that the transactions can be serialized in the order of their lock points (i.e. the point where a transaction acquired its final lock).
• Does not ensure freedom from deadlocks. Cascading roll-back is possible.
Modifications of the two-phase locking protocol:
Strict two-phase locking: all X-locks are held till commit/abort. Rigorous two-phase locking: all locks are held till commit/abort. Transactions can then be serialized in the order in which they commit.
Refine the two-phase locking protocol with lock conversions:
Phase 1 (growing): can acquire a lock-S on an item; can acquire a lock-X on an item; can convert a lock-S to a lock-X (upgrade).
Phase 2 (shrinking): can release a lock-S; can release a lock-X; can convert a lock-X to a lock-S (downgrade).
* Ensures serializability; but still relies on the programmer to insert the various
locking instructions.
*Strict and rigorous two-phase locking (with lock conversions) are used
extensively in DBMS.
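A minimal lock-table sketch of these ideas (illustrative only: conflicting requests are simply refused here, where a real system would queue the transaction and check for deadlock; strict/rigorous 2PL then just calls release_all at commit or abort):

```python
class LockManager:
    """Minimal lock table: S locks are shared, X locks exclusive,
    following the compatibility matrix above."""

    def __init__(self):
        self.locks = {}     # item -> (mode, set of holding transactions)

    def acquire(self, txn, item, mode):
        held = self.locks.get(item)
        if held is None:
            self.locks[item] = (mode, {txn})
            return True
        held_mode, holders = held
        if mode == "S" and held_mode == "S":     # S is compatible with S
            holders.add(txn)
            return True
        if holders == {txn}:                     # sole holder: reuse/upgrade
            self.locks[item] = ("X" if mode == "X" else held_mode, holders)
            return True
        return False                             # conflict: would block

    def release_all(self, txn):
        """Release every lock of txn, as at commit/abort under strict 2PL."""
        for item in list(self.locks):
            mode, holders = self.locks[item]
            holders.discard(txn)
            if not holders:
                del self.locks[item]
```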
When a transaction Ti issues read(D), the system issues:
if Ti has a lock on D
then read(D)
else begin
if necessary, wait until no other transaction has a lock-X on D;
grant Ti a lock-S on D;
read(D);
end
When a transaction Ti issues write(D), the system issues:
if Ti has a lock-X on D
then write(D)
else begin
if necessary, wait until no other transaction has any lock on D;
if Ti has a lock-S on D
then upgrade the lock on D to lock-X
else grant Ti a lock-X on D;
write(D);
end
Implementation of Locking:
Lock table:
Graph-Based Protocols:
Example: The following 4 transactions follow the tree protocol on the database
graph below.
Lock-based protocols determine the order between every pair of conflicting transactions at execution time, by the order in which the first of the two requests a lock that involves incompatible modes. Another method for determining the serializability order is to select an ordering among transactions in advance. The most common method for doing so is to use a timestamp-ordering scheme.
TIMESTAMPS:
To implement this scheme, we associate with each data item Q two timestamp values: W-timestamp(Q), the largest timestamp of any transaction that executed write(Q) successfully, and R-timestamp(Q), the largest timestamp of any transaction that executed read(Q) successfully.
The timestamp-ordering protocol ensures that any conflicting read and write operations are executed in timestamp order: an operation that arrives too late (its transaction's timestamp is smaller than the R- or W-timestamp already recorded on the item) causes that transaction to be rolled back.
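A minimal sketch of these checks (illustrative Python; timestamps are plain integers, and a False return stands for "roll the transaction back"):

```python
class TimestampOrdering:
    """Timestamp-ordering checks: each item Q carries R-timestamp(Q)
    and W-timestamp(Q); an operation out of timestamp order causes the
    issuing transaction to be rolled back (returns False)."""

    def __init__(self):
        self.r_ts = {}      # largest timestamp of a successful read(Q)
        self.w_ts = {}      # largest timestamp of a successful write(Q)

    def read(self, ts, q):
        if ts < self.w_ts.get(q, 0):   # the value Ti needs was overwritten
            return False               # roll Ti back
        self.r_ts[q] = max(self.r_ts.get(q, 0), ts)
        return True

    def write(self, ts, q):
        # A younger transaction has already read or written Q.
        if ts < self.r_ts.get(q, 0) or ts < self.w_ts.get(q, 0):
            return False
        self.w_ts[q] = ts
        return True
```

Under Thomas' write rule, the second test would instead silently ignore an obsolete write rather than reject it.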
T14                 T15
read(B)
                    read(B)
                    B := B - 50
                    write(B)
read(A)
                    read(A)
display(A+B)
                    A := A + 50
                    write(A)
                    display(A+B)
Otherwise, the system executes the write(Q) operation and sets W-timestamp(Q) to TS(Ti).
Under Thomas' write rule, the write(Q) operation of T16 would be ignored. The result is a schedule that is view equivalent to the serial schedule <T16, T17>.
VALIDATION-BASED PROTOCOLS:
To perform the validation test, we need to know when the various phases of transaction Ti took place. We therefore associate three different timestamps with transaction Ti:
1. Start(Ti), the time when Ti started its execution.
2. Validation(Ti), the time when Ti finished its read phase and started its validation phase.
3. Finish(Ti), the time when Ti finished its write phase.
The validation test for Tj requires that, for all Ti with TS(Ti) < TS(Tj), one of the following two conditions holds:
1. Finish(Ti) < Start(Tj). Since Ti completes its execution before Tj starts, the serializability order is maintained.
2. The set of data items written by Ti does not intersect with the set of data items read by Tj, and Ti completes its write phase before Tj starts its validation phase (Finish(Ti) < Validation(Tj)). This condition ensures that the writes of Ti and Tj do not overlap.
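The two conditions can be checked directly (a sketch; transactions are represented as dicts with assumed keys 'start', 'validation', 'finish', 'read_set' and 'write_set'):

```python
def validation_test(ti, tj):
    """Validation check for a pair with TS(Ti) < TS(Tj): returns True
    when Tj may commit with respect to Ti."""
    # Condition 1: Ti finishes before Tj starts (serial execution).
    if ti["finish"] < tj["start"]:
        return True
    # Condition 2: Ti's writes do not touch Tj's reads, and Ti finishes
    # its write phase before Tj starts validating.
    if (not (set(ti["write_set"]) & set(tj["read_set"]))
            and ti["finish"] < tj["validation"]):
        return True
    return False
```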
Multiple Granularity
Multiversion Protocols:
write has already read a version created by a transaction older than Ti.
Disadvantages
DEADLOCK HANDLING:
Consider the following two transactions:
T1: write(X)          T2: write(Y)
    write(Y)              write(X)
Schedule with deadlock:
T1                    T2
lock-X on X
write(X)
                      lock-X on Y
                      write(Y)
                      wait for lock-X on X
wait for lock-X on Y
DEADLOCK PREVENTION:
TIMEOUT-BASED SCHEMES:
When a deadlock is detected, the system must recover from the deadlock.
The most common solution is to roll back one or more transactions to break the
deadlock. Three actions are required:
1. Selection of a victim: select the transaction(s) to roll back that will incur minimum cost.
2. Rollback: determine how far to roll back the transaction. Total rollback: abort the transaction and then restart it. It is more effective to roll back the transaction only as far as necessary to break the deadlock.
3. Starvation: ensure that the same transaction is not repeatedly chosen as the victim.
DEADLOCK DETECTION:
Some transactions require the ability to insert new data items, and others require the ability to delete data items. To examine how such transactions affect concurrency control, we introduce these additional operations.
DELETION:
INSERTION:
Consider the transaction T20 that executes the following SQL query on the bank database:
select sum(balance)
from account
where branch-name = 'Perryridge'
• A transaction Ti can access tuples of a relation only after first finding them
through one or more of the indices on the relation.
• A transaction Ti that performs a lookup must acquire a shared lock on all the index leaf nodes that it accesses.
• A transaction Ti may not insert, delete, or update a tuple ti in a relation r without updating all indices on r.
• The rules of the two-phase locking protocol must be observed.
RECOVERY SYSTEM
Recovery system:
Failure Classification
There are various types of failure that may occur in a system. They are:
Transaction failure :
1. Logical errors:
2. System errors:
(e.g., deadlock)
3. System crash:
Disk failure:
A head crash or similar disk failure destroys all or part of the disk. Destruction is assumed to be detectable: disk drives use checksums to detect failures.
Recovery Algorithms
1. Actions taken during normal transaction processing to ensure that enough information exists to recover from failures.
2. Actions taken after a failure to recover the database contents to a state that ensures atomicity, consistency and durability.
Storage Structure :
The various data items in the database may be stored and accessed in a
number of different storage media.
Storage types
• Volatile storage: does not survive system crashes (e.g. main memory, cache memory).
• Nonvolatile storage: survives system crashes (e.g. disk, tape, battery-backed-up RAM).
• Stable storage: an idealized form of storage that survives all failures, approximated by maintaining multiple copies on distinct nonvolatile media.
Successful completion :
Information arrived safely at its destination
Partial failure:
Destination block has incorrect information
Total failure:
• Destination block was never updated.
Protecting storage media from failure during data transfer (one solution): maintain two physical copies of each block. The copies of a block may differ due to a failure during an output operation.
Data Access
Block movements between disk and main memory are initiated through the following two operations:
• input(B) transfers the physical block B to main memory.
• output(B) transfers the buffer block B to the disk, and replaces the appropriate physical block there.
We assume, for simplicity, that each data item fits in, and is stored inside, a
single block.
Transactions:
• Several output operations may be required for Ti (to output A and B). A failure may occur after one of these modifications has been made, but before all of them are made.
two approaches:
1. log-based recovery
2. shadow paging
Assume that transactions run serially, that is, one after the other.
We assume for now that log records are written directly to stable storage (that
is, they are not buffered)
T0: read(A)                    T1: read(C)
    A := A - 50                    C := C - 100
    write(A)                       write(C)
    read(B)
    B := B + 50
    write(B)
Example (the log, with output of blocks interleaved; BX denotes the block containing data item X):
<T0 start>
<T0, A, 950>
<T0, B, 2050>
<T0 commit>
<T1 start>
<T1, C, 600>
output(BB), output(BC)
<T1 commit>
output(BA)
(a) undo (T0): B is restored to 2000 and A to 1000.
(b) undo (T1) and redo (T0): C is restored to 700, and then A and B are set to 950 and 2050 respectively.
(c) redo (T0) and redo (T1): A and B are set to 950 and 2050 respectively, and then C is set to 600.
checkpoints:
When a system failure occurs, we must consult the log to determine those transactions that need to be redone and those that need to be undone. In principle, we need to search the entire log to determine this information; periodic checkpoints avoid this.
Transaction are not allowed to perform any update actions, such as writing
to a buffer block or writing a log record, while a checkpoint is in progress.
Consider a transaction Ti that committed prior to the checkpoint. For such a transaction, the <Ti commit> record appears in the log before the <checkpoint> record. Any database modifications made by Ti must have been written to the database either prior to the checkpoint or as part of the checkpoint itself.
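The redo/undo idea can be sketched over a simple log format (assumed here as tuples ('start', T), ('write', T, item, old, new) and ('commit', T); this is an illustration of the principle, not the textbook's exact algorithm):

```python
def recover(log):
    """Redo the writes of committed transactions in log order, then undo
    the writes of incomplete transactions in reverse order."""
    committed = {rec[1] for rec in log if rec[0] == "commit"}
    db = {}
    for rec in log:                                  # forward pass: redo
        if rec[0] == "write" and rec[1] in committed:
            _, _, item, old, new = rec
            db[item] = new
    for rec in reversed(log):                        # backward pass: undo
        if rec[0] == "write" and rec[1] not in committed:
            _, _, item, old, new = rec
            db[item] = old
    return db

# The T0/T1 example above: T0 committed, T1 did not.
log = [("start", "T0"),
       ("write", "T0", "A", 1000, 950),
       ("write", "T0", "B", 2000, 2050),
       ("commit", "T0"),
       ("start", "T1"),
       ("write", "T1", "C", 700, 600)]
```

With T1 uncommitted, recovery redoes T0 and undoes T1: A = 950, B = 2050, and C is restored to 700, matching case (b) above.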
Shadow Paging
To commit a transaction
3. Make the current page table the new shadow page table, as follows:
Disadvantages :
Q-BANK:
(a) Log-based recovery (7)
(b) List different security concerns for a bank, and state whether each concern relates to physical security, human security, operating-system security, or database security (9)
7. List the ACID properties and explain in detail the necessity of each of the ACID properties (15)
8. (a) Explain the two principal methods for dealing with the deadlock problem (9) April/May 2009
(b) List and explain the properties of the transaction that the database system maintains (6)
(a) Shadow Paging