UNIT-1
A database is an organized collection of data. The data are typically organized to model
relevant aspects of reality in a way that supports processes requiring this information. For
example, modelling the availability of rooms in hotels in a way that supports finding a hotel
with vacancies.
Database management systems (DBMSs) are specially designed software applications
that interact with the user, other applications, and the database itself to capture and
analyze data. A general-purpose DBMS is a software system designed to allow the
definition, creation, querying, update, and administration of databases. Well-known DBMSs
include MySQL, MariaDB, PostgreSQL, SQLite, Microsoft SQL Server, Microsoft Access,
Oracle, SAP HANA, dBASE, FoxPro, IBM DB2, LibreOffice Base, FileMaker Pro and
InterSystems Caché. A database is not generally portable across different DBMSs, but
different DBMSs can interoperate by using standards such as SQL and ODBC or JDBC to
allow a single application to work with more than one database.
The interactions catered for by most existing DBMSs fall into four main groups:
Data definition: Defining new data structures for a database, removing data
structures from the database, modifying the structure of existing data.
Update: Inserting, modifying, and deleting data.
Retrieval: Obtaining information either for end-user queries and reports or for
processing by applications.
Administration: Registering and monitoring users, enforcing data security,
monitoring performance, maintaining data integrity, dealing with concurrency
control, and recovering information if the system fails.
A DBMS is responsible for maintaining the integrity and security of stored data, and for
recovering information if the system fails.
Data Abstraction: The major purpose of a database system is to provide users with an abstract view of the
data.
The system hides certain details of how the data are stored, created, and maintained.
Complexity should be hidden from database users.
There are several levels of abstraction:
1. Physical Level:
o How the data are stored.
o E.g. index, B-tree, hashing.
o Lowest level of abstraction.
o Complex low-level structures described in detail.
2. Conceptual Level:
o Next highest level of abstraction.
o Describes what data are stored.
o Describes the relationships among data.
o Database administrator level.
3. View Level:
o Highest level.
o Describes part of the database for a particular group of users.
o Can be many different views of a database.
o E.g. tellers in a bank get a view of customer accounts, but not of payroll data.
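As a sketch of how the view level can be expressed in SQL (table and column names here are purely illustrative, not from the text above), a bank might give tellers a view of accounts that leaves everything else hidden:

    -- Illustrative base table.
    CREATE TABLE account (
        account_no  INT PRIMARY KEY,
        customer    VARCHAR(50),
        balance     DECIMAL(10,2),
        branch_name VARCHAR(30)
    );

    -- The view a teller works with; storage details and other
    -- tables (such as payroll) remain hidden.
    CREATE VIEW teller_view AS
        SELECT account_no, customer, balance
        FROM account;

Tellers then query teller_view exactly as if it were a table.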
Data Models:
Data models are a collection of conceptual tools for describing data, data relationships,
data semantics and data constraints. There are three different groups:
1. Object-based Logical Models.
2. Record-based Logical Models.
3. Physical Data Models.
We'll look at them in more detail now. The basic idea is to hide the implementation details
of the database schemes from the users.
Database Manager
1. The database manager is a program module which provides the interface
between the low-level data stored in the database and the application programs and
queries submitted to the system.
2. Databases typically require lots of storage space (gigabytes). This must be stored on
disks. Data is moved between disk and main memory (MM) as needed.
3. The goal of the database system is to simplify and facilitate access to data.
Performance is important. Views provide simplification.
4. So the database manager module is responsible for
o Interaction with the file manager: Storing raw data on disk using the file
system usually provided by a conventional operating system. The database
manager must translate DML statements into low-level file system
commands (for storing, retrieving and updating data in the database).
o Integrity enforcement: Checking that updates in the database do not violate
consistency constraints (e.g. no bank account balance below $25)
o Security enforcement: Ensuring that users only have access to information
they are permitted to see
o Backup and recovery: Detecting failures due to power failure, disk crash,
software errors, etc., and restoring the database to its state before the failure
o Concurrency control: Preserving data consistency when there are
concurrent users.
5. Some small database systems may miss some of these features, resulting in simpler
database managers. (For example, no concurrency control is required on a PC running MS-DOS.) These features are necessary on larger systems.
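As a concrete illustration of the integrity enforcement described above, the "no bank account balance below $25" rule could be declared to the DBMS so that violating updates are rejected automatically (table and column names are illustrative):

    -- The DBMS refuses any INSERT or UPDATE that would leave
    -- the balance below 25.
    CREATE TABLE account (
        account_no INT PRIMARY KEY,
        balance    DECIMAL(10,2) CHECK (balance >= 25)
    );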
Database Administrator
1. The database administrator is a person having central control over data and
programs accessing that data. Duties of the database administrator include:
o Scheme definition: the creation of the original database scheme. This
involves writing a set of definitions in a DDL (data definition
language), compiled by the DDL compiler into a set of tables stored in the
data dictionary.
o Storage structure and access method definition: writing a set of
definitions translated by the data storage and definition language compiler
o Scheme and physical organization modification: writing a set of
definitions used by the DDL compiler to generate modifications to
appropriate internal system tables (e.g. data dictionary). This is done rarely,
but sometimes the database scheme or physical organization must be
modified.
o Granting of authorization for data access: granting different types of
authorization for data access to various users
o Integrity constraint specification: generating integrity constraints. These
are consulted by the database manager module whenever updates occur.
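A minimal sketch of two of these duties, scheme definition and granting of authorization, expressed in SQL (table, column, and user names are hypothetical):

    -- Scheme definition: DDL statements recorded in the data dictionary.
    CREATE TABLE branch (
        bname  VARCHAR(30) PRIMARY KEY,
        bcity  VARCHAR(30),
        assets DECIMAL(12,2)
    );

    -- Granting of authorization for data access.
    GRANT SELECT ON branch TO clerk;
    GRANT SELECT, INSERT, UPDATE ON branch TO branch_manager;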
Database Users
1. The database users fall into several categories:
o Application programmers are computer professionals interacting with the
system through DML calls embedded in a program written in a host language
(e.g. C, PL/1, Pascal).
These programs are called application programs.
The DML precompiler converts DML calls (prefaced by a special
character like $, #, etc.) to normal procedure calls in a host language.
The host language compiler then generates the object code.
Some special types of programming languages combine Pascal-like
control structures with control structures for the manipulation of a
database.
These are sometimes called fourth-generation languages.
They often include features to help generate forms and display data.
o Sophisticated users interact with the system without writing programs.
They form requests by writing queries in a database query language.
These are submitted to a query processor that breaks a DML
statement down into instructions for the database manager module.
Operators in relational algebra are not necessarily the same as SQL operators, even if they
have the same name. For example, the SELECT statement exists in SQL, and also exists in
relational algebra. These two uses of SELECT are not the same. The DBMS must take
whatever SQL statements the user types in and translate them into relational algebra
operations before applying them to the database.
Terminology
Operators - Write
INSERT - provides a list of attribute values for a new tuple in a relation. This
operator is the same as in SQL.
Operators - Retrieval
There are two groups of operations:
Mathematical set theory based relations: UNION, INTERSECTION, DIFFERENCE, and CARTESIAN PRODUCT.
Special database operations: SELECT (not the same as SQL SELECT), PROJECT, and JOIN.
Relational SELECT
SELECT is used to obtain a subset of the tuples of a relation that satisfy a select condition.
For example, find all employees born after 1st Jan 1950:
SELECT dob > '01/JAN/1950' (employee)
Relational PROJECT
The PROJECT operation is used to select a subset of the attributes of a relation by
specifying the names of the required attributes.
For example, to get a list of all employees' surnames and employee numbers:
PROJECT surname, empno (employee)
SELECT and PROJECT
SELECT and PROJECT can be combined together. For example, to get a list of employee
numbers for employees in department number 1:
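The combined expression is not reproduced above; assuming the employee relation has a department-number attribute called depno, it would look like:
PROJECT empno (SELECT depno = 1 (employee))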
UNION of R and S
the union of two relations is a relation that includes all the tuples that are either in R
or in S or in both R and S. Duplicate tuples are eliminated.
INTERSECTION of R and S
the intersection of R and S is a relation that includes all tuples that are both in R
and S.
DIFFERENCE of R and S
the difference of R and S is the relation that contains all the tuples that are in R but
that are not in S.
UNION Example
Figure : UNION
INTERSECTION Example
Figure : Intersection
DIFFERENCE Example
Figure : DIFFERENCE
CARTESIAN PRODUCT
The Cartesian Product is also an operator which works on two sets. It is sometimes called
the CROSS PRODUCT or CROSS JOIN.
It combines the tuples of one relation with all the tuples of the other relation.
In its simplest form the JOIN operator is just the cross product of the two relations.
As the join becomes more complex, tuples are removed within the cross product to
make the result of the join more meaningful.
JOIN allows you to evaluate a join condition between the attributes of the relations
on which the join is undertaken.
Figure : JOIN
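In SQL terms, the progression from a plain cross product to a join with a condition can be sketched as follows (the employee and department tables and their columns are hypothetical):

    -- Cross product: every employee row paired with every department row.
    SELECT *
    FROM employee CROSS JOIN department;

    -- A join condition removes the pairings that make no sense.
    SELECT *
    FROM employee JOIN department
         ON employee.depno = department.depno;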
Natural Join
Invariably the JOIN involves an equality test, and thus is often described as an equi-join.
Such joins result in two attributes in the resulting relation having exactly the same value. A
`natural join' will remove the duplicate attribute(s).
In most systems a natural join will require that the attributes have the same name to
identify the attribute(s) to be used in the join. This may require a renaming
mechanism.
If you do use natural joins make sure that the relations do not have two attributes
with the same name by accident.
OUTER JOINs
Notice that much of the data is lost when applying a join to two relations. In some cases this
lost data might hold useful information. An outer join retains the information that would
have been lost from the tables, replacing missing data with nulls.
There are three forms of the outer join, depending on which data is to be kept.
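As a sketch (table and column names are hypothetical), the three forms correspond to SQL's LEFT, RIGHT and FULL OUTER JOIN; unmatched rows are kept and the missing side is filled with nulls:

    -- Keep every employee, even those with no matching department.
    SELECT e.empno, e.surname, d.dname
    FROM employee e LEFT OUTER JOIN department d ON e.depno = d.depno;

    -- Keep every department, even those with no employees.
    SELECT e.empno, e.surname, d.dname
    FROM employee e RIGHT OUTER JOIN department d ON e.depno = d.depno;

    -- Keep unmatched rows from both sides.
    SELECT e.empno, e.surname, d.dname
    FROM employee e FULL OUTER JOIN department d ON e.depno = d.depno;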
Many of the tables in a database will have relationships, or links, between them, either in a
one-to-one or a one-to-many relationship. The connection between the tables is made by a
Primary Key / Foreign Key pair, where a Foreign Key field (or fields) in a given table is the
Primary Key of another table. As a typical example, there is a one-to-many relationship
between Customers and Orders. Both tables have a CustID field, which is the Primary Key
of the Customers table and is a Foreign Key of the Orders Table. The related fields do not
need to have the identical name, but it is a good practice to keep them the same.
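A minimal sketch of the Customers/Orders relationship described above, in SQL (columns other than CustID are illustrative):

    CREATE TABLE Customers (
        CustID      INT PRIMARY KEY,
        CompanyName VARCHAR(50)
    );

    CREATE TABLE Orders (
        OrderID   INT PRIMARY KEY,
        CustID    INT,            -- foreign key column
        OrderDate DATE,
        FOREIGN KEY (CustID) REFERENCES Customers (CustID)
    );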
A SQL SELECT statement can be broken down into numerous elements, each beginning
with a keyword. Although it is not necessary, common convention is to write these
keywords in all capital letters. In this article, we will focus on the most fundamental and
common elements of a SELECT statement, namely
SELECT
FROM
WHERE
ORDER BY
The most basic SELECT statement has only 2 parts: (1) what columns you want to return
and (2) what table(s) those columns come from.
If we want to retrieve all of the information about all of the employees in the Employees
table, we could use the asterisk (*) as a shortcut for all of the columns, and our query looks
like
SELECT * FROM Employees
If we want only specific columns (as is usually the case), we can/should explicitly specify
them in a comma-separated list, as in
SELECT EmployeeID, FirstName, LastName, HireDate, City FROM Employees
which results in the specified fields of data for all of the rows in the table:
Explicitly specifying the desired fields also allows us to control the order in which the fields
are returned, so that if we wanted the last name to appear before the first name, we could
write
SELECT EmployeeID, LastName, FirstName, HireDate, City FROM Employees
The next thing we want to do is to start limiting, or filtering, the data we fetch from the
database. By adding a WHERE clause to the SELECT statement, we add one (or more)
conditions that must be met by the selected data. This will limit the number of rows that
answer the query and are fetched. In many cases, this is where most of the "action" of a
query takes place.
We can continue with our previous query, and limit it to only those employees living in
London:
SELECT EmployeeID, FirstName, LastName, HireDate, City FROM Employees
WHERE City = 'London'
resulting in
If you wanted to get the opposite, the employees who do not live in London, you would
write
SELECT EmployeeID, FirstName, LastName, HireDate, City FROM Employees
WHERE City <> 'London'
You are not limited to testing for equality; you can also use the standard comparison
operators that you would expect. For example, to get a list of employees who were hired
on or after a given date, you would write
SELECT EmployeeID, FirstName, LastName, HireDate, City FROM Employees
WHERE HireDate >= '1-july-1993'
Of course, we can write more complex conditions. The obvious way to do this is by having
multiple conditions in the WHERE clause. If we want to know which employees were hired
between two given dates, we could write
SELECT EmployeeID, FirstName, LastName, HireDate, City
FROM Employees
WHERE (HireDate >= '1-june-1992') AND (HireDate <= '15-december-1993')
resulting in
Note that SQL also has a special BETWEEN operator that checks to see if a value is between
two values (including equality on both ends). This allows us to rewrite the previous query
as
SELECT EmployeeID, FirstName, LastName, HireDate, City
FROM Employees
WHERE HireDate BETWEEN '1-june-1992' AND '15-december-1993'
We could also use the NOT operator, to fetch those rows that are not between the specified
dates:
SELECT EmployeeID, FirstName, LastName, HireDate, City
FROM Employees
WHERE HireDate NOT BETWEEN '1-june-1992' AND '15-december-1993'
Let us finish this section on the WHERE clause by looking at two additional, slightly more
sophisticated, comparison operators.
What if we want to check if a column value is equal to more than one value? If it is only 2
values, then it is easy enough to test for each of those values, combining them with the OR
operator and writing something like
SELECT EmployeeID, FirstName, LastName, HireDate, City FROM Employees
WHERE City = 'London' OR City = 'Seattle'
However, if there are three, four, or more values that we want to compare against, the
above approach quickly becomes messy. In such cases, we can use the IN operator to test
against a set of values. If we wanted to see if the City was either Seattle, Tacoma, or
Redmond, we would write
SELECT EmployeeID, FirstName, LastName, HireDate, City FROM Employees
WHERE City IN ('Seattle', 'Tacoma', 'Redmond')
As with the BETWEEN operator, here too we can reverse the results obtained and query
for those rows where City is not in the specified list:
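The query itself is not reproduced above; following the pattern of the earlier examples it would presumably be:

SELECT EmployeeID, FirstName, LastName, HireDate, City FROM Employees
WHERE City NOT IN ('Seattle', 'Tacoma', 'Redmond')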
The LIKE operator lets us compare a column value against a pattern built from the following
wildcard characters:

_ (underscore)   matches any single character
%                matches a string of zero or more characters
[]               matches any single character within the specified range (e.g. [a-f]) or set (e.g. [abcdef]).
[^]              matches any single character not within the specified range (e.g. [^a-f]) or set (e.g. [^abcdef]).

Some examples:
WHERE FirstName LIKE '_im' finds all three-letter first names that end with 'im' (e.g. Jim,
Tim).
WHERE LastName LIKE '%stein' finds all employees whose last name ends with 'stein'
WHERE LastName LIKE '%stein%' finds all employees whose last name includes 'stein'
anywhere in the name.
WHERE FirstName LIKE '[JT]im' finds three-letter first names that end with 'im' and begin
with either 'J' or 'T' (that is, only Jim and Tim)
WHERE LastName LIKE 'm[^c]%' finds all last names beginning with 'm' where the
following (second) letter is not 'c'.
Here too, we can opt to use the NOT operator: to find all of the employees whose first name
does not start with 'M' or 'A', we would write
SELECT EmployeeID, FirstName, LastName, HireDate, City FROM Employees
WHERE (FirstName NOT LIKE 'M%') AND (FirstName NOT LIKE 'A%')
resulting in
Until now, we have been discussing filtering the data: that is, defining the conditions that
determine which rows will be included in the final set of rows to be fetched and returned
from the database. Once we have determined which columns and rows will be included in
the results of our SELECT query, we may want to control the order in which the rows
appear: sorting the data.
To sort the data rows, we include the ORDER BY clause. The ORDER BY clause includes
one or more column names that specify the sort order. If we return to one of our first
SELECT statements, we can sort its results by City with the following statement:
SELECT EmployeeID, FirstName, LastName, HireDate, City FROM Employees
ORDER BY City
By default, the sort order for a column is ascending (from lowest value to highest value), as
shown below for the previous query:
If we want the sort order for a column to be descending, we can include the DESC keyword
after the column name.
The ORDER BY clause is not limited to a single column. You can include a comma-delimited
list of columns to sort by: the rows will all be sorted by the first column specified and then
by the next column specified. If we add the Country field to the SELECT clause and want to
sort by Country and City, we would write:
SELECT EmployeeID, FirstName, LastName, HireDate, Country, City FROM Employees
ORDER BY Country, City DESC
Note that to make it interesting, we have specified the sort order for the City column to be
descending (from highest to lowest value). The sort order for the Country column is still
ascending. We could be more explicit about this by writing
SELECT EmployeeID, FirstName, LastName, HireDate, Country, City FROM Employees
ORDER BY Country ASC, City DESC
but this is not necessary and is rarely done. The results returned by this query are
It is important to note that a column does not need to be included in the list of selected
(returned) columns in order to be used in the ORDER BY clause. If we don't need to
see/use the Country values, but are only interested in them as the primary sorting field we
could write the query as
SELECT EmployeeID, FirstName, LastName, HireDate, City FROM Employees
ORDER BY Country ASC, City DESC
Conclusion
In this article we have taken a look at the most basic elements of a SQL SELECT statement
used for common database querying tasks. This includes how to specify and filter both the
columns and the rows to be returned by the query. We also looked at how to control the
order of rows that are returned.
Although the elements discussed here allow you to accomplish many data access /
querying tasks, the SQL SELECT statement has many more options and additional
functionality. This additional functionality includes grouping and aggregating data
(summarizing, counting, and analyzing data, e.g. minimum, maximum, average values). This
article has also not addressed another fundamental aspect of fetching data from a relational
database: selecting data from multiple tables.
References
Additional and more detailed information on writing SQL queries and statements can be
found in these two books:
McManus, Jeffrey P. and Goldstein, Jackie, Database Access with Visual Basic.NET (Third
Edition), Addison-Wesley, 2003
Hernandez Michael J. and Viescas, John L., SQL Queries for Mere Mortals, Addison-Wesley,
2000.
Jackie Goldstein is the principal of Renaissance Computer Systems, specializing in consulting,
training, and development with Microsoft tools and technologies. Jackie is a Microsoft
Regional Director and MVP, founder of the Israel VB User Group, and a featured speaker at
international developer events including TechEd, VSLive!, Developer Days, and Microsoft PDC.
He is also the author of Database Access with Visual Basic.NET (Addison-Wesley, ISBN 0-672-32343-5) and a member of the INETA Speakers Bureau. In December 2003, Microsoft
designated Jackie as a .NET Software Legend.
Nested Queries: A Subquery or Inner query or Nested query is a query within another SQL query and
embedded within the WHERE clause.
A subquery is used to return data that will be used in the main query as a condition to
further restrict the data to be retrieved.
Subqueries can be used with the SELECT, INSERT, UPDATE, and DELETE statements along
with the operators like =, <, >, >=, <=, IN, BETWEEN etc.
There are a few rules that subqueries must follow:
Example:
Consider the CUSTOMERS table having the following records:
Example:
Consider a table CUSTOMERS_BKP with similar structure as CUSTOMERS table. Now to
copy complete CUSTOMERS table into CUSTOMERS_BKP, following is the syntax:
SQL> INSERT INTO CUSTOMERS_BKP
     SELECT * FROM CUSTOMERS
     WHERE ID IN (SELECT ID
                  FROM CUSTOMERS);
Example:
Assuming we have a CUSTOMERS_BKP table available, which is a backup of the CUSTOMERS table,
the following example updates SALARY to 0.25 times its current value in the CUSTOMERS table
for all customers whose AGE is greater than or equal to 27:
SQL> UPDATE CUSTOMERS
SET SALARY = SALARY * 0.25
WHERE AGE IN (SELECT AGE FROM CUSTOMERS_BKP
WHERE AGE >= 27 );
This would impact two rows and finally CUSTOMERS table would have the following
records:
+----+----------+-----+-----------+----------+
| ID | NAME | AGE | ADDRESS | SALARY |
+----+----------+-----+-----------+----------+
| 1 | Ramesh | 35 | Ahmedabad | 125.00 |
| 2 | Khilan | 25 | Delhi | 1500.00 |
| 3 | kaushik | 23 | Kota | 2000.00 |
| 4 | Chaitali | 25 | Mumbai | 6500.00 |
| 5 | Hardik | 27 | Bhopal | 2125.00 |
| 6 | Komal | 22 | MP | 4500.00 |
| 7 | Muffy | 24 | Indore | 10000.00 |
+----+----------+-----+-----------+----------+
Example:
Assuming we have a CUSTOMERS_BKP table available, which is a backup of the CUSTOMERS table,
the following example deletes records from the CUSTOMERS table for all customers whose
AGE is greater than or equal to 27:
SQL> DELETE FROM CUSTOMERS
WHERE AGE IN (SELECT AGE FROM CUSTOMERS_BKP
WHERE AGE >= 27 );
This would impact two rows and finally CUSTOMERS table would have the following
records:
+----+----------+-----+---------+----------+
| ID | NAME | AGE | ADDRESS | SALARY |
+----+----------+-----+---------+----------+
| 2 | Khilan | 25 | Delhi | 1500.00 |
| 3 | kaushik | 23 | Kota | 2000.00 |
| 4 | Chaitali | 25 | Mumbai | 6500.00 |
| 6 | Komal | 22 | MP | 4500.00 |
| 7 | Muffy | 24 | Indore | 10000.00 |
+----+----------+-----+---------+----------+
SQL Subquery
Subquery or Inner query or Nested query is a query in a query. SQL subquery is usually
added in the WHERE Clause of the SQL statement. Most of the time, a subquery is used
when you know how to search for a value using a SELECT statement, but do not know the
exact value in the database.
Subqueries are an alternate way of returning data from multiple tables.
Subqueries can be used with the following SQL statements along with the comparison
operators like =, <, >, >=, <= etc.
SELECT
INSERT
UPDATE
DELETE
Subquery Example:
1) Usually, a subquery should return only one record, but sometimes it can also return
multiple records when used with operators like IN, NOT IN in the where clause. The query
would be like,
SELECT first_name, last_name, subject
FROM student_details
WHERE games NOT IN ('Cricket', 'Football');

first_name    last_name     subject
------------- ------------- ----------
Shekar        Gowda         Badminton
Priya         Chandra       Chess
2) Let's consider the student_details table which we have used earlier. If you know the
names of the students who are studying the Science subject, you can get their ids by using
the query below,
SELECT id, first_name
FROM student_details
WHERE first_name IN ('Rahul', 'Stephen');
but if you do not know their names, then to get their ids you need to write the query in
this manner,
SELECT id, first_name
FROM student_details
WHERE first_name IN (SELECT first_name
                     FROM student_details
                     WHERE subject = 'Science');
Output:

id        first_name
--------  -------------
100       Rahul
102       Stephen
In the above SQL statement, the inner query is processed first and then the outer query
is processed.
3) A subquery can be used with the INSERT statement to add rows of data from one or more
tables to another table. Let's try to group all the students who study Maths in a table
'maths_group'.
INSERT INTO maths_group(id, name)
SELECT id, first_name || ' ' || last_name
FROM student_details WHERE subject = 'Maths';
4) A subquery can be used in the SELECT statement as follows. Let's use the product and
order_items tables defined in the sql_joins section.
select p.product_name, p.supplier_name,
       (select order_id from order_items where product_id = 101) as order_id
from product p
where p.product_id = 101;

product_name        supplier_name       order_id
------------------  ------------------  ----------
Television          Onida               5103
UNIT-2
Problems Caused by Redundancy: Storing the same information redundantly, that is, in more than one place
within a database, can lead to several problems:
- Redundant Storage: Some information is stored repeatedly.
the A field or the B field, they can differ in the C field without violating the FD. On the other
hand, if we add a tuple (a1, b1, c2, d1) to the instance shown in this figure, the resulting
instance would violate the FD; to see this violation, compare the first tuple in the figure
with the new tuple.
Decomposition
1. The previous example might seem to suggest that we should decompose schema as
much as possible.
Careless decomposition, however, may lead to another form of bad design.
2. Consider a design where Lending-schema is decomposed into two schemas:
       Branch-customer-schema = (bname, bcity, assets, cname)
       Customer-loan-schema = (cname, loan#, amount)
   We construct our new relations from lending by:
       branch-customer = PROJECT bname, bcity, assets, cname (lending)
       customer-loan = PROJECT cname, loan#, amount (lending)
13. We notice that there are tuples in branch-customer JOIN customer-loan that are not in
lending.
14. How did this happen?
o The intersection of the two schemas is cname, so the natural join is made on
the basis of equality in the cname.
o If two lendings are for the same customer, there will be four tuples in the
natural join.
o Two of these tuples will be spurious - they will not appear in the original
lending relation, and should not appear in the database.
o Although we have more tuples in the join, we have less information.
o Because of this, we call this a lossy or lossy-join decomposition.
o A decomposition that is not lossy-join is called a lossless-join
decomposition.
o The only way we could make a connection between branch-customer and
customer-loan was through cname.
15. When we decomposed Lending-schema into Branch-schema and Branch-loan-schema,
we will not have a similar problem. Why not?
16. Branch-schema = (bname, bcity, assets)
17. Branch-loan-schema = (bname, cname, loan#, amount)
o The only way we could represent a relationship between tuples in the two
relations is through bname.
o This will not cause problems.
o For a given branch name, there is exactly one assets value and branch city.
20. For a given branch name, there is exactly one assets value and exactly one bcity;
whereas a similar statement associated with a loan depends on the customer, not on
the amount of the loan (which is not unique).
21. We'll make a more formal definition of lossless-join:
   o Let R be a relation schema and let C be the set of constraints on the database.
   o A set of relation schemas {R1, R2, ..., Rn} is a decomposition of R if
     R = R1 ∪ R2 ∪ ... ∪ Rn.
   o That is, every attribute of R appears in at least one of the Ri.
   o It is always the case that:
     r ⊆ PROJECT R1 (r) JOIN PROJECT R2 (r) JOIN ... JOIN PROJECT Rn (r)
     for every relation r on schema R.
   A decomposition {R1, R2, ..., Rn} of a relation schema R is a lossless-join
   decomposition for R if, for all relations r on schema R that are legal under C:
     r = PROJECT R1 (r) JOIN PROJECT R2 (r) JOIN ... JOIN PROJECT Rn (r)
22. In other words, a lossless-join decomposition is one in which, for any legal relation r,
    if we decompose r and then "recompose" r, we get what we started with - no more
    and no less.
Lossless-Join Decomposition
1. Let R be a relation schema, F a set of functional dependencies on R, and R1, R2 a
decomposition of R. The decomposition is a lossless-join decomposition of R if at least
one of the following functional dependencies is in F+:
     R1 ∩ R2 -> R1
     R1 ∩ R2 -> R2
Why is this true? Simply put, it ensures that the attributes involved in the natural
join (R1 ∩ R2) are a candidate key for at least one of the two relations.
This ensures that we can never get the situation where spurious tuples are
generated, as for any value on the join attributes there will be a unique tuple in one
of the relations.
2. We'll now show our decomposition is lossless-join by showing a set of steps that
generate the decomposition:
   o First we decompose Lending-schema into
     Branch-schema = (bname, bcity, assets)
     Branch-loan-schema = (bname, cname, loan#, amount)
   o Since bname -> assets bcity, the augmentation rule for functional
     dependencies implies that
     bname -> bname assets bcity
   o Since Branch-schema ∩ Branch-loan-schema = {bname} and bname determines all of
     Branch-schema, the first condition above is satisfied and the decomposition is
     lossless-join.
Dependency Preservation
1. Another desirable property in database design is dependency preservation.
o We would like to check easily that updates to the database do not result in
illegal relations being created.
o It would be nice if our design allowed us to check updates without having to
compute natural joins.
o To know whether joins must be computed, we need to determine what
functional dependencies may be tested by checking each relation
individually.
o Let F be a set of functional dependencies on schema R, and let
  {R1, R2, ..., Rn} be a decomposition of R.
o The restriction of F to Ri is the set Fi of all functional dependencies in F+
  that include only attributes of Ri.
2. To test whether a decomposition D = {R1, R2, ..., Rn} is dependency-preserving, the
following algorithm can be used:

    compute F+;
    for each schema Ri in D do
    begin
        Fi := the restriction of F+ to Ri;
    end
    F' := {};
    for each restriction Fi do
    begin
        F' := F' ∪ Fi;
    end
    compute F'+;
    if (F'+ = F+) then return (true)
    else return (false);

3. Computing F+ can take exponential time. A simpler test is to check, for each functional
dependency in F, whether it can be checked within a single relation schema Ri of the
decomposition; if every dependency in F passes this test, the decomposition is
dependency-preserving.
Use this simpler method on exams and assignments (unless you have exponential
time available to you).
Normal Forms
A relation is in 1NF if all attribute values are atomic: no repeating group, no composite
attributes.
Formally, a relation may only has atomic attributes. Thus, all relations satisfy 1NF.
Example:
Consider the following table. It is not in 1NF: each department row holds a repeating group
of employees.

DEPT_NO   MANAGER_NO   EMP_NO   EMP_NAME
D101      12345        20000    Carl Sagan
                       20001    Magic Johnson
                       20002    Larry Bird
D102      13456        30000    Jimmy Carter
                       30001    Paul Simon

Flattening the repeating group gives the equivalent 1NF relation:

DEPT_NO   MANAGER_NO   EMP_NO   EMP_NAME
D101      12345        20000    Carl Sagan
D101      12345        20001    Magic Johnson
D101      12345        20002    Larry Bird
D102      13456        30000    Jimmy Carter
D102      13456        30001    Paul Simon
Problem of NFNF (non-first normal form): relational operations treat attributes as atomic.
A relation R is in 2NF if
o (a) R is in 1NF, and
o (b) all non-prime attributes are fully dependent on the candidate keys.
A prime attribute appears in a candidate key.
There is no partial dependency in 2NF: for a nontrivial FD X -> A where A is non-prime, if X
is a subset of a candidate key K, then X = K.
Example:
The following relation is not in 2NF. The relation has the following FD: Course -> Credit
(Credit depends only on Course, which is a proper subset of the candidate key {Student, Course}).

Student   Course      Credit   Grade
S1        CSCI 5333   3        A
S1        CSCI 4230
S2        CSCI 5333   3        B-
S2        CSCI 4230
S3        CSCI 5333   3        B+
A relation R is said to be in the third normal form if for every nontrivial functional
dependency X --> A,
o (1) X is a superkey, or
o (2) A is a prime (key) attribute.
An attribute is prime (a key attribute) if it appears in a candidate key. Otherwise, it is
non-prime.
Example:
The example relation for anomalies is not in 3NF.
EMPLOYEE(EMP_NO, NAME, DEPT_NO, MANAGER_NO).
with the following assumption: each department has exactly one manager, i.e.
DEPT_NO -> MANAGER_NO.

EMP_NO   NAME            DEPT_NO   MANAGER_NO
         Paul Simon      D123      54321
20000    Art Garfunkel   D123      54321
13000    Tom Jones       D123      54321
21000    Nolan Ryan      D225      42315
22000    Magic Johnson   D225      42315
31000    Carl Sagan      D337      33323

Since DEPT_NO -> MANAGER_NO, DEPT_NO is not a superkey, and MANAGER_NO is not a prime
attribute, the relation violates 3NF.
Note that it is important to consider only non-trivial FD in the definitions of both 2NF
and 3NF.
Example:
Consider R(A,B,C) with the minimal cover F: {A -> B}. Note that F |- B -> B, i.e. B -> B is in F+.
For B -> B, B is not a superkey and B is non-prime. However, B -> B is not a violation of 3NF
as it is trivial and should not be considered for potential violation.
Example:
Consider the relation
S(SUPP#, PART#, SNAME, QUANTITY) with the following assumptions:
(1) SUPP# is unique for every supplier.
(2) SNAME is unique for every supplier.
(3) QUANTITY is the accumulated quantities of a part supplied by a supplier.
(4) A supplier can supply more than one part.
(5) A part can be supplied by more than one supplier.
We can find the following nontrivial functional dependencies:
(1) SUPP# --> SNAME
(2) SNAME --> SUPP#
(3) SUPP# PART# --> QUANTITY
(4) SNAME PART# --> QUANTITY
Note that SUPP# and SNAME are equivalent.
The candidate keys are:
(1) SUPP# PART#
(2) SNAME PART#
The relation is in 3NF.
However, the relation has unnecessary redundancy:
SUPP#   SNAME   PART#   QUANTITY
S1      Yues    P1      100
S1      Yues    P2      200
S1      Yues    P3      250
S2      Jones   P1      300
4. Delete the columns you just moved from the original table except for the determinant,
which will serve as a foreign key.
5. The original table may be renamed to maintain semantic meaning.
Third Normal Form
A relational table is considered in the third normal form if all columns in the table are
dependent only upon the primary key. The five step process for transforming into a third
normal form are as follows:
1. Identify any determinants, primary key, and the columns they determine.
2. Create and name a new table for each determinant and the unique columns it
determines.
3. Move the determined columns from the original table to the new table. The determinant
becomes the primary key of the new table.
4. Delete the columns you just moved from the original table except for the determinant,
which will serve as a foreign key.
5. The original table may be renamed to maintain semantic meaning.
Third normal form is generally where relational tables should be, because it eliminates
redundant data, which saves space and reduces manipulation anomalies.
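Applying these steps to the earlier EMPLOYEE(EMP_NO, NAME, DEPT_NO, MANAGER_NO) example, where MANAGER_NO is determined by DEPT_NO, gives a sketch like the following (data types are illustrative):

    -- New table for the determinant DEPT_NO and the column it determines.
    CREATE TABLE DEPARTMENT (
        DEPT_NO    CHAR(4) PRIMARY KEY,
        MANAGER_NO CHAR(5)
    );

    -- The original table keeps DEPT_NO as a foreign key; MANAGER_NO moves out.
    CREATE TABLE EMPLOYEE (
        EMP_NO  CHAR(5) PRIMARY KEY,
        NAME    VARCHAR(40),
        DEPT_NO CHAR(4) REFERENCES DEPARTMENT (DEPT_NO)
    );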
When a relation has more than one candidate key, anomalies may result even though the
relation is in 3NF.
3NF does not deal satisfactorily with the case of a relation with overlapping candidate keys
i.e. composite candidate keys with at least one attribute in common.
BCNF is based on the concept of a determinant.
A determinant is any attribute (simple or composite) on which some other attribute is fully
functionally dependent.
A relation is in BCNF if, and only if, every determinant is a candidate key.
Consider a relation R(a, b, c, d) with primary key a,b and the following determinants:
a,c -> b,d
a,d -> b
Here, the first determinant suggests that the primary key of R could be changed from a,b to
a,c. If this change was done all of the non-key attributes present in R could still be
determined, and therefore this change is legal. However, the second determinant indicates
that a,d determines b, but a,d could not be the key of R as a,d does not determine all of the
non key attributes of R (it does not determine c). We would say that the first determinate is
a candidate key, but the second determinant is not a candidate key, and thus this relation is
not in BCNF (but is in 3rd normal form).
Example: consider a relation recording which doctor sees which patient at which time, with
sample data:

Patient   Time    Doctor
John      09:00   Zorro
Kerr      09:00   Killer
Adam      10:00   Zorro
Robert    13:00   Killer
Zane      14:00   Zorro
Example to understand 4NF: Take the following table structure as an example:
info(employee#, skills, hobbies)
A table is in fourth normal form (4NF) if and only if it is in BCNF and contains no more than
one multi-valued dependency.
1. Anomalies can occur in relations in BCNF if there is more than one multi-valued dependency.
2. If A--->B and A--->C but B and C are unrelated, ie A--->(B,C) is false, then we have more than one
multi-valued dependency.
3. A relation is in 4NF when it is in BCNF and has no more than one multi-valued dependency.
employee#   skills        hobbies
1           Programming   Golf
1           Programming   Bowling
1           Analysis      Golf
1           Analysis      Bowling
2           Analysis      Golf
2           Analysis      Gardening
2           Management    Golf
2           Management    Gardening
This table is difficult to maintain since adding a new hobby requires multiple new rows
corresponding to each skill. This problem is created by the pair of multi-valued dependencies
EMPLOYEE#--->SKILLS and EMPLOYEE#--->HOBBIES. A much better alternative would
be to decompose INFO into two relations:
skills(employee#, skill)

employee#   skills
1           Programming
1           Analysis
2           Analysis
2           Management

hobbies(employee#, hobby)

employee#   hobbies
1           Golf
1           Bowling
2           Golf
2           Gardening
Properties of 5NF:
Anomalies can occur in relations in 4NF if the primary key has three or more fields.
5NF is based on the concept of join dependence - if a relation cannot be decomposed any
further then it is in 5NF.
Pair wise cyclical dependency means that:
o You always need to know two values (pair wise).
o For any one you must know the other two (cyclical).
Take the following table structure as an example of a buying table. This is used to track buyers,
what they buy, and from whom they buy. Take the following sample data:

buyer   vendor          item
Sally   Liz Claiborne   Blouses
Mary    Liz Claiborne   Blouses
Sally   Jordach         Jeans
Mary    Jordach         Jeans
Sally   Jordach         Sneakers
Problem:- The problem with the above table structure is that if Claiborne starts to sell Jeans then
how many records must you create to record this fact? The problem is there are pair wise cyclical
dependencies in the primary key. That is, in order to determine the item you must know the
buyer and vendor, and to determine the vendor you must know the buyer and the item, and
finally to know the buyer you must know the vendor and the item.
Solution:- The solution is to break this one table into three tables; Buyer-Vendor, Buyer-Item,
and Vendor-Item. So following tables are in the 5NF.
Buyer-Vendor

buyer   vendor
Sally   Liz Claiborne
Mary    Liz Claiborne
Sally   Jordach
Mary    Jordach
Buyer-Item

buyer   item
Sally   Blouses
Mary    Blouses
Sally   Jeans
Mary    Jeans
Sally   Sneakers
Vendor-Item

vendor          item
Liz Claiborne   Blouses
Jordach         Jeans
Jordach         Sneakers
UNIT-3
What is a Transaction?
A transaction is an event which occurs on the database. Generally a transaction reads a
value from the database or writes a value to the database. If you have any concept of
Operating Systems, then we can say that a transaction is analogous to a process.
Although a transaction can both read and write on the database, there are some
fundamental differences between these two classes of operations. A read operation does
not change the image of the database in any way. But a write operation, whether performed
with the intention of inserting, updating or deleting data from the database, changes the
image of the database. That is, we may say that these transactions bring the database from
an image which existed before the transaction occurred (called the Before Image or BFIM)
to an image which exists after the transaction occurred (called the After Image or AFIM).
Atomicity
All changes to data are performed as if they are a single operation. That is, all the
changes are performed, or none of them are.
For example, in an application that transfers funds from one account to another, the
atomicity property ensures that, if a debit is made successfully from one account,
the corresponding credit is made to the other account.
Consistency
Data is in a consistent state when a transaction starts and when it ends.
For example, in an application that transfers funds from one account to another, the
consistency property ensures that the total value of funds in both the accounts is the
same at the start and end of each transaction.
Isolation
The intermediate state of a transaction is invisible to other transactions. As a result,
transactions that run concurrently appear to be serialized.
For example, in an application that transfers funds from one account to another, the
isolation property ensures that another transaction sees the transferred funds in
one account or the other, but not in both, nor in neither.
Durability
After a transaction successfully completes, changes to data persist and are not
undone, even in the event of a system failure.
For example, in an application that transfers funds from one account to another, the
durability property ensures that the changes made to each account will not be
reversed.
Or
ACID Properties:In computer science, ACID (Atomicity, Consistency, Isolation, Durability ) is a set of
properties that guarantee that database transactions are processed reliably. In the context
of databases, a single logical operation on the data is called a transaction. For example, a
transfer of funds from one bank account to another, even involving multiple changes such
as debiting one account and crediting another, is a single transaction.
Jim Gray defined these properties of a reliable transaction system in the late 1970s and
developed technologies to achieve them automatically.
In 1983, Andreas Reuter and Theo Härder coined the acronym ACID to describe them.
Consider a table with two integer columns, A and B, and an integrity constraint requiring
that the value in A and the value in B must sum to 100. The following SQL code creates a
table as described above:
CREATE TABLE acidtest (A INTEGER, B INTEGER CHECK (A + B = 100));
Atomicity failure
Assume that a transaction attempts to subtract 10 from A and add 10 to B. This is a valid
transaction, since the data continue to satisfy the constraint after it has executed. However,
assume that after removing 10 from A, the transaction is unable to modify B. If the database
retained A's new value, atomicity and the constraint would both be violated. Atomicity
requires that both parts of this transaction, or neither, be complete.
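In SQL the transfer would be wrapped in a transaction roughly as below (exact transaction syntax varies between systems); if the second UPDATE cannot be performed, the DBMS must roll the whole transaction back so that A keeps its old value:

    BEGIN TRANSACTION;
        UPDATE acidtest SET A = A - 10;
        UPDATE acidtest SET B = B + 10;
    -- If either UPDATE fails, ROLLBACK undoes both changes;
    -- otherwise COMMIT makes them permanent together.
    COMMIT;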
Consistency failure
Consistency is a very general term, which demands that the data must meet all validation
rules. In the previous example, the validation is a requirement that A + B = 100. Also, it may
be inferred that both A and B must be integers. A valid range for A and B may also be
inferred. All validation rules must be checked to ensure consistency.
Isolation failure
To demonstrate isolation, assume two transactions execute at the same time, each
attempting to modify the same data: T1 transfers 10 from A to B, and T2 transfers 10 from
B to A. Combined, there are four actions:
T1 subtracts 10 from A.
T1 adds 10 to B.
T2 subtracts 10 from B.
T2 adds 10 to A.
If these operations are performed in order, isolation is maintained, although T2 must wait.
Consider what happens if T1 fails half-way through. The database eliminates T1's effects,
and T2 sees only valid data.
By interleaving the transactions, the actual order of actions might be:
T1 subtracts 10 from A.
T2 subtracts 10 from B.
T2 adds 10 to A.
T1 adds 10 to B.
Again, consider what happens if T1 fails halfway through. By the time T1 fails, T2 has
already modified A; it cannot be restored to the value it had before T1 without leaving an
invalid database. This is known as a write-write failure, because two
transactions attempted to write to the same data field. In a typical system, the problem
would be resolved by reverting to the last known good state, canceling the failed
transaction T1, and restarting the interrupted transaction T2 from the good state.
Durability failure
Assume that a transaction transfers 10 from A to B. It removes 10 from A. It then adds 10 to
B. At this point, a "success" message is sent to the user. However, the changes are still
queued in the disk buffer waiting to be committed to the disk. Power fails and the changes
are lost. The user assumes (understandably) that the changes have been made.
Lock-based concurrency control uses two basic lock modes: shared and exclusive. Several
transactions can acquire a shared lock or read lock on the same data item
simultaneously. When a transaction achieves an exclusive lock on a particular data item, no
other transactions are allowed to read or update that data item, as read-write and write-write
operations are conflicting. A transaction can acquire locks on data items of various
sizes, ranging from the entire database down to a data field. The size of the data item
determines the fineness or granularity of the lock.
In a distributed database system, the lock manager or scheduler is responsible for
managing locks for different transactions that are running on that system. When any
transaction requires read or write lock on data items, the transaction manager passes this
request to the lock manager. It is the responsibility of the lock manager to check whether
that data item is currently locked by another transaction or not. If the data item is locked
by another transaction and the existing locking mode is incompatible with the lock
requested by the current transaction, the lock manager does not allow the current
transaction to obtain the lock; hence, the current transaction is delayed until the existing
lock is released. Otherwise, the lock manager permits the current transaction to obtain the
desired lock and the information is passed to the transaction manager. In addition to these
rules, some systems initially allow the current transaction to acquire a read lock on a data
item, if that is compatible with the existing lock, and later the lock is converted into a write
lock. This is called upgradation of lock. The level of concurrency increases by upgradation
of locking. Similarly, to allow maximum concurrency some systems permit the current
transaction to acquire a write lock on a data item, and later the lock is converted into a read
lock; this is called downgradation of lock.
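Many SQL systems let a transaction ask for these lock modes explicitly; the statements below are a sketch (syntax and availability differ between DBMSs, and the account table is hypothetical):

    -- Shared (read) lock: other readers are still allowed.
    SELECT balance FROM account WHERE account_no = 1 FOR SHARE;

    -- Exclusive (write) lock: no other transaction may lock or modify
    -- the row until this transaction ends.
    SELECT balance FROM account WHERE account_no = 1 FOR UPDATE;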
Locks
When one thread of control wants to obtain access to an object, it requests a lock for that
object. This lock is what allows JE to provide your application with its transactional
isolation guarantees by ensuring that:
no other thread of control can read that object (in the case of an exclusive lock), and
no other thread of control can modify that object (in the case of an exclusive or non-exclusive lock).
Lock Resources
When locking occurs, there are conceptually three resources in use:
1. The locker.
This is the thing that holds the lock. In a transactional application, the locker is a
transaction handle. For non-transactional operations, the locker is the current
thread.
2. The lock.
This is the actual data structure that locks the object. In JE, a locked object structure
in the lock manager is representative of the object that is locked.
3. The locked object.
The thing that your application actually wants to lock. In a JE application, the locked
object is usually a database record.
JE has not set a limit for the maximum number of these resources you can use. Instead, you
are only limited by the amount of memory available to your application.
The following figure shows a transaction handle, Txn A, that is holding a lock on
database record 002. In this graphic, Txn A is the locker, and the locked object is record 002.
Only a single lock is in use in this operation.
Types of Locks
JE applications support both exclusive and non-exclusive locks. Exclusive locks are granted
when a locker wants to write to an object. For this reason, exclusive locks are also
sometimes called write locks.
An exclusive lock prevents any other locker from obtaining any sort of a lock on the object.
This provides isolation by ensuring that no other locker can observe or modify an
exclusively locked object until the locker is done writing to that object.
Non-exclusive locks are granted for read-only access. For this reason, non-exclusive locks
are also sometimes called read locks. Since multiple lockers can simultaneously hold read
locks on the same object, read locks are also sometimes called shared locks.
A non-exclusive lock prevents any other locker from modifying the locked object while the
locker is still reading the object. This is how transactional cursors are able to achieve
repeatable reads; by default, the cursor's transaction holds a read lock on any object that
the cursor has examined until such a time as the transaction is committed or aborted.
In the following figure, Txn A and Txn B are both holding read locks on record 002,
while Txn C is holding a write lock on record 003:
Lock Lifetime
A locker holds its locks until such a time as it does not need the lock any more. What this
means is:
1. A transaction holds any locks that it obtains until the transaction is committed or
aborted.
2. All non-transaction operations hold locks until such a time as the operation is
completed. For cursor operations, the lock is held until the cursor is moved to a new
position or closed.
Blocks
Simply put, a thread of control is blocked when it attempts to obtain a lock, but that
attempt is denied because some other thread of control holds a conflicting lock. Once
blocked, the thread of control is temporarily unable to make any forward progress until the
requested lock is obtained or the operation requesting the lock is abandoned.
Be aware that when we talk about blocking, strictly speaking the thread is not what is
attempting to obtain the lock. Rather, some object within the thread (such as a cursor) is
attempting to obtain the lock. However, once a locker attempts to obtain a lock, the entire
thread of control must pause until the lock request is in some way resolved.
For example, if Txn A holds a write lock (an exclusive lock) on record 002, then if Txn
B tries to obtain a read or write lock on that record, the thread of control in which Txn B is
running is blocked:
However, if Txn A only holds a read lock (a shared lock) on record 002, then only those
handles that attempt to obtain a write lock on that record will block.
Moreover, any read locks that are requested while Txn C is waiting for its write lock will
also block until such a time as Txn C has obtained and subsequently released its write lock.
Avoiding Blocks
Reducing lock contention is an important part of performance tuning your concurrent JE
application. Applications that have multiple threads of control obtaining exclusive (write)
locks are prone to contention issues. Moreover, as you increase the numbers of lockers and
as you increase the time that a lock is held, you increase the chances of your application
seeing lock contention.
As you are designing your application, try to do the following in order to reduce lock
contention:
If possible, access heavily accessed (read or write) items toward the end of the
transaction. This reduces the amount of time that a heavily used record is locked by
the transaction.
Reduce your application's isolation guarantees.
By reducing your isolation guarantees, you reduce the situations in which a lock can
block another lock. Try using uncommitted reads for your read operations in order
to prevent a read lock being blocked by a write lock.
In addition, for cursors you can use degree 2 (read committed) isolation, which
causes the cursor to release its read locks as soon as it is done reading the record (as
opposed to holding its read locks until the transaction ends).
Be aware that reducing your isolation guarantees can have adverse consequences
for your application. Before deciding to reduce your isolation, take care to examine
your application's isolation requirements. For information on isolation levels,
see Isolation.
If you can arrange your threads of control so that they operate only on non-overlapping
portions of your database, then you can reduce lock contention because your threads will
rarely (if ever) block on one another's locks.
Deadlocks
A deadlock occurs when two or more threads of control are blocked, each waiting on a
resource held by the other thread. When this happens, there is no possibility of the threads
ever making forward progress unless some outside agent takes action to break the
deadlock.
For example, if Txn A is blocked by Txn B at the same time Txn B is blocked by Txn A then
the threads of control containing Txn A and Txn B are deadlocked; neither thread can make
any forward progress because neither thread will ever release the lock that is blocking the
other thread.
When two threads of control deadlock, the only solution is to have a mechanism external to
the two threads capable of recognizing the deadlock and notifying at least one thread that it
is in a deadlock situation. Once notified, a thread of control must abandon the attempted
operation in order to resolve the deadlock. JE is capable of notifying your application when
it detects a deadlock. (For JE, this is handled in the same way as any lock conflict that a JE
application might encounter.) See Managing Deadlocks and other Lock Conflicts for more
information.
Note that when one locker in a thread of control is blocked waiting on a lock held by
another locker in that same thread of control, the thread is said to be self-deadlocked.
Note that in JE, a self-deadlock can occur only if two or more transactions (lockers) are
used in the same thread. A self-deadlock cannot occur for non-transactional usage, because
the thread is the locker. However, even if you have only one locker per thread, there is still
the possibility of a deadlock occurring with another thread of control (it just will not be a
self-deadlock), so you still must write code that defends against deadlocks.
Deadlock Avoidance
The things that you do to avoid lock contention also help to reduce deadlocks (see Avoiding
Blocks).Beyond that, you should also make sure all threads access data in the same order as
all other threads. So long as threads lock records in the same basic order, there is no
possibility of a deadlock (threads can still block, however).
Be aware that if you are using secondary databases (indexes), then locking order is
different for reading and writing. For this reason, if you are writing a concurrent
application and you are using secondary databases, you should expect deadlocks.
Concurrency control:
In information technology and computer science, especially in the fields of computer
programming, operating systems, multiprocessors, and databases, concurrency control
ensures that correct results for concurrent operations are generated, while getting those
results as quickly as possible.
Computer systems, both software and hardware, consist of modules, or components. Each
component is designed to operate correctly, i.e., to obey or to meet certain consistency
rules. When components that operate concurrently interact by messaging or by sharing
accessed data (in memory or storage), a certain component's consistency may be violated
by another component. The general area of concurrency control provides rules, methods,
design methodologies, and theories to maintain the consistency of components operating
concurrently while interacting, and thus the consistency and correctness of the whole
system. Introducing concurrency control into a system means applying operation
constraints which typically result in some performance reduction. Operation consistency
and correctness should be achieved with as good as possible efficiency, without reducing
performance below reasonable levels. Concurrency control can require significant
additional complexity and overhead in a concurrent algorithm compared to the simpler
sequential algorithm.
For example, a failure in concurrency control can result in data corruption from torn read
or write operations.
Concurrency control theory has two classifications for the methods of instituting
concurrency control:
Pessimistic concurrency control
A system of locks prevents users from modifying data in a way that affects other users.
After a user performs an action that causes a lock to be applied, other users cannot perform
actions that would conflict with the lock until the owner releases it. This is called
pessimistic concurrency control.
Serializability:
In concurrency control of databases, transaction processing (transaction management),
and various transactional applications (e.g., transactional memory and software
transactional memory), both centralized and distributed, a transaction schedule is
serializable if its outcome (e.g., the resulting database state) is equal to the outcome of its
transactions executed serially, i.e., sequentially without overlapping in time. Transactions
are normally executed concurrently (they overlap), since this is the most efficient way.
Serializability is the major correctness criterion for concurrent transactions' executions. It
is considered the highest level of isolation between transactions, and plays an essential role
in concurrency control. As such it is supported in all general purpose database systems.
Strong strict two-phase locking (SS2PL) is a popular serializability mechanism utilized in
most of the database systems (in various variants) since their early days in the 1970s.
Serializability theory provides the formal framework to reason about and analyze
serializability and its techniques. Though it is mathematical in nature, its fundamentals are
informally (without mathematical notation) introduced below.
At the commit point, all transaction operations have been logged and a new entry,
'commit T', is written to the log, stating that all of the transaction's operations have been
permanently logged. Before writing 'commit T', the complete log should be written to disk
from the buffers.
Conflict serializable
A schedule is conflict serializable if it is conflict equivalent to some serial schedule; that is,
we can reorder its non-conflicting operations to obtain a serial schedule.
Two operations conflict when:
1) they are on the same data item,
2) at least one of them is a write, and
3) they belong to different transactions.
Conflicting operations are non-commutative, that is, their order matters.
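For example (a constructed illustration, not taken from the text above), consider two transactions T1 and T2 operating on data items A and B:

S : R1(A) W1(A) R2(A) W2(A) R1(B) W1(B)

The conflicting pairs are W1(A)/R2(A), W1(A)/W2(A) and R1(A)/W2(A), and in each pair the T1 operation comes first. R2(A) and W2(A) do not conflict with R1(B) and W1(B) (different data items), so they can be swapped past them, giving the serial schedule R1(A) W1(A) R1(B) W1(B) R2(A) W2(A), i.e. T1 followed by T2. Hence S is conflict serializable.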
DB Locking
DBMS is often criticized for excessive locking resulting in poor database performance
when sharing data among multiple concurrent processes. Is this criticism justified, or is
DBMS being unfairly blamed for application design and implementation shortfalls? To
evaluate this question, we need to understand more about DBMS locking protocols. In this
article, we examine how, why, what and when DBMS locks and unlocks database resources.
Future articles will address how to minimize the impact of database locking.
THE NEED FOR LOCKING
In an ideal concurrent environment, many processes can simultaneously access data in a
DBMS database, each having the appearance that they have exclusive access to the
database. In practice, this environment is closely approximated by careful use of locking
protocols.
Locking is necessary in a concurrent environment to assure that one process does not
retrieve or update a record that is being updated by another process. Failure to use some
controls (locking), would result in inconsistent and corrupt data.
In addition to record locking, DBMS implements several other locking mechanisms to
ensure the integrity of other data structures that provide shared I/O, communication
among different processes in a cluster and automatic recovery in the event of a process or
cluster failure. While these other lock structures use additional VMS lock resources, they
rarely hinder database concurrency, but can actually improve database performance.
HOW DBMS USES LOCKS
DBMS makes extensive use of the VMS Distributed Lock Manager for controlling virtually
every aspect of database access. Use of the Distributed Lock Manager ensures cluster-wide
control of database resources, thus allowing DBMS to take advantage of OpenVMS'
clustering technology.
VMS locks consume system resources. A typical process running a DBMS application may
lock hundreds or thousands of records and database pages at a time. Using a VMS lock for
each of these resources in a busy database could easily exhaust these resources. The
system parameters LOCKIDTBL, LOCKIDTBL_MAX, and REHASHTBL
determine the number of locks that can exist on the system at any one time.
To minimize the number of VMS locks required to maintain record and page integrity,
DBMS implements a technique called adjustable locking granularity. This allows DBMS to
manage a group of resources (pages or records) using a single VMS lock. When a conflicting
request is made for the same resource group, the process that is holding the lock is notified
that it is blocking another process and automatically reduces the locking-level of the larger
group.
Adjustable page locking is mandatory and hidden from the database administrator, while
adjustable record locking can be enabled and tuned or disabled for each database. When
adjustable record locking is enabled, DBMS attempts to minimize the number of VMS locks
required to maintain database integrity without impacting database concurrency.
TYPES OF LOCKS
DBMS employs many types of locks to ensure database integrity in a concurrent
environment. By using various lock types for different functions, DBMS can provide optimal
performance in many different environments.
- Area Locks
DBMS uses area locks to implement the DML (Data Manipulation Language) READY
statement. If a realm is readied by another run unit, later READY usage modes by other
run-units must be compatible with all existing READY usage modes.
Area locks can significantly affect database concurrency; however, their impact is felt only
during a DML READY statement. Lock conflicts for area locks occur only when you attempt
to READY a realm. Once you successfully READY a realm, concurrent locking protocols (if
required) are handled at the page and record level. Table I displays compatible area READY
modes.
TABLE I: AREA READY MODE COMPATIBILITY TABLE

                                        First Run Unit
  Second Run Unit        Concurrent  Protected   Concurrent  Protected   Exclusive
                         Retrieval   Retrieval   Update      Update
  Concurrent Retrieval   GO          GO          GO          GO          WAIT
  Protected Retrieval    GO          GO          WAIT        WAIT        WAIT
  Concurrent Update      GO          WAIT        GO          WAIT        WAIT
  Protected Update       GO          WAIT        WAIT        WAIT        WAIT
  Exclusive              WAIT        WAIT        WAIT        WAIT        WAIT
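For illustration only, the compatibility matrix of Table I can be encoded as a small lookup,
so that a hypothetical helper can answer whether a second run-unit's READY request
proceeds (GO) or must wait; the GO/WAIT values come straight from the table, while the
helper itself is not part of DBMS.

    # Encode Table I as a lookup: given the READY mode already held by the
    # first run-unit and the mode a second run-unit requests, answer GO or WAIT.

    MODES = ["concurrent retrieval", "protected retrieval",
             "concurrent update", "protected update", "exclusive"]

    # rows: mode requested by the second run-unit; columns follow MODES order
    COMPAT = {
        "concurrent retrieval": ["GO",   "GO",   "GO",   "GO",   "WAIT"],
        "protected retrieval":  ["GO",   "GO",   "WAIT", "WAIT", "WAIT"],
        "concurrent update":    ["GO",   "WAIT", "GO",   "WAIT", "WAIT"],
        "protected update":     ["GO",   "WAIT", "WAIT", "WAIT", "WAIT"],
        "exclusive":            ["WAIT", "WAIT", "WAIT", "WAIT", "WAIT"],
    }

    def ready_outcome(requested_mode, held_mode):
        return COMPAT[requested_mode][MODES.index(held_mode)]

    print(ready_outcome("concurrent update", "protected retrieval"))  # WAIT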
- Page Locks
Page locks are used to manage the integrity of the page buffer pool. DBMS automatically
resolves page lock conflicts by using the blocking AST features of the VMS lock manager.
Thus, page locks are not typically a major impediment to database concurrency unless
long-DML verbs are frequently executed in your environment. DBMS utilizes adjustable
locking to minimize the number of VMS locks required to maintain consistency of the buffer
pool. A high level of blocking ASTs is an indication that there is a lot of contention for
database pages in the buffer pool. Reducing the buffer length may help to reduce the
overhead of page level blocking ASTs.
- Record Locks
Record locks are typically the largest source of lock conflicts in a DBMS environment.
Record locks are used to manage the integrity of your data, and to implement the
"adjustable record locking granularity" feature of DBMS. Adjustable locking is the default
for record locks, but can be tuned or disabled by the DBA.
- Quiet Point Locks
Quiet point locks are used to control online database and afterimage journal backup
operations. Large quiet point lock stall times indicate that processes are waiting for online
backups to begin, or for the primary after-image journal file to be written to secondary
storage. To minimize the effects (duration) of quiet point locks, it is important that all
concurrent database processes (except for batch retrieval transactions) periodically
execute commits (or commit retaining). Even "concurrent retrieval" transactions should
periodically "commit [retaining]" their transactions. This ensures that the online backups
will achieve a "quiet point" quickly and allow new transactions to proceed.
- Freeze Locks
Freeze locks are used to stop (freeze) database activity during database process recovery.
When a process terminates abnormally (as a result of a process or node failure, STOP/ID,
or a CTRL-Y/STOP), all locks held by that process are automatically released. If
transactions were allowed to continue, database corruption would result. Thus, when a
process terminates abnormally, DBMS uses the freeze lock to stop database activity until
the failed process(es) can be recovered. Freeze locks typically are not a major source of
contention in most environments. However, if you are subject to frequent system or
process failures, or users are using CTRL-Y/STOP to exit from programs, freeze locks could
hinder database concurrency.
- DATABASE QUALIFIERS
Several of the DBMS creation and modification qualifiers have a direct impact on database
locking characteristics. Establishing the appropriate mix of qualifiers in your environment
can help minimize the impact of database locking.
- /HOLD_RETRIEVAL_LOCKS
The [no]hold_retrieval_locks qualifier determines whether DBMS holds read-only record
locks on all records read for the duration of the transaction (until the next COMMIT
[without the RETAINING option] or ROLLBACK). Holding retrieval locks guarantees that
any records previously read during a transaction will not have been changed by another
run-unit during the same transaction. While this increases the consistency of your
transaction, it can significantly degrade concurrency. This option should only be used if your
transactions read very few records and consistency of all records read must be guaranteed
throughout the transaction. By default, DBMS uses /NOHOLD_RETRIEVAL_LOCKS. The
logical name, DBM$BIND_HOLD_RETRIEVAL_LOCKS may be used to override the default
established in the root file. If DBM$BIND_HOLD_RETRIEVAL_LOCKS translates to "1" then
all records read by the transaction are locked until the end of the transaction. Software
Concepts International recommends against using hold retrieval locks in most
environments.
- /[NO]WAIT_RECORD_LOCKS
The [no]wait_record_locks qualifier determines whether a run-unit waits when requesting
a record that is locked in a conflicting mode by another run-unit, or whether it receives a
"lock conflict" exception. This qualifier only determines whether the requesting run-unit
will receive a "lock conflict" exception, not a "deadlock" exception (deadlock exceptions
are always returned when they occur). When the default (WAIT_RECORD_LOCKS) is used,
DBMS will not generate a "lock conflict" exception, and the blocked process will continue to
wait until the record is unlocked. Thus, the process can continue to wait indefinitely until
the record is unlocked by the other run-unit.
The logical name, DBM$BIND_WAIT_RECORD_LOCKS may be used to override the default
established in the root file. Again, a value of "1" enables wait on record lock conflicts, and a
value of "0" causes the process to receive the "lock conflict" exception. Software Concepts
International recommends clients to WAIT on record conflicts. This allows the application
to trap for "deadlocks," and avoids "live-lock" situations that cannot be detected. In
addition, the wait on record conflicts can be used with the /TIMEOUT to give the
application control over records locked for an excessive duration.
- /TIMEOUT=LOCK=seconds
The timeout qualifier allows you to specify the amount of time that a run-unit waits for a
locked record before returning a "lock timeout" exception. This qualifier must be used with
the "wait" on record locks (above). The logical name
DBM$BIND_LOCK_TIMEOUT_INTERVAL may be used to override the default established in
the root file. The value of the translation determines the number of seconds to wait for a
locked record. If your applications trap the DBM$TIMEOUT exceptions, then Software
Concepts International recommends using lock timeouts with a time of at least 60 seconds.
Use the /TIMEOUT qualifier only if your application is designed to handle "lock timeout"
exceptions. COBOL shops that use declaratives may want to handle "DBM$_DEADLOCK",
"DBM$LCKCNFLCT", and "DBM$TIMEOUT" exceptions in the same "USE" section.
- /ADJUSTABLE_LOCKING
Enabling, disabling, or modifying the values of the adjustable locking features of DBMS will
not significantly reduce record lock conflicts. However, adjustable locking can significantly
affect the amount of lock resources your application uses, as well as the overall overhead
associated with record locking.
The DBO/SHOW STATISTICS (record locking) screen provides useful insights into the
potential benefits and costs of adjustable locking. If you observe a blocking AST rate that is
more than 20-25% of the number of locks requested plus locks promoted, then this may
indicate significant adjustable locking overhead. In this case, try disabling adjustable
locking, or reducing the number of levels in its tree.
- /[NO]LOCK_OPTIMIZATION
Lock optimization sounds so obvious. Who wouldn't want "lock optimization?" Lock
optimization (the default) only controls whether area locks are held from one transaction
to another. This avoids the overhead of acquiring and releasing locks for each transaction.
In environments where long DML verbs are frequently executed, lock optimization may
actually degrade performance. This is because the process holding the lock does not release
the NOWAIT lock until the end of its current DML verb. Thus, if the current DML verb takes
a long time to complete, the process trying to ready the realm may experience a long delay.
- /SNAPSHOTS=(option)
Snapshots are included in this discussion of locking, because the use of snapshots (batch
retrieval transactions) can significantly reduce the level of lock contention in your
database. Although snapshot transactions are subject to page and other resource lock
conflicts, they are never involved in record lock conflicts, thus providing significantly
increased concurrency between read-only and update transactions.
Enabling snapshots is not, however, a panacea. All update processes (except EXCLUSIVE
or BATCH) must write before-images of their updates to the snapshot files. Use of the
/DEFERRED qualifier minimizes this effect by allowing update processes to write to the
snapshot file only when snapshot transactions are active.
- BUFFER COUNT
Additional or excessive buffers require additional page level locking to manage the buffer
pool. If you are using large buffer counts, you may need to increase the enqueue limits on
your processes, as well as the SYSGEN parameters LOCKIDTBL, LOCKIDTBL_MAX and
REHASHTBL.
- DBMS LOCK EXCEPTIONS
DBMS signals one of three types of exceptions when a process encounters a locked record:
a deadlock, a lock conflict, or a lock timeout.
- Deadlocks Exceptions
A deadlock exception, DBM$_DEADLOCK, is returned when two run-units attempt to access
a resource in mutually exclusive modes, and each run-unit is waiting for a resource that
the other run-unit holds. This indicates that neither run-unit can continue unless one of the
run-units releases its locks. When a deadlock occurs, DBMS will choose a "victim," and
signal that run-unit of the deadlock condition. This does not cause the "victim" to
automatically release its locks. The victim process should immediately execute a 'rollback'
to release its locks.
- Lock Conflict Exceptions
DBMS will only return the lock conflict exception, DBM$_LCKCNFLCT, when the run-unit is
bound to a database with "/NOWAIT_RECORD_LOCKS" enabled and it attempts to access a
record that is locked in a mutually exclusive mode by another run-unit. Note, that only the
"blocked" run-unit receives the exception.
- Lock Timeout Exceptions
The third type of exception is the lock timeout exception, DBM$TIMEOUT. A lock timeout
only occurs when the "/TIMEOUT=LOCK=nnn" and "/NOWAIT_RECORD_LOCKS" are
enabled and a run-unit attempts to access a record that is locked in a mutually exclusive
mode by another run-unit.
Performance of Locking
Normally, two factors govern the performance of locking, namely, resource
contention and data contention. Resource contention refers to the contention over
memory space, computing time and other resources. It determines the rate at which a
transaction executes between its lock requests. On the other hand, data contention refers
to the contention over data. It determines the number of currently executing transactions.
Now, assume that the concurrency control is turned off; in that case the transactions suffer
from resource contention. For high loads, the system may thrash, that is, the throughput of
the system first increases and then decreases. Initially, the throughput increases since only
a few transactions request the resources. Later, with the increase in the number of
transactions, the throughput decreases. If the system has enough resources (memory
space, computing power, etc.) to make the contention over resources negligible, the
transactions suffer only from data contention. For high loads, the system may thrash due
to aborting (rollback) and blocking. Both mechanisms degrade performance.
Timestamp-Based Technique
So far, we have seen that locking with the two-phase locking protocol ensures the
serializability of schedules. Two-phase locking generates serializable schedules based
on the order in which the transactions acquire the locks on the data items. A transaction
requesting a lock on a locked data item may be forced to wait till the data item is unlocked.
Serializability of the schedules can also be ensured by another method, which involves
ordering the execution of the transactions in advance using timestamps.
Timestamp-based concurrency control is a non-lock concurrency control
technique, hence, deadlocks cannot occur.
As an example of a lost update: bank teller #1 looks up your account, sees a balance of
$1,000, and begins processing your $200 withdrawal, then steps away from the terminal.
At 11:01 AM, another teller #2 looks up your account and still sees the $1,000 balance.
Teller #2 then adds your $300 deposit and saves your new account balance as $1,300.
At 11:09 AM, bank teller #1 returns to the terminal, finishes entering and saving the
updated value that is calculated to be $800. That $800 value writes over the $1300.
At the end of the day, your account has $800 when it should have had $1,100 ($1000 + 300
- 200).
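A toy Python sketch of this lost-update scenario follows; it is illustrative only, and the lock
models the pessimistic concurrency control described earlier, forcing the two
read-modify-write sequences to run one after the other so that neither update is lost.

    # Two "tellers" read the same balance and write back their own result.
    # Guarding the read-modify-write with a lock keeps both updates.
    import threading

    balance = 1000
    lock = threading.Lock()

    def apply_change(delta):
        global balance
        with lock:                      # without this lock, one update can be lost
            current = balance           # read
            current += delta            # compute
            balance = current           # write back

    t1 = threading.Thread(target=apply_change, args=(-200,))   # teller #1 withdrawal
    t2 = threading.Thread(target=apply_change, args=(+300,))   # teller #2 deposit
    t1.start(); t2.start(); t1.join(); t2.join()
    print(balance)   # 1100 with the lock; could be 800 or 1300 without it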
Crash Recovery
Although we live in a highly advanced technological era, where hundreds of satellites
monitor the earth and billions of people are connected every second through information
technology, failures are expected, but they are not always acceptable.
A DBMS is a highly complex system with hundreds of transactions being executed every
second. The availability of a DBMS depends on its complex architecture and on the
underlying hardware and system software. If it fails or crashes while transactions are being
executed, it is expected that the system will follow some sort of algorithm or technique to
recover from the crash or failure.
Failure Classification
To see where the problem has occurred we generalize the failure into various categories, as
follows:
TRANSACTION FAILURE
When a transaction fails to execute, or reaches a point after which it cannot be completed
successfully, it has to abort. This is called transaction failure, where only a few transactions
or processes are affected.
Reasons for transaction failure include:
Logical errors: a transaction cannot complete because of a code error or some internal
error condition.
System errors: the database system itself terminates an active transaction because the
DBMS is unable to execute it, or because of some system condition. For example, in case of
deadlock or resource unavailability, the system aborts an active transaction.
SYSTEM CRASH
There are problems external to the system that may cause the system to stop abruptly and
crash. For example, an interruption in the power supply, or failure of the underlying
hardware or software. Examples may also include operating system errors.
DISK FAILURE:
In the early days of technology evolution, it was a common problem that hard disk drives or
storage drives failed frequently.
Disk failures include the formation of bad sectors, unreachability of the disk, a disk head
crash, or any other failure that destroys all or part of the disk storage.
Storage Structure
We have already described the storage system. In brief, the storage structure can be
divided into the following categories:
Volatile storage: As the name suggests, this storage does not survive system crashes. It is
mostly placed very close to the CPU, by embedding it on the chipset itself; for example,
main memory and cache memory. It is fast but can store only a small amount of information.
Nonvolatile storage: These memories are made to survive system crashes. They are huge
in data storage capacity but slower to access. Examples include hard disks,
magnetic tapes, flash memory, and non-volatile (battery backed up) RAM.
Recovery and Atomicity
When a system crashes, it may have several transactions being executed and various files
opened by them for modifying data items. Transactions are made of various operations,
which are atomic in nature. But according to the ACID properties of a DBMS, atomicity of the
transaction as a whole must be maintained, that is, either all operations are executed or
none.
When a DBMS recovers from a crash, it should maintain the following:
It should check the states of all transactions that were being executed.
A transaction may have been in the middle of some operation; the DBMS must ensure the
atomicity of the transaction in this case.
It should check whether the transaction can be completed now or needs to be rolled back.
Logs of each transaction are maintained and written onto some stable storage before
actually modifying the database.
Alternatively, shadow paging is maintained, where the changes are made on volatile
memory and later the actual database is updated.
Log-Based Recovery
The log is a sequence of records, which maintains a record of the actions performed by a
transaction. It is important that the logs are written prior to the actual modification and
stored on a stable storage medium, which is failsafe.
Log-based recovery works as follows:
When a transaction enters the system and starts execution, it writes a start record to the
log: <Tn, Start>. When the transaction modifies an item, it writes a log record containing
the old and new values, and when it finishes it writes <Tn, Commit>.
1. Deferred database modification: All logs are written to stable storage, and the
database is updated only when the transaction commits.
2. Immediate database modification: Each log record is followed by the actual database
modification; that is, the database is modified immediately after every operation.
Recovery with concurrent transactions
When more than one transaction is being executed in parallel, the logs are interleaved.
At the time of recovery, it would become hard for the recovery system to backtrack through
all the logs and then start recovering. To ease this situation, most modern DBMSs use the
concept of 'checkpoints'.
CHECKPOINT
Keeping and maintaining logs in real time and in a real environment may fill all the memory
space available in the system. As time passes, the log file may become too big to be handled
at all. A checkpoint is a mechanism where all the previous logs are removed from the system
and stored permanently on the storage disk. The checkpoint declares a point before which
the DBMS was in a consistent state and all transactions were committed.
RECOVERY
When a system with concurrent transactions crashes and recovers, it behaves in the
following manner:
The recovery system reads the logs backwards from the end to the last checkpoint.
If the recovery system sees a log with <Tn, Start> and <Tn, Commit>, or just <Tn, Commit>,
it puts the transaction in the redo-list.
If the recovery system sees a log with <Tn, Start> but no commit or abort record, it puts
the transaction in the undo-list.
All transactions in the undo-list are then undone and their logs are removed. For all
transactions in the redo-list, their previous logs are removed, the operations are redone,
and the logs are saved again.
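A minimal Python sketch of this behaviour (an assumption-based illustration, not an actual
DBMS routine) scans a simplified log backwards to the last checkpoint and builds the
redo-list and undo-list:

    def classify(log):
        """log: list of (txn, event) records, oldest first; event is
        'start', 'commit', or 'checkpoint' (txn is None for checkpoints)."""
        redo, undo, committed = [], [], set()
        for txn, event in reversed(log):          # read backwards from the end
            if event == 'checkpoint':
                break                             # stop at the last checkpoint
            if event == 'commit':
                committed.add(txn)
                redo.append(txn)
            elif event == 'start' and txn not in committed:
                undo.append(txn)
        return redo, undo

    log = [(None, 'checkpoint'),
           ('T1', 'start'), ('T1', 'commit'),
           ('T2', 'start'),                       # T2 never commits -> undo
           ('T3', 'start'), ('T3', 'commit')]
    print(classify(log))   # (['T3', 'T1'], ['T2'])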
Write-ahead logging: This principle states that before making any changes to the
database, it is necessary to force-write the log records to the stable storage.
Repeating history during redo: When the system restarts after a crash, ARIES
retraces all the actions of the database system prior to the crash to bring the database to
the state which existed at the time of the crash. It then undoes the actions of all the
transactions that were not committed at the time of the crash.
Logging changes during undo: Changes made to the database while undoing a transaction
are themselves logged, so that the undo is not repeated in case a failure occurs during the
recovery itself, which causes a restart of the recovery process.
ARIES Recovery
ARIES (Algorithm for Recovery and Isolation Exploiting Semantics) recovery is based on
the Write Ahead Logging (WAL) protocol. Every update operation writes a log record
which is one of
An undo-only log record: Only the before image is logged. Thus, an undo operation can be
done to retrieve the old data.
A redo-only log record: Only the after image is logged. Thus, a redo operation can be
attempted.
An undo-redo log record. Both before image and after images are logged.
Every log record is assigned a unique and monotonically increasing log sequence number
(LSN). Every data page has a page LSN field that is set to the LSN of the log record
corresponding to the last update on the page. WAL requires that the log record
corresponding to an update make it to stable storage before the data page corresponding
to that update is written to disk. For performance reasons, each log write is not
immediately forced to disk. A log tail is maintained in main memory to buffer log writes.
The log tail is flushed to disk when it gets full. A transaction cannot be declared committed
until the commit log record makes it to disk.
Once in a while the recovery subsystem writes a checkpoint record to the log. The
checkpoint record contains the transaction table (which gives the list of active
transactions) and the dirty page table (the list of data pages in the buffer pool that have not
yet made it to disk). A master log record is maintained separately, in stable storage, to store
the LSN of the latest checkpoint record that made it to disk. On restart, the recovery
subsystem reads the master log record to find the checkpoint's LSN, reads the checkpoint
record, and starts recovery from there on.
The actual recovery process consists of three passes:
Analysis. The recovery subsystem determines the earliest log record from which the next
pass must start. It also scans the log forward from the checkpoint record to construct a
snapshot of what the system looked like at the instant of the crash.
Redo. Starting at the earliest LSN determined in pass (1) above, the log is read forward and
each update redone.
Undo. The log is scanned backward and updates corresponding to loser transactions are
undone.
For further details of the recovery process, see [Mohan et al. 92,Ramamurthy & Tsoi 95].
It is clear from this description of ARIES that the following features are required for a log
manager:
Ability to write log records. The log manager should maintain a log tail in main memory
and write log records to it. The log tail should be written to stable storage on demand or
when the log tail gets full. Implicit in this requirement is the fact that the log tail can
become full halfway through the writing of a log record. It also means that a log record can
be longer than a page.
Ability to wraparound. The log is typically maintained on a separate disk. When the log
reaches the end of the disk, it is wrapped around back to the beginning.
Ability to store and retrieve the master log record. The master log record is stored
separately in stable storage, possibly on a different duplex-disk.
Ability to read log records given an LSN. Also, the ability to scan the log forward from a
given LSN to the end of log. Implicit in this requirement is that the log manager should be
able to detect the end of the log and distinguish the end of the log from a valid log record's
beginning.
Ability to create a log. In actual practice, this will require setting up a duplex-disk for the
log, a duplex-disk for the master log record, and a raw device interface to read and write
the disks bypassing the Operating System.
Ability to maintain the log tail. This requires some sort of shared memory because the
log tail is common to all transactions accessing the database the log corresponds to. Mutual
exclusion of log writes and reads have to be taken care of.
The following sections describe some simplifying assumptions that we have made to fit the
protocol into Minirel and the interface and implementation of our log manager.
Write-ahead logging:
In computer science, write-ahead logging (WAL) is a family of techniques for
providing atomicity and durability (two of the ACID properties) in database
systems.
In a system using WAL, all modifications are written to a log before they are
applied. Usually both redo and undo information is stored in the log.
The purpose of this can be illustrated by an example. Imagine a program that
is in the middle of performing some operation when the machine it is running
on loses power. Upon restart, that program might well need to know whether
the operation it was performing succeeded, half-succeeded, or failed. If a
write-ahead log were used, the program could check this log and compare
what it was supposed to be doing when it unexpectedly lost power to what
was actually done. On the basis of this comparison, the program could decide
to undo what it had started, complete what it had started, or keep things as
they are.
WAL allows updates of a database to be done in-place. Another way to
implement atomic updates is with shadow paging, which is not in-place. The
main advantage of doing updates in-place is that it reduces the need to modify
indexes and block lists.
ARIES is a popular algorithm in the WAL family.
File systems typically use a variant of WAL for at least file system metadata
called journaling.
The PostgreSQL database system also uses WAL to provide point-in-time
recovery and database replication features.
SQLite database also uses WAL.
MongoDB uses write-ahead logging to provide consistency and crash safety.
Apache HBase uses WAL in order to provide recovery after disaster.
Write-Ahead Logging (WAL)
The Write-Ahead Logging Protocol:
1. Must force the log record for an update to disk before the corresponding data page gets
to disk.
2. Must write all log records for a transaction (Xact) before commit.
#1 guarantees Atomicity.
#2 guarantees Durability.
Exactly how is logging (and recovery!) done? We will study the ARIES algorithm.
WAL & the Log
Each log record has a unique Log Sequence Number (LSN); LSNs are always increasing.
Each data page contains a pageLSN: the LSN of the most recent log record for an update to
that page.
The system keeps track of flushedLSN: the maximum LSN flushed to disk so far.
WAL rule: before a page is written to disk, pageLSN <= flushedLSN must hold.
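The sketch below (hypothetical names, not a real DBMS API) shows the rule in miniature:
before a dirty page is written, the log is forced far enough that pageLSN <= flushedLSN
holds.

    class LogManager:
        def __init__(self):
            self.records = []        # log tail buffered in memory
            self.flushed_lsn = 0     # highest LSN already on stable storage

        def append(self, record):
            self.records.append(record)
            return len(self.records)               # LSN = position in the log

        def flush(self, up_to_lsn):
            # force the log tail to disk (omitted here) up to the given LSN
            self.flushed_lsn = max(self.flushed_lsn, up_to_lsn)

    def write_page_to_disk(page, log_mgr):
        if page["pageLSN"] > log_mgr.flushed_lsn:  # WAL: log must reach disk first
            log_mgr.flush(page["pageLSN"])
        # ... the actual disk write of the page would happen here ...

    log = LogManager()
    page = {"pageLSN": log.append("update P1: A=5 -> A=7"), "data": {"A": 7}}
    write_page_to_disk(page, log)
    print(log.flushed_lsn)   # 1: the update's log record was forced before the page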
UNIT-4
Database Storage:
Databases are stored physically as files of records on some storage medium. This section
gives an overview of the available storage media and then briefly describes magnetic
storage devices.
Magnetic tapes are sequential-access devices: to reach a given block, the tape must be read
sequentially from the beginning. Tape jukeboxes are used to hold large collections of data
and are becoming a popular tertiary storage.
Magnetic Disk Devices:
Magnetic disks are used for storing large amounts of data. The capacity of a disk is the
number of bytes it can store.
A disk platter has a flat circular shape. Its two surfaces are covered with magnetic material,
and data is recorded on the surfaces. The disk surface is divided into tracks; each track is a
circle of distinct diameter. A track is subdivided into blocks (sectors). Depending on the disk
type, block size varies from 32 bytes to 4096 bytes. There may be hundreds of concentric
tracks on a disk surface, containing thousands of sectors. In disk packs, the tracks with the
same diameter on the various surfaces form a cylinder.
The hardware mechanism that reads or writes a block is the disk read/write head (disk
drive). A disk or disk pack is mounted in the disk drive, which includes a motor to rotate the
disks. A read/write head includes an electronic component attached to a mechanical arm.
The arm moves the read/write heads and positions them precisely over the cylinder or
track specified in a block address.
Placing File Records on Disks:
A file is organized logically as a sequence of records. Each record consists of a collection of
related data values or items, each of which corresponds to a particular field of the record.
In a database system, a record usually represents an entity. For example, an EMPLOYEE
record represents an employee entity, and each item in this record specifies the value of an
attribute of that employee, such as Name, Address, Birthdate, etc.
In most cases, all records in the file have the same type. That means every record has the
same fields, and each field has a fixed-length data type. If all records have the same size (in
bytes), then the file is a file of fixed-length records. If records in a file have different sizes,
the file is made up of variable-length records. In this lecture, we focus only on fixed-length
record files.
The records of a file must be allocated to disk blocks in some way. When the record size is
much smaller than the block size, a block can contain several records. However, unless the
block size happens to be a multiple of the record size, some records might cross block
boundaries. In this situation, a part of a record is stored in one block and the other part is
in another block. It would thus require two block accesses to read or write such a record.
This organization is called spanned.
If records are not allowed to cross block boundaries, we have the unspanned organization.
In this lecture, from now on, we assume that records in a file are allocated in the
unspanned manner.
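The arithmetic behind unspanned allocation can be made concrete with a short sketch; the
block and record sizes used here are made-up examples.

    # How many fixed-length records fit in one block, and how many blocks
    # a file needs, under the unspanned organization.
    import math

    def blocking_factor(block_size, record_size):
        return block_size // record_size          # records per block, unspanned

    def blocks_needed(num_records, block_size, record_size):
        bfr = blocking_factor(block_size, record_size)
        return math.ceil(num_records / bfr)

    # e.g. 512-byte blocks and 100-byte records: 5 records per block; the
    # remaining 12 bytes per block are unused (the cost of unspanned storage).
    print(blocking_factor(512, 100))        # 5
    print(blocks_needed(1000, 512, 100))    # 200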
Basic Organizations of Records in Files:
In this section, we will examine several ways of organizing a collection of records in a file
on the disk and discuss the access methods that can be applied to each.
Heap File Organization:
In this organization, records are simply placed in the file in the order in which they are
inserted. That means there is no ordering of records; a new record is always inserted at the
end of the file. Therefore, this is sometimes called the Unordered File organization, to
differentiate it from the Ordered File organization, which will be presented in the next
section.
In the figure below, we can see a sample heap file organization for an EMPLOYEE relation
which consists of 8 records stored in 3 contiguous blocks, where each block can contain at
most 3 records.
The file after deleting the records of Raymond Wong and reorganizing the file is:
Primary index: this index is specified on the ordering key field of an ordered file. The
ordering key field is a field that has a unique value for each record, and the data file is
ordered on its values.
Clustering index: this index is specified on a nonkey ordering field of an ordered file.
Secondary index: this index is specified on a field which is not the ordering field of the data
file. A file can have several secondary indexes.
Dense index: there is an index entry for every search key value in the data file.
Sparse index: an index entry is created for only some of the search key values.
Primary Indexes
The index file consists of a set of records. Each record (entry) in the primary index file has
two fields (k, p): k is a key field with the same data type as the ordering key field of the data
file, and p is a pointer to a disk block. The entries in the index file are sorted on the values
of the key field.
A primary index can be dense or nondense (sparse).
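As an illustration of how such an index is used (the data and helper below are invented for
the example), a lookup binary-searches the sorted index entries for the last key less than or
equal to the search value and then searches the block that entry points to:

    # Looking up a record through a sparse primary index: each index entry
    # (k, p) holds the first key of a block; search the index, then the block.
    import bisect

    index = [(10, 'block0'), (40, 'block1'), (70, 'block2')]   # sorted (k, block)
    blocks = {'block0': [10, 25, 38], 'block1': [40, 55, 62], 'block2': [70, 85]}

    def lookup(key):
        keys = [k for k, _ in index]
        pos = bisect.bisect_right(keys, key) - 1      # last entry with k <= key
        if pos < 0:
            return None
        block = blocks[index[pos][1]]
        return key if key in block else None          # search within the block

    print(lookup(55))   # 55  (found in block1)
    print(lookup(39))   # None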
Clustering Indexes
If the data file is ordered on a nonkey field (the clustering field), which does not have a
unique value for each record, we can create a clustering index.
An index entry in the clustering index file has two fields: the first is the same as the
clustering field of the data file, and the second is a block pointer which points to the block
that contains the first record with that value of the clustering field.
Example: Assume the EMPLOYEE file is ordered by DeptId as in figure 14, and we are
looking for the records of employees of D3. There is an index entry with value D3;
following the pointer in that entry, we locate the first data record with value D3 and
continue processing records until we encounter a record for a department other than D3.
Secondary Indexes
As mentioned above, a secondary index is created on a field which is not an ordering field
of the data file. This field might have a unique value for every record or have duplicate
values. A secondary index must be dense. Figures 15 and 16 illustrate a secondary index on
a nonordering key field of a file and a secondary index on a nonordering, nonkey field,
respectively.
Figure 16: Secondary index on nonordering nonkey field of a data file using one
level of indirection
A secondary index usually needs more storage space and a longer search time than a
primary index because it has a larger number of entries. However, it improves the
performance of queries that use keys other than the search key of the primary index. If
there were no secondary index on such a key, we would have to do a linear search.
Looking up a record with search key value K: First, find the index entry whose key value is
smaller than or equal to K. This search in the index file can be done using linear or binary
search. Once we have located the proper entry, follow the pointer to the block that might
contain the required record.
Insertion: To insert a new record with search key value K, we first locate the data block by
looking it up in the index file. Then we store the new record in that block. No change needs
to be made to the index unless a new block is created or the new record becomes the first
record in the block. In those cases, we need to add a new entry to the index file or modify
the key value in an existing index entry.
Deletion: Similar to insertion, we find the data block that contains the record and delete the
record from the block. The index file might be changed if the deleted record was the only
record in the block (the index entry will be deleted as well) or the deleted record was the
first record in the block (we need to update the key value in the corresponding index entry).
Modification: First, locate the record to be updated. If the field to be changed is not an
indexed field, simply change the record. Otherwise, delete the old record and then insert
the modified one.
Structure of a B+ tree
A B+ tree of order m has the following properties:
The root node of the tree is either a leaf node or has at least two used pointers. Pointers
point to B+ tree nodes at the next level.
The leaf nodes of a B+ tree have an entry for every value of the search field, along with a
data pointer to the record (or to the block that contains the record). The last pointer points
to the next leaf to the right. A leaf node contains at least ceil(m/2) and at most m-1 values.
A leaf node is of the form (<k1, p1>, <k2, p2>, ..., <km-1, pm-1>, pm).
Each internal node of a B+ tree is of the form (p1, k1, p2, k2, ..., pm-1, km-1, pm). It contains
up to m-1 search key values k1, k2, ..., km-1 and m pointers p1, p2, ..., pm. The search key
values within a node are sorted: k1 < k2 < ... < km-1. The internal nodes of a B+ tree form a
multilevel sparse index on the leaf nodes. At least ceil(m/2) of the pointers in each internal
node are used.
All paths from the root node to the leaf nodes have equal length.
To search for a record with search key value k:
If the root of the B+ tree is a leaf node, look among the search key values there. If the value
is found in position i, then pointer i is the pointer to the desired record.
If we are at an internal node with key values k1, k2, ..., km-1, we examine the node, looking
for the smallest search key value greater than k. Assume that this search key value is ki; we
follow the pointer pi to the node at the next level. If k < k1 then we follow p1; if the node
has m pointers and k >= km-1 then we follow pm to the node at the next level. We
recursively apply the search procedure at the node at the next level.
To insert a new record, use the search procedure to find the leaf node L in which to store
the new pair <key, pointer> for the new record.
If there is enough space for the new pair in L, put the pair there.
If there is no room in L, we split L into two leaf nodes and divide the keys between the two
leaf nodes so that each is at least half full.
Splitting at one level might lead to splitting at the higher level if a new key-pointer pair
needs to be inserted into a full internal node at the higher level.
The following procedure describes the important steps in inserting a record into a B+ tree.
Example of inserting a new record with key value 40 into the tree in figure 10.16: key value
40 must be inserted into a leaf node which is already full (with the 3 keys 31, 37, 41). The
node is split into two: the first node contains keys 31 and 37, the second node contains keys
40 and 41. Then the pair <40, pointer> is copied up to the node at the higher level.
Figure 19: Beginning the insertion of key 40, split the leaf node
The internal node into which the pair <40, pointer> is inserted is also full (with keys 23, 31,
43 and 4 pointers), so we have an internal node splitting situation. Consider the 4 keys 23,
31, 40, 43 and 5 pointers. According to the above algorithm, the first 3 pointers and the
first 2 keys (23, 31) stay in the node, while the last 2 pointers and the last key (43) move to
the new right sibling of the internal node. Key 40 is left over and is pushed up to the node
at the higher level.
Deletion begins by looking up the leaf node L that contains the record; delete the data
record, then delete the key-pointer pair for that record in L.
If, after the deletion, L still has at least the minimum number of keys and pointers, nothing
more needs to be done.
If one of the adjacent siblings of L has more than the minimum number of keys and
pointers, then borrow one key-pointer pair from that sibling, keeping the order of keys
intact. Possibly, the keys at the parent of L must be adjusted.
If we cannot borrow from a sibling, but the entries of L and one of its siblings can fit in a
single node, we merge these two nodes together. We need to adjust the keys at the parent
and then delete a key-pointer pair at the parent. If the parent still has enough keys and
pointers, we are done. If not, then we recursively apply the deletion at the parent.
Example:
Figure 22: Delete record with key 7 from the tree in figure 17 Borrow from the sibling
Figure 23: Beginning of deletion of record with key 11 from the tree in figure 22. This is the case of
merge two leaf nodes
Tree-Based Indexing:Tree-based indexing organizes the terms into a single tree. Each path into the tree
represents common properties of the indexed terms, similar to decision trees or
classification trees.
The basic tree-based indexing method is discrimination tree indexing. The tree reflects
exactly the structure of terms. A more complex tree-based method is abstraction
tree indexing. The nodes are labeled with lists of terms, in a manner that reflects the
substitution of variables from a term to another: the domain of variable substitutions in a
node is the codomain of the substitutions in a subnode (substitutions are mappings from
variables to terms).
A relatively recent tree-based method was proposed in [Graf1995]: substitution tree
indexing. This is an improved version of discrimination tree and abstraction tree indexing.
Each path in the tree represents a chain of variable bindings. The retrieval of terms is based
on a backtracking mechanism similar to the one in Prolog. Substitution tree indexing
exhibits retrieval and deletion times faster than other tree-based indexing methods.
However, it has the disadvantage of slow insertion times.
Since typed feature structures can be viewed as similar to first-order terms with variables,
the unification process requires a sequence of substitution operations.
The major (software) components within a DBMS which involve levels of access
to physical storage are the following:

          [rest of DBMS]
                |
    +------------------------+
    | FILES & ACCESS METHODS |
    +------------------------+
    |     BUFFER MANAGER     |
    +------------------------+
    |   DISK SPACE MANAGER   |
    +------------------------+
                |
         [physical data]

In short,
DISK SPACE MANAGER
Manages the precise use of space on the disk, keeping track of which
"pages" have been allocated, and when data should be read or written into
those pages.
BUFFER MANAGER
Manages the control of pages which are currently residing in main
memory, as well as the transfer of those pages back and forth between
main memory and the disk.
FILES & ACCESS METHODS
Regardless of the low-level memory increments, much of the database
software will want to view data as logically organized into files, each of
which may be stored below using a large number of low-level data pages.
When a requested page is not already in the buffer pool, the buffer manager must bring it
in from disk, and it will need to determine which frame to store it in, and thus which
existing page of the buffer pool to evict.
The decision of which page to evict is complicated by several factors:
Several current processes may have requested a particular page
at the same time, and that page can only be released from memory
after all of the requesting processes have released the page.
To accomplish this, a pin_count is kept for each page currently in the
buffer. The count is initially zero; it is incremented each time a request
for the page is served (a.k.a. "pinning"); and it is decremented each time
a process subsequently releases the page (a.k.a. "unpinning").
Thus, the evicted page must be chosen from those pages with a
current pin count of zero. (If no such pages exist, then the request must
wait until some page is unpinned.)
There may be several candidate pages for eviction. There are
many factors that might influence our choice; we can adopt a
particular "replacement policy" for such decisions. (we defer the
discussion of such policies for the moment).
When a page is going to be evicted, we must be concerned as to
whether the contents of that page in main memory were altered
since the time it was brought in from the disk. If so, we must make
sure to write the contents of the page back to the disk (via the Disk
Manager). Conversely, if the page was only read, then we can
remove it from main memory, knowing that the contents are still
accurate on disk.
To accomplish this, a boolean value known as the dirty bit is kept for
each page in the buffer pool. When read from disk, the dirty bit is
initially set to false. However, when each process releases the page,
it must also inform the buffer manager of whether or not it had
changed any of the memory contents while it was checked out. If so,
then the dirty bit is set to true, ensuring that the contents will later
be written to disk should this page be evicted.
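A compact sketch of this bookkeeping is given below; the class and method names are
invented for the example, but the pin_count, dirty bit, and write-back-before-eviction
behaviour follow the description above.

    # Frames carry a pin_count and a dirty bit; eviction only considers
    # unpinned frames and writes dirty victims back before reusing the frame.

    class Frame:
        def __init__(self, page_id, data):
            self.page_id, self.data = page_id, data
            self.pin_count, self.dirty = 0, False

    class BufferPool:
        def __init__(self, capacity, disk):
            self.capacity, self.disk = capacity, disk
            self.frames = {}                       # page_id -> Frame

        def pin(self, page_id):
            if page_id not in self.frames:
                if len(self.frames) >= self.capacity:
                    self._evict()
                self.frames[page_id] = Frame(page_id, self.disk[page_id])
            frame = self.frames[page_id]
            frame.pin_count += 1                   # "pinning"
            return frame

        def unpin(self, page_id, dirty):
            frame = self.frames[page_id]
            frame.pin_count -= 1                   # "unpinning"
            frame.dirty = frame.dirty or dirty

        def _evict(self):
            # replacement policy: pick any frame with pin_count == 0 (here: first)
            victims = [f for f in self.frames.values() if f.pin_count == 0]
            if not victims:
                raise RuntimeError("all pages pinned; request must wait")
            victim = victims[0]
            if victim.dirty:
                self.disk[victim.page_id] = victim.data   # write back before reuse
            del self.frames[victim.page_id]

    disk = {"P1": "old", "P2": "...", "P3": "..."}
    pool = BufferPool(capacity=2, disk=disk)
    f = pool.pin("P1"); f.data = "new"; pool.unpin("P1", dirty=True)
    pool.pin("P2"); pool.unpin("P2", dirty=False)
    pool.pin("P3")                      # forces an eviction; P1 is written back
    print(disk["P1"])                   # "new"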
Buffer Replacement Policies
The replacement policy determines which of the candidate pages (those with pin count
zero) is chosen for eviction; LRU (least recently used) and clock are common choices.
Turning to the file layer: if the records of a file cannot fit on a single page of the disk, then
multiple pages will be used to represent that file.
For example, for a typical table in a relational database, each tuple would
be a record, and the (unordered) set of tuples would be stored in a single
file. Of course, other internal data for the DBMS can also be viewed as
records and files.
Each record has a unique identifier called a record id (rid). Among other
things, this will identify the disk address of the page which contains the
record.
The file and access layer will manage the abstraction of a file of records. It
will support the creation and destruction of files, as well as the insertion
and deletion of records to and from the file. It will also support the
retrieval of a particular record identified by rid, or a scan operation to step
through all records of the file, one at a time.
Implementation
The file layer will need to keep track of what pages are being used in a
particular file, as well as how the records of the file are organized on those
pages.
There are several issues to address:
Whether the records in a file are to be maintained as an ordered
collection or unordered.
Whether the records of a given file are of fixed size or of variable
size.
We will consider three major issues:
Format of a single record
Format of a single page
Format of a single file
Memory hierarchy
The term memory hierarchy is used in computer architecture when discussing
performance issues in computer architectural design, algorithm predictions, and the lower
level programming constructs such as involving locality of reference. A "memory
hierarchy" in computer storage distinguishes each level in the "hierarchy" by response
time. Since response time, complexity, and capacity are related,[1] the levels may also be
distinguished by the controlling technology.
The many trade-offs in designing for high performance will include the structure of the
memory hierarchy, i.e. the size and technology of each component. So the various
components can be viewed as forming a hierarchy of memories (m1,m2,...,mn) in which
each member mi is in a sense subordinate to the next highest member mi-1 of the
hierarchy. To limit waiting by higher levels, a lower level will respond by filling a buffer and
then signaling to activate the transfer.
There are four major storage levels:[1]
Internal: processor registers and cache.
Main: the system RAM and controller cards.
On-line mass storage: secondary storage.
Off-line bulk storage: tertiary and off-line storage.
This is a general memory hierarchy structuring. Many other structures are useful. For
example, a paging algorithm may be considered as a level for virtual memory when
designing a computer architecture.
Redundant Arrays of Independent Disks (RAID):
RAID allows information to be spread across several disks. RAID uses techniques such as
disk striping (RAID Level 0), disk mirroring (RAID Level 1), and disk striping with parity
(RAID Level 5) to achieve redundancy, lower latency, increased bandwidth, and maximized
ability to recover from hard disk crashes.
RAID distributes data consistently across each drive in the array. RAID breaks the data
down into consistently-sized chunks (commonly 32K or 64K, although other values are
acceptable). Each chunk is then written to a hard drive in the RAID array according to the
RAID level employed. When the data is read, the process is reversed, giving the illusion that
the multiple drives in the array are actually one large drive.
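A tiny sketch of level-0 striping, with an assumed 64K chunk size, shows how consecutive
chunks are spread round-robin over the drives and reassembled on read:

    # RAID 0 striping: consecutive fixed-size chunks go to the drives in
    # round-robin order; reading reverses the process, so the array behaves
    # like one large disk.

    CHUNK = 64 * 1024                      # 64K chunks (32K/64K are typical)

    def place_chunks(data, num_disks):
        chunks = [data[i:i + CHUNK] for i in range(0, len(data), CHUNK)]
        placement = [[] for _ in range(num_disks)]
        for i, chunk in enumerate(chunks):
            placement[i % num_disks].append(chunk)   # round-robin striping
        return placement

    def read_back(placement):
        total = sum(len(d) for d in placement)
        out, positions = [], [0] * len(placement)
        for i in range(total):
            disk = i % len(placement)
            out.append(placement[disk][positions[disk]])
            positions[disk] += 1
        return b"".join(out)

    data = bytes(range(256)) * 1024        # 256K of sample data (4 chunks)
    striped = place_chunks(data, num_disks=3)
    assert read_back(striped) == data      # the striped array behaves as one disk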
What is RAID?
RAID (redundant array of independent disks; originally redundant array of inexpensive
disks) is a way of storing the same data in different places (thus, redundantly) on
multiple hard disks. By placing data on multiple disks, I/O (input/output) operations can
overlap in a balanced way, improving performance. Storing data redundantly also increases
the mean time to data loss and therefore the fault tolerance of the array.
Who Should Use RAID?
System administrators and others who manage large amounts of data would benefit from
using RAID technology. Primary reasons to deploy RAID include:
- Enhanced speed
- Increased storage capacity using a single virtual disk
- Minimized loss from disk failure
Software RAID implements the various RAID levels in the kernel disk (block device) code. It
offers the cheapest possible solution, as expensive disk controller cards or hot-swap
chassis [1] are not required. Software RAID also works with cheaper IDE disks as well as
SCSI disks. With today's faster CPUs, software RAID can outperform hardware RAID.
The Linux kernel contains an MD driver that allows the RAID solution to be completely
hardware independent. The performance of a software-based array depends on the server
CPU performance and load.
Each tree node is a disk page, and all the data resides in the leaf pages. This corresponds to
an index that uses Alternative (1) for data entries, in terms of the alternatives described in
Chapter 8; we can create an index with Alternative (2) by storing the data records in a
separate file and storing (key, rid) pairs in the leaf pages of the ISAM index. When the file is
created, all leaf pages are allocated sequentially and sorted on the search key value. (If
Alternative (2) or (3) is used, the data records are created and sorted before allocating the
leaf pages of the ISAM index.) The non-leaf level pages are then allocated. If there are
several inserts to the file subsequently, so that more entries are inserted into a leaf than
will fit onto a single page, additional overflow pages are needed, because the set of primary
leaf pages is static.
The basic operations of insertion, deletion, and search are all quite straightforward. For an
equality selection search, we start at the root node and determine which subtree to search
by comparing the value in the search field of the given record with the key values in the
node. (The search algorithm is identical to that for a B+ tree; we present this algorithm in
more detail later.) For a range query, the starting point in the data (or leaf) level is
determined similarly, and data pages are then retrieved sequentially. For inserts and
deletes, the appropriate page is determined as for a search, and the record is inserted or
deleted with overflow pages added if necessary.
We assume that each leaf page can contain two entries. If we now insert a record with key
value 23, the entry 23* belongs in the second data page, which already contains 20* and
27* and has no more space. We deal with this situation by adding an overflow page and
putting 23* in the overflow page. Chains of overflow pages can easily develop. For
instance, inserting 48*, 41*, and 42* leads to an overflow chain of two pages. The tree of
Figure 10.5 with all these insertions is shown in Figure 10.6.
In a B+ tree, the internal (index) nodes direct the search and the leaf nodes contain the data
entries. Since the tree structure grows and shrinks dynamically, it is not feasible to allocate
the leaf pages sequentially as in ISAM, where the set of primary leaf pages was static. To
retrieve all leaf pages efficiently, we have to link them using page pointers. By organizing
them into a doubly linked list, we can easily traverse the sequence of leaf pages (sometimes
called the sequence set) in either direction. This structure is illustrated in Figure 10.7.
The following are some of the main characteristics of a B+ tree:
Operations (insert, delete) on the tree keep it balanced.
A minimum occupancy of 50 percent is guaranteed for each node except the root if the
deletion algorithm discussed in Section 10.6 is implemented. However, deletion is often
implemented by simply locating the data entry and removing it, without adjusting the tree
as needed to guarantee the 50 percent occupancy, because files typically grow rather than
shrink.
Searching for a record requires just a traversal from the root to the appropriate leaf. We
refer to the length of a path from the root to a leaf (any leaf, because the tree is balanced)
as the height of the tree. For example, a tree with only a leaf level and a single index level,
such as the tree shown in Figure 10.9, has height 1, and a tree that has only the root node
has height 0. Because of high fan-out, the height of a B+ tree is rarely more than 3 or 4.
SEARCH:
The algorithm for search finds the leaf node in which a given data entry belongs. A
pseudocode sketch of the algorithm is given in Figure 10.8. We use the notation *ptr to
denote the value pointed to by a pointer variable ptr and &(value) to denote the address of
value. Note that finding i in tree_search requires us to search within the node, which can
be done with either a linear search or a binary search (e.g., depending on the number of
entries in the node).
In discussing the search, insertion, and deletion algorithms for B+ trees, we assume that
there are no duplicates. That is, no two data entries are allowed to have the same key value.
Of course, duplicates arise whenever the search key does not contain a candidate key and
must be dealt with in practice. We consider how duplicates can be handled in Section 10.7.
INSERT
The algorithm for insertion takes an entry, finds the leaf node where it belongs, and inserts
it there. Pseudocode for the B+ tree insertion algorithm is given in Figure 10.10. The basic
idea behind the algorithm is that we recursively insert the entry by calling the insert
algorithm on the appropriate child node. Usually, this procedure results in going down to
the leaf node where the entry belongs, placing the entry there, and returning all the way
back to the root node. Occasionally a node is full and it must be split. When the node is
split, an entry pointing to the node created by the split must be inserted into its parent;
this entry is pointed to by the pointer variable newchildentry. If the (old) root is split, a
new root node is created and the height of the tree increases by 1.
The difference in handling leaf-level and index-level splits arises from the B+ tree
requirement that all data entries must reside in the leaves. This requirement prevents us
from 'pushing up' 5 and leads to the slight redundancy of having some key values
appearing in the leaf level as well as in some index level. However, range queries can be
efficiently answered by just retrieving the sequence of leaf pages; the redundancy is a small
price to pay for efficiency. In dealing with the index levels, we have more flexibility, and we
'push up' 17 to avoid having two copies of 17 in the index levels. Now, since the split node
was the old root, we need to create a new root node to hold the entry that distinguishes the
two split index pages. The tree after completing the insertion of the entry 8* is shown in
Figure 10.13.
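The leaf-split step used in this kind of insertion can be sketched as follows (a simplification
that handles keys only, ignoring the pointers); it mirrors the earlier example of splitting a
full leaf holding 31, 37, 41 when 40 arrives:

    # Split an overfull leaf key list in half and copy the first key of the
    # new right node up to the parent (leaf splits copy, index splits push).

    def split_leaf(keys, new_key):
        keys = sorted(keys + [new_key])           # the leaf has overflowed
        mid = len(keys) // 2
        left, right = keys[:mid], keys[mid:]
        copied_up = right[0]                      # leaf split: key is *copied* up
        return left, right, copied_up

    print(split_leaf([31, 37, 41], 40))           # ([31, 37], [40, 41], 40)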
DELETE:
The algorithm for deletion takes an entry, finds the leaf node where it belongs, and deletes
it. Pseudocode for the B+ tree deletion algorithm is given in Figure 10.15. The basic idea
behind the algorithm is that we recursively delete the entry by calling the delete algorithm
on the appropriate child node. We usually go down to the leaf node where the entry
belongs, remove the entry from there, and return all the way back to the root node.
Occasionally a node is at minimum occupancy before the deletion, and the deletion causes
it to go below the occupancy threshold. When this happens, we must either redistribute
entries from an adjacent sibling or merge the node with a sibling to maintain minimum
occupancy. If entries are redistributed between two nodes, their parent node must be
updated to reflect this; the key value in the index entry pointing to the second node must be
changed to be the lowest search key in the second node. If two nodes are merged, their
parent must be updated to reflect this by deleting the index entry for the second node; this
index entry is pointed to by the pointer variable oldchildentry when the delete call returns
to the parent node. If the last entry in the root node is deleted in this manner because one
of its children was deleted, the height of the tree decreases by 1. To illustrate deletion, let
us consider the sample tree shown in Figure 10.13. To delete entry 19*, we simply remove
it from the leaf page on which it appears, and we are done because the leaf still contains
two entries. If we subsequently delete 20*, however, the leaf contains only one entry after
the deletion. The (only) sibling of the leaf node that contained 20* has three entries, and we
can therefore deal with the situation by redistribution; we move entry 24* to the leaf page
that contained 20* and copy up the new splitting key (27, which is the new low key value of
the leaf from which we borrowed 24*) into the parent. This process is illustrated in Figure
10.16. Suppose that we now delete entry 24*. The affected leaf contains only one entry
(22*) after the deletion, and the (only) sibling contains just two entries (27* and 29*).
Therefore, we cannot redistribute entries. However, these two leaf nodes together contain
only three entries and can be merged. While merging, we can 'toss' the entry (27, pointer
to second leaf page) in the parent, which pointed to the second leaf page, because the
second leaf page is empty after the merge and can be discarded. The right subtree of
Figure 10.16 after this step in the deletion of entry 24* is shown in Figure 10.17.
The situation when we have to merge two non-leaf nodes is exactly the opposite of the
situation when we have to split a non-leaf node. We have to split a nonleaf node when it
contains 2d keys and 2d + 1 pointers, and we have to add another key--pointer pair. Since
we resort to merging two non-leaf nodes only when we cannot redistribute entries between
them, the two nodes must be minimally full; that is, each must contain d keys and d + 1
pointers prior to the deletion. After merging the two nodes and removing the key--pointer
pair to be deleted, we have 2d - 1 keys and 2d + 1 pointers: Intuitively, the leftmost pointer
on the second merged node lacks a key value. To see what key value must be combined
with this pointer to create a complete index entry, consider the parent of the two nodes
being merged. The index entry pointing to one of the merged nodes must be deleted from
the parent because the node is about to be discarded. The key value in this index entry is
precisely the key value we need to complete the new merged node: The entries in the first
node being merged, followed by the splitting key value that is 'pulled down' from the
parent, followed by the entries in the second non-leaf node gives us a total of 2d keys and
2d + 1 pointers, which is a full non-leaf node. Note how the splitting key is pulled down from
the parent during the merge, in contrast to a leaf-level merge, where the splitting key in the
parent is simply discarded.
Consider the merging of two non-leaf nodes in our example. Together, the nonleaf node and
the sibling to be merged contain only three entries, and they have a total of five pointers to
leaf nodes. To merge the two nodes, we also need to pull down the index entry in their
parent that currently discriminates between these nodes. This index entry has key value
17, and so we create a new entry (17, left-most child pointer in sibling). Now we have a
total of four entries and five child pointers, which can fit on one page in a tree of order d =
2. Note that pulling down the splitting key 17 means that it will no longer appear in the
parent node following the merge. After we merge the affected non-leaf node and its sibling
by putting all the entries on one page and discarding the empty sibling page, the new node
is the only child of the old root, which can therefore be discarded. The tree after completing
all these steps in the deletion of entry 24* is shown in Figure 10.18.
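The redistribute-or-merge decision described above can be summarized in a short sketch. The following Python fragment is not the pseudocode of Figure 10.15; it is a minimal illustration of the leaf-level case only, assuming leaves are plain sorted lists of keys and the parent is modelled as a list of separator keys, with all function and variable names invented for the example.

# A minimal sketch (not the book's Figure 10.15) of the leaf-level decision in
# B+ tree deletion: after removing an entry, either redistribute with a
# sibling or merge with it. Leaves are plain sorted lists of keys; the parent
# is modelled as a list of separator keys, one per right-hand child.

D = 2  # order of the tree: every non-root leaf holds between D and 2*D entries

def delete_key(leaf, sibling, parent_keys, sep_index, key):
    """Delete `key` from `leaf`; `sibling` is its right sibling and
    parent_keys[sep_index] is the separator between the two leaves."""
    leaf.remove(key)
    if len(leaf) >= D:
        return "ok"                        # occupancy is still legal
    if len(sibling) > D:
        # Redistribute: borrow the sibling's smallest key, then copy up the
        # sibling's new low key as the separator in the parent.
        leaf.append(sibling.pop(0))
        parent_keys[sep_index] = sibling[0]
        return "redistributed"
    # Merge: pool all entries in the left node, toss the separator in the
    # parent, and discard the (now empty) sibling.
    leaf.extend(sibling)
    sibling.clear()
    del parent_keys[sep_index]
    return "merged"

# Mirroring the example above: deleting 20* from a leaf holding 20 and 22,
# whose sibling holds 24, 27, and 29, triggers redistribution and copies 27 up.
leaf, sib, parent = [20, 22], [24, 27, 29], [24]
print(delete_key(leaf, sib, parent, 0, 20), leaf, sib, parent)
# -> redistributed [22, 24] [27, 29] [27]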
STATIC HASHING:
The Static Hashing scheme is illustrated in Figure 11.1. The pages containing the data can
be viewed as a collection of buckets, with one primary page and possibly additional
overflow pages per bucket. A file consists of buckets 0 through N - 1, with one primary page
per bucket initially. Buckets contain data entries, which can be any of the three alternatives
for representing data entries. To search for a data entry, we apply a hash function h to identify the bucket to which it
belongs and then search this bucket. To speed the search of a bucket, we can maintain data
entries in sorted order by search key value; in
this chapter, we do not sort entries, and the
order of entries within a bucket has no
significance. To insert a data entry, we use the
hash function to identify the correct bucket and
then put the data entry there. If there is no
space for this data entry, we allocate a new
overflow page, put the data entry on this page,
and add the page to the overflow chain of the bucket. To delete a data entry, we use the
hashing function to identify the correct bucket, locate the data entry by searching the
bucket, and then remove it. If this data entry is the last in an overflow page, the overflow
page is removed from the overflow chain of the bucket and added to a list of free pages. The
hash function is an important component of the hashing approach. It must distribute values
in the domain of the search field uniformly over the collection of buckets. If we have N
buckets, numbered 0 through N - 1, a hash function h of the form h(value) = (a * value + b)
works well in practice. (The bucket identified is h(value) mod N.) The constants a and b can
be chosen to 'tune' the hash function.
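As a concrete illustration of the scheme just described, here is a minimal Python sketch of a static hashed file with N primary buckets and overflow chains. The bucket capacity, the constants a and b, and the class and method names are all arbitrary choices for the example, not values taken from the text.

# Sketch of Static Hashing: N fixed buckets, each a primary page plus an
# overflow chain, with h(value) = (a*value + b) and bucket = h(value) mod N.

N, CAPACITY = 4, 2          # number of buckets, entries per page (illustrative)
A, B = 37, 11               # constants used to 'tune' the hash function

def h(value):
    return A * value + B

class StaticHashFile:
    def __init__(self):
        # each bucket is a list of pages; each page is a list of data entries
        self.buckets = [[[]] for _ in range(N)]

    def _bucket(self, key):
        return self.buckets[h(key) % N]

    def insert(self, entry):
        bucket = self._bucket(entry)
        for page in bucket:
            if len(page) < CAPACITY:
                page.append(entry)
                return
        bucket.append([entry])            # allocate a new overflow page

    def search(self, key):
        return any(key in page for page in self._bucket(key))

    def delete(self, key):
        bucket = self._bucket(key)
        for page in bucket:
            if key in page:
                page.remove(key)
                if not page and len(bucket) > 1:
                    bucket.remove(page)   # free an emptied overflow page
                return True
        return False

f = StaticHashFile()
for k in [5, 9, 13, 14, 30]:              # 5, 9, 13 collide, forcing an overflow page
    f.insert(k)
print(f.search(13), f.delete(13), f.search(13))   # -> True True False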
EXTENDIBLE HASHING:
To understand Extendible Hashing, let us begin by considering a Static Hashing file. If we
have to insert a new data entry into a full bucket, we need to add an overflow page. If we do
not want to add overflow pages, one solution is to
reorganize the file at this point by doubling the
number of buckets and redistributing the entries
across the new set of buckets. This solution suffers
from one major defect: the entire file has to be read, and twice as many pages have to be
written, to achieve the reorganization. This problem, however, can be overcome by a simple
idea: use a directory of pointers to buckets, and double the number of buckets by doubling
just the directory and splitting only the bucket that
overflowed. To understand the idea, consider the
sample file shown in Figure 11.2. The directory
consists of an array of size 4, with each element
being a pointer to a bucket. (The global depth and local depth fields are discussed shortly;
ignore them for now.) To locate a data entry, we apply a hash function to the search field and
take the last 2 bits of its binary representation to get a number between 0 and 3. The pointer
in this array position gives us the desired bucket; we assume that each bucket can hold four
data entries. Therefore, to locate a data entry with hash value 5 (binary 101), we look at
directory element 01 and follow the pointer to the data page (bucket B in the figure). To
insert a data entry, we search to find the appropriate bucket. For example, to insert a data
entry with hash value 13 (denoted as 13*), we examine directory element 01 and go to the
page containing data entries 1*, 5*, and 21*. Since this bucket has room, we simply insert
13* there. If, however, an insertion is directed to a bucket that is already full (for instance,
the bucket pointed to by directory element 00 in the sample file), the bucket must be split.
We handle the split by allocating a new bucket and redistributing the contents (including the
new entry to be inserted) across the old bucket and its 'split image.' To redistribute entries
across the old bucket and its split image, we consider the last three bits of the hash value;
the last two bits are 00,
indicating a data entry that belongs to one of these two buckets, and the third bit
discriminates between these buckets. The redistribution of entries is illustrated in Figure
11.4.
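A minimal Python sketch of the directory-based scheme follows, assuming (as in the sample file) an identity hash function, a bucket capacity of four, and least-significant-bit directory indexing. The class and method names are invented, and a full implementation would also handle the case where a split leaves one of the two resulting buckets overfull.

# Illustrative sketch of Extendible Hashing: a directory of bucket pointers
# indexed by the last `global_depth` bits of the hash value.

CAPACITY = 4

def hash_fn(key):
    return key                       # the sample file uses the hash value directly

class Bucket:
    def __init__(self, local_depth):
        self.local_depth = local_depth
        self.entries = []

class ExtendibleHashFile:
    def __init__(self):
        self.global_depth = 2
        self.directory = [Bucket(2) for _ in range(4)]    # as in the sample file

    def _dir_index(self, key):
        return hash_fn(key) & ((1 << self.global_depth) - 1)   # last bits

    def search(self, key):
        return key in self.directory[self._dir_index(key)].entries

    def insert(self, key):
        bucket = self.directory[self._dir_index(key)]
        if len(bucket.entries) < CAPACITY:
            bucket.entries.append(key)
            return
        # The bucket is full: if its local depth equals the global depth,
        # double the directory (copy the pointer array) before splitting.
        if bucket.local_depth == self.global_depth:
            self.directory = self.directory + self.directory
            self.global_depth += 1
        bucket.local_depth += 1
        image = Bucket(bucket.local_depth)                # the 'split image'
        # Directory elements that pointed to the old bucket and whose extra
        # (local_depth-th) bit is 1 now point to the split image.
        for i, b in enumerate(self.directory):
            if b is bucket and (i >> (bucket.local_depth - 1)) & 1:
                self.directory[i] = image
        # Redistribute the old contents plus the new entry.
        old_entries = bucket.entries + [key]
        bucket.entries = []
        for k in old_entries:
            self.directory[self._dir_index(k)].entries.append(k)

f = ExtendibleHashFile()
for k in [4, 12, 32, 16, 1, 5, 21, 10, 15, 7, 19]:
    f.insert(k)
f.insert(20)              # directory element 00 points to a full bucket: split
print(f.global_depth, f.search(20), f.search(13))         # -> 3 True False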
LINEAR HASHING:
Linear Hashing is a dynamic hashing
technique, like Extendible Hashing, adjusting
gracefully to inserts and deletes. In contrast to
Extendible Hashing, it does not require a
directory, deals naturally with collisions, and
offers a lot of flexibility with respect to the
timing of bucket splits (allowing us to trade
off slightly greater overflow chains for higher
average space utilization). If the data
distribution is very skewed, however, overflow chains could cause Linear Hashing
performance to be worse than that of Extendible Hashing. The scheme utilizes a family of
hash functions h0, h1, h2, ..., with the property that each function's range is twice that of its
predecessor. That is, if hi maps a data entry into one of M buckets, hi+1 maps a data entry
into one of 2M buckets. Such a family is typically obtained by choosing a hash function
h and an initial number N of buckets (note that 0 to N - 1 is not the range of h itself), and
defining hi(value) = h(value) mod (2^i * N). If N is
chosen to be a power of 2, then we apply h and look at the last di bits; d0 is the number of
bits needed to represent N, and di = d0 + i. Typically we choose h to be a function that maps
a data entry to some integer. Suppose that we set the initial number N of buckets to be 32.
In this case d0 is 5, and h0 is therefore h mod 32, that is, a number in the range 0 to 31. The
value of d1 is d0 + 1 = 6, and h1 is h mod (2 * 32), that is, a number in the range 0 to 63.
Then h2 yields a number in the range 0 to 127, and so on. The idea is best understood in
terms of rounds of splitting. During round number Level, only hash functions hLevel and
hLevel+1 are in use. The buckets in the file at
the beginning of the round are split, one by one from the first to the last bucket, thereby
doubling the number of buckets. At any given point within a round, therefore, we have
buckets that have been split, buckets that are yet to be split, and buckets created by splits
in this round, as illustrated in Figure 11.7. Consider how we search for a data entry with a
given search key value. We apply hash function hLevel, and if this leads us to one of the
unsplit buckets, we simply look there. If it leads us to one of the split buckets, the entry may
be there or it may have been moved to the new bucket created earlier in this round by
splitting this bucket; to determine which of the two buckets contains the entry, we apply
hLevel+1. Linear Hashing can be thought of as having a conceptual directory, with one
element for each of the ranges into which hash values are hashed; but whereas the directory
is doubled in a single step of Extendible Hashing, moving from hi to hi+1, along with a
corresponding doubling in the number of buckets, occurs gradually over the course of a
round in Linear Hashing. The new idea behind Linear Hashing is that a directory can be
avoided by a clever choice of the bucket
to split. On the other hand, by always splitting the appropriate bucket, Extendible Hashing
may lead to a reduced number of splits and higher bucket occupancy. The directory
analogy is useful for understanding the ideas behind Extendible and Linear Hashing.
However, the directory structure can be avoided for Linear Hashing (but not for Extendible
Hashing) by allocating primary bucket pages consecutively, which would allow us to locate
the page for bucket i by a simple offset calculation. For uniform distributions, this
implementation of Linear Hashing has a lower average cost for equality selections (because
the directory level is eliminated). For skewed distributions, this implementation could
result in many empty or nearly empty buckets, each of which is allocated at least one page,
leading to poor performance relative to Extendible Hashing, which is likely to have higher
bucket occupancy. A different implementation of Linear Hashing, in which a directory is
actually maintained, offers the flexibility of not allocating one page per bucket; null
directory elements can be used as in Extendible Hashing. However, this implementation
introduces the overhead of a directory level and could prove costly for large, uniformly
distributed files. (Also, although this implementation alleviates the potential problem of
low bucket occupancy by not allocating pages for empty buckets, it is not a complete
solution because we can still have many pages with very few entries.)
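The addressing rule described above (apply hLevel first and, if the bucket has already been split in this round, apply hLevel+1) can be captured in a few lines. The sketch below is illustrative only: it shows how a search key is mapped to a bucket number, assuming N = 32 and an identity hash function, and it omits bucket storage and the split procedure itself.

# Minimal sketch of the Linear Hashing addressing rule, with family
# h_i(value) = h(value) mod (2**i * N), a round counter `level`, and
# `next_to_split` marking the next bucket to be split in this round.

N = 32                      # initial number of buckets (a power of 2, so d0 = 5)

def h(value):
    return value            # any function mapping a data entry to an integer

def h_i(i, value):
    return h(value) % (2 ** i * N)

def bucket_for(value, level, next_to_split):
    """Return the bucket number that should be searched for `value`."""
    b = h_i(level, value)                 # apply h_level first
    if b < next_to_split:                 # bucket b was already split this round
        b = h_i(level + 1, value)         # ...so the entry may have moved
    return b

# During round 0 with buckets 0..9 already split (next_to_split = 10):
print(bucket_for(7, 0, 10))    # 7 % 32 = 7 < 10, so use h1: 7 % 64 = 7
print(bucket_for(39, 0, 10))   # 39 % 32 = 7 < 10, so use h1: 39 % 64 = 39
print(bucket_for(20, 0, 10))   # 20 % 32 = 20 >= 10: bucket 20 not yet split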
UNIT-5
DISTRIBUTED DATABASES
A distributed database is a database in which storage devices are not all attached to a
common processing unit such as the CPU,[1] controlled by a distributed database
management system (together sometimes called a distributed database system). It may be
stored in multiple computers, located in the same physical location; or may be dispersed
over a network of interconnected computers. Unlike parallel systems, in which the
processors are tightly coupled and constitute a single database system, a distributed
database system consists of loosely-coupled sites that share no physical components.
System administrators can distribute collections of data (e.g. in a database) across multiple
physical locations. A distributed database can reside on network servers on the Internet, on
corporate intranets or extranets, or on other company networks. Because they store data
across multiple computers, distributed databases can improve performance at end-user
worksites by allowing transactions to be processed on many machines, instead of
being limited to one.[2]
Two processes ensure that the distributed databases remain up-to-date and
current: replication and duplication.
1. Replication involves using specialized software that looks for changes in the
distributed database. Once the changes have been identified, the replication
process makes all the databases look the same. The replication process can be
complex and time-consuming depending on the size and number of the distributed
databases. This process can also require a lot of time and computer resources.
2. Duplication, on the other hand, has less complexity. It basically identifies one
database as a master and then duplicates that database. The duplication process is
normally done at a set time after hours. This is to ensure that each distributed
location has the same data. In the duplication process, users may change only the
master database. This ensures that local data will not be overwritten.
Both replication and duplication can keep the data current in all distributed locations.
A database user accesses the distributed database through:
Local applications
applications which do not require data from other sites.
Global applications
applications which do require data from other sites.
A homogeneous distributed database has identical software and hardware running all
database instances, and may appear through a single interface as if it were a single database.
[Figure: homogeneous distributed database]
The terms distributed database system and database replication are related, yet distinct. In
a pure (that is, not replicated) distributed database, the system manages a single copy of all
data and supporting database objects. Typically, distributed database applications use
distributed transactions to access both local and remote data and modify the global
database in real-time.
Heterogeneous Services
Heterogeneous Services (HS) is an integrated component within the Oracle Database server
and the enabling technology for the current suite of Oracle Transparent Gateway products.
HS provides the common architecture and administration mechanisms for Oracle Database
gateway products and other heterogeneous access facilities. Also, it provides upwardly
compatible functionality for users of most of the earlier Oracle Transparent Gateway
releases.
Distributed transaction
A distributed transaction is an operations bundle, in which two or more network hosts are
involved. Usually, hosts provide transactional resources, while the transaction manager is
responsible for creating and managing a global transaction that encompasses all operations
against such resources. Distributed transactions, as any other transactions, must have all
four ACID (atomicity, consistency, isolation, durability) properties, where atomicity
guarantees all-or-nothing outcomes for the unit of work (operations bundle).
Open Group, a vendor consortium, proposed the X/Open Distributed Transaction
Processing (DTP) Model (X/Open XA), which became a de facto standard for behavior of
transaction model components.
Databases are common transactional resources and, often, transactions span a couple of
such databases. In this case, a distributed transaction can be seen as a database transaction
that must be synchronized (or provide ACID properties) among multiple participating
databases which are distributed among different physical locations. The isolation property
(the I of ACID) poses a special challenge for multidatabase transactions, since the (global)
serializability property could be violated, even if each database provides it (see also global
serializability). In practice most commercial database systems use strong strict two-phase
locking (SS2PL) for concurrency control, which ensures global serializability if all the
participating databases employ it. (see also commitment ordering for multidatabases.)
A common algorithm for ensuring correct completion of a distributed transaction is the
two-phase commit (2PC). This algorithm is usually applied for updates able to commit in a
short period of time, ranging from a couple of milliseconds to a couple of minutes.
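As an illustration of the all-or-nothing decision that 2PC enforces, here is a hedged Python sketch of a coordinator and its participants. The class names, method names, and in-memory 'messages' are invented for the example; real implementations would add logging, timeouts, and failure handling.

# Sketch of two-phase commit: the coordinator asks every participant to
# prepare, and commits only if all of them vote yes; otherwise it aborts.

class Participant:
    def __init__(self, name, can_commit=True):
        self.name, self.can_commit, self.state = name, can_commit, "active"

    def prepare(self):
        # Phase 1: vote; a real resource manager would force a prepare record
        # to its log before answering yes.
        self.state = "prepared" if self.can_commit else "aborted"
        return self.can_commit

    def commit(self):
        self.state = "committed"

    def abort(self):
        self.state = "aborted"

def two_phase_commit(participants):
    # Phase 1: collect votes from every participant.
    votes = [p.prepare() for p in participants]
    # Phase 2: the all-or-nothing (atomicity) decision.
    if all(votes):
        for p in participants:
            p.commit()
        return "committed"
    for p in participants:
        if p.state == "prepared":
            p.abort()
    return "aborted"

dbs = [Participant("orders"), Participant("inventory"), Participant("billing")]
print(two_phase_commit(dbs))                               # -> committed
dbs = [Participant("orders"), Participant("inventory", can_commit=False)]
print(two_phase_commit(dbs))                               # -> aborted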
There are also long-lived distributed transactions, for example a transaction to book a trip,
which consists of booking a flight, a rental car and a hotel. Since booking the flight might
take up to a day to get a confirmation, two-phase commit is not applicable here: it would
lock the resources for that long. In this case, more sophisticated techniques that involve
multiple undo levels are used. Just as you can undo a hotel booking by calling the desk and
cancelling the reservation, a system can be designed to undo certain operations (unless
they are irreversibly finished).
In practice, long-lived distributed transactions are implemented in systems based on Web
Services. Usually these transactions utilize principles of Compensating transactions,
Optimism and Isolation Without Locking. X/Open standard does not cover long-lived DTP.
The most common distributed concurrency control technique is strong strict two-phase
locking (SS2PL, also named rigorousness), which is also a common centralized concurrency
control technique. SS2PL provides the serializability, strictness, and commitment
ordering properties. Strictness, a special case of recoverability, is utilized for effective
recovery from failure, and commitment ordering allows participating in a general solution
for global serializability. For large-scale distribution and complex transactions, distributed
locking's typical heavy performance penalty (due to delays and latency) can be avoided by using
the atomic commitment protocol, which is needed in a distributed database for
(distributed) transactions' atomicity (e.g., two-phase commit, or a simpler one in a reliable
system), together with some local commitment ordering variant (e.g., local SS2PL) instead
of distributed locking, to achieve global serializability in the entire system. All the
commitment ordering theoretical results are applicable whenever atomic commitment is
utilized over partitioned, distributed, recoverable (transactional) data, including automatic
distributed deadlock resolution. Such a technique can also be utilized for a large-scale
parallel database, where a single large database, residing on many nodes and using a
distributed lock manager, is replaced with a (homogeneous) multidatabase comprising
many relatively small databases (loosely defined; any process that supports transactions
over partitioned data and participates in atomic commitment complies), each fitting into a
single node, and using commitment ordering (e.g., SS2PL, strict CO) together with some
appropriate atomic commitment protocol (without using a distributed lock manager).
If the message is, "RESETLOGS after complete recovery through change scn," you have
performed a complete recovery. Do not recover any of the other databases.
If the message is, "RESETLOGS after incomplete recovery UNTIL CHANGE scn," you have
performed an incomplete recovery. Record the SCN number from the message.
3. Recover all other databases in the distributed database system using change-based
recovery, specifying the SCN from Step 2.
-- Distributed recovery is more complicated than centralized database recovery because
failures can occur at the communication links or a remote site. Ideally, a recovery system
should be simple, incur tolerable overhead, maintain system consistency, provide partial
operability and avoid global rollback.
IMPQ
entity-relationship model (diagram)
Also called an entity-relationship (ER) diagram, this is a graphical representation of entities and their relationships to each
other, typically used in computing in regard to the organization of data within databases or information systems. An entity is a
piece of data: an object or concept about which data is stored.
A relationship is how the data is shared between entities. There are three types of relationships between entities:
1. One-to-One
One instance of an entity (A) is associated with one other instance of another entity (B). For example,
in a database of employees, each employee name (A) is associated with only one social security
number (B).
2. One-to-Many
One instance of an entity (A) is associated with zero, one or many instances of another
entity (B), but for one instance of entity B there is only one instance of entity A. For
example, for a company with all employees working in one building, the building name (A)
is associated with many different employees (B), but those employees all share the same
singular association with entity A.
3. Many-to-Many
One instance of an entity (A) is associated with one, zero or many instances of another
entity (B), and one instance of entity B is associated with one, zero or many instances of
entity A. For example, for a company in which all of its employees work on multiple
projects, each instance of an employee (A) is associated with many instances of a project
(B), and at the same time, each instance of a project (B) has multiple employees (A) associated with it.
In relational databases, an entity often maps to a table. An attribute is a component of an
entity and helps define the uniqueness of the entity. In relational databases, an attribute
maps to a column.
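To make the mapping concrete, the following small Python sketch models the three relationship types with plain dictionaries and an association (junction) structure for the many-to-many case; all table and column names (employees, buildings, projects, assignments) are made up for the example.

# Illustrative sketch of how the three relationship types can be represented.

# One-to-one: each employee row carries exactly one social security number.
employees = {1: {"name": "Ada",  "ssn": "111-22-3333"},
             2: {"name": "Alan", "ssn": "444-55-6666"}}

# One-to-many: many employees reference the single building they work in.
buildings = {10: {"name": "HQ"}}
employee_building = {1: 10, 2: 10}              # employee_id -> building_id

# Many-to-many: a junction table pairs employee ids with project ids.
projects = {100: {"title": "Payroll"}, 200: {"title": "Inventory"}}
assignments = [(1, 100), (1, 200), (2, 100)]    # (employee_id, project_id)

def projects_of(employee_id):
    return [projects[p]["title"] for (e, p) in assignments if e == employee_id]

def employees_on(project_id):
    return [employees[e]["name"] for (e, p) in assignments if p == project_id]

print(projects_of(1))        # -> ['Payroll', 'Inventory']
print(employees_on(100))     # -> ['Ada', 'Alan']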
You can create the logical design using a pen and paper, or you can use a design tool such as
Oracle Warehouse Builder or Oracle Designer.
While entity-relationship diagramming has traditionally been associated with highly
normalized models such as online transaction processing (OLTP) applications, the
technique is still useful in dimensional modeling. You just approach it differently. In
dimensional modeling, instead of seeking to discover atomic units of information and all of
the relationships between them, you try to identify which information belongs to a central
fact table(s) and which information belongs to its associated dimension tables.
One output of the logical design is a set of entities and attributes corresponding to fact
tables and dimension tables. Another output is a mapping of operational data from your
source into subject-oriented information in your target data warehouse schema. You
identify business subjects or fields of data, define relationships between business subjects,
and name the attributes for each subject.
The elements that help you to determine the data warehouse schema are the model of your
source data and your user requirements. Sometimes, you can get the source model from
your company's enterprise data model and reverse-engineer the logical data model for the
data warehouse from this. The physical implementation of the logical data warehouse
model may require some changes due to your system parameters--size of machine, number
of users, storage capacity, type of network, and software.
Data Warehousing Schemas
A schema is a collection of database objects, including tables, views, indexes, and
synonyms. There are a variety of ways of arranging schema objects in the schema models
designed for data warehousing. Most data warehouses use a dimensional model.
Star Schemas
The star schema is the simplest data warehouse schema. It is called a star schema because
the diagram of a star schema resembles a star, with points radiating from a center. The
center of the star consists of one or more fact tables and the points of the star are the
dimension tables shown in Figure 2-1:
Unlike other database structures, in a star schema, the dimensions are denormalized. That
is, the dimension tables have redundancy which eliminates the need for multiple joins on
dimension tables. In a star schema, only one join is needed to establish the relationship
between the fact table and any one of the dimension tables.
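To illustrate the single-join property, here is a small sketch using pandas (chosen only for illustration); the table and column names are invented, and the denormalized product dimension carries the category directly, so one merge relates facts to any dimension attribute.

# Sketch of the 'single join' property of a star schema, using pandas and
# made-up table and column names.

import pandas as pd

product_dim = pd.DataFrame({          # denormalized: category repeated per row
    "product_id": [1, 2, 3],
    "product_name": ["Pen", "Pencil", "Notebook"],
    "category": ["Stationery", "Stationery", "Paper"],
})

sales_fact = pd.DataFrame({           # facts plus foreign keys to dimensions
    "product_id": [1, 1, 2, 3],
    "time_id": [20240101, 20240102, 20240101, 20240102],
    "amount": [5.0, 7.5, 1.2, 3.4],
})

# One join is enough to relate the fact table to the product dimension,
# after which facts can be aggregated by any dimension attribute.
report = (sales_fact.merge(product_dim, on="product_id")
                    .groupby("category")["amount"].sum())
print(report)
# category
# Paper          3.4
# Stationery    13.7
# Name: amount, dtype: float64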
The main advantage to a star schema is optimized performance. A star schema keeps
queries simple and provides fast response time because all the information about each
level is stored in one row. See Chapter 16, "Schemas", for further information regarding
schemas.
Note:
Oracle recommends you choose a star schema unless you have a clear
reason not to.
Other Schemas
Some schemas use third normal form rather than star schemas or the dimensional model.
Data Warehousing Objects
The following types of objects are commonly used in data warehouses:
Fact tables are the central tables in your warehouse schema. Fact tables typically
contain facts and foreign keys to the dimension tables. Fact tables represent data,
usually numeric and additive, that can be analyzed and examined. Examples include
Sales, Cost, and Profit.
Dimension tables, also known as lookup or reference tables, contain the relatively
static data in the warehouse. Examples are stores or products.
Fact Tables
A fact table is a table in a star schema that contains facts. A fact table typically has two
types of columns: those that contain facts, and those that are foreign keys to dimension
tables. A fact table might contain either detail-level facts or facts that have been aggregated.
Fact tables that contain aggregated facts are often called summary tables. A fact table
usually contains facts with the same level of aggregation.
Values for facts or measures are usually not known in advance; they are observed and
stored.
Fact tables are the basis for the data queried by OLAP tools.
Creating a New Fact Table
You must define a fact table for each star schema. A fact table typically has two types of
columns: those that contain facts, and those that are foreign keys to dimension tables. From
a modeling standpoint, the primary key of the fact table is usually a composite key that is
made up of all of its foreign keys; in the physical data warehouse, the data warehouse
administrator may or may not choose to create this primary key explicitly.
Facts support mathematical calculations used to report on and analyze the business. Some
numeric data are dimensions in disguise, even if they seem to be facts. If you are not
interested in a summarization of a particular item, the item may actually be a dimension.
Database size and overall performance improve if you categorize borderline fields as
dimensions.
Dimensions
A dimension is a structure, often composed of one or more hierarchies, that categorizes
data. Several distinct dimensions, combined with measures, enable you to answer business
questions. Commonly used dimensions are Customer, Product, and Time. Figure 2-2 shows
a typical dimension hierarchy.
Dimension data is typically collected at the lowest level of detail and then aggregated into
higher level totals, which is more useful for analysis. For example, in the Total_Customer
dimension, there are four levels: Total_Customer, Regions, Territories, and Customers. Data
collected at the Customers level is aggregated to the Territories level. For the Regions
dimension, data collected for several regions such as Western Europe or Eastern Europe
might be aggregated as a fact in the fact table into totals for a larger area such as Europe.