Review 2

Database: Review
Sept. 2004 Yangjun Chen 91.3902 1

Database
- Introduction
- B+-tree
- Hashing

Database: Review
What is a database?
The main characteristics of a database
The basic database design method
The entity-relationship data model for application modeling
Database: Review
The main characteristics of the database approach:
- single repository of data
- sharable by multiple users
- concurrency control and transaction concept
- security and integrity constraints
- self-describing: the system catalogue contains meta data
- program-data independence: some changes to the database are transparent to programs/users
- multiple views of data to support the individual needs of programs/users
Database: Review
Concepts and Architecture
- database schema, schema evolution, database state
- working process with a database system
- database system architecture
- data independence concept
Database: Review
Database schema, relation schema, schema evolution, database state

Example database state:

Student
Name   StNo  Class  Major
Smith  17    1      CS
Brown  8     2      CS

Course
CName     CNo   CrHrs  Dept
Database  8803  3      CS
C         2606  3      CS

Section
SId  CNo   Semester  Yr    Instructor
32   8803  Spring    2000  Smith
25   8803  Winter    2000  Smith
43   2606  Spring    2000  Jones

Grades
StNo  SId  Grade
17    25   A
17    43   B
Database: Review
Working process with a database system:
Definition
- record structure, data elements, names, data types, constraints, etc.
Construction
- create database files
- populate the database with records
Manipulation
- querying
- updating
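The three phases can be sketched with Python's built-in sqlite3 module. This is a minimal illustration, assuming an in-memory SQLite database and the Student relation from the example state; the CHECK range on Class is a hypothetical constraint added only to show where constraints belong in the definition phase. The last query reads the system catalogue, which is what "self-describing" means in practice.

```python
import sqlite3

con = sqlite3.connect(":memory:")

# Definition: record structure, data element names, data types, constraints.
# (The CHECK range on Class is a hypothetical constraint for illustration.)
con.execute("""
    CREATE TABLE Student (
        Name  TEXT NOT NULL,
        StNo  INTEGER PRIMARY KEY,
        Class INTEGER CHECK (Class BETWEEN 1 AND 5),
        Major TEXT
    )""")

# Construction: populate the database with records.
con.executemany("INSERT INTO Student VALUES (?, ?, ?, ?)",
                [("Smith", 17, 1, "CS"), ("Brown", 8, 2, "CS")])

# Manipulation: updating, then querying.
con.execute("UPDATE Student SET Class = 3 WHERE StNo = 17")
rows = con.execute("SELECT Name, Class FROM Student ORDER BY StNo").fetchall()

# The definition itself is kept as meta data in the system catalogue (sqlite_master).
catalog = con.execute("SELECT name FROM sqlite_master WHERE type = 'table'").fetchall()
print(rows)     # [('Brown', 2), ('Smith', 3)]
print(catalog)  # [('Student',)]
```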
Database: Review
Database Management System (DBMS)
- a collection of software facilitating the definition, construction and manipulation of databases
[Figure: users/actors send requests to the DBMS, whose request manager, storage manager and query evaluation components work on the meta data and the stored database.]
Database: Review
Three-schema architecture
- external level: several external views, each describing a specific user's or group's view of the database
- conceptual level: the conceptual schema, which describes the whole database for all users
- internal level: the internal schema, which describes the physical storage structures and details
Database: Review
Data modeling using the ER model
Entity-relationship model
- entity types
  - strong entities
  - weak entities
- relationships among entities
- attributes: attribute classification
- constraints
  - cardinality constraints
  - participation constraints
ER-to-relational mapping
Database: Review
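[Figure: ER model of the COMPANY database. Entity types: employee (name composed of fname, minit and lname; ssn; bdate; sex; address; salary), department (name, number, location, number of employees), project (name, number, location) and dependent (name, sex, birthdate, relationship). Relationship types: works for, manages (with startdate), works on (with hours), controls, dependents of, and supervision (roles supervisor and supervisee), annotated with 1, N and M cardinality ratios.]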
Database: Review

Hashing technique
- external hashing
- static hashing and dynamic hashing
- hash function: a mathematical function that maps a key to a bucket address
- collisions
- collision resolution scheme
  - open addressing
  - chaining
  - multiple hashing
- linear hashing
Database: Review
External hashing: the data are on the disk.
Static hashing:
- a hash function maps keys to bucket addresses
- the primary area cannot be changed
- collision resolution scheme:
  - open addressing
  - chaining
  - multiple hashing
Dynamic hashing:
- the primary area can be changed
- linear hashing
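Static hashing with chaining can be sketched as follows. This is a minimal sketch, assuming integer keys, the hash function h(key) = key mod M, a fixed bucket capacity, and chaining as the collision resolution scheme (open addressing and multiple hashing are not shown). The class name and the per-bucket overflow lists standing in for the overflow area are hypothetical.

```python
class StaticHashFile:
    """Static external hashing: M primary buckets, fixed for the life of the file."""

    def __init__(self, m=5, bucket_capacity=2):
        self.m = m
        self.capacity = bucket_capacity
        self.primary = [[] for _ in range(m)]   # primary area (cannot grow)
        self.overflow = [[] for _ in range(m)]  # overflow chain per bucket

    def h(self, key):
        # Hash function: maps a key to a bucket address in 0..M-1.
        return key % self.m

    def insert(self, key):
        b = self.h(key)
        if len(self.primary[b]) < self.capacity:
            self.primary[b].append(key)
        else:
            # Collision in a full bucket: resolve it by chaining into the overflow area.
            self.overflow[b].append(key)

    def search(self, key):
        b = self.h(key)
        return key in self.primary[b] or key in self.overflow[b]


f = StaticHashFile(m=5, bucket_capacity=2)
for k in [3, 8, 13, 18, 4]:   # 3, 8, 13 and 18 all hash to bucket 3
    f.insert(k)
print(f.primary[3], f.overflow[3])  # [3, 8] [13, 18]
```

Because the primary area never grows, a skewed key set keeps lengthening the chains; this is the motivation for dynamic schemes such as linear hashing.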
Database: Review
Linear hashing:

1. What is a phase?
2. How to split a bucket?
3. When to split a bucket?
4. What bucket will be chosen to split next?
Database: Review
Linear hashing:
- initially the hash file contains M buckets
- $h_i(\text{key}) = \text{key} \bmod 2^i M \quad (i = 0, 1, 2, \ldots)$
- the insertion process can be divided into several phases
phase 1:
- insertion using $h_0(\text{key}) = \text{key} \bmod M$; a key whose $h_0$ address is a bucket already split in this phase ($h_0(\text{key}) < n$) is placed using $h_1$ instead
- splitting using $h_1(\text{key}) = \text{key} \bmod 2M$
- splitting rule: overflow of a bucket, or load factor > a constant (e.g., 0.70)
- overflow records are put in the overflow area or redistributed by splitting a bucket
- buckets are split in order from n = 0 to n = M - 1 (after each split, n is increased by 1)
- phase 1 finishes when n = M (the primary area then becomes 2M buckets long)
Database: Review
phase 2:
- insertion using $h_1(\text{key}) = \text{key} \bmod 2M$ (or $h_2$ if $h_1(\text{key}) < n$)
- splitting using $h_2(\text{key}) = \text{key} \bmod 4M$
- splitting rule: overflow of a bucket, or load factor > a constant (e.g., 0.70)
- overflow records are put in the overflow area or redistributed by splitting a bucket
- buckets are split in order from n = 0 to n = 2M - 1 (after each split, n is increased by 1)
- phase 2 finishes when n = 2M (the primary area will then contain 4M buckets)

phase 3: insertion using $h_2(\text{key}) = \text{key} \bmod 4M$, splitting using $h_3(\text{key}) = \text{key} \bmod 8M$, ...
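The phases, the split pointer n and the choice between $h_i$ and $h_{i+1}$ can be sketched in a few lines. This is a minimal sketch under stated assumptions: integer keys, a split triggered only by the load-factor rule (the bucket-overflow trigger is omitted), and records beyond a bucket's capacity simply kept in the same Python list as if they sat in its overflow area. The class and attribute names are hypothetical; `level` = 0 corresponds to phase 1 on the slide.

```python
class LinearHashFile:
    def __init__(self, m=4, capacity=2, max_load=0.7):
        self.m = m                # initial number of buckets M
        self.capacity = capacity  # records per primary bucket
        self.max_load = max_load  # split when the load factor exceeds this
        self.level = 0            # phase = level + 1
        self.n = 0                # next bucket to split in this phase
        self.buckets = [[] for _ in range(m)]
        self.count = 0

    def _h(self, i, key):
        return key % (2 ** i * self.m)          # h_i(key) = key mod 2^i M

    def _address(self, key):
        b = self._h(self.level, key)
        if b < self.n:                          # bucket already split this phase
            b = self._h(self.level + 1, key)
        return b

    def insert(self, key):
        self.buckets[self._address(key)].append(key)
        self.count += 1
        if self.count / (len(self.buckets) * self.capacity) > self.max_load:
            self._split()

    def _split(self):
        # Split bucket n (not necessarily the one that just received the key).
        old, self.buckets[self.n] = self.buckets[self.n], []
        self.buckets.append([])                 # new bucket at address n + 2^level * M
        for key in old:                         # redistribute with h_{level+1}
            self.buckets[self._h(self.level + 1, key)].append(key)
        self.n += 1
        if self.n == 2 ** self.level * self.m:  # phase finished: file has doubled
            self.level += 1
            self.n = 0

    def search(self, key):
        return key in self.buckets[self._address(key)]


f = LinearHashFile(m=4, capacity=2)
for k in range(20):
    f.insert(k)
print(f.level, f.n, len(f.buckets))
```

Splitting in a fixed order (bucket n, not the overflowing bucket) is what lets a single integer n tell every lookup which hash function to apply.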
Database: Review
Multi-level index
Tree
- root, internal, leaf, subtree
- parent, child, sibling
- balanced, unbalanced
B+-tree
- splits on overflow; merges on underflow
- in practice it is usually 3 or 4 levels deep
- search, insert, delete algorithms
Database: Review
B+-tree insertion: leaf node splitting, internal node splitting
Leaf splitting

When a leaf splits, a new leaf is allocated:
- the original leaf is the left sibling, the new one is the right sibling
- key and pointer pairs are redistributed: the left sibling will have smaller keys than the right sibling
- a 'copy' of the key value which is the largest of the keys in the left sibling is promoted to the parent

Example (at most 3 keys per leaf), insert 31:
before: parent [33]; leaves [12 22 33] [44 48 55]
after: parent [22 33]; leaves [12 22] [31 33] [44 48 55]
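The leaf-splitting step can be sketched as a small function. This is a minimal sketch, assuming a leaf is a sorted Python list of keys (record pointers are omitted) with at most `capacity` keys; the function name is hypothetical. It returns the copy of the left sibling's largest key that goes up to the parent.

```python
import bisect


def insert_into_leaf(leaf, key, capacity):
    """Insert key into a sorted leaf holding at most `capacity` keys.

    Returns (left, right, promoted). When no split is needed, right and
    promoted are None; otherwise promoted is a COPY of the largest key in left.
    """
    keys = list(leaf)
    bisect.insort(keys, key)
    if len(keys) <= capacity:
        return keys, None, None
    mid = (len(keys) + 1) // 2         # left sibling keeps the smaller half
    left, right = keys[:mid], keys[mid:]
    return left, right, left[-1]       # the key also stays in the left leaf


print(insert_into_leaf([12, 22, 33], 31, 3))  # ([12, 22], [31, 33], 22)
```

The caller then inserts the promoted key and a pointer to `right` into the parent, which may in turn overflow and split as an internal node.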
Database: Review
Internal node splitting

If an internal node splits and it is not the root:
- insert the key and pointer, and then determine the middle key
- a new 'right' sibling is allocated
- everything to the left of the middle key stays in the left sibling
- everything to its right goes into the right sibling
- the middle key value, along with the pointer to the new right sibling, is promoted to the parent (the middle key value 'moves' to the parent to become the discriminator between this left and right sibling)

Example, insert 26:
before: parent [55]; internal node [22 33]
after: parent [26 55]; internal nodes [22] [33]
Database: Review
Internal node splitting

When a new root is formed, a key value and two pointers must be placed into it.

Example, insert 40:
before: root [26 55]
after: the root receives 40 and splits; the new root [40] points to [26] and [55]
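Internal-node splitting and root creation can be sketched together. This is a minimal sketch, assuming a node is a `(keys, children)` pair with `len(children) == len(keys) + 1`, `max_keys` = p - 1, and child pointers represented by hypothetical placeholder strings. Unlike a leaf split, the middle key moves up and is not kept in either half.

```python
import bisect


def split_internal(keys, children, key, right_child, max_keys):
    """Insert (key, pointer to right_child) into an internal node.

    Returns (node, None, None) if the node does not overflow, otherwise
    (left_node, middle_key, right_node); each node is a (keys, children) pair.
    """
    i = bisect.bisect_left(keys, key)
    keys = keys[:i] + [key] + keys[i:]
    children = children[:i + 1] + [right_child] + children[i + 1:]
    if len(keys) <= max_keys:
        return (keys, children), None, None
    mid = len(keys) // 2
    left = (keys[:mid], children[:mid + 1])        # everything left of the middle key
    right = (keys[mid + 1:], children[mid + 1:])   # everything right of it
    return left, keys[mid], right                  # the middle key moves to the parent


def grow_root(left, middle, right):
    # A new root holds exactly one key value and two pointers.
    return ([middle], [left, right])


# Insert 26 into [22 33] (p = 3, so at most 2 keys per node).
print(split_internal([22, 33], ["c1", "c2", "c3"], 26, "new", 2))
# Insert 40 into the root [26 55]: the root splits and a new root is formed.
l, m, r = split_internal([26, 55], ["A", "B", "C"], 40, "D", 2)
print(grow_root(l, m, r))
```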
Database: Review
Deleting nodes from a B+-tree:
1. When deleting a key from a node A, check whether the number of remaining keys (or pointers) is still $\geq \lceil p/2 \rceil$.
2. If it is not, redistribute keys with the left sibling B or the right sibling C if possible. Otherwise, merge A and B or merge A and C.
3. When redistributing or merging, change the key values in the parent node so that every internal node
$$\langle P_1, K_1, P_2, K_2, \ldots, P_{q-1}, K_{q-1}, P_q \rangle$$
satisfies the following conditions:
- $K_1 < K_2 < \cdots < K_{q-1}$ (i.e., the keys form an ordered set);
- for every key value X in the subtree pointed to by $P_i$:
  - $K_{i-1} < X \le K_i$ for $1 < i < q$,
  - $X \le K_1$ for $i = 1$,
  - $K_{q-1} < X$ for $i = q$.
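The ordering condition in step 3 can be checked recursively by passing each child the bounds $(K_{i-1}, K_i]$ from its parent. This is a minimal sketch, assuming a hypothetical dict representation (`"keys"` for every node, `"children"` only for internal nodes) and treating a missing bound as unbounded; it checks ordering only, not occupancy.

```python
def check_node(node, low=None, high=None):
    """Check K_1 < ... < K_{q-1} and K_{i-1} < X <= K_i for every X under P_i.

    low/high are the exclusive lower and inclusive upper bounds inherited
    from the ancestors (None means unbounded).
    """
    keys = node["keys"]
    if any(a >= b for a, b in zip(keys, keys[1:])):
        return False                          # keys must form an ordered set
    if any((low is not None and k <= low) or (high is not None and k > high)
           for k in keys):
        return False                          # key outside the inherited range
    children = node.get("children")
    if children is None:
        return True                           # leaf: nothing further to check
    if len(children) != len(keys) + 1:
        return False                          # q pointers for q - 1 keys
    bounds = [low] + keys + [high]            # K_0 = low, K_q = high
    return all(check_node(child, bounds[i], bounds[i + 1])
               for i, child in enumerate(children))


tree = {"keys": [5], "children": [
    {"keys": [3], "children": [{"keys": [1, 3]}, {"keys": [5]}]},
    {"keys": [7, 8], "children": [{"keys": [6, 7]}, {"keys": [8]}, {"keys": [9, 12]}]},
]}
print(check_node(tree))  # True
```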
Database: Review
A B+-tree with p = 3 and p_leaf = 2:
root: [5]
internal nodes: [3] [7 8]
leaves (pointing to the records): [1 3] [5] [6 7] [8] [9 12]
Database: Review
Entry deletion
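- deletion sequence: 8, 12, 9, 7
After deleting 8: root [5]; internal nodes [3] [7 9]; leaves [1 3] [5] [6 7] [9] [12]
Deleting 8 causes a redistribution: the emptied leaf takes 9 from its right sibling [9 12], and the parent key 8 becomes 9.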
Database: Review
Entry deletion
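After deleting 12: root [5]; internal nodes [3] [7]; leaves [1 3] [5] [6 7] [9]
12 is removed: its leaf becomes empty and is dropped, and the parent loses the key 9.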
Database: Review
Entry deletion
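After deleting 9: root [5]; internal nodes [3] [6]; leaves [1 3] [5] [6] [7]
9 is removed: the emptied leaf takes 7 from its left sibling [6 7], and the parent key becomes 6.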
Database: Review
Entry deletion
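After deleting 7: root [5]; internal nodes [3] [6]; leaves [1 3] [5] [6]
Deleting 7 makes the pointer to its leaf useless, leaving the internal node with a single child. Therefore, a merge at the level above the leaf level occurs.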
Database: Review
Entry deletion
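For this merge, 5 is taken as a key value in the merged node A, since every key value in B (the subtree to the left of 5) is less than or equal to 5, but every key value in C (the subtree to the right of 5) is larger than 5.
After the merge: A = [3 5] with leaves [1 3] [5] [6]. The old root [5] is left with a single pointer, which becomes useless, so the corresponding node should also be removed.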
Database: Review
Entry deletion
Result: root [3 5]; leaves [1 3] [5] [6]
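The leaf-level part of this deletion sequence can be replayed with a small function. This is a minimal sketch, assuming it sees only the leaves under one parent node (sorted Python lists), a minimum occupancy of $\lceil p_{leaf}/2 \rceil$ keys, and a policy of trying the right sibling before the left one, which is what the slides do when 8 and 9 are deleted. The function name is hypothetical, and the merge one level up (when the parent is left with a single child) is not handled.

```python
from math import ceil


def delete_from_leaves(children, key, p_leaf=2):
    """Delete key from the leaves under ONE parent (children is modified in place).

    Returns the parent's new separator keys: the largest key of every leaf
    except the last, matching the K_{i-1} < X <= K_i convention.
    """
    min_keys = ceil(p_leaf / 2)
    i = next(idx for idx, leaf in enumerate(children) if key in leaf)
    children[i].remove(key)
    if len(children[i]) < min_keys:                  # underflow
        left = children[i - 1] if i > 0 else None
        right = children[i + 1] if i + 1 < len(children) else None
        if right is not None and len(right) > min_keys:
            children[i].append(right.pop(0))         # redistribute from the right
        elif left is not None and len(left) > min_keys:
            children[i].insert(0, left.pop())        # redistribute from the left
        elif left is not None:
            left.extend(children.pop(i))             # merge into the left sibling
        elif right is not None:
            right[:0] = children.pop(i)              # merge into the right sibling
    return [leaf[-1] for leaf in children[:-1]]


leaves = [[6, 7], [8], [9, 12]]       # leaves under the internal node [7 8]
for k in [8, 12, 9, 7]:
    print(k, delete_from_leaves(leaves, k), leaves)
```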
Database: Review
Data modeling using the relational model; relational algebra
Relational Data Model
- relational schema, relations
- database schema, database state
- integrity constraints and updating
Relational algebra
- select, project, join, cartesian product
- division
- set operations: union, intersection, difference

Database: Review
Integrity Constraints
any database will have some number of constraints that must
be applied to ensure correct data (valid states)
1. domain constraints
a domain is a restriction on the set of valid values
domain constraints specify that the value of each
attribute A must be an atomic value from the domain
dom(A).
2. key constraints
- a superkey is any combination of attributes that uniquely identifies a tuple: for any two distinct tuples t1 and t2, t1[superkey] ≠ t2[superkey].
  Example: <Name, SSN> (in Employee)
- a key is a superkey with a minimal set of attributes (no attribute can be removed without losing uniqueness).
  Example: <SSN> (in Employee)
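The superkey and key definitions can be tested against a relation state. This is a minimal sketch, assuming tuples are Python dicts and a hypothetical sample state of Employee. Note that a single state can only refute a superkey; a key constraint is a statement about all legal states, so passing this test is evidence, not proof.

```python
from itertools import combinations


def is_superkey(attrs, state):
    """True if no two tuples of this relation state agree on attrs."""
    seen = set()
    for t in state:
        value = tuple(t[a] for a in attrs)
        if value in seen:
            return False
        seen.add(value)
    return True


def is_key(attrs, state):
    """A key is a superkey none of whose proper subsets is a superkey."""
    return is_superkey(attrs, state) and not any(
        is_superkey(sub, state)
        for r in range(len(attrs)) for sub in combinations(attrs, r))


employee = [  # hypothetical sample state
    {"Name": "Smith", "SSN": "123456789", "Dno": 5},
    {"Name": "Smith", "SSN": "666884444", "Dno": 5},
    {"Name": "Wong", "SSN": "333445555", "Dno": 5},
]
print(is_superkey(["Name", "SSN"], employee), is_key(["Name", "SSN"], employee))  # True False
print(is_key(["SSN"], employee))  # True
```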
Database: Review
If a relation schema has more than one key, each of them is
called a candidate key.
one candidate key is chosen as the primary key (PK)
A foreign key (FK) is defined as follows:
i) Consider two relation schemas R1 and R2;
ii) The attributes FK in R1 have the same domain(s) as the primary key attributes PK of R2; the attributes FK are said to reference or refer to the relation R2;
iii) A value of FK in a tuple t1 of the current state r(R1) either occurs as a value of PK for some tuple t2 in the current state r(R2) or is null. In the former case, we have t1[FK] = t2[PK], and we say that the tuple t1 references or refers to the tuple t2.
Example: Employee(SSN, ..., Dno) and Dept(Dno, ...), where Dno in Employee is an FK referencing Dept.
Database: Review
3. entity integrity
no part of a PK can be null
4. referential integrity
domain of FK must be same as domain of PK
FK must be null or have a value that appears as a PK
value
5. semantic integrity
other rules that the application domain requires:
state constraint: gross salary > net income
transition constraint: Widowed can only follow
Married; salary of an employee cannot decrease
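Entity and referential integrity can be observed directly in a DBMS. This sketch assumes SQLite via Python's sqlite3 module and a cut-down, hypothetical Employee/Dept schema. Two SQLite specifics matter here: foreign keys are enforced only after `PRAGMA foreign_keys = ON`, and a non-INTEGER primary key column needs an explicit NOT NULL, because SQLite otherwise accepts nulls there.

```python
import sqlite3


def try_inserts():
    con = sqlite3.connect(":memory:")
    con.execute("PRAGMA foreign_keys = ON")   # SQLite checks FKs only when enabled
    con.execute("CREATE TABLE Dept (Dno INTEGER PRIMARY KEY, Dname TEXT)")
    con.execute("""CREATE TABLE Employee (
                       SSN TEXT NOT NULL PRIMARY KEY,   -- entity integrity
                       Dno INTEGER REFERENCES Dept(Dno) -- referential integrity
                   )""")
    con.execute("INSERT INTO Dept VALUES (5, 'Research')")
    con.execute("INSERT INTO Employee VALUES ('123456789', 5)")     # FK matches a PK
    con.execute("INSERT INTO Employee VALUES ('333445555', NULL)")  # null FK is allowed
    outcome = {}
    for label, sql in [
        ("FK value with no matching PK", "INSERT INTO Employee VALUES ('999887777', 99)"),
        ("null PK", "INSERT INTO Employee VALUES (NULL, 5)"),
    ]:
        try:
            con.execute(sql)
            outcome[label] = "accepted"
        except sqlite3.IntegrityError:
            outcome[label] = "rejected"
    return outcome


print(try_inserts())
```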
Database: Review
Relational algebra
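Retrieve for each female employee a list of the names of her dependents:
$\text{FEMALE\_EMPS} \leftarrow \sigma_{\text{SEX}=\text{'F'}}(\text{EMPLOYEE})$
$\text{EMPNAMES} \leftarrow \pi_{\text{FNAME, LNAME, SSN}}(\text{FEMALE\_EMPS})$
$\text{ACTUAL\_DEPENDENTS} \leftarrow \text{EMPNAMES} \bowtie_{\text{SSN}=\text{ESSN}} \text{DEPENDENT}$
$\text{RESULT} \leftarrow \pi_{\text{FNAME, LNAME, DEPENDENT\_NAME}}(\text{ACTUAL\_DEPENDENTS})$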
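The same sequence of operations can be run on relations modeled as lists of Python dicts. This is a minimal sketch, assuming hypothetical sample tuples; `select`, `project` and `equijoin` are illustrative helper names, not a library API. Projection removes duplicates, as relational algebra requires.

```python
def select(relation, predicate):
    """sigma: keep the tuples that satisfy the predicate."""
    return [t for t in relation if predicate(t)]


def project(relation, attrs):
    """pi: keep only attrs and remove duplicate tuples."""
    result, seen = [], set()
    for t in relation:
        row = tuple(t[a] for a in attrs)
        if row not in seen:
            seen.add(row)
            result.append(dict(zip(attrs, row)))
    return result


def equijoin(r, s, a, b):
    """Join r and s on r.a = s.b, keeping the attributes of both tuples."""
    return [{**t, **u} for t in r for u in s if t[a] == u[b]]


# Hypothetical sample data.
EMPLOYEE = [
    {"FNAME": "John", "LNAME": "Smith", "SSN": "123456789", "SEX": "M"},
    {"FNAME": "Jennifer", "LNAME": "Wallace", "SSN": "987654321", "SEX": "F"},
    {"FNAME": "Alicia", "LNAME": "Zelaya", "SSN": "999887777", "SEX": "F"},
]
DEPENDENT = [
    {"ESSN": "987654321", "DEPENDENT_NAME": "Abner"},
    {"ESSN": "123456789", "DEPENDENT_NAME": "Alice"},
]

FEMALE_EMPS = select(EMPLOYEE, lambda t: t["SEX"] == "F")
EMPNAMES = project(FEMALE_EMPS, ["FNAME", "LNAME", "SSN"])
ACTUAL_DEPENDENTS = equijoin(EMPNAMES, DEPENDENT, "SSN", "ESSN")
RESULT = project(ACTUAL_DEPENDENTS, ["FNAME", "LNAME", "DEPENDENT_NAME"])
print(RESULT)  # [{'FNAME': 'Jennifer', 'LNAME': 'Wallace', 'DEPENDENT_NAME': 'Abner'}]
```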
Database: Review
SQL
DDL
- creating schemas
- modifying schemas
DML
- select-from-where clause
- group by, having, order by
- update
- view
Database: Review
DDL - Examples:

Create schema:
Create schema COMPANY authorization JSMITH;
Create table:
Create table EMPLOYEE
(FNAME VARCHAR(15) NOT NULL,
MINIT CHAR,
LNAME VARCHAR(15) NOT NULL,
SSN CHAR(9) NOT NULL,
BDATE DATE,
ADDRESS VARCHAR(30),
SEX CHAR,
SALARY DECIMAL(10, 2),
SUPERSSN CHAR(9),
DNO INT NOT NULL,
PRIMARY KEY(SSN),
FOREIGN KEY(SUPERSSN) REFERENCES EMPLOYEE(SSN),
FOREIGN KEY(DNO) REFERENCES DEPARTMENT(DNUMBER));
Database: Review
DDL - Examples:

drop schema
DROP SCHEMA COMPANY CASCADE;
DROP SCHEMA COMPANY RESTRICT;
drop table
DROP TABLE DEPENDENT CASCADE;
DROP TABLE DEPENDENT RESTRICT;
alter table
ALTER TABLE COMPANY.EMPLOYEE
ADD JOB VARCHAR(12);
ALTER TABLE COMPANY.EMPLOYEE
DROP ADDRESS CASCADE;

Database: Review
DML - select-from-where clause

Retrieve a list of employees and the projects they are working on, ordered by
department, within each department, ordered alphabetically by last name, first
name:

SELECT DNAME, LNAME, FNAME, PNAME
FROM DEPARTMENT, EMPLOYEE, WORKS_ON, PROJECT
WHERE DNUMBER = DNO AND SSN = ESSN AND
PNO = PNUMBER
ORDER BY DNAME, LNAME, FNAME

order by clause
group by clause
having clause
aggregation functions: max, min, avg (average), count, sum
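GROUP BY, HAVING, ORDER BY and the aggregation functions can be tried out with SQLite. This sketch assumes Python's sqlite3 module and a cut-down EMPLOYEE table with hypothetical sample rows. HAVING filters whole groups after aggregation, whereas WHERE filters individual tuples before grouping.

```python
import sqlite3

con = sqlite3.connect(":memory:")
# Hypothetical sample rows; only the columns the query needs.
con.executescript("""
    CREATE TABLE EMPLOYEE (LNAME TEXT, SALARY REAL, DNO INTEGER);
    INSERT INTO EMPLOYEE VALUES
        ('Smith', 30000, 5), ('Wong', 40000, 5), ('English', 25000, 5),
        ('Zelaya', 25000, 4), ('Wallace', 43000, 4), ('Borg', 55000, 1);
""")
rows = con.execute("""
    SELECT DNO, COUNT(*), MAX(SALARY), AVG(SALARY)
    FROM EMPLOYEE
    GROUP BY DNO              -- one group per department
    HAVING COUNT(*) >= 2      -- keep only groups with at least two employees
    ORDER BY DNO
""").fetchall()
for row in rows:
    print(row)
```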
Database: Review
DML - insert, update, delete
INSERT INTO employee ( fname, lname, ssn, dno )
VALUES ( 'Joe', 'Smith', 909, 1);

UPDATE employee SET salary = 100000
WHERE ssn=909;

DELETE FROM employee WHERE ssn=909;
Note that Access changes the above INSERT to read:
INSERT INTO employee ( fname, lname, ssn, dno )
SELECT "Joe", "Smith", 909, 1;
Database: Review
View definition
Use a Create View command
essentially a select specifying the data that makes up the view
Create View Enames as select lname, fname from employee
CREATE VIEW Enames (lname, fname)
AS SELECT LNAME, FNAME
FROM EMPLOYEE
Database: Review
CREATE VIEW DEPT_INFO (DEPT_NAME,
NO_OF_EMPS,
TOTAL_SAL)
AS SELECT DNAME, COUNT(*), SUM(SALARY)
FROM DEPARTMENT, EMPLOYEE
WHERE DNUMBER = DNO
GROUP BY DNAME;
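The DEPT_INFO view can be created and queried in SQLite. This sketch assumes Python's sqlite3 module, cut-down DEPARTMENT and EMPLOYEE tables, and hypothetical sample rows. In SQLite a view is not materialized: each query recomputes it from the base tables, so a newly inserted employee appears immediately.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE DEPARTMENT (DNAME TEXT, DNUMBER INTEGER PRIMARY KEY);
    CREATE TABLE EMPLOYEE (LNAME TEXT, SALARY REAL, DNO INTEGER);
    INSERT INTO DEPARTMENT VALUES ('Research', 5), ('Administration', 4);
    INSERT INTO EMPLOYEE VALUES ('Smith', 30000, 5), ('Wong', 40000, 5),
                                ('Zelaya', 25000, 4);

    CREATE VIEW DEPT_INFO (DEPT_NAME, NO_OF_EMPS, TOTAL_SAL)
    AS SELECT DNAME, COUNT(*), SUM(SALARY)
       FROM DEPARTMENT, EMPLOYEE
       WHERE DNUMBER = DNO
       GROUP BY DNAME;
""")
# The view is queried like a base table.
rows = con.execute("SELECT * FROM DEPT_INFO ORDER BY DEPT_NAME").fetchall()
print(rows)  # [('Administration', 1, 25000.0), ('Research', 2, 70000.0)]

# The view reflects changes to the underlying tables.
con.execute("INSERT INTO EMPLOYEE VALUES ('English', 25000, 5)")
research = con.execute(
    "SELECT NO_OF_EMPS FROM DEPT_INFO WHERE DEPT_NAME = 'Research'").fetchone()
```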
Database: Review
Normalization
Functional dependencies
- data redundancy, update anomalies
- what is a functional dependency?
- inference rules, minimal set of FDs
Normal forms
- first normal form
- second normal form
- third normal form
- Boyce-Codd normal form
Database: Review
Data redundancy and update anomalies:
EmployeeDepartment(ename, ssn, bdate, address, dnumber, dname)
This is similar to Employee, but we have included dname.
Database: Review
In the two prior cases with EmployeeDepartment and
EmployeeProject, we have redundant information in the
database
if two employees work in the same department, then
that department name is replicated
if more than one employee works on a project then the
project location is replicated
if an employee works on more than one project his/her
name is replicated
Redundant data leads to
additional space requirements
update anomalies
Database: Review
Suppose EmployeeDepartment is the only relation where
department name is recorded

insert anomalies
adding a new department is complicated unless there is
also an employee for that department
deletion anomalies
if we delete all employees for some department, what
should happen to the department information?
modification anomalies
if we change the name of a department, then we must
change it in all tuples referring to that department
Database: Review
Functional dependencies:
Suppose we have a relation R comprising attributes X, Y, ...
We say a functional dependency X → Y exists between the attributes X and Y if, whenever two tuples have the same value x for X, they always have the same value y for Y.
X → Y
X is the left-hand side (LHS) and Y the right-hand side (RHS) of the dependency.
Database: Review
Student(student_no, student_name, course_no, gender)
Given a specific student number, there is only one value for student name and only one value for gender found with it:
student_no → student_name
student_no → gender
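Whether a relation state satisfies an FD follows directly from the definition: tuples that agree on X must agree on Y. This is a minimal sketch, assuming tuples are Python dicts and a hypothetical sample state of Student. As with keys, a state can refute an FD but not prove it, since an FD is an assertion about every legal state.

```python
def fd_holds(state, lhs, rhs):
    """True if X -> Y is satisfied: tuples agreeing on X also agree on Y."""
    seen = {}
    for t in state:
        x = tuple(t[a] for a in lhs)
        y = tuple(t[a] for a in rhs)
        if seen.setdefault(x, y) != y:   # same X value, different Y value
            return False
    return True


student = [  # hypothetical sample state of Student
    {"student_no": 1, "student_name": "Ann", "course_no": 3902, "gender": "F"},
    {"student_no": 1, "student_name": "Ann", "course_no": 2140, "gender": "F"},
    {"student_no": 2, "student_name": "Bob", "course_no": 3902, "gender": "M"},
]
print(fd_holds(student, ["student_no"], ["student_name", "gender"]))  # True
print(fd_holds(student, ["student_no"], ["course_no"]))               # False
```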
Database: Review
Inference Rules for Functional Dependencies
From a set of FDs, we can derive some other FDs.
Example:
F = {ssn → {ename, bdate, address, dnumber}, dnumber → {dname, dmgrssn}}
By inference, F ⊨ ssn → {dname, dmgrssn}, ssn → dnumber, dnumber → dname.
$F^+$ (closure of F): the set of all FDs that can be deduced from F (together with F itself) is called the closure of F.
Database: Review
Inference Rules for Functional Dependencies
Inference rules:
- IR1 (reflexive rule): if X ⊇ Y, then X → Y (in particular, X → X).
- IR2 (augmentation rule): {X → Y} ⊨ XZ → YZ.
- IR3 (transitive rule): {X → Y, Y → Z} ⊨ X → Z.
- IR4 (decomposition, or projective, rule): {X → YZ} ⊨ X → Y, X → Z.
- IR5 (union, or additive, rule): {X → Y, X → Z} ⊨ X → YZ.
- IR6 (pseudotransitive rule): {X → Y, WY → Z} ⊨ WX → Z.
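Computing all of $F^+$ is expensive, so a common technique (not on the slide, swapped in here) is the attribute closure $X^+$: repeatedly apply FDs whose LHS is already covered, and then $F \models X \to Y$ exactly when $Y \subseteq X^+$. This is a minimal sketch, assuming FDs are (lhs, rhs) pairs of Python sets; the function names are hypothetical.

```python
def attribute_closure(x, fds):
    """X+ under F: all attributes functionally determined by X."""
    result = set(x)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            # X -> lhs holds (IR1 + IR5 on attributes already in X+), so by IR3
            # X -> rhs holds too; add rhs to X+.
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def implies(fds, lhs, rhs):
    """F |= X -> Y  iff  Y is a subset of X+ under F."""
    return set(rhs) <= attribute_closure(lhs, fds)


F = [({"ssn"}, {"ename", "bdate", "address", "dnumber"}),
     ({"dnumber"}, {"dname", "dmgrssn"})]
print(implies(F, {"ssn"}, {"dname", "dmgrssn"}))  # True
print(implies(F, {"dnumber"}, {"ssn"}))           # False
```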
Database: Review
Equivalence of Sets of FDs
E and F are equivalent if $E^+ = F^+$.

Minimal sets of FDs
A set of FDs F is minimal if:
- every dependency has a single attribute on the RHS;
- the attributes on the LHS of each dependency are minimal;
- we cannot remove any dependency from F and still have a set of dependencies that is equivalent to F.
Example: EmployeeProject(ssn, pnumber, hours, ename, plocation) with the minimal set
{ssn, pnumber} → hours,
ssn → ename,
pnumber → plocation.
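The three conditions suggest a three-step procedure for computing a minimal set (minimal cover). This is a minimal sketch, assuming FDs are (lhs, rhs) pairs of sets and using attribute closure to test implication; when several minimal covers exist it returns one of them, depending on the order in which FDs and attributes are examined. The extra FD {ssn, pnumber} → ename in the example is a hypothetical redundant dependency added to show a reduction.

```python
def closure(x, fds):
    result = set(x)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def minimal_cover(fds):
    # Step 1: a single attribute on every RHS.
    g = [(frozenset(lhs), frozenset({a})) for lhs, rhs in fds for a in rhs]
    # Step 2: drop LHS attributes that are not needed to derive the RHS.
    reduced = []
    for lhs, rhs in g:
        lhs = set(lhs)
        for a in sorted(lhs):
            if len(lhs) > 1 and rhs <= closure(lhs - {a}, g):
                lhs.discard(a)
        reduced.append((frozenset(lhs), rhs))
    # Step 3: drop duplicates, then FDs implied by the remaining ones.
    result = list(dict.fromkeys(reduced))
    for fd in list(result):
        rest = [f for f in result if f != fd]
        if fd[1] <= closure(fd[0], rest):
            result = rest
    return result


F = [({"ssn", "pnumber"}, {"hours"}), ({"ssn"}, {"ename"}),
     ({"pnumber"}, {"plocation"}), ({"ssn", "pnumber"}, {"ename"})]
for lhs, rhs in minimal_cover(F):
    print(sorted(lhs), "->", sorted(rhs))
```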
Database: Review
Normal Forms

A series of normal forms are known that have,
successively, better update characteristics.

We'll consider 1NF, 2NF, 3NF, and BCNF.

A technique used to improve a relation is
decomposition, where one relation is replaced by two or
more relations. When we do so, we want to eliminate
update anomalies without losing any information.
Database: Review
1NF - First Normal Form
The domain of an attribute must only contain atomic values.
This disallows repeating values, sets of values, relations within relations, nested relations, and so on.
In the example database we have a department located in possibly several locations: department 5 is located in Bellaire, Sugarland, and Houston.
If we had the relation

Department
dnumber  dname     dmgrssn    dlocations
5        Research  333445555  Bellaire, Sugarland, Houston

then it would not be in 1NF because multiple values must be kept in dlocations.
Database: Review
1NF - First Normal Form
If we have a non-1NF relation, we can decompose it, or modify it appropriately, to generate 1NF relations.
There are 3 options:
option 1: split off the problem attribute into a new relation (create a DepartmentLocation relation). This is generally considered the best solution.

Department
dnumber  dname     dmgrssn
5        Research  333445555

DepartmentLocation
dnumber  dlocation
5        Bellaire
5        Sugarland
5        Houston
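Option 1 amounts to flattening the multivalued attribute into (dnumber, dlocation) pairs. This is a minimal sketch, assuming the non-1NF relation is held as Python dicts whose dlocations value is a list; `split_multivalued` is a hypothetical helper name. The primary key of the new relation is the pair {dnumber, dlocation}.

```python
def split_multivalued(rows, key, attr):
    """Move the multivalued attribute attr into its own relation.

    Returns (rows without attr, list of (key value, single attr value) tuples).
    """
    base = [{k: v for k, v in r.items() if k != attr} for r in rows]
    split = [(r[key], value) for r in rows for value in r[attr]]
    return base, split


department = [  # the non-1NF Department relation from the slide
    {"dnumber": 5, "dname": "Research", "dmgrssn": "333445555",
     "dlocations": ["Bellaire", "Sugarland", "Houston"]},
]
dept, dept_location = split_multivalued(department, "dnumber", "dlocations")
print(dept)           # [{'dnumber': 5, 'dname': 'Research', 'dmgrssn': '333445555'}]
print(dept_location)  # [(5, 'Bellaire'), (5, 'Sugarland'), (5, 'Houston')]
```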
Database: Review
2NF - Second Normal Form
Full functional dependency
X → Y is a full functional dependency if removal of any attribute A from X means that the dependency does not hold any more.
In EmployeeProject, {ssn, pnumber} → hours is a full dependency (neither ssn → hours nor pnumber → hours holds).
Database: Review
Partial functional dependency
X → Y is a partial functional dependency if some attribute A can be removed from X and the dependency still holds.
In EmployeeProject, {ssn, pnumber} → ename is a partial dependency (because ssn → ename holds).
Database: Review
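A relation schema is in 2NF if
(1) it is in 1NF and
(2) every non-key attribute is fully functionally dependent on the primary key.
If we had the relation EmployeeProject(ssn, pnumber, hours, ename, plocation) with primary key {ssn, pnumber}, then this relation would not be in 2NF because of two separate violations of the 2NF definition: ename depends on ssn alone, and plocation depends on pnumber alone.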
Database: Review
We correct this by decomposing the relation into three relations, splitting off the offending attributes, i.e., the partial dependencies on the key.
EmployeeProject(ssn, pnumber, hours, ename, plocation) is decomposed into the 2NF relations
(ssn, pnumber, hours)
(ssn, ename)
(pnumber, plocation)
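Partial dependencies can be found mechanically by computing the closure of every proper subset of the primary key. This is a minimal sketch, assuming FDs are (lhs, rhs) pairs of sets; the naive decomposition at the end (one relation per offending key part, plus the key with the remaining attributes) reproduces the three relations above but is not a general 2NF decomposition algorithm.

```python
from itertools import combinations


def closure(x, fds):
    result = set(x)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def partial_dependencies(schema, pk, fds):
    """Map each proper subset of the PK to the non-key attributes it determines."""
    nonkey = set(schema) - set(pk)
    found = {}
    for size in range(1, len(pk)):
        for part in combinations(sorted(pk), size):
            dependent = closure(part, fds) & nonkey
            if dependent:
                found[part] = dependent      # a partial dependency part -> dependent
    return found


schema = ["ssn", "pnumber", "hours", "ename", "plocation"]
F = [({"ssn", "pnumber"}, {"hours"}), ({"ssn"}, {"ename"}), ({"pnumber"}, {"plocation"})]
partial = partial_dependencies(schema, ["ssn", "pnumber"], F)

# Split each partial dependency off; the full key keeps the remaining attributes.
moved = set().union(*partial.values())
relations = [sorted(set(p) | d) for p, d in partial.items()]
relations.append(sorted(set(schema) - moved))
print(relations)  # [['plocation', 'pnumber'], ['ename', 'ssn'], ['hours', 'pnumber', 'ssn']]
```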
Database: Review
3NF - Third Normal Form
Transitive dependency
A functional dependency X → Y in a relation schema R is a transitive dependency if there is a set of attributes Z that is not a subset of any key of R, and both X → Z and Z → Y hold.
In EmployeeDept(ename, ssn, bdate, address, dnumber, dname): ssn → dnumber and dnumber → dname, so ssn → dname is a transitive dependency.
Database: Review
A relation schema is in 3NF if
(1) it is in 2NF and
(2) no non-key attribute is functionally dependent on another non-key attribute (there is no transitive dependency of a non-key attribute on the PK).
If we had the relation EmployeeDept(ename, ssn, bdate, address, dnumber, dname), then this relation would not be in 3NF because dname is functionally dependent on dnumber and neither is a key attribute.
Database: Review
We correct this by decomposing, i.e., splitting off the transitive dependencies.
EmployeeDept(ename, ssn, bdate, address, dnumber, dname) is decomposed into the 3NF relations
(ename, ssn, bdate, address, dnumber)
(dnumber, dname)
Database: Review
Boyce-Codd Normal Form, BCNF
Consider a different definition of 3NF, which is equivalent to the previous one.
A relation schema R is in 3NF if, whenever a nontrivial functional dependency X → A holds in R, either
(a) X is a superkey of R, or
(b) A is a prime attribute of R.
A superkey of a relation schema R = {A1, A2, ..., An} is a set of attributes S ⊆ R with the property that no two distinct tuples t1 and t2 in any legal state r of R will have t1[S] = t2[S].
An attribute is called a prime attribute if it is a member of some key.
Database: Review
If we remove (b) from the previous definition of 3NF, we have the definition of BCNF.
A relation schema is in BCNF if every determinant is a superkey. BCNF is stronger than 3NF:
- no partial dependencies
- no transitive dependencies where a non-key attribute is dependent on another non-key attribute
- no non-key attributes appear in the LHS of a functional dependency.

Database: Review
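Consider the relation (student_no, course_no, instr_no), where an instructor teaches one course only, and a student takes a course and has one instructor:
{student_no, course_no} → instr_no
instr_no → course_no
This relation is in 3NF (course_no is a prime attribute), but not in BCNF, because the determinant instr_no is not a superkey.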
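The alternative 3NF definition and the BCNF definition differ only in case (b), which makes them easy to test side by side. This is a minimal sketch, assuming FDs are (lhs, rhs) pairs of sets and that checking the given FDs (rather than all of $F^+$) is sufficient, which holds when testing a whole schema against its own FD set. Candidate keys are found by brute force, which is fine for small schemas only.

```python
from itertools import combinations


def closure(x, fds):
    result = set(x)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def candidate_keys(schema, fds):
    """All minimal attribute sets whose closure is the whole schema."""
    keys = []
    for size in range(1, len(schema) + 1):
        for combo in combinations(sorted(schema), size):
            s = set(combo)
            if closure(s, fds) >= schema and not any(k <= s for k in keys):
                keys.append(s)
    return keys


def normal_forms(schema, fds):
    """Return (is_3nf, is_bcnf) by checking each given X -> rhs."""
    prime = set().union(*candidate_keys(schema, fds))
    is_3nf = is_bcnf = True
    for lhs, rhs in fds:
        if closure(lhs, fds) >= schema:
            continue                   # (a) X is a superkey: fine for both forms
        for a in rhs - lhs:            # nontrivial part of the dependency
            is_bcnf = False            # BCNF allows only case (a)
            if a not in prime:
                is_3nf = False         # 3NF also accepts (b): A is prime
    return is_3nf, is_bcnf


R = {"student_no", "course_no", "instr_no"}
F = [({"student_no", "course_no"}, {"instr_no"}), ({"instr_no"}, {"course_no"})]
print(normal_forms(R, F))  # (True, False)
```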
Database: Review
This decomposition preserves all the information.

student_no  instr_no
121         99
121         77
222         66
222         77

course_no  instr_no
1803       99
1903       77
1803       66

The only FD left is instr_no → course_no, but the join preserves {student_no, course_no} → instr_no.
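Joining the two tables back on instr_no shows that no spurious tuples appear and that {student_no, course_no} → instr_no holds in the result. This is a minimal sketch, assuming each relation is an (attribute list, set of tuples) pair and using the table rows above; `natural_join` is a hypothetical helper name.

```python
def natural_join(r, s):
    """Natural join of two relations given as (attribute list, set of tuples)."""
    (ra, rt), (sa, st) = r, s
    common = [a for a in ra if a in sa]
    extra = [a for a in sa if a not in ra]
    out = set()
    for t in rt:
        for u in st:
            if all(t[ra.index(a)] == u[sa.index(a)] for a in common):
                out.add(t + tuple(u[sa.index(a)] for a in extra))
    return ra + extra, out


student_instr = (["student_no", "instr_no"], {(121, 99), (121, 77), (222, 66), (222, 77)})
course_instr = (["course_no", "instr_no"], {(1803, 99), (1903, 77), (1803, 66)})
attrs, joined = natural_join(student_instr, course_instr)
print(attrs)           # ['student_no', 'instr_no', 'course_no']
print(sorted(joined))

# {student_no, course_no} -> instr_no: each (student, course) pair occurs once.
fd_ok = len({(s, c) for s, i, c in joined}) == len(joined)
```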
Database: Review
Lossless join
Definition of lossless join property
- relation decomposition
- lossless join property
Testing algorithm
- matrix construction
- matrix initialization
- matrix modification
Database: Review
Basic definition of Lossless-join
A decomposition $D = \{R_1, R_2, \ldots, R_m\}$ of R has the lossless join property with respect to the set of dependencies F on R if, for every relation r of R that satisfies F, the following holds:
$$\ast(\pi_{R_1}(r), \ldots, \pi_{R_m}(r)) = r,$$
where $\ast$ is the natural join of all the relations in D.
The word loss in lossless refers to loss of information, not to
loss of tuples.
Database: Review
Emp_PROJ(SSN, PNUM, hours, ENAME, PNAME, PLOCATION)
F = {SSN → ENAME, PNUM → {PNAME, PLOCATION}, {SSN, PNUM} → hours}
Decomposition-1:
R1(SSN, ENAME)
R2(PNUM, PNAME, PLOCATION)
R3(SSN, PNUM, hours)
Lossless join
Database: Review
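decomposition-1
A1 = SSN, A2 = ENAME, A3 = PNUM, A4 = PNAME, A5 = PLOCATION, A6 = hours

General form of the matrix:
      A1   A2   A3   A4   A5   A6
R1    b11  b12  b13  b14  b15  b16
R2    b21  b22  b23  b24  b25  b26
R3    b31  b32  b33  b34  b35  b36

After initialization (b_ij is replaced by a_j if A_j is in R_i):
      A1   A2   A3   A4   A5   A6
R1    a1   a2   b13  b14  b15  b16
R2    b21  b22  a3   a4   a5   b26
R3    a1   b32  a3   b34  b35  a6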
Database: Review
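After applying SSN → ENAME (rows R1 and R3 agree on SSN):
      A1   A2   A3   A4   A5   A6
R1    a1   a2   b13  b14  b15  b16
R2    b21  b22  a3   a4   a5   b26
R3    a1   a2   a3   b34  b35  a6

After applying PNUM → {PNAME, PLOCATION} (rows R2 and R3 agree on PNUM):
      A1   A2   A3   A4   A5   A6
R1    a1   a2   b13  b14  b15  b16
R2    b21  b22  a3   a4   a5   b26
R3    a1   a2   a3   a4   a5   a6

Row R3 now consists entirely of a symbols, so decomposition-1 has the lossless join property.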
Database: Review
Example: decomposition-2
Emp_PROJ(SSN, PNUM, hours, ENAME, PNAME, PLOCATION)
F = {SSN → ENAME, PNUM → {PNAME, PLOCATION}, {SSN, PNUM} → hours}
R1(ENAME, PLOCATION)
R2(SSN, PNUM, hours, PNAME, PLOCATION)
Not lossless join
Database: Review
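decomposition-2
A1 = SSN, A2 = ENAME, A3 = PNUM, A4 = PNAME, A5 = PLOCATION, A6 = hours

General form of the matrix:
      A1   A2   A3   A4   A5   A6
R1    b11  b12  b13  b14  b15  b16
R2    b21  b22  b23  b24  b25  b26

After initialization:
      A1   A2   A3   A4   A5   A6
R1    b11  a2   b13  b14  a5   b16
R2    a1   b22  a3   a4   a5   a6

The matrix cannot be changed by any of the FDs
SSN → ENAME
PNUM → {PNAME, PLOCATION}
{SSN, PNUM} → hours
because R1 and R2 never agree on the LHS of a dependency (they agree only on PLOCATION). No row consists entirely of a symbols, so decomposition-2 is not lossless.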
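The matrix test can be written as a short chase procedure. This is a minimal sketch, assuming attributes are strings, each $R_i$ is a Python set and FDs are (lhs, rhs) set pairs. Symbols are tuples: `("a", j)` stands for $a_j$ and `("b", i, j)` for $b_{ij}$; since `"a"` sorts before `"b"`, taking the smaller of two symbols keeps the a symbol when there is one. When two symbols are equated, every occurrence of the dropped symbol in that column is renamed.

```python
def is_lossless(attributes, decomposition, fds):
    """Matrix (chase) test for the lossless join property."""
    col = {a: j for j, a in enumerate(attributes)}
    # Initialization: b_ij becomes a_j if A_j is in R_i.
    m = [[("a", j) if a in r else ("b", i, j) for j, a in enumerate(attributes)]
         for i, r in enumerate(decomposition)]
    changed = True
    while changed:                              # modification: repeat until stable
        changed = False
        for lhs, rhs in fds:
            for x in range(len(m)):
                for y in range(x + 1, len(m)):
                    if any(m[x][col[a]] != m[y][col[a]] for a in lhs):
                        continue                # rows do not agree on the LHS
                    for a in rhs:
                        c = col[a]
                        s, t = m[x][c], m[y][c]
                        if s != t:              # equate the two symbols in column c
                            keep, drop = min(s, t), max(s, t)
                            for row in m:
                                if row[c] == drop:
                                    row[c] = keep
                            changed = True
    # Lossless iff some row consists entirely of a symbols.
    return any(all(sym[0] == "a" for sym in row) for row in m)


ATTRS = ["SSN", "ENAME", "PNUM", "PNAME", "PLOCATION", "hours"]
F = [({"SSN"}, {"ENAME"}),
     ({"PNUM"}, {"PNAME", "PLOCATION"}),
     ({"SSN", "PNUM"}, {"hours"})]
D1 = [{"SSN", "ENAME"}, {"PNUM", "PNAME", "PLOCATION"}, {"SSN", "PNUM", "hours"}]
D2 = [{"ENAME", "PLOCATION"}, {"SSN", "PNUM", "hours", "PNAME", "PLOCATION"}]
print(is_lossless(ATTRS, D1, F))  # True
print(is_lossless(ATTRS, D2, F))  # False
```

Each change reduces the number of distinct symbols in a column, so the loop always terminates.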
Database: Review
Hierarchical databases
Hierarchical database schema
- hierarchical schema
- record type, PCR type
- virtual PCR: virtual child, virtual parent
Database languages
- HDDL
- HDML
Database: Review
[Figure: ERD for the Chapter 6 database example, with entity types employee, department, project, dependent and Dept_locations, the relationship Works on, and 1, n and m cardinality ratios.]
Database: Review
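Virtual Parent-child Relationships
- Hierarchical schema using VPCRs for a Company database
[Figure: hierarchical schema with record types Department (Dname, Dnum, ...), Project (Pname, ...), Dlocation (Location), Demployee (EPTR), Dmanager (MPTR), Pworker (Hours, WPTR), Employee (Ename, Minit, ...), Esupervisee (SPTR) and Dependent (DEPname, Minit, ...); a StartDate field also appears. The parent-child relationship types are labeled D, E, L, P, Y, M, W, S and T.]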
