Вы находитесь на странице: 1из 126

Faculty of Computer Science & Engineering

Ho Chi Minh City University of Technology

Data modeling

Dr. Vo Thi Ngoc Chau


(chauvtn@cse.hcmut.edu.vn)

Semester 1 2014-2015

Main references
[1] R. Elmasri, S. R. Navathe, Fundamentals of Database

Systems- 4th Edition, Pearson- Addison Wesley, 2003.

[2] H. G. Molina, J. D. Ullman, J. Widom, Database System


Implementation, Prentice-Hall, 2000.
[3] H. G. Molina, J. D. Ullman, J. Widom, Database Systems: The
Complete Book, Prentice-Hall, 2002

[4] A. Silberschatz, H. F. Korth, S. Sudarshan, Database System


Concepts 3rd Edition, McGraw-Hill, 1999.
[5] R. Ramakrishnan, J. Gehrke, Database Management Systems- 2rd
Edition, McGraw-Hill.
[6] M. P. Papazoglou, S. Spaccapietra, Z. Tari, Advances in ObjectOriented Data Modeling, MIT Press, 2000.
[7]. G. Simsion, Data Modeling: Theory and Practice, Technics
Publications, LLC, 2007.

Content

2.1. Concepts

2.2. Why is data modeling important?

2.3. Data modeling with entity-

relationship and relational models

2.4. Dependencies and normalization

2.5. Conclusion
3

2.1. Concepts
Data

modeling

Conceptual

data model

Entity relationship model

Representational

Relational data model

Database

data model

design

Relational database design


4

Data modeling

Modeling - Cambridge dictionary

Model = a representation of something, either


as a physical object which is usually smaller
than the real object, or as a simple description

of the object which might be used in calculations

Modeling = constructing a representation of


something, either as a physical object which is
usually smaller than the real object, or as a
simple description of the object which might be
used in calculations

Data modeling

Data modeling [7], pp. 31-32

formalizing and representing the data structures


of reality

a representation of the things of significance to


an enterprise and the relationships among those
things

Shoval and Frumermann, 1994, p28

Hay, 1996a

an attempt to capture the essence of things


both concrete and abstract

Keuffel, 1996
6

Data modeling

Data modeling [7], pp. 31-32

an abstract representation of the data about


entities, events, activities, and their associations
within an organization

McFadden, Hoffer et al., 1999

The core idea underlying all the definitions is


the same: a data model is used for describing
entities and their relationships within a core
domain.

Topi and Ramesh, 2002


7

Data modeling

Data modeling [7], pp. 31-32

Data modeling is generally viewed as a design


activity

an activity that involves the creation of


abstractions

Davydov, 1994

the art and science of arranging the structure


and relationship of data

Srinivasan and Teeni, 1990

McComb, 2004, p293

data modeling is a design discipline

Simsion and Witt, 2005, p7

Conceptual data model

A data model (aka semantic data model)

Provides concepts close to the way many users


perceive data

Provides the concepts essential for supporting


the application environment at a very high nonsystem-specific level

Used for a conceptual schema of a database


from data requirements

Example: entity-relationship model


9

Conceptual data model

Characteristics of a conceptual data model

Expressiveness

Simplicity

A small number of basic concepts that are distinct and orthogonal


in their meaning

Formality

Simple enough for an end user to use and understand an easy


diagrammatic notation

Minimality

Distinctions between data, relationships, constraints

Concepts must be formally defined state criteria for the validity


of a schema in the model

Unique interpretation

Complete and unambiguous semantics for each modeling construct

S. Navathe. Evolution of data modeling for databases. Communications of the ACM 35(9)(1992) 112-123.

10

Representational data model

A data model (aka implementation/logical data


model)

Provides concepts understood by end-users but not


too far removed from the way data is organized
within the computer

Hides some details of data storage but able to be


implemented on a computer system in a direct way
(in some DBMS)

Example: relational data model


11

Database design

Design the logical and physical structure of one or


more databases to accommodate the information

needs of the users in an organization for a defined


set of applications

The overall database design activity has to undergo


a systematic process called the design methodology,

whether the target database is managed by an


RDBMS, ODBMS, or ORDBMS, ...

The result of the design activity is a rigidly defined


database schema that cannot easily be modified
once the database is implemented.

12

Database design

Goals:

Satisfy the information content requirements of

the specified users and applications

Provide a natural and easy-to-understand

structuring of the information

Support processing requirements and any

performance objectives, such as response time,


processing time, and storage space
13

Database
design

Figure 2.1 Phases of database


design and implementation for

large databases
[1], pp. 368

14

Database design

Six main phases of the overall database


design and implementation process

Requirements collection and analysis

Conceptual database design

Choice of a DBMS

Data model mapping (also called logical

database design)

Physical database design

Database system implementation and tuning

15

Database design

Phase 1: Requirements Collection and Analysis

Identify the major application areas and user groups that will use the database
or whole work will be affected by the database

Study and analyze existing documentation concerning the applications

Study the current operating environment and planned use of the information

Analyze the types of transactions and their frequencies as well as of the flow of
information within the system
Study geographic characteristics regarding users, origin of transactions, destination of
reports,
Specify the input and output data for the transactions

Collect written responses to sets of questions from the potential database users
or user groups

Users priorities and the importance users place on various applications

Requirements from users and applications of the information system that


will interact with the database system

Time-consuming but crucial to the success of the information system

Know and analyze the expectations of the users and the intended uses of
16
the database in as much detail as possible

Database design

Phase 2: Conceptual Database Design

Phase 2a: Conceptual Schema Design

Examine the data requirements resulting from Phase 1


Produce a conceptual database schema in a DBMSindependent high-level data model

Phase 2b: Transaction Design

Design the characteristics of known database transactions


(applications) in a DBMS-independent way to ensure that
the database schema will include all the information
required by these transactions
Identify transactions input/output and functional
behavior

Group transactions into three categories: retrieval


transactions, update transactions, mixed transactions

17

Database design

Phase 2: Conceptual Database Design

Phase 2a: Conceptual Schema Design

Goal = a complete understanding of the database


structure, meaning (semantics), interrelationships, and
constraints

Identify: entity types, relationship types, attributes,


key attributes, cardinality and participation constraints
on relationships, weak entity types,
specialization/generalization hierarchies

Approaches

The centralized (or one-shot) schema design approach


The view integration approach

18

Database design

Phase 2a: Conceptual Schema Design

The centralized (or one-shot) schema design


approach

The requirements of the different applications and user


groups from Phase 1 are merged into a single set of
requirements before schema design begins.
A single schema corresponding to the merged set of
requirements is then designed.
The DBA is responsible for deciding how to merge the
requirements and for designing the conceptual schema
for the whole database

Once the conceptual schema is designed and finalized,


external schemas for the various user groups and
applications can be specified by the DBA.

19

Database design

Phase 2a: Conceptual Schema Design

The view integration approach

The requirements are not merged.


A schema (or view) is designed for each user group or
application based only on its own requirements.
Each user group or application

These schemas are merged or integrated into a global


conceptual schema for the entire database.

The DBA

The individual views can be reconstructed as external


schemas after view integration.
20

Database design

Phase 3: Choice of a DBMS

Technical factors

Storage structures and access paths, the user and


programmer interfaces, high-level query languages,
development tools, architectural options, supported by
DBMS
DBMS portability among different types of hardware

Non-technical factors

Suitability & type of the DBMS for the task

Cost: software acquisition cost, maintenance cost, hardware


acquisition cost, database creation and conversion cost,
personnel cost, training cost, operating cost
Availability of vendor services
21

Database design

Phase 4: Data Model Mapping (Logical


Database Design)

Create a conceptual schema and external


schemas in the data model of the selected DBMS

The result of this phase should be DDL


statements in the language of the chosen DBMS
that specify the conceptual and external level
schemas of the database system.
22

Database design

Phase 5: Physical Database Design

Restricted to choosing the most appropriate structures for the


database files from among the options offered by that DBMS

Choose specific storage structures and access paths for the


database files

Response time

Space utilization

Transaction throughput

Estimate record size and number of records in each database


file

Estimate the update and retrieval patterns for the file


cumulatively from all the transactions

Estimate the file growth, either in the record size because of


new attributes or in the number of records

23

Database design

Phase 6: Database System Implementation and Tuning

Create the database schemas and (empty) database files

Responsibility of the DBA in conjunction with the database


designers

Reformat the data for loading into the new database if needed

Load/populate with the data if needed

Implement database transactions referring to the conceptual


specifications of transaction, then write and test program code
with embedded DML commands

Responsibility of the application programmers

Database tuning continues as long as the database is in


existence, as long as performance problems are discovered,
and while the requirements keep changing.
24

Relational database design

Two main approaches

A top-down design technique

Designing a conceptual schema in a high-level data model


Mapping the conceptual schema into a relational database
schema
Analyzing each of the resulting relations based on the
functional dependencies and assigned primary keys

A bottom-up design technique (relational synthesis)

View relational database schema design strictly in terms of


functional and other types of dependencies specified on the
database attributes
Synthesize the relation schemas applying a normalization
algorithm

25

2.2. Why is data modeling important?

Leverage

Make programming simpler and cheaper

Poor data organization can be expensive to fix.

Conciseness

Take more directly to the heart of the business requirements

Data quality

Problems with data quality can be traced back to a lack of


consistency in:

Defining and interpreting data

Implementing mechanisms to enforce the definitions

A key role in achieving good data quality by establishing a


common understanding of what is to be held in each table
and column and how it is to be interpreted

G. C. Simsion, G. C. Witt, Data modeling essentials 3rd edition, Elsevier Inc, 2005, pp. 8-10.

26

2.3. Data modeling with entity-relationship and


relational models
Entity-relationship
Relational
Data

model

data model

model mapping

27

Database
design
Entity Relationship Model
P. P-S. Chen. The EntityRelationship Model Toward

a Unified View of Data. ACM


Transactions on Database
Systems 1(1)(March 1976)
9-36.

Figure 2.1 Phases of database


design and implementation for

large databases
[1], pp. 368

28

Entity Relationship Model

P. P-S. Chen. The Entity-Relationship Model Toward a Unified View of


Data. ACM Transactions on Database Systems 1(1)(March 1976) 9-36.

The entity-relationship model adopts the


more natural view that the real world
consists of entities and relationships.

The model can achieve a high degree of data


independence and is based on set theory
and relation theory.

The entity-relationship model can be used as


a basis for a unified view of data.

A special diagrammatric technique, the


entity-relationship diagram, is introduced as
a tool for database design.

29

Entity Relationship Model

P. P-S. Chen. The Entity-Relationship Model Toward a Unified View of


Data. ACM Transactions on Database Systems 1(1)(March 1976) 9-36.

Fig. 1. Analysis of data models using multiple levels


of logical views

30

Entity Relationship Model

P. P-S. Chen. The Entity-Relationship Model Toward a Unified View of


Data. ACM Transactions on Database Systems 1(1)(March 1976) 9-36.

The entity-relationship (ER) modeling


concepts

Entity types

Relationship types

Attributes

Keys

Structural constraints
31

Entity Relationship Model

P. P-S. Chen. The Entity-Relationship Model Toward a Unified View of


Data. ACM Transactions on Database Systems 1(1)(March 1976) 9-36.

Symbol

Meaning

Example

Employee

Entity type

Employees

Dependent

Weak entity type

Dependents of
each employee

works_on

Relationship type

Employee
works_on Project

dependents_of

Identifying
relationship of the
weak entity type

Dependents
dependents_of
Employees
32

Entity Relationship Model

P. P-S. Chen. The Entity-Relationship Model Toward a Unified View of


Data. ACM Transactions on Database Systems 1(1)(March 1976) 9-36.

Street

Symbol

Meaning

Example

Name

Attribute

Name of an
employee

EmployeeID

Key Attribute

Distinct identifier of
an employee

PhoneNumber

Multivalued
Attribute

Phone numbers of an
employee

Composite
Attribute

Address (Street,
District, City) of an
employee

Derived Attribute

Age of an employee
(derived from date of
birth)

District

Address

Age

City

33

Entity Relationship Model

P. P-S. Chen. The Entity-Relationship Model Toward a Unified View of


Data. ACM Transactions on Database Systems 1(1)(March 1976) 9-36.

Constraints

34

Figure 3.2. An ER schema diagram


for the COMPANY database
[1], pp. 54.

35

a. List the entity types


b. Is there a weak entity
type? If so, give its
name, partial key, and
identifying relationship
c.

What constraints do the


partial key and the
identifying relationship
of the weak entity type
specify in this diagram?

d. List the names of all


relationship types, and
specify the (min, max)
constraint on each
participation of an entity
type in a relationship
type.

Figure 3.18. An ER diagram for a BANK database schema


[1], pp. 81.

36

Enhanced Entity Relationship Modeling

Class/subclass relationships and type inheritance

The relationship between a superclass and any one of its subclasses

Specialization

Define a set of subclasses of an entity type

Establish additional specific attributes with each subclass

Establish additional specific relationship types between each


subclass and other entity types or other subclasses

Generalization

A reverse process of abstraction in which we suppress the


differences among several entity types

Identify their common features

Generalize them into a single superclass of which the original entity


types are special subclasses
37

partial specialization

total specialization

Figure 4.1. EER diagram notation to represent subclasses and specialization.


[1], pp. 87.

38

Figure 4.3. Generalization


[1], pp. 91.

39

- Disjoint (d): the subclasses of the specialization must be disjoint.

- Overlapping (o): the subclasses are not constrained to be disjoint,


their sets of entities may overlap.
Figure 4.5. EER diagram notation for overlapping (nondisjoint) specialization.
[1], pp. 93.

40

Figure 4.7. A specialization lattice


with multiple inheritance for a
UNIVERSITY database
[1], pp. 96.
a. List superclasses, subclasses
of each superclass

b. List class/subclass
relationships
c. Describe constraints on each
specialization

41

Database
design

A representational data

model supported by the


chosen DBMS
Relational Data Model
E. F. Codd. A Relational
Model of Data for Large
Shared Data Banks.
Communications of the ACM
13(6)(June, 1970) 377-387.

Figure 2.1 Phases of database


design and implementation for

large databases
[1], pp. 368

42

Data model

E. F. Codd. Data models in database management, ACM, 1980.

A combination of three following components

(1). A collection of data structure types (the building

blocks of any database that conforms to the model);

(2). A collection of operators or inferencing rules, which


can be applied to any valid instances of the data types
listed in (1), to retrieve or derive data from any parts

of those structures in any combinations desired;

(3). A collection of general integrity rules, which


implicitly or explicitly define the set of consistent

database states or changes of state or both -- these


rules may sometimes be expressed as insert-updatedelete rules

43

Relational Data Model

E. F. Codd. A Relational Model of Data for Large Shared Data Banks.


Communications of the ACM 13(6)(June, 1970) 377-387.

The term relation is used here in its accepted


mathematical sense.

A relational database based on the relational data


model is a collection of relations.

Given sets S1, S2, , Sn (not necessarily distinct),


R is a relation on these n sets if it is a set of ntuples each of which has its first element from S1,
its second element from S2, and so on.

Sj is the jth domain of R.

R is said to have degree n.

More concisely, R is a subset of the Cartesian


product S1xS2xxSn.

44

Relational Data Model

E. F. Codd. A Relational Model of Data for Large Shared Data Banks.


Communications of the ACM 13(6)(June, 1970) 377-387.

Fig. 1. A relation of degree 4


A relation of degree 4, called supply, which reflects the shipments-in-progress
of parts from specified suppliers to specified projects in specified quantities.
45

Relational Data Model

E. F. Codd. A Relational Model of Data for Large Shared Data Banks.


Communications of the ACM 13(6)(June, 1970) 377-387.

An n-ary relation R has the following


properties:

Each row represents an n-tuple of R.

The ordering of rows is immaterial.

All rows are distinct.

The ordering of columns is significant it


corresponds to the ordering S1, S2, , Sn of the
domains on which R is defined.

The significance of each column is partially


conveyed by labeling it with the name of the
corresponding domain.

46

Data model

E. F. Codd. Data models in database management, ACM, 1980.

A combination of three following components

(1). A collection of data structure types (the building

blocks of any database that conforms to the model);

(2). A collection of operators or inferencing rules, which


can be applied to any valid instances of the data types
listed in (1), to retrieve or derive data from any parts

of those structures in any combinations desired;

(3). A collection of general integrity rules, which


implicitly or explicitly define the set of consistent

database states or changes of state or both -- these


rules may sometimes be expressed as insert-updatedelete rules

47

Relational Data Model

E. F. Codd. A Relational Model of Data for Large Shared Data Banks.


Communications of the ACM 13(6)(June, 1970) 377-387.

Operations on relations

Usual set operations: union, intersection, minus

Projection ()

Select a subset of the tuples from a relation that satisfy a


selection condition

Cartesian product (cross product) (x)

Select certain columns of a relation (striking out the others)

Selection ()

Compatible relations

Combine tuples from two relations in a combinatorial fashion

Join (inner/outer join, equijoin, natural join)

Combine related tuples from two relations into single tuples

49

Relational Data Model

E. F. Codd. A Relational Model of Data for Large Shared Data Banks.


Communications of the ACM 13(6)(June, 1970) 377-387.

50

Relational Data Model

E. F. Codd. A Relational Model of Data for Large Shared Data Banks.


Communications of the ACM 13(6)(June, 1970) 377-387.

T R :S

51

Data model

E. F. Codd. Data models in database management, ACM, 1980.

A combination of three following components

(1). A collection of data structure types (the building

blocks of any database that conforms to the model);

(2). A collection of operators or inferencing rules, which


can be applied to any valid instances of the data types
listed in (1), to retrieve or derive data from any parts

of those structures in any combinations desired;

(3). A collection of general integrity rules, which


implicitly or explicitly define the set of consistent

database states or changes of state or both -- these


rules may sometimes be expressed as insert-updatedelete rules

52

Relational Data Model

E. F. Codd. A Relational Model of Data for Large Shared Data Banks.


Communications of the ACM 13(6)(June, 1970) 377-387.

Constraints inherent in the data model

Domain constraints

Key constraints

Constraints on nulls

Entity integrity constraints

Referential integrity constraints

53

Relational Data Model

E. F. Codd. A Relational Model of Data for Large Shared Data Banks.


Communications of the ACM 13(6)(June, 1970) 377-387.

Domain constraints

Within each tuple, the value of each attribute A


must be an atomic value from the domain dom(A).

Key constraints
1. Two distinct tuples in any state of the relation
cannot have identical values for (all) the attributes in
the key.
2. It is a minimal superkey that is, a superkey from
which we cannot remove any attributes and still have
the uniqueness constraint in condition 1 hold.

Superkey of the relation schema R specifies a uniqueness


constraint that no two distinct tuples in any state r of R
can have the same value for the superkey.

54

Relational Data Model

E. F. Codd. A Relational Model of Data for Large Shared Data Banks.


Communications of the ACM 13(6)(June, 1970) 377-387.

Key constraints

A relation schema may have more than one key.

Constraints on nulls

Specify whether null values are or are not


permitted on attributes

Entity integrity constraint

Candidate keys: primary key and secondary keys

No primary key value can be null.

Referential integrity constraint

Specify a referential integrity constraint between


the two relation schemas R1 and R2
55

Relational Data Model

E. F. Codd. A Relational Model of Data for Large Shared Data Banks.


Communications of the ACM 13(6)(June, 1970) 377-387.

Referential integrity constraint

A set of attributes FK (foreign key) in relation


schema R1 is a foreign key of R1 that references
relation R2 if it satisfies the following two rules:

The attributes in FK have the same domain(s) as the


primary key attributes PK of R2; the attributes FK are
said to reference or refer to the relation R2.

A value of FK in a tuple t1 of the current state r1(R1)


either occurs as a value of PK for some tuple t2 in the

current state r2(R2) or is null. In the former case, we


have t1[FK] = t2[PK], and we say that the tuple t1
references or refers to the tuple t2.

56

Relational Data Model

E. F. Codd. A Relational Model of Data for Large Shared Data Banks.


Communications of the ACM 13(6)(June, 1970) 377-387.

Relation Employee
Relation schema: EMPLOYEE (FNAME, MINIT, LNAME, SSN,

BDATE, ADDRESS, SEX, SALARY, SUPERSSN, DNO)


Attributes = FNAME, MINIT, LNAME, SSN,
Domain of Attribute SEX = {F, M}

Domain of SALARY = a set of positive integer numbers


Primary key = SSN

Foreign key = SUPERSSN

57

The SQL standard has gone through a number of revisions.


http://en.wikipedia.org/wiki/SQL (Accessed on 09/05/2012)

High-level database language on relational


databases: SQL-86, SQL-89, SQL-92, SQL-99,

58

SQL (structured query language)

Data Definition Language (DDL): Create,


Alter, Drop

Data Manipulation Language (DML): Select,


Insert, Update, Delete

Data Control Language (DCL): Commit,


Rollback, Grant, Revoke
59

SQL (structured query language)


CREATE TABLE TableName
{(colName dataType [NOT NULL] [UNIQUE]

[DEFAULT defaultOption]
[CHECK searchCondition] [,...]}
[PRIMARY KEY (listOfColumns),]
{[UNIQUE (listOfColumns),] [,]}

{[FOREIGN KEY (listOfFKColumns)


REFERENCES ParentTableName [(listOfCKColumns)],
[ON UPDATE referentialAction]
[ON DELETE referentialAction ]] [,]}

{[CHECK (searchCondition)] [,] })

60

SQL (structured query language)


SELECT [DISTINCT | ALL]
{* | [columnExpression [AS newName]] [,...] }
FROM TableName [alias] [, ...]
[WHERE condition]
[GROUP BY columnList]
[HAVING condition]
[ORDER BY columnList]
61

Database
design

Data model mapping


from a conceptual
schema based on the ER
model to a relational
database schema based
on the relational data

model

Figure 2.1 Phases of database


design and implementation for

large databases
[1], pp. 368

62

Data model mapping

Table 7.1. Correspondence between ER and Relational Models


[1], pp. 198.

63

Data model mapping

Mapping of specialization or generalization

Convert each specialization with m subclasses


{S1, S2, , Sm} and (generalized) superclass C,
where the attributes of C are {k, a1, , an} and
k is the (primary) key, into relation schemas
using one of the four following options:

Multiple relations Superclass and subclasses

Multiple relations Subclass relations only (total)

Single relation with one type attribute (disjoint)

Single relation with multiple type attributes


(overlapping)

64

Using Multiple relations Subclass relations only (total)


Figure 7.4, pp. 200.

Figure 4.3. Generalization


[1], pp. 91.

65

Figure 7.4. Using Single relation with multiple type attributes with Boolean fields MFlag
and PFlag.
[1], pp. 200.

66

Data model mapping

Mapping algorithms

Automatically create a relational schema from a


conceptual schema design

Notes on data model mapping

Slightly different for other post-relational data


models as compared to the relational data model

Constraints on databases

Inherent model-based constraints

Schema-based constraints

Application-based constraints

67

68

69

70

71

Query

Example: For every project located in Stafford,


retrieve the project number, the controlling
department number and the department managers
last name, address and birthdate.

SQL query:
SELECT P.NUMBER, P.DNUM, E.LNAME, E.ADDRESS, E.BDATE
FROM PROJECT AS P, DEPARTMENT AS D, EMPLOYEE AS E
WHERE P.DNUM=D.DNUMBER AND D.MGRSSN=E.SSN AND

P.PLOCATION=STAFFORD;

Relational algebra expression:


PNUMBER, DNUM, LNAME, ADDRESS, BDATE(((PLOCATION=STAFFORD(PROJECT))

DNUM=DNUMBER (DEPARTMENT))

MGRSSN=SSN

(EMPLOYEE))72

Query: For each project on which more than two employees


work, retrieve the project number, project name, and the
number of employees who work on that project.
SELECT

PNUMBER, PNAME, COUNT (*) AS PROJ_NUMBER

FROM

PROJECT, WORKS_ON

WHERE

PNUMBER=PNO

GROUP BY

PNUMBER, PNAME

HAVING

COUNT (*) > 2

Temp (PNUMBER, PNAME, PROJ_NUMBER) PNUMBER, PNAME

PNUMBER=PNO(PROJECT

Result

COUNT *

X WORKS_ON))

PROJ_NUMBER > 2

(Temp)
73

Query: retrieve the last name and first name of each employee

whose salary is greater than the max salary in department 5.

LNAME, FNAME
SALARY > C
EMPLOYEE

MAX SALARY
DNO = 5
EMPLOYEE

74

Data model mapping

Informal measures of quality for relation schema


design

Semantics of the attributes

Reducing the redundant values in tuples

Storage space & update anomalies

Reducing the null values in tuples

How to interpret the attribute values stored in a tuple of the


relation how the attribute values in a tuple relate to one another

Multiple interpretations of nulls

Disallowing the possibility of generating spurious tuples

Join relations with equality conditions on attributes that are either primary
keys or foreign keys in a way that guarantees that no spurious tuples are
generated
75

2.4. Dependencies and normalization

Functional dependency (FD)

The single most important concept in relational schema


design theory

A constraint between two sets of attributes from the


database

A property of the semantics or meaning of the


attributes.

Examples

SSN ENAME
The value of an employees social security number (SSN)
uniquely determines the employee name (ENAME)

PNUMBER {PNAME, PLOCATION}

{SSN, PNUMBER} HOURS

76

2.4. Dependencies and normalization

Functional dependency (FD)

A constraint between two sets of attributes from the database

A constraint on the possible tuples that can form a relation state


r of a relation schema R = {A1, A2, , An}
Denoted by XY

X, Y: two sets of attributes that are subsets of R


For any two tuples t1 and t2 in r that have t1[X] = t2[X], they
must also have t1[Y] = t2[Y].
The values of Y of a tuple in r depend on (are determined by)
the values of X.
The values of X of a tuple uniquely (functionally) determine
the values of Y.
If XY in R, this does not say whether or not YX in R.
77

2.4. Dependencies and normalization

Armstrongs inference rules (IR1-3) and


others (IR4-6)

78

2.4. Dependencies and normalization

Closure F+ of a set of functional dependencies


F specified on relation schema R
F+, the closure of F, is formally the set of all
dependencies that include F as well as all
dependencies that can be inferred from F.

79

2.4. Dependencies and normalization

Closure F+ of a set of functional dependencies F

F = {SSN{ENAME, BDATE, ADDRESS, DNUMBER},


DNUMBER{DNAME, DMGRSSN}}

F+ = {SSN{ENAME, BDATE, ADDRESS, DNUMBER},


DNUMBER{DNAME, DMGRSSN}, SSNSSN, ,

DNUMBER DNAME, SSN{DNAME, DMGRSSN}}

80

2.4. Dependencies and normalization

Closure F+ of a set of functional dependencies F

F = {{SSN, PNUMBER}HOURS, SSNENAME,


PNUMBER{PNAME, PLOCATION}}

F+ = ???

81

2.4. Dependencies and normalization

Closure X+ of a set of attributes X under F


where X is a set of attributes that appears as a
left-hand side of a functional dependency in F

X+, the closure of X under F, is the set


of attributes that are functionally
determined by X based on F.
82

2.4. Dependencies and normalization

Closure X+ of a set of attributes X under F


where X is a set of attributes that appears as a
left-hand side of a functional dependency in F

83

2.4. Dependencies and normalization

Closure X+ of a set of attributes X under F


where X is a set of attributes that appears as a
left-hand side of a functional dependency in F

F = {{SSN, PNUMBER}HOURS, SSNENAME,

PNUMBER{PNAME, PLOCATION}}

{SSN, PNUMBER}+ = {SSN, PNUMBER, HOURS, ENAME,


PNAME, PLOCATION}

{SSN}+ = {SSN, ENAME}

{PNUMBER}+ = {PNUMBER, PNAME, PLOCATION}

84

2.4. Dependencies and normalization

Closure X+ of a set of attributes X under F


where X is a set of attributes that appears as a
left-hand side of a functional dependency in F

F = {{A, B}{C}, {A}{D, E}, {B}{M}, {M}{G, H},

{D}{I, J}}

{A, B}+ = ???

{A}+ = ???

{B}+ = ???

{M}+ = ???, and {D}+ = ???

85

2.4. Dependencies and normalization

Identifying a key K of R based on a set of


given functional dependencies F

86

2.4. Dependencies and normalization

Identifying a key K of R based on a set of


given functional dependencies F

R = (A, B, C, D, E, G)

F = {CE, AEG, EAD}

K = ???

87

2.4. Dependencies and normalization

Identifying a key K of R based on a set of given


functional dependencies F

R = (A, B, C, D, E, G)

F = {CE, AEG, EAD}

K = {A, B, C, D, E, G}

(K-{A})+ = {B, C, D, E, G, A} K = {B, C, D, E, G}

(K-{B})+ = {C, D, E, G, A} K = {B, C, D, E, G}

(K-{C})+ = {B, D, E, G, A} K = {B, C, D, E, G}

(K-{D})+ = {B, C, E, G, A, D} K = {B, C, E, G}

(K-{E})+ = {B, C, G, E, A, D} K = {B, C, G}

(K-{G})+ = {B, C, E, A, D, G} K = {B, C}

88

2.4. Dependencies and normalization

Identifying a key K of R based on a set of


given functional dependencies F

R = (A, B, C, E, G)

F = {AB, BCE, EGA}

K = ???

89

2.4. Dependencies and normalization

Identifying all keys K of R based on a set of given


functional dependencies F

Find N = U - fFright(f)
N is a set of the attributes not in the right side of any functional
dependency in F. U is a set of all the attributes of R.

Find N+, the closure of N under F

If N+ = U then there is only one key K = N.

Otherwise, find D = fFright(f) - fFleft(f)


D is a set of the attributes only in the right side of the functional
dependencies in F.

Find L = U (N+D)
L is a set of the attributes neither in the right side nor functionally
determined by N.

Let Li be any subset of L. Find (NLi)+ under F.

If (NLi)+ = U then K = NLi.

Ki, Kj such that Ki Kj, remove Kj.

90

2.4. Dependencies and normalization

Identifying all keys K of R based on a set of


given functional dependencies F

R = (A, B, C, E, G)

F = {AB, BCE, EGA}

All keys K ???

91

2.4. Dependencies and normalization

R = (A, B, C, E, G)

F = {AB, BCE, EGA}

N = U - fFright(f) = {C, G}

N+ = {C, G}

D = fFright(f) - fFleft(f) =

L = U (N+D) = {A, B, E}

Subsets of L: L1 = {A}, L2 = {B}, L3 = {E}, L4 =


{A, B}, L5 = {A, E}, L6 = {B, E}, L7 = {A, B, E}

(NL1)+ = {C, G, A, B, E} K1 = {C, G, A}

(NL2)+ = {C, G, B, E, A} K2 = {C, G, B}

(NL3)+ = {C, G, E, A, B} K3 = {C, G, E}

92

2.4. Dependencies and normalization

Identifying all keys K of R based on a set of


given functional dependencies F

R = (A, B, C, E, G)

F = {ABC, CGE, BG, EA}

All keys K ???

93

2.4. Dependencies and normalization

Membership of a functional dependency XY


with respect to a set of functional
dependencies F, i.e. XY is inferred from F,
denoted by F XY

A functional dependency XY is inferred from a set of


dependencies F specified on R if XY holds in every
legal relation state r of R; that is, whenever r satisfies
all the dependencies in F, XY also holds in r.

94

2.4. Dependencies and normalization

Membership of a functional dependency XY


with respect to a set of functional
dependencies F, i.e. XY is inferred from F,
denoted by F XY

Check if F XY

Find X+, the closure of X under F

If Y X+ then F XY
95

2.4. Dependencies and normalization

Membership of a functional dependency XY


with respect to a set of functional
dependencies F, i.e. XY is inferred from F,
denoted by F XY

F = {A C, AC D, E AD, E H}

F A CD

A+ under F = {A, C, D} {C, D} F A CD

F E AH ???

96

2.4. Dependencies and normalization

A set of functional dependencies F is said to


cover another set of functional dependencies
E if every FD in E is also in F+; that is, if every
dependency in E can be inferred from F.
Alternatively, E is covered by F.

Two sets of functional dependencies E and F


are equivalent if E+ = F+; that is, every
dependency in E can be inferred from F and
every dependency in F can be inferred from E;
that is also, E covers F and F covers E.

97

2.4. Dependencies and normalization

F = {A C, AC D, E AD, E H}

G = {A CD, E AH}

Are F and G equivalent?


* Check if F covers G.
* Check if G covers F.

98

2.4. Dependencies and normalization

The minimal cover F of a set of functional


dependencies E is a set of functional
dependencies that satisfies the property that
every dependency in E is in the closure F+ of
F. In addition, this property is lost if any
dependency from the set F is removed; F
must have no redundancies in it, and the
dependencies in F are in a standard form.
99

2.4. Dependencies and normalization

The minimal cover F of a set of functional


dependencies E
1.

Every dependency in F has a single attribute for its righthand side.

2.

We cannot replace any dependency X A in F with a


dependency Y A, where Y is a proper subset of X, and still

have a set of dependencies that is equivalent to F.


3.

We cannot remove any dependency from F and still have a


set of dependencies that is equivalent to F.
100

2.4. Dependencies and normalization

The minimal cover F of a set of functional dependencies E

101

2.4. Dependencies and normalization

The minimal cover F of a set of functional


dependencies E

E = {A B, ABCD I, IJ G, IJ H, ACDJ GI}

The minimal cover F of E ???

102

2.4. Dependencies and normalization

Normalization

A process of analyzing the given relation


schemas based on their FDs and primary keys to
achieve the desirable properties of:

(1) minimizing redundancy


(2) minimizing the insertion, deletion, and update
anomalies

Normal form of a relation

The highest normal form condition that the


relation meets the degree to which it has
been normalized

103

2.4. Dependencies and normalization

The process of normalization through


decomposition produces the relational
schemas that have:

The lossless join or non-additive join property

No spurious tuple generation problem occurs.

The dependency preservation property

Each FD is represented in some individual relation.


104

2.4. Dependencies and normalization


Table 10.1. Summary of normal forms based on primary keys and
corresponding normalization ([1], pp. 321)

105

2.4. Dependencies and normalization

Pure relations based on the relational data


model are all in 1NF.

Relations in 2NF have non-key attributes

functionally dependent on the whole


primary key.

Relations in 3NF have non-key attributes

only functionally dependent on the whole


primary key.

106

Primary key = {SSN, PNUMBER}


Non-key attributes = HOURS, ENAME, PNAME, PLOCATION

FD1: {SSN, PNUMBER} HOURS


FD2: SSN ENAME
FD3: PNUMBER {PNAME, PLOCATION}

Functionally dependent on
a part of the primary key 107

Primary key = SSN

Non-key attributes = ENAME, BDATE, ADDRESS,


DNUMBER, DNAME, DMGRSSN
FD1: SSN {ENAME, BDATE, ADDRESS, DNUMBER}
FD2: DNUMBER {DNAME, DMGRSSN}

Non-key attributes are functionally


dependent on another non-key
attribute.
108

2.4. Dependencies and normalization

Relation: LOTS (PROPERTY_ID#,


COUNTY_NAME, LOT#, AREA, PRICE,
TAX_RATE)

Primary key: PROPERTY_ID#

Secondary key: {COUNTY_NAME, LOT#}

Functional dependencies

FD1: PROPERTY_ID# {COUNTY_NAME, LOT#, AREA,


PRICE, TAX_RATE}
FD2: {COUNTY_NAME, LOT#} {PROPERTY_ID#, AREA,
PRICE, TAX_RATE}

FD3: COUNTY_NAME TAX_RATE

FD4: AREA PRICE

109

Is LOTS really in 1NF?


Normalization is carried out as follows. Determine LOTS1, LOTS2,
LOTS1A, and LOTS1B.

Figure 10.11 (d). Summary of the progressive normalization of LOTS.


[1], pp. 322.
110

2.4. Dependencies and normalization

Relation: LOTS (PROPERTY_ID#,


COUNTY_NAME, LOT#, AREA, PRICE,
TAX_RATE)

Primary key: PROPERTY_ID#

Secondary key: {COUNTY_NAME, LOT#}

Functional dependencies

FD1: PROPERTY_ID# {COUNTY_NAME, LOT#, AREA,


PRICE, TAX_RATE}
FD2: {COUNTY_NAME, LOT#} {PROPERTY_ID#, AREA,
PRICE, TAX_RATE}

FD3: COUNTY_NAME TAX_RATE

FD4: AREA PRICE

2NF
111

2.4. Dependencies and normalization

Relation: LOTS1 (PROPERTY_ID#, COUNTY_NAME, LOT#, AREA,


PRICE) -- in 2NF

Primary key: PROPERTY_ID#

Secondary key: {COUNTY_NAME, LOT#}

Functional dependencies

FD1: PROPERTY_ID# {COUNTY_NAME, LOT#, AREA, PRICE}

FD2: {COUNTY_NAME, LOT#} {PROPERTY_ID#, AREA, PRICE}

FD4: AREA PRICE

3NF

Relation: LOTS2 (COUNTY_NAME, TAX_RATE) -- in 3NF

Primary key: COUNTY_NAME

FD: COUNTY_NAME TAX_RATE


112

2.4. Dependencies and normalization

Relation: LOTS1A (PROPERTY_ID#, COUNTY_NAME, LOT#,


AREA) -- in 3NF

Primary key: PROPERTY_ID#

Secondary key: {COUNTY_NAME, LOT#}

Functional dependencies

FD1: PROPERTY_ID# {COUNTY_NAME, LOT#, AREA}

FD2: {COUNTY_NAME, LOT#} {PROPERTY_ID#, AREA}

Relation: LOTS1B (AREA, PRICE) -- in 3NF

Primary key: AREA

FD: AREA PRICE

Relation: LOTS2 (COUNTY_NAME, TAX_RATE) -- in 3NF

Primary key: COUNTY_NAME

FD: COUNTY_NAME TAX_RATE

113

2.4. Dependencies and normalization

Boyce-Codd normal form (BCNF)

A simpler form of 3NF, but stricter than 3NF

Every relation in BCNF is also in 3NF; however, a


relation in 3NF is not necessarily in BCNF.

For a relation in 3NF, each non-key attribute must:

Fully functionally dependent on every key


Nontransitively dependent on every key

Not functionally dependent on other nonkey attributes

For a relation in BCNF, all key/non-key attributes must be


functionally dependent on superkeys.
XA holds in R => X is a superkey of R.
XA holds in R and X is not a superkey of R => R is not
114
in BCNF.

2.4. Dependencies and normalization

Figure 10.12. (a) BCNF normalization of LOTS1A


[1], pp. 325.

FD5: AREA COUNTY_NAME

AREA is not a superkey of LOTS1A.


LOTS1A is in 3NF; but not in BCNF.

115

2.4. Dependencies and normalization

Candidate key = {STUDENT, COURSE}


FD1: {STUDENT, COURSE} INSTRUCTOR
FD2: INSTRUCTOR COURSE

Normalization???
116

2.4. Dependencies and normalization

Multivalued dependencies

A multivalued dependency X

Y specified on

relation schema R, where X and Y are both

subsets of R, specifies the following constraint on


any relation state r of R: if two tuple t1 and t2
exist in r such that t1[X] = t2[X], then two tuples

t3 and t4 should also exist in r with the following


properties, where Z is used to denote (R-(XUY)):

t3[X] = t4[X] = t1[X] = t2[X]

t3[Y] = t1[Y] and t4[Y] = t2[Y]

t3[Z] = t2[Z] and t4[Z] = t1[Z]

117

2.4. Dependencies and normalization

Multivalued dependencies

ENAME

PNAME

ENAME

DNAME

No association between PNAME and DNAME


118

2.4. Dependencies and normalization

Multivalued dependencies (MVDs)

A multivalued dependency X

Y is a trivial MVD

if:

(a) Y is a subset of X

(b) X U Y = R

A trivial MVD does not specify any significant or


meaningful constraint on R.

A nontrivial MVD satisfies neither (a) nor (b).


119

2.4. Dependencies and normalization

Fourth normal form

A relation schema R is in 4NF with respect to a


set of dependencies (that includes functional
dependencies and multivalued dependencies) if,

for every nontrivial multivalued dependency X

in F+, X is a superkey for R.

120

2.5. Conclusion

Data modeling

Database design process: 6 phases

Conceptual database design

Choice of a DBMS a representational data model

Entity-relationship model

Relational data model

Data model mapping

Functional/multivalued dependencies

Normalization: 1NF, 2NF, 3NF, BCNF, 4NF


121

Questions ???

122

Review - Chapter 12 in [1]


12.1. What are the six phases of database design? Discuss each phase.

12.2. Which of the six phases are considered the main activities of the database
design process itself? Why?
12.3. Why is it important to design the schemas and applications in parallel?
12.4. Why is it important to use an implementation-independent data model during
conceptual schema design? What models are used in current design tools? Why?

12.5. Discuss the importance of Requirements Collection and Analysis.


12.7. Discuss the characteristics that a data model for conceptual schema design
should possess.
12.8. Compare and contrast the two main approaches to conceptual schema design.

12.9. Discuss the strategies for designing a single conceptual schema from its
requirements.
12.10. What are the steps of the view integration approach to conceptual schema
design? What are the difficulties during each step?
12.13. Discuss the factors that influence the choice of a DBMS package for the
information system of an organization.
12.14. What is system-independent data model mapping? How is it different from
system-dependent data model mapping?
12.15. What are the important factors that influence physical database design?

123

Review - Chapters 3, 4 in [1]


3.1. Discuss the role of a high-level data model in the database design process.

3.2. List the various cases where use of a null value would be appropriate.
3.3. Define the following terms: entity, attribute, attribute value, relationship
instance, composite attribute, multivalued attribute, derived attribute, complex
attribute, key attribute, value set (domain).
3.4. What is an entity type? What is an entity set? Explain the differences among an
entity, an entity type, and an entity set.
3.6. What is a relationship type? Explain the differences among a relationship
instance, a relationship type, and a relationship set.
3.7. What is a participation role? When is it necessary to use role names in the
description of relationship types?
3.12. When is the concept of a weak entity used in data modeling? Define the terms
owner entity type, weak entity type, identifying relationship type, and partial key.
4.1. What is a subclass? When is a subclass needed in data modeling?
Define the following terms: superclass of a subclass, superclass/subclass
relationship, is-a relationship, specialization, generalization, category, specific (local)
attributes) specific relationships.
4.6. Discuss the two main types of constraints on specializations and
generalizations.

124

Review - Chapter 5 in [1]


5.1. Define the following terms: domain, attribute, n-tuple,
relation schema, relation state, degree of a relation, relational
database schema, relational database state.

5.2. Why are tuples in a relation not ordered?


5.3. Why are duplicate tuples not allowed in a relation?
5.4. What is the difference between a key and a superkey?
5.8. Discuss the entity integrity and referential integrity
constraints. Why is each considered important?
5.9. Define foreign key. What is this concept used for?
125

Review - Chapter 10 in [1]


10.1. Discuss attribute semantics as an informal measure of goodness for a relation
schema.
10.2. Discuss insertion, deletion, and modification anomalies. Why are they considered
bad? Illustrate with examples.
10.3. Why should nulls in a relation be avoided as far as possible? Discuss the problem
of spurious tuples and how we may prevent it.
10.4. State the informal guidelines for relation schema design that we discussed.
Illustrate how violation of these guidelines may be harmful.
10.5. What is a functional dependency? What are the possible sources of the information
that defines the functional dependencies that hold among the attributes of a relation
schema?
10.6. Why can we not infer a functional dependency automatically from a particular
relation state?
10.12. What does the term unnormalized relation refer to? How did the normal forms
develop historically from first normal form up to Boyce-Codd normal form?
10.13. Define first, second, and third normal forms when only primary keys are
considered. How do the general definitions of 2NF and 3NF, which consider all keys of a
relation, differ from those that consider only primary keys?
10.14. What undesirable dependencies are avoided when a relation is in 2NF?
10.15. What undesirable dependencies are avoided when a relation is in 3NF?
10.16. Define Boyce-Codd normal form. How does it differ from 3NF? Why is it
126
considered a stronger form of 3NF?

Вам также может понравиться