
Chapter 3

The Relational Database Model
Learning Objectives
In this chapter, students will learn:
• That the relational database model offers a logical view of data
• About the relational model’s basic component: relations
• That relations are logical constructs composed of rows (tuples) and columns (attributes)
• That relations are implemented as tables in a relational DBMS
• About relational database operators, the data dictionary, and the system catalog
• How data redundancy is handled in the relational database model
• Why indexing is important
A Logical View of Data
• A database stores and manages both data and metadata.
• The DBMS manages and controls access to the data and the database structure; it sits between the application and the database.
• Data models are used for database design.
• The relational model allows the designer to focus on the logical representation of data and its relationships rather than on the physical storage details.
• Related records are stored in independent tables, which makes the relational database model much easier to understand than the hierarchical or network models.
Tables and Their Characteristics
• The logical view of a relational database is built on data relationships that are based on a logical construct known as a relation.
• A relation can be thought of as a table: a two-dimensional structure that has specific characteristics.
• A table is called a relation because the relational model’s creator, E. F. Codd, used the term relation as a synonym for table. Like a matrix, a table consists of rows and columns.
• A table view of data makes it easy to spot and define entity relationships, thereby greatly simplifying the task of database design.
1. The STUDENT table is a two-dimensional structure.
2. Each row describes a single entity occurrence.
3. Each column represents an attribute, and each column has a distinct name.
4. Each cell of the table holds a single value.
5. Each column contains data about one attribute of the entity, and all values in a column conform to that attribute’s characteristics. For example, the STU_GPA column contains only STU_GPA entries for each of the table rows. Different DBMSs support different data types; the most common ones are numeric, character, date, and logical.
6. The column’s range of permissible values is known as its domain. For example, because STU_GPA values are limited to the range 0 to 4, the domain is [0,4].
7. The order of rows and columns is immaterial to the user.
8. No two rows can be identical.
9. Each table must have a primary key: an attribute or combination of attributes that uniquely identifies any given row. In the STUDENT table, STU_NUM is the primary key.
Example: COURSES table
Keys
• Keys ensure that each row in a table is uniquely identifiable.
• They also establish relationships among tables and ensure the integrity of the data.
• A key consists of one or more attributes that determine other attributes. For example, an invoice number identifies all of the invoice attributes, such as the invoice date and the customer name.
Dependencies
• The key’s role is based on a concept known as determination: the state in which knowing the value of one attribute makes it possible to determine the value of another.
• In a database table, if knowing the value of attribute A lets you determine the value of attribute B, this is written A -> B.
• In the STUDENT table, the STU_NUM value 324567 determines the STU_LNAME value Smith:
  STU_NUM -> STU_LNAME
• In fact, STU_NUM determines all of the student’s attribute values:
  STU_NUM -> STU_LNAME, STU_FNAME, STU_INIT, STU_DOB
• In contrast, STU_NUM is not determined by STU_LNAME, because several students may share the last name Smith. In other words, STU_NUM is not functionally dependent on STU_LNAME.
• The principle of determination is very important because it is used in the definition of a central relational database concept known as functional dependence.

Functional dependence
• Attribute B is functionally dependent on attribute A if each value in column A determines one and only one value in column B.
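The definition can be checked against sample data with a minimal Python sketch. The `determines` helper and the sample rows are assumptions made for illustration; only the pairing of 324567 with Smith comes from the text. A check like this can only refute a dependency on the rows it is given; it cannot prove that the dependency holds for all possible data.

```python
def determines(rows, a_cols, b_cols):
    """Return True if every combination of a_cols values maps to one b_cols value."""
    seen = {}
    for row in rows:
        a_value = tuple(row[c] for c in a_cols)
        b_value = tuple(row[c] for c in b_cols)
        # setdefault stores the first B value seen for this A value.
        if seen.setdefault(a_value, b_value) != b_value:
            return False
    return True


# Hypothetical sample rows (illustration only).
students = [
    {"STU_NUM": 324567, "STU_LNAME": "Smith"},
    {"STU_NUM": 324568, "STU_LNAME": "Smith"},
    {"STU_NUM": 324569, "STU_LNAME": "Jones"},
]

num_determines_lname = determines(students, ["STU_NUM"], ["STU_LNAME"])
lname_determines_num = determines(students, ["STU_LNAME"], ["STU_NUM"])
print(num_determines_lname, lname_determines_num)
```

In this sample, STU_NUM -> STU_LNAME holds, while STU_LNAME -> STU_NUM fails because two students share the last name Smith.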
• In the flight table, no single attribute can serve as a primary key or determinant.
• In such cases, functional dependence can be defined on a combination of attributes:
  FLIGHT_NO, FLIGHT_DATE -> FLIGHT_TIME, FLIGHT_DURATION
• Such a multi-attribute key is called a composite key.
• Any attribute that is part of a key is known as a key attribute.
• In the STUDENT table, if STU_NUM were not available, the combination of last name, first name, initial, and phone would be likely to produce unique matches. For example:
  STU_LNAME, STU_FNAME, STU_INIT, STU_PHONE -> STU_HRS, STU_CLASS
  or
  STU_LNAME, STU_FNAME, STU_INIT, STU_PHONE -> STU_HRS, STU_CLASS, STU_GPA
  or
  STU_LNAME, STU_FNAME, STU_INIT, STU_PHONE -> STU_HRS, STU_CLASS, STU_GPA, STU_DOB
• The notion of functional dependence can be further refined by specifying full functional dependence: if attribute B is functionally dependent on a composite key A but not on any subset of that composite key, then B is fully functionally dependent on A.
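Full functional dependence can be sketched the same way: test the composite key, then confirm that no proper subset of it already determines the attribute. The flight rows and the AIRLINE column below are hypothetical, added only to show one attribute that passes and one that fails.

```python
from itertools import combinations


def determines(rows, a_cols, b_cols):
    """Return True if every combination of a_cols values maps to one b_cols value."""
    seen = {}
    for row in rows:
        a_value = tuple(row[c] for c in a_cols)
        b_value = tuple(row[c] for c in b_cols)
        if seen.setdefault(a_value, b_value) != b_value:
            return False
    return True


def fully_dependent(rows, key_cols, b_cols):
    """True if b_cols depends on key_cols but on no proper subset of key_cols."""
    if not determines(rows, key_cols, b_cols):
        return False
    for size in range(1, len(key_cols)):
        for subset in combinations(key_cols, size):
            if determines(rows, list(subset), b_cols):
                return False
    return True


# Hypothetical rows; AIRLINE is an invented column for contrast.
flights = [
    {"FLIGHT_NO": "F100", "FLIGHT_DATE": "2024-01-01", "FLIGHT_TIME": "08:00", "AIRLINE": "Acme"},
    {"FLIGHT_NO": "F100", "FLIGHT_DATE": "2024-01-02", "FLIGHT_TIME": "09:30", "AIRLINE": "Acme"},
    {"FLIGHT_NO": "F200", "FLIGHT_DATE": "2024-01-01", "FLIGHT_TIME": "11:15", "AIRLINE": "Zenith"},
]

key = ["FLIGHT_NO", "FLIGHT_DATE"]
time_is_full = fully_dependent(flights, key, ["FLIGHT_TIME"])
airline_is_full = fully_dependent(flights, key, ["AIRLINE"])
print(time_is_full, airline_is_full)
```

Here FLIGHT_TIME needs both key attributes, whereas AIRLINE is already determined by FLIGHT_NO alone, so it is only partially dependent on the composite key.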
Types of Keys
• Within the broad key classification, several specialized keys can be defined.
  A -> BC or BC -> A or AB -> C
Super key
• The super key is the most general type of key: any key that uniquely identifies each row. In short, a super key functionally determines all of the row’s attributes.
• A super key is either the primary key itself or the primary key plus any other attribute(s).
• In the STUDENT table, a super key could be:
  STU_NUM
  or
  STU_NUM, STU_LNAME


Candidate key
• A candidate key is a super key without unnecessary attributes, that is, a minimal super key.
• It can consist of more than one attribute, as long as the combination produces no duplicates.
• In the STUDENT table, STU_NUM is a candidate key, and so is the combination STU_LNAME, STU_FNAME, STU_INIT, STU_PHONE.
• A primary key is a candidate key chosen to be the unique row identifier. A primary key is therefore both a super key and a candidate key.
• To maintain entity integrity, a null (no data entry at all) is not permitted in the primary key.
Example

Year  Month  Date  Major  Minor
2008  01     13    0      1
2008  04     23    0      2
2009  11     05    1      0
2010  04     05    1      1

• (year, major, minor), (year, month, date, major), and (year, month, date) are super keys, because each combination is unique for every row. They are not candidate keys, however, because you can remove year or major and the remaining attributes still form a super key.
• (month, date) and (major, minor) are candidate keys, because you cannot remove any attribute from them without breaking uniqueness.
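The reasoning in this example can be reproduced with a short brute-force sketch in Python. It treats the four rows above as the entire table, which is an assumption: keys found this way hold for this sample only. The helper name `is_superkey` is illustrative.

```python
from itertools import combinations

COLUMNS = ("Year", "Month", "Date", "Major", "Minor")
ROWS = [
    (2008, 1, 13, 0, 1),
    (2008, 4, 23, 0, 2),
    (2009, 11, 5, 1, 0),
    (2010, 4, 5, 1, 1),
]


def is_superkey(cols):
    """A column set is a super key if its projected values are unique per row."""
    positions = [COLUMNS.index(c) for c in cols]
    projected = {tuple(row[i] for i in positions) for row in ROWS}
    return len(projected) == len(ROWS)


# Every column combination that identifies each row uniquely.
superkeys = [
    combo
    for size in range(1, len(COLUMNS) + 1)
    for combo in combinations(COLUMNS, size)
    if is_superkey(combo)
]

# A candidate key is a super key with no smaller super key inside it.
candidate_keys = [
    key for key in superkeys
    if not any(set(other) < set(key) for other in superkeys)
]

for key in candidate_keys:
    print(key)
```

The search confirms that (Month, Date) and (Major, Minor) are minimal. On this small sample it also reports other minimal pairs, such as (Year, Month), which the example does not list.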
Nulls
• A null means no data entry at all.
• Nulls are not permitted in the primary key.
• Nulls should be avoided in other attributes as well.
• Some nulls are unavoidable. For example, in an EMPLOYEE table the EMP_INITIAL attribute may be left null if the employee does not have a middle name.
• The existence of many nulls in a table is often an indication of poor database design.
• Nulls can create problems when functions such as COUNT, AVERAGE, and SUM are used.
• Nulls can create logical problems when relational tables are linked.
• To avoid nulls, designers use special codes called flags, which indicate the absence of some value.
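The effect of nulls on aggregate functions can be seen in a small sketch that uses Python's built-in `sqlite3` module. The EMP_HOURS column and the sample rows are assumptions for illustration, and the behavior shown is SQLite's; other DBMSs may differ in detail.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE EMPLOYEE ("
    "EMP_NUM INTEGER PRIMARY KEY, EMP_INITIAL TEXT, EMP_HOURS REAL)"
)
# Employee 2 has a null initial and null hours (hypothetical data).
conn.executemany(
    "INSERT INTO EMPLOYEE VALUES (?, ?, ?)",
    [(1, "A", 10.0), (2, None, None), (3, "C", 20.0)],
)

count_all, count_hours, avg_hours, sum_hours = conn.execute(
    "SELECT COUNT(*), COUNT(EMP_HOURS), AVG(EMP_HOURS), SUM(EMP_HOURS) "
    "FROM EMPLOYEE"
).fetchone()
conn.close()

print(count_all, count_hours, avg_hours, sum_hours)
```

COUNT(*) counts three rows, but COUNT(EMP_HOURS) counts only two, and the average is taken over the two non-null values. A reader who expects the average to cover all three employees would misread the result.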

Controlled redundancy
• Controlled redundancy makes the relational database work.
• Tables within the database share common attributes, which enables the tables to be linked together.
• Multiple occurrences of a value are not redundant when they are required to make the relationship work.
• Redundancy exists only when there is unnecessary duplication of attribute values.
• Example: the linked VENDOR and PRODUCT tables.

• In the diagram, the link is indicated by the line that connects the VENDOR and PRODUCT tables.
• The link is created when two tables share an attribute with common values.
• The primary key of one table (VENDOR) appears as a foreign key in the related table (PRODUCT).
• A foreign key (FK) is an attribute whose values match the primary key values in the related table.
• A relational database can also be represented by a relational schema: a textual representation of the database tables in which each table is listed by its name followed by the list of its attributes in parentheses.
  VENDOR (VEND_CODE, VEND_CONTACT, VEND_AREA, VEND_PHONE)
  PRODUCT (PROD_CODE, PROD_DESCRIPT, PROD_PRICE, PROD_ON_HAND, VEND_CODE)
• VEND_CODE is the primary key in the VENDOR table and occurs as a foreign key in the PRODUCT table.

Secondary key
• A secondary key is a key used strictly for data retrieval purposes.
• For example, in a CUSTOMER table the primary key is the customer number, which is difficult to remember. A secondary key could be the combination of the customer’s last name and phone number.
• A secondary key does not necessarily yield a unique outcome: the combination of last name and phone number may return several matches.

Integrity Rules
• Relational database integrity rules are very important to good database design.
• Many RDBMSs enforce integrity rules automatically.
• It is still safer to make sure that the application design conforms to the entity and referential integrity rules.

Example
Table: AGENT
  Primary key: AGENT_CODE
Table: CUSTOMER
  Primary key: CUS_CODE
  Foreign key: AGENT_CODE
Entity integrity: the primary key in both tables has no null entries, and all entries are unique.
Referential integrity: each AGENT_CODE entry in the CUSTOMER table matches an AGENT_CODE value in the AGENT table.
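Both rules can be expressed as simple checks. The following Python sketch uses hypothetical AGENT and CUSTOMER rows, and it assumes that a null foreign key is acceptable (a customer with no assigned agent), which is a design choice rather than something stated above.

```python
# Hypothetical sample data (illustration only).
agents = [{"AGENT_CODE": 501}, {"AGENT_CODE": 502}]
customers = [
    {"CUS_CODE": 10010, "AGENT_CODE": 502},
    {"CUS_CODE": 10011, "AGENT_CODE": 501},
    {"CUS_CODE": 10012, "AGENT_CODE": None},  # no agent assigned yet
]


def entity_integrity_ok(rows, pk):
    """Primary key values must be non-null and unique."""
    values = [row[pk] for row in rows]
    return None not in values and len(set(values)) == len(values)


def referential_integrity_ok(child_rows, fk, parent_rows, pk):
    """Each non-null foreign key must match a primary key in the parent table."""
    parent_keys = {row[pk] for row in parent_rows}
    return all(
        row[fk] is None or row[fk] in parent_keys for row in child_rows
    )


print(entity_integrity_ok(agents, "AGENT_CODE"))
print(entity_integrity_ok(customers, "CUS_CODE"))
print(referential_integrity_ok(customers, "AGENT_CODE", agents, "AGENT_CODE"))

# A customer that points at a non-existent agent violates the rule.
broken = customers + [{"CUS_CODE": 10013, "AGENT_CODE": 999}]
print(referential_integrity_ok(broken, "AGENT_CODE", agents, "AGENT_CODE"))
```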
Relational Database Operators
• The data in relational tables are of limited value unless they can be manipulated to generate useful information.
• Relational algebra defines the theoretical way of manipulating table contents using the eight relational operators: UNION, INTERSECT, DIFFERENCE, PRODUCT, SELECT, PROJECT, JOIN, and DIVIDE.
• Relational algebra is a procedural query language: its operators take instances of relations as input and yield instances of relations as output.
• The operators define functions that manipulate data in one or more tables (relations).
• SQL commands can be used to accomplish relational algebra operations, as later chapters will show.

1. SELECT, also known as RESTRICT, yields values for all rows found in a table that satisfy a given condition. SELECT can be used to list all of the row values, or only those that match a specified condition, as shown in the figure below.

Formally, SELECT is denoted by the lowercase Greek letter sigma (σ). Sigma is followed by the condition to be evaluated (called a predicate) as a subscript, and then the relation is listed in parentheses.
Notation: σp(r)
where σ stands for selection, r stands for the relation, and p is a propositional logic formula that may use connectors such as and, or, and not. Its terms may use comparison operators such as =, ≠, ≥, <, >, ≤.
For example, to SELECT all of the rows in the CUSTOMER table that have the value ‘10010’ in the CUS_CODE attribute, you would write the following:
σ cus_code = 10010 (customer)
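As an informal illustration, SELECT can be modeled in Python as a filter over a list of rows. The `select` function and the customer rows are hypothetical; the predicate plays the role of p in the notation.

```python
def select(relation, predicate):
    """Return the rows of `relation` for which `predicate` is true."""
    return [row for row in relation if predicate(row)]


# Hypothetical CUSTOMER rows (illustration only).
customer = [
    {"CUS_CODE": 10010, "CUS_LNAME": "Ramas"},
    {"CUS_CODE": 10011, "CUS_LNAME": "Dunne"},
    {"CUS_CODE": 10012, "CUS_LNAME": "Smith"},
]

# SELECT the rows whose CUS_CODE equals 10010.
result = select(customer, lambda row: row["CUS_CODE"] == 10010)
print(result)

# A predicate may combine conditions with and, or, and not.
others = select(
    customer,
    lambda row: row["CUS_CODE"] > 10010 and not row["CUS_LNAME"] == "Smith",
)
print(others)
```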
2. PROJECT
• PROJECT produces a list of all values for selected attributes.
• It yields a vertical subset of a table.

Formally, PROJECT is denoted by the Greek letter pi (π). Pi is followed by the list of attributes to be returned as subscripts, and then the relation is listed in parentheses.
Notation: π A1, A2, ..., An (r)
where A1, A2, ..., An are attribute names of relation r.
For example, to PROJECT the CUS_FNAME and CUS_LNAME attributes in the CUSTOMER table, you would write the following:
π cus_fname, cus_lname (customer)
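PROJECT can be sketched in Python as picking the named attributes from every row. Because a relation is a set, this sketch also drops duplicate result rows. The `project` function and the sample rows are hypothetical.

```python
def project(relation, attributes):
    """Return the chosen attributes of each row, without duplicate rows."""
    seen = set()
    result = []
    for row in relation:
        values = tuple(row[a] for a in attributes)
        if values not in seen:
            seen.add(values)
            result.append(dict(zip(attributes, values)))
    return result


# Hypothetical CUSTOMER rows; two customers share the same name.
customer = [
    {"CUS_CODE": 10010, "CUS_FNAME": "Alfred", "CUS_LNAME": "Ramas"},
    {"CUS_CODE": 10011, "CUS_FNAME": "Leona", "CUS_LNAME": "Dunne"},
    {"CUS_CODE": 10012, "CUS_FNAME": "Leona", "CUS_LNAME": "Dunne"},
]

names = project(customer, ["CUS_FNAME", "CUS_LNAME"])
print(names)
```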

3. UNION
• UNION combines all rows from two tables, excluding duplicate rows. That is, the union of two relations includes all tuples that are in Relation 1 or in Relation 2, and tuples common to both relations appear only once.
• The tables must have the same attribute characteristics to be used in a UNION; that is, they must be union-compatible.

Notation: r ∪ s
where r and s are either database relations or relation result sets.
For a union operation to be valid, the following conditions must hold:
• r and s must have the same number of attributes.
• Attribute domains must be compatible.
Duplicate tuples are automatically eliminated from the result.
If the relations SUPPLIER and VENDOR are union-compatible, then a UNION between them would be denoted as follows:
supplier ∪ vendor
4. INTERSECT
• INTERSECT yields only the rows that appear in both tables.
• The tables must be union-compatible. For example, INTERSECT cannot be used if one of the attributes is numeric and the corresponding attribute is character-based.

INTERSECT is denoted by the symbol ∩. If the relations SUPPLIER and VENDOR are union-compatible, then an INTERSECT between them would be denoted as follows:
supplier ∩ vendor
5. DIFFERENCE yields all rows in one table that are not found in the other table.

DIFFERENCE is denoted by the minus symbol (−). If the relations SUPPLIER and VENDOR are union-compatible, then the DIFFERENCE of SUPPLIER minus VENDOR would be written as follows:
supplier − vendor
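UNION, INTERSECT, and DIFFERENCE correspond closely to Python's built-in set operations when each row is stored as a tuple. The SUPPLIER and VENDOR rows below are hypothetical; both use the same (code, name) layout, so they are union-compatible.

```python
# Hypothetical union-compatible relations: (code, name) tuples.
supplier = {(1, "Acme"), (2, "Bolt Co"), (3, "Carver")}
vendor = {(2, "Bolt Co"), (3, "Carver"), (4, "Delta")}

union = supplier | vendor        # rows in either table, duplicates removed
intersect = supplier & vendor    # rows that appear in both tables
difference = supplier - vendor   # rows in SUPPLIER that are not in VENDOR

print(sorted(union))
print(sorted(intersect))
print(sorted(difference))
```

Note that DIFFERENCE is not symmetric: `vendor - supplier` would instead return the rows found only in VENDOR.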
6. PRODUCT yields all possible pairs of rows from two tables. If one table has six rows and the other table has three rows, the PRODUCT yields a list composed of 6 × 3 = 18 rows.
PRODUCT is denoted by the multiplication symbol (×). The PRODUCT of the CUSTOMER and AGENT relations would be written as follows:
customer × agent
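The row count in this example can be confirmed with `itertools.product` from the Python standard library. The six customer rows and three agent rows are hypothetical single-column tuples.

```python
from itertools import product

# Hypothetical tables: six CUSTOMER rows and three AGENT rows.
customer = [(10010,), (10011,), (10012,), (10013,), (10014,), (10015,)]
agent = [(501,), (502,), (503,)]

# Pair every customer row with every agent row.
pairs = [c + a for c, a in product(customer, agent)]

print(len(pairs))
print(pairs[:3])
```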
7. JOIN allows information to be intelligently combined from two or more tables.
JOIN is the real power behind the relational database, allowing the use of independent tables linked by common attributes.
• Natural join: links tables by selecting only the rows with common values in their common attribute(s).
• Equijoin: links tables on the basis of an equality condition that compares specified columns.
• Theta join: like an equijoin, but any other comparison operator is used.
• Outer join: matched pairs are retained, and any unmatched values in the other table are left null.
A natural join is the result of a three-stage process:
1. First, a PRODUCT of the tables is created, yielding the results shown in figure 3.12.
2. Second, a SELECT is performed on the output of step 1 to yield only the rows for which the AGENT_CODE values are equal. The common columns are referred to as the join columns. Step 2 yields the results shown in figure 3.13.
3. Third, a PROJECT is performed on the results of step 2 to yield a single copy of each attribute, thereby eliminating duplicate columns. Step 3 yields the output shown in figure 3.14.
Step 1: customer × agent
Step 2: σ customer.agent_code = agent.agent_code (new table)
Step 3: π cus_code, cus_name, cus_zip, agent_code, agent_phone (new table)
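The three stages of a natural join can be traced in a short Python sketch. The customer and agent rows are hypothetical, and prefixing attribute names with the table name in step 1 is just one way to keep the two AGENT_CODE columns apart.

```python
from itertools import product

# Hypothetical sample rows (illustration only).
customer = [
    {"cus_code": 10010, "cus_name": "Ramas", "cus_zip": "32145", "agent_code": 502},
    {"cus_code": 10011, "cus_name": "Dunne", "cus_zip": "32145", "agent_code": 501},
    {"cus_code": 10012, "cus_name": "Smith", "cus_zip": "36155", "agent_code": 421},
]
agent = [
    {"agent_code": 501, "agent_phone": "555-0101"},
    {"agent_code": 502, "agent_phone": "555-0102"},
    {"agent_code": 503, "agent_phone": "555-0103"},
]

# Step 1: PRODUCT pairs every customer row with every agent row.
product_rows = [
    {
        **{f"customer.{k}": v for k, v in c.items()},
        **{f"agent.{k}": v for k, v in a.items()},
    }
    for c, a in product(customer, agent)
]

# Step 2: SELECT keeps only the rows whose join columns are equal.
selected = [
    row for row in product_rows
    if row["customer.agent_code"] == row["agent.agent_code"]
]

# Step 3: PROJECT keeps a single copy of each attribute.
natural_join = [
    {
        "cus_code": row["customer.cus_code"],
        "cus_name": row["customer.cus_name"],
        "cus_zip": row["customer.cus_zip"],
        "agent_code": row["customer.agent_code"],
        "agent_phone": row["agent.agent_phone"],
    }
    for row in selected
]

print(len(product_rows), len(selected), len(natural_join))
for row in natural_join:
    print(row)
```

The customer with agent code 421 has no matching agent and therefore drops out of the result, as does agent 503, who has no customers.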


Outer Joins
• A left outer join yields all of the rows in the CUSTOMER table, including those that do not have a matching value in the AGENT table. Unmatched attributes are left blank or null. An example is shown in figure 3.15.
• A right outer join yields all of the rows in the AGENT table, including those that do not have matching values in the CUSTOMER table. An example is shown in figure 3.16.
• Outer joins are especially useful when one is trying to determine which value(s) in related tables cause referential integrity problems.
• Such problems are created when foreign key values do not match the primary key values in the related tables.
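A left outer join, and its use for spotting referential integrity problems, can be sketched in Python as follows. The function names and sample rows are hypothetical, and None stands in for null.

```python
def left_outer_join(left, right, key):
    """Keep every row of `left`; fill the right-hand attributes with None if unmatched."""
    right_only = [a for a in (right[0].keys() if right else []) if a != key]
    result = []
    for l_row in left:
        matches = [r for r in right if r[key] == l_row[key]]
        if matches:
            for r_row in matches:
                result.append({**l_row, **{a: r_row[a] for a in right_only}})
        else:
            result.append({**l_row, **{a: None for a in right_only}})
    return result


def right_outer_join(left, right, key):
    """A right outer join is a left outer join with the tables swapped."""
    return left_outer_join(right, left, key)


# Hypothetical sample rows (illustration only).
customer = [
    {"cus_code": 10010, "agent_code": 502},
    {"cus_code": 10011, "agent_code": 501},
    {"cus_code": 10012, "agent_code": 421},  # no such agent
]
agent = [
    {"agent_code": 501, "agent_phone": "555-0101"},
    {"agent_code": 502, "agent_phone": "555-0102"},
    {"agent_code": 503, "agent_phone": "555-0103"},  # no customers
]

left_result = left_outer_join(customer, agent, "agent_code")
right_result = right_outer_join(customer, agent, "agent_code")

# Customers whose foreign key matches no agent show up with null agent data.
orphans = [row["cus_code"] for row in left_result if row["agent_phone"] is None]
print(orphans)
```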

8. DIVIDE
• In relational algebra, DIVIDE is the operator that answers queries about one set of data being associated with all values of data in another set of data.
• The output of the DIVIDE operation is a single column containing the values of the dividend’s unshared column that are associated with every value in the divisor.
• Table 1 is divided by Table 2 to produce Table 3. The tables must have a common column.
• Tables 1 and 2 both contain the column CODE but do not share LOC.
• To be included in the resulting Table 3, a value in the unshared column (LOC) must be associated in Table 1 with every CODE value in Table 2.
• The only LOC value associated with both A and B is 5.
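DIVIDE can be sketched in Python by collecting, for each LOC value, the set of CODE values it is paired with, and then keeping the LOC values whose set covers the whole divisor. The rows below are hypothetical, chosen so that 5 is the only LOC value paired with both A and B.

```python
def divide(dividend, divisor):
    """Return the LOC values that are paired with every CODE in the divisor."""
    required = set(divisor)
    codes_by_loc = {}
    for code, loc in dividend:
        codes_by_loc.setdefault(loc, set()).add(code)
    return sorted(
        loc for loc, codes in codes_by_loc.items() if required <= codes
    )


# Hypothetical Table 1 (CODE, LOC) and Table 2 (CODE).
table1 = [
    ("A", 5), ("A", 9), ("A", 4),
    ("B", 5), ("B", 3),
    ("C", 6),
    ("D", 7), ("D", 8),
    ("E", 8),
]
table2 = ["A", "B"]

table3 = divide(table1, table2)
print(table3)
```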
The Data Dictionary and System Catalog
Data dictionary
• The data dictionary contains metadata that describes the data stored in the database, and it automatically produces database documentation.
• It stores:
    • the names of the data items in the database
    • the types and sizes of the data items
    • the constraints on each data item
    • the names of authorized users, the data items that each user can access, and the types of access allowed

System Catalog
• The system catalog is a very detailed system data dictionary that describes all objects within the database.
• It is a system-created database whose tables store the database characteristics and contents.
• System catalog tables can be queried just like any other tables.
• The system catalog automatically produces database documentation.
• All data dictionary information is found in the system catalog.
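As one concrete illustration, SQLite keeps its catalog in a table named `sqlite_master`, which can be queried with an ordinary SELECT statement. The sketch below uses Python's built-in `sqlite3` module with shortened versions of the VENDOR and PRODUCT tables. Catalog names and layouts differ from one DBMS to another, so this shows SQLite's behavior only.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE VENDOR (VEND_CODE INTEGER PRIMARY KEY, VEND_CONTACT TEXT)"
)
conn.execute(
    "CREATE TABLE PRODUCT ("
    "PROD_CODE TEXT PRIMARY KEY, PROD_DESCRIPT TEXT, "
    "VEND_CODE INTEGER REFERENCES VENDOR (VEND_CODE))"
)

# Query the catalog like any other table to list the user tables.
tables = [
    name
    for (name,) in conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
    )
]

# PRAGMA table_info returns one row per column; index 1 holds the column name.
columns = [row[1] for row in conn.execute("PRAGMA table_info(PRODUCT)")]
conn.close()

print(tables)
print(columns)
```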
Relationships within the Relational Database
• Relationships are classified as one-to-one (1:1), one-to-many (1:M), and many-to-many (M:N or M:M).
• The 1:M relationship is the relational database norm.
• The data model in figure 3.18 is implemented as shown in figure 3.19.

• As the figure shows, each painting is painted by one and only one painter, but each painter could have painted many paintings.
• There is only one row in the PAINTER table for any given row in the PAINTING table, but there may be many rows in the PAINTING table for any given row in the PAINTER table.
• A 1:1 relationship implies that one entity can be related to only one other entity, and vice versa.
• For example, one department chair (a professor) can chair only one department, and one department can have only one department chair. The entities PROFESSOR and DEPARTMENT thus exhibit a 1:1 relationship. It is modelled in figure 3.22, and its implementation is shown in figure 3.23.
• M:N relationships are not supported directly in the relational environment.
• However, they can be implemented by creating a new linking entity that is in a 1:M relationship with each of the original entities.
• For example, a STUDENT can take many CLASSes, and each CLASS can contain many STUDENTs. This M:N relationship is shown in figure 3.24.

TABLE NAME: STUDENT

STU_NUM  STU_NAME  CLASS_CODE
1000     BROWSER   100014
1000     BROWSER   100015
1000     BROWSER   100016
1001     SMITH     100014
1001     SMITH     100015

TABLE NAME: CLASS

CLASS_CODE  STU_NUM  COURSE_CODE  CLASS_SECTION
100014      1000     DBMS         3
100015      1000     OS           2
100016      1000     IWP          1
100014      1001     DBMS         3
100015      1001     OS           2
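The two tables above repeat student names and class details on every row. A linking entity removes that repetition. The following Python sketch stores each student and each class once and records the pairings in a separate structure; the name ENROLL and the helper functions are assumptions made for illustration, while the data values come from the tables above.

```python
# Each student and each class is stored exactly once.
student = {1000: "BROWSER", 1001: "SMITH"}
class_ = {
    100014: ("DBMS", 3),  # CLASS_CODE -> (COURSE_CODE, CLASS_SECTION)
    100015: ("OS", 2),
    100016: ("IWP", 1),
}

# Hypothetical linking entity ENROLL: one (STU_NUM, CLASS_CODE) pair per enrollment.
enroll = [
    (1000, 100014),
    (1000, 100015),
    (1000, 100016),
    (1001, 100014),
    (1001, 100015),
]


def courses_of(stu_num):
    """Course codes of the classes a student is enrolled in."""
    return [class_[code][0] for s, code in enroll if s == stu_num]


def students_in(class_code):
    """Names of the students enrolled in a class."""
    return [student[s] for s, code in enroll if code == class_code]


print(courses_of(1001))
print(students_in(100014))
```

STUDENT and CLASS each stand in a 1:M relationship with the linking entity, so the M:N relationship is represented without repeating names or course details.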
Data Redundancy Revisited
• Data redundancy leads to data anomalies, which can destroy the effectiveness of the database.
• Foreign keys control data redundancies by using common attributes shared by tables; their proper use is crucial to exercising data redundancy control.
• Sometimes data redundancy is necessary to make the database serve crucial information purposes.
• Such controlled redundancy is not a source of anomalies.

Indexes
• To locate a particular book in a library, one does not look through each and every book but instead uses the library’s catalog, which is indexed by title, topic, and author.
• The index points to the book’s location, making retrieval quick and easy.
• An index is an orderly arrangement used to logically access rows in a table.
• Index key: the index’s reference point; it points to the data location identified by the key.
• Unique index: an index in which the index key can have only one pointer value (row) associated with it.
• Each index is associated with only one table.
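The idea of an index key that points to row locations can be sketched with a Python dictionary. This illustrates the concept only and is not how a DBMS stores indexes internally; the customer rows and function names are hypothetical.

```python
# Hypothetical table: the list position stands in for the row location.
customers = [
    {"CUS_CODE": 10010, "CUS_LNAME": "Ramas"},
    {"CUS_CODE": 10011, "CUS_LNAME": "Dunne"},
    {"CUS_CODE": 10012, "CUS_LNAME": "Smith"},
    {"CUS_CODE": 10013, "CUS_LNAME": "Smith"},
]


def build_index(rows, key):
    """Map each index key value to the positions of the rows that hold it."""
    index = {}
    for position, row in enumerate(rows):
        index.setdefault(row[key], []).append(position)
    return index


def is_unique(index):
    """A unique index has exactly one pointer per index key."""
    return all(len(positions) == 1 for positions in index.values())


code_index = build_index(customers, "CUS_CODE")
lname_index = build_index(customers, "CUS_LNAME")

# Look up rows through the index instead of scanning the whole table.
smiths = [customers[p] for p in lname_index["Smith"]]

print(is_unique(code_index), is_unique(lname_index))
print(smiths)
```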
Codd’s Relational Database Rules
• In 1985, Codd published a list of 12 rules to define a relational database system.
• He published the list because products were being marketed as “relational” even though they did not meet minimum relational standards.
• Even dominant database vendors do not fully support all 12 rules.

Foundation Rule
A relational database management system must manage its stored data
using only its relational capabilities.
1. Information Rule
All information in the database should be represented in one and
only one way - as values in a table.
2. Guaranteed Access Rule
Each and every value is guaranteed to be logically accessible by a
combination of table name, primary key value and column name.
3. Systematic Treatment of Null Values
Nulls must be represented and treated in a systematic way,
independent of data type
4. Dynamic On-line Catalog Based on the Relational Model
The metadata must be stored and managed as ordinary data,
that is in tables within the database. Such data must be available
to authorized users using the standard database relational
language.
5. Comprehensive Data Sublanguage Rule
A relational system may support several languages and various modes of terminal use. However, there must be at least one language whose statements are expressible, per some well-defined syntax, as character strings and whose ability to support all of the following is comprehensive:
• data definition
• view definition
• data manipulation (interactive and by program)
• integrity constraints
• authorization
• transaction boundaries (begin, commit, and rollback)

6. View Updating
All views that are theoretically updateable are also updateable by the system.
7. High-level Insert, Update, and Delete
The database must support set-level inserts, updates, and deletes.
8. Physical Data Independence
Application programs and terminal activities remain logically unaffected
whenever any changes are made in either storage representation or access
methods.
9. Logical Data Independence
Application programs and ad hoc facilities are logically unaffected when changes are
made to the table structures.
10. Integrity Independence
Integrity constraints specific to a particular relational database must be definable in
the relational data sublanguage and storable in the catalog, not in the application
programs.
11. Distribution Independence
The data manipulation sublanguage of a relational DBMS must enable application
programs and terminal activities to remain logically unaffected whether and whenever
data are physically centralized or distributed.
12. Nonsubversion Rule
If a relational system supports low-level access to the data, there must not be a way to bypass the integrity rules of the database.
References
Carlos Coronel, Steven Morris, and Peter Rob. Database Principles: Fundamentals of Design, Implementation, and Management.
