
Database Design, Implementation, and Validation

Relational Model

Vijay Kumar SICE, Computer Networking University of Missouri-Kansas City

Relational model: It is used to create a relational schema. A relational model uses a two-dimensional (tabular) structure to represent the schema: the columns represent the set of attributes used in the relational schema, and the rows represent instances of the object. An instance of a relational schema is called a relation. For example, a relational schema in the relational model can be represented as:

Emp_name   Age   Sex   Emp_No

Relation: A relation is a two-dimensional table. It identifies a set of instances of objects of the same class using a set of attributes and their values. A relation is identified by a name.

Example of a relation:

Relation name: EMP
Attribute names (column names): Emp_name, Age, Sex, Emp_No

EMP
Emp_name   Age   Sex   Emp_No
Anderson   21    F     100
Decker     22    M     200
Jackson    19    M     300

The basic questions

1. How are values assigned to these columns? Why is Anderson under Name and not under Age?
2. How are column names such as Age, Sex, etc. selected?

We need to understand the following terms to answer these questions.

Atomic value: A value that cannot be divided into smaller parts without losing its meaning. Example: Social Security Number.

Attribute: Name of a column of a relation. Example: Age.

Domain (D): A set of atomic values. A domain is identified by a name. In order to create a proper domain for an attribute we associate a type with it. The domain of an attribute is written as dom(attribute name).

Tuple: A row of a relation is called a tuple. A tuple has several fields, and the total number of fields in a tuple is the same as the total number of columns in the relation. Example: <Anderson, 21, F, 100>.

Degree of a relation: The number of columns (attributes) in a relation. The degree of EMP is 4. A tuple of this relation is a 4-tuple.

Relation scheme: The structure of a relation, i.e., the set of attributes but no values. A relation scheme is denoted by R(A1, A2, ..., An), where R is the relation name and the Ai are attribute names. Example: EMP(Name, Age, Sex, Emp_No).
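The terms above (attribute, domain, tuple, degree, relation scheme) can be sketched in a few lines of Python. This is only an illustration: the class RelationScheme, the DOMAINS table, and the particular domain rules (for example, Age between 0 and 150) are assumptions made for the sketch, not part of the notes. It also suggests an answer to the first basic question: Anderson goes under Name because it belongs to dom(Name) and not to dom(Age).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RelationScheme:
    """A relation scheme R(A1, ..., An): a name plus attribute names, no values."""
    name: str
    attributes: tuple

    @property
    def degree(self):
        # Degree = number of attributes (columns).
        return len(self.attributes)


# Hypothetical domains, written as membership tests: dom(A) = {v | DOMAINS[A](v)}.
DOMAINS = {
    "Name": lambda v: isinstance(v, str) and v != "",
    "Age": lambda v: isinstance(v, int) and 0 <= v <= 150,
    "Sex": lambda v: v in {"F", "M"},
    "Emp_No": lambda v: isinstance(v, int) and v > 0,
}


def is_valid_tuple(scheme, values):
    """A tuple is valid if it has one field per attribute and each vi is in dom(Ai)."""
    if len(values) != scheme.degree:
        return False
    return all(DOMAINS[a](v) for a, v in zip(scheme.attributes, values))


EMP = RelationScheme("EMP", ("Name", "Age", "Sex", "Emp_No"))

print(EMP.degree)
print(is_valid_tuple(EMP, ("Anderson", 21, "F", 100)))
print(is_valid_tuple(EMP, (21, "Anderson", "F", 100)))  # values under the wrong columns
```

Writing each domain as a membership test keeps the sketch short; a real system would normally attach a declared type to each attribute instead.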

Relation instance (r): An instance of a relation, represented by "r", is a set of n-tuples. Thus r = {t1, t2, ..., tp}, where each ti is a tuple.

t = <v1, v2, ..., vn>, where each vi is an element of dom(Ai), 1 ≤ i ≤ n.
t[Ai] = value of the attribute Ai in tuple t.

Formally: A relation scheme R is a finite set of attribute names {A1, A2, ..., An}. Corresponding to each attribute name Ai is a set Di, 1 ≤ i ≤ n, called the domain of Ai. The domains are arbitrary non-empty sets, finite or countably infinite. Let D = D1 ∪ D2 ∪ ... ∪ Dn. A relation r on R is a finite set of mappings {t1, t2, ..., tp} from R to D with the restriction that for each mapping t ∈ r, t(Ai) must be in Di, 1 ≤ i ≤ n. The mappings are called tuples.

A relation instance r(R) is a subset of the Cartesian product of the domains of R, i.e.,

r(R) ⊆ dom(A1) × dom(A2) × ... × dom(An)

Example (restricting EMP to the attributes Emp_No and Age):

dom(Emp_No) = {100, 200, 300}
dom(Age) = {21, 22, 19}
r(EMP) ⊆ {100, 200, 300} × {21, 22, 19}

This product gives nine tuples, but only three of them are correct, i.e., state a fact: <100, 21>, <200, 22>, and <300, 19>. The Cartesian product generates all combinations of these values, but only a subset is meaningful for the relation EMP. That subset of the product represents an instance of the relation.

Ordering of Tuples

Tuples are not restricted to any specific order in a relation, i.e., a tuple can be placed anywhere in a relation without affecting its semantics. The ordering of columns is also immaterial.
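Both points can be checked with a small Python sketch. It assumes a relation is modeled as a Python set of tuples; the variable names are chosen only for this illustration.

```python
from itertools import product

dom_emp_no = {100, 200, 300}
dom_age = {21, 22, 19}

# Cartesian product of the two domains: every (Emp_No, Age) combination.
all_combinations = set(product(dom_emp_no, dom_age))

# The instance of EMP keeps only the combinations that state a fact.
r_emp = {(100, 21), (200, 22), (300, 19)}

# The same tuples written in a different order.
same_tuples_other_order = {(300, 19), (100, 21), (200, 22)}

print(len(all_combinations))             # number of combinations generated
print(r_emp <= all_combinations)         # the instance is a subset of the product
print(r_emp == same_tuples_other_order)  # order of tuples does not matter
```

Using a set rather than a list mirrors the model: a set has no order and holds no duplicates.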

However, when a relation is stored as a file, an order is defined among its tuples for sorting purposes. This ordering is not a part of the relational model. Similar reasoning applies to the ordering of columns.

In a database we treat a relation r(R) as time-varying, so that tuples can be deleted, added, or modified. However, a relation scheme is time-invariant.

Keys

A key of a relation r(R) is a subset K = {A1, A2, ..., Am} of the attributes of R, m ≤ n (where n is the degree of R), with the following property: for any two distinct tuples t1 and t2 of r, t1[K] ≠ t2[K]. This definition is too general and needs some refinement.

A key of a relation r(R) is a subset K of R such that for any two distinct tuples t1 ∈ r and t2 ∈ r, t1(K) ≠ t2(K), and no proper subset K' of K shares this property.

Superkey: The set of all attributes of a relation can also identify a tuple uniquely. But this set may contain some attribute whose value may be the same in all the tuples, so that attribute contributes nothing to identification. Such a set may still serve as a key, and we call it a superkey. For example, {Emp_No, Name} can identify a tuple uniquely, but the Name attribute is not necessary; Emp_No alone is enough. The set {Emp_No, Name}, therefore, is a superkey of EMP.

Candidate key: The elimination of Name from {Emp_No, Name} gives a set in which every member must be present to identify a tuple. Such a minimal subset of a superkey is called a candidate key of EMP. If there is more than one candidate key in a relation, then one of them is selected by the system designer and is called the primary key.

Convention: The attributes that form the primary key are underlined in the relation scheme. For example, in r(A B C D) with A and C underlined, {A, C} forms the primary key of r(R).

Foreign Key (FK): A candidate key in a relation which establishes a relationship with another relation. For example: let R1 and R2 be two relation schemes, and let C be an attribute of R1 that is a candidate key of R1. If the attribute C is also included in relation R2, then C in R2 is a foreign key of R2.
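The superkey and candidate key definitions can be tried out on the EMP instance with a short Python sketch. The helper names is_superkey and candidate_keys are hypothetical. One important caveat: the sketch inspects a single instance only, whereas a key must hold for every legal instance of the scheme. Age and Name happen to be unique in this three-tuple sample, so they are reported too, although a designer would not normally accept them as keys.

```python
from itertools import combinations

ATTRIBUTES = ("Name", "Age", "Sex", "Emp_No")
EMP = [
    {"Name": "Anderson", "Age": 21, "Sex": "F", "Emp_No": 100},
    {"Name": "Decker", "Age": 22, "Sex": "M", "Emp_No": 200},
    {"Name": "Jackson", "Age": 19, "Sex": "M", "Emp_No": 300},
]


def is_superkey(relation, attrs):
    """True if no two tuples of this instance agree on all attributes in attrs."""
    projected = {tuple(t[a] for a in attrs) for t in relation}
    return len(projected) == len(relation)


def candidate_keys(relation, attributes):
    """Minimal superkeys of this instance, found by trying smaller sets first."""
    found = []
    for size in range(1, len(attributes) + 1):
        for attrs in combinations(attributes, size):
            if any(set(key) <= set(attrs) for key in found):
                continue  # contains a smaller key, so it is not minimal
            if is_superkey(relation, attrs):
                found.append(attrs)
    return found


print(is_superkey(EMP, ("Emp_No", "Name")))  # superkey, but not minimal
print(is_superkey(EMP, ("Sex",)))            # M appears twice
print(candidate_keys(EMP, ATTRIBUTES))
```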

Remember, C in R2 may not be a key (candidate or primary) of R2. In this situation the value of C will identify a unique tuple in R1. In R2 it may also identify a unique tuple, or it may identify a set of tuples.

Operations on a relation

Any operation (modify, insert, delete) on a relation must satisfy some correctness criteria before it is applied to the data. These correctness criteria are called integrity constraints. There are three types:

Key constraints: A key must have a unique value. Operations that may violate this constraint are not allowed.

Entity integrity: No primary key value can be null. The value of a primary key is unique, and null is not a unique value.

Referential integrity: If two relations are related via a foreign key and one relation is modified, then the other related relation must be modified accordingly, or the operation is not allowed.

The presence of these keys and constraints makes operations on a relation complex. These constraints are not explicitly expressed but are implied in the transaction code.

Relational Database Schema

A relational database schema is R = {R1, R2, ..., Rn} and Integrity constraints = {IC1, IC2, ..., ICm}. A relational database is an occurrence of R. This occurrence (instance) is created by the user of the database management software.

Relational Algebra

A relation is processed by the following operations:

- Select (σ)
- Project (π)
- Product (×)
- Union (∪)
- Intersection (∩)
- Difference (-)
- Join
- Division (÷)

The set of these operations is called Relational Algebra, and it is a procedural query language. A procedural data manipulation language (DML) has the following characteristics:

- Its commands must be embedded in a high-level language.
- It manipulates "one record at a time", so the programmer has to use iterative commands (loops) to access the desired records individually.

Select: Denoted by the letter σ. It is a unary operator, i.e., it takes only one operand (a relation), and it yields another relation. The new relation is a horizontal subset of the parent relation r(R). The predicate (P) appears as a subscript of σ: σ_{<P>}(relation name), i.e., σ_{A = a}(r).

Example: σ_{Age = 21}(EMP)

Anderson   21   F   100

Example: σ_{Emp_No > 100}(EMP)

Decker     22   M   200
Jackson    19   M   300

The following operators are allowed in the predicate: and (∧), or (∨), =, ≠, ≤, <, >, and ≥.

Example Query: Find all male employees with Emp_No > 200.

Relational algebra expression: σ_{Sex = M ∧ Emp_No > 200}(EMP)

Result:

Jackson    19   M   300

Properties of σ

σ is commutative. That is, σ_{<P1>}(σ_{<P2>}(r)) = σ_{<P2>}(σ_{<P1>}(r)).

Projection (π)

A projection is a unary operator on a relation r. It creates a vertical subset of the parent relation. Let X be a subset of the attributes of R. The result of the projection of r(R) onto X is the relation r'(X) obtained by extracting from r the columns (attributes) that are in X. The extraction may produce duplicate tuples; because a relation is a set, each distinct tuple appears only once in r'(X). In relational algebra, π_X(r) is the relation r'(X) = {t(X) | t is in r}.

Example Query: Find the Age of all employees.

Relational algebra expression: π_{Age}(EMP)

Result:

Age
21
22
19

Properties of π

If two projections are applied to a relation in a row, the latter subsumes the former. Thus, if Y ⊆ X ⊆ R, then π_Y(π_X(r)) = π_Y(r). Similarly, for a string of projections, only the outermost need be considered for evaluation: if X1 ⊆ X2 ⊆ ... ⊆ Xm ⊆ R, then π_{X1}(π_{X2}( ... (π_{Xm}(r)) ... )) = π_{X1}(r).

Boolean Operations

Intersection: ∩
Union: ∪
Difference: -

Suppose r and s are relations over R. Then:

r ∩ s: a new relation containing all tuples that are common to both r and s.
r ∪ s: a new relation containing all tuples of r and s, without duplicates.
r - s: a new relation q containing all tuples of r that are not in s.

Examples

r                 s
A    B    C       A    B    C
a1   b1   c1      a1   b2   c1
a1   b2   c1      a2   b2   c1
a2   b1   c2      a2   b2   c2

r ∪ s = q(A B C)
a1   b1   c1
a1   b2   c1
a2   b1   c2
a2   b2   c1
a2   b2   c2

r ∩ s = q(A B C)
a1   b2   c1

r - s = q(A B C)
a1   b1   c1
a2   b1   c2
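The three Boolean operations map directly onto Python's built-in set operators, so the example can be reproduced with a minimal sketch. It assumes each relation is modeled as a set of (A, B, C) tuples.

```python
# r and s from the example, each tuple written as (A, B, C).
r = {("a1", "b1", "c1"), ("a1", "b2", "c1"), ("a2", "b1", "c2")}
s = {("a1", "b2", "c1"), ("a2", "b2", "c1"), ("a2", "b2", "c2")}

union = r | s          # all tuples of r and s, without duplicates
intersection = r & s   # tuples common to r and s
difference = r - s     # tuples of r that are not in s

for name, rel in (("union", union), ("intersection", intersection), ("difference", difference)):
    print(name, sorted(rel))
```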

r ∩ s = r - (r - s)

Note: r - s ≠ s - r

Join Operation: A join combines two relations on a common attribute.

Join steps:

1. Get the cross product of the two relations.
2. Apply the join condition to the result of the cross product and keep those tuples that satisfy it. Note that the join condition is equality between the values of the common attributes of the two relations.
3. Optionally apply a projection to the result.

In order to combine two relations there must be at least one common domain in both relations.

Let r(R) and s(S) be two relations over relation schemes R and S respectively, and let R ∪ S = T. The join of r and s, written r join s, is the relation q(T) of all tuples t over T formed from a tuple tr of r and a tuple ts of s such that tr(R ∩ S) = ts(R ∩ S). Thus every tuple in q is a combination of a tuple from r and a tuple from s with equal (R ∩ S) values.

Example:

r                 s
A    B    C       M    B
a1   b1   c1      m1   b1
a2   b2   c2      m2   b2

r × s
A    B    C    M    B
a1   b1   c1   m1   b1
a1   b1   c1   m2   b2
a2   b2   c2   m1   b1
a2   b2   c2   m2   b2

The join condition is r.B = s.B. So when the join condition is applied to (r × s), we have the following result.

r join s
A    B    C    M    B
a1   b1   c1   m1   b1
a2   b2   c2   m2   b2
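The three join steps can be followed literally in a small Python sketch. The function names cross_product, select, and project, and the qualified column names such as "r.B", are assumptions made for this illustration. Computing the full product first is the textbook formulation, not an efficient way to evaluate a join.

```python
from itertools import product

r = [{"A": "a1", "B": "b1", "C": "c1"}, {"A": "a2", "B": "b2", "C": "c2"}]
s = [{"M": "m1", "B": "b1"}, {"M": "m2", "B": "b2"}]


def cross_product(r, s, r_name="r", s_name="s"):
    """Step 1: pair every tuple of r with every tuple of s.

    Attribute names are qualified (r.B, s.B) so the two B columns stay distinct.
    """
    rows = []
    for tr, ts in product(r, s):
        row = {f"{r_name}.{a}": v for a, v in tr.items()}
        row.update({f"{s_name}.{a}": v for a, v in ts.items()})
        rows.append(row)
    return rows


def select(rows, predicate):
    """Step 2: keep only the tuples that satisfy the join condition."""
    return [row for row in rows if predicate(row)]


def project(rows, attrs):
    """Step 3 (optional): keep only the listed columns, dropping duplicate tuples."""
    seen, out = set(), []
    for row in rows:
        key = tuple(row[a] for a in attrs)
        if key not in seen:
            seen.add(key)
            out.append({a: row[a] for a in attrs})
    return out


product_rs = cross_product(r, s)
joined = select(product_rs, lambda row: row["r.B"] == row["s.B"])
projected = project(joined, ["r.A", "r.B", "r.C", "s.M"])

print(len(product_rs))
for row in joined:
    print(row)
```

Stopping after step 2 leaves both B columns in place; applying step 3 keeps a single copy of B.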

The definition of join does not require that R and S have a non-empty intersection. If R ∩ S = ∅, then r join s is just a Cartesian product.

The above result of join can be further improved. To allow complete flexibility in defining the predicate, a join is divided into three types.

Types of Join

There are three types of join: (a) Equijoin, (b) Natural join, and (c) Theta join.

Equijoin: In an equijoin the duplicate columns are not removed from the resultant relation. In the above example, column B appears twice in the result. So an equijoin is obtained by a Cartesian product followed by the desired selection.

Natural join: In a natural join the duplicate columns are removed by applying a projection. So the result of the natural join of r(R) and s(S) is

r natural join s = π_{ABCM}(r join s)

A    B    C    M
a1   b1   c1   m1
a2   b2   c2   m2

Theta join: The predicate includes an operator other than equality. It can be any Boolean or comparison operator (∧, ∨, NOT, <, >, ≠, etc.). Example: If the selection predicate in the above example is σ_{r.B ≠ s.B}(t), then the result will be

A    B    C    M    B
a1   b1   c1   m2   b2
a2   b2   c2   m1   b1

A projection may not be applied, since the two B columns have different values in each tuple.

Aggregate functions

These are functions which work on the set of values of an attribute of a relation. Let us use F as the functional indicator.

SUM: Sums up all the values of an attribute.
AVERAGE: Gets the average of the values of an attribute.
MAXIMUM/MINIMUM: Selects the maximum/minimum value from the set of values of an attribute.
COUNT: Counts the number of values of an attribute in a relation.
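As a rough illustration, the aggregate functions can be applied to the Age attribute of EMP in Python. The helper aggregate and its calling convention are assumptions for this sketch; they are not the F notation itself.

```python
EMP = [
    {"Name": "Anderson", "Age": 21, "Sex": "F", "Emp_No": 100},
    {"Name": "Decker", "Age": 22, "Sex": "M", "Emp_No": 200},
    {"Name": "Jackson", "Age": 19, "Sex": "M", "Emp_No": 300},
]


def aggregate(relation, attribute, func):
    """Apply func to the list of values that attribute takes in the relation."""
    return func([t[attribute] for t in relation])


total = aggregate(EMP, "Age", sum)                           # SUM
average = aggregate(EMP, "Age", lambda v: sum(v) / len(v))   # AVERAGE
maximum = aggregate(EMP, "Age", max)                         # MAXIMUM
minimum = aggregate(EMP, "Age", min)                         # MINIMUM
count = aggregate(EMP, "Age", len)                           # COUNT

print(total, average, maximum, minimum, count)
```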
