
Database Management System

Surendra Singh Chahar Assistant Professor CS & IT Department SET, Sharda University
Surendra Singh Chahar Department of CS&IT, SU 1

Introduction
What is a Database?
A database is a collection of related data. Data are known facts that can be recorded and that have implicit meaning. A database has some source from which its data are derived and some degree of interaction with events in the real world.


An Example, University Database: Data about students, faculty, courses, research laboratories, course registration/enrollment etc. Reflects the state of affairs of the academic aspects of the university. Purpose: To keep an accurate track of the academic activities of the university.


Example of a Database (with a Conceptual Data Model)


Example: Part of a UNIVERSITY environment. Some mini-world entities:
STUDENTs
COURSEs
SECTIONs (of COURSEs)
(academic) DEPARTMENTs
TEACHERs

Note: The above could be expressed in the ENTITY-RELATIONSHIP data model.


Example of a Database (with a Conceptual Data Model)


Some mini-world relationships:
SECTIONs are of specific COURSEs
STUDENTs take SECTIONs
COURSEs have prerequisite COURSEs
TEACHERs teach SECTIONs
COURSEs are offered by DEPARTMENTs
STUDENTs major in DEPARTMENTs

Note: The above could be expressed in the ENTITY-RELATIONSHIP data model.

Database Management System (DBMS)


A general-purpose software system that enables the processes of defining, constructing, and manipulating a database:
Defining a database involves specifying the data types, structures, and constraints for the data to be stored in the database.
Constructing the database is the process of storing the data itself on some storage medium that is controlled by the DBMS.
Manipulating a database includes such functions as querying the database to retrieve specific data and updating the database to reflect changes.
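The three activities can be sketched with Python's built-in sqlite3 module. This is only an illustration: the STUDENT table, its columns, and the sample rows are assumptions made for the example, not part of the course material.

```python
import sqlite3

# In-memory database; table and column names are illustrative assumptions.
conn = sqlite3.connect(":memory:")

# Defining: specify data types, structure, and constraints.
conn.execute(
    "CREATE TABLE student ("
    " roll_no INTEGER PRIMARY KEY,"
    " name TEXT NOT NULL,"
    " major TEXT)"
)

# Constructing: store the data itself under the control of the DBMS.
conn.executemany(
    "INSERT INTO student (roll_no, name, major) VALUES (?, ?, ?)",
    [(1, "Asha", "CS"), (2, "Ravi", "IT"), (3, "Meena", "CS")],
)

# Manipulating: update the data, then query it.
conn.execute("UPDATE student SET major = 'CS' WHERE roll_no = 2")
cs_students = [
    row[0]
    for row in conn.execute(
        "SELECT name FROM student WHERE major = 'CS' ORDER BY roll_no"
    )
]
print(cs_students)
```

After the update, all three students have the major CS, so the query returns all three names in roll-number order.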

OS File System Storage Based Approach


Files of records are used for data storage.
Data redundancy wastes space and makes consistency difficult to maintain.
Record structures are hard-coded into the programs, so structural modifications are hard to perform.
Each different data access request (query) is performed by a separate program, and it is difficult to anticipate all such requests.
Creating the system requires a lot of effort.

DBMS Approach
DBMS: separation of data and metadata; flexibility in changing metadata; program-data independence.
Data access language: standardized (SQL); ad-hoc query formulation is easy.
System development: less effort required; concentrating on logical-level design is enough; the DBMS provides components to organize data storage.

Actors on the Scene
Database Administrators: The DBA is responsible for authorizing access to the database, for coordinating and monitoring its use, and for acquiring software and hardware resources as required.
Database Designers: DDs are responsible for identifying the data to be stored in the database and for choosing appropriate structures to represent and store this data.


End Users
End users are the people whose jobs require access to the database for querying, updating, and generating reports; the database primarily exists for their use.
Casual end users: They access the database occasionally, but they may need different information each time.
Naive/parametric end users: They constantly query and update the database using standard types of queries and updates. For example, bank tellers check account balances and post withdrawals and deposits.

Sophisticated end users: Engineers, scientists, business analysts, and others who thoroughly familiarize themselves with the facilities of the DBMS.
Stand-alone users: They maintain personal databases by using ready-made program packages that provide easy-to-use menu-based or graphics-based interfaces.


Chapter 2
Database System Concepts and Architecture


Data Models
Data Model: A set of concepts to describe the structure of a database, and certain constraints that the database should obey. Data Model Operations: Operations for specifying database retrievals and updates by referring to the concepts of the data model. Operations on the data model may include basic operations and user-defined operations.


Categories of data models


Conceptual (high-level, semantic) data models: They provide concepts that are close to the way many users perceive data.
Physical (low-level, internal) data models: They provide concepts that describe details of how data is stored in the computer.
Implementation (representational) data models: They provide concepts that fall between the above two, balancing user views with some computer storage details.

Hierarchical Model
ADVANTAGES:
Hierarchical Model is simple to construct and operate on.
Corresponds to a number of natural hierarchically organized domains, e.g., assemblies in manufacturing, personnel organization in companies.
Language is simple; uses constructs like GET, GET UNIQUE, GET NEXT, GET NEXT WITHIN PARENT, etc.

DISADVANTAGES:
Navigational and procedural nature of processing.
Database is visualized as a linear arrangement of records.
Little scope for "query optimization".

Network Model
ADVANTAGES:
Network Model is able to model complex relationships and represents semantics of add/delete on the relationships. Can handle most situations for modeling using record types and relationship types. Language is navigational; uses constructs like FIND, FIND member, FIND owner, FIND NEXT within set, GET etc. Programmers can do optimal navigation through the database.

DISADVANTAGES:
Navigational and procedural nature of processing.
Database contains a complex array of pointers that thread through a set of records.

Schemas versus Instances


Database Schema: The description of a database. Includes descriptions of the database structure and the constraints that should hold on the database. Schema Diagram: A diagrammatic display of (some aspects of) a database schema. Schema Construct: A component of the schema or an object within the schema, e.g., STUDENT, COURSE. Database Instance: The actual data stored in a database at a particular moment in time. Also called database state (or occurrence).

Three-Schema Architecture
Proposed to support DBMS characteristics of:
Program-data independence. Support of multiple user views of the data. Use of a catalog to store the database description (schema).


Three-Schema Architecture
Defines DBMS schemas at three levels:
Internal level has an internal schema to describe physical storage structures of the database. Internal schema uses a physical data model and describes the complete details of data storage and access paths for the database. Conceptual schema at the conceptual level to describe the structure and constraints for the whole database for a community of users. It concentrates on describing entities, data types, relationships, user operations, and constraints.


Three-Schema Architecture
External schemas at the external level to describe the various user views. Usually uses the same data model as the conceptual level.


THREE SCHEMA ARCHITECTURE
[Figure: the three-schema architecture. At the external level, end users work with external views. An external/conceptual mapping connects these views to the conceptual schema at the conceptual level. A conceptual/internal mapping connects the conceptual schema to the internal schema at the internal level, which describes the stored database.]

Data Independence
It is defined as the capacity to change the schema at one level of a database system without having to change the schema at the next higher level. Logical Data Independence: The capacity to change the conceptual schema without having to change the external schemas and their application programs. Physical Data Independence: The capacity to change the internal schema without having to change the conceptual schema.


Data Independence
When a schema at a lower level is changed, only the mappings between this schema and higher-level schemas need to be changed in a DBMS that fully supports data independence. The higher-level schemas themselves are unchanged. Hence, the application programs need not be changed since they refer to the external schemas.
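A minimal sketch of logical data independence, assuming a SQL view plays the role of the external schema and its definition plays the role of the external/conceptual mapping. The table names, the view name emp_directory, and the sample row are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Conceptual-level table (first version) and an external view defined over it.
conn.execute("CREATE TABLE employee (emp_no INTEGER PRIMARY KEY, full_name TEXT)")
conn.execute("INSERT INTO employee VALUES (1, 'John Smith')")
conn.execute(
    "CREATE VIEW emp_directory AS SELECT emp_no, full_name AS name FROM employee"
)

# The application program refers only to the external view.
APP_QUERY = "SELECT name FROM emp_directory ORDER BY emp_no"
before = conn.execute(APP_QUERY).fetchall()

# Change the conceptual schema: the name is now stored in two columns.
conn.execute("DROP VIEW emp_directory")
conn.execute(
    "CREATE TABLE employee_v2 ("
    " emp_no INTEGER PRIMARY KEY, first_name TEXT, last_name TEXT)"
)
conn.execute("INSERT INTO employee_v2 VALUES (1, 'John', 'Smith')")
conn.execute("DROP TABLE employee")

# Only the mapping (the view definition) is rewritten; APP_QUERY is untouched.
conn.execute(
    "CREATE VIEW emp_directory AS "
    "SELECT emp_no, first_name || ' ' || last_name AS name FROM employee_v2"
)
after = conn.execute(APP_QUERY).fetchall()
print(before == after)
```

The application query is identical before and after the schema change; only the view definition had to be rewritten.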

DBMS Languages
Data Definition Language (DDL): Used by the DBA and database designers to specify the conceptual schema of a database. In many DBMSs, the DDL is also used to define internal and external schemas (views). In some DBMSs, separate storage definition language (SDL) and view definition language (VDL) are used to define internal and external schemas.

DBMS Languages
Data Manipulation Language (DML): Used to specify database retrievals and updates. DML commands (data sub-language) can be embedded in a general-purpose programming language (host language), such as COBOL, C or an Assembly Language. Alternatively, stand-alone DML commands can be applied directly (query language).


DBMS Languages
High Level or Non-procedural Languages: e.g., SQL, are set-oriented and specify what data to retrieve rather than how to retrieve it. Also called declarative languages.
Low Level or Procedural Languages: record-at-a-time; they specify how to retrieve data and include constructs such as looping.
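The contrast can be sketched in Python with sqlite3. The EMPLOYEE table and its rows are assumptions for illustration. The first query states what is wanted in a single set-oriented statement; the second walks the records one at a time and spells out how to filter them.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employee (name TEXT, dept TEXT, salary INTEGER)")
conn.executemany(
    "INSERT INTO employee VALUES (?, ?, ?)",
    [
        ("Smith", "Research", 30000),
        ("Wong", "Research", 40000),
        ("Zelaya", "Admin", 25000),
    ],
)

# Declarative (set-oriented): state WHAT is wanted; the DBMS decides how.
declarative = [
    row[0]
    for row in conn.execute(
        "SELECT name FROM employee WHERE dept = 'Research' AND salary > 35000"
    )
]

# Procedural (record-at-a-time): spell out HOW with an explicit loop.
procedural = []
for name, dept, salary in conn.execute("SELECT name, dept, salary FROM employee"):
    if dept == "Research" and salary > 35000:
        procedural.append(name)

print(declarative, procedural)
```

Both approaches return the same result here; the difference is who decides the access strategy.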


DBMS Interfaces
User-friendly interfaces:
Menu-based, popular for browsing on the web
Forms-based, designed for naive users
Graphics-based (Point and Click, Drag and Drop, etc.)
Natural language: requests in written English
Combinations of the above

Programmer interfaces for embedding DML in programming languages:
Pre-compiler Approach
Procedure (Subroutine) Call Approach

The Database System Environment


DBMS Component Modules


Database System Utilities


To perform certain functions such as:
Loading: A loading utility is used to load existing data files, such as text files or sequential files, into the database. Usually, the current (source) format of the data file and the desired (target) database file structure are specified to the utility, which then automatically reformats the data and stores it in the database.
Backing up: A backup utility creates a backup copy of the database, usually by dumping the entire database onto tape. The backup copy can be used to restore the database in case of catastrophic failure.

Database System Utilities


Reorganizing: This utility can be used to reorganize a database file into a different file organization to improve performance.
Report generation utilities.
Performance monitoring utilities.
Other functions, such as sorting, user monitoring, data compression, etc.

DBMS Languages
Once the design of a database is completed and a DBMS is chosen to implement the database, the first order of the day is to specify conceptual and internal schemas for the database and any mappings between the two. In many DBMSs where no strict separation of levels is maintained, one language, called the data definition language (DDL), is used by the DBA and by database designers to define both schemas. The DBMS will have a DDL compiler whose function is to process DDL statements in order to identify descriptions of the schema constructs and to store the schema description in the DBMS catalog. In DBMSs where a clear separation is maintained between the conceptual and internal levels, the DDL is used to specify the conceptual schema only. Another language, the storage definition language (SDL), is used to specify the internal schema. The mappings between the two schemas may be specified in either one of these languages. For a true three-schema architecture, we would need a third language, the view definition language (VDL), to specify user views and their mappings to the conceptual schema, but in most DBMSs the DDL is used to define both conceptual and external schemas.
Once the database schemas are compiled and the database is populated with data, users must have some means to manipulate the database. Typical manipulations include retrieval, insertion, deletion, and modification of the data. The DBMS provides a set of operations or a language called the data manipulation language (DML) for these purposes.

Chapter 3
Data Modeling Using the Entity-Relationship (ER) Model

Surendra Singh Chahar Department of CS&IT, SU Shamkant B. Navathe


ER DIAGRAM OF COMPANY DATABASE
Requirements of the Company
The company is organized into DEPARTMENTs. Each department has a name, a number, and an employee who manages the department. We keep track of the start date of the department manager.
Each department controls a number of PROJECTs. Each project has a name, a number, and is located at a single location.

ER DIAGRAM OF COMPANY DATABASE (Cont.)

We store each EMPLOYEE's social security number, address, salary, sex, and birthdate. Each employee works for one department but may work on several projects. We keep track of the number of hours per week that an employee currently works on each project. We also keep track of the direct supervisor of each employee.
Each employee may have a number of DEPENDENTs. For each dependent, we keep track of their name, sex, birthdate, and relationship to the employee.

ER Model Concepts
Entities and Attributes
Entities are specific objects or things in the mini-world that are represented in the database. For example, the EMPLOYEE John Smith, the Research DEPARTMENT, the ProductX PROJECT.
Attributes are properties used to describe an entity. For example, an EMPLOYEE entity may have a Name, SSN, Address, Sex, and BirthDate.
A specific entity will have a value for each of its attributes. For example, a specific employee entity may have Name='John Smith', SSN='123456789', Address='731, Fondren, Houston, TX', Sex='M', BirthDate='09-JAN-55'.
Each attribute has a value set (or data type) associated with it, e.g., integer or string.

Types of Attributes (1)


Simple
Each entity has a single atomic value for the attribute. For example, SSN or Sex.

Composite
The attribute may be composed of several components. For example, Address (Apt#, House#, Street, City, State, ZipCode, Country) or Name (FirstName, MiddleName, LastName). Composition may form a hierarchy where some components are themselves composite.

Multi-valued
An entity may have multiple values for that attribute. For example, Color of a CAR or Previous Degrees of a STUDENT. Denoted as {Color} or {Previous Degrees}.


Types of Attributes (2)


In general, composite and multi-valued attributes may be nested arbitrarily to any number of levels although this is rare. For example, Previous Degrees of a STUDENT is a composite multi-valued attribute denoted by {Previous Degrees (College, Year, Degree, Field)}.
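As a rough sketch, nesting of composite and multi-valued attributes can be mirrored with Python dataclasses. The class and field names below are chosen for this example (field_of_study stands in for Field), and the sample values are invented.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Name:
    """Composite attribute Name (FirstName, MiddleName, LastName)."""
    first: str
    middle: str
    last: str


@dataclass(frozen=True)
class PreviousDegree:
    """One value of the composite multi-valued attribute Previous Degrees."""
    college: str
    year: int
    degree: str
    field_of_study: str


@dataclass
class Student:
    ssn: str    # simple attribute: a single atomic value
    name: Name  # composite attribute
    # Multi-valued attribute: zero or more composite values per entity.
    previous_degrees: list[PreviousDegree] = field(default_factory=list)


s = Student(ssn="123456789", name=Name("John", "B", "Smith"))
s.previous_degrees.append(PreviousDegree("Example College", 1999, "BSc", "Physics"))
s.previous_degrees.append(
    PreviousDegree("Example University", 2001, "MSc", "Computer Science")
)
print(s.name.last, len(s.previous_degrees))
```

The list holds the multiple values, and each element is itself a composite value, matching {Previous Degrees (College, Year, Degree, Field)}.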


Entity Types and Key Attributes


Entities with the same basic attributes are grouped or typed into an entity type. For example, the EMPLOYEE entity type or the PROJECT entity type. An attribute of an entity type for which each entity must have a unique value is called a key attribute of the entity type. For example, SSN of EMPLOYEE. A key attribute may be composite. For example, VehicleTagNumber is a key of the CAR entity type with components (Number, State). An entity type may have more than one key. For example, the CAR entity type may have two keys:
VehicleIdentificationNumber (popularly called VIN) and VehicleTagNumber (Number, State), also known as license_plate number.

ENTITY SET corresponding to the ENTITY TYPE CAR


CAR
Registration(RegistrationNumber, State), VehicleID, Make, Model, Year, {Color}

car1: ((ABC 123, TEXAS), TK629, Ford Mustang, convertible, 1999, {red, black})
car2: ((ABC 123, NEW YORK), WP9872, Nissan 300ZX, 2-door, 2002, {blue})
car3: ((VSY 720, TEXAS), TD729, Buick LeSabre, 4-door, 2003, {white, blue})

. . .

SUMMARY OF ER-DIAGRAM NOTATION FOR ER SCHEMAS
[Figure: table of ER diagram symbols and their meanings. The symbols shown stand for: entity type, weak entity type, relationship type, identifying relationship type, attribute, key attribute, multivalued attribute, composite attribute, derived attribute, total participation of E2 in R, cardinality ratio 1:N for E1:E2 in R, and structural constraint (min, max) on participation of E in R.]

ER DIAGRAM OF COMPANY DATABASE


Entity-type: EMPLOYEE, DEPARTMENT, PROJECT, DEPENDENT


Relationships and Relationship Type (1)


A relationship relates two or more distinct entities with a specific meaning. For example, EMPLOYEE John Smith works on the ProductX PROJECT or EMPLOYEE Franklin Wong manages the Research DEPARTMENT. Relationships of the same type are grouped or typed into a relationship type. For example, the WORKS_ON relationship type in which EMPLOYEEs and PROJECTs participate, or the MANAGES relationship type in which EMPLOYEEs and DEPARTMENTs participate. The degree of a relationship type is the number of participating entity types. Both MANAGES and WORKS_ON are binary relationships.


Example relationship instances of the WORKS_FOR relationship between EMPLOYEE and DEPARTMENT
[Figure: relationship instances r1 to r7 of WORKS_FOR, each linking one of the EMPLOYEE entities e1 to e7 to one of the DEPARTMENT entities d1 to d3.]

Example relationship instances of the WORKS_ON relationship between EMPLOYEE and PROJECT
[Figure: relationship instances r1 to r9 of WORKS_ON, each linking one of the EMPLOYEE entities e1 to e7 to one of the PROJECT entities p1 to p3.]

Relationships and Relationship Type (2)


More than one relationship type can exist with the same participating entity types. For example, MANAGES and WORKS_FOR are distinct relationships between EMPLOYEE and DEPARTMENT, but with different meanings and different relationship instances.


ER-DIAGRAM
Relationship types: WORKS_FOR, MANAGES, WORKS_ON, CONTROLS, SUPERVISION, DEPENDENTS_OF

Weak Entity Types


A weak entity type is an entity type that does not have a key attribute of its own.
A weak entity must participate in an identifying relationship type with an owner or identifying entity type.
Entities are identified by the combination of:
A partial key of the weak entity type
The particular entity they are related to in the identifying entity type
Example: Suppose that a DEPENDENT entity is identified by the dependent's first name and birthdate, and the specific EMPLOYEE that the dependent is related to. DEPENDENT is a weak entity type with EMPLOYEE as its identifying entity type via the identifying relationship type DEPENDENTS_OF.
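One common way to carry a weak entity type into tables, sketched here with sqlite3 under assumed table and column names, is to make the owner's key part of the weak entity's primary key and to reference the owner with a foreign key. The sample rows are invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite enforces foreign keys only when this pragma is switched on.
conn.execute("PRAGMA foreign_keys = ON")

conn.execute("CREATE TABLE employee (ssn TEXT PRIMARY KEY NOT NULL, name TEXT)")
# DEPENDENT has no key of its own: the owner's key (essn) plus the
# partial key (first_name, birthdate) together identify a row.
conn.execute(
    "CREATE TABLE dependent ("
    " essn TEXT NOT NULL REFERENCES employee(ssn) ON DELETE CASCADE,"
    " first_name TEXT NOT NULL,"
    " birthdate TEXT NOT NULL,"
    " relationship TEXT,"
    " PRIMARY KEY (essn, first_name, birthdate))"
)

conn.execute("INSERT INTO employee VALUES ('123456789', 'John Smith')")
conn.execute("INSERT INTO employee VALUES ('333445555', 'Franklin Wong')")

# The same partial key may occur under two different owners.
conn.execute(
    "INSERT INTO dependent VALUES ('123456789', 'Alice', '1988-12-30', 'Daughter')"
)
conn.execute(
    "INSERT INTO dependent VALUES ('333445555', 'Alice', '1988-12-30', 'Daughter')"
)

# A dependent without an existing owner is rejected.
try:
    conn.execute(
        "INSERT INTO dependent VALUES ('999999999', 'Bob', '1990-01-01', 'Son')"
    )
    orphan_rejected = False
except sqlite3.IntegrityError:
    orphan_rejected = True

dependent_count = conn.execute("SELECT COUNT(*) FROM dependent").fetchone()[0]
print(orphan_rejected, dependent_count)
```

The two rows named Alice coexist because their owners differ, which is exactly what the partial key allows.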

Weak Entity Type is: DEPENDENT Identifying Relationship is: DEPENDENTS_OF


Constraints on Relationships
Constraints on Relationship Types (also known as ratio constraints)
Maximum Cardinality
One-to-one (1:1)
One-to-many (1:N) or Many-to-one (N:1)
Many-to-many (M:N)

Minimum Cardinality (also called participation constraint or existence dependency constraint)
zero (optional participation, not existence-dependent)
one or more (mandatory, existence-dependent)

Many-to-one (N:1) RELATIONSHIP
[Figure: relationship instances r1 to r7 of WORKS_FOR between the EMPLOYEE entities e1 to e7 and the DEPARTMENT entities d1 to d3.]

Many-to-many (M:N) RELATIONSHIP
[Figure: relationship instances r1 to r9 of WORKS_ON between the EMPLOYEE entities e1 to e7 and the PROJECT entities p1 to p3.]

Relationships and Relationship Types (3)


We can also have a recursive relationship type, in which the same entity type participates twice, in different roles. For example, SUPERVISION relationships between EMPLOYEE (in the role of supervisor or boss) and (another) EMPLOYEE (in the role of subordinate or worker). In the following figure, the first role participation is labeled 1 and the second role participation is labeled 2. In an ER diagram, role names must be displayed to distinguish the participations.

A RECURSIVE RELATIONSHIP SUPERVISION
[Figure: relationship instances r1 to r6 of SUPERVISION, each linking two of the EMPLOYEE entities e1 to e7; each link is labeled 1 or 2 to show which role the employee plays.]

Recursive Relationship Type is: SUPERVISION (participation role names are shown)


Attributes of Relationship types A relationship type can have attributes; for example, HoursPerWeek of WORKS_ON; its value for each relationship instance describes the number of hours per week that an EMPLOYEE works on a PROJECT.


Attribute of a Relationship Type is: Hours of WORKS_ON


Structural Constraints: one way to express semantics of relationships

Structural constraints on relationships:
Cardinality ratio (of a binary relationship): 1:1, 1:N, N:1, or M:N. Shown by placing the appropriate number on the link.
Participation constraint (on each participating entity type): total (called existence dependency) or partial. Total participation is shown by double-lining the link.

NOTE: These are easy to specify for binary relationship types.

Alternative (min, max) notation for relationship structural constraints:


Specified on each participation of an entity type E in a relationship type R.
Specifies that each entity e in E participates in at least min and at most max relationship instances in R.
Default (no constraint): min = 0, max = n.
Must have $\mathit{min} \le \mathit{max}$, $\mathit{min} \ge 0$, $\mathit{max} \ge 1$.
Derived from the knowledge of mini-world constraints.

Examples:
A department has exactly one manager and an employee can manage at most one department.
Specify (0,1) for participation of EMPLOYEE in MANAGES.
Specify (1,1) for participation of DEPARTMENT in MANAGES.
An employee can work for exactly one department but a department can have any number of employees.
Specify (1,1) for participation of EMPLOYEE in WORKS_FOR.
Specify (0,n) for participation of DEPARTMENT in WORKS_FOR.
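A (min, max) constraint can be checked against a set of relationship instances by counting how often each entity participates. The helper below is a hypothetical sketch; the function name, the dictionary representation of relationship instances, and the sample data are assumptions.

```python
from collections import Counter


def check_participation(entities, relationship_instances, role, min_count, max_count=None):
    """Return the entities whose participation count violates (min, max).

    relationship_instances is a list of dicts mapping a role name to an
    entity; max_count=None stands for the unbounded 'n'.
    """
    counts = Counter(instance[role] for instance in relationship_instances)
    violations = []
    for entity in entities:
        count = counts[entity]  # Counter yields 0 for absent entities
        too_few = count < min_count
        too_many = max_count is not None and count > max_count
        if too_few or too_many:
            violations.append(entity)
    return violations


employees = ["e1", "e2", "e3"]
departments = ["d1", "d2"]
# Only d1 has a manager in this sample state.
manages = [{"employee": "e1", "department": "d1"}]

# (0,1) for EMPLOYEE in MANAGES and (1,1) for DEPARTMENT in MANAGES.
emp_violations = check_participation(employees, manages, "employee", 0, 1)
dept_violations = check_participation(departments, manages, "department", 1, 1)
print(emp_violations, dept_violations)
```

Here no employee violates (0,1), while department d2 violates (1,1) because it has no manager.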

The (min,max) notation relationship constraints
[Figure: two ER fragments annotated with (min,max) pairs. In Manages, EMPLOYEE participates with (0,1) and DEPARTMENT with (1,1). In Works_for, EMPLOYEE participates with (1,1) and DEPARTMENT with (1,N).]

COMPANY ER Schema Diagram using (min, max) notation


Relationships of Higher Degree

 

Relationship types of degree 2 are called binary.
Relationship types of degree 3 are called ternary, and those of degree n are called n-ary.
In general, an n-ary relationship is not equivalent to n binary relationships.
Higher-order relationships are discussed further in Chapter 4.

Data Modeling Tools


A number of popular tools cover conceptual modeling and mapping into relational schema design. Examples: ERwin, S-Designer (Enterprise Application Suite), ER-Studio, etc.

POSITIVES: serve as documentation of application requirements; easy user interface, mostly with graphics editor support.

Problems with Current Modeling Tools


DIAGRAMMING
Poor conceptually meaningful notation. To avoid the problem of layout algorithms and aesthetics of diagrams, the tools prefer boxes and lines and do nothing more than represent (primary-foreign key) relationships among the resulting tables (with a few exceptions).

METHODOLOGY
Lack of built-in methodology support.
Poor tradeoff analysis or user-driven design preferences.
Poor design verification and suggestions for improvement.

Some of the Currently Available Automated Database Design Tools


COMPANY | TOOL | FUNCTIONALITY
Embarcadero Technologies | ER Studio | Database modeling in ER and IDEF1X
Embarcadero Technologies | DB Artisan | Database administration and space and security management
Oracle | Developer 2000 and Designer 2000 | Database modeling, application development
Popkin Software | System Architect 2001 | Data modeling, object modeling, process modeling, structured analysis/design
Platinum Technology | Platinum Enterprise Modeling Suite: Erwin, BPWin, Paradigm Plus | Data, process, and business component modeling
Persistence Inc. | Powertier | Mapping from O-O to relational model
Rational | Rational Rose | Modeling in UML and application generation in C++ and JAVA
Rogue Ware | RW Metro | Mapping from O-O to relational model
Resolution Ltd. | Xcase | Conceptual modeling up to code maintenance
Sybase | Enterprise Application Suite | Data modeling, business logic modeling
Visio | Visio Enterprise | Data modeling, design and reengineering Visual Basic and Visual C++

ER DIAGRAM FOR A BANK DATABASE


PROBLEM with ER notation

THE ENTITY RELATIONSHIP MODEL IN ITS ORIGINAL FORM DID NOT SUPPORT THE SPECIALIZATION/ GENERALIZATION ABSTRACTIONS


Extended Entity-Relationship (EER) Model


Incorporates set-subset relationships
Incorporates specialization/generalization hierarchies
The next chapter illustrates how the ER model can be extended with set-subset relationships and specialization/generalization hierarchies, and how to display them in EER diagrams.

The Relational Data Model and Relational Database Constraints


Relational Model Concepts


 The relational Model of Data is based on the concept of a Relation.  A Relation is a mathematical concept based on the ideas of sets.  The strength of the relational approach to data management comes from the formal foundation provided by the theory of relations.


INFORMAL DEFINITIONS
 RELATION: A table of values
 A relation may be thought of as a set of rows.  A relation may alternately be thought of as a set of columns.  Each row represents a fact that corresponds to a real-world entity or relationship.  Each row has a value of an item or set of items that uniquely identifies that row in the table.  Each column typically is called by its column name or column header or attribute name.

FORMAL DEFINITIONS
 A Relation may be defined in multiple ways.  The Schema of a Relation: R (A1, A2, .....An)  Relation schema R is defined over attributes A1, A2, .....An For Example CUSTOMER (Cust-id, Cust-name, Address, Phone#)  Here, CUSTOMER is a relation defined over the four attributes Cust-id, Cust-name, Address, Phone#, each of which has a domain or a set of valid values. For example, the domain of Cust-id is 6 digit numbers.

FORMAL DEFINITIONS
 A tuple is an ordered set of values  Each value is derived from an appropriate domain.  Each row in the CUSTOMER table may be referred to as a tuple in the table and would consist of four values.
 <632895, "John Smith", "101 Main St. Atlanta, GA 30332", "(404) 894-2000">

is a tuple belonging to the CUSTOMER relation.  A relation may be regarded as a set of tuples (rows).  Columns in a table are also called attributes of the relation.


FORMAL DEFINITIONS
A domain has a logical definition: e.g., USA_phone_numbers is the set of 10-digit phone numbers valid in the U.S.
A domain may have a data type or a format defined for it. USA_phone_numbers may have the format (ddd)-ddd-dddd, where each d is a decimal digit. Dates have various formats, such as month name, date, year; yyyy-mm-dd; or dd mm, yyyy.
An attribute designates the role played by the domain. E.g., the domain Date may be used to define the attributes Invoice-date and Payment-date.

FORMAL DEFINITIONS
The relation is formed over the Cartesian product of the sets; each set has values from a domain; that domain is used in a specific role which is conveyed by the attribute name.
For example, attribute Cust-name is defined over the domain of strings of 25 characters. The role these strings play in the CUSTOMER relation is that of the name of customers.
Formally, given $R(A_1, A_2, \ldots, A_n)$:
$r(R) \subseteq \mathrm{dom}(A_1) \times \mathrm{dom}(A_2) \times \cdots \times \mathrm{dom}(A_n)$
R: schema of the relation.
r of R: a specific "value" or population of R.
R is also called the intension of a relation.
r is also called the extension of a relation.

FORMAL DEFINITIONS
Let $S_1 = \{0, 1\}$ and $S_2 = \{a, b, c\}$.
Let $R \subseteq S_1 \times S_2$.
Then, for example, $r(R) = \{\langle 0,a \rangle, \langle 0,b \rangle, \langle 1,c \rangle\}$ is one possible state (or population, or extension) r of the relation R, defined over the domains $S_1$ and $S_2$. It has three tuples.
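The statement that a relation state is a subset of the Cartesian product of its domains can be checked directly. A small Python sketch using the sets S1 and S2 and the three-tuple state from the example:

```python
from itertools import product

S1 = {0, 1}
S2 = {"a", "b", "c"}

# The Cartesian product S1 x S2 contains every possible tuple.
cartesian = set(product(S1, S2))

# One possible state r(R): the three tuples from the example.
r = {(0, "a"), (0, "b"), (1, "c")}

# A valid state is any subset of the Cartesian product.
is_valid_state = r <= cartesian
print(len(cartesian), is_valid_state)
```

Any of the subsets of the six-element product would be a legal state; the example picks one with three tuples.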

DEFINITION SUMMARY
Informal Terms | Formal Terms
Table | Relation
Column | Attribute/Domain
Row | Tuple
Values in a column | Domain
Table Definition | Schema of a Relation
Populated Table | Extension

Example


CHARACTERISTICS OF RELATIONS
 Ordering of tuples in a relation r(R): The tuples are not considered to be ordered, even though they appear to be in the tabular form.  Ordering of attributes in a relation schema R (and of values within each tuple): We will consider the attributes in R(A1, A2, ..., An) and the values in t=<v1, v2, ..., vn> to be ordered .  (However, a more general alternative definition of relation does not require this ordering).  Values in a tuple: All values are considered atomic (indivisible). A special null value is used to represent values that are unknown or inapplicable to certain tuples.


CHARACTERISTICS OF RELATIONS

 Notation: We refer to component values of a tuple t by t[Ai] = vi (the value of attribute Ai for tuple t). Similarly, t[Au, Av, ..., Aw] refers to the subtuple of t containing the values of attributes Au, Av, ..., Aw, respectively.

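The component notation t[Ai] and the subtuple notation t[Au, Av, ..., Aw] can be sketched in Python using the CUSTOMER tuple given earlier. The helper names component and subtuple are hypothetical, introduced only for this illustration.

```python
# Attribute order follows the schema CUSTOMER (Cust-id, Cust-name, Address, Phone#).
schema = ("Cust-id", "Cust-name", "Address", "Phone#")
t = (632895, "John Smith", "101 Main St. Atlanta, GA 30332", "(404) 894-2000")


def component(t, schema, attr):
    """t[Ai]: the value of attribute attr in tuple t."""
    return t[schema.index(attr)]


def subtuple(t, schema, attrs):
    """t[Au, Av, ..., Aw]: the values of the listed attributes, in that order."""
    return tuple(component(t, schema, a) for a in attrs)


name = component(t, schema, "Cust-name")
id_and_phone = subtuple(t, schema, ["Cust-id", "Phone#"])
print(name, id_and_phone)
```

Looking values up by attribute name rather than by position also reflects the alternative definition in which attribute order does not matter.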

CHARACTERISTICS OF RELATIONS


Relational Integrity Constraints
Constraints are conditions that must hold on all valid relation instances. There are three main types of constraints:
Key constraints
Entity integrity constraints
Referential integrity constraints

Key Constraints
A key is an attribute or collection of attributes that may be used to identify or retrieve one or more records. There are many types of keys in an RDBMS:
Candidate Key
Primary Key
Foreign Key
Super Key
Alternate Key
Compound Key
Secondary Key

Candidate Key
A candidate key is any set of one or more columns whose combined values are unique among all the tuples, so it can be used to uniquely identify a record. Any candidate key can be selected as the primary key. An attribute or set of attributes is a candidate key of the relation if it satisfies both of the following:
- Uniqueness: the attribute or set of attributes uniquely identifies each tuple in the relation.
- Minimality: if the key is a set of attributes, no proper subset of those attributes has the uniqueness property.

Primary Key
The primary key of a relational table uniquely identifies each record in the table. It is a unique identifier, such as a driver's license number, telephone number (including area code), or vehicle identification number (VIN). A PRIMARY KEY constraint cannot accept null values. Because PRIMARY KEY constraints guarantee unique data, they are frequently defined on an identity column. Primary keys for some of the sample tables are:
- Employee table: EMPNO
- Department table: DEPTNO
- Project table: PROJNO

A Primary Key on the PROJECT table
PROJNO (Primary Key)   PROJNAME             DEPTNO
MA2100                 Robotics             D01
MA2110                 Linear Programming   D11

A Composite Primary Key on the EMP_ACT table
EMPNO (Primary Key)   PROJNO (Primary Key)   ACTNO (Primary Key)   EMPTIME   EMSTDATE (Primary Key)
000250                AD3112                 60                    1.0       2010-01-01
000250                AD3112                 60                    .5        2010-02-01
000250                AD3112                 70                    .5        2010-04-01

EMPLOYEE Table
EMPNO (Primary Key)   FIRSTNAME   LASTNAME    WORKDEPT (Foreign Key)   PHONENO
000010                Christine   Haas        A00                      3978
000030                Sally       Kwan        C01                      4738
000060                Irving      Stern       D11                      6423
000120                Sean        O'Connell   A00                      2167
000140                Heather     Nicholls    C01                      1793
000170                Masatoshi   Yoshimura   D11                      2890

Primary Key (cont.)
The criteria for selecting a primary key from a pool of candidate keys should be persistence, uniqueness, and stability:
- Persistence means that a primary key value for each row always exists.
- Uniqueness means that the key value for each row is different from all the others.
- Stability means that primary key values never change.
Of the three candidate keys in the example (employee number, phone number, and last name), only EMPNO satisfies all of these criteria. An employee may not have a phone number when joining a company. Last names can change and, although they may be unique at one point, are not guaranteed to remain so. The employee number column is the best choice for the primary key: an employee is assigned a unique number only once, and that number is generally not updated as long as the employee remains with the company. Since each employee must have a number, values in the employee number column are persistent.

Foreign Keys
A foreign key is a reference to a key in another relation: the referencing tuple has, as one of its attributes, the value of a key in the referenced tuple. Foreign keys need not have unique values in the referencing relation. A foreign key is a referential constraint between two tables. It identifies a column (or columns) in one (referencing) table that refers to a column (or columns) in another (referenced) table. The referenced column(s) must be the primary key or another candidate key of the referenced table.

Super Key
A super key is a column or set of columns that uniquely identifies a row within a table; that is, a combination of attributes that can be used to uniquely identify a database record. For example, given the table EMPLOYEES{e_id, fname, lname, salary}, possible super keys are {e_id}, {e_id, fname}, ..., {e_id, fname, lname, salary}. Here, only the minimal super key {e_id} is considered a candidate key.
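
The difference between a super key and a candidate key can be checked mechanically on a small table. The following Python sketch is only an illustration: the sample rows and the function names is_superkey and candidate_keys are hypothetical, and the check runs against one relation state, whereas a real key is a property of the schema that must hold for every valid state.

```python
from itertools import combinations


def is_superkey(rows, attrs):
    """True if no two rows agree on every attribute in attrs (this state only)."""
    seen = set()
    for row in rows:
        key = tuple(row[a] for a in attrs)
        if key in seen:
            return False
        seen.add(key)
    return True


def candidate_keys(rows, all_attrs):
    """Minimal super keys of this particular state, smallest first."""
    found = []
    for size in range(1, len(all_attrs) + 1):
        for attrs in combinations(all_attrs, size):
            # A superset of a key already found is not minimal, so skip it.
            if any(set(k) <= set(attrs) for k in found):
                continue
            if is_superkey(rows, attrs):
                found.append(attrs)
    return found


# Hypothetical sample rows for EMPLOYEES{e_id, fname, lname, salary}.
employees = [
    {"e_id": 1, "fname": "Ann", "lname": "Lee", "salary": 500},
    {"e_id": 2, "fname": "Ann", "lname": "Ray", "salary": 500},
    {"e_id": 3, "fname": "Bob", "lname": "Lee", "salary": 700},
]
```

With these rows, {e_id, fname} passes is_superkey but is not returned by candidate_keys, because its subset {e_id} is already unique. The function may also report combinations such as {fname, lname} that happen to be unique in this small sample; such accidental uniqueness is why keys are declared by the designer rather than inferred from data.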


Another example, in Branch_Schema:
- {bname} is a super key.
- {bname, bcity} is a super key.
- {bname, bcity} is not a candidate key, because it contains the super key {bname}.
- {bname} is a candidate key.
- {bcity} is not a super key, because several branches may be in the same city.
- {bname} is chosen as the primary key.

Alternate Key: An alternate key is any candidate key that is not selected to be the primary key.

Compound Key: A compound key (also called a composite or concatenated key) is a key that consists of two or more attributes.

Secondary Key: A secondary key is any column or set of columns used for searching records in the database. For example, in a STUDENT table (RollNo, Name, DOB, City), RollNo is the primary key and City is a secondary key.

Integrity Constraints
The aim of data integrity is to specify rules that implicitly or explicitly define a consistent database state. The integrity of an RDBMS is based on certain rules proposed by E. F. Codd. Codd's rules are:

Codd's Rules
Rule 0: The system must qualify as relational, as a database, and as a management system. For a system to qualify as a relational database management system (RDBMS), it must use its relational facilities (exclusively) to manage the database.
Rule 1: The information rule: All information in the database is to be represented in one and only one way, namely by values in column positions within rows of tables.
Rule 2: The guaranteed access rule: All data must be accessible. This rule is essentially a restatement of the fundamental requirement for primary keys. It says that every individual scalar value in the database must be logically addressable by specifying the name of the containing table, the name of the containing column, and the primary key value of the containing row.

Codd's Rules (cont.)
Rule 3: Systematic treatment of null values: The DBMS must allow each field to remain null (or empty). Specifically, it must support a representation of "missing information and inapplicable information" that is systematic, distinct from all regular values (for example, "distinct from zero or any other number" in the case of numeric values), and independent of data type. It is also implied that such representations must be manipulated by the DBMS in a systematic way.
Rule 4: Active online catalog based on the relational model: The database description is represented at the logical level in the same way as ordinary data, so that authorized users can apply the same relational language to its interrogation as they apply to the regular data.

Codd's Rules (cont.)
Rule 5: The comprehensive data sub-language rule: The system must support at least one relational language that
1. has a linear syntax,
2. can be used both interactively and within application programs, and
3. supports data definition operations (including view definitions), data manipulation operations (update as well as retrieval), security and integrity constraints, and transaction management operations (begin, commit, and rollback).
Rule 6: The view updating rule: All views that are theoretically updatable must be updatable by the system.

Codd's Rules (cont.)
Rule 7: High-level insert, update, and delete: The system must support set-at-a-time insert, update, and delete operators. Just as data can be retrieved from a relational database in sets constructed from multiple rows and/or multiple tables, insert, update, and delete operations should be supported for any retrievable set rather than just for a single row in a single table.
Rule 8: Physical data independence: Changes to the physical level (how the data is stored, whether in arrays or linked lists, etc.) must not require a change to the conceptual level or the external level.
Rule 9: Logical data independence: Changes to the logical level (tables, columns, rows, and so on) must not require a change to an application based on the structure. Logical data independence is more difficult to achieve than physical data independence.

Codd's Rules (cont.)
Rule 10: Integrity independence: Integrity constraints must be specified separately from application programs and stored in the catalog. It must be possible to change such constraints as and when appropriate without unnecessarily affecting existing applications.
Rule 11: Distribution independence: The distribution of portions of the database to various locations should be invisible to users of the database. Existing applications should continue to operate successfully:
1. when a distributed version of the DBMS is first introduced; and
2. when existing distributed data are redistributed around the system.

Codd's Rules (cont.)
Rule 12: The non-subversion rule: If a relational system has a low-level (single-record-at-a-time) language, that low-level language cannot be used to subvert or bypass the integrity rules and constraints expressed in the higher-level relational language (multiple-records-at-a-time).

Integrity Constraints
Integrity constraints are necessary to avoid situations like the following:
- Some data has been inserted in the database but cannot be identified; that is, it is not clear which object or entity the data is about.
- A student is enrolled in a course, but no data about that student is available in the relation that holds information about students.
- During query processing, a student is compared with a course number.
- A student quits the university and is removed from the student relation but is still enrolled in a course.

Constraints are not formally part of the relational model, but they are treated together with it because of the integrity role they play in organizing data.

Integrity Rules
The following two integrity rules must be satisfied by any relation:
- Entity Integrity: A primary key cannot be null.
- Referential Integrity: The database must not contain any unmatched foreign key values.

Entity Integrity
- Relational Database Schema: A set S of relation schemas that belong to the same database. S is the name of the database. S = {R1, R2, ..., Rn}
- Entity Integrity: The primary key attributes PK of each relation schema R in S cannot have null values in any tuple of r(R), because primary key values are used to identify the individual tuples: t[PK] ≠ null for any tuple t in r(R).
- Note: Other attributes of R may be similarly constrained to disallow null values, even though they are not members of the primary key.

Referential Integrity
- A constraint involving two relations (the previous constraints involve a single relation).
- Used to specify a relationship among tuples in two relations: the referencing relation and the referenced relation.
- Tuples in the referencing relation R1 have attributes FK (called foreign key attributes) that reference the primary key attributes PK of the referenced relation R2. A tuple t1 in R1 is said to reference a tuple t2 in R2 if t1[FK] = t2[PK].
- A referential integrity constraint can be displayed in a relational database schema as a directed arc from R1.FK to R2.

Referential Integrity Constraint
Statement of the constraint: the value in the foreign key column (or columns) FK of the referencing relation R1 can be either:
(1) a value of the corresponding primary key PK that exists in the referenced relation R2, or
(2) null.
In case (2), the FK in R1 should not be part of its own primary key.
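
Both integrity rules can be observed by letting a DBMS reject bad rows. The sketch below uses Python's built-in sqlite3 module as a stand-in DBMS; the two-table schema, the sample values, and the function name integrity_demo are assumptions made for illustration. Two SQLite-specific details are also assumed: foreign keys are only enforced after PRAGMA foreign_keys = ON, and NOT NULL is declared explicitly on the primary key.

```python
import sqlite3


def integrity_demo():
    """Try four inserts into EMP and report which ones are accepted."""
    con = sqlite3.connect(":memory:")
    # SQLite enforces foreign keys only when this pragma is switched on.
    con.execute("PRAGMA foreign_keys = ON")
    con.execute(
        "CREATE TABLE DEPT (DNUMBER INTEGER NOT NULL PRIMARY KEY, DNAME TEXT)"
    )
    con.execute(
        "CREATE TABLE EMP ("
        " ESSN TEXT NOT NULL PRIMARY KEY,"
        " DNO INTEGER REFERENCES DEPT(DNUMBER))"
    )
    con.execute("INSERT INTO DEPT VALUES (5, 'Research')")

    attempts = {
        "valid row": ("E1", 5),               # FK matches an existing PK
        "null primary key": (None, 5),        # violates entity integrity
        "unmatched foreign key": ("E2", 99),  # violates referential integrity
        "null foreign key": ("E3", None),     # allowed: case (2) above
    }
    results = {}
    for label, row in attempts.items():
        try:
            con.execute("INSERT INTO EMP VALUES (?, ?)", row)
            results[label] = True
        except sqlite3.IntegrityError:
            results[label] = False
    con.close()
    return results
```

Calling integrity_demo() returns a dictionary mapping each attempt to whether the insert was accepted; only the null primary key and the unmatched foreign key are rejected.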

[Figures 5.5, 5.6, and 5.7]

In-Class Exercise
Consider the following relations for a database that keeps track of student enrollment in courses and the books adopted for each course:
    STUDENT(SSN, Name, Major, Bdate)
    COURSE(Course#, Cname, Dept)
    ENROLL(SSN, Course#, Quarter, Grade)
    BOOK_ADOPTION(Course#, Quarter, Book_ISBN)
    TEXT(Book_ISBN, Book_Title, Publisher, Author)
Draw a relational schema diagram specifying the foreign keys for this schema.

Introduction to SQL
What is SQL?
When a user wants to get some information from a database file, he or she can issue a query. A query is a user request to retrieve data or information that satisfies a certain condition. SQL is a query language that allows the user to specify the conditions the result must satisfy, instead of the algorithm for finding it.

Introduction to SQL (cont.)
Concept of SQL
- The user specifies a certain condition.
- The program goes through all the records in the database file and selects those records that satisfy the condition (searching).
- Statistical information about the data can also be computed.
- The result of the query is then stored in the form of a table.

Basic structure of an SQL query
General Structure    SELECT, ALL / DISTINCT, *, AS, FROM, WHERE
Comparison           IN, BETWEEN, LIKE "% _"
Grouping             GROUP BY, HAVING, COUNT( ), SUM( ), AVG( ), MAX( ), MIN( )
Display Order        ORDER BY, ASC / DESC
Logical Operators    AND, OR, NOT
Output               INTO TABLE / CURSOR, TO FILE [ADDITIVE], TO PRINTER, TO SCREEN
Union                UNION

The Situation: The Particulars
field        type        width   contents
id           numeric     4       student id number
name         character   10      name
dob          date        8       date of birth
sex          character   1       sex: M / F
class        character   2       class
hcode        character   1       house code: R, Y, B, G
dcode        character   3       district code
remission    logical     1       fee remission
mtest        numeric     2       Math test score

SQL-99: Schema Definition, Basic Constraints, and Queries


Data Definition, Constraints, and Schema Changes
Used to CREATE, DROP, and ALTER the descriptions of the tables (relations) of a database.

CREATE TABLE
- Specifies a new base relation by giving it a name and specifying each of its attributes and their data types (INTEGER, FLOAT, DECIMAL(i,j), CHAR(n), VARCHAR(n)).
- A NOT NULL constraint may be specified on an attribute.
    CREATE TABLE DEPARTMENT (
        DNAME         VARCHAR(10)  NOT NULL,
        DNUMBER       INTEGER      NOT NULL,
        MGRSSN        CHAR(9),
        MGRSTARTDATE  CHAR(9) );

CREATE TABLE (cont.)
- In SQL2, the CREATE TABLE command can be used to specify the primary key attributes, secondary keys, and referential integrity constraints (foreign keys).
- Key attributes can be specified via the PRIMARY KEY and UNIQUE phrases.
    CREATE TABLE DEPT (
        DNAME         VARCHAR(10)  NOT NULL,
        DNUMBER       INTEGER      NOT NULL,
        MGRSSN        CHAR(9),
        MGRSTARTDATE  CHAR(9),
        PRIMARY KEY (DNUMBER),
        UNIQUE (DNAME),
        FOREIGN KEY (MGRSSN) REFERENCES EMP );

DROP TABLE
- Used to remove a relation (base table) and its definition.
- The relation can no longer be used in queries, updates, or any other commands, since its description no longer exists.
- Example:
    DROP TABLE DEPENDENT;

ALTER TABLE
- Used to add an attribute to one of the base relations.
- The new attribute will have NULLs in all the tuples of the relation right after the command is executed; hence, the NOT NULL constraint is not allowed for such an attribute.
- Example:
    ALTER TABLE EMPLOYEE ADD JOB VARCHAR(12);
- The database users must still enter a value for the new attribute JOB for each EMPLOYEE tuple. This can be done using the UPDATE command.

Features Added in SQL2 and SQL-99
- CREATE SCHEMA
- REFERENTIAL INTEGRITY OPTIONS

CREATE SCHEMA
Specifies a new database schema by giving it a name


REFERENTIAL INTEGRITY OPTIONS
We can specify RESTRICT, CASCADE, SET NULL, or SET DEFAULT on referential integrity constraints (foreign keys):
    CREATE TABLE DEPT (
        DNAME         VARCHAR(10)  NOT NULL,
        DNUMBER       INTEGER      NOT NULL,
        MGRSSN        CHAR(9),
        MGRSTARTDATE  CHAR(9),
        PRIMARY KEY (DNUMBER),
        UNIQUE (DNAME),
        FOREIGN KEY (MGRSSN) REFERENCES EMP
            ON DELETE SET DEFAULT ON UPDATE CASCADE );

REFERENTIAL INTEGRITY OPTIONS (continued)
    CREATE TABLE EMP (
        ENAME     VARCHAR(30)  NOT NULL,
        ESSN      CHAR(9),
        BDATE     DATE,
        DNO       INTEGER DEFAULT 1,
        SUPERSSN  CHAR(9),
        PRIMARY KEY (ESSN),
        FOREIGN KEY (DNO) REFERENCES DEPT
            ON DELETE SET DEFAULT ON UPDATE CASCADE,
        FOREIGN KEY (SUPERSSN) REFERENCES EMP
            ON DELETE SET NULL ON UPDATE CASCADE );
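
The effect of the ON DELETE SET NULL and ON UPDATE CASCADE options on the SUPERSSN foreign key can be tried out with Python's built-in sqlite3 module. This is a sketch under stated assumptions: the EMP table is cut down to three columns, the rows are invented, the function name referential_actions_demo is hypothetical, and SQLite needs PRAGMA foreign_keys = ON before it enforces the constraint.

```python
import sqlite3


def referential_actions_demo():
    """Show ON UPDATE CASCADE and ON DELETE SET NULL on a self-referencing key."""
    con = sqlite3.connect(":memory:")
    con.execute("PRAGMA foreign_keys = ON")  # SQLite-specific switch
    con.execute(
        "CREATE TABLE EMP ("
        " ENAME VARCHAR(30) NOT NULL,"
        " ESSN CHAR(9) NOT NULL,"
        " SUPERSSN CHAR(9),"
        " PRIMARY KEY (ESSN),"
        " FOREIGN KEY (SUPERSSN) REFERENCES EMP(ESSN)"
        "   ON DELETE SET NULL ON UPDATE CASCADE)"
    )
    con.executemany(
        "INSERT INTO EMP VALUES (?, ?, ?)",
        [("Boss", "111", None), ("Ann", "222", "111"), ("Bob", "333", "111")],
    )

    # ON UPDATE CASCADE: changing the supervisor's key rewrites the references.
    con.execute("UPDATE EMP SET ESSN = '999' WHERE ESSN = '111'")
    after_update = con.execute(
        "SELECT SUPERSSN FROM EMP WHERE ENAME = 'Ann'"
    ).fetchone()[0]

    # ON DELETE SET NULL: deleting the supervisor nulls out the references.
    con.execute("DELETE FROM EMP WHERE ESSN = '999'")
    after_delete = con.execute(
        "SELECT ENAME, SUPERSSN FROM EMP ORDER BY ENAME"
    ).fetchall()
    con.close()
    return after_update, after_delete
```

After the update, Ann's SUPERSSN follows the supervisor's new key; after the delete, both remaining employees are kept but their SUPERSSN becomes NULL instead of pointing at a row that no longer exists.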

Additional Data Types in SQL2 and SQL-99
SQL2 and SQL-99 have DATE, TIME, and TIMESTAMP data types:
- DATE: made up of year-month-day in the format yyyy-mm-dd.
- TIME: made up of hour:minute:second in the format hh:mm:ss.
- TIME(i): made up of hour:minute:second plus i additional digits specifying fractions of a second; the format is hh:mm:ss:ii...i.
- TIMESTAMP: has both DATE and TIME components.

Additional Data Types in SQL2 and SQL-99 (cont.)
INTERVAL:
- Specifies a relative value rather than an absolute value.
- Can be DAY/TIME intervals or YEAR/MONTH intervals.
- Can be positive or negative; when added to or subtracted from an absolute value, the result is an absolute value.

Retrieval Queries in SQL
- SQL has one basic statement for retrieving information from a database: the SELECT statement.
- This is not the same as the SELECT operation of the relational algebra.
- Important distinction between SQL and the formal relational model: SQL allows a table (relation) to have two or more tuples that are identical in all their attribute values.
- Hence, an SQL relation (table) is a multi-set (sometimes called a bag) of tuples; it is not a set of tuples.
- SQL relations can be constrained to be sets by specifying PRIMARY KEY or UNIQUE attributes, or by using the DISTINCT option in a query.

Retrieval Queries in SQL (cont.)
The basic form of the SQL SELECT statement is called a mapping or a SELECT-FROM-WHERE block:
    SELECT <attribute list>
    FROM   <table list>
    WHERE  <condition>
- <attribute list> is a list of attribute names whose values are to be retrieved by the query.
- <table list> is a list of the relation names required to process the query.
- <condition> is a conditional (Boolean) expression that identifies the tuples to be retrieved by the query.

Relational Database Schema--Figure 5.5


Populated Database


Simple SQL Queries
- Basic SQL queries correspond to using the SELECT, PROJECT, and JOIN operations of the relational algebra.
- All subsequent examples use the COMPANY database.
- Example of a simple query on one relation:
Query 0: Retrieve the birthdate and address of the employee whose name is 'John B. Smith'.
    Q0: SELECT BDATE, ADDRESS
        FROM EMPLOYEE
        WHERE FNAME='John' AND MINIT='B' AND LNAME='Smith'
- Similar to a SELECT-PROJECT pair of relational algebra operations; the SELECT-clause specifies the projection attributes and the WHERE-clause specifies the selection condition.
- However, the result of the query may contain duplicate tuples.

Simple SQL Queries (cont.)
Query 1: Retrieve the name and address of all employees who work for the 'Research' department.
    Q1: SELECT FNAME, LNAME, ADDRESS
        FROM EMPLOYEE, DEPARTMENT
        WHERE DNAME='Research' AND DNUMBER=DNO
- Similar to a SELECT-PROJECT-JOIN sequence of relational algebra operations.
- (DNAME='Research') is a selection condition (corresponds to a SELECT operation in relational algebra).
- (DNUMBER=DNO) is a join condition (corresponds to a JOIN operation in relational algebra).

Simple SQL Queries (cont.)
Query 2: For every project located in 'Stafford', list the project number, the controlling department number, and the department manager's last name, address, and birthdate.
    Q2: SELECT PNUMBER, DNUM, LNAME, BDATE, ADDRESS
        FROM PROJECT, DEPARTMENT, EMPLOYEE
        WHERE DNUM=DNUMBER AND MGRSSN=SSN AND PLOCATION='Stafford'
- In Q2, there are two join conditions.
- The join condition DNUM=DNUMBER relates a project to its controlling department.
- The join condition MGRSSN=SSN relates the controlling department to the employee who manages that department.

Aliases, * and DISTINCT, Empty WHERE-clause
- In SQL, we can use the same name for two (or more) attributes as long as the attributes are in different relations.
- A query that refers to two or more attributes with the same name must qualify the attribute name with the relation name, by prefixing the relation name to the attribute name.
- Example: EMPLOYEE.LNAME, DEPARTMENT.DNAME

ALIASES
- Some queries need to refer to the same relation twice. In this case, aliases are given to the relation name.
Query 8: For each employee, retrieve the employee's name and the name of his or her immediate supervisor.
    Q8: SELECT E.FNAME, E.LNAME, S.FNAME, S.LNAME
        FROM EMPLOYEE E, EMPLOYEE S
        WHERE E.SUPERSSN=S.SSN
- In Q8, the alternate relation names E and S are called aliases or tuple variables for the EMPLOYEE relation.
- We can think of E and S as two different copies of EMPLOYEE; E represents employees in the role of supervisees and S represents employees in the role of supervisors.

ALIASES (cont.)
- Aliasing can also be used in any SQL query for convenience.
- The AS keyword can also be used to specify aliases:
    Q8: SELECT E.FNAME, E.LNAME, S.FNAME, S.LNAME
        FROM EMPLOYEE AS E, EMPLOYEE AS S
        WHERE E.SUPERSSN=S.SSN

UNSPECIFIED WHERE-clause
- A missing WHERE-clause indicates no condition; hence, all tuples of the relations in the FROM-clause are selected.
- This is equivalent to the condition WHERE TRUE.
Query 9: Retrieve the SSN values for all employees.
    Q9: SELECT SSN
        FROM EMPLOYEE
- If more than one relation is specified in the FROM-clause and there is no join condition, then the CARTESIAN PRODUCT of tuples is selected.

UNSPECIFIED WHERE-clause (cont.)
Example:
    Q10: SELECT SSN, DNAME
         FROM EMPLOYEE, DEPARTMENT
It is extremely important not to overlook specifying any selection and join conditions in the WHERE-clause; otherwise, incorrect and very large relations may result.
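
The size difference between Q10 and a properly joined query is easy to see on a tiny database. The sketch below uses Python's built-in sqlite3 module; the three EMPLOYEE rows and two DEPARTMENT rows are illustrative sample data rather than the full populated COMPANY database, and the function name row_counts is hypothetical.

```python
import sqlite3


def row_counts():
    """Return (rows without a WHERE-clause, rows with the join condition)."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE DEPARTMENT (DNAME TEXT, DNUMBER INTEGER PRIMARY KEY)")
    con.execute("CREATE TABLE EMPLOYEE (SSN TEXT PRIMARY KEY, LNAME TEXT, DNO INTEGER)")
    con.executemany(
        "INSERT INTO DEPARTMENT VALUES (?, ?)",
        [("Research", 5), ("Administration", 4)],
    )
    con.executemany(
        "INSERT INTO EMPLOYEE VALUES (?, ?, ?)",
        [("123456789", "Smith", 5), ("333445555", "Wong", 5), ("987654321", "Wallace", 4)],
    )
    # Q10: no WHERE-clause, so every employee is paired with every department.
    no_where = con.execute("SELECT SSN, DNAME FROM EMPLOYEE, DEPARTMENT").fetchall()
    # Same query with the join condition: each employee gets one department.
    with_join = con.execute(
        "SELECT SSN, DNAME FROM EMPLOYEE, DEPARTMENT WHERE DNO = DNUMBER"
    ).fetchall()
    con.close()
    return len(no_where), len(with_join)
```

With 3 employees and 2 departments the unconditioned query returns 3 x 2 = 6 rows, against 3 rows once the join condition is present; on realistic table sizes the gap grows multiplicatively.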


USE OF *
To retrieve all the attribute values of the selected tuples, a * is used, which stands for all the attributes.
Examples:
    Q1C: SELECT *
         FROM EMPLOYEE
         WHERE DNO=5
    Q1D: SELECT *
         FROM EMPLOYEE, DEPARTMENT
         WHERE DNAME='Research' AND DNO=DNUMBER

USE OF DISTINCT
- SQL does not treat a relation as a set; duplicate tuples can appear.
- To eliminate duplicate tuples in a query result, the keyword DISTINCT is used.
- For example, the result of Q11 may have duplicate SALARY values, whereas Q11A does not have any duplicate values.
    Q11:  SELECT SALARY
          FROM EMPLOYEE
    Q11A: SELECT DISTINCT SALARY
          FROM EMPLOYEE

SET OPERATIONS
- SQL has directly incorporated some set operations.
- There is a union operation (UNION), and in some versions of SQL there are set difference (MINUS, called EXCEPT in the SQL standard) and intersection (INTERSECT) operations.
- The resulting relations of these set operations are sets of tuples; duplicate tuples are eliminated from the result.
- The set operations apply only to union-compatible relations; the two relations must have the same attributes, and the attributes must appear in the same order.

SET OPERATIONS (cont.)
Query 4: Make a list of all project names for projects that involve an employee whose last name is 'Smith', either as a worker or as a manager of the department that controls the project.
    Q4: (SELECT PNAME
         FROM PROJECT, DEPARTMENT, EMPLOYEE
         WHERE DNUM=DNUMBER AND MGRSSN=SSN AND LNAME='Smith')
        UNION
        (SELECT PNAME
         FROM PROJECT, WORKS_ON, EMPLOYEE
         WHERE PNUMBER=PNO AND ESSN=SSN AND LNAME='Smith')

NESTING OF QUERIES
- A complete SELECT query, called a nested query, can be specified within the WHERE-clause of another query, called the outer query.
- Many of the previous queries can be specified in an alternative form using nesting.
Query 1: Retrieve the name and address of all employees who work for the 'Research' department.
    Q1: SELECT FNAME, LNAME, ADDRESS
        FROM EMPLOYEE
        WHERE DNO IN (SELECT DNUMBER
                      FROM DEPARTMENT
                      WHERE DNAME='Research')

NESTING OF QUERIES (cont.)
- The nested query selects the number of the 'Research' department.
- The outer query selects an EMPLOYEE tuple if its DNO value is in the result of the nested query.
- The comparison operator IN compares a value v with a set (or multi-set) of values V, and evaluates to TRUE if v is one of the elements in V.
- In general, we can have several levels of nested queries.
- A reference to an unqualified attribute refers to the relation declared in the innermost nested query.
- In this example, the nested query is not correlated with the outer query.

CORRELATED NESTED QUERIES
- If a condition in the WHERE-clause of a nested query references an attribute of a relation declared in the outer query, the two queries are said to be correlated.
- The result of a correlated nested query is different for each tuple (or combination of tuples) of the relation(s) in the outer query.
Query 12: Retrieve the name of each employee who has a dependent with the same first name as the employee.
    Q12: SELECT E.FNAME, E.LNAME
         FROM EMPLOYEE AS E
         WHERE E.SSN IN (SELECT ESSN
                         FROM DEPENDENT
                         WHERE ESSN=E.SSN AND E.FNAME=DEPENDENT_NAME)
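
One way to picture correlation is to re-run the nested query once per outer tuple, substituting that tuple's values. The sketch below does this by hand in Python with the built-in sqlite3 module and compares the outcome with Q12 itself. The sample rows and the function names build_db, q12_sql, and q12_by_hand are assumptions for illustration; a real DBMS is free to evaluate the query in some other, more efficient way.

```python
import sqlite3


def build_db():
    """Create a tiny EMPLOYEE/DEPENDENT database with invented rows."""
    con = sqlite3.connect(":memory:")
    con.executescript("""
        CREATE TABLE EMPLOYEE (FNAME TEXT, LNAME TEXT, SSN TEXT PRIMARY KEY);
        CREATE TABLE DEPENDENT (ESSN TEXT, DEPENDENT_NAME TEXT);
        INSERT INTO EMPLOYEE VALUES ('John', 'Smith', '1'), ('Alicia', 'Zelaya', '2');
        INSERT INTO DEPENDENT VALUES ('1', 'John'), ('1', 'Mary'), ('2', 'John');
    """)
    return con


def q12_sql(con):
    """Q12 as written: the nested query refers to E from the outer query."""
    return con.execute("""
        SELECT E.FNAME, E.LNAME
        FROM EMPLOYEE AS E
        WHERE E.SSN IN (SELECT ESSN
                        FROM DEPENDENT
                        WHERE ESSN = E.SSN AND E.FNAME = DEPENDENT_NAME)
    """).fetchall()


def q12_by_hand(con):
    """Evaluate the nested query separately for each outer EMPLOYEE tuple."""
    selected = []
    outer = con.execute("SELECT FNAME, LNAME, SSN FROM EMPLOYEE").fetchall()
    for fname, lname, ssn in outer:
        inner = con.execute(
            "SELECT ESSN FROM DEPENDENT WHERE ESSN = ? AND DEPENDENT_NAME = ?",
            (ssn, fname),
        ).fetchall()
        if (ssn,) in inner:  # the IN test of the outer WHERE-clause
            selected.append((fname, lname))
    return selected
```

In this sample, Alicia Zelaya has a dependent named John, but that does not match her own first name, so only John Smith is selected by both versions.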

CORRELATED NESTED QUERIES (cont.)
- In Q12, the nested query has a different result for each tuple in the outer query.
- A query written with nested SELECT... FROM... WHERE... blocks and using the = or IN comparison operators can always be expressed as a single block query. For example, Q12 may be written as in Q12A:
    Q12A: SELECT E.FNAME, E.LNAME
          FROM EMPLOYEE E, DEPENDENT D
          WHERE E.SSN=D.ESSN AND E.FNAME=D.DEPENDENT_NAME
- The original SQL as specified for SYSTEM R also had a CONTAINS comparison operator, which is used in conjunction with nested correlated queries.
- This operator was dropped from the language, possibly because of the difficulty of implementing it efficiently.

CORRELATED NESTED QUERIES (cont.)
- Most implementations of SQL do not have this operator.
- The CONTAINS operator compares two sets of values and returns TRUE if one set contains all values in the other set (reminiscent of the division operation of relational algebra).
Query 3: Retrieve the name of each employee who works on all the projects controlled by department number 5.
    Q3: SELECT FNAME, LNAME
        FROM EMPLOYEE
        WHERE ( (SELECT PNO
                 FROM WORKS_ON
                 WHERE SSN=ESSN)
                CONTAINS
                (SELECT PNUMBER
                 FROM PROJECT
                 WHERE DNUM=5) )

CORRELATED NESTED QUERIES (cont.)
- In Q3, the second nested query, which is not correlated with the outer query, retrieves the project numbers of all projects controlled by department 5.
- The first nested query, which is correlated, retrieves the project numbers on which the employee works, which is different for each employee tuple because of the correlation.
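
Because CONTAINS is not available in most SQL systems, Query 3 is commonly rephrased with two nested NOT EXISTS conditions: select an employee if there is no department 5 project that the employee does not work on. The sketch below shows that rewrite running in Python's built-in sqlite3 module. The rewrite is a standard alternative rather than something taken from the CONTAINS slide, and the sample rows and the function name employees_on_all_dept5_projects are assumptions.

```python
import sqlite3

# Q3 without CONTAINS: "no department 5 project exists that this
# employee does not work on".
Q3_NOT_EXISTS = """
SELECT FNAME, LNAME
FROM EMPLOYEE
WHERE NOT EXISTS (
    SELECT * FROM PROJECT
    WHERE DNUM = 5
      AND NOT EXISTS (
          SELECT * FROM WORKS_ON
          WHERE ESSN = SSN AND PNO = PNUMBER))
ORDER BY LNAME
"""


def employees_on_all_dept5_projects():
    con = sqlite3.connect(":memory:")
    con.executescript("""
        CREATE TABLE EMPLOYEE (FNAME TEXT, LNAME TEXT, SSN TEXT PRIMARY KEY);
        CREATE TABLE PROJECT (PNUMBER INTEGER PRIMARY KEY, DNUM INTEGER);
        CREATE TABLE WORKS_ON (ESSN TEXT, PNO INTEGER);
        INSERT INTO EMPLOYEE VALUES ('Ann', 'Lee', '1'), ('Bob', 'Ray', '2');
        -- Projects 1 and 2 belong to department 5, project 3 to department 4.
        INSERT INTO PROJECT VALUES (1, 5), (2, 5), (3, 4);
        -- Ann works on both department 5 projects; Bob misses project 2.
        INSERT INTO WORKS_ON VALUES ('1', 1), ('1', 2), ('2', 1), ('2', 3);
    """)
    rows = con.execute(Q3_NOT_EXISTS).fetchall()
    con.close()
    return rows
```

One consequence of this formulation is that if department 5 controlled no projects at all, every employee would qualify, since there would be no project left to miss.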

THE EXISTS FUNCTION
- EXISTS is used to check whether the result of a correlated nested query is empty (contains no tuples) or not.
- We can formulate Query 12 in an alternative form that uses EXISTS, as Q12B below.

THE EXISTS FUNCTION (cont.)
Query 12: Retrieve the name of each employee who has a dependent with the same first name as the employee.
    Q12B: SELECT FNAME, LNAME
          FROM EMPLOYEE
          WHERE EXISTS (SELECT *
                        FROM DEPENDENT
                        WHERE SSN=ESSN AND FNAME=DEPENDENT_NAME)

THE EXISTS FUNCTION (cont.)
Query 6: Retrieve the names of employees who have no dependents.
    Q6: SELECT FNAME, LNAME
        FROM EMPLOYEE
        WHERE NOT EXISTS (SELECT *
                          FROM DEPENDENT
                          WHERE SSN=ESSN)
- In Q6, the correlated nested query retrieves all DEPENDENT tuples related to an EMPLOYEE tuple. If none exist, the EMPLOYEE tuple is selected.
- EXISTS is necessary for the expressive power of SQL.

EXPLICIT SETS
It is also possible to use an explicit (enumerated) set of values in the WHERE-clause rather than a nested query.
Query 13: Retrieve the social security numbers of all employees who work on project number 1, 2, or 3.
    Q13: SELECT DISTINCT ESSN
         FROM WORKS_ON
         WHERE PNO IN (1, 2, 3)

NULLS IN SQL QUERIES
- SQL allows queries that check whether a value is NULL (missing, undefined, or not applicable).
- SQL uses IS or IS NOT to compare NULLs because it considers each NULL value distinct from other NULL values, so equality comparison is not appropriate.
Query 14: Retrieve the names of all employees who do not have supervisors.
    Q14: SELECT FNAME, LNAME
         FROM EMPLOYEE
         WHERE SUPERSSN IS NULL
- Note: If a join condition is specified, tuples with NULL values for the join attributes are not included in the result.
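
The difference between IS NULL and an equality comparison with NULL, and the note about joins, can both be checked on two rows. The sketch below uses Python's built-in sqlite3 module; the cut-down EMPLOYEE table, its two rows, and the function name null_comparison_demo are assumptions for illustration.

```python
import sqlite3


def null_comparison_demo():
    """Compare IS NULL, = NULL, and a join on a column that contains NULL."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE EMPLOYEE (LNAME TEXT, SSN TEXT PRIMARY KEY, SUPERSSN TEXT)")
    con.executemany(
        "INSERT INTO EMPLOYEE VALUES (?, ?, ?)",
        [("Borg", "888665555", None), ("Wong", "333445555", "888665555")],
    )
    # Q14 style: IS NULL finds the employee without a supervisor.
    is_null = con.execute(
        "SELECT LNAME FROM EMPLOYEE WHERE SUPERSSN IS NULL"
    ).fetchall()
    # An equality test against NULL is never TRUE, so nothing is selected.
    equals_null = con.execute(
        "SELECT LNAME FROM EMPLOYEE WHERE SUPERSSN = NULL"
    ).fetchall()
    # A join on SUPERSSN silently drops the tuple whose join attribute is NULL.
    joined = con.execute(
        "SELECT E.LNAME FROM EMPLOYEE E, EMPLOYEE S WHERE E.SUPERSSN = S.SSN"
    ).fetchall()
    con.close()
    return is_null, equals_null, joined
```

Here IS NULL returns Borg, the equality test returns no rows, and the join returns only Wong.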

Joined Relations Feature in SQL2
- Can specify a "joined relation" in the FROM-clause.
- Looks like any other relation but is the result of a join.
- Allows the user to specify different types of joins (regular "theta" JOIN, NATURAL JOIN, LEFT OUTER JOIN, RIGHT OUTER JOIN, CROSS JOIN, etc.).

Joined Relations Feature in SQL2 (cont.)
Examples:
    Q8: SELECT E.FNAME, E.LNAME, S.FNAME, S.LNAME
        FROM EMPLOYEE E, EMPLOYEE S
        WHERE E.SUPERSSN=S.SSN
can be written as:
    Q8: SELECT E.FNAME, E.LNAME, S.FNAME, S.LNAME
        FROM (EMPLOYEE E LEFT OUTER JOIN EMPLOYEE S ON E.SUPERSSN=S.SSN)
Note that the LEFT OUTER JOIN form also returns employees who have no supervisor, with NULL in the supervisor columns.
    Q1: SELECT FNAME, LNAME, ADDRESS
        FROM EMPLOYEE, DEPARTMENT
        WHERE DNAME='Research' AND DNUMBER=DNO

Joined Relations Feature in SQL2 (cont.)
could be written as:
    Q1: SELECT FNAME, LNAME, ADDRESS
        FROM (EMPLOYEE JOIN DEPARTMENT ON DNUMBER=DNO)
        WHERE DNAME='Research'
or as:
    Q1: SELECT FNAME, LNAME, ADDRESS
        FROM (EMPLOYEE NATURAL JOIN DEPARTMENT AS DEPT(DNAME, DNO, MSSN, MSDATE))
        WHERE DNAME='Research'

Joined Relations Feature in SQL2 (cont.)
Another example: Q2 could be written as follows; this illustrates multiple joins in the joined tables.
    Q2: SELECT PNUMBER, DNUM, LNAME, BDATE, ADDRESS
        FROM ((PROJECT JOIN DEPARTMENT ON DNUM=DNUMBER)
              JOIN EMPLOYEE ON MGRSSN=SSN)
        WHERE PLOCATION='Stafford'
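
The choice of join type in the FROM-clause changes which tuples survive. The sketch below runs the supervisor query once with a regular JOIN and once with LEFT OUTER JOIN, using Python's built-in sqlite3 module. The three sample rows and the function name supervisor_pairs are assumptions for illustration.

```python
import sqlite3


def supervisor_pairs(outer):
    """Return (employee, supervisor) last names using JOIN or LEFT OUTER JOIN."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE EMPLOYEE (LNAME TEXT, SSN TEXT PRIMARY KEY, SUPERSSN TEXT)")
    con.executemany(
        "INSERT INTO EMPLOYEE VALUES (?, ?, ?)",
        [
            ("Borg", "888665555", None),          # top of the hierarchy
            ("Wong", "333445555", "888665555"),
            ("Smith", "123456789", "333445555"),
        ],
    )
    join = "LEFT OUTER JOIN" if outer else "JOIN"
    rows = con.execute(
        "SELECT E.LNAME, S.LNAME "
        f"FROM EMPLOYEE E {join} EMPLOYEE S ON E.SUPERSSN = S.SSN "
        "ORDER BY E.LNAME"
    ).fetchall()
    con.close()
    return rows
```

supervisor_pairs(False) lists only employees that have a supervisor, while supervisor_pairs(True) also keeps Borg, paired with NULL, which is the behaviour to expect from the outer-join form of Q8.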

AGGREGATE FUNCTIONS
Include COUNT, SUM, MAX, MIN, and AVG.
Query 15: Find the maximum salary, the minimum salary, and the average salary among all employees.
    Q15: SELECT MAX(SALARY), MIN(SALARY), AVG(SALARY)
         FROM EMPLOYEE
Some SQL implementations may not allow more than one function in the SELECT-clause.

AGGREGATE FUNCTIONS (cont.)


Query 16: Find the maximum salary, the minimum salary, and the average salary among employees who work for the 'Research' department.
Q16: SELECT MAX(SALARY), MIN(SALARY), AVG(SALARY)
     FROM EMPLOYEE, DEPARTMENT
     WHERE DNO=DNUMBER AND DNAME='Research'

AGGREGATE FUNCTIONS (cont.)


Queries 17 and 18: Retrieve the total number of employees in the company (Q17), and the number of employees in the 'Research' department (Q18).
Q17: SELECT COUNT (*)
     FROM EMPLOYEE
Q18: SELECT COUNT (*)
     FROM EMPLOYEE, DEPARTMENT
     WHERE DNO=DNUMBER AND DNAME='Research'
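Queries 15 to 18 can be run against a toy database to see what each aggregate returns. The sketch assumes Python's sqlite3 module; the four employees, their salaries and the 'Administration' department are invented for illustration.

```python
import sqlite3

# Hypothetical data: two employees in Research (5), two in Administration (4).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE DEPARTMENT (DNAME TEXT, DNUMBER INTEGER PRIMARY KEY);
CREATE TABLE EMPLOYEE (SSN TEXT PRIMARY KEY, SALARY INTEGER, DNO INTEGER);
INSERT INTO DEPARTMENT VALUES ('Research', 5), ('Administration', 4);
INSERT INTO EMPLOYEE VALUES
  ('111', 30000, 5), ('222', 40000, 5), ('333', 25000, 4), ('444', 55000, 4);
""")

q15 = conn.execute(
    "SELECT MAX(SALARY), MIN(SALARY), AVG(SALARY) FROM EMPLOYEE"
).fetchone()

q16 = conn.execute("""
    SELECT MAX(SALARY), MIN(SALARY), AVG(SALARY)
    FROM EMPLOYEE, DEPARTMENT
    WHERE DNO = DNUMBER AND DNAME = 'Research'
""").fetchone()

q17 = conn.execute("SELECT COUNT(*) FROM EMPLOYEE").fetchone()[0]

q18 = conn.execute("""
    SELECT COUNT(*)
    FROM EMPLOYEE, DEPARTMENT
    WHERE DNO = DNUMBER AND DNAME = 'Research'
""").fetchone()[0]

print(q15)       # (55000, 25000, 37500.0)
print(q16)       # (40000, 30000, 35000.0)
print(q17, q18)  # 4 2
```

Each query returns a single row because, without GROUP BY, the whole set of selected tuples forms one group.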

GROUPING
In many cases, we want to apply the aggregate functions to subgroups of tuples in a relation.
Each subgroup of tuples consists of the set of tuples that have the same value for the grouping attribute(s).
The function is applied to each subgroup independently.
SQL has a GROUP BY-clause for specifying the grouping attributes. The grouping attributes should also appear in the SELECT-clause, so that each result row shows which group it describes; in standard SQL, any other attribute in the SELECT-clause must appear inside an aggregate function.

GROUPING (cont.)
Query 20: For each department, retrieve the department number, the number of employees in the department, and their average salary.
Q20: SELECT DNO, COUNT (*), AVG (SALARY)
     FROM EMPLOYEE
     GROUP BY DNO
In Q20, the EMPLOYEE tuples are divided into groups, each group having the same value for the grouping attribute DNO.
The COUNT and AVG functions are applied to each such group of tuples separately.
The SELECT-clause includes only the grouping attribute and the functions to be applied on each group of tuples.
A join condition can be used in conjunction with grouping.

GROUPING (cont.)
Query 21: For each project, retrieve the project number, project name, and the number of employees who work on that project.
Q21: SELECT PNUMBER, PNAME, COUNT (*)
     FROM PROJECT, WORKS_ON
     WHERE PNUMBER=PNO
     GROUP BY PNUMBER, PNAME
In this case, the grouping and functions are applied after the joining of the two relations.

THE HAVING-CLAUSE
Sometimes we want to retrieve the values of these functions for only those groups that satisfy certain conditions.
The HAVING-clause is used for specifying a selection condition on groups (rather than on individual tuples).

THE HAVING-CLAUSE (cont.)


Query 22: For each project on which more than two employees work, retrieve the project number, project name, and the number of employees who work on that project.
Q22: SELECT PNUMBER, PNAME, COUNT (*)
     FROM PROJECT, WORKS_ON
     WHERE PNUMBER=PNO
     GROUP BY PNUMBER, PNAME
     HAVING COUNT (*) > 2
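Q21 and Q22 differ only in the HAVING-clause, which is easiest to see by running both on the same data. The sketch assumes Python's sqlite3 module; the project names other than 'ProductX' and all WORKS_ON rows are invented, and an ORDER BY is added only to make the output order predictable.

```python
import sqlite3

# Hypothetical data: ProductX has 3 workers, ProductY has 2, ProductZ has 1.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE PROJECT (PNAME TEXT, PNUMBER INTEGER PRIMARY KEY);
CREATE TABLE WORKS_ON (ESSN TEXT, PNO INTEGER, PRIMARY KEY (ESSN, PNO));
INSERT INTO PROJECT VALUES ('ProductX', 1), ('ProductY', 2), ('ProductZ', 3);
INSERT INTO WORKS_ON VALUES
  ('111', 1), ('222', 1), ('333', 1),
  ('111', 2), ('222', 2),
  ('333', 3);
""")

# Q21: one result row per project (per group of joined tuples).
q21 = conn.execute("""
    SELECT PNUMBER, PNAME, COUNT(*)
    FROM PROJECT, WORKS_ON
    WHERE PNUMBER = PNO
    GROUP BY PNUMBER, PNAME
    ORDER BY PNUMBER
""").fetchall()

# Q22: HAVING keeps only the groups with more than two members.
q22 = conn.execute("""
    SELECT PNUMBER, PNAME, COUNT(*)
    FROM PROJECT, WORKS_ON
    WHERE PNUMBER = PNO
    GROUP BY PNUMBER, PNAME
    HAVING COUNT(*) > 2
""").fetchall()

print(q21)  # [(1, 'ProductX', 3), (2, 'ProductY', 2), (3, 'ProductZ', 1)]
print(q22)  # [(1, 'ProductX', 3)]
```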

SUBSTRING COMPARISON
The LIKE comparison operator is used to compare partial strings.
Two reserved characters are used: '%' (or '*' in some implementations) replaces an arbitrary number of characters, and '_' replaces a single arbitrary character.

SUBSTRING COMPARISON (cont.)


Query 25: Retrieve all employees whose address is in Houston, Texas. Here, the value of the ADDRESS attribute must contain the substring 'Houston,TX'.
Q25: SELECT FNAME, LNAME
     FROM EMPLOYEE
     WHERE ADDRESS LIKE '%Houston,TX%'

SUBSTRING COMPARISON (cont.)


Query 26: Retrieve all employees who were born during the 1950s. Here, '5' must be the 8th character of the string (according to our format for date), so the BDATE value is '_______5_', with each underscore as a placeholder for a single arbitrary character.
Q26: SELECT FNAME, LNAME
     FROM EMPLOYEE
     WHERE BDATE LIKE '_______5_'
The LIKE operator allows us to get around the fact that each value is considered atomic and indivisible; hence, in SQL, character string attribute values are not atomic.
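Both wildcard characters can be exercised with a short sketch. It assumes Python's sqlite3 module and stores BDATE as a nine-character string in the 'DD-MON-YY' layout used by the INSERT example in these slides; the rows other than the Marini values are invented.

```python
import sqlite3

# BDATE is stored as text such as '30-DEC-52', so its 8th character is the decade.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE EMPLOYEE (FNAME TEXT, LNAME TEXT, BDATE TEXT, ADDRESS TEXT);
INSERT INTO EMPLOYEE VALUES
  ('John',    'Smith',   '09-JAN-55', '731 Fondren,Houston,TX'),
  ('Richard', 'Marini',  '30-DEC-52', '98 Oak Forest,Katy,TX'),
  ('Joyce',   'English', '31-JUL-62', '5631 Rice,Houston,TX');
""")

# Q25: '%' matches any number of characters on either side of the substring.
q25 = conn.execute("""
    SELECT FNAME, LNAME FROM EMPLOYEE
    WHERE ADDRESS LIKE '%Houston,TX%'
    ORDER BY LNAME
""").fetchall()

# Q26: seven single-character wildcards, then '5', then one more wildcard.
q26 = conn.execute("""
    SELECT FNAME, LNAME FROM EMPLOYEE
    WHERE BDATE LIKE '_______5_'
    ORDER BY LNAME
""").fetchall()

print(q25)  # [('Joyce', 'English'), ('John', 'Smith')]
print(q26)  # [('Richard', 'Marini'), ('John', 'Smith')]
```

The Q26 pattern only works because every BDATE value has the same fixed layout; with a native date type a range comparison would be the more usual choice.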

ARITHMETIC OPERATIONS
The standard arithmetic operators '+', '-', '*', and '/' (for addition, subtraction, multiplication, and division, respectively) can be applied to numeric values in an SQL query result.
Query 27: Show the effect of giving all employees who work on the 'ProductX' project a 10% raise.
Q27: SELECT FNAME, LNAME, 1.1*SALARY
     FROM EMPLOYEE, WORKS_ON, PROJECT
     WHERE SSN=ESSN AND PNO=PNUMBER AND PNAME='ProductX'

ORDER BY
The ORDER BY clause is used to sort the tuples in a query result based on the values of some attribute(s).
Query 28: Retrieve a list of employees and the projects each works on, ordered by the employee's department, and within each department ordered alphabetically by employee last name.
Q28: SELECT DNAME, LNAME, FNAME, PNAME
     FROM DEPARTMENT, EMPLOYEE, WORKS_ON, PROJECT
     WHERE DNUMBER=DNO AND SSN=ESSN AND PNO=PNUMBER
     ORDER BY DNAME, LNAME

ORDER BY (cont.)
The default order is ascending order of values.
We can specify the keyword DESC if we want a descending order; the keyword ASC can be used to explicitly specify ascending order, even though it is the default.
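Arithmetic in the SELECT-clause and the ASC/DESC keywords can be combined in one small experiment. The sketch assumes Python's sqlite3 module and a simplified EMPLOYEE table with invented rows; it also checks that computing 1.1*SALARY in a query does not change the stored salaries.

```python
import sqlite3

# Hypothetical rows; SALARY is REAL so the computed raise is a plain float.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE EMPLOYEE (LNAME TEXT, SALARY REAL, DNO INTEGER);
INSERT INTO EMPLOYEE VALUES
  ('Wong', 40000, 5), ('Smith', 30000, 5), ('Zelaya', 25000, 4);
""")

# Department number descending, then last name ascending within a department.
rows = conn.execute("""
    SELECT DNO, LNAME, 1.1 * SALARY
    FROM EMPLOYEE
    ORDER BY DNO DESC, LNAME ASC
""").fetchall()

order = [(dno, lname) for dno, lname, _ in rows]
print(order)  # [(5, 'Smith'), (5, 'Wong'), (4, 'Zelaya')]

# The SELECT only shows the effect of the raise; the table is unchanged.
stored = conn.execute(
    "SELECT SALARY FROM EMPLOYEE WHERE LNAME = 'Smith'"
).fetchone()[0]
print(stored)  # 30000.0
```

The raised values are floating-point numbers, so they are best compared with a tolerance rather than for exact equality.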

Summary of SQL Queries


A query in SQL can consist of up to six clauses, but only the first two, SELECT and FROM, are mandatory. The clauses are specified in the following order:
SELECT <attribute list>
FROM <table list>
[WHERE <condition>]
[GROUP BY <grouping attribute(s)>]
[HAVING <group condition>]
[ORDER BY <attribute list>]

Summary of SQL Queries (cont.)


The SELECT-clause lists the attributes or functions to be retrieved.
The FROM-clause specifies all relations (or aliases) needed in the query but not those needed in nested queries.
The WHERE-clause specifies the conditions for selection and join of tuples from the relations specified in the FROM-clause.
GROUP BY specifies grouping attributes.
HAVING specifies a condition for selection of groups.
ORDER BY specifies an order for displaying the result of a query.
A query is evaluated conceptually by first applying the WHERE-clause, then GROUP BY and HAVING, and finally the SELECT-clause.
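The conceptual evaluation order matters when WHERE and HAVING are both present. The sketch below uses all six clauses in one query; it assumes Python's sqlite3 module, and the table contents and the 28000 threshold are invented for illustration.

```python
import sqlite3

# Hypothetical data: department 5 has three employees, 4 has two, 1 has one.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE EMPLOYEE (SSN TEXT PRIMARY KEY, SALARY INTEGER, DNO INTEGER);
INSERT INTO EMPLOYEE VALUES
  ('111', 30000, 5), ('222', 40000, 5), ('333', 50000, 5),
  ('444', 25000, 4), ('555', 43000, 4),
  ('666', 55000, 1);
""")

rows = conn.execute("""
    SELECT DNO, COUNT(*), AVG(SALARY)
    FROM EMPLOYEE
    WHERE SALARY > 28000       -- applied first, to individual tuples
    GROUP BY DNO               -- then the surviving tuples are grouped
    HAVING COUNT(*) >= 2       -- then whole groups are filtered
    ORDER BY DNO               -- finally the result rows are sorted
""").fetchall()

print(rows)  # [(5, 3, 40000.0)]
```

Department 4 has two employees, but WHERE removes one of them before grouping, so its group holds a single tuple and is discarded by HAVING.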

Specifying Updates in SQL


There are three SQL commands to modify the database: INSERT, DELETE, and UPDATE.

INSERT
In its simplest form, it is used to add one or more tuples to a relation.
Attribute values should be listed in the same order as the attributes were specified in the CREATE TABLE command.

INSERT (cont.)
Example:
U1: INSERT INTO EMPLOYEE
    VALUES ('Richard','K','Marini', '653298653', '30-DEC-52', '98 Oak Forest,Katy,TX', 'M', 37000,'987654321', 4 )
An alternate form of INSERT specifies explicitly the attribute names that correspond to the values in the new tuple.
Attributes with NULL values can be left out.
Example: Insert a tuple for a new EMPLOYEE for whom we only know the FNAME, LNAME, and SSN attributes.
U1A: INSERT INTO EMPLOYEE (FNAME, LNAME, SSN)
     VALUES ('Richard', 'Marini', '653298653')

INSERT (cont.)
Important Note: Only the constraints specified in the DDL commands are automatically enforced by the DBMS when updates are applied to the database.
Another variation of INSERT allows insertion of multiple tuples resulting from a query into a relation.

INSERT (cont.)
Example: Suppose we want to create a temporary table that has the name, number of employees, and total salaries for each department. A table DEPTS_INFO is created by U3A, and is loaded with the summary information retrieved from the database by the query in U3B.
U3A: CREATE TABLE DEPTS_INFO
     (DEPT_NAME VARCHAR(10),
      NO_OF_EMPS INTEGER,
      TOTAL_SAL INTEGER);
U3B: INSERT INTO DEPTS_INFO (DEPT_NAME, NO_OF_EMPS, TOTAL_SAL)
     SELECT DNAME, COUNT (*), SUM (SALARY)
     FROM DEPARTMENT, EMPLOYEE
     WHERE DNUMBER=DNO
     GROUP BY DNAME ;

INSERT (cont.)
Note: The DEPTS_INFO table may not be up-to-date if we change the tuples in either the DEPARTMENT or the EMPLOYEE relations after issuing U3B. We have to create a view (see later) to keep such a table up to date.

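The staleness of DEPTS_INFO can be reproduced in a few lines. The sketch assumes Python's sqlite3 module and invented rows; the view name DEPTS_INFO_V is hypothetical and only serves to contrast a stored table with a view defined by the same query.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE DEPARTMENT (DNAME TEXT, DNUMBER INTEGER PRIMARY KEY);
CREATE TABLE EMPLOYEE (SSN TEXT PRIMARY KEY, SALARY INTEGER, DNO INTEGER);
INSERT INTO DEPARTMENT VALUES ('Research', 5);
INSERT INTO EMPLOYEE VALUES ('111', 30000, 5), ('222', 40000, 5);

-- U3A and U3B from the slide
CREATE TABLE DEPTS_INFO (DEPT_NAME VARCHAR(10), NO_OF_EMPS INTEGER, TOTAL_SAL INTEGER);
INSERT INTO DEPTS_INFO (DEPT_NAME, NO_OF_EMPS, TOTAL_SAL)
  SELECT DNAME, COUNT(*), SUM(SALARY)
  FROM DEPARTMENT, EMPLOYEE
  WHERE DNUMBER = DNO
  GROUP BY DNAME;

-- Hypothetical view over the same query (the name is an assumption)
CREATE VIEW DEPTS_INFO_V AS
  SELECT DNAME AS DEPT_NAME, COUNT(*) AS NO_OF_EMPS, SUM(SALARY) AS TOTAL_SAL
  FROM DEPARTMENT, EMPLOYEE
  WHERE DNUMBER = DNO
  GROUP BY DNAME;

-- A later change to a base relation
INSERT INTO EMPLOYEE VALUES ('333', 25000, 5);
""")

table_rows = conn.execute("SELECT * FROM DEPTS_INFO").fetchall()
view_rows = conn.execute("SELECT * FROM DEPTS_INFO_V").fetchall()

print(table_rows)  # [('Research', 2, 70000)]
print(view_rows)   # [('Research', 3, 95000)]
```

The table keeps the snapshot taken when U3B ran, while the view is recomputed from the base relations each time it is queried.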

DELETE
Removes tuples from a relation.
Includes a WHERE-clause to select the tuples to be deleted.
Tuples are deleted from only one table at a time (unless CASCADE is specified on a referential integrity constraint).
A missing WHERE-clause specifies that all tuples in the relation are to be deleted; the table then becomes an empty table.
The number of tuples deleted depends on the number of tuples in the relation that satisfy the WHERE-clause.
Referential integrity should be enforced.

DELETE (cont.)
Examples:
U4A: DELETE FROM EMPLOYEE
     WHERE LNAME='Brown'
U4B: DELETE FROM EMPLOYEE
     WHERE SSN='123456789'
U4C: DELETE FROM EMPLOYEE
     WHERE DNO IN (SELECT DNUMBER
                   FROM DEPARTMENT
                   WHERE DNAME='Research')
U4D: DELETE FROM EMPLOYEE
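Running U4A to U4D in sequence shows how the WHERE-clause controls the number of tuples removed. The sketch assumes Python's sqlite3 module; the four employees and the helper function `remaining` are invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE DEPARTMENT (DNAME TEXT, DNUMBER INTEGER PRIMARY KEY);
CREATE TABLE EMPLOYEE (LNAME TEXT, SSN TEXT PRIMARY KEY, DNO INTEGER);
INSERT INTO DEPARTMENT VALUES ('Research', 5), ('Administration', 4);
INSERT INTO EMPLOYEE VALUES
  ('Brown', '111111111', 4), ('Smith', '123456789', 5),
  ('Wong', '333445555', 5), ('Zelaya', '999887777', 4);
""")

def remaining():
    """Hypothetical helper: last names still present, in alphabetical order."""
    return [r[0] for r in conn.execute("SELECT LNAME FROM EMPLOYEE ORDER BY LNAME")]

conn.execute("DELETE FROM EMPLOYEE WHERE LNAME = 'Brown'")        # U4A
after_u4a = remaining()

conn.execute("DELETE FROM EMPLOYEE WHERE SSN = '123456789'")      # U4B
after_u4b = remaining()

conn.execute("""
    DELETE FROM EMPLOYEE
    WHERE DNO IN (SELECT DNUMBER FROM DEPARTMENT WHERE DNAME = 'Research')
""")                                                              # U4C
after_u4c = remaining()

conn.execute("DELETE FROM EMPLOYEE")                              # U4D: no WHERE
after_u4d = remaining()

print(after_u4a)  # ['Smith', 'Wong', 'Zelaya']
print(after_u4b)  # ['Wong', 'Zelaya']
print(after_u4c)  # ['Zelaya']
print(after_u4d)  # []
```

After U4D the EMPLOYEE table still exists; it is simply empty.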

UPDATE
Used to modify attribute values of one or more selected tuples.
A WHERE-clause selects the tuples to be modified.
An additional SET-clause specifies the attributes to be modified and their new values.
Each command modifies tuples in the same relation.
Referential integrity should be enforced.

UPDATE (cont.)
Example: Change the location and controlling department number of project number 10 to 'Bellaire' and 5, respectively.
U5: UPDATE PROJECT
    SET PLOCATION = 'Bellaire', DNUM = 5
    WHERE PNUMBER=10

UPDATE (cont.)
Example: Give all employees in the 'Research' department a 10% raise in salary.
U6: UPDATE EMPLOYEE
    SET SALARY = SALARY *1.1
    WHERE DNO IN (SELECT DNUMBER
                  FROM DEPARTMENT
                  WHERE DNAME='Research')
In this request, the modified SALARY value depends on the original SALARY value in each tuple.
The reference to the SALARY attribute on the right of = refers to the old SALARY value before modification.
The reference to the SALARY attribute on the left of = refers to the new SALARY value after modification.
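U5 and U6 can be checked against a toy database. The sketch assumes Python's sqlite3 module; the sample rows, including the initial 'Stafford' location and department 4 for project 10, are invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE DEPARTMENT (DNAME TEXT, DNUMBER INTEGER PRIMARY KEY);
CREATE TABLE EMPLOYEE (SSN TEXT PRIMARY KEY, SALARY REAL, DNO INTEGER);
CREATE TABLE PROJECT (PNUMBER INTEGER PRIMARY KEY, PLOCATION TEXT, DNUM INTEGER);
INSERT INTO DEPARTMENT VALUES ('Research', 5), ('Administration', 4);
INSERT INTO EMPLOYEE VALUES ('111', 30000, 5), ('333', 25000, 4);
INSERT INTO PROJECT VALUES (10, 'Stafford', 4);
""")

# U5: two attributes of one selected tuple are changed.
conn.execute("UPDATE PROJECT SET PLOCATION = 'Bellaire', DNUM = 5 WHERE PNUMBER = 10")

# U6: SALARY on the right of '=' is the old value, on the left the new one.
conn.execute("""
    UPDATE EMPLOYEE
    SET SALARY = SALARY * 1.1
    WHERE DNO IN (SELECT DNUMBER FROM DEPARTMENT WHERE DNAME = 'Research')
""")

project = conn.execute(
    "SELECT PLOCATION, DNUM FROM PROJECT WHERE PNUMBER = 10"
).fetchone()
salaries = dict(conn.execute("SELECT SSN, SALARY FROM EMPLOYEE"))

print(project)          # ('Bellaire', 5)
print(salaries["333"])  # 25000.0  (not in Research, so unchanged)
```

The Research employee's new salary is 30000 * 1.1 computed in floating point, so it is close to 33000 but should be compared with a tolerance.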

Chapter Outline
1 Informal Design Guidelines for Relational Databases
  1.1 Semantics of the Relation Attributes
  1.2 Redundant Information in Tuples and Update Anomalies
  1.3 Null Values in Tuples
  1.4 Spurious Tuples
2 Functional Dependencies (FDs)
  2.1 Definition of FD
  2.2 Inference Rules for FDs
  2.3 Equivalence of Sets of FDs
  2.4 Minimal Sets of FDs

Chapter Outline(contd.)
3 Normal Forms Based on Primary Keys
  3.1 Normalization of Relations
  3.2 Practical Use of Normal Forms
  3.3 Definitions of Keys and Attributes Participating in Keys
  3.4 First Normal Form
  3.5 Second Normal Form
  3.6 Third Normal Form
4 General Normal Form Definitions (For Multiple Keys)
5 BCNF (Boyce-Codd Normal Form)

1. Informal Design Guidelines for Relational Databases


We first discuss informal guidelines for good relational design.
Then we discuss formal concepts of functional dependencies and normal forms:
- 1NF (First Normal Form)
- 2NF (Second Normal Form)
- 3NF (Third Normal Form)
- BCNF (Boyce-Codd Normal Form)

Semantics of the Relation Attributes

GUIDELINE 1: Informally, each tuple in a relation should represent one entity or relationship instance. (Applies to individual relations and their attributes.)
Attributes of different entities (EMPLOYEEs, DEPARTMENTs, PROJECTs) should not be mixed in the same relation.
Only foreign keys should be used to refer to other entities.
Entity and relationship attributes should be kept apart as much as possible.

Redundant Information in Tuples and Update Anomalies


Mixing attributes of multiple entities may cause problems:
- Information is stored redundantly, wasting storage
- Problems with update anomalies:
  - Insertion anomalies
  - Deletion anomalies
  - Modification anomalies

EXAMPLE OF AN UPDATE ANOMALY (1)


Consider the relation:
EMP_PROJ ( Emp#, Proj#, Ename, Pname, No_hours)

Update Anomaly: Changing the name of project number P1 from Billing to Customer-Accounting may cause this update to be made for all 100 employees working on project P1.


EXAMPLE OF AN UPDATE ANOMALY (2)


Insert Anomaly: Cannot insert a project unless an employee is assigned to it. Conversely, cannot insert an employee unless he/she is assigned to a project.
Delete Anomaly: When a project is deleted, it will result in deleting all the employees who work on that project. Alternatively, if an employee is the sole employee on a project, deleting that employee would result in deleting the corresponding project.
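The three anomalies can be reproduced on a tiny EMP_PROJ table. The sketch assumes Python's sqlite3 module; the columns Emp# and Proj# are renamed EMP_NO and PROJ_NO to avoid quoting, the key is assumed to be (EMP_NO, PROJ_NO) with NOT NULL declared explicitly, and all rows are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE EMP_PROJ (
  EMP_NO TEXT NOT NULL, PROJ_NO TEXT NOT NULL,
  ENAME TEXT, PNAME TEXT, NO_HOURS REAL,
  PRIMARY KEY (EMP_NO, PROJ_NO));
INSERT INTO EMP_PROJ VALUES
  ('E1', 'P1', 'Ann',  'Billing', 10),
  ('E2', 'P1', 'Bob',  'Billing', 20),
  ('E3', 'P2', 'Cara', 'Payroll', 5);
""")

# Modification anomaly: renaming one project touches one row per employee on it.
cur = conn.execute(
    "UPDATE EMP_PROJ SET PNAME = 'Customer-Accounting' WHERE PROJ_NO = 'P1'"
)
rows_touched = cur.rowcount

# Insertion anomaly: a project with no employee has no value for the key part EMP_NO.
try:
    conn.execute("INSERT INTO EMP_PROJ VALUES (NULL, 'P3', NULL, 'Audit', NULL)")
    insert_rejected = False
except sqlite3.IntegrityError:
    insert_rejected = True

# Deletion anomaly: removing the only employee on P2 removes every trace of P2.
conn.execute("DELETE FROM EMP_PROJ WHERE EMP_NO = 'E3'")
p2_rows = conn.execute(
    "SELECT COUNT(*) FROM EMP_PROJ WHERE PROJ_NO = 'P2'"
).fetchone()[0]

print(rows_touched)     # 2
print(insert_rejected)  # True
print(p2_rows)          # 0
```

With the slide's 100 employees on project P1, the same rename would have to touch 100 rows, and missing any of them would leave two names for one project.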

Figure 10.4 Example States for EMP_DEPT and EMP_PROJ

Guideline to Redundant Information in Tuples and Update Anomalies


GUIDELINE 2: Design a schema that does not suffer from the insertion, deletion and update anomalies. If there are any present, then note them so that applications can be made to take them into account


1.3 Null Values in Tuples


GUIDELINE 3: Relations should be designed such that their tuples will have as few NULL values as possible.
Attributes that are NULL frequently could be placed in separate relations (with the primary key).
Reasons for nulls:
- attribute not applicable or invalid
- attribute value unknown (may exist)
- value known to exist, but unavailable

2.1 Functional Dependencies (1)

A functional dependency is a constraint between two sets of attributes from the database.
FDs and keys are used to define normal forms for relations.
FDs are constraints that are derived from the meaning and interrelationships of the data attributes.

Functional Dependency
Definition. A functional dependency, denoted by X -> Y, between two sets of attributes X and Y that are subsets of R specifies a constraint on the possible tuples that can form a relation state r of R. The constraint is that, for any two tuples t1 and t2 in r that have t1[X] = t2[X], they must also have t1[Y] = t2[Y]. This means that the values of the Y component of a tuple in r depend on, or are determined by, the values of the X component; alternatively, the values of the X component of a tuple uniquely (or functionally) determine the values of the Y component. We also say that there is a functional dependency from X to Y, or that Y is functionally dependent on X. The abbreviation for functional dependency is FD or f.d. The set of attributes X is called the left-hand side of the FD, and Y is called the right-hand side.

Functional Dependencies (2)


X -> Y holds if whenever two tuples have the same value for X, they must have the same value for Y.
For any two tuples t1 and t2 in any relation instance r(R): if t1[X] = t2[X], then t1[Y] = t2[Y].
X -> Y in R specifies a constraint on all relation instances r(R).
Written as X -> Y; can be displayed graphically on a relation schema as in the figures (denoted by the arrow).
FDs are derived from the real-world constraints on the attributes.

Examples of FD constraints (1)


Social security number determines employee name:
  SSN -> ENAME
Project number determines project name and location:
  PNUMBER -> {PNAME, PLOCATION}
Employee SSN and project number determine the hours per week that the employee works on the project:
  {SSN, PNUMBER} -> HOURS
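The tuple-pair condition t1[X] = t2[X] implies t1[Y] = t2[Y] translates directly into a check on one relation state. The function name `fd_holds` and the sample rows are hypothetical; note that a single state can only refute an FD, since the FD itself is a statement about all legal states.

```python
def fd_holds(rows, lhs, rhs):
    """Return True if no two rows agree on lhs but differ on rhs."""
    seen = {}
    for row in rows:
        x = tuple(row[a] for a in lhs)
        y = tuple(row[a] for a in rhs)
        # Remember the first Y value seen for this X value; a later mismatch refutes the FD.
        if seen.setdefault(x, y) != y:
            return False
    return True

# Hypothetical relation state (rows invented for illustration).
rows = [
    {"SSN": "111", "ENAME": "Ann", "PNUMBER": 1, "PNAME": "ProductX", "HOURS": 10},
    {"SSN": "111", "ENAME": "Ann", "PNUMBER": 2, "PNAME": "ProductY", "HOURS": 20},
    {"SSN": "222", "ENAME": "Bob", "PNUMBER": 1, "PNAME": "ProductX", "HOURS": 5},
]

print(fd_holds(rows, ["SSN"], ["ENAME"]))             # True
print(fd_holds(rows, ["PNUMBER"], ["PNAME"]))         # True
print(fd_holds(rows, ["SSN", "PNUMBER"], ["HOURS"]))  # True
print(fd_holds(rows, ["SSN"], ["HOURS"]))             # False
```

The last call fails because employee 111 appears with two different HOURS values, which is why HOURS needs both SSN and PNUMBER on the left-hand side.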

3 Normal Forms Based on Primary Keys


3.1 Normalization of Relations
3.2 Practical Use of Normal Forms
3.3 Definitions of Keys and Attributes Participating in Keys
3.4 First Normal Form
3.5 Second Normal Form
3.6 Third Normal Form

3.1 Normalization of Relations (1)


Normalization: The process of decomposing unsatisfactory "bad" relations by breaking up their attributes into smaller relations.
Normal form: A condition, stated in terms of the keys and FDs of a relation, used to certify whether a relation schema is in a particular normal form.

Normalization of Relations (2)


2NF, 3NF, and BCNF are based on the keys and FDs of a relation schema.
4NF is based on keys and multi-valued dependencies (MVDs); 5NF is based on keys and join dependencies (JDs).
Additional properties may be needed to ensure a good relational design (lossless join, dependency preservation).

3.2 Practical Use of Normal Forms

Normalization is carried out in practice so that the resulting designs are of high quality and meet the desirable properties.
The database designers need not normalize to the highest possible normal form (usually up to 3NF, BCNF or 4NF).
Denormalization: the process of storing the join of higher normal form relations as a base relation which is in a lower normal form.

3.4 First Normal Form

Disallows composite attributes, multivalued attributes, and nested relations (attributes whose values for an individual tuple are nonatomic).
Considered to be part of the definition of relation.

Figure 10.8 Normalization into 1NF


Figure: Normalization nested relations into 1NF


3.5 Second Normal Form (1)

Uses the concepts of FDs and primary key.
Definitions:
Prime attribute - an attribute that is a member of the primary key K
Full functional dependency - an FD Y -> Z where removal of any attribute from Y means the FD does not hold any more
Examples:
- {SSN, PNUMBER} -> HOURS is a full FD since neither SSN -> HOURS nor PNUMBER -> HOURS holds
- {SSN, PNUMBER} -> ENAME is not a full FD (it is called a partial dependency) since SSN -> ENAME also holds

Second Normal Form (2)

A relation schema R is in second normal form (2NF) if every non-prime attribute A in R is fully functionally dependent on the primary key.
R can be decomposed into 2NF relations via the process of 2NF normalization.
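One way to test the 2NF condition mechanically is to compute attribute closures of the proper subsets of the primary key. The closure routine below is the standard textbook technique rather than something shown on these slides; the function names are hypothetical, and the FD set is the one from the earlier examples (SSN -> ENAME, PNUMBER -> {PNAME, PLOCATION}, {SSN, PNUMBER} -> HOURS).

```python
from itertools import combinations

def closure(attrs, fds):
    """Attribute closure: every attribute functionally determined by attrs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

def partial_dependencies(key, attributes, fds):
    """Map each proper subset of the key to the non-prime attributes it determines."""
    non_prime = attributes - key
    found = {}
    for size in range(1, len(key)):
        for subset in combinations(sorted(key), size):
            hit = closure(subset, fds) & non_prime
            if hit:
                found[subset] = hit
    return found

F = [
    (frozenset({"SSN", "PNUMBER"}), frozenset({"HOURS"})),
    (frozenset({"SSN"}), frozenset({"ENAME"})),
    (frozenset({"PNUMBER"}), frozenset({"PNAME", "PLOCATION"})),
]
EMP_PROJ = {"SSN", "PNUMBER", "HOURS", "ENAME", "PNAME", "PLOCATION"}
KEY = {"SSN", "PNUMBER"}

violations = partial_dependencies(KEY, EMP_PROJ, F)
# ENAME depends on SSN alone; PNAME and PLOCATION depend on PNUMBER alone.
print(violations == {("PNUMBER",): {"PNAME", "PLOCATION"}, ("SSN",): {"ENAME"}})  # True

# Keeping only SSN, PNUMBER and HOURS leaves no partial dependency.
print(partial_dependencies(KEY, {"SSN", "PNUMBER", "HOURS"}, F))  # {}
```

Each violating subset suggests one relation of the 2NF decomposition: the subset becomes the key of a new relation holding the attributes it determines.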

The normalization process. (a) Normalizing EMP_PROJ into 2NF relations. (b) Normalizing EMP_DEPT into 3NF relations.


Normalization to 2NF and 3NF


3.6 Third Normal Form (1)

Definition: Transitive functional dependency - an FD X -> Z that can be derived from two FDs X -> Y and Y -> Z
Examples:
- SSN -> DMGRSSN is a transitive FD since SSN -> DNUMBER and DNUMBER -> DMGRSSN hold
- SSN -> ENAME is non-transitive since there is no set of attributes X where SSN -> X and X -> ENAME

Third Normal Form (2)


A relation schema R is in third normal form (3NF) if it is in 2NF and no non-prime attribute A in R is transitively dependent on the primary key.
R can be decomposed into 3NF relations via the process of 3NF normalization.
NOTE: In X -> Y and Y -> Z, with X as the primary key, we consider this a problem only if Y is not a candidate key. When Y is a candidate key, there is no problem with the transitive dependency. E.g., consider EMP (SSN, Emp#, Salary). Here, SSN -> Emp# -> Salary and Emp# is a candidate key.

4 General Normal Form Definitions (For Multiple Keys) (1)

The above definitions consider the primary key only.
The following more general definitions take into account relations with multiple candidate keys.
A relation schema R is in second normal form (2NF) if every non-prime attribute A in R is fully functionally dependent on every key of R.

General Normal Form Definitions (2)


Definition: Superkey of relation schema R - a set of attributes S of R that contains a key of R.
A relation schema R is in third normal form (3NF) if whenever a nontrivial FD X -> A holds in R, then either:
(a) X is a superkey of R, or
(b) A is a prime attribute of R
NOTE: Boyce-Codd normal form disallows condition (b) above.

5 BCNF (Boyce-Codd Normal Form)


A relation schema R is in Boyce-Codd Normal Form (BCNF) if whenever a nontrivial FD X -> A holds in R, then X is a superkey of R.
Each normal form is strictly stronger than the previous one:
- Every 2NF relation is in 1NF
- Every 3NF relation is in 2NF
- Every BCNF relation is in 3NF
There exist relations that are in 3NF but not in BCNF.
The goal is to have each relation in BCNF (or 3NF).
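The general 3NF and BCNF conditions can be checked mechanically for a small schema. The sketch below is one possible way to do it, with hypothetical function names: it finds candidate keys by brute force, derives the prime attributes, and then tests each listed FD. It examines only the FDs given, and it uses the TEACH relation with FDs {student, course} -> instructor and instructor -> course as input.

```python
from itertools import combinations

def closure(attrs, fds):
    """Attribute closure of attrs under the FDs (pairs of frozensets)."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

def candidate_keys(attributes, fds):
    """Minimal attribute sets whose closure is the whole schema (brute force)."""
    keys = []
    for size in range(1, len(attributes) + 1):
        for subset in combinations(sorted(attributes), size):
            candidate = set(subset)
            is_superkey = closure(candidate, fds) == attributes
            if is_superkey and not any(k <= candidate for k in keys):
                keys.append(candidate)
    return keys

def check_normal_forms(attributes, fds):
    """Return (candidate keys, in_3nf, in_bcnf) for the listed FDs."""
    keys = candidate_keys(attributes, fds)
    prime = set().union(*keys)
    in_3nf = in_bcnf = True
    for lhs, rhs in fds:
        lhs_is_superkey = closure(lhs, fds) == attributes
        for a in rhs - lhs:              # ignore the trivial part of an FD
            if not lhs_is_superkey:
                in_bcnf = False          # condition (a) fails
                if a not in prime:
                    in_3nf = False       # condition (b) fails as well
    return keys, in_3nf, in_bcnf

TEACH = {"student", "course", "instructor"}
F = [
    (frozenset({"student", "course"}), frozenset({"instructor"})),
    (frozenset({"instructor"}), frozenset({"course"})),
]

keys, in_3nf, in_bcnf = check_normal_forms(TEACH, F)
print(sorted(sorted(k) for k in keys))  # [['course', 'student'], ['instructor', 'student']]
print(in_3nf, in_bcnf)                  # True False
```

The FD instructor -> course is what separates the two normal forms here: instructor is not a superkey, but course is a prime attribute, so 3NF tolerates the FD while BCNF does not.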

Boyce-Codd normal form


Figure 10.13 a relation TEACH that is in 3NF but not in BCNF


Achieving the BCNF by Decomposition (1)


Two FDs exist in the relation TEACH:
fd1: {student, course} -> instructor
fd2: instructor -> course
{student, course} is a candidate key for this relation, and the dependencies shown follow the pattern in Figure 10.12 (b). So this relation is in 3NF but not in BCNF.
A relation NOT in BCNF should be decomposed so as to meet this property, while possibly forgoing the preservation of all functional dependencies in the decomposed relations. (See Algorithm 11.3)

Achieving the BCNF by Decomposition (2)


Three possible decompositions for relation TEACH:
1. {student, instructor} and {student, course}
2. {course, instructor} and {course, student}
3. {instructor, course} and {instructor, student}
All three decompositions will lose fd1. We have to settle for sacrificing the functional dependency preservation, but we cannot sacrifice the non-additivity property after decomposition.
Out of the above three, only the 3rd decomposition will not generate spurious tuples after join (and hence has the non-additivity property).
A test to determine whether a binary decomposition (decomposition into two relations) is nonadditive (lossless) is discussed in section 11.1.4 under Property LJ1. Verify that the third decomposition above meets the property.
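The verification can be sketched in code, assuming the usual statement of the binary test: a decomposition {R1, R2} is nonadditive if the common attributes R1 ∩ R2 functionally determine either R1 - R2 or R2 - R1. The function names are hypothetical, and the closure routine is the standard one rather than anything given on the slides.

```python
def closure(attrs, fds):
    """Attribute closure of attrs under the FDs (pairs of frozensets)."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

def is_lossless(r1, r2, fds):
    """Binary nonadditive-join test on the common attributes of r1 and r2."""
    determined = closure(r1 & r2, fds)
    return (r1 - r2) <= determined or (r2 - r1) <= determined

F = [
    (frozenset({"student", "course"}), frozenset({"instructor"})),  # fd1
    (frozenset({"instructor"}), frozenset({"course"})),             # fd2
]

decompositions = [
    ({"student", "instructor"}, {"student", "course"}),
    ({"course", "instructor"}, {"course", "student"}),
    ({"instructor", "course"}, {"instructor", "student"}),
]

results = [is_lossless(r1, r2, F) for r1, r2 in decompositions]
print(results)  # [False, False, True]
```

Only in the third decomposition do the shared attributes ({instructor}) determine one side's remaining attributes ({course}, by fd2), which matches the claim that it alone avoids spurious tuples.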

SUMMARY


Outcome and Operations of each normal form


Tables with multivalued attributes
  -> remove multivalued attributes -> 1NF
  -> remove partial dependencies -> 2NF
  -> remove transitive dependencies -> 3NF
  -> remove remaining anomalies resulting from functional dependencies -> BCNF
  -> remove multivalued dependencies -> 4NF
  -> remove remaining anomalies -> 5NF
