Surendra Singh Chahar Assistant Professor CS & IT Department SET, Sharda University
Surendra Singh Chahar Department of CS&IT, SU 1
Introduction
What is a Database?
A database is a collection of related data. Data are known facts that can be recorded and that have implicit meaning. A database has some source from which its data are derived and some degree of interaction with events in the real world.
An Example: University Database
Data about students, faculty, courses, research laboratories, course registration/enrollment, etc.
Reflects the state of affairs of the academic aspects of the university.
Purpose: to keep accurate track of the academic activities of the university.
DBMS Approach
DBMS: separation of data and metadata; flexibility in changing metadata; program-data independence.
Data access language: standardized (SQL); ad hoc queries are easy to formulate.
System development: less effort required; concentrating on logical-level design is enough; components to organize data storage.
Actors on the Scene
Database administrators: The DBA is responsible for authorizing access to the database, for coordinating and monitoring its use, and for acquiring software and hardware resources as required.
Database designers: Designers are responsible for identifying the data to be stored in the database and for choosing appropriate structures to represent and store this data.
End users: End users are the people whose jobs require access to the database for querying, updating, and generating reports; the database primarily exists for their use.
Casual end users: They access the database occasionally, but they may need different information each time.
Naive or parametric end users: Their work involves querying and updating the data. For example, bank tellers check account balances and post withdrawals and deposits.
Sophisticated end users: Engineers, scientists, business analysts, and others who thoroughly familiarize themselves with the facilities of the DBMS.
Stand-alone users: They maintain personal databases by using ready-made program packages that provide easy-to-use menu-based or graphics-based interfaces.
Chapter 2
Database System Concepts and Architecture
Data Models
Data Model: A set of concepts to describe the structure of a database, and certain constraints that the database should obey. Data Model Operations: Operations for specifying database retrievals and updates by referring to the concepts of the data model. Operations on the data model may include basic operations and user-defined operations.
Hierarchical Model
ADVANTAGES:
The hierarchical model is simple to construct and operate on.
Corresponds to a number of natural hierarchically organized domains, e.g., assemblies in manufacturing, personnel organization in companies.
Language is simple; uses constructs like GET, GET UNIQUE, GET NEXT, GET NEXT WITHIN PARENT, etc.
DISADVANTAGES:
Navigational and procedural nature of processing.
Database is visualized as a linear arrangement of records.
Little scope for "query optimization".
Network Model
ADVANTAGES:
The network model is able to model complex relationships and represents the semantics of add/delete on the relationships.
Can handle most situations for modeling using record types and relationship types.
Language is navigational; uses constructs like FIND, FIND member, FIND owner, FIND NEXT within set, GET, etc.
Programmers can do optimal navigation through the database.
DISADVANTAGES:
Navigational and procedural nature of processing.
Database contains a complex array of pointers that thread through a set of records.
Three-Schema Architecture
Proposed to support DBMS characteristics of:
Program-data independence. Support of multiple user views of the data. Use of a catalog to store the database description (schema).
Defines DBMS schemas at three levels:
Internal schema at the internal level to describe the physical storage structures of the database. The internal schema uses a physical data model and describes the complete details of data storage and access paths for the database.
Conceptual schema at the conceptual level to describe the structure and constraints of the whole database for a community of users. It concentrates on describing entities, data types, relationships, user operations, and constraints.
External schemas at the external level to describe the various user views. Usually uses the same data model as the conceptual level.
[Figure: The three-schema architecture. End users see external views at the external level; an external/conceptual mapping connects the external views to the conceptual schema at the conceptual level; a conceptual/internal mapping connects the conceptual schema to the internal schema at the internal level, which describes the stored database.]
Data Independence
It is defined as the capacity to change the schema at one level of a database system without having to change the schema at the next higher level. Logical Data Independence: The capacity to change the conceptual schema without having to change the external schemas and their application programs. Physical Data Independence: The capacity to change the internal schema without having to change the conceptual schema.
When a schema at a lower level is changed, only the mappings between this schema and higher-level schemas need to be changed in a DBMS that fully supports data independence. The higher-level schemas themselves are unchanged. Hence, the application programs need not be changed since they refer to the external schemas.
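Logical data independence can be illustrated with a small sketch. It assumes Python's standard sqlite3 module, with a view standing in for an external schema; the table, view, and column names (STUDENT, PERSON, STUDENT_NAMES, FULL_NAME) are hypothetical and are not taken from the slides. The conceptual schema is reorganized, only the view definition (the mapping) is rewritten, and the application query stays the same.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Conceptual schema, version 1: a single STUDENT table (hypothetical).
conn.execute("CREATE TABLE STUDENT (SSN TEXT PRIMARY KEY, NAME TEXT, MAJOR TEXT)")
conn.execute("INSERT INTO STUDENT VALUES ('111', 'Asha', 'CS')")

# External schema: the view that application programs query.
conn.execute("CREATE VIEW STUDENT_NAMES AS SELECT SSN, NAME FROM STUDENT")

# The application program refers only to the external schema.
APP_QUERY = "SELECT NAME FROM STUDENT_NAMES WHERE SSN = '111'"
before = conn.execute(APP_QUERY).fetchall()

# Conceptual schema, version 2: names move to a new PERSON table.
conn.execute("CREATE TABLE PERSON (SSN TEXT PRIMARY KEY, FULL_NAME TEXT)")
conn.execute("INSERT INTO PERSON SELECT SSN, NAME FROM STUDENT")
conn.execute("DROP VIEW STUDENT_NAMES")
conn.execute("DROP TABLE STUDENT")

# Only the external/conceptual mapping (the view definition) is rewritten.
conn.execute(
    "CREATE VIEW STUDENT_NAMES AS SELECT SSN, FULL_NAME AS NAME FROM PERSON"
)

# The unchanged application query still returns the same answer.
after = conn.execute(APP_QUERY).fetchall()
```

Here `before` and `after` hold the same rows even though the underlying tables changed, which is the behavior the definition of logical data independence asks for.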
DBMS Languages
Data Definition Language (DDL): Used by the DBA and database designers to specify the conceptual schema of a database. In many DBMSs, the DDL is also used to define internal and external schemas (views). In some DBMSs, separate storage definition language (SDL) and view definition language (VDL) are used to define internal and external schemas.
Data Manipulation Language (DML): Used to specify database retrievals and updates. DML commands (data sub-language) can be embedded in a general-purpose programming language (host language), such as COBOL, C or an Assembly Language. Alternatively, stand-alone DML commands can be applied directly (query language).
High-Level or Non-procedural Languages: e.g., SQL. They are set-oriented and specify what data to retrieve rather than how to retrieve it. Also called declarative languages.
Low-Level or Procedural Languages: record-at-a-time; they specify how to retrieve data and include constructs such as looping.
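The contrast between the two styles can be sketched as follows, assuming Python's sqlite3 module and a hypothetical two-column EMPLOYEE table with made-up rows. The first query states what is wanted and leaves the retrieval strategy to the DBMS; the second walks the records one at a time and filters them in a loop.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE EMPLOYEE (LNAME TEXT, SALARY INTEGER)")
conn.executemany(
    "INSERT INTO EMPLOYEE VALUES (?, ?)",
    [("Smith", 30000), ("Wong", 40000), ("Zelaya", 25000)],
)

# Declarative (set-oriented): the condition is stated, not the search procedure.
declarative = [
    row[0]
    for row in conn.execute(
        "SELECT LNAME FROM EMPLOYEE WHERE SALARY > 28000 ORDER BY LNAME"
    )
]

# Procedural (record-at-a-time): fetch each record and test it in a loop.
procedural = []
for lname, salary in conn.execute("SELECT LNAME, SALARY FROM EMPLOYEE"):
    if salary > 28000:
        procedural.append(lname)
procedural.sort()
```

Both lists contain the same names; the difference lies in who decides how the records are visited.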
DBMS Interfaces
User-friendly interfaces:
Menu-based, popular for browsing on the web
Forms-based, designed for naive users
Graphics-based (point and click, drag and drop, etc.)
Natural language: requests in written English
Combinations of the above
DBMS Languages
Once the design of a database is completed and a DBMS is chosen to implement the database, the first order of the day is to specify conceptual and internal schemas for the database and any mappings between the two. In many DBMSs where no strict separation of levels is maintained, one language, called the data definition language (DDL), is used by the DBA and by database designers to define both schemas. The DBMS will have a DDL compiler whose function is to process DDL statements in order to identify descriptions of the schema constructs and to store the schema description in the DBMS catalog.
In DBMSs where a clear separation is maintained between the conceptual and internal levels, the DDL is used to specify the conceptual schema only. Another language, the storage definition language (SDL), is used to specify the internal schema. The mappings between the two schemas may be specified in either one of these languages. For a true three-schema architecture, we would need a third language, the view definition language (VDL), to specify user views and their mappings to the conceptual schema, but in most DBMSs the DDL is used to define both conceptual and external schemas.
Once the database schemas are compiled and the database is populated with data, users must have some means to manipulate the database. Typical manipulations include retrieval, insertion, deletion, and modification of the data. The DBMS provides a set of operations or a language called the data manipulation language (DML) for these purposes.
Chapter 3
Data Modeling Using the Entity-Relationship (ER) Model
ER DIAGRAM OF COMPANY DATABASE
Requirements of the Company
The company is organized into DEPARTMENTs. Each department has a name, a number, and an employee who manages the department. We keep track of the start date of the department manager.
Each department controls a number of PROJECTs. Each project has a name and a number and is located at a single location.
We store each EMPLOYEE's social security number, address, salary, sex, and birthdate. Each employee works for one department but may work on several projects. We keep track of the number of hours per week that an employee currently works on each project. We also keep track of the direct supervisor of each employee.
Each employee may have a number of DEPENDENTs. For each dependent, we keep track of their name, sex, birthdate, and relationship to the employee.
ER Model Concepts
Entities and Attributes
Entities are specific objects or things in the mini-world that are represented in the database. For example, the EMPLOYEE John Smith, the Research DEPARTMENT, the ProductX PROJECT.
Attributes are properties used to describe an entity. For example, an EMPLOYEE entity may have a Name, SSN, Address, Sex, and BirthDate.
A specific entity will have a value for each of its attributes. For example, a specific employee entity may have Name='John Smith', SSN='123456789', Address='731, Fondren, Houston, TX', Sex='M', BirthDate='09-JAN-55'.
Each attribute has a value set (or data type) associated with it, e.g., integer, string, etc.
Composite
The attribute may be composed of several components. For example, Address (Apt#, House#, Street, City, State, ZipCode, Country) or Name (FirstName, MiddleName, LastName). Composition may form a hierarchy where some components are themselves composite.
Multi-valued
An entity may have multiple values for that attribute. For example, Color of a CAR or Previous Degrees of a STUDENT. Denoted as {Color} or {Previous Degrees}.
[Figure: ER diagram notation for relationship constraints. Labels shown: total participation of E2 in R; cardinality ratio 1:N for E1:E2 in R; structural constraint (min, max) on participation of E in R.]
[Figure: Example relationship instances of the WORKS_FOR relationship between EMPLOYEE and DEPARTMENT. Employees e1 to e7 are linked to departments d1 to d3 through relationship instances r1 to r7.]
[Figure: Example relationship instances of the WORKS_ON relationship between EMPLOYEE and PROJECT. Employees e1 to e7 are linked to projects p1 to p3 through relationship instances r1 to r9.]
[Figure: ER diagram of the COMPANY database, showing the entity types and the relationship types WORKS_FOR, MANAGES, WORKS_ON, CONTROLS, SUPERVISION, and DEPENDENTS_OF.]
Constraints on Relationships
Constraints on Relationship Types
Maximum cardinality (also known as ratio constraints):
One-to-one (1:1)
One-to-many (1:N) or many-to-one (N:1)
Many-to-many
[Figure: WORKS_FOR relationship instances r1 to r7 connected to departments d1 to d3.]
[Figure: WORKS_ON relationship instances r1 to r9 connected to projects p1 to p3.]
[Figure: Instances r1 to r6 of the recursive relationship type SUPERVISION; the participation role names (1 and 2) are shown.]
Attributes of Relationship Types
A relationship type can have attributes; for example, HoursPerWeek of WORKS_ON. Its value for each relationship instance describes the number of hours per week that an EMPLOYEE works on a PROJECT.
Participation constraint (on each participating entity type): total (called existence dependency) or partial. Total participation is shown by double-lining the link.
[Figure: (min, max) notation on the Works_for relationship between EMPLOYEE and DEPARTMENT, with the labels (1,1) and (1,N).]
Relationship types of degree 2 are called binary.
Relationship types of degree 3 are called ternary, and those of degree n are called n-ary.
In general, an n-ary relationship is not equivalent to n binary relationships.
Higher-order relationships are discussed further in Chapter 4.
POSITIVES: serves as documentation of application requirements; easy user interface, mostly with graphics editor support.
METHODOLOGY
Lack of built-in methodology support.
Poor tradeoff analysis or user-driven design preferences.
Poor design verification and suggestions for improvement.
Companies: Oracle; Popkin Software; Platinum Technology; Persistence Inc.; Rational; Rogue Ware; Resolution Ltd.; Sybase; Visio
Functionality: Mapping from O-O to relational model; Modeling in UML and application generation in C++ and Java; Mapping from O-O to relational model; Conceptual modeling up to code maintenance; Data modeling, business logic modeling; Data modeling, design and reengineering Visual Basic and Visual C++
The entity-relationship model in its original form did not support the specialization/generalization abstractions.
INFORMAL DEFINITIONS
RELATION: A table of values
A relation may be thought of as a set of rows. A relation may alternately be thought of as a set of columns. Each row represents a fact that corresponds to a real-world entity or relationship. Each row has a value of an item or set of items that uniquely identifies that row in the table. Each column typically is called by its column name or column header or attribute name.
FORMAL DEFINITIONS
A relation may be defined in multiple ways.
The schema of a relation: R(A1, A2, ..., An). Relation schema R is defined over attributes A1, A2, ..., An.
For example: CUSTOMER (Cust-id, Cust-name, Address, Phone#)
Here, CUSTOMER is a relation defined over the four attributes Cust-id, Cust-name, Address, Phone#, each of which has a domain or a set of valid values. For example, the domain of Cust-id is 6-digit numbers.
A tuple is an ordered set of values Each value is derived from an appropriate domain. Each row in the CUSTOMER table may be referred to as a tuple in the table and would consist of four values.
<632895, "John Smith", "101 Main St. Atlanta, GA 30332", "(404) 894-2000">
is a tuple belonging to the CUSTOMER relation. A relation may be regarded as a set of tuples (rows). Columns in a table are also called attributes of the relation.
A domain has a logical definition: e.g., USA_phone_numbers is the set of 10-digit phone numbers valid in the U.S.
A domain may have a data type or a format defined for it. The USA_phone_numbers may have a format: (ddd)-ddddddd, where each d is a decimal digit. Dates, for example, have various formats such as month name, date, year, or yyyy-mm-dd, or dd mm,yyyy, etc.
An attribute designates the role played by the domain. E.g., the domain Date may be used to define the attributes Invoice-date and Payment-date.
The relation is formed over the Cartesian product of the sets; each set has values from a domain, and that domain is used in a specific role, which is conveyed by the attribute name.
For example, attribute Cust-name is defined over the domain of strings of 25 characters. The role these strings play in the CUSTOMER relation is that of the name of customers.
Formally, given R(A1, A2, ..., An):
r(R) ⊆ dom(A1) × dom(A2) × ... × dom(An)
R: schema of the relation.
r of R: a specific "value" or population of R.
R is also called the intension of a relation; r is also called the extension of a relation.
Let S1 = {0,1} and S2 = {a,b,c}. Let R ⊆ S1 × S2. Then, for example, r(R) = {<0,a>, <0,b>, <1,c>} is one possible state (or population, or extension) r of the relation R, defined over domains S1 and S2. It has three tuples.
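The same example can be checked mechanically. The following is a minimal sketch in plain Python (the variable names are illustrative): it builds the Cartesian product of the two domains and confirms that the state r is a subset of it.

```python
from itertools import product

# The two domains from the example above.
S1 = {0, 1}
S2 = {"a", "b", "c"}

# Every possible tuple over S1 x S2.
cartesian = set(product(S1, S2))

# One possible state (extension) of the relation: three tuples.
r = {(0, "a"), (0, "b"), (1, "c")}

# A valid state is any subset of the Cartesian product.
is_valid_state = r <= cartesian
```

The product contains 2 × 3 = 6 tuples, so this relation has 2^6 possible states, of which `r` is one.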
DEFINITION SUMMARY
Informal Terms        Formal Terms
Table                 Relation
Column                Attribute/Domain
Row                   Tuple
Values in a column    Domain
Table Definition      Schema of a Relation
Populated Table       Extension
CHARACTERISTICS OF RELATIONS
Ordering of tuples in a relation r(R): The tuples are not considered to be ordered, even though they appear to be in the tabular form.
Ordering of attributes in a relation schema R (and of values within each tuple): We will consider the attributes in R(A1, A2, ..., An) and the values in t=<v1, v2, ..., vn> to be ordered. (However, a more general alternative definition of relation does not require this ordering.)
Values in a tuple: All values are considered atomic (indivisible). A special null value is used to represent values that are unknown or inapplicable to certain tuples.
Notation: We refer to component values of a tuple t by t[Ai] = vi (the value of attribute Ai for tuple t). Similarly, t[Au, Av, ..., Aw] refers to the subtuple of t containing the values of attributes Au, Av, ..., Aw, respectively.
Key Constraints
A key is an attribute or collection of attributes that may be used to identify or retrieve one or more records. There are many types of keys in an RDBMS:
Candidate Key
Primary Key
Foreign Key
Super Key
Alternate Key
Compound Key
Secondary Key
Candidate Key
A candidate key is any set of one or more columns whose combined values are unique among all tuples. A candidate key can be used to uniquely identify a record, and any candidate key can be selected as the primary key.
An attribute or a set of attributes is called a candidate key of the relation if it satisfies the following axioms:
1. Uniqueness: the attribute or the set of attributes uniquely identifies each tuple in the relation.
2. Minimality: if the key is a set of attributes, then no proper subset of these attributes has the uniqueness property.
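The two axioms can be sketched in plain Python. The helper names (`is_superkey`, `candidate_keys`) and the sample rows are hypothetical. Note the limitation: a key is a constraint on every valid state of a relation, so testing one sample state can only show which attribute sets behave like keys in that state.

```python
from itertools import combinations

def is_superkey(rows, attrs):
    """Uniqueness: no two rows agree on all attributes in attrs."""
    seen = set()
    for row in rows:
        key = tuple(row[a] for a in attrs)
        if key in seen:
            return False
        seen.add(key)
    return True

def candidate_keys(rows, attributes):
    """Return the minimal unique attribute sets, smallest first."""
    keys = []
    for size in range(1, len(attributes) + 1):
        for attrs in combinations(attributes, size):
            # Minimality: skip any set that contains a key already found.
            if any(set(k) <= set(attrs) for k in keys):
                continue
            if is_superkey(rows, attrs):
                keys.append(attrs)
    return keys

# Hypothetical sample state of an employee relation.
rows = [
    {"e_id": 1, "fname": "John", "lname": "Smith"},
    {"e_id": 2, "fname": "John", "lname": "Wong"},
    {"e_id": 3, "fname": "Alicia", "lname": "Smith"},
]
found = candidate_keys(rows, ["e_id", "fname", "lname"])
```

In this sample, `found` contains `("e_id",)` and `("fname", "lname")`. The second one is an accident of the data, which is why keys are declared by the designer rather than inferred from a populated table.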
Primary Key
The primary key of a relational table uniquely identifies each record in the table. It is a unique identifier, such as a driver license number, telephone number (including area code), or vehicle identification number (VIN). The PRIMARY KEY constraint cannot accept null values. Because PRIMARY KEY constraints guarantee unique data, they are frequently defined on an identity column. Primary keys for some of the sample tables are:
Table               Primary key
Employee table      EMPNO
Department table    DEPTNO
Project table       PROJNO
[Table fragment: project table with a PROJNAME column (one value shown: Linear Programming); the remaining columns and values are not recoverable.]
EMPLOYEE Table
EMPNO (Primary Key)    FIRSTNAME    LASTNAME    WORKDEPT (Foreign Key)    PHONENO
000010                                          A00
000030                                          C01
000060                                          D11
000120                                          A00
000140                                          C01
000170                                          D11
The criteria for selecting a primary key from a pool of candidate keys should be persistence, uniqueness, and stability: Persistence means that a primary key value for each row always exists. Uniqueness means that the key value for each row is different from all the others. Stability means that primary key values never change. Of the three candidate keys in the example, only EMPNO satisfies all of these criteria. An employee may not have a phone number when joining a company. Last names can change, and, although they may be unique at one point, are not guaranteed to be so. The employee number column is the best choice for the primary key. An employee is assigned a unique number only once, and that number is generally not updated as long as the employee remains with the company. Since each employee must have a number, values in the employee number column are persistent.
Foreign Keys
A foreign key is a reference to a key in another relation, i.e., the referencing tuple has, as one of its attributes, the values of a key in the referenced tuple. Foreign keys need not have unique values in the referencing relation.
A foreign key is a referential constraint between two tables. The foreign key identifies a column (or columns) in one (referencing) table that refers to a column (or columns) in another (referenced) table. The referenced column(s) must form the primary key or another candidate key of the referenced table.
Super Key
A super key is a column or set of columns (a combination of attributes) that uniquely identifies a row within a table.
Example: given the table EMPLOYEES{e_id, fname, lname, salary}, possible super keys are {e_id}, {e_id, fname}, ..., {e_id, fname, lname, salary}. Here, only the minimal super key {e_id} is considered a candidate key.
Integrity Constraints
The aim of data integrity is to specify rules that implicitly or explicitly define a consistent database state. The integrity of an RDBMS is based on certain rules proposed by E. F. Codd. Codd's rules are:
Codd's Rules
Rule 0: The system must qualify as relational, as a database, and as a management system. For a system to qualify as a relational database management system (RDBMS), that system must use its relational facilities (exclusively) to manage the database.
Rule 1: The information rule: All information in the database is to be represented in one and only one way, namely by values in column positions within rows of tables.
Rule 2: The guaranteed access rule: All data must be accessible. This rule is essentially a restatement of the fundamental requirement for primary keys. It says that every individual scalar value in the database must be logically addressable by specifying the name of the containing table, the name of the containing column, and the primary key value of the containing row.
Rule 3: Systematic treatment of null values: The DBMS must allow each field to remain null (or empty). Specifically, it must support a representation of "missing information and inapplicable information" that is systematic, distinct from all regular values (for example, "distinct from zero or any other number", in the case of numeric values), and independent of data type. It is also implied that such representations must be manipulated by the DBMS in a systematic way.
Rule 4: Active online catalog based on the relational model: The database description is represented at the logical level in the same way as the ordinary data, so that authorized users can apply the same relational language to its interrogation as they apply to the regular data.
Rule 5: The comprehensive data sub-language rule: The system must support at least one relational language that
1. Has a linear syntax,
2. Can be used both interactively and within application programs,
3. Supports data definition operations (including view definitions), data manipulation operations (update as well as retrieval), security and integrity constraints, and transaction management operations (begin, commit, and rollback).
Rule 6: The view updating rule: All views that are theoretically updatable must be updatable by the system.
Rule 7: High-level insert, update, and delete: The system must support set-at-a-time insert, update, and delete operators. This means that data can be retrieved from a relational database in sets constructed of data from multiple rows and/or multiple tables. This rule states that insert, update, and delete operations should be supported for any retrievable set rather than just for a single row in a single table.
Rule 8: Physical data independence: Changes to the physical level (how the data is stored, whether in arrays or linked lists, etc.) must not require a change to the conceptual level and external level.
Rule 9: Logical data independence: Changes to the logical level (tables, columns, rows, and so on) must not require a change to an application based on the structure. Logical data independence is more difficult to achieve than physical data independence.
Rule 10: Integrity independence: Integrity constraints must be specified separately from application programs and stored in the catalog. It must be possible to change such constraints as and when appropriate without unnecessarily affecting existing applications.
Rule 11: Distribution independence: The distribution of portions of the database to various locations should be invisible to users of the database. Existing applications should continue to operate successfully:
1. when a distributed version of the DBMS is first introduced; and
2. when existing distributed data are redistributed around the system.
Rule 12: The non-subversion rule: If a relational system has a low-level (single record at a time) language, that low level cannot be used to subvert or bypass the integrity rules and constraints expressed in the higher-level relational language (multiple records at a time).
Integrity Constraints
The integrity constraints are necessary to avoid situations like the following:
Some data has been inserted in the database but it cannot be identified, i.e., it is not clear which object or entity the data is about.
A student is enrolled in a course but no data about him or her is available in the relation that has information about students.
During query processing, a student is compared with a course number.
A student quits the university and is removed from the student relation but is still enrolled in a course.
Constraints are not formally part of the relational model, but they are included because of the integrity role that they play in organizing data.
Integrity Rules
The following are the two integrity rules to be satisfied by any relation:
Entity Integrity: The primary key cannot be null.
Referential Integrity: The database must not contain any unmatched foreign key values.
Entity Integrity
Relational Database Schema: A set S of relation schemas that belong to the same database. S is the name of the database.
S = {R1, R2, ..., Rn}
Entity Integrity: The primary key attributes PK of each relation schema R in S cannot have null values in any tuple of r(R). This is because primary key values are used to identify the individual tuples.
t[PK] ≠ null for any tuple t in r(R)
Note: Other attributes of R may be similarly constrained to disallow null values, even though they are not members of the primary key.
Referential Integrity
A constraint involving two relations (the previous constraints involve a single relation). Used to specify a relationship among tuples in two relations: the referencing relation and the referenced relation. Tuples in the referencing relation R1 have attributes FK (called foreign key attributes) that reference the primary key attributes PK of the referenced relation R2. A tuple t1 in R1 is said to reference a tuple t2 in R2 if t1[FK] = t2[PK]. A referential integrity constraint can be displayed in a relational database schema as a directed arc from R1.FK to R2.
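Both integrity rules can be observed in a running system. The sketch below assumes Python's sqlite3 module and two hypothetical minimal tables. Two SQLite-specific details are assumptions of the example rather than part of the relational model: foreign keys are enforced only after `PRAGMA foreign_keys = ON`, and NOT NULL is written explicitly on the primary key columns.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite-specific: enable FK checks

# Referenced relation R2 and referencing relation R1 (hypothetical columns).
conn.execute(
    "CREATE TABLE DEPARTMENT (DNUMBER INTEGER NOT NULL PRIMARY KEY, DNAME TEXT)"
)
conn.execute(
    """CREATE TABLE EMPLOYEE (
           SSN TEXT NOT NULL PRIMARY KEY,
           DNO INTEGER REFERENCES DEPARTMENT(DNUMBER))"""
)
conn.execute("INSERT INTO DEPARTMENT VALUES (5, 'Research')")
conn.execute("INSERT INTO EMPLOYEE VALUES ('123456789', 5)")

def violates(sql):
    """Return True if the statement is rejected by an integrity constraint."""
    try:
        conn.execute(sql)
        return False
    except sqlite3.IntegrityError:
        return True

# Entity integrity: a null primary key is rejected.
null_pk_rejected = violates("INSERT INTO EMPLOYEE VALUES (NULL, 5)")

# Referential integrity: department 7 does not exist, so t1[FK] matches no t2[PK].
unmatched_fk_rejected = violates("INSERT INTO EMPLOYEE VALUES ('999999999', 7)")

# A null foreign key references nothing and is accepted.
null_fk_allowed = not violates("INSERT INTO EMPLOYEE VALUES ('888888888', NULL)")
```

The last statement shows that a null foreign key does not count as an unmatched value.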
In-Class Exercise
Consider the following relations for a database that keeps track of student enrollment in courses and the books adopted for each course:
STUDENT(SSN, Name, Major, Bdate)
COURSE(Course#, Cname, Dept)
ENROLL(SSN, Course#, Quarter, Grade)
BOOK_ADOPTION(Course#, Quarter, Book_ISBN)
TEXT(Book_ISBN, Book_Title, Publisher, Author)
Draw a relational schema diagram specifying the foreign keys for this schema.
Introduction to SQL
What is SQL?
When a user wants to get some information from a database file, he or she can issue a query. A query is a user request to retrieve data or information that satisfies a certain condition. SQL is a query language that allows the user to specify conditions (instead of algorithms).
Concept of SQL
The user specifies a certain condition.
The program goes through all the records in the database file and selects those records that satisfy the condition (searching).
Statistical information about the data can also be obtained.
The result of the query is then stored in the form of a table.
CREATE TABLE
Specifies a new base relation by giving it a name and specifying each of its attributes and their data types (INTEGER, FLOAT, DECIMAL(i,j), CHAR(n), VARCHAR(n)).
A constraint NOT NULL may be specified on an attribute.
CREATE TABLE DEPARTMENT (
    DNAME         VARCHAR(10) NOT NULL,
    DNUMBER       INTEGER     NOT NULL,
    MGRSSN        CHAR(9),
    MGRSTARTDATE  CHAR(9)
);
In SQL2, the CREATE TABLE command can be used to specify the primary key attributes, secondary keys, and referential integrity constraints (foreign keys).
Key attributes can be specified via the PRIMARY KEY and UNIQUE phrases.
CREATE TABLE DEPT (
    DNAME         VARCHAR(10) NOT NULL,
    DNUMBER       INTEGER     NOT NULL,
    MGRSSN        CHAR(9),
    MGRSTARTDATE  CHAR(9),
    PRIMARY KEY (DNUMBER),
    UNIQUE (DNAME),
    FOREIGN KEY (MGRSSN) REFERENCES EMP
);
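To see the PRIMARY KEY and UNIQUE phrases at work, the DEPT definition can be run as written. This sketch assumes Python's sqlite3 module; the one-column EMP table and the sample rows are hypothetical stand-ins so that the foreign key has something to reference.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical minimal EMP table, only so that DEPT can reference it.
conn.execute("CREATE TABLE EMP (SSN CHAR(9) NOT NULL PRIMARY KEY)")
conn.execute(
    """CREATE TABLE DEPT (
           DNAME         VARCHAR(10) NOT NULL,
           DNUMBER       INTEGER     NOT NULL,
           MGRSSN        CHAR(9),
           MGRSTARTDATE  CHAR(9),
           PRIMARY KEY (DNUMBER),
           UNIQUE (DNAME),
           FOREIGN KEY (MGRSSN) REFERENCES EMP
       )"""
)
conn.execute("INSERT INTO EMP VALUES ('333445555')")
conn.execute("INSERT INTO DEPT VALUES ('Research', 5, '333445555', '22-MAY-88')")

def rejected(sql):
    """Return True if the statement violates a declared constraint."""
    try:
        conn.execute(sql)
        return False
    except sqlite3.IntegrityError:
        return True

# Same DNUMBER as an existing row: violates PRIMARY KEY (DNUMBER).
duplicate_number = rejected("INSERT INTO DEPT VALUES ('Sales', 5, NULL, NULL)")

# Same DNAME as an existing row: violates UNIQUE (DNAME).
duplicate_name = rejected("INSERT INTO DEPT VALUES ('Research', 6, NULL, NULL)")

# Distinct DNUMBER and DNAME: accepted.
new_department = not rejected("INSERT INTO DEPT VALUES ('Sales', 6, NULL, NULL)")
```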
DROP TABLE
Used to remove a relation (base table) and its definition.
The relation can no longer be used in queries, updates, or any other commands since its description no longer exists.
Example: DROP TABLE DEPENDENT;
ALTER TABLE
Used to add an attribute to one of the base relations.
The new attribute will have NULLs in all the tuples of the relation right after the command is executed; hence, the NOT NULL constraint is not allowed for such an attribute.
Example:
ALTER TABLE EMPLOYEE ADD JOB VARCHAR(12);
The database users must still enter a value for the new attribute JOB for each EMPLOYEE tuple. This can be done using the UPDATE command.
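The ALTER TABLE followed by UPDATE sequence can be traced with a short sketch, assuming Python's sqlite3 module and a hypothetical two-row EMPLOYEE table. The job title 'Engineer' is made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE EMPLOYEE (SSN CHAR(9) PRIMARY KEY, LNAME VARCHAR(15))")
conn.executemany(
    "INSERT INTO EMPLOYEE VALUES (?, ?)",
    [("123456789", "Smith"), ("333445555", "Wong")],
)

# Add the new attribute; every existing tuple gets NULL for it.
conn.execute("ALTER TABLE EMPLOYEE ADD JOB VARCHAR(12)")
after_alter = conn.execute("SELECT JOB FROM EMPLOYEE ORDER BY SSN").fetchall()

# Values are then filled in with UPDATE, here for one employee only.
conn.execute("UPDATE EMPLOYEE SET JOB = 'Engineer' WHERE SSN = '123456789'")
after_update = conn.execute(
    "SELECT LNAME, JOB FROM EMPLOYEE ORDER BY SSN"
).fetchall()
```

Right after the ALTER TABLE, `after_alter` holds only NULLs (Python `None`); after the UPDATE, the row that was not updated still has a NULL in JOB.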
CREATE SCHEMA
Specifies a new database schema by giving it a name
TIME:
Made up of hour:minute:second in the format hh:mm:ss
TIME(i):
Made up of hour:minute:second plus i additional digits specifying fractions of a second; the format is hh:mm:ss:ii...i
TIMESTAMP:
Has both DATE and TIME components
<attribute list> is a list of attribute names whose values are to be retrieved by the query.
<table list> is a list of the relation names required to process the query.
<condition> is a conditional (Boolean) expression that identifies the tuples to be retrieved by the query.
Populated Database
In Q2, there are two join conditions.
The join condition DNUM=DNUMBER relates a project to its controlling department.
The join condition MGRSSN=SSN relates the controlling department to the employee who manages that department.
ALIASES
Some queries need to refer to the same relation twice. In this case, aliases are given to the relation name.
Query 8: For each employee, retrieve the employee's name, and the name of his or her immediate supervisor.
Q8: SELECT E.FNAME, E.LNAME, S.FNAME, S.LNAME
    FROM EMPLOYEE E, EMPLOYEE S
    WHERE E.SUPERSSN=S.SSN
In Q8, the alternate relation names E and S are called aliases or tuple variables for the EMPLOYEE relation.
We can think of E and S as two different copies of EMPLOYEE; E represents employees in the role of supervisees and S represents employees in the role of supervisors.
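Q8 can be tried out on a small population. This is a sketch assuming Python's sqlite3 module and three hypothetical EMPLOYEE rows; an ORDER BY clause is added only to make the output order predictable.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE EMPLOYEE (FNAME TEXT, LNAME TEXT, SSN TEXT, SUPERSSN TEXT)"
)
conn.executemany(
    "INSERT INTO EMPLOYEE VALUES (?, ?, ?, ?)",
    [
        ("John", "Smith", "123456789", "333445555"),
        ("Franklin", "Wong", "333445555", "888665555"),
        ("James", "Borg", "888665555", None),  # no supervisor
    ],
)

# E plays the supervisee role, S the supervisor role.
q8 = conn.execute(
    """SELECT E.FNAME, E.LNAME, S.FNAME, S.LNAME
       FROM EMPLOYEE E, EMPLOYEE S
       WHERE E.SUPERSSN = S.SSN
       ORDER BY E.LNAME"""
).fetchall()
```

The employee with a NULL SUPERSSN matches no S tuple, so that employee does not appear on the supervisee side of the result.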
ALIASES (cont.)
Aliasing can also be used in any SQL query for convenience Can also use the AS keyword to specify aliases Q8: SELECT FROM WHERE E.FNAME, E.LNAME, S.FNAME, S.LNAME EMPLOYEE AS E, EMPLOYEE AS S E.SUPERSSN=S.SSN
UNSPECIFIED WHERE-clause
A missing WHERE-clause indicates no condition; hence, all tuples of the relations in the FROM-clause are selected
This is equivalent to the condition WHERE TRUE
Query 9: Retrieve the SSN values for all employees.
Q9: SELECT SSN
    FROM EMPLOYEE
If more than one relation is specified in the FROM-clause and there is no join condition, then the CARTESIAN PRODUCT of tuples is selected
It is extremely important not to overlook specifying any selection and join conditions in the WHERE-clause; otherwise, incorrect and very large relations may result
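To make the size of such a result concrete, here is a small sketch with hypothetical sample rows: two EMPLOYEE tuples and three DEPARTMENT tuples, queried without any join condition.

```sql
-- Assumed cut-down tables with hypothetical rows.
CREATE TABLE EMPLOYEE   (SSN CHAR(9) PRIMARY KEY);
CREATE TABLE DEPARTMENT (DNUMBER INTEGER PRIMARY KEY, DNAME VARCHAR(15));
INSERT INTO EMPLOYEE VALUES ('123456789');
INSERT INTO EMPLOYEE VALUES ('333445555');
INSERT INTO DEPARTMENT VALUES (1, 'Headquarters');
INSERT INTO DEPARTMENT VALUES (4, 'Administration');
INSERT INTO DEPARTMENT VALUES (5, 'Research');

-- No WHERE-clause: every EMPLOYEE tuple is paired with every DEPARTMENT tuple,
-- so the result has 2 x 3 = 6 rows.
SELECT SSN, DNAME
FROM   EMPLOYEE, DEPARTMENT;
```

With realistic table sizes the product grows multiplicatively, which is why a forgotten join condition produces very large and meaningless results.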
USE OF *
To retrieve all the attribute values of the selected tuples, a * is used, which stands for all the attributes
Examples:
Q1C: SELECT *
     FROM EMPLOYEE
     WHERE DNO=5
Q1D: SELECT *
     FROM EMPLOYEE, DEPARTMENT
     WHERE DNAME='Research' AND DNO=DNUMBER
USE OF DISTINCT
SQL does not treat a relation as a set; duplicate tuples can appear
To eliminate duplicate tuples in a query result, the keyword DISTINCT is used
For example, the result of Q11 may have duplicate SALARY values whereas Q11A does not have any duplicate values
Q11: SELECT SALARY
     FROM EMPLOYEE
Q11A: SELECT DISTINCT SALARY
      FROM EMPLOYEE
SET OPERATIONS
SQL has directly incorporated some set operations
There is a union operation (UNION), and in some versions of SQL there are set difference (MINUS, called EXCEPT in the SQL standard) and intersection (INTERSECT) operations
The resulting relations of these set operations are sets of tuples; duplicate tuples are eliminated from the result
The set operations apply only to union compatible relations; the two relations must have the same attributes and the attributes must appear in the same order
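A minimal sketch of UNION over two union compatible results, assuming cut-down PROJECT and WORKS_ON tables; the department number and SSN used in the conditions are hypothetical. Each SELECT returns a single integer column, and a project number that qualifies in both parts appears only once.

```sql
-- Assumed subsets of PROJECT and WORKS_ON.
CREATE TABLE PROJECT  (PNUMBER INTEGER PRIMARY KEY, DNUM INTEGER);
CREATE TABLE WORKS_ON (ESSN CHAR(9), PNO INTEGER);

-- Project numbers controlled by department 5,
-- together with project numbers that employee '123456789' works on.
SELECT PNUMBER FROM PROJECT  WHERE DNUM = 5
UNION
SELECT PNO     FROM WORKS_ON WHERE ESSN = '123456789';
```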
NESTING OF QUERIES
A complete SELECT query, called a nested query, can be specified within the WHERE-clause of another query, called the outer query
Many of the previous queries can be specified in an alternative form using nesting
Query 1: Retrieve the name and address of all employees who work for the 'Research' department.
Q1: SELECT FNAME, LNAME, ADDRESS
    FROM EMPLOYEE
    WHERE DNO IN (SELECT DNUMBER
                  FROM DEPARTMENT
                  WHERE DNAME='Research')
The original SQL as specified for SYSTEM R also had a CONTAINS comparison operator, which is used in conjunction with nested correlated queries
This operator was dropped from the language, possibly because of the difficulty in implementing it efficiently
In Q6, the correlated nested query retrieves all DEPENDENT tuples related to an EMPLOYEE tuple. If none exist, the EMPLOYEE tuple is selected
EXISTS is necessary for the expressive power of SQL
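The text of Q6 is not reproduced on this slide. A hedged sketch of a query that behaves as described (select an EMPLOYEE tuple only when no related DEPENDENT tuple exists) is given below; the cut-down table definitions are assumptions.

```sql
-- Assumed subsets of EMPLOYEE and DEPENDENT.
CREATE TABLE EMPLOYEE  (SSN CHAR(9) PRIMARY KEY, FNAME VARCHAR(15), LNAME VARCHAR(15));
CREATE TABLE DEPENDENT (ESSN CHAR(9), DEPENDENT_NAME VARCHAR(15));

-- The inner query is evaluated once per EMPLOYEE tuple (it refers to the outer SSN).
-- NOT EXISTS is true when the inner query returns no tuples.
SELECT FNAME, LNAME
FROM   EMPLOYEE
WHERE  NOT EXISTS (SELECT *
                   FROM   DEPENDENT
                   WHERE  SSN = ESSN);
```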
EXPLICIT SETS
It is also possible to use an explicit (enumerated) set of values in the WHERE-clause rather than a nested query
Query 13: Retrieve the social security numbers of all employees who work on project number 1, 2, or 3.
Q13: SELECT DISTINCT ESSN
     FROM WORKS_ON
     WHERE PNO IN (1, 2, 3)
AGGREGATE FUNCTIONS
Include COUNT, SUM, MAX, MIN, and AVG
Query 15: Find the maximum salary, the minimum salary, and the average salary among all employees.
Q15: SELECT MAX(SALARY), MIN(SALARY), AVG(SALARY)
     FROM EMPLOYEE
Some SQL implementations may not allow more than one function in the SELECT-clause
GROUPING
In many cases, we want to apply the aggregate functions to subgroups of tuples in a relation
Each subgroup of tuples consists of the set of tuples that have the same value for the grouping attribute(s)
The function is applied to each subgroup independently
SQL has a GROUP BY-clause for specifying the grouping attributes, which must also appear in the SELECT-clause
GROUPING (cont.)
Query 20: For each department, retrieve the department number, the number of employees in the department, and their average salary.
Q20: SELECT DNO, COUNT (*), AVG (SALARY)
     FROM EMPLOYEE
     GROUP BY DNO
In Q20, the EMPLOYEE tuples are divided into groups, each group having the same value for the grouping attribute DNO
The COUNT and AVG functions are applied to each such group of tuples separately
The SELECT-clause includes only the grouping attribute and the functions to be applied on each group of tuples
A join condition can be used in conjunction with grouping
GROUPING (cont.)
Query 21: For each project, retrieve the project number, project name, and the number of employees who work on that project.
Q21: SELECT PNUMBER, PNAME, COUNT (*)
     FROM PROJECT, WORKS_ON
     WHERE PNUMBER=PNO
     GROUP BY PNUMBER, PNAME
In this case, the grouping and functions are applied after the joining of the two relations
THE HAVING-CLAUSE
Sometimes we want to retrieve the values of these functions for only those groups that satisfy certain conditions
The HAVING-clause is used for specifying a selection condition on groups (rather than on individual tuples)
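As an illustrative sketch, the grouping query over PROJECT and WORKS_ON can be restricted to projects with more than two employees by adding a HAVING-clause. The cut-down table definitions and the threshold of two are assumptions.

```sql
-- Assumed subsets of PROJECT and WORKS_ON.
CREATE TABLE PROJECT  (PNUMBER INTEGER PRIMARY KEY, PNAME VARCHAR(15));
CREATE TABLE WORKS_ON (ESSN CHAR(9), PNO INTEGER);

-- WHERE filters individual joined tuples before grouping;
-- HAVING filters whole groups after COUNT(*) has been computed.
SELECT   PNUMBER, PNAME, COUNT(*)
FROM     PROJECT, WORKS_ON
WHERE    PNUMBER = PNO
GROUP BY PNUMBER, PNAME
HAVING   COUNT(*) > 2;
```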
SUBSTRING COMPARISON
The LIKE comparison operator is used to compare partial strings
Two reserved characters are used: '%' (or '*' in some implementations) replaces an arbitrary number of characters, and '_' replaces a single arbitrary character
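Example pattern: '_______5_' (seven underscores, then 5, then one underscore) matches any nine-character string whose eighth character is 5, such as a date written as '30-DEC-52'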
The LIKE operator allows us to get around the fact that each value is considered atomic and indivisible; hence, in SQL, character string attribute values are not atomic
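A short sketch of both wildcard characters, assuming a cut-down EMPLOYEE table in which BDATE is stored as a nine-character string such as '30-DEC-52'; the search strings are illustrative.

```sql
-- Assumed subset of EMPLOYEE; BDATE is kept as a string for this sketch.
CREATE TABLE EMPLOYEE (
    SSN     CHAR(9) PRIMARY KEY,
    FNAME   VARCHAR(15),
    LNAME   VARCHAR(15),
    BDATE   CHAR(9),
    ADDRESS VARCHAR(30)
);

-- '%' matches any number of characters: addresses that contain 'Houston,TX'.
SELECT FNAME, LNAME
FROM   EMPLOYEE
WHERE  ADDRESS LIKE '%Houston,TX%';

-- '_' matches exactly one character: strings whose eighth character is '5'.
SELECT FNAME, LNAME
FROM   EMPLOYEE
WHERE  BDATE LIKE '_______5_';
```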
ARITHMETIC OPERATIONS
The standard arithmetic operators '+', '-', '*', and '/' (for addition, subtraction, multiplication, and division, respectively) can be applied to numeric values in an SQL query result
Query 27: Show the effect of giving all employees who work on the 'ProductX' project a 10% raise.
Q27: SELECT FNAME, LNAME, 1.1*SALARY
     FROM EMPLOYEE, WORKS_ON, PROJECT
     WHERE SSN=ESSN AND PNO=PNUMBER AND PNAME='ProductX'
ORDER BY
The ORDER BY clause is used to sort the tuples in a query result based on the values of some attribute(s)
Query 28: Retrieve a list of employees and the projects each works in, ordered by the employee's department, and within each department ordered alphabetically by employee last name.
Q28: SELECT DNAME, LNAME, FNAME, PNAME
     FROM DEPARTMENT, EMPLOYEE, WORKS_ON, PROJECT
     WHERE DNUMBER=DNO AND SSN=ESSN AND PNO=PNUMBER
     ORDER BY DNAME, LNAME
ORDER BY (cont.)
The default order is in ascending order of values
We can specify the keyword DESC if we want a descending order; the keyword ASC can be used to explicitly specify ascending order, even though it is the default
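A small sketch mixing both keywords, assuming a cut-down EMPLOYEE table: departments are listed from the highest number down, and last names within each department are in ascending order.

```sql
-- Assumed subset of EMPLOYEE.
CREATE TABLE EMPLOYEE (
    SSN   CHAR(9) PRIMARY KEY,
    LNAME VARCHAR(15),
    DNO   INTEGER
);

-- DESC applies only to DNO; ASC on LNAME is written out although it is the default.
SELECT   DNO, LNAME
FROM     EMPLOYEE
ORDER BY DNO DESC, LNAME ASC;
```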
INSERT
In its simplest form, it is used to add one or more tuples to a relation
Attribute values should be listed in the same order as the attributes were specified in the CREATE TABLE command
INSERT (cont.)
Example:
U1: INSERT INTO EMPLOYEE
    VALUES ('Richard','K','Marini', '653298653', '30-DEC-52', '98 Oak Forest,Katy,TX', 'M', 37000,'987654321', 4 )
An alternate form of INSERT specifies explicitly the attribute names that correspond to the values in the new tuple
Attributes with NULL values can be left out
Example: Insert a tuple for a new EMPLOYEE for whom we only know the FNAME, LNAME, and SSN attributes.
U1A: INSERT INTO EMPLOYEE (FNAME, LNAME, SSN)
     VALUES ('Richard', 'Marini', '653298653')
INSERT (cont.)
Important Note: Only the constraints specified in the DDL commands are automatically enforced by the DBMS when updates are applied to the database
Another variation of INSERT allows insertion of multiple tuples resulting from a query into a relation
INSERT (cont.)
Example: Suppose we want to create a temporary table that has the name, number of employees, and total salaries for each department. A table DEPTS_INFO is created by U3A, and is loaded with the summary information retrieved from the database by the query in U3B.
U3A: CREATE TABLE DEPTS_INFO
     (DEPT_NAME VARCHAR(10),
      NO_OF_EMPS INTEGER,
      TOTAL_SAL INTEGER);
U3B: INSERT INTO DEPTS_INFO (DEPT_NAME, NO_OF_EMPS, TOTAL_SAL)
     SELECT DNAME, COUNT (*), SUM (SALARY)
     FROM DEPARTMENT, EMPLOYEE
     WHERE DNUMBER=DNO
     GROUP BY DNAME ;
INSERT (cont.)
Note: The DEPTS_INFO table may not be up-to-date if we change the tuples in either the DEPARTMENT or the EMPLOYEE relations after issuing U3B. We have to create a view (see later) to keep such a table up to date.
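As a hedged sketch of the view-based alternative mentioned in the note, the same summary can be defined as a view, so that it is computed from the current DEPARTMENT and EMPLOYEE tuples whenever it is queried. The view name DEPTS_INFO_V and the cut-down base tables are assumptions.

```sql
-- Assumed subsets of the base relations.
CREATE TABLE DEPARTMENT (DNUMBER INTEGER PRIMARY KEY, DNAME VARCHAR(15));
CREATE TABLE EMPLOYEE   (SSN CHAR(9) PRIMARY KEY, SALARY INTEGER, DNO INTEGER);

-- Hypothetical view name; the defining query is the SELECT part of U3B.
CREATE VIEW DEPTS_INFO_V (DEPT_NAME, NO_OF_EMPS, TOTAL_SAL) AS
SELECT   DNAME, COUNT(*), SUM(SALARY)
FROM     DEPARTMENT, EMPLOYEE
WHERE    DNUMBER = DNO
GROUP BY DNAME;

-- The view is queried like a table and always reflects the base relations.
SELECT * FROM DEPTS_INFO_V;
```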
DELETE
Removes tuples from a relation
Includes a WHERE-clause to select the tuples to be deleted
Tuples are deleted from only one table at a time (unless CASCADE is specified on a referential integrity constraint)
A missing WHERE-clause specifies that all tuples in the relation are to be deleted; the table then becomes an empty table
The number of tuples deleted depends on the number of tuples in the relation that satisfy the WHERE-clause
Referential integrity should be enforced
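The CASCADE case mentioned above can be sketched as follows, assuming cut-down EMPLOYEE and DEPENDENT tables with hypothetical rows. The foreign key is declared with ON DELETE CASCADE, so deleting an employee also removes the dependents that reference it.

```sql
-- Assumed subsets of EMPLOYEE and DEPENDENT.
-- Note: in SQLite, foreign keys are enforced only after PRAGMA foreign_keys = ON.
CREATE TABLE EMPLOYEE (
    SSN   CHAR(9) PRIMARY KEY,
    LNAME VARCHAR(15)
);
CREATE TABLE DEPENDENT (
    ESSN           CHAR(9) REFERENCES EMPLOYEE (SSN) ON DELETE CASCADE,
    DEPENDENT_NAME VARCHAR(15),
    PRIMARY KEY (ESSN, DEPENDENT_NAME)
);
INSERT INTO EMPLOYEE  VALUES ('123456789', 'Smith');   -- hypothetical rows
INSERT INTO DEPENDENT VALUES ('123456789', 'Alice');

-- One DELETE on EMPLOYEE; the matching DEPENDENT tuple is removed by the cascade.
DELETE FROM EMPLOYEE
WHERE  SSN = '123456789';
```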
DELETE (cont.)
Examples:
U4A: DELETE FROM EMPLOYEE
     WHERE LNAME='Brown'
U4B: DELETE FROM EMPLOYEE
     WHERE SSN='123456789'
U4C: DELETE FROM EMPLOYEE
     WHERE DNO IN (SELECT DNUMBER
                   FROM DEPARTMENT
                   WHERE DNAME='Research')
U4D: DELETE FROM EMPLOYEE
UPDATE
Used to modify attribute values of one or more selected tuples
A WHERE-clause selects the tuples to be modified
An additional SET-clause specifies the attributes to be modified and their new values
Each command modifies tuples in the same relation
Referential integrity should be enforced
UPDATE (cont.)
Example: Change the location and controlling department number of project number 10 to 'Bellaire' and 5, respectively.
U5: UPDATE PROJECT
    SET PLOCATION = 'Bellaire', DNUM = 5
    WHERE PNUMBER=10
UPDATE (cont.)
Example: Give all employees in the 'Research' department a 10% raise in salary.
U6: UPDATE EMPLOYEE
    SET SALARY = SALARY *1.1
    WHERE DNO IN (SELECT DNUMBER
                  FROM DEPARTMENT
                  WHERE DNAME='Research')
In this request, the modified SALARY value depends on the original SALARY value in each tuple
The reference to the SALARY attribute on the right of = refers to the old SALARY value before modification
The reference to the SALARY attribute on the left of = refers to the new SALARY value after modification
Chapter Outline
1 Informal Design Guidelines for Relational Databases
1.1 Semantics of the Relation Attributes
1.2 Redundant Information in Tuples and Update Anomalies
1.3 Null Values in Tuples
1.4 Spurious Tuples
2 Functional Dependencies (FDs)
2.1 Definition of FD
2.2 Inference Rules for FDs
2.3 Equivalence of Sets of FDs
2.4 Minimal Sets of FDs
Chapter Outline(contd.)
3 Normal Forms Based on Primary Keys
3.1 Normalization of Relations
3.2 Practical Use of Normal Forms
3.3 Definitions of Keys and Attributes Participating in Keys
3.4 First Normal Form
3.5 Second Normal Form
3.6 Third Normal Form
4 General Normal Form Definitions (For Multiple Keys)
5 BCNF (Boyce-Codd Normal Form)
Semantics of the Relation Attributes
GUIDELINE 1: Informally, each tuple in a relation should represent one entity or relationship instance. (Applies to individual relations and their attributes.)
Attributes of different entities (EMPLOYEEs, DEPARTMENTs, PROJECTs) should not be mixed in the same relation
Only foreign keys should be used to refer to other entities
Entity and relationship attributes should be kept apart as much as possible
Update Anomaly: Changing the name of project number P1 from Billing to Customer-Accounting may cause this update to be made for all 100 employees working on project P1.
2.1 Functional Dependencies (1)
Functional dependency is a constraint between two sets of attributes from the database
FDs and keys are used to define normal forms for relations
FDs are constraints that are derived from the meaning and interrelationships of the data attributes
Functional Dependency
Definition. A functional dependency, denoted by X -> Y, between two sets of attributes X and Y that are subsets of R specifies a constraint on the possible tuples that can form a relation state r of R. The constraint is that, for any two tuples t1 and t2 in r that have t1[X] = t2[X], they must also have t1[Y] = t2[Y]. This means that the values of the Y component of a tuple in r depend on, or are determined by, the values of the X component; alternatively, the values of the X component of a tuple uniquely (or functionally) determine the values of the Y component. We also say that there is a functional dependency from X to Y, or that Y is functionally dependent on X. The abbreviation for functional dependency is FD or f.d. The set of attributes X is called the left-hand side of the FD, and Y is called the right-hand side.
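The definition can be tested against a stored relation state with a grouping query. The sketch below assumes a relation EMP_PROJ with the attributes shown and no NULL values, and checks the candidate FD SSN -> ENAME: it lists every SSN value that occurs with more than one ENAME value, that is, every pair of tuples with t1[SSN] = t2[SSN] but different ENAME.

```sql
-- Assumed relation for illustration; attribute types are hypothetical.
CREATE TABLE EMP_PROJ (
    SSN     CHAR(9),
    PNUMBER INTEGER,
    HOURS   DECIMAL(4,1),
    ENAME   VARCHAR(30)
);

-- Violations of SSN -> ENAME in the current state.
-- An empty result means this state does not contradict the FD.
SELECT   SSN
FROM     EMP_PROJ
GROUP BY SSN
HAVING   COUNT(DISTINCT ENAME) > 1;
```

Such a check can only refute an FD. An FD is a statement about all legal states of the schema, so a state that passes the check does not prove that the FD holds.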
3.2 Practical Use of Normal Forms
Normalization is carried out in practice so that the resulting designs are of high quality and meet the desirable properties
The database designers need not normalize to the highest possible normal form (usually up to 3NF, BCNF or 4NF)
Denormalization: the process of storing the join of higher normal form relations as a base relation which is in a lower normal form
3.4 First Normal Form
Disallows composite attributes, multivalued attributes, and nested relations, that is, attributes whose values for an individual tuple are nonatomic
Considered to be part of the definition of relation
Definitions:
Prime attribute - attribute that is a member of the primary key K
Full functional dependency - an FD Y -> Z where removal of any attribute from Y means the FD does not hold any more
Examples:
- {SSN, PNUMBER} -> HOURS is a full FD since neither SSN -> HOURS nor PNUMBER -> HOURS holds
- {SSN, PNUMBER} -> ENAME is not a full FD (it is called a partial dependency) since SSN -> ENAME also holds
Second Normal Form (2)
A relation schema R is in second normal form (2NF) if every non-prime attribute A in R is fully functionally dependent on the primary key
R can be decomposed into 2NF relations via the process of 2NF normalization
The normalization process. (a) Normalizing EMP_PROJ into 2NF relations. (b) Normalizing EMP_DEPT into 3NF relations.
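The figure itself is not reproduced here. As a rough sketch of part (a), assume EMP_PROJ has the attributes SSN, PNUMBER, HOURS, ENAME, PNAME and PLOCATION with primary key {SSN, PNUMBER}, and that PNUMBER -> {PNAME, PLOCATION} holds in addition to the FDs listed earlier. The table names EP1, EP2 and EP3 and all attribute types are hypothetical.

```sql
-- {SSN, PNUMBER} -> HOURS is a full FD, so HOURS stays with the whole key.
CREATE TABLE EP1 (
    SSN     CHAR(9),
    PNUMBER INTEGER,
    HOURS   DECIMAL(4,1),
    PRIMARY KEY (SSN, PNUMBER)
);

-- SSN -> ENAME is a partial dependency, so ENAME moves to a relation keyed by SSN.
CREATE TABLE EP2 (
    SSN   CHAR(9) PRIMARY KEY,
    ENAME VARCHAR(30)
);

-- Assumed partial dependency PNUMBER -> {PNAME, PLOCATION}.
CREATE TABLE EP3 (
    PNUMBER   INTEGER PRIMARY KEY,
    PNAME     VARCHAR(15),
    PLOCATION VARCHAR(15)
);
```

In each resulting relation, every non-prime attribute depends on the whole primary key, which is the 2NF condition stated above.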
The above definitions consider the primary key only
The following more general definitions take into account relations with multiple candidate keys
A relation schema R is in second normal form (2NF) if every non-prime attribute A in R is fully functionally dependent on every key of R
There exist relations that are in 3NF but not in BCNF
The goal is to have each relation in BCNF (or 3NF)
SUMMARY