
Basic SQL

Chapter 4

Last Updated 12/29/2014 1:02 PM

CS 4347 Introduction to Database Systems

Michael Christiansen 2014

Structured Query Language

SQL is the standard language supported by all Relational Database Management Systems.

SQL has been an ANSI standard since 1986.
It is a major reason for the success of RDBMS technologies.

SQL is a declarative (non-procedural) programming language.

It produces results by:
Specifying constraints on the tuples retrieved from existing relations.
Filtering existing relations and creating new relations from them.

SQL encompasses both Data Definition and Data Manipulation.

Data Definition: used to create DB schemas, tables, views and other objects.
Data Manipulation: used to perform relational operations on the data.


SQL Language Overview

Table, Row and Column are the SQL / RDBMS equivalents of Relation, Tuple and Attribute.
SQL statements are terminated by a semicolon (;).
SQL keywords are case-insensitive.

However, schema, table, attribute, etc. names can be made case-sensitive depending on DBMS settings.


Schema and Namespaces

A DBMS can maintain the tables and data for several purposes, i.e. several applications.

A namespace is the mechanism used to partition the objects owned by one application from those of the other applications hosted by the DBMS.
A schema maintains a unique namespace.

Each application needs its own namespace.

Partitioning the objects (tables, views, etc.) of one application from other applications.
Each application may maintain different entities with the same name.

A schema allows user access to the DBMS to be restricted to a specific application's data.


Schema and Catalog

A schema is a database object which partitions its objects from the objects in other schemas on the same DBMS server.
A catalog is the collection of schemas maintained by a DBMS.

A database's catalog includes a special schema, INFORMATION_SCHEMA, that contains descriptions of all of the database's schemas.
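As a runnable illustration, here is a minimal sketch using Python's standard sqlite3 module as a stand-in DBMS (an assumption; the slides target MySQL). SQLite does not implement INFORMATION_SCHEMA; its catalog is exposed through the built-in sqlite_master table, which plays the same descriptive role. The table and index names are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customer (id INTEGER PRIMARY KEY, email TEXT)")
conn.execute("CREATE INDEX idx_customer_email ON customer (email)")

# sqlite_master is SQLite's catalog: one row per table, index, view, or trigger.
catalog = conn.execute(
    "SELECT type, name FROM sqlite_master ORDER BY type, name"
).fetchall()
print(catalog)  # [('index', 'idx_customer_email'), ('table', 'customer')]
```

In MySQL the comparable catalog query would read from information_schema.tables, filtered on its table_schema column.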


SQL Statements

SQL provides the following statements that allow the user to create, read, update and delete (CRUD) information.

CREATE: Create relational objects.
INSERT: Insert new rows into tables.
SELECT: Select / join / return specific rows from table(s).
UPDATE: Update the attribute values of specific table rows.
DELETE: Remove specific rows from tables.
DROP: Remove specific objects from the DBMS.
ALTER: Modify an existing object, e.g. a table, index, view, etc.

All of these statements support several options that determine the target objects and operations.
Each vendor has its own extensions to the SQL statements.
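The statements above can be exercised end to end in a few lines. This is a sketch that assumes SQLite (via Python's sqlite3 module) as the DBMS; the product table is hypothetical, and SQLite's ALTER TABLE supports only a subset of what other vendors offer.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# CREATE: define a new table.
conn.execute("CREATE TABLE product (id INTEGER PRIMARY KEY, name TEXT, price NUMERIC)")
# INSERT: add rows.
conn.executemany("INSERT INTO product (name, price) VALUES (?, ?)",
                 [("pen", 1.50), ("pad", 3.25)])
# UPDATE: change attribute values of selected rows.
conn.execute("UPDATE product SET price = 1.75 WHERE name = 'pen'")
# DELETE: remove selected rows.
conn.execute("DELETE FROM product WHERE name = 'pad'")
# SELECT: read back what is left.
rows = conn.execute("SELECT name, price FROM product").fetchall()
print(rows)  # [('pen', 1.75)]
# ALTER: modify the existing table (SQLite supports e.g. ADD COLUMN).
conn.execute("ALTER TABLE product ADD COLUMN sku TEXT")
cols = [r[1] for r in conn.execute("PRAGMA table_info(product)")]
# DROP: remove the object itself from the catalog.
conn.execute("DROP TABLE product")
dropped = conn.execute(
    "SELECT COUNT(*) FROM sqlite_master WHERE name = 'product'").fetchone()[0]
print(cols, dropped)  # ['id', 'name', 'price', 'sku'] 0
```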


CREATE Statement

The CREATE statement is used to create database objects.

The general format is CREATE <OBJECT> <Options>.

CREATE schema, table, view, trigger, etc.:
create schema my_application;
create table my_application.customer ( <column definition>+ );

A schema and all of its objects can be removed with the DROP SCHEMA <name> command (some DBMSs, e.g. PostgreSQL, require DROP SCHEMA <name> CASCADE when the schema is not empty).
The user needs sufficient permissions to execute these statements.


ECOMM Schema


Example CREATE TABLE with Column Definitions


SQL Data Types

The CREATE TABLE statement includes definitions of the table's columns, including each column's data type.

Numeric Types: Integers and floating point numbers of varying sizes, usually measured in precision, i.e. number of digits.

INTEGER (10 digits), BIGINT (19 digits), SMALLINT (5 digits).
DECIMAL(P,S) is an exact, fixed-point type with precision P (total number of digits) and scale S (digits after the decimal point), e.g. 123.21 has P=5, S=2.

DECIMAL numbers are used to represent prices, because they avoid the rounding errors of binary floating point.

REAL, FLOAT, DOUBLE: Floating point numbers, typically of increasing size. See vendor docs for specific sizes.
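The precision/scale definition and the reason prices use DECIMAL can be checked with Python's decimal module, used here only as an analogy for SQL DECIMAL (it is a decimal type, not the DBMS's own implementation). The helper precision_scale is hypothetical and simply counts digits as the slide does.

```python
from decimal import Decimal

def precision_scale(text):
    """Return (P, S) for a decimal literal: P = total digits, S = digits after the point."""
    _sign, digits, exponent = Decimal(text).as_tuple()
    if exponent >= 0:  # integer value such as "1200"
        return len(digits) + exponent, 0
    scale = -exponent
    return max(len(digits), scale), scale

print(precision_scale("123.21"))  # (5, 2), i.e. fits DECIMAL(5,2)

# Binary floating point cannot represent 0.1 or 0.2 exactly ...
print(0.1 + 0.2 == 0.3)  # False
# ... while decimal arithmetic keeps cent amounts exact.
print(Decimal("0.10") + Decimal("0.20") == Decimal("0.30"))  # True
```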


SQL Data Types

Character Types: Character arrays (strings) of both fixed and variable sizes.

CHARACTER(N) or CHAR(N) is a character string of fixed length N that will be padded on the right with spaces.
VARCHAR(N) is a character string of maximum length N that stores only the characters supplied (no padding).
CLOB(N) is a Character Large Object of maximum size N (MySQL provides TEXT / LONGTEXT instead).

CLOB is used to store very large character data that might normally be placed in a file, e.g. several megabytes or even larger.


SQL Data Types

BOOLEAN: Simple true/false.

ENUM: A fixed set of strings (not part of standard SQL; supported by MySQL).

ENUM('M', 'F') describes the values of a GENDER column.

DATE / TIME / TIMESTAMP: Represent a point in time with varying precision.

DATE is a calendar date with no time of day (equivalent to 12:00 AM on that date).
TIME is a time of day within a 24-hour period.
TIMESTAMP is a specific time on a specific date.

BLOB: Binary Large Object

Used to store large binary data such as an MPEG or JPEG file.
BLOBs generally cannot be indexed (MySQL allows only a prefix index), and their contents are not meant to be queried, e.g. in a WHERE clause.
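The relationship between DATE, TIME and TIMESTAMP maps closely onto Python's datetime types, used here as an analogy rather than as any DBMS's implementation. The sample values reuse this deck's "Last Updated" stamp.

```python
from datetime import date, datetime, time

# DATE: a calendar date with no time of day.
d = date(2014, 12, 29)
# TIME: a time of day within a 24-hour period.
t = time(13, 2)  # 1:02 PM
# TIMESTAMP: a specific time on a specific date.
ts = datetime.combine(d, t)

# Treating a DATE as a timestamp places it at midnight (12:00 AM).
midnight = datetime.combine(d, time())
print(ts.isoformat())        # 2014-12-29T13:02:00
print(midnight.isoformat())  # 2014-12-29T00:00:00
```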


Vendor-Specific Data Types

Spatial Data: GIS data describing a geographic position, area, borders, and others.

Queries can be built comparing distances, sizes, relative positions, and others.

XML Data: Character data containing an XML document.

Queries can be built examining the contents of the document.
XPATH expressions can be used to locate specific elements.

Object-Oriented: Tables can be mapped to classes, and procedures can be attached to the table.


Data Type Constraints

The value assigned to a column (attribute) can be further constrained by specific declarations in the CREATE statement.

These constraints are automatically enforced by the DBMS.

NOT NULL: Specifies that a value must always be provided.

The column's value cannot be NULL.

DEFAULT <value>: If a value is not provided, the given value will be used to populate the column.
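Both declarations are enforced by the DBMS itself, which a short sketch can confirm. It assumes SQLite via Python's sqlite3 module; the employee rows are hypothetical. The omitted dno picks up its DEFAULT, while the missing NOT NULL lname causes the insert to be rejected.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE employee (
        ssn   CHAR(9)     NOT NULL,
        lname VARCHAR(15) NOT NULL,
        dno   INTEGER     NOT NULL DEFAULT 1
    )""")

# dno is omitted, so the DEFAULT value 1 is stored.
conn.execute("INSERT INTO employee (ssn, lname) VALUES ('123456789', 'Smith')")
dno_row = conn.execute("SELECT dno FROM employee").fetchone()
print(dno_row)  # (1,)

# lname is NOT NULL and has no default, so the DBMS rejects this row.
try:
    conn.execute("INSERT INTO employee (ssn) VALUES ('333445555')")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True
print(rejected)  # True
```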


Data Type Constraints

CHECK <clause>: Specifies a boolean clause associated with a column that must always evaluate to true (strictly, a row is rejected only when the clause evaluates to FALSE; a NULL result is accepted).

DEPT_NUM INTEGER NOT NULL
CHECK (DEPT_NUM > 0 AND DEPT_NUM < 23),
The DEPT_NUM column is an integer, must be specified, and must be between 1 and 22.

DOMAIN: This is a SQL user-defined data type.

CREATE DOMAIN DNUMBER AS INTEGER
CHECK (VALUE > 0 AND VALUE < 23);
DEPT_NUM DNUMBER NOT NULL,
Very few databases support CREATE DOMAIN (PostgreSQL does; MySQL does not).
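Because SQLite (like MySQL) lacks CREATE DOMAIN, the sketch below applies the same rule as a column-level CHECK, assuming SQLite via Python's sqlite3 module. The helper try_insert is hypothetical; it probes the boundary values of the 1 to 22 range.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE department (
        dept_num INTEGER NOT NULL CHECK (dept_num > 0 AND dept_num < 23),
        dname    VARCHAR(15)
    )""")

def try_insert(num):
    """Return True if the row is accepted, False if the CHECK rejects it."""
    try:
        conn.execute("INSERT INTO department (dept_num) VALUES (?)", (num,))
        return True
    except sqlite3.IntegrityError:
        return False

results = {n: try_insert(n) for n in (0, 1, 22, 23)}
print(results)  # {0: False, 1: True, 22: True, 23: False}
```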


MySQL Table Definition with Constraints


CREATE TABLE customer (
`id` BIGINT(20) NOT NULL AUTO_INCREMENT,
`first_name` VARCHAR(50) NOT NULL,
`last_name` VARCHAR(50) NOT NULL,
`gender` CHAR NOT NULL,
`dob` DATETIME NOT NULL,
`email` VARCHAR(100) NOT NULL,
UNIQUE INDEX `ID_UNIQUE` (`id` ASC),
UNIQUE INDEX `email_UNIQUE` (`email` ASC),
PRIMARY KEY (`id`)
);
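A rough SQLite port of the customer table above (an assumption: AUTO_INCREMENT becomes INTEGER PRIMARY KEY AUTOINCREMENT, and the UNIQUE INDEX on email becomes a UNIQUE column constraint) shows the unique email being enforced. It uses Python's sqlite3 module, and the customer data is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE customer (
        id         INTEGER PRIMARY KEY AUTOINCREMENT,
        first_name VARCHAR(50)  NOT NULL,
        last_name  VARCHAR(50)  NOT NULL,
        gender     CHAR(1)      NOT NULL,
        dob        DATE         NOT NULL,
        email      VARCHAR(100) NOT NULL UNIQUE
    )""")

insert = ("INSERT INTO customer (first_name, last_name, gender, dob, email) "
          "VALUES (?, ?, ?, ?, ?)")
conn.execute(insert, ("Ada", "Lovelace", "F", "1815-12-10", "ada@example.com"))
try:
    # A second customer with the same email violates the UNIQUE constraint.
    conn.execute(insert, ("Al", "Lovelace", "M", "1820-01-01", "ada@example.com"))
    duplicate_accepted = True
except sqlite3.IntegrityError:
    duplicate_accepted = False

count = conn.execute("SELECT COUNT(*) FROM customer").fetchone()[0]
print(duplicate_accepted, count)  # False 1
```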

Index

An index is another type of database object, i.e. it is allocated space, it stores information, and it must be maintained.

Every index requires additional DB operations to keep it consistent with its table, i.e. whenever rows are added or removed or key attributes are updated.

The index defines a mechanism by which the retrieval of specific rows can be accelerated.

It orders the rows by different attributes to accelerate retrieval, typically using a B-TREE (other structures, such as hash indexes, also exist).
Multiple indexes can be maintained on a single table.


Index

An index can enhance the performance of an application.

It makes the difference between a linear, O(n), and a logarithmic, O(log2 n), lookup (search).

An index does not need to be unique.

However, a unique index is needed to build a key.
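The linear versus logarithmic difference can be made concrete by counting comparisons. This is a simplified sketch: binary search over a sorted Python list stands in for an ordered index (real DBMS indexes are usually B-trees with a much larger fan-out), and both helper functions are hypothetical.

```python
import math

def linear_probes(keys, target):
    """Comparisons made by a full scan (no index) until target is found."""
    for probes, key in enumerate(keys, start=1):
        if key == target:
            return probes
    return len(keys)

def binary_probes(sorted_keys, target):
    """Comparisons made by binary search over sorted keys (stand-in for an index)."""
    lo, hi, probes = 0, len(sorted_keys), 0
    while lo < hi:
        mid = (lo + hi) // 2
        probes += 1
        if sorted_keys[mid] < target:
            lo = mid + 1
        else:
            hi = mid
    return probes

n = 1_000_000
keys = list(range(n))
target = n - 1
linear = linear_probes(keys, target)   # n comparisons for the last key
binary = binary_probes(keys, target)   # at most floor(log2 n) + 1 comparisons
bound = math.floor(math.log2(n)) + 1
print(linear, binary, bound)
```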


Keys and Referential Integrity Constraints

Each column definition can include constraints that describe the table's key(s) and references from other tables.
Candidate Keys: A minimal set of one or more attributes that uniquely identifies each row.

In MySQL, keys are defined as named indexes.

Naming allows the specification of the table's Primary Key.

Primary Key: The single, best candidate key.

The exported key, i.e. the key that is used as a foreign key in / by other tables.
Also serves to suggest the table's physical row ordering (MySQL's InnoDB engine, for example, stores rows in primary key order).


Keys and Referential Integrity Constraints

Natural Key: A key defined by attributes that occur in the problem domain, e.g. a person's SSN or a product's serial number.

Also called the Business Key.

Surrogate Key: A key defined by the database schema that serves only to uniquely identify each row.

This is usually the table's primary key.

Generated by sequences or auto-increment columns.
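The contrast between the two kinds of key can be sketched in SQLite via Python's sqlite3 module (an assumption; the person table and SSNs are hypothetical). The surrogate id is generated by an auto-increment column, while the natural key ssn is declared UNIQUE. With SQLite's AUTOINCREMENT, a deleted id is never reused, so each surrogate value stays a stable identifier.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE person (
        id  INTEGER PRIMARY KEY AUTOINCREMENT,  -- surrogate key, generated by the DBMS
        ssn CHAR(9) NOT NULL UNIQUE             -- natural (business) key
    )""")
first = conn.execute("INSERT INTO person (ssn) VALUES ('123456789')").lastrowid
second = conn.execute("INSERT INTO person (ssn) VALUES ('333445555')").lastrowid
conn.execute("DELETE FROM person WHERE id = ?", (second,))
# AUTOINCREMENT does not hand out the deleted id 2 again.
third = conn.execute("INSERT INTO person (ssn) VALUES ('987654321')").lastrowid
print(first, second, third)  # 1 2 3
```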


Foreign Keys

A foreign key is a table attribute that references the primary key of a different (foreign) table.

Address.cust_id references Customer.id, i.e. identifies a customer.

A foreign key is defined by including the primary key of the referenced table as a column in the referencing table.

Address is the referencing table and Customer is the referenced table.


Two ways to define a FK Relationship

1-1: The FK ID is maintained on the referencing (owning) table.

Customer maintains the Purchase primary key.

1-1 is slightly more efficient (faster retrieval), but does not allow more than a single reference.

1-M: The FK ID is maintained on the owned ("many") table, which holds the owner's primary key.

Address maintains the Customer's primary key.

Note that Address does not have a primary key of its own, but an index is maintained on its cust_id (FK) attribute, making Address a value type in Hibernate.


MySQL Table Definition with Foreign Key Constraints
CREATE TABLE purchase (
`id` BIGINT(20) NOT NULL AUTO_INCREMENT,
`purchase_date` DATE NOT NULL,
`customer_id` BIGINT(20) NOT NULL,
PRIMARY KEY (`id`),
UNIQUE INDEX `id_UNIQUE` (`id` ASC),
INDEX `fk_purchase_customer1_idx` (`customer_id` ASC),
CONSTRAINT `fk_purchase_customer1`
FOREIGN KEY (`customer_id`)
REFERENCES `customer` (`id`)
ON DELETE RESTRICT
ON UPDATE RESTRICT
);
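A comparable purchase table can be tried out in SQLite via Python's sqlite3 module (an assumption; the column-level REFERENCES form replaces MySQL's named CONSTRAINT clause). SQLite enforces foreign keys only after PRAGMA foreign_keys = ON. The sketch shows that an orphan purchase, a delete of the referenced customer, and a change to its key are all rejected under RESTRICT.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled
conn.execute("CREATE TABLE customer (id INTEGER PRIMARY KEY)")
conn.execute("""
    CREATE TABLE purchase (
        id            INTEGER PRIMARY KEY,
        purchase_date DATE    NOT NULL,
        customer_id   INTEGER NOT NULL
            REFERENCES customer (id) ON DELETE RESTRICT ON UPDATE RESTRICT
    )""")
conn.execute("CREATE INDEX fk_purchase_customer1_idx ON purchase (customer_id)")
conn.execute("INSERT INTO customer (id) VALUES (1)")
conn.execute("INSERT INTO purchase (purchase_date, customer_id) VALUES ('2014-12-29', 1)")

def fails(sql):
    """Return True if the statement is rejected by a constraint."""
    try:
        conn.execute(sql)
        return False
    except sqlite3.IntegrityError:
        return True

orphan_rejected = fails(
    "INSERT INTO purchase (purchase_date, customer_id) VALUES ('2014-12-29', 99)")
delete_rejected = fails("DELETE FROM customer WHERE id = 1")
update_rejected = fails("UPDATE customer SET id = 2 WHERE id = 1")
print(orphan_rejected, delete_rejected, update_rejected)  # True True True
```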


ON DELETE Foreign Key Constraints

When we delete a Customer, do we keep its Address, Credit Card, and Purchase rows?

No. These owned items should be removed from the database when the owning Customer is deleted.

ON DELETE RESTRICT

With RESTRICT (and, in MySQL, by default), the DB will not allow a Customer to be deleted if a referencing Purchase, Address, or Credit_Card row exists.

ON DELETE CASCADE

CASCADE will automatically remove any referencing Address, Credit_Card, and Purchase rows when the Customer is deleted.
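The CASCADE behaviour can be observed directly. The sketch assumes SQLite via Python's sqlite3 module with foreign keys switched on; the address rows are hypothetical. Deleting the owning customer also removes its owned addresses.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.execute("CREATE TABLE customer (id INTEGER PRIMARY KEY)")
conn.execute("""
    CREATE TABLE address (
        cust_id INTEGER NOT NULL REFERENCES customer (id) ON DELETE CASCADE,
        city    TEXT
    )""")
conn.execute("INSERT INTO customer (id) VALUES (1)")
conn.executemany("INSERT INTO address (cust_id, city) VALUES (1, ?)",
                 [("Dallas",), ("Austin",)])

# Deleting the owning customer cascades to its address rows.
conn.execute("DELETE FROM customer WHERE id = 1")
remaining = conn.execute("SELECT COUNT(*) FROM address").fetchone()[0]
print(remaining)  # 0
```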


ON UPDATE Foreign Key Constraints

ON UPDATE RESTRICT

Prevents the modification of a referenced (primary) key if rows in the referencing (FK) table refer to it.
Independently of the ON UPDATE action, the FK constraint also rejects updating a referencing FK attribute to a non-existing referenced primary key.

For example, updating Address.cust_id to a non-existing customer ID.

ON UPDATE CASCADE

Updating the referenced key automatically propagates the new value to all matching FK attributes in the referencing table(s).
Only useful if the referenced table's primary key is a Natural Key.

Once assigned, a Surrogate Key should never be changed.
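ON UPDATE CASCADE matters when the primary key is a natural key that can change. The sketch assumes SQLite via Python's sqlite3 module; the product and warranty tables and the serial numbers are hypothetical. Correcting a product's serial number carries the new value into the referencing warranty row.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
# Natural key: the product's serial number is its primary key.
conn.execute("CREATE TABLE product (serial TEXT PRIMARY KEY)")
conn.execute("""
    CREATE TABLE warranty (
        serial TEXT NOT NULL REFERENCES product (serial) ON UPDATE CASCADE,
        years  INTEGER
    )""")
conn.execute("INSERT INTO product VALUES ('SN-001')")
conn.execute("INSERT INTO warranty VALUES ('SN-001', 2)")

# Correcting the natural key propagates to the referencing row.
conn.execute("UPDATE product SET serial = 'SN-100' WHERE serial = 'SN-001'")
warranty_rows = conn.execute("SELECT serial FROM warranty").fetchall()
print(warranty_rows)  # [('SN-100',)]
```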


COMPANY Schema
Example Schema from Text


SELECT Statement

SELECT is the most commonly used operation.

The SELECT statement describes which rows are selected


(identified) and returned to the user or application.

INSERT and UPDATE are also frequently executed.


Creating objects is a one-time operation.

Which tables are being selected from.


Which table attributes are to be returned.
Which table rows are to be returned.

SELECT is the most complex statement in SQL.

There are many ways to extract information from the database.


SELECT-FROM-WHERE Clauses

The basic SELECT statement is made up of four parts:


SELECT <attribute-list>
FROM <table-list>
WHERE <conditions>
ORDER BY <attribute-list>;

<attribute-list> is the list of attributes in the querys result set or


used for grouping tuples in the result set.
<table-list> is the set of tables to be joined and queried.
<conditions> describe which tuples are present in the result-set.

Select the rows in the given tables where the attribute values
match the given conditions.

And return only the given columns in the result-set.


Simple Select Statement

Return columns BDATE, SEX, ADDRESS from every row in EMPLOYEE with the given first and last names.

SELECT BDATE, SEX, ADDRESS
FROM EMPLOYEE
WHERE FNAME = 'Brad' AND LNAME = 'Knight';

SELECT * will retrieve all the columns from EMPLOYEE.

SELECT *
FROM EMPLOYEE
WHERE FNAME = 'Brad' AND LNAME = 'Knight';
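The same query can be run from application code. This sketch assumes SQLite via Python's sqlite3 module with a trimmed, hypothetical EMPLOYEE table. The ? placeholders pass the names as parameters instead of splicing them into the SQL text, and cursor.description lists the columns that SELECT * returns.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employee (fname TEXT, lname TEXT, bdate DATE, sex CHAR(1), address TEXT)")
conn.executemany("INSERT INTO employee VALUES (?, ?, ?, ?, ?)", [
    ("Brad", "Knight", "1960-05-04", "M", "12 Elm St, Dallas, TX"),   # hypothetical rows
    ("Anne", "Knight", "1972-01-20", "F", "40 Oak St, Austin, TX"),
])

rows = conn.execute(
    "SELECT bdate, sex, address FROM employee WHERE fname = ? AND lname = ?",
    ("Brad", "Knight")).fetchall()
print(rows)  # [('1960-05-04', 'M', '12 Elm St, Dallas, TX')]

# SELECT * returns every column, in the order they were declared.
star_columns = [d[0] for d in conn.execute("SELECT * FROM employee").description]
print(star_columns)  # ['fname', 'lname', 'bdate', 'sex', 'address']
```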


Logical and Physical View of a SELECT

Logically, we can think of the SELECT as causing the DB to iterate over every row in EMPLOYEE and evaluate the WHERE clause.

The result set contains those rows for which the WHERE clause conditions hold true.

Physically, an index on fname and lname will greatly speed up the execution of this SELECT statement.

The index identifies the EMPLOYEE rows that hold the target fname/lname attribute values without scanning the whole table.
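The logical and physical views can be compared with SQLite's EXPLAIN QUERY PLAN, using Python's sqlite3 module (an assumption; the plan wording is SQLite-specific and varies slightly between versions, e.g. "SCAN employee" versus "SCAN TABLE employee"). The index name idx_emp_name is hypothetical. Before the index exists the plan is a full scan; afterwards it searches through the index.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employee (ssn TEXT, fname TEXT, lname TEXT, address TEXT)")
query = "SELECT * FROM employee WHERE fname = 'Brad' AND lname = 'Knight'"

def plan(sql):
    """Return SQLite's query-plan detail strings (the last column of each row)."""
    return [row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + sql)]

before = plan(query)  # no index: every row is visited
conn.execute("CREATE INDEX idx_emp_name ON employee (fname, lname)")
after = plan(query)   # the index locates the matching rows directly
print(before)
print(after)
```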


Join Two Tables

SELECT FIRST_NAME, LAST_NAME, CITY
FROM CUSTOMER, ADDRESS
WHERE ZIPCODE like '75%'
AND CUSTOMER_ID = CUSTOMER.ID;

Select the first name, last name, and city of each customer whose address is in the 75xxx ZIP code area.
Notice that the WHERE condition joins rows from both tables.
The join can be between any two columns, but joining on an indexed foreign key column greatly enhances performance.
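A small, hypothetical data set makes the join visible. The sketch assumes SQLite via Python's sqlite3 module and indexes the FK column customer_id, as recommended above. Only the customer whose address has a 75xxx ZIP code is returned.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customer (id INTEGER PRIMARY KEY, first_name TEXT, last_name TEXT)")
conn.execute("CREATE TABLE address (customer_id INTEGER REFERENCES customer (id), city TEXT, zipcode TEXT)")
conn.execute("CREATE INDEX idx_address_customer ON address (customer_id)")  # index the FK column
conn.executemany("INSERT INTO customer VALUES (?, ?, ?)",
                 [(1, "Ada", "Lovelace"), (2, "Alan", "Turing")])
conn.executemany("INSERT INTO address VALUES (?, ?, ?)",
                 [(1, "Dallas", "75201"), (2, "Houston", "77002")])

rows = conn.execute("""
    SELECT first_name, last_name, city
    FROM customer, address
    WHERE zipcode LIKE '75%'
      AND customer_id = customer.id""").fetchall()
print(rows)  # [('Ada', 'Lovelace', 'Dallas')]
```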


Ambiguous Attribute Names

SELECT C.FIRST_NAME, C.LAST_NAME, A.CITY
FROM CUSTOMER AS C, ADDRESS AS A
WHERE A.ZIPCODE like '75%'
AND A.CUSTOMER_ID = C.ID;

This statement creates aliases C and A, which identify which table's attributes are being referenced in the SELECT and WHERE clauses.
An alias is used:

To make the query easier to understand.
To specify which attribute is referenced when multiple tables have attributes of the same name.


Recursive Query (Self-Join)

Select each employee and the employee's supervisor.

SELECT E.FNAME, E.LNAME, S.FNAME, S.LNAME
FROM
EMPLOYEE AS E,
EMPLOYEE AS S
WHERE E.SUPERSSN = S.SSN;

Note that this query uses EMPLOYEE as two logical tables.

It joins EMPLOYEE to EMPLOYEE, where the SUPERSSN attribute is a foreign key to the supervisor's SSN attribute.
Employees with no supervisor (SUPERSSN is NULL) do not appear in the result.
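Running the self-join on three rows modelled on the COMPANY schema (names and SSNs as in the textbook example; the trimmed table is an assumption) shows both logical copies of EMPLOYEE at work. The sketch uses SQLite via Python's sqlite3 module. Borg has no supervisor, so he appears only on the supervisor side.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employee (ssn TEXT PRIMARY KEY, fname TEXT, lname TEXT, superssn TEXT)")
conn.executemany("INSERT INTO employee VALUES (?, ?, ?, ?)", [
    ("888665555", "James", "Borg", None),           # no supervisor
    ("333445555", "Franklin", "Wong", "888665555"),
    ("123456789", "John", "Smith", "333445555"),
])
rows = conn.execute("""
    SELECT E.fname, E.lname, S.fname, S.lname
    FROM employee AS E, employee AS S
    WHERE E.superssn = S.ssn
    ORDER BY E.lname""").fetchall()
print(rows)  # [('John', 'Smith', 'Franklin', 'Wong'), ('Franklin', 'Wong', 'James', 'Borg')]
```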


Default Select

All of the attributes can be returned by a SELECT statement by using * as the attribute list.

SELECT * FROM EMPLOYEE;

Returns all of the columns and all of the table rows.

SELECT * FROM EMPLOYEE, DEPARTMENT
WHERE DNAME = 'Research' AND DNO = DNUMBER;

Joins EMPLOYEE to DEPARTMENT, returning all attributes from both tables for the employees of the Research department.

SELECT * FROM EMPLOYEE, DEPARTMENT;

Returns an unrestricted join, i.e. the cross product of the two tables (every EMPLOYEE row paired with every DEPARTMENT row).
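Counting rows shows the difference between the restricted join and the cross product. The sketch assumes SQLite via Python's sqlite3 module and hypothetical data: 3 employees times 2 departments gives 6 rows with no WHERE clause, while the join condition keeps only the 2 Research employees.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employee (ssn TEXT, dno INTEGER)")
conn.execute("CREATE TABLE department (dnumber INTEGER, dname TEXT)")
conn.executemany("INSERT INTO employee VALUES (?, ?)",
                 [("123456789", 5), ("333445555", 5), ("999887777", 4)])
conn.executemany("INSERT INTO department VALUES (?, ?)",
                 [(5, "Research"), (4, "Administration")])

cross = conn.execute("SELECT * FROM employee, department").fetchall()
joined = conn.execute("""SELECT * FROM employee, department
                         WHERE dname = 'Research' AND dno = dnumber""").fetchall()
print(len(cross), len(joined))  # 6 2
```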


Selecting Distinct Values

In some cases we wish to return only distinct values in our results, i.e. eliminate rows with duplicate values from the result set.

In this example, many employees may have the same salary.

SELECT SALARY FROM EMPLOYEE;

Returns the SALARY attribute from each row, including duplicate salary values.

SELECT DISTINCT SALARY FROM EMPLOYEE;

The DISTINCT keyword causes the SELECT to return each value in the SALARY column only once, i.e. without duplicates.
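With two employees sharing a salary, the effect of DISTINCT is easy to see. This sketch assumes SQLite via Python's sqlite3 module and hypothetical salaries; ORDER BY is added only to make the output deterministic.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employee (lname TEXT, salary INTEGER)")
conn.executemany("INSERT INTO employee VALUES (?, ?)",
                 [("Smith", 30000), ("Wong", 40000), ("Zelaya", 25000), ("English", 25000)])

all_salaries = [r[0] for r in conn.execute(
    "SELECT salary FROM employee ORDER BY salary")]
distinct_salaries = [r[0] for r in conn.execute(
    "SELECT DISTINCT salary FROM employee ORDER BY salary")]
print(all_salaries)       # [25000, 25000, 30000, 40000]
print(distinct_salaries)  # [25000, 30000, 40000]
```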


Set Operations on Tables

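SQL provides three operations that can be performed on the result sets of two SELECT statements.

SELECT1 UNION SELECT2

The union of the two SELECTs' result sets.

SELECT1 EXCEPT SELECT2

The rows of SELECT1 minus those that also appear in SELECT2.

SELECT1 INTERSECT SELECT2

The intersection of SELECT1 and SELECT2, i.e. the rows returned by both.

By default these operations eliminate duplicate rows; UNION ALL keeps them.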
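The three operations can be compared on two tiny, hypothetical tables. The sketch assumes SQLite via Python's sqlite3 module, which supports all three operators but does not accept parentheses around the operand SELECTs, so none are used here. The helper run is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE a (pnumber INTEGER)")
conn.execute("CREATE TABLE b (pnumber INTEGER)")
conn.executemany("INSERT INTO a VALUES (?)", [(1,), (2,), (3,)])
conn.executemany("INSERT INTO b VALUES (?)", [(2,), (3,), (4,)])

def run(op):
    """Combine the two SELECTs with the given set operator (a fixed keyword)."""
    sql = f"SELECT pnumber FROM a {op} SELECT pnumber FROM b ORDER BY pnumber"
    return [r[0] for r in conn.execute(sql)]

print(run("UNION"))      # [1, 2, 3, 4]
print(run("INTERSECT"))  # [2, 3]
print(run("EXCEPT"))     # [1]  (rows of the first SELECT not in the second)
```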


Example Union

(SELECT DISTINCT pnumber
FROM PROJECT, DEPARTMENT, EMPLOYEE
WHERE dnum = dnumber AND mgrssn=ssn AND lname='Wong')
UNION
(SELECT DISTINCT pnumber
FROM PROJECT, WORKS_ON, EMPLOYEE
WHERE pnumber = pno AND essn=ssn AND lname = 'Wong');
Select the projects for which either the manager of the controlling department or an employee working on the project has the last name Wong.
The two SELECT statements must return the same number of attributes with compatible types (they must be union-compatible).


String Matching

The SQL string wildcard symbol % matches any sequence of zero or more characters; the LIKE operator performs the wildcard match.

The equals symbol = performs an exact match against the string (% is not treated as a wildcard).

SELECT fname, lname
FROM EMPLOYEE
WHERE address LIKE '%Houston%TX%';

Select the names of employees living in Houston, TX.


String Matching

The underscore _ can be used to match any single character.
Two queries that select all employees born in the 1950s (assuming dates are formatted as 'YYYY-MM-DD'):

SELECT fname, lname
FROM EMPLOYEE
WHERE bdate LIKE '__5_______';
SELECT fname, lname
FROM EMPLOYEE
WHERE bdate LIKE '195%';

Note that we are performing a string matching operation on BDATE, a column of type DATE; this relies on the DBMS implicitly converting the date to a string.
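The three patterns can be checked against rows modelled on the COMPANY data. The sketch assumes SQLite via Python's sqlite3 module, where dates are stored as 'YYYY-MM-DD' text, so string matching applies directly; note that SQLite's LIKE is case-insensitive for ASCII letters by default. The helper names and its fixed literal conditions are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employee (fname TEXT, lname TEXT, bdate TEXT, address TEXT)")
conn.executemany("INSERT INTO employee VALUES (?, ?, ?, ?)", [
    ("John", "Smith", "1965-01-09", "731 Fondren, Houston, TX"),
    ("Franklin", "Wong", "1955-12-08", "638 Voss, Houston, TX"),
    ("Alicia", "Zelaya", "1968-01-19", "3321 Castle, Spring, TX"),
])

def names(where):
    """Run a fixed WHERE condition (a literal, not user input) and return first names."""
    return [r[0] for r in conn.execute(
        f"SELECT fname FROM employee WHERE {where} ORDER BY fname")]

print(names("address LIKE '%Houston%TX%'"))  # ['Franklin', 'John']
print(names("bdate LIKE '__5_______'"))      # ['Franklin']
print(names("bdate LIKE '195%'"))            # ['Franklin']
```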


String Matching

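Literal strings are quoted using single quotes ('...'). Some DBMSs, e.g. MySQL, also accept double quotes ("..."), but standard SQL reserves double quotes for identifiers.
In MySQL, a backslash \ can be used to escape a % or _ in a LIKE pattern; standard SQL names the escape character explicitly with an ESCAPE clause, e.g. LIKE '%50\%' ESCAPE '\'.
Literal strings can be concatenated using the || operator (MySQL uses CONCAT() unless the PIPES_AS_CONCAT SQL mode is enabled).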
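Escaping and concatenation can be tried in SQLite via Python's sqlite3 module (an assumption; SQLite follows the standard ESCAPE clause and || operator rather than MySQL's defaults). The promo codes are hypothetical. With the escape, only the code containing a literal "50%" matches; without it, % is a wildcard and both codes match.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE promo (code TEXT)")
conn.executemany("INSERT INTO promo VALUES (?)", [("SAVE50%",), ("SAVE500",)])

# ESCAPE '\' makes '\%' a literal percent sign (raw string keeps the backslashes).
escaped = conn.execute(
    r"SELECT code FROM promo WHERE code LIKE '%50\%' ESCAPE '\'").fetchall()
# Without the escape, both codes contain "50" and match.
unescaped = conn.execute(
    "SELECT code FROM promo WHERE code LIKE '%50%' ORDER BY code").fetchall()
# || concatenates strings in standard SQL and in SQLite.
joined = conn.execute("SELECT 'Brad' || ' ' || 'Knight'").fetchone()[0]
print(escaped)    # [('SAVE50%',)]
print(unescaped)  # [('SAVE50%',), ('SAVE500',)]
print(joined)     # Brad Knight
```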


BETWEEN operator

The BETWEEN / AND operator can be used to match values between two literals; both end points are included.
SELECT *
FROM EMPLOYEE
WHERE (salary BETWEEN 30000 AND 40000) AND dno = 5;

The BETWEEN operator also works with strings (compared in collation order) and dates.
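The inclusive end points and the string form of BETWEEN show up on a few hypothetical rows. The sketch assumes SQLite via Python's sqlite3 module. Smith (30000) and Wong (40000) sit exactly on the boundaries and are still selected.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employee (lname TEXT, salary INTEGER, dno INTEGER)")
conn.executemany("INSERT INTO employee VALUES (?, ?, ?)", [
    ("Smith", 30000, 5), ("Wong", 40000, 5), ("Narayan", 38000, 5),
    ("English", 25000, 5), ("Wallace", 43000, 4),
])

in_range = [r[0] for r in conn.execute(
    "SELECT lname FROM employee "
    "WHERE (salary BETWEEN 30000 AND 40000) AND dno = 5 ORDER BY lname")]
# On strings, BETWEEN compares using the column's collation order.
by_name = [r[0] for r in conn.execute(
    "SELECT lname FROM employee WHERE lname BETWEEN 'N' AND 'T' ORDER BY lname")]
print(in_range)  # ['Narayan', 'Smith', 'Wong']
print(by_name)   # ['Narayan', 'Smith']
```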


Arithmetic Operators in SELECT

Arithmetic operators (+ - * /) can be used in the SELECT and WHERE clauses.

SELECT fname, lname, salary*1.1 AS 'Salary W 10% Raise'
FROM Employee
WHERE salary*1.1 > 100000;

Select employees whose 10% raise places their salary over 100K.

Notice the use of the AS clause to create a human-readable column header for the query (MySQL accepts a quoted string here; standard SQL uses a double-quoted identifier).
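The computed column and its alias can be inspected from code. The sketch assumes SQLite via Python's sqlite3 module, uses the standard double-quoted alias, and has hypothetical data. Since salary * 1.1 is computed in binary floating point, the result may carry tiny rounding noise (the DECIMAL type avoids this), so the value is compared approximately.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employee (fname TEXT, lname TEXT, salary INTEGER)")
conn.executemany("INSERT INTO employee VALUES (?, ?, ?)",
                 [("James", "Borg", 95000), ("John", "Smith", 30000)])

cur = conn.execute("""
    SELECT fname, lname, salary * 1.1 AS "Salary W 10% Raise"
    FROM employee
    WHERE salary * 1.1 > 100000""")
rows = cur.fetchall()
header = [d[0] for d in cur.description]  # the alias becomes the column header
print(header)  # ['fname', 'lname', 'Salary W 10% Raise']
print(rows)
```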


Ordering the Result Set

The order of the result can be specified by the ORDER BY clause.

Multiple attributes can be listed in the clause; each may be followed by ASC (the default) or DESC.

SELECT D.dname, E.lname, E.fname, P.pname
FROM DEPARTMENT D, EMPLOYEE E, WORKS_ON W, PROJECT P
WHERE D.dnumber=E.dno AND E.ssn=W.essn AND W.pno=P.pnumber
ORDER BY D.dname, E.lname, E.fname;
Retrieve a list of employees and the projects they are working on, ordered by department and, within each department, ordered alphabetically by last name, then first name.

Notice the use of aliases to make the query easier to understand.
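Multi-attribute ordering breaks ties left to right. The sketch assumes SQLite via Python's sqlite3 module and, for brevity, a single hypothetical denormalized table instead of the four-table join above. Rows are sorted by department, then last name, then first name, and DESC reverses a single attribute.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employee (dname TEXT, lname TEXT, fname TEXT)")
conn.executemany("INSERT INTO employee VALUES (?, ?, ?)", [
    ("Research", "Smith", "John"), ("Administration", "Wallace", "Jennifer"),
    ("Research", "Smith", "Alice"), ("Research", "Narayan", "Ramesh"),
])

rows = conn.execute(
    "SELECT dname, lname, fname FROM employee ORDER BY dname, lname, fname").fetchall()
for row in rows:
    print(row)
# ('Administration', 'Wallace', 'Jennifer')
# ('Research', 'Narayan', 'Ramesh')
# ('Research', 'Smith', 'Alice')
# ('Research', 'Smith', 'John')

desc_first = conn.execute(
    "SELECT dname FROM employee ORDER BY dname DESC").fetchone()[0]
print(desc_first)  # Research
```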


INSERT Statement

Generally, the INSERT statement is used to insert a new tuple / row into a table.

INSERT INTO EMPLOYEE
VALUES ( 'Richard', 'K', 'Marini', '653298653', '1962-12-30',
'98 Oak Forest, Katy, TX', 'M', 37000, '653298653', 4 );
Notice that the order and number of the values must match the order and number of the table columns declared in the CREATE statement.

INSERT INTO EMPLOYEE (fname, lname, dno, ssn)
VALUES ('Richard', 'Marini', 4, '653298653');

Optionally, attribute names can be specified as shown above; attributes that are not listed receive their DEFAULT value, or NULL if none is declared.
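Both INSERT forms can be run against an EMPLOYEE table with the ten columns the first statement expects. The sketch assumes SQLite via Python's sqlite3 module; the column definitions, the DEFAULT on dno, and the second employee are hypothetical. The columns omitted from the second insert come back as NULL, except dno, which takes its DEFAULT.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE employee (
        fname TEXT NOT NULL, minit CHAR(1), lname TEXT NOT NULL,
        ssn CHAR(9) PRIMARY KEY, bdate DATE, address TEXT, sex CHAR(1),
        salary INTEGER, super_ssn CHAR(9), dno INTEGER NOT NULL DEFAULT 1
    )""")

# Without a column list: one value per column, in declaration order.
conn.execute("""INSERT INTO employee VALUES ('Richard', 'K', 'Marini', '653298653',
    '1962-12-30', '98 Oak Forest, Katy, TX', 'M', 37000, '653298653', 4)""")
# With a column list: omitted columns get their DEFAULT, or NULL if none.
conn.execute("""INSERT INTO employee (fname, lname, ssn)
    VALUES ('Ramesh', 'Narayan', '666884444')""")

row = conn.execute(
    "SELECT minit, salary, dno FROM employee WHERE ssn = '666884444'").fetchone()
print(row)  # (None, None, 1)
```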


Insert from a Nested Select

Insert rows generated by a SELECT from other tables.

INSERT INTO WORKS_ON_INFO
(emp_name, proj_name, hours_per_week)
SELECT E.lname, P.pname, W.hours
FROM PROJECT P, WORKS_ON W, EMPLOYEE E
WHERE P.pnumber = W.pno AND W.essn = E.ssn;
The definition of the table works_on_info is in the slide notes.

Note how the nested SELECT provides the values for the INSERT statement.
The selected columns (lname, pname, hours) must match the listed columns of works_on_info in number and order, and be compatible with their data types and sizes.
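Because the slide's works_on_info definition lives in its notes, the sketch below uses a hypothetical stand-in definition. It assumes SQLite via Python's sqlite3 module, and the employee, project and works_on rows are also hypothetical. The nested SELECT supplies one row per WORKS_ON assignment.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE employee (ssn TEXT PRIMARY KEY, lname TEXT);
    CREATE TABLE project  (pnumber INTEGER PRIMARY KEY, pname TEXT);
    CREATE TABLE works_on (essn TEXT, pno INTEGER, hours REAL);
    -- Hypothetical definition standing in for the one in the slide notes.
    CREATE TABLE works_on_info (emp_name TEXT, proj_name TEXT, hours_per_week REAL);
    INSERT INTO employee VALUES ('123456789', 'Smith'), ('453453453', 'English');
    INSERT INTO project  VALUES (1, 'ProductX'), (2, 'ProductY');
    INSERT INTO works_on VALUES ('123456789', 1, 32.5), ('453453453', 2, 20.0);
""")

conn.execute("""
    INSERT INTO works_on_info (emp_name, proj_name, hours_per_week)
    SELECT E.lname, P.pname, W.hours
    FROM project P, works_on W, employee E
    WHERE P.pnumber = W.pno AND W.essn = E.ssn""")
rows = conn.execute("SELECT * FROM works_on_info ORDER BY emp_name").fetchall()
print(rows)  # [('English', 'ProductY', 20.0), ('Smith', 'ProductX', 32.5)]
```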


DELETE Statement

The DELETE statement removes zero or more tuples (rows) from a relation (table).
DELETE FROM <Table> [WHERE <condition>];

DELETE operates on only a single table.

The WHERE clause is optional and identifies which rows to remove (just like in SELECT); without it, all rows are removed.

Removing a tuple cannot violate a schema integrity constraint.

A tuple that is referenced by another tuple through a foreign key cannot be deleted (unless the CASCADE option is used).
For example, an EMPLOYEE cannot be removed while it is referenced by a DEPENDENT through the essn attribute.


Example DELETE statements

DELETE FROM EMPLOYEE WHERE lname = 'Brown';

Removes the rows from EMPLOYEE where lname = 'Brown'.

DELETE FROM EMPLOYEE WHERE ssn = '123456789';

Removes a single tuple, as EMPLOYEE.ssn is a unique attribute.

DELETE FROM EMPLOYEE WHERE dno = 5;

Deletes the N employees of department 5.

DELETE FROM EMPLOYEE;

Removes all rows from EMPLOYEE, but not the table itself.
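The four DELETE examples can be replayed on hypothetical rows, with the cursor's rowcount reporting how many tuples each statement removed. The sketch assumes SQLite via Python's sqlite3 module. After the unconditional DELETE the table is empty but still listed in the catalog.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employee (ssn TEXT PRIMARY KEY, lname TEXT, dno INTEGER)")
conn.executemany("INSERT INTO employee VALUES (?, ?, ?)", [
    ("123456789", "Smith", 5), ("333445555", "Wong", 5), ("666884444", "Narayan", 5),
    ("999887777", "Brown", 4), ("987654321", "Brown", 4), ("453453453", "English", 1),
])

by_name = conn.execute("DELETE FROM employee WHERE lname = 'Brown'").rowcount
by_key = conn.execute("DELETE FROM employee WHERE ssn = '123456789'").rowcount
by_dept = conn.execute("DELETE FROM employee WHERE dno = 5").rowcount
print(by_name, by_key, by_dept)  # 2 1 2

# No WHERE clause: every remaining row is removed, but the table still exists.
conn.execute("DELETE FROM employee")
left = conn.execute("SELECT COUNT(*) FROM employee").fetchone()[0]
table_still_there = conn.execute(
    "SELECT COUNT(*) FROM sqlite_master WHERE type = 'table' AND name = 'employee'"
).fetchone()[0]
print(left, table_still_there)  # 0 1
```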


UPDATE Statement

The UPDATE statement modifies the attribute values of existing tuples.
UPDATE <table>
SET <attribute> = <value> [, <attribute> = <value>]...
[WHERE <conditions>];

UPDATE can only operate on a single table.

Each <attribute> = <value> pair specifies the new value for that attribute.
The WHERE conditions identify the tuples to modify; without a WHERE clause, every row is updated.


UPDATE Statement

UPDATE EMPLOYEE
SET salary = salary * 1.1
WHERE dno = 5;

Give all employees belonging to Department 5 a 10% raise.
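The raise can be applied to a few hypothetical rows to confirm that only department 5 changes. The sketch assumes SQLite via Python's sqlite3 module. Salaries are stored as REAL, so the computed values are compared approximately rather than printed as exact figures.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employee (lname TEXT, salary REAL, dno INTEGER)")
conn.executemany("INSERT INTO employee VALUES (?, ?, ?)",
                 [("Smith", 30000, 5), ("Wong", 40000, 5), ("Wallace", 43000, 4)])

# rowcount reports how many tuples the WHERE clause selected for modification.
changed = conn.execute(
    "UPDATE employee SET salary = salary * 1.1 WHERE dno = 5").rowcount
salaries = dict(conn.execute("SELECT lname, salary FROM employee"))
print(changed)  # 2
print(salaries)
```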

