1 SQL Overview

CS 680 Data Warehouses Lecture 1
Overview of Relational Model, SQL, normalization
Agenda for this Lecture

Crash course on database theory
ER-Diagrams as a tool for conceptual modeling
Relational model as a tool for logical modeling
Relational algebra
SQL Data Manipulation Language overview
Database normalization

Sources for this lecture:
1. Principles of Database and Knowledge-Base Systems, Volume 1, by Jeffrey Ullman, or any other basic database textbook (chapters 2, 4, 7)
2. Database System Concepts, by A. Silberschatz, H. F. Korth, S. Sudarshan (chapter 7)
3. Internet, these notes, any other textbook on databases
Introduction to the theory of databases

Conceptual data model & entity relationship diagrams
Logical data model & the relational data model
Physical data model & DBMS data model
[Figure: Conceptual level: ER diagrams; Logical level: relational model; Physical level: SQL tables]
Entity Relationship Data Model

Entity Relationship Model: a tool for conceptual data modeling
Entity sets: a group consisting of all similar entities
e.g., all persons, all claims, all cars
Attributes: properties of entity sets, which associate each entity in the set with a value from a domain of values for that attribute
e.g., social security number, name, address, occupation, employer
Domains of values: integers, dates, strings, etc.
Keys: one or more attributes that uniquely identify an entity in the set
Isa hierarchies: A isa B (read "A is a B") if entity set B is a generalization of entity set A or, equivalently, A is a special kind of B.
The primary purpose of defining an isa hierarchy between two entity sets is that A can inherit the attributes of B, e.g., an employee is a person, a claimant is a person
Relationships: an ordered list of entity sets
e.g., employees work for companies, a supplier supplies parts
Relationship types: one-to-many, one-to-one, many-to-many
Example of Entity Relationship Diagram (ER-diagram)

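[Figure: ER diagram with entity sets EMPS (Name, Phone, Salary), DEPTS (Name, Location), MANAGERS (Name) and PERSONS (Name); relationships Assigned To between EMPS and DEPTS (one-to-many), Manages between MANAGERS and DEPTS (one-to-one), and Parent Of on PERSONS (many-to-many). Note the presence of the arrow in a relationship.]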
Logical Data Models

A (logical) data model is a mathematical formalism with two parts:
A notation for describing data
A set of operations used to manipulate that data
Models: relational, network, hierarchical, object-oriented
Relational Data Model

Introduced by E. Codd in 1970
Generally, the model of choice for databases
Supports simple and declarative languages
Based on the set-theoretic notion of a relation: a relation is a subset of the Cartesian product of one or more domains, where a domain is simply a set of values, not unlike a data type
Domains: character strings of size 20, integers, etc.
Cartesian product: $D_1 \times D_2 \times \cdots \times D_n$ is the set of all tuples $(v_1, v_2, \ldots, v_n)$ such that $v_1 \in D_1$, $v_2 \in D_2$, etc.
Members of a relation are called tuples
Operations: relational algebra, relational calculus
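A minimal Python sketch of these definitions, using two small hypothetical domains (`D1`, `D2`): `itertools.product` builds the Cartesian product, and a relation is simply a set of tuples contained in it.

```python
from itertools import product

# Two small, hypothetical domains
D1 = {"Acme", "Ajax"}          # supplier names
D2 = {"Brie", "Feta", "Edam"}  # items

# D1 x D2: all tuples (v1, v2) with v1 in D1 and v2 in D2
cartesian = set(product(D1, D2))

# A relation over D1 x D2 is any subset of the Cartesian product;
# its members are the tuples
supplies = {("Acme", "Brie"), ("Ajax", "Feta")}

print(len(cartesian))          # 6 = |D1| * |D2|
print(supplies <= cartesian)   # True: supplies is a relation over D1 x D2
```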
Representing ER Diagrams in the relational model

Entity set -> Relation
EMPS(ENAME, PHONE, SALARY)
DEPT(DNAME, LOCATION)
MANAGERS(MNAME): doesn't make sense
Attributes -> Attributes
Relationships -> Relation

MANAGES(MNAME, DNAME)
ASSIGNED_TO(ENAME, DNAME)
PARENT_OF(PNAME, CNAME)
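One way to see the mapping in a running system is to declare these relations in an in-memory SQLite database through Python's standard `sqlite3` module. The slides do not declare keys, so the primary-key and `UNIQUE` choices below are assumptions that encode the relationship types from the ER diagram (for example, that an employee is assigned to at most one department); the sample rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Entity sets -> relations (attributes -> attributes)
    CREATE TABLE EMPS (ENAME TEXT PRIMARY KEY, PHONE TEXT, SALARY REAL);
    CREATE TABLE DEPT (DNAME TEXT PRIMARY KEY, LOCATION TEXT);

    -- Relationships -> relations over the keys of the participating entity sets.
    -- Assumption: an employee is assigned to at most one department
    -- (one-to-many), so ENAME alone is the key of ASSIGNED_TO.
    CREATE TABLE ASSIGNED_TO (ENAME TEXT PRIMARY KEY, DNAME TEXT);
    -- One-to-one: each side may appear at most once
    CREATE TABLE MANAGES (MNAME TEXT UNIQUE, DNAME TEXT UNIQUE);
    -- Many-to-many: only the pair (parent, child) is unique
    CREATE TABLE PARENT_OF (PNAME TEXT, CNAME TEXT, PRIMARY KEY (PNAME, CNAME));
""")

conn.execute("INSERT INTO EMPS VALUES ('Ann', '555-0100', 50000)")
conn.execute("INSERT INTO DEPT VALUES ('Sales', 'Boston')")
conn.execute("INSERT INTO DEPT VALUES ('HR', 'Denver')")
conn.execute("INSERT INTO ASSIGNED_TO VALUES ('Ann', 'Sales')")

# A second department for the same employee violates the one-to-many key
try:
    conn.execute("INSERT INTO ASSIGNED_TO VALUES ('Ann', 'HR')")
    second_assignment_ok = True
except sqlite3.IntegrityError:
    second_assignment_ok = False

# Many-to-many: a parent may have several children and a child several parents
conn.executemany("INSERT INTO PARENT_OF VALUES (?, ?)",
                 [("Maria", "Peter"), ("Maria", "Ann"), ("Nick", "Peter")])
print(second_assignment_ok)  # False
```

The key of each relationship relation is what distinguishes one-to-many, one-to-one and many-to-many in the relational schema.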
Relational Algebra, basic operations

Union, A ∪ B: all the tuples that are in A, in B, or in both
Set difference, A - B: all the tuples in A but not in B
Cartesian product, A × B: all possible tuples <a, b> where a is in A and b is in B
Projection: we take a relation and remove some of its attributes
Selection: we take a relation and keep only the tuples satisfying a given condition
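A small Python sketch of the five basic operations, modeling a relation as a pair (attribute names, set of tuples). The helper names (`union`, `difference`, `cartesian_product`, `project`, `select`) and the sample relations are hypothetical, chosen only to mirror the operators; set semantics means duplicates vanish automatically, as in the relational model.

```python
# A relation is modeled as a pair (attribute names, set of tuples).
# The helper names below are hypothetical, chosen to mirror the operators.

def union(r, s):
    assert r[0] == s[0], "union needs relations with the same attributes"
    return (r[0], r[1] | s[1])

def difference(r, s):
    assert r[0] == s[0], "difference needs relations with the same attributes"
    return (r[0], r[1] - s[1])

def cartesian_product(r, s):
    # every <a, b> with a in r and b in s (attribute names assumed distinct)
    return (r[0] + s[0], {a + b for a in r[1] for b in s[1]})

def project(r, attrs):
    idx = [r[0].index(a) for a in attrs]
    return (tuple(attrs), {tuple(t[i] for i in idx) for t in r[1]})

def select(r, condition):
    # condition receives one tuple as a dict {attribute: value}
    return (r[0], {t for t in r[1] if condition(dict(zip(r[0], t)))})

A = (("name",), {("Acme",), ("Ajax",)})
B = (("name",), {("Ajax",), ("Bolt",)})
customers = (("name", "balance"), {("Dimitra", -10), ("Peter", 25)})

print(sorted(union(A, B)[1]))       # [('Acme',), ('Ajax',), ('Bolt',)]
print(sorted(difference(A, B)[1]))  # [('Acme',)]
print(select(customers, lambda t: t["balance"] < 0)[1])  # {('Dimitra', -10)}
```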
Relational Algebra, other operations

Join: we take the Cartesian product A × B, select tuples based on an equality or some other arithmetic comparison operator, and then project a number of columns
Equijoin: a join where the condition is an equality
Outer join: we take a join (A join B) and also include the non-matching records, with NULLs in the attributes that have no match
Left outer join: the non-matching records come from A
Right outer join: the non-matching records come from B
Semijoin: we take the join (A join B) and project on all the columns of A
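The derived operations can be sketched the same way. This version models rows as Python dicts and uses `None` for SQL NULL; the helper names and the `suppliers`/`parts` rows are hypothetical. A right outer join is the left outer join with the arguments swapped.

```python
# Rows are dicts; a condition is a function of one row from each side.
# Helper names are hypothetical; None plays the role of SQL NULL.

def join(a, b, cond):
    # Cartesian product, then keep the pairs that satisfy cond
    return [{**x, **y} for x in a for y in b if cond(x, y)]

def left_outer_join(a, b, cond):
    b_cols = {k for y in b for k in y}
    result = []
    for x in a:
        matches = [{**x, **y} for y in b if cond(x, y)]
        if matches:
            result.extend(matches)
        else:
            # non-matching row of A, padded with None for B's attributes
            result.append({**x, **{k: None for k in b_cols if k not in x}})
    return result

def semijoin(a, b, cond):
    # rows of A that join with at least one row of B, keeping only A's columns
    return [x for x in a if any(cond(x, y) for y in b)]

def same_part(x, y):
    # equijoin condition
    return x["part_id"] == y["part_id"]

suppliers = [{"sname": "Acme", "part_id": 1}, {"sname": "Ajax", "part_id": 3}]
parts = [{"part_id": 1, "pname": "Brie"}, {"part_id": 2, "pname": "Feta"}]

print(join(suppliers, parts, same_part))
print(left_outer_join(suppliers, parts, same_part))
print(semijoin(suppliers, parts, same_part))
```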
The SQL Data Manipulation Language

A real query language, mostly based on relational algebra
The select statement:
select R1.A1, R2.A2, ..., Rn.An from R1, R2, ..., Rn where <condition>
Example:
select NAME from CUSTOMERS where BALANCE < 0;
SQL: The Select Statement

The select statement
select R1.A1, R2.A2, ..., Rn.An   (projection)
from R1, R2, ..., Rn   (Cartesian product)
where <condition>   (selection)
select name from customers
projection
SQL: The select statement

select name as cname, balance, addr from customers
projection (+ renaming)
select * from customers where balance < 0
selection
select name from customers where balance < 0
projection + selection
select distinct location from suppliers
set (duplicates removed)
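To try these statements, here is a sketch that runs them against an in-memory SQLite database via Python's `sqlite3` module. The rows are hypothetical, and the `ORDER BY` clauses are additions that only make the printed output deterministic. Note how `as cname` renames the result column and how `distinct` turns the result into a set.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical sample data; table and column names follow the slides
conn.execute("CREATE TABLE customers (name TEXT, balance REAL, addr TEXT)")
conn.executemany("INSERT INTO customers VALUES (?, ?, ?)", [
    ("Dimitra", -20.0, "1 Elm St"),
    ("Peter", 35.5, "9 Oak Ave"),
    ("Maria", -5.0, "4 Pine Rd"),
])
conn.execute("CREATE TABLE suppliers (name TEXT, item TEXT, location TEXT)")
conn.executemany("INSERT INTO suppliers VALUES (?, ?, ?)", [
    ("Acme", "Brie", "Boston"),
    ("Acme", "Feta", "Boston"),
    ("Ajax", "Brie", "Denver"),
])

# projection + renaming: the result column is called cname
cur = conn.execute("SELECT name AS cname, balance, addr FROM customers")
print([d[0] for d in cur.description])  # ['cname', 'balance', 'addr']

# projection + selection (ORDER BY only makes the output deterministic)
debtors = [row[0] for row in conn.execute(
    "SELECT name FROM customers WHERE balance < 0 ORDER BY name")]
print(debtors)  # ['Dimitra', 'Maria']

# without DISTINCT, 'Boston' appears twice; with it, the result is a set
all_locs = [row[0] for row in conn.execute("SELECT location FROM suppliers")]
distinct_locs = [row[0] for row in conn.execute(
    "SELECT DISTINCT location FROM suppliers ORDER BY location")]
print(len(all_locs), distinct_locs)  # 3 ['Boston', 'Denver']
```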
SQL: The select statement

select * from suppliers, parts
Cartesian product
select * from suppliers, parts where suppliers.part_id = parts.part_id
Join (equijoin)
select suppliers.part_id from suppliers, parts where suppliers.part_id = parts.part_id and suppliers.name = 'MySupplier'
Join, selection, projection
select c1.name, c2.addr from customers c1, customers c2 where c1.balance < c2.balance and c2.name = 'Dimitra'
Tuple variables
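A sketch of the same queries in SQLite (through Python's `sqlite3`), with hypothetical schemas shaped after the slide: `suppliers(name, part_id)`, `parts(part_id, pname)` and `customers(name, balance, addr)`. The self-join shows tuple variables `c1` and `c2` ranging independently over the same table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical schemas and data, shaped after the slide's queries
conn.executescript("""
    CREATE TABLE suppliers (name TEXT, part_id INTEGER);
    CREATE TABLE parts (part_id INTEGER, pname TEXT);
    CREATE TABLE customers (name TEXT, balance REAL, addr TEXT);
    INSERT INTO suppliers VALUES ('MySupplier', 1), ('Other', 2), ('MySupplier', 3);
    INSERT INTO parts VALUES (1, 'bolt'), (2, 'nut');
    INSERT INTO customers VALUES ('Dimitra', 100, '1 Elm St'),
                                 ('Peter', 50, '9 Oak Ave'),
                                 ('Maria', 150, '4 Pine Rd');
""")

# Cartesian product: 3 suppliers x 2 parts = 6 rows
n_product = conn.execute("SELECT COUNT(*) FROM suppliers, parts").fetchone()[0]

# Join + selection + projection (part 3 has no match in parts)
part_ids = conn.execute("""
    SELECT suppliers.part_id FROM suppliers, parts
    WHERE suppliers.part_id = parts.part_id AND suppliers.name = 'MySupplier'
""").fetchall()

# Tuple variables: c1 and c2 range over the same table (a self-join)
poorer = conn.execute("""
    SELECT c1.name, c2.addr FROM customers c1, customers c2
    WHERE c1.balance < c2.balance AND c2.name = 'Dimitra'
""").fetchall()

print(n_product, part_ids, poorer)  # 6 [(1,)] [('Peter', '1 Elm St')]
```

The self-join returns each customer whose balance is below Dimitra's, paired with Dimitra's address (`c2.addr`), exactly as the select list asks.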
SQL: The select statement

select item from supplies where item like 'E%'
Pattern matching
select * from orders where order_num like '1____'
Pattern matching
select * from claims where submitted_date = systemGetCurrentDate()
System functions
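A sketch of pattern matching in SQLite via Python's `sqlite3`: `%` matches any sequence of characters and `_` exactly one. The slide's `systemGetCurrentDate()` appears to stand for a vendor-specific function; SQLite's counterpart is `CURRENT_DATE` (a UTC `'YYYY-MM-DD'` string), which is what this sketch uses. The data is hypothetical, `order_num` is assumed to be stored as text, and SQLite's `LIKE` is case-insensitive for ASCII letters by default, unlike some other systems.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE supplies (name TEXT, item TEXT);
    CREATE TABLE orders (order_num TEXT);
    CREATE TABLE claims (id INTEGER, submitted_date TEXT);
    INSERT INTO supplies VALUES ('Acme', 'Escargot'), ('Acme', 'Brie'),
                                ('Ajax', 'Edam'), ('Ajax', 'Perrier');
    INSERT INTO orders VALUES ('10001'), ('1234'), ('20001'), ('12345');
    -- CURRENT_DATE is SQLite's current date (UTC) as 'YYYY-MM-DD'
    INSERT INTO claims VALUES (1, CURRENT_DATE), (2, '2005-01-01');
""")

# % matches any sequence of characters
e_items = [r[0] for r in conn.execute(
    "SELECT item FROM supplies WHERE item LIKE 'E%' ORDER BY item")]
# _ matches exactly one character: '1' followed by exactly four characters
five_chars = [r[0] for r in conn.execute(
    "SELECT order_num FROM orders WHERE order_num LIKE '1____' ORDER BY order_num")]
# system function for today's date
today = [r[0] for r in conn.execute(
    "SELECT id FROM claims WHERE submitted_date = CURRENT_DATE")]

print(e_items, five_chars, today)  # ['Edam', 'Escargot'] ['10001', '12345'] [1]
```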
SQL: The select statement

select name from supplies where item in (select item from orders where customer_name = 'Dimitra')
Subquery
Can you rewrite this query without using the subquery?
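One possible answer, sketched in SQLite via Python's `sqlite3`: the subquery becomes a join between `supplies` and `orders` on `item`. The data is hypothetical and deliberately includes a customer who ordered the same item twice, which is why the join version needs `DISTINCT`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical data; note that Dimitra ordered Brie twice
conn.executescript("""
    CREATE TABLE supplies (name TEXT, item TEXT);
    CREATE TABLE orders (customer_name TEXT, item TEXT);
    INSERT INTO supplies VALUES ('Acme', 'Brie'), ('Ajax', 'Feta'),
                                ('Acme', 'Perrier'), ('Bolt', 'Brie');
    INSERT INTO orders VALUES ('Dimitra', 'Brie'), ('Dimitra', 'Brie'),
                              ('Peter', 'Feta');
""")

with_subquery = conn.execute("""
    SELECT name FROM supplies
    WHERE item IN (SELECT item FROM orders WHERE customer_name = 'Dimitra')
    ORDER BY name
""").fetchall()

# The same question as a join; DISTINCT removes the duplicates that the
# join produces when a matching item was ordered more than once
with_join = conn.execute("""
    SELECT DISTINCT s.name FROM supplies s, orders o
    WHERE s.item = o.item AND o.customer_name = 'Dimitra'
    ORDER BY s.name
""").fetchall()

print(with_subquery == with_join, with_join)  # True [('Acme',), ('Bolt',)]
```

The two forms agree as sets; they can differ in duplicate rows (for example, if `supplies` itself contained duplicates, the `IN` form would keep them while `DISTINCT` would not).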
SQL: Aggregate Operators

select avg(balance) from customers
average
select avg(balance) as avg_bal from customers
renaming
select count(distinct name) as dist_name from suppliers
renaming
select count(name) as brie_supps from suppliers where item = 'Brie'
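A sketch of the aggregate queries in SQLite via Python's `sqlite3`, on hypothetical rows. `Acme` appears twice in `suppliers`, so `count(distinct name)` is smaller than the row count.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (name TEXT, balance REAL);
    CREATE TABLE suppliers (name TEXT, item TEXT, price REAL);
    INSERT INTO customers VALUES ('Dimitra', -20), ('Peter', 35), ('Maria', 45);
    INSERT INTO suppliers VALUES ('Acme', 'Brie', 3.49), ('Ajax', 'Brie', 3.99),
                                 ('Acme', 'Feta', 1.19), ('Bolt', 'Perrier', 1.00);
""")

avg_bal = conn.execute("SELECT avg(balance) AS avg_bal FROM customers").fetchone()[0]
dist_name = conn.execute(
    "SELECT count(DISTINCT name) AS dist_name FROM suppliers").fetchone()[0]
brie_supps = conn.execute(
    "SELECT count(name) AS brie_supps FROM suppliers WHERE item = 'Brie'").fetchone()[0]

print(avg_bal, dist_name, brie_supps)  # 20.0 3 2
```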

SQL: Group aggregate operations

select item, avg(price) from suppliers group by item
Instead of a single value, many are computed, one per item
select item, avg(price) from suppliers group by item having count(*) > 1
group selection: having filters whole groups after grouping, whereas where filters individual rows before grouping
select ... having count(distinct price) > 1
Group selection
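A sketch of grouping and group selection in SQLite via Python's `sqlite3`, on hypothetical rows: `group by` yields one average per item, and `having` then discards whole groups. `Feta` has two rows with the same price, so it passes `count(*) > 1` but not `count(distinct price) > 1`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE suppliers (name TEXT, item TEXT, price REAL);
    INSERT INTO suppliers VALUES
        ('Acme', 'Brie', 3.49), ('Ajax', 'Brie', 3.99),
        ('Acme', 'Feta', 1.19), ('Bolt', 'Feta', 1.19),
        ('Bolt', 'Perrier', 1.00);
""")

# One average per item group
per_item = conn.execute(
    "SELECT item, avg(price) FROM suppliers GROUP BY item ORDER BY item").fetchall()

# HAVING keeps only the groups with more than one supplier row
popular = conn.execute("""
    SELECT item, avg(price) FROM suppliers
    GROUP BY item HAVING count(*) > 1 ORDER BY item
""").fetchall()

# Groups whose rows carry at least two different prices
varied = conn.execute("""
    SELECT item FROM suppliers
    GROUP BY item HAVING count(DISTINCT price) > 1
""").fetchall()

print(len(per_item), [item for item, _ in popular], varied)
# 3 ['Brie', 'Feta'] [('Brie',)]
```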
SQL: Insert
insert into suppliers values ('Ajax', 'Escargot', 24)
must give all the values in the same order as defined in the table
insert into suppliers (name, item, price) values ('Ajax', 'Escargot', 24)
the attributes are named explicitly
insert into suppliers (name, item) values ('Ajax', 'Escargot')
missing attributes default to NULL (or to the column's declared default, if any)
insert into acme_sells select item, price from supplies where name = 'Acme'
inserted values are computed
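A sketch of the four insert forms in SQLite via Python's `sqlite3`. The tables and rows are hypothetical, and `acme_sells(item, price)` is assumed to exist with matching columns. The named-column insert deliberately lists the columns in a different order to show that the column list, not the table definition, fixes the order of the values.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE suppliers (name TEXT, item TEXT, price REAL);
    CREATE TABLE supplies (name TEXT, item TEXT, price REAL);
    CREATE TABLE acme_sells (item TEXT, price REAL);
    INSERT INTO supplies VALUES ('Acme', 'Brie', 3.49), ('Ajax', 'Feta', 1.19),
                                ('Acme', 'Perrier', 1.00);
""")

# Positional: values must follow the column order of the table definition
conn.execute("INSERT INTO suppliers VALUES ('Ajax', 'Escargot', 24)")
# Named columns: values follow the column list, not the table definition
conn.execute("INSERT INTO suppliers (item, name, price) VALUES ('Brie', 'Ajax', 5)")
# Missing column: price has no declared default, so it becomes NULL (None)
conn.execute("INSERT INTO suppliers (name, item) VALUES ('Ajax', 'Feta')")
missing = conn.execute(
    "SELECT price FROM suppliers WHERE item = 'Feta'").fetchone()[0]

# Computed values: the inserted rows come from a query
conn.execute(
    "INSERT INTO acme_sells SELECT item, price FROM supplies WHERE name = 'Acme'")
acme = conn.execute("SELECT item, price FROM acme_sells ORDER BY item").fetchall()

print(missing, acme)  # None [('Brie', 3.49), ('Perrier', 1.0)]
```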

SQL: Delete
delete from R where <condition>
generic form
delete from orders where name = 'Acme' and item = 'Brie'
selection
delete from orders where order_num in (select order_num from includes where item = 'Brie')
selection with a subquery
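A sketch of both delete forms in SQLite via Python's `sqlite3`. The schemas `orders(order_num, name, item)` and `includes(order_num, item)` and their rows are hypothetical. The subquery delete removes only one row here because order 1 was already deleted by the first statement; `cursor.rowcount` reports the number of rows a DML statement changed.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical schema: includes lists the items of each order
conn.executescript("""
    CREATE TABLE orders (order_num INTEGER, name TEXT, item TEXT);
    CREATE TABLE includes (order_num INTEGER, item TEXT);
    INSERT INTO orders VALUES (1, 'Acme', 'Brie'), (2, 'Acme', 'Feta'),
                              (3, 'Ajax', 'Edam'), (4, 'Bolt', 'Feta');
    INSERT INTO includes VALUES (1, 'Brie'), (2, 'Feta'), (3, 'Brie'), (3, 'Edam');
""")

# Simple selection: delete Acme's Brie order
conn.execute("DELETE FROM orders WHERE name = 'Acme' AND item = 'Brie'")

# Subquery: delete every order that includes Brie
cur = conn.execute("""
    DELETE FROM orders
    WHERE order_num IN (SELECT order_num FROM includes WHERE item = 'Brie')
""")

remaining = [r[0] for r in conn.execute(
    "SELECT order_num FROM orders ORDER BY order_num")]
print(cur.rowcount, remaining)  # 1 [2, 4]
```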
SQL: Update
update R set A1 = E1, ..., Ak = Ek where <condition>
generic form
update supplies set price = 1 where name = 'Acme' and item = 'Perrier'
specific tuple update
update supplies set price = 0.8 * price where name = 'Acme'
group update
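A sketch of the two updates in SQLite via Python's `sqlite3`, on hypothetical rows. The group update computes each new price from the row's old price; prices are rounded only for display, since floating-point arithmetic can leave tiny representation errors.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE supplies (name TEXT, item TEXT, price REAL);
    INSERT INTO supplies VALUES ('Acme', 'Perrier', 1.25), ('Acme', 'Brie', 5.00),
                                ('Ajax', 'Brie', 4.00);
""")

# Specific tuple update: exactly one row matches
one = conn.execute(
    "UPDATE supplies SET price = 1 WHERE name = 'Acme' AND item = 'Perrier'").rowcount
# Group update: every Acme row gets a 20% discount, computed from its old price
group = conn.execute(
    "UPDATE supplies SET price = 0.8 * price WHERE name = 'Acme'").rowcount

prices = {(n, i): round(p, 2) for n, i, p in conn.execute("SELECT * FROM supplies")}
print(one, group, prices)
```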
SQL: Create/Drop Table

create table R
generic form
create table supplies (name char(20) not null, item char(10) not null, price number(6,2))
creates a specific table
Drop table R
Deletes a table
Drop table supplies
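A sketch of creating and dropping the `supplies` table in SQLite via Python's `sqlite3`. SQLite accepts the slide's type names, but, unlike systems such as Oracle where `number(6,2)` is native, it does not enforce the declared lengths or precision; it does enforce `not null`. The `table_exists` helper is hypothetical and queries SQLite's `sqlite_master` catalog.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite accepts these type names but does not enforce the declared
# lengths (20, 10) or the precision/scale (6,2)
conn.execute("""
    CREATE TABLE supplies (
        name  CHAR(20) NOT NULL,
        item  CHAR(10) NOT NULL,
        price NUMBER(6,2)
    )""")

conn.execute("INSERT INTO supplies VALUES ('Acme', 'Brie', 3.49)")
try:
    # item is NOT NULL and has no default, so this insert is rejected
    conn.execute("INSERT INTO supplies (name, price) VALUES ('Ajax', 1.00)")
    null_item_ok = True
except sqlite3.IntegrityError:
    null_item_ok = False

def table_exists(name):
    # hypothetical helper: look the table up in SQLite's catalog
    return conn.execute(
        "SELECT 1 FROM sqlite_master WHERE type = 'table' AND name = ?", (name,)
    ).fetchone() is not None

before = table_exists("supplies")
conn.execute("DROP TABLE supplies")  # removes the table and all its rows
after = table_exists("supplies")
print(null_item_ok, before, after)  # False True False
```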
How it all comes together

Level        Model              Operations
Conceptual   ER diagram         No operations at this level
Logical      Relational model   Relational algebra
Physical     SQL tables         SQL
Database Design Theory

A problematic design:
supplier (sname, address, item, price)
Redundancy: the address of the supplier is repeated once for each item
Potential inconsistency (update anomalies): if we update the address of a supplier in one record, we must make sure we update it in all the records
Insertion anomalies: we cannot record the address of a supplier if that supplier does not currently supply at least one item
Deletion anomalies: if we delete all the items supplied by a supplier, we unintentionally lose track of the supplier's address
A better design (?):
supplier (sname, address)
supplies (sname, item, price)
Database Design and Data Dependencies

supplier (sname, address)
supplies (sname, item, price)
Design advantage: eliminated redundancy
Design disadvantage: to find the addresses of the suppliers of a given item, we need a join instead of a simple selection and projection
Are there any other problems with this design?
How do we find a good replacement for a bad design?
The cause and cure of the redundancy go hand in hand: functional dependencies not only cause the redundancy but also permit the decomposition of the original relation into two relations such that the original relation can be recovered from them, yielding a new design that eliminates the redundancy.
Is this a better design? Why?
SNAME  ADDRESS       ITEM  PRICE
Acme   16 River St   Brie  3.49
Acme   ?????         Brie  1.19
Functional dependency SNAME -> ADDRESS: ???
[Figure: original relation A, decomposed into B]
Functional Dependencies
Constraints that depend only on the equality or inequality of values
Let R(A1, ..., An) be a relation, and let X and Y be subsets of {A1, ..., An}. We say X->Y, read "X functionally determines Y" or "Y functionally depends on X", if it is not possible for R to have two tuples that agree on X but not on Y.
Example: {sname}->{address}
Another way to say this: if you know the value of X, then you know (i.e., you can determine) the value of Y.
Not every integrity constraint is a functional dependency (examples of integrity constraints that are not functional dependencies: no one with an employment history of 37 years is 27 years old, no one is 60 feet tall, etc.)
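The definition translates directly into a check over one relation instance: remember the Y-value seen for each X-value and report a violation when a second, different one appears. The function name `satisfies_fd` and the rows are hypothetical. An instance can only expose a violation; an FD is a constraint on every legal instance, so passing this check on one instance does not prove the dependency holds.

```python
def satisfies_fd(rows, X, Y):
    """Return True if no two rows agree on all attributes in X but differ on Y."""
    y_for_x = {}
    for row in rows:
        x_val = tuple(row[a] for a in X)
        y_val = tuple(row[a] for a in Y)
        # keep the first Y-value seen for this X-value; any other is a violation
        if y_for_x.setdefault(x_val, y_val) != y_val:
            return False
    return True

supplier = [
    {"sname": "Acme", "address": "16 River St", "item": "Brie", "price": 3.49},
    {"sname": "Acme", "address": "16 River St", "item": "Feta", "price": 1.19},
]
print(satisfies_fd(supplier, ["sname"], ["address"]))  # True

supplier.append({"sname": "Acme", "address": "123 Main St", "item": "Edam", "price": 2.50})
print(satisfies_fd(supplier, ["sname"], ["address"]))  # False
print(satisfies_fd(supplier, ["item"], ["price"]))     # True: all items differ
```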
Functional Dependencies in a Real Database

Key dependencies:
Policy -> Benefit, Effective Date, Expiration Date
Claim -> Person, Policy, Payment
Foreign key dependencies:
CLAIM.Policy_id -> Policy.Policy_id

POLICY TABLE
Policy_id  Benefit  Effective Date  Expiration Date
Life123    100K     1/1/05          1/1/15
Acc123     500K     1/1/05          1/1/06

CLAIM TABLE
Claim  Person  Policy_id  Payment
1      Maria   Life123    100K
2      Peter   Acc123     500K
Desired Database Design Properties

When designing a database, there are some desired properties one strives for:
Dependency preservation
Loss-less join decomposition
BCNF (Boyce-Codd normal form)
or, failing that,
Dependency preservation
Loss-less join decomposition
3NF (third normal form)
Dependency Preservation
If a given database schema satisfies a set of functional dependencies and the schema is modified, the new schema should also satisfy the same functional dependencies; that is, the new schema should not permit invalid data to be added.
If supplier (sname, address, item, price) satisfies sname->address,
then the decomposition supplier (sname, address), supplies (sname, item, price) also satisfies sname->address:

SNAME  ADDRESS
Acme   16 River St

SNAME  ITEM  PRICE
Acme   Brie  3.49
Acme   Brie  1.19
Dependency Preservation
If supplier (sname, address, item, price) satisfies sname->address,
then the decomposition supplier (sname, item), supplies (sname, address, price) does not satisfy sname->address:

SNAME  ITEM
Acme   Brie

SNAME  ADDRESS       PRICE
Acme   16 River St   3.49
Acme   123 Main St   1.19

This is a schema that does not preserve the functional dependencies
Loss-less Join Decomposition

Let's say you take a relation R and decompose it into relations R1, R2, ..., Rn using a set of dependencies D. If joining R1, R2, ..., Rn back together gives you exactly the original relation (for every instance of R that satisfies D), the decomposition is loss-less with respect to D.
[Figure: original relation A, decomposed into B, then joined back: original relation???]
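A Python sketch of the test on a single instance: project the rows onto each schema, natural-join the projections, and compare with the original. The helper names and the instance (built from the slides' values so that it satisfies sname->address) are hypothetical. The join always contains the original tuples, so the question is whether spurious tuples appear. Finding one on an instance that satisfies D shows the decomposition is lossy; passing on one instance does not by itself prove loss-lessness for all instances.

```python
def project(rows, attrs):
    # set semantics: identical projected tuples collapse into one
    return {frozenset((a, r[a]) for a in attrs) for r in rows}

def natural_join(r1, r2):
    out = set()
    for t1 in r1:
        for t2 in r2:
            d1, d2 = dict(t1), dict(t2)
            # combine the tuples only if they agree on all shared attributes
            if all(d1[a] == d2[a] for a in d1.keys() & d2.keys()):
                out.add(frozenset({**d1, **d2}.items()))
    return out

def join_gives_back(rows, *schemas):
    """True if joining the projections of rows onto schemas reproduces rows."""
    original = {frozenset(r.items()) for r in rows}
    parts = [project(rows, s) for s in schemas]
    joined = parts[0]
    for p in parts[1:]:
        joined = natural_join(joined, p)
    return joined == original

supplier = [
    {"sname": "Acme", "address": "16 River St", "item": "Feta", "price": 3.49},
    {"sname": "Acme", "address": "16 River St", "item": "Brie", "price": 1.19},
]

good = join_gives_back(supplier, ("sname", "address"), ("sname", "item", "price"))
bad = join_gives_back(supplier, ("sname", "item"), ("sname", "address", "price"))
print(good, bad)  # True False
```

In the second decomposition the join on sname pairs every item with every price, producing spurious tuples such as (Acme, 16 River St, Feta, 1.19).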
Loss-less Join Decomposition

If supplier (sname, address, item, price) satisfies sname->address,
then the decomposition supplier (sname, item), supplies (sname, address, price) does not satisfy sname->address:

SNAME  ITEM
Acme   Feta
Acme   Brie

SNAME  ADDRESS       PRICE
Acme   16 River St   3.49
Acme   123 Main St   1.19

This is a schema that does not have the loss-less join property. Can you tell why?
Don't Overdo It
person (ssn, name, address, phone) with functional dependencies ssn->name, address, phone
decomposed into
person (ssn, name), person (ssn, address), person (ssn, phone) with functional dependencies ssn->name, address, phone
is probably not a good idea
Normal Forms
Boyce-Codd normal form (BCNF)
Third normal form (3NF)
BCNF is stronger (i.e., more difficult to achieve)
3NF is an approximation, and what appears to work in practice
Every schema in BCNF is also in 3NF, but not vice versa, i.e., 3NF permits data relationships that are disallowed by BCNF
Normal Forms: Boyce-Codd Normal Form

A relation R with dependencies F is said to be in BCNF if, whenever X->Y holds in R and Y is not contained in X, X is a superkey of R, i.e., X is a key or contains a key
Too strong a condition: we might not be able to modify a schema by decomposition and bring it into this form without giving up either dependency preservation or the loss-less join property
customer (name, street, city) with name->street, city
In BCNF
loan (branch, customer, loan_num, amount) with loan_num->amount, branch
Not in BCNF!
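The BCNF test can be mechanized with the attribute closure X+, the set of attributes determined by X: a nontrivial X->Y violates BCNF when X+ does not cover the whole schema. Standard textbooks (e.g., Silberschatz et al.) note that for a single relation it suffices to check the dependencies in the given set F rather than all of F+. A sketch, with hypothetical helper names `closure`, `bcnf_violations` and `fd`:

```python
def closure(attrs, fds):
    """Attribute closure X+: everything functionally determined by attrs under fds."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

def bcnf_violations(schema, fds):
    """Nontrivial dependencies in fds whose left side is not a superkey of schema."""
    return [(lhs, rhs) for lhs, rhs in fds
            if not rhs <= lhs and not set(schema) <= closure(lhs, fds)]

def fd(lhs, rhs):
    # fd("loan_num", "amount branch") builds ({loan_num}, {amount, branch})
    return (frozenset(lhs.split()), frozenset(rhs.split()))

customer = {"name", "street", "city"}
customer_fds = [fd("name", "street city")]

loan = {"branch", "customer", "loan_num", "amount"}
loan_fds = [fd("loan_num", "amount branch")]

print(bcnf_violations(customer, customer_fds))  # []: name is a key, so BCNF
print(len(bcnf_violations(loan, loan_fds)))     # 1: loan_num is not a superkey
print(closure({"loan_num", "customer"}, loan_fds) == loan)  # True: a key of loan
```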
Normal Forms: 3NF

A relation R with dependencies F is said to be in 3NF if, whenever X->Y holds in R and Y is not contained in X, either X is a superkey of R (X is a key or contains a key) or every attribute of Y that is not in X is contained in some candidate key of R
BCNF requires that all nontrivial dependencies be of the form X->Y where X is a superkey; 3NF relaxes this requirement slightly by allowing nontrivial functional dependencies whose X is not a superkey, provided the attributes they determine belong to a candidate key
loan (branch, customer, loan_num, amount) with loan_num->amount, branch: the only candidate key is {loan_num, customer}; amount and branch do not belong to any candidate key, so this relation is not in 3NF either
3NF simplified
First rule of normal form
remove redundant data from horizontal rows; all data should be held in columns and rows
Second rule of normal form

remove redundant data from vertical rows; values uniquely identify each row in each table
Third rule of normal form

remove data values independent of primary row keys; each table contains unique data
What should one do?

Decompose the heck out of a relation to eliminate redundancy?
Assume that the cost of storing and manipulating redundant data is an acceptable trade-off given the benefits of fast access?
Normalization is good for eliminating redundancy, but it generally comes at the cost of less efficient queries (more joins)
Data warehouses are (typically) relational databases that:
don't care so much about redundancy
are optimized for fast querying using dimensional modeling techniques
however, there are recent trends (04/05) that tend to champion 3NF data warehouses over dimensionally modeled data warehouses
