Database Systems
Winter 2016
Prof. Tyson Condie
Data is Everywhere
Easier and cheaper than ever to collect
Data grows faster than Moore's Law
[Figure: growth of Overall Data vs. Moore's Law, 2010-2015]
(IDC report*)
Sources:
"Big Data: The Next Frontier for Innovation, Competition and Productivity."
US Bureau of Labor Statistics | McKinsey Global Institute Analysis
Obviously, Search
Google & Bing
Cloud services
Amazon, Google AppEngine, MS Azure,
What is a Database?
In Other Words
Relational Database Management Systems were
invented to let you use one set of data in multiple
ways, including ways that are unforeseen at the
time the database is built and the 1st applications
are written.
(Curt Monash, analyst/blogger)
That is, think about the data independently of any particular
program.
[Figure: levels of abstraction, top to bottom: views (View 2, View 3), Conceptual Schema, Physical Schema, DB]
A Simple Idea: Applications should be
insulated from how data is structured and
stored.
sid    name   login       age  gpa
53666  Jones  jones@cs    18   3.4
53688  Smith  smith@eecs  18   3.2
53650  Smith  smith@math  19   3.8
Physical schema:
Relations stored as unordered files
Index on first column of Students, first 2 cols of Enrolled
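The physical schema above can be sketched with Python's built-in `sqlite3` module. This is a minimal illustration, not the course's setup: the column types and the index names `idx_students_sid` and `idx_enrolled_sid_cid` are assumptions. It shows that adding indexes changes how data is stored and found, but not what a query returns.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Students (sid TEXT, name TEXT, login TEXT, age INTEGER, gpa REAL);
CREATE TABLE Enrolled (sid TEXT, cid TEXT, grade TEXT);
INSERT INTO Students VALUES
  ('53666', 'Jones', 'jones@cs',   18, 3.4),
  ('53688', 'Smith', 'smith@eecs', 18, 3.2),
  ('53650', 'Smith', 'smith@math', 19, 3.8);
""")

query = "SELECT name, gpa FROM Students WHERE sid = '53688'"

# Answer computed with no indexes at all.
before = conn.execute(query).fetchall()

# Physical schema change (index names are hypothetical):
# index on the first column of Students, first two columns of Enrolled.
conn.execute("CREATE INDEX idx_students_sid ON Students (sid)")
conn.execute("CREATE INDEX idx_enrolled_sid_cid ON Enrolled (sid, cid)")

# The same query text, unchanged, after the physical change.
after = conn.execute(query).fetchall()

index_names = sorted(
    row[0]
    for row in conn.execute("SELECT name FROM sqlite_master WHERE type = 'index'")
)
print(before == after, index_names)
```

The application-side query string never changes, which is the insulation the slides describe.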
What is a DBMS?
Data out: e.g. 2000
Query Optimization and Execution
Relational Operators
Access Methods
Buffer Management
Disk Space Management
Customer accounts stored on disk
These layers must consider concurrency control and recovery
[Figure: query plan tree. Operators: Count distinct, Having, Group(agg), Join, Select, Proj, Join. Inputs: Emp, Emp, Asgn, Emp. Relations: Employees, Projects, Assignments]
Issues: view reconciliation, operator ordering, physical operator
choice, memory management, access path (index) use,
Who?
Instructor
Prof. Tyson Condie (tcondie@cs.ucla.edu)
Teaching Assistant
Ariyam Das (ariyam@cs.ucla.edu)
Textbook
Database Management Systems: 3rd Edition
By Ramakrishnan and Gehrke
How? Workload
A New Set of Projects:
Grading Policy
Homework & Projects: 25%
Lab 1 (5%): Introduction to SimpleDB
Lab 2 (10%): SimpleDB Operators
Lab 3 (10%): Query Optimization
How? Administrivia
https://sites.google.com/site/cs143databasesystems/
Lecture notes will be posted before lecture
We will be using Piazza for most communication,
Office Hours: Prof. Condie M 4-5pm, W 4-5pm
Lecture: Boelter 2760
TA hours: see course website
Course waiting list instructions
http://cs.ucla.edu/classes/enroll
Structured (schema-first): Relational Database, Formatted Messages
Semi-Structured (schema-later): Documents, XML, Tagged Text/Media
Unstructured (schema-never): Plain Text, Media
name   login       age  gpa
Jones  jones@cs    18   3.4
Smith  smith@eecs  18   3.2
Smith  smith@math  19   3.8
DELETE
FROM Students S
WHERE S.name = 'Smith'
Powerful variants of these commands are available;
more later!
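The DELETE command above can be tried end to end with Python's `sqlite3` module. This is a sketch under assumptions: the column types are guesses, and the tuple variable `S` from the slide is dropped so the statement stays portable across SQL dialects.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Students (sid TEXT, name TEXT, login TEXT, age INTEGER, gpa REAL)"
)
conn.executemany(
    "INSERT INTO Students VALUES (?, ?, ?, ?, ?)",
    [
        ("53666", "Jones", "jones@cs", 18, 3.4),
        ("53688", "Smith", "smith@eecs", 18, 3.2),
        ("53650", "Smith", "smith@math", 19, 3.8),
    ],
)

# Delete every tuple whose name is 'Smith' (two rows match).
cur = conn.execute("DELETE FROM Students WHERE name = ?", ("Smith",))
deleted = cur.rowcount

remaining = conn.execute("SELECT sid, name FROM Students").fetchall()
print(deleted, remaining)
```

The WHERE clause selects by value, so both Smith tuples are removed by one statement.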
Keys
Keys are a way to associate tuples in different relations
Keys are one form of integrity constraint (IC)
Enrolled (sid: FOREIGN Key)
sid    cid          grade
53666  Carnatic101  C
53666  Reggae203    B
53650  Topology112  A
53666  History105   B

Students (sid: PRIMARY Key)
sid    name   login       age  gpa
53666  Jones  jones@cs    18   3.4
53688  Smith  smith@eecs  18   3.2
53650  Smith  smith@math  19   3.8
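As a rough sketch of how keys associate tuples in different relations, the two tables can be declared with a primary key and a foreign key and then joined on `sid`. The column types are assumptions, and SQLite is used only because it ships with Python.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite needs this to enforce foreign keys
conn.executescript("""
CREATE TABLE Students (
  sid TEXT PRIMARY KEY, name TEXT, login TEXT, age INTEGER, gpa REAL);
CREATE TABLE Enrolled (
  sid TEXT REFERENCES Students (sid), cid TEXT, grade TEXT);
INSERT INTO Students VALUES
  ('53666', 'Jones', 'jones@cs',   18, 3.4),
  ('53688', 'Smith', 'smith@eecs', 18, 3.2),
  ('53650', 'Smith', 'smith@math', 19, 3.8);
INSERT INTO Enrolled VALUES
  ('53666', 'Carnatic101', 'C'),
  ('53666', 'Reggae203',   'B'),
  ('53650', 'Topology112', 'A'),
  ('53666', 'History105',  'B');
""")

# The shared key value links each Enrolled tuple to exactly one Students tuple.
joined = conn.execute("""
    SELECT S.name, E.cid, E.grade
    FROM Students S JOIN Enrolled E ON S.sid = E.sid
    ORDER BY E.cid
""").fetchall()

# A second student with an existing sid violates the primary key.
try:
    conn.execute("INSERT INTO Students VALUES ('53666', 'Doe', 'doe@cs', 20, 3.0)")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

print(joined, duplicate_rejected)
```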
Primary Keys
A set of fields is a superkey if:
No two distinct tuples can have the same values in all key fields
E.g.
sid is a key for Students.
What about name?
The set {sid, gpa} is a superkey.
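The superkey condition can be checked against one table instance with a few lines of Python. The helper names `is_superkey` and `is_minimal_key` are hypothetical. Note the limitation: a key is a statement about every legal instance, so a check on sample data can refute a key but never prove one.

```python
from itertools import combinations

students = [
    {"sid": "53666", "name": "Jones", "login": "jones@cs", "age": 18, "gpa": 3.4},
    {"sid": "53688", "name": "Smith", "login": "smith@eecs", "age": 18, "gpa": 3.2},
    {"sid": "53650", "name": "Smith", "login": "smith@math", "age": 19, "gpa": 3.8},
]

def is_superkey(rows, fields):
    """True if no two rows in this instance agree on all the given fields."""
    seen = set()
    for row in rows:
        key = tuple(row[f] for f in fields)
        if key in seen:
            return False
        seen.add(key)
    return True

def is_minimal_key(rows, fields):
    """True if fields is a superkey here and no proper subset of it is."""
    if not is_superkey(rows, fields):
        return False
    return not any(
        is_superkey(rows, subset)
        for size in range(1, len(fields))
        for subset in combinations(fields, size)
    )

print(is_superkey(students, ["sid"]))            # sid values are all distinct
print(is_superkey(students, ["name"]))           # two students are named Smith
print(is_superkey(students, ["sid", "gpa"]))     # contains sid, so still unique
print(is_minimal_key(students, ["sid", "gpa"]))  # not minimal: sid alone suffices
```

This separates the two ideas in the examples above: `{sid, gpa}` identifies tuples uniquely, but it carries a redundant field.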
Other
Students can take only one course, and no two students in a course
receive the same grade.
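The constraint stated above can be written, as one possible encoding, with `sid` as the primary key of Enrolled and a uniqueness constraint on `(cid, grade)`. The helper `try_insert` and the column types are assumptions made for this sketch.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE Enrolled (
        sid TEXT, cid TEXT, grade TEXT,
        PRIMARY KEY (sid),      -- a student can take only one course
        UNIQUE (cid, grade)     -- no two students in a course share a grade
    )
""")

def try_insert(row):
    """Hypothetical helper: return True if the row is accepted, False if rejected."""
    try:
        conn.execute("INSERT INTO Enrolled VALUES (?, ?, ?)", row)
        return True
    except sqlite3.IntegrityError:
        return False

results = [
    try_insert(("53666", "Carnatic101", "C")),  # accepted: first row
    try_insert(("53666", "Reggae203", "B")),    # rejected: second course for 53666
    try_insert(("53650", "Carnatic101", "C")),  # rejected: grade C already used in Carnatic101
    try_insert(("53650", "Carnatic101", "B")),  # accepted: different grade
]
print(results)
```

Under this encoding the sample Enrolled data, where student 53666 takes three courses, would not be a legal instance, which shows how strongly the choice of keys restricts the data.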
Enrolled
sid    cid          grade
53666  Carnatic101  C
53666  Reggae203    B
53650  Topology112  A
53666  History105   B

11111 English102 A
Students
sid    name   login       age  gpa
53666  Jones  jones@cs    18   3.4
53688  Smith  smith@eecs  18   3.2
53650  Smith  smith@math  19   3.8
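One reading of the extra row `11111 English102 A` is that it is an Enrolled tuple whose `sid` matches no student. Assuming that reading, and assuming `Enrolled.sid` is declared as a foreign key into Students, the sketch below shows a DBMS rejecting the insert. Column types are guesses.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite needs this to enforce foreign keys
conn.executescript("""
CREATE TABLE Students (
  sid TEXT PRIMARY KEY, name TEXT, login TEXT, age INTEGER, gpa REAL);
CREATE TABLE Enrolled (
  sid TEXT, cid TEXT, grade TEXT,
  FOREIGN KEY (sid) REFERENCES Students (sid));
INSERT INTO Students VALUES
  ('53666', 'Jones', 'jones@cs',   18, 3.4),
  ('53688', 'Smith', 'smith@eecs', 18, 3.2),
  ('53650', 'Smith', 'smith@math', 19, 3.8);
INSERT INTO Enrolled VALUES
  ('53666', 'Carnatic101', 'C'),
  ('53666', 'Reggae203',   'B'),
  ('53650', 'Topology112', 'A'),
  ('53666', 'History105',  'B');
""")

# No Students tuple has sid 11111, so this row would dangle.
try:
    conn.execute("INSERT INTO Enrolled VALUES ('11111', 'English102', 'A')")
    fk_rejected = False
except sqlite3.IntegrityError:
    fk_rejected = True

enrolled_count = conn.execute("SELECT COUNT(*) FROM Enrolled").fetchone()[0]
print(fk_rejected, enrolled_count)
```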
Next Up
We'll talk a bit about the SQL DML
Then we'll start describing the DBMS from storage on up