Академический Документы
Профессиональный Документы
Культура Документы
Contents
Introduction to DBMS
DBMS Users
DBMS architecture Data Models ER Modeling Relational Model SQL Normalization
What Is a DBMS?
Database : A very large, integrated collection of data. A Database Management System (DBMS) is a
Online retailers: order tracking, customized recommendations Manufacturing: production, inventory, orders, supply chain Human resources: employee records, salaries, tax deductions
10
directly on top of file systems Drawbacks of using file systems to store data: Data redundancy and inconsistency Multiple file formats, duplication of information in different files Difficulty in accessing data Need to write a new program to carry out each new task Data isolation multiple files and formats Integrity problems Integrity constraints (e.g. account balance > 0) become buried in program code rather than being stated explicitly Hard to add new constraints or change existing ones
11
Atomicity of updates
partial updates carried out Example: Transfer of funds from one account to another should either complete or not happen at all Concurrent access by multiple users Concurrent accessed needed for performance Uncontrolled concurrent accesses can lead to inconsistencies
Security problems
Example: Two people reading a balance and updating it at the same time
12
Database Users
13
Users are differentiated by the way they expect to interact with the system Application programmers interact with system through DML calls Sophisticated users form requests in a database query language Nave users invoke one of the permanent application programs that have been written previously Examples, people accessing database over the web, bank tellers, clerical staff
Database Administrator
14
Coordinates all the activities of the database system has a good understanding of the enterprises
information resources and needs. Database administrator's duties include: Storage structure and access method definition Schema and physical organization modification Granting users authority to access the database Backing up data Monitoring performance and responding to changes
View of Data
15
Levels of Abstraction
16
stored. Logical level: describes data stored in database, and the relationships among the data. type customer = record customer_id : string; customer_name : string; customer_street : string; customer_city : string;
end;
Views can also hide information (such as an employees salary) for security purposes.
17
languages Schema the logical structure of the database Example: The database consists of information about a set of customers and accounts and the relationship between them) Analogous to type information of a variable in a program Physical schema: database design at the physical level Logical schema: database design at the logical level
18
modify the physical schema without changing the logical schema Applications depend on the logical schema In general, the interfaces between the various levels and components should be well defined so that changes in some parts do not seriously influence others.
19
Data Independence
Applications insulated from how data is structured
and stored. Logical data independence: Protection from changes in logical structure of data. Physical data independence: Protection from changes in physical structure of data.
20
Would win the ACM Turing Award for this work IBM Research begins System R prototype UC Berkeley begins Ingres prototype
High-performance (for the era) transaction processing
History (cont.)
1980s:
21
systems SQL becomes industry standard Parallel and distributed database systems Object-oriented database systems 1990s: Large decision support and data-mining applications Large multi-terabyte data warehouses Emergence of Web commerce 2000s: XML and XQuery standards Automated database administration Increasing use of highly parallel database systems Web-scale distributed data storage systems
Data Models
22
describing data.
Relational model Entity-Relationship data model (mainly for database design) Object-based data models (Object-oriented and Object-relational) Semi structured data model (XML) Other older models: Network model Hierarchical model
The relational model of data is the most widely used model today.
23
Database Design- Conceptual design: An entity is an object that exists and is (ER Model) distinguishable from other objects.
Example: specific person, company, event, plant Entities have attributes
share the same properties. Example: set of all persons, companies, trees, holidays Relationship: Association among two or more entities. E.g., Aisha works in Pharmacy department
24
ER-diagram
name ssn lot Employees Covers Policies policyid cost pname age
Dependents
Relational Model
25
Attributes
26
27
SSN
Name
lot
ssn
name
lot
Employees
28
INTRODUCTION TO SCHEMA REFINEMENT Problems Caused by Redundancy Storing the same information redundantly, that is, in more than one place within a database, can lead to several problems: Redundant storage: Some information is stored repeatedly. Update anomalies: If one copy of such repeated data is updated, an inconsistency is created unless all copies are similarly updated. Insertion anomalies: It may not be possible to store some information unless some other information is stored as well. Deletion anomalies: It may not be possible to delete some information
29
Formal method
- Normalization
30
Normal Forms
If a relation is in a certain normal form (BCNF,
3NF etc.), it is known that certain kinds of problems are avoided/minimized. This can be used to help us decide whether decomposing the relation will help. FDs are used to detect redundancy
31
Example decomposition(simple)
Bad database design
Eno
Ename Salary
Add
Bdate Dno
Dname Mngname
Eno
Ename Salary
Dno
Dname Mngname
32
create table account ( account_number char(10), branch_name char(10), balance integer) DDL compiler generates a set of tables stored in a data dictionary Data dictionary contains metadata (i.e., data about data) Database schema Data storage and definition language Specifies the storage structure and access methods used Integrity constraints Domain constraints Referential integrity (e.g. branch_name must correspond to a valid branch in the branch table) Authorization
33
Two classes of languages Procedural user specifies what data is required and
how to get those data Declarative (nonprocedural) user specifies what data is required without specifying how to get those data SQL is the most widely used query language
SQL
34
id 192 select customer_name from customer where customer_id = 192 Example: Find the balances of all accounts held by the customer with customer-id 192 select balance from depositor, account where customer_id = 192 and depositor.account_number = account.account_number
SQL
35
through one of Language extensions to allow embedded SQL Application program interface (e.g., ODBC/JDBC) which allow SQL queries to be sent to a database
Query Processing
1.Parsing and translation
36
2. 3.
Optimization Evaluation
37
Alternative ways of evaluating a given query Different algorithms for each operation Cost difference between a good and a bad way of evaluating
a query can be enormous Need to estimate the cost of operations Depends critically on statistical information about relations which the database must maintain Need to estimate statistics for intermediate results to compute cost of complex expressions
Transaction Management
38
performs a single logical function in a database application Transaction-management component ensures that the database remains in a consistent (correct) state despite system failures (e.g., power failures and operating system crashes) and transaction failures. Concurrency-control manager controls the interaction among the concurrent transactions, to ensure the consistency of the database.
Transaction Concept
39
accesses and possibly updates various data items. E.g. transaction to transfer $50 from account A to account B:
1. 2. 3. 4. 5. 6. read(A) A := A 50 write(A) read(B) B := B + 50 write(B)
Two main issues to deal with: Failures of various kinds, such as hardware failures
40
1. read(A) 2. A := A 50 3. write(A) 4. read(B) 5. B := B + 50 6. write(B) Atomicity requirement if the transaction fails after step 3 and before step 6, money will be lost leading to an inconsistent database state
Failure could be due to software or hardware the system should ensure that updates of a partially executed transaction are not reflected in the database Durability requirement once the user has been notified that the transaction has completed (i.e., the transfer of the $50 has taken place), the updates to the database by the transaction must persist even if there are software or hardware failures.
41
Transaction to transfer $50 from account A to account B:
42
allowed to access the partially updated database, it will see an inconsistent database (the sum A + B will be less than it should be). T1 T2 1. read(A) 2. A := A 50 3. write(A) read(A), read(B), print(A+B) 4. read(B) 5. B := B + 50 6. write(B Isolation can be ensured trivially by running transactions serially that is, one after the other. However, executing multiple transactions concurrently has significant benefits.
ACID Properties
43
A transaction is a unit of program execution that accesses and possibly updates various data items.To preserve the integrity of data the database system must ensure:
Atomicity. Either all operations of the transaction are properly reflected in the
database or none are. Consistency. Execution of a transaction in isolation preserves the consistency of the database. Isolation. Although multiple transactions may execute concurrently, each transaction must be unaware of other concurrently executing transactions. Intermediate transaction results must be hidden from other concurrently executed transactions. That is, for every pair of transactions Ti and Tj, it appears to Ti that either Tj, finished execution before Ti started, or Tj started execution after Ti finished. Durability. After a transaction completes successfully, the changes it has made to the database persist, even if there are system failures.
Transaction State
44
Concurrent Executions
45
Advantages are: increased processor and disk utilization, leading to better transaction throughput E.g. one transaction can be using the CPU while another is reading from or writing to the disk reduced average response time for transactions: short transactions need not wait behind long ones. Concurrency control schemes mechanisms to achieve ACID properties with concurrency.
to control concurrency
46