1NF, 2NF, 3NF
Codd's Rules
Data Normalization
The purpose of normalization is to produce a stable set of relations that is a faithful model of the operations of the enterprise.
Achieve a design that is highly flexible
Reduce redundancy
Ensure that the design is free of certain update, insertion and deletion anomalies
Normalization
1NF: Flat file
2NF: Partial dependencies removed
3NF: Transitive dependencies removed
BCNF: Every determinant is a candidate key
4NF: Non-trivial multi-valued dependencies removed
[Figure: sample invoice from Stereos To Go, invoice 10001, dated 6/15/99, date shipped 6/18/99. Columns: Item Number, Product Code, Product Description/Manufacturer, Qty, Price. Line items, quantity 1 each: Pioneer Remote A/V Receiver; Cerwin-Vega Loudspeakers; Sony Disc-Jockey CD Changer.]
Unnormalized Relation
(Invoice_number, Invoice_date, Date_delivered, Cust_account, Cust_name, Cust_addr, Cust_city, Cust_state, Zip_code, Item1, Item1_descrip, Item1_qty, Item1_price, Item2, Item2_descrip, Item2_qty, Item2_price, . . . , Item7, Item7_descrip, Item7_qty, Item7_price)
Unnormalized to 1NF
(Invoice_number, Invoice_date, Date_delivered, Cust_account, Cust_name, Cust_addr, Cust_city, Cust_state, Zip_code, Item1, Item1_descrip, Item1_qty, Item1_price, Item2, Item2_descrip, Item2_qty, Item2_price, . . . , Item7, Item7_descrip, Item7_qty, Item7_price)
Repeating groups: Item1 through Item7, each with its description, quantity and price
A flat file places all the data of a transaction into a single record.
This is reminiscent of a COBOL or BASIC program processing a single transaction with one read statement.
Unnormalized to 1NF
(Invoice_number, Invoice_date, Date_delivered, Cust_account, Cust_name, Cust_addr, Cust_city, Cust_state, Zip_code, Item, Item_descrip, Item_qty, Item_price)
Nominate a group of attributes to serve as the key (they must form a unique combination).
Eliminate the repeating groups. Each row retains data for one item.
If a person bought five items, we would have five tuples.
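The step from the flat file to 1NF can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the record is a plain dictionary, the helper name `to_1nf` is hypothetical, and prices are left as `None` because the source invoice does not show them.

```python
# Hypothetical flat-file record with numbered repeating groups.
# Prices are None because the source invoice does not show them.
flat_record = {
    "Invoice_number": 10001,
    "Cust_account": 123456,
    "Cust_name": "John Smith",
    "Item1": "SAGX730", "Item1_descrip": "Pioneer Remote A/V Receiver",
    "Item1_qty": 1, "Item1_price": None,
    "Item2": "CDPC725", "Item2_descrip": "Sony Disc-Jockey CD Changer",
    "Item2_qty": 1, "Item2_price": None,
}

def to_1nf(record, max_items=7):
    """Return one row per item, repeating the invoice-level attributes."""
    header = {k: v for k, v in record.items() if not k.startswith("Item")}
    rows = []
    for i in range(1, max_items + 1):
        item = record.get(f"Item{i}")
        if item is None:  # this repeating-group slot is unused
            continue
        rows.append({
            **header,
            "Item": item,
            "Item_descrip": record.get(f"Item{i}_descrip"),
            "Item_qty": record.get(f"Item{i}_qty"),
            "Item_price": record.get(f"Item{i}_price"),
        })
    return rows

rows_1nf = to_1nf(flat_record)
```

Each resulting row carries the invoice and customer attributes plus exactly one item, which is why the invoice number alone can no longer serve as the key.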
1NF
Invoice_number  Cust_account  Cust_name   Item     Item_descrip
10001           123456        John Smith  SAGX730  Pioneer Remote A/V Rec
10001           123456        John Smith  CDPC725  Sony Disc Jockey CD
10001           123456        John Smith  S/H      Shipping
10001           123456        John Smith  Tax      Sales Tax
From 1NF
(Invoice_number, Invoice_date, Date_delivered, Cust_account, Cust_name, Cust_addr, Cust_city, Cust_state, Zip_code, Item, Item_descrip, Item_qty, Item_price)
Functional dependencies and determinants
Example: Item_descrip is functionally dependent on Item, so Item is the determinant of Item_descrip.
Is this unique by itself? What happens if the item is purchased more than once?
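One way to explore these questions is to test candidate dependencies against sample rows. The sketch below assumes rows are dictionaries; the function name `fd_holds` and invoice 10002 are hypothetical. A dependency that holds in sample data is merely not contradicted by it, so this is a sanity check rather than a proof.

```python
def fd_holds(rows, lhs, rhs):
    """True if each combination of lhs values maps to a single combination of rhs values."""
    seen = {}
    for row in rows:
        key = tuple(row[a] for a in lhs)
        val = tuple(row[a] for a in rhs)
        if seen.setdefault(key, val) != val:
            return False
    return True

# Invoice 10002 is a made-up second purchase of the same item.
rows = [
    {"Invoice_number": 10001, "Item": "SAGX730", "Item_descrip": "Pioneer Remote A/V Rec", "Item_qty": 1},
    {"Invoice_number": 10001, "Item": "CDPC725", "Item_descrip": "Sony Disc Jockey CD", "Item_qty": 1},
    {"Invoice_number": 10002, "Item": "SAGX730", "Item_descrip": "Pioneer Remote A/V Rec", "Item_qty": 2},
]

item_determines_descrip = fd_holds(rows, ["Item"], ["Item_descrip"])
item_determines_qty = fd_holds(rows, ["Item"], ["Item_qty"])
pair_determines_qty = fd_holds(rows, ["Invoice_number", "Item"], ["Item_qty"])
```

Here Item determines Item_descrip, but once the item is purchased a second time it no longer determines Item_qty; only the combination Invoice_number + Item does.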
Partial dependency
(Invoice_number, Item, Item_descrip, Item_qty, Item_price)
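A partial dependency can also be spotted mechanically from a declared set of functional dependencies. The following is a minimal sketch, assuming dependencies are given as (determinant, dependents) pairs; the function name `partial_dependencies` is hypothetical.

```python
def partial_dependencies(key, fds):
    """Return the FDs whose determinant is a proper subset of the key
    and which determine at least one non-key attribute."""
    key = frozenset(key)
    found = []
    for lhs, rhs in fds:
        lhs = frozenset(lhs)
        nonkey = sorted(set(rhs) - key)
        if lhs < key and nonkey:  # '<' is the proper-subset test
            found.append((sorted(lhs), nonkey))
    return found

key = ["Invoice_number", "Item"]
fds = [
    (["Item"], ["Item_descrip"]),
    (["Invoice_number", "Item"], ["Item_qty", "Item_price"]),
]
violations = partial_dependencies(key, fds)
```

With the composite key Invoice_number + Item, only Item → Item_descrip is reported, which is the dependency that moves into its own relation in 2NF.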
Insertion anomalies
To add a new row, all customer data (name, address, city, state, zip code, phone) and product data (description) must be consistent with previous entries.
Deletion anomalies
By deleting a row, a customer or product may cease to exist.
Modification anomalies
To modify a customer's or product's data in one row, all modifications must be carried out in every row where that data appears.
Inconsistency
Item      Manufacturer (as entered)
DVD-A110  Panasonic
PV-4210   PanaSonic
PV-4250   Pana Sonic
CT-32S35  PAN
Deletion Anomaly
For Example
Cust_account  Cust_name   ...  Cust_state
4377182       John Smith  ...  CA
4398711       Arnold S    ...  CA
4578461       Gray Davis  ...  CA
4873179       Lisa Carr   ...  NV
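A deletion anomaly is easy to reproduce when customer data lives only inside invoice rows. The sketch below is illustrative: the invoice numbers 20001 to 20003 are invented, the pairing of accounts, names and states is assumed from the example table, and both helper functions are hypothetical.

```python
# Flat rows: the only record of a customer is the invoice rows that mention them.
flat_rows = [
    {"Invoice_number": 20001, "Cust_account": 4377182, "Cust_name": "John Smith", "Cust_state": "CA"},
    {"Invoice_number": 20002, "Cust_account": 4377182, "Cust_name": "John Smith", "Cust_state": "CA"},
    {"Invoice_number": 20003, "Cust_account": 4873179, "Cust_name": "Lisa Carr", "Cust_state": "NV"},
]

def known_customers(rows):
    """Customer accounts that can still be found in the data."""
    return {row["Cust_account"] for row in rows}

def delete_invoice(rows, invoice_number):
    """Remove every row belonging to one invoice."""
    return [row for row in rows if row["Invoice_number"] != invoice_number]

before = known_customers(flat_rows)
after = known_customers(delete_invoice(flat_rows, 20003))
lost = before - after  # customers that vanished along with their last invoice
```

Deleting the single invoice for account 4873179 also removes every trace of that customer, while deleting one of the two invoices for account 4377182 would not.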
Transitive Dependencies
A condition where A, B and C are attributes of a relation such that if A → B and B → C, then C is transitively dependent on A via B (provided that A is not functionally dependent on B or C).
Invoice_number → Invoice_date, Date_delivered, Cust_account, Cust_name, Cust_addr, Cust_city, Cust_state, Zip_code
Item → Item_descrip
Invoice_number + Item → Item_qty, Item_price
City and state depend on the zip code for their values, not on the customer's identifier (i.e., the key).
Zip_code → City, State
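Transitive dependencies can be made visible by computing an attribute closure. This is a minimal sketch of the standard closure algorithm; the function name `closure` and the exact split of customer attributes between the two dependencies are assumptions made for illustration.

```python
def closure(attrs, fds):
    """Attribute closure: every attribute functionally determined by attrs under fds."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

fds = [
    (["Cust_account"], ["Cust_name", "Cust_addr", "Zip_code"]),
    (["Zip_code"], ["Cust_city", "Cust_state"]),
]
reachable = closure(["Cust_account"], fds)
direct = {"Cust_account", "Cust_name", "Cust_addr", "Zip_code"}
transitive = reachable - direct  # reached only by going through Zip_code
```

The attributes left in `transitive` are Cust_city and Cust_state, the ones that move to a separate zip code relation in 3NF.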
3NF
Invoice Relation
(Invoice_number, Invoice_date, Date_delivered, Cust_account)
Customer Relation
(Cust_account, Cust_name, Cust_addr, Zip_code)
Zip_code Relation
(Zip_code, City, State)
Invoice_items Relation
(Invoice_number, Item, Item_qty, Item_price)
Items Relation
(Item, Item_descrip)
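As a rough check that the five 3NF relations still carry the original invoice, they can be declared in SQLite and joined back together. This sketch uses Python's built-in `sqlite3` module; the column types, the zip code, the city and the address are assumptions, the dates are taken from the sample invoice for illustration only, and prices are left NULL because the source does not show them.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite checks foreign keys only when enabled
conn.executescript("""
CREATE TABLE Zip_code (Zip_code TEXT PRIMARY KEY, City TEXT, State TEXT);
CREATE TABLE Customer (
    Cust_account INTEGER PRIMARY KEY, Cust_name TEXT, Cust_addr TEXT,
    Zip_code TEXT REFERENCES Zip_code(Zip_code));
CREATE TABLE Invoice (
    Invoice_number INTEGER PRIMARY KEY, Invoice_date TEXT, Date_delivered TEXT,
    Cust_account INTEGER REFERENCES Customer(Cust_account));
CREATE TABLE Items (Item TEXT PRIMARY KEY, Item_descrip TEXT);
CREATE TABLE Invoice_items (
    Invoice_number INTEGER REFERENCES Invoice(Invoice_number),
    Item TEXT REFERENCES Items(Item),
    Item_qty INTEGER, Item_price REAL,
    PRIMARY KEY (Invoice_number, Item));
""")

# Placeholder data: zip code, city and address are invented.
conn.execute("INSERT INTO Zip_code VALUES ('00000', 'Example City', 'CA')")
conn.execute("INSERT INTO Customer VALUES (123456, 'John Smith', '1 Example St', '00000')")
conn.execute("INSERT INTO Invoice VALUES (10001, '1999-06-15', '1999-06-18', 123456)")
conn.executemany("INSERT INTO Items VALUES (?, ?)", [
    ("SAGX730", "Pioneer Remote A/V Receiver"),
    ("CDPC725", "Sony Disc-Jockey CD Changer"),
])
conn.executemany("INSERT INTO Invoice_items VALUES (?, ?, ?, ?)", [
    (10001, "SAGX730", 1, None),
    (10001, "CDPC725", 1, None),
])

# Joining the relations rebuilds the flat 1NF rows on demand.
rebuilt = conn.execute("""
    SELECT i.Invoice_number, c.Cust_name, z.State, ii.Item, it.Item_descrip, ii.Item_qty
    FROM Invoice i
    JOIN Customer c ON c.Cust_account = i.Cust_account
    JOIN Zip_code z ON z.Zip_code = c.Zip_code
    JOIN Invoice_items ii ON ii.Invoice_number = i.Invoice_number
    JOIN Items it ON it.Item = ii.Item
    ORDER BY ii.Item
""").fetchall()
```

Each fact is stored once, and the join returns one row per invoice item with the customer and zip code data attached.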
Manufacturers Relation
(Manuf_code, Manuf_name)
Since the Items relation contains the manufacturer's name in the description, a separate Manufacturers relation can be created.
1NF: A relation is in first normal form if and only if every attribute is single-valued for each tuple (remove the repeating or multi-valued attributes and create a flat file).
2NF: A relation is in second normal form if and only if it is in first normal form and the nonkey attributes are fully functionally dependent on the key (remove partial dependencies).
3NF: A relation is in third normal form if it is in second normal form and no nonkey attribute is transitively dependent on the key (remove transitive dependencies).
Codd's Rules
E. F. Codd presented these rules as a basis for determining whether a DBMS can be classified as relational.
Foundation Rules
Structural Rules
Integrity Rules
Data Manipulation Rules
Data Independence Rules
Foundation Rules
Rule 0
Any system claimed to be an RDBMS must be able to manage databases entirely through its relational capabilities.
All data definition and manipulation must be possible through relational operations.
Rule 12 - Nonsubversion Rule
If an RDBMS has a low-level (record-at-a-time) language, that low-level language cannot be used to subvert or bypass the integrity rules and constraints expressed in the higher-level relational language.
All database access must be controlled through the DBMS so that the integrity of the database cannot be compromised without the knowledge of the user or the DBA.
This does not prohibit the use of record-at-a-time languages, e.g. PL/SQL.
Structural Rules
The fundamental structural construct is the table. Codd states that an RDBMS must support tables, domains, and primary and foreign keys. Each table should have a primary key.
Rule 1 - Information Rule
All information in an RDB is represented explicitly at the logical level in exactly one way: by values in tables.
All information, even the metadata held in the system catalogue, must be stored as relations (tables) and manipulated in the same way as data.
Rule 6 - View Updating
All views that are theoretically updatable are updatable by the system.
Integrity Rules
Rule 3 - Systematic treatment of null values
Null values are supported for the representation of missing and inapplicable information in a systematic way, independent of data type.
Rule 10 - Integrity independence
Integrity constraints specific to a particular RDB must be definable in the relational data sublanguage and storable in the DB, not in the application program.
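Both integrity rules can be illustrated with SQLite through Python's built-in `sqlite3` module. In this sketch the constraints are declared once in the schema and enforced by the database, not by application code, and NULL marks missing values in both a text and a numeric column. The table layout follows the 3NF relations; the column types and the CHECK constraint are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled
conn.executescript("""
CREATE TABLE Items (Item TEXT PRIMARY KEY, Item_descrip TEXT);
CREATE TABLE Invoice_items (
    Invoice_number INTEGER NOT NULL,
    Item TEXT NOT NULL REFERENCES Items(Item),
    Item_qty INTEGER CHECK (Item_qty > 0),
    Item_price REAL,                          -- NULL allowed: price not yet known
    PRIMARY KEY (Invoice_number, Item));
INSERT INTO Items VALUES ('SAGX730', NULL);   -- NULL in a text column
""")

rejected = []
for stmt in (
    "INSERT INTO Invoice_items VALUES (10001, 'NOSUCH', 1, NULL)",   # unknown item
    "INSERT INTO Invoice_items VALUES (10001, 'SAGX730', 0, NULL)",  # quantity fails the CHECK
):
    try:
        conn.execute(stmt)
    except sqlite3.IntegrityError:
        rejected.append(stmt)  # the database, not the program, refused the row

conn.execute("INSERT INTO Invoice_items VALUES (10001, 'SAGX730', 1, NULL)")
null_counts = conn.execute(
    "SELECT (SELECT COUNT(*) FROM Items WHERE Item_descrip IS NULL),"
    "       (SELECT COUNT(*) FROM Invoice_items WHERE Item_price IS NULL)"
).fetchone()
```

The same `IS NULL` test works regardless of column type, and both invalid inserts are rejected without any validation logic in the Python code.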
Data Manipulation Rules (Rules 2, 4, 5 & 7)
Users should be able to manipulate the logical view of the data without needing to know how it is physically stored or accessed.
Rule 2 - Guaranteed Access
Each and every datum in an RDB is guaranteed to be logically accessible through a combination of table name, primary key value and column name.
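Guaranteed access can be pictured as a lookup that needs nothing but those three coordinates. The helper `fetch_datum` below is hypothetical and uses Python's built-in `sqlite3` module; it is a sketch of the idea, not part of any DBMS API.

```python
import sqlite3

def fetch_datum(conn, table, pk_column, pk_value, column):
    """Address a single value by table name, primary key value and column name."""
    def quote(name):
        # Identifiers cannot be bound as parameters, so they are quoted instead.
        return '"' + name.replace('"', '""') + '"'
    sql = f"SELECT {quote(column)} FROM {quote(table)} WHERE {quote(pk_column)} = ?"
    row = conn.execute(sql, (pk_value,)).fetchone()
    return None if row is None else row[0]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Items (Item TEXT PRIMARY KEY, Item_descrip TEXT)")
conn.execute("INSERT INTO Items VALUES ('SAGX730', 'Pioneer Remote A/V Receiver')")

descrip = fetch_datum(conn, "Items", "Item", "SAGX730", "Item_descrip")
missing = fetch_datum(conn, "Items", "Item", "NOSUCH", "Item_descrip")
```

No pointer, record position or storage detail is involved: the table name, key value and column name are enough to reach the value.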
Rule 4 - Dynamic on-line catalog based on the relational model
The DB description (metadata) is represented at the logical level in the same way as ordinary data, so that the same relational language can be used to interrogate the metadata as the regular data.
System data and other data are stored and manipulated in the same way.
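SQLite offers a small illustration of a catalog that is queried like any other table: its schema table `sqlite_master` answers ordinary SELECT statements. The sketch below uses Python's built-in `sqlite3` module; the two tables are taken from the 3NF example and their column types are assumptions. SQLite does not allow the catalog to be changed with ordinary INSERT or UPDATE statements, so it demonstrates only the query side of the rule.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Items (Item TEXT PRIMARY KEY, Item_descrip TEXT)")
conn.execute("CREATE TABLE Manufacturers (Manuf_code TEXT PRIMARY KEY, Manuf_name TEXT)")

# The catalog is itself a table, so the same SELECT syntax interrogates it.
tables = [name for (name,) in conn.execute(
    "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name")]

# The stored definition of a table is just another column value.
items_ddl = conn.execute(
    "SELECT sql FROM sqlite_master WHERE type = 'table' AND name = 'Items'"
).fetchone()[0]
```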
Rule 5 - Comprehensive Data Sublanguage
An RDBMS may support many languages and modes of use, but there must be at least one language whose statements can express all of the following:
Data definition
View definition
Data manipulation (interactive and via program)
Integrity constraints
Authorization
Transaction boundaries (begin, commit and rollback)
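SQL is the usual example of such a sublanguage. The sketch below, using Python's built-in `sqlite3` module, expresses data definition, an integrity constraint, a view definition, data manipulation and transaction boundaries all as SQL statements. Authorization is not shown because SQLite has no GRANT or REVOKE; the view name `Invoice_totals` and the column types are assumptions.

```python
import sqlite3

# isolation_level=None puts the connection in autocommit mode,
# so the BEGIN / COMMIT / ROLLBACK statements are issued explicitly.
conn = sqlite3.connect(":memory:", isolation_level=None)

# Data definition together with an integrity constraint.
conn.execute(
    "CREATE TABLE Invoice_items (Invoice_number INTEGER, Item TEXT, "
    "Item_qty INTEGER CHECK (Item_qty > 0), PRIMARY KEY (Invoice_number, Item))")

# View definition.
conn.execute(
    "CREATE VIEW Invoice_totals AS "
    "SELECT Invoice_number, SUM(Item_qty) AS total_qty "
    "FROM Invoice_items GROUP BY Invoice_number")

# Data manipulation inside explicit transaction boundaries.
conn.execute("BEGIN")
conn.execute("INSERT INTO Invoice_items VALUES (10001, 'SAGX730', 1)")
conn.execute("COMMIT")

conn.execute("BEGIN")
conn.execute("INSERT INTO Invoice_items VALUES (10001, 'CDPC725', 1)")
conn.execute("ROLLBACK")  # the second insert is undone

totals = conn.execute("SELECT * FROM Invoice_totals").fetchall()
```

Only the committed row contributes to the view, so the total quantity for invoice 10001 is 1.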
Rule 7 - High-level insert, update & delete
The capability of handling a base table or view as a single operand applies not only to data retrieval but also to insert, update and delete operations.
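A set-level update can be sketched with Python's built-in `sqlite3` module: one statement corrects every inconsistent manufacturer spelling at once, with no row-at-a-time loop. The `Manuf_name` column on Items and the pairing of product codes with spellings are assumptions based on the inconsistency example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Items (Item TEXT PRIMARY KEY, Manuf_name TEXT)")
conn.executemany("INSERT INTO Items VALUES (?, ?)", [
    ("DVD-A110", "Panasonic"),
    ("PV-4210", "PanaSonic"),
    ("PV-4250", "Pana Sonic"),
    ("CT-32S35", "PAN"),
])

# One statement operates on the whole set of matching rows.
cur = conn.execute(
    "UPDATE Items SET Manuf_name = 'Panasonic' "
    "WHERE Manuf_name IN ('PanaSonic', 'Pana Sonic', 'PAN')")
rows_changed = cur.rowcount
distinct_names = conn.execute("SELECT DISTINCT Manuf_name FROM Items").fetchall()
```

Three rows change in a single operation and only one spelling remains.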
Data Independence Rules (Rules 8, 9 & 11)
These rules protect users and application developers from having to change applications following any low-level reorganisation of the DB.
Rule 8 - Physical Data Independence
Application programs and terminal activities remain logically unimpaired whenever any changes are made to either the storage organisation or the access methods.
Rule 9 - Logical Data Independence
Application programs and terminal activities remain logically unimpaired when information-preserving changes of any kind that theoretically permit unimpairment are made to the base tables.
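Both kinds of independence can be tried out in SQLite with Python's built-in `sqlite3` module. In this sketch the same query text is run three times: before any change, after an index is added (a change of access method), and after the base table is renamed and a view with the old name is put in its place (an information-preserving change to the base tables). The index and table names are hypothetical, and the customer rows assume a pairing taken from the deletion anomaly example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Customer (Cust_account INTEGER PRIMARY KEY, Cust_name TEXT, Cust_state TEXT)")
conn.executemany("INSERT INTO Customer VALUES (?, ?, ?)", [
    (4377182, "John Smith", "CA"),
    (4873179, "Lisa Carr", "NV"),
])

# The application's query; its text never changes.
query = "SELECT Cust_name FROM Customer WHERE Cust_state = 'CA' ORDER BY Cust_name"
before = conn.execute(query).fetchall()

# Physical change: add an index, altering only how rows are located.
conn.execute("CREATE INDEX idx_customer_state ON Customer (Cust_state)")
after_index = conn.execute(query).fetchall()

# Logical change: rename the base table and expose the old name as a view.
conn.execute("ALTER TABLE Customer RENAME TO Customer_base")
conn.execute(
    "CREATE VIEW Customer AS SELECT Cust_account, Cust_name, Cust_state FROM Customer_base")
after_view = conn.execute(query).fetchall()
```

All three runs return the same rows, so a program that issues this query would not need to be modified.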
Rule 11 - Distribution Independence
The data manipulation sublanguage of an RDBMS must enable application programs and queries to remain logically unchanged whether and whenever data is physically centralised or distributed.
This means that an application program that accesses the DBMS on a single computer should also work, without modification, even if the data is moved from one computer to another in a network environment.
The user should 'see' one centralised DB whether the data is located on one or more computers.
This rule does not say that a DBMS must support distributed DBs to be fully relational, but that if it does, the query must remain the same.
Summary
Foundation Rules
Structural Rules
Integrity Rules
Data Manipulation Rules
Data Independence Rules