
What are Update Anomalies?

The problems that result from data redundancy in an un-normalized database table are collectively known as update anomalies. Any insertion, deletion or modification that leaves the database in an inconsistent state is said to have caused an update anomaly. They are classified as follows.

Insertion anomalies: To insert the details of a new member of staff located at branch B1 into the Tbl_Staff_Branch table, we must enter the correct details of branch number B1 so that the branch details are consistent with the values for branch B1 in other rows. To insert the details of a new branch that currently has no members of staff into the Tbl_Staff_Branch table, it is necessary to enter nulls for the staff details, which is not allowed because StaffID is the primary key. If you normalize Tbl_Staff_Branch, which is in Second Normal Form (2NF), to Third Normal Form (3NF), you end up with Tbl_Staff and Tbl_Branch, and these problems no longer arise.

Deletion anomalies: If we delete a row from the Tbl_Staff_Branch table that represents the last member of staff located at a branch (for example, the rows for branch numbers B2, B3 or B4), the details about that branch are also lost from the database.

Modification anomalies: Should we need to change the address of a particular branch in the Tbl_Staff_Branch table, we must update the rows of all staff located at that branch. If this modification is not carried out on all the relevant rows, the database becomes inconsistent.
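The three anomalies can be reproduced in a few lines. The following is a minimal sketch using Python's built-in sqlite3 module; the reduced column set, the sample rows and the addresses are assumptions made for illustration and are not the data of the original Tbl_Staff_Branch table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE Tbl_Staff_Branch (
        StaffID    TEXT PRIMARY KEY NOT NULL,
        Name       TEXT,
        BranchID   TEXT,
        Br_Address TEXT
    )
    """
)
# Hypothetical rows: the address of branch B1 is stored once per member of staff.
conn.executemany(
    "INSERT INTO Tbl_Staff_Branch VALUES (?, ?, ?, ?)",
    [
        ("S1", "Ann", "B1", "1 High Street"),
        ("S2", "Bob", "B1", "1 High Street"),
        ("S3", "Cy", "B2", "5 Mill Lane"),
    ],
)

# Insertion anomaly: a branch with no staff has no StaffID, so the row is rejected.
try:
    conn.execute(
        "INSERT INTO Tbl_Staff_Branch VALUES (NULL, NULL, 'B5', '9 New Road')"
    )
    branch_inserted = True
except sqlite3.IntegrityError:
    branch_inserted = False

# Modification anomaly: updating only one of the B1 rows leaves two addresses for B1.
conn.execute(
    "UPDATE Tbl_Staff_Branch SET Br_Address = '2 Park Road' WHERE StaffID = 'S1'"
)
b1_addresses = conn.execute(
    "SELECT COUNT(DISTINCT Br_Address) FROM Tbl_Staff_Branch WHERE BranchID = 'B1'"
).fetchone()[0]

# Deletion anomaly: removing the last member of staff at B2 also removes the branch.
conn.execute("DELETE FROM Tbl_Staff_Branch WHERE StaffID = 'S3'")
b2_rows = conn.execute(
    "SELECT COUNT(*) FROM Tbl_Staff_Branch WHERE BranchID = 'B2'"
).fetchone()[0]

print(branch_inserted, b1_addresses, b2_rows)
```

NOT NULL is written out on the key because SQLite, unlike most database engines, does not imply it for a TEXT primary key.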

What is Functional Dependency? What are the Different Types of Functional Dependencies?

Functional dependencies are fundamental to the process of normalization. A functional dependency describes the relationship between attributes (columns) in a table. For example, if A and B are attributes of a table, B is functionally dependent on A if each value of A is associated with exactly one value of B (so you can say "A functionally determines B").
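The definition above can be checked mechanically against sample rows. The sketch below is an illustration only: the function name `holds` and the three staff rows are assumptions, and a sample can refute a dependency but never prove that it holds for every possible row.

```python
def holds(rows, lhs, rhs):
    """Return True if each combination of lhs values maps to one combination of rhs values."""
    seen = {}
    for row in rows:
        key = tuple(row[a] for a in lhs)
        value = tuple(row[b] for b in rhs)
        # setdefault stores the first value seen for this key and returns it afterwards.
        if seen.setdefault(key, value) != value:
            return False
    return True


# Hypothetical sample rows for illustration.
staff = [
    {"StaffID": "S1", "Position": "Manager", "BranchID": "B1"},
    {"StaffID": "S2", "Position": "Assistant", "BranchID": "B1"},
    {"StaffID": "S3", "Position": "Assistant", "BranchID": "B2"},
]

staffid_determines_position = holds(staff, ["StaffID"], ["Position"])
position_determines_branch = holds(staff, ["Position"], ["BranchID"])
print(staffid_determines_position, position_determines_branch)
```

In these rows StaffID determines Position, while Position does not determine BranchID because two assistants work at different branches.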

A ----> B (functional dependency between A and B)

The attribute or group of attributes on the left-hand side of the arrow of a functional dependency is referred to as the "determinant". A simple example is that StaffID functionally determines Position in the tables above. Functional dependencies can be classified as follows.

Full Functional Dependency: If A and B are attributes (columns) of a table, B is fully functionally dependent on A if B is functionally dependent on A but not on any proper subset of A. For example: StaffID ----> BranchID

Partial Functional Dependency: If A and B are attributes of a table, B is partially dependent on A if there is some attribute that can be removed from A and yet the dependency still holds. For example, consider the following functional dependency that exists in the Tbl_Staff table: StaffID, Name ----> BranchID. BranchID is functionally dependent on a subset of (StaffID, Name), namely StaffID.

Transitive Functional Dependency: A condition where A, B and C are attributes of a table such that if B is functionally dependent on A and C is functionally dependent on B, then C is transitively dependent on A via B (provided that A is not functionally dependent on B or C). For example, consider the following functional dependencies that exist in the Tbl_Staff_Branch table:

StaffID ----> Name, Sex, Position, Sal, BranchID, Br_Address
BranchID ----> Br_Address

So the StaffID attribute functionally determines Br_Address via the BranchID attribute.
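The transitive step can be traced by computing an attribute closure, that is, the set of everything a group of attributes determines. This is a minimal sketch of that standard technique; the function name `closure` and the tuple encoding of the dependencies are assumptions chosen for illustration. The direct dependency StaffID ----> Br_Address is deliberately left out so that the result shows it being derived via BranchID.

```python
def closure(attributes, fds):
    """Return every attribute determined by `attributes` under the dependencies `fds`."""
    result = set(attributes)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            # If the whole determinant is already known, its dependents are known too.
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


# Dependencies of Tbl_Staff_Branch, each written as (determinant, dependents).
fds = [
    (("StaffID",), ("Name", "Sex", "Position", "Sal", "BranchID")),
    (("BranchID",), ("Br_Address",)),
]

staff_closure = closure({"StaffID"}, fds)
branch_closure = closure({"BranchID"}, fds)
print("Br_Address" in staff_closure)
print(sorted(branch_closure))
```

StaffID reaches Br_Address only through BranchID, which is exactly the transitive dependency described above.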

What is Database Normalization?

Database normalization is a stepwise formal process that allows us to decompose database tables in such a way that both data redundancy and update anomalies (see above for more information on update anomalies) are minimised. It makes use of the functional dependencies that exist in a table (a relation, more formally) and the primary key or candidate keys in analysing the tables. Three normal forms were initially proposed: First Normal Form (1NF), Second Normal Form (2NF) and Third Normal Form (3NF). Subsequently R. Boyce and E. F. Codd introduced a stronger definition of 3NF called Boyce-Codd Normal Form (BCNF). With the exception of 1NF, all these normal forms are based on functional dependencies among the attributes of a table. Higher normal forms that go beyond BCNF, such as Fourth Normal Form (4NF) and Fifth Normal Form (5NF), were introduced later. However, these later normal forms deal with situations that are very rare.

First Normal Form (1NF)
The only requirement for a table to be in 1NF is that it contains only atomic values (the intersection of each row and column should contain one and only one value). This is sometimes referred to as "eliminate repeating groups".

Second Normal Form (2NF)
A table is said to be in 2NF if it is in 1NF and there are no partial dependencies, i.e. every non-primary-key attribute of the table is fully functionally dependent on the primary key.

Third Normal Form (3NF)
A table is in 3NF if it is in 1NF and 2NF and no non-primary-key attribute is transitively dependent on the primary key.

Boyce-Codd Normal Form (BCNF)
A table is in BCNF if and only if every determinant (an attribute or group of attributes on which some other attribute is fully functionally dependent, see functional dependency described above) is a candidate key. BCNF is a stronger form of 3NF. The difference between 3NF and BCNF is that for a functional dependency A ----> B, 3NF allows this dependency in a table if attribute B is a primary-key attribute and attribute A is not a candidate key, whereas BCNF insists that for this dependency to remain in a table, attribute A must be a candidate key.
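The BCNF rule ("every determinant is a candidate key") can be tested with the attribute closure of each determinant. The sketch below is an illustration under stated assumptions: the helper names `closure` and `bcnf_violations` are hypothetical, and the test used is the usual superkey test, which checks that a determinant determines every attribute of the table.

```python
def closure(attributes, fds):
    """Return every attribute determined by `attributes` under the dependencies `fds`."""
    result = set(attributes)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def bcnf_violations(table_attributes, fds):
    """Return the non-trivial dependencies whose determinant is not a superkey."""
    table_attributes = set(table_attributes)
    return [
        (lhs, rhs)
        for lhs, rhs in fds
        if not set(rhs) <= set(lhs)                      # skip trivial dependencies
        and not closure(lhs, fds) >= table_attributes    # determinant must reach every column
    ]


staff_fd = (("StaffID",), ("Name", "Sex", "Position", "Sal", "BranchID"))
branch_fd = (("BranchID",), ("Br_Address",))

# The combined table: BranchID determines Br_Address but is not a key of the table.
staff_branch = {"StaffID", "Name", "Sex", "Position", "Sal", "BranchID", "Br_Address"}
violations = bcnf_violations(staff_branch, [staff_fd, branch_fd])

# After the split into Tbl_Staff and Tbl_Branch, every determinant is a key.
staff_ok = bcnf_violations({"StaffID", "Name", "Sex", "Position", "Sal", "BranchID"}, [staff_fd])
branch_ok = bcnf_violations({"BranchID", "Br_Address"}, [branch_fd])
print(violations, staff_ok, branch_ok)
```

The only dependency reported for the combined table is BranchID ----> Br_Address, the same transitive dependency that keeps Tbl_Staff_Branch out of 3NF.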

Normalization Process
Fourth Normal Form (4NF)
4NF is a stronger normal form than BCNF, as it prevents tables from containing nontrivial multi-valued dependencies (MVDs) and hence data redundancy. The normalization of BCNF tables to 4NF involves the removal of each MVD from the table by placing the attribute(s) in a new table along with a copy of the determinant(s).

Fifth Normal Form (5NF)
5NF is also called Project-Join Normal Form (PJNF) and specifies that a 5NF table has no join dependency other than those implied by its candidate keys.
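A small worked example may make the 4NF decomposition more concrete. The table below is hypothetical (the employee, skill and language values are not from the original text): skills and languages are independent facts about an employee, so every combination has to be stored. Placing each multi-valued fact in its own table with a copy of the determinant removes the repetition, and joining the two tables on the determinant gives the original rows back.

```python
# Hypothetical table (Employee, Skill, Language) with two independent multi-valued facts.
rows = {
    ("Ann", "SQL", "English"),
    ("Ann", "SQL", "Urdu"),
    ("Ann", "Python", "English"),
    ("Ann", "Python", "Urdu"),
}

# 4NF decomposition: one table per multi-valued fact, each with a copy of the determinant.
skills = {(employee, skill) for employee, skill, _ in rows}
languages = {(employee, language) for employee, _, language in rows}

# Joining the two tables on Employee reconstructs the original rows.
rejoined = {
    (employee, skill, language)
    for employee, skill in skills
    for other, language in languages
    if employee == other
}

print(len(rows), len(skills), len(languages), rejoined == rows)
```

Four rows in the original table shrink to two rows in each of the new tables, and no information is lost.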

I'll refer to the following tables when explaining some concepts:

Tbl_Staff_Branch

Tbl_Staff

Tbl_Branch

Database design is a complex subject, no matter how easy some people think it is. This session only scratches the surface, but it is a good scratch. A properly designed database is a model of a business, or of some "thing" in the real world. Like their physical model counterparts, data models enable you to get answers about the facts that make up the objects being modeled. It is the questions that need answers that determine which facts need to be stored in the data model.

In the relational model, data is organized in tables that have the following characteristics:

- Every record has the same number of facts.
- Every field contains the same type of facts in each record.
- There is only one entry for each fact.
- No two records are exactly the same.
- The order of the records and fields is not important.

At the end of this reading, you should have a basic understanding of problems resulting from poor database design, be familiar with the Domain/Key normal form, understand a process for designing a relational database, and be aware of the tools used in Microsoft Access to support integrity constraints in a database.

Why design?

Accurate design is crucial to the operation of a reliable and efficient information system. Microcomputer technology is now so advanced that the impact of a poor design may not show up as early as in the past; however, when the problems appear they can be severe. Although Microsoft Access and Microsoft FoxPro are powerful and easy to use, they have historically not lent themselves well to ad hoc design. The design of a database has to do with the way data is stored and how that data is related. The design process is performed after you determine exactly what information needs to be stored and how it is to be retrieved. The more carefully you design, the better the physical database meets users' needs. In the process of designing a complete system, you must consider user needs from a variety of viewpoints.

Problems Resulting from Poor Design

A myriad of problems can manifest themselves as a result of poor database design:

- The database and/or application may not function properly.
- Data may be unreliable or inaccurate.
- Performance may be degraded.
- Flexibility may be lost.

The following section explains some common problems resulting from poor database design. The problems can be grouped under two categories: redundant data and modification anomalies.

Redundant Data

Consider the following table that stores data about products and suppliers. This seemingly harmless table contains many potential problems.

Prod ID  Description  Supplier         Address        City        Region  Country
34       Keyboard     Super Computer   Hafeez Center  Lahore      OR      Pakistan
27       Mouse        Next Generation  Murree Center  Murree              Pakistan
68       Processor    Super Computer   Hafeez Center  Lahore              Pakistan
42       Printer      TechPoint        Main market    Lahore              Pakistan
20       RAM          Interlink        Raja Center    Rawalpindi          Pakistan
21       Scanner      Computer Mart    Hall Road      Lahore              Pakistan
61       Hard Disk    Giga Computer    Hall Road      Lahore      Punjab  Pakistan
46       Flash Disk   Super Computer   Hafeez Center  Lahore              Pakistan
35       Floppy       Micro Computer   Hall Road      Lahore      OR      Pakistan

Suppose the supplier Super Computer supplies another part and we want to add it to the database:

Prod ID  Description       Supplier        Address        City    Region  Country
37       Bluetooth device  Super Computer  Hafeez Center  Lahore  OR      Pakistan

Disk space is wasted by duplicating data about the supplier. Every time a new product is entered for a particular supplier, all of the supplier data has to be repeated. Imagine the problems if several suppliers supply hundreds of products each.

Modification Anomaly

What if Super Computer moves from Pakistan to China? How many rows have to change in order to ensure that the new address is recorded?

Prod ID  Description  Supplier         Address        City        Region  Country
34       Keyboard     Super Computer   Hafeez Center  Lahore      OR      China
27       Mouse        Next Generation  Murree Center  Murree              Pakistan
68       Processor    Super Computer   Hafeez Center  Lahore              China
42       Printer      TechPoint        Main market    Lahore              Pakistan
20       RAM          Interlink        Raja Center    Rawalpindi          Pakistan
21       Scanner      Computer Mart    Hall Road      Lahore              Pakistan
61       Hard Disk    Giga Computer    Hall Road      Lahore      Punjab  Pakistan
46       Flash Disk   Super Computer   Hafeez Center  Lahore              China
35       Floppy       Micro Computer   Hall Road      Lahore      OR      Pakistan

Again, imagine the issues surrounding modifications of hundreds of rows of data for one supplier. When changes are made, they must be made to all copies of the data. Think about the confusion that results from changing only a subset of the duplicate data.

Deletion Anomaly

Suppose you no longer carried product 42 and decided to delete that row from the table.

Prod ID  Description  Supplier         Address        City        Region  Country
34       Keyboard     Super Computer   Hafeez Center  Lahore      OR      Pakistan
27       Mouse        Next Generation  Murree Center  Murree              Pakistan
68       Processor    Super Computer   Hafeez Center  Lahore              Pakistan
42       Printer      TechPoint        Main market    Lahore              Pakistan
20       RAM          Interlink        Raja Center    Rawalpindi          Pakistan
21       Scanner      Computer Mart    Hall Road      Lahore              Pakistan
61       Hard Disk    Giga Computer    Hall Road      Lahore      Punjab  Pakistan
46       Flash Disk   Super Computer   Hafeez Center  Lahore              Pakistan
35       Floppy       Micro Computer   Hall Road      Lahore      OR      Pakistan

Once that row is deleted, what is the address of TechPoint in the remaining data? A deletion anomaly means that we lose more information than we want. We lose facts about more than one subject with one deletion.

Insertion Anomaly

Next, you want to add a new supplier, Computer Links, but you have not yet ordered any products from that supplier. What do you add?

Prod ID  Description  Supplier         Address        City        Region  Country
34       Keyboard     Super Computer   Hafeez Center  Lahore      OR      Pakistan
27       Mouse        Next Generation  Murree Center  Murree              Pakistan
68       Processor    Super Computer   Hafeez Center  Lahore              Pakistan
42       Printer      TechPoint        Main market    Lahore              Pakistan
20       RAM          Interlink        Raja Center    Rawalpindi          Pakistan
21       Scanner      Computer Mart    Hall Road      Lahore              Pakistan
61       Hard Disk    Giga Computer    Hall Road      Lahore      Punjab  Pakistan
46       Flash Disk   Super Computer   Hafeez Center  Lahore              Pakistan
35       Floppy       Micro Computer   Hall Road      Lahore      OR      Pakistan
?????    ???          Computer Links   Hall Road      Lahore      Punjab  Pakistan

This situation is called an insertion anomaly. Negatively stated, we cannot add a fact about one subject until we have additional data about another subject.

Domain/Key Normal Form

Relational theorists have classified database schemas that have inconsistencies based on the anomalies to which they are susceptible. You may have encountered discussions about different forms, such as first normal form or third normal form. One of the unique normalization forms was proposed by R. Fagin in 1981, and is used as the basis of this presentation. Fagin ascertained that if the tables in your database are in Domain/Key Normal Form (D/KNF), then they are free of modification anomalies. To understand D/KNF there are four terms that must be understood: dependency, key, domain, and restriction.

Dependency

A dependency is a relationship that may exist between two columns. Given the value of one column, you are able to determine the value of another column. Let's use the table in the previous examples. Given the product number, we are able to determine the product description. This is a dependency: descriptions are dependent on product numbers. Given a supplier's name, are we able to determine the product description? Not necessarily. In the case of Super Computer, this supplier has a number of products associated with it. Therefore, in the above tables, description is not dependent on supplier. To detect a dependency, ask yourself this question: in this table, does each value of one column determine exactly one value of another column?

ProductID determines Description?  YES
Supplier determines Address?       YES
Supplier determines ProductID?     NO

Key

Most tables should have a column or a combination of columns that uniquely identifies a row of data. A column is a key if all other columns in a row are dependent on it. At first glance, it may appear that the ProductID in our example uniquely identifies a row of data. But ProductID 34 identifies the supplier as Super Computer, as do product numbers 68 and 46. Therefore, the column ProductID is not the key. In this table, we have a complex key, derived from ProductID, Description, and Supplier.

Domain

A domain is the set of values a column can have. Every column has a domain, which has both physical and logical properties.

Physical Description: The physical part of a domain is the data type of the column. In our example, Supplier is defined as TEXT 40. Because of this definition, the physical description of the domain is the set of TEXT data with 40 or fewer characters. Similarly, the physical description for the domain of ProductID is expressed as INTEGER. This results in values of nine or fewer digits.

Logical Description: The logical part of the domain is the set of information associated with that fact. Supplier addresses are not in the same domain as customer addresses, although they have the same physical property of TEXT 60. Consider the value 7124 E. 41st Place. Is this value in the domain of supplier address? To be in this domain it must have fewer than 60 characters and be a supplier's address.

Restriction

A restriction is a limitation of some type on the values in a table. A dependency is a type of restriction: stating that Description is dependent on ProductID is a restriction. Keys are a type of restriction: when a column is a key, it means that all other columns in that table are dependent on the key. Remember that a key can be a combination of columns. A domain is another type of restriction: when defining the physical and logical properties of a column, we restrict the data in that column. Restriction is a general term. There are many other ways to restrict data in a table. Below are some examples:

- Invoice date must be formatted as MM/DD/YY.
- ProductID must begin with the number 100.
- Suppliers must be TEXT with 40 or fewer characters.
- Tax Total must be CURRENCY with values between $1.00 and $9,999,999.99.
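Restrictions like the examples above can often be written directly into a table definition. The sketch below uses Python's built-in sqlite3 module and SQL CHECK constraints rather than Microsoft Access; the table name `restricted`, the column names and the sample values are assumptions for illustration, and the date pattern checks only the shape of the value, not whether it is a real calendar date.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE restricted (
        InvDate   TEXT    CHECK (InvDate GLOB '[0-1][0-9]/[0-3][0-9]/[0-9][0-9]'),
        ProductID INTEGER CHECK (CAST(ProductID AS TEXT) LIKE '100%'),
        Supplier  TEXT    CHECK (length(Supplier) <= 40),
        TaxTotal  REAL    CHECK (TaxTotal BETWEEN 1.00 AND 9999999.99)
    )
    """
)


def accepted(inv_date, product_id, supplier, tax_total):
    """Return True if the row satisfies every CHECK constraint."""
    try:
        conn.execute(
            "INSERT INTO restricted VALUES (?, ?, ?, ?)",
            (inv_date, product_id, supplier, tax_total),
        )
        return True
    except sqlite3.IntegrityError:
        return False


ok = accepted("03/15/24", 10034, "Super Computer", 12.50)
bad_date = accepted("2024-03-15", 10035, "Super Computer", 12.50)
bad_id = accepted("03/15/24", 34, "Super Computer", 12.50)
bad_supplier = accepted("03/15/24", 10036, "x" * 41, 12.50)
bad_tax = accepted("03/15/24", 10037, "TechPoint", 0.50)
print(ok, bad_date, bad_id, bad_supplier, bad_tax)
```

Only the first row is stored; each of the others breaks exactly one restriction and is rejected by the database rather than by application code.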

The Normalizing Process

Normalizing the database ensures that the structure of the database allows changes to be made without incurring unexpected consequences. The role of normalization is to maintain stable, reliable data through good database design. The goal of good database design is to ensure that all restrictions are logical consequences of domain and key restrictions. Tables, like paragraphs, should have a single theme. The table in the anomalies examples has two themes:

- Information about products
- Information about the suppliers of products

The way to manage this information most efficiently is to split the table into two tables: a table of products and a table of suppliers, with each supplier stored once.

Products

Prod ID  Description  Supplier
34       Keyboard     Super Computer
27       Mouse        Next Generation
68       Processor    Super Computer
42       Printer      TechPoint
20       RAM          Interlink
21       Scanner      Computer Mart
61       Hard Disk    Giga Computer
46       Flash Disk   Super Computer
35       Floppy       Micro Computer

Suppliers

Supplier         Address        City        Region  Country
Super Computer   Hafeez Center  Lahore      OR      Pakistan
Next Generation  Murree Center  Murree              Pakistan
TechPoint        Main market    Lahore              Pakistan
Interlink        Raja Center    Rawalpindi          Pakistan
Computer Mart    Hall Road      Lahore              Pakistan
Giga Computer    Hall Road      Lahore      Punjab  Pakistan
Micro Computer   Hall Road      Lahore      OR      Pakistan
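The effect of the split can be tried out with a small script. This is a sketch using Python's built-in sqlite3 module, not the Microsoft Access implementation the text has in mind; the column name ProdID, the omission of the Region column and the reduced set of rows are simplifications made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE Suppliers (
        Supplier TEXT PRIMARY KEY NOT NULL,
        Address  TEXT,
        City     TEXT,
        Country  TEXT
    );
    CREATE TABLE Products (
        ProdID      INTEGER PRIMARY KEY,
        Description TEXT,
        Supplier    TEXT REFERENCES Suppliers (Supplier)
    );
    """
)
conn.executemany(
    "INSERT INTO Suppliers VALUES (?, ?, ?, ?)",
    [
        ("Super Computer", "Hafeez Center", "Lahore", "Pakistan"),
        ("TechPoint", "Main market", "Lahore", "Pakistan"),
    ],
)
conn.executemany(
    "INSERT INTO Products VALUES (?, ?, ?)",
    [
        (34, "Keyboard", "Super Computer"),
        (68, "Processor", "Super Computer"),
        (42, "Printer", "TechPoint"),
    ],
)

# Modification: one row changes, however many products Super Computer supplies.
changed = conn.execute(
    "UPDATE Suppliers SET Country = 'China' WHERE Supplier = 'Super Computer'"
).rowcount

# Deletion: dropping product 42 keeps TechPoint's address.
conn.execute("DELETE FROM Products WHERE ProdID = 42")
techpoint = conn.execute(
    "SELECT Address FROM Suppliers WHERE Supplier = 'TechPoint'"
).fetchone()[0]

# Insertion: a supplier can be added before any product is ordered from it.
conn.execute(
    "INSERT INTO Suppliers VALUES ('Computer Links', 'Hall Road', 'Lahore', 'Pakistan')"
)

# The original wide table comes back from a join on Supplier.
wide = conn.execute(
    """
    SELECT p.ProdID, p.Description, s.Supplier, s.Address, s.City, s.Country
    FROM Products AS p JOIN Suppliers AS s ON s.Supplier = p.Supplier
    ORDER BY p.ProdID
    """
).fetchall()
print(changed, techpoint, wide)
```

Each of the three anomalies from the earlier sections becomes a single, unambiguous statement against one table.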

Now you can add products without duplications, change supplier locations without changing several rows, and not lose information if you delete a part. If you wish, you can always bring the original table back using a query with a join on Supplier.

A Method of Database Design

As you have seen, database design plays a major role in the stability and the reliability of your data. In this section, we show you the process of designing a database. To help illustrate the design process, a database named Rags is created for a fictitious wholesale clothing manufacturer called Unlimited Rags.

Although there are a number of rules that can be followed in designing a database structure, the design process is as much an art as it is a science. Follow these rules when at all possible, but not to the point where the database loses the functionality that is so important to the user.

Doing a paper design first has several advantages:

- Saves time, money, and problems
- Makes the system more reliable; avoids potential data-modification problems
- Serves as a blueprint for discussion
- Helps in estimating costs and size

A good design should have the following objectives:

- Meet the users' needs
- Solve the problem
- Be free of modification anomalies
- Have a reliable and stable database, where the tables are as independent as possible
- Be easy to use

Design of the Database Model

The design of the database structure requires the following steps:

1. List the objects.
2. List the facts about the objects.
3. Turn the objects and facts into tables and columns.
4. Determine the relationship among objects.
5. Determine the key columns.
6. Determine the linking columns.
7. Determine the constraints.
8. Evaluate the design model.
9. Implement the database.

Step 1: List the Objects

Make a list of all objects. An object is a single theme, similar to a paragraph. At Unlimited Rags the objects are:

- Customer
- Product
- Employee
- Ship Rate
- Invoice
- Dependent

Step 2: List the Facts About the Objects

There is a great deal of information associated with every object. In this step, you should list the facts about an object and then eliminate the facts that are not important to the solution of the problem. The customer object, for example, can have many facts associated with it: company name, address, city, founders, number of employees, stock price. In this case, it is not important to keep information about the number of employees, stock price, or founders. Unlimited Rags needs only the information it will use now and possibly in the future.

Object      Important Facts About the Object
Employee    employee, name, birth date, gender, SSN, marital status
Customer    company name, address, city, state, zip, contact, title
Invoice     date, salesperson, customer, quantity, shipping charge, tax, freight
Product     product name, description, cost, markup
Dependent   name, date of birth
Ship Rates  state, rates

Step 3: Turn the Objects and Facts into Tables and Columns

Objects automatically become tables, and facts become columns once the column domains are determined. Recall that a domain is a set of values that a column can have. Every column has a domain, which has both physical and logical properties. For example, the column for employee last name is defined as TEXT 15. TEXT 15 is the physical property of the column. Because of this definition, its domain is the set of all employee last names with 15 characters or less. If a column is used to link two or more tables, the domains must be the same and the columns should be given the same name. If the logical description differs (for example, employee last name and customer last name), the columns are not the same and should not share the same name. The following is a list of the preliminary tables, columns, and domains for Unlimited Rags:

Table: CUSTOMER
Name      Type  Length
COMPANY   TEXT  45
CADD1     TEXT  30
CADD2     TEXT  30
CCITY     TEXT  25
CSTATE    TEXT  2
CZIP      TEXT  10
CAC       TEXT  3
CTELPH    TEXT  7
CONTACT   TEXT  30
TITLE     TEXT  30

Table: PRODUCT
Name      Type  Length
PRODNAME  TEXT  30
PRODDESC  TEXT  50
PRODCOST  CURR
PMARKUP   NUMB

Table: DEPENDENT
Name      Type  Length
DLAST     TEXT  15
DFIRST    TEXT  10
DDOB      D/T

Table: INVOICE
Name      Type  Length
INVDATE   D/T
REQDATE   D/T
SHIPNAME  TEXT  45
SHIPADDR  TEXT  30
SHIPCITY  TEXT  25
SHIPZIP   TEXT  10
INVTOTAL  CURR

Table: EMPLOYEE
Name      Type  Length
ESSN      TEXT  11
ELASTN    TEXT  15
EFIRSTN   TEXT  10
EDOB      D/T
EGENDER   TEXT  1
EMARITAL  TEXT  1
EADDR1    TEXT  30
EADDR2    TEXT  20
ECITY     TEXT  25
ESTATE    TEXT  2
EZIP      TEXT  10
EAC       TEXT  3
EHOMEPH   TEXT  7

Table: SHIP RATE
Name      Type  Length
SHIPST    TEXT  2
SHIPRATE  NUMB

Often it helps in the design stages to draw boxes to represent the tables. In later steps you can then fill in key columns and draw the relationships among the tables.

Step 4: Determine the Relationship Among Objects

To determine the relationship among the objects, take each object and look at how that object may be related to another. Keep in mind that not every relationship existing between objects is important. The relationships that are important are those that allow you to model the database after the real-world situation that the database represents.

One-to-one relationships. For any given row in Table A, there is only one row in Table B. For any given row in Table B, there is only one row in Table A. There are no one-to-one relationships in the Rags database. An example of a one-to-one relationship is that of employee data and private employee data. General information, such as employee name, address, and start date, is kept in one table, and to ensure privacy, personal information, such as salary, is kept in another table.

One-to-many relationships. For any given row in Table A, there are many rows in Table B. For any given row in Table B, there is only one row in Table A. The relationship between an employee and an employee's dependents is one-to-many, because one employee may have many dependents, but a dependent is related to only one employee. The relationship between customers and invoices is also one-to-many. One invoice is related to one customer, but a customer can have many invoices.

Many-to-many relationships. For any given row in Table A, there are many rows in Table B. For any given row in Table B, there are many rows in Table A. There is a many-to-many relationship between the product table and the invoice table. A product can be associated with many different invoices and an invoice can contain many different products.

In the case of the Rags database, we are attempting to model an environment that is based on sales transactions. Take the example of products and customers: although in some circumstances we may be interested in the relationship between customers and products, in a sales transaction the customer is related to a product only when a sale occurs. Therefore, a customer is related to an invoice, and the invoice carries the relationship to a product.

The first step in determining the type of relationship between tables is to list every table and to see how it relates to any others:

- Customer is related to invoice.
- Customer is not related to any other table in the list.
- Employee is related to dependent.
- Employee (sales) is related to invoice.
- Product is related to invoice.

An effective method to find the type of relationship is to ask whether a specific record in Table A can point to (is linked to) one or to many rows in Table B, and then reverse the tables and ask the question again.

Does a customer record point to one or many invoices? Many.
Does an invoice row link with one or many customers? One.
The relationship between the tables is one-to-many.

Does a sales employee write one or many invoices? Many.
Is an invoice written by one or many employees? One.
The relationship between employee and invoice is also one-to-many.

Can a product be a line item on one or many invoices? Many.
Can an invoice be linked to one or many products? Many.
The relationship between product and invoice is many-to-many.

The Ship Rate table illustrates that a table can be included in a database and not need to be relationally linked to any other table.

Step 5: Determine the Key Columns

A key can be an account number, social security number, part number, license number, or any other numeric value or combination of characters that is unique. A complex key is one that is derived from more than one column. Microsoft Access supports complex keys directly. No other row in the table can have the value of the key column(s). Other tables may share the same set of key information. If a company name is universally unique, it can be used as a unique row identifier. However, if there is any possibility another company could have the same name, then it is not unique and must not be employed as a key column. Do not use any column as a key where the possibility exists for a duplicate. A key column cannot contain null values. By definition, all key columns should be indexed. Because text names are usually not unique and cannot be used in math operations, it is useful to make key columns a sequential numeric value. In many cases, it is easier to develop your own unique row identifier. If you want automatic numbering for invoice numbers or employee ID numbers, the COUNTER data type in Microsoft Access is a good choice for the physical description of the domain of a key column.

Most of the tables in the final Rags database contain columns with a COUNTER or NUMB data type for the unique row identifier. Each key is also indexed, and duplicates are not allowed. Database performance is enhanced with a single numeric column as the key.
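The key rules from Step 5 can be tried out in a few lines. The following is a minimal sketch, assuming SQLite (through Python's standard sqlite3 module) as a stand-in for Microsoft Access; INTEGER PRIMARY KEY AUTOINCREMENT plays the role of the COUNTER data type here, and the company name is made-up sample data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE customer (
        custid  INTEGER PRIMARY KEY AUTOINCREMENT,  -- stand-in for a COUNTER key
        company TEXT NOT NULL                       -- text name, not assumed unique
    )
    """
)

# The engine hands out sequential key values; the company name may repeat.
conn.execute("INSERT INTO customer (company) VALUES ('Acme Textiles')")
conn.execute("INSERT INTO customer (company) VALUES ('Acme Textiles')")
keys = [row[0] for row in conn.execute("SELECT custid FROM customer ORDER BY custid")]

# Reusing an existing key value is rejected, so the key stays unique.
try:
    conn.execute("INSERT INTO customer (custid, company) VALUES (1, 'Other Co')")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

print(keys, duplicate_rejected)
```

Two customers with the same company name still receive distinct key values, which is the reason the text recommends a generated numeric key over a text name.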
Step 6: Determine the Linking Columns

If you have been careful about designating key columns, you also have determined the linking columns. Links provide a way to tie information (rows) in one table to another table. If a table has a key column, that column can generally serve as the link. Tables are linked together through their key columns. However, the placement of the key is important, and where the link is placed depends on the type of relationship between the tables. To determine the placement of the links, you must first know the type of relationship among the objects or tables. Once you know the type of relationship among tables, it is much easier to determine where to place the linking column to tie two tables together.

Note that not all tables need to be linked relationally. Employees must be linked with dependents, but you would not link employees with ship rates or products.

Linking in a one-to-one relationship. In one-to-one relationships the link should be the most stable column or should be from the table where the key column is created. The most stable is the column least likely to change. If an automatic numbering system is being used, then use that column as the linking column.

Linking in a one-to-many relationship. In one-to-many relationships the linking column should come from the table on the "one" side. The key column from the employee table (one side) should be placed in the dependent table (many side). When the key empid is placed in the dependent table, it is referred to as a foreign key in the dependent table.

Linking in a many-to-many relationship. The many-to-many relationship causes problems when attempting to retrieve data and when relating a value in one table to its corresponding value in the other table. It is important to understand this relationship to be able to recognize and control this situation when it arises. A classic many-to-many relationship is product and invoice. A product can be an item on many different invoices and an invoice can have many products associated with it. But which key will we use for a link? If invid is placed in the product table, then all of the product data would have to be repeated for each invoice that contains that product. If prodid is placed in the invoice table, then the invoice information has to be repeated for each product contained in the invoice. This leads to redundant data, and the potential for invalid data is increased. Performance may suffer. The solution to many-to-many relationships is to create an intersection table. This table should contain the key columns from both tables. This is illustrated in the following diagram.

Step 7: Determine the Relationship Constraints

Often the information we get from a database comes from more than one table. For example, if we want to know who the parent of a particular dependent is, the name is determined by using the value in empid to look up the correct row in the employee table. The question of who the parent is can be answered only if there is a row in the employee table with an empid value corresponding to that in the dependent table.

To ensure the integrity of the data in our database, our model should require, for example, that no row can be added to the dependent table, unless there is already a corresponding row in the employee table. This requirement is known as a relationship constraint. In this case, a constraint must exist on the dependent table that ensures that the employee (parent) exists. If you are creating an invoice, you must have a customer to bill. An entry in the customer table must exist before the invoice can be written. In this case, a constraint must exist on the invoice table to ensure that the customer exists.

There are at least four methods to implement relationship constraints:

Built-in controls in the DBMS
Data entry and access procedures
Programming
Implementation of rules

Microsoft Access has certain referential integrity constraint mechanisms built into the engine. With Microsoft FoxPro, the relationship constraints must be handled programmatically. In Microsoft Access, rules at the database or form level can be employed to enforce column domains (for example, accept values less than 200, or text value must be F or M) or in any other operation where you want a data entry test to be performed.
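The intersection table from Step 6 and the relationship constraint described here can be sketched together. This is an illustration only, assuming SQLite through Python's sqlite3 module rather than Microsoft Access; the table name trans (standing in for the TRANSACTION table, since TRANSACTION is an SQL keyword) and all row values are assumptions made for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled
conn.executescript(
    """
    CREATE TABLE customer (custid INTEGER PRIMARY KEY, company TEXT NOT NULL);
    CREATE TABLE product  (prodid INTEGER PRIMARY KEY, pname   TEXT NOT NULL);
    CREATE TABLE invoice (
        invid  INTEGER PRIMARY KEY,
        custid INTEGER NOT NULL REFERENCES customer(custid)  -- link from the "one" side
    );
    -- Intersection table: carries the key columns of both invoice and product.
    CREATE TABLE trans (
        invid  INTEGER NOT NULL REFERENCES invoice(invid),
        prodid INTEGER NOT NULL REFERENCES product(prodid),
        tqty   INTEGER NOT NULL,
        PRIMARY KEY (invid, prodid)
    );
    """
)

conn.execute("INSERT INTO customer VALUES (1, 'Acme Textiles')")
conn.executemany("INSERT INTO product VALUES (?, ?)", [(10, "Scarf"), (11, "Shirt")])
conn.executemany("INSERT INTO invoice VALUES (?, ?)", [(100, 1), (101, 1)])
conn.executemany(
    "INSERT INTO trans VALUES (?, ?, ?)",
    [(100, 10, 2), (100, 11, 1), (101, 10, 5)],
)

# Relationship constraint: an invoice for a customer that does not exist is refused.
try:
    conn.execute("INSERT INTO invoice VALUES (102, 999)")
    orphan_rejected = False
except sqlite3.IntegrityError:
    orphan_rejected = True

# The many-to-many relationship can be read in both directions without repeating data.
invoices_with_scarf = [
    r[0] for r in conn.execute("SELECT invid FROM trans WHERE prodid = 10 ORDER BY invid")
]
products_on_100 = [
    r[0] for r in conn.execute("SELECT prodid FROM trans WHERE invid = 100 ORDER BY prodid")
]
print(orphan_rejected, invoices_with_scarf, products_on_100)
```

This corresponds to the first of the four methods listed (built-in controls in the DBMS); the same check could instead be written in application code, as the text notes for FoxPro.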

Step 8: Evaluate the Design

The next step in the design process is the evaluation of the design. In this step, you should look for any design flaws that could cause the data to be unreliable, unstable, or redundant. Every table should be evaluated by asking the following questions:

1. Does each table have a single theme? It should. Each column should be a fact about the key.
2. Does each table have a key column(s)? It should.
3. Are there any dependencies? Only logical consequences of the key should exist.
4. Are the domains unique among tables? Do not mix domains unless the column is common between tables.
5. Are the restrictions domain or key?
6. Is the table easy to use?

Evaluation of the Customer Table

CUSTID COMPANY CADD1 CADD2 CCITY CSTATE CZIP CAC CTELPH CONTACT TITLE

The table has a single theme: customers. The table has a key: custid. The table does not have any dependencies that are not logical consequences of the key. Given custid, a company and company address can be uniquely determined. Given a company, we cannot determine any particular custid. Given a state, we cannot determine any particular custid. Therefore, the customer table does not have any dependencies other than those on the key. The column names are not used in any other tables except for custid, which is a foreign key in the invoice table. The restrictions are domain or key.

Step 9: Implement the Design

Once the database has been designed on paper, the next step is to implement the design in Microsoft Access. When defining tables in Microsoft Access, it is extremely important to keep your paper design in mind. Designing a database on the fly can cause problems that may be quite difficult to recover from. (Remember the anomalies earlier in this chapter.)

In Microsoft Access 2.0, there are two tools that will help you complete the implementation of your design. The Table Wizard can be used to generate a variety of common tables. The graphical system relationships window can be used to set up relationships and key dependencies. Note that while using the Table Wizard ensures proper relational design, the eight steps prior to implementation remain important and should be completed prior to constructing the tables.

The following is a list of the final tables, columns, and domains for Unlimited Rags, including linking columns:

Table: CUSTOMER

Name       Type      Length
CUSTID     COUNTER
COMPANY    TEXT      45
CADD1      TEXT      30
CADD2      TEXT      30
CCITY      TEXT      25
CSTATE     TEXT      2
CZIP       TEXT      10
CAC        TEXT      3
CTELPH     TEXT      7
CONTACT    TEXT      30
TITLE      TEXT      30

Table: PRODUCT

Name       Type      Length
PRODID     COUNTER
PNAME      TEXT      30
PDESCRIP   TEXT      50
PCOST      CURR
PMARKUP    NUMB

Table: SHIP RATE

Name       Type      Length
SHIPST     TEXT      2
SHIPRATE   NUMB

Table: INVOICE

Name       Type      Length
INVID      COUNTER
CUSTID     NUMB
INVDATE    D/T
REQDATE    D/T
SHIPNAME   TEXT      45
SHIPADDR   TEXT      30
SHIPCITY   TEXT      25
SHIPST     TEXT      2
SHIPZIP    TEXT      10
INVTOTAL   CURR

Table: TRANSACTION

Name       Type      Length
INVID      NUMB
PRODID     NUMB
TQTY       NUMB
TDISC      NUMB
TPRICE     CURR

Table: EMPLOYEE

Name       Type      Length
EMPID      COUNTER
ESSN       TEXT      11
ELASTN     TEXT      15
EFIRSTN    TEXT      10
EDOB       D/T
EGENDER    TEXT      1
EMARITAL   TEXT      1
EADDR1     TEXT      30
EADDR2     TEXT      20
ECITY      TEXT      25
ESTATE     TEXT      2
EZIP       TEXT      10
EAC        TEXT      3
EHOMEPH    TEXT      7

Table: DEPENDENT

Name       Type      Length
EMPID      NUMB
DLAST      TEXT      15
DFIRST     TEXT      10
DDOB       D/T

Summary

By following the nine-step design process, the problems of data redundancy, changing multiple occurrences of data, and deletion and insertion anomalies can be avoided. It is well worth the time spent in the design process to ensure a reliable and flexible system. Design to the point where redundancy is eliminated or controlled. As you design your database, keep in mind the following list of common database errors to avoid:

Trash table: putting everything in the same table
No unique row identifier (key column or columns)
No linking or common columns
Mixing logical and physical descriptions of domains
Putting the linking column in the wrong table
Restrictions not enforced
Many-to-many relationships without intersecting tables

Database normalization
In the field of relational database design, normalization is a systematic way of ensuring that a database structure is suitable for general-purpose querying and free of certain undesirable characteristics (insertion, update, and deletion anomalies) that could lead to a loss of data integrity.[1]

Edgar F. Codd, the inventor of the relational model, introduced the concept of normalization and what we now know as the First Normal Form (1NF) in 1970.[2] Codd went on to define the Second Normal Form (2NF) and Third Normal Form (3NF) in 1971,[3] and Codd and Raymond F. Boyce defined the Boyce-Codd Normal Form (BCNF) in 1974.[4] Higher normal forms were defined by other theorists in subsequent years, the most recent being the Sixth Normal Form (6NF) introduced by Chris Date, Hugh Darwen, and Nikos Lorentzos in 2002.[5]

Informally, a relational database table (the computerized representation of a relation) is often described as "normalized" if it is in the Third Normal Form.[6] Most 3NF tables are free of insertion, update, and deletion anomalies, i.e. in most cases 3NF tables adhere to BCNF, 4NF, and 5NF (but typically not 6NF).

A standard piece of database design guidance is that the designer should create a fully normalized design; selective denormalization can subsequently be performed for performance reasons.[7] However, some modeling disciplines, such as the dimensional modeling approach to data warehouse design, explicitly recommend non-normalized designs, i.e. designs that in large part do not adhere to 3NF.[8]

Objectives of normalization

A basic objective of the first normal form defined by Codd in 1970 was to permit data to be queried and manipulated using a "universal data sub-language" grounded in first-order logic.[9] (SQL is an example of such a data sub-language, albeit one that Codd regarded as seriously flawed.)[10] The objectives of normalization beyond 1NF (First Normal Form) were stated as follows by Codd:
1. To free the collection of relations from undesirable insertion, update and deletion dependencies;
2. To reduce the need for restructuring the collection of relations as new types of data are introduced, and thus increase the life span of application programs;
3. To make the relational model more informative to users;
4. To make the collection of relations neutral to the query statistics, where these statistics are liable to change as time goes by.

E.F. Codd, "Further Normalization of the Data Base Relational Model"[11]

The sections below give details of each of these objectives.

Free the database of modification anomalies

An update anomaly. Employee 519 is shown as having different addresses on different records.

An insertion anomaly. Until the new faculty member, Dr. Newsome, is assigned to teach at least one course, his details cannot be recorded.

A deletion anomaly. All information about Dr. Giddens is lost when he temporarily ceases to be assigned to any courses.

When an attempt is made to modify (update, insert into, or delete from) a table, undesired side-effects may follow. Not all tables can suffer from these side-effects; rather, the side-effects can only arise in tables that have not been sufficiently normalized. An insufficiently normalized table might have one or more of the following characteristics:

The same information can be expressed on multiple rows; therefore updates to the table may result in logical inconsistencies. For example, each record in an "Employees' Skills" table might contain an Employee ID, Employee Address, and Skill; thus a change of address for a particular employee will potentially need to be applied to multiple records (one for each of his skills). If the update is not carried through successfully (if, that is, the employee's address is updated on some records but not others) then the table is left in an inconsistent state. Specifically, the table provides conflicting answers to the question of what this particular employee's address is. This phenomenon is known as an update anomaly.

There are circumstances in which certain facts cannot be recorded at all. For example, each record in a "Faculty and Their Courses" table might contain a Faculty ID, Faculty Name, Faculty Hire Date, and Course Code; thus we can record the details of any faculty member who teaches at least one course, but we cannot record the details of a newly-hired faculty member who has not yet been assigned to teach any courses. This phenomenon is known as an insertion anomaly.

There are circumstances in which the deletion of data representing certain facts necessitates the deletion of data representing completely different facts. The "Faculty and Their Courses" table described in the previous example suffers from this type of anomaly, for if a faculty member temporarily ceases to be assigned to any courses, we must delete the last of the records on which that faculty member appears, effectively also deleting the faculty member. This phenomenon is known as a deletion anomaly.
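The update anomaly can be reproduced with a handful of rows. The sketch below uses plain Python data structures rather than a DBMS; the employee IDs, addresses, and skills are invented sample values, and the helper addresses_of is a hypothetical name introduced only for this illustration.

```python
# Insufficiently normalized "Employees' Skills" rows: the address repeats once per skill.
employee_skills = [
    {"employee_id": 519, "address": "12 Oak Street", "skill": "Typing"},
    {"employee_id": 519, "address": "12 Oak Street", "skill": "Shorthand"},
    {"employee_id": 426, "address": "87 Sycamore Grove", "skill": "Typing"},
]


def addresses_of(rows, employee_id):
    """Return the set of distinct addresses recorded for one employee."""
    return {r["address"] for r in rows if r["employee_id"] == employee_id}


# An incomplete update: only the first matching record is changed.
for row in employee_skills:
    if row["employee_id"] == 519:
        row["address"] = "96 Walnut Avenue"
        break

# The table now gives two conflicting answers for employee 519.
conflicting = addresses_of(employee_skills, 519)

# Normalized alternative: the address is stored once per employee,
# and skills live in a separate structure keyed by employee.
employees = {519: "12 Oak Street", 426: "87 Sycamore Grove"}
skills = [(519, "Typing"), (519, "Shorthand"), (426, "Typing")]
employees[519] = "96 Walnut Avenue"  # a single change; no second copy can disagree

print(sorted(conflicting), employees[519])
```

In the normalized form, an employee with no skills (or a faculty member with no courses) can also be recorded or retained, which is what removes the insertion and deletion anomalies.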

Minimize redesign when extending the database structure

When a fully normalized database structure is extended to allow it to accommodate new types of data, the pre-existing aspects of the database structure can remain largely or entirely unchanged. As a result, applications interacting with the database are minimally affected.

Make the data model more informative to users

Normalized tables, and the relationship between one normalized table and another, mirror real-world concepts and their interrelationships.

Avoid bias towards any particular pattern of querying

Normalized tables are suitable for general-purpose querying. This means any queries against these tables, including future queries whose details cannot be anticipated, are supported. In contrast, tables that are not normalized lend themselves to some types of queries, but not others.

For example, consider an online bookseller whose customers maintain wishlists of books they'd like to have. For the obvious, anticipated query (what books does this customer want?) it's enough to store the customer's wishlist in the table as, say, a homogeneous string of authors and titles.

With this design, though, the database can answer only that one single query. It cannot by itself answer interesting but unanticipated queries: What is the most-wished-for book? Which customers are interested in WWII espionage? How does Lord Byron stack up against his contemporary poets? Answers to these questions must come from special adaptive tools completely separate from the database. One tool might be software written especially to handle such queries. This special adaptive software has just one single purpose: in effect to normalize the non-normalized field.

Unforeseen queries can be answered trivially, and entirely within the database framework, with a normalized table.

Example

Querying and manipulating the data within an unnormalized data structure, such as the following non-1NF representation of customers' credit card transactions, involves more complexity than is really necessary:

Customer   Transactions
           Tr. ID   Date          Amount
Jones      12890    14-Oct-2003   -87
           12904    15-Oct-2003   -50
Wilkins    12898    14-Oct-2003   -21
Stevens    12907    15-Oct-2003   -18
           14920    20-Nov-2003   -70
           15003    27-Nov-2003   -60

To each customer there corresponds a repeating group of transactions. The automated evaluation of any query relating to customers' transactions therefore would broadly involve two stages:

Unpacking one or more customers' groups of transactions, allowing the individual transactions in a group to be examined, and
Deriving a query result based on the results of the first stage

For example, in order to find out the monetary sum of all transactions that occurred in October 2003 for all customers, the system would have to know that it must first unpack the Transactions group of each customer, then sum the Amounts of all transactions thus obtained where the Date of the transaction falls in October 2003.

One of Codd's important insights was that this structural complexity could always be removed completely, leading to much greater power and flexibility in the way queries could be formulated (by users and applications) and evaluated (by the DBMS). The normalized equivalent of the structure above would look like this:

Customer   Tr. ID   Date          Amount
Jones      12890    14-Oct-2003   -87
Jones      12904    15-Oct-2003   -50
Wilkins    12898    14-Oct-2003   -21
Stevens    12907    15-Oct-2003   -18
Stevens    14920    20-Nov-2003   -70
Stevens    15003    27-Nov-2003   -60
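The October 2003 total discussed in this example can be computed against both shapes of the data. The following is a rough sketch in plain Python rather than a DBMS query language; the variable names are illustrative, and the values are the ones shown in the two tables.

```python
from datetime import date

# Non-1NF shape: each customer carries a repeating group of transactions.
nested = {
    "Jones": [(12890, date(2003, 10, 14), -87), (12904, date(2003, 10, 15), -50)],
    "Wilkins": [(12898, date(2003, 10, 14), -21)],
    "Stevens": [
        (12907, date(2003, 10, 15), -18),
        (14920, date(2003, 11, 20), -70),
        (15003, date(2003, 11, 27), -60),
    ],
}

# Stage 1: unpack every group into individual (customer, id, date, amount) rows.
unpacked = [(customer, *tr) for customer, group in nested.items() for tr in group]

# Normalized shape: the rows are already individual transactions.
transactions = [
    ("Jones", 12890, date(2003, 10, 14), -87),
    ("Jones", 12904, date(2003, 10, 15), -50),
    ("Wilkins", 12898, date(2003, 10, 14), -21),
    ("Stevens", 12907, date(2003, 10, 15), -18),
    ("Stevens", 14920, date(2003, 11, 20), -70),
    ("Stevens", 15003, date(2003, 11, 27), -60),
]


def october_2003_total(rows):
    """Sum the amounts of all rows whose date falls in October 2003."""
    return sum(amount for _, _, d, amount in rows if (d.year, d.month) == (2003, 10))


# Stage 2 on the unpacked rows gives the same answer as one pass over the flat rows.
print(october_2003_total(unpacked), october_2003_total(transactions))  # -176 -176
```

The unpacking step is exactly the extra work the normalized table makes unnecessary.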

Now each row represents an individual credit card transaction, and the DBMS can obtain the answer of interest, simply by finding all rows with a Date falling in October, and summing their Amounts. All of the values in the data structure are on an equal footing: they are all exposed to the DBMS directly, and can directly participate in queries, whereas in the previous situation some values were embedded in lower-level structures that had to be handled specially. Accordingly, the normalized design lends itself to general-purpose query processing, whereas the unnormalized design does not.

Background to normalization: definitions

Functional dependency
In a given table, an attribute Y is said to have a functional dependency on a set of attributes X (written X → Y) if and only if each X value is associated with precisely one Y value. For example, in an "Employee" table that includes the attributes "Employee ID" and "Employee Date of Birth", the functional dependency {Employee ID} → {Employee Date of Birth} would hold.

Trivial functional dependency
A trivial functional dependency is a functional dependency of an attribute on a superset of itself. {Employee ID, Employee Address} → {Employee Address} is trivial, as is {Employee Address} → {Employee Address}.

Full functional dependency
An attribute is fully functionally dependent on a set of attributes X if it is functionally dependent on X, and not functionally dependent on any proper subset of X. {Employee Address} has a functional dependency on {Employee ID, Skill}, but not a full functional dependency, because it is also dependent on {Employee ID}.

Transitive dependency
A transitive dependency is an indirect functional dependency, one in which X → Z only by virtue of X → Y and Y → Z.

Multivalued dependency
A multivalued dependency is a constraint according to which the presence of certain rows in a table implies the presence of certain other rows.

Join dependency
A table T is subject to a join dependency if T can always be recreated by joining multiple tables each having a subset of the attributes of T.

Superkey
A superkey is a combination of attributes that can be uniquely used to identify a database record. A table might have many superkeys.

Candidate key
A candidate key is a special subset of superkeys that do not have any extraneous information in them. Examples: Imagine a table with the fields <Name>, <Age>, <SSN> and <Phone Extension>. This table has many possible superkeys. Three of these are <SSN>, <Phone Extension, Name> and <SSN, Name>. Of those listed, only <SSN> is a candidate key, as the others contain information not necessary to uniquely identify records.

Non-prime attribute
A non-prime attribute is an attribute that does not occur in any candidate key. Employee Address would be a non-prime attribute in the "Employees' Skills" table.

Primary key
Most DBMSs require a table to be defined as having a single unique key, rather than a number of possible unique keys. A primary key is a key which the database designer has designated for this purpose.

Normal forms

The normal forms (abbrev. NF) of relational database theory provide criteria for determining a table's degree of vulnerability to logical inconsistencies and anomalies. The higher the normal form applicable to a table, the less vulnerable it is to inconsistencies and anomalies. Each table has a "highest normal form" (HNF): by definition, a table always meets the requirements of its HNF and of all normal forms lower than its HNF; also by definition, a table fails to meet the requirements of any normal form higher than its HNF.

The normal forms are applicable to individual tables; to say that an entire database is in normal form n is to say that all of its tables are in normal form n.

Newcomers to database design sometimes suppose that normalization proceeds in an iterative fashion, i.e. a 1NF design is first normalized to 2NF, then to 3NF, and so on. This is not an accurate description of how normalization typically works. A sensibly designed table is likely to be in 3NF on the first attempt; furthermore, if it is 3NF, it is overwhelmingly likely to have an HNF of 5NF. Achieving the "higher" normal forms (above 3NF) does not usually require an extra expenditure of effort on the part of the designer, because 3NF tables usually need no modification to meet the requirements of these higher normal forms.

The main normal forms are summarized below.

First normal form (1NF)
Defined by: Two versions: E.F. Codd (1970), C.J. Date (2003)[12]
Brief definition: Table faithfully represents a relation and has no repeating groups

Second normal form (2NF)
Defined by: E.F. Codd (1971)[13]
Brief definition: No non-prime attribute in the table is functionally dependent on a proper subset of a candidate key

Third normal form (3NF)
Defined by: E.F. Codd (1971)[14]; see also Carlo Zaniolo's equivalent but differently-expressed definition (1982)[15]
Brief definition: Every non-prime attribute is non-transitively dependent on every candidate key in the table

Boyce-Codd normal form (BCNF)
Defined by: Raymond F. Boyce and E.F. Codd (1974)[16]
Brief definition: Every non-trivial functional dependency in the table is a dependency on a superkey

Fourth normal form (4NF)
Defined by: Ronald Fagin (1977)[17]
Brief definition: Every non-trivial multivalued dependency in the table is a dependency on a superkey

Fifth normal form (5NF)
Defined by: Ronald Fagin (1979)[18]
Brief definition: Every non-trivial join dependency in the table is implied by the superkeys of the table

Domain/key normal form (DKNF)
Defined by: Ronald Fagin (1981)[19]
Brief definition: Every constraint on the table is a logical consequence of the table's domain constraints and key constraints

Sixth normal form (6NF)
Defined by: C.J. Date, Hugh Darwen, and Nikos Lorentzos (2002)[5]
Brief definition: Table features no non-trivial join dependencies at all (with reference to generalized join operator)
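Most of these definitions rest on functional dependencies, so it can help to test one against sample rows. The sketch below is an illustration, not a normalization tool: the function holds is a hypothetical helper, the rows are invented, and a check on sample data can only show that a dependency is not contradicted by those rows, not that it holds as a rule of the business.

```python
def holds(rows, determinant, dependent):
    """Return True if no two rows agree on `determinant` but differ on `dependent`."""
    seen = {}
    for row in rows:
        x = tuple(row[a] for a in determinant)
        y = tuple(row[a] for a in dependent)
        # setdefault stores y the first time x is seen and returns the stored value after.
        if seen.setdefault(x, y) != y:
            return False
    return True


# "Employees' Skills" rows with candidate key {employee_id, skill}.
rows = [
    {"employee_id": 1, "skill": "Typing", "address": "A"},
    {"employee_id": 1, "skill": "Shorthand", "address": "A"},
    {"employee_id": 2, "skill": "Typing", "address": "B"},
]

# The whole key determines the address, as any key must.
key_determines_address = holds(rows, ["employee_id", "skill"], ["address"])

# So does a proper subset of the key: address is not FULLY dependent on the key,
# which is the situation the 2NF definition rules out.
partial_dependency = holds(rows, ["employee_id"], ["address"])

# Skill alone does not determine the address ("Typing" appears with A and with B).
skill_determines_address = holds(rows, ["skill"], ["address"])

print(key_determines_address, partial_dependency, skill_determines_address)
```

Moving address into a table keyed by employee_id alone removes the partial dependency, which matches the 2NF definition given in the summary.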

Denormalization

Databases intended for online transaction processing (OLTP) are typically more normalized than databases intended for online analytical processing (OLAP). OLTP applications are characterized by a high volume of small transactions such as updating a sales record at a supermarket checkout counter. The expectation is that each transaction will leave the database in a consistent state. By contrast, databases intended for OLAP operations are primarily "read mostly" databases. OLAP applications tend to extract historical data that has accumulated over a long period of time. For such databases, redundant or "denormalized" data may facilitate business intelligence applications. Specifically, dimensional tables in a star schema often contain denormalized data. The denormalized or redundant data must be carefully controlled during extract, transform, load (ETL) processing, and users should not be permitted to see the data until it is in a consistent state. The normalized alternative to the star schema is the snowflake schema. In many cases, the need for denormalization has waned as computers and RDBMS software have become more powerful, but since data volumes have generally increased along with hardware and software performance, OLAP databases often still use denormalized schemas.

Denormalization is also used to improve performance on smaller computers as in computerized cash-registers and mobile devices, since these may use the data for look-up only (e.g. price lookups). Denormalization may also be used when no RDBMS exists for a platform (such as Palm), or no changes are to be made to the data and a swift response is crucial.

Non-first normal form (NF² or N1NF)

In recognition that denormalization can be deliberate and useful, the non-first normal form is a definition of database designs which do not conform to first normal form, by allowing "sets and sets of sets to be attribute domains" (Schek 1982). The languages used to query and manipulate data in the model must be extended accordingly to support such values.

One way of looking at this is to consider such structured values as being specialized types of values (domains), with their own domain-specific languages. However, what is usually meant by non-1NF models is the approach in which the relational model and the languages used to query it are extended with a general mechanism for such structure; for instance, the nested relational model supports the use of relations as domain values, by adding two additional operators (nest and unnest) to the relational algebra that can create and flatten nested relations, respectively.

Consider the following table:

First Normal Form

Person   Favorite Color
Bob      blue
Bob      red
Jane     green
Jane     yellow
Jane     red

Assume a person has several favorite colors. Obviously, favorite colors consist of a set of colors modeled by the given table.

To transform a 1NF into an NF² table a "nest" operator is required which extends the relational algebra of the higher normal forms. Applying the "nest" operator to the 1NF table yields the following NF² table:

Non-First Normal Form

Person   Favorite Colors
         Favorite Color
Bob      blue
         red
Jane     green
         yellow
         red

To transform this NF² table back into a 1NF an "unnest" operator is required which extends the relational algebra of the higher normal forms (one would allow "colors" to be its own table).

Although "unnest" is the mathematical inverse to "nest", the operator "nest" is not always the mathematical inverse of "unnest". Another constraint required is for the operators to be bijective, which is covered by the Partitioned Normal Form (PNF).
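The two operators can be sketched over the favorite-color data. This is a simplified illustration that models a relation as a Python set of tuples and a nested attribute as a frozenset; the function names nest and unnest follow the text, but the implementation is an assumption made for the example and handles only this two-column case.

```python
def nest(rows):
    """Group (person, color) pairs into (person, frozenset_of_colors) tuples."""
    grouped = {}
    for person, color in rows:
        grouped.setdefault(person, set()).add(color)
    return {(person, frozenset(colors)) for person, colors in grouped.items()}


def unnest(nested):
    """Flatten (person, frozenset_of_colors) tuples back into (person, color) pairs."""
    return {(person, color) for person, colors in nested for color in colors}


flat = {
    ("Bob", "blue"), ("Bob", "red"),
    ("Jane", "green"), ("Jane", "yellow"), ("Jane", "red"),
}

# unnest undoes nest: flattening the nested table gives back the 1NF table.
nested = nest(flat)
round_trip_ok = unnest(nested) == flat

# nest does not always undo unnest: a nested table that lists Bob twice
# is merged into a single Bob row after unnest followed by nest.
split = {("Bob", frozenset({"blue"})), ("Bob", frozenset({"red"}))}
renested = nest(unnest(split))

print(round_trip_ok, renested == split)
```

A nested table in which each person appears only once, as in the NF² table above, survives the round trip in both directions; this is the kind of restriction the Partitioned Normal Form imposes.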
