
Database normalization

Database normalization is a design technique by which relational database tables are structured in such a way as to make them invulnerable to certain types of logical inconsistencies and anomalies. Tables can be normalized to varying degrees: relational database theory defines "normal forms" of successively higher degrees of stringency, so, for example, a table in third normal form is less open to logical inconsistencies and anomalies than a table that is only in second normal form. Although the normal forms are often defined (informally) in terms of the characteristics of tables, rigorous definitions of the normal forms are concerned with the characteristics of mathematical constructs known as relations. Whenever information is represented relationally (that is, roughly speaking, as values within rows beneath fixed column headings), it makes sense to ask to what extent the representation is normalized.

Contents

1 Problems addressed by normalization
2 Background to normalization: definitions
3 History
4 Normal forms
4.1 First normal form
4.2 Second normal form
4.3 Third normal form
4.4 Boyce-Codd normal form
4.5 Fourth normal form
4.6 Fifth normal form
4.7 Domain/key normal form
4.8 Sixth normal form
5 Example Of The Process
5.1 Starting Point
5.2 1NF
5.3 2NF
5.4 3NF and BCNF
5.5 4NF
5.6 5NF
6 Denormalization
6.1 Non-first normal form (NF²)
7 Further reading
8 References
9 See also
10 External links

Problems addressed by normalization


A table that is not sufficiently normalized can suffer from logical inconsistencies of various types, and from anomalies involving data operations. In such a table:

The same fact can be expressed on multiple records; therefore updates to the table may result in logical inconsistencies. For example, each record in an unnormalized "DVD Rentals" table might contain a DVD ID, Member ID, and Member Address; thus a change of address for a particular member will potentially need to be applied to multiple records. If the update is not carried through successfully (if, that is, the member's address is updated on some records but not others), then the table is left in an inconsistent state. Specifically, the table provides conflicting answers to the question of what this particular member's address is. This phenomenon is known as an update anomaly.

There are circumstances in which certain facts cannot be recorded at all. In the above example, if it is the case that Member Address is held only in the "DVD Rentals" table, then we cannot record the address of a member who has not yet rented any DVDs. This phenomenon is known as an insertion anomaly.

There are circumstances in which the deletion of data representing certain facts necessitates the deletion of data representing completely different facts. For example, suppose a table has the attributes Student ID, Course ID, and Lecturer ID (a given student is enrolled in a given course, which is taught by a given lecturer). If the number of students enrolled in the course temporarily drops to zero, the last of the records referencing that course must be deleted, meaning, as a side-effect, that the table no longer tells us which lecturer has been assigned to teach the course. This phenomenon is known as a deletion anomaly.
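The update anomaly described above can be reproduced in a few lines. The following is a minimal sketch using Python's built-in sqlite3 module; the table name dvd_rentals, the column names, and the sample addresses are illustrative assumptions, not part of the original example.

```python
import sqlite3

def member_addresses(conn, member_id):
    """Return every distinct address the table records for one member."""
    rows = conn.execute(
        "SELECT DISTINCT member_address FROM dvd_rentals WHERE member_id = ?",
        (member_id,),
    ).fetchall()
    return {row[0] for row in rows}

conn = sqlite3.connect(":memory:")
# Unnormalized design: the member's address is repeated on every rental row.
conn.execute(
    "CREATE TABLE dvd_rentals ("
    " dvd_id INTEGER, member_id INTEGER, member_address TEXT,"
    " PRIMARY KEY (dvd_id, member_id))"
)
conn.executemany(
    "INSERT INTO dvd_rentals VALUES (?, ?, ?)",
    [(1, 100, "12 Oak St"), (2, 100, "12 Oak St"), (3, 200, "9 Elm Rd")],
)

# A change of address that reaches only one of member 100's two rows.
conn.execute(
    "UPDATE dvd_rentals SET member_address = '7 Pine Ave' "
    "WHERE member_id = 100 AND dvd_id = 1"
)

# The table now gives two conflicting answers for the same member.
conflicting = member_addresses(conn, 100)
print(conflicting)
```

Holding the address once, in a separate members table keyed by member ID, would make this partial update impossible, because there would be only one row to change.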

Ideally, a relational database should be designed in such a way as to exclude the possibility of update, insertion, and deletion anomalies. The normal forms of relational database theory provide guidelines for deciding whether a particular design will be vulnerable to such anomalies. It is possible to correct an unnormalized design so as to make it adhere to the demands of the normal forms: this is normalization. Normalization typically involves decomposing an unnormalized table into two or more tables which, were they to be combined (joined), would convey exactly the same information as the original table.

Background to normalization: definitions

Functional dependency: Attribute B has a functional dependency on attribute A if, for each value of attribute A, there is exactly one value of attribute B. For example, Member Address has a functional dependency on Member ID, because each Member ID value corresponds to exactly one Member Address value. An attribute may be functionally dependent either on a single attribute or on a combination of attributes. It is not possible to determine the extent to which a design is normalized without understanding what functional dependencies apply to the attributes within its tables; understanding this, in turn, requires knowledge of the problem domain.

Trivial functional dependency: A trivial functional dependency is a functional dependency of an attribute on a superset of itself. {Member ID, Member Address} → {Member Address} is trivial, as is {Member Address} → {Member Address}.

Full functional dependency: An attribute is fully functionally dependent on a set of attributes X if it is a) functionally dependent on X, and b) not functionally dependent on any proper subset of X. {Member Address} has a functional dependency on {DVD ID, Member ID}, but not a full functional dependency, for it is also dependent on {Member ID}.

Multivalued dependency: A multivalued dependency is a constraint according to which the presence of certain rows in a table implies the presence of certain other rows: see the Multivalued Dependency article for a rigorous definition.

Superkey: A superkey is an attribute or set of attributes that uniquely identifies rows within a table; in other words, two distinct rows are always guaranteed to have distinct superkeys. {DVD ID, Member ID, Member Address} would be a superkey for the "DVD Rentals" table; {DVD ID, Member ID} would also be a superkey.

Candidate key: A candidate key is a minimal superkey, that is, a superkey for which we can say that no proper subset of it is also a superkey. {DVD ID, Member ID} would be a candidate key for the "DVD Rentals" table.

Non-prime attribute: A non-prime attribute is an attribute that does not occur in any candidate key. Member Address would be a non-prime attribute in the "DVD Rentals" table.

Primary key: Most DBMSs require a table to be defined as having a single unique key, rather than a number of possible unique keys. A primary key is a candidate key which the database designer has designated for this purpose.
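The definitions above can be made concrete with a small sketch in Python. The helper names (holds, is_superkey, candidate_keys) and the sample rows are hypothetical. Note the limitation: sample data can only refute a functional dependency or key, never prove it, since, as stated above, the real dependencies come from knowledge of the problem domain.

```python
from itertools import combinations

def holds(rows, lhs, rhs):
    """True if the functional dependency lhs -> rhs is not violated by the rows."""
    seen = {}
    for row in rows:
        key = tuple(row[a] for a in lhs)
        value = tuple(row[a] for a in rhs)
        # setdefault stores the first value seen for this key and returns it.
        if seen.setdefault(key, value) != value:
            return False
    return True

def is_superkey(rows, attrs):
    """True if no two rows share the same values for attrs."""
    keys = [tuple(row[a] for a in attrs) for row in rows]
    return len(set(keys)) == len(keys)

def candidate_keys(rows, attributes):
    """Minimal superkeys, found by trying attribute sets from smallest to largest."""
    found = []
    for size in range(1, len(attributes) + 1):
        for combo in combinations(attributes, size):
            if any(set(key) <= set(combo) for key in found):
                continue  # contains a smaller key, so it is not minimal
            if is_superkey(rows, combo):
                found.append(combo)
    return found

# Hypothetical sample of the "DVD Rentals" table.
rentals = [
    {"dvd_id": 1, "member_id": 100, "member_address": "12 Oak St"},
    {"dvd_id": 2, "member_id": 100, "member_address": "12 Oak St"},
    {"dvd_id": 1, "member_id": 200, "member_address": "9 Elm Rd"},
    {"dvd_id": 1, "member_id": 300, "member_address": "12 Oak St"},
]
attributes = ["dvd_id", "member_id", "member_address"]

print(holds(rentals, ["member_id"], ["member_address"]))
print(candidate_keys(rentals, attributes))
```

On this sample, Member Address depends on Member ID alone, so its dependency on {DVD ID, Member ID} is not a full functional dependency.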

History

Edgar F. Codd first proposed the process of normalization and what came to be known as the first normal form:

There is, in fact, a very simple elimination[1] procedure which we shall call normalization. Through decomposition non-simple domains are replaced by "domains whose elements are atomic (non-decomposable) values."

Edgar F. Codd, A Relational Model of Data for Large Shared Data Banks[2]

In his paper, Edgar F. Codd used the term "non-simple" domains to describe a heterogeneous data structure, but later researchers would refer to such a structure as an abstract data type.

Normal forms

The normal forms (abbrev. NF) of relational database theory provide criteria for determining a table's degree of vulnerability to logical inconsistencies and anomalies. The higher the normal form applicable to a table, the less vulnerable it is to such inconsistencies and anomalies. Each table has a "highest normal form" (HNF): by definition, a table always meets the requirements of its HNF and of all normal forms lower than its HNF; also by definition, a table fails to meet the requirements of any normal form higher than its HNF. The normal forms are applicable to individual tables; to say that an entire database is in normal form n is to say that all of its tables are in normal form n.

Newcomers to database design sometimes suppose that normalization proceeds in an iterative fashion, i.e. a 1NF design is first normalized to 2NF, then to 3NF, and so on. This is not an accurate description of how normalization typically works. A sensibly designed table is likely to be in 3NF on the first attempt; furthermore, if it is 3NF, it is overwhelmingly likely to have an HNF of 5NF. Achieving the "higher" normal forms (above 3NF) does not usually require an extra expenditure of effort on the part of the designer, because 3NF tables usually need no modification to meet the requirements of these higher normal forms.

Edgar F. Codd originally defined the first three normal forms (1NF, 2NF, and 3NF). These normal forms have been summarized as requiring that all non-key attributes be dependent on "the key, the whole key and nothing but the key". The fourth and fifth normal forms (4NF and 5NF) deal specifically with the representation of many-to-many and one-to-many relationships among attributes. Sixth normal form (6NF) incorporates considerations relevant to temporal databases.
First normal form

The criteria for first normal form (1NF) are:

A table must be guaranteed not to have any duplicate records; therefore it must have at least one candidate key.

There must be no repeating groups, i.e. no attributes which occur a different number of times on different records. For example, suppose that an employee can have multiple skills: a possible representation of employees' skills is {Employee ID, Skill1, Skill2, Skill3 ...}, where {Employee ID} is the unique identifier for a record. This representation would not be in 1NF.

Note that all relations are in 1NF. The question of whether a given representation is in 1NF is equivalent to the question of whether it is a relation.

Second normal form

The criteria for second normal form (2NF) are:

The table must be in 1NF.

None of the non-prime attributes of the table are functionally dependent on a part (proper subset) of a candidate key; in other words, all functional dependencies of non-prime attributes on candidate keys are full functional dependencies. For example, consider a "Department Members" table whose attributes are Department ID, Employee ID, and Employee Date of Birth; and suppose that an employee works in one or more departments. The combination of Department ID and Employee ID uniquely identifies records within the table. Given that Employee Date of Birth depends on only one of those attributes (namely, Employee ID), the table is not in 2NF.

Note that if none of a 1NF table's candidate keys are composite (i.e. every candidate key consists of just one attribute), then we can say immediately that the table is in 2NF.

Third normal form

The criteria for third normal form (3NF) are:

The table must be in 2NF.

There are no non-trivial functional dependencies between non-prime attributes. A violation of 3NF would mean that at least one non-prime attribute is only indirectly dependent (transitively dependent) on a candidate key, by virtue of being functionally dependent on another non-prime attribute. For example, consider a "Departments" table whose attributes are Department ID, Department Name, Manager ID, and Manager Hire Date; and suppose that each manager can manage one or more departments. {Department ID} is a candidate key. Although Manager Hire Date is functionally dependent on {Department ID}, it is also functionally dependent on the non-prime attribute Manager ID. This means the table is not in 3NF.

Boyce-Codd normal form

The criteria for Boyce-Codd normal form (BCNF) are:

The table must be in 3NF.

Every non-trivial functional dependency must be a dependency on a superkey.

Fourth normal form

The criteria for fourth normal form (4NF) are:

The table must be in BCNF.

There must be no non-trivial multivalued dependencies on something other than a superkey. A BCNF table is said to be in 4NF if and only if all of its multivalued dependencies are functional dependencies.

Fifth normal form

The criteria for fifth normal form (5NF, also known as PJ/NF) are:

The table must be in 4NF.

There must be no non-trivial join dependencies that do not follow from the key constraints. A 4NF table is said to be in 5NF if and only if every join dependency in it is implied by the candidate keys.

Domain/key normal form

Domain/key normal form (or DKNF) requires that a table not be subject to any constraints other than domain constraints and key constraints.

Sixth normal form

This normal form was, as of 2005, only recently proposed: the sixth normal form (6NF) was only defined when extending the relational model to take into account the temporal dimension. Unfortunately, most SQL technologies as of 2005 do not take this work into account, and most temporal extensions to SQL are not relational. See work by Date, Darwen and Lorentzos[3] for a relational temporal extension, or see TSQL2 for a different approach.

Example Of The Process

The following example illustrates how a database designer might employ their knowledge of the normal forms to make progressive improvements to an initially unnormalized database design. The example is somewhat contrived: in practice, few designs lend themselves to being normalized in strict stages in which the HNF increases at each stage.

The database in the example captures information about the suppliers with which various companies' divisions have relationships; more specifically, it captures information about the types of parts which each division of each company sources from its suppliers.

Starting Point

Information has been presented initially in a way that does not even meet 1NF. Every record is for a particular Company/Division combination: for each of these combinations, repeating groups of part- and supplier-related information occur. 1NF does not permit repeating groups.

Suppliers and Parts By Company Division

Company | Company Founder | Company Logo | Division | Part Type | Supplier | Supplier Country | Supplier Continent
Allied Clock and Watch | Horace Washington | Sundial | Clocks | Spring; Pendulum; Spring; Toothed Wheel | Tensile Globodynamics; Tensile Globodynamics; Pieza de Acero; Pieza de Acero | USA; USA; Mexico; Mexico | N. Amer.; N. Amer.; N. Amer.; N. Amer.
Allied Clock and Watch | Horace Washington | Sundial | Watches | Quartz Crystal; Tuning Fork; Battery | Microflux; Microflux; Dakota Electrics | Belgium; Belgium; USA | Europe; Europe; N. Amer.
Global Robot | Nils Neumann | Gearbox | Industrial Robots | Flywheel; Axle; Axle; Mechanical Arm | Wheels 4 Less; Wheels 4 Less; TransEuropa; TransEuropa | USA; USA; Italy; Italy | N. Amer.; N. Amer.; Europe; Europe
Global Robot | Nils Neumann | Gearbox | Domestic Robots | Artificial Brain; Artificial Brain; Metal Housing; Backplate | Prometheus Labs; Frankenstein Labs; Pieza de Acero; Pieza de Acero | Luxembourg; Germany; Mexico; Mexico | Europe; Europe; N. Amer.; N. Amer.

1NF

We eliminate the repeating groups by ensuring that each group appears on its own record. The unique identifier for a record is now {Company, Division, Part Type, Supplier}.

Suppliers and Parts By Company Division

Company | Company Founder | Company Logo | Division | Part Type | Supplier | Supplier Country | Supplier Continent
Allied Clock and Watch | Horace Washington | Sundial | Clocks | Spring | Tensile Globodynamics | USA | N. Amer.
Allied Clock and Watch | Horace Washington | Sundial | Clocks | Pendulum | Tensile Globodynamics | USA | N. Amer.
Allied Clock and Watch | Horace Washington | Sundial | Clocks | Spring | Pieza de Acero | Mexico | N. Amer.
Allied Clock and Watch | Horace Washington | Sundial | Clocks | Toothed Wheel | Pieza de Acero | Mexico | N. Amer.
Allied Clock and Watch | Horace Washington | Sundial | Watches | Quartz Crystal | Microflux | Belgium | Europe
Allied Clock and Watch | Horace Washington | Sundial | Watches | Tuning Fork | Microflux | Belgium | Europe
Allied Clock and Watch | Horace Washington | Sundial | Watches | Battery | Dakota Electrics | USA | N. Amer.
Global Robot | Nils Neumann | Gearbox | Industrial Robots | Flywheel | Wheels 4 Less | USA | N. Amer.
Global Robot | Nils Neumann | Gearbox | Industrial Robots | Axle | Wheels 4 Less | USA | N. Amer.
Global Robot | Nils Neumann | Gearbox | Industrial Robots | Axle | TransEuropa | Italy | Europe
Global Robot | Nils Neumann | Gearbox | Industrial Robots | Mechanical Arm | TransEuropa | Italy | Europe
Global Robot | Nils Neumann | Gearbox | Domestic Robots | Artificial Brain | Prometheus Labs | Luxembourg | Europe
Global Robot | Nils Neumann | Gearbox | Domestic Robots | Artificial Brain | Frankenstein Labs | Germany | Europe
Global Robot | Nils Neumann | Gearbox | Domestic Robots | Metal Housing | Pieza de Acero | Mexico | N. Amer.
Global Robot | Nils Neumann | Gearbox | Domestic Robots | Backplate | Pieza de Acero | Mexico | N. Amer.
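The 1NF step above amounts to flattening each record's repeating group into one row per group member. The following Python sketch illustrates the idea on two of the records; the function name to_1nf, the "parts" field, and the dictionary layout are assumptions made for illustration, and the founder, logo, and country columns are left out for brevity.

```python
def to_1nf(records):
    """Emit one flat row per (part type, supplier) pair in each record's repeating group."""
    rows = []
    for record in records:
        # Everything except the repeating group is copied onto each output row.
        fixed = {k: v for k, v in record.items() if k != "parts"}
        for part_type, supplier in record["parts"]:
            rows.append({**fixed, "part_type": part_type, "supplier": supplier})
    return rows

unnormalized = [
    {
        "company": "Allied Clock and Watch",
        "division": "Clocks",
        "parts": [
            ("Spring", "Tensile Globodynamics"),
            ("Pendulum", "Tensile Globodynamics"),
            ("Spring", "Pieza de Acero"),
            ("Toothed Wheel", "Pieza de Acero"),
        ],
    },
    {
        "company": "Allied Clock and Watch",
        "division": "Watches",
        "parts": [
            ("Quartz Crystal", "Microflux"),
            ("Tuning Fork", "Microflux"),
            ("Battery", "Dakota Electrics"),
        ],
    },
]

flat = to_1nf(unnormalized)
# {Company, Division, Part Type, Supplier} should identify each row uniquely.
identifiers = {
    (r["company"], r["division"], r["part_type"], r["supplier"]) for r in flat
}
print(len(flat), len(identifiers))
```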

2NF

One problem with the design at this stage is that Company Founder and Company Logo details for a given company may appear redundantly on more than one record; so may Supplier Countries and Continents. These phenomena arise from the part-key dependencies of a) the Company Founder and Company Logo attributes on Company, and b) the Supplier Country and Supplier Continent attributes on Supplier. 2NF does not permit part-key dependencies. We correct the problem by splitting out the Company Founder and Company Logo details into their own table, called Companies, as well as splitting out the Supplier Country and Supplier Continent details into their own table, called Suppliers.

Suppliers and Parts By Company Division

Company | Division | Part Type | Supplier
Allied Clock and Watch | Clocks | Spring | Tensile Globodynamics
Allied Clock and Watch | Clocks | Pendulum | Tensile Globodynamics
Allied Clock and Watch | Clocks | Spring | Pieza de Acero
Allied Clock and Watch | Clocks | Toothed Wheel | Pieza de Acero
Allied Clock and Watch | Watches | Quartz Crystal | Microflux
Allied Clock and Watch | Watches | Tuning Fork | Microflux
Allied Clock and Watch | Watches | Battery | Dakota Electrics
Global Robot | Industrial Robots | Flywheel | Wheels 4 Less
Global Robot | Industrial Robots | Axle | Wheels 4 Less
Global Robot | Industrial Robots | Axle | TransEuropa
Global Robot | Industrial Robots | Mechanical Arm | TransEuropa
Global Robot | Domestic Robots | Artificial Brain | Prometheus Labs
Global Robot | Domestic Robots | Artificial Brain | Frankenstein Labs
Global Robot | Domestic Robots | Metal Housing | Pieza de Acero
Global Robot | Domestic Robots | Backplate | Pieza de Acero

Companies

Company | Company Founder | Company Logo
Allied Clock and Watch | Horace Washington | Sundial
Global Robot | Nils Neumann | Gearbox

Suppliers

Supplier | Supplier Country | Supplier Continent
Tensile Globodynamics | USA | N. Amer.
Pieza de Acero | Mexico | N. Amer.
Microflux | Belgium | Europe
Dakota Electrics | USA | N. Amer.
Wheels 4 Less | USA | N. Amer.
TransEuropa | Italy | Europe
Prometheus Labs | Luxembourg | Europe
Frankenstein Labs | Germany | Europe
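The 2NF decomposition above can be viewed as a set of projections whose natural join reproduces the original table. The Python sketch below checks this on a four-row excerpt; the helper names project, natural_join, and as_set are hypothetical, and the logo and continent columns are omitted to keep the example short.

```python
def project(rows, attrs):
    """Project rows onto attrs, dropping duplicates but keeping first-seen order."""
    seen, out = set(), []
    for row in rows:
        values = tuple(row[a] for a in attrs)
        if values not in seen:
            seen.add(values)
            out.append(dict(zip(attrs, values)))
    return out

def natural_join(left, right):
    """Join two non-empty tables on the attributes they have in common."""
    common = [a for a in left[0] if a in right[0]]
    return [
        {**l, **r}
        for l in left
        for r in right
        if all(l[a] == r[a] for a in common)
    ]

def as_set(rows):
    """Order-insensitive view of a table, for comparison."""
    return {tuple(sorted(row.items())) for row in rows}

ATTRS = ["company", "company_founder", "division", "part_type",
         "supplier", "supplier_country"]
one_nf = [dict(zip(ATTRS, t)) for t in [
    ("Allied Clock and Watch", "Horace Washington", "Clocks", "Spring",
     "Tensile Globodynamics", "USA"),
    ("Allied Clock and Watch", "Horace Washington", "Clocks", "Spring",
     "Pieza de Acero", "Mexico"),
    ("Global Robot", "Nils Neumann", "Industrial Robots", "Axle",
     "Wheels 4 Less", "USA"),
    ("Global Robot", "Nils Neumann", "Domestic Robots", "Backplate",
     "Pieza de Acero", "Mexico"),
]]

# Split the part-key dependencies out into their own tables.
main = project(one_nf, ["company", "division", "part_type", "supplier"])
companies = project(one_nf, ["company", "company_founder"])
suppliers = project(one_nf, ["supplier", "supplier_country"])

# Joining the three tables back together loses nothing and adds nothing.
rejoined = natural_join(natural_join(main, companies), suppliers)
print(as_set(rejoined) == as_set(one_nf))
```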

3NF and BCNF

There is still, however, redundancy in the design. The Supplier Continent for a given Supplier Country may appear redundantly on more than one record. This phenomenon arises from the dependency of non-key attribute Supplier Continent on non-key attribute Supplier Country, and means that the design does not conform to 3NF. To achieve 3NF (and, while we are at it, BCNF), we create a separate Countries table which tells us which continent a country belongs to.
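The transitive dependency described above can be checked mechanically on the 2NF Suppliers table. This is a minimal Python sketch; the helper name determines and the tuple layout (supplier, country, continent) are assumptions for illustration.

```python
# The 2NF Suppliers table as (supplier, country, continent) tuples.
suppliers = [
    ("Tensile Globodynamics", "USA", "N. Amer."),
    ("Pieza de Acero", "Mexico", "N. Amer."),
    ("Microflux", "Belgium", "Europe"),
    ("Dakota Electrics", "USA", "N. Amer."),
    ("Wheels 4 Less", "USA", "N. Amer."),
    ("TransEuropa", "Italy", "Europe"),
    ("Prometheus Labs", "Luxembourg", "Europe"),
    ("Frankenstein Labs", "Germany", "Europe"),
]

def determines(rows, i, j):
    """True if column i functionally determines column j in these rows."""
    mapping = {}
    for row in rows:
        if mapping.setdefault(row[i], row[j]) != row[j]:
            return False
    return True

# Country -> Continent holds between two non-key columns: a 3NF violation.
print(determines(suppliers, 1, 2))

# Decompose: Suppliers keeps the country, Countries holds the continent once.
suppliers_3nf = [(s, country) for s, country, _ in suppliers]
countries = dict((country, continent) for _, country, continent in suppliers)

# Looking the continent up again rebuilds the original table.
rebuilt = [(s, c, countries[c]) for s, c in suppliers_3nf]
print(rebuilt == suppliers)
```

After the split, "USA is in N. Amer." is stored once rather than three times.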

Suppliers and Parts By Company Division

Company | Division | Part Type | Supplier
Allied Clock and Watch | Clocks | Spring | Tensile Globodynamics
Allied Clock and Watch | Clocks | Pendulum | Tensile Globodynamics
Allied Clock and Watch | Clocks | Spring | Pieza de Acero
Allied Clock and Watch | Clocks | Toothed Wheel | Pieza de Acero
Allied Clock and Watch | Watches | Quartz Crystal | Microflux
Allied Clock and Watch | Watches | Tuning Fork | Microflux
Allied Clock and Watch | Watches | Battery | Dakota Electrics
Global Robot | Industrial Robots | Flywheel | Wheels 4 Less
Global Robot | Industrial Robots | Axle | Wheels 4 Less
Global Robot | Industrial Robots | Axle | TransEuropa
Global Robot | Industrial Robots | Mechanical Arm | TransEuropa
Global Robot | Domestic Robots | Artificial Brain | Prometheus Labs
Global Robot | Domestic Robots | Artificial Brain | Frankenstein Labs
Global Robot | Domestic Robots | Metal Housing | Pieza de Acero
Global Robot | Domestic Robots | Backplate | Pieza de Acero

Suppliers

Supplier | Supplier Country
Tensile Globodynamics | USA
Pieza de Acero | Mexico
Microflux | Belgium
Dakota Electrics | USA
Wheels 4 Less | USA
TransEuropa | Italy
Prometheus Labs | Luxembourg
Frankenstein Labs | Germany

Companies

Company | Company Founder | Company Logo
Allied Clock and Watch | Horace Washington | Sundial
Global Robot | Nils Neumann | Gearbox

Countries

Country | Continent
USA | N. Amer.
Mexico | N. Amer.
Belgium | Europe
Italy | Europe
Luxembourg | Europe
Germany | Europe

4NF

What happens if a company has more than one founder or more than one logo? (Let us assume for the sake of the example that both of these things may happen.) One way of handling the situation would be to alter the primary key of our Companies table to {Company, Company Founder, Company Logo}. Representing multiple founders and multiple logos then becomes possible, but at the price of redundancy:

Companies

Company | Company Founder | Company Logo
Allied Clock and Watch | Horace Washington | Sundial
Global Robot | Nils Neumann | Gearbox
International Broom | Gareth Patterson | Whirlwind
International Broom | Sandra Patterson | Whirlwind
International Broom | Gareth Patterson | Sweeper
International Broom | Sandra Patterson | Sweeper

This type of redundancy reflects the fact that the design does not conform to 4NF. We correct the design by separating facts about founders from facts about logos.

Suppliers and Parts By Company Division

Company | Division | Part Type | Supplier
Allied Clock and Watch | Clocks | Spring | Tensile Globodynamics
Allied Clock and Watch | Clocks | Pendulum | Tensile Globodynamics
Allied Clock and Watch | Clocks | Spring | Pieza de Acero
Allied Clock and Watch | Clocks | Toothed Wheel | Pieza de Acero
Allied Clock and Watch | Watches | Quartz Crystal | Microflux
Allied Clock and Watch | Watches | Tuning Fork | Microflux
Allied Clock and Watch | Watches | Battery | Dakota Electrics
Global Robot | Industrial Robots | Flywheel | Wheels 4 Less
Global Robot | Industrial Robots | Axle | Wheels 4 Less
Global Robot | Industrial Robots | Axle | TransEuropa
Global Robot | Industrial Robots | Mechanical Arm | TransEuropa
Global Robot | Domestic Robots | Artificial Brain | Prometheus Labs
Global Robot | Domestic Robots | Artificial Brain | Frankenstein Labs
Global Robot | Domestic Robots | Metal Housing | Pieza de Acero
Global Robot | Domestic Robots | Backplate | Pieza de Acero

Companies

Company
Allied Clock and Watch
Global Robot
International Broom

Company Logos

Company | Company Logo
Allied Clock and Watch | Sundial
Global Robot | Gearbox
International Broom | Whirlwind
International Broom | Sweeper

Company Founders

Company | Company Founder
Allied Clock and Watch | Horace Washington
Global Robot | Nils Neumann
International Broom | Gareth Patterson
International Broom | Sandra Patterson

Suppliers

Supplier | Supplier Country
Tensile Globodynamics | USA
Pieza de Acero | Mexico
Microflux | Belgium
Dakota Electrics | USA
Wheels 4 Less | USA
TransEuropa | Italy
Prometheus Labs | Luxembourg
Frankenstein Labs | Germany

Countries

Country | Continent
USA | N. Amer.
Mexico | N. Amer.
Belgium | Europe
Italy | Europe
Luxembourg | Europe
Germany | Europe
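The 4NF step above can be sketched in Python: because a company's founders and logos are independent of each other, the combined table is exactly the per-company cross product of the two separate tables. The variable names are illustrative assumptions; the data is taken from the combined Companies table shown in the 4NF discussion.

```python
# The combined Companies table with key {Company, Company Founder, Company Logo}.
companies = [
    ("Allied Clock and Watch", "Horace Washington", "Sundial"),
    ("Global Robot", "Nils Neumann", "Gearbox"),
    ("International Broom", "Gareth Patterson", "Whirlwind"),
    ("International Broom", "Sandra Patterson", "Whirlwind"),
    ("International Broom", "Gareth Patterson", "Sweeper"),
    ("International Broom", "Sandra Patterson", "Sweeper"),
]

# Decompose into two independent tables.
founders = sorted({(company, founder) for company, founder, _ in companies})
logos = sorted({(company, logo) for company, _, logo in companies})

# Joining on Company pairs every founder with every logo of the same company.
rejoined = {
    (company, founder, logo)
    for company, founder in founders
    for other, logo in logos
    if company == other
}

print(len(founders), len(logos), rejoined == set(companies))
```

With two founders and two logos the saving is small, but the combined table grows multiplicatively: a third logo for International Broom would add two rows to it and only one row to Company Logos.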

5NF

We know that the Clocks division of Allied Clock and Watch relies upon its suppliers to provide springs, pendulums, and toothed wheels. We also know that the Clocks division deals with suppliers Tensile Globodynamics and Pieza de Acero. Let us suppose for the sake of the example that the following rule applies: if a supplier that a division deals with offers a part that the division needs, the division will always purchase it. If, for example, Tensile Globodynamics start producing Toothed Wheels, then Allied Clock and Watch will start purchasing them. This rule leads to redundancy in our design as it stands, causing it to fall short of 5NF. We correct the design by recording part-types-by-company-division separately from suppliers-by-company-division, and adding a further table that provides information as to which suppliers offer which parts.

Part Types By Company Division

Company | Division | Part Type
Allied Clock and Watch | Clocks | Spring
Allied Clock and Watch | Clocks | Pendulum
Allied Clock and Watch | Clocks | Toothed Wheel
Allied Clock and Watch | Watches | Quartz Crystal
Allied Clock and Watch | Watches | Tuning Fork
Allied Clock and Watch | Watches | Battery
Global Robot | Industrial Robots | Flywheel
Global Robot | Industrial Robots | Axle
Global Robot | Industrial Robots | Mechanical Arm
Global Robot | Domestic Robots | Artificial Brain
Global Robot | Domestic Robots | Metal Housing
Global Robot | Domestic Robots | Backplate

Suppliers By Company Division

Company | Division | Supplier
Allied Clock and Watch | Clocks | Tensile Globodynamics
Allied Clock and Watch | Clocks | Pieza de Acero
Allied Clock and Watch | Watches | Microflux
Allied Clock and Watch | Watches | Dakota Electrics
Global Robot | Industrial Robots | Wheels 4 Less
Global Robot | Industrial Robots | TransEuropa
Global Robot | Domestic Robots | Prometheus Labs
Global Robot | Domestic Robots | Frankenstein Labs
Global Robot | Domestic Robots | Pieza de Acero

Parts By Supplier

Part Type | Supplier
Spring | Tensile Globodynamics
Pendulum | Tensile Globodynamics
Spring | Pieza de Acero
Toothed Wheel | Pieza de Acero
Quartz Crystal | Microflux
Tuning Fork | Microflux
Battery | Dakota Electrics
Flywheel | Wheels 4 Less
Axle | Wheels 4 Less
Axle | TransEuropa
Mechanical Arm | TransEuropa
Artificial Brain | Prometheus Labs
Artificial Brain | Frankenstein Labs
Metal Housing | Pieza de Acero
Backplate | Pieza de Acero

Companies

Company | Company Logo
Allied Clock and Watch | Sundial
Global Robot | Gearbox

Company Founders

Company | Company Founder
Allied Clock and Watch | Horace Washington
Global Robot | Nils Neumann
International Broom | Gareth Patterson
International Broom | Sandra Patterson

Suppliers

Supplier | Supplier Country
Tensile Globodynamics | USA
Pieza de Acero | Mexico
Microflux | Belgium
Dakota Electrics | USA
Wheels 4 Less | USA
TransEuropa | Italy
Prometheus Labs | Luxembourg
Frankenstein Labs | Germany

Countries

Country | Continent
USA | N. Amer.
Mexico | N. Amer.
Belgium | Europe
Italy | Europe
Luxembourg | Europe
Germany | Europe
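The 5NF decomposition above relies on a three-way join: a (division, part type, supplier) combination exists exactly when the division needs the part, the division deals with the supplier, and the supplier offers the part. The Python sketch below illustrates this on an excerpt of the data; the function name join3 and the variable names are hypothetical, and the Company column is dropped for brevity.

```python
# Excerpt of the pre-5NF table as (division, part type, supplier) rows.
original = {
    ("Clocks", "Spring", "Tensile Globodynamics"),
    ("Clocks", "Pendulum", "Tensile Globodynamics"),
    ("Clocks", "Spring", "Pieza de Acero"),
    ("Clocks", "Toothed Wheel", "Pieza de Acero"),
    ("Domestic Robots", "Metal Housing", "Pieza de Acero"),
    ("Domestic Robots", "Backplate", "Pieza de Acero"),
}

# The three projections used by the 5NF design.
div_parts = {(d, p) for d, p, _ in original}
div_suppliers = {(d, s) for d, _, s in original}
part_suppliers = {(p, s) for _, p, s in original}

def join3(div_parts, div_suppliers, part_suppliers):
    """Keep each (division, part, supplier) that all three tables support."""
    return {
        (d, p, s)
        for d, p in div_parts
        for d2, s in div_suppliers
        if d == d2 and (p, s) in part_suppliers
    }

# Joining only two of the projections invents combinations that never existed.
pairwise = {(d, p, s) for d, p in div_parts for d2, s in div_suppliers if d == d2}
print(("Clocks", "Pendulum", "Pieza de Acero") in pairwise)

# The three-way join recovers the original rows exactly.
print(join3(div_parts, div_suppliers, part_suppliers) == original)

# If Tensile Globodynamics starts offering Toothed Wheels, one new fact suffices.
updated = part_suppliers | {("Toothed Wheel", "Tensile Globodynamics")}
new_rows = join3(div_parts, div_suppliers, updated) - original
print(new_rows)
```

Under the stated purchasing rule, the new purchase by the Clocks division follows from a single inserted row in Parts By Supplier rather than from a separately maintained row in the combined table.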

Denormalization

Databases intended for Online Transaction Processing (OLTP) are typically more normalized than databases intended for Online Analytical Processing (OLAP). OLTP applications are characterized by a high volume of small transactions, such as updating a sales record at a supermarket checkout counter. The expectation is that each transaction will leave the database in a consistent state. By contrast, databases intended for OLAP operations are primarily "read only" databases. OLAP applications tend to extract historical data that has accumulated over a long period of time. For such databases, redundant or "denormalized" data may facilitate Business Intelligence applications. Specifically, dimensional tables in a star schema often contain denormalized data. The denormalized or redundant data must be carefully controlled during ETL processing, and users should not be permitted to see the data until it is in a consistent state. The normalized alternative to the star schema is the snowflake schema.

Denormalization is also used to improve performance on smaller computers, as in computerized cash registers. Since these use the data for look-up only (e.g. price lookups), no changes are made to the data and a swift response is crucial.

Non-first normal form (NF²)

In recognition that denormalization can be deliberate and useful, the non-first normal form is a definition of database designs which do not conform to the first normal form, by allowing "sets and sets of sets to be attribute domains" (Schek 1982). This extension introduces hierarchies in relations. Consider the following table:

Non-First Normal Form

Person | Favorite Colors
Bob | blue, red
Jane | green, yellow, red

Assume a person has several favorite colors. The favorite colors of each person form a set of colors, which is what the given table models. To transform this NF² table into a 1NF table, an "unnest" operator is required, which extends the relational algebra of the higher normal forms. The reverse operator is called "nest"; it is not always the mathematical inverse of "unnest", although "unnest" is the mathematical inverse of "nest". A further constraint is that the operators be bijective, which is covered by the Partitioned Normal Form (PNF).
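The asymmetry between the two operators can be seen in a short Python sketch. The functions nest and unnest below are simplified stand-ins for the algebraic operators, written for a two-column table only; their names and signatures are assumptions made for illustration.

```python
def unnest(table):
    """Flatten (person, set of colors) rows into one (person, color) row each."""
    return [(person, color) for person, colors in table for color in sorted(colors)]

def nest(rows):
    """Group (person, color) rows into one (person, set of colors) row per person."""
    grouped = {}
    for person, color in rows:
        grouped.setdefault(person, set()).add(color)
    return [(person, frozenset(colors)) for person, colors in grouped.items()]

nf2 = [
    ("Bob", frozenset({"blue", "red"})),
    ("Jane", frozenset({"green", "yellow", "red"})),
]

flat = unnest(nf2)
print(flat)

# Unnesting a nested table always gives back the flat rows it was built from.
print(unnest(nest(flat)) == flat)

# Nesting an unnested table does not always give back the original: here Bob's
# colors start out split across two rows and come back merged into one.
split = [("Bob", frozenset({"blue"})), ("Bob", frozenset({"red"}))]
print(nest(unnest(split)) == split)
```

The table nf2 has one row per person, so it survives the round trip unchanged; the table split does not, which is the kind of case the Partitioned Normal Form rules out.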

Further reading

Litt's Tips: Normalization
Date, C. J., & Lorentzos, N., & Darwen, H. (2002). Temporal Data & the Relational Model (1st ed.). Morgan Kaufmann. ISBN 1-55860-855-9.
Date, C. J. (1999). An Introduction to Database Systems (8th ed.). Addison-Wesley Longman. ISBN 0-321-19784-4.
Kent, W. (1983). A Simple Guide to Five Normal Forms in Relational Database Theory. Communications of the ACM, vol. 26, pp. 120-125.
Date, C. J., & Darwen, H., & Pascal, F. Database Debunkings.
Schek, H.-J., & Pistor, P. Data Structures for an Integrated Data Base Management and Information Retrieval System.

References

1. His term eliminate is misleading, as nothing is "lost" in normalization. He probably used eliminate in a mathematical sense to mean elimination of complexity.
2. Codd, Edgar F. (June 1970). "A Relational Model of Data for Large Shared Data Banks". Communications of the ACM 13 (6): 377-387.
3. DBDebunk.

See also

Aspect (computer science)
Cross-cutting concern
Inheritance semantics
Functional normalization
Orthogonalization
Refactoring

External links

Database Normalization Basics by Mike Chapple (About.com)
Database Normalization Intro, Part 2
An Introduction to Database Normalization by Mike Hillyer
Normalization by ITS, University of Texas
Rules of Data Normalization by Data Model.org
A tutorial on the first 3 normal forms by Fred Coulson
Free PDF poster available by Marc Rettig
Description of the database normalization basics by Microsoft

Database Normalization Basics


If you've been working with databases for a while, chances are you've heard the term normalization. Perhaps someone's asked you "Is that database normalized?" or "Is that in BCNF?" All too often, the reply is "Uh, yeah." Normalization is often brushed aside as a luxury that only academics have time for. However, knowing the principles of normalization and applying them to your daily database design tasks really isn't all that complicated and it could drastically improve the performance of your DBMS. In this article, we'll introduce the concept of normalization and take a brief look at the most common normal forms. Future articles will provide in-depth explorations of the normalization process.

So, what is normalization? Basically, it's the process of efficiently organizing data in a database. There are two goals of the normalization process: eliminate redundant data (for example, storing the same data in more than one table) and ensure data dependencies make sense (only storing related data in a table). Both of these are worthy goals as they reduce the amount of space a database consumes and ensure that data is logically stored.

The database community has developed a series of guidelines for ensuring that databases are normalized. These are referred to as normal forms and are numbered from one (the lowest form of normalization, referred to as first normal form or 1NF) through five (fifth normal form or 5NF). In practical applications, you'll often see 1NF, 2NF, and 3NF along with the occasional 4NF. Fifth normal form is very rarely seen and won't be discussed in this article.

Before we begin our discussion of the normal forms, it's important to point out that they are guidelines and guidelines only. Occasionally, it becomes necessary to stray from them to meet practical business requirements. However, when variations take place, it's extremely important to evaluate any possible ramifications they could have on your system and account for possible inconsistencies. That said, let's explore the normal forms.

First normal form (1NF) sets the very basic rules for an organized database:
Eliminate duplicative columns from the same table.
Create separate tables for each group of related data and identify each row with a unique column or set of columns (the primary key).

Second normal form (2NF) further addresses the concept of removing duplicative data:
Meet all the requirements of the first normal form.
Remove subsets of data that apply to multiple rows of a table and place them in separate tables.
Create relationships between these new tables and their predecessors through the use of foreign keys.

Third normal form (3NF) goes one large step further:
Meet all the requirements of the second normal form.
Remove columns that are not dependent upon the primary key.

Finally, fourth normal form (4NF) has one additional requirement:
Meet all the requirements of the third normal form.
A relation is in 4NF if it has no non-trivial multi-valued dependencies other than those on a superkey.

Remember, these normalization guidelines are cumulative. For a database to be in 2NF, it must first fulfill all the criteria of a 1NF database.
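The 2NF guideline about linking the new tables to their predecessors through foreign keys can be sketched with Python's built-in sqlite3 module. The table and column names below are illustrative assumptions borrowed from the suppliers example earlier in this document; note that SQLite only enforces foreign keys once the foreign_keys pragma is switched on for the connection.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite leaves foreign key enforcement off unless this pragma is set.
conn.execute("PRAGMA foreign_keys = ON")

conn.execute(
    "CREATE TABLE suppliers ("
    " supplier TEXT PRIMARY KEY,"
    " supplier_country TEXT NOT NULL)"
)
conn.execute(
    "CREATE TABLE parts_by_supplier ("
    " part_type TEXT,"
    " supplier TEXT REFERENCES suppliers(supplier),"
    " PRIMARY KEY (part_type, supplier))"
)

conn.execute("INSERT INTO suppliers VALUES ('Microflux', 'Belgium')")
conn.execute("INSERT INTO parts_by_supplier VALUES ('Quartz Crystal', 'Microflux')")

# A row that points at a supplier the database does not know about is refused.
try:
    conn.execute("INSERT INTO parts_by_supplier VALUES ('Battery', 'Unknown Supplier')")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True

row_count = conn.execute("SELECT COUNT(*) FROM parts_by_supplier").fetchone()[0]
print(rejected, row_count)
```

The foreign key keeps the split tables consistent with each other, which is what makes it safe to store the supplier's country in one place only.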
