• Each row in the table represents a court booking at a tennis club that has one hard court
(Court 1) and one grass court (Court 2)
• A booking is defined by its Court and the period for which the Court is reserved
• Additionally, each booking has a Rate Type associated with it. There are four distinct rate
types:
• SAVER, for Court 1 bookings made by members
• STANDARD, for Court 1 bookings made by non-members
• PREMIUM-A, for Court 2 bookings made by members
• PREMIUM-B, for Court 2 bookings made by non-members
The table's candidate keys are:
• {Court, Start Time}
• {Court, End Time}
• {Rate Type, Start Time}
• {Rate Type, End Time}
Recall that 2NF prohibits partial functional dependencies of non-prime attributes on candidate
keys, and that 3NF prohibits transitive functional dependencies of non-prime attributes on
candidate keys. In the Today's Court Bookings table, there are no non-prime attributes: that is, all
attributes belong to candidate keys. Therefore the table adheres to both 2NF and 3NF.
The table does not adhere to BCNF. This is because of the dependency Rate Type → Court, in
which the determining attribute (Rate Type) is neither a candidate key nor a superset of a
candidate key.
Any table that falls short of BCNF will be vulnerable to logical inconsistencies. In this example,
enforcing the candidate keys will not ensure that the dependency Rate Type → Court is
respected. There is, for instance, nothing to stop us from assigning a PREMIUM-A Rate Type to
a Court 1 booking as well as a Court 2 booking: a clear contradiction, as a Rate Type should
only ever apply to a single Court.
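The gap between key enforcement and dependency enforcement can be illustrated with a minimal Python sketch. The three rows are hypothetical (the end times in particular are invented for illustration), and `key_is_unique` and `fd_holds` are helper names introduced here rather than functions from any library.

```python
from collections import defaultdict

COLUMNS = ("Court", "Start Time", "End Time", "Rate Type")
CANDIDATE_KEYS = [
    ("Court", "Start Time"),
    ("Court", "End Time"),
    ("Rate Type", "Start Time"),
    ("Rate Type", "End Time"),
]

# Hypothetical bookings; the last row puts PREMIUM-A on Court 1.
bookings = [
    (1, "09:30", "10:30", "SAVER"),
    (2, "10:00", "11:30", "PREMIUM-A"),
    (1, "11:00", "12:00", "PREMIUM-A"),
]

def project(row, attrs):
    """Return the values of the named attributes from one row."""
    return tuple(row[COLUMNS.index(a)] for a in attrs)

def key_is_unique(rows, key):
    """True if no two rows share the same values for the key attributes."""
    seen = [project(r, key) for r in rows]
    return len(seen) == len(set(seen))

def fd_holds(rows, lhs, rhs):
    """True if every lhs value is associated with exactly one rhs value."""
    images = defaultdict(set)
    for r in rows:
        images[project(r, lhs)].add(project(r, rhs))
    return all(len(v) == 1 for v in images.values())

keys_ok = all(key_is_unique(bookings, k) for k in CANDIDATE_KEYS)
fd_ok = fd_holds(bookings, ("Rate Type",), ("Court",))
print(keys_ok, fd_ok)  # True False
```

All four candidate keys accept the contradictory row, while the check for Rate Type → Court fails.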
The design can be amended so that it meets BCNF:
Rate Types
Rate Type    Court  Member Flag
SAVER        1      Yes
STANDARD     1      No
PREMIUM-A    2      Yes
PREMIUM-B    2      No
Today's Bookings
Court  Start Time
1      09:30
1      11:00
1      14:00
2      10:00
2      11:30
2      15:00
The candidate keys for the Rate Types table are {Rate Type} and {Court, Member Flag}; the
candidate keys for the Today's Bookings table are {Court, Start Time} and {Court, End Time}.
Both tables are in BCNF. Having one Rate Type associated with two different Courts is now
impossible, so the anomaly affecting the original table has been eliminated.
Achievability of BCNF
In some cases, a non-BCNF table cannot be decomposed into tables that satisfy BCNF and
preserve the dependencies that held in the original table. Beeri and Bernstein showed in 1979
that, for example, a set of functional dependencies {AB → C, C → B} cannot be represented by
a BCNF schema.[5] Thus, unlike the first three normal forms, BCNF is not always achievable.
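The {AB → C, C → B} pattern can be examined with the standard attribute-closure computation. The following is a small brute-force sketch, suitable only for tiny schemas; the function names are introduced here for illustration.

```python
from itertools import combinations

ATTRS = frozenset("ABC")
# Each dependency is a (determinant, dependent) pair of attribute sets.
FDS = [(frozenset("AB"), frozenset("C")), (frozenset("C"), frozenset("B"))]

def closure(attrs, fds):
    """Attribute closure: every attribute functionally determined by attrs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return frozenset(result)

def candidate_keys(attrs, fds):
    """Minimal attribute sets whose closure is the whole heading."""
    keys = []
    for size in range(1, len(attrs) + 1):
        for combo in combinations(sorted(attrs), size):
            candidate = frozenset(combo)
            is_superkey = closure(candidate, fds) == attrs
            if is_superkey and not any(k <= candidate for k in keys):
                keys.append(candidate)
    return keys

def bcnf_violations(attrs, fds):
    """Non-trivial dependencies whose determinant is not a superkey."""
    return [(l, r) for l, r in fds if not r <= l and closure(l, fds) != attrs]

keys = candidate_keys(ATTRS, FDS)
violations = bcnf_violations(ATTRS, FDS)
print(sorted("".join(sorted(k)) for k in keys))  # ['AB', 'AC']
print(len(violations))                           # 1, namely C -> B
```

The candidate keys are {A, B} and {A, C}, so every attribute is prime, yet C → B has a determinant that is not a superkey.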
Consider the following non-BCNF table whose functional dependencies follow the {AB → C, C
→ B} pattern:
Nearest Shops
Person    Shop Type    Nearest Shop
Davidson  Optician     Eagle Eye
Davidson  Hairdresser  Snippets
Wright    Bookshop     Merlin Books
Fuller    Bakery       Doughy's
Fuller    Hairdresser  Sweeney Todd's
Fuller    Optician     Eagle Eye
For each Person / Shop Type combination, the table tells us which shop of this type is
geographically nearest to the person's home.
The candidate keys of the table are:
• {Person, Shop Type}
• {Person, Nearest Shop}
Because all three attributes are prime attributes (i.e. belong to candidate keys), the table is in
3NF. The table is not in BCNF, however, as the Shop Type attribute is functionally dependent on
a non-superkey: Nearest Shop.
The violation of BCNF means that the table is subject to anomalies. For example, Eagle Eye
might have its Shop Type changed to "Optometrist" on its "Fuller" record while retaining the
Shop Type "Optician" on its "Davidson" record. This would imply contradictory answers to the
question: "What is Eagle Eye's Shop Type?" Holding each shop's Shop Type only once would
seem preferable, as doing so would prevent such anomalies from occurring:
Shop Near Person
Person    Shop
Davidson  Eagle Eye
Davidson  Snippets
Wright    Merlin Books
Fuller    Doughy's
Fuller    Sweeney Todd's
Fuller    Eagle Eye
Shop
Shop            Shop Type
Eagle Eye       Optician
Snippets        Hairdresser
Merlin Books    Bookshop
Doughy's        Bakery
Sweeney Todd's  Hairdresser
In this revised design, the "Shop Near Person" table has a candidate key of {Person, Shop}, and
the "Shop" table has a candidate key of {Shop}. Unfortunately, although this design adheres to
BCNF, it is unacceptable on different grounds: it allows us to record multiple shops of the same
type against the same person. In other words, its candidate keys do not guarantee that the
functional dependency {Person, Shop Type} → {Shop} will be respected.
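The weakness of the BCNF design can be shown with the two tables held as plain Python data. This is only a sketch: the extra row added at the end is a hypothetical insert, and `duplicated_types` is a helper name introduced here.

```python
# "Shop Near Person" as (Person, Shop) pairs; "Shop" as Shop -> Shop Type.
shop_near_person = {
    ("Davidson", "Eagle Eye"),
    ("Davidson", "Snippets"),
    ("Wright", "Merlin Books"),
    ("Fuller", "Doughy's"),
    ("Fuller", "Sweeney Todd's"),
    ("Fuller", "Eagle Eye"),
}
shop = {
    "Eagle Eye": "Optician",
    "Snippets": "Hairdresser",
    "Merlin Books": "Bookshop",
    "Doughy's": "Bakery",
    "Sweeney Todd's": "Hairdresser",
}

def duplicated_types(pairs, shop_types):
    """Return (person, shop type) combinations linked to more than one shop."""
    shops_by_person_type = {}
    for person, shop_name in pairs:
        key = (person, shop_types[shop_name])
        shops_by_person_type.setdefault(key, set()).add(shop_name)
    return {k: v for k, v in shops_by_person_type.items() if len(v) > 1}

before = duplicated_types(shop_near_person, shop)

# Hypothetical insert: the key {Person, Shop} accepts it, yet Davidson
# now has two "nearest" hairdressers.
shop_near_person.add(("Davidson", "Sweeney Todd's"))
after = duplicated_types(shop_near_person, shop)

print(before)  # {}
print(sorted(after))
```

Nothing in either table's candidate keys rejects the second hairdresser for Davidson.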
A design that eliminates all of these anomalies (but does not conform to BCNF) is possible. [6]
This design consists of the original "Nearest Shops" table supplemented by the "Shop" table
described above.
Nearest Shops
Person    Shop Type    Nearest Shop
Davidson  Optician     Eagle Eye
Davidson  Hairdresser  Snippets
Wright    Bookshop     Merlin Books
Fuller    Bakery       Doughy's
Fuller    Hairdresser  Sweeney Todd's
Fuller    Optician     Eagle Eye
Shop
Shop            Shop Type
Eagle Eye       Optician
Snippets        Hairdresser
Merlin Books    Bookshop
Doughy's        Bakery
Sweeney Todd's  Hairdresser
If a referential integrity constraint is defined to the effect that {Shop Type, Nearest Shop} from
the first table must refer to a {Shop Type, Shop} from the second table, then the data anomalies
described previously are prevented.
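A rough Python sketch of that referential integrity check follows. It stands in for what a DBMS would enforce with a composite foreign key; `dangling_references` is a name introduced here, and the "Optometrist" update is the hypothetical anomaly described earlier.

```python
# "Nearest Shops" rows as (Person, Shop Type, Nearest Shop).
nearest_shops = [
    ("Davidson", "Optician", "Eagle Eye"),
    ("Davidson", "Hairdresser", "Snippets"),
    ("Wright", "Bookshop", "Merlin Books"),
    ("Fuller", "Bakery", "Doughy's"),
    ("Fuller", "Hairdresser", "Sweeney Todd's"),
    ("Fuller", "Optician", "Eagle Eye"),
]
# "Shop" rows as (Shop, Shop Type).
shop = {
    ("Eagle Eye", "Optician"),
    ("Snippets", "Hairdresser"),
    ("Merlin Books", "Bookshop"),
    ("Doughy's", "Bakery"),
    ("Sweeney Todd's", "Hairdresser"),
}

def dangling_references(nearest, shops):
    """Rows whose (Nearest Shop, Shop Type) pair has no match in Shop."""
    return [row for row in nearest if (row[2], row[1]) not in shops]

valid = dangling_references(nearest_shops, shop)

# Hypothetical bad update: change the Shop Type on Fuller's record only.
attempted = list(nearest_shops)
attempted[5] = ("Fuller", "Optometrist", "Eagle Eye")
rejected = dangling_references(attempted, shop)

print(valid)     # []
print(rejected)  # [('Fuller', 'Optometrist', 'Eagle Eye')]
```

Because each shop's type is held once in the Shop table, the inconsistent row has nothing to refer to and would be refused.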
Fourth normal form (4NF) is a normal form used in database normalization. Introduced by
Ronald Fagin in 1977, 4NF is the next level of normalization after Boyce-Codd normal form
(BCNF). Whereas the second, third, and Boyce-Codd normal forms are concerned with
functional dependencies, 4NF is concerned with a more general type of dependency known as a
multivalued dependency. A table is in 4NF if and only if, for every one of its non-trivial
multivalued dependencies X →→ Y, X is a superkey—that is, X is either a candidate key or a
superset thereof.[1]
Multivalued dependencies
If the column headings in a relational database table are divided into three disjoint groupings X,
Y, and Z, then, in the context of a particular row, we can refer to the data beneath each group of
headings as x, y, and z respectively. A multivalued dependency X →→ Y signifies that if we
choose any x actually occurring in the table (call this choice xc), and compile a list of all the xcyz
combinations that occur in the table, we will find that xc is associated with the same y entries
regardless of z.
A trivial multivalued dependency X →→ Y is one in which Y consists of all columns not
belonging to X. That is, a subset of attributes in a table has a trivial multivalued dependency on
the remaining subset of attributes.
A functional dependency is a special case of multivalued dependency. In a functional
dependency X → Y, every x determines exactly one y, never more than one.
Example
Consider the following example:
Pizza Delivery Permutations
Restaurant Pizza Variety Delivery Area
A1 Pizza Thick Crust Springfield
A1 Pizza Thick Crust Shelbyville
A1 Pizza Thick Crust Capital City
A1 Pizza Stuffed Crust Springfield
A1 Pizza Stuffed Crust Shelbyville
A1 Pizza Stuffed Crust Capital City
Elite Pizza Thin Crust Capital City
Elite Pizza Stuffed Crust Capital City
Vincenzo's Pizza Thick Crust Springfield
Vincenzo's Pizza Thick Crust Shelbyville
Vincenzo's Pizza Thin Crust Springfield
Vincenzo's Pizza Thin Crust Shelbyville
Each row indicates that a given restaurant can deliver a given variety of pizza to a given area.
The table has no non-key attributes because its only key is {Restaurant, Pizza Variety, Delivery
Area}. Therefore it meets all normal forms up to BCNF. It does not, however, meet 4NF. The
problem is that the table features two non-trivial multivalued dependencies on the {Restaurant}
attribute (which is not a superkey). The dependencies are:
• {Restaurant} →→ {Pizza Variety}
• {Restaurant} →→ {Delivery Area}
These non-trivial multivalued dependencies on a non-superkey reflect the fact that the varieties
of pizza a restaurant offers are independent from the areas to which the restaurant delivers. This
state of affairs leads to redundancy in the table: for example, we are told three times that A1
Pizza offers Stuffed Crust, and if A1 Pizza start producing Cheese Crust pizzas then we will need
to add multiple rows, one for each of A1 Pizza's delivery areas. There is, moreover, nothing to
prevent us from doing this incorrectly: we might add Cheese Crust rows for all but one of A1
Pizza's delivery areas, thereby failing to respect the multivalued dependency {Restaurant} →→
{Pizza Variety}.
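The multivalued dependency and the incomplete update can be checked mechanically. The sketch below assumes the twelve rows of the table above; the two Cheese Crust rows are the hypothetical faulty insert, and the function name is introduced here.

```python
from itertools import product

# (Restaurant, Pizza Variety, Delivery Area)
rows = {
    ("A1 Pizza", "Thick Crust", "Springfield"),
    ("A1 Pizza", "Thick Crust", "Shelbyville"),
    ("A1 Pizza", "Thick Crust", "Capital City"),
    ("A1 Pizza", "Stuffed Crust", "Springfield"),
    ("A1 Pizza", "Stuffed Crust", "Shelbyville"),
    ("A1 Pizza", "Stuffed Crust", "Capital City"),
    ("Elite Pizza", "Thin Crust", "Capital City"),
    ("Elite Pizza", "Stuffed Crust", "Capital City"),
    ("Vincenzo's Pizza", "Thick Crust", "Springfield"),
    ("Vincenzo's Pizza", "Thick Crust", "Shelbyville"),
    ("Vincenzo's Pizza", "Thin Crust", "Springfield"),
    ("Vincenzo's Pizza", "Thin Crust", "Shelbyville"),
}

def restaurant_mvd_holds(table):
    """Check {Restaurant} ->> {Pizza Variety}: for each restaurant, the rows
    must be exactly the cross product of its varieties and its areas."""
    for restaurant in {r for r, _, _ in table}:
        varieties = {v for r, v, _ in table if r == restaurant}
        areas = {a for r, _, a in table if r == restaurant}
        actual = {(v, a) for r, v, a in table if r == restaurant}
        if actual != set(product(varieties, areas)):
            return False
    return True

holds_before = restaurant_mvd_holds(rows)

# Faulty update: Cheese Crust added for all but one of A1 Pizza's areas.
partial = rows | {
    ("A1 Pizza", "Cheese Crust", "Springfield"),
    ("A1 Pizza", "Cheese Crust", "Shelbyville"),
}
holds_after = restaurant_mvd_holds(partial)

print(holds_before, holds_after)  # True False
```

With only three attributes, the same check also covers {Restaurant} →→ {Delivery Area}, since the two dependencies are complementary.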
To eliminate the possibility of these anomalies, we must place the facts about varieties offered
into a different table from the facts about delivery areas, yielding two tables that are both in 4NF:
Varieties By Restaurant
Restaurant        Pizza Variety
A1 Pizza          Thick Crust
A1 Pizza          Stuffed Crust
Elite Pizza       Thin Crust
Elite Pizza       Stuffed Crust
Vincenzo's Pizza  Thick Crust
Vincenzo's Pizza  Thin Crust
Delivery Areas By Restaurant
Restaurant        Delivery Area
A1 Pizza          Springfield
A1 Pizza          Shelbyville
A1 Pizza          Capital City
Elite Pizza       Capital City
Vincenzo's Pizza  Springfield
Vincenzo's Pizza  Shelbyville
In contrast, if the pizza varieties offered by a restaurant sometimes did legitimately vary from
one delivery area to another, the original three-column table would satisfy 4NF.
Ronald Fagin demonstrated[2] that it is always possible to achieve 4NF. Rissanen's theorem is
also applicable to multivalued dependencies.
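For the pizza example, the decomposition can be confirmed to lose no information. The sketch below projects the original twelve rows onto the two 4NF tables and joins them back on Restaurant; it verifies this one example only and is not a general proof.

```python
# (Restaurant, Pizza Variety, Delivery Area) rows of the original table.
rows = {
    ("A1 Pizza", "Thick Crust", "Springfield"),
    ("A1 Pizza", "Thick Crust", "Shelbyville"),
    ("A1 Pizza", "Thick Crust", "Capital City"),
    ("A1 Pizza", "Stuffed Crust", "Springfield"),
    ("A1 Pizza", "Stuffed Crust", "Shelbyville"),
    ("A1 Pizza", "Stuffed Crust", "Capital City"),
    ("Elite Pizza", "Thin Crust", "Capital City"),
    ("Elite Pizza", "Stuffed Crust", "Capital City"),
    ("Vincenzo's Pizza", "Thick Crust", "Springfield"),
    ("Vincenzo's Pizza", "Thick Crust", "Shelbyville"),
    ("Vincenzo's Pizza", "Thin Crust", "Springfield"),
    ("Vincenzo's Pizza", "Thin Crust", "Shelbyville"),
}

# Projections corresponding to the two 4NF tables.
varieties_by_restaurant = {(r, v) for r, v, _ in rows}
areas_by_restaurant = {(r, a) for r, _, a in rows}

# Natural join of the two projections on Restaurant.
rejoined = {
    (r, v, a)
    for r, v in varieties_by_restaurant
    for r2, a in areas_by_restaurant
    if r == r2
}

lossless = rejoined == rows
print(len(varieties_by_restaurant), len(areas_by_restaurant), lossless)
# 6 6 True
```

Twelve three-column rows are replaced by six plus six two-column rows, and each variety or delivery area is stated once per restaurant.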
4NF in practice
A 1992 paper by Margaret S. Wu notes that the teaching of database normalization typically
stops short of 4NF, perhaps because of a belief that tables violating 4NF (but meeting all lower
normal forms) are rarely encountered in business applications. This belief may not be accurate,
however. Wu reports that in a study of forty organizational databases, over 20% contained one or
more tables that violated 4NF while meeting all lower normal forms.
Note how this setup helps to remove redundancy. Suppose that Jack Schneider starts selling
Robusto's products. In the previous setup we would have to add two new entries since Jack
Schneider is able to sell two Product Types covered by Robusto: Breadboxes and Vacuum
Cleaners. With the new setup we need only add a single entry (in Brands By Travelling
Salesman).
Usage
Only in rare situations does a 4NF table not conform to 5NF. These are situations in which a
complex real-world constraint governing the valid combinations of attribute values in the 4NF
table is not implicit in the structure of that table. If such a table is not normalized to 5NF, the
burden of maintaining the logical consistency of the data within the table must be carried partly
by the application responsible for insertions, deletions, and updates to it; and there is a
heightened risk that the data within the table will become inconsistent. In contrast, the 5NF
design excludes the possibility of such inconsistencies. Spurious rows may appear in a result set
unless all of the 5NF tables are rejoined.
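The effect of rejoining only some of the tables can be sketched in Python. The data here is entirely hypothetical: the names "Other Salesman" and "Acme" are invented for illustration, and the four rows were chosen so that the three-way join reproduces them exactly.

```python
# Hypothetical (Salesman, Brand, Product Type) rows.
original = {
    ("Jack Schneider", "Acme", "Vacuum Cleaner"),
    ("Jack Schneider", "Robusto", "Breadbox"),
    ("Jack Schneider", "Robusto", "Vacuum Cleaner"),
    ("Other Salesman", "Robusto", "Vacuum Cleaner"),
}

# The three binary projections that a 5NF design would store.
salesman_brand = {(s, b) for s, b, _ in original}
brand_product = {(b, p) for _, b, p in original}
salesman_product = {(s, p) for s, _, p in original}

# Joining only two of the three tables, on Brand.
two_way = {
    (s, b, p)
    for s, b in salesman_brand
    for b2, p in brand_product
    if b == b2
}

# Joining all three: also require the (Salesman, Product Type) pair.
three_way = {row for row in two_way if (row[0], row[2]) in salesman_product}

spurious = two_way - original
print(spurious)               # {('Other Salesman', 'Robusto', 'Breadbox')}
print(three_way == original)  # True
```

The two-way join claims that the second salesman sells Robusto breadboxes, which the original rows never stated; the third table removes that row.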
Repeating groups of telephone numbers do not occur in this design. Instead, each Customer-to-
Telephone Number link appears on its own record.
Atomicity
Some definitions of 1NF, most notably that of Edgar F. Codd, make reference to the concept of
atomicity. Codd states that the "values in the domains on which each relation is defined are
required to be atomic with respect to the DBMS." [8] Codd defines an atomic value as one that
"cannot be decomposed into smaller pieces by the DBMS (excluding certain special
functions)."[9]
Hugh Darwen and Chris Date have suggested that Codd's concept of an "atomic value" is
ambiguous, and that this ambiguity has led to widespread confusion about how 1NF should be
understood.[10][11] In particular, the notion of a "value that cannot be decomposed" is problematic,
as it would seem to imply that few, if any, data types are atomic:
• A character string would seem not to be atomic, as the RDBMS typically provides operators
to decompose it into substrings.
• A date would seem not to be atomic, as the RDBMS typically provides operators to
decompose it into day, month, and year components.
• A fixed-point number would seem not to be atomic, as the RDBMS typically provides
operators to decompose it into integer and fractional components.
Date suggests that "the notion of atomicity has no absolute meaning":[12] a value may be
considered atomic for some purposes, but may be considered an assemblage of more basic
elements for other purposes. If this position is accepted, 1NF cannot be defined with reference to
atomicity. Columns of any conceivable data type (from string types and numeric types to array
types and table types) are then acceptable in a 1NF table—although perhaps not always
desirable. Date argues that relation-valued attributes, by means of which a field within a table
can contain a table, are useful in rare cases.[13]
Normalization beyond 1NF
Any table that is in second normal form (2NF) or higher is, by definition, also in 1NF (each
normal form has more stringent criteria than its predecessor). On the other hand, a table that is in
1NF may or may not be in 2NF; if it is in 2NF, it may or may not be in 3NF, and so on.
Normal forms higher than 1NF are intended to deal with situations in which a table suffers from
design problems that may compromise the integrity of the data within it. For example, the
following table is in 1NF, but is not in 2NF and therefore is vulnerable to logical inconsistencies:
Customer Names and Telephone Numbers
Customer ID First Name Surname Telephone Number
123 Robert Ingram 555-861-2025
456 Jane Wright 555-403-1659
456 Jane Wright 555-776-4100
789 Maria Fernandez 555-808-9633
The table's key is {Customer ID, Telephone Number}.
If Jane Wright changes her surname by marriage, the change must be applied to two rows. If the
change is only applied to one row, a contradiction results: the question "What is Customer 456's
name?" has two conflicting answers. 2NF addresses this problem.
Second normal form
Second normal form (2NF) is a normal form used in database normalization. 2NF was
originally defined by E.F. Codd[1] in 1971. A table that is in first normal form (1NF) must meet
additional criteria if it is to qualify for second normal form. Specifically: a 1NF table is in 2NF if
and only if, given any candidate key and any attribute that is not a constituent of a candidate key,
the non-key attribute depends upon the whole of the candidate key rather than just a part of it.
In slightly more formal terms: a 1NF table is in 2NF if and only if none of its non-prime
attributes are functionally dependent on a part (proper subset) of a candidate key. (A non-prime
attribute is one that does not belong to any candidate key.)
Note that when a 1NF table has no composite candidate keys (candidate keys consisting of more
than one attribute), the table is automatically in 2NF.
Example
Consider a table describing employees' skills:
Employees' Skills
Employee Skill Current Work Location
Jones Typing 114 Main Street
Jones Shorthand 114 Main Street
Jones Whittling 114 Main Street
Roberts Light Cleaning 73 Industrial Way
Ellis Alchemy 73 Industrial Way
Ellis Juggling 73 Industrial Way
Harrison Light Cleaning 73 Industrial Way
The table's only candidate key is {Employee, Skill}.
The remaining attribute, Current Work Location, is dependent on only part of the candidate key,
namely Employee. Therefore the table is not in 2NF. Note the redundancy in the way Current
Work Locations are represented: we are told three times that Jones works at 114 Main Street, and
twice that Ellis works at 73 Industrial Way. This redundancy makes the table vulnerable to
update anomalies: it is, for example, possible to update Jones' work location on his "Typing" and
"Shorthand" records and not update his "Whittling" record. The resulting data would imply
contradictory answers to the question "What is Jones' current work location?"
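The update anomaly can be reproduced with a short Python sketch. The replacement address "9 Hypothetical Road" is invented for illustration, and the helper names are introduced here.

```python
from collections import defaultdict

# (Employee, Skill, Current Work Location)
skills = [
    ("Jones", "Typing", "114 Main Street"),
    ("Jones", "Shorthand", "114 Main Street"),
    ("Jones", "Whittling", "114 Main Street"),
    ("Roberts", "Light Cleaning", "73 Industrial Way"),
    ("Ellis", "Alchemy", "73 Industrial Way"),
    ("Ellis", "Juggling", "73 Industrial Way"),
    ("Harrison", "Light Cleaning", "73 Industrial Way"),
]

def locations_per_employee(rows):
    """Collect every work location recorded against each employee."""
    locations = defaultdict(set)
    for employee, _skill, location in rows:
        locations[employee].add(location)
    return dict(locations)

def employee_determines_location(rows):
    """True if Employee -> Current Work Location holds in the rows."""
    return all(len(v) == 1 for v in locations_per_employee(rows).values())

consistent = employee_determines_location(skills)

# Partial update: the new (hypothetical) address reaches only two of
# Jones' three records.
NEW_ADDRESS = "9 Hypothetical Road"
updated = [
    (e, s, NEW_ADDRESS if e == "Jones" and s in ("Typing", "Shorthand") else loc)
    for e, s, loc in skills
]
still_consistent = employee_determines_location(updated)
jones_locations = locations_per_employee(updated)["Jones"]

print(consistent, still_consistent)  # True False
print(sorted(jones_locations))       # ['114 Main Street', '9 Hypothetical Road']
```

The key {Employee, Skill} remains unique throughout, so the table itself offers no protection against the inconsistent state.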
A 2NF alternative to this design would represent the same information in two tables:
Employees
Employee  Current Work Location
Jones     114 Main Street
Roberts   73 Industrial Way
Ellis     73 Industrial Way
Harrison  73 Industrial Way
Employees' Skills
Employee  Skill
Jones     Typing
Jones     Shorthand
Jones     Whittling
Roberts   Light Cleaning
Ellis     Alchemy
Ellis     Juggling
Harrison  Light Cleaning
Update anomalies cannot occur in these tables, which are both in 2NF.
Not all 2NF tables are free from update anomalies, however. An example of a 2NF table which
suffers from update anomalies is:
Tournament Winners
Tournament Year Winner Winner Date of Birth
Des Moines Masters 1998 Chip Masterson 14 March 1977
Indiana Invitational 1998 Al Fredrickson 21 July 1975
Cleveland Open 1999 Bob Albertson 28 September 1968
Des Moines Masters 1999 Al Fredrickson 21 July 1975
Indiana Invitational 1999 Chip Masterson 14 March 1977
Even though Winner and Winner Date of Birth are determined by the whole key {Tournament,
Year} and not part of it, particular Winner / Winner Date of Birth combinations are shown
redundantly on multiple records. This problem is addressed by third normal form (3NF).
2NF and candidate keys
A table for which there are no partial functional dependencies on the primary key is typically, but
not always, in 2NF. In addition to the primary key, the table may contain other candidate keys; it
is necessary to establish that no non-prime attributes have part-key dependencies on any of these
candidate keys.
Multiple candidate keys occur in the following table:
Electric Toothbrush Models
Manufacturer Model Model Full Name Manufacturer Country
Forte X-Prime Forte X-Prime Italy
Forte Ultraclean Forte Ultraclean Italy
Dent-o-Fresh EZBrush Dent-o-Fresh EZBrush USA
Kobayashi ST-60 Kobayashi ST-60 Japan
Hoch Toothmaster Hoch Toothmaster Germany
Hoch Contender Hoch Contender Germany
Even if the designer has specified the primary key as {Model Full Name}, the table is not in
2NF. {Manufacturer, Model} is also a candidate key, and Manufacturer Country is dependent on
a proper subset of it: Manufacturer.
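The need to examine every candidate key can be made concrete with a brute-force Python sketch. The three functional dependencies listed are an assumed reading of the toothbrush table rather than something stated explicitly in the text, and the function names are introduced here.

```python
from itertools import combinations

ATTRS = frozenset(
    {"Manufacturer", "Model", "Model Full Name", "Manufacturer Country"}
)
# Assumed dependencies, read off the example table.
FDS = [
    (frozenset({"Manufacturer", "Model"}), frozenset({"Model Full Name"})),
    (frozenset({"Model Full Name"}), frozenset({"Manufacturer", "Model"})),
    (frozenset({"Manufacturer"}), frozenset({"Manufacturer Country"})),
]

def closure(attrs, fds):
    """Every attribute functionally determined by attrs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return frozenset(result)

def candidate_keys(attrs, fds):
    """Minimal attribute sets whose closure is the whole heading."""
    keys = []
    for size in range(1, len(attrs) + 1):
        for combo in combinations(sorted(attrs), size):
            candidate = frozenset(combo)
            is_superkey = closure(candidate, fds) == attrs
            if is_superkey and not any(k <= candidate for k in keys):
                keys.append(candidate)
    return keys

def partial_dependencies(attrs, fds):
    """(key, part of key, non-prime attribute) triples that break 2NF."""
    keys = candidate_keys(attrs, fds)
    prime = frozenset().union(*keys)
    found = []
    for key in keys:
        for size in range(1, len(key)):
            for part in combinations(sorted(key), size):
                part = frozenset(part)
                for attr in sorted(closure(part, fds) - part - prime):
                    found.append((key, part, attr))
    return found

keys = candidate_keys(ATTRS, FDS)
breaches = partial_dependencies(ATTRS, FDS)
print([sorted(k) for k in keys])
print([(sorted(k), sorted(p), a) for k, p, a in breaches])
```

Checking only the single-attribute key {Model Full Name} would find nothing; the breach appears once {Manufacturer, Model} is examined as well.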
The third normal form (3NF) is a normal form used in database normalization. 3NF was
originally defined by E.F. Codd[1] in 1971. Codd's definition states that a table is in 3NF if and
only if both of the following conditions hold:
• The relation R (table) is in second normal form (2NF)
• Every non-prime attribute of R is non-transitively dependent (i.e. directly dependent) on
every key of R.
A non-prime attribute of R is an attribute that does not belong to any candidate key of R. [2] A
transitive dependency is a functional dependency in which X → Z (X determines Z) indirectly, by
virtue of X → Y and Y → Z (where it is not the case that Y → X).[3]
A 3NF definition that is equivalent to Codd's, but expressed differently, was given by Carlo
Zaniolo in 1982. This definition states that a table is in 3NF if and only if, for each of its
functional dependencies X → A, at least one of the following conditions holds:
• X contains A (that is, X → A is a trivial functional dependency), or
• X is a superkey, or
• A is a prime attribute (i.e., A is contained within a candidate key)[4]
Zaniolo's definition gives a clear sense of the difference between 3NF and the more stringent
Boyce-Codd normal form (BCNF). BCNF simply eliminates the third alternative ("A is a prime
attribute").
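Zaniolo's three alternatives translate directly into a small check. The sketch below applies them to the Nearest Shops dependencies from the BCNF discussion, taking the candidate keys as given there; `zaniolo_reason` is a name introduced here for illustration.

```python
ATTRS = frozenset({"Person", "Shop Type", "Nearest Shop"})
# Each dependency is (determinant X, single dependent attribute A).
FDS = [
    (frozenset({"Person", "Shop Type"}), "Nearest Shop"),
    (frozenset({"Nearest Shop"}), "Shop Type"),
]
CANDIDATE_KEYS = [
    frozenset({"Person", "Shop Type"}),
    frozenset({"Person", "Nearest Shop"}),
]

def closure(attrs, fds):
    """Every attribute functionally determined by attrs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, attr in fds:
            if lhs <= result and attr not in result:
                result.add(attr)
                changed = True
    return frozenset(result)

def zaniolo_reason(lhs, attr):
    """Return which of the three conditions admits X -> A, or None."""
    if attr in lhs:
        return "trivial"
    if closure(lhs, FDS) == ATTRS:
        return "superkey"
    if any(attr in key for key in CANDIDATE_KEYS):
        return "prime attribute"
    return None

reasons = [zaniolo_reason(lhs, attr) for lhs, attr in FDS]
in_3nf = all(reason is not None for reason in reasons)
# BCNF drops the third alternative.
in_bcnf = all(reason in ("trivial", "superkey") for reason in reasons)

print(reasons)          # ['superkey', 'prime attribute']
print(in_3nf, in_bcnf)  # True False
```

The dependency Nearest Shop → Shop Type is admitted only by the "prime attribute" alternative, which is exactly the case that BCNF disallows.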
"Nothing but the key"
A memorable summary of Codd's definition of 3NF, paralleling the traditional pledge to give true
evidence in a court of law, was given by Bill Kent: every non-key attribute "must provide a fact
about the key, the whole key, and nothing but the key."[5] A common variation supplements this
definition with the oath: "so help me Codd".[6]
Requiring that non-key attributes be dependent on "the whole key" ensures that a table is in 2NF;
further requiring that non-key attributes be dependent on "nothing but the key" ensures that the
table is in 3NF.
Chris Date refers to Kent's summary as "an intuitively attractive characterization" of 3NF, and
notes that with slight adaptation it may serve as a definition of the slightly-stronger Boyce-Codd
normal form: "Each attribute must represent a fact about the key, the whole key, and nothing but
the key."[7] Here the requirement is concerned with every attribute in the table, not just non-key
attributes.
Example
An example of a 2NF table that fails to meet the requirements of 3NF is:
Tournament Winners
Tournament Year Winner Winner Date of Birth
Indiana Invitational 1998 Al Fredrickson 21 July 1975
Cleveland Open 1999 Bob Albertson 28 September 1968
Des Moines Masters 1999 Al Fredrickson 21 July 1975
Indiana Invitational 1999 Chip Masterson 14 March 1977
The only candidate key is {Tournament, Year}.
The breach of 3NF occurs because the non-prime attribute Winner Date of Birth is transitively
dependent on {Tournament, Year} via the non-prime attribute Winner. The fact that Winner Date
of Birth is functionally dependent on Winner makes the table vulnerable to logical
inconsistencies, as there is nothing to stop the same person from being shown with different
dates of birth on different records.
In order to express the same facts without violating 3NF, it is necessary to split the table into
two:
Tournament Winners
Tournament            Year  Winner
Indiana Invitational  1998  Al Fredrickson
Cleveland Open        1999  Bob Albertson
Des Moines Masters    1999  Al Fredrickson
Indiana Invitational  1999  Chip Masterson
Player Dates of Birth
Player          Date of Birth
Chip Masterson  14 March 1977
Al Fredrickson  21 July 1975
Bob Albertson   28 September 1968
Update anomalies cannot occur in these tables, which are both in 3NF.
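The split can be reproduced by projecting the original four rows, as in the following Python sketch. It assumes the source rows are already consistent (a dictionary would silently keep only one date of birth per player otherwise).

```python
# (Tournament, Year, Winner, Winner Date of Birth)
winners = [
    ("Indiana Invitational", 1998, "Al Fredrickson", "21 July 1975"),
    ("Cleveland Open", 1999, "Bob Albertson", "28 September 1968"),
    ("Des Moines Masters", 1999, "Al Fredrickson", "21 July 1975"),
    ("Indiana Invitational", 1999, "Chip Masterson", "14 March 1977"),
]

# Projection onto the two 3NF tables.
tournament_winners = {(t, y, w) for t, y, w, _ in winners}
player_dates_of_birth = {w: dob for _, _, w, dob in winners}

# Joining on Winner = Player restores the original rows.
rejoined = {
    (t, y, w, player_dates_of_birth[w]) for t, y, w in tournament_winners
}

print(len(player_dates_of_birth))  # 3
print(rejoined == set(winners))    # True
```

Al Fredrickson's date of birth is now held once, so it cannot differ between his two tournament records.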
Derivation of Zaniolo's conditions
A lemma proved by Zaniolo states that a table is in 3NF if and only if, for each of its functional
dependencies X → A, at least one of the following conditions holds:
• X contains A, or
• X is a superkey, or
• A is a prime attribute (i.e., A is contained within a candidate key)
The lemma is proved in the following way: Let X → A be a nontrivial FD (i.e. one where X does
not contain A) and let A be a non-key attribute. Also let Y be a key of R. Then Y → X. Therefore
A is not transitively dependent on Y if and only if X → Y, that is, if and only if X is a superkey.[8]
Normalization beyond 3NF
Most 3NF tables are free of update, insertion, and deletion anomalies. Certain types of 3NF
tables, rarely met with in practice, are affected by such anomalies; these are tables which either
fall short of Boyce-Codd normal form (BCNF) or, if they meet BCNF, fall short of the higher
normal forms 4NF or 5NF.