Fundamentals of Database Design

LESSON 1
A database is a collection of inter-related, stored data that serves the needs of
multiple users within one or more organizations.
The form that a database takes is determined by the database management system
(DBMS).
A DBMS is a generalized software system for implementing and maintaining
databases.
It allows you to store, modify, and extract information from a database.
Types of database system include
relational
hierarchical
network
object oriented
flat file
Most organizations use relational database systems.
They provide a greater degree of data independence than others.
The principle of data independence describes the ability to alter the structure of the
database without reprogramming applications.
Physical independence abstracts the programmer from low level details of how the
data is organized and stored.
Application independence allows the designer to design the database without being
concerned with the detail of the programs.
In practice, total data independence is rarely possible. But a properly designed
database offers a high degree of independence.
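As a rough illustration of data independence, the sketch below uses Python's built-in sqlite3 module. The employee table and its column names are assumptions made for this example. The application query names only the columns it needs, so a later change to the table structure does not force the query to be rewritten.

```python
import sqlite3

# Hypothetical schema, used only for illustration.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE employee (employee_id INTEGER PRIMARY KEY, employee_name TEXT)"
)
conn.execute("INSERT INTO employee VALUES (1, 'A. Smith')")


def employee_names(connection):
    """Application code: asks for named columns, not for a table layout."""
    rows = connection.execute(
        "SELECT employee_name FROM employee ORDER BY employee_id"
    )
    return [name for (name,) in rows]


before = employee_names(conn)

# The structure of the database is altered...
conn.execute("ALTER TABLE employee ADD COLUMN employee_address TEXT")

# ...and the application function runs unchanged.
after = employee_names(conn)
```

Here the function returns the same result before and after the structural change, which is the kind of independence a properly designed database aims for.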
Different design techniques can be used for the different types of database.
Entity relationship (ER) analysis is a technique that is particularly suited to the
construction of relational database models.
Database design is the process of designing database structures to represent
real-world activities and objects.
As a designer, your primary aim is to provide for the efficient storage, update, and
retrieval of data.
You must also ensure that the data stored in the database is reliable.
In addition, your design should aim to create a database that is fully scalable and
adaptable to new requirements.
A database is part of a larger information system that is used to manage the
resources of an organization.
Database design takes place as part of the software development of an information
system.
Software development typically follows a number of stages including
requirements
design
implementation
maintenance
At the requirements stage of software development you establish the data and data
processing needs of your system.
Data requirements are determined by the producers and users of data in a system.
By analyzing data requirements, system programmers can plan the applications
that meet your data processing needs.
When software is being developed, the database is designed in conjunction with the
technical architecture of the software and applications
infrastructure on which the users will interact with the software and data
The final stage of software development is the implementation and maintenance of
the system.
Developing a database design involves three distinct roles:
designing the applications which will meet your data processing needs
identifying data structures for the database
implementing the database design on a software platform
Depending on the scale of the project, these roles can be carried out by a single
individual or by a number of individuals working in close collaboration.
The process of database design incorporates the basic steps involved in defining
and modeling data requirements, and normalizing design.
It also involves modifying the design and allocating data structures for optimal
performance.
The stages in database design include
requirements analysis
logical design
physical design
database implementation
Requirements analysis takes place before the design process begins.
Its aim is to gather information about the current state of your organization.
In the context of database design, requirements analysis refers only to the data that
is to be in the database and not the processes.
Requirements analysis determines the information you need to store in the
database and how that information is used.
You can gather this information by interviewing the users and producers of data as
well as by analyzing company documentation.
After the data has been gathered, you produce a formal requirements specification.
This broadly defines the data to be processed and the data relationships in an
information system.
Logical design is the representation of the structure of a database - showing all the
data and the relationships between the data.
Once the requirements analysis has been completed, the designer looks at the
specifications and determines how they will translate to the database.
This can be done using a conceptual modeling technique such as ER analysis.
A conceptual data model is a diagrammatic representation of the requirements
specification.
It displays the characteristics of each entity and attribute in the database, as well as
the relationships between them.
Conceptual data modeling is completed independently of all hardware and software
considerations, including the DBMS.
The conceptual data model leads to better communication between the users,
analysts, and designers of the database.
And it enables them to form a common understanding of the information being
modeled.
Sometimes, during requirements analysis, different views of the database emerge
depending on the perspective of the data user or producer.
Logical design must consolidate all database views into a unified global schema.
This is known as view integration.
Once a global schema of the database has been produced, the next step in logical
design is to eliminate data anomalies and redundancy.
This is done using a specific technique called normalization.
Normalization is the process of organizing data attributes to form stable and
adaptive entities.
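The idea can be sketched in plain Python. The records below are hypothetical: each order row repeats the customer's name, which is the kind of redundancy normalization aims to remove. Splitting the data so that customer facts are stored once is only a simplified picture of the technique, not a full treatment of normal forms.

```python
# Hypothetical un-normalized rows: customer details repeat on every order.
flat_orders = [
    {"order_no": 1, "customer_id": 10, "customer_name": "Shoptimum", "item": "pallet"},
    {"order_no": 2, "customer_id": 10, "customer_name": "Shoptimum", "item": "crate"},
    {"order_no": 3, "customer_id": 11, "customer_name": "Acme", "item": "drum"},
]

# Customer facts are stored once, keyed by the customer identifier.
customers = {row["customer_id"]: row["customer_name"] for row in flat_orders}

# Orders keep only their own attributes plus a reference to the customer.
orders = [
    {"order_no": row["order_no"], "customer_id": row["customer_id"], "item": row["item"]}
    for row in flat_orders
]

# Renaming a customer is now a single update rather than one update per order.
customers[10] = "Shoptimum Ltd"
names_on_orders = [customers[order["customer_id"]] for order in orders]
```

Because the name is held in one place, the update cannot leave some orders showing the old name and others the new one.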
Physical design is a blueprint for how to store data on a particular system.
At this stage the ER diagram is converted to a relational schema.
A relational schema defines the database structure in terms of tables, columns, and
indexes.
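A minimal sketch of such a schema, using Python's built-in sqlite3 module, is shown below. The table, column, and index names are assumptions chosen to match the EMPLOYEE examples used later in the lesson. A production DBMS would have its own DDL dialect.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical DDL: one table, its columns, and one index.
conn.executescript("""
CREATE TABLE employee (
    employee_id      INTEGER PRIMARY KEY,
    employee_name    TEXT NOT NULL,
    employee_address TEXT
);
CREATE INDEX idx_employee_name ON employee (employee_name);
""")

# Read the structure back from SQLite's catalog.
columns = [row[1] for row in conn.execute("PRAGMA table_info(employee)")]
indexes = [
    row[0]
    for row in conn.execute(
        "SELECT name FROM sqlite_master "
        "WHERE type = 'index' AND tbl_name = 'employee'"
    )
]
```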
The main concern of physical design is how to modify the database design to
optimize performance of the database.
This is done by identifying and accommodating data processes based on high
frequency or on priority.
Physical design involves
denormalization
data allocation
Denormalization is the selective modification of a normalized database design to
optimize performance.
Data allocation refers to the performance implications of storing data on different
devices or servers.
Devices are disks or disk configurations on which the database can be stored.
Database design is an iterative process.

Each stage of the life cycle is linked to the previous stage in a continuous cycle of
re-evaluation and validation.
Suppose a prototype conceptual model of a system's data requirements is
examined by an end-user.
This might lead to modification of the requirements specification, depending on
whether the model accurately represents the user's needs.
Once the physical design is completed, the database is implemented using the data
definition language (DDL) of a DBMS.
As the database begins operation, you should monitor its performance to ensure
that your requirements are being met.
If requirements change, you can modify the database design to improve
performance.
Database design is a methodical, document-intensive process that can be
automated using computer-aided software engineering (CASE) tools.
CASE tools can be used to aid modeling, to support normalization, and ultimately, to
implement the database design.
Entity relationship (ER) analysis has a very important role in database design.
It is a specific technique for organizing and modeling a system's data.
ER analysis offers an efficient approach to the gathering and collating of an
enterprise's data requirements.
This information is formally presented as part of a requirements specification.
A requirements specification may include other models, such as a process model.
ER analysis is a conceptual data-modeling technique.
The aim of conceptual modeling is to represent real-world data requirements in a
manner that is easily understood by end-users and database designers.
A conceptual model of all the data and data relationships in a system is known as a
global schema.
The ER approach to conceptual modeling uses diagrammatic syntax - or notation - to
convey the data requirements in a given enterprise.
This ensures that ER models give a clear, unambiguous, and comprehensive
blueprint for the later stages of database design and implementation.
ER notation has a number of different conventions, including
Chen76
Rein85
Ever86
In this case let's use the Ever86 notation.
The basic ER model consists of two classes of object:
entities
relationships
An entity is an object about which the system needs to record information.
It can have a physical existence - an employee, for example.
And it can have a conceptual existence - a product category, for example.
In Ever86 entities are denoted by rectangular boxes.
The entity name is written inside the rectangle.
An entity occurrence is a specific example of an entity.
It ultimately manifests itself physically as a row in a database table.
The entity occurrences in an application environment are related to each other in
various ways.
Relationships represent real-world associations among one or more entities.
A relationship is denoted by a connection line between two entities and a
relationship name.
A relationship name is usually a verb.
In this relationship the relationship name is "employs".
The diagram reads COMPANY employs EMPLOYEE.

A relationship may have two names, depending on the direction in which it is read.
For example, EMPLOYEE is employed by COMPANY.
Relationships are described in terms of
degree
connectivity
cardinality
existence
Degree refers to the number of entities associated in the relationship.
In a binary relationship - an association between two entities - the degree is 2.
Here we see an example of a typical binary relationship.
A special category of binary relationship is shown here.
The relationship EMPLOYEE is supervised by EMPLOYEE is known as a recursive
relationship.
It is called recursive because the entity relates only to another instance of its own
type.
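One common way to carry a recursive relationship into a relational table is a column that refers back to the same table. The sketch below assumes a hypothetical supervisor_id column and uses Python's built-in sqlite3 module. It is an illustration rather than a prescribed mapping.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.execute("""
CREATE TABLE employee (
    employee_id   INTEGER PRIMARY KEY,
    employee_name TEXT NOT NULL,
    -- Refers back to the same table: EMPLOYEE is supervised by EMPLOYEE.
    supervisor_id INTEGER REFERENCES employee (employee_id)
)""")
conn.executemany(
    "INSERT INTO employee VALUES (?, ?, ?)",
    [(1, "Manager", None), (2, "Driver", 1)],
)

# Self-join: pair each supervised employee with the supervising employee.
pairs = conn.execute("""
    SELECT e.employee_name, s.employee_name
    FROM employee AS e
    JOIN employee AS s ON e.supervisor_id = s.employee_id
""").fetchall()
```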
Connectivity describes the mapping of the associated entity occurrences in a
relationship.
Values for connectivity are either one or many.
One is denoted by a line perpendicular to the connecting line.
And many is denoted by a crow's foot.

The basic constructs for connectivity between entity occurrences in binary
relationships are
one-to-one
one-to-many
many-to-many
The relationship shown has a connectivity of one for COMPANY and many for
EMPLOYEE.
This means that there is one entity occurrence of COMPANY associated with many
occurrences of EMPLOYEE.
If expressed in natural language rather than ER notation, it would read that a
company employs at least one and probably many employees.
The number associated with the connectivity of a relationship is called the
cardinality.
Minimum cardinality is specified either by O (for zero) or | (for one) on the
connection line between entities.
On the many side of a relationship the maximum cardinality is either constant or
variable.
The lowest known value for maximum cardinality is one - denoted by |.
When the maximum cardinality is variable, it is specified by a crow's foot to
designate many.
Some situations are such that the actual maximum cardinality is known.
Here the minimum cardinality for COMPANY is 1 and the maximum is 1.
This is described as 'one and only one' and is shown by the symbol | |.
On the other side of the relationship, the minimum cardinality for EMPLOYEE is one
and the maximum is many.
In natural language, the cardinality for both entities signifies that each employee
works for 'one and only one' company.
Some enterprises have entities whose existence depends on the existence of
another entity.
The existence of an entity in a relationship is defined as either mandatory or
optional.
An entity is considered mandatory if an occurrence of the entity must always exist
for it to be included in a relationship.
For example, in this relationship the entity EMPLOYEE is dependent on the
permanent existence of COMPANY.
When an occurrence of an entity need not always exist for the entity to be included
in a relationship, it is considered optional.
For example, the entity ORDER may or may not be held at any DEPOT. The entity DEPOT
in the relationship between ORDER and DEPOT is therefore considered optional.
Optional existence defines a minimum cardinality of zero.
Therefore, optional existence is represented by 0 on the connection line between
the entities in a relationship.
Mandatory existence defines a minimum cardinality of one.
Therefore, mandatory existence is represented by a line perpendicular to the
connection line |.
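When a model like this is later turned into tables, mandatory and optional existence are often expressed through whether a foreign key column may be empty. The sketch below assumes that mapping and uses Python's built-in sqlite3 module. The table and column names are hypothetical, and the ORDER entity is stored as customer_order because ORDER is a reserved word in SQL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE company (company_id INTEGER PRIMARY KEY);
CREATE TABLE depot   (depot_id   INTEGER PRIMARY KEY);

-- Mandatory: every employee must reference a company (minimum cardinality one).
CREATE TABLE employee (
    employee_id INTEGER PRIMARY KEY,
    company_id  INTEGER NOT NULL REFERENCES company (company_id)
);

-- Optional: an order may or may not be held at a depot (minimum cardinality zero).
CREATE TABLE customer_order (
    order_no INTEGER PRIMARY KEY,
    depot_id INTEGER REFERENCES depot (depot_id)
);

INSERT INTO company VALUES (1);
""")

# An order with no depot is accepted.
conn.execute("INSERT INTO customer_order VALUES (100, NULL)")

# An employee with no company is rejected by the NOT NULL constraint.
try:
    conn.execute("INSERT INTO employee VALUES (1, NULL)")
    mandatory_enforced = False
except sqlite3.IntegrityError:
    mandatory_enforced = True
```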
In a one-to-many relationship, the entity at the one end is referred to as the Master
and the entity at the many end is the Detail.
Master and Detail are not types of entity - rather they are roles that an entity plays
in a relationship.
An entity can be a Master in one relationship and a Detail in another.
Ever86 is not the only commonly used ER notation.

The Chen76 and Rein85 conventions also provide a rich semantic basis for ER
analysis and conceptual modeling.
These ER conventions were devised by Gordon Everest, Peter Chen, and D. Reiner
respectively. The numbers refer to the years in which their ideas were first
published.
Both Chen76 and Rein85 use rectangular boxes to denote entities.
They differ from Ever86, however, in the way in which they represent the
relationship construct.
In Chen76 the relationship construct is a diamond that connects the associated
entities.
The relationship name is written inside the diamond.
Like Chen76, Rein85 uses a diamond to signify a relationship construct.

But the relationship name is written above the connecting line.
Here you see the basic constructs for representing connectivity using Chen76
notation.
On the one side, the number one is shown on the connection between the
relationship and one of the entities.
And on the other side, the letter N - or sometimes M - is used on the connection line
to designate the concept of many.
Mandatory existence is denoted by a line perpendicular to the relationship line.
Optional existence is denoted by 0 on the relationship line.

Maximum cardinality is defined explicitly on the ER diagram as a constant if a
number is displayed next to an entity.
And it is defined as a variable if no number is displayed.
Rein85 uses a shading approach to express the connectivity of a relationship.
The shaded side of the relationship diamond implies many and the unshaded side
implies one.
Here are the typical connectivity constructs in Rein85 form.
The circle on the connection line denotes optional existence.

And mandatory existence is implied by the absence of the optional symbol.
The similarity between the Chen76, Rein85, and Ever86 forms is clear from the
diagram.
All three are reasonably easy to learn and read, and they offer a simple approach to
ER modeling.
Although efforts at standardization are currently underway, there is no agreed
convention for the ER model.
And there are many other constructs in addition to the three types discussed.
For example, a broken connection line sometimes represents optional existence.
Database design would greatly benefit from standardization.

This is particularly true in the area of computer-aided software engineering.
Database modeling and design can be implemented as a set of software design
aids.
These design aids are typically contained in the database component of
computer-aided software engineering (CASE) tools.
CASE tools offer an interactive modeling capability using a simplified data modeling
approach.
They typically assist the analyst or designer in drawing ER diagrams by selecting
the optimal location for entities and by minimizing crossing lines and clutter.
They can also check inconsistencies in the model, such as repeated entities and
entities with no relationships.
In addition, CASE tools support view integration, transformation to SQL, and
normalization.
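The consistency checks mentioned above can be sketched in a few lines of Python. The function name check_model and the way the model is represented (a list of entity names and a list of relationship tuples) are assumptions for this example. Real CASE tools use their own internal formats.

```python
from collections import Counter


def check_model(entities, relationships):
    """Report repeated entities and entities that take part in no relationship.

    entities: list of entity names.
    relationships: list of (entity, relationship_name, entity) tuples.
    """
    repeated = sorted(name for name, n in Counter(entities).items() if n > 1)
    connected = {e for left, _, right in relationships for e in (left, right)}
    isolated = sorted(set(entities) - connected)
    return {"repeated": repeated, "isolated": isolated}


# Hypothetical model with one repeated entity and one unconnected entity.
report = check_model(
    ["COMPANY", "EMPLOYEE", "DEPOT", "EMPLOYEE"],
    [("COMPANY", "employs", "EMPLOYEE")],
)
```

In this example the report lists EMPLOYEE as repeated and DEPOT as having no relationships.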
Suppose you are a database designer.
Here you have to define in ER notation certain relationship constructs based on
the requirements identified.
In ER analysis the data associated with an entity is known as the entity data
content.
It is typically expressed as a list of attributes.
Each attribute corresponds to a data item.
Here is an employee's personnel file.
Some attributes of EMPLOYEE contained in the file are

Employee_Id
Employee_Name
Employee_Address
During ER analysis you should define legitimate values for each attribute.
These values are typically defined in terms of datatype and domain.

The datatype for an attribute defines what class of data can be stored in that
attribute.
To clarify your data-processing requirements, you should declare logical datatypes
for each attribute.
This table shows some examples of datatypes and their logical business meaning.
An attribute's datatype determines its domain.

The domain of an attribute defines what values an attribute can legitimately take
on.
Consider Salary as an attribute of EMPLOYEE, where the datatype is money and the
domain is $10,000 to $120,000.
The table shown expresses the logical domain for this datatype.
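A domain like the one given for Salary can later be enforced in the database itself. The sketch below assumes a CHECK constraint is used for this and relies on Python's built-in sqlite3 module. The table layout and the helper try_insert are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
CREATE TABLE employee (
    employee_id INTEGER PRIMARY KEY,
    -- Domain from the text: salary between $10,000 and $120,000.
    salary      NUMERIC NOT NULL CHECK (salary BETWEEN 10000 AND 120000)
)""")


def try_insert(employee_id, salary):
    """Return True if the row is accepted, False if the domain check rejects it."""
    try:
        conn.execute("INSERT INTO employee VALUES (?, ?)", (employee_id, salary))
        return True
    except sqlite3.IntegrityError:
        return False


accepted = try_insert(1, 45000)   # inside the domain
rejected = try_insert(2, 5000)    # below the lower bound
```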
Every entity occurrence has a single attribute - or some composite of attributes -
that uniquely identifies it.
This is known as the identifier or primary key.
Here the primary key of EMPLOYEE is Employee_Id.
A descriptor or non-key attribute identifies a non-unique characteristic of an entity.

For example, a descriptor of EMPLOYEE is Employee_Name.
It is a descriptor because two employees can have the same name, yet each is still
uniquely identified by an Employee_Id.
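The difference between an identifier and a descriptor can be shown with a small sketch using Python's built-in sqlite3 module. The sample names are invented for illustration. Two rows may share a name, but a repeated Employee_Id is rejected.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE employee ("
    "employee_id INTEGER PRIMARY KEY, employee_name TEXT NOT NULL)"
)

# Two occurrences may share a descriptor value...
conn.execute("INSERT INTO employee VALUES (1, 'J. Murphy')")
conn.execute("INSERT INTO employee VALUES (2, 'J. Murphy')")

# ...but the identifier must be unique.
try:
    conn.execute("INSERT INTO employee VALUES (2, 'K. Byrne')")
    duplicate_key_accepted = True
except sqlite3.IntegrityError:
    duplicate_key_accepted = False

same_name_count = conn.execute(
    "SELECT COUNT(*) FROM employee WHERE employee_name = 'J. Murphy'"
).fetchone()[0]
```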
Entities and attributes are data items that convey information about an enterprise.
Entities are the principal data items in an enterprise.
And attributes are characteristics of entities.
Although it is easy to define entities and attributes as concepts, it is more difficult to
distinguish their roles in a database model.
There are a number of guidelines to help you distinguish between entities and
attributes.
If there is descriptive information about a data item, the data item should be
classified as an entity.
But if a data item requires only an identifier, the object should be classified as an
attribute.
Consider the concept "ORDER".
If there is some descriptive information about this concept, such as category and
cost, then it should be classified as an entity.
However, if only the order number is required to identify an order then it should be
classified as an attribute associated with some entity, such as SALES.
If more than one value of a descriptor attribute corresponds to one value of an
identifier, the descriptor should be classified as an entity.
Consider a large haulage company with many delivery depots.
Each depot supervises delivery personnel, processes orders, and maintains the
company's fleet of delivery vehicles.
You can classify depot as a multivalued attribute of COMPANY, but it is better
classified as an entity.
And you can assign the attribute Depot_Id as the primary key of DEPOT.
You should attach attributes to the entities that they directly describe.
For example, this is an attribute of DEPOT rather than EMPLOYEE, even though it
gives some information about the employee's place of work.
The procedure of identifying entities and attaching attributes to them is iterative.

You may discover after classifying certain data items that some entities are better
defined as attributes or attributes as entities.
As you know, associations between entities are described as relationships.
Some typical relationships are "employs", "works in", "operates", or any verb that
connects entities.
For every relationship you should specify degree, connectivity, and whether it is
optional or mandatory.
When constructing an entity relationship diagram (ERD), it is important to eliminate
redundant relationships.
A redundant relationship exists if two or more relationships are used to represent
the same concept.
For example, here EMPLOYEE makes a DELIVERY which consists of ORDER, so the
EMPLOYEE delivers ORDER relationship is redundant.
Two relationships are allowed between two or more entities, as long as the two
relationships have different meanings.
Consider this situation, where an EMPLOYEE works on ORDER that is held in a
DEPOT.
The third relationship, EMPLOYEE works in DEPOT, isn't redundant because it doesn't
relate to the others.
Redundant relationships result in un-normalized tables when you transform the ER
model into relational schemas.
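A simple way to spot candidates for redundancy is to ask whether a direct relationship can also be reached through a chain of other relationships. The sketch below is one possible check, with a hypothetical function name and a simplified model in which each relationship is a pair of entity names. It only flags candidates: as noted above, two relationships with different meanings are both legitimate, so the designer still has to judge each case.

```python
def is_derivable(relationships, source, target):
    """Return True if target can be reached from source in two or more steps."""
    direct = {}
    for left, right in relationships:
        direct.setdefault(left, set()).add(right)

    # Start from the neighbours of source, ignoring the direct link to target.
    frontier = {n for n in direct.get(source, set()) if n != target}
    seen = set(frontier)
    while frontier:
        next_frontier = set()
        for node in frontier:
            for neighbour in direct.get(node, set()):
                if neighbour == target:
                    return True
                if neighbour not in seen:
                    seen.add(neighbour)
                    next_frontier.add(neighbour)
        frontier = next_frontier
    return False


# EMPLOYEE makes DELIVERY, DELIVERY consists of ORDER, EMPLOYEE delivers ORDER.
model = [("EMPLOYEE", "DELIVERY"), ("DELIVERY", "ORDER"), ("EMPLOYEE", "ORDER")]
candidate = is_derivable(model, "EMPLOYEE", "ORDER")
```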
Attributes can be assigned to relationships as well as entities.
But they are typically assigned only to many-to-many relationships.
This is because in one-to-one and one-to-many relationships there is no ambiguity
in assigning an attribute to a particular entity.
Consider the one-to-many relationship between DEPOT and EMPLOYEE.
If the attribute Task_Assignment is applied to DEPOT, it designates the particular
tasks carried out by that depot.
And if the same attribute is applied to EMPLOYEE, it specifies the particular tasks
carried out by each occurrence of that entity.
Suppose there is a many-to-many relationship between DEPOT and EMPLOYEE, so
that an employee can belong to many depots.
Then the attribute Task_Assignment is applied to the relationship.
This ensures that each instance of the relationship that connects one employee with
one depot has a unique Task_Assignment for that employee in that depot.
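The point can be illustrated with a small Python sketch in which the attribute is keyed by the employee and depot pair rather than by either entity alone. The identifiers and task names are made up for the example.

```python
# Hypothetical data: the attribute belongs to the (employee, depot) pair,
# not to either entity on its own.
task_assignment = {
    ("E1", "D1"): "loading",
    ("E1", "D2"): "driving",
    ("E2", "D1"): "dispatch",
}


def tasks_for_employee(employee_id):
    """Collect an employee's task at each depot the employee belongs to."""
    return {
        depot: task
        for (employee, depot), task in task_assignment.items()
        if employee == employee_id
    }


e1_tasks = tasks_for_employee("E1")
```

Employee E1 has a different task at each depot, which could not be recorded if Task_Assignment were attached to EMPLOYEE alone.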
Many-to-many relationships are suitable only for preliminary data models.
And they are usually resolved early in the database design process.
All many-to-many relationships can be decomposed into a pair of one-to-many
relationships, by the introduction of an associative entity.
The associative entity is related to the two entities that participated in the
many-to-many relationship.
To decompose the relationship between EMPLOYEE and VEHICLE, you introduce the
associative entity VEHICLE_ASSIGNMENT.
This represents one employee's relationship with one vehicle at a given time.
And the two one-to-many relationships resulting from the decomposition are
EMPLOYEE to VEHICLE_ASSIGNMENT and VEHICLE to VEHICLE_ASSIGNMENT.
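The decomposition above can be sketched as three tables using Python's built-in sqlite3 module. The column names, including assigned_date as a stand-in for "a given time", are assumptions for this example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE employee (employee_id INTEGER PRIMARY KEY);
CREATE TABLE vehicle  (vehicle_id  INTEGER PRIMARY KEY);

-- Associative entity: each row links one employee with one vehicle on one date.
CREATE TABLE vehicle_assignment (
    employee_id   INTEGER NOT NULL REFERENCES employee (employee_id),
    vehicle_id    INTEGER NOT NULL REFERENCES vehicle (vehicle_id),
    assigned_date TEXT    NOT NULL,
    PRIMARY KEY (employee_id, vehicle_id, assigned_date)
);

INSERT INTO employee VALUES (1), (2);
INSERT INTO vehicle  VALUES (10), (11);
INSERT INTO vehicle_assignment VALUES
    (1, 10, '2024-01-01'), (1, 11, '2024-01-02'), (2, 10, '2024-01-03');
""")

# One employee, many assignments; one vehicle, many assignments.
vehicles_for_employee_1 = [
    v for (v,) in conn.execute(
        "SELECT vehicle_id FROM vehicle_assignment "
        "WHERE employee_id = 1 ORDER BY vehicle_id"
    )
]
employees_for_vehicle_10 = [
    e for (e,) in conn.execute(
        "SELECT employee_id FROM vehicle_assignment "
        "WHERE vehicle_id = 10 ORDER BY employee_id"
    )
]
```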
The InterSwift haulage company is currently developing a database to store
customer orders and customer details.
Suppose you are a database designer.
Here you are to identify data items and the associations between the data items as
a preliminary step to constructing a conceptual model of InterSwift's customer
database.
Your inquiry is based on information gathered during requirements analysis.
During the requirements analysis stage of database design, you obtain different
perspectives of a company's business.
This can result in inconsistent views of the database, incompatible data concepts,
and redundant data.
When different views of the database are depicted conceptually, the models may
contain incompatible relationships and terminology.
For example, a marketing department might view the concept of product as a unit
of sales.
But manufacturing might view product in terms of raw materials and personnel.
View incompatibility can result in the same concept being modeled as an entity,
attribute, or relationship, depending on perspective.
To illustrate this, let's model the statement "the customer places an order for a
product" from three different perspectives.
Here customer, order, and product are depicted as entities, and "places", and
"consists of" as relationships.
This model defines "orders" as a relationship between customer and product.
Finally, the relationship "orders" is replaced by another relationship, "purchases".
And order is designated as an attribute of the PRODUCT entity.
A vital stage in the database design process is the merging of different views into a
unified, nonredundant conceptual model.
This integrated conceptual model is known as a global schema.
Each relationship in a global schema is based on verifiable assertions about the
actual data in an enterprise.
The global schema is the result of the resolution of all differences in perspective and
terminology between end-user views.
One approach to integrating different views is to merge just two models at a time.
This is known as the binary approach. It has the advantage of involving a small
number of ER constructs, and it is relatively easy to conceptualize.
The approach described here is based on an integration methodology defined by
Batini, Lenzerini, and Navathe.
It involves the following steps:
comparison of models
conformation or alignment of models
merging and restructuring of models
First you compare the conceptual models to detect potential conflicts.
Type conflicts arise if different ER constructs are used to model the same concept.
For example, in this diagram an entity and an attribute are used to model the
concept of order.
This can be resolved by maintaining the entity type ORDER and moving the
attribute type Order_No from PRODUCT to ORDER.
Dependency conflicts result when users specify different levels of connectivity for
the same concepts.
Consider a situation where invoices are cleared by multiple payments in the
CUSTOMER view.
But the billing manager says that the COMPANY view is that a single payment is
made for each invoice.
The solution to dependency conflicts is to select the more general connectivity.
In this case the INVOICE to PAYMENT relationship should be a one-to-many
relationship.
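The rule of selecting the more general connectivity can be sketched as a small Python function. The ordering used here, from one-to-one up to many-to-many, and the function name are simplifying assumptions. The sketch also ignores the direction of a one-to-many relationship.

```python
# Connectivities ordered from most specific to most general.
GENERALITY = ["one-to-one", "one-to-many", "many-to-many"]


def resolve_dependency_conflict(view_a, view_b):
    """Return the more general of two connectivities for the same relationship."""
    return max(view_a, view_b, key=GENERALITY.index)


# CUSTOMER view: an invoice is cleared by multiple payments.
# COMPANY view: a single payment is made for each invoice.
resolved = resolve_dependency_conflict("one-to-many", "one-to-one")
```

For the INVOICE to PAYMENT example the function selects one-to-many, which accommodates both views.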
Key conflicts occur when different primary keys are assigned to the same entity in
different views.
For example, a conflict results if the social security number (Employee_SSN) and
Employee_Id are both assigned as primary keys of EMPLOYEE through different
views.
Once you identify potential conflicts, you then align the models to make them
compatible for integration.
You ensure that concepts conform to only one primitive data model type.
And you remove all redundant concepts from a global schema.
This includes overlapping entities and semantically redundant relationships.
Many-to-many relationships are not redundant, but they are typically eliminated
from ER models as early as possible in the design process.
Like the classification of entities and attributes, the view integration process is
characterized by continual refinement and re-evaluation.
Each stage of refinement is informed by the need for clarity and completeness in
the global schema.
A global schema is the basis for developing normalized tables.
So it is important that you verify all components with end-users before the ER model
is transformed to SQL tables.
ER analysis uses a diagrammatic syntax to represent data requirements.
For example, the statement "the company manufactures products" is presented like
this.
In ER syntax, entities normally correspond to nouns and relationships to verbs.
Attributes are characteristics of entities.
At the requirements analysis stage of the design process, you can list data items
under a common heading.
For example, a customer's name and address can be listed under "customer".
This will make it easier to distinguish between entities and attributes when you
begin to model the data.
You should include quantitative data in your requirements analysis.
Information regarding the total number of customers or the number of orders
processed allows you to establish the cardinality and connectivity of data
relationships.
Once the information is gathered and collated, a formal requirements specification
is drawn up.
Your requirements specification should describe an enterprise in terms of data items
and associations among data items.
A requirements specification can also include process models such as data flow
diagrams.
The requirements for InterSwift's database are deduced from a number of different
sources.
You have a brief description of the enterprise.
You have access to company records and sales documentation.

And you have conducted interviews with both management and customers.
Your first requirement is to establish the nature of the business.
An enterprise typically has customers who place orders for products - or services -
that are manufactured or provided by employees.
These are fundamental data-processing and storage requirements.
Enterprise is usually implicit in a database and isn't depicted in specifications or
conceptual models.
InterSwift specializes in transporting goods within the US.
The company's headquarters are in New York.
But throughout the US, regional depots handle orders for the transportation of
goods.
A network of vans and lorries is used to collect and deliver the goods within a local
area.
From the description of the company, you can derive that
the nature of the business is haulage
the company is divided into regional depots
vehicles are used to transport goods
These concepts are included in your requirements specification.
Because InterSwift is a haulage business, you can derive the data concept
RECIPIENT.
This is based on the assumption that goods must be delivered somewhere.
Customers supply InterSwift with the address of each RECIPIENT as part of the
ORDER description.
During your analysis to date, you have acquired two distinct views of the enterprise.
One - based on an interview with a delivery manager - conceives of the enterprise in
terms of order processing.
The other is based on interviews conducted with key customers and derived from
sales documentation such as invoices.
According to the depot manager, InterSwift has 22 regional depots.

In the last year the company has processed over 200,000 orders for 80,000
customers.
Here you see the cardinality for DEPOT, CUSTOMER, and ORDER.
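The figures quoted above allow a rough arithmetic check on the connectivity between CUSTOMER and ORDER. The sketch below treats "over 200,000" as exactly 200,000, so the result is a lower bound, and it assumes each order belongs to a single customer.

```python
# Figures reported for the last year (the order count is a lower bound).
orders = 200_000
customers = 80_000

# Average number of orders placed per customer.
orders_per_customer = orders / customers

# An average above one means at least one customer placed several orders,
# which is consistent with a one-to-many CUSTOMER to ORDER relationship.
connectivity = "one-to-many" if orders_per_customer > 1 else "one-to-one"
```

The average works out at 2.5 orders per customer, so a customer can be associated with many orders.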
Each order is uniquely identified by an order number.

The company defines its delivery employees as working in a depot.
The company currently employs 1500 delivery personnel, who are supervised by
depot managers.
Personnel records contain details of each employee's
name
address
Id number
You have access to the vehicle registration and insurance documentation for each of
the 126 lorries and 175 vans in the fleet.
During your interview, the delivery manager describes the order process in great
detail.
Each order is assigned a unique number.
It consists of different items.
And it may be stored in a depot prior to delivery.
This sentence suggests an optional existence constraint in the database design,
because not every order is stored in a depot before delivery.
The destination of each order is specified by a unique recipient number.
Your customer interview reveals a different view of the enterprise's data
requirements.
Shoptimum is InterSwift's main customer.
It regularly places orders for the collection and delivery of goods.
And it supplies detailed information on the recipient of the delivered goods.
Shoptimum receives an invoice monthly for all orders carried out during that period.
InterSwift insists that customer invoices be cleared by a single payment.
Your interview discloses this data.
And you know the period for which invoices are issued.
This information allows you to make some assertions about data associations in the
enterprise that can be used to model relationships.
For example, the statement "customer receives many invoices" can be shown as
this entity relationship in a conceptual model.
InterSwift keeps detailed records of all customers and business transactions.
After analyzing this invoice, you can identify customers by
name
address
Id number
Because "customer" is already included in your specification, you can consider
these data items as customer attributes.
You can also derive balance details and payment details.
Here's the full table of the data requirements from the customer's perspective.
First let's consider data from an order processing perspective.
Based on your analysis, you can identify a number of provisional entities, such as
CUSTOMER and ORDER.
InterSwift is a haulage company, so its product is the delivery of goods.
During your analysis, you have recorded a great deal of information from delivery
dockets.
For example, you know the content, recipient, and mode of carriage for each order
that is delivered.
Delivery information can be recorded as part of the ORDER entity.
But the nature of the business and the type and volume of data to be recorded
suggest that you require a separate DELIVERY entity.
You can identify provisional relationships between the various entities.
You have decided that only binary relationships are to be included in the conceptual
model.
This means that the degree of each relationship is two.
One way of logically developing relationships is to isolate sentences that define
entities.
You know that "a customer places an order".
And that "an order is sometimes handled by regional depots".
Here is a provisional model of the system's data relationships.
You can now analyze the relationships for redundancy.

There is some redundancy between DELIVERY and ORDER in terms of the
relationships involving EMPLOYEE and VEHICLE.
This can be resolved if you remove the relationship between EMPLOYEE and ORDER
and the relationship between VEHICLE and ORDER.
In this model the relationships between EMPLOYEE, VEHICLE, and DELIVERY are
anomalous.
But you can eliminate the redundant relationship between EMPLOYEE and DELIVERY
and maintain the relationship between EMPLOYEE and VEHICLE.
Here is a revised ER model with the redundant relationships eliminated.
You have derived the cardinality for many of the entities from company files.
This allows you to map the connectivity of each relationship.

For example, the connectivity between ORDER and CUSTOMER is one-to-many.
This diagram shows examples of optional and mandatory existence in relationships.
The use of optional existence between DEPOT and ORDER is derived from this
assertion.
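One way the optional existence between DEPOT and ORDER might carry over to a
relational table is a nullable foreign key. The following is a minimal sketch
using Python's standard sqlite3 module. The table and column names (depot,
customer_order, depot_id) are assumptions made for illustration and are not
part of the InterSwift design.

```python
import sqlite3

# Hypothetical names: the lesson only names the entities DEPOT and ORDER.
# "customer_order" is used because ORDER is a reserved word in SQL.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when asked
conn.executescript("""
CREATE TABLE depot (
    depot_id   INTEGER PRIMARY KEY,
    depot_name TEXT NOT NULL
);
CREATE TABLE customer_order (
    order_id INTEGER PRIMARY KEY,
    -- Nullable foreign key: an order may exist without a depot (optional existence).
    depot_id INTEGER REFERENCES depot(depot_id)
);
""")
conn.execute("INSERT INTO depot VALUES (1, 'North depot')")
conn.execute("INSERT INTO customer_order VALUES (100, 1)")     # stored in a depot first
conn.execute("INSERT INTO customer_order VALUES (101, NULL)")  # never stored in a depot

stored_in_depot = conn.execute(
    "SELECT COUNT(*) FROM customer_order WHERE depot_id IS NOT NULL"
).fetchone()[0]
not_stored = conn.execute(
    "SELECT COUNT(*) FROM customer_order WHERE depot_id IS NULL"
).fetchone()[0]
print(stored_in_depot, not_stored)
```

Declaring depot_id as NOT NULL instead would express mandatory existence: every
order would then have to reference a depot.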
This model contains a many-to-many relationship between EMPLOYEE and VEHICLE.
You must resolve this into two one-to-many relationships.
You do this by devising the associative entity VEHICLE_ASSIGNMENT.
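As a rough illustration, the associative entity could later be implemented as
a table whose primary key combines the keys of the two entities it links. This
is only a sketch using Python's standard sqlite3 module. Apart from
VEHICLE_ASSIGNMENT, EMPLOYEE, and VEHICLE, every column name and data value
below is an assumption.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE employee (
    employee_id INTEGER PRIMARY KEY,
    name        TEXT NOT NULL
);
CREATE TABLE vehicle (
    vehicle_id   INTEGER PRIMARY KEY,
    registration TEXT NOT NULL
);
-- Associative entity: one row per (employee, vehicle) pairing.
-- EMPLOYEE 1:N VEHICLE_ASSIGNMENT and VEHICLE 1:N VEHICLE_ASSIGNMENT.
CREATE TABLE vehicle_assignment (
    employee_id INTEGER NOT NULL REFERENCES employee(employee_id),
    vehicle_id  INTEGER NOT NULL REFERENCES vehicle(vehicle_id),
    PRIMARY KEY (employee_id, vehicle_id)
);
""")
# Made-up sample data for the sketch.
conn.executemany("INSERT INTO employee VALUES (?, ?)", [(1, "Ann"), (2, "Ben")])
conn.executemany("INSERT INTO vehicle VALUES (?, ?)", [(10, "VAN-10"), (11, "VAN-11")])
conn.executemany(
    "INSERT INTO vehicle_assignment VALUES (?, ?)",
    [(1, 10), (1, 11), (2, 10)],
)

# One employee can be assigned many vehicles ...
vehicles_for_ann = [row[0] for row in conn.execute(
    "SELECT vehicle_id FROM vehicle_assignment WHERE employee_id = 1 ORDER BY vehicle_id"
)]
# ... and one vehicle can be assigned to many employees.
employees_for_van_10 = [row[0] for row in conn.execute(
    "SELECT employee_id FROM vehicle_assignment WHERE vehicle_id = 10 ORDER BY employee_id"
)]
print(vehicles_for_ann, employees_for_van_10)
```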
After modeling the main entities and relationships in this view, you can define a
number of provisional attributes.
According to the order processing manager, orders are collected and delivered on a
specific day and at a particular time.
And your analysis disclosed information about the type of order and some data
concerning specific order items.
From this you can deduce these attributes.
You discovered that ORDERS are uniquely identified by an order number.
Therefore, you can assign Order_Id as an identifier attribute or primary key of
ORDER.
From analyzing delivery dockets, you discovered these data items.
Your data requirements include EMPLOYEE details.
So you define these attributes.
The company assigns each depot a unique Id number and the delivery is made from
a particular depot.
Therefore some attributes of DEPOT are
Depot_Name
Depot_Address
Depot_Id
During requirements analysis, you distinguish between management and delivery
personnel.
These concepts have been subsumed into the EMPLOYEE entity.
The recursive relationship EMPLOYEE supervises EMPLOYEE depicts how the
company defines its management structure.
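A recursive relationship of this kind is commonly implemented as a foreign key
that points back to the same table. The sketch below uses Python's standard
sqlite3 module and assumes that each employee has at most one supervisor. The
column names and sample rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE employee (
    employee_id   INTEGER PRIMARY KEY,
    name          TEXT NOT NULL,
    -- Self-referencing foreign key: EMPLOYEE supervises EMPLOYEE.
    -- NULL marks an employee with no supervisor recorded.
    supervisor_id INTEGER REFERENCES employee(employee_id)
);
""")
# Made-up sample data: one manager supervising two delivery personnel.
conn.executemany(
    "INSERT INTO employee VALUES (?, ?, ?)",
    [(1, "Manager", None), (2, "Driver A", 1), (3, "Driver B", 1)],
)

# Join the table to itself to list everyone supervised by employee 1.
supervised = [row[0] for row in conn.execute("""
    SELECT e.name
    FROM employee AS e
    JOIN employee AS s ON e.supervisor_id = s.employee_id
    WHERE s.employee_id = 1
    ORDER BY e.name
""")]
print(supervised)
```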
When deliveries are made, each customer is issued with a RECEIPT detailing the
content, cost, and period of each order.
Here is a conceptual model of the ORDER PROCESSING view.
Now let's construct a conceptual model of the CUSTOMER view.
From your analysis, you have enough descriptive information to identify five entities.
Based on assertions about the business, you can describe a number of data relations.
For example, "A customer places an order" and "An order is delivered to a
recipient".
There is also an implicit relationship between INVOICE and PAYMENT.
Here is an ER model that represents all the data relationships that you have so far
identified.
In this view a CUSTOMER places multiple ORDERS.
And each order is delivered to a single recipient.
This means that the connectivity between CUSTOMER and ORDER is one-to-many.
And the connectivity between ORDER and RECIPIENT is one-to-one.
Representing the relationship between CUSTOMER and INVOICE in ER syntax is
more difficult than it first appears.
But if you analyze the conditions of the relationship, you can depict it logically.
InterSwift sends an invoice to each customer monthly for all orders carried out in
that period.
And the company insists that each invoice is cleared by a single payment.
In addition, it is logical that the company will issue more than one invoice to regular
customers.
This is the most logical depiction of the relationship between CUSTOMER and
INVOICE.
The connectivity is one-to-many, and you can assign Invoice_Period as the primary
key of INVOICE.
It may emerge at a later stage in the design process that Invoice_Period doesn't
make sense as a primary key. In fact, it may need to be joined to another attribute,
such as Customer_Id in order to uniquely define every occurrence of the entity
INVOICE.
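To see why Invoice_Period alone may not be enough, consider two customers who
are invoiced for the same month. The sketch below, written with Python's
standard sqlite3 module, uses a composite key of Customer_Id and Invoice_Period.
It assumes the period is stored as a 'YYYY-MM' string, and the second customer
is made up for the illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE customer (
    customer_id   INTEGER PRIMARY KEY,
    customer_name TEXT NOT NULL
);
CREATE TABLE invoice (
    customer_id      INTEGER NOT NULL REFERENCES customer(customer_id),
    invoice_period   TEXT NOT NULL,   -- assumed format: 'YYYY-MM'
    invoice_due_date TEXT,            -- non-key attribute
    -- Composite key: one invoice per customer per period.
    PRIMARY KEY (customer_id, invoice_period)
);
""")
conn.executemany(
    "INSERT INTO customer VALUES (?, ?)",
    [(1, "Shoptimum"), (2, "Hypothetical second customer")],
)
# Two customers can share a period, and one customer can have many periods.
conn.executemany(
    "INSERT INTO invoice (customer_id, invoice_period) VALUES (?, ?)",
    [(1, "2024-01"), (2, "2024-01"), (1, "2024-02")],
)

# A second invoice for the same customer and period violates the key.
try:
    conn.execute(
        "INSERT INTO invoice (customer_id, invoice_period) VALUES (1, '2024-01')"
    )
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

invoice_count = conn.execute("SELECT COUNT(*) FROM invoice").fetchone()[0]
print(duplicate_rejected, invoice_count)
```

With Invoice_Period as the only key column, the second customer's invoice for
the same month would have been rejected as a duplicate.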
You can assign Invoice_Due_Date as a non-key attribute.
Attributes are characteristics of entities.
Therefore, the following are attributes of CUSTOMER:
Customer_Name
Customer_Address
Customer_Id
Your analysis of delivery dockets and company statements revealed these
attributes.
Each invoice sent by the company contains details of the delivered items, so you
define this as an attribute of INVOICE.
This is a fully attributed model of the CUSTOMER view.
Here Address is a multivalued attribute of CUSTOMER and RECIPIENT.
Therefore, you can classify ADDRESS as an entity.
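The effect of promoting a multivalued attribute to an entity can be sketched as
a separate table that holds one row per address. The example uses Python's
standard sqlite3 module and shows only the CUSTOMER side. How ADDRESS links to
RECIPIENT is left out, and all column names and address values are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE customer (
    customer_id   INTEGER PRIMARY KEY,
    customer_name TEXT NOT NULL
);
-- One row per address, so a customer can have any number of them.
CREATE TABLE address (
    address_id  INTEGER PRIMARY KEY,
    customer_id INTEGER NOT NULL REFERENCES customer(customer_id),
    address     TEXT NOT NULL
);
""")
conn.execute("INSERT INTO customer VALUES (1, 'Shoptimum')")
# Placeholder address values, invented for the sketch.
conn.executemany(
    "INSERT INTO address (customer_id, address) VALUES (?, ?)",
    [(1, "Head office (placeholder)"), (1, "Warehouse (placeholder)")],
)

address_count = conn.execute(
    "SELECT COUNT(*) FROM address WHERE customer_id = 1"
).fetchone()[0]
print(address_count)
```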
During the requirements analysis process you obtain two views of the database.
The first view defines the database in terms of order processing.
Here is the ER model of this view.
The second view defines the company in terms of orders delivered.

It incorporates all the relationships associated with the entities CUSTOMER and
ORDER, as seen from a customer's perspective.
Here is the ER model for this view.
The integration of the two database views results in a global ER schema for the
entire database, which becomes the basis for developing normalized tables.
Before integration can be attempted, however, you must resolve all differences in
terminology and perspective between the two views.
Your first step to successful integration is to compare the models.
You need to identify naming and structural conflicts, as well as incompatible
entities.
Structural conflicts include
type
dependency
key
There is a type conflict between the entity DELIVERY in the ORDER view and the
relationship ORDER delivered to RECIPIENT in the CUSTOMER view.
This can be resolved by maintaining the stronger entity type DELIVERY and
eliminating the relationship "delivered to".
You notice a naming conflict between RECEIPT in the ORDER view and INVOICE in
the CUSTOMER view because both entities are essentially synonymous.
After further analysis, you discover that these attributes are compatible and that the
entities can be consolidated.
Because RECEIPT is a rather ambiguous entity, you decide that the data is better
represented by the entity INVOICE.
There is a dependency conflict between the CUSTOMER and ORDER relationships as
they appear in the two views.
From the ORDER perspective, the relationship reads one ORDER to many
CUSTOMERS.
But from the CUSTOMER point of view, the relationship reads one CUSTOMER to
many ORDERS.
Typically, you resolve a dependency conflict by maintaining a more general
connectivity.
Therefore CUSTOMER and ORDER can be mapped as a many-to-many relationship.
You have decided to resolve all many-to-many relationships in the InterSwift model
into two one-to-many relationships.
And because this is a customer-tracking database, the best solution is to map one
CUSTOMER to many ORDERS.
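The comparison that exposes a dependency conflict can be sketched in a few
lines of Python. Each view is reduced to a dictionary that maps an entity pair
to its connectivity. This representation and the function name are assumptions
made for illustration. They are not a standard tool, and the views are cut down
to a few relationships.

```python
# Each view maps an entity pair (in alphabetical order) to its connectivity,
# read left to right: "1:N" means one of the first entity to many of the second.
order_view = {
    ("CUSTOMER", "ORDER"): "N:1",    # one ORDER to many CUSTOMERS
    ("EMPLOYEE", "VEHICLE"): "M:N",  # as first modeled, before VEHICLE_ASSIGNMENT
}
customer_view = {
    ("CUSTOMER", "ORDER"): "1:N",    # one CUSTOMER to many ORDERS
    ("CUSTOMER", "INVOICE"): "1:N",
}

def dependency_conflicts(view_a, view_b):
    """Return the entity pairs that both views contain with different connectivity."""
    shared = view_a.keys() & view_b.keys()
    return sorted(pair for pair in shared if view_a[pair] != view_b[pair])

conflicts = dependency_conflicts(order_view, customer_view)

# The designer's decision for each conflicting pair, recorded explicitly.
resolutions = {("CUSTOMER", "ORDER"): "1:N"}

# Later dictionaries win, so the recorded resolutions override both views.
merged = {**order_view, **customer_view, **resolutions}
print(conflicts)
print(merged[("CUSTOMER", "ORDER")])
```

Keeping the resolutions in their own dictionary leaves a record of which
connectivity was a design decision and which came straight from a view.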
There are no apparent key conflicts between the two views.
At this point you have sufficient commonality between the models to attempt a
merge.
You have three sets of common entities - CUSTOMER, ORDER, and RECIPIENT.
Other entities do not overlap, and appear intact in the merged global schema.
Because the common entities are equivalent, there are no structural conflicts due to
relations existing in one view but not in the other.
The entity type ADDRESS that you introduced in the CUSTOMER view creates a data
anomaly in the data model.
After merging the views, you need to eliminate the Address attributes from the
CUSTOMER, RECIPIENT, DEPOT, and EMPLOYEE entities.
And you need to define the ADDRESS_TYPE entity to accommodate the different
categories of address data.
This is your global schema for the InterSwift database.

All the data concepts discovered during requirements analysis are represented.
And it unifies the conflicting ORDER and CUSTOMER views to give a clear picture of
data relations in the enterprise.
The global schema undergoes further alterations to maintain data integrity and to
enhance performance.
