As a designer, your primary aim is to provide for the efficient storage, update, and
retrieval of data.
You must also ensure that the data stored in the database is reliable.
In addition, your design should aim to create a database that is fully scalable and
adaptable to new requirements.
A database is part of a larger information system that is used to manage the
resources of an organization.
Database design takes place as part of the software development of an information
system.
Software development typically follows a number of stages including
requirements
design
implementation
maintenance
At the requirements stage of software development you establish the data and data
processing needs of your system.
Data requirements are determined by the producers and users of data in a system.
By analyzing data requirements, system programmers can plan the applications
that meet your data processing needs.
When software is being developed, the database is designed in conjunction with the
technical architecture of the software and applications
infrastructure on which the users will interact with the software and data
The final stage of software development is the implementation and maintenance of
the system.
Developing a database design involves three distinct roles:
designing the applications which will meet your data processing needs
identifying data structures for the database
implementing the database design on a software platform
Depending on the scale of the project, these roles can be carried out by a single
individual or by a number of individuals working in close collaboration.
The process of database design incorporates the basic steps involved in defining
and modeling data requirements, and normalizing design.
It also involves modifying the design and allocating data structures for optimal
performance.
The stages in database design include
requirements analysis
logical design
physical design
database implementation
Requirements analysis takes place before the design process begins.
Its aim is to gather information about the current state of your organization.
In the context of database design, requirements analysis refers only to the data that
is to be in the database and not the processes.
Requirements analysis determines the information you need to store in the
database and how that information is used.
You can gather this information by interviewing the users and producers of data as
well as by analyzing company documentation.
After the data has been gathered, you produce a formal requirements specification.
This broadly defines the data to be processed and the data relationships in an
information system.
Logical design is the representation of the structure of a database - showing all the
data and the relationships between the data.
Once the requirements analysis has been completed, the designer looks at the
specifications and determines how they will translate to the database.
This can be done using a conceptual modeling technique such as ER analysis.
A conceptual data model is a diagrammatic representation of the requirements
specification.
It displays the characteristics of each entity and attribute in the database, as well as
the relationships between them.
Relationships between entities are described in terms of their
degree
connectivity
cardinality
existence
Degree refers to the number of entities associated in the relationship.
In a binary relationship - an association between two entities - the degree is 2.
Here we see an example of a typical binary relationship.
A special category of binary relationship is shown here.
This means that there is one entity occurrence of COMPANY associated with many
occurrences of EMPLOYEE.
If expressed in natural language rather than ER notation, it would read that a
company employs at least one and probably many employees.
The number associated with the connectivity of a relationship is called the
cardinality.
Minimum cardinality is specified either by O (for zero) or | (for one) on the
connection line between entities.
On the many side of a relationship the maximum cardinality is either constant or
variable.
The lowest known value for maximum cardinality is one - denoted by |.
When the maximum cardinality is variable, it is specified by a crow's foot to
designate many.
Some situations are such that the actual maximum cardinality is known.
Here the minimum cardinality for COMPANY is 1 and the maximum is 1.
This is described as 'one and only one' and is shown by the symbol | |.
On the other side of the relationship, the minimum cardinality for EMPLOYEE is one
and the maximum is many.
In natural language, the cardinality for both entities signifies that each employee
works for 'one and only one' company.
Some enterprises have entities whose existence depends on the existence of
another entity.
The existence of an entity in a relationship is defined as either mandatory or
optional.
An entity is considered mandatory if an occurrence of the entity must always exist
for it to be included in a relationship.
For example, in this relationship the entity EMPLOYEE is dependent on the
permanent existence of COMPANY.
When an occurrence of an entity need not always exist for the entity to be included
in a relationship, it is considered optional.
For example, the entity ORDER may or may not be held at any DEPOT, therefore the
entity DEPOT in the relationship between ORDER and DEPOT is considered optional.
Optional existence defines a minimum cardinality of zero.
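As a rough illustration, the cardinality and existence rules described above can be
written down as a small Python structure. This is only a sketch: the Participation
class and its field names are hypothetical and not part of any ER notation, and the
maximum of one DEPOT per ORDER is an assumption made for the example.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Participation:
    """One end of a relationship: an entity plus its cardinality bounds."""

    entity: str
    min_card: int             # 0 = optional existence, 1 = mandatory existence
    max_card: Optional[int]   # None = variable maximum ("many", the crow's foot)

    @property
    def existence(self) -> str:
        # Optional existence corresponds to a minimum cardinality of zero.
        return "optional" if self.min_card == 0 else "mandatory"

    @property
    def connectivity(self) -> str:
        # A maximum of exactly one is "one"; anything larger or unbounded is "many".
        return "one" if self.max_card == 1 else "many"

    def describe(self) -> str:
        upper = "many" if self.max_card is None else str(self.max_card)
        return f"{self.entity}: min {self.min_card}, max {upper} ({self.existence})"


# COMPANY is "one and only one"; EMPLOYEE is "one or many".
company = Participation("COMPANY", min_card=1, max_card=1)
employee = Participation("EMPLOYEE", min_card=1, max_card=None)
# DEPOT is optional in its relationship with ORDER (max of 1 is assumed here).
depot = Participation("DEPOT", min_card=0, max_card=1)
```

Calling company.describe() or depot.existence then gives a plain-text reading of each
end of the relationship, which can be compared against the diagram.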
Master and Detail are not types of entity - rather they are roles that an entity plays
in a relationship.
An entity can be a Master in one relationship and a Detail in another.
Here you see the basic constructs for representing connectivity using Chen76
notation.
On the one side, the number one is shown on the connection between the
relationship and one of the entities.
And on the other side, the letter N - or sometimes M - is used on the connection line
to designate the concept of many.
Mandatory existence is denoted by a line perpendicular to the relationship line.
The shaded side of the relationship diamond implies many and the unshaded side
implies one.
Here are the typical connectivity constructs in Rein85 form.
All three are reasonably easy to learn and read, and they offer a simple approach to
ER modeling.
Although efforts at standardization are currently underway, there is no agreed
convention for the ER model.
And there are many other constructs in addition to the three types discussed.
For example, a broken connection line sometimes represents optional existence.
In ER analysis the data associated with an entity is known as the entity data
content.
It is typically expressed as a list of attributes.
Each attribute corresponds to a data item.
Here is an employee's personnel file.
Every entity occurrence has a single attribute, or some composite of attributes, that
uniquely identifies it.
This is known as the identifier or primary key.
Here the primary key of EMPLOYEE is Employee_Id.
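To see what a unique identifier means in practice, here is a minimal sketch using
Python's built-in sqlite3 module. Only Employee_Id comes from the text; the name
column, the sample rows, and the helper function are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE employee (
        employee_id INTEGER PRIMARY KEY,  -- the identifier named in the text
        name        TEXT NOT NULL         -- hypothetical attribute
    )
    """
)
conn.execute("INSERT INTO employee (employee_id, name) VALUES (1, 'A. Smith')")


def insert_duplicate() -> bool:
    """Return True if the database rejects a second row with the same key."""
    try:
        conn.execute("INSERT INTO employee (employee_id, name) VALUES (1, 'B. Jones')")
    except sqlite3.IntegrityError:
        return True
    return False


duplicate_rejected = insert_duplicate()
employee_count = conn.execute("SELECT COUNT(*) FROM employee").fetchone()[0]
```

Because the second insert reuses Employee_Id 1, the database refuses it and the table
still holds a single occurrence of EMPLOYEE.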
Each depot supervises delivery personnel, processes orders, and maintains the
company's fleet of delivery vehicles.
Two relationships are allowed between two or more entities, as long as the two
relationships have different meanings.
Consider this situation, where an EMPLOYEE works on ORDER that is held in a
DEPOT.
The third relationship, EMPLOYEE works in DEPOT, isn't redundant because it doesn't
relate to the others.
Redundant relationships result in un-normalized tables when you transform the ER
model into relational schemas.
Attributes can be assigned to relationships as well as entities.
But they are typically assigned only to many-to-many relationships.
This is because in one-to-one and one-to-many relationships there is no ambiguity
in assigning an attribute to a particular entity.
Consider the one-to-many relationship between DEPOT and EMPLOYEE.
If the attribute Task_Assignment is applied to DEPOT, it designates the particular
tasks carried out by that depot.
And if the same attribute is applied to EMPLOYEE, it specifies the particular tasks
carried out by each occurrence of that entity.
This ensures that each instance of the relationship that connects one employee with
one depot has a unique Task_Assignment for that employee in that depot.
Many-to-many relationships are suitable only for preliminary data models.
And they are usually resolved early in the database design process.
All many-to-many relationships can be decomposed into a pair of one-to-many
relationships, by the introduction of an associative entity.
The associative entity is related to the two entities that participated in the
many-to-many relationship.
To decompose the relationship between EMPLOYEE and VEHICLE, you introduce the
associative entity VEHICLE_ASSIGNMENT.
This represents one employee's relationship with one vehicle at a given time.
And the two one-to-many relationships resulting from the decomposition are
EMPLOYEE to VEHICLE_ASSIGNMENT and VEHICLE to VEHICLE_ASSIGNMENT.
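The decomposition can be sketched as three tables using Python's sqlite3 module. The
entity names follow the text; the column names, the assigned_on column (standing in
for "at a given time"), and the sample rows are assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript(
    """
    CREATE TABLE employee (employee_id INTEGER PRIMARY KEY);
    CREATE TABLE vehicle  (vehicle_id  INTEGER PRIMARY KEY);
    -- Associative entity: each row links one employee to one vehicle at one time.
    CREATE TABLE vehicle_assignment (
        employee_id INTEGER NOT NULL REFERENCES employee(employee_id),
        vehicle_id  INTEGER NOT NULL REFERENCES vehicle(vehicle_id),
        assigned_on TEXT    NOT NULL,
        PRIMARY KEY (employee_id, vehicle_id, assigned_on)
    );
    """
)
conn.executemany("INSERT INTO employee VALUES (?)", [(1,), (2,)])
conn.executemany("INSERT INTO vehicle VALUES (?)", [(10,), (11,)])
conn.executemany(
    "INSERT INTO vehicle_assignment VALUES (?, ?, ?)",
    [(1, 10, "2024-01-01"), (1, 11, "2024-01-02"), (2, 10, "2024-01-03")],
)

# One employee, many assignments.
vehicles_of_employee_1 = [
    row[0]
    for row in conn.execute(
        "SELECT vehicle_id FROM vehicle_assignment "
        "WHERE employee_id = 1 ORDER BY vehicle_id"
    )
]
# One vehicle, many assignments.
drivers_of_vehicle_10 = [
    row[0]
    for row in conn.execute(
        "SELECT employee_id FROM vehicle_assignment "
        "WHERE vehicle_id = 10 ORDER BY employee_id"
    )
]
```

Each query follows one of the two one-to-many relationships, so the original
many-to-many association is still recoverable from the associative entity.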
During the requirements analysis stage of database design, you obtain different
perspectives of a company's business.
This can result in inconsistent views of the database, incompatible data concepts,
and redundant data.
When different views of the database are depicted conceptually, the models may
contain incompatible relationships and terminology.
For example, a marketing department might view the concept of product as a unit
of sales.
But manufacturing might view product in terms of raw materials and personnel.
View incompatibility can result in the same concept being modeled as an entity,
attribute, or relationship, depending on perspective.
To illustrate this, let's model the statement "the customer places an order for a
product" from three different perspectives.
Here customer, order, and product are depicted as entities, and "places" and
"consists of" as relationships.
A vital stage in the database design process is the merging of different views into a
unified, nonredundant conceptual model.
This integrated conceptual model is known as a global schema.
Each relationship in a global schema is based on verifiable assertions about the
actual data in an enterprise.
The global schema is the result of the resolution of all differences in perspective and
terminology between end-user views.
One approach to integrating different views is to merge just two models at a time.
This is known as the binary approach. It has the advantage of involving a small
number of ER constructs, and it is relatively easy to conceptualize.
The approach described here is based on an integration methodology defined by
Batini, Lenzerini, and Navathe.
It involves the following steps:
comparison of models
conformation or alignment of models
merging and restructuring of models
First you compare the conceptual models to detect potential conflicts.
Type conflicts arise if different ER constructs are used to model the same concept.
For example, in this diagram an entity and an attribute are used to model the
concept of order.
This can be resolved by maintaining the entity type ORDER and moving the
attribute type Order_No from PRODUCT to ORDER.
Dependency conflicts result when users specify different levels of connectivity for
the same concepts.
Consider a situation where invoices are cleared by multiple payments in the
CUSTOMER view.
But the billing manager says that the COMPANY view is that a single payment is
made for each invoice.
The solution to dependency conflicts is to select the more general connectivity.
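The rule of selecting the more general connectivity can be sketched as a small
Python function. The representation of connectivity as a pair of strings and the
function names are hypothetical; the rule itself is the one stated above.

```python
from typing import Tuple

# (left side, right side); each side is either "one" or "many".
Connectivity = Tuple[str, str]


def more_general(side_a: str, side_b: str) -> str:
    """'many' is more general than 'one', so it wins whenever either view uses it."""
    return "many" if "many" in (side_a, side_b) else "one"


def resolve_dependency_conflict(view_a: Connectivity, view_b: Connectivity) -> Connectivity:
    """Combine two views of the same relationship, side by side."""
    return (more_general(view_a[0], view_b[0]), more_general(view_a[1], view_b[1]))


# INVOICE to PAYMENT: the CUSTOMER view says one-to-many, the COMPANY view one-to-one.
invoice_payment = resolve_dependency_conflict(("one", "many"), ("one", "one"))
```

Here invoice_payment comes out as ("one", "many"), which still accommodates the
single-payment case. Two views that disagree on both sides, such as one-to-many and
many-to-one, resolve to many-to-many under the same rule.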
Once you identify potential conflicts, you then align the models to make them
compatible for integration.
You ensure that concepts conform to only one primitive data model type.
And you remove all redundant concepts from a global schema.
This includes overlapping entities and semantically redundant relationships.
Many-to-many relationships are not redundant, but they are typically eliminated
from ER models as early as possible in the design process.
Like the classification of entities and attributes, the view integration process is
characterized by continual refinement and re-evaluation.
Each stage of refinement is informed by the need for clarity and completeness in
the global schema.
A global schema is the basis for developing normalized tables.
So it is important that you verify all components with end-users before the ER model
is transformed to SQL tables.
ER analysis uses a diagrammatic syntax to represent data requirements.
For example, the statement "the company manufactures products" is presented like
this.
In ER syntax, entities normally correspond to nouns and relationships to verbs.
Attributes are characteristics of entities.
At the requirements analysis stage of the design process, you can list data items
under a common heading.
For example, a customer's name and address can be listed under "customer".
This will make it easier to distinguish between entities and attributes when you
begin to model the data.
You should include quantitative data in your requirements analysis.
Information regarding the total number of customers or the number of orders
processed allows you to establish the cardinality and connectivity of data
relationships.
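As a rough sketch of how quantitative data can suggest connectivity, the following
Python function inspects sample pairs of related occurrences. The function name and
the sample identifiers are hypothetical, and sample data can only suggest a
connectivity; the business rules gathered from users still decide it.

```python
from collections import defaultdict
from typing import Hashable, Iterable, Tuple


def infer_connectivity(pairs: Iterable[Tuple[Hashable, Hashable]]) -> str:
    """Suggest a connectivity from observed (left, right) occurrence pairs."""
    left_to_right = defaultdict(set)
    right_to_left = defaultdict(set)
    for left, right in pairs:
        left_to_right[left].add(right)
        right_to_left[right].add(left)

    # Largest number of right-hand occurrences seen for any single left-hand one,
    # and the other way around.
    max_right_per_left = max((len(v) for v in left_to_right.values()), default=0)
    max_left_per_right = max((len(v) for v in right_to_left.values()), default=0)

    left_side = "many" if max_left_per_right > 1 else "one"
    right_side = "many" if max_right_per_left > 1 else "one"
    return f"{left_side}-to-{right_side}"


# Hypothetical sample: customer C1 placed two orders, customer C2 placed one.
customer_orders = [("C1", "O1"), ("C1", "O2"), ("C2", "O3")]
customer_order_connectivity = infer_connectivity(customer_orders)
```

With this sample, no order belongs to more than one customer but one customer has
several orders, so the suggested connectivity is one-to-many.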
Once the information is gathered and collated, a formal requirements specification
is drawn up.
Your requirements specification should describe an enterprise in terms of data items
and associations among data items.
A requirements specification can also include process models such as data flow
diagrams.
The requirements for InterSwift's database are deduced from a number of different
sources.
You have a brief description of the enterprise.
For example, the statement "customer receives many invoices" can be shown as
this entity relationship in a conceptual model.
InterSwift keeps detailed records of all customers and business transactions.
Based on your analysis, you can identify a number of provisional entities, such as
CUSTOMER and ORDER.
InterSwift is a haulage company, so its product is the delivery of goods.
During your analysis, you have recorded a great deal of information from delivery
dockets.
For example, you know the content, recipient, and mode of carriage for each order
that is delivered.
You have derived the cardinality for many of the entities from company files.
The use of optional existence between DEPOT and ORDER is derived from this
assertion.
This model contains a many-to-many relationship between EMPLOYEE and VEHICLE.
You must resolve this into two one-to-many relationships.
You do this by devising the associative entity VEHICLE_ASSIGNMENT.
After modeling the main entities and relationships in this view, you can define a
number of provisional attributes.
According to the order processing manager, orders are collected and delivered on a
specific day and at a particular time.
And your analysis disclosed information about the type of order and some data
concerning specific order items.
The company assigns each depot a unique Id number and the delivery is made from
a particular depot.
From your analysis, you have enough descriptive information to identify five entities.
Based on assertions about the business, you can describe a number of data relations.
For example, "A customer places an order" and "An order is delivered to a
recipient".
There is also an implicit relationship between INVOICE and PAYMENT.
Here is an ER model that represents all the data relationships that you have so far
identified.
This means that the connectivity between CUSTOMER and ORDER is one-to-many.
And the connectivity between ORDER and RECIPIENT is one-to-one.
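These two connectivities can be sketched as table constraints with Python's sqlite3
module. The table and column names are hypothetical (the table is called orders
because ORDER is a reserved word in SQL), and placing the foreign key on the
recipient side is an assumption.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript(
    """
    CREATE TABLE customer (customer_id INTEGER PRIMARY KEY);
    -- One-to-many: many rows in orders may share one customer_id.
    CREATE TABLE orders (
        order_id    INTEGER PRIMARY KEY,
        customer_id INTEGER NOT NULL REFERENCES customer(customer_id)
    );
    -- One-to-one: UNIQUE stops a second recipient row for the same order.
    CREATE TABLE recipient (
        recipient_id INTEGER PRIMARY KEY,
        order_id     INTEGER NOT NULL UNIQUE REFERENCES orders(order_id)
    );
    """
)
conn.execute("INSERT INTO customer VALUES (1)")
conn.executemany("INSERT INTO orders VALUES (?, ?)", [(100, 1), (101, 1)])
conn.execute("INSERT INTO recipient VALUES (500, 100)")


def second_recipient_rejected() -> bool:
    """Return True if a second recipient for order 100 is refused."""
    try:
        conn.execute("INSERT INTO recipient VALUES (501, 100)")
    except sqlite3.IntegrityError:
        return True
    return False


orders_for_customer = conn.execute(
    "SELECT COUNT(*) FROM orders WHERE customer_id = 1"
).fetchone()[0]
one_to_one_enforced = second_recipient_rejected()
```

The customer holds two orders, while the UNIQUE constraint prevents an order from
acquiring a second recipient.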
Representing the relationship between CUSTOMER and INVOICE in ER syntax is
more difficult than it first appears.
But if you analyze the conditions of the relationship, you can depict it logically.
InterSwift sends an invoice to each customer monthly for all orders carried out in
that period.
And the company insists that each invoice is cleared by a single payment.
In addition, it is logical that the company will issue more than one invoice to regular
customers.
This is the most logical depiction of the relationship between CUSTOMER and
INVOICE.
The connectivity is one-to-many, and you can assign Invoice_Period as the primary
key of INVOICE.
It may emerge at a later stage in the design process that Invoice_Period doesn't
make sense as a primary key. In fact, it may need to be joined to another attribute,
The integration of the two database views results in a global ER schema for the
entire database, which becomes the basis for developing normalized tables.
Before integration can be attempted, however, you must resolve all differences in
terminology and perspective between the two views.
Your first step to successful integration is to compare the models.
You need to identify naming and structural conflicts, as well as incompatible
entities.
Structural conflicts include
type
dependency
key
There is a type conflict between the entity DELIVERY in the ORDER view and the
relationship ORDER delivered to RECIPIENT in the CUSTOMER view.
This can be resolved by maintaining the stronger entity type DELIVERY and
eliminating the relationship "delivered to".
You notice a naming conflict between RECEIPT in the ORDER view and INVOICE in
the CUSTOMER view because both entities are essentially synonymous.
After further analysis, you discover that these attributes are compatible and that the
entities can be consolidated.
Because RECEIPT is a rather ambiguous entity, you decide that the data is better
represented by the entity INVOICE.
There is a dependency conflict between the CUSTOMER and ORDER relationships as
they appear in the two views.
From the ORDER perspective, the relationship reads one ORDER to many
CUSTOMERS.
But from the CUSTOMER point of view, the relationship reads one CUSTOMER to
many ORDERS.
Typically, you resolve a dependency conflict by maintaining a more general
connectivity.
Therefore CUSTOMER and ORDER can be mapped as a many-to-many relationship.
You have decided to resolve all many-to-many relationships in the InterSwift model
into two one-to-many relationships.
And because this is a customer-tracking database, the best solution is to map one
CUSTOMER to many ORDERS.
There are no apparent key conflicts between the two views.
At this point you have sufficient commonality between the models to attempt a
merge.
You have three sets of common entities - CUSTOMER, ORDER, and RECIPIENT.
Other entities do not overlap, and appear intact in the merged global schema.
Because the common entities are equivalent, there are no structural conflicts due to
relations existing in one view but not in the other.
The entity type ADDRESS that you introduced in the CUSTOMER view creates a data
anomaly in the data model.
After merging the views, you need to eliminate the Address attributes from the
CUSTOMER, RECIPIENT, DEPOT, and EMPLOYEE entities.
And you need to define the ADDRESS_TYPE entity to accommodate the different
categories of address data.
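One possible shape for this restructuring is sketched below with Python's sqlite3
module. Only the ADDRESS and ADDRESS_TYPE entity names come from the text. The
columns, the choice to link each owning entity to ADDRESS through an address_id
foreign key, and the sample rows are all assumptions, and only two of the four
owning entities are shown.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript(
    """
    CREATE TABLE address_type (
        address_type_id INTEGER PRIMARY KEY,
        description     TEXT NOT NULL UNIQUE
    );
    CREATE TABLE address (
        address_id      INTEGER PRIMARY KEY,
        address_type_id INTEGER NOT NULL REFERENCES address_type(address_type_id),
        street          TEXT NOT NULL,
        city            TEXT NOT NULL
    );
    -- The owning entities no longer carry address attributes of their own.
    CREATE TABLE customer (
        customer_id INTEGER PRIMARY KEY,
        address_id  INTEGER REFERENCES address(address_id)
    );
    CREATE TABLE depot (
        depot_id   INTEGER PRIMARY KEY,
        address_id INTEGER REFERENCES address(address_id)
    );
    """
)
conn.executemany(
    "INSERT INTO address_type VALUES (?, ?)", [(1, "customer"), (2, "depot")]
)
conn.executemany(
    "INSERT INTO address VALUES (?, ?, ?, ?)",
    [(10, 1, "1 High St", "Dublin"), (11, 2, "Dock Rd", "Cork")],  # sample data
)
conn.execute("INSERT INTO customer VALUES (1, 10)")
conn.execute("INSERT INTO depot VALUES (7, 11)")

depot_address = conn.execute(
    """
    SELECT a.street, a.city, t.description
    FROM depot d
    JOIN address a ON a.address_id = d.address_id
    JOIN address_type t ON t.address_type_id = a.address_type_id
    WHERE d.depot_id = 7
    """
).fetchone()
```

All address data now lives in one place, and the category of each address is read
from ADDRESS_TYPE rather than implied by the entity that happened to store it.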
The resulting global schema unifies the conflicting ORDER and CUSTOMER views to give
a clear picture of data relations in the enterprise.
The global schema undergoes further alterations to maintain data integrity and to
enhance performance.