Database Slides

File-Based System
A collection of application programs that perform services for the end-users, such as the production of reports.
Each program defines and manages its own data.
File-based systems were an early attempt to computerize the manual filing system that we are all familiar with.
For example, an organization might have physical files set up to hold all external and internal correspondence
relating to a project, product, task, client, or employee.
Limitations of the File-Based Approach

• Separation and isolation of data.
“When data is isolated in separate files, it is more difficult to access data that should be available.”
For example, if we want to produce a list of all houses that match the requirements of clients, we first need to create
a temporary file of those clients who have “house” as the preferred type.
We then search the PropertyForRent file for those properties whose type is “house” and that meet the clients' other
requirements.
• Duplication of data.
Owing to the decentralized approach taken by each department, the file-based approach encouraged, if not
necessitated, the uncontrolled duplication of data.
For example, the previous figures show that both property and client details are duplicated in the
Sales and Contracts Departments.
• Data dependence.
As we have already mentioned, the physical structure and storage of the data files and records are defined in the
application code.
This means that changes to an existing structure are difficult to make.
For example, increasing the size of the PropertyForRent address field from 40 to 41 characters sounds like a simple
change, but it requires the creation of a one-off program (that is, a program that is run only once and can then be
discarded) that converts the PropertyForRent file to the new format.
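A minimal sketch of why this change is painful, assuming a hypothetical fixed-width record layout (a 5-byte property number followed by the address; only the 40- and 41-character address widths come from the example above, and the sample values are made up). The layout is hard-coded in the program through Python's `struct` module, so widening one field means every stored record has to be rewritten by a one-off converter.

```python
import struct

# Record layout hard-coded in the application (hypothetical):
# 5-byte property number followed by a 40-byte address.
OLD_FORMAT = struct.Struct("5s40s")
NEW_FORMAT = struct.Struct("5s41s")  # address widened to 41 bytes


def convert_record(old_bytes: bytes) -> bytes:
    """One-off conversion of a single record from the old layout to the new one."""
    property_no, address = OLD_FORMAT.unpack(old_bytes)
    # struct pads the 40-byte address with a trailing zero byte to fill 41 bytes.
    return NEW_FORMAT.pack(property_no, address)


old_record = OLD_FORMAT.pack(b"PA14", b"16 Holhead, Aberdeen")
new_record = convert_record(old_record)
```

Every program that reads the file would also need its own copy of the format string updated, which is the dependence the slide describes.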
• Incompatible file formats.
Because the structure of files is embedded in the application programs, the structures are dependent on the
application programming language.
For example, the structure of a file generated by a COBOL program may be different from the structure of a file
generated by a C program.
• Fixed queries/proliferation of application programs.
From the end-user’s point of view, file-based systems were a great improvement over manual systems.
Consequently, the requirement for new or modified queries grew.
However, file-based systems are very dependent upon the application developer, who has to write any queries or
reports that are required.
In some organizations, the type of query or report that could be produced was fixed.
Database
A shared collection of logically related data and its description, designed to meet the information needs of an
organization.
• The database is a single, possibly large repository of data that can be used simultaneously by many departments and
users.
Metadata
• The description of the data is known as the system catalog (or data dictionary or metadata—the “data about data”).
• It is the self-describing nature of a database that provides program–data independence.
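As an illustration of a self-describing database, the sketch below uses SQLite through Python's `sqlite3` module (chosen here only for convenience; the `Branch` table and its columns are hypothetical). The description of the data is stored alongside the data and can be queried like any other table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Branch (branchNo TEXT PRIMARY KEY, city TEXT)")

# The catalog is itself queryable: sqlite_master lists every stored object.
catalog = conn.execute(
    "SELECT name, type FROM sqlite_master WHERE type = 'table'"
).fetchall()

# PRAGMA table_info returns one row per column: (cid, name, type, notnull, default, pk).
columns = [row[1] for row in conn.execute("PRAGMA table_info(Branch)")]
```

A program can discover the structure at run time instead of hard-coding it, which is what makes program-data independence possible.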
-------------------------------------------------------------------------------------------------------------------------------------------------------------------
• An entity is a distinct object (a person, place, thing, concept, or event) in the organization that is to be
represented in the database.
• An attribute is a property that describes some aspect of the object that we wish to record.
• A relationship is an association between entities.
--------------------------------------------------------------------------------------------------------------------------------------------------------------------
• The DBMS is the software that interacts with the users’ application programs and the database.
• Typically, a DBMS provides the following facilities:
1. It allows users to define the database, usually through a Data Definition Language (DDL). The DDL allows
users to specify the data types and structures and the constraints on the data to be stored in the database.
2. It allows users to insert, update, delete, and retrieve data from the database, usually through a Data
Manipulation Language (DML). Having a central repository for all data and data descriptions allows the
DML to provide a general inquiry facility to this data, called a query language.
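The two facilities can be sketched with SQLite via Python's `sqlite3` module. This is an illustrative example, not a prescribed schema: the `Staff` table, its columns, and the sample rows are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# DDL: define structure, data types, and constraints.
conn.execute(
    """
    CREATE TABLE Staff (
        staffNo  TEXT PRIMARY KEY,
        position TEXT NOT NULL,
        salary   REAL CHECK (salary >= 0)
    )
    """
)

# DML: insert, update, delete, and retrieve data.
conn.execute("INSERT INTO Staff VALUES ('S1', 'Manager', 30000)")
conn.execute("INSERT INTO Staff VALUES ('S2', 'Assistant', 12000)")
conn.execute("UPDATE Staff SET salary = 13000 WHERE staffNo = 'S2'")
conn.execute("DELETE FROM Staff WHERE staffNo = 'S1'")
rows = conn.execute("SELECT staffNo, position, salary FROM Staff").fetchall()

# The constraint declared in the DDL is enforced by the DBMS, not by the application.
try:
    conn.execute("INSERT INTO Staff VALUES ('S3', 'Assistant', -1)")
    constraint_enforced = False
except sqlite3.IntegrityError:
    constraint_enforced = True
```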
---------------------------------------------------------------------------------------------------------------------------------------------
Application Programs
Users interact with the database through a number of application programs that are used to create and
maintain the database and to generate information.
The application programs may be written in a programming language or in a higher-level fourth-generation
language.
View
• With this functionality, the DBMS is an extremely powerful and useful tool.
• However, as the end-users are not too interested in how complex or easy a task is for the system, it could
be argued that the DBMS has made things more complex, because they now see more data than they
actually need or want.
• In recognition of this problem, a DBMS provides another facility known as a view mechanism, which allows each
user to have his or her own view of the database (a view is, in essence, some subset of the database).
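A small sketch of a view, again using SQLite through `sqlite3` for illustration. The table, the view name `StaffB003`, and the data are hypothetical; the point is that the view exposes only a subset of rows and columns.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Staff (staffNo TEXT, name TEXT, branchNo TEXT, salary REAL)")
conn.executemany(
    "INSERT INTO Staff VALUES (?, ?, ?, ?)",
    [("S1", "Ann", "B003", 30000), ("S2", "Bob", "B005", 12000)],
)

# A view for users at branch B003 who should not see salaries.
conn.execute(
    "CREATE VIEW StaffB003 AS SELECT staffNo, name FROM Staff WHERE branchNo = 'B003'"
)
view_rows = conn.execute("SELECT * FROM StaffB003").fetchall()
```

Users querying the view see neither the salary column nor staff at other branches.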
Roles in the Database Environment
Data Administrator (DA)
• The Data Administrator (DA) is responsible for the management of the data resource, including database planning;
development and maintenance of standards, policies and procedures; and conceptual/logical database design.
Database Administrator (DBA)
• The Database Administrator (DBA) is responsible for the physical realization of the database, including physical
database design and implementation, security and integrity control, maintenance of the operational system, and
ensuring satisfactory performance of the applications for users.
Database Designers
In large database design projects, we can distinguish between two types of designer:
1. logical database designers
2. physical database designers
• The logical database designer is concerned with identifying the data (that is, the entities and attributes), the
relationships between the data, and the constraints on the data that is to be stored in the database.
• The logical database designer must have a thorough and complete understanding of the organization’s data and any
constraints on this data (the constraints are sometimes called business rules).
Application Developers
• Once the database has been implemented, the application programs that provide the required functionality for the
end-users must be implemented.
• This is the responsibility of the application developers.
• Typically, the application developers work from a specification produced by systems analysts.
• Each program contains statements that request the DBMS to perform some operation on the database, which
includes retrieving data, inserting, updating, and deleting data.
End-Users
1. Naïve users
2. Sophisticated users
• Naïve users are typically unaware of the DBMS.
They access the database through specially written application programs that attempt to make the operations as
simple as possible.
• At the other end of the spectrum, the sophisticated end-user is familiar with the structure of the database and the
facilities offered by the DBMS.
Sophisticated end-users may use a high-level query language such as SQL to perform the required operations. Some
sophisticated end-users may even write application programs for their own use.
Advantages of DBMS
Disadvantages of DBMS
• Complexity: The provision of the functionality that we expect of a good DBMS makes the DBMS an
extremely complex piece of software. Database designers and developers, data and database
administrators, and end-users must understand this functionality to take full advantage of it. Failure to
understand the system can lead to bad design decisions, which can have serious consequences for an
organization.
• Size: The complexity and breadth of functionality make the DBMS an extremely large piece of software,
occupying many megabytes of disk space and requiring substantial amounts of memory to run efficiently.
• Cost of DBMSs: The cost of DBMSs varies significantly, depending on the environment and functionality
provided.
• Additional hardware costs: The disk storage requirements for the DBMS and the database may necessitate
the purchase of additional storage space. Furthermore, to achieve the required performance, it may be
necessary to purchase a larger machine, perhaps even a machine dedicated to running the DBMS.
• Cost of conversion: In some situations, the cost of the DBMS and extra hardware may be relatively small
compared with the cost of converting existing applications to run on the new DBMS and hardware. This
cost also includes the cost of training staff to use these new systems, and possibly the employment of
specialist staff to help with the conversion and running of the systems.
• Performance: Typically, a file-based system is written for a specific application, such as invoicing. As a
result, performance is generally very good. However, the DBMS is written to be more general, to cater for
many applications rather than just one. The result is that some applications may not run as fast as they
used to.
• Greater impact of a failure: The centralization of resources increases the vulnerability of the system.
Because all users and applications rely on the availability of the DBMS, the failure of certain components
can bring operations to a halt.
Types of Database
1. Centralized Database
2. Distributed Database
Centralized Database
• A database in which all data is located at a single computer (or site) and which multiple users can access is
known as a centralized database.
• A centralized database provides an efficient way to access and update data.
• These databases are usually used in computer network environments. Examples of centralized databases are:
1. Personal Computer Databases
A personal computer database is normally created and maintained by a single user on a personal computer.
Personal computer databases are commonly used in small businesses or organizations.
2. Client/Server Databases
Client/server databases are used in small to medium-sized organizations or businesses to share data among multiple
users on a local area network.
Microcomputers are often used in a local area network.

The client/server architecture is designed for the distribution of work on a computer network in which many clients
may share the data (or services).
Functions of client and server
Distributed Database
• A logically interrelated collection of shared data (and a description of this data), physically distributed over a
computer network.
• Many organizations/departments have sub-offices in different cities and countries.
• In such cases, the distributed databases are used instead of centralized databases. A distributed database is a single
logical database, which is spread physically across computers in multiple locations (such as cities or countries).
Distributed databases are further divided into two categories:
1. Homogeneous Databases
A homogeneous distributed database has identical software and hardware running all database instances, and may
appear through a single interface as if it were a single database.
2. Heterogeneous Databases
A heterogeneous distributed database may have different hardware, operating systems, database management
systems, and even data models for different databases.
The Three-Level ANSI-SPARC Architecture
• The levels form a three-level architecture comprising an external, a conceptual, and an internal level.
• The way users perceive the data is called the external level.
• The way the DBMS and the operating system perceive the data is the internal level, where the data is actually stored
using the data structures and file organizations.
• The conceptual level provides both the mapping and the desired independence between the external and internal
levels.
External Level
• The users’ view of the database. This level describes that
part of the database that is relevant to each user.
Conceptual Level
• The community view of the database. This level describes
what data is stored in the database and the relationships
among the data.
• The middle level in the three-level architecture is the conceptual level. This level contains the logical structure of the
entire database as seen by the DBA.
The conceptual level represents:
1. all entities, their attributes, and their relationships;
2. the constraints on the data;
3. semantic information about the data;
4. security and integrity information.
Internal Level
• The physical representation of the database on the computer. This level describes how the data is stored in the
database.
• The internal level covers the physical implementation of the database to achieve optimal runtime performance and
storage space utilization.
• It covers the data structures and file organizations used to store data on storage devices.
The internal level is concerned with such things as:
1. storage space allocation for data and indexes;
2. record descriptions for storage (with stored sizes for data items);
3. record placement;
4. data compression and data encryption techniques.
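The separation between the levels can be illustrated with a small experiment, sketched here with SQLite via `sqlite3` (the table, index name, and rows are hypothetical). Adding an index is a change at the internal level only: the table definition and the query stay the same, and so does the answer.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE PropertyForRent (propertyNo TEXT, city TEXT, rent REAL)")
conn.executemany(
    "INSERT INTO PropertyForRent VALUES (?, ?, ?)",
    [("P1", "Glasgow", 350), ("P2", "London", 400), ("P3", "Glasgow", 375)],
)

query = "SELECT propertyNo FROM PropertyForRent WHERE city = 'Glasgow' ORDER BY propertyNo"
before = conn.execute(query).fetchall()

# Internal-level change: add an access path. The table definition is untouched.
conn.execute("CREATE INDEX idx_property_city ON PropertyForRent (city)")
after = conn.execute(query).fetchall()
```

Only the way the DBMS locates the rows may change; users and application programs are unaffected.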

Data model
An integrated collection of concepts for describing and manipulating data, relationships between data, and constraints on
the data in an organization.
• A model is a representation of real-world objects and events, and their associations.
A data model can be thought of as comprising three components:
1. a structural part, consisting of a set of rules according to which databases can be constructed;
2. a manipulative part, defining the types of operation that are allowed on the data (this includes the operations that
are used for updating or retrieving data from the database and for changing the structure of the database);
3. a set of integrity constraints, which ensures that the data is accurate.
There have been many data models proposed in the literature.
They fall into three broad categories:
• object-based,
• record-based, and
• physical data models.
Object-Based Data Models

Object-based data models use concepts such as entities, attributes, and relationships.
• An entity is a distinct object (a person, place, thing, concept, event) in the organization that is to be represented in
the database.
• An attribute is a property that describes some aspect of the object that we wish to record, and
• a relationship is an association between entities.
Some of the more common types of object-based data model are:

• Entity-Relationship (ER)
• Semantic
• Functional
• Object-oriented
---------------------------------------------------------------------------------------------------------------------------------------------------------
ER model
• The ER model has emerged as one of the main techniques for database design and forms the basis for the database
design methodology used in this book.
Object-oriented model
• The object-oriented data model extends the definition of an entity to include not only the attributes that describe
the state of the object but also the actions that are associated with the object, that is, its behavior.
• The object is said to encapsulate both state and behavior.
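Encapsulation of state and behavior can be sketched as a Python class. The `Staff` class, its fields, and the `give_raise` method are illustrative assumptions, not part of any particular object-oriented DBMS.

```python
from dataclasses import dataclass


@dataclass
class Staff:
    # State: attributes describing the object.
    staff_no: str
    position: str
    salary: float

    # Behavior: an action associated with the object.
    def give_raise(self, percent: float) -> None:
        self.salary *= 1 + percent / 100


member = Staff("S1", "Assistant", 12000.0)
member.give_raise(10)
```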
--------------------------------------------------------------------------------------------------------------------------------------------------------------
Record-Based Data Models

• In a record-based model, the database consists of a number of fixed-format records, possibly of differing types.
• Each record type defines a fixed number of fields, typically of a fixed length.
There are three principal types of record-based logical data model:

• the relational data model,
• the network data model, and
• the hierarchical data model.
Relational data model
• The relational data model is based on the concept of mathematical relations.
• In the relational model, data and relationships are represented as tables, each of which has a number of columns
with a unique name.
Network data model

• In the network model, data is represented as collections of records, and relationships are represented by sets.
• Compared with the relational model, relationships are explicitly modeled by the sets, which become pointers in the
implementation.
• The records are organized as generalized graph structures with records appearing as nodes (also called segments)
and sets as edges in the graph.
Hierarchical data model

• The hierarchical model is a restricted type of network model. Again, data is represented as collections of records and
relationships are represented by sets.
• However, the hierarchical model allows a node to have only one parent.
• A hierarchical model can be represented as a tree graph, with records appearing as nodes (also called segments) and
sets as edges.
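The difference between the two models can be sketched by treating each set occurrence as an (owner, member) edge. The record names below are hypothetical, and the check covers only the single-parent rule, not every property of a tree.

```python
from collections import defaultdict

# Each set occurrence is an (owner, member) edge between two records.
network_sets = [
    ("Branch B003", "Staff S1"),
    ("Branch B003", "Property P1"),
    ("Staff S1", "Property P1"),
]
hierarchy_sets = [("Branch B003", "Staff S1"), ("Staff S1", "Property P1")]


def is_hierarchical(edges):
    """Return True if every node has at most one parent (owner)."""
    parents = defaultdict(set)
    for owner, member in edges:
        parents[member].add(owner)
    return all(len(owners) <= 1 for owners in parents.values())
```

In `network_sets` the record `Property P1` has two owners, which a network model permits and a hierarchical model does not.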
----------------------------------------------------------------------------------------------------------------------------------------------------------
Physical Data Models

• Physical data models describe how data is stored in the computer, representing information such as record
structures, record orderings, and access paths.
• There are not as many physical data models as logical data models; the most common ones are the unifying model
and the frame memory.
---------------------------------------------------------------------------------------------------------------------------------------------------------------------
Conceptual Modeling
• Conceptual modeling or conceptual database design is the process of constructing a model of the information use
in an enterprise that is independent of implementation details, such as the target DBMS, application programs,
programming languages, or any other physical considerations.
• This model is called a conceptual data model.
• Conceptual models are also referred to as “logical models” in the literature; here, however, the two are distinguished.
• The conceptual model is independent of all implementation details, whereas the logical model assumes knowledge
of the underlying data model of the target DBMS.
----------------------------------------------------------------------------------------------------------------------------------------------------------
The Database System Development Lifecycle
• As a database system is a fundamental component of the larger organization-wide information system, the
database system development lifecycle is inherently associated with the lifecycle of the information system.
• For small database systems, with a small number of users, the lifecycle need not be very complex.
• However, when designing a medium to large database system with tens to thousands of users, using hundreds
of queries and application programs, the lifecycle can become extremely complex.
User view
Defines what is required of a database system from the perspective of a particular job role (such as Manager or
Supervisor) or enterprise application area (such as marketing, personnel, or stock control).
Main approaches to managing the requirements
• the centralized approach;
• the view integration approach;
• a combination of both approaches.
Centralized Approach
Requirements for each user view are merged into a single set of requirements for the new database system. A data
model representing all user views is created during the database design stage.
• The centralized (or one-shot) approach involves collating the requirements for different user views into a single
list of requirements.
• In the database design stage, a global data model is created, which represents all user views.
• The global data model is composed of diagrams and documentation that formally describe the data requirements
of the users.
View Integration Approach
Requirements for each user view remain as separate lists. Data models representing each user view are created and
then merged later during the database design stage.
• The view integration approach involves leaving the requirements for each user view as separate lists of
requirements.
• In the database design stage, we first create a data model for each user view.
• A data model that represents a single user view (or a subset of all user views) is called a local data model.
• Each model is composed of diagrams and documentation that formally describe the requirements of one or more
(but not all) user views of the database.
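The merge step can be sketched in a deliberately simplified form, assuming each local data model is reduced to a mapping from entity name to a set of attribute names (the views and attributes below are hypothetical). Real view integration also has to reconcile naming conflicts, relationships, and constraints, which this sketch ignores.

```python
def merge_local_models(*local_models):
    """Union the entities and attributes of several local data models."""
    global_model = {}
    for model in local_models:
        for entity, attributes in model.items():
            global_model.setdefault(entity, set()).update(attributes)
    return global_model


# Hypothetical local data models for two user views.
manager_view = {"Staff": {"staffNo", "salary"}, "Branch": {"branchNo"}}
supervisor_view = {"Staff": {"staffNo", "position"}, "PropertyForRent": {"propertyNo"}}

global_model = merge_local_models(manager_view, supervisor_view)
```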
Fact-finding
The formal process of using techniques such as interviews and questionnaires to collect facts about systems, requirements,
and preferences.
• There are many occasions for fact-finding during the database system development lifecycle.
Commonly used fact-finding techniques include:
• Examining documentation;
• Interviewing;
• Observing the enterprise in operation;
• Research;
• Questionnaires.
Examining Documentation
• Examining documentation can be useful when we are trying to gain some insight as to how the need for a database
arose.
• We may also find that documentation can help to provide information on the part of the enterprise associated with
the problem.
Interviewing
• Interviewing is the most commonly used and normally the most useful fact-finding technique.
• We can interview to collect information from individuals face-to-face.
• As with other fact-finding techniques, interviewing is not always the best method for all situations.
Types:
• Unstructured interviews
• Structured interviews
Unstructured Interviews
• Unstructured interviews are conducted with only a general objective in mind and with few, if any, specific
questions.
• The interviewer counts on the interviewee to provide a framework and direction to the interview.
• This type of interview frequently loses focus and, for this reason, it often does not work well for database
analysis and design.
Structured Interviews
• In structured interviews, the interviewer has a specific set of questions to ask the interviewee.
• Depending on the interviewee’s responses, the interviewer will direct additional questions to obtain clarification
or expansion.
• Open-ended questions allow the interviewee to respond in any way that seems appropriate. An example of an
open-ended question is: “Why are you dissatisfied with the report on client registration?”
• Closed-ended questions restrict answers to either specific choices or short, direct responses. An example of such a
question might be: “Are you receiving the report on client registration on time?”
Observing the Enterprise in Operation

• Observation is one of the most effective fact-finding techniques for understanding a system.
• With this technique, it is possible to either participate in or watch a person perform activities to learn about the
system.
• This technique is particularly useful when the validity of data collected through other methods is in question or
when the complexity of certain aspects of the system prevents a clear explanation by the end-users.
Research
• A useful fact-finding technique is to research the application and problem.
• Computer trade journals, reference books, and the Internet are good sources of information.
• They can provide information on how others have solved similar problems, and on whether software packages
exist to solve or even partially solve the problem.
Questionnaires
• Another fact-finding technique is to conduct surveys through questionnaires.
• Questionnaires are special-purpose documents that allow facts to be gathered from a large number of people
while maintaining some control over their responses.
• When dealing with a large audience, no other fact-finding technique can tabulate the same facts as efficiently.
Free-format questions
• Free-format questions offer the respondent greater freedom in providing answers. A question is asked and the
respondent records the answer in the space provided after the question.
• Examples of free-format questions are: “What reports do you currently receive and how are they used?” and
“Are there any problems with these reports? If so, please explain.”
Fixed-format questions
• Fixed-format questions require specific responses from individuals. Given any question, the respondent must
choose from the available answers. This makes the results much easier to tabulate. On the other hand, the
respondent cannot provide additional information that might prove valuable.
• An example of a fixed-format question is: “The current format of the report on property rentals is ideal and should
not be changed.” The respondent may be given the option to answer “Yes” or “No” to this question, or be given
the option to answer from a range of responses including “Strongly agree,” “Agree,” “No opinion,”
“Disagree,” and “Strongly disagree.”
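Because every answer to a fixed-format question comes from a known list, tabulation is mechanical. A minimal sketch, assuming a hypothetical batch of responses on the five-point scale quoted above:

```python
from collections import Counter

SCALE = ["Strongly agree", "Agree", "No opinion", "Disagree", "Strongly disagree"]

# Hypothetical responses to one fixed-format question.
responses = ["Agree", "Strongly agree", "Agree", "Disagree", "No opinion", "Agree"]

counts = Counter(responses)
# Report every option on the scale, including those nobody chose.
tally = {option: counts.get(option, 0) for option in SCALE}
```

Free-format answers offer no such shortcut, since each one has to be read and interpreted individually.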
Entity Types
• A group of objects with the same properties, which are identified by the enterprise as having an independent
existence.
• The basic concept of the ER model is the entity type, which represents a group of “objects” in the “real world”
with the same properties.
Entity occurrence
• A uniquely identifiable object of an entity type.
• Each uniquely identifiable object of an entity type is referred to simply as an entity occurrence.
------------------------------------------------------------------------------------------------------------
Relationship Types
• A relationship type is a set of associations between one or more participating entity types.
• Each relationship type is given a name that describes its function.
Relationship occurrence
• A uniquely identifiable association that includes one occurrence from each participating entity type.
• A relationship occurrence indicates the particular entity occurrences that are related.
Degree of Relationship Type
• The number of participating entity types in a relationship.
• The entities involved in a particular relationship type are referred to as participants in that relationship.
• The number of participants in a relationship type is called the degree of that relationship.
• A relationship of degree two is called binary, one of degree three is called ternary, and one of degree four is called
quaternary.

Recursive Relationship
A relationship type in which the same entity type participates more than once in different roles.
• Role names may also be used when two entities are associated through more than one relationship.
• For example, the Staff and Branch entity types are associated through two distinct relationships called Manages
and Has, each of which can be annotated with role names.
Attribute
• A property of an entity or a relationship type.
Attribute domain
• The set of allowable values for one or more attributes.
• Each attribute is associated with a set of values called a domain.
• For example, the number of rooms associated with a property is between 1 and 15 for each entity occurrence.
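The room-count domain from the example can be written down directly. This is a sketch that assumes the domain is the whole numbers 1 to 15 inclusive; the names `ROOMS_DOMAIN` and `in_domain` are illustrative.

```python
ROOMS_DOMAIN = range(1, 16)  # integers 1..15 inclusive


def in_domain(value, domain=ROOMS_DOMAIN):
    """Return True if value is an allowable value for the attribute."""
    return value in domain
```

A DBMS would typically enforce the same rule declaratively, for instance through a constraint on the column.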
Simple and Composite Attributes
Simple attribute
• An attribute composed of a single component with an independent existence.
• Simple attributes cannot be further subdivided into smaller components.
• Examples of simple attributes include position and salary of the Staff entity. Simple attributes are sometimes
called atomic attributes.
Composite attribute
• An attribute composed of multiple components, each with an independent existence.
• Some attributes can be further divided to yield smaller components with an independent existence of their own.
• For example, the address attribute of the Branch entity with the value (163 Main St, Glasgow, G11 9QX) can be
subdivided into street (163 Main St), city (Glasgow), and postcode (G11 9QX) attributes.
Single-valued attribute
• An attribute that holds a single value for each occurrence of an entity type. The majority of attributes are
single-valued.
• For example, each occurrence of the Branch entity type has a single value for the branch number (branchNo)
attribute (for example, B003), and therefore the branchNo attribute is referred to as being single-valued.
Multi-valued attribute
• An attribute that holds multiple values for each occurrence of an entity type.
• Some attributes have multiple values for each entity occurrence.
• For example, each occurrence of the Branch entity type can have multiple values for the telNo attribute (for
example, branch number B003 has telephone numbers 0141-339-2178 and 0141-339-4439) and therefore the telNo
attribute in this case is multi-valued.
Derived attribute
• An attribute that represents a value that is derivable from the value of a related attribute or set of
attributes, not necessarily in the same entity type.
• The values held by some attributes may be derived.
• For example, the value for the duration attribute of the Lease entity is calculated from the rentStart and
rentFinish attributes, also of the Lease entity type.
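A derived attribute is computed on demand rather than stored. The sketch below models the Lease example as a Python class; measuring the duration in days is an assumption, as are the sample dates.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class Lease:
    rent_start: date
    rent_finish: date

    @property
    def duration(self) -> int:
        """Derived attribute: lease length in days, computed rather than stored."""
        return (self.rent_finish - self.rent_start).days


lease = Lease(date(2024, 1, 1), date(2024, 7, 1))
```

Because `duration` is never stored, it cannot drift out of step with the two dates it depends on.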
Keys
1. Candidate key
2. Primary key
3. Composite key
Primary key
• The candidate key that is selected to uniquely identify each occurrence of an entity type.
• An entity type may have more than one candidate key.
• For the purposes of discussion, consider that a member of staff has a unique company-defined staff number
(staffNo) and also a unique National Insurance Number (NIN) that is used by the government. The Staff entity
therefore has two candidate keys, one of which must be selected as the primary key.
Candidate key
• The minimal set of attributes that uniquely identifies each occurrence of an entity type.
• A candidate key is the minimal number of attributes, whose value(s) uniquely identify each entity occurrence.
• For example, the branch number (branchNo) attribute is the candidate key for the Branch entity type, and has a
distinct value for each branch entity occurrence.
• The candidate key must hold values that are unique for every occurrence of an entity type.
• This implies that a candidate key cannot contain a null.
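The two requirements, uniqueness and minimality, can be checked against sample data as sketched below. The rows are hypothetical, and a check on sample data can only refute a proposed key: whether a set of attributes really is a candidate key depends on the meaning of the data, not on one set of occurrences.

```python
from itertools import combinations


def is_unique(rows, attrs):
    """True if no two rows share the same values for attrs."""
    seen = {tuple(row[a] for a in attrs) for row in rows}
    return len(seen) == len(rows)


def is_candidate_key(rows, attrs):
    """True if attrs is unique and no proper subset of attrs is unique (minimal)."""
    if not is_unique(rows, attrs):
        return False
    return not any(
        is_unique(rows, subset)
        for size in range(1, len(attrs))
        for subset in combinations(attrs, size)
    )


branches = [
    {"branchNo": "B003", "city": "Glasgow"},
    {"branchNo": "B005", "city": "London"},
    {"branchNo": "B007", "city": "Glasgow"},
]
```

Here `("branchNo",)` passes, `("city",)` fails uniqueness, and `("branchNo", "city")` fails minimality because `branchNo` alone is already unique.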
Composite key
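• A candidate key that consists of two or more attributes.
• In some cases, the key of an entity type is composed of several attributes whose values together are unique for each
entity occurrence but not separately.
• For example, consider an entity called Advert with propertyNo (property number), newspaperName, dateAdvert,
and cost attributes. A property may be advertised in many newspapers on many dates, so no single attribute
identifies an advert; the values of propertyNo, newspaperName, and dateAdvert are needed together.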
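A composite key for the Advert example can be sketched with SQLite via `sqlite3`, assuming the key consists of the three attributes other than cost. The sample newspaper name, dates, and costs are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE Advert (
        propertyNo    TEXT,
        newspaperName TEXT,
        dateAdvert    TEXT,
        cost          REAL,
        PRIMARY KEY (propertyNo, newspaperName, dateAdvert)
    )
    """
)
conn.execute("INSERT INTO Advert VALUES ('P1', 'Daily Post', '2024-05-01', 50)")
# Same property and newspaper on a different date: allowed.
conn.execute("INSERT INTO Advert VALUES ('P1', 'Daily Post', '2024-05-08', 50)")

try:
    # Same values for all three key attributes: rejected.
    conn.execute("INSERT INTO Advert VALUES ('P1', 'Daily Post', '2024-05-01', 60)")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True
```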
Strong Entity Types
• An entity type that is not existence-dependent on some other entity type.
• An entity type is referred to as being strong if its existence does not depend upon the existence of another entity
type.
• Examples of strong entities are the Staff, Branch, PropertyForRent, and Client entities.
Weak Entity Types
• An entity type that is existence-dependent on some other entity type.
• A weak entity type is dependent on the existence of another entity type.
• An example of a weak entity type is Preference.
• A characteristic of a weak entity is that each entity occurrence cannot be uniquely identified using only the
attributes associated with that entity type.
• For example, note that there is no primary key for the Preference entity.
Diagrammatic representation of attributes on relationships

Structural Constraints
• The constraints should reflect the restrictions on the relationships as perceived in the “real world.”
• Examples of such constraints include the requirements that a property for rent must have an owner and each branch
must have staff.
Multiplicity
• The number (or range) of possible occurrences of an entity type that may relate to a single occurrence of an
associated entity type through a particular relationship.
• Multiplicity constrains the way that entities are related.
• The most common degree for relationships is binary.
Binary relationships are generally referred to as being
• one-to-one (1:1),
• one-to-many (1:*), or
• many-to-many (*:*).
One-to-One (1:1) Relationships
• Consider the relationship Manages, which relates the Staff and Branch entity types.
One-to-Many (1:*) Relationships
• Consider the relationship Oversees, which relates the Staff and PropertyForRent entity types.
Many-to-Many (*:*) Relationships
• Consider the relationship Advertises, which relates the Newspaper and PropertyForRent entity types.
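One common way to carry these three kinds of relationship into relational tables is sketched below with SQLite via `sqlite3`. The column names and the decision about which table holds each foreign key are assumptions made for illustration, not something the slides prescribe.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE Staff (staffNo TEXT PRIMARY KEY);
    CREATE TABLE Newspaper (newspaperName TEXT PRIMARY KEY);

    -- 1:1 Manages: UNIQUE stops one member of staff managing two branches.
    CREATE TABLE Branch (
        branchNo       TEXT PRIMARY KEY,
        managerStaffNo TEXT UNIQUE REFERENCES Staff (staffNo)
    );

    -- 1:* Oversees: many properties may reference the same member of staff.
    CREATE TABLE PropertyForRent (
        propertyNo TEXT PRIMARY KEY,
        staffNo    TEXT REFERENCES Staff (staffNo)
    );

    -- *:* Advertises: a separate table holds one row per (newspaper, property) pair.
    CREATE TABLE Advertises (
        newspaperName TEXT REFERENCES Newspaper (newspaperName),
        propertyNo    TEXT REFERENCES PropertyForRent (propertyNo),
        PRIMARY KEY (newspaperName, propertyNo)
    );
    """
)
conn.execute("INSERT INTO Staff VALUES ('S1')")
conn.execute("INSERT INTO Branch VALUES ('B003', 'S1')")
conn.executemany("INSERT INTO PropertyForRent VALUES (?, 'S1')", [("P1",), ("P2",)])

try:
    conn.execute("INSERT INTO Branch VALUES ('B005', 'S1')")  # second branch, same manager
    one_to_one_enforced = False
except sqlite3.IntegrityError:
    one_to_one_enforced = True
```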
Cardinality Constraints
 Multiplicity actually consists of two separate constraints known as cardinality and participation.
 Describes the maximum number of possible relationship occurrences for an entity participating in a
given relationship type.
 The cardinality of a binary relationship is what we previously referred to as one-to-one (1:1),
one-to-many (1:*), and many-to-many (*:*).
Participation Constraints
 Determines whether all or only some entity occurrences participate in a relationship.
 The participation constraint represents whether all entity occurrences are involved in a particular relationship
(referred to as mandatory participation) or only some (referred to as optional participation).
 The participation of entities in a relationship appears as the minimum values for the multiplicity ranges on either
side of the relationship.
 Optional participation is represented as a minimum value of 0, and mandatory participation is shown as a
minimum value of 1.
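As a minimal sketch of how one multiplicity range splits into the two constraints, the following Python snippet stores a range as a (minimum, maximum) pair. The class name Multiplicity, the helper describe, and the sample ranges are illustrative assumptions, not part of the slides.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Multiplicity:
    """A multiplicity range min..max; maximum=None stands for '*' (many)."""
    minimum: int
    maximum: Optional[int]

    @property
    def participation(self) -> str:
        # Participation is read from the minimum value of the range.
        return "optional" if self.minimum == 0 else "mandatory"

    @property
    def cardinality(self) -> str:
        # Cardinality is read from the maximum value of the range.
        return "1" if self.maximum == 1 else "*"


def describe(left: Multiplicity, right: Multiplicity) -> str:
    """Return the cardinality ratio of a binary relationship, e.g. '1:*'."""
    return f"{left.cardinality}:{right.cardinality}"


# Hypothetical ranges at the two ends of a binary relationship.
one_end = Multiplicity(1, 1)         # written 1..1
other_end = Multiplicity(0, None)    # written 0..*

print(describe(one_end, other_end))  # 1:*
print(one_end.participation)         # mandatory
print(other_end.participation)       # optional
```

The maximum values give the familiar 1:1, 1:* or *:* label, while the minimum values carry the mandatory or optional participation.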
Problems with ER Models
Fan Traps
 Where a model represents a relationship between entity types, but the pathway between certain entity occurrences
is ambiguous.
 A fan trap may exist where two or more 1:* relationships fan out from the same entity.
 A potential fan trap is illustrated in the next figure, which shows two 1:* relationships (Has and Operates) emanating
from the same entity called Division.
Solution:
 We resolve this fan trap by restructuring the original ER model to represent the correct association between these
entities, as shown in the next figure.
 If we now examine occurrences of the Operates and Has relationships, as shown in the next figure, the pathway
between entity occurrences is no longer ambiguous.
Chasm Traps
 Where a model suggests the existence of a relationship between entity types, but the pathway does not exist
between certain entity occurrences.
 A chasm trap may occur where there are one or more relationships with a minimum multiplicity of zero (that is,
optional participation) forming part of the pathway between related entities.
 This model represents the facts that a single branch has one or more staff who oversee zero or more properties for
rent.
 A problem arises when we want to know which properties are available at each branch.
Solution:
 The inability to answer this question is considered to be a loss of information (as we know a property must be
available at a branch), and is the result of a chasm trap.
Therefore, to solve this problem, we need to identify the missing relationship, which in this case is the Offers relationship
between the Branch and PropertyForRent entities.
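The chasm trap can be sketched with a few hypothetical occurrences. In the Python snippet below, the dictionaries, the function name branch_via_staff, and all sample values are assumptions made for illustration. A property that no member of staff oversees cannot be traced to a branch through the Has and Oversees pathway, whereas a direct Offers mapping answers the question.

```python
# Has: each member of staff is located at one branch (hypothetical values).
staff_branch = {"SG37": "B003", "SG14": "B003", "SA9": "B007"}

# Oversees: a property may have no overseeing staff (optional participation).
property_staff = {"PG36": "SG37", "PA14": "SA9", "PG4": None}


def branch_via_staff(property_no):
    """Follow Oversees and then Has; return None when the pathway is broken."""
    staff_no = property_staff.get(property_no)
    if staff_no is None:
        return None
    return staff_branch.get(staff_no)


# Offers: the missing relationship, linking each property directly to a branch.
offers = {"PG36": "B003", "PA14": "B007", "PG4": "B003"}

print(branch_via_staff("PG36"))  # B003
print(branch_via_staff("PG4"))   # None, the chasm trap
print(offers["PG4"])             # B003
```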
Specialization/Generalization
 The concept of specialization/generalization is associated with special types of entities known as superclasses and
subclasses, and the process of attribute inheritance.
 We begin this section by defining superclasses and subclasses and by examining superclass/subclass relationships.
 We describe the process of attribute inheritance and contrast the process of specialization with the process of
generalization.
Superclass: An entity type that includes one or more distinct subgroupings of its occurrences, which must be
represented in a data model.
Subclass: A distinct subgrouping of occurrences of an entity type, which must be represented in a data model.
 Entity types that have distinct subclasses are called superclasses.
 For example, the entities that are members of the Staff entity type may be classified as Manager,
SalesPersonnel, and Secretary.
Attribute Inheritance
 As mentioned earlier, an entity in a subclass represents the same “real world” object as in the superclass, and may
possess subclass-specific attributes, as well as those associated with the superclass.
 A subclass is an entity in its own right and so it may also have one or more subclasses.
 An entity and its subclasses and their subclasses, and so on, is called a type hierarchy.
Specialization Process
 The process of maximizing the differences between members of an entity by identifying their distinguishing
characteristics.
 Specialization is a top-down approach to defining a set of superclasses and their related subclasses.
 The set of subclasses is defined on the basis of some distinguishing characteristics of the entities in the superclass.
Generalization Process
 The process of minimizing the differences between entities by identifying their common characteristics.
 The process of generalization is a bottom-up approach, that results in the identification of a generalized superclass
from the original entity types.
 For example, if we apply the process of generalization to entities such as Manager, SalesPersonnel, and Secretary,
we attempt to identify similarities between them, such as common attributes and relationships.
Participation constraints
 Determines whether every member in the superclass must participate as a member of a subclass.
 A participation constraint may be mandatory or optional.
 A superclass/subclass relationship with mandatory participation specifies that every member in the
superclass must also be a member of a subclass.
 To represent mandatory participation, “Mandatory” is placed in curly brackets below the triangle that points
towards the superclass.
Disjoint constraints
 Describes the relationship between members of the subclasses and indicates whether it is possible for a
member of a superclass to be a member of one, or more than one, subclass.
 The disjoint constraint only applies when a superclass has more than one subclass.
 If the subclasses are disjoint, then an entity occurrence can be a member of only one of the subclasses.
 To represent a disjoint superclass/subclass relationship, “Or” is placed next to the participation constraint within
the curly brackets.
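The two constraints can be checked mechanically against a set of occurrences. The following Python sketch assumes a hypothetical function check_specialization and invented staff numbers. It reports every superclass member that violates a mandatory or a disjoint constraint.

```python
def check_specialization(membership, mandatory, disjoint):
    """Check participation and disjoint constraints on a specialization.

    membership maps each superclass member to the set of subclasses it
    belongs to. Returns a list of violation messages (empty if none).
    """
    violations = []
    for member, subclasses in membership.items():
        if mandatory and len(subclasses) == 0:
            violations.append(f"{member}: must belong to a subclass")
        if disjoint and len(subclasses) > 1:
            violations.append(f"{member}: belongs to more than one subclass")
    return violations


# Hypothetical Staff occurrences and the subclasses each one belongs to.
staff = {
    "SL21": {"Manager"},
    "SG37": set(),                          # in no subclass
    "SG14": {"Manager", "SalesPersonnel"},  # in two subclasses
}

# {Mandatory, Or}: both constraints enforced, so two violations are reported.
print(check_specialization(staff, mandatory=True, disjoint=True))

# Optional and nondisjoint: the same occurrences are all acceptable.
print(check_specialization(staff, mandatory=False, disjoint=False))  # []
```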
Aggregation
 Represents a “has-a” or “is-part-of” relationship between entity types, where one represents the
“whole” and the other the “part.”
 A relationship represents an association between two entity types that are conceptually at the same level.
 Sometimes we want to model a “has-a” or “is-part-of” relationship, in which one entity represents a larger
entity (the “whole”), consisting of smaller entities (the “parts”).
 An example of an aggregation is the Has relationship, which relates the Branch entity (the “whole”) to the
Staff entity (the “part”).
Composition
 A specific form of aggregation that represents an association between entities, where there is a strong
ownership and coincidental lifetime between the “whole” and the “part.”
 Aggregation is entirely conceptual and does nothing more than distinguish a “whole” from a “part.”
 In a composite, the “whole” is responsible for the disposition of the “parts,” which means that the
composition must manage the creation and destruction of its “parts.”
 In other words, an object may be part of only one composite at a time.
Normalization
Staff and Branch relations Example
The StaffBranch relation is an alternative format of the Staff and Branch relations. The relations have the following form:
Staff (staffNo, sName, position, salary, branchNo)
Branch (branchNo, bAddress)
StaffBranch (staffNo, sName, position, salary, branchNo, bAddress)

Insertion Anomalies
 There are two main types of insertion anomaly, which we illustrate using the StaffBranch relation:
 To insert the details of new members of staff into the StaffBranch relation, we must include the details of the branch
at which the staff are to be located.
 For example, to insert the details of new staff located at branch number B007, we must enter the correct details of
branch number B007 so that the branch details are consistent with values for branch B007 in other tuples of the
StaffBranch relation.
 To insert the details of a new branch that currently has no members of staff, we would have to enter nulls for the
staff attributes, such as staffNo. Because staffNo is the primary key of the StaffBranch relation, this is not allowed.
Deletion Anomalies
 If we delete a tuple from the StaffBranch relation that represents the last member of staff located at a branch, the
details about that branch are also lost from the database.
 For example, if we delete the tuple for staff number SA9 (Mary Howe) from the StaffBranch relation (Figure 14.3),
the details relating to branch number B007 are lost from the database.
 The design of the relations in Figure 14.2 avoids this problem, because branch tuples are stored separately from staff
tuples and only the attribute branchNo relates the two relations.
Modification Anomalies
 If we want to change the value of one of the attributes of a particular branch in the StaffBranch relation—for
example, the address for branch number B003—we must update the tuples of all staff located at that branch.
 If this modification is not carried out on all the appropriate tuples of the StaffBranch relation, the database will
become inconsistent.
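The deletion and modification anomalies can be reproduced on a small StaffBranch instance. In the Python sketch below, the tuples (other than staff number SA9, Mary Howe, at branch B007) and the helper branch_addresses are illustrative assumptions.

```python
# StaffBranch (staffNo, sName, position, salary, branchNo, bAddress)
rows = [
    ("SG37", "Ann Beech", "Assistant", 12000, "B003", "163 Main St, Glasgow"),
    ("SG14", "David Ford", "Supervisor", 18000, "B003", "163 Main St, Glasgow"),
    ("SA9", "Mary Howe", "Assistant", 9000, "B007", "16 Argyll St, Aberdeen"),
]


def branch_addresses(staff_branch):
    """Collect every address recorded for each branch number."""
    found = {}
    for _, _, _, _, branch_no, address in staff_branch:
        found.setdefault(branch_no, set()).add(address)
    return found


# Deletion anomaly: removing the last member of staff at B007 loses the branch.
after_delete = [r for r in rows if r[0] != "SA9"]
print("B007" in branch_addresses(after_delete))  # False

# Modification anomaly: updating only one B003 tuple leaves two addresses.
partial = [rows[0][:5] + ("10 New St, Glasgow",)] + rows[1:]
print(len(branch_addresses(partial)["B003"]))    # 2
```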
 The lossless-join property ensures that any instance of the original relation can be identified from corresponding
instances in the smaller relations.
 The dependency preservation property ensures that a constraint on the original relation can be maintained by
simply enforcing some constraint on each of the smaller relations.
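As a rough illustration of the lossless-join property, the Python sketch below decomposes a small StaffBranch instance into Staff and Branch by projection and then joins the two on branchNo. The sample tuples are assumptions; the check succeeds here because each branchNo has a single bAddress in the sample.

```python
# StaffBranch (staffNo, sName, position, salary, branchNo, bAddress)
staff_branch = {
    ("SG37", "Ann Beech", "Assistant", 12000, "B003", "163 Main St, Glasgow"),
    ("SG14", "David Ford", "Supervisor", 18000, "B003", "163 Main St, Glasgow"),
    ("SA9", "Mary Howe", "Assistant", 9000, "B007", "16 Argyll St, Aberdeen"),
}

# Project StaffBranch onto the Staff and Branch schemas.
staff = {(s, n, p, sal, b) for (s, n, p, sal, b, _) in staff_branch}
branch = {(b, addr) for (_, _, _, _, b, addr) in staff_branch}

# Natural join of Staff and Branch on branchNo.
rejoined = {
    (s, n, p, sal, b, addr)
    for (s, n, p, sal, b) in staff
    for (b2, addr) in branch
    if b == b2
}

print(len(branch))               # 2: each branch address is stored once
print(rejoined == staff_branch)  # True: no tuples lost, none spuriously added
```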
Functional Dependencies
 Consider a relation with attributes A and B, where attribute B is functionally dependent on attribute A.
 If we know the value of A and we examine the relation that holds this dependency, we find only one value of B in all
the tuples that have a given value of A, at any moment in time.
 Thus, when two tuples have the same value of A, they also have the same value of B. However, for a given value of
B, there may be several different values of A.
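The definition can be tested against a sample of tuples. The Python sketch below uses a hypothetical helper holds and invented rows. Note that a functional dependency is a property of the relation's meaning, so a sample can only refute a dependency, never prove it.

```python
def holds(rows, lhs, rhs):
    """Return True if the sample rows do not violate lhs -> rhs.

    rows is a list of dicts; lhs and rhs are sequences of attribute names.
    """
    seen = {}
    for row in rows:
        key = tuple(row[a] for a in lhs)
        value = tuple(row[a] for a in rhs)
        # The first value seen for a key is remembered; any later
        # different value for the same key violates the dependency.
        if seen.setdefault(key, value) != value:
            return False
    return True


rows = [
    {"staffNo": "SG37", "branchNo": "B003"},
    {"staffNo": "SG14", "branchNo": "B003"},
    {"staffNo": "SA9", "branchNo": "B007"},
]

print(holds(rows, ["staffNo"], ["branchNo"]))  # True
print(holds(rows, ["branchNo"], ["staffNo"]))  # False: B003 has two staff
```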
Determinant
 Refers to the attribute, or group of attributes, on the left-hand side of the arrow of a functional dependency.
 When a functional dependency exists, the attribute or group of attributes on the left-hand side of the arrow is called
the determinant.
 For example, in Figure 14.4, A is the determinant of B.

Full functional dependency
 Indicates that if A and B are attributes of a relation, B is fully functionally dependent on A if B is
functionally dependent on A, but not on any proper subset of A.
 A functional dependency A → B is a full functional dependency if removal of any attribute from A
results in the dependency no longer existing.
 A functional dependency A → B is a partial dependency if there is some attribute that can be
removed from A and yet the dependency still holds.
Transitive dependency
 A condition where A, B, and C are attributes of a relation such that if A → B and B → C, then C is
transitively dependent on A via B (provided that A is not functionally dependent on B or C).
Example of a full functional dependency
 Consider the following functional dependency that exists in the Staff relation of Figure 14.2:
 staffNo, sName → branchNo
 It is correct to say that each value of (staffNo, sName) is associated with a single value of branchNo.
 However, it is not a full functional dependency, because branchNo is also functionally dependent on a subset of
(staffNo, sName), namely staffNo.
 In other words, the functional dependency shown in the example is an example of a partial dependency.
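The test for a full dependency can be sketched using attribute closure over a declared set of functional dependencies. In the Python snippet below, the helper names closure and is_full are assumptions, and the only dependency declared is staffNo → sName, position, salary, branchNo for the Staff relation.

```python
def closure(attrs, fds):
    """Attribute closure of attrs under fds, a list of (lhs, rhs) set pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def is_full(lhs, rhs, fds):
    """lhs -> rhs is full if dropping any one attribute of lhs breaks it."""
    lhs, rhs = set(lhs), set(rhs)
    return all(not rhs <= closure(lhs - {a}, fds) for a in lhs)


fds = [
    (frozenset({"staffNo"}),
     frozenset({"sName", "position", "salary", "branchNo"})),
]

# Partial: branchNo still follows after sName is removed from the determinant.
print(is_full({"staffNo", "sName"}, {"branchNo"}, fds))  # False

# Full: nothing can be removed from the single-attribute determinant.
print(is_full({"staffNo"}, {"branchNo"}, fds))           # True
```

Dropping one attribute at a time is sufficient, because any smaller subset that determines the right-hand side is contained in one of these reduced determinants.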
Example of a transitive functional dependency
 Consider the following functional dependencies within the StaffBranch relation:
 staffNo → sName, position, salary, branchNo, bAddress
 branchNo → bAddress
 The transitive dependency branchNo → bAddress exists on staffNo via branchNo.
 In other words, the staffNo attribute functionally determines the bAddress via the branchNo attribute and neither
branchNo nor bAddress functionally determines staffNo.
First Normal Form (1NF)
 Unnormalized Form (UNF): a table that contains one or more repeating groups.
 First Normal Form (1NF): a relation in which the intersection of each row and column contains one and only one value.
 We begin the process of normalization by first transferring the data from the source (for example, a standard data
entry form) into table format with rows and columns.
 To transform the unnormalized table to First Normal Form, we identify and remove repeating groups within the
table.
 A repeating group is an attribute, or group of attributes, within a table that occurs with multiple values for a single
occurrence of the nominated key attribute(s) for that table.
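One way to remove a repeating group is to flatten the table so that each row holds a single value. The Python sketch below assumes a hypothetical unnormalized table keyed on clientNo with a repeating propertyNo attribute; the function name to_1nf and the sample values are illustrative only.

```python
# Unnormalized table: propertyNo repeats for each clientNo (hypothetical data).
unnormalized = [
    {"clientNo": "CR76", "cName": "John Kay",
     "propertyNo": ["PG4", "PG16"]},
    {"clientNo": "CR56", "cName": "Aline Stewart",
     "propertyNo": ["PG4", "PG36", "PG16"]},
]


def to_1nf(table, repeating):
    """Emit one row per value of the repeating attribute."""
    flat = []
    for row in table:
        for value in row[repeating]:
            new_row = dict(row)         # copy the non-repeating attributes
            new_row[repeating] = value  # keep a single value per cell
            flat.append(new_row)
    return flat


for row in to_1nf(unnormalized, "propertyNo"):
    print(row)
```

After flattening, the key of the table is no longer clientNo alone but the combination (clientNo, propertyNo), and the repeated cName values are the kind of redundancy that the later normal forms remove.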
Second Normal Form (2NF)

 A relation that is in first normal form and every non-primary-key attribute is fully functionally dependent on the
primary key.
 The normalization of 1NF relations to 2NF involves the removal of partial dependencies.
 If a partial dependency exists, we remove the partially dependent attribute(s) from the relation by placing them in a
new relation along with a copy of their determinant.
Third Normal Form (3NF)

 A relation that is in first and second normal form and in which no non-primary-key attribute is transitively
dependent on the primary key.
 Although 2NF relations have less redundancy than those in 1NF, they may still suffer from update anomalies.
 For example, if the same value is stored in two tuples and we update only one tuple and not the other, the database
would be in an inconsistent state. This update anomaly is caused by a transitive dependency.
 We need to remove such dependencies by progressing to third normal form.
