
Contents

Section 1
  Module 2
  Module 3
  Module 4
  Module 5
  Module 6
Section 2
  Module 1
  Module 2
  Module 3
  Module 4
  Module 5
Section 3
  Module 1
  Module 2
  Module 3
  Module 4
  Module 5
  Module 6
  Module 8
  Module 9
Section 1
Module 2

Data: Collected facts about a topic or item


Data is raw material from which you can draw conclusions; facts from which you can deduce new
facts.

Information: The result of combining, comparing and performing calculations on data


Information is knowledge, intelligence, a particular piece of data with a special meaning or function.
Information is often the result of combining, comparing, and performing calculations on data.

The difference between data and information can be explained by using an example such as test
scores. In one class, if every student receives a numbered score, the scores can be calculated to
determine a class average. The class averages can be calculated to determine the school average. So in
this scenario, how can you differentiate between data and information?

Data: each student's test score is one piece of data.

Information: the class's average score or the school's average score.
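The test-score example can be expressed as a short Python sketch: the individual scores are the data, and the class and school averages computed from them are the information. The class names and score values below are invented for illustration.

```python
from statistics import mean

# Data: raw test scores per student, grouped by class (hypothetical values).
scores_by_class = {
    "Class A": [70, 80, 90],
    "Class B": [60, 75, 90],
}

def class_averages(scores):
    """Information: the average score of each class."""
    return {name: mean(values) for name, values in scores.items()}

def school_average(scores):
    """Information: the average of the class averages."""
    return mean(list(class_averages(scores).values()))

print(class_averages(scores_by_class))  # {'Class A': 80, 'Class B': 75}
print(school_average(scores_by_class))  # 77.5
```

Here the school average is taken as the average of the class averages, as described above; this equals the average of all individual scores only when every class has the same number of students, as in this sample.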

Database:
• is a centralized and structured set of data stored on a computer system;
• provides facilities for retrieving, adding, modifying and deleting data when required;
• provides facilities for transforming retrieved data into useful information.

Every organization needs to collect and maintain data to meet its requirements. An information system
can be defined as a formal system for storing and processing data. Most organizations today use a
database to automate their information systems. A database is an organized collection of data put
together as a unit. The purpose of a database is to collect, store, and retrieve related data for use by
database applications. A database application is a software program that interacts with a database to
access and manipulate data. A database is usually managed by a database administrator (DBA).
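The database facilities listed earlier (retrieving, adding, modifying and deleting data, and turning retrieved data into information) can be sketched with Python's built-in `sqlite3` module. SQLite is used only because it needs no separate server; the `students` table and its values are hypothetical, and the same statements would look similar, but not necessarily identical, in another DBMS such as Oracle Database.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # throwaway in-memory database
conn.execute(
    "CREATE TABLE students (student_id INTEGER PRIMARY KEY, name TEXT, score INTEGER)")

# Adding data
conn.executemany("INSERT INTO students VALUES (?, ?, ?)",
                 [(110, "Jones", 70), (111, "Smith", 85)])
# Modifying data
conn.execute("UPDATE students SET score = 75 WHERE student_id = 110")
# Deleting data
conn.execute("DELETE FROM students WHERE student_id = 111")

# Retrieving data, then transforming it into information (an average)
rows = conn.execute("SELECT student_id, name, score FROM students").fetchall()
avg = conn.execute("SELECT AVG(score) FROM students").fetchone()[0]
print(rows)  # [(110, 'Jones', 75)]
print(avg)   # 75.0
```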

A Database Management System (DBMS) has the following elements:
• The kernel code manages memory and storage for the DBMS.
• The repository of metadata is called a data dictionary.
• The query language enables applications to access the data.
Key Computing Terms:
• Hardware: The physical "bits and pieces" of a computer; for example, keyboard, screen, mouse, disk drive, memory.
• Software: Programs (sets of instructions) that tell the hardware what to do.
• Operating system: A software program that directly controls and manages the hardware; for example, Microsoft Windows.
• Application: A software program that carries out specific tasks on behalf of computer users.
• Client: A workstation or desktop computer, including a screen, a keyboard, and a mouse. Clients communicate directly with human computer users.
• Server: A more powerful computer that accepts work requests from clients, does the work, and sends results back to the client.
• Every time you request information from a web page, your client computer sends the request to a database on the server. The server retrieves the data from the database, converts it into useful information, and sends the results back to the client.

Transformation in Computing:
1. 1970’s: Mainframe computing;
2. 1980’s: Desktop computing;
3. 1990’s: Client/server computing;
4. 2000’s: Grid computing.
Early computer applications focused on tasks that were clerical in nature; for example, payroll,
accounting, and inventory. These applications accessed data stored in computer files, converted the
data into meaningful information, and generated reports to fulfill the organization's requirements.
These systems were called file-based systems. The decades-long evolution in computer technology, coupled with the needs and demands of organizations, has driven the development of database technology, starting from primitive file-based systems.
1970’s: Mainframe computing
In the 1970s, attempts were made to build database systems with integrated hardware and software.
Smaller computers, or "dumb terminals," were used to access the large mainframe and execute
commands. The terminals depended on the mainframe and displayed the results only after the processing was complete.
1980’s: Desktop computing
As PCs became faster and widely available, processing moved from mainframes to clients. Because
PCs had their own software and were capable of doing some processing on their own, they came to be
known as "smart clients" or "workstations." Having the processing power within the client machine
ushered in a wave of graphical user interface (GUI) applications. Many of today’s common
applications (Word, Excel, PowerPoint) were created during this era.
1990’s: Client/server computing
Upgrading was one of the issues with multiple applications on multiple client workstations: an upgrade to a software application meant that every server and every client had to be upgraded.
To resolve this issue, one of the strategies adopted in the 1990s was to use the Internet and fast processing servers to meet organizations' needs for storing data and producing information. This is how it worked:
o The software that manages the data is on the database server, which performs the processing for storage and retrieval.
o Applications for business operations sit on the application server, which performs the processing for creating documents and for interacting with or manipulating the data.
o Clients can have applications of their own, but the essential business applications are accessed from the clients through an Internet browser.
o This also paved the way for grid computing.

2000’s: Grid computing


In the grid-computing model, all of an organization's computers in different locations can be utilized
just like a pool of computing resources. Grid computing builds a software infrastructure that can run
on a large number of networked servers.
A user makes a request for information or computation from his or her workstation and that request is
processed somewhere in the grid as efficiently as possible.
Grid computing treats computing as a utility, like the electric company. You don't know where the
generator is or how the electric grid is wired. You just ask for electricity and you get it.
Common uses of databases:
– For keeping track of purchases on credit and debit cards, which helps generate monthly statements.
– For integrating heterogeneous information sources for business-related activities, such as online shopping, booking of holiday packages, and doctor consultations.
– In the health-care industry, to maintain and track patient health-care details.
– In digital publishing and digital libraries, to manage and deliver textual and multimedia data.
– In finance and trading, for storing information about sales and purchases of stocks and bonds, or online trading.
– At organizations, for storing information about their employees, salaries, benefits, taxes, …
Module 3
Database Development Process
Data modeling is the first part of the database development process. Conceptual data modeling is the
examination of a business and business data to determine the structure of business information and the
rules that govern it. This structure forms the basis for database design. A conceptual model is
relatively stable over long periods of time. Physical data modeling (or database building) is concerned
with implementation in a given technical software and hardware environment. The physical
implementation is highly dependent on the current state of technology and is subject to change as
available technologies rapidly change.

Types of Database Models


1. Flat file;
2. Hierarchical;
3. Relational;
4. Network;
5. Object-oriented.
Flat File:

• A flat file database is a database designed around a single table. The flat file design puts all database information in one table, or list, with fields to represent all parameters.
• A flat file may contain many fields, often with duplicate data that is prone to data corruption. If you decide to merge data between two flat files, you need to copy and paste relevant information from one file to the other.
• There is no automation between flat files. If you have two or more flat files that contain client addresses, for example, and a client moved, you would have to manually modify the address parameters in each file that contains that client's information.
• Changing information in one file has no bearing on other files.
• Flat files offer the functionality to store information, manipulate fields, print or display formatted information, and exchange information with others through email and over the Internet. Some flat files may be attached to external files, such as text editors, to extend functionality and manage related information.
• Flat file databases are generally in plain-text form, where each line holds only one record. The fields in the record are separated by delimiters, such as tabs and commas.
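A minimal sketch of the flat file problem described above, using Python's `csv` module. The two files, their column names, and the client data are hypothetical; each is held as a string of comma-delimited records, one record per line, and updating one file does nothing to the other.

```python
import csv
import io

# Two hypothetical flat files: one record per line, fields separated by commas.
orders_file = "client,address,item\nAcme,1 Main St,Widget\nAcme,1 Main St,Gadget\n"
invoices_file = "client,address,amount\nAcme,1 Main St,250\n"

def read_flat_file(text):
    """Parse a comma-delimited flat file into a list of dicts (one per record)."""
    return list(csv.DictReader(io.StringIO(text)))

def move_client(records, client, new_address):
    """Update the address in every record of ONE file; other files are untouched."""
    for record in records:
        if record["client"] == client:
            record["address"] = new_address
    return records

orders = move_client(read_flat_file(orders_file), "Acme", "9 Oak Ave")
invoices = read_flat_file(invoices_file)  # this file was never updated

print({r["address"] for r in orders})    # {'9 Oak Ave'}
print({r["address"] for r in invoices})  # {'1 Main St'}
```

After the update, the same client has two different addresses across the two files, which is exactly the inconsistency that manual maintenance of flat files invites.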
Hierarchical Model:
In a hierarchical database model, the data is organized in a tree-like structure. The data is stored as
records that are connected to one another through links. A record is a collection of fields. Each field
contains only one value. The entity type of a record defines which fields the record contains.
A record in the hierarchical database model corresponds to a row in the relational database model. An
entity type corresponds to a table.

In a hierarchical database model:


1. Each child record has only one parent.
2. A parent record can have one or more child records.

To retrieve data from a hierarchical database, the whole tree needs to be traversed, starting from the
root node.
Advantages:
1. Easy addition and deletion of new information
2. Faster access to data at the top of the hierarchy

Disadvantages:
1. Increased storage space
2. Slower access to data at the bottom of the hierarchy
The slide depicts an example of a hierarchical model, where the root node represents the Teacher entity. A teacher could be employed as part time, full time, or contractor. Each parent node has child nodes. For example:
• The “Teacher” parent node has three child nodes that hold information about the type of employment.
• The “Full Time” parent node has three child nodes that hold information about the subjects taught by full-time teachers.
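The Teacher hierarchy from the slide can be sketched as a tree in Python, where each node has exactly one parent and may have several children, and a search has to start at the root. The subject names under Full Time are placeholders, since the slide's actual values are not reproduced here, and the `Node` class and `find_path` function are illustrative names, not part of any database product.

```python
from dataclasses import dataclass, field

@dataclass
class Node:
    """A record in a hierarchical model: one parent, any number of children."""
    name: str
    children: list = field(default_factory=list)

    def add(self, child):
        self.children.append(child)
        return child

def find_path(node, target, path=()):
    """Depth-first search starting at the root; returns the path of names, or None."""
    path = path + (node.name,)
    if node.name == target:
        return path
    for child in node.children:
        found = find_path(child, target, path)
        if found:
            return found
    return None

teacher = Node("Teacher")
teacher.add(Node("Part Time"))
full_time = teacher.add(Node("Full Time"))
teacher.add(Node("Contractor"))
for subject in ("Math", "Science", "History"):  # hypothetical subject names
    full_time.add(Node(subject))

print(find_path(teacher, "Science"))  # ('Teacher', 'Full Time', 'Science')
```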
Network Model:

The network model is a database model that can be regarded as a flexible way of representing objects
and their relationships. A network database consists of a collection of records connected to one another through links. Each record is a collection of fields, each of which contains only one data value. A link is an association between two records.
In the network model diagram shown in the slide:
• Boxes correspond to fields, such as Bank ID, State, and City.
• Lines correspond to links that connect one record to another.

In the hierarchical database model, data is represented as a tree of records, with each record having
one parent record and many children. In a network database model each record can have multiple
parent and child records, forming a generalized graph structure. The network model enables a more
natural way of modeling the relationship between records.
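By contrast with a tree, a network model lets a record have several parents. The sketch below, using hypothetical customer, branch, and account records, stores the links as parent/child pairs and shows one record reachable from three different parents, which a strict hierarchy cannot express.

```python
from collections import defaultdict

# Hypothetical records connected by links (parent record, child record).
links = [
    ("Customer Ann", "Account 1"),
    ("Customer Bob", "Account 1"),   # Account 1 has more than one parent
    ("Customer Ann", "Account 2"),
    ("Branch North", "Account 1"),
]

children = defaultdict(set)
parents = defaultdict(set)
for parent, child in links:
    children[parent].add(child)
    parents[child].add(parent)

print(sorted(parents["Account 1"]))      # ['Branch North', 'Customer Ann', 'Customer Bob']
print(sorted(children["Customer Ann"]))  # ['Account 1', 'Account 2']
```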
Relational model

A relational model describes a database in terms of tables, columns, rows, and joins between tables.
Here are some key points about the relational model:
• Data is represented as a collection of tables.
• Each column represents an attribute that belongs to the table. For example, the student table could have name, address, student ID, Birth_Date, and so on.
• Each row represents an instance of the table. For example, in the student table, the row 110, Jones, 12 Oxford Street, 03-03-66 represents one student instance.
• A table is visually represented as a grid of columns and rows.
• Every table has a field or a set of fields that uniquely identifies each row. For example, in the student table, the student ID column uniquely identifies each student.

Features of a relational database model:
• The order of the rows and columns is not important.
• Every row is unique: no two rows contain exactly the same set of values.
• Each field can contain only one value.
• Values within a column are from the same domain. (For example, a column defined as a date column would not contain a salary amount.)
• Table names must be unique, and column names within each table must be unique.

A relational model represents a database as collections of records that are stored in tables. Each
relational database table contains rows of records and columns with fields of information about each
record.
Each table of records will have a relationship with another table of records when the two tables share a
field (or column).
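The student table and its identifying column can be sketched in SQL through Python's `sqlite3` module. The column names follow the example above; storing the birth date as an ISO text string is an assumption, because SQLite has no dedicated date type. The second insert reuses student ID 110, so the database rejects it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE student (
        student_id INTEGER PRIMARY KEY,  -- uniquely identifies each row
        name       TEXT NOT NULL,
        address    TEXT,
        birth_date TEXT                  -- ISO date string (assumption)
    )""")
conn.execute("INSERT INTO student VALUES (110, 'Jones', '12 Oxford Street', '1966-03-03')")

try:
    # A second row with the same key would break the "every row is unique" rule.
    conn.execute("INSERT INTO student VALUES (110, 'Smith', '1 High Street', '1970-01-01')")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

print(duplicate_rejected)  # True
print(conn.execute("SELECT * FROM student").fetchall())
# [(110, 'Jones', '12 Oxford Street', '1966-03-03')]
```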
Object-Oriented Model:
An object-oriented data model consists of the following basic object-oriented concepts:
• An entity is modeled as an object.
• Every object has a state (the set of values for the attributes of the object) and a behavior (the set of methods that operate on the state of the object). The values and methods in an object can be accessed or invoked outside the object only through explicit implementation. The relationship between the objects is through sharing of access rather than through pointers or joins.
• An object must belong to only one class as an instance of that class.
• You can derive a new class (subclass) from an existing class (superclass).
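The object-oriented concepts above map onto classes in an object-oriented language. In this hypothetical Python sketch, `Person` holds state (attributes) and behavior (a method), and `Teacher` is a subclass derived from it; the names and values are illustrative only.

```python
class Person:
    """An entity modeled as an object: state (attributes) plus behavior (methods)."""

    def __init__(self, name, email):
        self._name = name    # state, used through the methods below
        self._email = email

    def contact_label(self):
        """Behavior that operates on the object's state."""
        return f"{self._name} <{self._email}>"


class Teacher(Person):
    """Subclass derived from the Person superclass; it inherits state and behavior."""

    def __init__(self, name, email, subject):
        super().__init__(name, email)
        self._subject = subject

    def contact_label(self):
        return f"{super().contact_label()} teaches {self._subject}"


t = Teacher("Jones", "jones@example.com", "Math")
print(t.contact_label())  # Jones <jones@example.com> teaches Math
print(type(t).__name__)   # Teacher: the object is an instance of exactly one class
```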

Advantages:
• Reduced maintenance
• Real-world modeling
• High code reusability

Disadvantages:
• Many information systems do not benefit from object-oriented modeling, because it is best suited to dynamic, interactive environments.
Module 4
Relational database:
• Stores information in tables with rows and columns;
• A table is a collection of records;
• A row is called a record or instance;
• A column is referred to as a field or attribute.
Rules for Relational Databases:
• Each table has a distinct name;
• Each table may contain multiple rows;
• Each table has a value to uniquely identify the rows;
• Each column in a table has a unique name.
When you design a relational database, it is important to minimize redundant data (that is, the same data stored in more than one place in the database).
Normalization:
• The process of organizing the attributes and tables of a relational database to minimize redundancy;
• Helps in handling insert, update, and delete anomalies and ensures better performance of the database.
In relational database design, the process of organizing data to minimize redundancy is called
normalization. In other words, normalization can be defined as the process of decomposing a table
with inconsistencies into smaller, well-structured tables. Normalization may involve splitting a
table into two or more tables and defining relationships between the tables. The objective is to isolate
data so that additions, deletions, and modifications of an attribute or a field can be made in just one
table and then propagated throughout the rest of the database by using well-defined relationships.
Edgar F. Codd, the inventor of the relational model, introduced the concept of normalization.
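As a sketch of decomposition, the following Python/`sqlite3` example starts from a hypothetical enrollment table that repeats each student's name for every course, then splits it into a `student` table and an `enrollment` table linked by `student_id`. After the split, changing a student's name touches a single row instead of every enrollment. The table and column names are illustrative.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Insufficiently normalized: the student's name is repeated for every course.
conn.execute("CREATE TABLE enrollment_flat (student_id INTEGER, student_name TEXT, course TEXT)")
conn.executemany("INSERT INTO enrollment_flat VALUES (?, ?, ?)", [
    (110, "Jones", "Math"),
    (110, "Jones", "History"),
    (111, "Smith", "Math"),
])
rows_holding_name = conn.execute(
    "SELECT COUNT(*) FROM enrollment_flat WHERE student_id = 110").fetchone()[0]

# Decompose into two smaller tables linked by student_id.
conn.executescript("""
    CREATE TABLE student (
        student_id   INTEGER PRIMARY KEY,
        student_name TEXT NOT NULL
    );
    CREATE TABLE enrollment (
        student_id INTEGER REFERENCES student(student_id),
        course     TEXT,
        PRIMARY KEY (student_id, course)
    );
    INSERT INTO student SELECT DISTINCT student_id, student_name FROM enrollment_flat;
    INSERT INTO enrollment SELECT student_id, course FROM enrollment_flat;
""")

# The name now lives in exactly one row, so a change is made in one place.
updated = conn.execute(
    "UPDATE student SET student_name = 'Jones-Smith' WHERE student_id = 110").rowcount
rows = conn.execute("""
    SELECT s.student_name, e.course
    FROM enrollment AS e JOIN student AS s ON s.student_id = e.student_id
    ORDER BY e.course, s.student_name""").fetchall()

print(rows_holding_name, updated)  # 2 1
print(rows)  # [('Jones-Smith', 'History'), ('Jones-Smith', 'Math'), ('Smith', 'Math')]
```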
Objectives of Normalization:
• To free the collection of tables from undesirable insertion, update, and deletion dependencies;
• To reduce the need for restructuring the collection of relations as new types of data are introduced, and thus increase the life span of application programs;
• To make the relational model more informative to users;
• To make the collection of tables neutral to the query statistics, where these statistics are liable to change as time goes by.
Here are some characteristics of insufficiently normalized tables:
• The same information can be expressed on multiple rows; therefore, updates to the table may result in logical inconsistencies.
• There are circumstances where certain facts cannot be recorded at all.
• Under certain circumstances, deletion of data representing certain facts necessitates deletion of data representing completely different facts.
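The first two characteristics can be demonstrated with a small, hypothetical table in which an instructor's phone number is repeated for every course they teach (a sketch using Python's `sqlite3`). Updating only one of the rows leaves two different phone numbers for the same instructor, and because `course` is the key, an instructor with no course yet cannot be recorded at all.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical, insufficiently normalized table: the instructor's phone
# number is repeated on every course row.
conn.execute(
    "CREATE TABLE course_flat (course TEXT PRIMARY KEY NOT NULL, instructor TEXT, phone TEXT)")
conn.executemany("INSERT INTO course_flat VALUES (?, ?, ?)", [
    ("Math", "Lee", "555-0100"),
    ("Physics", "Lee", "555-0100"),
])

# Update anomaly: changing the phone on only one row leaves Lee with two numbers.
conn.execute("UPDATE course_flat SET phone = '555-0199' WHERE course = 'Math'")
lee_phones = conn.execute(
    "SELECT DISTINCT phone FROM course_flat WHERE instructor = 'Lee' ORDER BY phone"
).fetchall()
print(lee_phones)  # [('555-0100',), ('555-0199',)]

# Insertion anomaly: an instructor who has no course yet cannot be stored,
# because every row needs a course value as its key.
try:
    conn.execute("INSERT INTO course_flat VALUES (NULL, 'Kim', '555-0123')")
    kim_recorded = True
except sqlite3.IntegrityError:
    kim_recorded = False
print(kim_recorded)  # False
```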

When a fully normalized database structure is extended to allow it to accommodate new types of data,
the preexisting aspects of the database structure can remain largely or entirely unchanged. As a result,
applications interacting with the database are minimally affected. Normalized tables are suitable for
general-purpose querying; that is, any queries against these tables, including future queries whose
details cannot be anticipated, are supported.

Advantages of a Relational Database:
• Avoids duplication of data;
• Ensures consistency of the data that is stored as records;
• Easier to modify data and data format;
• Easier to insert and delete data;
• Easier to maintain security of data.

Data integrity:
• Ensures that data is accurate;
• Ensures that data is consistent;
• Is achieved through normalization, defined business rules and validated data.

There are three main types of data integrity, plus user-defined integrity:
• Entity integrity ensures that each row (record) is a unique instance in a particular table by enforcing the identifier column(s) of a table; for example, Employee ID, Student ID, and Social Security Number (SSN).
• Domain integrity ensures that valid data is entered for a column by enforcing the data type, the data format, and the range of possible values. For example, the value in the Salary column of the EMPLOYEES table must be numeric.
• Referential integrity preserves the defined relationships between tables when records are inserted, updated/modified, or deleted by ensuring that the key values are consistent across tables. This kind of consistency requires that there are no references to nonexistent values. If a key value changes, all references to that key value change consistently throughout the database.
• User-defined integrity ensures that the data stored in a database complies with the rules of the business.
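The integrity types above correspond to declarative constraints in SQL. This sketch uses Python's `sqlite3` module with hypothetical `departments` and `employees` tables: a primary key for entity integrity, a type check for domain integrity, a foreign key for referential integrity, and a salary ceiling standing in for a user-defined business rule (the 100,000 limit is invented for the example). SQLite enforces foreign keys only after `PRAGMA foreign_keys = ON`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite checks foreign keys only when enabled
conn.executescript("""
    CREATE TABLE departments (
        department_id INTEGER PRIMARY KEY                                    -- entity integrity
    );
    CREATE TABLE employees (
        employee_id   INTEGER PRIMARY KEY,                                   -- entity integrity
        salary        NUMERIC CHECK (typeof(salary) IN ('integer', 'real')), -- domain integrity
        department_id INTEGER REFERENCES departments(department_id),         -- referential integrity
        CHECK (salary <= 100000)                           -- hypothetical user-defined rule
    );
    INSERT INTO departments VALUES (10);
    INSERT INTO employees VALUES (1, 5000, 10);
""")

def rejected(sql):
    """Return True if the statement violates one of the declared constraints."""
    try:
        conn.execute(sql)
        return False
    except sqlite3.IntegrityError:
        return True

dup_id = rejected("INSERT INTO employees VALUES (1, 6000, 10)")          # duplicate ID
bad_salary = rejected("INSERT INTO employees VALUES (2, 'high', 10)")    # not numeric
orphan = rejected("INSERT INTO employees VALUES (3, 4000, 99)")          # no department 99
over_limit = rejected("INSERT INTO employees VALUES (4, 200000, 10)")    # breaks business rule
parent_delete = rejected("DELETE FROM departments WHERE department_id = 10")  # still referenced

print(dup_id, bad_salary, orphan, over_limit, parent_delete)  # True True True True True
```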
Module 5
Four logical units of database space allocation:
• Data blocks;
• Extents;
• Segments;
• Tablespaces.
Data blocks are the smallest units of storage that Oracle Database can use or allocate. At the finest
level of granularity, Oracle Database stores data in data blocks. One logical data block corresponds to
a specific number of bytes of physical disk space.
An extent is a set of logically contiguous data blocks allocated for storing a specific type of
information.
A segment is a set of extents allocated for a specific database object, such as a table. For example,
the data for the Employees table is stored in its own data segment. Every database object that
consumes storage consists of a single segment.
Each segment belongs to one and only one tablespace. Thus, all extents for a segment are stored in
the same tablespace.

Data block:
• The smallest logical storage unit of a database;
• A single data block represents a specific number of bytes on the physical hard disk;
• The size of a data block is generally a multiple of the operating system block size.
The Oracle Database manages the logical storage space in the data files of a database in units called
data blocks, also called Oracle blocks or pages. A data block is the minimum unit of database I/O. At
the physical level, database data is stored in disk files made up of operating system blocks. An
operating system block is the minimum unit of data that the operating system can read or write. In
contrast, an Oracle block is a logical storage structure whose size and structure are not known to the
operating system.

A data block has the following format:
• Header: holds general information, such as the block address and the type of segment.
• Table Directory: contains information about the tables that have rows in the block.
• Row Directory: contains information about the actual rows contained in the block.
• Free Space: the space still available in the data block.
• Row Data: contains table or index data.

The first three components of a data block (Header, Table Directory, and Row Directory) are
collectively known as Overhead.
Extent:
• A logical unit of database storage space allocation, made up of contiguous data blocks.
Data blocks in an extent are logically contiguous, but can be physically spread out on disk. By default,
the database allocates an initial extent for a data segment when the segment is created. The first data
block of every segment contains a directory of the extents in the segment.
Segment:
A segment is a set of extents that have been allocated for a specific type of data structure and that are
stored in the same tablespace. For example, each table's data is stored in its own data segment, while
each query's data is stored in a temporary segment. Oracle allocates space for segments in extents.
Oracle allocates another extent when the existing extents of a segment become full. The extents of a
segment may or may not be contiguous on disk, because extents are allocated on an as-needed basis.
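The relationship between blocks, extents, and segments can be illustrated with a toy Python model: a segment starts with one initial extent, fills its blocks, and is given another extent when the existing ones are full. This is a conceptual sketch only, not Oracle Database's actual space-management algorithm; the extent size, the rows per block, and the `USERS` tablespace name are assumptions chosen to keep the arithmetic simple.

```python
from dataclasses import dataclass, field

BLOCKS_PER_EXTENT = 8  # hypothetical extent size, in data blocks
ROWS_PER_BLOCK = 4     # hypothetical number of rows that fit in one block

@dataclass
class Extent:
    """A set of logically contiguous data blocks (each block modeled as a list of rows)."""
    blocks: list = field(default_factory=lambda: [[] for _ in range(BLOCKS_PER_EXTENT)])

@dataclass
class Segment:
    """Storage for one object (such as a table); belongs to exactly one tablespace."""
    name: str
    tablespace: str
    extents: list = field(default_factory=lambda: [Extent()])  # initial extent at creation

    def insert(self, row):
        for extent in self.extents:
            for block in extent.blocks:
                if len(block) < ROWS_PER_BLOCK:
                    block.append(row)
                    return
        # Every existing extent is full: allocate another extent.
        self.extents.append(Extent())
        self.extents[-1].blocks[0].append(row)

employees = Segment("EMPLOYEES", tablespace="USERS")
for i in range(40):
    employees.insert(i)

print(len(employees.extents))  # 2
```

With 8 blocks of 4 rows each, one extent holds 32 rows, so inserting 40 rows forces the allocation of a second extent.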

Tablespace:

Tablespaces are the primary logical storage structures of any Oracle database. The usable data of an
Oracle database is logically stored in the tablespaces and physically stored in the data files associated
with the corresponding tablespace.
• An Oracle database consists of one or more logical storage units called tablespaces. The database's data is collectively stored in the database's tablespaces. Each tablespace in an Oracle database consists of one or more files called data files, which are physical structures that conform to the operating system in which an Oracle database is running.

Physical Storage Structures:

• A data file is a physical file on disk that was created by Oracle Database and contains data structures, such as tables and indexes. A temp file is a data file that belongs to a temporary tablespace. The data is written to these files in an Oracle proprietary format that cannot be read by other programs.
• A control file is a root file that tracks the physical components of the database.
• The online redo log is a set of files containing records of changes made to data.
• A database instance is a set of memory structures that manage database files.

Data file

Oracle Database stores database data in data files. Every database must have at least one data file.
Oracle Database allocates space for user data in tablespaces, which, like segments, are logical storage
structures. Each segment belongs to only one tablespace. Oracle Database physically stores tablespace
data in data files. Tablespaces and data files are closely related, but they have important differences:
• Each tablespace consists of one or more data files, which conform to the operating system in which the Oracle database is running.
• The data for a database is collectively stored in the data files located in each tablespace of the database.
• A segment can span one or more data files, but it cannot span multiple tablespaces.
• A database must have the SYSTEM and SYSAUX tablespaces. Oracle Database automatically allocates the first data files of any database for the SYSTEM tablespace during database creation.
• The SYSTEM tablespace contains the data dictionary, a set of tables that contains database metadata. Generally, a database also has an undo tablespace and a temporary tablespace (usually named TEMP).

Control Files:
• A control file is a small binary file associated with only one database. It contains:
o The database name and database unique identifier (DBID);
o The time stamp of database creation;
o Information about data files and online redo log files;
o Tablespace information;
o The current log sequence number;
o Metadata that must be accessible when the database is not open.

Each database has one unique control file, although it may maintain identical copies of it. The control
file is the root file that Oracle Database uses to find database files and to manage the state of the
database generally. The control file of an Oracle database is created at the same time as the database.

Online Redo Log File:
• Every instance of an Oracle database has an associated redo log to protect the database in case of an instance failure;
• The redo log for each database instance is also referred to as a redo thread.

Module 6

The basic component of a file in a file system is a data item. Examples of data items in the real world
are last name, first name, street address, and employee ID.
A database is a more complex object. It is a collection of interrelated stored data that must meet the
needs of many users. A database must also adhere to the business rules and processes of the
organization.
Advantages of using a database rather than a simple file system:
• Availability of data to a diverse group of users
• Integration of data for easier access and modification when performing complex transactions
• Data integrity and reduced data redundancy

A possible database solution is to split the flat file into relational database tables, eliminating the issues related to:
• Redundancy
• Data entry anomalies
• Inconsistency
Importance of business rules:
• Allow the developer/architect to understand the relationships and constraints of the participating entities;
• Help you understand the standardization procedure that an organization follows when handling huge amounts of data;
• Should be simple and easy to understand;
• Must be kept up to date.

Business rules are used to understand business processes and the nature, role, and scope of the data.
Business rules help you categorize and design database tables.
Business rules are generally provided by:
• Managers
• Policy makers
• Documentation and operation manuals
• Organizational procedures and standards
• Interviews with end users
Note: Not all business rules can be modeled in a database.
Business rule: It is used to understand business processes and the nature, role, and scope of the data
Assumption: It can be defined as a fact or a statement that has been taken for granted.
Problem: It can be defined as a situation or scenario that requires attention and a possible solution to
alleviate the situation.
Section 2
Module 1
Flat file database - single table:
• A flat file database is a type of database that stores data in a single table;
• Flat file databases are generally in plain-text form, where each line holds only one record.
Unlike a relational database, which makes use of multiple tables and relations, a flat file database
stores data in a single table. Flat file databases are generally in plain-text form, where each line holds
only one record. The fields in the record are separated by delimiters such as tabs and commas.

Flat file database tables can be set in various application types, including HTML documents, simple
word documents, or worksheets in spreadsheet applications. The tables can be sorted based on column
values. These tables serve as a solution for simple database tasks. The flat file design puts all database
information in one table, or list, with fields to represent all parameters. A flat file may contain many
fields, often with duplicated data that is prone to data corruption.

If you decide to merge data between two flat files, you need to copy and paste relevant information
from one file to the other. There is no automation between flat files. For example, if two or more flat
files contain client addresses and a client moved, you would have to manually modify the address
parameters in each file that contains the client's information. Changing information in one file has no
bearing on other files.

Flat files offer the functionality to store information, manipulate fields, print or display formatted
information, and exchange information with others, through email and over the Internet. Some flat
files may be attached to external files, such as text editors, to extend functionality and manage related information.
Advantages of a flat file database:
• Easy to understand;
• Easy to implement;
• Easy to extract information;
• All records stored in one place;
• Simple sorting and filtering of reports;
• Lower hardware and software requirements.
Disadvantages of a flat file database:
• Less security;
• Data inconsistency;
• Data redundancy;
• Cumbersome sharing of information;
• Slow for huge databases.
Module 2
Relational database:

 Presents the information in tables with rows and columns;
 Each column represents a particular type of information (a field), and each row lists one record;
 The tables are then related to one another by using foreign keys;
 A foreign key is a column that holds values of the primary key of a different table.
A relational database management system (RDBMS) stores data in tables. Each table is given a name
by the user who creates the table. The user generally chooses a name that correlates to the data that
will be stored in the table; for example, STUDENTS, EMPLOYEES, LOCATIONS. When a table is
created, the user also creates and names columns related to the specific characteristics that are stored
for each record.
To understand how important databases have become in today's world, consider the following
statistics:
 Currently 20% of the world's data resides in RDBMSs.
 In the next two years, databases are expected to grow larger than 100 terabytes. A database
this big would be able to store 100,000 copies of the Encyclopedia Britannica or 200,000
hours of music or about 10 billion webpages.
 Some of the world's 10 largest databases that use the Oracle RDBMS are:
 France Telecom, 29.2 TB: a communications company (a TB, or terabyte, is equivalent
to 1,000 gigabytes)
 Amazon.com, 13 TB: an online company selling books and merchandise
 The Claria Corporation, 12 TB: an Internet behavioral marketing company tracking
Internet user behavior

Advantages of a relational database:


Compared to the flat file system, the RDBMS has several advantages:
 Less redundancy: In a flat file system, there is much redundancy. For example, the names of
university professors and students are stored in more than one file.
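 Avoidance of inconsistency: If the same piece of information is stored in more than one
place, then any change in the data must be made in every place where the data is stored; in a
well-designed database each piece of information is stored once, so a change is made in only one place.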
 Efficiency: A database is usually more efficient than a flat file system, because a piece of
information is stored in fewer locations.
 Data integrity: In a database system, it is easier to maintain data integrity because strong data
types are assigned to each column.
 Confidentiality: It is easier to maintain the confidentiality of the information if data storage is
centralized in one location.

Relational tables:
 A table is a simple structure in which data is organized and stored.
 Table: A basic storage structure
 Column: An attribute that describes the information in the table
 Primary Key: The unique identifier for each row
 Foreign Key: A column that refers to a primary key column in another table
 Row: Data for one table instance
 Field: The one value found at the intersection of a row and a column
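A minimal sketch of these terms using Python's built-in `sqlite3` module, assumed here as a portable stand-in for Oracle; the `departments` and `employees` tables and their data are hypothetical. SQLite's `PRAGMA foreign_key_list` is used to show which parent table and column the foreign key refers to.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Table: the basic storage structure. Each column is an attribute.
conn.execute("""
    CREATE TABLE departments (
        department_id   INTEGER PRIMARY KEY,   -- primary key
        department_name TEXT NOT NULL
    )""")
conn.execute("""
    CREATE TABLE employees (
        employee_id   INTEGER PRIMARY KEY,     -- primary key
        last_name     TEXT NOT NULL,
        department_id INTEGER REFERENCES departments (department_id)  -- foreign key
    )""")

# Rows: one per table instance.
conn.execute("INSERT INTO departments VALUES (10, 'Sales')")
conn.execute("INSERT INTO employees VALUES (100, 'King', 10)")

# Field: the single value where row 100 meets column last_name.
field = conn.execute(
    "SELECT last_name FROM employees WHERE employee_id = 100").fetchone()[0]

# Each foreign-key row reports (id, seq, parent table, from column, to column, ...).
fk = conn.execute("PRAGMA foreign_key_list(employees)").fetchone()
print(field, fk[2], fk[3], fk[4])  # King departments department_id department_id
```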

A relational database has six properties:

 Entries in columns are single values;
 Entries in columns are of the same kind;
 Each row is unique;
 Order of columns is insignificant;
 Order of rows is insignificant;
 Each column has a unique name.

Module 3
Conceptual model

 Identifies the highest-level relationships between the different entities;
 It contains relationships between entities, but may or may not include cardinality and
nullability;
 It does not specify the primary key, but it does determine a unique identifier for each entity.
A conceptual model is a formal model in which every real-world thing being modeled has a
corresponding object in the model. It describes the things of significance to an organization (entities)
about which it collects information, the characteristics (attributes) of these entities, and the associations
between pairs of entities (relationships).

Conceptual modeling is a formal analysis and design method that uses a set of guidelines and
rules to capture the semantics of a domain. Formal methods include textual or graphical notations to
create, present, validate, and manipulate data models. Conceptual modeling clarifies the identification of
entities, attributes, and relationships, and it provides a basis for discussion and refinement.

A conceptual model is important to a business because it:


 Describes the exact information needs of the business
 Facilitates discussion
 Prevents mistakes and misunderstandings
 Forms a sound basis for physical database design
 Documents the processes (also known as business rules) of the business
 Takes into account regulations and laws governing this industry

Entities:
 Real-world object or thing that has an independent existence and that is distinguishable from
other objects;
 Eg: person, car,…
Attributes:
 Characteristics of entities or relationships that provide descriptive details about them
 Eg: person’s name, address,…

Relationship:

 Person drives a car.


Creating a conceptual model:
1. Identify entities;
2. Identify attributes (including optionality);
3. Identify unique identifiers;
4. Determine relationships (including optionality and cardinality).

Entity Relationship Diagram (ERD):

 A consistent tool that can be used to represent the data requirements of a business, regardless
of the type of database that is used and even in the absence of one;
 A graphical representation of entities and their relationships to each other, used to
organize data within databases or information systems.
ERD is a model that identifies the concepts or entities that exist in a system and the relationships
between those entities.
It serves several purposes:
 The database analyst/designer gains a better understanding of the information to be contained
in the database through the process of constructing the ERD;
 It serves as a documentation tool;
 It is used to communicate the logical structure of the database to users;
 The underlying ER model is a list of all entities and attributes, as well as all relationships between
the entities, that are of importance;
 It provides background information such as entity descriptions, data types and constraints;
 The ER model does not require a diagram, but the diagram is typically a very useful tool.

A good conceptual data model stays the same regardless of the type of database system on which it is
eventually built or implemented (it is implementation-free).
The data model should stay the same even if a database is not used at all; for example, when the data is
eventually stored on pieces of paper in a filing cabinet.
Goals of ER Modeling:
 Capture all required information;
 Ensure that all information appears only once;
 Model no information that is derivable from other information that is already modeled;
 Locate information in a predictable, logical place.

Steps to build an ERD:


1. Create entities and attributes;
2. Choose unique identifiers;
3. Build relationships;
4. Identify cardinalities;
5. Check the model.

Module 4
Changing from analysis (conceptual model) to implementation (physical model) also means changing
terminology:
 An entity becomes a table.
 An instance becomes a row.
 An attribute becomes a column.
 A primary UID becomes a primary key.
 A secondary UID becomes a unique key.
 A relationship is transformed into a foreign key column and a foreign key constraint.

ERD             Physical design
Entity          Table
Instance        Row
Attribute       Column
Primary UID     Primary key
Secondary UID   Unique key
Relationship    Foreign key
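The mapping above can be sketched as a small Python function that turns one entity into a `CREATE TABLE` statement. This is an illustration under assumptions, not a standard tool: `Attribute`, `column_name` and `entity_to_ddl` are hypothetical names, every column gets a placeholder `VARCHAR2(30)` type, the caller supplies the (plural) table name, and relationships and unique keys are left out. The optionality chosen for the STUDENT attributes in the usage example is also assumed.

```python
from typing import List, NamedTuple


class Attribute(NamedTuple):
    name: str          # attribute name as written in the ERD, e.g. "First Name"
    mandatory: bool    # True for "*", False for "o"
    uid: bool = False  # True for "#", i.e. part of the primary UID


def column_name(attribute_name: str) -> str:
    """Attribute name -> column name: lowercase, spaces replaced by underscores."""
    return "_".join(attribute_name.lower().split())


def entity_to_ddl(attributes: List[Attribute], table_name: str) -> str:
    """Entity -> table, attribute -> column, primary UID -> primary key."""
    lines = []
    for attr in attributes:
        null_rule = " NOT NULL" if attr.mandatory else ""
        lines.append(f"    {column_name(attr.name)} VARCHAR2(30){null_rule}")
    pk_columns = [column_name(a.name) for a in attributes if a.uid]
    if pk_columns:
        lines.append(f"    PRIMARY KEY ({', '.join(pk_columns)})")
    return f"CREATE TABLE {table_name} (\n" + ",\n".join(lines) + "\n)"


student = [
    Attribute("ID", True, uid=True),
    Attribute("First Name", True),
    Attribute("Last Name", True),
    Attribute("Street Address", False),
    Attribute("City", False),
]
ddl = entity_to_ddl(student, "students")
print(ddl)
```

The printed statement lists `id`, `first_name` and `last_name` as `NOT NULL`, leaves `street_address` and `city` nullable, and ends with `PRIMARY KEY (id)`.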

It is important to learn about attributes because they provide more information about the entities.
Attributes help you be more specific about the data that you need to track.
In the slide example, the STUDENT entity has five attributes, and so the STUDENTS table has five
columns:
 A column of ID numbers, with no ID number appearing more than once in it;
 A column of first names of students;
 A column of last names of students;
 A column of street addresses of students;
 A column of cities of students.

Attributes are classified as one of the following:


 Not null (mandatory): Indicated by the asterisk ( * ) symbol next to the attribute
 Optional (nulls allowed): Indicated by the o (optional) symbol next to the attribute
Entities have instances. An instance is a single occurrence of an entity. Some entities have many
instances, and some have only a few.

Table Diagram Notations:


 The first row of the table diagram contains the table name and the short name.
 The Key Type column should contain values of "pk" for the primary key, "uk" for the unique key,
or "fk" for the foreign key column. The cell is blank if the column is not a part of a key.
 The Optionality column must contain an asterisk (*) if the column is mandatory and a lowercase
"o" if it is optional. This is similar to the ERD.
 The third column is for the column name.

The table name is the plural of the entity name.


Column names are identical to the attribute names, except that special characters and spaces are
replaced with underscores. They also use more abbreviations than attribute names.
Table Short Names:
A unique short name for every table is useful when naming foreign key columns.
For entity names that contain more than one word, take the:
 First character of the first word
 First character of the second word
 Last character of the last word

Example: PRIVATE HOME gets a short name of PHE.


For entity names that contain one word but more than one syllable, take the:
 First character of the first syllable
 First character of the second syllable
 Last character of the last syllable

Example: EMPLOYEE gets a short name of EPE and CLIENT gets a short name of CET.
For entity names that contain one syllable but more than one character, take the:
 First character
 Second character
 Last character

Example: FLIGHT gets a short name of FLT.
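The three short-name rules can be sketched in Python. Splitting a word into syllables is not something simple code can do reliably, so this hypothetical `short_name` function expects the caller to pass the syllables of a one-word name; that parameter is an assumption, not part of the rules.

```python
from typing import List, Optional


def short_name(entity_name: str, syllables: Optional[List[str]] = None) -> str:
    """Derive a three-letter table short name from an entity name."""
    words = entity_name.upper().split()
    if len(words) > 1:
        # Several words: first char of word 1, first char of word 2, last char of last word.
        return words[0][0] + words[1][0] + words[-1][-1]
    if syllables and len(syllables) > 1:
        # One word, several syllables: the same pattern applied to the syllables.
        parts = [s.upper() for s in syllables]
        return parts[0][0] + parts[1][0] + parts[-1][-1]
    # One syllable: first, second and last character.
    word = words[0]
    return word[0] + word[1] + word[-1]


print(short_name("PRIVATE HOME"))                     # PHE
print(short_name("EMPLOYEE", ["EM", "PLOY", "EE"]))   # EPE
print(short_name("CLIENT", ["CLI", "ENT"]))           # CET
print(short_name("FLIGHT"))                           # FLT
```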


Restrictions:
Table and column names:

 Start with a letter;
 Contain up to 30 alphanumeric characters;
 Cannot contain spaces or special characters such as !, although $, # and _ are permitted.
Some words have a special meaning in the Oracle database and in the SQL programming language.
These words are called “reserved words.” It is best to avoid using these as names for your tables and
columns.
Some common examples of Oracle reserved words are:
 TABLE
 NUMBER
 SEQUENCE
 ORDER
 VALUES
 LEVEL
 TYPE
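A hedged sketch of a name check in Python. The regular expression follows Oracle's rule for nonquoted identifiers (a letter first, then letters, digits, `_`, `$` or `#`), with the 30-character limit used in these notes. The `RESERVED` set contains only the examples listed above. Rejecting those words reflects the advice to avoid them rather than a hard database error. `is_valid_name` is a hypothetical helper.

```python
import re

# Only the example reserved words from the notes; Oracle's real list is much longer.
RESERVED = {"TABLE", "NUMBER", "SEQUENCE", "ORDER", "VALUES", "LEVEL", "TYPE"}

# A letter, then up to 29 letters, digits, _, $ or # (30 characters in total).
NAME_PATTERN = re.compile(r"[A-Za-z][A-Za-z0-9_$#]{0,29}")


def is_valid_name(name: str) -> bool:
    """Check a table or column name against the naming restrictions."""
    return bool(NAME_PATTERN.fullmatch(name)) and name.upper() not in RESERVED


print(is_valid_name("first_name"))   # True
print(is_valid_name("1st_name"))     # False: does not start with a letter
print(is_valid_name("first name"))   # False: contains a space
print(is_valid_name("order"))        # False: reserved word
```

Oracle Database 12.2 and later accept identifiers of up to 128 bytes. The 30-character limit here matches the notes and older releases.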

Module 5
Conceptual Data Model UID:

 A UID is an attribute of an entity that is unique across all instances of the entity;
 It has a non-NULL value for each instance of the entity for the lifetime of the instance;
 Its value never changes over the lifetime of the instance;
 A UID is diagrammed with a number sign #.
In a conceptual data model, a UID is the value or combination of values that enables the user to find
that one unique item among all the rest. Identifying just the right attribute, the combination of
attributes, and the relationships is a skill that any database designer must master. The UID is the way
to find your record in a file, a particular card in a deck of cards, your package in a warehouse, and a
specific piece of data in a database.

The UID is important because it designates which instance of an entity is being dealt with. Identifier
selection is critical because it is also used to model relationships.
Primary key:

 is a column or set of columns that uniquely identifies each row in a table;
 cannot contain null values;
 is either an existing table column or a column that is specifically generated by the database
according to a defined sequence;
 must contain a unique value for each row of data.
A primary key is a special relational database table column (or combination of columns) designated to
uniquely identify all table records. The PK concept is critical to an efficient relational database.
Without PK and closely related foreign key concepts, relational databases would not work.

For example, a database must hold all data stored by a commercial bank. Two of the database tables
include CUSTOMER, which stores basic and static customer data (for example, customer_id, name,
date of birth, address, and Social Security number), and ACCOUNTS, which stores various bank
account data (for example, account creation date, account type, withdrawal limits, or corresponding
account information). In CUSTOMER, the customer_id column can serve as the primary key, because it
holds a different value for every customer.
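The bank example can be sketched with Python's built-in `sqlite3` module, used as a stand-in for Oracle. The column lists are a hypothetical subset of the ones named above. The sketch shows a duplicate primary key value being rejected, and SQLite generating a key value when `NULL` is supplied for an `INTEGER PRIMARY KEY` column.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite checks FKs only when enabled
conn.execute("""
    CREATE TABLE customer (
        customer_id INTEGER PRIMARY KEY,   -- one unique value per row
        name        TEXT NOT NULL,
        birth_date  TEXT
    )""")
conn.execute("""
    CREATE TABLE accounts (
        account_id   INTEGER PRIMARY KEY,
        account_type TEXT NOT NULL,
        customer_id  INTEGER NOT NULL REFERENCES customer (customer_id)
    )""")

conn.execute("INSERT INTO customer VALUES (1, 'Ann Lee', '1990-04-02')")
conn.execute("INSERT INTO accounts VALUES (500, 'savings', 1)")

# A second customer with the same primary key value is rejected.
try:
    conn.execute("INSERT INTO customer VALUES (1, 'Bob Ray', '1985-11-30')")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

# Supplying NULL makes SQLite generate the next key value itself,
# comparable to a database-generated sequence.
cur = conn.execute("INSERT INTO customer VALUES (NULL, 'Cy Dunn', NULL)")
generated_id = cur.lastrowid
print(duplicate_rejected, generated_id)  # True 2
```

One SQLite-specific caveat applies to other key types. In ordinary SQLite tables, a `PRIMARY KEY` column that is not an `INTEGER PRIMARY KEY` may accept NULLs unless `NOT NULL` is also declared. In Oracle, a primary key always implies `NOT NULL`.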
Composite UID and Primary Key:

 A composite UID is a combination of attributes;
 A composite primary key is any key that consists of two or more columns.
Sometimes a single attribute is not enough to uniquely identify an entity instance. If the UID is a
combination of attributes, it is called a composite UID.
Composite primary keys occur when the table uses more than one column to make up a primary key.
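A composite primary key sketched with `sqlite3`, using the CONCERT TICKET example that appears later in these notes (performance date plus seat number). The table layout and values are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Neither column alone is unique: many seats per date, many dates per seat.
conn.execute("""
    CREATE TABLE concert_tickets (
        performance_date TEXT NOT NULL,
        seat_number      TEXT NOT NULL,
        price            NUMERIC,
        PRIMARY KEY (performance_date, seat_number)   -- composite primary key
    )""")

conn.execute("INSERT INTO concert_tickets VALUES ('2024-06-01', 'A12', 45)")
conn.execute("INSERT INTO concert_tickets VALUES ('2024-06-02', 'A12', 45)")  # same seat, other date
conn.execute("INSERT INTO concert_tickets VALUES ('2024-06-01', 'A13', 45)")  # same date, other seat

try:
    # Same combination of both columns as the first row.
    conn.execute("INSERT INTO concert_tickets VALUES ('2024-06-01', 'A12', 50)")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True

count = conn.execute("SELECT COUNT(*) FROM concert_tickets").fetchone()[0]
print(rejected, count)  # True 3
```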
Relationships:

 Represents an association between two entities, or between an entity and itself;
 Is drawn as a line that is either solid or dashed;
 The line terminates in either a single toe or a crow’s foot.
In a conceptual data model, a relationship is any association, linkage, or connection between the
entities of interest to the business. It is a two-directional, significant association between two entities
or between an entity and itself. Each relationship has a name, an optionality (optional or mandatory),
and a degree (cardinality).

A relationship is identified by a name that is descriptive of the relationship. The relationship name is
usually an active or a passive verb. Relationships between entities can always operate in both
directions.

For example:
 Active verb: A teacher teaches a class.
 Passive verb: A class is taught by a teacher.

A solid line represents the mandatory relationship, and a dashed line represents the optional
relationship.
When reading the business rule sentence, use the following words for the minimum values:
 Optional: Use “may be” or “may.”
 Mandatory: Use “must be” or “must.”

And use the following words for the maximum values:


 Single toe (line): Use “one and only one.”
 Crow's feet: Use “one or more.”
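The reading rules above can be sketched as a Python function that builds one direction of a relationship sentence. The "Each ... must/may ..." template, the function name `business_rule`, and the naive plural (append "S") are assumptions for illustration.

```python
from typing import Optional


def business_rule(entity: str, optional: bool, verb_phrase: str,
                  single: bool, other: str, other_plural: Optional[str] = None) -> str:
    """Read one direction of a relationship as a business-rule sentence.

    optional: True for a dashed line ("may"), False for a solid line ("must").
    single:   True for a single toe ("one and only one"),
              False for a crow's foot ("one or more").
    """
    minimum = "may" if optional else "must"
    maximum = "one and only one" if single else "one or more"
    target = other if single else (other_plural or other + "S")  # naive plural
    return f"Each {entity} {minimum} {verb_phrase} {maximum} {target}."


print(business_rule("EMPLOYEE", False, "be assigned to", True, "DEPARTMENT"))
# Each EMPLOYEE must be assigned to one and only one DEPARTMENT.
print(business_rule("DEPARTMENT", True, "be responsible for", False, "EMPLOYEE"))
# Each DEPARTMENT may be responsible for one or more EMPLOYEES.
```

Because every relationship has two perspectives, one call per direction produces the pair of sentences that both ends of the line should be labeled with.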
Foreign Key

 Is a column or a combination of columns in one table that refers to a primary key in the same
table or another table;
 Relationships in a conceptual model are mapped to foreign keys in a database model.
A foreign key is a column or combination of columns in one table that uniquely identifies a row in
another table. The foreign key is defined in a second table, but it refers to the primary key in the first
table.
In a conceptual data model, relationships are mapped to foreign keys in a database table so that tables
can reference each other. In the examples shown in this lesson, the crow’s foot on the
EMPLOYEE entity maps to a foreign key column in the EMPLOYEES table. All relationships from a
conceptual data model map to a foreign key column in a database table.
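A foreign key can also refer to its own table. Here is a self-referencing example, sketched with Python's `sqlite3` module as a stand-in for Oracle. The hypothetical EMPLOYEES table refers to itself through `manager_id`, and the join shows how the foreign key lets rows reference each other. SQLite only enforces foreign keys after `PRAGMA foreign_keys = ON`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # off by default in SQLite

# The crow's-foot ("many") side receives the foreign key column. Here the
# FK refers back to the same table: each employee may have one manager.
conn.execute("""
    CREATE TABLE employees (
        employee_id INTEGER PRIMARY KEY,
        last_name   TEXT NOT NULL,
        manager_id  INTEGER REFERENCES employees (employee_id)  -- optional FK
    )""")

conn.execute("INSERT INTO employees VALUES (100, 'King', NULL)")  # no manager
conn.execute("INSERT INTO employees VALUES (101, 'Kochhar', 100)")

try:
    conn.execute("INSERT INTO employees VALUES (102, 'De Haan', 999)")  # no employee 999
    orphan_rejected = False
except sqlite3.IntegrityError:
    orphan_rejected = True

# The foreign key lets the table be joined to itself.
rows = conn.execute("""
    SELECT e.last_name, m.last_name
    FROM employees e JOIN employees m ON e.manager_id = m.employee_id
""").fetchall()
print(orphan_rejected, rows)  # True [('Kochhar', 'King')]
```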
Sectie 3
Module 1
Conceptual model:

 Captures the functional and informational needs of a business;
 Is based on current needs, but may reflect future needs;
 Addresses the needs of a business but does not address its implementation;
 Identifies important entities and relationships among them;
 Does not specify primary keys.
A conceptual data model identifies the highest-level relationships between the different entities. It
captures the functional and informational needs of a business. It includes the important entities and the
relationships among them.
Businesses use data to increase sales, reduce costs, or both. To accurately collect this data, a business
must create a conceptual model of the data that it considers important.
Logical model

 Includes all entities and relationships among them;
 Is called an entity relationship model (ERM);
 Is illustrated in an ERD;
 Specifies all attributes for each entity;
 Determines attribute optionality;
 Determines relationship optionality and cardinality.
A logical data model describes the data in as much detail as possible, without regard to how it will be
physically implemented in the database. It is normally derived from a conceptual data model.

Physical model:
 Is an extension to a logical data model
o Defines table definitions, data types and precision;
o Identifies views, indexes and other database objects.
 Describes how the objects should be implemented in a specific database;
 Shows all table structures, including columns, primary keys and foreign keys.
How to create it:
1. Model entities as tables;
2. Model relationships as foreign keys;
3. Model attributes as columns;
4. Modify the physical data model based on physical constraints and requirements.
Conceptual and physical model:

 The art of planning, developing and communicating produces a desired outcome;
 Data modeling is the process of capturing the important concepts and rules that shape a
business and depicting them visually in a diagram;
 This diagram becomes the blueprint for designing the physical thing;
 The client’s dream (conceptual model) becomes a physical reality (physical model).
The conceptual model is concerned with the real-world view and understanding of data. The physical
model specifies how the data will be implemented in a particular database management system.

Module 2
Business rules:

 Is a statement that defines or constrains some aspect of the business;
 Is used to define entities, attributes, relationships and constraints;
 Includes rules and policies that define how a business operates and handles its day-to-day
operations.
Business rules describe the objects, relationships, and actions that a business finds important and
worth writing down. They generally help a business satisfy its goals and meet its obligations. Business
rules are used every day to define entities, attributes, relationships, and constraints. Usually though,
they are used as an explanation of a policy, a procedure, or a principle.

One of the primary goals of data modeling is to ensure recognition of all pieces of information that are
required to run a business. Identifying and documenting business rules is key to checking your data
model for accuracy and completeness, which makes it very important to database design.

Business rules allow the creator to develop relationship participation rules and constraints and to
create a correct data model. They also allow the creators to understand business processes and the
nature, role, and scope of the data. They are a communication tool between users and creators, and
they also help standardize the company's view of the data.

It is important to keep in mind that some business rules cannot be modeled and that not all business
rules can be represented in the ERD. Some business rules must be implemented by programming.
There are two types of business rules:
 Structural
 Procedural

Example of a structural business rule: all orders must have a contact number.


Sources of business rules:

 Structural business rules
o Indicate the types of information to be stored and how the information elements
interrelate;
o Can always be diagrammed in the ERD;
o Examples: a sales record includes a product, a buyer, …; all restaurant orders must be
handled by a staff member.
 Procedural business rules
o Deal with the prerequisites, steps, processes or workflow requirements of a business;
o Are related to time: A must happen before B;
o Some cannot be diagrammed, but must still be documented so that they can be
programmed later;
o Example: an online store might not accept a next-day delivery order if the order is
received after 03.00.
Not all business rules can be modeled when a conceptual data model is being developed. Some rules,
such as the two listed in the slide, must be implemented by programming them as processes that
interact with data.
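Both kinds of rule can be sketched in Python with `sqlite3`. The structural rule becomes a `NOT NULL` column that the database enforces, while the procedural, time-based rule is written as code. The table, the function `accepts_next_day_delivery`, and the default cutoff (taken literally from the 03.00 in the example) are assumptions.

```python
import sqlite3
from datetime import time

conn = sqlite3.connect(":memory:")

# Structural rule ("all orders must have a contact number"): a mandatory
# column, so the database itself rejects orders that break the rule.
conn.execute("""
    CREATE TABLE orders (
        order_id       INTEGER PRIMARY KEY,
        contact_number TEXT NOT NULL
    )""")

try:
    conn.execute("INSERT INTO orders (order_id) VALUES (1)")
    missing_contact_rejected = False
except sqlite3.IntegrityError:
    missing_contact_rejected = True


# Procedural rule: cannot be drawn in the ERD, so it is programmed.
def accepts_next_day_delivery(received_at: time, cutoff: time = time(3, 0)) -> bool:
    """Accept a next-day delivery order only if it arrives no later than the cutoff."""
    return received_at <= cutoff


print(missing_contact_rejected)                 # True
print(accepts_next_day_delivery(time(2, 30)))   # True
print(accepts_next_day_delivery(time(3, 1)))    # False
```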
Module 3
Entity:

 Information that must be tracked;
 Name for things that you can list.
Entities are categories of things that are important for a business and about which information must be
kept. Entities contain facts and information that the business must know and remember. Some
examples of entities might include the following:
 PERSON: Agent, insured, employee, customer
 PLACE: State, country, municipality
 THING: Inventory item, vehicle, product
 CONCEPT: Policy, risk, coverage, job
 ORGANIZATION: Agency, department
 EVENT: Service request, claim, election

Entity characteristics include the following:


 Is represented by a rectangular box
 Has a unique name, usually in noun form
 Has name in uppercase letters, with no hyphens or underscores
 Has name in the singular, not plural, form; for example "EMPLOYEE" not "EMPLOYEES."
Entity types

Name            Description                                 Example
Prime           Exists independently                        Customer, instructor
Characteristic  Exists because of another (prime) entity    Order, class offering
Intersection    Exists because of two or more entities      Order item, class enrollment

A prime entity is independent and does not depend on the existence of any other entity. Examples include
CUSTOMER and INSTRUCTOR.
A characteristic entity depends on the existence of another entity. Examples include an ORDER entity that
is dependent on a CUSTOMER, or a CLASS OFFERING that is dependent on an INSTRUCTOR.
An intersection entity depends on the existence of two or more entities. Examples include an ORDER
ITEM that is dependent on ORDER and PRODUCT, or a CLASS ENROLLMENT that is dependent on a
CLASS OFFERING and STUDENT.
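The three entity types map naturally onto tables and foreign keys. This `sqlite3` sketch uses hypothetical table names, with SQLite standing in for Oracle. The characteristic ORDER gets a mandatory foreign key to CUSTOMER. The intersection ORDER ITEM gets a primary key made of its two foreign keys.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled
conn.executescript("""
    -- Prime entities: exist independently.
    CREATE TABLE customers (customer_id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE products  (product_id  INTEGER PRIMARY KEY, name TEXT NOT NULL);

    -- Characteristic entity: every order belongs to one existing customer.
    CREATE TABLE orders (
        order_id    INTEGER PRIMARY KEY,
        customer_id INTEGER NOT NULL REFERENCES customers (customer_id)
    );

    -- Intersection entity: depends on both ORDER and PRODUCT.
    CREATE TABLE order_items (
        order_id   INTEGER NOT NULL REFERENCES orders (order_id),
        product_id INTEGER NOT NULL REFERENCES products (product_id),
        quantity   INTEGER NOT NULL,
        PRIMARY KEY (order_id, product_id)
    );
""")

conn.execute("INSERT INTO customers VALUES (1, 'Ann Lee')")
conn.execute("INSERT INTO products VALUES (7, 'Desk lamp')")
conn.execute("INSERT INTO orders VALUES (50, 1)")
conn.execute("INSERT INTO order_items VALUES (50, 7, 2)")

# An order without a customer, or with an unknown customer, is rejected.
rejected = []
for customer_id in (None, 99):
    try:
        conn.execute("INSERT INTO orders VALUES (51, ?)", (customer_id,))
    except sqlite3.IntegrityError:
        rejected.append(customer_id)
print(rejected)  # [None, 99]
```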

Entities:

 Contain instances;
 An instance is a single occurrence of an entity;
 Entities represent a set of instances that are of interest to a particular business.
Entities contain instances of the entity. Some entities have many instances; some have only a few. An
instance of one entity may be an entity in its own right. For example, the "violinist" instance of the
JOB entity could be the name of another entity with instances like Tim Kary or Jerry Gam.

You can regard entities as sets of instances. The slide example shows a JOB set, and the set shows
some of its instances.

At the end of the entity modeling process, entities are typically transformed into tables, with each row
of a table representing an individual instance.
Supertype and subtype entities:
 A supertype has a parent-child relationship with one or more subtypes
Sometimes you might need to subdivide an entity into subtypes. This may be the case when a group of
instances has special properties, such as attributes or relationships that exist only for that group. In this
case, the entity is called a supertype and each group is called a subtype.
A subtype:
 Inherits all attributes of the supertype.
 Inherits all relationships of the supertype.
 Usually has its own attributes or relationships.
 Is drawn within the supertype.
 Never exists alone.
 May have subtypes of its own.
 Has the same primary key as its supertype.

Example: The INSURANCE entity can be treated as a supertype entity, and entities like life insurance,
health insurance, and liability insurance can be its subtype entities.
Subtype:
 Is a subgrouping of the entities in an entity type that has attributes distinct from those
in other subgroupings;
 Each subtype is a specialization of a supertype and therefore must be drawn within the supertype entity;
 The common attributes and relationships for all subtypes must be listed only in the supertype,
but they are inherited by every subtype;
 A subtype can, and generally does, have attributes and relationships of its own;
 There can never be just one subtype: another subtype should be created to contain the rest.

Generalization:
 Is a bottom-up approach where two or more lower-level entities are combined to form a
higher-level entity based on their common features.

Generalization is the process of extracting shared characteristics from two or more entities, and
combining them into a generalized supertype. Shared characteristics can be attributes, associations, or
methods.

In the slide example, CAR and TRUCK share the same common attributes, and so their common
attributes are grouped to form the supertype entity named Vehicle.
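Python class inheritance offers a rough analogy for generalization. It is an analogy only, since ER modeling is not object-oriented programming. In this sketch the shared attributes of CAR and TRUCK move into a `Vehicle` supertype. The attribute names and values are hypothetical.

```python
from dataclasses import dataclass, fields


@dataclass
class Vehicle:
    """Supertype: the attributes CAR and TRUCK have in common."""
    vin: str
    make: str
    model: str


@dataclass
class Car(Vehicle):
    """Subtype: inherits vin, make and model, and adds its own attribute."""
    passenger_seats: int


@dataclass
class Truck(Vehicle):
    """Subtype with a different attribute of its own."""
    payload_tonnes: float


car = Car("V-001", "Make A", "Model 1", passenger_seats=5)
truck = Truck("V-002", "Make B", "Model 2", payload_tonnes=18.0)

print([f.name for f in fields(Car)])  # ['vin', 'make', 'model', 'passenger_seats']
print(isinstance(truck, Vehicle))     # True
```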

Specialization means creating new subtypes from an existing entity. If it turns out that certain
attributes, associations, or methods apply only to some of the entity’s objects, a subtype can be
created.

In the slide example, the EMPLOYEE supertype is broken down into two subtypes, CURRENT
EMPLOYEE and EX-EMPLOYEE.

Specialization:
 Is a top-down approach where the higher level entity is broken down into lower level entities

Entity subtype rules:

 Exhaustive:
o Every instance of a supertype is also an instance of one of the subtypes;
o OTHER should be included as a subtype to categorize instances that are not defined
by one of the existing subtypes;
o Example: an employee must be full time, part time or other.
When an ER model is complete, subtypes never stand alone. In other words, if an entity has a subtype,
a second subtype must also exist. This makes sense. A single subtype is exactly the same as the
supertype.

 Mutually exclusive:
o Every instance of the supertype is of one and only one subtype;
o Example: an employee cannot be both full time and part time.
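One common way to implement the two subtype rules in a single table is a mandatory "type" column restricted to the subtype names. This `sqlite3` sketch uses hypothetical columns, with SQLite standing in for Oracle. `NOT NULL` plus a `CHECK` list covers both rules: every row has a type, and a row can hold only one type.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE employees (
        employee_id   INTEGER PRIMARY KEY,
        last_name     TEXT NOT NULL,
        -- NOT NULL + OTHER: every row falls into some subtype (exhaustive).
        -- One column, one value: no row is in two subtypes (mutually exclusive).
        employee_type TEXT NOT NULL
            CHECK (employee_type IN ('FULL_TIME', 'PART_TIME', 'OTHER'))
    )""")

conn.execute("INSERT INTO employees VALUES (1, 'Lee', 'FULL_TIME')")
conn.execute("INSERT INTO employees VALUES (2, 'Ray', 'OTHER')")

rejected = []
for bad_type in (None, "CONTRACTOR"):
    try:
        conn.execute("INSERT INTO employees VALUES (3, 'Kim', ?)", (bad_type,))
    except sqlite3.IntegrityError:
        rejected.append(bad_type)
print(rejected)  # [None, 'CONTRACTOR']
```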
Module 4
Attributes:

 Describe entities and are the specific information that must be known;
 Is a single-valued property detail of an entity;
Attributes are information about an entity that must be known or held. Attributes describe an entity by
qualifying, identifying, classifying, quantifying, or expressing the state of the entity. Attributes
represent a type of description or detail, not an instance.

Attribute names are singular. An attribute value can be a number, a character string, a date, or an
image. Every attribute has a specific data type in the database. An attribute for an entity must be single
valued. In more precise terms, an entity instance can have only one value for that attribute at any point
in time.
Attribute characteristics:

 Are shown within the entity box on the ERD;
 Are singular and mixed case or lower case;
 Should not include the entity’s name, because attributes are qualified with the entity name.
 Are classified:
o Mandatory (nulls are not allowed), indicated by *
o Optional (nulls are allowed), indicated by a lowercase o
Volatile and non-volatile attributes:
Volatile attributes:

 are unstable
 Age is a volatile attribute
Nonvolatile attributes:

 are stable.
 Date of Birth is a nonvolatile (stable) attribute
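Storing the nonvolatile attribute and deriving the volatile one avoids values that go stale. Here is a minimal Python sketch with a hypothetical `age_on` helper that computes age from date of birth.

```python
from datetime import date


def age_on(date_of_birth: date, today: date) -> int:
    """Derive the volatile attribute (age) from the nonvolatile one (date of birth)."""
    had_birthday = (today.month, today.day) >= (date_of_birth.month, date_of_birth.day)
    return today.year - date_of_birth.year - (0 if had_birthday else 1)


print(age_on(date(2000, 5, 20), date(2024, 5, 19)))  # 23
print(age_on(date(2000, 5, 20), date(2024, 5, 20)))  # 24
```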
Mandatory attributes:

 Must have a value


Optional attributes:
 Do not need to have a value and can be blank

An attribute can be required or optional. A mandatory attribute is an attribute that must have a value
before an instance of the object can be created.

If an attribute is optional, a value does not need to be assigned to create an instance of the object.
In the slide example, the STUDENT entity includes the mandatory attributes Student ID and First Name.
Student ID uniquely identifies each student, and we assume that all students have a first name and a
student ID.
Middle Name and Email are optional attributes because some students may not have a middle name or
an email address.
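The STUDENT example can be sketched with Python's `sqlite3` module, with SQLite standing in for Oracle and a hypothetical table layout. Mandatory attributes become `NOT NULL` columns, and optional ones accept NULL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE students (
        student_id  INTEGER PRIMARY KEY,  -- mandatory (*) and the UID (#)
        first_name  TEXT NOT NULL,        -- mandatory (*)
        middle_name TEXT,                 -- optional (o): NULL allowed
        email       TEXT                  -- optional (o): NULL allowed
    )""")

# Optional attributes can be left out when the row is created.
conn.execute("INSERT INTO students (student_id, first_name) VALUES (1, 'Mia')")
row = conn.execute(
    "SELECT middle_name, email FROM students WHERE student_id = 1").fetchone()

# Leaving out a mandatory attribute prevents the row from being created.
try:
    conn.execute("INSERT INTO students (student_id, email) VALUES (2, 'sam@example.com')")
    missing_first_name_rejected = False
except sqlite3.IntegrityError:
    missing_first_name_rejected = True

print(row, missing_first_name_rejected)  # (None, None) True
```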

A simple attribute:

 Single or atomic attributes are attributes that cannot be divided into subparts.
 Emp ID, age, and sex are examples of simple attributes
Composite attributes:

 Can be divided into smaller subparts that represent basic attributes with independent meanings
of their own.
A simple attribute consists of a single atomic value that cannot be subdivided. Emp ID, age, and sex
are examples of simple attributes.
A composite attribute is an attribute that can be further subdivided. For example, the Name attribute
can be subdivided into First Name, Last Name, and Middle Name.

Single valued attribute:

 Can have a single value at a particular instance of time;
 Eg: student last name
Multi-valued attribute:

 Can have more than one value at one time;
 Eg: student address
A single-valued attribute can have only a single value. For example, a person can have only one date
of birth and one age. A single-valued attribute can be either simple or composite: date of birth is a
composite attribute and age is a simple attribute, but both are single-valued.
Multi-valued attributes can have multiple values. For instance, a person may have multiple degrees
and phone numbers, and a student may have more than one address.
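One common relational treatment of these attribute kinds can be sketched with `sqlite3`, using hypothetical tables. This is an illustration rather than the notes' own design. The composite Name is stored as its simple parts. The multi-valued phone number gets its own table with one row per value, so every column entry stays a single value.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled
conn.executescript("""
    -- Composite attribute Name, stored as its simple parts.
    CREATE TABLE persons (
        person_id   INTEGER PRIMARY KEY,
        first_name  TEXT NOT NULL,
        middle_name TEXT,
        last_name   TEXT NOT NULL
    );
    -- Multi-valued attribute Phone Number: one row per value.
    CREATE TABLE person_phones (
        person_id    INTEGER NOT NULL REFERENCES persons (person_id),
        phone_number TEXT    NOT NULL,
        PRIMARY KEY (person_id, phone_number)
    );
""")
conn.execute("INSERT INTO persons VALUES (1, 'Ann', NULL, 'Lee')")
conn.executemany("INSERT INTO person_phones VALUES (1, ?)",
                 [("555-0100",), ("555-0199",)])

phones = [row[0] for row in conn.execute(
    "SELECT phone_number FROM person_phones WHERE person_id = 1 ORDER BY phone_number")]
print(phones)  # ['555-0100', '555-0199']
```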
Rules for drawing entities:

 Is represented as a round-cornered rectangle;
 Must be named, and the name must be placed inside the entity in the upper-left corner;
 Should be in uppercase form;
 Should be in singular form;
 Must be written so that everyone understands it;
 Attributes must be written with the first letter of each word in uppercase and the rest in
lowercase;
 A symbol representing the type of attribute should be placed next to each attribute.
Annotations are attached to attributes to model their participation: Use the asterisk ( * ) for mandatory,
the lowercase “o” for optional, and the number sign ( # ) for identifying the attribute as part of the
entity's unique identifier.
Rules for drawing relationships:

 Can exist between a maximum of two entities;
 Can exist on the same entity;
 Has two perspectives;
 Both perspectives of a relationship must be labeled.
The Bachman notation is also a binary model.
 An entity is represented by a box.
 Attributes:
o Show * for not null and no notation for null attributes.
o Show P for UID and F for the attribute created through the relationship
 Relationship lines:
o Show an arrow instead of crow's feet for the maximum cardinality
o Show an open circle or filled-in circle instead of a dotted line and a solid line for the
minimum cardinality
The relationship is depicted as a line connecting the associated entities. The relationship is given a
phrase to describe the association at both ends of the line. In Bachman notation, a subclass is
represented as an inner box within the superclass.

The Information Engineering notation is also a binary method. It is simple and easy to read, and is
well suited for high-level logical and enterprise data modeling.

It does not allow a ternary relationship, and it does not show attributes related to a relationship. Both
cardinality and participation constraints are combined into the min/max (bar and crowfoot) notation.

Another drawback of this notation is that it does not support identification of an entity’s attributes.
Data Model Notations:
Module 5
Unique Identifiers

 Is an attribute of an entity that meets the following rules:
o It’s unique across all instances of the entity;
o It has a non-NULL value for each instance of the entity for the lifetime of the
instance;
o It has a value that never changes for the lifetime of the instance.
 Is a special attribute or group of attributes that uniquely identifies a particular instance of an
entity.
The UID is the value or combination of values that enables the user to find that one unique item
among all the rest. Identifying just the right attribute, combination of attributes, and relationships is a
skill that any database designer must master. The UID is the way to find your record in a file, a
particular card in a deck of cards, your package in a warehouse, and a specific piece of data in a
database.

A UID that consists of a single attribute is a simple UID. However, sometimes a single attribute is not
enough to uniquely identify an instance of an entity. If the UID is a combination of attributes, it is called a
composite UID.
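The UID rules can be partly checked against sample data. This hypothetical Python helper reports whether a set of attributes is non-null and unique across the given instances. The ROOM data is invented for illustration.

```python
from typing import Dict, List, Sequence


def is_uid(instances: List[Dict[str, object]], attributes: Sequence[str]) -> bool:
    """True if the attribute(s) give every instance a non-null value no other instance shares."""
    seen = set()
    for instance in instances:
        key = tuple(instance.get(name) for name in attributes)
        if any(value is None for value in key) or key in seen:
            return False
        seen.add(key)
    return True


# Hypothetical ROOM instances: a room number repeats across buildings.
rooms = [
    {"building": "A", "room_number": "101"},
    {"building": "A", "room_number": "102"},
    {"building": "B", "room_number": "101"},
]
print(is_uid(rooms, ["room_number"]))              # False: a simple UID is not enough
print(is_uid(rooms, ["building", "room_number"]))  # True: the composite UID works
```

A check like this can only rule a candidate out. Passing on today's data does not prove uniqueness for future instances, and the check cannot test the rule that the value never changes.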

The UID can be information bearing, when its values have some business meaning. A
non-information-bearing UID is sometimes referred to as an artificial key. An information-bearing UID is
sometimes referred to as a natural key.

In the slide example, the UID for the DEPARTMENT entity is the ID attribute. Notice that the DEPARTMENT entity UID contains one attribute, and the attribute has no business meaning, which means that it is not information bearing.

For the CONCERT TICKET entity, the UID is the combination of the Performance Date and the Seat
Number attributes. In this case, the UID is more than one attribute, and it is information bearing.

An identifying relationship is created when the UID for an entity includes the relationship with another entity. In the slide example, the UID for the ACCOUNT entity is the Account Number attribute as well as the relationship between BANK and ACCOUNT. The identifying relationship is depicted with a vertical bar on the relationship line.

Note that a relationship included in a UID must be mandatory and one-and-only-one in the direction
that participates in the UID.
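The sketch below shows an identifying relationship using Python's built-in `sqlite3` module. The table and column names (`bank`, `account`, `account_number`) are assumptions made for the example. The identifying relationship becomes a mandatory foreign key that is also part of the primary key. As a result, the same account number can exist at two banks but not twice at the same bank.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled
conn.executescript("""
CREATE TABLE bank (
    bank_id INTEGER PRIMARY KEY,
    name    TEXT NOT NULL
);
-- Identifying (barred) relationship: bank_id is a mandatory foreign key
-- and also part of ACCOUNT's primary key.
CREATE TABLE account (
    bank_id        INTEGER NOT NULL REFERENCES bank(bank_id),
    account_number TEXT    NOT NULL,
    PRIMARY KEY (bank_id, account_number)
);
""")


def rejected(sql: str) -> bool:
    """Return True if the database refuses the statement."""
    try:
        conn.execute(sql)
        return False
    except sqlite3.IntegrityError:
        return True


conn.executemany("INSERT INTO bank VALUES (?, ?)", [(1, "North Bank"), (2, "South Bank")])
print(rejected("INSERT INTO account VALUES (1, '1001')"))  # False: new key
print(rejected("INSERT INTO account VALUES (2, '1001')"))  # False: same number, other bank
print(rejected("INSERT INTO account VALUES (1, '1001')"))  # True: duplicate UID
print(rejected("INSERT INTO account VALUES (9, '2002')"))  # True: no such bank
```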

An entity may be uniquely identified through multiple relationships. In the slide example, an EMPLOYEE and a PROJECT are needed to make WORK ASSIGNMENT unique, so both relationships are included in the UID for WORK ASSIGNMENT.

Note that WORK ASSIGNMENT is an intersection entity that is the resolution of an M:M relationship. This topic is discussed in a later lesson.

A non-identifying relationship is created when the UID for an entity does not need the relationship
with another entity for it to be considered unique.

In the slide example, the ACCOUNT entity does not need the relationship with ACCOUNT TYPE for it to be considered unique. There can be instances of an ACCOUNT TYPE that do not have an ACCOUNT.

Artificial UIDs do not occur in the natural world but are created for identification purposes in a system. For example, a shoe has a color, a size, and a style, but no truly descriptive “number.” However, a shoe store assigns a unique number to each pair of shoes so that each pair can be uniquely identified.

An entity can have more than one candidate UID. In the slide example, the EMPLOYEE entity has two candidate UIDs: badge number and payroll number. When this situation occurs, select one candidate UID as the primary UID and treat the others as secondary UIDs.
There is no standard diagramming convention for tagging secondary UIDs.
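Here is one possible relational version of primary and secondary UIDs, again sketched with Python's `sqlite3` and hypothetical column names. The primary UID becomes the primary key, and each secondary UID becomes a `UNIQUE NOT NULL` column. The explicit `NOT NULL` on the primary key is needed because SQLite, unlike most databases, otherwise allows NULL in a non-integer primary key column.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
CREATE TABLE employee (
    badge_number   TEXT NOT NULL PRIMARY KEY,  -- candidate chosen as the primary UID
    payroll_number TEXT NOT NULL UNIQUE,       -- secondary UID, still enforced as unique
    last_name      TEXT NOT NULL
)
""")
conn.execute("INSERT INTO employee VALUES ('B-001', 'P-900', 'Jansen')")

try:
    # New badge number, but a payroll number that is already taken
    conn.execute("INSERT INTO employee VALUES ('B-002', 'P-900', 'Peters')")
    secondary_enforced = False
except sqlite3.IntegrityError:
    secondary_enforced = True
print(secondary_enforced)  # True
```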

Module 6
Relationship:

• Is a bidirectional, significant association between two entities or between an entity and itself.

A relationship represents the business rules that link entities. Each relationship always has two business rules. In the slide example, the business rules are:
• A DEPARTMENT may contain one or more EMPLOYEEs.
• An EMPLOYEE must be assigned to one and only one DEPARTMENT.

Each direction of a relationship has:
• A name (for example, "contain" or "assigned to")
• An optionality (for example, either "must be" or "may be")
• A degree (for example, either "one and only one" or "one or more")

The components of the relationship include the following:
Name: The label that appears close to the entity it is assigned to. Make sure that all relationship names are in lowercase (for example, "assigned to" or "responsible for").
Cardinality: The minimum and maximum number of entity instances that can take part in each direction of the relationship.
• Minimum values can be either optional (zero) or mandatory (at least one).
• Maximum values can be either one or many.

When reading the business rule sentence, use the following words for the minimum values:
• Optional: Use "may be" or "may."
• Mandatory: Use "must be" or "must."
When reading the business rule sentence, use the following words for the maximum values:
• Line: Use "one and only one."
• Crow's feet: Use "one or more."
Business rule syntax is as follows:
Each entity1 {must be | may be} relationship-name {one or more | one and only one} entity2.
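The business rule syntax is regular enough to generate automatically. The helper below is a hypothetical sketch, not part of any modeling tool. It builds an ERDish sentence from the optionality, the relationship name, and the maximum cardinality. The `passive` flag chooses between "must be"/"may be" and "must"/"may". The plural is formed naively by appending "s".

```python
def erdish(entity1: str, mandatory: bool, name: str, many: bool,
           entity2: str, passive: bool = True) -> str:
    """Build 'Each entity1 {must be|may be} name {one or more|one and only one} entity2.'"""
    verb = "must" if mandatory else "may"
    if passive:
        verb += " be"
    degree = "one or more" if many else "one and only one"
    target = entity2 + "s" if many else entity2  # naive plural, for illustration only
    return f"Each {entity1} {verb} {name} {degree} {target}."


print(erdish("EMPLOYEE", True, "assigned to", False, "DEPARTMENT"))
print(erdish("DEPARTMENT", False, "contain", True, "EMPLOYEE", passive=False))
```

The two calls produce the two business rules of the DEPARTMENT/EMPLOYEE example, one for each direction of the relationship.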

Optionality in a relationship:

Cardinality in a relationship:
Examples:

There are three types of relationships:
• Many to one (M:1) or one to many (1:M): There are crow's feet on one side of the relationship. The direction of the crow's feet determines whether the relationship is M:1 or 1:M. This type of relationship is the most common.
• Many to many (M:M): There are crow's feet on both sides of this relationship. It is common to see M:M relationships in a high-level ERD at the beginning of a project.
• One to one (1:1): The relationship line has no crow's feet on either end. This type of relationship is rare.

Note: The notation is slightly different in Oracle SQL Developer Data Modeler, where one to many is
1:N and many to many is M:N.
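Classifying a relationship by its crow's feet can be written as a small function. This is an illustrative sketch with hypothetical names. Each argument says whether there is a crow's foot at that entity's end of the line.

```python
def relationship_type(a_end_many: bool, b_end_many: bool) -> str:
    """Classify the relationship between entities A and B.

    a_end_many: crow's foot at A's end (one B instance relates to many A instances).
    b_end_many: crow's foot at B's end (one A instance relates to many B instances).
    """
    if a_end_many and b_end_many:
        return "M:M"
    if b_end_many:
        return "1:M"  # one A, many B
    if a_end_many:
        return "M:1"  # many A, one B
    return "1:1"


# DEPARTMENT (A) to EMPLOYEE (B): the crow's foot is at the EMPLOYEE end.
print(relationship_type(False, True))
```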

In the one-to-many (1:M) relationship, a single record in entity A can be related to one or more records in entity B, but a single record in entity B can be related to only one record in entity A.

In the many-to-many (M:M) relationship, many records in one entity match many records in another
entity.

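In the one-to-one (1:1) relationship, a single record in entity A can be related to only one record in entity B, and a single record in entity B can be related to only one record in entity A.
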
One additional relationship that must be mentioned is a recursive relationship. Recursive relationships are relationships between an entity and itself. In the slide example, there is a recursive relationship with the EMPLOYEE entity. There are still two business rules for this type of relationship.
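A recursive relationship is usually implemented as a foreign key that points back to the same table. The sketch below uses a hypothetical `manager_id` column. It implements two business rules: "each EMPLOYEE may be managed by one and only one EMPLOYEE" and "each EMPLOYEE may be the manager of one or more EMPLOYEEs". The column is nullable because the top manager has no manager. A self-join then reads the relationship in one direction.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.execute("""
CREATE TABLE employee (
    employee_id INTEGER PRIMARY KEY,
    last_name   TEXT NOT NULL,
    manager_id  INTEGER REFERENCES employee(employee_id)  -- optional: top manager has none
)
""")
conn.executemany("INSERT INTO employee VALUES (?, ?, ?)",
                 [(1, "King", None), (2, "Smith", 1), (3, "Jones", 1)])

# Self-join: read the same table once as "worker" (w) and once as "manager" (m).
rows = conn.execute("""
    SELECT w.last_name, m.last_name
    FROM employee AS w JOIN employee AS m ON w.manager_id = m.employee_id
    ORDER BY w.employee_id
""").fetchall()
print(rows)  # [('Smith', 'King'), ('Jones', 'King')]
```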

Non-transferable relationships:

Transferability is whether a relationship between two entity instances can change over time. For example, a person's birthplace cannot be transferred, so that relationship is non-transferable. A person may change departments or jobs within a company, so that relationship is transferable.

Relationship matrix:
A relationship matrix has the following characteristics:
• A relationship matrix shows whether and how each row entity on the left side of the matrix is related to each column entity shown across the top of the matrix.
• All entities are listed along the left side of the matrix and at the top of the matrix.
• If a row entity is related to a column entity, the name of that relationship is shown in the intersection box.
• If a row entity is not related to a column entity, the intersection box is empty.
• Each relationship above the diagonal line is the inverse or mirror image of a relationship below the line.
• Recursive relationships are represented by the boxes on the diagonal.
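A relationship matrix maps naturally onto a nested dictionary. This sketch uses the DEPARTMENT, EMPLOYEE, and ACTIVITY entities. Every relationship is recorded once per direction, so each filled box above the diagonal has its mirror below it. The ACTIVITY and EMPLOYEE relationship names are placeholders chosen for illustration. The diagonal stays empty because no recursive relationship is recorded.

```python
entities = ["DEPARTMENT", "EMPLOYEE", "ACTIVITY"]
# matrix[row][col] holds the relationship name read from the row entity to the column entity
matrix = {row: {col: None for col in entities} for row in entities}


def relate(a: str, a_to_b: str, b: str, b_to_a: str) -> None:
    """Record both directions of one relationship (mirror boxes across the diagonal)."""
    matrix[a][b] = a_to_b
    matrix[b][a] = b_to_a


relate("DEPARTMENT", "contain", "EMPLOYEE", "assigned to")
relate("ACTIVITY", "performed by", "EMPLOYEE", "responsible for")  # placeholder names


def mirrored(m) -> bool:
    """Every filled box must have a filled mirror box; empty boxes mean no relationship."""
    return all((m[r][c] is None) == (m[c][r] is None) for r in m for c in m)


width = 17
print(" " * width + "".join(col.ljust(width) for col in entities))
for row in entities:
    print(row.ljust(width) + "".join((matrix[row][col] or "").ljust(width) for col in entities))
```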

Relationship Matrix:

To map the matrix to an ERD, you draw a box for each entity and then draw the relationship and state the business rule. Doing so helps to determine each relationship’s type and cardinality.
To help in the transformation between the matrix and the ERD, follow these steps:
1. Determine a relationship's existence.
2. Name the relationship.
3. Determine the relationship’s cardinality.

The first step in creating a relationship is to determine its existence. Ask yourself, “Does a significant relationship exist between ENTITY A and ENTITY B?”
In the slide example, consider the following questions:
• Is there a significant relationship between DEPARTMENT and EMPLOYEE? The answer is yes.
• Is there a significant relationship between DEPARTMENT and ACTIVITY? The answer is no.
• Is there a significant relationship between ACTIVITY and EMPLOYEE? The answer is yes.

Log the relationships among ACTIVITY, DEPARTMENT, and EMPLOYEE on a relationship matrix. The check marks indicate that a relationship exists.
A relationship matrix is used to systematically examine each pair of entities.
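The pairwise check can be sketched with `itertools.combinations_with_replacement`. It yields every unordered pair of entities once, including each entity paired with itself (the diagonal, where recursive relationships go). The answers are hard-coded from the slide example. Treating the recursive pairs as "no" is an assumption.

```python
from itertools import combinations_with_replacement

entities = ["ACTIVITY", "DEPARTMENT", "EMPLOYEE"]
# "Yes" answers from the slide example; every other pair, including the
# diagonal (recursive) pairs, is assumed to be "no" here.
significant = {frozenset({"DEPARTMENT", "EMPLOYEE"}), frozenset({"ACTIVITY", "EMPLOYEE"})}

log = []
for a, b in combinations_with_replacement(entities, 2):
    exists = frozenset({a, b}) in significant
    log.append((a, b, exists))
    print(f"Does a significant relationship exist between {a} and {b}? {'yes' if exists else 'no'}")
```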

Naming the relationship:

Relationship names represent a role and tend to be passive verbs, noun role names, or prepositions.
Try not to use "related to" or "associated with" as relationship names because they lack specific
meaning and are weak.
The following is a sample list of relationship name pairs to assist you in naming relationships:
• based on/the basis for
• bought from/the supplier of
• description of/for
• operated by/the operator for
• represented by/the representation of
• responsible for/the responsibility of

Log the relationship names in the relationship matrix as shown in the slide.

Determining the relationship minimum cardinality:

The first question to answer is this: What is the minimum cardinality for each direction of the
relationship?
In the slide example, answer the following questions:
• Must an EMPLOYEE be assigned to a DEPARTMENT? The answer is yes, always.
• Is there any situation in which an EMPLOYEE will not be assigned to a DEPARTMENT? The answer is no. An EMPLOYEE must always be assigned to a DEPARTMENT. (Mandatory)
• Must a DEPARTMENT be composed of at least one EMPLOYEE? The answer is no. A DEPARTMENT does not have to be composed of any EMPLOYEE. (Optional)

When the minimum cardinality is optional, the value could be zero. When the minimum cardinality is
mandatory, the value must be at least one.
Note that the relationship line in the slide was intentionally drawn without the maximum cardinality.

Determining the relationship maximum cardinality:

The second question to answer is this: What is the maximum cardinality for each direction of the
relationship?
In the slide example, answer the following questions:
• Can an EMPLOYEE be assigned to more than one DEPARTMENT? The answer is no. An EMPLOYEE must always be assigned to one and only one DEPARTMENT. (One)
• May a DEPARTMENT be composed of more than one EMPLOYEE? The answer is yes. A DEPARTMENT may be composed of one or more EMPLOYEEs. (Many)

When the maximum cardinality is one, the value can be only one. When the maximum cardinality is
many, the value can be one or more.
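In a relational implementation, the minimum and maximum cardinality on the EMPLOYEE side determine how the foreign key is declared. This `sqlite3` sketch uses hypothetical table names. Maximum "one" becomes a single foreign key column on the many side. Minimum "mandatory" makes that column `NOT NULL`. The optional DEPARTMENT side needs no constraint, so a department without employees is accepted.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE department (
    department_id INTEGER PRIMARY KEY,
    name          TEXT NOT NULL
);
-- Maximum "one" from EMPLOYEE: a single foreign key column on the many side.
-- Minimum "mandatory" from EMPLOYEE: that column is NOT NULL.
CREATE TABLE employee (
    employee_id   INTEGER PRIMARY KEY,
    department_id INTEGER NOT NULL REFERENCES department(department_id)
);
""")
conn.execute("INSERT INTO department VALUES (10, 'Sales')")
conn.execute("INSERT INTO department VALUES (20, 'Research')")  # optional side: no employees yet
conn.execute("INSERT INTO employee VALUES (1, 10)")


def try_insert(sql: str) -> str:
    """Run an INSERT and report whether the constraints accepted it."""
    try:
        conn.execute(sql)
        return "accepted"
    except sqlite3.IntegrityError:
        return "rejected"


print(try_insert("INSERT INTO employee VALUES (2, NULL)"))  # rejected: mandatory
print(try_insert("INSERT INTO employee VALUES (3, 99)"))    # rejected: no such department
```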

Validating the relationship:


Create ERDish sentences to represent ERDs

Data modeling also uses industry-specific terminology, which we will call ERDish for the purposes of this class. ERDish is the vocabulary used to clearly communicate the business rules captured on an ERD. It gives you a common language both when collecting the business rules from your client and when communicating them to the database administrators who will implement your design.

ERDish is the language you use to state relationships between entities in an ERD. You have already been speaking and writing it when you identified relationships and specified optionality and cardinality. Here, each ERDish sentence is simply broken down into its components.

Components of ERDish
M:M Relationships:

An intersection entity (ORDER ITEM) was created to store additional attributes. The UID will be the relationships between ORDER and PRODUCT.
Here are the intersection entity characteristics:
• The relationships from the intersection entity are always mandatory.
• Intersection entities usually contain consumables like quantity used and dates. They tend to be high-volume and volatile entities.
• An intersection entity is identified by its two originating relationships (identifying relationships).

Note: Choosing a name for the intersection entity is often difficult. Leave it blank at first, move the
relationships, add the attributes, and then it will often be clearer what the name should be.
If you do not have any additional attributes in the intersection entity, you can leave it as an M:M relationship, and Oracle SQL Developer Data Modeler will create the intersection table in the relational model.
Barred relationship:

A barred relationship is mapped to a foreign key column on the many side, just like any other M:1
relationship. The UID of the intersection entity often comes from the originating relationships and is
represented by bars.
In this case, the relationships from the originating entities to the intersection entity are called barred
relationships.
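Here is one way to implement the ORDER ITEM intersection entity and its barred relationships, sketched with `sqlite3` and hypothetical names. The table is called `orders` because ORDER is a reserved word in SQL. Both relationships become mandatory foreign keys that together form the primary key. The additional attribute is stored in the intersection table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE orders  (order_id   INTEGER PRIMARY KEY, order_date TEXT NOT NULL);
CREATE TABLE product (product_id INTEGER PRIMARY KEY, name       TEXT NOT NULL);
-- ORDER ITEM resolves the M:M. Both relationships are mandatory (NOT NULL)
-- and barred (part of the primary key); quantity is the additional attribute.
CREATE TABLE order_item (
    order_id   INTEGER NOT NULL REFERENCES orders(order_id),
    product_id INTEGER NOT NULL REFERENCES product(product_id),
    quantity   INTEGER NOT NULL CHECK (quantity > 0),
    PRIMARY KEY (order_id, product_id)
);
""")
conn.executemany("INSERT INTO orders VALUES (?, ?)", [(1, "2024-03-01"), (2, "2024-03-02")])
conn.executemany("INSERT INTO product VALUES (?, ?)", [(100, "Pen"), (200, "Paper")])
conn.executemany("INSERT INTO order_item VALUES (?, ?, ?)",
                 [(1, 100, 3), (1, 200, 1), (2, 100, 5)])

# How many pens were ordered across all orders?
pen_total = conn.execute(
    "SELECT SUM(quantity) FROM order_item WHERE product_id = 100").fetchone()[0]
print(pen_total)  # 8
```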
Compose Unique Identifier:

Modeling hierarchical data


Recursive relationships:

A recursive relationship is one in which instances of an entity are related to other instances of the same entity. For example, every manager is also an employee. So manager is not a new entity, but just a subset of the instances of the EMPLOYEE entity. This many-to-one relationship is optional in both directions. Apart from linking an entity to itself, it is modeled the same way as a relationship between two distinct entities.
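Hierarchical data stored through a recursive relationship can be traversed with a recursive common table expression. The sketch below uses SQLite's `WITH RECURSIVE` and made-up employees. It walks from one employee up through the chain of managers. The table layout is an assumption.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
CREATE TABLE employee (
    employee_id INTEGER PRIMARY KEY,
    last_name   TEXT NOT NULL,
    manager_id  INTEGER REFERENCES employee(employee_id)
)
""")
conn.executemany("INSERT INTO employee VALUES (?, ?, ?)",
                 [(1, "Adams", None), (2, "Baker", 1), (3, "Clark", 2), (4, "Davis", 3)])

# Start at employee 4 and repeatedly join to the row named by manager_id.
chain = [name for (name,) in conn.execute("""
    WITH RECURSIVE chain(employee_id, last_name, manager_id, depth) AS (
        SELECT employee_id, last_name, manager_id, 0 FROM employee WHERE employee_id = 4
        UNION ALL
        SELECT e.employee_id, e.last_name, e.manager_id, c.depth + 1
        FROM employee AS e JOIN chain AS c ON e.employee_id = c.manager_id
    )
    SELECT last_name FROM chain ORDER BY depth
""")]
print(chain)  # ['Davis', 'Clark', 'Baker', 'Adams']
```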
Generic Modeling:

The Quantity attribute is associated with the recursive relationship itself, and therefore the relationship must be resolved. Resolve the M:M recursive relationship by adding the ASSEMBLY RULE intersection entity and two 1:M relationships back to the COMPONENT entity. ASSEMBLY RULE will have a Quantity attribute.

Using the instance data again, the ASSEMBLY RULE instance for washers to fan is related to the COMPONENT instance for washer through one relationship and to the COMPONENT instance for fan through the other. The ASSEMBLY RULE entity records the quantity of washers that are part of a single fan.
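The ASSEMBLY RULE resolution can be sketched in SQLite as follows. The column names `assembly_id` and `part_id` are hypothetical labels for the two relationships back to COMPONENT. The first query reads the direct quantity of washers in a fan. The second query goes beyond the slide example. It multiplies quantities down the hierarchy to count all washers, including those inside a sub-assembly (here, a hypothetical motor), and assumes the rules contain no cycles.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE component (
    component_id INTEGER PRIMARY KEY,
    name         TEXT NOT NULL
);
-- ASSEMBLY RULE: two mandatory relationships back to COMPONENT, plus Quantity
CREATE TABLE assembly_rule (
    assembly_id INTEGER NOT NULL REFERENCES component(component_id),  -- the whole
    part_id     INTEGER NOT NULL REFERENCES component(component_id),  -- the part
    quantity    INTEGER NOT NULL CHECK (quantity > 0),
    PRIMARY KEY (assembly_id, part_id),
    CHECK (assembly_id <> part_id)
);
""")
conn.executemany("INSERT INTO component VALUES (?, ?)",
                 [(1, "fan"), (2, "motor"), (3, "washer")])
conn.executemany("INSERT INTO assembly_rule VALUES (?, ?, ?)",
                 [(1, 3, 4),   # a fan contains 4 washers directly
                  (1, 2, 1),   # a fan contains 1 motor
                  (2, 3, 2)])  # a motor contains 2 washers

direct = conn.execute(
    "SELECT quantity FROM assembly_rule WHERE assembly_id = 1 AND part_id = 3").fetchone()[0]

# Multiply quantities down the hierarchy (assumes no cycles in the rules).
total = conn.execute("""
    WITH RECURSIVE explode(part_id, qty) AS (
        SELECT part_id, quantity FROM assembly_rule WHERE assembly_id = 1
        UNION ALL
        SELECT r.part_id, e.qty * r.quantity
        FROM assembly_rule AS r JOIN explode AS e ON r.assembly_id = e.part_id
    )
    SELECT SUM(qty) FROM explode WHERE part_id = 3
""").fetchone()[0]
print(direct, total)  # 4 6
```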
Arc Relationship

An arc is an exclusive relationship group, defined so that only one of the relationships can exist for any instance of an entity. An exclusive relationship occurs when two or more mutually exclusive relationships from the same entity use an arc. The relationship implies an "or" condition. The arc drawn between the relationships connects them and demonstrates their mutual exclusivity. It indicates that any instance of that entity can have only one of the relationships in the arc at any one time.
Here are the characteristics of exclusive relationships:
• The relationships in an arc frequently have the same relationship name.
• The relationships in an arc must be either all mandatory or all optional.
• An arc belongs to a single entity and must include only relationships that originate from the entity.
• An entity may have multiple arcs, but a specific relationship can participate only in a single arc.

The supertype and subtype (arc) implementation produces one table for every entity. The supertype
table has a foreign key for each subtype table. These foreign keys represent exclusive relationships and
are optional because only one foreign key can have a value for each row in the table.
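An arc between optional foreign keys can be enforced with a CHECK constraint. In this sketch, the supertype table `party` has one foreign key per subtype table. All names are hypothetical. The constraint requires exactly one of the two foreign keys to have a value, which models a mandatory arc. For an optional arc, you would instead only forbid both foreign keys from having a value.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE individual (individual_id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE company    (company_id    INTEGER PRIMARY KEY, name TEXT NOT NULL);
-- Supertype table with one optional foreign key per subtype table.
CREATE TABLE party (
    party_id      INTEGER PRIMARY KEY,
    individual_id INTEGER REFERENCES individual(individual_id),
    company_id    INTEGER REFERENCES company(company_id),
    -- The arc: exactly one of the two foreign keys has a value.
    CHECK ((individual_id IS NULL) <> (company_id IS NULL))
);
""")
conn.execute("INSERT INTO individual VALUES (1, 'Ann')")
conn.execute("INSERT INTO company VALUES (7, 'Acme')")


def rejected(sql: str) -> bool:
    """Return True if a constraint refuses the statement."""
    try:
        conn.execute(sql)
        return False
    except sqlite3.IntegrityError:
        return True


print(rejected("INSERT INTO party VALUES (100, 1, NULL)"))     # False: individual only
print(rejected("INSERT INTO party VALUES (101, NULL, 7)"))     # False: company only
print(rejected("INSERT INTO party VALUES (102, 1, 7)"))        # True: both relationships
print(rejected("INSERT INTO party VALUES (103, NULL, NULL)"))  # True: neither relationship
```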
Module 8
Accommodating historical data

Most businesses need to track some historical data to help them find trends and patterns that can be the
basis for business innovations or process improvements.

Every update of an attribute or transfer of a relationship means loss of information. Often information
is no longer useful, but some systems must keep track of some or all of the historical values of an
attribute. In addition, some systems are required to keep an audit trail of each transaction.

Validate any requirements for storing historical data with the user, because storing unnecessary
historical data can be costly.
Tracking attribute values

As shown in the slide example, the CONTRACT entity was initially created with four attributes. Because the attributes are contained in one entity, the contract's status and effective date will change to the current status and effective date each time the contract changes. If you are required to track when the status of a particular contract changes, create an additional entity and move the status and effective date to the new entity. In this way, the information about when the status of a contract changed can be tracked.
If the status changes more than once on the same day, you should include a current status value attribute in the unique identifier.
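Here is a minimal `sqlite3` sketch of the history entity for CONTRACT, with hypothetical names. Every status change becomes a new row. The status on any date is the row with the latest effective date up to that day. Following the text, the status value is part of the primary key, so two changes on the same day can both be stored.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE contract (
    contract_id INTEGER PRIMARY KEY,
    description TEXT
);
-- One row per status change instead of one overwritten status column.
-- status is part of the key so that two changes on one day can both be stored.
CREATE TABLE contract_status (
    contract_id    INTEGER NOT NULL REFERENCES contract(contract_id),
    effective_date TEXT    NOT NULL,   -- ISO 8601 dates sort correctly as text
    status         TEXT    NOT NULL,
    PRIMARY KEY (contract_id, effective_date, status)
);
""")
conn.execute("INSERT INTO contract VALUES (1, 'Office cleaning')")
conn.executemany("INSERT INTO contract_status VALUES (?, ?, ?)",
                 [(1, "2024-01-10", "draft"), (1, "2024-02-01", "signed"),
                  (1, "2024-03-01", "active")])


def status_on(contract_id: int, day: str):
    """Latest status whose effective date is on or before `day` (None if none yet)."""
    row = conn.execute("""
        SELECT status FROM contract_status
        WHERE contract_id = ? AND effective_date <= ?
        ORDER BY effective_date DESC LIMIT 1
    """, (contract_id, day)).fetchone()
    return row[0] if row else None


print(status_on(1, "2024-02-15"))  # signed
print(status_on(1, "2099-01-01"))  # active (the current status)
```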
Accommodating a changing relationship:

You may need to add an entity to keep track of how a relationship changes over time. In the slide
example, a person can rent an apartment, but an apartment can be vacant. If you want to track which
apartments are rented at a particular point in time, add an entity to store rental information about an
apartment.
In the RENTAL entity, the apartment code and rental effective date are the unique identifier. You
would not include the person ID because you are storing the current and historical rentals. You would
not have two rentals for the same apartment with the same effective date.
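The RENTAL entity can answer "which apartments were rented on a given date?" with a range condition. The sketch below rests on several assumptions. Besides the UID columns from the text, it adds a hypothetical, exclusive `end_date` column (NULL while a rental is ongoing). The text does not say how the end of a rental is recorded.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
CREATE TABLE rental (
    apartment_code TEXT    NOT NULL,
    effective_date TEXT    NOT NULL,   -- ISO 8601 dates compare correctly as text
    end_date       TEXT,               -- hypothetical: first vacant day, NULL if ongoing
    person_id      INTEGER NOT NULL,
    PRIMARY KEY (apartment_code, effective_date)
)
""")
conn.executemany("INSERT INTO rental VALUES (?, ?, ?, ?)", [
    ("A1", "2023-01-01", "2024-01-01", 10),  # past rental
    ("A1", "2024-03-01", None, 11),          # current rental after a vacancy
    ("B2", "2023-06-01", None, 12),
])


def rented_on(day: str) -> list:
    """Apartment codes that had an active rental on `day`."""
    rows = conn.execute("""
        SELECT apartment_code FROM rental
        WHERE effective_date <= ? AND (end_date IS NULL OR end_date > ?)
        ORDER BY apartment_code
    """, (day, day)).fetchall()
    return [code for (code,) in rows]


print(rented_on("2024-01-15"))  # ['B2']
print(rented_on("2024-04-01"))  # ['A1', 'B2']
```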
Creating intersection entities:
Resolving M:M relationships in an intersection entity often allows you to track information about a relationship over time. In the slide example, there is an M:M relationship between EMPLOYEE and JOB, but that relationship alone cannot store information about an employee’s jobs over time. The JOB HISTORY entity stores this information with two identifying relationships with the EMPLOYEE and JOB entities.

Logging modified information:

When a system allows a user to modify or remove particular information, you may decide to keep the
old values. This is called logging or journaling and is common in financial type models.
In the slide example, a customer may make multiple payments and you can make changes to a
payment. If you want to track those changes, you can store the history in the PAYMENT
MODIFICATION entity.
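Logging is often automated with a trigger. In this hypothetical `sqlite3` sketch, an update trigger on `payment` copies the old and new amounts into `payment_modification` whenever the amount actually changes.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE payment (
    payment_id  INTEGER PRIMARY KEY,
    customer_id INTEGER NOT NULL,
    amount      NUMERIC NOT NULL
);
-- PAYMENT MODIFICATION keeps the old values each time a payment is changed.
CREATE TABLE payment_modification (
    payment_id  INTEGER NOT NULL REFERENCES payment(payment_id),
    modified_at TEXT    NOT NULL DEFAULT (datetime('now')),
    old_amount  NUMERIC NOT NULL,
    new_amount  NUMERIC NOT NULL
);
CREATE TRIGGER log_payment_update
AFTER UPDATE OF amount ON payment
WHEN OLD.amount <> NEW.amount
BEGIN
    INSERT INTO payment_modification (payment_id, old_amount, new_amount)
    VALUES (OLD.payment_id, OLD.amount, NEW.amount);
END;
""")
conn.execute("INSERT INTO payment VALUES (1, 42, 100)")
conn.execute("UPDATE payment SET amount = 120 WHERE payment_id = 1")
conn.execute("UPDATE payment SET amount = 110 WHERE payment_id = 1")
conn.execute("UPDATE payment SET amount = 110 WHERE payment_id = 1")  # no change: not logged

log = conn.execute(
    "SELECT old_amount, new_amount FROM payment_modification ORDER BY rowid").fetchall()
print(log)  # [(100, 120), (120, 110)]
```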
Module 9:
Normalizing data

The goal of data normalization is to reduce and even eliminate data redundancy. This matters for application developers because storing objects is difficult when a relational database keeps the same information in several places, since every copy must be kept consistent.
Each table should contain unique instances of its entity with valid attribute values. For example, the CUSTOMER entity (table) should contain each customer only once, not the same customer again with a misspelled or changed surname, and certain customer attributes should be valid. You can achieve this by using table constraints such as Primary Key, Unique Key, Check Constraint, and Default Constraint.
What is normalization:

Normalization is a relational database concept. However, if you have created a validated ERD, the
tables created during the design will conform to the rules of normalization. Each formal normalization
rule from the relational database design has a corresponding data model interpretation. The
interpretations that can be used to validate the placement of attributes in an ERD are shown in the
slide.
The goal is to normalize data to 3NF before transforming the model to your relational design.
Types of Normal Forms
First Normal Form

First Normal Form validates that each attribute has a single value for each occurrence of the entity. No attribute should have repeating values. In the slide example, the Date Contacted and Contact Details attributes could have more than one value for each CLIENT ID, so the entity is not in 1NF.
You must perform the following:
• Create another entity and move the repeating attributes to the new entity.
• Create an identifying 1:M relationship with the new entity.
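The two 1NF steps can be sketched in plain Python, with dictionaries standing in for entity instances and hypothetical attribute names. The repeating Date Contacted and Contact Details values move into a new CONTACT list. Each contact keeps the CLIENT ID, which represents the identifying 1:M relationship back to the client.

```python
# Hypothetical unnormalized CLIENT instances: Date Contacted and Contact Details repeat.
clients = [
    {"client_id": 1, "name": "Ames",
     "contacts": [("2024-01-05", "phone call"), ("2024-02-11", "email")]},
    {"client_id": 2, "name": "Berg",
     "contacts": [("2024-01-20", "visit")]},
]


def to_1nf(rows):
    """Split repeating contact values into a separate CONTACT list."""
    client_table = [{"client_id": r["client_id"], "name": r["name"]} for r in rows]
    # Each contact carries client_id (the identifying 1:M relationship)
    # and exactly one value per attribute.
    contact_table = [
        {"client_id": r["client_id"], "date_contacted": date, "contact_details": details}
        for r in rows
        for date, details in r["contacts"]
    ]
    return client_table, contact_table


client_table, contact_table = to_1nf(clients)
for contact in contact_table:
    print(contact)
```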

Second Normal Form

Second Normal Form validates that each attribute is dependent on its entity's entire UID. Each specific instance of the UID must determine a single instance of each attribute, and no attribute may depend on only part of its entity's UID.
In the slide example, the Bank Location attribute is dependent on the BANK entity, not on the ACCOUNT entity. Therefore, the attribute must be moved to the BANK entity.
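Here is a Python sketch of the 2NF fix, with hypothetical sample data in which the ACCOUNT UID is the bank plus the account number. Bank Location is copied into one row per bank and removed from the account rows. The function stops if the same bank appears with two locations, because then Bank Location would not depend on the bank alone.

```python
accounts = [
    {"bank_id": 1, "account_number": "1001", "balance": 50, "bank_location": "Utrecht"},
    {"bank_id": 1, "account_number": "1002", "balance": 20, "bank_location": "Utrecht"},
    {"bank_id": 2, "account_number": "1001", "balance": 75, "bank_location": "Ghent"},
]


def to_2nf(rows):
    """Move bank_location, which depends only on bank_id, into a BANK table."""
    banks = {}
    for r in rows:
        location = banks.setdefault(r["bank_id"], r["bank_location"])
        if location != r["bank_location"]:
            raise ValueError(f"bank {r['bank_id']} has conflicting locations")
    bank_table = [{"bank_id": b, "bank_location": loc} for b, loc in banks.items()]
    account_table = [{k: v for k, v in r.items() if k != "bank_location"} for r in rows]
    return bank_table, account_table


bank_table, account_table = to_2nf(accounts)
print(bank_table)
print(account_table)
```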
Third Normal Form

Third Normal Form validates that each attribute depends only on the UID of its entity (and on nothing
else). You need to move any non-UID attribute that is dependent on another non-UID attribute into a
new entity.
In the slide example, the Quantity and Price attributes are dependent on both the Order ID (UID) and
the Line Item ID (non-UID) attributes. Because these attributes are dependent in part on a non-UID
attribute, the attributes, along with the non-UID attribute (Line Item ID), should be moved to a new
entity and an identifying relationship should be created.
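Boyce-Codd Normal Form

Each table may have one or more candidate keys. From the candidate keys, one is selected as the primary key. A table is in Boyce-Codd Normal Form (BCNF) when every determinant, that is, every attribute or set of attributes on which another attribute is functionally dependent, is a candidate key.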
For example, if a student has many skills in different academic fields, move the skill and academic
field information to separate tables because they repeat and they are independent of each other.
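Fourth Normal Form (4NF) ensures that there are no nontrivial multivalued dependencies in a table other than on a candidate key. For example, consider a table consisting of three attributes: Course, Instructor, and Reference Book. If you change a particular course's recommended reference book, the change does not affect the course's instructors. Instructors and reference books are independent multivalued facts about a course, so 4NF stores them in separate tables (Course-Instructor and Course-Reference Book).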
Fifth Normal Form (5NF) differs from the definitions of the previous normal forms. It describes a goal
to be reached, rather than the declaration or resolution of a particular anomaly. The goal to be reached
with 5NF is to continue splitting the tables until either of the following two conditions is reached:
• Further splitting results in tables that cannot be joined to recreate the original (flat) table.
• The only splits left are trivial.

Examples of Forms

First Normal Form requires that no multivalued attributes exist. To check for 1NF, validate that each
attribute has a single value for each instance of the entity. One code, one name, and one address exist
for the school building.

Second Normal Form (2NF) requires that any non-UID attribute be dependent on the entire UID.
Here, the school building name and address are tied to the school building code. Similarly, the
department name is tied to the department code.
The Third Normal Form (3NF) rule states that no non-UID attribute can be dependent on another non-
UID attribute.
Third Normal Form prohibits transitive dependencies. A transitive dependency exists when any
attribute in an entity is dependent on any other non-UID attribute in that entity.
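A transitive dependency can be tested against sample rows with a small functional-dependency check. This sketch is hypothetical. Sample data can only disprove a dependency, not prove it, so in practice the dependencies come from the business rules. In this data, Department Name depends on Department Code, which is not the UID of the EMPLOYEE rows, so the entity violates 3NF.

```python
from typing import Any, Mapping, Sequence


def fd_holds(rows: Sequence[Mapping[str, Any]], lhs: Sequence[str], rhs: str) -> bool:
    """True if every combination of lhs values maps to exactly one rhs value in `rows`."""
    seen = {}
    for r in rows:
        key = tuple(r[a] for a in lhs)
        if key in seen and seen[key] != r[rhs]:
            return False
        seen.setdefault(key, r[rhs])
    return True


def transitive_candidates(rows, uid, attrs):
    """Pairs (a, b) of non-UID attributes where a appears to determine b."""
    non_uid = [a for a in attrs if a not in uid]
    return [(a, b) for a in non_uid for b in non_uid
            if a != b and fd_holds(rows, [a], b)]


employees = [
    {"employee_id": 1, "department_code": "D10", "department_name": "Sales"},
    {"employee_id": 2, "department_code": "D10", "department_name": "Sales"},
    {"employee_id": 3, "department_code": "D20", "department_name": "Research"},
    {"employee_id": 4, "department_code": "D30", "department_name": "Sales"},
]
print(transitive_candidates(employees, ["employee_id"],
                            ["employee_id", "department_code", "department_name"]))
```

Two departments share the name "Sales", so Department Name does not determine Department Code. Only the code-to-name dependency is reported.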

Unnormalized data

Data that has not been normalized is considered unnormalized data. Do not confuse this data with
denormalized data. If an entity relationship model was not created at the start of a database design
project, you are likely to have unnormalized data rather than denormalized data.
If you want to add redundancy for faster performance or other reasons, follow the rules defined during the process of denormalization after you forward engineer the logical model to the relational model.
Normalization examples:

To transform the model to First Normal Form (1NF), the table must express a set of unordered, two-dimensional table structures. A table is considered to be in 1NF if it contains no repeating groups.
Perform the following steps:
a. Identify the data attributes in the order form.
b. Group the data attributes into a single data entity.
c. Determine which attribute will serve as the primary key.

You can now remove the repeating groups by performing the following steps:
a. Create a new table with the primary key of the base table and the repeating columns.
b. To ensure that the primary key is unique, add another appropriate column.
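For an order form, the two steps for removing repeating groups might look like this, sketched with `sqlite3` and hypothetical names. The new table takes the base table's primary key (`order_number`) plus the repeating columns. `line_number` is the additional column that keeps the key unique. This matters because the same product can appear on two lines of one form.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
-- Base table: one row per order form, with no repeating groups left.
CREATE TABLE order_form (
    order_number  INTEGER PRIMARY KEY,
    order_date    TEXT NOT NULL,
    customer_name TEXT NOT NULL
);
-- New table: the base table's key plus the formerly repeating columns.
-- line_number is the additional column that makes the primary key unique.
CREATE TABLE order_form_line (
    order_number INTEGER NOT NULL REFERENCES order_form(order_number),
    line_number  INTEGER NOT NULL,
    product_code TEXT    NOT NULL,
    quantity     INTEGER NOT NULL,
    PRIMARY KEY (order_number, line_number)
);
""")
conn.execute("INSERT INTO order_form VALUES (500, '2024-04-02', 'Ames')")
conn.executemany("INSERT INTO order_form_line VALUES (?, ?, ?, ?)",
                 [(500, 1, "PEN", 10), (500, 2, "INK", 2), (500, 3, "PEN", 5)])

lines = conn.execute(
    "SELECT COUNT(*) FROM order_form_line WHERE order_number = 500").fetchone()[0]
print(lines)  # 3
```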

To transform the model to Second Normal Form (2NF), you must remove attributes that are dependent on only one piece of a data entity's multi-part key.
Perform the following steps:
• Locate data entities that have multi-part keys.
• Move attributes that relate to only a piece of the multi-part key to a new data entity.

The key for the new data entity consists of the part of the old multi-part key that uniquely identifies the attributes.

To transform the model to Third Normal Form (3NF), you must remove attributes that do not directly relate to the primary key. Such attributes depend on another non-primary-key attribute, either within the same data entity or in another data entity.
Perform the following steps:
• Locate attributes that do not directly relate to the primary key.
• Move those attributes to a new data entity.

The key for the new data entity is the attribute that uniquely defines the data entity.
