Database and Knowledge Management
Unit 2
Week 08
Database
• A database is a collection of data or
information that is organized so that it can be
easily accessed, managed and updated.
• A group of related files makes up a database.
• A group of records of the same type is called
a file.
• A group of related fields, such as the student’s
name, the course taken, the date, and the grade,
comprises a record.
• A group of words or a complete number (such as
a person’s name or age) is called a field.
Data Storage Formats
• Files are electronic lists of data that have been
optimized to perform a particular transaction.
• A flat file is a simple database program whose
records have no relationship to one another.
Flat file databases are often used to store and
manipulate a single table or file.
Flat File
• A flat file stores data in a single table. Flat file
databases are generally in plain-text form,
where each line holds only one record. The
fields in the record are separated using
delimiters such as tabs and commas. Flat files
usually consist of a file which stores data in a
structured way.
• Example: spreadsheets such as Excel.
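As a minimal sketch of the flat file described above, the snippet below parses a small comma-delimited text in which each line holds one record. The field names follow the student record example earlier in these notes; the sample values are made up for illustration.

```python
import csv
import io

# Hypothetical flat file: one record per line, fields separated by commas.
flat_file = io.StringIO(
    "name,course,date,grade\n"
    "Alice,Databases,2024-01-15,A\n"
    "Bob,Databases,2024-01-15,B\n"
)

# DictReader treats the first line as the list of field names.
records = list(csv.DictReader(flat_file))

for record in records:
    print(record["name"], record["grade"])
```

Each record is independent of the others, which is what makes the file "flat".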
Problems in Traditional File System
• Data redundancy is the presence of duplicate
data in multiple data files so that the same data
are stored in more than one place or location.
• Data redundancy wastes storage resources and
also leads to data inconsistency, where the same
attribute may have different values.
• Lack of Flexibility: does not provide ad-hoc reports
• Poor Security
• Lack of Data Sharing
• Program-Data Dependence
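The redundancy problem listed above can be illustrated with a small sketch. It assumes two hypothetical files (a billing file and a shipping file) that each store their own copy of a customer's address; the identifiers and values are invented for illustration.

```python
# Two hypothetical files that each keep their own copy of a customer's address.
billing_file = {"C001": {"name": "Janet Jones", "address": "First Street Plot No4"}}
shipping_file = {"C001": {"name": "Janet Jones", "address": "First Street Plot No4"}}

# The customer moves, but only the billing file is updated.
billing_file["C001"]["address"] = "6th Avenue"

# The same attribute now has different values in the two files.
inconsistent = [
    cust_id
    for cust_id in billing_file
    if billing_file[cust_id]["address"] != shipping_file[cust_id]["address"]
]
print(inconsistent)
```

Storing the address in one place only would make this kind of inconsistency impossible.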
Database
• A database is a collection of groupings of
data/information that are related to each
other in some way (e.g., through common
fields). Logical groupings of information could
include such categories as customer data,
information about an order, and product
information.
Database Management System
• A database management system (DBMS) is
software that creates and manipulates these
databases.
– End-user DBMSs: Microsoft Access
– Enterprise DBMSs: DB2, SQL Server, and Oracle
– Open-source DBMSs: MySQL
– Cloud-based database platforms: Caspio
DATABASE MANAGEMENT SYSTEMS
• A database management system (DBMS) is
software that permits an organization to
centralize data, manage them efficiently, and
provide access to the stored data by application
programs.
• The DBMS acts as an interface between
application programs and the physical data files.
When the application program calls for a data
item, such as gross pay, the DBMS finds this item
in the database and presents it to the application
program.
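The interface role described above can be sketched with Python's built-in sqlite3 module, with an in-memory SQLite database standing in for the DBMS. The payroll table, its columns, and its values are assumptions made for this illustration only.

```python
import sqlite3

# An in-memory SQLite database stands in for the DBMS (hypothetical table and values).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE payroll (employee_id INTEGER PRIMARY KEY, name TEXT, gross_pay REAL)"
)
conn.execute("INSERT INTO payroll VALUES (1, 'Alice', 4200.0)")

# The application names the data item it wants; the DBMS locates it in storage.
row = conn.execute(
    "SELECT gross_pay FROM payroll WHERE employee_id = ?", (1,)
).fetchone()
gross_pay = row[0]
print(gross_pay)
conn.close()
```

The application never needs to know how or where the data is physically stored.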
Relational DBMS
• Relational databases represent data as two-
dimensional tables (called relations). Tables may
be referred to as files. Each table contains data on
an entity and its attributes.
• Microsoft Access is a relational DBMS for desktop
systems, whereas DB2, Oracle Database, and
Microsoft SQL Server are relational DBMS for
large mainframes and midrange computers.
MySQL is a popular open-source DBMS, and
Oracle Database Lite is a DBMS for small
handheld computing devices.
RDBMS
• A relational database is a collection of data items with
pre-defined relationships between them.
• These items are organized as a set of tables with
columns and rows. Each column in a table holds a
certain kind of data and a field stores the actual value
of an attribute. The rows in the table represent a
collection of related values of one object or entity.
• Each row in a table can be marked with a unique
identifier called a primary key, and rows in
different tables can be related using foreign keys.
This data can be accessed in many different ways
without reorganizing the database tables themselves.
RELATIONAL DATABASE TABLES
• The database has a separate table for the entity
SUPPLIER and a table for the entity PART. Each
table consists of a grid of columns and rows of
data.
• Each individual element of data for each entity is
stored as a separate field, and each field
represents an attribute for that entity. Fields in a
relational database are also called columns.
• The actual information about a single supplier
that resides in a table is called a row. Rows are
commonly referred to as records.
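A minimal sketch of the two tables, using SQLite through Python's sqlite3 module. Only SUPPLIER, PART, and Supplier_Number come from these notes; the other column names and all values are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# One table per entity; columns other than Supplier_Number are hypothetical.
conn.execute("CREATE TABLE SUPPLIER (Supplier_Number INTEGER, Supplier_Name TEXT)")
conn.execute(
    "CREATE TABLE PART (Part_Number INTEGER, Part_Name TEXT, Supplier_Number INTEGER)"
)
conn.executemany(
    "INSERT INTO SUPPLIER VALUES (?, ?)",
    [(8259, "Supplier A"), (8261, "Supplier B")],
)

cursor = conn.execute("SELECT * FROM SUPPLIER ORDER BY Supplier_Number")
columns = [d[0] for d in cursor.description]  # fields (columns) of the table
rows = cursor.fetchall()                      # records (rows) of the table
print(columns)
print(rows)
conn.close()
```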
Primary Key
• The field for Supplier_Number in the
SUPPLIER table uniquely identifies each record
so that the record can be retrieved, updated,
or sorted; it is called a key field.
• Each table in a relational database has one
field that is designated as its primary key. This
key field is the unique identifier for all the
information in any row of the table, and its
value cannot be duplicated.
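The uniqueness rule can be sketched in SQLite, assuming Supplier_Number is declared as the primary key. The Supplier_Name column and the values are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE SUPPLIER (Supplier_Number INTEGER PRIMARY KEY, Supplier_Name TEXT)"
)
conn.execute("INSERT INTO SUPPLIER VALUES (8259, 'Supplier A')")

# A second row with the same primary key value is rejected by the DBMS.
try:
    conn.execute("INSERT INTO SUPPLIER VALUES (8259, 'Supplier B')")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

supplier_count = conn.execute("SELECT COUNT(*) FROM SUPPLIER").fetchone()[0]
print(duplicate_rejected, supplier_count)
conn.close()
```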
Foreign Key
• When the field Supplier_Number appears in
the PART table, it is called a foreign key. It is
essentially a lookup field for finding data
about the supplier of a specific part.
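A sketch of the lookup in SQLite, under the assumption that PART.Supplier_Number is declared as a reference to SUPPLIER. Part names, supplier names, and numbers are hypothetical. Note that SQLite enforces foreign keys only after they are switched on with a PRAGMA.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite checks foreign keys only when enabled
conn.execute(
    "CREATE TABLE SUPPLIER (Supplier_Number INTEGER PRIMARY KEY, Supplier_Name TEXT)"
)
conn.execute(
    "CREATE TABLE PART ("
    "Part_Number INTEGER PRIMARY KEY, Part_Name TEXT, "
    "Supplier_Number INTEGER REFERENCES SUPPLIER(Supplier_Number))"
)
conn.execute("INSERT INTO SUPPLIER VALUES (8259, 'Supplier A')")
conn.execute("INSERT INTO PART VALUES (137, 'Door latch', 8259)")

# Use the foreign key stored in PART to look up the supplier of part 137.
supplier_number = conn.execute(
    "SELECT Supplier_Number FROM PART WHERE Part_Number = 137"
).fetchone()[0]
supplier_name = conn.execute(
    "SELECT Supplier_Name FROM SUPPLIER WHERE Supplier_Number = ?", (supplier_number,)
).fetchone()[0]

# A part that points at a non-existent supplier is rejected.
try:
    conn.execute("INSERT INTO PART VALUES (150, 'Door seal', 9999)")
    orphan_rejected = False
except sqlite3.IntegrityError:
    orphan_rejected = True

print(supplier_name, orphan_rejected)
conn.close()
```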
THE THREE BASIC OPERATIONS OF A RELATIONAL DBMS

The select, join, and project operations enable data from two different tables
to be combined and only selected attributes to be displayed.
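The three operations can be sketched in a single SQL query, run here through Python's sqlite3 module. The table contents and the non-key column names are hypothetical; the comments mark which clause corresponds to which operation.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE SUPPLIER (Supplier_Number INTEGER PRIMARY KEY, Supplier_Name TEXT)"
)
conn.execute(
    "CREATE TABLE PART (Part_Number INTEGER PRIMARY KEY, Part_Name TEXT, "
    "Supplier_Number INTEGER)"
)
conn.executemany(
    "INSERT INTO SUPPLIER VALUES (?, ?)",
    [(8259, "Supplier A"), (8261, "Supplier B")],
)
conn.executemany(
    "INSERT INTO PART VALUES (?, ?, ?)",
    [(137, "Door latch", 8259), (150, "Door seal", 8261), (152, "Compressor", 8259)],
)

result = conn.execute(
    """
    SELECT PART.Part_Number, PART.Part_Name, SUPPLIER.Supplier_Name   -- project
    FROM PART
    JOIN SUPPLIER ON PART.Supplier_Number = SUPPLIER.Supplier_Number  -- join
    WHERE PART.Part_Number IN (137, 150)                              -- select
    ORDER BY PART.Part_Number
    """
).fetchall()
print(result)
conn.close()
```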
What is Normalization?
• Normalization is the process of taking data
from a problem and reducing it to a set of
relations while ensuring data integrity and
eliminating data redundancy.
• Data integrity - all of the data in the database
are consistent, and satisfy all integrity
constraints.
Normalization
• The process of creating small, stable, flexible
and adaptive data structures from complex
groups of data is called normalization.
Normalization
• Normalization
– Process for evaluating and correcting table
structures to minimize data redundancies
– Works through a series of stages called normal
forms:
• First normal form (1NF)
• Second normal form (2NF)
• Third normal form (3NF)
1NF (First Normal Form) Rules
• Each table cell should contain a single value.
• Each record needs to be unique.
Example

EmpNum  EmpPhone  EmpDegrees
123     233-9876
333     233-1231  BA, BSc, PhD
679     233-1231  BSc, MSc
First Normal Form

EmpNum  EmpPhone  EmpDegrees
123     233-9876
333     233-1231  BA
333     233-1231  BSc
333     233-1231  PhD
679     233-1231  BSc
679     233-1231  MSc
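The conversion shown in the two tables above can be sketched in a few lines of Python. This is only an illustration of the idea (one value per cell), using the example's own data; representing a missing degree as None is an assumption.

```python
# Unnormalized rows from the example: EmpDegrees may hold several values in one cell.
unnormalized = [
    (123, "233-9876", ""),
    (333, "233-1231", "BA, BSc, PhD"),
    (679, "233-1231", "BSc, MSc"),
]

first_normal_form = []
for emp_num, emp_phone, emp_degrees in unnormalized:
    degrees = [d.strip() for d in emp_degrees.split(",") if d.strip()]
    if not degrees:
        # Keep an employee without degrees as one row with an empty value.
        first_normal_form.append((emp_num, emp_phone, None))
    for degree in degrees:
        # One row per degree, so every cell holds a single value.
        first_normal_form.append((emp_num, emp_phone, degree))

for row in first_normal_form:
    print(row)
```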
Exercise

Cust.ID  Full Names   Physical Address       Movies Rented                                  Salutation  Category
1        Janet Jones  First Street Plot No4  Pirates of the Caribbean, Clash of the Titans  Ms.         Action, Action
2        Rober Phil   3rd Street 45          Daddys Little Girl, Marshal                    Mr.         Romance, Romance
3        Ram          6th Avenue             Clash of the Titans                            Mr.         Action
First Normal Form

C.ID  Full Names   Physical Address       Movies Rented             Salutation  Category
1     Janet Jones  First Street Plot No4  Pirates of the Caribbean  Ms.         Action
1     Janet Jones  First Street Plot No4  Clash of the Titans       Ms.         Action
2     Rober Phil   3rd Street 45          Daddys Little Girl        Mr.         Romance
2     Rober Phil   3rd Street 45          Marshal                   Mr.         Romance
3     Ram          6th Avenue             Clash of the Titans       Mr.         Action


Second Normal Form
• The table should be in the First Normal Form.
• There should be no Partial Dependency.
• All non-key attributes are fully dependent on
the primary key.
• All partial dependencies are removed and
placed in another table.
Example
Second Normal Form

C.ID  Full Names   Physical Address       Salutation
1     Janet Jones  First Street Plot No4  Ms.
2     Rober Phil   3rd Street 45          Mr.
3     Ram          6th Avenue             Mr.

C.ID  Movies Rented             Category
1     Pirates of the Caribbean  Action
1     Clash of the Titans       Action
2     Daddys Little Girl        Romance
2     Marshal                   Romance
3     Clash of the Titans       Action
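The split into two tables shown above can be sketched as follows. The code mirrors the decomposition used in this example: customer details are stored once per customer, and Category stays with the rentals. The variable names are hypothetical.

```python
# 1NF rows: (C.ID, Full Names, Physical Address, Movies Rented, Salutation, Category)
first_normal_form = [
    (1, "Janet Jones", "First Street Plot No4", "Pirates of the Caribbean", "Ms.", "Action"),
    (1, "Janet Jones", "First Street Plot No4", "Clash of the Titans", "Ms.", "Action"),
    (2, "Rober Phil", "3rd Street 45", "Daddys Little Girl", "Mr.", "Romance"),
    (2, "Rober Phil", "3rd Street 45", "Marshal", "Mr.", "Romance"),
    (3, "Ram", "6th Avenue", "Clash of the Titans", "Mr.", "Action"),
]

customers = {}  # C.ID -> (Full Names, Physical Address, Salutation), stored once
rentals = []    # (C.ID, Movies Rented, Category), one row per rental

for cust_id, name, address, movie, salutation, category in first_normal_form:
    customers[cust_id] = (name, address, salutation)
    rentals.append((cust_id, movie, category))

print(customers)
print(rentals)
```

Customer details that were repeated on every rental row now appear only once.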
3NF (Third Normal Form) Rules

• The table must already be in 2NF.
• Move to a new table any non-key attributes
that depend more on other non-key attributes
than on the table key.
• Such a dependency is called a transitive
dependency.
• Example: in a table with Customer Id, Country
Code, and Country Name, Country Name depends
on Country Code rather than directly on
Customer Id.
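A small sketch of removing the transitive dependency in the Customer Id, Country Code, Country Name example. The sample rows are invented; the point is that Country Name moves to its own table keyed by Country Code, and the original rows can still be rebuilt.

```python
# Hypothetical 2NF rows: (Customer Id, Country Code, Country Name)
rows = [
    (1, "GB", "United Kingdom"),
    (2, "US", "United States"),
    (3, "GB", "United Kingdom"),
]

# 3NF: customers keep only the Country Code; names live in a separate table.
customers = [(customer_id, code) for customer_id, code, _ in rows]
countries = {code: name for _, code, name in rows}

# Joining the two tables on Country Code reproduces the original rows.
rebuilt = [(customer_id, code, countries[code]) for customer_id, code in customers]
print(countries)
print(rebuilt == rows)
```

Each country name is now stored once, so renaming a country requires a single update.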
Benefits of Normalization
• Greater overall database organization
• Reduction of redundant data
• Data consistency within the database
• A much more flexible database design
• A better handle on database security
Exercise
1NF Table

Source: https://www.sqa.org.uk/e-learning/MDBS01CD/page_26.htm
