



Information Quality

Characteristics of high-quality information:

Relevance
Accuracy
Completeness
Timeliness
Information Quality
Information quality impacts decision-making

The importance of data

Information Quality
Potential business effects resulting from low-quality information include:

Inability to accurately track customers

Difficulty identifying valuable customers

Inability to identify selling opportunities

Marketing to nonexistent customers

File Organization terms and concepts

The data hierarchy (from largest to smallest unit):

Database: a group of related files

File: a group of records of the same type

Record: a group of related fields

Field: a group of characters (e.g., a word, a group of words)

Byte: a group of bits that represents a single character (e.g., K in 2KA3 is a byte)

Bit: the smallest unit of data; a binary digit (0 or 1)
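The hierarchy can be made concrete in a few lines of Python. This is a minimal sketch, assuming ASCII encoding (one byte per character); the student IDs, names, and file names are hypothetical illustration data.

```python
# Bit -> byte: in ASCII, each character is stored as one byte of 8 bits.
char = "K"
byte = char.encode("ascii")        # b'K', a single byte
bits = format(byte[0], "08b")      # the 8 bits that make up that byte
print(char, "->", len(byte), "byte ->", bits)  # K -> 1 byte -> 01001011

# Field: a group of characters. Record: a group of related fields.
record = {"student_id": "2KA3", "name": "Ana Lima", "major": "Finance"}

# File: a group of records of the same type (hypothetical STUDENT file).
student_file = [
    record,
    {"student_id": "9XB1", "name": "Omar Haddad", "major": "Accounting"},
]

# Database: a group of related files.
database = {"STUDENT": student_file, "COURSE": []}
print(sum(len(f) for f in database.values()), "records in the database")
```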
Traditional File Environment
A file describes an entity.
Entity: a person, place, thing, or event about which information is stored and maintained
Attribute: each characteristic or quality describing a particular entity
Together, the attributes describe the entity
Traditional File Environment
Problems with the traditional file environment:
Data redundancy and inconsistency
Data redundancy: duplicate data in multiple data files, so that the same data are stored in more than one place or location
Data redundancy leads to data inconsistency, where the same attribute may have different values. E.g., the data for the entity COURSE may be updated in some systems but not in others
Confusion might also result from using different coding systems to represent one value of an attribute
Program-data dependency
Coupling of data stored in files and the specific programs that use them, so that changes in programs require changes to the data
Lack of flexibility
The system can deliver routine scheduled reports after extensive programming efforts, but ad hoc reports are too expensive, and it cannot respond to unanticipated information requirements in a timely fashion
Poor security

Hard to know who is accessing or making changes to the organization's data

Lack of data sharing and availability

Virtually impossible for information to be shared or accessed in a timely manner
Information cannot flow freely across different functional areas or different parts
of the organization
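A small sketch of how redundancy turns into inconsistency. The two departmental "files" below are hypothetical Python dictionaries, each holding its own copy of the same COURSE data; updating one copy leaves the other stale, and nothing in a traditional file environment detects it automatically.

```python
# Two departments each keep their own file, so the same COURSE data is
# stored in more than one place (data redundancy).
registrar_file = {"MIS101": {"title": "Intro to MIS", "room": "B12"}}
faculty_file = {"MIS101": {"title": "Intro to MIS", "room": "B12"}}

# The registrar moves the course, but only its own copy is updated.
registrar_file["MIS101"]["room"] = "C07"

def find_inconsistencies(file_a, file_b):
    """Return (key, attribute, value_a, value_b) for each attribute that disagrees."""
    problems = []
    for key in sorted(file_a.keys() & file_b.keys()):
        for attr in sorted(file_a[key].keys() & file_b[key].keys()):
            if file_a[key][attr] != file_b[key][attr]:
                problems.append((key, attr, file_a[key][attr], file_b[key][attr]))
    return problems

print(find_inconsistencies(registrar_file, faculty_file))
# [('MIS101', 'room', 'C07', 'B12')]
```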
The Database Approach to Data
Database: a collection of data organized to serve many applications efficiently by centralizing the data and controlling redundant data
A single database services multiple applications

Database Management System (DBMS):

Software that permits an organization to centralize
data, manage them efficiently, and provide access
to the stored data by application programs.
An interface between application programs and the
physical data files
The Database Approach

DBMS advantages
Reduce redundancy and inconsistency by minimizing isolated files

Increase accessibility and availability of information

Reduce program development and maintenance

Increase data security

The Database Approach
Relational DBMS is the most popular type of DBMS

Relational databases represent data as two-dimensional tables (called relations)
Tables may be referred to as files

Each table contains data on an entity and its attributes

Examples: MS Access, Microsoft SQL Server

The Database Approach



Each table represents an entity; each field represents an attribute for that entity
Primary key (key field): unique identifier for all the information in any row; cannot be duplicated
Foreign key: a lookup field that references a column (the primary key) of another table
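A minimal sketch of both kinds of key, using Python's built-in `sqlite3` module (an assumption; the slides do not name a DBMS here). The SUPPLIER and PART tables and their values are hypothetical. The primary key rejects a duplicate value; the foreign key is only declared in this example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE supplier (
    supplier_number INTEGER PRIMARY KEY,   -- primary key: must be unique per row
    supplier_name   TEXT NOT NULL
);
CREATE TABLE part (
    part_number     INTEGER PRIMARY KEY,
    part_name       TEXT NOT NULL,
    -- foreign key: a lookup field pointing at supplier's primary key
    supplier_number INTEGER REFERENCES supplier(supplier_number)
);
""")
conn.execute("INSERT INTO supplier VALUES (8259, 'Supplier A')")

try:
    # A second row reusing an existing primary key value is rejected.
    conn.execute("INSERT INTO supplier VALUES (8259, 'Supplier B')")
    duplicate_rejected = False
except sqlite3.IntegrityError as err:
    duplicate_rejected = True
    print("Rejected:", err)
```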
The Database Approach
Question: Find the name of suppliers that could provide us with part number 137 or part
number 150.

Join: combines relational tables to provide the user with more information

Select: creates a subset consisting of all records in the file that meet stated criteria

Project: creates a subset consisting of columns in a table; the result contains only the information required
The Database Approach

Non-relational databases (NoSQL)

New needs driving them:
Cloud and web services
Growing data volumes
New data types

Non-relational DBs:
Use a more flexible data model
Manage large data sets distributed across multiple machines, so capacity can scale up or down
Accelerate simple queries against large volumes of structured and unstructured data
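To illustrate the "more flexible data model", here is a toy document store in plain Python: documents are schema-less dictionaries, so two documents in the same store can carry different fields. This is a hypothetical teaching sketch, not the API of any real NoSQL product; it only hints at scaling out by hashing each key to one of several in-process "nodes".

```python
import hashlib

class ToyDocumentStore:
    """Schema-less key/document store partitioned across N in-process 'nodes'."""

    def __init__(self, num_nodes=3):
        self.nodes = [{} for _ in range(num_nodes)]

    def _node_for(self, key):
        # Hash the key so documents spread across the nodes.
        digest = hashlib.sha256(key.encode("utf-8")).digest()
        return self.nodes[digest[0] % len(self.nodes)]

    def put(self, key, document):
        self._node_for(key)[key] = document

    def get(self, key):
        return self._node_for(key).get(key)

    def find(self, field, value):
        # A simple query: scan every node for documents where field == value.
        return [doc for node in self.nodes for doc in node.values()
                if doc.get(field) == value]

store = ToyDocumentStore()
# Documents in the same store need not share a schema.
store.put("cust:1", {"name": "Ana", "city": "Lisbon"})
store.put("cust:2", {"name": "Omar", "city": "Lisbon", "tags": ["vip"]})
store.put("post:9", {"text": "Great service!", "likes": 12})

print(store.get("cust:2")["tags"])        # ['vip']
print(len(store.find("city", "Lisbon")))  # 2
```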
The Database Approach

Databases in the Cloud

E.g., Amazon Web Services

Amazon RDS (Relational Database Service)
The Database Approach
Capabilities of DBMS
Data Definition
The capability to specify the structure of the content of the DB
Used to create DB tables and to define the characteristics of the fields in each table, which are documented in a data dictionary
Data Dictionary
Stores definitions of data elements and their characteristics
Data Manipulation
Used to add, change, delete, and retrieve the data in the DB, and for querying and reporting
The most prominent data manipulation language: SQL (Structured
Query Language)
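The three capabilities map onto distinct SQL statements. In this sketch with Python's `sqlite3` module, `CREATE TABLE` is the data definition step; SQLite's `PRAGMA table_info` stands in for a data dictionary (a full DBMS dictionary records much more); `INSERT`, `UPDATE`, `DELETE`, and `SELECT` are data manipulation. The EMPLOYEE table and its rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Data definition: specify the structure of the content of the database.
conn.execute("""
CREATE TABLE employee (
    employee_id INTEGER PRIMARY KEY,
    name        TEXT NOT NULL,
    department  TEXT,
    salary      REAL
)
""")

# Data dictionary (SQLite's catalog): definitions of each data element.
# PRAGMA table_info rows are (cid, name, type, notnull, default_value, pk).
dictionary = [(name, col_type, bool(notnull), bool(pk))
              for _cid, name, col_type, notnull, _default, pk
              in conn.execute("PRAGMA table_info(employee)")]
for entry in dictionary:
    print(entry)

# Data manipulation: add, change, delete, and retrieve data with SQL.
conn.execute("INSERT INTO employee VALUES (1, 'Ana', 'Sales', 52000)")
conn.execute("INSERT INTO employee VALUES (2, 'Omar', 'IT', 61000)")
conn.execute("UPDATE employee SET salary = salary + 2000 WHERE employee_id = 1")
conn.execute("DELETE FROM employee WHERE department = 'IT'")
remaining = conn.execute("SELECT name, salary FROM employee").fetchall()
print(remaining)
```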
The Database Approach
Designing databases
To create a database, you must understand:

The relationships among the data

The type of data that will be maintained in the DB
How the data will be used
How the organization will need to change to manage data
from a company-wide perspective
The Database Approach

Conceptual/Logical Design: an abstract model of the database from a business perspective; designs how the data elements in the database are to be grouped

Physical Design: shows how the database is actually arranged on direct-access storage devices
The Database Approach
To use a relational database model effectively, complex groupings of data must be streamlined to minimize redundant data elements and awkward many-to-many relationships.

The process of creating small, stable, yet flexible and adaptive data structures from complex groups of data is called normalization.
The Database Approach
Figure: ORDER table before normalization, and the resulting normalized tables
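A minimal sketch of the idea with Python's `sqlite3` module, assuming a flattened ORDER table in which part and supplier details repeat on every order line (columns and values are hypothetical, not copied from the slide's figure). Splitting it into SUPPLIER, PART, ORDERS, and LINE_ITEM tables stores each fact once, and a join reconstructs the original rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Before normalization: supplier and part data repeat on every order line.
CREATE TABLE order_flat (
    order_number INTEGER, order_date TEXT, part_number INTEGER, part_name TEXT,
    unit_price REAL, part_quantity INTEGER, supplier_number INTEGER, supplier_name TEXT
);
INSERT INTO order_flat VALUES
    (3502, '2024-01-15', 137, 'Door latch',   22.00, 10, 8259, 'Supplier A'),
    (3502, '2024-01-15', 152, 'Door lock',    31.00, 20, 8259, 'Supplier A'),
    (3503, '2024-01-16', 137, 'Door latch',   22.00,  5, 8259, 'Supplier A'),
    (3503, '2024-01-16', 150, 'Door molding',  6.00, 40, 8261, 'Supplier B');

-- After normalization: each fact lives once, in the table for what it describes.
CREATE TABLE supplier  AS SELECT DISTINCT supplier_number, supplier_name FROM order_flat;
CREATE TABLE part      AS SELECT DISTINCT part_number, part_name, unit_price, supplier_number FROM order_flat;
CREATE TABLE orders    AS SELECT DISTINCT order_number, order_date FROM order_flat;
CREATE TABLE line_item AS SELECT order_number, part_number, part_quantity FROM order_flat;
""")

def count(table):
    return conn.execute(f"SELECT COUNT(*) FROM {table}").fetchone()[0]

counts = {t: count(t) for t in ("order_flat", "supplier", "part", "orders", "line_item")}
print(counts)

# Joining the normalized tables reproduces the original rows.
rebuilt = conn.execute("""
SELECT o.order_number, o.order_date, p.part_number, p.part_name, p.unit_price,
       li.part_quantity, s.supplier_number, s.supplier_name
FROM line_item li
JOIN orders o   ON li.order_number = o.order_number
JOIN part p     ON li.part_number = p.part_number
JOIN supplier s ON p.supplier_number = s.supplier_number
ORDER BY 1, 3
""").fetchall()
original = conn.execute("SELECT * FROM order_flat ORDER BY 1, 3").fetchall()
print(rebuilt == original)  # True
```

For brevity, `CREATE TABLE ... AS SELECT` does not declare primary or foreign keys; a real design would add them.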
The Database Approach
Referential Integrity:
Rules to ensure that relationships between coupled tables remain consistent
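In SQLite (via Python's `sqlite3` module, assumed here for illustration), referential integrity rules are declared as foreign keys, but enforcement must be switched on per connection with `PRAGMA foreign_keys = ON`. The sketch shows two rules at work: a part cannot reference a supplier that does not exist, and a supplier that is still referenced cannot be deleted. Table names and values are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")   # SQLite enforces foreign keys only when enabled
conn.executescript("""
CREATE TABLE supplier (supplier_number INTEGER PRIMARY KEY, supplier_name TEXT);
CREATE TABLE part (
    part_number INTEGER PRIMARY KEY,
    part_name TEXT,
    supplier_number INTEGER NOT NULL REFERENCES supplier(supplier_number)
);
INSERT INTO supplier VALUES (8259, 'Supplier A');
INSERT INTO part VALUES (137, 'Door latch', 8259);
""")

def try_sql(sql):
    """Run a statement; return None on success or the integrity error message."""
    try:
        conn.execute(sql)
        return None
    except sqlite3.IntegrityError as err:
        return str(err)

# Rule 1: a part cannot point to a supplier that does not exist.
orphan_error = try_sql("INSERT INTO part VALUES (150, 'Door molding', 9999)")
# Rule 2: a supplier still referenced by a part cannot be deleted.
delete_error = try_sql("DELETE FROM supplier WHERE supplier_number = 8259")
print(orphan_error)
print(delete_error)
```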
The Database Approach
Entity-Relationship Diagram (ERD):
A methodology for documenting databases that illustrates the relationships between the various entities in the database

In an ERD:

Boxes represent entities

Lines connecting the boxes represent relationships
The Database Approach

ERD examples: a one-to-one relationship and a one-to-many relationship
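In a relational design, the cardinality drawn on an ERD is typically implemented through foreign keys. A minimal `sqlite3` sketch, assuming hypothetical DEPARTMENT, EMPLOYEE, and PARKING_PERMIT entities: a plain foreign key allows one-to-many, while adding `UNIQUE` to the foreign key column restricts the relationship to one-to-one.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE department (dept_id INTEGER PRIMARY KEY, dept_name TEXT);
CREATE TABLE employee (
    employee_id INTEGER PRIMARY KEY,
    name TEXT,
    dept_id INTEGER REFERENCES department(dept_id)               -- one department, many employees
);
CREATE TABLE parking_permit (
    permit_id INTEGER PRIMARY KEY,
    employee_id INTEGER UNIQUE REFERENCES employee(employee_id)  -- at most one permit per employee
);
INSERT INTO department VALUES (10, 'Sales');
INSERT INTO employee VALUES (1, 'Ana', 10), (2, 'Omar', 10);     -- one-to-many is allowed
INSERT INTO parking_permit VALUES (500, 1);
""")

try:
    conn.execute("INSERT INTO parking_permit VALUES (501, 1)")  # a second permit for Ana
    second_permit_allowed = True
except sqlite3.IntegrityError:
    second_permit_allowed = False

employees_in_sales = conn.execute(
    "SELECT COUNT(*) FROM employee WHERE dept_id = 10").fetchone()[0]
print(employees_in_sales, second_permit_allowed)  # 2 False
```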
Using DB to Improve Business Performance and Decision Making
Using databases to keep track of basic transactions, such as:
Paying suppliers
Processing orders
Keeping track of customers
Paying employees

Companies also need databases to

provide information that will help
the company run more efficiently
and make better decisions
Using DB to Improve Business
Before 2010, most data were transactional and stored in relational databases

Then Big Data emerged

Big data describes datasets with volumes so huge that they are beyond the ability of a typical DBMS to capture, store, and analyze
Usually refers to data in the petabyte and exabyte range

Businesses are interested because big data can reveal more patterns and interesting anomalies than smaller data sets

Requires new technologies and tools capable of managing and analyzing non-traditional data
Using DB to Improve Business

Figure: Business Intelligence infrastructure and Hadoop cluster
Using DB to Improve Business

Business Intelligence Infrastructure

Data warehouses
Store current and historical data
Provide reporting and query tools

Data marts
A subset of a data warehouse containing only a portion of the organization's data for a specific function or population of users
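One way to picture a data mart as a subset of a data warehouse: the sketch below (Python's `sqlite3`, with a hypothetical SALES_FACT table standing in for the warehouse) copies only the rows and columns one team needs into a separate table. Real data marts are usually loaded with dedicated extraction tooling; this only shows the subset relationship.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Warehouse: current and historical data from across the company (hypothetical).
CREATE TABLE sales_fact (
    sale_date TEXT, region TEXT, product TEXT, channel TEXT,
    units INTEGER, revenue REAL, cost REAL
);
INSERT INTO sales_fact VALUES
    ('2023-11-02', 'North', 'Laptop', 'Online', 3, 3000.0, 2100.0),
    ('2023-12-14', 'South', 'Phone',  'Store',  5, 2500.0, 1600.0),
    ('2024-01-09', 'North', 'Phone',  'Online', 2, 1000.0,  640.0),
    ('2024-02-20', 'East',  'Laptop', 'Store',  1, 1000.0,  700.0);

-- Data mart for an online-marketing team: only online sales, only the columns it uses.
CREATE TABLE online_marketing_mart AS
SELECT sale_date, region, product, units, revenue
FROM sales_fact
WHERE channel = 'Online';
""")

mart_rows = conn.execute(
    "SELECT region, product, revenue FROM online_marketing_mart ORDER BY sale_date").fetchall()
print(mart_rows)
```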
Using DB to Improve Business
Hadoop
Designed to handle big data
An open-source software framework managed by the Apache Software Foundation that enables distributed parallel processing of huge amounts of data across inexpensive computers
Breaks a big data problem into sub-problems, distributes them among up to thousands of inexpensive computer processing nodes, then combines the results into a smaller data set that is easier to analyze
Can process large amounts of any kind of data
Processors can be added or removed as needed
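Hadoop's core processing model, MapReduce, follows the "break into sub-problems, distribute, combine" pattern described above. The sketch below imitates that pattern on one machine, with a thread pool standing in for processing nodes; it is not Hadoop's API, and the input documents are hypothetical. Real Hadoop also shuffles intermediate results by key between the map and reduce steps, which this sketch skips by merging partial counts directly.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor

def map_chunk(lines):
    """Map step: count the words in one chunk of the input."""
    counts = Counter()
    for line in lines:
        counts.update(line.lower().split())
    return counts

def reduce_counts(partials):
    """Reduce step: merge partial counts into one smaller result."""
    total = Counter()
    for partial in partials:
        total.update(partial)
    return total

def word_count(lines, workers=4):
    # Break the problem into sub-problems: one chunk of lines per worker.
    chunks = [lines[i::workers] for i in range(workers)]
    # Distribute the chunks; threads stand in for processing nodes.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = list(pool.map(map_chunk, chunks))
    return reduce_counts(partials)

documents = [
    "big data needs new tools",
    "hadoop splits big data",
    "data data data",
]
result = word_count(documents)
print(result.most_common(2))  # [('data', 5), ('big', 2)]
```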
Using DB to Improve Business
In-Memory Computing
Relies primarily on a computer's main memory (RAM) for data storage
Eliminates the bottleneck of retrieving and reading data in a traditional, disk-based database, dramatically shortening query response times
Very large data sets, the size of a data mart or small data warehouse, can reside entirely in the RAM of a computer
Examples: SAP HANA, Oracle Exalytics
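The same idea at toy scale: SQLite keeps an entire database in RAM when opened with the special name `":memory:"`. The sketch below (Python's `sqlite3`; the table and row count are hypothetical) builds an in-memory table and times an aggregate query against it. It only illustrates "data lives in main memory" and says nothing about the performance of products such as SAP HANA.

```python
import sqlite3
import time

conn = sqlite3.connect(":memory:")   # the whole database lives in RAM
conn.execute("CREATE TABLE reading (sensor_id INTEGER, value REAL)")
conn.executemany(
    "INSERT INTO reading VALUES (?, ?)",
    ((i % 100, float(i)) for i in range(200_000)),
)

start = time.perf_counter()
avg_by_sensor = conn.execute(
    "SELECT sensor_id, AVG(value) FROM reading GROUP BY sensor_id ORDER BY sensor_id"
).fetchall()
elapsed = time.perf_counter() - start

print(len(avg_by_sensor), "groups in", round(elapsed * 1000, 1), "ms")
```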

Analytic Platforms
Use both relational and non-relational technology optimized for analyzing large data sets

Specifically designed for query processing and analytics

Examples: IBM Netezza, Oracle Exadata

Using DB to Improve Business
BI Analytical Tools
Online Analytical Processing (OLAP)
Supports multidimensional data analysis
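Multidimensional analysis means viewing the same facts along different dimensions (e.g., product, region, time) and at different levels of detail. A minimal plain-Python sketch with hypothetical sales data: the same records are rolled up by region, by product and quarter, and sliced to a single region. Real OLAP servers precompute and index such "cubes"; this only shows the idea.

```python
from collections import defaultdict

# Each fact: (product, region, quarter, units); hypothetical data.
sales = [
    ("Nuts", "East", "Q1", 50), ("Nuts", "West", "Q1", 60),
    ("Bolts", "East", "Q1", 20), ("Bolts", "West", "Q2", 40),
    ("Nuts", "East", "Q2", 30), ("Washers", "Central", "Q2", 70),
]
DIMENSIONS = {"product": 0, "region": 1, "quarter": 2}

def roll_up(facts, *dims):
    """Total units grouped by the chosen dimensions."""
    totals = defaultdict(int)
    for fact in facts:
        key = tuple(fact[DIMENSIONS[d]] for d in dims)
        totals[key] += fact[3]
    return dict(totals)

def slice_cube(facts, dim, value):
    """Keep only the facts where one dimension has a fixed value."""
    return [f for f in facts if f[DIMENSIONS[dim]] == value]

print(roll_up(sales, "region"))
print(roll_up(sales, "product", "quarter"))
print(roll_up(slice_cube(sales, "region", "East"), "product"))
```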
Using DB to Improve Business
Business Intelligence Analytical Tools

Data Mining
Provides insights into corporate data by finding hidden patterns and relationships in large databases and inferring rules from them to predict future behavior
Associations: occurrences linked to a single event
Sequences: events linked over time
Classification: recognizes patterns that describe the group to which an item belongs by examining existing items that have been classified and by inferring a set of rules
Forecasting/predictive analysis
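As one concrete case, association analysis looks for items that occur together in the same event (e.g., the same purchase). The sketch below computes support and confidence for single-item rules in a handful of hypothetical shopping baskets; production tools use more scalable algorithms such as Apriori or FP-growth, but the two measures are the same.

```python
from collections import Counter
from itertools import combinations

baskets = [
    {"chips", "salsa", "soda"},
    {"chips", "salsa"},
    {"chips", "soda"},
    {"bread", "milk"},
    {"chips", "salsa", "beer"},
]

def association_rules(transactions, min_support=0.4, min_confidence=0.7):
    """Return sorted (antecedent, consequent, support, confidence) rules."""
    n = len(transactions)
    item_counts = Counter(item for t in transactions for item in t)
    pair_counts = Counter(pair for t in transactions
                          for pair in combinations(sorted(t), 2))
    rules = []
    for (a, b), together in pair_counts.items():
        support = together / n                 # share of baskets containing both items
        if support < min_support:
            continue
        for x, y in ((a, b), (b, a)):
            confidence = together / item_counts[x]   # P(y in basket | x in basket)
            if confidence >= min_confidence:
                rules.append((x, y, support, confidence))
    return sorted(rules)

for x, y, sup, conf in association_rules(baskets):
    print(f"{x} -> {y}: support={sup:.2f}, confidence={conf:.2f}")
```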

Using DB to Improve Business
Business Intelligence Analytical Tools
Text mining
Extracts key elements from large unstructured data sets (e.g., stored e-mails), discovers patterns and relationships, and summarizes the information
Used, for example, to analyze the transcripts of calls to customer service centers
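A very small text-mining step: pulling the most frequent meaningful terms out of unstructured text such as call-center transcripts. The transcripts and stop-word list are hypothetical; real text-mining tools add stemming (note that "bill" and "billing" are counted separately here), entity extraction, and far larger vocabularies.

```python
import re
from collections import Counter

STOP_WORDS = {"the", "a", "an", "i", "my", "to", "is", "was", "and", "it", "for", "on", "of"}

def key_terms(documents, top_n=3):
    """Count non-stop-word terms across all documents and return the most common."""
    counts = Counter()
    for doc in documents:
        words = re.findall(r"[a-z']+", doc.lower())
        counts.update(w for w in words if w not in STOP_WORDS)
    return counts.most_common(top_n)

transcripts = [
    "My bill was wrong and I waited on hold for an hour.",
    "The bill is too high. Billing charged me twice.",
    "I was on hold forever, then the line dropped.",
]
print(key_terms(transcripts))
```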

Sentiment Analysis
Mines text comments in an e-mail message, blog, social media conversation, or survey to detect
favorable and unfavorable opinions about specific subjects
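A minimal lexicon-based sketch of sentiment analysis that also computes positive/total and negative/total ratios, the same kind of ratios the slides chart for auto insurance. The word lists and comments are hypothetical; commercial tools use much larger lexicons or trained models and handle negation, sarcasm, and context.

```python
import re

POSITIVE = {"great", "helpful", "fast", "fair", "love"}
NEGATIVE = {"slow", "rude", "denied", "expensive", "hate"}

def classify(comment):
    """Label a comment by comparing counts of positive and negative words."""
    words = re.findall(r"[a-z]+", comment.lower())
    score = sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"

def sentiment_ratios(comments):
    """Return (positive/total, negative/total) for a list of comments."""
    labels = [classify(c) for c in comments]
    if not labels:
        return 0.0, 0.0
    total = len(labels)
    return labels.count("positive") / total, labels.count("negative") / total

comments = [
    "Claims agent was helpful and fast",
    "My claim was denied and support was rude",
    "Premium is expensive but the app is great",
    "Switched insurers last month",
]
print([classify(c) for c in comments])
print(sentiment_ratios(comments))
```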

Web mining

Helps firms understand customer behaviour, evaluate the effectiveness of a particular web site, or quantify the success of a marketing campaign
Content mining: the process of extracting knowledge from the content of web pages
Structure mining: examines data related to the structure of a particular web site
Usage mining: examines user interaction data recorded by a web server whenever requests for a web site's resources are received
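Usage mining starts from the request records a web server writes to its log. A minimal sketch, assuming log lines in the widely used Common Log Format (the IP addresses, paths, and timestamps below are hypothetical): it counts successful requests per page and the number of distinct visitors per page.

```python
import re
from collections import Counter

# Matches the request and status part of a Common Log Format line.
REQUEST = re.compile(r'"(?P<method>[A-Z]+) (?P<path>\S+) HTTP/[\d.]+" (?P<status>\d{3})')

def page_hits(log_lines):
    """Count successful (2xx) requests per path and distinct client IPs per path."""
    hits = Counter()
    visitors = {}
    for line in log_lines:
        match = REQUEST.search(line)
        if not match or not match["status"].startswith("2"):
            continue
        path = match["path"]
        hits[path] += 1
        visitors.setdefault(path, set()).add(line.split()[0])  # first field is the client IP
    return hits, {p: len(ips) for p, ips in visitors.items()}

log = [
    '10.0.0.1 - - [10/Oct/2024:13:55:36 +0000] "GET /quote HTTP/1.1" 200 512',
    '10.0.0.2 - - [10/Oct/2024:13:56:01 +0000] "GET /quote HTTP/1.1" 200 512',
    '10.0.0.1 - - [10/Oct/2024:13:57:12 +0000] "GET /quote HTTP/1.1" 200 512',
    '10.0.0.3 - - [10/Oct/2024:13:58:40 +0000] "GET /claims HTTP/1.1" 404 128',
    '10.0.0.2 - - [10/Oct/2024:13:59:05 +0000] "POST /buy HTTP/1.1" 200 64',
]
hits, unique_visitors = page_hits(log)
print(hits.most_common())
print(unique_visitors)
```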
Using DB to Improve Business
Figure: Auto Insurance Sentiment Ratios, 2012 to 2015 (ratio of positive comments to total and ratio of negative comments to total)

Using DB to Improve Business

Figure: Word cloud
Using DB to Improve Business

Databases and the web

Managing Data Resources
Establishing an information policy
Data administration
Responsible for the specific policies and procedures through which
data can be managed as an organizational resource

Data governance
Deals with the policies and processes for managing the availability,
usability, integrity, and security of data
Emphasizes promoting privacy, security, data quality, and compliance

Database administration
Responsible for defining and organizing the structure and content of the
database, and maintaining the DB
Managing Data Resources
Ensuring Data Quality

Data quality audit
A structured survey of the accuracy and level of completeness of the data in an information system
Can be performed by surveying entire data files, surveying samples from data files, or surveying end users for their perceptions of data quality
Data quality problems are not just business problems; they also pose serious problems for individuals, affecting their financial condition and their jobs

Data cleansing
Also known as data scrubbing
Consists of activities for detecting and correcting data in a database that are incorrect, incomplete, improperly formatted, or redundant
Enforces consistency among different sets of data that originated in separate information systems
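A minimal sketch of both activities on hypothetical customer records merged from two source systems: an audit that measures completeness per field, followed by scrubbing that normalizes formatting, flags incomplete rows, and drops duplicates. The rules used (lowercase e-mails, digits-only phone numbers, matching duplicates on e-mail) are illustrative assumptions; real cleansing rules come from the organization's information policy.

```python
import re

records = [  # hypothetical rows merged from two separate systems
    {"name": " Ana Lima ", "email": "ANA@EXAMPLE.COM", "phone": "(555) 010-2000"},
    {"name": "Ana Lima", "email": "ana@example.com", "phone": "555.010.2000"},
    {"name": "Omar Haddad", "email": "", "phone": "555 010 3000"},
    {"name": "Li Wei", "email": "li.wei@example.com", "phone": None},
]

def audit_completeness(rows):
    """Data quality audit: share of rows with a non-empty value, per field."""
    fields = rows[0].keys()
    return {f: sum(1 for r in rows if r.get(f)) / len(rows) for f in fields}

def scrub(rows):
    """Data cleansing: fix formatting, flag incomplete rows, drop duplicates."""
    cleaned, seen = [], set()
    for r in rows:
        row = {
            "name": " ".join((r.get("name") or "").split()),
            "email": (r.get("email") or "").strip().lower(),
            "phone": re.sub(r"\D", "", r.get("phone") or ""),
        }
        row["incomplete"] = not all(row[f] for f in ("name", "email", "phone"))
        key = row["email"] or row["name"]   # assumed matching rule for duplicates
        if key in seen:
            continue                         # redundant copy of the same customer
        seen.add(key)
        cleaned.append(row)
    return cleaned

print(audit_completeness(records))
for row in scrub(records):
    print(row)
```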
Chapter 6 review

Which of the following is not one of the main problems with a traditional file environment?
Answer: program-data independence

A field identified in a table as
holding the unique identifier of the
table's records is called the
Answer: primary key
The select operation
Answer: creates a subset consisting

of all records in the file that meet

stated criteria
The specialized language
programmers use to add and change
data in the database is called
Answer: a data manipulation language

You work for a national car rental agency and
want to determine what characteristics are
shared among your most loyal customers. To do
this, you will want to use data mining software
that is capable of
Answer: classification. We want data mining software to find what loyal customers have in common, that is, to infer rules describing the group (loyal customers) to which a customer belongs, rather than to look up a specific customer.
If the question were about events linked over time, the answer would be sequences.
Detecting and correcting data in a
database or file that are incorrect,
incomplete, improperly formatted, or
redundant is called:
Answer: data scrubbing
A ________ is a database that stores
current and historical data of
potential interest to decision makers
throughout the company
Answer: data warehouse
Each characteristic or quality
describing a particular entity is
called an attribute
Answer: True
Data governance deals with the
policies and processes for managing
the integrity and security of data in a
Answer: True
Data administration is a special
organizational function that manages
the policies and procedures through
which data can be managed as an
organizational resource
Answer: True
A way of facilitating big data
analysis is to use _________, which
relies primarily on a computer's main
memory (RAM) for data storage
Answer: in-memory computing