
Information Systems

Databases and Information Management


Topics
• File approach to data management
• Database approach to data management
• Business Intelligence
• Data Warehouse
• Data Marts
• Online analytical processing (OLAP)
• Data Mining
• Databases and the Web
• Managing data sources
• An effective information system provides
users with accurate, timely, and relevant
information.
– Accurate information is free of errors.
– Information is timely when it is available to
decision makers when it is needed.
– Information is relevant when it is useful and
appropriate for the types of work and decisions
that require it.
To deliver such information, data must be organized and
maintained properly.
File-based approach
• In the file-based approach, files are maintained
separately by different departments.
• This approach encourages each functional area in
a corporation to develop specialized applications.
• Each application requires its own data file, which
is typically a subset of the master file.
• These subsets lead to data redundancy and
inconsistency, processing inflexibility, and
wasted storage resources.

Problems of the file-based
approach
• The problems of the file-based approach
are:
– data redundancy and inconsistency,
– program-data dependence,
– inflexibility,
– poor data security, and
– an inability to share data among applications.

Problems of the file-based
approach …
• Data redundancy is where the same data are
stored in more than one place or location.
• Data redundancy occurs when different
groups in an organization independently
collect the same piece of data and store it
independently of each other.
• Data inconsistency occurs when the same
attribute may have different values, which is
caused by data redundancy.
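A minimal Python sketch of how redundancy turns into inconsistency. The two department files, the customer ID, and the field names are hypothetical and serve only as an illustration.

```python
# Hypothetical records kept separately by two departments (file-based approach).
sales_file = {"C001": {"name": "Mary Jones", "phone": "555-0101"}}
billing_file = {"C001": {"name": "Mary Jones", "phone": "555-0199"}}

def find_inconsistencies(file_a, file_b):
    """Return (record_id, attribute, value_a, value_b) for every mismatch."""
    mismatches = []
    # Records present in both files are stored redundantly.
    for record_id in file_a.keys() & file_b.keys():
        shared_attributes = file_a[record_id].keys() & file_b[record_id].keys()
        for attribute in shared_attributes:
            value_a = file_a[record_id][attribute]
            value_b = file_b[record_id][attribute]
            if value_a != value_b:
                # Same attribute, different values: an inconsistency.
                mismatches.append((record_id, attribute, value_a, value_b))
    return sorted(mismatches)

print(find_inconsistencies(sales_file, billing_file))
```

The name is stored twice but still agrees; the phone number was updated in only one file, so the two copies now disagree.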
Problems of the file-based
approach
• Program-data dependence: refers to the
coupling of data stored in files and the specific
programs required to update and maintain
those files, such that changes in programs
require changes to the data.
• Lack of flexibility: a traditional file system
can deliver routine scheduled reports after
extensive programming efforts, but it cannot
deliver ad hoc reports or respond to
unanticipated information requirements in a
timely fashion.
Problems of the file-based
approach
• Poor security: because there is little control or
management of data, access to and
dissemination of information may be out of
control.
• Lack of data sharing and availability:
because pieces of information in different files
and different parts of the organization cannot
be related to one another, it is virtually
impossible for information to be shared or
accessed in a timely manner.
The Database Approach
• Database
– Serves many applications by centralizing data and
controlling redundant data
• Database management system (DBMS)
– Interfaces between applications and physical data files
– Separates logical and physical views of data
– Solves problems of traditional file environment
• Controls redundancy
• Eliminates inconsistency
• Uncouples programs and data
• Enables organization to centrally manage data and data security
The Database Approach …
• Relational DBMS
– Relational DBMSs are the most widely used type of DBMS for PCs
as well as servers
– Represent data as two-dimensional tables called relations or
files
– Each table contains data on an entity and its attributes
– E.g., MS Access, MS SQL Server, Oracle, MySQL
• Table: grid of columns and rows
– Rows (tuples): records for different entities
– Fields (columns): each represents an attribute of the entity
– Key field: field used to uniquely identify each record
– Primary key: the key field that serves as the unique identifier for
every record in the table
– Foreign key: primary key of one table used in a second table as a
look-up field to identify records from the original table
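As a rough illustration of primary and foreign keys, the sketch below uses Python's built-in sqlite3 module. The table and column names are illustrative assumptions, and SQLite stands in for whichever relational DBMS is actually used.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite checks foreign keys only when enabled

conn.execute("""
    CREATE TABLE department (
        dept_id   INTEGER PRIMARY KEY,   -- primary key: unique per department
        dept_name TEXT NOT NULL
    )""")
conn.execute("""
    CREATE TABLE employee (
        emp_id  INTEGER PRIMARY KEY,     -- primary key: unique per employee
        f_name  TEXT,
        l_name  TEXT,
        dept_id INTEGER REFERENCES department(dept_id)  -- foreign key
    )""")

conn.execute("INSERT INTO department VALUES (1, 'Acct')")
conn.execute("INSERT INTO employee VALUES (111, 'Mary', 'Jones', 1)")

# A foreign key value that matches no department row is rejected.
try:
    conn.execute("INSERT INTO employee VALUES (122, 'Sarah', 'Smith', 99)")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True
print(rejected)
```

The second insert fails because department 99 does not exist, which is how the foreign key keeps the two tables consistent.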
Tables in RDBMS

Multi-table data extraction

• The select, join, and project operations enable
data from two different tables to be combined and
only selected attributes to be displayed.
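The three operations map onto a single SQL query. The sketch below is one possible illustration using Python's sqlite3 module; the tables and sample rows are assumptions made for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE department (dept_id INTEGER PRIMARY KEY, dept_name TEXT);
    CREATE TABLE employee (emp_id INTEGER PRIMARY KEY, f_name TEXT,
                           l_name TEXT, dept_id INTEGER);
    INSERT INTO department VALUES (1, 'Acct'), (2, 'Mktg');
    INSERT INTO employee VALUES (111, 'Mary', 'Jones', 1),
                                (122, 'Sarah', 'Smith', 2);
""")

rows = conn.execute("""
    SELECT e.l_name, d.dept_name      -- project: keep only two attributes
    FROM employee AS e
    JOIN department AS d              -- join: combine the two tables
      ON e.dept_id = d.dept_id
    WHERE d.dept_name = 'Mktg'        -- select: keep only matching rows
""").fetchall()
print(rows)
```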
Functions of a DBMS
• Data Storage, Retrieval, and Update.
• Transaction Support.
– A transaction is a series of actions, carried out
by a single user or application program, which
accesses or changes the contents of the
database
– A DBMS must ensure either that all the
updates corresponding to a given transaction are
made or that none of them is made.
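The all-or-nothing behaviour can be sketched with Python's sqlite3 module. The account table, the balances, and the `transfer` helper are hypothetical; a production DBMS would add concurrency and recovery concerns not shown here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE account (id INTEGER PRIMARY KEY, "
             "balance INTEGER CHECK (balance >= 0))")
conn.executemany("INSERT INTO account VALUES (?, ?)", [(1, 100), (2, 50)])
conn.commit()

def transfer(conn, source, target, amount):
    """Move amount between accounts; both updates happen or neither does."""
    try:
        with conn:  # commits on success, rolls back if an exception escapes
            conn.execute("UPDATE account SET balance = balance + ? WHERE id = ?",
                         (amount, target))
            conn.execute("UPDATE account SET balance = balance - ? WHERE id = ?",
                         (amount, source))
        return True
    except sqlite3.IntegrityError:
        return False

ok = transfer(conn, 1, 2, 500)  # would overdraw account 1, so it is rolled back
balances = conn.execute("SELECT balance FROM account ORDER BY id").fetchall()
print(ok, balances)
```

The credit to account 2 is executed first, yet it is undone when the debit violates the CHECK constraint, so the database never shows half a transfer.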



Functions of a DBMS
• Concurrency Control Services.
– A DBMS must ensure that the database is
updated correctly when multiple users are
updating the database concurrently.



Functions of a DBMS
• Recovery Services.
• Authorization Services.
• Support for Data Communication.
• Utility Services (Import, Export,
Statistical Analysis, etc.)



Database development
• Designing Databases
– Conceptual (logical) design: abstract model from business
perspective
– Physical design: How database is arranged on direct-access
storage devices

• Normalization
– Streamlining complex groupings of data to minimize redundant
data elements and awkward many-to-many relationships

An unnormalized relation

EMPLOYEE

Emp_ID  F_Name  L_Name  Dept_ID  Dept_Name
111     Mary    Jones   1        Acct
122     Sarah   Smith   2        Mktg

Normalized set of relations
EMPLOYEE

Emp_ID  F_Name  L_Name  Dept_ID
111     Mary    Jones   1
122     Sarah   Smith   2

DEPARTMENT

Dept_ID  Dept_Name
1        Acct
2        Mktg
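The split from the single EMPLOYEE relation into EMPLOYEE and DEPARTMENT can be sketched in Python. The rows come from the slide; the `normalize` function is a hypothetical helper, not a general normalization algorithm.

```python
# The unnormalized EMPLOYEE rows from the slide.
unnormalized = [
    {"Emp_ID": 111, "F_Name": "Mary", "L_Name": "Jones",
     "Dept_ID": 1, "Dept_Name": "Acct"},
    {"Emp_ID": 122, "F_Name": "Sarah", "L_Name": "Smith",
     "Dept_ID": 2, "Dept_Name": "Mktg"},
]

def normalize(rows):
    """Split rows into EMPLOYEE and DEPARTMENT so each Dept_Name is stored once."""
    employee, department = [], {}
    for row in rows:
        # Dept_Name depends only on Dept_ID, so it moves to its own relation.
        department[row["Dept_ID"]] = row["Dept_Name"]
        employee.append({k: v for k, v in row.items() if k != "Dept_Name"})
    return employee, department

employee, department = normalize(unnormalized)
print(employee)
print(department)
```

Dept_ID stays in EMPLOYEE as a foreign key, so the department name can still be looked up without being repeated for every employee.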
Business Intelligence and Analytics
• Business intelligence
– Infrastructure for collecting, storing, analyzing
data produced by a business
– Databases, data warehouses, data marts
• Business analytics
– Tools and techniques for analyzing data
– OLAP, statistics, models, data mining
• Business intelligence and analytics require a
strong database foundation, a set of analytic
tools, and an involved management team that
can ask intelligent questions and analyze data.
Business Intelligence …
• Very large databases and systems require
special capabilities and tools
– To analyze large quantities of data
– To access data from multiple systems
• Principal tools include:
• Software for database query and reporting
• Online analytical processing (OLAP)
• Data mining

Data Warehouses
• A data warehouse is a database that stores
current and historical data of potential
interest to decision makers throughout the
company.
• It consolidates and standardizes information for
use across the enterprise, but the data cannot be
altered.
• A data warehouse system provides query,
analysis, and reporting tools.
Data Warehouses …
• Extraction, Transformation, Loading (ETL)
– ETL gets data out of the source and loads it into the data
warehouse; at its simplest, it is a process of copying data
from one database to another
– Data is extracted from an OLTP database, transformed to match the
data warehouse schema, and loaded into the data warehouse database
– Many data warehouses also incorporate data from non-OLTP
systems such as text files, legacy systems, and spreadsheets; such
data also requires extraction, transformation, and loading
[Figure: ETL pipelines from sources (RDBMS 1, RDBMS 2, HTML 1, XML 1) to the data warehouse]
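A minimal ETL sketch in Python, assuming a CSV export as the non-OLTP source and an in-memory SQLite database as the warehouse. The column names and the `fact_sales` table are hypothetical.

```python
import csv
import io
import sqlite3

# Extract: a hypothetical non-OLTP source, here a CSV export held in memory.
source_csv = io.StringIO("order_id,amount,region\n1, 19.90 ,east\n2,5.00,West\n")
extracted = list(csv.DictReader(source_csv))

# Transform: coerce types and standardize values to match the warehouse schema.
transformed = [
    (int(r["order_id"]),
     round(float(r["amount"]) * 100),   # store money as integer cents
     r["region"].strip().upper())       # one spelling per region
    for r in extracted
]

# Load: insert the transformed rows into the (hypothetical) warehouse table.
warehouse = sqlite3.connect(":memory:")
warehouse.execute(
    "CREATE TABLE fact_sales (order_id INTEGER, amount_cents INTEGER, region TEXT)")
warehouse.executemany("INSERT INTO fact_sales VALUES (?, ?, ?)", transformed)
warehouse.commit()

loaded = warehouse.execute("SELECT * FROM fact_sales ORDER BY order_id").fetchall()
print(loaded)
```

The transform step is where source quirks such as stray spaces and mixed capitalization are resolved, so the warehouse holds one consistent representation.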
OLTP vs. OLAP
IT systems can be divided into transactional (OLTP, online transaction processing) and analytical (OLAP, online analytical processing) systems.
In general, OLTP systems provide source data to data warehouses, whereas
OLAP systems help to analyze it.

Components of a Data Warehouse

Data Marts
• Data marts:
– Subset of data warehouse
– Summarized or highly focused portion of firm’s data for
use by specific population of users
– Typically focuses on single subject or line of business

Multidimensional data analysis
and data mining
• Once data have been captured and organized
in data warehouses and data marts, they are
available for further analysis using tools for
business intelligence such as:
– Online Analytical Processing (OLAP)
– Data Mining
– Web Mining

Online Analytical Processing (OLAP)

• Online analytical processing (OLAP)
– Supports multidimensional data analysis
• Viewing data using multiple dimensions
• Each aspect of information (product, pricing, cost,
region, time period) is a different dimension
• E.g., how many washers were sold in the East in June
compared with other regions?
– OLAP enables rapid, online answers to ad hoc
queries
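The washer question can be sketched as a filter on two dimensions followed by a roll-up along a third. The sales figures and the `units_by` helper are hypothetical; real OLAP tools precompute and index such aggregates.

```python
from collections import defaultdict

# Hypothetical sales facts: (product, region, month, units_sold).
sales = [
    ("washer", "East", "June", 120),
    ("washer", "West", "June", 95),
    ("washer", "East", "May", 80),
    ("dryer", "East", "June", 60),
    ("washer", "South", "June", 70),
]

def units_by(facts, dimension, **filters):
    """Total units grouped by one dimension, after filtering on the others."""
    columns = {"product": 0, "region": 1, "month": 2}
    totals = defaultdict(int)
    for fact in facts:
        if all(fact[columns[name]] == value for name, value in filters.items()):
            totals[fact[columns[dimension]]] += fact[3]
    return dict(totals)

# How many washers were sold in June, region by region?
june_washers = units_by(sales, "region", product="washer", month="June")
print(june_washers)
```

Changing the grouping dimension, for example `units_by(sales, "month", product="washer", region="East")`, answers a different ad hoc question from the same facts.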

Data Mining
• Traditional database queries answer such questions
as, “How many units of product number 403 were
shipped in February 2010?” OLAP supports much
more complex requests for information, such as
“Compare sales of product 403 relative to plan by
quarter and sales region for the past two years.”
• Data mining is more discovery-driven and provides
insights into corporate data that cannot be obtained
with OLAP by finding hidden patterns and
relationships in large databases and inferring rules
from them to predict future behaviour.
Data Mining …
• The patterns and rules obtained from data
mining are used to guide decision making and
forecast the effect of those decisions.
• The types of information obtainable from data
mining include:
– Associations
– Sequences
– Classifications
– Clusters
– Forecasts
Data Mining …
• Associations are occurrences linked to a single event. For
instance, when corn chips are purchased, a cola drink is
purchased 65 percent of the time.
• In sequences, events are linked over time. For example, if a
house is purchased, a new refrigerator will be purchased
within two weeks 65 percent of the time.
• Classification recognizes patterns that describe the group to
which an item belongs by examining existing items that have
been classified and by inferring a set of rules.
• Clustering works in a manner similar to classification when no
groups have yet been defined. A data mining tool can discover
different groupings within data.
• Forecasting uses a series of existing values to forecast what
other values will be.
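A figure such as "a cola drink is purchased 65 percent of the time" is the confidence of an association rule. The sketch below computes it for a handful of hypothetical baskets; the data are invented, so the result differs from the 65 percent quoted above.

```python
# Hypothetical market baskets; each set is one purchase.
baskets = [
    {"corn chips", "cola"},
    {"corn chips", "cola", "salsa"},
    {"corn chips", "salsa"},
    {"cola", "bread"},
    {"corn chips", "cola", "bread"},
]

def confidence(transactions, antecedent, consequent):
    """Share of transactions containing antecedent that also contain consequent."""
    with_antecedent = [t for t in transactions if antecedent in t]
    if not with_antecedent:
        return 0.0  # the rule never applies
    both = sum(1 for t in with_antecedent if consequent in t)
    return both / len(with_antecedent)

chips_to_cola = confidence(baskets, "corn chips", "cola")
print(f"{chips_to_cola:.0%}")
```

Four baskets contain corn chips and three of those also contain cola, so the rule "corn chips implies cola" holds in three out of four cases here.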
Text Mining and Web Mining
• Text mining tools extract key elements from
large unstructured data sets (e.g., stored
e-mails), discover patterns and relationships,
and summarize the information.
• Web mining is the discovery and analysis of
useful patterns and information from the
World Wide Web.

Databases and the Web
• Many companies use the Web to make some internal databases
available to customers or partners
• Typical configuration includes:
– Web server
– Application server/middleware
– Database server (hosting the DBMS)
• Advantages of using the Web for database access:
– Ease of use of browser software
– Web interface requires few or no changes to the database
– Adding a Web interface is inexpensive compared with redesigning and
rebuilding the system to improve user access
– Accessing corporate databases through the Web is creating new
efficiencies, opportunities, and business models.

Using Databases …
• Linking internal databases to the Web
• Users access an organization’s internal database through the
Web using their desktop PCs and Web browser software.

Transmission Control Protocol/Internet Protocol (TCP/IP) and the
Hypertext Transfer Protocol (HTTP).
Subtopics
• TCP/IP protocols
TCP/IP is a family of communication protocols used to connect
computer systems in a network. It is named after two of the
protocols in the family: Transmission Control Protocol (TCP)
and Internet Protocol (IP). Hypertext Transfer Protocol (HTTP)
is a member of the TCP/IP family.
• IP addresses
Each server or client on a TCP/IP internet is identified by a
numeric IP (Internet Protocol) address. The two types of IP
address are the IPv4 (IP version 4) address and the IPv6 (IP
version 6) address.
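As a small illustration of the two address types, Python's standard ipaddress module can parse either form. The `describe` helper is hypothetical, and the sample addresses come from the ranges reserved for documentation.

```python
import ipaddress

def describe(address_text):
    """Return the IP version (4 or 6) of a textual address."""
    return ipaddress.ip_address(address_text).version

print(describe("192.0.2.10"))     # an IPv4 address in dotted-decimal form
print(describe("2001:db8::10"))   # an IPv6 address in colon-hexadecimal form
```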

Managing Data Sources
1. Establishing an information policy
– Firm’s rules, procedures, and roles for sharing, managing, and
standardizing data
– Specifies:
• which users and organizational units can share
information,
• where information can be distributed, and
• who is responsible for updating and maintaining the
information.
2. Data administration:
– Is responsible for the specific policies and procedures
through which data can be managed as an organizational
resource
Managing Data Sources …
3. Ensuring data quality
– Most data quality problems stem from faulty
input
– Before a new database is in place, organizations
need to:
• Identify and correct faulty data
• Establish better routines for editing data once the database
is in operation

Managing Data Sources …
4. Data quality audit:
– Structured survey of the accuracy and level of
completeness of the data in an information system
– Can be performed by:
• surveying entire data files,
• surveying samples from data files, or
• surveying end users for their perceptions of data quality
5. Data cleansing
– Software to detect and correct data that are incorrect,
incomplete, improperly formatted, or redundant
– Enforces consistency among different sets of data from
separate information systems
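A minimal data cleansing sketch in Python, assuming customer records merged from two systems. The field names, the sample rows, and the rule that the e-mail address identifies a customer are all assumptions made for illustration.

```python
# Hypothetical customer records merged from two separate systems.
raw_records = [
    {"name": "  mary jones ", "email": "MARY@EXAMPLE.COM"},  # badly formatted
    {"name": "Mary Jones", "email": "mary@example.com"},     # redundant
    {"name": "Sarah Smith", "email": ""},                    # incomplete
]

def cleanse(records):
    """Standardize formatting, drop duplicates, and set aside incomplete rows."""
    clean, rejected, seen = [], [], set()
    for record in records:
        name = " ".join(record["name"].split()).title()
        email = record["email"].strip().lower()
        if not name or not email:
            rejected.append(record)   # incomplete: needs manual correction
        elif email not in seen:       # assumption: e-mail identifies a customer
            seen.add(email)
            clean.append({"name": name, "email": email})
    return clean, rejected

clean, rejected = cleanse(raw_records)
print(clean)
print(rejected)
```

Formatting is standardized before the duplicate check, otherwise the first two rows would look like different customers.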
