
SS ZG515: Data Warehousing

Introduction to Data Warehousing


Lecture Notes #1

Present Business Scenario

Over the last 20 years, $1 trillion has been invested in new computer systems to gain competitive advantage. Most of these systems automate business processes to make them faster, cheaper, and more responsive to the customer. Electronic point of sale (EPOS) at supermarkets, itemized billing at telecommunication companies (telcos), and mass-market mailing at catalog companies are examples of such operational systems. These systems computerized the day-to-day operations of business organizations.

Some characteristics of operational systems are as follows:
- Most organizations have a number of individual operational systems (databases, applications).
- On-Line Transaction Processing (OLTP) systems capture business transactions as they occur.
- An operational system is used daily (perhaps constantly) to perform routine operations that are part of the normal business processes. Examples: order entry, purchasing, stock/bond trading, bank operations.
- Users make short-term, localized business decisions based on operational data, e.g., "Can I fill this order based on the current units in inventory?"

Today almost all businesses have operational systems, so these systems no longer give them a competitive advantage. Over the years, however, these systems have gathered a vast amount of data, and companies now realize the value of this hidden treasure of information. Efforts are under way to tap this information to improve the quality of decision-making.

A data warehouse is a repository of data collected from the various operational systems of an organization. This data is then comprehensively analyzed to gain competitive advantage, mainly to support decision-making at the top levels of management.
Far from being a passing fad, data warehousing technology has grown considerably in scale and reputation in the past few years. This is evidenced by the increasing number of products, vendors, organizations, and even books devoted to the subject. Enterprises that have successfully implemented data warehouses find them strategic and often wonder how they ever managed without them. As early as 1995, a Gartner Group survey of Fortune 500 IT managers found that 90% of all organizations planned to implement data warehouses by 1998.

Dr. Navneet Goyal, BITS, Pilani


Data Warehousing Systems


A data warehousing system can perform advanced analyses of operational data without impacting the operational systems. OLTP is very fast and efficient at recording business transactions, but it is not so good at answering high-level strategic questions.

Component Systems

Legacy Systems
Any information system currently in use that was built using previous technology generations. Most legacy systems are operational in nature, largely because automating transaction-oriented business processes has long been the priority of IT projects.

Source Systems
Any system from which data is taken for a data warehouse. In a mainframe environment, a source system is often called a legacy system.

Operational Data Stores (ODS)
An ODS is a collection of integrated databases designed to support the monitoring of operations. The databases of OLTP applications are function-oriented. The ODS, by contrast, contains subject-oriented, volatile, current, enterprise-wide detailed information. It serves as a system of record that provides comprehensive views of the data in operational sources. Like data warehouses, ODSs are integrated and subject-oriented. However, an ODS is always current and is constantly updated. The ODS is an ideal data source for a data warehouse, since it already contains integrated operational data as of a given point in time. In short, an ODS is an integrated collection of clean data destined for the data warehouse.

Definition

Data warehouses are mostly populated with periodic migrations of data from operational systems. A second source consists of external databases, which are frequently purchased. Examples of this data include lists of income and demographic information. This purchased information is linked with internal data about customers to develop a good customer profile.

A data warehouse is a subject-oriented, integrated, time-variant, and non-volatile collection of data in support of management decisions.
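The linking of purchased external data with internal customer data can be sketched as a simple left join. This is a minimal illustration under assumptions. The record layouts, the sample values, the names `internal_customers`, `purchased_demographics` and `build_customer_profiles`, and the choice of postal code as the link key are all hypothetical. Real profiles may be linked on other keys.

```python
# Hypothetical internal customer records, e.g. extracted from operational systems.
internal_customers = [
    {"customer_id": 1, "name": "Customer A", "postal_code": "11111"},
    {"customer_id": 2, "name": "Customer B", "postal_code": "22222"},
]

# Hypothetical purchased external data, keyed by postal code (an assumed link key).
purchased_demographics = {
    "11111": {"median_income": 52000, "avg_household_size": 3.4},
}


def build_customer_profiles(customers, demographics):
    """Left-join internal customers with purchased data on postal code."""
    profiles = []
    for customer in customers:
        # Customers without a match keep only their internal attributes.
        extra = demographics.get(customer["postal_code"], {})
        profiles.append({**customer, **extra})
    return profiles


profiles = build_customer_profiles(internal_customers, purchased_demographics)
for profile in profiles:
    print(profile)
```

A left join is used so that no internal customer is lost just because the purchased list does not cover them.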


Subject Oriented

OLTP databases usually hold information about small subsets of the organization. For example, a retailer might have separate order entry systems and databases for retail, catalog, and outlet sales. Each system supports queries about the information it captures. However, these separate systems are not adequate if somebody wants details of all sales across the channels. To address this situation, the data warehouse database should be subject-oriented. That is, it should be organized into subject areas such as sales, rather than around OLTP data sources.

[Figure: Subject-oriented sales information. The Retail, Outlet, and Catalog Sales Systems (OLTP systems) all feed a single Sales Subject Area in the data warehouse.]

A data warehouse is organized around major subjects such as customers, products, and sales. Data are organized according to subject instead of application. For example, an insurance company using a data warehouse would organize its data by customer, premium, and claim instead of by product (auto, life, property, etc.).

Integrated

A data warehouse is usually constructed by integrating multiple heterogeneous sources, such as relational databases, flat files, and OLTP files. When data resides in many separate applications in the operational environment, the encoding of data is often inconsistent. For example, consider the system shown above:
- The retail system uses a 7-digit numeric code for products.
- The outlet system uses a code of 9 alphanumeric characters.
- The catalog system uses 4 letters and 4 digits.

To create a useful subject area, the source data must be integrated. There is no need to change the coding in the source systems. However, there must be some mechanism that modifies the data coming into the data warehouse and assigns a common coding scheme.

[Figure: Integration of product codes. OLTP systems: Retail Sales System (product code 9999999), Outlet Sales System (product code XXXXXXXXX), Catalog Sales System (product code XXXX99.99). In the Sales Subject Area of the data warehouse, the product code is a common code or a mapping of the various source codes.]

Nonvolatile

Unlike operational databases, warehouses primarily support reporting, not data capture. A data warehouse is always a physically separate store of data. Because of this separation, data warehouses do not require transaction processing, recovery, concurrency control, etc. Once data enter the data warehouse, they are not updated or changed in any way. They are only loaded, refreshed, and accessed for queries.

[Figure: OLTP R/W vs. DW Read Only. Users both read and write OLTP systems, but only read the data warehouse.]

Time Variant

Data are stored in a data warehouse to provide a historical perspective. Every key structure in the data warehouse contains, implicitly or explicitly, an element of time. A data warehouse generally stores data covering the past 5-10 years, to be used for comparisons, trends, and forecasting.
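The four properties discussed above can be seen together in a small load routine:
- Subject-oriented: one sales subject area is fed by all three channels.
- Integrated: each source product code is mapped to a common code on the way in.
- Non-volatile: rows are appended but never updated.
- Time-variant: every row carries a date.

This is a minimal in-memory sketch, not a real warehouse. The sample codes follow the formats described in the notes. The common codes (`P-001`, `P-002`) and the names `PRODUCT_CODE_MAP`, `SalesFact` and `SalesSubjectArea` are hypothetical.

```python
from dataclasses import dataclass
from datetime import date

# Hypothetical mapping from (source system, source product code) to a common
# warehouse product code. Source formats follow the notes:
# retail = 7 digits, outlet = 9 alphanumerics, catalog = 4 letters + 4 digits.
PRODUCT_CODE_MAP = {
    ("retail", "1234567"): "P-001",
    ("outlet", "AB12CD34E"): "P-001",
    ("catalog", "ABCD1234"): "P-001",
    ("retail", "7654321"): "P-002",
}


@dataclass(frozen=True)
class SalesFact:
    product: str      # common product code (integrated)
    channel: str      # OLTP system the sale came from
    amount: float
    load_date: date   # explicit time element in every row (time-variant)


class SalesSubjectArea:
    """One subject area (sales) fed by all channels; rows are only appended."""

    def __init__(self):
        self._rows = []  # list of SalesFact; never updated in place (non-volatile)

    def load(self, channel, source_code, amount, load_date):
        # The source systems keep their own codes; only incoming data is recoded.
        common_code = PRODUCT_CODE_MAP[(channel, source_code)]  # KeyError if unmapped
        self._rows.append(SalesFact(common_code, channel, amount, load_date))

    def total_sales(self, product, up_to=None):
        """Read-only query: total sales of a product, optionally up to a date."""
        return sum(
            r.amount
            for r in self._rows
            if r.product == product and (up_to is None or r.load_date <= up_to)
        )


dw = SalesSubjectArea()
dw.load("retail", "1234567", 10.0, date(2024, 1, 31))
dw.load("outlet", "AB12CD34E", 5.0, date(2024, 1, 31))
dw.load("catalog", "ABCD1234", 2.5, date(2024, 2, 29))
print(dw.total_sales("P-001"))                     # 17.5
print(dw.total_sales("P-001", date(2024, 1, 31)))  # 15.0
```

Because the rows are frozen dataclasses that are only appended, a historical query such as "sales up to 31 January" keeps returning the same answer after later loads.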

Operational Systems vs. Data Warehousing Systems

Operational                                       Data Warehouse
-----------                                       --------------
Holds current data                                Holds historic data
Data is dynamic                                   Data is largely static
Read/write accesses                               Read-only accesses
Repetitive processing                             Ad hoc, complex queries
Transaction driven                                Analysis driven
Application oriented                              Subject oriented
Used by clerical staff for day-to-day operations  Used by top managers for analysis
Normalized data model (ER model)                  Denormalized data model (dimensional model)
Must be optimized for writes and small queries    Must be optimized for queries involving a large portion of the warehouse
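The contrast in the table can be sketched with Python's built-in `sqlite3` module. The example shows two kinds of work:
- Operational work: a short lookup of current stock and a small transactional write.
- Warehouse-style work: a read-only aggregate over the whole history table.

The table names, columns, and sample figures are hypothetical. Both tables live in one in-memory database only to keep the sketch short. As the notes stress, a real data warehouse is a physically separate store.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE inventory (product TEXT PRIMARY KEY, units INTEGER NOT NULL);
    CREATE TABLE sales_history (product TEXT, sale_year INTEGER, amount REAL);
""")
conn.executemany("INSERT INTO inventory VALUES (?, ?)",
                 [("P-001", 40), ("P-002", 0)])
conn.executemany("INSERT INTO sales_history VALUES (?, ?, ?)",
                 [("P-001", 2022, 100.0), ("P-002", 2022, 80.0),
                  ("P-001", 2023, 150.0), ("P-002", 2023, 60.0)])
conn.commit()


def can_fill_order(product, qty):
    """Operational: a short, localized read of current data."""
    row = conn.execute("SELECT units FROM inventory WHERE product = ?",
                       (product,)).fetchone()
    return row is not None and row[0] >= qty


def place_order(product, qty):
    """Operational: a small write, run as a transaction."""
    with conn:  # commits on success, rolls back if an exception is raised
        cur = conn.execute(
            "UPDATE inventory SET units = units - ? "
            "WHERE product = ? AND units >= ?",
            (qty, product, qty))
    return cur.rowcount == 1  # False if stock was insufficient


def yearly_sales_trend():
    """Warehouse-style: a read-only aggregate over the whole history table."""
    return conn.execute(
        "SELECT sale_year, SUM(amount) FROM sales_history "
        "GROUP BY sale_year ORDER BY sale_year").fetchall()


print(can_fill_order("P-001", 10))  # True
print(place_order("P-002", 1))      # False (not enough units)
print(yearly_sales_trend())         # [(2022, 180.0), (2023, 210.0)]
```

The operational functions each touch a single row and need transactional safety. The trend query touches every history row, never writes, and is the kind of query the warehouse column of the table says must be optimized.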


Advantages of Data Warehousing
- Potential high return on investment
- Competitive advantage
- Increased productivity of corporate decision makers

Problems with Data Warehousing
- Underestimation of resources for data loading
- Hidden problems with source systems
- Required data not captured
- Increased end-user demands
- High maintenance
- Long-duration projects
- Complexity of integration

Data Warehouse Architecture*


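A typical data warehousing architecture is illustrated below:

[Figure: Typical data warehousing architecture. Data from operational systems and external data sources goes through extract, clean, transform, load, and refresh steps into the data warehouse (DW). A metadata repository accompanies the DW. The DW serves reports, OLAP, and data mining.]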
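The extract, clean, transform, load, and refresh steps named in the architecture can be sketched as a chain of small functions. This is a minimal illustration under assumptions, not a production ETL tool:
- The source rows, the code mapping, and all function names are hypothetical.
- The metadata repository is not modeled.
- "Refresh" is simplified to re-running one full cycle.

```python
from datetime import date


def extract():
    """Hypothetical raw rows pulled from an operational source system."""
    return [
        {"product": " 1234567 ", "amount": "10.50", "day": "2024-01-31"},
        {"product": "7654321", "amount": "", "day": "2024-01-31"},  # missing amount
    ]


def clean(rows):
    """Trim stray whitespace and drop rows with a missing amount."""
    cleaned = []
    for row in rows:
        row = {key: value.strip() for key, value in row.items()}
        if row["amount"]:
            cleaned.append(row)
    return cleaned


def transform(rows, code_map):
    """Convert types and map source product codes to common warehouse codes."""
    return [
        {
            "product": code_map[row["product"]],
            "amount": float(row["amount"]),
            "day": date.fromisoformat(row["day"]),
        }
        for row in rows
    ]


def load(warehouse, rows):
    """Append new rows; rows already in the warehouse are never modified."""
    warehouse.extend(rows)


def refresh(warehouse, code_map):
    """One periodic cycle: extract -> clean -> transform -> load."""
    load(warehouse, transform(clean(extract()), code_map))


warehouse = []
refresh(warehouse, {"1234567": "P-001", "7654321": "P-002"})
print(warehouse)  # [{'product': 'P-001', 'amount': 10.5, 'day': datetime.date(2024, 1, 31)}]
```

Running `refresh` again here would append the same row twice. A real refresh would extract only the data that changed since the previous cycle.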

*The details of the architecture will be discussed in the next article (in the meantime, read Chapter 3 of textbook 1).

The above notes have been compiled from the following sources:
1. Corey M. et al., Oracle 8i Data Warehousing, TMH, 2001.
2. Connolly T. and Begg C., Database Systems, second edition, AW, 1998.
3. Ramakrishnan R. and Gehrke J., Database Management Systems, second edition, MGH, 2000.
4. http://cisnet.baruch.cuny.edu/holowczak/classes/9440/datawarehousing/

You may also refer to http://system-services.com/ftp/dwintro.doc
