Lecture Notes For DBMS and Data Mining and Data Warehousing

Unit IV
Lecture 24: Data Warehousing

Introduction

Present Business Scenario

Over the last 20 years, $1 trillion has been invested in new computer systems to gain competitive advantage. The vast majority of these systems have automated business processes to make them faster, cheaper, and more responsive to the customer. Electronic point of sale (EPOS) at supermarkets, itemized billing at telecommunication companies (telcos), and mass-market mailing at catalog companies are some examples of such operational systems. These systems computerized the day-to-day operations of business organizations.

Some characteristics of operational systems are as follows:
• Most organizations have a number of individual operational systems (databases, applications).
• On-Line Transaction Processing (OLTP) systems capture the business transactions that occur.
• An operational system is used daily (perhaps constantly) to perform routine operations that are part of the normal business processes.
• Examples: order entry, purchasing, stock/bond trading, and bank operations.
• Users make short-term, localized business decisions based on operational data, e.g., "Can I fill this order based on the current units in inventory?"
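The operational question quoted above ("Can I fill this order based on the current units in inventory?") is a typical short OLTP transaction. The following is a minimal sketch in Python using the standard sqlite3 module; the `inventory` table, the product code `P100`, and the function `fill_order` are hypothetical names chosen for illustration, not part of any system described in these notes.

```python
import sqlite3

# Hypothetical order-entry (OLTP) inventory table, held in memory for the sketch.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE inventory (product_id TEXT PRIMARY KEY, units INTEGER NOT NULL)"
)
conn.execute("INSERT INTO inventory VALUES ('P100', 5)")
conn.commit()


def fill_order(conn, product_id, qty):
    """Short OLTP transaction: reserve stock only if enough units are on hand.

    The stock check and the decrement are a single conditional UPDATE, so the
    decision and the change happen atomically in one statement.
    """
    with conn:  # commits on success, rolls back if an exception is raised
        cur = conn.execute(
            "UPDATE inventory SET units = units - ? "
            "WHERE product_id = ? AND units >= ?",
            (qty, product_id, qty),
        )
    return cur.rowcount == 1  # one row changed -> the order can be filled


print(fill_order(conn, "P100", 3))  # True  (5 -> 2 units)
print(fill_order(conn, "P100", 3))  # False (only 2 units left)
```

Folding the check into the UPDATE's WHERE clause keeps the whole decision inside one statement, which is the kind of small, localized, repetitive operation that OLTP systems are optimized for.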

Presently almost all businesses have operational systems, so these systems by themselves no longer give any competitive advantage. These systems have gathered a vast amount of data over the years, and companies now realize the importance of this hidden treasure of information. Efforts are under way to tap into this information to improve the quality of decision-making. A data warehouse is essentially a repository of data collected from the various operational systems of an organization. This data is then comprehensively analyzed to gain competitive advantage, and the analysis is used mainly for decision-making at the top level.

Need For Data Warehousing
• Data warehouses (DW) provide users with current and historical decision-support information that is hard to get from traditional operational data stores.
• A DW can create strategic business opportunities by giving customers and vendors access to corporate data while maintaining the necessary security measures.
• A DW gives a competitive advantage from a business perspective because it allows decisions to be taken quickly and correctly by presenting all available data in a non-technical, user-friendly way. It can also handle large incremental additions of data.
• A DW can address the incompatibility between informational and operational transaction systems.

Department of Electrical and Electronics

By: Sulabh Bansal

Operational and Informational Data Stores
• Various OLTP (On-Line Transaction Processing) systems (e.g., financial, order entry, work scheduling, and point-of-sale systems) create operational data in corporations.
• Operational data is detailed, non-redundant, updateable, and reflects the current structure. It answers questions such as "How many gadgets were sold to customer number 123876 on September 19?"
• Informational data is organized around subjects such as customer, vendor, and product. It is often summarized, is redundant to support varying data views, and is non-updateable. It answers decision-making questions such as "Which three products resulted in the most frequent calls to the hotline over the past quarter?"

Informational data is obtained from operational data sources (including any or all applications, databases, and computer systems within the enterprise) after cleaning, renaming, and providing access methods.
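The two kinds of questions above can be contrasted in a small sketch. Assuming hypothetical `sales` and `hotline_calls` tables (and assuming the year 2023, since the notes give only "September 19" and "the past quarter"), the operational question is a detailed lookup, while the informational one summarizes a whole quarter by product. Both tables sit in one in-memory SQLite database only for brevity; in practice the informational query would run against the warehouse rather than the OLTP system.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE sales (customer_id INTEGER, product TEXT, sale_date TEXT, qty INTEGER);
    CREATE TABLE hotline_calls (product TEXT, call_date TEXT);
""")
# Hypothetical sample rows; dates are ISO strings so they compare correctly as text.
conn.executemany("INSERT INTO sales VALUES (?, ?, ?, ?)", [
    (123876, "gadget", "2023-09-19", 4),
    (123876, "gadget", "2023-09-19", 2),
    (123876, "widget", "2023-09-19", 1),
    (555000, "gadget", "2023-09-19", 7),
])
conn.executemany("INSERT INTO hotline_calls VALUES (?, ?)", [
    ("gadget", "2023-07-02"), ("gadget", "2023-08-15"), ("gadget", "2023-09-01"),
    ("widget", "2023-07-20"), ("widget", "2023-09-30"),
    ("gizmo", "2023-08-08"),
    ("doohickey", "2023-06-15"), ("doohickey", "2023-06-30"),  # previous quarter
])

# Operational question: a detailed lookup of individual transactions.
gadgets_sold = conn.execute(
    "SELECT SUM(qty) FROM sales "
    "WHERE customer_id = ? AND product = 'gadget' AND sale_date = ?",
    (123876, "2023-09-19"),
).fetchone()[0]

# Informational question: a summary over a whole quarter, grouped by subject (product).
top_three = conn.execute("""
    SELECT product, COUNT(*) AS calls
    FROM hotline_calls
    WHERE call_date BETWEEN '2023-07-01' AND '2023-09-30'
    GROUP BY product
    ORDER BY calls DESC
    LIMIT 3
""").fetchall()

print(gadgets_sold)  # 6
print(top_three)     # [('gadget', 3), ('widget', 2), ('gizmo', 1)]
```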

Data Warehousing Systems
A data warehousing system can perform advanced analysis of operational data without impacting operational systems. OLTP is very fast and efficient at recording business transactions, but it is not good at answering high-level strategic questions.

Component Systems:
• Legacy systems: any information system currently in use that was built using previous technology generations. Most legacy systems are operational in nature, largely because automating transaction-oriented business processes had long been the priority of IT projects.
• Source systems: any system from which data is taken for a data warehouse. A source system is often called a legacy system in a mainframe environment.
• Operational Data Stores (ODS): an ODS is a collection of integrated databases designed to support the monitoring of operations. Unlike the databases of OLTP applications (which are function-oriented), the ODS contains subject-oriented, volatile, and current enterprise-wide detailed information. It serves as a system of record that provides comprehensive views of data in operational sources. Like data warehouses, ODSs are integrated and subject-oriented. However, an ODS is always current and is constantly updated. The ODS is an ideal data source for a data warehouse, since it already contains integrated operational data as of a given point in time. In short, an ODS is an integrated collection of clean data destined for the data warehouse.
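The difference between an ODS (always current, constantly updated) and a data warehouse fed from it (periodic, time-stamped loads) can be sketched with plain Python structures. The names `ods`, `warehouse`, `ods_apply`, and `load_snapshot` are hypothetical; this illustrates the idea under those assumptions and does not describe any particular product.

```python
from datetime import date

ods = {}        # hypothetical ODS: customer_id -> current, integrated record
warehouse = []  # hypothetical warehouse table: append-only, dated snapshots


def ods_apply(customer_id, **changes):
    """The ODS is volatile: each operational change overwrites the current value."""
    ods.setdefault(customer_id, {}).update(changes)


def load_snapshot(as_of):
    """Periodic migration: copy the ODS state as of a given date into the warehouse."""
    for customer_id, record in ods.items():
        warehouse.append({"as_of": as_of, "customer_id": customer_id, **record})


ods_apply(1, city="Delhi", segment="retail")
load_snapshot(date(2024, 1, 31))
ods_apply(1, city="Mumbai")  # the customer moves; the ODS keeps only the latest city
load_snapshot(date(2024, 2, 29))

print(ods[1]["city"])                      # Mumbai
print([row["city"] for row in warehouse])  # ['Delhi', 'Mumbai']
```

The ODS answers "what is true now", while the warehouse keeps each dated copy, which is what later makes historical comparisons possible.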

Definition:
Data warehouses are mostly populated with periodic migrations of data from operational systems. The second source is made up of external, frequently purchased, databases. Examples of this data include lists of income and demographic information. This purchased information is linked with internal data about customers to develop a good customer profile.

A data warehouse is a subject-oriented, integrated, time-variant, and non-volatile collection of data in support of management decisions.

• Subject-Oriented
  o OLTP databases usually hold information about small subsets of the organization.
  o For example, a retailer might have separate order entry systems and databases for retail, catalog, and outlet sales. Each system supports queries about the information it captures, but if somebody wants details of all sales, these separate systems are not adequate. To address this situation, the data warehouse database should be subject-oriented, organized into subject areas such as sales rather than around OLTP data sources.

  o A data warehouse is organized around major subjects such as customer, product, and sales. Data are organized according to subject instead of application. For example, an insurance company using a data warehouse would organize its data by customer, premium, and claim instead of by product line (auto, life, property, etc.).
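To make the retailer example concrete, the sketch below reshapes records from three hypothetical order-entry systems (retail, catalog, outlet), each with its own field names, into one subject area called `sales`. All field names, order numbers, and amounts are invented for illustration.

```python
from collections import defaultdict

# Hypothetical extracts from three separate order-entry systems; each
# application names its fields differently.
retail_orders = [{"order_no": "R1", "amount": 120.0}, {"order_no": "R2", "amount": 80.0}]
catalog_orders = [{"id": 501, "total": 45.5}]
outlet_orders = [{"ticket": "O-9", "value": 30.0}]


def build_sales_subject_area():
    """Reshape every application's records into one subject-oriented 'sales' area."""
    sales = []
    sales += [{"channel": "retail", "source_id": o["order_no"], "amount": o["amount"]}
              for o in retail_orders]
    sales += [{"channel": "catalog", "source_id": str(o["id"]), "amount": o["total"]}
              for o in catalog_orders]
    sales += [{"channel": "outlet", "source_id": o["ticket"], "amount": o["value"]}
              for o in outlet_orders]
    return sales


sales = build_sales_subject_area()

# A question none of the separate systems could answer alone: sales across all channels.
by_channel = defaultdict(float)
for s in sales:
    by_channel[s["channel"]] += s["amount"]

print(sum(s["amount"] for s in sales))  # 275.5
print(dict(by_channel))  # {'retail': 200.0, 'catalog': 45.5, 'outlet': 30.0}
```

Once the data is organized by subject rather than by application, "total sales across all channels" becomes a simple aggregation.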

• Integrated
  o A data warehouse is usually constructed by integrating multiple heterogeneous sources, such as relational databases, flat files, and OLTP files.
  o When data resides in many separate applications in the operational environment, the encoding of data is often inconsistent.
  o For example, in the retailer example above, the retail system uses a 7-digit numeric code for products, the outlet system code consists of 9 alphanumeric characters, and the catalog system uses 4 letters and 4 digits. To create a useful subject area, the source data must be integrated. There is no need to change the coding in these source systems, but there must be some mechanism to transform the data coming into the data warehouse and assign a common coding scheme.
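One common way to provide the mechanism described above is a cross-reference (mapping) table applied while loading. The sketch below assumes hypothetical product codes and a hypothetical `PRODUCT_XREF` table; it validates each incoming code against the format the notes give for its source system (assuming the catalog code is 4 letters followed by 4 digits) and returns a single warehouse-wide product code, leaving the source systems unchanged.

```python
import re

# Product-code formats described in the notes for each source system.
SOURCE_FORMATS = {
    "retail": re.compile(r"[0-9]{7}"),              # 7 numeric digits
    "outlet": re.compile(r"[A-Za-z0-9]{9}"),        # 9 alphanumeric characters
    "catalog": re.compile(r"[A-Za-z]{4}[0-9]{4}"),  # 4 letters then 4 digits (order assumed)
}

# Hypothetical cross-reference table kept by the warehouse load process:
# (source system, local product code) -> common warehouse product code.
PRODUCT_XREF = {
    ("retail", "0012345"): "PRD-000001",
    ("outlet", "AB12CD34E"): "PRD-000001",
    ("catalog", "LAMP0042"): "PRD-000001",
    ("retail", "0099999"): "PRD-000002",
}


def to_common_code(system, local_code):
    """Translate an incoming product code; the source systems themselves are unchanged."""
    if not SOURCE_FORMATS[system].fullmatch(local_code):
        raise ValueError(f"{local_code!r} is not a valid {system} product code")
    try:
        return PRODUCT_XREF[(system, local_code)]
    except KeyError:
        raise KeyError(f"no warehouse mapping for {system}:{local_code}") from None


print(to_common_code("retail", "0012345"))    # PRD-000001
print(to_common_code("catalog", "LAMP0042"))  # PRD-000001
```

Rejecting malformed codes before the lookup separates two different load problems: bad source data and a missing mapping entry.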

• Non-volatile
  o Unlike operational databases, warehouses primarily support reporting, not data capture.
  o A data warehouse is always a physically separate store of data.
  o Due to this separation, data warehouses do not require transaction processing, recovery, concurrency control, etc.
  o The data are not updated or changed in any way once they enter the data warehouse, but are only loaded, refreshed, and accessed for queries.
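A minimal way to illustrate the non-volatile property, assuming an SQLite-based warehouse table, is to allow only inserts from the load process and make the database reject updates and deletes. The table `sales_fact` and the trigger names are hypothetical; SQLite's `CREATE TRIGGER ... BEFORE UPDATE` with `RAISE(ABORT, ...)` is a real feature, but using it this way is only one possible enforcement strategy.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE sales_fact (load_date TEXT, product TEXT, amount REAL);
    -- Hypothetical guard: the load process may insert rows, but nobody may change them.
    CREATE TRIGGER sales_fact_no_update BEFORE UPDATE ON sales_fact
    BEGIN SELECT RAISE(ABORT, 'warehouse data is non-volatile'); END;
    CREATE TRIGGER sales_fact_no_delete BEFORE DELETE ON sales_fact
    BEGIN SELECT RAISE(ABORT, 'warehouse data is non-volatile'); END;
""")


def load(rows):
    """Periodic load/refresh: a batch of new rows is appended."""
    with conn:
        conn.executemany("INSERT INTO sales_fact VALUES (?, ?, ?)", rows)


load([("2024-01-31", "gadget", 100.0), ("2024-01-31", "widget", 40.0)])
load([("2024-02-29", "gadget", 120.0)])

try:
    conn.execute("UPDATE sales_fact SET amount = 0 WHERE product = 'gadget'")
except sqlite3.DatabaseError as exc:  # the trigger aborts the statement
    print("rejected:", exc)

# Read-only access for queries is always allowed.
print(conn.execute("SELECT SUM(amount) FROM sales_fact").fetchone()[0])  # 260.0
```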

• Time-Variant
  o Data are stored in a data warehouse to provide a historical perspective.
  o Every key structure in the data warehouse contains, implicitly or explicitly, an element of time.
  o A data warehouse typically stores 5-10 years of data, to be used for comparisons, trends, and forecasting.
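Because every key carries a time element, trend questions reduce to comparing values across time keys. The sketch below assumes a hypothetical `sales_history` dictionary keyed by `(year, product)` and a hypothetical helper `yoy_growth` that computes growth between successive stored years.

```python
# Hypothetical yearly sales history. The time element (year) is an explicit
# part of every key, as the time-variant property requires.
sales_history = {
    (2019, "gadget"): 100.0,
    (2020, "gadget"): 110.0,
    (2021, "gadget"): 132.0,
    (2021, "widget"): 50.0,
}


def yoy_growth(history, product):
    """Growth rate between successive stored years for one product."""
    years = sorted(year for (year, p) in history if p == product)
    return {
        year: (history[(year, product)] - history[(prev, product)]) / history[(prev, product)]
        for prev, year in zip(years, years[1:])
    }


print(yoy_growth(sales_history, "gadget"))  # {2020: 0.1, 2021: 0.2}
print(yoy_growth(sales_history, "widget"))  # {} (only one year stored)
```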

Other terms related to the data warehouse:
• Current detail data: acquired directly from the operational database. It is organized along subject lines (customer profile data, customer activity data, demographic data, sales data, etc.).
• Old detail data: represents the history of the subject data areas.
• Data mart: may contain lightly summarized departmental data; a collection of data marts composes an enterprise-wide data warehouse.
• Summarized data: data aggregated along the lines required for executive-level reporting, trend analysis, and enterprise-wide decision-making. It is much smaller than current detail data or old detail data.
• Drill-down: traversing the summarization levels from highly summarized data down to the underlying current detail data. For example, an analyst can drill down from sales volumes in North America to the state, county, and city levels.
• Metadata: data about data. It contains the location and description of warehouse system components; the names, definitions, structure, and content of the data warehouse; and end-user views.

Operational Systems vs Data Warehousing Systems:
Operational                                      | Data Warehouse
-------------------------------------------------|-------------------------------------------------
Holds current data                               | Holds historic data
Data is dynamic                                  | Data is largely static
Read/write accesses                              | Read-only accesses
Repetitive processing                            | Ad hoc, complex queries
Transaction driven                               | Analysis driven
Application oriented                             | Subject oriented
Used by clerical staff for day-to-day operations | Used by top managers for analysis
Normalized data model (ER model)                 | Denormalized data model (dimensional model)
Must be optimized for writes and small queries   | Must be optimized for queries involving a large portion of the warehouse
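The drill-down and summarized-data terms defined above can be illustrated on a flattened, denormalized layout in the spirit of the dimensional model in the table. In this sketch (all data and the function `summarize` are hypothetical), each sales row carries its full location hierarchy, so moving from the North America total down to state, county, and city only changes the grouping depth and the filter.

```python
from collections import defaultdict

# Hypothetical denormalized rows: each sale carries its full location
# hierarchy, so no joins are needed to summarize at any level.
LOCATION_LEVELS = ("region", "state", "county", "city")

sales = [
    {"region": "North America", "state": "Texas", "county": "Travis", "city": "Austin", "amount": 500},
    {"region": "North America", "state": "Texas", "county": "Harris", "city": "Houston", "amount": 300},
    {"region": "North America", "state": "Ohio", "county": "Franklin", "city": "Columbus", "amount": 200},
]


def summarize(rows, depth, **filters):
    """Sum 'amount' grouped by the first `depth` location levels, for rows matching `filters`."""
    keys = LOCATION_LEVELS[:depth]
    totals = defaultdict(int)
    for row in rows:
        if all(row[k] == v for k, v in filters.items()):
            totals[tuple(row[k] for k in keys)] += row["amount"]
    return dict(totals)


# Drill-down: start from the highly summarized regional figure and descend one level at a time.
print(summarize(sales, 1))                          # {('North America',): 1000}
print(summarize(sales, 2, region="North America"))  # {('North America', 'Texas'): 800, ('North America', 'Ohio'): 200}
print(summarize(sales, 4, state="Texas"))           # city-level detail for Texas only
```

Each step down produces a larger, more detailed result, which mirrors the notes' point that summarized data is much smaller than the detail data beneath it.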

Advantages of Data Warehousing:
• Potential high return on investment
• Competitive advantage
• Increased productivity of corporate decision makers

Problems with Data Warehousing:
• Underestimation of resources for data loading
• Hidden problems with source systems
• Required data not captured
• Increased end-user demands
• High maintenance
• Long-duration projects
• Complexity of integration

Data Warehouse Architecture


A typical data warehousing architecture is illustrated below:
