Application databases are OLTP (On-Line Transaction Processing) systems, where every transaction has to be recorded as and when it occurs. Consider the scenario where a bank ATM has disbursed cash to a customer but was unable to record this event in the bank records. If this happened frequently, the bank wouldn't stay in business for long. So the banking system is designed to make sure that every transaction gets recorded within the time you stand before the ATM.
A Data Warehouse (DW), on the other hand, is a database (yes, you are right, it's a database) that is designed to facilitate querying and analysis. Often designed as OLAP (On-Line Analytical Processing) systems, these databases contain read-only data that can be queried and analyzed far more efficiently than your regular OLTP application databases. In this sense an OLAP system is designed to be read-optimized. Separation from your application database also ensures that your business intelligence solution is scalable (your bank and ATMs don't go down just because the CFO asked for a report), better documented, and better managed. Creating a DW leads to a direct increase in the quality of analysis because the table structures are simpler (you keep only the needed information in simpler tables), standardized (well-documented table structures), and often de-normalized (to reduce the linkages between tables and the corresponding complexity of queries). A well-designed DW is the foundation on which successful BI (Business Intelligence)/Analytics initiatives are built.
Data Warehouses usually store many months or years of data in order to support historical analysis. OLTP systems usually store data from only a few weeks or months: an OLTP system keeps only as much historical data as is needed to meet the requirements of the current transaction.
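To make the de-normalization point above concrete, here is a minimal sketch using Python's built-in sqlite3 module. All table and column names (branches, accounts, withdrawals, fact_withdrawals) are hypothetical and chosen only for illustration. The same report needs two joins against the normalized OLTP-style tables but only a single-table query against the de-normalized warehouse table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Normalized, OLTP-style tables (hypothetical names)
CREATE TABLE branches (branch_id INTEGER PRIMARY KEY, city TEXT);
CREATE TABLE accounts (
    account_id INTEGER PRIMARY KEY,
    branch_id  INTEGER REFERENCES branches(branch_id)
);
CREATE TABLE withdrawals (
    txn_id     INTEGER PRIMARY KEY,
    account_id INTEGER REFERENCES accounts(account_id),
    amount     REAL
);
INSERT INTO branches VALUES (1, 'Pune'), (2, 'Delhi');
INSERT INTO accounts VALUES (10, 1), (20, 2);
INSERT INTO withdrawals VALUES (100, 10, 50.0), (101, 10, 25.0), (102, 20, 40.0);

-- De-normalized warehouse table: the city is copied onto every fact row
CREATE TABLE fact_withdrawals AS
SELECT w.txn_id, w.amount, b.city
FROM withdrawals w
JOIN accounts a ON a.account_id = w.account_id
JOIN branches b ON b.branch_id = a.branch_id;
""")

# Report against the normalized tables: two joins are required
oltp_report = conn.execute("""
    SELECT b.city, SUM(w.amount)
    FROM withdrawals w
    JOIN accounts a ON a.account_id = w.account_id
    JOIN branches b ON b.branch_id = a.branch_id
    GROUP BY b.city ORDER BY b.city
""").fetchall()

# Same report against the de-normalized table: no joins
dw_report = conn.execute("""
    SELECT city, SUM(amount)
    FROM fact_withdrawals
    GROUP BY city ORDER BY city
""").fetchall()

print(oltp_report == dw_report)
```

Both queries return the same totals per city; the warehouse version is simply easier to write and does not touch the transactional tables at query time.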
2. With the necessary diagram, explain the Data Warehouse Development Life Cycle.
The Data Warehouse development life cycle covers two vital areas: warehouse management and data management. The former deals with defining the project activities and gathering requirements, whereas the latter deals with modeling and designing the warehouse.
Managing the Project
Managing the Data Warehouse project is an ongoing activity. It is not like a traditional systems project. The Data Warehouse is concerned with the execution of the warehousing process and with the data.
Defining the Project
The process of defining the project typically involves the following questions: What do I want to analyze? Why do I want to analyze it? What if I do not do this? How do I get this? Once the software personnel have answers to these questions, they can understand the requirements that must be addressed.
Requirements Gathering
Transaction processing systems focus on automating a process, making it faster and more efficient. This in turn means that the requirements for transactional systems are specific and directed towards business process automation. In contrast, the Data Warehousing environment focuses on facilitating the analysis that will change the process to make it more effective.
Common questions asked during requirements gathering: Who is of interest to the user? What is the user trying to analyze? Why does the user need the data? When does the data need to be recovered? Where do the relevant processes occur? How do we measure the performance?
approach is different. For the first run of the initial load of a particular table, use the load mode. All further runs will apply the incoming data using the append mode. Creation of indexes on initial loads or full refreshes requires special consideration. Index creation on mass loads can be too time-consuming, so drop the indexes prior to the loads to make the loads go quicker. You may rebuild or regenerate the indexes when the loads are complete.
Incremental Loads: These are the applications of ongoing changes from the source systems. Changes to the source systems are always tied to specific times, irrespective of whether or not they are based on explicit time stamps in the source systems. Therefore, you need a method to preserve the periodic nature of the changes in the Data Warehouse. Let us review the constructive merge mode. In this mode, if the primary key of an incoming record matches the key of an existing record, the existing record is left in the target table as is, and the incoming record is added and marked as superseding the old record. If the time stamp is also part of the primary key, or if the time stamp is included in the comparison between the incoming and the existing records, then constructive merge may be used to preserve the periodic nature of changes. This is an oversimplification of the exact details of how constructive merge may be used. Nevertheless, the point is that the constructive merge mode is an appropriate method for incremental loads. The details will have to be worked out based on the nature of the individual target tables.
Are there cases in which the destructive merge mode may be applied? What about a Type 1 slowly changing dimension? In this case, the change to a dimension table record is meant to correct an error in the existing record. The corrected incoming record must replace the existing record, so you may use the destructive merge mode.
This mode is also applicable to any target tables where the historical perspective is not important.
Full Refresh: This type of application of data involves periodically rewriting the entire Data Warehouse. Sometimes, you may also do partial refreshes to rewrite only specific tables. Partial refreshes are rare because every dimension table is intricately tied to the fact table. As far as the data application modes are concerned, full refresh is similar to the initial load. However, in the case of full refreshes, data exists in the target tables before incoming data is applied. The existing data must be erased before applying the incoming data. Just as in the case of the initial load, the load and append modes are applicable to full refresh.
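The data application modes discussed above can be sketched in a few lines of Python with the built-in sqlite3 module. This is only an illustration under stated assumptions, not a production loader: the table customer_dim, its columns, the index name, and the three function names are all hypothetical. The load time stamp is made part of the primary key so that constructive merge can keep several versions of one record, and time stamps are assumed to be ISO-formatted strings so that MAX() picks the latest one.

```python
import sqlite3

# Hypothetical dimension table; loaded_at is part of the primary key so that
# several versions of the same customer can coexist.
SCHEMA = """
CREATE TABLE customer_dim (
    customer_id INTEGER NOT NULL,
    city        TEXT    NOT NULL,
    loaded_at   TEXT    NOT NULL,   -- ISO date string (assumption)
    supersedes  TEXT,               -- loaded_at of the record this one supersedes
    PRIMARY KEY (customer_id, loaded_at)
)
"""
INDEX = "CREATE INDEX idx_customer_city ON customer_dim (city)"
INSERT = "INSERT INTO customer_dim (customer_id, city, loaded_at, supersedes) VALUES (?, ?, ?, ?)"
LATEST = "SELECT MAX(loaded_at) FROM customer_dim WHERE customer_id = ?"


def full_refresh(conn, rows):
    """Erase the existing data, then apply all incoming rows."""
    with conn:
        conn.execute("DROP INDEX IF EXISTS idx_customer_city")  # quicker mass load
        conn.execute("DELETE FROM customer_dim")
        conn.executemany(INSERT, [(cid, city, ts, None) for cid, city, ts in rows])
        conn.execute(INDEX)  # rebuild the index once the load is complete


def constructive_merge(conn, customer_id, city, loaded_at):
    """Leave the existing record as is; add the incoming one as superseding it."""
    with conn:
        previous = conn.execute(LATEST, (customer_id,)).fetchone()[0]  # None if new
        conn.execute(INSERT, (customer_id, city, loaded_at, previous))


def destructive_merge(conn, customer_id, city, loaded_at):
    """Overwrite the latest record in place (Type 1 correction); add it if absent."""
    with conn:
        cur = conn.execute(
            "UPDATE customer_dim SET city = ? WHERE customer_id = ? "
            "AND loaded_at = (" + LATEST + ")",
            (city, customer_id, customer_id),
        )
        if cur.rowcount == 0:
            conn.execute(INSERT, (customer_id, city, loaded_at, None))


conn = sqlite3.connect(":memory:")
conn.execute(SCHEMA)
conn.execute(INDEX)

full_refresh(conn, [(1, "Pune", "2024-01-01"), (2, "Delhi", "2024-01-01")])
constructive_merge(conn, 1, "Mumbai", "2024-02-01")     # real change: history kept
destructive_merge(conn, 2, "New Delhi", "2024-02-01")   # error fix: history not kept

history = conn.execute(
    "SELECT customer_id, city, loaded_at, supersedes "
    "FROM customer_dim ORDER BY customer_id, loaded_at"
).fetchall()
for row in history:
    print(row)
```

After the run, customer 1 has two rows (the original and the superseding one), while customer 2 still has a single row whose city has been overwritten. Which mode suits a given target table depends on whether the historical perspective matters for it.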