Implementation of Data Warehousing in Online Sales Company
Snigdha Dash 1, Subhadra Mishra 2
1 M.C.A., Dept. of CSA, CPGS, OUAT, Bhubaneswar, Odisha, India
2 Assistant Professor, Dept. of CSA, CPGS, OUAT, Bhubaneswar, Odisha, India
Abstract: Data warehouse systems integrate data from heterogeneous sources and are used by decision makers to analyze the status and development of an organization. The raw data coming from the different units and sections of a sales company usually come from heterogeneous sources such as SQL Server database tables, spreadsheets, Notepad text files, or simple MS Access databases. Before such raw data can be accessed and analyzed, it has to be put into a common format with all the relationships intact. That is the premise on which the concept of data warehousing is based. In this paper we propose a model for preparing a data warehouse for a sales company using SQL Server 2008 R2. Using ETL, we extracted the heterogeneous data from the different data sources, applied the transformations needed to convert it into a common format, and finally loaded it into a data warehouse. Once the data is homogeneous, it is easy to analyse. The analysis is done by creating a multidimensional database from the relational database and examining the data from multiple angles. In the final step we present the data in a format that non-specialists can understand, such as reports, charts, and tables, giving a visual picture of the performance of the sales company.
Keywords: Data warehouse, Business Intelligence, ETL, SSIS Packages, SSIS Objects, MDDM, fact table, dimension table, schema.
I. INTRODUCTION
A data warehouse, or enterprise data warehouse, is a database used for reporting and data analysis. It is a central repository of data created by integrating data from one or more disparate sources. Data warehouses store current as well as historical data and are used to create trend reports for senior management, such as annual and quarterly comparisons. This definition of the data warehouse focuses on data storage. The source data is cleaned, transformed, catalogued and made available to managers and other business professionals for data mining, online analytical processing, market research and decision support. However, the means to retrieve and analyze data, to extract, transform and load data, and to manage the data dictionary are also considered essential components of a data warehousing system. Thus, an expanded definition of data warehousing includes business intelligence tools, tools to extract, transform and load data into the repository, and tools to manage and retrieve metadata, that is, data about data.
A data warehouse maintains a copy of information from the source transaction systems. It brings data from multiple sources together into a single database so that a single query engine can be used to present the data. It mitigates the problem of isolation-level lock contention in transaction processing systems caused by attempts to run large, long-running analysis queries in transaction processing databases. It integrates data from multiple source systems, enabling a central view across the enterprise. This benefit is always valuable, but particularly so when the organization has grown by merger. It improves data quality by providing consistent codes and descriptions and by flagging or even fixing bad data. It presents the organization's information consistently and provides a single common data model for all data of interest, regardless of the data source.
Business Intelligence enables organizations to associate sales pipeline data with financial, marketing and customer information in order to make informed, strategic decisions that improve sales effectiveness. A data warehouse restructures the data so that it makes sense to business users and delivers excellent query performance, even for complex analytic queries, without impacting the operational systems. It also adds value to operational business applications, notably customer relationship management (CRM) systems. There are three basic concepts in BI: ETL (Extract, Transform and Load), MDDM (multidimensional data modeling) and reporting. Tools available for these tasks include Pentaho, Informatica, SSIS, SSAS, SSRS, Oracle Integration Services, Oracle Analysis Services, Oracle Reporting Services, etc.
ETL
The data maintained by business units is almost always held in different formats. The formats can vary from highly sophisticated databases to flat files. The most common data stores are databases, flat files (e.g. text files) and Excel sheets. These days XML also plays an important role in data handling. The basic purpose of Integration Services is to integrate data that is available in different formats. However, it is not meant only to merge data from different sources into a single destination; Integration Services has a lot more to offer than moving data between a source and a destination. When data is stored in different formats, any means of integration involves three steps, named ETL (Extract, Transform and Load).
International Journal of Computer Trends and Technology (IJCTT), volume 4, Issue 7, July 2013
Fig.1. Conceptual Diagram of Data Warehousing
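The three ETL steps described in this subsection can be illustrated with a minimal sketch. It is not the SSIS package used in this work: it uses Python's standard library, an in-memory SQLite database standing in for both the relational source and the warehouse, and made-up sample rows. The table names `legacy_customers` and `dim_customer` and all column names are assumptions chosen for illustration only.

```python
import csv
import io
import sqlite3

# Hypothetical flat-file source (e.g. exported from a spreadsheet); sample data only.
CSV_SOURCE = """customer_id,name,country
1, alice ,india
2,Bob,USA
"""


def extract_from_csv(text):
    """Extract: read rows from a flat-file source as dictionaries."""
    return list(csv.DictReader(io.StringIO(text)))


def extract_from_database(conn):
    """Extract: read rows from a relational source and rename its columns."""
    cur = conn.execute("SELECT id, full_name, nation FROM legacy_customers")
    return [{"customer_id": r[0], "name": r[1], "country": r[2]} for r in cur]


def transform(rows):
    """Transform: bring every row into one common, cleaned format."""
    return [
        (int(row["customer_id"]),
         row["name"].strip().title(),
         row["country"].strip().upper())
        for row in rows
    ]


def load(warehouse, rows):
    """Load: write the homogeneous rows into a warehouse table."""
    warehouse.execute(
        "CREATE TABLE IF NOT EXISTS dim_customer "
        "(customer_id INTEGER PRIMARY KEY, name TEXT, country TEXT)"
    )
    warehouse.executemany("INSERT INTO dim_customer VALUES (?, ?, ?)", rows)
    warehouse.commit()


def run_etl():
    # Hypothetical relational source with its own naming conventions.
    source_db = sqlite3.connect(":memory:")
    source_db.execute(
        "CREATE TABLE legacy_customers (id INTEGER, full_name TEXT, nation TEXT)"
    )
    source_db.execute("INSERT INTO legacy_customers VALUES (3, 'carol', 'germany')")

    warehouse = sqlite3.connect(":memory:")
    rows = extract_from_csv(CSV_SOURCE) + extract_from_database(source_db)
    load(warehouse, transform(rows))
    return warehouse.execute(
        "SELECT customer_id, name, country FROM dim_customer ORDER BY customer_id"
    ).fetchall()
```

Calling `run_etl()` returns the rows of the warehouse table; rows that arrived with different column names, spacing and letter case end up in one common format.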
MDDM
The basic unit of storage and analysis in Analysis Services is the cube. A cube is a collection of data that has been aggregated to allow queries to return data quickly. A dimension table contains hierarchical data by which you would like to summarize. Each cube has one or more dimensions, each based on one or more dimension tables. A dimension represents a category for analyzing business data. Each type of summary that can be retrieved from a single dimension is called a level. A fact table contains the basic information that you wish to summarize. Every cube contains one or more measures, each based on a column in a fact table that you would like to analyze. Fact tables and dimension tables are related, which is hardly surprising, given that the dimension tables are used to group information from the fact table. The relations within a cube form a schema.
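The relationship between a fact table, dimension tables, levels and measures can be made concrete with a small sketch. It does not use Analysis Services; it builds a tiny star schema in an in-memory SQLite database and summarizes one measure at a chosen level of each dimension. The tables `fact_sales`, `dim_date` and `dim_product`, their columns, and the sample rows are all hypothetical.

```python
import sqlite3


def build_star_schema():
    """Create one fact table related to two dimension tables (a star schema)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE dim_date (date_id INTEGER PRIMARY KEY, year INTEGER, month INTEGER);
        CREATE TABLE dim_product (product_id INTEGER PRIMARY KEY,
                                  category TEXT, product_name TEXT);
        CREATE TABLE fact_sales (
            date_id INTEGER REFERENCES dim_date(date_id),
            product_id INTEGER REFERENCES dim_product(product_id),
            sales_amount REAL            -- the measure to be summarized
        );
        INSERT INTO dim_date VALUES (1, 2012, 1), (2, 2012, 2), (3, 2013, 1);
        INSERT INTO dim_product VALUES (10, 'Beverages', 'Tea'),
                                       (11, 'Beverages', 'Coffee'),
                                       (12, 'Snacks', 'Chips');
        INSERT INTO fact_sales VALUES (1, 10, 100.0), (1, 12, 50.0),
                                      (2, 11, 70.0), (3, 10, 30.0), (3, 12, 20.0);
    """)
    return conn


# Levels that may be used for grouping, per dimension.
DATE_LEVELS = {"year", "month"}
PRODUCT_LEVELS = {"category", "product_name"}


def sales_by(conn, date_level, product_level):
    """Summarize the sales_amount measure at one level of each dimension."""
    if date_level not in DATE_LEVELS or product_level not in PRODUCT_LEVELS:
        raise ValueError("unknown level")
    query = f"""
        SELECT d.{date_level}, p.{product_level}, SUM(f.sales_amount)
        FROM fact_sales AS f
        JOIN dim_date AS d ON d.date_id = f.date_id
        JOIN dim_product AS p ON p.product_id = f.product_id
        GROUP BY d.{date_level}, p.{product_level}
        ORDER BY d.{date_level}, p.{product_level}
    """
    return conn.execute(query).fetchall()
```

For example, `sales_by(build_star_schema(), "year", "category")` groups the facts by the year level of the date dimension and the category level of the product dimension. A real cube stores such summaries in advance rather than computing them per query.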
REPORTING
Reporting Services is a server-based reporting platform that provides comprehensive reporting functionality for a variety of data sources. Reporting Services includes a complete set of tools for you to create, manage, and deliver reports, and APIs that enable developers to integrate or extend data and report processing in custom applications. With Reporting Services, you can create interactive, tabular, graphical, or free-form reports from relational, multidimensional, or XML-based data sources. Reports can include rich data visualization, including charts and maps.
II. RELATED WORK
Shashank Swaroop and Manoj Kumar [3] surveyed the work done by various authors in the last few years and compared various approaches to data warehouse design from a security perspective using several parameters. Because of the fiercely competitive environment and the high stakes of making a wrong decision, it has become inevitable to design and implement Business Intelligence solutions, which have proved immensely beneficial in decision making. Azimuddin Khan, Karunesh Saxena and Jensen [9,10] attempted to develop a conceptual framework for Business Intelligence, to discuss its application to decision making, and to analyze its significance in creating business value. Oscar Romero and Alberto Abelló [4] presented a paper in which the multidimensional conceptual view of data is distinguished by the fact/dimension dichotomy and is characterized by representing data as if placed in an n-dimensional space, which makes it easy to understand and analyse data in terms of facts (the subjects of analysis) and dimensions (the different points of view from which a subject can be analysed).
Prat, N., Akoka, J., & Comyn-Wattiau and Fernandez-Medina [11,13] discussed how the automated, future-oriented analyses made possible by data warehousing move beyond the analyses of past events typically provided by history-oriented tools such as decision support systems. Data mining tools answer business questions that in the past were too time-consuming to pursue. K. Katic, G. Quirchmayr, J. Schiefer, M. Stolba and A.M. Tjoa [5] presented a paper on data mining algorithms used to extract useful information. Data mining is a powerful technology that is widely used in applications such as e-commerce, educational systems, remote sensing and online shopping systems. In that paper, they shed light on analyzing the reviews of customers purchasing different items of different brands. Phipps, C., & Davis, K. C. [1] presented a new data-mining framework that is tightly integrated with data warehousing technology. Kimball, Ralph, Reeves, L., Thornthwaite, W., & Ross [6,7] surveyed and evaluated the literature related to the data quality (DQ) dimension in every phase of data warehouse (DW) development and proposed an integrated approach for incorporating DQ in DW development in order to minimize the risk of DW project failure. Diego Calvanese [2] proposed a technique for declaratively specifying suitable reconciliation correspondences to be used in order to resolve conflicts among data in different sources.
Golfarelli, M., Maio, D., & Rizzi, S. [8] aimed to propose a comprehensive set of solutions for conceptual modelling according to the DFM and to give the designer a practical guide for applying them in the context of a design methodology. Besides the basic concepts of multidimensional modelling, the other issues discussed are descriptive and cross-dimension attributes; convergences; shared, incomplete, recursive, and dynamic hierarchies; multiple and optional arcs; and additivity.
Data warehouse (DW) design is based on a set of requirements expressed as service level agreements (SLAs) and business level objects (BLOs). Populating a DW system from a set of information sources is realized with extract-transform-load (ETL) processes based on SLAs and BLOs. The entire task is complex, time-consuming, and hard to perform manually. Romero, O., & Abelló, A. [12] therefore presented an approach to the requirement-driven creation of ETL designs. Each requirement is considered separately and a respective ETL design is produced. They propose an incremental method for consolidating these individual designs and creating an ETL design that satisfies all given requirements. Finally, the design produced is sent to an ETL engine for execution.
III. PROPOSED MODEL
The data maintained by business units is almost always held in different formats. The formats can vary from highly sophisticated databases to flat files. The data we collected online is of various types, such as database tables, flat files (e.g. text files) and Excel sheets. The basic purpose of Integration Services is to integrate data that is available in different formats. SQL Server 2008 R2 provides the SSIS (SQL Server Integration Services) tool to extract the data from the source, perform the necessary transformations, and finally load the data into the destination.
The data obtained after ETL is used for analysis. SQL Server 2008 R2 provides SSAS (SQL Server Analysis Services), with which we obtain analytical information that is helpful for the decision-making process. Here we can view the data from multiple angles using a cube. Analysis Services analyzes the data in advance, works out the daily, weekly, and monthly numbers, and stores them along with many other aggregations. This takes up plenty of disk space, but it means that when we want to explore the data we can do it quickly.
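The idea of computing daily, weekly, and monthly numbers in advance can be sketched in a few lines. This is only an illustration of pre-aggregation in plain Python, not how Analysis Services stores its aggregations internally; the function name `precompute_aggregates` and the sample sales rows are assumptions.

```python
from collections import defaultdict
from datetime import date


def precompute_aggregates(sales):
    """Sum sale amounts once per day, ISO week, and month.

    `sales` is an iterable of (date, amount) pairs. The result maps each
    level name to a dictionary of precomputed totals, so later lookups
    do not need to rescan the detail rows.
    """
    aggregates = {
        "daily": defaultdict(float),
        "weekly": defaultdict(float),
        "monthly": defaultdict(float),
    }
    for day, amount in sales:
        iso = day.isocalendar()  # (ISO year, ISO week number, weekday)
        aggregates["daily"][day] += amount
        aggregates["weekly"][(iso[0], iso[1])] += amount
        aggregates["monthly"][(day.year, day.month)] += amount
    return aggregates


# Made-up detail rows for illustration.
SALES = [
    (date(2013, 7, 1), 100.0),
    (date(2013, 7, 1), 50.0),
    (date(2013, 7, 3), 25.0),
    (date(2013, 7, 9), 10.0),
]

AGGREGATES = precompute_aggregates(SALES)
```

After this one pass, a question such as "total sales in July 2013" is answered by the single lookup `AGGREGATES["monthly"][(2013, 7)]`. The trade-off is the one stated above: extra storage for the stored totals in exchange for fast exploration.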
This analytical information also helps in creating reports that present a graphical view of the query results and support manager-level decisions. For this purpose SQL Server 2008 R2 provides the SSRS (SQL Server Reporting Services) tool to create, deploy, and manage reports for the organization, as well as programming features that enable us to extend and customize the reporting functionality. With Reporting Services, we can create interactive, tabular, graphical, or free-form reports from relational, multidimensional, or XML-based data sources. Reports can include rich data visualization, including charts, maps, and sparklines.
The term Business Intelligence (BI) represents the tools and systems that play a key role in the strategic planning process of the corporation. These systems allow a company to gather, store, access and analyse corporate data to aid in decision- making. Often BI applications use data gathered from a data warehouse or a data mart.
Fig.2. Working of Proposed System
Through ETL we extract the data from the heterogeneous sources, perform the transformations needed to make the data meaningful and suitable for analytics, and finally load the data into the data warehouse, where it remains in a homogeneous format, i.e. database tables or objects.
Fig.3. Extraction, Transformation and Loading of Customer data to SQL Server 2008 R2
Flowchart of the proposed system: data in heterogeneous format; applying SSIS tool to integrate the data; data converted to homogeneous format; data made available for analysis services; applying SSAS tool to produce analytical information; create the cube through which we can view the data from multiple angles; using the cube data, reports are made; applying SSRS tool to make reports; reports are made according to the requirement sheet given.
After the data warehouse has been created, the next step is to analyze the data measures from multiple angles (dimensions) so that we can obtain the analytical information needed for decision making. This is the task of data analytics.
Fig.5. Multi Dimensional Data Cube in the form of Schema
Fig.6. Multidimensional Data Cube
Finally, to track organizational performance and present it meaningfully, we use Reporting Services. With reports, charts, bar graphs, KPIs (key performance indicators), corporate dashboards, etc., all the organizational information can be represented in a more effective, easily interpreted way, which makes decision making and performance tracking more efficient.
Fig 2 ShipCountry wise Shipper Summary
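A tabular summary of this kind can be approximated outside SSRS with a short sketch. It groups order rows by ship country and shipper and renders a plain-text table. The field names (`ship_country`, `shipper`, `freight`), the choice of order count and freight total as the summarized values, and the sample rows are all assumptions for illustration; they are not taken from the report in the figure.

```python
from collections import defaultdict

# Made-up order rows for illustration.
ORDERS = [
    {"ship_country": "Germany", "shipper": "Shipper A", "freight": 32.5},
    {"ship_country": "Germany", "shipper": "Shipper A", "freight": 11.0},
    {"ship_country": "Germany", "shipper": "Shipper B", "freight": 8.0},
    {"ship_country": "Brazil", "shipper": "Shipper B", "freight": 20.0},
]


def shipper_summary(orders):
    """Count orders and total freight per (ship country, shipper) pair."""
    totals = defaultdict(lambda: {"orders": 0, "freight": 0.0})
    for order in orders:
        key = (order["ship_country"], order["shipper"])
        totals[key]["orders"] += 1
        totals[key]["freight"] += order["freight"]
    # Sort by country, then shipper, so the report has a stable row order.
    return [
        (country, shipper, totals[(country, shipper)]["orders"],
         totals[(country, shipper)]["freight"])
        for country, shipper in sorted(totals)
    ]


def render_report(rows):
    """Render the summary rows as a fixed-width text table."""
    lines = [f"{'ShipCountry':<12}{'Shipper':<12}{'Orders':>8}{'Freight':>10}"]
    for country, shipper, count, freight in rows:
        lines.append(f"{country:<12}{shipper:<12}{count:>8}{freight:>10.2f}")
    return "\n".join(lines)
```

`print(render_report(shipper_summary(ORDERS)))` prints one header line followed by one line per country and shipper combination. A reporting tool adds what this sketch lacks: charts, parameters, scheduling and delivery.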
IV. CONCLUSIONS
Data warehousing has a very broad scope in industrial decision making and performance analysis. In this project the basic business intelligence functions have been covered as per the business requirements. The output of the three modules was checked for errors using queries in SSMS (SQL Server Management Studio). In the ETL module, the data warehouse for the sales company was successfully created. In the analysis module, the cube was successfully designed and provides accurate analytical data. Using SSRS, the necessary reports were created to facilitate industrial decision making. These functionalities can be extended further, for example by adding data mining algorithms for analysis and by adding dashboards, KPIs, etc. as reporting enhancements. As future scope for the ETL module, more data sources can be specified so that the database structure can grow; ETL can then be applied to the new data to add further tables to the data warehouse. As future scope for the MDDM module, more calculated measures can be added to the cube, which will enhance the analytics. As future scope for the reporting module, more KPIs and dashboards can be added, which will provide more support for decision making.
REFERENCES
[1] Phipps, C., & Davis, K. C. (2002). Automating Data Mining Conceptual Schema Design and Evaluation. In L. V. S. Lakshmanan (Ed.), Proceedings of 4th International Workshop on Design and Management of Data Warehouses; pp 23-32, Toronto, Canada: CEUR-WS.org.
[2] Diego Calvanese, Data Integration in Data Warehousing, International Journal of Cooperative Information Systems, Vol. 10, No. 3 (2001) 237-271.
[3] Golfarelli, M., Maio, D., and Rizzi, S., The Dimensional Fact Model: A Conceptual Model for Data Warehouses. Int. Journal of Cooperative Information Systems (IJCIS), (1998) (2-3): 215-247.
[4] Oscar Romero and Alberto Abelló et al., Access Control and Audit Model for the Multidimensional Modeling of Data Decision Support Systems, 2006. 42: p. 1270-1289.
[5] K. Katic, G. Quirchmayr, J. Schiefer, M. Stolba, A.M. Tjoa, A prototype model for Data Warehouse security based on metadata, in: Proceedings of the 9th International Workshop on Database and Expert Systems Applications (DEXA 98), Vienna, Austria, 1998, pp. 300-309.
[6] Kimball, R., Reeves, L., Thornthwaite, W., & Ross, M. (1998). The Data Warehouse Lifecycle Toolkit: Expert Methods for Designing, Developing and Deploying Data Warehouses. John Wiley & Sons, Inc.
[7] Kimball, Ralph and Ross, Margy, 2002, The Data Warehouse Toolkit, John Wiley and Sons Inc., Second Edition, p. 16.
[8] Golfarelli, M., Maio, D., & Rizzi, S. (1998a). The Dimensional Fact Model: A Conceptual Model for Data Warehouses. International Journal of Cooperative Information Systems, 7(2-3):215-247.
[9] Jensen, M. R., Holmgren, T., & Pedersen, T. B. (2004). Discovering Multidimensional Structure in Relational Data. In Y. Kambayashi, M. K. Mohania, W. W (Eds.), Proceedings of 6th International Conference on Data Warehousing and Knowledge Discovery; Vol. 3181, Lecture Notes in Computer Science (pp 138-148). Zaragoza, Spain: Springer.
[10] Azimuddin Khan, Karunesh Saxena, J. (2007). Reconciling Requirement-Driven Data Warehouses with Data Sources Via Multidimensional Normal Forms. Data & Knowledge Engineering, 23(3):725-751.
[11] Prat, N., Akoka, J., & Comyn-Wattiau, I. (2006). A UML-based Data Warehouse Design Method. Decision Support Systems, 42(3):1449-1473.
[12] Romero, O., & Abelló, A. (in press). MDBE: Automatic Multidimensional Modeling. In Proceedings of 27th Int. Conf. on Conceptual Modeling.
[13] Fernandez-Medina, E., et al., Access Control and Audit Model for the Multidimensional Modeling of Data Decision Support Systems, 2006. 42: p. 1270-1289.