
International Journal of Computer Trends and Technology (IJCTT), volume 4, Issue 7, July 2013

ISSN: 2231-2803 http://www.ijcttjournal.org Page 2330



Implementation of Data Warehousing in Online Sales Company

Snigdha Dash¹, Subhadra Mishra²

¹ M.C.A., Dept. of CSA, CPGS, OUAT, Bhubaneswar, Odisha, India
² Assistant Professor, Dept. of CSA, CPGS, OUAT, Bhubaneswar, Odisha, India


Abstract: Data warehouse systems integrate data from
heterogeneous sources and are used by decision makers to
analyze the status and development of an organization. The
raw data produced by the different units and sections of a sales
company usually resides in heterogeneous sources such as SQL
Server database tables, spreadsheets, Notepad text files or simple
MS Access databases. Before this raw data can be accessed and
analyzed, it has to be put into a common format with all the
relationships intact. That is the premise on which the concept of
data warehousing is based. In this paper we propose a model for
building a data warehouse for a sales company using SQL Server
2008 R2. Using ETL, we extract the heterogeneous data from the
different data sources, apply the transformations needed to
convert it into a common format, and finally load it into a data
warehouse. Once the data is homogeneous it is easy to analyze.
The analysis is done by creating a multidimensional database
from the relational database and examining the data from
multiple angles. In the final step we present the data in a form
that a non-specialist can understand, such as reports, charts or
tables, giving a visual picture of the performance of the sales
company.

Keywords: Data warehouse, Business Intelligence, ETL, SSIS
Packages, SSIS Objects, MDDM, fact table, dimension table,
schema.
I. INTRODUCTION
A data warehouse, or enterprise data warehouse, is a database
used for reporting and data analysis. It is a central repository
of data created by integrating data from one or more disparate
sources. Data warehouses store current as well as historical
data and are used to create trend reports for senior management,
such as annual and quarterly comparisons. This definition of the
data warehouse focuses on data storage. The source data is
cleaned, transformed, catalogued and made available for use
by managers and other business professionals for data
mining, online analytical processing, market research and
decision support.
However, the means to retrieve and analyze data, to extract,
transform and load data, and to manage the data dictionary are
also considered essential components of a data warehousing
system. Thus, an expanded definition for data warehousing
includes business intelligence tools, tools to extract, transform
and load data into the repository, and tools to manage and
retrieve metadata, that is, data about data.
A data warehouse maintains a copy of information from the
source transaction systems. It consolidates data from
multiple sources into a single database so that a single query
engine can be used to present data. It mitigates the problem of
isolation-level lock contention in transaction processing
systems, which is caused by attempts to run large, long-running
analysis queries against transaction processing databases.
It integrates data from multiple source systems, enabling a
central view across the enterprise. This benefit is always
valuable, but particularly so when the organization has grown by
merger. It improves data quality by providing consistent codes
and descriptions and by flagging or even fixing bad data. It
presents the organization's information consistently and also
provides a single common data model for all data of interest,
regardless of the data source.
Business Intelligence enables organizations to associate sales
pipeline data with financial, marketing and customer
information to make informed, strategic decisions that improve
sales effectiveness. A data warehouse restructures the data so
that it makes sense to business users and delivers excellent
query performance, even for complex analytic queries, without
impacting the operational systems. It also adds value to
operational business applications, notably customer relationship
management (CRM) systems.
There are three basic concepts in BI: ETL (Extract, Transform
and Load), MDDM (Multidimensional Data Modeling) and
Reporting. Tools that support them include Pentaho, Informatica,
SSIS, SSAS, SSRS, Oracle Integration Services, Oracle
Analysis Services, Oracle Reporting Services etc.
ETL

The data maintained by different business units is almost always
in different formats. The formats can vary from highly
sophisticated databases to flat files. The most common data
stores are databases, flat files (e.g. text files) and Excel
sheets. These days XML also plays an important role in data
handling. The basic purpose of Integration Services is to
integrate data that is available in different formats. However,
it is not meant only to merge data from different sources into a
single destination; Integration Services has a lot more to offer.
It is not only about source and destination as far as data transfer is
concerned. When data is stored in different formats, any means
of integration involves three steps, together named ETL (Extract,
Transform and Load).
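
As an illustration only, the three ETL steps can be sketched in a few lines of Python. This sketch is not the SSIS package used in this work: it uses Python's standard csv and sqlite3 modules as stand-ins for the integration tool and the warehouse, and the two sources, their column names and the customer table are hypothetical.

```python
import csv
import io
import sqlite3

# Hypothetical source 1: a flat file (CSV text) with its own column names.
CSV_SOURCE = """cust_id,full_name,country
1,alice smith,india
2,Bob Jones,USA
"""

# Hypothetical source 2: rows as they might come from a spreadsheet export.
SHEET_SOURCE = [
    {"CustomerID": "3", "Name": " Carol White ", "Country": "uk"},
]


def extract():
    """Extract: read every source and yield rows keyed by common names."""
    for row in csv.DictReader(io.StringIO(CSV_SOURCE)):
        yield {"id": row["cust_id"], "name": row["full_name"], "country": row["country"]}
    for row in SHEET_SOURCE:
        yield {"id": row["CustomerID"], "name": row["Name"], "country": row["Country"]}


def transform(rows):
    """Transform: cast types and normalise text so all rows share one format."""
    for row in rows:
        yield (int(row["id"]), row["name"].strip().title(), row["country"].strip().upper())


def load(conn, rows):
    """Load: write the homogeneous rows into a single destination table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS customer (id INTEGER PRIMARY KEY, name TEXT, country TEXT)"
    )
    conn.executemany("INSERT INTO customer VALUES (?, ?, ?)", rows)
    conn.commit()


# An in-memory SQLite database stands in for the warehouse.
conn = sqlite3.connect(":memory:")
load(conn, transform(extract()))
customers = conn.execute("SELECT id, name, country FROM customer ORDER BY id").fetchall()
print(customers)
```

Keeping the three steps as separate functions mirrors the separation between sources, transformations and destination: a new source only needs another loop in extract, while the transformation and load logic stay unchanged.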



Fig.1. Conceptual Diagram of Data Warehousing


MDDM

The basic unit of storage and analysis in Analysis Services is
the cube. A cube is a collection of data that's been aggregated
to allow queries to return data quickly. A dimension table
contains hierarchical data by which you'd like to summarize.
Each cube has one or more dimensions, each based on one or
more dimension tables. A dimension represents a category for
analyzing business data. Each type of summary that can be
retrieved from a single dimension is called a level. A fact table
contains the basic information that you wish to summarize.
Every cube will contain one or more measures, each based on
a column in a fact table that you'd like to analyze. Fact tables
and dimension tables are related, which is hardly surprising,
given that you use the dimension tables to group information
from the fact table. The relations within a cube form a schema.
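
The relationship between fact table, dimension tables, levels and measures can be made concrete with a small Python sketch. It is only an in-memory illustration, not an Analysis Services cube: the tables, keys, level names and sales amounts below are all made up, and roll_up is a hypothetical helper.

```python
from collections import defaultdict

# Dimension tables: each key maps to hierarchical attributes (levels).
dim_product = {
    1: {"product": "Laptop", "category": "Electronics"},
    2: {"product": "Phone", "category": "Electronics"},
    3: {"product": "Desk", "category": "Furniture"},
}
dim_date = {
    20130701: {"month": "2013-07", "year": 2013},
    20130815: {"month": "2013-08", "year": 2013},
}

# Fact table: foreign keys into the dimensions plus a numeric measure.
fact_sales = [
    {"product_key": 1, "date_key": 20130701, "amount": 900.0},
    {"product_key": 2, "date_key": 20130701, "amount": 400.0},
    {"product_key": 3, "date_key": 20130815, "amount": 150.0},
    {"product_key": 1, "date_key": 20130815, "amount": 950.0},
]


def roll_up(facts, product_level, date_level):
    """Sum the 'amount' measure grouped by one level from each dimension."""
    totals = defaultdict(float)
    for fact in facts:
        p = dim_product[fact["product_key"]][product_level]
        d = dim_date[fact["date_key"]][date_level]
        totals[(p, d)] += fact["amount"]
    return dict(totals)


# The same facts summarized at two different levels of the date dimension.
by_category_month = roll_up(fact_sales, "category", "month")
by_category_year = roll_up(fact_sales, "category", "year")
print(by_category_month)
print(by_category_year)
```

Changing only the level names passed to roll_up views the same measure from a different angle, which is the idea behind grouping fact data by dimension levels.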

REPORTING

Reporting Services is a server-based reporting platform that
provides comprehensive reporting functionality for a variety
of data sources. Reporting Services includes a complete set of
tools for you to create, manage, and deliver reports, and APIs
that enable developers to integrate or extend data and report
processing in custom applications. With Reporting Services,
you can create interactive, tabular, graphical, or free-form
reports from relational, multidimensional, or XML-based data
sources. Reports can include rich data visualization, including
charts and maps.




II. RELATED WORK
Shashank Swaroop and Manoj Kumar [3] surveyed the work done by
various authors in the last few years and compared various
approaches to data warehouse design from a security perspective
using several parameters. Due to the fiercely competitive
environment and the high stakes of making a wrong decision, it
has become inevitable to design and implement Business
Intelligence solutions, which have proved immensely beneficial in
decision making. Azimuddin Khan, Karunesh Saxena and Jensen
[9,10] attempted to develop a conceptual framework for Business
Intelligence, to discuss its application to decision making, and
to analyze its significance in creating business value. Oscar
Romero and Alberto Abelló [4] presented a paper in which the
multidimensional conceptual view of data is distinguished by
the fact/dimension dichotomy and is characterized by
representing data as if placed in an n-dimensional space, which
makes it easy to understand and analyze data in terms of facts
(the subjects of analysis) and dimensions (the different points
of view from which a subject can be analyzed). Prat, N.,
Akoka, J., & Comyn-Wattiau and Fernandez-Medina [11,13]
discussed how the automated, future-oriented analyses made
possible by data warehousing move beyond the analyses of past
events typically provided by history-oriented tools such as
decision support systems. Data mining tools answer business
questions that in the past were too time-consuming to pursue.
K. Katic, G. Quirchmayr, J. Schiefer, M. Stolba and A.M. Tjoa [5]
worked on data mining algorithms used to extract useful
information. Data mining is a powerful technology that is
widely used in applications such as e-commerce, educational
systems, remote sensing and online shopping systems. In their
paper they shed light on analyzing the reviews of customers
purchasing different items of different brands. Phipps, C., &
Davis, K. C. [1] presented a new data-mining framework that is
tightly integrated with data warehousing technology. Kimball,
Ralph, Reeves, L., Thornthwaite, W., & Ross [6,7] surveyed and
evaluated the literature related to the DQ (Data Quality)
dimension in every phase of DW (Data Warehouse) development and
proposed an integrated approach for incorporating DQ in DW
development in order to minimize the risk of DW project failure.
Diego Calvanese [2] proposed a technique for declaratively
specifying suitable reconciliation correspondences to be used in
order to resolve conflicts among data in different sources.
Golfarelli, M., Maio, D., & Rizzi, S. [8] aimed to propose a
comprehensive set of solutions for conceptual modelling
according to the Dimensional Fact Model (DFM) and to give the
designer a practical guide for applying them in the context of a
design methodology. Besides the basic concepts of
multidimensional modelling, the other issues discussed are
descriptive and cross-dimension attributes; convergences;
shared, incomplete, recursive, and dynamic hierarchies; multiple
and optional arcs;
and additivity. Data warehouse (DW) design is based on a set
of requirements expressed as service level agreements (SLAs)
and business level objects (BLOs). Populating a DW system
from a set of information sources is realized with
extract-transform-load (ETL) processes based on SLAs and BLOs.
The entire task is complex, time consuming, and hard to
perform manually. Romero, O., & Abelló, A. [12] therefore
present an approach to the requirement-driven creation of ETL
designs. Each requirement is considered separately and a
respective ETL design is produced. They propose an incremental
method for consolidating these individual designs into an ETL
design that satisfies all given requirements. Finally, the design
produced is sent to an ETL engine for execution.
III. PROPOSED MODEL
The data maintained by different business units is almost always
in different formats. The formats can vary from highly
sophisticated databases to flat files. The data we collected
online comes in various forms such as database tables, flat
files (e.g. text files) and Excel sheets. The basic purpose of
Integration Services is to integrate data that is available in
different formats. SQL Server 2008 R2 provides the SSIS (SQL
Server Integration Services) tool to extract the data from the
source, apply the necessary transformations and finally load it
into the destination. The data obtained after ETL is used for
analysis. SQL Server 2008 R2 provides SSAS (SQL Server
Analysis Services), with which we obtain analytical information
that is helpful in the decision making process. Here we can view
the data from multiple angles using a cube. Analysis Services
processes the data in advance, works out the daily, weekly and
monthly figures, and stores them along with many other
aggregations. This takes up plenty of disk space, but it means
that when we want to explore the data we can do so quickly. This
analytical information also helps in creating reports that
present a graphical view of the query results and support
manager-level decisions. For this, SQL Server 2008 R2 provides
the SSRS (SQL Server Reporting Services) tool to create, deploy,
and manage reports for the organization, as well as programming
features that enable us to extend and customize the reporting
functionality. With Reporting Services, we can create
interactive, tabular, graphical, or free-form reports from
relational, multidimensional, or XML-based data sources. Reports
can include rich data visualization, including charts, maps, and
sparklines.
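
The trade-off between disk space and query speed described above can be sketched in Python. This is a simplified illustration of precomputing aggregates, not how Analysis Services stores them internally: the order rows are made up, the week key uses the ISO week number as an assumption, and grain_keys and precompute are hypothetical helpers.

```python
from collections import defaultdict
from datetime import date

# Made-up order rows: (order date, sales amount).
orders = [
    (date(2013, 7, 1), 100.0),
    (date(2013, 7, 2), 50.0),
    (date(2013, 7, 9), 25.0),
    (date(2013, 8, 1), 75.0),
]


def grain_keys(day):
    """Return the daily, weekly (ISO year and week) and monthly keys for a date."""
    iso_year, iso_week, _ = day.isocalendar()
    return {
        "daily": day.isoformat(),
        "weekly": f"{iso_year}-W{iso_week:02d}",
        "monthly": f"{day.year}-{day.month:02d}",
    }


def precompute(rows):
    """Build every aggregate once, ahead of any query."""
    store = {
        "daily": defaultdict(float),
        "weekly": defaultdict(float),
        "monthly": defaultdict(float),
    }
    for day, amount in rows:
        for grain, key in grain_keys(day).items():
            store[grain][key] += amount
    return store


aggregates = precompute(orders)

# A query is now a dictionary lookup instead of a scan over all orders.
print(aggregates["monthly"]["2013-07"])
print(aggregates["weekly"]["2013-W27"])
```

Every extra grain adds another stored table of totals, which is where the additional disk space goes; in exchange, each question about a day, week or month is answered without touching the detailed rows.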

The term Business Intelligence (BI) represents the tools and
systems that play a key role in the strategic planning process
of the corporation. These systems allow a company to gather,
store, access and analyse corporate data to aid in decision-
making. Often BI applications use data gathered from a data
warehouse or a data mart.

Fig.2. Working of Proposed System

Through ETL we extract the data from the heterogeneous
sources, apply the transformations needed to make the data
meaningful and suitable for analytics, and finally load the data
into the data warehouse, where it remains in a homogeneous
format, i.e. database tables or objects.



Fig.3. Extraction, Transformation and Loading of Customer
data to SQL Server 2008 R2

[Fig.2 flowchart text: Data in heterogeneous format -> Applying SSIS tool to integrate the data -> Data converted to homogeneous format -> Data made available for analysis services -> Applying SSAS tool to produce analytical information -> Create the cube through which we can view the data from multiple angles -> Using the cube data, reports are made -> Applying SSRS tool to make reports -> Reports are made according to the requirement sheet given]



Fig.4. ETL of Employee Data to SQL Server 2008 R2

After the data warehouse has been created, the next step is to
analyze the measures from multiple angles (dimensions) so that
we obtain the analytical information needed for decision
making. This is the task of data analytics.

Fig.5. Multi Dimensional Data Cube in the form of Schema

Fig.6. Multidimensional Data Cube
Finally, for tracking organizational performance and presenting
it meaningfully, we use Reporting Services. Using reports,
charts, bar graphs, KPIs (key performance indicators), corporate
dashboards etc., all the organizational information can be
represented in a way that is easy to interpret, which makes
decision making and performance tracking more efficient.



Fig.7. ShipCountry-wise Shipper Summary



Fig.8. Product Category-wise Summary

IV. CONCLUSIONS
Data warehousing has a very broad scope in industrial decision
making and performance analysis. In this project the basic
business intelligence functions have been covered as per the
business requirements.
The data produced by the three modules has been checked for
errors using SSMS queries. In the ETL module the data
warehouse for the sales company has been successfully
created. In the analysis module the cube has been successfully
designed and provides accurate analytical data. Using SSRS
the necessary reports have been created to facilitate industrial
decision making. These functionalities can be extended further,
for example by adding data mining algorithms for analysis and by
adding dashboards and KPIs as reporting enhancements. As future
scope for the ETL module, more data sources can be specified so
that the database structure can grow; ETL can then be applied to
the new data to add tables to the data warehouse. As future
scope for the MDDM module, more calculated measures can be added
to the cube to enhance the analytics. As future scope for the
Reporting module, more KPIs and dashboards can be added to
provide more support for decision making.


REFERENCES

[1] Phipps, C., & Davis, K. C. (2002). Automating Data Mining Conceptual
Schema Design and Evaluation. In L. V. S. Lakshmanan (Ed.), Proceedings of
4th International Workshop on Design and Management of Data Warehouses,
pp. 23-32, Toronto, Canada: CEUR-WS.org.

[2] Diego Calvanese, Data Integration in Data Warehousing, International
Journal of Cooperative Information Systems, Vol. 10, No. 3 (2001) 237-271.

[3] Golfarelli, M., Maio, D., and Rizzi, S., The Dimensional Fact Model: A
Conceptual Model for Data Warehouses. Int. Journal of Cooperative
Information Systems (IJCIS), (1998) (2-3): 215-247.

[4] Oscar Romero and Alberto Abelló et al., Access Control and Audit
Model for the Multidimensional Modeling of Data Decision Support Systems,
2006. 42: p. 1270-1289.

[5] K. Katic, G. Quirchmayr, J. Schiefer, M. Stolba, A.M. Tjoa, A prototype
model for Data Warehouse security based on metadata, in: Proceedings of the
9th International Workshop on Database and Expert Systems Applications
(DEXA 98), Vienna, Austria, 1998, pp. 300-309.

[6] Kimball, R., Reeves, L., Thornthwaite, W., & Ross, M. (1998). The Data
Warehouse Lifecycle Toolkit: Expert Methods for Designing, Developing and
Deploying Data Warehouses. John Wiley & Sons, Inc.

[7] Kimball, Ralph and Ross, Margy, 2002, The Data Warehouse Toolkit,
John Wiley and Sons Inc., Second Edition, p. 16.

[8] Golfarelli, M., Maio, D., & Rizzi, S. (1998a). The Dimensional Fact Model:
A Conceptual Model for Data Warehouses. International Journal of
Cooperative Information Systems, 7(2-3):215-247.

[9] Jensen, M. R., Holmgren, T., & Pedersen, T. B. (2004). Discovering
Multidimensional Structure in Relational Data. In Y. Kambayashi, M. K.
Mohania, W. W (Eds.), Proceedings of 6th International Conference on
Data Warehousing and Knowledge Discovery; Vol. 3181, Lecture Notes in
Computer Science (pp. 138-148). Zaragoza, Spain: Springer.

[10] Azimuddin Khan, Karunesh Saxena, J. (2007). Reconciling Requirement-
Driven Data Warehouses with Data Sources Via Multidimensional Normal
Forms. Data & Knowledge Engineering, 23(3):725-751.

[11] Prat, N., Akoka, J., & Comyn-Wattiau, I. (2006). A UML-based Data
Warehouse Design Method. Decision Support Systems, 42(3):1449-1473.

[12] Romero, O., & Abelló, A. (in press). MDBE: Automatic Multidimensional
Modeling. In Proceedings of 27th Int. Conf. on Conceptual Modeling.

[13] Fernandez-Medina, E., et al., Access Control and Audit Model for the
Multidimensional Modeling of Data Decision Support Systems, 2006. 42: p.
1270-1289.
