INTRODUCTION
architectures. The real goal of reporting systems is decision support or its modern
equivalent, business intelligence: to help people make better, more intelligent decisions.
In order to achieve the above characteristics, which are the main requirements of the
present Information Systems department, Data Warehousing and Mining is preferred over
A Data Warehouse is a database of data gathered from many systems and intended
These patterns and relationships in the data can help in making better business
decisions.
1 Email: chinna_chetan05@yahoo.com
Visit: www.geocities.com/chinna_chetan05/forfriends.html
• Subject oriented: Data Warehouse deals with all the subjects of corporate data.
• Time variant: Historical data with chronological importance, i.e., historical data
This is one of the main differences between an ordinary database and a Data Warehouse.
• To give end users access to their data without a reliance on reports produced by
Intelligence.
[Figure: warehousing architecture. Store 1 (OLTP) to Store 2 (Integration Layer, DWH) to Store 3 (HPQS, DATA MART) to Store 4 (OLAP).]
As shown in the architecture, the approach to reporting systems and data warehousing is
built around:
System (HPQS)
• Three Data flow paths between the four data structures. They are
architecture, i.e., the architecture is divided into three phases:
1. Extract phase
2. Transform phase
3. Loading phase
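The three phases can be pictured as three functions applied one after another. The following is only a minimal sketch under stated assumptions: the function names, the in-memory lists standing in for the data stores, and the sample sales rows are all invented for illustration and are not part of the architecture itself.

```python
# Hypothetical stand-in for DATA STORE1: rows held by a source (OLTP) system.
SOURCE_ROWS = [
    {"cust_no": "C-17", "amount": "120.50"},
    {"cust_no": "C-02", "amount": "80.00"},
]


def extract(source_rows):
    """Extract phase: copy rows out of the source system unchanged."""
    return [dict(row) for row in source_rows]


def transform(staged_rows):
    """Transform phase: convert the staged rows into the required form."""
    return [
        {"cust_no": row["cust_no"], "amount": float(row["amount"])}
        for row in staged_rows
    ]


def load(warehouse_rows):
    """Loading phase: summarize warehouse rows into a query-friendly mart."""
    mart = {}
    for row in warehouse_rows:
        mart[row["cust_no"]] = mart.get(row["cust_no"], 0.0) + row["amount"]
    return mart


def run_etl(source_rows):
    """Run the three phases in order."""
    return load(transform(extract(source_rows)))


sales_by_customer = run_etl(SOURCE_ROWS)
```

Keeping each phase in its own function mirrors the way the phases are described separately in the sections that follow.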
EXTRACT PHASE
This phase is the process of transferring data from DATA STORE1 to DATA STORE2.
system or systems that will provide data to the warehouse, e.g., sales, accounting, and
distribution. Each of these systems has a
database of information that end users need access to. Frequently, they need access to data
There are different mechanisms for extracting that data out of its sources. This is called
The data from different source systems may be in different formats, such as Oracle,
FoxPro, or MS Access. These data should be integrated and transferred to
DATA STORE2.
The art of determining what records to extract from the source system is frequently called
change data capture. The point of change data capture is to recognize what source
records have changed and how, so that just the changed records are moved to the
warehouse. Deletes can be very difficult to recognize because they leave no trace. The
deleted record is just gone. If you need to recognize deletes, you may have to jump
through a few hoops to figure out if records have been deleted from your data source.
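One such hoop, sketched here under assumptions, is to keep the list of keys loaded last time and compare it with the keys present in the source today. The function name and the sample key values are hypothetical.

```python
def find_deleted_keys(previously_loaded_keys, current_source_keys):
    """Return keys the warehouse has seen before that the source no longer holds."""
    return sorted(set(previously_loaded_keys) - set(current_source_keys))


# Hypothetical key lists: what was loaded last time vs. what the source holds now.
loaded_last_time = [101, 102, 103, 104]
in_source_today = [101, 103, 104, 105]

deleted = find_deleted_keys(loaded_last_time, in_source_today)
```

Here key 102 is reported as deleted, while key 105 is new and would be picked up as an insert by whichever extraction technique is in use.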
• Timestamps: The lucky among us extract data from systems that timestamp
records whenever they are inserted or updated. In these situations, change data
triggers on the source tables. Every time a record is inserted into, updated in,
log file. The warehouse uses the information in these log files to determine how to
update itself.
• File Compares: Probably the least desirable technique for identifying changes in
your source data is to compare the file as it appears today to a copy of how it
appeared when you last loaded the warehouse. Not only is this technique difficult
to implement, but it’s also less accurate than some of the other techniques, because it compares only “snapshots.” Thus, if you
load your warehouse weekly, you’ll only be able to see the new state of the
database every week, but not every change that occurred during the week.
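The difference between trigger-based logging and snapshot comparison can be seen in a small experiment. The sketch below uses Python's built-in sqlite3 module as a stand-in source system; the table names, column names, and sample rows are assumptions, and the triggers write to a log table rather than a log file purely to keep the example self-contained.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE customer (cust_no INTEGER PRIMARY KEY, city TEXT);
    CREATE TABLE change_log (
        seq INTEGER PRIMARY KEY AUTOINCREMENT,
        action TEXT,
        cust_no INTEGER
    );
    CREATE TRIGGER customer_ins AFTER INSERT ON customer
    BEGIN
        INSERT INTO change_log (action, cust_no) VALUES ('INSERT', NEW.cust_no);
    END;
    CREATE TRIGGER customer_upd AFTER UPDATE ON customer
    BEGIN
        INSERT INTO change_log (action, cust_no) VALUES ('UPDATE', NEW.cust_no);
    END;
    CREATE TRIGGER customer_del AFTER DELETE ON customer
    BEGIN
        INSERT INTO change_log (action, cust_no) VALUES ('DELETE', OLD.cust_no);
    END;
    """
)

conn.execute("INSERT INTO customer VALUES (1, 'Pune')")
conn.execute("INSERT INTO customer VALUES (2, 'Delhi')")

# Simulate the previous warehouse load: keep a snapshot, then empty the log.
snapshot_before = dict(conn.execute("SELECT cust_no, city FROM customer"))
conn.execute("DELETE FROM change_log")

# Changes made in the source system between two loads.
conn.execute("UPDATE customer SET city = 'Mumbai' WHERE cust_no = 1")
conn.execute("UPDATE customer SET city = 'Chennai' WHERE cust_no = 1")
conn.execute("DELETE FROM customer WHERE cust_no = 2")
conn.execute("INSERT INTO customer VALUES (3, 'Agra')")

# Trigger technique: every individual change is recorded, in order.
logged_changes = conn.execute(
    "SELECT action, cust_no FROM change_log ORDER BY seq"
).fetchall()

# File-compare technique: only the before and after states are visible.
snapshot_after = dict(conn.execute("SELECT cust_no, city FROM customer"))
snapshot_diff = {
    "inserted": sorted(snapshot_after.keys() - snapshot_before.keys()),
    "deleted": sorted(snapshot_before.keys() - snapshot_after.keys()),
    "updated": sorted(
        key
        for key in snapshot_before.keys() & snapshot_after.keys()
        if snapshot_before[key] != snapshot_after[key]
    ),
}
```

The log holds four entries, including both updates to customer 1, whereas the snapshot comparison reports customer 1 as changed only once and cannot tell that it passed through an intermediate value.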
TRANSFORM PHASE
After the Extract phase, the data is present in DATA STORE2. The Transform phase
is where this data is transformed into the required form within DATA STORE2.
4. Normalization
Step 1:
The data in DATA STORE2 comes from the different source systems of DATA STORE1,
so the data is heterogeneous. Hence it should be converted to homogeneous
data. This is the reason why DATA STORE2 is called the Integration Layer or
Warehouse.
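As an illustration of this step, the sketch below maps rows from two differently shaped sources onto one common record layout. The column names and date formats attributed to each source are assumptions chosen for the example; they do not describe the actual export formats of any product.

```python
from datetime import datetime


def from_first_source(row):
    """Assumed layout: upper-case column names and ISO dates (YYYY-MM-DD)."""
    return {
        "customer_no": int(row["CUST_NO"]),
        "name": row["CUST_NAME"].strip().title(),
        "joined": datetime.strptime(row["JOIN_DT"], "%Y-%m-%d").date(),
    }


def from_second_source(row):
    """Assumed layout: mixed-case column names and day/month/year dates."""
    return {
        "customer_no": int(row["CustomerNumber"]),
        "name": row["FullName"].strip().title(),
        "joined": datetime.strptime(row["Joined"], "%d/%m/%Y").date(),
    }


# Hypothetical sample rows from two source systems.
first_rows = [{"CUST_NO": "17", "CUST_NAME": "ASHA RAO ", "JOIN_DT": "2004-03-09"}]
second_rows = [{"CustomerNumber": 42, "FullName": "ravi kumar", "Joined": "25/12/2003"}]

# After conversion every record has the same keys, types, and formats.
integrated = [from_first_source(r) for r in first_rows] + [
    from_second_source(r) for r in second_rows
]
```

One converter per source keeps source-specific quirks in one place, so the rest of the transform phase only ever sees the common layout.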
Step 2:
Some special keys are added to the data in DATA STORE2. These keys have
no business meaning. For example, rather than using the customer number as the key on
the CUSTOMER table, you might use a surrogate key that is simply a sequential number
generated by your warehouse load programs. The customer number would still appear on
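A surrogate key generator of this kind might look like the following sketch. The class name, the column names, and the sample customers are hypothetical; the customer number is kept in each row as an ordinary column alongside the generated key.

```python
import itertools


class SurrogateKeyGenerator:
    """Hand out sequential warehouse keys, one per distinct business key."""

    def __init__(self):
        self._next_key = itertools.count(1)
        self._by_business_key = {}

    def key_for(self, business_key):
        # Reuse the key already assigned to this business key, if any.
        if business_key not in self._by_business_key:
            self._by_business_key[business_key] = next(self._next_key)
        return self._by_business_key[business_key]


keys = SurrogateKeyGenerator()

# Hypothetical source rows; the first customer appears twice.
source_customers = [
    {"customer_number": "C-9001", "name": "Asha Rao"},
    {"customer_number": "C-0042", "name": "Ravi Kumar"},
    {"customer_number": "C-9001", "name": "Asha Rao"},
]

# The surrogate key becomes the table key; customer_number stays as a column.
customer_table = [
    {"customer_key": keys.key_for(row["customer_number"]), **row}
    for row in source_customers
]
```

Because the generator remembers which business key received which number, the same customer always maps to the same surrogate key across loads performed with the same generator.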
Step 3:
Dirty data is the set of records that are not useful for the database.
• Ignoring them.
• Rejecting bad records, but saving them in a separate file for manual review.
• Loading as much of the bad record as possible and pointing out the errors for later
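The three options listed here can be compared side by side in a short sketch. Everything in it is an assumption made for illustration: the validation rules, the policy names, the reading of "ignoring" as silently dropping the record, and the use of an in-memory list in place of a separate reject file.

```python
def validate(row):
    """Return a list of problems found in one row (an empty list means clean)."""
    problems = []
    if not row.get("customer_number"):
        problems.append("missing customer_number")
    try:
        float(row.get("amount", ""))
    except (TypeError, ValueError):
        problems.append("amount is not a number")
    return problems


def clean(rows, policy):
    """Apply one policy to dirty rows: 'ignore', 'reject', or 'load_flagged'."""
    loaded, rejected = [], []
    for row in rows:
        problems = validate(row)
        if not problems:
            loaded.append(dict(row, errors=[]))
        elif policy == "reject":
            # Kept aside for manual review instead of being loaded.
            rejected.append(dict(row, errors=problems))
        elif policy == "load_flagged":
            # Loaded anyway, with the errors recorded for later attention.
            loaded.append(dict(row, errors=problems))
        elif policy != "ignore":
            raise ValueError(f"unknown policy: {policy}")
    return loaded, rejected


# Hypothetical input: one clean row and two dirty ones.
rows = [
    {"customer_number": "C-1", "amount": "10.5"},
    {"customer_number": "", "amount": "7"},
    {"customer_number": "C-3", "amount": "n/a"},
]

ignored_result = clean(rows, "ignore")
rejected_result = clean(rows, "reject")
flagged_result = clean(rows, "load_flagged")
```

With the "ignore" policy only the clean row survives and nothing is kept for review; "reject" keeps the two dirty rows aside; "load_flagged" loads all three and records what is wrong with each.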
Step 4:
like a flat file that is broken up into smaller files or tables in order to store the data
After performing the above four steps, the data is efficiently stored in DATA STORE2.
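The normalization of Step 4 can be pictured with a flat set of order rows that repeat the customer's city on every line. The sketch below splits them into two smaller tables; the column names and sample rows are hypothetical.

```python
# Hypothetical flat file: customer details are repeated on every order row.
FLAT_ROWS = [
    {"order_id": 1, "customer_number": "C-1", "customer_city": "Pune", "amount": 250.0},
    {"order_id": 2, "customer_number": "C-1", "customer_city": "Pune", "amount": 75.0},
    {"order_id": 3, "customer_number": "C-2", "customer_city": "Delhi", "amount": 120.0},
]


def normalize(flat_rows):
    """Split flat rows into a customer table and an order table."""
    customers = {}
    orders = []
    for row in flat_rows:
        # Customer details are stored once per customer number.
        customers[row["customer_number"]] = {"customer_city": row["customer_city"]}
        # Each order keeps only a reference to its customer.
        orders.append(
            {
                "order_id": row["order_id"],
                "customer_number": row["customer_number"],
                "amount": row["amount"],
            }
        )
    return customers, orders


customers, orders = normalize(FLAT_ROWS)
```

The city of customer C-1 is now stored once instead of twice, which is the storage saving that normalization aims for.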
LOADING PHASE
Data Marts are databases that share many of the features of data warehouses but
Like a warehouse, data in a mart may come from multiple systems, although, in our
standard architecture, this data will have been integrated in the warehouse before it ever
Marts differ from warehouses in that, whereas the warehouse focuses on the needs of the
entire organization, marts are dedicated to particular subject areas or the needs of a
• Star Schema, in which data is maintained in one fact table and
DATA MARTS, or HPQS, are databases and data structures set up specifically to
support end-user queries. These databases are most frequently managed by either
Note: Data Marts and HPQSs are logical, not physical, concepts. Frequently, an
organization’s data warehouse and its data marts will share the same computer.
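To make the star schema mentioned earlier concrete, the sketch below builds a very small data mart with Python's built-in sqlite3 module: one fact table surrounded by dimension tables, which is the usual meaning of the term. The table names, column names, and figures are assumptions made for illustration.

```python
import sqlite3

mart = sqlite3.connect(":memory:")
mart.executescript(
    """
    -- Dimension tables describe the things being measured.
    CREATE TABLE dim_store (store_key INTEGER PRIMARY KEY, city TEXT);
    CREATE TABLE dim_product (product_key INTEGER PRIMARY KEY, category TEXT);

    -- The fact table holds the measurements and points at the dimensions.
    CREATE TABLE fact_sales (
        store_key INTEGER REFERENCES dim_store (store_key),
        product_key INTEGER REFERENCES dim_product (product_key),
        amount REAL
    );

    INSERT INTO dim_store VALUES (1, 'Pune'), (2, 'Delhi');
    INSERT INTO dim_product VALUES (1, 'Books'), (2, 'Toys');
    INSERT INTO fact_sales VALUES (1, 1, 100.0), (1, 2, 40.0), (2, 1, 60.0);
    """
)

# A typical end-user query: total sales per city.
sales_by_city = mart.execute(
    """
    SELECT s.city, SUM(f.amount)
    FROM fact_sales AS f
    JOIN dim_store AS s ON s.store_key = f.store_key
    GROUP BY s.city
    ORDER BY s.city
    """
).fetchall()
```

An end-user question such as "sales per city" needs only one join from the fact table to a dimension table, which is why this layout suits query-oriented structures such as data marts.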
Finally, after the Extract, Transform, and Loading of data, DATA FLOW3 of the
architecture is carried out.
DATA FLOW3 is the transfer of data from the High Performance Query Structures to the
There are some query tools to transfer data from the Data Marts to the end user. These
DATA STORE4 is the data in the end user’s hands. This report in users’ hands is the end
of the information utility. It is also the last data store in our preferred warehousing
architecture.
ETL – TOOLS:
• INFORMATICA
• MICROSTRATEGY
• AB INITIO
OLAP TOOLS:
• BUSINESS OBJECTS
• COGNOS
• Relational Database
GOVERNMENT DEPARTMENTS
[Figure: government departments (Railway Dept, Credit & Debit Details, Registration Dept) supplying a Centralized Server (DWHS), with DATA MART and OLAP.]
The Data Marts can analyze data and answer even the most complex user queries.
Report generation is immediate, so decisions about development can be
taken without delay. The data from the Centralized Server can even be published on the internet
to inform the public about the Government’s activities; people can then question the
Government if any incorrect data is found in the Data Warehouse. For example, the
details of the salary and loans of a particular employee are available through the respective
department systems, and the employee's assets are obtained from the Registration department. The data can
be analyzed to conclude whether a particular employee is loyal to the Government
or not, and appropriate action can then be taken. Thus, Data Warehousing and Mining can