Society's
Rajarambapu Institute of Technology, Rajaramnagar
QUANTUM 06
A paper on
aditi_k10@rediff.com, vahida.mulla@rediffmail.com
SANGLI.
Abstract
We live in the Age of Information. The importance of collecting data that reflect business or
scientific activities to achieve competitive advantage is widely recognized nowadays. Powerful systems for
collecting data and managing it in large databases are in place in all large and mid-range companies.
However, the bottleneck in turning this data into success is the difficulty of extracting knowledge about
the system under study. Many of these problems could be solved if the information hidden among megabytes of data
in a database could be found explicitly and put to use. Modeling the investigated system and discovering
the relations that connect variables in a database are the subject of data mining.
Modern data mining systems learn from the previous history of the
investigated system, formulating and testing hypotheses about the rules that this system obeys. The
primary concept behind data warehousing is that the nonvolatile data stored for business analysis can be
most effectively managed by separating it from the active data in the operational systems.
The following presentation covers data mining and data warehousing concepts,
implementations and applications. Starting with introductory concepts, we proceed to a detailed
discussion of data warehousing and data mining. We then turn to the pros and cons of the
subject and close with some concluding remarks.
INDEX
1. Introduction
2. Data Warehousing
3. Data mining
5. Applications
According to Bill Inmon, known as the father of Data Warehousing, a data warehouse
is a subject-oriented, integrated, time-variant, nonvolatile collection of data in support of
management decisions.
Data warehousing is a concept. It is a set of hardware and software components that
companies can use to analyze the massive amounts of data they accumulate and so
make better business decisions. Data warehousing is not just the data in the data warehouse,
but also the architecture and tools to collect, query, analyze and present information.
The primary concept behind data warehousing is that the nonvolatile data stored for
business analysis can be most effectively managed by separating it from the active data in
the operational systems. Nonvolatile data is data that is not modified or rarely modified
after being moved from operational systems to a data warehouse.
While large-scale information technology has evolved separate transaction and
analytical systems, data mining provides the link between the two. Data mining software
analyzes relationships and patterns in stored transaction data based on open-ended user
queries. Several types of analytical software are available: statistical, machine learning,
and neural network. Four types of relationships are generally sought:
• Extract, transform, and load transaction data onto the data warehouse system.
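The extract, transform, and load step named above can be sketched as follows. This is a minimal illustration under stated assumptions, not a description of any particular product: the record layout, the field names (`order_id`, `amount`, `date`), and the in-memory list standing in for the warehouse are all hypothetical.

```python
from datetime import datetime

# Hypothetical raw rows as they might arrive from an operational (OLTP) system.
RAW_ORDERS = [
    {"order_id": "1001", "amount": " 250.00", "date": "2006-01-15"},
    {"order_id": "1002", "amount": "99.5", "date": "2006-01-16"},
    {"order_id": "1003", "amount": "n/a", "date": "2006-01-16"},  # bad amount
]


def extract(source):
    """Extract: read rows from the source system (here, a plain list)."""
    return list(source)


def transform(rows):
    """Transform: convert types and drop rows that cannot be cleaned."""
    cleaned = []
    for row in rows:
        try:
            amount = float(row["amount"].strip())
        except ValueError:
            continue  # skip rows whose amount is not numeric
        cleaned.append({
            "order_id": int(row["order_id"]),
            "amount": amount,
            "date": datetime.strptime(row["date"], "%Y-%m-%d").date(),
        })
    return cleaned


def load(rows, warehouse):
    """Load: append the cleaned rows to the warehouse table."""
    warehouse.extend(rows)
    return len(rows)


warehouse_orders = []  # stand-in for a warehouse table (assumption)
loaded = load(transform(extract(RAW_ORDERS)), warehouse_orders)
print(loaded, "rows loaded")
```

Keeping the three stages as separate functions mirrors the way the step is named; a real system would read from and write to database connections rather than lists.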
Data warehousing
Operational data is the data used to run the business. It is what is typically
stored, retrieved, and updated by an Online Transaction Processing (OLTP) system.
An OLTP system may be, for example, a reservations system, an accounting application,
or an order entry application. Informational data is created from the wealth of operational
data that exists in the business, together with external data useful for analyzing it.
Relational databases store data in a two-dimensional format: tables of data represented by
rows and columns. Multi-dimensional analysis solutions, commonly referred to as
On-Line Analytical Processing (OLAP) solutions, offer an extension to the relational model
to provide a multi-dimensional view of the data.
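The difference between the two-dimensional relational view and a multi-dimensional view can be illustrated with a small sketch. It assumes made-up sales rows with three dimensions (product, region, quarter) and one measure; the helper `roll_up` is a hypothetical name, not part of any OLAP product.

```python
from collections import defaultdict

# Relational view: each row is one fact with three dimensions and a measure.
# All values are made up for illustration.
SALES = [
    ("widget", "east", "Q1", 100),
    ("widget", "west", "Q1", 150),
    ("widget", "east", "Q2", 120),
    ("gadget", "east", "Q1", 80),
    ("gadget", "west", "Q2", 60),
]
DIMENSIONS = ("product", "region", "quarter")


def roll_up(rows, keep):
    """Sum the measure over every dimension not named in `keep`."""
    idx = [DIMENSIONS.index(name) for name in keep]
    totals = defaultdict(int)
    for row in rows:
        key = tuple(row[i] for i in idx)
        totals[key] += row[-1]  # the measure is the last field
    return dict(totals)


# The same facts viewed along different combinations of dimensions.
by_product_region = roll_up(SALES, ("product", "region"))
by_quarter = roll_up(SALES, ("quarter",))
print(by_product_region)
print(by_quarter)
```

Each call to `roll_up` answers a different question from the same rows, which is the kind of flexibility a multi-dimensional view is meant to provide.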
Data Marts:
Data marts are workgroup or departmental warehouses that are small in size, typically
10-50 GB. A data mart contains informational data that is departmentalized, tailored to
the needs of a specific departmental work group. Data marts are less expensive, take
less time to implement, and offer a quick return on investment (ROI). They are scalable to full data warehouses
and at times are summarized subsets of more detailed, pre-existing data warehouses.
Metadata/Information Catalogue:
Metadata describes the data that is contained in the data warehouse (e.g. data elements
and their business-oriented descriptions) as well as the source of that data and the
transformations or derivations that may have been performed to create each data element.
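One way to picture a metadata catalogue entry is as a small record holding the items listed above. The sketch below is only an assumption about how such a record might look; the class name `CatalogueEntry`, its fields, and the sample values are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class CatalogueEntry:
    """One metadata record describing a data element in the warehouse."""
    element: str       # technical name of the data element
    description: str   # business-oriented description
    source: str        # operational system the data came from
    transformations: list = field(default_factory=list)  # derivation steps


# Hypothetical entry; the names and steps are illustrative only.
entry = CatalogueEntry(
    element="net_revenue",
    description="Order value after discounts, in one reporting currency",
    source="order_entry.orders",
    transformations=["convert currency", "subtract discount"],
)

# The catalogue itself is a lookup from element name to its description.
catalogue = {entry.element: entry}
print(catalogue["net_revenue"].source)
```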
Once ready, data is loaded into a relational database management system (RDBMS),
which acts as the data warehouse. Requirements of database servers for data
warehousing include performance, capacity, scalability, open interfaces, multiple data
structures, optimizer support for star schemas, and bitmapped indexing. Popular
data stores for data warehousing include relational databases such as Oracle, DB2, and
Informix, and specialized data warehouse databases such as RedBrick and SAS. To provide the
level of performance needed for a data warehouse, an RDBMS should provide
capabilities for parallel processing on Symmetric Multiprocessor (SMP) or Massively
Parallel Processor (MPP) machines, near-linear scalability, data partitioning, and system
administration.
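Bitmapped indexing, one of the requirements listed above, can be sketched in a few lines. This is a simplified illustration of the idea, not how any of the named database products implement it; the column values and function names are assumptions.

```python
# Hypothetical low-cardinality column from a warehouse table.
REGION = ["east", "west", "east", "north", "west", "east"]


def build_bitmap_index(column):
    """Map each distinct value to an integer with bit i set when row i holds it."""
    index = {}
    for row_number, value in enumerate(column):
        index[value] = index.get(value, 0) | (1 << row_number)
    return index


def matching_rows(bitmap):
    """Return the row numbers whose bits are set in `bitmap`."""
    return [i for i in range(bitmap.bit_length()) if (bitmap >> i) & 1]


index = build_bitmap_index(REGION)
# A query such as "region is east OR north" becomes a single bitwise OR.
east_or_north = index["east"] | index["north"]
print(matching_rows(east_or_north))
```

Because each distinct value gets its own bitmap, the approach suits columns with few distinct values, which are common in warehouse dimension data.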
Integrated: When data are moved from the operational environment into the data
warehouse, they assume a consistent coding convention.
Time-variant: The data warehouse contains a place for storing data that are five to ten
years old, or older, to be used for comparisons, trends, and forecasting. These data are not
updated.
Non-volatile: Data are not updated or changed in any way once they enter the data
warehouse, but are only loaded and accessed.
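The three properties above can be illustrated together in a short sketch. It assumes, purely for illustration, that different source systems encode gender differently and that the warehouse settles on the codes `M` and `F`; the class `WarehouseTable` and the code mapping are hypothetical.

```python
# Hypothetical source-specific codes mapped to one warehouse convention.
GENDER_CODES = {
    "m": "M", "male": "M", "1": "M",
    "f": "F", "female": "F", "0": "F",
}


class WarehouseTable:
    """Append-only table: rows can be loaded and read, never updated."""

    def __init__(self):
        self._rows = []

    def load(self, customer_id, gender, load_year):
        # Integrated: translate whatever the source used into one code.
        code = GENDER_CODES[str(gender).strip().lower()]
        # Time-variant: every row carries the period it describes.
        self._rows.append((customer_id, code, load_year))

    def rows(self):
        # Non-volatile: hand back an immutable copy for read access.
        return tuple(self._rows)


table = WarehouseTable()
table.load(1, "male", 2004)
table.load(2, "F", 2005)
table.load(3, 0, 2006)
print(table.rows())
```

The class deliberately offers no update or delete method, so the only operations available are the two the text names: loading and accessing.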
Data mining has many and varied fields of applications such as:
1. Retail/Marketing
2. Banking
5. Transportation
Conclusion
Data warehousing provides the means to turn raw data into information for
making effective business decisions; the emphasis is on information, not data. The data
warehouse is the hub for decision support data. A good data warehouse will... provide the
RIGHT data... to the RIGHT people... at the RIGHT time: RIGHT NOW! The data
warehouse organizes data for business analysis, and the future of data warehousing lies in
its accessibility from the Internet. Successful implementation of a data warehouse and
data mining requires a high-performance, scalable combination of hardware and
software that can integrate easily with existing systems, so that customers can use data
warehouses to improve their decision-making and their competitive advantage.