
Much information is available today about data warehouses, data mining, KDD, OLTP, OLAP, and a
whole alphabet soup of other acronyms that describe techniques and methods of storing, accessing,
visualizing, and using data. There are books and magazines about building models for making
predictions of all types: fraud, marketing, new customers, consumer demand, economic statistics,
stock movement, option prices, weather, sociological behavior, traffic demand, resource needs, and
many more.
In order to use the techniques, or make the predictions, industry professionals almost universally agree
that one of the most important parts of any such project, and one of the most time-consuming and
difficult, is data preparation. Unfortunately, data preparation has been much like the weather. As the
old aphorism has it, "Everyone talks about it, but no one does anything about it." This book takes a
detailed look at the problems in preparing data, the solutions, and how to use the solutions to get the
most out of the data, whatever you want to use it for. This book tells you what can be done about it,
exactly how it can be done, and what it achieves, and puts a powerful kit of tools directly in your hands
that allows you to do it.

How important is adequate data preparation? After finding the right problem to solve, data preparation
is often the key to solving the problem. It can easily be the difference between success and failure,
between usable insights and incomprehensible murk, between worthwhile predictions and useless
guesses.
For instance, in one case data carefully prepared for warehousing proved useless for modeling. The
preparation for warehousing had destroyed the usable information content for the needed mining
project. Preparing the data for mining, rather than warehousing, produced a 550% improvement in model
accuracy. In another case, a commercial baker achieved a bottom-line improvement approaching $1
million by using data prepared with the techniques described in this book instead of previous approaches.

Ever since the Sumerian and Elamite peoples living in the Tigris and Euphrates River basin some 5500
years ago invented data collection using dried mud tablets marked with tax records, people have been
trying to understand the meaning of, and get use from, collected data. More directly, they have been
trying to determine how to use the information in that data to improve their lives and achieve their
objectives. These are the same objectives addressed by the latest technology to wring use and
meaning out of data: the group of technologies that today have come to be called data mining. Often,
something important gets lost in the rush to apply these powerful technologies to find something in
this data. The technologies themselves are not an answer. They are tools to help find an answer. It is
no use looking for an answer unless there is a question. But equally important, given a question, both
the data and the miner need to be readied to find the best answer to the question asked.
This book has two objectives: 1) to present a proven approach to preparing the data, and the miner, to
get the most out of computer-stored data, and 2) to help analysts and business managers make cost-
effective and informed decisions based on the data, their expertise, and business needs and constraints.
This book is intended for everyone who works with or uses data and who needs to understand the
nature, limitations, application, and use of the results they get.
In The Wizard of Oz, while the wizard hid behind the curtain and manipulated the controls, the results
were both amazing and magical. When the curtain was pulled back, and the wizard could be seen
manipulating the controls, the results were still amazing: the Cowardly Lion did find courage, the Tin
Man his heart, the Scarecrow his brain. The power remained; only the mystery evaporated. This book
pulls back the curtain on the reasons for, and the application, applicability, use, and results of, data
preparation.
Knowledge, Power, Data, and the World
Francis Bacon said, "Knowledge is power." But is it? And if it is, where is the power in knowledge?
Power is the ability to control, or at least influence, events. Control implies taking an action that
produces a known result. So the power in knowledge is in knowing what to do to get what you want:
knowing which actions produce which results, and how and when to take them. Knowledge, then, is
having a collection of actions that work reliably. But where does this knowledge come from?
