
The application of automated data-mining algorithms makes it easy to detect patterns in the data, which is why this technique is far more efficient than verification-driven analysis when exploring large, highly complex repositories. These emerging techniques evolve constantly through collaboration among research fields such as databases, pattern recognition, artificial intelligence, expert systems, statistics, visualization, information retrieval, and high-performance computing. Data-mining algorithms are classified into two broad categories: supervised (predictive) and unsupervised (knowledge discovery) [Weiss and Indurkhya, 1998].
Supervised or predictive algorithms predict the value of one attribute (the label) of a data set from the values of other attributes (the descriptive attributes). From data whose label is known, a relationship between the label and the other attributes is induced; these relationships are then used to predict the label of data for which it is unknown. This way of working is known as supervised learning and proceeds in two phases: training (building a model from a subset of labeled data) and testing (evaluating the model on the remaining data). When an application is not mature enough to support a predictive solution, one must turn to unsupervised or knowledge-discovery methods, which uncover patterns and trends in the current data (historical data are not used). The discovery of such information can be used to carry out actions and to obtain (scientific or business) benefits from them. The following table displays some of the mining techniques in each category.
Supervised (predictive) techniques:
- Decision trees
- Neural induction
- Time series

Unsupervised (knowledge discovery) techniques:
- Detection of deviations
- Grouping ("clustering")
- Association rules
- Sequential patterns
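The two-phase supervised workflow described above (training, then testing) can be sketched in a few lines of Python. This is a minimal illustration using a hypothetical one-feature "decision stump" (a one-rule decision tree) on invented toy data, not a production algorithm:

```python
# Minimal sketch of the supervised two-phase workflow: train a one-rule
# "decision stump" on labeled data, then test it on held-out data.
# The toy dataset and the single-threshold rule are illustrative assumptions.

def train_stump(samples):
    """Pick the threshold on the feature that best separates the labels."""
    best_thr, best_acc = None, -1.0
    for x, _ in samples:
        thr = x
        correct = sum((xx > thr) == (yy == 1) for xx, yy in samples)
        acc = correct / len(samples)
        if acc > best_acc:
            best_thr, best_acc = thr, acc
    return best_thr

def predict(thr, x):
    return 1 if x > thr else 0

# Labeled data: (descriptive attribute, label)
data = [(1.0, 0), (1.5, 0), (2.0, 0), (3.5, 1), (4.0, 1), (4.5, 1)]
train, test = data[:4], data[4:]

thr = train_stump(train)  # training phase: induce the label/attribute relation
accuracy = sum(predict(thr, x) == y for x, y in test) / len(test)  # test phase
print(f"threshold={thr}, test accuracy={accuracy:.2f}")
```

On this toy data the stump learns the threshold 2.0 from the training subset and is then evaluated only on the remaining, held-out points, mirroring the training/test split described above.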

Applying data-mining algorithms requires a series of preliminary efforts to prepare the input data, since in many cases the data come from heterogeneous sources, lack the proper format, or contain noise. Afterwards, the results must be interpreted and evaluated. The entire process consists of the following steps [Cabena et al., 1998]:
1. Determination of objectives.
2. Data preparation:
   - Selection: identification of the internal and external information sources and selection of the subset of data needed.
   - Preprocessing: study of the quality of the data and determination of the mining operations that can be performed.
3. Transformation of data: conversion of the data into an analytical model.
4. Data mining: automated treatment of the selected data with an appropriate combination of algorithms.
5. Analysis of results: interpretation of the results obtained in the previous stage, usually with the help of a visualization technique.
6. Assimilation of knowledge: application of the discovered knowledge.
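The data-preparation, transformation, and mining steps above can be sketched as a small pipeline of functions. The function names, the raw records, and the trivial "pattern" (a mean) are illustrative assumptions, not a fixed API:

```python
# Hypothetical sketch of steps 2-5 as a function pipeline over noisy,
# heterogeneous raw records; every name and value here is illustrative.

raw = [" 12", "7", None, "noise", "25 ", "3"]  # heterogeneous source data

def select(records):
    # Step 2 (selection): keep only records that are actually present.
    return [r for r in records if r is not None]

def preprocess(records):
    # Step 2 (preprocessing): drop noisy entries that are not numeric.
    return [r.strip() for r in records if r.strip().isdigit()]

def transform(records):
    # Step 3 (transformation): convert to an analytical model (integers).
    return [int(r) for r in records]

def mine(values):
    # Step 4 (mining): extract a trivial "pattern", here the mean.
    return sum(values) / len(values)

result = mine(transform(preprocess(select(raw))))
print(f"mined pattern (mean) = {result}")  # step 5: analyze the result
```

The chained calls make the highly iterative nature of the process easy to see: any stage can be rerun with different parameters and the downstream stages repeated.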

Although the steps above are performed in the order listed, the process is highly iterative, with feedback between stages. Moreover, the steps do not all demand the same effort: the preprocessing stage is the most expensive, representing approximately 60% of the total effort, while the mining stage accounts for only about 10%.