A Study On Machine Learning Techniques Used in Data Mining

International Journal of Scientific Research Engineering & Technology (IJSRET)
Volume 2 Issue 12 pp 840-843
March 2014
www.ijsret.org
ISSN 2278 0882
A STUDY ON MACHINE LEARNING TECHNIQUES USED IN DATA MINING
S. Rajasulochana, M. Nagulanand, Ramasubash M.P.
M.E Computer Science and Engineering, SriGuru Institute of Technology, Coimbatore
ABSTRACT
Data mining and machine learning are two areas of active
research. Machine learning techniques enable a machine to
improve its performance based on previous results. Data
mining makes use of machine learning techniques to solve
many real-world problems. This paper provides a
state-of-the-art overview of the machine learning techniques
used in data mining from a layman's perspective.
Data mining, often defined as knowledge discovery in
databases (KDD), is an iterative sequence of steps that
involves the following:
Data cleaning
Data integration
Data selection
Data transformation
Relevance analysis
Keywords: Data mining, Machine learning, deep learning, classification, clustering, reinforcement learning
I. INTRODUCTION
The present is an era in which information plays a vital
role in all sorts of processing. With the advent of
computers, a large amount of information is flooding the
web; hence gathering, analyzing and processing the data to
extract the required patterns has become a serious issue. It
is difficult to deal with the big data obtained as a result
of data mining, and a means of storing and retrieving data
efficiently is required. There are also cases where it is
essential to classify the data based on class labels, to
cluster the relevant data, and to associate them based on
patterns to arrive at an inference. Data mining, an
important step in knowledge discovery in databases (KDD),
deals with the process of retrieving useful information
(patterns) from the ample amount of available raw data.
Data mining has its roots in classical statistics,
artificial intelligence and machine learning, and it makes
use of machine learning techniques. Unlike traditional data
retrieval, which retrieves records for a given query, data
mining is the process of discovering patterns that are not
explicitly stored in the database, i.e., the implicit
patterns in the data. Yet data mining faces several issues,
such as security and social issues, performance issues and
data source issues.
II. DATA MINING
Fig. 1 Data Mining process (data cleaning, data integration, data selection, data transformation, data mining, pattern evaluation, knowledge representation)

Data mining is thus the step of the KDD process that
extracts patterns from data that have already been
consolidated and transformed.
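The steps that lead up to pattern extraction can be sketched in a few lines of Python. This is only an illustrative sketch: the two sources `sales` and `customers`, their field names, and the helper functions are hypothetical and do not come from the paper, and the "pattern" mined here is simply the most frequent item.

```python
from collections import Counter

# Hypothetical raw records from two sources (illustrative data only).
sales = [
    {"cust_id": 1, "item": "milk", "amount": 2.5},
    {"cust_id": 2, "item": "bread", "amount": None},   # missing value
    {"cust_id": 1, "item": "bread", "amount": 1.0},
    {"cust_id": 3, "item": "milk", "amount": 3.0},
]
customers = {1: "north", 2: "south", 3: "north"}

def clean(records):
    """Data cleaning: drop records with a missing amount."""
    return [r for r in records if r["amount"] is not None]

def integrate(records, regions):
    """Data integration: attach the region held in the second source."""
    return [{**r, "region": regions[r["cust_id"]]} for r in records]

def select(records, region):
    """Data selection: keep only the records relevant to the analysis."""
    return [r for r in records if r["region"] == region]

def transform(records):
    """Data transformation: reduce each record to the item bought."""
    return [r["item"] for r in records]

def mine(items):
    """Data mining: extract a simple pattern, the most frequent item."""
    return Counter(items).most_common(1)[0]

prepared = transform(select(integrate(clean(sales), customers), "north"))
pattern = mine(prepared)
print(pattern)
```

Each function corresponds to one step of the sequence, so the nested call reads in the same order as the process itself; a real system would replace the final step with a proper mining algorithm.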
Computing is used in all fields that involve processing
large amounts of data. Industries, educational
institutions, researchers exploring the natural world and
social websites all serve as sources of data. These data
have to be effectively classified, clustered, pruned and
rendered in order to obtain the required pattern, and
machine learning plays a vital role in these activities.
Data mining has now moved on to the next stage, called
semantic mining or ontology-based mining, where prediction
is done based on a training set of data.
III. MACHINE LEARNING

Inaccurate estimates produced by statistical estimation
result in poor performance. Machine learning is an approach
that aims at optimizing the performance of the solution and
can thus provide better estimates than purely statistical
estimation. The performance can be optimized with the help
of either example data, called the training set, or past
experience. Machine learning can be either inductive or
deductive. The core of machine learning is learning, which
is about using observations or past experience on a given
set of data to do better in the future. The main goal of
machine learning is to devise an algorithm that learns
automatically from past experience. Thus the machine
learning paradigm can best be viewed as programming by
example. Machine learning can be used to solve problems by
considering the following [3] [4]:
Task identification
Performance analysis
Knowledge identification
Knowledge representation
Identifying the learning paradigm to use.
How to construct a training experience for the learner
Machine learning algorithms have also been used in:
Speech recognition
Driving automobiles
Playing world-class backgammon
Program generation
Routing in communication networks
Understanding handwritten text
Data mining
Health care, etc.
Machine learning problems can be broadly classified as:
Supervised learning
Unsupervised learning
Reinforcement learning
Agent-based modeling and basket analysis are some other
types of machine learning problems that do not fall into
these three categories.
IV. SUPERVISED LEARNING

In supervised learning, the given set of data is labeled
with pre-defined classes. Supervised learning can be
further classified into two types, based on the type of
variable (discrete or continuous) to which it is applied,
namely:
Classification
Regression
Decision trees, neural networks and Naive Bayes use
classification algorithms, whereas regression, association
rules and clustering use prediction algorithms.
A. Classification
More than 90% of machine learning problems are
classification problems.
1. Classification by decision tree induction
The decision tree is one of the simplest forms of
classification used in data mining. CART (Classification and
Regression Trees), ID3 (Iterative Dichotomiser 3), CHAID
(CHi-squared Automatic Interaction Detector), MARS and C4.5
are some of the widely used decision tree algorithms. These
algorithms differ in the way the split point is chosen.
When the target variable has more than two categories, a
variant of decision tree induction called the C4.5
algorithm is used, and in the case of a binary split the
typical CART procedure is used. There are two phases in a
decision tree classifier, namely:
Growth phase
Prune phase
The initial phase of building a decision tree is called the
growth phase. The pruning phase reduces overfitting, i.e.,
it removes branches that reflect noisy data or outliers.
Overfitting can be reduced by either pre-pruning or
post-pruning. In pre-pruning, a node is not split further
when the split measure falls below a threshold (choosing an
appropriate threshold is indeed difficult), whereas in
post-pruning such branches are removed from a fully grown
tree.
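To make the idea of choosing a split point concrete, the following is a minimal sketch of a CART-style binary split on one numeric attribute, scored by Gini impurity. The attribute values, the labels and the function names are hypothetical, and a full decision tree learner would apply this search recursively during the growth phase.

```python
def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((labels.count(c) / n) ** 2 for c in set(labels))

def best_binary_split(values, labels):
    """Return (threshold, weighted_gini) for the best split 'value <= threshold'."""
    best = (None, float("inf"))
    candidates = sorted(set(values))
    # Candidate thresholds are midpoints between consecutive distinct values.
    for lo, hi in zip(candidates, candidates[1:]):
        t = (lo + hi) / 2
        left = [y for x, y in zip(values, labels) if x <= t]
        right = [y for x, y in zip(values, labels) if x > t]
        score = (len(left) * gini(left) + len(right) * gini(right)) / len(labels)
        if score < best[1]:
            best = (t, score)
    return best

# Hypothetical training data: age of a customer and whether they bought.
ages = [22, 25, 30, 35, 40, 52]
buys = ["no", "no", "no", "yes", "yes", "yes"]
threshold, impurity = best_binary_split(ages, buys)
print(threshold, impurity)
```

A pre-pruning rule could be added by refusing to split when the impurity reduction is smaller than a chosen threshold, which is the difficult-to-tune parameter mentioned above.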
B. Regression
Regression differs from classification in that it predicts
the behavior of one or more continuous variables, whereas
classification predicts a discrete class label. Regression
generates a numerical value as the estimated outcome,
whereas classification identifies the categorical class
label for the given data.
Let xi be the variable used to predict the outcome, called
the independent variable; let yi be the observed value of
the predicted variable, called the dependent variable; and
let ŷi be the predicted value of the dependent variable. A
model built using these variables, which helps in
predicting a variable from one or more other variables, is
called the regression model.
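As an illustration of a regression model built from xi, yi and ŷi, the sketch below fits a straight line by ordinary least squares. The paper does not name a fitting method, so least squares with a single independent variable is an assumption here, and the data points are made up.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = a + b*x with a single predictor."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    b = sxy / sxx               # slope
    a = mean_y - b * mean_x     # intercept
    return a, b

# Hypothetical observations (xi, yi); they lie exactly on y = 1 + 2x.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.0, 5.0, 7.0, 9.0]
a, b = fit_line(xs, ys)

# Predicted values (the ŷi of the text) and the residuals yi - ŷi.
predicted = [a + b * x for x in xs]
residuals = [y - y_hat for y, y_hat in zip(ys, predicted)]
print(a, b)
```

The residuals measure how far each observed yi lies from its prediction; least squares picks the line that minimizes the sum of their squares.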
Fig. 2 An example of how a machine learner is trained to
recognize images using a training set (a corrupted image of
the number 8) which is labeled or identified as the
number 8.
Supervised learning provides the learning algorithm with a
labeled set of data, based on which the inference is
generated. The inference function is defined as a mapping
X → Y, where X is the set of input objects, also referred to
as vectors, and Y is the set of output objects, typically
called the supervisory signal. Let X be given as
{x1, x2, ..., xn} and Y be given as {y1, y2, ..., yn}. Then a
pair, say (x1, y1), forms a training example, and the set
{(x1, y1), (x2, y2), ..., (xn, yn)} forms the training set.
Fig. 3 Supervised learning algorithm (boxes: training set, learning algorithm)
The learning problem in supervised learning is termed a
regression problem when the target variable is continuous
and a classification problem when the target variable is
discrete. Supervised learning is similar to a teacher
teaching an elementary school student.
Fig. 4 Some of the modeling objectives and supervised
learning techniques
C. ISSUES IN SUPERVISED LEARNING
Bias-variance trade-off
Function complexity and amount of training data
Dimensionality of the input space
Noise in the output values
V. UNSUPERVISED LEARNING
Unsupervised learning is the problem of finding hidden
structure in input data that is not labeled. Thus the task
is to find clusters in the given unlabeled data set.
Unsupervised learning is similar to a teacher teaching a
graduate student. The following are some of the approaches
to unsupervised learning:
Clustering (e.g., k-means, hierarchical clustering)
Association rule mining
Hidden Markov model
Blind signal separation
A. Clustering
In a clustering problem, a training set
{x(1), x(2), ..., x(k)} is given; since no output label y(i)
is provided, the learning problem is called an unsupervised
learning problem. Clustering is the process of grouping data
points using a measure of similarity such as correlation or
Euclidean distance [1]. Clustering paves the way for
pattern recognition.
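A minimal sketch of clustering with Euclidean distance is given below, using k-means (Lloyd's algorithm). Seeding the centroids with the first k points is a simplifying assumption made only to keep the example deterministic, and the sample points are hypothetical.

```python
import math

def euclidean(p, q):
    """Euclidean distance between two points of equal dimension."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

def k_means(points, k, iterations=10):
    """Lloyd's algorithm; the first k points seed the centroids (an assumption)."""
    centroids = [points[i] for i in range(k)]
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: euclidean(p, centroids[i]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its cluster.
        for i, cluster in enumerate(clusters):
            if cluster:
                centroids[i] = tuple(sum(c) / len(cluster) for c in zip(*cluster))
    return centroids, clusters

# Hypothetical unlabeled data: two visibly separate groups.
points = [(1.0, 1.0), (8.0, 8.0), (1.5, 2.0), (9.0, 9.5), (1.0, 0.5), (8.5, 9.0)]
centroids, clusters = k_means(points, k=2)
print(centroids)
```

No labels are supplied anywhere; the grouping emerges only from the distances between points, which is what distinguishes this from the supervised setting.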
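VII. DEEP LEARNING

Deep learning is supposed to be the future of machine
learning. It is widely used in image recognition and in
molecule identification for drug design. It follows a
layer-by-layer, or hierarchical, approach to classification
in the case of supervised learning.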
VIII. CONCLUSION
A bird's-eye overview of the various machine learning
techniques used in data mining has been presented in a way
that can be understood easily from a layman's perspective.
Machine learning techniques can be applied in all phases of
data mining, thereby improving efficiency.
Fig. 5 Some of the modeling objectives and unsupervised
learning techniques
VI. REINFORCEMENT LEARNING

Reinforcement learning, which is sometimes grouped with
unsupervised learning, is a form of learning what to do, in
other words mapping situations to actions. That is, it
helps the learning agent to learn the behavior of the system
based on feedback from the environment. In other words, it
is the process of learning from actions. It finds its
application in sequential decision making and control
problems where explicit supervision is not possible.
Reinforcement learning algorithms make use of a reward
function that marks the learning agent as either successful
or unsuccessful. Upon a right move the learning agent is
given a positive reward, and upon a wrong move or failure it
is given a negative reward. That is, reinforcement learning
is associated with the learning of policies: it answers
"what to do" rather than "what is that".
Reinforcement learning has been successful in applications
like autonomous helicopter flight, cell-phone network
routing, factory control, marketing strategy selection,
legged robot locomotion and efficient web-page indexing.
The advantage of reinforcement learning is that the
algorithm improves its accuracy over time as it sees more
training data, modifying its rules whenever it makes a
wrong prediction.
Reinforcement learning problems are usually posed as a
Markov Decision Process (MDP).
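The reward-driven learning of a policy can be illustrated with tabular Q-learning on a very small MDP. Q-learning is one common reinforcement learning algorithm and is chosen here only as an example; the paper does not name a specific algorithm. The five-state corridor, the reward values and the constants below are all hypothetical.

```python
import random

N_STATES = 5                 # states 0..4 on a line; state 4 is the goal
LEFT, RIGHT = 0, 1
ALPHA, GAMMA, EPSILON = 0.5, 0.9, 0.2

def step(state, action):
    """Environment: returns (next_state, reward, done)."""
    if action == RIGHT:
        nxt = state + 1
        if nxt == N_STATES - 1:
            return nxt, 1.0, True            # right move that reaches the goal
        return nxt, 0.0, False
    return max(state - 1, 0), -1.0, False    # wrong move: negative reward

def greedy(q_row):
    """Pick the action with the higher estimated value (ties go right)."""
    return RIGHT if q_row[RIGHT] >= q_row[LEFT] else LEFT

def train(episodes=200, max_steps=50, seed=0):
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_STATES)]
    for _ in range(episodes):
        state = 0
        for _ in range(max_steps):
            # Epsilon-greedy: mostly exploit, sometimes explore.
            if rng.random() < EPSILON:
                action = rng.choice([LEFT, RIGHT])
            else:
                action = greedy(q[state])
            nxt, reward, done = step(state, action)
            # Q-learning target: reward plus discounted best future value.
            target = reward if done else reward + GAMMA * max(q[nxt])
            q[state][action] += ALPHA * (target - q[state][action])
            state = nxt
            if done:
                break
    return q

q_table = train()
policy = [greedy(q_table[s]) for s in range(N_STATES - 1)]
print(policy)
```

The learned policy maps each non-goal state to an action, which is the "what to do" knowledge described above; the agent is never told the correct action, only rewarded or penalized after acting.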
ACKNOWLEDGEMENT
The authors would like to thank the staff and students of
SriGuru Institute of Technology for their support and
guidance. The authors would also like to thank their
friends and family members for their valuable comments.
REFERENCES
[1] Yogesh Singh, Pradeep Kumar Bhatia & Omprakash Sangwan, "A Review of Studies on Machine Learning Techniques", International Journal of Computer Science and Security, Volume (1) : Issue (1).
[2] R. Agrawal, M. Mehta, J. Shafer, R. Srikant, A. Arning, T. Bollinger, "The Quest Data Mining System", Proceedings of 1996 International Conference on Data Mining and Knowledge Discovery (KDD'96), Portland, Oregon, pp. 244-249, August 1996.
[3] Clifton Phua, Vincent Lee, Kate Smith & Ross Gayler, "A Comprehensive Survey of Data Mining-Based Fraud Detection Research".
[4] Rob Schapire, "A lecture note on Theoretical Machine Learning".
[5] Jiban K Pal, "Usefulness and application of data mining in extracting information from different perspective", Annals of Library and Information Studies, Vol. 58, March 2011, pp 7-16.