Machine Learning systems come in so many types that it is useful to classify them into broad categories based on:
- Whether or not they are trained with human supervision (supervised, unsupervised,
semisupervised, and reinforcement learning)
- Whether or not they can learn incrementally on the fly (online vs batch learning)
- Whether they work by simply comparing new data points to known data points, or instead detect
patterns in the training data and build a predictive model, much like scientists do (instance-based vs
model-based learning)
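A minimal sketch of the last distinction, on made-up one-feature data (the data and every function name here are hypothetical). The instance-based learner stores the training points and answers by copying the target of the nearest stored point. The model-based learner fits a straight line by least squares once, and afterwards needs only its slope and intercept.

```python
# Hypothetical training data: one feature x and a numeric target y.
train_x = [1.0, 2.0, 3.0, 4.0]
train_y = [2.1, 3.9, 6.2, 7.8]


def predict_instance_based(x):
    """Instance-based: compare x with every stored training point (1-nearest neighbor)."""
    nearest = min(range(len(train_x)), key=lambda i: abs(train_x[i] - x))
    return train_y[nearest]


def fit_line(xs, ys):
    """Model-based: learn a slope and an intercept by ordinary least squares."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var = sum((x - mean_x) ** 2 for x in xs)
    slope = cov / var
    return slope, mean_y - slope * mean_x


slope, intercept = fit_line(train_x, train_y)


def predict_model_based(x):
    """Uses only the two learned parameters; the training data is no longer needed."""
    return slope * x + intercept


print(predict_instance_based(2.4))  # nearest stored point is x = 2.0, so 3.9
print(predict_model_based(2.4))     # evaluates slope * 2.4 + intercept
```

The instance-based version must keep all the training data around at prediction time. The model-based version compresses it into two parameters, at the cost of assuming a linear relationship.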
Supervised Learning
In supervised learning, the training data you feed to the algorithm includes the desired solutions, called
labels.
Figure : a labeled training set for supervised learning (e.g., spam classification)
- A typical supervised learning task is classification, as in the figure: the spam filter is trained with
many example emails, each labeled as spam or ham.
- Another typical task is to predict a target numeric value, such as the price of a car, given a set of
features (mileage, age, brand, etc.) called predictors. This sort of task is called regression.
- Regression analysis : a statistical methodology that is most often used for numeric prediction
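As an illustration of regression, the sketch below fits a linear model price = w0 + w1 × mileage + w2 × age by solving the least-squares normal equations with only the Python standard library. The five cars are hypothetical: their prices were generated from price = 20000 - 50 × mileage - 1000 × age (mileage in thousands of km), so the fit should recover roughly those weights. The helper names are illustrative, not from any library.

```python
def solve(a, b):
    """Solve the square linear system a @ w = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]  # augmented matrix [a | b]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    w = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        s = sum(m[r][c] * w[c] for c in range(r + 1, n))
        w[r] = (m[r][n] - s) / m[r][r]
    return w


def fit_linear_regression(features, targets):
    """Least-squares weights [bias, w1, w2, ...] from the normal equations X^T X w = X^T y."""
    x = [[1.0] + list(row) for row in features]  # prepend a bias column of ones
    k = len(x[0])
    xtx = [[sum(row[i] * row[j] for row in x) for j in range(k)] for i in range(k)]
    xty = [sum(row[i] * t for row, t in zip(x, targets)) for i in range(k)]
    return solve(xtx, xty)


def predict(weights, row):
    """Evaluate the fitted linear model on one feature vector."""
    return weights[0] + sum(w * v for w, v in zip(weights[1:], row))


# Hypothetical cars: (mileage in thousands of km, age in years) -> price
cars = [(10, 1), (50, 3), (80, 5), (120, 8), (30, 4)]
prices = [18500, 14500, 11000, 6000, 14500]

weights = fit_linear_regression(cars, prices)
print([round(w, 2) for w in weights])
print(round(predict(weights, (60, 2)), 2))
```

Predicting a new car is then just evaluating the fitted formula: a car with 60 000 km and 2 years of age comes out at about 15 000 with these toy weights.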
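Classification vs Prediction
Classification :
- Predicts categorical class labels (e.g., spam or ham)
- Each sample is assumed to belong to a predefined class, as determined by the class label
attribute
- The set of records/tuples used for model construction is the training set
- The model is represented as classification rules, a decision tree, or a mathematical formula
Prediction :
- Models continuous-valued functions, i.e., predicts a numeric value (e.g., the price of a car)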
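A minimal classification sketch: from a hypothetical labeled training set, learn a single classification rule of the form "spam if the word 'free' appears at least t times", choosing the threshold t that makes the fewest errors on the training set. This is effectively a one-level decision tree (a decision stump). The feature, the data, and the function names are made up for illustration.

```python
# Hypothetical training set: (number of times "free" appears in the email, class label)
training_set = [
    (0, "ham"), (1, "ham"), (0, "ham"), (2, "ham"),
    (4, "spam"), (5, "spam"), (3, "spam"), (7, "spam"),
]


def learn_rule(samples):
    """Pick the threshold t for the rule 'spam if count >= t' with the fewest training errors."""
    candidates = sorted({count for count, _ in samples})
    best_t, best_errors = None, len(samples) + 1
    for t in candidates:
        errors = sum(("spam" if count >= t else "ham") != label for count, label in samples)
        if errors < best_errors:
            best_t, best_errors = t, errors
    return best_t


threshold = learn_rule(training_set)


def classify(count):
    """Apply the learned rule to a new, unlabeled email."""
    return "spam" if count >= threshold else "ham"


print(f"IF count >= {threshold} THEN spam ELSE ham")  # IF count >= 3 THEN spam ELSE ham
print(classify(6), classify(1))                        # spam ham
```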
Note :
In Machine Learning, an attribute is a data type (e.g., “Mileage”), while a feature has several meanings
depending on the context but generally means an attribute plus its value (e.g., mileage = 15 000). Many
people use the words attribute and feature interchangeably, though.
Figure : regression
Unsupervised Learning
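In unsupervised learning, the training data is unlabeled: the system tries to learn without a teacher.
A typical unsupervised task is clustering, and clustering methods fall into these broad categories: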
- Partitioning
o Find mutually exclusive clusters of spherical shape
o Distance-based
o May use the mean or medoid (etc.) to represent the cluster center
o Effective for small- to medium-size datasets
- Hierarchical
o Clustering is a hierarchical decomposition (i.e., multiple levels)
o Cannot correct erroneous merges or splits
o May incorporate other techniques like microclustering or consider object “linkage”
- Density-based
o Can find arbitrarily shaped clusters
o Clusters are dense regions of objects in space that are separated by low-density regions
o Cluster density : each point must have a minimum number of points within its
“neighborhood”
o May filter out outliers
- Grid-based
o Uses a multiresolution grid data structure
o Fast processing time (typically independent of the number of data objects, yet dependent on
grid size)
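The partitioning approach can be illustrated with k-means, a distance-based method that represents each cluster by the mean of its points. This is a minimal sketch on hypothetical 2-D points; for simplicity it takes the first k points as the initial centers, whereas practical implementations usually use random or k-means++ initialization.

```python
import math


def kmeans(points, k, max_iters=100):
    """Partition points into k clusters; returns (labels, centers)."""
    centers = [list(p) for p in points[:k]]  # simplistic initialization: first k points
    labels = None
    for _ in range(max_iters):
        # Assignment step: each point joins the cluster with the nearest center.
        new_labels = [min(range(k), key=lambda c: math.dist(p, centers[c])) for p in points]
        if new_labels == labels:
            break  # assignments stopped changing: converged
        labels = new_labels
        # Update step: move each center to the mean of its member points.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centers[c] = [sum(coord) / len(members) for coord in zip(*members)]
    return labels, centers


# Hypothetical data: two well-separated groups of points
points = [(1, 1), (1.5, 2), (2, 1), (8, 8), (9, 8.5), (8.5, 9)]
labels, centers = kmeans(points, k=2)
print(labels)   # [0, 0, 0, 1, 1, 1]
print(centers)
```

Because every point is assigned to exactly one center, the resulting clusters are mutually exclusive, and using Euclidean distance to a mean is what favors roughly spherical clusters.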
Semisupervised Learning
Some algorithms can deal with partially labeled training data, usually a lot of unlabeled data and a little
bit of labeled data. This is called semisupervised learning.
Most semisupervised learning algorithms are combinations of unsupervised and supervised algorithms.
For example, deep belief networks (DBNs) are based on unsupervised components called restricted
Boltzmann machines (RBMs) stacked on top of one another. The RBMs are trained sequentially in an
unsupervised manner, and then the whole system is fine-tuned using supervised learning techniques.
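DBNs are too involved for a short example, so the sketch below swaps in a much simpler semisupervised scheme, cluster-then-label. It still combines the two kinds of learning: first cluster all the data (labeled and unlabeled) without supervision, then give every point in a cluster the majority label of the few labeled points that fall in it. The one-feature data, the assumption of exactly two clusters, and the function names are all hypothetical.

```python
from collections import Counter


def two_means_1d(xs, iters=50):
    """Unsupervised step: split 1-D values into two clusters (k-means with k = 2)."""
    centers = [min(xs), max(xs)]  # deterministic initialization at the extremes
    assign = []
    for _ in range(iters):
        assign = [0 if abs(x - centers[0]) <= abs(x - centers[1]) else 1 for x in xs]
        for c in (0, 1):
            members = [x for x, a in zip(xs, assign) if a == c]
            if members:
                centers[c] = sum(members) / len(members)
    return assign


def cluster_then_label(xs, labeled):
    """Supervised step: each cluster takes the majority label of its labeled points."""
    clusters = two_means_1d(xs)
    cluster_label = {}
    for c in set(clusters):
        votes = Counter(labeled[i] for i in labeled if clusters[i] == c)
        if votes:
            cluster_label[c] = votes.most_common(1)[0][0]
    # Points in a cluster with no labeled example get None.
    return [cluster_label.get(c) for c in clusters]


# Hypothetical data: eight values, only two of which are labeled (index -> label)
values = [1.0, 1.2, 0.8, 1.1, 5.0, 5.3, 4.9, 5.1]
labeled = {0: "small", 4: "large"}
print(cluster_then_label(values, labeled))
```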
Reinforcement Learning
In reinforcement learning, the learning system, called an agent, observes the environment, selects and
performs actions, and gets rewards in return (or penalties in the form of negative rewards).
Figure : reinforcement learning
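A minimal reinforcement learning sketch using tabular Q-learning, one standard RL algorithm chosen here for illustration. An agent in a hypothetical five-cell corridor observes its position, picks "left" or "right" (mostly greedily, sometimes at random to explore), and receives +1 for reaching the rightmost cell and a small penalty of -0.01 (a negative reward) for every other move. The environment, rewards, and hyperparameters are all assumptions.

```python
import random

N_STATES = 5        # cells 0..4; cell 4 is the goal (terminal)
ACTIONS = (-1, +1)  # index 0 = move left, index 1 = move right


def step(state, action):
    """Hypothetical environment: returns (next_state, reward, done)."""
    next_state = min(max(state + action, 0), N_STATES - 1)
    if next_state == N_STATES - 1:
        return next_state, 1.0, True   # reward for reaching the goal
    return next_state, -0.01, False    # small penalty for every other move


def q_learning(episodes=500, alpha=0.5, gamma=0.9, epsilon=0.1, seed=0):
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_STATES)]  # q[state][action] estimates
    for _ in range(episodes):
        state, done = 0, False
        while not done:
            # Select an action: explore with probability epsilon, otherwise exploit.
            if rng.random() < epsilon:
                a = rng.randrange(2)
            else:
                a = max(range(2), key=lambda i: q[state][i])
            # Perform it and observe the reward and the new state.
            next_state, reward, done = step(state, ACTIONS[a])
            # Move the estimate toward reward + discounted best future value.
            target = reward if done else reward + gamma * max(q[next_state])
            q[state][a] += alpha * (target - q[state][a])
            state = next_state
    return q


q = q_learning()
policy = ["right" if q[s][1] > q[s][0] else "left" for s in range(N_STATES - 1)]
print(policy)
```

After training, the greedy policy should be to move right in every non-goal cell, because that reaches the +1 reward while collecting the fewest penalties.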