
Danang University of Technology Pioneer Club

Topic: MACHINE LEARNING

Prepared by: Hoang Le Uyen Thuc

Danang, Jan 2013

Outline

* Introduction
* Unsupervised Learning
* Supervised Learning

What is Machine Learning?

Scientific field:
* Designing and developing algorithms that allow computers to model the underlying process that generates the data
Core task:
* Making inferences from training samples
Research:
* Automatically learning to recognize complex patterns and make intelligent decisions based on data

Applications of Machine Learning


Pattern recognition
* Speech recognition, optical character recognition, bar code recognition, human face recognition, fingerprint recognition
* Human action recognition: statistically identifying an input video clip as one of a set of action categories of interest
Event detection
* A special case of pattern recognition
* Examples: flagging spam email, estimating loan risk, detecting falls in videos

Taxonomy of Machine Learning


Supervised learning
- Requires labeled data
- Requires external help (a supervisor)
- No self-organization
- Feedback algorithms
- Often used for classification problems

Unsupervised learning
- No labeled data required
- No external help required
- Self-organizing
- Forward algorithms
- Often used for clustering problems

Outline

* Introduction
* Unsupervised Learning
* Supervised Learning

K-means Clustering (1/5)


Clustering: finding groups of similar data points among the input patterns
* X ⊂ R^(m×n): a set of m data points x_i in R^n
* Partition X into K clusters {C_k} such that data belonging to the same cluster are more alike than data in different clusters
K-means clustering: grouping the data into K groups based on their features. The grouping is done by minimizing the sum of squared (Euclidean) distances between the data points and the corresponding cluster centroids.

K-means Clustering (2/5)


Step 1: Initialization
* Choose the number of clusters K and the initial centroids
Step 2: Data clustering
* Assign each data point to the nearest centroid
Step 3: Centroid determination
* Recompute each centroid as the mean of its assigned points
Step 4: Iteration
* Repeat steps 2 to 3 until terminated (a minimal code sketch follows below)
Stop conditions:
(1) The change in all cluster centroids is less than the amount specified by the user, or
(2) The number of iterations reaches a limit defined by the user
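The four steps can be sketched in a few lines of NumPy. This is an illustrative implementation, not the code used for the slides; the data X, the value of K, and the tolerance are assumptions.

```python
# A minimal NumPy sketch of the four K-means steps above.
import numpy as np

def kmeans(X, K, max_iter=100, tol=1e-4, seed=0):
    rng = np.random.default_rng(seed)
    # Step 1: initialization - pick K data points as the initial centroids
    centroids = X[rng.choice(len(X), K, replace=False)]
    for _ in range(max_iter):
        # Step 2: data clustering - assign each point to the nearest centroid
        dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Step 3: centroid determination - mean of the points in each cluster
        new_centroids = np.array([
            X[labels == k].mean(axis=0) if np.any(labels == k) else centroids[k]
            for k in range(K)
        ])
        # Step 4 / stop condition (1): centroids changed less than the tolerance
        if np.linalg.norm(new_centroids - centroids) < tol:
            centroids = new_centroids
            break
        centroids = new_centroids
    return labels, centroids
```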

K-means Clustering (3/5)

K-means Clustering (4/5)
Pros:
* Simple and fast
Cons:
* Sensitive to the initialization
* Sensitive to outliers
[Figure: data points containing outliers; undesirable clustering vs. ideal clustering]

K-means Clustering (5/5)


Goal: extract the flower petals from an image
* Clustering based on the color feature (a sketch follows below)
* K = 3
* Number of iterations: 5
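One possible way to reproduce this kind of color-based segmentation, assuming a hypothetical image file flower.jpg and scikit-learn's KMeans rather than the implementation used in the slides:

```python
# Sketch of the petal-extraction example: cluster the pixels of a flower
# image by color with K = 3. File name and library choice are assumptions.
import numpy as np
from PIL import Image
from sklearn.cluster import KMeans

img = np.asarray(Image.open("flower.jpg").convert("RGB"), dtype=float) / 255.0
pixels = img.reshape(-1, 3)                      # one RGB color feature per pixel
labels = KMeans(n_clusters=3, n_init=10).fit_predict(pixels)
segmented = labels.reshape(img.shape[:2])        # cluster index per pixel
```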


Gaussian Mixture Model (1/4)


Formed by combining multivariate Gaussian density components:

p(x \mid \mu_k, \Sigma_k) = \frac{1}{(2\pi)^{n/2} |\Sigma_k|^{1/2}} \exp\!\left[-\tfrac{1}{2}(x - \mu_k)^T \Sigma_k^{-1} (x - \mu_k)\right]

p(x \mid \lambda) = \sum_{k=1}^{N_c} w_k \, p(x \mid \mu_k, \Sigma_k)

x: feature vector, {p(x | \mu_k, \Sigma_k)}: Gaussian components, {w_k}: mixture weights, \mu_k: mean vector, \Sigma_k: covariance matrix, k = 1, ..., N_c

GMM parameters: \lambda = {w_k, \mu_k, \Sigma_k}, k = 1, 2, ..., N_c

Gaussian Mixture Model (2/4)


Step 1: Initialization
* Choose N_c; set initial \mu_k^0, \Sigma_k^0, and w_k^0 = 1/N_c
Step 2: Compute the mixture Gaussians p(x | \lambda)
Step 3: Update the GMM parameters \lambda^t
Step 4: Calculate the error function

E^{t+1} = \frac{p(X \mid \lambda^{t+1})}{p(X \mid \lambda^{t})} = \frac{\prod_{p=1}^{T} p(x_p \mid \lambda^{t+1})}{\prod_{p=1}^{T} p(x_p \mid \lambda^{t})}

Step 5: Repeat steps 2 to 4 until terminated (a minimal sketch follows below)
Stop conditions:
(1) The change in the error function is less than a threshold, or
(2) The number of iterations reaches a threshold
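A minimal NumPy/SciPy sketch of the EM loop above, assuming a data matrix X of shape (n, d). The initialization scheme, tolerance, and covariance regularization are illustrative assumptions, and the change in log-likelihood stands in for the error-function ratio on the slide.

```python
# Illustrative EM fit of a GMM, following Steps 1-5 above.
import numpy as np
from scipy.stats import multivariate_normal

def fit_gmm(X, n_components=3, max_iter=100, tol=1e-6, seed=0):
    rng = np.random.default_rng(seed)
    n, d = X.shape
    # Step 1: initialization - random means, identity covariances, equal weights
    means = X[rng.choice(n, n_components, replace=False)]
    covs = np.array([np.eye(d)] * n_components)
    weights = np.full(n_components, 1.0 / n_components)
    prev_ll = -np.inf
    for _ in range(max_iter):
        # Step 2: compute the weighted mixture Gaussians (responsibilities)
        dens = np.column_stack([
            w * multivariate_normal.pdf(X, mean=m, cov=c)
            for w, m, c in zip(weights, means, covs)
        ])
        resp = dens / dens.sum(axis=1, keepdims=True)
        # Step 3: update the GMM parameters (weights, means, covariances)
        nk = resp.sum(axis=0)
        weights = nk / n
        means = (resp.T @ X) / nk[:, None]
        for k in range(n_components):
            diff = X - means[k]
            covs[k] = (resp[:, k, None] * diff).T @ diff / nk[k] + 1e-6 * np.eye(d)
        # Step 4: error function - here the change in log-likelihood
        ll = np.log(dens.sum(axis=1)).sum()
        # Step 5 / stop condition (1): change below the threshold
        if abs(ll - prev_ll) < tol:
            break
        prev_ll = ll
    return weights, means, covs
```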

Gaussian Mixture Model (3/4)


Pros:
* Smooth approximations to arbitrarily shaped densities
* Reaches good final clusters from almost all of the initial points tested


Gaussian Mixture Model (4/4)


Goal: cluster 2D data into 3 clusters
Implementation:
* Generate simulated data from a mixture of three 2D Gaussians
* Perform the clustering based on a GMM
The algorithm converged after 25 iterations.
Unlike K-means, which assumes roughly spherical cluster boundaries, the GMM provides soft clustering (see the sketch below).
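A sketch of this experiment using scikit-learn's GaussianMixture (an assumption; the slides use their own implementation). The simulated component means and covariances are made up for illustration.

```python
# Simulate data from three 2D Gaussians and cluster with a GMM (soft assignments).
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)
X = np.vstack([
    rng.multivariate_normal([0, 0], np.eye(2), 200),
    rng.multivariate_normal([5, 5], [[1.0, 0.5], [0.5, 1.0]], 200),
    rng.multivariate_normal([0, 6], [[2.0, 0.0], [0.0, 0.5]], 200),
])

gmm = GaussianMixture(n_components=3, covariance_type="full", max_iter=100)
labels = gmm.fit_predict(X)          # hard cluster labels
posteriors = gmm.predict_proba(X)    # soft (probabilistic) memberships
print("converged after", gmm.n_iter_, "iterations")
```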


Outline

* Introduction
* Unsupervised Learning
* Supervised Learning


Artificial Neural Network (1/5)


Structure of a biological neuron:

McCulloch and Pitts neuron model:
* Net function: u = \sum_{j=1}^{N} w_j y_j + \theta
  - y_1, ..., y_N: inputs, {w_j; 1 \le j \le N}: synaptic weights, \theta: threshold
* Activation function: a = f(u)
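A tiny sketch of the neuron model above; the step activation and the example inputs and weights are assumptions for illustration.

```python
# Net function u = sum_j w_j * y_j + theta, followed by activation a = f(u).
import numpy as np

def neuron(y, w, theta, f=lambda u: np.where(u >= 0, 1, 0)):
    u = np.dot(w, y) + theta     # net function
    return f(u)                  # activation function

a = neuron(y=np.array([1.0, 0.0, 1.0]), w=np.array([0.5, -0.3, 0.8]), theta=-1.0)
```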

Artificial Neural Network (2/5)


Perceptron model

Multilayer Perceptron Model (MLP)


Artificial Neural Network (3/5)


MLP training using the error back-propagation training algorithm
Pros:
* Ability to use the learned knowledge to predict unseen data
* Successfully applied to many challenging real-world problems
Cons:
* Every training example must maximize the information content
* The right values of the network hyper-parameters (# hidden layers, # hidden neurons) must be chosen
* Requires a large amount of training data
* Overtraining (overfitting) problem

Artificial Neural Network (4/5)


Goal: classify the flower images in a self-built database as rose or daisy
Implementation:
* Extract features: the 7 Hu moments
* Construct the ANN


Artificial Neural Network (5/5)


Implementation (continued):
* Divide the images into three sets: 70% for training, 15% for validation, and 15% for testing
* Repeat the training/testing 36 times and compute the confusion matrix (a sketch follows below)

Average recognition rate: 77.19%
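A hedged sketch of this experiment with scikit-learn: the feature files, network size, and label encoding are assumptions; only the 70/15/15 split and the confusion matrix come from the slide.

```python
# Rose/daisy classification from Hu-moment features with a small MLP.
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.metrics import confusion_matrix

# X: 7 Hu-moment features per image, y: 0 = rose, 1 = daisy (assumed encoding/files)
X, y = np.load("hu_features.npy"), np.load("labels.npy")

X_train, X_rest, y_train, y_rest = train_test_split(X, y, train_size=0.70)
X_val, X_test, y_val, y_test = train_test_split(X_rest, y_rest, test_size=0.50)
# X_val / y_val would be used for model selection or early stopping

mlp = MLPClassifier(hidden_layer_sizes=(10,), max_iter=2000)
mlp.fit(X_train, y_train)
print(confusion_matrix(y_test, mlp.predict(X_test)))
```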



Support Vector Machine (1/6)


Given a set of data points {(x_i, y_i)}, i = 1, 2, \ldots, n, where y_i \in \{+1, -1\}:

For y_i = +1: w^T x_i + b > 0
For y_i = -1: w^T x_i + b < 0

With a scale transformation on both w and b, the above is equivalent to:

For y_i = +1: w^T x_i + b \ge 1
For y_i = -1: w^T x_i + b \le -1

[Figure: two classes of points in the (x_1, x_2) plane; one marker denotes +1, the other denotes -1]

Support Vector Machine (2/6)


Margin:

M = (x^+ - x^-) \cdot n = (x^+ - x^-) \cdot \frac{w}{\|w\|} = \frac{2}{\|w\|}

Formulation:

maximize \frac{2}{\|w\|}
such that
  for y_i = +1: w^T x_i + b \ge 1
  for y_i = -1: w^T x_i + b \le -1
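Equivalently (a standard reformulation, not shown on the slide), maximizing 2/\|w\| is usually written as the quadratic program

\min_{w,\,b}\ \tfrac{1}{2}\|w\|^2
\quad \text{subject to} \quad
y_i\,(w^T x_i + b) \ge 1,\quad i = 1, \dots, n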



Support Vector Machine (3/6)


Nonlinear SVM:


Support Vector Machine (4/6)


Kernel trick:

g(x) = w^T \phi(x) + b = \sum_{i \in SV} \alpha_i y_i \, \phi(x_i)^T \phi(x) + b

K(x_i, x_j) \equiv \phi(x_i)^T \phi(x_j)

There is no need to know \phi(x) explicitly, because only dot products of feature vectors are used in both training and testing.
Algorithm (see the sketch below):
* Step 1: Choose a kernel function
* Step 2: Choose a value for C
* Step 3: Solve the quadratic programming problem
* Step 4: Construct the SVM discriminant function from the support vectors
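The four steps map directly onto a library call; here is a sketch with scikit-learn's SVC, where the toy data, the RBF kernel choice, and C = 1.0 are assumptions.

```python
# Kernel SVM following the four steps above.
from sklearn.datasets import make_moons
from sklearn.svm import SVC

X, y = make_moons(n_samples=200, noise=0.2, random_state=0)

clf = SVC(kernel="rbf", C=1.0)   # Steps 1-2: choose the kernel and C
clf.fit(X, y)                    # Step 3: solve the quadratic programming problem
# Step 4: the discriminant function is built from clf.support_vectors_
print(clf.predict(X[:5]), len(clf.support_vectors_))
```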

Support Vector Machine (5/6)


Pros:
* Powerful approach to pattern-recognition problems
* Simple geometric interpretation
Cons:
* SVM only works on real-valued inputs
* As a kernel-based algorithm, it requires choosing an appropriate kernel and its parameters
* Mainly formulated for two-class categorization problems



Support Vector Machine (6/6)


Two-class classification:

Three-class classification:


Hidden Markov Model (1/5)


A doubly stochastic Markov process: the underlying stochastic process is hidden, but it can be observed through another set of stochastic processes.
HMM parameters:
* Number of states: N
* Number of distinct observation symbols per state: M
* State transition probability distribution: A = {a_ij}
* Observation symbol probability distribution: B = {b_j(k)}
* Initial state distribution: \pi = {\pi_i}
HMM model: \lambda = (A, B, \pi)



Hidden Markov Model (2/5)


Three basic HMM problems (a forward-algorithm sketch follows below):
* Evaluation: Given the observation sequence O = O_1 O_2 ... O_T and a model \lambda = (A, B, \pi), how do we efficiently compute the likelihood P(O | \lambda)? Solved by the forward-backward algorithm.
* Analysis: Given the observation sequence O = O_1 O_2 ... O_T and a model \lambda = (A, B, \pi), how do we find the optimal state sequence Q = q_1 q_2 ... q_T? Solved by the Viterbi algorithm.
* Training: Given the observation sequence O = O_1 O_2 ... O_T, how do we adjust the model parameters \lambda = (A, B, \pi) to maximize P(O | \lambda)? Solved by the Baum-Welch algorithm.
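As a concrete illustration of the evaluation problem, here is a minimal forward-algorithm sketch for a discrete-observation HMM; the toy A, B, \pi, and observation sequence are made-up values, not data from the slides.

```python
# Forward algorithm: compute P(O | lambda) for a discrete-observation HMM.
import numpy as np

def forward_likelihood(A, B, pi, obs):
    """A: N x N transition matrix, B: N x M emission matrix, pi: initial
    distribution, obs: observation symbol indices. Returns P(O | lambda)."""
    alpha = pi * B[:, obs[0]]              # initialization
    for o in obs[1:]:
        alpha = (alpha @ A) * B[:, o]      # induction step
    return alpha.sum()                     # termination

A = np.array([[0.7, 0.3], [0.4, 0.6]])
B = np.array([[0.5, 0.4, 0.1], [0.1, 0.3, 0.6]])
pi = np.array([0.6, 0.4])
print(forward_likelihood(A, B, pi, obs=[0, 1, 2, 1]))
```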

Hidden Markov Model (3/5)


HMM types:
* Ergodic model
* Left-to-right model
* Parallel left-to-right model


Hidden Markov Model (4/5)


Pros:
* Able to deal with classification or modeling of temporal signals with time-varying processes
* Converts large variations of temporal data into one unified signal-generation model
* Easily extended to deal with larger-scale problems
Cons:
* Requires a lot of training data
* Requires high-precision computation
* No direct interpretation of the implication of each state after the model is trained

Hidden Markov Model (5/5)


Application of HMM in human action recognition:
* Training phase: train one HMM for each action
* Classification phase: for each model, calculate the probability that the model generates the observed sequence, as a measure of the likelihood between the action model and the input feature-vector sequence; the action whose model gives the maximum likelihood is selected (see the sketch below).
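A sketch of this train-then-score scheme using the third-party hmmlearn package (an assumption; the slides do not name a library), with hypothetical action labels and stand-in feature sequences.

```python
# Train one HMM per action, then classify by maximum log-likelihood.
import numpy as np
from hmmlearn.hmm import GaussianHMM

actions = ["walk", "jump", "wave"]                             # hypothetical labels
train_seqs = {a: np.random.randn(200, 10) for a in actions}    # stand-in features

# Training phase: one HMM per action
models = {a: GaussianHMM(n_components=5).fit(train_seqs[a]) for a in actions}

# Classification phase: score the observed sequence under every action model
test_seq = np.random.randn(60, 10)
scores = {a: m.score(test_seq) for a, m in models.items()}
print(max(scores, key=scores.get))                             # most likely action
```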


Demo: Experiments on the HumanEva database (6/6)

* DEMO


Outline

* Introduction
* Unsupervised Learning
* Supervised Learning


Hoang Le Uyen Thuc hluthuc@dut.udn.vn


