Sargur Srihari
Topics
• Error functions derived from a probabilistic formulation: regression, binary classification, multiclass classification
• Parameter optimization: geometry of the error surface, stationary points, and gradient descent
1. Regression
• Output is a single target variable t that can take any real value
• Assuming t is Gaussian distributed with an x-dependent mean
  p(t | x, w) = N(t | y(x, w), β⁻¹)
• Likelihood function (a NumPy sketch follows)
  p(t | x, w, β) = ∏_{n=1}^{N} N(t_n | y(x_n, w), β⁻¹)
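A minimal NumPy sketch of the corresponding negative log-likelihood, assuming the network outputs y(x_n, w) have already been computed; the linear model used to generate data below is purely a hypothetical stand-in for y(x, w).

```python
import numpy as np

def neg_log_likelihood(y_pred, t, beta):
    """-ln p(t | x, w, beta) for the Gaussian model
    p(t_n | x_n, w) = N(t_n | y(x_n, w), beta^-1)."""
    n = len(t)
    sq_err = np.sum((y_pred - t) ** 2)
    return 0.5 * beta * sq_err - 0.5 * n * np.log(beta) + 0.5 * n * np.log(2 * np.pi)

# Hypothetical example: a toy linear "network" y(x, w) = x @ w
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
w = np.array([0.5, -1.0, 2.0])
t = X @ w + rng.normal(scale=0.1, size=100)
print(neg_log_likelihood(X @ w, t, beta=100.0))
```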
Regression Error Function
• Having found w_ML, the value of β_ML can also be found, using
  1/β_ML = (1/N) ∑_{n=1}^{N} { y(x_n, w_ML) − t_n }²
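Under the same assumption that y_pred holds the fitted outputs y(x_n, w_ML), this estimate is a one-line computation; the sketch below is illustrative only.

```python
import numpy as np

def beta_ml(y_pred, t):
    """Maximum-likelihood precision: 1/beta_ML = (1/N) sum_n (y(x_n, w_ML) - t_n)^2."""
    return 1.0 / np.mean((y_pred - t) ** 2)
```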
2. Binary Classification
• Single target variable t, where t = 1 denotes C1 and t = 0 denotes C2
• Consider a network with a single output whose activation function is the logistic sigmoid
  y = σ(a) = 1 / (1 + exp(−a))
  so that 0 < y(x, w) < 1
• Interpret y(x, w) as the conditional probability p(C1|x)
• Conditional distribution of targets given inputs is then a Bernoulli distribution
  p(t | x, w) = y(x, w)^t {1 − y(x, w)}^(1−t)
• Taking the negative log-likelihood gives the cross-entropy error function (sketched below)
  E(w) = −∑_{n=1}^{N} { t_n ln y_n + (1 − t_n) ln(1 − y_n) }
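A short sketch, assuming NumPy, of the logistic sigmoid and the resulting cross-entropy error; the eps clipping is an added numerical guard, not something stated in the slides.

```python
import numpy as np

def sigmoid(a):
    """Logistic sigmoid y = 1 / (1 + exp(-a)), so 0 < y < 1."""
    return 1.0 / (1.0 + np.exp(-a))

def binary_cross_entropy(y, t, eps=1e-12):
    """E(w) = -sum_n [ t_n ln y_n + (1 - t_n) ln(1 - y_n) ]."""
    y = np.clip(y, eps, 1.0 - eps)  # guard against log(0)
    return -np.sum(t * np.log(y) + (1.0 - t) * np.log(1.0 - y))
```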
3. Multiclass Classification
• Each input is assigned to one of K classes
• Binary target variables t_k ∈ {0, 1} use a 1-of-K coding scheme
• Network outputs are interpreted as y_k(x, w) = p(t_k = 1 | x) (see the softmax sketch below)
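A comparable sketch for the multiclass case, assuming a softmax output activation (the form given later in the Discussion Overview) and 1-of-K coded targets T; the max-shift is an added numerical-stability detail.

```python
import numpy as np

def softmax(a):
    """y_k = exp(a_k) / sum_j exp(a_j), computed row-wise."""
    a = a - a.max(axis=-1, keepdims=True)  # shift for numerical stability
    e = np.exp(a)
    return e / e.sum(axis=-1, keepdims=True)

def multiclass_cross_entropy(Y, T, eps=1e-12):
    """E(w) = -sum_n sum_k t_kn ln y_k(x_n, w), with 1-of-K targets T."""
    return -np.sum(T * np.log(np.clip(Y, eps, 1.0)))
```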
Parameter Optimization
• Task: find a weight vector w which minimizes the chosen error function E(w)
• Geometrical picture: the error function is a surface sitting over weight space
• E(w) has a highly nonlinear dependence on the weights and biases
• Points at which the gradient vanishes, ∇E(w) = 0, are stationary points: minima, maxima, and saddle points
• The error surface is complex, so there is no hope of finding an analytical solution to the equation ∇E(w) = 0; it must be solved iteratively (a numerical sketch follows)
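To illustrate checking whether a point is stationary when no closed form is available, here is a finite-difference sketch of ∇E, assuming a generic error function E; the quadratic E below is purely hypothetical.

```python
import numpy as np

def numerical_gradient(E, w, h=1e-6):
    """Central-difference estimate of grad E(w)."""
    g = np.zeros_like(w)
    for i in range(w.size):
        d = np.zeros_like(w)
        d[i] = h
        g[i] = (E(w + d) - E(w - d)) / (2 * h)
    return g

# Hypothetical error with its minimum (a stationary point) at w = 0
E = lambda w: 0.5 * np.sum(w ** 2)
print(numerical_gradient(E, np.array([1.0, -2.0])))  # far from zero => not stationary
```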
Discussion Overview
• Error Functions
  • Linear Regression: y(x, w) = wᵀφ(x) = ∑_{j=1}^{M} w_j φ_j(x)
    E(w) = (1/2) ∑_{n=1}^{N} { y(x_n, w) − t_n }²
  • Binary Classification: y(x, w) = σ(wᵀφ(x)) = 1 / (1 + exp(−wᵀφ(x)))
    E(w) = −∑_{n=1}^{N} { t_n ln y_n + (1 − t_n) ln(1 − y_n) }
  • Multiclass Classification: y_k(x, w) = exp(w_kᵀφ(x)) / ∑_j exp(w_jᵀφ(x))
    E(w) = −∑_{n=1}^{N} ∑_{k=1}^{K} t_{kn} ln y_k(x_n, w)
• Near a minimum, E(w) is approximated by a quadratic (see the sketch below)
  E(w) ≅ E(w*) + (1/2)(w − w*)ᵀ H (w − w*)
  where H = ∇∇E is a W × W matrix; the contours of constant error are ellipses with axes aligned with the eigenvectors u_i of H, whose lengths are inversely proportional to the square roots of the corresponding eigenvalues
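A sketch, with a hypothetical 2 × 2 Hessian, of this quadratic approximation and of how the contour axis lengths scale with the eigenvalues of H.

```python
import numpy as np

def quadratic_approx(E_wstar, H, w, w_star):
    """E(w) ~= E(w*) + 0.5 (w - w*)^T H (w - w*) near a minimum w*."""
    d = w - w_star
    return E_wstar + 0.5 * d @ H @ d

# Hypothetical 2x2 Hessian; contour axes lie along its eigenvectors u_i
H = np.array([[4.0, 1.0],
              [1.0, 2.0]])
eigvals, eigvecs = np.linalg.eigh(H)
print("axis lengths proportional to", 1.0 / np.sqrt(eigvals))
print(quadratic_approx(0.0, H, np.array([0.1, -0.2]), np.zeros(2)))
```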
[Figure: contours of E(w) in the neighborhood of a minimum w*]
Gradient descent update
  w^(τ+1) = w^(τ) − η ∇E(w^(τ)), where η > 0 is the learning rate (a sketch follows)
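A minimal sketch of this update rule, assuming a callable grad_E that returns ∇E(w); the quadratic example is hypothetical.

```python
import numpy as np

def gradient_descent(grad_E, w0, eta=0.1, n_steps=100):
    """Iterate w <- w - eta * grad E(w) for a fixed number of steps."""
    w = np.asarray(w0, dtype=float)
    for _ in range(n_steps):
        w = w - eta * grad_E(w)
    return w

# Hypothetical example: E(w) = 0.5 ||w||^2 has gradient w and minimum at w = 0
print(gradient_descent(lambda w: w, w0=[1.0, -2.0]))
```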
Summary
• Neural networks have many parameters, which can be determined in a manner analogous to linear-regression parameters
• A probabilistic formulation leads to appropriate error functions for regression, binary classification, and multiclass classification