Gaussian process history
Nonlinear regression
Consider the problem of nonlinear regression:
You want to learn a function f with error bars from data D = {X, y}
p(f|D) = p(f) p(D|f) / p(D)
What is a Gaussian process?
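One way to make the definition concrete is to draw samples from a GP prior: any finite grid of inputs yields a joint Gaussian whose covariance matrix is given by the kernel. A minimal sketch, assuming an SE covariance with unit amplitude and lengthscale (all choices here are illustrative):

```python
import numpy as np

# A GP is fully specified by a mean and covariance function; evaluating the
# covariance on a finite grid gives a Gaussian we can sample from.
np.random.seed(1)
x = np.linspace(-5, 5, 100)
K = np.exp(-0.5 * (x[:, None] - x[None, :]) ** 2)   # SE covariance matrix
L = np.linalg.cholesky(K + 1e-8 * np.eye(100))      # jitter for numerical stability
samples = L @ np.random.randn(100, 3)               # three draws from the prior
```

Each column of `samples` is one smooth random function evaluated on the grid.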
Covariances
K_ij = K(x_i, x_j)
Ensures consistency
Squared exponential (SE) covariance
" #
0
2
1 x x
K(x, x0) = 02 exp
2
6
Matern class of covariances
K(x, x') = (1 / (Γ(ν) 2^{ν−1})) (√(2ν) |x − x'| / ℓ)^ν K_ν(√(2ν) |x − x'| / ℓ)
Stationary, isotropic
ν → ∞: SE covariance
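For half-integer ν the Bessel-function form above reduces to a closed expression. A sketch for ν = 3/2, with unit amplitude assumed and the function name purely illustrative:

```python
import numpy as np

# Matern covariance for nu = 3/2: K(r) = (1 + sqrt(3) r / l) exp(-sqrt(3) r / l)
def matern32(A, B, lengthscale=1.0):
    r = np.abs(A[:, None] - B[None, :])
    z = np.sqrt(3.0) * r / lengthscale
    return (1.0 + z) * np.exp(-z)

x = np.linspace(0, 1, 10)
K = matern32(x, x)   # unit diagonal, positive semi-definite
```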
Nonstationary covariances
Constructing new covariances from old
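Two standard constructions are sums and products of existing kernels, both of which remain valid covariance functions. A quick numerical check (lengthscales chosen purely for illustration):

```python
import numpy as np

# Sums and products of valid covariances are again valid covariances.
def se(A, B, ell=1.0):
    return np.exp(-0.5 * (A[:, None] - B[None, :]) ** 2 / ell**2)

x = np.linspace(0, 3, 15)
K_sum = se(x, x, ell=0.5) + se(x, x, ell=2.0)    # sum of two SE kernels
K_prod = se(x, x, ell=0.5) * se(x, x, ell=2.0)   # elementwise (Schur) product

# Both stay symmetric and positive semi-definite
eigs_sum = np.linalg.eigvalsh(K_sum)
eigs_prod = np.linalg.eigvalsh(K_prod)
```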
Prediction with GPs
GP regression with Gaussian noise
p(y|f) = N(f, σ²I)
p(y) = N(0, K + σ²I)
Prediction
N training input and output pairs (X, y), and T test inputs X_T
Prediction
μ_T = K_{TN} [K_N + σ²I]⁻¹ y
σ_T² = K_T − K_{TN} [K_N + σ²I]⁻¹ K_{NT} + σ²
Prediction cost per test case is O(N) for the mean and O(N²) for the variance
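The prediction equations above can be sketched in NumPy. This is a minimal sketch with assumed fixed hyperparameters (unit SE amplitude and lengthscale, noise 0.1), not a full implementation:

```python
import numpy as np

def se_kernel(A, B, lengthscale=1.0, amplitude=1.0):
    """Squared-exponential covariance between two sets of 1-D inputs."""
    d2 = (A[:, None] - B[None, :]) ** 2
    return amplitude**2 * np.exp(-0.5 * d2 / lengthscale**2)

def gp_predict(X, y, Xs, noise=0.1):
    """Posterior mean and variance at test inputs Xs, per the slide's equations."""
    K = se_kernel(X, X) + noise**2 * np.eye(len(X))
    L = np.linalg.cholesky(K)                       # O(N^3), done once
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y))
    Ks = se_kernel(Xs, X)                           # T x N cross-covariance K_TN
    mean = Ks @ alpha                               # O(N) per test case
    v = np.linalg.solve(L, Ks.T)
    var = np.diag(se_kernel(Xs, Xs)) - np.sum(v**2, axis=0) + noise**2
    return mean, var

np.random.seed(0)
X = np.linspace(0, 5, 20)
y = np.sin(X) + 0.1 * np.random.randn(20)
mean, var = gp_predict(X, y, np.array([2.5]))
```

Precomputing the Cholesky factor once amortizes the O(N³) cost over all test points.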
Determination of hyperparameters
L = −log p(y|θ) = (1/2) log det C(θ) + (1/2) yᵀ C(θ)⁻¹ y + (N/2) log(2π)
where C = K + σ²I
Gradient based optimization
Gradients:
∂L/∂θ_i = (1/2) tr(C⁻¹ ∂C/∂θ_i) − (1/2) yᵀ C⁻¹ (∂C/∂θ_i) C⁻¹ y
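The gradient formula can be verified numerically. A sketch specialized to the noise variance s2 (so that ∂C/∂s2 = I), checked against a finite difference of the negative log marginal likelihood; all sizes and seeds are illustrative:

```python
import numpy as np

np.random.seed(0)
N = 10
X = np.random.rand(N)
K = np.exp(-0.5 * (X[:, None] - X[None, :]) ** 2)   # SE kernel, unit scales
y = np.random.randn(N)

def nll(s2):
    """Negative log marginal likelihood with C = K + s2*I."""
    C = K + s2 * np.eye(N)
    _, logdet = np.linalg.slogdet(C)
    return 0.5 * logdet + 0.5 * y @ np.linalg.solve(C, y) + 0.5 * N * np.log(2 * np.pi)

def grad_s2(s2):
    """Analytic gradient: 0.5 tr(C^-1) - 0.5 y^T C^-1 C^-1 y, since dC/ds2 = I."""
    C = K + s2 * np.eye(N)
    Cinv = np.linalg.inv(C)
    a = Cinv @ y
    return 0.5 * np.trace(Cinv) - 0.5 * a @ a

eps = 1e-6
fd = (nll(0.5 + eps) - nll(0.5 - eps)) / (2 * eps)   # central finite difference
```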
Automatic relevance determination¹

¹ Neal, 1996. MacKay, 1998.
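ARD gives each input dimension its own lengthscale, so dimensions whose learned lengthscales grow large effectively drop out of the fit. A sketch of an ARD squared-exponential kernel (function name and shapes are illustrative):

```python
import numpy as np

# ARD SE kernel: one lengthscale per input dimension. A very large
# lengthscale (here the third dimension) makes that dimension irrelevant.
def ard_se_kernel(A, B, lengthscales, amplitude=1.0):
    # A: (N, D), B: (M, D), lengthscales: (D,)
    diff = (A[:, None, :] - B[None, :, :]) / lengthscales
    return amplitude**2 * np.exp(-0.5 * np.sum(diff**2, axis=-1))

X = np.random.rand(5, 3)
K = ard_se_kernel(X, X, lengthscales=np.array([0.5, 1.0, 100.0]))
```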
Relationship to generalized linear regression
f(x) = Σ_{m=1}^{M} w_m φ_m(x) = wᵀ φ(x)
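The equivalence can be checked numerically: with Gaussian weights w ~ N(0, I), the implied covariance of f = Φw is ΦΦᵀ, i.e. a GP with kernel φ(x)ᵀφ(x'). Polynomial features are chosen here purely for illustration:

```python
import numpy as np

# Linear-in-the-weights regression with Gaussian weights is a GP:
# cov(f(x), f(x')) = phi(x)^T phi(x') when w ~ N(0, I).
np.random.seed(0)
x = np.linspace(-1, 1, 7)
Phi = np.vander(x, 4, increasing=True)    # polynomial features phi(x)
K_implied = Phi @ Phi.T                   # the implied GP covariance

# Monte Carlo check: empirical covariance of f = Phi w approaches K_implied
W = np.random.randn(4, 200000)
F = Phi @ W
K_emp = F @ F.T / W.shape[1]
```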
Relationship to generalized linear regression
f(x) = b + Σ_{j=1}^{N_H} v_j h(x; u_j)
Relationship to spline models
GP classification
GP classification
High dimensional data points are feature vectors derived from pose
information from mo-cap data.
⁴ Grochow, Martin, Hertzmann, Popovic, 2004. Style-based inverse kinematics. Demos available from http://grail.cs.washington.edu/projects/styleik/
Sparse GP approximations
Problem for large data sets: training a GP is O(N³), prediction is O(N²) per test case

⁵ Lawrence et al., 2003.
Two stage generative model
Pseudo-input prior: p(f̄ | X̄) = N(0, K_M)
Two stage generative model
Conditional: p(f | f̄) = N(μ, Σ)
μ = K_NM K_M⁻¹ f̄
Σ = K_N − K_NM K_M⁻¹ K_MN

3. Draw f conditioned on f̄:
μ_n = K_nM K_M⁻¹ f̄
λ_n = K_nn − K_nM K_M⁻¹ K_Mn

Approximate: p(f | f̄) ≈ ∏_n p(f_n | f̄) = N(μ, Λ), Λ = diag(λ)

Minimum KL: min_{q_n} KL[ p(f | f̄) ‖ ∏_n q_n(f_n) ]
Sparse pseudo-input Gaussian processes (SPGP)⁶

Integrate out f̄ to obtain the SPGP prior: p(f) = ∫ df̄ [∏_n p(f_n | f̄)] p(f̄)

GP prior: N(0, K_N)
SPGP/FITC prior: p(f) = N(0, K_NM K_M⁻¹ K_MN + Λ)
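The FITC prior covariance can be assembled directly: a rank-M term from the pseudo-inputs plus a diagonal correction that restores the exact GP variance on the diagonal. A sketch assuming a unit-scale SE kernel and M = 8 pseudo-inputs (all choices illustrative):

```python
import numpy as np

# SPGP/FITC prior covariance: low-rank part plus diagonal correction.
def se(A, B):
    return np.exp(-0.5 * (A[:, None] - B[None, :]) ** 2)

np.random.seed(0)
X = np.sort(np.random.rand(50)) * 10       # N = 50 training inputs
Xbar = np.linspace(0, 10, 8)               # M = 8 pseudo-inputs

KN = se(X, X)
KNM = se(X, Xbar)
KM = se(Xbar, Xbar) + 1e-8 * np.eye(8)     # jitter for stability

Q = KNM @ np.linalg.solve(KM, KNM.T)       # rank-M part K_NM K_M^-1 K_MN
Lam = np.diag(np.diag(KN - Q))             # FITC diagonal correction
fitc_prior_cov = Q + Lam                   # diagonal now matches the full GP
```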
1D demo
[Figure: 1D demo; learned hyperparameters shown: amplitude, lengthscale, noise]
Future GP directions