Pawel Herman
CB, KTH
Outline
1 Generalisation in Neural Networks
2 Methods to Improve Generalisation
3 Regularisation Concept
4 Bayesian Framework
What is generalisation?
Generalisation is the capacity to apply learned knowledge to new situations
the capability of a learning machine to perform well on unseen data
there is a tacit assumption that the test set is drawn from the same distribution as the training one
Overfitting phenomenon
$X \to T : \; T = f(X) + \varepsilon$

The risk $R[F] = E_D\left[(t - F(x,w))^2\right]$ reduces to a sum whose first term is independent of the NN model $F$:

$E_D\left[(t - f(x))^2\right] + E_D\left[(F(x,w) - f(x))^2\right]$
Bias-variance illustration
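As a numerical illustration (not from the slides), the following sketch estimates the bias and variance of the model-dependent term for polynomial fits on data generated as $T = f(X) + \varepsilon$. All names and settings are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

def f(x):                              # the true underlying function
    return np.sin(2 * np.pi * x)

x_train = np.linspace(0, 1, 20)
x_test = np.linspace(0, 1, 200)
degree, n_datasets, noise_std = 9, 200, 0.3

# Refit the same model on many noisy training sets drawn from T = f(X) + eps
preds = np.empty((n_datasets, x_test.size))
for k in range(n_datasets):
    t = f(x_train) + rng.normal(0.0, noise_std, x_train.size)
    coef = np.polyfit(x_train, t, degree)
    preds[k] = np.polyval(coef, x_test)

bias2 = np.mean((preds.mean(axis=0) - f(x_test)) ** 2)  # (E_D[F] - f)^2
variance = np.mean(preds.var(axis=0))                   # E_D[(F - E_D[F])^2]
print(f"bias^2 ~ {bias2:.4f}, variance ~ {variance:.4f}")
```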
Adding a noise term (zero mean) to the input training data has been reported to enhance the generalisation capabilities of MLPs
The noise is expected to smear out the data points and thus prevent the network from fitting the original data too precisely
In practice, a new noise vector should be added each time a data point is presented for network learning
Alternatively, sample-by-sample training with data re-ordering for every epoch (see the sketch below)
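A minimal sketch of this scheme, assuming a network object with a hypothetical `train_step` method; fresh noise is drawn at every presentation so the network never sees the exact same input twice:

```python
import numpy as np

def train_with_input_noise(net, X, T, noise_std=0.1, epochs=100, rng=None):
    """Sample-by-sample training with zero-mean Gaussian input noise."""
    rng = rng or np.random.default_rng()
    n = len(X)
    for epoch in range(epochs):
        order = rng.permutation(n)           # re-order data every epoch
        for i in order:
            # a new noise vector for each presentation of the data point
            x_noisy = X[i] + rng.normal(0.0, noise_std, size=X[i].shape)
            net.train_step(x_noisy, T[i])    # one backprop step (interface assumed)
    return net
```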
Early stopping
An additional data set is required: a validation set (split off from the original data)
The network is trained with BP until the error monitored on the validation set reaches a minimum; training beyond that point only fits noise (see the sketch below)
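A sketch of the stopping rule, assuming a network object with hypothetical `train_one_epoch`, `error`, and weight get/set methods; the `patience` parameter is an assumption to tolerate a noisy validation curve:

```python
import numpy as np

def train_with_early_stopping(net, train_set, val_set, max_epochs=1000, patience=10):
    """Train with BP; stop when the validation error stops improving."""
    best_err, best_weights, wait = np.inf, net.get_weights(), 0
    for epoch in range(max_epochs):
        net.train_one_epoch(train_set)       # one BP pass over the training set
        val_err = net.error(val_set)         # error monitored on the validation set
        if val_err < best_err:
            best_err, best_weights, wait = val_err, net.get_weights(), 0
        else:
            wait += 1
            if wait >= patience:             # past the minimum: only noise fitting
                break
    net.set_weights(best_weights)            # restore weights at the validation minimum
    return net
```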
Bootstrap
Bootstrap estimate:

$E_{true}^{boot} = E_{emp}^{T} + B^{boot}$, with $B^{boot} = \frac{1}{K} \sum_{k} \left( E_{emp}^{T_k \to T} - E_{emp}^{T_k} \right)$

Alternatively, the "0.632" estimator can be used:

$E_{true}^{0.632} = 0.368\, E_{emp}^{T} + 0.632\, \frac{1}{K} \sum_{k} E_{emp}^{T_k \to T \setminus T_k}$

Here $E_{emp}^{A \to B}$ denotes the empirical error of a model trained on set $A$ and evaluated on set $B$; $T_k$ is the $k$-th bootstrap sample and $T \setminus T_k$ its out-of-bag complement (see the sketch below)
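A sketch of both estimators, assuming a user-supplied callback `train_and_eval(train_idx, test_idx)` that trains a model on the first index set and returns its empirical error on the second:

```python
import numpy as np

def bootstrap_error_estimates(train_and_eval, n_samples, K=50, rng=None):
    """Return (E_true^boot, E_true^0.632) from K bootstrap replicates."""
    rng = rng or np.random.default_rng()
    all_idx = np.arange(n_samples)
    e_emp_T = train_and_eval(all_idx, all_idx)       # E_emp^T: train and test on all data
    bias_terms, oob_terms = [], []
    for _ in range(K):
        tk = rng.integers(0, n_samples, size=n_samples)  # bootstrap sample T_k
        oob = np.setdiff1d(all_idx, tk)                  # out-of-bag points T \ T_k
        e_tk_to_T = train_and_eval(tk, all_idx)          # E_emp^{T_k -> T}
        e_tk = train_and_eval(tk, tk)                    # E_emp^{T_k}
        bias_terms.append(e_tk_to_T - e_tk)
        if oob.size:                                     # skip the rare empty OOB set
            oob_terms.append(train_and_eval(tk, oob))    # E_emp^{T_k -> T \ T_k}
    e_boot = e_emp_T + np.mean(bias_terms)               # E_emp^T + B^boot
    e_632 = 0.368 * e_emp_T + 0.632 * np.mean(oob_terms)
    return e_boot, e_632
```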
n-fold cross-validation
$E_{true}^{CV} = \frac{1}{n} \sum_{k} E_{emp}^{T \setminus T_k \to T_k}$

where the data are split into $n$ folds $T_k$, and each model is trained on $T \setminus T_k$ and tested on the held-out fold $T_k$ (see the sketch below)
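A sketch using the same assumed `train_and_eval(train_idx, test_idx)` callback as above:

```python
import numpy as np

def cv_error_estimate(train_and_eval, n_samples, n_folds=10, rng=None):
    """n-fold cross-validation estimate: mean of E_emp^{T \\ T_k -> T_k}."""
    rng = rng or np.random.default_rng()
    folds = np.array_split(rng.permutation(n_samples), n_folds)
    errors = []
    for k in range(n_folds):
        test_idx = folds[k]                               # held-out fold T_k
        train_idx = np.concatenate(
            [folds[j] for j in range(n_folds) if j != k]  # training set T \ T_k
        )
        errors.append(train_and_eval(train_idx, test_idx))
    return np.mean(errors)
```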
Regularisation Concept
$\tilde{E} = E + \lambda\, \Omega$

A weight elimination penalty:

$\Omega = \sum_i \frac{(w_i / w_0)^2}{1 + (w_i / w_0)^2}$

$w_0$ is a scale parameter
acts as a variable selection algorithm
it favours a few large weights over a number of small ones (see the sketch below)
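A minimal sketch of the penalty and its gradient, ready to be added to the data error and its gradient during BP; the function name and default values are illustrative:

```python
import numpy as np

def weight_elimination(w, w0=1.0, lam=0.01):
    """Return lambda * Omega and its gradient w.r.t. the weight vector w."""
    r2 = (w / w0) ** 2
    omega = np.sum(r2 / (1.0 + r2))          # saturates to 1 per large weight
    # d Omega / d w_i = (2 w_i / w0^2) / (1 + (w_i/w0)^2)^2
    grad = (2.0 * w / w0**2) / (1.0 + r2) ** 2
    return lam * omega, lam * grad
```

Because the per-weight term saturates at 1, large weights pay a roughly constant cost while small weights are pushed towards zero, which is what drives the variable selection behaviour.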
$P(D \,|\, w) = \frac{\exp(-\beta E_D)}{Z_D(\beta)}$

This corresponds to a Gaussian noise model of the target data: $p(t \,|\, x, w) \propto \exp\left(-\beta\, (y(x,w) - t)^2\right)$

$\alpha$ and $\beta$ are hyperparameters: $\alpha$ describes the distribution of the network parameters (the weight prior) and $\beta$ the noise on the target data
Bayesian Framework
with $M(w) = \alpha E_w + \beta E_D = \alpha \sum_j w_j^2 + \beta \sum_i \left(y(x_i, w) - t_i\right)^2$

The hyperparameters are re-estimated iteratively:

$\alpha^{(n+1)} = \frac{\gamma^{(n)}}{2 \sum_j \left(w_j^{(n)}\right)^2}, \qquad \beta^{(n+1)} = \frac{N - \gamma^{(n)}}{2 \sum_{i=1}^{N} \left(y(x_i, w^{(n)}) - t_i\right)^2}$

where $\gamma$ is the effective number of well-determined parameters (see the sketch below)
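A sketch of the re-estimation loop under the slides' conventions ($E_w = \sum w_j^2$, $E_D = \sum (y_i - t_i)^2$, so the Gaussian approximation has Hessian $\beta \nabla\nabla E_D + 2\alpha I$). The Hessian of $E_D$ is assumed precomputed; in a full implementation the weights would be re-trained to the new minimum of $M(w)$ between updates:

```python
import numpy as np

def reestimate_hyperparameters(w, residuals, hess_ed, alpha0=1.0, n_iters=10):
    """Iterative alpha/beta re-estimation (evidence approximation sketch).
    residuals: y(x_i, w) - t_i; hess_ed: Hessian of E_D at the current w."""
    N = len(residuals)
    e_w = np.sum(w ** 2)                       # E_w = sum w_j^2
    e_d = np.sum(residuals ** 2)               # E_D = sum (y_i - t_i)^2
    alpha, beta = alpha0, N / (2 * e_d)        # crude starting point for beta
    for _ in range(n_iters):
        lam = beta * np.linalg.eigvalsh(hess_ed)   # eigenvalues of beta * Hess(E_D)
        gamma = np.sum(lam / (lam + 2 * alpha))    # well-determined parameters
        alpha = gamma / (2 * e_w)                  # alpha^(n+1) = gamma / (2 E_w)
        beta = (N - gamma) / (2 * e_d)             # beta^(n+1) = (N - gamma) / (2 E_D)
    return alpha, beta, gamma
```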
For a uniform prior distribution, $P\left(w_{opt} \,|\, H_i\right) = \frac{1}{\Delta w_{prior}}$

$\text{Occam factor} = \frac{\Delta w_{posterior}}{\Delta w_{prior}}$
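A toy numerical example (all values illustrative, not from the slides): the evidence for a model is approximately its best-fit likelihood scaled down by the Occam factor, so a model whose posterior occupies a small fraction of its prior range is penalised:

```python
# Toy numbers: prior allows weights in a range of width dw_prior,
# the posterior concentrates in a range of width dw_posterior.
dw_prior, dw_posterior = 10.0, 0.5
best_fit_likelihood = 0.8                 # P(D | w_opt, H_i), assumed value

occam_factor = dw_posterior / dw_prior    # = 0.05
evidence = best_fit_likelihood * occam_factor   # P(D | H_i) ~ best fit x Occam factor
print(f"Occam factor = {occam_factor:.3f}, evidence ~ {evidence:.3f}")
```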