
# MODULE-IV

## STATISTICAL METHODS IN ANN

Module 4 covers Statistical Methods: Boltzmann training, Cauchy training, artificial specific heat methods, and applications to general non-linear optimization problems.

## Statistical Methods are used for

- Training an ANN
- Producing output from a trained network

## Training Methods

- Deterministic training methods
- Statistical training methods

## Deterministic Training Methods

A deterministic method follows a step-by-step procedure. Weights are changed based on their current values, and also based on the desired output and the actual output.
E.g.: the Perceptron training algorithm, the Back Propagation algorithm, etc.

## Statistical Training Methods

A statistical method makes random changes to the weights and retains only those changes which result in improvements.

## GENERAL PROCEDURE (FOR STATISTICAL TRAINING METHODS)

- Apply a set of inputs and compute the resulting output.
- Compare the result with the target and find the error. The objective of the training is to minimize this error.
- Select a weight at random and adjust it by a small random amount.
- If the change improves the objective function, retain it. Otherwise, return the weight to its previous value.
- Repeat these steps until the error is acceptably small (see the sketch after this list).
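The loop below is a minimal sketch of this procedure, assuming a single-layer linear network with a mean-squared-error objective; the step size, iteration count, and stopping threshold are illustrative choices, not values from these notes.

```python
import numpy as np

def statistical_train(X, targets, n_steps=10000, step_size=0.1, tol=1e-3, seed=0):
    """Random-perturbation training: adjust one weight at a time and keep
    the change only if the mean squared error improves."""
    rng = np.random.default_rng(seed)
    W = rng.normal(scale=0.5, size=(X.shape[1], targets.shape[1]))

    def error(W):
        # Objective function: mean squared error between outputs and targets.
        return np.mean((X @ W - targets) ** 2)

    best = error(W)
    for _ in range(n_steps):
        i = rng.integers(W.shape[0])            # select a weight at random
        j = rng.integers(W.shape[1])
        delta = rng.normal(scale=step_size)     # small random adjustment
        W[i, j] += delta
        new = error(W)
        if new < best:                          # retain the change if it improves
            best = new
        else:                                   # otherwise restore the previous value
            W[i, j] -= delta
        if best < tol:
            break
    return W, best
```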

## The Local Minima Problem

The objective function minimization problem can get trapped in a poor solution.

[Figure: objective function plotted against the weight, showing a local minimum at point A and the global minimum at point B.]

If the objective function is at point A and the random weight changes are small, then the weight stays trapped near A. The superior weight setting at point B will never be found, and the network settles into the local minimum at A instead of the global minimum at B.

If the random weight changes are large, both points A and B are visited frequently, but so is every other point. The weight changes so drastically that it never settles at the desired point.

## Solution & Explanation

Statistical methods overcome the local minima problem by starting with large random weight changes and gradually reducing their average size.

Example: let the figure represent a ball on a surface inside a box. If the box is shaken violently, the ball moves rapidly from one side to the other; the probability of occupying any point on the surface is equal for all points. If the violence of the shaking is gradually reduced, the ball eventually settles at the lowest point on the surface.

The ANN is trained in the same way, through random weight adjustments: a weight change that improves the objective function is retained, and the average step size is then gradually reduced to reach the global minimum.

## Annealing [Boltzmann Law]

Annealing: if a metal is raised to a temperature above its melting point, the atoms are in violent random motion. The atoms always tend to reach a minimum energy state. As the metal is gradually cooled, the atoms settle into the minimum possible energy state corresponding to each temperature.

The energy distribution follows the Boltzmann law:

$$P(e) \propto \exp(-e / kT)$$

where P(e) is the probability that the system is in a state with energy e, k is Boltzmann's constant, and T is the temperature.
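The snippet below is a small illustration of how this law behaves, with Boltzmann's constant folded into the temperature and an arbitrary energy value; the numbers are purely illustrative.

```python
import math

def boltzmann_prob(e, T):
    """Relative probability of a state with energy e at temperature T (k taken as 1)."""
    return math.exp(-e / T)

# At high temperature, high-energy states are nearly as likely as low-energy ones;
# as T falls, the system is driven strongly toward minimum-energy states.
for T in (10.0, 1.0, 0.1):
    print(T, boltzmann_prob(1.0, T))
```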

## Boltzmann Training

Apply a set of inputs to the network, and calculate the outputs and the objective function.

Make a random weight change, then recalculate the network output and the resulting change c in the objective function.

If the objective function improves, retain the weight change. If the weight change worsens the objective function, calculate the probability of accepting the weight change from the Boltzmann distribution:

$$P(c) = \exp(-c / kT)$$

where P(c) is the probability of accepting a change of c in the objective function, k is Boltzmann's constant, and T is the temperature.

Select a random number r from a uniform distribution between zero and one. If P(c) is greater than r, retain the change; otherwise return the weight to its previous value. This allows the system to take a step in a direction that worsens the objective function, and hence to escape from a local minimum.

Repeat the weight change process over each of the weights in the network, gradually reducing the temperature T until an acceptably low value of the objective function is obtained. A sketch of the acceptance step is given below.
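This is a minimal sketch of one such weight update, assuming a user-supplied objective function, a Gaussian perturbation, and Boltzmann's constant folded to 1; none of these names or defaults come from the notes themselves.

```python
import math
import random

def boltzmann_step(weights, objective, T, step_size=0.1, rng=random):
    """One Boltzmann-training update: perturb a random weight and accept a
    worsening change of size c with probability exp(-c / T)."""
    i = rng.randrange(len(weights))
    old_value = weights[i]
    old_cost = objective(weights)

    weights[i] += rng.gauss(0.0, step_size)      # random weight change
    c = objective(weights) - old_cost            # change in the objective function

    if c <= 0:                                   # improvement: always retain
        return True
    if math.exp(-c / T) > rng.random():          # worsening: retain with probability P(c)
        return True
    weights[i] = old_value                       # otherwise return to the previous value
    return False
```

In a full training run this update would be applied to every weight in the network while T is lowered according to the cooling schedule described below.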

The size of the random weight change is selected by various methods. For example, a Gaussian distribution may be used:

$$P(w) = \exp(-w^2 / T^2)$$

where P(w) is the probability of a weight change of size w and T is the artificial temperature.

To ensure that the global minimum is reached, the cooling rate is usually expressed as

$$T(t) = \frac{T_0}{\log(1 + t)}$$

where T_0 is the initial temperature and t is the artificial time (step number).
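A quick numerical check of this schedule (with an assumed initial temperature of 100) shows how slowly the temperature falls, which motivates the disadvantage noted next.

```python
import math

def boltzmann_schedule(T0, t):
    """Logarithmic cooling: T(t) = T0 / log(1 + t)."""
    return T0 / math.log(1.0 + t)

# Even after a million steps the temperature has dropped by only about a factor of 14.
for t in (10, 1000, 1_000_000):
    print(t, round(boltzmann_schedule(100.0, t), 2))
```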

The main disadvantage of Boltzmann training is its very low cooling rate and hence long computation time. The Boltzmann machine usually takes an impractically long time to train.

## Cauchy Training

The Cauchy training method is more rapid than Boltzmann training. Cauchy training substitutes the Cauchy distribution for the Boltzmann distribution.

The Cauchy distribution has longer "tails", hence a higher probability of larger step sizes. The temperature reduction is inverse linear (for Boltzmann training it was inverse logarithmic).

The Cauchy distribution is

$$P(x) = \frac{T(t)}{T(t)^2 + x^2}$$

The inverse linear relationship for temperature reduction reduces the training time:

$$T(t) = \frac{T_0}{1 + t}$$
temperature