
7. Introduction to Non-Conventional Optimization
Evolutionary and heuristic algorithms
Heuristic algorithm
A heuristic algorithm is one that is designed
to solve a problem in a faster and more
efficient fashion than traditional methods
by sacrificing optimality, accuracy,
precision, or completeness for speed.
Heuristic algorithm cont…
Heuristic algorithms are most often
employed when approximate solutions are
sufficient and exact solutions are
necessarily computationally expensive.
Example Algorithms
1.1 Swarm Intelligence
1.2 Tabu Search
1.3 Simulated Annealing
1.4 Genetic Algorithms
1.5 Artificial Neural Networks
1.6 Support Vector Machines
Artificial Neural Networks
- Introduction -
Overview
1. Biological inspiration
2. Artificial neurons and neural networks
3. Learning processes
4. Learning with artificial neural networks
Biological inspiration
Animals are able to react adaptively to changes in their
external and internal environment, and they use their nervous
system to perform these behaviours.

An appropriate model/simulation of the nervous system should be able to produce similar responses and behaviours in artificial systems.

The nervous system is built from relatively simple units, the neurons, so copying their behavior and functionality should be the solution.
Biological Neural Nets
Pigeons as art experts (Watanabe et al. 1995)

Experiment:
• Pigeon in Skinner box
• Present paintings of two different artists (e.g.
Chagall / Van Gogh)
• Reward for pecking when presented a particular
artist (e.g. Van Gogh)
Pigeons were able to discriminate between Van
Gogh and Chagall with 95% accuracy (when
presented with pictures they had been trained on)

Discrimination still 85% successful for previously


unseen paintings of the artists
Pigeons do not simply memorise the pictures
They can extract and recognise patterns (the ‘style’)
They generalise from the already seen to make
predictions

This is what neural networks (biological and artificial) are good at, unlike conventional computers.
Biological neuron
[Figure: a biological neuron with labelled synapse, axon, nucleus, cell body and dendrites]
Biological inspiration
Dendrites

Soma (cell body)

Axon
Biological inspiration

dendrites
axon

synapses

The information transmission happens at the synapses.


Biological inspiration
The spikes travelling along the axon of the pre-synaptic
neuron trigger the release of neurotransmitter substances at
the synapse.
The neurotransmitters cause excitation or inhibition in the
dendrite of the post-synaptic neuron.
The integration of the excitatory and inhibitory signals
may produce spikes in the post-synaptic neuron.
The contribution of the signals depends on the strength of
the synaptic connection.
Neural Networks
A mathematical model to solve engineering problems
Group of highly connected neurons to realize
compositions of non linear functions
Tasks
Classification
Discrimination
Estimation
2 types of networks
Feed forward Neural Networks
Recurrent Neural Networks
Feed Forward Neural Networks
The information is propagated from the inputs to the outputs
Computations of No non linear functions from n input variables by compositions of Nc algebraic functions
Time has no role (NO cycle between outputs and inputs)
[Figure: inputs x1, x2, ….., xn feeding a 1st hidden layer, a 2nd hidden layer and an output layer]
Recurrent Neural Networks
Can have arbitrary topologies
Can model systems with internal states (dynamic ones)
Delays are associated with a specific weight
Training is more difficult
Performance may be problematic
Stable outputs may be more difficult to evaluate
Unexpected behavior (oscillation, chaos, …)
[Figure: recurrent network over inputs x1, x2 with connection delays labelled 0 and 1]
Learning
The procedure that consists in estimating the parameters
of neurons so that the whole network can perform a
specific task

2 types of learning
The supervised learning
The unsupervised learning

The Learning process (supervised)


Present the network a number of inputs and their
corresponding outputs
See how closely the actual outputs match the desired
ones
Modify the parameters to better approximate the
desired outputs
Supervised learning
The desired response of the neural network as a function of particular inputs is well known.
A “Professor” may provide examples and teach the neural network how to fulfill a certain task
Unsupervised learning
Idea: group typical input data according to resemblance criteria that are unknown a priori
Data clustering
No need for a professor
The network finds the correlations between the data by itself
Examples of such networks :
• Kohonen feature maps
Properties of Neural Networks
Supervised networks are universal approximators (Non
recurrent networks)
Theorem: Any bounded, sufficiently regular function can be approximated to an arbitrary precision by a neural network with a finite number of hidden neurons
Type of Approximators
Linear approximators : for a given precision, the number of
parameters grows exponentially with the number of variables
(polynomials)
Non-linear approximators (NN), the number of parameters grows
linearly with the number of variables
Other properties
Adaptivity
Weights adapt to the environment and the network is easily retrained
Generalization ability
May compensate for a lack of data
Fault tolerance
Graceful degradation of performances if damaged =>
The information is distributed within the entire net.
Artificial neurons
Neurons work by processing information. They receive and provide information in the form of spikes.
[Figure: inputs x1, x2, x3, …, xn-1, xn weighted by w1, w2, w3, …, wn-1, wn feed a single unit that produces the output y]
$$z = \sum_{i=1}^{n} w_i x_i ; \quad y = H(z)$$
The McCulloch-Pitts model
Graphical Notation & Terms
Circles
Are neural units
Metaphor for nerve cell body
Arrows
Represent synaptic connections from one unit to another
These are called weights and represented with a scalar numeric value (e.g., a real number)
[Figure: one layer of neural units connected by arrows to another layer of neural units]
Another Example: 8 units in each
layer, fully connected network
Units & Weights

Units
Sometimes notated with unit numbers
Weights
Sometimes given by symbols
Sometimes given by numbers
Always represent numbers
May be integer or real valued
[Figure: units 1 to 4 connected to unit 1 of the next layer, drawn once with symbolic weights W1,1, W1,2, W1,3, W1,4 and once with numeric weights 0.3, -0.1, 2.1, -1.1]
Computing with Neural Units
Inputs are presented to input units
How do we generate outputs?
One idea
Summed Weighted Inputs
Input: (3, 1, 0, -2)
Weights of units 1 to 4: (0.3, -0.1, 2.1, -1.1)
Processing:
3(0.3) + 1(-0.1) + 0(2.1) + -2(-1.1)
= 0.9 + (-0.1) + 0 + 2.2
Output: 3
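The summed weighted input above can be checked with a few lines of Python. This is only a sketch: the function name `weighted_sum` is made up for illustration, and the weights and inputs are the ones shown on the slide.

```python
def weighted_sum(inputs, weights):
    """Return the sum of each input multiplied by its connection weight."""
    if len(inputs) != len(weights):
        raise ValueError("inputs and weights must have the same length")
    return sum(x * w for x, w in zip(inputs, weights))


weights = [0.3, -0.1, 2.1, -1.1]  # weights of units 1 to 4 from the slide
inputs = [3, 1, 0, -2]            # input pattern from the slide

total = weighted_sum(inputs, weights)
print(round(total, 6))
```

Rounding is used only for display, since floating-point products such as 3 * 0.3 are not stored exactly.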
Activation Functions
Usually, don’t just use weighted sum directly
Apply some function to the weighted sum before
it is used (e.g., as output)
Call this the activation function
Step function could be a good simulation of a biological neuron spiking
$$f(x) = \begin{cases} 1 & \text{if } x \ge \theta \\ 0 & \text{if } x < \theta \end{cases}$$
$\theta$ is called the threshold
Step function
Step Function Example
Let $\theta = 3$
Input: (3, 1, 0, -2)
Weights of units 1 to 4: (0.3, -0.1, 2.1, -1.1)
Summed weighted input: x = 3
Output after passing through step activation function: f(3) = 1
Step Function Example (2)
Let $\theta = 3$
Input: (0, 10, 0, 0)
Weights of units 1 to 4: (0.3, -0.1, 2.1, -1.1)
Summed weighted input: x = ?
Output after passing through step activation function: f(x) = ?
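Both step-function examples can be worked through in code. The sketch below assumes the unit fires when the summed input is greater than or equal to the threshold (this matches f(3) = 1 with a threshold of 3 in the first example); `step` and `unit_output` are illustrative names, and `Fraction` is used only so that the comparison with the threshold is exact.

```python
from fractions import Fraction


def step(x, theta):
    """Step activation: 1 when x reaches the threshold theta, otherwise 0."""
    return 1 if x >= theta else 0


def unit_output(inputs, weights, theta):
    """Summed weighted input followed by the step activation."""
    x = sum(i * w for i, w in zip(inputs, weights))
    return x, step(x, theta)


# Weights from the slide, stored exactly to avoid floating-point rounding.
weights = [Fraction("0.3"), Fraction("-0.1"), Fraction("2.1"), Fraction("-1.1")]
theta = 3

x1, out1 = unit_output([3, 1, 0, -2], weights, theta)
x2, out2 = unit_output([0, 10, 0, 0], weights, theta)
print(float(x1), out1)
print(float(x2), out2)
```

For the second input only unit 2 contributes, so the summed input is 10 times -0.1, which stays below the threshold.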
Another Activation Function:
The Sigmoidal

The math of some neural nets requires that the activation function be continuously differentiable
A sigmoidal function is often used to approximate the step function
$$f(x) = \frac{1}{1 + e^{-\alpha x}}$$
$\alpha$ is the steepness parameter
Sigmoidal Example
Input: (3, 1, 0, -2)
Weights of units 1 to 4: (0.3, -0.1, 2.1, -1.1)
Steepness parameter $\alpha = 2$
$$f(x) = \frac{1}{1 + e^{-2x}}$$
$$f(3) = \frac{1}{1 + e^{-2 \cdot 3}} = .998$$
Input: (0, 10, 0, 0)
network output?
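A short sketch of the sigmoidal activation applied to the same unit. The parameter name `alpha` for the steepness is an assumption made here for readability, and `sigmoid` is an illustrative helper, not a library call.

```python
import math


def sigmoid(x, alpha=1.0):
    """Sigmoidal activation 1 / (1 + exp(-alpha * x)); alpha sets the steepness."""
    return 1.0 / (1.0 + math.exp(-alpha * x))


weights = [0.3, -0.1, 2.1, -1.1]  # weights from the slide


def unit_output(inputs, alpha):
    """Summed weighted input passed through the sigmoidal activation."""
    x = sum(i * w for i, w in zip(inputs, weights))
    return sigmoid(x, alpha)


first = unit_output([3, 1, 0, -2], alpha=2)    # summed input is 3
second = unit_output([0, 10, 0, 0], alpha=2)   # summed input is -1
print(round(first, 3), round(second, 3))
```

A larger `alpha` makes the curve approach the step function centred at zero, which is what the two plotted curves illustrate.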
Sigmoidal
[Figure: plot of 1/(1+exp(-x)) and 1/(1+exp(-10*x)) for x from -5 to about 5; vertical axis from 0 to 1.2]
Another Example

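A two weight layer, feedforward network
Two inputs, one output, one ‘hidden’ unit
Input: (3, 1)
$$f(x) = \frac{1}{1 + e^{-x}}$$
[Figure: the two inputs connect to the hidden unit with weights 0.5 and -0.5; the hidden unit connects to the output unit with weight 0.75]
What is the output?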
Computing in Multilayer Networks
Start at leftmost layer
Compute activations based on inputs
Then work from left to right, using computed activations
as inputs to next layer
Example solution
Activation of hidden unit
f(0.5(3) + -0.5(1)) =
f(1.5 - 0.5) =
f(1) = 0.731
Output activation
f(0.731(0.75)) =
f(0.548) = .634
where $f(x) = \frac{1}{1 + e^{-x}}$
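The left-to-right computation of this example can be reproduced as follows. This is a minimal sketch using the weights that appear in the solution (0.5 and -0.5 into the hidden unit, 0.75 into the output unit); the variable names are illustrative.

```python
import math


def logistic(x):
    """Sigmoidal activation f(x) = 1 / (1 + exp(-x))."""
    return 1.0 / (1.0 + math.exp(-x))


inputs = (3, 1)
hidden_weights = (0.5, -0.5)  # input 1 -> hidden, input 2 -> hidden
output_weight = 0.75          # hidden -> output

# Leftmost layer first: the hidden activation depends only on the inputs.
hidden = logistic(sum(w * x for w, x in zip(hidden_weights, inputs)))
# Then the output unit uses the hidden activation as its input.
output = logistic(output_weight * hidden)

print(round(hidden, 3), round(output, 3))
```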
Notation
At times useful to represent weights and activations using vector and matrix notations
$W_{i,j}$: Weight (scalar) from unit j in left layer to unit i in right layer
$a_{k,l}$: Activation value of unit k in layer l; layers increase in number from left to right
[Figure: units with activations $a_{1,1}$, $a_{2,1}$, $a_{3,1}$, $a_{4,1}$ connected through weights $W_{1,1}$, $W_{1,2}$, $W_{1,3}$, $W_{1,4}$ to a unit with activation $a_{1,2}$]
Notation for Weighted Sums

$$a_{1,2} = f(W_{1,1} a_{1,1} + W_{1,2} a_{2,1} + W_{1,3} a_{3,1} + W_{1,4} a_{4,1})$$
Generalizing

$$a_{k,l+1} = f\left(\sum_{j=1}^{n} W_{k,j}\, a_{j,l}\right)$$
$W_{k,j}$: Weight (scalar) from unit j in left layer to unit k in right layer
$a_{k,l}$: Activation value of unit k in layer l; layers increase in number from left to right
n: number of units in layer l
Can Also Use Vector Notation

$W_i$: Row vector of incoming weights for unit i
$a_i$: Column vector of activation values of units connected to unit i
(Assuming that the layer for unit i is specified in the context)
Example

$$W_1 = \begin{bmatrix} W_{1,1} & W_{1,2} & W_{1,3} & W_{1,4} \end{bmatrix}, \quad a_1 = \begin{bmatrix} a_{1,1} \\ a_{2,1} \\ a_{3,1} \\ a_{4,1} \end{bmatrix}$$
Recall: multiplying an n*r with an r*m matrix produces an n*m matrix, C, where each element $C_{i,j}$ in that n*m matrix is produced as the scalar product of row i of the left and column j of the right
$$W_1 a_1 = \begin{bmatrix} W_{1,1} & W_{1,2} & W_{1,3} & W_{1,4} \end{bmatrix} \begin{bmatrix} a_{1,1} \\ a_{2,1} \\ a_{3,1} \\ a_{4,1} \end{bmatrix}$$
Learning Linearly Separable
Functions
The AND
Learning Starts

Epoch  I0  I1  I2  Reqd Output  W0   W1   W2    Sum   Activation  Error  Converged?     Learning Rate
1      -1  0   0   0            0.3  0.5  -0.4  -0.3  0           0                     0.1
How to update the weights

Epoch  I0  I1  I2  Reqd Output  W0   W1   W2    Sum   Activation  Error  Converged?     Learning Rate
1      -1  0   0   0            0.3  0.5  -0.4  -0.3  0           0                     0.1
Learning Rate
After Epoch 1
Epoch  I0  I1  I2  Reqd Output  W0   W1   W2    Sum   Activation  Error  Converged?     Learning Rate
1      -1  0   0   0            0.3  0.5  -0.4  -0.3  0           0                     0.1
       -1  0   1   0            0.3  0.5  -0.4  -0.7  0           0
       -1  1   0   0            0.3  0.5  -0.4  0.2   1           -1
       -1  1   1   1            0.4  0.4  -0.4  -0.4  0           1      Not Converged
At the end of Epoch 3
Epoch  I0  I1  I2  Reqd Output  W0   W1   W2    Sum   Activation  Error  Converged?     Learning Rate
1      -1  0   0   0            0.3  0.5  -0.4  -0.3  0           0                     0.1
       -1  0   1   0            0.3  0.5  -0.4  -0.7  0           0
       -1  1   0   0            0.3  0.5  -0.4  0.2   1           -1
       -1  1   1   1            0.4  0.4  -0.4  -0.4  0           1      Not Converged
2      -1  0   0   0            0.3  0.5  -0.3  -0.3  0           0
       -1  0   1   0            0.3  0.5  -0.3  -0.6  0           0
       -1  1   0   0            0.3  0.5  -0.3  0.2   1           -1
       -1  1   1   1            0.4  0.4  -0.3  -0.3  0           1      Not Converged
3      -1  0   0   0            0.3  0.5  -0.2  -0.3  0           0
       -1  0   1   0            0.3  0.5  -0.2  -0.5  0           0
       -1  1   0   0            0.3  0.5  -0.2  0.2   1           -1
       -1  1   1   1            0.4  0.4  -0.2  -0.2  0           1      Not Converged
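The tables above are consistent with the usual perceptron update, in which each weight changes by learning rate × error × input, with error = required output - activation. The sketch below reproduces the first epochs under two assumptions that the slides do not state explicitly: the unit fires only when the sum is strictly greater than 0, and weights are updated after every sample. `Fraction` keeps the arithmetic exact so that sums landing exactly on 0 are handled predictably; all names are illustrative.

```python
from fractions import Fraction


def train_perceptron(samples, weights, rate, max_epochs=100):
    """Train a single threshold unit; returns (weights, epochs_used, history).

    history[k] holds the weights at the end of epoch k + 1.
    epochs_used is None if no error-free epoch occurred within max_epochs.
    """
    weights = list(weights)
    history = []
    for epoch in range(1, max_epochs + 1):
        mistakes = 0
        for inputs, required in samples:
            total = sum(w * x for w, x in zip(weights, inputs))
            activation = 1 if total > 0 else 0   # assumed firing rule
            error = required - activation
            if error != 0:
                mistakes += 1
                weights = [w + rate * error * x for w, x in zip(weights, inputs)]
        history.append(list(weights))
        if mistakes == 0:
            return weights, epoch, history
    return weights, None, history


# AND function; I0 = -1 is the constant bias input used in the tables.
AND_SAMPLES = [((-1, 0, 0), 0), ((-1, 0, 1), 0), ((-1, 1, 0), 0), ((-1, 1, 1), 1)]
start = [Fraction(3, 10), Fraction(5, 10), Fraction(-4, 10)]  # W0, W1, W2

final, epochs, history = train_perceptron(AND_SAMPLES, start, Fraction(1, 10))
print(epochs, [float(w) for w in final])
```

After the first epoch the weights are (0.3, 0.5, -0.3), matching the starting row of epoch 2 in the table; training stops at the first epoch in which every sample is classified without error.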
Learning Linearly non Separable
Functions
Example
Say we want to create a neural network that tests
for the equality of two bits: f(x1, x2) = z1
When x1 and x2 are equal, z1 is 1, otherwise, z1 is 0
The function we want to approximate is as follows:
            Inputs    Goal output
Sample No.  x1  x2    z1
1           0   0     1
2           0   1     0
3           1   0     0
4           1   1     1
What architecture might be suitable for a neural network?
Architecture for ANN Approximating the
Equality Function for 2 Bits
Possible Network Architecture
[Figure: inputs x1 and x2 connect to hidden units y1 and y2, which connect to the output unit z1]
       Inputs    Goal output
No.    x1  x2    z1
1      0   0     1
2      0   1     0
3      1   0     0
4      1   1     1

What weights would allow this architecture to


approximate this function?
Later: How do we define the weights through a process of learning or training?
Approximate Solution
Network Architecture
[Figure: inputs x1 and x2 connect to hidden units y1 and y2, which connect to the output unit z1]
Weights
w_x1_y1   w_x1_y2   w_x2_y1   w_x2_y2   w_y1_z1    w_y2_z1
-1.8045   -7.7299   -1.8116   -7.6649   -10.3022   15.3298
Actual network results:
x1  x2  z1
0   0   .925
0   1   .192
1   0   .19
1   1   .433
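The listed results can be reproduced from the listed weights. The sketch below assumes a logistic activation 1/(1+exp(-x)) at the hidden and output units and no bias terms; the slide does not say this, but under that assumption the outputs agree with the table to about three decimals. The dictionary keys mirror the weight names on the slide.

```python
import math


def logistic(x):
    """Sigmoidal activation 1 / (1 + exp(-x))."""
    return 1.0 / (1.0 + math.exp(-x))


W = {
    "x1_y1": -1.8045, "x1_y2": -7.7299,
    "x2_y1": -1.8116, "x2_y2": -7.6649,
    "y1_z1": -10.3022, "y2_z1": 15.3298,
}


def equality_net(x1, x2):
    """Forward pass: inputs -> hidden units y1, y2 -> output z1."""
    y1 = logistic(W["x1_y1"] * x1 + W["x2_y1"] * x2)
    y2 = logistic(W["x1_y2"] * x1 + W["x2_y2"] * x2)
    return logistic(W["y1_z1"] * y1 + W["y2_z1"] * y2)


results = {(a, b): equality_net(a, b) for a in (0, 1) for b in (0, 1)}
for (a, b), z in sorted(results.items()):
    print(a, b, round(z, 3))
```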


Quality Measures
A given ANN may only approximate the
desired function (e.g., equality for two bits)
We need to measure the quality of the
approximation
I.e., how closely did the ANN approximate
the desired function?
How well did this approximate the
goal function?

Categorically
For inputs x1=0, x2=0 and x1=1, x2=1, the
output of the network was always greater than
for inputs x1=1, x2=0 and x1=0, x2=1
Summed squared error
$$\sum_{s=1}^{\text{numTrainSamples}} (\text{ActualOutput}_s - \text{DesiredOutput}_s)^2$$
Compute the summed squared error for our example
x1  x2  z1
0   0   .925
0   1   .192
1   0   .19
1   1   .433

$$\sum_{s=1}^{\text{numTrainSamples}} (\text{ActualOutput}_s - \text{DesiredOutput}_s)^2$$
Solution
x1  x2  Expected z1  Actual z1  Squared error
0   0   1            0.925      0.005625
0   1   0            0.192      0.036864
1   0   0            0.19       0.0361
1   1   1            0.433      0.321489

Sum squared error = 0.400078

Generally, lower values for sum squared error indicate better approximation; 0 is “perfect”
We also need to consider generalization (later).
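The same sum can be computed directly. A minimal sketch, with an illustrative function name and the actual and expected outputs taken from the table:

```python
def summed_squared_error(actual, desired):
    """Sum over training samples of (actual output - desired output) squared."""
    if len(actual) != len(desired):
        raise ValueError("actual and desired must have the same length")
    return sum((a - d) ** 2 for a, d in zip(actual, desired))


actual = [0.925, 0.192, 0.19, 0.433]  # network outputs z1
desired = [1, 0, 0, 1]                # goal outputs for the equality function

sse = summed_squared_error(actual, desired)
print(round(sse, 6))
```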
Weight Matrix

Row vector provides weights for a single unit in “right” layer
A weight matrix can provide all weights connecting “left” layer to “right” layer
Let W be an n*r weight matrix
Row vector i in matrix connects unit i on “right” layer to units in “left” layer
n units in layer to “right”
r units in layer to “left”
Notation

$a_i$: The vector of activation values of layer to “left”; an r*1 column vector (same as before)
$W a_i$: n*1 column vector; summed weighted inputs for “right” layer
$f(W a_i)$: n*1 column vector; new activation values for “right” layer
Function f is now taken as applying to elements of a vector
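In code, one layer update is a matrix-vector product followed by an element-wise activation. The sketch below uses plain Python lists; the 3×2 weight matrix and the input vector are made-up values for illustration only, and the logistic activation is an assumption.

```python
import math


def logistic(z):
    """Sigmoidal activation 1 / (1 + exp(-z))."""
    return 1.0 / (1.0 + math.exp(-z))


def layer_update(W, a, f=logistic):
    """Return f(W a): W has n rows of r weights, a has r activations.

    Row i of W holds the incoming weights of unit i in the "right" layer.
    """
    if any(len(row) != len(a) for row in W):
        raise ValueError("every row of W must have as many entries as a")
    return [f(sum(w * x for w, x in zip(row, a))) for row in W]


# Hypothetical 3x2 weight matrix: 2 units on the left, 3 units on the right.
W = [[0.5, -0.5],
     [1.0, 1.0],
     [0.0, 2.0]]
a = [3.0, 1.0]

new_a = layer_update(W, a)
print([round(v, 3) for v in new_a])
```

Chaining two such calls, with the result of the first used as the activation vector of the second, gives the hidden-layer and output-layer updates of a two weight layer network.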
Example
Updating hidden layer activation values: f(W a), with W the weight matrix into the hidden layer and a the input activation vector
Updating output activation values: f(W a), with W the weight matrix into the output layer and a the hidden activation vector
[Figure: the two matrix expressions with numeric entries; the entries did not survive extraction]
Draw the architecture (units and arcs representing weights) of the connectionist model
Answer
2 input units
5 hidden layer units
3 output units
Fully connected, feedforward network
Artificial neurons

The McCulloch-Pitts model:
• spikes are interpreted as spike rates;
• synaptic strengths are translated as synaptic weights;
• excitation means a positive product between the incoming spike rate and the corresponding synaptic weight;
• inhibition means a negative product between the incoming spike rate and the corresponding synaptic weight.
Artificial neurons
Nonlinear generalization of the McCulloch-Pitts neuron:
$$y = f(x, w)$$
y is the neuron’s output, x is the vector of inputs, and w is the vector of synaptic weights.
Examples:
$$y = \frac{1}{1 + e^{-w^T x - a}} \quad \text{sigmoidal neuron}$$
$$y = e^{-\frac{\|x - w\|^2}{2a^2}} \quad \text{Gaussian neuron}$$
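The two example neurons can be written as small functions. This is a sketch with illustrative names; `x` and `w` are equal-length sequences of numbers and `a` is the scalar parameter from the formulas (an offset for the sigmoidal neuron, a width for the Gaussian neuron).

```python
import math


def sigmoidal_neuron(x, w, a):
    """y = 1 / (1 + exp(-w^T x - a))."""
    z = sum(wi * xi for wi, xi in zip(w, x)) + a
    return 1.0 / (1.0 + math.exp(-z))


def gaussian_neuron(x, w, a):
    """y = exp(-||x - w||^2 / (2 a^2)); the output peaks at 1 when x equals w."""
    squared_distance = sum((xi - wi) ** 2 for xi, wi in zip(x, w))
    return math.exp(-squared_distance / (2.0 * a * a))


print(sigmoidal_neuron([1.0, 2.0], [0.5, -0.25], 0.0))
print(gaussian_neuron([1.0, 0.0], [0.0, 0.0], 1.0))
```

The sigmoidal neuron responds to the projection of x onto w, while the Gaussian neuron responds to the distance between x and w.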
Artificial neural networks

[Figure: a network of linked artificial neurons mapping Inputs to Output]

An artificial neural network is composed of many artificial neurons that are linked together according to a specific network architecture. The objective of the neural network is to transform the inputs into meaningful outputs.
Artificial neural networks
Tasks to be solved by artificial neural networks:
• controlling the movements of a robot based on self-
perception and other information (e.g., visual
information);
• deciding the category of potential food items (e.g.,
edible or non-edible) in an artificial world;
• recognizing a visual object (e.g., a familiar face);
• predicting where a moving object goes, when a robot
wants to catch it.
Learning in biological systems

Learning = learning by adaptation

The young animal learns that the green fruits are sour,
while the yellowish/reddish ones are sweet. The learning
happens by adapting the fruit picking behavior.

At the neural level the learning happens by changing the synaptic strengths, eliminating some synapses, and building new ones.
Learning as optimisation

The objective of adapting the responses on the basis of the information received from the environment is to achieve a better state. E.g., the animal likes to eat many energy rich, juicy fruits that make its stomach full and make it feel happy.

In other words, the objective of learning in biological organisms is to optimise the amount of available resources, happiness, or in general to achieve a closer to optimal state.
Learning in biological neural
networks

The learning rules of Hebb:


• synchronous activation increases the synaptic strength;
• asynchronous activation decreases the synaptic strength.

These rules fit with energy minimization principles.

Maintaining synaptic strength needs energy, so it should be maintained at those places where it is needed, and it shouldn’t be maintained at places where it’s not needed.
Learning principle for
artificial neural networks

ENERGY MINIMIZATION

We need an appropriate definition of energy for artificial neural networks; having that, we can use mathematical optimisation techniques to find how to change the weights of the synaptic connections between neurons.

ENERGY = measure of task performance error


Neural network mathematics

[Figure: a layered network mapping Inputs to Output]

First layer:
$$y_1^1 = f(x_1, w_1^1), \quad y_2^1 = f(x_2, w_2^1), \quad y_3^1 = f(x_3, w_3^1), \quad y_4^1 = f(x_4, w_4^1)$$
$$y^1 = (y_1^1, y_2^1, y_3^1, y_4^1)^T$$
Second layer:
$$y_1^2 = f(y^1, w_1^2), \quad y_2^2 = f(y^1, w_2^2), \quad y_3^2 = f(y^1, w_3^2)$$
$$y^2 = (y_1^2, y_2^2, y_3^2)^T$$
Output:
$$y_{Out} = f(y^2, w_1^3)$$
Neural network mathematics

Neural network: input / output transformation

$$y_{out} = F(x, W)$$

W is the matrix of all weight vectors.


MLP neural networks
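MLP = multi-layer perceptron
Perceptron:
$$y_{out} = w^T x$$
[Figure: a single unit mapping x to $y_{out}$]

MLP neural network:
$$y_k^1 = \frac{1}{1 + e^{-w^{1kT} x - a_k^1}}, \quad k = 1, 2, 3$$
$$y^1 = (y_1^1, y_2^1, y_3^1)^T$$
$$y_k^2 = \frac{1}{1 + e^{-w^{2kT} y^1 - a_k^2}}, \quad k = 1, 2$$
$$y^2 = (y_1^2, y_2^2)^T$$
$$y_{out} = \sum_{k=1}^{2} w_k^3 y_k^2 = w^{3T} y^2$$
[Figure: a network mapping x through two hidden layers to $y_{out}$]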
RBF neural networks
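RBF = radial basis function

$$r(x) = r(\|x - c\|)$$

Example: Gaussian RBF
$$f(x) = e^{-\frac{\|x - w\|^2}{2a^2}}$$

$$y_{out} = \sum_{k=1}^{4} w_k^2 \cdot e^{-\frac{\|x - w^{1,k}\|^2}{2 (a_k)^2}}$$
[Figure: a network mapping x through four Gaussian units to $y_{out}$]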
Neural network tasks
• control
• classification
• prediction
• approximation
These can be reformulated in general as FUNCTION APPROXIMATION tasks.

Approximation: given a set of values of a function g(x), build a neural network that approximates the g(x) values for any input x.
Neural network approximation

Task specification:

Data: set of value pairs: $(x^t, y_t)$, $y_t = g(x^t) + z_t$; $z_t$ is random measurement noise.

Objective: find a neural network that represents the input / output transformation (a function) F(x,W) such that F(x,W) approximates g(x) for every x
Learning to approximate
Error measure:
$$E = \frac{1}{N} \sum_{t=1}^{N} (F(x^t; W) - y_t)^2$$

Rule for changing the synaptic weights:
$$\Delta w_i^j = -c \cdot \frac{\partial E}{\partial w_i^j}(W)$$
$$w_i^{j,new} = w_i^j + \Delta w_i^j$$

c is the learning parameter (usually a constant)


Learning with a perceptron
Perceptron: $y_{out} = w^T x$
Data: $(x^1, y_1), (x^2, y_2), \ldots, (x^N, y_N)$
Error:
$$E(t) = (y(t)_{out} - y_t)^2 = (w(t)^T x^t - y_t)^2$$
Learning:
$$w_i(t+1) = w_i(t) - c \cdot \frac{\partial E(t)}{\partial w_i} = w_i(t) - c \cdot \frac{\partial (w(t)^T x^t - y_t)^2}{\partial w_i}$$
$$w_i(t+1) = w_i(t) - c \cdot (w(t)^T x^t - y_t) \cdot x_i^t$$
(the constant factor 2 from differentiating the square is absorbed into c)
$$w(t)^T x^t = \sum_{j=1}^{m} w_j(t) \cdot x_j^t$$

A perceptron is able to learn a linear function.
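The update rule above can be tried on a small example. The sketch below is illustrative only: the data are made-up, noise-free samples of the linear function y = 2·x1 - x2, and the learning parameter `c` and the number of passes are arbitrary choices that happen to be small and large enough for this data.

```python
def train_linear_perceptron(data, n_inputs, c=0.05, epochs=500):
    """Apply w_i <- w_i - c * (w^T x - y) * x_i to each sample in turn."""
    w = [0.0] * n_inputs
    for _ in range(epochs):
        for x, y in data:
            out = sum(wi * xi for wi, xi in zip(w, x))  # perceptron output w^T x
            err = out - y
            w = [wi - c * err * xi for wi, xi in zip(w, x)]
    return w


# Hypothetical training pairs (x, y) generated from y = 2*x1 - 1*x2.
data = [
    ((1.0, 0.0), 2.0),
    ((0.0, 1.0), -1.0),
    ((1.0, 1.0), 1.0),
    ((2.0, 1.0), 3.0),
]

w = train_linear_perceptron(data, n_inputs=2)
print([round(wi, 4) for wi in w])
```

With a learning parameter that is too large for the input magnitudes the updates can overshoot and diverge, which is why `c` is kept small here.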


Learning with RBF neural
networks
RBF neural network:
$$y_{out} = F(x, W) = \sum_{k=1}^{M} w_k^2 \cdot e^{-\frac{\|x - w^{1,k}\|^2}{2 (a_k)^2}}$$
Data: $(x^1, y_1), (x^2, y_2), \ldots, (x^N, y_N)$
Error:
$$E(t) = (y(t)_{out} - y_t)^2 = \left(\sum_{k=1}^{M} w_k^2(t) \cdot e^{-\frac{\|x^t - w^{1,k}\|^2}{2 (a_k)^2}} - y_t\right)^2$$
Learning:
$$w_i^2(t+1) = w_i^2(t) - c \cdot \frac{\partial E(t)}{\partial w_i^2}$$
$$\frac{\partial E(t)}{\partial w_i^2} = 2 \cdot (F(x^t, W(t)) - y_t) \cdot e^{-\frac{\|x^t - w^{1,i}\|^2}{2 (a_i)^2}}$$

Only the synaptic weights of the output neuron are modified.

An RBF neural network learns a nonlinear function.
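A minimal sketch of this learning rule, in which only the output weights change while the centres and widths of the Gaussian units stay fixed. Everything concrete here is assumed for illustration: the target function g(x) = x², the three centres, the common width 0.5, the learning parameter and the number of passes. In the formulas the superscript 2 on the output weights is a layer index, not a square.

```python
import math


def rbf_features(x, centers, widths):
    """Outputs of the Gaussian units: exp(-||x - w^{1,k}||^2 / (2 a_k^2))."""
    return [
        math.exp(-sum((xi - ci) ** 2 for xi, ci in zip(x, center)) / (2.0 * a * a))
        for center, a in zip(centers, widths)
    ]


def rbf_output(x, out_weights, centers, widths):
    """F(x, W): weighted sum of the Gaussian unit outputs."""
    return sum(w * p for w, p in zip(out_weights, rbf_features(x, centers, widths)))


def train_rbf(data, centers, widths, c=0.1, epochs=200):
    """Update only the output weights: w_i <- w_i - c * 2 * (F - y) * phi_i."""
    w = [0.0] * len(centers)
    for _ in range(epochs):
        for x, y in data:
            phi = rbf_features(x, centers, widths)
            err = sum(wi * p for wi, p in zip(w, phi)) - y
            w = [wi - c * 2.0 * err * p for wi, p in zip(w, phi)]
    return w


def mean_squared_error(data, w, centers, widths):
    return sum((rbf_output(x, w, centers, widths) - y) ** 2 for x, y in data) / len(data)


# Hypothetical one-dimensional data sampled from g(x) = x^2.
data = [((x,), x * x) for x in (-1.0, -0.5, 0.0, 0.5, 1.0)]
centers = [(-1.0,), (0.0,), (1.0,)]
widths = [0.5, 0.5, 0.5]

error_before = mean_squared_error(data, [0.0] * 3, centers, widths)
w = train_rbf(data, centers, widths)
error_after = mean_squared_error(data, w, centers, widths)
print(round(error_before, 4), round(error_after, 4))
```

Because the network output is linear in the output weights, this is the same kind of update as for the perceptron, applied to the Gaussian unit outputs instead of the raw inputs.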
Learning with MLP neural
networks
MLP neural network with p layers:
$$y_k^1 = \frac{1}{1 + e^{-w^{1kT} x - a_k^1}}, \quad k = 1, \ldots, M_1$$
$$y^1 = (y_1^1, \ldots, y_{M_1}^1)^T$$
$$y_k^2 = \frac{1}{1 + e^{-w^{2kT} y^1 - a_k^2}}, \quad k = 1, \ldots, M_2$$
$$y^2 = (y_1^2, \ldots, y_{M_2}^2)^T$$
...
$$y_{out} = F(x; W) = w^{pT} y^{p-1}$$
[Figure: x passes through layers 1, 2, …, p-1, p to produce $y_{out}$]
Data: $(x^1, y_1), (x^2, y_2), \ldots, (x^N, y_N)$
Error:
$$E(t) = (y(t)_{out} - y_t)^2 = (F(x^t; W) - y_t)^2$$

It is very complicated to calculate the weight changes.


Learning with backpropagation
Solution of the complicated learning:
• calculate first the changes for the synaptic weights
of the output neuron;
• calculate the changes backward starting from layer
p-1, and propagate backward the local error terms.

The method is still relatively complicated but it is much simpler than the original optimisation problem.
Learning with general
optimisation
In general it is enough to have a single layer of nonlinear
neurons in a neural network in order to learn to
approximate a nonlinear function.
In such case general optimisation may be applied without
too much difficulty.

Example: an MLP neural network with a single hidden layer:

$$y_{out} = F(x; W) = \sum_{k=1}^{M} w_k^2 \cdot \frac{1}{1 + e^{-w^{1,kT} x - a_k}}$$
Learning with general
optimisation
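Synaptic weight change rules for the output neuron:
$$w_i^2(t+1) = w_i^2(t) - c \cdot \frac{\partial E(t)}{\partial w_i^2}$$
$$\frac{\partial E(t)}{\partial w_i^2} = 2 \cdot (F(x^t, W(t)) - y_t) \cdot \frac{1}{1 + e^{-w^{1,iT} x^t - a_i}}$$

Synaptic weight change rules for the neurons of the hidden layer:
$$w_j^{1,i}(t+1) = w_j^{1,i}(t) - c \cdot \frac{\partial E(t)}{\partial w_j^{1,i}}$$
$$\frac{\partial E(t)}{\partial w_j^{1,i}} = 2 \cdot (F(x^t, W(t)) - y_t) \cdot w_i^2(t) \cdot \frac{\partial}{\partial w_j^{1,i}} \left( \frac{1}{1 + e^{-w^{1,iT} x^t - a_i}} \right)$$
The factor $w_i^2(t)$ appears because F depends on $w_j^{1,i}$ only through the term $w_i^2$ times the output of hidden neuron i.
$$\frac{\partial}{\partial w_j^{1,i}} \left( \frac{1}{1 + e^{-w^{1,iT} x^t - a_i}} \right) = \frac{-e^{-w^{1,iT} x^t - a_i}}{\left(1 + e^{-w^{1,iT} x^t - a_i}\right)^2} \cdot \frac{\partial}{\partial w_j^{1,i}} \left( -w^{1,iT} x^t - a_i \right)$$
$$\frac{\partial}{\partial w_j^{1,i}} \left( -w^{1,iT} x^t - a_i \right) = -x_j^t$$
$$w_j^{1,i}(t+1) = w_j^{1,i}(t) - c \cdot 2 \cdot (F(x^t, W(t)) - y_t) \cdot w_i^2(t) \cdot \frac{-e^{-w^{1,iT} x^t - a_i}}{\left(1 + e^{-w^{1,iT} x^t - a_i}\right)^2} \cdot (-x_j^t)$$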
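The two sets of change rules can be sketched for a network with a single hidden layer of sigmoidal neurons and a linear output. This is an illustrative sketch, not the original author's code: the derivative of the sigmoid is written as h·(1-h), which equals e^{-z}/(1+e^{-z})² for h = 1/(1+e^{-z}); the hidden-layer gradient includes the output weight of the neuron, as the chain rule requires; the offsets `a` are kept fixed because the slides give no update rule for them; all numeric values are made up.

```python
import math


def hidden_outputs(x, W1, a):
    """Sigmoidal hidden neurons: 1 / (1 + exp(-w^{1,k} . x - a_k))."""
    return [
        1.0 / (1.0 + math.exp(-(sum(wj * xj for wj, xj in zip(wk, x)) + ak)))
        for wk, ak in zip(W1, a)
    ]


def network(x, W1, a, w2):
    """F(x; W): linear combination of the hidden neuron outputs."""
    return sum(w * h for w, h in zip(w2, hidden_outputs(x, W1, a)))


def gradients(x, y, W1, a, w2):
    """Gradients of E = (F(x; W) - y)^2 w.r.t. output and hidden weights."""
    h = hidden_outputs(x, W1, a)
    err = sum(w * hk for w, hk in zip(w2, h)) - y
    grad_w2 = [2.0 * err * hk for hk in h]
    grad_W1 = [
        [2.0 * err * w2[i] * h[i] * (1.0 - h[i]) * xj for xj in x]
        for i in range(len(W1))
    ]
    return grad_w2, grad_W1


def sgd_step(x, y, W1, a, w2, c):
    """One application of w <- w - c * dE/dw to both weight layers."""
    grad_w2, grad_W1 = gradients(x, y, W1, a, w2)
    new_w2 = [w - c * g for w, g in zip(w2, grad_w2)]
    new_W1 = [[w - c * g for w, g in zip(row, grow)] for row, grow in zip(W1, grad_W1)]
    return new_W1, new_w2


# Hypothetical network with 2 inputs and 2 hidden neurons, one training pair.
x, y = [0.5, -1.0], 0.3
W1 = [[0.1, -0.2], [0.4, 0.3]]
a = [0.0, 0.1]
w2 = [0.5, -0.7]

error_before = (network(x, W1, a, w2) - y) ** 2
new_W1, new_w2 = sgd_step(x, y, W1, a, w2, c=0.1)
error_after = (network(x, new_W1, a, new_w2) - y) ** 2
print(round(error_before, 5), round(error_after, 5))
```

A quick sanity check for hand-derived rules of this kind is to compare each analytic gradient with a finite-difference estimate of the error.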
New methods for learning with
neural networks
Bayesian learning:
the distribution of the neural network
parameters is learnt

Support vector learning:
the minimal representative subset of the available data is used to calculate the synaptic weights of the neurons
Summary
• Artificial neural networks are inspired by the learning
processes that take place in biological systems.
• Artificial neurons and neural networks try to imitate the
working mechanisms of their biological counterparts.
• Learning can be perceived as an optimisation process.
• Biological neural learning happens by the modification
of the synaptic strength. Artificial neural networks learn
in the same way.
• The synapse strength modification rules for artificial
neural networks can be derived by applying mathematical
optimisation methods.
Summary
• Learning tasks of artificial neural networks can be
reformulated as function approximation tasks.
• Neural networks can be considered as nonlinear function
approximating tools (i.e., linear combinations of nonlinear
basis functions), where the parameters of the networks
should be found by applying optimisation methods.
• The optimisation is done with respect to the approximation
error measure.
• In general it is enough to have a single hidden layer neural
network (MLP, RBF or other) to learn the approximation of
a nonlinear function. In such cases general optimisation can
be applied to find the change rules for the synaptic weights.
