Introduction to Non-Conventional Optimization
Evolutionary and heuristic algorithms
Heuristic algorithm
A heuristic algorithm is one that is designed
to solve a problem in a faster and more
efficient fashion than traditional methods
by sacrificing optimality, accuracy,
precision, or completeness for speed.
Heuristic algorithm cont…
Heuristic algorithms are most often
employed when approximate solutions are
sufficient and exact solutions are
necessarily computationally expensive.
Example Algorithms
1.1 Swarm Intelligence
1.2 Tabu Search
1.3 Simulated Annealing
1.4 Genetic Algorithms
1.5 Artificial Neural Networks
1.6 Support Vector Machines
Artificial Neural Networks
- Introduction -
Overview
1. Biological inspiration
2. Artificial neurons and neural networks
3. Learning processes
4. Learning with artificial neural networks
Biological inspiration
Animals are able to react adaptively to changes in their
external and internal environment, and they use their nervous
system to perform these behaviours.
Experiment:
• Pigeon in Skinner box
• Present paintings of two different artists (e.g.
Chagall / Van Gogh)
• Reward for pecking when presented a particular
artist (e.g. Van Gogh)
Pigeons were able to discriminate between Van
Gogh and Chagall with 95% accuracy (when
presented with pictures they had been trained on)
[Figure: biological neuron with labels synapse, axon, nucleus, cell body, dendrites]
[Figure: inputs x1, x2, …, xn]
Recurrent Neural Networks
Can have arbitrary topologies
Can model systems with internal states (dynamic ones)
Delays are associated to a specific weight
Training is more difficult
Performance may be problematic
Stable outputs may be more difficult to evaluate
Unexpected behavior (oscillation, chaos, …)
[Figure: recurrent network with inputs x1, x2 and binary unit states]
Learning
The procedure that consists in estimating the parameters
of neurons so that the whole network can perform a
specific task
Two types of learning
Supervised learning
Unsupervised learning
[Figure: neuron with inputs x1, …, xn weighted by w1, …, wn and a single output y]
The McCulloch-Pitts model
Graphical Notation & Terms
Circles
Are neural units
Metaphor for nerve cell body
Arrows
Represent synaptic connections from one unit to another
These are called weights and represented with a scalar numeric value (e.g., a real number)
[Figure: one layer of neural units connected by arrows to another layer of neural units]
Another Example: 8 units in each
layer, fully connected network
Units & Weights
Units
Sometimes notated with unit numbers (1, 2, 3, 4)
Weights
Sometimes given by symbols (W1,1, W1,2, W1,3, W1,4)
Sometimes given by numbers (0.3, -0.1, 2.1, -1.1)
Always represent numbers
May be integer or real valued
[Figure: units 1 to 4 connected to unit 1, drawn once with symbolic weights and once with numeric weights]
Computing with Neural Units
Inputs are presented to input units
How do we generate outputs?
One idea
Summed Weighted Inputs
Weights: (0.3, -0.1, 2.1, -1.1)
Input: (3, 1, 0, -2)
Processing:
3(0.3) + 1(-0.1) + 0(2.1) + (-2)(-1.1)
= 0.9 + (-0.1) + 2.2
Output: 3
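The summed weighted input can be checked with a few lines of Python. This is a minimal sketch that uses the input and weight values from the slide; the function name `weighted_sum` is an illustrative choice, not part of the original material.

```python
# Weighted sum of inputs for a single neural unit (values from the slide).
inputs = [3, 1, 0, -2]
weights = [0.3, -0.1, 2.1, -1.1]

def weighted_sum(xs, ws):
    """Return the sum of x_i * w_i over paired inputs and weights."""
    return sum(x * w for x, w in zip(xs, ws))

total = weighted_sum(inputs, weights)
print(round(total, 6))  # 3.0
```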
Activation Functions
Usually, don’t just use weighted sum directly
Apply some function to the weighted sum before it is used (e.g., as output)
Call this the activation function
Step function could be a good simulation of a biological neuron spiking
Step function:
$f(x) = \begin{cases} 1 & \text{if } x \ge \theta \\ 0 & \text{if } x < \theta \end{cases}$
$\theta$ is called the threshold
Step Function Example
Let $\theta = 3$
Weights: (0.3, -0.1, 2.1, -1.1)
Weighted sum: x = 3
Output after passing through step activation function: $f(3) = 1$
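A step activation can be sketched as below. The "greater than or equal" comparison is an assumption inferred from the example, where a weighted sum of 3 with a threshold of 3 gives an output of 1; the name `step` is illustrative.

```python
def step(x, theta):
    """Step activation: 1 if x reaches the threshold theta, else 0.

    Assumption: the unit fires when x equals theta, as in the slide's example.
    """
    return 1 if x >= theta else 0

print(step(3, 3))    # 1, matching f(3) = 1 with threshold 3
print(step(2.9, 3))  # 0
```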
Step Function Example (2)
Let $\theta = 3$
Weights: (0.3, -0.1, 2.1, -1.1)
Weighted sum: x
Output after passing through step activation function: $f(x) = ?$
Another Activation Function: The Sigmoidal
$f(x) = \frac{1}{1 + e^{-\sigma x}}$
$\sigma$ is the steepness parameter
Sigmoidal Example
Input: (3, 1, 0, -2)
Weights: (0.3, -0.1, 2.1, -1.1)
Steepness parameter: 2
$f(x) = \frac{1}{1 + e^{-2x}}$
$f(3) = \frac{1}{1 + e^{-2 \cdot 3}} \approx .998$
Input: (0, 10, 0, 0)
network output?
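The sigmoidal example can be reproduced with the sketch below, which reuses the slide's weights and a steepness of 2. The helper names `sigmoid` and `unit_output` are illustrative assumptions.

```python
import math

def sigmoid(x, steepness=1.0):
    """Sigmoidal activation 1 / (1 + exp(-steepness * x))."""
    return 1.0 / (1.0 + math.exp(-steepness * x))

weights = [0.3, -0.1, 2.1, -1.1]  # weights from the slide

def unit_output(inputs, steepness=2.0):
    """Weighted sum of the inputs passed through the sigmoid."""
    s = sum(x * w for x, w in zip(inputs, weights))
    return sigmoid(s, steepness)

print(round(unit_output([3, 1, 0, -2]), 3))  # 0.998
print(round(unit_output([0, 10, 0, 0]), 3))  # 0.119
```

For the second input the weighted sum is 10(-0.1) = -1, so the output is 1 / (1 + e^2).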
Sigmoidal
[Plot: 1/(1+exp(-x)) and 1/(1+exp(-10*x)) for x from -5 to about 5; vertical axis from 0 to 1.2]
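Another Example
[Figure: units with activations a_{1,1}, a_{2,1}, a_{3,1}, a_{4,1} connected through weights W_{1,1}, W_{1,2}, W_{1,3}, W_{1,4} to a unit with activation a_{1,2}]
$a_{1,2} = f(W_{1,1} a_{1,1} + W_{1,2} a_{2,1} + W_{1,3} a_{3,1} + W_{1,4} a_{4,1})$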
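Generalizing
$a_{k,l+1} = f\left(\sum_{i=1}^{n} W_{k,i} \, a_{i,l}\right)$
$W_1 a_1 = \begin{bmatrix} W_{1,1} & W_{1,2} & W_{1,3} & W_{1,4} \end{bmatrix} \begin{bmatrix} a_{1,1} \\ a_{2,1} \\ a_{3,1} \\ a_{4,1} \end{bmatrix}$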
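The generalized rule amounts to one weighted sum per unit in the next layer, followed by the activation function. The sketch below assumes the weight matrix is stored with one row per receiving unit; `layer_forward` and `identity` are illustrative names.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def layer_forward(W, a, f=sigmoid):
    """Compute f(sum_i W[k][i] * a[i]) for every unit k of the next layer.

    Assumption: W has one row per receiving unit and one column per sending unit.
    """
    return [f(sum(w_ki * a_i for w_ki, a_i in zip(row, a))) for row in W]

W = [[0.3, -0.1, 2.1, -1.1]]  # a single receiving unit
a = [3, 1, 0, -2]             # activations of the sending layer
identity = lambda s: s        # no activation, to expose the raw weighted sum
print([round(v, 6) for v in layer_forward(W, a, f=identity)])  # [3.0]
```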
Learning Linearly Separable
Functions
The AND
Learning Starts
Epoch  I0  I1  I2  Reqd Output  W0   W1   W2    Sum   Activation  Error  Converged?     Learning Rate
1      -1  0   0   0            0.3  0.5  -0.4  -0.3  0           0                     0.1
How to update the weights
Learning Rate
After Epoch 1
Epoch  I0  I1  I2  Reqd Output  W0   W1   W2    Sum   Activation  Error  Converged?     Learning Rate
1      -1  0   0   0            0.3  0.5  -0.4  -0.3  0           0                     0.1
       -1  0   1   0            0.3  0.5  -0.4  -0.7  0           0
       -1  1   0   0            0.3  0.5  -0.4  0.2   1           -1
       -1  1   1   1            0.4  0.4  -0.4  -0.4  0           1      Not Converged
At the end of Epoch 3
Epoch  I0  I1  I2  Reqd Output  W0   W1   W2    Sum   Activation  Error  Converged?     Learning Rate
1      -1  0   0   0            0.3  0.5  -0.4  -0.3  0           0                     0.1
       -1  0   1   0            0.3  0.5  -0.4  -0.7  0           0
       -1  1   0   0            0.3  0.5  -0.4  0.2   1           -1
       -1  1   1   1            0.4  0.4  -0.4  -0.4  0           1      Not Converged
2      -1  0   0   0            0.3  0.5  -0.3  -0.3  0           0
       -1  0   1   0            0.3  0.5  -0.3  -0.6  0           0
       -1  1   0   0            0.3  0.5  -0.3  0.2   1           -1
       -1  1   1   1            0.4  0.4  -0.3  -0.3  0           1      Not Converged
3      -1  0   0   0            0.3  0.5  -0.2  -0.3  0           0
       -1  0   1   0            0.3  0.5  -0.2  -0.5  0           0
       -1  1   0   0            0.3  0.5  -0.2  0.2   1           -1
       -1  1   1   1            0.4  0.4  -0.2  -0.2  0           1      Not Converged
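The weight changes in the table are consistent with the rule "new weight = old weight + learning rate × error × input", where error is the required output minus the activation. The sketch below is a reconstruction under that assumption. It also assumes the unit fires only when the sum is strictly positive, which the table does not settle because no row has a sum of exactly 0.

```python
def step(s):
    # Assumption: fire only when the weighted sum is strictly positive.
    return 1 if s > 0 else 0

def train_epoch(weights, samples, rate):
    """One pass over the samples with w <- w + rate * error * input."""
    weights = list(weights)
    for inputs, required in samples:
        s = sum(w * x for w, x in zip(weights, inputs))
        error = required - step(s)
        weights = [w + rate * error * x for w, x in zip(weights, inputs)]
    return weights

# AND function; I0 = -1 is the constant input from the table.
AND_SAMPLES = [((-1, 0, 0), 0), ((-1, 0, 1), 0), ((-1, 1, 0), 0), ((-1, 1, 1), 1)]

w = [0.3, 0.5, -0.4]  # starting weights W0, W1, W2 from the table
for epoch in range(1, 4):
    w = train_epoch(w, AND_SAMPLES, rate=0.1)
    print(epoch, [round(v, 2) for v in w])
```

The weights printed after epochs 1 and 2 should match the starting weights of epochs 2 and 3 in the table, (0.3, 0.5, -0.3) and (0.3, 0.5, -0.2).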
Learning Non-Linearly Separable Functions
Example
Say we want to create a neural network that tests for the equality of two bits: f(x1, x2) = z1
When x1 and x2 are equal, z1 is 1; otherwise, z1 is 0
The function we want to approximate is as follows:
            Inputs    Goal outputs
Sample No.  x1   x2   z1
1           0    0    1
2           0    1    0
3           1    0    0
4           1    1    1
What architecture might be suitable for a neural network?
Architecture for ANN Approximating the Equality Function for 2 Bits
Possible Network Architecture
[Figure: network with inputs x1, x2, units y1, y2, and output z1]
      Inputs    Goal outputs
No.   x1   x2   z1
1     0    0    1
2     0    1    0
3     1    0    0
4     1    1    1
[Figure: the same network (x1, x2, y1, y2, z1) with its weights]
x1   x2   z1
0    0    .925
0    1    .192
1    0    .19
1    1    .433
Categorically
For inputs x1=0, x2=0 and x1=1, x2=1, the
output of the network was always greater than
for inputs x1=1, x2=0 and x1=0, x2=1
Summed squared error
$\sum_{s=1}^{\text{numTrainSamples}} (\text{ActualOutput}_s - \text{DesiredOutput}_s)^2$
Compute the summed squared error for our
example
x1   x2   z1
0    0    .925
0    1    .192
1    0    .19
1    1    .433
$\sum_{s=1}^{\text{numTrainSamples}} (\text{ActualOutput}_s - \text{DesiredOutput}_s)^2$
Solution
x1   x2   Expected z1   Actual z1   squared error
0    0    1             0.925       0.005625
0    1    0             0.192       0.036864
1    0    0             0.19        0.0361
1    1    1             0.433       0.321489
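Adding the four squared errors gives the summed squared error for the example. A minimal sketch, using the expected and actual outputs from the table; the function name is illustrative.

```python
desired = [1, 0, 0, 1]                # expected z1 for the four samples
actual = [0.925, 0.192, 0.19, 0.433]  # network outputs from the table

def summed_squared_error(actual, desired):
    """Sum of (actual - desired)^2 over all training samples."""
    return sum((a - d) ** 2 for a, d in zip(actual, desired))

sse = summed_squared_error(actual, desired)
print(round(sse, 6))  # 0.400078
```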
[Figure: worked numeric example of applying f to a matrix-vector product]
$y = f(x, w)$
y is the neuron’s output, x is the vector of inputs, and w is the vector of synaptic weights.
Examples:
$y = \frac{1}{1 + e^{-w^T x - a}}$  sigmoidal neuron
$y = e^{-\frac{\|x - w\|^2}{2a^2}}$  Gaussian neuron
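Both neuron types can be written as small functions of the input vector x, the weight vector w, and the parameter a. This is a sketch under the formulas above; the function names are illustrative.

```python
import math

def sigmoidal_neuron(x, w, a):
    """y = 1 / (1 + exp(-w^T x - a))."""
    wx = sum(wi * xi for wi, xi in zip(w, x))
    return 1.0 / (1.0 + math.exp(-wx - a))

def gaussian_neuron(x, w, a):
    """y = exp(-||x - w||^2 / (2 a^2))."""
    dist2 = sum((xi - wi) ** 2 for xi, wi in zip(x, w))
    return math.exp(-dist2 / (2.0 * a ** 2))

print(sigmoidal_neuron([0.0, 0.0], [1.0, -1.0], 0.0))  # 0.5
print(gaussian_neuron([1.0, 2.0], [1.0, 2.0], 0.5))    # 1.0
```

The Gaussian neuron responds most strongly when the input coincides with its weight vector, while the sigmoidal neuron responds to the size of the projection of x onto w.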
Artificial neural networks
[Figure: network diagram with Inputs and Output]
The young animal learns that the green fruits are sour,
while the yellowish/reddish ones are sweet. The learning
happens by adapting the fruit picking behavior.
ENERGY MINIMIZATION
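[Figure: network diagram with Inputs and Output]
$y_{out} = F(x, W)$
$r(x) = r(\|x - c\|)$
Example: $f(x) = e^{-\frac{\|x - w\|^2}{2a^2}}$  Gaussian RBF
$y_{out} = \sum_{k=1}^{4} w^2_k \cdot e^{-\frac{\|x - w^{1,k}\|^2}{2 (a_k)^2}}$
(superscripts 1 and 2 on w index the layer; they are not powers)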
Neural network tasks
• control
• classification
• prediction
• approximation
These can be reformulated in general as FUNCTION APPROXIMATION tasks.
Task specification:
$\Delta w_i^j = -c \cdot \frac{\partial E}{\partial w_i^j}(W)$
$w_i^{j,new} = w_i^j + \Delta w_i^j$
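Learning:
$E(t) = (w(t)^T x^t - y_t)^2$
$w_i(t+1) = w_i(t) - c \cdot \frac{\partial E(t)}{\partial w_i} = w_i(t) - c \cdot \frac{\partial (w(t)^T x^t - y_t)^2}{\partial w_i}$
$w_i(t+1) = w_i(t) - c \cdot (w(t)^T x^t - y_t) \cdot x_i^t$ (the constant factor 2 from differentiating the square is absorbed into c)
$w(t)^T x = \sum_{j=1}^{m} w_j(t) \cdot x_j^t$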
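The update rule for a single linear neuron can be sketched as follows. The training data are hypothetical, generated from y = 2·x1 - x2 only to show the weights moving toward the generating values; `lms_step` is an illustrative name.

```python
def lms_step(w, x, y, c):
    """w_i <- w_i - c * (w^T x - y) * x_i  (factor 2 absorbed into c)."""
    err = sum(wi * xi for wi, xi in zip(w, x)) - y
    return [wi - c * err * xi for wi, xi in zip(w, x)]

# Hypothetical noise-free data from y = 2*x1 - 1*x2.
data = [([1.0, 0.0], 2.0), ([0.0, 1.0], -1.0), ([1.0, 1.0], 1.0)]

w = [0.0, 0.0]
for _ in range(200):          # repeated passes over the data
    for x, y in data:
        w = lms_step(w, x, y, c=0.1)
print([round(v, 3) for v in w])
```

The learning rate c = 0.1 is an arbitrary small value; a rate that is too large for the size of the inputs would make the updates diverge.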
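Data: $(x^1, y_1), (x^2, y_2), \ldots, (x^N, y_N)$
Error: $E(t) = (y(t)_{out} - y_t)^2 = \left(\sum_{k=1}^{M} w^2_k(t) \cdot e^{-\frac{\|x^t - w^{1,k}\|^2}{2 (a_k)^2}} - y_t\right)^2$
Learning:
$w^2_i(t+1) = w^2_i(t) - c \cdot \frac{\partial E(t)}{\partial w^2_i}$
$\frac{\partial E(t)}{\partial w^2_i} = 2 \cdot (F(x^t, W(t)) - y_t) \cdot e^{-\frac{\|x^t - w^{1,i}\|^2}{2 (a_i)^2}}$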
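For a network of Gaussian units, the output-weight update can be sketched as below. The centers, widths, training pair, and learning rate are hypothetical values chosen for illustration, and only the output weights are trained.

```python
import math

def rbf_features(x, centers, widths):
    """phi_k(x) = exp(-||x - w^{1,k}||^2 / (2 a_k^2)) for every unit k."""
    return [math.exp(-sum((xi - ci) ** 2 for xi, ci in zip(x, ctr)) / (2.0 * a ** 2))
            for ctr, a in zip(centers, widths)]

def rbf_output(x, out_w, centers, widths):
    """y_out = sum_k w^2_k * phi_k(x)."""
    return sum(w * p for w, p in zip(out_w, rbf_features(x, centers, widths)))

def rbf_step(out_w, x, y, centers, widths, c):
    """w^2_i <- w^2_i - c * 2 * (F(x) - y) * phi_i(x)."""
    phi = rbf_features(x, centers, widths)
    err = sum(w * p for w, p in zip(out_w, phi)) - y
    return [w - c * 2.0 * err * p for w, p in zip(out_w, phi)]

# Hypothetical 1-D setup: two fixed centers and one training pair.
centers, widths = [[0.0], [1.0]], [0.5, 0.5]
w2 = [0.0, 0.0]
x, y = [0.0], 1.0
before = (rbf_output(x, w2, centers, widths) - y) ** 2
for _ in range(50):
    w2 = rbf_step(w2, x, y, centers, widths, c=0.1)
after = (rbf_output(x, w2, centers, widths) - y) ** 2
print(before, after)  # the squared error shrinks toward 0
```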
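with p layers
[Figure: input x passes through layers 1, 2, …, p-1, p to give y_out]
$y^2_k = \frac{1}{1 + e^{-w^{2kT} y^1 - a^2_k}}, \quad k = 1, \ldots, M_2$
$y^2 = (y^2_1, \ldots, y^2_{M_2})^T$
...
$y_{out} = F(x; W) = w^{pT} y^{p-1}$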
Data: $(x^1, y_1), (x^2, y_2), \ldots, (x^N, y_N)$
Error: $E(t) = (y(t)_{out} - y_t)^2 = (F(x^t; W) - y_t)^2$
$y_{out} = F(x; W) = \sum_{k=1}^{M} w^2_k \cdot \frac{1}{1 + e^{-w^{1,kT} x - a_k}}$
Learning with general
optimisation
Synaptic weight change rules for the output neuron:
$w^2_i(t+1) = w^2_i(t) - c \cdot \frac{\partial E(t)}{\partial w^2_i}$
$\frac{\partial E(t)}{\partial w^2_i} = 2 \cdot (F(x^t, W(t)) - y_t) \cdot \frac{1}{1 + e^{-w^{1,iT} x^t - a_i}}$
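Synaptic weight change rules for the first-layer (hidden) weights. The chain rule passes through the output weight $w^2_i$, so it appears as a factor:
$\frac{\partial E(t)}{\partial w_j^{1,i}} = 2 \cdot (F(x^t, W(t)) - y_t) \cdot w^2_i(t) \cdot \frac{\partial}{\partial w_j^{1,i}} \left( \frac{1}{1 + e^{-w^{1,iT} x^t - a_i}} \right)$
$\frac{\partial}{\partial w_j^{1,i}} \left( \frac{1}{1 + e^{-w^{1,iT} x^t - a_i}} \right) = \frac{-e^{-w^{1,iT} x^t - a_i}}{\left(1 + e^{-w^{1,iT} x^t - a_i}\right)^2} \cdot \frac{\partial}{\partial w_j^{1,i}} \left( -w^{1,iT} x^t - a_i \right)$
$\frac{\partial}{\partial w_j^{1,i}} \left( -w^{1,iT} x^t - a_i \right) = -x_j^t$
$w_j^{1,i}(t+1) = w_j^{1,i}(t) - c \cdot 2 \cdot (F(x^t, W(t)) - y_t) \cdot w^2_i(t) \cdot \frac{-e^{-w^{1,iT} x^t - a_i}}{\left(1 + e^{-w^{1,iT} x^t - a_i}\right)^2} \cdot (-x_j^t)$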
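The weight change rules for a network with one layer of sigmoidal units and a linear output can be sketched as below. The network size, weights, sample, and learning rate are hypothetical. The first-layer gradient includes the output weight as a factor, which is what the chain rule gives for y_out = Σ_k w2[k] · sigmoid(...).

```python
import math

def sig(u):
    return 1.0 / (1.0 + math.exp(-u))

def forward(x, W1, a, w2):
    """y_out = sum_k w2[k] * sig(W1[k] . x + a[k]); also returns hidden outputs."""
    hidden = [sig(sum(wj * xj for wj, xj in zip(row, x)) + ak)
              for row, ak in zip(W1, a)]
    return sum(wk * hk for wk, hk in zip(w2, hidden)), hidden

def gradients(x, y, W1, a, w2):
    """Gradients of E = (y_out - y)^2 with respect to w2 and W1."""
    out, hidden = forward(x, W1, a, w2)
    d = 2.0 * (out - y)
    g_w2 = [d * h for h in hidden]
    # sig'(u) = e^{-u} / (1 + e^{-u})^2, which equals h * (1 - h)
    g_W1 = [[d * wk * h * (1.0 - h) * xj for xj in x]
            for wk, h in zip(w2, hidden)]
    return g_w2, g_W1

# Hypothetical small network and training sample.
W1, a, w2 = [[0.2, -0.4], [0.7, 0.1]], [0.0, -0.3], [0.5, -0.6]
x, y, c = [1.0, 2.0], 0.3, 0.1

g_w2, g_W1 = gradients(x, y, W1, a, w2)
w2_new = [w - c * g for w, g in zip(w2, g_w2)]
W1_new = [[w - c * g for w, g in zip(row, grow)] for row, grow in zip(W1, g_W1)]
print((forward(x, W1, a, w2)[0] - y) ** 2,
      (forward(x, W1_new, a, w2_new)[0] - y) ** 2)  # error before and after one step
```

The parameters a are left fixed here for brevity; they could be updated with the same pattern.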
New methods for learning with
neural networks
Bayesian learning:
the distribution of the neural network
parameters is learnt