Introduction
This article is Part 1 of a series of three articles that I am going to post. The proposed article
content will be as follows:
1. Part 1: This one, will be an introduction into Perceptron networks (single layer
neural networks)
2. Part 2: Will be about multi-layer neural networks, and the back propagation
training method to solve a non-linear classification problem such as the logic of
an XOR logic gate. This is something that a Perceptron can't do. This is explained
further within this article.
3. Part 3: Will be about how to use a genetic algorithm (GA) to train a multi-layer
neural network to solve some logic problem.
Figure 1 Neuron
The boundary of the neuron is known as the cell membrane. There is a voltage difference
(the membrane potential) between the inside and outside of the membrane.
If the input is large enough, an action potential is then generated. The action potential
(neuronal spike) then travels down the axon, away from the cell body.
Synapses
The connections between one neuron and another are called synapses. Information
always leaves a neuron via its axon (see Figure 1 above), and is then transmitted across a
synapse to the receiving neuron.
Neuron Firing
Neurons only fire when the input is bigger than some threshold. It should, however, be noted
that the firing doesn't get bigger as the stimulus increases; it's an all-or-nothing arrangement.
Suppose that we have a firing rate at each neuron. Also suppose that a neuron connects
with m other neurons and so receives m-many inputs "x1 ... xm". We could imagine
this configuration looking something like:
Activation Function
The activation usually uses one of the following functions.
Sigmoid Function
The stronger the input, the faster the neuron fires (the higher the firing rate). The
sigmoid is also very useful in multi-layer networks, as the sigmoid curve allows for
differentiation (which is required in Back Propagation training of multi-layer networks).
Or, if you're into maths-type explanations: f(x) = 1 / (1 + e^-x)
Step Function
A basic on/off type function: if x < 0 then 0, else 1.
Or, if you're into maths-type explanations: f(x) = 0 if x < 0, else 1
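As a quick illustration, the two activation functions above can be sketched in Python (a minimal sketch of my own; the function names are not from the article's code):

```python
import math

def sigmoid(x):
    """Smooth activation: output rises gradually from 0 towards 1."""
    return 1.0 / (1.0 + math.exp(-x))

def step(x):
    """Basic on/off activation: if x < 0 then 0, else 1."""
    return 0 if x < 0 else 1

# The sigmoid squashes any input into the (0, 1) range,
# while the step function is all-or-nothing.
print(sigmoid(0))   # 0.5
print(step(-0.1))   # 0
print(step(0))      # 1
```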
Learning
A foreword on learning
Before we carry on to talk about perceptron learning, let's consider a real-world example:
How do you teach a child to recognize a chair? You show him examples, telling him,
"This is a chair. That is not a chair," until the child learns the concept of what a chair is.
In this stage, the child can look at the examples we have shown him and answer correctly
when asked, "Is this object a chair?"
Furthermore, if we show to the child new objects that he hasn't seen before, we could
expect him to recognize correctly whether the new object is a chair or not, providing that
we've given him enough positive and negative examples.
This is exactly the idea behind the perceptron.
Learning in perceptrons
Learning is the process of modifying the weights and the bias. A perceptron computes a binary
function of its input. Whatever a perceptron can compute, it can learn to compute.
"The perceptron is a program that learn concepts, i.e. it can learn to respond with True
(1) or False (0) for inputs we present to it, by repeatedly "studying" examples presented
to it.
The Perceptron is a single layer neural network whose weights and biases could be
trained to produce a correct target vector when presented with the corresponding input
vector. The training technique used is called the perceptron learning rule. The perceptron
generated great interest due to its ability to generalize from its training vectors and work
with randomly distributed connections. Perceptrons are especially suited for simple
problems in pattern classification."
Professor Jianfeng Feng, Centre for Scientific Computing, Warwick University, England.
The Learning Rule
The perceptron is trained to respond to each input vector with a corresponding target
output of either 0 or 1. The learning rule has been proven to converge on a solution in
finite time if a solution exists.
The learning rule can be summarized in the following two equations:
b=b+[T-A]
For all inputs i:
W(i) = W(i) + [ T - A ] * P(i)
Where W is the vector of weights, P is the input vector presented to the network, T is the
correct result that the neuron should have shown, A is the actual output of the neuron, and
b is the bias.
Training
Vectors from a training set are presented to the network one after another.
If the network's output is correct, no change is made.
Otherwise, the weights and biases are updated using the perceptron learning rule (as
shown above). When an entire pass through all of the input training vectors (such a pass
is called an epoch) has occurred without error, training is complete.
At this time any input training vector may be presented to the network and it will respond
with the correct output vector. If a vector, P, not in the training set is presented to the
network, the network will tend to exhibit generalization by responding with an output
similar to target vectors for input vectors close to the previously unseen input vector P.
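To make the training procedure concrete, here is a minimal Python sketch (my own illustration, not the article's code) that trains a single perceptron on the logic of an AND gate using exactly the two update equations given above:

```python
def perceptron_train(training_set, epochs=20):
    """Train a single perceptron with the learning rule:
    b = b + (T - A),  W(i) = W(i) + (T - A) * P(i)."""
    w = [0.0, 0.0]
    b = 0.0
    for _ in range(epochs):
        errors = 0
        for p, t in training_set:
            # step activation: fire if weighted sum + bias > 0
            a = 1 if (w[0] * p[0] + w[1] * p[1] + b) > 0 else 0
            if a != t:                      # only update on a wrong answer
                errors += 1
                b += (t - a)
                for i in range(2):
                    w[i] += (t - a) * p[i]
        if errors == 0:                     # a whole epoch without error: done
            break
    return w, b

# AND is linearly separable, so the rule converges in finite time
and_set = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]
w, b = perceptron_train(and_set)
```

Presenting XOR to the same code would never converge, which is exactly the limitation of the Perceptron that Part 2 addresses.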
Introduction
This article is part 2 of a series of 3 articles that I am going to post. The proposed article
content will be as follows:
1. Part 1 : Is an introduction into Perceptron networks (single layer neural networks).
2. Part 2 : This one, is about multi-layer neural networks, and the back propagation
training method to solve a non-linear classification problem such as the logic of an
XOR logic gate. This is something that a Perceptron can't do. This is explained
further within this article.
3. Part 3 : Will be about how to use a genetic algorithm (GA) to train a multi-layer
neural network to solve some logic problem.
Summary
This article will show how to use a multi-layer neural network to solve the XOR logic
problem.
The bias can be thought of as the propensity (a tendency towards a particular way of
behaving) of the perceptron to fire irrespective of its inputs. The perceptron configuration
network shown above fires if the weighted sum > 0; or, if you're into maths-type
explanations, it fires if w1I1 + w2I2 + ... + wnIn + b > 0.
As already explained above, there is one input layer, one hidden layer and one output
layer.
It is by using the inputs and weights that we are able to work out the activation for a
given node. This is easily achieved for the hidden layer as it has direct links to the actual
input layer.
The output layer, however, knows nothing about the input layer as it is not directly
connected to it. So to work out the activation for an output node we need to make use of
the output from the hidden layer nodes, which are used as inputs to the output layer
nodes.
This entire process described above can be thought of as a pass forward from one layer to
the next.
This still works like it did with a single layer network; the activation for any given node
is still worked out as follows: a = w1I1 + w2I2 + ... + wnIn, where wi is the weight(i) and
Ii is the input(i) value.
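The pass forward described above can be sketched in Python (a simplified illustration with made-up weights; this is not the article's C# code):

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def activate(inputs, weights, bias):
    """Activation of one node: sigmoid of the weighted sum plus bias."""
    return sigmoid(sum(w * i for w, i in zip(weights, inputs)) + bias)

def forward_pass(inputs, hidden_layer, output_layer):
    """hidden_layer / output_layer are lists of (weights, bias) per node.
    The hidden nodes read the real inputs; the output nodes read the
    hidden nodes' outputs -- they never see the input layer directly."""
    hidden_out = [activate(inputs, w, b) for w, b in hidden_layer]
    return [activate(hidden_out, w, b) for w, b in output_layer]

# a 2-2-1 network (the shape used for XOR) with arbitrary example weights
hidden = [([0.5, -0.4], 0.1), ([-0.3, 0.8], 0.0)]
output = [([1.2, -0.7], 0.05)]
print(forward_pass([1, 0], hidden, output))
```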
Types Of Learning
There are essentially two types of learning that may be applied to a Neural Network, which
are "Reinforcement" and "Supervised".
Reinforcement
In Reinforcement learning, during training, a set of inputs is presented to the Neural
Network. Say the output is 0.75 when the target was expecting 1.0.
The error (1.0 - 0.75) is used for training ('wrong by 0.25').
What if there are two outputs? Then the total error is summed to give a single number
(typically the sum of squared errors). E.g. "your total error on all outputs is 1.76".
Note that this just tells you how wrong you were, not in which direction you were wrong.
Using this method we may never get a result, or it could be a case of 'Hunt the needle'.
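The single "how wrong overall" number mentioned above (the sum of squared errors) is simple to compute; a quick sketch:

```python
def sum_squared_error(targets, outputs):
    """Collapse per-output errors into one 'how wrong overall' number."""
    return sum((t - o) ** 2 for t, o in zip(targets, outputs))

# one output wrong by 0.25: the squared error is 0.0625
print(sum_squared_error([1.0], [0.75]))  # 0.0625
```

Note the squaring throws away the sign, which is precisely why this number tells you how wrong you were, but not in which direction.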
NOTE: Part 3 of this series will be using a GA to train a Neural Network, which is
Reinforcement learning. The GA simply does what a GA does, performing all the normal GA
phases to select weights for the Neural Network. There is no back propagation of values.
The Neural Network is just good or just bad. As one can imagine, this process takes a lot
more steps to get to the same result.
Supervised
In Supervised Learning the Neural Network is given more information:
not just 'how wrong' it was, but 'in what direction it was wrong', like 'Hunt the needle' but
where you are told 'North a bit', 'West a bit'.
So you get, and use, far more information in Supervised Learning, and this is the normal
form of Neural Network learning algorithm. Back Propagation (what this article uses) is
Supervised Learning.
Learning Algorithm
In brief, to train a multi-layer Neural Network, the following steps are carried out:
1. Start off with random weights (and biases) in the Neural Network
2. Try one or more members of the training set, and see how bad the output(s) are
compared to what they should be (compared to the target output(s))
3. Jiggle the weights a bit, aimed at getting an improvement on the outputs
4. Now try with a new lot of the training set, or repeat again, jiggling the weights
each time
This is what this article submission uses to solve the XOR problem. This is also called
"Back Propagation" (normally shortened to BP or BackProp).
Backprop allows you to use this error at output, to adjust the weights arriving at the
output layer, but then also allows you to calculate the effective error 1 layer back, and use
this to adjust the weights arriving there, and so on, back-propagating errors through any
number of layers.
The trick is the use of a sigmoid as the non-linear transfer function (which was covered in
Part 1). The sigmoid is used as it offers the ability to apply differentiation techniques.
Because the sigmoid is nicely differentiable, it so happens that its derivative can be
written in terms of its own output: f'(x) = f(x) * (1 - f(x)). This is the
Output * (1 - Output) factor that appears in the delta calculations of the training code.
A momentum term is also applied; since momentum emphasizes any consistent direction,
training will go down valleys with gentle bottom-slopes much more successfully (faster).
Starting The Training
This is probably best demonstrated with a code snippet from the article's actual code:
/// <summary>
/// The main training. The expected target values are passed in to this
/// method as parameters, and the <see cref="NeuralNetwork">NeuralNetwork</see>
/// is then updated with small weight changes, for this training iteration.
/// This method also applies momentum, to ensure that the NeuralNetwork is
/// nurtured into proceeding in the correct direction. We are trying to avoid valleys.
/// If you don't know what valleys means, read the article's associated text.
/// </summary>
/// <param name="target">A double[] array containing the target value(s)</param>
private void train_network(double[] target)
{
    // get momentum values (delta values from last pass)
    double[] delta_hidden = new double[nn.NumberOfHidden + 1];
    double[] delta_outputs = new double[nn.NumberOfOutputs];

    // Get the delta value for the output layer
    for (int i = 0; i < nn.NumberOfOutputs; i++)
    {
        delta_outputs[i] =
            nn.Outputs[i] * (1.0 - nn.Outputs[i]) * (target[i] - nn.Outputs[i]);
    }

    // Get the delta value for the hidden layer
    for (int i = 0; i < nn.NumberOfHidden + 1; i++)
    {
        double error = 0.0;
        for (int j = 0; j < nn.NumberOfOutputs; j++)
        {
            error += nn.HiddenToOutputWeights[i, j] * delta_outputs[j];
        }
        delta_hidden[i] = nn.Hidden[i] * (1.0 - nn.Hidden[i]) * error;
    }

    // Now update the weights between hidden & output layer
    for (int i = 0; i < nn.NumberOfOutputs; i++)
    {
        for (int j = 0; j < nn.NumberOfHidden + 1; j++)
        {
Introduction
This article is part 3 of a series of three articles that I am going to post. The proposed
article content will be as follows:
1. Part 1: Is an introduction into Perceptron networks (single layer
neural networks).
2. Part 2: Will be about multi-layer neural networks, and the back propagation
training method to solve a non-linear classification problem such as the logic of
an XOR logic gate. This is something that a Perceptron can't do. This is explained
further within this article.
3. Part 3: This one is about how to use a genetic algorithm (GA) to train a
multi-layer neural network to solve some logic problem. If you have never come across
genetic algorithms, perhaps my other article located here may be a good place to
start to learn the basics.
Summary
This article will show how to use a Microbial Genetic Algorithm to train a multi-layer
neural network to solve the XOR logic problem.
The bias can be thought of as the propensity (a tendency towards a particular way of
behaving) of the perceptron to fire irrespective of its inputs. The perceptron configuration
network shown in Figure 5 fires if the weighted sum > 0; or, if you are into maths-type
explanations, it fires if w1I1 + w2I2 + ... + wnIn + b > 0.
As already explained above, there is one input layer, one hidden layer, and one output
layer.
It is by using the inputs and weights that we are able to work out the activation for a
given node. This is easily achieved for the hidden layer as it has direct links to the actual
input layer.
The output layer, however, knows nothing about the input layer as it is not directly
connected to it. So to work out the activation for an output node, we need to make use of
the output from the hidden layer nodes, which are used as inputs to the output layer
nodes.
This entire process described above can be thought of as a pass forward from one layer to
the next.
This still works like it did with a single layer network; the activation for any given node
is still worked out as follows: a = w1I1 + w2I2 + ... + wnIn,
where wi is the weight(i), and Ii is the input(i) value. You see, it's the same old stuff; no
demons, smoke, or magic here. It's stuff we've already covered.
So that's how the network looks. Now I guess you want to know how to go about training
it.
Learning
There are essentially two types of learning that may be applied to a neural network,
which are "Reinforcement" and "Supervised".
Reinforcement
In Reinforcement learning, during training, a set of inputs is presented to the neural
network. Say the output is 0.75 when the target was expecting 1.0. The error (1.0 - 0.75) is
used for training ("wrong by 0.25"). What if there are two outputs? Then the total error is
summed to give a single number (typically the sum of squared errors). E.g., "your total error
on all outputs is 1.76". Note that this just tells you how wrong you were, not in which
direction you were wrong. Using this method, we may never get a result, or it could be a
case of 'Hunt the needle'.
Using a genetic algorithm to train a multi-layer neural network offers a Reinforcement
type training arrangement, where the mutation is responsible for "jiggling the weights a
bit". This is what this article is all about.
Supervised
In Supervised learning, the neural network is given more information. Not just "how
wrong" it was, but "in what direction it was wrong", like "Hunt the needle", but where
you are told "North a bit" "West a bit". So you get, and use, far more information in
Supervised learning, and this is the normal form of neural network learning algorithm.
This training method is normally conducted using a Back Propagation training method,
which I covered in Part 2, so if this is your first article of these three parts, and the back
propagation method is of particular interest, then you should look there.
As can be seen from the figure above, what we are going to do is have a GA which will
actually contain a population of neural networks. The idea being that the GA will jiggle
the weights of the neural networks, within the population, in the hope that the jiggling of
the weights will push the neural network population towards a solution to the XOR
problem.
1. The Genotype
For this article, the problem domain states that we have a population of neural networks,
so I created a single-dimension array of NeuralNetwork objects. This can be seen from
the constructor code within the GA_Trainer_XOR object:
// ANNs
private NeuralNetwork[] networks;

public GA_Trainer_XOR()
{
    networks = new NeuralNetwork[POPULATION];
    // create new ANN objects, random weights applied at start
    for (int i = 0; i <= networks.GetUpperBound(0); i++)
    {
        networks[i] = new NeuralNetwork(2, 2, 1);
        networks[i].Change +=
            new NeuralNetwork.ChangeHandler(GA_Trainer_NN_Change);
    }
}
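The weight-jiggling the GA performs on this population can be sketched in Python. This is a simplified microbial-style tournament over flat weight vectors (in the full microbial GA the loser also inherits genes from the winner by crossover; here the loser is simply overwritten with a mutated copy of the winner, and all names, the population size, and the toy fitness function are my own illustrative choices, not the article's code):

```python
import random

def microbial_tournament(population, fitness, mutation_rate=0.1):
    """One tournament step: pick two genotypes (weight vectors) at random;
    the loser is overwritten with a mutated copy of the winner."""
    a, b = random.sample(range(len(population)), 2)
    winner, loser = (a, b) if fitness(population[a]) >= fitness(population[b]) else (b, a)
    population[loser] = [
        w + random.gauss(0, 1) if random.random() < mutation_rate else w
        for w in population[winner]
    ]

# population of 10 'networks', each a flat vector of 9 weights
# (enough for a 2-2-1 net: 2*2 + 2 hidden->output weights + 3 biases)
random.seed(0)
pop = [[random.uniform(-1, 1) for _ in range(9)] for _ in range(10)]

# toy fitness: prefer small weights (stands in for 'low XOR error')
for _ in range(100):
    microbial_tournament(pop, lambda g: -sum(w * w for w in g))
```

In the article's actual trainer, each genotype would be loaded into one of the NeuralNetwork objects in the population, and fitness would be how well that network answers the XOR training set.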