Neural Net 3rdclass

A Brief Overview of Neural Networks
Overview
Relation to Biological Brain: Biological Neural Network
The Artificial Neuron
Types of Networks and Learning Techniques
Supervised Learning & Backpropagation Training Algorithm
Learning by Example
Applications
Biological Neuron
Artificial Neuron
[Figure: each input is multiplied by a weight W and fed to the neuron; the activation function f(n) produces the outputs.]
Transfer Functions
SIGMOID: f(n) = 1/(1 + e^(-n))
LINEAR: f(n) = n
[Plots: output versus input for each function.]
Types of networks
Multiple Inputs and Single Layer
Multiple Inputs and layers
Types of Networks (Contd.)

Feedback
Recurrent Networks
Feed-forward networks:
Information only flows one way
One input pattern produces one output
No sense of time (or memory of previous state)
Recurrency:
Nodes connect back to other nodes or themselves
Information flow is multidirectional
Sense of time and memory of previous state(s)
Biological nervous systems show high levels of recurrency (but feed-forward structures exist too)
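The contrast between the two kinds of network can be sketched in a few lines of Python. This is an illustration only: the weights (0.5), the sigmoid squashing and the `RecurrentUnit` class are assumptions made for the example, not part of the slides.

```python
import math


def sigmoid(n):
    return 1.0 / (1.0 + math.exp(-n))


def feed_forward_unit(x, w=0.5):
    # The output depends only on the current input: no memory.
    return sigmoid(w * x)


class RecurrentUnit:
    """Hypothetical node whose previous output is fed back as an extra input."""

    def __init__(self, w_in=0.5, w_back=0.5):
        self.w_in = w_in
        self.w_back = w_back
        self.state = 0.0  # memory of the previous output

    def step(self, x):
        self.state = sigmoid(self.w_in * x + self.w_back * self.state)
        return self.state


# Present the same input three times to each kind of unit.
ff_outputs = [feed_forward_unit(1.0) for _ in range(3)]
unit = RecurrentUnit()
rec_outputs = [unit.step(1.0) for _ in range(3)]

print(ff_outputs)   # identical values: one input pattern, one output
print(rec_outputs)  # values change over time because of the fed-back state
```

The feed-forward unit returns the same value every time, while the recurrent unit's response to the same input drifts as its stored state changes.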
ANNs: The Basics

ANNs incorporate the two fundamental components of biological neural nets:
1. Neurones (nodes)
2. Synapses (weights)
Feed-forward nets

Information flow is unidirectional
Data is presented to Input layer
Passed on to Hidden Layer
Passed on to Output layer
Information is distributed
Information processing is parallel
Internal representation (interpretation) of data
Neural networks are good for prediction problems.
The inputs are well understood. You have a good idea of which features of the data are important, but not necessarily how to combine them.
The output is well understood. You know what you are trying to predict.
Experience is available. You have plenty of examples where both the inputs and the output are known. This experience will be used to train the network.
Feeding data through the net:
(1 × 0.25) + (0.5 × (-1.5)) = 0.25 + (-0.75) = -0.5
Squashing: 1/(1 + e^(0.5)) = 0.3775
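The arithmetic on this slide can be checked with a short Python sketch. It assumes that 1 and 0.5 are the inputs and that 0.25 and -1.5 are the weights; the slide does not say which factor is which, and the result is the same either way.

```python
import math


def sigmoid(n):
    # Squashing function: maps any net input into the range (0, 1).
    return 1.0 / (1.0 + math.exp(-n))


inputs = [1.0, 0.5]     # assumed to be the input values
weights = [0.25, -1.5]  # assumed to be the connection weights

# Weighted sum of the inputs (the net input to the neuron).
net = sum(x * w for x, w in zip(inputs, weights))
output = sigmoid(net)

print(net)               # -0.5
print(round(output, 4))  # 0.3775
```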
Learning Techniques
Supervised Learning:
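[Figure: inputs from the environment feed both the actual system, which gives the expected output, and the neural network, which gives the actual output. The difference between the two outputs is the error used for training.]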
Multilayer Perceptron
[Figure: inputs, first hidden layer, second hidden layer, output layer. Signal flow: function signals travel forward through the layers; error signals are backpropagated.]
Neural networks for Directed Data Mining: Building a model for classification and prediction
1. Identify the input and output features.
2. Normalize (scaling) the inputs and outputs so their range is between 0 and 1.
3. Set up a network on a representative set of training examples.
4. Train the network on a representative set of training examples.
5. Test the network on a test set strictly independent from the training examples. If necessary, repeat the training, adjusting the training set, network topology, and parameters. Evaluate the network using the evaluation set to see how well it performs.
6. Apply the model generated by the network to predict outcomes for unknown inputs.
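The normalization step (scaling values into the range 0 to 1) could look like the following sketch. Min-max scaling is one common choice and is an assumption here, since the slides do not name a method; the `ages` list is made-up data.

```python
def min_max_scale(values):
    """Scale a list of numbers linearly so the smallest maps to 0 and the largest to 1."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # All values are equal: there is no spread to scale.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


ages = [18, 30, 42, 66]  # hypothetical input feature
scaled = min_max_scale(ages)
print(scaled)  # [0.0, 0.25, 0.5, 1.0]
```

The same minimum and maximum found on the training set would normally be reused when scaling the test set and any unknown inputs.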
Learning by Example
Hidden layer transfer function: sigmoid function, F(n) = 1/(1+exp(-n)), where n is the net input to the neuron.
Derivative: F'(n) = (output of the neuron)(1 - output of the neuron), the slope of the transfer function.
Output layer transfer function: linear function, F(n) = n; output = input to the neuron.
Derivative: F'(n) = 1
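The sigmoid slope formula, (output)(1 - output), can be checked numerically. The sketch below compares it with a central finite difference at an arbitrarily chosen net input of 0.5; the step size `h` is also an arbitrary choice.

```python
import math


def sigmoid(n):
    return 1.0 / (1.0 + math.exp(-n))


def sigmoid_slope_from_output(out):
    # Slope of the sigmoid expressed through the neuron's own output.
    return out * (1.0 - out)


n = 0.5   # arbitrary net input
h = 1e-6  # small step for the finite-difference estimate

out = sigmoid(n)
analytic = sigmoid_slope_from_output(out)
numeric = (sigmoid(n + h) - sigmoid(n - h)) / (2 * h)

print(analytic, numeric)  # the two values agree to many decimal places
```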
Purpose of the Activation Function

We want the unit to be active (near +1) when the right inputs are given.
We want the unit to be inactive (near 0) when the wrong inputs are given.
It's preferable for the activation function to be nonlinear. Otherwise, the entire neural network collapses into a simple linear function.
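The collapse mentioned above can be seen with two stacked one-input linear layers. In this sketch the weights and biases are arbitrary values chosen for illustration; composing w2(w1·x + b1) + b2 gives a single linear function with weight w2·w1 and bias w2·b1 + b2.

```python
def linear_layer(w, b):
    # A one-input "layer" with a linear activation: output = w * x + b.
    return lambda x: w * x + b


layer1 = linear_layer(2.0, 1.0)   # arbitrary example weights
layer2 = linear_layer(-3.0, 0.5)

# One layer that is equivalent to layer2 applied after layer1.
collapsed = linear_layer(-3.0 * 2.0, -3.0 * 1.0 + 0.5)

xs = [-2.0, 0.0, 1.5]
stacked = [layer2(layer1(x)) for x in xs]
single = [collapsed(x) for x in xs]

print(stacked)
print(single)  # same values: the extra layer added no expressive power
```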
Possibilities for activation function
Step function
Sign function
Sigmoid (logistic) function

sigmoid(x) = 1/(1+e^(-x))
sign(x) = +1, if x > 0; -1, if x ≤ 0
step(x) = 1, if x > threshold; 0, if x ≤ threshold (in picture above, threshold = 0)
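A minimal Python sketch of the three candidate functions follows. It assumes the step and sign units switch only strictly above the threshold, and that the threshold defaults to 0 as in the picture.

```python
import math


def step(x, threshold=0.0):
    # 1 above the threshold, 0 otherwise.
    return 1 if x > threshold else 0


def sign(x):
    # +1 for positive input, -1 otherwise.
    return 1 if x > 0 else -1


def sigmoid(x):
    # Smooth, differentiable alternative to the step function.
    return 1.0 / (1.0 + math.exp(-x))


for x in (-2.0, 0.0, 2.0):
    print(x, step(x), sign(x), round(sigmoid(x), 4))
```

Only the sigmoid has a usable slope everywhere, which is what gradient-based training relies on.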
Using a Bias Weight to Standardize the Threshold

Adding an extra input with activation a0 = -1 and weight W0,j = t (called the bias weight) is equivalent to having a threshold at t. This way we can always assume a 0 threshold.

[Figure: a unit with inputs -1, x1, x2 and weights T, W1, W2.]
W1x1 + W2x2 < T
W1x1 + W2x2 - T < 0
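The equivalence can be checked directly. In this sketch the weights (1, 1) and threshold 1.5 are assumed values that happen to make the unit compute a logical AND; the function names are hypothetical.

```python
def fires_with_threshold(x1, x2, w1, w2, t):
    # Original form: compare the weighted sum against a threshold t.
    return w1 * x1 + w2 * x2 > t


def fires_with_bias(x1, x2, w1, w2, t):
    # Equivalent form: extra input fixed at -1 with weight t, threshold 0.
    return (-1) * t + w1 * x1 + w2 * x2 > 0


w1, w2, t = 1.0, 1.0, 1.5  # assumed values (the unit acts as AND)
for x1 in (0, 1):
    for x2 in (0, 1):
        a = fires_with_threshold(x1, x2, w1, w2, t)
        b = fires_with_bias(x1, x2, w1, w2, t)
        print(x1, x2, a, b)  # both columns always agree
```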
Learning by Example
Training Algorithm: backpropagation of errors using gradient descent training. Colors:
Red: Current weights
Orange: Updated weights
Black boxes: Inputs and outputs to a neuron
Blue: Sensitivities at each layer
The perceptron learning rule performs gradient descent in weight space.
Error surface: the surface that describes the error on each example as a function of all the weights in the network. A set of weights defines a point on this surface. (It could also be called a state in the state space of possible weights, i.e., weight space.)
We look at the partial derivative of the surface with respect to each weight (i.e., the gradient: how much the error would change if we made a small change in each weight). Then the weights are altered by an amount proportional to the slope in each direction (corresponding to a weight). Thus the network as a whole moves in the direction of steepest descent on the error surface.
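The idea of probing the error surface with a small change in each weight can be sketched with finite differences. Everything concrete here is an assumption for illustration: a single sigmoid unit with two weights, inputs 1 and 0.4, target 1, half the squared error as the error measure, and a step size of 0.1.

```python
import math


def sigmoid(n):
    return 1.0 / (1.0 + math.exp(-n))


def error(weights, inputs, target):
    # Height of the error surface at the point defined by `weights`.
    out = sigmoid(sum(w * x for w, x in zip(weights, inputs)))
    return 0.5 * (target - out) ** 2


def numeric_gradient(weights, inputs, target, h=1e-6):
    # Estimate each partial derivative by nudging one weight at a time.
    grad = []
    for i in range(len(weights)):
        up = list(weights)
        down = list(weights)
        up[i] += h
        down[i] -= h
        grad.append((error(up, inputs, target) - error(down, inputs, target)) / (2 * h))
    return grad


weights = [0.5, 0.5]
inputs = [1.0, 0.4]
target = 1.0
rate = 0.1

grad = numeric_gradient(weights, inputs, target)
# Move each weight against its slope (steepest descent).
new_weights = [w - rate * g for w, g in zip(weights, grad)]

before = error(weights, inputs, target)
after = error(new_weights, inputs, target)
print(grad, before, after)  # the error after the step is smaller
```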
Definition of Error: Sum of Squared Errors

$$E = \frac{1}{2}\sum_{\text{examples}} (t - o)^2 = \frac{1}{2}\sum_{\text{examples}} Err^2$$

The factor 1/2 is introduced to simplify the math on the next slide.
Here, t is the correct (desired) output and o is the actual output of the neural net, so Err = t - o.
Reduction of Squared Error

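Gradient descent reduces the squared error by calculating the partial derivative of E with respect to each weight ($\nabla E$ is a vector of these partial derivatives). For a single example:

$$\frac{\partial E}{\partial W_j} = Err \times \frac{\partial Err}{\partial W_j} \quad \text{(chain rule for derivatives)}$$

$$= Err \times \frac{\partial}{\partial W_j}\left(t - g\left(\sum_{k=0}^{n} W_k x_k\right)\right) \quad \text{(expand the second } Err \text{ to } t - g(in)\text{)}$$

$$= -Err \times g'(in) \times x_j \quad \text{(because } \partial t / \partial W_j = 0 \text{, and the chain rule)}$$

The sum $\sum_{k=0}^{n} W_k x_k$ is called $in$.

Weight update, where $\alpha$ is the learning rate:

$$W_j \leftarrow W_j + \alpha \times Err \times g'(in) \times x_j$$

The weight is updated by α times the negative of this gradient of the error E in weight space. The fact that the weight is updated in the correct direction (+/-) can be verified with examples. The learning rate, α, is typically set to a small value such as 0.1.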
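The update rule on this slide can be tried out on a single sigmoid unit. The sketch below is illustrative: the starting weights, inputs (1 and 0.4) and target (1) are assumed values, and the function name `delta_rule_step` is made up for the example.

```python
import math


def sigmoid(n):
    return 1.0 / (1.0 + math.exp(-n))


def delta_rule_step(weights, inputs, target, rate=0.1):
    """Apply W_j <- W_j + rate * Err * g'(in) * x_j once to every weight."""
    net = sum(w * x for w, x in zip(weights, inputs))  # "in" on the slide
    out = sigmoid(net)                                 # g(in)
    err = target - out                                 # Err = t - o
    slope = out * (1.0 - out)                          # g'(in) for the sigmoid
    updated = [w + rate * err * slope * x for w, x in zip(weights, inputs)]
    return updated, err


weights = [0.5, 0.5]
inputs = [1.0, 0.4]
target = 1.0

errors = []
for _ in range(5):
    weights, err = delta_rule_step(weights, inputs, target)
    errors.append(err)

print(errors)   # the error shrinks a little on every step
print(weights)  # both weights have moved up, toward a larger output
```

Because the error is positive here, the plus sign in the rule raises the weights, which is the direction that brings the output closer to the target.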
First Pass
[Figure: forward pass with input 1 and all weights 0.5. Each input-layer neuron outputs 0.6225, each hidden-layer neuron outputs 0.6508, and the output neuron outputs 0.6508.]
Error = 1 - 0.6508 = 0.3492
Gradient of the output neuron = slope of the transfer function × error
G3 = (1)(0.3492) = 0.3492
Gradient of a hidden neuron = G = slope of the transfer function × [Σ{(weight of the neuron to the next neuron) × (gradient of the next neuron)}]
G2 = (0.6508)(1-0.6508)(0.3492)(0.5) = 0.0397
G1 = (0.6225)(1-0.6225)(0.0397)(0.5)(2) = 0.0093
Weight Update 1
New Weight=Old Weight + {(learning rate)(gradient)(prior output)}
W1: 0.5 + (0.5)(0.0093)(1) = 0.5047
W2: 0.5 + (0.5)(0.0397)(0.6225) = 0.5124
W3: 0.5 + (0.5)(0.3492)(0.6508) = 0.6136
Second Pass
[Figure: forward pass with weights 0.5047, 0.5124 and 0.6136. Each input-layer neuron outputs 0.6236, each hidden-layer neuron has net input 0.6391 and outputs 0.6545, and the output neuron outputs 0.8033.]
Error = 1 - 0.8033 = 0.1967
G3 = (1)(0.1967) = 0.1967
G2 = (0.6545)(1-0.6545)(0.1967)(0.6136) = 0.0273
G1 = (0.6236)(1-0.6236)(0.5124)(0.0273)(2) = 0.0066
Weight Update 2
New Weight=Old Weight + {(learning rate)(gradient)(prior output)}
W1: 0.5047 + (0.5)(0.0066)(1) = 0.508
W2: 0.5124 + (0.5)(0.0273)(0.6236) = 0.5209
W3: 0.6136 + (0.5)(0.1967)(0.6545) = 0.6779
Third Pass
[Figure: forward pass with weights 0.508, 0.5209 and 0.6779. Each input-layer neuron outputs 0.6243, each hidden-layer neuron has net input 0.6504 and outputs 0.6571, and the output neuron outputs 0.8909.]
Weight Update Summary

                    W1      W2      W3      Output  Expected Output  Error
Initial conditions  0.5     0.5     0.5     0.6508  1                0.3492
Pass 1 Update       0.5047  0.5124  0.6136  0.8033  1                0.1967
Pass 2 Update       0.508   0.5209  0.6779  0.8909  1                0.1091

W1: Weights from the input to the input layer
W2: Weights from the input layer to the hidden layer
W3: Weights from the hidden layer to the output layer
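The three passes can be reproduced with a short Python sketch. It rests on a reading of the worked numbers rather than on anything the slides state outright: one input of 1, two sigmoid neurons in the input layer, two sigmoid neurons in the hidden layer, one linear output neuron, a learning rate of 0.5, and equal weights within each layer (so a single value per layer is enough). The function names are made up for the example.

```python
import math


def sigmoid(n):
    return 1.0 / (1.0 + math.exp(-n))


def forward(x, w1, w2, w3):
    a1 = sigmoid(w1 * x)       # output of each of the two input-layer neurons
    a2 = sigmoid(2 * w2 * a1)  # each hidden neuron sums both input-layer outputs
    out = 2 * w3 * a2          # linear output neuron sums both hidden outputs
    return a1, a2, out


def backprop_pass(x, target, w1, w2, w3, rate=0.5):
    a1, a2, out = forward(x, w1, w2, w3)
    err = target - out
    g3 = 1.0 * err                        # linear output: slope is 1
    g2 = a2 * (1 - a2) * g3 * w3          # one successor (the output neuron)
    g1 = a1 * (1 - a1) * 2 * g2 * w2      # two successors (both hidden neurons)
    # New weight = old weight + rate * gradient * prior output
    new_weights = (w1 + rate * g1 * x, w2 + rate * g2 * a1, w3 + rate * g3 * a2)
    return new_weights, out, err


weights = (0.5, 0.5, 0.5)
history = []
for _ in range(3):
    used = weights
    weights, out, err = backprop_pass(1.0, 1.0, *weights)
    history.append((used, out, err))

for (w1, w2, w3), out, err in history:
    print(f"W1={w1:.4f} W2={w2:.4f} W3={w3:.4f} output={out:.4f} error={err:.4f}")
```

The printed rows should match the summary table to about three decimal places; small differences in the last digit come from the slides rounding intermediate values.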
Training Algorithm
The process of feedforward and backpropagation continues until the required mean squared error has been reached.
Typical MSE: 1e-5
Other, more complicated backpropagation training algorithms are also available.
Why Gradient?
[Figure: prior outputs O1 and O2 feed a neuron through weights W1 and W2; the neuron produces the output O3.]
N = (O1 × W1) + (O2 × W2)
O3 = 1/[1+exp(-N)]
Error = Actual Output - O3
O = Output of the neuron
W = Weight
N = Net input to the neuron
To reduce error, the change in weights depends on:
o Learning rate
o Rate of change of error w.r.t. rate of change of weight
  Gradient: rate of change of error w.r.t. rate of change of N
  Prior output (O1 and O2)
Gradient in Detail
Gradient: rate of change of error w.r.t. rate of change in net input to the neuron
o For output neurons: slope of the transfer function × error
o For hidden neurons (a bit complicated!): error fed back in terms of gradient of successive neurons
  Slope of the transfer function × [Σ (gradient of next neuron × weight connecting the neuron to the next neuron)]
  Why summation? Share the responsibility!!
An Example
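[Figure: inputs 1 and 0.4 give neuron outputs 0.731 and 0.598; both feed the output neuron through weights of 0.5, giving a net input of 0.6645 and an output of 0.66.]
Target 1: Error = 1 - 0.66 = 0.34; G1 = 0.66(1-0.66)(0.34) = 0.0763 (increase the weights, by less)
Target 0: Error = 0 - 0.66 = -0.66; G1 = 0.66(1-0.66)(-0.66) = -0.148 (reduce the weights, by more)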
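The two gradients in this example can be recomputed as follows. The sketch assumes, from the numbers in the figure, that the raw inputs 1 and 0.4 each pass through a sigmoid neuron (giving 0.731 and 0.598) before reaching a sigmoid output neuron through weights of 0.5.

```python
import math


def sigmoid(n):
    return 1.0 / (1.0 + math.exp(-n))


def output_gradient(out, target):
    # Slope of the sigmoid at this output, times the error.
    return out * (1.0 - out) * (target - out)


o1 = sigmoid(1.0)            # about 0.731
o2 = sigmoid(0.4)            # about 0.598
net = 0.5 * o1 + 0.5 * o2    # about 0.6645
out = sigmoid(net)           # about 0.66

g_target_1 = output_gradient(out, 1.0)  # positive: weights go up
g_target_0 = output_gradient(out, 0.0)  # negative and larger in size: weights go down more

print(round(net, 4), round(out, 2))
print(g_target_1, g_target_0)
```

The slide values (0.0763 and -0.148) use the output rounded to 0.66, so they differ from the unrounded results in the fourth decimal place.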
Improving performance
Changing the number of layers and number of neurons in each layer.
Variation in transfer functions.
Changing the learning rate.
Training for longer times.
Type of pre-processing and post-processing.
