
Chapter 1 Artificial Neural Networks

An artificial neural network can be viewed as follows (from [AM90]): A neural network is a massively parallel distributed processor made up of simple processing units, which has a natural propensity for storing experiential knowledge and making it available for use. It resembles the brain in two aspects: knowledge is acquired by the network from its environment through a learning process, and interneuron connection strengths, known as synaptic weights, are used to store the acquired knowledge.

Figure 1.1: Neuron Model


Figure 1.2: Single Layer feedforward neural network

Neural networks are built from distributed processing units (neurons, see Figure 1.1) that can be linear or nonlinear. The elementary units are connected with different strengths (weights) that are used to adapt the neural network to different situations. Neural networks are used in different applications and with different aims because of their ability to represent and model different behaviours. This ability is due to:

Input-output mapping. Neural networks can be trained to learn the mapping between a system input and output by processing the signals acquired from the system.

Nonlinearity. Because of the different activation functions that can be used in a neural network, the network can be highly nonlinear, reproducing in this way the behaviour of highly nonlinear systems.

Adaptability. Using the same neural network structure it is possible to retrain the network (adjusting the weights), adapting its behaviour to a new reality.

Fault tolerance. Neural networks are distributed, independent processing units that together represent a global system. When used in practice, neural networks exhibit a graceful performance degradation: when a processing unit breaks, the system continues to work with the remaining units.

The simplest processing unit of a network can be represented by the following equation:

y_k = \varphi\left( \sum_{i=1}^{n} \omega_{ki}\, u_i + \omega_{k0} \right)    (1.1)

where \varphi represents the activation function, \omega_{ki} represents the weight from the i-th input to the k-th neuron, \omega_{k0} is the bias term, and u_i is the i-th of the n inputs (stimuli). The activation function can be a linear or a nonlinear function. Common activation functions are: threshold functions; piecewise-linear functions; sigmoid functions (a short numerical sketch of such a processing unit follows the classification below).

When several processing units are joined together, a neural network structure is formed. According to their structure, neural networks can be classified as follows:

Single-Layer Feedforward Networks. This is the simplest network structure available. As implied by the name, the feedforward network is strictly forward: there is no feedback from the output neurons to the input of the network (see Figure 1.2).

Multilayer Feedforward Networks. One or more hidden layers are present in the network. The hidden layers provide additional degrees of freedom in the correspondence between the input and output layers, and are particularly important when there is more than one output neuron. With hidden layers it is easier to find a more complex mapping between the input and the output, but the network becomes more complex and more difficult to compute (see Figure 1.3).

Figure 1.3: One hidden layer feedforward neural network

Recurrent Networks. In recurrent networks at least one feedback loop exists, connecting output neurons with input or intermediate layers (see Figure 1.4).

Figure 1.4: Recurrent network with hidden neurons

More details on neural network structures can be found in [Hay99] and [RVCW03].
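As a small illustration of the processing unit of (1.1) and of the common activation functions listed above, the following sketch (in Python with NumPy; the weight values and function names are chosen here purely for illustration and do not come from the text) computes the output of a single neuron under each activation.

import numpy as np

def threshold(v):
    # Threshold (Heaviside) activation: 1 if v >= 0, else 0.
    return np.where(v >= 0.0, 1.0, 0.0)

def piecewise_linear(v):
    # Piecewise-linear activation: linear on [-0.5, 0.5], saturating at 0 and 1.
    return np.clip(v + 0.5, 0.0, 1.0)

def sigmoid(v):
    # Logistic sigmoid activation.
    return 1.0 / (1.0 + np.exp(-v))

def neuron_output(u, w, w0, activation):
    # Equation (1.1): y_k = phi(sum_i w_ki * u_i + w_k0)
    v = np.dot(w, u) + w0
    return activation(v)

u = np.array([0.2, -1.0, 0.5])      # inputs (stimuli)
w = np.array([0.4, 0.1, -0.7])      # synaptic weights w_ki
w0 = 0.05                           # bias term w_k0

for phi in (threshold, piecewise_linear, sigmoid):
    print(phi.__name__, neuron_output(u, w, w0, phi))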

Learning

Neural network learning is defined as follows [Hay99]: Learning is a process by which the free parameters of a neural network are adapted through a process of stimulation by the environment in which the network is embedded. The type of learning is determined by the manner in which the parameter changes take place.

Learning is a very abstract concept and, when used in the exact sciences, it has to be treated with caution. When a neural network is learning, it is trying to adapt its parameters to emulate a certain specific behaviour through a well-established algorithm. The learning algorithm determines the mapping between the neural network inputs and its outputs, thus determining the equivalence with the behaviour that the neural network is meant to reproduce. The learning process can be divided into two major categories:

Supervised Learning. In supervised learning the neural network weights are adjusted by comparing the neural network output with previously acquired data describing the behaviour of the system (the teacher). The supervised learning mechanism is usually seen as a form of feedback and as the minimisation of a certain error measure: the network learns by minimising the error between the teacher (training data) and the output of the network.

Unsupervised Learning. In this case the network learns from the environment using an independent quality measure, and the weights are adjusted under the direct influence of that criterion.

The learning process uses several mathematical tools and can be based on different techniques:

Error-Correction Learning. In this rule the error between the real behaviour of the system and the behaviour of the neural network is minimised by adjusting the neural network parameters.

Memory-Based Learning. With memory-based learning, past experiences are stored as classified input-output examples. The neural network learns from this classified memory and, when a new input is presented, the network analyses the memory in a neighbourhood of the test input.

Competitive Learning. In competitive learning only one output neuron can be active at a time. In this structure the output neurons compete with each other to become active. This type of learning is widely used in pattern recognition, since each neuron is responsible for a certain feature and is fired only in the presence of a similar behaviour or feature (a small sketch of this rule is given after this list).

Boltzmann Learning. In this structure the neurons constitute a recurrent network and have two possible states: a neuron can be either in an on state or in an off state. The network is characterised by an energy function that is based on the states of the neurons. The adjustment of the weights is done by studying the relations between the behaviour of the network when all the neurons run in a free mode and when the neurons are in a predetermined (clamped) mode imposed by the environment.
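As announced above, here is a minimal sketch of the competitive (winner-take-all) learning rule: only the neuron whose weight vector is closest to the input becomes active, and only its weights are moved towards the input pattern. The learning rate, the distance measure and the toy data are assumptions of this sketch.

import numpy as np

def competitive_step(W, x, eta=0.05):
    # Winner-take-all competitive learning: only the winning output neuron is
    # active, and only its weight vector is moved towards the input pattern.
    winner = np.argmin(np.linalg.norm(W - x, axis=1))   # neuron closest to x wins
    W[winner] += eta * (x - W[winner])
    return winner

# Toy usage: 3 competing neurons clustering 2-D patterns (illustrative data only).
rng = np.random.default_rng(0)
W = rng.normal(size=(3, 2))                              # one weight vector per output neuron
shift = np.array([2.0, -1.0]) * rng.integers(0, 2, size=(200, 1))
patterns = rng.normal(size=(200, 2)) + shift
for x in patterns:
    competitive_step(W, x)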

More details on neural network learning can be found in [Hay99], [RVCW03], [Har75] and [DHS01]. In this work only error-correction learning was used, so some important techniques commonly used to train networks based on this principle are presented next.

Backpropagation

This is the most common algorithm for training multilayer networks. In simple terms, the backpropagation algorithm has two steps. The first step is a feedforward calculation, neuron by neuron, with fixed weights. The second step is a backward pass that calculates the local gradient at each neuron. Using this methodology, the adjustment of the weights is given by:

\Delta\omega_{kj} = \eta\, \delta_k\, y_j    (1.2)

where \eta represents the learning parameter, \delta_k represents the local gradient and y_j represents the neuron output calculated during the feedforward pass. This process is repeated until a certain stopping criterion is satisfied [Hay99]. Because the backpropagation algorithm is based on the gradient, it is like the steepest-descent method [ES96]: it uses a linear approximation of the cost function in the local neighbourhood of the operating point of the weights. This linear approximation may not be sufficient, and more information must be given to the algorithm to improve the results. Using a quadratic approximation instead of a linear one, the weight update can be obtained by:

\Delta\omega = H^{-1} g    (1.3)

where H represents the Hessian matrix and g is the gradient. Now, to adjust the weights, one has two sources of information: the first and the second derivatives. Of course, this kind of optimisation brings some problems to the process: the calculation of the Hessian matrix and of its inverse is sometimes impractical, or not even possible.
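Before moving on to second-order methods, the following minimal sketch illustrates the two passes of backpropagation and the weight adjustment of (1.2) on a one-hidden-layer network with a sigmoid hidden layer and a linear output layer. The architecture, the toy target function and all names are assumptions of this sketch, not the networks used in this work.

import numpy as np

rng = np.random.default_rng(0)

def sigmoid(v):
    return 1.0 / (1.0 + np.exp(-v))

# One-hidden-layer network: y = W2 @ sigmoid(W1 @ x + b1) + b2
n_in, n_hidden, n_out = 2, 5, 1
W1 = rng.normal(scale=0.5, size=(n_hidden, n_in))
b1 = np.zeros(n_hidden)
W2 = rng.normal(scale=0.5, size=(n_out, n_hidden))
b2 = np.zeros(n_out)
eta = 0.1  # learning parameter eta of (1.2)

def train_step(x, d):
    global W1, b1, W2, b2
    # Forward pass, neuron by neuron, with the weights held fixed.
    v1 = W1 @ x + b1
    y1 = sigmoid(v1)                 # hidden-layer outputs
    y2 = W2 @ y1 + b2                # linear output layer
    e = d - y2                       # error with respect to the teacher
    # Backward pass: local gradients delta_k = e_k * phi'(v_k), propagated back.
    delta2 = e                                   # linear output neuron: phi'(v) = 1
    delta1 = y1 * (1.0 - y1) * (W2.T @ delta2)   # sigmoid derivative: y * (1 - y)
    # Weight adjustment of (1.2): delta_w_kj = eta * delta_k * y_j
    W2 += eta * np.outer(delta2, y1)
    b2 += eta * delta2
    W1 += eta * np.outer(delta1, x)
    b1 += eta * delta1
    return 0.5 * float(e @ e)

# Toy supervised training: teacher data sampled from d = sin(x1) + 0.5 * x2.
for _ in range(2000):
    x = rng.uniform(-1.0, 1.0, size=n_in)
    d = np.array([np.sin(x[0]) + 0.5 * x[1]])
    train_step(x, d)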

Quasi-Newton Methods [ES96], [Hay99]

The Quasi-Newton methods use a modification of the Newton method to avoid the explicit calculation of the inverse of the Hessian matrix (usually an approximation is built instead). Several algorithms use this idea to find the minimum of a function (the Davidon-Fletcher-Powell algorithm, the Broyden-Fletcher-Goldfarb-Shanno algorithm, the Gauss-Newton algorithm [ES96]).

Levenberg-Marquardt

Another algorithm used to train neural networks is the Levenberg-Marquardt (LM) algorithm. This algorithm is similar to a Quasi-Newton method because it uses an approximation of the Hessian to minimise the cost function. It is treated separately from the Quasi-Newton methods here because it is one of the most widely used algorithms for training neural networks and was the algorithm used to train the neural network models presented later. As in (1.3), the adjustment of the weights is obtained from the inverse of the Hessian and the gradient. In the LM algorithm the Hessian is approximated by:

H = J^T J    (1.4)

where J is the Jacobian matrix (containing the first derivatives of the network errors with respect to the weights). With this in mind, the adjustment of the weights is calculated using:

\Delta\omega = [J^T J + \mu I]^{-1} J^T e    (1.5)

The scalar \mu steers the LM algorithm towards a modified Newton method when \mu \to 0 and towards a gradient-descent method when \mu \to \infty. Because Newton methods are faster and more accurate near a minimum, the LM algorithm decreases \mu whenever a step reduces the cost function. When a step increases the cost function, \mu is increased, moving the algorithm towards gradient descent.
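As an illustration of the update rule (1.5) and of the adaptation of \mu described above, here is a minimal sketch of a Levenberg-Marquardt step and of the \mu-scheduling loop. The function names, the factor of 10 used to adapt \mu, and the toy least-squares usage are assumptions of this sketch, not the exact routine used in this work.

import numpy as np

def lm_step(w, J, e, mu):
    # One Levenberg-Marquardt update, equation (1.5):
    #   delta_w = [J^T J + mu I]^(-1) J^T e
    A = J.T @ J + mu * np.eye(w.size)        # damped approximation of the Hessian, cf. (1.4)
    return w + np.linalg.solve(A, J.T @ e)

def lm_train(w, cost_fn, jac_fn, err_fn, mu=1e-2, iters=50):
    # mu scheduling as described above: decrease mu (towards Newton) after a
    # successful step, increase it (towards gradient descent) otherwise.
    for _ in range(iters):
        w_new = lm_step(w, jac_fn(w), err_fn(w), mu)
        if cost_fn(w_new) < cost_fn(w):
            w, mu = w_new, mu * 0.1
        else:
            mu = mu * 10.0
    return w

# Toy usage: fit y = w0 + w1 * x by LM (J is taken here as the Jacobian of the
# prediction, so that the step of (1.5) reduces the squared error e = d - X w).
X = np.column_stack([np.ones(20), np.linspace(0.0, 1.0, 20)])
d = 2.0 + 3.0 * X[:, 1]
w_fit = lm_train(np.zeros(2),
                 cost_fn=lambda w: 0.5 * np.sum((d - X @ w) ** 2),
                 jac_fn=lambda w: X,
                 err_fn=lambda w: d - X @ w)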

1.0.1 Identification using Neural Networks

Several neural network structures can be used to identify and simulate nonlinear systems.

Neural Network ARX Structure


This structure is based on a multilayer feedforward network. In this case the network has two layers, one hidden layer and an output layer (see Figure 1.5). The number of neurons in the output layer is equal to the number of outputs of the system to be identified. The number of neurons in the hidden layer is a design variable of the model and can change from system to system (there is no rule or formula for finding the ideal number of hidden neurons; a practical approach is to start with a large number of neurons and, if the result is good, prune the network afterwards).

Figure 1.5: Neural Network ARX Structure (NNARX)

The predicted output is:
\hat{y}_k = \Phi\left[ \sum_{i=1}^{n_h} W_{ik}\, \varphi\left( \sum_{j=n_y+1}^{n_u+n_y} w_{ji}\, u(k-j) + \sum_{j=1}^{n_y} w_{ji}\, y(k-j) \right) \right]    (1.6)

where \hat{y}_k represents the k-th output of the network, n_h represents the number of neurons in the hidden layer, n_u and n_y represent the number of input and output regressors, w and W represent the weights from the inputs to the hidden layer and from the hidden layer to the outputs, respectively, and \varphi and \Phi denote the hidden- and output-layer activation functions. This model structure is trained using a supervised training method: the teacher is a data set acquired from the system to be identified, and the parameters to be identified are the weights of the network. In general, this structure has the following nonlinear output equation:

\hat{y}(k|k-1, \theta) = g(\phi(k), \theta)    (1.7)

where \phi(k) represents the regression vector, \theta is the matrix with the parameters to be identified, and g is the function represented by (1.6).
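To illustrate (1.6)-(1.7), the following minimal sketch computes a one-step-ahead NNARX prediction from past measured inputs and outputs. The tanh hidden layer, the linear output layer and all names and values are assumptions chosen for this sketch, not the exact structure used in this work.

import numpy as np

def nnarx_predict(phi, w, w0, W, W0):
    # One-step-ahead NNARX prediction, a sketch of (1.6)-(1.7):
    #   y_hat(k|k-1) = g(phi(k), theta),
    #   phi(k) = [y(k-1) ... y(k-n_y), u(k-1) ... u(k-n_u)]  (past measured data).
    # A tanh hidden layer and a linear output layer are assumed here.
    return W @ np.tanh(w @ phi + w0) + W0

# Example with n_y = 2 past outputs, n_u = 2 past inputs and n_h = 6 hidden neurons.
rng = np.random.default_rng(0)
n_y, n_u, n_h, n_out = 2, 2, 6, 1
w, w0 = rng.normal(size=(n_h, n_y + n_u)), np.zeros(n_h)   # input-to-hidden weights
W, W0 = rng.normal(size=(n_out, n_h)), np.zeros(n_out)     # hidden-to-output weights

y_past = np.array([0.8, 0.6])                 # measured y(k-1), y(k-2)
u_past = np.array([1.0, 0.9])                 # applied  u(k-1), u(k-2)
phi = np.concatenate([y_past, u_past])        # regression vector phi(k)
y_hat = nnarx_predict(phi, w, w0, W, W0)      # predicted output y_hat(k|k-1)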

Neural Network OE Structure

This structure is a recurrent neural network, because the predicted outputs are fed back as inputs to the structure (see Figure 1.6). Just like in the linear output-error case, the NNOE structure uses the predicted outputs to form the regression vector. The network has two layers, one hidden layer and one output layer; the two layers can have different activation functions. For this structure the predicted output is:
\hat{y}_k = \Phi\left[ \sum_{i=1}^{n_h} W_{ik}\, \varphi\left( \sum_{j=n_y+1}^{n_u+n_y} w_{ji}\, u(k-j) + \sum_{j=1}^{n_y} w_{ji}\, \hat{y}(k-j) \right) \right]    (1.8)

Thus,

\hat{y}(k, \theta) = g(\phi_{ou}(k), \theta)    (1.9)

where \phi_{ou} is the regression vector, which is given by:

\phi_{ou}(k) = [\hat{y}(k-1|\theta)\; \ldots\; \hat{y}(k-n_y|\theta)\; u(k-1)\; \ldots\; u(k-n_u)]^T    (1.10)

The training of this network structure can be accomplished using the same tools used for the NNARX structure.
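A minimal sketch of how the NNOE predictor of (1.8)-(1.10) can be simulated is given below: the network's own past predictions are fed back to form the regression vector. The tanh hidden layer, the linear output layer, the single output and all variable names are assumptions of this sketch.

import numpy as np

def nnoe_simulate(u, n_y, w, w0, W, W0):
    # Sketch of the NNOE predictor of (1.8)-(1.10): the regression vector is built
    # from the network's own past predictions y_hat(k-1), ..., y_hat(k-n_y) and the
    # past inputs u(k-1), ..., u(k-n_u), so the model runs as a recurrent simulator.
    n_u = w.shape[1] - n_y
    y_hat = [0.0] * n_y                       # initial past predictions (assumed zero)
    out = []
    for k in range(n_u, len(u)):
        phi = np.concatenate([y_hat[-n_y:][::-1], u[k - n_u:k][::-1]])
        yk = (W @ np.tanh(w @ phi + w0) + W0)[0]
        y_hat.append(yk)
        out.append(yk)
    return np.array(out)

# Example run with random weights and a hypothetical input sequence.
rng = np.random.default_rng(1)
n_y, n_u, n_h = 2, 2, 6
w, w0 = rng.normal(size=(n_h, n_y + n_u)), np.zeros(n_h)
W, W0 = rng.normal(size=(1, n_h)), np.zeros(1)
u = np.sin(0.1 * np.arange(100))
y_sim = nnoe_simulate(u, n_y, w, w0, W, W0)   # free-run (output-error) simulation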

Modelling MIMO Structures

Just like for the linear models presented, the MIMO case is an extension of the SISO case. The number of output neurons is equal to the number of outputs of the system, which makes the construction of MIMO models straightforward.
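As a brief illustration of this point, the sketch below (same assumed tanh/linear layer structure as in the NNARX sketch earlier, with hypothetical dimensions) shows that moving from one output to m outputs only changes the number of rows in the output-layer weight matrix; the composition of the regression vector is likewise an assumption here.

import numpy as np

# MIMO extension of the NNARX sketch above: the only structural change is that the
# output layer has one neuron per system output (m rows in W).
rng = np.random.default_rng(2)
m, n_phi, n_h = 3, 6, 8                       # 3 outputs, 6 regressors, 8 hidden neurons
w, w0 = rng.normal(size=(n_h, n_phi)), np.zeros(n_h)
W, W0 = rng.normal(size=(m, n_h)), np.zeros(m)

phi = rng.normal(size=n_phi)                  # regression vector (placeholder values)
y_hat = W @ np.tanh(w @ phi + w0) + W0        # one entry per output, shape (m,)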



Figure 1.6: Neural Network Output Error Structure (NNOE)

Bibliography

[AM90] I. Aleksander and H. Morton. An Introduction to Neural Computing. London: Chapman and Hall, 1990.

[DHS01] R. O. Duda, P. E. Hart, and D. G. Stork. Pattern Classification. John Wiley & Sons, 2001.

[ES96] E. K. P. Chong and S. H. Zak. An Introduction to Optimization. John Wiley & Sons, 1996.

[Har75] J. A. Hartigan. Clustering Algorithms. John Wiley & Sons, 1975.

[Hay99] S. Haykin. Neural Networks: A Comprehensive Foundation. Prentice Hall International, 1999.

[RVCW03] F. R. Garces, V. M. Becerra, C. Kambhampati, and K. Warwick. Strategies for Feedback Linearisation. Springer, 2003.

