Figure 1: A biological neuron

Abstract

This report demonstrates the development and usage of several single-layer feed-forward neural networks, often called perceptrons, to predict the secondary structure of an unseen set of proteins after training on a known set of proteins. Also explained in this report is the usage of the Neuroph open source Java package to train a multi-layer feed-forward network, showing the progression from simple two-dimensional, linearly separable problems solvable with a simple perceptron to a problem with a far larger number of dimensions which can be approximated by a complex multi-layer perceptron. The overall goal of this report is to confirm the results claimed in the work of Qian and Sejnowski [1].
Neurons

The human brain contains a vast number of neurons, each interconnected with many others. The dendrites receive voltage from other neurons as input through synapses and, if the total voltage applied is enough to activate the neuron, it will fire, sending its own voltage to possibly many output synapses; see Figure 1, reproduced from [2], for an example of a neuron.
This progression of neurons, which may or may not activate and transmit voltage, is how all human thought is carried out. The purpose of a neural network node is to model one of these neurons with the correct number of input and output nodes to suit the task, and then adjust the amount of voltage each input provides by multiplying the input value by a multiplier called a weight. To teach the neuron simple calculations such as the AND and OR problems, these weights are modified by supervised learning algorithms such as those developed by Rosenblatt (1959) and Widrow & Hoff (1960).

Learning

To teach a perceptron a task, the weight of each input is initialised to a random value; sample inputs are then provided to the neuron and, if the dot product of the inputs and the weights is greater than 0, the neuron is said to have fired. This allows simple classification problems where only two groups are available, A and NOT A, such that if the neuron fires the sample input provided was from A, and if it does not it was from the NOT A class; initially these classifications are likely to be wrong as the weights were randomly initialised. In two dimensions the perceptron acts like a line drawn on a 2D graph where the points above the line are A and those below are NOT A. During training, when the class chosen is incorrect, the distance from the line (hereafter known as the local error) is calculated by subtracting the value given by the perceptron (the voltage it was passed to activate) from the expected value; see Equation (1). The line's gradient is then adjusted to more closely match what was expected. This is repeated for the other items in the set of training data. In some cases the training set is too small for the perceptron to converge on a solution, so it becomes necessary to provide the training set several times to allow the solution to emerge. Each pass through the entire training set is called an epoch. Later in this paper the effects of epochs will be demonstrated.

$E = T - A$    (1)
Where E is the local error, T is the training value provided as the expected output, and A is the actual value of the output.

Learning Rate and Bias

Two important concepts in perceptron training are the learning rate and the bias. If the line were moved just far enough that an incorrectly classified point became correct, this may be too far; moreover, the point may be an outlier in the class and move the line incorrectly. To solve this problem the learning rate is used: any adjustment to the line is multiplied by a fractional value between 0 and 1. This reduces the effect each adjustment has, allowing the line to gradually shift into the correct alignment. Because of this, after one pass it is uncertain whether the position has improved enough, so several training epochs are performed to allow each point several opportunities to shift the line into place. Eventually the perceptron will converge on a solution if one is available, according to the perceptron convergence theorem, which it will be if the classes are linearly separable; if not, the perceptron will converge on a solution area and oscillate continuously. There is another issue faced when training perceptrons, which is that the separator line must pass through (0,0). This clearly causes issues when attempting to separate classes, so the bias is used. The bias is a preference for the perceptron to act in a certain way, which in terms of the separation line shifts the line along the x axis such that it need not pass through (0,0). This bias is also updated as the perceptron learns and is multiplied by the learning rate.

Emergent Behaviour

As the learning process continues the result will become closer and closer to correct. It will, however, always remain an estimate: even if the data is separable, there is no guarantee that within the region of correct answers given by the training set there does not exist a sub-region which is incorrect but for which no data is available to test it. There is also the issue of over-training. If a perceptron is given too much training data it will begin to model so-called noise, which is properties of the data set that are not desired; theoretically it may even converge such that it would simply identify the training set. This is not what is desired: what is required is the emergence not of specifics but of generalisation, such that when applied to unseen data the perceptron is still a good estimator.

Implementation of a 2D single layer perceptron

For this task the language chosen was Java, due to the author's familiarity with the language and the existence of the useful ArrayList structure. A basic framework was implemented by creating a Perceptron class which can be passed two arrays: an array of arrays of doubles (a list of input vectors, which are themselves simply lists of double values) and a single array of doubles (a list of the expected outcomes of the perceptron). Doubles were chosen as they are more flexible, allowing both integer and fractional values and making the perceptron more general. An integer is provided on creating a new Perceptron which defines the number of inputs, and blank ArrayLists of doubles are created. Using the add methods provided, training data is added to the perceptron by giving an array of inputs and the expected output.
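A minimal sketch of this class structure follows. Only the class name, the constructor argument and getLastThreshold() are named in the text; the field and method names (such as addSample) are illustrative assumptions, not the report's actual code:

```java
import java.util.ArrayList;
import java.util.Random;

// Sketch of the single-layer perceptron described above.
public class Perceptron {
    private final double[] weights;     // one weight per input
    private double bias = 0.0;          // shifts the separator off (0,0)
    private double lastThreshold;       // raw activation of the last present() call
    private final ArrayList<double[]> inputs = new ArrayList<>();  // training vectors
    private final ArrayList<Double> expected = new ArrayList<>();  // targets (1 or -1)

    public Perceptron(int numInputs) {
        weights = new double[numInputs];
        Random rnd = new Random();
        for (int i = 0; i < numInputs; i++) {
            weights[i] = rnd.nextDouble() - 0.5;  // random initialisation
        }
    }

    // Add one training example: an input vector and its expected output.
    public void addSample(double[] input, double target) {
        inputs.add(input);
        expected.add(target);
    }

    // Access to the raw result value of the last presentation.
    public double getLastThreshold() {
        return lastThreshold;
    }

    // present(double[]) and train() are sketched later in the report.
}
```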
Once done, the train method is called to train the perceptron; after this is complete, the present(double[]) method passes a set of inputs to the perceptron and the result is returned as 1 or -1 to define whether it is or is not in the class. A getLastThreshold() method was also provided to allow access to the actual result value. The present method uses Equation (2) to calculate whether the neuron will fire and how far over or under the threshold the point was. The same equation is used in training, which is explained later in the report.
$b + \sum_{i=0}^{n} w_i x_i$    (2)
This is the sum of each input (x) multiplied by its weight (w), plus the bias (b).
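Continuing the Perceptron sketch above, the present method might implement Equation (2) as follows; the method name and getLastThreshold are from the text, while the body is an assumption:

```java
// Belongs in the Perceptron class sketched earlier.
// Computes b + sum(w_i * x_i) (Equation (2)) and fires if it exceeds 0.
public double present(double[] x) {
    double sum = bias;
    for (int i = 0; i < weights.length; i++) {
        sum += weights[i] * x[i];
    }
    lastThreshold = sum;            // how far over or under the threshold
    return sum > 0 ? 1.0 : -1.0;    // 1 = in the class, -1 = not in the class
}
```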
Training the 2D Perceptron

As the data set used was small, the learning rate was chosen quite high at 0.1 and the weights were initialised using an instance of the Random class, which generates random numbers. Each item in the learning set was then provided to the perceptron using the present method. The local error was calculated using Equation (1) and the bias was updated using Equation (3); however, in the initial tests the bias was set to 0 and the data was chosen such that the separator line would pass through (0,0). Equation (4) was then applied to each input to adjust its weight. As all weights are adjusted with this formula by the same amount, complex behaviours can be impossible to model because the effect of individual inputs is not separable; for example, a behaviour such as "if inputs $x_i$ to $x_{i+j}$ (where j is 1 or greater) sum to greater than 1, then assume all those inputs are 0 and continue". The most famous example of this is XOR: a perceptron cannot be trained to understand it, as the behaviour is contradictory in the same way as the previous example, where "if the total is too high, ignore all inputs" would be needed. This problem is solved by the multi-layer perceptrons discussed shortly. The training process is then repeated for as many epochs as necessary. For this implementation, while (globalError != 0 && epoch < 10000) was chosen as the condition for continuing training: on 2D problems with little data epochs are fast, and as the classes were separable, globalError (the sum of all errors in the epoch, as seen in Equation (5)) would reach 0, implying a solution had been converged upon long before the epoch limit was reached.

$b = b + E$    (3)
Bias has the local error added to it each time; because the local error can be positive or negative, the bias does not grow too large.

$w_i = w_i + r \cdot E \cdot x_i$    (4)

$G = G + |E|$    (5)

Global error is the total error in an epoch: the new global error after each call to present is the old global error plus the magnitude of the local error. A sketch of the complete training loop, combining Equations (1), (3), (4) and (5), is given below. On testing this implementation on a sample set of 2D data, the solution was converged upon in under 1 second in fewer than 5 epochs (averaging 4 epochs) using 30 data points such as those listed in the example matrix below:

0.47 0.88
0.91 0.69
0.46 0.80

The full data set is too large to be included, but an example 5-epoch convergence execution of this program produced the following output:

Epoch  Error
0      54.0
1      50.0
2      44.0
3      24.0
4       8.0
5       0.0
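The training loop might look as follows; this is a sketch belonging to the Perceptron class above, where the loop condition and the 0.1 learning rate are those quoted in the text and the rest is an illustrative assumption:

```java
// Belongs in the Perceptron class sketched earlier.
// One epoch presents every training sample; training stops when the
// global error G reaches 0 or the epoch limit is hit.
public void train() {
    double learningRate = 0.1;   // the rate quoted for the 2D data set
    int epoch = 0;
    double globalError;
    do {
        globalError = 0.0;
        for (int s = 0; s < inputs.size(); s++) {
            double[] x = inputs.get(s);
            double error = expected.get(s) - present(x);   // E = T - A   (1)
            bias += error;                                 // b = b + E   (3)
            for (int i = 0; i < weights.length; i++) {
                weights[i] += learningRate * error * x[i]; // Equation (4)
            }
            globalError += Math.abs(error);                // G = G + |E| (5)
        }
        epoch++;
    } while (globalError != 0 && epoch < 10000);
}
```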
Modifications to Process Proteins

In the following sections of this report average accuracy is mentioned often; this refers to Q3 as defined in Qian and Sejnowski's paper [1] and seen in Equation (6), where $P_\alpha$, $P_\beta$ and $P_{coil}$ are the numbers of correctly predicted residues in each class and N is the total number of residues.

$Q_3 = \frac{P_\alpha + P_\beta + P_{coil}}{N}$    (6)

To match the perceptron model, the amino acids making up proteins are encoded as a 20-long sparsely encoded array, meaning that of the 20 positions only one can ever be a 1. Little modification of the existing code was needed to allow processing of proteins. A Protein class was created to which a string of amino acid letters can be passed to define a protein. Within this class is code which acts as an iterator, allowing a window at a time to be returned in a sparse encoding rather than as individual acid encodings, where the window size is defined on creation of the protein.
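A minimal sketch of how such a sparse (one-hot) window encoding might look; the alphabet string, field names and method names are illustrative assumptions, not the report's actual code:

```java
// Sketch: sparse (one-hot) encoding of an amino-acid window.
public class Protein {
    // One letter per amino acid; the ordering here is an assumption.
    private static final String ACIDS = "ACDEFGHIKLMNPQRSTVWY";
    private final String sequence;
    private final int windowSize;
    private int position = 0;

    public Protein(String sequence, int windowSize) {
        this.sequence = sequence;
        this.windowSize = windowSize;
    }

    public boolean hasNextWindow() {
        return position + windowSize <= sequence.length();
    }

    // Returns the next window as a (windowSize * 20)-long sparse array:
    // each acid sets exactly one of its 20 positions to 1.
    public double[] nextWindow() {
        double[] encoded = new double[windowSize * 20];
        for (int i = 0; i < windowSize; i++) {
            int acid = ACIDS.indexOf(sequence.charAt(position + i));
            if (acid >= 0) {
                encoded[i * 20 + acid] = 1.0;
            }
        }
        position++;   // slide the window one residue along
        return encoded;
    }
}
```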
Figure 2: A graph to show the values of Q3 for different window sizes after 50 epochs

A method was then created to read the data given on the UCI Machine Learning Repository and process it into an ArrayList of proteins, which can be modified if necessary. However, the learning algorithm did need updating. After much testing and refinement an average accuracy of only 30 percent was typical. After much experimentation it became apparent that the issue was that the algorithm only trains if it incorrectly guesses the classification, resulting in a skew when input data clusters, which is why randomisation of the inputs helped. The solution was to train even on correct guesses: if the guess is totally correct the multiplied local error will be 0 so no change will happen, but when correctly guessed the line should still be adjusted closer to the correct position as defined by the local error. The final structure prediction was achieved by creating 3 perceptrons, one for each of the 3 classes, and selecting the structure which had the highest last activation value, as sketched below. This gave an accuracy of approximately 55% with a 5-long window and 61% with a 13-long window on unseen data after 50 epochs; for a wider range of examples see Figure 2. Due to the larger number of adjustments caused by the modification to the learning algorithm, more than 50 epochs became impractical. On seen data the accuracy averaged 68%. These measurements were gathered using a learning rate of 0.0001, chosen after testing several possible values, a subset of which are shown in Figure 3. See Figure 4 for the single-layer learning progression.
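A minimal sketch of the three-perceptron selection step described above; the method and class labels are illustrative assumptions:

```java
// Sketch: pick the class whose perceptron produced the highest raw activation.
public static char predict(Perceptron helix, Perceptron sheet, Perceptron coil,
                           double[] window) {
    helix.present(window);
    sheet.present(window);
    coil.present(window);
    double h = helix.getLastThreshold();
    double e = sheet.getLastThreshold();
    double c = coil.getLastThreshold();
    if (h >= e && h >= c) return 'h';   // alpha helix
    if (e >= c) return 'e';             // beta sheet
    return '_';                         // coil
}
```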
Multi-Layer Perceptron

[1] used a structure with 40 nodes in one hidden layer, so the structure used in this implementation was the same. To implement the multi-layer perceptron (hereafter referred to as the MLP), the Protein class was carried forward, as were the algorithms to populate an array of these. The actual training of the perceptron was done using Neuroph [4], an open source Java package for neural networks. Neuroph has a GUI, but for 13-long windows, and even for 5-long windows which give 100 inputs, manual entry is impractical; therefore the core was imported and the protein array was traversed to transfer the patterns to the multi-layer perceptron created by this package. To create an MLP using Neuroph all that is needed is this line of code: MultiLayerPerceptron myMlPerceptron = new MultiLayerPerceptron(TransferFunctionType.SIGMOID, WINDOW_SIZE * 20, 40, 3);. This defines an MLP using the sigmoid transfer function, with enough input nodes for the sparsely encoded window, a single hidden layer of 40 nodes and 3 output nodes. The sigmoid function was selected above others such as TANH for two reasons: the first is structured trial and error, where every possibility was tried, and the second is that Figure 2 in [1] shows a processing unit's output which clearly matches the sigmoid function discussed in [2]. This was indeed the best selection.
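A sketch of how the patterns might be transferred and the network trained, assuming the DataSet/DataSetRow API of Neuroph 2.x (older releases used TrainingSet); WINDOW_SIZE, the proteins list and the targetFor helper are illustrative assumptions standing in for the report's actual data-loading code:

```java
import java.util.List;
import org.neuroph.core.data.DataSet;
import org.neuroph.core.data.DataSetRow;
import org.neuroph.nnet.MultiLayerPerceptron;
import org.neuroph.util.TransferFunctionType;

public class MlpTraining {
    static final int WINDOW_SIZE = 13;   // assumption: the 13-long window

    // Sketch: build a Neuroph data set from the encoded windows and train.
    public static MultiLayerPerceptron train(List<Protein> proteins) {
        DataSet trainingSet = new DataSet(WINDOW_SIZE * 20, 3);
        for (Protein p : proteins) {
            while (p.hasNextWindow()) {
                trainingSet.addRow(new DataSetRow(p.nextWindow(), targetFor(p)));
            }
        }
        MultiLayerPerceptron myMlPerceptron = new MultiLayerPerceptron(
                TransferFunctionType.SIGMOID, WINDOW_SIZE * 20, 40, 3);
        myMlPerceptron.learn(trainingSet);   // blocks until training stops
        return myMlPerceptron;
    }

    // Hypothetical helper: the 3-long target vector (helix, sheet, coil)
    // for the known structure at the centre of the current window.
    private static double[] targetFor(Protein p) {
        return new double[] {1, 0, 0};   // placeholder value
    }
}
```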
Figure 3: A graph to show the increase in accuracy using different learning rates
Figure 4: A graph to show the learning progression of the 3 classes of the single-layer solution, followed by the same graph with the first 5 points removed to show more clearly that learning is continuing, but at a slower rate

To train the MLP it was passed all the data in a training set that the single-layer perceptron was trained on; however, as there is no full solution, the training never completes. It was left running for a week and failed to complete, so a different approach was taken for testing. The training can be done in a separate thread and can be stopped at any time, so the program was adjusted to run the training algorithm, wait a set length of time, then stop the learning process and test the perceptron; a sketch of this timed approach follows the sample output below. After 100000 ms the average accuracy is 65%, after 500000 ms approximately 90%, and after 1000000 ms over 96%. This is considerably better than even Qian and Sejnowski reported in their paper [1]. Unfortunately, as the inside of the black box of the MLP is hidden, why this result is so good is unclear, but by outputting the estimated structure alongside the true structure it becomes apparent that the majority of the proteins are over 95% correct, with some even 100% correct on unseen data, and a few outliers reducing the average; whether these outliers would eventually be absorbed, increasing the accuracy even further, may be tested in a future publication. A small sample output from the 1000000 ms run is given below:
Calc:___ee_____eee____ee__hh_h_hhhh_______e
Real:___ee_____eee____ee__hhhhhhhhhh______e
92.10526315789474%

Figure 5: A graph to show the learning progress of the MLP, measured every 10 seconds; it is seen to oscillate but still rise slowly after an initial fast improvement

A training curve was created by pausing the learning every 10 seconds to measure performance on the unseen data, seen in Figure 5; however, the constant pausing seems to make the learning slower.
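A sketch of the timed training and testing described above, assuming Neuroph's learnInNewThread and stopLearning methods; the accuracy and argMax helpers are illustrative assumptions, not the report's actual measurement code:

```java
import org.neuroph.core.data.DataSet;
import org.neuroph.core.data.DataSetRow;
import org.neuroph.nnet.MultiLayerPerceptron;

public class TimedTraining {
    // Sketch: train for a fixed wall-clock time, then test on unseen data.
    public static double trainForFixedTime(MultiLayerPerceptron mlp,
                                           DataSet trainingSet,
                                           DataSet unseenSet,
                                           long trainTimeMs) throws InterruptedException {
        mlp.learnInNewThread(trainingSet);   // training runs in its own thread
        Thread.sleep(trainTimeMs);           // e.g. 100000, 500000 or 1000000 ms
        mlp.stopLearning();                  // halt training wherever it has got to
        return accuracy(mlp, unseenSet);
    }

    // Fraction of windows whose strongest output matches the strongest target.
    static double accuracy(MultiLayerPerceptron mlp, DataSet set) {
        int correct = 0;
        for (DataSetRow row : set.getRows()) {
            mlp.setInput(row.getInput());
            mlp.calculate();
            if (argMax(mlp.getOutput()) == argMax(row.getDesiredOutput())) {
                correct++;
            }
        }
        return (double) correct / set.size();
    }

    static int argMax(double[] v) {
        int best = 0;
        for (int i = 1; i < v.length; i++) if (v[i] > v[best]) best = i;
        return best;
    }
}
```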
Conclusion

While the single-layer perceptron works well for linearly separable sets, for anything more complex a multi-layer perceptron is far superior, though the style of regression and the number of hidden nodes may need to be experimented with. When comparing the MLP to online alternatives such as JPred, it would seem according to the table below that the best achievable is a Q3 of only 76.4. During this study far higher accuracy results were obtained, and this can only be explained by the data this study used: the data was specifically selected to contain few ambiguities such as similar haemoglobin proteins, and the test set is a mere 50 elements. It is possible that, given a larger set of data, the results would shrink closer to the 76.4 figure quoted by JPred. A table of success rates (Q3) of other methods is given below [3]:

Method             126 Protein Set   396 Protein Set   406 Protein Set
PHD                73.5              71.9              73.3
DSC                71.1              68.4              70.6
PREDATOR           70.3              68.6              70.7
NNSSP              72.7              71.4              72.3
Mulpred            67.2              66.1              N/A
Zpred              66.7              64.8              62.0
Consensus (Jpred)  74.8              72.9              74.6
Jnet               N/A               N/A               76.4
Bibliography
[1] Qian and Sejnowski, Predicting the Secondary Structure of Globular Proteins Using Neural Network Models. J. Mol. Biol. (1988) 202, 865-884, Academic Press Limited, 25 September 1987. Available at: http://users.ecs.soton.ac.uk/mn/ArtificialIntelligence/QianSejnowskiPaper.pdf

[2] Sacha Barber, AI: An introduction into Neural Networks. CodeProject.com, 16 May 2007. Available at: http://www.codeproject.com/KB/recipes/NeuralNetwork_1.aspx

[3] Various, Documentation - Accuracy. www.compbio.dundee.ac.uk. Available at: http://www.compbio.dundee.ac.uk/www-jpred/legacy/accuracy.html

[4] Various, Neuroph. Available at: http://neuroph.sourceforge.net/