http://rolisz.ro/2013/04/18/neural-networks-in-python/
Instead of using a hard threshold to decide whether to send a signal or not (a hard threshold has the disadvantage of not being differentiable), multilayer perceptrons use sigmoid functions such as the hyperbolic tangent or the logistic function f(x) = 1/(1 + e^(-x)).
The most common algorithm used to train MLPs is the backpropagation algorithm. It has two phases:
1. A forward pass, in which the training data is run through the network to obtain its output
2. A backward pass, in which, starting from the output, the errors for each neuron are calculated and then used to adjust the weights of the network.
That's the rough summary of the algorithm, so let's start implementing it.
First, we define our activation functions and their derivatives, using numpy.
import numpy as np

def tanh(x):
    return np.tanh(x)

def tanh_deriv(x):
    # x here is the already-activated value tanh(z), so the derivative is 1 - x^2
    return 1.0 - x**2

def logistic(x):
    return 1/(1 + np.exp(-x))

def logistic_derivative(x):
    # like tanh_deriv, this receives the already-activated value,
    # so the derivative is x * (1 - x)
    return x*(1 - x)
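As a quick sanity check (an illustrative finite-difference test, not part of the network itself), we can verify that tanh_deriv applied to the activated value matches the true derivative of tanh:

```python
import numpy as np

def tanh(x):
    return np.tanh(x)

def tanh_deriv(x):
    # expects the already-activated value a = tanh(z)
    return 1.0 - x**2

# compare against a central finite difference at an arbitrary point
x = 0.5
h = 1e-6
numeric = (tanh(x + h) - tanh(x - h)) / (2 * h)
analytic = tanh_deriv(tanh(x))
print(abs(numeric - analytic) < 1e-8)  # True
```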
In the constructor of the class we will need to set the number of neurons in each layer, initialize their weights
randomly between -0.25 and 0.25 and set the activation function to be used. Each layer, except the last one, will
also have a bias unit which corresponds to the threshold value for the activation.
class NeuralNetwork:
    def __init__(self, layers, activation='tanh'):
        """
        :param layers: A list containing the number of units in each layer.
        Should be at least two values
        :param activation: The activation function to be used. Can be
        "logistic" or "tanh"
        """
        if activation == 'logistic':
            self.activation = logistic
            self.activation_deriv = logistic_derivative
        elif activation == 'tanh':
            self.activation = tanh
            self.activation_deriv = tanh_deriv

        self.weights = []
        # hidden layers: +1 on both sides for the bias units
        for i in range(1, len(layers) - 1):
            self.weights.append((2*np.random.random((layers[i - 1] + 1, layers[i] + 1)) - 1)*0.25)
        # output layer: no bias unit is added to the output side
        self.weights.append((2*np.random.random((layers[i] + 1, layers[i + 1])) - 1)*0.25)
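As an illustrative aside (not part of the class), running just the weight-initialization loop for a hypothetical [2, 2, 1] network shows how the bias units shape the matrices: one (3, 3) hidden-layer matrix and one (3, 1) output matrix, all values within the ±0.25 range:

```python
import numpy as np

layers = [2, 2, 1]  # 2 inputs, one hidden layer of 2 units, 1 output
weights = []
for i in range(1, len(layers) - 1):
    # +1 on both sides for the bias units
    weights.append((2*np.random.random((layers[i - 1] + 1, layers[i] + 1)) - 1)*0.25)
# output layer, reusing the last loop index as in the constructor
weights.append((2*np.random.random((layers[i] + 1, layers[i + 1])) - 1)*0.25)

print([w.shape for w in weights])  # [(3, 3), (3, 1)]
```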
Now we get to the fun part: the training. Given a set of input vectors X and output values y, we adjust the weights appropriately. The algorithm we will use is called stochastic gradient descent: it randomly chooses a sample from the training data and runs backpropagation for that sample, and this is repeated a number of times (called epochs). We also have to set the learning rate of the algorithm, which determines how big a change occurs in the weights each time (proportionally to the errors).
    def fit(self, X, y, learning_rate=0.2, epochs=10000):
        X = np.atleast_2d(X)
        temp = np.ones([X.shape[0], X.shape[1]+1])
        temp[:, 0:-1] = X  # adding the bias unit to the input layer
        X = temp
        y = np.array(y)

        for k in range(epochs):
            i = np.random.randint(X.shape[0])
            a = [X[i]]
            # forward pass: store the activations of every layer
            for l in range(len(self.weights)):
                a.append(self.activation(np.dot(a[l], self.weights[l])))
            error = y[i] - a[-1]
            deltas = [error * self.activation_deriv(a[-1])]
            # backward pass: we need to begin at the second to last layer
            for l in range(len(a) - 2, 0, -1):
                deltas.append(deltas[-1].dot(self.weights[l].T)*self.activation_deriv(a[l]))
            deltas.reverse()
            for i in range(len(self.weights)):
                layer = np.atleast_2d(a[i])
                delta = np.atleast_2d(deltas[i])
                self.weights[i] += learning_rate * layer.T.dot(delta)
And now the useful part: prediction. This is pretty much the same as the forward pass part of backpropagation,
except we don't need to keep all the values of the activations for each neuron, so we keep only the last one.
    def predict(self, x):
        x = np.array(x)
        temp = np.ones(x.shape[0]+1)
        temp[0:-1] = x
        a = temp
        for l in range(0, len(self.weights)):
            a = self.activation(np.dot(a, self.weights[l]))
        return a
And that's it: 50 lines of code for the neural network itself, plus 10 more for the activation functions. So let's test it.
Let's start with something simple: the XOR function. The XOR function is not linearly separable (if we represent it in the plane, there is no line that can separate the points with label 1 from the points with label 0), and this means we need at least one hidden layer. We will use one hidden layer with 2 units.
nn = NeuralNetwork([2, 2, 1], 'tanh')
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
y = np.array([0, 1, 1, 0])
nn.fit(X, y)
for i in [[0, 0], [0, 1], [1, 0], [1, 1]]:
    print(i, nn.predict(i))
And the output is:
([0, 0], array([ 4.01282568e-05]))
([0, 1], array([ 0.98765949]))
([1, 0], array([ 0.98771753]))
([1, 1], array([ 0.00490502]))
Pretty good. If we instead used a step function for the activations in the output layer, we would get the exact results.
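To make that concrete, here is a small sketch (the outputs array is just the four values printed above) showing that thresholding the continuous outputs at 0.5 recovers the exact XOR labels:

```python
import numpy as np

# the four continuous outputs from the XOR run above
outputs = np.array([4.01282568e-05, 0.98765949, 0.98771753, 0.00490502])

# applying a step function at 0.5 recovers the exact labels
print(np.where(outputs > 0.5, 1, 0))  # [0 1 1 0]
```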
Now let's take a look at something slightly more complicated: the digits dataset that comes included with scikit-learn. This has 1797 8x8 pixel images of digits with their labels. Let's see what accuracy we can get on it. We will have to transform the labels from values (such as 1 or 5) to vectors of 10 elements, which are all 0 except for the position corresponding to the label, which will be 1.
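scikit-learn's LabelBinarizer does this transformation for us. As a quick illustration (fitting on all ten digit classes so that every vector has 10 elements):

```python
from sklearn.preprocessing import LabelBinarizer

lb = LabelBinarizer()
labels = lb.fit_transform([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
# each label becomes a 10-element vector with a single 1
print(labels[5])  # [0 0 0 0 0 1 0 0 0 0]
```

Note that the number of columns equals the number of distinct classes seen during fitting, which is why the digits dataset, containing all ten classes, yields 10-element vectors.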
import numpy as np
from sklearn.datasets import load_digits
from sklearn.metrics import confusion_matrix, classification_report
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import LabelBinarizer
from NeuralNetwork import NeuralNetwork

digits = load_digits()
X = digits.data
y = digits.target
X -= X.min()  # normalize the values to bring them into the range 0-1
X /= X.max()

nn = NeuralNetwork([64, 100, 10], 'tanh')
X_train, X_test, y_train, y_test = train_test_split(X, y)
labels_train = LabelBinarizer().fit_transform(y_train)
labels_test = LabelBinarizer().fit_transform(y_test)

nn.fit(X_train, labels_train, epochs=30000)
predictions = []
for i in range(X_test.shape[0]):
    o = nn.predict(X_test[i])
    predictions.append(np.argmax(o))
print(confusion_matrix(y_test, predictions))
print(classification_report(y_test, predictions))
As output we get a confusion matrix (yup, such a thing exists :))) and a nice report (both from the nice scikit-learn package):
[[50 0 0 0 0 0 0 0 0 0]
[ 0 39 1 0 0 0 0 1 2 6]
[ 0 0 52 0 0 0 0 0 0 0]
[ 0 0 2 34 0 1 0 0 5 0]
[ 0 1 0 0 35 0 1 1 0 0]
[ 0 0 0 0 0 50 1 0 0 1]
[ 0 0 0 0 0 0 37 0 4 0]
[ 0 0 0 0 0 0 0 47 0 1]
[ 0 0 0 0 0 1 1 0 38 1]
[ 0 0 1 0 0 1 0 1 1 33]]
             precision    recall  f1-score   support

          0       1.00      1.00      1.00        50
          1       0.97      0.80      0.88        49
          2       0.93      1.00      0.96        52
          3       1.00      0.81      0.89        42
          4       1.00      0.92      0.96        38
          5       0.94      0.96      0.95        52
          6       0.93      0.90      0.91        41
          7       0.94      0.98      0.96        48
          8       0.76      0.93      0.84        41
          9       0.79      0.89      0.84        37
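From the confusion matrix printed above we can also read off an overall accuracy: the correctly classified samples sit on the diagonal, so accuracy is the trace divided by the total count. A quick sketch, plugging in the matrix from this run:

```python
import numpy as np

# the confusion matrix from the run above
cm = np.array([
    [50,  0,  0,  0,  0,  0,  0,  0,  0,  0],
    [ 0, 39,  1,  0,  0,  0,  0,  1,  2,  6],
    [ 0,  0, 52,  0,  0,  0,  0,  0,  0,  0],
    [ 0,  0,  2, 34,  0,  1,  0,  0,  5,  0],
    [ 0,  1,  0,  0, 35,  0,  1,  1,  0,  0],
    [ 0,  0,  0,  0,  0, 50,  1,  0,  0,  1],
    [ 0,  0,  0,  0,  0,  0, 37,  0,  4,  0],
    [ 0,  0,  0,  0,  0,  0,  0, 47,  0,  1],
    [ 0,  0,  0,  0,  0,  1,  1,  0, 38,  1],
    [ 0,  0,  1,  0,  0,  1,  0,  1,  1, 33],
])

# correct predictions are on the diagonal
accuracy = np.trace(cm) / cm.sum()
print(round(accuracy, 3))  # 0.922
```

So this little network gets about 92% of the held-out test digits right.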