You may already have noticed that neural networks with many layers train much more slowly than small networks. This is not only because more operations are required to evaluate the network and to perform one training step: reaching the same accuracy also takes far more training steps than for simpler networks.
import tensorflow as tf
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt  # plotting library
plt.style.use('ggplot')  # set the plotting style
We can conclude that the propagation of the error depends directly on the magnitude of the derivative of the activation function. A derivative close to zero effectively "zeroes out" the training of all preceding layers reached through this neuron, no matter how large the error at the neuron's output may be.
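To make this concrete, here is a small sketch (an illustration added here, not one of the original notebook cells) that pushes a gradient back through a stack of ten sigmoid activations; since sigmoid'(u) <= 0.25 everywhere, the gradient shrinks by at least a factor of 4 per layer:

x0 = tf.constant([[1.0]])
with tf.GradientTape() as tape:
    tape.watch(x0)
    h = x0
    for _ in range(10):        # ten sigmoid "layers" (weights omitted for simplicity)
        h = tf.nn.sigmoid(h)
grad = tape.gradient(h, x0)
print(grad.numpy())            # roughly 3e-7: the gradient has all but vanished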
the sigmoid function and softmax are convenient for the output layer if we are solving a classification problem;
Let us analyze how the derivatives of the sigmoid and tanh activation functions behave depending on the input u.
# setup restored; the x range is an illustration choice
x = tf.linspace(-10.0, 10.0, 200)
with tf.GradientTape(persistent=True) as tape:  # persistent tape: we take two gradients from it
    tape.watch(x)
    y1 = tf.nn.sigmoid(x)
    y2 = tf.nn.tanh(x)
dy1_dx = tape.gradient(y1, x)
dy2_dx = tape.gradient(y2, x)
plt.figure(figsize=(12, 5))
ax1 = plt.subplot(1, 2, 1)
ax1.plot(x, y1, label='y=sigmoid')
ax1.plot(x, dy1_dx, label='dy/dx')
ax1.legend()
_ = ax1.set_xlabel('x')
ax2 = plt.subplot(1, 2, 2)
ax2.plot(x, y2, label='y=tanh')
ax2.plot(x, dy2_dx, label='dy/dx')
ax2.legend()
_ = ax2.set_xlabel('x')
plt.show()
The plots show that the derivatives differ noticeably from zero only in a narrow band around u = 0: for the sigmoid roughly from -6 to 6, and for tanh from -3 to 3.
ReLU
Let us check how much faster the network trains if tanh is replaced with relu.
Loading and preparing the data for classification
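The loading cell itself did not survive extraction. A minimal sketch, assuming the MNIST digit dataset (consistent with the 784-dimensional input implied by the first Dense layer's 401,920 parameters, and with the 235 steps per epoch at batch size 256):

num_classes = 10
(x_train, y_train), (x_test, y_test) = tf.keras.datasets.mnist.load_data()
# flatten the 28x28 images into 784-dimensional vectors
x_train = x_train.reshape(-1, 784).astype('float32')
x_test = x_test.reshape(-1, 784).astype('float32')
# one-hot encode the labels for categorical crossentropy
y_train = tf.keras.utils.to_categorical(y_train, num_classes)
y_test = tf.keras.utils.to_categorical(y_test, num_classes)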
# normalize the data to the range [0, 1]
x_train /= 255
x_test /= 255
# construction and first layer restored from the summary below; the first Dropout rate is assumed
model_2 = tf.keras.Sequential()
model_2.add(tf.keras.layers.Dense(512, activation='relu', input_shape=(784,)))
model_2.add(tf.keras.layers.Dropout(rate=0.25))
model_2.add(tf.keras.layers.Dense(512, activation='relu'))
model_2.add(tf.keras.layers.Dropout(rate=0.25))
model_2.add(tf.keras.layers.Dense(num_classes, activation='softmax'))
model_2.summary()
Model: "sequential_1"
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
dense_3 (Dense) (None, 512) 401920
_________________________________________________________________
dropout_2 (Dropout) (None, 512) 0
_________________________________________________________________
dense_4 (Dense) (None, 512) 262656
_________________________________________________________________
dropout_3 (Dropout) (None, 512) 0
_________________________________________________________________
dense_5 (Dense) (None, 10) 5130
=================================================================
Total params: 669,706
Trainable params: 669,706
Non-trainable params: 0
_________________________________________________________________
batch_size = 256   # matches the 235 steps per epoch in the log below (60,000 / 256)
epochs = 50
history_2 = model_2.fit(
    x_train, y_train,
    batch_size=batch_size,
    epochs=epochs,
    verbose=1,
    validation_data=(x_test, y_test)
)
Epoch 21/50
235/235 [==============================] - 6s 25ms/step - loss: 0.2538 - accuracy: 0.9
Epoch 22/50
235/235 [==============================] - 6s 25ms/step - loss: 0.2519 - accuracy: 0.9
Epoch 23/50
235/235 [==============================] - 6s 25ms/step - loss: 0.2416 - accuracy: 0.9
Epoch 24/50
235/235 [==============================] - 6s 25ms/step - loss: 0.2328 - accuracy: 0.9
Epoch 25/50
235/235 [==============================] - 6s 25ms/step - loss: 0.2328 - accuracy: 0.9
Epoch 26/50
235/235 [==============================] - 6s 25ms/step - loss: 0.2273 - accuracy: 0.9
Epoch 27/50
235/235 [==============================] - 6s 25ms/step - loss: 0.2186 - accuracy: 0.9
Epoch 28/50
235/235 [==============================] - 6s 25ms/step - loss: 0.2204 - accuracy: 0.9
Epoch 29/50
235/235 [==============================] - 6s 25ms/step - loss: 0.2108 - accuracy: 0.9
Epoch 30/50
235/235 [==============================] - 6s 25ms/step - loss: 0.2119 - accuracy: 0.9
Epoch 31/50
235/235 [==============================] - 6s 25ms/step - loss: 0.2014 - accuracy: 0.9
Epoch 32/50
235/235 [==============================] - 6s 25ms/step - loss: 0.2015 - accuracy: 0.9
Epoch 33/50
235/235 [==============================] - 6s 25ms/step - loss: 0.2007 - accuracy: 0.9
Epoch 34/50
235/235 [==============================] - 6s 25ms/step - loss: 0.2003 - accuracy: 0.9
Epoch 35/50
235/235 [==============================] - 6s 25ms/step - loss: 0.1938 - accuracy: 0.9
Epoch 36/50
235/235 [==============================] - 6s 25ms/step - loss: 0.1937 - accuracy: 0.9
Epoch 37/50
235/235 [==============================] - 6s 25ms/step - loss: 0.1886 - accuracy: 0.9
Epoch 38/50
235/235 [==============================] - 6s 25ms/step - loss: 0.1818 - accuracy: 0.9
Epoch 39/50
235/235 [==============================] - 6s 25ms/step - loss: 0.1829 - accuracy: 0.9
Epoch 40/50
235/235 [==============================] - 6s 25ms/step - loss: 0.1819 - accuracy: 0.9
Epoch 41/50
235/235 [==============================] - 6s 25ms/step - loss: 0.1749 - accuracy: 0.9
Epoch 42/50
235/235 [==============================] - 6s 25ms/step - loss: 0.1810 - accuracy: 0.9
Epoch 43/50
235/235 [==============================] - 6s 25ms/step - loss: 0.1715 - accuracy: 0.9
Epoch 44/50
235/235 [==============================] - 6s 25ms/step - loss: 0.1667 - accuracy: 0.9
Epoch 45/50
235/235 [==============================] - 6s 25ms/step - loss: 0.1692 - accuracy: 0.9
Epoch 46/50
235/235 [==============================] - 6s 25ms/step - loss: 0.1641 - accuracy: 0.9
Epoch 47/50
235/235 [==============================] - 6s 25ms/step - loss: 0.1628 - accuracy: 0.9
Epoch 48/50
235/235 [==============================] - 6s 26ms/step - loss: 0.1581 - accuracy: 0.9
Epoch 49/50
235/235 [==============================] - 6s 25ms/step - loss: 0.1559 - accuracy: 0.9
Epoch 50/50
235/235 [==============================] - 6s 25ms/step - loss: 0.1569 - accuracy: 0.9
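The cell that builds model_3 is missing; judging by the summary that follows, a sketch of it would be (dropout rates assumed, by analogy with model_2):

model_3 = tf.keras.Sequential()
model_3.add(tf.keras.layers.Dense(2048, activation='relu', input_shape=(784,)))
model_3.add(tf.keras.layers.Dropout(rate=0.25))
model_3.add(tf.keras.layers.Dense(1024, activation='relu'))
model_3.add(tf.keras.layers.Dropout(rate=0.25))
model_3.add(tf.keras.layers.Dense(512, activation='relu'))
model_3.add(tf.keras.layers.Dropout(rate=0.25))
model_3.add(tf.keras.layers.Dense(num_classes, activation='softmax'))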
model_3.summary()
Model: "sequential_2"
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
dense_6 (Dense) (None, 2048) 1607680
_________________________________________________________________
dropout_4 (Dropout) (None, 2048) 0
_________________________________________________________________
dense_7 (Dense) (None, 1024) 2098176
_________________________________________________________________
dropout_5 (Dropout) (None, 1024) 0
_________________________________________________________________
dense_8 (Dense) (None, 512) 524800
_________________________________________________________________
dropout_6 (Dropout) (None, 512) 0
_________________________________________________________________
dense_9 (Dense) (None, 10) 5130
=================================================================
Total params: 4,235,786
Trainable params: 4,235,786
Non-trainable params: 0
_________________________________________________________________
# train the model
batch_size = 256
epochs = 50
history_3 = model_3.fit(
    x_train, y_train,
    batch_size=batch_size,
    epochs=epochs,
    verbose=1,
    validation_data=(x_test, y_test)
)
Epoch 21/50
235/235 [==============================] - 34s 143ms/step - loss: 0.2586 - accuracy: 0
Epoch 22/50
235/235 [==============================] - 34s 143ms/step - loss: 0.2505 - accuracy: 0
Epoch 23/50
235/235 [==============================] - 34s 144ms/step - loss: 0.2421 - accuracy: 0
Epoch 24/50
235/235 [==============================] - 34s 143ms/step - loss: 0.2383 - accuracy: 0
Epoch 25/50
235/235 [==============================] - 34s 143ms/step - loss: 0.2269 - accuracy: 0
Epoch 26/50
235/235 [==============================] - 34s 143ms/step - loss: 0.2275 - accuracy: 0
Epoch 27/50
235/235 [==============================] - 34s 144ms/step - loss: 0.2242 - accuracy: 0
Epoch 28/50
235/235 [==============================] - 34s 144ms/step - loss: 0.2190 - accuracy: 0
Epoch 29/50
235/235 [==============================] - 34s 143ms/step - loss: 0.2041 - accuracy: 0
Epoch 30/50
235/235 [==============================] - 34s 143ms/step - loss: 0.2011 - accuracy: 0
Epoch 31/50
235/235 [==============================] - 34s 143ms/step - loss: 0.2066 - accuracy: 0
Epoch 32/50
235/235 [==============================] - 34s 144ms/step - loss: 0.1994 - accuracy: 0
Epoch 33/50
235/235 [==============================] - 34s 144ms/step - loss: 0.1961 - accuracy: 0
Epoch 34/50
235/235 [==============================] - 34s 144ms/step - loss: 0.1913 - accuracy: 0
Epoch 35/50
235/235 [==============================] - 34s 145ms/step - loss: 0.1896 - accuracy: 0
Epoch 36/50
235/235 [==============================] - 34s 144ms/step - loss: 0.1791 - accuracy: 0
Epoch 37/50
235/235 [==============================] - 34s 144ms/step - loss: 0.1792 - accuracy: 0
Epoch 38/50
235/235 [==============================] - 34s 143ms/step - loss: 0.1747 - accuracy: 0
Epoch 39/50
235/235 [==============================] - 34s 143ms/step - loss: 0.1727 - accuracy: 0
Epoch 40/50
235/235 [==============================] - 34s 144ms/step - loss: 0.1715 - accuracy: 0
Epoch 41/50
235/235 [==============================] - 34s 143ms/step - loss: 0.1678 - accuracy: 0
Epoch 42/50
235/235 [==============================] - 33s 142ms/step - loss: 0.1674 - accuracy: 0
Epoch 43/50
235/235 [==============================] - 34s 143ms/step - loss: 0.1613 - accuracy: 0
Epoch 44/50
235/235 [==============================] - 34s 143ms/step - loss: 0.1592 - accuracy: 0
Epoch 45/50
235/235 [==============================] - 34s 144ms/step - loss: 0.1555 - accuracy: 0
Epoch 46/50
235/235 [==============================] - 34s 143ms/step - loss: 0.1514 - accuracy: 0
Epoch 47/50
235/235 [==============================] - 34s 144ms/step - loss: 0.1522 - accuracy: 0
Epoch 48/50
235/235 [==============================] - 34s 144ms/step - loss: 0.1515 - accuracy: 0
Epoch 49/50
235/235 [==============================] - 34s 143ms/step - loss: 0.1525 - accuracy: 0
Epoch 50/50
dir(model_2.layers[1])   # inspect the attributes available on a layer, e.g. trainable_weights
# per-layer trainable parameter counts (weights plus biases): out * (in + 1);
# Dropout layers are skipped since they have no trainable weights
model2_sizes = [layer.trainable_weights[0].shape[1] * (layer.trainable_weights[0].shape[0] + 1)
                for layer in model_2.layers if layer.trainable_weights]
model2_sizes
model3_sizes = [layer.trainable_weights[0].shape[1] * (layer.trainable_weights[0].shape[0] + 1)
                for layer in model_3.layers if layer.trainable_weights]
model3_sizes
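If the formula is right, model2_sizes should reproduce the Dense rows of the summaries above, [401920, 262656, 5130], and model3_sizes should give [1607680, 2098176, 524800, 5130].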
learning_step = 0.001
learning_step = 0.01
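The notebook does not show where learning_step is used; presumably it is passed to the optimizer at compile time. A sketch, with the choice of Adam being an assumption:

model_3.compile(
    optimizer=tf.keras.optimizers.Adam(learning_rate=learning_step),  # Adam is assumed here
    loss='categorical_crossentropy',
    metrics=['accuracy']
)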
for hidden layers it makes sense to use activation functions that have nonzero derivatives over a sufficiently wide range of the argument (the summed input u);
relu (rectified linear unit) is the simplest and most frequently used activation function for hidden layers;
there are many modifications of relu (leaky relu, elu, gelu, selu) that fix the zero-gradient problem in the region u < 0 (see the sketch below).
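A quick illustration of the last point (added here as a sketch, using tf.nn.leaky_relu with its default slope alpha=0.2):

x = tf.constant([-5.0, -1.0, 2.0])
with tf.GradientTape(persistent=True) as tape:
    tape.watch(x)
    y_relu = tf.nn.relu(x)
    y_leaky = tf.nn.leaky_relu(x)   # alpha=0.2 by default
print(tape.gradient(y_relu, x).numpy())    # [0.  0.  1. ] - relu: zero gradient for u < 0
print(tape.gradient(y_leaky, x).numpy())   # [0.2 0.2 1. ] - leaky relu: small but nonzero gradient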