Batch Normalization
we see a direct dependence on the output of the previous layer: if that output is very
large compared to the outputs of the other neurons in its layer, it can dominate the
total input of every neuron in the next layer.
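A toy illustration with made-up numbers: one oversized activation almost single-handedly determines the weighted sum of a downstream neuron, regardless of the other inputs:

```python
import numpy as np

# Outputs of three neurons in one layer; the third is far larger than the rest
prev_outputs = np.array([0.2, 0.3, 50.0])

# Weights of one neuron in the next layer (hypothetical values)
w = np.array([0.5, -0.4, 0.3])

total_input = prev_outputs @ w        # weighted sum feeding the next neuron
contribution = prev_outputs * w       # per-input contribution to that sum

# The large activation contributes essentially the whole sum,
# so the other two inputs have almost no influence.
share_of_largest = contribution[2] / total_input
```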
import tensorflow as tf
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt  # plotting library
plt.style.use('ggplot')  # set the plotting style
Source: https://arxiv.org/pdf/1803.08494.pdf
γ : scale parameter
β : shift parameter
The parameters γ and β are learnable: they are tuned during training so as to reduce
the error. At inference time they are fixed and simply used for normalization.
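The transform applied by batch normalization can be sketched in NumPy (the values of γ, β and the mini-batch below are hypothetical):

```python
import numpy as np

# Mini-batch of activations of a single neuron (made-up values)
x = np.array([1.0, 2.0, 3.0, 4.0], dtype=np.float32)

gamma, beta, eps = 1.5, 0.5, 1e-3   # gamma: scale, beta: shift, eps: numerical safety

mu = x.mean()                        # batch mean
var = x.var()                        # batch variance
x_hat = (x - mu) / np.sqrt(var + eps)  # normalize to zero mean, unit variance
y = gamma * x_hat + beta               # learnable scale and shift restore capacity
```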
# normalize the data to the [0, 1] range
x_train = x_train.astype('float32') / 255
x_test = x_test.astype('float32') / 255
array([0., 0., 0., 0., 0., 1., 0., 0., 0., 0.], dtype=float32)
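The one-hot vector above is the kind of encoding `tf.keras.utils.to_categorical` produces; a quick check, assuming the labels were encoded this way:

```python
import tensorflow as tf

# Label 5 encoded as a one-hot vector over 10 classes
y = tf.keras.utils.to_categorical([5], num_classes=10)[0]
# y has a 1.0 at index 5 and zeros elsewhere
```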
# two fully connected blocks with batch normalization and dropout
# (the first block is reconstructed from the summary below)
model_2 = tf.keras.Sequential()
model_2.add(tf.keras.layers.Dense(512, activation='relu', input_shape=(784,)))
model_2.add(tf.keras.layers.BatchNormalization())
model_2.add(tf.keras.layers.Dropout(rate=0.25))
model_2.add(tf.keras.layers.Dense(512, activation='relu'))
model_2.add(tf.keras.layers.BatchNormalization())
model_2.add(tf.keras.layers.Dropout(rate=0.25))
model_2.add(tf.keras.layers.Dense(num_classes, activation='softmax'))
model_2.summary()
Model: "sequential"
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
dense (Dense) (None, 512) 401920
_________________________________________________________________
batch_normalization (BatchNo (None, 512) 2048
_________________________________________________________________
dropout (Dropout) (None, 512) 0
_________________________________________________________________
dense_1 (Dense) (None, 512) 262656
_________________________________________________________________
batch_normalization_1 (Batch (None, 512) 2048
_________________________________________________________________
dropout_1 (Dropout) (None, 512) 0
_________________________________________________________________
dense_2 (Dense) (None, 10) 5130
=================================================================
Total params: 673,802
Trainable params: 671,754
Non-trainable params: 2,048
_________________________________________________________________
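The parameter counts in the summary can be verified by hand: a Dense layer with n inputs and m outputs has (n + 1) * m parameters, and each BatchNormalization layer stores four vectors of length 512 (γ, β, moving mean, moving variance), of which only γ and β are trainable:

```python
dense_params   = (784 + 1) * 512   # first Dense: weights plus biases
bn_params      = 4 * 512           # gamma, beta, moving_mean, moving_variance
dense_1_params = (512 + 1) * 512
dense_2_params = (512 + 1) * 10

total = dense_params + 2 * bn_params + dense_1_params + dense_2_params

# The moving mean and variance are not trained by backprop,
# so 2 layers * 2 vectors * 512 = 2048 parameters are non-trainable.
non_trainable = 2 * (2 * 512)
trainable = total - non_trainable
```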
# prepare the model for training
model_2.compile(
loss='categorical_crossentropy',
optimizer=tf.keras.optimizers.SGD(learning_rate=0.01),
metrics=['accuracy']
)
# train the model
batch_size = 256
epochs = 50
history_2 = model_2.fit(
x_train, y_train,
batch_size=batch_size,
epochs=epochs,
verbose=1,
validation_data=(x_test, y_test)
)
Epoch 1/50
235/235 [==============================] - 4s 6ms/step - loss: 1.0549 - accuracy: 0.6
Epoch 2/50
235/235 [==============================] - 1s 4ms/step - loss: 0.3592 - accuracy: 0.8
Epoch 3/50
235/235 [==============================] - 1s 4ms/step - loss: 0.2767 - accuracy: 0.9
Epoch 4/50
235/235 [==============================] - 1s 4ms/step - loss: 0.2373 - accuracy: 0.9
Epoch 5/50
235/235 [==============================] - 1s 4ms/step - loss: 0.2151 - accuracy: 0.9
Epoch 6/50
235/235 [==============================] - 1s 4ms/step - loss: 0.1896 - accuracy: 0.94
Epoch 7/50
235/235 [==============================] - 1s 4ms/step - loss: 0.1766 - accuracy: 0.94
Epoch 8/50
235/235 [==============================] - 1s 4ms/step - loss: 0.1639 - accuracy: 0.94
Epoch 9/50
235/235 [==============================] - 1s 4ms/step - loss: 0.1525 - accuracy: 0.9
Epoch 10/50
235/235 [==============================] - 1s 4ms/step - loss: 0.1444 - accuracy: 0.9
Epoch 11/50
235/235 [==============================] - 1s 4ms/step - loss: 0.1404 - accuracy: 0.9
Epoch 12/50
235/235 [==============================] - 1s 4ms/step - loss: 0.1299 - accuracy: 0.9
Epoch 13/50
235/235 [==============================] - 1s 4ms/step - loss: 0.1232 - accuracy: 0.9
Epoch 14/50
235/235 [==============================] - 1s 4ms/step - loss: 0.1248 - accuracy: 0.9
Epoch 15/50
235/235 [==============================] - 1s 4ms/step - loss: 0.1148 - accuracy: 0.9
Epoch 16/50
235/235 [==============================] - 1s 4ms/step - loss: 0.1121 - accuracy: 0.9
Epoch 17/50
235/235 [==============================] - 1s 4ms/step - loss: 0.1075 - accuracy: 0.9
Epoch 18/50
235/235 [==============================] - 1s 4ms/step - loss: 0.1050 - accuracy: 0.9
Epoch 19/50
235/235 [==============================] - 1s 4ms/step - loss: 0.1013 - accuracy: 0.9
Epoch 20/50
235/235 [==============================] - 1s 4ms/step - loss: 0.0954 - accuracy: 0.9
Epoch 21/50
235/235 [==============================] - 1s 4ms/step - loss: 0.0900 - accuracy: 0.9
Epoch 22/50
235/235 [==============================] - 1s 4ms/step - loss: 0.0915 - accuracy: 0.9
Epoch 23/50
235/235 [==============================] - 1s 4ms/step - loss: 0.0885 - accuracy: 0.9
Epoch 24/50
235/235 [==============================] - 1s 4ms/step - loss: 0.0830 - accuracy: 0.9
Epoch 25/50
235/235 [==============================] - 1s 4ms/step - loss: 0.0819 - accuracy: 0.9
Epoch 26/50
235/235 [==============================] - 1s 4ms/step - loss: 0.0805 - accuracy: 0.9
Epoch 27/50
235/235 [==============================] - 1s 5ms/step - loss: 0.0773 - accuracy: 0.9
Epoch 28/50
235/235 [==============================] - 1s 4ms/step - loss: 0.0771 - accuracy: 0.9
Epoch 29/50
235/235 [==============================] - 1s 4ms/step - loss: 0.0751 - accuracy: 0.9
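The per-epoch metrics recorded by `fit()` are easier to read as curves. The dict below is a stub with made-up values standing in for `history_2.history`; in the notebook one would pass the real object:

```python
import matplotlib.pyplot as plt

# Stub in place of history_2.history (keys match what Keras records)
history = {
    'loss':         [1.05, 0.36, 0.28],
    'val_loss':     [0.45, 0.30, 0.25],
    'accuracy':     [0.68, 0.89, 0.92],
    'val_accuracy': [0.87, 0.91, 0.93],
}

fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(12, 4))

ax1.plot(history['loss'], label='train')
ax1.plot(history['val_loss'], label='validation')
ax1.set_xlabel('epoch'); ax1.set_title('loss'); ax1.legend()

ax2.plot(history['accuracy'], label='train')
ax2.plot(history['val_accuracy'], label='validation')
ax2.set_xlabel('epoch'); ax2.set_title('accuracy'); ax2.legend()
```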
# sizes of the Dense layers, computed from their weight matrices
model2_sizes = [
    layer.trainable_weights[0].shape[0] * (layer.trainable_weights[0].shape[1] + 1)
    for layer in model_2.layers
    if isinstance(layer, tf.keras.layers.Dense)
]
model2_sizes
res_df
learning_step = 0.001
learning_step = 0.01
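The learning-rate values above control the step size of gradient descent. A toy one-dimensional example (minimizing x², not the network above) shows why a larger rate reaches the minimum faster, as long as it stays small enough to be stable:

```python
def gradient_descent(lr, steps=50, x0=5.0):
    """Minimize f(x) = x^2 with a fixed learning rate."""
    x = x0
    for _ in range(steps):
        x -= lr * 2 * x   # gradient of x^2 is 2x
    return x

# After the same number of steps, lr=0.01 gets much closer
# to the minimum at x = 0 than lr=0.001 does.
x_small = gradient_descent(0.001)
x_large = gradient_descent(0.01)
```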
To summarize:
- when using activation functions that are unbounded in value, a good remedy for
hyperactivation is normalizing the layer's output;
- another important way to improve convergence and the quality of training is the
choice of optimization parameters and methods.
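As a sketch of the second point, the same kind of model can be compiled with an adaptive optimizer such as Adam instead of plain SGD. The layer sizes below mirror the model above, but this is an illustration, not an experiment from the text:

```python
import tensorflow as tf

model = tf.keras.Sequential([
    tf.keras.Input(shape=(784,)),
    tf.keras.layers.Dense(512, activation='relu'),
    tf.keras.layers.BatchNormalization(),
    tf.keras.layers.Dense(10, activation='softmax'),
])

# Adam adapts a per-parameter step size and usually needs less tuning than SGD
model.compile(
    loss='categorical_crossentropy',
    optimizer=tf.keras.optimizers.Adam(learning_rate=0.001),
    metrics=['accuracy'],
)
```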