Back Propagation in a Convolutional Neural Network∗
Tirtharaj Dash, BITS Pilani, Goa Campus
October 2, 2018
Abstract
In this report, we study how back-propagation is conducted in a simple convolutional
neural network that takes a two-dimensional input and has two convolutional layers.
We will see that the feed-forward computation in a CNN is similar to that in an MLP.
We then derive the filter-learning equations from the gradients of a squared loss.
1 CNN – Example
We will stick to the following CNN architecture:
Figure 1: CNN with two convolution layers. Obtained from the MATLAB example in the
Deep Learning Toolbox.
• Input: a 28 × 28 image, I.
• C1 layer: 6 convolution filters k^1_{1,p} of size 5 × 5 with biases b^1_p, giving 6 feature maps of size 24 × 24.
• S1 layer: 2 × 2 mean pooling, giving 6 maps of size 12 × 12.
• C2 layer: 12 filters k^2_{p,q} of size 5 × 5 × 6 with biases b^2_q, giving 12 feature maps of size 8 × 8.
• S2 layer: 2 × 2 mean pooling, giving 12 maps of size 4 × 4.
• FC layer: W : 10 × 192 and b: 10 × 1.
All the filters and weights are initialised from a uniform distribution. The biases
are initialised to 0. The total number of parameters in our network is:
(5 × 5 + 1) × 6 + (5 × 5 × 6 + 1) × 12 + 10 × 192 + 10 = 3898
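As a quick check, this count can be reproduced in Python (a minimal sketch of the arithmetic above):

```python
# Parameter count: C1 filters + biases, C2 filters + biases, FC weights + biases.
c1 = (5 * 5 + 1) * 6         # six 5x5 filters, one bias each
c2 = (5 * 5 * 6 + 1) * 12    # twelve 5x5x6 filters, one bias each
fc = 10 * 192 + 10           # W and b of the fully-connected layer
assert c1 + c2 + fc == 3898
```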
The feature maps of the C1 layer are computed as

$$C^1_p(i,j) = \sigma\!\left(\sum_{u=-2}^{2}\sum_{v=-2}^{2} I(i-u,\, j-v)\, k^1_{1,p}(u,v) + b^1_p\right)$$

with

$$\sigma(x) = \frac{1}{1 + e^{-x}},$$

where p = 1, ..., 6 because there are 6 feature maps (kernels or filters) in the C1 layer. Here
i, j are the row and column indices of the feature map. Each S1 map is the 2 × 2 mean-pooled
version of the corresponding C1 map,

$$S^1_p(i,j) = \frac{1}{4}\sum_{u=0}^{1}\sum_{v=0}^{1} C^1_p(2i-u,\, 2j-v),$$

where i, j = 1, 2, ..., 12.
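A minimal NumPy sketch of this feed-forward step, with placeholder data for the input `I`, one filter `k1_p`, and its bias `b1_p` (the names are illustrative; `convolve2d` flips the kernel, matching the (i − u, j − v) indexing above):

```python
import numpy as np
from scipy.signal import convolve2d

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

rng = np.random.default_rng(0)
I = rng.random((28, 28))                # input image (placeholder data)
k1_p = rng.uniform(-0.5, 0.5, (5, 5))   # one C1 filter
b1_p = 0.0                              # bias, initialised to 0

# C1: valid convolution with a 5x5 filter -> 24x24 map, then sigmoid
C1_p = sigmoid(convolve2d(I, k1_p, mode='valid') + b1_p)

# S1: 2x2 mean pooling over non-overlapping blocks -> 12x12 map
S1_p = C1_p.reshape(12, 2, 12, 2).mean(axis=(1, 3))
```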
1.1.5 Vectorisation and Concatenation
Each S^2_q is a 4 × 4 matrix. There are 12 such matrices in the S2 layer. Each matrix
is vectorised row-wise, and the resulting 12 vectors are concatenated. The size of the
flattened layer is 4 × 4 × 12 = 192. We denote this process as
$$f = F\left(\{S^2_q\}_{q=1,2,\dots,12}\right) \tag{7}$$
and the reverse process is:
$$\{S^2_q\}_{q=1,2,\dots,12} = F^{-1}(f) \tag{8}$$
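In NumPy, F and F^{-1} amount to a reshape pair. A sketch, assuming the twelve maps are stacked in an array `S2` of shape (12, 4, 4); row-major reshape matches the row-wise scan:

```python
import numpy as np

S2 = np.zeros((12, 4, 4))      # the twelve 4x4 S2 maps (placeholder values)
f = S2.reshape(192)            # F: vectorise each map row-wise, then concatenate
S2_back = f.reshape(12, 4, 4)  # F^{-1}: undo the flattening
assert np.array_equal(S2, S2_back)
```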
1.2 Back-propagation

1.2.1 ∇W (10 × 192)

The output of the fully-connected layer is

$$\hat{y}(i) = \sigma\!\left(\sum_{j=1}^{192} W(i,j)\, f(j) + b(i)\right), \qquad i = 1, \dots, 10, \tag{9}$$

and we use the squared loss

$$L = \frac{1}{2}\sum_{i=1}^{10} \left(y(i) - \hat{y}(i)\right)^2. \tag{10}$$

Then

$$\begin{aligned}
\nabla W(i,j) &= \frac{\partial L}{\partial W(i,j)} \\
&= \frac{\partial L}{\partial \hat{y}(i)} \cdot \frac{\partial \hat{y}(i)}{\partial W(i,j)} \\
&= (\hat{y}(i) - y(i)) \cdot \frac{\partial}{\partial W(i,j)}\, \sigma\!\left(\sum_{j'=1}^{192} W(i,j')\, f(j') + b(i)\right) \\
&= (\hat{y}(i) - y(i)) \cdot \hat{y}(i)(1 - \hat{y}(i)) \cdot f(j).
\end{aligned}$$

Writing

$$\nabla \hat{y}(i) = (\hat{y}(i) - y(i))\, \hat{y}(i)(1 - \hat{y}(i)), \tag{11}$$

this is, in matrix form,

$$\nabla W = \nabla \hat{y} \times f^T. \tag{12}$$
1.2.2 ∇b (10 × 1)
From the equation for ∇W, we see that ∇b is obtained directly, with the input f(j)
replaced by 1:

$$\nabla b = \nabla \hat{y}. \tag{13}$$
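Equations (12) and (13) in NumPy, as a sketch with placeholder data (the names are illustrative):

```python
import numpy as np

rng = np.random.default_rng(1)
f = rng.random(192)                        # flattened S2 activations
W, b = rng.random((10, 192)), np.zeros(10)
y = np.eye(10)[3]                          # a one-hot target (placeholder)
y_hat = 1.0 / (1.0 + np.exp(-(W @ f + b)))

grad_yhat = (y_hat - y) * y_hat * (1.0 - y_hat)  # eq. (11)
grad_W = np.outer(grad_yhat, f)                  # eq. (12): 10 x 192
grad_b = grad_yhat                               # eq. (13): 10 x 1
```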
1.2.3 ∇f (192 × 1)
$$\begin{aligned}
\nabla f(j) &= \frac{\partial L}{\partial f(j)} \\
&= \sum_{i=1}^{10} \frac{\partial L}{\partial \hat{y}(i)} \cdot \frac{\partial \hat{y}(i)}{\partial f(j)} \\
&= \sum_{i=1}^{10} (\hat{y}(i) - y(i))\, \frac{\partial}{\partial f(j)}\, \sigma\!\left(\sum_{j'=1}^{192} W(i,j')\, f(j') + b(i)\right) \\
&= \sum_{i=1}^{10} (\hat{y}(i) - y(i))\, \hat{y}(i)(1 - \hat{y}(i))\, W(i,j) \\
&= \sum_{i=1}^{10} \nabla \hat{y}(i)\, W(i,j).
\end{aligned}$$
Or, in matrix form,

$$\nabla f = W^T \times \nabla \hat{y}. \tag{14}$$
1.2.4 ∇S^2_q

$$\{\nabla S^2_q\}_{q=1,2,\dots,12} = F^{-1}(\nabla f) \tag{15}$$
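A sketch of equations (14) and (15), again with placeholder data; the reshape undoes the row-wise flattening:

```python
import numpy as np

rng = np.random.default_rng(1)
W = rng.random((10, 192))            # FC weights
grad_yhat = rng.random(10)           # placeholder for the result of eq. (11)

grad_f = W.T @ grad_yhat             # eq. (14): a 192-vector
grad_S2 = grad_f.reshape(12, 4, 4)   # eq. (15): one 4x4 gradient map per S2 map
```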
1.2.5 ∇k^2_{p,q} (5 × 5)
$$\nabla C^2_q(i,j) = \frac{1}{4}\, \nabla S^2_q\!\left(\lceil i/2 \rceil,\, \lceil j/2 \rceil\right), \tag{16}$$
i, j = 1, ..., 8. The above step upsamples the gradient with the help of a Kronecker
product, i.e.

$$\mathrm{up}(x) = x \otimes \mathbf{1}_{2\times 2}.$$
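In NumPy this upsampling is a Kronecker product with a 2 × 2 matrix of ones; a sketch for one 4 × 4 gradient map (placeholder data):

```python
import numpy as np

grad_S2_q = np.arange(16.0).reshape(4, 4)   # placeholder 4x4 gradient map

# eq. (16): spread each pooled gradient over its 2x2 block and divide by 4,
# since each S2 entry is the mean of four C2 entries.
grad_C2_q = 0.25 * np.kron(grad_S2_q, np.ones((2, 2)))   # 4x4 -> 8x8
```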
Now,
$$\begin{aligned}
\nabla k^2_{p,q}(u,v) &= \frac{\partial L}{\partial k^2_{p,q}(u,v)} \\
&= \sum_{i=1}^{8}\sum_{j=1}^{8} \frac{\partial L}{\partial C^2_q(i,j)} \cdot \frac{\partial C^2_q(i,j)}{\partial k^2_{p,q}(u,v)} \\
&= \sum_{i=1}^{8}\sum_{j=1}^{8} \nabla C^2_q(i,j)\, \frac{\partial}{\partial k^2_{p,q}(u,v)}\, \sigma\!\left(\sum_{p'=1}^{6}\sum_{u'=-2}^{2}\sum_{v'=-2}^{2} S^1_{p'}(i-u',\, j-v')\, k^2_{p',q}(u',v') + b^2_q\right) \\
&= \sum_{i=1}^{8}\sum_{j=1}^{8} \nabla C^2_q(i,j)\, C^2_q(i,j)(1 - C^2_q(i,j))\, S^1_p(i-u,\, j-v).
\end{aligned}$$

Let ∇C^2_{q,σ}(i,j) = ∇C^2_q(i,j) C^2_q(i,j)(1 − C^2_q(i,j)) denote the gradient with
respect to the pre-activation

$$C^2_{q,\sigma}(i,j) = \sum_{p=1}^{6}\sum_{u=-2}^{2}\sum_{v=-2}^{2} S^1_p(i-u,\, j-v)\, k^2_{p,q}(u,v) + b^2_q. \tag{17}$$
To write this as a convolution, we flip S^1_p by 180°. So,

$$\nabla k^2_{p,q}(u,v) = \sum_{i=1}^{8}\sum_{j=1}^{8} S^1_{p,\mathrm{rot}180}(u-i,\, v-j)\, \nabla C^2_{q,\sigma}(i,j) \tag{18}$$

$$\nabla k^2_{p,q} = S^1_{p,\mathrm{rot}180} * \nabla C^2_{q,\sigma} \tag{19}$$
1.2.6 ∇b^2_q (1 × 1)

This is obtained straightforwardly from the above equation:

$$\nabla b^2_q = \sum_{i=1}^{8}\sum_{j=1}^{8} \nabla C^2_{q,\sigma}(i,j) \tag{20}$$
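A sketch of equations (18)-(20) with 0-indexed arrays and placeholder data: the valid cross-correlation of S^1_p with ∇C^2_{q,σ} below is equivalent to the flipped convolution above, up to the index origin:

```python
import numpy as np

rng = np.random.default_rng(2)
S1_p = rng.random((12, 12))        # an S1 map
dC2_sigma = rng.random((8, 8))     # pre-activation gradient of one C2 map

# eqs. (18)-(19): each filter entry collects the products that used it
grad_k = np.zeros((5, 5))
for u in range(5):
    for v in range(5):
        grad_k[u, v] = np.sum(dC2_sigma * S1_p[u:u+8, v:v+8])

grad_b2_q = dC2_sigma.sum()        # eq. (20)
```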
Homework 1 Following an approach similar to the layer-2 gradient computation de-
scribed here, compute the layer-1 gradients for the given architecture.
1.3 Updates
We update the parameters as follows, where η is the learning rate:

$$k^1_{1,p} \leftarrow k^1_{1,p} - \eta \nabla k^1_{1,p} \tag{21}$$

$$b^1_p \leftarrow b^1_p - \eta \nabla b^1_p \tag{22}$$

$$k^2_{p,q} \leftarrow k^2_{p,q} - \eta \nabla k^2_{p,q} \tag{23}$$

$$b^2_q \leftarrow b^2_q - \eta \nabla b^2_q \tag{24}$$

$$W \leftarrow W - \eta \nabla W \tag{25}$$

$$b \leftarrow b - \eta \nabla b \tag{26}$$
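A one-step sketch of the updates for the FC parameters (eqs. (25)-(26)); the filter and bias updates (eqs. (21)-(24)) follow the same pattern, with a hypothetical learning rate:

```python
import numpy as np

eta = 0.1                                            # hypothetical learning rate
W, b = np.zeros((10, 192)), np.zeros(10)             # FC parameters
grad_W, grad_b = np.zeros((10, 192)), np.zeros(10)   # gradients from Section 1.2

W -= eta * grad_W   # eq. (25)
b -= eta * grad_b   # eq. (26)
```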