
A Case Study of Back Propagation in Convolutional

Neural Network∗
Tirtharaj Dash, BITS Pilani, Goa Campus
October 2, 2018

Abstract
In this report, we study how back propagation is conducted in a simple convolutional
neural network that takes a two-dimensional input and has two convolutional layers.
We will see that the feed-forward computation in a CNN is similar to that in an MLP.
We further derive the filter learning equations based on gradients computed from a
squared loss.

1 CNN – Example
We will stick to the following CNN architecture:

Figure 1: CNN with two convolution layers. Obtained from the MATLAB Deep Learning
Toolbox example.

Parameters and Initialisation


• C1 layer: k^1_{1,p}: 5 × 5; b^1_p: 1 × 1; p = 1, 2, . . . , 6

• C2 layer: k^2_{p,q}: 5 × 5; b^2_q: 1 × 1; q = 1, 2, . . . , 12

• FC layer: W: 10 × 192 and b: 10 × 1.

∗ This material is created only for academic reading purposes and is based on the cited references.

All the filters and weights are initialised from a uniform distribution. The biases
are initialised to 0. The total number of parameters in our network is:

(5 × 5 + 1) × 6 + (5 × 5 × 6 + 1) × 12 + 10 × 192 + 10 = 3898

and the activation function used throughout is the logistic sigmoid

σ(x) = 1 / (1 + e^{−x}).
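As a concrete illustration, the following is a minimal NumPy sketch of this initialisation and of σ; the uniform range ±0.1 and the array layouts are assumptions made only for this example, not part of the architecture description above.

    import numpy as np

    rng = np.random.default_rng(0)

    def sigmoid(x):
        return 1.0 / (1.0 + np.exp(-x))

    # C1: 6 filters of size 5x5 and 6 scalar biases
    k1 = rng.uniform(-0.1, 0.1, size=(6, 5, 5))
    b1 = np.zeros(6)

    # C2: 6x12 filters of size 5x5 and 12 scalar biases
    k2 = rng.uniform(-0.1, 0.1, size=(6, 12, 5, 5))
    b2 = np.zeros(12)

    # FC: 10x192 weight matrix and 10 biases
    W = rng.uniform(-0.1, 0.1, size=(10, 192))
    b = np.zeros(10)

    # (5*5 + 1)*6 + (5*5*6 + 1)*12 + 10*192 + 10 = 3898
    assert k1.size + b1.size + k2.size + b2.size + W.size + b.size == 3898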

1.1 Feed-forward Computation


The layer-by-layer computations are as follows:

1.1.1 Convolutional Layer, C1


C^1_p = σ( I ∗ k^1_{1,p} + b^1_p )    (1)

or,

C^1_p(i, j) = σ( Σ_{u=−2}^{2} Σ_{v=−2}^{2} I(i − u, j − v) · k^1_{1,p}(u, v) + b^1_p )    (2)

where p = 1, . . . , 6 because there are 6 feature maps (one per kernel or filter) in the C1
layer. Here i, j are the row and column indices of the feature map.
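As a sketch of (1)–(2): assuming a 28 × 28 input image I (an assumption, but one consistent with the 24 × 24, 12 × 12, 8 × 8 and 4 × 4 map sizes used later in this note), scipy.signal.convolve2d computes exactly the flipped-kernel sum in (2); np and sigmoid are reused from the initialisation sketch above.

    from scipy.signal import convolve2d

    def conv_layer1(I, k1, b1):
        """I: 28x28 input, k1: (6, 5, 5) filters, b1: (6,) biases -> (6, 24, 24) maps C1."""
        return np.stack([sigmoid(convolve2d(I, k1[p], mode='valid') + b1[p])
                         for p in range(6)])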

1.1.2 Pooling layer, S1


Here, average pooling is used.
S^1_p(i, j) = (1/4) Σ_{u=0}^{1} Σ_{v=0}^{1} C^1_p(2i − u, 2j − v),    i, j = 1, 2, . . . , 12.    (3)
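A sketch of (3): non-overlapping 2 × 2 average pooling, which halves each spatial dimension. The same function can be reused unchanged for the S2 layer below.

    def avg_pool2(C):
        """C: (h, w) map with even h, w -> (h/2, w/2) map of 2x2 block averages."""
        h, w = C.shape
        return C.reshape(h // 2, 2, w // 2, 2).mean(axis=(1, 3))

    # S1: each 24x24 map of C1 becomes a 12x12 map
    # S1 = np.stack([avg_pool2(C1[p]) for p in range(6)])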

1.1.3 Convolution Layer, C2


C^2_q = σ( Σ_{p=1}^{6} S^1_p ∗ k^2_{p,q} + b^2_q )    (4)

or

C^2_q(i, j) = σ( Σ_{p=1}^{6} Σ_{u=−2}^{2} Σ_{v=−2}^{2} S^1_p(i − u, j − v) · k^2_{p,q}(u, v) + b^2_q )    (5)

where q = 1, 2, . . . , 12. There are 12 feature maps in the C2 layer.
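A sketch of (4)–(5) in the same style: each C2 map sums the convolutions of all six pooled S1 maps with their own 5 × 5 filters before the bias and the sigmoid are applied (sigmoid and convolve2d as in the earlier sketches).

    def conv_layer2(S1, k2, b2):
        """S1: (6, 12, 12) pooled maps, k2: (6, 12, 5, 5), b2: (12,) -> (12, 8, 8) maps C2."""
        return np.stack([
            sigmoid(sum(convolve2d(S1[p], k2[p, q], mode='valid') for p in range(6)) + b2[q])
            for q in range(12)
        ])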

1.1.4 Pooling layer, S2


S^2_q(i, j) = (1/4) Σ_{u=0}^{1} Σ_{v=0}^{1} C^2_q(2i − u, 2j − v),    i, j = 1, 2, . . . , 4.    (6)

1.1.5 Vectorisation and Concatenation
Each S^2_q is a 4 × 4 matrix. There are 12 such matrices in the S2 layer. These matrices
are vectorised row-wise (column scan), and the resulting 12 one-dimensional vectors are
concatenated. The size of the flattened layer is 4 × 4 × 12 = 192. We denote this process
as

f = F({S^2_q}_{q=1,2,...,12})    (7)

and the reverse process as

{S^2_q}_{q=1,2,...,12} = F^{−1}(f)    (8)
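F and F^{−1} in (7)–(8) amount to a fixed reshape; a sketch follows (the only requirement is that the scan order used by F is the one undone by F^{−1}).

    def F(S2):
        """S2: (12, 4, 4) pooled maps -> flattened vector f of length 192."""
        return S2.reshape(192)

    def F_inv(f):
        """f: vector of length 192 -> the twelve 4x4 maps (inverse of F)."""
        return f.reshape(12, 4, 4)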

1.1.6 Fully connected layer, FC


ŷ = σ(W × f + b) (9)

1.1.7 Loss function


We will use a squared loss:

L = (1/2) Σ_{i=1}^{10} (y(i) − ŷ(i))^2    (10)

where y is the 10-dimensional target vector.
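A sketch of (9)–(10); here y is taken to be a 10-dimensional one-hot target vector, which is an assumption about the label encoding rather than something stated above (sigmoid reused from earlier).

    def forward_fc(f, W, b):
        """f: (192,), W: (10, 192), b: (10,) -> prediction y_hat of shape (10,)."""
        return sigmoid(W @ f + b)

    def squared_loss(y, y_hat):
        """Squared loss of equation (10)."""
        return 0.5 * np.sum((y - y_hat) ** 2)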

1.2 Back Propagation


In the back propagation, we first compute the gradients generated in each layer. For
simplicity, we denote ∇x = ∂L/∂x.

1.2.1 ∇W (10 × 192)

∇W(i, j) = ∂L / ∂W(i, j)
         = (∂L / ∂ŷ(i)) · (∂ŷ(i) / ∂W(i, j))
         = (ŷ(i) − y(i)) · ∂/∂W(i, j) σ( Σ_{j′=1}^{192} W(i, j′) × f(j′) + b(i) )
         = (ŷ(i) − y(i)) · ŷ(i)(1 − ŷ(i)) · f(j)

Let ∇ŷ(i) = (ŷ(i) − y(i)) · ŷ(i)(1 − ŷ(i)), whose size is 10 × 1. Then

∇W(i, j) = ∇ŷ(i) · f(j)    (11)

or

∇W = ∇ŷ × f^T    (12)

1.2.2 ∇b (10 × 1)
From the expression for ∇W, we can directly obtain ∇b by replacing the input f(j) with 1:

∇b = ∇ŷ    (13)

1.2.3 ∇f (192 × 1)

∇f(j) = ∂L / ∂f(j)
      = Σ_{i=1}^{10} (∂L / ∂ŷ(i)) · (∂ŷ(i) / ∂f(j))
      = Σ_{i=1}^{10} (ŷ(i) − y(i)) · ∂/∂f(j) σ( Σ_{j′=1}^{192} W(i, j′) × f(j′) + b(i) )
      = Σ_{i=1}^{10} (ŷ(i) − y(i)) ŷ(i)(1 − ŷ(i)) W(i, j)
      = Σ_{i=1}^{10} ∇ŷ(i) W(i, j)

or,

∇f = W^T × ∇ŷ    (14)

1.2.4 ∇S^2_q

{∇S^2_q}_{q=1,2,...,12} = F^{−1}(∇f)    (15)
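Equations (11)–(15) translate almost directly into NumPy. A sketch of the fully connected backward pass follows; F_inv is the reshape from the vectorisation sketch above.

    def backward_fc(y, y_hat, f, W):
        """Returns (grad_W, grad_b, grad_f) for the squared loss in (10)."""
        grad_y_hat = (y_hat - y) * y_hat * (1.0 - y_hat)   # nabla y_hat, shape (10,)
        grad_W = np.outer(grad_y_hat, f)                   # (11)-(12), shape (10, 192)
        grad_b = grad_y_hat                                # (13)
        grad_f = W.T @ grad_y_hat                          # (14), shape (192,)
        return grad_W, grad_b, grad_f

    # grad_W, grad_b, grad_f = backward_fc(y, y_hat, f, W)
    # dS2 = F_inv(grad_f)   # equation (15): the twelve 4x4 gradient maps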

1.2.5 ∇k^2_{p,q} (5 × 5)

∇C^2_q(i, j) = (1/4) ∇S^2_q(⌈i/2⌉, ⌈j/2⌉),    i, j = 1, . . . , 8.    (16)

The above step upsamples the gradient with the help of a Kronecker product, i.e.

up(x) = 1_{2×2} ⊗ x,

where 1_{2×2} is a 2 × 2 matrix of ones.
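A sketch of (16): np.kron with a 2 × 2 all-ones matrix copies every entry of ∇S^2_q into a 2 × 2 block, and the factor 1/4 accounts for the averaging in the pooling layer.

    def upsample_grad(dS):
        """dS: (4, 4) gradient of a pooled map -> (8, 8) gradient of the conv map, eq. (16)."""
        return np.kron(dS, np.ones((2, 2))) / 4.0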
Now,

∇k^2_{p,q}(u, v) = ∂L / ∂k^2_{p,q}(u, v)
                 = Σ_{i=1}^{8} Σ_{j=1}^{8} (∂L / ∂C^2_q(i, j)) · (∂C^2_q(i, j) / ∂k^2_{p,q}(u, v))
                 = Σ_{i=1}^{8} Σ_{j=1}^{8} ∇C^2_q(i, j) · ∂/∂k^2_{p,q}(u, v) σ( Σ_{p′=1}^{6} Σ_{u′=−2}^{2} Σ_{v′=−2}^{2} S^1_{p′}(i − u′, j − v′) k^2_{p′,q}(u′, v′) + b^2_q )
                 = Σ_{i=1}^{8} Σ_{j=1}^{8} ∇C^2_q(i, j) C^2_q(i, j)(1 − C^2_q(i, j)) S^1_p(i − u, j − v)

Let C^2_{q,σ} denote the argument of σ in (5),

C^2_{q,σ}(i, j) = Σ_{p=1}^{6} Σ_{u=−2}^{2} Σ_{v=−2}^{2} S^1_p(i − u, j − v) k^2_{p,q}(u, v) + b^2_q,    (17)

so that C^2_q(i, j) = σ(C^2_{q,σ}(i, j)), and let ∇C^2_{q,σ}(i, j) = ∇C^2_q(i, j) C^2_q(i, j)(1 − C^2_q(i, j)).

To write this compactly as a convolution, we rotate S^1_p by 180°. So,

∇k^2_{p,q}(u, v) = Σ_{i=1}^{8} Σ_{j=1}^{8} S^1_{p,rot180}(u − i, v − j) ∇C^2_{q,σ}(i, j)    (18)

or

∇k^2_{p,q} = S^1_{p,rot180} ∗ ∇C^2_{q,σ}    (19)

1.2.6 ∇b^2_q (1 × 1)

This follows directly from the above derivation:

∇b^2_q = Σ_{i=1}^{8} Σ_{j=1}^{8} ∇C^2_{q,σ}(i, j)    (20)

The layer 2 gradient computation is finished.
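Putting (16)–(20) together, a sketch of the layer-2 gradient computation under the same conventions as the forward sketches above (valid-mode convolution with the 180°-rotated S^1_p implements (18)–(19); upsample_grad and convolve2d are reused from earlier).

    def layer2_grads(S1, C2, dS2):
        """S1: (6, 12, 12), C2: (12, 8, 8), dS2: (12, 4, 4) gradients of the pooled maps
        -> (grad_k2 of shape (6, 12, 5, 5), grad_b2 of shape (12,))."""
        dC2 = np.stack([upsample_grad(dS2[q]) for q in range(12)])   # equation (16)
        dC2_sigma = dC2 * C2 * (1.0 - C2)                            # gradient through the sigmoid
        grad_k2 = np.empty((6, 12, 5, 5))
        for p in range(6):
            S_rot = np.rot90(S1[p], 2)                               # rotate S1_p by 180 degrees
            for q in range(12):
                grad_k2[p, q] = convolve2d(S_rot, dC2_sigma[q], mode='valid')   # (18)-(19)
        grad_b2 = dC2_sigma.sum(axis=(1, 2))                         # equation (20)
        return grad_k2, grad_b2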

Homework 1 Following an approach similar to the layer-2 gradient computation described
here, compute the layer-1 gradients for the given architecture.

1.3 Updates
We update the parameters as follows, where η is the learning rate:

k^1_{1,p} ← k^1_{1,p} − η ∇k^1_{1,p}    (21)
b^1_p ← b^1_p − η ∇b^1_p    (22)
k^2_{p,q} ← k^2_{p,q} − η ∇k^2_{p,q}    (23)
b^2_q ← b^2_q − η ∇b^2_q    (24)
W ← W − η ∇W    (25)
b ← b − η ∇b    (26)
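As a sketch, the updates (21)–(26) are a single gradient-descent step applied to every parameter array; the dictionary layout, the names, and the value of η below are illustrative assumptions.

    def sgd_step(params, grads, eta=0.1):
        """Apply (21)-(26): theta <- theta - eta * grad for each parameter array, in place."""
        for name, g in grads.items():
            params[name] -= eta * g

    # params = {'k1': k1, 'b1': b1, 'k2': k2, 'b2': b2, 'W': W, 'b': b}
    # sgd_step(params, grads, eta=0.1)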

References
[1] Bouvrie, J. (2006). Notes on convolutional neural networks.

[2] Zhang, Z. (2016). Derivation of Backpropagation in Convolutional Neural Network (CNN).

[3] Ionescu, C., Vantzos, O., & Sminchisescu, C. (2015). Matrix backpropagation for deep
networks with structured layers. In Proceedings of the IEEE International Conference on
Computer Vision (pp. 2965-2973).
