Back Propagation in a Convolutional Neural Network∗
Tirtharaj Dash, BITS Pilani, Goa Campus
October 2, 2018
Abstract
In this report, we study how back-propagation is conducted in a simple convolutional
neural network that takes a two-dimensional input and has two convolutional layers.
We will see that the feed-forward computation in a CNN is similar to that in an MLP.
We then derive the filter-learning equations from the gradients of a squared loss.
1 CNN – Example
We will stick to the following CNN architecture:
Figure 1: CNN with two convolution layers. Obtained from the MATLAB example in the
Deep Learning Toolbox.
• Input: a 28 × 28 image, I.
• C1 layer: 6 convolution filters k^1_{1,p} of size 5 × 5 with biases b^1_p, giving 6 feature maps of size 24 × 24.
• S1 layer: 2 × 2 mean pooling, giving 6 maps of size 12 × 12.
• C2 layer: 12 filters k^2_{p,q} of size 5 × 5 × 6 with biases b^2_q, giving 12 feature maps of size 8 × 8.
• S2 layer: 2 × 2 mean pooling, giving 12 maps of size 4 × 4.
• FC layer: W : 10 × 192 and b: 10 × 1.
All the filters and weights are initialised from a uniform distribution. The biases
are initialised to 0. The total number of parameters in our network is:
(5 × 5 + 1) × 6 + (5 × 5 × 6 + 1) × 12 + 10 × 192 + 10 = 3898
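As a quick check, this count can be reproduced in Python (a minimal sketch of the arithmetic above):

```python
# Parameter count: C1 filters + biases, C2 filters + biases, FC weights + biases.
c1 = (5 * 5 + 1) * 6         # six 5x5 filters, one bias each
c2 = (5 * 5 * 6 + 1) * 12    # twelve 5x5x6 filters, one bias each
fc = 10 * 192 + 10           # W and b of the fully-connected layer
assert c1 + c2 + fc == 3898
```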
The feature maps of the C1 layer are computed as

$$C^1_p(i,j) = \sigma\!\left(\sum_{u=-2}^{2}\sum_{v=-2}^{2} I(i-u,\, j-v)\, k^1_{1,p}(u,v) + b^1_p\right)$$

with

$$\sigma(x) = \frac{1}{1 + e^{-x}},$$

where p = 1, ..., 6 because there are 6 feature maps (kernels or filters) in the C1 layer. Here
i, j are the row and column indices of the feature map. Each S1 map is the 2 × 2 mean-pooled
version of the corresponding C1 map,

$$S^1_p(i,j) = \frac{1}{4}\sum_{u=0}^{1}\sum_{v=0}^{1} C^1_p(2i-u,\, 2j-v),$$

where i, j = 1, 2, ..., 12.
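A minimal NumPy sketch of this feed-forward step, with placeholder data for the input `I`, one filter `k1_p`, and its bias `b1_p` (the names are illustrative; `convolve2d` flips the kernel, matching the (i − u, j − v) indexing above):

```python
import numpy as np
from scipy.signal import convolve2d

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

rng = np.random.default_rng(0)
I = rng.random((28, 28))                # input image (placeholder data)
k1_p = rng.uniform(-0.5, 0.5, (5, 5))   # one C1 filter
b1_p = 0.0                              # bias, initialised to 0

# C1: valid convolution with a 5x5 filter -> 24x24 map, then sigmoid
C1_p = sigmoid(convolve2d(I, k1_p, mode='valid') + b1_p)

# S1: 2x2 mean pooling over non-overlapping blocks -> 12x12 map
S1_p = C1_p.reshape(12, 2, 12, 2).mean(axis=(1, 3))
```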
1.1.5 Vectorisation and Concatenation
Each S^2_q is a 4 × 4 matrix. There are 12 such matrices in the S2 layer. Each matrix
is vectorised row-wise, and the resulting 12 vectors are concatenated. The size of the
flattened layer is 4 × 4 × 12 = 192. We denote this process as
$$f = F\left(\{S^2_q\}_{q=1,2,\dots,12}\right) \tag{7}$$
and the reverse process is:
$$\{S^2_q\}_{q=1,2,\dots,12} = F^{-1}(f) \tag{8}$$
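In NumPy, F and F^{-1} amount to a reshape pair. A sketch, assuming the twelve maps are stacked in an array `S2` of shape (12, 4, 4); row-major reshape matches the row-wise scan:

```python
import numpy as np

S2 = np.zeros((12, 4, 4))      # the twelve 4x4 S2 maps (placeholder values)
f = S2.reshape(192)            # F: vectorise each map row-wise, then concatenate
S2_back = f.reshape(12, 4, 4)  # F^{-1}: undo the flattening
assert np.array_equal(S2, S2_back)
```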
1.2 Back-propagation

1.2.1 ∇W (10 × 192)

The output of the fully-connected layer is

$$\hat{y}(i) = \sigma\!\left(\sum_{j=1}^{192} W(i,j)\, f(j) + b(i)\right), \qquad i = 1, \dots, 10, \tag{9}$$

and we use the squared loss

$$L = \frac{1}{2}\sum_{i=1}^{10} \left(y(i) - \hat{y}(i)\right)^2. \tag{10}$$

Then

$$\begin{aligned}
\nabla W(i,j) &= \frac{\partial L}{\partial W(i,j)} \\
&= \frac{\partial L}{\partial \hat{y}(i)} \cdot \frac{\partial \hat{y}(i)}{\partial W(i,j)} \\
&= (\hat{y}(i) - y(i)) \cdot \frac{\partial}{\partial W(i,j)}\, \sigma\!\left(\sum_{j'=1}^{192} W(i,j')\, f(j') + b(i)\right) \\
&= (\hat{y}(i) - y(i)) \cdot \hat{y}(i)(1 - \hat{y}(i)) \cdot f(j).
\end{aligned}$$

Writing

$$\nabla \hat{y}(i) = (\hat{y}(i) - y(i))\, \hat{y}(i)(1 - \hat{y}(i)), \tag{11}$$

this is, in matrix form,

$$\nabla W = \nabla \hat{y} \times f^T. \tag{12}$$
1.2.2 ∇b (10 × 1)
From the equation for ∇W, we see that ∇b is obtained directly, with the input f(j)
replaced by 1:

$$\nabla b = \nabla \hat{y}. \tag{13}$$
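Equations (12) and (13) in NumPy, as a sketch with placeholder data (the names are illustrative):

```python
import numpy as np

rng = np.random.default_rng(1)
f = rng.random(192)                        # flattened S2 activations
W, b = rng.random((10, 192)), np.zeros(10)
y = np.eye(10)[3]                          # a one-hot target (placeholder)
y_hat = 1.0 / (1.0 + np.exp(-(W @ f + b)))

grad_yhat = (y_hat - y) * y_hat * (1.0 - y_hat)  # eq. (11)
grad_W = np.outer(grad_yhat, f)                  # eq. (12): 10 x 192
grad_b = grad_yhat                               # eq. (13): 10 x 1
```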
1.2.3 ∇f (192 × 1)
$$\begin{aligned}
\nabla f(j) &= \frac{\partial L}{\partial f(j)} \\
&= \sum_{i=1}^{10} \frac{\partial L}{\partial \hat{y}(i)} \cdot \frac{\partial \hat{y}(i)}{\partial f(j)} \\
&= \sum_{i=1}^{10} (\hat{y}(i) - y(i))\, \frac{\partial}{\partial f(j)}\, \sigma\!\left(\sum_{j'=1}^{192} W(i,j')\, f(j') + b(i)\right) \\
&= \sum_{i=1}^{10} (\hat{y}(i) - y(i))\, \hat{y}(i)(1 - \hat{y}(i))\, W(i,j) \\
&= \sum_{i=1}^{10} \nabla \hat{y}(i)\, W(i,j).
\end{aligned}$$
Or, in matrix form,

$$\nabla f = W^T \times \nabla \hat{y}. \tag{14}$$
1.2.4 ∇S^2_q

$$\{\nabla S^2_q\}_{q=1,2,\dots,12} = F^{-1}(\nabla f) \tag{15}$$
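A sketch of equations (14) and (15), again with placeholder data; the reshape undoes the row-wise flattening:

```python
import numpy as np

rng = np.random.default_rng(1)
W = rng.random((10, 192))            # FC weights
grad_yhat = rng.random(10)           # placeholder for the result of eq. (11)

grad_f = W.T @ grad_yhat             # eq. (14): a 192-vector
grad_S2 = grad_f.reshape(12, 4, 4)   # eq. (15): one 4x4 gradient map per S2 map
```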
1.2.5 ∇k^2_{p,q} (5 × 5)
$$\nabla C^2_q(i,j) = \frac{1}{4}\, \nabla S^2_q\!\left(\lceil i/2 \rceil,\, \lceil j/2 \rceil\right), \tag{16}$$
i, j = 1, ..., 8. The above step upsamples the gradient with the help of a Kronecker
product, i.e.

$$\mathrm{up}(x) = x \otimes \mathbf{1}_{2\times 2}.$$
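In NumPy this upsampling is a Kronecker product with a 2 × 2 matrix of ones; a sketch for one 4 × 4 gradient map (placeholder data):

```python
import numpy as np

grad_S2_q = np.arange(16.0).reshape(4, 4)   # placeholder 4x4 gradient map

# eq. (16): spread each pooled gradient over its 2x2 block and divide by 4,
# since each S2 entry is the mean of four C2 entries.
grad_C2_q = 0.25 * np.kron(grad_S2_q, np.ones((2, 2)))   # 4x4 -> 8x8
```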
Now,
$$\begin{aligned}
\nabla k^2_{p,q}(u,v) &= \frac{\partial L}{\partial k^2_{p,q}(u,v)} \\
&= \sum_{i=1}^{8}\sum_{j=1}^{8} \frac{\partial L}{\partial C^2_q(i,j)} \cdot \frac{\partial C^2_q(i,j)}{\partial k^2_{p,q}(u,v)} \\
&= \sum_{i=1}^{8}\sum_{j=1}^{8} \nabla C^2_q(i,j)\, \frac{\partial}{\partial k^2_{p,q}(u,v)}\, \sigma\!\left(\sum_{p'=1}^{6}\sum_{u'=-2}^{2}\sum_{v'=-2}^{2} S^1_{p'}(i-u',\, j-v')\, k^2_{p',q}(u',v') + b^2_q\right) \\
&= \sum_{i=1}^{8}\sum_{j=1}^{8} \nabla C^2_q(i,j)\, C^2_q(i,j)(1 - C^2_q(i,j))\, S^1_p(i-u,\, j-v).
\end{aligned}$$

Let ∇C^2_{q,σ}(i,j) = ∇C^2_q(i,j) C^2_q(i,j)(1 − C^2_q(i,j)) denote the gradient with
respect to the pre-activation

$$C^2_{q,\sigma}(i,j) = \sum_{p=1}^{6}\sum_{u=-2}^{2}\sum_{v=-2}^{2} S^1_p(i-u,\, j-v)\, k^2_{p,q}(u,v) + b^2_q. \tag{17}$$
To write this as a convolution, we flip S^1_p by 180°. So,

$$\nabla k^2_{p,q}(u,v) = \sum_{i=1}^{8}\sum_{j=1}^{8} S^1_{p,\mathrm{rot}180}(u-i,\, v-j)\, \nabla C^2_{q,\sigma}(i,j) \tag{18}$$

$$\nabla k^2_{p,q} = S^1_{p,\mathrm{rot}180} * \nabla C^2_{q,\sigma} \tag{19}$$
1.2.6 ∇b^2_q (1 × 1)

This is obtained straightforwardly from the above equation:

$$\nabla b^2_q = \sum_{i=1}^{8}\sum_{j=1}^{8} \nabla C^2_{q,\sigma}(i,j) \tag{20}$$
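A sketch of equations (18)-(20) with 0-indexed arrays and placeholder data: the valid cross-correlation of S^1_p with ∇C^2_{q,σ} below is equivalent to the flipped convolution above, up to the index origin:

```python
import numpy as np

rng = np.random.default_rng(2)
S1_p = rng.random((12, 12))        # an S1 map
dC2_sigma = rng.random((8, 8))     # pre-activation gradient of one C2 map

# eqs. (18)-(19): each filter entry collects the products that used it
grad_k = np.zeros((5, 5))
for u in range(5):
    for v in range(5):
        grad_k[u, v] = np.sum(dC2_sigma * S1_p[u:u+8, v:v+8])

grad_b2_q = dC2_sigma.sum()        # eq. (20)
```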
Homework 1 Following an approach similar to the layer-2 gradient computation de-
scribed here, compute the layer-1 gradients for the given architecture.
1.3 Updates
We update the parameters as follows, where η is the learning rate:

$$k^1_{1,p} \leftarrow k^1_{1,p} - \eta \nabla k^1_{1,p} \tag{21}$$

$$b^1_p \leftarrow b^1_p - \eta \nabla b^1_p \tag{22}$$

$$k^2_{p,q} \leftarrow k^2_{p,q} - \eta \nabla k^2_{p,q} \tag{23}$$

$$b^2_q \leftarrow b^2_q - \eta \nabla b^2_q \tag{24}$$

$$W \leftarrow W - \eta \nabla W \tag{25}$$

$$b \leftarrow b - \eta \nabla b \tag{26}$$
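A one-step sketch of the updates for the FC parameters (eqs. (25)-(26)); the filter and bias updates (eqs. (21)-(24)) follow the same pattern, with a hypothetical learning rate:

```python
import numpy as np

eta = 0.1                                            # hypothetical learning rate
W, b = np.zeros((10, 192)), np.zeros(10)             # FC parameters
grad_W, grad_b = np.zeros((10, 192)), np.zeros(10)   # gradients from Section 1.2

W -= eta * grad_W   # eq. (25)
b -= eta * grad_b   # eq. (26)
```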