Deep Learning
Russ Salakhutdinov
Machine Learning Department
rsalakhu@cs.cmu.edu
http://www.cs.cmu.edu/~rsalakhu/10707/
Convolutional Networks I
Used Resources
Disclaimer: Much of the material in this lecture was borrowed from
Hugo Larochelle's class on Neural Networks:
https://sites.google.com/site/deeplearningsummerschool2016/
Local connectivity
Parameter sharing
Convolution
Pooling / subsampling hidden units
Local Connectivity
Use a local connectivity of hidden units
Each hidden unit is connected only to a
sub-region (patch) of the input image
Parameter Sharing
Share the matrix of parameters across certain units: units that are
organized into the same feature map share parameters.
Same color = same connection matrix: Wij is the matrix connecting
the ith input channel with the jth feature map.
Why parameter sharing?
It reduces the number of parameters even further.
Filter Bank Layer - FCSG (Jarrett et al., 2009)
The input of a filter bank layer is a 3D array with n1 2D feature maps
of size n2 x n3. Each component is denoted xijk, and each feature map
is denoted xi. The output is also a 3D array, y, composed of m1 feature
maps of size m2 x m3. A filter in the bank, kij, has size l1 x l2 and
connects input feature map xi to output feature map yj. The module
computes:

yj = gj tanh( Σi kij ∗ xi )    (a bias can also be added)

where:
- xi is the ith channel of the input
- kij is the convolution kernel: the matrix of weights Wij with its
  rows and columns flipped
- tanh is the hyperbolic tangent non-linearity, and ∗ is the 2D
  discrete convolution operator
- gj is a trainable (learned) scaling coefficient
- yj is the jth hidden feature map

Taking the borders into account, m2 = n2 − l1 + 1 and m3 = n3 − l2 + 1.
The layer is denoted FCSG because it is composed of a set of convolution
filters (C), a sigmoid/tanh non-linearity (S), and gain coefficients (G).
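A toy numpy sketch of this module, assuming random values and small hypothetical sizes (n1 = 2 input maps, m1 = 3 output maps, 3x3 kernels), with a minimal 'valid' convolution helper:

```python
import numpy as np

def conv_valid(x, k):
    """'Valid' 2D discrete convolution: correlate with the kernel
    flipped along both axes."""
    kf = k[::-1, ::-1]
    H, W = x.shape
    h, w = k.shape
    return np.array([[np.sum(x[i:i + h, j:j + w] * kf)
                      for j in range(W - w + 1)] for i in range(H - h + 1)])

# Filter-bank layer y_j = g_j * tanh(sum_i k_ij * x_i), toy random values:
rng = np.random.default_rng(1)
n1, m1, l = 2, 3, 3
x = rng.random((n1, 8, 8))              # n1 input maps of size 8x8
k = rng.random((m1, n1, l, l)) - 0.5    # one kernel per (output, input) pair
g = rng.random(m1)                      # trainable gains, one per output map

y = np.stack([g[j] * np.tanh(sum(conv_valid(x[i], k[j, i]) for i in range(n1)))
              for j in range(m1)])
# Border handling gives m2 = m3 = 8 - 3 + 1 = 6, so y has shape (3, 6, 6).
```

Note how each output map sums one convolution per input channel before the tanh and gain, exactly as in the formula above.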
Discrete Convolution
The convolution of an image x with a kernel k is computed as follows:

(x ∗ k)ij = Σp,q x(i+p, j+q) k(r−p, r−q)

Equivalently, (x ∗ k)ij = Σp,q x(i+p, j+q) k̃(p, q), where
k̃ = k with its rows and columns flipped.
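The definition above can be sketched directly in numpy; the image and kernel values below are a small made-up example (a diagonal line of 255s and an anti-diagonal kernel of 0.5s):

```python
import numpy as np

def conv2d_valid(x, k):
    """Discrete 'valid' convolution: flip the kernel's rows and columns,
    then slide it over the image and take dot products."""
    kf = k[::-1, ::-1]                       # k-tilde: the flipped kernel
    H, W = x.shape
    h, w = kf.shape
    out = np.empty((H - h + 1, W - w + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(x[i:i + h, j:j + w] * kf)
    return out

x = np.array([[0, 0, 255],
              [0, 255, 0],
              [255, 0, 0]], dtype=float)    # image with a diagonal line
k = np.array([[0.0, 0.5],
              [0.5, 0.0]])                  # anti-diagonal averaging kernel
out = conv2d_valid(x, k)                    # responds where the line matches
```

The output is largest (255 = 0.5·255 + 0.5·255) exactly where the kernel pattern lines up with the image.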
Discrete Convolution - Example
[Figure: a numeric worked example — an image containing a line of 255s is
convolved with a small kernel of 0.5 weights, producing responses of 128
(0.5 x 255) along the line.]
Example
With a non-linearity, we get a detector of the feature at any
position in the image.
Example
[Figure: computation of a "simple cell" layer toward the hidden neurons —
the first step computes the convolution x ∗ kij, where kij is the weight
matrix Wij with its rows and columns flipped; same numeric example as above.]
Example
Zero padding can be used to allow the kernel to go over the borders.
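A small numpy sketch of zero padding for a "same"-size convolution, using a made-up 2x2 image and a 3x3 all-ones kernel (symmetric, so flipping it is a no-op):

```python
import numpy as np

x = np.array([[1., 2.],
              [3., 4.]])
k = np.ones((3, 3))

xp = np.pad(x, 1)          # one ring of zeros on every side -> 4x4
# slide the (flipped) 3x3 kernel over the padded image; the output
# keeps the input's 2x2 shape instead of shrinking
out = np.array([[np.sum(xp[i:i + 3, j:j + 3] * k[::-1, ::-1])
                 for j in range(x.shape[1])] for i in range(x.shape[0])])
```

Without the padding, a 3x3 kernel could not even be placed on a 2x2 image; with it, every output position is defined.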
Multiple Feature Maps
Example: for a 200x200 image with 100 filters of size 10x10,
parameter sharing gives only 100 x 10 x 10 = 10K parameters.
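The arithmetic behind this example, plus the contrast with an unshared (locally connected) layer of the same shape — the unshared count is an illustrative comparison, not a figure from the slide:

```python
# Parameter count for the slide's example: a 200x200 image, 100 feature
# maps, and 10x10 filters shared across all image positions.
img_h = img_w = 200
filters, fh, fw = 100, 10, 10

shared = filters * fh * fw                    # 10,000 weights with sharing

# Without sharing, each valid output position would need its own
# 10x10 filter ('valid' output size is 191x191 per map):
out_h = img_h - fh + 1
out_w = img_w - fw + 1
unshared = filters * out_h * out_w * fh * fw  # hundreds of millions
```

Sharing one 10x10 filter per feature map is what keeps the count at 10K regardless of image size.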
Computer Vision
Our goal is to design neural networks that are specifically
adapted for such problems:
- They must deal with very high-dimensional inputs: 150 x 150 pixels =
  22,500 inputs, or 3 x 22,500 for RGB
- They can exploit the 2D topology of pixels (or 3D for video data)
- They can build in invariance to certain variations: translation,
  illumination, etc.
Pooling
Pool hidden units in the same neighborhood; pooling is performed
in non-overlapping neighborhoods (subsampling).
Pooling
Max pooling of hidden units in the same neighborhood:

yijk = max over p,q of x(i, j+p, k+q)

An alternative to max pooling is average pooling.
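Both variants reduce to a reshape-and-reduce in numpy; a minimal sketch for non-overlapping neighborhoods (the 4x4 input values are made up):

```python
import numpy as np

def max_pool(x, m):
    """Non-overlapping m x m max pooling of a 2D feature map
    (assumes both dimensions are divisible by m)."""
    H, W = x.shape
    return x.reshape(H // m, m, W // m, m).max(axis=(1, 3))

def avg_pool(x, m):
    """Average-pooling alternative over the same neighborhoods."""
    H, W = x.shape
    return x.reshape(H // m, m, W // m, m).mean(axis=(1, 3))

x = np.array([[1, 2, 0, 1],
              [3, 4, 1, 0],
              [0, 0, 9, 2],
              [1, 1, 2, 3]], dtype=float)
pooled = max_pool(x, 2)    # 2x2 output: one value per 2x2 neighborhood
```

Each output unit summarizes one neighborhood, so a 4x4 map subsamples to 2x2.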
Why pooling?
- Introduces invariance to local translations (a "simple cell" layer
  followed by a "complex cell" layer)
- Reduces the number of hidden units in the hidden layer
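The invariance claim can be checked directly: shifting a feature by one pixel within a pooling window leaves the max-pooled output unchanged (toy made-up values):

```python
import numpy as np

# A feature at column 0 vs. shifted to column 1: both positions fall in
# the same 2x2 pooling window, so the pooled output is identical.
a = np.array([[9, 0, 0, 0],
              [0, 0, 0, 0]])
b = np.roll(a, 1, axis=1)    # feature moved one pixel to the right

def pool(z):
    """Non-overlapping 2x2 max pooling."""
    H, W = z.shape
    return z.reshape(H // 2, 2, W // 2, 2).max(axis=(1, 3))
```

Shifts that cross a window boundary do change the output, which is why the invariance is only local.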
Example: Pooling
Gradient of Pooling Layer
Let l be the loss function.
For the max pooling operation yijk = max over p,q of x(i, j+p, k+q),
the gradient for x(i, j+p, k+q) is zero everywhere except at the
winning unit (p, q) = argmax over p,q of x(i, j+p, k+q), which receives
the gradient of yijk. In other words, only the winning units in layer x
get the gradient from the pooled layer.
For the average pooling operation
yijk = (1/m^2) Σp,q x(i, j+p, k+q),
the gradient for xijk is

∇x l = (1/m^2) upsample(∇y l)

where upsample inverts the subsampling.
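A minimal numpy sketch of the max-pooling backward rule (the helper name max_pool_grad and the toy values are illustrative):

```python
import numpy as np

def max_pool_grad(x, dy, m):
    """Backprop through non-overlapping m x m max pooling: route each
    pooled gradient dy[i, j] to the winning unit of its window; all
    other ('losing') units receive zero gradient."""
    dx = np.zeros_like(x, dtype=float)
    for i in range(x.shape[0] // m):
        for j in range(x.shape[1] // m):
            block = x[i * m:(i + 1) * m, j * m:(j + 1) * m]
            p, q = np.unravel_index(np.argmax(block), block.shape)
            dx[i * m + p, j * m + q] = dy[i, j]
    return dx

x = np.array([[1., 2.],
              [3., 4.]])
dy = np.array([[5.]])            # gradient arriving from the pooled layer
dx = max_pool_grad(x, dy, 2)     # only the winner (the 4) gets gradient
```

For average pooling, the corresponding rule is `np.kron(dy, np.ones((m, m))) / m**2` — the upsample-then-scale formula above.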
Convolutional Network
A convolutional neural network alternates between convolutional
and pooling layers.
Figure 1. An example of a feature extraction stage of the type
FCSG − Rabs − N − PA: an input image (or a feature map) is passed
through a non-linear filterbank, followed by rectification, local
contrast normalization, and spatial pooling/sub-sampling
(Jarrett et al., 2009).

Local Contrast Normalization
Perform local contrast normalization:

vijk = xijk − Σipq wpq x(i, j+p, k+q)
yijk = vijk / max(c, σjk)
σjk = ( Σipq wpq v(i, j+p, k+q)^2 )^(1/2)

where c is a small constant to prevent division by 0.
Why local contrast normalization?
- Reduces a unit's activation if its neighbors are also active
  (creates competition between feature maps)
- Scales activations at each layer (better for learning)
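A simplified numpy sketch of these two steps for a single 2D map, assuming a uniform m x m weighting window and zero padding at the borders (the slide's version sums over channels i as well, and a Gaussian window is typical in practice):

```python
import numpy as np

def lcn(x, m=3, c=1e-4):
    """Local contrast normalization sketch: subtract the local weighted
    mean (subtractive step), then divide by the local magnitude
    (divisive step), clamped below by c."""
    pad = m // 2
    xp = np.pad(x, pad)
    v = np.empty_like(x, dtype=float)
    for i in range(x.shape[0]):
        for j in range(x.shape[1]):
            v[i, j] = x[i, j] - xp[i:i + m, j:j + m].mean()   # subtractive
    vp = np.pad(v, pad)
    sig = np.empty_like(x, dtype=float)
    for i in range(x.shape[0]):
        for j in range(x.shape[1]):
            sig[i, j] = np.sqrt((vp[i:i + m, j:j + m] ** 2).mean())
    return v / np.maximum(c, sig)                             # divisive
```

The max(c, σ) clamp is what prevents division by zero in flat regions of the map.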