Академический Документы
Профессиональный Документы
Культура Документы
MIT 6.S191
Ava Soleimany
January 29, 2019
6.S191 Introduction to Deep Learning
1/29/19
introtodeeplearning.com
What Computers “See”
Images are Numbers
Lincoln 0.8
Washington 0.1
classification
Jefferson 0.05
Obama 0.05
Problems?
How can we use spatial structure in the input to inform the architecture of the network?
filters
= 9
filter
image
We slide the 3x3 filter over the input image, element-wise multiply, and add the outputs…
' '
4x4 filter: matrix 1) applying a window of weights
$ $ !"# (")*,#), + .
of weights !"# 2) computing linear combinations
"%& #%&
3) activating with non-linear function
for neuron (p,q) in hidden layer
6.S191 Introduction to Deep Learning
1/29/19
introtodeeplearning.com
CNNs: Spatial Arrangement of Output Volume
depth
Layer Dimensions:
ℎ"#"$
where h and w are spatial dimensions
d (depth) = number of filters
height
Stride:
Filter step size
Receptive Field:
Locations in input image that
width a node is path connected to
1) Reduced dimensionality
2) Spatial invariance
Classification task: produce a list of object categories present in image. 1000 categories.
“Top 5 error”: rate at which the model does not output correct label in top 5 predictions
Other tasks include:
single-object localization, object detection from video/image, scene classification, scene parsing
2013: ZFNet
20
- 8 layers, more filters
16.4 2014:VGG
- 19 layers
11.7 2014: GoogLeNet
10
6.7 - “Inception” modules
5.1 - 22 layers, 5million parameters
3.57
2015: ResNet
0 - 152 layers
10
11
12
13
14
15
an
20
20
20
20
20
20
um
H
number of layers
20
100
16.4
11.7
10 50
7.3 6.7
5.1
3.57
0 0
10
11
12
13
14
14
15
an
10
11
12
13
14
14
15
um
20
20
20
20
20
20
20
20
20
20
20
20
20
20
H
Automobile
Bird
Cat
Deer
Dog
MNIST: handwritten digits
Frog
Horse
Ship
ImageNet:
22K categories. 14M images. Truck
places: natural scenes
CIFAR-10
6.S191 Introduction to Deep Learning
1/29/19
introtodeeplearning.com
Deep Learning for Computer Vision: Impact