Joan Bruna (Berkeley), Raja Giryes (Duke), Guillermo Sapiro (Duke), Rene Vidal (Johns Hopkins)
Tutorial Schedule
14:00-14:30: Introduction
14:30-15:15: Global Optimality in Deep Learning (Rene Vidal)
15:15-16:00: Coffee Break
16:00-16:45: Scattering Convolutional Networks (Joan Bruna)
16:45-17:30: Stability of Deep Networks (Raja Giryes)
17:30-18:00: Questions and Discussion
Disclaimer
What do we mean by Deep Learning in this tutorial?
(Yann's Family)
CNN
non-CNN
2015 results: MSRA under 3.5% error (using a CNN with 150 layers!)
figures from Yann LeCun's CVPR'15 plenary
Puzzling Questions
What made this result possible?
Larger training sets (1.2 million high-resolution training samples, 1000 object categories)
Better Hardware (GPU)
Better Learning Regularization (Dropout)
$$\sup_{x \in I_m} |f(x) - F(x)| < \varepsilon.$$

The approximation plus estimation error scales as

$$O\left(\frac{C_f^2}{N}\right) + O\left(\frac{Nm}{K}\log K\right),$$

where K is the number of training points, N is the number of neurons, m is the input dimension, and C_f measures the global smoothness of f.
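As a sanity check on the tradeoff in this bound, the sketch below evaluates both terms numerically. The constants (C_f = 10, m = 32, K = 100,000) and the hidden O(·) factors are illustrative assumptions, not values from the tutorial.

```python
import math

def approx_error(C_f, N):
    # Approximation term O(C_f**2 / N): shrinks as the network grows.
    return C_f**2 / N

def estim_error(N, m, K):
    # Estimation term O((N * m / K) * log K): grows with network size.
    return (N * m / K) * math.log(K)

def total_bound(N, C_f=10.0, m=32, K=100_000):
    return approx_error(C_f, N) + estim_error(N, m, K)

# The bound is minimized at an intermediate width N: too few neurons
# leave a large approximation error, while too many inflate the
# estimation error for a fixed number of training points K.
best_N = min(range(1, 5000), key=total_bound)
```

The balance point moves with K: more training data allows a wider network before the estimation term dominates.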
Non-Convex Optimization
[Choromaska et al, AISTATS15] (also [Dauphin et al, ICML15] ) use tools
from Statistical Physics to explain the behavior of stochastic gradient methods
when training deep neural networks.
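To see the kind of behavior being analyzed, here is a minimal toy illustration (not the statistical-physics analysis itself): stochastic gradient descent on a one-dimensional non-convex loss with two minima. The loss, step size, and noise level are all invented for illustration.

```python
import random

def grad(w):
    # Gradient of the toy non-convex loss f(w) = w**4 - 3*w**2 + w,
    # which has two local minima of different depths (near w = 1.13
    # and w = -1.30) separated by a local maximum.
    return 4 * w**3 - 6 * w + 1

def sgd(w0, lr=0.01, noise=0.5, steps=5000, seed=0):
    # Gaussian noise added to the gradient stands in for the
    # stochasticity of mini-batch sampling.
    rng = random.Random(seed)
    w = w0
    for _ in range(steps):
        w -= lr * (grad(w) + rng.gauss(0.0, noise))
    return w

# From a generic start the iterate settles near one of the two minima,
# fluctuating around it at a scale set by the noise and step size.
w_final = sgd(2.0)
```

With a small step size the noise is too weak to carry the iterate over the barrier between the basins, so which minimum is reached depends on the initialization.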
Tutorial Outline
Part II: Signal Recovery from Scattering Convolutional Networks (Joan Bruna)

For plateaus (a,c), for which there is no local descent direction, there is a simple method to find the edge of the plateau, from which there will be a descent direction (green points). Taken together, these results will imply a theoretical meta-algorithm that is …
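The theoretical meta-algorithm above — take local descent steps while a descent direction exists, and walk to a plateau's edge when stuck on a flat region — can be sketched as follows. This is a hypothetical toy version: meta_descent, the plateau-edge oracle find_plateau_edge, and the one-dimensional loss are illustrative assumptions, not the framework's actual procedure.

```python
def meta_descent(w, loss, grad, find_plateau_edge,
                 steps=1000, lr=0.1, tol=1e-8):
    # Sketch of the meta-algorithm: ordinary local descent, plus a
    # plateau-edge oracle to escape flat regions that are not minima.
    for _ in range(steps):
        g = grad(w)
        if abs(g) >= tol:
            w -= lr * g  # a descent direction exists: take a step
            continue
        # Flat point: jump to the plateau edge only if that does not
        # increase the loss and a descent direction exists there.
        e = find_plateau_edge(w)
        if loss(e) <= loss(w) and abs(grad(e)) >= tol:
            w = e
        else:
            return w  # a minimum (or a plateau with no downhill exit)
    return w

# Toy loss: flat at value 1 for |w| >= 1, quadratic bowl inside.
def toy_loss(w):
    return min(1.0, w * w)

def toy_grad(w):
    return 2.0 * w if abs(w) < 1.0 else 0.0

def toy_edge(w):
    # For this toy loss the plateau is |w| >= 1; its edge is at |w| = 1.
    return 0.999 if w > 0 else -0.999

w_star = meta_descent(3.0, toy_loss, toy_grad, toy_edge)
```

Started on the plateau at w = 3, the sketch jumps to the plateau edge and then descends to the global minimum at w = 0.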
Key Results
Deep learning is a positively homogeneous factorization problem
With proper regularization, local minima are global
If the network is large enough, global minima can be found by local descent
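The first of these results rests on positive homogeneity: for a network of L positively homogeneous layers (e.g. linear maps and ReLUs), scaling every layer's weights by alpha > 0 scales the output by alpha**L. A minimal check, with a hypothetical two-layer ReLU network and made-up weights:

```python
def relu(v):
    return [max(0.0, x) for x in v]

def matvec(W, v):
    return [sum(wij * vj for wij, vj in zip(row, v)) for row in W]

def net(W1, W2, x):
    # Two-layer ReLU network: f(x; W1, W2) = W2 @ relu(W1 @ x)
    return matvec(W2, relu(matvec(W1, x)))

def scale(W, a):
    return [[a * w for w in row] for row in W]

# Made-up weights for the illustration.
W1 = [[1.0, -2.0], [0.5, 3.0]]
W2 = [[2.0, -1.0]]
x = [1.0, 2.0]

# Degree-2 positive homogeneity: scaling both layers by alpha > 0
# scales the output by alpha**2 (ReLU commutes with positive scaling).
alpha = 3.0
lhs = net(scale(W1, alpha), scale(W2, alpha), x)[0]
rhs = alpha**2 * net(W1, W2, x)[0]
```

The identity holds only for alpha > 0: a negative scaling flips which units the ReLU zeroes out.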
Critical Points of Non-Convex Function
Figure 4.1: Left: Example critical points of a non-convex function (shown in red). (a) Saddle plateau, (b,d) Global minima, (c,e,g) Local maxima, (f,h) Local minima, (i, right panel) Saddle point. Right: Guaranteed properties of our framework. From any initialization a non-increasing path exists to a global minimum. From points on a flat plateau a simple method exists to find the edge of the plateau (green points).
Key Results

[Figure: original image vs. single-layer recovery, O(log N), vs. scattering recovery, O((log N)^2)]