
Deep Learning II

Russ Salakhutdinov
Department of Statistics and Computer Science
University of Toronto
http://www.utstat.toronto.edu/~rsalakhu/isbi.html
Talk Roadmap
Deep Boltzmann Machines
One-Shot and Transfer Learning
Learning Structured and Robust Deep Models
Advanced Deep Models
Multimodal Learning
Conclusions
[Figure: side-by-side diagrams of a Deep Belief Network and a Deep Boltzmann Machine, each with visible layer v, hidden layers h1, h2, h3, and weight matrices W1, W2, W3.]
Deep Belief Network vs. Deep Boltzmann Machine
DBNs vs. DBMs
DBNs are hybrid models:
Inference in DBNs is problematic due to explaining away.
Only greedy pretraining, no joint optimization over all layers.
Approximate inference is feed-forward: no bottom-up and top-down.
Mathematical Formulation
Deep Boltzmann Machine
[Figure: three-layer DBM over an input image, with visible layer v, hidden layers h1, h2, h3, and weights W1, W2, W3.]
Joint distribution, with model parameters θ = {W1, W2, W3}:
P(v, h1, h2, h3) ∝ exp( v'W1 h1 + h1'W2 h2 + h2'W3 h3 )
Bottom-up and top-down: each hidden layer receives input from both the layer below and the layer above.
Unlike many existing feed-forward models (ConvNet (LeCun), HMAX (Poggio et al.), Deep Belief Nets (Hinton et al.)):
Dependencies between hidden variables.
All connections are undirected.
Mathematical Formulation
Deep Boltzmann Machine
[Figure: three-layer DBM over an input image (v, h1, h2, h3; W1, W2, W3).]
Conditional distributions:
p(h1_j = 1 | v, h2) = σ( Σ_i W1_ij v_i + Σ_k W2_jk h2_k )
p(h2_k = 1 | h1, h3) = σ( Σ_j W2_jk h1_j + Σ_l W3_kl h3_l )
p(v_i = 1 | h1) = σ( Σ_j W1_ij h1_j )
Note that exact computation of the posterior p(h | v) is intractable.
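The conditionals above can be turned directly into a block-Gibbs sampler: the odd layers (h1) and the even layers (v, h2) are conditionally independent given each other. A minimal numpy sketch for a two-hidden-layer binary DBM follows; the function names, shapes, and tiny random weights are illustrative assumptions, not the tutorial's actual code.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def p_h1_given_v_h2(v, h2, W1, W2):
    # Middle-layer posterior: each unit in h1 combines a bottom-up
    # signal (W1' v) with a top-down signal (W2 h2).
    return sigmoid(W1.T @ v + W2 @ h2)

def gibbs_sweep(v, h1, h2, W1, W2, rng):
    # One block-Gibbs sweep: sample h1 given (v, h2), then v and h2
    # given h1 (odd and even layers are conditionally independent).
    h1 = (rng.random(h1.shape) < p_h1_given_v_h2(v, h2, W1, W2)).astype(float)
    v = (rng.random(v.shape) < sigmoid(W1 @ h1)).astype(float)
    h2 = (rng.random(h2.shape) < sigmoid(W2.T @ h1)).astype(float)
    return v, h1, h2
```

Running `gibbs_sweep` repeatedly simulates the Markov chain used later in the learning algorithm.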
[Figure: the same three-layer DBM shown next to a feed-forward neural network mapping input to output through layers h1, h2, h3.]
Mathematical Formulation
Deep Belief Network
[Figure: Deep Belief Network over an input image (v, h1, h2, h3; W1, W2, W3).]
Unlike many existing feed-forward models: ConvNet (LeCun), HMAX (Poggio), Deep Belief Nets (Hinton).
Mathematical Formulation
Deep Boltzmann Machine vs. Deep Belief Network
[Figure: side-by-side diagrams (v, h1, h2, h3; W1, W2, W3).]
Inference
[Figure: DBM (v, h1, h2, h3; W1, W2, W3) next to a feed-forward neural network from input to output.]
Unlike many existing feed-forward models: ConvNet (LeCun), HMAX (Poggio), Deep Belief Nets (Hinton).
Mathematical Formulation
Deep Boltzmann Machine (dependencies between hidden variables; model parameters θ)
Maximum likelihood learning, the standard learning rule for undirected graphical models (MRFs, CRFs, factor graphs):
∂ log P(v; θ)/∂θ = E_{P(h|v)}[ s(v, h) ] − E_{P(v,h)}[ s(v, h) ],
where s(v, h) collects the sufficient statistics, e.g. v h1' for W1.
Problem: Both expectations are intractable!
Approximate Learning
(Approximate) Maximum Likelihood:
The data-dependent expectation (the posterior over hidden variables is not factorial any more!) is approximated with Variational Inference.
The data-independent, model expectation is approximated with Stochastic Approximation (MCMC-based).
Both expectations are intractable!
[Figure: DBM diagram, with the data-dependent term tied to the training data and the model term tied to samples simulated from the model.]
Previous Work
Many approaches for learning Boltzmann machines have been proposed over the last 20 years:
Hinton and Sejnowski (1983)
Peterson and Anderson (1987)
Galland (1991)
Kappen and Rodriguez (1998)
Lawrence, Bishop, and Jordan (1998)
Tanaka (1998)
Welling and Hinton (2002)
Zhu and Liu (2002)
Welling and Teh (2003)
Yasuda and Tanaka (2009)
Many of the previous approaches were not successful for learning general Boltzmann machines with hidden variables.
Real-world applications: thousands of hidden and observed variables with millions of parameters.
Algorithms based on Contrastive Divergence, Score Matching, Pseudo-Likelihood, Composite Likelihood, MCMC-MLE, and Piecewise Learning cannot handle multiple layers of hidden variables.
New Learning Algorithm (Salakhutdinov, 2008; NIPS 2009)
Posterior inference (conditional): approximate the conditional with data-dependent statistics.
Simulate from the model (unconditional): approximate the joint distribution with data-independent statistics.
Match the data-dependent and data-independent statistics (density matching).
Key idea of our approach:
Data-dependent: Variational Inference (mean-field theory).
Data-independent: Stochastic Approximation (MCMC-based, Markov Chain Monte Carlo).
Sampling from DBMs
Sampling from a two-hidden-layer DBM by running a Markov chain:
randomly initialize (v, h1, h2) at time t = 1, then sample and update at t = 2, t = 3, ...
Stochastic Approximation
Update θ_t and the chain state x_t = (v, h1, h2) sequentially:
Generate x_{t+1} by simulating from a Markov chain that leaves P(·; θ_t) invariant (e.g. a Gibbs or Metropolis-Hastings sampler).
Update θ_{t+1} by replacing the intractable model expectation with a point estimate at x_{t+1}.
In practice we simulate several Markov chains in parallel.
(Robbins and Monro, Ann. Math. Stats, 1951; L. Younes, Probability Theory, 1989)
The update rule decomposes into the true gradient plus a noise term.
Almost sure convergence guarantees as the learning rate decreases to zero.
Connections to the theory of stochastic approximation and adaptive MCMC.
(Salakhutdinov, ICML 2010)
Problem: For high-dimensional data, the energy landscape is highly multimodal.
Key insight: The transition operator can be any valid transition operator, e.g. Tempered Transitions or Parallel/Simulated Tempering.
Posterior Inference
Mean-Field Variational Inference
Approximate the intractable distribution P(h | v; θ) with a simpler, tractable distribution Q(h; μ).
Variational lower bound: minimize the KL divergence between the approximating and true distributions with respect to the variational parameters μ.
Mean-Field: choose a fully factorized distribution Q(h; μ) = Π_j q(h_j), with q(h_j = 1) = μ_j.
This yields nonlinear fixed-point equations for the variational parameters.
Variational Inference: maximize the lower bound w.r.t. the variational parameters μ.
(Salakhutdinov, 2008; Salakhutdinov & Larochelle, AI & Statistics 2010)
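For a two-hidden-layer binary DBM, the mean-field fixed-point equations couple the two layers and are simply iterated to convergence. A minimal sketch, assuming the shapes and update order below (function names are hypothetical):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def mean_field(v, W1, W2, n_iters=50):
    # Fully factorized Q(h; mu): iterate the coupled fixed-point equations
    #   mu1 = sigmoid(W1' v + W2 mu2),  mu2 = sigmoid(W2' mu1)
    # until the variational parameters stop changing.
    mu1 = np.full(W1.shape[1], 0.5)
    mu2 = np.full(W2.shape[1], 0.5)
    for _ in range(n_iters):
        mu1 = sigmoid(W1.T @ v + W2 @ mu2)
        mu2 = sigmoid(W2.T @ mu1)
    return mu1, mu2
```

Note that mu1 receives both a bottom-up term (from v) and a top-down term (from mu2), which is exactly the bottom-up + top-down inference the slides emphasize.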
Posterior Inference: the full learning loop alternates two steps.
1. Variational Inference: maximize the variational lower bound w.r.t. the variational parameters (mean-field).
2. MCMC: apply stochastic approximation to update the model parameters (unconditional simulation).
Almost sure convergence guarantees to an asymptotically stable point.
Fast inference. Learning can scale to millions of examples.
Good Generative Model?
Handwritten Characters: real data vs. simulated samples.
MNIST Handwritten Digit Dataset.
Handwriting Recognition
MNIST Dataset (60,000 examples of 10 digits):
Learning Algorithm                         Error
Logistic regression                        12.0
K-NN                                        3.09
Neural Net (Platt 2005)                     1.53
SVM (Decoste et al. 2002)                   1.40
Deep Autoencoder (Bengio et al. 2007)       1.40
Deep Belief Net (Hinton et al. 2006)        1.20
DBM                                         0.95

Optical Character Recognition (42,152 examples of 26 English letters):
Learning Algorithm                         Error
Logistic regression                        22.14
K-NN                                       18.92
Neural Net                                 14.62
SVM (Larochelle et al. 2009)                9.70
Deep Autoencoder (Bengio et al. 2007)      10.05
Deep Belief Net (Larochelle et al. 2009)    9.68
DBM                                         8.40

Permutation-invariant version.
Generative Model of 3-D Objects
24,000 examples; 5 object categories, 5 different objects within each category, 6 lighting conditions, 9 elevations, 18 azimuths.
3-D Object Recognition (permutation-invariant version):
Learning Algorithm                         Error
Logistic regression                        22.5
K-NN (LeCun 2004)                          18.92
SVM (Bengio & LeCun 2007)                  11.6
Deep Belief Net (Nair & Hinton 2009)        9.0
DBM                                         7.2

Pattern Completion.
Learning Hierarchical Representations
Deep Boltzmann Machines learn hierarchical structure in features: edges, combinations of edges.
They perform well in many application domains.
Fast inference: a fraction of a second.
Learning scales to millions of examples.
The Shape Boltzmann Machine: a Strong Model of Object Shape (Eslami, Heess, Winn, CVPR 2012).
Hallucinations in Charles Bonnet Syndrome Induced by Homeostasis: a Deep Boltzmann Machine Model (Reichert, Series, Storkey, NIPS 2012).
Need more structured and robust models.
Demo: DBM.
Talk Roadmap
Deep Boltzmann Machines
One-Shot and Transfer Learning
Learning Structured and Robust Deep Models
Advanced Deep Models
Multimodal Learning
Conclusions
One-Shot Learning
How can we learn a novel concept, a high-dimensional statistical object, from few examples?
"zarc"  "segway"
Supervised Learning: Segway vs. Motorcycle. Test.
Learning to Learn
Millions of unlabeled images, plus some labeled images (Bicycle, Elephant, Dolphin, Tractor), form the background knowledge.
Learn to transfer knowledge: learn a novel concept from one example. Test.
A key problem in computer vision, speech perception, natural language processing, and many other domains.
One shot learning of simple visual concepts
Brenden M. Lake, Ruslan Salakhutdinov, Jason Gross, and Joshua B. Tenenbaum
Department of Brain and Cognitive Sciences
Massachusetts Institute of Technology

Abstract
People can learn visual concepts from just one example, but it remains a mystery how this is accomplished. Many authors have proposed that transferred knowledge from more familiar concepts is a route to one shot learning, but what is the form of this abstract knowledge? One hypothesis is that the sharing of parts is core to one shot learning, and we evaluate this idea in the domain of handwritten characters, using a massive new dataset. These simple visual concepts have a rich internal part structure, yet they are particularly tractable for computational models. We introduce a generative model of how characters are composed from strokes, where knowledge from previous characters helps to infer the latent strokes in novel characters. The stroke model outperforms a competing state-of-the-art character model on a challenging one shot learning task, and it provides a good fit to human perceptual data.
Keywords: category learning; transfer learning; Bayesian modeling; neural networks

A hallmark of human cognition is learning from just a few examples. For instance, a person only needs to see one Segway to acquire the concept and be able to discriminate future Segways from other vehicles like scooters and unicycles (Fig. 1 left). Similarly, children can acquire a new word from one encounter (Carey & Bartlett, 1978). How is one shot learning possible?
New concepts are almost never learned in a vacuum. Past experience with other concepts in a domain can support the rapid learning of novel concepts, by showing the learner what matters for generalization. Many authors have suggested this as a route to one shot learning: transfer of abstract knowledge from old to new concepts, often called transfer learning, representation learning, or learning to learn. But what is the nature of the learned abstract knowledge that lets humans acquire new object concepts so quickly?
The most straightforward proposals invoke attentional learning (Smith, Jones, Landau, Gershkoff-Stowe, & Samuelson, 2002) or overhypotheses (Kemp, Perfors, & Tenenbaum, 2007; Dewar & Xu, in press), like the shape bias in word learning. Prior experience with concepts that are clearly organized along one dimension (e.g., shape, as opposed to color or material) draws a learner's attention to that same dimension (Smith et al., 2002) or increases the prior probability of new concepts concentrating on that same dimension (Kemp et al., 2007). But this approach is limited since it requires that the relevant dimensions of similarity be defined in advance.
For many real-world concepts, the relevant dimensions of similarity may be constructed in the course of learning to learn. For instance, when we first see a Segway, we may parse it into a structure of familiar parts arranged in a novel configuration: it has two wheels, connected by a platform, supporting a motor and a central post at the top of which are two handlebars. These parts and their relations comprise a
Figure 1: Test yourself on one shot learning. From the example boxed in red, can you find the others in the array? On the left is a Segway and on the right is the first character of the Bengali alphabet. Answer for the Bengali character: Row 2, Column 3; Row 4, Column 2.
Figure 2: Examples from a new 1600 character database.
useful representational basis for many different vehicle and artifact concepts, a representation that is likely learned in the course of learning the concepts that they support. Several papers from the recent machine learning and computer vision literature argue for such an approach: joint learning of many concepts and a high-level part vocabulary that underlies those concepts (e.g., Torralba, Murphy, & Freeman, 2007; Fei-Fei, Fergus, & Perona, 2006). Another recently popular machine learning approach is based on deep learning (Salakhutdinov & Hinton, 2009): unsupervised learning of hierarchies of distributed feature representations in neural-network-style probabilistic generative models. These models do not specify explicit parts and structural relations, but they can still construct meaningful representations of what makes two objects deeply similar that go substantially beyond low-level image features.
These approaches from machine learning may be compelling ways to understand how humans learn so quickly, but there is little experimental evidence that directly supports them. Models that construct parts or features from sensory data (pixels) while learning object concepts have been tested in elegant behavioral experiments with very simple stimuli and a very small number of concepts (Austerweil & Griffiths, 2009; Schyns, Goldstone, & Thibaut, 1998). But there have been few systematic comparisons of multiple state-of-the-art computational approaches to representation learning with hu-
Hierarchical-Deep Model
HD Models: integrate hierarchical Bayesian models with deep models.
Hierarchical Bayes: learn hierarchies of categories (super-categories) for sharing abstract knowledge. One-shot learning.
Deep Models: learn hierarchies of features. Unsupervised feature learning, so there is no need to rely on human-crafted input features. Shared higher-level features and shared low-level features.
(Salakhutdinov, Tenenbaum, Torralba, NIPS 2011; PAMI 2013)
Lower-level generic features: DBM model over images (edges, combinations of edges).
Higher-level class-sensitive features: Hierarchical Dirichlet Process model; capture the distinctive perceptual structure of a specific concept.
[Figure: horse, cow, car, van, truck categories over shared features.]
Hierarchical-Deep Model
Lower-level generic features: DBM model over images (edges, combinations of edges).
Higher-level class-sensitive features: capture the distinctive perceptual structure of a specific concept.
Hierarchical organization of categories: modular data-parameter relations; express priors on the features that are typical of different kinds of concepts ("animal", "vehicle").
Nested Chinese Restaurant Process prior: a nonparametric prior over tree structures; the tree hierarchy of classes is learned.
Hierarchical Dirichlet Process prior: a nonparametric prior allowing categories to share higher-level features, or parts.
Deep Boltzmann Machine: enforce (approximate) global consistency through many local constraints.
[Figure: a learned tree with super-categories "animal" (horse, cow) and "vehicle" (car, van, truck) on top of shared HDP and DBM layers.]
CIFAR Object Recognition
50,000 images of 100 classes; 4 million unlabeled images; 32 x 32 pixels x 3 (RGB).
Lower-level generic features; higher-level class-sensitive features; the tree hierarchy of classes ("animal", "vehicle", ...) is learned.
Inference: Markov chain Monte Carlo.
Learning the Hierarchy
The model learns how to share the knowledge across many visual categories.
Learned super-class hierarchy ("global" at the root): "human", "fruit", "aquatic animal", ...
Basic-level classes: shark, ray, turtle, dolphin; baby, man, girl, woman; sunflower, orange, apple.
Learned low-level generic features and learned higher-level class-sensitive features.
Example learned super-classes:
crocodile, kangaroo, lizard, snake, spider, squirrel
otter, porcupine, shrew, skunk
bus, streetcar, truck, tractor
house, tank, train
bottle, can, bowl, lamp, cup
bridge, castle, skyscraper, road
leopard, fox, lion, tiger, wolf
mouse, rabbit, hamster, raccoon, possum
bear, chimpanzee, elephant, camel, beaver, cattle
dolphin, ray, whale, shark, turtle
baby, boy, girl, man, woman
apple, pear, orange, pepper, sunflower
pine, willow tree, maple tree, oak
Sharing Features
Shape and color features are shared between orange, apple, and sunflower under the "fruit" super-class: learning to learn.
[Figure: "Sunflower" ROC curves (detection rate vs. false alarm rate) for 1, 3, and 5 examples, compared against pixel-space distance.]
Learning to Learn: learning a hierarchy for sharing parameters enables rapid learning of a novel concept.
[Figure: real images and model reconstructions for Dolphin, Apple, and Orange.]
Area under ROC curve for same/different (1 new class vs. 99 distractor classes), averaged over 40 test classes, as a function of the number of examples (1, 3, 5, 10, 50):
HD-DBM outperforms HD-DBM (no super-classes), LDA (class conditional), and GIST.
Our model outperforms standard computer vision features (e.g. GIST).
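The same/different evaluation metric above is just the area under the ROC curve, which equals the probability that a same-class pair scores higher than a different-class pair. A minimal rank-statistic sketch (the function name and inputs are illustrative):

```python
import numpy as np

def auc_same_different(same_scores, diff_scores):
    # Area under the ROC curve via the rank (Mann-Whitney) statistic:
    # the probability that a same-class pair scores above a
    # different-class pair (ties count half).
    same = np.asarray(same_scores, float)[:, None]
    diff = np.asarray(diff_scores, float)[None, :]
    return float(np.mean((same > diff) + 0.5 * (same == diff)))
```

An AUC of 1.0 means perfect discrimination of the new class from distractors; 0.5 is chance.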
Object Recognition
DBM: Learning from 3 Examples
Given only 3 examples of a novel class (e.g. Willow Tree, Rocket), the model generates new samples.
Learned higher-level features and learned lower-level features.
Handwritten characters: char 1 ... char 5 within "alphabet 1" and "alphabet 2"; features range from edges to strokes.
Handwritten Character Recognition
25,000 characters.
Area under ROC curve for same/different (1 new class vs. 1000 distractor classes), averaged over 40 test classes, as a function of the number of examples (1, 3, 5, 10):
HD-DBM outperforms HD-DBM (no super-classes), DBM, LDA (class conditional), and raw pixels.
Simulating New Characters
[Figure: a learned tree (Global, then Super class 1 and Super class 2, then Class 1 and Class 2) with real data within each super class; sampling a new class under a super class yields simulated new characters.]
The same model can be applied to speech, text, video, or any other high-dimensional data.
Motion Capture: Walk, Sexy Walk, Drunken Walk (over time).
[Figure: "Sexy walk" ROC curve (detection rate vs. false alarm rate): HD-DBM compared with input-space distance and a no-hierarchy variant.]
Talk Roadmap
Deep Boltzmann Machines
One-Shot and Transfer Learning
Learning Structured and Robust Deep Models
Advanced Deep Models
Multimodal Learning
Conclusions
Face Recognition
Due to extreme illumination variations, deep models perform quite poorly on this dataset.
Yale B Extended Face Dataset: 4 subsets of increasing illumination variations.
Consider more structured models: undirected + directed models.
Deep Lambertian Model: a deep undirected component combined with a directed component.
Combines the elegant properties of the Lambertian model with the Gaussian DBM model.
[Figure: observed images and inferred results.]
(Tang et al., ICML 2012; Tang et al., CVPR 2012)
Lambertian Reflectance Model
A simple model of the image formation process:
image intensity = albedo x (surface normal . light source direction)
Albedo: diffuse reflectivity of a surface; material dependent, illumination independent.
Surface normal: perpendicular to the tangent plane at a point on the surface.
Images with different illumination can be generated by varying the light direction.
[Figure: viewer, light source, and surface-normal geometry.]
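The per-pixel relation above is easy to state in code. A minimal sketch, assuming per-pixel albedo, unit surface normals, and a single distant light (function name and shapes are illustrative):

```python
import numpy as np

def lambertian_image(albedo, normals, light):
    # Per-pixel Lambertian shading: intensity_i = albedo_i * max(n_i . l, 0).
    # albedo: (P,), normals: (P, 3) unit vectors, light: (3,) direction.
    # Varying `light` relights the same surface, which is how images under
    # different illuminations are generated.
    shading = np.clip(normals @ light, 0.0, None)
    return albedo * shading
```

With a frontal light the image reduces to the albedo itself; with the light behind the surface every pixel is dark.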
Deep Lambertian Model
Observed image = f(image albedo, surface normals, light source).
A Gaussian Deep Boltzmann Machine serves as the albedo prior.
Albedo DBM: pretrained using the Toronto Face Database (transfer learning).
Inference: variational inference. Learning: stochastic approximation.
Yale B Extended Face Dataset: 38 subjects, 45 images of varying illumination per subject, divided into 4 subsets of increasing illumination variation; 28 subjects for training and 10 for testing.
Face Relighting
From one test image, the model infers the albedo and relights the face.
[Figure: observed image and inferred albedo.]
Recognition Results
Recognition as a function of the number of training images for 10 test subjects; one-shot recognition.
What about dealing with occlusions or structured noise?
Robust Boltzmann Machines
Build more structured models that can deal with occlusions or structured noise.
A Gaussian RBM models clean faces; a binary RBM models occlusions; a binary pixel-wise mask switches each pixel between the face model and a Gaussian noise model, so the resulting observation distribution is heavy-tailed.
[Figure: observed image, inferred image, ground truth, inferred binary mask.]
(Tang et al., ICML 2012; Tang et al., CVPR 2012)
Inference: variational inference. Learning: stochastic approximation.
Inference on the test subjects: Gibbs iterations (initial, 1, 3, 5, 7, 9, 11).
Internal states of the RoBM during learning (# of iterations: 1, 3, 5, 7, 10, 20, 30, 40, 50).
Recognition Results on the AR Face Database:
Learning Algorithm    Sunglasses    Scarf
Robust BM                 84.5      80.7
RBM                       61.7      32.9
Eigenfaces                66.9      38.6
LDA                       56.1      27.0
Pixel                     51.3      17.3
Speech Recognition
Spoken Query Detection: for each keyword, estimate the utterance's probability of containing that keyword.
Performance: average equal error rate (EER).
630-speaker TIMIT corpus: 3,696 training and 944 test utterances; 25 ms windowed frames; 61 phonetic labels; HMM decoder.
Learning Algorithm       AVG EER
GMM unsupervised            16.4
DBM unsupervised            14.7
DBM (1% labels)             13.3
DBM (30% labels)            10.3
DBM (100% labels)            9.7
(Zhang, Salakhutdinov, Chang, Glass, ICASSP 2012)
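The equal error rate used above is the operating point where the false-alarm rate equals the miss rate. A simple threshold-sweep sketch (real evaluations interpolate between operating points; names and inputs are illustrative):

```python
import numpy as np

def equal_error_rate(scores, labels):
    # Sweep every score as a detection threshold and return the error rate
    # at the point where false-alarm rate and miss rate are (most nearly)
    # equal. labels: 1 if the utterance contains the keyword, else 0.
    scores = np.asarray(scores, float)
    labels = np.asarray(labels, int)
    pos, neg = scores[labels == 1], scores[labels == 0]
    best_gap, eer = np.inf, 1.0
    for thr in np.unique(scores):
        far = np.mean(neg >= thr)   # false alarms on negatives
        frr = np.mean(pos < thr)    # misses on positives
        if abs(far - frr) < best_gap:
            best_gap, eer = abs(far - frr), (far + frr) / 2.0
    return float(eer)
```

A perfectly separating detector has an EER of 0; the table reports this rate averaged over keywords.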
Talk Roadmap
Deep Boltzmann Machines
One-Shot and Transfer Learning
Learning Structured and Robust Deep Models
Advanced Deep Models
Multimodal Learning
Conclusions
Data - a Collection of Modalities
Multimedia content on the web: image + text + audio.
Product recommendation systems.
Robotics applications: audio, vision, touch sensors, motor control.
Shared Concept: a "modality-free" representation vs. a "modality-full" representation.
"Concept": sunset, pacific ocean, baker beach, seashore, ocean; car, automobile.
Improve Classification from multi-modal input, e.g. an image tagged pentax, k10d, kangarooisland, southaustralia, sa australia, australiansealion, 300mm, classified as SEA / NOT SEA.
Retrieve data from one modality when queried using data from another modality.
Fill in Missing Modalities, e.g. tags: beach, sea, surf, strand, shore, wave, seascape, sand, ocean, waves.
Building a Probabilistic Model
Learn a joint density model over images and text.
h: a "fused" representation for classification and retrieval.
Generate data from the conditional distributions when one modality is missing:
text given image, for image annotation;
image given text, for image retrieval.
"Concept": sunset, pacific ocean, baker beach, seashore, ocean.
Challenges - I
Very different input representations: images are real-valued and dense; text is discrete and sparse.
Difficult to learn cross-modal features from low-level representations.
Challenges - II
Noisy and missing data. Example image tags: pentax, k10d, pentaxda50200, kangarooisland, sa, australiansealion; mickikrimmel, mickipedia, headshot; unseulpixel, naturey, crap; <no text>.
Text generated by the model for such images:
beach, sea, surf, strand, shore, wave, seascape, sand, ocean, waves
portrait, girl, woman, lady, blonde, pretty, gorgeous, expression, model
night, notte, traffic, light, lights, parking, darkness, lowlight, nacht, glow
fall, autumn, trees, leaves, foliage, forest, woods, branches, path
A Simple Multimodal Model
Use a joint binary hidden layer over both modalities.
Problem: the inputs have very different statistical properties; it is difficult to learn cross-modal features.
Multimodal DBM (Srivastava & Salakhutdinov, NIPS 2012)
Dense, real-valued image features: Gaussian RBM pathway.
Word counts (sparse, 1-of-K): Replicated Softmax pathway.
Joint hidden layers combine the two pathways, with both bottom-up and top-down inference.
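The Replicated Softmax text pathway has a simple generative story: given the hidden state, all of a document's word slots share one softmax over the vocabulary, and the visible text layer is the resulting count vector. A minimal sketch (the function name, weight shapes, and absence of biases are simplifying assumptions):

```python
import numpy as np

def sample_tag_counts(h, W, n_words, seed=0):
    # Given hidden state h, every one of the document's n_words word slots
    # shares the same softmax over the vocabulary:
    #   p(word = k) proportional to exp(W[k] @ h).
    # The visible text layer is the resulting vector of word counts.
    rng = np.random.default_rng(seed)
    logits = W @ h
    p = np.exp(logits - logits.max())   # numerically stable softmax
    p /= p.sum()
    return rng.multinomial(n_words, p)
```

Sharing one softmax across slots is what lets documents (tag lists) of different lengths use the same parameters.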
Text Generated from Images (given image, generated tags):
canada, nature, sunrise, ontario, fog, mist, bc, morning
insect, butterfly, insects, bug, butterflies, lepidoptera
graffiti, streetart, stencil, sticker, urbanart, graf, sanfrancisco
portrait, child, kid, ritratto, kids, children, boy, cute, boys, italy
dog, cat, pet, kitten, puppy, ginger, tongue, kitty, dogs, furry
sea, france, boat, mer, beach, river, bretagne, plage, brittany
water, glass, beer, bottle, drink, wine, bubbles, splash, drops, drop
portrait, women, army, soldier, mother, postcard, soldiers
obama, barackobama, election, politics, president, hope, change, sanfrancisco, convention, rally
Images from Text
Samples drawn after every 50 steps of Gibbs sampling.
Given tags, retrieved images:
water, red, sunset
nature, flower, red, green
blue, green, yellow, colors
chocolate, cake
MIR-Flickr Dataset (Huiskes et al.)
1 million images along with user-assigned tags, e.g.:
sculpture, beauty, stone
nikon, green, light, photoshop, apple, d70
white, yellow, abstract, lines, bus, graphic
sky, geotagged, reflection, cielo, bilbao, reflejo
food, cupcake, vegan
d80
anawesomeshot, theperfectphotographer, flash, damniwishidtakenthat, spiritofphotography
nikon, abigfave, goldstaraward, d80, nikond80
Data and Architecture
12 million parameters. Image pathway: 3857 input features, then 1024, 1024 hidden units; text pathway: 2000 input units, then 1024, 1024 hidden units; joint layer of 2048 units.
2000 most frequent tags.
25K labeled subset (15K training, 10K testing); 38 classes.
Additional 1 million unlabeled data.
Results
Logistic regression on the top-level representation; multimodal inputs.
Mean Average Precision:
Learning Algorithm          MAP      Precision@50
Random                      0.124    0.124
LDA [Huiskes et al.]        0.492    0.754
SVM [Huiskes et al.]        0.475    0.758
DBM-Labelled                0.526    0.791
DBM                         0.609    0.865
Deep Belief Net             0.599    0.867
Autoencoder                 0.600    0.875
The first four rows use similar features and the 25K labeled set; the last three also use the additional 1 million unlabelled examples.
Benefits of Using Multimodal Data
The training phase uses both modalities; at test time only images are given (missing text).
Learning Algorithm               MAP      Precision@50
Image-LDA [Huiskes et al.]       0.315    -
Image-SVM [Huiskes et al.]       0.375    -
Image-DBM                        0.469    0.803
Multimodal-DBM (missing text)    0.531    0.832
Video and Audio: CUAVE Dataset.
Multi-Modal Models: laser scans, images, video, text & language, time-series data, speech & audio.
One of the key challenges: inference.
Develop learning systems that come closer to displaying human-like intelligence.
Summary
Efficient learning algorithms for hierarchical generative models.
Learning more adaptive, robust, and structured representations.
Deep models can improve the current state of the art in many application domains:
object recognition and detection, text and image retrieval, handwritten character and speech recognition, and others.
[Figure: recap panels: text & image retrieval / object recognition; learning a category hierarchy; dealing with missing/occluded data; speech recognition (HMM decoder); multimodal data (sunset, pacific ocean, beach, seashore); object detection.]
Thank you

Thanks to my collaborators:
Nitish Srivastava, University of Toronto
Charlie Tang, University of Toronto
Josh Tenenbaum, MIT
Geoffrey Hinton, University of Toronto
Nathan Srebro, TTI / University of Chicago
Roger Grosse, MIT
Ilya Sutskever, Google
Iain Murray, University of Edinburgh
Andriy Mnih, Gatsby Computational Neuroscience Unit, UCL
Hugo Larochelle, University of Toronto
Antonio Torralba, MIT
Bill Freeman, MIT
John Langford, Yahoo Research
Tong Zhang, Rutgers
Sham Kakade, University of Pennsylvania
Brenden Lake, MIT
Code for learning RBMs, DBNs, and DBMs is available at:
http://www.utstat.toronto.edu/~rsalakhu/code.html