Chetan Arora
Disclaimer: The contents of these slides are taken from various publicly available resources such as research papers,
talks and lectures. The sources are usually acknowledged but sometimes not. To be used for the purpose of
classroom teaching, and academic dissemination only.
Chetan Arora
Computer Vision and Graphics Lab, IIT Delhi
Taxonomy: tractable density (e.g. PixelRNN/CNN) vs. approximate density (e.g. variational, Markov chain).
PixelRNN/CNN
• Generate image pixels starting from the corner.
• Dependency of a pixel (i, j) on the previous pixels, e.g. (i, j − 1), is modelled using an RNN (LSTM) over the input layer.
• Drawback: sequential generation is slow!
Aaron van den Oord, Nal Kalchbrenner, Koray Kavukcuoglu. Pixel Recurrent Neural Networks. ICML 2016.
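The sequential-generation drawback can be seen directly in a sketch of the sampling loop: every pixel must wait for all earlier pixels in raster order. The `conditional_probs` function below is a hypothetical stand-in for the trained RNN (here it just returns a uniform distribution), purely to illustrate the O(h·w) sequential structure:

```python
import numpy as np

rng = np.random.default_rng(0)

def conditional_probs(image, i, j):
    """Stand-in for the trained RNN/CNN: returns a distribution over the
    256 intensity values for pixel (i, j) given all previously generated
    pixels. A fixed uniform distribution here, purely for illustration."""
    return np.full(256, 1.0 / 256)

def sample_image(h, w):
    # Pixels are generated one at a time in raster-scan order, starting
    # from the top-left corner: h*w strictly sequential steps.
    img = np.zeros((h, w), dtype=np.int64)
    for i in range(h):
        for j in range(w):
            p = conditional_probs(img, i, j)
            img[i, j] = rng.choice(256, p=p)
    return img

img = sample_image(4, 4)
```

The inner loop cannot be parallelized at sampling time, which is exactly why PixelRNN/CNN generation is slow.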
Row LSTM
• Gives rise to a triangular context.
Diagonal BiLSTM
• Designed to capture the entire available context for any
image size.
Diagonal BiLSTM
Rotation Trick
• To allow for parallelization along the diagonals, the input map is skewed by offsetting each row by one position with respect to the previous row.
• Location (i, j) of the input moves to (i, i + j) in the skewed map, so each diagonal becomes a column and all pixels of a column can be computed in parallel.
• The Diagonal model is able to capture the whole available context (even for the border pixels), which is not true for the Row model.
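The skewing operation itself is simple; a minimal numpy sketch (my own illustrative helper, not code from the paper) shifts row i right by i positions so that diagonals of the original map line up as columns:

```python
import numpy as np

def skew(x):
    """Skew an (h, w) map: row i is shifted right by i positions, so the
    diagonals of the original map become columns of the (h, 2w-1) output."""
    h, w = x.shape
    out = np.zeros((h, 2 * w - 1), dtype=x.dtype)
    for i in range(h):
        out[i, i:i + w] = x[i]  # entry (i, j) lands in column i + j
    return out

x = np.arange(9).reshape(3, 3)
s = skew(x)
```

After skewing, column k of `s` holds exactly the pixels (i, j) with i + j = k, i.e. one diagonal of the original image, which is what the Diagonal BiLSTM processes in parallel.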
PixelCNN
• The Row and Diagonal LSTM layers capture long-range dependencies in images.
• This comes at a computational cost: each LSTM state must be computed sequentially.
• Standard convolutional layers, in contrast, capture a bounded receptive field and compute features for all pixel positions at once.
PixelCNN
• Still generates image pixels starting from the corner.
• Softmax loss over the 256 intensity values at each pixel.
Aaron van den Oord, Nal Kalchbrenner, Oriol Vinyals, Lasse Espeholt, Alex Graves, Koray Kavukcuoglu
Conditional Image Generation with PixelCNN Decoders. NIPS 2016
PixelCNN: Masking
Aaron van den Oord, Nal Kalchbrenner, Oriol Vinyals, Lasse Espeholt, Alex Graves, Koray Kavukcuoglu
Conditional Image Generation with PixelCNN Decoders. NIPS 2016
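The masking idea can be sketched for a single-channel kernel: zero out the kernel weights that would look at the current pixel or at later pixels in raster order. (Real PixelCNN masks additionally handle the ordering of the RGB channels; this hypothetical helper ignores channels.)

```python
import numpy as np

def causal_mask(k, mask_type="A"):
    """Binary mask for a k x k convolution kernel (single channel).
    Zeros out weights at and after the centre in raster-scan order, so the
    output at (i, j) never depends on pixel (i, j) or later pixels.
    Mask 'A' (used in the first layer) also hides the centre pixel itself;
    mask 'B' (used in later layers) allows it."""
    m = np.ones((k, k))
    c = k // 2
    m[c, c + 1:] = 0   # pixels to the right of centre, same row
    m[c + 1:, :] = 0   # all rows below the centre
    if mask_type == "A":
        m[c, c] = 0    # hide the current pixel too
    return m

mA = causal_mask(3, "A")
mB = causal_mask(3, "B")
```

The masked kernel is multiplied elementwise with the convolution weights before each forward pass, so all pixel positions can still be computed in one convolution during training.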
Generation Samples
• The model does not know that a value of 128 is close to a value of 127 or
129.
• Need to learn such relationships first before higher-level structures can be learnt.
• Especially problematic for data with higher precision on the observed pixels than the usual 8 bits.
PixelCNN++
• Use a parameterised probability distribution: the Discretized Logistic Mixture Likelihood:
  ν ∼ Σ_{i=1}^{K} π_i logistic(μ_i, s_i)
  P(x | π, μ, s) = Σ_{i=1}^{K} π_i [ σ((x + 0.5 − μ_i) / s_i) − σ((x − 0.5 − μ_i) / s_i) ]
• K is the number of components in the mixture, σ the logistic sigmoid function, and π_i, μ_i, s_i the weight, mean and scale of component i respectively.
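The formula above assigns pixel value x the probability mass of the interval [x − 0.5, x + 0.5] under a mixture of logistics. A minimal numpy sketch (edge cases at x = 0 and x = 255, where PixelCNN++ absorbs the tails, are omitted here for brevity):

```python
import numpy as np

def sigmoid(t):
    return 1.0 / (1.0 + np.exp(-t))

def discretized_logistic_mixture(x, pi, mu, s):
    """P(x) = sum_i pi_i * [sigma((x+0.5-mu_i)/s_i) - sigma((x-0.5-mu_i)/s_i)].
    x is an integer intensity; pi, mu, s hold per-component weight, mean
    and scale. Tail edge cases at x = 0 and x = 255 are not handled."""
    cdf_plus = sigmoid((x + 0.5 - mu) / s)
    cdf_minus = sigmoid((x - 0.5 - mu) / s)
    return float(np.sum(pi * (cdf_plus - cdf_minus)))

# Illustrative two-component mixture (made-up parameters).
pi = np.array([0.6, 0.4])
mu = np.array([100.0, 180.0])
s = np.array([5.0, 10.0])

# Summing over all 256 levels recovers (almost) all the probability mass;
# the small remainder is the tail mass outside [−0.5, 255.5].
total = sum(discretized_logistic_mixture(v, pi, mu, s) for v in range(256))
```

Because nearby intensities fall under overlapping intervals of the same smooth logistics, the model gets "128 is close to 127" for free, unlike the 256-way softmax.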
PixelCNN++
• Use down-sampling to efficiently capture structure at multiple resolutions.
• Short-cut connections to further speed up optimization.
• Regularize the model using dropout
Con:
• Sequential generation ⇒ slow
So far…
PixelCNNs define a tractable density function and optimize the likelihood of the training data:
  p_θ(x) = ∏_{i=1}^{n} p_θ(x_i | x_1, …, x_{i−1})
Is there any other way to learn a tractable density function?
p_G(x) = p_z(G⁻¹(x)), i.e. learn an invertible transformation G from a simple latent density p_z, so that the density of x is obtained by a change of variables.
Carl Doersch. Tutorial on Variational Autoencoders.
Variational Autoencoders
• Learn both the latent variables and the transformation function used to sample x from them.
• Cannot optimize the likelihood directly; derive and optimize a lower bound on the log-likelihood instead.
How to learn this feature representation?
Train such that the features can be used to reconstruct the original data. "Autoencoding": encoding the data itself.
[Diagram: input data x → Encoder (e.g. 4-layer conv) → features z → Decoder (e.g. 4-layer upconv) → reconstructed input x̂; L2 loss ‖x − x̂‖².]
[Diagram: input data x → Encoder → features z → Classifier. Fine-tune the encoder jointly with the classifier, training for the final task (sometimes with small data).]
Variational Autoencoders
Probabilistic spin on autoencoders - will let us sample from the model
to generate data!
" %
Assume training data ! "#$ is generated from underlying unobserved
(latent) representation &
Variational Autoencoders
We want to estimate the true parameters θ* of this generative model.
[Diagram: sample z from the true prior p_θ*(z); sample x from the true conditional p_θ*(x | z) via the decoder network.]
How should we represent this model?
Choose the prior p(z) to be simple, e.g. Gaussian; reasonable for latent attributes such as pose or how much smile.
Variational Autoencoders
We want to estimate the true parameters θ* of this generative model.
[Diagram: sample z from the true prior p_θ*(z); sample x from the true conditional p_θ*(x | z) via the decoder network.]
How to train the model?
Learn model parameters to maximize the likelihood of the training data:
  p_θ(x) = ∫ p_θ(z) p_θ(x | z) dz
Now with a latent z (unlike the single deterministic code z in plain autoencoders).
Variational Autoencoders
p_θ(x) = ∫ p_θ(z) p_θ(x | z) dz
Variational Autoencoders
p_θ(x) = ∫ p_θ(z) p_θ(x | z) dz
What if x is binary?
• One can use a Bernoulli distribution parameterized by f(z, θ).
Variational Autoencoders
How to compute p(x) in a simple manner?
• Sample z^(1), …, z^(N) from the prior p(z) and compute p(x) ≈ (1/N) Σ_i p(x | z^(i)).
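This naive Monte Carlo estimate is easy to write down; a toy one-dimensional sketch (my own illustrative model, z ∼ N(0, 1) and x | z ∼ N(z, 1), chosen so the true marginal is known in closed form):

```python
import numpy as np

rng = np.random.default_rng(0)

def p_x_given_z(x, z):
    """Toy decoder likelihood: x | z ~ N(z, 1) in one dimension."""
    return np.exp(-0.5 * (x - z) ** 2) / np.sqrt(2 * np.pi)

def mc_marginal(x, n):
    # p(x) = ∫ p(z) p(x|z) dz  ≈  (1/n) Σ_i p(x | z_i),  z_i ~ p(z) = N(0, 1)
    z = rng.standard_normal(n)
    return float(np.mean(p_x_given_z(x, z)))

est = mc_marginal(0.5, 100_000)
# For this toy model the true marginal is x ~ N(0, 2).
```

In one dimension the estimate converges quickly, but with a high-dimensional z almost every sampled z^(i) gives p(x | z^(i)) ≈ 0, so the estimate needs impractically many samples; this is what motivates sampling z from a learned q(z | x) instead.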
Variational Autoencoders
Maximize data likelihood: p_θ(x) = ∫ p_θ(z) p_θ(x | z) dz
• Problem: the integral over all z is intractable, and so is the posterior p_θ(z | x) = p_θ(x | z) p_θ(z) / p_θ(x).
• Solution?
• Use an encoder q_φ(z | x) that approximates the intractable p_θ(z | x).
Variational Autoencoders
Since we're modelling probabilistic generation of data, the encoder and decoder networks are probabilistic:
• Encoder network q_φ(z | x): outputs the mean μ_{z|x} and (diagonal) covariance Σ_{z|x}; sample z | x ∼ N(μ_{z|x}, Σ_{z|x}).
• Decoder network p_θ(x | z): outputs the mean μ_{x|z} and (diagonal) covariance Σ_{x|z}; sample x | z ∼ N(μ_{x|z}, Σ_{x|z}).
Variational Autoencoders
Now equipped with our encoder and decoder networks, let’s work out the
(log) data likelihood:
log p_θ(x^(i)) = E_{z ∼ q_φ(z|x^(i))} [ log p_θ(x^(i)) ]    (p_θ(x^(i)) does not depend on z; taking the expectation w.r.t. z using the encoder network will come in handy later)

= E_z [ log ( p_θ(x^(i)|z) p_θ(z) / p_θ(z|x^(i)) ) ]    (Bayes' rule)

= E_z [ log ( p_θ(x^(i)|z) p_θ(z) q_φ(z|x^(i)) / ( p_θ(z|x^(i)) q_φ(z|x^(i)) ) ) ]    (multiply and divide by q_φ(z|x^(i)))

= E_z [ log p_θ(x^(i)|z) ] − E_z [ log ( q_φ(z|x^(i)) / p_θ(z) ) ] + E_z [ log ( q_φ(z|x^(i)) / p_θ(z|x^(i)) ) ]
Variational Autoencoders
log p_θ(x^(i))
= E_z [ log p_θ(x^(i)|z) ] − D_KL( q_φ(z|x^(i)) ‖ p_θ(z) ) + D_KL( q_φ(z|x^(i)) ‖ p_θ(z|x^(i)) )
• First term: the decoder gives p_θ(x|z); we can compute an estimate of this term through sampling.
• Second term: this KL divergence (between the Gaussian encoder distribution and the z prior) has a nice closed-form solution!
• Third term: p_θ(z|x^(i)) is intractable (saw earlier), so we can't compute this KL term :( But we know KL divergence is always ≥ 0.
Variational Autoencoders
log p_θ(x^(i)) ≥ ℒ(x^(i); θ, φ) = E_{z ∼ q_φ} [ log p_θ(x^(i)|z) ] − D_KL( q_φ(z|x^(i)) ‖ p_θ(z) )
[Diagram: input x → Encoder (Q) → μ, Σ → sample z (a non-differentiable step) → Decoder (P) → reconstruction.]
Carl Doersch. Tutorial on Variational Autoencoders.
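The lower bound can be turned directly into a training objective. A minimal numpy sketch, assuming a Gaussian decoder (so the reconstruction term reduces to a squared error up to constants) and a diagonal-Gaussian encoder; this is an illustration of the bound, not the slide's exact implementation:

```python
import numpy as np

def elbo(x, x_hat, mu, log_var):
    """Evidence lower bound L = E_q[log p(x|z)] - KL(q(z|x) || p(z)).
    Gaussian decoder: reconstruction term is -1/2 ||x - x_hat||^2 (up to
    additive constants). Diagonal-Gaussian encoder q = N(mu, diag(exp(log_var)))
    against a standard-normal prior gives the closed-form KL
    1/2 * sum(exp(log_var) + mu^2 - 1 - log_var)."""
    recon = -0.5 * np.sum((x - x_hat) ** 2)
    kl = 0.5 * np.sum(np.exp(log_var) + mu ** 2 - 1.0 - log_var)
    return float(recon - kl)

x = np.array([0.5, -0.2])       # observed data (toy)
x_hat = np.array([0.4, -0.1])   # hypothetical decoder output
mu = np.zeros(2)
log_var = np.zeros(2)           # q exactly matches the prior, so KL = 0
L = elbo(x, x_hat, mu, log_var)
```

Training maximizes L over θ (decoder) and φ (encoder), which simultaneously improves reconstruction and keeps q(z|x) close to the prior.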
Variational Autoencoders
With Reparameterization Trick
[Diagram: input x → Encoder (Q) → μ, Σ; sample ε ∼ N(0, 1) externally and set z = μ + Σ^{1/2} ε → Decoder (P) → f(z); loss ‖f(z) − x‖². The sampling of ε is still non-differentiable, but the gradient can now flow through μ and Σ into the encoder.]
Tutorial on Variational Autoencoders, Carl Doersch.
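The reparameterization trick is a one-liner in code. A minimal numpy sketch for a diagonal-Gaussian encoder output (mu, log_var are assumed encoder outputs):

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_z(mu, log_var):
    """Reparameterization: instead of sampling z ~ N(mu, diag(sigma^2))
    directly (a non-differentiable node between encoder and decoder),
    sample eps ~ N(0, I) outside the network and set z = mu + sigma * eps,
    a deterministic, differentiable function of mu and sigma."""
    eps = rng.standard_normal(mu.shape)
    return mu + np.exp(0.5 * log_var) * eps

mu = np.array([1.0, -2.0])
log_var = np.log(np.array([0.25, 0.25]))  # sigma = 0.5 per dimension
zs = np.stack([sample_z(mu, log_var) for _ in range(20000)])
```

Averaged over many draws, the samples have the intended mean μ and standard deviation σ, while gradients with respect to μ and log σ² pass straight through the arithmetic.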
Variational Autoencoders
D_KL( N(μ, Σ) ‖ N(0, I) ) = ½ ( tr(Σ) + μᵀμ − k − log det(Σ) )
• k is the dimensionality of the distribution.
• Easy to backpropagate through when Σ is diagonal. Possible otherwise also.
[Diagram: x → Encoder (Q) → μ, Σ; z = μ + Σ^{1/2} ε with ε ∼ N(0, 1) → Decoder (P) → f(z); loss ‖f(z) − x‖².]
Tutorial on Variational Autoencoders, Carl Doersch.
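The closed-form KL term above is straightforward to evaluate; a minimal numpy sketch for a full (not necessarily diagonal) covariance:

```python
import numpy as np

def kl_gaussian_vs_standard(mu, Sigma):
    """KL( N(mu, Sigma) || N(0, I) )
       = 1/2 * ( tr(Sigma) + mu^T mu - k - log det(Sigma) ),
    where k is the dimensionality of the distribution."""
    k = mu.shape[0]
    _, logdet = np.linalg.slogdet(Sigma)  # numerically stable log det
    return float(0.5 * (np.trace(Sigma) + mu @ mu - k - logdet))

# KL is zero exactly when the two distributions coincide:
kl0 = kl_gaussian_vs_standard(np.zeros(3), np.eye(3))
# ... and positive otherwise:
kl1 = kl_gaussian_vs_standard(np.ones(3), 2 * np.eye(3))
```

With a diagonal Σ, trace and log-determinant reduce to per-dimension sums, which is the form usually implemented in VAE code.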
[Figure: images produced by the Decoder (P) while varying the latent coordinates z₁ and z₂ over the prior p(z), tracing out the learned data manifold.]
Kingma and Welling
Auto-Encoding Variational Bayes. ICLR 2014
Conditional VAEs
• Mix of PixelRNN-style sequential generation and VAEs, e.g. DRAW.
Karol Gregor, Ivo Danihelka, Alex Graves, Danilo Jimenez Rezende, Daan Wierstra
DRAW: A Recurrent Neural Network For Image Generation. PMLR 2015.
Variational Autoencoders
Summary:
Pros:
• Principled approach to generative models
• Allows inference of q(z|x), which can be a useful feature representation
for other tasks
Cons:
• Maximizes a lower bound on the likelihood: okay, but not as good an evaluation metric as PixelRNN/PixelCNN
• Samples are blurrier and of lower quality compared to the state of the art (GANs).