
Vector Quantization

Data Compression and Data Retrieval


Types of Quantization
1) Scalar Quantization
2) Vector Quantization
Vector Quantization (VQ)
• Vector Quantization is an efficient tool for lossy compression.
• It uses the principle of block coding.
• It has a simple decoding structure and provides a high compression ratio.
• The VQ concept is based on Shannon's rate-distortion theory: a better
compression ratio is always achievable by encoding a sequence of input
samples rather than encoding the input samples one by one.
Cont…
• There are three major steps in VQ Technique:
1) Codebook Design
2) VQ Encoding Process
3) VQ Decoding Process
Cont…
• Vector quantization groups source data into vectors
o A vector quantizer maintains a set of vectors called the codebook. Each vector in
the codebook is assigned an index.
Cont…
• The performance of the VQ technique depends on the constructed
codebook.
• The search complexity increases with the number of vectors in the
codebook.
• The number of code vectors N in the codebook depends on two parameters,
rate R and dimension L:

Number of code vectors N = 2^(RL)

Here, R = rate in bits/pixel and L = dimension. For example, R = 1 bit/pixel
and L = 4 give N = 2^4 = 16 code vectors.


Example
Input Vectors: <0.75, 1.27>, <1.78, 2.11>

Encoder: for the input vector, find the closest code vector in the codebook
and transmit its index.
Decoder: use the received index for a table lookup in the same codebook and
output the corresponding code vector.

Codebook (identical at encoder and decoder):

  Code vector    Index
  <-2, -2>       00
  <-1, -1>       01
  < 1,  1>       10
  < 2,  2>       11
Cont…
• Distortion: the encoder selects the code vector Yj satisfying

||X − Yj||² ≤ ||X − Yi||², for all i ≠ j

Here, X is the input vector and Yi, Yj are code (output) vectors.

• Quantization function

Q(X) = Yj if d(X, Yj) ≤ d(X, Yi) for all i ≠ j


Cont…
For input vector X = <0.75, 1.27>:
• First code vector <−2, −2>:
Distortion = [0.75 − (−2)]² + [1.27 − (−2)]² = 18.2554
• Second code vector <−1, −1>:
Distortion = [0.75 − (−1)]² + [1.27 − (−1)]² = 8.2154
• Third code vector <1, 1>:
Distortion = [0.75 − 1]² + [1.27 − 1]² = 0.1354  (minimum, so index 10 is sent)
• Last code vector <2, 2>:
Distortion = [0.75 − 2]² + [1.27 − 2]² = 2.0954
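The full-search encoding and table-lookup decoding above can be sketched directly in Python; this is a minimal sketch using the codebook and input vector of this example (function names are illustrative):

```python
# Full-search VQ: the encoder scans the whole codebook for the code vector
# with minimum squared distortion and sends its index; the decoder is a
# table lookup. Codebook and input taken from the example above.
codebook = [(-2.0, -2.0), (-1.0, -1.0), (1.0, 1.0), (2.0, 2.0)]

def distortion(x, y):
    """Squared Euclidean distance between two vectors."""
    return sum((a - b) ** 2 for a, b in zip(x, y))

def vq_encode(x, cb):
    """Return the index of the closest code vector (this index is transmitted)."""
    return min(range(len(cb)), key=lambda i: distortion(x, cb[i]))

def vq_decode(index, cb):
    """Decoder side: a simple table lookup."""
    return cb[index]

index = vq_encode((0.75, 1.27), codebook)   # the third code vector <1, 1> wins
```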
Vector Quantization vs Scalar
Quantization
1. VQ groups the input symbols into vectors and processes them together; SQ treats each input separately to produce the output.
2. VQ increases the optimality of the quantizer; this is not the case for SQ.
3. VQ is more efficient; SQ is less efficient.
4. In VQ, the granular error is affected by both the size and the shape of the quantization region; in SQ it is determined by the size of the quantization interval alone.
5. VQ: Q(X) = Yj if d(X, Yj) ≤ d(X, Yi) for all i ≠ j. SQ: Q(x) = yi if b(i−1) < x ≤ bi.
6. VQ is used for low-bit-rate applications where low resolution is sufficient and is widely used for image compression; in image compression, SQ creates annoying artifacts in the decompressed image.
7. For a given rate, VQ results in lower distortion than SQ.
LBG (Linde-Buzo-Gray) Algorithm
• It is used to design the codebook.
• It is the most widely used approach.
• The set of quantizer output points is called the codebook of the quantizer.
• The process of placing these output points is referred to as codebook
design.
• It is based on the K-means algorithm (a clustering procedure).
• It guarantees that the distortion from one iteration to the next will not
increase.
Cont…
• K-means Algorithm:
 Start from a large set of output vectors (the training set).
 Choose an initial set of K representative vectors (the representative set).
 The representative set is updated by computing the centroid of the training
vectors assigned to each representative vector (i.e., the mean of all vectors
assigned to it).
Cont…
•  LBG Algorithm:
 Choose an initial set of code vectors ri and a tolerance value e;
m = 1; D0 = ∞;
 While not success
 Divide the set of M training vectors x into L clusters Ki using the minimum
distortion condition:
x ∈ Ki if d(x, ri) ≤ d(x, rj) for all j ≠ i, 1 ≤ i ≤ L;
 Compute the average distortion

Dm = (1/M) Σx d(x, ri)

where each x is measured against the code vector ri of its own cluster Ki;
Cont…
 In each cluster Ki, compute the centroid of the training vectors in Ki and
make this centroid the new code vector ri of Ki;
 If (Dm−1 − Dm) / Dm < e
Success;
 Else m++ and try another iteration.
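The loop above can be written as a short program. The following is a minimal Python sketch with squared-error distortion (names are illustrative), run here on the training data and initial code vectors of the worked example that follows:

```python
# Minimal LBG sketch with squared-error distortion. Stops when the relative
# drop in average distortion falls below the tolerance eps.
def lbg(training, codebook, eps=0.001):
    prev_d = float("inf")
    while True:
        # Minimum-distortion condition: assign each training vector to the
        # cluster of its closest code vector.
        clusters = [[] for _ in codebook]
        total = 0.0
        for x in training:
            dists = [sum((a - b) ** 2 for a, b in zip(x, r)) for r in codebook]
            i = dists.index(min(dists))
            clusters[i].append(x)
            total += dists[i]
        d = total / len(training)          # average distortion D_m
        # Centroid condition: each code vector becomes its cluster's mean.
        codebook = [
            tuple(sum(c) / len(cl) for c in zip(*cl)) if cl else r
            for cl, r in zip(clusters, codebook)
        ]
        if (prev_d - d) / d < eps:         # (D_{m-1} - D_m) / D_m < eps
            return codebook, d
        prev_d = d

# Training set and initial code vectors from the worked example.
training = [(4, 6), (5, 10), (9, 6), (8, 9), (3, 9),
            (10, 10), (2, 8), (8, 4), (5, 1), (4, 2)]
codebook, d_final = lbg(training, [(5.0, 3.0), (6.0, 9.0), (10.0, 7.0)])
```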
Codebook generation process of
LBG Algorithm
Example of LBG
• Consider the example sequence: 4 6 5 10 9 6 8 9 3 9 10 10 2 8 8 4 5 1 4 2
• For vector dimension N = 2, the stream of samples is divided into 10 vectors,
all of which are used as training vectors:
[4 6], [5 10], [9 6], [8 9], [3 9], [10 10], [2 8], [8 4], [5 1], [4 2]
• To apply the k-means algorithm, set the number of levels to L = 3 and choose
these 3 code vectors:
r1 = [5 3], r2 = [6 9], r3 = [10 7]

These are shown as open circles in fig. (a).


Cont…

[Figure: training vectors and code vectors (open circles) after (a) initialization, (b) the first iteration, (c) the second iteration]


Cont…
• Using the squared-error distortion, we obtain the following 3 clusters
K1 = {[4 2], [4 6], [5 1], [8 4]}
K2 = {[2 8], [3 9], [5 10], [8 9]}
K3 = {[9 6], [10 10]}
• Now we compute the average distortion
D1 = 1/10 [ { (4−5)² + (2−3)² + (4−5)² + (6−3)² }
+ { (5−5)² + (1−3)² + (8−5)² + (4−3)² }
+ { (2−6)² + (8−9)² + (3−6)² + (9−9)² }
+ { (5−6)² + (10−9)² + (8−6)² + (9−9)² }
+ { (9−10)² + (6−7)² + (10−10)² + (10−7)² } ]
D1 = 6.9
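This arithmetic can be checked directly; a small sketch mapping each initial code vector to its cluster:

```python
# Check of the average-distortion computation D1 for the three clusters,
# using the cluster assignments from the example.
clusters = {
    (5, 3): [(4, 2), (4, 6), (5, 1), (8, 4)],
    (6, 9): [(2, 8), (3, 9), (5, 10), (8, 9)],
    (10, 7): [(9, 6), (10, 10)],
}
total = sum(
    (x[0] - r[0]) ** 2 + (x[1] - r[1]) ** 2
    for r, members in clusters.items()
    for x in members
)
d1 = total / 10  # average over all 10 training vectors
```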
Cont…
• The centroids of the 3 clusters are then:
• r1 = ¼ ([4 2] + [4 6] + [5 1] + [8 4]) = ¼ [21 13] = [5.25 3.25]
• r2 = ¼ ([2 8] + [3 9] + [5 10] + [8 9]) = ¼ [18 36] = [4.5 9]
• r3 = ½ ([9 6] + [10 10]) = ½ [19 16] = [9.5 8]
• With these new centroids as code vectors, the training vectors are plotted
as in fig. (b).
Cont…
• The second iteration of the main while loop begins with a reclassification of
the training vectors:
k1 = {[4 2], [4 6], [5 1], [8 4]}
k2 = {[2 8], [3 9], [5 10]}
k3 = {[8 9], [9 6], [10 10]}
• The first cluster remains the same; the second loses one vector, which
moves to the third cluster.
• The new average distortion is D2 = 4.8, an improvement over D1 = 6.9.
Cont…
• The new code vectors, the centroids of the 3 clusters, are now
r1 = [5.25 3.25]
r2 = [3.33 9]
r3 = [9 8.33]
which gives the plot shown in fig. (c).
• None of the code vectors changes significantly, the membership of the 3
clusters remains the same as before, and the average distortion is
D3 = 4.8
• So if we continue executing the algorithm, the membership will remain the
same and the code vectors will not change their values.
Tree-Structured Vector Quantizers
(TSVQ)
• A fast codebook search technique presented by Buzo et al.
• The full-search method results in high computational complexity.
• The number of operations can be reduced by enforcing a certain structure
on the codebook.
• One possibility is to use a tree structure for the codebook; the method is
then called binary search clustering.
• This method gives a significant decrease in search complexity over full
search.
Cont…
• Assume L = number of code vectors, and that L is a power of 2.
• First, design a codebook with two code vectors r0 and r1 using the
k-means method.
• With these code vectors, all training vectors are divided into two
clusters C0 and C1.
• Next, for each of the two clusters, a codebook is designed: code vectors
r00 and r01 for cluster C0, and code vectors r10 and r11 for cluster C1.
• The process continues until the total number of code vectors equals L.
Cont…
• When quantizing a vector x, a distortion measure is computed at each
level of the tree codebook for the code vectors ri0 and ri1; if
d(x, ri0) < d(x, ri1), bit 0 is sent as part of the codeword for x.
• Next, at level i+1, the distortion measure is computed between x and the
two descendants of ri0, namely ri00 and ri01, and the appropriate bit
is sent.
Cont…

                r0                          r1
             /      \                    /      \
          r00        r01              r10        r11
          /  \       /  \             /  \       /  \
       r000  r001 r010  r011       r100  r101 r110  r111

An example of a uniform tree codebook, L = 8


Cont…
• This method requires storing roughly twice as many code vectors as full
search, so the storage requirement is larger.
• The advantage is that we need to perform only 2 log2 M comparisons
(where M is the number of code vectors), instead of M.
• The disadvantage is that the distortion will be higher compared to a full-
search quantizer.
How to design TSVQ?
1) Obtain the average of all the training vectors, perturb it to obtain a second
vector and use these vectors to form a two level VQ.
2) Call the vectors v0 and v1 and group the training set vectors that would be
quantized to each as g0 and g1.
3) Perturb v0 and v1 to get the initial vectors for a four-level VQ.
4) Use g0 to design a two-level VQ and g1 to design another two-level VQ.
5) Label the vectors v00, v01, v10, v11.
6) Split g0 using v00 and v01 into two groups g00 and g01.
7) Split g1 using v10 and v11 into two groups g10 and g11.
Continue in this manner until we have the required number of output points.
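The splitting procedure above can be sketched as a recursion; this is a minimal Python sketch (perturbation size, iteration count, and names are illustrative assumptions, and a fixed number of k-means rounds stands in for a tolerance test):

```python
# TSVQ design by splitting: each group's mean is perturbed into two seed
# vectors, a two-level k-means splits the group, and the procedure recurses.
def centroid(vs):
    return tuple(sum(c) / len(vs) for c in zip(*vs))

def split(group, delta=0.01, rounds=10):
    """Two-level VQ inside one group, seeded by perturbing its mean."""
    v = centroid(group)
    cb = [v, tuple(c + delta for c in v)]      # perturbed copy of the mean
    for _ in range(rounds):
        parts = ([], [])
        for x in group:
            d = [sum((a - b) ** 2 for a, b in zip(x, r)) for r in cb]
            parts[d.index(min(d))].append(x)
        cb = [centroid(p) if p else r for p, r in zip(parts, cb)]
    return cb, parts

def tsvq_design(group, levels):
    """Return the leaf code vectors of a depth-`levels` tree codebook."""
    if levels == 0 or len(group) < 2:
        return [centroid(group)]
    _, (g0, g1) = split(group)
    return tsvq_design(g0, levels - 1) + tsvq_design(g1, levels - 1)

training = [(4, 6), (5, 10), (9, 6), (8, 9), (3, 9),
            (10, 10), (2, 8), (8, 4), (5, 1), (4, 2)]
leaves = tsvq_design(training, 2)   # a 4-leaf (two-level) tree codebook
```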
Pruned Tree-Structured Vector
Quantizers
• We can improve the rate-distortion performance of TSVQ by pruning.
• Pruning means carefully removing subgroups (subtrees) of the tree; it
reduces the size of the codebook but may increase distortion.
• So the main objective of pruning is to remove those subgroups that give
the best trade-off between rate and distortion.
Structured Vector Quantizers
• Impose structure on the codebook design so that the implementation
and search complexity is reduced.
• Let L=dimension of vector quantizer and R=bit-rate
• Then L·2^(RL) scalars need to be stored, and L·2^(RL) scalar distortion
calculations are required.
• So solution is to introduce some form of structure in codebook and
quantization process.
• Disadvantage: loss in rate-distortion performance.
Cont…
• Different types of structured vector quantizers:
1) Lattice Quantization
2) Tree-Structured codes
3) Multistage codes
4) Product codes: gain/shape code
Pyramid Vector Quantization(PVQ)
• Fischer introduced PVQ for encoding high-frequency subbands and showed
that at high bit-rates its performance is close to the source entropy.
• PVQ uses the lattice points of a pyramidal shape in multidimensional space
as the quantizer codebook.
• A lattice is a discrete set of points in n-dimensional space that can be
generated by integral linear combinations of a given set of basis vectors.
• Enumeration assigns a unique index to every possible vector in the PVQ
codebook, imparting a sorting order to the PVQ codebook vectors.
Cont…
• An advantage of PVQ is its fixed output bit-rate.
• Steps in pyramid VQ:

1) First find the gain

g = |x1| + |x2| + … + |xL|  (the L1 norm of the input vector)

2) This value, called the gain, is quantized and transmitted to the receiver.
3) The input is normalized by the gain; the result is called the shape. The
shape is quantized using a single hyper-pyramid.
4) Quantization of the shape consists of finding the output point on the hyper-
pyramid closest to the shape and finding the binary codeword for it.
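The gain/shape split of steps 1–3 can be sketched as follows, assuming the L1 norm as the gain measure (the function name and sample vector are illustrative):

```python
# Gain/shape split of pyramid VQ (sketch). The gain is taken as the L1 norm
# of the input vector, so the resulting shape lies on the unit L1 pyramid.
def pvq_gain_shape(x):
    gain = sum(abs(c) for c in x)         # step 1: the gain
    shape = tuple(c / gain for c in x)    # step 3: input normalized by the gain
    return gain, shape

gain, shape = pvq_gain_shape((3.0, -1.0, 2.0))
# The shape's L1 norm is 1, i.e. it lies on the unit hyper-pyramid.
```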
Polar and Spherical Vector
Quantizers
• High complexity is a major drawback of vector quantizers; it increases
exponentially with the dimension n.
• Therefore, the most used vector quantizers are two-dimensional
quantizers.
• All signals of interest are random and are described by some pdf.
• For signals with a Gaussian pdf, it is easier to design a two-dimensional
vector quantizer in polar coordinates (magnitude r and phase Θ) than
in Cartesian coordinates. Such quantizers are called polar
quantizers.
Cont…
• In two dimensions, we can quantize the input vector (x1, x2) by first
transforming it into polar coordinates:
The magnitude r = √(x1² + x2²)
The phase Θ = tan⁻¹(x2 / x1)
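The conversion and a uniform phase quantizer can be sketched briefly (function names and the level count are illustrative):

```python
import math

# Conversion of a 2-D input vector to polar coordinates before quantization.
def to_polar(x1, x2):
    r = math.hypot(x1, x2)       # magnitude r = sqrt(x1^2 + x2^2)
    theta = math.atan2(x2, x1)   # phase, correct in all four quadrants
    return r, theta

# Uniform quantization of the phase: it is uniformly distributed for a
# circularly symmetric source (e.g. i.i.d. Gaussian components).
def quantize_phase(theta, levels):
    step = 2 * math.pi / levels
    return round(theta / step) % levels   # index of the phase cell
```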
• There are two types of polar quantizers:
1) restricted (product)
2) unrestricted
• Unrestricted polar quantizers have better performance than restricted
polar quantizers, but are more complex.
Cont…
• In polar quantizers, the magnitude r can be quantized in different
ways: uniformly or non-uniformly.
• The phase Θ, however, can be quantized uniformly, since it has a
uniform distribution.
• If the vectors are 2-dimensional, the quantization contours are circles.
• If the vectors are 3- or higher-dimensional, the contours are spheres
and hyper-spheres.
Lattice Vector Quantizers
• A vector quantizer codebook designed with the LBG algorithm has no visible
structure, which complicates the quantization process.
• An alternative is lattice-point quantization, since it allows a fast
encoding algorithm.
• A lattice-based vector quantizer is a structured technique in which the
lattice points are used as the codebook of the VQ.
• The codebook is a regular lattice in which all regions have the same shape,
size, and orientation.
• For n = bit-rate and v = dimension, the number of codebook vectors is 2^(nv).
• With LBG at high bit-rates and high dimensions, a codebook of 2^(nv) vectors
is not practically achievable.
Cont…
• A lattice in k-dimensional space is the collection of all vectors of the form

x = m1·u1 + m2·u2 + … + mn·un

where the mi's are arbitrary integers and the ui's form a linearly independent
set of n ≤ k vectors.
• Example of lattice quantizer in two dimensions: Hexagonal lattice.
• Low complexity
• Codebook storage is eliminated
• Gives lower distortion for the same rate, compared to a uniform scalar
quantizer.
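For the simplest lattice, the integer lattice Z^n, encoding needs no stored codebook at all; a minimal sketch (the hexagonal lattice requires a slightly more involved nearest-point rule, not shown here):

```python
# Lattice quantization on the scaled integer lattice step*Z^n (sketch):
# no codebook is stored; the nearest lattice point is found by rounding,
# one operation per dimension.
def quantize_z_lattice(x, step=1.0):
    """Nearest point of the lattice step * Z^n to the input vector x."""
    return tuple(step * round(c / step) for c in x)
```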
Hexagonal Lattice
Thank You
