
Second Module Topics

1. Basic Audio Compression Techniques


Quantization, Non-linear Quantization, Differential Encoding
2. Linear Prediction Coding
-LPC, DPCM, DM, Adaptive DPCM.
3. Lossless Compression
-Run Length Coding, Statistical Coding, Huffman Coding, Dictionary
Coding, Arithmetic Coding.
4. Lossy Compression
-Transform coding, DFT, DCT, Haar Transform, KLT, Wavelet
Transforms, Embedded Zero Tree Coder.

1. Basic Audio Compression Techniques


Quantization, Non-linear Quantization, Differential Encoding,

Compression
Compression is generally applied in multimedia communication to reduce
the volume of information to be transmitted or stored, and hence to reduce
the communication bandwidth required for its transmission over a network
or the space required on a storage device.
Compression Principles
1. Source encoders and destination decoders
2. Lossless compression and lossy compression
3. Entropy encodings
4. Source encoding
Source encoders and destination decoders

• Prior to transmitting the source information, a compression algorithm is
applied.
• The destination reproduces the original, or a nearly exact copy of it, by
applying a matching decompression algorithm.
Lossless and lossy compression
In general, compression schemes can be broadly classified into two categories,
(a) Lossless Compression
(b) Lossy Compression

Lossless Compression
As the name implies, lossless compression schemes exploit redundancies
without incurring any loss of data. Thus, the data stream prior to encoding and
after decoding is exactly the same and no distortion in the reconstruction quality
is observed. Lossless image compression is exactly reversible. Ex : Text file
Lossless compression is achieved by exploiting statistical redundancy. For
example, if we transform the image into a string of symbols prior to encoding and
then assign shorter code words to more frequently occurring symbols and longer
code words to less frequently occurring symbols, we achieve compression, and at
the same time the encoding process can be exactly reversed during decoding,
since there is a one-to-one mapping between the symbols and their codes.
Lossless compression schemes: run-length encoding, entropy coding, Ziv-
Lempel coding, etc. Lossless image compression schemes can achieve only a
limited degree of bandwidth reduction for data transmission, but they preserve the
quality of the image without introducing any distortion.
Lossy Image Compression
Contrary to lossless image compression, lossy image compression schemes
incur loss of data and hence suffer a loss of quality in reconstruction. Like
lossless image compression, the image is first transformed into a string of
symbols, which are quantized to a discrete set of allowable levels. It is possible to
achieve significant data compression, but quantization being a many-to-one
mapping is irreversible and exact reconstruction is never possible. Yet, if the loss
in reconstruction quality is acceptable to our visual perception, we may accept
this scheme in the interest of achieving very significant degree of compression.
Lossy compression is achieved by exploiting psychovisual redundancy. While
designing the quantizers, it must be known where loss of quality can be tolerated
and where it cannot.
• The aim of lossy compression algorithms is normally not to reproduce an exact
copy of the source information after decompression.
• Example applications of lossy compression are the transfer of
digitized images and of audio and video streams.

Elements of Image Compression System


A typical image compression system consists of the following elements –
(a) Transformer, (b) quantizer and (c) coder.

Transformer
This block transforms the original input data into a form that is more
suitable to compression. The transformation can be local, involving pixels in the
neighbourhood or global, involving the full image or a block of pixels.
An example of a local transformation is
1. linear predictive coding followed by Differential Pulse Code Modulation
(DPCM).
2. Global transformation techniques use the Discrete Fourier Transform (DFT),
Discrete Cosine Transform (DCT), Karhunen-Loève Transform (KLT), Discrete
Wavelet Transform (DWT), etc.
The transformer block converts the original spatial-domain signal into another
representation of reduced dynamic range, in which only a few coefficients
contain the bulk of the energy, so efficient compression is possible. This block is
lossless.
Quantization
Def:1.Reduce the number of distinct output values to a much smaller set.
Def:2. It is the process of rounding off the signal amplitude at each sampling time
to the nearest quantization level.
• Quantization is the process that confines the amplitude of a signal into a
finite number of values.

This is the main source of the “loss" in lossy compression.

Three different forms of quantization.


1. Uniform: midrise and midtread quantizers.
2. Nonuniform: companded quantizer.
3. Vector Quantization.

Uniform Scalar Quantization


A uniform scalar quantizer partitions the domain of input values into equally
spaced intervals, except possibly at the two outer intervals.
The output or reconstruction value corresponding to each interval is taken to be
the midpoint of the interval.
The length of each interval is referred to as the step size, denoted by symbol Δ.
Two types of uniform scalar quantizers:
1. Midrise quantizers have an even number of output levels.
2. Midtread quantizers have an odd number of output levels, including zero as
one of them.

For the special case where Δ = 1, we can simply compute the output values for
these quantizers as:
Q_midrise(x) = ⌈x⌉ − 0.5
Q_midtread(x) = ⌊x + 0.5⌋
Performance of an M-level quantizer: let B = {b_0, b_1, ..., b_M} be the set of
decision boundaries and Y = {y_1, y_2, ..., y_M} be the set of reconstruction or
output values. Suppose the input is uniformly distributed in the interval
[−X_max, X_max]. The rate of the quantizer is R = log2 M.
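A minimal Python sketch of these two quantizers (not from the source; the function names and the test values are illustrative):

import math

def quantize_midtread(x, delta=1.0):
    # Midtread: odd number of output levels, zero is one of them.
    # For delta = 1 this is Q_midtread(x) = floor(x + 0.5).
    return delta * math.floor(x / delta + 0.5)

def quantize_midrise(x, delta=1.0):
    # Midrise: even number of output levels, zero is not an output value.
    # For delta = 1 this is floor(x) + 0.5 (equal to ceil(x) - 0.5 for non-integer x).
    return delta * (math.floor(x / delta) + 0.5)

if __name__ == "__main__":
    for x in (-1.2, -0.3, 0.0, 0.4, 1.7):
        print(x, quantize_midtread(x), quantize_midrise(x))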
Quantization noise
The difference between the actual value of the analog signal amplitude and
the corresponding nominal amplitude(nearest quantization interval value)
for the particular sampling time is called Quantization noise.
• The ratio of the peak amplitude of a signal to its minimum amplitude is known as
the dynamic range (−Vmax to +Vmax).
• Number of bits per sample = N bits per sample.
• Quantization interval q = 2Vmax / 2^N = Vmax / 2^(N−1).
• Range of the digital signal = −2^(N−1) to 2^(N−1) − 1.

The signal-to-quantization-noise ratio (SQNR) is

SQNR = 20 log10 (Vsignal / Vquan-noise) = 20 log10 (2^(N−1) / (1/2)) ≈ 6.02 N dB

where Vquan-noise has a maximum value of ½ (half a quantization step) and the
signal amplitude is measured in quantization steps.

Each bit adds about 6 dB of resolution, so 16 bits give a maximum SQNR of about 96 dB.

Quantization Error of Uniformly Distributed Source


Granular distortion: quantization error caused by the quantizer for bounded input.
The decision boundaries b_i for a midrise quantizer are [(i − 1)Δ, iΔ], i = 1, ..., M/2,
and the total distortion is twice the sum of the distortion over the positive intervals.
Since the reconstruction values y_i are the midpoints of each interval, the
quantization error must lie within [−Δ/2, Δ/2]; for a uniformly distributed source,
the quantization error is itself uniformly distributed over this interval.

Non linear Quantization


Companded quantization is nonlinear. A compander consists of a
compressor function G, a uniform quantizer, and an expander function G⁻¹.
The two commonly used companders are the µ-law and A-law companders.
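The sketch below (not from the source) shows the compressor / uniform quantizer / expander chain for the µ-law case; the value µ = 255 and the function names are assumptions chosen for illustration.

import numpy as np

MU = 255.0  # mu-law parameter commonly used in North American telephony

def compress(x):
    # Compressor G: maps [-1, 1] onto [-1, 1], expanding small amplitudes
    return np.sign(x) * np.log1p(MU * np.abs(x)) / np.log1p(MU)

def expand(y):
    # Expander G^-1: inverse of the compressor
    return np.sign(y) * np.expm1(np.abs(y) * np.log1p(MU)) / MU

def companded_quantize(x, n_bits=8):
    # Non-linear quantization = compress -> uniform quantize -> expand
    levels = 2 ** n_bits
    delta = 2.0 / levels
    y = compress(x)
    y_hat = delta * (np.floor(y / delta) + 0.5)   # uniform midrise quantizer
    return expand(y_hat)

if __name__ == "__main__":
    x = np.array([-0.9, -0.05, 0.001, 0.05, 0.9])
    print(companded_quantize(x))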

Vector Quantization (VQ)


According to Shannon's information theory, any compression system performs
better if it operates on vectors or groups of samples rather than on individual
symbols or samples.
Vectors are formed from the input by simply concatenating a number of consecutive
samples into a single vector.
Instead of single reconstruction values as in scalar quantization, in VQ code
vectors with n components are used. A collection of these code vectors forms the
codebook.
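A minimal Python sketch of VQ encoding and decoding with a toy two-dimensional codebook (not from the source; the codebook values and function names are illustrative):

import numpy as np

def vq_encode(samples, codebook, n):
    # Group consecutive samples into n-component vectors and replace each
    # vector by the index of its nearest code vector (Euclidean distance).
    vectors = samples[: len(samples) // n * n].reshape(-1, n)
    # Distance from every input vector to every code vector
    d = np.linalg.norm(vectors[:, None, :] - codebook[None, :, :], axis=2)
    return np.argmin(d, axis=1)

def vq_decode(indices, codebook):
    # Reconstruction: look the indices up in the same codebook
    return codebook[indices].ravel()

if __name__ == "__main__":
    codebook = np.array([[0.0, 0.0], [1.0, 1.0], [4.0, 5.0]])  # toy codebook
    x = np.array([0.1, -0.2, 3.9, 5.2, 0.9, 1.1])
    idx = vq_encode(x, codebook, n=2)
    print(idx, vq_decode(idx, codebook))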
Coder
Coders assign a code word, a binary bit-stream, to each symbol at the output of
the quantizer. The coder may employ
(i) Fixed-length coding (FLC), in which the codeword length is fixed irrespective of
the probabilities of occurrence of the quantized symbols, or

(ii) Variable length coding (VLC) , also known as entropy coding, assigns code
words in such a way as to minimize the average length of the binary
representation of the symbols. This is achieved by assigning shorter code words
to the more probable symbols.

Elements of Image De-compression system


At the receiver end, the encoded bitstream received through the communication
channel has to be decoded before it can be displayed. The image de-compression
system, also known as image decoding system should do exact reversal of the
processes adopted during encoding.

(a) Image decoder – Performs exact reversal of the coder in image compression
system. This block extracts the quantized coefficients.
(b) De-quantizer- Performs inverse of the quantization operation in image
compression. Since quantizer itself is lossy, de-quantization can never exactly
recover the transformed coefficients.
(c) Inverse Transformer – Performs exact reversal of the transformation
operation carried out in the corresponding image compression system. The output
of this block can be used for display.
Differential Pulse Code Modulation (DPCM)

Encode the difference between the current and previous 8x8 block
From the behavior in the past, the future signal values can be approximately
estimated
DPCM encoder consists of three steps:

– Predict the current signal value x(n) from past values x(n-1), x(n-2), ..
– Quantize the difference signal = prediction error
– Encode, e.g. VLC the prediction difference (prediction error).

For most audio signals, the range of the differences in amplitude
between successive samples of the audio waveform is less than the range of the
actual sample amplitudes. Hence the difference is coded, and fewer bits are
required.
The register R stores the previous digitized sample of the analog input
signal, and the difference is computed by subtracting the current contents (R0) of
the register from the new digitized sample output by the ADC (PCM). The value in
the register is then updated by adding the computed difference signal to the
current register contents.
The decoder operates by simply adding the received difference signal
(DPCM) to the previously computed signal held in its register (PCM).

Because the output of the ADC is used directly, the computed difference signal,
known as the residual signal, contains the quantization errors produced by the
ADC. With basic DPCM, therefore, the previous value held in the register is only
an approximate value. A more accurate version of the new value can be obtained
by estimating the current signal from several previous values weighted by
prediction coefficients. As an example, in the diagram below the difference
signal is computed by subtracting varying proportions of the last three predicted
values from the current digitized value output by the ADC.

C1, C2 and C3 are the prediction coefficients.
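A minimal sketch of a third-order predictive DPCM encoder/decoder pair in Python (not from the source; the coefficient values c = (0.5, 0.3, 0.2) and the quantizer step size are arbitrary illustrations):

def dpcm_encode(samples, c=(0.5, 0.3, 0.2), step=4):
    # Third-order predictive DPCM: predict from the last three reconstructed
    # values, quantize the prediction error, and keep the decoder's view of
    # the signal in the "register" so encoder and decoder stay in step.
    reg = [0, 0, 0]                 # last three reconstructed values
    codes = []
    for x in samples:
        pred = c[0] * reg[0] + c[1] * reg[1] + c[2] * reg[2]
        err = x - pred
        q = int(round(err / step))  # coarse quantizer for the difference
        codes.append(q)
        recon = pred + q * step     # what the decoder will reconstruct
        reg = [recon, reg[0], reg[1]]
    return codes

def dpcm_decode(codes, c=(0.5, 0.3, 0.2), step=4):
    reg = [0, 0, 0]
    out = []
    for q in codes:
        pred = c[0] * reg[0] + c[1] * reg[1] + c[2] * reg[2]
        recon = pred + q * step
        out.append(recon)
        reg = [recon, reg[0], reg[1]]
    return out

if __name__ == "__main__":
    x = [100, 104, 109, 111, 110, 108, 105]
    codes = dpcm_encode(x)
    print(codes)
    print([round(v, 1) for v in dpcm_decode(codes)])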


Adaptive differential PCM

Principle: the number of bits used for the difference signal is varied, depending on
its amplitude.

• Adaptive DPCM is similar to DPCM, but adjusts the width of the quantization
steps (a minimal sketch of the adaptive step-size idea is given after this list).
• Encode the difference in 4 bits, but vary the mapping of bits to differences
dynamically:
• if the signal changes rapidly, use large differences;
• if it changes slowly, use small differences.
• Savings of bandwidth is possible by varying the number of bits used for the
difference signal depending on its amplitude (fewer bits to encode smaller
difference signals)
• An international standard for ADPCM is defined in ITU-T
Recommendation G.721. It is based on the same principle as DPCM,
except that an eighth-order predictor is used and the number of bits used to
quantize each difference is varied.
• This can be either 6 bits, producing 32 kbps, to obtain a better quality
output than with third-order DPCM, or 5 bits, producing 16 kbps, if lower
bandwidth is more important.

• A second ADPCM standard, a derivative of G.721, is defined in
ITU-T Recommendation G.722 (better sound quality).
• This uses subband coding in which the input signal prior to sampling is
passed through two filters: one which passes only signal frequencies in the
range 50Hz through to 3.5kHz and the other only frequencies in the range
3.5kHz through to 7kHz
• By doing this the input signal is effectively divided into two separate equal-
bandwidth signals, the first known as the lower subband signal and the
second the upper subband signal
• Each is then sampled and encoded independently using ADPCM, the
sampling rate of the upper subband signal being 16 ksps to allow for the
presence of the higher frequency components in this subband

• The use of two subbands has the advantage that different bit rates can be
used for each
• In general the frequency components in the lower subband have a higher
perceptual importance than those in the higher subband
• For example with a bit rate of 64 kbps the lower subband is ADPCM
encoded at 48kbps and the upper subband at 16kbps
• The two bitstreams are then multiplexed together to produce the transmitted
(64 kbps) signal – in such a way that the decoder in the receiver is able to
divide them back again into two separate streams for decoding
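The following is a minimal, generic sketch of the adaptive step-size idea in Python; it is not the standardized G.721/G.722 quantizer, and the adaptation factors (1.5 and 0.95) are arbitrary illustrative choices.

def adpcm_encode(samples, n_bits=4, step=2.0):
    # Adaptive DPCM sketch: a first-order predictor plus a quantizer whose
    # step size grows when the codes hit the outer levels (rapid changes)
    # and shrinks when they stay small (slow changes).  The decoder repeats
    # the same prediction and adaptation, so no side information is needed.
    q_max = 2 ** (n_bits - 1) - 1
    prev = 0.0                      # previous reconstructed value
    codes = []
    for x in samples:
        err = x - prev
        q = max(-q_max, min(q_max, int(round(err / step))))
        codes.append(q)
        prev += q * step            # reconstruction the decoder will also form
        step = max(0.5, step * (1.5 if abs(q) >= q_max else 0.95))
    return codes

if __name__ == "__main__":
    x = [0, 1, 2, 40, 80, 82, 83, 84, 84, 83]
    print(adpcm_encode(x))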
Adaptive predictive coding
• Even higher levels of compression possible at higher levels of complexity
• These can be obtained by also making the predictor coefficients adaptive
• In practice, the optimum set of predictor coefficients continuously vary since
they are a function of the characteristics of the audio signal being digitized
• To exploit this property, the input speech signal is divided into fixed time
segments and, for each segment, the currently prevailing characteristics are
determined.
• The optimum set of coefficients are then computed and these are used to
predict more accurately the previous signal
• This type of compression can reduce the bandwidth requirements to 8kbps
while still obtaining an acceptable perceived quality

Linear predictive coding (LPC) signal encoder and decoder

Linear predictive coding involves the source simply analyzing the audio
waveform to determine a selection of the perceptual features it contains.
• With this type of coding the perceptual features of an audio waveform are
analysed first
• These are then quantized and sent and the destination uses them, together
with a sound synthesizer, to regenerate a sound that is perceptually
comparable with the source audio signal
• With this compression technique, although the speech can often sound
synthetic, high levels of compression can be achieved
• In terms of speech, the three features which determine the perception of a
signal by the ear are its:
Pitch: this is closely related to the frequency of the signal. This is important
since the ear is more sensitive to signals in the range
2 to 5 kHz
Period: this is the duration of the signal
Loudness: This is determined by the amount of energy in the signal
• The input speech waveform is first sampled and quantized at a defined rate
• A block of digitized samples, known as a segment, is then analysed to
determine the various perceptual parameters of the speech that it contains
• The output of the encoder is a string of frames, one for each segment
• Each frame contains fields for pitch and loudness (the period is determined
by the sampling rate being used), a notification of whether the signal is
voiced (generated through the vocal cords) or unvoiced (the vocal cords are
open), and a new set of computed model coefficients

Delta modulation (DM)

• This is a simplified form of DPCM that uses a one-bit encoder, so it is
not suitable for fast-changing signals. It is inexpensive and simple to
implement.
• DM takes advantage of the fact that voice signals do not change abruptly
• This scheme sends only the difference between pulses, if the pulse at time
tn+1 is higher in amplitude value than the pulse at time tn, then a single bit,
say a “1”, is used to indicate the positive value.
• If the pulse is lower in value, resulting in a negative value, a “0” is used.
• This scheme works well for small changes in signal values between
samples.
• If changes in amplitude are large, this will result in large errors.
• Transmits information only to indicate whether the analog signal that is
being encoded goes up or goes down
• The Encoder Outputs are highs or lows that “instruct” whether to go up or
down, respectively
• The analog signal is quantized by a one-bit ADC (implemented as a
comparator)
• The comparator output is converted back to an analog signal with a 1-bit
DAC, and subtracted from the input after passing through an integrator
• The shape of the analog signal is transmitted as follows: a "1" indicates
that a positive excursion has occurred since the last sample, and a "0"
indicates that a negative excursion has occurred since the last sample.
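A minimal Python sketch of a delta modulator and demodulator (not from the source; the step size delta and the test signal are illustrative):

def dm_encode(samples, delta=1.0):
    # One-bit encoder: output 1 if the signal is above the running
    # approximation (go up), 0 if it is below (go down).
    approx = 0.0
    bits = []
    for x in samples:
        bit = 1 if x > approx else 0
        bits.append(bit)
        approx += delta if bit else -delta
    return bits

def dm_decode(bits, delta=1.0):
    approx = 0.0
    out = []
    for bit in bits:
        approx += delta if bit else -delta
        out.append(approx)
    return out

if __name__ == "__main__":
    import math
    x = [5 * math.sin(0.3 * n) for n in range(20)]
    bits = dm_encode(x)
    print(bits)
    print([round(v, 1) for v in dm_decode(bits)])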

3. Lossless Compression
-Run Length Coding, Statistical Coding, Huffman Coding, Dictionary
Coding, Arithmetic Coding.
Entropy encoding
• Entropy encoding is lossless and independent of the type of information
(semantic/ structure of the source information) that is being compressed. It
is concerned only with how the information is represented.
• e.g. Run-length encoding
• Statistical encoding
» Huffman encoding
» Arithmetic encoding
Run length coding- Lossless compression
Run-length coding is a compression technique in which sequences of the
same byte are replaced with a flag and the number of occurrences. Any unused
special symbol that does not appear in the stream can be used as the flag.
• It is applied when the source information contains long substrings of the same
character or binary digit.

Example 1
Let a data stream contain AAAAABCDDDDDDF.
The run-length coding of this will be A!5BCD!6F.

Example 2: the data stream is 00000001111100011.

The run lengths are 0 × 7, 1 × 5, 0 × 3, 1 × 2 ⇒ 7, 5, 3, 2 ⇒ 111, 101, 011, 010.
The run lengths are sent in binary form; a fixed number of bits per codeword is
assumed, determined by the largest possible substring.
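A minimal Python sketch of run-length encoding with a flag character (not from the source; the flag symbol and the run-length threshold of 4 are illustrative choices that reproduce Example 1):

def rle_encode(data, flag="!", threshold=4):
    # Replace runs of the same character with char + flag + run length,
    # but only when the run is long enough to be worth it.
    out = []
    i = 0
    while i < len(data):
        j = i
        while j < len(data) and data[j] == data[i]:
            j += 1
        run = j - i
        if run >= threshold:
            out.append(f"{data[i]}{flag}{run}")
        else:
            out.append(data[i] * run)
        i = j
    return "".join(out)

if __name__ == "__main__":
    print(rle_encode("AAAAABCDDDDDDF"))   # -> A!5BCD!6F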

Statistical encoding
• Based on the probability of occurrence of a pattern.
• Statistical encoding uses a set of variable-length code words, with the shorter
code words used to represent the more frequently occurring symbols.
• Ensure that a shorter codeword in the set does not form the start of a longer
codeword, otherwise the interpretation will be wrong.
• "Prefix property": a shorter codeword must not form the start of a longer
codeword.
• Example: the Huffman coding algorithm.
Entropy
The theoretical minimum average number of bits that are required to transmit a
particular source stream is known as the entropy of the source:

H = − Σ (i = 1 to N) p(si) log2 p(si)

where N is the number of different symbols in the source stream and p(si) is the
probability of occurrence of symbol si.
How to calculate the average codeword length

The average number of bits per codeword is given by

L = Σ (i = 1 to N) Ni p(si)

where Ni is the bit length of the i-th codeword of the codebook.

Find the average codeword length for the given table:

Symbol  A     B     C     D     E
P(S)    0.25  0.30  0.12  0.15  0.18
Code    01    11    100   101   00

L = 0.25 × 2 + 0.30 × 2 + 0.12 × 3 + 0.15 × 3 + 0.18 × 2 = 2.27 bits
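A short Python check of this calculation, which also computes the entropy of the same source for comparison (the dictionary layout is just an illustration):

import math

symbols = {"A": (0.25, "01"), "B": (0.30, "11"), "C": (0.12, "100"),
           "D": (0.15, "101"), "E": (0.18, "00")}

avg_len = sum(p * len(code) for p, code in symbols.values())
entropy = -sum(p * math.log2(p) for p, _ in symbols.values())

print(f"average codeword length = {avg_len:.2f} bits")  # 2.27 bits
print(f"entropy                 = {entropy:.2f} bits")  # about 2.24 bits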

Entropy Encoding: Example

A source comprises only the six different characters M, F, Y, N, 0 and 1,

with frequencies of occurrence 0.25, 0.25, 0.125, 0.125, 0.125 and 0.125.

Suppose the encoding algorithm uses the following set of codewords:
M = 10, F = 11, Y = 010, N = 011, 0 = 000, 1 = 001
The average codeword length is then 2 × (0.25 × 2) + 4 × (0.125 × 3) = 2.5 bits,
which equals the entropy of the source, so this set of codewords is optimal.
Huffman Encoding
Lossless statistical encoding.
Huffman coding is a variable-length code in which a codeword's length is inversely
related to the corresponding character's frequency; it must satisfy the prefix
property (no codeword forms the start of another) to be uniquely decodable.
• Statistical encoding.
• To determine the Huffman code, it is useful to construct a binary tree:
• the leaves are the characters to be encoded;
• the nodes carry the occurrence probabilities of the characters belonging to
their sub-trees.

Huffman coding algorithm: Method of construction for an encoding tree


• Full Binary Tree Representation
• Each edge of the tree has a value,
(0 is the left child, 1 is the right child)
• Data is at the leaves, not internal nodes
• Result: encoding tree
Problem 1
Find the Huffman code for the following
P(A)=0.16, P(B)=0.51, P(C)=0.09, P(D)=0.13, P(E)=0.11

Solution
Step 1: Sort all symbols according to their probabilities (left to right), from
smallest to largest; these are the leaves of the Huffman tree.

Step 2: Build a binary tree from left to right.

Policy: always connect the two smallest nodes together (e.g., P(CE) and P(DA)
were both smaller than P(B), hence those two were connected first).

Step 3: Label the left branches of the tree with 0 and the right branches of the tree
with 1.
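A minimal Python sketch of the same construction using a heap (not from the source). The 0/1 labelling, and hence the exact codewords, may differ from a hand-drawn tree, but the codeword lengths and the average length are the same.

import heapq
import itertools

def huffman_codes(probs):
    # Build the Huffman tree by repeatedly merging the two least probable
    # nodes; itertools.count breaks ties so tuples never compare the dicts.
    counter = itertools.count()
    heap = [(p, next(counter), {sym: ""}) for sym, p in probs.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        p1, _, codes1 = heapq.heappop(heap)
        p2, _, codes2 = heapq.heappop(heap)
        merged = {s: "0" + c for s, c in codes1.items()}
        merged.update({s: "1" + c for s, c in codes2.items()})
        heapq.heappush(heap, (p1 + p2, next(counter), merged))
    return heap[0][2]

if __name__ == "__main__":
    probs = {"A": 0.16, "B": 0.51, "C": 0.09, "D": 0.13, "E": 0.11}
    codes = huffman_codes(probs)
    for sym in sorted(codes):
        print(sym, codes[sym])   # B gets a 1-bit code, the others 3 bits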
Example 2
Find the Huffman code for A, B, C, D, E, F, G, H with respective probabilities of
occurrence
0.25, 0.25, 0.14, 0.14, 0.055, 0.055, 0.055, 0.055.

• Entropy, H: the theoretical minimum average number of bits that are required to
transmit a particular stream:

H = − Σ (i = 1 to N) Pi log2 Pi

where N is the number of symbols and Pi is the probability of symbol i.

• Efficiency, E = H / H′

where H′ = average number of bits per codeword = Σ (i = 1 to N) Ni Pi,
and Ni is the number of bits of symbol i.

Student’s exercise
1. Calculate the entropy of the above two problems :
2. Construct the Huffman coding tree for
Symbol A B C D E
Probability 0.25 0.30 0.12 0.15 0.18

Arithmetic Coding
Arithmetic coding is an optimal algorithm, like Huffman coding, with respect to
compression ratio. It is also a lossless compression method. Arithmetic coding
yields a single codeword for each encoded string of characters.

It is better than Huffman coding with respect to the additional information that

must be transmitted along with the compressed data:
– Huffman – needs to transmit Huffman tables with compressed data
– Arithmetic – needs to transmit length of encoded string with
compressed data
– Each symbol is coded by considering the prior data
– Encoded data must be read from the beginning, there is no random
access possible
– Each real number (< 1) is represented as a binary fraction:
0.5 = 2⁻¹ (binary fraction 0.1); 0.25 = 2⁻² (binary fraction 0.01);
0.625 = 0.5 + 0.125 (binary fraction 0.101)
Algorithm
The first step is to divide the numeric range from 0 to 1 into intervals, one for each
different character present in the message to be sent.
• Assign every symbol a range on this line based on its probability.
• The higher the probability, the larger the range assigned to it.
• Sort the symbols according to their probability (lowest to highest).
• Start encoding the symbols.
• Subdivide the range of the first token according to the probabilities of the second
token, then the third, and so on.
Example problem
Encode the symbols "CAEE$" using arithmetic coding, where the probabilities
are P(A) = 0.2, P(B) = 0.1, P(C) = 0.2, P(D) = 0.05, P(E) = 0.3, P(F) = 0.05, P($) = 0.1.

The encoding of CAEE$ is any number in
the range 0.33184 to
0.3322.
(Upper value = lower value
+ probability of the particular character ×
range of the prior parent interval.)
For example, the range of "CA" is:
lower value = 0.3
upper value = 0.3 + 0.2 × 0.2 = 0.34
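A minimal Python sketch that reproduces this interval narrowing (not from the source; the symbol ranges are assigned in the listed order A, B, C, D, E, F, $):

def arithmetic_interval(message, probs):
    # Successively narrow [low, high) using the cumulative probability
    # range of each symbol; any number inside the final interval
    # identifies the message.
    ranges = {}
    start = 0.0
    for sym, p in probs.items():     # ranges in the listed symbol order
        ranges[sym] = (start, start + p)
        start += p
    low, high = 0.0, 1.0
    for sym in message:
        width = high - low
        s_low, s_high = ranges[sym]
        low, high = low + width * s_low, low + width * s_high
    return low, high

if __name__ == "__main__":
    probs = {"A": 0.2, "B": 0.1, "C": 0.2, "D": 0.05,
             "E": 0.3, "F": 0.05, "$": 0.1}
    print(arithmetic_interval("CAEE$", probs))  # about (0.33184, 0.3322)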

Dictionary coding- Lemple-Ziv coding


• LZW/ Dictionary-based compression algorithm
• A table containing all the possible character strings
• Initially, the dictionary held by both the encoder and decoder contains only
the character set
• The remaining entries in the dictionary are then built up dynamically by
both the encoder and decoder and contain the words that occur in the text
• In this technique a dictionary of data patterns is created by observing
the occurrence of data in an uncompressed data stream; when there is
a match with a pattern, the output is taken from that table.
• LZW uses fixed-length codewords to represent variable-length strings of
symbols/characters that commonly occur together, e.g., words in English
text.
• The LZW encoder and decoder build up the same dictionary dynamically
while receiving the data.
• Before sending each word in the form of single characters, the encoder
first checks whether the word is currently stored in its dictionary
and, if it is, sends only the index for the word

• When the available space becomes full, the number of entries is allowed to
increase incrementally

Example: LZW compression for the string "ABABBABCABABBA".
Given the initial dictionary:

Code    1  2  3
String  A  B  C

Input: ABABBABCABABBA

The output codes are: 1 2 4 5 2 3 4 6 1.

Instead of sending 14 characters, only 9 codes need to be sent.
Compression ratio = 14/9 ≈ 1.56
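A minimal Python sketch of the LZW encoder that reproduces this output (not from the source; codes start at 1 to match the table above):

def lzw_encode(text, alphabet=("A", "B", "C")):
    # The dictionary starts with the single characters; longer strings are
    # added dynamically as they are seen, exactly as the decoder will do.
    dictionary = {ch: i + 1 for i, ch in enumerate(alphabet)}
    next_code = len(dictionary) + 1
    w = ""
    output = []
    for ch in text:
        if w + ch in dictionary:
            w += ch
        else:
            output.append(dictionary[w])
            dictionary[w + ch] = next_code
            next_code += 1
            w = ch
    if w:
        output.append(dictionary[w])
    return output

if __name__ == "__main__":
    print(lzw_encode("ABABBABCABABBA"))   # -> [1, 2, 4, 5, 2, 3, 4, 6, 1]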

DECODING: 1 2 4 5 2 3 4 6 1
The decoder rebuilds the same dictionary while reading the codes, where K is the
subsequent input code.
4. Lossy Compression
-Transform coding, DFT, DCT, Haar Transform, KLT, Wavelet Transforms,
Embedded Zero Tree Coder

Lossy compression: Transform encoding


Transform encoding is one of the source encoding processes.
• Source encoding exploits a particular property of the source information in
order to produce an alternative form of representation that is either a
compressed version of the original form or is more amenable to the
application of compression.
• There are two types:
1. Differential encoding
2. Transform encoding

Differential encoding
Instead of sending large codewords derived from the source information, a set of
smaller codewords is sent, each indicating the difference in amplitude between the
current value of the signal being encoded and the immediately preceding value.
For example, 12 bits may be required to represent the dynamic range (current value)
of the signal, but only 3 bits to transmit the difference between successive
samples.
Example 2
If the sequence of DC coefficients is
12, 13, 11, 11, 10, ...
then the corresponding difference values are
12, 1, −2, 0, −1, ...
Difference values are encoded in the form (SSS, value), where the SSS field
indicates the number of bits needed to encode the value and the value field holds
the actual bits that represent the value.

Value   SSS field   Value field
12      4           1100
1       1           1
−2      2           01
0       0
−1      1           0

Differential coding can be either lossless or lossy: if a sufficient number of bits is
used to encode the difference values it is lossless; if an insufficient number of bits
is used it is lossy.
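A short Python sketch of the difference and SSS (bit-count) computation for this example (not from the source; the helper names are illustrative and the value-field bit patterns themselves are not generated):

def sss_category(value):
    # SSS = number of bits needed to represent the magnitude of the
    # difference (0 for a zero difference), as in the table above.
    return 0 if value == 0 else abs(value).bit_length()

def dc_differences(coefficients):
    # The first value is sent as-is, then successive differences follow.
    diffs = [coefficients[0]]
    diffs += [b - a for a, b in zip(coefficients, coefficients[1:])]
    return diffs

if __name__ == "__main__":
    dc = [12, 13, 11, 11, 10]
    for d in dc_differences(dc):
        print(d, sss_category(d))   # 12->4, 1->1, -2->2, 0->0, -1->1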

Transform Encoding/source encoding


• Purpose of transformation is to convert the data into a form where
compression is easier. This transformation will transform the pixels which
are correlated into a representation where they are decorrelated. The new
values are usually smaller on average than the original values. The net
effect is to reduce the redundancy of representation.
The human eye is less sensitive to the higher spatial frequency components
-- If we can transform the original spatial form of representation into an
equivalent representation involving spatial frequency components, then we can
more readily identify and eliminate those higher frequency components which the
eye cannot detect thereby reducing the volume of information
While scanning across a set of pixel locations, the rate of change in magnitude
will vary.
If all the pixel values remain the same, the rate of change is zero. If each pixel
magnitude changes from one location to the next, there will be a high rate of
change. The rate of change in pixel magnitude in the matrix is spatial frequency
Amplitudes of the spatial frequencies are determined by the relative changes in
magnitudes of the pixels

Block diagram of Transform encoder:

Examples of transform encoding:

DFT,
DCT,
Haar Transform,
KLT,
Wavelet Transforms,
Embedded Zero Tree Coder.
DCT - Discrete Cosine Transform (DCT)

The DCT of an image produces a set of numbers called coefficients.
A coefficient's usefulness is determined by its variance over a set
of images (as in the case of video). If a coefficient has a large variance over the
set, it cannot be removed without affecting the picture quality.

Conventional image data have reasonably high inter-element correlation. The DCT

avoids the generation of the spurious spectral components that are a problem
with the DFT, and it has a fast implementation that avoids complex arithmetic.

Histograms of 8×8 DCT coefficient amplitudes measured for natural images show
that the DC coefficient is typically uniformly distributed, while the AC
coefficients have a Laplacian distribution with zero mean.

The basic idea is to decompose the image into a set of "waveforms", each with a
particular spatial frequency.
To human eyes, high spatial frequencies are imperceptible, and a good
approximation of the image can be created by keeping only the lower
frequencies.

DCT Algorithm
1. Divide the picture into 16 by 16 blocks (macroblocks).
2. Each macroblock is 16 pixels by 16 lines and contains 4 blocks.
3. Each block is 8 pixels by 8 lines.
4. Each pixel in a block represents the intensity value at a particular position,
f(i,j), where i is the row and j the column.
5. Level shift so that all values are centred on zero: the 8-bit grayscale values
(range 0 to 255) are level shifted by 128.
6. Apply the DCT over the 8×8 block to obtain the frequency coefficients F(u,v),
where F(u,v) is the coefficient in row u and column v of the DCT matrix.
7. F(0,0) is the DC coefficient; the coefficients with u = 1..7 for v = 0, u = 0 for
v = 1..7, and u = 1..7, v = 1..7 are the AC coefficients.
8. For most images, much of the signal energy lies at low frequencies; these
appear in the upper left corner of the DCT.
9. Compression is achieved since the lower right values represent higher
frequencies and are often small, small enough to be neglected with little
visible distortion.

The DCT transform F(u,v) for the input block f(i,j) is

F(u,v) = (1/4) C(u) C(v) Σ (i = 0 to 7) Σ (j = 0 to 7) f(i,j) cos[(2i + 1)uπ/16] cos[(2j + 1)vπ/16]

where C(u) = 1/√2 for u = 0 and C(u) = 1 otherwise (and likewise for C(v)).

All 64 values in the input matrix f(i,j) contribute to each entry in the transformed
matrix F(u,v).
Among the 2-D 8×8 basis functions of the DCT, the horizontal frequency of the basis
functions increases from left to right and the vertical frequency of the basis
functions increases from top to bottom.
The DCT coefficient values can be regarded as the relative
amounts of the 2-D spatial frequencies contained in the
8×8 block. The upper-left coefficient is called the
DC coefficient and is a measure of the average
energy of the block. The other coefficients are called AC
coefficients; the coefficients corresponding to high frequencies
tend to be zero or near zero for most natural images.
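A direct (unoptimized) Python sketch of the 8×8 forward DCT defined above; the constant test block is just an illustration:

import numpy as np

def dct_2d_8x8(block):
    # Direct implementation of the 8x8 forward DCT:
    # F(u,v) = 1/4 C(u) C(v) sum_i sum_j f(i,j)
    #          cos((2i+1)u*pi/16) cos((2j+1)v*pi/16)
    def c(k):
        return 1 / np.sqrt(2) if k == 0 else 1.0
    F = np.zeros((8, 8))
    for u in range(8):
        for v in range(8):
            s = 0.0
            for i in range(8):
                for j in range(8):
                    s += (block[i, j]
                          * np.cos((2 * i + 1) * u * np.pi / 16)
                          * np.cos((2 * j + 1) * v * np.pi / 16))
            F[u, v] = 0.25 * c(u) * c(v) * s
    return F

if __name__ == "__main__":
    # Level-shift an 8-bit block by 128 before transforming
    block = np.full((8, 8), 130.0) - 128.0
    print(np.round(dct_2d_8x8(block), 2))   # energy concentrated in F(0,0)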

Karhunen-Loeve Transform (KLT)


• The Karhunen-Loeve transform is a reversible linear transform that
exploits the statistical properties of the vector representation.
• It optimally de-correlates the input signal.
• To understand the optimality of the KLT, consider the autocorrelation
matrix RX of the input vector X, defined as RX = E[X Xᵀ].

Our goal is to find a transform T such that the components of the output Y are
uncorrelated, i.e. E[Yt Ys] = 0 if t ≠ s.
Thus the autocorrelation matrix of Y takes on the form of a positive diagonal
matrix.
• Since any autocorrelation matrix is symmetric and non-negative definite,
there are k orthogonal eigenvectors u1, u2, ..., uk and k corresponding real
and non-negative eigenvalues λ1 ≥ λ2 ≥ ··· ≥ λk ≥ 0.
If we define the Karhunen-Loève transform as
T = [u1, u2, ..., uk]ᵀ
• then the autocorrelation matrix of Y becomes RY = T RX Tᵀ = diag(λ1, λ2, ..., λk).
Example: given the four input vectors x1 = (4, 4, 5)ᵀ, x2 = (3, 2, 5)ᵀ,
x3 = (5, 7, 6)ᵀ and x4 = (6, 7, 7)ᵀ, the mean vector is
mx = (1/4) Σ xi = (4.5, 5, 5.75)ᵀ.
Subtract the mean vector from each input vector and
apply the KLT to obtain the output vectors yi = T(xi − mx).
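A minimal numpy sketch of this example (not from the source): it computes the mean, the autocorrelation matrix of the centred vectors, and the KLT from its eigenvectors, then verifies that the transformed components are decorrelated.

import numpy as np

# The four example input vectors (as columns) from the text
X = np.array([[4, 3, 5, 6],
              [4, 2, 7, 7],
              [5, 5, 6, 7]], dtype=float)

m = X.mean(axis=1, keepdims=True)          # mean vector = (4.5, 5, 5.75)
Xc = X - m                                 # subtract the mean

# Autocorrelation (covariance) matrix of the centred vectors
R = Xc @ Xc.T / X.shape[1]

# Eigenvectors of R, ordered by decreasing eigenvalue, form the KLT
eigvals, eigvecs = np.linalg.eigh(R)
order = np.argsort(eigvals)[::-1]
T = eigvecs[:, order].T                    # rows are u1^T, u2^T, u3^T

Y = T @ Xc                                 # transformed (decorrelated) vectors
print(np.round(T @ R @ T.T, 6))            # ~diagonal: outputs are uncorrelated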
Wavelet based coding – Haar Transform
The Haar transform is a very fast transform and the easiest wavelet transform. It
is useful in edge detection, image coding and image analysis problems. Its energy
compaction is fair, but it is not among the best compression algorithms.
The simplest wavelet transform is the so-called Haar wavelet transform. Here we
repeatedly take averages and differences and keep the result at every step; in this
way we perform a multiresolution analysis, which creates smaller and smaller
images, i.e. 1/4, 1/16 of the original size, and so on.
Haar Transform steps
1. Find the average of each pair of samples.

2. Find the difference between each average and its samples.

3. Fill the first half of the array with the averages.
4. Normalize.
5. Fill the second half of the array with the differences.

6. Repeat the process on the first half of the array.

Example: for the pair of samples (2, 6), the average is (2 + 6)/2 = 4 and the
difference (detail) is (2 − 6)/2 = −2.
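A minimal Python sketch of the 1-D averaging/differencing version of the Haar transform (not from the source; the normalization step is omitted for clarity and the test signal is illustrative):

def haar_1d(signal):
    # One-dimensional Haar transform: repeatedly replace the first half of
    # the working array with pairwise averages and the second half with the
    # corresponding differences (details), then recurse on the first half.
    data = list(signal)
    length = len(data)
    while length > 1:
        half = length // 2
        averages = [(data[2 * i] + data[2 * i + 1]) / 2 for i in range(half)]
        details = [(data[2 * i] - data[2 * i + 1]) / 2 for i in range(half)]
        data[:half] = averages
        data[half:length] = details
        length = half
    return data

if __name__ == "__main__":
    print(haar_1d([2, 6, 5, 3]))   # -> [4.0, 0.0, -2.0, 1.0]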
2D Wavelet Transform

The wavelet transform represents a signal with good resolution in both frequency
and time by using a set of wavelet basis functions.
The objective of the wavelet transform is to decompose the input signal into a set
of basis components (wavelets) that are easier to deal with, or that have some
components that can be thresholded away for compression purposes.

There are two types:

the Continuous Wavelet Transform (CWT) and the Discrete Wavelet Transform (DWT).

Discrete wavelets are formed from a mother wavelet, but with scale and
shift in discrete steps. The DWT makes the connection between wavelets in the
continuous time domain and filter banks in the discrete time domain in a
multiresolution analysis framework.

• The continuous wavelet transform (CWT) of a 1-D signal is defined as

Wa f(b) = ∫ f(x) ψa,b(x) dx

• The wavelet ψa,b is computed from the mother wavelet ψ by translation and dilation:

ψa,b(x) = (1/√a) ψ((x − b)/a)

• where a is the scaling parameter and b is the shifting parameter.

Discrete Wavelet Transform

The DWT operates on discrete samples of the input signal.
The key concepts in the discrete wavelet transform are
• shifting and
• scaling of the wavelet in discrete steps (extended to a 2-D function for images).

2D Wavelet Transform steps


• Convolve each row of image with h[0] and h[1]
• Discard the odd numbered columns of resulting arrays
• Concatenate to form a transformed row
• After all rows have been transformed, convolve each column of the result
with h[0] and h[1]
• Discard the odd numbered rows
• Concatenate to form a transformed result
• The transformed image contains LL, LH, HL and HH subbands

After the above steps, one stage of the DWT is complete. The transformed
image now contains four subbands LL, HL, LH, and HH, standing for low-low,
high-low, etc. The LL subband can be further decomposed to yield yet another
level of decomposition. This process can be continued until the desired number
of decomposition levels is reached
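A minimal numpy sketch of one stage of this row/column filtering scheme (not from the source): it uses simple Haar analysis filters for h[0] and h[1], whereas the example in the text uses the Antonini 9/7 filter set, and the subband labels follow one common convention.

import numpy as np

# Haar analysis filters (an illustrative choice, not the Antonini 9/7 set)
H0 = np.array([0.5, 0.5])    # low-pass
H1 = np.array([0.5, -0.5])   # high-pass

def analyze_rows(img, h):
    # Convolve each row with the filter and keep every second sample
    return np.array([np.convolve(row, h, mode="full")[1::2] for row in img])

def dwt2_one_level(img):
    low = analyze_rows(img, H0)
    high = analyze_rows(img, H1)
    # Repeat the filtering on the columns of the row-filtered results
    LL = analyze_rows(low.T, H0).T
    LH = analyze_rows(low.T, H1).T
    HL = analyze_rows(high.T, H0).T
    HH = analyze_rows(high.T, H1).T
    return LL, LH, HL, HH

if __name__ == "__main__":
    img = np.arange(64, dtype=float).reshape(8, 8)
    LL, LH, HL, HH = dwt2_one_level(img)
    print(LL.shape, LH.shape, HL.shape, HH.shape)   # each subband is 4x4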

2D Wavelet Transform Example

The input image is a sub-sampled version of the image Lena. The size of the
input is 16×16, and the filter used in the example is the Antonini 9/7 filter set.

(a) Original image; (b) 16×16 sub-sampled image.

This completes one stage of the discrete wavelet transform. The second stage of
the DWT is obtained by applying the same procedure to the upper-left 8×8
subimage; the result is I12(x, y).

Wavelet Packets
In the usual dyadic wavelet decomposition, only the low-pass filtered subband is
recursively decomposed and thus can be represented by a logarithmic tree
structure. A wavelet packet decomposition allows the decomposition to be
represented by any pruned subtree of the full tree topology.
The wavelet packet decomposition is very flexible, since a best wavelet
basis in the sense of some cost metric can be found within a large library of
permissible bases. The computational requirement for wavelet packet
decomposition is relatively low, as each decomposition can be computed in the
order of N log N operations using fast filter banks.
Embedded Zero-tree of Wavelet

Effective and computationally efficient for image coding.


The EZW algorithm addresses two problems:
1. Obtaining the best image quality for a given bit-rate,
2. Accomplishing this task in an embedded fashion.

Using an embedded code allows the encoder to terminate the encoding at any
point. Hence, the encoder is able to meet any target bit-rate exactly. Similarly, a
decoder can cease to decode at any point and can produce reconstructions
corresponding to all lower-rate encodings

• The EZW algorithm is used for low bit-rate image coding.

• The embedded code contains all lower-rate codes embedded at the
beginning of the bit stream.

• The EZW algorithm uses a new data structure called the zero-tree.

• Using the hierarchical wavelet decomposition, we can relate every


coefficient at a given scale to a set of coefficients at the next finer scale of
similar orientation.

• The coefficient at the coarse scale is called the "parent", while all
corresponding coefficients at the next finer scale, of the same spatial
location and similar orientation, are called "children".

• The EZW algorithm consists of two central components: the zero-tree data
structure and the method of successive approximation quantization.
Subband layout after the first and second stages of decomposition: the first stage
produces the LL1, HL1, LH1 and HH1 subbands; the second stage further splits
LL1 into LL2, HL2, LH2 and HH2.

• Zero-tree structure and order of scanning format

The significance map is coded using the zero-tree with a four-symbol alphabet.
The four symbols are:

• Positive significance: the coefficient is significant, with a positive value
whose magnitude is greater than the threshold T.
• Negative significance: the coefficient is significant, with a negative value
whose magnitude is greater than the threshold T.
• zr (zerotree root): the magnitude of the coefficient is less than T (it is
insignificant), and all its descendants also have magnitudes less than T.
• iz (isolated zero): the coefficient itself is less than T, but some of its
descendants have values greater than T.
The flow chart of the EZW algorithm:

Example: illustrate the steps of the EZW algorithm.

Let the image map be

26   6  13  10
−7   7   6   4
 4  −4   4  −3
 2  −2  −2   0

1. Decompose the image map.
2. Fix the threshold value =
