
Discrete Cosine Transform and JPEG Files

Amber Habib
Mathematical Sciences Foundation
Delhi
www.mathscifound.org

Abstract
A digital image can be viewed as an array of numbers, each number
representing the colour value of the corresponding pixel. In the JPEG
format, these numbers are stored indirectly, via their discrete cosine
transform. This enables easy compression, resizing, etc. For further
savings, the array produced by the discrete cosine transform is stored
using Huffman encoding.
The calculations and plotting were carried out using Mathematica.
These notes were prepared for MSF’s Programme in Mathematical
Simulation and IT. They provided the base for student projects in
image manipulation using Matlab. The projects used Fourier analysis
as well as wavelets.

Contents
1 Discrete Fourier Transform
2 Discrete Cosine Transform
3 The Two Dimensional Discrete Cosine Transform
4 Huffman Encoding


1 Discrete Fourier Transform

Consider the data depicted in the following graph:

[Plot: the data points]

To represent this data in a way that can be easily manipulated for different
purposes, we wish to construct a function that passes through all the data
points.

More specifically, we construct a function of the form


f(x) = A_0/2 + Σ_{k=1}^{5} A_k cos(kx) + Σ_{k=1}^{4} B_k sin(kx),        (1)

where the A_k’s and B_k’s are suitably chosen constants. This function is
called the discrete Fourier transform of the data. Note that we have 10 data
points¹ and 10 unknown constants. Further, each data point, on substitution
in (1), creates a linear equation for the unknowns. So we can hope to solve
this linear system and obtain a unique set of values for the A_k and B_k.

For our data, the Ak ’s turn out to be given by the vector

A = (5.65486, −0.628319, −0.628319, −0.628319, −0.628319, −0.314159),

and the Bk ’s by the vector

B = (−1.93377, −0.864806, −0.4565, −0.204153, 0).
¹ Actually 11, but the last is just a repeat of the first.

Note: We have not said anything about how to find the coefficients Ak and
Bk . Our immediate interest is in observing that this knowledge is useful,
and then later we will see how to obtain it.
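For readers who want to experiment right away, the coefficients can be obtained from the standard discrete-Fourier sums rather than by solving the linear system; here is a minimal Python sketch (the calculations in these notes were done in Mathematica). The sample values below are made up, since the original graph's data is not listed, and the halving of the k = 5 term is the usual convention for an even number of sample points, which appears consistent with the A_5 value listed above.

```python
import math

N = 10
x = [2 * math.pi * n / N for n in range(N)]
# Made-up sample values; the notes only show the data as a graph.
data = [3.0, 4.1, 5.2, 4.8, 3.9, 2.7, 1.5, 1.2, 2.0, 2.9]

A = [(2 / N) * sum(data[n] * math.cos(k * x[n]) for n in range(N))
     for k in range(6)]
A[5] /= 2                      # the Nyquist term k = N/2 gets half weight
B = [(2 / N) * sum(data[n] * math.sin(k * x[n]) for n in range(N))
     for k in range(1, 5)]

def f(t):
    """The interpolant of equation (1)."""
    return (A[0] / 2
            + sum(A[k] * math.cos(k * t) for k in range(1, 6))
            + sum(B[k - 1] * math.sin(k * t) for k in range(1, 5)))

# f passes exactly through all ten data points
print(all(abs(f(x[n]) - data[n]) < 1e-9 for n in range(N)))
```

With any choice of sample values, the interpolation property holds exactly, which is why the plotted transform passes through every data point.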

The discrete Fourier transform f (x) passes exactly through the data points:

[Plot: the interpolant f(x) passing exactly through the data points]

Now we investigate the contribution of the different coefficients Ak and Bk .


Suppose we set A4 and B4 to zero. Then the function becomes:

[Plot: f(x) with the A_4 and B_4 terms set to zero]

This function doesn’t represent the data exactly but it does roughly follow
the general trend.

Now let us instead drop the A2 and B2 terms:



[Plot: f(x) with the A_2 and B_2 terms dropped]

The loss in quality is much greater. This shows that the “higher order”
terms contribute less than the “lower order” terms. Therefore, we need not
store them to the same order of accuracy.

Suppose then, that we round off the last couple of coefficients of A and B:

A = (5.65486, −0.628319, −0.628319, −0.628319, −0.6, −0.3),


B = (−1.93377, −0.864806, −0.4565, −0.2, 0).

This makes no noticeable difference to the accuracy of the interpolation:

[Plot: the interpolant with the rounded coefficients]

2 Discrete Cosine Transform

It is possible to manipulate the discrete Fourier transform of a set of data
so that it consists of only cosine terms (all the B_k’s are zero). The benefit
is simpler computational procedures, especially when we deal with data which
comes as arrays instead of lists. Thus, consider a string of
data, such as

Data = (123, 157, 142, 127, 131, 102, 99, 235).

Instead of distributing these values at evenly spaced points 2πk/8, k =
0, 1, . . . , 7, we associate them to the points π(2k + 1)/16. Further, we
symmetrically assign the same values to the points −π(2k + 1)/16. Thus, we
get a collection of data points that is symmetric with respect to the y-axis:

[Plot: the symmetrically extended data, plotted for x between −π and π]

If we calculate the discrete Fourier transform for such data, we find that the
sine terms vanish (because sine is odd) and only the cosine terms remain
(because cosine is even, like the data). This special form is called the discrete
cosine transform of the data.

For data such as we have given (8 points), the discrete cosine transform is
f(x) = A_0/2 + Σ_{k=1}^{7} A_k cos(kx),

where the Fourier coefficients Ak are given by


A_k = (1/4) Σ_{n=0}^{7} Data(n) cos((2n + 1)kπ/16).

Note that we have numbered the data points as 0, 1, . . . , 7.

For the example we have given, this formula produces the following values
for the Fourier coefficients:
A_0 = 279      A_1 = −10.04    A_2 = 24.25    A_3 = −35.36
A_4 = 20.51    A_5 = −28.66    A_6 = 6.79     A_7 = −4.22
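These values are easy to check numerically; here is a short Python sketch (the notes' own computations were done in Mathematica). One caveat on conventions: the formula above gives A_0 = 279, and it is A_0/2 = 139.5, the mean of the data, that appears as the constant term of f.

```python
import math

data = [123, 157, 142, 127, 131, 102, 99, 235]

# A_k = (1/4) * sum_n data[n] * cos((2n+1) k pi / 16)
A = [0.25 * sum(d * math.cos((2 * n + 1) * k * math.pi / 16)
                for n, d in enumerate(data))
     for k in range(8)]

def f(x):
    return A[0] / 2 + sum(A[k] * math.cos(k * x) for k in range(1, 8))

# The cosine series reproduces the data at the points x_n = (2n+1) pi / 16
assert all(abs(f((2 * n + 1) * math.pi / 16) - d) < 1e-9
           for n, d in enumerate(data))
print([round(a, 2) for a in A])
```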

The corresponding cosine transform f (x) exactly passes through the data
points:

[Plot: the cosine series f(x) passing exactly through the 8 data points, for x between 0 and π]

3 The Two Dimensional Discrete Cosine Transform

A digital image consists of a rectangular array of closely packed pixels, each
of which is assigned a colour value. These colour values are given by numbers,
and various formats exist for mapping colours to corresponding numbers.
For instance, in one format, colours are broken up into their red, green and
blue (RGB) components and a particular colour is chosen by assigning a
corresponding intensity (via a number) to each of the RGB components.
For example, the background colour of the following box is obtained by
setting R = G = B = 0.8:

A Shaded Box

Thus, a digital image, for the mathematician, is just an array of numbers.
To manipulate this array, we use a two dimensional version of the discrete
cosine transform.

Consider the data depicted in the following table:



123 157 142 127 131 102  99 235
134 135 157 112 109 106 108 136
135 144 159 108 112 118 109 126
176 183 161 111 186 130 132 133
137 149 154 126 185 146 131 132
121 130 127 146 205 150 130 126
117 151 160 181 250 161 134 125
168 170 171 178 183 179 112 124

We number the rows and columns as

0, 1, 2, . . . , 7.

Thus the (0, 0) entry is 123, the (3, 7) entry is 133, etc.

To this data, we apply the two dimensional discrete cosine transform, defined
by:

DCT(u, v) = (1/4) C(u) C(v) Σ_{x=0}^{7} Σ_{y=0}^{7} Data(x, y) cos((2x + 1)uπ/16) cos((2y + 1)vπ/16).

Here Data(x, y) refers to the (x, y) entry in the data table given above. The
coefficients C(u) and C(v) are defined by

C(h) = 1/√2 if h = 0,   and   C(h) = 1 if h ≠ 0.
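A direct transcription of this definition in Python (the calculations in these notes were done in Mathematica; this is just an independent check, with plain loops so that the formula stays visible):

```python
import math

# The 8x8 data table from the text
data = [
    [123, 157, 142, 127, 131, 102,  99, 235],
    [134, 135, 157, 112, 109, 106, 108, 136],
    [135, 144, 159, 108, 112, 118, 109, 126],
    [176, 183, 161, 111, 186, 130, 132, 133],
    [137, 149, 154, 126, 185, 146, 131, 132],
    [121, 130, 127, 146, 205, 150, 130, 126],
    [117, 151, 160, 181, 250, 161, 134, 125],
    [168, 170, 171, 178, 183, 179, 112, 124],
]

def C(h):
    return 1 / math.sqrt(2) if h == 0 else 1.0

def dct(u, v):
    return 0.25 * C(u) * C(v) * sum(
        data[x][y]
        * math.cos((2 * x + 1) * u * math.pi / 16)
        * math.cos((2 * y + 1) * v * math.pi / 16)
        for x in range(8) for y in range(8))

DCT = [[round(dct(u, v)) for v in range(8)] for u in range(8)]
print(DCT[0][0])   # the (0,0) entry is 1/8 of the sum of all 64 data values
```

The (0, 0) entry comes out as 1149, matching the top-left corner of the table below.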

The discrete cosine transform produces the following table, after rounding:²

1149   39  -43  -10   25  -84   11   41
 -81   -3  114  -74   -6   -2   21   -6
  14  -11    0  -43   25   -3   17  -39
   1  -61  -14  -12   36  -24  -18    4
  44   13   36   -5    9  -22    6   -8
  36  -12   -9   -5   20  -29  -21   13
 -19   -8   21   -6    3    2   11  -22
  -5  -14  -11  -18   -5   -1    7   -5
² We always round off the results of our calculations to integers, because (1) integers
take less space than reals, (2) integer operations are faster, and (3) colour values are
usually integers. It is one of the important strengths of the discrete cosine transform
that the errors introduced by the rounding off are inconsequential.

The first thing is to establish that we can recover the data from its dis-
crete cosine transform. For this purpose we define the inverse discrete cosine
transform by
IDCT(x, y) = (1/4) Σ_{u=0}^{7} Σ_{v=0}^{7} C(u) C(v) DCT(u, v) cos((2x + 1)uπ/16) cos((2y + 1)vπ/16).
If we apply the IDCT to the DCT table, we get (after rounding):

123 157 142 127 131 102  99 235
134 135 157 112 109 106 108 135
135 144 159 108 112 118 109 127
176 183 161 111 186 130 132 133
137 149 154 126 185 146 131 132
121 130 127 145 205 150 130 126
117 151 160 181 250 161 134 125
168 170 171 178 183 179 112 124

Can you spot any difference between this and the original data?
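The exact inversion can be verified numerically. The sketch below applies the IDCT to the unrounded DCT coefficients and recovers the data to floating-point accuracy; the few off-by-one entries in the table above come only from rounding the DCT to integers first.

```python
import math

# Same 8x8 block of sample values as in the text
data = [
    [123, 157, 142, 127, 131, 102,  99, 235],
    [134, 135, 157, 112, 109, 106, 108, 136],
    [135, 144, 159, 108, 112, 118, 109, 126],
    [176, 183, 161, 111, 186, 130, 132, 133],
    [137, 149, 154, 126, 185, 146, 131, 132],
    [121, 130, 127, 146, 205, 150, 130, 126],
    [117, 151, 160, 181, 250, 161, 134, 125],
    [168, 170, 171, 178, 183, 179, 112, 124],
]

def C(h):
    return 1 / math.sqrt(2) if h == 0 else 1.0

def cos_term(a, b):
    return math.cos((2 * a + 1) * b * math.pi / 16)

def dct(u, v):
    return 0.25 * C(u) * C(v) * sum(
        data[x][y] * cos_term(x, u) * cos_term(y, v)
        for x in range(8) for y in range(8))

coeffs = [[dct(u, v) for v in range(8)] for u in range(8)]  # unrounded

def idct(x, y):
    return 0.25 * sum(
        C(u) * C(v) * coeffs[u][v] * cos_term(x, u) * cos_term(y, v)
        for u in range(8) for v in range(8))

err = max(abs(idct(x, y) - data[x][y]) for x in range(8) for y in range(8))
print(err < 1e-6)
```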

Suppose we store the data via its DCT. We ask if we can afford to lose some
of the details of the DCT without significantly affecting the quality of the
data. One way to reduce the amount of space required by the DCT is to
divide every entry by, say, 8 (thus saving 3 bits per entry since the numbers
are stored in binary).

Then the DCT becomes

144 5 -5 -1 3 -10 1 5
-10 0 14 -9 -1 0 3 -1
2 -1 0 -5 3 0 2 -5
0 -8 -2 -2 4 -3 -2 0
6 2 4 -1 1 -3 1 -1
4 -2 -1 -1 2 -4 -3 2
-2 -1 3 -1 0 0 1 -3
-1 -2 -1 -2 -1 0 1 -1

Clearly, this “Compressed DCT” occupies much less space.

To recover the original data, we just uncompress by multiplying by 8, and
then apply IDCT.
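The compress/uncompress step is easy to reproduce; a sketch in Python, starting from the rounded DCT table given earlier. (One fine point: Python's round() rounds halves to the nearest even integer, which happens to reproduce every entry of the compressed table in these notes.)

```python
# The rounded DCT table from the text
DCT = [
    [1149,  39, -43, -10,  25, -84,  11,  41],
    [ -81,  -3, 114, -74,  -6,  -2,  21,  -6],
    [  14, -11,   0, -43,  25,  -3,  17, -39],
    [   1, -61, -14, -12,  36, -24, -18,   4],
    [  44,  13,  36,  -5,   9, -22,   6,  -8],
    [  36, -12,  -9,  -5,  20, -29, -21,  13],
    [ -19,  -8,  21,  -6,   3,   2,  11, -22],
    [  -5, -14, -11, -18,  -5,  -1,   7,  -5],
]

# Compress: divide by 8 and round (saving about 3 bits per entry)
compressed = [[round(c / 8) for c in row] for row in DCT]
# Uncompress: multiply back by 8; only the rounding error is lost
restored = [[8 * c for c in row] for row in compressed]

print(compressed[0])
```

Note that 1149 becomes 144 after compression and 1152 after uncompression; that difference of 3 is the loss the next table exhibits.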

This time there is some loss:

122 161 145 130 128 106 101 233
142 132 156 116 109 107 108 132
138 146 155 105 109 118 113 127
175 184 163 110 190 127 132 134
137 148 155 128 182 149 130 133
119 132 126 147 204 149 133 129
115 149 156 177 248 159 137 125
174 173 171 180 186 178 114 123

Another approach is to compress the entries on the top left less (as these
are more significant). For example, we divide the entries in the top left 4 × 4
submatrix of DCT by 2, and all the other entries by 8:

574  20 -22  -5   3 -10   1   5
-40  -2  57 -37  -1   0   3  -1
  7  -6   0 -22   3   0   2  -5
  0 -30  -7  -6   4  -3  -2   0
  6   2   4  -1   1  -3   1  -1
  4  -2  -1  -1   2  -4  -3   2
 -2  -1   3  -1   0   0   1  -3
 -1  -2  -1  -2  -1   0   1  -1
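This position-dependent scheme amounts to a quantisation matrix with two values; a short Python sketch reproducing the table above (again relying on Python's round-half-to-even, which matches the values shown):

```python
# The rounded DCT table from the text
DCT = [
    [1149,  39, -43, -10,  25, -84,  11,  41],
    [ -81,  -3, 114, -74,  -6,  -2,  21,  -6],
    [  14, -11,   0, -43,  25,  -3,  17, -39],
    [   1, -61, -14, -12,  36, -24, -18,   4],
    [  44,  13,  36,  -5,   9, -22,   6,  -8],
    [  36, -12,  -9,  -5,  20, -29, -21,  13],
    [ -19,  -8,  21,  -6,   3,   2,  11, -22],
    [  -5, -14, -11, -18,  -5,  -1,   7,  -5],
]

def divisor(u, v):
    # gentler quantisation (divide by 2) for the significant top-left 4x4 corner
    return 2 if u < 4 and v < 4 else 8

compressed = [[round(DCT[u][v] / divisor(u, v)) for v in range(8)]
              for u in range(8)]
restored = [[compressed[u][v] * divisor(u, v) for v in range(8)]
            for u in range(8)]
print(compressed[0])
```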

We uncompress the last table by multiplying by 2 and 8 in the appropriate
places. Then we apply IDCT, and we get:

120 159 144 129 127 105 100 234
138 130 156 116 109 106 108 133
135 145 156 106 109 117 113 127
174 183 164 111 191 127 131 132
139 149 154 127 183 149 128 129
121 133 125 146 204 149 131 125
114 148 156 177 248 158 136 123
170 172 172 182 186 177 113 123

This hybrid approach offers almost as much compression as the previous
one, with lower loss of quality.

4 Huffman Encoding

The discrete cosine transform produces the numbers used to store and trans-
mit an image. However, these numbers are not stored according to their
values, but through a code that further reduces the required space. This
code names numbers according to their frequency. More frequent numbers
are given shorter codes.

Consider the compressed DCT we had obtained in the last section:

144 5 -5 -1 3 -10 1 5
-10 0 14 -9 -1 0 3 -1
2 -1 0 -5 3 0 2 -5
0 -8 -2 -2 4 -3 -2 0
6 2 4 -1 1 -3 1 -1
4 -2 -1 -1 2 -4 -3 2
-2 -1 3 -1 0 0 1 -3
-1 -2 -1 -2 -1 0 1 -1

We will construct a binary tree out of the numbers in this grid.

Step 1. List all the numbers occurring in the table, along with their frequencies:

Data  -10  -9  -8  -5  -4  -3  -2  -1   0   1   2   3   4   5   6  14 144
Freq.   2   1   1   3   1   4   7  14   9   5   5   4   3   2   1   1   1

Step 2. Arrange the numbers in increasing order of frequency:

Data   -9  -8  -4   6  14 144 -10   5  -5   4  -3   3   1   2  -2   0  -1
Freq.   1   1   1   1   1   1   2   2   3   3   4   4   5   5   7   9  14

Each number will become a ‘leaf’ of the binary tree. This leaf will be labelled
by the number and its frequency. For instance, since 5 has frequency 2, the
corresponding leaf will be drawn as 5:2.

Step 3. Two leaves with the lowest frequency are combined into one node.
This node is labelled by the sum of their frequencies. Thus, combining -9:1
and -8:1, we get the node

(2: -9:1, -8:1),

where 2 is the combined frequency and the entries after it are the node’s
children.

We repeat this step, with the following modification: Leaves and nodes
already collected below a node are ignored while comparing frequencies.
Only the top nodes and remaining leaves are taken into account.

Step 4. The starting situation is:

-4:1, 6:1, 14:1, 144:1, (2: -9:1, -8:1), -10:2, 5:2, -5:3, · · ·

On collecting the lowest frequency leaves under a node, we get:

14:1, 144:1, (2: -4:1, 6:1), (2: -9:1, -8:1), -10:2, 5:2, -5:3, · · ·

Step 5.

(2: 14:1, 144:1), (2: -4:1, 6:1), (2: -9:1, -8:1), -10:2, 5:2, · · ·

Step 6.

(2: -9:1, -8:1), -10:2, 5:2, -5:3, 4:3, (4: (2: 14:1, 144:1), (2: -4:1, 6:1)), -3:4, · · ·
64
├0─ 28
│   ├0─ 14
│   │   ├0─ 7
│   │   │   ├0─ 4:3
│   │   │   └1─ 4
│   │   │       ├0─ 2
│   │   │       │   ├0─ -9:1
│   │   │       │   └1─ -8:1
│   │   │       └1─ -10:2
│   │   └1─ -2:7
│   └1─ -1:14
└1─ 36
    ├0─ 17
    │   ├0─ 8
    │   │   ├0─ 4
    │   │   │   ├0─ 2
    │   │   │   │   ├0─ 14:1
    │   │   │   │   └1─ 144:1
    │   │   │   └1─ 2
    │   │   │       ├0─ -4:1
    │   │   │       └1─ 6:1
    │   │   └1─ -3:4
    │   └1─ 9
    │       ├0─ 3:4
    │       └1─ 5
    │           ├0─ 5:2
    │           └1─ -5:3
    └1─ 19
        ├0─ 0:9
        └1─ 10
            ├0─ 1:5
            └1─ 2:5

Figure 1: The binary tree for the Huffman code. Internal nodes carry their
total frequency; each branch is labelled 0 (left) or 1 (right).

By now, the general scheme should be clear. It is evident that we have made
certain choices in each step: namely the order in which we write nodes/leaves
having the same frequency. This does affect the final binary tree we obtain.
However, once we have described the method of coding, it will be obvious
that these choices do not affect the efficiency of the encoding.
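This repeated merging of the two least-frequent nodes is the classical Huffman construction. A compact Python sketch using the standard library's heapq serves as an independent check (the counter is only a tie-breaker so that tuples compare without reaching the dicts; ties may produce a tree differing from Figure 1 in the arrangement of equal-frequency nodes, but, as remarked above, the total encoded length is the same):

```python
import heapq
from itertools import count

# Frequencies from Step 1
freqs = {-10: 2, -9: 1, -8: 1, -5: 3, -4: 1, -3: 4, -2: 7, -1: 14,
         0: 9, 1: 5, 2: 5, 3: 4, 4: 3, 5: 2, 6: 1, 14: 1, 144: 1}

tiebreak = count()
heap = [(f, next(tiebreak), {sym: ""}) for sym, f in freqs.items()]
heapq.heapify(heap)
while len(heap) > 1:
    f1, _, left = heapq.heappop(heap)    # two lowest-frequency nodes
    f2, _, right = heapq.heappop(heap)
    merged = {s: "0" + c for s, c in left.items()}      # prepend branch labels
    merged.update({s: "1" + c for s, c in right.items()})
    heapq.heappush(heap, (f1 + f2, next(tiebreak), merged))
codes = heap[0][2]    # symbol -> codeword

total_bits = sum(freqs[s] * len(c) for s, c in codes.items())
print(total_bits)
```

The total comes out as 231 bits, which anticipates the exercise at the end of this section.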

Figure 1 shows the final binary tree for our data. We have also labelled each
branch of the tree: by 0 if it is a left branch and by 1 if it is a right branch.
The encoding proceeds as follows. To obtain the code for a value, start from
the root (the node labelled 64) and move down to the value, noting down
each 0 or 1 label for a branch as you cross it. Thus, in moving to the leaf
for the value -10, we obtain the sequence 00011. This is the code for that
value.

144   5  -5  -1
-10   0  14  -9
  2  -1   0  -5
  0  -8  -2  -2

Figure 2: The sequence in which values are encoded: the values are read in
a zigzag path through the table, starting 144, 5, −10, . . .

Note that the most frequent value (-1) has the shortest code (01), and the
less frequent ones have progressively longer codes. A value such as 144, with
frequency 1, has the longest code: 100001.

The table is coded by going through the values one by one in the zigzag manner
shown in Figure 2 and writing their codes, without any separators! For
instance, the starting sequence 144, 5, −10, . . . becomes 1000011011000011 . . .
(144 → 100001, 5 → 10110, −10 → 00011). To decode this string, one need
only refer to the tree. We start at the root and follow the left or right
branches according to whether we see a 0 or a 1. When we reach a leaf, we
note the corresponding value and start again at the root.
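The decoding loop is only a few lines in Python; a sketch using just the three codewords spelled out above (a real decoder would carry the full code table, or walk the tree itself):

```python
# Codewords taken from the text: 144 -> 100001, 5 -> 10110, -10 -> 00011
decode_map = {"100001": 144, "10110": 5, "00011": -10}

def decode(bits):
    values, buf = [], ""
    for b in bits:
        buf += b
        if buf in decode_map:       # reached a leaf: emit value, restart at root
            values.append(decode_map[buf])
            buf = ""
    return values

print(decode("1000011011000011"))   # the opening bits of the encoded table
```

The lookup works precisely because the code is prefix-free: no codeword is an initial segment of another, so the first match is always the right one.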

Exercise. Show that our table of values can be described by 231 binary
digits if we use Huffman encoding. If, on the other hand, we had worked
with codes of fixed length, we would have needed 320 binary digits.
