Pupil Text
18 Arithmetic Coding
One of the most powerful compression techniques is called Arithmetic Coding. This
converts the entire input data into a single floating point number. (A floating point
number is similar to a number with a decimal point, like 3.5 instead of 3½. However, in
arithmetic coding we are not dealing with decimal numbers so we call it a floating point
instead of a decimal point.)
We will use as our example the string (or message)
BE_A_BEE
and compress it using arithmetic coding. The first thing we do is look at the frequency
counts for the different letters:
Letter   E  B  _  A
Count    3  2  2  1
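Then we encode the string by dividing up the interval [0, 1] and allocating each letter an
interval whose size depends on how often it occurs in the string.
[Figure: the interval [0, 1] divided at 3/8, 5/8 and 7/8, giving E from 0 to 3/8, B from 3/8 to 5/8, _ from 5/8 to 7/8 and A from 7/8 to 1.]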
Our string starts with a 'B', so we take the 'B' interval and divide it up again in the same
way:
[Figure: the 'B' interval from 3/8 = 24/64 to 5/8 = 40/64, divided at 30/64, 34/64 and 38/64 into BE, BB, B_ and BA.]
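The boundary between BE and BB is $\frac{3}{8}$ of the way along the 'B' interval, which is itself $\frac{2}{8}$ long and starts at $\frac{3}{8}$. So the boundary is $\frac{3}{8} + \left(\frac{2}{8} \times \frac{3}{8}\right) = \frac{30}{64}$. Similarly the boundary between BB and B_ is $\frac{3}{8} + \left(\frac{2}{8} \times \frac{5}{8}\right) = \frac{34}{64}$, and so on.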
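The subdivision of the 'B' interval can be checked with exact fractions. This is a minimal sketch: the names `BOUNDS` and `subdivide` are illustrative, and the letter order E, B, _, A is the one used for the intervals above.

```python
from fractions import Fraction

# Sub-intervals of [0, 1] for the letter order E, B, _, A
# (counts 3, 2, 2, 1 out of 8). The name BOUNDS is illustrative.
BOUNDS = {
    "E": (Fraction(0), Fraction(3, 8)),
    "B": (Fraction(3, 8), Fraction(5, 8)),
    "_": (Fraction(5, 8), Fraction(7, 8)),
    "A": (Fraction(7, 8), Fraction(1)),
}

def subdivide(low, high):
    """Split the interval from low to high in the same proportions as [0, 1]."""
    width = high - low
    return {ch: (low + width * a, low + width * b) for ch, (a, b) in BOUNDS.items()}

b_low, b_high = BOUNDS["B"]
parts = subdivide(b_low, b_high)
for ch, (lo, hi) in parts.items():
    print("B" + ch, lo * 64, hi * 64)  # boundaries as numerators over 64
```

The printed numerators are 24 and 30 for BE, 30 and 34 for BB, 34 and 38 for B_, and 38 and 40 for BA.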
The second letter in the message is 'E', so now we subdivide the 'BE' interval in the same
way. We carry on through the message ...
[Figure: the subdivision continued. The 'BE' interval from 24/64 = 192/512 to 30/64 = 240/512 is divided at 210/512, 222/512 and 234/512 into BEE, BEB, BE_ and BEA, and so on down to the final interval for BE_A_BEE, from 7653888/16777216 to 7654320/16777216.]
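Carrying the subdivision through all eight letters by hand is slow, so here is a short sketch that repeats the same step for every letter. The function name `encode_interval` and its `order` argument are assumptions made for this illustration; `order` gives the order in which the letters are laid out along [0, 1].

```python
from fractions import Fraction

def encode_interval(message, order):
    """Return the (low, high) ends of the interval that represents message."""
    total = len(message)
    bounds, start = {}, Fraction(0)
    for ch in order:
        # each letter gets a share of [0, 1] equal to its relative frequency
        width = Fraction(message.count(ch), total)
        bounds[ch] = (start, start + width)
        start += width

    low, high = Fraction(0), Fraction(1)
    for ch in message:
        width = high - low
        a, b = bounds[ch]
        low, high = low + width * a, low + width * b  # zoom into this letter's part
    return low, high

low, high = encode_interval("BE_A_BEE", "EB_A")
print(low * 8**8, high * 8**8)  # numerators over 16777216
```

This prints 7653888 and 7654320, the two numerators of the final interval.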
We cannot send a number like $\frac{7654320}{16\,777\,216}$ easily using a computer. Computers
use binary numbers, a system where all the numbers are made up of 0s and 1s. The first
few whole numbers in binary are 1, 10, 11, 100, 101, ... but how do they actually work?
In decimal notation, the rightmost digit to the left of the decimal point indicates the
number of units; the one to its left gives the number of tens; the next one along gives the
number of hundreds, and so on.
So
$7653888 = (7 \times 10^6) + (6 \times 10^5) + (5 \times 10^4) + (3 \times 10^3) + (8 \times 10^2) + (8 \times 10) + 8$
Binary numbers are almost exactly the same, only we deal with powers of 2 instead of
powers of 10. The rightmost digit of a binary number is units (as before); the one to its
left gives the number of 2s; the next one the number of 4s, and so on.
So
$110100111 = (1 \times 2^8) + (1 \times 2^7) + (0 \times 2^6) + (1 \times 2^5) + (0 \times 2^4) + (0 \times 2^3) + (1 \times 2^2) + (1 \times 2) + 1$
$= 256 + 128 + 32 + 4 + 2 + 1$
$= 423$ in denary (i.e. base 10)
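The place-value sum above can be written as a short loop. This is only a sketch, and the function name `binary_to_denary` is made up for the illustration.

```python
def binary_to_denary(bits):
    """Add up the place values of a string of 0s and 1s."""
    total = 0
    for position, digit in enumerate(reversed(bits)):
        total += int(digit) * 2**position  # units, 2s, 4s, 8s, ...
    return total

print(binary_to_denary("110100111"))
```

This prints 423, which agrees with Python's built-in conversion `int("110100111", 2)`.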
Exercise 1
a)   ii) 147   iii) 623
b)   ii) 1000011   iii) 101010101
We can write fractions in binary too. In denary, i.e. base 10, the first digit after the
decimal point gives the number of tenths; the next gives the number of hundredths, etc.
In binary, the first digit after the floating point gives the number of halves, the next the
number of quarters, etc. For example,
Fraction           Binary     Fraction           Binary
1/2                0.1        3/4 = 1/2 + 1/4    0.11
1/4                0.01       3/8 = 1/4 + 1/8    0.011
1/8                0.001      5/8 = 1/2 + 1/8    0.101
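The entries in the table can be checked by adding up halves, quarters, eighths and so on. This is a minimal sketch; the function name `binary_fraction` is an assumption, and it takes only the digits after the floating point.

```python
from fractions import Fraction

def binary_fraction(digits):
    """Value of 0.<digits> in binary, e.g. "101" means 0.101."""
    value = Fraction(0)
    for place, digit in enumerate(digits, start=1):
        value += Fraction(int(digit), 2**place)  # halves, quarters, eighths, ...
    return value

for digits in ["1", "01", "001", "11", "011", "101"]:
    print("0." + digits, "=", binary_fraction(digits))
```

The six lines printed match the table: 1/2, 1/4, 1/8, 3/4, 3/8 and 5/8.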
Exercise 2
a)
b)   ii) 0.10101   iii) 0.100011
So how do we write $\frac{7653888}{16\,777\,216}$ in binary?
Think about how we would write it in denary (decimal). A good start is to write the
fraction in its simplest form. In this case this is easy to do because the denominator is a
power of 2 (in fact, $16\,777\,216 = 2^{24}$). So to simplify the fraction we just divide the
numerator by 2 as many times as we can until we get an odd number, then divide the
denominator by 2 the same number of times.
So
$\frac{7653888}{16\,777\,216} = \frac{2^9 \times 14\,949}{2^{24}} = \frac{14\,949}{2^{15}} = \frac{14\,949}{32\,768}$
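The halving step can be sketched in a few lines. The function name `strip_twos` is illustrative; it assumes the fraction is given as a numerator together with the power of 2 in the denominator.

```python
def strip_twos(numerator, power):
    """Simplify numerator / 2**power by halving while the numerator is even."""
    while numerator % 2 == 0 and power > 0:
        numerator //= 2   # divide the top by 2 ...
        power -= 1        # ... and the bottom by 2
    return numerator, power

print(strip_twos(7653888, 24))
```

This prints (14949, 15), i.e. the fraction 14 949 over 2 to the power 15.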
If we were calculating this in decimal, we would do it by long division. However, in
binary, it is actually a good deal simpler because the denominator is a power of 2.
To understand why, think about how you would write $\frac{14\,949}{100\,000} = \frac{14\,949}{10^5}$ in decimal: it's 0.14949. Likewise $\frac{14\,949}{10^6}$ is simply 0.014949. In general, $\frac{14\,949}{10^n}$ is 14 949.0 with the
decimal point moved to the left by $n$ places.
We can use exactly the same method in binary: all we have to do is convert 14 949 to a
binary number and then, as the denominator is $32\,768 = 2^{15}$, we move the floating point
to the left by 15 places.
But how do we convert a number to binary? Start by finding the largest power of 2
which is less than or equal to our number, then subtract it, keeping a note of which power
of 2 it was. Then repeat with the remainder, and so on. So ...
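n    2^n    Test            Include 2^n?   Calculate remainder
13   8192   14949 > 8192    Yes            14949 - 8192 = 6757
12   4096   6757 > 4096     Yes            6757 - 4096 = 2661
11   2048   2661 > 2048     Yes            2661 - 2048 = 613
10   1024   613 < 1024      No
9    512    613 > 512       Yes            613 - 512 = 101
8    256    101 < 256       No
7    128    101 < 128       No
6    64     101 > 64        Yes            101 - 64 = 37
5    32     37 > 32         Yes            37 - 32 = 5
4    16     5 < 16          No
3    8      5 < 8           No
2    4      5 > 4           Yes            5 - 4 = 1
1    2      1 < 2           No
0    1      1 = 1           Yes            1 - 1 = 0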
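So 14 949 is 11101001100101 in binary and, moving the floating point 15 places to the left,
$\frac{7653888}{16\,777\,216} = \frac{14\,949}{32\,768} = 0.011101001100101$ in binary.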
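The subtract-the-largest-power method, followed by the shift of the floating point, can be sketched as below. The function names `to_binary` and `binary_fraction_digits` are assumptions for this illustration, and the second function assumes the numerator is smaller than the denominator.

```python
def to_binary(number):
    """Convert a positive whole number to binary by subtracting powers of 2."""
    n = 0
    while 2 ** (n + 1) <= number:   # find the largest power of 2 that fits
        n += 1
    digits = ""
    remainder = number
    for power in range(n, -1, -1):
        if remainder >= 2**power:   # include this power and subtract it
            digits += "1"
            remainder -= 2**power
        else:
            digits += "0"
    return digits

def binary_fraction_digits(numerator, places):
    """Digits after the point of numerator / 2**places (numerator < 2**places)."""
    return to_binary(numerator).rjust(places, "0")  # pad with leading zeros

print(to_binary(14949))
print("0." + binary_fraction_digits(14949, 15))
```

The first line printed is 11101001100101 and the second is 0.011101001100101.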
However, we can represent the message BE_A_BEE as any number in the interval from $\frac{7653888}{16\,777\,216}$ to $\frac{7654320}{16\,777\,216}$.
Exercise 3
Check that $\frac{7654320}{16\,777\,216}$ is 0.01110100110010111011 in binary.
How do we choose a number to represent our message? To get the best compression, we
want to send the fewest possible binary digits, so we want the shortest number
between
0.011101001100101
and
0.01110100110010111011
In this case, the first point in the interval has the shortest binary representation out of all
the points in the interval, so we choose this to represent our message. (Note that it is not
always an endpoint; in general it is the number in the interval whose numerator, written over
the common denominator $2^{24}$, can be divided by 2 the most times.)
So
BE_A_BEE → 011101001100101
This is 15 digits long as there is no need to send the first zero or the floating point.
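One way to search for the shortest number is to try 1 binary place, then 2, then 3, and stop as soon as some number with that many places falls inside the interval. The sketch below does this. The name `shortest_code` is made up, and it assumes the upper end of the interval is not itself allowed.

```python
from fractions import Fraction
from math import ceil

def shortest_code(low, high):
    """Fewest binary digits after the point for a number with low <= number < high."""
    places = 1
    while True:
        m = ceil(low * 2**places)            # smallest m with m / 2**places >= low
        if Fraction(m, 2**places) < high:    # does it still lie inside the interval?
            return format(m, "b").rjust(places, "0")
        places += 1

low = Fraction(7653888, 16777216)
high = Fraction(7654320, 16777216)
code = shortest_code(low, high)
print(code, len(code))
```

For the BE_A_BEE interval this prints 011101001100101 and 15, the same digits chosen above.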
Is this an effective compression? If we were using ASCII computer codes, the message
would be 8 × 5 = 40 digits long, so there is a significant improvement using the
Arithmetic Coding method.
Activity 1
Design a Huffman code for this message. How many digits would be in the message?
Exercise 4
We want to compress the word LOSSLESS using arithmetic coding.
The frequency counts for the characters in this message are:
Letter   E  L  O  S
Count    1  2  1  4
(a) to (i)
Fill in the gaps indicated by the letters in brackets on the following model.
(For ease of presentation, only the numerators of the fractions are written on
the diagram; the denominator for each row is at the side.)
[Figure: subdivision model for LOSSLESS, one row per letter, showing numerators only. The row denominators are 8, 64, 512, 4096, 32 768, 262 144, 2 097 152 and 16 777 216. The gaps to fill in are labelled (a) to (i).]
j)
k)
l) What is the shortest binary representation of LOSSLESS that we can send using
this method?
Exercise 5
You receive the following message which has been compressed using arithmetic coding:
0110100110101
You know that the following characters were sent:
3 As, 1 C, 1 D, 1 I and 2 Ns.
Decode the message.
a)
b)
c) Subdivide the interval with your answer to a) in it. Where in this new interval is
your answer to a)? Hence find the second letter of the message.
d) Continue subdividing the intervals in this way until you have found all 8 letters.
What is the message?
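The decoding steps can be sketched in code and tried out on the earlier example BE_A_BEE. The function name `decode` and its arguments are assumptions for this illustration. It also assumes you know the order in which the letters are laid out along [0, 1], and that each sub-interval includes its lower end but not its upper end.

```python
from fractions import Fraction

def decode(code_digits, counts, length):
    """Recover `length` letters from the binary digits after the point.

    `counts` is a list of (letter, count) pairs in the order the letters
    are laid out along [0, 1].
    """
    value = Fraction(int(code_digits, 2), 2 ** len(code_digits))
    total = sum(count for letter, count in counts)
    message = ""
    low, high = Fraction(0), Fraction(1)
    for step in range(length):
        width = high - low
        start = low
        for letter, count in counts:
            end = start + width * Fraction(count, total)
            if start <= value < end:   # the number lies in this letter's part
                message += letter
                low, high = start, end
                break
            start = end
    return message

layout = [("E", 3), ("B", 2), ("_", 2), ("A", 1)]
print(decode("011101001100101", layout, 8))
```

This prints BE_A_BEE, so the 15 digits sent earlier decode back to the original message.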