
MEP: Codes and Ciphers, UNIT 18 Arithmetic Coding

Pupil Text

18 Arithmetic Coding
One of the most powerful compression techniques is called Arithmetic Coding. This
converts the entire input data into a single floating point number. (A floating point
number is similar to a number with a decimal point, like 3.5 instead of $3\frac{1}{2}$. However, in
arithmetic coding we are not dealing with decimal numbers, so we call it a floating point
instead of a decimal point.)
We will use as our example the string (or message)
BE_A_BEE
and compress it using arithmetic coding. The first thing we do is look at the frequency
counts for the different letters:
Letter   E   B   _   A
Count    3   2   2   1

Then we encode the string by dividing up the interval [0, 1] and allocating each letter an
interval whose size depends on how often it occurs in the string.

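[Diagram: the interval from 0 to 1 divided at $\frac{3}{8}$, $\frac{5}{8}$ and $\frac{7}{8}$ into the intervals
for E, B, _ and A, in that order.]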
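The division of the interval can be sketched in Python. This is only an illustration: the function name `build_model` is hypothetical, and the letter order E, B, _, A is an assumption read off the boundaries $\frac{3}{8}$, $\frac{5}{8}$ and $\frac{7}{8}$ used in this example.

```python
from collections import Counter
from fractions import Fraction


def build_model(message, order):
    """Give each letter a half-open interval [low, high) inside [0, 1).

    The width of each interval is the letter's share of the message.
    `order` fixes which letter comes first (an assumption of this sketch).
    """
    counts = Counter(message)
    total = len(message)
    model = {}
    low = Fraction(0)
    for symbol in order:
        high = low + Fraction(counts[symbol], total)
        model[symbol] = (low, high)
        low = high
    return model


model = build_model("BE_A_BEE", order="EB_A")
for symbol, (low, high) in model.items():
    print(symbol, low, high)
```

Exact fractions are used instead of ordinary floating point values so that the boundaries match the hand calculation exactly.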

Our string starts with a 'B', so we take the 'B' interval and divide it up again in the same
way:

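[Diagram: the 'B' interval from $\frac{3}{8} = \frac{24}{64}$ to $\frac{5}{8} = \frac{40}{64}$, divided at $\frac{30}{64}$, $\frac{34}{64}$ and $\frac{38}{64}$
into the intervals for BE, BB, B_ and BA.]

The boundary between 'BE' and 'BB' is $\frac{3}{8}$ of the way along the interval, which is itself
$\frac{2}{8}$ long and starts at $\frac{3}{8}$. So the boundary is $\frac{3}{8} + \left(\frac{3}{8} \times \frac{2}{8}\right) = \frac{30}{64}$. Similarly the boundary
between 'BB' and 'B_' is $\frac{3}{8} + \left(\frac{5}{8} \times \frac{2}{8}\right) = \frac{34}{64}$, and so on.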


The second letter in the message is 'E', so now we subdivide the 'BE' interval in the same
way. We carry on through the message ...

[Diagram: the 'BE' interval from $\frac{24}{64} = \frac{192}{512}$ to $\frac{30}{64} = \frac{240}{512}$, divided at $\frac{210}{512}$, $\frac{222}{512}$ and $\frac{234}{512}$
into the intervals for BEE, BEB, BE_ and BEA.]

and, continuing in this way, we eventually obtain:

[Diagram: the 'BE_A_BEE' interval from $\frac{7653888}{16777216}$ to $\frac{7654320}{16777216}$.]
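The repeated subdivision can be checked with a short Python sketch. It is a minimal illustration, not a practical compressor: the name `encode_interval` is hypothetical, and the `MODEL` table simply restates the letter intervals used in this example (E, B, _, A with boundaries at 3/8, 5/8 and 7/8).

```python
from fractions import Fraction

# Letter intervals from the example: each letter owns [low, high) within [0, 1).
MODEL = {
    "E": (Fraction(0), Fraction(3, 8)),
    "B": (Fraction(3, 8), Fraction(5, 8)),
    "_": (Fraction(5, 8), Fraction(7, 8)),
    "A": (Fraction(7, 8), Fraction(1)),
}


def encode_interval(message, model):
    """Narrow [0, 1) one letter at a time and return the final (low, high)."""
    low, width = Fraction(0), Fraction(1)
    for symbol in message:
        sym_low, sym_high = model[symbol]
        # Move to the letter's share of the current interval.
        low = low + width * sym_low
        width = width * (sym_high - sym_low)
    return low, low + width


low, high = encode_interval("BE_A_BEE", MODEL)
scale = 8 ** 8  # 16777216, the common denominator after 8 letters
print(low * scale, high * scale)  # 7653888 7654320
```

Calling `encode_interval("B", MODEL)` and `encode_interval("BE", MODEL)` reproduces the first two steps worked by hand above.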

So we represent the message as any number in the interval

$\left[\frac{7653888}{16777216}, \frac{7654320}{16777216}\right)$

However, we cannot send numbers like $\frac{7654320}{16777216}$ easily using a computer. Computers
use binary numbers, a system where all the numbers are made up of 0s and 1s. The first
few whole numbers in binary are 1, 10, 11, 100, 101, ... but how do they actually work?

In decimal notation, the rightmost digit to the left of the decimal point indicates the
number of units; the one to its left gives the number of tens; the next one along gives the
number of hundreds, and so on.
So

$7653888 = (7 \times 10^6) + (6 \times 10^5) + (5 \times 10^4) + (3 \times 10^3) + (8 \times 10^2) + (8 \times 10) + 8$
Binary numbers are almost exactly the same, only we deal with powers of 2 instead of
powers of 10. The rightmost digit of a binary number is units (as before); the one to its
left gives the number of 2s; the next one the number of 4s, and so on.


So $110100111 = (1 \times 2^8) + (1 \times 2^7) + (0 \times 2^6) + (1 \times 2^5) + (0 \times 2^4) + (0 \times 2^3) + (1 \times 2^2) + (1 \times 2) + 1$

$= 256 + 128 + 32 + 4 + 2 + 1$

$= 423$ in denary (i.e. base 10)
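The place-value sum can be written as a few lines of Python. This is a sketch for checking hand calculations; the function name `binary_to_denary` is hypothetical.

```python
def binary_to_denary(bits):
    """Add up the power of 2 for every position that holds a 1."""
    total = 0
    # The rightmost digit is position 0 (units), the next is position 1 (2s), ...
    for position, digit in enumerate(reversed(bits)):
        total += int(digit) * 2 ** position
    return total


print(binary_to_denary("110100111"))  # 423
```

Python's built-in `int("110100111", 2)` gives the same result and can be used as a cross-check.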

Exercise 1
a)

Write the following denary numbers in binary.
i) 56    ii) 147    iii) 623

b)

What are these binary numbers in base 10?
i) 11011    ii) 1000011    iii) 101010101

We can write fractions in binary too. In denary, i.e. base 10, the first digit after the
decimal point gives the number of tenths; the next gives the number of hundredths, etc.
In binary, the first digit after the floating point gives the number of halves, the next the
number of quarters, etc. For example,
Fraction                                     Binary
$\frac{1}{2}$                                0.1
$\frac{1}{4}$                                0.01
$\frac{1}{8}$                                0.001
$\frac{3}{4} = \frac{1}{2} + \frac{1}{4}$    0.11
$\frac{3}{8} = \frac{1}{4} + \frac{1}{8}$    0.011
$\frac{5}{8} = \frac{1}{2} + \frac{1}{8}$    0.101
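The table can be reproduced with a small Python sketch that asks, place by place, whether a half, a quarter, an eighth and so on fits into what is left of the fraction. The function name `fraction_to_binary` and the `max_digits` cut-off are assumptions made for this illustration.

```python
from fractions import Fraction


def fraction_to_binary(value, max_digits=32):
    """Binary digits of a fraction with 0 < value < 1, written as '0.xxx'."""
    digits = []
    place = Fraction(1, 2)  # first digit after the point counts halves
    while value > 0 and len(digits) < max_digits:
        if value >= place:
            digits.append("1")
            value -= place
        else:
            digits.append("0")
        place /= 2  # next digit counts quarters, then eighths, ...
    return "0." + "".join(digits)


print(fraction_to_binary(Fraction(5, 8)))  # 0.101
```

The cut-off matters only for fractions whose denominator is not a power of 2, since their binary digits never end.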

Exercise 2
a)

Write the following fractions in binary.
i) $\frac{1}{16}$    ii) $\frac{5}{16}$    iii) $\frac{9}{64}$

b)

What are these binary numbers as decimals in base 10?
i) 0.1101    ii) 0.10101    iii) 0.100011

But how do we write the number $\frac{7653888}{16777216}$ in binary?

Think about how we would write it in denary (decimal). A good start is to write the
fraction in its simplest form. In this case this is easy to do because the denominator is a
power of 2 (in fact, $16777216 = 2^{24}$). So to simplify the fraction we just divide the
numerator by 2 as many times as we can until we get an odd number, then divide the
denominator by the same amount.


So

$\frac{7653888}{16777216} = \frac{2^9 \times 14949}{2^{24}} = \frac{14949}{2^{15}} = \frac{14949}{32768}$
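The simplification step can be sketched in Python by halving the numerator until it is odd and lowering the power of 2 in the denominator each time. The function name `strip_factors_of_two` is hypothetical.

```python
def strip_factors_of_two(numerator, exponent):
    """Simplify numerator / 2**exponent; return (odd numerator, new exponent)."""
    while numerator != 0 and numerator % 2 == 0 and exponent > 0:
        numerator //= 2  # divide the top by 2 ...
        exponent -= 1    # ... and the bottom by 2 as well
    return numerator, exponent


print(strip_factors_of_two(7653888, 24))  # (14949, 15)
```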
If we were calculating this in decimal, we would do it by long division. However, in
binary, it is actually a good deal simpler because the denominator is a power of 2.
To understand why, think about how you would write $\frac{14949}{100000} = \frac{14949}{10^5}$ in decimal: it's
just 0.14949. And $\frac{14949}{10^6}$ is simply 0.014949. In general, $\frac{14949}{10^n}$ is 14949.0 with the
decimal point moved to the left by $n$ places.

We can use exactly the same method in binary: all we have to do is convert 14 949 to a
binary number and then, as the denominator is $32768 = 2^{15}$, we move the floating point
to the left by 15 places.
But how do we convert a number to binary? Start by finding the largest power of 2
which is less than or equal to our number, then subtract it, keeping a note of which power
of 2 it was. Then repeat with the remainder, and so on. So ...
$n$   $2^n$   Test            Include $2^n$?   Calculate remainder
13    8192    14949 > 8192    Yes              14949 − 8192 = 6757
12    4096    6757 > 4096     Yes              6757 − 4096 = 2661
11    2048    2661 > 2048     Yes              2661 − 2048 = 613
10    1024    613 < 1024      No
 9     512    613 > 512       Yes              613 − 512 = 101
 8     256    101 < 256       No
 7     128    101 < 128       No
 6      64    101 > 64        Yes              101 − 64 = 37
 5      32    37 > 32         Yes              37 − 32 = 5
 4      16    5 < 16          No
 3       8    5 < 8           No
 2       4    5 > 4           Yes              5 − 4 = 1
 1       2    1 < 2           No
 0       1    1 = 1           Yes

So $14949 = 2^{13} + 2^{12} + 2^{11} + 2^9 + 2^6 + 2^5 + 2^2 + 1 = 11101001100101$ in binary
(corresponding to the pattern of 'Yes' answers above).
Therefore

$\frac{7653888}{16777216} = \frac{14949}{32768} = 0.011101001100101$ in binary.
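The subtraction method and the shift of the floating point can be sketched in Python as follows. Both function names are hypothetical, and `dyadic_to_binary` assumes the fraction is less than 1.

```python
def to_binary(number):
    """Convert a whole number to binary by subtracting powers of 2, largest first."""
    if number == 0:
        return "0"
    power = 1
    while power * 2 <= number:  # find the largest power of 2 that fits
        power *= 2
    digits = []
    while power >= 1:
        if number >= power:     # 'Yes': include this power and subtract it
            digits.append("1")
            number -= power
        else:                   # 'No': this power is too big
            digits.append("0")
        power //= 2
    return "".join(digits)


def dyadic_to_binary(numerator, exponent):
    """Write numerator / 2**exponent (assumed less than 1) as a binary fraction."""
    # Moving the point left by `exponent` places means padding with leading zeros.
    return "0." + to_binary(numerator).rjust(exponent, "0")


print(to_binary(14949))             # 11101001100101
print(dyadic_to_binary(14949, 15))  # 0.011101001100101
```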


However, we can represent the message BE_A_BEE as any number in the interval

$\left[\frac{7653888}{16777216}, \frac{7654320}{16777216}\right)$

Exercise 3
Show that the binary representation of $\frac{7654320}{16777216}$ is 0.01110100110010111011.

How do we choose a number to represent our message? To get the best compression, we
want to send the fewest possible binary digits, so we want the shortest number
between
0.011101001100101
and
0.01110100110010111011
In this case, the first point in the interval has the shortest binary representation out of all
the points in the interval, so we choose this to represent our message. (Note that it is not
always an endpoint; in general it is the number in the interval whose numerator can be
divided by 2 the most times.)
So

BE_A_BEE → 011101001100101

This is 15 digits long, as there is no need to send the zero before the floating point or
the point itself. Is this an effective compression? If we were using ASCII computer codes,
the message would be $8 \times 5 = 40$ digits long, so there is a significant improvement using the
Arithmetic Coding method.
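One way to find the shortest number in an interval is to try 1 digit after the point, then 2 digits, and so on, until some number with that many digits falls inside the interval. The sketch below takes this approach; the search strategy and the name `shortest_code` are assumptions made for illustration, and the interval is treated as including its first point but not its last.

```python
import math
from fractions import Fraction


def shortest_code(low, high):
    """Fewest binary digits after the point that name a number in [low, high)."""
    digits = 1
    while True:
        # Smallest numerator whose fraction numerator / 2**digits is at least low.
        numerator = math.ceil(low * 2 ** digits)
        if Fraction(numerator, 2 ** digits) < high:
            # Pad with leading zeros so the code has exactly `digits` digits.
            return format(numerator, "b").zfill(digits)
        digits += 1


low = Fraction(7653888, 16777216)
high = Fraction(7654320, 16777216)
code = shortest_code(low, high)
print(code, len(code))  # 011101001100101 15
```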

Activity 1
Design a Huffman code for this message. How many digits would be in the message?

Exercise 4
We want to compress the word LOSSLESS using arithmetic coding.
The frequency counts for the characters in this message are:

Letter   E   L   O   S
Count    1   2   1   4

a)–i)

Fill in the gaps indicated by the letters in brackets on the following model.
(For ease of presentation, only the numerators of the fractions are written on
the diagram; the denominator for each row is at the side. Each row is divided
into the intervals for E, L, O and S, in that order.)

Denominator   Start     E/L       L/O       O/S       End       Letter
8             0         1         3         4         8         L
64            8         10        14        16        24        O
512           112       114       118       120       128       S
4096          960       968       984       992       1024      S
32768         7936      (a)       (b)       8064      8192      L
262144        63744     (c)       63936     64000     64256     E
2097152       509952    510016    510144    (d)       510464    S
16777216      (e)       (f)       (g)       (h)       (i)       S


j)

What interval represents the word LOSSLESS using this model?

k)

What is the interval in binary?

l)

What is the shortest binary representation of LOSSLESS that we can send using
this method?

Exercise 5
You receive the following message which has been compressed using arithmetic coding:
0110100110101
You know that the following characters were sent:
3 As, 1 C, 1 D, 1 I and 2 Ns.
Decode the message.
a)

What fraction does the number 0110100110101 represent? [Hint: to make
the following calculations easier, scale this fraction up so that the denominator
is 16 777 216.]

b)

Divide up the interval [0, 1) according to the character frequencies, as in the
example. Find where on this line your answer to a) is, and hence find the first
letter of the message.

c)

Subdivide the interval with your answer to a) in it. Where in this new interval is
your answer to a)? Hence find the second letter of the message.

d)

Continue subdividing the intervals in this way until you have found all 8 letters.
What is the message?
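The decoding steps can also be sketched in Python. To avoid giving away the answer, the sketch below decodes the earlier example BE_A_BEE rather than this exercise. The function name `decode` is hypothetical, the `MODEL` table restates the letter intervals of that example, and the sketch assumes the receiver already knows that the message has 8 letters.

```python
from fractions import Fraction

# Letter intervals of the BE_A_BEE example: each letter owns [low, high).
MODEL = {
    "E": (Fraction(0), Fraction(3, 8)),
    "B": (Fraction(3, 8), Fraction(5, 8)),
    "_": (Fraction(5, 8), Fraction(7, 8)),
    "A": (Fraction(7, 8), Fraction(1)),
}


def decode(code_bits, model, length):
    """Recover `length` letters from the binary digits sent after the point."""
    # Step a): turn the digits into a fraction.
    value = Fraction(int(code_bits, 2), 2 ** len(code_bits))
    low, width = Fraction(0), Fraction(1)
    message = []
    for _ in range(length):
        # Steps b) to d): subdivide the current interval and see where value lies.
        for symbol, (sym_low, sym_high) in model.items():
            sub_low = low + width * sym_low
            sub_high = low + width * sym_high
            if sub_low <= value < sub_high:
                message.append(symbol)
                low, width = sub_low, sub_high - sub_low
                break
    return "".join(message)


print(decode("011101001100101", MODEL, 8))  # BE_A_BEE
```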
