Codes U18 Text

MEP: Codes and Ciphers, UNIT 18 Arithmetic Coding
Pupil Text
18 Arithmetic Coding
One of the most powerful compression techniques is called Arithmetic Coding. This
converts the entire input data into a single floating point number. (A floating point
number is similar to a number with a decimal point, like 3.5 instead of 3 1/2. However, in
arithmetic coding we are not dealing with decimal numbers so we call it a floating point
instead of a decimal point.)
We will use as our example the string (or message)
BE_A_BEE
and compress it using arithmetic coding. The first thing we do is look at the frequency
counts for the different letters:
Character    E   B   _   A
Frequency    3   2   2   1
Then we encode the string by dividing up the interval [0, 1] and allocating each letter an
interval whose size depends on how often it occurs in the string.

Letter      E           B             _             A
Interval    0 to 3/8    3/8 to 5/8    5/8 to 7/8    7/8 to 1
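The first subdivision can be checked with a short Python sketch. The function name `initial_intervals` and the letter order E, B, _, A (read off the boundaries 3/8, 5/8 and 7/8) are assumptions made for illustration, not part of the original text.

```python
from collections import Counter
from fractions import Fraction

def initial_intervals(message, order):
    """Split [0, 1] into one sub-interval per letter, sized by frequency."""
    counts = Counter(message)
    total = len(message)
    intervals = {}
    low = Fraction(0)
    for symbol in order:
        width = Fraction(counts[symbol], total)
        intervals[symbol] = (low, low + width)
        low += width
    return intervals

# Assumed letter order: E, B, _, A.
intervals = initial_intervals("BE_A_BEE", "EB_A")
for symbol, (low, high) in intervals.items():
    print(symbol, low, high)
```

Exact fractions are used instead of floating point numbers so that the boundaries come out as 3/8, 5/8 and 7/8 with no rounding.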
Our string starts with a 'B', so we take the 'B' interval and divide it up again in the same
way:

Interval    BE             BB       B_       BA
From        3/8 = 24/64    30/64    34/64    38/64
To          30/64          34/64    38/64    5/8 = 40/64

The boundary between 'BE' and 'BB' is 3/8 of the way along the interval, which is itself
2/8 long and starts at 3/8. So the boundary is 3/8 + (3/8 × 2/8) = 30/64. Similarly the
boundary between 'BB' and 'B_' is 3/8 + (5/8 × 2/8) = 34/64, and so on.
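The boundary calculation above follows one rule: new boundary = start of interval + (position along [0, 1] × length of interval). A minimal sketch of that rule, with the hypothetical names `BOUNDS` and `subdivide` and the letter order E, B, _, A assumed:

```python
from fractions import Fraction

# Boundaries of E, B, _, A on [0, 1] (letter order assumed for illustration).
BOUNDS = {
    "E": (Fraction(0), Fraction(3, 8)),
    "B": (Fraction(3, 8), Fraction(5, 8)),
    "_": (Fraction(5, 8), Fraction(7, 8)),
    "A": (Fraction(7, 8), Fraction(1)),
}

def subdivide(low, high):
    """Divide [low, high] in the same proportions as [0, 1]."""
    width = high - low
    return {s: (low + a * width, low + b * width) for s, (a, b) in BOUNDS.items()}

b_low, b_high = BOUNDS["B"]
second = subdivide(b_low, b_high)
for symbol, (low, high) in second.items():
    # Show every boundary as a number of sixty-fourths.
    print("B" + symbol, low * 64, high * 64)
```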
The second letter in the message is 'E', so now we subdivide the 'E' interval in the same
way. We carry on through the message ...
The 'BE' interval runs from 24/64 to 30/64, and it subdivides as follows:

Interval    From               To
BEE         24/64 = 192/512    210/512
BEB         210/512            222/512
BE_         222/512            234/512
BEA         234/512            30/64 = 240/512
and, continuing in this way, we eventually obtain:

BE_A_BEE    from 7 653 888 / 16 777 216 to 7 654 320 / 16 777 216

So we represent the message as any number in the interval from
7 653 888 / 16 777 216 to 7 654 320 / 16 777 216.
However, we cannot send numbers like 7 654 320 / 16 777 216 easily using a computer.
Computers use binary numbers, a system where all the numbers are made up of 0s and 1s.
The first few whole numbers in binary are 1, 10, 11, 100, 101, ... but how do they
actually work?
In decimal notation, the rightmost digit to the left of the decimal point indicates the
number of units; the one to its left gives the number of tens; the next one along gives the
number of hundreds, and so on.
So
7 653 888 = (7 × 10^6) + (6 × 10^5) + (5 × 10^4) + (3 × 10^3) + (8 × 10^2) + (8 × 10) + 8
Binary numbers are almost exactly the same, only we deal with powers of 2 instead of
powers of 10. The rightmost digit of a binary number is units (as before); the one to its
left gives the number of 2s; the next one the number of 4s, and so on.
So 110100111 = (1 × 2^8) + (1 × 2^7) + (0 × 2^6) + (1 × 2^5) + (0 × 2^4)
               + (0 × 2^3) + (1 × 2^2) + (1 × 2) + 1
             = 256 + 128 + 32 + 4 + 2 + 1
             = 423 in denary (i.e. base 10)
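The place-value sum above can be written as a few lines of Python. The function name `binary_to_denary` is chosen for illustration. Python's built-in `int(text, 2)` does the same job and is used here only as a cross-check.

```python
def binary_to_denary(bits):
    """Add up digit × 2^position, counting positions from the right."""
    total = 0
    for position, digit in enumerate(reversed(bits)):
        total += int(digit) * 2 ** position
    return total

value = binary_to_denary("110100111")
print(value)  # 423
print(value == int("110100111", 2))  # True
```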
Exercise 1
a) Write the following denary numbers in binary.
   i) 56      ii) 147      iii) 623
b) What are these binary numbers in base 10?
   i) 11011      ii) 1000011      iii) 101010101
We can write fractions in binary too. In denary, i.e. base 10, the first digit after the
decimal point gives the number of tenths; the next gives the number of hundredths, etc.
In binary, the first digit after the floating point gives the number of halves, the next the
number of quarters, etc. For example,
Fraction    Binary      Fraction             Binary
1/2         0.1         3/4 = 1/2 + 1/4      0.11
1/4         0.01        3/8 = 1/4 + 1/8      0.011
1/8         0.001       5/8 = 1/2 + 1/8      0.101
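The table can be reproduced with two small helper functions, sketched below with names chosen for illustration. The first adds up halves, quarters, eighths and so on. The second goes the other way by repeatedly doubling the fraction and noting whether the result has reached 1.

```python
from fractions import Fraction

def binary_fraction_to_fraction(bits):
    """bits holds the digits after the point, e.g. '101' for 0.101."""
    return sum(Fraction(int(d), 2 ** (i + 1)) for i, d in enumerate(bits))

def fraction_to_binary(value, max_digits=32):
    """Write a fraction between 0 and 1 in binary, up to max_digits places."""
    digits = []
    while value and len(digits) < max_digits:
        value *= 2
        if value >= 1:
            digits.append("1")
            value -= 1
        else:
            digits.append("0")
    return "0." + ("".join(digits) or "0")

print(binary_fraction_to_fraction("101"))  # 5/8
print(fraction_to_binary(Fraction(3, 8)))  # 0.011
```

The `max_digits` limit is there because a fraction whose denominator is not a power of 2 (such as 1/3) never terminates in binary.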
Exercise 2
a) Write the following fractions in binary.
   i) 1/16      ii) 5/16      iii) 9/64
b) What are these binary numbers as decimals in base 10?
   i) 0.1101      ii) 0.10101      iii) 0.100011

But how do we write the number 7 653 888 / 16 777 216 in binary?
Think about how we would write it in denary (decimal). A good start is to write the
fraction in its simplest form. In this case this is easy to do because the denominator is a
power of 2 (in fact, 16 777 216 = 2^24). So to simplify the fraction we just divide the
numerator by 2 as many times as we can until we get an odd number, then divide the
denominator by the same amount.
So
7 653 888 / 16 777 216 = (2^9 × 14 949) / 2^24 = 14 949 / 2^15 = 14 949 / 32 768
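The simplification above amounts to halving the numerator until it is odd and lowering the power of 2 in the denominator by one each time. A minimal sketch, using the hypothetical function name `strip_factors_of_two`:

```python
def strip_factors_of_two(numerator, exponent):
    """Simplify numerator / 2**exponent by cancelling common factors of 2."""
    while numerator != 0 and numerator % 2 == 0 and exponent > 0:
        numerator //= 2
        exponent -= 1
    return numerator, exponent

numerator, exponent = strip_factors_of_two(7653888, 24)
print(numerator, exponent, 2 ** exponent)  # 14949 15 32768
```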
If we were calculating this in decimal, we would do it by long division. However, in
binary, it is actually a good deal simpler because the denominator is a power of 2.
To understand why, think about how you would write 14 949 / 100 000 in decimal: it's
14 949 / 10^5 = just 0.14949. And 14 949 / 10^6 is simply 0.014949. In general,
14 949 / 10^n is 14 949.0 with the decimal point moved to the left by n places.
We can use exactly the same method in binary: all we have to do is convert 14 949 to a
binary number and then, as the denominator is 32 768 = 2^15, we move the floating point
to the left by 15 places.
But how do we convert a number to binary? Start by finding the largest power of 2
which is less than or equal to our number, then subtract it, keeping a note of which power
of 2 it was. Then repeat with the remainder, and so on. So ...
n     2^n     Test              Include 2^n?    Calculate remainder
13    8192    14 949 > 8192     Yes             14 949 - 8192 = 6757
12    4096    6757 > 4096       Yes             6757 - 4096 = 2661
11    2048    2661 > 2048       Yes             2661 - 2048 = 613
10    1024    613 < 1024        No
9     512     613 > 512         Yes             613 - 512 = 101
8     256     101 < 256         No
7     128     101 < 128         No
6     64      101 > 64          Yes             101 - 64 = 37
5     32      37 > 32           Yes             37 - 32 = 5
4     16      5 < 16            No
3     8       5 < 8             No
2     4       5 > 4             Yes             5 - 4 = 1
1     2       1 < 2             No
0     1       1 = 1             Yes
So 14 949 = 2^13 + 2^12 + 2^11 + 2^9 + 2^6 + 2^5 + 2^2 + 1 = 11101001100101 in binary
(corresponding to the pattern of Yes answers above).
Therefore
7 653 888 / 16 777 216 = 14 949 / 32 768 = 0.011101001100101 in binary.
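The subtract-the-largest-power method and the final shift of the floating point can be sketched in Python as follows. The function names are chosen for illustration, and `shift_point_left` assumes the number of places is at least the number of binary digits, as it is here.

```python
def to_binary_by_subtraction(number):
    """Convert a whole number to binary by subtracting powers of 2."""
    if number == 0:
        return "0"
    power = 1
    while power * 2 <= number:       # largest power of 2 not exceeding number
        power *= 2
    digits = []
    remainder = number
    while power >= 1:
        if remainder >= power:       # 'Yes': include this power of 2
            digits.append("1")
            remainder -= power
        else:                        # 'No': leave it out
            digits.append("0")
        power //= 2
    return "".join(digits)

def shift_point_left(bits, places):
    """Divide a binary whole number by 2**places (places >= len(bits))."""
    return "0." + bits.rjust(places, "0")

bits = to_binary_by_subtraction(14949)
print(bits)                        # 11101001100101
print(shift_point_left(bits, 15))  # 0.011101001100101
```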
However, we can represent the message BE_A_BEE as any number in the interval from
7 653 888 / 16 777 216 to 7 654 320 / 16 777 216.
Exercise 3
Show that the binary representation of 7 654 320 / 16 777 216
is 0.01110100110010111011.
How do we choose a number to represent our message? To get the best compression, we
want to send the least possible number of binary digits, so we want the shortest number
between
0.011101001100101
and
0.01110100110010111011
In this case, the first point in the interval has the shortest binary representation out of all
the points in the interval, so we choose this to represent our message. (Note that it is not
always an endpoint: in general it is the number in the interval whose numerator can be
divided by 2 the most times.)
So
BE_A_BEE → 011101001100101
This is 15 digits long as there is no need to send the first zero or the floating point.
Is this an effective compression? If we were using ASCII computer codes, the message
would be 8 × 5 = 40 digits long, so there is a significant improvement using the
Arithmetic Coding method.
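Choosing the shortest binary number in the interval can also be done by a simple search: try 1 binary digit, then 2, then 3, and stop at the first length for which some multiple of 1/2^k lands inside the interval. The sketch below makes two assumptions that the text does not state: the function name `shortest_binary_in_interval` is invented for illustration, and the upper end of the interval is treated as not belonging to the message.

```python
import math
from fractions import Fraction

def shortest_binary_in_interval(low, high):
    """Digits after the point of the shortest binary fraction in [low, high)."""
    assert 0 <= low < high <= 1
    k = 1
    while True:
        m = math.ceil(low * 2 ** k)      # smallest m with m / 2^k >= low
        if Fraction(m, 2 ** k) < high:   # does it fall inside the interval?
            return format(m, "b").zfill(k)
        k += 1

low = Fraction(7653888, 16777216)
high = Fraction(7654320, 16777216)
code = shortest_binary_in_interval(low, high)
print(code, len(code))  # 011101001100101 15
```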
Activity 1
Design a Huffman code for this message. How many digits would be in the message?
Exercise 4
We want to compress the word LOSSLESS using arithmetic coding.
The frequency counts for the characters in this message are:

Character    E   L   O   S
Frequency    1   2   1   4

a) to i)  Fill in the gaps indicated by the letters in brackets on the following model.
(For ease of presentation, only the numerators of the fractions are written on
the diagram; the denominator for each row is at the side.)
[Diagram: the interval is subdivided once for each letter of LOSSLESS, one row per letter.
The denominators of the rows are 8, 64, 512, 4096, 32 768, 262 144, 2 097 152 and
16 777 216. Only the numerators are shown, and the gaps to be filled in are labelled
(a) to (i).]
j) What interval represents the word LOSSLESS using this model?
k) What is the interval in binary?
l) What is the shortest binary representation of LOSSLESS that we can send using
this method?
Exercise 5
You receive the following message which has been compressed using arithmetic coding:
0110100110101
You know that the following characters were sent:
3 As, 1 C, 1 D, 1 I and 2 Ns.
Decode the message.
a) What fraction does the number 0110100110101 represent? [Hint: to make
the following calculations easier, scale this fraction up so that the denominator
is 16 777 216.]
b) Divide up the interval [0, 1) according to the character frequencies, as in the
example. Find where on this line your answer to a) is, and hence find the first
letter of the message.
c) Subdivide the interval with your answer to a) in it. Where in this new interval is
your answer to a)? Hence find the second letter of the message.
d) Continue subdividing the intervals in this way until you have found all 8 letters.
What is the message?
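The decoding steps in this exercise can be sketched in Python. To avoid giving away the answer, the example below decodes the earlier message BE_A_BEE from its 15 binary digits instead. The function name `decode`, the letter order E, B, _, A and the half-open intervals are assumptions made for illustration.

```python
from fractions import Fraction

def decode(bits, counts, order, length):
    """Recover a message of the given length from its binary digits."""
    value = Fraction(int(bits, 2), 2 ** len(bits))   # the number 0.bits
    total = sum(counts.values())
    low, width = Fraction(0), Fraction(1)
    message = []
    for _ in range(length):
        start = 0
        for s in order:
            s_low = low + Fraction(start, total) * width
            s_high = s_low + Fraction(counts[s], total) * width
            if s_low <= value < s_high:              # value lies in s's part
                message.append(s)
                low, width = s_low, s_high - s_low   # subdivide this part next
                break
            start += counts[s]
    return "".join(message)

counts = {"E": 3, "B": 2, "_": 2, "A": 1}
decoded = decode("011101001100101", counts, "EB_A", 8)
print(decoded)  # BE_A_BEE
```

The receiver needs the character counts and the message length as well as the binary digits, which is why the exercise tells you which characters were sent.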
