
Coding Theory and its Applications

Hung-Lin Fu
Dept. of Applied Mathematics
National Chiao Tung University
Hsin Chu, Taiwan

Basic Ideas
Message transmission: correctness and security, while saving time and expense.
Studying security is the main job of cryptography.
Coding theory deals not only with the correctness of transmission but also with its speed.

The Flow of Transmission
Message -> Encode -> Modulation -> (Noisy Channel) -> Demodulation -> Decode -> Original Message

Examples
Grades A, B, C, and D.
Use digits 0 and 1 to encode:
A : 00
B : 01
C : 10
D : 11
To send A, transmit 00.

Receiving
Following demodulation and decoding, we expect to recover the original message A.
Unfortunately, errors may occur due to noise.

Probability of Errors
Let p denote the probability of sending 0 and receiving 1.
In a symmetric channel, sending 1 and receiving 0 also has error probability p.
If t digits are transmitted, then the probability of making exactly s errors is C(t,s) p^s (1-p)^(t-s).
The probability of making at least one error is C(t,1) p (1-p)^(t-1) + C(t,2) p^2 (1-p)^(t-2) + ... + p^t = 1 - (1-p)^t.
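A minimal Python sketch of these formulas (function names are illustrative); for the two-digit grade code it reproduces the 0.0199 figure computed later:

```python
# Error probabilities over a binary symmetric channel with error rate p.
from math import comb

def prob_s_errors(t, s, p):
    """Probability of exactly s errors among t transmitted digits."""
    return comb(t, s) * p**s * (1 - p)**(t - s)

def prob_any_error(t, p):
    """Probability of at least one error: 1 - (1-p)^t."""
    return 1 - (1 - p)**t

p = 0.01
print(prob_any_error(2, p))                          # 0.0199
print(sum(prob_s_errors(2, s, p) for s in (1, 2)))   # same value, term by term
```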

[Diagram: the binary symmetric channel. 0 -> 0 and 1 -> 1 with probability 1-p; 0 -> 1 and 1 -> 0 with probability p.]

It Happens!
Let p = 0.01.
This looks small, but in a real-world transmission it is not: millions of digits are transmitted per minute, so roughly 10,000 digits per minute are in error.
Therefore, if we use 00, 01, 10, and 11 for A, B, C, and D, errors in transmitted words do occur! The probability that a word is in error is 2 x (0.01) x (0.99) + (0.01)^2 = 0.0199.

An Improvement
Add a parity check digit:
00 -> 000
01 -> 011
10 -> 101
11 -> 110
The probability of making errors without noticing them is smaller:
C(3,2) x (0.01)^2 x (0.99) + (0.01)^3 = 0.000298.
We can add more digits instead of just one.
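A minimal sketch of the single parity-check digit in Python (function names are mine): append the XOR of the message bits, and flag any received word whose bits do not XOR to 0.

```python
def encode(bits):
    """Append one parity bit so the total number of 1s is even."""
    parity = 0
    for b in bits:
        parity ^= b
    return bits + [parity]

def check(word):
    """Return True if the word has even parity (no detected error)."""
    parity = 0
    for b in word:
        parity ^= b
    return parity == 0

print(encode([0, 1]))     # [0, 1, 1] -- the codeword for B
print(check([0, 1, 1]))   # True: parity holds
print(check([0, 0, 1]))   # False: a single error is detected
```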

Error Correction
When an error occurs, we may not be able to tell where the erroneous digit is.
So, we ask for retransmission.
But retransmission is not always possible.

The Idea of Correcting Errors
Repeat each word three times:
00 -> 000000
01 -> 010101
10 -> 101010
11 -> 111111
Assume that 101110 is received. We shall conclude that the message sent is 101010!

Hamming Distance
If the message has n digits, it can be expressed as an n-dimensional vector over the finite field GF(2).
E.g. 010101 corresponds to (0,1,0,1,0,1).
Let GF(2) = K.
K^n is a set of 2^n vectors.

A New Metric
Let (a_1, a_2, ..., a_n) and (b_1, b_2, ..., b_n) be two vectors of K^n. Their Hamming distance is the number of indices k, k = 1, 2, ..., n, such that a_k - b_k is not equal to 0.
E.g. d(101010,101110) = 1
d(000000,101110) = 4
d(111111,101110) = 2
d(010101,101110) = 5
Hamming distance is a metric!
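A one-function Python sketch of the Hamming distance, reproducing the examples above:

```python
def hamming(u, v):
    """Count the coordinates where two equal-length words differ."""
    assert len(u) == len(v)
    return sum(a != b for a, b in zip(u, v))

print(hamming("101010", "101110"))   # 1
print(hamming("000000", "101110"))   # 4
print(hamming("111111", "101110"))   # 2
print(hamming("010101", "101110"))   # 5
```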

Distance and Decoding
If the distance between two words u and v of length n is d, then the probability of sending u and receiving v is p^d (1-p)^(n-d).
Fact: If d(w,u) > d(v,u) and u is received, then v is more probable than w as the transmitted word (for p < 1/2).
E.g. Let 000000, 010101, 101010, and 111111 be the four possible transmitted words, and suppose 101110 is received. Then we choose 101010 as the transmitted word.

Maximum Likelihood Decoding
Let C be the code we use for transmission and u be the word received through the channel.
CMLD (Complete Maximum Likelihood Decoding): choose a codeword v of C for which d(v,u) is minimum and conclude that v is the transmitted codeword, whether or not such a v is unique (break ties arbitrarily).
IMLD (Incomplete MLD): if the minimizing v is not unique, ask for retransmission.
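A sketch of IMLD in Python for the repetition code above; the tie handling follows the slide (a non-unique minimum means retransmission):

```python
CODE = ["000000", "010101", "101010", "111111"]

def hamming(u, v):
    return sum(a != b for a, b in zip(u, v))

def imld(received, code=CODE):
    """Return the unique nearest codeword, or None to request retransmission."""
    dists = [(hamming(c, received), c) for c in code]
    best = min(d for d, _ in dists)
    nearest = [c for d, c in dists if d == best]
    if len(nearest) == 1:
        return nearest[0]
    return None   # not unique: ask for retransmission

print(imld("101110"))   # 101010, as on the slide
```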

Linear Codes
A code of length n is a subset of K^n.
A linear code of length n is a linear subspace of K^n. (Vectors are added coordinatewise using the addition of K.)
A linear (n,k,d)-code is a linear code of dimension k and distance d, where d is the minimum distance between two distinct codewords.

Weights of Codewords
Each vector of a code is called a codeword.
The weight of a codeword is the number of 1s in it.
E.g. wt(101011) = 4.
Proposition. The distance of a linear code is equal to the minimum weight of a non-zero codeword. (This holds because d(u,v) = wt(u+v), and u+v is again a codeword.)

Main Theorem
Theorem. A code with distance d can detect d-1 errors and correct [(d-1)/2] errors.
Proof sketch. Suppose the codeword w of the code C is sent and v is received with d(v,w) <= [(d-1)/2]. For every other codeword y in C, the triangle inequality gives d(v,y) >= d(w,y) - d(w,v) >= d - [(d-1)/2] > [(d-1)/2] >= d(v,w), so w is the unique nearest codeword to v.

Better Codes
The length of a codeword determines the transmission time.
The dimension of a linear code gives the information rate k/n.
The distance of a code tells you how many errors can be detected (or corrected).
The bits which are not information bits are parity check bits; there are n-k of them.
A(n,d) is the maximum number of words of length n such that the distance between any two words is at least d.
A code C is (n,d)-optimal if C has A(n,d) codewords. (A[n,d] is the analogue for linear codes.)

The Most Important Problem in Coding Theory
Given two positive integers n and d with d < n, determine A(n,d) and A[n,d].
A(7,3) <= 2^7 / (1+7) = 16 (sphere-packing bound).
A(7,3) = 16 (by direct constructions).

Two Constructions
Use a Steiner triple system of order 7:
{1,2,4}, {2,3,5}, {3,4,6}, {4,5,7}, {5,6,1}, {6,7,2}, {7,1,3}.
The 16 codewords are the 7 characteristic vectors of the triples, their 7 complements, and the two constant words:
1101000  0110100  0011010  0001101  1000110  0100011  1010001
0010111  1001011  1100101  1110010  0111001  1011100  0101110
0000000  1111111
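A Python sketch verifying the construction: the 16 words have pairwise distance at least 3, so they meet the sphere-packing bound A(7,3) <= 2^7/(1+7) = 16; the minimum distance also equals the minimum nonzero weight, illustrating the proposition on weights.

```python
from itertools import combinations

blocks = [{1,2,4},{2,3,5},{3,4,6},{4,5,7},{5,6,1},{6,7,2},{7,1,3}]
# Characteristic vectors of the triples, their complements, 0000000 and 1111111.
words = ["".join("1" if i in b else "0" for i in range(1, 8)) for b in blocks]
words += ["".join("0" if c == "1" else "1" for c in w) for w in words]
words += ["0000000", "1111111"]

def dist(u, v):
    return sum(a != b for a, b in zip(u, v))

print(len(set(words)))                                      # 16 distinct words
print(min(dist(u, v) for u, v in combinations(words, 2)))   # 3 = minimum distance
print(min(w.count("1") for w in words if w != "0000000"))   # 3 = min nonzero weight
```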

Parity Check Matrix
The code we plan to construct is a linear code of dimension 4.
Using a 7x3 matrix H of rank 3, the set of vectors v satisfying vH = 0 forms a linear subspace of K^7 of dimension 4.
Let H^t =
1 0 0 1 0 1 1
0 1 0 1 1 1 0
0 0 1 0 1 1 1
(the rows of H are the powers 1, a, a^2, ..., a^6 of a primitive element a of GF(8) with a^3 = a + 1).
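A Python sketch of the membership test vH = 0, using H as above (rows of H listed as powers of the primitive element a):

```python
# Row k of H is a^k in GF(8), written in the basis 1, a, a^2, with a^3 = a + 1.
H = [(1,0,0), (0,1,0), (0,0,1), (1,1,0), (0,1,1), (1,1,1), (1,0,1)]

def syndrome(v):
    """Compute vH over GF(2): XOR the rows of H selected by the 1s of v."""
    s = (0, 0, 0)
    for bit, row in zip(v, H):
        if bit:
            s = tuple(a ^ b for a, b in zip(s, row))
    return s

print(syndrome((1,1,0,1,0,0,0)))   # (0, 0, 0): 1101000 is a codeword
print(syndrome((1,1,1,1,0,0,0)))   # nonzero: a detected error
```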

BCH Codes
BCH stands for Bose, Chaudhuri and Hocquenghem.
The code we just constructed is a 1-error-correcting BCH code.
Since no two rows (vectors) of H are the same, a nonzero vector v satisfying vH = 0 has weight at least 3. Hence the distance of the code is 3 (there are 3 rows which are linearly dependent, so weight 3 is attained).
The rows of H can be viewed as the set of all non-zero elements of GF(2^3).

A Different Point of View
K^n can be viewed as the set of all polynomials of degree at most n-1 with coefficients in K.
Let R_n = K[x]/(x^n + 1), i.e., set x^n = 1. Then R_n with polynomial addition and multiplication is a ring.
If f(x) is a divisor of x^n + 1, then the set of all multiples of f(x) in R_n is a linear (cyclic) code of dimension n - deg(f(x)).

Quiz
Consider R_7.
x^7 + 1 = (1+x)(1+x+x^3)(1+x^2+x^3)? (Hint: over GF(2), 1 = -1 and (1+x)^2 = 1+x^2.)
The set of all polynomials in R_7 which are multiples of 1+x+x^3 forms a linear code with 16 codewords (a sketch verifying this follows). It is essentially the same code as constructed above.
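A Python sketch of the quiz, representing polynomials over GF(2) as bit masks (bit i corresponds to x^i):

```python
def pmul(p, q):
    """Product in GF(2)[x]: carry-less multiplication of bit masks."""
    r = 0
    while q:
        if q & 1:
            r ^= p
        p <<= 1
        q >>= 1
    return r

f = pmul(pmul(0b11, 0b1011), 0b1101)  # (1+x)(1+x+x^3)(1+x^2+x^3)
print(f == (1 << 7) | 1)              # True: the product is x^7 + 1

# Multiples of g = 1+x+x^3 in R_7: since deg(g*m) <= 6 when deg(m) <= 3,
# no reduction mod x^7+1 is needed to list all 16 codewords.
g = 0b1011
code = sorted(pmul(g, m) for m in range(16))
print(len(code))                           # 16 codewords
print([f"{c:07b}"[::-1] for c in code[:4]])  # lowest degree first; includes 1101000
```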

Reed-Solomon Codes
Instead of K = GF(2), we use K = GF(q) where q is a prime power. (It is well known that a finite field of order q exists.) So the codewords are vectors with coordinates from GF(q). The field used in CDs has q = 2^8.
An RS(2^r, d)-code is a linear cyclic (2^r - 1, 2^r - d, d)-code over GF(q) generated by g(x) = (x + b^(m+1))(x + b^(m+2))...(x + b^(m+d-1)), where q = 2^r, m is a nonnegative integer, and b is a primitive element of GF(q).
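A Python sketch of the generator polynomial for toy parameters: an RS(2^3, 3)-code, i.e. a (7,5,3)-code over GF(8), taking m = 0 and b = x. These parameter choices are mine; the formula is the slide's.

```python
# GF(8) elements are ints 0..7; bits encode polynomials over GF(2) mod x^3+x+1.

def gf8_mul(a, b):
    """Multiply two GF(8) elements: carry-less multiply, reduce mod x^3+x+1."""
    r = 0
    for i in range(3):
        if (b >> i) & 1:
            r ^= a << i
    for i in (4, 3):                       # reduce using x^3 = x + 1
        if (r >> i) & 1:
            r ^= 0b1011 << (i - 3)         # 0b1011 = x^3 + x + 1
    return r

def poly_mul(p, q):
    """Multiply polynomials with GF(8) coefficients (lists, lowest degree first)."""
    out = [0] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, c in enumerate(q):
            out[i + j] ^= gf8_mul(a, c)    # addition in GF(8) is XOR
    return out

b = 2                    # the element x is primitive in GF(8)
g = [1]
power = b
for _ in range(2):       # d - 1 = 2 factors: (x + b)(x + b^2)
    g = poly_mul(g, [power, 1])
    power = gf8_mul(power, b)
print(g)                 # [3, 6, 1], i.e. g(x) = 3 + 6x + x^2; dimension 7 - 2 = 5
```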

Design of Compact Discs
(Key Contributions)
1948: C.E. Shannon publishes "A Mathematical Theory of Communication".
1950: R.W. Hamming publishes his paper on error-detecting and error-correcting codes.
1958: Invention of the laser.
1960: Experiments with computer music begin.

Story, Continued
1960: I.S. Reed and G. Solomon construct Reed-Solomon codes.
1969: Klaas Compaan, a Dutch physicist, comes up with the idea of the compact disc.
1970: Compaan completes a glass disc prototype and decides to use a laser.
1978: Philips releases the video disc player, and the type of laser for CD players is selected.
1980: CD standard proposed by Philips and Sony.
1982: Philips and Sony both have products ready to go.

Keep Going
1983: 30,000 CD players and 800,000 CDs sold in the U.S.
1984: Portable CD players (Sony Discman) go on sale.
1985: CD-ROM drives hit the computer market.
1990: 9.2 million players sold in the U.S. alone and about one billion CDs sold worldwide.
1997: DVD released; DVD players and movies hit the consumer market.
Now we cannot live without it.

A Brief Overview
Data storage in CD format is not simple. Typically, a user pictures the "1s" and "0s" in the memory of the computer as being directly transferred to "pits" and "bumps" on the disc.
In reality, the incoming data is first subjected to a series of coding operations. These operations add a number of parity bits to the data for error detection and correction. The data is also subjected to an interleaving process.

Concealment
Interpolation: some average is constructed from the valid data around an error, and this average is substituted for the erroneous data. Since most music (with the possible exception of heavy metal!) is continuous, this method works well for concealing relatively short errors. A sketch appears after this slide.
Muting: muting is a last-ditch technique, as it effectively creates a brief period of silence in the audio stream. However, it is not effective to simply set all the binary digits to zero, as this produces exactly the click we are trying to avoid! Instead, the volume is faded out and then back in again to conceal the error.
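A minimal Python sketch of interpolation concealment, assuming a flagged bad sample is replaced by the average of its two neighbours (the exact averaging used in real players may differ):

```python
def conceal(samples, bad):
    """Return a copy with each bad index replaced by its neighbours' mean."""
    out = list(samples)
    for i in sorted(bad):
        left = out[i - 1] if i > 0 else out[i + 1]
        right = out[i + 1] if i + 1 < len(out) else out[i - 1]
        out[i] = (left + right) // 2
    return out

print(conceal([100, 104, 9999, 112, 116], bad={2}))   # 9999 -> (104+112)//2 = 108
```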

Error-Correcting Ability
CD players use parity and interleaving techniques to minimize the effects of an error on the disc. Theoretically, the combination of parity and interleaving in a CD player can detect and correct a burst error of up to 4,000 bad bits, corresponding to a physical defect 2.47 mm long. Interpolation can conceal errors of up to 13,700 bits, or physical defects up to 8.5 mm long. (These are burst-error-correcting codes.)

EFM Modulation
EFM means Eight-to-Fourteen Modulation and is an incredibly clever way of reducing errors. The idea is to limit the 0-to-1 and 1-to-0 transitions, thus avoiding pits that are too small.
In EFM, only those 14-bit combinations are used in which at least two but at most ten zeros appear consecutively between ones.
E.g. 0000 1010 is mapped by EFM to 10010001000000.
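A Python sketch checking the zero-run rule on the example word. The [2,10] bounds are the standard EFM constraint; runs at the ends of the word are handled by merging bits between symbols and are ignored here.

```python
def zero_runs_ok(word, lo=2, hi=10):
    """True if every run of 0s between two 1s has length in [lo, hi]."""
    inner = word.strip("0")            # end runs are resolved by merging bits
    runs = [len(r) for r in inner.split("1") if r]
    return all(lo <= r <= hi for r in runs)

print(zero_runs_ok("10010001000000"))  # True: runs of 2 and 3 between the 1s
```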

[Figures 2 and 4: pits on the disc surface.]

Encoding
The original musical signal is a waveform in time. A sample of this waveform is taken and "digitized" into two 16-bit words, one for the left channel and one for the right channel.
For example, a single sample of the musical signal might look like:
L1 = 0111 0000 1010 1000
R1 = 1100 0111 1010 1000
Six samples of each channel (twelve words in total) are taken to form a frame such as
L1 R1 L2 R2 L3 R3 L4 R4 L5 R5 L6 R6.

Sound Has 2^16 Levels
The frame is then encoded as 8-bit words: each 16-bit audio word splits into two 8-bit words (sketched below), such as
L1 L1 R1 R1 L2 L2 R2 R2 L3 L3 R3 R3
L4 L4 R4 R4 L5 L5 R5 R5 L6 L6 R6 R6
This gives a grand total of 24 8-bit words. (The (L,R) pair produces the stereo effect, and there are 44,100 samples per second.)
The even words are then delayed by two blocks and the resulting "word" is scrambled.
This delay and scramble is the first part of the interleaving process.
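A one-line Python sketch of the 16-bit to two 8-bit split, using the sample L1 from the previous slide:

```python
L1 = 0b0111000010101000           # the left-channel sample from above
hi, lo = L1 >> 8, L1 & 0xFF       # high byte, low byte
print(f"{hi:08b} {lo:08b}")       # 01110000 10101000
```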

RS Codes Show Up!
Encoded by C(227), a (28,24,5)-RS code (a shortened RS code over GF(2^8)):
The resulting 24-byte word (remember, it has an included two-block delay, so some symbols in this word are from blocks two blocks behind) has 4 bytes of parity added. This particular parity is called "Q" parity. Parity errors found in this part of the algorithm are called C1 errors. More on the Q parity later.
4-frame delay interleaving:
Now the resulting 24 + 4Q = 28-byte word is interleaved. Each of the 28 bytes is delayed by a different period, each an integral multiple of 4 blocks. So the first byte might be delayed by 4 blocks, the second by 8 blocks, the third by 12 blocks, and so on. The interleaving spreads the word over a total of 28 x 4 = 112 blocks.
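A Python sketch of the 4-frame delay interleaving, assuming byte k of each 28-byte word is delayed by 4k blocks. The exact delay schedule is an assumption; the slide only fixes the delays up to multiples of 4.

```python
def interleave(words, unit=4):
    """Byte k of block t is emitted in output block t + unit*k."""
    n, depth = len(words), len(words[0])
    out = [[None] * depth for _ in range(n + unit * (depth - 1))]
    for t, word in enumerate(words):
        for k, byte in enumerate(word):
            out[t + unit * k][k] = byte
    return out

frames = [[f"b{t}.{k}" for k in range(28)] for t in range(3)]
spread = interleave(frames)
print(len(spread))   # 111 output blocks; each word spans about 28 x 4 blocks
```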

Another RS Code
Encoded by C(223), a (32,28,5)-RS code (again a shortened RS code over GF(2^8)):
The resulting 28-byte words are again subjected to a parity operation. This generates four more parity bytes, called P bytes, which are placed at the end of the 28-byte data word. The word is now a total of 28 + 4 = 32 bytes long. Parity errors found in this part of the algorithm are called C2 errors.
Finally, another odd-even delay is performed, but this time by just a single block. Both the P and Q parity bits are inverted (turning the "1s" into "0s") to assist data readout during muting.

EFM
A subcode of length 8 is then added to the front end of the word. The subcode specifies the total number of selections on the disc, their length, and so on.
Next, the data words are converted to EFM format (Eight-to-Fourteen Modulation, described above), which limits the 0-to-1 and 1-to-0 transitions and thus avoids pits that are too small.

Encode the Sound
Each frame finally has a 24-bit synchronization word attached to the very front end (for completeness, the word is 100000000001000000000010), and each of the 34 groups (33 EFM symbols plus the sync word) is followed by three merging bits.
SO! The final frame (which started as 6 x 16 x 2 = 192 data bits) now contains:

1 sync word:       24 bits
1 subcode signal:  14 bits (8 bits, EFM-expanded to 14)
data:              6 x 2 x 2 x 14 = 336 bits
parity:            8 x 14 = 112 bits
merging bits:      34 x 3 = 102 bits

GRAND TOTAL: 588 bits.
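A quick Python check of the frame arithmetic above:

```python
sync    = 24             # synchronization word
subcode = 14             # 8 subcode bits, EFM-expanded to 14
data    = 6 * 2 * 2 * 14 # 6 samples x 2 channels x 2 bytes x 14 EFM bits
parity  = 8 * 14         # 8 parity bytes, each EFM-expanded to 14 bits
merge   = 34 * 3         # 33 EFM symbols + sync word, 3 merging bits each
print(sync + subcode + data + parity + merge)   # 588
```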

Music:

Final Words


You are lucky!
