Maximum value of Entropy contd
Setting the derivative of H(m) with respect to each P_i to zero gives:

    log2 P_n − log2 P_i = 0,   i.e.,   P_i = P_n

Therefore, the maximum value of entropy occurs for equiprobable messages, i.e., when:

    P_1 = P_2 = P_3 = ... = P_i = ... = P_n = 1/n

The previous equation is true if P_i = 1/n for all i, and then:

    H(m)_max = Σ_{i=1}^{n} (1/n) log2 n = log2 n

Therefore, the maximum value of entropy = the minimum number of binary digits required to encode the messages.
The Intuitive (Common Sense) and Engineering
Interpretation of Entropy
From the engineering point of view, the information content of any message is equal to
the minimum number of digits required to encode the message. Therefore,
Entropy = H(m) = The average value of the minimum number of digits
required for encoding each message
Now, from the intuitive (common sense) point of view, information is considered synonymous with the uncertainty or the surprise element associated with a message: a message with a lower probability of occurrence has greater uncertainty and therefore conveys more information.
Therefore, if log(1/P_i) is a measure of the uncertainty (unexpectedness) of message m_i, then:

    H(m) = Σ_{i=1}^{n} P_i log (1/P_i)

= avg. uncertainty per message of the source that generates the messages.
If the source is not memoryless (i.e., a message emitted at any time is not independent of the previously emitted messages), then the source entropy will be less than H(m) above. This is because the dependence on previous messages reduces the uncertainty.
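As an illustrative sketch of the formula above (the probabilities used here are examples, not values from these notes), the entropy of a discrete memoryless source can be computed directly:

```python
import math

def entropy(probs, base=2):
    """Average uncertainty H(m) = sum of P_i * log(1/P_i) over all messages."""
    assert abs(sum(probs) - 1.0) < 1e-9, "probabilities must sum to 1"
    return sum(p * math.log(1.0 / p, base) for p in probs if p > 0)

# Equiprobable messages attain the maximum entropy log2(n):
print(entropy([0.25] * 4))            # 2.0 bits
# A skewed source carries less average information:
print(round(entropy([0.8, 0.2]), 2))  # 0.72 bits
```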
Source Encoding
We know that the minimum number of binary digits required to encode a message is equal to the source entropy H(m) if all the messages are equiprobable (each with probability P). It can be shown that this result is true even for non-equiprobable messages.
Let a source emit n messages m_1, m_2, ..., m_n with probabilities P_1, P_2, P_3, ..., P_n respectively. Consider a sequence of N messages with N → ∞. Let K_i be the number of times message m_i occurs in this sequence. Then, by the relative-frequency interpretation of probability:

    lim (N→∞) K_i / N = P_i

Thus, the message m_i occurs N·P_i times in the whole sequence of N messages.
Source Encoding contd
Now, consider a typical sequence S_N of N messages from the source.
Because the n messages (of probabilities P_1, P_2, P_3, ..., P_n) occur N·P_1, N·P_2, ..., N·P_n times respectively, and because each message is independent, the probability of occurrence of a typical sequence S_N is given by:

    P(S_N) = P_1^(N·P_1) · P_2^(N·P_2) · ... · P_n^(N·P_n)

Therefore, the number of binary digits required to encode such a sequence is:

    L_N = log2 [1 / P(S_N)] = N Σ_{i=1}^{n} P_i log2 (1/P_i)

    L_N = N·H(m) bits
Source Encoding contd
Therefore, the average number of binary digits required per message is given as:

    L = L_N / N = H(m) bits

This shows that we can encode a sequence of non-equiprobable messages by using, on average, H(m) bits per message.
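The limiting argument can be checked numerically. For the illustrative source below (not one from these notes), N is chosen so that each message occurs exactly N·P_i times, and log2[1/P(S_N)]/N then equals H(m) exactly:

```python
import math

probs = [0.5, 0.25, 0.125, 0.125]   # illustrative source; probabilities sum to 1
N = 8                               # chosen so that every N * P_i is an integer

# Probability of a typical sequence: product of P_i raised to the power N*P_i
p_seq = math.prod(p ** (N * p) for p in probs)
L_N = math.log2(1.0 / p_seq)        # digits needed for the whole sequence
H = sum(p * math.log2(1.0 / p) for p in probs)

print(L_N / N)   # 1.75 digits per message, i.e. exactly H(m)
print(H)         # 1.75
```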
Compact Codes
The source coding theorem states that in order to encode a source with entropy H(m), we need a minimum of H(m) binary digits per message, or H_r(m) r-ary digits per message.
Thus, the average wordlength of an optimum code is H(m), but to attain this length we have to encode a sequence of N messages (N → ∞) at a time.
However, if we wish to encode each message directly without using longer sequences, then the average length of the code word per message will be > H(m).
In practice, it is not desirable to use long sequences, as they cause transmission delay and add to the equipment complexity. Therefore, it is preferred to encode messages directly, even if a price has to be paid in terms of increased wordlength.
Huffman Code
Let us suppose that we are given a set of n messages. Then, to find the Huffman Code:
Step 1
All the messages are arranged in the order of decreasing probability
Step 2
The last two messages (messages with least probabilities) are then combined into
one message (i.e. their probabilities are added up)
Step 3
These messages are now again arranged in the decreasing order of probability
Step 4
The whole of the process is repeated until only two messages are left (in the case of
binary digits coding) or r messages are left ( in the case of r-ary digits coding).
Step 5
In the case of binary digits coding, the two (reduced) messages are assigned 0 and 1 as their first digits in the code sequence (and in the case of r-ary digits coding, the reduced messages are assigned 0, 1, ..., r−1).
Huffman Code contd
Step 6
We then go back and assign the digits 0 and 1 to the second digit for the two messages that were combined in the previous step. We keep regressing this way until the first column is reached.
Step 7
The optimum (Huffman) code obtained this way is called the Compact Code.
The average length of the compact code is given as:

    L = Σ_{i=1}^{n} P_i L_i

This is compared with the source entropy given by:

    H_r(m) = Σ_{i=1}^{n} P_i log_r (1/P_i)
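Steps 1–6 can be sketched with a priority queue; the message probabilities below are illustrative, not taken from the slides:

```python
import heapq

def huffman(probs):
    """Binary Huffman code: returns {message_index: codeword}."""
    # Heap entries are (probability, tiebreaker, node); a node is either an int
    # (a message index) or a pair of child nodes combined in Step 2.
    heap = [(p, i, i) for i, p in enumerate(probs)]
    heapq.heapify(heap)
    count = len(heap)
    while len(heap) > 1:
        p1, _, n1 = heapq.heappop(heap)       # the two least-probable messages...
        p2, _, n2 = heapq.heappop(heap)
        heapq.heappush(heap, (p1 + p2, count, (n1, n2)))  # ...combined (Step 2)
        count += 1
    codes = {}
    def assign(node, prefix):                 # Steps 5-6: walk back assigning 0/1
        if isinstance(node, tuple):
            assign(node[0], prefix + "0")
            assign(node[1], prefix + "1")
        else:
            codes[node] = prefix or "0"
    assign(heap[0][2], "")
    return codes

probs = [0.4, 0.19, 0.16, 0.15, 0.1]          # illustrative probabilities
codes = huffman(probs)
L_avg = sum(probs[i] * len(c) for i, c in codes.items())
print(codes)
print(round(L_avg, 2))                        # average length 2.2 binary digits
```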
EXERCISE-1
Ans. 1 contd
The source entropy, code efficiency and redundancy are:

    H(m) = Σ_{i=1}^{n} P_i log2 (1/P_i)
         = 0.521089678 + 0.5 + 0.4105448 + 0.3670672 + 0.3321928 + 0.2915085
         ≈ 2.42 bits

Code Efficiency: η = H(m)/L = 0.976
Redundancy: γ = 1 − η = 0.024
EXERCISE-1 contd
Ques. 2 A zero-memory source emits six messages as shown. Find the 4-ary (quaternary) Huffman Code. Determine its avg. wordlength, the efficiency and the redundancy.
Messages    Probabilities
m1          0.30
m2          0.15
m3          0.25
m4          0.08
m5          0.10
m6          0.12
EXERCISE-1 contd
Ans. 2 The optimum Huffman code is obtained as follows:

Messages    Probabilities    Code    S1
m1          0.30             0       0.30   0
m2          0.25             2       0.30   1
m3          0.15             3       0.25   2
m4          0.12             10      0.15   3
m5          0.10             11
m6          0.08             12
m7          0.00             13

(m7 is a dummy message of zero probability, added so that the number of messages satisfies n ≡ 1 (mod r − 1), as required for r-ary Huffman coding with r = 4.)
Ans. 2 contd
The average length of the compact code is given by:

    L = Σ_{i=1}^{n} P_i L_i
      = (0.30 × 1) + (0.25 × 1) + (0.15 × 1) + (0.12 × 2) + (0.10 × 2) + (0.08 × 2)
      = 1.3 4-ary digits

The entropy H(m) of the source is given by:

    H_4(m) = Σ_{i=1}^{n} P_i log_4 (1/P_i) ≈ 1.211 4-ary digits

Code Efficiency: η = H_4(m)/L = 0.93
Redundancy: γ = 1 − η = 0.07
EXERCISE-1 contd
Ques. 3 A zero-memory source emits messages m1 and m2 with probabilities 0.8 and 0.2 respectively. Find the optimum binary code for the source as well as for its second- and third-order extensions (i.e., N = 2 and 3).
(Ans. For N = 1: H(m) = 0.72; η = 0.72
For N = 2: L = 1.56; η = 0.923
For N = 3: L = 2.184; η = 0.989)
Shannon Fano Code
An efficient code can be generated by the following simple procedure known as the
Shannon-Fano algorithm:
Step 1
List the source symbols in order of decreasing probability
Step 2
Partition the set into two sets that are as close to equiprobable as possible
Assign 0 to the upper set and 1 to the lower set.
Step 3
Continue this process, each time partitioning the sets with as nearly equal
probabilities as possible until further partitioning is not possible.
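Steps 1–3 can be sketched recursively; the best split point is found by scanning, and the message set below reuses the probabilities of a later example (0.40, 0.19, 0.16, 0.15, 0.10) purely for illustration:

```python
def shannon_fano(symbols):
    """symbols: list of (name, probability), pre-sorted by decreasing probability.
    Returns {name: codeword}."""
    if len(symbols) == 1:
        return {symbols[0][0]: ""}
    total = sum(p for _, p in symbols)
    # Step 2: find the partition point making the two sets closest to equiprobable
    best_i, best_diff, running = 1, float("inf"), 0.0
    for i in range(1, len(symbols)):
        running += symbols[i - 1][1]
        diff = abs(running - (total - running))
        if diff < best_diff:
            best_i, best_diff = i, diff
    codes = {}
    for name, code in shannon_fano(symbols[:best_i]).items():
        codes[name] = "0" + code          # upper set gets 0
    for name, code in shannon_fano(symbols[best_i:]).items():
        codes[name] = "1" + code          # lower set gets 1
    return codes

msgs = [("x1", 0.40), ("x2", 0.19), ("x3", 0.16), ("x4", 0.15), ("x5", 0.10)]
codes = shannon_fano(msgs)
print(codes)   # average length works out to 2.25 binary digits
```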
EXERCISE-2
Ques. 1 Obtain the Shannon-Fano Code for the following given set of messages x_i, with probabilities P(x_i), from a memoryless source:

Messages    Probabilities
X1          0.30
X2          0.25
X3          0.20
X4          0.12
X5          0.08
X6          0.05
EXERCISE-2 contd
Ans. 1
The Shannon-Fano coding proceeds as:

Messages    Probabilities    Code
X1          0.30             00
X2          0.25             01
X3          0.20             10
X4          0.12             110
X5          0.08             1110
X6          0.05             1111

Ans. 1 contd
The average wordlength of the code word is:

    L = (2 × 0.30) + (2 × 0.25) + (2 × 0.20) + (3 × 0.12) + (4 × 0.08) + (4 × 0.05)
      = 2.38 bits

    H(m) = −(0.30 log2 0.30 + 0.25 log2 0.25 + 0.20 log2 0.20
             + 0.12 log2 0.12 + 0.08 log2 0.08 + 0.05 log2 0.05)
         ≈ 2.36 bits

Code Efficiency: η = H(m)/L ≈ 0.99
Redundancy: γ = 1 − η ≈ 0.01
Some more examples
Ques. 2 A memoryless source emits messages x1, x2, x3, x4 and x5 with P(x1) = 0.4, P(x2) = 0.19, P(x3) = 0.16, P(x4) = 0.15 and P(x5) = 0.1.
(i) Construct a Shannon-Fano Code for X and calculate the efficiency of the code.
(ii) Repeat for the Huffman Code and compare the results.
Ans. 2
The Shannon-Fano Code can be obtained as:

Messages    Probabilities    Code
x1          0.40             00
x2          0.19             01
x3          0.16             10
x4          0.15             110
x5          0.10             111
Ans. 2 contd
The average wordlength of the code word is:

    L = 2 × (0.40 + 0.19 + 0.16) + 3 × (0.15 + 0.10)
      = 2.25 bits

    H(m) = −(0.40 log2 0.40 + 0.19 log2 0.19 + 0.16 log2 0.16
             + 0.15 log2 0.15 + 0.10 log2 0.10)
         = 2.1497523 bits

Code Efficiency: η = H(m)/L = 0.9554454
Redundancy: γ = 1 − η = 0.0445546
Ans. 2 contd
Now, let us generate the Huffman Code and compare the results. The Huffman code is generated as:

Messages    Probabilities    Code    S1            S2            S3
m1          0.40             1       0.40   1      0.40   1      0.60   0
m2          0.19             000     0.25   01     0.35   00     0.40   1
m3          0.16             001     0.19   000    0.25   01
m4          0.15             010     0.16   001
m5          0.10             011
Ans. 2 contd
The average length of the compact code is given by:

    L = Σ_{i=1}^{n} P_i L_i = 0.40 × 1 + (0.19 + 0.16 + 0.15 + 0.10) × 3
      = 2.2 bits

Code Efficiency: η = H(m)/L = 0.9771601

Therefore, we find that the efficiency of the Huffman Code is better than that of the Shannon-Fano Code.
Some more examples contd
Ques. 3 Construct Shannon-Fano Code and Huffman Code for five
equiprobable messages emitted from a memory-less source with
probability P=0.2
Ans. 3
The Shannon-Fano coding proceeds as:

Messages    Probabilities    Code
m1          0.2              00
m2          0.2              01
m3          0.2              10
m4          0.2              110
m5          0.2              111

Ans. 3 contd
The average wordlength of the code word is:

    L = 3 × (0.2 × 2) + 2 × (0.2 × 3) = 2.4 bits

    H(m) = −5 × (0.2 log2 0.2) = 2.3219281 bits

Code Efficiency: η = H(m)/L = 0.96747
Redundancy: γ = 1 − η = 0.03253
Ans. 3 contd
Now, the Huffman code is generated as:

Messages    Probabilities    Code    S1            S2            S3
m1          0.20             01      0.40   1      0.40   1      0.60   0
m2          0.20             000     0.20   01     0.40   00     0.40   1
m3          0.20             001     0.20   000    0.20   01
m4          0.20             10      0.20   001
m5          0.20             11
Ans. 3 contd
The average length of the compact code is given by:

    L = Σ_{i=1}^{n} P_i L_i = 3 × (0.2 × 2) + 2 × (0.2 × 3)
      = 2.4 bits

Code Efficiency: η = H(m)/L = 0.96747

The efficiency is the same for both the codes.
Construct Shannon-Fano Code and Huffman
Code for the following:
Messages    Probabilities
m1          0.50
m2          0.25
m3          0.125
m4          0.125

Ans: For Shannon-Fano Code: H(m) = 1.75; L = 1.75; η = 1
Messages    Probabilities
m1          1/2
m2          1/4
m3          1/8
m4          1/16
m5          1/32
m6          1/32

Ans: For Shannon-Fano Code: H(m) = 1.9375; L = 1.9375; η = 1
Messages    Probabilities
m1          1/3
m2          1/3
m3          1/9
m4          1/9
m5          1/9

Ans: For Shannon-Fano Code (in ternary digits): H(m) = 4/3; L = 4/3; η = 1
CHANNEL CAPACITY OF A DISCRETE MEMORYLESS
CHANNEL
Let a source emit symbols x1, x2, ..., xr. The receiver receives symbols y1, y2, ..., ys.
The set of received symbols may or may not be identical to the transmitted set.
If the channel is noiseless:
The reception of some symbol y_j uniquely determines the message transmitted.
Because of noise, however, there is a certain amount of uncertainty regarding the transmitted symbol when y_j is received.
If P(x_i | y_j) represents the conditional probability that x_i was transmitted when y_j is received, then there is an uncertainty of log[1/P(x_i | y_j)] about x_i when y_j is received.
Thus, the average loss of information over the transmitted symbol when y_j is received is given as:

    H(x | y_j) = Σ_i P(x_i | y_j) log [1 / P(x_i | y_j)]   bits per symbol
Contd
When this uncertainty is averaged over all x_i and y_j, we obtain the average uncertainty about a transmitted symbol when a symbol is received, denoted H(x|y):

    H(x|y) = Σ_j P(y_j) H(x | y_j)   bits per symbol

    H(x|y) = Σ_i Σ_j P(x_i, y_j) log [1 / P(x_i | y_j)]   bits per symbol

This uncertainty is caused by channel noise. Hence, it is the average loss of information about a transmitted symbol when a symbol is received. Therefore, H(x|y) is also called the equivocation of x with respect to y.
Contd
If the channel is noiseless:

    H(x|y) = 0

P(y_j | x_i) = probability that y_j is received when x_i is transmitted.
This is a characteristic of the channel and the receiver. Thus, a given channel (with its receiver) is specified by the channel matrix [P(y_j | x_i)].
Contd
We can obtain the reverse conditional probabilities P(x_i | y_j) using Bayes' Rule:

    P(x_i | y_j) = P(y_j | x_i) P(x_i) / P(y_j)
DISCRETE MEMORYLESS CHANNELS
Channel Representation
Contd
The figure shows a Discrete Memoryless Channel
It is a statistical model with an input X and output Y
During each unit of time (signaling interval), the channel accepts an input
symbol from X, and in response it generates an output symbol from Y.
The channel is discrete when the alphabets of both X and Y are finite
It is memoryless when the current output depends on only the current input and
not on any of the previous inputs.
The Channel Matrix
As discussed earlier, a channel is completely specified by the complete set of transition probabilities. Accordingly, the DMC shown above is specified by a matrix of transition probabilities [P(Y|X)], whose element in row i, column j is P(y_j | x_i).
Since each input to the channel results in some output, each row of the channel matrix must add up to unity, i.e.:

    Σ_j P(y_j | x_i) = 1   for all i
Contd
If the input probabilities P(X) are represented by the row matrix:

    [P(X)] = [ P(x1)  P(x2)  ...  P(xm) ]

then the output probabilities P(Y) are represented by the row matrix:

    [P(Y)] = [ P(y1)  P(y2)  ...  P(yn) ] = [P(X)] [P(Y|X)]

Contd
If P(X) is represented as a diagonal matrix [P(X)]_d, then:

    [P(X, Y)] = [P(X)]_d [P(Y|X)] = Joint Probability Matrix

The element P(x_i, y_j) is the joint probability of transmitting x_i and receiving y_j.
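These matrix relations can be checked numerically with plain Python lists; the 2-input, 2-output channel below is an assumed example, not the one from the exercises:

```python
def mat_mul(row, matrix):
    """Multiply a probability row vector by a channel matrix."""
    return [sum(row[i] * matrix[i][j] for i in range(len(row)))
            for j in range(len(matrix[0]))]

# Assumed channel: rows are P(y_j | x_i) and each row sums to 1
P_Y_given_X = [[0.9, 0.1],
               [0.2, 0.8]]
P_X = [0.5, 0.5]

P_Y = mat_mul(P_X, P_Y_given_X)        # [P(Y)] = [P(X)][P(Y|X)]
# Joint probability matrix: P(x_i, y_j) = P(x_i) * P(y_j | x_i)
P_XY = [[P_X[i] * P_Y_given_X[i][j] for j in range(2)] for i in range(2)]

print(P_Y)    # ≈ [0.55, 0.45]
print(P_XY)   # ≈ [[0.45, 0.05], [0.1, 0.4]]
```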
SPECIAL CHANNELS
Lossless Channel
A channel described by a channel matrix with only one non-zero element in each
column is called a lossless channel
A lossless channel is represented as:
Therefore, in a lossless channel, no source information is lost in transmission.
Deterministic Channel
A channel described by a channel matrix with only one non-zero element in each row is
called a deterministic channel
A deterministic channel is represented as:
Since each row has only one non-zero element, therefore, this element must be unity.
Thus, when a given symbol is sent in a deterministic channel, it is clear which output
symbol will be received.
Noiseless Channel
A channel is called noiseless if it is both lossless and deterministic.
A noiseless channel is represented as:
Therefore, the channel matrix has only one element in each row and in each column, and this element is unity. Also, the number of input symbols equals the number of output symbols, i.e., m = n for a noiseless channel.
Binary Symmetric Channel (BSC)
A binary symmetric channel is represented as:
This channel has two inputs (x1 = 0, x2 = 1) and two outputs (y1 = 0, y2 = 1).
It is a symmetric channel because the probability of receiving a 1 if a 0 is sent is the same as the probability of receiving a 0 if a 1 is sent.
This common transition probability is denoted by p.
EXERCISE-3
Ques. 1 Consider a binary channel as shown:
Find the channel matrix of the channel
Find P(y1) and P(y2) when P(x1)= P(x2)=0.5
Find the Joint Probabilities P(x1, y2), P(x2, y1)
Ans. 1
The Channel Matrix can be given as:
The Output Probability Matrix is given as:
Now, the Joint Probability Matrix is given as:
EXERCISE-3 contd
Ques. 2 Two binary channels discussed in the previous question are
connected in cascade as shown:
Find the overall channel matrix of the resultant channel, and draw the
equivalent channel diagram
Find P(z1) and P(z2) when P(x1)= P(x2)=0.5
Ans. 2
We have
The resultant equivalent channel diagram is shown as:
Ans. 2 contd
EXERCISE-3 contd
Ques. 3 A channel has the following channel matrix:
Draw the Channel diagram
If the source has equally likely outputs, compute the probabilities
associated with the channel outputs for p = 0.2
Ans. 3
Shown above is a Binary Erasure Channel with two inputs, x1 = 0 and x2 = 1, and three outputs, y1 = 0, y2 = e and y3 = 1, where e indicates an erasure, meaning that the output is in doubt and hence should be erased.
The output probabilities for the above channel at p = 0.2, with equally likely inputs, are:

    [P(Y)] = [P(X)] [P(Y|X)] = [0.5  0.5] × [[0.8  0.2  0], [0  0.2  0.8]] = [0.4  0.2  0.4]
ERROR-FREE COMMUNICATION OVER A NOISY CHANNEL
We know that messages from a source with entropy H(m) can be encoded by using
an average of H(m) digits per message. This encoding has, however, zero
redundancy.
Hence, if we transmit these coded messages over a noisy channel, some of the
information will be received erroneously. Therefore, we cannot have error-free
communication over a noisy channel when the messages are encoded with zero
redundancy. Redundancy in general helps combat noise.
A simple example of the use of redundancy is the Single Parity Check Code in
which an extra binary digit is added to each code word to ensure that the total
number of 1s in the resulting codeword is always even (or odd). If a single error
occurs in the received code-word, the parity is violated, and the receiver requests
retransmission.
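The single parity check code described above can be sketched as follows (even parity; the data word is an arbitrary example):

```python
def add_even_parity(bits):
    """Append one check bit so the total number of 1s is even."""
    return bits + [sum(bits) % 2]

def parity_ok(codeword):
    """Receiver side: even parity must hold, else request retransmission."""
    return sum(codeword) % 2 == 0

word = [1, 0, 1, 1]
sent = add_even_parity(word)       # [1, 0, 1, 1, 1] -> four 1s, parity even
print(parity_ok(sent))             # True

corrupted = sent.copy()
corrupted[2] ^= 1                  # a single bit error violates the parity
print(parity_ok(corrupted))        # False
```

Note that a single error is only detected, not located, which is why the receiver must request retransmission.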
Error-Correcting Codes
The two important types of error-correcting codes are Block Codes and Convolutional Codes.
Block and Convolutional codes
Block Codes
In block codes, a block of k data
digits is encoded by a code word of
n digits (n>k), i.e., for each
sequence of k data digits, there is a
distinct code word of n digits
In block codes, k data digits are
accumulated and then encoded into
a n-digit code word.
If k data digits are transmitted by a code word of n digits, the number of check digits is m = n − k.
The Code Efficiency (also known as the code rate) = k/n.
Such a code is called an (n, k) code.
Convolutional Codes
In Convolutional or recurrent codes, the coded sequence of n digits depends not only on the k data digits but also on the previous N−1 data digits (N > 1). Hence, the coded sequence for a certain k data digits is not unique but depends on the N−1 earlier data digits.
In Convolutional codes, the coding is done on a continuous, or running, basis.
LINEAR BLOCK CODES
A code word comprising n digits and a data word comprising k digits can be represented by row matrices as:

    c = (c_1, c_2, c_3, ..., c_n)
    d = (d_1, d_2, d_3, ..., d_k)

Generally, in linear block codes, all the n digits of c are formed by linear combinations (modulo-2 additions) of the k data digits.
A special case where c_1 = d_1, c_2 = d_2, ..., c_k = d_k, and the remaining digits c_{k+1} to c_n are linear combinations of d_1, d_2, d_3, ..., d_k, is known as a systematic code.
LINEAR BLOCK CODES contd
For linear block codes:
Minimum distance between code words: D_min
Number of errors that can be detected: D_min − 1
Number of errors that can be corrected:
    (D_min − 1)/2   if D_min is odd
    (D_min − 2)/2   if D_min is even
In a systematic code, the first k digits of a code word are the data digits and the last m = n − k digits are the parity-check digits, formed by linear combinations of the data digits d_1, d_2, d_3, ..., d_k.
Example 1:
For a (6, 3) code, the generator matrix is
For all eight possible data words find the corresponding code words, and
verify that this code is a single-error correcting code.
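The generator matrix itself did not survive in these notes; the G below is a hypothetical [I | P] choice that is consistent with the code words quoted later in the decoding discussion (d = 100 → c = 100101, d = 110 → c = 110110), and encoding c = dG can be sketched as:

```python
# Hypothetical generator matrix for a (6, 3) systematic code: G = [I3 | P],
# chosen to match the code words quoted in the decoding discussion.
G = [
    [1, 0, 0, 1, 0, 1],
    [0, 1, 0, 0, 1, 1],
    [0, 0, 1, 1, 1, 0],
]

def encode(d, G):
    """c = d G with modulo-2 arithmetic; the first k digits of c equal d."""
    n = len(G[0])
    return [sum(d[i] * G[i][j] for i in range(len(d))) % 2 for j in range(n)]

for d in [(1, 0, 0), (1, 1, 0), (1, 0, 1)]:
    print(d, "->", encode(list(d), G))
```

For this G, the minimum distance over all nonzero code words is 3, so the code can correct a single error.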
Solution
Solution contd: Decoding
Since the modulo-2 sum of any sequence with itself is zero, we get, for every valid code word c:

    c H^T = 0

Decoding
Decoding contd
But because of possible channel errors, r H^T is in general a non-zero row vector s, called the Syndrome:

    s = r H^T = (c ⊕ e) H^T = e H^T

Therefore, from the received word r we can get s and hence the error word e_i. But this procedure does not give a unique solution, because r can be expressed in terms of code words other than c_i.
Decoding contd
Since for k-dimensional data words there are 2^k code words, the equation

    s = e H^T

is satisfied by 2^k error vectors.
For e.g.:
If d = 100, the corresponding code word is c = 100101; if an error occurred in the third digit, then r = 101101 and e = 001000.
But for c = 101011 and e = 000110, the received word would also be r = 101101.
Similarly, for c = 110110 and e = 011011, again the received word would be r = 101101.
Therefore, in the case of 3-bit data, there are 8 possible data words, 8 corresponding code words, and hence 8 possible error vectors for each received word.
Maximum-likelihood Rule
If we receive r, then we decide in favour of that c for which r is most likely to have been received, i.e., the c corresponding to the e that represents the minimum number of bit errors.
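The maximum-likelihood rule can be sketched by brute force over all 2^k code words, reusing the same hypothetical [I | P] generator consistent with the quoted code words (d = 100 → c = 100101):

```python
from itertools import product

G = [[1, 0, 0, 1, 0, 1],
     [0, 1, 0, 0, 1, 1],
     [0, 0, 1, 1, 1, 0]]           # hypothetical (6, 3) generator, G = [I | P]

def encode(d):
    return [sum(d[i] * G[i][j] for i in range(3)) % 2 for j in range(6)]

def ml_decode(r):
    """Pick the data word whose code word differs from r in the fewest bits."""
    best_d, best_w = None, len(r) + 1
    for d in product([0, 1], repeat=3):
        c = encode(list(d))
        w = sum(ri ^ ci for ri, ci in zip(r, c))   # weight of e = r XOR c
        if w < best_w:
            best_d, best_w = d, w
    return best_d, best_w

r = [1, 0, 1, 1, 0, 1]             # c = 100101 with an error in the third digit
print(ml_decode(r))                # ((1, 0, 0), 1)
```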
Example-2
A (6, 3) code is generated according to the generating matrix in the
previous example. The receiver receives r = 100011. Determine the corresponding
data word d.
Solution
Solution contd
CYCLIC CODES
Cyclic codes are a subclass of linear block codes.
In linear block codes, the procedure for selecting a generator matrix is relatively easy for single-error correcting codes. However, it cannot carry us very far in constructing higher-order error-correcting codes. Cyclic codes have a fair amount of mathematical structure that permits the design of higher-order error-correcting codes.
For cyclic codes, encoding and syndrome calculations can be easily implemented using simple shift registers.
CYCLIC CODES contd
One of the important properties of code polynomials is that when x·c_i(x) is divided by x^n + 1, the remainder is c_i^(1)(x), the cyclic shift of c_i(x).
This property can be easily verified as follows.
Proof:
Consider a polynomial:

    c(x) = d(x) g(x)    ---- (A)

This is a polynomial of degree n − 1 or less. There are a total of 2^k such polynomials, corresponding to the 2^k data vectors.
Thus, we obtain a linear (n, k) code generated by (A).
Now, let us prove that the code generated this way is indeed cyclic.
Proof contd
EXERCISE-3
Ques. 1 Find a generator polynomial g(x) for a (7, 4) cyclic code, and find the code vectors for the following data vectors: 1010, 1111, 0001, and 1000.
Ans. 1 Now, in this case n = 7, n − k = 3, and:

    x^7 + 1 = (x + 1)(x^3 + x^2 + 1)(x^3 + x + 1)

The generator polynomial should be of order n − k = 3. Let us take:

    g(x) = x^3 + x^2 + 1

For d = [1 0 1 0]:

    d(x) = x^3 + x
Ans. 1 contd
SYSTEMATIC CYCLIC CODES
In the previous example, the first k digits were not necessarily the data digits. Therefore, it is not a systematic code.
In a systematic code, the first k digits are the data digits and the last n − k digits are the parity check digits.
In a systematic cyclic code, the code word polynomial is given as:

    c(x) = x^(n−k) d(x) + ρ(x)    ---- (B)

where ρ(x) is the remainder when x^(n−k) d(x) is divided by g(x).
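Equation (B) can be sketched with GF(2) polynomial division on bit lists; g(x) = x^3 + x^2 + 1 follows the earlier (7, 4) example:

```python
def poly_mod(dividend, divisor):
    """Remainder of GF(2) polynomial division; polynomials are bit lists, MSB first."""
    r = dividend[:]
    for i in range(len(r) - len(divisor) + 1):
        if r[i]:                                 # cancel the leading term
            for j, b in enumerate(divisor):
                r[i + j] ^= b
    return r[-(len(divisor) - 1):]               # remainder has degree < deg g(x)

def systematic_cyclic_encode(d, g):
    """c(x) = x^(n-k) d(x) + rho(x), with rho the remainder of x^(n-k) d(x) / g(x)."""
    shifted = d + [0] * (len(g) - 1)             # multiply d(x) by x^(n-k)
    rho = poly_mod(shifted, g)
    return d + rho                               # data digits first, then parity

g = [1, 1, 0, 1]                                 # g(x) = x^3 + x^2 + 1
print(systematic_cyclic_encode([1, 0, 1, 0], g)) # [1, 0, 1, 0, 0, 0, 1]
```

For d = 1010 this yields the code word 1010001, whose polynomial x^6 + x^4 + 1 is divisible by g(x), as required for a cyclic code word.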
Example:
Construct a systematic (7, 4) cyclic code using a generator polynomial
Example: Solution contd
Cyclic Code Generation
Coding and decoding of cyclic codes can be very easily implemented using shift registers and modulo-2 adders.
Systematic code generation involves division of x^(n−k) d(x) by g(x), which is implemented using a shift register with feedback connections according to the generator polynomial g(x).
An encoding circuit with n − k shift registers is shown as: