Maximum value of Entropy contd
Setting the derivative of H(m) with respect to each P_i to zero gives:

    log2 P_n − log2 P_i = 0,   i.e.,   P_i = P_n

Therefore, the maximum value of entropy occurs for equiprobable messages, i.e., when:

    P_1 = P_2 = P_3 = ... = P_i = ... = P_n = 1/n

The previous equation is true if P_i = 1/n for all i, and then:

    H(m)_max = Σ_{i=1}^{n} (1/n) log2 n = log2 n

Therefore, the maximum value of entropy = the minimum number of binary digits required to encode the messages.
The Intuitive (Common Sense) and Engineering
Interpretation of Entropy
From the engineering point of view, the information content of any message is equal to
the minimum number of digits required to encode the message. Therefore,
Entropy = H(m) = The average value of the minimum number of digits
required for encoding each message
Now, from the intuitive (common sense) point of view, information is considered synonymous with the uncertainty or the surprise element associated with a message: a message with a lower probability of occurrence has greater uncertainty and therefore conveys more information.
Therefore, if log(1/P_i) is a measure of the uncertainty (unexpectedness) of message m_i, then:

    H(m) = Σ_{i=1}^{n} P_i log (1/P_i)

= avg. uncertainty per message of the source that generates the messages.
If the source is not memoryless (i.e., a message emitted at any time is not independent of the previously emitted messages), then the source entropy will be less than H(m) above. This is because the dependence on previous messages reduces the uncertainty.
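As an illustrative sketch of the formula above (the probabilities used here are examples, not values from these notes), the entropy of a discrete memoryless source can be computed directly:

```python
import math

def entropy(probs, base=2):
    """Average uncertainty H(m) = sum of P_i * log(1/P_i) over all messages."""
    assert abs(sum(probs) - 1.0) < 1e-9, "probabilities must sum to 1"
    return sum(p * math.log(1.0 / p, base) for p in probs if p > 0)

# Equiprobable messages attain the maximum entropy log2(n):
print(entropy([0.25] * 4))            # 2.0 bits
# A skewed source carries less average information:
print(round(entropy([0.8, 0.2]), 2))  # 0.72 bits
```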
Source Encoding
We know that the minimum number of binary digits required to encode a message is equal to the source entropy H(m) if all the messages are equiprobable (each with probability P). It can be shown that this result is true even for non-equiprobable messages.
Let a source emit n messages m_1, m_2, ..., m_n with probabilities P_1, P_2, P_3, ..., P_n respectively. Consider a sequence of N messages with N → ∞. Let K_i be the number of times message m_i occurs in this sequence. Then, by the relative-frequency interpretation of probability:

    lim (N→∞) K_i / N = P_i

Thus, the message m_i occurs N·P_i times in the whole sequence of N messages.
Source Encoding contd
Now, consider a typical sequence S_N of N messages from the source.
Because the n messages (of probabilities P_1, P_2, P_3, ..., P_n) occur N·P_1, N·P_2, ..., N·P_n times respectively, and because each message is independent, the probability of occurrence of a typical sequence S_N is given by:

    P(S_N) = P_1^(N·P_1) · P_2^(N·P_2) · ... · P_n^(N·P_n)

Therefore, the number of binary digits required to encode such a sequence is:

    L_N = log2 [1 / P(S_N)] = N Σ_{i=1}^{n} P_i log2 (1/P_i)

    L_N = N·H(m) bits
Source Encoding contd
Therefore, the average number of binary digits required per message is given as:

    L = L_N / N = H(m) bits

This shows that we can encode a sequence of non-equiprobable messages by using, on average, H(m) bits per message.
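The limiting argument can be checked numerically. For the illustrative source below (not one from these notes), N is chosen so that each message occurs exactly N·P_i times, and log2[1/P(S_N)]/N then equals H(m) exactly:

```python
import math

probs = [0.5, 0.25, 0.125, 0.125]   # illustrative source; probabilities sum to 1
N = 8                               # chosen so that every N * P_i is an integer

# Probability of a typical sequence: product of P_i raised to the power N*P_i
p_seq = math.prod(p ** (N * p) for p in probs)
L_N = math.log2(1.0 / p_seq)        # digits needed for the whole sequence
H = sum(p * math.log2(1.0 / p) for p in probs)

print(L_N / N)   # 1.75 digits per message, i.e. exactly H(m)
print(H)         # 1.75
```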
Compact Codes
The source coding theorem states that in order to encode a source with entropy H(m), we need a minimum of H(m) binary digits per message, or H_r(m) r-ary digits per message.
Thus, the average wordlength of an optimum code is H(m), but to attain this length we have to encode a sequence of N messages (N → ∞) at a time.
However, if we wish to encode each message directly without using longer sequences, then the average length of the code word per message will be > H(m).
In practice, it is not desirable to use long sequences, as they cause transmission delay and add to the equipment complexity. Therefore, it is preferred to encode messages directly, even if a price has to be paid in terms of increased wordlength.
Huffman Code
Let us suppose that we are given a set of n messages. Then, to find the Huffman Code:
Step 1
All the messages are arranged in the order of decreasing probability
Step 2
The last two messages (messages with least probabilities) are then combined into
one message (i.e. their probabilities are added up)
Step 3
These messages are now again arranged in the decreasing order of probability
Step 4
The whole of the process is repeated until only two messages are left (in the case of
binary digits coding) or r messages are left ( in the case of r-ary digits coding).
Step 5
In the case of binary digits coding, the two (reduced) messages are assigned 0 and 1 as their first digits in the code sequence (and in the case of r-ary digits coding, the reduced messages are assigned 0, 1, ..., r−1).
Huffman Code contd
Step 6
We then go back and assign the digits 0 and 1 to the second digit for the two messages that were combined in the previous step. We keep regressing this way until the first column is reached.
Step 7
The optimum (Huffman) code obtained this way is called the Compact Code.
The average length of the compact code is given as:

    L = Σ_{i=1}^{n} P_i L_i

This is compared with the source entropy given by:

    H_r(m) = Σ_{i=1}^{n} P_i log_r (1/P_i)
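Steps 1–6 can be sketched with a priority queue; the message probabilities below are illustrative, not taken from the slides:

```python
import heapq

def huffman(probs):
    """Binary Huffman code: returns {message_index: codeword}."""
    # Heap entries are (probability, tiebreaker, node); a node is either an int
    # (a message index) or a pair of child nodes combined in Step 2.
    heap = [(p, i, i) for i, p in enumerate(probs)]
    heapq.heapify(heap)
    count = len(heap)
    while len(heap) > 1:
        p1, _, n1 = heapq.heappop(heap)       # the two least-probable messages...
        p2, _, n2 = heapq.heappop(heap)
        heapq.heappush(heap, (p1 + p2, count, (n1, n2)))  # ...combined (Step 2)
        count += 1
    codes = {}
    def assign(node, prefix):                 # Steps 5-6: walk back assigning 0/1
        if isinstance(node, tuple):
            assign(node[0], prefix + "0")
            assign(node[1], prefix + "1")
        else:
            codes[node] = prefix or "0"
    assign(heap[0][2], "")
    return codes

probs = [0.4, 0.19, 0.16, 0.15, 0.1]          # illustrative probabilities
codes = huffman(probs)
L_avg = sum(probs[i] * len(c) for i, c in codes.items())
print(codes)
print(round(L_avg, 2))                        # average length 2.2 binary digits
```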
EXERCISE-1
Ans. 1 contd
The source entropy, code efficiency and redundancy are:

    H(m) = Σ_{i=1}^{n} P_i log2 (1/P_i)
         = 0.521089678 + 0.5 + 0.4105448 + 0.3670672 + 0.3321928 + 0.2915085
         ≈ 2.42 bits

Code Efficiency: η = H(m)/L = 0.976
Redundancy: γ = 1 − η = 0.024
EXERCISE-1 contd
Ques. 2 A zero-memory source emits six messages as shown. Find the 4-ary (quaternary) Huffman Code. Determine its avg. wordlength, the efficiency and the redundancy.
Messages    Probabilities
m1          0.30
m2          0.15
m3          0.25
m4          0.08
m5          0.10
m6          0.12
EXERCISE-1 contd
Ans. 2 The optimum Huffman code is obtained as follows:

Messages    Probabilities    Code    S1
m1          0.30             0       0.30   0
m2          0.25             2       0.30   1
m3          0.15             3       0.25   2
m4          0.12             10      0.15   3
m5          0.10             11
m6          0.08             12
m7          0.00             13

(m7 is a dummy message of zero probability, added so that the number of messages satisfies n ≡ 1 (mod r − 1), as required for r-ary Huffman coding with r = 4.)
Ans. 2 contd
The average length of the compact code is given by:

    L = Σ_{i=1}^{n} P_i L_i
      = (0.30 × 1) + (0.25 × 1) + (0.15 × 1) + (0.12 × 2) + (0.10 × 2) + (0.08 × 2)
      = 1.3 4-ary digits

The entropy H(m) of the source is given by:

    H_4(m) = Σ_{i=1}^{n} P_i log_4 (1/P_i) ≈ 1.211 4-ary digits

Code Efficiency: η = H_4(m)/L = 0.93
Redundancy: γ = 1 − η = 0.07
EXERCISE-1 contd
Ques. 3 A zero-memory source emits messages m1 and m2 with probabilities 0.8 and 0.2 respectively. Find the optimum binary code for the source as well as for its second- and third-order extensions (i.e., N = 2 and 3).
(Ans. For N = 1: H(m) = 0.72; η = 0.72
For N = 2: L = 1.56; η = 0.923
For N = 3: L = 2.184; η = 0.989)
Shannon Fano Code
An efficient code can be generated by the following simple procedure known as the
Shannon-Fano algorithm:
Step 1
List the source symbols in order of decreasing probability
Step 2
Partition the set into two sets that are as close to equiprobable as possible
Assign 0 to the upper set and 1 to the lower set.
Step 3
Continue this process, each time partitioning the sets with as nearly equal
probabilities as possible until further partitioning is not possible.
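Steps 1–3 can be sketched recursively; the best split point is found by scanning, and the message set below reuses the probabilities of a later example (0.40, 0.19, 0.16, 0.15, 0.10) purely for illustration:

```python
def shannon_fano(symbols):
    """symbols: list of (name, probability), pre-sorted by decreasing probability.
    Returns {name: codeword}."""
    if len(symbols) == 1:
        return {symbols[0][0]: ""}
    total = sum(p for _, p in symbols)
    # Step 2: find the partition point making the two sets closest to equiprobable
    best_i, best_diff, running = 1, float("inf"), 0.0
    for i in range(1, len(symbols)):
        running += symbols[i - 1][1]
        diff = abs(running - (total - running))
        if diff < best_diff:
            best_i, best_diff = i, diff
    codes = {}
    for name, code in shannon_fano(symbols[:best_i]).items():
        codes[name] = "0" + code          # upper set gets 0
    for name, code in shannon_fano(symbols[best_i:]).items():
        codes[name] = "1" + code          # lower set gets 1
    return codes

msgs = [("x1", 0.40), ("x2", 0.19), ("x3", 0.16), ("x4", 0.15), ("x5", 0.10)]
codes = shannon_fano(msgs)
print(codes)   # average length works out to 2.25 binary digits
```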
EXERCISE-2
Ques. 1 Obtain the Shannon-Fano Code for the following given set of messages x_i, with probabilities P(x_i), from a memoryless source:

Messages    Probabilities
X1          0.30
X2          0.25
X3          0.20
X4          0.12
X5          0.08
X6          0.05
EXERCISE-2 contd
Ans. 1
The Shannon-Fano coding proceeds as:

Messages    Probabilities    Code
X1          0.30             00
X2          0.25             01
X3          0.20             10
X4          0.12             110
X5          0.08             1110
X6          0.05             1111

Ans. 1 contd
The average wordlength of the code word is:

    L = (2 × 0.30) + (2 × 0.25) + (2 × 0.20) + (3 × 0.12) + (4 × 0.08) + (4 × 0.05)
      = 2.38 bits

    H(m) = −(0.30 log2 0.30 + 0.25 log2 0.25 + 0.20 log2 0.20
             + 0.12 log2 0.12 + 0.08 log2 0.08 + 0.05 log2 0.05)
         ≈ 2.36 bits

Code Efficiency: η = H(m)/L ≈ 0.99
Redundancy: γ = 1 − η ≈ 0.01
Some more examples
Ques. 2 A memoryless source emits messages x1, x2, x3, x4 and x5 with P(x1) = 0.4, P(x2) = 0.19, P(x3) = 0.16, P(x4) = 0.15 and P(x5) = 0.1.
(i) Construct a Shannon-Fano Code for X and calculate the efficiency of the code.
(ii) Repeat for the Huffman Code and compare the results.
Ans. 2
The Shannon-Fano Code can be obtained as:

Messages    Probabilities    Code
x1          0.40             00
x2          0.19             01
x3          0.16             10
x4          0.15             110
x5          0.10             111
Ans. 2 contd
The average wordlength of the code word is:

    L = 2 × (0.40 + 0.19 + 0.16) + 3 × (0.15 + 0.10)
      = 2.25 bits

    H(m) = −(0.40 log2 0.40 + 0.19 log2 0.19 + 0.16 log2 0.16
             + 0.15 log2 0.15 + 0.10 log2 0.10)
         = 2.1497523 bits

Code Efficiency: η = H(m)/L = 0.9554454
Redundancy: γ = 1 − η = 0.0445546
Ans. 2 contd
Now, let us generate the Huffman Code and compare the results. The Huffman code is generated as:

Messages    Probabilities    Code    S1            S2            S3
m1          0.40             1       0.40   1      0.40   1      0.60   0
m2          0.19             000     0.25   01     0.35   00     0.40   1
m3          0.16             001     0.19   000    0.25   01
m4          0.15             010     0.16   001
m5          0.10             011
Ans. 2 contd
The average length of the compact code is given by:

    L = Σ_{i=1}^{n} P_i L_i = 0.40 × 1 + (0.19 + 0.16 + 0.15 + 0.10) × 3
      = 2.2 bits

Code Efficiency: η = H(m)/L = 0.9771601

Therefore, we find that the efficiency of the Huffman Code is better than that of the Shannon-Fano Code.
Some more examples contd
Ques. 3 Construct Shannon-Fano Code and Huffman Code for five
equiprobable messages emitted from a memory-less source with
probability P=0.2
Ans. 3
The Shannon-Fano coding proceeds as:

Messages    Probabilities    Code
m1          0.2              00
m2          0.2              01
m3          0.2              10
m4          0.2              110
m5          0.2              111

Ans. 3 contd
The average wordlength of the code word is:

    L = 3 × (0.2 × 2) + 2 × (0.2 × 3) = 2.4 bits

    H(m) = −5 × (0.2 log2 0.2) = 2.3219281 bits

Code Efficiency: η = H(m)/L = 0.96747
Redundancy: γ = 1 − η = 0.03253
Ans. 3 contd
Now, the Huffman code is generated as:

Messages    Probabilities    Code    S1            S2            S3
m1          0.20             01      0.40   1      0.40   1      0.60   0
m2          0.20             000     0.20   01     0.40   00     0.40   1
m3          0.20             001     0.20   000    0.20   01
m4          0.20             10      0.20   001
m5          0.20             11
Ans. 3 contd
The average length of the compact code is given by:

    L = Σ_{i=1}^{n} P_i L_i = 3 × (0.2 × 2) + 2 × (0.2 × 3)
      = 2.4 bits

Code Efficiency: η = H(m)/L = 0.96747

The efficiency is the same for both the codes.
Construct Shannon-Fano Code and Huffman
Code for the following:
Messages    Probabilities
m1          0.50
m2          0.25
m3          0.125
m4          0.125

Ans: For Shannon-Fano Code: H(m) = 1.75; L = 1.75; η = 1
Messages    Probabilities
m1          1/2
m2          1/4
m3          1/8
m4          1/16
m5          1/32
m6          1/32

Ans: For Shannon-Fano Code: H(m) = 1.9375; L = 1.9375; η = 1
Messages    Probabilities
m1          1/3
m2          1/3
m3          1/9
m4          1/9
m5          1/9

Ans: For Shannon-Fano Code (in ternary digits): H(m) = 4/3; L = 4/3; η = 1
CHANNEL CAPACITY OF A DISCRETE MEMORYLESS
CHANNEL
Let a source emit symbols x1, x2, ..., xr. The receiver receives symbols y1, y2, ..., ys.
The set of received symbols may or may not be identical to the transmitted set.
If the channel is noiseless:
The reception of some symbol y_j uniquely determines the message transmitted.
Because of noise, however, there is a certain amount of uncertainty regarding the transmitted symbol when y_j is received.
If P(x_i | y_j) represents the conditional probability that x_i was transmitted when y_j is received, then there is an uncertainty of log[1/P(x_i | y_j)] about x_i when y_j is received.
Thus, the average loss of information over the transmitted symbol when y_j is received is given as:

    H(x | y_j) = Σ_i P(x_i | y_j) log [1 / P(x_i | y_j)]   bits per symbol
Contd
When this uncertainty is averaged over all x_i and y_j, we obtain the average uncertainty about a transmitted symbol when a symbol is received, denoted H(x|y):

    H(x|y) = Σ_j P(y_j) H(x | y_j)   bits per symbol

    H(x|y) = Σ_i Σ_j P(x_i, y_j) log [1 / P(x_i | y_j)]   bits per symbol

This uncertainty is caused by channel noise. Hence, it is the average loss of information about a transmitted symbol when a symbol is received. Therefore, H(x|y) is also called the equivocation of x with respect to y.
Contd
If the channel is noiseless:

    H(x|y) = 0

P(y_j | x_i) = probability that y_j is received when x_i is transmitted.
This is a characteristic of the channel and the receiver. Thus, a given channel (with its receiver) is specified by the channel matrix [P(y_j | x_i)].
Contd
We can obtain the reverse conditional probabilities P(x_i | y_j) using Bayes' Rule:

    P(x_i | y_j) = P(y_j | x_i) P(x_i) / P(y_j)
DISCRETE MEMORYLESS CHANNELS
Channel Representation
Contd
The figure shows a Discrete Memoryless Channel
It is a statistical model with an input X and output Y
During each unit of time (signaling interval), the channel accepts an input
symbol from X, and in response it generates an output symbol from Y.
The channel is discrete when the alphabets of both X and Y are finite
It is memoryless when the current output depends on only the current input and
not on any of the previous inputs.
The Channel Matrix
As discussed earlier, a channel is completely specified by the complete set of transition probabilities. Accordingly, the DMC shown above is specified by a matrix of transition probabilities [P(Y|X)], whose element in row i, column j is P(y_j | x_i).
Since each input to the channel results in some output, each row of the channel matrix must add up to unity, i.e.:

    Σ_j P(y_j | x_i) = 1   for all i
Contd
If the input probabilities P(X) are represented by the row matrix:

    [P(X)] = [ P(x1)  P(x2)  ...  P(xm) ]

then the output probabilities P(Y) are represented by the row matrix:

    [P(Y)] = [ P(y1)  P(y2)  ...  P(yn) ] = [P(X)] [P(Y|X)]

Contd
If P(X) is represented as a diagonal matrix [P(X)]_d, then:

    [P(X, Y)] = [P(X)]_d [P(Y|X)] = Joint Probability Matrix

The element P(x_i, y_j) is the joint probability of transmitting x_i and receiving y_j.
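These matrix relations can be checked numerically with plain Python lists; the 2-input, 2-output channel below is an assumed example, not the one from the exercises:

```python
def mat_mul(row, matrix):
    """Multiply a probability row vector by a channel matrix."""
    return [sum(row[i] * matrix[i][j] for i in range(len(row)))
            for j in range(len(matrix[0]))]

# Assumed channel: rows are P(y_j | x_i) and each row sums to 1
P_Y_given_X = [[0.9, 0.1],
               [0.2, 0.8]]
P_X = [0.5, 0.5]

P_Y = mat_mul(P_X, P_Y_given_X)        # [P(Y)] = [P(X)][P(Y|X)]
# Joint probability matrix: P(x_i, y_j) = P(x_i) * P(y_j | x_i)
P_XY = [[P_X[i] * P_Y_given_X[i][j] for j in range(2)] for i in range(2)]

print(P_Y)    # ≈ [0.55, 0.45]
print(P_XY)   # ≈ [[0.45, 0.05], [0.1, 0.4]]
```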
SPECIAL CHANNELS
Lossless Channel
A channel described by a channel matrix with only one non-zero element in each
column is called a lossless channel
A lossless channel is represented as:
Therefore, in a lossless channel, no source information is lost in transmission.
Deterministic Channel
A channel described by a channel matrix with only one non-zero element in each row is
called a deterministic channel
A deterministic channel is represented as:
Since each row has only one non-zero element, therefore, this element must be unity.
Thus, when a given symbol is sent in a deterministic channel, it is clear which output
symbol will be received.
Noiseless Channel
A channel is called noiseless if it is both lossless and deterministic.
A noiseless channel is represented as:
Therefore, the channel matrix has only one element in each row and in each column, and this element is unity. Also, the number of input symbols equals the number of output symbols, i.e., m = n for a noiseless channel.
Binary Symmetric Channel (BSC)
A binary symmetric channel is represented as:
This channel has two inputs (x1 = 0, x2 = 1) and two outputs (y1 = 0, y2 = 1).
It is a symmetric channel because the probability of receiving a 1 if a 0 is sent is the same as the probability of receiving a 0 if a 1 is sent.
This common transition probability is denoted by p.
EXERCISE-3
Ques. 1 Consider a binary channel as shown:
Find the channel matrix of the channel
Find P(y1) and P(y2) when P(x1)= P(x2)=0.5
Find the Joint Probabilities P(x1, y2), P(x2, y1)
Ans. 1
The Channel Matrix can be given as:
The Output Probability Matrix is given as:
Now, the Joint Probability Matrix is given as:
EXERCISE-3 contd
Ques. 2 Two binary channels discussed in the previous question are
connected in cascade as shown:
Find the overall channel matrix of the resultant channel, and draw the
equivalent channel diagram
Find P(z1) and P(z2) when P(x1)= P(x2)=0.5
Ans. 2
We have
The resultant equivalent channel diagram is shown as:
Ans. 2 contd
EXERCISE-3 contd
Ques. 3 A channel has the following channel matrix:
Draw the Channel diagram
If the source has equally likely outputs, compute the probabilities
associated with the channel outputs for p = 0.2
Ans. 3
Shown above is a Binary Erasure Channel with two inputs, x1 = 0 and x2 = 1, and three outputs, y1 = 0, y2 = e and y3 = 1, where e indicates an erasure, meaning that the output is in doubt and hence should be erased.
The output probabilities for the above channel at p = 0.2, with equally likely inputs, are:

    [P(Y)] = [P(X)] [P(Y|X)] = [0.5  0.5] × [[0.8  0.2  0], [0  0.2  0.8]] = [0.4  0.2  0.4]
ERROR-FREE COMMUNICATION OVER A NOISY CHANNEL
We know that messages from a source with entropy H(m) can be encoded by using
an average of H(m) digits per message. This encoding has, however, zero
redundancy.
Hence, if we transmit these coded messages over a noisy channel, some of the
information will be received erroneously. Therefore, we cannot have error-free
communication over a noisy channel when the messages are encoded with zero
redundancy. Redundancy in general helps combat noise.
A simple example of the use of redundancy is the Single Parity Check Code in
which an extra binary digit is added to each code word to ensure that the total
number of 1s in the resulting codeword is always even (or odd). If a single error
occurs in the received code-word, the parity is violated, and the receiver requests
retransmission.
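The single parity check code described above can be sketched as follows (even parity; the data word is an arbitrary example):

```python
def add_even_parity(bits):
    """Append one check bit so the total number of 1s is even."""
    return bits + [sum(bits) % 2]

def parity_ok(codeword):
    """Receiver side: even parity must hold, else request retransmission."""
    return sum(codeword) % 2 == 0

word = [1, 0, 1, 1]
sent = add_even_parity(word)       # [1, 0, 1, 1, 1] -> four 1s, parity even
print(parity_ok(sent))             # True

corrupted = sent.copy()
corrupted[2] ^= 1                  # a single bit error violates the parity
print(parity_ok(corrupted))        # False
```

Note that a single error is only detected, not located, which is why the receiver must request retransmission.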
Error-Correcting Codes
The two important types of error-correcting codes are Block Codes and Convolutional Codes.
Block and Convolutional codes
Block Codes
In block codes, a block of k data
digits is encoded by a code word of
n digits (n>k), i.e., for each
sequence of k data digits, there is a
distinct code word of n digits
In block codes, k data digits are
accumulated and then encoded into
a n-digit code word.
If k data digits are transmitted by a code word of n digits, the number of check digits is m = n − k.
The Code Efficiency (also known as the code rate) = k/n.
Such a code is called an (n, k) code.
Convolutional Codes
In Convolutional or recurrent codes, the coded sequence of n digits depends not only on the k data digits but also on the previous N−1 data digits (N > 1). Hence, the coded sequence for a certain k data digits is not unique but depends on the N−1 earlier data digits.
In Convolutional codes, the coding is done on a continuous, or running, basis.
LINEAR BLOCK CODES
A code word comprising n digits and a data word comprising k digits can be represented by row matrices as:

    c = (c_1, c_2, c_3, ..., c_n)
    d = (d_1, d_2, d_3, ..., d_k)

Generally, in linear block codes, all the n digits of c are formed by linear combinations (modulo-2 additions) of the k data digits.
A special case where c_1 = d_1, c_2 = d_2, ..., c_k = d_k, and the remaining digits c_{k+1} to c_n are linear combinations of d_1, d_2, d_3, ..., d_k, is known as a systematic code.
LINEAR BLOCK CODES contd
For linear block codes:
Minimum distance between code words: D_min
Number of errors that can be detected: D_min − 1
Number of errors that can be corrected:
    (D_min − 1)/2   if D_min is odd
    (D_min − 2)/2   if D_min is even
In a systematic code, the first k digits of a code word are the data digits and the last m = n − k digits are the parity-check digits, formed by linear combinations of the data digits d_1, d_2, d_3, ..., d_k.
Example 1:
For a (6, 3) code, the generator matrix is
For all eight possible data words find the corresponding code words, and
verify that this code is a single-error correcting code.
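The generator matrix itself did not survive in these notes; the G below is a hypothetical [I | P] choice that is consistent with the code words quoted later in the decoding discussion (d = 100 → c = 100101, d = 110 → c = 110110), and encoding c = dG can be sketched as:

```python
# Hypothetical generator matrix for a (6, 3) systematic code: G = [I3 | P],
# chosen to match the code words quoted in the decoding discussion.
G = [
    [1, 0, 0, 1, 0, 1],
    [0, 1, 0, 0, 1, 1],
    [0, 0, 1, 1, 1, 0],
]

def encode(d, G):
    """c = d G with modulo-2 arithmetic; the first k digits of c equal d."""
    n = len(G[0])
    return [sum(d[i] * G[i][j] for i in range(len(d))) % 2 for j in range(n)]

for d in [(1, 0, 0), (1, 1, 0), (1, 0, 1)]:
    print(d, "->", encode(list(d), G))
```

For this G, the minimum distance over all nonzero code words is 3, so the code can correct a single error.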
Solution
Solution contd: Decoding
Since the modulo-2 sum of any sequence with itself is zero, we get, for every valid code word c:

    c H^T = 0

Decoding
Decoding contd
But because of possible channel errors, r H^T is in general a non-zero row vector s, called the Syndrome:

    s = r H^T = (c ⊕ e) H^T = e H^T

Therefore, from the received word r we can get s and hence the error word e_i. But this procedure does not give a unique solution, because r can be expressed in terms of code words other than c_i.
Decoding contd
Since for k-dimensional data words there are 2^k code words, the equation

    s = e H^T

is satisfied by 2^k error vectors.
For e.g.:
If d = 100, the corresponding code word is c = 100101; if an error occurred in the third digit, then r = 101101 and e = 001000.
But for c = 101011 and e = 000110, the received word would also be r = 101101.
Similarly, for c = 110110 and e = 011011, again the received word would be r = 101101.
Therefore, in the case of 3-bit data, there are 8 possible data words, 8 corresponding code words, and hence 8 possible error vectors for each received word.
Maximum-likelihood Rule
If we receive r, then we decide in favour of that c for which r is most likely to have been received, i.e., the c corresponding to the e that represents the minimum number of bit errors.
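The maximum-likelihood rule can be sketched by brute force over all 2^k code words, reusing the same hypothetical [I | P] generator consistent with the quoted code words (d = 100 → c = 100101):

```python
from itertools import product

G = [[1, 0, 0, 1, 0, 1],
     [0, 1, 0, 0, 1, 1],
     [0, 0, 1, 1, 1, 0]]           # hypothetical (6, 3) generator, G = [I | P]

def encode(d):
    return [sum(d[i] * G[i][j] for i in range(3)) % 2 for j in range(6)]

def ml_decode(r):
    """Pick the data word whose code word differs from r in the fewest bits."""
    best_d, best_w = None, len(r) + 1
    for d in product([0, 1], repeat=3):
        c = encode(list(d))
        w = sum(ri ^ ci for ri, ci in zip(r, c))   # weight of e = r XOR c
        if w < best_w:
            best_d, best_w = d, w
    return best_d, best_w

r = [1, 0, 1, 1, 0, 1]             # c = 100101 with an error in the third digit
print(ml_decode(r))                # ((1, 0, 0), 1)
```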
Example-2
A (6, 3) code is generated according to the generating matrix in the
previous example. The receiver receives r = 100011. Determine the corresponding
data word d.
Solution
Solution contd
CYCLIC CODES
Cyclic codes are a subclass of linear block codes.
In linear block codes, the procedure for selecting a generator matrix is relatively easy for single-error correcting codes. However, it cannot carry us very far in constructing higher-order error-correcting codes. Cyclic codes have a fair amount of mathematical structure that permits the design of higher-order error-correcting codes.
For cyclic codes, encoding and syndrome calculations can be easily implemented using simple shift registers.
CYCLIC CODES contd
One of the important properties of code polynomials is that when x·c_i(x) is divided by x^n + 1, the remainder is c_i^(1)(x), the cyclic shift of c_i(x).
This property can be easily verified as follows.
Proof:
Consider a polynomial:

    c(x) = d(x) g(x)    ---- (A)

This is a polynomial of degree n − 1 or less. There are a total of 2^k such polynomials, corresponding to the 2^k data vectors.
Thus, we obtain a linear (n, k) code generated by (A).
Now, let us prove that the code generated this way is indeed cyclic.
Proof contd
EXERCISE-3
Ques. 1 Find a generator polynomial g(x) for a (7, 4) cyclic code, and find the code vectors for the following data vectors: 1010, 1111, 0001, and 1000.
Ans. 1 Now, in this case n = 7, n − k = 3, and:

    x^7 + 1 = (x + 1)(x^3 + x^2 + 1)(x^3 + x + 1)

The generator polynomial should be of order n − k = 3. Let us take:

    g(x) = x^3 + x^2 + 1

For d = [1 0 1 0]:

    d(x) = x^3 + x
Ans. 1 contd
SYSTEMATIC CYCLIC CODES
In the previous example, the first k digits were not necessarily the data digits. Therefore, it is not a systematic code.
In a systematic code, the first k digits are the data digits and the last n − k digits are the parity check digits.
In a systematic cyclic code, the code word polynomial is given as:

    c(x) = x^(n−k) d(x) + ρ(x)    ---- (B)

where ρ(x) is the remainder when x^(n−k) d(x) is divided by g(x).
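Equation (B) can be sketched with GF(2) polynomial division on bit lists; g(x) = x^3 + x^2 + 1 follows the earlier (7, 4) example:

```python
def poly_mod(dividend, divisor):
    """Remainder of GF(2) polynomial division; polynomials are bit lists, MSB first."""
    r = dividend[:]
    for i in range(len(r) - len(divisor) + 1):
        if r[i]:                                 # cancel the leading term
            for j, b in enumerate(divisor):
                r[i + j] ^= b
    return r[-(len(divisor) - 1):]               # remainder has degree < deg g(x)

def systematic_cyclic_encode(d, g):
    """c(x) = x^(n-k) d(x) + rho(x), with rho the remainder of x^(n-k) d(x) / g(x)."""
    shifted = d + [0] * (len(g) - 1)             # multiply d(x) by x^(n-k)
    rho = poly_mod(shifted, g)
    return d + rho                               # data digits first, then parity

g = [1, 1, 0, 1]                                 # g(x) = x^3 + x^2 + 1
print(systematic_cyclic_encode([1, 0, 1, 0], g)) # [1, 0, 1, 0, 0, 0, 1]
```

For d = 1010 this yields the code word 1010001, whose polynomial x^6 + x^4 + 1 is divisible by g(x), as required for a cyclic code word.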
Example:
Construct a systematic (7, 4) cyclic code using a generator polynomial
Example: Solution contd
Cyclic Code Generation
Coding and decoding of cyclic codes can be very easily implemented using shift registers and modulo-2 adders.
Systematic code generation involves division of x^(n−k) d(x) by g(x), which is implemented using a shift register with feedback connections according to the generator polynomial g(x).
An encoding circuit with n − k shift registers is shown as: