Вы находитесь на странице: 1из 13

ETN2126 INFORMATION THEORY AND ERROR CODING

FACULTY OF ENGINEERING

LAB SHEET

INFORMATION THEORY AND ERROR CODING

ETN2126

TRIMESTER 2 (2017/2018)

Experiment 1: IT1 – Huffman Coding

Note: Students are advised to read through this lab sheet before doing experiment. On-
the-spot evaluation may be carried out during or at the end of the experiment. Your
performance, teamwork effort, and learning attitude will count towards the marks.

0
IT1: HUFFMAN CODING
ETN2126 INFORMATION THEORY AND ERROR CODING

Multimedia University
Experiment 1: Huffman Coding
1. OBJECTIVE
 To apply source coding techniques using Huffman Codes.
 To design and generate Huffman Codes with MATLAB software.

2. SOFTWARE REQUIRED
 MATLAB

3. INTRODUCTION TO MATLAB

3.1 MATLAB
MATLAB is a powerful collection of tools for algorithm development, computation and
visualization. It provides more control and flexibility compared to a traditional high-
level programming language. Unlike such languages, MATLAB is compact and easy to
learn, letting you express algorithms in concise, readable code. In addition, MATLAB
provides an extensive set of ready-to-use functions including mathematical and matrix
operations, graphics, color and sound control, and low-level file I/O. MATLAB is readily
extendable - you can use the MATLAB language to easily create functions that operate as
part of the MATLAB environment.

The MATLAB User's Guide describes how to work with the MATLAB language. It
discusses how to enter and manipulate data and use MATLAB's extensive collection of
functions. It explains how to perform command-line computations and how to create
your own functions and scripts. The MATLAB Reference Guide provides reference
descriptions of supplied MATLAB functions and commands.

3.2 SIMULINK
SIMULINK is a dynamic system simulation environment. It allows you to represent
systems as block diagrams, which you build using your mouse to connect blocks and
your keyboard to edit block parameters.

The SIMULINK User's Guide describes how to work with SIMULINK. It also provides
reference descriptions of each block in the standard SIMULINK libraries.

3.3 Communications Toolbox


The Communications Toolbox is a collection of computation functions and simulation
blocks for research, development, system design, analysis, and simulation in the
communications area. The toolbox is designed to be suitable for both experts and
beginners in the communications field. The toolbox contains ready-to-use functions and
blocks, which you can easily modify to implement your own schemes, methods, and
algorithms. The Communications Toolbox is designed for use with MATLAB and

1
IT1: HUFFMAN CODING
ETN2126 INFORMATION THEORY AND ERROR CODING

SIMULINK. The Communications Toolbox includes:


 SIMULINK blocks
 MATLAB functions

You can use the toolbox directly from the MATLAB workspace. You can use the
SIMULINK environment to construct a simulation block diagram for your
communication system. For convenience, the Communications Toolbox provides online
interactive examples that use any of the function found in this toolbox.

3.4 Available Functions


This toolbox supports a variety of functions in the communications area. These functions
include:
 Data source
 Source coding/ decoding
 Error-control coding
 Modulation/ demodulation
 Transmission/ reception filters
 Transmitting channel
 Multiple access
 Synchronization
 Utilities

4. MATLAB FUNDAMENTALS
In this section, we will illustrate some commands used for this experiment. (For additional
information on the individual MATLAB commands, we refer you to the MATLAB User's Guide or
the online help facility.)

4.1 Fundamental Expressions


Working in the MATLAB environment is straightforward because most commands are
entered as you would write them mathematically. For example, entering the following
simple expression

>>a = 4/3

yields the MATLAB response

a=
1.3333

In general, it is better to use appropriate and memorable names for variables. MATLAB
recognizes the first 19 characters of a variable name but the first character in a variable
name be a letter. MATLAB is case sensitive and all MATLAB commands are written in
lowercase letters.

2
IT1: HUFFMAN CODING
ETN2126 INFORMATION THEORY AND ERROR CODING

If you prefer to create a new variable but do not wish to see the MATLAB response, type
a semicolon at the end of the expression. For example,

>>b = 4+7;

will create a new variable b whose value is 11, but MATLAB will not display the value
of b.

4.2 Creating Script Files


It is much more convenient to use script files than to enter commands line by line at the
MATLAB prompt. A script file is an ASCII file (regular text file) that contains a series of
MATLAB commands written just as you would enter them in the MATLAB environment.
Statements that begin with a '%' are considered to be comments and are ignored by
MATLAB. The script file is created outside of MATLAB with any available text editor
or word processor. Each script file should have a name that ends in ".m". The commands
in the script file are executed in the MATLAB environment simply by entering the name
of the script file without the ".m". For example, suppose the text file magphase.m
contains the following statements:

% magphase.m: example m-file to


% compute the magnitude and phase of G at  = 1.
 =1;
G =l/(j* + 2);
mag =abs(G)
phase =atan(imag(G)/real(G))

Then type magphase at the MATLAB prompt will yield the following MATLAB
response:

mag=
0.4472

phase=
-0.4636

which are the magnitude and phase of the transfer function G(j) = l/(j + 2) evaluated at
 = 1. By entering help magphase will give you back the text in the comment lines at
the beginning of the file:

magphase.m: example m-file to


compute the magnitude and phase of G at =1.

It should be obvious that it is very desirable to include at least some brief comments as a
header to each m-file.

3
IT1: HUFFMAN CODING
ETN2126 INFORMATION THEORY AND ERROR CODING

4.3 Matrices
Matrices are entered into MATLAB by listing the elements of the matrix and enclosing
them within a pair of brackets. Commas or blanks separate elements of a single row, and
semicolons or carriage returns separate rows. For example,

>>A = [1 2; 3 4]

yields the MATLAB response

A=
1 2
3 4

To get the dimensions of a matrix, we can use the size command, e.g.,

>>[K,N] = size(A)
K=
2
N=
2

Special vectors can be created using the ':' operator. For example, the command

>>k= 1:10

will generate a row vector with elements from 1 to 10 with increment 1.

Using the simple commands described thus far, you can easily manipulate both matrices
and vectors. For example, to add a row onto matrix A we type,

>>A = [A; [7 8]]


A=
1 2
3 4
7 8

To extract the submatrix of A that consists of the first through second columns of the
second through third rows, use vectors as indices as follows:

>>B = A(2:3,1:2)
B=
3 4
7 8

MATLAB has commands for generating special matrices. For example, you can create a
n x n identity matrix with the eye(n). The zeros, ones and rand commands work

4
IT1: HUFFMAN CODING
ETN2126 INFORMATION THEORY AND ERROR CODING

similarly to eye and make matrices with elements equal to zero, elements equal to one,
and random elements, respectively. These commands can also be used to create non-
square matrices. For example, zeros(2,4) generates a 2 x 4 matrix of zeros.

Finally, A = A' command will give the transpose of the matrix A.

4.4 Additional Commands

A) HUFFMANDICT
DICT = HUFFMANDICT(SYM, PROB)
Generates a binary Huffman code dictionary using the maximum variance algorithm for
the distinct symbols given by the SYM vector. The symbols can be represented as a
numeric vector or single-dimensional alphanumeric cell array. The second input, PROB,
represents the probability of occurrence for each of these symbols. SYM and PROB must
be of same length.

DICT = HUFFMANDICT(SYM, PROB, N)


Generates an N-ary Huffman code dictionary using the maximum variance algorithm. N
is an integer between 2 and 10 (inclusive) that must not exceed the number of source
symbols whose probabilities appear in PROB.

DICT = HUFFMANDICT(SYM, PROB, N, VARIANCE)


Generates an N-ary Huffman code with the specified variance. The possible values for
VARIANCE are 'min' and 'max'.

[DICT, AVGLEN] = HUFFMANDICT(...)


Returns the average codeword length of the Huffman code.

B) HUFFMANENCO
ENCO = HUFFMANENCO(SIG, DICT)
Encodes the input signal, SIG, based on the code dictionary, DICT. The code dictionary is
generated using the HUFFMANDICT function. Each of the symbols appearing in SIG
must be present in the code dictionary, DICT. The SIG input can be a numeric vector or a
single-dimensional cell array containing alphanumeric values.

C) HUFFMANDECO
HUFFMANDECO(COMP, DICT)
Decodes the numeric Huffman code vector COMP using the code dictionary, DICT. The
encoded signal is generated by the HUFFMANENCO function. The code dictionary can
be generated using the HUFFMANDICT function. The decoded signal will be a numeric
vector if the original signals are only numeric. If any signal value in DICT is
alphanumeric, then the decoded signal will be represented as a single-dimensional cell

5
IT1: HUFFMAN CODING
ETN2126 INFORMATION THEORY AND ERROR CODING

array.

5. THEORY

5.1 Source Coding


Source encoding (or source coding) is a process by the help of which the data generated
by a discrete source is efficiently represented. The device that performs the representation
is called a source encoder.

For a source encoder to be efficient, we require knowledge of the statistics of the source.
In particular, if some source symbols are known to be more probable than the others, then
we may exploit this feature in the generation of a source code by assigning short code
words to frequent source symbols, and long code words to rare source symbols. We refer
to such a source code as a variable-length code.

An efficient source encoder should satisfy the following two functional requirements:
1. The code words produced by the encoder are in binary form.
2. The source code is uniquely decodable, so that the original source sequence can be
reconstructed perfectly from the encoded binary sequence.

Consider a discrete memoryless source whose output sk is converted by the source


encoder into a block of 0s and 1s, denoted by bk. We assume that the source has an
alphabet with K different symbols, and that the kth symbol sk occurs with probability pk,
k = 0, 1, …, K-1. Let the binary code word assigned to sk by the encoder have a length lk,
measured in bits. We define the average code word length,L, of the source encoder as

L =  pklk k = 0, 1, …, K-1

L represents the average number of bits per source symbol used in the encoding process.

Let Lmin denote the minimum possible value of L. We then define the coding efficiency
of the source encoder as

 = Lmin /L

With L  Lmin, we clearly have   1. The source encoder is said to be efficient when 
approaches unity.

5.2 Source Coding Theorem


Given a discrete memoryless source of entropy H(S), the average code-word length L
for any distortionless source encoding is bounded as

L  H(S)

This is also known as Noiseless coding theorem or Shannon’s first theorem.

6
IT1: HUFFMAN CODING
ETN2126 INFORMATION THEORY AND ERROR CODING

Accordingly, the entropy H(S) represents a fundamental limit on the average number of
bits per source symbol necessary to represent a discrete memoryless source. In other
words, L cannot be made smaller than H(S).

Thus with Lmin = H(S), we may rewrite the efficiency of a source encoder in terms of the
entropy H(S) as

 = H(S) /L

5.3 Huffman Coding


Huffman code is a variable-length code. The basic principle of Huffman coding is to
assign short codes to symbols with high p, and long codes to symbols with low p. The
purpose is to construct a code such that its average code-word length L that approaches
the minimum i.e. entropy of source H(S).

The followings are the encoding steps:

1. The source symbols are listed in order of decreasing probability.


2. The two symbols with the lowest probability are assigned a 0 and a 1.
3. The probabilities of the last two symbols are added.
4. The list of symbols is re-sorted with the last two symbols combined.
5. Repeat steps 2, 3, 4 until there are only two symbols left.
6. Assign the final two combined symbols a 0 and a 1.

6. PROCEDURE
Step 1: Start the MATLAB program.
Step 2: In the Command Window, type the MATLAB code line by line, as given in
Example (see Section 7). Observe the result after typing each line.

Alternatively, you can type the entire MATLAB code in an m-file (this is actually the
preferred method). By saving the m-file, you can run the code at a later date, or you can
make minor modifications without retyping the entire code, or you can copy the code to
another computer to be executed there.

Step 1: Start the MATLAB program.


Step 2: To view the present working directory, type:
pwd
This is where all your m-files will be stored. To change the present working
directory, use the "cd" command. For example:
cd d:\infot\exp1
Step 3: Go to File -> New -> M-file. The MATLAB Editor window will open.
Step 4: Type the entire code as given in Example (see Section 7).

7
IT1: HUFFMAN CODING
ETN2126 INFORMATION THEORY AND ERROR CODING

Step 5: Save the m-file by going to File -> Save As. Choose a short name for your m-
file. Make sure it contains no spaces.
Step 6: To run the m-file from the MATLAB Editor window, go to Tools -> Run. You
can also execute the m-file from the Command Window by typing the m-file
name (without the “.m”) on the command line.

7. EXAMPLE
STEP 1: Generating the message sequence
Let us generate a message which consists of one sentence of text: 'information theory is
interesting'. The message will be stored in a variable called msg. In the MATLAB
command window, type
msg = 'information theory is interesting'
Our message contains a total of 33 characters, including spaces.

STEP 2: Estimating the source


Before a Huffman coder can properly encode the given message, it requires information
about the source alphabet and the source statistics. The source alphabet is a list of
distinct symbols in the message sequence. There are altogether 14 distinct symbols in our
message:
symb = {'i' 'n' 'f' 'o' 'r' 'm' 'a' 't' ' ' 'h' 'e' 'y' 's' 'g'}
The source statistics, on the other hand, is a list of probabilities corresponding to each
symbol in the source alphabet. Since we have 14 distinct symbols, our source statistics
should have 14 probabilities:
p = [5/33 4/33 1/33 3/33 3/33 1/33 1/33 4/33 3/33 1/33 3/33 1/33 2/33
1/33]

Observe that we have enclosed the elements of symb with curly braces { } instead of the
usual square brackets [ ]. Doing so indicates that symb is a cell array, instead of a
character array. The variable symb will become an input to a later function called
huffmandict, which requires that the source alphabet be of cell data type.

STEP 3: Constructing the Huffman codebook


Based on the supplied source alphabet and source statistics, the MATLAB function
huffmandict generates the Huffman codebook:
dict = huffmandict(symb, p)
The entire codebook construction process (including the Huffman tree) is done in the
background, and is invisible to the user. The resulting codebook is stored in the variable
dict.

8
IT1: HUFFMAN CODING
ETN2126 INFORMATION THEORY AND ERROR CODING

Like the variable symb previously, the variable dict is also a cell array. In order to
access the elements in dict, we again use curly braces { }. To see the Huffman code for
the first symbol, type:
dict{1,:}
To view the Huffman code for the rest of the symbols, simply increment the subscript
value:
dict{2,:}
dict{3,:}

STEP 4: Encoding the message


Now that the Huffman codebook is ready, we may proceed to convert the message
sequence into a binary stream:
binstream = huffmanenco(msg, dict)
The encoding process simply consists of replacing each symbol in the message with the
corresponding binary word in the codebook.

STEP 5: Decoding the binary stream


The binary stream will be transmitted over a communications channel to the receiver. At
the receiving end, it will be decoded in order to recover the original message sequence.
Message decoding is generally performed with the help of a code tree. In MATLAB, we
simply type:
msgdeco = huffmandeco(binstream, dict)
Huffman code is a type of prefix code, so no ambiguity will arise in the decoding process.
If our transmission is error-free, the decoded message should match the original message
emitted by the source. Also note that the same dictionary (or codebook) is used for both
encoding and decoding.

8. PROBLEM
1. Generate a message sequence that incorporates your name. The message should be
phrased as
'my name is <insert name here>'
Store the message in a variable called myname.
2. Estimate the source alphabet and source statistics of your message. Store the source
alphabet in the variable salpha, and the source statistics in the variable sstat.
3. Draw the Huffman tree corresponding to the information found in part 2 (do this by

9
IT1: HUFFMAN CODING
ETN2126 INFORMATION THEORY AND ERROR CODING

hand). Trace the branches in the Huffman tree to construct the Huffman codebook.
4. Type the MATLAB code that automatically generates the Huffman codebook.
Compare this codebook with the one constructed manually in part 3. Is there any
difference between the two codebooks? If so, why?
5. Encode your message sequence and convert it to a binary stream.
6. Answer the following questions:
(a) How many bits were used to encode your message?
(b) If 7-bit ASCII were employed instead of Huffman, what would the length of
your binary stream be?
(c) Estimate the percentage of compression that is achieved by Huffman coding
relative to using 7-bit ASCII.
(d) Calculate the efficiency of your Huffman coder.

9. REPORT
Print out and submit your report 2 weeks after the laboratory session. The report should
contain the following:

 Introduction
 Procedures
 Results and Discussion
 Conclusions
 Appendices (MATLAB code)

Lab report writing is an individual work. Fabricating results and copying the report of
others are strictly prohibited. Severe penalties will be imposed for any such offences.

10
IT1: HUFFMAN CODING
ETN2126 INFORMATION THEORY AND ERROR CODING

FACULTY OF ENGINEERING

LAB REPORT SUBMISSION

ETN2126 INFORMATION THEORY


AND ERROR CODING

TRIMESTER 2 SESSION 2017/2018

Student Name: ………..………………………………………………………………


Student ID: ………………………………
Lab Group No.: ………………………………
Degree Major: TE / MCE
Declaration of originality:
I declare that all sentences, results and data mentioned in this report are from my own
work. All work derived from other authors have been listed in the references. I understand
that failure to do this is considered plagiarism and will be penalized.
Note that collaboration and discussions in conducting the experiments are allowed but
copying and any act of cheating in the report, results and data are strictly prohibited.

Student signature: …………………………………………

Experiment title: IT1 – Huffman Coding


Experiment Date: ………………………………
Table/PC No.: ………………………………
Date Submitted: ……………………................
Lab Instructor Name: ……………………………………… Verified:………………….
(Please get your lab instructor signature after they have verified your result)

11
IT1: HUFFMAN CODING
ETN2126 INFORMATION THEORY AND ERROR CODING

12
IT1: HUFFMAN CODING

Вам также может понравиться