Team 11

EE175WS-00-11
EE 175AB: Senior Design Project
June 14, 2000
John P. Jones
Technical Advisor: Frank Vahid
Project Advisor: Barry S. Todd

Executive Summary
This design project consisted of designing and implementing a JPEG decoder system.
JPEG is a commonly used digital image compression algorithm officially known as ISO
Standard 10918-1. JPEG coding allows digital images to be stored in a compressed form
that achieves compression ratios anywhere from 12:1 to 100:1, depending on the
acceptable loss in image quality.
This JPEG decoder system is primarily intended for use in consumer electronics devices
such as digital cameras. This use requires the design to have a number of features
including low price, low power, compact design, and high speed. To meet these
requirements the system was designed as a custom digital logic component described in the
standard hardware description language VHDL (VHSIC Hardware Description Language).
This type of design is a high-level description of the system that is then translated into a
digital circuit.
A number of challenges needed to be met to design and implement a JPEG decoder in
hardware rather than in software running on a microprocessor. JPEG coding normally
requires many floating-point calculations. Since these types of calculations are not
efficiently implemented in custom hardware they were replaced by scaled fixed-point
approximations. Also the JPEG decoding algorithm requires a substantial amount of
memory. To reduce the memory requirements of the JPEG decoding only the core
algorithm, which works on relatively small blocks of data, was implemented.
To test and demonstrate the design, a Field Programmable Gate Array (FPGA) prototype
board was purchased. Unfortunately, the project cannot currently be downloaded to the
prototype board due to time constraints. The JPEG decoder system does work in
simulation, however, and the results of that simulation will be presented.
Acknowledgements
I would like to thank the following individuals for their assistance throughout the course of
this project. Their help has greatly improved the project, my understanding of the concepts
and the design process in general.
Dr. Frank Vahid
Dr. Vahid has been very helpful in providing the resources necessary for this project.
He also helped me to decide upon this project and provided ideas on where to find
information helpful to the project.
Barry Todd
Mr. Todd has provided a lot of guidance on the design process and project
management.
Tony Givargis
Tony Givargis has helped considerably by providing me with a book on Graphics File
Formats and a rough Inverse Discrete Cosine Transform unit, which I was able to
modify and incorporate into the design. Also, Tony's PC side serial interface library
for Windows was used to communicate with the XESS board.
Jeremy Thorpe
Jeremy Thorpe helped to configure and test the board and to develop the serial
communications devices on the board. In the early parts of testing, Jeremy and I
were able to work together to solve our common problems communicating with the
development board.
Keywords / Terminology
The following is a list of important terms used in this report, each with a brief
description of its usage.
JPEG (Joint Photographic Experts Group)
The Joint Photographic Experts Group is a standardization body for the development of
continuous tone computer image algorithms.
ISO 10918-1
ISO 10918-1 is the formal name for the basic image compression algorithm developed
by the Joint Photographic Experts Group and is commonly called JPEG. This is
the algorithm that is discussed and decoded in this project.
VHDL (VHSIC (Very High Speed Integrated Circuit) Hardware Description
Language)
VHDL is an IEEE standardized language for describing the function and behavior of a
digital logic device.
VLSI (Very Large Scale Integration)
Very Large Scale Integration is a description of the process of designing and
implementing digital systems using CMOS Integrated Circuit technology.
SOC (System On a Chip)
Today as minimum chip feature sizes decrease the effective area on a single IC is
growing rapidly. Many designers are working to design entire systems on a single
chip from components the same way that ICs are commonly interconnected on a
printed circuit board today.
Rapid Prototyping
Rapid Prototyping is the effort to reduce turn-around time for design testing by initially
testing designs on a programmable logic device before they are sent to a
fabrication plant for prototyping.
FPGA (Field Programmable Gate Array)
A Field Programmable Gate Array is a type of re-configurable logic device that uses an
array of logic blocks that can be programmed and interconnected to one another to
implement both combinational and sequential logic.
CPLD (Complex Programmable Logic Device)
A Complex Programmable Logic Device is another type of re-configurable logic device
that uses a number of PLA (Programmable Logic Array) type devices
interconnected on a single chip.
XILINX
XILINX is a company that builds and sells FPGAs, CPLDs and a number of software
packages that allow these chips to be programmed from a number of different
sources including VHDL code.
XESS (X Engineering Software Systems)
XESS Corporation is a manufacturer of prototype boards with Xilinx FPGAs. The
prototype board used for testing in this project was manufactured by XESS.
DCT (Discrete Cosine Transform)
The primary principle of JPEG compression of an image is based on a discrete
frequency transformation called the Discrete Cosine Transform. This
transformation is related to the standard Discrete Fourier Transform but has
specific properties that make it applicable to image processing.
Huffman Coding
Huffman Coding is an algorithm to minimize the length of messages by using a short
code word to encode highly probable symbols such as the letter "e" and longer code
words for less probable symbols such as the letter "z".
JFIF (JPEG File Interchange Format)
The JPEG File Interchange Format is a commonly used file format for storing JPEG
encoded streams of data for storage and communication. JFIF files are commonly
named with a .JPG file extension.
BMP (Bitmap)
A Bitmap is a device independent format for describing a graphics image as a simple
array of pixel values. This method is commonly used for displays and image
processing algorithms since it provides a simple Cartesian representation of the
data.
Table Of Contents
EXECUTIVE SUMMARY
ACKNOWLEDGEMENTS
KEYWORDS / TERMINOLOGY
TABLE OF CONTENTS
INTRODUCTION
PROBLEM STATEMENT
SPECIFICATION
    General Description
    Performance Requirements
SOLUTION
  ALTERNATE SOLUTIONS ANALYSIS
    Software Implementation
    Hardware Implementation
    Solutions Analysis Table
  ENGINEERING ANALYSIS
  DESIGN OVERVIEW
  HUFFMAN DECODER
  RUN-LENGTH DECODER
  QUANTIZATION DECODER
  INVERSE DISCRETE COSINE TRANSFORMATION
  XESS DEVELOPMENT BOARD INTERFACING
TESTING PROCEDURE
    Simulation
    Synthesis & Hardware Testing
RESULTS
  BUDGET / RESOURCES
  COMPARISON TO SPECIFICATIONS
    Precision
    Chip Area
    Speed
    Power
CONCLUSIONS AND RECOMMENDATIONS
  WHAT WAS LEARNED
  WHAT WENT WRONG
  FUTURE WORK
REFERENCE DOCUMENTS
APPENDICES
  FIXED-POINT ARITHMETIC
  SCHEMATICS
    JPEG Decoder Unit
    Huffman Decoder / Run-Length Decoder Unit
    Quantization Decoder Unit
    Inverse Discrete Cosine Transformation Unit
  VHDL SOURCE CODE
    JPEG Library
    JPEG Decoder Unit
    Huffman / Run-Length Decoder Unit
    Quantization Decoder Unit
    Inverse Discrete Cosine Transform Unit
    Serial Input Controller
    Serial Output Controller
    Memory Input Controller
  MATLAB & C++ CODE
    Data Create & Test Matlab Script
    Huffman Coding in C
    DCT Test Matlab Code
    Computation of DCT Coefficient Matrix
    Quantization Testing in Matlab
    Image DCT, Quantization, De-Quantization, IDCT Testing in Matlab
  XESS XSV BOARD V1.0 MANUAL
Introduction
Since modern computer systems are required to store and transmit vast amounts of data
the field of data compression has become very important. One form of data that is
commonly processed by computer systems is graphic images. To compress graphic
images the Joint Photographic Experts Group (JPEG) developed a method of compressing
images by reducing the precision of the high-frequency portions of images. This allows the
images to be stored more compactly without sacrificing the important low-frequency
portions.
This is done by first dividing the image into an array of 8-pixel by 8-pixel data blocks and
performing a transformation on these data blocks that expresses each data block as a linear
combination of sinusoidal components of harmonic frequencies. Then the magnitudes of
the components corresponding to the higher frequency harmonics are stored with less
precision than the lower frequencies. This filtering loses some of the detail of the image but
retains most of the image's information since the human eye acts as an integrator, which
reduces the contribution of high detail portions of our visual field. After being filtered the data
is coded so that large values will be stored with larger numbers of bits than smaller values.
This process allows a variable length coding of the data for compression. Finally the data is
Huffman coded so that more frequent data values are stored as shorter codes.
This algorithm for image compression is formally known as ISO 10918-1 but is commonly
referred to as JPEG after the standardization body that developed it. JPEG is frequently
used both on the Internet and in consumer electronics devices such as digital cameras. To
restore a JPEG image to uncompressed data, a device called a JPEG decoder is needed.
The uncompressed data is commonly stored as a bitmap, which is a device independent
representation of the array of pixels that make up an image. The decoder performs the
inverse of the JPEG encoder, which encodes bitmap images as JPEG streams. Usually JPEG
encoders and decoders are written as programs in a high-level language such as C or C++
and run on general-purpose microprocessors.
The purpose of this project is to design and implement a JPEG decoding system that can be
incorporated into a digital camera design. This application requires the JPEG decoder to be
simple, fast, low power, and easily integrated into a larger system. Such systems have
been built before and are commonly used in consumer electronics devices such as digital
cameras. Also JPEG decoder designs such as the one built for this project are available for
purchase and can be incorporated into larger designs.
Problem Statement
This project involved the design of a JPEG decoder. JPEG, the Joint Photographic Experts
Group, is a standardization body that produces standards for continuous tone image coding.
Perhaps the best known such standard is ISO 10918-1, which is a widely used image
compression standard. The JPEG decoder designed in this project will be used to decode a
JPEG File Interchange Format (JFIF) file into an uncompressed bitmap file. JFIF is the file
format that is commonly associated with JPEG and is used widely on the Internet and in
consumer electronics devices to store still image data. While the JPEG standard (ISO
10918-1) defines a large class of related compression algorithms the JPEG decoder
designed for this project will focus on the simplest and most widely used such algorithm
known as baseline JPEG.
The wide use of JPEG in consumer electronics devices such as digital cameras produces a
need for a fast, low-power implementation that is capable of meeting the demands of the
overall system. As with any digital system, the JPEG decoder could be implemented either
in software running on a general purpose microprocessor (or, more likely, a special purpose
microprocessor such as a Digital Signal Processor, DSP) or with custom hardware circuitry.
The advantages and disadvantages of both software and hardware implementations will be
discussed shortly. While this project will produce only the JPEG decoder much of the
design would be reusable in the design of a JPEG encoder.
This project will demonstrate the JPEG decoder using a Field Programmable Gate Array
(FPGA). The FPGA will be programmed with the JPEG decoder design and will receive
input JPEG images from a serial communication link with a computer system and send the
decoded output images back to the computer for viewing.
Specification
General Description
This project requires that a system for decoding JPEG images into a standard bitmap image
representation be implemented. This system must adhere to the baseline JPEG standard
described by ISO 10918-1.
Performance Requirements
Precision
Keep average per pixel error to within 3% of a standard floating-point implementation of
JPEG decoding.
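One way to make this requirement measurable is sketched below in Python. The specification does not say how the percentage is normalized, so the sketch assumes the error is the mean absolute difference between the decoder output and the floating-point reference, expressed as a percentage of the 8-bit full-scale value of 255. The function names and the choice of normalization are assumptions made for illustration only.

```python
def average_pixel_error_percent(reference, candidate, full_scale=255):
    """Mean absolute per-pixel difference as a percentage of full scale.

    reference: pixel values from a floating-point JPEG decoder
    candidate: pixel values from the decoder under test
    full_scale: assumed normalization (255 for 8-bit samples)
    """
    if len(reference) != len(candidate) or not reference:
        raise ValueError("pixel lists must be non-empty and equal in length")
    total = sum(abs(r - c) for r, c in zip(reference, candidate))
    return 100.0 * total / (len(reference) * full_scale)


def meets_precision_spec(reference, candidate, limit_percent=3.0):
    """True when the average per-pixel error is within the 3% target."""
    return average_pixel_error_percent(reference, candidate) <= limit_percent
```

Under this interpretation an output that is off by 5 gray levels at every pixel has an error of about 1.96% and passes, while one that is off by 10 levels at every pixel has an error of about 3.92% and fails.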
Chip Area
Maintain a reasonable area for the implementation of the JPEG decoder. The target FPGA
has a capacity of about 300 thousand gates. Since routing of components produces a less
than optimal usage of an FPGA, it is desired to keep the gate count at about 140 thousand
gates. This target gate count will be useful in ensuring that the entire design will be able to
fit onto the FPGA board.
Speed
It is desirable to maximize the JPEG decoder's speed. While the speed of the circuit
is highly dependent on implementation technology, the JPEG decoder must be able to
perform at speeds between that of a software implementation of JPEG decoding and
that of a fully optimized JPEG decoder design that is available commercially. Speed
is a crucial design point in a production design, but it will not be emphasized in this
FPGA prototype. The design should nevertheless be suitable for optimization towards a
specific usage.
Power
Lastly the power consumption of the JPEG decoder must be within acceptable limits.
However, since the power consumed is determined by the FPGA used and not by the design
itself, only simulation data will be available to predict the power consumption of
the JPEG decoder when implemented as an ASIC (Application Specific Integrated
Circuit).
Solution
Alternate Solutions Analysis
As previously mentioned, a digital system can be implemented either in software running on
a microprocessor or with a custom designed digital logic circuit. These are the major realms
of digital system design, and each of these solutions has a wide variety of design decisions
associated with it.
Software Implementation
Implementation of the JPEG decoding algorithm in software is very common. There are
numerous open-source software implementations of JPEG in languages such as C and
C++. The existence of this software and the easy accessibility of C compilers for most
microprocessor designs simplify the software design to the point where only moderate
coding would be required to modify one of these implementations for a specific use. Since
microprocessors are relatively affordable at low volumes such an implementation would be
essential for a small volume product. In some applications that use a microprocessor it
would be reasonable to bear the extra load of JPEG decoding on the microprocessor, but in
many situations the microprocessor is a valued resource that would be better utilized
performing other calculations.
Hardware Implementation
The repetitive, well defined nature of the process of JPEG decoding lends itself very well to
a hardware implementation, where ease of design and implementation is traded off for
a faster, less power consuming solution that allows greater computational flexibility at the
cost of design effort. In addition to these conventional arguments for a hardware
implementation, the constantly expanding area of chips produced by the continual progress
of Moore's Law, which states that chip capacity will double every 18 months, provides
another reason to consider a hardware implementation of JPEG decoding. The increased
chip capacity has allowed, in recent years, the combination of a general-purpose
microprocessor with custom logic units on a single chip. By placing these external units on
the same piece of silicon as the microprocessor the costs of communications are greatly
reduced. As the capacity of integrated circuits continues to increase such System-On-a-
Chip designs will continue to grow in popularity.
VHDL (VHSIC Hardware Description Language)
VHSIC (Very High Speed Integrated Circuit) Hardware Description Language (VHDL) is a
powerful language used for the description of digital circuits. VHDL allows the mixture of
both high-level behavioral descriptions and low-level structural descriptions to be connected
and used together. Using these multiple levels of abstraction together allows the design
process to focus on testing functionality and then optimizing the critical areas of the design
by specifying them at a more detailed level where the designer can optimize the circuit as
needed to meet specifications.
Verilog
Verilog is another standardized hardware description language. Compared with VHDL,
Verilog is more commonly used in industry in the United States.
Schematic Capture
Before hardware description languages such as VHDL and Verilog became popular the
standard industry practice was to use CAD programs to draw board and chip layouts from
standard components and simple logic gates. This method is similar to structural
architecture description in hardware description languages. Schematic capture is basically a
graphical way of connecting standard and custom discrete components together to develop
a digital system. While sophisticated automated routing tools are available in packages
such as Protel the placement of the components must be done by hand in most instances,
which increases the design complexity of the process. Modern HDL synthesis tools take
advantage of regular design structures such as Field Programmable Gate Arrays (FPGAs)
to simplify the task of placement and routing of logic for a design.
Magic Layout Design Editor
Magic is a layout design editor for CMOS technology where actual transistors are created
from the varying layers of silicon of varying impurity levels and insulation layers. These
transistors are then routed by metal layers into the actual physical structure of a microchip.
The output of such a tool would then be extracted to a simulation tool such as PSpice and
accurate, albeit intolerably slow, simulations could be run on the design. Finally the masks
defined by the Magic generated layout would be sent to manufacturing where the actual
manufacturing masks would be generated and the chip could be produced.
Solutions Analysis Table
Design Metrics               Software     VHDL Behavioral   VHDL Structural   Schematic
vs. Solutions                             Synthesis         Description       Capture / Magic
Design Ease / Design Cost    Winner       Good              Acceptable        Unacceptable
Speed                        Worst Case   Acceptable        Good              Good
Power                        Worst Case   Good              Good              Winner
Chip Area Efficiency         Worst Case   Acceptable        Good              Winner
Accuracy                     Winner       Good              Good              Good
Unit Cost (@ High Volume)    Poor         Good              Good              Good
This table clearly shows that while the low level solutions towards the right side have better
performance attributes the high level solutions towards the left side have better practicality
attributes. The behavioral or algorithmic VHDL description method provides a strong
compromise between software design, in which JPEG is usually implemented, and higher
performance hardware solutions.
Engineering Analysis
The relationships between layout tools, schematic capture tools, and hardware description
languages are very closely analogous to the corresponding relationships in software
between machine languages, assembly languages, and programming languages.
Just as the current focus in software design is on reusable, machine independent
algorithmic descriptions, the use of a technology independent description language such as
VHDL or Verilog is strongly preferred. The effort expended on designing HDL descriptions
of digital circuits can be reused and optimized as logic synthesis tools become more
powerful. This potential for improving designs through the advancement of synthesis tools
and implementation technologies allows large libraries of digital designs to be
designed and reused as software libraries are today. This concept of a design, described in
an HDL, has been termed Intellectual Property or IP, which conveys the great potential
importance of reusable designs.
For all these reasons digital system design in a High-Level Description Language is
becoming the preferred method for design of hardware and the relative ease of this design
in an HDL is comparable to software implementation in a High-Level Programming
Language such as C and C++.
The proposed design for the JPEG decoder will allow the stages of the JPEG decoding
pipeline to operate in parallel, enhancing the operation's concurrency and thus
increasing speed. Due to this explicit parallelism, a hardware implementation of JPEG
decoding has great potential for being faster than software implementations.
Design Overview
The JPEG decoder device was designed and implemented in VHDL at a behavioral
description level of abstraction to be synthesized to logic gates. The design of a JPEG
decoder in VHDL will provide a robust, hardware technology independent description. The
decoder could then be downloaded to a Field Programmable Gate Array for testing and
verification.
Figure 1 is a general block diagram representing the JPEG encoding process. A good
understanding of the encoding process will help illuminate some of the design options of the
decoding process while describing the fundamental problem at hand in greater detail.
Figure 1 [1]
As shown in Figure 1 the JPEG encoding process is performed on blocks of an image that
are 8 pixels wide by 8 pixels high. Each of these JPEG data blocks is encoded in a
sequence of three operations. First the image block is transformed using a 2-dimensional
Forward Discrete Cosine Transform (FDCT) to determine the spectral components of the
image. After the FDCT is performed the upper left corner of the coefficient matrix contains
the DC component of the block and the lower right corner contains the highest frequency
components of the image. Since the human eye does not readily perceive high frequency
changes the high frequency components can be stored with less precision than the more
important low frequency components. This low pass filtering of the image is performed by
the next stage, which quantizes the data in exactly this manner. The Quantization table
used in this step determines the exact filter characteristics and therefore the compression
ratio and quality of the encoded JPEG image. Finally the Coding stage transforms the 8x8
quantized block into a linear stream of values and then assigns the more frequently
occurring values to shorter binary codes and less frequently occurring values to longer
binary codes to minimize the length of the encoded message. The Coding table used in this
step also affects the compression ratio since the table must accurately match the relative
frequencies of the input values to achieve good compression.
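The first two encoding stages can be illustrated with a short floating-point sketch in Python. It is not the code used in this project: it evaluates the textbook 8x8 FDCT formula directly, omits any level shifting of the input samples, and uses a made-up quantization table whose step size simply grows with frequency. The names fdct_8x8, quantize and example_table are hypothetical.

```python
import math

N = 8  # JPEG data blocks are 8 pixels by 8 pixels


def fdct_8x8(block):
    """Direct 2-D forward DCT of an 8x8 block; out[0][0] is the DC term."""
    def scale(k):
        return 1 / math.sqrt(2) if k == 0 else 1.0

    out = [[0.0] * N for _ in range(N)]
    for u in range(N):            # vertical frequency index
        for v in range(N):        # horizontal frequency index
            total = 0.0
            for x in range(N):
                for y in range(N):
                    total += (block[x][y]
                              * math.cos((2 * x + 1) * u * math.pi / 16)
                              * math.cos((2 * y + 1) * v * math.pi / 16))
            out[u][v] = 0.25 * scale(u) * scale(v) * total
    return out


def example_table():
    """Illustrative table only (not from the standard): coarser steps
    toward the high-frequency, lower-right corner."""
    return [[8 + 4 * (u + v) for v in range(N)] for u in range(N)]


def quantize(coeffs, table):
    """Divide each coefficient by its table entry and round to an integer."""
    return [[round(coeffs[u][v] / table[u][v]) for v in range(N)]
            for u in range(N)]
```

For a flat block in which every sample is 16, fdct_8x8 yields a DC coefficient of 128 with all other coefficients essentially zero, so only one non-zero value survives quantization. Larger table entries toward the lower right discard more of the high frequency detail, which is the low pass filtering described above.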
The JPEG decoding process is an inverse transformation where the encoded data is first
decoded and restored to 8 pixel by 8 pixel data blocks using a preliminary Decoding stage.
This stage includes both Huffman Decoding and Run-Length decoding, two distinct coding
schemes. Next the Quantization table specification is used to approximately regain the
spectral components of the image block. While low frequency components may be fully
restored, the high frequency components may be severely distorted; however, this distortion is
barely perceptible. Finally the Inverse Discrete Cosine Transform approximately recovers
the original 8x8 data block. Figure 2 is a detailed block diagram showing this process, which
is implemented by the JPEG decoder designed for this project.
Figure 2 [1]
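The last two decoding stages can be sketched in Python using integer arithmetic only, in the spirit of the scaled fixed-point approximations used in this design. This is an illustration under stated assumptions rather than a model of the VHDL units: the scale factor of 2**12, the row-then-column ordering, and the names FRAC_BITS, COS, dequantize, idct_1d and idct_8x8 are all chosen for the example.

```python
import math

N = 8
FRAC_BITS = 12  # assumed fixed-point scale of 2**12 (not from the report)

# COS[k][n] = (c(k) / 2) * cos((2n + 1) * k * pi / 16), stored as a scaled
# integer, where c(0) = 1/sqrt(2) and c(k) = 1 otherwise.
COS = [[round((math.sqrt(0.5) if k == 0 else 1.0) * 0.5
              * math.cos((2 * n + 1) * k * math.pi / 16) * (1 << FRAC_BITS))
        for n in range(N)] for k in range(N)]


def dequantize(levels, table):
    """Multiply each quantized level by its quantization table entry."""
    return [[levels[u][v] * table[u][v] for v in range(N)] for u in range(N)]


def idct_1d(coeffs):
    """Integer 1-D inverse DCT; adds half an LSB before shifting to round."""
    half = 1 << (FRAC_BITS - 1)
    return [(sum(COS[k][n] * coeffs[k] for k in range(N)) + half) >> FRAC_BITS
            for n in range(N)]


def idct_8x8(coeffs):
    """2-D inverse DCT as a 1-D pass over the rows, then over the columns."""
    rows = [idct_1d(row) for row in coeffs]
    cols = [idct_1d([rows[r][c] for r in range(N)]) for c in range(N)]
    return [[cols[c][r] for c in range(N)] for r in range(N)]  # transpose back
```

With a quantized block whose only non-zero level is 8 at the DC position and a table entry of 16 there, dequantize restores a DC coefficient of 128 and idct_8x8 returns a flat block in which every sample is 16. Rounding the intermediate row results to integers costs some precision; a real design would weigh that loss against the width of its data path.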
Huffman Decoder
The first stage of the JPEG decoding process is the decoding of data values that were
Huffman coded. The Huffman decoder was designed to read in a Huffman Table that was
extracted from the JPEG data file and use that data to determine the decoding of the input.
Since the Huffman encoded data is of variable length the decoder must make decisions one
bit at a time. To do this the decoder reads data 1 bit at a time from the input using a
separate process to handle buffering the input data and delivering it as needed to the
decoder. The process of decoding is exactly like walking down a binary tree. At each step
from the root of the Huffman tree the decoder makes a single decision based on the next bit
of input until it reaches the end of the path. At this point the decoder is able to decide from
the Huffman Table what the appropriate decoded value is. The following figure shows a
Huffman Tree and the corresponding codes for a 2-bit message where 00 is very common
and 10 and 11 are very rare.

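[Figure: Huffman tree. From the root, branch 0 leads to the leaf 00; branches 1, 0 lead to
the leaf 01; branches 1, 1, 0 lead to the leaf 10; branches 1, 1, 1 lead to the leaf 11.]

Value   Code
00      0
01      10
10      110
11      111

Figure 3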
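The tree walk described above can be sketched in software. The following Python fragment is
only an illustration and not the VHDL entity used in the design; the names CODES,
build_tree, and decode are hypothetical, and the code table is the one from Figure 3.

```python
# Code table from Figure 3: decoded 2-bit value -> Huffman code.
CODES = {"00": "0", "01": "10", "10": "110", "11": "111"}

def build_tree(codes):
    """Build the Huffman tree as nested dicts; each leaf holds a decoded value."""
    root = {}
    for value, code in codes.items():
        node = root
        for bit in code[:-1]:
            node = node.setdefault(bit, {})
        node[code[-1]] = value
    return root

def decode(bits, tree):
    """Walk the tree one input bit at a time, emitting a value at each leaf."""
    out = []
    node = tree
    for bit in bits:
        node = node[bit]
        if isinstance(node, str):   # reached the end of a path
            out.append(node)
            node = tree             # restart at the root for the next code
    return out

TREE = build_tree(CODES)
message = ["00", "01", "00", "11", "10"]
encoded = "".join(CODES[v] for v in message)
decoded = decode(encoded, TREE)
```

Because no code is a prefix of another, the decoder never needs to look ahead: each bit
either moves it one level down the tree or completes a value.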
Run-Length Decoder
The decoded 8-bit word from the Huffman Decoder represents a 4-bit run-length followed by
a 4-bit data-length. The 4-bit run-length is a count of the zero data values that
occurred between the last non-zero data value and the current one. The 4-bit data-length is
the number of bits following this 8-bit word that make up the actual non-zero data value. A
data-length of 0 signifies either the end of a data block or, if the run-length is 15, a run
of 16 consecutive zero data values. Since both the Huffman Decoder and the Run-Length
decoder have to read bit by bit from the input to decode the data, I decided to merge
these two distinct operations into a single VHDL entity.
The data values are then read from the input and decoded according to the
following rules. If the high order bit of the data value is 0 then it corresponds to a negative
number and should be sign extended with ones, since we are using a signed 2s
complement numbering system. If the high order bit of the value is 1 then it corresponds to
a positive value and should be sign extended with zeros. Then 1 is added to the negative
values to give them the correct value. This creates a gap between the least negative
number and the least positive number exactly large enough to hold the values that can be
represented by fewer bits, which use a shorter code with its own sign bit. Table 4 below
lists the 3-bit long data values and their decodings.
Value to Be Coded   Conversion If Negative   8-bit 2s Complement   Encoded Value
-7                  -8                       11111000              000
-6                  -7                       11111001              001
-5                  -6                       11111010              010
-4                  -5                       11111011              011
 4                                           00000100              100
 5                                           00000101              101
 6                                           00000110              110
 7                                           00000111              111
Table 4 [1]
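The decoding rules above can be checked against Table 4 with a short Python sketch. It is an
illustration only, not the VHDL of the merged Huffman / Run-Length entity, and the function
names split_symbol and decode_value are hypothetical. A data-length of 0 carries no value
bits, so that case is assumed to be handled by the caller.

```python
def split_symbol(symbol):
    """Split the 8-bit Huffman-decoded word into (run_length, data_length)."""
    return (symbol >> 4) & 0xF, symbol & 0xF

def decode_value(bits):
    """Decode a data value given as a non-empty string of '0'/'1' characters."""
    n = len(bits)
    raw = int(bits, 2)
    if bits[0] == "1":
        return raw                   # positive: sign extend with zeros
    # Negative: extending with ones subtracts 2**n; then 1 is added.
    return (raw - (1 << n)) + 1

# Reproduce the "Encoded Value" -> "Value to Be Coded" mapping of Table 4.
table4 = {code: decode_value(code)
          for code in ["000", "001", "010", "011", "100", "101", "110", "111"]}
```

For example, split_symbol(0x23) gives a run of 2 zeros followed by a 3-bit value, and
split_symbol(0xF0) gives the run-length 15 with data-length 0 that marks 16 consecutive zeros.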
Quantization Decoder
The Quantization Decoder requests data values from its input. It multiplies these data
values by the corresponding value in the Quantization table and then places them in the
appropriate location in the 8x8 JPEG data block. During JPEG encoding the frequency
components of the data block are ordered so that the low frequency components are at the
beginning and higher frequency components follow. To do this the frequency matrix is
ordered in a zig-zag fashion as described in the following diagram.

Figure 5 [1]
This data block is then passed on to the Inverse Discrete Cosine Transform unit. Since the
Quantization Decoder is in the middle of the JPEG decoder pipeline and is relatively simple, I
decided to make it the master device of the JPEG decoder. It requests data from the Huffman
Decoder, assembles a data block from that data, and then requests the
Inverse Discrete Cosine Transform unit to decode it. This allows almost all of the operations
of the Quantization Decoder to be done while the Huffman decoder, which takes a long time
since it has to make a decision at every bit, is running. This increased parallelism is one of
the major advantages of a hardware-based design.
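The multiplication and zig-zag placement performed by the Quantization Decoder can be
sketched in Python as follows. This is an illustration rather than the VHDL entity; the
function names are hypothetical, and the sketch assumes that the Quantization table is
stored in the same zig-zag order as the incoming values.

```python
def zigzag_order(n=8):
    """Return the (row, col) positions of an n x n block in zig-zag order."""
    order = []
    for d in range(2 * n - 1):            # d = row + col selects one anti-diagonal
        rows = range(max(0, d - n + 1), min(d, n - 1) + 1)
        # Odd diagonals are walked with the row increasing,
        # even diagonals with the row decreasing.
        for r in (rows if d % 2 else reversed(rows)):
            order.append((r, d - r))
    return order

def dequantize(values, quant_table):
    """Multiply 64 zig-zag-ordered values by the matching table entries
    and place the products in an 8x8 block."""
    block = [[0] * 8 for _ in range(8)]
    for k, (r, c) in enumerate(zigzag_order()):
        block[r][c] = values[k] * quant_table[k]
    return block

order = zigzag_order()
block = dequantize(list(range(64)), [2] * 64)
```

The first positions visited are (0,0), (0,1), (1,0), (2,0), (1,1), (0,2), so the DC term and
the lowest frequencies arrive first and the highest frequency term at (7,7) arrives last.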
Inverse Discrete Cosine Transformation
The Inverse Discrete Cosine Transform unit is by far the most complex unit in the JPEG
decoder. The IDCT requires many multiplications and additions of irrational values and is
computationally intensive. Since a floating-point ALU is very difficult to design, very large,
and very slow, floating-point arithmetic is rarely done in custom hardware designs
except in the data-path of a microprocessor, where it can be shared among many
different uses. For this design I quickly realized that I would have to work around this
problem. I chose to implement the IDCT using only scaled fixed-point arithmetic. After
extensive Matlab testing I decided to use a 16-bit whole part followed by an 8-bit fractional
extension. Thus the input was extended from 16 bits to 24 bits and then the
calculations could be performed. After computation the output is rounded back to whole
numbers and reduced to the final 8-bit output.
Here are the equations for the 2-Dimensional 8x8 Discrete Cosine Transform and its Inverse
Transform:

\[
\text{FDCT:}\quad S_{v,u} = \frac{1}{4}\, C_u C_v \sum_{x=0}^{7} \sum_{y=0}^{7} s_{y,x}
\cos\frac{(2x+1)u\pi}{16} \cos\frac{(2y+1)v\pi}{16}
\]

\[
\text{IDCT:}\quad s_{y,x} = \frac{1}{4} \sum_{u=0}^{7} \sum_{v=0}^{7} C_u C_v S_{v,u}
\cos\frac{(2x+1)u\pi}{16} \cos\frac{(2y+1)v\pi}{16}
\]

\[
C_n = \begin{cases} \dfrac{1}{\sqrt{2}} & n = 0 \\ 1 & n > 0 \end{cases}
\]

These equations can be rewritten in the form of linear transformations using matrices as
follows:

\[
S = D\, s\, D^{T} = D \left( D^{T} S D \right) D^{T}
\]

\[
s = D^{T} S\, D = D^{T} \left( D\, s\, D^{T} \right) D
\]

Here D is a constant 8x8 matrix formed by the cosine values and constants above. This
form shows that D is orthogonal, since its transpose is also its inverse. The JPEG decoder
used this linear transformation equation to compute the IDCT since computing the product
of two matrices is relatively straightforward.
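A minimal floating-point sketch of the matrix form, written in Python rather than the VHDL
of the actual design. It assumes that D has the entries D[u][x] = (C_u / 2) cos((2x + 1) u
pi / 16), which is one way to collect the cosine values and constants of the FDCT equation
into a matrix; the helper names are illustrative only.

```python
import math

N = 8

def dct_matrix():
    """D[u][x] = (C_u / 2) * cos((2x + 1) * u * pi / 16)."""
    d = []
    for u in range(N):
        c = 1 / math.sqrt(2) if u == 0 else 1.0
        d.append([0.5 * c * math.cos((2 * x + 1) * u * math.pi / 16)
                  for x in range(N)])
    return d

def matmul(a, b):
    """Plain 8x8 matrix product."""
    return [[sum(a[i][k] * b[k][j] for k in range(N)) for j in range(N)]
            for i in range(N)]

def transpose(a):
    return [list(row) for row in zip(*a)]

D = dct_matrix()
DT = transpose(D)

def fdct(s):
    return matmul(matmul(D, s), DT)      # S = D s D^T

def idct(S):
    return matmul(matmul(DT, S), D)      # s = D^T S D

# Round trip on an arbitrary test block.
block = [[(3 * y + 5 * x) % 17 for x in range(N)] for y in range(N)]
restored = idct(fdct(block))
max_err = max(abs(restored[y][x] - block[y][x])
              for y in range(N) for x in range(N))
```

In floating point the round trip reproduces the block up to rounding error, which is what
orthogonality of D predicts. Each transform costs two 8x8 matrix products.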
XESS Development Board Interfacing
The XESS Board that I decided to use for this project has turned out to be a very useful and
versatile development board. The documentation and development tools provided by XESS
have been very helpful and made it possible to develop the basic devices needed to
communicate with the board. To communicate with the PC, Jeremy and I
decided to use the onboard serial communications port and to implement VHDL entities
on the board that communicate with the PC via the serial port. To do this we had to
reprogram the CPLD (Complex Programmable Logic Device) on the board to route the
serial pins to the FPGA, since these serial lines were not originally connected to the
FPGA. Once the serial pins were
routed to the FPGA we designed two very simple VHDL entities to control the receiving and
transmitting of data. The Serial Input Controller listens to the serial receive line, reads off
bytes of data as they arrive, and presents this data as an 8-bit output value. The Serial
Output Controller waits for a signal to send an 8-bit input value and, when it receives this
signal, transmits the value over the serial port to the PC. Each of these entities is based
on a simple Finite State Machine that reads or writes the data one bit at a time.
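The bit-at-a-time state machines can be illustrated with a small Python model. The framing
is an assumption, since the report does not state it: one start bit (0), eight data bits
sent least significant bit first, and one stop bit (1), with the line idling at 1 and
exactly one sample taken per bit period. The state and function names are hypothetical, and
real hardware would normally sample the line several times per bit.

```python
def serial_frame(byte):
    """Transmit side: line levels for one byte (start, 8 data bits LSB first, stop)."""
    return [0] + [(byte >> i) & 1 for i in range(8)] + [1]

def serial_receive(line_samples):
    """Receive side: recover bytes from line levels, one sample per bit period."""
    state, shift, count = "IDLE", 0, 0
    received = []
    for level in line_samples:
        if state == "IDLE":
            if level == 0:                 # start bit seen
                state, shift, count = "DATA", 0, 0
        elif state == "DATA":
            shift |= level << count        # collect bits, least significant first
            count += 1
            if count == 8:
                state = "STOP"
        else:                              # STOP
            if level == 1:                 # valid stop bit: present the byte
                received.append(shift)
            state = "IDLE"
    return received

line = [1, 1] + serial_frame(0x4A) + [1] + serial_frame(0xFF)
data = serial_receive(line)
```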
Testing Procedure
Since the prototype fabrication process is prohibitively expensive for this project, the actual
JPEG decoder chip could not be built and tested. Instead, testing of the JPEG decoder
design proceeded in two phases: software simulation testing and hardware testing on a
Field Programmable Gate Array (FPGA). This process of software simulation of VHDL code
followed by FPGA testing is commonly referred to as Rapid Prototyping, since these tools
allow the design to iterate much more quickly than under a conventional development cycle.
Simulation
To simulate the JPEG decoder the program Active-HDL was used. This program allows the
VHDL code to be written and simulated in a single development environment and offers an
advanced logic analyzer type waveform display for observing how signals evolve through
simulation time. This allows different signals to be viewed and analyzed for errors easily.
The procedure consists of starting an empty design project and adding all the VHDL source
code to it. The code can then be compiled and the simulation started. The first time, the
software asks which entity is the top-level entity to simulate; this can be changed later as
needed. The simulator then displays the available VHDL
entities, their input and output ports, and their internal signals for observation. Once the
desired signals have been added to the waveform viewer, the simulation is started by
specifying how long to run. The simulator runs for the specified interval and displays
the waveforms during that interval.
The simulation of the JPEG decoder has been successful and has yielded good results.
The simulation helped considerably during the design process, as I was able to pinpoint
which signals were behaving incorrectly and where in the code the problem arose.
Synthesis & Hardware Testing
The plan was to complete the synthesis of the JPEG decoder and download the device
to the FPGA on the XESS board so that JPEG data blocks could
be sent to the board for decoding and the decoded data could be returned to the PC and
assembled into a viewable image. While many of the pieces for such a setup are in place,
the project ran out of time and this testing could not be performed. Preliminary work on
designing the serial communications units has been completed and additional work has
been done to design a memory interface controller. Specifically, the memory interface
controller can successfully read and write the on-board memory, but some timing issues
have not been resolved when the device is both reading and writing. I have been
using a logic analyzer to discover and fix errors, and it has proven to be a very
important tool. The logic analyzer is easily connected to the expansion port pins on the
XESS development board, so signals from the FPGA can be easily viewed.
Results
The implementation of the JPEG decoder was not completed. All of the pieces of the
JPEG decoder have been written and simulated, and I have determined that they work.
These pieces were then assembled and a full simulation of the JPEG decoder was run for
a single block of a JPEG image.
The second part of the project, which involves the synthesis of the JPEG decoder and
testing on the XESS development board, has not been completed.
Budget / Resources
Since this project consisted of the design and testing of a JPEG decoder in simulation
followed by testing of the design on a re-programmable logic device, it made heavy use of
expensive software systems and prototyping devices, but no expenditure of resources was
involved on my part. The VHDL tools necessary to write, compile, test, and
synthesize a VHDL description and the FPGA prototype hardware are very expensive, and
the Technical Advisor, Frank Vahid, generously provided access to these resources. Dr.
Vahid supplied the resources necessary to purchase the XESS XSV-300 Virtex Prototyping
Board, which I selected for use on this project. This prototype board has all the required
features, such as a gate capacity of approximately 300K gates and an easy interface with the
software packages being used. The total cost of this board was $899.00. This board also
has many other features that will be used by Dr. Vahid and his research group as well as
possible future Senior Design Projects under Dr. Vahid. The board features a Xilinx FPGA
of the Virtex class that is capable of implementing a design of approximately 300 thousand
gates with reasonable speed.
Comparison to Specifications
The specifications required that the JPEG decoder operate not just in simulation but actually
function on the board. Also the glue logic to repeat the JPEG decoding process for every
block of an image and to interface with a standard JPEG file format (JFIF) was not
completed. I will discuss these two issues separately.
First, there was not enough time for me to get the JPEG decoder to work on the board. This
was due to the amount of time needed to get the JPEG decoder operational in a simulation
environment. There were no major stumbling blocks except for time constraints. I believe
that if I had been able to schedule the project with more time devoted to gradually integrating
the working parts of the design on the board, the project would be able to operate on the
board.
Second, the current JPEG decoder needs a substantial amount of external control to deliver
the appropriate data. Ideally a standard JFIF file could be delivered to the decoder and a
decoded image bitmap would be produced. Unfortunately time did not permit me to
add this control logic. As it stands now the JPEG decoder design, if
implemented in hardware, would be able to significantly increase the speed of decoding
JPEG images when connected to a microprocessor programmed to control it and do the
necessary bookkeeping. This in itself is an important feature that demonstrates how a
system can be implemented as a hybrid hardware / software implementation to improve
performance without sacrificing versatility.
Precision
The specification required that the average per pixel error be below 3% of the error rate
introduced by a standard floating-point implementation. This design goal appears to have
been met. Matlab experiments show that my approximate method of computing the IDCT
will introduce approximately 0.245% error into the image. The VHDL simulation results
agree with this prediction to within 0.06%. The actual error introduced in the simulation
was found to be 0.227%, which is actually less than the predicted value. This data shows
that the JPEG decoder is sufficiently accurate.
Chip Area
The specification required that the design be able to synthesize and fit on the FPGA on the
development board, which has a capacity of about 300 thousand gates. Since this part of
the project did not get completed this design goal has not been evaluated. I believe that the
JPEG decoder is sufficiently simple to fit within this boundary but I have not been able to
determine the actual required number of gates due to time restrictions.
Speed
From simulation it has been determined that about 3,700 clock cycles are required to
complete one 8x8 block of data. Since I anticipate no problems
implementing this design to run at 50 MHz, and possibly much faster at about 100 MHz, I have
calculated that at 50 MHz a 640 by 480 pixel image consisting of 4,800 data blocks will take
a little over one third of a second. This would give a throughput of about 7 megabits per
second. This speed is very acceptable and, while it is slower than a standard PC decoding
images, it can keep up with less powerful microprocessors.
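The arithmetic behind this estimate can be written out as a short Python calculation. The
cycle count, block count, and 50 MHz clock are the figures quoted above; counting one 8-bit
output sample per pixel is an assumption made here to express the result in bits per
second, and the variable names are illustrative.

```python
CYCLES_PER_BLOCK = 3700                # from simulation
BLOCKS = (640 // 8) * (480 // 8)       # 80 x 60 blocks of 8x8 pixels
CLOCK_HZ = 50_000_000                  # 50 MHz

seconds = BLOCKS * CYCLES_PER_BLOCK / CLOCK_HZ

# Assumption: one 8-bit decoded sample per pixel.
output_bits = 640 * 480 * 8
throughput_mbps = output_bits / seconds / 1e6
```

With these numbers the decode time comes to about 0.36 seconds per image and the output
rate to roughly 7 megabits per second; at 100 MHz both figures would improve by a factor of two.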
Power
As mentioned in the specifications, implementing this project on an FPGA does not give a good
measure of the power requirements of the design, since the power used depends on the
FPGA that it is running on.
Conclusions And Recommendations
This project proved to be considerably more challenging and time-consuming than originally
projected. While the overall project was not completed, the decoder is basically complete
and simulation shows that it does work. Also many of the necessary components for
communication and memory storage on the prototype board were developed. I believe that
future projects building upon this one would be able to use the work I have done to complete
this project or a similar one. I believe that a two-member team could expand the results of
this project and deliver a system capable of downloading JPEG images.
The theoretical foundations of the JPEG algorithm such as the Discrete Cosine Transform
and Huffman coding are good applications of the concepts learned in Digital Signal
Processing and Digital Communications.
What Was Learned
The project showed that the design and implementation of complex systems in hardware is
considerably different than in software. While the JPEG algorithm is relatively simple to
implement in software it does not easily translate to a hardware implementation. I also
gained a lot of valuable experience in project management.
It was also very instructive to see how the frequency transformations learned about
throughout my studies can be applied to a seemingly simple problem such as compression.
I have an increased awareness of the complexities of the JPEG algorithm.
The chosen prototype board produced by XESS worked well and was well documented. I
believe that the board is capable of being useful for a number of different design projects.
What Went Wrong
The failure of this project is due primarily to time delays and the complexity of the
project. Even though the project did not get finished, I believe that I successfully completed
a lot of work on the project and that my final progress, a proper simulation of the JPEG
decoder, is significant. The project turned out to be considerably more challenging than I
originally anticipated, and I believe that it would have been much better if I had taken only a
minimal course load during the second quarter of the project so that I could focus on the
project almost exclusively.
Future Work
First of all I believe that this project is too much for a single beginning engineer to handle
alone. Further work on this project should be attempted in a group where the team
members are well trained in VHDL programming and experienced in the design of digital
systems. Beyond finishing this project there are a number of related topics that would be
interesting to explore:
Prototype Board: Since this project did not succeed in getting the JPEG decoder
working on the prototype board, future work should start with replicating the
successful simulation results on the actual board.
Huffman Table Extraction: The Huffman Table format used in the JPEG decoder
is not the same as the information stored in a normal JPEG file. This could be
changed by extracting the useful form of the Huffman Tree from the data stored in
the file. I have done this with some simple C code, which could be used as a basis
for doing it in VHDL.
Color Space Transformation: The JPEG decoder does not currently use a specific
color space and generally treats the data as an array of bytes. For image
applications hardware usually uses the RGB color space, but many JPEG streams
use the YCbCr color space since the Cb and Cr coordinates can be stored more
efficiently [1].
Fast Cosine Transformation Realization: The Discrete Cosine Transformation is
very similar to the Discrete Fourier Transform, and the efficient algorithms for
computing the Discrete Fourier Transform, collectively known as the Fast Fourier
Transform algorithms, can be applied to improve the efficiency of the Discrete
Cosine Transformation as well. While I used a simple algorithm based on linear
transformations to implement the Inverse Discrete Cosine Transformation, a Fast
Cosine Transform implementation may be a better alternative.
JPEG Encoder: A JPEG encoder could also be developed as an extension of this project. I
would suggest that the encoder be designed to use a pre-determined Huffman
code tree and one of a few selected Quantization tables. This would decrease the
complexity of the design considerably. The corresponding decoder would also be
simplified since the codes would be known without having to extract them from the
data stream, which I found to be difficult and inefficient. Also the compression
would be better since the codes would not need to be explicitly stored in the file. I
believe that this type of implementation is better for a digital camera since many
features of the data set are well known and generality is not needed. (A digital
camera does not need to encode images of any size, for example, but only one of a few
different resolutions.)
JPEG 2000: The Joint Photographic Experts Group has now submitted a draft for a
completely new image compression standard to replace JPEG. This new standard
is expected to be approved later this year and quickly replace the current JPEG
standard. This new algorithm uses more advanced Wavelet analysis methods to
provide better compression, image quality and versatility. I believe that the design
of a JPEG 2000 system, perhaps using a Digital Signal Processor, would make a
good project.
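For the Huffman Table Extraction item above: a JPEG file stores each Huffman table as 16
counts (the number of codes of each length from 1 to 16 bits) followed by the values in
order of increasing code length, and the codes themselves are assigned in sequence. A
minimal Python sketch of rebuilding the codes from that form is given here. It is not the C
code mentioned above, the function name is hypothetical, and the example table is the small
one from Figure 3 rather than a table taken from a real file.

```python
def canonical_codes(counts, symbols):
    """Rebuild Huffman codes from per-length counts and the ordered value list.

    counts[i] is the number of codes of length i + 1 (16 entries);
    symbols lists the values in order of increasing code length.
    """
    codes = {}
    code = 0
    k = 0
    for length, n in enumerate(counts, start=1):
        for _ in range(n):
            codes[symbols[k]] = format(code, "0{}b".format(length))
            code += 1
            k += 1
        code <<= 1          # moving to the next length appends a zero bit
    return codes

# One code of length 1, one of length 2, two of length 3, none longer.
counts = [1, 1, 2] + [0] * 13
codes = canonical_codes(counts, ["00", "01", "10", "11"])
```

The resulting codes can be loaded into a tree or lookup table of whatever form the decoder
hardware expects.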
Reference Documents
The following is a list of the documents that I have referred to and found to be useful in the
course of this project.
[1] C. W. Brown and B. J. Shepherd, Graphics File Formats: Reference and Guide.
Greenwich, CT: Manning Publications Co., 1995.
[2] XESS Corporation, XSV Board v1.0 Manual. Apex, NC, 2000.
[3] Hsu et al., VHDL Modeling for Digital Design Synthesis. Norwell, MA: Kluwer
Academic Publishers, 1995.
[4] Xentec, JPEG_CODEC X_JPEG Short Form Datasheet,
http://www.xentec-inc.com/X-datasheets/x_jpeg_rev1.4.pdf (current June 11, 2000).
[5] T. Tran, Fast Multiplierless Approximation of the DCT, Johns Hopkins University,
ECE Department, http://thanglong.ece.jhu.edu/Tran/Pub/intDCT-SPL.pdf (current June
11, 2000).
[6] Collosseum Builders Inc, Image Library Source Code Version 3,
http://www.collosseumbuilders.com/imageformats/compressedimageformats.html
(current June 11, 2000).
Appendices
Fixed-Point Arithmetic
To efficiently compute the Inverse Discrete Cosine Transform in hardware a scaled
fixed-point arithmetic system was used to simulate real numbers. This numbering system
consisted of 16 bits of whole number followed by 8 bits of fractional part. This allows
numbers from -32,768 to 32,767.99609375 to be stored in 24 bits as follows.

\[
A = a_{15}\, a_{14} \ldots a_{1}\, a_{0} \,.\, a_{-1}\, a_{-2} \ldots a_{-8}
\]

Where

\[
a_i \in \{0, 1\}
\]

And

\[
A = \sum_{i=-8}^{15} a_i 2^{i} = 2^{-8} \sum_{i=0}^{23} a_{i-8}\, 2^{i} = 2^{-8} \hat{A}
\]

Where \(\hat{A}\) is a normal 24-bit binary number. Addition works as expected for this
system due to linearity.

\[
A + B = 2^{-8} \left( \hat{A} + \hat{B} \right)
\]

However multiplication does introduce an extra term.

\[
A \cdot B = 2^{-8} \cdot 2^{-8} \left( \hat{A} \cdot \hat{B} \right)
\]

To correct for this extra term the product of two numbers must be shifted right an extra 8
bits. In principle this could be done by modifying the multiplication algorithm to shift right
one extra time per iteration (as in the normal hand multiplication algorithm from grade
school), but this is not feasible when working with a pre-defined multiplication algorithm.
The need for the shift can be seen in a simple decimal example: 1.0 x 1.0 = 1.0, but when
scaled by ten to allow one fractional digit this becomes 10 x 10 = 100, which represents
10.0 unless we include an additional shift to arrive at the expected answer of 1.0.
With this scaled fixed-point system and the appropriate correction for multiplication I was
able to approximate the Inverse Discrete Cosine Transform using only standard integer
arithmetic.
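The scaled arithmetic can be tried out with ordinary integers. The sketch below is in Python
rather than VHDL and uses hypothetical helper names; it does not model overflow or
wrap-around at 24 bits, which the hardware would have to take into account.

```python
FRAC_BITS = 8
SCALE = 1 << FRAC_BITS               # 2**8 = 256

def to_fixed(x):
    """Convert a real number to its scaled integer representation."""
    return int(round(x * SCALE))

def to_float(a):
    """Convert a scaled integer back to a real number."""
    return a / SCALE

def fx_add(a, b):
    """Addition needs no correction."""
    return a + b

def fx_mul(a, b):
    """The raw product carries a factor of 2**-16, so shift right by 8 bits."""
    return (a * b) >> FRAC_BITS

one = to_fixed(1.0)
product = fx_mul(to_fixed(1.5), to_fixed(-2.25))
total = fx_add(to_fixed(0.5), to_fixed(0.25))
```

Here fx_mul(one, one) returns one again, mirroring the decimal example above, and
to_float(product) gives -3.375. Python's right shift on negative integers is an arithmetic
shift, so the sign is preserved.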
Schematics
The following sections show the schematics of the different components of the JPEG
decoder. They primarily depict the I/O characteristics of each device and their
interconnection structure. The implementation of these devices is described by a VHDL
process that is basically a high-level representation of a Finite State Machine with Datapath.
This behavior is best analyzed by reading the VHDL source presented later.
JPEG Decoder Unit

[Schematic: JPEG Decoder Unit block with ports in_req, go, inp(0:7), value(0:7), in_rdy,
rdy, rst, and clk.]
Here the JPEG Decoder Unit takes a stream of input data in the form of 8-bit words and
produces the JPEG decoded values from the data stream. The input data
stream begins with both the Huffman Code Table and the Quantization Table.
The decoder logic extracts these tables and sends them to the Huffman Decoder and
Quantization Decoder, which are serially connected along with the Inverse Discrete Cosine
Transform Unit. The output of the IDCT unit is then placed one entry at a time on the 8-bit
output value.
Huffman Decoder / Run-Length Decoder Unit

[Schematic: Huffman Decoder block with ports in_req, go, inp(0:7), value(15:0), in_rdy,
rdy, rst, clk, and code.]
Here code is the Huffman Code Table extracted from the input stream. Inp is an 8-bit
word from the data stream. Value is the Huffman and Run-Length decoded data value that
has been extracted from the data stream. The other signals are control signals.
Quantization Decoder Unit

[Schematic: Quantization Decoder block with ports in_req, go, in_val(15:0), outp, in_rdy,
rdy, rst, clk, and quan.]
Here quan is the Quantization Table extracted from the input stream. In_val is the 16-bit
word from the Huffman Decoder. Outp is the assembled 8x8 JPEG data block matrix. It
consists of 64 entries each a 16-bit value that is the product of one of the last 64 in_vals
from the Huffman Decoder and the corresponding term from the Quantization Table. These
are the restored DCT coefficients and are sent to the IDCT unit to recover the actual data
values. The other signals are control signals.
Inverse Discrete Cosine Transformation Unit

[Schematic: Inverse Discrete Cosine Transform block with ports X, O, go, rdy, rst, and clk.]
Here the IDCT Unit computes the Inverse Discrete Cosine Transformation of the input and
presents the result on the output. To do this the IDCT Unit has two matrix multiplication
sub-units and an output-rounding sub-unit, which are described next. First the input is
extended to form the internal scaled fixed-point form of the input matrix, then the two matrix
multiplications by the constant DCT matrix and its inverse are performed and then the output
is rounded back to an 8-bit integer valued matrix.
Matrix Multiplication Sub-Unit

[Schematic: Matrix Multiplier block with ports inp1, inp2, outp, go, rdy, rst, and clk.]
Here the Matrix Multiplier will compute inp1 pre-multiplied to inp2 (inp1*inp2) and deliver the
result on outp.
Output Rounding Sub-Unit

[Schematic: Output Rounder block with ports inpt, outp, go, rdy, rst, and clk.]
Here the Output Rounder will take the input, which is an 8x8 matrix of internal scaled
fixed-point values, and round it to an integral 8x8 matrix and output it.
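A possible form of this rounding step is sketched below in Python. The report does not say
how ties are rounded or how out-of-range results are handled, so rounding half upward and
clamping to the signed 8-bit range -128 to 127 are both assumptions, and the function names
are hypothetical.

```python
def round_fixed(a, frac_bits=8):
    """Round a scaled fixed-point value to the nearest integer (ties round up)."""
    return (a + (1 << (frac_bits - 1))) >> frac_bits

def round_block(block, lo=-128, hi=127):
    """Round every entry of a block and clamp it to the assumed 8-bit range."""
    return [[min(hi, max(lo, round_fixed(v))) for v in row] for row in block]

# 1.5, -1.5, 0.390625 and 156.25 in the 8-bit fractional format.
sample = [[384, -384, 100, 40000]]
rounded = round_block(sample)
```

Adding half of the scale factor before the shift turns the truncating shift into
round-to-nearest, which costs only one extra adder per output value.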
VHDL Source Code
JPEG Library
JPEG Decoder Unit
Huffman / Run-Length Decoder Unit
Quantization Decoder Unit
Inverse Discrete Cosine Transform Unit
Serial Input Controller
Serial Output Controller
Memory Input Controller
Matlab & C++ Code
Data Create & Test Matlab Script
Huffman Coding in C
DCT Test Matlab Code
Computation of DCT Coefficient Matrix
Quantization Testing in Matlab
Image DCT, Quantization, De-Quantization, IDCT Testing in Matlab
XESS XSV Board V1.0 Manual