Project Report
on
Implementation of 1D-DCT Based on FPGA with Verilog HDL
Submitted in partial fulfillment of the requirements for the degree of
Master in Technology
in
VLSI Design
of the
UNIVERSITY OF CALCUTTA
by
Niraj Kumar
Roll No.: 97/VLM/171004
Registration No.: 133-1121-0224-14
Under the supervision of
Dr. Pulak Mondal

Institute of Radio Physics and Electronics
University of Calcutta
2018-2019
UNIVERSITY OF CALCUTTA
UNIVERSITY COLLEGE OF SCIENCE, TECHNOLOGY AND AGRICULTURE
INSTITUTE OF RADIOPHYSICS AND ELECTRONICS
SISIR MITRA BHAVAN
92 ACHARYA PRAFULLA CHANDRA ROAD
KOLKATA 700 009, INDIA
Tel: +91-33-2350-9115/9116/9413
Fax: +91-33-2351-5828
TO WHOM IT MAY CONCERN
Niraj Kumar (Registration No. 133-1121-0224-14, Roll No. 97/VLM/171004) has
successfully carried out the M. Tech. project foundation work entitled “Implementation of
2D-DCT based on FPGA with Verilog HDL” in partial fulfillment of the requirements for
the degree of Master in Technology in VLSI Design, University of Calcutta, under my
supervision.
-------------------------------------------------
Dr. Pulak Mondal
Assistant Professor
Institute of Radio Physics & Electronics,
University of Calcutta
Date: 11/11/2018
Acknowledgement
I would like to take this opportunity to thank Dr. Pulak Mondal of the Institute of Radio
Physics and Electronics for their invaluable suggestions, encouragement and guidance.
---------------------------------------------
Niraj Kumar
Institute of Radiophysics & Electronics
University of Calcutta
92, Acharya Prafulla Chandra Road
Kolkata-700009
Date 11/11/2018
Contents
1. Introduction
2. Why JPEG
3. Steps Involved
4. JPEG Modes
5. Hardware Implementation
6. Study of Literature
7. Future Work
8. References
Introduction
Image compression is an important topic in the digital world, whether for
commercial photography, industrial imagery, or video. A digital image bitmap
can contain a considerable amount of data, causing significant overhead in
both computation and data handling. Storage media offer large capacity;
however, access speeds are typically inversely proportional to capacity. [8]
Compression is therefore important for managing large amounts of data on
networks, the internet, and storage media. Compression techniques have been
studied for years and will continue to improve.
Typically, image and video compressors and decompressors (codecs) are
implemented mainly in software, since signal processors can manage these
operations without incurring too much computational overhead. However, these
operations can also be implemented efficiently in hardware. Hardware-specific
codecs can be integrated into digital systems fairly easily. Improvements in
speed occur primarily because the hardware is tailored to the compression
algorithm rather than built to handle a broad range of operations, as a
digital signal processor is. Data compression itself is the process of
reducing information to a smaller data set that can be used to represent and
reproduce it. Image compression techniques are either lossless or lossy, and
are chosen to meet the needs of specific applications. JPEG compression can
be used as a lossless or a lossy process depending on the requirements of the
application. Both lossless and lossy compression techniques employ reduction
of redundant data.
Work in standardization has been controlled by the International Organization
for Standardization (ISO) in cooperation with the International
Electrotechnical Commission (IEC). The Joint Photographic Experts Group
produced the well-known and widely used JPEG image format. [2] JPEG provides
a solid baseline compression algorithm that can be modified in numerous ways
to fit any desired application. The JPEG specification was initially released
in 1991, although it does not specify a particular implementation.
• The goal of image compression is to reduce the amount of data required
to represent a digital image.
Why JPEG
The compression ratio of lossless methods (e.g., Huffman, Arithmetic, LZW) is
not high enough for image and video compression.
JPEG uses transform coding, which is largely based on the following observations:
Observation 1: A large majority of useful image contents change relatively
slowly across an image, i.e., it is unusual for intensity values to go up and
down several times in a small area, for example, within an 8 x 8 image block.
Translated into the spatial frequency domain, this means that lower spatial
frequency components generally contain more information than the high
frequency components, which often correspond to less useful detail and noise.
Observation 2: Experiments suggest that humans are less sensitive to the loss
of higher spatial frequency components than to the loss of lower frequency
components.
Block Diagram
Steps Involved
1. Discrete Cosine Transform (DCT) of each 8x8 pixel array:
f(x,y) -> F(u,v)
2. Quantization using a table or using a constant
3. Zig-zag scan to exploit redundancy
4. Differential Pulse Code Modulation (DPCM) on the DC component and
run-length coding of the AC components
5. Entropy coding (Huffman) of the final output
DCT: Discrete Cosine Transform
The DCT converts the information contained in an 8x8 block of pixels from the
spatial domain to the frequency domain.
A simple analogy: consider an unsorted list of 12 numbers between 0 and 3:
(2, 3, 1, 2, 2, 0, 1, 1, 0, 1, 0, 0). Consider a transformation of the list
involving two steps: (1) sort the list; (2) count the frequency of occurrence
of each of the numbers, giving (4, 4, 3, 1). Through this transformation we
lose the spatial information but capture the frequency information.
Other transformations, such as the Fourier transform and the DCT, retain the
spatial information, and therefore allow us to move back and forth between
the spatial and frequency domains.
DCT(Discrete Cosine Transform)
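As an illustration of the spatial-to-frequency conversion described above, the following Python sketch evaluates the 8x8 DCT directly from the standard DCT-II definition, F(u,v) = 1/4 c(u) c(v) sum over x,y of f(x,y) cos((2x+1)u*pi/16) cos((2y+1)v*pi/16). The function name dct2_8x8 and the flat test block are illustrative assumptions, not part of the reported design.

```python
import math

N = 8  # JPEG block size

def dct2_8x8(block):
    """Direct 2D DCT-II of an 8x8 block (list of 8 rows of 8 numbers)."""
    def c(k):
        # Normalisation factor: 1/sqrt(2) for the zero-frequency term
        return 1 / math.sqrt(2) if k == 0 else 1.0

    out = [[0.0] * N for _ in range(N)]
    for u in range(N):
        for v in range(N):
            total = 0.0
            for x in range(N):
                for y in range(N):
                    total += (block[x][y]
                              * math.cos((2 * x + 1) * u * math.pi / 16)
                              * math.cos((2 * y + 1) * v * math.pi / 16))
            out[u][v] = 0.25 * c(u) * c(v) * total
    return out

# A flat block has no spatial variation, so all of its energy ends up in
# the DC coefficient F(0,0) and every AC coefficient is (numerically) zero.
flat = [[100] * N for _ in range(N)]
F = dct2_8x8(flat)
print(round(F[0][0], 3))
```

This direct form needs 64 multiply-accumulate steps per coefficient, which is why hardware designs usually prefer a factored form.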
Quantization
Why? -- To reduce the number of bits per sample.
F'(u,v) = round(F(u,v) / q(u,v))
Example: 101101 = 45 (6 bits).
Truncate to 4 bits: 1011 = 11. (Compare 11 x 4 = 44 against 45.)
Truncate to 3 bits: 101 = 5. (Compare 5 x 8 = 40 against 45.)
Note that the more bits we truncate, the more precision we lose.
Quantization error is the main source of loss in lossy compression.
Uniform Quantization:
q(u,v) is a constant.
Non-uniform Quantization -- Quantization Tables
The eye is most sensitive to low frequencies (upper-left corner of the
frequency matrix) and less sensitive to high frequencies (lower-right corner).
Custom quantization tables can be put in the image/scan header.
The JPEG standard defines two default quantization tables, one each for
luminance and chrominance.
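The truncation example and the rounding formula above can be checked with a minimal Python sketch. The helper names truncate, quantize, and dequantize are hypothetical. Note that truncation and rounding can give different results for the same step size.

```python
def truncate(value, dropped_bits):
    """Drop the lowest bits, as in the 101101 -> 1011 example."""
    return value >> dropped_bits

def quantize(F, q):
    """F'(u,v) = round(F(u,v) / q(u,v)) for a single coefficient."""
    # Python's round() rounds exact halves to the nearest even integer.
    return round(F / q)

def dequantize(Fq, q):
    """Decoder side: scale the quantized value back up."""
    return Fq * q

value = 0b101101  # 45
for dropped_bits in (2, 3):
    step = 1 << dropped_bits
    kept = truncate(value, dropped_bits)
    # Prints the kept value and what it reconstructs to: 11 -> 44, 5 -> 40
    print(dropped_bits, kept, kept * step)

# Rounding with q = 8 gives 6 (reconstructs to 48), whereas truncation gave 5.
print(quantize(45, 8), dequantize(quantize(45, 8), 8))
```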
Zig-Zag Scan
Why? -- To group low-frequency coefficients at the top of the vector and
high-frequency coefficients at the bottom.
Maps the 8 x 8 matrix to a 1 x 64 vector.
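A minimal sketch of the zig-zag mapping follows, assuming the usual JPEG ordering that starts at the top-left coefficient and walks the anti-diagonals in alternating directions. The function names zigzag_order and zigzag_scan are illustrative.

```python
def zigzag_order(n=8):
    """Return (row, col) index pairs in zig-zag order for an n x n block."""
    order = []
    for s in range(2 * n - 1):  # s = row + col identifies one anti-diagonal
        rows = range(max(0, s - n + 1), min(s, n - 1) + 1)
        diagonal = [(r, s - r) for r in rows]  # listed top-right to bottom-left
        if s % 2 == 0:
            diagonal.reverse()  # even diagonals are walked bottom-left to top-right
        order.extend(diagonal)
    return order

def zigzag_scan(block):
    """Flatten a square block into a vector following the zig-zag order."""
    return [block[r][c] for r, c in zigzag_order(len(block))]

# Label each cell with its raster index so the scan order is visible.
block = [[8 * r + c for c in range(8)] for r in range(8)]
print(zigzag_scan(block)[:10])
```

Because low-frequency coefficients sit in the upper-left corner, they land at the start of the vector, and the mostly-zero high-frequency coefficients cluster at the end.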
DPCM on DC Components
The DC component value in each 8x8 block is large and varies across
blocks, but is often close to that of the previous block.
Differential Pulse Code Modulation (DPCM): encode the difference between
the DC values of the current and previous 8x8 blocks. Remember: smaller
number -> fewer bits.
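The DPCM step can be sketched in a few lines of Python. The function names and the sample DC values are assumptions for illustration, and the predictor is assumed to start at 0 for the first block.

```python
def dpcm_encode(dc_values):
    """Replace each DC value by its difference from the previous block's DC."""
    diffs, previous = [], 0  # predictor assumed to start at 0
    for dc in dc_values:
        diffs.append(dc - previous)
        previous = dc
    return diffs

def dpcm_decode(diffs):
    """Rebuild the DC values by accumulating the differences."""
    values, previous = [], 0
    for d in diffs:
        previous += d
        values.append(previous)
    return values

dc = [48, 40, 42, 42, 45]  # hypothetical DC values of five neighbouring blocks
diffs = dpcm_encode(dc)
print(diffs)
```

After the first block the differences are small numbers, which need fewer bits than the raw DC values.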
RLE on AC Components
The 1x64 vectors contain many zeros, more so towards the end of the vector.
Entries further along the vector hold higher-frequency DCT components, which
tend to capture less of the content and are often reduced to zero by the
quantization table.
Encode each series of 0s as a (skip, value) pair, where skip is the number of
zeros and value is the next non-zero component. Send (0,0) as the
end-of-block sentinel value.
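The (skip, value) scheme described above can be sketched as follows. This is a simplified illustration with hypothetical function names; it does not limit runs to 15 zeros, which a real baseline encoder has to handle separately.

```python
def rle_encode_ac(ac):
    """Encode AC coefficients as (skip, value) pairs, ending with (0, 0)."""
    pairs, run = [], 0
    for coeff in ac:
        if coeff == 0:
            run += 1  # count zeros until the next non-zero value
        else:
            pairs.append((run, coeff))
            run = 0
    pairs.append((0, 0))  # end-of-block sentinel; trailing zeros are implied
    return pairs

def rle_decode_ac(pairs, length=63):
    """Expand (skip, value) pairs back into a vector of the given length."""
    out = []
    for skip, value in pairs:
        if (skip, value) == (0, 0):
            break
        out.extend([0] * skip)
        out.append(value)
    out.extend([0] * (length - len(out)))  # fill the tail with zeros
    return out

ac = [5, 0, 0, -3, 0, 0, 0, 1] + [0] * 55  # 63 AC coefficients after zig-zag
pairs = rle_encode_ac(ac)
print(pairs)
```

Here 63 coefficients shrink to four pairs, because the long tail of zeros is covered by the single end-of-block pair.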
Entropy Coding: DC Components

DC components are differentially coded as (SIZE,Value)
The code for a Value is derived from the following table
Entropy Coding: DC Components (Contd.)
Example: if a DC component is 40 and the previous DC component is 48, the
difference is -8. It is therefore coded as 1010111:
101: the size of -8 read from the Size_and_Value table is 4, and the
corresponding Huffman code from the code table is 101.
0111: the value bits representing -8 (see Size_and_Value table).
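The worked example above can be reproduced with a short Python sketch. The helper names are hypothetical, and the SIZE codes in DC_SIZE_CODE are assumed to follow the typical luminance DC Huffman table (only sizes 0 to 5 are listed); an actual encoder may use a different table.

```python
def size_of(value):
    """SIZE category: number of bits needed for |value| (0 for value 0)."""
    return abs(value).bit_length()

def value_bits(value):
    """Value bits: plain binary if positive, one's complement if negative."""
    size = size_of(value)
    if size == 0:
        return ""
    if value < 0:
        value += (1 << size) - 1  # e.g. -8 -> 7 -> 0111
    return format(value, "0{}b".format(size))

# Assumed Huffman codes for SIZE 0..5 (typical luminance DC table)
DC_SIZE_CODE = {0: "00", 1: "010", 2: "011", 3: "100", 4: "101", 5: "110"}

def encode_dc(current, previous):
    """Code the DC difference as Huffman(SIZE) followed by the value bits."""
    diff = current - previous
    return DC_SIZE_CODE[size_of(diff)] + value_bits(diff)

print(encode_dc(40, 48))  # difference -8 -> 101 followed by 0111
```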
Entropy Coding: AC Components

AC components (range -1023..1023) are coded as (S1, S2) pairs:
S1: (Run Length/SIZE)
• Run Length: the number of consecutive zero values [0..15]
• SIZE: the number of bits needed to code the next nonzero
AC component's value [0-A]
• (0,0) is the End_Of_Block for the 8x8 block.
• S1 is Huffman coded (see AC code table below)
S2: (Value)
• Value: the value of the AC component (refer to the
Size_and_Value table)
Partial Huffman Table for AC Run/Size Pairs
JPEG modes
Sequential DCT Based
The sequential DCT-based mode of operation comprises the baseline JPEG
algorithm. This technique can produce very good compression ratios, at the
cost of some image quality. The sequential DCT-based mode achieves much of
its compression through quantization, which removes entropy from the data
set. Although this baseline algorithm is transform based, it does use a
measure of predictive coding, differential pulse code modulation (DPCM).
After each input 8x8 block of pixels is transformed to frequency space using
the DCT, the resulting block contains a single DC component and 63 AC
components. The DC component is predictively encoded as the difference
between the current DC value and the previous one. This mode uses only
Huffman coding models, not the arithmetic coding models used in JPEG
extensions. This mode is the most basic, but is still widely accepted for its
high compression ratios, which can fit many general applications very well.
Progressive DCT Based

Progressive DCT-based JPEG compression actually uses two complementary
coding methods. [9] The goal of this extension is to display low-quality
images during decompression which successively improve. The first method is
known as spectral selection: the data is compressed in bands. The first band
contains the DC components and a very few AC components, enough to produce an
image that is somewhat discernible. The second method is known as successive
approximation. This method at first quantizes the coefficients coarsely after
the DCT, giving a small data set that produces heavy blocking artifacts
during decompression. The following scans contain information about the
difference between the quantized and non-quantized coefficients, using finer
quantization steps. This allows the image to come slowly into focus during
decompression. Again, this method is used in applications where these
features are desired.
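The idea behind successive approximation can be shown on a single coefficient. The sketch below is a deliberately simplified illustration, not the coding procedure of the standard: it assumes a non-negative coefficient, sends a coarse value first, and then sends one refinement bit per later scan. The function names are hypothetical.

```python
def split_scans(coeff, low_bits):
    """First scan: coarse value; each later scan: one refinement bit."""
    assert coeff >= 0, "this sketch handles non-negative coefficients only"
    scans = [coeff >> low_bits]  # coarse quantization by dropping low bits
    for bit in range(low_bits - 1, -1, -1):
        scans.append((coeff >> bit) & 1)  # next most significant dropped bit
    return scans

def progressive_values(scans, low_bits):
    """Decoder's approximation of the coefficient after each scan."""
    value, shift = scans[0], low_bits
    approximations = [value << shift]
    for bit in scans[1:]:
        value = (value << 1) | bit  # append the refinement bit
        shift -= 1
        approximations.append(value << shift)
    return approximations

scans = split_scans(45, 3)
print(scans)
print(progressive_values(scans, 3))
```

Each scan moves the decoder's estimate closer to the true value, which is what lets the image sharpen gradually as more data arrives.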
Lossless Mode
Quite simply, this mode of JPEG experiences no loss when comparing the
source image to the reproduced image. This method does not use the discrete
cosine transform; rather, it uses predictive, differential coding. As it is
lossless, it also rules out the use of quantization. This method does not
achieve high compression ratios, but some applications do require extremely
precise image reproduction.
Hierarchical Mode
The hierarchical JPEG extension uses a multi-stage compression approach with
prediction, and can use the encoding methods from the progressive,
sequential, or lossless modes of operation. The strategy is to down-sample
the image in each dimension, then code this data set using one of the three
methods discussed: lossless, sequential, or progressive. The resulting
encoded data stream is decoded and up-sampled to recreate the source image.
The process then encodes the difference between the recreated image and the
source. This process can be repeated multiple times.
HARDWARE IMPLEMENTATION
1) 1D-DCT module
Assuming that the eight input points are x0, x1, x2, x3, x4, x5, x6, x7,
the 1D-DCT equations can be simplified as (4), based on the symmetry and
rotation properties of the DCT coefficients:
• Y0 = [(x0+x7)+(x1+x6)+(x2+x5)+(x3+x4)]*C4
     = (s07341625)*C4
• Y1 = (x0-x7)*C1+(x1-x6)*C3+(x2-x5)*C5+(x3-x4)*C7
     = f0_7*C1+f1_6*C3+f2_5*C5+f3_4*C7
• Y2 = [(x0+x7)-(x3+x4)]*C2+[(x1+x6)-(x2+x5)]*C6
     = (s07_34)*C2+(s16_25)*C6
• Y3 = (x0-x7)*C3+(x1-x6)*(-C7)+(x2-x5)*(-C1)+(x3-x4)*(-C5)
     = f0_7*C3+f1_6*(-C7)+f2_5*(-C1)+f3_4*(-C5)
• Y4 = [(x0+x7)+(x3+x4)-(x1+x6)-(x2+x5)]*C4
     = (s0734_1625)*C4
• Y5 = (x0-x7)*C5+(x1-x6)*(-C1)+(x2-x5)*C7+(x3-x4)*C3
     = f0_7*C5+f1_6*(-C1)+f2_5*C7+f3_4*C3
• Y6 = [(x0+x7)-(x3+x4)]*C6-[(x1+x6)-(x2+x5)]*C2
     = (s07_34)*C6+(s16_25)*(-C2)
• Y7 = (x0-x7)*C7+(x1-x6)*(-C5)+(x2-x5)*C3+(x3-x4)*(-C1)
     = f0_7*C7+f1_6*(-C5)+f2_5*C3+f3_4*(-C1)
where
C1 = 1/2 * cos(π/16)
C2 = 1/2 * cos(2π/16)
C3 = 1/2 * cos(3π/16)
C4 = 1/2 * cos(4π/16)
C5 = 1/2 * cos(5π/16)
C6 = 1/2 * cos(6π/16)
C7 = 1/2 * cos(7π/16)
The pipeline architecture shown in Fig. 2 is used to realize the 1D-DCT
calculation described in (4). It makes it possible to input the next eight
data points continuously, which improves the processing speed.
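Before the equations in (4) are committed to hardware, they can be checked against the textbook DCT definition with a software reference model. The Python sketch below is such a model under stated assumptions: it reuses the intermediate signal names from (4) where possible (s0734 and s1625 are added helper names), and the sample input vector is arbitrary. It is not the Verilog design itself.

```python
import math

# C[1]..C[7] as defined above: Ck = 1/2 * cos(k*pi/16); C[0] is unused
C = [None] + [0.5 * math.cos(k * math.pi / 16) for k in range(1, 8)]

def dct1d_fast(x):
    """8-point 1D-DCT using the sum/difference form of (4)."""
    # Stage 1: sums and differences of mirrored inputs
    s07, s16, s25, s34 = x[0] + x[7], x[1] + x[6], x[2] + x[5], x[3] + x[4]
    f0_7, f1_6, f2_5, f3_4 = x[0] - x[7], x[1] - x[6], x[2] - x[5], x[3] - x[4]
    # Stage 2: second-level sums and differences for the even outputs
    s07_34, s16_25 = s07 - s34, s16 - s25
    s0734, s1625 = s07 + s34, s16 + s25
    return [
        (s0734 + s1625) * C[4],                                  # Y0
        f0_7 * C[1] + f1_6 * C[3] + f2_5 * C[5] + f3_4 * C[7],   # Y1
        s07_34 * C[2] + s16_25 * C[6],                           # Y2
        f0_7 * C[3] - f1_6 * C[7] - f2_5 * C[1] - f3_4 * C[5],   # Y3
        (s0734 - s1625) * C[4],                                  # Y4
        f0_7 * C[5] - f1_6 * C[1] + f2_5 * C[7] + f3_4 * C[3],   # Y5
        s07_34 * C[6] - s16_25 * C[2],                           # Y6
        f0_7 * C[7] - f1_6 * C[5] + f2_5 * C[3] - f3_4 * C[1],   # Y7
    ]

def dct1d_direct(x):
    """Reference 8-point DCT-II computed straight from the definition."""
    out = []
    for k in range(8):
        scale = 0.5 * (1 / math.sqrt(2) if k == 0 else 1.0)
        out.append(scale * sum(x[n] * math.cos((2 * n + 1) * k * math.pi / 16)
                               for n in range(8)))
    return out

x = [52, 55, 61, 66, 70, 61, 64, 73]  # arbitrary sample row of pixel values
fast, direct = dct1d_fast(x), dct1d_direct(x)
print(max(abs(a - b) for a, b in zip(fast, direct)))  # agreement error
```

The factored form needs at most four multiplications per output instead of eight, and the shared stage-1 sums and differences map naturally onto the first pipeline stage.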
Fig. 2. Computing architecture of the 1D-DCT
The method of calculating Y2 is shown in Fig.
Study of Literature
1) Implementation of 2D-DCT Based on FPGA with Verilog HDL: The Discrete
Cosine Transform is widely used in image compression. This paper describes
the FPGA implementation of a two-dimensional 8×8-point Discrete Cosine
Transform (2D-DCT) processor in Verilog HDL for image processing
applications. The row-column decomposition algorithm and pipelining are used
to produce a high-quality circuit design with a maximum clock frequency of
318 MHz when implemented in a Xilinx Virtex-II Pro FPGA chip. (Yunqing Ye
and Shuying Cheng)
2) FPGA Implementation of Pipelined 2D-DCT and Quantization Architecture
for JPEG Image Compression: The two-dimensional DCT plays an important role
in JPEG image compression. This paper describes the architecture and VHDL
design of a 2D-DCT combined with quantization and zig-zag arrangement, for
use in JPEG image compression. The DCT calculation uses a scaled DCT: the
output of the DCT module must be multiplied by a post-scaler value to obtain
the real DCT coefficients, and this post-scaling is done together with
quantization. The 2D-DCT is computed by combining two 1D-DCTs connected by a
transpose buffer. The design is aimed at the low-cost Spartan-3E XC3S500
FPGA. The 2D-DCT architecture uses 3174 gates, 1145 slices, 21 I/O pins, and
11 multipliers of one Xilinx Spartan-3E XC3S500E FPGA and reaches an
operating frequency of 84.81 MHz. One input block of 8 x 8 elements of 8 bits
each is processed in 2470 ns, and the pipeline latency is 123 clock cycles.
(Enas Dhuhri Kusuma and Thomas Sri Widodo)
3) Efficient hardware architecture for direct 2D DCT computation and its
FPGA Implementation: This paper proposes a low-complexity architecture for
direct 2D-DCT computation. The architecture transforms the pixels from the
spatial to the spectral domain within the quality constraints of the
compression standards. In previous work the authors introduced a fast 2D-DCT
with few computations: only 40 additions are used and no multiplications are
needed. Based on that algorithm, this work develops a new architecture that
computes the 2D-DCT directly without any transposition memory. The 2D-DCT
architecture is built from Sk function blocks; each Sk block performs 8
functions depending on the control signals of the system. The number of
additions/subtractions used is 63, but no multiplication or transposition
memory is needed. The architecture is suitable for use with statistical
rules that predict the zero-quantized coefficients, which can considerably
reduce the number of computations. The design was implemented on a Cyclone 3
FPGA; it reaches up to 244 MHz, uses 1188 logic elements, and meets real-time
video requirements. (Anas Hatim, Said Belkouch, Tayeb Sadiki)
4) A combined forward and inverse 2D-DCT is designed using VHDL. The paper
proposes an architecture based on row-column decomposition for computing
the 2D-DCT. Parallel processing causes latency in the system: 113 clock
cycles for the 2D-DCT and 112 for the 2D-IDCT. Implemented in a Xilinx
Kintex-7 FPGA chip, the design completes the 2D-DCT/IDCT logic operations
correctly at a clock frequency of 161.793 MHz for the forward transform and
141.709 MHz for the inverse, with a clock period of 6.181 ns. Using the
row-column decomposition algorithm reduces the number of calculations.
(Ankita Selokar, A. C. Kailuke, V. B. Bagde)
5) For the past few years, a joint ISO/CCITT committee known as JPEG (Joint
Photographic Experts Group) has been working to establish the first
international compression standard for continuous-tone still images, both
grayscale and color. JPEG's proposed standard aims to be generic, to support
a wide variety of applications for continuous-tone images. To meet the
differing needs of many applications, the JPEG standard includes two basic
compression methods, each with various modes of operation. A DCT-based method
is specified for "lossy" compression, and a predictive method for "lossless"
compression. JPEG features a simple lossy technique known as the Baseline
method, a subset of the other DCT-based modes of operation. The Baseline
method has been by far the most widely implemented JPEG method to date, and
is sufficient in its own right for a large number of applications. This
article provides an overview of the JPEG standard, and focuses in detail on
the Baseline method. (George K. Wallace)
FUTURE WORK
The current work implements an 8×8 block; in future it may be extended to
16×16 and 32×32 blocks, up to a standard image size. The row-column
decomposition method reduces hardware complexity compared with other
methods. As the block size increases, the hardware required also increases,
but it can still be implemented easily with the row-column decomposition
method.
REFERENCES
1) Yunqing Ye, Shuying Cheng, “Implementation of 2D-DCT Based on FPGA
with Verilog HDL”, Electronics and Signal Processing, LNEE 97, pp. 633-639,
2011.
2) A. Hatim, S. Belkouch, “Efficient hardware architecture for direct 2D DCT
computation and its FPGA Implementation”, 2010.
3) G. K. Wallace, “The JPEG Still Picture Compression Standard”,
Communications of the ACM, Vol. 34, Issue 4, pp. 30-44, 1991.
4) E. D. Kusuma, T. S. Widodo, “FPGA implementation of pipelined 2D-DCT
and quantization architecture for JPEG image compression”, Intern
Symp Inform Tech (Itsim), pp. 1-6, 2010.
5) Ankita Selokar, A. C. Kailuke, V. B. Bagde, “FPGA Implementation of
Forward 2D-DCT and Inverse 2D-DCT Based On Row-Column Decomposition
Method”, Vol. 3, Issue 7, July 2015.
