Abstract: A two-dimensional discrete cosine transform (2D-DCT) module for the JPEG image compression system is designed. To balance resource usage against speed on the FPGA chip, two identical 1D-DCT modules are used to build the FPGA design of the 2D-DCT. The pipelining levels in the module are also analyzed and optimized. Simulation and test results for the whole system on the EP1C6Q240C8 show that it performs the integer DCT of a 4×4 block in twelve clock cycles while consuming 10% of the device resources. The design is an exploratory attempt and a useful reference for the IP core design of a JPEG encoder system and its FPGA implementation.

Keywords: JPEG; Discrete Cosine Transform; Pipelining level; FPGA

I. INTRODUCTION
The H.264 digital coding standard, developed by the JVT (Joint Video Team) formed by the ITU-T VCEG (Video Coding Experts Group) and the ISO/IEC MPEG (Moving Picture Experts Group), is aimed at high-quality coding of video content at very low bit-rates [1]. The DCT plays a key role in several image compression standards, including JPEG for still picture compression [2]. It transforms a signal or image from the spatial domain to the frequency domain and is similar to the DFT (Discrete Fourier Transform). One primary advantage of the DCT over the DFT is that the former involves only real multiplications, which reduces the total number of required multiplications. The other advantage is that for most images much of the signal energy lies at low frequencies, and the high-frequency coefficients are often small enough to be neglected with little visible distortion. For image data, the DCT concentrates the energy into the lower-order coefficients better than the DFT does.

However, the traditional DCT, when used to transform intra-frame or inter-frame prediction residuals, requires a complex hardware design for floating-point calculations, and rounding errors then cause a mismatch between the encoder and the decoder. To overcome these shortcomings, the DCT of the H.264 standard is modified so that it can be realized with integer additions, subtractions and shift operations. As a result, ignoring the effect of quantization, the decoder can recover the encoder input exactly, at the cost of a slight loss in compression performance.

The absence of multiplications in the integer DCT of H.264 greatly reduces the computational complexity and ensures precise transform results. However, the integer DCT is still a computationally intensive process. The architecture of the FPGA (Field Programmable Gate Array) is well suited to parallel and pipelined processing, and the trade-off between resources and speed is an important factor in FPGA design. In this paper, an FPGA-based 2D-DCT (two-dimensional DCT) module built from two identical 1D-DCT modules is optimized through an analysis of its pipelining levels, and a group of input data is used to verify that the algorithm and the hardware design meet the requirements of the JPEG compression system.

II. THEORY OF THE INTEGER TRANSFORM FOR H.264
The 2D-DCT can be broken into two sequential 1D-DCT operations by row-column decomposition, one along the row vectors and the second along the column vectors of the preceding row results [3, 4]. The N-point 1D-DCT can be expressed as

$$ y_k = C_k \sum_{n=0}^{N-1} x_n \cos\frac{(2n+1)k\pi}{2N} \tag{1} $$

and the N-point 1D-IDCT is expressed as

$$ x_n = \sum_{k=0}^{N-1} C_k \, y_k \cos\frac{(2n+1)k\pi}{2N} \tag{2} $$

where x_n is the nth item of the input sequence in the time domain and y_k is the kth item of the output sequence in the frequency domain. The coefficient C_k is defined as

$$ C_k = \begin{cases} \sqrt{\dfrac{1}{N}}, & k = 0 \\[2mm] \sqrt{\dfrac{2}{N}}, & k = 1, 2, \ldots, N-1 \end{cases} $$

The 2D-DCT of an N×N image block can be computed as a 1D-DCT over the rows of the block followed by a 1D-DCT over its columns [5]:

$$ Y_{mn} = C_m C_n \sum_{i=0}^{N-1} \sum_{j=0}^{N-1} X_{ij} \cos\frac{(2j+1)n\pi}{2N} \cos\frac{(2i+1)m\pi}{2N} \tag{3} $$

$$ X_{ij} = \sum_{m=0}^{N-1} \sum_{n=0}^{N-1} C_m C_n Y_{mn} \cos\frac{(2j+1)n\pi}{2N} \cos\frac{(2i+1)m\pi}{2N} \tag{4} $$

where X_ij is the prediction residual at row i and column j of the block X, and Y_mn is the DCT coefficient at row m and column n of the coefficient matrix Y. In matrix form,

$$ Y = A X A^{T} \tag{5} $$

$$ X = A^{T} Y A \tag{6} $$

where the elements of the N×N transform matrix A are
Authorized licensed use limited to: SARDAR VALLABHBHAI NATIONAL INSTITUTE OF TECH. Downloaded on August 16,2010 at 05:54:12 UTC from IEEE Xplore. Restrictions apply.
$$ A_{ij} = C_i \cos\frac{(2j+1)i\pi}{2N} \tag{7} $$

For the 2D-DCT of a 4×4 image block, such as a luminance block or a chrominance block, the corresponding transform matrix A is

$$ A = \begin{bmatrix}
\frac{1}{2}\cos(0) & \frac{1}{2}\cos(0) & \frac{1}{2}\cos(0) & \frac{1}{2}\cos(0) \\
\sqrt{\frac{1}{2}}\cos\left(\frac{\pi}{8}\right) & \sqrt{\frac{1}{2}}\cos\left(\frac{3\pi}{8}\right) & \sqrt{\frac{1}{2}}\cos\left(\frac{5\pi}{8}\right) & \sqrt{\frac{1}{2}}\cos\left(\frac{7\pi}{8}\right) \\
\sqrt{\frac{1}{2}}\cos\left(\frac{2\pi}{8}\right) & \sqrt{\frac{1}{2}}\cos\left(\frac{6\pi}{8}\right) & \sqrt{\frac{1}{2}}\cos\left(\frac{10\pi}{8}\right) & \sqrt{\frac{1}{2}}\cos\left(\frac{14\pi}{8}\right) \\
\sqrt{\frac{1}{2}}\cos\left(\frac{3\pi}{8}\right) & \sqrt{\frac{1}{2}}\cos\left(\frac{9\pi}{8}\right) & \sqrt{\frac{1}{2}}\cos\left(\frac{15\pi}{8}\right) & \sqrt{\frac{1}{2}}\cos\left(\frac{21\pi}{8}\right)
\end{bmatrix} \tag{8} $$

With $a = \frac{1}{2}$, $b = \sqrt{\frac{1}{2}}\cos\left(\frac{\pi}{8}\right)$ and $c = \sqrt{\frac{1}{2}}\cos\left(\frac{3\pi}{8}\right)$, the transform matrix A becomes

$$ A = \begin{bmatrix} a & a & a & a \\ b & c & -c & -b \\ a & -a & -a & a \\ c & -b & b & -c \end{bmatrix} \tag{9} $$

Obviously, a, b and c are real numbers, while the elements of the block X are integers. With the real-valued DCT, rounding errors in the floating-point arithmetic cause a mismatch between the encoder and the decoder. H.264 relies on prediction more heavily than other image coding schemes, and even its intra coding mode depends on spatial prediction, so H.264 is very sensitive to prediction drift. Following the integer DCT approach, we adjust the elements of matrix A to integers. This effectively reduces the computation cost without loss of image accuracy. Equation (5) is equivalent to

$$ Y = (C X C^{T}) \otimes E = \left( \begin{bmatrix} 1 & 1 & 1 & 1 \\ 1 & d & -d & -1 \\ 1 & -1 & -1 & 1 \\ d & -1 & 1 & -d \end{bmatrix} X \begin{bmatrix} 1 & 1 & 1 & d \\ 1 & d & -1 & -1 \\ 1 & -d & -1 & 1 \\ 1 & -1 & 1 & -d \end{bmatrix} \right) \otimes \begin{bmatrix} a^2 & ab & a^2 & ab \\ ab & b^2 & ab & b^2 \\ a^2 & ab & a^2 & ab \\ ab & b^2 & ab & b^2 \end{bmatrix} \tag{10} $$

where d = c/b (≈ 0.414) and ⊗ denotes element-by-element multiplication: each element of the product (C X C^T) is multiplied by the corresponding element of the matrix E. Generally, d is set to 0.5 to simplify the calculation, and b is changed to $b = \sqrt{2/5}$ to keep the transform orthogonal. In addition, the second and fourth rows of matrix C and the second and fourth columns of matrix C^T are multiplied by 2, which turns C into C_f and the matrix E into the matrix E_f:

$$ Y = (C_f X C_f^{T}) \otimes E_f = \left( \begin{bmatrix} 1 & 1 & 1 & 1 \\ 2 & 1 & -1 & -2 \\ 1 & -1 & -1 & 1 \\ 1 & -2 & 2 & -1 \end{bmatrix} X \begin{bmatrix} 1 & 2 & 1 & 1 \\ 1 & 1 & -1 & -2 \\ 1 & -1 & -1 & 2 \\ 1 & -2 & 1 & -1 \end{bmatrix} \right) \otimes \begin{bmatrix} a^2 & \frac{ab}{2} & a^2 & \frac{ab}{2} \\ \frac{ab}{2} & \frac{b^2}{4} & \frac{ab}{2} & \frac{b^2}{4} \\ a^2 & \frac{ab}{2} & a^2 & \frac{ab}{2} \\ \frac{ab}{2} & \frac{b^2}{4} & \frac{ab}{2} & \frac{b^2}{4} \end{bmatrix} \tag{11} $$

Here each output element requires only one multiplication by an element of the matrix E_f, and this multiplication can be merged into the quantization operation. The expression (C_f X C_f^T) therefore needs only integer additions, subtractions and shifts, and the actual output of the integer DCT is

$$ W = C_f X C_f^{T} \tag{12} $$

As mentioned above, the matrix multiplication can be replaced by two integer 1D-DCT transforms built from additions and shifts. The integer 2D-DCT of H.264 can therefore be divided into two steps. First, each column of the image or residual block is transformed by a 1D-DCT. Second, each row of the result of the first step is transformed by a 1D-DCT. The integer 2D-DCT is thus realized with integer 1D-DCTs, and the butterfly calculation removes the multiplications and reduces the computation time.

Fig. 1. The module of 1D-DCT

TABLE I. BUTTERFLY PROCESSING
Type      Input value               Operation process      Output value
Forward   x[0], x[1], x[2], x[3]    M[0] = x[0] + x[3]     X[0] = M[0] + M[1]
                                    M[3] = x[0] - x[3]     X[2] = M[0] - M[1]
                                    M[1] = x[1] + x[2]     X[1] = M[2] + (M[3] << 1)
                                    M[2] = x[1] - x[2]     X[3] = M[3] - (M[2] << 1)
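
The butterfly of Table I and the two-step column/row decomposition can be sketched in software as follows. This is a minimal Python sketch for checking the arithmetic, not the FPGA design described in the paper: the function names (`butterfly_1d`, `core_2d`, `scale`) and the sample block `X_SAMPLE` are assumptions introduced only for illustration. It compares the add/subtract/shift butterfly against the plain matrix product W = C_f X C_f^T and applies the scaling by E_f with a = 1/2 and b = sqrt(2/5).

```python
import math

# Integer core matrix C_f as it appears in equation (11).
CF = [
    [1,  1,  1,  1],
    [2,  1, -1, -2],
    [1, -1, -1,  1],
    [1, -2,  2, -1],
]

# Hypothetical 4x4 residual block, chosen only for illustration.
X_SAMPLE = [
    [5, 11, 8, 10],
    [9, 8, 4, 12],
    [1, 10, 11, 4],
    [19, 6, 15, 7],
]


def butterfly_1d(x):
    """Forward 4-point integer transform of Table I (add, subtract, shift only)."""
    m0 = x[0] + x[3]
    m3 = x[0] - x[3]
    m1 = x[1] + x[2]
    m2 = x[1] - x[2]
    return [m0 + m1, m2 + (m3 << 1), m0 - m1, m3 - (m2 << 1)]


def transpose(m):
    return [list(row) for row in zip(*m)]


def matmul(a, b):
    """Plain matrix product, used only as a reference for the butterfly."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def core_2d(block):
    """W = C_f X C_f^T: transform every column, then every row of the result."""
    step1 = transpose([butterfly_1d(col) for col in transpose(block)])  # C_f X
    return [butterfly_1d(row) for row in step1]                         # (C_f X) C_f^T


def scale(w):
    """Y = W (element-wise) E_f, with a = 1/2 and b = sqrt(2/5)."""
    a, b = 0.5, math.sqrt(2 / 5)
    even = [a * a, a * b / 2, a * a, a * b / 2]
    odd = [a * b / 2, b * b / 4, a * b / 2, b * b / 4]
    ef = [even, odd, even, odd]
    return [[w[i][j] * ef[i][j] for j in range(4)] for i in range(4)]


W = core_2d(X_SAMPLE)
Y = scale(W)
REFERENCE = matmul(matmul(CF, X_SAMPLE), transpose(CF))
energy_in = sum(v * v for row in X_SAMPLE for v in row)
energy_out = sum(v * v for row in Y for v in row)
print("butterfly matches matrix product:", W == REFERENCE)
print("energy preserved after scaling:", math.isclose(energy_in, energy_out))
```

Because the rows of C_f are mutually orthogonal, the scaled output Y should carry the same energy as the input block, which is a convenient sanity check for any hardware test vector generated this way.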
Fig. 1 2D-DCT
IV. CONCLUSIONS
In this paper we have analyzed the principle of the integer 2D-DCT and the pipelining levels of a module built from two 1D-DCT modules. The FPGA implementation uses two pipelined 1D-DCT modules instead of reusing a single 1D-DCT module twice. This doubles the speed, while the increase in resource consumption is relatively small. The design is an exploratory attempt and a useful reference for the IP core design of a JPEG encoder system and its FPGA implementation.
Fig. 4. The module of 2D-DCT

REFERENCES
[1] T. Wiegand, G. J. Sullivan, "Overview of the H.264/AVC video coding standard," IEEE Transactions on Circuits and Systems for Video Technology, vol. 13, no. 7, July 2003.
[2] H. Kim, H. Jeong and Y. Lee, "Efficient DCT/IDCT and quantization implementation of multimedia ASIP for H.264,"
[3] C. Loeffler, A. Ligtenberg, "Practical fast 1-D DCT algorithms with 11 multiplications," Proceedings of the International Conference on Acoustics, Speech and Signal Processing (ICASSP'89), Scotland, May 1989, pp. 988-991.
[4] H. EL-Banna, A. A. EL-Fattah, W. Fakhr, "An Efficient Implementation of the 1D DCT using FPGA Technology," Proceedings of the 11th IEEE International Conference and Workshop on the Engineering of Computer Based Systems (ECBS'04), May 2004, pp. 356-360. doi:10.1109/ECBS.2004.1316719
[5] L. V. Agostini, I. S. Silva, "Pipelined fast 2D-DCT architecture for JPEG image compression," 14th Symposium on Integrated Circuits and Systems Design, 2001, pp. 226-231.