ISSN 1796-2048
Volume 6, Number 6, December 2011
Contents
REGULAR PAPERS
An Adaptive Algorithm for Improving the Fractal Image Compression (FIC) ......... 477
    Taha Mohammed Hasan and Xiangqian Wu
Information Loss Determination on Digital Image Compression and Reconstruction Using Qualitative and Quantitative Analysis ......... 486
    Zhengmao Ye, Habib Mohamadian, and Yongmao Ye
An Improved Fast SPIHT Image Compression Algorithm for Aerial Applications ......... 494
    Ning Zhang, Longxu Jin, Yinhua Wu, and Ke Zhang
3D Tracking and Positioning of Surgical Instruments in Virtual Surgery Simulation ......... 502
    Zhaoliang Duan, Zhiyong Yuan, Xiangyun Liao, Weixin Si, and Jianhui Zhao
Design of Image Security System Based on Chaotic Maps Group ......... 510
    Feng Huang and Xilong Qu
The Capture of Moving Object in Video Image ......... 518
    Weina Fu, Zhiwen Xu, Shuai Liu, Xin Wang, and Hongchang Ke
Skeletonization of Deformed CAPTCHAs Using Pixel Depth Approach ......... 526
    Jingsong Cui, Lu Liu, Gang Du, Ying Wang, and Qianqi Guan
An Adaptive Algorithm for Improving the Fractal
Image Compression (FIC)
Taha Mohammed Hasan
School of Computer Science and Technology, Harbin Institute of Technology (HIT)
Harbin 150001, China
taha_alzaidy@yahoo.com
Xiangqian Wu
School of Computer Science and Technology, Harbin Institute of Technology (HIT),
Harbin 150001, China
xqwu@hit.edu.cn
Abstract: In this paper an adaptive algorithm is proposed to reduce the long encoding time of the Fractal Image Compression (FIC) technique. The algorithm reduces the number of matching operations between range and domain blocks by reducing both the range and the domain blocks needed in the matching process. For this purpose, two techniques are proposed. The first, called Range Exclusion (RE), uses a variance factor to reduce the number of range blocks by excluding ranges of homogeneous (flat) regions from the matching process. The second, called Reducing the Domain Image Size (RDIZ), reduces the domain pool by shrinking the domain image to 1/16th of the original image size, instead of the 1/4th used in traditional FIC. This in turn affects the encoding time, compression ratio and reconstructed image quality. To obtain the best results, the two techniques are coupled in one algorithm, called RD-RE. The tested 256x256 gray images are partitioned into fixed 4x4 blocks and then compressed using Visual C++ 6.0 code. The results show that the RE technique is faster and achieves a higher compression ratio than traditional FIC while keeping high reconstructed image quality, while RD-RE is faster still and achieves a higher compression ratio than RE, at the cost of a slight loss in reconstructed image quality.
Index Terms: fractal, range block, variance, image compression, encoding time
I. INTRODUCTION
Compression and decompression technology for digital images has become an important aspect of storing and transferring digital images in the information society [1]. Recently, fractal compression of digital images has attracted much attention [2]. M. Barnsley introduced the fundamental principle of fractal image compression in 1988 [3]. Fractal theories are quite different from other approaches. Fractal image compression is also called fractal image encoding because the compressed image is represented by contractive transforms and the mathematical functions required to reconstruct the original image, instead of pixel data [4]. One of the most important characteristics of fractal image coding is the asymmetry of its encoding and decoding processes: the coding time is rather long, due to domain codebook generation and domain-range matching, while the decoding algorithm is relatively simple and fast. This weakness has kept the fractal compression method from being widely used as a standard compression scheme, although it offers fast decompression as well as very high compression ratios [5].
Mathematically, FIC is based on the theory of Iterated Function Systems (IFS), and its performance relies on the presence of self-similarity between the regions of an image. Since most images possess a high degree of self-similarity, fractal compression provides an excellent tool for compressing them [6]. FIC consists of finding a set of transformations that produces a fractal image approximating the original image [7].
In the IFS coding scheme, three main processes must be performed. First, range creation: the image is partitioned into non-overlapping blocks (ranges) [8]. Second, domain creation: the domain is created by averaging every four (2x2) adjacent pixels of the range image into one pixel of the domain, which means the domain image is a quarter of the size of the range image. Fig. 1 shows an example of range and domain block sizes. Third, the matching process: for every range block, a similar domain block is found using IFS mapping. The blocks of the compressed image are represented by the IFS mapping coefficients [9].
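The 2x2 averaging step in the domain-creation process can be sketched as follows. This is an illustrative Python fragment, not the authors' implementation (which was written in Visual C++ 6.0); the function name `downsample` is ours:

```python
def downsample(img, step=2):
    """Build the domain image by averaging every step x step block of
    pixels into one pixel; with step=2 the result is 1/4 of the original
    image size, as in traditional FIC."""
    h, w = len(img), len(img[0])
    return [[sum(img[y * step + dy][x * step + dx]
                 for dy in range(step) for dx in range(step)) / step ** 2
             for x in range(w // step)]
            for y in range(h // step)]

# A 4x4 image shrinks to a 2x2 domain image of block averages.
img = [[0, 1, 2, 3],
       [4, 5, 6, 7],
       [8, 9, 10, 11],
       [12, 13, 14, 15]]
dom = downsample(img)  # [[2.5, 4.5], [10.5, 12.5]]
```

Applying the same function with `step=4` produces the reduced domain image used later by the RDIZ technique.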
FIC suffers from the long time spent in the compression process because there is a huge number of matching operations: every range is compared with every domain, for each of the eight symmetry cases [10].
In the decoding process, the compressed image is reconstructed from the IFS code saved in the codebook file. Reconstruction starts from an arbitrary image and iterates the affine transformation parameters; according to the contractive mapping theorem, the reconstructed image converges to the attractor after about 8 iterations [11]. Fig. 2 shows the main process of the FIC model.

JOURNAL OF MULTIMEDIA, VOL. 6, NO. 6, DECEMBER 2011 477
2011 ACADEMY PUBLISHER
doi:10.4304/jmm.6.6.477-485
Large efforts have been undertaken to speed up the encoding process. Most of the proposed techniques attempt to accelerate the search and are based on some kind of feature vector assigned to ranges and domains. A different route to increased speed is to search less, as opposed to searching faster [12]. In this work, the proposed algorithm speeds up FIC by searching less: homogeneous ranges are excluded from the search process, and the domain pool is minimized.
Figure 1. Construction domain block from the range block
a) Encoding Unit
b) Decoding Unit
Figure 2. Fractal Image Compression System Model
II. SELF SIMILARITY
In mathematics, a self-similar object is exactly or
approximately similar to a part of itself (i.e. the whole has
the same shape as one or more of the parts). Many objects
in the real world, such as coastlines, are statistically self-
similar: parts of them show the same statistical properties
at many scales. Self-similarity is a typical property of
fractals. Scale invariance is an exact form of self-
similarity where at any magnification there is a smaller
piece of the object that is similar to the whole. For
instance, a side of the Koch snowflake is both
symmetrical and scale-invariant; it can be continually
magnified 3x without changing shape [13].
Natural images are not exactly self-similar, but they can be partially constructed from affine transformations of small parts of themselves. Self-similarity indicates that small portions of the image resemble larger portions of the same image; the search for this resemblance forms the basis of the fractal compression scheme [15]. Therefore the image must be partitioned into blocks to find self-similarity in other portions of the same image. This is intrinsic to fractal encoding techniques.
Fig. 3 shows that a self-similar portion of the image can be found: there is a reflection of the hat in the mirror, and the reflected portion can be obtained by an affine transformation of a small portion of the hat. Parts of the shoulder are almost identical [11].
Figure 3. An example shows the self-similarity in Lenna image
This paper builds on the idea that self-similarity in images depends on the image features: in FIC, the matching process searches for self-similar portions of the image at different scales, so if a partition of the image being searched has many details, it is hard to find a suitable matching part of the image for it, and vice versa.
III. IMAGE FEATURES
Images contain many regions of differing detail: some regions carry significant detailed information, and others do not (flat or homogeneous regions, where there is no gradation, or where the gradation is not recognizable to the naked eye; see Fig. 4). Homogeneous regions are easily detectable in images that are manufactured or composed by humans, such as personal photos, which often contain areas of constant color, smooth as in the background. By contrast, it is difficult to find a 100% homogeneous region in natural images (landscapes, etc.); such images may contain areas that appear to the naked eye to be homogeneous or of a single color but arithmetically are not. The current research focuses on exploiting this principle, i.e., exploiting the regions that appear homogeneous to the naked eye and using them to speed up the compression process of the
FIC, to the point that the quality of the image recovered after compression is not affected.
Figure 4. The area inside the box is a homogeneous region, while the region inside the circle contains varied detail
IV. PROPOSED METHOD
In this work, an adaptive algorithm for improving the basic fractal image coding process is proposed. The algorithm enhances the performance of FIC by speeding up the encoding process and increasing the compression ratio while keeping a high reconstructed image quality. The proposed algorithm concentrates on reducing the number of matching operations between each range block and the domain pool needed to produce the FIC code, using the following two techniques:
A. Range Exclusion (RE) Technique
This technique reduces the number of range blocks required in the matching process by extracting a few features that characterize the range blocks; a number of ranges are then excluded from the matching process with all 8 symmetries. The mean and variance of all range blocks are extracted: the mean (Mr) gives the average gray level of a range, and the variance (Vr) measures the dispersion of its gray levels from the mean. The variance is used to check whether a range is a homogeneous region or contains detail. After partitioning, the contents of each range are checked before the search operation to decide whether it is a homogeneous (flat) region, using the variance criterion: in a homogeneous region the variance is about zero, while it increases in areas with more detail. A flat region means that all pixels of the region have the same value or values close to each other. During the matching process, the homogeneous ranges are excluded, so the matching operation is limited to the detailed regions only; this eliminates a huge amount of complex calculation and results in a fast coding process. To achieve the greatest benefit, the amount of homogeneity allowed is controlled through several values of the variance threshold, named the Homogeneity Permittivity (HP). If the variance of any part of the image (range) is zero or less than the HP value, meaning that all pixels of that part are equal or very close, the range does not enter the search and matching operation and is encoded only by saving its mean value. This speeds up FIC significantly and also increases the compression ratio, because each range excluded from the matching operation requires only one byte (8 bits) to store its mean value (Mr) as its fractal code, instead of the 25 bits required to store its IFS code parameters (s = 7 bits, o = 5 bits, x = 5 bits, y = 5 bits and sym = 3 bits) [11].
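The RE decision for one range block can be sketched as follows; this is an illustrative Python fragment (not the authors' code), and the helper names `mean_and_variance` and `encode_range` are ours:

```python
def mean_and_variance(block):
    """Mean Mr and variance Vr of a range block (2D list of pixels)."""
    flat = [p for row in block for p in row]
    n = len(flat)
    mr = sum(flat) / n
    vr = sum((p - mr) ** 2 for p in flat) / n
    return mr, vr

def encode_range(block, hp):
    """RE rule: if the block is flat (Vr <= HP), encode it with its 8-bit
    mean alone; otherwise return None to signal a full domain search."""
    mr, vr = mean_and_variance(block)
    if vr <= hp:
        return ('mean', int(round(mr)))
    return None

flat_block = [[100] * 4 for _ in range(4)]      # excluded: coded by mean
busy_block = [[0, 255, 0, 255]] * 4             # detailed: goes to matching
```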
B. Reducing the Domain Image Size (RDIZ) Technique
This technique minimizes the domain pool. As mentioned previously, in traditional FIC the encoding process is computationally intensive: a large number of sequential searches through a list of domains are carried out while trying to find the best match for a range block. A large domain pool increases the number of comparisons that must be made to find the best domain block, and this is where most of the computing time is spent [13]. The proposed method significantly reduces the encoding time. The key idea is to reduce the number of domain blocks searched for each range block. This is done by reducing the domain image to 1/4th of the traditional domain image size, hence the name Reducing the Domain Image Size (RDIZ). The domain pool is created at 1/16th of the original image size, instead of the conventional 1/4th, by down-sampling every 4x4 (instead of 2x2) pixels of the original image to one pixel of the reduced domain image (using the average method), as illustrated in Figs. 5 and 6. In this case, for example, if the original image is 256x256 pixels, the range block size is 4 and the domain jump step is 4, the number of domain blocks needed in the matching process for each range block is reduced from 1024 (in traditional FIC) to 256. The computations needed in the encoding process are therefore reduced from 4096 x 1024 = 4,194,304 to 4096 x 256 = 1,048,576 comparisons, which decreases the encoding time significantly. Reducing the domain size also reduces the number of bits required to encode the original image, because fewer bits are needed to store the x and y coordinates of the best-matched domains. In our example, when the domain pool is 64x64 pixels, the maximum value of each of the x and y coordinates is 60; dividing by the jump step (60/4 = 15), the encoder needs only 4 bits to store each of the x and y coordinates instead of the 5 bits of traditional FIC. Accordingly, this leads to a remarkable increase in the compression ratio.
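The domain-count arithmetic above can be verified with a small sketch (illustrative Python; the function name `num_domains` is ours):

```python
def num_domains(image_side, down_step, block=4, jump=4):
    """Number of candidate domain-block positions for one range block,
    after down-sampling every down_step x down_step pixels to one."""
    d = image_side // down_step            # side length of the domain image
    per_axis = (d - block) // jump + 1     # block positions along one axis
    return per_axis ** 2

# 256x256 image, 4x4 range blocks, jump step 4:
trad = num_domains(256, 2)   # traditional FIC (2x2 down-sampling): 1024
rdiz = num_domains(256, 4)   # RDIZ (4x4 down-sampling): 256
# With 4096 range blocks: 4096*1024 = 4,194,304 vs 4096*256 = 1,048,576
```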
JOURNAL OF MULTIMEDIA, VOL. 6, NO. 6, DECEMBER 2011 479
2011 ACADEMY PUBLISHER
Figure 5. Down sampling method using the average of (4x4) pixels
Figure 6. Down sampling Lenna image using the average of (4x4) pixels
The following algorithm lists the steps required to perform the proposed method.
Algorithm
Input: Image, HP
Output: IFS code ((x, y, s, o, and sym) or Mr)
Step 1: Load the image into a buffer.
Step 2: Partition the image into fixed-size, non-overlapping range blocks (R_1 ... R_n).
Step 3: Generate the domain pool blocks (D_1 ... D_m) from the original image using the 4x4 averaging method.
Step 4: Compute the mean of the current range block R_i according to (1):

    M_r = \frac{1}{X_{size} Y_{size}} \sum_{i=0}^{X_{size}-1} \sum_{j=0}^{Y_{size}-1} X_{ij}    (1)

Step 5: Compute its variance according to (2):

    V_r = \frac{1}{X_{size} Y_{size}} \sum_{i=0}^{X_{size}-1} \sum_{j=0}^{Y_{size}-1} \left[ X_{ij} - M_r \right]^2    (2)

Step 6: If V_r <= HP, save the range's mean (M_r) and exclude this range from the mapping operation (jump to Step 10); otherwise go to Step 7.
Step 7: Perform the mapping operation:
    Compute the scale s and offset o coefficients according to (3) and (4):

    s = \frac{n \sum_{i=1}^{n} d_i r_i - \sum_{i=1}^{n} d_i \sum_{i=1}^{n} r_i}{n \sum_{i=1}^{n} d_i^2 - \left( \sum_{i=1}^{n} d_i \right)^2}    (3)

and

    o = \frac{\sum_{i=1}^{n} r_i \sum_{i=1}^{n} d_i^2 - \sum_{i=1}^{n} d_i \sum_{i=1}^{n} d_i r_i}{n \sum_{i=1}^{n} d_i^2 - \left( \sum_{i=1}^{n} d_i \right)^2}    (4)

    Quantize the s and o values.
    Compute the approximation error E(R_i, D_i; s, o) according to (5):

    E(R, D) = \frac{1}{n} \left[ \sum_{i=1}^{n} r_i^2 + s \left( s \sum_{i=1}^{n} d_i^2 - 2 \sum_{i=1}^{n} d_i r_i + 2 o \sum_{i=1}^{n} d_i \right) + o \left( n o - 2 \sum_{i=1}^{n} r_i \right) \right]    (5)

    Compare the computed error with the minimum registered error E_min: if E(R_i, D_i; s, o) > E_min, jump to Step 8; otherwise replace E_min and store the current IFS code (i.e. x, y, s, o, and symmetry).
Step 8: Repeat Step 7 for all symmetry versions of the tested domain block.
Step 9: Repeat Steps 7 and 8 for all domain blocks listed in the domain pool.
Step 10: Get the next range.
Step 11: Repeat Steps 4 to 10 for all range blocks listed in the range pool.
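Steps 4 to 7 can be sketched for a single range/domain pair as follows; this is an illustrative Python fragment implementing equations (3)-(5), with quantization and the 8 symmetry cases omitted (the helper name `match` is ours):

```python
def match(r, d):
    """Least-squares scale s, offset o and error E between a range block r
    and a domain block d (flattened pixel lists of equal length n),
    following equations (3)-(5)."""
    n = len(r)
    sd, sr = sum(d), sum(r)
    sdd = sum(x * x for x in d)              # sum of d_i^2
    sdr = sum(x * y for x, y in zip(d, r))   # sum of d_i * r_i
    srr = sum(x * x for x in r)              # sum of r_i^2
    den = n * sdd - sd * sd
    if den == 0:                             # constant domain block
        s, o = 0.0, sr / n
    else:
        s = (n * sdr - sd * sr) / den                  # equation (3)
        o = (sr * sdd - sd * sdr) / den                # equation (4)
    # equation (5): error of approximating r by s*d + o
    e = (srr + s * (s * sdd - 2 * sdr + 2 * o * sd) + o * (n * o - 2 * sr)) / n
    return s, o, e

# An exact affine match r = 2*d + 1 yields s = 2, o = 1, E = 0.
s, o, e = match([3, 5, 7, 9], [1, 2, 3, 4])
```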
To obtain a high-speed FIC with a higher compression ratio while preserving as much reconstructed image quality as possible, different values of the variance threshold are adopted in the present research.
V. RESULTS AND DISCUSSION
To show the effect of each of RE and RDIZ on traditional FIC, RE is tested and its results discussed first; then the tests and results of coupling RE and RDIZ in one algorithm are discussed.
A. Results and Discussion of RE
The FIC program was first applied to many images without checking whether the image contains homogeneous regions (i.e. HP = 0, the normal state); then the RE technique was applied to the same images with different HP values. Table I shows the effect of applying different HP values to the Peppers image: when HP = 2, the compression ratio increases to 5.534 and the required time drops from 34 seconds to 18 seconds; when HP = 4, the compression ratio increases to 5.65 and the encoding time decreases to 14 seconds; when HP = 10, the compression ratio increases to 5.747 and the compression time decreases to 11 seconds; and when HP = 30, the compression ratio increases to 5.82 with some loss in recovered image quality, which nevertheless remains high (30.74 dB), while the encoding time decreases to 9 seconds. Compared with the time spent when HP = 0 (34 seconds), the compression process is sped up by about 73.5%.
Fig. 7 shows the results of applying HP values of 0, 4 and 30 to the Peppers image. Figs. 8 and 9 show the results obtained when applying HP = 0, 6 and 20 to the Bird and Lenna images, respectively.
In the experiments, HP values ranging from 1 to 30 were applied to all images. From the data obtained, the relationship between HP and the encoding time is plotted in Fig. 10, between HP and the quality of the recovered image in Fig. 11, and between HP and the compression ratio in Fig. 12.
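The PSNR and compression-ratio figures reported in the tables below follow the standard definitions; a minimal sketch (illustrative Python, not the authors' code):

```python
import math

def psnr(orig, recon, peak=255.0):
    """Peak signal-to-noise ratio in dB between two equal-length pixel lists."""
    mse = sum((a - b) ** 2 for a, b in zip(orig, recon)) / len(orig)
    return float('inf') if mse == 0 else 10.0 * math.log10(peak ** 2 / mse)

def compression_ratio(original_size, compressed_size):
    """C.R. = original size / compressed size (same units)."""
    return original_size / compressed_size

# e.g. a 64 kb source file compressed to 12.5 kb gives C.R. = 5.12,
# matching the HP = 0 rows of the result tables.
cr = compression_ratio(64.0, 12.5)
```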
Fig. 8. Results of the impact of HP on the BIRD image (original file size: 64 kb)
  (a) Traditional method (HP=0): Enc. time 34 sec, PSNR 30.30 dB, C.R. 5.12, file size 12.5 kb
  (b) HP=6:  Enc. time 13 sec, PSNR 29.92 dB, C.R. 5.68, file size 11.27 kb
  (c) HP=20: Enc. time 9 sec,  PSNR 28.69 dB, C.R. 5.80, file size 11.03 kb
Fig. 9. Results of the impact of HP on the LENNA image (original file size: 64 kb)
  (a) Traditional method (HP=0): Enc. time 35 sec, PSNR 32.22 dB, C.R. 5.12, file size 12.5 kb
  (b) HP=6:  Enc. time 12 sec, PSNR 31.51 dB, C.R. 5.72, file size 11.18 kb
  (c) HP=20: Enc. time 9 sec,  PSNR 30.64 dB, C.R. 5.81, file size 11.01 kb
Fig. 7. Results of the impact of HP on the PEPPER image (original file size: 64 kb)
  (a) Traditional method (HP=0): Enc. time 34 sec, PSNR 32.69 dB, C.R. 5.12, file size 12.5 kb
  (b) HP=4:  Enc. time 14 sec, PSNR 32.15 dB, C.R. 5.65, file size 11.33 kb
  (c) HP=30: Enc. time 9 sec,  PSNR 30.74 dB, C.R. 5.82, file size 10.99 kb
TABLE I. THE EFFECTS OF USING DIFFERENT HP VALUES ON C.R., E.T., AND THE QUALITY OF THE PEPPER IMAGE

  HP value   C.R.    E.T. (seconds)   PSNR (dB)
  0          5.12    34               32.69
  2          5.534   18               32.45
  4          5.65    14               32.15
  10         5.747   11               31.65
  30         5.82    9                30.74
B. Results and Discussion of Coupling RDIZ with RE
To achieve further speed-up and a higher compression ratio, RDIZ is coupled with RE in one algorithm, called RD-RE. RD-RE was applied to the same test images with the same HP values as before. The results show a reasonable increase in compression ratio, a significantly reduced encoding time, and a reconstructed image quality that remains acceptable. Table II shows the effects of applying RD-RE with different HP values to the Peppers image.
Fig. 10. The effect of HP on the encoding time (LENNA, PEPPER and BIRD images)
Fig. 11. The effect of HP on the image quality (LENNA, PEPPER and BIRD images)
TABLE II. THE EFFECTS OF APPLYING RD-RE WITH DIFFERENT HP VALUES ON C.R., E.T., AND THE QUALITY OF THE PEPPER IMAGE

  HP value   C.R.    E.T. (seconds)   PSNR (dB)
  0          5.56    9                31.9
  2          5.89    4.5              31.3
  4          6.1     3.6              31.04
  10         6.29    2.7              30.56
  30         6.375   2.1              29.68
Fig. 12. The effect of HP on the compression ratio (LENNA, PEPPER and BIRD images)
From Table II it can be seen that when HP = 0 there is no effect from RE, so the results are affected only by RDIZ: the C.R. increases from 5.12 to 5.56 (about 8.6%) and the time drops from 34 to 9 seconds (about 73.5%), with a small loss in PSNR (about 2.4%). The other HP values show the full effect of RD-RE on the FIC results. Comparing the results of Tables I and II, it is obvious that using HP = 30 in the RE method gives an acceptable PSNR (about 30 dB), increases the C.R. from 5.12 to 5.82 (about 14%), and reduces the E.T. from 34 to 9 seconds (only about 26% of the time required when HP = 0). In RD-RE, the same PSNR (about 30 dB) can be achieved at HP = 10, but the C.R. increases to 6.29; that is, this method achieves a higher compression ratio (about 22.85% more) than traditional FIC, while the encoding time is significantly reduced to 2.7 seconds, a decrease of about 93.82% with respect to the time required by traditional FIC. Figs. 13-15 show the effects of applying RD-RE with different HP values to the Pepper, Bird and Lenna images, respectively.
Figs. 16-18 compare the results of applying traditional FIC, the RE technique, the RDIZ technique and the RD-RE algorithm to the test images (in the figures, the results of traditional FIC are those with HP = 0 in the RE columns, and the results of RDIZ are those with HP = 0 in the RD-RE columns). The figures show that RDIZ can reduce the encoding time to about one quarter of that required by traditional FIC and can achieve about 8.6% more C.R., with a small loss in PSNR that differs from one image to another but is about 1.78% on average (for the test images). The figures also show the effects of RE and RD-RE with different HP values on the test images; the results differ from one image to another depending on how many homogeneous regions each contains. Taking HP = 10 as a comparison value, the encoding time is reduced by about 62% and 91% on average, and the C.R. is increased by about 12.2% and 22% on average, by applying RE and RD-RE respectively, in comparison with traditional FIC on the test images, while the PSNR is reduced only slightly, by about 2.77% and 5.71% respectively.
Fig. 13. Results of applying RD-RE with different HP on the PEPPER image (original file size: 64 kb)
  (a) Traditional method (HP=0): Enc. time 9 sec, PSNR 31.9 dB, C.R. 5.56, file size 11.51 kb
  (b) HP=4:  Enc. time 3.6 sec, PSNR 31.04 dB, C.R. 6.1,   file size 10.49 kb
  (c) HP=30: Enc. time 2.1 sec, PSNR 29.68 dB, C.R. 6.375, file size 10.03 kb

Fig. 14. Results of applying RD-RE with different HP on the BIRD image (original file size: 64 kb)
  (a) Traditional method (HP=0): Enc. time 9 sec, PSNR 29.75 dB, C.R. 5.56, file size 11.51 kb
  (b) HP=6:  Enc. time 3.27 sec, PSNR 28.99 dB, C.R. 6.13, file size 10.44 kb
  (c) HP=20: Enc. time 2.24 sec, PSNR 27.74 dB, C.R. 6.34, file size 10.09 kb
Fig. 15. Results of applying RD-RE with different HP on the LENNA image (original file size: 64 kb)
  (a) Traditional method (HP=0): Enc. time 9 sec, PSNR 31.85 dB, C.R. 5.56, file size 11.51 kb
  (b) HP=6:  Enc. time 3.2 sec,  PSNR 30.53 dB, C.R. 6.19, file size 10.33 kb
  (c) HP=20: Enc. time 2.21 sec, PSNR 29.62 dB, C.R. 6.31, file size 10.14 kb
Fig. 16. The effects of applying RD-RE and RE on the E.T. with different HP values (PEPPER, BIRD and LENNA images)
Fig. 17. The effects of applying RD-RE and RE on the C.R. with different HP values (PEPPER, BIRD and LENNA images)
Fig. 18. The effects of applying RD-RE and RE on the PSNR with different HP values (PEPPER, BIRD and LENNA images)
484 JOURNAL OF MULTIMEDIA, VOL. 6, NO. 6, DECEMBER 2011
2011 ACADEMY PUBLISHER
VI. CONCLUSIONS
1. The effect of HP on the FIC results differs from one image to another depending on the image composition; the best results are achieved on images with many homogeneous regions.
2. Experiments showed that with HP = 0 the results represent the normal behavior of the FIC process.
3. High HP values exclude more range areas from the search-and-matching process of the compression algorithm, providing higher speed and a higher compression ratio at the expense of quality.
4. Experiments on the test images showed that HP values greater than 30 in the RE technique, or greater than 10 in the RD-RE technique, may lead to poor quality.
5. HP = 0 in RD-RE means the results are affected only by the RDIZ technique, and the results showed that RDIZ reduces the E.T. significantly and increases the C.R. remarkably, with only a slight loss in PSNR.
6. There is an inverse relationship between HP and the time spent in the encoding process.
7. There is an inverse relationship between HP and the quality of the recovered image.
8. There is a direct relationship between HP and the compression ratio.
9. HP = 10 is the best value for the RD-RE algorithm because it gives the best trade-off among E.T., C.R. and PSNR (for the test images).
ACKNOWLEDGMENT
This work was supported by the Natural Science
Foundation of China (NSFC) (No. 60873140, 61073125
and 61071179), the Program for New Century Excellent
Talents in University (No. NCET-08-0155 and NCET-08-
0156), and the Fok Ying Tong Education Foundation
(No. 122035).
REFERENCES
[1] Y. Chakrapani and K. Soundera Rajan, "Hybrid Genetic-Simulated Annealing Approach for Fractal Image Compression," International Journal of Information and Mathematical Sciences, vol. 4, no. 4, pp. 308-313, 2008.
[2] R. C. Gonzalez and P. Wintz, Digital Image Processing, 2nd ed., Addison-Wesley, 1987.
[3] M. Barnsley, Fractals Everywhere. New York: Academic, 1988.
[4] V. Chaurasia and A. Somkuwar, "Speed up Technique for Fractal Image Compression," International Conference on Digital Image Processing (ICDIP), pp. 319-323, 2009, doi: 10.1109/ICDIP.2009.66.
[5] Y. Fisher, Fractal Image Compression: Theory and Application. New York: Springer-Verlag, 1994.
[6] Ghada K., "Adaptive Fractal Image Compression," M.Sc. thesis, National Computer Center / Higher Education Institute of Computer and Informatics, 2001.
[7] Sumathi Poobal and G. Ravindran, "Arriving at an Optimum Value of Tolerance Factor for Compressing Medical Images," World Academy of Science, Engineering and Technology, vol. 24, pp. 169-173, 2006.
[8] H. R. Mahadevaswamy, "New Approaches to Image Compression," Ph.D. thesis, Regional Engineering College, University of Calicut, December 2000.
[9] Auday A., "Fractal Image Compression with Fasting Approaches," M.Sc. thesis, College of Science, Saddam University, 2003.
[10] Jamila H. S., "Fractal Image Compression," Ph.D. thesis, College of Science, University of Baghdad, January 2001.
[11] S. Abdul-Khalik, "Fractal Image Compression Using Shape Structure," M.Sc. thesis, College of Science, Al-Mustansiriya University, Iraq, 2005.
[12] D. Saupe, "Accelerating Fractal Image Compression by Multi-Dimensional Nearest Neighbor Search," in Proc. Data Compression Conference (DCC '95), J. A. Storer and M. Cohn, Eds., IEEE Computer Society Press, March 1995.
[13] http://en.wikipedia.org/wiki/Self-similarity, accessed 1/6/2011.
[14] M. Polvere and M. Nappi, "Speed-Up in Fractal Image Coding: Comparison of Methods," IEEE Transactions on Image Processing, vol. 9, no. 6, June 2000.
[15] E. Vrcasy and L. Colin, "Image Compression Using Fractals," IBM Journal of Research and Development, vol. 65, no. 19, pp. 121-134, 1995.
Taha M. Hasan received his B.Sc. and M.Sc. in computer science from Mansour University College and the University of Mustansiriyah, Baghdad, Iraq, in 1992 and 2006, respectively. He is currently pursuing the Ph.D. degree at the Harbin Institute of Technology (HIT), Harbin, China. His research interest is image processing.
Xiangqian Wu received his B.Sc., M.Sc. and Ph.D. in computer science from the Harbin Institute of Technology (HIT), Harbin, China, in 1997, 1999 and 2004, respectively. He is currently a professor at HIT. His research interests include image processing, pattern recognition and biometrics.
Information Loss Determination on Digital Image
Compression and Reconstruction Using
Qualitative and Quantitative Analysis
Zhengmao Ye and Habib Mohamadian
Southern University, Baton Rouge, Louisiana, USA
Emails: {zhengmaoye, habib}@engr.subr.edu
Yongmao Ye
Liaoning Radio and Television Station, Shenyang, China
Email: yeyongmao@hotmail.com
Abstract: To effectively utilize storage capacity, digital image compression has been applied to numerous science and engineering problems. There are two fundamental image compression techniques: lossless and lossy. The former employs probabilistic models for lossless storage on the basis of the statistical redundancy occurring in digital images; however, it is limited in compression ratio and bits per pixel. Hence, the latter has also been widely implemented to further improve storage capacity, covering various fundamental digital image processing approaches. It has been well documented that most lossy compression schemes provide good visual perception at exceptional compression ratios, among which the discrete wavelet transform, the discrete Fourier transform and some statistical optimization schemes (e.g., principal component analysis and independent component analysis) are the dominant approaches. It is necessary to evaluate these compression and reconstruction schemes objectively, in addition to their visual appeal. Using a well-defined set of quantitative metrics from information theory, a comparative study of several typical digital image compression and reconstruction schemes is conducted in this research.

Index Terms: image compression, image reconstruction, discrete wavelet transform, discrete Fourier transform, optimal compression
I. INTRODUCTION
The objectives of digital image compression schemes
are to minimize the image size so as to speed up data
transmission and reduce the memory requirement, while
to retain the image quality at an acceptable level.
Lossless compression leads to an exact replica of the
source image when being decompressed. By contrast,
lossy compression can also be introduced to save more
storage space with a trade-off of sacrificing finer
information. In general, digital images have certain
statistical properties that are exploited by encoders, thus
the compression result is always less than optimal. By
tolerating a small mismatch between the compressed and
source images, an adequate approximation can be made.
Applications of digital image compression involve a
wide variety of methodologies and techniques, many of
which have appeared in the literature [1-4].
Context-based modeling is of great importance for
high-performance lossless data compression. Partial
approximate matching is used to reduce the modeling
costs and enhance compression efficiency based on
previous context modeling. It has competitive lossless
compression performance [10]. Huber regression is
applied to robust fractal image compression design,
which is insensitive against noises and outliers in the
corrupted source images. To reduce the high
computational costs, particle swarm optimization is
utilized to minimize searching time. Encoding time is
effectively reduced while the quality of retrieved images
is preserved [11]. Adaptive coding techniques are used to
exploit the spatial energy compaction property of discrete
wavelet transform. Two crucial issues are the flexibility
level and coding efficiency. Spherical coder is an
adaptive framework using local energy as a direct
measure to differentiate wavelet subbands and allocate
the bit rates. The scheme is nonredundant with the
competitive peak signal to noise ratio (PSNR) [12]. An
adaptive one-hidden-layer feedforward neural network is
applied to image compression. Training, generalization
capabilities and quantization effects of this adaptive
network are improved with promising results [13]. A new
encoding algorithm is proposed for matching pursuit
image coding. Coding performance is improved when
correlations are used in encoding. Optimization is
reached through a tradeoff among efficient atom position
coding, atom coefficient coding and encoder parameter
optimization. The proposed algorithm outperforms existing
coding algorithms for matching pursuit image coding. The
algorithm also provides better rate distortion performance
than JPEG 2000 at low bit rates [14]. A practical uniform
down-sampling is proposed in image space, making the
sampling adaptive by spatially varying, directional low-
pass prefiltering. The decoder decompresses the low-
resolution image and then upconverts it to the original
resolution in the constrained least squares restoration
doi:10.4304/jmm.6.6.486-493
process, using a 2D piecewise autoregressive model and
directional low-pass prefiltering. It achieves superior
visual quality and also outperforms JPEG2000 in PSNR
measure at the low and medium bit rates [15]. A
multidimensional multiscale parser is used for universal
lossy compression of images and videos. It is based on
multiscale pattern matching and block encoding of an
input signal. The pattern is updated using encoded
blocks. It presents a flexible probability model using
smoothness constraints. It improves compression of both
smooth images and less smooth ones [16]. The 2D oriented wavelet
transform is used for image compression, where two
separable 1D transforms are implemented with direction
consistency. Interpolation rules are designed to generate
rectangular subbands and direction mismatch is adjusted.
The proposed compression scheme outperforms the
JPEG2000 for remote sensing images with high
resolution. It is suitable for real time processing at a low
cost [17]. To reduce coding distortions at borders,
symmetric extension filter bank is applied to wavelet
packet coding. It provides a framework to accommodate
FIR and IIR filters with potential perfect reconstruction.
The IIR filters with both the rational and irrational
transfer functions are implemented, giving rise to perfect
stopband suppression [18]. Linear convolution neural
network has been used to seek optimal wavelet kernel
that minimizes errors and maximizes image compression
efficiency. Daubechies wavelet transform can produce
the highest compression efficiency and smallest mean-
square-error for most patterns. Haar wavelet produces
solid results on sharp edges and low noise smooth areas.
It provides robust image compression [19]. A
supervised classification scheme is also proposed for
hyperspectral imagery, consisting of the greedy modular
eigenspace and positive Boolean function. The greedy
modular eigenspace is implemented as a feature extractor
to generate the feature eigenspace for all material classes
present in hyperspectral data to reduce dimensionality.
The positive Boolean function is a stack filter for
supervised training. It uses the minimum classification
error as the criterion to improve the classification
performance. The feature extractor serves as a nonlinear
PBF-based multiclass classifier for classification
preprocessing. It increases the accuracy for hyperspectral
imagery [20]. Principal component analysis (PCA) is a
multivariate statistical method for image compression. It
is sensitive to the outliers and missing data, so fuzzy
statistics is introduced into the classical PCA methods.
The surface characteristics are differentiated sufficiently
and feature recognition accuracy is improved greatly
[21]. At the same time, several optimization based
approaches such as PCA, nonlinear component analysis
(NCA) and independent component analysis (ICA) have
been applied to research fields of image processing and
pattern recognition [22-23].
Each scheme no doubt has its own benefits for
image compression. However, it is impractical to claim
any to be the most powerful one as some conflicting
results occur in different cases. Tradeoffs also need to be
made frequently among the compression ratios and
reconstruction qualities as well as computational costs. In
fact, there are at least three dominating compression
approaches for digital image, such as prevalent discrete
wavelet transform, discrete Fourier transform and
optimal statistical schemes. In order to further assess the
merits and drawbacks of these approaches, quantitative
metrics are introduced to measure qualities of image
compression based on these major approaches [5-9].
II. SCHEMES FOR IMAGE COMPRESSION
Three digital image compression and reconstruction
schemes will be analyzed in this section.
A. Discrete Wavelet Transform (DWT)
2D discrete wavelet transform uses a set of basis
functions for image decomposition. In the two-dimensional
case, one scaling function \phi(x, y) and three wavelet
functions \psi^H(x, y), \psi^V(x, y) and \psi^D(x, y) are
employed. Each scaling function or wavelet function is the
product of one-dimensional basis functions. The four
products produce the scaling function (1) and the separable,
directionally sensitive wavelet functions (2)-(4), resulting
in a structure of a quaternary tree. The simple Haar
transform has been used to determine the scaling and
wavelet functions.

\phi(x, y) = \phi(x)\phi(y)    (1)
\psi^H(x, y) = \psi(x)\phi(y)    (2)
\psi^V(x, y) = \phi(x)\psi(y)    (3)
\psi^D(x, y) = \psi(x)\psi(y)    (4)

These wavelets measure variations for images along
three directions: \psi^H(x, y) corresponds to variations
along columns (horizontal), \psi^V(x, y) corresponds to
variations along rows (vertical) and \psi^D(x, y) corresponds
to variations along diagonals (diagonal). The scaled and
translated basis functions are defined by:

\phi_{j,m,n}(x, y) = 2^{j/2} \phi(2^j x - m, 2^j y - n)    (5)
\psi^i_{j,m,n}(x, y) = 2^{j/2} \psi^i(2^j x - m, 2^j y - n),  i = {H, V, D}    (6)

where the index i identifies the directional wavelets H, V,
and D. The discrete wavelet transform of a function f(x, y)
of size M by N is formulated as:

w_\phi(j_0, m, n) = \frac{1}{\sqrt{MN}} \sum_{x=0}^{M-1} \sum_{y=0}^{N-1} f(x, y) \phi_{j_0,m,n}(x, y)    (7)

w^i_\psi(j, m, n) = \frac{1}{\sqrt{MN}} \sum_{x=0}^{M-1} \sum_{y=0}^{N-1} f(x, y) \psi^i_{j,m,n}(x, y)    (8)

where i = {H, V, D} and j_0 is the initial scale; the
w_\phi(j_0, m, n) coefficients define the approximation of
f(x, y), and the w^i_\psi(j, m, n) coefficients represent the
horizontal, vertical and diagonal details for scales j >= j_0.
Here j_0 = 0 and N = M = 2^J is selected so that
j = 0, 1, 2, ..., J-1 and m, n = 0, 1, 2, ..., 2^j - 1. Then
f(x, y) is retrieved via the inverse discrete wavelet
transform.
2D wavelet compression is introduced, where discrete
wavelet transform is implemented as the N level
decomposition. At each level N, outputs of wavelet
decomposition include: the approximation, horizontal,
vertical and diagonal details, at a quarter size of the
source image, followed by downsampling by a factor of
two. The approximation will be further decomposed until
level N is reached while detail components are held. The
approximation components will be retained for image
reconstruction. The level-dependent soft thresholding is
then selected and applied to detail coefficients using a
shrinkage function at each level. It produces level and
orientation dependent thresholds. The reconstruction is
an inverse transform. At each level back from N to 1, the
approximation and updated detail coefficients will both
be applied for wavelet image reconstruction. Discrete
wavelet compression is suitable for transient signals. On
the other hand, Fourier transform is suitable for smooth
and periodic signals.
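As a concrete illustration of the level-by-level Haar decomposition described above, the following pure-Python sketch computes one decomposition level on a square image whose side is a power of two; the function name haar_decompose and the averaging normalization are illustrative assumptions, not the authors' implementation.

```python
# One level of the 2D Haar decomposition: each 2x2 block of the image
# yields one approximation coefficient and three detail coefficients
# (horizontal, vertical, diagonal), each at a quarter of the source size.
def haar_decompose(img):
    n = len(img)
    half = n // 2
    approx = [[0.0] * half for _ in range(half)]
    horiz = [[0.0] * half for _ in range(half)]
    vert = [[0.0] * half for _ in range(half)]
    diag = [[0.0] * half for _ in range(half)]
    for i in range(0, n, 2):
        for j in range(0, n, 2):
            a, b = img[i][j], img[i][j + 1]
            c, d = img[i + 1][j], img[i + 1][j + 1]
            approx[i // 2][j // 2] = (a + b + c + d) / 4.0
            horiz[i // 2][j // 2] = (a + b - c - d) / 4.0  # row-pair difference
            vert[i // 2][j // 2] = (a - b + c - d) / 4.0   # column-pair difference
            diag[i // 2][j // 2] = (a - b - c + d) / 4.0
    return approx, horiz, vert, diag
```

Applying haar_decompose recursively to the returned approximation gives the N-level decomposition described in the text, with the detail subbands held at each level.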
B. Discrete Fourier Transform (DFT)
2D discrete Fourier transform acts as another simple
conventional scheme for digital image compression. DFT
operates on a function at a finite number of discrete data
points. The formulation is less complex compared with
DWT. Using DFT, an image in the spatial domain is
transformed into a function in the frequency domain,
which can be separated into the real and imaginary
components. DFT returns the value of Fourier transform
for a set of values in frequency domain which are equally
spaced. 2D DFT of f(x,y) in the spatial domain to F(u, v)
in the frequency domain is shown in (9), where M
samples can be obtained at values of x from 0 to M-1 and
N samples can be obtained at values of y from 0 to N-1.
F(u, v) = \frac{1}{MN} \sum_{x=0}^{M-1} \sum_{y=0}^{N-1} f(x, y) e^{-2\pi i (xu/M + yv/N)}    (9)
Now DFT decomposes a digital image into its real and
imaginary components to represent image information in
the frequency domain. Discrete Cosine Transform (DCT)
and Discrete Sine Transform (DST) are both special
cases of the discrete Fourier transform (DFT). In the
spatial domain, DFT operates on both cosine and sine
functions at a finite number of discrete data points, while
DCT and DST use solely the cosine (even) and sine
(odd) functions, respectively. The number of frequencies
in the frequency domain is equivalent to the number of
pixels in the spatial domain. In the frequency domain,
DCT contains solely real parts while DST contains solely
imaginary parts of the DFT complex exponentials. The
DFT is applied in this context. Image compression via Fourier
transform requires zero padding of input data. Inverse
DFT can also be implemented for reconstruction.
DFT is used to reduce computation cost via separating
the total data points into multiple sets, where image
compression is conducted via zero padding of input data.
The compression ratio depends on the percentage of real
and imaginary components that is actually substituted by
zero using zero padding. To reconstruct the image in the
spatial domain after compression, it can be retrieved
from the frequency domain back to the spatial domain
using inverse Fourier transform, as shown in (10).
f(x, y) = \sum_{u=0}^{M-1} \sum_{v=0}^{N-1} F(u, v) e^{+2\pi i (xu/M + yv/N)}    (10)
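A tiny pure-Python sketch of the forward transform (9), zero-based compression, and the inverse transform (10) follows; the direct O((MN)^2) evaluation and the compress helper with its keep parameter are illustrative assumptions for small examples, not an FFT-based production scheme.

```python
import cmath

# 2D DFT of an M x N image, matching eq. (9) with the 1/(MN) factor.
def dft2(f):
    M, N = len(f), len(f[0])
    return [[sum(f[x][y] * cmath.exp(-2j * cmath.pi * (x * u / M + y * v / N))
                 for x in range(M) for y in range(N)) / (M * N)
             for v in range(N)] for u in range(M)]

# Inverse 2D DFT, matching eq. (10).
def idft2(F):
    M, N = len(F), len(F[0])
    return [[sum(F[u][v] * cmath.exp(2j * cmath.pi * (x * u / M + y * v / N))
                 for u in range(M) for v in range(N))
             for y in range(N)] for x in range(M)]

# Substitute all but the `keep` largest-magnitude coefficients by zero,
# the zero-padding step that sets the compression ratio.
def compress(F, keep):
    mags = sorted((abs(F[u][v]) for u in range(len(F))
                   for v in range(len(F[0]))), reverse=True)
    thresh = mags[keep - 1]
    return [[F[u][v] if abs(F[u][v]) >= thresh else 0
             for v in range(len(F[0]))] for u in range(len(F))]
```

The fraction of coefficients replaced by zero determines the compression ratio, and idft2 on the thresholded spectrum returns the reconstructed image.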
Both DWT and DFT techniques can be successfully
applied to reduce the amount of memory needed to
represent a digital image. The relatively smaller amount
of information is efficiently used to represent the true
image via DWT or DFT image compression. By and
large, DFT is more appropriate for smooth images and
DWT is more appropriate for images with high frequency
components.
C. 2D Principal Component Analysis (PCA)
Statistical optimization is a unique approach in digital
image compression. A digital image can be sorted into
smaller ones and then compressed by projecting a set of
input block matrices onto a reduced number of vectors
(principal components) being estimated. The image
matrix (e.g., M×N) is thus divided into a set of small
blocks of the same small size (e.g., m×n matrices), giving
rise to a block set. Each block is simply substituted by a
single column vector of the length m*n in order, which
acts as one input vector. Upon image compression, all
single input vectors are projected onto a reduced number
of the vectors estimated from the input vectors by the
generalized Hebbian algorithm, where the principal
components can be determined based on the vectors
using one layer linear neural network. The estimated
components are the final weight vectors of the neural
networks.
This approximation is optimal in a least square sense.
The most significant component has the highest accuracy
and remaining ones have the decreasing accuracy. Using
a single layer neural network with linear neurons, the
generalized Hebbian algorithm illustrates a feedforward
neural network for unsupervised learning. Its learning
rule is formulated as:
\Delta w_{ij} = \eta \, y_j \left( x_i - \sum_{k=1}^{j} w_{ik} y_k \right)    (11)
where \eta is the learning rate; w_{ij} is the synaptic weight
between the ith input and the jth output neuron; x and y
are the input vector and the output activation vector,
respectively. It has a normalization effect with the
respectively. It has a normalization effect with the
presence of the current vector in the projection
subtraction. The actual number of neurons represents the
final number of principal components, which also
determines the compression ratio and the bit per pixel. A
smaller number will enhance the compression ratio and
the storage capacity. Several significant eigenvectors can
be obtained using this learning algorithm. It converges on
the eigenvalue decomposition with a probability of one.
To retrieve the digital image after data compression, the
input vectors are obtained in terms of the chosen
principal components at the reduced number. After
reversing the single matrix blocks in order and then
reconstructing multiple image blocks, the decompressed
image is obtained. At the same time, the image size is
reduced at a certain compression ratio, which will gain
the storage capacity.
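The learning rule (11) can be sketched in a few lines of pure Python; the function name gha_step, the learning rate eta, and the synthetic training vectors below are illustrative assumptions, not the authors' setup.

```python
# One update of the generalized Hebbian (Sanger's) rule:
#   delta w_ij = eta * y_j * (x_i - sum_{k<=j} w_ik * y_k)
# w[i][j] is the weight from input i to output neuron j; each column
# converges toward a principal component of the input vectors.
def gha_step(w, x, eta):
    n_in, n_out = len(w), len(w[0])
    # output activations y_j = sum_i w_ij * x_i
    y = [sum(w[i][j] * x[i] for i in range(n_in)) for j in range(n_out)]
    # compute all deltas from the old weights, then apply them
    dw = [[eta * y[j] * (x[i] - sum(w[i][k] * y[k] for k in range(j + 1)))
           for j in range(n_out)] for i in range(n_in)]
    for i in range(n_in):
        for j in range(n_out):
            w[i][j] += dw[i][j]
    return w, y
```

Repeatedly presenting the vectorized image blocks drives the weight columns toward the leading principal components; projecting each block onto those columns and storing only the projections yields the compressed representation.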
D. Visual Comparisons of Three Compression Schemes
The visual effects in terms of three dominant image
compression schemes are analyzed in this section from a
qualitative perspective.
[Figures 1-3 each show four panels: the gray level source image and the reconstructions from component analysis training, discrete wavelet transform, and discrete Fourier transform.]
Figure 1. Image Compression of Pyramid and Sphinx
Figure 2. Image Compression of National Capitol Columns
Figure 3. Image Compression of Great Goose Pagoda
All three selected schemes have provided solid image
compression without degradation of the image quality.
Three archetypal gray level images are chosen as shown
in Figs. 1-3. The first one is the picture of the Great
Pyramid and Sphinx in Egypt, the second one is the
picture of the National Capitol Columns in USA, and the
third one is the picture of Great Goose Pagoda in China,
symbolizing the integration of ancient and modern
civilization in human history. Discrete wavelet transform,
discrete Fourier transform and the optimal principal
component analysis have been applied to compress and
reconstruct the images. For each of three cases, no
remarkable distinction is shown when the source and
reconstruction images using three schemes are compared.
As a result, the objective evaluation should be conducted
instead to make further comparisons.
III. QUANTITATIVE ANALYSIS
Digital images with M×N pixels have been considered.
Occurrence of the gray level is shown as a co-occurrence
matrix of relative frequencies. Occurrence probability
functions are then estimated from the histogram.
A. Discrete Entropy
The discrete entropy is a measure of the information
content, which can be interpreted as the average
uncertainty of the information source. The discrete
entropy is the summation of products of the probability
of the outcome multiplied by the logarithm of the inverse
of probability of the outcome, taking into consideration
all possible outcomes {1, 2, ..., n} of the gray level in the
event {x_1, x_2, ..., x_n}, where p(i) represents the
probability at level i, which contains all the histogram
counts. It is shown in (12).

H(x) = \sum_{i=1}^{2^k} p(i) \log_2 \frac{1}{p(i)} = -\sum_{i=1}^{2^k} p(i) \log_2 p(i)    (12)
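The entropy of (12) can be estimated directly from the histogram counts; the following sketch is a minimal illustration (the function name discrete_entropy is hypothetical, and zero-count bins contribute nothing to the sum).

```python
import math

# Discrete entropy H(x) = -sum_i p(i) log2 p(i), with p(i) estimated
# from the histogram of gray-level occurrences.
def discrete_entropy(pixels):
    hist = {}
    for v in pixels:
        hist[v] = hist.get(v, 0) + 1
    n = len(pixels)
    return -sum((c / n) * math.log2(c / n) for c in hist.values())
```

A constant image gives zero entropy, while an image with all 2^k levels equally likely gives the maximum of k bits.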
B. Discrete Energy
The discrete energy measure indicates how the gray
level elements are distributed. Its formulation is shown in
(13), where E(x) represents the discrete energy with 256
bins and p(i) refers to the probability distribution
function at different gray level, which contains the
histogram counts. For any constant value of the gray
level, the energy measure reaches its maximum value of
one. A larger energy corresponds to a smaller number of
gray levels, and a smaller energy corresponds to a larger
number of gray levels.
E(x) = \sum_{i=1}^{2^k} p(i)^2    (13)
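Equation (13) reduces to a sum of squared histogram probabilities; a minimal sketch follows (the function name discrete_energy is a hypothetical choice).

```python
# Discrete energy E(x) = sum_i p(i)^2: equals 1 for a constant image
# and decreases as the gray levels spread over more bins.
def discrete_energy(pixels):
    hist = {}
    for v in pixels:
        hist[v] = hist.get(v, 0) + 1
    n = len(pixels)
    return sum((c / n) ** 2 for c in hist.values())
```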
C. Contrast
Contrast is the amount of the gray level (or true color)
difference in the visual properties which shows the
difference between one object and another or background
within the same field of view. The high contrast level and
low contrast level will display distinguishable degree of
variations in gray level (or true color) visual perception.
Highlights and shadows will depict intense differences of
density in image tones for high contrast images, but
depict mild differences of density in image tones for low
contrast images. In the context, root mean square (RMS)
contrast is used as another quantitative metric. It is
defined as the standard deviation of the pixel intensities,
which is not dependent on the actual spatial distribution
of the image contrast.
Contrast = \sqrt{\frac{1}{MN} \sum_{i=0}^{M-1} \sum_{j=0}^{N-1} [g(i, j) - g_{AVG}]^2}    (14)

where the intensity g(i, j) denotes an element at
coordinates (i, j) on a 2D image of size M×N, and g_{AVG}
is the average intensity of all pixel values within the
image. The image intensity has been normalized to the
range [0, 1].
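Since RMS contrast (14) is just the standard deviation of the pixel intensities, it can be sketched in a few lines; the function name rms_contrast is a hypothetical choice.

```python
import math

# RMS contrast: the standard deviation of the (normalized) intensities,
# independent of the spatial distribution of contrast in the image.
def rms_contrast(img):
    pixels = [g for row in img for g in row]
    n = len(pixels)
    g_avg = sum(pixels) / n
    return math.sqrt(sum((g - g_avg) ** 2 for g in pixels) / n)
```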
D. Correlation
Correlation is a standard measure of the image contrast
to analyze the linear dependency of the gray levels of
neighboring pixels. It indicates the amount of local
variations across a gray level image. The higher the
contrast is, the sharper the structural variation is. This
measure is formulated as:
COR = \sum_{i=0}^{M-1} \sum_{j=0}^{N-1} g(i, j) \frac{(i - \mu_i)(j - \mu_j)}{\sigma_i \sigma_j}    (15)

\mu_i = \sum_{i=0}^{M-1} \sum_{j=0}^{N-1} i \cdot g(i, j);    \sigma_i^2 = \sum_{i=0}^{M-1} \sum_{j=0}^{N-1} (i - \mu_i)^2 g(i, j)

\mu_j = \sum_{i=0}^{M-1} \sum_{j=0}^{N-1} j \cdot g(i, j);    \sigma_j^2 = \sum_{i=0}^{M-1} \sum_{j=0}^{N-1} (j - \mu_j)^2 g(i, j)

where i and j are the coordinates of the co-occurrence
matrix; M and N represent the total numbers of pixels in
the row and column of the digital image; g(i, j) is the
element in the co-occurrence matrix at coordinates (i, j);
\mu_i and \sigma_i^2 are the horizontal mean and variance,
and \mu_j and \sigma_j^2 are the vertical mean and variance;
\sigma is a metric of the gray tone variance.
E. Dissimilarity
Dissimilarity between two gray level images is
expressed as the distance between two sets of co-
occurrence matrix representations. It is formulated on a
basis of local distance representation as shown in (16).
DisSim = \sum_{i=0}^{M-1} \sum_{j=0}^{N-1} g(i, j) |i - j|    (16)
where g(i, j) is an element in the co-occurrence matrix at
the coordinates i and j; M and N represent total numbers
of pixels in the row and column of the digital image.
F. Homogeneity
This measure acts as a direct measure of the local
homogeneity of a gray level image, which relates
inversely to the image contrast. Higher homogeneity
values indicate fewer structural variations, while lower
values indicate more structural variations. It is formulated
as (17).

Homogeneity = \sum_{i=0}^{M-1} \sum_{j=0}^{N-1} \frac{g(i, j)}{1 + (i - j)^2}    (17)
G. Mutual Information
Another metric of the mutual information I(X; Y) can
be applied as well, which is to describe how much
information one variable tells about the other variable.
The relationship is formulated as (18).
I(X; Y) = \sum_{X,Y} p_{XY}(X, Y) \log_2 \frac{p_{XY}(X, Y)}{p_X(X) p_Y(Y)} = H(X) - H(X|Y)    (18)

where H(X) and H(X|Y) are the values of the discrete
entropy and conditional entropy; p_{XY} is the joint
probability density function; p_X and p_Y are the marginal
probability density functions. The information that Y tells
about X can be explained as the reduction in uncertainty
of X due to the existence of Y. Mutual information can
also be regarded as the relative entropy between the joint
distribution and the product distribution.
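Equation (18) can be estimated from the joint histogram of co-located pixel values in two equal-size images; the following sketch is a minimal illustration (the function name mutual_information and the flattened-pixel-list interface are hypothetical choices).

```python
import math

# Mutual information I(X;Y) between two pixel sequences of equal length,
# estimated from joint and marginal histograms per eq. (18).
def mutual_information(xs, ys):
    n = len(xs)
    joint, px, py = {}, {}, {}
    for x, y in zip(xs, ys):
        joint[(x, y)] = joint.get((x, y), 0) + 1
        px[x] = px.get(x, 0) + 1
        py[y] = py.get(y, 0) + 1
    mi = 0.0
    for (x, y), c in joint.items():
        p_xy = c / n
        mi += p_xy * math.log2(p_xy / ((px[x] / n) * (py[y] / n)))
    return mi
```

Comparing the source image with each reconstruction in this way gives the mutual-information column reported later in Table 1.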
IV. NUMERICAL SIMULATIONS
Using the quantitative metrics defined above, a
comparative study on the mismatch between the source
and reconstructed images is made, using three schemes
of discrete wavelet transform, discrete Fourier transform
and optimal principal component analysis. The detailed
results are plotted in Fig. 4. Graphically, for all three
selected pictures of the Pyramid, the Columns and the
Pagoda, it is hard to tell exactly whether or not one
scheme is superior to another, since only very little
difference on each quantitative metric can be observed.
The reason is that three schemes of DFT, DWT and PCA
are all fairly effective in digital image compression and
reconstruction. Thus, the retrieval images just depict little
difference away from the source images.
[Figure 4 comprises three panels, "Digital Image (Great Sphinx) Compression & Reconstruction", "Digital Image (Capitol Columns) Compression & Reconstruction" and "Digital Image (Great Goose Pagoda) Compression & Reconstruction", each plotting the quantitative metrics (discrete entropy, contrast, correlation, dissimilarity and homogeneity) against case number.]
Figure 4. Metrics in Image Compression and Reconstruction
Accordingly, more detailed simulation data analysis is
necessary. The information metrics obtained from image
processing of the Pyramid, Columns and Pagoda pictures
are listed from top to bottom in Table 1. For all three
cases, the source image contains the greater amount of
discrete entropy, contrast and dissimilarity than the three
retrieved images, but the smaller amount of the discrete
energy, correlation and homogeneity than the three
retrieved images. Because information loss occurs across
the image compression and reconstruction process, less
information content is retained,
which gives rise to the smaller discrete entropy. The
discrete energy levels differ insignificantly due to
magnitudes of the discrete energy levels themselves;
however, it can still be observed that the source images
contain more information from the smallest discrete
energy values. The source image has the higher contrast
and dissimilarity than three reconstructed images, which
means more color distributions and tone differences
between elements. At the same time, it is also associated
with the bigger distance of dissimilarity. On the other
hand, actual outcomes of correlation and homogeneity
show that three reconstructed images depict less amount
of the structural variations than the source image.
Regarding three approaches, the outcome from the
PCA optimal scheme shows better match to the source
image than those from two other schemes of DFT and
DWT, in terms of all six metrics defined above. The
greater discrete entropy and smaller discrete energy
indicate that the reconstructed images via PCA represent
more intrinsic information and better information match
than reconstructed images via DFT and DWT. Relatively
higher contrast and dissimilarity are corresponding to
more color distributions and tone differences via PCA.
Relatively lower correlation and homogeneity are
corresponding to more structural variations via PCA.
Moreover, mutual information between the source image
and PCA reconstructed image is less than those from two
other cases of DFT and DWT, thus less extra information
can be discovered from one to the other. It also means
that less mismatch occurs between the source and
reconstructed images via PCA than two other cases via
DFT and DWT. These data also imply that the optimal
PCA scheme can produce the better compression result
and show more intrinsic information.
Between the DWT and DFT approaches, data for all
information metrics are quite similar. It is difficult to draw
conclusions where the information is even conflicting. It
shows that the impact of the DWT and DFT compression
schemes vary case by case. No distinctive conclusion can
be achieved between the DWT and DFT.
TABLE 1. METRICS FOR IMAGE COMPRESSION AND RECONSTRUCTION

Pyramid
Metrics              Source    2DPCA     Wavelet   Fourier
                     (Case 1)  (Case 2)  (Case 3)  (Case 4)
Discrete Entropy     6.9415    6.9301    6.9263    6.9285
Discrete Energy      0.0096    0.0097    0.0097    0.0097
Contrast             0.1657    0.1505    0.1302    0.1340
Correlation          0.9009    0.9071    0.9205    0.9185
Dissimilarity        0.1506    0.1352    0.1184    0.1219
Homogeneity          0.9262    0.9290    0.9420    0.9402
Mutual Information   -         0.0114    0.0152    0.0130

NC Column
Metrics              Source    2DPCA     Wavelet   Fourier
                     (Case 1)  (Case 2)  (Case 3)  (Case 4)
Discrete Entropy     7.2803    7.2756    7.2642    7.2676
Discrete Energy      0.0076    0.0076    0.0077    0.0077
Contrast             0.2939    0.2780    0.2644    0.2676
Correlation          0.9032    0.9082    0.9117    0.9109
Dissimilarity        0.2095    0.2019    0.1852    0.1881
Homogeneity          0.9033    0.9064    0.9150    0.9136
Mutual Information   -         0.0047    0.0161    0.0127

Pagoda
Metrics              Source    2DPCA     Wavelet   Fourier
                     (Case 1)  (Case 2)  (Case 3)  (Case 4)
Discrete Entropy     6.9771    6.9634    6.9285    6.9059
Discrete Energy      0.0130    0.0137    0.0144    0.0145
Contrast             0.6620    0.6397    0.6178    0.6203
Correlation          0.8230    0.8249    0.8393    0.8958
Dissimilarity        0.4215    0.4164    0.3553    0.3349
Homogeneity          0.8106    0.8132    0.8467    0.8909
Mutual Information   -         0.0077    0.0348    0.0485
V. CONCLUSIONS
Image compression focuses on reducing the number of
the bits needed for image representation and storing
information with suitable quality. It aims to make the
image file as small as possible for data storage.
Taking into account the issues on potential compression
ratios and bit per pixel ratios, lossy compression schemes
are investigated rather than the lossless ones, where
discrete wavelet transform, discrete Fourier transform
and optimal principal component analysis have all been
employed to decompose, compress, and reconstruct
typical grayscale images which symbolize ancient and
modern civilization. In terms of visual appeal, all three
schemes generate results very similar to the source
images. It is hard to differentiate the merits and drawbacks of
diverse approaches. To objectively measure the impact of
image compression and reconstruction using three
schemes, the information metrics of discrete entropy,
discrete energy, contrast, correlation, dissimilarity and
homogeneity as well as mutual information have been
introduced to evaluate various approaches. It is then
concluded that the optimal PCA scheme can produce
relatively better results in digital image compression and
reconstruction than two other schemes.
REFERENCES
[1] R. Gonzalez and R. Woods, Digital Image Processing,
3rd Edition, Prentice-Hall, 2007
[2] S. Mitra, "Digital Signal Processing, A Computer Based
Approach", 3rd Edition, McGraw-Hill, 2005
[3] R. Duda, P. Hart, D. Stork, Pattern Classification, 2nd
Edition, John Wiley and Sons, 2000
[4] Simon Haykin, Neural Networks A Comprehensive
Foundation, 2nd Edition, Prentice Hall, 1999
[5] David MacKay, "Information Theory, Inference and
Learning Algorithms", Cambridge University Press, 2003
[6] Z. Ye, H. Cao, S. Iyengar and H. Mohamadian, "Medical
and Biometric System Identification for Pattern
Recognition and Data Fusion with Quantitative
Measuring", Systems Engineering Approach to Medical
Automation, Chapter Six, pp. 91-112, Artech House
Publishers, ISBN978-1-59693-164-0, October, 2008
[7] Z. Ye, H. Mohamadian, Y. Ye, "Information Measures for
Biometric Identification via 2D Discrete Wavelet
Transform", Proceedings of 2007 IEEE International
Conference on Automation Science and Engineering,
pp.835-840, September 22-25, 2007, Scottsdale, USA
[8] Z. Ye and G. Auner, Principal Component Analysis for
Biomedical Sample Identification, Proceedings of the
2004 IEEE International Conference on Systems, Man and
Cybernetics (SMC 2004), pp. 1348-1353, Oct. 10-13,
2004, Hague, Netherlands
[9] Z. Ye, H. Mohamadian, Y. Ye, "Quantitative Study of
Information Loss in Digital Image Compression and
Reconstruction", Proceedings of the 2011 International
Symposium on Data Storage and Data Engineering
(DSDE), May 13-15, 2011, Xi'An, China
[10] Y. Zhang and D. Adjeroh, "Prediction by Partial
Approximate Matching for Lossless Image Compression",
IEEE Transactions on Image Processing, pp. 924-935,
Vol. 17, NO. 6, JUNE 2008
[11] J. Jeng, C. Tseng, and J. Hsieh, "Study on Huber Fractal
Image Compression", IEEE Transactions on Image
Processing, pp. 995-1003, Vol. 18, NO. 5, May 2009
[12] H. Ates, and M. Orchard, "Spherical Coding Algorithm for
Wavelet Image Compression", IEEE Transactions on
Image Processing, pp. 1015-1024, Vol. 18, NO. 5, May
2009
[13] L. Ma and K. Khorasani, "Application of Adaptive
Constructive Neural Networks to Image Compression",
IEEE Transactions on Neural Networks, pp. 1112 -1126,
Vol. 13, NO. 5, Sept. 2002
[14] A. Shoa and S. Shirani, "Optimized Atom Position and
Coefficient Coding for Matching Pursuit-Based Image
Compression", IEEE Transactions on Image Processing,
Vol. 18, No. 12, pp. 2686-2694, December 2009
[15] X. Wu, X. Zhang, and X. Wang, "Low Bit-Rate Image
Compression via Adaptive Down-Sampling and
Constrained Least Squares Upconversion", IEEE
Transactions on Image Processing, pp. 552-561, Vol. 18,
NO. 3, March 2009
[16] E. Filho, E. Silva, M. Carvalho, and F. Pinag, "Universal
Image Compression Using Multiscale Recurrent Patterns
With Adaptive Probability Model", IEEE Transactions on
Image Processing, pp. 512-526, Vol. 17, No. 4, April 2008
[17] B. Li, R. Yang, and H. Jiang, "Remote-Sensing Image
Compression Using Two-Dimensional Oriented Wavelet
Transform", IEEE Transactions on Geoscience and
Remote Sensing, pp. 1-15, 2011
[18] J. Lin, M. Smith, "New Perspectives and Improvements on
the Symmetric Extension Filter Bank for Subband Wavelet
Image Compression", IEEE Transactions on Image
Processing, pp. 177-189, Vol. 17, NO. 2, Feb. 2008
[19] S. Lo, H. Li, and M. Freedman, "Optimization of Wavelet
Decomposition for Image Compression & Feature
Preservation", IEEE Transactions on Medical Imaging, pp.
1141-1149, Vol. 22, No. 9, Sept 2003
[20] Y. Chang, C. Han, et al., "Greedy Modular Eigenspaces
and Positive Boolean function for Supervised
Hyperspectral Image Classification", Optical Engineering
42(09), pp.2576-2587, September 2003
[21] C. Yang, "A Fuzzy-Statistics-Based Principal Component
Analysis Method for Multispectral Image Enhancement
and Display", IEEE Transactions on Geoscience and
Remote Sensing, pp. 3937-3947, Vol. 46, NO. 11,
November 2008
[22] Z. Ye, Y. Ye, H. Mohamadian, "Biometric Identification
via PCA and ICA Based Pattern Recognition",
Proceedings of the 2007 IEEE International Conference on
Control and Automation (ICCA 2007), pp. 1600-1604,
May 30-June 1, 2007, Guangzhou, China
[23] Z. Ye and R. Turner, "Intelligent Linear and Nonlinear
Analysis for Biometric Fingerprint Recognition",
Proceedings of the 39th Southeastern Symposium on
System Theory, pp. 315-319, March 4-6, 2007, Macon,
Georgia, USA
Zhengmao Ye received the B.E. degree from Tianjin University and the first M.S. degree from Tsinghua University, China, and the second M.S. and Ph.D. degrees in electrical engineering from Wayne State University, USA. Currently he serves as an Associate Professor in the Department of Electrical Engineering, Southern University at Baton Rouge, USA. He is a Senior Member of IEEE and the Founder and Director of the Systems and Control Lab at Southern University.
Dr. Ye's research interests include modeling, control and optimization across a broad spectrum of diverse applications on electrical, mechanical, automotive and biomedical systems, as well as signal processing and image processing. Dr. Ye is the first multi-disciplinary researcher internationally with first-author publications covering all the leading control proceedings of the three most prestigious engineering societies (IEEE, ASME, SAE), specifically IEEE (CDC, CCA, SMC, ACC, ISIC, FUZZ, IJCNN, CEC, CASE, ICCA, SOSE, WCCI Congress, MSC Congress), the ASME World Congress and the SAE World Congress. Dr. Ye also has sole authorships in IEEE Transactions and SAE Transactions. He was an academic reviewer for over 150 articles submitted to IEEE, ASME, SAE and various international journals. Dr. Ye is the recipient of the Chinese National Fellowship (First Prize) at Tianjin University, the USA Allied Signal Fellowship (First Prize) at Tsinghua University, and the Most Outstanding Faculty award of the Electrical Engineering Department at Southern University. He was selected for inclusion in Marquis Who's Who in 2008.
492 JOURNAL OF MULTIMEDIA, VOL. 6, NO. 6, DECEMBER 2011
2011 ACADEMY PUBLISHER
Yongmao Ye graduated from the School of Electronic and Information Engineering at Tianjin University, China. He currently serves as Chief Engineer of Multimedia Resource Administration in the Technology Center of Liaoning Radio and Television Station, China. His professional expertise and research interests include High Definition Television (HDTV), Digital Television (DTV), Advanced Television (ATV), and multimedia technology, as well as high performance signal and image processing. He has more than twenty refereed journal and international conference publications in these academic fields.
Habib Mohamadian, ASME Fellow, received his B.S. degree from the University of Texas at Austin, and his M.S. and Ph.D. degrees from the College of Engineering at Louisiana State University. Dr. Mohamadian serves as Professor and Dean of the College of Engineering, Southern University at Baton Rouge, Louisiana, USA. The Dean oversees the College's strategic planning, program development, academic affairs, government and industry relations, and research initiatives. His research interests include various aspects of engineering education and practice.
An Improved Fast SPIHT Image Compression
Algorithm for Aerial Applications
Ning Zhang
Graduate School of Chinese Academy of Sciences, Beijing, China
Institute of Optics, Fine Mechanics and Physics, Chinese Academy of Sciences, Changchun, China
Email: scorode@163.com
Longxu Jin
Institute of Optics, Fine Mechanics and Physics, Chinese Academy of Sciences, Changchun, China
Email: jinlx@ciomp.ac.cn
Yinhua Wu
Graduate School of Chinese Academy of Sciences, Beijing, China
Institute of Optics, Fine Mechanics and Physics, Chinese Academy of Sciences, Changchun, China
Email: yhwcn@msn.com
Ke Zhang
Institute of Optics, Fine Mechanics and Physics, Chinese Academy of Sciences, Changchun, China
Email: ke_ogg@163.com
Abstract: In this paper, an improved fast SPIHT algorithm is presented. SPIHT and NLS (No List SPIHT) are efficient compression algorithms, but their application in aviation areas is limited by poor error resistance and slow compression speed. In this paper, both the error resilience and the compression speed are improved. The remote sensing images are decomposed by the Le Gall 5/3 wavelet, and the wavelet coefficients are indexed, scanned and allocated by means of family blocks. The bit-plane significance is predicted by bitwise OR, so N bit-planes can be encoded at the same time. Compared with the SPIHT algorithm, the improved algorithm is easier to implement in hardware and compresses faster. The PSNR of reconstructed images encoded by fast SPIHT is 0.3 to 0.9 dB higher than SPIHT and CCSDS, and encoding is 4-6 times faster than the SPIHT encoding process. The algorithm meets the high-speed and reliability requirements of aerial applications.

Index Terms: image compression, fast SPIHT coding, error resistance, bit-plane parallelism
I. INTRODUCTION
With the development of high-resolution remote
sensing technology at home and abroad, the space camera
sampling rate and the remote sensing image resolution
are higher and higher. Therefore, a high error resistance
and compression speed algorithm has become the
research focus in the field of remote sensing image
compression in the communication channel limited.
There are some compression chips on the market, such as
ADI's ADV202 and ADV212. But the largest input speed
is only 65Mpixel/s, and the pixel depth cant meet all
requirements. Therefore, to meet the engineering
requirements, the reliable and efficient compression
system must be studied.
In 1996, the SPIHT (set partitioning in hierarchical trees) algorithm [1] was proposed by Said and Pearlman. It adopts a spatial orientation tree structure and can effectively extract the significant coefficients in the wavelet domain. SPIHT's bit stream is less flexible than JPEG2000's, but SPIHT has relatively low structural and algorithmic complexity, supports multiple rates, and achieves high signal-to-noise ratio (SNR) and good image restoration quality, so it is suitable for encoding scenarios with high real-time requirements. Wavelet-domain coefficients are scanned through three lists in SPIHT: the list of insignificant pixels (LIP), the list of significant pixels (LSP), and the list of insignificant sets (LIS). Each scan proceeds from the highest bit-plane to the lowest. The encoding speed is limited by the repetitive scans and the dynamic updates of the three lists, and this is not conducive to hardware implementation.
Subsequently, F. W. Wheeler and W. A. Pearlman proposed a variant of the original SPIHT algorithm that can be implemented in hardware, named NLS (No List SPIHT) [2]. The scanning process, the sets, and the compression ratio of NLS are the same as SPIHT's; the IP, IS and REF states of NLS correspond to the LIP, LIS and LSP lists of SPIHT. NLS solves the problem of hardware implementation, but does not improve the encoding speed.
In aerial space there are many charged particles and cosmic rays, so the bit streams in storage devices may be disturbed by radiation effects and errors will be introduced. The output bit streams of SPIHT and NLS follow the scan order, so the whole image
doi:10.4304/jmm.6.6.494-501
reconstruction is affected when an error occurs.
Based on the above discussion, SPIHT and NLS are not used widely, by virtue of their slow scanning speed and poor error resistance. In this paper, the scanning process is simplified, and wavelet-domain coefficients are stored in family blocks. The error resilience and encoding speed are improved, so the compression algorithm can be used in aerial camera image compression systems.
II. THE IMPROVEMENTS AND DEFICIENCIES OF NLS
A. The improvements of NLS
The spatial orientation tree structure is still adopted in NLS. The wavelet-domain coefficients are separated into fathers, children, and grandchildren, as shown in Fig. 1. Compared with SPIHT, NLS improves hardware implementability as follows:
Figure 1. Spatial orientation tree structure of wavelet coefficients
A linear index technique is introduced: the linear index uses one number instead of a two-dimensional index. Let R = C = 2^N, where R and C are the numbers of rows and columns of the image. For a coefficient at coordinates (r, c), r and c can be written as binary numbers:

    r = [r_{N-1}, r_{N-2}, ..., r_1, r_0]
    c = [c_{N-1}, c_{N-2}, ..., c_1, c_0]        (1)

where r_n and c_n (n = 0, 1, ..., N-1) are binary digits. The linear index is defined by interleaving the bits:

    i = [r_{N-1}, c_{N-1}, r_{N-2}, c_{N-2}, ..., r_1, c_1, r_0, c_0]        (2)
The coordinates of an 8*8 image's coefficients arranged by the linear index are shown in Table I.
TABLE I. THE LINEAR INDEX
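The bit interleaving of (2) can be sketched in a few lines of Python (a sketch for illustration, not the hardware lookup table); as a check, for N = 9 it reproduces the index values 240298 and 262143 that Section II.B derives for the HH1 coefficients (511, 256) and (511, 511).

```python
def linear_index(r, c, N):
    """Interleave the bits of row r and column c (row bit more significant),
    as in eq. (2): i = [r_{N-1}, c_{N-1}, ..., r_0, c_0]."""
    i = 0
    for n in range(N):
        i |= ((r >> n) & 1) << (2 * n + 1)  # place bit r_n
        i |= ((c >> n) & 1) << (2 * n)      # place bit c_n
    return i

# the two HH1 coefficients discussed in Section II.B (512x512 image, N = 9)
print(linear_index(511, 256, 9))  # 240298
print(linear_index(511, 511, 9))  # 262143
```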
A state table Mark is used to record the state of each wavelet coefficient, replacing the three lists of SPIHT; Mark is updated by the result of each scan. An array Val stores the coefficients arranged by the linear index. Two maximum arrays, dmax[i] and gmax[i], are introduced so that the maximum magnitudes of descendant sub-band coefficients are not computed repeatedly. They hold the maximum magnitude over all descendants (dmax) and over all grand-descendants (gmax), computed by (3) and (4), respectively:

    gmax(i) = max( dmax(4i), dmax(4i+1), dmax(4i+2), dmax(4i+3) )          (3)
    dmax(i) = max( val(4i), val(4i+1), val(4i+2), val(4i+3), gmax(i) )     (4)

They are not updated during the encoding process.
NLS uses the one-dimensional linear index instead of the two-dimensional index, and adopts a SKIP function to skip insignificant pixel blocks. The state table Mark can be implemented by hardware easily.
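The bottom-up computation in (3) and (4) can be sketched as follows. The flat array with a node's four children at linear indices 4i..4i+3 is an assumption made for illustration (the real NLS tree is rooted in the coarsest sub-band), and magnitudes are taken as already-absolute values.

```python
def compute_max_arrays(val):
    """Compute dmax/gmax per eqs. (3)-(4), assuming node i's four children
    sit at linear indices 4i .. 4i+3 while 4i+3 < len(val)."""
    n = len(val)
    dmax = [0] * n
    gmax = [0] * n
    # deepest parents first, so children's dmax is ready before the parent's
    for i in range((n - 1) // 4, 0, -1):
        kids = range(4 * i, 4 * i + 4)
        gmax[i] = max(dmax[k] for k in kids)               # eq. (3)
        dmax[i] = max(max(val[k] for k in kids), gmax[i])  # eq. (4)
    return dmax, gmax
```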
B. Time consumption and error resistance of NLS
Wavelet transform coefficients are rearranged by the linear index module. The addresses of the high-frequency sub-band coefficients are fragmented in the linear index lookup table. A standard 512*512*8-bit Lena image is decomposed with a 3-level DWT, as shown in Fig. 1. In the last row of sub-band HH1, the coefficients (511, 256) and (511, 511) can be written as the binary numbers (1,1111,1111, 1,0000,0000) and (1,1111,1111, 1,1111,1111), corresponding to the index values 240298 and 262143 in the linear index lookup table, respectively. The linear index value of each coefficient must be mapped to a storage RAM address by an address-mapping step. The lookup table needs a lot of memory space, and the computing process becomes more and more complicated as the numbers of rows and columns increase. Whether NLS encodes properly depends on whether the wavelet-domain coefficients are arranged properly. Therefore, the linear index process must be simplified.
The scanning of IP, IS and REF adopts a serial approach, bit-plane by bit-plane, from MSB to LSB. Processing one bit-plane therefore requires scanning the image 3 times, and an 8-bit-plane image requires 24 scans.
The output of the scanning of each bit-plane includes information both for the decoder and for the scanning of the next bit-plane. If there are errors in the code stream, the next scanning and decoding steps go wrong, and disastrous consequences follow in the image reconstruction process. The time cost of the scanning process is the bottleneck of the time budget of the whole compression system, and whether the image can be reconstructed correctly depends on whether the output code stream is right. Therefore, the scanning process needs to be improved.
III. IMPROVED INITIALIZATION OF SPIHT
In this paper, wavelet-domain coefficients are arranged by the linear index module, and the dmax and gmax value arrays are computed at the same time. Wavelet-domain coefficients are divided into many family blocks. A family is composed of one coefficient from the HL3, HH3, LH3 sub-bands (a pixel), four coefficients from the HL2, HH2, LH2 sub-bands (a FourBlock), and sixteen coefficients from the HL1, HH1, LH1 sub-bands (a SixteenBlock), as shown in Fig. 1. 32*32 families form a family block, which is stored in one RAM.
Wavelet-domain coefficients at all levels (except the LL3 sub-band) are stored in the family-block RAMs directly by the address-mapping module, as shown in Fig. 2. There are 12 family blocks in total. Coefficients in the LL3 sub-band are stored separately. Compared with the whole image, the number of coefficients in each family block is small, so the computational complexity of the linear index process is greatly reduced. All family blocks share the linear index module, which also reduces the memory space requirements. The size of the family block can be adjusted according to the hardware resources (16*16, 64*64, and so on). The output code streams are stored per block. If an error occurs during a scanning process, the wrong codes are confined to that single block and do not disturb other family blocks; the error resistance is thereby improved. Each family block is encoded by the same scanning process, and the encoding module can easily be replicated by instantiating the module in Verilog HDL or VHDL, so the encoding efficiency is improved. The improved initialization process is shown in Fig. 2.
Figure 2. Improved initialization of SPIHT
IV. PARALLEL SPIHT
SPIHT and NLS adopt a serial processing approach, from MSB to LSB, because processing a bit-plane needs the results of the previous bit-planes. The parallel SPIHT algorithm in this paper can process all bit-planes simultaneously. It again adopts three lists (LSP, LIP, LIS) instead of the state table Mark, but the scanning results of all bit-planes are obtained at the same time. The proposed algorithm predicts each pixel's state by a bitwise OR operation: the bitwise OR of the first n-1 bits of the val, dmax and gmax values is computed during the scanning process. In NLS, 3 image scans are needed per bit-plane, while fast SPIHT handles the three NLS processes in one scan, so the scanning speed depends only on the image resolution. Bitwise OR is easy to implement in hardware, so this is a real-time process.
Definitions: PixelOR is the bitwise OR of the first n-1 bits of val for each pixel, and PixelOR is 0 for the first bit-plane. DmaxOR is the bitwise OR of the first n-1 bits of dmax for each FourBlock, and DmaxOR is 0 for the first bit-plane. GmaxOR is the bitwise OR of the first n-1 bits of gmax for each SixteenBlock, and GmaxOR is 0 for the first bit-plane. PixelBit, DmaxBit and GmaxBit are the nth bit of val, dmax and gmax, respectively.
The PixelOR, DmaxOR and GmaxOR values determine whether the pixel, the FourBlock or the SixteenBlock has already become significant. A value of 1 indicates that at least one bit among the first n-1 bit-planes is 1; a value of 0 indicates that the first n-1 bits are all 0 and the element is still insignificant.
MGpredict is the bitwise OR of the first n bits of the dmax of the FourBlocks in the HL2, HH2 and LH2 sub-bands for each SixteenBlock. MGpredict determines whether the SixteenBlock has been flagged as MG (a state of the pixel in NLS, like the L-type entry in SPIHT).
Sign is the sign of each pixel. LIP, LIS and LSP are three lists that correspond to the output streams of the IP, IS and REF processes of NLS, respectively.
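The significance prediction can be sketched for a single magnitude: the OR of the bits above plane n depends only on the stored value, never on the scan output of earlier planes, which is why all planes can be evaluated at once. This is a minimal software sketch; the hardware computes the same prefix OR for val, dmax and gmax in parallel.

```python
def significance_per_plane(value, nbits=8):
    """For each bit-plane n (MSB first), return (PixelOR, PixelBit):
    PixelOR is the bitwise OR of the bits above plane n (0 for the MSB)."""
    out = []
    higher_or = 0
    for n in range(nbits - 1, -1, -1):
        bit = (value >> n) & 1
        out.append((higher_or, bit))
        higher_or |= bit
    return out

# value 45 = 0b00101101 first becomes significant at bit-plane 5
states = significance_per_plane(45)
```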
Specific description of the parallel SPIHT algorithm:

• The 3rd-level DWT coefficients (LL3, LH3, HL3, HH3):

    if (PixelOR == 1) output PixelBit to the lsp
    else
        output PixelBit to the lip
        if (PixelBit == 1) output sign to the lip

• The 2nd-level DWT coefficients (LH2, HL2, HH2):

    if (DmaxOR == 1)
        for each pixel in FourBlock
            if (PixelOR == 1) output PixelBit to the lsp
            else
                output PixelBit to the lip
                if (PixelBit == 1) output sign to the lip
    else
        output DmaxBit to the lis
        if (DmaxBit == 1)
            for each pixel in FourBlock
                output PixelBit to the lis
                if (PixelBit == 1) output sign to the lis

• The 1st-level DWT coefficients (LH1, HL1, HH1):

    if (MGpredict == 1)
        if (GmaxOR == 1)
            for each FourBlock in SixteenBlock
                if (DmaxOR == 1)
                    for each pixel in FourBlock
                        if (PixelOR == 1) output PixelBit to the lsp
                        else
                            output PixelBit to the lip
                            if (PixelBit == 1) output sign to the lip
                else
                    output DmaxBit to the lis
                    if (DmaxBit == 1)
                        for each pixel in FourBlock
                            output PixelBit to the lis
                            if (PixelBit == 1) output sign to the lis
        else
            output GmaxBit to the lis
            if (GmaxBit == 1)
                for each FourBlock in SixteenBlock
                    output DmaxBit to the lis
                    if (DmaxBit == 1)
                        for each pixel in FourBlock
                            output PixelBit to the lis
                            if (PixelBit == 1) output sign to the lis
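As an illustration, the 3rd-level rule can be run over all bit-planes of a single pixel in one pass, using the prefix OR as PixelOR. This is a sketch with assumed list names 'lip'/'lsp' and a 0/1 sign flag, not the hardware implementation.

```python
def encode_level3_pixel(value, sign, nbits=8):
    """Emit (list, bit) pairs for one 3rd-level coefficient across all
    bit-planes, MSB first, following the rule in the text."""
    out = []
    pixel_or = 0                                # PixelOR for the current plane
    for n in range(nbits - 1, -1, -1):
        pixel_bit = (value >> n) & 1
        if pixel_or == 1:
            out.append(('lsp', pixel_bit))      # refinement bit
        else:
            out.append(('lip', pixel_bit))      # significance bit
            if pixel_bit == 1:
                out.append(('lip', sign))       # sign follows the first 1
        pixel_or |= pixel_bit
    return out
```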
The flowchart of the parallel SPIHT algorithm is shown in Fig. 3 [3].
V. BIT-RATE CONTROL
Bit-rate control is the process of allocating bits to each code block in each sub-band so as to achieve the overall target encoding bit-rate for the whole image, while minimizing the distortion (errors) introduced in the reconstructed image by quantization and by the truncation of codes needed to reach the desired code rate.
The scanning processes of SPIHT and NLS run from the MSB to the LSB, and are truncated at a bit-plane when the communication channel bandwidth is limited. All wavelet-domain coefficients are quantized with the same step size, which grows as the bit-rate falls, so serious distortion and false contours may be introduced.
In this paper, the entropy of each sub-band family is estimated with the LOCO-I MED prediction algorithm of [5] while the coefficients are indexed, as shown in Fig. 2. The number of output codes of each sub-band family is allocated according to the compression-rate requirement, and codes are truncated from the high to the low bit-planes to achieve rate control.
Figure 3. The flowchart of the scanning process
The forecast template is shown in Fig. 4:

    c b
    a x

Figure 4. The forecast template and linear index order

The prediction is:

    x_hat = min(a, b)    if c >= max(a, b)
            max(a, b)    if c <= min(a, b)        (5)
            a + b - c    otherwise
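The MED predictor of (5) is small enough to state directly (a sketch; LOCO-I/JPEG-LS uses the same rule, with a = left, b = above, c = above-left neighbor):

```python
def med_predict(a, b, c):
    """Median edge detector of eq. (5)."""
    if c >= max(a, b):
        return min(a, b)   # c above both neighbors: horizontal edge
    if c <= min(a, b):
        return max(a, b)   # c below both neighbors: vertical edge
    return a + b - c       # smooth region: planar prediction

print(med_predict(10, 20, 25))  # 10
print(med_predict(10, 20, 5))   # 20
print(med_predict(10, 20, 15))  # 15
```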
The forecast expression is given in (5). Coefficients at the boundary are predicted from the previous one. A FamilyBlock is composed of three SubBands, and H_s is the prediction entropy of SubBand s, estimated as the sum of absolute prediction residuals:

    H_s = sum_i | x_i - x_hat_i |        (6)

where i ∈ SubBand s, s = 1, 2, 3, and x_i and x_hat_i are the actual and predicted values, respectively. Similarly, for a FamilyBlock,

    H_f = sum_s H_s        (7)

where f is the FamilyBlock index, f = 1, 2, ..., 12. W_f is the pre-allocated weight of each FamilyBlock, defined as W_f = H_f / H, where H is the total prediction entropy of the image. W_fs is the pre-allocated weight of SubBand s of FamilyBlock f:

    W_fs = H_fs / H        (8)

where H_fs is the prediction entropy of SubBand s of FamilyBlock f. Similarly, W_LL3 is the pre-allocated weight of the LL3 sub-band:

    W_LL3 = H_LL3 / H        (9)

The sum of all weights equals 1: W_LL3 + sum_f W_f = 1.
Let Rate be the compressed bit-rate and R x C the size of the image. Then the pre-allocated code streams of each SubBand are:

    Bit_fs  = Rate x R x C x W_fs
    Bit_LL3 = Rate x R x C x W_LL3        (10)

The bit-rate of each SubBand is controlled by (10), according to the channel bandwidth.
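The weight-based allocation of (8)-(10) can be sketched as follows. The entropy values below are made up for illustration, and H is taken as the total prediction entropy so that the weights sum to 1.

```python
def allocate_bits(rate, rows, cols, h_ll3, h_sub):
    """h_sub maps (family, subband) -> prediction entropy H_fs.
    Returns the bit budget of LL3 and of every sub-band, per eq. (10)."""
    H = h_ll3 + sum(h_sub.values())           # total prediction entropy
    total_bits = rate * rows * cols
    budget = {'LL3': total_bits * h_ll3 / H}  # Bit_LL3 = Rate*R*C*W_LL3
    for key, h_fs in h_sub.items():
        budget[key] = total_bits * h_fs / H   # Bit_fs = Rate*R*C*W_fs
    return budget

# made-up entropies for two family blocks of three sub-bands each
h_sub = {(f, s): 10.0 * f + s for f in (1, 2) for s in (1, 2, 3)}
budget = allocate_bits(0.5, 512, 512, 40.0, h_sub)
```

Because the weights sum to 1, the per-sub-band budgets always add up to the overall target Rate x R x C, which is exactly the rate-control property (10) is meant to guarantee.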
VI. EXPERIMENTAL RESULTS
The standard images Lena, Goldhill, Aerial, and the Fukushima nuclear plant (before and after the disaster) are chosen as experimental images. The size of the images is 512*512*8 bits. The images are decomposed with a 3-level DWT in MATLAB R2008. The computer CPU is an Intel Pentium Dual E2160 at 1.8 GHz, and the memory size is 1 GB. The PSNR of the Goldhill reconstruction compressed by fast SPIHT, SPIHT, NLS and CCSDS is shown in Fig. 5, and the PSNR of reconstructed images compressed by fast SPIHT at different bit-rates is shown in Fig. 6.
Fast SPIHT processes 8 bit-planes at the same time, each producing the three lists, so the code stream must be reordered. The output streams are adjusted by an ordering step to match the sequence of SPIHT, so the decompression process is the same as SPIHT's and the compression ratio can be controlled. The time cost of SPIHT encoding at different bit-rates is shown in Fig. 7, and the time cost and PSNR of fast SPIHT encoding at a bit-rate of 1 bpp are shown in Table II.
Figure 5. PSNR of the Goldhill reconstruction compressed by different methods
Figure 6. PSNR of reconstructed images compressed by fast SPIHT at different bit-rates
Figure 7. Time cost of SPIHT encoding at different bit-rates
TABLE II. RESULTS OF THE FAST SPIHT ALGORITHM (BIT-RATE = 1 BPP)
Goldhill reconstruction: PSNR = 32.83 dB
Aerial reconstruction: PSNR = 28.65 dB
Fukushima nuclear plant (before disaster) reconstruction: PSNR = 30.59 dB
Fukushima nuclear plant (after disaster) reconstruction: PSNR = 31.84 dB
Figure 8. Reconstructed images compressed by fast SPIHT, bit-rate = 0.5 bpp
The Goldhill, Aerial and Fukushima nuclear plant (before and after disaster) reconstructions at a bit-rate of 0.5 bpp are displayed in Fig. 8. By comparing the images of the nuclear plant before and after the disaster, the damage to the plant can be seen clearly. The image of the plant after the disaster is smoother than the one before, so its PSNR is higher at the same bit-rate.
The experimental results show that the PSNR of reconstructed images encoded by fast SPIHT increases by 0.3 to 0.9 dB, and the encoding time is only 1/4 to 1/6 of the SPIHT encoding process. Implemented in hardware, the speed can be further improved by virtue of parallelism and pipelining.
VII. CONCLUSIONS
In this paper, the error resilience and compression speed are improved. Compression algorithms based on set partitioning in hierarchical trees scan coefficients serially, and the encoding speed is limited by the repeated scans. A new fast SPIHT algorithm is proposed that processes all bit-planes simultaneously, so the speed depends only on the image resolution. The coefficients are divided into many family blocks, stored in separate block RAMs. The algorithm is suitable for a fast, simple hardware implementation and can be used in aerial image compression systems, which require high speed and high error resilience.
ACKNOWLEDGMENT
I wish to thank my advisor, Dr. Longxu Jin, for his guidance and support. Furthermore, the project has been accomplished with the help of engineer Ke Zhang and Dr. Yinhua Wu. Last, I am deeply thankful to my family, who always give me a lot of encouragement in my research and life.
REFERENCES
[1] A. Said and W. A. Pearlman, "A new fast and efficient image codec based on set partitioning in hierarchical trees," IEEE Transactions on Circuits and Systems for Video Technology, vol. 6, no. 3, pp. 243-250, 1996.
[2] F. W. Wheeler and W. A. Pearlman, "SPIHT image compression without lists," IEEE Int. Conf. on Acoustics, Speech and Signal Processing (ICASSP 2000), Istanbul: IEEE, 2000, pp. 2047-2050.
[3] J. M. Shapiro, "Embedded image coding using zerotrees of wavelet coefficients," IEEE Trans. on Signal Processing, vol. 41, pp. 3445-3462, December 1993.
[4] Yong Xu, Zhiyong Xu, Qiheng Zhang, Rujin Zhao, "Low complexity image compression scheme for hardware implementation," Optics and Precision Engineering, vol. 17, no. 9, pp. 2262-2268, 2009.
[5] J. H. Zhao, W. J. Sun, Z. Meng, Z. H. Hao, "Wavelet transform characteristics and compression coding of remote sensing images," Optics and Precision Engineering, vol. 12, no. 2, pp. 205-210, 2004.
[6] H. L. Xu, S. H. Zhong, "Image compression algorithm of SPIHT based on block-tree," Journal of Hunan Institute of Engineering, vol. 19, no. 1, pp. 58-61, 2009.
[7] Feng Zhao, Dongfeng Yuan, Haixia Zhang, Tingfa Xu, "Multi-DSP real-time parallel processing system for image compression," Optics and Precision Engineering, vol. 15, no. 9, pp. 1451-1455, 2007.
[8] Xucheng Xue, Shuyan Zhang, Yongfei Guo, "Implementation of the No List SPIHT image compression algorithm using FPGA," Micro Computer Information, vol. 24, no. 62, pp. 219-220, 2008.
[9] Sen Ma, Yuanyuan Shang, Weigong Zhang, Yong Guan, Qian Song, Dawei Xu, "Design of panoramic mosaic camera based on FPGA using optimal mosaic algorithm," Journal of Computers, vol. 6, no. 7, pp. 1378-1385, 2011.
[10] Zhaohui Zeng, Yajun Liu, "Construction of high performance balanced symmetric multifilter banks and application in image processing," Journal of Computers, vol. 5, no. 7, pp. 1038-1045, 2010.
[11] Yinhua Wu, Longxu Jin, Hongjiang Tao, "An improved fast parallel SPIHT algorithm and its FPGA implementation," The 2010 2nd International Conference on Future Computer and Communication, 2010, vol. 1, pp. 191-195.
[12] Nileshsingh V. Thaker, O. G. Kakde, "Color image compression with modified fractal coding on spiral architecture," Journal of Multimedia, vol. 2, no. 4, pp. 55-66, 2007.
[13] Zhenbing Liu, Jianguo Liu, Guoyou Wang, "An arbitrary-length and multiplierless DCT algorithm and systolic implementation," Journal of Computers, vol. 5, no. 5, pp. 725-732, 2010.
[14] Hualiang Zhu, Chundi Xiu, Dongkai Yang, "An improved SPIHT algorithm based on wavelet coefficient blocks for image coding," ICCASM 2010, vol. 2, pp. 646-649, 2010.
[15] Li Wern Chew, Li Minn Ang, Kah Phooi Seng, "Reduced memory SPIHT coding using wavelet transform with post-processing," IHMSC 2009, vol. 1, pp. 371-374, 2009.
[16] Kong Fan-qiang, Li Yun-song, Wang Ke-yan, Zhuang Huai-yu, "An adaptive rate control algorithm for JPEG2000 based on rate pre-allocation," Journal of Electronics & Information Technology, vol. 31, no. 1, pp. 66-70, 2009.
[17] Du Lie-bo, Xiao Xue-min, Luo Wu-sheng, Lu Hai-bao, "Quantification removing for satellite on-board remote image JPEG2000 compression algorithm," Optics and Precision Engineering, vol. 17, no. 3, pp. 690-694, 2009.
[18] Wang Ren-long, Hao Yan-ling, Liu Ying, "Embedded block wavelet coding method based on block bit-length," Optics and Precision Engineering, vol. 16, no. 7, pp. 1315-1322, 2009.
[19] Tian Bao-feng, Xu Shu-yan, Sun Rong-chun, Wang Xin, Yan De-jie, "A lossy compression algorithm of remote sensing image suited to space-borne application," Optics and Precision Engineering, vol. 14, no. 4, pp. 725-730, 2006.
[20] Lei Jie, Kong Fan-qiang, Wu Cheng-ke, Li Yun-song, "Hardware oriented rate control algorithm for JPEG2000 and its VLSI architecture design," Journal of Xidian University, vol. 35, no. 4, pp. 645-649, 2008.
[21] Tinku Acharya, Ping-Sing Tsai, JPEG2000 Standard for Image Compression, John Wiley & Sons, Inc., 2005.
[22] Zhang Xue-quan, Gu Xiao-dong, Sun Hui-xian, "Design and implementation of CCSDS-based onboard image compression unit using FPGA," Semiconductor Optoelectronics, vol. 30, no. 6, pp. 935-939, 2009.
[23] Limin Ren, "Web image retrieval in web pages," The 2010 2nd International Conference on Future Computer and Communication, 2010, vol. 1, pp. 26-31.
[24] T. B. Ma, "The research of remote sensing image compression based on the wavelet transform of SPIHT algorithm," Master Dissertation, 2008.
[25] Zhang Su-wen, Wang Li-li, Miao Dan-dan, "An improved embedded zerotree wavelets image coding algorithm," Infrared Technology, vol. 30, no. 9, pp. 541-545, 2008.
Ning Zhang is a doctoral candidate at the Graduate School of the Chinese Academy of Sciences, China. He was born in Shandong province of China in 1982. He received his B.Eng. degree in Communication Engineering from Jilin University, China, in 2006. He has been working and studying at the Institute of Optics, Fine Mechanics and Physics, Chinese Academy of Sciences, Changchun, China, since 2006. His research interests include image processing and hardware system design.
Longxu Jin, PhD, is a researcher. He was born in Jilin province of China in 1965. He received his B.Eng. degree in Machinery & Electronics Engineering from the Optics and Mechanics College of Changchun, China, in 1987, and his M.Eng. and PhD degrees in Electronic Engineering from the Institute of Optics, Fine Mechanics and Physics, Chinese Academy of Sciences, Changchun, China, in 1993 and 2003 respectively. He has been working as a researcher in the department of Space Optics of the Institute of Optics, Fine Mechanics and Physics, Chinese Academy of Sciences, Changchun, China, since 1993. His current research interests include digital image acquisition and processing, and intelligent control of space cameras.
Yinhua Wu is a doctoral candidate at the Graduate School of the Chinese Academy of Sciences, China. She was born in Jilin province of China in 1984. She received her B.Eng. degree in Electronic Engineering from Jilin University, China, in 2007. She has been working and studying at the Institute of Optics, Fine Mechanics and Physics, Chinese Academy of Sciences, Changchun, China, since 2007. Her research interests include video compression and transmission, and FPGA system design.
Ke Zhang was born in Shandong province of China in 1979. He received his B.Eng. degree in Electronic Engineering from Harbin Engineering University, China, in 2003, and his M.Eng. degree from Tianjin Polytechnic University, China, in 2006. He has been working at the Institute of Optics, Fine Mechanics and Physics, Chinese Academy of Sciences, Changchun, China, since 2006. Currently he is an engineer, and his research interests include space camera control and image acquisition.
3D Tracking and Positioning of Surgical
Instruments in Virtual Surgery Simulation
Zhaoliang Duan
School of Computer, Wuhan University, Wuhan 430072, China
Email: dzlwhu@gmail.com
Zhiyong Yuan, Xiangyun Liao, Weixin Si, Jianhui Zhao
School of Computer, Wuhan University, Wuhan 430072, China
Email: zhiyongyuan@whu.edu.cn, xyunliao@gmail.com, wxsics@gmail.com, jianhuizhao@whu.edu.cn
Abstract: 3D tracking and positioning of surgical instruments is an indispensable part of a virtual surgery training system, because it is the sole interface through which the trainee communicates with the virtual environment. A method for 3D tracking and positioning of surgical instruments based on stereoscopic vision is proposed. It can capture the spatial movements of a simulated surgical instrument in real time, and provide 6-degree-of-freedom information with an absolute error of less than 1 mm. The experimental results show that this 3D tracking and positioning of surgical instruments is highly accurate, easy to operate, and inexpensive. Combined with a force sensor and an embedded acquisition device, this 3D tracking and positioning method can be used as a measurement platform for physical parameters, realizing the measurement of soft-tissue parameters.

Index Terms: Virtual surgery simulation system, stereoscopic vision, surgical instruments, 3D tracking and positioning
I. INTRODUCTION
With advances in robotics and information technology,
computer graphics (CG) and virtual reality (VR) have
been increasingly applied to the field of medicine [1]. As
the cutting-edge interdisciplinary research field of
information and medical sciences, research on virtual
surgery simulation system has significant application
value for reducing surgery risks, cutting training cost and
protecting human health [2]. With the help of virtual
surgery training platform, trainee surgeons can skillfully
master the operations of surgical instruments, general
procedure of surgery and anatomy of diseased region or
organ.
The accurate displacement and force response of tissue
model is a key part of VR based surgery training
simulation system [3]. To meet this requirement, we must
model the non-linear heterogeneous nature of soft tissue.
The interaction of surgical instrument and virtual scene
decides the accuracy and effectiveness of displacement
measurement and force response.
In virtual surgery simulation, the interaction between
the surgical instrument and the virtual scene mainly
comprises construction of the surgical instrument,
collision detection, and rendering of the interaction and
simulation [4]. In order to vividly simulate the interaction
between the surgical instrument and the virtual organ
tissue, the instrument must be tracked and located
accurately in real time.
Currently, there have been some available three
dimensional (3D) trackers in the field of virtual reality.
According to their physical properties, they are roughly
classified into five subcategories: mechanical tracker [5],
magnetic tracker [6-8], ultrasonic tracker [9], optical
tracker [10-11] and hybrid tracker [12]. Some of them
can provide high positioning accuracy, such as [6] and
[10], and have been used in some medical applications.
However, these existing devices are very expensive and
can therefore only be deployed in a limited number of
medical centers and research institutes. A 3D surgical
instrument tracking and positioning method with a high
performance-to-price ratio is highly desirable for
computer-based virtual surgery simulation systems [13].
To achieve this goal, we present a method based on
stereoscopic vision for the 3D tracking and positioning of
surgical instruments. This method employs three cameras
to capture the motion images of a simulated surgical
instrument in real time. After a series of computer
processing steps, including
camera calibration, reconstruction of 3D coordinates of
markers on simulated surgical instruments and so on, we
can obtain the six degree of freedom information of
simulated surgical instruments, thereby positioning the
instrument. At the end of this paper, we apply the
presented method to accomplish interactive virtual organ
tissue deformation simulation. The experimental results
show that it is feasible and effective in virtual surgery
simulation systems.
The rest of the paper is organized as follows. Section 2
describes the methodology for tracking and positioning
surgical instruments. Section 3 gives the implementation
of the presented method in detail. Section 4 provides
experimental results, and section 5 concludes the paper.
Manuscript received July 1, 2011; revised September 13, 2011;
accepted October 1, 2011.
Project fully supported by a grant from the National Natural Science
Foundation of China (Grant No. 61070079).
Corresponding author: Zhiyong Yuan. Email: zhiyongyuan@whu.edu.cn
doi:10.4304/jmm.6.6.502-509
II. METHODOLOGY FOR 3D TRACKING AND POSITIONING
A. Binocular Vision System
As an ideal linear camera model, the pinhole model is
the basic imaging model in computer vision. Fig. 1
illustrates the abstract graph of the pinhole model, where
plane C is the imaging plane and point O is the camera
optical center. The coordinate system O_1UV in plane C
is the image coordinate system, and point O_1 is the
projection of point O onto plane C. Axes X_c and Y_c are
respectively parallel to axes u and v of the image
coordinate system. Axis Z_c is the camera optical axis,
perpendicular to plane C. The camera coordinate system
consists of point O and axes X_c, Y_c and Z_c, and |OO_1|
is the camera focal length f. To describe the position of
the camera and its surroundings, we adopt a reference
coordinate system O_w X_w Y_w Z_w, which is called the
world coordinate system.
For a point P in the world coordinate system, we can
obtain its imaging position p in the image as the
intersection of line PO with plane C.
The binocular vision system is the simplest stereoscopic
vision system, consisting of two cameras. As shown in
Fig. 2, P(x_w, y_w, z_w) is a point in the world coordinate
system, and its two image points in the image planes of
the two cameras are p_l(u_l, v_l) and p_r(u_r, v_r), called
conjugate points. The line through optical center O_l and
point p_l intersects the line through optical center O_r and
point p_r, and this intersection point is P. The 3D
coordinates of point P in the world coordinate system can
then be obtained from the internal and external parameters
of the two cameras.
Generally, a binocular vision system consists of five
modules: image acquisition, camera calibration, feature
extraction, stereo matching and 3D reconstruction [14].
B. Basic Principle
Our method utilizes cameras to recover the 3D
coordinates of two markers on the simulated surgical
instrument. To this end, each marker must be covered by
at least two cameras. If we use two cameras to track the
simulated surgical instrument, we can detect four feature
points (the image regions corresponding to the markers
within the motion images of the instrument) at a time. We
then classify these four feature points into two pairs of
identical points and calculate their image coordinates.
Together with the camera parameters, the 3D coordinates
of the two markers can be obtained through the least
squares method [15]. Fig. 3 shows the flowchart of the
presented method.
Considering that the virtual surgery simulation system is
extremely strict about precision, we employ three cameras
to capture the movement of the simulated surgical
instrument, so as to minimize the system error introduced
by image acquisition. The three cameras form three
camera groups, each consisting of two cameras. For each
marker, the three camera groups yield three sets of 3D
coordinates; by averaging them, we obtain the final, more
accurate 3D coordinates of the markers.
Figure 1. Pinhole model
Figure 2. Binocular vision system
Figure 3. Flowchart of our presented method
C. System Construction
The 3D tracking and positioning apparatus of surgical
instrument based on our proposed method consists of a
simulated surgical instrument, three cameras and a
computer. Fig. 4 illustrates the hardware distribution of
our developed 3D tracking and positioning apparatus.
The gray circular area is the active region of the
simulated surgical instrument. The angle formed by any
two cameras and the center of the gray circular area is
120°.
The cameras are Basler acA1300-30gc cameras with a
frame rate of 30 frames/sec and a highest resolution of
1092×962. The cameras are connected to the computer.
The main parameters of the Basler acA1300-30gc camera
are shown in Table I.
TABLE I.
THE MAIN PARAMETERS OF THE BASLER ACA1300-30GC CAMERA
Highest resolution: 1092×962
Optical size: 1/3 inch
Pixel size: 3.75×3.75 micrometers
Sensor type: CCD
Frame rate: 30 frames/sec
Data transmission: Network card
Two markers, spaced 70 mm apart, are deployed on the
simulated surgical instrument; the specific distribution of
the markers is shown in Fig. 5. On the actual simulated
surgical instrument, the main body is white, while the two
markers are black.
III. IMPLEMENTATION
A. Camera Calibration
Camera calibration is the most basic step in
stereoscopic vision [16]. Its purpose is to obtain the
camera parameters, i.e., the mapping relation between the
3D coordinates of a point P(x_w, y_w, z_w) in the world
coordinate system and the 2D coordinates of the point
p(x, y), the projection of P on the image plane of the
camera, in the image coordinate system. The mapping
relation can be described simply by equation (1):

$$z_c \begin{bmatrix} x \\ y \\ 1 \end{bmatrix}
 = \begin{bmatrix} m_{11} & m_{12} & m_{13} & m_{14} \\
                   m_{21} & m_{22} & m_{23} & m_{24} \\
                   m_{31} & m_{32} & m_{33} & m_{34} \end{bmatrix}
   \begin{bmatrix} x_w \\ y_w \\ z_w \\ 1 \end{bmatrix}
 = M \begin{bmatrix} x_w \\ y_w \\ z_w \\ 1 \end{bmatrix} \quad (1)$$

where z_c is the Z coordinate of point P in the camera
coordinate system, and M is the so-called projection
matrix determined by the camera parameters. Once we
obtain the camera parameters, we do not need to calculate
them again until the camera is moved. Generally speaking,
the detailed calibration process includes the following five
steps.
Step 1. Generation of planar calibration plate.
We adopt a regular 7×7 black-and-white checkerboard
as the pattern on the calibration plate, as shown in Fig. 6;
the size of each checker is 30×30 mm.
Step 2. Acquisition of calibration plate images.
We utilize multithreading and soft-trigger techniques to
synchronously capture calibration plate images. Fig. 7
gives two calibration plate images.
Figure 4. Hardware distribution of the 3D tracking and positioning
apparatus
Figure 5. Abstraction of surgical instrument
Figure 6. Planar calibration plate
Step 3. Corner detection.
In this paper, we employ the function
cvFindChessboardCorners() from the OpenCV library to
detect the corners of a calibration plate image [17]. For a
pixel (x, y) of an input image I(u, v), if R is bigger than a
given threshold, then pixel (x, y) is a corner:

$$R = \det M - k\,(\mathrm{trace}\,M)^2
  = \overline{I_x^2}\;\overline{I_y^2} - \left(\overline{I_x I_y}\right)^2
    - k\left(\overline{I_x^2} + \overline{I_y^2}\right)^2 \quad (2)$$

where k is a coefficient, generally 0.04, and I_x and I_y
are the first gray gradients:

$$I_x = \frac{\partial I}{\partial x}, \qquad I_y = \frac{\partial I}{\partial y} \quad (3)$$

M is a real symmetric matrix:

$$M = \begin{bmatrix} \overline{I_x^2} & \overline{I_x I_y} \\
                      \overline{I_x I_y} & \overline{I_y^2} \end{bmatrix} \quad (4)$$

det M is the determinant of M, trace M is the trace of M,
and the overline denotes the Gaussian smoothing operator.
After that, cvFindCornerSubPix() is used to get more
accurate image coordinates of the corners. Fig. 8 shows
the corner detection results for the calibration plate
images in Fig. 7.
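The corner response of equations (2)-(4) can be checked numerically. The following sketch computes R for a single pixel from already-smoothed gradient products; the gradient values are invented for illustration:

```python
# Corner response of equation (2): R = det(M) - k * (trace M)^2,
# where M is built from Gaussian-smoothed gradient products (equation 4).
# The gradient values below are made up for illustration.

def corner_response(ixx, ixy, iyy, k=0.04):
    """ixx, iyy, ixy: smoothed Ix^2, Iy^2 and Ix*Iy at one pixel."""
    det_m = ixx * iyy - ixy * ixy        # det(M) of equation (4)
    trace_m = ixx + iyy                  # trace(M)
    return det_m - k * trace_m * trace_m # equation (2)

r_corner = corner_response(100.0, 0.0, 100.0)  # strong gradients both ways
r_edge = corner_response(100.0, 0.0, 0.0)      # gradient in one direction only
print(r_corner, r_edge)  # the corner-like pixel scores far higher
```

A pixel with strong gradients in both directions scores a large positive R, while an edge-like pixel scores at or below zero, which is exactly why thresholding R isolates corners.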
Step 4. Corner matching.
In practice, the distributions of corners in different
rectangular arrays do not correspond one-to-one, so
corner matching is indispensable. In our paper, we design
two basic transformation functions operating on the
rectangular corner array: a clockwise rotation function and
a horizontal flip function. Different combinations of these
two functions can achieve all transformations of the
rectangular array needed in the experiment.
Step 5. Calculation of camera parameters.
For calculating the camera parameters, the two
relatively popular algorithms are Tsai's two-step method
[18] and Zhang's algorithm [19]; both are highly accurate
and robust. Compared with Tsai's two-step method,
Zhang's algorithm expects the camera to capture the
calibration plate from different viewpoints, but in our
application the camera and the calibration plate are fixed
and cannot be moved. In this situation, we choose Tsai's
two-step method to calculate the camera parameters.
The calibration results of the camera capturing Fig. 7(a)
are shown in TABLE II.
TABLE II.
CALIBRATION RESULTS
Focal length f (mm): 5.130550
Radial distortion coefficient kappa (1/m²): 4.680239e-003
Translation vector T (mm): 3.753.75
Scale factor s: 1.000000
Optical center coordinates: C_x = 646.000000, C_y = 482.000000
Rotation matrix R:
(0.629319 0.777092 0.009271)
(0.417971 -0.328383 -0.847033)
(-0.655178 0.536929 -0.531459)
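With the camera parameters in hand, equation (1) can be applied directly. The sketch below (with an invented 3×4 projection matrix, not the calibrated values above) maps a world point to image coordinates by dividing out the z_c factor:

```python
# Sketch of the pinhole projection in equation (1):
# z_c * (x, y, 1)^T = M * (xw, yw, zw, 1)^T, so the image point is the
# first two components divided by the third. The matrix is illustrative.

def project(M, pw):
    """Project a 3D world point pw = (xw, yw, zw) with a 3x4 projection matrix M."""
    xw, yw, zw = pw
    hom = (xw, yw, zw, 1.0)  # homogeneous world coordinates
    u, v, zc = (sum(M[r][c] * hom[c] for c in range(4)) for r in range(3))
    return u / zc, v / zc    # divide by z_c to get image coordinates

# Identity-like projection with a hypothetical focal scale of 2:
M = [[2.0, 0.0, 0.0, 0.0],
     [0.0, 2.0, 0.0, 0.0],
     [0.0, 0.0, 1.0, 0.0]]
x, y = project(M, (1.0, 2.0, 4.0))
print(x, y)  # the point (1, 2, 4) maps to (0.5, 1.0)
```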
B. Reconstruction of 3D coordinates of markers
In order to recover the 3D coordinates of the two
markers on the simulated surgical instrument, we first
need to extract the corresponding feature points and then
match the feature points from the three cameras to form
identical points. After that, the least squares method is
used to calculate the 3D coordinates of the markers.
Picture preprocessing
Owing to illumination effects, shadows of the simulated
surgical instrument and of the operator's hands inevitably
appear in the motion images of the instrument. Besides,
noise is also inevitably introduced during camera imaging.
Therefore, image preprocessing is quite necessary.
For any motion image of the simulated surgical
instrument, this process mainly includes the following
steps.
Step 1. Convert the RGB image into a gray image; the
conversion formula is:

$$\mathrm{Gray} = 0.299R + 0.587G + 0.114B \quad (5)$$

Step 2. Subtract the captured gray image from the
corresponding background image to remove the
background, which can be expressed by the following
formula:

$$I_{new} = \max\left(0,\ I_{bkg} - I_{old}\right) \quad (6)$$

where I_old is a captured gray image, I_bkg is the
background image, and I_new is the image processed
according to formula (6).
(a) (b)
Figure 8. Corner detection results
(a) (b)
Figure 7. Captured calibration plate images
Step 3. Apply a median filter to the residual image for
noise removal, which can be expressed by the following
formula:

$$\hat{I}(x, y) = \underset{(i, j) \in W_{xy}}{\mathrm{median}} \{ I(i, j) \} \quad (7)$$

where I is the input image and \hat{I} is the
median-filtered image. As formula (7) shows, to compute
the median-filtered value of a pixel, we sort the pixels in
the window W_xy and take the middle value as the result.
Step 4. Utilize a threshold segmentation method based
on statistics to complete initial shadow removal. The
threshold value is set to 101 through statistics.
Fig. 9 shows the result of image preprocessing.
Although most of the background and shadows have been
removed, there are still redundant areas.
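Preprocessing steps 1-4 can be sketched on toy data as follows; the pixel values are invented, and only the threshold of 101 comes from the text above:

```python
# Sketch of preprocessing steps 1-4 on tiny toy data (values are invented).

def rgb_to_gray(r, g, b):
    # Step 1, equation (5): standard luma weights
    return 0.299 * r + 0.587 * g + 0.114 * b

def subtract_background(old, bkg):
    # Step 2, equation (6): I_new = max(0, I_bkg - I_old), pixel-wise
    return [[max(0, b - o) for o, b in zip(row_o, row_b)]
            for row_o, row_b in zip(old, bkg)]

def median3x3(img, x, y):
    # Step 3, equation (7): median over the 3x3 window W_xy centred at (x, y)
    window = sorted(img[y + dy][x + dx] for dy in (-1, 0, 1) for dx in (-1, 0, 1))
    return window[4]

def threshold(img, t=101):
    # Step 4: statistics-based threshold segmentation
    return [[255 if v > t else 0 for v in row] for row in img]

print(subtract_background([[10, 20]], [[200, 25]]))  # [[190, 5]]
```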
Feature point extraction and stereo matching
The purpose of feature point extraction is to further
remove redundant areas and finally obtain the
two-dimensional (2D) image coordinates of each marker
by calculating the barycentre of the corresponding feature
point.
Generally speaking, feature points in an image obey a
Gaussian distribution, so their gray histograms exhibit
Gaussian peaks [20]. The sum of each row is shown in
Fig. 10.
Assume the top left corner of the image is the origin, the
width direction is the X axis and the height direction is
the Y axis. The integral projection along the Y axis, i.e.,
the sum of the gray values of all pixels in a row, is:

$$y(j) = \sum_{i=0}^{w} I(i, j) \quad (8)$$

where j denotes the j-th row of the image and w is the
width of the image.
The integral projection of Fig. 9(b) on the Y axis forms
a blue curve. The two highest peaks represent the
projections of the two feature points on the Y axis; the
other areas correspond to the projections of redundant
areas. If we set a proper threshold value, it is quite easy to
isolate the projections of the feature points on the Y axis;
in the same way, we can also get their projections on the
X axis. Through experiments, we find that a threshold
value of 2000 works well. In this way, we can determine
the feature points. Finally, we take the coordinates of the
barycentre of each feature point as its 2D image
coordinates. The extraction results of the feature points in
Fig. 9(b) are shown as the crosses in Fig. 11.
During the actual surgery, marker 1, as shown in Fig.
11, always moves above marker 2. For any image, the
feature point with the larger Y coordinate is therefore the
image region of marker 1, and the other one is the image
region of marker 2. In this way, the extracted feature
points are simply matched.
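A minimal sketch of the integral projection in equation (8) and the threshold-based isolation of feature-point rows (the toy image is invented; the 2000 threshold is the experimentally found value, scaled down here):

```python
# Integral projection along the Y axis (equation 8) and threshold-based
# isolation of rows that belong to feature points. Toy image, invented values.

def integral_projection_y(img):
    """y(j) = sum over i of I(i, j): one gray-value sum per row."""
    return [sum(row) for row in img]

def feature_rows(proj, threshold):
    """Indices of rows whose projection exceeds the threshold."""
    return [j for j, s in enumerate(proj) if s > threshold]

# Two bright horizontal bands (the markers) on a dark background:
img = [[0, 0, 0, 0],
       [90, 80, 90, 85],
       [0, 0, 0, 0],
       [70, 95, 80, 90]]
proj = integral_projection_y(img)
print(proj)                     # [0, 345, 0, 335]
print(feature_rows(proj, 300))  # [1, 3]
```

Repeating the same projection along the X axis bounds each feature point in both directions, after which its barycentre gives the 2D marker coordinate.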
Calculation of 3D coordinates of markers
Suppose the feature points of marker 1, Q(x_w, y_w, z_w),
are q_1(x_1, y_1), q_2(x_2, y_2) and q_3(x_3, y_3) in the
images taken by the three cameras C_1, C_2 and C_3,
respectively. After camera calibration, we know the
corresponding three projection matrices M_1, M_2 and
M_3. We can obtain the following equation:

$$z_{ci} \begin{bmatrix} x_i \\ y_i \\ 1 \end{bmatrix}
 = \begin{bmatrix} m_{11}^i & m_{12}^i & m_{13}^i & m_{14}^i \\
                   m_{21}^i & m_{22}^i & m_{23}^i & m_{24}^i \\
                   m_{31}^i & m_{32}^i & m_{33}^i & m_{34}^i \end{bmatrix}
   \begin{bmatrix} x_w \\ y_w \\ z_w \\ 1 \end{bmatrix}
 = M_i \begin{bmatrix} x_w \\ y_w \\ z_w \\ 1 \end{bmatrix} \quad (9)$$
Figure 10. The integral projection of Fig. 9(b) on Y axis
(a) (b)
Figure 9. The original motion image of simulated surgical instrument
and its image preprocessing result
Figure 11. The extracted feature points in Fig. 9(b)
For the camera group consisting of C_1 and C_2, we set
i = 1, 2 and let x_w^{12}, y_w^{12}, z_w^{12} take the
places of x_w, y_w, z_w, respectively. Thus, we get the
following linear system:

$$\begin{aligned}
(x_1 m_{31}^1 - m_{11}^1)\,x_w^{12} + (x_1 m_{32}^1 - m_{12}^1)\,y_w^{12} + (x_1 m_{33}^1 - m_{13}^1)\,z_w^{12} &= m_{14}^1 - x_1 m_{34}^1 \\
(y_1 m_{31}^1 - m_{21}^1)\,x_w^{12} + (y_1 m_{32}^1 - m_{22}^1)\,y_w^{12} + (y_1 m_{33}^1 - m_{23}^1)\,z_w^{12} &= m_{24}^1 - y_1 m_{34}^1 \\
(x_2 m_{31}^2 - m_{11}^2)\,x_w^{12} + (x_2 m_{32}^2 - m_{12}^2)\,y_w^{12} + (x_2 m_{33}^2 - m_{13}^2)\,z_w^{12} &= m_{14}^2 - x_2 m_{34}^2 \\
(y_2 m_{31}^2 - m_{21}^2)\,x_w^{12} + (y_2 m_{32}^2 - m_{22}^2)\,y_w^{12} + (y_2 m_{33}^2 - m_{23}^2)\,z_w^{12} &= m_{24}^2 - y_2 m_{34}^2
\end{aligned} \quad (10)$$

The linear system above includes four equations in three
unknowns x_w^{12}, y_w^{12} and z_w^{12}, and we solve
it with the least squares method. In the same way, for the
camera group consisting of C_1 and C_3 we get x_w^{13},
y_w^{13} and z_w^{13}, and for the camera group
consisting of C_2 and C_3 we get x_w^{23}, y_w^{23} and
z_w^{23}. Finally, the 3D coordinates of marker 1 are
calculated as follows:

$$x_w = \frac{x_w^{12} + x_w^{13} + x_w^{23}}{3} \quad (11)$$

$$y_w = \frac{y_w^{12} + y_w^{13} + y_w^{23}}{3} \quad (12)$$

$$z_w = \frac{z_w^{12} + z_w^{13} + z_w^{23}}{3} \quad (13)$$

In the same way, we can also obtain the 3D coordinates
of marker 2. Thus, we complete the 3D tracking and
positioning of the simulated surgical instrument.
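The two-camera least squares solve of equation (10) can be sketched via the normal equations A^T A w = A^T b. The projection matrices below are illustrative toy cameras, not our calibrated ones:

```python
# Least squares solve of the 4x3 linear system in equation (10) via the
# normal equations (A^T A) w = A^T b. Projection matrices are illustrative.

def solve3(a, b):
    """Solve a 3x3 linear system a w = b by Cramer's rule."""
    def det(m):
        return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
                - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
                + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))
    d = det(a)
    return [det([[b[i] if j == k else a[i][j] for j in range(3)]
                 for i in range(3)]) / d for k in range(3)]

def triangulate(m1, m2, p1, p2):
    """Least squares 3D point from two 3x4 projection matrices and image points."""
    rows, rhs = [], []
    for m, (x, y) in ((m1, p1), (m2, p2)):
        # Two equations per camera, exactly as in equation (10)
        rows.append([x * m[2][c] - m[0][c] for c in range(3)])
        rhs.append(m[0][3] - x * m[2][3])
        rows.append([y * m[2][c] - m[1][c] for c in range(3)])
        rhs.append(m[1][3] - y * m[2][3])
    # Normal equations give the least squares solution of the 4x3 system
    ata = [[sum(r[i] * r[j] for r in rows) for j in range(3)] for i in range(3)]
    atb = [sum(r[i] * v for r, v in zip(rows, rhs)) for i in range(3)]
    return solve3(ata, atb)

# Toy cameras: m1 projects (x, y, z) -> (x/z, y/z); m2 is shifted by 1 in x.
m1 = [[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, 0]]
m2 = [[1, 0, 0, -1], [0, 1, 0, 0], [0, 0, 1, 0]]
w = triangulate(m1, m2, (0.5, 0.5), (0.0, 0.5))
print(w)  # recovers the world point (1, 1, 2)
```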
IV. EXPERIMENTAL RESULTS
The detailed configuration of our experimental platform
is as follows. Computer: Intel Core Duo CPU @ 2.66 GHz,
2 GB memory; Basler acA1300-30gc camera: 1092×962
resolution, 30 FPS; Software: Microsoft Visual C++ .NET
2005.
The 3D tracking and positioning equipment is shown in
Fig. 12. The three cameras are connected to the computer
and transmit the captured pictures to it. We then process
the captured pictures and calculate the 3D coordinates of
the marked points.
The absolute error between the corners' calculated 3D
coordinates and their theoretical 3D coordinates is shown
in Fig. 13.
According to Fig. 13, the average absolute error of the
reconstructed corners' 3D coordinates is 33.40/(36×3) =
0.31 mm. The error mainly derives from the following
three aspects:
Production and placing of the calibration plate
The error of the corner detection algorithm
The error of the camera calibration algorithm
As the absolute error of the reconstructed corners is less
than 1 mm, it meets the required precision of the applied
field.
The image coordinates and space coordinates of the
surgical instrument are shown in TABLE III.
TABLE III.
THE IMAGE COORDINATES AND SPACE COORDINATES OF THE SURGICAL INSTRUMENT
Image coordinates (camera 1 | camera 2 | camera 3) -> Space coordinate
(510.01, 226.23) | (285.76, 407.24) | (952.65, 337.45) -> (43.79, -38.61, 92.68)
(477.09, 380.54) | (346.31, 595.17) | (945.31, 474.26) -> (64.08, -29.52, 26.08)
(663.15, 226.67) | (258.64, 336.43) | (827.08, 417.54) -> (-19.3, -27.34, 92.39)
(625.61, 384.06) | (317.18, 508.61) | (830.26, 555.04) -> (-1.14, -16.55, 26.32)
(674.41, 170.26) | (259.24, 262.90) | (804.00, 342.18) -> (-24.80, -22.31, 117.62)
(634.01, 333.78) | (321.10, 438.32) | (995.31, 164.45) -> (-5.91, -11.83, 51.98)
(456.09, 104.61) | (273.32, 277.64) | (997.67, 301.41) -> (60.87, -42.92, 144.09)
(401.13, 258.51) | (350.13, 488.40) | (731.05, 129.39) -> (89.87, -34.88, 82.11)
(429.32, 190.17) | (556.72, 239.15) | (757.01, 257.53) -> (63.21, 52.81, 140.14)
(366.14, 366.27) | (621.82, 436.19) | (647.91, 239.28) -> (92.44, 61.47, 78.27)
(553.67, 261.04) | (524.30, 268.42) | (952.08, 337.12) -> (21.52, 61.07, 114.55)
(507.17, 438.30) | (580.19, 447.20) | (945.31, 474.16) -> (43.50, 71.32, 50.19)
As mentioned above, the actual distance between the
two markers on the simulated surgical instrument is fixed
at 70 mm. Fig. 14 provides the distances calculated from
the 3D coordinates of the two markers, and Fig. 15
provides the absolute error and standard deviation
calculated from the 3D coordinates of the two markers.
Figure 12. 3D tracking and positioning equipment
Figure 13. Interactive organ tissue deformation simulation
As can be seen, the standard deviation of our developed
apparatus is 0.371632 mm, less than 1 mm, which fully
satisfies the precision requirements of current virtual
surgery simulation systems.
There mainly exist three types of errors in our
developed apparatus. The first is the algorithm error,
which comes from the implementation of the algorithms.
The second is the error in the generation and placement of
the planar calibration plate, and the third is the error from
manually fabricating the simulated surgical instrument.
Overall, our developed apparatus is precise, and the major
source of error is fabrication error, which can be decreased
by using a professionally made calibration plate and a
machined simulated surgical instrument.
V. CONCLUSION AND FUTURE WORK
In this paper, we present a method based on
stereoscopic vision to construct a 3D tracking and
positioning apparatus for surgical instruments. It consists
of a simulated surgical instrument, three cameras and a
computer. The three cameras capture the motion images of
the simulated surgical instrument in real time. After a
series of processing steps, we obtain the
six-degree-of-freedom information of the simulated
surgical instrument with an absolute error of less than
1 mm, thereby positioning the instrument. We then
analyze the sources of error and integrate the developed
apparatus into soft tissue deformation simulation in virtual
surgery. The experimental results show that the proposed
method is highly accurate, easily operated and
inexpensive. In future work, we will utilize this tracking
and positioning method and equipment to capture soft
tissue warping images and calculate soft tissue parameters
using the least squares method. We will also refine our
tracking and positioning method and construct a parameter
measurement platform, proving its effectiveness.
ACKNOWLEDGMENTS
This work was fully supported by a grant from the
National Natural Science Foundation of China (Grant No.
61070079).
REFERENCES
[1] Bassma Ghali. Algorithms for Nonlinear Finite
Element-based Modeling of Soft-tissue Deformation and
Cutting. Electrical & Computer Engineering, Hamilton,
Ontario, Canada, 2008.
[2] Cagatay Basdogan, Mert Sedef, Matthias Harders, et al.
VR-based simulators for training in minimally invasive
surgery. IEEE Computer Graphics and Applications,
2007, vol. 27, no. 2, pp. 54-66.
[3] Igor Peterlík, Mert Sedef, Cagatay Basdogan, Ludek
Matyska. Real-time visio-haptic interaction with static
soft tissue models having geometric and material
nonlinearity. Computers & Graphics, 2010, vol. 34, no. 1,
pp. 43-54.
[4] Florian Schulze, Katja Bühler, André Neubauer, Armin
Kanitsar, Leslie Holton, Stefan Wolfsberger. Intra-
operative virtual endoscopy for image guided endonasal
transsphenoidal pituitary surgery. International Journal of
Computer Assisted Radiology and Surgery, 2010, vol. 5, no.
2, pp. 143-154.
[5] Changmok Choi, Jungsik Kim, Hyonyung Han, Bummo
Ahn, Jung Kim. Graphic and haptic modeling of the
oesophagus for VR-Based medical simulation,
International Journal of Medical Robotics Computer
Assisted Surgery, 2009, vol. 5, no. 3, pp. 257-266.
[6] J. C. Krieg. Motion tracking: polhemus technology.
Virtual Reality Systems, mar. 1993, vol. 1, no. 1, pp. 32-36.
[7] A.L. Trejos, R.V. Patel, M.D. Naish and C.M. Schlachta.
Design of a sensorized instrument for skills assessment
and training in minimally invasive surgery. the 2nd
Biennial International Conference on Biomedical Robotics
and Biomechatronics, Scottsdale, Arizona, October 19-22,
2008: 965970.
[8] Yamaguchi S, Yoshida D, Kenmotsu H, Yasunaga T,
Konishi K, Ieiri S, Nakashima H, Tanoue K, Hashizume
M. Objective assessment of laparoscopic suturing skills
using a motion-tracking system. Surg Endosc, 2010, vol.
25, no. 3, pp. 771-775.
[9] J. Stoll, P. Novotny, P. Dupont and R. Howe. Real-time
3d ultrasound-based servoing of a surgical instrument.
IEEE International Conference on Robotics and
Automation, Orlando, FL, 2006: 613-618.
Figure 15. Absolute error and standard deviation
Figure 14. Standard distance and actual distance
[10] Nogueira JF Jr, Stamm AC, Lyra M. Novel compact
Laptop-based image-guidance system: preliminary study.
Laryngoscope, 2009, vol. 119, no. 3, pp. 576-579.
[11] Tansel Halic, Sinan Kockara, Coskun Bayrak, Richard
Rowe. Mixed reality simulation of rasping procedure in
artificial cervical disc replacement (ACDR) surgery, BMC
Bioinformatics 2010, vol. 11, no. Suppl 6, pp.S11.
[12] Foxlin E. Motion Tracking Requirements and Technologies.
In: K. Stanney (Ed.), Handbook of Virtual Environment
Technology, Lawrence Erlbaum Associates, 2002: 163-210.
[13] Tomikawa M, Hong J, Shiotani S, Tokunaga E, Konishi K,
Ieiri S, Tanoue K, Akahoshi T, Maehara Y, Hashizume
M, Real-time 3-dimensional virtual reality navigation
system with open MRI for breast-conserving surgery.
Journal of the American College of Surgeons, 2010, vol.
210, no. 6, pp. 927-933.
[14] Songde Ma, Zhengyou Zhang, Computer vision: The Basis
of Theoretical Calculations and Algorithms. Science Press,
Beijing, 2004.
[15] E. R. Davies. Machine Vision: Theory, Algorithms,
Practicalities. Morgan Kaufmann Publishers Inc., 3rd
Edition, 2004.
[16] Richard L. Burden, J. Douglas Faires. Numerical Analysis.
Brooks Cole. 9th edition, August 9, 2010.
[17] http://sourceforge.net/projects/opencvlibrary/.
[18] Roger Y. Tsai. A versatile camera calibration technique
for high-accuracy 3D machine vision metrology using
off-the-shelf TV cameras and lenses. IEEE Journal of
Robotics and Automation, August 1987, vol. 3, no. 4,
pp. 323-344.
[19] Zhengyou Zhang. A flexible new technique for camera
calibration. IEEE Transactions on Pattern Analysis and
Machine Intelligence, November 2000, vol. 22, no. 11,
pp.13301334.
[20] Xuan Yang, Ji Hong Pei, Wan Hai Yang. Real-time
detection and tracking of light point. Journal of Infrared
and Millimeter Waves, 2001, vol. 20, no. 4, pp. 279-282.
Zhaoliang Duan is currently a first-year Master's student
in the Computer School of Wuhan University, China. He
received his BS degree in computer science and
technology from Wuhan University in June 2011 and was
recommended for admission to Wuhan University to
continue his studies for the next two years.
His research interests include computer simulation and
virtual reality.
He has published 10 papers in related conference
proceedings and journals.
Zhiyong Yuan received the BS degree in Computer
Application and the MS degree in Signal and Information
Processing from Wuhan University, in 1986 and 1994,
respectively, and a Ph.D. degree in Control Theory and
Control Engineering from Huazhong University of
Science and Technology in 2008. He was an assistant
professor from 1994 to 1998 and has been an associate
professor at Wuhan University since 1999. During
2006-2007, he was a visiting professor at the Dept. of
Neurological Surgery, School of Medicine, University of
Pittsburgh, conducting research on computer-based
surgical simulation systems for training endoscopic
neurosurgeons.
His research interests include computer simulation and
virtual reality, embedded systems and IoT applications,
image processing and pattern recognition, and intelligent
healthcare systems.
He has been a reviewer for the Journal of X-Ray Science
and Technology, the Journal of Supercomputing, and the
Journal of Computer Science and Technology. He has
published over 60 papers in related conference
proceedings and journals.
Xiangyun Liao is currently a first-year Master's student
in the Computer School of Wuhan University, China. He
received his BS degree in computer science and
technology from Wuhan University in June 2011 and was
recommended for admission to Wuhan University to
continue his studies for the next two years.
His research interests include computer simulation,
virtual reality, image processing and pattern recognition,
etc.
He has published 7 papers in related conference
proceedings and journals.
Weixin Si is currently a first-year Master's student in the
Computer School of Wuhan University, China. He
received his BS degree in computer science and
technology from Wuhan University in June 2011 and was
recommended for admission to Wuhan University to
continue his studies for the next two years.
His research interests include computer simulation and
virtual reality, image processing and pattern recognition,
etc.
He has published 9 papers in related conference
proceedings and journals.
Jianhui Zhao received the B.Sc. degree
in Computer Engineering from Wuhan
University of Technology in 1997, the
M.Sc. degree in Computer Science from
Huazhong University of Science and
Technology in 2000, and the Ph.D.
degree in Computer Science from
Nanyang Technological University in
2004. From 2003 to 2006, he worked as a
Research Assistant/Associate in Hong Kong University of
Science and Technology. Currently he is working as an
Associate Professor in Computer School of Wuhan University.
His research interests include computer graphics and digital
image processing.
He has published over 50 papers in related conference
proceedings and journals.
Design of Image Security System Based on
Chaotic Maps Group
Feng Huang
College of Electrical & Information Engineering, Hunan Institute of Engineering, Xiangtan, China
Email: huangfeng25@126.com
Xilong Qu
College of Computer & Communication, Hunan Institute of Engineering, Xiangtan, China
Email: quxilong@126.com
Abstract—Images are used more and more widely in
people's lives today, and image security has become an
important issue. Encryption technologies are used to
ensure the security of images. Among them, SCAN
patterns are one of the effective tools for protecting
images: they generate a very large number of scanning
patterns of an image and then shuffle the positions of the
image pixels according to these patterns. The idea of a
chaotic maps group is similar to SCAN patterns. This
paper designs a new image security system based on a
chaotic maps group. It takes the different maps of the
chaotic maps group as patterns, and the key specifies
which chaotic map patterns are used. Simulation shows
that the image security system has a fast encryption speed
and a large enough key space, which means high security.
The design solves the limitation between the keys and the
size of the image when encrypting an image with a chaotic
map. At the same time, it also solves the problem of the
image size required by SCAN patterns.
Index Terms—image security, chaotic maps, encryption
I. INTRODUCTION
Over the past decade, digital images have been used
more and more widely. People take pictures with digital
cameras and upload them to Internet websites such as
Facebook. How to protect the security of images has thus
become a more and more important issue. Some
traditional encryption technologies such as DES, RSA,
etc. can be used to protect the security of digital images.
But because of some intrinsic features of images, such as
bulk data capacity and high correlation among adjacent
pixels, these technologies are not entirely suitable for
practical image encryption [1].
Some new technologies have been applied to image
encryption. Blowfish is a symmetric block cipher that can
be used as a drop-in replacement for DES or IDEA. In [2],
the Blowfish algorithm is used to encrypt images: the
plain image is divided into different blocks, which are
rearranged into a transformed image using a
transformation algorithm.
In [3], a new method for image encryption is presented.
Specific higher frequencies of DCT coefficients are taken
as the characteristic values, which are encrypted. The
resulting encrypted blocks are shuffled according to a
pseudorandom bit sequence.
Chaos can now be well used in image encryption [4, 5].
Chaos has characteristics that can be connected with the
confusion and diffusion properties in cryptography, such
as sensitive dependence on initial conditions and
parameters, broadband power spectrum, randomness in
the time domain, ergodicity, low dimensionality, etc.
In fact, the idea of using chaos for encryption can be
traced back to Shannon's classical paper [6], in which the
basic stretch-and-fold mechanism that could be used for
encryption was proposed. The stretch-and-fold mechanism
can generate a chaotic map. The image encryption uses
the geometric characteristics of the image: the process of
stretching and folding changes the distances among pixels
and shuffles their positions. It is, in fact, a process of
image permutation. Combined with a diffusion mechanism,
it can also change the values of the image pixels.
Some classic chaotic maps are used in image encryption, such as the cat map and the baker map. In [5] a symmetric image encryption scheme is obtained, and it is shown that the permutations induced by the baker map behave as typical random permutations. The cipher has good diffusion properties with respect to the plain image and the key, but the baker map has no simple formula and the key is limited by the size of the image. In [7,8] symmetric image encryption schemes based on three-dimensional chaotic maps are proposed; they employ a chaotic map to shuffle the positions of image pixels and use the logistic map to confuse the relationship between the cipher image and the plain image. In [9], a new invertible two-dimensional chaotic map, the line map, was proposed. An image encryption scheme based on the line map was developed; the scheme executes quickly and the key can be an integer of any length, satisfying high security requirements.
SCAN patterns [10,11] are one of the effective tools for protecting images. They generate a very large number of scanning paths, or space-filling curves. The image encryption is performed by a SCAN-based permutation of pixels together with a substitution rule, which form an iterated product cipher. One drawback, however, is that SCAN patterns require the plain image to be square with an even size.
510 JOURNAL OF MULTIMEDIA, VOL. 6, NO. 6, DECEMBER 2011
2011 ACADEMY PUBLISHER
doi:10.4304/jmm.6.6.510-517
This paper designs a new image security system based on a chaotic maps group. The idea of the chaotic maps group is similar to the SCAN patterns technology: it takes the different chaotic maps as encryption patterns, and the key selects among the different chaotic map patterns. Simulation shows that the image security system has a fast encryption speed and a key space large enough for high security. The design removes the limit that a single chaotic map imposes between the key and the image size, and it also removes the restriction that SCAN patterns place on the image size. Analysis shows that the image security system is safe.
II. THE KEY GENERATION

Generally people are used to taking six decimal digits as the key. Here the origin key is N0N1N2N3N4N5, with 0 <= Ni <= 9 (i = 0, 1, 2, 3, 4, 5). The process of key generation is as follows.

Firstly, the values of N0, N1, N2 and N3 are used to get the keys of permutation (key1 to key4). Table I gives the map between the value of Ni and a position in the origin key.

For example, if the origin key is 476328, then N0 = 4 and, from Table I, key1 is equal to the value of the 4th digit of the origin key, so key1 = 3. Since N1 = 7, key2 is equal to the value of the 2nd digit of the origin key, so key2 = 7. In the same way key3 = 7 and key4 = 2.

Secondly, the digits of the origin key left over become the part after a decimal point. The result is key5, which is used for the diffusion part. Here key5 = 0.468.
TABLE I. THE MAP BETWEEN N0, N1, N2, N3 AND THE POSITION IN THE ORIGIN KEY

Value of N0, N1, N2, N3    Position in origin key
0                          3
1                          6
2, 3                       5
4, 5                       4
6, 7                       2
8, 9                       1
The process of the key generation can be seen in
Figure 1.
Figure 1. The key generation.
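The key-generation procedure of this section can be sketched in a few lines of Python. The handling of positions that are selected more than once (and of the leftover digits that form key5) follows my reading of the worked example above, which the paper does not spell out:

```python
# Table I: value of Ni -> 1-based position in the six-digit origin key.
TABLE_I = {0: 3, 1: 6, 2: 5, 3: 5, 4: 4, 5: 4, 6: 2, 7: 2, 8: 1, 9: 1}

def generate_keys(origin_key: str):
    """Derive key1..key4 (permutation) and key5 (diffusion) from a
    six-digit origin key such as '476328'."""
    assert len(origin_key) == 6 and origin_key.isdigit()
    # N0..N3 each select a position via Table I.
    positions = [TABLE_I[int(origin_key[n])] for n in range(4)]
    keys = [int(origin_key[p - 1]) for p in positions]  # 1-based positions
    # Digits at positions never selected form the fractional key5.
    unused = [origin_key[p] for p in range(6) if (p + 1) not in positions]
    key5 = float('0.' + ''.join(unused))
    return keys, key5

keys, key5 = generate_keys('476328')
print(keys, key5)  # prints: [3, 7, 7, 2] 0.468 (the paper's worked example)
```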
III. THE IDEA OF CHAOTIC MAPS GROUP

This paper uses five new two-dimensional chaotic maps as the pattern group. All of the maps utilize an important characteristic of images: each pixel of a column of the image can be inserted between two adjacent pixels of a row of the image. The new chaotic maps can encrypt images by stretching and folding the image. The processes of the chaotic maps (patterns 0-9) are shown in Figure 2.

Figure 2. Chaotic maps group.

Suppose the dimension of a square image is N x N, where N is an integer. A(i, j) is the matrix of the square image, in which each element is the gray-level value of the pixel (i, j), i = 0, ..., N-1, j = 0, ..., N-1. L is a one-dimensional vector mapped from A.
Each chaotic map has two variants: a left map and a right map.
The left map algorithms
The first chaotic map. For even N,

L( sum_{k=0}^{i} (4k - 1) + j + 1 ) = A( floor((4i - 2 + j)/2), floor(j/2) ),    (1)

where i = 0, 1, ..., N/2 - 1 and j = 0, 1, ..., 4i - 2;

L( sum_{k=1}^{N/2} (4k - 1) + sum_{k=N/2}^{i} (4N - 1 - 4k) + 2N - 1 + j ) = A( floor((2N - 1 - j)/2), 2i + 1 - N + floor((j + 1)/2) ),    (2)

where i = N/2, N/2 + 1, ..., N - 1 and j = 0, 1, ..., 4N - 4i. For odd N,

L( sum_{k=0}^{i} (4k - 1) + j + 1 ) = A( floor((4i - 2 + j)/2), floor(j/2) ),    (3)

where i = 0, 1, ..., (N - 1)/2 - 1 and j = 0, 1, ..., 4i - 2;

L( sum_{k=1}^{(N-1)/2} (4k - 1) + sum_{k=(N-1)/2}^{i} (4N - 1 - 4k) + 2N - 3 + j ) = A( floor((2N - 1 - j)/2), 2i - N + floor((j + 1)/2) ),    (4)

where i = (N + 1)/2 - 1, (N + 1)/2 + 1, ..., N - 1 and j = 0, 1, ..., 4N - 4i.
The 2nd chaotic map:

L[(2N - j)j + 2(i - j) - 1] = A(i, j),    (5)

where i > j;

L[(2N - i)i + 2(j - i)] = A(i, j),    (6)

where i <= j.
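Taking this reading of Eqs. (5)-(6) at face value (the scanned source drops the minus signs, so the sign placement is inferred), a short Python sketch confirms the index function is a bijection onto 0..N*N-1, i.e. the left map of the 2nd pattern is a pure pixel permutation:

```python
# Reconstructed left map of the 2nd chaotic map:
#   L[(2N - j)j + 2(i - j) - 1] = A(i, j)  for i > j
#   L[(2N - i)i + 2(j - i)]     = A(i, j)  for i <= j
def map2_left_index(i: int, j: int, N: int) -> int:
    if i > j:
        return (2 * N - j) * j + 2 * (i - j) - 1
    return (2 * N - i) * i + 2 * (j - i)

def map2_left(A):
    """Stretch an N x N image (list of lists) into a line L of N*N pixels."""
    N = len(A)
    L = [None] * (N * N)
    for i in range(N):
        for j in range(N):
            L[map2_left_index(i, j, N)] = A[i][j]
    return L

# Every index is hit exactly once, so no pixel is lost or duplicated.
for N in (4, 8):
    idx = sorted(map2_left_index(i, j, N) for i in range(N) for j in range(N))
    assert idx == list(range(N * N))
```

The map interleaves each diagonal pixel (k, k) with the remaining pixels of row k and column k, which is the stretch-and-fold behavior described in Section III.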
The 3rd chaotic map:

L[(2N - 2j + 1)j + 2(i - j) - 1] = A(i, j),    (7)

where i + j < N and i > j;

L[(2N - 2i + 1)i + 2(j - i)] = A(i, j),    (8)

where i + j < N and i <= j;

L[N^2 - (2j + 1)(N - 1 - j) - 2(j - i)] = A(i, j),    (9)

where i + j >= N and i < j;

L[N^2 - (2i + 1)(N - 1 - i) - 2(i - j) - 1] = A(i, j),    (10)

where i + j >= N and i >= j.
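The same bijection check applies to this reading of Eqs. (7)-(10); here too the sign placement is inferred from the damaged source and validated only by the permutation property:

```python
# Reconstructed left map of the 3rd chaotic map, split over the two
# triangles i + j < N and i + j >= N.
def map3_left_index(i: int, j: int, N: int) -> int:
    if i + j < N:
        if i > j:
            return (2 * N - 2 * j + 1) * j + 2 * (i - j) - 1
        return (2 * N - 2 * i + 1) * i + 2 * (j - i)
    if i < j:
        return N * N - (2 * j + 1) * (N - 1 - j) - 2 * (j - i)
    return N * N - (2 * i + 1) * (N - 1 - i) - 2 * (i - j) - 1

# The four cases together hit every index 0..N*N-1 exactly once.
for N in (4, 6, 8):
    idx = sorted(map3_left_index(i, j, N) for i in range(N) for j in range(N))
    assert idx == list(range(N * N))
```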
The 4th chaotic map:

L[(N - j + 2)(N - j + 1)/2 + 2(j - i)] = A(i, j),    (11)

where i <= j and N - j is odd;

L[(N - j + 3)(N - j + 2)/2 + 2(j - i) + 1] = A(i, j),    (12)

where i <= j and N - j is even;

L[(N^2 - (2N + 1)j + j^2)/2 + 2(N - i - 1)] = A(i, j),    (13)

where j < i and j is even;

L[(N^2 - (2N - j)(j + 1))/2 + 2(N - i) - 1] = A(i, j),    (14)

where j < i and j is odd.
The 5th chaotic map:

L(j(j + 1)/2 + 2(j - i) + 1) = A(i, j),    (15)

where i <= j and N, j are both even or both odd;

L(j(j + 1)/2 + 2(j - i)) = A(i, j),    (16)

where i <= j, and N is even and j odd, or N is odd and j even;

L([N^2 - (2N + 1)j + j^2]/2 + 2(N - i) - 2) = A(i, j),    (17)

where j < i and j is even;

L([N^2 - (2N - j)(j + 1)]/2 + 2(N - i) - 1) = A(i, j),    (18)

where j < i and j is odd.
The right map algorithms

The right map is symmetric with the left map. First, a mirror of the image is made; the mirror image is described with the following formula:

A'(i, j) = A(i, N - 1 - j),    (19)

where A' is the matrix of the mirror image of the square image A. After obtaining the mirror image A' of A, the right map can be done with the algorithms of the left map.

Of course, in order to increase the efficiency of calculation of the right map permutation, the best way is to derive the algorithm of the right map directly.
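A minimal sketch of the mirror step of Eq. (19); the right map is then simply the corresponding left map applied to the mirrored image:

```python
# Eq. (19): A'(i, j) = A(i, N-1-j) -- mirror each row; any left-map
# permutation applied to A' then realizes the corresponding right map.
def mirror(A):
    return [row[::-1] for row in A]

A = [[0, 1, 2], [3, 4, 5], [6, 7, 8]]
assert mirror(A) == [[2, 1, 0], [5, 4, 3], [8, 7, 6]]
assert mirror(mirror(A)) == A  # mirroring is an involution
```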
Some of the right map algorithms, derived directly, are as follows.

The 2nd chaotic map:

L[(2N - j)j + 2(i - j) - 1] = A(i, N - 1 - j),    (20)

where i > j;

L[(2N - i)i + 2(j - i)] = A(i, N - 1 - j),    (21)

where i <= j.

The 3rd chaotic map:

L[(2N - 2j + 1)j + 2(i - j) - 1] = A(i, N - 1 - j),    (22)

where i + j < N and i > j;

L[(2N - 2i + 1)i + 2(j - i)] = A(i, N - 1 - j),    (23)

where i + j < N and i <= j;

L[N^2 - (2j + 1)(N - 1 - j) - 2(j - i)] = A(i, N - 1 - j),    (24)

where i + j >= N and i < j;

L[N^2 - (2i + 1)(N - 1 - i) - 2(i - j) - 1] = A(i, N - 1 - j),    (25)

where i + j >= N and i >= j.

The 4th chaotic map:

L[(N - j + 2)(N - j + 1)/2 + 2(j - i)] = A(i, N - 1 - j),    (26)

where i <= j and N - j is odd;

L[(N - j + 3)(N - j + 2)/2 + 2(j - i) + 1] = A(i, N - 1 - j),    (27)

where i <= j and N - j is even;

L[(N^2 - (2N + 1)j + j^2)/2 + 2(N - i - 1)] = A(i, N - 1 - j),    (28)

where j < i and j is even;

L[(N^2 - (2N - j)(j + 1))/2 + 2(N - i) - 1] = A(i, N - 1 - j),    (29)

where j < i and j is odd.

The 5th chaotic map:

L(j(j + 1)/2 + 2(j - i) + 1) = A(i, N - 1 - j),    (30)

where i <= j and N, j are both even or both odd;

L(j(j + 1)/2 + 2(j - i)) = A(i, N - 1 - j),    (31)

where i <= j, and N is even and j odd, or N is odd and j even;

L([N^2 - (2N + 1)j + j^2]/2 + 2(N - i) - 2) = A(i, N - 1 - j),    (32)

where j < i and j is even;

L([N^2 - (2N - j)(j + 1)]/2 + 2(N - i) - 1) = A(i, N - 1 - j),    (33)

where j < i and j is odd.
The map from a line to a square image

The line L of N^2 pixels is further mapped back to an N x N square image B. The map from the line L to the image B is described with the following formula:

B(i, j) = L(iN + j).    (34)
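Eq. (34) is a plain row-major reshape; a two-line sketch:

```python
# Eq. (34): fold the line L of N*N pixels back into an N x N image B,
# with B(i, j) = L(i*N + j).
def line_to_image(L, N):
    return [[L[i * N + j] for j in range(N)] for i in range(N)]

B = line_to_image(list(range(9)), 3)
assert B == [[0, 1, 2], [3, 4, 5], [6, 7, 8]]
```

One round of permutation is therefore: stretch A into L with a left or right map, then fold L back into a square image with Eq. (34).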
IV. DESIGN OF IMAGE SECURITY SYSTEM

Obviously, a design of image encryption based on a chaotic maps group is more flexible than one based on a single chaotic map.

In [9] the image encryption is achieved first by pixel permutation. Since the chaotic map was divided into a left map and a right map, the iteration numbers of the left map and the right map were used as the secret key. If the key is in decimal, then from the least significant digit to the most significant digit, each digit (0-9) gives the iteration number of the left map and the right map alternately.

So in one possible design the key can be arranged as follows: the first digit of the key gives the number of times the left map of the first chaotic map is applied, the second digit gives the number of times its right map is applied, and so on.
There is, however, a serious security problem. Each application of a chaotic map costs time, so if the iteration numbers of the left map and the right map are used as the secret key, the total encryption time may disclose the iteration number of the chaotic map, which is in fact the key.

If the key is 103050, the whole encryption time is about 0.02 s (the CPU of the PC is an Intel L2300, the RAM is 1 GB, and the operating system is Windows XP), while for key 1 or 01 the time is about 0.0023 s. From the timing it is obvious that the iteration count, i.e., the sum of all the digits of the key, is about 9. Theoretically the original key space is 10^6, but in fact the key space is only 2,002, which is much smaller than the theoretical value. Parts of the theoretical and real key space of a six-digit decimal key can be seen in Table II, where the sum denotes the sum of all the digits of the key. By the previous conclusions the real key space is even smaller than the values in the table; for instance, when the sum is 1, the real key space is only 2, which is much smaller than the theoretical value.
TABLE II. PARTS OF THEORETICAL AND REAL KEY SPACE OF A SIX-DIGIT DECIMAL KEY

Sum of digits    1    2    3    4    5    6    7    8
Real key space   6    21   56   126  252  462  792  1287
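The "real key space" row of Table II is simply the number of six-digit keys with the given digit sum, which a short script reproduces (the inclusion-exclusion correction for digits capped at 9 only matters for sums above 9):

```python
from math import comb

def keys_with_sum(s: int, digits: int = 6) -> int:
    """Count six-digit decimal keys whose digits sum to s."""
    # Compositions of s into `digits` parts, each part in 0..9.
    return sum((-1) ** k * comb(digits, k)
               * comb(s - 10 * k + digits - 1, digits - 1)
               for k in range(s // 10 + 1))

# Reproduces Table II, and the 2,002 keys mentioned for digit sum 9.
assert [keys_with_sum(s) for s in range(1, 9)] == [6, 21, 56, 126, 252, 462, 792, 1287]
assert keys_with_sum(9) == 2002
```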
It can be noted that the scheme is not safe when the sum of all the digits of the key is small; the key space only grows as the digit sum increases.
This paper therefore uses another design, in which each decimal digit denotes one chaotic map. For example, the digit 0 denotes the left map of the first chaotic map, the digit 1 denotes the right map of the first chaotic map, and so on, so that the digits 0 to 9 denote the different maps. If the key is 3772, the image is shuffled by the right map of the 2nd chaotic map, then twice by the right map of the 4th chaotic map, and then by the left map of the 2nd chaotic map.
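Under this design, decoding a key into its sequence of patterns is straightforward. That digit d selects map d//2 + 1, with even digits naming the left map and odd digits the right map, is my reading of the enumeration above:

```python
def decode_key(key: str):
    """Map each key digit to (chaotic map number, left/right variant)."""
    return [(d // 2 + 1, 'left' if d % 2 == 0 else 'right')
            for d in map(int, key)]

print(decode_key('3772'))
# prints: [(2, 'right'), (4, 'right'), (4, 'right'), (2, 'left')]
# i.e. the paper's example: right map of the 2nd map, twice the right
# map of the 4th map, then the left map of the 2nd map.
```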
The second step is to add a diffusion mechanism to confuse the values of the pixels and change the statistical characteristics of the cipher image. The logistic map can be used; here a is 3.9 and the initial value Xn is key5:

X_{n+1} = a Xn (1 - Xn),    (35)

where a is in (0, 4), Xn is in (0, 1), and n = 1, 2, 3, ...
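A diffusion sketch driven by Eq. (35). How the chaotic sequence is combined with the pixel values is not spelled out here, so quantizing the stream and adding it modulo 256 is an assumption for illustration:

```python
def logistic_stream(x: float, n: int, a: float = 3.9):
    """Quantize n iterates of Eq. (35), x_{k+1} = a*x_k*(1-x_k), to bytes."""
    out = []
    for _ in range(n):
        x = a * x * (1 - x)
        out.append(int(x * 256) % 256)
    return out

def diffuse(pixels, key5):
    ks = logistic_stream(key5, len(pixels))
    return [(p + k) % 256 for p, k in zip(pixels, ks)]

def undiffuse(pixels, key5):
    ks = logistic_stream(key5, len(pixels))
    return [(p - k) % 256 for p, k in zip(pixels, ks)]

# The same key5 (here the 0.468 of Section II) inverts the diffusion.
data = [10, 200, 33, 47]
assert undiffuse(diffuse(data, 0.468), 0.468) == data
```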
An image encryption is carried out based on the map. The plain image and cipher images are shown in Figure 3. The image has 256x256 pixels with 256 grey levels. The plain image is encrypted using the chaotic map with the keys 0 and 0123456789. It can be seen that the plain image has been encrypted. The decrypted image and the plain image are equal at every pixel, so the plain image is recovered completely; the image encryption using the chaotic map has no information loss.
(a) Plain image
(b) Cipher image (key 0)
(c) Cipher image (key 0123456789)
Figure 3. Plain image and cipher images.
V. SECURITY ANALYSIS OF PERMUTATION

Key space

Since the length of the key of the map has no limit, its key space can be calculated from the length of the key. Suppose the key is represented in binary bits; the relationship between key space size and key length is shown in Table III. In theory, the security key can be an arbitrarily long integer, satisfying different security requirements.
TABLE III. KEY SPACE SIZE VS KEY LENGTH

Key length (bits)   64           256          512
Key space size      1.84x10^19   1.16x10^77   1.34x10^154
Key Sensitivity

Assume that an image is encrypted using the map with the key 0123456789. Now the least significant digit of the key is changed and the decryption test is done: the original key 0123456789 is changed to 0123456788 and to 0123456780, and both are used to decrypt the cipher image produced by the original key 0123456789. The two images decrypted with the two wrong keys are shown in Figure 4. It can be seen that the image cannot be decrypted with either key, even though each differs from the correct key only in the last digit. Therefore, the image encryption using the chaotic map is highly key-sensitive.

(a) Decrypted cipher image by an error key
(b) Decrypted cipher image by another error key
Figure 4. The sensitivity of the key.
Correlation

The correlation of two adjacent pixels in an image is measured by

cov(x, y) = E[(x - E(x))(y - E(y))],    (36)

r_xy = cov(x, y) / (sqrt(D(x)) sqrt(D(y))),    (37)

where x and y are the gray-scale values of two adjacent pixels. Figure 5 shows the correlations of two horizontally adjacent pixels in the plain image and the cipher image: the correlation coefficients are 0.9442 and 0.0010 respectively. Similar results for the diagonal and vertical directions were obtained and are shown in Table IV.
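Eqs. (36)-(37) can be computed directly; a small sketch for horizontally adjacent pixel pairs, with plain Python lists standing in for the image:

```python
import math

def corr_adjacent(img):
    """Correlation coefficient of horizontally adjacent pixel pairs."""
    h, w = len(img), len(img[0])
    xs = [img[i][j] for i in range(h) for j in range(w - 1)]
    ys = [img[i][j + 1] for i in range(h) for j in range(w - 1)]
    ex, ey = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - ex) * (y - ey) for x, y in zip(xs, ys)) / len(xs)
    dx = sum((x - ex) ** 2 for x in xs) / len(xs)
    dy = sum((y - ey) ** 2 for y in ys) / len(ys)
    return cov / math.sqrt(dx * dy)

# A smooth gradient correlates strongly, as the plain image does; a good
# cipher image should score near zero instead.
smooth = [[i + j for j in range(8)] for i in range(8)]
assert corr_adjacent(smooth) > 0.9
```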
(a) Plain image
(b) Cipher image
Figure 5. Correlations in plain image and cipher image.
TABLE IV. CORRELATION COEFFICIENTS OF TWO ADJACENT PIXELS

Direction    Plain image   Cipher image
horizontal   0.9442        0.0010
vertical     0.9711        0.0007
diagonal     0.9187        0.0032
Fixed point ratio

With key 0, BD = 0.69%; with key 0123456789, BD = 0.71%. This means the positions of about 99% of the plain-image pixels are changed.

Change of the gray average

With key 0, GAVE = 51.9501; with key 0123456789, GAVE = 52.5440. This means the average values of the pixels are changed by about 20%.
r-m self-relevance

With r = 1, the r-m self-relevance can be seen in Table V for the keys 0 and 0123456789. The self-relevance of the cipher image is significantly reduced compared with the plain image, and the values remain small even as m grows. This means the effect of the permutation is very good.
TABLE V. SELF-CORRELATION OF IMAGE

m                  1     2     3     4     5     6     7
lena               0.41  0.41  0.46  0.50  0.54  0.57  0.60
key 0              0.22  0.22  0.24  0.26  0.28  0.29  0.31
key 0123456789     0.13  0.13  0.14  0.15  0.15  0.16  0.16

m                  8     9     10    11    12    13    14
lena               0.62  0.64  0.66  0.68  0.69  0.70  0.71
key 0              0.32  0.33  0.34  0.35  0.36  0.36  0.37
key 0123456789     0.17  0.18  0.18  0.19  0.19  0.20  0.20

m                  15    16    17    18    19    20
lena               0.73  0.73  0.74  0.75  0.76  0.77
key 0              0.38  0.38  0.39  0.40  0.40  0.41
key 0123456789     0.21  0.22  0.22  0.23  0.23  0.24
Speed of encryption and decryption

For image sizes 32x32, 64x64, 128x128, 256x256 and 512x512, with the key 0123456789, the speeds of encryption and decryption are shown in Table VI. Simulation results show that the encryption speed is fast enough for real applications.

TABLE VI. SPEEDS OF ENCRYPTION AND DECRYPTION

Size of image   Encryption (s)   Decryption (s)
32x32           0.0002           0.0002
64x64           0.0009           0.0008
128x128         0.0034           0.0033
256x256         0.0134           0.0129
512x512         0.0534           0.0533
VI. SIMULATION

The image security system includes four parts: key generation, permutation, diffusion mechanism and hardware realization.

Key generation is described in Section II; the keys for permutation are 3772 and the key for the diffusion mechanism is 0.468.

The chaotic maps group is used for permutation; the permutation key digits denote the different chaotic maps, as explained in Section IV.

The classic logistic map is used as the diffusion mechanism; the key for the diffusion mechanism supplies the parameters of the logistic map, whose formula is given in Eq. (35).
In the design of the hardware system, firstly, the characteristics of multi-sensor fused images must be considered. Multi-sensor fused images have higher accuracy, more information and more complex shapes than normal images. Secondly, the issue of real-time implementation must be considered; some chaotic maps can be simplified in the hardware implementation. For instance, the mapping process can be seen as a regular scan mode, so it can be compressed. Lastly, the hardware design must accommodate the popular image formats (such as JPEG 2000); otherwise it may corrupt the stream and the cipher image cannot be decoded.

The chaotic maps are permutations and combinations of pixels. The changes of location in the transformation can be accumulated in a smart chip cache, so in theory a chaotic image encryption algorithm is easy to implement in hardware.
A hardware design is shown in Figure 6. The system includes an address generator, a control logic unit, RAM and an accumulator. Key1 and key2 control the permutation process: through the address generator and the control logic unit, the addresses of the plain image in RAM are changed, which in effect confuses the pixels of the plain image. Key3 supplies the parameters of the diffusion mechanism: through the accumulator, the values of the pixels of the plain image are changed. After this process the histogram of the cipher image becomes flat; the diffusion mechanism confuses the pixel values and changes the statistical characteristics of the cipher image.

In fact the design only considers operations on the pixels. To enhance its adaptability, it must also consider some common compression algorithms, such as JPEG 2000.
Figure 6. Hardware design.
Simulation shows that the time of permutation is about 4 ms, so the speed of permutation is about 15 MB/s (the CPU of the PC is an Intel L2300, the RAM is 1 GB, and the operating system is Windows XP). The time of diffusion is about 5 ms, so the total speed of encryption is about 7 MB/s. The cipher image is shown in Figure 7.
(a) Cipher image
(b) Histogram of cipher image
Figure 7. Simulation
VII. SUMMARY

This paper designs a new image security system based on a chaotic maps group. The idea of the chaotic maps group is similar to the SCAN patterns technology: it takes the different chaotic maps as encryption patterns, and the key selects among the different chaotic map patterns. Simulation shows that the image security system has a fast encryption speed and a key space large enough for high security. The design removes the limit that a single chaotic map imposes between the key and the image size, and it also removes the restriction that SCAN patterns place on the image size. Analysis shows that the image security system is safe. It can be used in real-time image encryption applications at a speed of about 7 MB/s.
ACKNOWLEDGMENT

The authors gratefully acknowledge the support of the Scientific Research Fund of Hunan Provincial Education Department (08B015, 08A009), the Provincial Natural Science Foundation of Hunan (10JJ6099), and the Provincial Science & Technology Plan Project of Hunan (2010GK3048).
REFERENCES

[1] S. Mazloom and A. M. Eftekhari-Moghadam, "Color image encryption based on coupled nonlinear chaotic map," Chaos, Solitons & Fractals, vol. 42, no. 3, pp. 1745-1754, 2009.
[2] B. Y. Mohammad Ali and J. Aman, "Image encryption using block-based transformation algorithm," IAENG Int. J. of Computer Science, vol. 35, no. 1, pp. 15-23, 2008.
[3] L. Krikor, S. Bab, T. Ari and Z. Shaaban, "Image encryption using DCT and stream cipher," European Journal of Scientific Research, vol. 32, no. 1, pp. 47-57, 2009.
[4] L. Kocarev, "Chaos-based cryptography: a brief overview," IEEE Circuits and Systems Magazine, vol. 1, no. 3, pp. 6-21, 2001.
[5] J. Fridrich, "Symmetric ciphers based on two-dimensional chaotic maps," Int. J. Bifurcat. Chaos, vol. 8, pp. 1259-1284, 1998.
[6] C. E. Shannon, "Communication theory of secrecy systems," The Bell System Technical Journal, vol. 28, no. 4, pp. 656-715, 1949.
[7] G. Chen, Y. Mao and C. K. Chui, "A symmetric image encryption scheme based on 3D chaotic cat maps," Chaos, Solitons and Fractals, vol. 21, pp. 749-761, 2004.
[8] Y. Mao, G. Chen and S. Lian, "A novel fast image encryption scheme based on 3D chaotic baker maps," Int. J. Bifurcat. Chaos, vol. 14, pp. 3613-3624, 2004.
[9] Y. Feng, L. J. Li and F. Huang, "A symmetric image encryption approach based on line maps," Proc. ISSCAA, vol. 1, pp. 1362-1367, 2006.
[10] N. Bourbakis and C. Alexopoulos, "Picture data encryption using SCAN patterns," Pattern Recognition, vol. 25, no. 6, pp. 567-581, 1992.
[11] S. S. Maniccam and N. G. Bourbakis, "Image and video encryption using SCAN patterns," Pattern Recognition, vol. 37, no. 4, pp. 725-737, 2004.
Feng Huang (1978-) was born in Shaoyang, Hunan, P. R. China. He received the B.S. degree in automatic test and control from Harbin Institute of Technology in 2000, the M.S. degree in power engineering from Harbin Institute of Technology in 2002, and the Ph.D. degree in power electronics and power drives from Harbin Institute of Technology in 2007. He is an associate professor in the College of Electrical & Information Engineering, Hunan Institute of Engineering, Xiangtan, P. R. China. His research interests include image encryption and the design of automated test systems. He has several years of experience in teaching, research and development projects, and has published over 20 scientific papers.
Xilong Qu (1978-) was born in Shaoyang, Hunan, P. R. China. He received the Ph.D. degree from Southwest Jiaotong University in 2006. He is an associate professor at Hunan Institute of Engineering, a master supervisor of Xiangtan University, a key young teacher of Hunan province, and the academic leader of computer application technology at Hunan Institute of Engineering. His research interests are web service technology, information safety and networked manufacturing. He has published over 30 papers in important journals.
The Capture of Moving Object in Video Image

Weina Fu (a), Zhiwen Xu (b), Shuai Liu (a, b), Xin Wang (a, b), Hongchang Ke (a)
(a) Software College, Changchun Institute of Technology, Changchun, China
Email: ls_25210114@sohu.com
(b) College of Computer Science and Technology, Jilin University, Changchun, China
Email: ls_25210114@163.com
Abstract: Nowadays, video is a primary information carrier on the WWW (World Wide Web), and moving objects often carry more information, but it is hard to catch these objects in video quickly and correctly. In this paper, we put forward a method to catch moving objects in video. Firstly, based on the difference image method, we determine the moving region in the video image. To avoid the hardness of building the background, we build the background with a new algorithm based on changed differences. Finally, we get the objects and denoise them with erosion and dilation. The experimental results show that the new method is feasible and of high quality.

Index Terms: video capture, denoise, moving object, video image
I. INTRODUCTION

An automatic video-based face recognition system includes a human face detection part, a face tracking part, a facial feature capture part and a face recognition part [1-3]. Obviously, the premise is to locate the face. The problem splits into two directions: one is how to locate a human face in still images [4], the other is how to locate a human face in video [5]. Moreover, it is well known that an object recognition system has similar parts.

An important method in video capture is to find and extract the changed regions of moving targets from the background in the image series of a video. The status of our work in video capture is shown in Algorithm a.
Algorithm a (Capture Moving Object in Video)
Input: a series of video images.
Output: the moving region (object).
Step 1. Find the difference between every two successive images.
Step 2. Judge whether points are related by the relation of color and moving rules.
Step 3. Catch the region (object) for the next process.
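A minimal sketch of Algorithm a on two grayscale frames; the color-relation test of Step 2 is simplified here to a gray-level threshold, an assumption for illustration:

```python
def moving_mask(prev, curr, T=15):
    """Steps 1-2: mark pixels whose gray level changed by more than T."""
    return [[1 if abs(curr[i][j] - prev[i][j]) > T else 0
             for j in range(len(prev[0]))] for i in range(len(prev))]

def bounding_box(mask):
    """Step 3: catch the moving region as a bounding box (or None)."""
    pts = [(i, j) for i, row in enumerate(mask) for j, v in enumerate(row) if v]
    if not pts:
        return None
    rows = [p[0] for p in pts]
    cols = [p[1] for p in pts]
    return (min(rows), min(cols), max(rows), max(cols))

prev = [[0] * 4 for _ in range(4)]
curr = [[0] * 4 for _ in range(4)]
curr[1][2] = curr[2][2] = 200            # a small bright object appears
assert bounding_box(moving_mask(prev, curr)) == (1, 2, 2, 2)
```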
We often call this method segmentation of moving objects. The effective segmentation of the moving region is important for post-processing such as target classification, tracking and behavior understanding. However, due to dynamic changes of the background image caused by weather, light, shadow and other disturbing factors, effective segmentation is still difficult.

In segmentation of moving objects, the main methods are the difference image method, the time difference method and the optical flow method [6-7]. The difference image method segments regions by using the difference between the current frame and a background frame, but it is so sensitive to dynamic scenes that it makes many mistakes. The main limitation of the time difference method is that it cannot get all pixels with general characteristics and often creates a hole inside the moving target. The optical flow method is too complex and too poor at resisting noise. Comparing these three methods comprehensively, the new method created here is based on the difference image method, because it is simple and easy to implement in a real-time environment for video images with a generally static background.
Nowadays, scholars have advanced object recognition to a new level. Massimo [8] and partners provide an overview and some new insights on the use of dynamic visual information for face recognition. In their paper, not only physical features but also behavioral features are accounted for in the face representation. They give experimental results obtained from real video image data to show the feasibility of the proposed approach.

Junius [9] and partners demonstrate a three-dimensional (location, time, and magnitude of body-part movement) pattern representation of the entire time-dependent front-view gait cycle that simultaneously displays the coupled kinetics of different body parts, thereby revealing possible irregularities in the gait. Among the potential applications of their technique are improved diagnosis and treatment of gait pathologies in rehabilitation clinics and modeling schools, as well as the development of more robust surveillance systems.

Tomokazu [10] and partners propose an efficient method for estimating a depth map from long-baseline image sequences captured by a calibrated moving multi-camera system. Their experiments verify the validity and feasibility of the algorithm for both synthetic and real outdoor scenes.
In this paper, we segment the moving object exactly by using the histogram with automatic threshold segmentation and mathematical morphology.

Considering the hardness of constructing the background, we put forward a new algorithm: we do not construct the background first, but construct it during processing. Then we trace moving objects against the background and catch them. Finally, we give some experiments to validate that our algorithm has higher correctness and detection pace than the classic moving-object capture algorithm.
doi:10.4304/jmm.6.6.518-525
II. VIDEO IMAGE PROCESSING

A. Definition

A video image is also called a dynamic image. It is made up of a series of images with a given or assumed relative order, from which we can get the time interval of every two successive images.

The relative order gives the time interval between successive images. Defining a time series ti (i = 1, 2, ..., n) in which every tk is next to tk-1, we set the time interval as

dt_k = t_k - t_{k-1} = dt,  k = 1, 2, ..., n-1.

That is, all the time intervals of image capture are equal to each other.
We call every piece of the video image a frame. Of course, the space position of a moving object differs at different times; in other words, when the space position of an object changes from one frame to the next, we call it a moving object. When a point P of a space object moves from (x_{k-1}, y_{k-1}) in frame k-1 to (x_k, y_k) in frame k, we set the displacement as (dx_k, dy_k). We also call it parallax when the position of a point on the object surface shifts from t_{k-1} to t_k.
B. Construct the Background

In the classic algorithm, the construction of the background is a key step. Based on the relatively simple background of moving-object tracking, we construct the background image with a method based on the CDM (change detection mask). The method assumes that the moving object cannot cover the whole image; in other words, the background must appear in the images, so when the object moves we will see the background change in the image.

We denote the luminance component of the image series by I_i(x, y), where (x, y) is the pixel position and the integer i is the frame number (i = 1, ..., N), N being the total number of frames.
Then we use the formula below to define the change detection mask, which reflects the gray changes between successive frames:

CDM_i(x, y) = d if d >= T, and 0 if d < T,  where d = |I_{i+1}(x, y) - I_i(x, y)|.

In this formula the threshold value T is used to control the removal of noise. For each position (x, y), CDM_i(x, y) describes the change curve along the time axis of the pixel at position (x, y). Then we can segment the curve by computing whether CDM_i(x, y) is larger than zero.
The stillness parts detected are expressed as the set {S_j(x, y), 1 <= j <= M}; they can be seen in Figure 1.

In Figure 1, the beginning and ending of S_j are ST_j and EN_j. In the set {S_j} corresponding to position (x, y), we select the longest stillness part and register the frame number of its midpoint as M(x, y). Then we use the pixel of the frame with number M(x, y) to fill the corresponding position in the video background. This is defined by the formulas below:

M(x, y) = (ST(x, y) + EN(x, y)) / 2,
B(x, y) = I(x, y, M(x, y)),

where ST(x, y) is the beginning of the longest stillness part, EN(x, y) is its ending, and B(x, y) is the rebuilt video background.
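For a single pixel position, the CDM-based background construction above can be sketched as follows (the threshold T and the sample gray-level trace are illustrative):

```python
def background_value(trace, T=10):
    """Cut the gray-level trace of one pixel into stillness runs (no
    change >= T between successive frames), keep the longest run, and
    return the value at its midpoint frame M(x, y)."""
    runs, start = [], 0
    for i in range(1, len(trace)):
        if abs(trace[i] - trace[i - 1]) >= T:      # CDM_i > 0: a change
            runs.append((start, i - 1))
            start = i
    runs.append((start, len(trace) - 1))
    st, en = max(runs, key=lambda r: r[1] - r[0])  # longest stillness part
    return trace[(st + en) // 2]

# A pixel briefly covered by a passing object (frames 3-4):
trace = [30, 31, 30, 200, 202, 31, 30, 29, 31]
assert background_value(trace) == 30               # background recovered
```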
C. A New Think with No Background Structure
We can see that it is hard to construct background in
video capture. Moreover, background structure is a key
step in video capture. So if we find a new way to catch
moving objects without background structure, we can get
rid of bottleneck of capture of moving objects.
In CDM, we find that it detect background by changes
of images. As we known, it rebuilds background by use
the region has been covered by moving objects when
moving objects move. So when moving objects covers all
images and so on, we can not find the correct background.
It means that we always catch incorrect objects in this
case.
We therefore propose a new idea that finds moving objects without rebuilding the background first. When objects move, we can detect them from the difference between two adjacent images of the series, and thereby know the moving region of the images. That is to say, the region that remains after we remove the moving objects from an image is background.
From this we construct a background image. Of course, at first it may cover only a small region of the images, but we execute this step from the first image to the last: whenever we get a new piece of background, we change the original one to the union of the two. In this way we divide the images into three kinds of region: moving objects we have already found, background fragments, and moving objects we have not yet found. We create a list to store them; when we find a moving object we have found before, we replace the older entry with the newer one, and when we find a new object, we link it into the list.
We now give a new algorithm to catch moving objects from video images.
Fig. 1. Curves displaying the changes along the time axis given by the difference of the luminance frames
JOURNAL OF MULTIMEDIA, VOL. 6, NO. 6, DECEMBER 2011 519
2011 ACADEMY PUBLISHER
III. TRACKING AND PROCESSING OF MOVING OBJECT
A. Algorithm Based on the Difference Image Method
The difference image method judges the existence of a moving object from the difference obtained by subtracting two adjacent frames; we call this difference the difference image. The difference image is a simple and effective way to process the sequence globally and coarsely, and it is also useful when we catch coarse information about a moving object.
The principle of the difference image method is to find the moving object through the difference. When no objects move in the monitored region, the difference of grey level between adjacent frames of the image series is very small. On the contrary, when some objects move, the difference of grey level between adjacent frames increases significantly. So by choosing a reasonable threshold value, we can determine whether moving objects exist in the image series. The mathematical formula is:
D(x, y) = 1, when |f_1(x, y) − f_2(x, y)| > T; 0, otherwise.

In this formula, f_1(x, y) and f_2(x, y) are the images of the background only and of the background with a moving object inside, and D(x, y) is the binary difference image at point (x, y). T is the grey threshold, whose size determines the sensitivity of the monitor. The difference may be produced by the movement of objects within the region, by moving objects entering or leaving the region, or by lighting changes or noise in the region.
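The thresholded difference image D(x, y) can be computed directly; a minimal sketch (the function name is ours, not from the paper):

```python
import numpy as np

def difference_image(f1, f2, T):
    """Binary difference image D(x, y): 1 where the grey-level
    difference between the two frames exceeds threshold T, else 0."""
    # Cast to int first so subtracting uint8 frames cannot wrap around.
    return (np.abs(f1.astype(int) - f2.astype(int)) > T).astype(np.uint8)
```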
We segment moving objects based on the difference image method, as shown in algorithm b.
Algorithm b (Segment Moving Object)
Input: video image series after pretreatment.
Output: the moving regions (objects).
Step 1. Use the current image as the background image for comparison.
IF (the video image series is not empty)
Goto Step 2.
ELSE
Goto Step 4.
Step 2. Get the next image as the reading image;
Compute the difference between the background and the reading image;
If (difference > threshold)
{
Take the difference region in the reading image as a moving object;
Goto Step 3.
}
Set the reading image as the current image;
Goto Step 1.
Step 3. Judge whether the object is the same as an object in the list.
If (the object has the same characteristics, such as color and grey level, as an object in the list)
{
Mark the selected moving object in the video images.
Replace the object with the same characteristics in the list by the selected object.
}
If (the object has the same characteristics as the static part in the list, which is an item storing the static part computed as the whole image minus the moving objects)
{
Find the union of the static part and the selected object, and replace the static part in the list with the union.
}
If (the object has the same characteristics as neither a moving object nor the static part in the list)
{
Mark the selected moving object in the video images.
Link the selected object into the list.
If (the item number is larger than the item-number threshold)
Release list items holding moving objects until n percent are released, where n is a selected number between 1 and 100.
}
Goto Step 2.
Step 4. End the algorithm.
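The steps of algorithm b can be sketched in code as follows. This is a simplified illustration under stated assumptions: objects are compared by mask overlap rather than by color or grey-level characteristics, and the marking of objects in the video frames is omitted; the function name and parameters are ours, not from the paper.

```python
import numpy as np

def algorithm_b(frames, diff_threshold, max_items=50, release_percent=50):
    """Sketch of Algorithm b: segment moving objects frame by frame
    without constructing the full background first."""
    objects = []        # list items for moving objects (boolean masks)
    static_part = None  # list item for the accumulated static part
    current = frames[0]                 # Step 1: current image for comparison
    for reading in frames[1:]:          # Step 2: get the next (reading) image
        diff = np.abs(reading.astype(int) - current.astype(int))
        moving = diff > diff_threshold  # the difference region
        if moving.any():                # Step 3: compare with the list
            for i, obj in enumerate(objects):
                if (obj & moving).any():      # same characteristics (here: overlap)
                    objects[i] = moving       # replace older entry with newer one
                    break
            else:
                objects.append(moving)        # link a new object into the list
                if len(objects) > max_items:  # release n percent of the items
                    keep = len(objects) * (100 - release_percent) // 100
                    objects = objects[-keep:]
            # Static part: union with the region outside the moving objects.
            still = ~moving
            static_part = still if static_part is None else (static_part | still)
        else:
            current = reading  # no motion: the reading image becomes current
    return objects
```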
Algorithm b is more effective than the classic difference image method, as we show in theorems 1 and 2: theorem 1 proves the correctness of algorithm b, and theorem 2 proves its effectiveness.
Theorem 1. When the capture is correct, the moving objects caught by algorithm b and by the classic algorithm are the same.
Proof:
To prove theorem 1, we divide all conditions into four cases, i~iv.
Case i. There is no moving object in the images.
In this case, the images of the series are clearly the same as each other, because there is no active object in them. So neither the classic algorithm nor algorithm b catches any moving object; that is, the results of the two algorithms are the same.
Case ii. Moving objects exist in the image series and appear in all images of the series (the moving objects move slowly).
The classic algorithm compares all images to the background image. In this case, because the moving objects move slowly, and assuming the background image created by the classic algorithm is correct, the classic algorithm finds the correct moving objects.
In this case algorithm b catches the same series of moving objects, because the moving objects move slowly. So it also catches the correct objects.
Case iii. Moving objects exist in the image series but do not appear in all images of the series, because they move too fast.
In this case, our algorithm finds that there is no object in the images once the object moves out, and so does the classic algorithm when the background is constructed correctly. In other words, the classic algorithm finds the image to be the same as the background when the background is constructed correctly, while our algorithm catches the object independently of the background, because we use the static part to find objects.
Case iv. There is more than one moving object in the image series.
This condition is similar to case iii. When the background is found correctly, the classic algorithm gets the correct objects. Nevertheless, it is hard to construct a correct background when there is more than one object in it, so the capture rate of the classic algorithm becomes lower as the moving objects become more numerous.
Considering all of the above, our algorithm always catches the correct moving objects, whereas the capture rate of the classic algorithm depends on the correctness rate of the background. So theorem 1 is proved.
Theorem 2. The time complexities of algorithm b and the classic algorithm have the same order of magnitude: the computation time of algorithm b is less than that of the classic algorithm plus csm, where s is the number of pixels of each image, m is the number of images, c is a constant, and n below is the item-number threshold of the list.
Proof:
Ignoring the time for constructing the background, we can easily find that the classic algorithm computes sm times, where s is the number of pixels of each image. The computation time of algorithm b divides into the times of step 2 and step 3.
Obviously, the computation time of step 2 is sm, because it compares s times for each image. In step 3, we count each if-branch separately.
The marking in the first if is the same as in the classic algorithm, which means that this computation also occurs in the classic algorithm, so we can ignore it. The replacement part takes fewer than n computations.
Determining the second if takes fewer than s computations, and the union takes fewer than s computations, because the moving objects lie within the image.
The last if finds a new moving object in the image. Its marking part is ignored for the same reason as in the first if. The linking part takes one computation, and the item-number check takes one computation if we keep an item recording the item number of the list.
Summing these, step 3 spends fewer than n+2s+2 computations for each image, that is, fewer than (n+2s+2)m computations in total.
Adding the computation times of step 2 and step 3, we conclude that algorithm b spends fewer than (n+2s+2)m more computations than the classic algorithm. As we know, n is usually small. So theorem 2 is proved.
Moreover, we have ignored the computation time of the background construction in the classic algorithm. In fact, background construction is an important process in the classic algorithm and takes a lot of time. Taking this into account, the extra time algorithm b spends to avoid constructing the classic background is worthwhile.
B. Capture of Moving Object
Using the difference between the background and the current frame image, we can obtain the moving objects to segment. We must separate the remaining points from the moving objects, because some remaining points of the background may exist in the segmented image. Of course, we can choose a suitable threshold based on the grey distribution of the images. In this paper, because of the histogram characteristics of these images, we use an automatic threshold segmentation method to segment them.
We assume that the grey scales of all images are 1, 2, …, L and that the number of pixels with grey level i is n_i. So the whole pixel number is N = n_1 + n_2 + … + n_L, and the probability of pixels with grey level i is P_i = n_i / N, where P_i ≥ 0 and Σ_{i=1}^{L} P_i = 1.
When we divide the whole image into two classes C_0 and C_1 with threshold grey level k (in other words, a pixel is in C_0 when its grey level is in [1, k] and in C_1 when its grey level is in [k+1, L]), the probability values of the two classes are

w_0 = P(C_0) = Σ_{i=1}^{k} P_i = w(k)  and  w_1 = P(C_1) = Σ_{i=k+1}^{L} P_i = 1 − w_0.

Then the average values of the two classes are

u_0 = Σ_{i=1}^{k} i·P_i / w_0 = u(k)/w(k)  and  u_1 = Σ_{i=k+1}^{L} i·P_i / w_1 = (u − u(k)) / (1 − w(k)).

In these formulas, w(k) = Σ_{i=1}^{k} P_i, u(k) = Σ_{i=1}^{k} i·P_i, and u is the average grey level of the whole image, defined below.
Then the average value of the whole image is u = u(L) = Σ_{i=1}^{L} i·P_i, and the variance of class C_1 is σ_1² = Σ_{i=k+1}^{L} (i − u_1)²·P_i / w_1, with σ_0² defined analogously for class C_0.
We define the between-class variance σ_B² = w_0(u_0 − u)² + w_1(u_1 − u)² = w_0·w_1·(u_0 − u_1)². Moreover, we define the within-class variance σ_w² = w_0·σ_0² + w_1·σ_1² and the total variance σ_T² = Σ_{l=1}^{L} (l − u)²·P_l. Then we can solve for the best threshold k*:

σ_B²(k*) = max σ_B²(k), 1 ≤ k < L.
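The threshold selection derived above is the classical Otsu method; a minimal histogram-based sketch, assuming 8-bit grey levels (so L = 256), could look like:

```python
import numpy as np

def otsu_threshold(image, L=256):
    """Select the threshold k maximizing the between-class
    variance sigma_B^2(k) = w0*w1*(u0 - u1)^2, as derived above."""
    hist, _ = np.histogram(image, bins=L, range=(0, L))
    P = hist / hist.sum()          # P_i = n_i / N
    i = np.arange(L)
    w = np.cumsum(P)               # w(k) = sum of P_i for i <= k
    u = np.cumsum(i * P)           # u(k) = sum of i*P_i for i <= k
    uT = u[-1]                     # average grey level of the whole image
    # sigma_B^2(k) = (uT*w(k) - u(k))^2 / (w(k)*(1 - w(k))), which is
    # algebraically equal to w0*w1*(u0 - u1)^2; guard w = 0 or w = 1.
    with np.errstate(divide="ignore", invalid="ignore"):
        sigma_B2 = (uT * w - u) ** 2 / (w * (1.0 - w))
    sigma_B2 = np.nan_to_num(sigma_B2)
    return int(np.argmax(sigma_B2))
```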