
Journal of Multimedia

ISSN 1796-2048

Volume 6, Number 6, December 2011


Contents

REGULAR PAPERS

An Adaptive Algorithm for Improving the Fractal Image Compression (FIC)
Taha Mohammed Hasan and Xiangqian Wu .......................................................... 477

Information Loss Determination on Digital Image Compression and Reconstruction Using Qualitative
and Quantitative Analysis
Zhengmao Ye, Habib Mohamadian, and Yongmao Ye ................................................. 486

An Improved Fast SPIHT Image Compression Algorithm for Aerial Applications
Ning Zhang, Longxu Jin, Yinhua Wu, and Ke Zhang ............................................... 494

3D Tracking and Positioning of Surgical Instruments in Virtual Surgery Simulation
Zhaoliang Duan, Zhiyong Yuan, Xiangyun Liao, Weixin Si, and Jianhui Zhao ..................... 502

Design of Image Security System Based on Chaotic Maps Group
Feng Huang and Xilong Qu ...................................................................... 510

The Capture of Moving Object in Video Image
Weina Fu, Zhiwen Xu, Shuai Liu, Xin Wang, and Hongchang Ke ................................... 518

Skeletonization of Deformed CAPTCHAs Using Pixel Depth Approach
Jingsong Cui, Lu Liu, Gang Du, Ying Wang, and Qianqi Guan .................................... 526








An Adaptive Algorithm for Improving the Fractal
Image Compression (FIC)

Taha Mohammed Hasan
School of Computer Science and Technology, Harbin Institute of Technology (HIT)
Harbin 150001, China
taha_alzaidy@yahoo.com

Xiangqian Wu
School of Computer Science and Technology, Harbin Institute of Technology (HIT),
Harbin 150001, China
xqwu@hit.edu.cn




Abstract—In this paper an adaptive algorithm is proposed to reduce the long encoding time of the Fractal Image Compression (FIC) technique. The algorithm reduces the number of matching operations between range and domain blocks by reducing both the range and the domain blocks involved in the matching process. For this purpose, two techniques are proposed. The first, called Range Exclusion (RE), uses a variance factor to reduce the number of range blocks by excluding ranges in homogeneous (flat) regions from the matching process. The second, called Reducing the Domain Image Size (RDIZ), shrinks the domain pool by minimizing the domain image to 1/16th of the original image size, instead of the 1/4th used in traditional FIC. This in turn affects the encoding time, the compression ratio and the reconstructed image quality. To obtain the best results, the two techniques are coupled in one algorithm, called RD-RE. The tested 256x256 gray images are partitioned into fixed 4x4 blocks and then compressed using Visual C++ 6.0 code. The results show that the RE technique is faster and achieves a higher compression ratio than traditional FIC while keeping high reconstructed image quality, while RD-RE is faster still and achieves a higher compression ratio than RE, at the cost of a slight loss in reconstructed image quality.

Index Terms—Fractal, range block, variance, image compression, encoding time

I. INTRODUCTION
Compression and decompression technology of digital
image has become an important aspect in the storing and
transferring of digital image in information society [1].
Recently fractal compression of digital images has
attracted much attention [2]. M. Barnsley introduced the
fundamental principle of fractal image compression in
1988 [3]. Fractal theory is fundamentally different from the other approaches. Fractal image compression is also called fractal image encoding, because the compressed image is represented by contractive transforms and the mathematical functions required for reconstructing the original image, instead of any data in pixel form [4]. One of the most important characteristics of fractal image coding is the asymmetry of its encoding and decoding processes: coding time is rather long, owing to domain codebook generation and domain-range matching, while the decoding algorithm is relatively simple and fast. This weakness has kept the fractal compression method from wide adoption as a standard, even though it offers fast decompression as well as very high compression ratios [5].
Mathematically, FIC is based on the theory of Iterated Function Systems (IFS), and its performance relies on the presence of self-similarity between the regions of an image. Since most images possess a high degree of self-similarity, fractal compression is an excellent tool for compressing them [6]. FIC consists of finding a set of transformations that produces a fractal image which approximates the original image [7].
In the IFS coding scheme, several main processes must be carried out. First, range creation: the image is partitioned into non-overlapping blocks (ranges) [8]. Second, domain creation: the domain is created by averaging every four (2x2) adjacent pixels of the range image into one pixel of the domain, which means the domain image is one quarter the size of the range image. Fig. 1 shows an example of range and domain block sizes. Third, the matching process: for every range block, a similar domain block is found using IFS mapping. The blocks of the compressed image are represented by the IFS mapping coefficients [9].
FIC suffers from the long time spent in the compression process because there is a huge number of mapping operations: every range block is compared with every domain block for each of the eight symmetry cases [10].
JOURNAL OF MULTIMEDIA, VOL. 6, NO. 6, DECEMBER 2011
2011 ACADEMY PUBLISHER
doi: 10.4304/jmm.6.6.477-485

In the decoding process, the compressed image is reconstructed from the IFS code saved in the codebook file. Reconstruction starts from an arbitrary image and iterates the stored affine transformation parameters; according to the contractive mapping theory, the reconstructed image converges to the attractor after 8 iterations [11]. Fig. 2 shows the main process of the FIC model.
Large efforts have been undertaken to speed up the encoding process. Most of the proposed techniques attempt to accelerate the search and are based on some kind of feature vector assigned to ranges and domains. A different route to increased speed is less searching, as opposed to faster searching [12]. In this work the proposed algorithm speeds up FIC by searching less: it excludes homogeneous ranges from the search process and also minimizes the domain pool.











Figure 1. Constructing the domain block from the range block















a) Encoding Unit










b) Decoding Unit

Figure 2. Fractal Image Compression System Model

II. SELF SIMILARITY
In mathematics, a self-similar object is exactly or
approximately similar to a part of itself (i.e. the whole has
the same shape as one or more of the parts). Many objects
in the real world, such as coastlines, are statistically self-
similar: parts of them show the same statistical properties
at many scales. Self-similarity is a typical property of
fractals. Scale invariance is an exact form of self-
similarity where at any magnification there is a smaller
piece of the object that is similar to the whole. For
instance, a side of the Koch snowflake is both
symmetrical and scale-invariant; it can be continually
magnified 3x without changing shape [13].
Natural images are not exactly self-similar, but they can be partially constructed from affine transformations of small parts of themselves. Self-similarity indicates that small portions of the image resemble larger portions of the same image. The search for this resemblance forms the basis of the fractal compression scheme [15]. Therefore the image must be partitioned into blocks to find self-similarity in other portions of the same image. This is intrinsic to fractal encoding techniques.
Fig. 3 shows that self-similar portions of the image can be found: there is a reflection of the hat in the mirror, and the reflected portion can be obtained by an affine transformation of a small portion of the hat. Parts of the shoulder are almost identical [11].



Figure 3. An example showing self-similarity in the Lenna image

This paper builds on the idea that self-similarity in images depends on the image features: in FIC, the matching process searches for self-similar portions of the image at different scales, so if a partition of the image has many details it is hard to find a suitable matching part of the image for it, and vice versa.
III. IMAGE FEATURES
Images contain many regions with different levels of detail: some regions carry significant detail, while others are flat or homogeneous, with no gradation or with gradation that the naked eye cannot recognize (see Fig. 4). Homogeneous regions are easy to detect in images that are manufactured or composed by humans, such as personal photos, which often contain areas of constant color, for example a smooth background. By contrast, it is difficult to find 100% homogeneous regions in natural images (landscapes, etc.); such images may contain areas that appear homogeneous or single-colored to the naked eye but are not homogeneous arithmetically. The current research focuses on exploiting this principle, i.e., exploiting the regions that appear homogeneous to the naked eye and using them to speed up the FIC compression process to a degree that does not affect the quality of the image recovered after compression.

























Figure 4. The area inside the box is a homogeneous region, while the region inside the circle contains different combinations

IV. PROPOSED METHOD
In this work an adaptive algorithm for improving the
basic fractal image coding process is proposed. The
algorithm works to enhance the performance of FIC in
terms of speeding up the encoding process and increasing
the compression ratio while keeping a high reconstructed
image quality. The proposed algorithm concentrates on reducing the number of matching operations between each range block and the domain pool needed to produce the FIC code, using the following two techniques:
A. Range Exclusion (RE) technique:
This technique reduces the number of range blocks required in the matching process by extracting a few features that characterize the range blocks and then excluding a number of ranges from the matching process with all 8 symmetries. The mean and variance of all range blocks are extracted: the mean value of a range (Mr) gives its average gray level, while the variance (Vr) measures the dispersion of its gray levels from the mean. The variance is used to check whether the range area is a homogeneous region or contains details. After partitioning, the contents of each range are checked before the search operation starts to decide whether it is a homogeneous (flat) region, using the variance criterion: in a homogeneous region the variance is about zero, while it increases in areas with more detail. A flat region means that all pixels of the region have the same value or values close to each other. During the matching process, the homogeneous ranges are excluded, so the matching operation is limited to the detailed regions only; this removes a huge amount of complex calculation and results in a fast coding process. To achieve the greatest benefit, the amount of homogeneity allowed is controlled through several values of the variance threshold, named the Homogeneous Permittivity (HP). If the variance of any part of the image (range) is zero or less than the HP value, meaning that all pixels of that part are equal or very close, the range does not enter the search and matching operation and is encoded only by saving its mean value. This speeds up FIC significantly and also increases the compression ratio, because each range excluded from the matching operation requires only one byte (8 bits) to store its mean value (Mr) as its fractal code, instead of the 25 bits required to store its IFS code parameters (s = 7 bits, o = 5 bits, x = 5 bits, y = 5 bits and sym = 3 bits) [11].
B. Reducing the Domain Image Size (RDIZ) technique:
This technique minimizes the domain pool. As mentioned previously, in traditional FIC the encoding process is computationally intensive: a large number of sequential searches through a list of domains is carried out while trying to find the best match for a range block. A large domain pool increases the number of comparisons that must be made to find the best domain block, and this is where most of the computing time is spent [13]. The proposed method significantly reduces the encoding time. The key idea is to reduce the number of domain blocks searched for each range block. This is done by reducing the domain image to 1/4th of the traditional domain image size, so the process is called Reducing the Domain Image Size (RDIZ). The domain pool is created as 1/16th of the original image size, instead of the conventional 1/4th, by down-sampling every 4x4 (instead of 2x2) pixels of the original image to one pixel of the reduced domain image (using the average method), as illustrated in Figs. 5 and 6. For example, if the original image is 256x256 pixels, the range block size is 4 and the domain jump step is 4, the number of domain blocks needed in the matching process for each range block is reduced from 1024 (in traditional FIC) to 256. The computations needed in the encoding process are thus reduced to 4096 x 256 = 1,048,576 instead of 4096 x 1024 = 4,194,304, which decreases the encoding time significantly. Reducing the domain size also reduces the number of bits required to encode the original image, because fewer bits are needed to store the x and y coordinates of the best-matched domains. In our example, when the domain pool is 64x64 pixels, the maximum value of each x and y coordinate is 60; dividing by the jump step (60/4 = 15), the encoder needs only 4 bits to store each of the x and y coordinates instead of 5 bits in traditional FIC. Accordingly, this leads to a remarkable increase in the compression ratio.



Figure 5. Down sampling method using the average of (4x4) pixels











Figure 6. Down sampling Lenna image using the average of (4x4) pixels
The following algorithm lists the steps required to perform the proposed method.

Algorithm
Input: Image, HP
Output: IFS code ((x, y, s, o, and sym) or Mr)
Step 1: Load the image into a buffer.
Step 2: Partition the image into fixed-size blocks with no overlap (R_1 ... R_n).
Step 3: Generate the domain pool blocks (D_1 ... D_m) from the original image using the 4x4 averaging method.
Step 4: Compute the mean of the current range block R_i according to (1):

$$ M_r = \frac{1}{X_{size}\,Y_{size}} \sum_{i=0}^{X_{size}-1} \sum_{j=0}^{Y_{size}-1} X_{ij} \qquad (1) $$


Step 5: Compute its variance according to (2):

$$ V_r = \frac{1}{X_{size}\,Y_{size}} \sum_{i=0}^{X_{size}-1} \sum_{j=0}^{Y_{size}-1} \left[ X_{ij} - M_r \right]^2 \qquad (2) $$


Step 6: If Vr <= HP, save the range's mean (Mr) and exclude this range from the mapping operation (jump to Step 10); otherwise continue with Step 7.
Step 7: Do the mapping operation by:

Compute the scale s and offset o coefficients
according to (3) and (4):

$$ s = \frac{n \sum_{i=1}^{n} d_i r_i - \sum_{i=1}^{n} d_i \sum_{i=1}^{n} r_i}{n \sum_{i=1}^{n} d_i^2 - \left( \sum_{i=1}^{n} d_i \right)^2} \qquad (3) $$


And

$$ o = \frac{\sum_{i=1}^{n} r_i \sum_{i=1}^{n} d_i^2 - \sum_{i=1}^{n} d_i \sum_{i=1}^{n} d_i r_i}{n \sum_{i=1}^{n} d_i^2 - \left( \sum_{i=1}^{n} d_i \right)^2} \qquad (4) $$


Quantize the s and o values.
Compute the approximation error E(R_i, D_i; s, o) according to (5):

$$ E(R,D) = \frac{1}{n} \left[ \sum_{i=1}^{n} r_i^2 + s \left( s \sum_{i=1}^{n} d_i^2 - 2 \sum_{i=1}^{n} d_i r_i + 2o \sum_{i=1}^{n} d_i \right) + o \left( o\,n - 2 \sum_{i=1}^{n} r_i \right) \right] \qquad (5) $$


Compare the computed error with the minimum registered error (E_min): if E(R_i, D_i; s, o) > E_min, jump to Step 8; otherwise replace E_min and store the current IFS code (i.e., x, y, s, o, and symmetry).
Step 8: Repeat Step 7 for all symmetry versions of the tested domain block.
Step 9: Repeat Steps 7 and 8 for all domain blocks listed in the domain pool.
Step 10: Get the next range.
Step 11: Repeat Steps 4 to 10 for all range blocks listed in the range pool.

In order to obtain high-speed FIC with a higher compression ratio while preserving as much reconstructed image quality as possible, different values of the variance threshold are adopted in the present research.
V. RESULTS AND DISCUSSION
In order to show the effects of each of the RE and
RDIZ on the traditional FIC, RE will be tested and its
results will be discussed at first and then we will discuss
the tests and results of coupling RE and RDIZ together in
one algorithm.
A. Results and Discussion of RE
The FIC program was first applied to many images without checking whether the image contains homogeneous regions (i.e., HP = 0, the normal state); then the RE technique was applied to the same images with different HP values. Table I shows the effects of applying different HP values to the Peppers image. When HP = 2, the compression ratio increases to 5.534 and the required time drops from 34 seconds to 18 seconds; when HP = 4, the compression ratio increases to 5.65 and the encoding time decreases to 14 seconds; when HP = 10, the compression ratio increases to 5.747 and the compression time decreases to 11 seconds; and when HP = 30, the compression ratio increases to 5.82 with some loss in recovered image quality, though still high (30.74 dB), and the encoding time decreases to 9 seconds. Compared with the time spent when HP = 0 (34 seconds), the compression process is sped up by about 73.5%.

Fig. 7 shows the results of applying HP values of 0, 4 and 30 to the Pepper image. Figs. 8 and 9 show results obtained when applying HP = 0, 6 and 20 to the Bird and Lenna images, respectively.
In the experiments, HP values ranging from 1 to 30 were applied to all images. From the data obtained, the relationship between HP and the encoding time is plotted in Fig. 10, the relationship between HP and the quality of the recovered image in Fig. 11, and the relationship between HP and the compression ratio in Fig. 12.


Fig. 8 Results of the impact of HP on the BIRD image (original file size: 64 kb). Reconstructed images: (a) traditional method (HP=0): encoding time 34 s, PSNR 30.30 dB, C.R. 5.12, file size 12.5 kb; (b) HP=6: 13 s, PSNR 29.92 dB, C.R. 5.68, 11.27 kb; (c) HP=20: 9 s, PSNR 28.69 dB, C.R. 5.80, 11.03 kb.
Fig. 9 Results of the impact of HP on the LENNA image (original file size: 64 kb). Reconstructed images: (a) traditional method (HP=0): encoding time 35 s, PSNR 32.22 dB, C.R. 5.12, file size 12.5 kb; (b) HP=6: 12 s, PSNR 31.51 dB, C.R. 5.72, 11.18 kb; (c) HP=20: 9 s, PSNR 30.64 dB, C.R. 5.81, 11.01 kb.
Fig. 7 Results of the impact of HP on the PEPPER image (original file size: 64 kb). Reconstructed images: (a) traditional method (HP=0): encoding time 34 s, PSNR 32.69 dB, C.R. 5.12, file size 12.5 kb; (b) HP=4: 14 s, PSNR 32.15 dB, C.R. 5.65, 11.33 kb; (c) HP=30: 9 s, PSNR 30.74 dB, C.R. 5.82, 10.99 kb.
TABLE I.
THE EFFECTS OF USING DIFFERENT HP VALUES ON C.R., E.T., AND THE QUALITY OF THE PEPPER IMAGE

HP value   C.R.    E.T. (seconds)   PSNR (dB)
0          5.12    34               32.69
2          5.534   18               32.45
4          5.65    14               32.15
10         5.747   11               31.65
30         5.82    9                30.74






















































































B. Results and Discussion of Coupling RDIZ with RE
To achieve further speed-up and a higher compression ratio, RDIZ is coupled with RE in one algorithm, called RD-RE. RD-RE was applied to the same test images with the same HP values as before. The results show a reasonable increase in compression ratio, a significantly reduced encoding time, and a reconstructed image quality that is still acceptable. Table II shows the effects of applying RD-RE with different HP values on the Peppers image.





Fig. 10 The effect of HP on the encoding time (Lenna, Pepper and Bird images)

Fig. 11 The effect of HP on the image quality (Lenna, Pepper and Bird images)
TABLE II.
THE EFFECTS OF APPLYING RD-RE WITH DIFFERENT HP VALUES ON C.R., E.T., AND THE QUALITY OF THE PEPPER IMAGE

HP value   C.R.    E.T. (seconds)   PSNR (dB)
0          5.56    9                31.9
2          5.89    4.5              31.3
4          6.1     3.6              31.04
10         6.29    2.7              30.56
30         6.375   2.1              29.68
Fig. 12 The effect of HP on the compression ratio (C.R.) (Lenna, Pepper and Bird images)
Table II shows that when HP = 0, RE has no effect, so the results are affected only by RDIZ: the C.R. increases from 5.12 to 5.56 (about 8.6%) and the time is reduced from 34 to 9 seconds (about 73.5%), with a small loss in PSNR (about 2.4%). The other HP values show the full effect of RD-RE on the FIC results. Comparing the results of Tables I and II, it is clear that using HP = 30 in the RE method gives an acceptable PSNR (about 30 dB), increases the C.R. from 5.12 to 5.82 (about 14%), and reduces the E.T. from 34 to 9 seconds (only about 26% of the time required when HP = 0). In RD-RE, the same PSNR (about 30 dB) can be achieved with HP = 10, but the C.R. increases to 6.29, meaning this method achieves a higher compression ratio (about 22.85% more) than traditional FIC; the encoding time is also significantly reduced, to 2.7 seconds, a decrease of about 92% relative to traditional FIC. Figs. 13-15 show the effects of applying RD-RE with different HP values to the Pepper, Bird and Lenna images, respectively.
Figs. 16-18 compare the results achieved by applying traditional FIC, the RE technique, the RDIZ technique and the RD-RE algorithm to the test images (in the figures, the results of traditional FIC are those with HP = 0 in the RE columns, and the results of RDIZ are those with HP = 0 in the RD-RE columns). The figures show that RDIZ can reduce the encoding time to about 1/4th of that required by traditional FIC and can achieve about 8.6% more C.R., with a small loss in PSNR that differs from image to image but averages about 1.78% (for the test images). The figures also show the effects of RE and RD-RE with different HP values on the test images; the results differ from one image to another depending on how many homogeneous regions each contains. Taking HP = 10 as a comparison value, the encoding time is reduced by about 62% and 91% on average, and the C.R. is increased by about 12.2% and 22% on average, by applying RE and RD-RE respectively, relative to traditional FIC on the test images, while the PSNR is reduced slightly, by about 2.77% and 5.71% respectively.






Fig. 13 Results of applying RD-RE with different HP values on the PEPPER image (original file size: 64 kb). Reconstructed images: (a) HP=0: encoding time 9 s, PSNR 31.9 dB, C.R. 5.56, file size 11.51 kb; (b) HP=4: 3.6 s, PSNR 31.04 dB, C.R. 6.1, 10.49 kb; (c) HP=30: 2.1 s, PSNR 29.68 dB, C.R. 6.375, 10.03 kb.

Fig. 14 Results of applying RD-RE with different HP values on the BIRD image (original file size: 64 kb). Reconstructed images: (a) HP=0: 9 s, PSNR 29.75 dB, C.R. 5.56, 11.51 kb; (b) HP=6: 3.27 s, PSNR 28.99 dB, C.R. 6.13, 10.44 kb; (c) HP=20: 2.24 s, PSNR 27.74 dB, C.R. 6.34, 10.09 kb.
























































Fig. 15 Results of applying RD-RE with different HP values on the LENNA image (original file size: 64 kb). Reconstructed images: (a) HP=0: encoding time 9 s, PSNR 31.85 dB, C.R. 5.56, file size 11.51 kb; (b) HP=6: 3.2 s, PSNR 30.53 dB, C.R. 6.19, 10.33 kb; (c) HP=20: 2.21 s, PSNR 29.62 dB, C.R. 6.31, 10.14 kb.
Fig. 16 The effects of applying RE and RD-RE on the E.T. with different HP values (Pepper, Bird and Lenna images)

Fig. 17 The effects of applying RE and RD-RE on the C.R. with different HP values (Pepper, Bird and Lenna images)

Fig. 18 The effects of applying RE and RD-RE on the PSNR with different HP values (Pepper, Bird and Lenna images)
VI. CONCLUSIONS
1. The effects of HP values on the FIC results differ from one image to another depending on the image composition; the best results are achieved on images that have many homogeneous regions.
2. Experiments showed that HP = 0 represents the normal (traditional) FIC process.
3. High HP values exclude more range areas from the search and matching process of the compression algorithm, providing high speed and a higher compression ratio at the expense of quality.
4. Experiments (on the test images) showed that HP values greater than 30 in the RE technique, and greater than 10 in the RD-RE technique, may lead to poor quality.
5. HP = 0 in RD-RE means the results are affected only by the RDIZ technique; the results showed that RDIZ significantly reduces the E.T. and remarkably increases the C.R., with a slight loss in PSNR.
6. There is an inverse relationship between HP and the time spent in the encoding process.
7. There is an inverse relationship between HP and the quality of the recovered image.
8. There is a direct relationship between HP and the compression ratio.
9. HP = 10 is the best value in the RD-RE algorithm because it gives the best trade-off among E.T., C.R. and PSNR (for the test images).

ACKNOWLEDGMENT
This work was supported by the Natural Science
Foundation of China (NSFC) (No. 60873140, 61073125
and 61071179), the Program for New Century Excellent
Talents in University (No. NCET-08-0155 and NCET-08-
0156), and the Fok Ying Tong Education Foundation
(No. 122035).
REFERENCES
[1] Y. Chakrapani and K. Soundera Rajan, "Hybrid Genetic-Simulated Annealing Approach for Fractal Image Compression," International Journal of Information and Mathematical Sciences, vol. 4, no. 4, pp. 308-313, 2008.
[2] R. C. Gonzalez and P. Wintz, Digital Image Processing, 2nd ed., Addison-Wesley, 1987.
[3] M. Barnsley, Fractals Everywhere. New York: Academic, 1988.
[4] V. Chaurasia and A. Somkuwar, "Speed up Technique for Fractal Image Compression," International Conference on Digital Image Processing (ICDIP), pp. 319-323, 2009, doi: 10.1109/ICDIP.2009.66.
[5] Y. Fisher, Fractal Image Compression: Theory and Application, Springer-Verlag, New York, 1994.
[6] Ghada K., "Adaptive Fractal Image Compression," M.Sc. thesis, National Computer Center / Higher Education Institute of Computer and Informatics, 2001.
[7] Sumathi Poobal and G. Ravindran, "Arriving at an Optimum Value of Tolerance Factor for Compressing Medical Images," World Academy of Science, Engineering and Technology, vol. 24, pp. 169-173, 2006.
[8] Mahadevaswamy H. R., "New Approaches to Image Compression," Ph.D. thesis, Regional Engineering College, University of Calicut, December 2000.
[9] Auday A., "Fractal Image Compression with Fasting Approaches," M.Sc. thesis, College of Science, Saddam University, 2003.
[10] Jamila H. S., "Fractal Image Compression," Ph.D. thesis, College of Science, University of Baghdad, January 2001.
[11] S. Abdul-Khalik, "Fractal Image Compression Using Shape Structure," M.Sc. thesis, College of Science, Al-Mustansiriya University, Iraq, 2005.
[12] Dietmar Saupe, "Accelerating Fractal Image Compression by Multi-Dimensional Nearest Neighbor Search," DCC '95 Data Compression Conference, J. A. Storer and M. Cohn (eds.), IEEE Computer Society Press, March 1995.
[13] http://en.wikipedia.org/wiki/Self-similarity, accessed 1/6/2011.
[14] Mario Polvere and Michele Nappi, "Speed-Up in Fractal Image Coding: Comparison of Methods," IEEE Transactions on Image Processing, vol. 9, no. 6, June 2000.
[15] E. Vrcasy and L. Colin, "Image Compression Using Fractals," IBM Journal of Research and Development, vol. 65, no. 19, pp. 121-134, 1995.


Taha M. Hasan received his B.Sc. and M.Sc. in computer science from Mansour University College and the University of Mustansiriyah, Baghdad, Iraq, in 1992 and 2006, respectively. He is currently pursuing the Ph.D. degree at the Harbin Institute of Technology (HIT), Harbin, China. His research interest is image processing.


Xiangqian Wu received his B.Sc., M.Sc. and Ph.D. in computer science from the Harbin Institute of Technology (HIT), Harbin, China, in 1997, 1999 and 2004, respectively. He is currently a professor at HIT. His research interests include image processing, pattern recognition and biometrics.



Information Loss Determination on Digital Image
Compression and Reconstruction Using
Qualitative and Quantitative Analysis

Zhengmao Ye and Habib Mohamadian

Southern University, Baton Rouge, Louisiana, USA
Emails: {zhengmaoye, habib}@engr.subr.edu

Yongmao Ye

Liaoning Radio and Television Station, Shenyang, China
Email: yeyongmao@hotmail.com



Abstract—To effectively utilize storage capacity, digital image compression has been applied to numerous science and engineering problems. There are two fundamental classes of image compression techniques: lossless and lossy. The former employs probabilistic models for lossless storage on the basis of the statistical redundancy occurring in digital images; however, it is limited in compression ratio and bits per pixel. Hence, the latter has also been widely implemented to further improve storage capacity, covering various fundamental digital image processing approaches. It is well documented that most lossy compression schemes provide perfect visual perception at an exceptional compression ratio, among which the discrete wavelet transform, the discrete Fourier transform and some statistical optimization schemes (e.g., principal component analysis and independent component analysis) are the dominant approaches. It is necessary to evaluate these compression and reconstruction schemes objectively, in addition to their visual appeal. Using a well-defined set of quantitative metrics from information theory, a comparative study of several typical digital image compression and reconstruction schemes is conducted in this research.

Index Terms: Image Compression, Image Reconstruction,
Discrete Wavelet Transform, Discrete Fourier Transform,
Optimal Compression

I. INTRODUCTION
The objectives of digital image compression schemes
are to minimize the image size so as to speed up data
transmission and reduce the memory requirement, while
to retain the image quality at an acceptable level.
Lossless compression leads to an exact replica of the
source image when being decompressed. By contrast,
lossy compression can also be introduced to save more
storage space with a trade-off of sacrificing finer
information. In general, digital images have certain
statistical properties that are exploited by encoders, thus
the compression result is always less than optimal. With
a little tolerable mismatch between the compressed and
source images, an approximation is made adequately.
Applications of digital image compression involve a
wide variety of methodologies and techniques, many of which have appeared in the literature [1-4].
Context-based modeling is of great importance for
high-performance lossless data compression. Partial
approximate matching is used to reduce the modeling
costs and enhance compression efficiency based on
previous context modeling. It has competitive lossless
compression performance [10]. Huber regression is
applied to robust fractal image compression design,
which is insensitive against noises and outliers in the
corrupted source images. To reduce the high
computational costs, particle swarm optimization is
utilized to minimize searching time. Encoding time is
effectively reduced while the quality of retrieved images
is preserved [11]. Adaptive coding techniques are used to
exploit the spatial energy compaction property of discrete
wavelet transform. Two crucial issues are the flexibility
level and coding efficiency. Spherical coder is an
adaptive framework using local energy as a direct
measure to differentiate wavelet subbands and allocate
the bit rates. The scheme is nonredundant with the
competitive peak signal to noise ratio (PSNR) [12]. An
adaptive one-hidden-layer feedforward neural network is
applied to image compression. Training, generalization
capabilities and quantization effects of this adaptive
network are improved with promising results [13]. A new
encoding algorithm is proposed for matching pursuit
image coding. Coding performance is improved when
correlations are used in encoding. Optimization is reached as a tradeoff among efficient atom position coding, atom coefficient coding and encoder parameter selection. The proposed algorithm outperforms existing coding algorithms for matching pursuit image coding. The
algorithm also provides better rate distortion performance
than JPEG 2000 at low bit rates [14]. A practical uniform
down-sampling is proposed in image space, making the
sampling adaptive by spatially varying, directional low-
pass prefiltering. The decoder decompresses the low-
resolution image and then upconverts it to the original
resolution in the constrained least squares restoration
process, using a 2D piecewise autoregressive model and
directional low-pass prefiltering. It achieves superior
visual quality and also outperforms JPEG2000 in PSNR
measure at the low and medium bit rates [15]. A
multidimensional multiscale parser is used for universal
lossy compression of images and videos. It is based on
multiscale pattern matching and block encoding of an
input signal. The pattern is updated using encoded
blocks. It presents a flexible probability model using
smoothness constraints. It improves both smooth images
and less smooth ones [16]. The 2D oriented wavelet
transform is used for image compression, where two
separable 1D transforms are implemented with direction
consistency. Interpolation rules are designed to generate
rectangular subbands and direction mismatch is adjusted.
The proposed compression scheme outperforms the
JPEG2000 for remote sensing images with high
resolution. It is suitable for real time processing at a low
cost [17]. To reduce coding distortions at borders,
symmetric extension filter bank is applied to wavelet
packet coding. It provides a framework to accommodate
FIR and IIR filters with potential perfect reconstruction.
The IIR filters with both the rational and irrational
transfer functions are implemented, giving rise to perfect
stopband suppression [18]. Linear convolution neural
network has been used to seek optimal wavelet kernel
that minimizes errors and maximizes image compression
efficiency. Daubechies wavelet transform can produce
the highest compression efficiency and smallest mean-
square-error for most patterns. Haar wavelet produces
solid results on sharp edges and low noise smooth areas.
It provides robust fractal image compression [19]. A
supervised classification scheme is also proposed for
hyperspectral imagery, consisting of the greedy modular
eigenspace and positive Boolean function. The greedy
modular eigenspace is implemented as a feature extractor
to generate the feature eigenspace for all material classes
present in hyperspectral data to reduce dimensionality.
The positive Boolean function is a stack filter for
supervised training. It uses the minimum classification
error as the criterion to improve the classification
performance. The feature extractor serves as a nonlinear
PBF-based multiclass classifier for classification
preprocessing. It increases the accuracy for hyperspectral
imagery [20]. Principal component analysis (PCA) is a
multivariate statistical method for image compression. It
is sensitive to the outliers and missing data, so fuzzy
statistics is introduced into the classical PCA methods.
The surface characteristics are differentiated sufficiently
and feature recognition accuracy is improved greatly
[21]. At the same time, several optimization based
approaches such as PCA, nonlinear component analysis
(NCA) and independent component analysis (ICA) have
been applied to research fields of image processing and
pattern recognition [22-23].
No doubt that each scheme has its own benefit for
image compression. However, it is impractical to claim
any to be the most powerful one as some conflicting
results occur in different cases. Tradeoffs also need to be
made frequently among the compression ratios and
reconstruction qualities as well as computational costs. In
fact, there are at least three dominating compression
approaches for digital image, such as prevalent discrete
wavelet transform, discrete Fourier transform and
optimal statistical schemes. In order to further access the
merits and drawbacks of these approaches, quantitative
metrics are introduced to measure qualities of image
compression based on these major approaches [5-9].
II. SCHEMES FOR IMAGE COMPRESSION
Three digital image compression and reconstruction
schemes will be analyzed in this section.
A. Discrete Wavelet Transform (DWT)
2D discrete wavelet transform uses a set of basis
functions for image decomposition. In a two dimensional
case, one scaling function φ(x, y) and three wavelet functions ψ^H(x, y), ψ^V(x, y) and ψ^D(x, y) are employed. Each scaling function or wavelet function is the product of one-dimensional basis functions. Four products produce the scaling function (1) and the separable, directionally sensitive wavelet functions (2)-(4), resulting in a quaternary tree structure. The simple Haar Transform has been used to determine the scaling and wavelet functions.

φ(x, y) = φ(x)φ(y)      (1)

ψ^H(x, y) = ψ(x)φ(y)      (2)

ψ^V(x, y) = φ(x)ψ(y)      (3)

ψ^D(x, y) = ψ(x)ψ(y)      (4)

These wavelets measure variations for images along three directions, where ψ^H(x, y) corresponds to variations along columns (horizontal), ψ^V(x, y) corresponds to variations along rows (vertical) and ψ^D(x, y) corresponds to variations along diagonals. The scaled and translated basis functions are defined by:

φ_{j,m,n}(x, y) = 2^(j/2) φ(2^j x - m, 2^j y - n)      (5)

ψ^i_{j,m,n}(x, y) = 2^(j/2) ψ^i(2^j x - m, 2^j y - n),  i = {H, V, D}      (6)

where the index i identifies the directional wavelets H, V and D. The discrete wavelet transform of a function f(x, y) of size M by N is formulated as:

W_φ(j0, m, n) = (1/√(MN)) Σ_{x=0}^{M-1} Σ_{y=0}^{N-1} f(x, y) φ_{j0,m,n}(x, y)      (7)

W^i_ψ(j, m, n) = (1/√(MN)) Σ_{x=0}^{M-1} Σ_{y=0}^{N-1} f(x, y) ψ^i_{j,m,n}(x, y)      (8)

where i = {H, V, D}, j0 is the initial scale, the W_φ(j0, m, n) coefficients define the approximation of f(x, y), and the W^i_ψ(j, m, n) coefficients represent the horizontal, vertical and diagonal details for scales j >= j0. Here j0 = 0 and N = M = 2^J are selected, so that j = 0, 1, 2, ..., J-1 and m, n = 0, 1, 2, ..., 2^j - 1. Then f(x, y) is retrieved via the inverse discrete wavelet transform.
2D wavelet compression is introduced, where discrete
wavelet transform is implemented as the N level
decomposition. At each level N, outputs of wavelet
decomposition include: the approximation, horizontal,
vertical and diagonal details, at a quarter size of the
source image, followed by downsampling by a factor of
two. The approximation will be further decomposed until
level N is reached while detail components are held. The
approximation components will be retained for image
reconstruction. The level-dependent soft thresholding is
then selected and applied to detail coefficients using a
shrinkage function at each level. It produces level and
orientation dependent thresholds. The reconstruction is
an inverse transform. At each level back from N to 1, the
approximation and updated detail coefficients will both
be applied for wavelet image reconstruction. Discrete
wavelet compression is suitable for transient signals. On the other hand, Fourier transform is suitable for smooth and periodic signals.
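The decomposition, soft-thresholding and reconstruction steps described above can be sketched with a single-level Haar transform, the same basis the paper adopts. This is a minimal illustration, not the authors' implementation; the image, threshold value and all function names are illustrative, and NumPy is assumed:

```python
import numpy as np

def haar2d(img):
    """One level of the 2D Haar DWT: returns the approximation and
    three detail subbands, each a quarter the size of the input."""
    s = np.sqrt(2)
    a = (img[0::2, :] + img[1::2, :]) / s   # lowpass along rows
    d = (img[0::2, :] - img[1::2, :]) / s   # highpass along rows
    LL = (a[:, 0::2] + a[:, 1::2]) / s      # approximation
    LH = (a[:, 0::2] - a[:, 1::2]) / s      # horizontal detail
    HL = (d[:, 0::2] + d[:, 1::2]) / s      # vertical detail
    HH = (d[:, 0::2] - d[:, 1::2]) / s      # diagonal detail
    return LL, LH, HL, HH

def ihaar2d(LL, LH, HL, HH):
    """Inverse of haar2d (perfect reconstruction)."""
    s = np.sqrt(2)
    a = np.empty((LL.shape[0], LL.shape[1] * 2))
    d = np.empty_like(a)
    a[:, 0::2] = (LL + LH) / s
    a[:, 1::2] = (LL - LH) / s
    d[:, 0::2] = (HL + HH) / s
    d[:, 1::2] = (HL - HH) / s
    img = np.empty((a.shape[0] * 2, a.shape[1]))
    img[0::2, :] = (a + d) / s
    img[1::2, :] = (a - d) / s
    return img

def soft(x, t):
    """Soft-thresholding shrinkage function applied to detail coefficients."""
    return np.sign(x) * np.maximum(np.abs(x) - t, 0.0)

rng = np.random.default_rng(0)
img = rng.random((8, 8))
LL, LH, HL, HH = haar2d(img)
rec = ihaar2d(LL, soft(LH, 0.05), soft(HL, 0.05), soft(HH, 0.05))
# with no thresholding the transform reconstructs the image exactly
exact = ihaar2d(LL, LH, HL, HH)
```

Compression comes from the shrinkage: many small detail coefficients become exactly zero and need not be stored, while the approximation is retained intact.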
B. Discrete Fourier Transform (DFT)
2D discrete Fourier transform acts as another simple
conventional scheme for digital image compression. DFT
operates on a function at a finite number of discrete data
points. The formulation is less complex compared with
DWT. Using DFT, an image in the spatial domain is
transformed into a function in the frequency domain,
which can be separated into the real and imaginary
components. DFT returns the value of Fourier transform
for a set of values in frequency domain which are equally
spaced. 2D DFT of f(x,y) in the spatial domain to F(u, v)
in the frequency domain is shown in (9), where M
samples can be obtained at values of x from 0 to M-1 and
N samples can be obtained at values of y from 0 to N-1.
F(u, v) = (1/(MN)) Σ_{x=0}^{M-1} Σ_{y=0}^{N-1} f(x, y) e^(-2πi(xu/M + yv/N))      (9)

Now DFT decomposes a digital image into its real and
imaginary components to represent image information in
the frequency domain. Discrete Cosine Transform (DCT)
and Discrete Sine Transform (DST) are both special
cases of the discrete Fourier transform (DFT). In the
spatial domain, DFT operates on both cosine and sine
functions at a finite number of discrete data points, while DCT and DST use solely the cosine (even) function and the sine (odd) function, respectively. The number of frequencies
in the frequency domain is equivalent to the number of
pixels in the spatial domain. In the frequency domain,
DCT contains solely real parts while DST contains solely
imaginary parts of the DFT complex exponentials. DFT
is applied in context. Image compression via Fourier
transform requires zero padding of input data. Inverse
DFT can also be implemented for reconstruction.
DFT is used to reduce computation cost via separating
the total data points into multiple sets, where image
compression is conducted via zero padding of input data.
The compression ratio depends on the percentage of real
and imaginary components that is actually substituted by
zero using zero padding. To reconstruct the image in the
spatial domain after compression, it can be retrieved
from the frequency domain back to the spatial domain
using inverse Fourier transform, as shown in (10).
f(x, y) = Σ_{u=0}^{M-1} Σ_{v=0}^{N-1} F(u, v) e^(+2πi(xu/M + yv/N))      (10)
Both DWT and DFT techniques can be successfully
applied to reduce the amount of memory needed to
represent a digital image. The relatively smaller amount
of information is efficiently used to represent the true
image via DWT or DFT image compression. By and
large, DFT is more appropriate for smooth images and
DWT is more appropriate for images with high frequency
components.
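The coefficient-zeroing idea described above, where a chosen percentage of the real and imaginary components is substituted by zero, can be sketched as follows. This is an interpretive sketch assuming NumPy; the keep fraction, image and function name are illustrative, not the paper's code:

```python
import numpy as np

def dft_compress(img, keep=0.25):
    """Zero all but the largest `keep` fraction of 2D DFT coefficients
    (by magnitude), then reconstruct with the inverse DFT."""
    F = np.fft.fft2(img)
    mags = np.sort(np.abs(F).ravel())
    # threshold chosen so that roughly `keep` of the coefficients survive
    thresh = mags[int((1 - keep) * mags.size)]
    F_compressed = np.where(np.abs(F) >= thresh, F, 0)
    rec = np.fft.ifft2(F_compressed).real
    return F_compressed, rec

rng = np.random.default_rng(1)
img = rng.random((16, 16))
# keeping every coefficient recovers the image exactly
Fc_full, rec_full = dft_compress(img, keep=1.0)
# keeping a quarter of the coefficients yields a lossy approximation
Fc_quarter, rec_quarter = dft_compress(img, keep=0.25)
```

The compression ratio follows directly from `keep`: only the surviving complex coefficients need to be stored.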
C. 2D Principal Component Analysis (PCA)
Statistical optimization is a unique approach in digital
image compression. A digital image can be sorted into
smaller ones and then compressed by projecting a set of
input block matrices onto a reduced number of vectors
(principal components) being estimated. The image
matrix (e.g., MN) is thus divided into a set of small
blocks of the same small size (e.g., mn matrices), giving
rise to a block set. Each block is simply substituted by a
single column vector of the length m*n in order, which
acts as one input vector. Upon image compression, all
single input vectors are projected onto a reduced number
of the vectors estimated from the input vectors by the
generalized Hebbian algorithm, where the principal
components can be determined based on the vectors
using one layer linear neural network. The estimated
components are the final weight vectors of the neural
networks.
This approximation is optimal in a least square sense.
The most significant component has the highest accuracy
and remaining ones have the decreasing accuracy. Using
a single layer neural network with linear neurons, the
generalized Hebbian algorithm illustrates a feedforward
neural network for unsupervised learning. Its learning
rule is formulated as:
Δw_ij = η y_j (x_i - Σ_{k=1}^{j} w_ik y_k)      (11)

where η is the learning rate; w_ij is the synaptic weighting function between the ith input and jth output neurons; x and y are the input vector and output activation vector, respectively. It has a normalization effect with the
presence of the current vector in the projection
subtraction. The actual number of neurons represents the
final number of principal components, which also
determines the compression ratio and the bit per pixel. A
smaller number will enhance the compression ratio and
the storage capacity. Several significant eigenvectors can
be obtained using this learning algorithm. It converges on
the eigenvalue decomposition with a probability of one.
To retrieve the digital image after data compression, the
input vectors are obtained in terms of the chosen
principal components at the reduced number. After
reversing the single matrix blocks in order and then
reconstructing multiple image blocks, the decompressed
image is obtained. At the same time, the image size is
reduced at a certain compression ratio, which will gain
the storage capacity.
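The block-projection pipeline above can be sketched compactly. Note the hedge: a direct eigendecomposition of the block covariance is used here in place of the generalized Hebbian algorithm, since both converge to the same principal subspace; the image size, block size, component count and function name are all illustrative, and NumPy is assumed:

```python
import numpy as np

def pca_block_compress(img, block=(4, 4), n_components=6):
    """Block-based PCA compression: vectorize m*n blocks, project onto
    the leading principal components, then reconstruct the image."""
    m, n = block
    M, N = img.shape
    # each m*n block becomes one row vector of length m*n
    blocks = (img.reshape(M // m, m, N // n, n)
                 .transpose(0, 2, 1, 3)
                 .reshape(-1, m * n))
    mean = blocks.mean(axis=0)
    X = blocks - mean
    cov = X.T @ X / X.shape[0]
    vals, vecs = np.linalg.eigh(cov)        # eigenvalues in ascending order
    W = vecs[:, ::-1][:, :n_components]     # leading principal components
    Y = X @ W                               # compressed representation
    X_rec = Y @ W.T + mean                  # least-squares reconstruction
    return (X_rec.reshape(M // m, N // n, m, n)
                 .transpose(0, 2, 1, 3)
                 .reshape(M, N))

rng = np.random.default_rng(2)
img = rng.random((16, 16))
rec_full = pca_block_compress(img, n_components=16)  # complete basis: exact
rec_low = pca_block_compress(img, n_components=6)    # lossy compression
```

As in the text, the chosen number of components fixes the compression ratio: fewer components mean less storage but a larger least-squares reconstruction error.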
D. Visual Comparisons of Three Compression Schemes
The visual effects in terms of three dominant image
compression schemes are analyzed in this section from a
qualitative perspective.
Figure 1. Image Compression of Pyramid and Sphinx (panels: gray level source image, component analysis training, discrete wavelet transform, discrete Fourier transform)

Figure 2. Image Compression of National Capitol Columns (panels: gray level source image, component analysis training, discrete wavelet transform, discrete Fourier transform)

Figure 3. Image Compression of Great Goose Pagoda (panels: gray level source image, component analysis training, discrete wavelet transform, discrete Fourier transform)
All three selected schemes have provided solid image
compression without degradation of the image quality.
Three archetypal gray level images are chosen as shown
in Figs. 1-3. The first one is the picture of the Great
Pyramid and Sphinx in Egypt, the second one is the
picture of the National Capitol Columns in USA, and the
third one is the picture of Great Goose Pagoda in China,
symbolizing the integration of ancient and modern
civilization in human history. Discrete wavelet transform,
discrete Fourier transform and the optimal principal
component analysis have been applied to compress and
reconstruct the images. For each of three cases, no
remarkable distinction is shown when the source and
reconstruction images using three schemes are compared.
As a result, the objective evaluation should be conducted
instead to make further comparisons.
III. QUANTITATIVE ANALYSIS
Digital images with M×N pixels have been considered.
Occurrence of the gray level is shown as a co-occurrence
matrix of relative frequencies. Occurrence probability
functions are then estimated from the histogram.
A. Discrete Entropy
The discrete entropy is a measure of the information
content, which can be interpreted as the average
uncertainty of the information source. The discrete
entropy is the summation of products of the probability
of the outcome multiplied by the logarithm of the inverse
of the probability of the outcome, taking into consideration all possible outcomes {1, 2, ..., n} as the gray levels in the event {x_1, x_2, ..., x_n}, where p(i) represents the probability at level i, which contains all the histogram counts. It is shown in (12).

H(x) = Σ_{i=1}^{2^k} p(i) log2(1/p(i)) = -Σ_{i=1}^{2^k} p(i) log2 p(i)      (12)
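Equation (12) can be checked numerically from a normalized histogram. A minimal sketch assuming NumPy; the 256-bin histogram over normalized intensities mirrors the setup in the text, while the test images and function name are illustrative:

```python
import numpy as np

def discrete_entropy(img, bins=256):
    """Discrete entropy H(x) of a gray level image, estimated from
    its normalized histogram (equation 12)."""
    counts, _ = np.histogram(img, bins=bins, range=(0, 1))
    p = counts / counts.sum()
    p = p[p > 0]                      # 0 * log(0) is taken as 0
    return -np.sum(p * np.log2(p))

# a constant image carries no information: H = 0
flat = np.full((8, 8), 0.5)
# four equally likely gray levels: H = log2(4) = 2 bits
levels = np.repeat([0.1, 0.3, 0.6, 0.9], 16).reshape(8, 8)
```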
B. Discrete Energy
The discrete energy measure indicates how the gray
level elements are distributed. Its formulation is shown in
(13), where E(x) represents the discrete energy with 256
bins and p(i) refers to the probability distribution
function at different gray level, which contains the
histogram counts. For any constant value of the gray
level, the energy measure reaches its maximum value of
one. The larger energy corresponds to lower gray level
number and the smaller one corresponds to higher gray
level number.

E(x) = Σ_{i=1}^{2^k} p(i)^2      (13)
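The energy measure of equation (13) follows the same histogram recipe; a sketch assuming NumPy, with illustrative test images that exercise the maximum-value property stated below:

```python
import numpy as np

def discrete_energy(img, bins=256):
    """Discrete energy E(x): the sum of squared histogram
    probabilities (equation 13)."""
    counts, _ = np.histogram(img, bins=bins, range=(0, 1))
    p = counts / counts.sum()
    return np.sum(p ** 2)

# a constant gray level reaches the maximum energy of one
flat = np.full((8, 8), 0.5)
# two equally likely gray levels give E = 0.5^2 + 0.5^2 = 0.5
two = np.repeat([0.25, 0.75], 32).reshape(8, 8)
```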
C. Contrast
Contrast is the amount of the gray level (or true color)
difference in the visual properties which shows the
difference between one object and another or background
within the same field of view. The high contrast level and
low contrast level will display distinguishable degree of
variations in gray level (or true color) visual perception.
Highlights and shadows will depict intense differences of
density in image tones for high contrast images, but
depict mild differences of density in image tones for low
contrast images. In the context, root mean square (RMS)
contrast is used as another quantitative metric. It is
defined as the standard deviation of the pixel intensities,
which is not dependent on the actual spatial distribution
of the image contrast.
Contrast = √( (1/(M*N)) Σ_{i=0}^{M-1} Σ_{j=0}^{N-1} [g(i,j) - g_AVG]^2 )      (14)

where the intensity g(i, j) denotes an element at coordinates i and j on a 2D image of size M×N, and g_AVG is the average intensity of all pixel values within an image. The image intensity has been normalized within the range [0, 1].
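Since RMS contrast is just the standard deviation of the normalized intensities, it is a one-liner; a sketch assuming NumPy, with illustrative test images:

```python
import numpy as np

def rms_contrast(img):
    """RMS contrast: the standard deviation of the normalized
    pixel intensities (equation 14)."""
    return np.sqrt(np.mean((img - img.mean()) ** 2))

flat = np.full((8, 8), 0.5)                    # no contrast at all
checker = np.indices((8, 8)).sum(axis=0) % 2   # alternating 0/1 pattern
```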
D. Correlation
Correlation is a standard measure of the image contrast
to analyze the linear dependency of the gray levels of
neighboring pixels. It indicates the amount of local
variations across a gray level image. The higher the
contrast is, the sharper the structural variation is. This
measure is formulated as:
COR = Σ_{i=0}^{M-1} Σ_{j=0}^{N-1} g(i,j) (i - μ_i)(j - μ_j) / (σ_i σ_j)      (15)

μ_i = Σ_{i=0}^{M-1} Σ_{j=0}^{N-1} [i * g(i,j)];   σ_i^2 = Σ_{i=0}^{M-1} Σ_{j=0}^{N-1} (i - μ_i)^2 g(i,j)

μ_j = Σ_{i=0}^{M-1} Σ_{j=0}^{N-1} [j * g(i,j)];   σ_j^2 = Σ_{i=0}^{M-1} Σ_{j=0}^{N-1} (j - μ_j)^2 g(i,j)

where i and j are the coordinates of the co-occurrence matrix; M and N represent the total numbers of pixels in the row and column of the digital image; g(i, j) is the element in the co-occurrence matrix at coordinates i and j; μ_i and σ_i^2 are the horizontal mean and variance, and μ_j and σ_j^2 are the vertical mean and variance. σ is a metric of the gray tone variance.
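Equation (15) operates on a normalized co-occurrence matrix, so a sketch needs one too. NumPy is assumed; the tiny quantized test image, horizontal-neighbor pairing and function names are illustrative choices, since the paper does not specify the pixel-pair offset:

```python
import numpy as np

def glcm(img, levels=4):
    """Normalized co-occurrence matrix of horizontally adjacent
    pixel pairs for a small quantized gray level image."""
    g = np.zeros((levels, levels))
    for a, b in zip(img[:, :-1].ravel(), img[:, 1:].ravel()):
        g[a, b] += 1
    return g / g.sum()

def correlation(g):
    """Co-occurrence correlation (equation 15)."""
    i, j = np.indices(g.shape)
    mu_i, mu_j = np.sum(i * g), np.sum(j * g)
    var_i = np.sum((i - mu_i) ** 2 * g)
    var_j = np.sum((j - mu_j) ** 2 * g)
    return np.sum(g * (i - mu_i) * (j - mu_j)) / np.sqrt(var_i * var_j)

# four flat 2x2 patches: neighboring gray levels are highly correlated
img = np.array([[0, 0, 1, 1],
                [0, 0, 1, 1],
                [2, 2, 3, 3],
                [2, 2, 3, 3]])
g = glcm(img)
```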
E. Dissimilarity
Dissimilarity between two gray level images is
expressed as the distance between two sets of co-
occurrence matrix representations. It is formulated on a
basis of local distance representation as shown in (16).
DisSim = Σ_{i=0}^{M-1} Σ_{j=0}^{N-1} g(i,j) |i - j|      (16)

where g(i, j) is an element in the co-occurrence matrix at
the coordinates i and j; M and N represent total numbers
of pixels in the row and column of the digital image.
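Equation (16) weights the co-occurrence matrix by the gray level distance |i - j|; a sketch assuming NumPy and a precomputed normalized co-occurrence matrix g (the two example matrices are illustrative):

```python
import numpy as np

def dissimilarity(g):
    """Co-occurrence dissimilarity (equation 16): each entry of the
    normalized co-occurrence matrix g is weighted by |i - j|."""
    i, j = np.indices(g.shape)
    return np.sum(g * np.abs(i - j))

# all mass on the diagonal (identical neighbor pairs): zero dissimilarity
g_diag = np.eye(4) / 4
# all mass on the most distant pair of gray levels: maximal dissimilarity
g_off = np.zeros((4, 4)); g_off[0, 3] = 1.0
```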
F. Homogeneity
This measure acts as a direct measure of the local
homogeneity of a gray level image, which relates
inversely to the image contrast. The higher values of
homogeneity measures indicate less structural variations
and the lower values indicate more structural variations.
A larger value is corresponding to a higher homogeneity
while a smaller value is corresponding to a lower
homogeneity. It is formulated as (17).
Homogeneity = Σ_{i=0}^{M-1} Σ_{j=0}^{N-1} g(i,j) / (1 + (i-j)^2)      (17)
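Equation (17) is the inverse-distance counterpart of dissimilarity; a sketch assuming NumPy and a normalized co-occurrence matrix g (example matrices illustrative), showing the inverse relation to contrast noted above:

```python
import numpy as np

def homogeneity(g):
    """Co-occurrence homogeneity (equation 17): entries of the
    normalized co-occurrence matrix g weighted by 1/(1+(i-j)^2)."""
    i, j = np.indices(g.shape)
    return np.sum(g / (1.0 + (i - j) ** 2))

g_diag = np.eye(4) / 4                        # identical neighbors: maximal homogeneity
g_off = np.zeros((4, 4)); g_off[0, 3] = 1.0   # distant gray levels: low homogeneity
```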
G. Mutual Information
Another metric of the mutual information I(X; Y) can
be applied as well, which is to describe how much
information one variable tells about the other variable.
The relationship is formulated as (18).
I(X;Y) = Σ_{X,Y} p_XY(X, Y) log2 [ p_XY(X, Y) / (p_X(X) p_Y(Y)) ] = H(X) - H(X|Y)      (18)

where H(X) and H(X|Y) are the values of the discrete entropy and conditional entropy; p_XY is the joint probability density function; p_X and p_Y are the marginal
probability density functions. It can be explained as the
information that Y can tell about X is the reduction in
uncertainty of X due to the existence of Y. It can also be
regarded as the relative entropy between the joint
distribution and product distribution.
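Equation (18) can be estimated from joint and marginal histograms of two images. A sketch assuming NumPy; the bin count, test image and function name are illustrative, and the test exercises the identity I(X;X) = H(X) implied by (18):

```python
import numpy as np

def mutual_information(x, y, bins=8):
    """Mutual information I(X;Y) between two images, estimated from
    their joint and marginal histograms (equation 18)."""
    pxy, _, _ = np.histogram2d(x.ravel(), y.ravel(), bins=bins,
                               range=[[0, 1], [0, 1]])
    pxy /= pxy.sum()
    px = pxy.sum(axis=1)              # marginal of X (row sums)
    py = pxy.sum(axis=0)              # marginal of Y (column sums)
    nz = pxy > 0                      # skip zero-probability cells
    return np.sum(pxy[nz] * np.log2(pxy[nz] / (px[:, None] * py[None, :])[nz]))

rng = np.random.default_rng(3)
img = rng.random((32, 32))
# an image shares maximal information with itself: I(X;X) = H(X)
```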
IV. NUMERICAL SIMULATIONS
Using the quantitative metrics defined above, a
comparative study on the mismatch between the source
and reconstructed images is made, using three schemes
of discrete wavelet transform, discrete Fourier transform
and optimal principal component analysis. The detail
results are plotted in Fig. 4. Graphically for all three
selected pictures of the Pyramid, the Columns and the
Pagoda, it is hard to tell exactly whether or not one
scheme is superior to another, since only very little
difference on each quantitative metric can be observed.
The reason is that three schemes of DFT, DWT and PCA
are all fairly effective in digital image compression and
reconstruction. Thus, the retrieval images just depict little
difference away from the source images.
Figure 4. Metrics in Image Compression and Reconstruction. Three panels plot the quantitative metrics (discrete entropy, contrast, correlation, dissimilarity and homogeneity) against case number for the Great Sphinx, Capitol Columns and Great Goose Pagoda images.

Accordingly, more detailed simulation data analysis is
necessary. The information metrics obtained from image
processing of the Pyramid, Columns and Pagoda pictures
are listed from top to bottom in Table 1. For all three
cases, the source image contains the greater amount of
discrete entropy, contrast and dissimilarity than the three
retrieved images, but the smaller amount of the discrete
energy, correlation and homogeneity than the three
retrieved images. Because there is an information loss
occurred across image compression and reconstruction
process, less information content has been covered,
which gives rise to the smaller discrete entropy. The
discrete energy levels differ insignificantly due to
magnitudes of the discrete energy levels themselves,
however, it can still be observed that the source images
contain more information from the smallest discrete
energy values. The source image has the higher contrast
and dissimilarity than three reconstructed images, which
means more color distributions and tone differences
between elements. At the same time, it is also associated
with the bigger distance of dissimilarity. On the other
hand, actual outcomes of correlation and homogeneity
show that three reconstructed images depict less amount
of the structural variations than the source image.
Regarding three approaches, the outcome from the
PCA optimal scheme shows better match to the source
image than those from two other schemes of DFT and
DWT, in terms of all six metrics defined above. The
greater discrete entropy and smaller discrete energy
indicate that the reconstructed images via PCA represent
more intrinsic information and better information match
than reconstructed images via DFT and DWT. Relatively
higher contrast and dissimilarity are corresponding to
more color distributions and tone differences via PCA.
Relatively lower correlation and homogeneity are
corresponding to more structural variations via PCA.
Moreover, mutual information between the source image
and PCA reconstructed image is less than those from two
other cases of DFT and DWT, thus less extra information
can be discovered from one to the other. It also means
that less mismatch occurs between the source and
reconstructed images via PCA than two other cases via
DFT and DWT. These data also imply that the optimal
PCA scheme can produce the better compression result
and show more intrinsic information.
Between the DWT and DFT approaches, data for all
information metrics are quite similar. It is difficult to draw conclusions where the information is even conflicting. It
shows that the impact of the DWT and DFT compression
schemes vary case by case. No distinctive conclusion can
be achieved between the DWT and DFT.
TABLE 1. METRICS FOR IMAGE COMPRESSION AND RECONSTRUCTION

Pyramid
Metrics              Source    2DPCA     Wavelet   Fourier
                     (Case 1)  (Case 2)  (Case 3)  (Case 4)
Discrete Entropy     6.9415    6.9301    6.9263    6.9285
Discrete Energy      0.0096    0.0097    0.0097    0.0097
Contrast             0.1657    0.1505    0.1302    0.1340
Correlation          0.9009    0.9071    0.9205    0.9185
Dissimilarity        0.1506    0.1352    0.1184    0.1219
Homogeneity          0.9262    0.9290    0.9420    0.9402
Mutual Information   -         0.0114    0.0152    0.0130

NC Column
Metrics              Source    2DPCA     Wavelet   Fourier
                     (Case 1)  (Case 2)  (Case 3)  (Case 4)
Discrete Entropy     7.2803    7.2756    7.2642    7.2676
Discrete Energy      0.0076    0.0076    0.0077    0.0077
Contrast             0.2939    0.2780    0.2644    0.2676
Correlation          0.9032    0.9082    0.9117    0.9109
Dissimilarity        0.2095    0.2019    0.1852    0.1881
Homogeneity          0.9033    0.9064    0.9150    0.9136
Mutual Information   -         0.0047    0.0161    0.0127

Pagoda
Metrics              Source    2DPCA     Wavelet   Fourier
                     (Case 1)  (Case 2)  (Case 3)  (Case 4)
Discrete Entropy     6.9771    6.9634    6.9285    6.9059
Discrete Energy      0.0130    0.0137    0.0144    0.0145
Contrast             0.6620    0.6397    0.6178    0.6203
Correlation          0.8230    0.8249    0.8393    0.8958
Dissimilarity        0.4215    0.4164    0.3553    0.3349
Homogeneity          0.8106    0.8132    0.8467    0.8909
Mutual Information   -         0.0077    0.0348    0.0485
V. CONCLUSIONS
Image compression focuses on reducing the number of
the bits needed for image representation and storing
information with suitable quality. It aims to keep the size of an image file as small as possible for data storage.
Taking into account the issues on potential compression
ratios and bit per pixel ratios, lossy compression schemes
are investigated rather than the lossless ones, where
discrete wavelet transform, discrete Fourier transform
and optimal principal component analysis have all been
employed to decompose, compress, and reconstruct
typical grayscale images which symbolize ancient and
modern civilization. From visual appealing, all three
schemes will generate very similar results to source
images. It is hard to differentiate merits and drawbacks of
diverse approaches. To objectively measure the impact of
image compression and reconstruction using three
schemes, the information metrics of discrete entropy,
discrete energy, contrast, correlation, dissimilarity and
homogeneity as well as mutual information have been
introduced to evaluate various approaches. It is then
concluded that the optimal PCA scheme can produce
relatively better results in digital image compression and
reconstruction than two other schemes.
REFERENCES
[1] R. Gonzalez and R. Woods, Digital Image Processing,
3rd Edition, Prentice-Hall, 2007
[2] S. Mitra, "Digital Signal Processing, A Computer Based
Approach", 3rd Edition, McGraw-Hill, 2005
[3] R. Duda, P. Hart, D. Stork, Pattern Classification, 2nd
Edition, John Wiley and Sons, 2000
[4] Simon Haykin, Neural Networks A Comprehensive
Foundation, 2nd Edition, Prentice Hall, 1999
[5] David MacKay, "Information Theory, Inference and
Learning Algorithms", Cambridge University Press, 2003
[6] Z. Ye, H. Cao, S. Iyengar and H. Mohamadian, "Medical
and Biometric System Identification for Pattern
Recognition and Data Fusion with Quantitative
Measuring", Systems Engineering Approach to Medical
Automation, Chapter Six, pp. 91-112, Artech House
Publishers, ISBN978-1-59693-164-0, October, 2008
[7] Z. Ye, H. Mohamadian, Y. Ye, "Information Measures for
Biometric Identification via 2D Discrete Wavelet
Transform", Proceedings of 2007 IEEE International
Conference on Automation Science and Engineering,
pp.835-840, September 22-25, 2007, Scottsdale, USA
[8] Z. Ye and G. Auner, Principal Component Analysis for
Biomedical Sample Identification, Proceedings of the
2004 IEEE International Conference on Systems, Man and
Cybernetics (SMC 2004), pp. 1348-1353, Oct. 10-13,
2004, Hague, Netherlands
[9] Z. Ye, H. Mohamadian, Y. Ye, "Quantitative Study of
Information Loss in Digital Image Compression and
Reconstruction", Proceedings of the 2011 International
Symposium on Data Storage and Data Engineering
(DSDE), May 13-15, 2011, Xi'An, China
[10] Y. Zhang and D. Adjeroh, "Prediction by Partial
Approximate Matching for Lossless Image Compression",
IEEE Transactions on Image Processing, pp. 924-935,
Vol. 17, NO. 6, JUNE 2008
[11] J. Jeng, C. Tseng, and J. Hsieh, "Study on Huber Fractal
Image Compression", IEEE Transactions on Image
Processing, pp. 995-1003, Vol. 18, NO. 5, May 2009
[12] H. Ates, and M. Orchard, "Spherical Coding Algorithm for
Wavelet Image Compression", IEEE Transactions on
Image Processing, pp. 1015-1024, Vol. 18, NO. 5, May
2009
[13] L. Ma and K. Khorasani, "Application of Adaptive
Constructive Neural Networks to Image Compression",
IEEE Transactions on Neural Networks, pp. 1112 -1126,
Vol. 13, NO. 5, Sept. 2002
[14] A. Shoa and S. Shirani, "Optimized Atom Position and
Coefficient Coding for Matching Pursuit-Based Image
Compression", IEEE Transactions on Image Processing,
Vol. 18, No. 12, pp. 2686-2694, December 2009
[15] X. Wu, X. Zhang, and X. Wang, "Low Bit-Rate Image
Compression via Adaptive Down-Sampling and
Constrained Least Squares Upconversion", IEEE
Transactions on Image Processing, pp. 552-561, Vol. 18,
NO. 3, March 2009
[16] E. Filho, E. Silva, M. Carvalho, and F. Pinag, "Universal
Image Compression Using Multiscale Recurrent Patterns
With Adaptive Probability Model", IEEE Transactions on
Image Processing, pp. 512-526, Vol. 17, No. 4, April 2008
[17] B. Li, R. Yang, and H. Jiang, "Remote-Sensing Image
Compression Using Two-Dimensional Oriented Wavelet
Transform", IEEE Transactions on Geoscience and
Remote Sensing, pp. 1-15, 2011

[18] J. Lin, M. Smith, "New Perspectives and Improvements on
the Symmetric Extension Filter Bank for Subband Wavelet
Image Compression", IEEE Transactions on Image
Processing, pp. 177-189, Vol. 17, NO. 2, Feb. 2008
[19] S. Lo, H. Li, and M. Freedman, "Optimization of Wavelet
Decomposition for Image Compression & Feature
Preservation", IEEE Transactions on Medical Imaging, pp.
1141-1149, Vol. 22, No. 9, Sept 2003
[20] Y. Chang, C. Han, et al., "Greedy Modular Eigenspaces
and Positive Boolean function for Supervised
Hyperspectral Image Classification", Optical Engineering
42(09), pp.2576-2587, September 2003
[21] C. Yang, "A Fuzzy-Statistics-Based Principal Component
Analysis Method for Multispectral Image Enhancement
and Display", IEEE Transactions on Geoscience and
Remote Sensing, pp. 3937-3947, Vol. 46, NO. 11,
November 2008
[22] Z. Ye, Y. Ye, H. Mohamadian, "Biometric Identification
via PCA and ICA Based Pattern Recognition",
Proceedings of the 2007 IEEE International Conference on
Control and Automation (ICCA 2007), pp. 1600-1604,
May 30-June 1, 2007, Guangzhou, China
[23] Z. Ye and R. Turner, "Intelligent Linear and Nonlinear
Analysis for Biometric Fingerprint Recognition",
Proceedings of the 39th Southeastern Symposium on
System Theory, pp. 315-319, March 4-6, 2007, Macon,
Georgia, USA










Zhengmao Ye received the B.E. degree from Tianjin University and the first M.S. degree from Tsinghua University, China, and the second M.S. and Ph.D. degrees in electrical engineering from Wayne State University, USA. Currently he serves as an Associate Professor in the Department of Electrical Engineering, Southern University at Baton Rouge, USA. He is a Senior Member of IEEE and the Founder and Director of the Systems and Control Lab at Southern University.

Dr. Ye's research interests include modeling, control and optimization across a broad spectrum of diverse applications on electrical, mechanical, automotive and biomedical systems, as well as signal processing and image processing. Dr. Ye is the first multi-disciplinary researcher internationally with first-author publications covering all the leading control proceedings of the three most prestigious engineering societies (IEEE, ASME, SAE), specifically IEEE (CDC, CCA, SMC, ACC, ISIC, FUZZ, IJCNN, CEC, CASE, ICCA, SOSE, WCCI Congress, MSC Congress), the ASME World Congress and the SAE World Congress. Dr. Ye also has sole authorships in IEEE Transactions and SAE Transactions. He has been an academic reviewer for over 150 articles submitted to the IEEE, ASME, SAE and various international journals. Dr. Ye is the recipient of the Chinese National Fellowship (First Prize) at Tianjin University, the USA Allied Signal Fellowship (First Prize) at Tsinghua University, and the Most Outstanding Faculty award of the Electrical Engineering Department at Southern University. He was selected for inclusion in Marquis Who's Who in 2008.
492 JOURNAL OF MULTIMEDIA, VOL. 6, NO. 6, DECEMBER 2011
2011 ACADEMY PUBLISHER


















Yongmao Ye graduated from School of
Electronic and Information Engineering
at Tianjin University, China. He currently
serves as Chief Engineer of Multimedia
Resource Administration in Technology
Center, Liaoning Radio and Television
Station, China. His professional expertise
and research interests include High
Definition Television (HDTV), Digital
Television (DTV), Advanced Television
(ATV), and Multimedia Technology, as
well as High Performance Signal and
Image Processing. He has more than
twenty refereed Journal and International
Conference publications within these
academic fields.
Habib Mohamadian, ASME Fellow,
received the B.S. degree from the
University of Texas at Austin, and the
M.S. and Ph.D. degrees from the College
of Engineering at Louisiana State
University. Dr. Mohamadian serves
as the Professor and Dean, College of
Engineering, Southern University at
Baton Rouge, Louisiana, USA. The Dean
oversees the College's strategic planning,
program development, academic affairs,
government and industry relations, and
research initiatives. His research interests
include various aspects of engineering
education and practice.

An Improved Fast SPIHT Image Compression
Algorithm for Aerial Applications

Ning Zhang
Graduate School of Chinese Academy of Sciences, Beijing, China
Institute of Optics, Fine Mechanics and Physics Chinese Academy of Sciences, ChangChun, China
Email: scorode@163.com

Longxu Jin
Institute of Optics, Fine Mechanics and Physics Chinese Academy of Sciences, ChangChun, China
Email: jinlx@ciomp.ac.cn

Yinhua Wu
Graduate School of Chinese Academy of Sciences, Beijing, China
Institute of Optics, Fine Mechanics and Physics Chinese Academy of Sciences, ChangChun, China
Email: yhwcn@msn.com

Ke Zhang
Institute of Optics, Fine Mechanics and Physics Chinese Academy of Sciences, ChangChun, China
Email: ke_ogg@163.com



Abstract: In this paper, an improved fast SPIHT algorithm is presented. SPIHT and NLS (Not List SPIHT) are efficient compression algorithms, but their application in aviation areas is limited by poor error resistance and slow compression speed. In this paper, both the error resilience and the compression speed are improved. The remote sensing images are decomposed by the Le Gall 5/3 wavelet, and the wavelet coefficients are indexed, scanned and allocated by means of family blocks. The bit-plane significance is predicted by bitwise OR, so that N bit-planes can be encoded at the same time. Compared with the SPIHT algorithm, the improved algorithm is easily implemented in hardware, and the compression speed is improved. The PSNR of reconstructed images encoded by fast SPIHT is 0.3 to 0.9 dB higher than that of SPIHT and CCSDS, and the encoding is 4-6 times faster than the SPIHT encoding process. The algorithm meets the high-speed and reliability requirements of aerial applications.

Index Terms: image compression, fast SPIHT coding, error resistance, bit-plane parallel
I. INTRODUCTION
With the development of high-resolution remote sensing technology at home and abroad, the space camera sampling rate and the remote sensing image resolution are becoming higher and higher. Therefore, an algorithm with high error resistance and high compression speed has become the research focus in the field of remote sensing image compression when the communication channel is limited. There are some compression chips on the market, such as ADI's ADV202 and ADV212, but their largest input rate is only 65 Mpixel/s, and the supported pixel depth can't meet all requirements. Therefore, to meet the engineering requirements, a reliable and efficient compression system must be studied.
In 1996, the SPIHT (set partitioning in hierarchical trees) algorithm [1] was proposed by Said and Pearlman. It adopts a spatial orientation tree structure and can effectively extract the significant coefficients in the wavelet domain. SPIHT offers a less flexible bit stream than JPEG2000, but it has relatively low structural and algorithmic complexity, supports multiple rates, and yields a high signal-to-noise ratio (SNR) and good image restoration quality, so it is suitable for encoding occasions with high real-time requirements. Wavelet domain coefficients are scanned through three lists in SPIHT, named the list of insignificant pixels (LIP), the list of significant pixels (LSP) and the list of insignificant pixel sets (LIS). Each scan runs from the highest bit-plane to the lowest bit-plane. The encoding speed is limited by the repetitive scans and the dynamic updates of the three lists, and this isn't conducive to hardware implementation.
Subsequently, F. W. Wheeler and W. A. Pearlman proposed a variant of the original SPIHT algorithm that can be implemented in hardware, named NLS (Not List SPIHT) [2]. The scanning process, the sets, and the compression ratio of NLS are the same as those of SPIHT. The IP, IS and REF arrays of NLS correspond to the LIP, LIS and LSP lists of SPIHT. NLS solves the problem of hardware implementation, but does not improve the encoding speed.
In aerial space, there are a lot of charged particles and cosmic rays, so the bit stream in storage devices may be disturbed by radiation effects, and errors will then be caused. The output bit streams of SPIHT and NLS follow the order of the scans, so the whole image
doi:10.4304/jmm.6.6.494-501

reconstruction will be influenced when an error is generated.
Based on the above discussion, SPIHT and NLS are not conducive to hardware implementation; in fact, they are not widely used because of their slow scanning speed and poor error resistance. In this paper, the scanning process is simplified, and the wavelet domain coefficients are stored in family blocks. The error resilience and encoding speed are improved, so the compression algorithm can be used in an aerial camera image compression system.
II. THE IMPROVEMENTS AND DEFICIENCIES OF NLS
A. The improvements of NLS
The spatial orientation tree structure is still adopted in NLS. The wavelet domain coefficients are separated into fathers, children, and grandchildren, as shown in Fig. 1. Compared with SPIHT, NLS is improved in terms of hardware implementation, as follows:



Figure 1. Spatial orientation tree structure of wavelet coefficients
The linear index technique is introduced. The linear index uses one number instead of the two-dimensional index. Let R = C = 2^N, where R and C are the numbers of rows and columns of the image, and let (r, c) be the coordinates of a coefficient. r and c can be written as binary numbers:

    r = [r_{N-1}, r_{N-2}, ..., r_1, r_0]
    c = [c_{N-1}, c_{N-2}, ..., c_1, c_0]        (1)

where r_n and c_n (n = 0, 1, ..., N-1) are binary digits. The linear index can be defined as:

    i = [r_{N-1}, c_{N-1}, r_{N-2}, c_{N-2}, ..., r_1, c_1, r_0, c_0]        (2)
The coordinates of the coefficients of an 8×8 image are arranged by the linear index, as shown in Table I.
TABLE I. THE LINEAR INDEX
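The interleaving in (1) and (2) is a Morton (Z-order) index and can be reproduced in a few lines of Python; the sketch below is ours for illustration, not the NLS hardware module:

```python
def linear_index(r, c, n_bits):
    """Interleave the bits of (r, c) as [r_{N-1}, c_{N-1}, ..., r_0, c_0]."""
    i = 0
    for n in range(n_bits):
        i |= ((r >> n) & 1) << (2 * n + 1)  # row bit goes to the higher slot
        i |= ((c >> n) & 1) << (2 * n)      # column bit goes to the lower slot
    return i

# HH1 coefficients of a 512x512 image (N = 9), as cited in Section II-B:
print(linear_index(511, 256, 9))  # 240298
print(linear_index(511, 511, 9))  # 262143
```

The two printed values match the lookup-table entries quoted in Section II-B for the coefficients (511, 256) and (511, 511).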


A state table, Mark, is used to show the state of each wavelet coefficient instead of the three lists of SPIHT. Mark is updated with the results of each scan. Val is used to store the coefficients arranged by the linear index. Two maximum arrays, dmax[i] and gmax[i], are introduced so that the maximum magnitudes of the descendant sub-band coefficients need not be computed repeatedly. They are the maximum magnitude of the descendants (dmax) and the maximum magnitude of the grand-descendants (gmax), and can be computed by (3) and (4), respectively. They are no longer updated during the encoding process.

    gmax(i) = max(dmax(4i), dmax(4i+1), dmax(4i+2), dmax(4i+3))        (3)

    dmax(i) = max(val(4i), val(4i+1), val(4i+2), val(4i+3), gmax(i))        (4)

NLS uses the one-dimensional linear index instead of the two-dimensional index, and adopts a SKIP function to skip unimportant pixel blocks. The state table Mark can be implemented by hardware easily.
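Equations (3) and (4) can be evaluated bottom-up over the 4-ary spatial orientation tree. The following Python sketch is illustrative only (the dictionary layout and the zero default for absent nodes are our assumptions, not the hardware design):

```python
def gmax(i, val, max_index):
    """Eq. (3): max magnitude over the grand-descendants of node i."""
    if 16 * i > max_index:                       # node i has no grandchildren
        return 0
    return max(dmax(4 * i + k, val, max_index) for k in range(4))

def dmax(i, val, max_index):
    """Eq. (4): max magnitude over all descendants of node i."""
    if 4 * i > max_index:                        # node i has no children
        return 0
    children = max(abs(val.get(4 * i + k, 0)) for k in range(4))
    return max(children, gmax(i, val, max_index))

# Toy tree: node 1 has children 4..7 and grandchildren 16..31 (leaf magnitudes).
val = {j: float(j) for j in range(16, 32)}
print(dmax(1, val, 31))  # 31.0 -- the largest grand-descendant dominates
```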
B. Time consumption and error resistance of NLS
The wavelet transform coefficients are rearranged by the linear index module. The addresses of the high-frequency sub-band coefficients are fragmented in the linear index lookup table. A standard 512×512×8-bit Lena image is decomposed by a 3-level DWT, as shown in Fig. 1. In the last row of sub-band HH1, the coefficients (511, 256) and (511, 511) can be written as the binary numbers (111111111, 100000000) and (111111111, 111111111), corresponding to the index values 240298 and 262143 in the linear index lookup table, respectively. The linear index value of each coefficient must be mapped to a storage RAM address by the address mapping module. The lookup table needs a lot of memory space, and the computation becomes more and more complicated as the numbers of rows and columns increase. Whether NLS encodes properly depends on whether the wavelet domain coefficients are arranged properly. Therefore, the linear index process must be simplified.
The scanning processes of IP, IS and REF adopt a serial approach, bit-plane by bit-plane, from MSB to LSB. Processing one bit-plane therefore needs three scans of the image, i.e., 24 scans for an 8-bit-plane image.

The output of the scanning process of each bit-plane includes the information for the decoder and for the scanning process of the next bit-plane. If there are errors in the code stream, the subsequent scanning and decoding processes will be wrong, and disastrous consequences will occur in the image reconstruction process. The time cost of the scanning process is the bottleneck of the time requirement of the whole compression system. Whether the image can be correctly reconstructed depends on whether the output code stream is right. Therefore, the scanning process needs to be improved.
III. IMPROVED INITIALIZATION OF SPIHT
In this paper, the wavelet domain coefficients are arranged by the linear index module, and the dmax and gmax value arrays are computed at the same time. The wavelet domain coefficients are divided into many family blocks. A family is composed of one coefficient of the HL3, HH3 or LH3 sub-bands (a pixel), four coefficients of the HL2, HH2 or LH2 sub-bands (a FourBlock), and sixteen coefficients of the HL1, HH1 or LH1 sub-bands (a SixteenBlock), as shown in Fig. 1. 32×32 families form a family block, which is stored in a RAM.

The wavelet domain coefficients at all levels (except the LL3 sub-band) are stored in the family block RAMs directly by the mapping address module, as shown in Fig. 2. There are 12 family blocks in total. Coefficients in the LL3 sub-band are stored separately. Compared to the whole image, the number of coefficients in each family block is small, so the computational complexity of the linear index process is greatly reduced. All family blocks share the linear index module, which also reduces the memory space requirements. The size of the family block can be adjusted according to the hardware resources, e.g., 16×16, 64×64 and so on. The output code streams are stored in blocks separately. If an error is generated in a scanning process, the wrong codes are constrained to a single block and do not disturb other family blocks; thus the error resistance is improved. Each family block is encoded by the same scanning process, and the encoding module can be easily generated by instantiating the module in Verilog HDL or VHDL. Therefore, the encoding efficiency is improved. The improved initialization process is shown in Fig. 2.
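A quick sanity check of these counts (the per-family size 1 + 4 + 16 = 21 follows from the text; the reading that each family belongs to a single orientation is our interpretation, which is what makes the total come out to 12):

```python
rows = cols = 512
levels = 3

# One family roots at a level-3 high-frequency coefficient, so each of the
# three orientations (HL, LH, HH) holds (512 / 2**3)**2 = 64*64 families.
families_per_orientation = (rows // 2**levels) * (cols // 2**levels)
blocks_per_orientation = families_per_orientation // (32 * 32)  # 32x32 per block
total_blocks = 3 * blocks_per_orientation
print(total_blocks)  # 12

# Each family holds 1 + 4 + 16 = 21 coefficients; together with LL3
# they cover the whole image.
assert 3 * families_per_orientation * 21 + (rows // 2**levels) ** 2 == rows * cols
```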


Figure 2. Improved initialization of SPIHT
IV. PARALLEL SPIHT
SPIHT and NLS adopt a serial approach, from MSB to LSB, because the processing of a bit-plane needs the results of the previous bit-planes. The parallel SPIHT algorithm in this paper can deal with all bit-planes simultaneously. It also adopts three lists (LSP, LIP, LIS) instead of the state table Mark, but the scanning results of all bit-planes can be obtained at the same time. The proposed algorithm predicts each pixel's state by a bitwise OR operation. The bitwise OR results of the first n-1 bits of the val, dmax and gmax values are computed in the scanning process. In addition, three image scans per bit-plane are needed in NLS, while the fast SPIHT handles the three NLS processes in one scan. Therefore, the scanning speed only depends on the image resolution. Bitwise OR is easily implemented in hardware, so it is a real-time process.
Define: PixelOR is the bitwise OR of the first n-1 bits of val for each pixel, and PixelOR is 0 for the first bit-plane. DmaxOR is the bitwise OR of the first n-1 bits of dmax for each FourBlock, and DmaxOR is 0 for the first bit-plane. GmaxOR is the bitwise OR of the first n-1 bits of gmax for each SixteenBlock, and GmaxOR is 0 for the first bit-plane. PixelBit, DmaxBit and GmaxBit are the nth bits of val, dmax and gmax, respectively.

The PixelOR, DmaxOR and GmaxOR values can be used to determine whether the pixel, the FourBlock and the SixteenBlock, respectively, have already become significant. When the value is 1, there is at least one 1 in the first n-1 bit-planes; when it is 0, the first n-1 bits are all 0, and the element is still insignificant.
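The prefix-OR idea can be illustrated in a few lines of Python (the function and its layout are our sketch, not the hardware design; the names PixelOR and PixelBit follow the definitions above):

```python
def pixel_or_flags(val, n_planes=8):
    """For each bit-plane n (MSB first), return (PixelOR, PixelBit), where
    PixelOR is the OR of all bits above plane n and PixelBit is the bit at n."""
    flags = []
    prefix = 0
    for n in range(n_planes - 1, -1, -1):  # scan from MSB to LSB
        bit = (val >> n) & 1
        flags.append((prefix, bit))        # PixelOR is 0 for the first plane
        prefix |= bit
    return flags

# val = 0b00101000: the pixel first becomes significant at plane 5
# (PixelOR == 0, PixelBit == 1) and is refined at plane 3 (PixelOR == 1).
flags = pixel_or_flags(0b00101000)
```

Because every (PixelOR, PixelBit) pair depends only on val itself, all planes can be derived in one pass, which is the basis of the bit-plane parallelism above.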
MGpredict is the bitwise OR of the first n bits of the FourBlocks' dmax in the HL2, HH2 and LH2 sub-bands for each SixteenBlock. MGpredict can be used to determine whether the SixteenBlock has been flagged as MG (a state of the pixel in NLS, like the L-type entry in SPIHT).

Sign is the sign of each pixel. lip, lis and lsp are three output lists, which correspond to the output streams of the IP, IS and REF arrays (the IPP, ISP and RP processes) of NLS, respectively.
Specific description of the parallel SPIHT algorithm:

• The 3rd-level DWT coefficients (LL3, LH3, HL3, HH3):
      if (PixelOR == 1) output PixelBit to the lsp
      else
          output PixelBit to the lip
          if (PixelBit == 1) output sign to the lip

• The 2nd-level DWT coefficients (LH2, HL2, HH2):
      if (DmaxOR == 1)
          for each pixel in the FourBlock
              if (PixelOR == 1) output PixelBit to the lsp
              else
                  output PixelBit to the lip
                  if (PixelBit == 1) output sign to the lip
      else
          output DmaxBit to the lis
          if (DmaxBit == 1)
              for each pixel in the FourBlock
                  output PixelBit to the lis
                  if (PixelBit == 1) output sign to the lis

• The 1st-level DWT coefficients (LH1, HL1, HH1):
      if (MGpredict == 1)
          if (GmaxOR == 1)
              for each FourBlock in the SixteenBlock
                  if (DmaxOR == 1)
                      for each pixel in the FourBlock
                          if (PixelOR == 1) output PixelBit to the lsp
                          else
                              output PixelBit to the lip
                              if (PixelBit == 1) output sign to the lip
                  else
                      output DmaxBit to the lis
                      if (DmaxBit == 1)
                          for each pixel in the FourBlock
                              output PixelBit to the lis
                              if (PixelBit == 1) output sign to the lis
          else
              output GmaxBit to the lis
              if (GmaxBit == 1)
                  for each FourBlock in the SixteenBlock
                      output DmaxBit to the lis
                      if (DmaxBit == 1)
                          for each pixel in the FourBlock
                              output PixelBit to the lis
                              if (PixelBit == 1) output sign to the lis

The flowchart of the parallel SPIHT algorithm is shown in Fig. 3 [3].

V. BIT-RATE CONTROL
Bit-rate control is the process by which bits are allocated to each code block in each sub-band in order to achieve the overall target encoding bit-rate for the whole image, while minimizing the distortion (errors) introduced in the reconstructed image by the quantization and truncation of codes.

The scanning processes of SPIHT and NLS run from the MSB to the LSB, and the stream is truncated at a bit-plane when the communication channel bandwidth is limited. All wavelet domain coefficients are quantized with the same step size, which becomes larger as the bit-rate gets lower, so serious distortion and false contours may be introduced.

In this paper, the entropy of each sub-band family is estimated according to the LOCO-I MED prediction algorithm in [5] while the coefficients are indexed, as shown in Fig. 2. The number of output codes of each sub-band family is allocated according to the compression rate requirements, and the codes are truncated from the high to the low bit-planes to achieve rate control.
Figure 3. The flowchart of the scanning process
The forecast template is shown in Fig. 4, where a is the left neighbor, b the upper neighbor, and c the upper-left neighbor of the current coefficient x:

    c b
    a x

Figure 4. The forecast template and linear index order

The prediction is:

    x̂ = min(a, b),   if c >= max(a, b)
    x̂ = max(a, b),   if c <= min(a, b)
    x̂ = a + b - c,   otherwise        (5)
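Equation (5) is the standard LOCO-I median-edge-detector (MED) predictor; a direct Python transcription:

```python
def med_predict(a, b, c):
    """LOCO-I MED predictor: a = left, b = above, c = above-left (Eq. 5)."""
    if c >= max(a, b):
        return min(a, b)   # horizontal or vertical edge above/left of x
    if c <= min(a, b):
        return max(a, b)
    return a + b - c       # smooth region: planar prediction
```

For example, med_predict(10, 20, 25) returns 10 and med_predict(10, 20, 5) returns 20, switching between the two neighbors according to the detected edge.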
The forecast expression is given in (5). A coefficient at the boundary is predicted by the preceding one. A FamilyBlock is composed of three SubBands, and H_s is the prediction entropy of a SubBand:

    H_s = Σ_i |x_i - x̂_i|        (6)

where i ∈ SubBand and s = 1, 2, 3; x_i and x̂_i are the real value and the predicted value, respectively. Similarly, H_f = Σ_s H_s is the prediction entropy of a FamilyBlock, and H_LL3 is the prediction entropy of the LL3 sub-band.

The prediction entropy of the whole set of coefficients can be defined as follows:

    H = H_LL3 + Σ_f H_f        (7)

where f indexes the FamilyBlocks, f = 1, 2, ..., 12. W_f is the pre-allocated weight of each FamilyBlock, defined as W_f = H_f / H, and W_fs is the pre-allocated weight of the s-th SubBand of the f-th FamilyBlock, defined as:

    W_fs = H_fs / H        (8)

where H_fs is the prediction entropy of the s-th SubBand of the f-th FamilyBlock. Similarly, W_LL3 is the pre-allocated weight of the LL3 sub-band, defined as:

    W_LL3 = H_LL3 / H        (9)

The sum of all the weights is equal to 1, i.e., W_LL3 + Σ_f W_f = 1.

Let Rate be the compressed bit-rate and R × C the size of the image. Then the pre-allocated code streams of each SubBand can be defined as:

    Bit_fs  = Rate × R × C × W_fs
    Bit_LL3 = Rate × R × C × W_LL3        (10)

The bit-rate of each SubBand is controlled by (10), according to the channel bandwidth.
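Equations (7)-(10) amount to normalizing the per-band prediction entropies into weights and scaling them by the target bit budget. A sketch with made-up entropy values (the function name and dictionary layout are ours):

```python
def allocate_bits(h_ll3, h_subbands, rate, rows, cols):
    """h_subbands maps (family_block, sub_band) -> prediction entropy H_fs.
    Returns the bit budget for LL3 and for every sub-band, per Eq. (10)."""
    total = h_ll3 + sum(h_subbands.values())              # Eq. (7)
    budget = rate * rows * cols                           # Rate * R * C
    bits_ll3 = budget * h_ll3 / total                     # Eqs. (9), (10)
    bits = {k: budget * h / total for k, h in h_subbands.items()}  # Eqs. (8), (10)
    return bits_ll3, bits

# Toy example: one family block with three equally busy sub-bands.
b_ll3, b = allocate_bits(100.0, {(0, s): 50.0 for s in (1, 2, 3)},
                         rate=0.5, rows=512, cols=512)
```

Since the weights sum to 1, the allocations always sum back to the overall budget Rate × R × C, whatever the entropy values are.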
VI. EXPERIMENTAL RESULTS
The standard images Lena, Goldhill, Aerial, and Fukushima nuclear plant (before and after the disaster) are chosen as experimental images. The size of the images is 512×512×8 bits. The images are decomposed by a 3-level DWT in MATLAB R2008. The computer CPU is an Intel Pentium Dual E2160 at 1.8 GHz, and the memory size is 1 GB. The PSNR of the Goldhill reconstruction compressed by fast SPIHT, SPIHT, NLS and CCSDS is shown in Fig. 5, and the PSNR of the reconstructed images compressed by fast SPIHT at different bit-rates is shown in Fig. 6.

Fast SPIHT processes 8 bit-planes at the same time, each producing three lists, so the code stream must be reordered. The output streams are adjusted by the ordering process in accordance with the sequence of SPIHT, so the decompression process is the same as that of SPIHT, and the compression ratio can be controlled. The time costs of SPIHT encoding at different bit-rates are shown in Fig. 7, and the time cost and PSNR of fast SPIHT encoding at a bit-rate of 1 bpp are shown in Table II.
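For reference, the PSNR figures in this section follow the usual definition for 8-bit images; a minimal Python version (ours, for illustration):

```python
import math

def psnr(original, reconstructed, peak=255):
    """PSNR in dB between two equally sized pixel sequences."""
    mse = sum((o - r) ** 2 for o, r in zip(original, reconstructed)) / len(original)
    return float('inf') if mse == 0 else 10 * math.log10(peak ** 2 / mse)
```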

Figure 5. PSNR of the Goldhill reconstruction compressed by different methods

Figure 6. PSNR of reconstructed images compressed by fast SPIHT at different bit-rates

Figure 7. Time cost of SPIHT encoding at different bit-rates



TABLE II. RESULTS OF THE FAST SPIHT ALGORITHM



Goldhill reconstruction image, PSNR = 32.83 dB

Aerial reconstruction image, PSNR = 28.65 dB

Fukushima nuclear plant (before disaster) reconstruction image, PSNR = 30.59 dB

Fukushima nuclear plant (after disaster) reconstruction image, PSNR = 31.84 dB

Figure 8. Reconstructed images compressed by fast SPIHT, bit-rate = 0.5 bpp
The Goldhill, Aerial and Fukushima nuclear plant (before and after disaster) reconstructed images at a bit-rate of 0.5 bpp are displayed in Fig. 8. By comparing the images of the nuclear plant before and after the disaster, the damage to the plant can be seen clearly. The image of the nuclear plant after the disaster is smoother than the image before, so its PSNR is higher at the same bit-rate.

The experimental results show that the PSNR of the reconstructed images encoded by fast SPIHT is increased by 0.3 to 0.9 dB, and the encoding time is only 1/4 to 1/6 of that of the SPIHT encoding process. Implemented in hardware, the

speed can be further improved by virtue of parallelism and pipelining.
VII. CONCLUSIONS
In this paper, the error resilience and compression speed are improved. Compression algorithms based on set partitioning in hierarchical trees scan the coefficients serially, so the encoding speed is limited by the repeated scans. A new fast SPIHT algorithm is proposed, which can deal with all bit-planes simultaneously, so that the speed only depends on the image resolution. The coefficients are divided into many family blocks and stored in separate block RAMs. The algorithm is suitable for a fast, simple hardware implementation, and can be used in aerial image compression systems, which require high speed and high error resilience.
ACKNOWLEDGMENT
I wish to thank my advisor, Dr. Longxu Jin, for his guidance and support. Furthermore, the project has been accomplished with the help of engineer Ke Zhang and Dr. Yinhua Wu. Last, I am also deeply thankful to my family; they always give me a lot of encouragement in my research and life.
REFERENCES
[1] A. Said and W. A. Pearlman, "A new fast and efficient image codec based on set partitioning in hierarchical trees," IEEE Transactions on Circuits and Systems for Video Technology, 1996, 6(6): 243-250.
[2] F. W. Wheeler and W. A. Pearlman, "SPIHT image compression without lists," IEEE Int. Conf. on Acoustics, Speech and Signal Processing (ICASSP 2000), Istanbul: IEEE, 2000, pp. 2047-2050.
[3] J. M. Shapiro, "Embedded image coding using zerotrees of wavelet coefficients," IEEE Trans. on Signal Processing, vol. 41, pp. 3445-3462, December 1993.
[4] Yong Xu, Zhi yong Xu, Qi heng Zhang, Ru jin Zhao.
Low complexity image compression scheme for hardware
implementation, Optics and Precision Engineering.
Precision Eng., 2009, 17(9):2262-2268.
[5] J.H.Zhao, W.J.Sun, Z.Meng, Z.H.Hao. Wavelet transform
characteristics and compression coding of remote sensing
images, Optics and Precision Engineering, Vol.12(2),
pp.205-210, 2004.
[6] H.L.Xu, S.H.Zhong, Image Compression Algorithm of
SPIHT Based on Block-Tree, Journal of Human Institute
of Engineering,Vol.19(1),pp.58-61,2009.
[7] Feng Zhao, Dong feng Yuan, Hai xia Zhang, Ting fa Xu.
Multi-DSP real-time parallel processing system for image
compression, Optics and Precision Engineering.
Precision Eng., 2007, 15(9):1451-1455.
[8] Xu cheng Xue, Shu yan Zhang, Yong fei Guo.
Implementation of the No Lists SPIHT Image
Compression Algorithm Using FPGA, Micro Computer
Information, vol.24(62),pp.219-220,2008.
[9] Sen Ma, Yuanyuan Shang, Weigong Zhang, Yong Guan, Qian Song, Dawei Xu, "Design of Panoramic Mosaic Camera Based on FPGA Using Optimal Mosaic Algorithm," JOURNAL OF COMPUTERS, vol. 6(7), pp. 1378-1385, 2011.
[10] Zhaohui Zeng, Yajun Liu, "Construction of High Performance Balanced Symmetric Multifilter Banks and Application in Image Processing," JOURNAL OF COMPUTERS, vol. 5(7), pp. 1038-1045, 2010.
[11] Yinhua Wu, Longxu Jin, Hongjiang Tao, "An improved fast parallel SPIHT algorithm and its FPGA implementation," The 2010 2nd International Conference on Future Computer and Communication, 2010, vol. 1, pp. 191-195.
[12] Nileshsingh V. Thaker, Dr. O. G. Kakde, "Color Image Compression with Modified Fractal Coding on Spiral Architecture," JOURNAL OF MULTIMEDIA, vol. 2(4), pp. 55-66, 2007.
[13] Zhenbing Liu, Jianguo Liu, Guoyou Wang, "An Arbitrary-length and Multiplierless DCT Algorithm and Systolic Implementation," JOURNAL OF COMPUTERS, vol. 5(5), pp. 725-732, 2010.
[14] Hualiang Zhu, Chundi Xiu,Dongkai Yang, An Improved
SPIHT Algorithm Based on Wavelet Coefficient Blocks
for Image Coding, ICCASM2010, vol.2, PP.646-649,2010.
[15] Li Wern Chew, Li Minn Ang, Kah Phooi Seng, Reduced
Memory SPIHT Coding Using Wavelet Transform with
Post-Processing, IHMSC2009, vol.1, PP.371-374,2009.
[16] Kong Fan-qiang, Li Yun-song, Wang Ke-yan, Zhuang
Huai-yu. An Adaptive Rate Control Algorithm for
JPEG2000 Based on Rate Pre-allocation, Journal of
Electronics & Information Technology, 2009,31(1):66-70.
[17] Du Lie-bo, Xiao Xue-min, Luo Wu-Sheng, Lu Hai-bao.
Quantification removing for satellite on-board remote
image JPEG2000 compression algorithm, Optics and
Precision Engineering. Precision Eng., 2009, 17(3):690-
694.
[18] Wang Ren-long, Hao Yan-ling, Liu Ying. Embedded
block wavelet coding method based on block bit-length,
Optics and Precision Engineering. Precision Eng., 2009,
16(7):1315-1322.
[19] Tian Bao-feng, Xu SHu-yan, Sun Rong-chun, Wang Xin,
Yan De-jie. A lossy compression algorithm of remote
sensing image suited to space-borne application, Optics
and Precision Engineering. Precision Eng., 2006,
14(4):725-730.
[20] Lei Jie, Kong Fan-qiang, Wu Cheng-ke, Li Yun-song.
Hardware oriented rate control algorithm for JPEG2000
and its VLSI architecture design, Journal of xidian
university.2008, 35(4):645-649.
[21] Tinku Acharya, Ping-Sing Tsai, JPEG2000 Standard for Image Compression, A John Wiley & Sons, Inc., publication, 2005.
[22] Zhang Xue-quan, Gu Xiao-dong, Sun Hui-xian. Design
and implementation of CCSDS-based onboard image
compression unit using FPGA, Semiconductor
optoelectronics. 2009,30(6):935-939.
[23] Limin Ren, "Web image retrieval in web pages," The 2010 2nd International Conference on Future Computer and Communication, 2010, vol. 1, pp. 26-31.
[24] T. B. Ma, "The research of remote sensing image compression based on the wavelet transform of SPIHT algorithm," Master Dissertation, 2008.
[25] Zhang Su-wen, Wang Li-li, Miao Dan-dan, "An improved embedded zerotree wavelets image coding algorithm," Infrared Technology, 2008, 30(9): 541-545.



Ning Zhang is a doctoral candidate at the Graduate School of the Chinese Academy of Sciences, China. He was born in Shandong province of China in 1982. He received his B.Eng. degree in Communication Engineering from Jilin University, China, in 2006. He has been working and studying at the Institute of Optics, Fine Mechanics and Physics, Chinese Academy of Sciences, Changchun, China, since 2006. His research interests include image processing and hardware system design.


Longxu Jin, PhD, is a researcher. He was born in Jilin province of China in 1965. He received his B.Eng. degree in Machinery & Electronics Engineering from the Optics and Mechanics College of Changchun, China, in 1987, and his M.Eng. and PhD degrees in Electronic Engineering from the Institute of Optics, Fine Mechanics and Physics, Chinese Academy of Sciences, Changchun, China, in 1993 and 2003 respectively. He has been working as a researcher in the department of Space Optics of the Institute of Optics, Fine Mechanics and Physics, Chinese Academy of Sciences, Changchun, China, since 1993. His current research interests include digital image acquisition and processing, and space camera intelligent control.

Yinhua Wu is a doctoral candidate at the Graduate School of the Chinese Academy of Sciences, China. She was born in Jilin province of China in 1984. She received her B.Eng. degree in Electronic Engineering from Jilin University, China, in 2007. She has been working and studying at the Institute of Optics, Fine Mechanics and Physics, Chinese Academy of Sciences, Changchun, China, since 2007. Her research interests include video compression and transmission, and FPGA system design.


Ke Zhang was born in Shandong province of China in 1979. He received his B.Eng. degree in Electronic Engineering from Harbin Engineering University, China, in 2003, and his M.Eng. degree from Tianjin Polytechnic University, China, in 2006. He has been working at the Institute of Optics, Fine Mechanics and Physics, Chinese Academy of Sciences, Changchun, China, since 2006. Currently he is a study engineer, and his research interests include space camera control and image acquisition.




3D Tracking and Positioning of Surgical
Instruments in Virtual Surgery Simulation

Zhaoliang Duan
School of Computer, Wuhan University, Wuhan 430072, China
Email: dzlwhu@gmail.com

Zhiyong Yuan, Xiangyun Liao, Weixin Si, Jianhui Zhao
School of Computer, Wuhan University, Wuhan 430072, China
Email: zhiyongyuan@whu.edu.cn, xyunliao@gmail.com, wxsics@gmail.com, jianhuizhao@whu.edu.cn



Abstract: 3D tracking and positioning of surgical instruments is an indispensable part of a virtual surgery training system, because it is the unique interface for the trainee to communicate with the virtual environment. A 3D tracking and positioning method for surgical instruments based on stereoscopic vision is proposed. It can capture the spatial movements of a simulated surgical instrument in real time, and provides 6-degree-of-freedom information with an absolute error of less than 1 mm. The experimental results show that the 3D tracking and positioning of surgical instruments is highly accurate, easily operated, and inexpensive. Combined with a force sensor and an embedded acquisition device, this 3D tracking and positioning method can be used as a measurement platform of physical parameters to realize the measurement of soft tissue parameters.

Index Terms: Virtual surgery simulation system, stereoscopic vision, surgical instruments, 3D tracking and positioning
I. INTRODUCTION
With advances in robotics and information technology,
computer graphics (CG) and virtual reality (VR) have
been increasingly applied to the field of medicine [1]. As
the cutting-edge interdisciplinary research field of
information and medical sciences, research on virtual
surgery simulation system has significant application
value for reducing surgery risks, cutting training cost and
protecting human health [2]. With the help of a virtual surgery training platform, trainee surgeons can skillfully master the operation of surgical instruments, the general procedure of surgery, and the anatomy of the diseased region or organ.
The accurate displacement and force response of the tissue model is a key part of a VR-based surgery training simulation system [3]. To meet this requirement, we must model the non-linear heterogeneous nature of soft tissue. The interaction of the surgical instrument and the virtual scene decides the accuracy and effectiveness of displacement measurement and force response.
In virtual surgery simulation, the interaction of the surgical instrument and the virtual scene mainly involves construction of the surgical instrument, collision detection, and rendering of the interaction and simulation [4]. To simulate the interaction between the surgical instrument and the virtual organ tissue vividly in virtual surgery simulation, the surgical instrument must be tracked and located accurately in real time.
Currently, several three-dimensional (3D) trackers are available in the field of virtual reality. According to their physical properties, they are roughly classified into five subcategories: mechanical trackers [5], magnetic trackers [6-8], ultrasonic trackers [9], optical trackers [10-11] and hybrid trackers [12]. Some of them can provide high positioning accuracy, such as [6] and [10], and have been used in some medical applications. However, these existing devices are very expensive, and therefore can only be popularized in a limited number of medical centers and research institutes. A 3D surgical instrument tracking and positioning method with a high performance-price ratio has been highly desirable for computer-based virtual surgery simulation systems [13].
To achieve this goal, we present a method for 3D tracking and positioning of surgical instruments based on stereoscopic vision. This method employs three cameras to capture the motion images of a simulated surgical instrument in real time. After a series of computer processing steps, including camera calibration and reconstruction of the 3D coordinates of the markers on the simulated surgical instrument, we can obtain the six-degree-of-freedom information of the simulated surgical instrument, thereby positioning the instrument. At the end of this paper, we apply the presented method to accomplish interactive virtual organ tissue deformation simulation. The experimental results show that it is feasible and effective in virtual surgery simulation systems.
The rest of the paper is organized as follows. Section 2
describes the methodology for tracking and positioning

Manuscript received July 1, 2011; revised September 13, 2011;
accepted October 1, 2011.
Project fully supported by a grant from the National Natural Science
Foundation of China (Grant No. 61070079).
Corresponding author, Zhiyong Yuan, Email addresses:
zhiyongyuan@whu.edu.cn
doi:10.4304/jmm.6.6.502-509
surgical instruments. Section 3 gives the implementation of the presented method in detail. Section 4 provides experimental results, and section 5 concludes this paper.
II. METHODOLOGY FOR 3D TRACKING AND POSITIONING
A. Binocular Vision System
As an ideal linear camera model, the pinhole model is the basic imaging model in computer vision. Fig. 1 illustrates the abstract graph of the pinhole model, where plane C is the imaging plane and point O is the camera optical center. The coordinate system O_1UV in plane C is the image coordinate system, and point O_1 is the projection of point O onto plane C. Axes X_c and Y_c are respectively parallel to axes u and v of the image coordinate system, and axis Z_c is the camera optical axis, perpendicular to plane C. The camera coordinate system consists of point O and axes X_c, Y_c, Z_c; OO_1 is the camera focal length f. To describe the position of the camera and its surroundings, we adopt a reference coordinate system O_w X_w Y_w Z_w, which is called the world coordinate system.

For a point P in the world coordinate system, we can obtain its approximate imaging position p in an image, that is, the intersection point of line PO and plane C.
The binocular vision system is the simplest stereoscopic vision system, consisting of two cameras. As shown in Fig. 2, P(x_w, y_w, z_w) is a point in the world coordinate system, and its two image points in the image planes of the two cameras are p_l(u_l, v_l) and p_r(u_r, v_r), called conjugate points. The line through optical center O_l and point p_l intersects the line through optical center O_r and point p_r, and this intersection point is P. The 3D coordinates of point P in the world coordinate system can then be obtained from the internal and external parameters of the two cameras.
Generally, a binocular vision system consists of five modules: image acquisition, camera calibration, feature extraction, stereo matching and 3D reconstruction [14].

B. Basic Principle
Our presented method utilizes cameras to recover the 3D coordinates of two markers on the simulated surgical instrument. To this end, each marker must be covered by at least two cameras. If we use two cameras to track the simulated surgical instrument, we can detect four feature points (the image regions corresponding to the markers within the motion images of the simulated surgical instrument) at a time; we then classify these four feature points into two pairs of identical points and calculate their image coordinates respectively. Together with the camera parameters, the 3D coordinates of the two markers can be obtained through the least square method [15]. Fig. 3 shows the flowchart of the presented method.

Considering that the virtual surgery simulation system demands high precision, we employ three cameras to capture the movement of the simulated surgical instrument


Figure 3. Flowchart of our presented method



Figure 1. Pinhole model

Figure 2. Binocular vision system
for the purpose of minimizing the system error introduced by image acquisition. The three cameras form three camera pairs, each including two cameras. For each marker, the three camera pairs yield three groups of 3D coordinates; by averaging them, we get the final, more accurate 3D coordinates of the markers.
C. System Construction
The 3D tracking and positioning apparatus of surgical
instrument based on our proposed method consists of a
simulated surgical instrument, three cameras and a
computer. Fig. 4 illustrates the hardware distribution of
our developed 3D tracking and positioning apparatus.

The gray circular area is the active region of the simulated surgical instrument. The included angle formed by any two cameras and the center of the gray circular area is 120°.
The cameras are Basler acA1300-30gc cameras with a frame rate of 30 frames/sec and a highest resolution of 1092×962. The cameras are connected to the computer. The main parameters of the Basler acA1300-30gc camera are shown in Table I.
TABLE I.
THE MAIN PARAMETERS OF THE BASLER ACA1300-30GC CAMERA
Highest resolution: 1092×962
Optical size: 1/3 inch
Pixel size (μm): 3.75 × 3.75
Sensor type: CCD
Frame rate: 30 frames/sec
Data transmission: network card

Two markers spaced 70 mm apart are deployed on the simulated surgical instrument; their specific distribution is shown in Fig. 5. On the actual simulated surgical instrument, the main body is white while the two markers are black.

III. IMPLEMENTATION
A. Camera Calibration
Camera calibration is the most basic step in stereoscopic vision [16]. Its purpose is to obtain the camera parameters, i.e., the mapping relation between the 3D coordinates of a point P(x_w, y_w, z_w) in the world coordinate system and the 2D coordinates of point p(x, y), the projection of P on the camera's image plane, in the image coordinate system. The mapping relation can be described simply by equation (1):

$$ z_c \begin{bmatrix} x \\ y \\ 1 \end{bmatrix} = \begin{bmatrix} m_{11} & m_{12} & m_{13} & m_{14} \\ m_{21} & m_{22} & m_{23} & m_{24} \\ m_{31} & m_{32} & m_{33} & m_{34} \end{bmatrix} \begin{bmatrix} x_w \\ y_w \\ z_w \\ 1 \end{bmatrix} = M \begin{bmatrix} x_w \\ y_w \\ z_w \\ 1 \end{bmatrix} \qquad (1) $$

where z_c is the Z coordinate of point P in the camera coordinate system, and M is the so-called projection matrix determined by the camera parameters. Once we obtain the camera parameters, we do not need to calculate them again until the camera is moved. Generally speaking, the detailed calibration process includes the following five steps.
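To make Eq. (1) concrete, the following is a minimal Python/NumPy sketch of projecting a world point through a 3×4 projection matrix M. The intrinsic matrix K and pose here are illustrative toy values, not the calibrated parameters reported later in Table II.

```python
import numpy as np

def project_point(M, Pw):
    """Project a 3D world point onto the image plane with a 3x4 projection
    matrix M, per Eq. (1): z_c * [x, y, 1]^T = M * [x_w, y_w, z_w, 1]^T."""
    Pw_h = np.append(np.asarray(Pw, dtype=float), 1.0)  # homogeneous world point
    xyz = M @ Pw_h                                      # [z_c*x, z_c*y, z_c]
    return xyz[:2] / xyz[2]                             # image coordinates (x, y)

# Toy camera: identity rotation, translated 10 units along Z, focal length 2
# (illustrative values only).
K = np.array([[2.0, 0.0, 0.0],
              [0.0, 2.0, 0.0],
              [0.0, 0.0, 1.0]])
Rt = np.hstack([np.eye(3), np.array([[0.0], [0.0], [10.0]])])
M = K @ Rt

print(project_point(M, [5.0, 5.0, 0.0]))  # image coordinates of the world point
```

Once M has been estimated by calibration, this projection is all that is needed to predict where any world point lands in the image.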
Step 1. Generation of planar calibration plate.
We adopt a regular 7×7 black-and-white checkerboard as the pattern on the calibration plate, as shown in Fig. 6; the size of each checker is 30×30 mm.

Step 2. Acquisition of calibration plate images
We utilize multithread and soft trigger techniques to
synchronously capture calibration plate images. Fig. 7
gives two calibration plate images.



Figure 4. Hardware distribution of the 3D tracking and positioning
apparatus

Figure 5. Abstraction of surgical instrument









Figure 6. Planar calibration plate

Step 3. Corner detection
In this paper, we employ the function cvFindChessboardCorners() from the OpenCV library to detect the corners of the calibration plate image [17]. For a pixel (x, y) of an input image I(u, v), if R is bigger than a given threshold, then pixel (x, y) is a corner.

$$ R = \det(M) - k\,(\mathrm{trace}\,M)^2 = I_x^2 I_y^2 - (I_x I_y)^2 - k\,(I_x^2 + I_y^2)^2 \qquad (2) $$

where k is a coefficient, generally 0.04, and I_x and I_y are the first gray gradients:

$$ I_x = \frac{\partial I}{\partial x}, \qquad I_y = \frac{\partial I}{\partial y} \qquad (3) $$

M is a real symmetric matrix:

$$ M = G(\sigma) \otimes \begin{bmatrix} I_x^2 & I_x I_y \\ I_x I_y & I_y^2 \end{bmatrix} \qquad (4) $$

det M is M's determinant, trace M is the trace of M, and G(σ) is the Gauss smoothing operator. After that, cvFindCornerSubPix() is used to get more accurate image coordinates of the corners. Fig. 8 shows the corner detection results of the calibration plate images in Fig. 7.
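The corner response of Eqs. (2)-(4) can be sketched in plain NumPy as follows. This is an illustrative Harris-style implementation, not the OpenCV code the system actually uses; the Gaussian width `sigma` and the synthetic test image are our assumptions.

```python
import numpy as np

def harris_response(I, k=0.04, sigma=1.0):
    """Corner response R = det(M) - k*(trace M)^2 per Eqs. (2)-(4):
    central-difference gradients, then Gaussian smoothing of the products."""
    Ix = np.gradient(I.astype(float), axis=1)  # first gray gradients, Eq. (3)
    Iy = np.gradient(I.astype(float), axis=0)

    # separable 1-D Gaussian kernel for the smoothing operator G(sigma)
    r = int(3 * sigma)
    t = np.arange(-r, r + 1)
    g = np.exp(-t**2 / (2 * sigma**2))
    g /= g.sum()

    def smooth(A):
        A = np.apply_along_axis(lambda v: np.convolve(v, g, mode="same"), 0, A)
        return np.apply_along_axis(lambda v: np.convolve(v, g, mode="same"), 1, A)

    Sxx, Syy, Sxy = smooth(Ix * Ix), smooth(Iy * Iy), smooth(Ix * Iy)  # Eq. (4)
    return Sxx * Syy - Sxy**2 - k * (Sxx + Syy)**2                     # Eq. (2)

# A bright square on a dark background: R is large and positive at its corners,
# negative along its edges, and near zero in flat regions.
img = np.zeros((20, 20))
img[5:15, 5:15] = 255.0
R = harris_response(img)
print(np.unravel_index(np.argmax(R), R.shape))  # near one of the square's corners
```

Thresholding R, as the paper describes, then yields the candidate corner pixels that cvFindCornerSubPix() refines.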

Step 4. Corner matching
Actually, the distributions of corners in different rectangular arrays do not correspond one-to-one, so corner matching is indispensable. In our paper, we design two basic transformation functions performed on the rectangular corner array: a clockwise rotation function and a horizontal flip function. Different combinations of these two functions can achieve all transformations of the rectangular array needed in the experiment.
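The two basic transformations on the rectangular corner array can be sketched with NumPy; the function names below are ours for illustration.

```python
import numpy as np

def rotate_cw(corners):
    """Clockwise 90-degree rotation of a rectangular corner array."""
    return np.rot90(corners, k=-1)

def flip_horizontal(corners):
    """Horizontal (left-right) flip of a rectangular corner array."""
    return np.fliplr(corners)

# A 2x3 array of corner indices; composing the two functions reaches every
# orientation of the rectangular array needed for matching.
grid = np.arange(6).reshape(2, 3)
print(rotate_cw(grid))
print(flip_horizontal(rotate_cw(grid)))
```

In practice the array entries are detected corner coordinates rather than the integer indices used here.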
Step 5. Calculation of camera parameters
When it comes to the calculation of camera parameters, the two relatively popular algorithms are the Tsai two-step method [18] and Zhang's algorithm [19], and both are highly accurate and robust. Compared with the Tsai two-step method, Zhang's algorithm requires the camera to capture the calibration plate from different viewpoints, but in our application the camera and calibration plate must stay fixed and cannot be moved. In this situation, we choose the Tsai two-step method to calculate the camera parameters.
The calibration results of the camera capturing Fig. 7(a)
are shown in TABLE II.
TABLE II.
CALIBRATION RESULTS
Focal length f (mm): 5.130550
Radial distortion coefficient kappa (1/m²): 4.680239e-003
Translation vector T (mm): 3.75 3.75
Scale factor s: 1.000000
Optical center coordinates: C_x = 646.000000, C_y = 482.000000
Rotation matrix R:
(0.629319, 0.777092, 0.009271)
(0.417971, -0.328383, -0.847033)
(-0.655178, 0.536929, -0.531459)

B. Reconstruction of 3D coordinates of markers
In order to recover 3D coordinates of two markers on
simulated surgical instrument, we first need to extract
corresponding feature points, and then match feature
points from three cameras to form identical points. After
these, least square method is used to calculate the 3D
coordinates of markers.
Picture Preprocessing
Owing to the effect of illumination, it is unavoidable that shadows of the simulated surgical instrument and of the operator's hands appear in the motion images of the simulated surgical instrument. Besides, noise is also inevitably introduced during camera imaging. Therefore, image preprocessing is quite necessary.
For any motion image of simulated surgical instrument,
this process mainly includes the following steps:
(1) Step 1. Convert the RGB image into a gray image; the conversion formula is:

$$ Gray = 0.299R + 0.587G + 0.114B \qquad (5) $$

(2) Step 2. Subtract the gray image from the corresponding background image to remove the background, which can be expressed by the following formula:

$$ I_{new} = \max(0,\; I_{bkg} - I_{old}) \qquad (6) $$


(a) (b)
Figure 8. Corner detection results


(a) (b)
Figure 7. Captured calibration plate images
where I_old is a captured gray image, I_bkg is the background image and I_new is the image processed according to formula (6).
(3) Step 3. Apply a median filter to the residual image for noise removal, which can be expressed by the following formula:

$$ \hat{I}(x, y) = \underset{(i,j)\in W_{xy}}{\mathrm{median}} \{\, I(i, j) \,\} \qquad (7) $$

where I is the input image and \hat{I} is the median-filtered image. As seen in formula (7), to calculate the median-filtered value of a pixel, we sort the pixels in its window W_{xy} and choose the intermediate value as the result.
(4) Step 4. Utilize the threshold segmentation method based on statistics to complete initial shadow removal. The threshold value is set to 101 through statistics.
Fig. 9 shows the result of image preprocessing. Although most of the background and shadows have been removed, there are still redundant areas.
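Steps (1)-(4) above can be sketched as one pipeline. This is an illustrative NumPy version under our own assumptions (the helper name, the synthetic white-background test image, and a tiny 12×12 size); it is not the system's actual code, though the threshold 101 is the value stated in the paper.

```python
import numpy as np

def preprocess(rgb, background_gray, threshold=101):
    """Steps (1)-(4): gray conversion (Eq. 5), background subtraction (Eq. 6),
    3x3 median filtering (Eq. 7), then threshold segmentation."""
    # Step 1: RGB -> gray with BT.601 luminance weights
    gray = rgb[..., 0] * 0.299 + rgb[..., 1] * 0.587 + rgb[..., 2] * 0.114
    # Step 2: subtract from the background image, clip negatives to zero
    residual = np.maximum(0.0, background_gray - gray)
    # Step 3: 3x3 median filter for noise removal
    padded = np.pad(residual, 1, mode="edge")
    windows = np.lib.stride_tricks.sliding_window_view(padded, (3, 3))
    filtered = np.median(windows.reshape(*residual.shape, 9), axis=-1)
    # Step 4: threshold segmentation (threshold fixed at 101 in the paper)
    return (filtered > threshold).astype(np.uint8)

# White background (255) with a dark instrument blob and one speck of noise.
h, w = 12, 12
bkg = np.full((h, w), 255.0)
rgb = np.full((h, w, 3), 255.0)
rgb[4:8, 4:8] = 20.0   # dark instrument region
rgb[0, 0] = 0.0        # isolated noise pixel, removed by the median filter
mask = preprocess(rgb, bkg)
print(mask.sum())      # number of foreground pixels kept after segmentation
```

The median filter discards the isolated noise pixel while preserving the connected instrument region, which is exactly the behavior Step 3 is there for.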

Feature points extraction and stereo matching
The purpose of feature point extraction is to further remove redundant areas, and finally obtain the two-dimensional (2D) image coordinates of each marker by calculating the barycentre of the corresponding feature point. Generally speaking, feature points in an image obey a Gaussian distribution, so their gray histograms appear as Gaussian peaks [20]. The sum of each row is shown in Fig. 10.

Assume the top left corner of the image is the origin, the width direction is the X axis, and the height direction is the Y axis. Then the integral projection along the Y axis, i.e., the sum of the gray values of all pixels in a row, is:

$$ y(j) = \sum_{i=0}^{w} I(i, j) \qquad (8) $$

where j is the j-th row of the image and w is the width of the image.
The integral projection of Fig. 9(b) on the Y axis forms a blue curve. The two highest peaks represent the projections of the two feature points on the Y axis; the other areas correspond to the projections of redundant areas. If we set a proper threshold value, it is quite easy to isolate the projections of the feature points on the Y axis. In the same way, we can also get their projections on the X axis. Through experiments, we find that the threshold value is 2000. In this way, we can determine the feature points. At last, we take the coordinates of the barycentre of each feature point as its 2D image coordinates. The extraction results of the feature points in Fig. 9(b) are shown as the crosses in Fig. 11.

During the actual surgery, marker 1, as shown in Fig. 11, always moves above marker 2. For any image, the feature point with the larger Y coordinate is therefore the image region of marker 1, and the other one is the image region of marker 2. In this way, the extracted feature points are simply matched.
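The integral-projection extraction of Eq. (8) can be sketched as follows. The helper names and the synthetic two-blob image are our illustration, and the thresholds are chosen for this toy image rather than the value 2000 used in the paper.

```python
import numpy as np

def peak_intervals(profile, threshold):
    """Split a projection profile into the index intervals above a threshold.
    Assumes the profile starts and ends below the threshold."""
    above = (profile > threshold).astype(int)
    edges = np.flatnonzero(np.diff(above))          # rising/falling edges
    return list(zip(edges[::2] + 1, edges[1::2]))   # inclusive intervals

def feature_centroids(binary, row_thresh, col_thresh):
    """Locate feature points via integral projections (Eq. 8): threshold the
    row and column sums, then take each blob's barycentre as its 2D point."""
    rows = binary.sum(axis=1)   # projection on the Y axis
    cols = binary.sum(axis=0)   # projection on the X axis
    points = []
    for r0, r1 in peak_intervals(rows, row_thresh):
        for c0, c1 in peak_intervals(cols, col_thresh):
            patch = binary[r0:r1 + 1, c0:c1 + 1]
            if patch.sum() == 0:
                continue  # this row/column interval pair holds no blob
            ys, xs = np.nonzero(patch)
            points.append((c0 + xs.mean(), r0 + ys.mean()))  # (x, y) barycentre
    return points

# Two synthetic 3x3 marker blobs of intensity 255 in a 40x40 image.
img = np.zeros((40, 40))
img[10:13, 8:11] = 255.0
img[25:28, 20:23] = 255.0
print(feature_centroids(img, row_thresh=500, col_thresh=500))
```

The returned barycentres are the 2D image coordinates that feed the 3D reconstruction in the next subsection.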
Calculation of 3D coordinates of markers
Suppose the feature points of marker 1, Q(x_w, y_w, z_w), are q_1(x_1, y_1), q_2(x_2, y_2) and q_3(x_3, y_3) in the images taken by the three cameras C_1, C_2 and C_3, respectively. After camera calibration, we know the three corresponding projection matrices M_1, M_2 and M_3. We can obtain the following equation:

$$ z_{ci} \begin{bmatrix} x_i \\ y_i \\ 1 \end{bmatrix} = \begin{bmatrix} m_{11}^i & m_{12}^i & m_{13}^i & m_{14}^i \\ m_{21}^i & m_{22}^i & m_{23}^i & m_{24}^i \\ m_{31}^i & m_{32}^i & m_{33}^i & m_{34}^i \end{bmatrix} \begin{bmatrix} x_w \\ y_w \\ z_w \\ 1 \end{bmatrix} = M_i \begin{bmatrix} x_w \\ y_w \\ z_w \\ 1 \end{bmatrix} \qquad (9) $$


Figure 10. The integral projection of Fig. 9(b) on Y axis


(a) (b)
Figure 9. The original motion image of simulated surgical instrument
and its image preprocessing result



Figure 11. The extracted feature points in Fig. 9(b)
For the camera group including C_1 and C_2, we set i = 1, 2 and let x_w^{12}, y_w^{12}, z_w^{12} take the places of x_w, y_w, z_w, respectively. Thus, we get the following linear system:

$$ \begin{aligned} (x_1 m_{31}^1 - m_{11}^1)\,x_w^{12} + (x_1 m_{32}^1 - m_{12}^1)\,y_w^{12} + (x_1 m_{33}^1 - m_{13}^1)\,z_w^{12} &= m_{14}^1 - x_1 m_{34}^1 \\ (y_1 m_{31}^1 - m_{21}^1)\,x_w^{12} + (y_1 m_{32}^1 - m_{22}^1)\,y_w^{12} + (y_1 m_{33}^1 - m_{23}^1)\,z_w^{12} &= m_{24}^1 - y_1 m_{34}^1 \\ (x_2 m_{31}^2 - m_{11}^2)\,x_w^{12} + (x_2 m_{32}^2 - m_{12}^2)\,y_w^{12} + (x_2 m_{33}^2 - m_{13}^2)\,z_w^{12} &= m_{14}^2 - x_2 m_{34}^2 \\ (y_2 m_{31}^2 - m_{21}^2)\,x_w^{12} + (y_2 m_{32}^2 - m_{22}^2)\,y_w^{12} + (y_2 m_{33}^2 - m_{23}^2)\,z_w^{12} &= m_{24}^2 - y_2 m_{34}^2 \end{aligned} \qquad (10) $$
The linear system above includes four equations and three unknowns x_w^{12}, y_w^{12} and z_w^{12}; here we utilize the least square method to solve it. In the same way, for the camera group including C_1 and C_3 we get x_w^{13}, y_w^{13} and z_w^{13}, and for the camera group including C_2 and C_3 we get x_w^{23}, y_w^{23} and z_w^{23}. At last, the final 3D coordinates of marker 1 can be calculated as follows:

$$ x_w = \frac{x_w^{12} + x_w^{13} + x_w^{23}}{3} \qquad (11) $$

$$ y_w = \frac{y_w^{12} + y_w^{13} + y_w^{23}}{3} \qquad (12) $$

$$ z_w = \frac{z_w^{12} + z_w^{13} + z_w^{23}}{3} \qquad (13) $$

In the same way, we can also obtain the 3D coordinates of marker 2. Thus, we complete the 3D tracking and positioning of the simulated surgical instrument.
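The pairwise least-squares triangulation of Eq. (10) and the averaging of Eqs. (11)-(13) can be sketched as follows. The toy projection matrices are hypothetical stand-ins, not the calibrated matrices of the real system; the example projects a known world point and then recovers it.

```python
import numpy as np

def triangulate_pair(M1, M2, p1, p2):
    """Build and solve the 4x3 linear system (10) for one camera pair
    by least squares."""
    rows, rhs = [], []
    for M, (x, y) in ((M1, p1), (M2, p2)):
        rows.append(x * M[2, :3] - M[0, :3]); rhs.append(M[0, 3] - x * M[2, 3])
        rows.append(y * M[2, :3] - M[1, :3]); rhs.append(M[1, 3] - y * M[2, 3])
    sol, *_ = np.linalg.lstsq(np.array(rows), np.array(rhs), rcond=None)
    return sol

def triangulate_three(Ms, ps):
    """Average the three pairwise solutions, per Eqs. (11)-(13)."""
    pairs = [(0, 1), (0, 2), (1, 2)]
    return sum(triangulate_pair(Ms[i], Ms[j], ps[i], ps[j]) for i, j in pairs) / 3.0

def make_M(t):
    """Toy projection matrix with identity rotation and translation t."""
    return np.hstack([np.eye(3), np.array(t, dtype=float).reshape(3, 1)])

# Three toy cameras observing the world point P = (1, 2, 3): project P into
# each image, then recover it from the three views.
Ms = [make_M([0, 0, 10]), make_M([2, 0, 12]), make_M([-1, 1, 11])]
P = np.array([1.0, 2.0, 3.0])
ps = []
for M in Ms:
    proj = M @ np.append(P, 1.0)
    ps.append(proj[:2] / proj[2])
print(triangulate_three(Ms, ps))  # close to (1, 2, 3)
```

With noise-free projections the four equations per pair are consistent and each pair already recovers P exactly; the averaging step matters once real measurement noise enters.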
IV. EXPERIMENTAL RESULTS
The detailed configuration of our experimental platform is as follows. Computer: Intel Core Duo CPU @2.66 GHz, 2 GB memory; Basler acA1300-30gc camera: 1092×962 resolution, 30 FPS; Software: Microsoft Visual C++ .NET 2005.
The 3D tracking and positioning equipment is shown in Fig. 12. The three cameras are connected to the computer and transmit the captured pictures to it. Then we process the captured pictures and calculate the 3D coordinates of the marked points.

The absolute error between the corners' calculated 3D coordinates and the theoretical 3D coordinates is shown in Fig. 13.

According to Fig. 13, the average absolute error of the reconstructed corners' 3D coordinates is 33.40/(36×3) = 0.31 mm. The error mainly derives from the following three aspects:
Production and placing of the calibration plate
The error of the corner detection algorithm
The error of the camera calibration algorithm
As the absolute error of the reconstructed corners is less than 1 mm, it meets the precision required by the application field.
The image coordinates and space coordinate of
surgical instrument are shown in TABLE III.
TABLE III.
THE IMAGE COORDINATES AND SPACE COORDINATES OF THE SURGICAL INSTRUMENT
Image coordinates (cameras 1, 2, 3)                  Space coordinate
(510.01, 226.23) (285.76, 407.24) (952.65, 337.45)   (43.79, -38.61, 92.68)
(477.09, 380.54) (346.31, 595.17) (945.31, 474.26)   (64.08, -29.52, 26.08)
(663.15, 226.67) (258.64, 336.43) (827.08, 417.54)   (-19.3, -27.34, 92.39)
(625.61, 384.06) (317.18, 508.61) (830.26, 555.04)   (-1.14, -16.55, 26.32)
(674.41, 170.26) (259.24, 262.90) (804.00, 342.18)   (-24.80, -22.31, 117.62)
(634.01, 333.78) (321.10, 438.32) (995.31, 164.45)   (-5.91, -11.83, 51.98)
(456.09, 104.61) (273.32, 277.64) (997.67, 301.41)   (60.87, -42.92, 144.09)
(401.13, 258.51) (350.13, 488.40) (731.05, 129.39)   (89.87, -34.88, 82.11)
(429.32, 190.17) (556.72, 239.15) (757.01, 257.53)   (63.21, 52.81, 140.14)
(366.14, 366.27) (621.82, 436.19) (647.91, 239.28)   (92.44, 61.47, 78.27)
(553.67, 261.04) (524.30, 268.42) (952.08, 337.12)   (21.52, 61.07, 114.55)
(507.17, 438.30) (580.19, 447.20) (945.31, 474.16)   (43.50, 71.32, 50.19)

As we mentioned above, the actual distance between the two markers on the simulated surgical instrument is fixed at 70 mm. Fig. 14 provides the

Figure 12. 3D tracking and positioning equipment

Figure 13. Interactive organ tissue deformation simulation
distances calculated from the 3D coordinates of the two markers. Fig. 15 provides the absolute error and standard deviation calculated from the 3D coordinates of the two markers.
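As a worked check, the marker spacing can be recomputed from Table III. Pairing consecutive rows as marker 1 and marker 2 of the same frame is our assumption about the table's layout.

```python
import numpy as np

# Space coordinates from the first two rows of Table III; the true marker
# spacing on the instrument is 70 mm.
m1 = np.array([43.79, -38.61, 92.68])
m2 = np.array([64.08, -29.52, 26.08])

distance = np.linalg.norm(m1 - m2)          # Euclidean distance between markers
absolute_error = abs(distance - 70.0)        # deviation from the true 70 mm
print(f"distance = {distance:.2f} mm, error = {absolute_error:.2f} mm")
```

The computed distance lands within a fraction of a millimetre of 70 mm, consistent with the sub-millimetre errors plotted in Fig. 15.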


As can be seen, the standard deviation of our developed apparatus is 0.371632 mm, less than 1 mm, which fully satisfies the precision requirement of current virtual surgery simulation systems.
There mainly exist three types of errors in our developed apparatus. The first is the algorithm error arising from the implementation of the algorithms; the second comes from the generation and placement of the planar calibration plate; and the third comes from the manual fabrication of the simulated surgical instrument. Overall, our developed apparatus is precise, and the major source of error is fabrication error, which can be decreased by using a professional calibration plate and a machined simulated surgical instrument.
V. CONCLUSION AND FUTURE WORK
In this paper, we present a method based on stereoscopic vision to construct a suite of 3D tracking and positioning apparatus for surgical instruments. It consists of a simulated surgical instrument, three cameras and a computer. The three cameras capture the motion images of the simulated surgical instrument in real time. After a series of computer processing steps, we can obtain the six-degree-of-freedom information of the simulated surgical instrument with an absolute error of less than 1 mm, thereby positioning the instrument. We then analyze the sources of error and integrate the developed apparatus into soft tissue deformation simulation in virtual surgery. The experimental results show that the proposed method is highly accurate and easily operated, and it is also inexpensive. In future work, we will utilize this tracking and positioning method and equipment to capture soft tissue warping images and calculate soft tissue parameters using the least square method. We will also perfect our tracking and positioning method and construct a parameter measurement platform to prove its effectiveness.
ACKNOWLEDGMENTS
This work was fully supported by a grant from the
National Natural Science Foundation of China (Grant No.
61070079).
REFERENCES
[1] Bassma Ghali, B.Eng. Algorithms for Nonlinear Finite Element-based Modeling of Soft-tissue Deformation and Cutting. Electrical & Computer Engineering, Hamilton, Ontario, Canada, 2008.
[2] Cagatay Basdogan, Mert Sedef, Matthias Harders et al. VR-based simulators for training in minimally invasive surgery. IEEE Computer Graphics and Applications, 2007, vol. 27, no. 2, pp. 54-66.
[3] Igor Peterlik, Mert Sedef, Cagatay Basdogan, Ludek Matyska. Real-time visio-haptic interaction with static soft tissue models having geometric and material nonlinearity. Computers & Graphics, 2010, vol. 34, no. 1, pp. 43-54.
[4] Florian Schulze, Katja Bühler, André Neubauer, Armin Kanitsar, Leslie Holton, Stefan Wolfsberger. Intra-operative virtual endoscopy for image guided endonasal transsphenoidal pituitary surgery. International Journal of Computer Assisted Radiology and Surgery, 2010, vol. 5, no. 2, pp. 143-154.
[5] Changmok Choi, Jungsik Kim, Hyonyung Han, Bummo
Ahn, Jung Kim. Graphic and haptic modeling of the
oesophagus for VR-Based medical simulation,
International Journal of Medical Robotics Computer
Assisted Surgery, 2009, vol. 5, no. 3, pp. 257-266.
[6] J. C. Krieg. Motion tracking: polhemus technology.
Virtual Reality Systems, mar. 1993, vol. 1, no. 1, pp. 32-36.
[7] A.L. Trejos, R.V. Patel, M.D. Naish and C.M. Schlachta.
Design of a sensorized instrument for skills assessment
and training in minimally invasive surgery. the 2nd
Biennial International Conference on Biomedical Robotics
and Biomechatronics, Scottsdale, Arizona, October 19-22, 2008: 965-970.
[8] Yamaguchi S, Yoshida D, Kenmotsu H, Yasunaga T,
Konishi K, Ieiri S, Nakashima H, Tanoue K, Hashizume
M. Objective assessment of laparoscopic suturing skills
using a motion-tracking system. Surg Endosc, 2010, vol.
25, no. 3, pp. 771-775.
[9] J. Stoll, P. Novotny, P. Dupont and R. Howe. Real-time
3d ultrasound-based servoing of a surgical instrument.
IEEE International Conference on Robotics and
Automation, Orlando, FL, 2006: 613-618.

Figure 15. Absolute error and standard deviation


Figure 14. Standard distance and actual distance
[10] Nogueira JF Jr, Stamm AC, Lyra M. Novel compact
Laptop-based image-guidance system: preliminary study.
Laryngoscope, 2009, vol. 119, no. 3, pp. 576-579.
[11] Tansel Halic, Sinan Kockara, Coskun Bayrak, Richard
Rowe. Mixed reality simulation of rasping procedure in
artificial cervical disc replacement (ACDR) surgery, BMC
Bioinformatics 2010, vol. 11, no. Suppl 6, pp.S11.
[12] Foxlin E. Motion Tracking Requirements and Technologies. In Handbook of Virtual Environment Technology, Kay Stanney, Editor, Lawrence Erlbaum Associates, 2002: 163-210.
[13] Tomikawa M, Hong J, Shiotani S, Tokunaga E, Konishi K,
Ieiri S, Tanoue K, Akahoshi T, Maehara Y, Hashizume
M, Real-time 3-dimensional virtual reality navigation
system with open MRI for breast-conserving surgery.
Journal of the American College of Surgeons, 2010, vol.
210, no. 6, pp. 927-933.
[14] Songde Ma, Zhengyou Zhang, Computer vision: The Basis
of Theoretical Calculations and Algorithms. Science Press,
Beijing, 2004.
[15] Morgan Kaufmann. Machine Vision : Theory, Algorithms,
Practicalities. Morgan Kaufmann Publishing Co. Inc., 3rd
Edition, 2004.
[16] Richard L. Burden, J. Douglas Faires. Numerical Analysis.
Brooks Cole. 9th edition, August 9, 2010.
[17] http://sourceforge.net/projects/opencvlibrary/.
[18] Roger Y. Tsai. A versatile camera calibration technique for high-accuracy 3D machine vision metrology using off-the-shelf TV cameras and lenses. IEEE Journal of Robotics and Automation, August 1987, vol. 3, no. 4, pp. 323-344.
[19] Zhengyou Zhang. A flexible new technique for camera calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence, November 2000, vol. 22, no. 11, pp. 1330-1334.
[20] Xuan Yang, Ji Hong Pei, Wan Hai Yang. Real-time
detection and tracking of light point. Journal of Infrared
and Millimeter Waves, 2001, vol. 20, no. 4, pp. 279-282.





Zhaoliang Duan is currently a first-year Master's student in the Computer School of Wuhan University, China. He received his BS degree in computer science and technology from Wuhan University in June 2011 and has been recommended for admission to Wuhan University to continue his studies for the next two years.
His research interests include computer simulation and
virtual reality.
He has published 10 papers in related conference
proceedings and journals.



Zhiyong Yuan received the BS degree in Computer Application and the MS degree in Signal and Information Processing from Wuhan University, in 1986 and 1994, respectively, and a Ph.D. degree in Control Theory and Control Engineering from Huazhong University of Science and Technology in 2008. He was an assistant professor from 1994 to 1998 and has been an associate professor at Wuhan University since 1999. During 2006-2007, he was a visiting professor at the Dept. of Neurological Surgery, School of Medicine, University of Pittsburgh, conducting research on a computer-based surgical simulation system for training endoscopic neurosurgeons.
His research interests include computer simulation and virtual reality, embedded systems and IoT applications, image processing and pattern recognition, and intelligent healthcare systems.
He has been a reviewer for the Journal of X-Ray Science and Technology, the Journal of Supercomputing, and the Journal of Computer Science and Technology. He has published over 60 papers in related conference proceedings and journals.



Xiangyun Liao is currently a first-year Master's student in the Computer School of Wuhan University, China. He received his BS degree in computer science and technology from Wuhan University in June 2011 and has been recommended for admission to Wuhan University to continue his studies for the next two years.
His research interests include computer simulation, virtual
reality, image processing and pattern recognition, etc.
He has published 7 papers in related conference proceedings
and journals.



Weixin Si is currently a first-year Master's student in the Computer School of Wuhan University, China. He received his BS degree in computer science and technology from Wuhan University in June 2011 and has been recommended for admission to Wuhan University to continue his studies for the next two years.
His research interests include computer simulation and virtual
reality, image processing and pattern recognition, etc.
He has published 9 papers in related conference proceedings
and journals.



Jianhui Zhao received the B.Sc. degree
in Computer Engineering from Wuhan
University of Technology in 1997, the
M.Sc. degree in Computer Science from
Huazhong University of Science and
Technology in 2000, and the Ph.D.
degree in Computer Science from
Nanyang Technological University in
2004. From 2003 to 2006, he worked as a
Research Assistant/Associate in Hong Kong University of
Science and Technology. Currently he is working as an
Associate Professor in Computer School of Wuhan University.
His research interests include computer graphics and digital
image processing.
He has published over 50 papers in related conference proceedings and journals.


Design of Image Security System Based on
Chaotic Maps Group

Feng Huang
College of Electrical & Information Engineering, Hunan Institute of Engineering, Xiangtan, China
Email: huangfeng25@126.com

Xilong Qu
College of Computer & Communication, Hunan Institute of Engineering, Xiangtan, China
Email: quxilong@126.com



Abstract—Images are used more and more widely in people's lives today, and image security has become an important issue. Encryption technologies are used to ensure the security of images; among them, SCAN patterns are one of the effective tools for protecting images. A SCAN-based scheme generates a very large number of scanning patterns of an image and then shuffles the positions of the image pixels according to the patterns. The idea of a chaotic maps group is similar to SCAN patterns. This paper designs a new image security system based on a chaotic maps group: it takes the different maps of the group as patterns, and the key selects among the chaotic map patterns. Simulation shows that the image security system has a fast encryption speed and a large enough key space, which means high security. The design removes the limit between the keys and the size of the image that arises when encrypting an image with a single chaotic map; at the same time, it also solves the problem of the image size required by SCAN patterns.

Index Terms—image security, chaotic maps, encryption

I. INTRODUCTION
Over the past decade, digital images have been used more and more widely. People take pictures with digital cameras and upload them to Internet websites such as Facebook, so how to protect the security of images has become a more and more important issue. Some traditional encryption technologies such as DES, RSA, etc. can be used to protect the security of digital images. But because of some intrinsic features of images, such as bulk data capacity and high correlation among adjacent pixels, these technologies are not entirely suitable for practical image encryption [1].
Some new technologies are used in image encryption. Blowfish is a symmetric block cipher that can be used as a drop-in replacement for DES or IDEA. In [2], the Blowfish algorithm is used to encrypt images: the plain image is divided into blocks, which are rearranged into a transformed image using a transformation algorithm. In [3], a new method for image encryption is presented: specific higher frequencies of the DCT coefficients are taken as the characteristic values to be encrypted, and the resulting encrypted blocks are shuffled according to a pseudorandom bit sequence.

Chaos is now widely used in image encryption [4,5]. Chaos has characteristics that can be connected with the confusion and diffusion properties in cryptography, such as sensitive dependence on initial conditions and parameters, a broadband power spectrum, randomness in the time domain, ergodicity, and low dimensionality.
In fact, the idea of using chaos for encryption can be traced back to Shannon's classic paper [6], in which the basic stretch-and-fold mechanism that could be used for encryption was proposed. The stretch-and-fold mechanism can generate a chaotic map. The image encryption exploits the geometric characteristics of the image: the stretch-and-fold process changes the distances among pixels and shuffles their positions. It is in fact a process of image permutation. Combined with a diffusion mechanism, it can also change the values of the image pixels.
Some classic chaotic maps are used in image encryption, such as the cat map and the baker map. In [5] a symmetric image encryption scheme is obtained; it is shown that the permutations induced by the baker map behave as typical random permutations, and the cipher has good diffusion properties with respect to the plain image and the key. But the baker map has no simple formula, and the key is limited by the size of the image. In [7,8], symmetric image encryption schemes based on three-dimensional chaotic maps are proposed; they employ a chaotic map to shuffle the positions of image pixels and use the logistic map to confuse the relationship between the cipher image and the plain image. In [9], a new invertible two-dimensional chaotic map, the line map, was proposed; an image encryption scheme based on the line map executes quickly, and the key can be an integer of any length, satisfying high security requirements.
SCAN patterns [10,11] are an effective tool for protecting images. SCAN patterns generate a very large number of scanning paths, or space-filling curves. The image encryption is performed by SCAN-based
doi:10.4304/jmm.6.6.510-517

permutation of pixels together with a substitution rule, which form an iterated product cipher. One drawback is that SCAN patterns require the plain image to be square and of even size.
This paper designs a new image security system based on a chaotic maps group. The idea of the chaotic maps group is similar to the SCAN patterns technology: it takes the different chaotic maps as encryption patterns, and the key selects which chaotic map patterns are applied. Simulation shows that the image security system has a fast encryption speed and a key space large enough for high security. The design removes the limit that ties the key to the image size when encrypting with a single chaotic map, and it also removes the restriction on image size imposed by SCAN patterns. Analysis shows that the image security system is safe.
II. THE KEY GENERATION
Generally, people are used to taking six decimal digits as the key. Here the origin key is N0 N1 N2 N3 N4 N5, with 0 ≤ Ni ≤ 9 (i = 0, 1, 2, 3, 4, 5). The process of key generation is as follows.

Firstly, the values of N0 and N1 are used to get the keys of permutation (key1 and key2). A map table relates the values of N0, N1 to positions in the origin key; it can be seen in Table I.

For example, if the origin key is 476328, then N0 = 4, and from Table I key1 is equal to the value of the 4th digit of the origin key, so key1 = 3. If N1 = 7, key2 is equal to the value of the 2nd digit of the origin key, so key2 = 7. In the same way, N2 and N3 give key3 = 7 and key4 = 2.

Secondly, the digits of the origin key left unused by the selections above form the part after the decimal point; the result is key5, which is used for the diffusion part. Here key5 is 0.468.
TABLE I.
THE MAP BETWEEN N0, N1, N2, N3 AND THE POSITION IN THE ORIGIN KEY

N0, N1, N2, N3    Position in origin key
0                 3
1                 6
2, 3              5
4, 5              4
6, 7              2
8, 9              1

The process of the key generation can be seen in
Figure 1.


Figure 1. The key generation.
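The worked example above can be sketched in Python. This is a minimal sketch of our reading of the procedure; the function name and the digit-string interface are our own assumptions, while the position table is Table I:

```python
# Digit value -> 1-based position in the origin key (Table I).
POSITION = {0: 3, 1: 6, 2: 5, 3: 5, 4: 4, 5: 4, 6: 2, 7: 2, 8: 1, 9: 1}

def generate_keys(origin_key: str):
    """Split a six-digit origin key into permutation keys and a diffusion key."""
    digits = [int(c) for c in origin_key]
    used_positions = []
    perm_keys = []
    for n in digits[:4]:                  # N0..N3 each select a position via Table I
        pos = POSITION[n]
        used_positions.append(pos)
        perm_keys.append(digits[pos - 1])
    # The digits at positions never selected form the diffusion key key5.
    rest = [c for i, c in enumerate(origin_key, start=1) if i not in used_positions]
    key5 = float("0." + "".join(rest))
    return perm_keys, key5

keys, key5 = generate_keys("476328")
# With the paper's example: key1..key4 = [3, 7, 7, 2] and key5 = 0.468
```

With the origin key 476328 this reproduces the example in the text: the permutation keys 3, 7, 7, 2 and the diffusion key 0.468.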
III. THE IDEA OF CHAOTIC MAPS GROUP
The paper uses five new two-dimensional chaotic maps as the pattern group. All of the maps utilize an important characteristic of images: each pixel of a column of the image can be inserted between two adjacent pixels of a row of the image. The new chaotic maps encrypt images by an image stretch-and-fold process. The processes of the chaotic maps are shown in Figure 2.

0 1 2 3
4 5 6 7
8 9

Figure 2. Chaotic maps group.
Suppose the dimension of a square image is N×N, where N is an integer. A(i, j) is the matrix of the square image, in which each element corresponds to the gray-level value of the pixel (i, j), i = 0, ..., N-1, j = 0, ..., N-1. L is a one-dimensional vector of N·N elements mapped from A.

Each chaotic map has two different versions: a left map and a right map.
The left map algorithms

The first chaotic map. For even N:

L( sum_{k=0}^{i} (4k+1) + j + 1 ) = A( floor((4i+2-j)/2), floor(j/2) )   (1)

where i = 0, 1, ..., N/2-1 and j = 0, 1, ..., 4i-2;

L( sum_{k=0}^{N/2} (4k+1) + sum_{k=1}^{i-N/2} (4N+1-4k) + 2i+1-N + j ) = A( floor((2N-1-j)/2), 2i+1-N+floor((j+1)/2) )   (2)

where i = N/2, N/2+1, ..., N-1 and j = 0, 1, ..., 4N-4i.

For odd N:

L( sum_{k=0}^{i} (4k+1) + j + 1 ) = A( floor((4i+2-j)/2), floor(j/2) )   (3)

where i = 0, 1, ..., (N-1)/2-1 and j = 0, 1, ..., 4i-2;

L( sum_{k=0}^{(N-1)/2} (4k+1) + sum_{k=1}^{i-(N-1)/2} (4N+1-4k) + 2i+3-N + j ) = A( floor((2N-1-j)/2), 2i+2-N+floor((j-1)/2) )   (4)

where i = (N+1)/2-1, ..., N-1 and j = 0, 1, ..., 4N-4i.

The 2nd chaotic map,

L[(2N-j)j + 2(i-j) - 1] = A(i, j)   (5)

where i > j;

L[(2N-i)i + 2(j-i)] = A(i, j)   (6)

where i ≤ j.
The 3rd chaotic map,

L[(2N-2j+1)j + 2(i-j) - 1] = A(i, j)   (7)

where i+j < N, i > j;

L[(2N-2i+1)i + 2(j-i)] = A(i, j)   (8)

where i+j < N, i ≤ j;

L[N² - (2j+1)(N-1-j) - 2(j-i)] = A(i, j)   (9)

where i+j ≥ N, i < j;

L[N² - (2i+1)(N-1-i) - 2(i-j) - 1] = A(i, j)   (10)

where i+j ≥ N, i ≥ j.
The 4th chaotic map,

L[(N-j+2)(N-j+1)/2 + 2(j-i)] = A(i, j)   (11)

where i ≤ j and N-j is an odd number;

L[(N-j+3)(N-j+2)/2 + 2(j-i) + 1] = A(i, j)   (12)

where i ≤ j and N-j is an even number;

L[(N² - (2N-1-j)j)/2 + 2(N-1-i)] = A(i, j)   (13)

where j < i and j is an even number;

L[(N² - (2N-j)(j-1))/2 + 2(N-i) - 1] = A(i, j)   (14)

where j < i and j is an odd number.
The 5th chaotic map,

L( j(j+1)/2 + 2(j-i) + 1 ) = A(i, j)   (15)

where i ≤ j and N, j are both even or both odd;

L( j(j+1)/2 + 2(j-i) ) = A(i, j)   (16)

where i ≤ j, and N is even with j odd, or N is odd with j even;

L( [N² - (2N-1-j)j]/2 + 2(N-i) - 2 ) = A(i, j)   (17)

where j < i and j is an even number;

L( [N² - (2N-j)(j-1)]/2 + 2(N-i) - 1 ) = A(i, j)   (18)

where j < i and j is an odd number.
The right map algorithms

The right map is symmetric with the left map. First, a mirror image of the original is made, described by the following formula:

A'(i, j) = A(i, N-1-j)   (19)

where A' is the matrix of the mirror image of the square image A.

After obtaining the mirror image A' of A, the right map can be computed with the algorithms of the left map. Of course, to increase the efficiency of the right-map permutation, the best way is to derive the algorithm of the right map directly. Some of the right map algorithms are as follows.
The 2nd chaotic map,

L[(2N-j)j + 2(i-j) - 1] = A(i, N-1-j)   (20)

where i > j;

L[(2N-i)i + 2(j-i)] = A(i, N-1-j)   (21)

where i ≤ j.
The 3rd chaotic map,

L[(2N-2j+1)j + 2(i-j) - 1] = A(i, N-1-j)   (22)

where i+j < N, i > j;

L[(2N-2i+1)i + 2(j-i)] = A(i, N-1-j)   (23)

where i+j < N, i ≤ j;

L[N² - (2j+1)(N-1-j) - 2(j-i)] = A(i, N-1-j)   (24)

where i+j ≥ N, i < j;

L[N² - (2i+1)(N-1-i) - 2(i-j) - 1] = A(i, N-1-j)   (25)

where i+j ≥ N, i ≥ j.
The 4th chaotic map,

L[(N-j+2)(N-j+1)/2 + 2(j-i)] = A(i, N-1-j)   (26)

where i ≤ j and N-j is an odd number;

L[(N-j+3)(N-j+2)/2 + 2(j-i) + 1] = A(i, N-1-j)   (27)

where i ≤ j and N-j is an even number;

L[(N² - (2N-1-j)j)/2 + 2(N-1-i)] = A(i, N-1-j)   (28)

where j < i and j is an even number;

L[(N² - (2N-j)(j-1))/2 + 2(N-i) - 1] = A(i, N-1-j)   (29)

where j < i and j is an odd number.
The 5th chaotic map,

L( j(j+1)/2 + 2(j-i) + 1 ) = A(i, N-1-j)   (30)

where i ≤ j and N, j are both even or both odd;

L( j(j+1)/2 + 2(j-i) ) = A(i, N-1-j)   (31)

where i ≤ j, and N is even with j odd, or N is odd with j even;

L( [N² - (2N-1-j)j]/2 + 2(N-i) - 2 ) = A(i, N-1-j)   (32)

where j < i and j is an even number;

L( [N² - (2N-j)(j-1)]/2 + 2(N-i) - 1 ) = A(i, N-1-j)   (33)

where j < i and j is an odd number.

The map from a line to a square image

The line L of N·N pixels is finally mapped back to an N×N square image B, described by the following formula:

B(i, j) = L(iN + j)   (34)
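As a concrete illustration (ours, not the authors' code), the left version of the 2nd chaotic map, eqs. (5)-(6), together with the line-to-image map (34), can be sketched as follows; the index formulas are as we read them from the equations:

```python
import numpy as np

def second_map_left(A):
    """Stretch an N x N image into a line L using the 2nd chaotic map (eqs. 5-6)."""
    N = A.shape[0]
    L = np.empty(N * N, dtype=A.dtype)
    for i in range(N):
        for j in range(N):
            if i > j:                                   # eq. (5)
                L[(2 * N - j) * j + 2 * (i - j) - 1] = A[i, j]
            else:                                       # eq. (6), i <= j
                L[(2 * N - i) * i + 2 * (j - i)] = A[i, j]
    return L

def line_to_image(L, N):
    """Fold the line back into a square image: B(i, j) = L(i*N + j) (eq. 34)."""
    return L.reshape(N, N)

N = 4
A = np.arange(N * N).reshape(N, N)
B = line_to_image(second_map_left(A), N)    # B is a permutation of A's pixels
```

Because the two branches together hit every index 0..N²-1 exactly once, the map is a pure permutation of the pixels, which is what makes the cipher losslessly invertible.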
IV. DESIGN OF IMAGE SECURITY SYSTEM
Obviously, a design of image encryption based on a chaotic maps group is more flexible than one based on a single chaotic map.

In [9] the image encryption is achieved first by pixel permutation. Since the chaotic map is divided into a left map and a right map, the iteration numbers of the left map and the right map are used as the secret key. If the key is in decimal, then from the least significant digit to the most significant digit, each digit (0-9) corresponds to the iteration number of the left map and the right map alternately.
So in one possible design, the key can be laid out as follows: the first digit of the key represents the number of iterations of the left map of the first chaotic map, the 2nd digit represents the number of iterations of the right map of the first chaotic map, and so on for the remaining digits.

There is a serious security problem with this design. Each application of a chaotic map costs time, so if the iteration numbers of the left and right maps are used as the secret key, the whole encryption time may disclose the iteration number of the chaotic map, which is in fact the key.
If the key is 103050, the whole encryption takes about 0.02 s (on a PC with an Intel L2300 CPU, 1 GB of RAM, and Windows XP), while with the key 1 or 01 it takes about 0.0023 s. Evidently an attacker can infer that the iteration number, i.e., the sum of all digits of the key, is about 9. Theoretically the original key space is 10^6, but in fact only 2,002 keys share a digit sum of 9, which is much smaller than the theoretical value. Part of the theoretical and real key space of six-decimal-digit keys can be seen in Table II, where the sum denotes the sum of all digits of the key. By the previous argument, the real key space is even smaller than the values in the table: for example, when the sum is 1, the key space is only 2, much smaller than the theoretical value.

TABLE II.
PART OF THE THEORETICAL AND REAL KEY SPACE OF A SIX-DECIMAL-DIGIT KEY

Sum of digits    1   2   3   4    5    6    7    8
Real key space   6   21  56  126  252  462  792  1287

It can be noted that such a scheme is not safe when the sum of the digits of the key is small; the key space only becomes large as the digit sum of the key increases.
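The counts in Table II can be reproduced with a short dynamic program over digit sums. This is a sketch of ours (a stars-and-bars count); it also yields the 2,002 keys whose digits sum to 9:

```python
def keys_with_digit_sum(length=6, max_sum=54):
    """counts[s] = number of `length`-digit decimal keys whose digits sum to s."""
    counts = [1] + [0] * max_sum          # zero digits placed so far
    for _ in range(length):
        nxt = [0] * (max_sum + 1)
        for s, c in enumerate(counts):
            for d in range(10):           # append one more decimal digit
                if s + d <= max_sum:
                    nxt[s + d] += c
        counts = nxt
    return counts

counts = keys_with_digit_sum()
# counts[1..8] reproduce Table II: 6, 21, 56, 126, 252, 462, 792, 1287
```

Summing all entries recovers the full theoretical key space of 10^6, showing how sharply a leaked digit sum narrows the search.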
This paper therefore uses another design, in which each decimal digit represents one chaotic map. For example, the digit 0 represents the left map of the first chaotic map, the digit 1 represents the right map of the first chaotic map, and so on; thus the digits 0 to 9 denote the different maps. If the key is 3772, the image is shuffled by the right map of the 2nd chaotic map, then twice by the right map of the 4th chaotic map, and finally by the left map of the 2nd chaotic map.
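This digit-to-pattern rule can be written as a tiny decoder (the function name is ours): digit 2k selects the left map and digit 2k+1 the right map of the (k+1)-th chaotic map.

```python
def decode_pattern(digit):
    """Map one decimal key digit to (chaotic map number, left/right side)."""
    map_no, side = divmod(digit, 2)
    return map_no + 1, "right" if side else "left"

# The paper's example key 3772: right map of map 2, right map of map 4
# twice, then left map of map 2.
steps = [decode_pattern(int(d)) for d in "3772"]
```

Because every digit is a complete instruction, the key length is independent of the image size, which is the flexibility the design aims for.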
The second step is to add a diffusion mechanism to confuse the values of the pixels and change the statistical characteristics of the cipher image. The logistic map can be used for this; here a is 3.9 and the initial value is key5:

X_{n+1} = a X_n (1 - X_n)   (35)

where a ∈ (0, 4), X_n ∈ (0, 1), n = 1, 2, 3, ...
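A minimal diffusion sketch seeded by key5: the logistic-map states of eq. (35) are quantized to bytes and XORed with the pixel values. The XOR combination is our own assumption; the paper states only that the logistic map confuses the pixel values.

```python
def logistic_keystream(x0, n, a=3.9):
    """Iterate X_{n+1} = a * X_n * (1 - X_n) and quantize each state to a byte."""
    ks, x = [], x0
    for _ in range(n):
        x = a * x * (1.0 - x)
        ks.append(int(x * 256) % 256)
    return ks

def diffuse(pixels, key5):
    """XOR each pixel with the chaotic keystream; applying it twice inverts it."""
    return [p ^ k for p, k in zip(pixels, logistic_keystream(key5, len(pixels)))]

cipher = diffuse([52, 55, 61, 66], 0.468)
plain = diffuse(cipher, 0.468)         # XOR diffusion is its own inverse
```

Since XOR with the same keystream is self-inverse, decryption reuses key5 = 0.468 unchanged.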


An image encryption is carried out based on the maps. The plain image and cipher images are shown in Figure 3. The image has 256×256 pixels with 256 gray levels, and the plain image is encrypted using the keys 0 and 0123456789. It can be seen that the plain image has been encrypted. The decrypted image and the plain image are equal at every pixel, i.e., the plain image is recovered completely, which shows that image encryption using the chaotic maps has no message loss.


(a) Plain image


(b) Cipher image (key 0)



(c) Cipher image (key 0123456789)
Figure 3. Plain image and cipher images.
V. SECURITY ANALYSIS OF PERMUTATION
Key space
Since the length of the key of the map has no limit, the key space can be calculated according to the length of the key. Suppose the key is represented in binary bits; the relationship between the key space size and the key length is shown in Table III. In theory, the security key can be an integer of any length, satisfying the different security requirements.
TABLE III.
KEY SPACE SIZE VS KEY LENGTH

Key length (bits)   64           256          512
Key space size      1.84×10^19   1.16×10^77   1.34×10^154

Key Sensitivity
Assume an image is encrypted using the map with the key 0123456789. Now the least significant digit of the key is changed and a decryption test is done. The original key 0123456789 is changed to the key 0123456788 and the key 0123456780, both of which are used to decrypt the cipher image produced with the original key 0123456789. The two images decrypted with the two different keys are shown in Figure 4. It can be seen that the image cannot be decrypted with either key, even though each differs from the correct key only in the last digit. Therefore, the image encryption using the chaotic maps is highly key-sensitive.


(a) Decrypted cipher image by error key


(b) Decrypted cipher image by another error key
Figure 4. The sensitivity of the key.
Correlation
The correlation of two adjacent pixels in an image is measured by

cov(x, y) = E[(x - E(x))(y - E(y))]   (36)

r_xy = cov(x, y) / ( sqrt(D(x)) · sqrt(D(y)) )   (37)

where x and y are the gray-scale values of two adjacent pixels. Figure 5 shows the correlations of two horizontally adjacent pixels in the plain image and the cipher image: the correlation coefficients are 0.9442 and 0.0010. Similar results for the diagonal and vertical directions were obtained and are shown in Table IV.
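Equation (37) can be checked with a few lines of NumPy; this sketch (ours) pairs each pixel with its right-hand neighbor, and np.corrcoef computes the same coefficient r_xy:

```python
import numpy as np

def horizontal_correlation(img):
    """r_xy of eq. (37) over all horizontally adjacent pixel pairs."""
    x = img[:, :-1].ravel().astype(float)   # each pixel
    y = img[:, 1:].ravel().astype(float)    # its right-hand neighbor
    return float(np.corrcoef(x, y)[0, 1])

rng = np.random.default_rng(0)
noise = rng.integers(0, 256, size=(64, 64))   # stands in for a cipher image
smooth = np.tile(np.arange(64), (64, 1))      # stands in for a natural image
```

A natural image gives a coefficient near 1 while a well-encrypted image gives a value near 0, which is exactly the contrast reported in Table IV.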



(a) Plain image

(b) Cipher image
Figure 5. Correlations in plain image and cipher image.
TABLE IV.
CORRELATION COEFFICIENTS OF TWO ADJACENT PIXELS

             Plain image   Cipher image
horizontal   0.9442        0.0010
vertical     0.9711        0.0007
diagonal     0.9187        0.0032

Fixed point ratio
With the key 0, BD = 0.69%; with the key 0123456789, BD = 0.71%. This means the positions of more than 99% of the plain-image pixels are changed.
Change of the gray
With the key 0, GAVE = 51.9501; with the key 0123456789, GAVE = 52.5440. This means the average values of the pixels are changed by about 20%.
r-m self-correlation
With r = 1, the r-m self-correlation for the keys 0 and 0123456789 is shown in Table V. The self-correlation of the cipher image is significantly reduced compared with the plain image; for the key 0123456789, the values at every m are smaller even than the plain image's value at m = 1. This means the effect of the permutation is very good.
TABLE V. SELF-CORRELATION OF IMAGE

m                 1     2     3     4     5     6     7
lena              0.41  0.41  0.46  0.50  0.54  0.57  0.60
key 0             0.22  0.22  0.24  0.26  0.28  0.29  0.31
key 0123456789    0.13  0.13  0.14  0.15  0.15  0.16  0.16

m                 8     9     10    11    12    13    14
lena              0.62  0.64  0.66  0.68  0.69  0.70  0.71
key 0             0.32  0.33  0.34  0.35  0.36  0.36  0.37
key 0123456789    0.17  0.18  0.18  0.19  0.19  0.20  0.20

m                 15    16    17    18    19    20
lena              0.73  0.73  0.74  0.75  0.76  0.77
key 0             0.38  0.38  0.39  0.40  0.40  0.41
key 0123456789    0.21  0.22  0.22  0.23  0.23  0.24

Speed of encryption and decryption
For images of size 32×32, 64×64, 128×128, 256×256 and 512×512, with the key 0123456789, the speeds of encryption and decryption are shown in Table VI. Simulation results show that the encryption speed is fast enough for real applications.
TABLE VI. SPEEDS OF ENCRYPTION AND DECRYPTION

Size of image   Encryption (s)   Decryption (s)
32×32           0.0002           0.0002
64×64           0.0009           0.0008
128×128         0.0034           0.0033
256×256         0.0134           0.0129
512×512         0.0534           0.0533
VI. SIMULATION
The image security system includes four parts: key generation, permutation, a diffusion mechanism and the hardware realization.

Key generation is described in Section II; the keys for permutation are 3772 and the key for the diffusion mechanism is 0.468.

The chaotic maps group is used for the permutation, with the permutation keys selecting the different chaotic maps, as explained in Section IV.

The classic logistic map is used as the diffusion mechanism. The key for the diffusion mechanism supplies the parameters of the logistic map, whose formula is given in (35).
In the design of the hardware system, firstly, the characteristics of multi-sensor fused images must be considered: multi-sensor fused images have higher accuracy, more information and more complex shapes than normal images. Secondly, the issue of real-time implementation must be considered; some chaotic maps can be improved in hardware implementation, for instance where the mapping process can be seen as a regular scan mode and can therefore be compressed. Lastly, the hardware design must handle the popular image formats (such as JPEG 2000); otherwise it may corrupt the stream and the cipher image cannot be decoded.

The chaotic maps are permutations and combinations of pixels, and the changes of pixel location under the transformation can be accumulated in a smart-chip cache. So, in theory, the chaotic image encryption algorithm is easy to implement in hardware.
A hardware design is shown in Figure 6. The system includes an address generator, a control logic unit, RAM and an accumulator. key1 and key2 control the process of permutation: through the address generator and the control logic unit they change the addresses of the plain image in RAM, which in effect confuses the pixels of the plain image. key3 holds the parameters of the diffusion mechanism: through the accumulator it changes the values of the plain-image pixels. After this process the histogram of the cipher image becomes flat; the diffusion mechanism confuses the pixel values and changes the statistical characteristics of the cipher image.

In fact, the design only considers operations on pixels. To enhance its adaptability, common compression algorithms, such as JPEG 2000, must also be considered.

Figure 6. Hardware design.

Simulation shows that the permutation takes about 4 ms, so the permutation speed is about 15 MB/s (on a PC with an Intel L2300 CPU, 1 GB of RAM and Windows XP), and the diffusion takes about 5 ms, so the total encryption speed is about 7 MB/s. The cipher image is shown in Figure 7.


(a) Cipher image

(b) Histogram of cipher image
Figure 7. Simulation
VII. SUMMARY
This paper designs a new image security system based on a chaotic maps group. The idea of the chaotic maps group is similar to the SCAN patterns technology: it takes the different chaotic maps as encryption patterns, and the key selects which chaotic map patterns are applied. Simulation shows that the image security system has a fast encryption speed and a key space large enough for high security. The design removes the limit that ties the key to the image size when encrypting with a single chaotic map, and it also removes the restriction on image size imposed by SCAN patterns. Analysis shows that the image security system is safe. It can be used in real-time image encryption applications at a speed of about 7 MB/s.
ACKNOWLEDGMENT
The authors gratefully acknowledge the support of the Scientific Research Fund of Hunan Provincial Education Department (08B015, 08A009), the Provincial Natural Science Foundation of Hunan (10JJ6099), and the Provincial Science & Technology Plan Project of Hunan (2010GK3048).


REFERENCES
[1] S. Mazloom and A. M. Eftekhari-Moghadam, "Color image encryption based on coupled nonlinear chaotic map," Chaos, Solitons & Fractals, vol. 42, no. 3, pp. 1745-1754, 2009.
[2] B. Y. Mohammad Ali and J. Aman, "Image encryption using block-based transformation algorithm," IAENG Int. J. of Computer Science, vol. 35, no. 1, pp. 15-23, 2008.
[3] L. Krikor, S. Bab, T. Ari and Z. Shaaban, "Image encryption using DCT and stream cipher," European Journal of Scientific Research, vol. 32, no. 1, pp. 47-57, 2009.
[4] L. Kocarev, "Chaos-based cryptography: a brief overview," IEEE Circuits and Systems Magazine, vol. 1, no. 3, pp. 6-21, 2001.
[5] J. Fridrich, "Symmetric ciphers based on two-dimensional chaotic maps," Int. J. Bifurcat. Chaos, vol. 8, pp. 1259-1284, 1998.
[6] C. E. Shannon, "Communication theory of secrecy systems," The Bell System Technical Journal, vol. 28, no. 4, pp. 656-715, 1949.
[7] G. Chen, Y. Mao, and C. K. Chui, "A symmetric image encryption scheme based on 3D chaotic cat maps," Chaos, Solitons and Fractals, vol. 21, pp. 749-761, 2004.
[8] Y. Mao, G. Chen and S. Lian, "A novel fast image encryption scheme based on 3D chaotic baker maps," Int. J. Bifurcat. Chaos, vol. 14, pp. 3613-3624, 2004.
[9] Y. Feng, L. J. Li, and F. Huang, "A symmetric image encryption approach based on line maps," ISSCAA, vol. 1, pp. 1362-1367, 2006.
[10] N. Bourbakis and C. Alexopoulos, "Picture data encryption using SCAN patterns," Pattern Recognition, vol. 25, no. 6, pp. 567-581, 1992.
[11] S. S. Maniccam and N. G. Bourbakis, "Image and video encryption using SCAN patterns," Pattern Recognition, vol. 37, no. 4, pp. 725-737, 2004.



Feng Huang (1978-) was born in Shaoyang, Hunan, P. R. China. He received the B.S. degree in automatic test and control from Harbin Institute of Technology in 2000, the M.S. degree in power engineering from Harbin Institute of Technology in 2002, and the Ph.D. degree in power electronics and power drives from Harbin Institute of Technology in 2007. He is an associate professor in the College of Electrical & Information Engineering, Hunan Institute of Engineering, Xiangtan, P. R. China. His research interests include image encryption and the design of automated test systems. He has several years of experience in teaching, research and development projects, and has published over 20 scientific papers.

Xilong Qu (1978-) was born in Shaoyang, Hunan, P. R. China. He received the Ph.D. degree from Southwest Jiaotong University in 2006. He is an associate professor at Hunan Institute of Engineering, a master supervisor at Xiangtan University, a key young teacher of Hunan province, and the academic leader of computer application technology at Hunan Institute of Engineering. His research interests are web service technology, information safety and networked manufacturing. He has published over 30 papers in important journals.


The Capture of Moving Object in Video Image

Weina Fu (a), Zhiwen Xu (b), Shuai Liu (a, b), Xin Wang (a, b), Hongchang Ke (a)

(a) Software College, Changchun Institute of Technology, Changchun, China
Email: ls_25210114@sohu.com
(b) College of Computer Science and Technology, Jilin University, Changchun, China
Email: ls_25210114@163.com



Abstract: Nowadays, video is a primary information carrier on the WWW (World Wide Web), and moving objects often carry the most information, but it is hard to catch these objects in video quickly and correctly. In this paper, we put forward a method to catch moving objects in video. Firstly, based on the difference image method, we determine the moving region in the video image. To avoid the difficulty of building the background, we build the background with a new algorithm based on changed differences. Finally, we get the objects and denoise them with erosion and dilation. The experimental results show that the new method is feasible and of high quality.

Index Terms: video capture, denoise, moving object, video image

I. INTRODUCTION
An automatic video-based face recognition system includes a human face detection part, a face tracking part, a facial feature capture part and a face recognition part [1-3]. Obviously, the premise is to locate the face. The problem splits into two directions: one is how to locate a human face in still images [4], the other is how to locate a human face in video [5]. Moreover, it is well known that an object recognition system has similar parts.

An important method in video capture is to find and extract the changed regions of moving targets from the background in the image series of a video. The status of our work in video capture can be seen in Algorithm a.
Algorithm a (Capture Moving Object in Video)
Input: a series of video images.
Output: the moving region (object).
Step 1. Find the difference between every two adjacent images.
Step 2. Judge whether points are related, using the relation of color and the moving rules.
Step 3. Catch the region (object) for the next process.
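Step 1 of Algorithm a can be sketched with simple frame differencing; the threshold T is an assumed parameter of ours, not a value from the paper:

```python
import numpy as np

def moving_region(frame_prev, frame_next, T=25):
    """Mask of pixels whose gray level changed by more than T between frames."""
    diff = np.abs(frame_next.astype(int) - frame_prev.astype(int))
    return diff > T

f1 = np.zeros((8, 8), dtype=np.uint8)
f2 = f1.copy()
f2[2:5, 2:5] = 200                  # a bright "object" enters the scene
mask = moving_region(f1, f2)        # True exactly on the 3 x 3 object block
```

The resulting mask is what Steps 2 and 3 then group into a connected moving region.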
We often call this method segmentation of moving objects. Effective segmentation of the moving region is important for post-processing such as target classification, tracking and behavior understanding. However, due to dynamic changes of the background image caused by weather, light, shadow and other disturbing factors, effective segmentation is still difficult.

In segmentation of moving objects, the main segmentation methods are the difference image method, the time difference method and the optical flow method [6-7]. The difference image method is a technique that segments regions using the difference between the current frame and a background frame, but it is so sensitive to dynamic scenes that it makes many mistakes. The main limitation of the time difference method is that it cannot get all pixels with common characteristics and often creates a hole inside the moving object. Optical flow algorithms are too complex and resist noise poorly. Comparing these three methods comprehensively, the new method created here is based on the difference image method, because it is simple and easy to implement in a real-time environment for video with a generally static background.
Nowadays, scholars have advanced object recognition considerably. Massimo [8] and partners provide an overview and some new insights on the use of dynamic visual information for face recognition. In their paper, not only physical features but also behavioral features are accounted for in the face representation. They give experimental results obtained from real video image data to show the feasibility of the proposed approach.

Junius [9] and partners demonstrate a three-dimensional (location, time, and magnitude of body part movement) pattern representation of the entire time-dependent front-view gait cycle that simultaneously displays the coupled kinetics of different body parts, thereby revealing possible irregularities in the gait. Among the potential applications of their technique are improved diagnosis and treatment of gait pathologies in rehabilitation clinics and modeling schools, as well as the development of more robust surveillance systems.

Tomokazu [10] and partners propose an efficient method for estimating a depth map from long-baseline image sequences captured by a calibrated moving multi-camera system. Their experiments verify the validity and feasibility of the algorithm for both synthetic and real outdoor scenes.
In this paper, we segment the moving object exactly using the histogram with automatic threshold segmentation and mathematical morphology.

Considering the difficulty of constructing the background, we put forward a new algorithm: we do not construct the background first, but construct it during processing. Then we trace moving objects against the background and catch them. Finally, we give experiments to validate that our algorithm has higher correctness and detection pace than the classic moving-object capture algorithm.

doi:10.4304/jmm.6.6.518-525
II. VIDEO IMAGE PROCESSING
A. Definition
Video image is also called dynamic image. It is made up of a series of images with a given or assumed relative order, from which we get the time interval of every two adjacent images.

The relative order gives the time interval between adjacent images. For a time series t_i (i = 1, 2, ..., n) in which every t_k is next to t_{k-1}, we set the time intervals as

Δt_k = t_k - t_{k-1},  k = 1, 2, ..., n-1,
Δt_k = Δt,  k = 1, 2, ..., n-1;

that is, all the time intervals of image capture are equal to each other.
We call each image of the video a frame. Of course, the space position of a moving object differs at different times; in other words, when the space position of an object changes from one frame to the next, we call it a moving object. When a point P of the object moves from (x_{k-1}, y_{k-1}) in frame k-1 to (x_k, y_k) in frame k, we set the displacement as (Δx_k, Δy_k). We also call the position shift of a point on the object surface from t_{k-1} to t_k the parallax.
B. Construct the Background
In the classic algorithm, the construction of the background is a key step. Based on the relatively simple background of motion tracking, we construct the background image using a method based on the CDM (change detection mask). The method assumes that the moving object cannot cover the whole image at all times; in other words, every part of the background must appear in some image. So when the object moves, we will see the background change in the image.

We denote the luminance component of the image series by I_i(x, y), where (x, y) is the pixel position and the integer i is the frame number (i = 1, ..., N), with N the total number of frames.
Then we use the formula below to define the change detection mask, which reflects the gray-level changes between adjacent frames:

CDM_i(x, y) = { 0,  d < T
              { d,  d ≥ T,     where d = |I_{i+1}(x, y) - I_i(x, y)|.

In this formula, the threshold value T is used to control the removal of noise. For each position (x, y), CDM_i(x, y) describes the change curve along the time axis of the pixel at position (x, y). We can then segment the curve by computing whether CDM_i(x, y) is larger than zero.
The stillness parts detected are expressed as the set {S_j(x, y), 1 ≤ j ≤ M}; they can be seen in Figure 1.

In Figure 1, the beginning and ending of S_j are ST_j and EN_j. For each position (x, y), we select the longest stillness part in {S_j} and register the frame number of its midpoint as M(x, y). Then we use the pixel of the frame with number M(x, y) to fill the corresponding position in the video background. This is defined by the formulas below:

M(x, y) = ( ST(x, y) + EN(x, y) ) / 2
B(x, y) = I(x, y, M(x, y))

where ST(x, y) is the beginning of the longest stillness part, EN(x, y) is its ending, and B(x, y) is the rebuilt video background.
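The background construction above can be sketched as follows. This is our reading of the CDM procedure; the threshold T and the array layout are assumptions:

```python
import numpy as np

def build_background(frames, T=10):
    """frames: N gray images of shape (H, W). Returns the rebuilt background."""
    seq = np.asarray(frames, dtype=int)
    n, h, w = seq.shape
    cdm = np.abs(np.diff(seq, axis=0)) >= T           # change between frames i, i+1
    bg = np.empty((h, w), dtype=seq.dtype)
    for x in range(h):
        for y in range(w):
            best_len, best_mid, start = 0, 0, 0
            for i in range(n - 1):
                if cdm[i, x, y]:                      # a change ends a stillness run
                    if i - start + 1 > best_len:
                        best_len, best_mid = i - start + 1, (start + i) // 2
                    start = i + 1
            if n - start > best_len:                  # the final stillness run
                best_mid = (start + n - 1) // 2
            bg[x, y] = seq[best_mid, x, y]            # B(x, y) = I(x, y, M(x, y))
    return bg

frames = [np.full((4, 4), 60)] * 5 + [np.full((4, 4), 210)] * 2
background = build_background(frames)   # recovers the 60-valued background
```

Taking the midpoint of the longest still run makes the estimate robust to an object that briefly covers a pixel at either end of the sequence.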
C. A New Approach with No Background Construction
We can see that it is hard to construct the background in video capture, and yet background construction is a key step. So if we find a new way to catch moving objects without constructing the background, we can get rid of this bottleneck of moving-object capture.

The CDM detects the background through changes in the images. As described, it rebuilds the background from the regions that moving objects uncover as they move. So when a moving object covers part of the image in every frame, we cannot find the correct background, which means we will always catch incorrect objects in this case.

We therefore propose to find moving objects without rebuilding the background first. When objects move, we can detect them from the difference between two adjacent images of the series; this gives the movement region of the images. That is to say, the remaining region, after removing the moving objects from the images, is background.

Then we construct a background image. Of course, at first it may cover only a small region of the images, but we execute this step from the first image to the last: whenever we get a new piece of background, we replace the current background with the union of the two. Thus we divide the image content into three kinds: moving objects we have already found, background fragments, and moving objects we have not yet found. We create a list to store them, replacing the older entry with the newer one when we find a moving object we have found before, and linking a new entry to the list when we find a new object.

We now give a new algorithm to catch moving objects from video images.

Fig.1. Curves showing the changes along the time axis given by the luminance differences between frames
JOURNAL OF MULTIMEDIA, VOL. 6, NO. 6, DECEMBER 2011 519
2011 ACADEMY PUBLISHER
III. TRACKING AND PROCESSING OF MOVING OBJECT
A. Algorithm Based on the Difference Image Method
The difference image method judges the existence of a moving object from the difference obtained by subtracting two successive frames; we call this difference the difference image. Using the difference image is a simple and effective way to process the scene globally and coarsely, and it is also useful for obtaining coarse information about the moving object.
The principle of the difference image method is to find the moving object through the difference. When no object moves in the monitored region, the gray-level difference between successive frames of the image series is very small; on the contrary, when objects move, the gray-level difference between successive frames increases significantly. By choosing a reasonable threshold, we can therefore determine whether moving objects exist in the image series. The mathematical formula is:

    D(x, y) = 1 when |f_1(x, y) - f_2(x, y)| > T, and 0 otherwise.

Here f_1(x, y) and f_2(x, y) are the images of the background only and of the background with a moving object inside, D(x, y) is the binary difference image at point (x, y), and T is the gray threshold, whose size determines the sensitivity of the monitor. The difference may be produced by the movement of objects within the region, by moving objects entering or leaving the region, or by lighting changes in the region or noise.
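The binary difference image D(x, y) can be sketched in a few lines (the function name and default threshold are our choices, not part of the paper):

```python
import numpy as np

def difference_image(f1, f2, T=30):
    """Binary difference image: D(x, y) = 1 where |f1 - f2| > T, else 0."""
    diff = np.abs(f1.astype(np.int16) - f2.astype(np.int16))
    return (diff > T).astype(np.uint8)
```

Pixels where D is 1 are candidates for the moving region; as noted above, T trades noise rejection against monitoring sensitivity.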
We segment moving objects with the difference image method, as shown in Algorithm b.

Algorithm b (Segment Moving Objects)
Input: preprocessed video image series.
Output: the moving regions (objects).
Step 1. Use the current image as the background image for comparison.
    IF (the video image series is not empty)
        Goto Step 2.
    ELSE
        Goto Step 4.
Step 2. Get the next image as the read image;
    Compute the difference between the background and the read image;
    IF (difference > threshold)
    {
        Take the difference region in the read image as a moving object;
        Goto Step 3.
    }
    Set the read image as the current image;
    Goto Step 1.
Step 3. Judge whether the object is the same as an object in the list.
    IF (the object has the same characteristics, e.g. color or gray level, as an object in the list)
    {
        Mark the selected moving object in the video images.
        Replace the matching object in the list with the selected object.
    }
    IF (the object has the same characteristics as the static part in the list, i.e. the list item that stores the static part computed as the whole image minus the moving objects)
    {
        Take the union of the static part and the selected object, and replace the static part in the list with the union.
    }
    IF (the object matches neither a moving object nor the static part in the list)
    {
        Mark the selected moving object in the video images.
        Link the selected object into the list.
        IF (the item count is larger than the item-count threshold)
            Release list items holding moving objects until n percent are released, where n is a chosen number between 1 and 100.
    }
    Goto Step 2.
Step 4. End of algorithm.
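Under simplifying assumptions, the loop of Algorithm b can be sketched in Python. The object representation (a binary change mask), the overlap-based "same characters" test, and all names below are illustrative stand-ins for the paper's color/gray comparison, not the authors' implementation:

```python
import numpy as np

def capture_moving_objects(frames, T=30, max_items=50, release_fraction=0.5):
    """Sketch of Algorithm b: compare successive frames and keep a list
    of previously seen objects; no background is constructed.

    Each 'object' here is the binary mask of changed pixels, and two
    objects are considered the same when their masks overlap -- a
    stand-in for the paper's character comparison (color, gray, etc.).
    """
    objects = []     # list of previously seen object masks
    detections = []  # (frame_index, mask) for every capture
    current = frames[0]
    for i, frame in enumerate(frames[1:], start=1):
        diff = np.abs(frame.astype(np.int16) - current.astype(np.int16)) > T
        if diff.any():
            matched = False
            for k, mask in enumerate(objects):
                if (mask & diff).any():       # same characters -> replace
                    objects[k] = diff
                    matched = True
                    break
            if not matched:                   # new object -> link into list
                objects.append(diff)
                if len(objects) > max_items:  # release part of the list
                    objects = objects[int(len(objects) * release_fraction):]
            detections.append((i, diff))
        current = frame
    return detections, objects
```

The list never grows beyond `max_items`, mirroring the item-count threshold and partial release in Step 3.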
Of course, Algorithm b is more effective than the classic difference image method, as shown in Theorems 1 and 2: Theorem 1 proves the correctness of Algorithm b, and Theorem 2 proves its efficiency.

Theorem 1. When the capture is correct, Algorithm b and the classic algorithm catch the same moving objects.
Proof:
To prove Theorem 1, we divide all conditions into four cases, i through iv.
Case i. There is no moving object in the images. The images of the series are then identical to each other, since nothing moves in them, so neither the classic algorithm nor Algorithm b catches any moving object. That is, the results of the two algorithms are the same.
Case ii. Moving objects exist in the image series and appear in all images of the series (the moving objects move slowly). The classic algorithm compares all images to the background image. In this case, because the objects move slowly, and assuming the background it constructs is correct, the classic algorithm finds the correct moving objects. Algorithm b catches the same series of moving objects in this case, since the objects move slowly, so it also catches the correct objects.
Case iii. Moving objects exist in the image series but do not appear in all images of the series because they move too fast. In this case, our algorithm finds that there is no object in an image once the object has moved out, and so does the classic algorithm when the background is constructed correctly. In other words, the classic algorithm finds the image identical to the background when the background is correct, while our algorithm catches the object independently of the background, because we use the static part to find objects.
Case iv. There is more than one moving object in the image series. This is similar to case iii: when the background is found correctly, the classic algorithm obtains the correct objects. Nevertheless, it is hard to construct a correct background when more than one object is present, so the capture rate of the classic algorithm drops as the moving objects become more numerous.
Considering all of the above, our algorithm always catches the correct moving objects, whereas the capture rate of the classic algorithm depends on the correctness of the background. Theorem 1 is thus proved.

Theorem 2. Algorithm b and the classic algorithm have the same order of time complexity, and the computation time of Algorithm b exceeds that of the classic algorithm by less than csm, where n is the item-count threshold of the list, m is the total number of images, s is the number of pixels per image, and c is a constant.
Proof:
Ignoring the time for background construction, we can easily see that the classic algorithm performs sm comparisons, with s the number of pixels in each image. The computation time of Algorithm b divides into the time of Step 2 and that of Step 3.
Obviously, the computation time of Step 2 is sm, since it makes s comparisons for each image. In Step 3, we account for each IF separately.
The marking in the first IF is the same as in the classic algorithm; this cost therefore occurs in both algorithms and can be ignored. The replacement takes fewer than n operations.
Evaluating the second IF takes fewer than s operations, and the union takes fewer than s operations, since only moving-object pixels are involved.
The last IF finds a new moving object in the image. Its marking is ignored for the same reason as in the first IF; the linking takes one operation, and the item-count check takes one operation if we keep an item recording the number of items in the list.
Summing up, Step 3 spends fewer than n + 2s + 2 operations for each image, i.e. fewer than (n + 2s + 2)m operations in total. Adding the costs of Steps 2 and 3, we conclude that Algorithm b spends fewer than (n + 2s + 2)m operations more than the classic algorithm. As n is usually small, Theorem 2 is proved.

Moreover, this ignores the background-construction time of the classic algorithm. In fact, background construction is an important process in the classic algorithm and takes a lot of time, so the extra time Algorithm b spends to avoid constructing the classic background is worthwhile.
B. Capture of the Moving Object
Using the difference between the background and the current frame image, we can obtain the moving objects to segment. We must separate the remaining points from the moving objects, because some residual background points may exist in the segmented image. Of course, a suitable threshold can be chosen from the distribution of the image gray levels; in this paper, given the histogram characteristics of these images, we use an automatic threshold segmentation method.
We assume that the gray scales of all images are 1, 2, ..., L and that the number of pixels with gray level i is n_i, so the total number of pixels is N = n_1 + n_2 + ... + n_L. The probability of gray level i is then P_i = N_i / N = n_i / N, with P_i >= 0 and sum_{i=1}^{L} P_i = 1.
When we divide the whole image into two classes C_0 and C_1 by a threshold gray level k, in other words a pixel belongs to C_0 when its gray level lies in [1, k] and to C_1 when its gray level lies in [k+1, L], the probabilities of the two classes are

    w_0 = P(C_0) = sum_{i=1}^{k} P_i   and   w_1 = P(C_1) = sum_{i=k+1}^{L} P_i = 1 - w_0.

The mean values of the two classes are

    u_0 = sum_{i=1}^{k} i P_i / w_0 = u(k) / w(k)   and   u_1 = sum_{i=k+1}^{L} i P_i / w_1 = (u - u(k)) / (1 - w(k)),

where w(k) = sum_{i=1}^{k} P_i, u(k) = sum_{i=1}^{k} i P_i, and u is the mean gray level of the whole image, defined below.
Then, the mean of the whole image is

    u = u(L) = sum_{i=1}^{L} i P_i,

and it is easy to verify that w_0 u_0 + w_1 u_1 = u and w_0 + w_1 = 1. The variances of the two classes are

    sigma_0^2 = sum_{i=1}^{k} (i - u_0)^2 P_i / w_0   and   sigma_1^2 = sum_{i=k+1}^{L} (i - u_1)^2 P_i / w_1.

We define the between-class variance

    sigma_B^2 = w_0 (u_0 - u)^2 + w_1 (u_1 - u)^2 = w_0 w_1 (u_0 - u_1)^2,

the within-class variance sigma_w^2 = w_0 sigma_0^2 + w_1 sigma_1^2, and the total variance sigma_T^2 = sum_{l=1}^{L} (l - u)^2 P_l. The best threshold is then

    k* = arg max_{1 <= k <= L} sigma_B^2(k),

obtained by maximizing the between-class variance. After that, we confirm the moving region of the initial video images.
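The threshold selection above maximizes the between-class variance, i.e. the classical Otsu method. A compact histogram-based sketch (the function name and `levels` parameter are our choices; gray levels are indexed from 0 here rather than 1):

```python
import numpy as np

def otsu_threshold(image, levels=256):
    """Return the k maximizing the between-class variance
    sigma_B^2(k) = w0 * w1 * (u0 - u1)^2."""
    hist = np.bincount(image.ravel(), minlength=levels).astype(np.float64)
    p = hist / hist.sum()                 # P_i, sums to 1
    w = np.cumsum(p)                      # w(k) = sum_{i<=k} P_i
    u = np.cumsum(p * np.arange(levels))  # u(k) = sum_{i<=k} i*P_i
    u_total = u[-1]                       # mean of the whole image
    denom = w * (1.0 - w)
    # sigma_B^2 in the equivalent form (u_total*w - u)^2 / (w*(1 - w));
    # guard the degenerate splits where one class is empty
    with np.errstate(divide="ignore", invalid="ignore"):
        sigma_b = (u_total * w - u) ** 2 / denom
    sigma_b[denom == 0] = 0.0
    return int(np.argmax(sigma_b))
```

Pixels with gray level above the returned k form one class (here, the moving region candidates) and the rest form the other.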
C. Denoising the Moving Region
As is known, errors occur in the caught object because the movement and environment are usually unpredictable. For example, intense movement can produce sharp or fuzzy borders, and strong light reflection and elastic deformation can cause bad captures. In other words, when we extract the moving object from a video image, we find that it is often caught with sharp or fuzzy borders and salt-and-pepper regions.
So, in order to catch a clear and complete moving object, we need to refine its body further. In our work, we use mathematical morphology to modify the object: specifically, dilation to fill small holes and erosion to remove isolated noise. Applying them in a particular order, we obtain the opening and closing operations derived from erosion and dilation.
Writing B_z for the translation of the structuring element B by z, the dilation X ⊕ B is the set of all points z for which B_z and X are not disjoint. Dilation is an expanding process: its result enlarges the target object and shrinks empty holes, so we can use dilation to fill holes in the moving object and restore its original connected domain.
Conversely, the erosion X ⊖ B is the set of all points z for which B_z is a subset of X. Erosion is a shrinking transform, since its result is a subset of X; it is a process that removes boundary points, which enlarges holes and shrinks the object, so it is an effective way to remove isolated noise points.
In general, both erosion and dilation are irreversible operations: the result of transforming X first by erosion and then by dilation is usually not X. Denoting this result by X_B, we have

    X_B = (X ⊖ B) ⊕ B = ∪ {B_z : B_z ⊆ X}.

This morphological transformation is called the opening operation. Since X_B is the union of all translations B_z of B contained in X, opening always smooths the object's border; it removes small sharp tips and isolated points, sharpens angles, disconnects narrow gaps, and removes thin protrusions.
Conversely, the closing operation

    X^B = (X ⊕ B) ⊖ B = ( ∪ {B_z : B_z ⊆ X^c} )^c

is the opposite of opening: X is transformed first by dilation and then by erosion. In this case X^B is the intersection of the complements of all translations B_z of B lying outside X. Like opening, closing smooths the border; it removes narrow gaps and long thin blanks, and it removes small holes and fills ruptures in the border.
By applying several erosions and dilations to the caught object in succession, we obtain a complete moving object from the video images; the captured object can then be used in further processing.
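The four morphological operations can be sketched directly from the set definitions above. This is a slow but self-contained NumPy illustration (in practice a library routine would be used; names and the zero-padding at the border are our choices):

```python
import numpy as np

def dilate(x, se):
    """Binary dilation: 1 where the structuring element hits the set."""
    x, se = np.asarray(x, bool), np.asarray(se, bool)
    h, w = se.shape
    pad = np.pad(x, ((h // 2,), (w // 2,)), mode="constant")
    out = np.zeros_like(x)
    for i in range(x.shape[0]):
        for j in range(x.shape[1]):
            out[i, j] = (pad[i:i + h, j:j + w] & se).any()
    return out

def erode(x, se):
    """Binary erosion: 1 where the structuring element fits inside the set."""
    x, se = np.asarray(x, bool), np.asarray(se, bool)
    h, w = se.shape
    pad = np.pad(x, ((h // 2,), (w // 2,)), mode="constant")
    out = np.zeros_like(x)
    for i in range(x.shape[0]):
        for j in range(x.shape[1]):
            out[i, j] = (pad[i:i + h, j:j + w] >= se).all()
    return out

def opening(x, se):   # erosion then dilation: removes isolated noise points
    return dilate(erode(x, se), se)

def closing(x, se):   # dilation then erosion: fills small holes
    return erode(dilate(x, se), se)
```

Opening a noisy object mask removes specks smaller than the structuring element while preserving the object; closing fills pinholes inside it, matching the denoising roles described above.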
IV. EXPERIMENT
In this paper, we give an algorithm to catch moving objects in the image series of a video. This is not only a component of object recognition and judgment algorithms, but also an independent algorithm in its own right. So, to validate the effectiveness and correctness of our algorithm, we do not have to run the experiment inside a whole video-capture pipeline. We use difference-method algorithms to catch the moving object, then collect the important parameters of both the difference-method algorithm and our algorithm, including the caught object and the computation time. Comparing the two algorithms on several different image series shows that our algorithm is better.
A. Capturing Objects in Video Images
The comparison of the two algorithms when catching a normal moving object can be seen in the figures below, with Figure 2 containing the original video images; the corresponding parameters are listed in Table 1.
Figure 2 shows some images from a traffic-accident video taken from the website www.youku.com. For better visualization, the images are not consecutive: we chose images at equal intervals from a part of the video.
In this figure we can see two moving objects: one is a motorbike, the other is a car. We use both the classic algorithm and Algorithm b to catch the moving objects in this figure.
In fact, this video contains the whole background by itself, as shown in Figure 3, so the classic algorithm does not have to find the background and obtains a good detection result, shown in Figure 4.
Figures 4.x.1-4.x.4 show the processing of Figure 2.x (x = a-d). In these figures, 4.x.1 and 4.x.3 are the processes of Algorithm b, while 4.x.2 and 4.x.4 are the processes of the classic algorithm; 4.x.1 and 4.x.2 show the capture of the moving objects, and 4.x.3 and 4.x.4 their extraction.
In Figure 4, both the classic algorithm and Algorithm b catch the correct moving objects. However, both the capture time and the capture quality are better in Figures 4.x.2 and 4.x.4, which means the classic algorithm performs better than Algorithm b here. The advantage of Algorithm b is that it spends less time than the classic algorithm when creating the moving orbit of the objects.
This is because the classic algorithm uses the CDM to detect moving objects. All images are compared to the
Fig.2. Original video images, ordered from a to d
Fig.3. Background of the video images
background. As is known, the background is easy to construct in this video; in fact, the first image of the image series is the whole background, so the classic algorithm benefits in this case.
When we create the orbit of the moving objects, Algorithm b benefits, because it compares the moving objects across all images, so the moving orbit of the objects is easy to find.
Table 1 lists parameters such as the capture time and the pixel count of the moving objects, showing the effectiveness of the two algorithms when catching moving objects.
When we remove the background image from the video, we obtain different results, also given in Table 1; we do not show the capture figures because they are similar to Figure 4.
We now study the parameters of this video capture in Table 1, where some differences between the classic algorithm and Algorithm b can be seen.
First, the pixel counts of the two moving objects are similar under the two algorithms. The pixel counts in the four images all increase from a to c. In image d, parts of the two objects overlap and the motorbike is behind the car, so part of the motorbike cannot be seen; this is why the pixel count of the motorbike decreases sharply in image d. At the same time, the captured pixel counts of Algorithm b exceed those of the classic algorithm, which means the classic algorithm is slightly better than Algorithm b in this condition.
Secondly, the capture time with background is also better for the classic algorithm. In particular, as the video proceeds, the time of the classic algorithm decreases, because the background has already been rebuilt. For Algorithm b, the times on images a-d are all similar, because we compare images against the images in the list and do not rebuild the background. Algorithm b is therefore a steady algorithm whether a background exists or not, whereas the classic algorithm is not: its time with background differs greatly from its time without background. Looking at the capture times without background, we see that the rebuilding time increases, and so does the comparison time.
Finally, comparing the times with and without background, we see that the classic algorithm depends heavily on the background, since its capture time increases greatly when the background is removed: the rebuilding time grows and exceeds the time of Algorithm b. Hence, when catching moving objects in a video with a tiny background, the classic algorithm is unsuitable. We therefore ran an experiment catching moving objects in a video with a tiny
Fig.4. Capture and extraction of the moving objects (panels 4.a.1-4.d.4)
TABLE I.
PARAMETERS OF THE CLASSIC ALGORITHM AND ALGORITHM B
Image size: 400x300 = 120000 pixels.

                                             Classic Algorithm   Algorithm b
Pixels of the caught moving car
(with background)            image a              10864              10913
                             image b              10871              10912
                             image c              10886              10927
                             image d              10890              10938
Pixels of the caught moving motorbike
(with background)            image a               2706               2724
                             image b               2731               2775
                             image c               2749               2790
                             image d               2282               2197
Time of the caught moving object
(with background)            image a             1.28 ms            1.31 ms
                             image b             1.29 ms            1.31 ms
                             image c             0.54 ms            1.30 ms
                             image d             0.55 ms            1.31 ms
Time of the caught moving object
(without background)         image a             2.26 ms            1.31 ms
                             image b             2.38 ms            1.30 ms
                             image c             1.45 ms            1.31 ms
                             image d             1.44 ms            1.31 ms
background. The result of the experiment is shown in Figures 5 and 6.
In this video, we use two books for our experiment. Book a is TCP/IP Illustrated, Volume 1: The Protocols by W. Richard Stevens, published by China Machine Press and Addison-Wesley. Book b is Object-Oriented Software Engineering by Stephen R. Schach, published by China Machine Press and McGraw-Hill.
We recorded the video by moving book a across book b; in this process, the video shows only a little background.
Figure 5 shows some images from the video, chosen at equal intervals just as in Figure 2. We put book a onto book b and move book a across book b.
Figure 6 shows the objects captured by both the classic algorithm and Algorithm b: Figures 6.*.1 are the captures by Algorithm b and Figures 6.*.2 those by the classic algorithm. In Figure 6 the advantage of Algorithm b is obvious, because the classic algorithm cannot determine the background. Once it determines an incorrect background, the subsequent captures of moving objects are all jumbled in the upcoming images, as can be seen throughout Figures 6.*.2.
Figures 6.*.1 show that Algorithm b is immune to the effect of the background: although it may catch some incorrect objects in some of the beginning images, it corrects itself in the following images. So when the background is tiny, Algorithm b is more effective than the classic algorithm.
V. CONCLUSION AND FURTHER WORK
In this paper we give a method to catch a moving object in the image series of a video. The method provides an algorithm that catches moving objects while avoiding background construction. First, we determine the moving region by the difference between successive images in the series; secondly, we track the moving objects and catch them; finally, we denoise the caught moving object. The experiment shows that our method is more effective than the classic algorithm at a similar computation time.
Next, we will work in two directions. The first is extending the capture method to distributed systems in order to improve the capture time; the second is applying this method in detection systems to catch selected objects.
ACKNOWLEDGMENT
The authors wish to thank the anonymous reviewers
for their helpful comments in reviewing this paper. This
work was supported by National Natural Science
Foundation of China (No. 60973091).
REFERENCES
[1] Chellappa R, Wilson C, Sirohey S. Human and machine
recognition of faces: A survey. Proceedings of the IEEE,
1995, Vol.83, No.5, pp: 705-740
[2] Zhao W, Chellappa R, Rosenfeld A, Phillips P J. Face
recognition: A literature survey. ACM Computation
Survey, 2003, Vol.35, No.4, pp: 399-458
[3] Li S Z, Jain A K. Handbook of Face Recognition. New
York: Springer, 2005, pp.1-371
[4] Liu X M, Zhang Y J, Than H C. A new Hausdorff distance
based approach for face localization. Sciencepaper Online,
2005, 200512-662(1-9)
[5] Srikantaswamy R, Samuel R D S. A novel face segmentation algorithm from a video sequence for real-time face recognition. EURASIP Journal on Advances in Signal Processing, 2007, 2007: 1-6

Fig.6. Capture and extraction of the moving objects (panels 6.a.1-6.d.2)
Fig.5. Original video images, ordered from a to d
[6] K.Sung and T.Poggio. Example-Based Learning for
View_Based Human Face Detection, IEEE Tran. On
Pattern Analysis and Machine Intelligence, 1998, Vol. 20
No.1, pp: 39-51
[7] Rowley H A, Baluja S, Kanade T. Neural network-based
face detection. IEEE Trans. on PAMI, 1998, Vol.20, No.1,
pp: 23~38
[8] Massimo T, Manuele B, Enrico G. Dynamic face
recognition: From human to machine vision. Image and
Vision Computing. 2009, Vol.27, No. 3, pp: 222-232.
[9] Junius Andr F.Balista, Maricor N.Soriano, Caesar
A.Saloma. Compact time-independent pattern
representation of entire human gait cycle for tracking of
gait irregularities. Pattern Recognition Letters. 2010,
Vol.31, No. 1, pp: 20-27.
[10] Tomokazu S, Naokazu Y. Efficient hundreds-baseline
stereo by counting interest points for moving omni-
directional multi-camera system. Journal of Visual
Communication and Image Representation. 2010, Vol.21,
No.(5-6), pp: 416-426.
Weina Fu, female, instructor, born in 1982; her research interests include pattern recognition and bioinformatics.

Zhiwen Xu, male, born in 1969, PhD and professor; his research interests include computer graphics and bioinformatics.

Shuai Liu, male, born in 1982, PhD and instructor; his research interests include fractals, pattern recognition and computer graphics.
Skeletonization of Deformed CAPTCHAs Using
Pixel Depth Approach

Jingsong Cui, Lu Liu, Gang Du, Ying Wang, and Qianqi Guan
School of Computer, Wuhan University, Wuhan, Hubei, China
Email: cuijs@qq.com, liulu@whu.edu.cn, melephas@gmail.com, violetwangy91@gmail.com, gqqd@qq.com



Abstract: CAPTCHA is a standard security technology that presents tests to tell computers and humans apart. Nowadays the most widely deployed CAPTCHAs are text-based schemes, which rely on sophisticated distortion of text images aimed at rendering them unrecognizable to state-of-the-art pattern recognition methods. Generally, the skeletonization of characters is acknowledged as one of the most significant parts of character recognition: the skeleton, which keeps the topology information while reducing the computational complexity, is an excellent structural feature that is robust to noise and deformation. In this paper, a depth-based approach is proposed to locate the skeleton points. To strike a balance between efficiency and robustness against distortion, three fault-tolerance techniques are applied in the extraction process; then, in the amendment stage, noise patterns are used to filter redundant points. Experiments are conducted and positive results are achieved, showing that the depth-based skeletonization scheme is applicable to the widely used CAPTCHA images, and that the skeleton is robust against rotated, distorted or conglutinated characters.

Index Terms: deformed CAPTCHA, skeleton, pixel depth, distortion, symmetry

I. INTRODUCTION
CAPTCHA (Completely Automated Public Turing Test to Tell Computers and Humans Apart) is a program that generates and grades tests that are human solvable but intended to be beyond the capabilities of current computer programs [1]. This technology is now almost a standard security mechanism for defending against undesirable or malicious Internet bot programs, aiming to improve server systems and user information security [2]. CAPTCHAs are divided into three categories: OCR-based, visual non-OCR-based, and non-visual. To protect system security, non-OCR-based CAPTCHAs have been proposed, for instance moving-object identification and tracking problems that refer to the biological motion vision model [3,4]; at present there are no effective methods for attacking 3D dynamic CAPTCHAs. The most widely used CAPTCHAs are the so-called text-based schemes, which typically require users to solve a text recognition task. To improve security, a great deal of interference is added to the CAPTCHA images. There are three types of interference: foreground, background, and interference upon the character itself. Foreground interference includes noisy points and interfering lines; background interference includes background color and texture; interference upon the character itself includes font changes, rotation, distortion and conglutination. It should be pointed out that there are many mature techniques dealing with foreground and background interference that achieve high recognition rates; however, most methods fail to recognize characters with deformation or affine transformation upon the character itself. The well-known reCAPTCHA and Google CAPTCHA are two classical examples that apply only interference upon the character itself, rather than interference on the foreground or background. Recognizing these CAPTCHAs poses a great challenge to Artificial Intelligence, so in this paper we focus on these deformed CAPTCHAs.
Generally, the character skeleton plays a significant role in character recognition. As an important shape feature for pattern recognition and classification, the skeleton, which efficiently reduces the computational complexity while keeping the topology and geometric properties of the shape, is the collection of pixels passing through the center of the data cloud [5]. For deformed CAPTCHAs, this feature is more robust and effective than a raster of pixels along the contour of the shape. This representation is particularly effective at extracting relevant character features for optical character recognition [8,15,16]. At present, it covers a wide range of applications, such as graphics recognition, handwriting recognition, and signature verification [22]. What is more, in the recognition process the skeleton lays a solid foundation for follow-up recognition tasks based on pixel weight, such as angle detection, segmentation and template matching.
In this paper, a skeletonization method based on pixel depth, i.e. the radius of the largest inscribed circle of a certain stroke area, is proposed. There are two main steps: a) extraction of the primary skeleton points using tolerance techniques; b) amendment processing using noise patterns. As a result, the skeleton of a character can be extracted precisely. Experiments are conducted on reCAPTCHA, Google CAPTCHAs and other classical deformed CAPTCHAs. Positive results are achieved, showing that the depth-based skeletonization scheme is applicable to the widely used CAPTCHA images, and that the skeleton is robust against rotated, distorted or conglutinated characters.
doi:10.4304/jmm.6.6.526-533
II. RELATED WORK
In the past several decades, a great many skeletonization techniques have been developed. The existing skeletonization algorithms fall into three categories, namely the distance skeleton based on symmetry analysis, the iterative edge-point erosion method, and the non-iterative method [6,7,8]. H. Blum [9] defined the medial axis and the medial axis function of an object to represent a certain shape; this distance skeleton gave rise to many newer skeletonization algorithms, such as the triangulation scheme and obtaining the skeleton from contour symmetry. The iterative edge-point erosion technique uses a sliding window (e.g. a 3x3 window) that is moved over the entire image with a set of rules applied to the contents of the window. Simon [10] partitioned the character stroke into regular and singular regions: the singular regions correspond to ends, intersections and turns, while the regular regions cover the other parts of the strokes. The regularity-singularity analysis uses the constrained Delaunay triangulation to separate the two kinds of sections, then extracts the primary skeleton points in the regular regions first and performs amendment processing in the singular regions. The non-iterative approach is commonly used to extract the skeleton of simple shapes.
The known thinning algorithms are usually feasible for well-defined digital line patterns; under affine transformation, conglutination and rotation, the above skeletonization algorithms lead to disappointing results. What is more, the implementation of the traditional symmetric and non-symmetric analyses often suffers from complicated computation while finding the symmetric pairs indirectly from the boundaries of the character strokes, in both continuous and discrete domains.
III. EXTRACTION OF PRIMARY SKELETON
A. Motivation
The skeletonization algorithm for CAPTCHAs should meet the following requirements:
a) keep the topology and the geometric properties of
characters
b) thin and well centered
c) fast and efficient
d) sensitive to small CAPTCHA characters
e) robust against noise and affine transformation
CAPTCHA characters are deliberately and artificially
rotated, distorted or conglutinated, so it is hard to tell
regular and singular regions apart. What's more, singular
regions and interference are closely bound with each
other, and a change in one may lead to variance in the other.
For instance, as Figure 1(a) shows below, the
conglutination of two adjacent characters links two
endpoints into one junction. In Figure 1(b), a point in the
regular region of a stroke turns into an intersection point
because of conglutination. In Figure 1(c), the original
straight line, which belongs to the regular regions,
becomes curved and oblique.


Figure 1: Three situations that happen in CAPTCHAs. (a) two
endpoints become one junction; (b) a point in a regular region turns into an
intersection point; (c) a straight line becomes curved and oblique.
B. Definitions
A binary image is represented as a two-dimensional
matrix [f] whose (x, y)th element is the pixel f(x, y), where x
and y denote spatial coordinates, and f(x, y) = 1 and f(x, y)
= 0 represent a white pixel and a black pixel,
respectively.
Definition 1 (Pixel Depth) Let the location of the
pixel be a center and r be a radius. Considering a black
pixel (f(x, y) = 0) in a stroke, we may draw a circle with
radius r, starting from a low value. As the radius r
increases by a certain step, the circle enlarges. The depth
of the pixel is the maximum radius of a circle that is
tangential to any of the boundary curves of the stroke or
intersects the boundary for the first time. The depth of a
white pixel (f(x, y) = 1) is zero. Every skeleton point has a
triple:

(point, depth, context) = (P, d, OP ∪ OQ)    (1)

Here, depth(P) = r_max and context(P) = {P' | P' within two radii of P}.
P_k is a skeleton point if

depth(P_k) = max{depth(P), P ∈ context(P_k)}    (2)

The skeleton of the character is the set of skeleton points:

K = {K_i | depth(K_i) = max{depth(P_i), P_i ∈ context(K_i)}}    (3)

It is conceivable that after scanning the whole image,
each pixel owns its depth. The original two-dimensional
image together with the depth of each pixel constitutes a
three-dimensional image, which appears to form a hilly
ground. The character skeleton is just like the ridge of the
hills. As Figure 3 shows, the skeletonization procedure is
just like a hiking travel, and we start from the highest
peak of the mountain and go downhill following the
mountain ridge forward.






Figure 2: The process of the expanding of circles and locating the skeleton point O.
Figure 3: The depth of each pixel of the letter E.

Definition 2 (White Pixel Heap) A white pixel heap is a
circular arc, part of the circular ring in each rotation, which
JOURNAL OF MULTIMEDIA, VOL. 6, NO. 6, DECEMBER 2011 527
2011 ACADEMY PUBLISHER
only contains white (1) pixels, and all the pixels appear
continuously and stably. Continuity means the white points
lie continuously without any disconnection or gap;
stability requires the radian of the arc to be large enough so
that it won't be judged invalid in the fault tolerance
processing.
C. Idea
Let the location of any black pixel be a center and r be
a radius, and draw a circle with radius r which starts from
a low value. As the radius r increases by a
certain step, the circle enlarges. If the circle is still inside
the stroke, the radius keeps increasing by a step of 1
pixel until the circle reaches (intersects or is tangent to) at least
one side of the contour. We call this first reach the first-time
touch. The depth of the center pixel is the value of the last
detecting radius. Whether the pixel is a skeleton point is
determined by the condition of the first-time touch according to the
following rules:
1. If the first-time touch reaches (intersects or is tangent to)
only one side of the stroke contour, i.e. there is only one white
pixel heap on the circular ring, the pixel is a non-skeleton
point.
2. If the first-time touch reaches (intersects or is tangent to) two or
more sides of the stroke contour, i.e. there are two or
more non-conterminous white pixel heaps on the ring, the
pixel is a skeleton point.
There is another problem, which differs between the
continuous and the discrete domain: given a pixel and a
radius, how do we draw a circle? Which pixels around the
center are on the circle? As we work on a binary image,
trigonometric functions are not appropriate, as they fail to
locate pixels precisely. Therefore, we use relative
coordinates to record the relative positions of the points on
the circle. Regard the detecting center as the origin of the
coordinate system; the whole coordinate plane is divided into
four quadrants. Given a radius starting value and a
detecting step (an integer), a series of circles can be
formed. In this way, we can exactly locate the points on
the circular ring.
Points (x, y) on the circle of radius i satisfy the condition:

(i - 1)² < x² + y² ≤ i²    (4)
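Condition (4) can be checked directly to enumerate a discrete ring; a minimal Python sketch (the function name is ours, not from the paper):

```python
def ring_pixels(i):
    """Relative coordinates (x, y) on the discrete circle of radius i,
    i.e. the points satisfying condition (4): (i-1)^2 < x^2 + y^2 <= i^2."""
    return [(x, y)
            for x in range(-i, i + 1)
            for y in range(-i, i + 1)
            if (i - 1) ** 2 < x * x + y * y <= i * i]
```

For i = 3 this enumeration yields 16 ring pixels, consistent with the circumference of 16 quoted for r = 3 later in the text.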

With the pixel depth, the essential distinction between
skeleton points and non-skeleton points is found. There is
no need to find symmetric point pairs in a local area as
some of the traditional algorithms do. By detecting the
maximum radius to which a circle can extend, the symmetric
point can be positioned directly. With the white pixel
heap, the state of the circle generated in each rotation can
be shown, as well as the relationship between the circle
and the contour, i.e. intersecting or tangent.
Now the most crucial task is to compute the number of
white pixel heaps in each rotation. How do we retain the
skeleton points while leaping over interference and singular
points? We bring in the following fault tolerance techniques
to generalize the theory and achieve this goal.
D. Fault Tolerance Technique
From a practical point of view, the thinning method in the
two-dimensional discrete domain differs from the
method in the continuous domain. Pixels are individual
points distributed discretely in a picture, and each pixel has
its own unique coordinate. Moreover, given that
we aim at recognizing the character later on based
on the extracted skeleton, human visual perception is not
of great significance compared with a skeleton which
keeps the topology and the geometric properties of the
characters. So we generalize the skeleton theory and use
three techniques to tolerate pixel faults or irregularities as
follows:
i. Filter the unstable regions (white or black) in each
circle generated in each rotation around a certain center.
ii. Tolerate at most two skeleton points existing in one
local area, whose depths simultaneously increase
to the local maximum depth.
iii. Tolerate a white pixel heap count of 1 for a
skeleton point's first-time touch, but the
next time the heap number must be at least 2.
Filtering the unstable circular arcs helps eliminate
some of the singular points and noise that appear
irregularly and unexpectedly. The circular ring
generated in each detecting rotation is divided by
black or white boundary pixels into several consecutive
regions, each of which has the same pixel color. Whether
a region is white or black, it can be in one of two
states: stable or unstable. A
minimum radian threshold α is used to represent the
tolerance of pixel faults, denoting the highest degree
of fault the algorithm permits. If the arc radian is larger
than the threshold, it is a stable region; otherwise it is
unstable and will be filtered in the processing. Points in
a stable region are stable points, and points in an unstable
region are unstable points. The lower α is, the fewer pixels
are required to constitute a stable region, and relatively the
more skeleton points will be extracted at last; the higher
α is, the more pixels are required in a stable region, and
consequently the fewer skeleton points will be extracted.
The factor α is set with regard to the following
quantities:
1. Circumference (c): the number of pixels on the circle.
2. Radius (r): the current detecting radius.
3. An experimentally verified parameter, 2.
According to the values of c and r, the minimum radian
of a valid white pixel heap can be calculated:

α = c / (2 + r)    (5)

Statistically, the arc length in the continuous domain is
equivalent to the total number of pixels of the arc in the
discrete domain. Therefore, we may use the pixel
count of an arc region to approximately
represent its arc length, and compute the radian of an arc as:

radian(arc) = count(arc) / r    (6)
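Combining (5) and (6), deciding whether an arc region is stable reduces to comparing its radian with the threshold α. A small sketch following the paper's arithmetic (function names are ours; α = c / (2 + r) is read off the worked example in the text, where c = 16 and r = 3 give α = 3.2):

```python
def min_radian(c, r):
    """Minimum radian threshold of Eq. (5): alpha = c / (2 + r),
    where c is the pixel count of the ring, r the detecting radius,
    and 2 the experimentally chosen parameter."""
    return c / (2 + r)

def is_stable(count, c, r):
    """Eq. (6): an arc of `count` pixels has radian count / r; it is a
    stable region iff that radian reaches the threshold alpha."""
    return count / r >= min_radian(c, r)
```

With the Figure 4 numbers (c = 16, r = 3), the one-pixel arc EF has radian 1/3 ≈ 0.33 < α = 3.2, so it is filtered out as unstable.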
The radian equals count divided by r. If this radian is lower than the threshold, the region is in an unstable state and should be marked invalid by eliminating the two boundary points and connecting the two adjacent regions together on both sides:

state = { stable region,    radian ≥ α
        { unstable region,  radian < α        (7)

As Figure 4 shows, the circle's center is O, r = 3, and count(EF) = 1. The circumference equals 16, and α = 16 / (2 + 3) = 3.2. The ring is divided into six segments, i.e. black regions with consecutive black (0) pixels and white regions with consecutive white (1) pixels. Without any fault tolerance techniques, the white pixel heap set is W = {BC, DE, FA} and the black pixel heap set is B = {AB, CD, EF}. The number of white pixel heaps equals the number of elements of set W, Nw = 3. However, the arc EF, which only contains one pixel, is apparently not a stable region. For EF we can calculate radian(EF) = count(EF) / r = 1 / 3 = 0.33 < α, so we eliminate the two boundary points and connect the two adjacent regions together. The new segmentation is W = {BC, DA}, B = {AB, CD}, and the amended Nw = 2.

Figure 4: Filtering the unstable region EF.

In the discrete domain, the relationship between the current detecting radius and the stroke width is uncertain. There exist two circumstances:
(1) There is only one symmetric point in the middle of a local area, which has the maximum depth and can form a circle simultaneously reaching the stroke contour on two or more sides.
(2) There exist two symmetric points located on the left and right side respectively. The left point's first-time touch only reaches the left contour, while the next time it reaches both sides. The right point conducts in a similar way.

As for the first circumstance, the skeleton point can be located accurately by comparing the pixel depth. But the second circumstance seems to be a little troublesome. If the skeleton is constrained to those points whose first-time touch reaches two sides, then it will fail to get even one skeleton point in many local areas, resulting in a skeleton with lots of disconnected and scattered points. Furthermore, as mentioned above, it is not necessary to thin the character to a delicate single-pixel stream, which will definitely increase computation complexity. Hence, we generalize the theory of skeleton, permitting two
poin
skel
pote
circl
beco
skel
skel
any
In
dual
ame
Th
are:
i.
ii.
Th
patte
ame


Th
follo
It
valu
case
gene
dete
the i
Algo
Inpu
Outp
Step
nts of equal s
eton points. A
ential points ar
le. If this tim
omes a skeleto
eton point. W
eton can be e
of its topolog
n the first ext
l adjacent sk
ndment step,
IV
he two majo
When the
of the left
considered
we need to
skeleton po
When the
irregular p
longer con
need to elim
he noise patte
erns are divid
nds a specific
hough experi
owing parame
a) radius sta
b) radius in
c) maximum
d) minimum
should be po
ue of detecting
es the width o
erally no wi
cting radius v
integrity of de
orithm 1: (Sk
ut parameter:
a) Preproce
b) Absolute
put:
The primary
ps:
status and sim
After the firs
re given anoth
me it touches
on point. Othe
With the abo
extracted com
ical structure.
traction step,
keleton pixel
we will remov
V. AMENDMEN
r operations
width of a str
t and right si
d as the skelet
o detect it and
oint.
stroke is in
parts, the extra
ntinuous or sm
minate redund
erns are displa
ded into sever
c situation:
Figure 5: The no
V. ALG
ence and stat
eters:
arting value: r
ncreasing step:
m detecting ra
m radian thres
ointed out tha
g radius 10, f
of character s
ider than 20
varying from
etecting.
keletonization
essed text-base
e coordinates s
y skeleton of im
milar function
st-time touch
her chance to
both sides o
erwise, it is d
ove three tec
mpletely witho
.
we allow the
s. And in th
ve these redun
NT PROCESSIN
of amendmen
raight line is
ide in the ce
ton points. In
d only keep on
singular regi
acted skeleton
mooth. In this
dant points.
ayed in the Fi
ral groups an
oise patterns
ORITHM
tistical verific
r
0
= 1
: r
s
= 1
adius: r
m
= 10
shold: o = c /
at here we se
for the reason
stroke is limi
0. Therefore
1 to 10, whic
based on pixe
ed CAPTCHA
sets of circles
mage
ns both to be
failed, these
form a larger
of contour, it
definitely non-
chniques, the
out damaging
e existence of
he following
ndant pixels.
G
nt processing
not odd, both
enter area are
this situation,
ne side as the
ions or other
n loci may no
situation, we
gure 5. These
nd each group
cation, we set
0
(2*r)
t the maxima
n that in most
ited, which is
we set the
ch will assure
el depth)
A image
e
e
r
t
-
e
g
f
g
g
h
e
,
e
r
o
e
e
p

t
a
t
s
e
e
1) Scan the image. If the current pixel is white (f(x,
y) = 1), continue scanning the next pixel; if the
pixel is black (f(x, y) = 0), jump to step 2.
2) Let the location of the pixel be a center point;
the circle radius ranges from 1 to 10. Calculate
the white pixel heap number h.
3) If h is greater than or equal to 2, the current pixel can be
considered as a primary skeleton point; jump
to step 1.
4) If h is equal to 1 for the first time, set a flag,
expand the circle and calculate the next h, then jump
to step 2; if h is equal to 1 for the second time,
ignore this pixel and jump to step 1.
5) If h equals zero, expand the circle and jump to step
2.
6) If the current detecting radius exceeds the maximum
detecting radius, jump to step 1.
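Under the stated conventions (f(x, y) = 1 white, f(x, y) = 0 black), the control flow of Algorithm 1 can be sketched as follows. `heap_count` stands in for Algorithm 2, and, per rule 2 of Section III, h ≥ 2 is treated as the skeleton condition; all names here are ours, not the paper's code:

```python
R_MAX = 10  # maximum detecting radius

def primary_skeleton(img, heap_count):
    """Return the set of primary skeleton points of a binary image.

    img        -- list of lists, 1 = white (background), 0 = black (stroke)
    heap_count -- heap_count(img, x, y, r): number of white pixel heaps on
                  the ring of radius r centred at (x, y) (Algorithm 2)."""
    skeleton = set()
    for x in range(len(img)):
        for y in range(len(img[0])):
            if img[x][y] == 1:          # white pixel: skip
                continue
            ones_seen = 0               # fault tolerance iii: one h == 1 allowed
            for r in range(1, R_MAX + 1):
                h = heap_count(img, x, y, r)
                if h >= 2:              # touches two or more sides: skeleton point
                    skeleton.add((x, y))
                    break
                if h == 1:
                    ones_seen += 1
                    if ones_seen == 2:  # second failed first-time touch: reject
                        break
                # h == 0: circle still inside the stroke, keep expanding
    return skeleton
```

The flag of step 4 is realized here as the `ones_seen` counter, which also bounds the work per pixel by R_MAX ring scans.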
Algorithm 2: (Calculating the white pixel heaps)
Input parameters:
a) Preprocessed text-based CAPTCHA image
b) Current pixel location (x, y) and the detecting
circle radius r
c) Absolute coordinate sets of the circles
Output:
The white pixel heap count of the current pixel
Steps:
1) Obtain the coordinates of the pixels on the current
detecting circle. The coordinates have been
sorted anti-clockwise.
2) Calculate the minimum radian α as the threshold
of a valid white pixel heap.
3) Traverse the detecting circle; when finding a
white pixel area, consider this area as a potential
white pixel heap.
4) Determine whether the current white region is
valid. If its radian is greater than α, then add 1
to the count of white pixel heaps. Jump to step
3.
5) When the traversal is finished, output the count of
white pixel heaps.
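One possible realization of Algorithm 2: the ring is read as a circular colour sequence, split into runs, and each white run is kept only if its radian count / r reaches α. The run-splitting details and all names are our own sketch, not the paper's code:

```python
def white_pixel_heaps(img, x, y, ring, alpha, r):
    """Count valid white pixel heaps on one detecting ring (Algorithm 2 sketch).

    ring  -- relative coordinates on the circle, sorted anti-clockwise
    alpha -- minimum radian threshold for a stable arc
    Assumes 1 = white, 0 = black; out-of-image pixels count as white."""
    h, w = len(img), len(img[0])
    colors = []
    for dx, dy in ring:
        px, py = x + dx, y + dy
        inside = 0 <= px < h and 0 <= py < w
        colors.append(img[px][py] if inside else 1)

    n = len(colors)
    # Rotate so the sequence starts at a colour change (if any), so that a
    # single run is never split across the wrap-around point of the ring.
    start = next((k for k in range(n) if colors[k] != colors[k - 1]), 0)
    colors = colors[start:] + colors[:start]

    heaps = 0
    i = 0
    while i < n:
        j = i
        while j < n and colors[j] == colors[i]:
            j += 1
        if colors[i] == 1 and (j - i) / r >= alpha:  # stable white arc
            heaps += 1
        i = j
    return heaps
```

Treating out-of-image pixels as white matches the intuition that the stroke ends at the image border; a different boundary convention would change the counts near edges.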
Algorithm 3: (Filtering with noise patterns)
Input parameters:
a) The primary skeleton of the image
b) The noise patterns
Output:
The final skeleton image result
Steps:
1) Scan the image. If the current pixel is white (f(x,
y) = 1), continue scanning the next pixel; if the
pixel is black (f(x, y) = 0), jump to step 2.
2) Get the 3*3 sliding window around the current
pixel (f(x-1 : x+1, y-1 : y+1)).
3) If the current sliding window matches a
pattern in the noise patterns, remove the
current pixel and jump to step 1; otherwise keep
the current pixel in the final skeleton result.
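Algorithm 3 is a plain 3×3 pattern match; a sketch assuming the noise patterns are given as 3×3 tuples (the representation and names are ours):

```python
def filter_noise(skel, patterns):
    """Algorithm 3 sketch: remove primary-skeleton pixels whose 3x3
    neighbourhood matches a noise pattern.

    skel     -- binary matrix, 0 = skeleton pixel, 1 = background
    patterns -- set of 3x3 tuples of 0/1 values
    Border pixels are skipped for brevity."""
    h, w = len(skel), len(skel[0])
    out = [row[:] for row in skel]
    for x in range(1, h - 1):
        for y in range(1, w - 1):
            if skel[x][y] != 0:
                continue
            window = tuple(tuple(skel[x + dx][y + dy] for dy in (-1, 0, 1))
                           for dx in (-1, 0, 1))
            if window in patterns:
                out[x][y] = 1   # remove the redundant pixel
    return out
```

Matching against the original `skel` while writing to a copy keeps the result independent of scan order, which is a design choice; an in-place variant would behave like a sequential thinning pass.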
Algorithm 4: (Generating circles)
Input parameter:
Maximum detecting radius r_m = 10
Output:
Relative coordinates on each circle
Steps:
1) Compute the relative coordinates of the edge points on
each circle; the loop variable ranges as shown in TABLE I.
2) Calculate the distance between the origin and the
detected point (x, y) in each loop: D = x² + y².
3) Keep the points whose D is greater than (i - 1)²
and not greater than i².

TABLE I
THE RELATIVE COORDINATES

Quadrant                              | X coordinate        | Y coordinate
X axis (positive) and first quadrant  | x = r : -1 : 0      | y = 0 : r-1
X axis (negative) and second quadrant | x = 0 : -1 : -(r-1) | y = r : -1 : 0
Y axis (positive) and third quadrant  | x = -r : 0          | y = 0 : -1 : -(r-1)
Y axis (negative) and fourth quadrant | x = 0 : r-1         | y = -r : 0
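Algorithm 4 and TABLE I amount to precomputing, for each radius, the ring pixels of condition (4) in anti-clockwise order. A sketch (we sort by `atan2` instead of walking the four quadrants of TABLE I, which yields the same anti-clockwise ordering; names are ours):

```python
import math

def precompute_rings(r_max=10):
    """Algorithm 4 sketch: relative coordinates of each detecting ring,
    radius 1..r_max, sorted anti-clockwise (as Algorithm 2 expects).
    A point (x, y) lies on ring i iff (i-1)^2 < x^2 + y^2 <= i^2."""
    rings = {}
    for i in range(1, r_max + 1):
        pts = [(x, y)
               for x in range(-i, i + 1)
               for y in range(-i, i + 1)
               if (i - 1) ** 2 < x * x + y * y <= i * i]
        pts.sort(key=lambda p: math.atan2(p[1], p[0]))  # anti-clockwise order
        rings[i] = pts
    return rings
```

Since r_m = 10 is fixed, this table can be computed once and reused for every pixel of every image.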

VI. EXPERIMENT
In this section, we present several images to show how
the mechanism works, and then present some CAPTCHA
images and their skeleton results.
Figure 6 shows the mechanism by the skeletonization
of a standard letter B. Figure 6(a) is the extracted
skeleton. We use two colors to represent two kinds of
skeleton points by their white pixel heap numbers in the first-
time touch. The green parts denote the pixels whose first
non-zero white pixel heap count is 2, and the red parts
denote those whose first non-zero white pixel heap count is 1,
which are considered as primary skeleton points via the fault
tolerance techniques. The fault tolerance techniques work
in the situations where the width (sum of black pixels inside
the stroke) of a straight line is not odd and where the line is a
curve. Figure 6(b) shows the first non-zero count of the
white pixel heaps of every pixel in the image. The X axis and
Y axis compose the 2-dimensional image plane, and each
point corresponds to a pixel in the image. The Z axis
represents the first non-zero white pixel heap count.
Figure 6(c, d) shows the depth of each pixel; the locally
deepest pixels are skeleton points. In this example, the
maximum depth of skeleton points is no bigger than 6,
indicating the stroke of the character is no wider than 12.
Early CAPTCHA designs usually used a combination
of deformations and conglutination, which could be
easily thinned and recognized by machine. The
early Microsoft CAPTCHA design in Figure 7 and
Gimpy, created by Carnegie Mellon University and shown
in Figure 8, are typical examples.
Next, Figure 9 displays the skeletons of reCAPTCHA images from
the register page on the MSN website. They are of the previous
style, without any foreground or background interference.
In these figures, the skeleton image keeps the basic
skeleton structures of the characters and is of good visual
quality.

Figure 6: Explains the mechanism by the skeletonization of a standard letter A. (a) the plane figure with depth denoting; (b) the 3D figure of letter A; (c, d) the 3D figure of the skeleton with depth denoting.

Then, we present the Google CAPTCHAs and their skeleton images in Figure 10. As well as the reCAPTCHA, the security methods of the Google CAPTCHA are deformations and conglutination. But the Google CAPTCHA conglutinates these characters unnaturally, leading to a terrible user experience. Unfortunately, the over-deformed Google CAPTCHA, like the third example, cannot be skeletonized well because of artifacts. But the other examples show our proposed approach is efficacious, which demonstrates that the proposed approach is robust to affine transform.

The Tencent CAPTCHA is another example, which can be seen in Figure 11. This kind of CAPTCHA is deployed by QQ, an instant message platform that has the largest quantity of users in the world. They don't have any affine transformation, but assured by a well-designed conglutination method, they are still hard to break. Still, with the fault tolerance techniques our approach extracts the skeleton successfully.

The last example contains three typical singular patterns. Figure 12 illustrates that the extracted skeleton conforms to human perception, without leaving any unwanted artifacts and branches in the singular regions.

All experiments are conducted in Matlab and executed on a Core 2 Duo processor at 2.0GHz under Windows Server 2008 x64. The computational time of these CAPTCHAs and images is shown in TABLE II. The execution time of our depth-based method is much faster than some current methods, for example, the iterative algorithm and the wavelet-based approach. Because our approach scans the CAPTCHA image only once, the computational time is related to the resolution and the depth of strokes. However, the other methods suffer from great computational cost.

Figure 7: Examples of the Microsoft CAPTCHA and their skeleton results.

Figure 8: Gimpy-r, a well-known early scheme designed at Carnegie Mellon University, preprocessed images and their skeleton results. The results show the robustness of our approach.

Figure 9: Two images of reCAPTCHA and their skeleton results.

Figure 10: Two images of the Google CAPTCHA and their skeleton results.

Figure 11: Three images of Tencent CAPTCHA and their skeleton results.

Figure 12: Original image of symbols and the final skeleton.

TABLE II
THE COMPUTATIONAL TIME OF EACH IMAGE

Type              | Image | Resolution | Time (seconds)
Microsoft CAPTCHA | 1     | 114*422    | 0.45
                  | 2     | 108*408    | 0.46
Gimpy             | 1     | 178*444    | 1.52
                  | 2     | 170*420    | 1.48
reCAPTCHA         | 1     | 114*600    | 1.42
                  | 2     | 114*600    | 1.26
Google CAPTCHA    | 1     | 140*400    | 0.54
                  | 2     | 140*400    | 0.40
Tencent CAPTCHA   | 1     | 106*260    | 0.31
                  | 2     | 106*260    | 0.44
                  | 3     | 106*260    | 0.28
Symbol            | 1     | 576*976    | 3.67
                  | 2     | 545*225    | 0.56

VII. FUTURE WORK

In the future, we will work on breaking deformed CAPTCHAs, such as reCAPTCHA and the Google CAPTCHA. Based on the skeleton, we can easily figure out key points and lines, which compose a graph with nodes and edges. In the graph, nodes are the extracted key points, including ends, intersections and inflection points, and edges are the extracted skeleton lines with their own direction and curvature attributes. Through the topological analysis method, we will find a more general solution to recognize the deformed CAPTCHAs based on the skeleton.

VIII. CONCLUSION

In this paper, we proposed a depth-based approach to extract the skeleton from deformed CAPTCHAs. By scanning the CAPTCHA images, skeleton points can be located accurately using the criterion of pixel depth.
Mea
effic
toler
Then
filter
base
to th
skel
cong
prov
reco
T
Natu
2010
the
Natu
6094
Prov
Chen
[1]
[2]
[3]
[4]
[5]
[6]
[7]
[8]
[9]
[10]
[11]
anwhile, in o
ciency and ro
rance techniq
n in the amen
r redundant p
ed skeletoniza
he widely used
eton is rob
glutinated cha
vides a new
ognition.
The research
ural Science
0CDB08603,
Central Uni
ural Science
40028, the O
vince under
nguang Scien
L von Ahn, M
and Compute
No.2, 2004.
El Ahmad, A
The robustne
3rd Europe
EUROSEC'10
Jing-Song Cu
CAPTCHA I
Recognition
Business and
China, May 7
JingSong Cui
Dynamic CA
Workshop o
Science (ETC
2010.
Mian Yang, Z
of low-qualit
curves, Pro
Conference
Baoding, 12-1
Yu-Shuen W
Extraction Us
Visualization
on Vol. 14, is
Gisela Klette
transforms an
processing,
Journal, vol. 1
B. Kgl, A. K
using principa
Intell., vol. 24
H. Blum,
descriptors of
Speech and
Cambridge, M
J. C. Simon, E
to feature de
Netherlands: N
Jeff Yan and
on a Microso
Conference o
2008 pages 54
order to stri
obustness aga
ques have bee
ndment stage
points. Experim
ation scheme
d text-based C
bust against
aracters. Our r
w preprocess
ACKNOWLED
was supporte
Foundation o
the Fundam
iversities No
Foundation o
utstanding Y
Grant No. 2
nce Project of W
REFEREN
M Blum and J
er Apart Auto
Ahmad Salah, Y
ess of a new CA
ean Worksho
0, pp.36-41, 20
ui, Jing-Ting
Implementation
Problem, Inte
d E-Governmen
7th to 9th, 2010.
i, WuZhou Zh
PTCHA Imple
n Education
CS 2010),Wuha
Zhi-Wu Liao ,
ty Chinese ch
oceedings of
on Machine
15 July 2009
Wang; Tong-Y
sing Iterative L
and Computer
sue 4, pp. 926
e. A compara
nd simple def
Machine Grap
12, No. 2, pp 23
Krzyzak. Pie
al curves, IEE
4, no. 1, pp. 59
A transform
f shape, in M
Visual Form
MA: MIT Press,
Ed. Amsterdam
etection in Fro
North-Holland,
Ahmad Salah E
oft CAPTCHA
on Computer a
43 554, Alexan
ike the bala
ainst distortion
en applied in
, we use nois
ments show th
is applicable
CAPTCHA im
t rotated,
research on sk
method for
DGMENT
ed by the Hub
of China und
mental Researc
. 6082022,
of China und
outh Foundat
2009CDA148
Wuhan (2009
NCES
J Langford. T
omatically, CA
Yan Jeff, Mar
APTCHA," Pro
op on Syst
10
Mei, Wu-Zho
n Based on M
ernational Con
nt (ICEE 2010
.
hang, Yang Pen
mentation, 2n
Technology a
an, China, Mar
The skeletoniz
haracters based
the Eighth
Learning and
Yee Lee. C
Least Squares
r Graphics, IEE
936 , July-Au
ative discussio
formations in
phics & Vision
35-256, Feb. 20
ecewise linear
EE Trans.Patter
74, Jan. 2002.
mation for ex
Models for the
m, W. Wath
, pp. 362380,
m, A complem
om Pixels to F
, pp. 229236, 1
El Ahmad, A l
A, Proceedings
and Communica
ndria, VA, Unite
ance between
n, three fault
n the process.
se patterns to
hat the depth-
and efficient
mages, and the
distorted or
keletonization
r CAPTCHA
bei Provincial
er Grant No.
ch Funds for
the National
er Grant No.
tion of Hubei
8, the Youth
50431189).
Telling Humans
ACM, Vol.47,
rshall, Lindsay,
ceedings of the
em Security,
ou Zhang, A
Moving Objects
nference on E-
0), Guangzhou,
ng, A 3-layer
nd International
and Computer
rch 6th to 7th,
zation research
d on principal
International
d Cybernetics,
Curve-Skeleton
Optimization,
EE Transactions
ug. 2008.
on of distance
digital image
n International
003.
skeletonization
rn Anal. Mach.
xtracting new
Perception of
hen-Dunn, Ed.
1967.
mental approach
Features, The
1989.
low-cost attack
s of the ACM
ations Security
ed states, 2008.
n
t
.
o
-
t
e
r
n
A
l
.
r
l
.
i
h
s
,
,
e
,
A
s
-
,
r
l
r
,
h
l
l
,
n

s
e
e
l
n
.
w
f
.
h
e
k
M
y

[12] reCAPTCHA. http://recaptcha.net/ . Accessed in Jan 2011.
[13] Google. http://www.google.com/recaptcha . Accessed in
Jan 2011.
[14] Tencent. http://www.imqq.com/ . Accessed in Jan 2011.
[15] T. M. Alcorn and C. W. Hoggar, Preprocessing of data
for character recognition, Marconi Rev., vol. 32, pp.
61-81, 1969.
[16] E. S. Deutsch, Preprocessing for character recognition,
in Proc. IEEE NPL Conf. Pattern Recognition, pp. 179-190,
1968.
[17] L. Lam, S. W. Lee and C. Y. Suen, Thinning
methodologies - a comprehensive survey, IEEE Trans.
Pattern Anal. Mach. Intell., vol. 14, no. 9, pp. 869-885, Sep.
1992.
[18] J. J. Zou and H. Yan, Skeletonization of ribbon-like
shapes based on regularity and singularity analyses,
IEEE Trans. Syst. Man. Cybern. B, Cybern., vol. 31, no. 3,
pp. 401-407, Jun. 2001.
[19] Wan, Y., Yao, L., Xu, B. and Zeng, P., A distance map
based skeletonization algorithm and its application in fiber
recognition, International Conference on Audio,
Language and Image Processing, Shanghai, China, pp.
1769-1774, 2008.
[20] You, X. and Tang, Y. Y., Wavelet-based approach to
character skeleton, IEEE Transactions on Image
Processing 16(5): 1220-1231, 2007.
[21] Saeed, K., Rybnik, M. and Tabedzki, M.,
Implementation and advanced results on the non-
interrupted skeletonization algorithm, in W. Skarbek
(Ed.), Computer Analysis of Images and Patterns, Lecture
Notes in Computer Science, Vol. 2124, Springer-Verlag,
Heidelberg, pp. 601-609, 2001.
[22] R. D. T. Janssen, Interpretation of maps: From bottom-
up to model-based, in Handbook of Character
Recognition and Document Image Analysis, H. Bunke
and P. S. P. Wang, Eds. Singapore: World Scientific, 1997.
[23] Zhang, Y. Y. and Wang, P. P., A parallel thinning
algorithm with two-subiteration that generates one-pixel-
wide skeletons, International Conference on Pattern
Recognition, Vienna, Austria, Vol. 4, pp. 457-461, 1996.
[24] Rockett, P. I., An improved rotation-invariant thinning
algorithm, IEEE Transactions on Pattern Analysis and
Machine Intelligence 27(10): 1671-1674, 2005.
Jingsong Cui received a bachelor's
degree in computer software in 1997
from the department of computer science and
technology, Wuhan University. In the
same year, he went on to graduate studies
for a master's degree in computer
applications at Wuhan University
without examination. In 2000,
he began working toward a Ph.D. degree at the School
of Mathematics and Computer Science, Wuhan
University. His main research topics are information security
and algorithm optimization. He received a Ph.D. degree
in 2003 and began to teach at Wuhan University in 2004. He
has published more than 20 academic papers in academic
journals and international conferences, among which 17 articles
are EI indexed. He has accumulated rich research experience in
systems infrastructure security, information security, and
network security.

Lu Liu is a senior student majoring in information security at
Wuhan University, Wuhan, Hubei, China. In 2010,
she participated in the scientific research project on CAPTCHA,
including CAPTCHA design and security analysis. She has
carried out research on CAPTCHA recognition algorithms
and security assessment approaches.
Gang Du is a senior student majoring in computer science and
technology at Wuhan University, Wuhan, Hubei, China. In
2009, he participated in the research project on
CAPTCHA analysis and recognition. He and his partners won
second prize in the 2010 Undergraduate Electronic Design
Contest - Information Security Technology Invitational Contest.
During 2010 and 2011, he furthered his research on security
assessment approaches.

Ying Wang is a senior student majoring in information security at
Wuhan University, Wuhan, Hubei, China. In 2010,
she participated in the scientific research project on CAPTCHA,
including CAPTCHA design and security analysis. She has
studied CAPTCHA assessment methods and detection
algorithms.

Qianqi Guan is a junior student majoring in information security
at Wuhan University, Wuhan, Hubei, China. In 2010,
she participated in the scientific research project on CAPTCHA,
including CAPTCHA design and security analysis.




Call for Papers and Special Issue Proposals


Aims and Scope.

Journal of Multimedia (JMM, ISSN 1796-2048) is a scholarly peer-reviewed international scientific journal published bimonthly, focusing on
theories, methods, algorithms, and applications in multimedia. It provides a high profile, leading edge forum for academic researchers, industrial
professionals, engineers, consultants, managers, educators and policy makers working in the field to contribute and disseminate innovative new work
on multimedia.

The Journal of Multimedia covers the breadth of research in multimedia technology and applications. JMM invites original, previously
unpublished, research, survey and tutorial papers, plus case studies and short research notes, on both applied and theoretical aspects of multimedia.
These areas include, but are not limited to, the following topics:

Multimedia Signal Processing
Multimedia Content Understanding
Multimedia Interface and Interaction
Multimedia Databases and File Systems
Multimedia Communication and Networking
Multimedia Systems and Devices
Multimedia Applications

JMM EDICS (Editors Information Classification Scheme) can be found at http://www.academypublisher.com/jmm/jmmedics.html.


Special Issue Guidelines

Special issues feature specifically aimed and targeted topics of interest contributed by authors responding to a particular Call for Papers or by
invitation, edited by guest editor(s). We encourage you to submit proposals for creating special issues in areas that are of interest to the Journal.
Preference will be given to proposals that cover some unique aspect of the technology and ones that include subjects that are timely and useful to the
readers of the Journal. A Special Issue is typically made of 10 to 15 papers, with each paper 8 to 12 pages of length.

The following information should be included as part of the proposal:

Proposed title for the Special Issue
Description of the topic area to be focused upon and justification
Review process for the selection and rejection of papers.
Name, contact, position, affiliation, and biography of the Guest Editor(s)
List of potential reviewers
Potential authors to the issue
Tentative time-table for the call for papers and reviews

If a proposal is accepted, the guest editor will be responsible for:

Preparing the Call for Papers to be included on the Journal's Web site.
Distribution of the Call for Papers broadly to various mailing lists and sites.
Getting submissions, arranging the review process, making decisions, and carrying out all correspondence with the authors. Authors should be
informed of the Instructions for Authors.
Providing us the completed and approved final versions of the papers formatted in the Journal's style, together with all authors' contact
information.
Writing a one- or two-page introductory editorial to be published in the Special Issue.


Special Issue for a Conference/Workshop

A special issue for a Conference/Workshop is usually released in association with the committee members of the Conference/Workshop, such as the
general chairs and/or program chairs, who are appointed as the Guest Editors of the Special Issue. A Special Issue for a Conference/Workshop is
typically made up of 10 to 15 papers, with each paper 8 to 12 pages in length.

Guest Editors are involved in the following steps in guest-editing a Special Issue based on a Conference/Workshop:

Selecting a title for the Special Issue, e.g. "Special Issue: Selected Best Papers of XYZ Conference".
Sending us a formal Letter of Intent for the Special Issue.
Creating a Call for Papers for the Special Issue, posting it on the conference web site, and publicizing it to the conference attendees.
Information about the Journal and Academy Publisher can be included in the Call for Papers.
Establishing criteria for paper selection/rejection. The papers can be nominated based on multiple criteria, e.g. rank in the review process plus
the evaluation from the Session Chairs and the feedback from the Conference attendees.
Selecting and inviting submissions, arranging the review process, making decisions, and carrying out all correspondence with the authors.
Authors should be informed of the Author Instructions. Usually, the Proceedings manuscripts should be expanded and enhanced.
Providing us the completed and approved final versions of the papers formatted in the Journal's style, together with all authors' contact
information.
Writing a one- or two-page introductory editorial to be published in the Special Issue.


More information is available on the web site at http://www.academypublisher.com/jmm/.