
International Journal of Research and Reviews in Electrical and Computer Engineering (IJRRECE)

Vol. 1, No. 2, June 2011


ISSN: 2046-5149
Copyright Science Academy Publisher, United Kingdom
www.sciacademypublisher.com

Analysis of Detecting Steganography contents in corporate Emails
P. T. Anitha1, M. Rajaram2 and S. N. Sivanandham3
1 Asst. Prof. /MCA, Karpagam College of Engineering, Coimbatore 641032, Tamilnadu, India
2 Vice Chancellor, Anna University, Tirunelveli, Tamilnadu, India
3 Educational Advisor, Karpagam Group of Institutions, Coimbatore 641032, Tamilnadu, India

Email: anitha_pt@yahoo.com, rajaramgct@redifmail.com, Snsprof25@yahoo.com

Abstract - The widespread use of steganography inevitably leads to a need to detect hidden data. However, compared to
steganography, steganalysis is still in its infancy. Our goal is to establish a solid framework for steganalysis and to design
systems that detect state-of-the-art hiding systems. We are researching three approaches to accomplish this: 1)
cryptography, 2) steganography and 3) steganalysis. Steganography is used to hide the occurrence of communication. Today,
email management is not only a filing and storage challenge. Because law firms and attorneys must be equipped to take
control of litigation, email authenticity must be unquestionable, with strong chains of custody, constant availability, and
tamper-proof security. Email, however, is insecure. The proposed work develops a steganalysis framework that checks the
content of corporate emails by improving the S-DES algorithm with the help of a neural network approach. A new filtering
algorithm is also developed, which is used to extract only the JPG images from the corporate emails. We anticipate that this
paper can also give a clear picture of the current trends in steganography so that appropriate steganalysis algorithms can be
developed and improved.
Keywords: Steganalysis, Steganography, Information Hiding, LSB, Stegdetect, Stego, Outguess

1. Introduction

Cryptography and steganography are well-known and widely used
techniques that manipulate information (messages) in order to
cipher it or hide its existence. These techniques have many
applications in computer science and other related fields: they
are used to protect military messages, e-mails, credit card
information, corporate data, personal files, etc. The widespread
use of steganography inevitably leads to a need to detect hidden
data. Steganalysis is the detection, and ultimately the
extraction, of data hidden in an innocuous medium.
The goal of steganalysis is to detect and/or estimate potentially
hidden information from observed data with little or no knowledge
about the steganography algorithm and/or its parameters.
The current trend in steganalysis seems to suggest two extreme
approaches: (a) little or no statistical assumption is made about
the image under investigation, and statistics are learnt from a
large database of training images; (b) a parametric model is
assumed for the image, and its statistics are computed for
steganalysis detection. The proposed research work develops a
framework for analyzing the stego content in corporate emails.
Our goal is to establish a solid framework for steganalysis and
to design systems that detect state-of-the-art hiding systems.

2. Image Steganalysis

There are essentially three types of image formats: raw,
uncompressed formats (BMP, PCX), palette formats (GIF), and lossy
compressed formats (JPEG, Wavelet, JPEG2000). Only a few current
steganographic programs offer the capability to embed messages
directly in the JPEG stream. It is difficult to devise a
steganographic method that hides messages in the JPEG stream
securely while keeping the capacity practical. Far more programs
use the BMP, PCX, or the GIF palette-based format. The GIF format
is a difficult environment for secure steganography with
reasonable capacity.
Consequently, if the cover-image was initially stored in the JPEG
format, the act of message embedding will not erase the
characteristic structure created by the JPEG compression, and one
can still easily determine whether or not a given image has been
stored as JPEG in the past. In fact, unless the image is too
small, one can reliably recover even the values of the JPEG
quantization table by carefully analyzing the values of the DCT
coefficients in all 8×8 blocks. After message embedding, however,
the cover-image will become (with a high probability) incompatible
with the JPEG format in the sense that it may be possible to prove
that a particular 8×8 block of pixels could not have been produced
by JPEG decompression of any block of quantized coefficients. This
finding provides strong evidence that the block has been modified.
It is highly suspicious to find an image stored in a lossless
format that bears a strong fingerprint of JPEG compression, yet is
not fully compatible with any JPEG compressed image. This can be
interpreted as evidence for steganography. Figure 1 presents an
example of a hidden message inside a picture.

3. Proposed Idea

3.1. Hybrid Algorithm

A new hybrid algorithm is developed by combining the S-DES
algorithm with the back propagation algorithm of neural networks
to detect the stego content in images effectively. S-DES is a
simplified version of DES, the best known and most widely used
cryptosystem for civilian applications. DES was developed at IBM
and adopted by the National Bureau of Standards in the mid 1970s;
its design has held up well against the attacks published in the
open literature, although its 56-bit key is short by today's
standards.
The proposed work develops a framework that comprises the
following tasks: image separation from corporate mails using the
newly developed capturing algorithm, compression, encryption,
hiding, decryption, and decompression.
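3.2. S-DES method of encryption
This method is an example of a block cipher: the plaintext is
split into blocks of a certain size, in this case 8 bits, and the
key has 10 bits.
Plaintext = b1b2b3b4b5b6b7b8
key = k1k2k3k4k5k6k7k8k9k10
For subkey generation, first produce two 8-bit subkeys K1 and K2:
K1 = P8 (LS1 (P10 (key)))
K2 = P8 (LS2 (LS1 (P10 (key))))
where P10 and P8 are bit permutation operators (P8 also drops two
of the ten bits) and LS1 and LS2 are circular left shifts, by one
and two positions, applied to each 5-bit half separately. For
example, P10 takes 10 bits and returns the same 10 bits in a
different order:
P10 (k1k2k3k4k5k6k7k8k9k10) = k3k5k2k7k4k10k1k9k8k6.
Each 8-bit plaintext block is encrypted separately. Given a
plaintext block, the ciphertext is defined using the two subkeys
K1 and K2 as follows:
ciphertext = IP^-1 ( fK2 ( SW ( fK1 ( IP ( plaintext ) ) ) ) )
where IP is an initial permutation, IP^-1 is its inverse, and SW
swaps the two 4-bit halves. fK ( ) is computed as follows, with
exclusive-or (XOR) written as + and with L and R denoting the left
and right 4-bit halves of the input:
fK (L, R) = (L + FK (R), R)
FK (R) = P4 (S0 (lhs (EP (R) + K)), S1 (rhs (EP (R) + K)))
Here EP expands 4 bits to 8, S0 and S1 are substitution boxes that
each map 4 bits to 2, P4 is a 4-bit permutation, and lhs and rhs
select the left and right halves of their argument.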
Once the sample image and the embedded information are finalized,
the result is compressed with the help of the JPEG compression
algorithm.

Fig 1 Simplified DES Scheme
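
The subkey and round equations of Section 3.2 can be traced in a few lines of Python. This is a minimal sketch under stated assumptions, not the authors' implementation: only P10 is given in the text, so the P8, IP, EP, P4 and S-box tables below are assumed from the usual textbook presentation of S-DES, and all function names are illustrative.

```python
# Sketch of S-DES. Bits are lists of 0/1; tables hold 1-based positions.
P10 = (3, 5, 2, 7, 4, 10, 1, 9, 8, 6)      # given in the text
# The remaining tables are assumptions (usual textbook S-DES values).
P8 = (6, 3, 7, 4, 8, 5, 10, 9)
IP = (2, 6, 3, 1, 4, 8, 5, 7)
IP_INV = (4, 1, 3, 5, 7, 2, 8, 6)
EP = (4, 1, 2, 3, 2, 3, 4, 1)
P4 = (2, 4, 3, 1)
S0 = ((1, 0, 3, 2), (3, 2, 1, 0), (0, 2, 1, 3), (3, 1, 3, 2))
S1 = ((0, 1, 2, 3), (2, 0, 1, 3), (3, 0, 1, 0), (2, 1, 0, 3))


def bits(text):
    """Turn a string such as '1010' into a list of ints."""
    return [int(c) for c in text]


def permute(block, table):
    return [block[i - 1] for i in table]


def rotl(block, n):
    """Circular left shift by n positions."""
    return block[n:] + block[:n]


def subkeys(key):
    """K1 = P8(LS1(P10(key))), K2 = P8(LS2(LS1(P10(key))))."""
    k = permute(key, P10)
    left, right = rotl(k[:5], 1), rotl(k[5:], 1)
    k1 = permute(left + right, P8)
    left, right = rotl(left, 2), rotl(right, 2)
    k2 = permute(left + right, P8)
    return k1, k2


def sbox(four_bits, box):
    """Outer bits select the row, inner bits the column; output is 2 bits."""
    row = 2 * four_bits[0] + four_bits[3]
    col = 2 * four_bits[1] + four_bits[2]
    value = box[row][col]
    return [value >> 1, value & 1]


def f_k(block, k):
    """fK(L, R) = (L xor FK(R), R)."""
    left, right = block[:4], block[4:]
    x = [a ^ b for a, b in zip(permute(right, EP), k)]
    big_f = permute(sbox(x[:4], S0) + sbox(x[4:], S1), P4)
    return [a ^ b for a, b in zip(left, big_f)] + right


def _crypt(block, first, second):
    x = f_k(permute(block, IP), first)
    x = x[4:] + x[:4]                      # SW: swap the two halves
    return permute(f_k(x, second), IP_INV)


def encrypt(block, key):
    k1, k2 = subkeys(key)
    return _crypt(block, k1, k2)


def decrypt(block, key):
    k1, k2 = subkeys(key)
    return _crypt(block, k2, k1)           # same rounds, subkeys reversed
```

Decryption reuses the same structure with the subkeys in reverse order, because fK and SW each undo themselves when applied twice.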

3.3. The Capturing algorithm

The new capturing algorithm checks the mail inbox only for JPEG
files. Restricting the filter to one file type minimizes the time
spent searching the mailbox. The filtered files are stored in a
large database for further processing. A sample image is taken
from the database as the covert channel in which the secret
information is hidden. For our experiments, we created a database
containing more than 20000 JPG images obtained from corporate
mails. In each image, we embedded a random binary stream of
varying length using the S-DES algorithm. The proposed research
analyzes the performance of the improved image steganalysis
algorithms on corporate mails, measuring both the performance and
the detection ratio.
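
The paper does not list the capturing algorithm itself, so the following is only a sketch of the filtering idea using Python's standard `email` package. The function names, the check of the JPEG start-of-image marker and the demo message are assumptions made for illustration.

```python
from email import message_from_bytes, policy
from email.message import EmailMessage


def extract_jpegs(raw_message):
    """Return (filename, data) for every JPEG part of one raw e-mail."""
    msg = message_from_bytes(raw_message, policy=policy.default)
    found = []
    for part in msg.walk():
        if part.get_content_type() != "image/jpeg":
            continue                       # skip text, GIF, PDF, ...
        data = part.get_payload(decode=True)
        # Assumption: also require the JPEG start-of-image marker FF D8.
        if data and data[:2] == b"\xff\xd8":
            found.append((part.get_filename() or "unnamed.jpg", data))
    return found


def demo_message():
    """Build a small test e-mail with one JPEG and one GIF attachment."""
    msg = EmailMessage()
    msg["Subject"] = "Quarterly report"
    msg.set_content("See attached.")
    msg.add_attachment(b"\xff\xd8\xff\xe0fake-jpeg-bytes",
                       maintype="image", subtype="jpeg",
                       filename="chart.jpg")
    msg.add_attachment(b"GIF89a", maintype="image", subtype="gif",
                       filename="logo.gif")
    return msg.as_bytes()
```

Calling `extract_jpegs(demo_message())` keeps only `chart.jpg`; in a real deployment the returned pairs would be written to the image database described above.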

4. Detection based on back propagation method

The neural network back propagation approach is used to check for
discrepancy patterns and to train itself for better accuracy by
automating the whole process [7]. This study uses a neural network
to analyze the digital image under test based on three different
types of transformation: the Discrete Fourier Transform (DFT), the
Discrete Cosine Transform (DCT) and the Discrete Wavelet Transform
(DWT).
In this paper, we consider only the DFT, DCT and DWT. First, the
image under test is transformed into the transform domain by each
of these three transforms. Then the statistical features of the
transformed data are calculated; these can be exploited to detect
hidden information. The reason for selecting the DFT, DCT and DWT
is that most data hiding methods operate in these domains. The
selected features should be significantly affected by the data
hiding process. Such features are difficult to find by hand, so we
use a neural network for this problem, since a neural network has
a strong capability to approximate nonlinear functions.
To the features that are more affected by the data hiding process,
the neural network will assign larger weight coefficients; to the
features that are less affected, it will assign smaller weight
coefficients.
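
As an illustration of the transform-then-statistics step, the sketch below applies one level of a Haar wavelet transform to each image row and summarizes the detail coefficients by their mean and variance. The paper does not say which wavelet or which statistics are used, so the Haar wavelet, the two moments and the function names are all assumptions.

```python
def haar_step(signal):
    """One level of the (unnormalised) Haar transform of an even-length row."""
    pairs = list(zip(signal[0::2], signal[1::2]))
    approx = [(a + b) / 2 for a, b in pairs]   # local averages
    detail = [(a - b) / 2 for a, b in pairs]   # local differences
    return approx, detail


def moments(values):
    """Mean and (population) variance of a list of numbers."""
    n = len(values)
    mean = sum(values) / n
    var = sum((v - mean) ** 2 for v in values) / n
    return mean, var


def row_features(image):
    """Feature vector: moments of all row-wise Haar detail coefficients."""
    details = []
    for row in image:
        _, detail = haar_step(row)
        details.extend(detail)
    return moments(details)
```

For a smooth row such as `[10, 10, 12, 12]` all detail coefficients are zero, while changing least significant bits to `[10, 11, 12, 13]` makes them non-zero; features of this kind are what the network would receive as inputs.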
Let us denote the i-th DCT coefficient of the k-th block as
dk(i), 0 ≤ i < 64, k = 1, ..., T, where T is the total number of
blocks in the image. In each block, all 64 coefficients are
further quantized to integers Dk(i) using the JPEG quantization
matrix Q: Dk(i) = round(dk(i) / Q(i)).
The quantized coefficients Dk(i) are arranged in a zig-zag manner
and compressed using the Huffman coder. The resulting compressed
stream together with a header forms the final JPEG file.
Decompression works in the opposite order. The JPEG bit-stream is
decompressed using the Huffman decoder, and the quantized DCT
coefficients Dk(i) are multiplied by Q(i) to obtain the DCT
coefficients QDk, QDk(i) = Q(i)Dk(i) for all k and i. Then the
inverse DCT is applied to QDk and the result is rounded to
integers in the range 0 to 255.
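
The quantization and decompression steps above can be followed on a single 8×8 block. The sketch below is a plain-Python illustration, not an optimized codec: it omits the zig-zag ordering and Huffman coding, and the level shift by 128 and the function names are assumptions taken from common JPEG practice rather than from the text.

```python
import math

N = 8  # JPEG block size


def _scale(u):
    return math.sqrt(0.5) if u == 0 else 1.0


def _basis(x, u):
    return math.cos((2 * x + 1) * u * math.pi / (2 * N))


def dct2(block):
    """Forward 8x8 DCT: pixel block -> coefficients d(u, v)."""
    return [[0.25 * _scale(u) * _scale(v) *
             sum(block[x][y] * _basis(x, u) * _basis(y, v)
                 for x in range(N) for y in range(N))
             for v in range(N)] for u in range(N)]


def idct2(coef):
    """Inverse 8x8 DCT: coefficients -> (unrounded) pixel block."""
    return [[0.25 * sum(_scale(u) * _scale(v) * coef[u][v] *
                        _basis(x, u) * _basis(y, v)
                        for u in range(N) for v in range(N))
             for y in range(N)] for x in range(N)]


def quantize(d, q):
    """D(i) = round(d(i) / Q(i))."""
    return [[round(d[u][v] / q[u][v]) for v in range(N)] for u in range(N)]


def dequantize(big_d, q):
    """QD(i) = Q(i) * D(i)."""
    return [[q[u][v] * big_d[u][v] for v in range(N)] for u in range(N)]


def decompress(qd):
    """Inverse DCT, undo the level shift, round and clip to 0..255."""
    pixels = idct2(qd)
    return [[min(255, max(0, round(p + 128))) for p in row] for row in pixels]


def jpeg_roundtrip(block, q):
    """Compress and decompress one block; returns (pixels, QD)."""
    shifted = [[p - 128 for p in row] for row in block]
    qd = dequantize(quantize(dct2(shifted), q), q)
    return decompress(qd), qd
```

Every value in `QD` is a multiple of the matching entry of `Q`; this is the structure that the compatibility test in the algorithm description looks for in a decompressed image.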
Algorithm description:
1. Divide the image into a grid of 8×8 blocks, skipping
the last few rows or columns if the image dimensions
are not multiples of 8.
2. Arrange the blocks in a list and remove all saturated
blocks from the list (a block is saturated if it has at
least one pixel with a gray value 0 or 255). Denote the
total number of blocks as T.
3. Extract the quantization matrix Q from all T blocks as
described in Appendix A. If all the elements of Q are
ones, the image was not previously stored as JPEG and
our steganalytic method does not apply (exit this
algorithm). If more than one plausible candidate exists
for Q, steps 4 to 6 need to be carried out for all
candidates, and the results that give the highest number
of JPEG compatible blocks will be accepted as the
result of this algorithm.
4. For each block B calculate the quantity S (see equation
(3)).
5. If S > 16, the block B is not compatible with JPEG
compression with quantization matrix Q. If S ≤ 16, for
each DCT coefficient QDi' calculate the closest
multiples of Q(i), order them by their distance from
QDi', and denote them qp(i), p = 1, 2, .... For those
combinations for which the inequality (4) is satisfied,
check if expression (5) holds. If, for at least one set of
indices {p(1), ..., p(64)}, the expression (5) is
satisfied, the block B is JPEG compatible; otherwise it
is not.
6. After going through all T blocks, if no incompatible
JPEG blocks are found, the conclusion is that our
steganalytic method did not find any evidence for the
presence of secret messages. If, on the other hand,
there are some JPEG incompatible blocks, we can
attempt to estimate the size of the secret message,
locate the message-bearing pixels, and even attempt to
obtain the original cover image before secret message
embedding started.
7. If all blocks are identified as JPEG incompatible or if
the image does not appear to be previously stored as
JPEG, we should repeat the algorithm for different
8×8 divisions of the image (shifted by 0 to 7 pixels in
the x and y directions). This step may be necessary if
the cover image has been cropped prior to message
embedding.
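
Steps 1, 2 and 7 of the description can be sketched directly. The compatibility test of steps 3 to 6 relies on Appendix A and equations (3) to (5), which are not reproduced here, so it is left out. The image is assumed to be a list of rows of gray values, and the function names are illustrative.

```python
def blocks_8x8(image, dx=0, dy=0):
    """Step 1: cut the image into 8x8 blocks, starting at offset (dx, dy)
    and skipping rows or columns that do not fill a whole block."""
    height, width = len(image), len(image[0])
    blocks = []
    for top in range(dy, height - 7, 8):
        for left in range(dx, width - 7, 8):
            blocks.append([row[left:left + 8] for row in image[top:top + 8]])
    return blocks


def is_saturated(block):
    """Step 2: a block is saturated if any pixel is 0 or 255."""
    return any(p in (0, 255) for row in block for p in row)


def candidate_blocks(image, dx=0, dy=0):
    """Blocks that remain in the list after saturated ones are removed."""
    return [b for b in blocks_8x8(image, dx, dy) if not is_saturated(b)]
```

For step 7, the same function can be called with `dx` and `dy` each ranging from 0 to 7 to try every shifted 8×8 grid of a possibly cropped image.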

5. Performance Analysis and Experiment Results

From the measured statistics of training sets of images with and
without hidden information, our goal is to determine whether or
not information has been hidden in an image. Artificial neural
networks have the ability to adapt, learn, generalize, cluster or
organize data.
There are many structures of artificial neural network, including
Perceptron, Adaline, Madaline, Kohonen, BackPropagation and many
others. The BackPropagation artificial neural network is probably
the most commonly used, as it is very simple to implement and
effective. In this work, we deal with the BackPropagation
artificial neural network. A neural network has an excellent
capability to simulate nonlinear relations, so we use a neural
network to classify images [7]. In this paper we use a BP neural
network to train on and simulate images [6]. This BP neural
network uses three levels: input level, hidden level and output
level. An important issue in neural networks is slow convergence.
In practice, this is the main limitation of neural network
applications, and many new algorithms that claim fast convergence
have been developed. In this paper, a single-parameter dynamic
search algorithm is used to accelerate network training. Only one
parameter is searched at a time to achieve the best performance,
so this learning algorithm improves on older algorithms ([9, 10]).
The number of network inputs is set to the number of features, the
hidden level has 40 nodes, and the output is either yes or no.
A typical BackPropagation ANN is depicted below (Fig. 2). The
black nodes (on the extreme left) are the initial inputs. Training
such a network involves two phases. In the first phase, the inputs
are propagated forward to compute the output of each output node.
Each of these outputs is then subtracted from its desired output,
giving an error for each output node.
In the second phase, each of these output errors is passed
backward and the weights are adjusted. These two phases are
repeated until the sum of the squared output errors reaches an
acceptable value.

Fig. 2 Back propagation

Training the network can be summarized as follows:
- Apply input to the network.
- Calculate the output.
- Compare the resulting output with the desired output for the
  given input. The difference is called the error.
- Modify the weights for all neurons using the error.
- Repeat the process until the error reaches an acceptable value
  (e.g. error < 1%), which means that the NN was trained
  successfully, or until a maximum count of iterations is reached,
  which means that the NN training was not successful.
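
The training loop summarized above can be written out for a small three-level network. This is a minimal sketch and not the single-parameter dynamic search variant mentioned earlier: it uses plain gradient descent with a fixed learning rate, reads "error < 1%" as a sum of squared errors below 0.01, and uses a hidden-level size chosen by the caller (the paper uses 40). The class name, function names and toy samples are illustrative assumptions.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


class TinyBackpropNet:
    """Input level -> hidden level -> single yes/no output."""

    def __init__(self, n_in, n_hidden, seed=0):
        rng = random.Random(seed)
        # Each weight row ends with a bias weight.
        self.w_hidden = [[rng.uniform(-0.5, 0.5) for _ in range(n_in + 1)]
                         for _ in range(n_hidden)]
        self.w_out = [rng.uniform(-0.5, 0.5) for _ in range(n_hidden + 1)]

    def forward(self, x):
        """Phase 1: propagate the inputs forward."""
        h = [sigmoid(sum(w * v for w, v in zip(ws, x + [1.0])))
             for ws in self.w_hidden]
        y = sigmoid(sum(w * v for w, v in zip(self.w_out, h + [1.0])))
        return h, y

    def train_step(self, x, target, rate):
        """Phase 2: pass the output error backward and adjust the weights."""
        h, y = self.forward(x)
        err = target - y
        delta_out = err * y * (1.0 - y)
        # Hidden deltas use the output weights before they are updated.
        delta_h = [delta_out * self.w_out[j] * h[j] * (1.0 - h[j])
                   for j in range(len(h))]
        for j, v in enumerate(h + [1.0]):
            self.w_out[j] += rate * delta_out * v
        for j, ws in enumerate(self.w_hidden):
            for i, v in enumerate(x + [1.0]):
                ws[i] += rate * delta_h[j] * v
        return err * err


def train(net, samples, rate=0.5, tol=0.01, max_iter=5000):
    """Repeat until the summed squared error < tol or max_iter is reached.
    Returns (trained_successfully, error_history)."""
    history = []
    for _ in range(max_iter):
        sse = sum(net.train_step(x, t, rate) for x, t in samples)
        history.append(sse)
        if sse < tol:
            return True, history
    return False, history


# Toy feature vectors: target 0.0 = clean image, 1.0 = stego image.
SAMPLES = [([0.1, 0.2], 0.0), ([0.2, 0.1], 0.0),
           ([0.8, 0.9], 1.0), ([0.9, 0.7], 1.0)]
```

A call such as `train(TinyBackpropNet(2, 3), SAMPLES)` returns a success flag together with the error after each pass, which makes the two stopping conditions in the list easy to observe.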
The program trains the network using JPEG images that are located
in a folder. This folder must have the following layout:
- There must be one (input) folder that contains the input images
  [*.jpg].
- Each image's name is the target (or output) value for the
  network (the pixel values of the image are the inputs, of
  course).

6. Test Results

The cover image was taken from the image database. The image was
originally in JPEG format at 680x480 resolution. Since a BMP
image was also required for the evaluation, a second image in BMP
format was generated from the same JPEG image. Once both cover
images had been obtained, the proposed method generated the
secret code for both images. The encrypted image thus obtained
was steganographically concealed in the carrier image.

[Figure: cover file + secrets = steganography document]

Fig. 3 Steganography based document

The compression ratio and the detection ratio of stego content are
also analyzed. The analysis found no images with stego content in
the sampled database, so the observed probability of occurrence of
such images in the corporate mails is zero.

7. Discussion

In this paper, we have analyzed the steganalysis algorithms
available for image steganography. The proposed mathematical web
search model admits a wide variety of resource constraints.
Depending on the application, implementation, hardware, and
steganalysis probability of error constraints, a suitable resource
model can be used to derive an optimal web search strategy using
the proposed technique. Depending on the reliability of the
steganalysis algorithms employed and the storage constraint, one
of two strategies, namely coordinated search or random search, can
be chosen. It is seen that for a certain range of steganalysis
reliability, both these methods give comparable performance.

8. Conclusion

In summary, each carrier medium has its own special attributes and
reacts differently when a message is embedded in it. Therefore,
steganalysis algorithms have also been developed in a manner
specific to the target stego file, and the algorithms developed
for one cover medium are generally not effective for a different
medium. In this paper we conclude that it is possible to design
efficient web search algorithms to detect covert messages in
corporate emails.

References
[1] Ahmed Ibrahim, 2007, Steganalysis in Computer Forensics, Security
Research Centre Conferences, Australian Digital Forensics
Conference, Edith Cowan University.
[2] Avcibas, I., Memon, N. and Sankur, B., 2003, Steganalysis using
image quality metrics, IEEE Trans. on Image Processing, vol. 12,
no. 2, pp. 221-229.
[3] Chandramouli, R., 2002, A Mathematical Approach to Steganalysis,
Proc. SPIE Security and Watermarking of Multimedia Contents IV,
California.
[4] Geetha, S., Siva, S. and Sivatha Sindhu, 2009, Detection of Stego
Anomalies in Images Exploiting the Content Independent Statistical
Footprints of the Steganograms, Department of Information
Technology, Thiagarajar College of Engineering, Madurai,
Informatica (25-40).
[5] Greg Goth, 2005, Steganalysis Gets Past the Hype, IEEE Distributed
Systems Online, 1541-4922, Published by the IEEE Computer
Society, Vol. 6, No. 4.
[6] Sujay Narayana and Gaurav Prasad, 2010, Two new approaches for
secured image Steganography using cryptographic Techniques and
type conversions, Department of Electronics and Communication,
NITK, Surathkal, India.
[7] Liu Shaohui, Yao Hongxun, and Gao Wen, 2003, Neural network
based steganalysis in still images, Department of Computer Science,
Harbin Institute of Technology, ICME.
[8] Niels Provos and Peter Honeyman, 2003, Hide and Seek: Introduction
to Steganography, University of Michigan, Published by the IEEE
Computer Society.
[9] Niels Provos and Honeyman, P., 2007, Detecting steganographic
content on the internet. Retrieved from
http://www.citi.umich.edu/u/provos/papers/detecting.pdf
[10] Samir K Bandyopadhyay and Debnath Bhattacharyya, 2008, A
Tutorial Review on Steganography, University of Calcutta, Senate
House, 87 /1 College Street, Kolkata, UFL & JIITU.

M. Rajaram, M.E., Ph.D., is a Professor and Head in Electrical and
Electronics Engineering and Computer Science and Engineering in
Government College of Engineering, Tirunelveli. He received the
B.E. degree in Electrical and Electronics Engineering from Madurai
University, and the M.E. and Ph.D. degrees from Bharathiyar
University, Coimbatore, in 1981, 1988 and 1994 respectively. His
research interests are computer science and engineering,
electrical engineering and power electronics. He is the author of
over 120 publications in various international and national
journals. Seven Ph.D. scholars and 10 M.S. (by research) scholars
have been awarded degrees under his supervision. At present, he is
supervising 12 Ph.D. scholars. Dr. Rajaram has since become the
Vice-Chancellor of Anna University of Technology, Tirunelveli,
Tamilnadu, India.

S. N. Sivanandam completed his B.E (Electrical and
Electronics Engineering) in 1964 from Government
College of Technology, Coimbatore and M. Sc.
(Engineering) in power system in 1966 from PSG
College of Technology, Coimbatore (University Second
Rank). He acquired Ph.D. in Control Systems in 1982
from Madras University. He has received the Best
Teacher Award in the year 2001 and the Dhakshina Murthy Award for
teaching Excellence from PSG College of Technology. He received the
CITATION for best Teaching and Technical contribution in the year 2002,
Government College of Technology, Coimbatore. He has teaching
experience (UG and PG) of over 44 years. The total number of
undergraduate and postgraduate projects guided by him for both Computer
Science and Engineering and Electrical and Electronics Engineering is
around 950. Formerly he was a Professor and Head for the departments EEE
and CSE, PSG College of technology, Coimbatore. Further he was a
coordinator for seven government funded projects. Dr. Sivanandam has coauthored 14 books. He has delivered around 100 special lectures of different
specializations in Summer/Winter schools and also in various Engineering
Colleges. He has guided 32 Ph.D. research works and at present 10 Ph.D.
research scholars are working under him. The total number of technical
publications credited to him in various National and International journals
and Conferences is around 750. He has chaired 12 International and 12
National Conferences. He is a member of various professional bodies like IE
(India), ISTE, CSI, ACS, SSI and IEEE. He is a Technical Advisor to
various reputed industries and reputed engineering Institutions. His research
areas include Modeling and Simulation, Neural Networks, Fuzzy Systems
and Genetic Algorithms, Pattern Recognition, Multi-dimensional System
Analysis, Linear and Non-Linear Control Systems, Signal and Image
Processing, Power Systems, Numerical Methods, Parallel algorithms, Data
mining and Database Security.

P T Anitha received the B.Sc. Computer Applications and Master of
Computer Applications degrees from Bharathiar University in 1993
and 1996 respectively. She is presently working as an Assistant
Professor in the department of MCA, Karpagam College of
Engineering, Coimbatore. She is pursuing a doctorate degree in
computer science under the guidance of Dr. M. Rajaram, Vice
Chancellor of Anna University of Technology, Tirunelveli,
Tamilnadu, India. Her area of research is steganalysis. She has
published four papers in international conferences, 2 papers in
international journals and 11 in national conferences. Currently
she is working to improve the performance of the steganalysis
algorithms used in corporate emails.
