Response To Reviewers

Thank you very much for your positive assessment of our work. We greatly appreciate your comments and
suggestions, which have been valuable in improving the quality of our manuscript. Below we describe
the revisions made in response to your comments.

To Reviewer #1:

Question #1:
The authors state that this approach is not likely to be observed by an agent looking for the use of
steganography, but they never justify this claim. Does word ever naturally split up sentences and
paragraphs with multiple and tags? If not, this approach will stand out if people look for it.
Answer:
We had considered this issue before. Initially, the number of printable characters, rather than the
number of printable words, was used to represent the secret binary message (in general, a word
contains one or more characters). As a result, paragraphs and sentences were split unnaturally into
chunks of 1-4 characters, which easily arouses suspicion. We therefore changed the method to split
paragraphs into chunks of 1-4 words to represent the secret binary message. Of course, as you point
out, a large number of 1-4-word chunks could also arouse suspicion. To improve imperceptibility, we
now select only sentences that have the same Run Properties to be split into chunks of 1-4 words. The
split main document part cannot look exactly like the original, but it comes closer to it; there is an
inherent trade-off between embedding capacity and imperceptibility.
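To make the word-based encoding concrete, here is a minimal Java sketch. It assumes, for illustration only, that each 2-bit symbol v (0 to 3) of the secret message is carried by a chunk of v + 1 words, giving chunks of 1-4 words; the class and method names are hypothetical, this is not the code of our released tool, and run properties and the Robust Mark are omitted.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/**
 * Hypothetical sketch: each 2-bit symbol v (0..3) of the secret message is
 * carried by a chunk of v + 1 words, so chunks are 1-4 words long.
 */
class WordChunkCodec {

    /** Splits words into chunks whose lengths encode the given 2-bit symbols. */
    static List<List<String>> embed(List<String> words, int[] symbols) {
        List<List<String>> chunks = new ArrayList<>();
        int pos = 0;
        for (int v : symbols) {
            if (v < 0 || v > 3) {
                throw new IllegalArgumentException("symbol must be 0..3: " + v);
            }
            int len = v + 1; // chunk length in words encodes the symbol
            if (pos + len > words.size()) {
                throw new IllegalArgumentException("cover text too short");
            }
            chunks.add(new ArrayList<>(words.subList(pos, pos + len)));
            pos += len;
        }
        // Remaining words stay together in one final chunk that carries no data.
        if (pos < words.size()) {
            chunks.add(new ArrayList<>(words.subList(pos, words.size())));
        }
        return chunks;
    }

    /** Recovers the first count symbols from the chunk lengths. */
    static int[] extract(List<List<String>> chunks, int count) {
        int[] symbols = new int[count];
        for (int i = 0; i < count; i++) {
            symbols[i] = chunks.get(i).size() - 1;
        }
        return symbols;
    }

    public static void main(String[] args) {
        List<String> words = Arrays.asList(
            "The", "quick", "brown", "fox", "jumps", "over", "the", "lazy", "dog", "today");
        int[] secret = {2, 0, 3, 1}; // bits 10 00 11 01
        List<List<String>> chunks = embed(words, secret);
        System.out.println(chunks);
        System.out.println(Arrays.toString(extract(chunks, secret.length)));
    }
}
```

Extraction here assumes the receiver knows how many symbols were embedded; how the message length is signalled is outside this sketch.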

Question #2:
The authors should investigate whether the embedded data is still present if an RTF or PDF file is
made from the OOXML file.
Answer:
Thank you for the kind advice. We have examined RTF and PDF files generated from the OOXML
stegodocument and found that the embedded data survives in the RTF file but not in the PDF file.
Opening the RTF file in Notepad shows that the selected sentences are still split into the same chunks.
However, the related information (the chunks, the Robust Mark, etc.) does not exist in the PDF file.

Question #3: The authors need to understand why some of their files get smaller after the data is
embedded. The statement in section 5.1 line 54 that "the reason is that the OOXML documents use
ZIP technology" is nonsensical. When information is embedded using this technique the
decompressed document gets larger, so why does the compressed document sometimes get
smaller? The authors should understand what is going on.
Answer:
In the experiment, we developed the steganography tool in Java. The ZipOutputStream class in Java
supports 10 compression levels (0-9); the higher the number, the stronger the compression. Our tool
uses compression level 8 by default, whereas Office 2007-2010 files are compressed at level 6.
Because the stegodocument is recompressed at a higher level than the original, the compression gain
can outweigh the few bytes added by embedding, so some files get smaller after the data is embedded.
The related content has been added to the revised manuscript; see Section 5.2.
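The effect can be reproduced with a minimal Java sketch that packs the same XML part with java.util.zip.ZipOutputStream at different levels set through setLevel(int). The XML content is synthetic and the class name is hypothetical; a real package contains many parts, so actual size differences will vary.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.ZipEntry;
import java.util.zip.ZipOutputStream;

/** Illustrates that the same part can be packed to different sizes. */
class ZipLevelDemo {

    /** Packs one entry at the given Deflate level (0-9) and returns the archive size in bytes. */
    static int zippedSize(String entryName, byte[] data, int level) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ZipOutputStream zip = new ZipOutputStream(bytes)) {
            zip.setLevel(level);
            zip.putNextEntry(new ZipEntry(entryName));
            zip.write(data);
            zip.closeEntry();
        } // closing the stream writes the central directory
        return bytes.size();
    }

    public static void main(String[] args) throws IOException {
        // Synthetic, highly repetitive stand-in for a main document part.
        StringBuilder xml = new StringBuilder("<w:body>");
        for (int i = 0; i < 2000; i++) {
            xml.append("<w:p><w:r><w:t>sample text ").append(i % 7).append("</w:t></w:r></w:p>");
        }
        xml.append("</w:body>");
        byte[] data = xml.toString().getBytes(StandardCharsets.UTF_8);
        for (int level : new int[] {0, 6, 8, 9}) {
            System.out.println("level " + level + ": "
                + zippedSize("word/document.xml", data, level) + " bytes");
        }
    }
}
```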

Question #4:
Finally, the authors should release their tool.
Answer:
The tool was developed in Java by the authors and is available online at:
http://nisl.hnu.cn/index.php?option=com_content&task=view&id=42.

To Reviewer #2:

Question #1:
My concern in this paper is that the authors discuss the embedding of an encrypted message in two
occurrences (e.g. page 7 Line 29, Page 7 Line 61, and Page 11 Line 38), but in the remainder of the paper
they discuss embedding a secret message (no indication that the secret message is encrypted). The
authors should either discuss further the use of encryption or discard it. (The goal of the paper is to
hide a message inside an OOXML based document and not to protect its confidentiality.)
Answer:
Thank you for pointing this out. As you said, the goal of the paper is to hide a message inside an
OOXML document, so the use of encryption for the secret message is no longer discussed in the
article.

Question #2:
The authors may organize the related work section into subsections. They may also emphasize the
contrast of related work to their solutions.
Answer:
The related work has been organized into three subsections: Image-based approach, Linguistic
approach and Structural approach. Since the focus of this paper is hiding messages in OOXML format
files, previous work that hides messages using the structural approach is introduced in detail. The
contrast between the related work and our solution is also presented in Section 2 and Section 5.

Question #3:
Section 2 should discuss the linguistic approach since it is reported in the paper several times but was
not discussed in related work.
Answer:
Thank you for the kind advice. The linguistic approach, which includes the syntactic approach and the
semantic approach, is now discussed in Section 2.

Question #4:
The equation in Page 6/ Line 26 could be explained better with a case.
Answer:
We have explained the equation with an example (n = 4, that is, 4 bits of the secret message are
embedded into each segment); see Section 4.2. We have also derived the maximum of the equation;
see the next reply.
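The equation itself is not reproduced in this response. As an illustration only, the Java sketch below assumes that n secret bits are carried by a chunk of 1 to 2^n words with all values equally likely, so a segment averages (2^n + 1)/2 words and the rate is R = f(n) = 2n/(2^n + 1). This assumed form gives R = 0.8 at n = 2, matching the maximum reported in the next reply, and R = 8/17 (about 0.47) for the n = 4 example; the class name is hypothetical.

```java
/**
 * Hypothetical model: n secret bits per segment, chunk length 1..2^n words,
 * all values equally likely, so a segment averages (2^n + 1) / 2 words.
 */
class EmbeddingRate {

    /** Assumed embedding rate in bits per word: f(n) = 2n / (2^n + 1). */
    static double rate(int n) {
        double averageWords = ((1 << n) + 1) / 2.0;
        return n / averageWords;
    }

    public static void main(String[] args) {
        for (int n = 1; n <= 6; n++) {
            System.out.printf("n = %d: chunks of 1-%d words, R = %.3f%n", n, 1 << n, rate(n));
        }
    }
}
```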

Question #5:
In Page 6/ Line 32, the authors use the words we can deduce while they only observe the fact from
their table. The authors need to derive their equation to find the maximum so they can deduce the
maximum.
Answer:
The derivation is as follows:
Compute f(n+1) - f(n) and bring it to a common denominator.
The denominator is greater than 0.
When n = 1, f(n+1) - f(n) > 0, that is, f(2) > f(1).
When n > 1, the numerator is negative and decreases as n increases, so f(2) > f(3) > f(4) > ... > f(n).
Therefore, R takes its maximum value of 0.8 when n = 2.
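Under the assumed form f(n) = 2n/(2^n + 1) (the manuscript's exact equation is not reproduced in this response), the steps above can be written out explicitly:

```latex
% Assumed form: f(n) = 2n / (2^n + 1)
f(n+1)-f(n) = \frac{2(n+1)}{2^{n+1}+1} - \frac{2n}{2^{n}+1}
            = \frac{2(n+1)(2^{n}+1) - 2n\,(2^{n+1}+1)}{(2^{n+1}+1)(2^{n}+1)}
            = \frac{2-(n-1)\,2^{n+1}}{(2^{n+1}+1)(2^{n}+1)}
```

For n = 1 the numerator equals 2 > 0, so f(2) > f(1); for n ≥ 2 it is at most 2 - 8 = -6 < 0 and keeps decreasing as n grows, so f(2) > f(3) > f(4) > ... . Under this assumption the maximum is f(2) = 4/5 = 0.8.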

Question #6:
The author may improve the analysis of the embedding rate of the solution using Claude Shannon
approach for information theory:
http://cm.belllabs.com/cm/ms/what/shannonday/shannon1948.pdf
Answer:
Thank you for the kind advice. We have added a brief analysis of the embedding capacity based on
Claude Shannon's information theory.
How large a message can be hidden inside a cover message without becoming detectable has long
been an open question. However, we can estimate the average maximum message length that can be
hidden without becoming detectable from the measured statistics of the probability distribution. If
x_i is used as a cover information channel, information theory tells us that the maximum average
amount of information that can be transmitted through such a channel equals the entropy of the
probability distribution P(x_i):
$$H(X) = -\sum_i P(x_i)\log_2 P(x_i)$$
The smaller the individual probabilities P(x_i), that is, the more evenly the distribution is spread over
many values, the greater the entropy. The capacity limit can therefore be measured for a given x_i,
and the embedding capacity should be kept within the range in which the message can be hidden
inside the cover message without becoming detectable. In our experiment, only paragraphs which
have the same properties and contain four or more words are selected to carry the secret message, in
order to resist statistical attacks; paragraphs with three or fewer words are not selected. This
selection strategy keeps the split part closer to the original.
The related content has been added to the revised manuscript; see Section 6.2.
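As a rough illustration of this bound, the following Java sketch computes the empirical entropy H = -Σ P(x_i) log2 P(x_i) from observed counts. What x_i stands for (here, hypothetically, the length in words of runs in cover documents) and the counts themselves are assumptions for illustration, not measured data; the class name is hypothetical.

```java
import java.util.Map;
import java.util.TreeMap;

/** Shannon entropy H = -sum P(x_i) log2 P(x_i) of an empirical distribution. */
class ChunkEntropy {

    /** Entropy in bits of the distribution given by value -> count. */
    static double entropyBits(Map<Integer, Integer> counts) {
        long total = 0;
        for (int c : counts.values()) {
            total += c;
        }
        double h = 0.0;
        for (int c : counts.values()) {
            if (c == 0) {
                continue; // by convention 0 * log 0 = 0
            }
            double p = (double) c / total;
            h -= p * (Math.log(p) / Math.log(2)); // log base 2
        }
        return h;
    }

    public static void main(String[] args) {
        // Hypothetical counts of run lengths (in words) observed in cover documents.
        Map<Integer, Integer> counts = new TreeMap<>();
        counts.put(1, 10);
        counts.put(2, 10);
        counts.put(3, 10);
        counts.put(4, 10);
        System.out.println(entropyBits(counts)); // uniform over 4 values: about 2 bits
    }
}
```

A uniform distribution over four values gives the largest possible entropy for four outcomes; a skewed distribution of the same values gives less, which is the sense in which the capacity depends on the measured statistics.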

Question #7:
Page 10: The authors should discuss further extracting message attack. Better they can add a
section that discusses the threat model and the resilience of the solution to the different attacks.
Answer:
Thank you for the kind advice. The message extraction attack is discussed in Section 6.3.1, and a new
section discussing active and passive attacks has been added to the revised manuscript. The
resilience of the proposed method to these attacks and the corresponding detection accuracy are also
presented; see Section 6.3.2.

Question #8:
Section 6 should be discussed further.
Answer:
Section 6 contains four subsections: Imperceptibility, Embedding capacity, Robustness and Security.
Each subsection has been expanded in the revised manuscript; see Section 6.

Question #9:
Page 6 / Line 31: The equation is not correct.
Answer:
The equation has been corrected.

Question #10:
Page 7 / Line 36: The line creates an infinite loop
Answer:
Thank you for pointing this out. The error (an infinite loop) was caused by our carelessness; this step
has been corrected.

Question #11:
Page 8/ Line 59: Robust markup is included in all items run and not only the second and fifth (see Fig.
4)
Answer:
This attribute is defined as the Robust Mark. If two or more neighboring elements have the same
attributes, the Office application automatically merges them into one element after the document is
modified. To prevent the split elements from being merged, the Robust Mark must be added to
alternate split run elements. Therefore, the Robust Mark is added to the second and fourth split run
elements, not to all of them.
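The merging behaviour and the effect of alternating marks can be modelled in a few lines of Java. This is a hypothetical simplification: a run is reduced to its text plus a string standing in for its properties, the suffix "+robust" is a placeholder for the actual Robust Mark attribute, and merging is modelled as concatenating neighbours with equal properties, as described above, rather than reproducing Office's real behaviour.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Objects;

/** Hypothetical model of run merging and of alternating Robust Marks. */
class RobustMarkDemo {

    /** A run of text plus a string standing in for its properties (w:rPr). */
    static final class Run {
        final String text;
        final String props;

        Run(String text, String props) {
            this.text = text;
            this.props = props;
        }
    }

    /** Marks every second split run (2nd, 4th, ...) with a placeholder Robust Mark. */
    static List<Run> addRobustMarks(List<String> chunks, String props) {
        List<Run> runs = new ArrayList<>();
        for (int i = 0; i < chunks.size(); i++) {
            boolean marked = i % 2 == 1; // zero-based odd index = 2nd, 4th, ... run
            runs.add(new Run(chunks.get(i), marked ? props + "+robust" : props));
        }
        return runs;
    }

    /** Merges neighbouring runs whose properties are identical. */
    static List<Run> mergeAdjacent(List<Run> runs) {
        List<Run> merged = new ArrayList<>();
        for (Run r : runs) {
            if (!merged.isEmpty()
                    && Objects.equals(merged.get(merged.size() - 1).props, r.props)) {
                Run last = merged.remove(merged.size() - 1);
                merged.add(new Run(last.text + r.text, r.props));
            } else {
                merged.add(r);
            }
        }
        return merged;
    }

    public static void main(String[] args) {
        List<String> chunks = List.of("The quick ", "brown ", "fox jumps over ", "the lazy dog");
        List<Run> plain = new ArrayList<>();
        for (String c : chunks) {
            plain.add(new Run(c, "bold"));
        }
        System.out.println(mergeAdjacent(plain).size());                          // 1
        System.out.println(mergeAdjacent(addRobustMarks(chunks, "bold")).size()); // 4
    }
}
```

Without marks, all four chunks collapse into one run and the chunk lengths (and hence the hidden bits) are lost; with alternating marks no two neighbours share properties, so every chunk survives.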

To Reviewer #3:

Question #1:
Organization of the manuscript is not good. It is difficult to follow tables when they are referenced in
the paper, because tables are placed at the end of the manuscript. In addition, there are
unnecessarily repeated figures, for example, on pages from 16 to 20.
Answer:
The journal requires that tables be placed on separate pages after the reference list rather than
incorporated into the main text. We apologize for the inconvenience.

Question #2:
Experimental setup is not described very well. It is not clear how secret information has been
embedded into 10000 MS Word documents, and then extracted from them. Furthermore, there is no
information about the implementation of the proposed method. Is there a software implementation
of the proposed method in order to embed and then extract secret information from
stegodocuments. If yes, there should be description about the implementation process of the
proposed method. If not, then the proposed method should be implemented, and thereby its
applicability should be shown.
Answer:
The 10,000 test documents were found through Google searches and downloaded from the Internet.
Since we could not find an existing library of test documents on the Internet, six of us spent one week
collecting them.
Following our proposed embedding and extraction algorithms, we developed a steganography tool in
Java that automatically embeds the secret message into all embeddable places in the test
documents and extracts it again (available online at:
http://nisl.hnu.cn/index.php?option=com_content&task=view&id=42).
The implementation process of the proposed method is as follows:
First, the tool opens and unpacks the test document and extracts the main document part,
document.xml.
Second, the secret information is embedded into the main document part according to our proposed
embedding algorithm, and all parts are then packed back into the original OOXML file (the test
document).
Third, the embedding capacity and embedding rate of each stegodocument are calculated.
Finally, when the secret information needs to be extracted, the tool opens the stegodocument and
extracts the embedded information according to our proposed extraction algorithm.
All steps are performed automatically by our tool.
The related content has been added to the revised manuscript; see Section 5.1.
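The unpack, embed and repack steps can be sketched with java.util.zip. This is a minimal illustration, not our released tool: the embedding step is passed in as a placeholder function, only word/document.xml is rewritten, all other parts are copied unchanged, and the class and method names are hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.function.UnaryOperator;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

/** Minimal sketch of the unpack / embed / repack pipeline for an OOXML package. */
class OoxmlRepacker {

    static final String MAIN_PART = "word/document.xml";

    /** Copies every part, applying the (hypothetical) embed function to the main part. */
    static byte[] rewriteMainPart(byte[] docx, UnaryOperator<String> embed, int level)
            throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (ZipInputStream in = new ZipInputStream(new ByteArrayInputStream(docx));
             ZipOutputStream zip = new ZipOutputStream(out)) {
            zip.setLevel(level);
            ZipEntry entry;
            while ((entry = in.getNextEntry()) != null) {
                byte[] data = in.readAllBytes(); // reads to the end of the current entry
                if (MAIN_PART.equals(entry.getName())) {
                    String xml = new String(data, StandardCharsets.UTF_8);
                    data = embed.apply(xml).getBytes(StandardCharsets.UTF_8);
                }
                zip.putNextEntry(new ZipEntry(entry.getName()));
                zip.write(data);
                zip.closeEntry();
            }
        }
        return out.toByteArray();
    }

    /** Returns the named part as UTF-8 text, or null if it is absent. */
    static String readPart(byte[] docx, String name) throws IOException {
        try (ZipInputStream in = new ZipInputStream(new ByteArrayInputStream(docx))) {
            ZipEntry entry;
            while ((entry = in.getNextEntry()) != null) {
                if (name.equals(entry.getName())) {
                    return new String(in.readAllBytes(), StandardCharsets.UTF_8);
                }
            }
        }
        return null;
    }

    /** Builds a tiny two-part package as a stand-in for a real .docx file. */
    static byte[] samplePackage() throws IOException {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        try (ZipOutputStream zip = new ZipOutputStream(buf)) {
            zip.putNextEntry(new ZipEntry("[Content_Types].xml"));
            zip.write("<Types/>".getBytes(StandardCharsets.UTF_8));
            zip.closeEntry();
            zip.putNextEntry(new ZipEntry(MAIN_PART));
            zip.write("<w:document>cover</w:document>".getBytes(StandardCharsets.UTF_8));
            zip.closeEntry();
        }
        return buf.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        byte[] stego = rewriteMainPart(samplePackage(), xml -> xml.replace("cover", "stego"), 6);
        System.out.println(readPart(stego, MAIN_PART)); // <w:document>stego</w:document>
    }
}
```

The compression level passed to setLevel is what determines whether the repacked file ends up larger or smaller than the original, as discussed in the answer to Reviewer #1, Question #3.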

Question #3:
In section 6.1, it is claimed that the proposed method has better imperceptibility. But, in
comparison with what? It is not certain what the proposed method is compared with.
Answer:
We reached this conclusion from the following three aspects:
1) Text content:
Unlike previous text-format-based approaches, the proposed method does not change the format
information of characters (font, color, etc.), and unlike previous linguistic approaches, it does not
change the linguistic content of the cover text.
2) Change rate of file size:
With the proposed method, the change rate of file size is negligible when the proportion of
embedding bit rate is equal to or greater than 90%.
3) Embedding strategy and selection strategy:
The embedding and selection strategies keep the split part close to the original. The embedding
capacity can be limited to the range in which the message can be hidden inside the cover message
without becoming detectable. In our experiment, only paragraphs which have the same properties and
contain four or more words are selected to carry the secret message, in order to resist statistical
attacks. This goal can also be achieved by reducing the embedding capacity, which can be measured
with the information-theoretic method.
The related content is presented in the revised manuscript; see Section 6.1.
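The selection rule (identical run properties, four or more words) can be sketched as follows. A run is again reduced to its text plus a string standing in for its serialized properties, and words are counted by splitting on whitespace; both are simplifying assumptions, and the names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch of the paragraph selection strategy. */
class ParagraphSelector {

    /** A run of text plus a string standing in for its serialized properties (w:rPr). */
    static final class Run {
        final String text;
        final String props;

        Run(String text, String props) {
            this.text = text;
            this.props = props;
        }
    }

    static final int MIN_WORDS = 4;

    /** True if all runs share the same properties and the text has at least MIN_WORDS words. */
    static boolean isSelectable(List<Run> paragraph) {
        if (paragraph.isEmpty()) {
            return false;
        }
        String props = paragraph.get(0).props;
        StringBuilder text = new StringBuilder();
        for (Run r : paragraph) {
            if (!r.props.equals(props)) {
                return false; // mixed formatting: leave the paragraph untouched
            }
            text.append(r.text);
        }
        String trimmed = text.toString().trim();
        int words = trimmed.isEmpty() ? 0 : trimmed.split("\\s+").length;
        return words >= MIN_WORDS;
    }

    /** Indices of paragraphs that may carry secret data. */
    static List<Integer> selectableIndices(List<List<Run>> paragraphs) {
        List<Integer> result = new ArrayList<>();
        for (int i = 0; i < paragraphs.size(); i++) {
            if (isSelectable(paragraphs.get(i))) {
                result.add(i);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<List<Run>> paragraphs = List.of(
            List.of(new Run("The quick ", "b"), new Run("brown fox jumps", "b")), // 5 words, uniform
            List.of(new Run("Too short here", "b")),                              // 3 words
            List.of(new Run("Mixed ", "b"), new Run("formatting in this run", "i"))); // mixed
        System.out.println(selectableIndices(paragraphs)); // [0]
    }
}
```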

Question #4:
Embedding capacity may not be satisfactory for most applications.
Answer:
As you say, the embedding capacity of the proposed method is not satisfactory, especially for covert
communication that requires large capacity. However, the embedding bit rate of the proposed method
is higher than that of contemporary linguistic steganography approaches. The proposed method can
be applied to copyright protection for OOXML format documents or to covert communication that
requires only small capacity.

Question #5:
There is no information about what the possible encryption algorithm could be?
Answer:
An encryption algorithm could be used to improve the security of the hidden data. Asymmetric
encryption algorithms, which use different keys for encryption and decryption, could ensure the
security of the encrypted information. However, the goal of the paper is to hide a message inside an
OOXML document, not to protect its confidentiality, so the use of asymmetric encryption for the secret
message is left for future work.

Question #6:
The language of the manuscript could be much more fluent.
Answer:
Regarding the English writing, we revised the manuscript in accordance with the reviewers' comments
and carefully proofread it to minimize typographical, grammatical and bibliographical errors. All
modifications have been included in the revised manuscript.

Question #7:
There is not enough information what the novelty and originality of the proposed method, and
advantages sides in comparison with earlier related works.
Answer:
Thank you for the kind advice. The novelty and originality of the proposed method are as follows:
At present, there is little research on information hiding in OOXML format documents. This paper
proposes a novel steganographic method for OOXML format documents, in which secret data are
embedded into cover OOXML documents by splitting the printable text into sequences of words.
Unlike previous text-format-based approaches, the proposed method does not change the format
information of characters (font, color, etc.), and unlike previous linguistic approaches, it does not
change the linguistic content of the cover text. The selection strategy keeps the split part close to the
original, and the embedding strategy improves the embedding capacity.
Compared with other methods, the proposed method does not noticeably change the size of the cover
document and does not add new files or parts to the MS Office file package. The proposed method can
resist Format, Impersonation, Save As, Copy and other active attacks, and its embedding capacity is
higher than that of contemporary linguistic steganography approaches.
Experiments demonstrate the feasibility of the proposed method.
The related content has been added to Section 1.
