
2010 Second International Workshop on Education Technology and Computer Science

Text Watermarking Using Combined Image-plus-Text Watermark
Zunera Jalil and Anwar M. Mirza
Department of Computer Science,
FAST National University of Computer and Emerging Sciences,
A.K. Barohi Road, H-11/4, Islamabad, Pakistan
{zunera.jalil, anwar.m.mirza}@nu.edu.pk

Abstract- Authentication and copyright protection of digital contents over the internet is an important issue. Digital watermarking provides a complete authentication and copyright protection solution for this problem. Besides image, audio, and video, text is the most dominant medium travelling over the internet, and it requires complete protection. Text watermarking techniques have been developed in the past to protect text from illegal copying, forgery, and redistribution, and to prevent copyright violations. In this paper, we propose a novel text watermarking algorithm that uses a combined image-plus-text watermark to fully protect the text document. The watermark is logically embedded in the text and is extracted later to prove ownership. Experimental results demonstrate the effectiveness of the proposed algorithm under localized as well as dispersed tampering attacks on the text.

Keywords- watermarking; copyright protection; information security; text structure

I. INTRODUCTION

The increasing use of digital media and the internet has made the world a global village. At the same time, the digital world is encountering problems of copyright protection, authentication, illegal copying, and redistribution of digital contents, because information can be shared easily and in very little time. Text is the most dominant medium in the digital world, besides image, audio, and video, and hence requires complete protection. The major component of websites, newspapers, e-books, research papers, legal documents, letters, SMS messages, poetry, blogs, etc. is plain text; therefore, it is necessary to protect text. Copyright protection of images, audio, and video has been given due consideration in the past by many researchers, but text watermarking has been a neglected area. Digital watermarking provides a complete copyright protection and authentication solution for digital contents.

A digital watermark can be described as a visible or, preferably, invisible identification code that is permanently embedded in the data [1]. The process of embedding a digital watermark that carries information unique to the copyright owner or the creator of the document into a digital text document is called digital text watermarking.

Copyright violations can be avoided by efficient text watermarking algorithms. The binary nature, word/line patterning, text meaning, grammar structure, writing styles, and language rules are some of the eminent properties of plain text that need to be addressed in any text watermarking algorithm. Also, the inherent requirements of a generic digital watermarking scheme, such as imperceptibility, robustness, accuracy, and capacity, need to be satisfied.

This paper briefly analyzes the previous work on text watermarking in Section II. Section III contains the detailed description of the proposed watermarking (embedding and extraction) algorithm. The experimental results for tampering (insertion, deletion, and re-ordering) attacks with the combined image-plus-text watermark are stated in Section IV. The last section concludes the paper along with directions for future work.

II. PREVIOUS WORK

Text watermarking is an important area of research; however, the previous work on digital text watermarking is quite inadequate. It can be classified into the following categories: an image-based approach, a syntactic approach, a semantic approach, and the structural approach.

In the image-based approach to text watermarking, the watermark is embedded in a text image. Brassil et al. were the first to propose a few text watermarking methods utilizing the text image [2]-[3]. Later, Maxemchuk et al. [4]-[6] analyzed the performance of these methods. Huang and Yan [7] proposed an algorithm based on the average inter-word distance in each line.

In the syntactic approach to text watermarking, the syntactic structure of the text is used to embed the watermark. Atallah et al. first proposed a natural language watermarking scheme that uses the syntactic structure of text [8]-[9]. Meral et al. performed morpho-syntactic alterations to the text to watermark it [10]. An overview of available syntactic tools for text watermarking was provided in [11].

In the semantic approach, the semantics of the text are utilized to embed the watermark. Atallah et al. were the first to propose semantic watermarking schemes in the year 2000 [12]. Later, the synonym substitution method [13] was proposed. A noun-verb based technique for text watermarking was also proposed [14], which exploits nouns and verbs in a sentence parsed with a grammar parser using semantic networks.

Later, Topkara et al. proposed a text watermarking algorithm that uses typos, acronyms, and abbreviations to embed the watermark [15]. Algorithms were developed to

978-0-7695-3987-4/10 $26.00 © 2010 IEEE 11


DOI 10.1109/ETCS.2010.494

Authorized licensed use limited to: Sethu Institute of Technology. Downloaded on July 06,2010 at 07:25:33 UTC from IEEE Xplore. Restrictions apply.
watermark the text using the linguistic semantic phenomena of presuppositions [16]. An algorithm based on text meaning representation (TMR) strings has also been proposed [17].

The structural approach is the most recent approach used for copyright protection of text documents. A text watermarking algorithm that uses occurrences of double letters (aa-zz) in the text to embed the watermark has recently been proposed [18]. Another algorithm, which uses prepositions in addition to double letters to watermark text, has also been proposed recently [19].

Text watermarking algorithms that use a binary text image are not robust against re-typing attacks. Text watermarking methods based on semantics are language dependent. Synonym-based techniques are not resilient to random synonym substitution attacks. The structural algorithms are not applicable to all types of text documents, and they are restricted to either an alphabetical watermark or an image watermark. To increase robustness, it is better to use a combined image-plus-text watermark instead of a plain textual or image watermark. Hence, we propose a text watermarking algorithm which uses a combined image-plus-text watermark.

III. PROPOSED ALGORITHM

The proposed algorithm uses a combined image-plus-text watermark to ensure robustness. The occurrences of double letters in the text are utilized to embed the watermark as in [20]. The original copyright owner of the text logically embeds the watermark in the text and generates a watermark key. The watermarking process involves two stages: watermark embedding and watermark extraction. Watermark embedding is done by the original author, and extraction is done later by the Certifying Authority on the author's behalf to prove ownership.

A. Embedding Algorithm

The algorithm which embeds the watermark in the text is called the embedding algorithm. It takes the combined image-plus-text watermark as input and performs preprocessing of the image and the text. The embedding process is shown in Figure 1.

In Figure 1, preprocessing of the text includes discarding whitespace, special characters, digits, etc., thus making the watermark purely alphabetical. Preprocessing of the image means conversion of the image to grayscale and scaling to a standard size (100 x 100 pixels). Afterwards, the image is converted to plain text by a normalization process. The textual watermarks and a partial key containing a preposition, group size, and shift size are given as input to the embedding algorithm.

A watermark key is generated representing the inherent properties of the text. After watermarking the text, the author registers the key with the Certifying Authority.

The detailed watermark embedding algorithm is as follows:

1. Input W, GS, Pr and T.
2. Split W into WImg and WTxt.
3. Preprocess WImg and WTxt.
4. Convert WImg to WT.
5. Make partitions of T based on Pr.
6. Make groups of text based on GS, where
   No. of groups = No. of partitions / GS.
7. Count the occurrences of double letters in each group and find the second largest occurring double letter.
8. Populate the 2MOL (2nd Maximum Occurring Letter) list for each group.
9. W = Merge(WT, WTxt)
10. While (j < watermark_length) repeat steps 11 to 12.
11. If (wj ∈ 2MOL list)
       Key(i) = 0, Key(i+1) = groupnumber(2MOL)
    else
       Key(i) = 1, Key(i+1) = (wj + k) MOD 26,
       where k is in Z26 and Z26 represents the 26 letters (a-z).
12. Increment i.
13. Output AK.

W: watermark, WImg: image watermark, WTxt: text watermark, GS: group size, Pr: preposition, T: text file, WT: image watermark converted to text, AK: author key
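Steps 5 to 8 of the embedding listing can be sketched in Python as follows. This is a minimal illustration, not the authors' implementation. Several details are assumptions because the paper does not specify them: the text is split at whole-word occurrences of the preposition, a trailing group may hold fewer than GS partitions, ties between equally frequent double letters are broken alphabetically, and a group with fewer than two distinct double letters yields None. All function names are hypothetical.

```python
import re
from collections import Counter


def make_partitions(text, preposition):
    """Split text at each whole-word occurrence of the preposition (assumed rule)."""
    pattern = r"\b" + re.escape(preposition) + r"\b"
    parts = re.split(pattern, text, flags=re.IGNORECASE)
    return [p for p in parts if p.strip()]


def make_groups(partitions, group_size):
    """Join every `group_size` consecutive partitions into one group.

    The last group may be shorter when the count is not divisible (assumption).
    """
    return [
        " ".join(partitions[i:i + group_size])
        for i in range(0, len(partitions), group_size)
    ]


def double_letter_counts(group):
    """Count adjacent identical letter pairs (aa..zz) inside a group."""
    letters = group.lower()
    counts = Counter()
    for first, second in zip(letters, letters[1:]):
        if first == second and first.isascii() and first.isalpha():
            counts[first] += 1
    return counts


def second_most_frequent(counts):
    """Return the 2nd most frequent double letter, or None if there is none."""
    ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return ranked[1][0] if len(ranked) > 1 else None


def second_mol_list(text, preposition, group_size):
    """Build the 2MOL list: one entry per group of partitions."""
    groups = make_groups(make_partitions(text, preposition), group_size)
    return [second_most_frequent(double_letter_counts(g)) for g in groups]


sample = "see the moon and feel the deep pool on all bees buzz on a hill"
print(second_mol_list(sample, "on", 1))
```

With the sample sentence and a group size of 1, the first group contains "ee" three times and "oo" twice, so its 2MOL entry is "o"; the word "moon" is not treated as a split point because the preposition is matched only as a whole word.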

The watermark (W) is first split into an image part (WImg) and a text part (WTxt). WImg is converted to letters, giving an alphabetical watermark (WT). Then, depending on the preposition (Pr) and group size (GS) supplied by the user (the partial key), partitions and groups are formed. In the next step, the occurrences of each double letter are counted in each group, and the second most frequently occurring double letter in each group is identified (2MOL). The key generator then produces the author key (AK) from the watermark (W) and the 2MOL list, as shown in the algorithm. This author key is registered with the Certifying Authority (CA) along with the watermark, original text, current date, and time.
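The key generation step (steps 9 to 13 of the embedding listing) can be sketched as below, together with the inverse lookup so that the round trip can be checked. This is one possible reading of the listing, not the authors' code. The assumptions are: Merge is plain concatenation of WT and WTxt, the key body is a flat list of (flag, value) pairs with one pair per watermark letter, the first matching group is used when a letter appears more than once in the 2MOL list, and the key header (preposition, group size, shift) is stored elsewhere. The image-to-text normalization is not modeled; WT is taken as given. All names are hypothetical.

```python
import string

ALPHABET = string.ascii_lowercase


def preprocess_text_watermark(raw):
    """Keep letters only, lowercased, so the watermark is purely alphabetical."""
    return "".join(ch for ch in raw.lower() if ch in ALPHABET)


def generate_key(watermark, second_mol, shift):
    """Build the key body as flat (flag, value) pairs, one pair per letter."""
    key = []
    for letter in watermark:
        if letter in second_mol:
            # Flag 0: the letter is the 2MOL of some group; store that group index.
            key.extend([0, second_mol.index(letter)])
        else:
            # Flag 1: store the letter shifted by k modulo 26.
            key.extend([1, (ALPHABET.index(letter) + shift) % 26])
    return key


def extract_watermark(key, second_mol, shift):
    """Invert generate_key using the 2MOL list recomputed from the text."""
    letters = []
    for i in range(0, len(key), 2):
        flag, value = key[i], key[i + 1]
        if flag == 0:
            # May be None if tampering removed the group's double letters.
            letters.append(second_mol[value] or "?")
        else:
            letters.append(ALPHABET[(value - shift) % 26])
    return "".join(letters)


second_mol = ["o", "l", None]            # example 2MOL list, one entry per group
wt = "ab"                                # stand-in for the image converted to text
wtxt = preprocess_text_watermark("Lo-ol 1")
watermark = wt + wtxt                    # Merge(WT, WTxt), assumed concatenation
key = generate_key(watermark, second_mol, shift=3)
print(key)
print(extract_watermark(key, second_mol, shift=3))
```

Under this reading, letters stored with flag 1 are recovered from the key alone, while letters stored with flag 0 depend on the 2MOL list recomputed from the text, so tampering that changes a group's second most frequent double letter changes the extracted letter.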

Figure 1. Watermark embedding process.

B. Extraction Algorithm
The algorithm used to extract the watermark from the watermarked text is known as the extraction algorithm. It takes the author key and watermarked text as input and extracts the watermark (image-plus-text) from the text. The algorithm is

kept with the Certifying Authority that uses it to resolve copyright issues, if any, at a later stage. The detailed extraction algorithm is as follows:

1. Input AK and T.
2. Read Pr from AK and set counter = 1.
3. Make partitions of T based on Pr.
4. Make groups of text based on GS, i.e.
   Number of groups = Number of partitions / GS.
5. Count the occurrences of double letters in each group and find the second largest occurring double letter.
6. Populate the 2MOL (2nd Maximum Occurring Letter) list for each group.
7. L = length(AK), I = 6
8. While (I < L) repeat steps 9 to 10.
9. If (AK(I) equals 0)
       W(I) = groupnumber(2MOL)
   else
       W(I) = AK(I+1), i.e. the cipher letter
10. I = I + 1
11. Split W into WImg and WTxt.
12. Output WImg and WTxt.

In the extraction algorithm, the text is partitioned using the preposition (Pr) from the author key (AK). The partitions are then combined to make text groups, as done previously in the embedding algorithm. Afterwards, the occurrences of double letters in each group are analyzed, and the second maximum occurring letter (2MOL) in each group is identified. The contents of the author key (AK) are then used to obtain the watermark from the text. The extraction process is the reverse of the process in Figure 1, with the extraction algorithm taking the place of the embedding algorithm.

IV. EXPERIMENTAL RESULTS

We used 20 samples of variable size text as in [25] and [20] for experiments. Group size was kept at 2, 3, 5 and 10 for Small Size Text (SST), Medium Size Text (MST), Large Size Text (LST) and Very Large Size Text (VLST) respectively. The preposition ‘on’ was used in all experiments. The combined image-plus-text watermark used in the experiments is shown in Figure 2.

National University of Computer and Emerging Sciences, Islamabad, Pakistan
Figure 2. Original Watermark (Image and text)

We evaluated the performance of the algorithm under both localized and dispersed tampering attacks, where tampering means random insertion, deletion, and re-ordering of words in the text. The tampering volume was kept the same as in [20]. The average accuracy of the extracted watermark under localized tampering attack is shown in Table I and Figure 3.

TABLE I. ACCURACY OF EXTRACTED WATERMARK (IMAGE, TEXT AND OVERALL) UNDER LOCALIZED TAMPERING ATTACK

Text        Accuracy of extracted watermark
category    Image     Text      Overall
SST         97.19%    85.31%    91.25%
MST         97.36%    81.88%    89.62%
LST         97.22%    90.00%    93.61%
VLST        97.13%    92.50%    94.82%

Figure 3. Accuracy of extracted watermark under localized tampering attack on all text samples

The accuracy of the extracted watermark is always greater than 80%. The textual watermark is more sensitive to tampering attacks than the image watermark; hence the accuracy of the text part is lower than that of the image part. However, the combined accuracy is around 90%. Experiments were also performed under dispersed tampering attacks on all text samples, and the percentage accuracy of the extracted watermark for image and text is shown in Table II and Figure 4.

TABLE II. ACCURACY OF EXTRACTED WATERMARK (IMAGE, TEXT AND OVERALL) UNDER DISPERSED TAMPERING ATTACK

Text        Accuracy of extracted watermark
category    Image     Text      Overall
SST         95.53%    90.00%    92.76%
MST         95.68%    86.88%    91.28%
LST         95.61%    89.06%    92.34%
VLST        95.37%    95.94%    95.65%

The accuracy of the extracted watermark under dispersed tampering attacks is also greater than 85% in most of the cases. The accuracy of the textual watermark is good in small size text and very large size text. However, the overall accuracy of the extracted watermark is always greater than 80%.

The image watermark is resilient to dispersed tampering attacks, since its average accuracy is above 95% in every text category. It can also be clearly observed that the overall accuracy is above 90% in every text category.
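The paper does not state how the Overall column of Tables I and II is computed. The reported figures are consistent with the arithmetic mean of the Image and Text columns, and the short check below illustrates that observation. It is a reading of the published numbers, not a definition given by the authors; the dictionary and function names are hypothetical.

```python
# (image, text, overall) accuracy in percent, copied from Tables I and II.
TABLE_I_LOCALIZED = {
    "SST": (97.19, 85.31, 91.25),
    "MST": (97.36, 81.88, 89.62),
    "LST": (97.22, 90.00, 93.61),
    "VLST": (97.13, 92.50, 94.82),
}
TABLE_II_DISPERSED = {
    "SST": (95.53, 90.00, 92.76),
    "MST": (95.68, 86.88, 91.28),
    "LST": (95.61, 89.06, 92.34),
    "VLST": (95.37, 95.94, 95.65),
}


def largest_gap(table):
    """Largest absolute difference between mean(image, text) and overall."""
    return max(
        abs((image + text) / 2 - overall)
        for image, text, overall in table.values()
    )


for name, table in [("Table I", TABLE_I_LOCALIZED), ("Table II", TABLE_II_DISPERSED)]:
    # A gap of at most 0.01 percentage points is within two-decimal rounding.
    print(name, "consistent with mean:", largest_gap(table) <= 0.01)
```

Every row agrees with the mean to within rounding at two decimal places, which suggests the image and text parts are weighted equally in the overall figure.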

Figure 4. Accuracy of extracted watermark under dispersed tampering attack on all text samples

We have adopted a novel approach in text watermarking where image and text are combined to form the watermark. There is no previous work on a combined image-plus-text watermark, so comparison is not possible. Also, there is no benchmark text available to facilitate comparison.

V. CONCLUSION

Text watermarking methods proposed so far for English language text use either an image watermark or a textual watermark. The existing text watermarking algorithms are not robust against random tampering attacks. Watermarks composed of both image and text make the text secure and provide better robustness. We have developed a text watermarking algorithm which uses a combined image-plus-text watermark to watermark the text document. The image and text parts of the watermark can later be identified separately to prove ownership. We evaluated the performance of the algorithm under localized and dispersed random tampering attacks on 20 texts. The results show that the algorithm using a text-plus-image watermark is robust, secure, and efficient against random tampering attacks.

ACKNOWLEDGMENT

Z. Jalil, 041-101673-Cu-014 would like to acknowledge the Higher Education Commission of Pakistan for providing the funding and resources to complete this work under the Indigenous Fellowship Program.

REFERENCES

[1] A. Khan, A. M. Mirza and A. Majid, “Optimizing Perceptual Shaping of a Digital Watermark Using Genetic Programming”, Iranian Journal of Electrical and Computer Engineering, vol. 3, pp. 144-150, 2004.
[2] J. T. Brassil, S. Low, N. F. Maxemchuk, and L. O’Gorman, “Electronic Marking and Identification Techniques to Discourage Document Copying”, IEEE Journal on Selected Areas in Communications, vol. 13, no. 8, pp. 1495-1504, October 1995.
[3] J. T. Brassil, S. Low, and N. F. Maxemchuk, “Copyright Protection for the Electronic Distribution of Text Documents”, Proceedings of the IEEE, vol. 87, no. 7, pp. 1181-1196, July 1999.
[4] N. F. Maxemchuk and S. H. Low, “Performance Comparison of Two Text Marking Methods”, IEEE Journal of Selected Areas in Communications (JSAC), vol. 16, no. 4, pp. 561-572, May 1998.
[5] N. F. Maxemchuk, “Electronic Document Distribution”, AT&T Technical Journal, pp. 73-80, September 1994.
[6] N. F. Maxemchuk and S. Low, “Marking Text Documents”, Proceedings of the IEEE International Conference on Image Processing, Washington, DC, pp. 13-16, Oct. 26-29, 1997.
[7] D. Huang and H. Yan, “Interword distance changes represented by sine waves for watermarking text images”, IEEE Trans. Circuits and Systems for Video Technology, vol. 11, no. 12, pp. 1237-1245, Dec. 2001.
[8] M. J. Atallah, C. McDonough, S. Nirenburg, and V. Raskin, “Natural Language Processing for Information Assurance and Security: An Overview and Implementations”, Proceedings 9th ACM/SIGSAC New Security Paradigms Workshop, Cork, Ireland, pp. 51-65, September 2000.
[9] M. J. Atallah, et al., “Natural language watermarking: Design, analysis, and a proof-of-concept implementation”, Proceedings of the Fourth Information Hiding Workshop, vol. LNCS 2137, Pittsburgh, PA, 25-27 April 2001.
[10] Hasan M. Meral, et al., “Natural language watermarking via morphosyntactic alterations”, Computer Speech and Language, 23, 107-125, 2009.
[11] Hasan M. Meral, et al., “Syntactic tools for text watermarking”, 19th SPIE Electronic Imaging Conf. 6505: Security, Steganography, and Watermarking of Multimedia Contents, San Jose, Jan. 2007.
[12] M. Topkara, C. M. Taskiran, and E. Delp, “Natural language watermarking”, Proceedings of the SPIE International Conference on Security, Steganography, and Watermarking of Multimedia Contents VII, 2005.
[13] U. Topkara, M. Topkara, and M. J. Atallah, “The Hiding Virtues of Ambiguity: Quantifiably Resilient Watermarking of Natural Language Text through Synonym Substitutions”, Proceedings of ACM Multimedia and Security Conference, Geneva, 2006.
[14] Xingming Sun and Alex Jessey Asiimwe, “Noun-Verb Based Technique of Text Watermarking Using Recursive Decent Semantic Net Parsers”, Lecture Notes in Computer Science (LNCS) 3612: 958-961, Springer Press, August 2005.
[15] M. Topkara, U. Topkara, and M. J. Atallah, “Information hiding through errors: a confusing approach”, Proceedings of SPIE International Conference on Security, Steganography, and Watermarking of Multimedia Contents IX, San Jose, CA, 2007.
[16] B. Macq and O. Vybornova, “A method of text watermarking using presuppositions”, Proceedings of the SPIE International Conference on Security, Steganography, and Watermarking of Multimedia Contents, January 2007.
[17] P. Lu, Z. Lu, and J. Gu, “An optimized natural language watermarking algorithm based on TMR”, Proceedings of 9th International Conference for Young Computer Scientists, 2009.
[18] Z. Jalil and A. M. Mirza, “A Novel Text Watermarking Algorithm Based on Double Letters”, unpublished.
[19] Z. Jalil and A. M. Mirza, “A Preposition based Algorithm for Copyright Protection of Text Documents”, unpublished.
