Authorized licensed use limited to: Sethu Institute of Technology. Downloaded on July 06,2010 at 07:25:33 UTC from IEEE Xplore. Restrictions apply.
watermark the text using the linguistic semantic phenomena of presuppositions [16]. An algorithm based on text meaning representation (TMR) strings has also been proposed [17]. The structural approach is the most recent approach used for copyright protection of text documents. A text watermarking algorithm for copyright protection that uses the occurrences of double letters (aa-zz) in the text to embed the watermark has recently been proposed [18]. Another algorithm, which uses prepositions in addition to double letters to watermark text, has also been proposed recently [19].

Text watermarking algorithms based on binary text images are not robust against the re-typing attack. Text watermarking methods based on semantics are language dependent. Synonym-based techniques are not resilient to random synonym substitution attacks. The structural algorithms are not applicable to all types of text documents, and they are restricted to either an alphabetical watermark or an image watermark. To increase robustness, it is better to use a combined image-plus-text watermark instead of a plain textual or image watermark. Hence, we propose a text watermarking algorithm that uses a combined image-plus-text watermark.

III. PROPOSED ALGORITHM
The proposed algorithm uses a combined image-plus-text watermark to ensure robustness. The occurrences of double letters in the text are utilized to embed the watermark, as in [20]. The original copyright owner of the text logically embeds the watermark in it and generates a watermark key. The watermarking process involves two stages: watermark embedding and watermark extraction. Watermark embedding is done by the original author, and extraction is done later by the Certifying Authority on the author's behalf to prove ownership.

A. Embedding Algorithm
The algorithm that embeds the watermark in the text is called the embedding algorithm. It takes the combined image-plus-text watermark as input and performs preprocessing of the image and the text. The embedding process is shown in figure 1.

In figure 1, preprocessing of the text includes discarding whitespace, special characters, digits, etc., thus making the watermark purely alphabetical. Preprocessing of the image means converting it to grayscale and scaling it to a standard size (100 x 100 pixels). Afterwards, the image is converted to plain text by a normalization process. The textual watermark and a partial key containing a preposition, a group size and a shift size are given as input to the embedding algorithm.

A watermark key is generated representing the inherent properties of the text. After watermarking the text, the author registers the key with the Certifying Authority.

The detailed watermark embedding algorithm is as follows:

1. Input W, GS, Pr and T.
2. Split W into WImg and WTxt.
3. Preprocess WImg and WTxt.
4. Convert WImg to WT.
5. Make partitions of T based on Pr.
6. Make groups of text based on GS, where No. of groups = No. of partitions / GS.
7. Count the occurrences of double letters in each group and find the second largest occurring double letter.
8. Populate the 2MOL (2nd Maximum Occurring Letter) list for each group.
9. W = Merge(WT, WTxt)
10. While (j < watermark_length), repeat steps 11 to 12:
11. if (wj ∈ 2MOL list)
        Key(i) = 0, Key(i+1) = groupnumber(2MOL)
    else
        Key(i) = 1, Key(i+1) = (wj + k) MOD 26,
        where k ∈ Z26 and Z26 represents the 26 alphabets (a-z)
12. Increment i
13. Output AK

W: watermark, WImg: image watermark, WTxt: text watermark, GS: group size, Pr: preposition, T: text file, WT: alphabetical watermark obtained from WImg, AK: author key
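The text-side preprocessing and steps 5 and 6 above can be sketched in Python. This is a minimal sketch under stated assumptions, not the authors' implementation: the function names are hypothetical, the watermark is lower-cased so that its letters map onto Z26, a partition is assumed to be the text between consecutive whole-word occurrences of Pr, and partitions that do not fill a complete group are assumed to be dropped so that the number of groups is the integer part of No. of partitions / GS. Image preprocessing and the normalization of WImg into WT are not shown, because the text does not specify the normalization.

```python
import re
from typing import List


def preprocess_text_watermark(wtxt: str) -> str:
    """Discard whitespace, digits and special characters, leaving a purely
    alphabetical watermark. Lower-casing is an assumption (maps letters onto Z26)."""
    return "".join(ch for ch in wtxt.lower() if "a" <= ch <= "z")


def make_partitions(text: str, pr: str) -> List[str]:
    """Step 5 (assumed reading): split T at every whole-word,
    case-insensitive occurrence of the preposition Pr."""
    return re.split(r"\b" + re.escape(pr) + r"\b", text, flags=re.IGNORECASE)


def make_groups(partitions: List[str], gs: int) -> List[str]:
    """Step 6: join each run of GS consecutive partitions into one group.
    No. of groups = No. of partitions // GS; leftovers are dropped (assumption)."""
    n_groups = len(partitions) // gs
    # Joining with a space keeps partition boundaries from creating
    # artificial double letters in the next step.
    return [" ".join(partitions[g * gs:(g + 1) * gs]) for g in range(n_groups)]
```

For example, with Pr = "of", the text "The book of the man of the hill" yields three partitions, and with GS = 2 these form a single group.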
The watermark (W) is first split into an image part (WImg) and a text part (WTxt). WImg is then converted to alphabets, giving an alphabetical watermark (WT). Next, partitions and groups are formed depending on the preposition (Pr) and group size (GS) input by the user (the partial key). The occurrences of each double letter are then counted in each group, and the 2nd largest occurring double letter in each group is identified (2MOL). The key generator uses the watermark (W) and the 2MOL list, as shown in the algorithm, to generate the author key (AK). This author key is then registered with the Certifying Authority (CA) along with the watermark, the original text, and the current date and time.
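The 2MOL and key-generation steps (steps 7 to 13 of the embedding algorithm) can also be sketched. This is a hedged illustration, not the authors' code; the function names are hypothetical and several details the text leaves open are assumptions: a double letter is two adjacent identical letters (counted with overlap, so "eee" counts twice), ties in the count are broken alphabetically, a group with fewer than two distinct double letters has no 2MOL, Merge(WT, WTxt) is plain concatenation, group numbers are 1-based and refer to the first group whose 2MOL matches, and AK is represented as a dict holding Pr, GS, the shift k and the list of (Key(i), Key(i+1)) pairs.

```python
from collections import Counter
from typing import Dict, List, Optional, Tuple


def double_letter_counts(group: str) -> Counter:
    """Step 7: count adjacent identical letter pairs (aa..zz) in one group."""
    s = group.lower()
    return Counter(a for a, b in zip(s, s[1:]) if a == b and "a" <= a <= "z")


def second_max_letter(group: str) -> Optional[str]:
    """2MOL of one group: the letter of the 2nd most frequent double letter.
    Ties are broken alphabetically (assumption); None if fewer than two doubles."""
    ranked = sorted(double_letter_counts(group).items(), key=lambda kv: (-kv[1], kv[0]))
    return ranked[1][0] if len(ranked) >= 2 else None


def generate_author_key(wt: str, wtxt: str, groups: List[str],
                        k: int, pr: str, gs: int) -> Dict:
    """Steps 8-13; wt and wtxt are assumed to be preprocessed (lowercase a-z)."""
    two_mol = [second_max_letter(g) for g in groups]  # step 8: 2MOL list
    w = wt + wtxt                                     # step 9: Merge assumed = concatenation
    pairs: List[Tuple[int, int]] = []
    for wj in w:                                      # steps 10-12
        if wj in two_mol:
            # Key(i) = 0, Key(i+1) = 1-based number of the first group whose 2MOL is wj
            pairs.append((0, two_mol.index(wj) + 1))
        else:
            # Key(i) = 1, Key(i+1) = (wj + k) mod 26, with a = 0, ..., z = 25
            pairs.append((1, (ord(wj) - ord("a") + k) % 26))
    return {"Pr": pr, "GS": gs, "k": k, "pairs": pairs}  # step 13: AK
```

Storing each (Key(i), Key(i+1)) as a pair makes explicit that every watermark letter contributes two key entries; note that the text itself is never modified, only the key is produced.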
Figure 1. Watermark embedding process.

B. Extraction Algorithm
The algorithm used to extract the watermark from the watermarked text is known as the extraction algorithm. It takes the author key and the watermarked text as input and extracts the watermark (image-plus-text) from the text. The algorithm is kept with the Certifying Authority, which uses it to resolve copyright issues, if any, at a later stage. The detailed extraction algorithm is as follows:

1. Input AK and T.
2. Read Pr from AK and set counter = 1.

[20]. The average accuracy of the extracted watermark under the localized tampering attack is shown in Table I and figure 3.

TABLE I. ACCURACY OF EXTRACTED WATERMARK (IMAGE, TEXT AND OVERALL) UNDER LOCALIZED TAMPERING ATTACK
Figure 4. Accuracy of extracted watermark under dispersed tampering attack on all text samples

We have adopted a novel approach to text watermarking in which image and text are combined to form the watermark. There is no previous work on combined image-plus-text watermarks, so a comparison is not possible. Also, no benchmark text is available to facilitate comparison.

V. CONCLUSION
The text watermarking methods for English text proposed so far use either an image watermark or a textual watermark, and the existing text watermarking algorithms are not robust against random tampering attacks. Watermarks composed of both image and text make the text secure and give better robustness. We have developed a text watermarking algorithm that uses a combined image-plus-text watermark to watermark a text document. The watermark can later be identified separately to prove ownership. We evaluated the performance of the algorithm under localized and dispersed random tampering attacks on 20 texts. The results show that the algorithm using text-plus-image watermarks is more robust, secure and efficient against random tampering attacks.

ACKNOWLEDGMENT
Z. Jalil (041-101673-Cu-014) would like to acknowledge the Higher Education Commission of Pakistan for providing the funding and resources to complete this work under the Indigenous Fellowship Program.

REFERENCES
[1] A. Khan, A. M. Mirza, and A. Majid, “Optimizing Perceptual Shaping of a Digital Watermark Using Genetic Programming”, Iranian Journal of Electrical and Computer Engineering, vol. 3, pp. 144-150, 2004.
[2] J. T. Brassil, S. Low, N. F. Maxemchuk, and L. O’Gorman, “Electronic Marking and Identification Techniques to Discourage Document Copying”, IEEE Journal on Selected Areas in Communications, vol. 13, no. 8, pp. 1495-1504, October 1995.
[3] J. T. Brassil, S. Low, and N. F. Maxemchuk, “Copyright Protection for the Electronic Distribution of Text Documents”, Proceedings of the IEEE, vol. 87, no. 7, pp. 1181-1196, July 1999.
[4] N. F. Maxemchuk and S. H. Low, “Performance Comparison of Two Text Marking Methods”, IEEE Journal on Selected Areas in Communications, vol. 16, no. 4, pp. 561-572, May 1998.
[5] N. F. Maxemchuk, “Electronic Document Distribution”, AT&T Technical Journal, pp. 73-80, September 1994.
[6] N. F. Maxemchuk and S. Low, “Marking Text Documents”, Proceedings of the IEEE International Conference on Image Processing, Washington, DC, pp. 13-16, Oct. 26-29, 1997.
[7] D. Huang and H. Yan, “Interword distance changes represented by sine waves for watermarking text images”, IEEE Trans. Circuits and Systems for Video Technology, vol. 11, no. 12, pp. 1237-1245, Dec. 2001.
[8] M. J. Atallah, C. McDonough, S. Nirenburg, and V. Raskin, “Natural Language Processing for Information Assurance and Security: An Overview and Implementations”, Proceedings of the 9th ACM/SIGSAC New Security Paradigms Workshop, Cork, Ireland, pp. 51-65, September 2000.
[9] M. J. Atallah et al., “Natural language watermarking: Design, analysis, and a proof-of-concept implementation”, Proceedings of the Fourth Information Hiding Workshop, LNCS 2137, Pittsburgh, PA, 25-27 April 2001.
[10] H. M. Meral et al., “Natural language watermarking via morphosyntactic alterations”, Computer Speech and Language, vol. 23, pp. 107-125, 2009.
[11] H. M. Meral et al., “Syntactic tools for text watermarking”, 19th SPIE Electronic Imaging Conf. 6505: Security, Steganography, and Watermarking of Multimedia Contents, San Jose, Jan. 2007.
[12] M. Topkara, C. M. Taskiran, and E. Delp, “Natural language watermarking”, Proceedings of the SPIE International Conference on Security, Steganography, and Watermarking of Multimedia Contents VII, 2005.
[13] U. Topkara, M. Topkara, and M. J. Atallah, “The Hiding Virtues of Ambiguity: Quantifiably Resilient Watermarking of Natural Language Text through Synonym Substitutions”, Proceedings of the ACM Multimedia and Security Conference, Geneva, 2006.
[14] X. Sun and A. J. Asiimwe, “Noun-Verb Based Technique of Text Watermarking Using Recursive Descent Semantic Net Parsers”, Lecture Notes in Computer Science (LNCS) 3612, pp. 958-961, Springer, August 2005.
[15] M. Topkara, U. Topkara, and M. J. Atallah, “Information hiding through errors: a confusing approach”, Proceedings of the SPIE International Conference on Security, Steganography, and Watermarking of Multimedia Contents IX, San Jose, CA, 2007.
[16] B. Macq and O. Vybornova, “A method of text watermarking using presuppositions”, Proceedings of the SPIE International Conference on Security, Steganography, and Watermarking of Multimedia Contents, January 2007.
[17] P. Lu, Z. Lu, and J. Gu, “An optimized natural language watermarking algorithm based on TMR”, Proceedings of the 9th International Conference for Young Computer Scientists, 2009.
[18] Z. Jalil and A. M. Mirza, “A Novel Text Watermarking Algorithm Based on Double Letters”, unpublished.
[19] Z. Jalil and A. M. Mirza, “A Preposition based Algorithm for Copyright Protection of Text Documents”, unpublished.