
Owner Verification and Copyright Protection for Relational Data Using Digital Watermarking

S. A. Shah, S.A.M. Gilani, I. A. Awan


Faculty of Computer Science & Engineering GIK Institute, Topi Pakistan
Email: sarif, asif, imtiazawan@giki.edu.pk

ABSTRACT

A novel technique for copyright protection and ownership verification of relational databases through watermarking is introduced in this paper. A new embedding channel is identified in which the watermark is embedded in such a way that the semantic meaning of the data is preserved. The technique does not introduce any distortion in the actual data while embedding the watermark. To make the scheme more secure, tuples are selected using a secret key known only to the owner; the same key is later used in the detection process. Moreover, the original data is not needed for watermark detection, so it is a fully blind scheme. Experiments show the method to be resilient to various kinds of attacks. Using an implementation running on a SQL Server database, it is shown that the performance of our algorithms allows for their use in real-world applications.

Index Terms: - Right protection, relational data, watermarking.

1. INTRODUCTION
Most data today are in digital formats, which are easy to reproduce and modify without leaving any trace of manipulation. This has made the piracy of digital data (software, images, video, audio, and text) very easy, so copyright protection has been a concern for the owners of these digital assets. Different protection schemes have been proposed. A number of technologies have been developed to provide data protection, including cryptography and steganography [11]. Cryptography protects data by encrypting them so that the encrypted data are meaningless to an attacker. However, once the encrypted data are decrypted, the data are in the clear and are no longer under protection. A newer technology, the insertion of digital watermarks into the data [5], complements cryptography. A digital watermarking technique embeds a watermark, including a signature or a copyright message such as a trade logo, a seal, or a sequence number, into an image. Subsequently, the watermark can be extracted or detected from the watermarked image and used to verify ownership. The watermarks are chosen so as to have an insignificant impact on the usefulness of the data and are placed in such a way that a malicious user cannot destroy them without making the data significantly less useful [1].
Although watermarking does not prevent illegal copying, it deters such copying by providing a means for establishing the original ownership of a redistributed copy [1].
Traditionally the focus of watermarking for rights protection has been on multimedia; more recently it is shifting towards other data types such as text, software and, most recently, relational data [1, 12]. The ever-increasing use of outsourced relational data demands more effective ways of rights protection, especially in areas where sensitive, valuable data is outsourced. Examples are data mining applications (financial data, oil drilling data, sales pattern databases, etc.), where a set of data is usually produced or collected by one party and sold in pieces to others, so it becomes necessary to associate owner rights with it. The Internet is also exerting tremendous pressure on data providers to create services that allow users to search and access databases remotely, which exposes the data providers to the threat of data theft. Providers are therefore demanding technology for identifying pirated copies of their databases. Watermarking can be used to solve this problem. In watermarking, a rights witness in the form of a watermark is concealed within the actual data in such a way that the alteration caused by the watermark does not destroy the value of the work, and that it is difficult for a malicious attacker to detect and destroy the watermark without causing data loss [10, 7].
There is a variety of techniques for image watermarking, which were later extended to audio and video [10], but these do not work for relational databases because of the following major differences.
• Data in a database has a defined semantic meaning, and sometimes a minor change of even a single bit can destroy its value.
• A multimedia object has considerable redundancy. A database relation consists of tuples with little or no redundancy.

252 IMECS 2006


• The relative spatial/temporal positioning of the various pieces of a multimedia object typically does not change. Among the tuples of a relation there is no implied ordering.
• Portions of a multimedia object cannot be dropped or replaced arbitrarily without causing perceptual changes in the object. However, the pirate of a relation can simply drop some tuples or substitute them with tuples from other relations.

The paper is organized as follows. In section 2 related work is discussed. Section 3 discusses our scheme in detail. Section 4 analyzes the main features of our scheme. In section 5 different attacks are discussed; the experimental setup and results are also outlined in section 5. Section 6 concludes.

2. RELATED WORK.
Work on database watermarking has started only recently. Agrawal and Kiernan first presented a robust watermarking scheme for databases [1]. In their scheme, it is assumed that attributes are numeric and can tolerate modifications of some least significant bits. First, tuples are selected for watermark embedding. Then an attribute is selected, and finally selected bits of that attribute are modified in order to embed the watermark bits. The whole selection process is based on a secret embedding key. The scheme is claimed to be robust against a wide range of attacks, including the rounding attack, the subset attack, and the additive attack. Since the watermark detector can only determine whether the related watermark is embedded or not, effectively one bit of information is embedded.
Radu Sion et al. [2] present a different approach to robust watermarking of databases. Unlike Agrawal's tuple-based watermarking scheme, Sion's scheme is subset based: all tuples are securely divided into non-intersecting subsets. A single watermark bit is embedded into the tuples of a subset by modifying the distribution of tuple values. The same watermark bit is embedded repeatedly across several subsets, and majority voting is employed to recover the embedded bits. In addition to the subset attack, this scheme is claimed to be robust against other attacks such as data resorting and data transformation. However, the scheme is not suitable for database relations that need frequent updates, since it is very expensive to re-watermark the updated relations. In the aforementioned watermarking schemes, attributes are assumed to be numeric and able to tolerate small modifications, and watermarks are embedded by modifying attribute values. In [4], Sion proposes a robust watermarking scheme for categorical data. As in Agrawal's scheme, some tuples are first selected according to a secret key. Then the categorical attributes of the selected tuples are changed to other values in the set of possible categorical attribute values, based on the embedded watermark. For example, the color of a product may be changed from red to green. Although only a small part of the selected tuples is affected by the watermark embedding, such modifications of categorical attributes may be too significant to be acceptable in certain applications.

3. OUR SCHEME
Most of the above-mentioned schemes are based on the manipulation of numeric data (which must have some margin of error), which limits their domain [1, 2, 4, 9]. The schemes discussed above also introduce distortion in the contents by changing attribute values, which is often not desirable.
In our scheme we introduce a new embedding channel: we embed the watermark in non-numeric data or, more precisely, in alphabetic data attributes. Since database queries are case insensitive, changing the case from small to capital letters or vice versa does not affect the semantic meaning of the data. We exploit this inherent property of such data attributes.
We now present our technique for watermarking a relational database. The technique marks only alphabetic attributes, without introducing any change in their semantic meaning. Not all attributes need to be watermarked; the data owner decides which attributes are most suitable for watermarking. Let R be the database relation with schema R(P, A0, ..., Av-1), where P is the primary key attribute. Figure 1 shows the important parameters used in our algorithms.

Symbol   Meaning
r        Row or record of a relation
K        Secret key known only to the owner
n        Number of tuples in the relation
v        Number of attributes in the tuple
t        Number of tuples to be marked
g        Gap parameter: about 1/g of the tuples are watermarked
Fig. 1 Notations and Parameters

For simplicity, assume that all the attributes are candidates for watermarking. The parameter g is used to



determine the number of tuples to be watermarked. If t denotes the number of tuples (and hence attribute values) to be marked, then
t ≈ n / g
r.Ai is used to denote the value of attribute Ai in tuple r. Since this technique uses a Hashed Message Authentication Code (HMAC), a brief review is presented in the next section.

3.1 Hashed Message Authentication Code
HMAC uses a cryptographic one-way hash function H [6, 11]. H takes an input of arbitrary size and returns a fixed-length hash value h, i.e. h = H(input). Its important characteristic is that, given the input, it is easy to compute h, but given h it is hard to compute the input back; that is why it is called a one-way hash function. There are a number of such hash functions, e.g. MD4, MD5, SHA1, SHA256. A Message Authentication Code (MAC) is computed using a one-way hash function that depends on a key. If K denotes the key, then H randomizes the input primary key r.P of relation R when H is seeded with K, which is known only to the owner. The following MAC is used in our scheme:
MAC = H(K, r.P)
where we use SHA-1 as the one-way hash function H.

Watermark embedding algorithm (Figure 2):

1. foreach tuple r ∈ R do steps 2 to 5
2.    PrimaryKeyHash pHash = H(K, r.P)
3.    if (pHash mod g == 0) then            // select this tuple
4.       attribute_index i = pHash mod v    // select attribute r.Ai
5.       append r.Ai to the selected attribute array selectedA[ ]
6. Sort the selected tuples using pHash
7. t = length of selectedA[ ]
8. Generate a watermark w of size t as a random binary sequence using key K
9. Call embedwm(selectedA[ ], w)

// Embed the watermark in the values of attributes Ai
embedwm subroutine:
For j = 0 to t-1 repeat steps 10 to 14
10. r.Ai = selectedA[j]
11. Case I: r.Ai is a single word
12.    if w[j] == 1 then change its case to Title Case          // if not already so
       else if w[j] == 0 then change its case to capitals
13. Case II: r.Ai has multiple words
14.    if w[j] == 1 then change the whole text to Title Case    // if not already so
       else if w[j] == 0 then change it to Sentence case
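The steps of Figure 2 can be sketched in Python as follows. This is only an illustrative sketch under stated assumptions, not the implementation used in the experiments: rows are assumed to be dictionaries, the primary-key column name `id` and all function names are hypothetical, H(K, r.P) is realized with Python's `hmac` module and SHA-1, and `random.Random` merely stands in for the key-seeded generator (it is not cryptographically secure).

```python
import hashlib
import hmac
import random


def pk_hash(key: bytes, primary_key: str) -> int:
    """H(K, r.P): HMAC-SHA1 of the primary key, read as a big integer."""
    digest = hmac.new(key, primary_key.encode("utf-8"), hashlib.sha1).digest()
    return int.from_bytes(digest, "big")


def select_cells(rows, key, g, attrs, pk="id"):
    """Steps 1-6: keep tuples with pHash mod g == 0, pick one attribute each."""
    selected = []
    for row in rows:
        h = pk_hash(key, str(row[pk]))
        if h % g == 0:
            selected.append((h, row, attrs[h % len(attrs)]))
    selected.sort(key=lambda item: item[0])  # order by pHash
    return selected


def watermark_bits(key: bytes, t: int):
    """Step 8: key-seeded bit sequence (assumption: stand-in for a secure generator)."""
    rng = random.Random(key)
    return [rng.randint(0, 1) for _ in range(t)]


def apply_case(text: str, bit: int) -> str:
    """Steps 11-14: encode one bit in the letter case of a value."""
    if len(text.split()) == 1:
        return text.title() if bit == 1 else text.upper()
    return text.title() if bit == 1 else text.capitalize()  # Sentence case


def embed(rows, key, g, attrs, pk="id"):
    """Mark the rows in place and return the embedded bit sequence."""
    selected = select_cells(rows, key, g, attrs, pk)
    bits = watermark_bits(key, len(selected))
    for (_, row, attr), bit in zip(selected, bits):
        if row[attr]:  # null or empty values are left unmarked
            row[attr] = apply_case(row[attr], bit)
    return bits
```

Because only the letter case changes, a case-insensitive comparison of each value before and after embedding still succeeds, which is the property the scheme relies on.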
3.2 Watermark Embedding:
Figure 2 gives the algorithm for watermark embedding. In line 8 a watermark sequence is generated based on the key. Here one can also use a company name or logo as the watermark after converting it to a binary sequence. Lines 3 and 4 determine the tuple and the attribute to be marked, respectively. The selection of both depends on the secret key of the owner, so only the owner can identify which tuple, and which attribute of that tuple, is to be marked. An attacker has to guess the tuple as well as the attribute within the tuple to destroy the watermark. The embedwm subroutine actually embeds the watermark, depending on the corresponding watermark bit. It adjusts the case of the selected attribute value according to the conditions laid down in the algorithm. The case is adjusted so as to follow the most common practice, e.g. if the watermark bit is 1 the case is converted to title case, and for 0 to sentence case (or, for a single word, to capitals).

Figure 2: Watermark embedding Algorithm. The secret key K and the parameters g and v are private to the owner.

Sometimes a database contains null values; in that case the mark is not applied. Likewise, when the text is something like an abbreviation for which a standard form exists, no change is applied.

3.3 Watermark Detection Algorithm:
Let Alice be the owner of the database and Mallory another person with a pirated copy of Alice's data. We assume that Mallory has not dropped the primary key, because dropping it may cause loss of important data. The algorithm for watermark detection is given in Figure 3. In line 3 the tuple is selected where the watermark is supposed to be embedded. Line 4 determines the marked attribute. Both are selected using the same secret key that was used during embedding. In lines 6 to 10 the watermark bits are extracted from the letter case of the selected attribute values.
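The case-to-bit rules of this section and the XOR comparison of Figure 3 can be sketched as follows. This is a minimal sketch under assumptions that are not in the paper: the function names are hypothetical, the selected values are assumed to have been gathered already in pHash order, and a value whose case matches neither rule is counted as a mismatched bit. The paper does not state a decision threshold, so the sketch only reports the mismatch ratio.

```python
from typing import List, Optional


def extract_bit(text: str) -> Optional[int]:
    """Read one watermark bit from the letter case of a value.

    Returns None when the case matches neither encoding (e.g. all lower case).
    """
    words = text.split()
    if not words:
        return None
    if len(words) == 1:
        if text.isupper():          # all capitals encode 0
            return 0
        return 1 if text.istitle() else None
    if text.istitle():              # every word capitalized encodes 1
        return 1
    rest_lower = all(w.islower() for w in words[1:])
    # Sentence case (first word capitalized, the rest lower case) encodes 0
    return 0 if words[0].istitle() and rest_lower else None


def mismatch_ratio(w: List[int], values: List[str]) -> float:
    """result_vector = w XOR we; assumes w is non-empty and as long as values."""
    we = [extract_bit(v) for v in values]
    result_vector = [1 if e is None else b ^ e for b, e in zip(w, we)]
    return sum(result_vector) / len(w)
```

A ratio of 0.0 means every expected bit was recovered; the owner would compare the ratio against a tolerance of their own choosing.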



When the attribute value has only a single word, the extracted watermark bit is 1 if the value is in title case and 0 if all its letters are capitals.

// Secret key K and parameters g, v are private to the owner.
1. foreach tuple r ∈ R do steps 2 to 4
2.    PrimaryKeyHash pHash = H(K, r.P)
3.    if (pHash mod g == 0) then select this tuple
4.       attribute_index i = pHash mod v    // select attribute Ai
5. Sort the selected tuples using pHash
// Extract watermark we
6. Repeat steps 7 to 10 for the selected tuples
7. Case I: r.Ai is a single word
8.    if (r.Ai has title case) then we[i] = 1
      else if (r.Ai has all caps) then we[i] = 0
9. Case II: r.Ai has multiple words
10.   if (whole text of r.Ai has title case) then we[i] = 1
      else if (first word of r.Ai has title case and the others lower case) then we[i] = 0
// Verify watermark
11. Generate w using key K
12. result_vector = w XOR we
Figure 3: Watermark Detection Algorithm

For an attribute value with multiple words, the watermark bit is 1 if the whole text is in title case, and 0 if all words are in lower case except the first one, which is in title case (sentence case). Watermark verification is done in lines 11 and 12, where the original watermark is first generated using the same secret key and then compared with the extracted watermark.

4. DISCUSSION
In this section we discuss some important features of our scheme related to security, detection and updatability.
4.1 Security. The watermark is generated by a cryptographically secure pseudorandom sequence generator G. G deterministically generates a sequence of numbers in which it is computationally infeasible to predict the next number in the sequence [6]. Statistically, the numbers generated by G appear to be a realized sequence of independent and identically distributed random variables, in the sense that the numbers pass standard statistical tests for these properties [8]. The values in the sequence are determined by the value of an initial seed. Given a fixed seed value, repeated executions of G generate the same fixed sequence of numbers every time. Several pseudorandom sequence generators are described in [6].
The selection of tuples and attributes is purely key based, which makes our scheme secure because the key is known only to the owner. Moreover, the use of a secure one-way hash function makes the scheme more secure.
4.2 Blind Detection. Our detection algorithm does not require the original database for watermark extraction. The watermark w is extracted from the watermarked relation and then verified, so our scheme is blind in nature. It is difficult to keep the original version of a distributed copy of a database because it requires frequent updates, so a blind technique is very helpful.

4.3 Incremental updateability.
Since the primary key attribute determines whether a tuple is marked or not, a tuple can be deleted, inserted, or updated without examining or altering any other tuple. A deleted tuple requires no processing by the watermarking algorithm. In case of insertion, the tuple is marked or not based on its primary key value. When the primary key attribute of a tuple is updated, the tuple's marking is recomputed before it is stored in the database. When a non-primary-key attribute is updated and that attribute is not selected for marking, nothing needs to be done. If, on the other hand, the algorithm has selected the attribute, the mark is applied to the attribute's value before the tuple is stored in the database.

5. ATTACKS ANALYSIS AND EXPERIMENTS
The watermarked data can be affected in various ways, through malicious attacks and benign updates. The most common attacks are:
• Tuple deletion.
• Updating attribute values.
• Changing case.

The first two attacks may affect the usability of the data, so for an attacker these kinds of attack are often less desirable. The third one is a legal attack which does not affect the usability of the data, so it is the most important one to study. We analyzed our scheme against these attacks through experiments, and the results show that it is robust against them. For the experiments we used a SQL



Server database of more than ten thousand records on the Windows XP platform. The value of g was kept at 3, and the number of watermarkable attributes v was also 3 in our sample database. In the following we present experiments involving different attacks (data loss, data alterations, change in case).

[Figure: "Effect of Tuple Deletion on Watermark"; x-axis: % of Tuples Deleted, y-axis: % Distortion in Watermark]
Fig. 4 Watermark Degrades Slowly

5.1 Data Loss Attack (Tuple deletion)
In this kind of attack we study the distortion of the watermark as the input data is subjected to a gradually increasing level of data loss. The experiment is performed repeatedly and the results are averaged over multiple runs. The average behavior is plotted in Fig. 4. It shows that with 35-40% data loss the watermark distortion is 12%.

5.2 Data Modification (Updates) Attack
The priority of Mallory (the attacker) would be to destroy the watermark while preserving the data. Given no knowledge of the secret key or the original data, the other available choice may be to attempt (minor) modifications of attribute values, hoping to destroy the watermark at some point. Since Mallory has no knowledge of the original data, the distortions already introduced by the watermark are also unknown to him, so he cannot tell which changes are truly minor. In this experiment we analyze the sensitivity of our scheme to randomly occurring changes. The observed behavior is shown in Fig. 5. The results show that only 6% distortion of the watermark is observed with 35-40% random modifications. So the scheme is even more robust against this kind of attack.

5.3 Tuple Sorting Attack.
During embedding, we sort the selected tuples by their primary key hash value before embedding the watermark, and the same process is repeated during detection, so no kind of sorting attack will harm the detection of the watermark.

[Figure: "Effect of Modification on watermark" / "Loss In Watermark Detection After Modifications"; x-axis: % Tuples Modified, y-axis: % Loss in Watermark]
Figure 5. Watermark degrades gracefully

5.4 Case Alteration Attacks. Since we are working with the case of attribute values, this kind of attack is specific to our scheme; it is also a legal attack, so robustness against it must be checked. In this experiment we randomly (and repeatedly) changed the case of attribute values. The results are averaged over various runs. As shown in Fig. 6, our scheme is very robust against this kind of attack: after changing the case of up to 60% of the tuples, the watermark distortion is less than 20%. Fig. 7 shows a comparison of all three kinds of attack. From all the experiments we can say that our scheme is very robust against modification and change-of-case attacks.

6. CONCLUSIONS AND FUTURE WORK
In this paper we introduced a new scheme for watermarking relational databases for owner verification and copyright protection. A solution is proposed by:
• Discovering a new embedding channel for watermarking.
• Building an algorithm for watermarking such that it preserves data integrity by introducing zero distortion to its semantic meaning.
We thus provide a robust watermarking technique for relational data. Through experiments, we also showed that our scheme is robust against common database attacks as well as attacks specific to our scheme only.



[Figure: "Loss In Watermark Detection After Case Alterations"; x-axis: % Tuples Altered, y-axis: % Loss in Watermark]
Figure 6

[Figure: "Different Attacks Comparison"; series: Tuple deletion, Modification, Case Alteration; x-axis: % Tuples Affected, y-axis: % Watermark Loss]
Figure 7

In future we intend to analyze and improve this scheme against other attacks such as subset and partitioning attacks. Another research direction may be to investigate a method for data authentication using fragile watermarks.

References
[1] R. Agrawal and J. Kiernan. Watermarking relational databases. In Proc. of the 28th Inter. Conf. on Very Large Data Bases, 2002.
[2] R. Sion. Proving ownership over categorical data. In Proceedings of ICDE 2004, 2004.
[3] I. J. Cox, M. Miller, and J. Bloom. Watermarking applications and properties. In Proc. International Conference on Information Technology: Coding and Computing, 2000.
[4] R. Sion, M. Atallah, and S. Prabhakar. Rights protection for relational data. In Proceedings of ACM SIGMOD 2003, 2003.
[5] I. J. Cox, M. Miller, and J. Bloom. Watermarking applications and properties. In Proc. International Conference on Information Technology: Coding and Computing, 2000.
[6] B. Schneier. Applied Cryptography, 2nd ed. Wiley, New York, 1996.
[7] I. J. Cox, J. Kilian, T. Leighton and T. Shamoon. "Secure Spread Spectrum Watermarking for Multimedia", IEEE Trans. on Image Processing, 6, 12, 1673-1687, 1997.
[8] D. Knuth. Seminumerical Algorithms. In: The Art of Computer Programming, vol. 2. Addison-Wesley, Reading, MA, 1981.
[9] Y. Li, V. Swarup, and S. Jajodia. A robust watermarking scheme for relational data. In Proc. The 13th workshop on information technology and engineering, pages 195-200, December 2003.
[10] I. Cox, M. Miller, and J. Bloom. Digital Watermarking. Morgan Kaufmann, 2002.
[11] N. Ferguson and B. Schneier. Practical Cryptography. Wiley & Sons, 2003.
[12] J. Brassil and L. O'Gorman. Watermarking Document Images with Bounding Box Expansion, in Anderson [1], pp. 227-235.

