Tamper Detection and Localization for Categorical Data

Using Fragile Watermarks

Yingjiu Li (1), Huiping Guo (2), Sushil Jajodia (2)

(1) School of Information Systems, Singapore Management University, Singapore 259756
(2) Center for Secure Information Systems, George Mason University, Fairfax, VA 22030
yjli@smu.edu.sg, hguo1@gmu.edu, jajodia@gmu.edu

ABSTRACT
Today, database relations are widely used and distributed over the Internet. Since these data can be easily tampered with, it is critical to ensure their integrity. In this paper, we propose to make use of fragile watermarks to detect and localize malicious alterations made to a database relation with categorical attributes. Unlike other watermarking schemes, which inevitably introduce distortions to the cover data, the proposed scheme is distortion free. In our algorithm, all tuples in a database relation are first securely divided into groups according to some secure parameters. Watermarks are embedded and verified in each group independently. Thus, any modifications can be localized to some specific groups. Theoretical analysis shows that the probability of missing detection is very low.

Categories and Subject Descriptors
H.1 [Information System]: Models and Principles; H.2.7 [Database Management]: Database Administration - security, integrity and protection; K.6.5 [Management of Computing and Information Systems]: Security and Protection

General Terms
Security, Algorithms

Keywords
Fragile watermarking, database security, integrity

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, to republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee.
DRM'04, October 25, 2004, Washington, DC, USA.
Copyright 2004 ACM 1-58113-969-1/04/0010 ...$5.00.

1. INTRODUCTION
The security of digital data has been a great concern since the expanded use of these data over the Internet. Because digital data allow an unlimited number of copies of an original without any quality loss and can also be easily distributed and forged, this presents problems of copyright protection and tamper detection, creating a pressing need for digital data protection schemes.

A number of technologies have been developed to provide data protection, including cryptography and steganography. Cryptography protects data by encrypting them so that the encrypted data are meaningless to an attacker. However, once the encrypted data are decrypted, the data are in the clear and are no longer under protection. On the other hand, steganography conceals the very existence of the data by hiding them in cover data. The problem is that the hidden data cannot be extracted if the stego data undergo some distortions.

A new emerging technology, digital watermarking, complements cryptography and steganography by embedding an invisible signal directly into the data, thus providing a promising way to protect digital data from illicit copying and manipulation. After embedding, the watermark and the data are inseparable. There is a wide range of applications of digital watermarking, including copyright protection, authentication, fingerprinting, copy control, and broadcast monitoring. For different kinds of applications, digital watermarking should satisfy different properties [3]. Here, we mainly focus on the tamper detection problem. For this kind of application, digital watermarking should have properties such as invisibility, fragility, and high detection reliability.

Digital watermarks can be classified into two categories based on their application: fragile watermarks for tamper detection and robust watermarks for ownership verification [10]. In the last few years, research on fragile watermarking for multimedia data, such as images, audio, and video [2, 4, 5, 11], has been extensively conducted. Recently, some researchers began to realize the importance of watermarking databases and proposed some robust watermarking schemes for database relations [1, 8, 9, 14]. However, all of these schemes are designed to protect the copyright of a database relation. Though it is very important to verify the source or owner of a database relation, in some cases, it is critical to ensure the authenticity of database relations. This is of increasing interest in many applications where database relations are publicly available on the Internet. For example, in the application of database outsourcing, owners of databases, who do not have sufficient resources to maintain the applications, store their databases on servers provided by external application service providers so that the owners can focus on their own core tasks [6, 7, 12]. The application service providers provide data processing service to clients on behalf of the owners. Since service providers may not be trusted, it is the database owners' responsibility to ensure the integrity of outsourced databases. Similar applications include edge computing and data dissemination.

Unfortunately, despite the importance of tamper detection for database relations, this problem has not been adequately addressed. Although some digital signature based schemes have been proposed to address this problem, they can only detect, but not localize, the modifications. Thus, like fragile watermarking for multimedia data, it is desirable to have a fragile watermarking scheme for database relations, so that any modifications made to a database
relation can be not only detected but localized as well. This is especially useful for a very large database relation where the rest of the relation can still be trusted and used after some tampered tuples are detected and localized.

Embedding watermarks in database relations is a challenging problem because there is little redundancy present in a database relation. One important property of digital watermarks is invisibility. Usually, in a watermarking scheme, a watermark is embedded by slightly modifying the cover data. To ensure invisibility, the modifications are limited to some acceptable level. This requires that the cover data can tolerate these modifications. In the context of multimedia data, this requirement is not a problem. Since multimedia data are highly correlated, there is a lot of redundant information present in multimedia data. Although compression techniques can remove some of the redundant information, currently, no compression technique is perfect enough to remove all the redundant information. This leaves room for watermark embedding. A watermark can be embedded as a part of the redundant information without affecting the quality of the multimedia data. Furthermore, some properties of the human vision (auditory) system can be incorporated into the watermark embedding so that the strength of the embedded watermark can be adjusted adaptively. All of these make it easy to ensure invisibility for multimedia watermarking. In contrast, database relations contain a large number of independent tuples. A tuple can be added, deleted, or modified without affecting other tuples. All tuples and all attributes are equally important. There is little redundancy present in the tuples. Thus, it is a challenge to embed an invisible watermark in a database relation.

In current robust database watermarking schemes, there is an important assumption that all watermarked attributes are numeric and can tolerate small distortions. Although this assumption is reasonable for some kinds of database relations such as weather and measurement databases, in real life, we also need to deal with database relations which contain categorical attributes such as social security number, name, date of birth, etc. Obviously, these attributes cannot tolerate any modifications.

To the best of our knowledge, the only effort on watermarking categorical data is by Sion [13]. In his scheme, although only a small number of tuples are selected for watermark embedding, the categorical values of the selected tuples are modified to embed a watermark. Such modifications may be too significant to be tolerable. Besides, his scheme is a robust watermarking scheme and is designed to protect the copyright of a database relation. A fragile watermarking scheme for categorical data is yet to be devised.

In this paper, we propose a fragile watermarking scheme for detecting and localizing malicious alterations made to a database relation with categorical attributes. Unlike other watermarking schemes, which inevitably introduce distortions to the cover data, the proposed scheme is distortion free. In our algorithm, all tuples in a database relation are first securely divided into groups according to some secure parameters. A watermark is embedded and verified in each group independently. Thus, any modification can be localized to some specific groups. Security analysis shows that the probability of missing detection is very low.

The rest of this paper is organized as follows. Section 2 gives an overview of the related work. Section 3 explains in detail our proposed fragile watermarking scheme, including watermark embedding and watermark detection. Security analysis of the scheme is provided in section 4. Section 5 concludes this paper with summaries and suggestions for future work.

2. RELATED WORK
Recently, database watermarking has begun to receive attention. Agrawal and Kiernan first presented a robust watermarking scheme for databases [1]. In their scheme, it is assumed that attributes are numeric and can tolerate modifications of some least significant bits. Tuples are first selected for watermark embedding. Then some bits of some attributes of the selected tuples are modified to embed watermark bits. The selection of tuples, attributes, and bit positions is governed by a secret embedding key. The scheme is claimed to be robust against a wide range of attacks, including the rounding attack, subset attack, and additive attack. Since the watermark detector can only determine whether the related watermark is embedded or not, only one bit of information is actually embedded. Li et al. [8, 9] further extend this scheme to a fingerprinting scheme where multiple bits of information can be embedded. Thus, a distinct watermark can be embedded in each copy of a database relation so that potential illegal distributors can be tracked if unauthorized copies are found later.

Sion et al. [14] present a different approach to robust watermarking for databases. Unlike Agrawal's tuple based watermarking scheme, Sion's scheme is subset based: all tuples are securely divided into non-intersecting subsets. A single watermark bit is embedded into the tuples of a subset by modifying the distribution of tuple values. The same watermark bit is embedded repeatedly across several subsets and the majority voting technique is employed to recover the embedded bits. In addition to the subset attack, this scheme is claimed to be robust against other attacks such as data resorting and data transformation. However, the scheme is not suitable for database relations that need frequent updates, since it is very expensive to re-watermark the updated database relations.

In the aforementioned watermarking schemes, attributes are assumed to be numeric and can tolerate small modifications, and watermarks are embedded by modifying attribute values. In [13], Sion proposes a robust watermarking scheme for categorical data. Like Agrawal's scheme, some tuples are first selected according to a secret key. Then, the categorical attributes of selected tuples are changed to other values in the set of possible categorical attribute values based on the embedded watermark. For example, the color of a product may be changed from red to green. As we see, though only a small part of the selected tuples is affected by the watermark embedding, the modifications of categorical attributes in certain applications may be too significant to be acceptable.

3. ALGORITHMS

3.1 Design purpose
For different applications, the design purposes of watermarking schemes are different. In a robust watermarking scheme for ownership verification, an attacker attempts to remove the embedded watermark or make it undetectable while keeping the database relation useful. Thus, the design purpose is to make the embedded watermark robust against malicious attacks. In contrast, our scheme is a fragile watermarking scheme for tamper detection. In this kind of scheme, an attacker will try her best to make modifications to a database relation while keeping the embedded watermarks untouched. The attack is successful if the database relation is modified while the embedded watermarks are still detectable. Thus, in our scheme, the embedded watermarks are designed to be fragile so as to detect any modifications made to a database relation. If there are some alterations, the embedded watermarks should not be detectable. Furthermore, since our scheme is designed for categorical data that cannot tolerate distortion, the watermark embedding should be distortion free. In short, the fragile watermark for categorical data should have the following properties:
1. Invisibility: In a watermarking scheme for categorical data, the embedded watermark should not introduce any distortions to the categorical data, since even a minor modification will render the categorical data useless.

2. Prevent illegal embedding and verification: The whole watermarking process is governed by a key. Only an authorized person who has the key can embed, extract, and verify watermarks. This prevents unauthorized persons from inserting a false watermark or illegally verifying watermarks.

3. Blind verification: The original unmarked database relation should not be required for watermark verification.

4. The extracted watermark indicates the locations of alterations: In case of modifications, the embedded watermarks should indicate where the modifications are or at least narrow down the modifications to a range of tuples.

3.2 Watermark embedding
A very important problem in a watermarking scheme is synchronization, that is, we must ensure that the watermark extracted is in the same order as that embedded. If synchronization is lost, even if no modifications have been made, the embedded watermark cannot be correctly verified. In a fragile watermarking scheme for multimedia data, synchronization is not a problem, since the relative position of multimedia data is fixed. In contrast, tuples in a database relation are independent and can be put in an arbitrary order. On the one hand, this particular property makes fragile watermarking for categorical data a challenge. On the other hand, we may make use of this property to devise a novel approach to database watermarking through manipulating the order of tuples. The advantage of this approach is that the attributes will not be modified by watermark embedding, which makes it an ideal solution to watermarking categorical data. The following proposed scheme is based on this idea. Table 1 gives the notations and parameters that will be used in this paper.

Table 1: Notations and parameters
γ        number of attributes in the relation
ω        number of tuples in the relation
g        number of groups in the table
ν        average number of tuples in a group
h^p_i    primary key hash of the ith tuple in the table or in a group
h_i      tuple hash of the ith tuple in the table or in a group
H        group hash
G_k      the kth group
q_k      the number of tuples in G_k
r_i      the ith tuple in the table or in a group
r_i.A_j  the jth attribute of the ith tuple in the table or in a group
K        watermark embedding key
W        watermark embedded in a group
W'       watermark extracted in a group
V        watermark verification result for W

Suppose there is a database relation which has a primary key P and γ attributes, denoted by T(P, A_1, A_2, ..., A_γ). Algorithms 1 and 2 describe the watermark embedding algorithm. First, a secure primary key hash value is computed for each tuple according to an embedding key and the primary key of the tuple. Based on the primary key hash value, all tuples are then securely divided into g groups. The grouping is only a virtual operation, which means that it does not change the physical position of the tuples. After grouping, all tuples in each group are sorted according to their primary key hash. Like grouping, the sorting operation does not change the physical position of tuples. Each group is then watermarked independently. Algorithm 2 shows the detailed embedding process. Before embedding, a secure tuple hash is computed for each tuple based on the embedding key and all attributes of the tuple. Since the attributes can be of any type, we encode each attribute to an integer. For example, a string attribute may be encoded as the concatenation of the ASCII code of each letter in the string. A group hash value is then computed based on the tuple hashes of the sorted tuples in the group and the embedding key. A watermark, the length of which is equal to the number of tuple pairs in the group, is extracted from the group hash value. That is, some selected bits from the group hash value are put together to form a watermark. To embed the watermark, for each tuple pair, the order of the two tuples is changed or left unchanged according to their tuple hash values and the corresponding watermark bit. As we can see, since only the order of tuples is changed, the watermark embedding is distortion free. This is different from all other watermarking schemes, which embed watermarks by introducing small distortions to the cover data. Since watermark embedding is group based, any modifications made to the database relation can be detected and localized to some specific groups. Here, grouping and sorting are two very important operations in the scheme. They enforce a relationship between tuples so that the embedded watermarks and the extracted watermarks can be synchronized.

Algorithm 1 Watermark embedding
 1: For all k ∈ [1, g], q_k = 0
 2: for i = 1 to ω do
 3:   h_i = HASH(K, r_i.A_1, r_i.A_2, ..., r_i.A_γ) // tuple hash
 4:   h^p_i = HASH(K, r_i.P) // primary key hash
 5:   k = h^p_i mod g
 6:   r_i → G_k
 7:   q_k++
 8: end for
 9: for k = 1 to g do
10:   watermark embedding in G_k // See Algorithm 2
11: end for

Figure 1 illustrates an example of watermark embedding for a small table which only has 10 tuples. The table on the left is the original un-watermarked table. For simplicity, we only show a hypothetical hash value, not the real value, for each tuple. Suppose all tuples are divided into two groups: all shaded tuples belong to the first group, where a watermark {01} is embedded; the remaining tuples compose the second group, where a watermark {101} is embedded. To embed the watermark into group 1, we take the tuple pair {1009, 1005}. Since the first watermark bit is 0 and the first tuple hash is larger than the second one, the two tuples are switched. The two tuples in the second pair remain untouched because a watermark bit 1 is embedded and the first tuple hash is already larger than the second one. Similarly, the second group is watermarked and the watermarked table is shown on the right.

3.3 Watermark detection
Algorithms 3 and 4 describe the watermark detection algorithm. To verify the integrity of a database relation, we need to know K and g. As in watermark embedding, the secure primary key hash
is computed for each tuple and all tuples are divided into groups and each group is verified independently. In a group, the tuples are first sorted according to their primary key hash. As in watermark embedding, the sorting is a virtual operation and does not change the order of any tuples. Based on the tuple hashes of the sorted tuples and the secret embedding key, a group hash value is computed and a watermark W is extracted accordingly. W is the watermark that is supposed to be embedded. Then, the watermark W' that is actually embedded is extracted from the tuples. For every tuple pair, if the tuple hash of the first tuple is larger than that of the second one, the related watermark bit is one; otherwise, it is zero. After W' is extracted, it is checked against W. If the two match, the tuples in the group are authentic; otherwise, they are not.

Algorithm 2 Watermark embedding in G_k
 1: sort tuples in G_k in ascending order according to their primary key hash // Virtual operation
 2: H = HASH(K, h_1, h_2, ..., h_{q_k}) // h_i (i = 1, ..., q_k) is the tuple hash of the ith tuple after ordering
 3: W = extractBits(H, q_k/2) // See lines 9-16
 4: for i = 1; i < q_k; i = i + 2 do
 5:   if (W[i/2] == 1 and h_i < h_{i+1}) or (W[i/2] == 0 and h_i > h_{i+1}) then
 6:     switch the position of r_i and r_{i+1}
 7:   end if
 8: end for
 9: extractBits(H, l) {
10:   if length(H) ≥ l then
11:     W = concatenation of first l selected bits from H
12:   else
13:     m = l - length(H)
14:     W = concatenation of H and extractBits(H, m)
15:   end if
16:   return W }

Before embedding      After embedding
 i    h_i              i    h_i
 1    1009             1    1005
 2    2001             2    2001
 3    1005             3    1009
 4    4310             4    4310
 5    1000             5    1000
 6    2357             6    1111
 7    2100             7    2100
 8    1111             8    2357
 9    3294             9    3294
10    3000            10    3000

Figure 1: A table before and after watermark embedding

Algorithm 3 Watermark detection
 1: For all k ∈ [1, g], q_k = 0
 2: for i = 1 to ω do
 3:   h_i = HASH(K, r_i.A_1, r_i.A_2, ..., r_i.A_γ) // tuple hash
 4:   h^p_i = HASH(K, r_i.P) // primary key hash
 5:   k = h^p_i mod g
 6:   r_i → G_k
 7:   q_k++
 8: end for
 9: for k = 1 to g do
10:   watermark verification in G_k // See Algorithm 4
11: end for

Algorithm 4 Watermark verification in G_k
 1: sort tuples in G_k in ascending order according to their primary key hash // Virtual operation
 2: H = HASH(K, h_1, h_2, ..., h_{q_k}) // h_i (i = 1, ..., q_k) is the tuple hash of the ith tuple after ordering
 3: W = extractBits(H, q_k/2) // See lines 9-16 in Algorithm 2
 4: for i = 1; i < q_k; i = i + 2 do
 5:   if h_i ≤ h_{i+1} then
 6:     W'[i/2] = 0
 7:   else
 8:     W'[i/2] = 1
 9:   end if
10: end for
11: if W' == W then
12:   V = TRUE
13: else
14:   V = FALSE
15: end if

3.4 Discussion

3.4.1 Embedding capacity
In the proposed scheme, since each tuple pair can only embed one watermark bit, the length of the watermark in each group is ν/2, where ν is the average number of tuples in a group. As seen in section 4, the longer a watermark, the more secure the scheme. To increase the embedding capacity, one solution is to re-divide the tuples in a group into sub-groups. Suppose the number of sub-groups is s; then the average number of tuples in a sub-group is m (m = ν/s). In each sub-group, the number of all possible orderings of m tuples is m!. To represent these orderings, log2 m! bits are needed. Thus, by manipulating the order of tuples in the sub-groups, at most s log2 m! watermark bits can be embedded in a group. The scheme introduced in section 3 is the special case of m = 2. The embedding capacity and complexity are minimal in this case. If the whole group is viewed as a sub-group (s = 1, m = ν), the maximum embedding capacity of log2 ν! is achieved. Of course, this also maximizes the complexity of the watermark embedding and detection. For a specific order arrangement of the tuples, it is time consuming to decide the related watermark bits. Thus, trade-offs must be made between security and complexity.

3.4.2 Threat model
Our scheme is designed to detect the following malicious modifications: modification of one or multiple attribute values, insertion of one or more tuples, and deletion of one or multiple tuples. The objective of an attacker is to modify the tuples in such a way that his modifications are undetectable. Our analysis in section 4 shows that these modifications can be detected with high probability.

Our scheme distributes all tuples evenly in groups due to the property of the hash function. The relative order of tuples represents the embedded watermarks. What if the order is changed inadvertently? It is likely that such a change will not change the relative position of tuples in one or two groups. In this case, the embedded watermark is not affected and can still be correctly verified. For example, in Figure 1, if the tuples 1005 and 2001 in the watermarked table (the one on the right) are switched, this will not affect the embedded watermark since their relative positions remain the same.
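The embedding and verification procedures of Sections 3.2 and 3.3 can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the authors' implementation: HMAC-SHA256 stands in for the keyed function HASH, the labels "pk", "tuple" and "group" are hypothetical domain-separation tags, a table is modelled as a list of (primary key, attributes) entries whose list order is the physical order, and the "first" tuple of a pair is taken to be the one that appears physically earlier in the table.

```python
import hashlib
import hmac


def keyed_hash(key, *fields):
    # Assumption: HMAC-SHA256 plays the role of HASH(K, ...); each field is
    # encoded as a string and the fields are joined by a separator character.
    msg = "\x1f".join(str(f) for f in fields).encode("utf-8")
    return int.from_bytes(hmac.new(key, msg, hashlib.sha256).digest(), "big")


def extract_bits(h, length, width=256):
    # First `length` bits of the `width`-bit value h, reused cyclically when
    # more bits are needed than h provides (extractBits in Algorithm 2).
    return [(h >> (width - 1 - (i % width))) & 1 for i in range(length)]


def group_pairs(rows, key, g):
    # Returns one (group id, watermark W, pairs) entry per non-empty group.
    # Each pair holds two physical positions, the smaller position first.
    groups = {}
    for pos, (pk, _attrs) in enumerate(rows):
        hp = keyed_hash(key, "pk", pk)              # primary key hash
        groups.setdefault(hp % g, []).append((hp, pos))
    result = []
    for k in sorted(groups):
        # Virtual sort by primary key hash; the table itself is not reordered.
        positions = [pos for _hp, pos in sorted(groups[k])]
        hashes = [keyed_hash(key, "tuple", *rows[p][1]) for p in positions]
        H = keyed_hash(key, "group", *hashes)       # group hash
        pairs = [tuple(sorted(positions[i:i + 2]))
                 for i in range(0, len(positions) - 1, 2)]
        result.append((k, extract_bits(H, len(pairs)), pairs))
    return result


def pair_bit(rows, key, first, second):
    # A pair encodes 1 when the physically first tuple has the larger hash.
    h1 = keyed_hash(key, "tuple", *rows[first][1])
    h2 = keyed_hash(key, "tuple", *rows[second][1])
    return 1 if h1 > h2 else 0


def embed(rows, key, g):
    rows = list(rows)
    for _k, W, pairs in group_pairs(rows, key, g):
        for bit, (a, b) in zip(W, pairs):
            if pair_bit(rows, key, a, b) != bit:
                rows[a], rows[b] = rows[b], rows[a]  # only the order changes
    return rows


def verify(rows, key, g):
    # Maps each group id to its verification result V.
    return {k: all(pair_bit(rows, key, a, b) == bit
                   for bit, (a, b) in zip(W, pairs))
            for k, W, pairs in group_pairs(rows, key, g)}


key = b"demo-key"  # hypothetical embedding key
table = [(pk, ("name%d" % pk, "color%d" % (pk % 3))) for pk in range(1, 41)]
marked = embed(table, key, g=4)
print(verify(marked, key, g=4))
```

In this sketch, exchanging two physically adjacent tuples that belong to different groups leaves every pair's relative order intact, so all groups still verify. Exchanging the two tuples of one pair flips that pair's bit, so the group containing the pair fails verification while the other groups are unaffected.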
However, in other cases, even if the order of tuples is changed without any modification of tuple values, the embedded watermark may be affected and may not be correctly verified. For example, if the order of the tuples 1009 and 4310 is changed, it will disrupt the embedded watermark. Here, the order change will be treated as a malicious modification. This is acceptable since our scheme is mainly designed to detect modifications; it is more important that any malicious modifications can be detected with high probability.

In some applications, the owner may need to change the order of the tuples frequently. For example, he may need to sort the database relation frequently according to different attributes. In this case, the owner can build an index table which records the initial order of the tuples in the database. After that, the tuples can be sorted arbitrarily. When the integrity of the database relation needs to be verified, the index table is first used to recover the initial order of the tuples and the embedded watermarks can then be verified as usual.

3.4.3 For databases without a primary key
In the proposed scheme, it is assumed that the database to be protected has a primary key. The primary key is used for computing the primary key hash, which is used for grouping and sorting tuples. In this way, if an attacker modifies an attribute value other than the primary key of a tuple, this modification only affects the watermark embedded in one group. For database relations without a primary key, the tuple hash can be used to group and sort tuples. In this case, the modification of an attribute value will affect the watermarks in two groups, since the modified tuple may be removed from one group and be added to the other group.

4. ANALYSIS
Our fragile watermarking scheme is group based; the order of tuples in each group represents a unique watermark. Therefore, the finest granularity at which one can localize possible alterations is a group. In this section, we analyze the probability that all alterations are correctly localized in corresponding groups. We consider three simple alterations: modify an attribute value, insert a tuple, and delete a tuple. We also consider three massive alterations: modify multiple attribute values, insert multiple tuples, and delete multiple tuples. For simplicity, we assume that each group consists of exactly ν tuples; thus, the length of each embedded watermark W is ν/2. If ν is not even, the last tuple contributes a half bit in watermark detection.

Given an ideal hash function HASH with an input and an output, any tamper of the input will randomize the output; that is, each bit of the output has equal probabilities to change or not after the modification. We shall assume this ideal hash in the following analysis.

4.1 Modify an attribute value
Suppose a single attribute value is modified in the watermarked relation. If the modified value is not from the primary key attribute, the modification will affect neither the primary key hash nor the partition of groups in watermark verification. Without loss of generality, assume r_i.A_j is modified (i.e., the j-th attribute of the i-th tuple) in a group. The modification will affect the tuple hash h_i, the group hash H, and thus the embedded watermark W. After the modification, each bit of W has equal probabilities to match or not match the corresponding bit of W', which is the watermark extracted from the group. Therefore, the probability that the modification can be correctly detected (i.e., W ≠ W') is

\[ Prob = 1 - \frac{1}{2^{\nu/2}} \]   (1)

Now consider the case that the modified value is a primary key value r_i.P. Since the modification affects the primary key hash h^p_i in a random way due to our hash model, tuple r_i has probability 1/g to remain in the same group, and probability 1 - 1/g to change to one of the other groups after the modification. In the latter case two groups are affected by the modification. The probability that the modification can be correctly detected (i.e., all affected groups are localized in watermark verification) is

\[ Prob = \frac{1}{g}\left(1 - \frac{1}{2^{\nu/2}}\right) + \frac{g-1}{g}\left(1 - \frac{1}{2^{(\nu-1)/2}}\right)\left(1 - \frac{1}{2^{(\nu+1)/2}}\right) \]   (2)

Clearly, the probability in this case is less than that in the case of modifying a non-primary-key value. The reason is that the tuple with the modified primary key may move from one group to another and this affects one more watermark.

[Plot "Modify a value": y-axis 1 - Prob, logarithmic scale from 10^0 down to 10^-14; x-axis group size ν, ticks from 10 to 90; curves: modify a non-P value; modify a P value, g=10; modify a P value, g=30; modify a P value, g=50]

Figure 2: Error in detecting single value modification

Figure 2 shows the error rate (i.e., 1 - Prob) in detecting a single value modification for different group sizes ν and numbers of groups g. We see that the error rates decrease exponentially with the group size (Y-axis in logarithmic scale). We also see that the error rate for non-primary-key value detection, which is independent of the number of groups, is lower than that for primary key value detection, as expected. For primary key value detection, the number of groups g has little influence because the second term in formula 2 dominates the detection probability and the coefficient of the term, (g - 1)/g, is approximately 1 for not-too-small values of g.

4.2 Insert a tuple
Let a single tuple be inserted into the watermarked relation. The inserted tuple will be allocated to one of the g groups in watermark detection. Since the added tuple will affect the group hash as well as the embedded watermark in a random way, the probability that the insertion can be detected (i.e., the watermark verification result is false for the affected group) is

\[ Prob = 1 - \frac{1}{2^{(\nu+1)/2}} \]   (3)

Figure 3 compares the error rate in detecting a single tuple insertion (bottom line in the figure) with detecting a non-primary-key value modification (middle line). Since the tuple insertion increases the watermark length, while the modification does not, the error rate for detecting the tuple insertion is lower. In general, the error rate in
detecting a longer watermark is always lower than that in detecting a shorter watermark.

Figure 3: Error in detecting single value tamper (plot of 1 − Prob on a logarithmic scale, horizontal axis ticks from 10 to 90; curves: modify a non-primary-key value, insert a tuple, delete a tuple)

4.3 Delete a tuple

Let a single tuple be deleted from the watermarked relation. Exactly one group will lose the deleted tuple in watermark detection. The absence of the deleted tuple in the group will affect the group hash as well as the embedded watermark in a random way. The probability that the deletion can be detected (i.e., the watermark verification result is false for the affected group) is

$$\mathrm{Prob} = 1 - \frac{1}{2^{(\nu-1)/2}} \qquad (4)$$

Figure 3 also shows the error rate for detecting single tuple deletion (the upper line in the figure). The error rate is higher than those for detecting value modification and tuple insertion because the tuple deletion decreases the length of the watermark.

4.4 Modify multiple values

First consider the modification of non-primary-key attributes. The modification does not change any primary key hash, nor does it change the grouping of tuples in watermark detection. Since changing any number of values within a tuple has the same effect (in terms of randomization) on the tuple hash, the group hash, and the extracted watermark, we consider the modification at the tuple level. For convenience, we call a group a modify-group if some of its tuples are modified.

Assume that n tuples are modified one by one and that at any step, un-modified tuples have equal probability to be modified. The probability that the first tuple is modified in any particular group is $\nu/(g\nu) = 1/g$. After i tuples have been modified, the probability that the next tuple is modified in a particular group in which k tuples have already been modified is $(\nu - k)/(g\nu - i)$.

Assume that the modification of n tuples will result in m ($m \le \min(g, n)$) modify-groups and that $k_i + 1$ ($0 \le k_i < \nu$) tuples are modified in modify-group i (i = 1, ..., m). Given a sequence which indicates which tuple is modified in which group and in which order, one can verify that the probability of the sequence is

$$s\langle k_1,\ldots,k_m\rangle = \frac{[\nu(\nu-1)\cdots(\nu-k_1)]\cdots[\nu(\nu-1)\cdots(\nu-k_m)]}{g\nu\,(g\nu-1)\cdots(g\nu-n+1)} = \frac{(\nu!)^m\,(g\nu-n)!}{(\nu-k_1-1)!\cdots(\nu-k_m-1)!\,(g\nu)!} \qquad (5)$$

Then the probability that the modification of n tuples will result in m modify-groups is

$$p\langle k_1,\ldots,k_m\rangle = \binom{g}{m}\,\frac{n!}{(1+k_1)!\,(1+k_2)!\cdots(1+k_m)!}\;s\langle k_1,\ldots,k_m\rangle \qquad (6)$$

where the first term $\binom{g}{m}$ is the total number of combinations of m modify-groups selected from g groups, the second term (i.e., the fraction term) is the number of different sequences (or permutations) of the tuples modified in different modify-groups, and the last term is the probability of any sequence. After the modification, a watermark in any modify-group has $\nu/2$ bits. Therefore, the probability that all modifications can be detected is

$$\mathrm{Prob} = \sum_{m=\lceil n/\nu\rceil}^{\min(g,n)}\ \sum_{\substack{0\le k_1,\ldots,k_m<\nu\\ k_1+\cdots+k_m=n-m}} p\langle k_1,\ldots,k_m\rangle\left(1-\frac{1}{2^{\nu/2}}\right)^{m} \qquad (7)$$

Example 1 Suppose that there are three groups with ten tuples in each group (g = 3 and $\nu$ = 10). Suppose n = 3 tuples are modified. First consider the case in which the modification results in m = 2 modify-groups, where in modify-group 1 two tuples are modified, and in modify-group 2 one tuple is modified ($k_1 = 1$ and $k_2 = 0$). The three tuples are modified one by one and we use a sequence to indicate which tuple is modified in which group and in which order. For example, sequence 112 means that the first tuple is modified in modify-group 1, the second in modify-group 1, and the third in modify-group 2. For this sequence, the first tuple is modified with probability 10/30 because there are ten un-modified tuples in modify-group 1 and there are 30 un-modified tuples in total (recall that un-modified tuples have equal probability to be modified). Similarly, the second tuple is modified with probability 9/29, and the third with probability 10/28. The overall probability of the sequence is $s\langle 1,0\rangle = \frac{10\cdot 9\cdot 10}{30\cdot 29\cdot 28}$ (see also formula 5). One can verify that there are in total $\frac{n!}{(1+k_1)!\,(1+k_2)!} = \frac{3!}{2!\,1!} = 3$ possible sequences (112, 121, and 211), and that the probability of each of these sequences is the same. Since there are $\binom{g}{m} = \binom{3}{2} = 3$ combinations of the two modify-groups selected from three groups, the probability of this considered case is $p\langle 1,0\rangle = 3\cdot 3\cdot s\langle 1,0\rangle$ (see also formula 6).

Now we verify formula 7 by showing that the sum of the probabilities $p\langle\cdot\rangle$ is exactly one. The modification of three tuples will lead to either one or two or three modify-groups. This corresponds to the leftmost sum from m = 1 to 3 in formula 7. For the case of two modify-groups (m = 2), either one or two tuples are modified in modify-group one; that is, either $k_1 = 0, k_2 = 1$ or $k_1 = 1, k_2 = 0$. This corresponds to the second sum in formula 7. We have $p\langle 0,1\rangle = p\langle 1,0\rangle$.

For the case of one modify-group (m = 1), three tuples are modified in the same group ($k_1 = 2$) with $s\langle 2\rangle = \frac{10\cdot 9\cdot 8}{30\cdot 29\cdot 28}$ and $p\langle 2\rangle = \binom{3}{1}\frac{3!}{3!}\,s\langle 2\rangle$. For the case of three modify-groups (m = 3), one tuple is modified in each group ($k_1 = k_2 = k_3 = 0$), and we have $s\langle 0,0,0\rangle = \frac{10\cdot 10\cdot 10}{30\cdot 29\cdot 28}$ and $p\langle 0,0,0\rangle = \binom{3}{3}\frac{3!}{1!\,1!\,1!}\,s\langle 0,0,0\rangle$.
Therefore, $p\langle 2\rangle + p\langle 0,1\rangle + p\langle 1,0\rangle + p\langle 0,0,0\rangle = 3\cdot\frac{10\cdot 9\cdot 8}{30\cdot 29\cdot 28} + 18\cdot\frac{10\cdot 9\cdot 10}{30\cdot 29\cdot 28} + 6\cdot\frac{10\cdot 10\cdot 10}{30\cdot 29\cdot 28} = 1$. □

One can verify that when n = 1, the probability is the same as that given in section 4.1. For massive modification where n > g and at least one tuple is modified in each group, the probability can be simplified as:

$$\mathrm{Prob} \approx \left(1-\frac{1}{2^{\nu/2}}\right)^{g} \qquad (8)$$

Figure 4: Error in detecting massive value modification (panel title: "Massive modification of n tuples (n/g > 1)"; plot of 1 − Prob on a logarithmic scale, horizontal axis ticks from 10 to 90; curves: g = 10, g = 30, g = 50)

Figure 5: Error in detecting massive tampers (panel title: "Comparison (g = 10, $n/(g\nu)$ = 50%)"; plot of 1 − Prob on a logarithmic scale, horizontal axis ticks from 10 to 90; curves: modification, insertion, deletion)

Figure 4 shows the error rate (i.e., 1 − Prob) in detecting massive value modification for different group sizes $\nu$ and numbers of groups g. We assume that at least one tuple is modified in each group. While all error rates go down exponentially with the group size $\nu$, the larger the g, the larger the error rates. The reason is that with larger g, more watermarks are affected by the modification, and it is more likely that the change to at least one of them goes undetected.

If the modification is applied to primary key attribute values, the probability of detection can be analyzed similarly as for the modification of non-primary-key values. We give some intuition for this. As shown in section 4.1, each tuple with a modified primary key value would remain in the same group with probability 1/g and move out of that group with probability 1 − 1/g. The moved-out tuple has equal probability 1/g to fall in any other group. Now consider one modified tuple in each group. For any group, the probability that the modified tuple moves out is 1 − 1/g, and the probability that any tuple from any other group moves in is also 1 − 1/g. Since the moving-out and moving-in probabilities are the same, each group will have the same probability $p_g$ to contain some modified tuples, and the size of each group will remain roughly the same. In-depth analysis and experimental results will be provided in an extended version of this paper.

4.5 Insert multiple tuples

Assume that n tuples are inserted into the data. We call a group an insert-group if it contains at least one inserted tuple after the insertion. Since the partition into g groups is kept secret from attackers, when one tuple is inserted, it has equal probability to fall into any group. The probability that the insertion of n tuples will result in m ($m \le \min(g,n)$) insert-groups, where insert-group i contains $k_i + 1$ ($k_i \ge 0$, $\sum_{i=1}^{m} k_i = n - m$) inserted tuples, is

$$p\langle k_1,\ldots,k_m\rangle = \binom{g}{m}\,\frac{n!}{(1+k_1)!\,(1+k_2)!\cdots(1+k_m)!}\,\frac{1}{g^{n}}$$

where the first term $\binom{g}{m}$ is the total number of combinations of m insert-groups selected from g groups, the second term (i.e., the fraction term) is the number of different sequences (or permutations) of the inserted tuples falling in different insert-groups, and the last term $\frac{1}{g^{n}}$ is the probability of any sequence. A sequence indicates which inserted tuple falls into which insert-group in which order. After the insertion, a watermark in insert-group i has $(\nu + k_i + 1)/2$ bits. Therefore, the probability that all insertions can be detected is

$$\mathrm{Prob} = \sum_{m=1}^{\min(g,n)}\ \sum_{\substack{k_1,\ldots,k_m\ge 0\\ k_1+\cdots+k_m=n-m}} p\langle k_1,\ldots,k_m\rangle\left(1-\frac{1}{2^{(\nu+k_1+1)/2}}\right)\left(1-\frac{1}{2^{(\nu+k_2+1)/2}}\right)\cdots\left(1-\frac{1}{2^{(\nu+k_m+1)/2}}\right) \qquad (9)$$

Example 2 Consider example 1 except that n = 3 tuples are inserted rather than modified. When a tuple is inserted, it has equal probability $\frac{1}{g} = \frac{1}{3}$ to fall in any particular group. The insertion of three tuples may lead to one or two or three insert-groups. The analysis is similar to example 1 except that the probability of each sequence is always $(\frac{1}{3})^3 = \frac{1}{27}$.

For the case of one insert-group (m = 1), three tuples are inserted in the same group ($k_1 = 2$) with $s\langle 2\rangle = \frac{1}{27}$ and $p\langle 2\rangle = \binom{3}{1}\frac{3!}{3!}\cdot\frac{1}{27}$. For the case of two insert-groups (m = 2), either one or two tuples are inserted in insert-group one ($k_1 = 0, k_2 = 1$ or $k_1 = 1, k_2 = 0$); therefore, $s\langle 1,0\rangle = s\langle 0,1\rangle = \frac{1}{27}$ and $p\langle 0,1\rangle = p\langle 1,0\rangle = \binom{3}{2}\frac{3!}{2!\,1!}\cdot\frac{1}{27}$. For the case of three insert-groups, one tuple is inserted in each group ($k_1 = k_2 = k_3 = 0$), and we have $s\langle 0,0,0\rangle = \frac{1}{27}$ and $p\langle 0,0,0\rangle = \binom{3}{3}\frac{3!}{1!\,1!\,1!}\cdot\frac{1}{27}$. Therefore, the sum of the probabilities $p\langle\cdot\rangle$ is exactly one: $p\langle 2\rangle + p\langle 0,1\rangle + p\langle 1,0\rangle + p\langle 0,0,0\rangle = 3\cdot\frac{1}{27} + 18\cdot\frac{1}{27} + 6\cdot\frac{1}{27} = 1$. □

It is easy to verify that when n = 1, the probability is the same as that given in section 4.2. For massive insertion where n > g and n/g tuples are inserted into each group on average, the probability can be simplified as:

$$\mathrm{Prob} \approx \left(1-\frac{1}{2^{(\nu+n/g)/2}}\right)^{g} \qquad (10)$$
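The insertion analysis of section 4.5 can be checked numerically. The following is a minimal Python sketch, not the authors' code: the function names are illustrative, `nu` stands for the group size, and fractional watermark lengths such as $(\nu + k_i + 1)/2$ are assumed to be usable directly as exponents. It enumerates every outcome $\langle k_1,\ldots,k_m\rangle$, evaluates the outcome probability given before formula 9, and then evaluates formula 9 and the massive-insertion approximation of formula 10.

```python
from fractions import Fraction
from math import comb, factorial, prod


def compositions(total, parts):
    """Yield every tuple of `parts` non-negative integers that sums to `total`."""
    if parts == 1:
        yield (total,)
        return
    for first in range(total + 1):
        for rest in compositions(total - first, parts - 1):
            yield (first,) + rest


def insert_outcome_prob(ks, g, n):
    """Probability that n insertions yield insert-groups holding k_i + 1 new tuples."""
    m = len(ks)
    # Number of distinct insertion sequences for this outcome (the fraction term).
    sequences = Fraction(factorial(n), prod(factorial(1 + k) for k in ks))
    # Every sequence has probability 1 / g**n because each tuple lands uniformly.
    return comb(g, m) * sequences * Fraction(1, g ** n)


def detect_all_insertions(g, nu, n):
    """Formula 9: probability that every insert-group fails verification."""
    total = 0.0
    for m in range(1, min(g, n) + 1):
        for ks in compositions(n - m, m):
            # Insert-group i carries a watermark of (nu + k_i + 1) / 2 bits.
            detected = prod(1 - 2.0 ** (-(nu + k + 1) / 2) for k in ks)
            total += float(insert_outcome_prob(ks, g, n)) * detected
    return total


def detect_all_insertions_approx(g, nu, n):
    """Formula 10: average case with n / g tuples inserted into each group."""
    return (1 - 2.0 ** (-(nu + n / g) / 2)) ** g


if __name__ == "__main__":
    g, nu, n = 3, 10, 3  # the setting of Example 2
    for m in range(1, min(g, n) + 1):
        for ks in compositions(n - m, m):
            print(ks, insert_outcome_prob(ks, g, n))
    print(detect_all_insertions(g, nu, n))
```

With g = 3 and n = 3, the enumerated outcome probabilities are the four terms of Example 2 and sum to exactly one, which is why exact fractions are used for that part while the detection probability itself is computed in floating point.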
Figure 6: Error in detecting massive tuple insertion (left panel: "Massive insertion of n tuples ($n/(g\nu)$ = 50%)", horizontal axis ticks from 10 to 90; right panel: "Massive insertion of n tuples ($\nu$ = 50)", horizontal axis $n/(g\nu)$ (%); both panels plot 1 − Prob on a logarithmic scale with curves g = 10, g = 30, g = 50)

Figure 6 shows the error rate (i.e., 1 − Prob) in detecting massive tuple insertion. The left sub-figure shows the error rate for different group sizes $\nu$ and numbers of groups g where the inserted tuples are 50% of the number of original tuples. The right sub-figure shows the error rate for different percentages of inserted tuples $n/(g\nu)$ and numbers of groups g where the group size $\nu$ is 50. In this figure, we set n > g and consider the average case where n/g tuples are inserted into each group. Since all groups are affected, we see that the smaller the g, the smaller the error rates. We also see that the error rate decreases exponentially with group size $\nu$ and percentage $n/(g\nu)$ because the length of affected watermarks increases linearly in these cases.

4.6 Delete multiple tuples

Assume that n tuples are deleted from the data. If at least one but not all tuples are deleted from a group, it will affect the group hash and corresponding watermark in a random way. If all tuples are deleted from one group, there is no way to detect the modification, in which case the probability of detection is zero. For convenience, we call a group a delete-group if at least one tuple is deleted from the group.

We assume that every tuple has equal probability to be deleted. The probability that the first tuple is deleted from any particular group is $\nu/(g\nu) = 1/g$. After i tuples have been deleted, the probability that the next tuple is deleted from a particular group consisting of $\nu - k$ tuples is $(\nu - k)/(g\nu - i)$.

Assume that the deletion of n tuples will result in m ($m \le \min(g, n)$) delete-groups and that $k_i + 1$ ($0 \le k_i < \nu$) tuples are deleted from delete-group i (i = 1, ..., m). The analysis is similar to section 4.4 because the probability that an un-deleted tuple is deleted is the same as the probability that an un-modified tuple is modified. Given a sequence which indicates which tuple is deleted from which group and in which order, one can verify that the probability of the sequence is the same as $s\langle k_1,\ldots,k_m\rangle$ given in equation 5. Also the probability that the deletion will result in m delete-groups is the same as $p\langle k_1,\ldots,k_m\rangle$ given in equation 6. After the deletion, a watermark in delete-group i has $(\nu - k_i - 1)/2$ bits (this is different from the modification case). Therefore, the probability that all deletions can be detected is

$$\mathrm{Prob} = \sum_{m=\lceil n/\nu\rceil}^{\min(g,n)}\ \sum_{\substack{0\le k_1,\ldots,k_m<\nu\\ k_1+\cdots+k_m=n-m}} p\langle k_1,\ldots,k_m\rangle\left(1-\frac{1}{2^{(\nu-k_1-1)/2}}\right)\left(1-\frac{1}{2^{(\nu-k_2-1)/2}}\right)\cdots\left(1-\frac{1}{2^{(\nu-k_m-1)/2}}\right) \qquad (11)$$

One can verify that when n = 1, the probability is the same as that given in section 4.3. For massive deletion where n > g and n/g tuples are deleted from each group on average, the probability can be simplified as:

$$\mathrm{Prob} \approx \left(1-\frac{1}{2^{(\nu-n/g)/2}}\right)^{g} \qquad (12)$$

Figure 7 shows the error rate (i.e., 1 − Prob) in detecting massive tuple deletion. The left sub-figure shows the error rate for different group sizes $\nu$ and numbers of groups g where 50% of the tuples are deleted. The right sub-figure shows the error rate for different percentages of deleted tuples $n/(g\nu)$ and numbers of groups g where the group size $\nu$ is 50. In this figure, we set n > g and consider the average case where n/g tuples are deleted from each group. Since all groups are affected, the error rate is monotonically increasing with g. On the other hand, the more tuples are deleted, the shorter the affected watermark, and the larger the error rate. Increasing $\nu$ has a similar effect on the detection probability as decreasing n/g.

4.7 Comparison of Detecting Massive Tampering

Figure 5 compares the error rate in detecting massive tampering (massive value modifications, tuple insertions and tuple deletions) for different group sizes $\nu$ where g = 10 and $n/(g\nu)$ = 50%. With group size increasing, all error rates decrease (exponentially) because the length of affected watermarks increases (linearly). For the same group size, tuple deletion always yields a larger error rate than tuple modification, which yields a larger error rate than tuple insertion. The reason is that the length of affected watermarks decreases, remains the same, and increases, respectively, in tuple deletion, modification, and insertion.
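The ordering stated in section 4.7 can be reproduced from the simplified formulas 8, 10 and 12, in which each of the g affected watermarks has $\nu/2$, $(\nu + n/g)/2$ and $(\nu - n/g)/2$ bits respectively. The sketch below is illustrative only: the function names are assumptions and `nu` stands for the group size. It computes the error rate 1 − Prob with `log1p` and `expm1` so that very small error rates are not lost to floating-point rounding.

```python
from math import expm1, log1p


def miss_rate(bits, g):
    """Approximate 1 - Prob when all g groups carry an affected watermark of `bits` bits.

    Requires bits > 0, i.e. no group may be emptied completely.
    """
    per_group_miss = 2.0 ** (-bits)
    # 1 - (1 - x)**g, rewritten to stay accurate when x is tiny.
    return -expm1(g * log1p(-per_group_miss))


def miss_modification(g, nu):
    """Error rate from formula 8 (watermark length unchanged)."""
    return miss_rate(nu / 2, g)


def miss_insertion(g, nu, n):
    """Error rate from formula 10 (n / g tuples added to each group)."""
    return miss_rate((nu + n / g) / 2, g)


def miss_deletion(g, nu, n):
    """Error rate from formula 12 (n / g tuples removed from each group)."""
    return miss_rate((nu - n / g) / 2, g)


if __name__ == "__main__":
    g = 10
    for nu in (10, 30, 50, 70, 90):
        n = g * nu // 2  # n / (g * nu) = 50%, the setting of Figure 5
        print(nu, miss_deletion(g, nu, n), miss_modification(g, nu),
              miss_insertion(g, nu, n))
```

For every group size in the loop, the deletion error rate is the largest and the insertion error rate the smallest, and all three shrink as `nu` grows, matching the qualitative description of Figure 5.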
Figure 7: Error in detecting massive tuple deletion (left panel: "Massive deletion of n tuples ($n/(g\nu)$ = 50%)", horizontal axis ticks from 10 to 90; right panel: "Massive deletion of n tuples ($\nu$ = 50)", horizontal axis $n/(g\nu)$ (%); both panels plot 1 − Prob on a logarithmic scale with curves g = 10, g = 30, g = 50)

4.8 Discussion on group size

The group size $\nu$ is a very important parameter in our scheme. It partially decides how the tuples are grouped. If an attacker knows the group size, combined with the knowledge of the secret key K, he can simply delete all tuples in a group, and the scheme will fail to detect this kind of modification. Thus, to foil this attack, we must keep the group size or the secret key secure so that an attacker will not have any knowledge of the grouping information used for watermark embedding.

The length of each watermark, $\nu/2$, is half of the group size. On the one hand, the larger the group size, the larger the probability of detecting modifications in watermark detection, and the more secure the proposed scheme is. On the other hand, the larger the group size, the less precisely we can localize modifications because there are more tuples in each group (recall that the group is the finest granularity in localization). We can therefore make trade-offs between security and localization when choosing the group size $\nu$.

In our scheme and analysis, the group size $\nu$ is decided by the number of groups g and the total number of tuples in the data. Given $\nu$, the larger the g, the smaller the probability of tamper detection (see previous subsections). If g is too large, we can easily partition the groups into smaller collections, and then apply watermark detection to each of the collections (this is equivalent to decreasing g without changing $\nu$) such that the probability of tamper detection is large enough within each collection.

In our scheme, since an attacker's purpose is to make his modifications undetectable by keeping the embedded watermarks untouched, it is unreasonable for him to make lots of modifications. Given the group size and the number of groups in each collection, the more modifications an attacker makes, the more likely his modifications will be detected. In our scheme, the purpose is to keep the detection probability as high as possible. After all, it is more important to detect modifications than to narrow down the range of modifications.

5. CONCLUSIONS

In this paper, we identified the problem of tamper detection and localization for a database relation with categorical attributes and proposed a novel fragile watermarking scheme to address this problem. Unlike other watermarking schemes which inevitably introduce distortions to the cover data, the proposed scheme embeds the watermark by modifying the order of tuples; thus, it is distortion free. In the proposed scheme, all tuples in a database relation are first securely divided into groups according to some secure parameters. Watermarks are embedded and verified in each group independently. Thus, any modifications can be detected and localized to some specific groups. Security analysis showed that the probability of missing detections is very low.

6. ACKNOWLEDGEMENT

The authors would like to thank the anonymous reviewers for their valuable comments. The work of Huiping Guo was supported in part by the Natural Sciences and Engineering Research Council of Canada (NSERC) Postdoctoral Fellowship. The work of Sushil Jajodia was partially supported by the National Science Foundation under grants CCR-0113515 and IIS-0242237.