embedded in one group. For database relations without a primary key, the tuple hash can be used to group and sort tuples. In this case, the modification of an attribute value will affect the watermarks in two groups, since the modified tuple may be removed from one group and added to another.

4. ANALYSIS
Our fragile watermarking scheme is group based; the order of tuples in each group represents a unique watermark. Therefore, the finest granularity at which one can localize possible alterations is a group. In this section, we analyze the probability that all alterations are correctly localized in the corresponding groups. We consider three simple alterations: modify an attribute value, insert a tuple, and delete a tuple. We also consider three massive alterations: modify multiple attribute values, insert multiple tuples, and delete multiple tuples. For simplicity, we assume that each group consists of exactly $\nu$ tuples; thus, the length of each embedded watermark $W$ is $\nu/2$. If $\nu$ is not even, the last tuple contributes half a bit in watermark detection.
Given an ideal hash function HASH with an input and an output, any tampering with the input will randomize the output; that is, each bit of the output has equal probability of changing or not changing after the modification. We assume this ideal hash in the following analysis.

4.1 Modify an attribute value
Suppose a single attribute value is modified in the watermarked relation. If the modified value is not from the primary key attribute, the modification will affect neither the primary key hash nor the partition of groups in watermark verification. Without loss of generality, assume $r_i.A_j$ (i.e., the $j$-th attribute of the $i$-th tuple) is modified in a group. The modification will affect the tuple hash $h_i$, the group hash $H$, and thus the embedded watermark $W$. After the modification, each bit of $W$ has equal probability of matching the corresponding bit of $W'$, the watermark extracted from the group. Therefore, the probability that the modification can be correctly detected (i.e., $W \neq W'$) is

$$\mathit{Prob} = 1 - \frac{1}{2^{\nu/2}} \qquad (1)$$

[Figure 2: Error in detecting single value modification. y-axis: $1 - \mathit{Prob}$ (log scale).]

Figure 2 shows the error rate (i.e., $1 - \mathit{Prob}$) in detecting a single value modification for different group sizes $\nu$ and numbers of groups $g$. We see that the error rates decrease exponentially with the group size (the y-axis is in logarithmic scale). We also see that, as expected, the error rate for non-primary-key value detection, which is independent of the number of groups, is lower than that for primary key value detection. For primary key value detection, the number of groups $g$ has little influence because the second term in formula 2 dominates the detection probability and its coefficient $(g-1)/g \approx 1$ for not-too-small $g$.

4.2 Insert a tuple
Let a single tuple be inserted into the watermarked relation. The inserted tuple will be allocated to one of the $g$ groups in watermark detection. Since the added tuple will affect the group hash as well as the embedded watermark in a random way, the probability that the insertion can be detected (i.e., the watermark verification result is false for the affected group) is

$$\mathit{Prob} = 1 - \frac{1}{2^{(\nu+1)/2}} \qquad (3)$$

Figure 3 compares the error rate in detecting a single tuple insertion (bottom line in the figure) with that in detecting a non-primary-key value modification (middle line). Since the tuple insertion increases the watermark length while the modification does not, the error rate for detecting the tuple insertion is lower. In general, the error rate in
detecting a longer watermark is always lower than that in detecting a shorter one.

[Figure 3: Error in detecting single value tamper. y-axis: $1 - \mathit{Prob}$ (log scale); curves: "modify a nonP value", "insert a tuple", "delete a tuple".]

4.3 Delete a tuple
Let a single tuple be deleted from the watermarked relation. Exactly one group will lose the deleted tuple in watermark detection. The absence of the deleted tuple from the group will affect the group hash as well as the embedded watermark in a random way. The probability that the deletion can be detected (i.e., the watermark verification result is false for the affected group) is

$$\mathit{Prob} = 1 - \frac{1}{2^{(\nu-1)/2}} \qquad (4)$$

Figure 3 also shows the error rate for detecting a single tuple deletion (the upper line in the figure). This error rate is higher than those for detecting value modification and tuple insertion because the tuple deletion decreases the length of the watermark.

4.4 Modify multiple values
First consider the modification of non-primary-key attributes. The modification changes neither any primary key hash nor the grouping of tuples in watermark detection. Since changing any number of values within a tuple has the same effect (in terms of randomization) on the tuple hash, the group hash, and the extracted watermark, we consider the modification at the tuple level. For convenience, we call a group a modify-group if some of its tuples are modified.
Assume that $n$ tuples are modified one by one and that, at any step, un-modified tuples have equal probability of being modified. The probability that the first tuple is modified in any particular group is $\nu/(g\nu) = 1/g$. After $i$ tuples have been modified, the probability that the next tuple is modified in a particular group in which $k$ tuples have already been modified is $(\nu-k)/(g\nu-i)$.
Assume that the modification of $n$ tuples will result in $m$ ($m \le \min(g, n)$) modify-groups and that $k_i + 1$ ($0 \le k_i < \nu$) tuples are modified in modify-group $i$ ($i = 1, \ldots, m$). Given a sequence which indicates which tuple is modified in which group and in which order, one can verify that the probability of the sequence is

$$s_{k_1,\ldots,k_m} = \frac{[\nu(\nu-1)\cdots(\nu-k_1)]\cdots[\nu(\nu-1)\cdots(\nu-k_m)]}{g\nu(g\nu-1)\cdots(g\nu-n+1)} = \frac{(\nu!)^m\,(g\nu-n)!}{(\nu-k_1-1)!\cdots(\nu-k_m-1)!\,(g\nu)!} \qquad (5)$$

Then the probability that the modification of $n$ tuples will result in $m$ modify-groups is

$$p_{k_1,\ldots,k_m} = \binom{g}{m}\,\frac{n!}{(1+k_1)!\,(1+k_2)!\cdots(1+k_m)!}\,s_{k_1,\ldots,k_m} \qquad (6)$$

where the first term $\binom{g}{m}$ is the total number of combinations of $m$ modify-groups selected from $g$ groups, the second term (i.e., the fraction term) is the number of different sequences (or permutations) of the tuples modified in different modify-groups, and the last term is the probability of any sequence. After the modification, a watermark in any modify-group has $\nu/2$ bits. Therefore, the probability that all modifications can be detected is

$$\mathit{Prob} = \sum_{m=\lceil n/\nu \rceil}^{\min(g,n)} \; \sum_{\substack{0 \le k_1,\ldots,k_m < \nu \\ k_1+\cdots+k_m = n-m}} p_{k_1,\ldots,k_m} \left(1 - \frac{1}{2^{\nu/2}}\right)^{m} \qquad (7)$$

Example 1. Suppose that there are three groups with ten tuples in each group ($g = 3$ and $\nu = 10$), and suppose $n = 3$ tuples are modified. First consider the case in which the modification results in $m = 2$ modify-groups, where two tuples are modified in modify-group 1 and one tuple is modified in modify-group 2 ($k_1 = 1$ and $k_2 = 0$). The three tuples are modified one by one, and we use a sequence to indicate which tuple is modified in which group and in which order. For example, sequence 112 means that the first tuple is modified in modify-group 1, the second in modify-group 1, and the third in modify-group 2. For this sequence, the first tuple is modified with probability 10/30 because there are ten un-modified tuples in modify-group 1 and 30 un-modified tuples in total (recall that un-modified tuples have equal probability of being modified). Similarly, the second tuple is modified with probability 9/29, and the third with probability 10/28. The overall probability of the sequence is $s_{1,0} = \frac{10 \cdot 9 \cdot 10}{30 \cdot 29 \cdot 28}$ (see also formula 5). One can verify that there are in total $\frac{n!}{(1+k_1)!\,(1+k_2)!} = \frac{3!}{2!\,1!} = 3$ possible sequences (112, 121, and 211), and that all of these sequences have the same probability. Since there are $\binom{3}{2} = 3$ combinations of two modify-groups selected from three groups, the probability of the case considered is $p_{1,0} = \binom{3}{2} \cdot 3 \cdot s_{1,0}$ (see also formula 6).
Now we verify formula 7 by showing that the sum of the probabilities $p$ is exactly one. The modification of three tuples will lead to one, two, or three modify-groups. This corresponds to the leftmost sum, from $m = 1$ to 3, in formula 7. For the case of two modify-groups ($m = 2$), either one or two tuples are modified in modify-group 1; that is, either $k_1 = 0, k_2 = 1$ or $k_1 = 1, k_2 = 0$. This corresponds to the second sum in formula 7. We have $p_{0,1} = p_{1,0}$. For the case of one modify-group ($m = 1$), all three tuples are modified in the same group ($k_1 = 2$), with $s_2 = \frac{10 \cdot 9 \cdot 8}{30 \cdot 29 \cdot 28}$ and $p_2 = \binom{3}{1}\frac{3!}{3!}\,s_2$. For the case of three modify-groups ($m = 3$), one tuple is modified in each group ($k_1 = k_2 = k_3 = 0$), and we have $s_{0,0,0} = \frac{10 \cdot 10 \cdot 10}{30 \cdot 29 \cdot 28}$ and $p_{0,0,0} = \binom{3}{3}\frac{3!}{1!\,1!\,1!}\,s_{0,0,0}$.
Therefore,

$$p_2 + p_{0,1} + p_{1,0} + p_{0,0,0} = 3\,\frac{10 \cdot 9 \cdot 8}{30 \cdot 29 \cdot 28} + 18\,\frac{10 \cdot 9 \cdot 10}{30 \cdot 29 \cdot 28} + 6\,\frac{10 \cdot 10 \cdot 10}{30 \cdot 29 \cdot 28} = 1. \qquad \Box$$

One can verify that when $n = 1$, the probability is the same as that given in section 4.1. For massive modification where $n > g$ and at least one tuple is modified in each group, the probability can be simplified as:

$$\mathit{Prob} \approx \left(1 - \frac{1}{2^{\nu/2}}\right)^{g} \qquad (8)$$

[Figure: "Massive modification of n tuples (n/g > 1)"; y-axis $1 - \mathit{Prob}$ (log scale); curves for $g = 10, 30, 50$.]

As shown in section 4.1, each tuple with a modified primary key value would remain in the same group with probability $1/g$ and move out of that group with probability $1 - 1/g$. The moved-out tuple has an equal probability, $1/g$, of falling into any other group. Now consider one modified tuple in each group. For any group, the probability that the modified tuple moves out is $1 - 1/g$, and the probability that any tuple from any other group moves in is also $1 - 1/g$. Since the moving-out and moving-in probabilities are the same, each group will have the same probability $p_g$ of containing some modified tuples, and the size of each group will remain roughly the same. In-depth analysis and experimental results will be provided in an extended version of this paper.

4.5 Insert multiple tuples
Assume that $n$ tuples are inserted into the data. We call a group an insert-group if it contains at least one inserted tuple after the insertion. Since the partition into $g$ groups is kept secret from attackers, when one tuple is inserted, it has equal probability of falling into any

[...]

$$\cdots \left(1 - \frac{1}{2^{(\nu+k_2+1)/2}}\right) \cdots \left(1 - \frac{1}{2^{(\nu+k_m+1)/2}}\right) \qquad (9)$$
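The counting argument of section 4.4 can be checked mechanically. The following Python sketch (standard library only) evaluates formulas (5), (6) and (7), keeping the probabilities $p$ as exact fractions, and reproduces the check in Example 1 that they sum to one. The helper names (`compositions`, `seq_prob`, `case_prob`, `all_cases`, `detect_prob_modify`) are hypothetical. As in the analysis, an ideal hash is assumed, so each modify-group keeps a $\nu/2$-bit watermark; only the final detection term is computed in floating point, because $2^{\nu/2}$ is irrational for odd $\nu$.

```python
from fractions import Fraction
from math import comb, factorial, prod


def compositions(total, parts, upper):
    """Yield ordered tuples (k_1, ..., k_parts) with 0 <= k_i < upper summing to total."""
    if parts == 0:
        if total == 0:
            yield ()
        return
    for k in range(min(total, upper - 1) + 1):
        for rest in compositions(total - k, parts - 1, upper):
            yield (k,) + rest


def seq_prob(ks, nu, g):
    """Formula (5): probability of one particular modification sequence."""
    n = sum(k + 1 for k in ks)
    num = prod(prod(range(nu - k, nu + 1)) for k in ks)  # nu(nu-1)...(nu-k_i) per group
    den = prod(range(g * nu - n + 1, g * nu + 1))        # g*nu(g*nu-1)...(g*nu-n+1)
    return Fraction(num, den)


def case_prob(ks, nu, g):
    """Formula (6): binom(g, m) * n! / prod((1+k_i)!) * s_{k_1,...,k_m}."""
    n = sum(k + 1 for k in ks)
    orderings = factorial(n) // prod(factorial(k + 1) for k in ks)
    return comb(g, len(ks)) * orderings * seq_prob(ks, nu, g)


def all_cases(n, nu, g):
    """Index set of the double sum in formula (7)."""
    for m in range(-(-n // nu), min(g, n) + 1):  # m runs from ceil(n/nu) to min(g, n)
        yield from compositions(n - m, m, nu)


def detect_prob_modify(n, nu, g):
    """Formula (7): every modify-group carries a nu/2-bit watermark."""
    per_group = 1 - 2.0 ** (-nu / 2)
    return sum(float(case_prob(ks, nu, g)) * per_group ** len(ks)
               for ks in all_cases(n, nu, g))


if __name__ == "__main__":
    # Example 1: g = 3 groups of nu = 10 tuples, n = 3 modified tuples.
    for ks in all_cases(3, 10, 3):
        print(ks, case_prob(ks, 10, 3))
    print(sum(case_prob(ks, 10, 3) for ks in all_cases(3, 10, 3)))  # prints 1
```

Calling `detect_prob_modify(1, 10, 3)` gives $1 - 2^{-5}$, which is formula (1) for $\nu = 10$, in line with the remark after Example 1 that the case $n = 1$ reduces to section 4.1.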
[Figure: two panels plotting the error rate $1 - \mathit{Prob}$ (log scale); x-axis of the right panel: $n/(g\nu)$ (%).]
Figure 6 shows the error rate (i.e., $1 - \mathit{Prob}$) in detecting massive tuple insertion. The left sub-figure shows the error rate for different group sizes $\nu$ and numbers of groups $g$ where the inserted tuples amount to 50% of the original tuples. The right sub-figure shows the error rate for different percentages of inserted tuples $n/(g\nu)$ and numbers of groups $g$ where the group size is 50. In this figure, we set $n > g$ and consider the average case where $n/g$ tuples are inserted into each group. Since all groups are affected, we see that the smaller the $g$, the smaller the error rate. We also see that the error rate decreases exponentially with the group size $\nu$ and the percentage $n/(g\nu)$, because the length of the affected watermarks increases linearly in these cases.

4.6 Delete multiple tuples
Assume that $n$ tuples are deleted from the data. If at least one but not all tuples are deleted from a group, the deletion will affect the group hash and the corresponding watermark in a random way. If all tuples are deleted from one group, there is no way to detect the modification, in which case the probability of detection is zero. For convenience, we call a group a delete-group if at least one tuple is deleted from it.
We assume that every tuple has equal probability of being deleted. The probability that the first tuple is deleted from any particular group is $\nu/(g\nu) = 1/g$. After $i$ tuples have been deleted, the probability that the next tuple is deleted from a particular group consisting of $\nu - k$ tuples is $(\nu-k)/(g\nu-i)$.
Assume that the deletion of $n$ tuples will result in $m$ ($m \le \min(g, n)$) delete-groups and that $k_i + 1$ ($0 \le k_i < \nu$) tuples are deleted from delete-group $i$ ($i = 1, \ldots, m$). The analysis is similar to section 4.4 because the probability that an un-deleted tuple is deleted is the same as the probability that an un-modified tuple is modified. Given a sequence which indicates which tuple is deleted from which group and in which order, one can verify that the probability of the sequence is the same as $s_{k_1,\ldots,k_m}$ given in equation 5. Also, the probability that the deletion will result in $m$ delete-groups is the same as $p_{k_1,\ldots,k_m}$ given in equation 6. After the deletion, a watermark in delete-group $i$ has $(\nu-k_i-1)/2$ bits (this is different from the modification case). Therefore, the probability that all deletions can be detected is

$$\mathit{Prob} = \sum_{m=\lceil n/\nu \rceil}^{\min(g,n)} \; \sum_{\substack{0 \le k_1,\ldots,k_m < \nu \\ k_1+\cdots+k_m = n-m}} p_{k_1,\ldots,k_m} \left(1 - \frac{1}{2^{(\nu-k_1-1)/2}}\right)\left(1 - \frac{1}{2^{(\nu-k_2-1)/2}}\right) \cdots \left(1 - \frac{1}{2^{(\nu-k_m-1)/2}}\right) \qquad (11)$$

One can verify that when $n = 1$, the probability is the same as that given in section 4.3. For massive deletion where $n > g$ and $n/g$ tuples are deleted from each group on average, the probability can be simplified as:

$$\mathit{Prob} \approx \left(1 - \frac{1}{2^{(\nu-n/g)/2}}\right)^{g} \qquad (12)$$

Figure 7 shows the error rate (i.e., $1 - \mathit{Prob}$) in detecting massive tuple deletion. The left sub-figure shows the error rate for different group sizes $\nu$ and numbers of groups $g$ where 50% of the tuples are deleted. The right sub-figure shows the error rate for different percentages of deleted tuples $n/(g\nu)$ and numbers of groups $g$ where the group size is 50. In this figure, we set $n > g$ and consider the average case where $n/g$ tuples are deleted from each group. Since all groups are affected, the error rate increases monotonically with $g$. On the other hand, the more tuples are deleted, the shorter the affected watermarks and the larger the error rate. Increasing $\nu$ has a similar effect on the detection probability as decreasing $n/g$.

4.7 Comparison of Detecting Massive Tampering
Figure 5 compares the error rate in detecting massive tampering (massive value modifications, tuple insertions, and tuple deletions) for different group sizes $\nu$ where $g = 10$ and $n/(g\nu) = 50\%$. As the group size increases, all error rates decrease (exponentially) because the length of the affected watermarks increases (linearly). For the same group size, tuple deletion always yields a larger error rate than tuple modification, which in turn yields a larger error rate than tuple insertion. The reason is that the length of the affected watermarks decreases, remains the same, and increases, respectively, in tuple deletion, modification, and insertion.
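The ordering stated in section 4.7 follows from the simplified formulas. The sketch below is a hypothetical helper, not part of the scheme. It assumes the average case used for Figures 6 and 7, where all $g$ groups are affected and each receives $n/g$ of the changes, so that the error rate is $1 - (1 - 2^{-L})^g$ for an affected watermark of $L$ bits. Deletion uses $L = (\nu - n/g)/2$ from formula (12) and modification uses $L = \nu/2$ from formula (8); for insertion, $L = (\nu + n/g)/2$ is an assumption by analogy, since the simplified insertion formula is not reproduced in this section.

```python
import math


def massive_error(bits, g):
    """1 - Prob when each of g affected groups carries a `bits`-bit watermark.

    Under the ideal-hash assumption an L-bit watermark fails to reveal the
    tampering with probability 2**-L, so Prob = (1 - 2**-L)**g.
    log1p/expm1 keep tiny error rates accurate in floating point.
    """
    if bits <= 0:
        return 1.0  # an emptied group cannot be checked at all
    return -math.expm1(g * math.log1p(-(2.0 ** -bits)))


def compare(nu, g, fraction):
    """Approximate error rates for massive tampering with n = fraction * g * nu.

    delete: formula (12), bits = (nu - n/g) / 2
    modify: formula (8),  bits = nu / 2
    insert: bits = (nu + n/g) / 2, an assumption by analogy (not a formula
            reproduced in this section)
    """
    per_group = fraction * nu  # n/g tuples per group on average
    return {
        "delete": massive_error((nu - per_group) / 2, g),
        "modify": massive_error(nu / 2, g),
        "insert": massive_error((nu + per_group) / 2, g),
    }


if __name__ == "__main__":
    # Same setting as the comparison above: g = 10 and n/(g*nu) = 50%.
    for nu in range(10, 100, 20):
        rates = compare(nu, g=10, fraction=0.5)
        print(nu, {kind: f"{rate:.2e}" for kind, rate in rates.items()})
```

With `g=1`, `massive_error` gives the single-tamper error rates: `bits` equal to $\nu/2$, $(\nu+1)/2$ and $(\nu-1)/2$ correspond to formulas (1), (3) and (4). The `log1p`/`expm1` form is a design choice that keeps the very small insertion error rates from rounding to zero.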
[Figure: two panels, "Massive deletion of n tuples ($n/(g\nu) = 50\%$)" and "Massive deletion of n tuples ($\nu = 50$)"; y-axis $1 - \mathit{Prob}$ (log scale); x-axis of the right panel $n/(g\nu)$ (%); curves for $g = 10, 30, 50$.]
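Formula (11) can also be cross-checked against a direct simulation of the deletion model of section 4.6, in which every remaining tuple is equally likely to be deleted. The Python sketch below computes formula (11) exactly over the same index set as formula (7) and compares it with a Monte Carlo estimate. The function names, the seed, and the trial count are illustrative choices; an ideal hash is assumed, so a group left with $r$ tuples is checked correctly with probability $1 - 2^{-r/2}$, which is zero when $r = 0$.

```python
import random
from fractions import Fraction
from math import comb, factorial, prod


def compositions(total, parts, upper):
    """Ordered tuples (k_1, ..., k_parts) with 0 <= k_i < upper summing to total."""
    if parts == 0:
        if total == 0:
            yield ()
        return
    for k in range(min(total, upper - 1) + 1):
        for rest in compositions(total - k, parts - 1, upper):
            yield (k,) + rest


def case_prob(ks, nu, g):
    """Equations 5 and 6: probability that k_i + 1 deletions hit delete-group i."""
    n = sum(k + 1 for k in ks)
    seq = Fraction(prod(prod(range(nu - k, nu + 1)) for k in ks),
                   prod(range(g * nu - n + 1, g * nu + 1)))
    orderings = factorial(n) // prod(factorial(k + 1) for k in ks)
    return comb(g, len(ks)) * orderings * seq


def checked(remaining):
    """Detection probability for a group left with `remaining` tuples
    (a remaining/2-bit watermark under the ideal-hash assumption)."""
    return 1 - 2.0 ** (-remaining / 2)


def detect_prob_delete(n, nu, g):
    """Formula (11): exact probability that all n deletions are detected."""
    total = 0.0
    for m in range(-(-n // nu), min(g, n) + 1):
        for ks in compositions(n - m, m, nu):
            total += float(case_prob(ks, nu, g)) * prod(checked(nu - k - 1) for k in ks)
    return total


def simulate_delete(n, nu, g, trials=20000, seed=1):
    """Monte Carlo estimate: delete n of the g*nu tuples uniformly at random."""
    rng = random.Random(seed)
    acc = 0.0
    for _ in range(trials):
        deleted = [0] * g
        for t in rng.sample(range(g * nu), n):
            deleted[t // nu] += 1  # tuple t belongs to group t // nu
        acc += prod(checked(nu - d) for d in deleted if d > 0)
    return acc / trials


if __name__ == "__main__":
    for n in (3, 6, 9, 15):
        exact = detect_prob_delete(n, 10, 3)
        estimate = simulate_delete(n, 10, 3)
        print(f"n={n}: formula (11) {exact:.4f}, simulation {estimate:.4f}")
```

`detect_prob_delete(1, 10, 3)` reproduces formula (4), and deleting every tuple (`detect_prob_delete(12, 4, 3)`) gives zero, matching the observation at the start of section 4.6 that an emptied group cannot be detected.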
4.8 Discussion on group size
The group size is a very important parameter in our scheme. It partially determines how the tuples are grouped. If an attacker knows the group size and, in addition, the secret key $K$, he can simply delete all tuples in a group, and the scheme will fail to detect this kind of modification. Thus, to foil this attack, we must keep the group size or the secret key secure so that an attacker has no knowledge of the grouping information used for watermark embedding.
The length of each watermark, $\nu/2$, is half of the group size. On the one hand, the larger the group size, the larger the probability of detecting modifications in watermark detection, and the more secure the proposed scheme. On the other hand, the larger the group size, the less precisely we can localize modifications, because there are more tuples in each group (recall that a group is the finest granularity of localization). We can therefore trade off security against localization when choosing the group size $\nu$.
In our scheme and analysis, the group size $\nu$ is determined by the number of groups $g$ and the total number of tuples in the data. Given $\nu$, the larger the $g$, the smaller the probability of tamper detection (see the previous subsections). If $g$ is too large, we can easily partition the groups into smaller collections and then apply watermark detection to each collection (this is equivalent to decreasing $g$ without changing $\nu$), so that the probability of tamper detection is large enough within each collection.
In our scheme, since an attacker's purpose is to make his modifications undetectable by keeping the embedded watermarks untouched, it is unreasonable for him to make many modifications. Given the group size and the number of groups in each collection, the more modifications an attacker makes, the more likely his modifications will be detected. In our scheme, the purpose is to keep the detection probability as high as possible. After all, it is more important to detect modifications than to narrow down the range of modifications.

5. CONCLUSIONS
In this paper, we identified the problem of tamper detection and localization for a database relation with categorical attributes and proposed a novel fragile watermarking scheme to address this problem. Unlike other watermarking schemes, which inevitably introduce distortions to the cover data, the proposed scheme embeds the watermark by modifying the order of tuples; thus, it is distortion free. In the proposed scheme, all tuples in a database relation are first securely divided into groups according to some secure parameters. Watermarks are embedded and verified in each group independently. Thus, any modifications can be detected and localized to some specific groups. Security analysis showed that the probability of missing detections is very low.

6. ACKNOWLEDGEMENT
The authors would like to thank the anonymous reviewers for their valuable comments. The work of Huiping Guo was supported in part by the Natural Sciences and Engineering Research Council of Canada (NSERC) Postdoctoral Fellowship. The work of Sushil Jajodia was partially supported by the National Science Foundation under grants CCR-0113515 and IIS-0242237.