IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY, VOL. 22, NO. 1, JANUARY 2012
Abstract: We introduce video-sync (VSYNC), a video synchronization system that efficiently uses a bidirectional communications link to maintain up-to-date video sources at remote ends to a desired resolution and distortion level. By automatically detecting and transmitting only the differences between video files, VSYNC avoids unnecessary re-transmission of the entire video when there are only minor differences between video copies. A hierarchical hashing scheme is designed to allow synchronization to within a user-defined distortion while remaining rate-efficient and computationally tractable. Distributed video coding is used to realize further rate savings when transmitting video updates. VSYNC is bandwidth-efficient and is useful in many scenarios, including video backup, video sharing, and video authentication applications. Experimental results show that VSYNC achieves rate savings of 2 to 10 times, with about 10% of the frames being edited, compared to re-transmitting the compressed video or using a file synchronization utility such as rsync.

Index Terms: rsync, video coding, video file synchronization, video hash, VSYNC.

I. Introduction

… file utilities in the Unix/Linux operating systems. These conventional utilities are efficient in reducing the bandwidth for exact bit-level synchronization of general data files. However, video files with the same content may be stored in different bitstream formats at different locations, and when sent over the network they may be encoded in various ways to reduce transmission redundancy and to meet the receivers' device characteristics. These standard synchronization protocols are therefore unsuitable for video data because they fail to capture similarity at the content level.

This motivates us to investigate protocols that can patch and update the differences between video files, which we call video synchronization. There can be multiple versions of similar video sources that users wish to synchronize to a single version. Different videos might also have different video parameters, such as resolution and quality. This calls for a protocol that is video-centric and that can synchronize video files to satisfy predefined distortion constraints. In this paper, we introduce video-sync (VSYNC), a video-centric video file synchronization protocol that can detect and send the changes …
When applied to synchronization of video files, such content-agnostic approaches would be very inefficient.

B. Image and Video Hashing

There are various image and video hashing techniques in the literature, many of which are designed for tampering detection. Fridrich and Goljan [8] proposed a tampering detection method that is robust to JPEG compression and additive noise: discrete cosine transform (DCT) blocks of an image are projected onto random vectors, followed by adaptive thresholding to obtain binary hashes. Roy and Sun [9] proposed an image hash with two parts. One part consists of quantized projections of an image's scale-invariant feature transform features onto random hyperplanes. The other part consists of the orientation of the image and block-wise histogram statistics. Lin and Chang [10] proposed a hashing technique for image authentication that can prevent malicious manipulations but allows JPEG lossy compression. The authentication signature is based on the invariance of the relationships between DCT coefficients at the same position in separate blocks of an image.

Lin et al. introduced a rate-efficient image tamper detection technique [11], similar in spirit to a secure biometric hash proposed by Draper et al. [12], in which blocks of the query image are projected onto random vectors, quantized, and then syndrome encoded using a suitable low-density parity-check (LDPC) code [13], [14]. The syndromes and a cryptographic hash of the quantized projections are sent over the network. The same projections and cryptographic hash are also computed for the original image, and the received syndrome bits are decoded using the projections of the original image as side information. If the decoding is successful and the cryptographic hashes match, a match is declared. In Lin's work [11], tolerable differences between the original image and the query image are assumed to be due to quantization errors, and the authors showed empirically that the difference between the unquantized projection coefficients follows a Gaussian distribution after compression. However, the Gaussian assumption may break down after quantization, right before LDPC coding is applied, and therefore the thresholds used need to be determined experimentally.

While the above-mentioned hashing algorithms are robust to certain kinds of compression distortion and sensitive to content-based visual attacks, their focus is on tamper detection rather than file synchronization, and they do not account for transmission efficiency. Furthermore, only the empirical performance of these hashing algorithms was reported, and no quantifiable relationship between the similarity of the hashes and that of the underlying images/videos has been established.

VSYNC differs from previous works in several aspects. First, VSYNC is a bidirectional protocol that aims at synchronizing video files at remote ends in a rate-efficient and computation-efficient manner, rather than detecting image tampering. Second, a hierarchical hashing scheme is designed to learn the differences and similarities between the video data; these can be analytically quantified and mapped to the Hamming distance between the respective hashes. Third, while we use channel coding to reduce the hash rate, we also apply an extra MSE check after channel decoding to validate whether the underlying content satisfies the required distortion. This estimated MSE is then used by a distributed video coding (DVC) approach to estimate and transmit the necessary additional bits, which avoids unnecessary overhead and further improves transmission efficiency.

Fig. 2. VSYNC protocol. In step 1, user U1 generates weak hashes for the GOFs and strong hashes for the MB tubes and sends them to user U2. User U2 verifies the hashes in step 2 and generates new content and assembly instructions in step 3. In step 4, user U1 decodes the new content and assembles the updated video.

III. System Overview

We first present an overview of the VSYNC system, as illustrated in Fig. 2. Define a group of frames (GOF) as a series of F frames. Each frame of size N1 × N2 is divided into q × q macroblocks (MBs). Also define an MB tube as the set of co-located MBs within a GOF, as shown in Fig. 3. When an update request is initiated, both Vorig and Vupd are first resized, if necessary, to the resolution required for Vupd.¹ In step 1 in Fig. 2, user U1 computes a weak hash of every non-overlapping GOF of Vorig and a strong hash for each MB tube in each GOF. In step 2, user U2 computes the weak hash of every overlapping GOF of Vupd, i.e., GOFs that overlap each other by a fixed number of frames, and compares it with the weak hashes sent by U1. This weak hash check locates, for each GOF in Vorig, all the possible matching GOFs in Vupd. Whenever the weak hash check …

¹ For example, if the required resolution of Vupd is 640 × 480 and the original resolutions of Vorig and Vupd are 320 × 240 and 1280 × 960, respectively, then Vorig should be upsampled by a factor of 2 and Vupd should be downsampled by a factor of 2.
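The partitioning defined in Section III (GOFs of F frames, q × q MBs, and MB tubes formed from co-located MBs) can be sketched as follows. This is a minimal illustration under assumptions that are ours, not the paper's: frames are plain nested lists of pixel values, N1 and N2 are multiples of q, and the overlapping GOFs used by U2 are generated with a hypothetical `step` parameter (the paper only states that they overlap by a fixed number of frames).

```python
from typing import Dict, List, Tuple

# Hypothetical layout: a frame is an N1 x N2 grid of pixel values.
Frame = List[List[int]]


def gof_starts(num_frames: int, F: int, step: int) -> List[int]:
    """First-frame indices of all complete GOFs of F frames, spaced `step` frames apart.

    step == F gives non-overlapping GOFs (as U1 hashes Vorig);
    step < F gives GOFs overlapping by F - step frames (as U2 hashes Vupd).
    """
    if not 0 < step <= F:
        raise ValueError("step must satisfy 0 < step <= F")
    return list(range(0, num_frames - F + 1, step))


def mb_tubes(frames: List[Frame], start: int, F: int, q: int) -> Dict[Tuple[int, int], List[int]]:
    """Map each MB position (row, col) to its MB tube: the co-located q x q
    macroblocks of frames start .. start+F-1, flattened into one vector
    of Ns = F * q * q pixels."""
    n1, n2 = len(frames[0]), len(frames[0][0])
    if n1 % q or n2 % q:
        raise ValueError("frame dimensions must be multiples of q in this sketch")
    tubes: Dict[Tuple[int, int], List[int]] = {}
    for i in range(n1 // q):
        for j in range(n2 // q):
            tubes[(i, j)] = [
                frames[t][r][c]
                for t in range(start, start + F)
                for r in range(i * q, (i + 1) * q)
                for c in range(j * q, (j + 1) * q)
            ]
    return tubes


if __name__ == "__main__":
    # Ten synthetic 4 x 4 frames whose pixel values encode (frame, row, col).
    video = [[[100 * t + 10 * r + c for c in range(4)] for r in range(4)] for t in range(10)]
    print(gof_starts(len(video), F=4, step=4))  # non-overlapping: [0, 4]
    print(gof_starts(len(video), F=4, step=2))  # overlapping by 2 frames: [0, 2, 4, 6]
    tubes = mb_tubes(video, start=0, F=4, q=2)
    print(len(tubes), len(tubes[(0, 0)]))  # 4 MB tubes of 4 * 2 * 2 = 16 pixels each
```

Using a smaller step for Vupd than for Vorig is what lets the weak hash check find a matching GOF even when frames have been inserted or deleted before it.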
TABLE I
Key Notations

Notation            Definition
Vorig               Video to be updated
Vupd                Destination (updated) video
F                   No. of frames in a GOF
N1 × N2             Dimensions of a frame
q × q               Dimensions of an MB
v1 (v2)             Sampled pixel values of GOF in Vorig (Vupd)
$\mu_1$ ($\mu_2$)   Pixel values of MB tube in Vorig (Vupd)
Kw (Ks)             No. of projections for weak (strong) hash
w1 (w2)             Weak hash of Vorig (Vupd)

Fig. 4. The angle subtended by the rays from the origin to z1 and z2 can be found using simple trigonometry to be $\theta = \arccos\frac{\langle z_1, z_2\rangle}{\|z_1\|\,\|z_2\|}$. If a hyperplane orientation is chosen uniformly at random, then the probability of the hyperplane separating z1 and z2 is just $p = \frac{1}{\pi}\arccos\frac{\langle z_1, z_2\rangle}{\|z_1\|\,\|z_2\|}$.
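The relation in the Fig. 4 caption, on which both the weak and the strong hash rely, can be checked numerically with the sketch below. Drawing hyperplane normals from an isotropic Gaussian is our choice for obtaining uniformly random orientations, and the shared `seed` stands in for the pre-agreed common pseudo-randomness the text mentions; the function names and parameters are hypothetical, not taken from the paper.

```python
import math
import random
from typing import List, Sequence


def random_hyperplanes(k: int, dim: int, seed: int) -> List[List[float]]:
    """k hyperplanes through the origin, represented by their normal vectors.

    Isotropic Gaussian normals have uniformly distributed orientations, and
    both users can regenerate identical planes from the same seed.
    """
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(k)]


def sign_hash(x: Sequence[float], planes: Sequence[Sequence[float]]) -> List[int]:
    """One bit per hyperplane: 1 if x lies on the non-negative side, else 0."""
    return [1 if sum(a * b for a, b in zip(p, x)) >= 0.0 else 0 for p in planes]


def separation_probability(z1: Sequence[float], z2: Sequence[float]) -> float:
    """p = arccos(<z1, z2> / (||z1|| ||z2||)) / pi, as in the Fig. 4 caption."""
    dot = sum(a * b for a, b in zip(z1, z2))
    norms = math.sqrt(sum(a * a for a in z1)) * math.sqrt(sum(b * b for b in z2))
    cos_theta = max(-1.0, min(1.0, dot / norms))  # guard against rounding outside [-1, 1]
    return math.acos(cos_theta) / math.pi


if __name__ == "__main__":
    z1, z2 = [1.0, 0.0, 0.0], [1.0, 1.0, 0.0]  # 45 degrees apart
    planes = random_hyperplanes(k=20000, dim=3, seed=7)
    h1, h2 = sign_hash(z1, planes), sign_hash(z2, planes)
    empirical = sum(a != b for a, b in zip(h1, h2)) / len(planes)
    print(separation_probability(z1, z2))  # 0.25 up to floating-point rounding
    print(empirical)  # fraction of differing bits, close to 0.25 for large k
```

The fraction of differing hash bits is an unbiased estimate of p, which is why a normalized Hamming distance can stand in for the angle between two pixel vectors.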
… goals, the $F N_1 N_2$ pixels of each GOF are uniformly randomly sub-sampled. The sub-sampling reduces the computational complexity, and the sampling fraction, denoted by f, can be chosen to trade off complexity against accuracy. The sub-sampling can be coordinated between user U1 and user U2 through pre-agreed common pseudo-randomness in the hash generation process, for both the GOF and the MB tubes within each GOF. Denote by v1 and v2, respectively, the vectors of randomly sampled pixels of the GOFs in Vorig and Vupd, and by $N_w = F N_1 N_2 f$ the number of sampled pixels. v1 and v2 are then projected onto Kw random hyperplanes $P_k \in \mathbb{R}^{N_w}$, $k \in \{1, \ldots, K_w\}$. Each projected value is then quantized into one bit according to its sign. Denote by w1 and w2 the vectors of the quantized values. To keep the weak hash check computationally light, only the Hamming distance $d_H(w_1, w_2)$ between w1 and w2 is checked. The relationship between $d_H(w_1, w_2)$ and the distortion tolerance TM is discussed next.

Following the idea used in (1), the entries of w1 and w2 are either 0 or 1, and each pair of corresponding bits, denoted by w1[i] and w2[i], i = 1, 2, …, Kw, differs with probability

$$\Pr(w_1[i] \neq w_2[i]) = \frac{1}{\pi}\arccos\frac{\langle v_1, v_2\rangle}{\|v_1\|\,\|v_2\|},$$

which can be estimated by $d_H(w_1, w_2)/K_w$.

Given v1 of length Nw, any vector within an MSE of TM from it lies within the circle of radius $\sqrt{N_w T_M}$ centered at v1, as shown in Fig. 5. The largest possible angle $\theta_w$ that any vector within the circle can make with v1 is formed between v1 and the vector tangent to the circle. Vectors that form an angle larger than $\theta_w$ with v1 therefore fail the MSE requirement and should not pass the weak hash check. The threshold $\theta_w$ is obtained as

$$\theta_w = \arcsin\frac{\sqrt{N_w T_M}}{\|v_1\|}. \tag{3}$$

Since $d_H(w_1, w_2)/K_w$ provides a reasonably good estimate of the probability that a random hyperplane separates v1 and v2, the corresponding two GOFs pass the weak hash check if $d_H(w_1, w_2)/K_w < \theta_w/\pi$ and fail otherwise. One threshold $\theta_w$ per GOF is piggy-backed on the hash bits to be transmitted; since only one threshold per GOF is sent, the overhead is small.

In summary, the weak hash check works as follows. User U1, who wants to synchronize his/her video Vorig to Vupd up …

D. Strong Hash Check

As discussed in the previous section, it is desirable for the weak hash to have a small false negative rate. Indeed, the weak hash rejects only vectors that form angles larger than $\theta_w$ with v1, and all such vectors fail the MSE requirement. However, this property also makes the weak hash poor at detecting other cases that violate the MSE criterion, i.e., it can have a high false positive rate. For example, vectors that form an angle less than $\theta_w$ but lie outside the circle, as shown in Fig. 5, are not rejected by the weak hash check. Since the weak hash is generated from only a sub-sampled set of pixels, some changes in certain areas may also go undetected. For example, if the only difference between Vorig and Vupd is that Vupd has a logo in the upper-left corner throughout the video and the logo covers only a small fraction of the frame area, the weak hash is likely to miss it. Therefore, a second hash check that can detect such cases is needed to verify the potential matches returned by the weak hash check.

With these guidelines in mind, we design the strong hash to operate at a finer granularity within the GOFs, i.e., on an MB tube basis. To form the strong hash of an MB tube, all the pixel values of the MB tube are arranged into a vector, which is projected onto Ks random hyperplanes, and each projection is quantized according to its sign. Denote by $\mu_1$ ($\mu_2$) the vector of pixel values of the MB tube in Vorig (Vupd), and by s1 (s2) the vector of quantized projection bits in Vorig (Vupd). The angle threshold $\theta_s$ for the strong hash is computed in the same way as for the weak hash. Denote by $N_s = F q^2$ the number of pixels in an MB tube. Given the MSE threshold TM, the corresponding angle threshold for any MB tube $\mu_1$ is $\theta_s = \arcsin\left(\sqrt{N_s T_M}/\|\mu_1\|\right)$. To reduce the number of thresholds that need to be transmitted, we select L + 1 representative thresholds $\theta_s^l$, l = 0, 1, …, L, which can be pre-designed. We say that $\theta_s$ is in class $C_l$ if $\theta_s^{l-1} < \theta_s \le \theta_s^l$, l = 1, 2, …, L; the representative threshold value for class $C_l$ is $\theta_s^l$, and only the class index l is transmitted. When the MB tube $\mu_1$ underlying s1 belongs to class $C_l$, the cross-over probability p between the bits of s1 and s2 should satisfy $p \le p_s^l = \theta_s^l/\pi$ if the underlying MB tubes satisfy the MSE constraint TM. A small false negative rate is guaranteed because any $\mu_1$ whose $\theta_s$ falls in class $C_l$ is always checked against the largest value $\theta_s^l$ in that class.

However, sending the raw strong hash bits as is can be wasteful, so we apply a channel coding concept to reduce the required hash rate, as elaborated below.
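Equation (3), the weak hash decision rule $d_H(w_1, w_2)/K_w < \theta_w/\pi$, and the assignment of $\theta_s$ to a class $C_l$ can be sketched as below. The function names are hypothetical, the representative thresholds in the usage example are arbitrary values chosen only for illustration (the paper leaves them as pre-designed), and the handling of the case $\sqrt{N T_M} \ge \|v\|$, where the arcsine is undefined, is our own assumption.

```python
import bisect
import math
from typing import Sequence


def angle_threshold(norm_v: float, n_pixels: int, t_m: float) -> float:
    """Eq. (3): largest angle between v and any vector within MSE t_m of v.

    With n_pixels = Ns and norm_v = ||mu1|| this also gives theta_s.
    """
    ratio = math.sqrt(n_pixels * t_m) / norm_v
    if ratio >= 1.0:
        # Assumption of this sketch: the MSE ball reaches the origin, so no
        # direction can be ruled out and every angle up to pi is admitted.
        return math.pi
    return math.asin(ratio)


def hamming(a: Sequence[int], b: Sequence[int]) -> int:
    return sum(x != y for x, y in zip(a, b))


def weak_hash_check(w1: Sequence[int], w2: Sequence[int], theta_w: float) -> bool:
    """Pass iff d_H(w1, w2) / Kw < theta_w / pi."""
    return hamming(w1, w2) / len(w1) < theta_w / math.pi


def strong_hash_class(theta_s: float, reps: Sequence[float]) -> int:
    """Class index l with reps[l-1] < theta_s <= reps[l], where reps holds the
    sorted representative thresholds theta_s^0 < ... < theta_s^L."""
    l = bisect.bisect_left(reps, theta_s)  # first l with reps[l] >= theta_s
    if l == 0 or l == len(reps):
        raise ValueError("theta_s lies outside the pre-designed thresholds")
    return l


if __name__ == "__main__":
    theta_w = angle_threshold(norm_v=100.0, n_pixels=100, t_m=25.0)  # asin(0.5) = pi/6
    w1 = [0] * 12
    print(weak_hash_check(w1, [1] + [0] * 11, theta_w))       # 1/12 < 1/6 -> True
    print(weak_hash_check(w1, [1, 1, 1] + [0] * 9, theta_w))  # 3/12 >= 1/6 -> False
    reps = [0.0, math.pi / 12, math.pi / 6, math.pi / 4, math.pi / 3]
    l = strong_hash_class(0.3, reps)
    print(l, reps[l] / math.pi)  # class index and its crossover limit p_s^l
```

`bisect_left` returns the first representative threshold that is greater than or equal to $\theta_s$, which matches the half-open class definition $\theta_s^{l-1} < \theta_s \le \theta_s^l$ directly.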
… belief propagation (BP) decoder to decode the syndrome bits, using the corresponding s2 as side information. Following Korner and Marton [18], we let U2 first compute the encoded bits s2e of s2 and their modulo-2 sum with s1e. Denoting $z = s_1 \oplus s_2$ and $z_e = s_{1e} \oplus s_{2e}$, we have

$$z_e = s_{1e} \oplus s_{2e} = P^T s_1 \oplus P^T s_2 = P^T (s_1 \oplus s_2) = P^T z. \tag{4}$$

The bits of z should be Bernoulli with some parameter p. Since $P^T$ is designed to correct up to a flipping probability $p_s^l$, user U2 should be able to decode z from $z_e$ if $p \le p_s^l$ and fail otherwise. The hypothesis test of whether the uncoded strong hash satisfies the Hamming distance check is therefore recast as a decodability problem. The rate of the LDPC code will also be close to the theoretical lower bound $H(p_s^l)$ when Ks is large, where $H(\cdot)$ is the binary entropy function. For this reason, we group MB tubes by class $C_l$ so that their projected bits can be concatenated into longer codewords. As a result of syndrome encoding, large rate savings can be obtained compared to sending the binary bits directly, especially when the desired video distortion is low and the corresponding threshold $p_s^l$ is small. However, LDPC BP decoding requires more computation, so we apply the strong hash check only to GOF pairs that pass the weak hash check.

To verify whether the content units satisfy the MSE threshold TM, user U2 also needs to check the MSE between $\mu_1$ and $\mu_2$ in addition to the LDPC decodability check. Using (2), we can estimate their Euclidean distance as

$$d_E(\mu_1, \mu_2) = \sqrt{\|\mu_1\|^2 + \|\mu_2\|^2 - 2\|\mu_1\|\,\|\mu_2\|\cos\left(\frac{\pi\, d_H(s_1, s_2)}{K_s}\right)}, \tag{5}$$

where $d_H(s_1, s_2)$ is the Hamming distance returned by the LDPC decoder (the $\ell_1$ norm of z in this case). However, $\|\mu_1\|$ is unknown at user U2. Suppose s1 comes from an MB tube that belongs to class $C_l$ and thus has threshold angle $\theta_s^l$. From Fig. 5 and the classification criterion, we know that

$$\ell_1 = \frac{\sqrt{N_s T_M}}{\sin\theta_s^l} \le \|\mu_1\| < \frac{\sqrt{N_s T_M}}{\sin\theta_s^{l-1}} = u_1. \tag{6}$$

Since the right-hand side of (5) is a convex function of $\|\mu_1\|$, its maximum over this interval is attained at an endpoint:

$$d_E(\mu_1, \mu_2) \le \max(d_1, d_2) = d_{Em}(\mu_1, \mu_2), \tag{7}$$

where

$$d_1 = \sqrt{\ell_1^2 + \|\mu_2\|^2 - 2\ell_1\|\mu_2\|\cos\left(\frac{\pi\, d_H(s_1, s_2)}{K_s}\right)}, \qquad d_2 = \sqrt{u_1^2 + \|\mu_2\|^2 - 2u_1\|\mu_2\|\cos\left(\frac{\pi\, d_H(s_1, s_2)}{K_s}\right)}.$$

Therefore, $d_{Em}(\mu_1, \mu_2) \le \sqrt{N_s T_M}$ is a sufficient condition for the MSE requirement $d_E(\mu_1, \mu_2) \le \sqrt{N_s T_M}$. Since the class thresholds $\theta_s^{l-1}$ and $\theta_s^l$, Ns, TM, and Ks are given, and $\|\mu_2\|$ and $d_H(s_1, s_2)$ can be computed at U2, the MSE check is easy to perform.

In summary, the strong hash check works as follows. For each pair of GOFs that pass the weak hash check, user U2 classifies the MB tubes of the GOF in video Vupd in the same way as instructed by user U1 and performs the corresponding LDPC decoding. If the decoding fails, U2 marks the MB tubes and requests retransmission. Otherwise, the decoded Hamming distance is used to perform the MSE check by verifying $d_{Em}(\mu_1, \mu_2) \le \sqrt{N_s T_M}$.

V. Rate-Efficient Update

For GOFs that fail the weak hash check, user U2 should inform user U1 that these GOFs in video Vorig are obsolete and should be deleted. User U2 also sends both the new GOFs in video Vupd that match none of the GOFs in video Vorig and the MB tubes that fail the strong hash decoding process. Users can choose appropriate encoding schemes, e.g., H.264, according to the application requirements to encode the GOFs and MBs.

If an MB tube passes the strong hash check (and hence also the weak hash check) but fails the MSE check, the corresponding MB tubes are still likely to be highly correlated, since the decoded Hamming distance passed the angular threshold. The rate needed to update the MBs can therefore be reduced if user U2 exploits this correlation. We apply a DVC technique for this purpose: client U2 transmits just enough information about $\mu_2$ for user U1 to perform incremental updates to $\mu_2$, using $\mu_1$ as side information. The upper bound $d_{Em}(\mu_1, \mu_2)$ from (7) is used to estimate the rate needed to update the corresponding MB tube so that its MSE does not exceed TM.
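Equations (4) through (7) can be exercised with the following sketch. The toy parity-check matrix `H` stands in for $P^T$; it is not an LDPC code and no BP decoding is attempted, so it only illustrates the linearity used in (4). The MSE check evaluates (5) at both endpoints of the interval in (6) and compares the larger value with $\sqrt{N_s T_M}$. All function names and numbers are illustrative assumptions, and returning an infinite bound when the lower class threshold is zero (so that $u_1$ is unbounded) is our own convention.

```python
import math
from typing import List, Sequence


def syndrome(H: Sequence[Sequence[int]], s: Sequence[int]) -> List[int]:
    """Syndrome H s over GF(2); each row of H is one parity check."""
    return [sum(h & b for h, b in zip(row, s)) % 2 for row in H]


def xor(a: Sequence[int], b: Sequence[int]) -> List[int]:
    return [x ^ y for x, y in zip(a, b)]


def estimated_distance(norm1: float, norm2: float, d_h: int, k_s: int) -> float:
    """Eq. (5): law of cosines with the angle estimated as pi * d_H / Ks."""
    cos_t = math.cos(math.pi * d_h / k_s)
    return math.sqrt(max(0.0, norm1 ** 2 + norm2 ** 2 - 2.0 * norm1 * norm2 * cos_t))


def distance_bound(norm2: float, d_h: int, k_s: int, n_s: int, t_m: float,
                   theta_lo: float, theta_hi: float) -> float:
    """Eqs. (6)-(7): d_Em, the larger of eq. (5) at ||mu1|| = ell_1 and at ||mu1|| = u_1.

    theta_lo and theta_hi are the class thresholds theta_s^{l-1} and theta_s^l.
    """
    r = math.sqrt(n_s * t_m)
    if theta_lo <= 0.0:
        return math.inf  # u_1 is unbounded; convention of this sketch
    ell_1 = r / math.sin(theta_hi)
    u_1 = r / math.sin(theta_lo)
    return max(estimated_distance(ell_1, norm2, d_h, k_s),
               estimated_distance(u_1, norm2, d_h, k_s))


def mse_check(norm2: float, d_h: int, k_s: int, n_s: int, t_m: float,
              theta_lo: float, theta_hi: float) -> bool:
    """Sufficient condition d_Em <= sqrt(Ns * TM) for MSE(mu1, mu2) <= TM."""
    bound = distance_bound(norm2, d_h, k_s, n_s, t_m, theta_lo, theta_hi)
    return bound <= math.sqrt(n_s * t_m)


if __name__ == "__main__":
    H = [[1, 1, 0, 1], [0, 1, 1, 1]]
    s1, s2 = [1, 0, 1, 1], [0, 0, 1, 0]
    # Eq. (4): the XOR of the two syndromes equals the syndrome of the XOR.
    print(xor(syndrome(H, s1), syndrome(H, s2)) == syndrome(H, xor(s1, s2)))  # True
    lo, hi = math.pi / 6, math.pi / 4
    print(mse_check(30.0, 0, 64, 64, 4.0, lo, hi))   # bound about 7.37 <= 16 -> True
    print(mse_check(30.0, 16, 64, 64, 4.0, lo, hi))  # bound about 23.8 > 16 -> False
```

In the usage example the same $\|\mu_2\|$ passes when the decoded Hamming distance is zero and fails when it corresponds to a 45-degree angle, which is the situation Section V hands over to the DVC update.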
ZHANG et al.: VSYNC: BANDWIDTH-EFFICIENT AND DISTORTION-TOLERANT VIDEO FILE SYNCHRONIZATION
Fig. 8. RD performance of synchronization using VSYNC, rsync, and H.264 for videos Akiyo and News.
Fig. 10. RD performance curves for VSYNC, rsync, and H.264 for video Motorcycle.
Fig. 11. Tampering of video Foreman by adding a text banner to the upper-left corner of the entire video.
[17] D. Slepian and J. Wolf, "Noiseless coding of correlated information sources," IEEE Trans. Inform. Theory, vol. 19, no. 4, pp. 471–480, Jul. 1973.
[18] J. Korner and K. Marton, "How to encode the modulo-two sum of binary sources (corresp.)," IEEE Trans. Inform. Theory, vol. 25, no. 2, pp. 219–221, Mar. 1979.
[19] R. Puri, A. Majumdar, and K. Ramchandran, "PRISM: A video coding paradigm with motion estimation at the decoder," IEEE Trans. Image Process., vol. 16, no. 10, pp. 2436–2448, Oct. 2007.
[20] Available: http://www.youtube.com/watch?v=ESVLfrKr Zo&hd=1&feature=hd

… Infocomm Research, Singapore, in 2004, and was an Engineering Intern with Omnivision Technologies, Sunnyvale, CA, in 2005. His current research interests include image and video processing and communications, distributed source coding, computer vision, and machine learning.

Dr. Yeo was a recipient of the Singapore Government Public Service Commission Overseas Merit Scholarship from 1998 to 2002. Since 2004, he has been receiving Singapore's Agency for Science, Technology, and Research Overseas Graduate Scholarship. He received a Best Student Paper Award at SPIE VCIP in 2007.