06213094

128
IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY, VOL. 23, NO. 1, JANUARY 2013
Color Video Denoising Based on Combined Interframe and Intercolor Prediction

Jingjing Dai, Student Member, IEEE, Oscar C. Au, Fellow, IEEE, Chao Pang, Student Member, IEEE, and Feng Zou, Student Member, IEEE
Abstract An advanced color video denoising scheme which we call CIFIC based on combined interframe and intercolor prediction is proposed in this paper. CIFIC performs the denoising ltering in the RGB color space, and exploits both the interframe and intercolor correlation in color video signal directly by forming multiple predictors for each color component using all three color components in the current frame as well as the motion-compensated neighboring reference frames. The temporal correspondence is established through the joint-RGB motion estimation (ME) which acquires a single motion trajectory for the red, green, and blue components. Then the current noisy observation as well as the interframe and intercolor predictors are combined by a linear minimum mean squared error (LMMSE) lter to obtain the denoised estimate for every color component. The ill condition in the weight determination of the LMMSE lter is detected and remedied by gradually removing the least contributing predictor. Furthermore, our previous work on the LMMSE lter applied in the adaptive luminancechrominance space (LAYUV for short) is revisited. By reformulating LAYUV and comparing it with CIFIC, we deduce that LAYUV is a restricted version of CIFIC, and thus CIFIC can theoretically achieve lower denoising error. Experimental results verify the improvement brought by the joint-RGB ME and the integration of the intercolor prediction, as well as the superiority of CIFIC over LAYUV. Meanwhile, when compared with other state-of-the-art algorithms, CIFIC provides competitive performance both in terms of the color peak signal-to-noise ratio and in perceptual quality. Index TermsColor video denoising, intercolor correlation, least squares estimator.
I. Introduction
HE LAST decade has witnessed an overwhelming proliferation of video applications due to the rapid growth of multimedia technology. These video signals are often contaminated by noise during acquisition, storage, and transmission. The presence of noise not only results in an unpleasant visual appearance, but also imposes an adverse effect on the performance of subsequent video processing tasks, such as video compression, analysis, object tracking, and pattern recognition. Therefore, video denoising is a highly desirable and essential
Manuscript received September 29, 2011; revised December 29, 2011; accepted March 19, 2012. Date of publication June 6, 2012; date of current version January 9, 2013. This work was supported in part by the Research Grants Council and the Innovation and Technology Commission of Hong Kong. This paper was recommended by Associate Editor A. Kaup. The authors are with the Department of Electronic and Computer Engineering, Hong Kong University of Science and Technology, Kowloon 10042, Hong Kong (e-mail: jjdai@ust.hk; eeau@ust.hk; pcece@ust.hk; fengzou@ust.hk). Color versions of one or more of the gures in this paper are available online at http://ieeexplore.ieee.org. Digital Object Identier 10.1109/TCSVT.2012.2203203
step in video processing systems. Since the origins of the degradations are numerous and diverse [imperfection of the charge-coupled device (CCD) detectors, electronic instabilities, thermal uctuations, etc.], the overall noise is often modeled as an additive Gaussian white process, independent of the noise-free signal [1]. Abundant efforts have been made on video denoising, and many different approaches have been presented in the literature. These approaches can be roughly classied based on the criterion whether they are implemented in the pixel domain or transform domain. In pixel-domain approaches, the ltering is performed directly on pixel intensities. Typically a 3-D window along the temporal axis or the estimated motion trajectory is constructed and used for the linear or nonlinear ltering of one pixel value [2][7]. The nonlocal technique proposed in [6] produces impressive noise reduction capability, where every noisy pixel is restored by the weighted average of all pixel intensities in a 3-D neighborhood, and the weights are determined by the dissimilarity between the patch centered at the target pixel to be denoised and that centered at the neighboring pixels. Later, a temporal denoising lter called multihypothesis motion-compensated lter (MHMCF) is presented in [7]. For every nonoverlapping block in the current frame, MHMCF rst nds its prediction blocks in reference frames by motion estimation (ME), and then restores the pixels in the block by combining the multiple temporal predictions and the current noisy observation through the linear minimum mean squared error (LMMSE) ltering. Transform-domain denoising techniques come slightly later than pixel-domain ones and become an active research area since the pioneering work in [8]. In transform-domain approaches, some kind of linear transform such as wavelet transform is performed rst to achieve a sparse representation of the video signal, and then the transform coefcients are restored by spatialtemporal ltering [9][12] or statistical estimation [13], [14], before they are eventually inverselytransformed back to the pixel domain. In the famous and effective video denoising algorithm VBM3D [11], a 3-D data array called group is formed by stacking together blocks found similar to the currently processed one, and then the 3-D group is processed in the transform domain by shrinkage and Wiener lter. A recently submitted manuscript [15] from the same authors goes a step further by proposing the VBM4D which stacks similar 3-D spatiotemporal volumes instead of 2-D blocks to form four-dimensional (4-D) data arrays. In
1051-8215/$31.00 c 2012 IEEE
DAI et al.: COLOR VIDEO DENOISING BASED ON COMBINED INTERFRAME AND INTERCOLOR PREDICTION
129
recent years, there has been a growing interest in studying statistical models for natural images and videos. In [14], the spatiotemporal Gaussian scale mixture (STGSM) model is designed to capture the local correlation between the wavelet coefcients of noise-free video sequences across both space and time, and the Bayesian least square estimation is applied to achieve the video denoising. Most existing denoising algorithms solely deal with the grayscale video signal. In contrast, color video denoising has received much less attention in the literature. To the best of our knowledge, only a few denoising algorithms have been specially developed for color video signal [16][20]. In fact, color video production and application are getting extremely popular nowadays, and thus reducing noise in color video sequences is also becoming an essential task in todays signal processing systems. A straightforward way to extend a grayscale denoiser to handling color signal is to apply it independently in every color component. However, this approach is far from satisfactory since it fails to utilize the correlation between different color components. In fact, it has been well known that the RGB signal has strong intercolor correlation between each pair of components, especially in the high frequency [21]. This feature has been widely exploited in image demosaicking and interpolation [21], [22] as well as high-delity image and video compression [23][25]. Therefore, denoising algorithms should be able to take advantage of the intercolor correlation to improve over the single-component approaches, and there are two feasible strategies in the literature. The rst one is to avoid the correlation issue by converting the RGB signal into a more decorrelated color space and apply the single-component denoising lter to every component in the new space separately. The CIELAB color space is utilized in [18] to separate the RGB signal into three different channels L*, a*, and b*, where L* represents lightness, a* represents rednessgreenness, and b* represents yellowness blueness [26]. Then different lter parameters specically tuned to each channel are used to reduce the weak-signal noise in color television reception. The traditional YCbCr space transforms the RGB information into a luminance component Y, two chrominance components Cb and Cr, and has been found to be a good choice in the wavelet based recursive spatiotemporal color lter (WRSTFC) [16], where an adaptive recursive temporal lter with varying lter strength depending on the estimated reliability of the motion vector (MV) is applied individually to each of the luminance and chrominance channels, followed by additional spatial ltering for the luminance component out of the recursive loop. Our previous work on the LMMSE lter in the LAYUV [20] is the extension of the grayscale denoiser MHMCF to color noise reduction. LAYUV examines the primary factors that inuence the denoising error of the LMMSE lter and derives an adaptive optimal linear luminancechrominance space such that when the pixel samples are converted from the RGB space to that new space, MHMCF can be applied to the individual color components to achieve the minimum overall denoising error. The second one is to introduce coupling between RGB components by treating the multichannel information as a vector and devise a nonseparable denoising formula. [17] proposes
a vector extension of the popular Hidden Markov Tree model [27], [28] to exibly exploit the color dependency of wavelet coefcients. The nonlocal means lter is extended to color signal denoising in [19] and [29] through treating every pixel as a vector consisting of intensities in RGB channels, and enforcing the intercolor correlation by improving the weight calculation using the resemblance of color patches. In this paper, we propose a novel denoising scheme called CIFIC (standing for combined interframe and intercolor prediction) which belongs to neither of the above two strategies for noise reduction in the 4:4:4 color video signal. CIFIC takes advantage of the intercolor correlation directly in the RGB space by introducing the intercolor prediction, that is, forming multiple predictors for each color component using pixel intensities of the current frame and motion-compensated neighboring frames both in this color component and the other two color components. As such, this paper can be considered as an improvement and extension of MHMCF. While MHMCF only makes use of the interframe prediction, CIFIC, specially designed for color video signal, utilizes combined interframe and intercolor prediction in the LMMSE ltering to enhance the denoising performance. Note that the LMMSE formulation in CIFIC can be written in the vector form like the second strategy. However, different from the second strategy, the LMMSE ltering of the red, green, and blue components can be optimized separately in CIFIC. Two additional features of CIFIC include a joint-RGB ME to generate a single motion eld for the three color components simultaneously, as well as the detection and remedy of the ill condition in the LMMSE weight determination. Another important contribution of this paper is that we reformulate our previous work LAYUV and deduce that LAYUV is a restricted version of CIFIC, and thus CIFIC can theoretically achieve lower denoising error than LAYUV. The remainder of this paper is organized as follows. Section II analyzes the interframe as well as the intercolor correlation in nature color video sequences, which serve as the motivation of this paper. The proposed CIFIC is developed and presented in details in Section III. Whereafter, LAYUV is reformulated and compared with CIFIC in Section IV. Section V is dedicated to intensive quantitative and qualitative performance evaluation of CIFIC and comparison with other state-of-the-art techniques. Finally Section VI concludes this paper. II. Interframe and Intercolor Correlation in Color Video Sequences The inherent interframe correlation has been exploited rather effectively in existing video denoising algorithms, and when the color signal is concerned, the strong intercolor correlation is also essential characteristic that can help improve the denoising process. Therefore, both the interframe and intercolor correlation need to be utilized for potential denoising gains. In this section, we decompose the interframe and intercolor correlation into three kinds of correlation as depicted in Fig. 1 and make a comparison between them. The rst one is the pure interframe correlation which is the correlation between the same color component of the current
130
TABLE I Interframe and Intercolor Correlation Coefficients Sequence Bus Chair Renata Salesman Tennis Pure interframe correlation coefcients R(k)&R(k-1) 0.88 0.78 0.84 0.89 0.83 G(k)&G(k-1) 0.88 0.81 0.86 0.89 0.83 B(k)&B(k-1) 0.86 0.81 0.85 0.86 0.84 Pure intercolor correlation coefcients R(k)&G(k) 0.96 0.56 0.92 0.95 0.83 R(k)&B(k) 0.93 0.62 0.91 0.88 0.83 G(k)&B(k) 0.95 0.64 0.93 0.92 0.89 Interframe intercolor correlation coefcients R(k)&G(k-1) 0.86 0.57 0.81 0.86 0.72 R(k)&B(k-1) 0.83 0.58 0.79 0.80 0.72 G(k)&R(k-1) 0.86 0.57 0.80 0.86 0.72 G(k)&B(k-1) 0.84 0.61 0.82 0.84 0.77 B(k)&R(k-1) 0.83 0.58 0.79 0.80 0.72 B(k)&G(k-1) 0.85 0.60 0.82 0.84 0.77 TABLE II MSE of Interframe and Intercolor Prediction Sequence R(k)&R(k-1) G(k)&G(k-1) B(k)&B(k-1) R(k)&G(k) R(k)&B(k) G(k)&B(k) R(k)&G(k-1) R(k)&B(k-1) G(k)&R(k-1) G(k)&B(k-1) B(k)&R(k-1) B(k)&G(k-1) Chair Renata Salesman Pure interframe prediction 155.0 18.9 95.2 30.2 147.8 15.6 99.8 28.3 153.5 20.5 99.7 34.4 Pure intercolor prediction 43.3 17.5 82.3 24.2 87.6 29.4 111.8 64.4 55.6 19.7 77.0 28.6 Interframe-intercolor prediction 187.7 27.3 168.1 49.7 229.5 40.7 199.9 90.2 187.0 27.5 167.8 50.1 198.8 30.7 169.5 53.9 223.8 41.3 195.8 90.4 193.9 31.0 162.8 53.8 Bus Tennis 99.4 87.3 102.8 239.4 290.1 133.8 295.2 351.6 300.0 211.0 353.8 208.3 Ave. 79.7 75.8 82.2 81.3 116.7 62.9 145.6 182.4 146.5 132.8 181.0 130.0 Ave. 0.84 0.85 0.84 0.84 0.83 0.87 0.76 0.74 0.76 0.78 0.74 0.78
Fig. 1.
Interframe and intercolor correlation.
pixel and its motion-compensated pixel in the neighboring reference frame. The second one is the pure intercolor correlation which is the correlation between any pair of color components at the same spatial position and in the same frame. The last one is the interframeintercolor correlation which is the correlation between one color component of the current pixel and a different color component of its motioncompensated pixel in the reference frame. We analyze these three types of correlation by two measures: the correlation coefcient and the mean squared error (MSE) associated with the interframe and intercolor prediction. Experiments are conducted on ve color video sequences Bus, Chair, Renata, Salesman, and Tennis used in [16].1 When interpreting the interframe correlation, the previous frame is served as the reference frame, and the motion trajectory is specied by the MV for every 8 8 block obtained through the joint-RGB ME with equal weights for RGB components (details of the joint-RGB ME will be disclosed in Section III). The correlation coefcients under investigation are calculated block by block. The average values of the block-wise correlation coefcients over all blocks and all frames for every sequence are reported in Table I. As observed, the pure intercolor correlation is comparable to the pure interframe correlation, which unveils that the intercolor correlation can be as strong as the interframe correlation and thus both should be made use of in a good color video denoiser. The sequence Chair is the only exception where the intercolor correlation coefcients are relatively small. This is due to the fact that a large portion of this sequence is occupied by comparatively smooth regions, where the intercolor correlation tends to be lower [21]. The interframeintercolor correlation coefcients are also signicant, though slightly smaller in magnitude than the pure interframe and intercolor correlation coefcients, and thus can also be utilized in the color video denoising. The intercolor correlation can be exploited by intercolor weighted prediction [23][25], namely using a linear function of the intensity in one color component to approximate that in a different color component PC2 = F (C1 ) = wC1 + o (1)
the additive offset o. Taking into account the high similarity between the spatial gradients of the RGB components [24], [25], (1) can be simplied by setting w = 1, that is PC2 = C1 + o. (2)
By minimizing the MSE E[(PC2 C2 )2 ], the offset o that best approximates C2 is 1 2 C o=C (3)
2 denote the mean of C1 and C2 respectively. 1 and C where C Replacing o in (2) by (3), the intercolor prediction turns into 2 C 1. PC2 = C1 + C (4)
where C1 and C2 denote the intensities in two components, PC2 denotes the prediction for C2 , and F () represents the weighted prediction function specied by the weighting factor w and
1 Available
at http://telin.ugent.be/ vzlokoli/PHD/Color/
To test the prediction efciency of the intercolor prediction as dened in (4), we calculate the MSE values associated with the pure intercolor prediction as well as the interframe intercolor prediction, and compare them with those associated with the pure interframe prediction. Again, MSE is rst computed for every 8 8 block, with the mean in (4) for all pixels in one block estimated by the sample mean of that block. The averaged MSE values are listed in Table II, which
131
further indicates that the pure intercolor prediction and the interframeintercolor prediction can be effective alternatives to the interframe prediction. Even for the sequence Chair where the intercolor correlation coefcients are relatively low, the two kinds of intercolor prediction can still work as well as the interframe prediction. This can be explained by the reason that the low correlation coefcients are mainly due to smooth regions where the RGB components all remain nearly constant in a local neighborhood, and thus the three still resemble each other to a great extent. In the LMMSE ltering that will be presented in the next section, the intercolor prediction will be integrated to enhance the color denoising efciency. III. Proposed Color Video Denoising Based on Combined Interframe and Intercolor Prediction (CIFIC) A. Overall Framework of CIFIC The basic idea of the proposed CIFIC is an LMMSE lter based on combined interframe and intercolor prediction which takes advantage of both the inherent interframe and intercolor correlation in color video sequences. The general framework is developed based on MHMCF [7] and illustrated in Fig. 2. The most essential difference between CIFIC and MHMCF is that, when denoising one color component, the other two color components in the current and motion-compensated adjacent frames are also involved in the LMMSE ltering in the form of the intercolor prediction introduced in the previous section. Other improvements of CIFIC over MHMCF common to both grayscale and color video denoising are: 1) while MHMCF employs nonoverlapping-block-based processing, CIFIC employs overlapping-block-based processing to suppress the blocking artifacts; and 2) while MHMCF employs integer-pixel ME, CIFIC employs subpixel ME to enhance the temporal prediction efciency. Noisy video sequences are restored frame by frame in the display order. To denoise the current noisy frame, multiple frames including previous denoised frames as well as future noisy frames are used in CIFIC as reference frames. The current frame is processed block-by-block with xed block size of Kx Ky , and the distance between adjacent horizontal blocks and adjacent vertical blocks are Dx and Dy , respectively. With block-based processing, it is likely to introduce blocking artifacts. To suppress these, Dx is restricted to being strictly smaller than Kx and Dy strictly smaller than Ky such that the neighboring blocks are overlapping with each other. CIFIC mainly consists of two steps to denoise every block: ME followed by motion-compensated LMMSE ltering. For the current color block, CIFIC applies joint-RGB ME with respect to each reference frame to generate a single motion trajectory for RGB components simultaneously. Quarterpixel ME and motion compensation (MC) are used for accurate temporal prediction. Using all three color components in the current block and the motion-compensated blocks, LMMSE ltering is applied to estimate every noise-free color component. As all color components in multiple frames are used as the input of the LMMSE lter, CIFIC takes advantage of both the interframe and intercolor correlation directly. In order
to obtain the optimal weights in LMMSE, some statistical parameters need estimation and the remedial renement is adopted to deal with the ill condition. After every block in the current frame has been restored, all those block-wise estimates comprise an overcomplete representation of the current frame, that is, there are multiple estimates for some, if not all, pixels. To obtain the nal estimate of the whole frame, for every pixel, all the associated blockwise estimates are aggregated by weighted averaging, where the bilinear weights similar to those in H.263 overlapped block motion compensation (OBMC) are used [30]. The details of the block-wise processing will be described in the following sections. As most papers on color image and video denoising, in this paper we assume that the RGB video signal is corrupted with additive Gaussian noise which is white and stationary over the spatial and temporal domains, and independent of the video signal. The noise components in RGB channels are assumed to be mutually independent, with 2 2 2 , nG , and nB . the respective variance denoted with nR B. Joint-RGB ME and MC Before performing the LMMSE lter, we need to establish the temporal correspondence between the current frame and the reference frames, which is achieved by local ME based on block-matching. For each color component of the current block, the best matched blocks of the same and different colors in the reference frames are required. One way to obtain these is to perform individual ME between each color component pair. However, this requires tremendous computational complexity (approximately nine times the complexity of single-component ME) which is highly undesirable. In addition, traditional ME cost functions such as sum of absolute difference (SAD) and sum of squared difference (SSD) are not appropriate for crosscolor ME (e.g., green with respect to red). On the other hand, the RGB components of any pixel belong to the same object and certainly should undergo the same motion. Thus it is not necessary to perform ME separately. In CIFIC, we develop the joint-RGB ME which performs searching within the predened search region once and produces a common MV for all three color components. In most video processing and compression applications, ME is applied only in the luminance plane as the luminance contains most of the information concerning spatiotemporal video sequence structures [16]. Note that the luminance component is the additive combination of RGB components. Motivated by this, we dene a color block mismatch measure SSDc as SSDc (mv) =
xB
[R (Rc ur(x) Rr ef(x + mv))
+ G (Gc ur(x) Gr ef(x + mv)) + B (Bc ur(x) Br ef(x + mv))]2 (5) where x denotes pixels in the current Kx Ky block B , cur and ref indicate the current and reference frames respectively, mv is the candidate MV, and R , G , and B are nonnegative weights for RGB components such that R + G + B = 1. This is effectively projecting the color vector in RGB space into a luminance-like direction (R , G , B ) and computing the SSD
132
Fig. 2.
Framework of CIFIC.
in that direction. As for the issue of computation complexity, the computation burden of the matching operation in (5) is three times the computation complexity of single-component ME, which is much lower than that of the individual ME mentioned above. Moreover, the joint-RGB ME can be further simplied by rst projecting the current frame and reference frame into the direction (R , G , B ), and then performing searching in the projected color planes. In this case, the complexity is approximately the same as that of single-component ME. The next problem is how to choose the values of R , G , and B . To address this, we need to take into consideration how the noise-free signal and the noise affect the accuracy of ME, respectively. In the view of the noise-free signal, image features such as textures and edges are very important to high-accuracy ME, and thus it is natural to give more weights to the color components with stronger image structures, which usually correspond to the ones with larger high-frequency energy. According to [31], the high-frequency bands of RGB components are not only highly correlated but also approximately equal to each other, thus no preference needs to be given to any particular color component. In other words, there is no restriction on the values of R , G , and B as long as they lie between 0 and 1. On the other hand,
the presence of noise tends to impose an adverse impact on ME, and it is obvious that the color component with higher noise level should be given a smaller weight. Therefore, our 2 , the goal of choosing R , G , and B is to minimize n variance of the overall noise in R (Rc ur(x) Rr ef(x + mv)) + G (Gc ur(x) Gr ef(x+mv))+ B (Bc ur(x) Br ef(x+mv)). When the reference frame is one of the denoised previous frames, we assume that the remaining noise in the reference frame 2 2 2 2 2 2 is negligible, such that n = 2 R nR + G nG + B nB since noise in different components is assumed to be independent. When the reference frame is one of the noisy future frames, 2 2 2 2 2 2 2 n can be written as 2 R nR + G nG + B nB , where nR 2 2 (or nG or nB ) is the sum of the noise variance in the red (or green or blue) component of the current frame and that of the reference frame since noise in different frames is assumed to be independent. Further, the assumption that the noise is 2 2 2 2 2 2 temporally stationary yields n = 2(2 R nR + G nG + B nB ). Therefore the optimization can be written as
R ,G ,B
min s.t.
2 2 2 2 2 2 R nR + G nG + B nB
R + G + B = 1, R 0, G 0, B 0
regardless of whether the reference frame is denoised or not.
133
By Lagrange multipliers, the optimal weights are R = G = B =

2 nR 2 2 2 nR + nG + nB 2 nG 2 2 2 nR + nG + nB 2 nB . 2 2 2 nR + nG + nB
(6) (7) (8)
It is trivial to show that the efciency of the motioncompensated prediction directly inuences the performance of the LMMSE lter, and better prediction usually leads to higher noise reduction capability. Besides, it has been known for a long while that ME with fractional-pixel accuracy can enhance the temporal prediction, and has been adopted in modern video coding standards [32]. Inspired by this, we apply ME with quarter-pixel accuracy instead of integer-pixel accuracy for improved prediction. The subpixel interpolation lter proposed in [33] is employed due to its superior performance and relatively low implementation complexity. C. LMMSE Filtering In this section, we will formulate the color noise removal by the LMMSE ltering. Here are some commonly used notations: 1) yR , yG , and yB : noise-free red, green, and blue components of a color pixel in current Kx Ky block; 2) zR , zG , and zB : noisy observation of yR , yG , and yB , i.e., [zR , zG , zB ] = [yR , yG , yB ] + [nR , nG , nB ] where nR , nG , and nB are the noise components; 3) pmR , pmG , and pmB : red, green, and blue components of the motion-compensated color pixel in the mth reference frame; 4) y R, y G , and y B : denoised estimate of the red, green, and blue components of the current color pixel. Assuming M color reference frames are available, the denoised estimate of every color component of one pixel is constructed as a linear combination of (3M + 3) observations consisting of zc and pmc with m = 1, 2, ..., M and c = R, G, B ) + y R y R = wT R (z z ) + y y G = wT G G (z z ) + y y B = wT B B (z z (9) (10) (11)
Note that the ltering of RGB components can be performed and optimized individually as shown in (9), (10), and (11). In the following discussion, we will take the red component as an example for illustration and the operation in the green and blue components can be conducted in the same manner. and y R To compute y R with (9), we need to estimate z as well as nd the optimal wR . Here we assume the spatial stationarity within one Kx Ky block, and all pixels in one block share the same statistical parameters and the same by the sample mean of z weighting vector wR . We estimate z in that block. As zR = yR + nR and the additive noise has zero mean, we approximate y R by z R . Furthermore, we optimize wR through minimizing the denoising error which can be written as ))2 ] E[(yR y R )2 ] = E[(yR y R wT R (z z 2 T = E[(wT R r) ] = wR Cov(r)wR (13)
) is the (3M + 3)-tuple vector of where r = (yR y R ) 1 (z z prediction errors (or residues), and Cov(r) is the covariance matrix of r. The r can be decomposed as the concatenation T T T of (M + 1) 3-tuple vectors: r = [rT 0 , r1 , ..., rM ] where rm = T [rmR , rmG , rmB ] denotes the residue vector corresponding to R (z R the mth frame for m = 0, 1, 2, ..., M . Here r0R = yR y z R ) = nR is simply the noise component in the red channel; r0G = yR y R (zG z G ) and r0B = yR y R (zB z B ) are the pure intercolor residues when predicting yR with zG and zB respectively; for m = 1, 2, ..., M , rmR = yR y R (pmR p mR ) is the pure interframe residue when predicting yR with pmR ; rmG = yR y R (pmG p mG ) and rmB = yR y R (pmB p mB ) are the interframe-intercolor residues when predicting yR with pmG and pmB , respectively. The optimization of wR becomes min
wR
wT R Cov(r)wR wT R 1 = 1. Cov1 (r)1 1T Cov1 (r)1
s.t.
The optimal wR can be solved using Lagrange multipliers as wR = (14)
where Cov1 (r) denotes the inverse of Cov(r). D. Parameter Estimation and Remedial Renement To obtain the optimal wR from (14), we need to estimate Cov(r). As in [7], we assume that residues corresponding to different reference frames are mutually independent. Then Cov(r) can be written as Cov(r0 ) 0 0 ... 0 0 Cov(r1 ) 0 . . . 0 (15) . .. 0 ... . . . . . . Cov(rM ) where Cov(rm ) is the 3 3 covariance matrix of rm . R ) 1 (z After ME, the noisy residue vector r = (zR z T T ) = [r0T , r1T , ..., rM ] can be obtained for all pixels in the z current Kx Ky block, and the following equation satises ) = r + nR 1. Note that r = (yR + nR y R ) 1 (z z
where z = [zR , zG , zB , p1R , p1G , p1B , ..., pMR , pMG , pMB ]T = [ zR , zG , zB , is the (3M + 3)-tuple observation vector, z p1R , p1G , p1B , ..., pMR , pMG , pMB ]T is the mean of z, y R, y G, and y B are the mean of yR , yG , and yB , respectively. The wR , wG , and wB are the (3M + 3)-tuple weighting vectors for the ltering of the red, green, and blue components such that
T T wT R 1 = 1, wG 1 = 1, wB 1 = 1
(12)
) where 1 is a (3M + 3)-tuple vector with all 1s. As (z z contains interframe predictors as well as intercolor predictors, CIFIC exploits both the interframe and intercolor correlation directly.
134
Algorithm 1 Remedial Renement

1: 2: 3: 4: 5: 6: 7: 8: 9: 10: 11: 12: 13:
for m = 1 M do 1 if 1T 3 Cov (rm )13 < 0 then 1 rg Compute and compare 1T 2 Cov (rm )12 , T T 1 rb 1 gb 12 Cov (rm )12 , and 12 Cov (rm )12 1 rg if 1T 2 Cov (rm )12 is the largest then Remove pmb from z in (9) 1 rb else if 1T 2 Cov (rm )12 is the largest then Remove pmg from z in (9) else Remove pmr from z in (9) end if end if end for Derive the LMMSE solution of the reduced problem
Cov(rm ) is not positive semidenite, and the ill condition may occur. For safety, something needs to be done to the three predictors from the mth reference frame. Then we apply the remedial renement by identifying the least contributing (or worst) predictor and changing the LMMSE problem by removing that predictor from the predictor list z in (9). This is a remedy because we prove in Appendix A that the estimation of a 22 covariance matrix (with clipping applied if necessary) will always be positive semidenite. So the solution to the reduced problem will not have the ill condition from the mth reference frame. The above remedy is repeated for all 1 the m with 1T 3 Cov (rm )13 < 0. Note that there is no need to adjust for m = 0 since Cov1 (r0 ) is a block diagonal matrix and is certainly positive denite. By replacing wR in (13) by (14), it can be deduced that the denoising error of the LMMSE lter is E[(yR y R )2 ] =
T 1 1 (1T 3 Cov (r0 )13 + 13 Cov (r1 )13 1 1 + + 1T 3 Cov (rM )13 )
the residue vector r and r are both zero mean, and with our assumption of independent noise, it can be deduced that for every m {1, 2, ..., M }
2 Cov(rm ) = Cov(rm ) + nR J
(18)
(16)
where 13 denotes the 3-tuple vector of 1s and J is a 3 3 matrix with all elements being 1. Thus, we estimate Cov(rm ) by computing Cov(rm ) as the sample covariance matrix of rm using all noisy residues in the current block and computing Cov(rm ) as
2 Cov(rm ) = Cov(rm ) nR J.
(17)
Consider now the case of m = 0. Note that r0R is equal to the negative of noise in the red component, therefore the variance 2 of r0R is simply nR , and the covariance between r0R and r0G as well as the covariance between r0R and r0B are both 0. With the variance and covariance terms related to r0R determined, the variances of r0G and r0B as well as the covariance between the two can be estimated in a way similar to (16) and (17). The denominator in the right side of (14) can be T 1 1 decomposed into 1T 3 Cov (r0 )13 + 13 Cov (r1 )13 + + T 1 13 Cov (rM )13 . The covariance matrix is by denition positive semidenite, and so is its inverse. Every Cov(rm ) satises 1 1T 3 Cov (rm )13 0. In particular, for m = 0, it can be proven T that 13 Cov1 (r0 )13 > 0 (see the Appendix B for the details of the proof). Therefore, the denominator in (14) cannot be zero, and wR always exists. However, we observe in our experiments that the estimation of Cov(rm ) using (16) and (17) does not guarantee the positive 1 semideniteness of Cov(rm ). And negative 1T 3 Cov (rm )13 terms can sometimes lead to the ill condition that the denominator in (14) approaches zero resulting in erroneous weights. The erroneous weights can cause the denoised estimate to have extreme values, which may even be out of the valid intensity range of [0, 255] and can lead to perceptually disturbing local color distortion in the denoised images as marked by red circles in the center images in Fig. 3. To remedy this issue, we detect the ill condition by 1 checking for 1T 3 Cov (rm )13 < 0. In particular, for any 1 m {1, 2, . . . , M }, if 1T 3 Cov (rm )13 < 0, it indicates that
1 which indicates that a larger 1T 3 Cov (rm )13 leads to a smaller denoising error. Here we denote the 2 2 covariance matrix of [rmR , rmG ]T , [rmR , rmB ]T , and [rmG , rmB ]T by Cov(rrg m ), gb ), and Cov ( r ) respectively. The three are the residue Cov(rrb m m covariance matrix associated with the two remaining predictors after the blue, green, or red predictor is removed. The worst predictor in the mth reference frame is dened as the one that, when removed, gives smallest denoising error. We calculate T T 1 rg 1 rb 1 gb 1T 2 Cov (rm )12 , 12 Cov (rm )12 , and 12 Cov (rm )12 , where 12 denotes the 2-tuple vector of all 1s, and identify the largest 1 rg one. Suppose the redgreen covariance 1T 2 Cov (rm )12 is the largest, then the worst predictor to be removed is the blue component pmb . The proposed remedial renement is summarized step by step in Algorithm 1. In our experiments, we nd the method to be able to eliminate the possibilities of the ill condition, and there are no noticeable color artifacts in the denoised color images as depicted in the right images in Fig. 3. An alternative way to handle the ill condition is to force Cov(rm ) to be positive semidenite through replacing any negative eigenvalue of Cov(rm ) by a small positive value [34]. In our experiments, we nd that this does not give better results [about 0.1 dB lower color peak signal-to-noise ratio (cPSNR)] than the problem reduction approach (Algorithm 1). In addition, the eigen-decomposition is of higher computational complexity than Algorithm 1.
IV. Comparison Between LAYUV and CIFIC As stated in the introduction part, a number of color signal denoising algorithms perform denoising in some decorrelated color space (such as YCbCr) instead of the correlated RGB space since most of the redundancy among the red, green, and blue components have been removed in the new space. This approach is also employed in our previous work LAYUV [20], in which the monochromatic MHMCF denoiser is extended to color denoising by applying the LMMSE lter separately to the three new components after the RGB samples
135
Fig. 3. Frame 58 of the sequence Bus. (a) Original clean image, (b) denoised image without remedial renement (noticeable color artifacts), (c) denoised image after applying remedial renement (no color artifacts). Selected portions of images in the rst row are enlarged and displayed in the second row. The noise level is nR = nG = nB = 25.
are converted to some luminancechrominance space. The luminancechrominance space is adaptively derived such that the overall mean squared denoising error in the red, green, and blue components is minimized. In the proposed CIFIC, instead of applying decorrelation through color space conversion, the intercolor redundancy is exploited directly in the RGB space through the integration of the intercolor prediction. As CIFIC and LAYUV are both LMMSE-based approaches, we perform an analytic comparison between these two in this section. In the following, we will reformulate LAYUV into a problem which is a restricted version of the CIFIC problem, and claim that CIFIC being the optimal solution of the bigger problem is superior to LAYUV. Here we only consider the LMMSE part in CIFIC and LAYUV, and temporarily ignore other implementation issues. In other words, it is assumed that the same MVs are used for both CIFIC and LAYUV such that the temporal prediction blocks found are at the same location. Recall that the proposed CIFIC nds the linear predictor that minimizes the denoising error in the RGB color space, or to be precise min E[( yR yR )2 ]+ E[( yG yG )2 ]+ E[( yB yB )2 ] (19) y R y G y B zR zR p1R p1R rgb rgb zG zG + W1,c p1G p1G W0,c zB zB p1B p 1B pMR pMR yR rgb + + WM,c pMG pMG + yG , (20) pMB pMB yB
rgb Wm,c R33 , m
rgb is the where R33 is the set of all 3 3 matrices, Wm,c th weighting matrix for the m frame, and the optimization is rgb over the set of Wm,c , m = 0, 1, ..., M . The constraint (22) is there because the weighting coefcients for the ltering of each RGB component add up to 1. In other words, (22) is equivalent to (12). Similarly, LAYUV nds the linear predictor and the linear color space that minimize the denoising error in the RGB color space but the linear predictor is formed in the luminance chrominance space, or to be precise
W0,l ,...,WM,l ,C
yuv
min yuv
E[( yR yR )2 ]+E[( yG yG )2 ]+E[( yB yB )2 ] (23) y Y y U y V
s.t.
W0,c ,...,WM,c
rgb
rgb
zY zY p1Y p1Y yuv yuv = W0,l zU zU + W1,l p1U p1U z V zV p1V p1V pMY pMY yY yuv + + WM,l pMU pMU + yU , pMV pMV yV (24) W0,l , ..., WM,l R33 W0,l + W1,l + ...WM,l = I33 [ yY , y U, y V ]T = C[ yR , y G, y B ]T C R33 C[1, 1, 1]T = [1, 0, 0]T
yuv yuv yuv yuv yuv diag
s.t.
(25) (26) (27) (28) (29)
(21) (22)
(W0,c + W1,c + + WM,c )13 = 13 ,
rgb
rgb
rgb
136
where C is the conversion matrix from RGB space to some luminancechrominance space with Y, U, and V denoting the corresponding luminance and chrominance comyuv ponents, Wm,l is the weighting matrix in the luminance diag chrominance space, R33 is the set of 3 3 diagonal matrices, and I33 is the 3 3 identity matrix. Constraint (25) is there because the LMMSE lter is applied separately to each component in the luminancechrominance space. Constraint (26) implies that the sum of weighting coefcients in the ltering of every component is equal to 1. Constraint (29) implies that the weights for red, green, and blue in the color space conversion matrix C should add up to 1 for the luminance component and zero for chrominance components [20], as in the case of the commonly used YCbCr conversion matrix. Combining (24) and (27), it can be deduced that, for LAYUV y R zR zR yuv y G = C1 W0,l C zG zG y B zB zB p1R p1R yuv +C1 W1,l C p1G p1G + p1B p1B yR pMR pMR yuv +C1 WM,l C pMG pMG + yG (30) pMB pMB yB which can be rewritten y R rgb y G = W0,l y B as zR zR rgb zG zG + W1,l z B zB pMR pMR rgb + + WM,l pMG pMG pMB pMB
yuv
C R33 satisfying C[1, 1, 1]T = [1, 0, 0]T , and Wm,l R33 satisfying yuv yuv yuv W0,l + W1,l + ...WM,l = I33 such that Wm,l = C1 Wm,l C.
rgb yuv yuv diag
(34)
As such, the solution of LAYUV satises (W0,l + W1,l + ...WM,l )13 = = C1 (W0,l + W1,l + ...WM,l )C13
yuv yuv yuv rgb rgb rgb
C1 I33 C13 = I33 13 = 13
(35)
which is the constraint (22) of CIFIC. In other words, any feasible solution of LAYUV is a feasible solution of CIFIC. On the other hand, by inspection, a feasible solution of CIFIC as dened in the CIFIC constraint (22) does not necessarily satisfy the LAYUV constraint (34) and thus is not necessarily a feasible solution of LAYUV. From this reformulation, it can be seen that LAYUV minimizes exactly the same objective function as CIFIC, except that its optimization is over a more restricted (smaller) feasible set than CIFIC. Therefore CIFIC is better than LAYUV in the sense that the denoising error achieved by CIFIC is smaller than or at least equal to that achieved by LAYUV. In the next section, the superiority of CIFIC is further veried by experiments.
p1R p1R p1G p1G p1B p 1B yR + yG , yB (31)
V. Experimental Results For performance evaluation of CIFIC, experiments are conducted on the ve color video sequences Bus, Chair, Renata, Salesman, and Tennis which have been used as test sequences in [16], and one additional sequence Football with fast and complex motions. In order to simulate the noise, articial white Gaussian noise is generated and added to clean video sequences. The subjective quality as well as the objective metric cPSNR are employed as the denoising performance measures, with cPSNR dened as [16] ) y c (x))2 (36) where x denotes the spatial coordinate of pixels and K is the total number of pixels in one image. In our experiments, we choose the block size Kx Ky to be 8 8 with Dx = Dy = 4. These choices provide a good trade-off between the ME accuracy and robustness toward noise. First, we perform a study on the weight choices in (5) in the joint-RGB ME. The optimal weights as dened in (6), (7), and (8) are compared with the following ve sets of weights: (R , G , B ) = (1, 0, 0) for R-only, (0, 1, 0) for G-only, (0, 0, 1) for B-only, ( 1 , 1 , 1 ) for average-RGB, 3 3 3 (0.299, 0.587, 0.114) for Y (as in traditional YCbCr color space). Fig. 4 plots the frame-by-frame cPSNR of denoised sequences using different sets of weights in the joint-RGB ME in CIFIC. Fig. 4(a) is the case of the uniform noise, i.e. identical noise power in each RGB channel and Fig. 4(b) is the case of the nonuniform noise, i.e., different noise power (3K)1
c=R,G,B x (yc (x)
with Wm,l = C1 Wm,l C. Thus LAYUV in essence is also a linear estimator in the RGB space, since (31) has the same form as (20). Note that the optimization in (23) is over C and yuv the set of Wm,l , m {0, 1, ..., M }. This is equivalent to the rgb optimization over Wm,l , m {0, 1, ..., M } under the constraint rgb rgb yuv that Wm,l can be decomposed as Wm,l = C1 Wm,l C for some yuv C that satises (28) and (29) and some Wm,l that satises (25) and (26). Then the LAYUV optimization problem can be written equivalently as
rgb
rgb
cPSNR = 10 log10 (
2552
min
W0,l ,...,WM,l
rgb
s.t.
E[( yR yR )2 ]+ E[( yG yG )2 ]+ E[( yB yB )2 ] (32) y R y G y B zR zR p1R p1R rgb rgb = W0,l zG zG + W1,l p1G p1G zB zB p1B p1B pMR pMR yR rgb + + WM,l pMG pMG + yG , pMB pMB yB (33)
137
Fig. 4. Frame-by-frame cPSNR of denoised sequence Tennis using different weights in joint-RGB ME under various noise levels. (a) nR = nG = nB = 25; (b) nR = 30, nG = 20, nB = 10.
in each RGB channel. In the uniform noise case, the optimal weights derived from (6), (7), and (8) degenerate into those of average-RGB and thus both have the same curve. As observed from Fig. 4(a), the performance of the single-component (Ronly, G-only, and B-only) is very similar to each other. This is probably because, when the noise effects on three color components are identical, the highly similar high-frequency contents of red, green, and blue components tend to give similar ME accuracy and thus similar cPSNR. The cPSNR of Y is signicantly better than that of the single-component, since the effective noise variance in Y is signicantly lower than that in any single-component. As expected, the adaptive optimal weights which minimize the overall noise level in ME, give the best cPSNR, being better than Y. In Fig. 4(b), the denoising performances of the three single-component MEs vary a lot. We observe that R-only has high cPSNR due to the relatively small nR . On the contrary, the cPSNR of Bonly is low due to the relatively large nB . Both average-RGB and Y have reasonably good cPSNR, but still worse than the adaptive optimal weights, which remain the best as expected. These verify that the proposed optimal weights are capable of assisting CIFIC to achieve a good denoising performance. To demonstrate the improvements brought about by the joint-RGB ME as well as the intercolor prediction, simulation is conducted to compare CIFIC with de-generated versions of the proposed denoising framework: LMMSE applied independently in the red, green, and blue components with single-component ME and with joint-RGB ME, denoted for short by LRGB and LRGBjme respectively. In both LRGB and LRGBjme , only pure interframe prediction is utilized. In other words, the wR and z are simplied to be (M + 1)-tuple vectors in (9), using only predictors from the same color in multiple reference frames. In LRGBjme , the proposed jointRGB ME with optimal weights as dened in (6), (7), and (8) is applied to generate a single motion eld for all three color components, while in LRGB, single-component ME is applied to each component to nd its corresponding temporal predictions. To verify the advantages of the proposed improvement common to grayscale and color video denoising, including the overlapping-block-based processing as well as the quarterpixel ME, MHMCF applied independently in the red, green,
and blue channels is also included in the comparison. CIFIC is also compared with our previous work LAYUV. Note that LAYUV in [20] is implemented based on the framework of MHMCF, which is further improved in this paper. To be fair, LAYUV is re-implemented on the basis of the improved framework, and it is made sure that CIFIC and LAYUV apply the identical ME process. In addition, we compare CIFIC with several other state-ofthe-art color video denoising algorithms: WRSTFC [16], nonlocal means for color signal (NLMC for short) [19], VBM3D [11], and STGSM [14]. WRSTFC and NLMC are specially designed for color noise reduction and reported to achieve advanced performance among the wavelet-domain denoisers and the pixel-domain denoisers respectively. So they constitute good references for evaluating our algorithm. To the best of our knowledge, VBM3D [11] is among the best grayscale video denoisers published so far, and STGSM [14] is one of the newest grayscale video denoisers. To handle the color video signal, we apply VBM3D and STGSM independently in each of the red, green, and blue components. In the comparison, we also consider two scenarios of color noise: uniform noise and nonuniform noise. For the uniform noise, we simulate two standard deviation values: 15 and 25. For the nonuniform noise, we also consider two situations with noise standard deviation (nR , nG , nB ) being (10, 20, 30) and (30, 20, 10). The cPSNR values averaged over all frames in every sequence are displayed in Table III and IV. In the implementation of MHMCF, LRGB, LRGBjme , and LAYUV, the reference frame buffer consists of two denoised previous frames as in [20]. For CIFIC, two reference frame choices are considered: one with two denoised previous frames (denoted as CIFIC2ref ), and the other one with an additional noisy future frame (denoted as CIFIC3ref ). The denoising results of WRSTFC are taken from [16], where only the uniform noise is considered and tested. The results of STGSM and VBM3D are obtained by running the corresponding MATLAB codes2 with default parameter settings, where eight neighboring frames are used as reference frames. For NLMC, we implement it
2 STGSM is available at http://www.ece.uwaterloo.ca/ z70wang/research/ stgsm/ and VBM3D at http://www.cs.tut./ foi/GCF-BM3D/
138
TABLE III cPSNR Comparisons of Various Color Video Denoising Algorithms (Uniform Noise) nR = nG = nB = 15 LAYUV NLMC WRSTFC 32.99 30.64 30.93 35.73 34.03 34.44 31.02 29.77 NA 32.90 31.04 31.19 34.54 32.12 34.59 32.49 30.85 31.38 33.28 31.41 32.22 nR = nG = nB = 25 LAYUV NLMC WRSTFC 29.98 27.82 28.25 33.18 31.08 31.80 28.31 27.32 NA 30.24 27.94 28.96 31.49 29.03 32.04 29.41 28.36 28.80 30.44 28.59 29.97
Bus Chair Football Renata Salesman Tennis Average
MHMCF 28.73 32.26 27.77 29.37 31.16 29.80 29.85 MHMCF 25.67 29.19 24.85 26.13 27.64 26.65 26.69
LRGB 30.90 34.84 29.58 31.95 33.16 31.14 31.93 LRGB 27.73 32.17 26.81 29.04 30.00 28.03 28.96
LRGBjme 31.48 35.24 29.98 32.31 33.94 31.71 32.44 LRGBjme 28.56 32.82 27.39 29.77 31.09 28.61 29.71
STGSM 30.48 35.03 29.88 32.69 33.97 31.29 32.69 STGSM 27.73 32.70 27.23 30.33 30.66 28.53 29.53
VBM3D 30.55 36.08 30.57 32.76 35.13 32.38 32.91 VBM3D 27.81 33.91 27.80 30.23 32.13 29.61 30.25
CIFIC2ref 33.14 36.25 31.33 33.14 35.04 32.58 33.58 CIFIC2ref 30.15 33.60 28.60 30.62 31.93 29.54 30.74
Note: The average cPSNR for WRSTFC displayed in this table is the average of the ve available cPSNR values. TABLE IV cPSNR Comparisons of Various Color Video Denoising Algorithms (Nonuniform Noise) nR LRGBjme 29.74 33.92 28.28 30.66 31.88 29.77 30.71 nR LRGBjme 29.74 33.55 28.44 30.72 32.42 29.86 30.79 = 10, nG = 20, nB = 30 LAYUV NLMC STGSM 32.16 29.58 28.79 34.55 32.86 33.78 29.69 28.89 28.16 31.49 29.78 31.21 32.62 31.20 31.28 31.12 29.84 29.57 31.94 30.36 30.47 = 30, nG = 20, nB = 10 LAYUV NLMC STGSM 32.07 29.48 28.78 34.29 32.87 33.42 29.76 28.85 28.43 31.51 29.96 31.35 33.15 31.01 32.43 31.18 29.87 29.68 31.99 30.34 30.68
MHMCF 26.90 30.29 26.07 27.22 28.82 27.78 27.85 MHMCF 26.82 30.25 26.06 27.33 28.89 27.89 27.87
LRGB 28.89 33.27 27.91 30.01 30.92 29.11 30.02 LRGB 28.87 33.01 28.03 30.14 31.34 29.23 30.10
VBM3D 28.95 34.94 28.85 31.15 32.69 30.62 31.20 VBM3D 28.91 34.67 28.99 31.22 33.46 30.77 31.34
ourselves following the descriptions in [19] to lter the test sequences, using four reference frames including two consecutive previous frames and two consecutive future frames. Comparing the second and third columns in Tables III and IV, it can be observed that LRGB produces considerably better cPSNR results than MHMCF, with the average cPSNR gain over 2 dB under each color noise case. This is mainly due to the redundant estimation of the overlapped-block-based processing and improved temporal prediction resulting from the quarter-pixel ME. In addition, LRGBjme consistently outperforms LRGB by an average cPSNR gain of 0.66 dB, which can be explained by the reason that the effective noise level in the joint-RGB ME is lower than that in any single-component ME. Furthermore, with the introduction of intercolor prediction, CIFIC2ref signicantly outperforms LRGBjme , by up to 2.63 dB cPSNR gain. On the average, the cPSNR gains under the four noise levels are 1.14, 1.03, 1.80, and 1.77 dB, respectively.
These substantial gains fully verify the effectiveness of the intercolor predictors. As expected, CIFIC2ref achieves higher cPSNR than LAYUV. The cPSNR gains of CIFIC2ref over LAYUV can reach 1.10 dB, and the average gains under the uniform noise and the nonuniform noise are 0.30 and 0.57 dB, respectively. In Tables III and IV, CIFIC3ref with one more reference frame can achieve a modest cPSNR gain of 0.090.33 dB over CIFIC2ref . In comparison with other four state-of-theart algorithms, CIFIC can achieve competitive cPSNR results. As observed, both being pixel-domain techniques, CIFIC3ref clearly outperforms NLMC. Also, CIFIC3ref outperforms the two wavelet-domain techniques WRSTFC and STGSM in all cases. When comparing with VBM3D, CIFIC3ref produces slightly worse performance only for limited cases, and higher cPSNR values most of the time. In particular, for the uniform noise, on average CIFIC3ref outperforms all the four. For the
139
Fig. 5. Visual comparisons of denoised frame 69 of the sequence Tennis. (a) original clean; (b)(f) denoised versions (nR = nG = nB = 25): (b) NLMC, (c) WRSTFC, (d) STGSM, (e) VBM3D, (f) CIFIC3ref .
nonuniform noise, CIFIC3ref produces the best results on every test sequence. For visual quality comparisons, some typical examples are depicted in Fig. 5. Although the added noise is large, CIFIC can remove them effectively resulting in clean and sharp denoised images. While the four state-of-the-art denoising methods suppress noise well, they tend to produce oversmoothed images [Fig. 5(b), (d), (e)], introduce visually unpleasant ringing artifacts [Fig. 5(d)], or leave more remaining noise than CIFIC [Fig. 5(c)]. The enlarged fragments in each gure are shown in the bottom row of Fig. 5 to further help demonstrate the good quality of the denoised images output by CIFIC in terms of the noise reduction capability as well as the faithful detail preservation. Quite recently, the inventors of VBM3D release a software for color video denoising called CVBM3D,3 which extends the VBM3D to color ltering by the same approach in the Color-BM3D image denoising algorithm [35], or more specically, by transforming the RGB noisy video into a luminancechrominance color space, forming grouping only within the luminance channel, and applying collaborative ltering independently to each new channel. CVBM3D is quite effective in color noise removal, produces better results than VBM3D, and is thus probably the most advanced color video denoising software so far. Therefore, we nd it meaningful to compare CIFIC with it. As the CVBM3D software does not support the scenario of nonuniform noise, we compare in Fig. 6 the framewise cPSNR of CIFIC and CVBM3D under uniform noise only. The performance of CIFIC for Chair and
3 Available
Fig. 6. Framewise cPSNR comparisons of CIFIC and CVBM3D for the sequence Bus, Chair, and Football under the noise level nR = nG = nB = 15 (only the cPSNR values for the rst 30 frames are displayed).
at http://www.cs.tut./ foi/GCF-BM3D/
Football is not as good as that of the CVBM3D, while for the sequence Bus, the performance of CIFIC is better than that of the CVBM3D. We must point out that these results are still very encouraging since we are only considering a pixel-domain procedure, and in the future, we will think about further improvement from the angle of sparse representation by advanced transforms. In our C implementation of CIFIC running on a Intel Core 2.8GHz CPU and a Windows 7 operating system, the processing time of CIFIC2ref and CIFIC3ref are 1.1 and 1.6 seconds, respectively, for a common intermediate format (CIF) frame. Note that the computational complexity of CIFIC is evaluated for an implementation that has not been fully
140
optimized for speed. For example, we simply apply the full search in ME. The processing speed of CIFIC can be further enhanced through fast ME algorithms, multithreading and parallelism design for possible real-time applications. Running at the same platform as CIFIC, the processing time of NLMC for a CIF frame is much longer, about tens of seconds. It is reported in [36] that on a 2.4 GHz CPU and a GNU/Linux operating system, the processing time of WRSTFC for a CIF frame is 1 s. For STGSM and VBM3D (or CVBM3D), we can only roughly compare the computational complexity, since they are available in MATLAB implementation only. The computational complexity of CIFIC depends on that of ME, parameter estimation and remedial renement, and LMMSE ltering, among which ME is the most computationintensive part. To denoising one frame, CIFIC2ref performs ME between two frame pairs, and CIFIC3ref performs ME between three frame pairs. In contrast, STGSM and VBM3D (or CVBM3D) need to perform ME between eight and sixteen frame pairs, respectively. [Although VBM3D (or CVBM3D) uses only eight reference frames, it is implemented in two separate steps, and in each step, it needs to perform ME between eight frame pairs]. Therefore, we believe that the computation complexity of CIFIC would be no greater than, or at least comparable to, that of STGSM and VBM3D (or CVBM3D). VI. Conclusion In this paper, we proposed an effective color video denoising algorithm CIFIC which directly exploits both the intercolor and interframe correlation in the color video signal. CIFIC applied joint-RGB ME to nd a common motion eld for RGB components and combined the current noisy observation as well as the interframe and intercolor predictors to obtain the LMMSE denoised estimate. The ill condition in LMMSE weight determination was detected and remedied by the remedial renement to get rid of the annoying color artifacts. Further, we deduce that our previous work LAYUV can be reformulated as a restricted problem of CIFIC, and thus cannot achieve lower denoising error than CIFIC. Experimental results verify the effectiveness of CIFIC in color noise reduction when compared with other state-of-the-art algorithms. Appendix A The proof of the positive semideniteness of the estimated 2 2 covariance matric: Suppose the 2 2 matrix A is denoted as a b b c (37)
Appendix B > 0: As stated in Section III-D, The proof of Cov(r0 ) is a block diagonal matrix and so is Cov1 (r0 ) which can be expressed as
2 nR 0 1 1T 3 Cov (r0 )13
0 A1
(40)
where A1 is a 2 2 positive semidenite matrix. Then

T 1 2 1T 3 Cov (r0 )13 = nR + 12 A1 12 .
(41)
As the second term in the right side of (41) is always 1 nonnegative, 1T 3 Cov (r0 )13 > 0 always holds. References
[1] A. C. Bovik, Handbook of Image and Video Processing. Amsterdam, The Netherlands: Elsevier Academic, Jun. 2005. [2] R. P. Kleihorst, R. L. Lagendijk, and J. Biemond, Noise reduction of image sequences using motion compensation and signal decomposition, IEEE Trans. Image Process., vol. 4, no. 3, pp. 274284, Mar. 1995. [3] R. Dugad and N. Ahuja, Video denoising by combining kalman and wiener estimates, in Proc. IEEE Int. Conf. Image Process., vol. 4, 1999, pp. 152156. [4] V. Zlokolica, W. Philips, and D. V. D. Ville, A new non-linear lter for video processing, in Proc. 3rd IEEE Benelux Signal Process. Symp., vol. 2, Mar. 2002, pp. 221224. [5] V. Zlokolica and W. Philips, Motion- and detail-adaptive denoising of video, in Proc. SPIE, Image Process.: Algorithms Syst. III, vol. 5298, pp. 403412, 2004. [6] A. Buades, B. Coll, and J. M. Morel, Denoising image sequences does not require motion estimation, in Proc. IEEE Conf. Adv. Video Signal Based Surveillance, Sep. 2005, pp. 7074. [7] L. Guo, O. C. Au, M. Ma, and Z. Liang, Temporal video denoising based on multihypothesis motion compensation, IEEE Trans. Circuits Syst. Video Technol., vol. 17, no. 10, pp. 14231429, Oct. 2007. [8] D. L. Donoho, De-noising by soft-thresholding, IEEE Trans. Inform. Theory, vol. 41, no. 3, pp. 613627, May 1995. [9] A. Pizurica, V. Zlokolica, and W. Philips, Combined wavelet domain and temporal video denoising, in Proc. IEEE Conf. Adv. Video Signal Based Surveillance, Jul. 2003, pp. 334341. [10] V. Zlokolica, A. Pizurica, and W. Philips, Wavelet-domain video denoising based on reliability measures, IEEE Trans. Circuits Syst. Video Technol., vol. 16, no. 8, pp. 9931007, Aug. 2006. [11] K. Dabov, A. Foi, and K. Egiazarian, Video denoising by sparse 3D transform-domain collaborative ltering, in Proc. 15th Eur. Signal Process. Conf., vol. 1, no. 2, 2007, pp. 7074. [12] S. Yu, M. O. Ahmad, and M. N. S. Swamy, Video denoising using motion compensated 3-D wavelet transform with integrated recursive temporal ltering, IEEE Trans. Circuits Syst. Video Technol., vol. 20, no. 6, pp. 780791, Jun. 2010. [13] S. M. M. Rahman, M. O. Ahmad, and M. N. S. Swamy, Video denoising based on inter-frame statistical modeling of wavelet coefcients, IEEE Trans. Circuits Syst. Video Technol., vol. 17, no. 2, pp. 187198, Feb. 2007. [14] G. Varghese and Z. Wang, Video denoising based on a spatiotemporal Gaussian scale mixture model, IEEE Trans. Circuits Syst. Video Technol., vol. 20, no. 7, pp. 10321040, Jul. 2010. [15] M. Maggioni, G. Boracchi, A. Foi, and K. Egiazarian, Video denoising, deblocking and enhancement through separable 4-D nonlocal spatiotemporal transforms, IEEE Trans. Image Process., 2012, to be published. [16] V. Zlokolica, A. Pizurica, and W. Philips, Wavelet based motion compensated ltering of color video sequences, in Proc. SPIE, Wavelets XI, vol. 5914, pp. 580590, 2005. [17] N. X. Lian, V. Zagorodnov, and Y. P. Tan, Video denoising using vector estimation of wavelet coefcients, in Proc. IEEE Int. Symp. Circuits Syst., May 2006, pp. 26732676. [18] T. F. Rabie, Robust color video denoising, in Proc. IEEE Int. Conf. Comput. Syst. Appl., May 2006, pp. 792798. [19] B. Goossens, H. Luong, J. Aelterman, A. Pi zurica, and W. Philips, A GPU-accelerated real-time NLMeans algorithm for denoising color video sequences, in Lecture Notes in Computer Science (Advanced Concepts For Intelligent Vision Systems), 2010, pp. 4657.
where a, b, and c satisfy a > 0, c > 0, and 1 b2 /ac 1. For an arbitrary nonzero vector v = [v1 , v2 ]T
2 vT Av = av2 1 + 2bv1 v2 + cv2
(38)
which can be reformulated as b ac b2 2 a(v1 + v2 )2 + v2 . (39) a a As ac b2 0 and a > 0, the formula above is always nonnegative, and thus A is positive semidenite.
141
[20] J. Dai, O. C. Au, W. Yang, C. Pang, F. Zou, and X. Wen, Color video denoising based on adaptive color space conversion, in Proc. IEEE Int Symp. Circuits Syst., MayJun. 2010, pp. 29922995. [21] B. K. Gunturk, Y. Altunbasak, and R. M. Mersereau, Color plane interpolation using alternating projections, IEEE Trans. Image Process., vol. 11, no. 9, pp. 9971013, Sep. 2002. [22] X. Wu and N. Zhang, Primary-consistent soft-decision color demosaicking for digital cameras, IEEE Trans. Image Process., vol. 13, no. 9, pp. 12631274, Sep. 2004. [23] L. Goffman-Vinopal and M. Porat, Color image compression using inter-color correlation, in Proc. IEEE Int. Conf. Image Process., vol. 2, Jun. 2002, pp. 353356. [24] S. Benierbah and M. Khamadja, Compression of colour images by inter-band compensated prediction, IEE Proc. Vision, Image Signal Process., vol. 153, no. 2, Apr. 2006, pp. 237243. [25] Y. H. Kim, B. Choi, and J. Paik, High-delity RGB video coding using adaptive inter-plane weighted prediction, IEEE Trans. Circuits Syst. Video Technol., vol. 19, no. 7, pp. 10511056, Jul. 2009. [26] M. D. Fairchild, Color Appearance Models, New York: Wiley, 2005. [27] M. S. Crouse, R. D. Nowak, and R. G. Baraniuk, Wavelet-based statistical signal processing using hidden Markov models, IEEE Trans. Signal Process., vol. 46, no. 4, pp. 886902, Apr. 1998. [28] J. K. Romberg, H. Choi, and R. G. Baraniuk, Bayesian tree-structured image modeling using wavelet-domain hidden Markov models, IEEE Trans. Image Process., vol. 10, no. 7, pp. 10561068, Jul. 2001. [29] A. Buades, B. Coll, and J. M. Morel, A review of image denoising algorithms, with a new one, Multiscale Modeling Simulation, vol. 4, no. 2, pp. 490530, 2005. [30] Video Coding for Low Bitrate Communication, ITU-T Recommendation H.263, 1995. [31] N. X. Lian, V. Zagorodnov, and Y. P. Tan, Edge-preserving image denoising via optimal color space projection, IEEE Trans. Image Process., vol. 15, no. 9, pp. 25752587, Sep. 2006. [32] T. Wedi and H. G. Musmann, Motion- and aliasing-compensated prediction for hybrid video coding, IEEE Trans. Circuits Syst. Video Technol., vol. 13, no. 7, pp. 577586, Jul. 2003. [33] J. Ostermann and M. Narroschke, Motion Compensated Prediction with 1/8-pel Displacement Vector Resolution, ITU-T Q.6/SG16, Doc. VCEG-AD09, 2006. [34] Y. C. Eldar and N. Merhav, Minimax MSE-ratio estimation with signal covariance uncertainties, IEEE Trans. Signal Process., vol. 53, no. 4, pp. 13351347, Apr. 2005. [35] K. Dabov, A. Foi, V. Katkovnik, and K. Egiazarian, Color image denoising via sparse 3d collaborative ltering with grouping constraint in luminance-chrominance space, in Proc. IEEE Int. Conf. Image Process., vol. 1, Sep. 2007, pp. 313316. [36] V. Zlokolica, Advanced nonlinear methods for video denoising, Ph.D. dissertation, Gent University, 2006.
His main research contributions are on video and image coding and processing, watermarking and light weight encryption, speech, and audio processing. Research topics include fast motion estimation for MPEG-1/2/4, H.261/3/4, AVS, optimal and fast suboptimal rate control, mode decision, transcoding, denoising, deinterlacing, post-processing, multiview coding, scalable video coding, distributed video coding, subpixel rendering, JPEG/JPEG2000, HDR imaging, compressive sensing, halftone image data hiding, GPU-processing, and software-hardware co-design, etc. He has published about 320 technical journals and conference papers. His fast motion estimation algorithms were accepted into the ISO/IEC 14496-7 MPEG-4 international video coding standard and the China AVS-M standard. His light-weight encryption and error resilience algorithms are accepted into the China AVS standard. He has eight U.S. patents and is applying for over 60 more on his signal processing techniques. He has performed forensic investigation and stood as an expert witness in the Hong Kong courts many times. Dr. Au is a Board of Governors Member of the Asia Pacic Signal and Information Processing Association. He has been an Associate Editor of the IEEE Transactions on Circuits and Systems for Video Technology, the IEEE Transactions on Image Processing, and the IEEE Transactions on Circuits and Systems, Part 1. He is on the Editorial Boards of Journal of Signal Processing Systems, Journal of Multimedia, and Journal of Franklin Institute. He has been the Chair of the CAS Technical Committee on Multimedia Systems and Applications, the Vice Chair of SP TC on Multimedia Signal Processing, and a member of CAS TC on Video Signal Processing and Communications, CAS TC on DSP, and SP TC on Image, Video and Multidimensional Signal Processing. He has served on the Steering Committee of IEEE Transactions on Multimedia and IEEE International Conference of Multimedia and Expo. He has also served on the organizing committee of IEEE International Symposium on Circuits and Systems in 1997, IEEE International Conference on Acoustics, Speech, and Signal Processing in 2003, the ISO/IEC MPEG 71st Meeting in 2005, International Conference on Image Processing in 2010, and other conferences. He was the General Chair of the PacicRim Conference on Multimedia (PCM) in 2007, IEEE International Conference on Multimedia and Expo in 2010 and Packet Video Workshop in 2010. He won best paper awards in SiPS 2007 and PCM 2007. He is an IEEE Distinguished Lecturer in 2009 and 2010, and has been keynote speaker several times.
Jingjing Dai (S09) received the B.E. degree in information engineering from the Department of Information Science and Electronic Engineering, Zhejiang University, Hangzhou, China, in 2008. She is currently pursuing the Ph.D. degree from the Department of Electronic and Computer Engineering, Hong Kong University of Science and Technology, Kowloon, Hong Kong. Her current research interests include image/video processing, next-generation video coding, and video codec optimization.
Chao Pang (S09) received the B.E. degree in telecommunication engineering from the Department of Information Science and Electronic Engineering, Zhejiang University, Hangzhou, China, in 2007. He is currently pursuing the Ph.D. degree from the Department of Electronic and Computer Engineering, Hong Kong University of Science and Technology, Hong Kong. His current research interests include image/video processing, multimedia communication, and nextgeneration video coding.
Oscar C. Au (S87M90SM01F11) received the B.A.Sc. degree from the University of Toronto, Toronto, Canada, in 1986, and the M.A. and Ph.D. degrees from Princeton University, NJ, in 1988 and 1991, respectively. After being a Post-Doctoral Researcher with Princeton University for one year, he joined the Hong Kong University of Science and Technology (HKUST), Kowloon, Hong Kong, as an Assistant Professor in 1992. He is/has been a Professor of the Department of Electronic and Computer Engineering, the Director of Multimedia Technology Research Center, and the Director of the Computer Engineering Program with HKUST.
Feng Zou (S09) received the B.S. degree in electrical engineering from Harbin Institute of Technology, Harbin, China, in 2008 and is currently pursuing the Ph.D. degree in electronic and computer engineering in the Hong Kong University of Science and Technology, Hong Kong. While at Hong Kong University of Science and Technology, he is supervised by Prof. Oscar C. Au. His current research interests include intra prediction, transform, and quantization designs in video compression. He is actively contributing proposals in the standardization of HEVC under the ITU-T/ISO/IEC Joint Collaborative Team on Video Coding and holds several video coding related patents.

06213094

Загружено:

Сведения о документе

Исходное описание:

Авторское право

Доступные форматы

Поделиться этим документом

Поделиться или встроить документ

Параметры публикации

Этот документ был вам полезен?

Это неприемлемый материал?

Авторское право:

Доступные форматы

06213094

Загружено:

Авторское право:

Доступные форматы

128

Color Video Denoising Based on Combined Interframe and Intercolor Prediction

1051-8215/$31.00 c 2012 IEEE

Interframe and intercolor correlation.

[R (Rc ur(x) Rr ef(x + mv))

regardless of whether the reference frame is denoised or not.

By Lagrange multipliers, the optimal weights are R = G = B =

(6) (7) (8)

wT R Cov(r)wR wT R 1 = 1. Cov1 (r)1 1T Cov1 (r)1

The optimal wR can be solved using Lagrange multipliers as wR = (14)

Algorithm 1 Remedial Renement

E[( yR yR )2 ]+E[( yG yG )2 ]+E[( yB yB )2 ] (23) y Y y U y V

(25) (26) (27) (28) (29)

(W0,c + W1,c + + WM,c )13 = 13 ,

C1 I33 C13 = I33 13 = 13

p1R p1R p1G p1G p1B p 1B yR + yG , yB (31)

Bus Chair Football Renata Salesman Tennis Average

Bus Chair Football Renata Salesman Tennis Average

Bus Chair Football Renata Salesman Tennis Average

Bus Chair Football Renata Salesman Tennis Average

where A1 is a 2 2 positive semidenite matrix. Then

Вам также может понравиться