Accepted Manuscript
PII: S0925-2312(18)30914-7
DOI: https://doi.org/10.1016/j.neucom.2018.07.076
Reference: NEUCOM 19830
Please cite this article as: Zhanxiang Feng, Jianhuang Lai, Xiaohua Xie, Junyong Zhu, Image
Super-Resolution via a Densely Connected Recursive Network, Neurocomputing (2018), doi:
https://doi.org/10.1016/j.neucom.2018.07.076
This is a PDF file of an unedited manuscript that has been accepted for publication. As a service
to our customers we are providing this early version of the manuscript. The manuscript will undergo
copyediting, typesetting, and review of the resulting proof before it is published in its final form. Please
note that during the production process errors may be discovered which could affect the content, and
all legal disclaimers that apply to the journal pertain.
a School of Electronics and Information Technology, Sun Yat-sen University, China
b School of Data and Computer Science, Sun Yat-sen University, China
c Guangdong Key Laboratory of Machine Intelligence and Advanced Computing, Ministry of Education, China
Abstract

The single-image super-resolution (SISR) techniques have been significantly promoted by deep networks. However, the storage and computation complexities of deep models increase dramatically alongside the reconstruction performance. This paper proposes a densely connected recursive network (DCRN) to trade off the performance and complexity. We introduce an enhanced dense unit by removing the batch normalization (BN) layers and employing the squeeze-and-excitation (SE) blocks.
∗ Corresponding author. Present address: School of Data and Computer Science, Sun Yat-sen University, Guangzhou Higher Education Mega Center, Panyu District, Guangzhou, Guangdong, 510006, P. R. China. Tel.: +86-13168313819. Fax: +86-20-84110175.
Email addresses: fengzhx@mail2.sysu.edu.cn (Zhanxiang Feng), stsljh@mail.sysu.edu.cn (Jianhuang Lai), xiexiaoh6@mail.sysu.edu.cn (Xiaohua Xie), zhujuny5@mail.sysu.edu.cn (Junyong Zhu)

1. Introduction
Deep network-based approaches [1, 2, 3, 4, 5] have been widely studied for the SISR task. Deep learning methods intend to reconstruct HR outputs with clear details by learning an end-to-end non-linear mapping between LR and HR images. Deep networks have significantly advanced the reconstruction performance over the existing literature. Increasing the network depth is proven effective for improving the performance of deep models. However, the computation and storage complexities increase rapidly with the network depth. Conducting SISR with a very deep network is costly for practical applications with limited storage space and computation resources. Therefore, balancing the performance and the complexity is of great importance to deep learning methods.
In this paper, a novel network named the densely connected recursive network (DCRN) is proposed to reconstruct high-quality images with fewer parameters and less computation time. The DCRN is derived from the DenseNet [6], forming an enhanced dense unit adapted to the SISR task. Specifically, the BN layers are removed from the traditional dense unit to avoid the smoothing operation and reduce the execution cost, while the SE blocks [7] are employed to learn globally structured features. Besides, we use a recursive architecture to keep the network compact while increasing the network depth. Furthermore, we propose a de-convolution based residual learning approach, which extracts deep features from the LR input and obtains the residual output through a de-convolution layer. The execution speed of residual image estimation is largely accelerated by the proposed approach.
2. Related work

2.1. Deep Network-Based Methods for SISR

Dong et al. [1] pioneered a CNN model named the SRCNN that learns the non-linear mapping from LR image patches to HR image patches via three convolutional layers. Kim et al. [8] proposed a deep neural network named the very deep super-resolution network (VDSR). The VDSR is the first work to successfully advance the performance of SISR by increasing the network depth. Kim et al. [9] also proposed a deeply-recursive convolutional network (DRCN) to reduce the parameters. Tai et al. [2] introduced a novel CNN model named the deep recursive residual network (DRRN). The DRRN uses both global residual learning and local residual learning techniques to optimize the network parameters under a multi-path recursive architecture. Note that for deep networks, the execution cost and the number of parameters increase dramatically alongside the reconstruction quality and the network depth. Therefore, an essential concern of deep networks is to balance the representative ability, the computation complexity, and the storage consumption.
ACCEPTED MANUSCRIPT
very deep networks. ResNet builds short connections between the convolution
layers to strengthen the reuse of resources and reduce the storage complexity.
Huang et al. [6] proposed the DenseNet to reduce the number of parameters.
65 The layers in DenseNet are connected to every other feed-forward layer to avoid
T
the vanishing gradients, reuse the convolution layers, and enhance feature prop-
IP
agation. Hu et al. [7] proposed the SENet to explicitly model the weights of
different channels and adaptively recalibrate the feature representations via the
CR
re-scaled channel-wise responses. SENet is proved to be effective for improving
70 the performance of deep networks with little additional computation cost.
US
3. Densely Connected Recursive Network
Figure 1 compares the basic dense unit with the enhanced one. The enhanced dense unit makes three significant changes.

First, the BN layers are removed from the original dense unit. The BN layer has been designed to normalize deep features, which is harmful to SISR.

Second, contextual information is significant for SISR. However, the filters of the convolution layers operate locally, and the contextual information outside the receptive field may be ignored during feature extraction. We propose to integrate the SE blocks into the dense unit to address this problem. Specifically, global pooling is first applied for each channel to extract the global structural information.

Figure 1: Comparison of dense units. (a) The basic dense unit; (b) The enhanced dense unit. The enhanced dense unit differs from the basic dense unit in eliminating the BN layers and utilizing the SE blocks.

Figure 2: Dense block in the DCRN. The dense block is composed of multiple cascaded dense units.

Figure 2 demonstrates the structure of the dense block in the DCRN. A dense block consists of several cascaded dense units which are concatenated with all other feed-forward dense units. Note that for a dense block which contains n dense units, the dense units are computed n(n+1)/2 times. Therefore, employing a very deep dense block with numerous dense units is time-consuming. We propose to use multiple cascaded dense blocks rather than a very deep block to ease the time complexity. Compared with the dense unit, more structural information is contained in the dense block. Accordingly, for each dense block, we employ an SE block to recalibrate the features and extract the contextual information.
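The channel recalibration performed by the SE block can be sketched as follows. This is a minimal NumPy illustration of the mechanism, not the authors' implementation; the bottleneck weight shapes `w1` and `w2` (and the implied reduction ratio) are assumptions for the sketch.

```python
import numpy as np

def se_recalibrate(features, w1, w2):
    """Squeeze-and-excitation over a (C, H, W) feature map.

    Global average pooling extracts one structural statistic per channel,
    a two-layer bottleneck (w1, w2) maps it to channel weights, and the
    sigmoid-scaled weights recalibrate the original features.
    """
    squeeze = features.mean(axis=(1, 2))            # global pooling -> (C,)
    hidden = np.maximum(w1 @ squeeze, 0.0)          # ReLU bottleneck, shape (C // r,)
    gates = 1.0 / (1.0 + np.exp(-(w2 @ hidden)))    # sigmoid channel weights -> (C,)
    return features * gates[:, None, None]          # channel-wise rescaling
```

With all-zero bottleneck weights every gate is sigmoid(0) = 0.5, so the block degenerates to uniform scaling; learned weights instead emphasize informative channels.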
We adopt the recursive structure [2] to control the parameters of the DCRN while increasing the network depth. Figure 3 illustrates the architecture of the recursive block in the DCRN. The recursive block contains several dense blocks, and the parameters of the dense blocks are shared. Note that we attach a transition layer after each dense block to keep the output compact. Without transition layers, the dimension of the DCRN would grow as the number of dense blocks increases. In particular, a multi-path structure is utilized to conduct local residual learning between the input feature and each transition feature. Semantic information is thus embedded into the recursive architecture.

Figure 3: Structure of the recursive block. N denotes the number of dense blocks in a recursive block. Deep features are extracted from the LR input, and a de-convolution layer is employed to obtain the residual image.
The output of the first convolution layer is formulated as:

    x0 = f0(x).    (1)

The output of the n-th transition layer under multi-path local residual learning is formulated as:

    xn = D(xn−1) = fn(Dn(xn−1)) + x0.    (2)

The output of the recursive block can be regarded as N-fold operations of local residual learning between the beginning convolution layer and the transition layer. The formula of the recursive block is as follows:

    xN = D(N)(x0),    (3)

where D(N) denotes N successive applications of the operation D.
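Iterating eq. (2) can be sketched as below. This is a toy sketch, not the paper's code: `dense_block` and `transition` stand in for Dn and fn, and passing one callable pair that is reused on every iteration mirrors the parameter sharing of the recursive block.

```python
def recursive_block(x0, dense_block, transition, n_blocks):
    """N-fold local residual learning, eq. (2): x_n = f(D(x_{n-1})) + x_0.

    The same dense_block / transition pair is reused in every iteration,
    so the effective depth grows while the parameter count stays fixed.
    """
    x = x0
    for _ in range(n_blocks):
        x = transition(dense_block(x)) + x0   # local residual with the first conv output
    return x
```

For example, with `dense_block = lambda v: 2 * v` and `transition = lambda v: v - 1`, starting from x0 = 1.0 the iterates are 2.0, 4.0, 8.0.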
Table 1: Details of the dense block (left) and the enhanced dense unit (right).

Dense block                     Dense unit
Dense Unit1   40 × 40 × 96      Conv1       40 × 40 × 128
Dense Unit2   40 × 40 × 128     SE Block1   40 × 40 × 128
Dense Unit3   40 × 40 × 160     Conv2       40 × 40 × 32
SE Block      40 × 40 × 160     SE Block2   40 × 40 × 32
Transition    40 × 40 × 64      Concat      40 × 40 × 96

Global residual learning has been proven effective for accelerating the optimization process of SISR [8]. For traditional residual learning (TRL) methods, the input image is first up-sampled through interpolation, and then the residual image is estimated over the interpolated input, which is computationally expensive. In contrast, the proposed approach extracts deep features from the LR input rather than the interpolated image. Then, a de-convolution layer is utilized to magnify the deep features and obtain the desired residual image.

Given a training set {xi, yi, x̂i}, i = 1, 2, ..., K, where xi is the LR input, yi is the ground truth, and x̂i is the interpolated image of xi, the residual image is defined as:

    ri = yi − x̂i.    (4)

The proposed de-convolution based residual learning method intends to minimize the distance between the de-convolution output and the residual image. The objective function can be formulated as:

    L = (1/2K) Σ_{i=1}^{K} ‖ri − Deconv(D(N)(xi))‖₂².    (5)
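The objective in eq. (5) is a plain mean of squared L2 distances between the predicted residuals and the true residuals. A minimal sketch follows; for brevity the de-convolution outputs are passed in precomputed, which is an assumption of the sketch rather than the authors' training loop.

```python
import numpy as np

def residual_loss(residuals, deconv_outputs):
    """Eq. (5): L = 1/(2K) * sum_i || r_i - Deconv(D^(N)(x_i)) ||_2^2."""
    K = len(residuals)
    return sum(float(np.sum((r - d) ** 2))
               for r, d in zip(residuals, deconv_outputs)) / (2.0 * K)
```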
4. Experiment
Datasets. The training set comprises 291 images, of which 91 samples are from Yang et al. [14] and the remaining 200 images are from the Berkeley Segmentation Dataset [15]. For testing, we use four popular SISR benchmarks, namely Set5 [16], Set14 [17], BSD100 [15], and Urban100 [18], which contain 5, 14, 100, and 100 testing images, respectively.
Table 2: Details of the DCRN (layer, input, input size, output size).

Layer          Input                Input size           Output size
Transit1       Dense Block1         40 × 40 × 160        40 × 40 × 64
Residual1      {Conv1, Transit1}    {40 × 40 × 64} × 2   40 × 40 × 64
Dense Block2   Residual1            40 × 40 × 64         40 × 40 × 160
Transit2       Dense Block2         40 × 40 × 160        40 × 40 × 64
Residual2      {Conv1, Transit2}    {40 × 40 × 64} × 2   40 × 40 × 64
Dense Block3   Residual2            40 × 40 × 64         40 × 40 × 160
Transit3       Dense Block3         40 × 40 × 160        40 × 40 × 64
Residual3      {Conv1, Transit3}    {40 × 40 × 64} × 2   40 × 40 × 64
Dense Block4   Residual3            40 × 40 × 64         40 × 40 × 160
Transit4       Dense Block4         40 × 40 × 160        40 × 40 × 64
Residual4      {Conv1, Transit4}    {40 × 40 × 64} × 2   40 × 40 × 64
Dense Block5   Residual4            40 × 40 × 64         40 × 40 × 160
Transition5    Dense Block5         40 × 40 × 160        40 × 40 × 64
Residual5      {Conv1, Transit5}    {40 × 40 × 64} × 2   40 × 40 × 64
Deconv1        Residual5            40 × 40 × 64         M × M × 1

Evaluation protocols. We compare the DCRN with the state-of-the-art approaches, including ScSR [14], A+ [19], SelfEx [18], SRCNN [12], VDSR [8], DRCN [9], and DRRN [2], by evaluating the reconstruction performance under three magnification factors (×2, ×3, ×4). SISR is performed in the YCbCr color space: the DCRN operates only on the luminance component of the LR image, while bi-cubic interpolation is applied to the other color components. The DCRN is compared with the other methods using both qualitative and quantitative measurements. Peak signal-to-noise ratio (PSNR) and structural similarity (SSIM) are computed to statistically evaluate the comparisons.
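Of the two metrics, PSNR has the simpler definition and can be computed as below (SSIM involves local luminance, contrast, and structure statistics and is omitted here). The peak value of 255 assumes 8-bit luminance, which is an assumption of this sketch.

```python
import numpy as np

def psnr(reference, reconstructed, peak=255.0):
    """Peak signal-to-noise ratio in dB between two images of equal shape."""
    mse = np.mean((np.asarray(reference, dtype=np.float64)
                   - np.asarray(reconstructed, dtype=np.float64)) ** 2)
    if mse == 0:
        return float("inf")                  # identical images
    return 10.0 * np.log10(peak ** 2 / mse)
```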
Implementation details. The details of the dense block and the dense unit are presented in Table 1, and the details of the DCRN are presented in Table 2. The DCRN contains five cascaded dense blocks, and each dense block consists of three enhanced dense units. M denotes the desired output size of the de-convolution layer.

As in [2], the training set is augmented by appending the flipped and rotated versions of the training images. We flip the training samples horizontally and rotate them by 90°, 180°, and 270° to generate seven additional versions of each image for training.

Figure 4: Reconstruction results with a scale factor of ×3. The images from top to bottom are from Set5, Set14, and BSD100, respectively. The DCRN generates sharp details and maintains the structure of the images.
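The flip-and-rotate augmentation described above yields exactly eight variants per image (the original plus seven); a minimal sketch:

```python
import numpy as np

def augment(image):
    """Return the 8 training variants: {identity, horizontal flip} x {0, 90, 180, 270} deg."""
    variants = []
    for base in (image, np.fliplr(image)):   # original and horizontally flipped
        for k in range(4):                   # rotations by k * 90 degrees
            variants.append(np.rot90(base, k))
    return variants
```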
The learning rate is halved every 30,000 iterations; the gradient clipping parameter is set as 0.08, the batch size as 32, the momentum parameter as 0.9, and the weight decay as 10⁻⁴.
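The step schedule (halving the rate every 30,000 iterations) can be written as a one-liner. The base learning rate below is a placeholder, since this version of the manuscript does not state it.

```python
def learning_rate(iteration, base_lr=0.1, halve_every=30000):
    """Step decay: the rate is halved once per `halve_every` iterations."""
    return base_lr * 0.5 ** (iteration // halve_every)
```

For example, `learning_rate(75000)` falls in the third step, i.e. `base_lr / 4`.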
Table 3: Quantitative comparisons (PSNR/SSIM) on Set14, BSD100, and Urban100 under three magnification factors.

Dataset    Scale  Bicubic        ScSR [14]      A+ [19]        SelfEx [18]    SRCNN [12]     VDSR [8]       DRCN [9]       DRRN [2]       DCRN (ours)
Set14      ×2     30.24/0.8688   31.64/0.8940   32.28/0.9056   32.22/0.9034   32.42/0.9063   33.03/0.9124   33.04/0.9118   33.19/0.9133   33.15/0.9133
           ×3     27.55/0.7742   28.19/0.7977   29.13/0.8188   29.16/0.8196   29.28/0.8209   29.77/0.8314   29.76/0.8311   29.94/0.8339   29.91/0.8339
           ×4     26.00/0.7027   26.40/0.7218   27.32/0.7491   27.40/0.7518   27.49/0.7503   28.01/0.7674   28.02/0.7670   28.18/0.7701   28.13/0.7698
BSD100     ×2     29.56/0.8431   30.77/0.8744   31.21/0.8863   31.18/0.8855   31.36/0.8879   31.90/0.8960   31.85/0.8942   32.01/0.8969   31.95/0.8965
           ×3     27.21/0.7385   27.72/0.7647   28.29/0.7835   28.29/0.7840   28.41/0.7863   28.82/0.7976   28.80/0.7963   28.91/0.7992   28.87/0.7994
           ×4     25.96/0.6675   26.61/0.6983   26.82/0.7087   26.84/0.7106   26.90/0.7101   27.29/0.7251   27.23/0.7233   27.35/0.7262   27.28/0.7257
Urban100   ×2     26.88/0.8403   28.26/0.8828   29.20/0.8938   29.54/0.8946   29.50/0.8946   30.76/0.9140   30.75/0.9133   31.02/0.9164   30.97/0.9161
           ×3     24.46/0.7349   25.69/0.7831   26.03/0.7973   26.24/0.7989   26.24/0.7989   27.14/0.8279   27.15/0.8276   27.38/0.8331   27.34/0.8329
           ×4     23.14/0.6577   24.02/0.7024   24.32/0.7183   24.79/0.7374   24.52/0.7221   25.18/0.7524   25.14/0.7510   25.35/0.7576   25.29/0.7572

Table 4: Comparison between the DRRN and the DCRN.

Method  PSNR (dB)  SSIM    Params (M)  Time (s)
DRRN    33.93      0.9234  10.7        0.25
DCRN    33.91      0.9232  2.8         0.15
Furthermore, the reconstructed details of the DCRN are much sharper and more precise than those of ScSR, SRCNN, and VDSR. Note that the reconstruction quality of the DCRN is competitive with the DRRN.

Figure 5: Overall performance on Set5 with a scale factor of ×3. DCRN outperforms other methods in balancing the performance and complexity.

Turning to the quantitative comparisons, the DCRN outperforms the ScSR, A+, SelfEx, SRCNN, VDSR, and DRCN for all benchmarks and magnification factors regarding both PSNR and SSIM. Note that the quantitative scores of the DCRN are very close to those of the DRRN, while the parameter count of the DCRN is only about a quarter of the DRRN (2.8M vs. 10.7M). Besides, the DRRN takes 0.25 seconds to process an image on Set5, while the DCRN takes only 0.15 seconds. Compared with the DRRN, the DCRN achieves a better trade-off between the performance and the complexity.
Table 5: Effectiveness of the SE block

Method              PSNR (dB)  SSIM    Params (M)  Time (s)
DCRN w/o SE block   33.87      0.9229  2.6         0.14
DCRN                33.91      0.9232  2.8         0.15

Table 6: Effectiveness of the recursive learning

Method              PSNR (dB)  SSIM    Params (M)  Time (s)
We evaluate the effects of the modifications brought by the enhanced dense unit. The baseline DenseNet achieves a PSNR of 33.65 dB and an SSIM of 0.9209, while the DCRN achieves a PSNR of 33.91 dB and an SSIM of 0.9232. Obviously, the reconstruction quality is remarkably improved by the enhanced dense unit. Further, the running time of the DCRN is smaller than that of the DenseNet (0.15s vs. 0.18s), which indicates that the DCRN is more efficient. Attaching the BN layers to the DCRN results in a PSNR of 33.78 dB and an SSIM of 0.9218, which is much lower than the proposed DCRN. Meanwhile, 40% additional computation time is brought by the BN layers (0.24s vs. 0.15s). Notably, employing the SE blocks results in better PSNR (33.91 dB vs. 33.87 dB) and SSIM (0.9232 vs. 0.9229) with little additional cost. The recursive structure also proves to be valid for promoting the PSNR (33.91 dB vs. 33.88 dB) and SSIM (0.9232 vs. 0.9229) of the DCRN. Note that the reconstruction quality of the de-convolution based residual learning method is similar to that of the traditional residual learning (TRL) approaches (PSNR: 33.91 dB vs. 33.93 dB, SSIM: 0.9232 vs. 0.9231), but the execution time is much smaller (0.15s vs. 0.91s). The proposed approach is six times faster than the traditional methods because the deep features are extracted over the LR image.

Figure 6: Analysis of the convergence: (a) The training process of DCRN and DenseNet; (b) The PSNR of the DCRN and DenseNet on the test set.
Figure 6(a) illustrates the training processes of the DCRN and DenseNet. The training losses of both the DCRN and DenseNet keep decreasing during training, which infers that the training processes are convergent. Obviously, the DCRN converges much faster than the DenseNet, with a smaller initial training loss. We can achieve a convergent model from the DCRN much more easily and quickly compared with the DenseNet.

Figure 6(b) illustrates the PSNR of the DCRN and DenseNet on the test set. The PSNR of both the DCRN and DenseNet keeps rising in the training process. In particular, the PSNR of the DCRN is higher than that of the DenseNet throughout the training process, indicating a better generalizability. With a favorable generalizability and a fast convergence speed, the DCRN can achieve a satisfactory reconstruction quality.

Figure 7: The PSNR with respect to the network depth.

The DCRN becomes more powerful as the number of dense blocks increases, which indicates that 'the deeper, the better' works for the proposed approach. Figure 7 visually illustrates this conclusion. However, the performance grows slowly when the depth further increases. Therefore, we adopt the DCRN with five dense blocks to balance the reconstruction performance and the computation complexity.
5. Conclusion

We propose an efficient network named the DCRN to cope with the SISR task. The DCRN introduces an enhanced dense unit by removing the BN layers and adopting the SE blocks. Besides, a recursive architecture is adopted to control the network parameters while increasing the network depth. Furthermore, we propose a de-convolution based residual learning approach to accelerate the execution speed.

Extensive experiments are conducted on the Set5, Set14, BSD100, and Urban100 datasets. The experimental results validate the efficiency of the DCRN. The effectiveness of the significant components of the DCRN is also examined in detail. With less storage complexity and computational consumption, the DCRN provides a more flexible option for practical applications.
Acknowledgment

References
[2] Y. Tai, J. Yang, X. Liu, Image super-resolution via deep recursive residual network, in: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition.

[4] Z. Feng, J. Lai, X. Xie, D. Yang, L. Mei, Face hallucination by deep traversal network, in: Proceedings of the International Conference on Pattern Recognition, 2016, pp. 3276–3281.

[6] G. Huang, Z. Liu, K. Q. Weinberger, L. van der Maaten, Densely connected convolutional networks, arXiv preprint arXiv:1608.06993.

[7] J. Hu, L. Shen, G. Sun, Squeeze-and-excitation networks, arXiv preprint arXiv:1709.01507.

[8] J. Kim, J. Kwon Lee, K. Mu Lee, Accurate image super-resolution using very deep convolutional networks, in: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2016, pp. 1646–1654.

[9] J. Kim, J. Kwon Lee, K. Mu Lee, Deeply-recursive convolutional network for image super-resolution, in: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2016, pp. 1637–1645.

[10] W.-S. Lai, J.-B. Huang, N. Ahuja, M.-H. Yang, Deep laplacian pyramid networks for fast and accurate super-resolution, arXiv preprint arXiv:1704.03915.

[11] B. Lim, S. Son, H. Kim, S. Nah, K. M. Lee, Enhanced deep residual networks for single image super-resolution, in: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops, 2017, pp. 136–144.

[13] K. He, X. Zhang, S. Ren, J. Sun, Deep residual learning for image recognition.

[16] M. Bevilacqua, A. Roumy, C. Guillemot, M. L. Alberi-Morel, Low-complexity single-image super-resolution based on nonnegative neighbor embedding, in: Proceedings of the British Machine Vision Conference, 2012.

[19] R. Timofte, V. De Smet, L. Van Gool, A+: Adjusted anchored neighborhood regression for fast super-resolution, in: Proceedings of the Asian Conference on Computer Vision.
Zhanxiang Feng received the B.E. degree in automation from Sun Yat-sen University, China, in 2012. He is currently pursuing the Ph.D. degree in information and communication engineering with Sun Yat-sen University, China. His research interests include person re-identification, face recognition, face hallucination, image super-resolution, and visual surveillance. He has authored papers in IEEE TIP and ICPR. His ICPR 2016 paper was a finalist for the best student paper award.
Jianhuang Lai received his M.Sc. degree in applied mathematics in 1989 and his Ph.D. degree in mathematics in 1999 from Sun Yat-sen University, China. He joined Sun Yat-sen University in 1989 as an Assistant Professor, where, currently, he is a Professor in the School of Data and Computer Science. His current research interests are in the areas of computer vision, pattern recognition and their applications. He has published over 250 scientific papers in international journals and conferences on image processing and pattern recognition, e.g. IEEE TPAMI, IEEE TNN, IEEE TIP, IEEE TSMC (Part B), Pattern Recognition, ICCV, CVPR and ICDM. Prof. Lai serves as a deputy director of the Image and Graphics Association of China and as a standing director of the Image and Graphics Association of Guangdong.
Xiaohua Xie received his degree in computing science and the Ph.D. degree in applied mathematics from Sun Yat-sen University, China, in 2007 and 2010, respectively. He has published papers in international journals and conferences. His current research fields cover image processing, computer vision, pattern recognition, and computer graphics.
Junyong Zhu received his M.S. and Ph.D. degrees in applied mathematics in the School of Mathematics and Computational Science from Sun Yat-sen University, Guangzhou, P. R. China, in 2010 and 2014, respectively. His research interests include learning with labeled or unlabeled auxiliary data and non-linear clustering. He has authored and co-authored papers in international journals and conferences such as IEEE TIFS, PR, ICIP, AMFG and ICDM. His cooperative ICDM 2010 paper won the Honorable Mention for Best Research Paper Award and his CCBR 2012 paper won the Best Student Paper Award.