Accepted Manuscript
PII: S0925-2312(18)30914-7
DOI: https://doi.org/10.1016/j.neucom.2018.07.076
Reference: NEUCOM 19830
Please cite this article as: Zhanxiang Feng, Jianhuang Lai, Xiaohua Xie, Junyong Zhu, Image
Super-Resolution via a Densely Connected Recursive Network, Neurocomputing (2018), doi:
https://doi.org/10.1016/j.neucom.2018.07.076
This is a PDF file of an unedited manuscript that has been accepted for publication. As a service
to our customers we are providing this early version of the manuscript. The manuscript will undergo
copyediting, typesetting, and review of the resulting proof before it is published in its final form. Please
note that during the production process errors may be discovered which could affect the content, and
all legal disclaimers that apply to the journal pertain.
a School of Electronics and Information Technology, Sun Yat-sen University, China
b School of Data and Computer Science, Sun Yat-sen University, China
c Guangdong Key Laboratory of Machine Intelligence and Advanced Computing, Ministry of Education, China
Abstract

The single-image super-resolution (SISR) techniques have been significantly promoted by deep networks. However, the storage and computation complexities of deep models increase dramatically alongside the reconstruction performance. This paper proposes a densely connected recursive network (DCRN) to trade off the performance and complexity. We introduce an enhanced dense unit by removing the batch normalization (BN) layers and employing the squeeze-and-excitation (SE) blocks.
∗ Corresponding author. Present address: School of Data and Computer Science, Sun Yat-sen University, Guangzhou Higher Education Mega Center, Panyu District, Guangzhou, Guangdong, 510006, P. R. China. Tel.: +86-13168313819. Fax: +86-20-84110175.
Email addresses: fengzhx@mail2.sysu.edu.cn (Zhanxiang Feng), stsljh@mail.sysu.edu.cn (Jianhuang Lai), xiexiaoh6@mail.sysu.edu.cn (Xiaohua Xie), zhujuny5@mail.sysu.edu.cn (Junyong Zhu)

1. Introduction
Deep network-based approaches [1, 2, 3, 4, 5] have been widely studied for the SISR task. Deep learning methods intend to reconstruct HR outputs with clear details by learning an end-to-end non-linear mapping between LR and HR images. Deep networks have significantly advanced the reconstruction performance over the existing literature. Increasing the network depth is proven effective for improving the performance of deep models. However, the computation and storage complexities increase rapidly with the network depth. Conducting SISR with a very deep network is costly for practical applications with limited storage space and computation resources. Therefore, balancing the performance and the complexity is of great importance to deep learning methods.
In this paper, a novel network named the densely connected recursive network (DCRN) is proposed to reconstruct high-quality images with fewer parameters and less computation time. The DCRN is derived from the DenseNet [6], forming an enhanced dense unit adapted to the SISR task. Specifically, the BN layers are removed from the traditional dense unit to avoid the smoothing operation and reduce the execution cost, while the SE blocks [7] are employed to learn globally structured features. Besides, we use a recursive architecture to keep the network compact while increasing the network depth. Furthermore, we propose a de-convolution based residual learning approach, which extracts deep features from the LR input and obtains the residual output through a de-convolution layer. The execution speed of residual image estimation is largely accelerated by the proposed approach.
2. Related work

2.1. Deep Network-Based Methods for SISR

Dong et al. [1] pioneered a CNN model named the SRCNN that learns the non-linear mapping from LR image patches to HR image patches via three convolutional layers. Kim et al. [8] proposed a deep neural network named the very deep super-resolution network (VDSR). The VDSR is the first work to successfully advance the performance of SISR by increasing the network depth. Kim et al. [9] also proposed a deeply-recursive convolutional network (DRCN) to reduce the parameters. Tai et al. [2] introduced a novel CNN model named the deep recursive residual network (DRRN). The DRRN uses both global residual learning and local residual learning techniques to optimize the network parameters under a multi-path recursive architecture. Note that for deep networks, the execution cost and the number of parameters increase dramatically alongside the reconstruction quality and the network depth. Therefore, an essential concern of deep networks is to balance the representative ability, the computation complexity, and the storage consumption.
ACCEPTED MANUSCRIPT
very deep networks. ResNet builds short connections between the convolution
layers to strengthen the reuse of resources and reduce the storage complexity.
Huang et al. [6] proposed the DenseNet to reduce the number of parameters.
65 The layers in DenseNet are connected to every other feed-forward layer to avoid
T
the vanishing gradients, reuse the convolution layers, and enhance feature prop-
IP
agation. Hu et al. [7] proposed the SENet to explicitly model the weights of
different channels and adaptively recalibrate the feature representations via the
CR
re-scaled channel-wise responses. SENet is proved to be effective for improving
70 the performance of deep networks with little additional computation cost.
US
3. Densely Connected Recursive Network
Figure 1 compares the basic dense unit with the enhanced one. The enhanced dense unit makes three significant changes.

First, the BN layers are removed from the original dense unit. The BN layer has been designed to normalize deep features, which is harmful to SISR.

Second, contextual information is significant for SISR. However, the filters of the convolution layers operate locally, and the contextual information outside the receptive field may be ignored during feature extraction. We propose to integrate the SE blocks into the dense unit to address this problem. Specifically, global pooling is first applied for each channel to extract the global structural information.

Figure 1: Comparison of dense units. (a) The basic dense unit; (b) The enhanced dense unit. The enhanced dense unit differs from the basic dense unit in eliminating the BN layers and utilizing the SE blocks.

Figure 2: Dense block in the DCRN. The dense block is composed of multiple cascaded dense units.

Figure 2 demonstrates the structure of the dense block in the DCRN. A dense block consists of several cascaded dense units which are concatenated with all other feed-forward dense units. Note that for a dense block which contains n dense units, the dense units are computed n(n+1)/2 times. Therefore, employing a very deep dense block with numerous dense units is time-consuming. We propose to use multiple cascaded dense blocks rather than a very deep block to ease the time complexity. Compared with the dense unit, more structural information is contained in the dense block. Accordingly, for each dense block, we employ an SE block to recalibrate the features and extract the contextual information.
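The channel recalibration performed by the SE block can be sketched as follows. This is a minimal NumPy illustration of the mechanism, not the authors' implementation; the bottleneck weight shapes `w1` and `w2` (and the implied reduction ratio) are assumptions for the sketch.

```python
import numpy as np

def se_recalibrate(features, w1, w2):
    """Squeeze-and-excitation over a (C, H, W) feature map.

    Global average pooling extracts one structural statistic per channel,
    a two-layer bottleneck (w1, w2) maps it to channel weights, and the
    sigmoid-scaled weights recalibrate the original features.
    """
    squeeze = features.mean(axis=(1, 2))            # global pooling -> (C,)
    hidden = np.maximum(w1 @ squeeze, 0.0)          # ReLU bottleneck, shape (C // r,)
    gates = 1.0 / (1.0 + np.exp(-(w2 @ hidden)))    # sigmoid channel weights -> (C,)
    return features * gates[:, None, None]          # channel-wise rescaling
```

With all-zero bottleneck weights every gate is sigmoid(0) = 0.5, so the block degenerates to uniform scaling; learned weights instead emphasize informative channels.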
We adopt the recursive structure [2] to control the parameters of the DCRN while increasing the network depth. Figure 3 illustrates the architecture of the recursive block in the DCRN. The recursive block contains several dense blocks, and the parameters of the dense blocks are shared. Note that we attach a transition layer after each dense block to keep the output compact. Without transition layers, the dimension of the DCRN would grow as the number of dense blocks increases. In particular, a multi-path structure is utilized to conduct local residual learning between the input feature and each transition feature. Semantic information is thus embedded into the recursive architecture.

Figure 3: Structure of the recursive block. N denotes the number of dense blocks in a recursive block. Deep features are extracted from the LR input, and a de-convolution layer is employed to obtain the residual image.
The output of the first convolution layer is formulated as:

    x0 = f0(x).    (1)

The output of the n-th transition layer under multi-path local residual learning is formulated as:

    xn = D(xn−1) = fn(Dn(xn−1)) + x0.    (2)

The output of the recursive block can be regarded as N-fold operations of local residual learning between the beginning convolution layer and the transition layer. The formula of the recursive block is as follows:

    xN = D(N)(x0),    (3)

where D(N) denotes N successive applications of the operation D.
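Iterating eq. (2) can be sketched as below. This is a toy sketch, not the paper's code: `dense_block` and `transition` stand in for Dn and fn, and passing one callable pair that is reused on every iteration mirrors the parameter sharing of the recursive block.

```python
def recursive_block(x0, dense_block, transition, n_blocks):
    """N-fold local residual learning, eq. (2): x_n = f(D(x_{n-1})) + x_0.

    The same dense_block / transition pair is reused in every iteration,
    so the effective depth grows while the parameter count stays fixed.
    """
    x = x0
    for _ in range(n_blocks):
        x = transition(dense_block(x)) + x0   # local residual with the first conv output
    return x
```

For example, with `dense_block = lambda v: 2 * v` and `transition = lambda v: v - 1`, starting from x0 = 1.0 the iterates are 2.0, 4.0, 8.0.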
Table 1: Details of the dense block (left) and the enhanced dense unit (right).

Dense block                     Dense unit
Dense Unit1   40 × 40 × 96      Conv1       40 × 40 × 128
Dense Unit2   40 × 40 × 128     SE Block1   40 × 40 × 128
Dense Unit3   40 × 40 × 160     Conv2       40 × 40 × 32
SE Block      40 × 40 × 160     SE Block2   40 × 40 × 32
Transition    40 × 40 × 64      Concat      40 × 40 × 96

Global residual learning has been proven effective for accelerating the optimization process of SISR [8]. For traditional residual learning (TRL) methods, the input image is first up-sampled through interpolation, and then the residual image is estimated over the interpolated input, which is computationally expensive. In contrast, the proposed approach extracts deep features from the LR input rather than the interpolated image. Then, a de-convolution layer is utilized to magnify the deep features and obtain the desired residual image.

Given a training set {xi, yi, x̂i}, i = 1, 2, ..., K, where xi is the LR input, yi is the ground truth, and x̂i is the interpolated image of xi, the residual image is defined as:

    ri = yi − x̂i.    (4)

The proposed de-convolution based residual learning method intends to minimize the distance between the de-convolution output and the residual image. The objective function can be formulated as:

    L = (1/2K) Σ_{i=1}^{K} ‖ri − Deconv(D(N)(xi))‖₂².    (5)
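The objective in eq. (5) is a plain mean of squared L2 distances between the predicted residuals and the true residuals. A minimal sketch follows; for brevity the de-convolution outputs are passed in precomputed, which is an assumption of the sketch rather than the authors' training loop.

```python
import numpy as np

def residual_loss(residuals, deconv_outputs):
    """Eq. (5): L = 1/(2K) * sum_i || r_i - Deconv(D^(N)(x_i)) ||_2^2."""
    K = len(residuals)
    return sum(float(np.sum((r - d) ** 2))
               for r, d in zip(residuals, deconv_outputs)) / (2.0 * K)
```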
4. Experiment
Datasets. The training set comprises 291 images, of which 91 samples are from Yang et al. [14] and the remaining 200 images are from the Berkeley Segmentation Dataset [15]. For testing, we use four popular SISR benchmarks, namely Set5 [16], Set14 [17], BSD100 [15], and Urban100 [18], which contain 5, 14, 100, and 100 testing images, respectively.
Table 2: Details of the DCRN (layer, input, input size, output size).

Layer          Input                Input size           Output size
Transit1       Dense Block1         40 × 40 × 160        40 × 40 × 64
Residual1      {Conv1, Transit1}    {40 × 40 × 64} × 2   40 × 40 × 64
Dense Block2   Residual1            40 × 40 × 64         40 × 40 × 160
Transit2       Dense Block2         40 × 40 × 160        40 × 40 × 64
Residual2      {Conv1, Transit2}    {40 × 40 × 64} × 2   40 × 40 × 64
Dense Block3   Residual2            40 × 40 × 64         40 × 40 × 160
Transit3       Dense Block3         40 × 40 × 160        40 × 40 × 64
Residual3      {Conv1, Transit3}    {40 × 40 × 64} × 2   40 × 40 × 64
Dense Block4   Residual3            40 × 40 × 64         40 × 40 × 160
Transit4       Dense Block4         40 × 40 × 160        40 × 40 × 64
Residual4      {Conv1, Transit4}    {40 × 40 × 64} × 2   40 × 40 × 64
Dense Block5   Residual4            40 × 40 × 64         40 × 40 × 160
Transition5    Dense Block5         40 × 40 × 160        40 × 40 × 64
Residual5      {Conv1, Transit5}    {40 × 40 × 64} × 2   40 × 40 × 64
Deconv1        Residual5            40 × 40 × 64         M × M × 1

Evaluation protocols. We compare the DCRN with the state-of-the-art approaches, including ScSR [14], A+ [19], SelfEx [18], SRCNN [12], VDSR [8], DRCN [9], and DRRN [2], by evaluating the reconstruction performance under three magnification factors (×2, ×3, ×4). SISR is performed in the YCbCr color space: the DCRN operates only on the luminance component of the LR image, while bi-cubic interpolation is applied to the other color components. The DCRN is compared with the other methods using both qualitative and quantitative measurements. Peak signal-to-noise ratio (PSNR) and structural similarity (SSIM) are computed to statistically evaluate the comparisons.
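Of the two metrics, PSNR has the simpler definition and can be computed as below (SSIM involves local luminance, contrast, and structure statistics and is omitted here). The peak value of 255 assumes 8-bit luminance, which is an assumption of this sketch.

```python
import numpy as np

def psnr(reference, reconstructed, peak=255.0):
    """Peak signal-to-noise ratio in dB between two images of equal shape."""
    mse = np.mean((np.asarray(reference, dtype=np.float64)
                   - np.asarray(reconstructed, dtype=np.float64)) ** 2)
    if mse == 0:
        return float("inf")                  # identical images
    return 10.0 * np.log10(peak ** 2 / mse)
```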
Implementation details. The details of the dense block and the dense unit are presented in Table 1, and the details of the DCRN are presented in Table 2. The DCRN contains five cascaded dense blocks, and each dense block consists of three enhanced dense units. M denotes the desired output size of the de-convolution layer.

As in [2], the training set is augmented by appending the flipped and rotated versions of the training images. We flip the training samples horizontally and rotate them by 90°, 180°, and 270° to generate seven additional versions of each image for training.

Figure 4: Reconstruction results with a scale factor of ×3. The images from top to bottom are from Set5, Set14, and BSD100, respectively. The DCRN generates sharp details and maintains the structure of the images.
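The flip-and-rotate augmentation described above yields exactly eight variants per image (the original plus seven); a minimal sketch:

```python
import numpy as np

def augment(image):
    """Return the 8 training variants: {identity, horizontal flip} x {0, 90, 180, 270} deg."""
    variants = []
    for base in (image, np.fliplr(image)):   # original and horizontally flipped
        for k in range(4):                   # rotations by k * 90 degrees
            variants.append(np.rot90(base, k))
    return variants
```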
The learning rate is halved every 30,000 iterations; the gradient clipping parameter is set as 0.08, the batch size as 32, the momentum parameter as 0.9, and the weight decay as 10⁻⁴.
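The step schedule (halving the rate every 30,000 iterations) can be written as a one-liner. The base learning rate below is a placeholder, since this version of the manuscript does not state it.

```python
def learning_rate(iteration, base_lr=0.1, halve_every=30000):
    """Step decay: the rate is halved once per `halve_every` iterations."""
    return base_lr * 0.5 ** (iteration // halve_every)
```

For example, `learning_rate(75000)` falls in the third step, i.e. `base_lr / 4`.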
Table 3: Quantitative comparisons (PSNR/SSIM) on Set14, BSD100, and Urban100 under three magnification factors.

Dataset    Scale  Bicubic        ScSR [14]      A+ [19]        SelfEx [18]    SRCNN [12]     VDSR [8]       DRCN [9]       DRRN [2]       DCRN (ours)
Set14      ×2     30.24/0.8688   31.64/0.8940   32.28/0.9056   32.22/0.9034   32.42/0.9063   33.03/0.9124   33.04/0.9118   33.19/0.9133   33.15/0.9133
           ×3     27.55/0.7742   28.19/0.7977   29.13/0.8188   29.16/0.8196   29.28/0.8209   29.77/0.8314   29.76/0.8311   29.94/0.8339   29.91/0.8339
           ×4     26.00/0.7027   26.40/0.7218   27.32/0.7491   27.40/0.7518   27.49/0.7503   28.01/0.7674   28.02/0.7670   28.18/0.7701   28.13/0.7698
BSD100     ×2     29.56/0.8431   30.77/0.8744   31.21/0.8863   31.18/0.8855   31.36/0.8879   31.90/0.8960   31.85/0.8942   32.01/0.8969   31.95/0.8965
           ×3     27.21/0.7385   27.72/0.7647   28.29/0.7835   28.29/0.7840   28.41/0.7863   28.82/0.7976   28.80/0.7963   28.91/0.7992   28.87/0.7994
           ×4     25.96/0.6675   26.61/0.6983   26.82/0.7087   26.84/0.7106   26.90/0.7101   27.29/0.7251   27.23/0.7233   27.35/0.7262   27.28/0.7257
Urban100   ×2     26.88/0.8403   28.26/0.8828   29.20/0.8938   29.54/0.8946   29.50/0.8946   30.76/0.9140   30.75/0.9133   31.02/0.9164   30.97/0.9161
           ×3     24.46/0.7349   25.69/0.7831   26.03/0.7973   26.24/0.7989   26.24/0.7989   27.14/0.8279   27.15/0.8276   27.38/0.8331   27.34/0.8329
           ×4     23.14/0.6577   24.02/0.7024   24.32/0.7183   24.79/0.7374   24.52/0.7221   25.18/0.7524   25.14/0.7510   25.35/0.7576   25.29/0.7572

Table 4: Comparison between the DRRN and the DCRN.

Method  PSNR (dB)  SSIM    Params (M)  Time (s)
DRRN    33.93      0.9234  10.7        0.25
DCRN    33.91      0.9232  2.8         0.15
Furthermore, the reconstructed details of the DCRN are much sharper and more precise than those of ScSR, SRCNN, and VDSR. Note that the reconstruction quality of the DCRN is competitive with the DRRN.

Figure 5: Overall performance on Set5 with a scale factor of ×3. DCRN outperforms other methods in balancing the performance and complexity.

Turning to the quantitative comparisons, the DCRN outperforms the ScSR, A+, SelfEx, SRCNN, VDSR, and DRCN for all benchmarks and magnification factors regarding both PSNR and SSIM. Note that the quantitative scores of the DCRN are very close to those of the DRRN, while the parameter count of the DCRN is only about a quarter of the DRRN (2.8M vs. 10.7M). Besides, the DRRN takes 0.25 seconds to process an image on Set5, while the DCRN takes only 0.15 seconds. Compared with the DRRN, the DCRN achieves a better trade-off between the performance and the complexity.
Table 5: Effectiveness of the SE block

Method              PSNR (dB)  SSIM    Params (M)  Time (s)
DCRN w/o SE block   33.87      0.9229  2.6         0.14
DCRN                33.91      0.9232  2.8         0.15

Table 6: Effectiveness of the recursive learning

Method              PSNR (dB)  SSIM    Params (M)  Time (s)
We evaluate the effects of the modifications brought by the enhanced dense unit. The baseline DenseNet achieves a PSNR of 33.65 dB and an SSIM of 0.9209, while the DCRN achieves a PSNR of 33.91 dB and an SSIM of 0.9232. Obviously, the reconstruction quality is remarkably improved by the enhanced dense unit. Further, the running time of the DCRN is smaller than that of the DenseNet (0.15s vs. 0.18s), which indicates that the DCRN is more efficient. Attaching the BN layers to the DCRN results in a PSNR of 33.78 dB and an SSIM of 0.9218, which is much lower than the proposed DCRN. Meanwhile, 40% additional computation time is brought by the BN layers (0.24s vs. 0.15s). Notably, employing the SE blocks results in better PSNR (33.91 dB vs. 33.87 dB) and SSIM (0.9232 vs. 0.9229) with little additional cost. The recursive structure also proves to be valid for promoting the PSNR (33.91 dB vs. 33.88 dB) and SSIM (0.9232 vs. 0.9229) of the DCRN. Note that the reconstruction quality of the de-convolution based residual learning method is similar to that of the traditional residual learning (TRL) approaches (PSNR: 33.91 dB vs. 33.93 dB, SSIM: 0.9232 vs. 0.9231), but the execution time is much smaller (0.15s vs. 0.91s). The proposed approach is six times faster than the traditional methods because the deep features are extracted over the LR image.

Figure 6: Analysis of the convergence: (a) The training process of DCRN and DenseNet; (b) The PSNR of the DCRN and DenseNet on the test set.
Figure 6(a) illustrates the training processes of the DCRN and DenseNet. The training losses of both the DCRN and DenseNet keep decreasing during training, which infers that the training processes are convergent. Obviously, the DCRN converges much faster than the DenseNet, with a smaller initial training loss. We can achieve a convergent model from the DCRN much more easily and quickly compared with the DenseNet.

Figure 6(b) illustrates the PSNR of the DCRN and DenseNet on the test set. The PSNR of both the DCRN and DenseNet keeps rising in the training process. In particular, the PSNR of the DCRN is higher than that of the DenseNet throughout the training process, indicating a better generalizability. With a favorable generalizability and a fast convergence speed, the DCRN can achieve a satisfactory reconstruction quality.

Figure 7: The PSNR with respect to the network depth.

The DCRN becomes more powerful as the number of dense blocks increases, which indicates that 'the deeper, the better' works for the proposed approach. Figure 7 visually illustrates this conclusion. However, the performance grows slowly when the depth further increases. Therefore, we adopt the DCRN with five dense blocks to balance the reconstruction performance and the computation complexity.
5. Conclusion

We propose an efficient network named the DCRN to cope with the SISR task. The DCRN introduces an enhanced dense unit by removing the BN layers and adopting the SE blocks. Besides, a recursive architecture is adopted to control the network parameters while increasing the network depth. Furthermore, we propose a de-convolution based residual learning approach to accelerate the execution speed.

Extensive experiments are conducted on the Set5, Set14, BSD100, and Urban100 datasets. The experimental results validate the efficiency of the DCRN. The effectiveness of the significant components of the DCRN is also examined in detail. With less storage complexity and computational consumption, the DCRN provides a more flexible option for practical applications.
Acknowledgment

References
[2] Y. Tai, J. Yang, X. Liu, Image super-resolution via deep recursive residual network, in: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition.

[4] Z. Feng, J. Lai, X. Xie, D. Yang, L. Mei, Face hallucination by deep traversal network, in: Proceedings of the International Conference on Pattern Recognition, 2016, pp. 3276–3281.

[6] G. Huang, Z. Liu, K. Q. Weinberger, L. van der Maaten, Densely connected convolutional networks, arXiv preprint arXiv:1608.06993.

[7] J. Hu, L. Shen, G. Sun, Squeeze-and-excitation networks, arXiv preprint arXiv:1709.01507.

[8] J. Kim, J. Kwon Lee, K. Mu Lee, Accurate image super-resolution using very deep convolutional networks, in: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2016, pp. 1646–1654.

[9] J. Kim, J. Kwon Lee, K. Mu Lee, Deeply-recursive convolutional network for image super-resolution, in: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2016, pp. 1637–1645.

[10] W.-S. Lai, J.-B. Huang, N. Ahuja, M.-H. Yang, Deep laplacian pyramid networks for fast and accurate super-resolution, arXiv preprint arXiv:1704.03915.

[11] B. Lim, S. Son, H. Kim, S. Nah, K. M. Lee, Enhanced deep residual networks for single image super-resolution, in: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops, 2017, pp. 136–144.

[13] K. He, X. Zhang, S. Ren, J. Sun, Deep residual learning for image recognition.

[16] M. Bevilacqua, A. Roumy, C. Guillemot, M. L. Alberi-Morel, Low-complexity single-image super-resolution based on nonnegative neighbor embedding, in: Proceedings of the British Machine Vision Conference, 2012.

[19] R. Timofte, V. De Smet, L. Van Gool, A+: Adjusted anchored neighborhood regression for fast super-resolution, in: Proceedings of the Asian Conference on Computer Vision.
Zhanxiang Feng received the B.E. degree in automation from Sun Yat-sen University, China, in 2012. He is currently pursuing the Ph.D. degree in information and communication engineering with Sun Yat-sen University, China. His research interests include person re-identification, face recognition, face hallucination, image super-resolution, and visual surveillance. He has authored papers in IEEE TIP and ICPR. His ICPR 2016 paper was a finalist for the best student paper award.
Jianhuang Lai received his M.Sc. degree in applied mathematics in 1989 and his Ph.D. degree in mathematics in 1999 from Sun Yat-sen University, China. He joined Sun Yat-sen University in 1989 as an Assistant Professor, where, currently, he is a Professor in the School of Data and Computer Science. His current research interests are in the areas of computer vision, pattern recognition and their applications. He has published over 250 scientific papers in international journals and conferences on image processing and pattern recognition, e.g. IEEE TPAMI, IEEE TNN, IEEE TIP, IEEE TSMC (Part B), Pattern Recognition, ICCV, CVPR and ICDM. Prof. Lai serves as a deputy director of the Image and Graphics Association of China and as a standing director of the Image and Graphics Association of Guangdong.
Xiaohua Xie received his degree in computing science and the Ph.D. degree in applied mathematics from Sun Yat-sen University, China, in 2007 and 2010, respectively. He has published papers in international journals and conferences. His current research fields cover image processing, computer vision, pattern recognition, and computer graphics.
Junyong Zhu received his M.S. and Ph.D. degrees in applied mathematics in the School of Mathematics and Computational Science from Sun Yat-sen University, Guangzhou, P. R. China, in 2010 and 2014, respectively. His research interests include learning with labeled or unlabeled auxiliary data and non-linear clustering. He has authored and co-authored papers in international journals and conferences such as IEEE TIFS, PR, ICIP, AMFG and ICDM. His cooperative ICDM 2010 paper won the Honorable Mention for Best Research Paper Award and his CCBR 2012 paper won the Best Student Paper Award.