
Communicated by Dr Chenchen Liu

Accepted Manuscript

Image Super-Resolution via a Densely Connected Recursive Network

Zhanxiang Feng, Jianhuang Lai, Xiaohua Xie, Junyong Zhu

PII: S0925-2312(18)30914-7
DOI: https://doi.org/10.1016/j.neucom.2018.07.076
Reference: NEUCOM 19830

To appear in: Neurocomputing

Received date: 28 December 2017


Revised date: 2 June 2018
Accepted date: 19 July 2018

Please cite this article as: Zhanxiang Feng, Jianhuang Lai, Xiaohua Xie, Junyong Zhu, Image
Super-Resolution via a Densely Connected Recursive Network, Neurocomputing (2018), doi:
https://doi.org/10.1016/j.neucom.2018.07.076

This is a PDF file of an unedited manuscript that has been accepted for publication. As a service
to our customers we are providing this early version of the manuscript. The manuscript will undergo
copyediting, typesetting, and review of the resulting proof before it is published in its final form. Please
note that during the production process errors may be discovered which could affect the content, and
all legal disclaimers that apply to the journal pertain.

Image Super-Resolution via a Densely Connected Recursive Network

Zhanxiang Feng (a), Jianhuang Lai (b,c,∗), Xiaohua Xie (b,c), Junyong Zhu (b,c)

(a) School of Electronics and Information Technology, Sun Yat-sen University, China
(b) School of Data and Computer Science, Sun Yat-sen University, China
(c) Guangdong Key Laboratory of Machine Intelligence and Advanced Computing, Ministry of Education, China

Abstract

Single-image super-resolution (SISR) techniques have been significantly promoted by deep networks. However, the storage and computation complexities of deep models increase dramatically along with the reconstruction performance. This paper proposes a densely connected recursive network (DCRN) to trade off performance and complexity. We introduce an enhanced dense unit by removing the batch normalization (BN) layers and employing the squeeze-and-excitation (SE) structure. A recursive architecture is also adopted to control the parameters of deep networks. Moreover, a de-convolution based residual learning method is proposed to accelerate the residual feature extraction process. The experimental results validate the efficiency of the proposed approach.
Keywords: Image Super-Resolution, Deep Learning, Enhanced Dense Unit, Recursive Structure, Residual Learning



∗ Corresponding author. Present address: School of Data and Computer Science, Sun Yat-sen University, Guangzhou Higher Education Mega Center, Panyu District, Guangzhou, Guangdong, 510006, P. R. China. Tel.: +86-13168313819. Fax: +86-20-84110175.
Email addresses: fengzhx@mail2.sysu.edu.cn (Zhanxiang Feng), stsljh@mail.sysu.edu.cn (Jianhuang Lai), xiexiaoh6@mail.sysu.edu.cn (Xiaohua Xie), zhujuny5@mail.sysu.edu.cn (Junyong Zhu)

1. Introduction

Single image super-resolution (SISR) refers to the process of recovering high-resolution (HR) images from low-resolution (LR) inputs. With the increasing ubiquity of closed-circuit cameras and remote communication techniques, SISR has attracted considerable research attention. The potential applications of SISR include intelligent surveillance systems, remote sensing, medical image enhancement, and telecommunication.

Deep network-based approaches [1, 2, 3, 4, 5] have been widely studied for the SISR task. Deep learning methods intend to reconstruct HR outputs with clear details by learning an end-to-end non-linear mapping between LR and HR images. Deep networks have significantly advanced the reconstruction performance over the existing literature. Increasing the network depth has proven effective for improving the performance of deep models. However, the computation and storage complexities increase rapidly with the network depth. Conducting SISR with a very deep network is costly for practical applications with limited storage space and computation resources. Therefore, balancing the performance and the complexity is of great importance for deep learning methods.
In this paper, a novel network named the densely connected recursive network (DCRN) is proposed to reconstruct high-quality images with fewer parameters and less computation time. The DCRN is derived from the DenseNet [6], forming an enhanced dense unit adapted to the SISR task. Specifically, the BN layers are removed from the traditional dense unit to avoid the smoothing operation and reduce the execution cost, while the SE blocks [7] are employed to learn globally structured features. Besides, we use a recursive architecture to keep the network compact while increasing the network depth. Furthermore, we propose a de-convolution based residual learning approach, which extracts deep features from the LR input and obtains the residual output through a de-convolution layer. The execution speed of residual image estimation is largely accelerated by the proposed approach.

In summary, this study makes the following contributions.

• We design an advanced dense unit to improve the efficiency and accuracy of densely-connected networks for SISR.

• We accelerate the residual learning by skillfully using a de-convolution layer.

• The experiments on Set5, Set14, BSD100, and Urban100 demonstrate that the DCRN outperforms the state-of-the-art methods in balancing the reconstruction accuracy, computation speed, and storage requirements.

2. Related Work

2.1. Deep Network-Based Methods for SISR

Due to its strength in semantic feature learning, deep learning is playing an increasingly significant role in the SISR task [1, 2, 8, 9, 10, 11, 12]. Dong et al. [1] pioneered a CNN model named the SRCNN that learns the non-linear mapping from LR image patches to HR image patches via three convolutional layers. Kim et al. [8] proposed a very deep network named the VDSR. The VDSR is the first work to successfully advance the performance of SISR by increasing the network depth. Kim et al. [9] also proposed a deeply-recursive convolutional network (DRCN) to reduce the parameters of neural networks. The DRCN introduces recursive-supervision and skip-connection techniques to optimize the parameters of a recursive model. Tai et al. [2] introduced a novel CNN model named the deep recursive residual network (DRRN). The DRRN uses both global residual learning and local residual learning techniques to optimize the network parameters under a multi-path recursive architecture. Note that for deep networks, the execution cost and the number of parameters increase dramatically along with the reconstruction quality and the network depth. Therefore, an essential concern of deep networks is to balance the representative ability, the computation complexity, and the storage consumption.

2.2. Research on Compact Deep Networks

Recently, researchers have focused on improving the performance of deep models while suppressing the number of parameters. He et al. [13] presented the ResNet, which adopts a residual learning structure to ease the optimization process of very deep networks. ResNet builds short connections between the convolution layers to strengthen the reuse of resources and reduce the storage complexity. Huang et al. [6] proposed the DenseNet to reduce the number of parameters. The layers in DenseNet are connected to every other feed-forward layer to avoid vanishing gradients, reuse the convolution layers, and enhance feature propagation. Hu et al. [7] proposed the SENet to explicitly model the weights of different channels and adaptively recalibrate the feature representations via re-scaled channel-wise responses. SENet has proved to be effective for improving the performance of deep networks with little additional computation cost.

3. Densely Connected Recursive Network

In this section, we describe the proposed DCRN approach. We first introduce the enhanced dense unit in the DCRN and then gradually present the details of the overall framework.

3.1. Enhanced Dense Unit


The DCRN presents an enhanced dense unit to improve the performance of densely-connected networks. Figure 1 compares the basic dense unit and the enhanced one.

Figure 1: Comparison of dense units. (a) The basic dense unit; (b) The enhanced dense unit. The enhanced dense unit differs from the basic dense unit in eliminating the BN layers and utilizing the SE blocks.

The enhanced dense unit makes three significant changes.

First, the BN layers are removed from the original dense unit. The BN layer is designed to normalize deep features, which is harmful to SISR because of its smoothing effect. Besides, employing the BN layers brings an additional computational burden. Therefore, removing the BN layers leads to better reconstruction quality and faster execution speed.

Second, an SE block is attached behind each convolution layer to enhance the representation power of the proposed model. Global structural information is significant for SISR. However, the filters of the convolution layers operate locally, and the contextual information outside the receptive field may be ignored during feature extraction. We propose to integrate the SE blocks into the dense unit to address this problem. Specifically, global pooling is first applied to each channel to extract the global structural information. Then fully-connected layers are utilized to emphasize the informative channels and suppress the useless channels. With SE blocks, the global information is exploited during feature extraction, which is beneficial for improving the consistency of the whole reconstructed image.

Finally, the convolution layers are organized in a residual manner. Local residual learning is conducted in the dense unit to extract semantic features.
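To make the design concrete, the following PyTorch sketch shows one possible implementation of the enhanced dense unit; it is an illustration rather than the authors' code. The 1×1 bottleneck followed by a 3×3 convolution, the growth rate of 32, the bottleneck width of 128 (taken from Table 1), the SE reduction ratio of 16, and the placement of the ReLU activations are assumptions, and the local residual connection inside the unit is omitted for brevity.

import torch
import torch.nn as nn

class SEBlock(nn.Module):
    # Squeeze-and-excitation: global pooling followed by two fully-connected
    # layers that produce per-channel weights used to recalibrate the features.
    def __init__(self, channels, reduction=16):
        super().__init__()
        self.pool = nn.AdaptiveAvgPool2d(1)
        self.fc = nn.Sequential(
            nn.Linear(channels, max(channels // reduction, 1)),
            nn.ReLU(inplace=True),
            nn.Linear(max(channels // reduction, 1), channels),
            nn.Sigmoid(),
        )

    def forward(self, x):
        b, c, _, _ = x.shape
        w = self.fc(self.pool(x).view(b, c)).view(b, c, 1, 1)
        return x * w  # emphasize informative channels, suppress useless ones

class EnhancedDenseUnit(nn.Module):
    # Dense unit without BN layers; an SE block follows each convolution.
    def __init__(self, in_channels, growth_rate=32, bottleneck=128):
        super().__init__()
        self.conv1 = nn.Conv2d(in_channels, bottleneck, kernel_size=1)
        self.se1 = SEBlock(bottleneck)
        self.conv2 = nn.Conv2d(bottleneck, growth_rate, kernel_size=3, padding=1)
        self.se2 = SEBlock(growth_rate)
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):
        out = self.se1(self.relu(self.conv1(x)))
        out = self.se2(self.relu(self.conv2(out)))
        return torch.cat([x, out], dim=1)  # dense connectivity: input plus newly produced features

With a 64-channel input, this sketch reproduces the 64 → 128 → 32 → 96 channel trace listed for the enhanced dense unit in Table 1.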

3.2. Dense Block

Figure 2: Dense block in the DCRN. The dense block is composed of multiple cascaded dense units and an SE block.

Figure 3: Structure of the recursive block. N denotes the number of dense blocks in a recursive block. Deep features are extracted from the LR input, and a de-convolution layer is employed to obtain the residual image.

Figure 2 demonstrates the structure of the dense block in the DCRN. A dense block consists of several cascaded dense units, each of which is concatenated with all other feed-forward dense units. Note that for a dense block that contains n dense units, the dense units are computed n(n+1)/2 times. Therefore, employing a very deep dense block with numerous dense units is time-consuming. We propose to use multiple cascaded dense blocks rather than a very deep block to ease the time complexity. Compared with the dense unit, more structural information is contained in the dense block. Accordingly, for each dense block, we employ an SE block to recalibrate the features and extract the contextual information.
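As an illustration (not the authors' code), a dense block of this kind could be assembled as follows. Each dense unit is reduced here to a single 3×3 convolution with ReLU, whereas in the DCRN it would be the enhanced dense unit of Section 3.1, and the SE reduction ratio of 16 is an assumption.

import torch
import torch.nn as nn

class DenseBlock(nn.Module):
    def __init__(self, in_channels=64, growth_rate=32, num_units=3):
        super().__init__()
        # cascaded dense units: unit i sees the input plus all previously produced features
        self.units = nn.ModuleList([
            nn.Sequential(
                nn.Conv2d(in_channels + i * growth_rate, growth_rate, 3, padding=1),
                nn.ReLU(inplace=True),
            )
            for i in range(num_units)
        ])
        out_channels = in_channels + num_units * growth_rate  # 64 + 3 * 32 = 160, as in Table 1
        # block-level squeeze-and-excitation over the concatenated features
        self.pool = nn.AdaptiveAvgPool2d(1)
        self.fc = nn.Sequential(
            nn.Linear(out_channels, out_channels // 16),
            nn.ReLU(inplace=True),
            nn.Linear(out_channels // 16, out_channels),
            nn.Sigmoid(),
        )

    def forward(self, x):
        for unit in self.units:
            x = torch.cat([x, unit(x)], dim=1)  # concatenate with all preceding feature maps
        b, c, _, _ = x.shape
        w = self.fc(self.pool(x).view(b, c)).view(b, c, 1, 1)
        return x * w  # recalibrated 160-channel output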

3.3. Recursive Learning


We adopt the recursive structure [2] to control the parameters of the DCRN while increasing the network depth. Figure 3 illustrates the architecture of the recursive block in the DCRN. The recursive block contains several dense blocks, and the parameters of the dense blocks are shared. Note that we attach a transition layer after each dense block to keep the output compact. Without transition layers, the dimension of the DCRN would grow as the number of dense blocks increases. In particular, a multi-path structure is utilized to conduct local residual learning between the input feature and each transition feature. Semantic information is thus embedded into the recursive architecture.


Denote N as the number of dense blocks in the recursive block, x as the LR input, x_{n-1} and x_n (n = 1, 2, ..., N) as the input and output of the n-th dense block, respectively, f_n as the n-th transition function, and D_n as the function of the n-th dense block. The output of the beginning convolution layer can be formulated as:

x_0 = f_0(x).    (1)

The output of the n-th transition layer under multi-path local residual learning is formulated as:

x_n = D(x_{n-1}) = f_n(D_n(x_{n-1})) + x_0.    (2)

The output of the recursive block can be regarded as N-fold operations of local residual learning between the beginning convolution layer and the transition layer. The formula of the recursive block is as follows:

x_N = D^{(N)}(x_0) = D(D(...(D(x_0)))).    (3)
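A minimal PyTorch sketch of Eqs. (1)–(3) might look as follows; it is illustrative only. The dense block is abbreviated to a plain convolution pair, the transition layer is taken to be a 1×1 convolution, and the remaining hyper-parameters are assumptions. The key point is that the same dense block D and transition f are reused for every recursion, so the parameter count does not grow with N.

import torch.nn as nn

class RecursiveBlock(nn.Module):
    def __init__(self, channels=64, block_channels=160, num_recursions=5):
        super().__init__()
        self.num_recursions = num_recursions                 # N dense blocks with shared weights
        self.f0 = nn.Conv2d(1, channels, 3, padding=1)       # beginning convolution: x0 = f0(x)
        self.dense = nn.Sequential(                          # stand-in for the dense block D
            nn.Conv2d(channels, block_channels, 3, padding=1),
            nn.ReLU(inplace=True),
        )
        self.transition = nn.Conv2d(block_channels, channels, 1)  # keeps the output compact (160 -> 64)

    def forward(self, x):
        x0 = self.f0(x)
        xn = x0
        for _ in range(self.num_recursions):
            xn = self.transition(self.dense(xn)) + x0         # Eq. (2): xn = f(D(x_{n-1})) + x0
        return xn                                             # Eq. (3): N-fold local residual learning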



3.4. De-convolution Based Residual Learning


Global residual learning has proven effective for accelerating the optimization process of SISR [8]. In traditional residual learning (TRL) methods, the input image is first up-sampled through interpolation, and then residual learning is implemented over the interpolated image. Nevertheless, extracting deep features over the interpolated image is time-consuming, and the computational complexity increases quadratically with the input image size and the magnification factor. For example, with a magnification factor of s the interpolated image contains s^2 times as many pixels as the LR input, so every convolution layer costs roughly s^2 times more (about 9 times more for ×3).

To accelerate the feature extraction process, we propose to extract deep features from the LR input rather than the interpolated image. Then, a de-convolution layer is utilized to magnify the deep features and obtain the desired residual image.
Table 1: Details of the dense block and enhanced dense unit

Dense Block:
Layer Name | Output Size
Input | 40 × 40 × 64
Dense Unit1 | 40 × 40 × 96
Dense Unit2 | 40 × 40 × 128
Dense Unit3 | 40 × 40 × 160
SE Block | 40 × 40 × 160
Transition | 40 × 40 × 64

Enhanced Dense Unit:
Layer Name | Output Size
Input | 40 × 40 × 64
Conv1 | 40 × 40 × 128
SE Block1 | 40 × 40 × 128
Conv2 | 40 × 40 × 32
SE Block2 | 40 × 40 × 32
Concat | 40 × 40 × 96

Given a training set {x_i, y_i, x̂_i}, i = 1, 2, ..., K, where x_i is the LR input, y_i is the ground truth, and x̂_i is the interpolated image of x_i, the residual image r_i can be computed by:

r_i = y_i − x̂_i.    (4)

The proposed de-convolution based residual learning method intends to minimize the distance between the de-convolution output and the residual image. The objective function can be formulated as:

L = (1 / 2K) Σ_{i=1}^{K} || r_i − Deconv(D^{(N)}(x_i)) ||_2^2,    (5)

where Deconv(·) denotes the de-convolution function.
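As a sketch of how Eqs. (4) and (5) could be evaluated for one mini-batch (an illustration under assumed hyper-parameters, not the authors' code): the deep features come from the recursive block applied to the LR input, the de-convolution is a transposed convolution whose stride equals the magnification factor, and the interpolated image is obtained with bi-cubic resampling.

import torch.nn as nn
import torch.nn.functional as F

scale = 3
# kernel size and padding are assumptions chosen so the output is exactly `scale` times larger
deconv = nn.ConvTranspose2d(64, 1, kernel_size=3 * scale, stride=scale, padding=scale)

def residual_loss(features, lr, hr):
    # features: (B, 64, h, w) deep features of the LR batch; lr: (B, 1, h, w); hr: (B, 1, H, W)
    upsampled = F.interpolate(lr, size=hr.shape[-2:], mode='bicubic', align_corners=False)
    residual = hr - upsampled                      # Eq. (4): r = y - x_hat
    predicted = deconv(features)                   # magnify the deep features to the HR grid
    # Eq. (5) up to a constant factor: mse_loss also averages over pixels, not only over the K images
    return 0.5 * F.mse_loss(predicted, residual)

# at test time the HR estimate is the interpolated image plus the predicted residual:
# sr = upsampled + deconv(features)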

4. Experiment

4.1. Experimental Settings

Datasets. Following VDSR [8], our training set comprises 291 images, of which 91 samples are from Yang et al. [14] and the remaining 200 images are from the Berkeley Segmentation Dataset [15]. For testing, we use four popular SISR benchmarks, namely Set5 [16], Set14 [17], BSD100 [15], and Urban100 [18], which contain 5, 14, 100, and 100 testing images, respectively.
Table 2: Details of the DCRN

Layer Name | Input Layer | Input Size | Output Size
Conv1 | Data | 40 × 40 × 1 | 40 × 40 × 64
Dense Block1 | Conv1 | 40 × 40 × 64 | 40 × 40 × 160
Transit1 | Dense Block1 | 40 × 40 × 160 | 40 × 40 × 64
Residual1 | {Conv1, Transit1} | {40 × 40 × 64} × 2 | 40 × 40 × 64
Dense Block2 | Residual1 | 40 × 40 × 64 | 40 × 40 × 160
Transit2 | Dense Block2 | 40 × 40 × 160 | 40 × 40 × 64
Residual2 | {Conv1, Transit2} | {40 × 40 × 64} × 2 | 40 × 40 × 64
Dense Block3 | Residual2 | 40 × 40 × 64 | 40 × 40 × 160
Transit3 | Dense Block3 | 40 × 40 × 160 | 40 × 40 × 64
Residual3 | {Conv1, Transit3} | {40 × 40 × 64} × 2 | 40 × 40 × 64
Dense Block4 | Residual3 | 40 × 40 × 64 | 40 × 40 × 160
Transit4 | Dense Block4 | 40 × 40 × 160 | 40 × 40 × 64
Residual4 | {Conv1, Transit4} | {40 × 40 × 64} × 2 | 40 × 40 × 64
Dense Block5 | Residual4 | 40 × 40 × 64 | 40 × 40 × 160
Transit5 | Dense Block5 | 40 × 40 × 160 | 40 × 40 × 64
Residual5 | {Conv1, Transit5} | {40 × 40 × 64} × 2 | 40 × 40 × 64
Deconv1 | Residual5 | 40 × 40 × 64 | M × M × 1

Evaluation protocols. We compare the DCRN with the state-of-the-art approaches, including ScSR [14], A+ [19], SelfEx [18], SRCNN [12], VDSR [8], DRCN [9], and DRRN [2], by evaluating the reconstruction performance under three magnification factors (×2, ×3, ×4). SISR is performed in the YCbCr color space: the DCRN is applied only to the luminance component of the LR image, while bi-cubic interpolation is applied to the other color components. The DCRN is compared with the other methods using both qualitative and quantitative measurements. Peak signal-to-noise ratio (PSNR) and structural similarity (SSIM) are computed to statistically evaluate the comparisons.
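For reference, PSNR on the luminance channel can be computed as below; this is a generic sketch of the standard protocol rather than the exact evaluation script, and the BT.601 conversion and the cropping of `scale` pixels from each border are assumptions.

import numpy as np

def rgb_to_y(img):
    # ITU-R BT.601 luma of an 8-bit RGB image of shape (H, W, 3), in the range [16, 235]
    img = img.astype(np.float64)
    return 16.0 + (65.481 * img[..., 0] + 128.553 * img[..., 1] + 24.966 * img[..., 2]) / 255.0

def psnr_y(reference, estimate, scale=3):
    # PSNR in dB on the Y channel, cropping `scale` pixels from each border
    ref = rgb_to_y(reference)[scale:-scale, scale:-scale]
    est = rgb_to_y(estimate)[scale:-scale, scale:-scale]
    mse = np.mean((ref - est) ** 2)
    return 10.0 * np.log10(255.0 ** 2 / mse)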
Implementation details. The details of the dense block and dense unit are presented in Table 1, and the details of the DCRN are presented in Table 2. The DCRN contains five cascaded dense blocks, while each dense block consists of three enhanced dense units. M denotes the desired output size of the de-convolution layer.
Figure 4: Reconstruction results with a scale factor of ×3. The images from top to bottom are from Set5, Set14, and BSD100, respectively. The DCRN generates sharp details and maintains the structure of the images.

As in [2], the training set is augmented by appending the flipped and rotated versions of the training images. We flip the training samples horizontally and rotate them by 90°, 180°, and 270° to generate seven additional versions of each image for training.
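A sketch of this augmentation (an illustration, not the authors' code) is given below: combining a horizontal flip with rotations of 0°, 90°, 180°, and 270° yields the original patch plus seven transformed copies.

import numpy as np

def augment(patch):
    # patch: 2-D numpy array (e.g. a luminance training patch); returns 8 variants
    variants = []
    for flip in (False, True):
        p = np.fliplr(patch) if flip else patch
        for k in range(4):                    # rotations by 0, 90, 180, 270 degrees
            variants.append(np.rot90(p, k))
    return variants                           # 2 flips x 4 rotations = 8 images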
Experiments are conducted on a Titan X GPU. During training, we set the patch size of the LR samples to 40 × 40, the learning rate to 0.8 (halved every 30,000 iterations), the clipping parameter to 0.08, the batch size to 32, the momentum parameter to 0.9, and the weight decay to 10^-4.
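Expressed with a PyTorch optimizer, this schedule might look as follows. This is a sketch under assumptions: `model` is a placeholder for the DCRN, and interpreting the clipping parameter of 0.08 as a gradient-clipping threshold (in the spirit of VDSR's adjustable gradient clipping) is our reading rather than a statement from the paper.

import torch
import torch.nn as nn

model = nn.Conv2d(1, 64, 3, padding=1)   # placeholder for the DCRN
optimizer = torch.optim.SGD(model.parameters(), lr=0.8, momentum=0.9, weight_decay=1e-4)
# halve the learning rate every 30,000 iterations (the scheduler is stepped once per iteration)
scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=30000, gamma=0.5)

def train_step(loss):
    optimizer.zero_grad()
    loss.backward()
    nn.utils.clip_grad_norm_(model.parameters(), max_norm=0.08)  # clipping parameter 0.08
    optimizer.step()
    scheduler.step()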

4.2. Comparison Results

The experimental results of the qualitative and quantitative comparisons are presented in this section. We compare the DCRN with the state-of-the-art methods under the same setting. The best result is highlighted in bold.


Table 3: Evaluation of the overall framework (PSNR/SSIM)

Dataset | Scale | Bicubic | ScSR [14] | A+ [19] | SelfEx [18] | SRCNN [12] | VDSR [8] | DRCN [9] | DRRN [2] | DCRN (ours)
Set5 | ×2 | 33.66/0.9299 | 35.78/0.9485 | 36.54/0.9544 | 36.49/0.9537 | 36.66/0.9542 | 37.53/0.9587 | 37.63/0.9588 | 37.66/0.9589 | 37.68/0.9592
Set5 | ×3 | 30.39/0.8682 | 31.34/0.8869 | 32.58/0.9088 | 32.58/0.9093 | 32.75/0.9090 | 33.66/0.9213 | 33.82/0.9226 | 33.93/0.9234 | 33.91/0.9232
Set5 | ×4 | 28.42/0.8104 | 29.07/0.8263 | 30.28/0.8603 | 30.31/0.8619 | 30.48/0.8628 | 31.35/0.8838 | 31.53/0.8854 | 31.58/0.8864 | 31.55/0.8860
Set14 | ×2 | 30.24/0.8688 | 31.64/0.8940 | 32.28/0.9056 | 32.22/0.9034 | 32.42/0.9063 | 33.03/0.9124 | 33.04/0.9118 | 33.19/0.9133 | 33.15/0.9133
Set14 | ×3 | 27.55/0.7742 | 28.19/0.7977 | 29.13/0.8188 | 29.16/0.8196 | 29.28/0.8209 | 29.77/0.8314 | 29.76/0.8311 | 29.94/0.8339 | 29.91/0.8339
Set14 | ×4 | 26.00/0.7027 | 26.40/0.7218 | 27.32/0.7491 | 27.40/0.7518 | 27.49/0.7503 | 28.01/0.7674 | 28.02/0.7670 | 28.18/0.7701 | 28.13/0.7698
BSD100 | ×2 | 29.56/0.8431 | 30.77/0.8744 | 31.21/0.8863 | 31.18/0.8855 | 31.36/0.8879 | 31.90/0.8960 | 31.85/0.8942 | 32.01/0.8969 | 31.95/0.8965
BSD100 | ×3 | 27.21/0.7385 | 27.72/0.7647 | 28.29/0.7835 | 28.29/0.7840 | 28.41/0.7863 | 28.82/0.7976 | 28.80/0.7963 | 28.91/0.7992 | 28.87/0.7994
BSD100 | ×4 | 25.96/0.6675 | 26.61/0.6983 | 26.82/0.7087 | 26.84/0.7106 | 26.90/0.7101 | 27.29/0.7251 | 27.23/0.7233 | 27.35/0.7262 | 27.28/0.7257
Urban100 | ×2 | 26.88/0.8403 | 28.26/0.8828 | 29.20/0.8938 | 29.54/0.8946 | 29.50/0.8946 | 30.76/0.9140 | 30.75/0.9133 | 31.02/0.9164 | 30.97/0.9161
Urban100 | ×3 | 24.46/0.7349 | 25.69/0.7831 | 26.03/0.7973 | 26.24/0.7989 | 26.24/0.7989 | 27.14/0.8279 | 27.15/0.8276 | 27.38/0.8331 | 27.34/0.8329
Urban100 | ×4 | 23.14/0.6577 | 24.02/0.7024 | 24.32/0.7183 | 24.79/0.7374 | 24.52/0.7221 | 25.18/0.7524 | 25.14/0.7510 | 25.35/0.7576 | 25.29/0.7572

Table 4: Comparison of the DRRN and DCRN

Method | PSNR (dB) | SSIM | Params (M) | Time (s)
DRRN | 33.93 | 0.9234 | 10.7 | 0.25
DCRN | 33.91 | 0.9232 | 2.8 | 0.15

Qualitative comparisons. Figure 4 illustrates the reconstruction results of the DCRN and the other methods with a scale factor of ×3 on the Set5, Set14, and BSD100 datasets, respectively. The traditional method (ScSR) generates visible artifacts, while the deep models suppress the visual artifacts effectively. Furthermore, the reconstructed details of the DCRN are much sharper and more precise than those of ScSR, SRCNN, and VDSR. Note that the reconstruction quality of the DCRN is competitive with that of the DRRN; judging the performance of DRRN and DCRN merely by visual quality is very difficult.

Quantitative comparisons. Table 3 illustrates the results of the quanti-
CE

tative comparisons. The DCRN outperforms the ScSR, A+, SelfEx, SRCNN,
VDSR, and DRCN for all benchmarks and magnification factors regarding both
175 PSNR and SSIM. Note that the quantitative scores of DCRN are very close to
AC

DRRN considering the reconstruction quality. For a scale factor of ×3 on Set5,


the PSNR/SSIM of DRRN is 33.93dB/0.9234, while the PSNR/SSIM of DCRN
is 33.91dB/0.9232. We conduct another experiment on Set5 with a magnifica-
tion factor of ×3 to compare the efficiency between DCRN and DRRN. Table 4
180 shows the comparisons of DCRN and DRRN with respect to the reconstruction

11
ACCEPTED MANUSCRIPT

T
IP
CR
US
AN
Figure 5: Overall performance on Set5 with a scale factor ×3. DCRN outperforms
other methods in balancing the performance and complexity.
M

performance, number of parameters, and executive time. With similar recon-


struction performance, the number of parameters required for DCRN is about
ED

a quarter of the DRRN (2.8M vs. 10.7M). Besides, the DRRN takes 0.25 sec-
onds to process an image on Set5, while the DCRN takes only 0.15 seconds.
Compared with the DRRN, the DCRN achieves a better trade-off between the
PT

185

reconstruction performance, storage consumption, and computation complexity.


Figure 5 visualizes the overall performance of DCRN and other models. As
CE

shown, the DCRN achieves excellent reconstruction quality without requiring


many parameters, and outperforms the state-of-the-art methods with respect
190 to the trade-off between all metrics.

4.3. In-depth Analysis


In this section, we conduct extensive ablation experiments to evaluate the effectiveness of the important components of the proposed approach. The ablation experiments are performed on Set5 using a magnification factor of ×3. The effect of each component is evaluated using the reconstruction statistics, model size, and running time.

Table 5: Effectiveness of the enhanced dense unit

Method | PSNR (dB) | SSIM | Params (M) | Time (s)
DenseNet | 33.65 | 0.9209 | 2.0 | 0.18
DCRN with BN | 33.78 | 0.9218 | 2.8 | 0.24
DCRN w/o SE block | 33.87 | 0.9229 | 2.6 | 0.14
DCRN | 33.91 | 0.9232 | 2.8 | 0.15

Table 6: Effectiveness of the recursive learning

Method | PSNR (dB) | SSIM | Params (M) | Time (s)
DCRN w/o transition | 33.86 | 0.9228 | 2.7 | 0.20
DCRN w/o multi-path | 33.88 | 0.9229 | 2.8 | 0.15
DCRN with TRL | 33.93 | 0.9231 | 2.8 | 0.91
DCRN | 33.91 | 0.9232 | 2.8 | 0.15
Effectiveness of the enhanced dense unit. Table 5 demonstrates the effects of the modifications brought by the enhanced dense unit. The baseline DenseNet achieves a PSNR of 33.65 dB and an SSIM of 0.9209, while the DCRN achieves a PSNR of 33.91 dB and an SSIM of 0.9232. Obviously, the reconstruction quality is remarkably improved by the enhanced dense unit. Further, the running time of the DCRN is shorter than that of the DenseNet (0.15s vs. 0.18s), which indicates that the DCRN is more efficient. Attaching the BN layers to the DCRN results in a PSNR of 33.78 dB and an SSIM of 0.9218, which are much lower than those of the proposed DCRN. Meanwhile, the BN layers bring about 40% additional computation time (0.24s vs. 0.15s). Notably, employing the SE blocks results in better PSNR (33.91 dB vs. 33.87 dB) and SSIM (0.9232 vs. 0.9229), at the cost of a slight increase in execution time (0.01s).


Effectiveness of the recursive learning architecture. Table 6 demonstrates the effects of the recursive learning architecture. Employing the transition layers leads to a higher PSNR (33.91 dB vs. 33.86 dB), a higher SSIM (0.9232 vs. 0.9228), and less computation time (0.15s vs. 0.20s). The multi-path learning structure also proves valid for promoting the PSNR (33.91 dB vs. 33.88 dB) and SSIM (0.9232 vs. 0.9229) of the DCRN. Note that the reconstruction quality of the de-convolution based residual learning method is similar to that of the traditional residual learning (TRL) approach (PSNR: 33.91 dB vs. 33.93 dB, SSIM: 0.9232 vs. 0.9231), but the execution time is much smaller (0.15s vs. 0.91s). The proposed approach is six times faster than the traditional method because the deep features are extracted over the LR image.

Figure 6: Analysis of the convergence. (a) The training process of the DCRN and DenseNet; (b) The PSNR of the DCRN and DenseNet on the test set.
Analysis of the convergence. Figure 6 demonstrates the convergence of the DCRN and DenseNet. Figure 6(a) shows the training process of the DCRN and DenseNet. The training losses of both the DCRN and DenseNet keep decreasing during training, which indicates that the training processes converge. Obviously, the DCRN converges much faster than the DenseNet, with a smaller initial training loss. We can obtain a convergent model from the DCRN much more easily and quickly than from the DenseNet.

Figure 6(b) illustrates the PSNR of the DCRN and DenseNet on the test set. The PSNR of both the DCRN and DenseNet keeps rising during the training process. In particular, the PSNR of the DCRN is higher than that of the DenseNet throughout the training process, indicating better generalizability. With favorable generalizability and a fast convergence speed, the DCRN can achieve a satisfactory PSNR on the test set within a few iterations.

Figure 7: The PSNR with respect to the network depth.

Table 7: Evaluation of the network depth

Measure / Number of blocks | 1 | 2 | 3 | 5 | 7 | 9
PSNR (dB) | 33.52 | 33.75 | 33.84 | 33.91 | 33.95 | 33.97
SSIM | 0.9199 | 0.9216 | 0.9226 | 0.9232 | 0.9235 | 0.9240
Time (s) | 0.04 | 0.07 | 0.09 | 0.15 | 0.22 | 0.29


Analysis of the network depth. Table 7 presents the relationship between the reconstruction performance and the network depth. The DCRN becomes more powerful as the number of dense blocks increases, which indicates that 'the deeper, the better' holds for the proposed approach. Figure 7 visually illustrates this conclusion. However, the performance grows slowly when the number of dense blocks exceeds five. In addition, the computational complexity increases dramatically with increasing network depth. Therefore, we adopt the DCRN with five dense blocks to balance the reconstruction performance and the computation complexity.


5. Conclusion

We propose an efficient network named the DCRN to cope with the SISR task. The DCRN introduces an enhanced dense unit by removing the BN layers and adopting the SE blocks. Besides, a recursive architecture is adopted to control the parameters of the dense blocks. Notably, we propose a de-convolution based residual learning approach to accelerate the execution speed.

Extensive experiments are conducted on the Set5, Set14, BSD100, and Urban100 datasets. The experimental results validate the efficiency of the DCRN. The effectiveness of the significant components of the DCRN is also demonstrated in detail. With lower storage complexity and computational consumption, the DCRN provides a more flexible option for practical applications.
Acknowledgment

This project was supported by the NSFC (U1611461, 61573387).



References

[1] C. Dong, C. C. Loy, K. He, X. Tang, Learning a deep convolutional network for image super-resolution, in: Proceedings of the European Conference on Computer Vision, 2014, pp. 184–199.

[2] Y. Tai, J. Yang, X. Liu, Image super-resolution via deep recursive residual network, in: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2017, pp. 3147–3155.

[3] C. Dong, C. C. Loy, X. Tang, Accelerating the super-resolution convolutional neural network, in: Proceedings of the European Conference on Computer Vision, 2016, pp. 391–407.

[4] Z. Feng, J. Lai, X. Xie, D. Yang, L. Mei, Face hallucination by deep traversal network, in: Proceedings of the International Conference on Pattern Recognition, 2016, pp. 3276–3281.

[5] Q. Cao, L. Lin, Y. Shi, X. Liang, G. Li, Attention-aware face hallucination via deep reinforcement learning, arXiv preprint arXiv:1708.03132.

[6] G. Huang, Z. Liu, K. Q. Weinberger, L. van der Maaten, Densely connected convolutional networks, arXiv preprint arXiv:1608.06993.

[7] J. Hu, L. Shen, G. Sun, Squeeze-and-excitation networks, arXiv preprint arXiv:1709.01507.

[8] J. Kim, J. Kwon Lee, K. Mu Lee, Accurate image super-resolution using very deep convolutional networks, in: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2016, pp. 1646–1654.

[9] J. Kim, J. Kwon Lee, K. Mu Lee, Deeply-recursive convolutional network for image super-resolution, in: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2016, pp. 1637–1645.

[10] W.-S. Lai, J.-B. Huang, N. Ahuja, M.-H. Yang, Deep Laplacian pyramid networks for fast and accurate super-resolution, arXiv preprint arXiv:1704.03915.

[11] B. Lim, S. Son, H. Kim, S. Nah, K. M. Lee, Enhanced deep residual networks for single image super-resolution, in: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops, 2017, pp. 136–144.

[12] C. Dong, C. C. Loy, K. He, X. Tang, Image super-resolution using deep convolutional networks, IEEE Transactions on Pattern Analysis and Machine Intelligence 38 (2) (2016) 295–307.

[13] K. He, X. Zhang, S. Ren, J. Sun, Deep residual learning for image recognition, in: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2016, pp. 770–778.

[14] J. Yang, J. Wright, T. S. Huang, Y. Ma, Image super-resolution via sparse representation, IEEE Transactions on Image Processing 19 (11) (2010) 2861–2873.

[15] D. Martin, C. Fowlkes, D. Tal, J. Malik, A database of human segmented natural images and its application to evaluating segmentation algorithms and measuring ecological statistics, in: Proceedings of the International Conference on Computer Vision, 2001, pp. 416–423.

[16] M. Bevilacqua, A. Roumy, C. Guillemot, M. L. Alberi-Morel, Low-complexity single-image super-resolution based on nonnegative neighbor embedding, in: Proceedings of the British Machine Vision Conference, 2012.

[17] R. Zeyde, M. Elad, M. Protter, On single image scale-up using sparse-representations, in: Proceedings of the International Conference on Curves and Surfaces, 2010, pp. 711–730.

[18] J.-B. Huang, A. Singh, N. Ahuja, Single image super-resolution from transformed self-exemplars, in: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2015, pp. 5197–5206.

[19] R. Timofte, V. De Smet, L. Van Gool, A+: Adjusted anchored neighborhood regression for fast super-resolution, in: Proceedings of the Asian Conference on Computer Vision, 2014, pp. 111–126.




Zhanxiang Feng received the B.E. degree in automation from Sun Yat-sen University, China, in 2012. He is currently pursuing the Ph.D. degree in information and communication engineering with Sun Yat-sen University, China. His research interests include person re-identification, face recognition, face hallucination, image super-resolution, and visual surveillance. He has authored papers in IEEE TIP and ICPR. His ICPR 2016 paper was a finalist for the best student paper award.

Jianhuang Lai received his M.Sc. degree in applied mathematics in 1989 and his Ph.D. in mathematics in 1999 from Sun Yat-sen University, China. He joined Sun Yat-sen University in 1989 as an Assistant Professor, where currently he is a Professor in the School of Data and Computer Science. His current research interests are in the areas of computer vision, pattern recognition and their applications. He has published over 250 scientific papers in international journals and conferences on image processing and pattern recognition, e.g. IEEE TPAMI, IEEE TNN, IEEE TIP, IEEE TSMC (Part B), Pattern Recognition, ICCV, CVPR and ICDM. Prof. Lai serves as a deputy director of the Image and Graphics Association of China and also serves as a standing director of the Image and Graphics Association of Guangdong. He is also the deputy director of the Computer Vision Committee, China Computer Federation (CCF).

Xiaohua Xie received the B.S. degree in mathematics and applied mathematics from Shantou University in 2005, and the M.S. degree in information and computing science and the Ph.D. degree in applied mathematics from Sun Yat-sen University, China, in 2007 and 2010, respectively. He was an Associate Professor with the Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences. He is currently a Research Professor with Sun Yat-sen University. He has authored or co-authored over 30 papers in prestigious international journals and conferences. His current research fields cover image processing, computer vision, pattern recognition, and computer graphics.

Junyong Zhu received his M.S. and Ph.D. degrees in applied mathematics from the School of Mathematics and Computational Science, Sun Yat-sen University, Guangzhou, P. R. China, in 2010 and 2014, respectively. He carried out post-doctoral research in the Department of Information Science and Technology, Sun Yat-sen University. Currently, he is an Associate Research Professor with Sun Yat-sen University. His current research interests include heterogeneous face recognition, visual transfer learning using partially labeled or unlabeled auxiliary data, and non-linear clustering. He has authored and co-authored papers in international journals and conferences such as IEEE TIFS, PR, ICIP, AMFG and ICDM. His cooperative ICDM 2010 paper won the Honorable Mention for Best Research Paper Award and his CCBR 2012 paper won the Best Student Paper Award.
