
Medical & Biological Engineering & Computing (2019) 57:1187–1198



Fluorescence microscopy image classification of 2D HeLa cells based

on the CapsNet neural network
XiaoQing Zhang 1,2 & Shu-Guang Zhao 1

Received: 5 August 2018 / Accepted: 17 December 2018 / Published online: 28 January 2019
© International Federation for Medical and Biological Engineering 2019

Abstract
The development of computer technology now allows the quick and efficient automatic generation of large numbers of fluorescence microscopy images of proteins in specific subcellular compartments. Digital image processing and pattern recognition technology can easily classify these images, identify the subcellular location of proteins, and subsequently support related work such as the analysis and investigation of protein function. Here, based on a 2D fluorescence microscopy image dataset of HeLa cells, the CapsNet network model was used to classify ten types of images of proteins in different subcellular compartments. Capsules in the CapsNet network model were trained to capture the possibility of certain features and their variants rather than the characteristics of a specific variant. Capsules at one level predict the instantiation parameters of higher-level capsules through transformation matrices, and a higher-level capsule becomes active when multiple dynamic routing forecasts are consistent. Experiments show that using the CapsNet network model to classify the 2D HeLa dataset achieves higher accuracy.

Keywords CapsNet . Subcellular localization . Fluorescence microscopy . 2D HeLa . Image classification . Neural network

* Correspondence: XiaoQing Zhang

1 College of Information Science and Technology, Donghua University, Shanghai 201620, China
2 Nanjing University of Chinese Medicine Hanlin College, Taizhou, Jiangsu, China

1 Introduction

The internal structure of a cell is complex, but at the same time, it is a highly ordered structure that can be subdivided into multiple cellular regions or organelles. Different regions or organelles possess their own specific functions and are distributed throughout the cellular space. These structures, which are more detailed than the cell structure, are termed subcellular compartments. Subcellular compartments are important parts of the cell that constitute its complex structure and are also the chief executives of cellular functions. Protein subcellular localization refers to the specific location of a gene product or protein within the complex regions of the cell. Studies have shown that the subcellular localization of a protein is closely related to its function, and understanding protein subcellular localization is essential to fully appreciate protein function. This localization information needs to be systematically organized into a database covering the entire protein family. Determination of the high-resolution localization pattern of most or all of the expressed proteins in a given cell type is the overall goal of proteomics. At present, one of the most important and successful experimental methods for the identification of protein subcellular localization is fluorescence microscopy; however, in the face of such massive amounts of data, the traditional method of image analysis is time-consuming and unsatisfactory. Thus, there is a need to classify these subcellular images in an automatic and efficient manner, and these methods require faster computation.

In recent years, researchers have developed automated bioinformatics-based systems for the classification of images of protein subcellular localization. These systems can recognize and classify digital fluorescence images using machine learning and neural networks. In 2001, Boland et al. trained neural network classifiers using descriptions of various digital features, and subsequently classified images of ten different subcellular patterns (including all the main organelles) [1]. In 2004, Huang et al. trained a neural network classifier to distinguish between all major protein subcellular localization patterns in 2D and 3D fluorescence microscopy images in order to improve identification. The average classification

accuracy rate of a single 2D image was higher than 90% for the first time, and 86% in 2D HeLa cell images [2]. In 2006, Chen et al. classified 2D images of CHO cells containing labeled DNA using deconvolution fluorescence microscopy. With the use of the back-propagation neural network (BPNN) and a combination of Zernike moments and Haralick texture features, the five types of subcellular localization patterns were distinguished, with an average precision of 88% [3]. In 2010, Nanni et al. proposed a model based on the Levenberg-Marquardt neural network ensemble and the AdaBoost algorithm [4]. In 2016, Godinez et al. classified images according to pixel intensity values based on the multi-scale convolutional neural network (M-CNN) [5], and Pärnamaa et al. trained an 11-layer neural network to classify images of the subcellular localization of a fluorescent protein expressed in yeast cells, with an accuracy rate of 91% [6]. In 2018, Tahir et al. used Haralick texture features based on a gray-level co-occurrence matrix (GLCM) to analyze the validity of images of the subcellular localization of proteins for discrimination between different values of the offset parameter d at a specific quantization level [7].

With the hope of improving the accuracy of subcellular classification, we used the latest neural network model, CapsNet [8], to analyze the subcellular localization of proteins in a dataset of fluorescence microscopy images of HeLa cells. For this experiment, we first needed to build and select a subset of fluorescence microscopy images of protein subcellular localizations that was representative and versatile. Secondly, the fluorescence microscopy images in the dataset were preprocessed; thirdly, an appropriate classification model was built for training and prediction on the preprocessed images. Finally, the classification results of the subcellular images were obtained, analyzed, and evaluated, so as to complete the whole workflow.

2 Methods

2.1 Introduction to the capsule network

In deep learning, the convolutional neural network (CNN) is efficient at detecting image features, but the CNN classifier has poor recognition of the spatial relationships between features. In order for a CNN to handle viewpoint or style variations, more convolution layers and feature maps need to be added. However, this approach tends to memorize datasets rather than generalize a solution, requiring a large amount of training data to cover the different variants and avoid overfitting. An article entitled "Dynamic routing between capsules" by Sara Sabour, Nicholas Frosst, and Geoffrey Hinton, the godfather of deep learning, presented a neural network based on a capsule system [8]. A capsule is a set of neurons that learns to recognize visual entities and outputs the probability that an entity exists within its finite field, as well as an "instantiation parameter" that contains the entity's attributes. Capsules can represent the possibility of physical attributes and features, including spatial information such as pose (position, size, orientation), deformation, speed, reflectivity, color, and texture, all of which can determine the direction and size of the agreement between features. Instead of capturing the characteristics of a particular variant, the capsule is trained to capture the likelihood of a feature and its variants; thus, the purpose of the capsule is not only to detect features but also to train the model to learn variations, so that the same capsule can detect the same object category in different orientations. A capsule outputs a vector rather than a single scalar; in a multi-layer capsule system, an active capsule at one level can dynamically transfer its output probability to a higher-level capsule, and if multiple outputs coincide, that higher-level capsule is activated.

2.1.1 Propagation process between capsule units (Fig. 1)

The bottom level u_i of the CapsNet network model is a vector: a capsule unit containing a set of neurons. Each u_i is multiplied by a different weight matrix W_ij to obtain the prediction vector û_{j|i}, which is subsequently multiplied by the corresponding coupling coefficient c_ij and transmitted to the next layer of capsule cells. The input s_j of a capsule unit in that layer is the weighted sum over all incoming units; that is, the sum of the products of the prediction vectors û_{j|i} and the coupling coefficients c_ij. The resulting input vector s_j is passed through the squashing nonlinear function to yield the output vector v_j of the capsule unit. Next, the scalar product of the output vector v_j and the corresponding prediction vector û_{j|i} is used to update the coupling coefficient c_ij, so that back propagation does not need to be applied to this iterative method.

For capsules, the input u_i and output v_j are vectors. A transformation matrix W_ij is applied to the output of the previous capsule u_i. For instance, with an m × k matrix, a k-D u_i is converted to an m-D û_{j|i} ((m × k)(k × 1) ⇒ m × 1). Subsequently, the weights c_ij are used to calculate the weighted sum s_j:

$$\hat{u}_{j|i} = W_{ij} u_i, \qquad s_j = \sum_i c_{ij}\, \hat{u}_{j|i} \tag{1}$$

Here, c_ij is the coupling coefficient trained by the iterative dynamic routing process, and the sum over j of c_ij is designed to be 1. Instead of the ReLU function, a squashing function is used to scale the vector to a length between 0 and 1.
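The propagation steps above can be sketched in NumPy. This is a minimal illustration rather than the authors' implementation; the capsule counts and dimensions (three 4-D input capsules, two 6-D output capsules, three routing iterations) are arbitrary assumptions:

```python
import numpy as np

def squash(s, axis=-1, eps=1e-9):
    # (|s|^2 / (1 + |s|^2)) * s / |s|: scales a vector to length in [0, 1)
    sq = np.sum(s ** 2, axis=axis, keepdims=True)
    return (sq / (1.0 + sq)) * s / np.sqrt(sq + eps)

def route(u, W, n_iters=3):
    """u: (n_in, k) lower-level capsule outputs; W: (n_in, n_out, m, k)
    transformation matrices. Returns the output capsules v: (n_out, m)."""
    n_in, n_out = W.shape[0], W.shape[1]
    u_hat = np.einsum('ijmk,ik->ijm', W, u)     # predictions u_hat_{j|i} = W_ij u_i
    b = np.zeros((n_in, n_out))                 # initial routing logits
    for _ in range(n_iters):
        c = np.exp(b) / np.exp(b).sum(axis=1, keepdims=True)  # softmax over j
        s = np.einsum('ij,ijm->jm', c, u_hat)   # s_j = sum_i c_ij u_hat_{j|i}
        v = squash(s)                           # v_j = squash(s_j)
        b = b + np.einsum('ijm,jm->ij', u_hat, v)  # agreement update
    return v

rng = np.random.default_rng(0)
u = rng.normal(size=(3, 4))          # 3 input capsules, 4-D each (assumption)
W = rng.normal(size=(3, 2, 6, 4))    # maps each 4-D capsule to 6-D, 2 outputs
v = route(u, W)
print(v.shape)                       # (2, 6)
```

Because the squashing function bounds every output length below 1, each ‖v_j‖ can be read directly as a probability.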

Fig. 1 Propagation process between capsule units

$$v_j = \frac{\|s_j\|^2}{1 + \|s_j\|^2}\, \frac{s_j}{\|s_j\|} \tag{2}$$

The squashing function shrinks short vectors toward the zero vector and long vectors toward unit length; therefore, the length of each capsule output lies between 0 and 1.

$$v_j \approx \|s_j\|\, s_j \ \text{for short } s_j, \qquad v_j \approx \frac{s_j}{\|s_j\|} \ \text{for long } s_j \tag{3}$$

2.1.2 Dynamic routing algorithm

The transformation matrix W_ij in the capsule is still trained by the back-propagation algorithm; however, the coupling coefficient c_ij is calculated using the new iterative dynamic routing method. The prediction vector û_{j|i} is calculated using the transformation matrix, where u_i is the activity vector of capsule i and v_j is the activity vector of capsule j. Intuitively, the prediction vector û_{j|i} is a prediction (vote) output from capsule i to capsule j. If the activity vector v_j is similar to the prediction vector û_{j|i}, capsule i is highly correlated with capsule j. This similarity is measured using the scalar product of the prediction and activity vectors; therefore, the similarity considers both the possibility and the characteristic attributes (not just the possibility, as with plain neurons). The relevance score b_ij is calculated from this similarity:

$$b_{ij} \leftarrow \hat{u}_{j|i} \cdot v_j \tag{4}$$

The coupling coefficient c_ij is calculated as the softmax of b_ij:

$$c_{ij} = \frac{\exp(b_{ij})}{\sum_k \exp(b_{ik})} \tag{5}$$

In order to make b_ij more accurate, it is updated over multiple iterations (usually three):

$$b_{ij} \leftarrow b_{ij} + \hat{u}_{j|i} \cdot v_j \tag{6}$$

2.1.3 Loss function

For each category c appearing in the image, the capsule network uses a separate margin loss L_c:

$$L_c = T_c \max\left(0,\, m^+ - \|v_c\|\right)^2 + \lambda\, (1 - T_c) \max\left(0,\, \|v_c\| - m^-\right)^2 \tag{7}$$

T_c = 1 if an object of class c exists. The down-weighting by λ (default 0.5) prevents initial learning from shrinking the activity vectors of all classes. The total loss is the sum of the losses of all categories.

2.1.4 Reconstruction and characterization

During training, the true labels are used to select v_j and reconstruct the image. Subsequently, using v_j as the input, a reconstruction network with three fully connected layers is used to regenerate the original image. Hinton et al. used

the extra reconstruction loss to facilitate the DigitCaps layer in encoding the input digital images. The reconstruction and characterization of the capsule output vector not only improves the accuracy of the model but also enhances its interpretability. By modifying some of the components in the reconstructed vector, changes in the image can be observed after reconstruction, which helps us to understand the output of the capsule layer.

There are several advantages to using the CapsNet model for 2D HeLa image classification. First, the preprocessed 2D HeLa image can be used directly as input to the model. Secondly, feature extraction and classification of 2D HeLa images can be performed simultaneously; traditional algorithms require feature extraction first and then input to a classifier, so using a capsule network improves efficiency. Third, the capsule network has few training parameters: through weight sharing, the training parameters are reduced and the arithmetic performance of the algorithm is improved. Finally, the capsule network records the spatial positional relationships of the image features. When a small change in a feature is detected, the capsule network outputs a vector of the same length but with a slight change in direction, thereby recording the spatial positional relationship of the image.

2.2 Experimental design

2.2.1 Data preparation and preprocessing

The 2D HeLa dataset is a batch of fluorescence microscopy images of HeLa cells stained with fluorescent dyes specific for various organelles. The 2D HeLa image set uses rhodamine-phalloidin to mark microfilaments and immunofluorescence to label the endoplasmic reticulum (ER), the Golgi apparatus (represented by the giantin and gpp130 proteins), lysosomes (LAMP2), endosomes (transferrin receptor), mitochondria, the nucleolus (nucleolin), and microtubules (tubulin). In addition, a parallel DNA image is obtained and used to calculate features relative to the nucleus, as well as additional localization patterns [9, 10].

The 2D HeLa dataset used in the present paper contains ten kinds of organelle patterns: Actin, DNA (nuclei), golgia (giantin, cis/medial Golgi), Golgpp (GPP130, cis Golgi), mitochondria, nucleolus (nucleolin), endosomes (TfR), ER, lysosomes (LAMP2), and microtubules (tubulin). The dataset contains 862 protein subcellular images within the ten subtypes; each is a 16-bit grayscale image in TIF format at a size of 382 × 382 pixels, as shown in Fig. 2.

Fig. 2 The ten organelles in 2D HeLa cells are as follows: Actin, DNA, golgia, Golgpp, mitochondria, nucleolus, endosomes, ER, lysosomes, and microtubules. 2D HeLa datasets are available at http://ome.grc.nia.
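Returning to the margin loss of Sect. 2.1.3, Eq. (7) can be written out directly. This is an illustrative sketch; the margins m⁺ = 0.9 and m⁻ = 0.1 are the defaults from the original CapsNet paper and are assumptions here:

```python
import numpy as np

def margin_loss(v_norms, targets, m_pos=0.9, m_neg=0.1, lam=0.5):
    """v_norms: (n_classes,) capsule output lengths ||v_c||;
    targets: (n_classes,) one-hot T_c. Implements Eq. (7), summed over classes."""
    present = targets * np.maximum(0.0, m_pos - v_norms) ** 2
    absent = lam * (1.0 - targets) * np.maximum(0.0, v_norms - m_neg) ** 2
    return float(np.sum(present + absent))

# A confident, correct prediction for class 0 gives zero loss;
# a confident, wrong prediction is penalized on both terms.
good = margin_loss(np.array([0.95, 0.05, 0.05]), np.array([1.0, 0.0, 0.0]))
bad = margin_loss(np.array([0.05, 0.95, 0.05]), np.array([1.0, 0.0, 0.0]))
print(good, bad)   # 0.0 and a larger positive value
```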

Actin is a family of globular multi-functional proteins that form microfilaments. It is found in essentially all eukaryotic cells (the only known exception being nematode sperm). Actin participates in many important cellular processes, including muscle contraction, cell motility, cell division and cytokinesis, vesicle and organelle movement, cell signaling, and the establishment and maintenance of cell junctions and cell shape.

Deoxyribonucleic acid (DNA) is a molecule composed of two chains (made of nucleotides) that coil around each other to form a double helix carrying the genetic instructions used in the growth, development, functioning, and reproduction of all known living organisms and many viruses.

An endosome is a membrane-bound compartment inside eukaryotic cells. It is a compartment of the endocytic membrane transport pathway originating from the trans-Golgi membrane. Molecules or ligands internalized from the plasma membrane can follow this pathway all the way to lysosomes for degradation, or they can be recycled back to the plasma membrane [11].

The endoplasmic reticulum (ER) is a type of organelle found in eukaryotic cells that forms an interconnected network of flattened, membrane-enclosed sacs or tube-like structures known as cisternae. There are two types of endoplasmic reticulum: rough (granular) and smooth (agranular).

The Golgi apparatus is an organelle found in most eukaryotic cells [12]. The Golgi apparatus resides at the intersection of the secretory, lysosomal, and endocytic pathways. It is of particular importance in processing proteins for secretion, containing a set of glycosylation enzymes that attach various sugar monomers to proteins as the proteins move through the apparatus.

A lysosome is a membrane-bound organelle. A lysosome has a specific composition of both its membrane proteins and its lumenal proteins. Besides degradation of polymers, the lysosome is involved in various cell processes, including secretion, plasma membrane repair, cell signaling, and energy metabolism [13].

Microtubules are tubular polymers of tubulin that form part of the cytoskeleton that provides structure and shape to the cytoplasm of eukaryotic cells and some bacteria [14]. They are involved in maintaining the structure of the cell and, together with microfilaments and intermediate filaments, they form the cytoskeleton. They provide platforms for intracellular transport and are involved in a variety of cellular processes,

(Fig. 3 panels: actin_001 to actin_003 and their randomColor0 to randomColor3, randomGaussian0 to randomGaussian4, and randomRotation0 to randomRotation4 variants.)
Fig. 3 Some of the Actin images were transformed
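The transformation families shown in Fig. 3 (random rotation, color/brightness/contrast changes, and added Gaussian noise) can be sketched as follows. This is an illustrative sketch, not the authors' pipeline: arbitrary-angle rotation and rescaling would normally be delegated to an imaging library, so rotation is shown here only in 90° steps, while the Gaussian-noise step uses the stated mean of 0.2 and sigma of 0.3:

```python
import numpy as np

rng = np.random.default_rng(42)

def random_rotation(img):
    # Stand-in for rotation by a random angle between 1 and 360 degrees
    return np.rot90(img, k=int(rng.integers(1, 4)))

def random_contrast_brightness(img, c_range=(0.8, 1.2), b_range=(-0.1, 0.1)):
    # Randomly rescale contrast and shift brightness, then clip to [0, 1]
    c = rng.uniform(*c_range)
    b = rng.uniform(*b_range)
    return np.clip(c * img + b, 0.0, 1.0)

def add_gaussian_noise(img, mean=0.2, sigma=0.3):
    # Step (4): additive Gaussian noise with mean 0.2 and sigma 0.3
    return np.clip(img + rng.normal(mean, sigma, img.shape), 0.0, 1.0)

img = rng.uniform(size=(28, 28))       # a grayscale image scaled to [0, 1]
variants = [random_rotation(img),
            random_contrast_brightness(img),
            add_gaussian_noise(img)]
print([v.shape for v in variants])     # all remain (28, 28)
```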

including the movement of secretory vesicles, organelles, and intracellular macromolecular assemblies [15]. They are also involved in cell division and are the major constituents of

(Graph nodes include the Decoder, reconstruction, and total_loss operations; tensor shapes such as 128 × 784 and 128 × 20 × 20 × 256 reflect the batch size of 128.)
Fig. 4 The CapsNet structure graph drawn by TensorBoard

Fig. 5 The CapsNet architecture

mitotic spindles, which are used to pull eukaryotic chromosomes apart.

The mitochondrion (plural mitochondria) is a double-membrane-bound organelle found in most eukaryotic organisms. Mitochondria generate most of the cell's supply of adenosine triphosphate (ATP), used as a source of chemical energy [16]. In addition to supplying cellular energy, mitochondria are involved in other tasks, such as signaling, cellular differentiation, and cell death, as well as maintaining control of the cell cycle and cell growth [17]. Mitochondrial biogenesis is in turn temporally coordinated with these cellular processes [18, 19].

The nucleolus is the largest structure in the nucleus of eukaryotic cells [20]. It is best known as the site of ribosome biogenesis. Nucleoli also participate in the formation of signal recognition particles and play a role in the cell's response to stress [21]. Nucleoli are made of proteins, DNA, and RNA and form around specific chromosomal regions called nucleolar-organizing regions.

In the present experiment, the size of the images was adjusted from the original 382 × 382 to 28 × 28 prior to training the network models on the 862-image dataset. The image dataset was then augmented to increase the number of images. While public health datasets are available online, most are still limited in size and some are only applicable to specific medical issues; collecting medical data is a complex and expensive process that requires a large number of radiologists and researchers to work together in public. When using a multi-layer deep learning network to train on a limited image dataset, the risk of overfitting can easily occur. In order to reduce the over-fitting of the CapsNet model during the training of 2D HeLa images, we tried to solve this by data expansion. The expansion of grayscale images mainly includes affine transformations such as translation, rotation, scaling, flipping, and cutting. The specific steps were as follows: (1) enhanced images were generated by randomly rotating the original image by 1 to 360°; (2) the images were amplified by a randomly selected factor; (3) the images were randomly altered in color, brightness, and contrast; and (4) Gaussian noise was added with a mean of 0.2 and a sigma of 0.3, so as to increase the data. Following this processing, the ten classes of 2D HeLa images were increased to 18,100, distributed as Actin (2056), DNA (1827), endosomes (1911), ER (1806), golgia (1827), Golgpp (1785), lysosomes (1764), microtubules (1911), mitochondria (1533), and nucleolus (1680).

Part of the image transformation for the Actin class is shown in Fig. 3; the other classes were transformed similarly.

Among the 18,100 transformed images, 90% were used as the training and validation sets (18,100 × 0.9 = 16,290) and 10% as the test set (18,100 × 0.1 = 1810). Finally, there were

Table 1 CapsNet model structure for image classification

Name                 Layers        Input                Output
Input_1              InputLayer    (None, 28, 28, 1)    (None, 28, 28, 1)
Conv1                Conv2D        (None, 28, 28, 1)    (None, 20, 20, 256)
PrimaryCaps_Conv2d   Conv2D        (None, 20, 20, 256)  (None, 6, 6, 256)
PrimaryCaps_Reshape  Reshape       (None, 6, 6, 256)    (None, 1152, 8)
PrimaryCaps_Squash   Lambda        (None, 1152, 8)      (None, 1152, 8)
DigitCaps            CapsuleLayer  (None, 1152, 8)      (None, 10, 16)
Out_Caps             Length        (None, 10, 16)       (None, 10)

Table 2 CapsNet model structure for image reconstruction

Name                Layers        Input                                      Output
Input_2             InputLayer    (None, 10)                                 (None, 10)
Input_3: DigitCaps  CapsuleLayer  (None, 1152, 8)                            (None, 10, 16)
Mask_1              Mask          Input_2 (None, 10), Input_3 (None, 10, 16) (None, 16)
Dense_1             Dense         (None, 16)                                 (None, 512)
Dense_2             Dense         (None, 512)                                (None, 1024)
Dense_3             Dense         (None, 1024)                               (None, 784)
Out_Reconstruction  Reshape       (None, 784)                                (None, 28, 28, 1)
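A forward pass through the reconstruction network of Table 2 can be sketched with placeholder weights to confirm the layer shapes; the random weights here stand in for trained parameters:

```python
import numpy as np

rng = np.random.default_rng(0)
relu = lambda x: np.maximum(x, 0.0)
sigmoid = lambda x: 1.0 / (1.0 + np.exp(-x))

def decode(digit_caps, one_hot):
    """digit_caps: (10, 16) DigitCaps outputs; one_hot: (10,) true label.
    The mask selects the 16-D capsule of the true class, then three dense
    layers (16 -> 512 -> 1024 -> 784) regenerate a 28 x 28 image (Table 2)."""
    masked = digit_caps[np.argmax(one_hot)]   # Mask_1: (16,)
    h1 = relu(masked @ W1)                    # Dense_1: (512,)
    h2 = relu(h1 @ W2)                        # Dense_2: (1024,)
    out = sigmoid(h2 @ W3)                    # Dense_3: (784,)
    return out.reshape(28, 28, 1)             # Out_Reconstruction

W1 = rng.normal(scale=0.05, size=(16, 512))
W2 = rng.normal(scale=0.05, size=(512, 1024))
W3 = rng.normal(scale=0.05, size=(1024, 784))

img = decode(rng.normal(size=(10, 16)), np.eye(10)[3])
print(img.shape)   # (28, 28, 1)
```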

14,480 images in the training set (train), 1810 images in the validation set (validation), and 1810 images in the test set (test). The images were converted to a binary format that CapsNet can recognize: t10k-images-idx3-ubyte (1.4 MB), t10k-labels-idx1-ubyte (1.8 KB), train-images-idx3-ubyte (12.8 MB), and train-labels-idx1-ubyte (16.3 KB). CapsNet selects parameters based on the training set and then uses the test set to evaluate the performance of the selected parameters. During training, if the training set alone is used, there may be a problem of overfitting; to avoid this, we add the validation set to the network and check whether the error stays within a certain range. The training set is thus used to determine the parameters (weights) of the model, the validation set is used to adjust the hyperparameters (architecture) and to avoid overfitting, and the test set is used to evaluate the performance (generalization and predictive power) of the model on unseen data. Following this processing, the 2D HeLa image datasets were used for training as the input image data of the CapsNet network.

2.2.2 Experimental hardware and software

The experiment was completed on an HP workstation with 16 GB RAM, a 1 TB + 256 GB hard disk, an NVIDIA GeForce GTX 1070 8 GB graphics processor, an Intel Core i7-6700HQ processor, and the Ubuntu 16.04 operating system. The software environment comprised PyCharm (Python 2.7), TensorFlow + Keras with GPU support, CUDA 8.0, and other configurations.

2.2.3 Experimental process and CapsNet network model

The CapsNet architecture is shown in Figs. 4 and 5. The first two layers of the architecture use 9 × 9 convolution kernels for the training model. The selection of the convolution kernel determines whether the low-level features can be accurately obtained. If the kernel is too small, the selected features are enhanced accordingly, but any noise in the image is enhanced as well, overwhelming the useful details; an excessively large kernel, however, increases the amount of computation during training. When using an n × n convolution kernel, the time complexity is O(n²) per pixel.

The aim of CapsNet model training is to make the training error as small as possible while obtaining the proper structure. Using the back-propagation (BP) algorithm to estimate network parameters makes it easier to train a multi-layer neural network model. The number of parameters of a network model is related to its complexity and capacity: with too many parameters, the decision function will overfit, resulting in poor generalization; with too few, the appropriate decision function cannot be fitted, leading to inaccurate decisions. We use a batch training protocol to continually update the weights of the network and use the epoch to define the total number of pattern presentations. Training the model amounts to obtaining a set of parameters ε = (ω, α) on the training sample set D: the criterion function L(ω, α) is solved to obtain the optimal solution for the loss function between the two layers of networks, yielding the corresponding weights and offset values. In batch training, all patterns appear once before the weights are updated.

Table 3 The classification training results of the CapsNet model, including loss, training set accuracy, validation set accuracy, and test set accuracy

Method   Loss      Train_acc  Val_acc  Test_acc
CapsNet  0.003927  1.0        0.9983   0.9308

Table 4 The accuracy of the CapsNet test set as compared with that of local binary pattern-based SVM and Haralick-based SVM

Method                               Accuracy (%)
CapsNet                              93.08
Local binary pattern-based SVM [22]  90.2
Haralick-based SVM [22]              84.1
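The idx-ubyte files listed in Sect. 2.2.1 follow the standard MNIST idx layout: a big-endian header holding a magic number and the dimension sizes, followed by the raw bytes. A minimal writer sketch, using the 1810-image test split as the example:

```python
import struct
import numpy as np

def write_idx_images(path, images):
    """images: (n, rows, cols) uint8 array -> idx3-ubyte file (magic 2051)."""
    n, rows, cols = images.shape
    with open(path, 'wb') as f:
        f.write(struct.pack('>IIII', 2051, n, rows, cols))  # big-endian header
        f.write(images.tobytes())

def write_idx_labels(path, labels):
    """labels: (n,) uint8 array -> idx1-ubyte file (magic 2049)."""
    with open(path, 'wb') as f:
        f.write(struct.pack('>II', 2049, len(labels)))
        f.write(labels.tobytes())

imgs = np.zeros((1810, 28, 28), dtype=np.uint8)   # placeholder test images
write_idx_images('t10k-images-idx3-ubyte', imgs)
write_idx_labels('t10k-labels-idx1-ubyte', np.zeros(1810, dtype=np.uint8))
```

The resulting file sizes, 16 + 1810 × 28 × 28 bytes ≈ 1.4 MB for the images and 8 + 1810 bytes ≈ 1.8 KB for the labels, agree with the sizes reported above.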

The advantage of this approach is that it obtains the overall error over all patterns. Since the training speed of batch training is slower than that of stochastic training, a higher learning rate is used; the size of the learning rate directly determines the convergence speed of the criterion function. In addition, the capsule network's performance is relatively strong: on small datasets, CapsNet is significantly better than a CNN, and even with very few samples the capsule network still has good accuracy and convergence.

The CapsNet network model structure for image classification is given in Table 1, and the structure for image reconstruction in Table 2.

3 Results

Using the CapsNet network model to train on and classify the 18,100 images of the 2D HeLa dataset, we finally obtained the following results: the value of the loss was 0.003927, the accuracy on the training set (14,480 images) was 1, the accuracy on the validation set (1810 images) was 0.9983, and the accuracy on the test set (1810 images) was 0.9308, as shown in Table 3.

Using the same 2D HeLa dataset, we compared the test set accuracy of CapsNet with that of other methods. The classification accuracy based on the CapsNet network model was 2.88% higher than that of the SVM based on local binary patterns and 8.98% higher than that of the SVM based on Haralick features. The CapsNet model shows stronger performance on a limited sample. CapsNet uses vectors instead of scalar output features and uses routing-by-agreement instead of max-pooling subsampling. This model can help us better understand the spatial relationships in images using less computational time and fewer computational resources. Each capsule in CapsNet outputs a vector, so the model does not use the standard softmax or sigmoid activation functions, but instead introduces the "squashing" function and routing by agreement. Therefore, we can see that the classification accuracy of the CapsNet network model is much higher than that of the other traditional machine-learning classification methods. The accuracy of the various methods is compared in Table 4.

4 Discussion

The size of the 2D HeLa images was 28 × 28 × 1, and the image set was input to the standard convolution layer, ReLU Conv1. This layer used 256 9 × 9 kernels to generate an output of 256 channels (feature maps), with the 9 × 9 kernel moving over the 28 × 28 image at a stride of 1 with no padding. The spatial size was thus reduced from 28 × 28 to 20 × 20 (28 − 9 + 1 = 20), and the output shape of the Conv1 layer was 20 × 20 × 256. The PrimaryCaps layer was then entered, which is a modified convolution layer supporting capsules. PrimaryCaps used a 9 × 9 kernel moving over the 20 × 20 space at a stride of 2 with no padding, so that the spatial dimension was reduced from 20 × 20 to 6 × 6 ((20 − 9)/2 + 1 = 6). In

Fig. 6 Margin loss, reconstruction loss, and total loss during training with the 2D HeLa training set

Fig. 7 The image reconstructed by the DigitCaps layer output vector

the PrimaryCaps layer, we used an 8 × 32 kernel to generate 32 8-D capsules (that is, 8 output neurons grouped together form a capsule), each producing an 8-D vector rather than a scalar, for a total of 6 × 6 × 32 8-D capsules; thus, the output shape of the PrimaryCaps layer was 6 × 6 × 32 × 8. The PrimaryCaps layer therefore output 1152 (6 × 6 × 32 = 1152) vectors, each of dimension 8; that is, there were 1152 capsule units in layer i. These fed the DigitCaps layer, where a transformation matrix Wij of shape 8 × 16 converted each 8-D capsule into a 16-D capsule for each class j (j from 1 to 10). Since the 2D HeLa images had 10 classes, the shape of the DigitCaps layer was 10 × 16 (10 16-D vectors); that is, there were 10 standard capsule units, and the output vector of each capsule had 16 elements. Because the number of capsule units in the preceding PrimaryCaps layer was 1152, the number of transformation matrices Wij was 1152 × 10, and the dimension of each Wij was 8 × 16.

When ui and the corresponding Wij were multiplied to obtain the prediction vectors, there were 1152 × 10 coupling coefficients cij, and the weighted summation yielded 10 input vectors of size 16 × 1. Each vector was passed through the squashing nonlinear function to obtain the final output vector vj, and each vj served as the capsule of class j, with the length of vj indicating the probability that the image belonged to that class. The output shape of the DigitCaps layer was 10 × 16. Finally, the 28 × 28 image was reconstructed through the fully connected (FC) layers. In comparison with Softmax, capsules are not susceptible to interference from multiple overlapping categories and are thus suitable for single-sample prediction and multi-class work.

We trained for approximately 4500 steps on the 2D HeLa image dataset in the CapsNet network model and, owing to the use of a GPU, we set the batch size to 128 and the number of epochs to 40.
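The shape bookkeeping above (1152 8-D capsules, one 8 × 16 matrix Wij per capsule pair, 10 16-D output capsules) can be checked with a small NumPy sketch. The weights are random placeholders, and uniform coupling coefficients cij stand in for the values that dynamic routing would compute; squash is the nonlinearity of Sabour et al. [8]:

```python
import numpy as np

def squash(s, axis=-1, eps=1e-8):
    # squash(s) = (|s|^2 / (1 + |s|^2)) * s / |s|: keeps direction, maps length into [0, 1)
    sq_norm = np.sum(s ** 2, axis=axis, keepdims=True)
    return (sq_norm / (1.0 + sq_norm)) * s / np.sqrt(sq_norm + eps)

rng = np.random.default_rng(0)
u = rng.standard_normal((1152, 8))                  # 1152 primary capsules, 8-D each
W = rng.standard_normal((1152, 10, 8, 16)) * 0.01   # a separate 8x16 matrix W_ij per (i, j)

# prediction vectors u_hat_{j|i} = W_ij^T u_i, shape (1152, 10, 16)
u_hat = np.einsum('id,ijdk->ijk', u, W)

c = np.full((1152, 10), 0.1)                        # 1152 x 10 coupling coefficients c_ij
s = np.einsum('ij,ijk->jk', c, u_hat)               # weighted sums: 10 vectors of size 16
v = squash(s)                                       # final output capsules, shape (10, 16)

class_probs = np.linalg.norm(v, axis=1)             # length of v_j = probability of class j
```

Because squash maps any norm n to n²/(1 + n²), each capsule length lands strictly below 1 and can be read as a class probability.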

Fig. 8 Test-set accuracy on the 2D HeLa dataset for different numbers of routing iterations. Lines of different colors represent different routing iterations
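The routing-by-agreement procedure discussed in this section can be sketched as follows. This mirrors the algorithm of Sabour et al. [8]; the prediction vectors u_hat are random placeholders, and r plays the role of iter_routing in Table 5:

```python
import numpy as np

def squash(s, axis=-1, eps=1e-8):
    sq_norm = np.sum(s ** 2, axis=axis, keepdims=True)
    return (sq_norm / (1.0 + sq_norm)) * s / np.sqrt(sq_norm + eps)

def dynamic_routing(u_hat, r=3):
    """Route prediction vectors u_hat of shape (n_in, n_out, d_out) for r iterations."""
    n_in, n_out, _ = u_hat.shape
    b = np.zeros((n_in, n_out))                     # routing logits, initially uniform
    for _ in range(r):
        e = np.exp(b - b.max(axis=1, keepdims=True))
        c = e / e.sum(axis=1, keepdims=True)        # coupling coefficients: softmax over j
        s = np.einsum('ij,ijk->jk', c, u_hat)       # weighted sum per output capsule
        v = squash(s)                               # candidate outputs, (n_out, d_out)
        b = b + np.einsum('ijk,jk->ij', u_hat, v)   # agreement = dot product u_hat . v
    return v

rng = np.random.default_rng(1)
u_hat = rng.standard_normal((1152, 10, 16)) * 0.1
v = dynamic_routing(u_hat, r=3)                     # 3 iterations worked best (Table 5)
```

Each iteration raises the logit b_ij by the dot product between a prediction vector and the current output, so inputs that agree with an output capsule route more of their contribution to it.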

The coupling coefficients cij were updated by agreement (conformance) routing and did not need to be updated according to the loss function. However, the convolution parameters of the whole network and the Wij within the capsules did need to be updated according to the loss function; in general, they can be updated directly by standard back-propagation, and margin loss was used in the CapsNet model. Moreover, since the CapsNet model uses the length of the instantiation vector to indicate whether the entity that a capsule represents exists, when the image belongs to class k, the top-level capsule output vector for class k is expected to be long. To allow multiple categories in an image, we gave a separate margin loss to each capsule of each class k. Furthermore, an additional reconstruction loss was used to encourage the DigitCaps layer to encode the input images. Broken-line charts of the margin loss, reconstruction loss, and total loss obtained following training with a 2D HeLa training set are shown in Figs. 6, 7, and 8.

In the CapsNet architecture, a lower-level capsule sends its output to the higher-level capsules whose outputs are similar; this similarity is captured by the dot product. The algorithm repeats this process r times, after which all outputs of the higher-level capsules have been calculated, the routing weights have been established, and the forward pass can continue to the next level of the network. The number of routing iterations can be determined experimentally. When the number of routing iterations is 1 or 2, the test accuracy is 92.9688%; when it is 3, the test accuracy rises to 93.0804%. However, more routing iterations tend to over-fit the data, so when the number of routing iterations is 4 or 5, the test accuracy drops to 92.9129% and 92.7455%, respectively. The best option is therefore 3 routing iterations (Table 5).

Table 5 The comparison of test set accuracy when the number of routing iterations is 1 to 5

Method    Routing iterations    Average test accuracy (%)
CapsNet   iter_routing = 1      92.9688
CapsNet   iter_routing = 2      92.9688
CapsNet   iter_routing = 3      93.0804
CapsNet   iter_routing = 4      92.9129
CapsNet   iter_routing = 5      92.7455

5 Conclusions

We analyzed the characterization and classification prediction of protein subcellular fluorescence microscopy images. The innovations of this approach are as follows. First, we used the CapsNet model for classification prediction of the 2D HeLa dataset for the first time; CapsNet is a new algorithm that implements dynamic routing based on activity vectors and capsules, which overcomes some of the limitations of artificial neural network classifiers. Second, we increased the variability of the data by augmenting the 2D HeLa image set in order to reduce over-fitting of the deep network model during training, which helps overcome the difficulty of limited medical data. Finally, the accuracy of our CapsNet network model for 2D HeLa dataset classification is 93.08%, significantly higher than that of the SVM method based on local binary patterns and the SVM method based on Haralick features. The results show that the CapsNet model is a promising new image classification technology that can improve the accuracy and effectiveness of protein subcellular fluorescence microscopy image classification and can meet the practical application requirements of large-scale subcellular images.

Compliance with ethical standards

Conflict of interest The authors declare that they have no conflict of interest.

Publisher's note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

References

1. Boland MV, Murphy RF (2001) A neural network classifier capable of recognizing the patterns of all major subcellular structures in fluorescence microscope images of HeLa cells. Bioinformatics
2. Huang K, Murphy RF (2004) Boosting accuracy of automated classification of fluorescence microscope images for location proteomics. BMC Bioinformatics 5:78
3. Chen X (2006) Automated interpretation of subcellular patterns in fluorescence microscope images for location proteomics. Cytometry 69(7):631–640
4. Nanni L, Lumini A, Brahnam S (2010) Local binary patterns variants as texture descriptors for medical image analysis. Artif Intell Med 49(2):117–125
5. Godinez WJ, Hossain I, Lazic SE (2017) A multi-scale convolutional neural network for phenotyping high-content cellular images. Bioinformatics 33(13):2010–2019
6. Pärnamaa T, Parts L (2017) Accurate classification of protein subcellular localization from high throughput microscopy images using deep learning. G3 7(5):1385–1392
7. Tahir M (2018) Pattern analysis of protein images from fluorescence microscopy using gray level co-occurrence matrix. J King Saud Univ Sci 30(1):29–40
8. Sabour S, Frosst N, Hinton GE (2017) Dynamic routing between capsules. Advances in neural information processing systems (NIPS)
9. Boland MV, Markey MK, Murphy RF (1998) Automated recognition of patterns characteristic of subcellular structures

in fluorescence microscopy images. Cytometry 33(3):366–375
10. Murphy RF, Velliste M, Porreca G (2003) Robust numerical features for description and classification of subcellular location patterns in fluorescence microscope images. J VLSI Sig Proc Syst
11. Mellman I (1996) Endocytosis and molecular sorting. Annu Rev Cell Dev Biol 12:575–625. https://doi.org/10.1146/annurev.cellbio.
12. Pavelka M, Mironov AA (2008) The Golgi apparatus: state of the art 110 years after Camillo Golgi's discovery. Springer, Berlin, p 580. ISBN 978-3-211-76310-0
13. Settembre C, Fraldi A, Medina DL, Ballabio A (2013) Signals from the lysosome: a control centre for cellular clearance and energy metabolism. Nat Rev Mol Cell Biol 14(5):283–296. https://doi.org/10.1038/nrm3565
14. "Archived copy". Archived from the original on 2014-02-06. Retrieved 2014-02-24
15. Vale RD (2003) The molecular motor toolbox for intracellular transport. Cell 112(4):467–480. https://doi.org/10.1016/S0092-
16. Campbell NA, Williamson B, Heyden RJ (2006) Biology: exploring life. Pearson Prentice Hall, Boston, Massachusetts. ISBN 978-0-
17. McBride HM, Neuspiel M, Wasiak S (2006) Mitochondria: more than just a powerhouse. Curr Biol 16(14):R551–R560
18. Valero T (2014) Mitochondrial biogenesis: pharmacological approaches. Curr Pharm Des 20(35):5507–5509. https://doi.org/10.2174/138161282035140911142118
19. Sanchis-Gomar F, García-Giménez JL, Gómez-Cabrera MC, Pallardó FV (2014) Mitochondrial biogenesis in health and disease. Molecular and therapeutic approaches. Curr Pharm Des 20(35):5619–5633. https://doi.org/10.2174/1381612820666140306095106
20. O'Sullivan JM, Pai DA, Cridge AG, Engelke DR, Ganley AR (2013) The nucleolus: a raft adrift in the nuclear sea or the keystone in nuclear structure? Biomol Concepts 4(3):277–286. https://doi.org/10.1515/bmc-2012-0043
21. Olson MO, Dundr M (2015) Nucleolus: structure and function. Encyclopedia Life Sci (eLS). https://doi.org/10.1002/
22. Li C, Huang J (2013) A novel method for cell phenotype image classification. 3rd international conference on electric and electronics (EEIC 2013), 105–107

Xiaoqing Zhang has a master's degree in software engineering from Shandong University of Science and Technology. She is a Ph.D. student at the School of Information Science and Technology, Donghua University, China. Her research focuses on the application of artificial intelligence technology in medical image processing.

Shuguang Zhao is a professor and doctoral tutor in the disciplines of electronic science and technology and of pattern recognition and intelligent systems at the School of Information Science and Technology, Donghua University, China. He has long been engaged in the research and development of (medical) intelligent instruments and systems.