Journal Pathologi

Original Article

Detecting and Characterizing Cellular Responses to Mycobacterium tuberculosis From Histology Slides

M. Khalid Khan Niazi,1* Gillian Beamer,2 Metin N. Gurcan1

1 Department of Biomedical Informatics, The Ohio State University, Columbus, Ohio
2 Department of Infectious Disease and Global Health, Tufts University, Grafton, Massachusetts

Received 31 May 2013; Revision Received 7 November 2013; Accepted 16 November 2013

Grant sponsor: Tufts University at Grafton.

*Correspondence to: M. Khalid Khan Niazi, Department of Biomedical Informatics, The Ohio State University, Columbus, OH, USA. E-mail: Muhammad.niazi@osumc.edu

Published online 12 December 2013 in Wiley Online Library (wileyonlinelibrary.com)

DOI: 10.1002/cyto.a.22424

Abstract

Infection with Mycobacterium tuberculosis (M.tb) results in immune cell recruitment to the lungs, forming macrophage-rich regions (granulomas) and lymphocyte-rich regions (lymphocytic cuffs). The objective of this study was to accurately identify and characterize these regions from hematoxylin and eosin (H&E)-stained tissue slides. The two target regions (granulomas and lymphocytic cuffs) can be identified by their morphological characteristics. Their most differentiating characteristic on H&E slides is cell density. We developed a computational framework, called DeHiDe, to detect and classify high cell-density regions in histology slides. DeHiDe employed a novel internuclei geodesic distance calculation and Dulmage-Mendelsohn permutation to detect and classify high cell-density regions. Lung tissue slides of mice experimentally infected with M.tb were stained with H&E and digitized. A total of 21 digital slides were used to develop and train the computational framework. The performance of the framework was evaluated using two main outcome measures: correct detection of potential regions, and correct classification of potential regions into granulomas and lymphocytic cuffs. DeHiDe provided a detection accuracy of 99.39% while it correctly classified 90.87% of the detected regions for the images where the expert pathologist produced the same ground truth during the first and second round of annotations. We showed that DeHiDe could detect high cell-density regions in a heterogeneous cell environment with non-convex tissue shapes. © 2013 International Society for Advancement of Cytometry

Key terms
geodesic distance; internuclei distance; Mycobacterium tuberculosis; granulomas; lymphocytic cuffs; lung tissue
Mycobacterium tuberculosis (M.tb) causes nine million new cases of tuberculosis (TB) in susceptible individuals each year and 1–2 million deaths. To put this in another perspective, TB is the second leading cause of death due to a single infectious agent, close behind disease due to HIV/AIDS (1). Many cells respond to M.tb, including immune cells (e.g., macrophages, dendritic cells, lymphocytes, and neutrophils) and non-immune cells (e.g., fibroblasts and epithelial cells), and the morphology and organization of the immune cells are characteristic (2–6). In experimentally infected mice with asymptomatic M.tb, the immune cellular response is dominated by macrophages that form granulomas within the pulmonary parenchyma and lymphocytic cuffs that form in perivascular or peribronchiolar adventitia as well as occasionally within granulomas (2,3). These cells can be identified on formalin-fixed, paraffin-embedded 5 μm tissue sections stained with H&E. The granulomas typically have low cell density due to the predominance of macrophages with relatively large amounts of cytoplasm. In contrast, lymphocytic cuffs typically have high cell density due to the predominance of lymphocytes, which are smaller cells with scant cytoplasm (7). Additionally, normal lung structures, such as bronchial and bronchiolar epithelial
cells, arteries and veins, and alveolar ducts and sacs are also present (8). The cell density of bronchial and bronchiolar epithelial cells and smooth muscle can be similar to both granulomas and lymphocytic cuffs, especially when tangentially cut; however, the organization and the color of these regions differ from the immune cells. Figure 1 shows a portion of lung tissue from a mouse infected with M.tb: Here, the granulomas are outlined in green while lymphocytic cuffs are outlined in blue by a board-certified veterinary pathologist. Other relatively high-density regions in Figure 1 (marked with black arrows) correspond to bronchial and bronchiolar epithelium or smooth muscle of blood vessels. At the magnification in Figure 1 and at magnifications up to 400× (zoomed version of a small patch shown in the bottom right corner of Fig. 1), which is the highest magnification routinely used by pathologists to evaluate H&E-stained tissue sections, the cytoplasmic border of each immune cell cannot be distinguished. However, closer inspection of Figure 1 reveals that the internuclear distances in lymphocytic cuffs are relatively low as compared with cells in the granulomas. This is because lymphocytes have less cytoplasm than other immune cells, and therefore, the lymphocyte nuclei are close together.

Figure 1. Lung tissue from a mouse experimentally infected with M.tb by low-dose aerosol exposure. The granulomas are outlined in green while lymphocytic cuffs are outlined in blue. Additional high-density regions marked with black arrows correspond to bronchial and bronchiolar epithelium or smooth muscle cells of the blood vessels. [Color figure can be viewed in the online issue, which is available at wileyonlinelibrary.com.]

To our knowledge, we are the first to design and test a computational framework that can identify and discriminate between macrophage-rich regions and lymphocyte-rich regions induced by M.tb in the lungs of experimentally infected mice. Previously, many investigators have used image analysis programs based on pixel density in captured images, or visual discrimination, to quantify the total (proportion or absolute area) of M.tb-infected lung tissue (2,3). However, these methods are not able to discriminate macrophage-dense granulomas versus lymphocytic cuffs, or the analyses are exceedingly time consuming. Being able to discriminate macrophage-dense and lymphocyte-dense regions may be important as these cell types could contribute differently to the protective immunity to M.tb and immunopathology of TB disease. Cell density within different tissue regions is a discriminating feature to identify cell types (7), which may be important in diagnosis, prognosis, and treatment of several diseases. Moreover, regions-of-interest in histology slides can be identified by their cell density differences (9,10). Calculating cell density by computer algorithms is an attractive alternative to manually identifying them because of higher inter- and intra-reader variability as well as the enormous amount of labor it requires.

Calculation of cell density in a convex area can be relatively simple. Basically, the nuclei can be detected and the Euclidean distances between the nuclei provide a measure of the density; in denser areas, these distances tend to be small. However, automatic computation of cell density in tissues with non-convex shapes is a challenging problem. The complex background (e.g., white areas in Fig. 1) makes the tissue non-convex. By assigning the background as a "no-go" region, the cell density computation requires geodesic distance to approximate intercellular distance instead of simple Euclidean distance. Moreover, in H&E-stained tissue images, it is non-trivial to segment cells when they are densely clustered. However, cell segmentation (11–14) is a well-studied problem in microscopy. These methods (11–14) are equally applicable in finding the nuclei boundaries in H&E images. There are also specialized methods (15–17) for detection of nuclei boundaries from H&E images. Once the cells are segmented, internuclei distance, which is defined as the distance between the geometric centers of the nuclei (18,19), can be used as a surrogate for cell distance.

Cytometry Part A 85A: 151–161, 2014
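The convex-area case described above, where Euclidean distances between detected nuclei serve as a density measure, can be sketched in a few lines. This is only an illustration under stated assumptions: the function name `nearest_neighbor_distances` and the centroid coordinates are hypothetical, and nearest-neighbor distance is one of several possible summaries of "distances between nuclei".

```python
import math


def nearest_neighbor_distances(centroids):
    """Return, for each nucleus centroid, the Euclidean distance to the
    nearest other centroid. Small values indicate a dense area."""
    result = []
    for i, (xi, yi) in enumerate(centroids):
        best = math.inf
        for j, (xj, yj) in enumerate(centroids):
            if i != j:
                best = min(best, math.hypot(xi - xj, yi - yj))
        result.append(best)
    return result


# Hypothetical centroids (in pixels): a tight cluster and a loose one.
dense = [(0, 0), (3, 0), (0, 4), (3, 4)]
sparse = [(0, 0), (30, 0), (0, 40), (30, 40)]

mean_dense = sum(nearest_neighbor_distances(dense)) / len(dense)
mean_sparse = sum(nearest_neighbor_distances(sparse)) / len(sparse)
print(mean_dense, mean_sparse)  # the denser cluster has the smaller mean
```

Because the straight line between two centroids may cross background in non-convex tissue, this simple measure does not carry over to the lung images considered here.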
Defining distances between points is a well-studied problem (20,21). Nevertheless, the generalization of this definition to point sets is non-trivial. Often, Hausdorff distance and its variants are employed to compute distances between point sets (22,23), but with considerably high computational cost. In the presence of a geodesic constraint, the computation of Hausdorff distance between point sets poses a huge computational challenge. It is also common to use anti-podal pairs between the convex polygons (point sets) to compute the minimum distance (24). At the same time, its generalization in the presence of geodesic constraints is still an open problem.

Due to their ease of implementation, point distances are often employed to compute internuclei distance in histopathology images. Daniel et al. (25) presented a graph-based method which models the tissue as an interconnected network of epithelial cells. The connectivity between the cells was determined by their size, specific expression levels, and proximity (closeness) to each other. Internuclei distance was considered as distance between nuclei, which is a poor measure in a heterogeneous cell environment, especially when two cells are less than the size of a nucleus apart. In some applications (26), distance between geometric centers is sufficient to describe the relationship among cells when tissue shape is convex, the cells are of the same type with similar sizes, and the internuclei distance is much larger than the size of the nuclei. When cells have different size nuclei and cytoplasm (e.g., lung tissue), distance based on geometric centers tends to be a bit arbitrary. For instance, Figure 2 shows some synthetic nuclei (black ellipses) and their internuclei distance (green line). Although Figures 2a and 2b have different size gaps between them due to different size nuclei, their geometric centers are equally distant. To differentiate between regions in lung tissue, it is important to compute the precise internuclei distance as cells in these regions are close to each other, with distances less than the size of a nucleus. In such situations, the distance between the geometric centers of the nuclei fails to detect the difference between cells of different regions. In this study, we will utilize the internuclei distance to detect and classify the tissue into different regions of interest. Instead of computing the distance between geometric centers to represent the internuclei distance, we compute the shortest path to determine the distance between all boundary pixels of the nuclei.

Figure 2. A synthetic image to demonstrate the concepts of internuclei distance. The nuclei are shown as black elliptical objects; their internuclei distance is shown with a green line. (a) Green line represents the traditional distance w.r.t. geometric centers between two different size nuclei. (b) Two nuclei at the same distance as in (a) if the distance is computed between the geometric centers (Ref 1–3). (c, d) The distances between nuclei in Figures 2a and 2b as calculated according to our suggested method are shown as green lines.

The history of shortest path computation in image analysis dates back to Rutovitz (27), who hypothesized gray-level values as height. The gray-weighted distance is defined in a manner that it is lower along small gray-level values in an image. In a similar way, Levi et al. (28) defined the distance between two points as the minimum sum of gray levels along the path joining the points. Toivanen (29) combined spatial distance with absolute gray-level difference to define the distance between neighboring pixels. Soille (30) introduced the concept of geodesic time functions where the distance between two points of a gray scale image is defined as the shortest length of the path linking both points in a minimum amount of time. In Ref. 31, the redundancy in distance computation between all pairs of points was exploited for efficient implementation. Similar efforts were made in Ref. 32 for efficient computation of distances among all pairs of points in an image.

Figure 3. Flow chart of the developed framework called DeHiDe.

In this study, we develop a framework, called DeHiDe (Detect High Density), to detect and classify high cell-density regions in histology slides. This framework consists of the algorithmic steps outlined in Figure 3. In the following sections these steps are explained in detail. The section
Background Segmentation presents the background segmentation methodology. The section Internuclei Distance Computation covers the detail of the distance computation method used in this article. The section to follow presents the framework to detect regions of interest. It is followed by classification of detected regions into granulomas and lymphocytic cuffs. The section Experimental Setup describes the experimental setup used, while results are presented before the Discussion and Conclusions section. Discussion and Conclusions are presented in the last section.

BACKGROUND SEGMENTATION

To segment the H&E-stained lung tissue image $f$ (having normalized RGB values between [0 1]), we transform $f$ into optical density values:

$$ \mathrm{OD} = -\log_{10}(f). \tag{1} $$

This nonlinear transformation provides the flexibility to write each transformed value as a linear combination of stain vectors (33). Figure 4 shows an image $f$ and its equivalent optical density image. Low values (colorlessness) of OD correspond to the background (alveolar space, lumens of terminal bronchioles, and lumens of arterioles) while the higher values (colorfulness) correspond to the rest of the tissue region. Empirically, most of the tissue region was retained when OD was segmented using a threshold value of $\tau = 0.20$, that is,

$$ B_m = \mathrm{OD}_\mu \geq \tau, \qquad T_r = B_m \circ f \tag{2} $$

where $T_r$ represents an RGB image with non-zero values corresponding to tissue region and ($\circ$) represents the point by point multiplication of the binary mask ($B_m$) with each color channel of $f$; $\mathrm{OD}_\mu$ represents the average of OD across channels. Figure 5 shows the resultant image after application of Eq. (2) on the image shown in Figure 4. $T_r$ was further segmented into nuclei $N$ and rest of the tissue region $RT_r$ using visually meaningful decomposition (VMD) and the scale-space framework that we developed in our previous study (34–36). Briefly, VMD exploits the intrinsic properties of the RGB and CIEL*a*b* color space to transform the image segmentation problem into an entropy-based thresholding problem. The scale-space method uses normalized multiscale difference of Gaussian to detect the nuclei regions, both in the color channel and the intensity channel. Then, it fuses the information from both channels' regions to detect the individual nuclei within nuclei clumps.

Figure 4. Left: A sample image $f$ from lung sample stained with H&E. Right: OD image of image $f$.

Figure 5. Top left: A binary mask $B_m$ as a result of thresholding $\mathrm{OD}_\mu$. Top right: Tissue region $T_r$ after application of Eq. (2). Bottom left: Tissue region $RT_r$ resulting from visual meaningful decomposition (Ref 4). Bottom right: Nuclei $N$ as a result of visual meaningful decomposition of Figure 5 (Top right).

INTERNUCLEI DISTANCE COMPUTATION

The concept of all pairwise distance between pixels is important in many areas of image processing ranging from graph-based methods (31,37–39) to stochastic optimization methods (40,41). For instance, the embedding provided by Laplacian Eigenmaps (42) heavily depends on the underlying graph, which is created by computing the similarity between the pairwise elements (pixels). Isomap (43) also requires the computation of all pairs of geodesic distances for a graph of all the pixels in the image. Such applications require the distance be computed among neighboring points (zero-dimensional). However, to compute the distance between two point sets (for instance, two neighboring nuclei), we first need to compute the shortest distance from any point in one point set to the other, and then find the minimum among all these shortest distances. For instance, consider two point sets $G$ and $H$ representing the boundaries of two nuclei; the shortest distance between $G$ and $H$ can be written as $\min_{g \in G} \min_{h \in H} \|g - h\|_p$, where $\|\cdot\|_p$ represents the p-norm.

We compute the shortest distance between nuclei by computing the geodesic distance. The geodesic distance between two points in set $S$ (geodesic mask) is the length of the shortest path(s) linking both points and included in $S$. To compute the geodesic distance, the background in binary mask $B_m$ will be considered as a "no-go" region while computing distances, hence making it geodesic in nature. In our case, the binary mask $B_m$ will serve as the geodesic mask. To compute the geodesic distance, we set the background in $B_m$ to a value of $+\infty$ while all the nuclei are set to zero. Figure 6 shows a synthetic image to give a pictorial representation of the geodesic mask preparation process.

Figure 6. Synthetic image as an example of geodesic mask preparation. Left: A binary mask $B_m$ where white region represents the background, purple regions represent the nuclei, and pink region represents the $RT_r$. Right: The blue color represents $+\infty$ ("no-go" zone) while black regions (nuclei) are set to zero. White region ($RT_r$) is set to one. Now the distance will be computed from the black boundaries to all the other black boundaries with a constraint that the path should always remain within the black and white regions.

For geodesic distance computation, let us consider an image $f$ defined over a digital space. An nD digital space usually refers to an nD grid space that only contains integer points in nD Euclidean space, that is, $\mathbb{Z}^n \subset \mathbb{R}^n$, where $\mathbb{Z}^n$ represents the n-dimensional discrete space while $\mathbb{R}^n$ stands for real coordinate space. An nD digital image $f$ can be defined as a function on $\mathbb{Z}^n$. Let $p, q \in \mathbb{Z}^n$, and $P_{pq}$ represent the set of all possible paths between pixels $p$ and $q$. Let $\pi \in P_{pq}$ represent a path of length $n$ as n-tuple such that:

$$ \pi = (p = p_0, p_1, \ldots, p_{n-1} = q), \tag{3} $$

where $p_i$ and $p_{i+1}$ are adjacent pixels. Then, the length $L$ of the path $\pi$ is defined as:

$$ L(\pi) = \sum_{i=0}^{n-1} \|p_i - p_{i+1}\| \cdot |f(p_i) - f(p_{i+1})| \tag{4} $$

Here, $\|\cdot\|$ denotes the Euclidean norm, while $|\cdot|$ represents the absolute value. There are multiple ways of defining the length of a path. For instance one can use:

$$ L(\pi) = \sum_{i=0}^{n-1} \|p_i - p_{i+1}\| + |f(p_i) - f(p_{i+1})|. \tag{5} $$
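The background segmentation of Eqs. (1) and (2) can be illustrated on a toy image. The sketch below is not the authors' MATLAB implementation; it assumes that the mask keeps pixels whose channel-averaged optical density is at least the threshold, and it adds a small `eps` (an assumption, not from the text) to avoid taking the logarithm of zero. The function names and pixel values are hypothetical.

```python
import math

TAU = 0.20  # threshold value reported in the text


def optical_density(rgb, eps=1e-6):
    """Eq. (1): OD = -log10(f) per channel. eps is an assumed guard
    against log(0) for fully black pixels."""
    return tuple(-math.log10(max(c, eps)) for c in rgb)


def tissue_mask_and_image(image, tau=TAU):
    """Eq. (2): binary mask from the channel-averaged OD, followed by
    point-by-point multiplication of the mask with each channel."""
    mask, tissue = [], []
    for row in image:
        mask_row, tissue_row = [], []
        for rgb in row:
            od_mean = sum(optical_density(rgb)) / 3.0
            keep = 1 if od_mean >= tau else 0  # assumed comparison (>=)
            mask_row.append(keep)
            tissue_row.append(tuple(keep * c for c in rgb))
        mask.append(mask_row)
        tissue.append(tissue_row)
    return mask, tissue


# Hypothetical 2x2 image with normalized RGB values in [0, 1]:
# the left column is near-white background, the right column is stained tissue.
image = [
    [(1.0, 1.0, 1.0), (0.6, 0.3, 0.6)],
    [(0.95, 0.95, 0.95), (0.4, 0.2, 0.5)],
]
mask, tissue = tissue_mask_and_image(image)
print(mask)  # background pixels map to 0, tissue pixels to 1
```

Near-white pixels have an optical density close to zero and fall below the threshold, which is why alveolar space and lumens are removed by this step.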

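The two path-length definitions in Eqs. (4) and (5) can be compared on a small example. The sketch below evaluates both along one explicitly given path of adjacent pixels; the gray-level grid `f`, the path, and the function names are hypothetical, and the sums simply run over consecutive pixel pairs of the path.

```python
import math


def path_length_product(f, path):
    """Eq. (4): spatial step length times absolute gray-level difference,
    summed over consecutive pixels of the path."""
    total = 0.0
    for (r0, c0), (r1, c1) in zip(path, path[1:]):
        step = math.hypot(r1 - r0, c1 - c0)
        total += step * abs(f[r0][c0] - f[r1][c1])
    return total


def path_length_sum(f, path):
    """Eq. (5): spatial step length plus absolute gray-level difference,
    summed over consecutive pixels of the path."""
    total = 0.0
    for (r0, c0), (r1, c1) in zip(path, path[1:]):
        step = math.hypot(r1 - r0, c1 - c0)
        total += step + abs(f[r0][c0] - f[r1][c1])
    return total


# Hypothetical gray-level image and a 4-connected path given as (row, col).
f = [
    [0, 2, 5],
    [1, 1, 1],
    [4, 0, 3],
]
path = [(0, 0), (0, 1), (1, 1), (1, 2)]

print(path_length_product(f, path))  # 3.0
print(path_length_sum(f, path))      # 6.0
```

Under Eq. (4) a step between pixels of equal gray level costs nothing, whereas under Eq. (5) every step contributes at least its spatial length.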
For our application we have opted for a simpler version of Eq. (4), that is, pseudometric sum of gray levels:

$$ L(\pi) = \sum_{i=0}^{n-1} f(p_i) + f(p_{i+1}) = f(p) + f(q) + 2 \sum_{i=1}^{n-2} f(p_i). \tag{6} $$

Exclusion of spatial weighting in Eq. (6) ensures that pixels set to zero do not play a role in distance computation, that is, the distance should not change within the cell boundary. This ensures that the nuclei sizes do not affect the distance computation. Essentially, one can travel within the nuclei without incurring any cost, that is, two different pixels are separated by null distance if there exists a path with zero values linking the pixels (44).

The gray-weighted distance between $p$ and $q$ can be defined as a combinatorial optimization problem as (45,46):

$$ d(p,q) = \begin{cases} \min_{\pi \in P_{pq}} L(\pi), & \text{if } L(\pi) \neq +\infty \\ +\infty, & \text{otherwise} \end{cases} \tag{7} $$

This basically corresponds to the shortest path between pixels $p$ and $q$. Now, the shortest geodesic distance between two nuclei $N_X$ and $N_Y$ with boundary pixels represented as sets $X$ and $Y$ is defined as:

$$ D_{N_X N_Y} = \min_{x \in X,\, y \in Y} d(x,y). \tag{8} $$

where $x$ and $y$ represent the boundary pixels; $D_{N_X N_Y}$ is a square matrix with as many rows as the number of nuclei and it contains the distance between every nucleus in the image. Matrix elements are set to $+\infty$ if there is no path connecting the nuclei.

DETECTION OF POTENTIAL REGIONS

To detect the potential regions (granulomas, lymphocytic cuffs, and bronchial or bronchiolar epithelium), we need to analyze each nucleus based on its closest neighbors. There are multiple solutions to detect all potential regions from $D_{N_X N_Y}$. Appendices A.1 and A.2 provide a couple of methods which can be used to detect the potential regions. Both methods in the Appendix provide similar results; however, both require tuning of several parameters, which becomes an optimization problem in itself. Moreover, both are computationally demanding as both require solving an eigenvalue problem along with the clustering problem. An efficient and parameter-free alternative would be to use Dulmage-Mendelsohn permutation (DMP) (47,48), which is explained in the next section.

Dulmage-Mendelsohn Permutation

Based on the prior knowledge, we can put a proximity constraint on $D_{N_X N_Y}$ such that each nucleus should have $s$ neighbors within geodesic distance $r$. These constraints result in a sparse binary matrix $B_{sp}$. Construction of $B_{sp}$ can be written as:

$$ w(i,j) = \begin{cases} 1, & 0 \leq D_{N_X N_Y}(i,j) < r \\ 0, & \text{otherwise} \end{cases} \tag{9} $$

$$ K = \left\{ i \;\middle|\; \sum_{j=1}^{n} w(i,j) \geq s \right\}. \tag{10} $$

$$ B_{sp}(i,j) = \begin{cases} w(i,j), & i \in K \\ 0, & \text{otherwise} \end{cases}. \tag{11} $$

Here $n$ represents the total number of nuclei. Due to the presence of proximity constraints, the matrix $B_{sp}$ is equivalent to an adjacency graph as it tells the relationship between adjacent nuclei. This simplifies our quest as subgraphs in the adjacency graph are equivalent to the potential regions in the lung tissue. Technically, finding the connected components (subgraphs) in the adjacency graph $B_{sp}$ is equivalent to determining the potential regions in the image. To find the potential regions (connected components), we computed DMP of $B_{sp}$. DMP of $B_{sp}$ will determine the row ($\tilde{r}$) and the column ($\tilde{c}$) permutation vectors, such that $B_{sp}(\tilde{r}, \tilde{c})$ has a block upper triangular form. The strongly connected components in $B_{sp}(\tilde{r}, \tilde{c})$ represent the connected components in $B_{sp}$.

We opted for DMP as opposed to the methods in Appendix A.1 and A.2 because of its computational efficiency and availability of proximity constraints. Here, each nucleus belonging to a potential region has a minimum number of $s$ neighbors within geodesic distance $r$, which are considered as proximity constraints. Both $s$ and $r$ were experimentally chosen from the training image and set to 20 and 50, respectively.

CLASSIFICATION OF POTENTIAL REGIONS INTO GRANULOMAS, LYMPHOCYTIC CUFFS

To further classify the resulting regions into granulomas and lymphocytic cuffs, a nucleus whose geodesic distance from its second to fifth neighbor is <10 is declared as part of lymphocytic cuffs, otherwise granulomas. This requires row-wise sorting (ascending order) of distances in $D_{N_X N_Y}$. The whole process can be written as:

$$ D_s = \mathrm{sort}(D_{N_X N_Y}) \tag{12} $$

$$ C(N_i) = \begin{cases} N_g, & \sum_{j=2}^{5} D_s(i,j) > 10 \\ N_l, & \text{otherwise} \end{cases}. \tag{13} $$

Equation (13) is a classification function, which assigns nucleus $N_i$ to class granulomas ($N_g$) or lymphocytic cuffs ($N_l$) based on the sum of the geodesic distance of $N_i$ to its four closest neighbors (excluding the first neighbor).
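The gray-weighted geodesic distance of Eqs. (6) to (8) can be sketched with Dijkstra's algorithm on the prepared mask (background set to infinity, nuclei to zero, remaining tissue to one). This is a minimal illustration, not the authors' implementation: it assumes 4-connectivity, and it seeds the search from every pixel of the source nucleus, which is equivalent to using only boundary pixels because travel inside a nucleus costs nothing. The grid and the names `geodesic_from_nucleus` and `nucleus_distance` are hypothetical.

```python
import heapq
import math

INF = math.inf


def geodesic_from_nucleus(cost, sources):
    """Multi-source Dijkstra over a 2D cost grid. Moving between adjacent
    pixels u and v costs cost[u] + cost[v], the per-step term of Eq. (6).
    Pixels with infinite cost (background) are never entered."""
    rows, cols = len(cost), len(cost[0])
    dist = [[INF] * cols for _ in range(rows)]
    heap = []
    for r, c in sources:
        dist[r][c] = 0.0
        heapq.heappush(heap, (0.0, r, c))
    while heap:
        d, r, c = heapq.heappop(heap)
        if d > dist[r][c]:
            continue  # stale queue entry
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):  # assumed 4-connectivity
            nr, nc = r + dr, c + dc
            if 0 <= nr < rows and 0 <= nc < cols and cost[nr][nc] != INF:
                nd = d + cost[r][c] + cost[nr][nc]
                if nd < dist[nr][nc]:
                    dist[nr][nc] = nd
                    heapq.heappush(heap, (nd, nr, nc))
    return dist


def nucleus_distance(cost, nucleus_x, nucleus_y):
    """Eq. (8): minimum geodesic distance between two nuclei,
    or infinity if no path avoids the background."""
    dist = geodesic_from_nucleus(cost, nucleus_x)
    return min(dist[r][c] for r, c in nucleus_y)


# Hypothetical mask: 0 = nucleus, 1 = other tissue, INF = background ("no-go").
cost = [
    [0, 1, INF, 1, 0],
    [0, 1, INF, 1, 0],
    [1, 1, 1,   1, 1],
]
nucleus_x = [(0, 0), (1, 0)]
nucleus_y = [(0, 4), (1, 4)]
print(nucleus_distance(cost, nucleus_x, nucleus_y))  # 10.0
```

The shortest admissible path has to detour through the bottom row and crosses five tissue pixels, each counted twice as in the right-hand side of Eq. (6); a straight Euclidean line between the two nuclei would instead cut through the background.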

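The proximity constraints of Eqs. (9) to (11) and the grouping of nuclei into potential regions can be sketched as follows. Note the substitution: the text uses the Dulmage-Mendelsohn permutation, whereas this sketch finds connected components with a plain breadth-first search, treating the graph as undirected. The diagonal (distance zero) is counted as written in Eq. (9), and the distance matrix and the values of `r` and `s` are hypothetical toy values, not the 50 and 20 used in the study.

```python
import math
from collections import deque

INF = math.inf


def proximity_graph(D, r, s):
    """Build the sparse binary matrix B_sp from the distance matrix D."""
    n = len(D)
    # Eq. (9): mark pairs closer than r (the diagonal is included).
    w = [[1 if 0 <= D[i][j] < r else 0 for j in range(n)] for i in range(n)]
    # Eq. (10): nuclei with at least s marked entries in their row.
    K = {i for i in range(n) if sum(w[i]) >= s}
    # Eq. (11): keep only the rows of nuclei in K.
    B = [[w[i][j] if i in K else 0 for j in range(n)] for i in range(n)]
    return B, K


def potential_regions(B):
    """Connected components of B (undirected view), found by BFS.
    Nuclei without any retained neighbor are not reported."""
    n = len(B)
    neighbors = [set() for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j and (B[i][j] or B[j][i]):
                neighbors[i].add(j)
                neighbors[j].add(i)
    seen, regions = set(), []
    for start in range(n):
        if start in seen or not neighbors[start]:
            continue
        seen.add(start)
        queue, component = deque([start]), []
        while queue:
            u = queue.popleft()
            component.append(u)
            for v in sorted(neighbors[u]):
                if v not in seen:
                    seen.add(v)
                    queue.append(v)
        regions.append(sorted(component))
    return regions


# Hypothetical distances: nuclei 0-2 form a tight group, 3-4 a pair, 5 is isolated.
D = [
    [0, 2, 2, INF, INF, INF],
    [2, 0, 2, INF, INF, INF],
    [2, 2, 0, INF, INF, INF],
    [INF, INF, INF, 0, 3, INF],
    [INF, INF, INF, 3, 0, INF],
    [INF, INF, INF, INF, INF, 0],
]
B, K = proximity_graph(D, r=5, s=3)
regions = potential_regions(B)
print(K, regions)  # only the three-nucleus group satisfies the constraint
```

With these toy settings the pair and the isolated nucleus fail the neighbor-count constraint, so only the tight group is reported as a potential region.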
The geodesic distance from the first neighbor was neglected as most nuclei have at least one nucleus in close vicinity.

To avoid bronchiolar epithelium from getting classified as granulomas and lymphocytic cuffs, regions containing holes are further analyzed. Figure 7 presents one such example where holes in the regions are marked with black arrows. A bronchiolar epithelium often, but not always, contains cells without the nuclei along the periphery of the hole. We compute the shortest distance between each boundary pixel of the hole to the closest nucleus in the binary mask $B_m$. If the average distance is higher than half the length of a closest cell (assuming that the length of a cell is double the length of a nucleus), it is declared as bronchiolar epithelium. As the bronchial epithelium contains an extra amount of cytoplasm (due to the lack of nuclei) along the periphery of the hole, the average distance is higher in bronchial epithelium as compared to other regions.

Figure 7. An example where DeHiDe merges multiple regions into a single region. The blue line outlines bronchiolar epithelium while the arrows indicate the holes. A bronchiolar epithelium may contain cells without the nuclei along the periphery of the hole.

EXPERIMENTAL SETUP

Female, 6- to 8-week-old, C57BL/6 mice purchased from The Jackson Laboratory (Bar Harbor, ME) were infected with 50–100 Colony Forming Units (CFUs) of M.tb strain Erdman by aerosol using a Glas-col aerosol-generating machine. Mice were euthanized by CO2 asphyxiation between 2 and 5 weeks after infection. Following euthanasia, the lungs were harvested and fixed in formalin for at least 3 weeks prior to sectioning and staining with H&E. The H&E-stained lung slides from M.tb-infected mice were scanned with ScanScope™ (Aperio, Vista, CA). All animal experiments were approved by The Ohio State University's Institutional Animal Care and Use Committee, and the Institutional Biosafety Committee. The original experimental infection with M.tb was approved by The Ohio State University's IACUC protocol numbers 2007A0077 and its renewal, 2010A0045.

Experimental Setup 1 (ES1)

We scanned nine lung slides at 20× microscopic magnification to train and test DeHiDe. One annotated image was used to train the algorithm and to determine the parameters; the remaining eight images were used for independent testing of the trained algorithm. Each image contains multiple granuloma and lymphocytic cuff regions. On average, our dataset (results reported in Table 2) contains 72 such regions in each image. So, for the training image (annotated by the pathologist), the geodesic distance (our proposed feature) from these multiple regions was extracted. The value which best separates these regions was selected as a threshold to differentiate between the granuloma and the lymphocytic cuff regions. As a scanned tissue sample often contains huge variation within the slide, one may consider our training set as consisting of 72 different regions instead of a single slide. Taking into consideration that we are only using a single feature for classification, and that it can be considered as binary classification, 72 representative regions from the training image are sufficient to train our classifier.

On average, each scanned image has a size of around 25 K × 25 K pixels. Scanned images were processed in MATLAB® (Natick, MA). The boundaries of the regions are drawn on the original images and presented to the expert veterinarian pathologist using in-house developed software. This software provides the functionality to annotate the images via a web browser. The veterinary pathologist reviewed all the markings in a two-step process. In the first step, the task was to accept or reject the potential regions detected by DeHiDe or to add additional markings if some potential regions are not detected. In the second step, the pathologist reviewed the accuracy of

classifying potential regions into target regions, that is, granulomas, lymphocytic cuffs, and bronchial or bronchiolar epithelium.

Experimental Setup 2 (ES2)

We carefully cropped 80 images from 12 slides (these slides are different from the ones used in ES1 but were prepared in the very same manner at the same magnification). On average, each image has a size of 3 K × 3 K pixels. Each image contained multiple potential regions. These 80 images were presented to the expert veterinarian pathologist using in-house developed software. This software provides the functionality to annotate the images via a web browser. The veterinary pathologist annotated all the potential regions in all 80 images into granulomas, lymphocytic cuffs, and bronchial or bronchiolar epithelium. To determine the intra-reader variability, each image was annotated twice by the same pathologist with a 12-day interval between the readings.

RESULTS

Results of ES1

The first test was to assess the detection accuracy. Table 1 presents the results for potential regions detection by DeHiDe. It also shows the evaluation of the board-certified veterinary pathologist in agreement or disagreement with DeHiDe. The pathologist's evaluation is reported here in terms of true positive (TP: pathologist and the DeHiDe agreed), false negative (FN: DeHiDe missed, pathologist marked), and false positive (FP: DeHiDe marked, pathologist deleted). DeHiDe provides a detection accuracy of 99.35%. It is evident from Table 1 that there were only two regions that were incorrectly rejected by DeHiDe. Moreover, there is a single region that was incorrectly identified by DeHiDe as a potential region. The results show that there is a strong agreement between the board-certified veterinary pathologist and the markings by DeHiDe.

Table 1. Evaluation of potential regions detected by DeHiDe

      IMAGE 1  IMAGE 2  IMAGE 3  IMAGE 4  IMAGE 5  IMAGE 6  IMAGE 7  IMAGE 8
PF       31       61       46       49       55       49       81      115
TP       31       61       46       49       53       49       81      115
FN        0        0        0        0        2        0        0        0
FP        0        0        0        0        1        0        0        0

PF stands for the number of potential regions detected by DeHiDe. TP, FN, and FP stand for number of true positive, false negative, and false positive, respectively.

Table 2 shows the results of classification of potential regions into granulomas, lymphocytic cuffs, and bronchial or bronchiolar epithelium. Once again the evaluation of the classification results was performed by a board-certified veterinary pathologist. Although we used the same images for both detection and classification, there is a slight difference in the number of automatically classified regions (number of regions in Table 2) and the number of potential regions reported in Table 1. This comes from the fact that during classification, certain regions were automatically subdivided into two or more regions. For instance, in Table 1, we report 49 regions in Image 3; however, during classification one of the regions was automatically subdivided into two regions.

Table 2. Classification of potential regions by DeHiDe

      IMAGE 1  IMAGE 2  IMAGE 3  IMAGE 4  IMAGE 5  IMAGE 6  IMAGE 7  IMAGE 8
AC       31       61       46       50       59       49       81      120
FC       11       19       18       20       25       15       25       23
TC       20       42       28       30       34       34       56       97

AC stands for automatically classified potential regions by DeHiDe. FC and TC stand for falsely and truly classified potential regions, respectively.

On average, DeHiDe correctly classified 68.61% of the regions in the test images. Most of these potential regions reside in close proximity of each other. This often results in classification of multiple potential regions into a single region (see Fig. 7). DeHiDe resulted in a total of 90 merged regions, which were classified as FC even if each merged region contained several correct regions but one false region.

Results on ES2

During the first round of annotations, the board-certified veterinary pathologist annotated 251 granuloma, lymphocytic cuff, and bronchial epithelium regions in 80 images. In the second round of annotations, the same pathologist annotated 242 different regions in the same 80 images. There were 211 regions which were assigned the same label during the first and the second round of annotations. This resulted in an accuracy of 74.36%.

To improve the classification accuracy of DeHiDe, we appended our feature with a morphological feature. We performed morphological closing on nuclei with a disk of radius 13. The radius was chosen such that the nuclei within each potential region resulted in a huge nuclei clumped region. We computed the geodesic distance transform (30) of each clumped region where the nuclei within each clumped region served as seeds. The classification function in Eq. (13) was modified to:

$$ C(N_i) = \begin{cases} N_g, & \left( \sum_{j=2}^{5} D_s(i,j) > 10 \right) \wedge \left( L_i[\mathrm{GDT}] > 10 \right) \\ N_l, & \text{otherwise} \end{cases}. \tag{14} $$

Here $L_i$ represents the average geodesic distance between the five closest neighbors of $N_i$ to their closest corresponding neighbor. Here the five closest neighbors ($N_{i1}$, $N_{i2}$, $N_{i3}$, $N_{i4}$, and $N_{i5}$) of $N_i$ are determined based on $D_s$ while their corresponding closest neighbors ($N_{i11}$, $N_{i21}$, $N_{i31}$, $N_{i41}$, and $N_{i51}$) and their corresponding distances ($\mathrm{GDT}_{i11}$, $\mathrm{GDT}_{i21}$, $\mathrm{GDT}_{i31}$, $\mathrm{GDT}_{i41}$, and $\mathrm{GDT}_{i51}$) are determined based on the geodesic distance transform. The DeHiDe resulted in a classification accuracy of 75.78% when compared with the first round of annotations of the expert pathologist.

Original Article
classification accuracy dropped to 70.60% when compared with the second round of annotations of the expert pathologist. In addition to reporting the accuracy for each reading separately, we also report the overall performance of DeHiDe only for images where the expert pathologist produced the same ground truth during the first and second round of her annotations. In these 36 images, DeHiDe resulted in an average accuracy of 90.87%. This shows that DeHiDe performed with considerably higher accuracy when the expert opinion is consistent during the two rounds of annotations.

DISCUSSION AND CONCLUSIONS

Characterization of cell density by means of image analysis algorithms is an important task, as there is a growing need to analyze large quantities of tissue slides to identify certain types of tissues. In many cases, cell density is the most important feature. The current study contributes to this area by redefining the distance to take the heterogeneity of cells into account and then employing an efficient clustering method to identify highly dense areas. We emphasized the importance of computing the geodesic distance rather than the Euclidean distance in the context of histopathology images, especially in the presence of a heterogeneous cell environment and non-convex tissue shape. We showed that approximating internuclei distance with geodesic distance can help to detect potential regions from H&E slides. We further presented how geodesic distances can be used to classify each potential region into its own class.

High detection accuracy will have a positive impact on the pathologist's workflow. The pathologist needs to move the slide around digitally (or physically while working with a microscope) to determine potential regions because the area to be examined is very large, particularly at higher magnifications. This is not only time-consuming but also requires the pathologist to be systematic so as not to leave any areas of the tissue unexamined. For this reason, most pathologists limit their analysis to a few high-power field images, which leads to inter-reader variability and lack of standardization. DeHiDe can substantially reduce the workload of a pathologist by automatically determining potential regions in an image. Moreover, pathologists can benefit from the results for all potential regions instead of limiting themselves to only a few high-power field images.

Our emphasis in this article was to present a computational framework based on geodesic distance and demonstrate its effectiveness in TB histopathology images. The developed framework is general and can be used for classification of different histology slides with the addition of other disease-specific features. DeHiDe is intuitive and mimics the thought process of the pathologist. This allows the user (biologist/pathologist/clinician) to understand the shortcomings of the methodology and to have a feeling for where the system will work or fail. In contrast, current state-of-the-art classification methods like support vector machines (SVM) present a solution that is hard to interpret if the data are not linearly separable. For instance, performing cell classification with the help of an SVM trained on small cellular patches yields a hyperplane that cannot be easily explained in natural language. This opacity of SVM results in a lack of trust, which limits the use of image analysis frameworks in a clinical environment. Moreover, as the hyperplane is based on support vectors, the method will work only as long as the test data are similar to the training data, which is not always the case.

In our previous study, we calculated the distances between the nuclei and cytoplasm using Euclidean distances (18). In that work, each nucleus constituted a vertex of a graph. Pairs of nuclei were considered “linked” if their Euclidean distance was less than a predetermined threshold (e.g., 30 pixels). Features of this graph were extracted to represent the compactness, clustering, and spatial uniformity of nuclei and cytoplasmic components. The current study differs from that study in two respects. First, instead of computing the Euclidean distance, we computed the geodesic distance as explained in the section Internuclei Distance Computation. Second, the resulting distances are used differently: in our previous study, we used the graph features globally to classify the tissue into one of the grades, whereas in this study the distances were used locally to identify different tissue types. Because the previous approach does not take heterogeneity into account, it would not be appropriate for the current study. However, in a future study, we will use our approach globally to classify images into different grades.

The method in Ref. 25 is similar to the presented method, except that we take the tissue and background into account while computing the distance between nuclei, which was not the case in Ref. 25. Moreover, instead of computing the shortest distance between all boundary points of two nuclei, the internuclei distance was taken as the distance between nuclei. In our application, most of the cells in close proximity are similar in shape, which will bias the energy function in Ref. 25 toward internuclei distance. However, internuclei distance is a poor measure in a heterogeneous cell environment, especially when two cells are less than the size of a nucleus apart. For this reason, the approach in Ref. 25 is not appropriate for our application.

Although we developed and tested DeHiDe using lung tissues from mice experimentally infected with M.tb, the fundamental design of DeHiDe can easily be generalized and applied to analyze many types of histopathology images from complex diseases where the cellular response is heterogeneous. We recognize that DeHiDe may require the inclusion of other features such as normalized color characteristics, nearby tissue characteristics, and morphology (e.g., size and shape), as well as postprocessing operations, to improve performance, especially with regard to lesion classification. In the current study, however, the presented framework was only applied to one particular type of problem with one type of staining and tested on a limited number of cases. In our future study, we will apply this methodology to other diseases, problems that involve global decision making (e.g., grade determination), and other types of staining (e.g., IHC or immunofluorescence). We will also increase the number of training and test
cases from multiple institutions to further improve the detection and classification accuracy.

APPENDIX

A.1. Multidimensional Scaling
Classical multidimensional scaling on $D_{N_X N_Y}$ can be used to determine the surrogate representation of nuclei in m-dimensional Euclidean space (49). Formally, the embedding in m-dimensional Euclidean space can be formulated as:

$$X = E_m \Lambda_m^{1/2}, \tag{15}$$

where $E_m$ represents the matrix of m eigenvectors corresponding to the m largest eigenvalues, and $\Lambda_m$ is a diagonal matrix of those eigenvalues. Rows of $X$ correspond to the surrogate representation of nuclei in m-dimensional Euclidean space. Both $E_m$ and $\Lambda_m$ are computed by finding the eigenvectors and eigenvalues of $B$, where:

$$B = -\frac{1}{2}\left(I - w^{-1}\mathbf{1}\mathbf{1}'\right) D^{2}_{N_X N_Y} \left(I - w^{-1}\mathbf{1}\mathbf{1}'\right). \tag{16}$$

Here, w represents the number of nuclei, that is, the dimension of $D_{N_X N_Y}$; $I$ is the identity matrix, and $\mathbf{1}\mathbf{1}'$ is the square matrix of ones. Due to the geodesic nature of the distances, the resulting embedding in m-dimensional space is equivalent to the embedding achieved with Isomap (43). m is the only parameter, and it can easily be determined empirically. Once we have an m-dimensional representation of nuclei, one can easily perform clustering of these m-dimensional vectors to detect the potential regions. From a computational point of view, this method is relatively expensive, as it involves solving both an eigenvalue problem and a clustering problem.

A.2. Laplacian Eigenmaps
Another possibility is to use the normalized $D_{N_X N_Y}$ to construct an adjacency graph and compute the m-dimensional embedding in Euclidean space with the help of Laplacian Eigenmaps (42). The adjacency matrix can be computed as:

$$A_d = e^{-\frac{\left(D_{N_X N_Y}\right)^2}{t}}. \tag{17}$$

Here t (radius) can be determined empirically. To determine the m-dimensional embedding, we need to compute the eigenvalues and eigenvectors of the generalized eigenvalue problem:

$$L G = \lambda D G, \tag{18}$$

where $D$ is a diagonal matrix with entries $D_{kk} = \sum_{l} A_{d_{kl}}$ and $L = D - A_d$ is the Laplacian matrix. Now the embedding of each nucleus i in the m-dimensional Euclidean space can be written as $e_i = (G_1(i), G_2(i), \dots, G_m(i))$, where $G_0, G_1, \dots, G_{n-1}$ represent the solutions to Eq. (18) with associated eigenvalues $\lambda_0, \lambda_1, \dots, \lambda_{n-1}$ (in increasing order). Each connected component is embedded separately. This can be followed by clustering of these m-dimensional embeddings to detect the potential regions.

LITERATURE CITED
1. World Health Organization. Causes of Death 2008 Summary Tables. Global Health Observatory Data Repository; 2011.
2. Major S, Turner J, Beamer G. Tuberculosis in CBA/J Mice. Vet Pathol 2013;50:1016–1021.
3. Mustafa T, Phyu S, Nilsen R, Jonsson R, Bjune G. A mouse model for slowly progressive primary tuberculosis. Scand J Immunol 1999;50(2):127–136.
4. Li Y, Wang Y, Liu X. The role of airway epithelial cells in response to mycobacteria infection. Clin Dev Immunol 2012;2012:791392.
5. Leong FJWM, Eum S, Via LE, Barry CER, editors. Pathology of tuberculosis in the human lung. In A Color Atlas of Comparative Pathology of Pulmonary Tuberculosis. New York, NY: CRC Press; 2011. pp 53–81.
6. Cooper AM. Cell-mediated immune responses in tuberculosis. Annu Rev Immunol 2009;27:393–422.
7. Stevens A, Lowe JS, Young B. Wheater’s Basic Histopathology. Edinburgh: Churchill Livingstone; 2002.
8. Fiore M, Eroschenko VP. Atlas of Normal Histology. Philadelphia: Lea & Febiger; 1988.
9. Belkacem-Boussaid K, Samsi S, Lozanski G, Gurcan MN. Automatic detection of follicular regions in H&E images using iterative shape index. Comput Med Imaging Graph 2011;35(7-8):592–602.
10. Gurcan MN, Boucheron LE, Can A, Madabhushi A, Rajpoot NM, Yener B. Histopathological image analysis: A review. IEEE Rev Biomed Eng 2009;2:147–171.
11. Nandy K, Gudla PR, Amundsen R, Meaburn KJ, Misteli T, Lockett SJ. Automatic segmentation and supervised learning based selection of nuclei in cancer tissue images. Cytometry Part A 2012;81A(9):743–754.
12. Wang W, Ozolek JA, Rohde GK. Detection and classification of thyroid follicular lesions based on nuclear structure from histopathology images. Cytometry Part A 2010;77A(5):485–494.
13. Gudla PR, Nandy K, Collins J, Meaburn KJ, Misteli T, Lockett SJ. A high-throughput system for segmenting nuclei using multiscale techniques. Cytometry Part A 2008;73A(5):451–466.
14. Lin G, Chawla MK, Olson K, Barnes CA, Guzowski JF, Bjornsson C, Shain W, Roysam B. A multi-model approach to simultaneous segmentation and classification of heterogeneous populations of cell nuclei in 3D confocal microscope images. Cytometry Part A 2007;71A(9):724–736.
15. Kong H, Gurcan M, Belkacem-Boussaid K. Partitioning histopathological images: An integrated framework for supervised color-texture segmentation and cell splitting. IEEE Trans Med Imaging 2011;30(9):1661–1677.
16. Kothari S, Phan JH, Moffitt RA, Stokes TH, Hassberger SE, Chaudry Q, Young AN, Wang MD. Automatic batch-invariant color segmentation of histological cancer images. In International Symposium on Biomedical Imaging (ISBI), 2011: IEEE; 2011. pp 657–660.
17. Chen C, Wang W, Ozolek JA, Rohde GK. A flexible and robust approach for segmenting cell nuclei from 2D microscopy images using supervised learning and template matching. Cytometry Part A 2013;83A(5):495–507.
18. Oztan B, Kong H, Gurcan M, Yener B, editors. Follicular lymphoma grading using cell-graphs and multi-scale feature analysis. SPIE Medical Imaging; 2012: International Society for Optics and Photonics.
19. Singanamalli A, Sparks R, Rusu M, Shih N, Ziober A, Tomaszewski J, Rosen M, Feldman M, Madabhushi A, editors. Identifying in vivo DCE MRI parameters correlated with ex vivo quantitative microvessel architecture: A radiohistomorphometric approach. SPIE Medical Imaging; 2013: International Society for Optics and Photonics.
20. Berkhin P. A Survey of Clustering Data Mining Techniques. Grouping Multidimensional Data. Berlin: Springer; 2006. pp 25–71.
21. Han J, Kamber M, Pei J. Data Mining: Concepts and Techniques. San Francisco: Morgan Kaufmann; 2006.
22. Rote G. Computing the minimum Hausdorff distance between two point sets on a line under translation. Inform Process Lett 1991;38(3):123–127.
23. Huttenlocher DP, Klanderman GA, Rucklidge WJ. Comparing images using the Hausdorff distance. IEEE Trans Pattern Anal Mach Intell 1993;15(9):850–863.
24. Toussaint GT, Bhattacharya BK. Optimal algorithms for computing the minimum distance between two finite planar sets. Pattern Recogn Lett 1983;2(2):79–82.
25. Margolis D, Santamaria-Pang A, Rittscher J. Tissue segmentation and classification using graph-based unsupervised clustering. In 9th International Symposium on Biomedical Imaging (ISBI): IEEE; 2012. pp 162–165.
26. Sertel O, Kong J, Catalyurek UV, Lozanski G, Saltz JH, Gurcan MN. Histopathological image analysis using model-based intermediate representations and color texture: Follicular lymphoma grading. J Signal Process Syst 2009;55(1-3):169–183.
27. Rutovitz D. Data structures for operations on digital images. Pictorial Pattern Recognition. Washington, DC: Thompson Book Co.; 1968:105–133.
28. Levi G, Montanari U. A grey-weighted skeleton. Inform Control 1970;17(1):62–91.
29. Toivanen PJ. New geodosic distance transforms for gray-scale images. Pattern Recogn Lett 1996;17(5):437–450.
30. Soille P. Morphological Image Analysis: Principles and Applications, 2nd ed. Secaucus, NJ, USA: Springer-Verlag New York, Inc.; 2002.
31. Bertelli L, Sumengen B, Manjunath BS, editors. Redundancy in all pairs fast marching method. In: IEEE International Conference on Image Processing; 2006.
32. Noyel G, Angulo J, Jeulin D. Fast computation of all pairs of geodesic distances. Image Anal Stereol 2011;30(2):101–109.
33. Macenko M, Niethammer M, Marron JS, Borland D, Woosley JT, Guan X, Schmitt C, Thomas NE, editors. A method for normalizing histology slides for quantitative analysis. IEEE International Symposium on Biomedical Imaging (ISBI); 2009.
34. Niazi MKK, Pennell M, Elkins C, Hemminger J, Jin M, Kirby S, et al., editors. Entropy based quantification of Ki-67 positive cell images and its evaluation by a reader study. SPIE Medical Imaging; 2013: International Society for Optics and Photonics.
35. Das H WZ, Niazi MKK, Aggarwal R, Lu J, Kanji S, Das M, Joseph M, Gurcan M, Cristini V. Impact of diffusion barriers to small cytotoxic molecules on the efficacy of immunotherapy in breast cancer. PLoS One 2013;8(4):e61398.
36. Niazi MKK, Satoskar AA, Gurcan MN, editors. An Automated Method for Counting Cytotoxic T-cells from CD8 Stained Images of Renal Biopsies. SPIE Medical Imaging; 2013: International Society for Optics and Photonics.
37. Papa JP, Falcao AX, Suzuki CTN. Supervised pattern classification based on optimum-path forest. Int J Imaging Syst Technol 2009;19(2):120–131.
38. Malmberg F, Lindblad J, Sladoje N, Nyström I. A graph-based framework for sub-pixel image segmentation. Theor Comput Sci 2011;412(15):1338–1349.
39. Khalid Khan Niazi M, Nyström I, Ibrahim MT, Guan L, editors. Bias Field Correction Using Grey-weighted Distance Transform Applied on MR Volumes. In: IEEE International Symposium on Biomedical Imaging: From Nano to Macro; 2011.
40. Khan MK, Nyström I. A modified particle swarm optimization applied in image registration. In: 2010 International Conference on Pattern Recognition; 2010;1(9):2302–2305.
41. Frey BJ, Dueck D. Clustering by passing messages between data points. Science 2007;315(5814):972–976.
42. Belkin M, Niyogi P. Laplacian eigenmaps and spectral techniques for embedding and clustering. Adv Neural Inform Process Syst 2001;14:585–591.
43. Tenenbaum JB, De Silva V, Langford JC. A global geometric framework for nonlinear dimensionality reduction. Science 2000;290(5500):2319–2323.
44. Soille P. Generalized geodesic distances applied to interpolation and shape description. In: Mathematical Morphology and Its Applications to Image Processing. The Netherlands: Springer; 1994. pp 193–200.
45. Gedda M. Contributions to 3D Image Analysis using Discrete Methods and Fuzzy Techniques: With Focus on Images from Cryo-Electron Tomography [Doctoral dissertation]. Acta Universitatis Upsaliensis: Uppsala University; 2010.
46. Niazi M. Image Filtering Methods for Biomedical Applications [Doctoral dissertation]. Acta Universitatis Upsaliensis: Uppsala University; 2011.
47. Pothen A, Fan C-J. Computing the block triangular form of a sparse matrix. ACM Trans Math Softw 1990;16(4):303–324.
48. Davis TA. Direct Methods for Sparse Linear Systems. Philadelphia: Society for Industrial and Applied Mathematics; 2006.
49. Zigelman G, Kimmel R, Kiryati N. Texture mapping using surface flattening via multidimensional scaling. IEEE Trans Visual Comput Graph 2002;8(2):198–207.
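
As an illustration of the classical multidimensional scaling step in Appendix A.1, the following minimal Python sketch applies the double centering of Eq. (16) to a distance matrix and then forms the coordinates of Eq. (15). It is not the DeHiDe implementation: the function name `classical_mds`, the clipping of negative eigenvalues (which can appear because geodesic distances need not be exactly Euclidean), and the three-point toy data are assumptions made only for this example.

```python
import numpy as np


def classical_mds(dist, m=2):
    """Embed w points in m dimensions from a symmetric (w, w) distance matrix."""
    dist = np.asarray(dist, dtype=float)
    w = dist.shape[0]
    # Centering matrix I - (1/w) * 11', with 11' the square matrix of ones
    centering = np.eye(w) - np.ones((w, w)) / w
    # Double-centered squared distances, following Eq. (16)
    b = -0.5 * centering @ (dist ** 2) @ centering
    # eigh handles symmetric matrices and returns eigenvalues in ascending order
    eigvals, eigvecs = np.linalg.eigh(b)
    order = np.argsort(eigvals)[::-1][:m]  # indices of the m largest eigenvalues
    # Assumption: clip small negative eigenvalues caused by non-Euclidean input
    top_vals = np.clip(eigvals[order], 0.0, None)
    # Following Eq. (15): each row holds the m coordinates of one point
    return eigvecs[:, order] * np.sqrt(top_vals)


# Hypothetical toy input: three collinear points at positions 0, 3, and 7
points = np.array([[0.0], [3.0], [7.0]])
dist = np.abs(points - points.T)
embedding = classical_mds(dist, m=1)
recovered = np.abs(embedding - embedding.T)
```

Because the toy points are collinear, a one-dimensional embedding is enough to reproduce their pairwise distances up to floating-point error; with geodesic internuclei distances the match would generally be approximate, and m would be chosen empirically as the appendix describes.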

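
The Laplacian Eigenmaps alternative of Appendix A.2 can be sketched in the same spirit. The code below builds the heat-kernel adjacency of Eq. (17), forms the degree and Laplacian matrices, and solves the generalized eigenvalue problem of Eq. (18) with SciPy. The function name `laplacian_eigenmap`, the value t = 16, the use of unnormalized toy distances, and the assumption that all nuclei belong to one connected component are illustrative choices, not details taken from the paper.

```python
import numpy as np
from scipy.linalg import eigh


def laplacian_eigenmap(dist, m=1, t=16.0):
    """Return (eigenvalues, coordinates) of an m-dimensional Laplacian eigenmap."""
    dist = np.asarray(dist, dtype=float)
    # Heat-kernel adjacency from pairwise distances, following Eq. (17)
    adjacency = np.exp(-(dist ** 2) / t)
    # Diagonal degree matrix: each entry is the corresponding row sum of the adjacency
    degree = np.diag(adjacency.sum(axis=1))
    laplacian = degree - adjacency
    # Generalized symmetric problem L G = lambda D G of Eq. (18);
    # scipy.linalg.eigh returns the eigenvalues in ascending order
    eigvals, eigvecs = eigh(laplacian, degree)
    # Skip G_0 (constant vector, eigenvalue near zero) and keep G_1 .. G_m
    return eigvals[1:m + 1], eigvecs[:, 1:m + 1]


# Hypothetical toy input: two well-separated groups of points on a line
positions = np.array([0.0, 1.0, 2.0, 10.0, 11.0, 12.0])
dist = np.abs(positions[:, None] - positions[None, :])
lam, coords = laplacian_eigenmap(dist, m=1, t=16.0)
```

In this toy case the first nontrivial eigenvector takes one sign on the first group and the opposite sign on the second, which is the property a subsequent clustering step would exploit. If the adjacency graph had several connected components, each would need to be embedded separately, as noted in the appendix.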
Cytometry Part A 85A: 151–161, 2014
