CHAPTER-4
4.1 Background
In this section, the various approaches employed for feature extraction and selection are
discussed.
The ABCD rule is one of the most efficient and robust approaches employed by
dermatologists in skin lesion identification and classification to assess the risk of malignancy of a
pigmented lesion. This method can provide a more objective and reproducible diagnosis of
melanoma and has the added advantage of fast computation. This feature extraction approach
employs four main parameters. They are:
4.2.1.1 Asymmetry
$$N_P = \sum_{S=1}^{2} \frac{\Delta N_S}{N_M} \qquad (4.1)$$
where S indexes the symmetry axis (major or minor), $\Delta N_S$ is the area of the non-overlapping
zone after folding about axis S, and $N_M$ is the total area of the lesion. To measure asymmetry in
terms of color, the computation is based on the histograms of the three RGB components of each part
of the lesion and the chi-square distance given below.
$$Q(e_1, e_2) = \sum_{x=1}^{K} \frac{(e_1(x) - e_2(x))^2}{e_1(x) + e_2(x)} \qquad (4.2)$$
Finally, the asymmetry score, N, is calculated as the average of the two asymmetry scores:
the one in terms of form and the one in terms of color.
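The color-asymmetry computation of Eq. 4.2 can be sketched in a few lines of NumPy. This is a minimal illustration, not the thesis implementation: the function names, the bin count, and the small epsilon guard against empty bins are all choices made here for the sketch.

```python
import numpy as np

def chi_square_distance(h1, h2, eps=1e-12):
    """Chi-square distance between two histograms (Eq. 4.2)."""
    h1 = np.asarray(h1, dtype=float)
    h2 = np.asarray(h2, dtype=float)
    num = (h1 - h2) ** 2
    den = h1 + h2 + eps          # eps avoids division by zero on empty bins
    return float(np.sum(num / den))

def color_asymmetry(half1, half2, bins=16):
    """Average chi-square distance over the R, G, B histograms of two
    lesion halves. half1, half2: (N, 3) arrays of RGB values in [0, 255]."""
    score = 0.0
    for c in range(3):
        h1, _ = np.histogram(half1[:, c], bins=bins, range=(0, 255), density=True)
        h2, _ = np.histogram(half2[:, c], bins=bins, range=(0, 255), density=True)
        score += chi_square_distance(h1, h2)
    return score / 3.0
```

Identical halves score zero; the more the per-channel color distributions of the two halves differ, the larger the score.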
4.2.1.2 Border
Normally, the border of a benign lesion is clearly defined, whereas border irregularity is
predictive of malignant characteristics during growth and propagation. To evaluate a lesion, it
is divided into eight sectors, as shown in figure 4.1.
Within each sector, a strong and sharp cut pattern at the periphery receives a score of 1;
in contrast, an indistinct borderline is given a score of 0. In this manner, the maximum irregular-border
score of a lesion is 8 and the minimum is 0. A calculation based on the Euclidean distance
and the standard deviation in each sector can also be performed. The Euclidean distance $Q_x$ is

$$Q_x = \sqrt{(i_2 - i_1)^2 + (j_2 - j_1)^2} \qquad (4.3)$$

where $(i_2, j_2)$ are the coordinates of the center of the lesion and $(i_1, j_1)$ are the coordinates of
pixel $x$.
$$Distance = \sum_{x=1}^{k} Q_x \qquad (4.4)$$
with k being the number of pixels on the edge belonging to the considered sector. The
Euclidean distance between the center of the lesion and pixel $x$ is given above, and the
standard deviation T for each sector can be calculated with the following equation:
$$T = \left( \frac{1}{k} \sum_{x=1}^{k} (i_x - \bar{i})^2 \right)^{1/2} \qquad (4.5)$$
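The per-sector statistics of Eqs. 4.3 to 4.5 can be sketched as follows; this is an illustrative NumPy version under the assumption that the boundary pixels and the lesion centroid are already available, with sector assignment by angle (the function name and sector ordering are choices made here).

```python
import numpy as np

def sector_border_stats(boundary, center, n_sectors=8):
    """Per-sector sum (Eq. 4.4) and standard deviation (Eq. 4.5) of the
    center-to-boundary Euclidean distances (Eq. 4.3).

    boundary: (K, 2) array of (row, col) boundary pixel coordinates.
    center:   (row, col) coordinates of the lesion centroid."""
    boundary = np.asarray(boundary, dtype=float)
    d = boundary - np.asarray(center, dtype=float)
    dist = np.hypot(d[:, 0], d[:, 1])                 # Eq. 4.3 for every pixel
    angle = np.arctan2(d[:, 0], d[:, 1]) % (2 * np.pi)
    sector = (angle / (2 * np.pi) * n_sectors).astype(int) % n_sectors
    sums = np.zeros(n_sectors)
    stds = np.zeros(n_sectors)
    for s in range(n_sectors):
        q = dist[sector == s]
        if q.size:
            sums[s] = q.sum()                         # Eq. 4.4
            stds[s] = q.std()                         # Eq. 4.5 (population std)
    return sums, stds
```

For a perfectly circular boundary every sector has the same radius, so all sector standard deviations are (numerically) zero; irregular borders give nonzero per-sector spread.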
4.2.1.3 Colors
There are six colors whose presence characterizes melanoma, namely, white, red, light
brown, dark brown, blue-gray and black. In this thesis and in the methodology executed, each
color has been assigned a score of '1'. Hence, a lesion can score a maximum of 6 if all
six colors are present, and 1 is the minimum score. To validate the existence of
one or more colors in the lesion, the image was converted from the RGB model to the CIE Lab
color space, because the distance between two colors in the RGB color space does not
reflect the real 'difference' perceived by the naked eye as well as the CIE Lab space does. Considering
the above, the color score for a lesion with D = 2 can be shown as in the following figure 4.2.
Figure 4.2: Color score calculation for a lesion:D=2(light brown and dark brown)
The distance between the desired color and each pixel is computed in the CIE Lab space as

$$\Delta E = \sqrt{(C_1 - C_2)^2 + (m_1 - m_2)^2 + (n_1 - n_2)^2} \qquad (4.6)$$

where C1, m1 and n1 are the components of the CIE Lab color space of the desired color and C2,
m2 and n2 are those of each pixel of the image.
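A color-presence check of this kind can be sketched with a plain NumPy sRGB-to-Lab conversion (D65 white point) followed by a Euclidean distance test in Lab. The distance threshold `max_dist` and the minimum pixel fraction `min_frac` are hypothetical parameters introduced for this sketch; the thesis does not specify them.

```python
import numpy as np

def rgb_to_lab(rgb):
    """Convert sRGB values in [0, 255] to CIE L*a*b* (D65 white point)."""
    c = np.asarray(rgb, dtype=float) / 255.0
    c = np.where(c > 0.04045, ((c + 0.055) / 1.055) ** 2.4, c / 12.92)
    M = np.array([[0.4124, 0.3576, 0.1805],
                  [0.2126, 0.7152, 0.0722],
                  [0.0193, 0.1192, 0.9505]])
    xyz = c @ M.T / np.array([0.95047, 1.0, 1.08883])   # normalize by white
    f = np.where(xyz > 0.008856, np.cbrt(xyz), 7.787 * xyz + 16.0 / 116.0)
    L = 116.0 * f[..., 1] - 16.0
    a = 500.0 * (f[..., 0] - f[..., 1])
    b = 200.0 * (f[..., 1] - f[..., 2])
    return np.stack([L, a, b], axis=-1)

def color_score(pixels_rgb, reference_lab, max_dist=20.0, min_frac=0.01):
    """Count how many reference colors are present in the lesion
    (thresholds are illustrative, not from the thesis)."""
    lab = rgb_to_lab(pixels_rgb)
    score = 0
    for ref in reference_lab:
        d = np.linalg.norm(lab - np.asarray(ref), axis=-1)  # Euclidean in Lab
        if np.mean(d < max_dist) >= min_frac:               # enough close pixels
            score += 1
    return score
```

With six reference Lab colors (one per melanoma color), the returned score is exactly the 0-to-6 color score described above.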
4.2.1.4 Diameter
Experts in the field of medicine use color as a means to classify lesions on skin, and most
CAD systems exploit this color feature. Color statistics such as mean color and color variance
[208] are commonly used in dermatoscopy analysis. These parameters are normally computed in
the RGB color space; however, other color spaces have also been tried and tested [209]. In the
RGB color code, a specific color can be represented as a combination of three primary colors, that
is, red (R), green (G) and blue (B).
Figure 4.4 Histograms of (top) amplitude and (bottom) orientation for (left) melanoma and (right) non-melanoma
lesions.
The combination quantities are given by the color components. But the RGB model
cannot be completely depended upon: for one, it is not perceptually uniform, as it is based on how
the values were acquired in the first place; and secondly, the three color channels are shown to be
highly correlated [210]. These drawbacks are overcome by using other color representations.
Biologically inspired color spaces such as the opponent color space (Opp) [211] are used. Other
models include Hue Saturation Value (HSV) and Hue Saturation Intensity (HSI), which relate to
how humans perceive color as hue, saturation and brightness [210], and CIE L*a*b* and CIE
L*u*v* [49], which are perceptually uniform color spaces. There is a one-to-one mapping between
the pairs of color spaces, but the components represented by each of the models are different, and
the resulting values are modified to present the respective property of that model. The histogram
will also vary according to the color space used, as shown in figure 4.4.
This thesis considers all six color spaces mentioned above, and the color distribution
in the lesion region R (or R1, R2) was characterized using a set of three color histograms, each of
them with Mc bins. The color histogram associated with the color channel Ic(x), c ∈ {1, 2, 3},
was obtained using the following expression:
$$h_c(i) = \frac{1}{N} \sum_{x \in R} b_c^i(I_c(x)), \quad i = 1, \ldots, M_c \qquad (4.7)$$
where $N = \#R$ denotes the number of pixels in region R and $i$ is the histogram bin. The bins are
defined by splitting the color component range (which depends on the color space) into $M_c$
subintervals of equal length.
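The histogram of Eq. 4.7 can be sketched directly with NumPy's equal-width binning; the function name and the default of 16 bins per channel are choices made for this illustration, and the three per-channel histograms are concatenated into one feature vector as in figure 4.5.

```python
import numpy as np

def color_histograms(pixels, n_bins=16, ranges=((0, 255),) * 3):
    """Concatenated per-channel color histograms of a lesion region (Eq. 4.7).

    pixels: (N, 3) array of color values for the pixels in region R;
    each channel range is split into n_bins equal-length subintervals."""
    N = pixels.shape[0]
    hists = []
    for c in range(3):
        h, _ = np.histogram(pixels[:, c], bins=n_bins, range=ranges[c])
        hists.append(h / N)                 # normalize by #R as in Eq. 4.7
    return np.concatenate(hists)
```

Each of the three sub-histograms sums to 1 (every region pixel falls in exactly one bin), so the concatenated vector sums to 3.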
The following figure 4.5 illustrates the concatenation of the histograms $h_c$ for
four different color spaces, after splitting region R into $R_1$ and $R_2$. The differences between
melanoma and non-melanoma features are more noticeable in these cases.
Figure 4.5 Concatenated histograms of (first row) RGB, (second row) 𝐿 ∗ 𝑢 ∗ 𝑣 ∗, (third row) Opp, and (fourth row)
HSV for (left) melanoma and (right) non-melanoma lesions, computed in R1 and R2.
The texture of an image refers to the manner in which intensity and color are spatially
organized in an image. Among the many different ways in which this can be characterized, pixel
statistics is one. This classic method consists of computing the statistics of pairs of
neighboring pixels, using the co-occurrence matrix [212].
Image retrieval systems use texture as an important feature, and texture descriptors can be
categorized into two groups: statistical and spectral. For spatial distribution, a single texture
feature may be sufficient; at other times a combination of features will be more effective.
Determining what to use remains the primary goal. A Fourier transform can also be applied to an
image to describe texture, whereby spectral energy can be characterized into various frequency
bands [213]. Other image transforms used are wavelets [214], Laplacian pyramids [215], and
linear filters (e.g., Law texture features), among others [216]. Currently there are massive
scientific image databases from which humans can be expected to analyze and extract knowledge.
However, no matter how logically simple this task of analysis and knowledge gain from vast
resources looks, the process consists of some computations that are in fact executed better by
machines. This thesis, therefore, focuses on intelligent image feature extraction, typically by
the use of categorized or multi-resolution analysis:
Statistical approach
Gabor filter
Multi-resolution wavelet
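Of the approaches listed above, the Gabor filter can be sketched compactly: a Gaussian envelope modulated by an oriented sinusoidal carrier, applied at several orientations to produce texture-energy features. The kernel parameters below (size, sigma, wavelength, aspect ratio) are illustrative values, not the thesis's settings, and the convolution is done in the frequency domain for brevity.

```python
import numpy as np
from numpy.fft import fft2, ifft2

def gabor_kernel(ksize=21, sigma=4.0, theta=0.0, lambd=10.0, gamma=0.5):
    """Real part of a 2D Gabor kernel (parameter values are illustrative)."""
    half = ksize // 2
    y, x = np.mgrid[-half:half + 1, -half:half + 1].astype(float)
    xr = x * np.cos(theta) + y * np.sin(theta)     # rotate coordinates
    yr = -x * np.sin(theta) + y * np.cos(theta)
    envelope = np.exp(-(xr ** 2 + (gamma * yr) ** 2) / (2 * sigma ** 2))
    carrier = np.cos(2 * np.pi * xr / lambd)
    return envelope * carrier

def gabor_energy(image, thetas=(0.0, np.pi / 4, np.pi / 2, 3 * np.pi / 4)):
    """Texture features: mean squared filter response at several orientations,
    computed via frequency-domain (circular) convolution."""
    feats = []
    for t in thetas:
        kf = fft2(gabor_kernel(theta=t), s=image.shape)
        resp = np.real(ifft2(fft2(image) * kf))
        feats.append(np.mean(resp ** 2))
    return np.array(feats)
```

An image with oriented stripes produces the strongest energy at the matching filter orientation, which is exactly what makes such responses useful as texture features.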
Many research works have used gradient histograms to characterize textures, which in turn
have helped achieve excellent results. This thesis employed two gradient histograms. Firstly, the
RGB image was converted to a gray-level image by selecting the color channel with the highest
entropy [217]. To compute the image gradient, this gray-level image was then filtered using a
Gaussian filter with σ = 2, and the gradient vector $g(x) = [g_1(x)\ g_2(x)]^T$ was estimated at
each point using Sobel operator masks. Next, the gradient magnitude and orientation were
retrieved as follows:
$$g(x) = \sqrt{g_1^2(x) + g_2^2(x)}, \qquad \phi(x) = \tan^{-1}\frac{g_2(x)}{g_1(x)} \qquad (4.8)$$
The histograms of the gradient amplitude and orientation are defined as follows:
$$a(i) = \frac{1}{N} \sum_{x \in R} b_i(g(x)), \quad i = 1, \ldots, M_a \qquad (4.9)$$

$$\phi(i) = \frac{1}{N} \sum_{x \in R} \tilde{b}_i(\phi(x)), \quad i = 1, \ldots, M_\phi \qquad (4.10)$$
where R denotes the set of pixels that were classified as lesion, $N = \#R$ is the number of lesion
pixels, and $M_a$ and $M_\phi$ are the numbers of bins used in the amplitude and orientation
histograms. $b_i(\cdot)$ and $\tilde{b}_i(\cdot)$ are the characteristic functions of the ith histogram
bin, equal to 1 when the argument falls inside the ith bin and 0 otherwise.
When the lesion region R is split into the inner region $R_1$ and the border $R_2$, individual
histograms were obtained for each of these regions using the same expressions, replacing R by $R_i$
and N by $N_i$.
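The gradient-histogram features of Eqs. 4.8 to 4.10 can be sketched as below. This is a minimal NumPy version: the Sobel masks are applied via array slicing, the Gaussian pre-smoothing with σ = 2 described above is omitted for brevity, and the bin counts are illustrative.

```python
import numpy as np

def sobel_gradients(gray):
    """Gradient components g1, g2 via 3x3 Sobel masks (edge-replicated borders)."""
    p = np.pad(np.asarray(gray, dtype=float), 1, mode='edge')
    g1 = (p[:-2, 2:] + 2 * p[1:-1, 2:] + p[2:, 2:]
          - p[:-2, :-2] - 2 * p[1:-1, :-2] - p[2:, :-2])   # horizontal change
    g2 = (p[2:, :-2] + 2 * p[2:, 1:-1] + p[2:, 2:]
          - p[:-2, :-2] - 2 * p[:-2, 1:-1] - p[:-2, 2:])   # vertical change
    return g1, g2

def gradient_histograms(gray, mask, n_amp=16, n_ori=16):
    """Normalized histograms of gradient amplitude and orientation over the
    lesion pixels (Eqs. 4.8-4.10). mask: boolean lesion mask R."""
    g1, g2 = sobel_gradients(gray)
    amp = np.hypot(g1, g2)[mask]              # Eq. 4.8, magnitude
    ori = np.arctan2(g2, g1)[mask]            # Eq. 4.8, orientation
    N = mask.sum()
    h_a, _ = np.histogram(amp, bins=n_amp, range=(0, amp.max() + 1e-9))
    h_o, _ = np.histogram(ori, bins=n_ori, range=(-np.pi, np.pi))
    return h_a / N, h_o / N                   # Eqs. 4.9 and 4.10
```

Splitting the mask into inner and border regions and calling the function once per region reproduces the $R_1$/$R_2$ variant described above.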
The formation of the border of a lesion and the geometrical properties of its shape offer
enough information for the detection of melanoma. There are four features: color, asymmetry,
border irregularity and differential structures; in the ABCD rule of dermoscopy [218], however,
asymmetry is of primary importance. A number of studies on quantifying asymmetry in skin
lesions have been carried out. In some methods, such as the principal axis [219], the major axis
of the best-fit ellipse [220], the Fourier transform [222] and the longest or shortest diameter
[223], a symmetry axis is evaluated in a certain manner and the two halves of the lesion along
the axis are compared. Another approach considers geometrical measurements over the whole of
the lesion to determine symmetry, e.g., symmetric distance and circularity [223]. Other methods
use the circularity index (or thinness ratio) to measure border irregularity in dermoscopy images
[221], [224]. Apart from these measurements, other border and shape features extracted from the
lesion include bulkiness, irregularity indices and fractal dimension [225]. This research work
incorporates the extraction of some regular geometry features, namely the ABCD features, area,
perimeter, greatest diameter, circularity index, irregularity index A, irregularity index B, and
asymmetry index, to assist in the next stage of classification.
Detecting the border, or segmentation, is the main prerequisite for the extraction of border
features; during this step the lesion is separated from the surrounding normal skin. The output of
the segmentation step is a black-and-white image showing the segmentation plane. Following the
arrangement of the lesion pixels in a 2D matrix and the corresponding boundary pixels in a vector,
a group of 11 geometry-based features is extracted from each dermoscopy image; they are listed
below:
Area (A): the number of pixels inside the lesion boundary.
Perimeter (P): the number of pixels on the lesion boundary.
Greatest diameter (GD): the distance between the two farthest points on the boundary such that
the connecting line passes through the lesion's centroid C. The centroid is given by:
$$(x_c, y_c) = \left( \frac{\sum_{i=1}^{n} x_i}{n}, \frac{\sum_{i=1}^{n} y_i}{n} \right) \qquad (4.13)$$
where n is the number of pixels inside the lesion and $(x_i, y_i)$ are the coordinates of the ith
lesion pixel.
In this thesis, the dermoscopic images considered have fairly similar spatial resolution; thus,
there has been no scale issue for features such as area and perimeter. When the images considered
are captured under varied magnification and have different resolutions, a normalization procedure
is required before measuring these features.
Shortest diameter (SD): the distance between the two nearest boundary points such that the
connecting line passes through the lesion's centroid.
$$Circularity\ index\ (CRC) = \frac{4A\pi}{P^2} \qquad (4.14)$$
$$Irregularity\ index\ A\ (IrA) = \frac{P}{A} \qquad (4.15)$$

$$Irregularity\ index\ B\ (IrB) = \frac{P}{GD} \qquad (4.16)$$
$$Irregularity\ index\ C\ (IrC) = P \times \left( \frac{1}{SD} - \frac{1}{GD} \right) \qquad (4.17)$$
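Given the four scalar measurements above, the indices of Eqs. 4.14 to 4.17 are one-liners; the sketch below assumes the area, perimeter, greatest diameter and shortest diameter have already been measured, and the reconstructed form of IrC is noted in the comment.

```python
import numpy as np

def shape_indices(area, perimeter, gd, sd):
    """Circularity and irregularity indices (Eqs. 4.14-4.17)."""
    crc = 4 * np.pi * area / perimeter ** 2    # Eq. 4.14; equals 1 for a circle
    ir_a = perimeter / area                    # Eq. 4.15
    ir_b = perimeter / gd                      # Eq. 4.16
    ir_c = perimeter * (1.0 / sd - 1.0 / gd)   # Eq. 4.17 (reconstructed form)
    return crc, ir_a, ir_b, ir_c
```

A sanity check: for an ideal circle of radius r (area πr², perimeter 2πr, GD = SD = 2r), CRC is exactly 1 and IrC is exactly 0, and both decrease from these ideals as the border grows more irregular.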
Taking the principal axis as the major symmetry axis and its 90° rotation as the minor
axis, the asymmetry indices can be defined from the difference in areas of the two halves of the
lesion. The orientation θ of the principal axis satisfies:
$$\tan 2\theta = \frac{2\sum_{i=1}^{n} x_i y_i}{\sum_{i=1}^{n} x_i^2 - \sum_{i=1}^{n} y_i^2} \qquad (4.19)$$
After the major and minor symmetry axes are evaluated, the lesion is folded along each axis, and
the difference in area AD between the two halves of the lesion is calculated by applying the XOR
operation on the binary segmentation plane. The asymmetry index is measured by:
$$Asymmetry\ Index = \frac{AD}{A} \times 100 \qquad (4.20)$$
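The fold-and-XOR computation of Eq. 4.20 can be sketched as follows. For simplicity this sketch folds about a horizontal or vertical line through the centroid rather than the principal axis of Eq. 4.19 (which would require rotating the mask first); counting each non-overlapping pixel once, via the factor 1/2, is a choice made here.

```python
import numpy as np

def asymmetry_index(mask, axis=0):
    """Asymmetry index (Eq. 4.20): fold the binary lesion mask about the
    centroid row (axis=0) or column (axis=1) and XOR the two halves.
    Assumes the lesion does not touch the image border (roll wrap-around)."""
    mask = np.asarray(mask, dtype=bool)
    ys, xs = np.nonzero(mask)
    c = int(round(ys.mean())) if axis == 0 else int(round(xs.mean()))
    folded = np.flip(mask, axis=axis)
    # after this roll, folded[i] == mask[2c - i]: reflection about index c
    shift = mask.shape[axis] - 1 - 2 * c
    folded = np.roll(folded, -shift, axis=axis)
    ad = np.logical_xor(mask, folded).sum() / 2.0   # non-overlap counted once
    return 100.0 * ad / mask.sum()
```

A symmetric shape folds exactly onto itself and scores 0; the more area fails to overlap after folding, the closer the index gets to 100.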
It is well known that depth plays a major role in melanoma detection and precise
classification. Therefore, in addition to 2D feature extraction, 3D feature extraction is also
performed in this study to enhance the quality of the selected features, enabling optimal
classification and, in turn, a more efficient prediction in melanoma detection. The
following section discusses the 3D feature extraction process.
Let the density distribution function be represented as $f(x, y)$; then its 2D $(p + q)$th-order
moments $m_{pq}$ are defined in terms of Riemann integrals as:

$$m_{pq} = \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} x^p y^q f(x, y)\, dx\, dy, \quad p, q = 0, 1, 2, \ldots$$
If it is assumed that $f(x, y)$ is piecewise continuous, and therefore a bounded function, and
that it can have non-zero values only in a finite part of the $(x, y)$ plane, then its moments of all
orders exist, the moment sequence $m_{pq}$ is uniquely determined by $f(x, y)$, and
conversely $f(x, y)$ is uniquely determined by $m_{pq}$.
The central moments $\mu_{pq}$ are defined as

$$\mu_{pq} = \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} (x - \bar{x})^p (y - \bar{y})^q f(x, y)\, dx\, dy$$

where $\bar{x} = m_{10}/m_{00}$ and $\bar{y} = m_{01}/m_{00}$ give the center of gravity of the image.
When an intensity image is considered, $m_{00}$ represents its total intensity, while for a binary
image it denotes the area. Central moments are invariant under translation. Shapes of images can
be represented by both central and geometric moments. Using the zeroth central moment, the
central moments of other orders can be normalized as follows:
$$\eta_{pq} = \frac{\mu_{pq}}{\mu_{00}^r}, \quad r = \frac{p + q + 2}{2}, \quad p + q = 2, 3, 4, \ldots \qquad (4.24)$$
Hu's moment invariants consist of seven moments that are widely used for recognizing the
shape of an object. These seven moment invariants were proved by Hu M. K. to be invariant under
translation, scale and rotation of images, using linear combinations of the second- and third-order
normalized central moments:
$$M_1 = \eta_{20} + \eta_{02}$$
$$M_2 = (\eta_{20} - \eta_{02})^2 + 4\eta_{11}^2$$
$$M_3 = (\eta_{30} - 3\eta_{12})^2 + (3\eta_{21} - \eta_{03})^2$$
$$M_4 = (\eta_{30} + \eta_{12})^2 + (\eta_{21} + \eta_{03})^2$$
$$M_5 = (\eta_{30} - 3\eta_{12})(\eta_{30} + \eta_{12})\left[(\eta_{30} + \eta_{12})^2 - 3(\eta_{21} + \eta_{03})^2\right] + (3\eta_{21} - \eta_{03})(\eta_{21} + \eta_{03})\left[3(\eta_{30} + \eta_{12})^2 - (\eta_{21} + \eta_{03})^2\right]$$
$$M_6 = (\eta_{20} - \eta_{02})\left[(\eta_{30} + \eta_{12})^2 - (\eta_{21} + \eta_{03})^2\right] + 4\eta_{11}(\eta_{30} + \eta_{12})(\eta_{21} + \eta_{03})$$
$$M_7 = (3\eta_{21} - \eta_{03})(\eta_{30} + \eta_{12})\left[(\eta_{30} + \eta_{12})^2 - 3(\eta_{21} + \eta_{03})^2\right] - (\eta_{30} - 3\eta_{12})(\eta_{21} + \eta_{03})\left[3(\eta_{30} + \eta_{12})^2 - (\eta_{21} + \eta_{03})^2\right] \qquad (4.25)$$
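The normalized central moments of Eq. 4.24 and the first Hu invariants of Eq. 4.25 can be sketched for a discrete image as below (sums replace the Riemann integrals); only $M_1$ and $M_2$ are shown to keep the sketch short, and the function names are choices made here.

```python
import numpy as np

def normalized_central_moments(img, max_order=3):
    """Normalized central moments eta_pq (Eq. 4.24) of a 2D intensity image."""
    img = np.asarray(img, dtype=float)
    ys, xs = np.mgrid[0:img.shape[0], 0:img.shape[1]]
    m00 = img.sum()
    xb = (xs * img).sum() / m00          # center of gravity
    yb = (ys * img).sum() / m00
    eta = {}
    for p in range(max_order + 1):
        for q in range(max_order + 1 - p):
            mu = (((xs - xb) ** p) * ((ys - yb) ** q) * img).sum()
            r = (p + q + 2) / 2.0
            eta[(p, q)] = mu / m00 ** r  # Eq. 4.24
    return eta

def hu_first_two(img):
    """First two Hu invariants M1, M2 (Eq. 4.25)."""
    e = normalized_central_moments(img)
    m1 = e[(2, 0)] + e[(0, 2)]
    m2 = (e[(2, 0)] - e[(0, 2)]) ** 2 + 4 * e[(1, 1)] ** 2
    return m1, m2
```

A 90° rotation swaps $\eta_{20}$ and $\eta_{02}$ and negates $\eta_{11}$, so $M_1$ and $M_2$ are unchanged, which is easy to verify numerically with `np.rot90`.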
$$\begin{pmatrix} x' \\ y' \end{pmatrix} = \begin{pmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{pmatrix} \begin{pmatrix} x \\ y \end{pmatrix} + \begin{pmatrix} b_1 \\ b_2 \end{pmatrix} = A\,(x, y)^T + b \qquad (4.26)$$
Generally, affine moment invariants are formulated by dividing polynomials in the central
moments $\mu_{pq}$ by $\mu_{00}^r$, where $r$ is an appropriate exponent. Jan Flusser et al. [226]
derived the affine moment invariants of the first three orders as:
$$I_1 = \left(\mu_{20}\mu_{02} - \mu_{11}^2\right)/\mu_{00}^4$$
$$I_2 = \left(\mu_{30}^2\mu_{03}^2 - 6\mu_{30}\mu_{21}\mu_{12}\mu_{03} + 4\mu_{30}\mu_{12}^3 + 4\mu_{21}^3\mu_{03} - 3\mu_{21}^2\mu_{12}^2\right)/\mu_{00}^{10}$$
$$I_3 = \left(\mu_{20}(\mu_{21}\mu_{03} - \mu_{12}^2) - \mu_{11}(\mu_{30}\mu_{03} - \mu_{21}\mu_{12}) + \mu_{02}(\mu_{30}\mu_{12} - \mu_{21}^2)\right)/\mu_{00}^7 \qquad (4.27)$$
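As a concrete instance, the first affine invariant $I_1$ of Eq. 4.27 can be computed discretely as below; the function name is a choice made here, and transposing the image (one particular affine map) leaves $I_1$ exactly unchanged, which gives a simple numerical check.

```python
import numpy as np

def affine_invariant_i1(img):
    """First affine moment invariant I1 (Eq. 4.27) of a 2D intensity image."""
    img = np.asarray(img, dtype=float)
    ys, xs = np.mgrid[0:img.shape[0], 0:img.shape[1]]
    m00 = img.sum()
    xb = (xs * img).sum() / m00
    yb = (ys * img).sum() / m00
    def mu(p, q):
        # central moment of order (p, q)
        return (((xs - xb) ** p) * ((ys - yb) ** q) * img).sum()
    return (mu(2, 0) * mu(0, 2) - mu(1, 1) ** 2) / m00 ** 4
```

Transposition swaps $\mu_{20}$ and $\mu_{02}$ while leaving $\mu_{11}$ and $\mu_{00}$ unchanged, so the value of $I_1$ is identical for an image and its transpose.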
Table 4.1: Optimized feature set combinations

Set | Features        | Dimension
 1  | Color           | 2D
 2  | Shape           | 2D
 3  | Shape           | 2D + 3D
 4  | Texture         | 2D
 5  | Texture         | 2D + 3D
 6  | Color + Texture | 2D
As seen in table 4.1 above, multiclass classification models were proposed for feature
classification. The feature set is defined as FR = {F_C, F_T, F_S2d, F_S3d}. A heuristic
approach is adopted to obtain the optimized feature set FRSel ⊆ FR. The optimized feature set
is constructed by considering different combinations of the extracted features. The resulting
performance helps in understanding the significance of the considered features for the
classification system. The experimental study discussed in the subsequent section considers
four optimized feature sets combining the features extracted.
The optimized feature sets considered for evaluation are defined as