
CLASSIFICATION AND FEATURE SELECTION USING REMOTE SENSING DATA

MAHESH PAL
NATIONAL INSTITUTE OF TECHNOLOGY, KURUKSHETRA, INDIA

Remote Sensing data

Panchromatic - one band.
Multispectral - many bands (these systems use sensors that detect radiation in a small number of broad wavelength bands).
Hyperspectral - a large number of contiguous bands. A hyperspectral sensor collects many very narrow, contiguous spectral bands throughout the visible, near-infrared, mid-infrared, and thermal-infrared portions of the electromagnetic spectrum.

Landsat 7 ETM+ data (Multispectral)

Band number     Spectral range (microns)   Ground resolution (m)
1               0.450 - 0.515              30
2               0.525 - 0.605              30
3               0.630 - 0.690              30
4               0.750 - 0.900              30
5               1.550 - 1.750              30
6               10.40 - 12.50              60
7               2.090 - 2.350              30
Panchromatic    0.520 - 0.900              15

Between 0.45 - 2.35 µm: a total of six bands

Images of the La Mancha (Spain) area by the ETM+ sensor (30 m resolution)

The DAIS (Digital Airborne Imaging Spectrometer) Hyperspectral Sensor

Spectrometer   Bands (79)   Wavelength range (micrometer)
VIS/NIR        32           0.50 - 1.05
SWIR I         -            1.50 - 1.80
SWIR II        32           1.90 - 2.50
MIR            -            3.00 - 5.00
TIR            -            8.70 - 12.50

Between 0.502 - 2.395 µm: a total of 72 bands
Continuous bands at 10-45 nm bandwidth

Images of the La Mancha (Spain) area using the DAIS hyperspectral sensor (5 m resolution)

Hyperspectral Imaging, Imaging Spectrometry, Imaging Spectroscopy

Spectroscopy is the study of the interaction of electromagnetic radiation with matter.
Spectroscopy has been used in the laboratory by physicists and chemists for over 100 years.
Imaging spectroscopy has many names in the remote sensing community, including imaging spectrometry and hyperspectral imaging.
It acquires images in a large number of narrow, contiguous spectral bands, enabling the extraction of reflectance spectra at the pixel scale that can be compared directly with similar spectra measured in the field.

Importance of a Hyperspectral Sensor

Provides spectral reflectance data in hundreds of bands rather than only the few bands of multispectral data.
Allows far more specific analysis of land cover.
The values recorded in each band can be combined to form a spectral reflectance curve.

These sensors provide information in the:
Visible region - vegetation, chlorophyll, sediments
Near infrared - atmospheric properties, cloud cover, vegetation and land cover transformation
Thermal infrared - sea surface temperature, forest fires, volcanoes, cloud height, total ozone

CLASSIFICATION

Land cover classification has been a major research area involving the use of remote sensing images.
The image classification process involves assigning pixels to classes according to the characteristics of the objects or materials they represent.
A major input to GIS-based studies.
Several approaches are used for land cover classification.

CLASSIFICATION ALGORITHMS

Predictive accuracy
Computational cost
  o time to construct the model
  o time to use the model
Robustness
  o handling noise and missing values
Interpretability
  o understanding the insight provided by the model

Hyperspectral data classification


1. Provide greater detail on the spectral variation of
targets than conventional multispectral systems.
2. The availability of large amounts of data represents
a challenge to classification analyses.
3. Each spectral waveband used in the classification
process should add an independent set of
information.
4. However, features are highly correlated, suggesting
a degree of redundancy in the available information
which can have a negative impact on classification
accuracy.
5. Require large pool of training data, which is quite costly to
collect.

Various approaches for the appropriate classification of high dimensional data

1. Adoption of a classifier that is relatively insensitive to the Hughes effect (Vapnik, 1995).
2. Use of methods that effectively increase the training set size, i.e. semi-supervised classification (Chi and Bruzzone, 2005), active learning, and the use of unlabelled data (Shahshahani and Landgrebe, 1994).
3. Use of some form of dimensionality reduction procedure prior to the classification analysis.

[Diagram: Training samples -> Learning algorithm -> Model / function (also called a hypothesis); testing samples are fed to the hypothesis to produce output values.]

The hypothesis can be considered as a machine that provides the prediction for test data.

SUPPORT VECTOR MACHINES (SVM)

Basic theory: 1965
Margin-based classifier: 1992
Support vector network: 1995
Since 1998 the support vector network has been called the Support Vector Machine (SVM) and used as an alternative to neural networks.
First application in remote sensing: Gualtieri and Cromp (1998), for hyperspectral image classification.

SVM: structural risk minimisation (SRM)

Based on statistical learning theory, proposed in the 1960s by Vapnik and co-workers.
SRM: minimise the probability of misclassifying unknown data drawn randomly.
Neural network: empirical risk minimisation - minimise the misclassification error on the training data.

SVM

Maps data from the original input feature space to a very high dimensional feature space.
The data become linearly separable, but the problem becomes computationally difficult.
A kernel function allows the SVM to work in the feature space without knowing the mapping or the dimensionality of that space (a small illustrative sketch follows).
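A minimal sketch of this idea using scikit-learn's SVC with an RBF kernel. The pixel values, class labels and the settings of C and gamma below are synthetic illustrations, not the data or parameters used in the studies cited on these slides.

# Minimal sketch: RBF-kernel SVM for pixel-wise land-cover classification.
# The data are synthetic stand-ins for band values; C and gamma are illustrative.
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

rng = np.random.default_rng(0)
# 600 "pixels", 6 spectral bands, 3 land-cover classes (synthetic).
X = rng.normal(size=(600, 6)) + np.repeat(np.arange(3), 200)[:, None]
y = np.repeat(np.arange(3), 200)

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.5, random_state=0)

scaler = StandardScaler().fit(X_train)
# The RBF kernel lets the SVM operate in a high-dimensional feature space
# without computing the mapping explicitly (the "kernel trick").
clf = SVC(kernel="rbf", C=10.0, gamma="scale")
clf.fit(scaler.transform(X_train), y_train)

print("test accuracy:", clf.score(scaler.transform(X_test), y_test))
print("support vectors per class:", clf.n_support_)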

Advantages

Margin theory suggests no effect of the dimensionality of the input space.
Uses a smaller number of training data (called support vectors).
QP solution, so no local minima.
Not many user-defined parameters.

But with real data:

[Plot: classification accuracy (%) on the vertical axis (55-95) against the number of features (5-65) for training sets of 8, 15, 25, 50, 75 and 100 pixels per class.]

Mahesh Pal and Giles M. Foody, 2010, Feature selection for classification of hyperspectral data
by SVM. IEEE Transactions on Geoscience and Remote Sensing, Vol. 48, No. 5, 2297-2306.

Disadvantages

Designed for two-class problems; different methods are needed to create a multi-class classifier.
Choice of kernel function and kernel-specific parameters.
The kernel function should satisfy Mercer's theorem.
Choice of the regularisation parameter C.
Output is not naturally probabilistic (common workarounds for the first and last points are sketched below).
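Two of these limitations are routinely worked around in software; a hedged sketch with scikit-learn, using illustrative data and settings: SVC builds a multi-class classifier from pairwise one-vs-one binary SVMs, and probability=True adds Platt-style calibration on top of the otherwise non-probabilistic output.

# Sketch of common workarounds for the two-class design and the
# non-probabilistic output; data and parameter values are illustrative only.
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(1)
X = rng.normal(size=(300, 6)) + np.repeat(np.arange(3), 100)[:, None]
y = np.repeat(np.arange(3), 100)

# SVC internally trains one binary SVM per pair of classes (one-vs-one)
# and combines their votes; probability=True fits a sigmoid (Platt scaling)
# on top, giving approximate class probabilities.
clf = SVC(kernel="rbf", C=10.0, gamma="scale", probability=True).fit(X, y)

print(clf.predict(X[:5]))          # hard class labels
print(clf.predict_proba(X[:5]))    # calibrated (approximate) probabilities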

Relevance Vector Machines (RVM)

Based on a probabilistic Bayesian formulation of a linear model (Tipping, 2001).
Produces a sparser solution than the SVM (i.e. a smaller number of relevance vectors).
Ability to use non-Mercer kernels.
Probabilistic output.
No need for the parameter C.

Major difference from SVM

The selected points are anti-boundary (away from the decision boundary).
Support vectors represent the least prototypical examples (closer to the boundary, difficult to classify).
Relevance vectors are the most prototypical examples (more representative of the class).

Location of the useful training cases with SVM & RVM

Mahesh Pal and G. M. Foody, 2012, Evaluation of SVM, RVM and SMLR for accurate image classification with limited ground data. IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing, 5(5), 1344-1355.


Disadvantages

Requires a large computational cost in comparison to the SVM.
Designed for two-class problems, similar to the SVM.
Choice of kernel.
May have a problem of local minima.

Sparse Multinomial Logistic Regression (SMLR)

The SMLR algorithm learns a multi-class classifier based on multinomial logistic regression.
It uses a Laplacian prior on the weights of the linear combination of functions to enforce sparsity.
SMLR performs feature selection and classification simultaneously.
Somewhat closer to the RVM (an approximate sketch follows).
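Scikit-learn does not ship an SMLR implementation, but the Laplacian prior corresponds to an L1 penalty, so an L1-penalised multinomial logistic regression gives the same flavour of simultaneous sparsity and classification. A rough sketch under that assumption, with synthetic data and an illustrative regularisation strength:

# Rough analogue of SMLR: multinomial logistic regression with an L1 penalty
# (the MAP estimate under a Laplacian prior), which drives many weights to
# exactly zero and so selects features while it classifies.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(2)
X = rng.normal(size=(400, 20))               # 20 "bands", only a few informative
y = (X[:, 0] + 0.5 * X[:, 3] - X[:, 7] > 0).astype(int) + \
    (X[:, 0] - X[:, 3] > 1).astype(int)      # 3 classes (0, 1, 2)

clf = LogisticRegression(penalty="l1", solver="saga", C=0.5, max_iter=5000)
clf.fit(X, y)

# Count how many band weights survived the sparsity-inducing prior.
nonzero_per_class = (clf.coef_ != 0).sum(axis=1)
print("non-zero weights per class:", nonzero_per_class)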

Location of the useful training cases with SMLR

[Scatter plot of Band 5 (40-110) against Band 1 (70-100) showing the wheat, sugar beet and oilseed rape classes.]

LOCATING USEFUL TRAINING SAMPLES

The Mahalanobis distance between a sample and a class centroid is used.
A small distance indicates that the sample lies close to the class centroid and so is typical of the class, while a large distance indicates that the sample is atypical.
This can help to reduce the field work for ground truth collection, thus reducing project cost (a small sketch of the distance calculation follows).
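A minimal sketch of this screening step using numpy and scipy; the pixel values are synthetic placeholders and the per-class covariance is simply estimated from those samples.

# Sketch: rank training pixels of one class by Mahalanobis distance to the
# class centroid. Small distances = typical (prototype-like) cases, large
# distances = atypical cases. Data are synthetic placeholders.
import numpy as np
from scipy.spatial.distance import mahalanobis

rng = np.random.default_rng(3)
class_pixels = rng.normal(loc=50.0, scale=5.0, size=(200, 6))  # 200 pixels, 6 bands

centroid = class_pixels.mean(axis=0)
cov_inv = np.linalg.inv(np.cov(class_pixels, rowvar=False))

distances = np.array([mahalanobis(p, centroid, cov_inv) for p in class_pixels])

# Indices sorted from most typical to most atypical for this class.
order = np.argsort(distances)
print("most typical pixel index:", order[0], "distance:", distances[order[0]])
print("most atypical pixel index:", order[-1], "distance:", distances[order[-1]])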

PRESENT WORK

Working with COST Action (European Cooperation in Science and Technology) TD1202: Mapping and the Citizen Sensor, as a non-EU member.
1. Classification with imperfect/noisy data
2. How SVM, RVM and SMLR work with noisy data
3. Will be working on other classifiers: RF, ELM

Two types of data noise

Attribute noise and class noise.
We are dealing with class noise, which can arise from subjectivity, data-entry error, or inadequacy of the information used to label each class.
Possible solutions to deal with class noise include data cleaning and the detection and elimination of mislabelled training cases (a small sketch of how class noise can be simulated follows).
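Class noise of a chosen level can be simulated by randomly relabelling a fraction of the training pixels. A small illustrative sketch, not necessarily the exact procedure used in the experiments reported below:

# Sketch: inject a chosen level of class noise by randomly relabelling a
# fraction of training pixels with a different class. Illustrative only.
import numpy as np

def add_class_noise(labels, noise_level, n_classes, seed=0):
    rng = np.random.default_rng(seed)
    noisy = labels.copy()
    n_flip = int(round(noise_level * len(labels)))
    flip_idx = rng.choice(len(labels), size=n_flip, replace=False)
    for i in flip_idx:
        # Replace the true label with a different, randomly chosen class.
        wrong = [c for c in range(n_classes) if c != noisy[i]]
        noisy[i] = rng.choice(wrong)
    return noisy

y = np.repeat(np.arange(8), 100)          # e.g. 800 training pixels, 8 classes
y_noisy = add_class_noise(y, noise_level=0.10, n_classes=8)
print("fraction of mislabelled pixels:", (y != y_noisy).mean())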

Classification accuracy (%) at different levels of class noise:

Error in data   -            5%           10%          15%          20%          25%          30%          35%          40%
RVM             88.00 (51)   88.22 (45)   87.11 (40)   87.78 (46)   87.33 (41)   87.56 (37)   86.44 (39)   85.56 (32)   84.00 (35)
SMLR            88.67 (83)   88.89 (91)   88.67 (85)   87.78 (82)   88.00 (89)   87.33 (80)   87.77 (78)   86.89 (86)   86.67 (72)
SVM             89.11 (203)  88.00 (259)  90.00 (310)  89.77 (339)  89.11 (369)  86.67 (409)  84.00 (432)  84.22 (447)  83.11 (490)

EXTREME LEARNING MACHINES (ELM)

A neural network classifier.
Uses one hidden layer only.
No parameter except the number of hidden nodes.
A kernel function can be used in place of the hidden layer by modifying the optimization problem.
Global solution (no local optima, unlike a neural network).
Performance comparable to the SVM and better than a back-propagation neural network.
Multiclass.
Very fast.
A minimal sketch of the basic ELM follows.
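In the basic (non-kernel) ELM, the input weights and biases of the single hidden layer are random and fixed, and only the output weights are solved for, in closed form, via the Moore-Penrose pseudo-inverse. Data sizes and the number of hidden nodes below are illustrative.

# Minimal extreme learning machine sketch: random, untrained hidden layer;
# output weights obtained in one step from the pseudo-inverse.
import numpy as np

rng = np.random.default_rng(4)
n_samples, n_bands, n_classes, n_hidden = 600, 6, 3, 100

X = rng.normal(size=(n_samples, n_bands)) + np.repeat(np.arange(n_classes), 200)[:, None]
y = np.repeat(np.arange(n_classes), 200)
T = np.eye(n_classes)[y]                    # one-hot targets

W = rng.normal(size=(n_bands, n_hidden))    # random input weights (fixed)
b = rng.normal(size=n_hidden)               # random biases (fixed)

H = 1.0 / (1.0 + np.exp(-(X @ W + b)))      # hidden-layer output (sigmoid)
beta = np.linalg.pinv(H) @ T                # output weights in one step

pred = np.argmax(H @ beta, axis=1)
print("training accuracy:", (pred == y).mean())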

Classification accuracy

Dataset   SVM (%)   KELM (%)
ETM+      88.37     90.33
ATM       92.50     94.06
DAIS      91.97     92.16

Computational cost

Dataset   SVM (sec)   KELM (sec)
ETM+      76.74       5.78
DAIS      40.78       1.02
ATM       1.30        0.17

Mahesh Pal, A. E. Maxwell and T. A. Warner, 2014, Kernel-based extreme learning machine for remote sensing image classification. Remote Sensing Letters.

PRESENT WORK

Working on a sparse extreme learning machine (produces a sparse solution similar to the support vector machine).
Ensemble of extreme learning machines.
Also trying to understand the working of deep neural networks.

FEATURE REDUCTION

Two broad categories are feature selection and feature extraction.
Feature reduction may speed up the classification process by reducing the data set size.
May increase the predictive accuracy.
May increase the ability to understand the classification rules.
Feature selection selects a subset of the original features that maintains the useful information needed to separate the classes, by removing redundant features.

FEATURE EXTRACTION

A number of techniques for feature extraction have been proposed, including principal components analysis, the maximum noise fraction (MNF) transformation, and non-orthogonal techniques such as projection pursuit and independent component analysis.
MNF requires estimates of the signal and noise covariance matrices.
The features provided by MNF are ranked by signal-to-noise ratio (the first MNF feature has the smallest value of the S/N ratio).
Results with the DAIS data suggest that MNF may not be used effectively for dimensionality reduction (an illustrative feature-extraction sketch using principal components follows).
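For illustration, a minimal feature-extraction sketch with principal components analysis, the most familiar of the techniques listed above; the synthetic "bands" and the number of retained components are placeholders.

# Sketch: principal components analysis as a feature-extraction step,
# projecting correlated spectral bands onto a few uncorrelated components.
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(5)
base = rng.normal(size=(1000, 5))
# 65 highly correlated "bands" built from only 5 underlying signals.
bands = base @ rng.normal(size=(5, 65)) + 0.05 * rng.normal(size=(1000, 65))

pca = PCA(n_components=10).fit(bands)
reduced = pca.transform(bands)

print("explained variance of first 10 components:",
      pca.explained_variance_ratio_.round(3))
print("reduced shape:", reduced.shape)      # (1000, 10)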

Feature selection

Three approaches to feature selection are:
Filters: use a search algorithm to search through the space of possible features and evaluate each feature by using a filter such as correlation or mutual information.
Wrappers: use a search algorithm to search through the space of possible feature subsets and evaluate each subset by using a classification algorithm.
Embedded: some classification processes, such as random forest or multinomial logistic regression, produce a ranked list of features during classification.

Filters

A large number of filter-based approaches are available in the literature. Some used with hyperspectral data are (a generic filter sketch follows the list):
1. Correlation-based feature selection
2. Minimum-Redundancy-Maximum-Relevance (mRMR)
3. Entropy
4. Fuzzy entropy
5. Signal-to-noise ratio
6. RELIEF
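A small generic filter sketch: each band is scored independently by its mutual information with the class label and only the top-k bands are kept. This stands in for the filters listed above rather than reproducing any one of them exactly; the data and k are illustrative.

# Sketch of a filter approach: score every band independently and keep
# only the top-k bands before classification.
import numpy as np
from sklearn.feature_selection import SelectKBest, mutual_info_classif

rng = np.random.default_rng(6)
X = rng.normal(size=(500, 30))                       # 30 "bands"
y = (X[:, 2] + X[:, 11] - X[:, 25] > 0).astype(int)  # only a few bands matter

selector = SelectKBest(score_func=mutual_info_classif, k=5).fit(X, y)
print("selected band indices:", np.flatnonzero(selector.get_support()))

X_reduced = selector.transform(X)                    # shape (500, 5)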

WRAPPER APPROACH
SVM-RFE approach utilise SVM as base classifier.
1 2 w
The SVM-RFE utilise the objective
function
as a feature ranking criterion to produce a list of
features ordered by their discriminatory ability.
The feature, with the smallest ranking score is
eliminated.
SVM-RFE uses a backward feature elimination scheme
to recursively remove insignificant features from subsets
of features in order to derive a list of all features in rank
order of value.
A major drawback of wrapper methods is their high
computational requirements
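A compact sketch of the SVM-RFE idea using scikit-learn's RFE with a linear SVM as the base classifier; the data, the elimination step size and the number of features to keep are illustrative, not the settings used in the reported experiments.

# Sketch of SVM-RFE: a linear SVM is fitted repeatedly, and at each round the
# feature with the smallest squared weight is eliminated, yielding a full
# ranking of the bands.
import numpy as np
from sklearn.feature_selection import RFE
from sklearn.svm import SVC

rng = np.random.default_rng(7)
X = rng.normal(size=(400, 30))
y = (X[:, 4] - X[:, 9] + 0.5 * X[:, 20] > 0).astype(int)

# A linear kernel is required so that each feature has an explicit weight
# whose magnitude can serve as the ranking criterion.
rfe = RFE(estimator=SVC(kernel="linear", C=1.0),
          n_features_to_select=5, step=1).fit(X, y)

print("kept band indices:", np.flatnonzero(rfe.support_))
print("full ranking (1 = kept):", rfe.ranking_)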

EMBEDDED APPROACH

During the classification process some algorithms produce a ranked list of all features.
For example, two approaches based on the random forest and multinomial logistic regression classifiers can be used (a random forest sketch follows).
In contrast to the filter and wrapper approaches, the search for an optimal feature subset by an embedded approach is built into the classification algorithm itself.
The classification and feature selection processes cannot be separated.
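A minimal sketch of the embedded route with a random forest, whose impurity-based feature importances are produced as a by-product of fitting the classifier; the data and settings are illustrative.

# Sketch of an embedded approach: a random forest yields a ranked list of
# features (impurity-based importances) while it is trained, so selection
# and classification are not separate steps.
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(8)
X = rng.normal(size=(500, 30))
y = (X[:, 1] + 2 * X[:, 14] - X[:, 22] > 0).astype(int)

forest = RandomForestClassifier(n_estimators=200, random_state=0).fit(X, y)

ranking = np.argsort(forest.feature_importances_)[::-1]
print("bands ranked by importance (best first):", ranking[:10])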

Data Set

1. DAIS 7915 sensor, operated by the German Space Agency, flown on 29 June 2000.
2. The sensor acquires information in 79 bands at a spatial resolution of 5 m in the wavelength range 0.502-12.278 µm.
3. Seven features located in the mid- and thermal-infrared region and seven features from the spectral region 0.502-2.395 µm were removed due to striping noise.
4. An area of 512 pixels by 512 pixels and 65 features covering the test site was used.

Training and test data

1. Random sampling was used to collect training and test data using a ground reference image.
2. Eight land cover classes: wheat, water, salt lake, hydrophytic vegetation, vineyards, bare soil, pasture and built-up land.
3. A total of 800 training pixels and 3800 test pixels were used.

Feature selection

Algorithm                         Number of used features   Accuracy (%)
None                              65                        91.76
Fuzzy entropy                     14                        91.68
Entropy                           17                        91.61
Signal to noise ratio             20                        91.68
Relief                            20                        88.61
SVM-RFE                           13                        91.89
mRMR                              37                        91.84
CFS                               17                        91.84
Random forest                     21                        92.08
Multinomial logistic regression   15                        92.76

PRESENT WORK

How noise affects feature selection.
Ensembles of feature selection methods.
Stability of feature selection algorithms for hyperspectral data.
