
CLASSIFICATION AND FEATURE SELECTION USING REMOTE SENSING DATA

MAHESH PAL
NATIONAL INSTITUTE OF TECHNOLOGY, KURUKSHETRA, INDIA

Remote Sensing data

Panchromatic - one band.
Multispectral - many bands (these systems use sensors that detect radiation in a small number of broad wavelength bands).
Hyperspectral - a large number of contiguous bands. A hyperspectral sensor collects many very narrow, contiguous spectral bands throughout the visible, near-infrared, mid-infrared, and thermal-infrared portions of the electromagnetic spectrum.

Landsat 7 ETM+ data (Multispectral)

Band number     Spectral range (microns)   Ground resolution (m)
1               0.450 - 0.515              30
2               0.525 - 0.605              30
3               0.630 - 0.690              30
4               0.750 - 0.900              30
5               1.550 - 1.750              30
6               10.40 - 12.50              60
7               2.090 - 2.350              30
Panchromatic    0.520 - 0.900              15

Between 0.45 - 2.35 µm: a total of six bands

Images of the La Mancha (Spain) area by the ETM+ sensor (30 m resolution)

The DAIS (Digital Airborne Imaging Spectrometer) Hyperspectral Sensor

Spectrometer   Bands (79)   Wavelength range (micrometer)
VIS/NIR        32           0.50 - 1.05
SWIR I         -            1.50 - 1.80
SWIR II        32           1.90 - 2.50
MIR            -            3.00 - 5.00
TIR            -            8.70 - 12.50

Between 0.502 - 2.395 µm: a total of 72 bands
Continuous bands at 10-45 nm bandwidth

Images of the La Mancha (Spain) area using the DAIS hyperspectral sensor (5 m resolution)

Hyperspectral Imaging, Imaging Spectrometry, Imaging Spectroscopy

Spectroscopy is the study of the interaction of electromagnetic radiation with matter.
Spectroscopy has been used in the laboratory by physicists and chemists for over 100 years.
Imaging spectroscopy has many names in the remote sensing community, including imaging spectrometry and hyperspectral imaging.
It acquires images in a large number of narrow, contiguous spectral bands, enabling the extraction of reflectance spectra at the pixel scale that can be compared directly with similar spectra measured in the field.

Importance of a Hyperspectral Sensor

Provides spectral reflectance data in hundreds of bands rather than only the few bands of multispectral data.
Allows far more specific analysis of land cover.
The values recorded in each band can be combined to form a spectral reflectance curve.

These sensors provide information in the:
Visible region - vegetation, chlorophyll, sediments
Near infrared - atmospheric properties, cloud cover, vegetation and land cover transformation
Thermal infrared - sea surface temperature, forest fires, volcanoes, cloud height, total ozone

CLASSIFICATION

Land cover classification has been a major research area involving the use of remote sensing images.
The image classification process involves assigning pixels to classes according to the characteristics of the objects or materials they represent.
A major input to GIS-based studies.
Several approaches are used for land cover classification.

CLASSIFICATION ALGORITHMS

Predictive accuracy
Computational cost
  o time to construct the model
  o time to use the model
Robustness
  o handling noise and missing values
Interpretability
  o understanding the insight provided by the model

Hyperspectral data classification


1. Provide greater detail on the spectral variation of
targets than conventional multispectral systems.
2. The availability of large amounts of data represents
a challenge to classification analyses.
3. Each spectral waveband used in the classification
process should add an independent set of
information.
4. However, features are highly correlated, suggesting
a degree of redundancy in the available information
which can have a negative impact on classification
accuracy.
5. Require large pool of training data, which is quite costly to
collect.

Various approaches for the appropriate classification of high dimensional data

1. Adoption of a classifier that is relatively insensitive to the Hughes effect (Vapnik, 1995).
2. Use of methods that effectively increase the training set size, i.e. semi-supervised classification (Chi and Bruzzone, 2005), active learning, and the use of unlabelled data (Shahshahani and Landgrebe, 1994).
3. Use of some form of dimensionality reduction procedure prior to the classification analysis.

[Diagram: Training samples -> Learning algorithm -> Model / function (also called a hypothesis); testing samples are fed to the hypothesis to produce output values.]

The hypothesis can be considered as a machine that provides the prediction for test data.

SUPPORT VECTOR MACHINES (SVM)

Basic theory: 1965
Margin-based classifier: 1992
Support vector network: 1995
Since 1998 the support vector network has been called the Support Vector Machine (SVM) and used as an alternative to neural networks.
First application in remote sensing: Gualtieri and Cromp (1998), for hyperspectral image classification.

SVM: structural risk minimisation (SRM)

Based on statistical learning theory, proposed in the 1960s by Vapnik and co-workers.
SRM: minimise the probability of misclassifying unknown data drawn randomly.
Neural network: empirical risk minimisation - minimise the misclassification error on the training data.

SVM

Maps data from the original input feature space to a very high dimensional feature space.
The data become linearly separable, but the problem becomes computationally difficult.
A kernel function allows the SVM to work in the feature space without knowing the mapping or the dimensionality of that space (a small illustrative sketch follows).
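A minimal sketch of this idea using scikit-learn's SVC with an RBF kernel. The pixel values, class labels and the settings of C and gamma below are synthetic illustrations, not the data or parameters used in the studies cited on these slides.

# Minimal sketch: RBF-kernel SVM for pixel-wise land-cover classification.
# The data are synthetic stand-ins for band values; C and gamma are illustrative.
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

rng = np.random.default_rng(0)
# 600 "pixels", 6 spectral bands, 3 land-cover classes (synthetic).
X = rng.normal(size=(600, 6)) + np.repeat(np.arange(3), 200)[:, None]
y = np.repeat(np.arange(3), 200)

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.5, random_state=0)

scaler = StandardScaler().fit(X_train)
# The RBF kernel lets the SVM operate in a high-dimensional feature space
# without computing the mapping explicitly (the "kernel trick").
clf = SVC(kernel="rbf", C=10.0, gamma="scale")
clf.fit(scaler.transform(X_train), y_train)

print("test accuracy:", clf.score(scaler.transform(X_test), y_test))
print("support vectors per class:", clf.n_support_)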

Advantages

Margin theory suggests no effect of the dimensionality of the input space.
Uses a smaller number of training data (called support vectors).
QP solution, so no local minima.
Not many user-defined parameters.

But with real data:

[Plot: classification accuracy (%) on the vertical axis (55-95) against the number of features (5-65) for training sets of 8, 15, 25, 50, 75 and 100 pixels per class.]

Mahesh Pal and Giles M. Foody, 2010, Feature selection for classification of hyperspectral data
by SVM. IEEE Transactions on Geoscience and Remote Sensing, Vol. 48, No. 5, 2297-2306.

Disadvantages

Designed for two-class problems; different methods are needed to create a multi-class classifier.
Choice of kernel function and kernel-specific parameters.
The kernel function should satisfy Mercer's theorem.
Choice of the regularisation parameter C.
Output is not naturally probabilistic (common workarounds for the first and last points are sketched below).
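Two of these limitations are routinely worked around in software; a hedged sketch with scikit-learn, using illustrative data and settings: SVC builds a multi-class classifier from pairwise one-vs-one binary SVMs, and probability=True adds Platt-style calibration on top of the otherwise non-probabilistic output.

# Sketch of common workarounds for the two-class design and the
# non-probabilistic output; data and parameter values are illustrative only.
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(1)
X = rng.normal(size=(300, 6)) + np.repeat(np.arange(3), 100)[:, None]
y = np.repeat(np.arange(3), 100)

# SVC internally trains one binary SVM per pair of classes (one-vs-one)
# and combines their votes; probability=True fits a sigmoid (Platt scaling)
# on top, giving approximate class probabilities.
clf = SVC(kernel="rbf", C=10.0, gamma="scale", probability=True).fit(X, y)

print(clf.predict(X[:5]))          # hard class labels
print(clf.predict_proba(X[:5]))    # calibrated (approximate) probabilities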

Relevance Vector Machines (RVM)

Based on a probabilistic Bayesian formulation of a linear model (Tipping, 2001).
Produces a sparser solution than the SVM (i.e. a smaller number of relevance vectors).
Ability to use non-Mercer kernels.
Probabilistic output.
No need for the parameter C.

Major difference from SVM

The selected points are anti-boundary (away from the decision boundary).
Support vectors represent the least prototypical examples (closer to the boundary, difficult to classify).
Relevance vectors are the most prototypical examples (more representative of the class).

Location of the useful training cases with SVM & RVM

Mahesh Pal and G. M. Foody, 2012, Evaluation of SVM, RVM and SMLR for accurate image classification with limited ground data. IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing, 5(5), 1344-1355.


Disadvantages

Requires a large computational cost in comparison to the SVM.
Designed for two-class problems, similar to the SVM.
Choice of kernel.
May have a problem of local minima.

Sparse Multinomial Logistic Regression (SMLR)

The SMLR algorithm learns a multi-class classifier based on multinomial logistic regression.
It uses a Laplacian prior on the weights of the linear combination of functions to enforce sparsity.
SMLR performs feature selection and classification simultaneously.
Somewhat closer to the RVM (an approximate sketch follows).
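Scikit-learn does not ship an SMLR implementation, but the Laplacian prior corresponds to an L1 penalty, so an L1-penalised multinomial logistic regression gives the same flavour of simultaneous sparsity and classification. A rough sketch under that assumption, with synthetic data and an illustrative regularisation strength:

# Rough analogue of SMLR: multinomial logistic regression with an L1 penalty
# (the MAP estimate under a Laplacian prior), which drives many weights to
# exactly zero and so selects features while it classifies.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(2)
X = rng.normal(size=(400, 20))               # 20 "bands", only a few informative
y = (X[:, 0] + 0.5 * X[:, 3] - X[:, 7] > 0).astype(int) + \
    (X[:, 0] - X[:, 3] > 1).astype(int)      # 3 classes (0, 1, 2)

clf = LogisticRegression(penalty="l1", solver="saga", C=0.5, max_iter=5000)
clf.fit(X, y)

# Count how many band weights survived the sparsity-inducing prior.
nonzero_per_class = (clf.coef_ != 0).sum(axis=1)
print("non-zero weights per class:", nonzero_per_class)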

Location of the useful training cases with SMLR

[Scatter plot of Band 5 (40-110) against Band 1 (70-100) showing the wheat, sugar beet and oilseed rape classes.]

LOCATING USEFUL TRAINING SAMPLES

The Mahalanobis distance between a sample and a class centroid is used.
A small distance indicates that the sample lies close to the class centroid and so is typical of the class, while a large distance indicates that the sample is atypical.
This can help to reduce the field work for ground truth collection, thus reducing project cost (a small sketch of the distance calculation follows).
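A minimal sketch of this screening step using numpy and scipy; the pixel values are synthetic placeholders and the per-class covariance is simply estimated from those samples.

# Sketch: rank training pixels of one class by Mahalanobis distance to the
# class centroid. Small distances = typical (prototype-like) cases, large
# distances = atypical cases. Data are synthetic placeholders.
import numpy as np
from scipy.spatial.distance import mahalanobis

rng = np.random.default_rng(3)
class_pixels = rng.normal(loc=50.0, scale=5.0, size=(200, 6))  # 200 pixels, 6 bands

centroid = class_pixels.mean(axis=0)
cov_inv = np.linalg.inv(np.cov(class_pixels, rowvar=False))

distances = np.array([mahalanobis(p, centroid, cov_inv) for p in class_pixels])

# Indices sorted from most typical to most atypical for this class.
order = np.argsort(distances)
print("most typical pixel index:", order[0], "distance:", distances[order[0]])
print("most atypical pixel index:", order[-1], "distance:", distances[order[-1]])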

PRESENT WORK

Working with COST Action (European Cooperation in Science and Technology) TD1202: Mapping and the Citizen Sensor, as a non-EU member.
1. Classification with imperfect/noisy data
2. How SVM, RVM and SMLR work with noisy data
3. Will be working on other classifiers: RF, ELM

Two types of data noise

Attribute noise and class noise.
We are dealing with class noise, which can arise from subjectivity, data-entry error, or inadequacy of the information used to label each class.
Possible solutions to deal with class noise include data cleaning and the detection and elimination of mislabelled training cases (a small sketch of how class noise can be simulated follows).
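Class noise of a chosen level can be simulated by randomly relabelling a fraction of the training pixels. A small illustrative sketch, not necessarily the exact procedure used in the experiments reported below:

# Sketch: inject a chosen level of class noise by randomly relabelling a
# fraction of training pixels with a different class. Illustrative only.
import numpy as np

def add_class_noise(labels, noise_level, n_classes, seed=0):
    rng = np.random.default_rng(seed)
    noisy = labels.copy()
    n_flip = int(round(noise_level * len(labels)))
    flip_idx = rng.choice(len(labels), size=n_flip, replace=False)
    for i in flip_idx:
        # Replace the true label with a different, randomly chosen class.
        wrong = [c for c in range(n_classes) if c != noisy[i]]
        noisy[i] = rng.choice(wrong)
    return noisy

y = np.repeat(np.arange(8), 100)          # e.g. 800 training pixels, 8 classes
y_noisy = add_class_noise(y, noise_level=0.10, n_classes=8)
print("fraction of mislabelled pixels:", (y != y_noisy).mean())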

Classification accuracy (%) at different levels of class noise:

Error in data   -            5%           10%          15%          20%          25%          30%          35%          40%
RVM             88.00 (51)   88.22 (45)   87.11 (40)   87.78 (46)   87.33 (41)   87.56 (37)   86.44 (39)   85.56 (32)   84.00 (35)
SMLR            88.67 (83)   88.89 (91)   88.67 (85)   87.78 (82)   88.00 (89)   87.33 (80)   87.77 (78)   86.89 (86)   86.67 (72)
SVM             89.11 (203)  88.00 (259)  90.00 (310)  89.77 (339)  89.11 (369)  86.67 (409)  84.00 (432)  84.22 (447)  83.11 (490)

EXTREME LEARNING MACHINES (ELM)

A neural network classifier.
Uses one hidden layer only.
No parameter except the number of hidden nodes.
A kernel function can be used in place of the hidden layer by modifying the optimization problem.
Global solution (no local optima, unlike a neural network).
Performance comparable to the SVM and better than a back-propagation neural network.
Multiclass.
Very fast.
A minimal sketch of the basic ELM follows.
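In the basic (non-kernel) ELM, the input weights and biases of the single hidden layer are random and fixed, and only the output weights are solved for, in closed form, via the Moore-Penrose pseudo-inverse. Data sizes and the number of hidden nodes below are illustrative.

# Minimal extreme learning machine sketch: random, untrained hidden layer;
# output weights obtained in one step from the pseudo-inverse.
import numpy as np

rng = np.random.default_rng(4)
n_samples, n_bands, n_classes, n_hidden = 600, 6, 3, 100

X = rng.normal(size=(n_samples, n_bands)) + np.repeat(np.arange(n_classes), 200)[:, None]
y = np.repeat(np.arange(n_classes), 200)
T = np.eye(n_classes)[y]                    # one-hot targets

W = rng.normal(size=(n_bands, n_hidden))    # random input weights (fixed)
b = rng.normal(size=n_hidden)               # random biases (fixed)

H = 1.0 / (1.0 + np.exp(-(X @ W + b)))      # hidden-layer output (sigmoid)
beta = np.linalg.pinv(H) @ T                # output weights in one step

pred = np.argmax(H @ beta, axis=1)
print("training accuracy:", (pred == y).mean())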

Classification accuracy

Dataset   SVM (%)   KELM (%)
ETM+      88.37     90.33
ATM       92.50     94.06
DAIS      91.97     92.16

Computational cost

Dataset   SVM (sec)   KELM (sec)
ETM+      76.74       5.78
DAIS      40.78       1.02
ATM       1.30        0.17

Mahesh Pal, A. E. Maxwell and T. A. Warner, 2014, Kernel-based extreme learning machine for remote sensing image classification. Remote Sensing Letters.

PRESENT WORK

Working on a sparse extreme learning machine (produces a sparse solution similar to the support vector machine).
Ensemble of extreme learning machines.
Also trying to understand the working of deep neural networks.

FEATURE REDUCTION

Two broad categories are feature selection and feature extraction.
Feature reduction may speed up the classification process by reducing the data set size.
May increase the predictive accuracy.
May increase the ability to understand the classification rules.
Feature selection selects a subset of the original features that maintains the useful information needed to separate the classes, by removing redundant features.

FEATURE EXTRACTION

A number of techniques for feature extraction have been proposed, including principal components analysis, the maximum noise fraction (MNF) transformation, and non-orthogonal techniques such as projection pursuit and independent component analysis.
MNF requires estimates of the signal and noise covariance matrices.
The features provided by MNF are ranked by signal-to-noise ratio (the first MNF feature has the smallest value of the S/N ratio).
Results with the DAIS data suggest that MNF may not be used effectively for dimensionality reduction (an illustrative feature-extraction sketch using principal components follows).
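For illustration, a minimal feature-extraction sketch with principal components analysis, the most familiar of the techniques listed above; the synthetic "bands" and the number of retained components are placeholders.

# Sketch: principal components analysis as a feature-extraction step,
# projecting correlated spectral bands onto a few uncorrelated components.
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(5)
base = rng.normal(size=(1000, 5))
# 65 highly correlated "bands" built from only 5 underlying signals.
bands = base @ rng.normal(size=(5, 65)) + 0.05 * rng.normal(size=(1000, 65))

pca = PCA(n_components=10).fit(bands)
reduced = pca.transform(bands)

print("explained variance of first 10 components:",
      pca.explained_variance_ratio_.round(3))
print("reduced shape:", reduced.shape)      # (1000, 10)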

Feature selection

Three approaches to feature selection are:
Filters: use a search algorithm to search through the space of possible features and evaluate each feature by using a filter such as correlation or mutual information.
Wrappers: use a search algorithm to search through the space of possible feature subsets and evaluate each subset by using a classification algorithm.
Embedded: some classification processes, such as random forest or multinomial logistic regression, produce a ranked list of features during classification.

Filters

A large number of filter-based approaches are available in the literature. Some used with hyperspectral data are (a generic filter sketch follows the list):
1. Correlation-based feature selection
2. Minimum-Redundancy-Maximum-Relevance (mRMR)
3. Entropy
4. Fuzzy entropy
5. Signal-to-noise ratio
6. RELIEF
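A small generic filter sketch: each band is scored independently by its mutual information with the class label and only the top-k bands are kept. This stands in for the filters listed above rather than reproducing any one of them exactly; the data and k are illustrative.

# Sketch of a filter approach: score every band independently and keep
# only the top-k bands before classification.
import numpy as np
from sklearn.feature_selection import SelectKBest, mutual_info_classif

rng = np.random.default_rng(6)
X = rng.normal(size=(500, 30))                       # 30 "bands"
y = (X[:, 2] + X[:, 11] - X[:, 25] > 0).astype(int)  # only a few bands matter

selector = SelectKBest(score_func=mutual_info_classif, k=5).fit(X, y)
print("selected band indices:", np.flatnonzero(selector.get_support()))

X_reduced = selector.transform(X)                    # shape (500, 5)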

WRAPPER APPROACH
SVM-RFE approach utilise SVM as base classifier.
1 2 w
The SVM-RFE utilise the objective
function
as a feature ranking criterion to produce a list of
features ordered by their discriminatory ability.
The feature, with the smallest ranking score is
eliminated.
SVM-RFE uses a backward feature elimination scheme
to recursively remove insignificant features from subsets
of features in order to derive a list of all features in rank
order of value.
A major drawback of wrapper methods is their high
computational requirements
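A compact sketch of the SVM-RFE idea using scikit-learn's RFE with a linear SVM as the base classifier; the data, the elimination step size and the number of features to keep are illustrative, not the settings used in the reported experiments.

# Sketch of SVM-RFE: a linear SVM is fitted repeatedly, and at each round the
# feature with the smallest squared weight is eliminated, yielding a full
# ranking of the bands.
import numpy as np
from sklearn.feature_selection import RFE
from sklearn.svm import SVC

rng = np.random.default_rng(7)
X = rng.normal(size=(400, 30))
y = (X[:, 4] - X[:, 9] + 0.5 * X[:, 20] > 0).astype(int)

# A linear kernel is required so that each feature has an explicit weight
# whose magnitude can serve as the ranking criterion.
rfe = RFE(estimator=SVC(kernel="linear", C=1.0),
          n_features_to_select=5, step=1).fit(X, y)

print("kept band indices:", np.flatnonzero(rfe.support_))
print("full ranking (1 = kept):", rfe.ranking_)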

EMBEDDED APPROACH

During the classification process some algorithms produce a ranked list of all features.
For example, two approaches based on the random forest and multinomial logistic regression classifiers can be used (a random forest sketch follows).
In contrast to the filter and wrapper approaches, the search for an optimal feature subset by an embedded approach is built into the classification algorithm itself.
The classification and feature selection processes cannot be separated.
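A minimal sketch of the embedded route with a random forest, whose impurity-based feature importances are produced as a by-product of fitting the classifier; the data and settings are illustrative.

# Sketch of an embedded approach: a random forest yields a ranked list of
# features (impurity-based importances) while it is trained, so selection
# and classification are not separate steps.
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(8)
X = rng.normal(size=(500, 30))
y = (X[:, 1] + 2 * X[:, 14] - X[:, 22] > 0).astype(int)

forest = RandomForestClassifier(n_estimators=200, random_state=0).fit(X, y)

ranking = np.argsort(forest.feature_importances_)[::-1]
print("bands ranked by importance (best first):", ranking[:10])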

Data Set

1. DAIS 7915 sensor, operated by the German Space Agency, flown on 29 June 2000.
2. The sensor acquires information in 79 bands at a spatial resolution of 5 m in the wavelength range 0.502-12.278 µm.
3. Seven features located in the mid- and thermal-infrared region and seven features from the spectral region 0.502-2.395 µm were removed due to striping noise.
4. An area of 512 pixels by 512 pixels and 65 features covering the test site was used.

Training and test data

1. Random sampling was used to collect training and test data using a ground reference image.
2. Eight land cover classes: wheat, water, salt lake, hydrophytic vegetation, vineyards, bare soil, pasture and built-up land.
3. A total of 800 training pixels and 3800 test pixels were used.

Feature selection

Algorithm                         Number of used features   Accuracy (%)
None                              65                        91.76
Fuzzy entropy                     14                        91.68
Entropy                           17                        91.61
Signal to noise ratio             20                        91.68
Relief                            20                        88.61
SVM-RFE                           13                        91.89
mRMR                              37                        91.84
CFS                               17                        91.84
Random forest                     21                        92.08
Multinomial logistic regression   15                        92.76

PRESENT WORK

How noise affects feature selection.
Ensembles of feature selection methods.
Stability of feature selection algorithms for hyperspectral data.
