
CHAPTER 1

INTRODUCTION
1.1 INTRODUCTION
Nowadays medical image processing is used in the study of human anatomy for clinical
research, diagnosis and treatment. Here we have taken Magnetic Resonance Images
(MRI). Important steps in medical image classification are segmentation and feature
extraction. The first problem arises in the choice of segmentation algorithm; the second
difficulty arises during feature extraction. The choice of segmentation algorithm plays a
major role in classification.
Partitioning of an image into several sub-image components is called image
segmentation. Segmentation is an important part of image recognition, compression and
classification. For accurate image segmentation, good features have to be extracted.
Segmentation algorithms are classified into three types, namely edge based segmentation,
threshold based segmentation and region based segmentation. In this work, we prefer the
region based segmentation algorithm.
1.1.1 Definition Of Image Classification
Image classification refers to the task of extracting information classes from a
multiband raster image. The resulting raster from image classification can be used to
create thematic maps. Depending on the interaction between the analyst and the computer
during classification, there are two types of classification: supervised and unsupervised.
With the ArcGIS Spatial Analyst extension, there is a full suite of tools in the
Multivariate toolset to perform supervised and unsupervised classification (see An
overview of the Multivariate toolset). The classification process is a multistep workflow;
therefore, the Image Classification toolbar has been developed to provide an integrated
environment to perform classifications with the tools. Not only does the toolbar help with
the workflow for performing unsupervised and supervised classification, it also contains
additional functionality for analyzing input data, creating training samples and signature
files, and determining the quality of the training samples and signature files. The
recommended way to perform classification and multivariate analysis is through the
Image Classification toolbar.

SUPERVISED CLASSIFICATION
Supervised classification uses the spectral signatures obtained from training
samples to classify an image. With the assistance of the Image Classification toolbar, you
can easily create training samples to represent the classes you want to extract. You can
also easily create a signature file from the training samples, which is then used by the
multivariate classification tools to classify the image.
UNSUPERVISED CLASSIFICATION
Unsupervised classification finds spectral classes (or clusters) in a multiband
image without the analyst's intervention. The Image Classification toolbar aids in
unsupervised classification by providing access to the tools to create the clusters,
capability to analyze the quality of the clusters, and access to classification tools.
1.1.2 MRI
MRI stands for Magnetic Resonance Imaging.
An MRI machine or scanner uses a powerful magnet and radio waves linked to a
computer to create remarkably clear and detailed cross sectional images of the body. To
visualize an MRI, think of your body as a loaf of bread with its many slices. The MRI
allows the physician to see many different “slices” of a body part by taking pictures from
outside the body. The “slices” can be displayed on a video monitor and saved on film or
disk for analysis.
For some MRI studies, a contrast agent, usually gadolinium, may be used to
enhance the visibility of certain tissues. The contrast agent is given via a small
intravenous (IV) line placed in a vein in your arm. See ACRIN’s “About Imaging Agents
or Tracers” Information page for more information.
Examples of Uses: MRI can be used to view, monitor, or diagnose:
 spine, joint or muscle problems
 abdominal tumors and disorders
 brain tumors and abnormalities
 breast cancer
 heart or blood vessel problems

Fig 1.1 Sample MRI Images Used as Input Images

Specialized MRI techniques: Sometimes an MRI scan will include a special
method that provides additional information to your physician. Some specialized MRI
techniques are described below.

Diffusion MRI - Shows the microscopic movement of water molecules within
tissue. It can provide information on the microstructure of the tissue as well as swelling
within tissues. This method has been used primarily with brain pathology, but is being
studied for other uses.

Image representation for classification tasks often uses feature extraction methods
which have proven effective for different visual recognition tasks. The local binary
patterns (LBP) method is used to extract texture features. Histograms of oriented gradients
(HOG) are applied for image processing. Usually these methods are used to
transform images and describe them for many tasks. Most of the applied features need to
be identified by an expert and then manually coded according to the data type and domain.
This process is difficult and expensive in terms of expertise and time.

Fig 1.2 Outline for Classification of Medical Images

As a solution, deep learning reduces the task of developing new feature extractors
by automating the phase of extracting and learning features. The proposed classification
system exploits this technology to recognize input images and classify them.
There exist many different architectures of deep learning. The model presented
here is a classifier system developed using the convolutional neural network category,
which is the most efficient and useful class of deep neural networks for this type of data.
CNNs trained to learn image representations on large-scale datasets for recognition tasks
can therefore be exploited by transferring these learned representations to other tasks with
a limited amount of training data.
To address this problem, we propose using the convolutional neural network
AlexNet, trained on the large-scale ImageNet dataset, by transferring its learned image
representations and reusing them for a classification task with limited training data. The
main idea is to design a method which reuses part of the trained layers of AlexNet. In the
following, the problem statement is presented in section II. Section III introduces the
method and the CNN architecture exploited. Initial experimental results using the
appropriate CNN architecture, which demonstrate that the developed neural network
achieves a satisfactory success rate, are described in the first part of section IV. In the
second part, the effect of the MiniBatchSize parameter is discussed.
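To make the transfer-learning idea concrete, here is a minimal MATLAB sketch, assuming the Deep Learning Toolbox with the AlexNet support package is installed; imdsTrain (an imageDatastore of labeled training images) and numClasses are placeholders, and the parameter values are illustrative choices, not the settings used in this project.

net = alexnet;                             % ImageNet-pretrained AlexNet
layersTransfer = net.Layers(1:end-3);      % reuse all but the old classification head
layers = [
    layersTransfer
    fullyConnectedLayer(numClasses)        % new task-specific head
    softmaxLayer
    classificationLayer];

options = trainingOptions('sgdm', ...
    'MiniBatchSize', 32, ...               % the MiniBatchSize parameter discussed above
    'MaxEpochs', 10, ...
    'InitialLearnRate', 1e-4);             % small rate: we only fine-tune

trainedNet = trainNetwork(imdsTrain, layers, options);

Only the final head is replaced; the pretrained convolutional layers are reused, which is what allows training with a limited amount of data.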

In deep learning, a convolutional neural network (CNN, or ConvNet) is a class of
deep neural networks, most commonly applied to analyzing visual imagery. CNNs use a
variation of multilayer perceptrons designed to require minimal preprocessing. They are
also known as shift invariant or space invariant artificial neural networks (SIANN), based
on their shared-weights architecture and translation invariance characteristics.
Convolutional networks were inspired by biological processes in that the
connectivity pattern between neurons resembles the organization of the animal visual
cortex. Individual cortical neurons respond to stimuli only in a restricted region of the
visual field known as the receptive field. The receptive fields of different neurons
partially overlap such that they cover the entire visual field.
CNNs use relatively little pre-processing compared to other image classification
algorithms. This means that the network learns the filters that in traditional algorithms
were hand-engineered. This independence from prior knowledge and human effort in
feature design is a major advantage. They have applications in image and video
recognition, recommender systems, image classification, medical image analysis, and
natural language processing.
A convolutional neural network consists of an input and an output layer, as well as
multiple hidden layers. The hidden layers of a CNN typically consist of convolutional
layers, ReLU (activation) layers, pooling layers, fully connected layers and
normalization layers. Describing the process as a convolution in neural networks is by
convention; mathematically it is a cross-correlation rather than a convolution (although
cross-correlation is a related operation). This only has significance for the indices in the
matrix, and thus which weights are placed at which index. Convolutional layers apply a
convolution operation to the input, passing the result to the next layer. The convolution
emulates the response of an individual neuron to visual stimuli.
Each convolutional neuron processes data only for its receptive field. Although
fully connected feedforward neural networks can be used to learn features as well as
classify data, it is not practical to apply this architecture to images. A very high number of
neurons would be necessary, even in a shallow (opposite of deep) architecture, due to the
very large input sizes associated with images, where each pixel is a relevant variable. For
instance, a fully connected layer for a (small) image of size 100 x 100 has 10000 weights
for each neuron in the second layer.

The convolution operation brings a solution to this problem as it reduces the
number of free parameters, allowing the network to be deeper with fewer parameters. For
instance, regardless of image size, tiling regions of size 5 x 5, each with the same shared
weights, requires only 25 learnable parameters. In this way, it resolves the vanishing or
exploding gradients problem in training traditional multi-layer neural networks with many
layers by using backpropagation.
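A small MATLAB sketch of this parameter sharing, using only the built-in conv2 function (the images and filter values here are arbitrary): the same 25 weights produce a feature map for any image size, whereas a single fully connected neuron on a 100 x 100 input already needs 10,000 weights.

w = randn(5, 5);                   % 25 shared (learnable) weights
b = 0.1;                           % one shared bias

small = rand(100, 100);
large = rand(480, 640);

% conv2 performs a true convolution (the kernel is flipped); most CNN
% frameworks compute the cross-correlation variant mentioned above.
fmapSmall = conv2(small, w, 'valid') + b;   % 96 x 96 feature map
fmapLarge = conv2(large, w, 'valid') + b;   % 476 x 636 feature map

convParams = numel(w) + 1;         % 26 parameters, independent of input size
fcWeightsPerNeuron = numel(small); % 10000 weights for ONE dense neuron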
Pooling
 Convolutional networks may include local or global pooling layers. Pooling layers
reduce the dimensions of the data by combining the outputs of neuron clusters at
one layer into a single neuron in the next layer.
 Local pooling combines small clusters, typically 2 x 2. Global pooling acts on all
the neurons of the convolutional layer. In addition, pooling may compute a max or
an average. Max pooling uses the maximum value from each of a cluster of
neurons at the prior layer. Average pooling uses the average value from each of a
cluster of neurons at the prior layer.
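A minimal sketch of 2 x 2 max and average pooling with stride 2 on a toy matrix (plain MATLAB, no toolboxes assumed):

A = [1 3 2 1;
     4 2 0 1;
     3 1 8 5;
     0 2 6 7];                             % toy 4 x 4 activation map

[h, w] = size(A);
maxPooled = zeros(h/2, w/2);
avgPooled = zeros(h/2, w/2);
for i = 1:h/2
    for j = 1:w/2
        block = A(2*i-1:2*i, 2*j-1:2*j);   % one 2 x 2 cluster of neurons
        maxPooled(i, j) = max(block(:));   % max pooling
        avgPooled(i, j) = mean(block(:));  % average pooling
    end
end
% maxPooled is [4 2; 3 8]: each output neuron summarizes one cluster.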
Fully connected
 Fully connected layers connect every neuron in one layer to every neuron in
another layer. It is in principle the same as the traditional multi-layer perceptron
neural network (MLP). The flattened matrix goes through a fully connected layer
to classify the images.
Receptive field
 In neural networks, each neuron receives input from some number of locations in
the previous layer. In a fully connected layer, each neuron receives input from
every element of the previous layer. In a convolutional layer, neurons receive
input from only a restricted subarea of the previous layer. Typically the subarea is
of a square shape (e.g., size 5 by 5).
 The input area of a neuron is called its receptive field. So, in a fully connected
layer, the receptive field is the entire previous layer. In a convolutional layer, the
receptive area is smaller than the entire previous layer.
Weights
 Each neuron in a neural network computes an output value by applying some
function to the input values coming from the receptive field in the previous layer.
The function that is applied to the input values is specified by a vector of weights
and a bias (typically real numbers).

 Learning in a neural network progresses by making incremental adjustments to the
biases and weights. The vector of weights and the bias are called a filter and
represent some feature of the input (e.g., a particular shape). A distinguishing
feature of CNNs is that many neurons share the same filter.
 This reduces the memory footprint because a single bias and a single vector of
weights is used across all receptive fields sharing that filter, rather than each
receptive field having its own bias and vector of weights.

CNN design follows vision processing in living organisms.
Receptive fields in the visual cortex
Work by Hubel and Wiesel in the 1950s and 1960s showed that cat and
monkey visual cortexes contain neurons that individually respond to small regions of the
visual field. Provided the eyes are not moving, the region of visual space within which
visual stimuli affect the firing of a single neuron is known as its receptive field.
Neighboring cells have similar and overlapping receptive fields. Receptive field size and
location varies systematically across the cortex to form a complete map of visual space.
The cortex in each hemisphere represents the contralateral visual field.
Their 1968 paper identified two basic visual cell types in the brain: simple cells,
whose output is maximized by straight edges having particular orientations within their
receptive field, and complex cells, which have larger receptive fields and whose output is
insensitive to the exact position of the edges in the field. Hubel and Wiesel also proposed
a cascading model of these two types of cells for use in pattern recognition tasks.
Neocognitron, origin of the CNN architecture
The "neocognitron" was introduced by Kunihiko Fukushimain 1980. It was
inspired by the above-mentioned work of Hubel and Wiesel. The neocognitron introduced
the two basic types of layers in CNNs: convolutional layers, and downsampling layers. A
convolutional layer contains units whose receptive fields cover a patch of the previous
layer. The weight vector (the set of adaptive parameters) of such a unit is often called a
filter. Units can share filters.
Downsampling layers contain units whose receptive fields cover patches of
previous convolutional layers. Such a unit typically computes the average of the
activations of the units in its patch. This downsampling helps to correctly classify objects
in visual scenes even when the objects are shifted.

In a variant of the neocognitron called the cresceptron, instead of using
Fukushima's spatial averaging, J. Weng et al. introduced a method called max-pooling
where a downsampling unit computes the maximum of the activations of the units in its
patch. Max-pooling is often used in modern CNNs.
Several supervised and unsupervised learning algorithms have been proposed over
the decades to train the weights of a neocognitron. Today, however, the CNN architecture
is usually trained through backpropagation. The neocognitron is the first CNN which
requires units located at multiple network positions to have shared weights.
Neocognitrons were adapted in 1988 to analyze time-varying signals.
Time delay neural networks
The time delay neural network (TDNN) was introduced in 1987 by Alex Waibel
et al. and was the first convolutional network, as it achieved shift invariance. It did so by
utilizing weight sharing in combination with back propagation training. Thus, while also
using a pyramidal structure as in the neocognitron, it performed a global optimization of
the weights, instead of a local one.
TDNNs are convolutional networks that share weights along the temporal
dimension. They allow speech signals to be processed time-invariantly. This inspired
translation invariance in image processing with CNNs. The tiling of neuron outputs can
cover timed stages. TDNNs now achieve the best performance in far distance speech
recognition.
Image recognition with CNNs trained by gradient descent
A system to recognize hand-written ZIP Code numbers involved convolutions in
which the kernel coefficients had been laboriously hand designed. Yann LeCun et al.
(1989) used backpropagation to learn the convolution kernel coefficients directly from
images of handwritten numbers. Learning was thus fully automatic, performed better than
manual coefficient design, and was suited to a broader range of image recognition
problems and image types. This approach became a foundation of modern computer
vision.
Shift-invariant neural network
Similarly, a shift invariant neural network was proposed by W. Zhang et al. for
image character recognition in 1988. The architecture and training algorithm were
modified in 1991 and applied for medical image processing and automatic detection of
breast cancer in mammograms.

A different convolution-based design was proposed in 1988 for application to
decomposition of one-dimensional electromyography convolved signals via de-convolution.
This design was modified in 1989 to other de-convolution-based designs.
Neural abstraction pyramid
The feed-forward architecture of convolutional neural networks was extended in
the neural abstraction pyramid[37] by lateral and feedback connections. The resulting
recurrent convolutional network allows for the flexible incorporation of contextual
information to iteratively resolve local ambiguities. In contrast to previous models,
image-like outputs at the highest resolution were generated. Traditional multilayer
perceptron (MLP) models were successfully used for image recognition. However, due to
the full connectivity between nodes they suffer from the curse of dimensionality, and thus
do not scale well to higher resolution images. A 1000×1000-pixel image with RGB
color channels has 3 million dimensions, which is too high to feasibly process efficiently
at scale with full connectivity.
Thus, full connectivity of neurons is wasteful for purposes such as image
recognition that are dominated by spatially local input patterns.
Convolutional neural networks are biologically inspired variants of multilayer
perceptrons that are designed to emulate the behavior of a visual cortex. These models
mitigate the challenges posed by the MLP architecture by exploiting the strong spatially
local correlation present in natural images. As opposed to MLPs, CNNs have the
following distinguishing features:
3D volumes of neurons
 The layers of a CNN have neurons arranged in 3 dimensions: width, height and
depth. The neurons inside a layer are connected to only a small region of the layer
before it, called a receptive field. Distinct types of layers, both locally and
completely connected, are stacked to form a CNN architecture.
Local connectivity
 Following the concept of receptive fields, CNNs exploit spatial locality by
enforcing a local connectivity pattern between neurons of adjacent layers. The
architecture thus ensures that the learned "filters" produce the strongest response
to a spatially local input pattern. Stacking many such layers leads to non-linear
filters that become increasingly global (i.e. responsive to a larger region of pixel
space) so that the network first creates representations of small parts of the input,
then from them assembles representations of larger areas.
Shared weights
 In CNNs, each filter is replicated across the entire visual field. These replicated
units share the same parameterization (weight vector and bias) and form a feature
map. This means that all the neurons in a given convolutional layer respond to the
same feature within their specific response field. Replicating units in this way
allows for features to be detected regardless of their position in the visual field,
thus constituting a property of translation invariance.
 Together, these properties allow CNNs to achieve better generalization on vision
problems. Weight sharing dramatically reduces the number of free parameters
learned, thus lowering the memory requirements for running the network and
allowing the training of larger, more powerful networks.
1.2 MOTIVATION
In our previous work on image classification, we used a Multilayer Perceptron
on the MNIST digits dataset. The performance was quite good, as we achieved 98.3%
accuracy on test data. But there was a problem with that approach: in our training
dataset, all images are centered. If the images in the test set are off-center, the
MLP approach fails miserably. We want the network to be translation-invariant.
1.3 PROBLEM STATEMENT
Semantic image segmentation refers to the problem of assigning a semantic label
(such as “person”, “car” or “dog”) to every pixel in the image. Semantic segmentation (or
pixel classification) associates one of the pre-defined class labels to each pixel.
The input image is divided into regions, which correspond to the objects of the
scene or "stuff" (in terms of Heitz and Koller (2008)). In the simplest case pixels are
classified w.r.t. their local features, such as colour and/or texture features (Shotton et al.,
2006). Markov Random Fields could be used to incorporate inter-pixel relations. To
evaluate the performance of our proposed approach we are going to use performance
parameter named "mean intersection-over-union (IOU) score ". For each class,
Intersection over Union (IU) score is=true positive / (true positive + false positive + false
negative).
 True positives:The number of correctly classified pixels.
 False positives: Thenumber of pixels wrongly classified.

10
 False negatives: The number of pixels wrongly not classifed.
We will try to achieve a mean intersection-over-union (IoU) score of 60-70% on
the PASCAL VOC segmentation benchmark.
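A small MATLAB sketch of the per-class IoU computation on toy 4 x 4 label maps (the numbers are made up for illustration):

gt   = [1 1 2 2; 1 1 2 2; 3 3 3 3; 3 3 3 3];   % ground truth labels
pred = [1 1 2 3; 1 2 2 3; 3 3 3 3; 3 3 1 3];   % predicted labels
K = 3;                                         % number of classes

iou = zeros(1, K);
for c = 1:K
    tp = sum(pred(:) == c & gt(:) == c);   % correctly classified pixels
    fp = sum(pred(:) == c & gt(:) ~= c);   % pixels wrongly classified as c
    fn = sum(pred(:) ~= c & gt(:) == c);   % pixels of class c that were missed
    iou(c) = tp / (tp + fp + fn);
end
meanIoU = mean(iou);                       % the benchmark score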
1.4 AIM AND OBJECTIVE
The main aim of the project is to train the CNN network and extract the features
of images. Data mining has emerged as a great domain that contributes mechanisms for
data analysis, discovery of hidden knowledge, and autonomous decision making in many
application domains. Supervised machine learning searches for algorithms that reason
from externally supplied instances to produce general hypotheses, which then make
predictions about future scenarios or events. In other words, the goal of supervised
learning is to build a concise model of the distribution of class labels (distribution or
classification) in terms of predictor features. The resulting classifier is then used to
assign class labels (attributes) to the testing instances where the values of the predictor
features (attributes or properties) are known, but the value of the class label is unknown.
The input image will be given as the unknown image and the output will be a labeled
image, as it depends on the features and training of the Convolutional Neural Network
for the image classification.
1.5 ORGANISATION OF DOCUMENT
The various stages involved in the development of this project have been properly
organized into chapters to enhance comprehensive and concise reading. In this project
thesis, the project is organized sequentially as follows.
Chapter 1 of this work is the introduction to image classification. In this chapter
the objective, limitations and problem definition are discussed.
Chapter 2 is the literature survey of image classification. In this chapter all the
literature pertaining to this work is reviewed.
Chapter 3 is on feature extraction.
Chapter 4 is on the KNN classifier.
Chapter 5 is on the CNN classifier.
Chapter 6 is on the Matlab software.
Chapter 7 is on the results and discussions.
Chapter 8 is the conclusion.
Chapter 9 is on the future scope.
Chapter 10 contains the references.

CHAPTER 2
LITERATURE SURVEY

Convolutional neural networks with many layers have recently been shown to
achieve excellent results on many high-level tasks such as image classification, object
detection and more recently also semantic segmentation. Particularly for semantic
segmentation, a two-stage procedure is often employed. Hereby, convolutional networks
are trained to provide good local pixel-wise features for the second step, which is
traditionally a more global graphical model.
Alexander G. Schwing and Raquel Urtasun unify this two-stage process into a single
joint training algorithm. They demonstrate their method on the semantic image
segmentation task and show encouraging results on the challenging PASCAL VOC 2012
dataset.
Anastasios Doulamis, Nikolaos Doulamis, Klimis Ntalianis, and Stefanos Kollias
proposed an unsupervised video object (VO) segmentation and tracking algorithm based
on an adaptable neural-network architecture.
The proposed scheme comprises:
(1) a VO tracking module and
(2) an initial VO estimation module.
Object tracking is handled as a classification problem and implemented through
an adaptive network classifier, which provides better results compared to conventional
motion-based tracking algorithms. Network adaptation is accomplished through an
efficient and cost effective weight updating algorithm, providing a minimum degradation
of the previous network knowledge and taking into account the current content
conditions. A retraining set is constructed and used for this purpose based on initial VO
estimation results. Two different scenarios are investigated. The first concerns extraction
of human entities in video conferencing applications, while the second exploits depth
information to identify generic VOs in stereoscopic video sequences. Human face and
body detection based on Gaussian distributions is accomplished in the first scenario, while
segmentation fusion is obtained using color and depth information in the second scenario.
A decision mechanism is also incorporated to detect time instances for weight updating.

Experimental results and comparisons indicate the good performance of the
proposed scheme even in sequences with complicated content (object bending,
occlusion).
Bharath Hariharan, Pablo Arbeláez, Ross Girshick, and Jitendra Malik
detect all instances of a category in an image and, for each instance, mark the pixels that
belong to it. They call this task Simultaneous Detection and Segmentation (SDS). Unlike
classical bounding box detection, SDS requires segmentation and not just a box. Unlike
classical semantic segmentation, it requires individual object instances. They build on
recent work that uses convolutional neural networks to classify category-independent
region proposals (R-CNN), introducing a novel architecture tailored for SDS. They then
use category-specific, top-down figure-ground predictions to refine their bottom-up
proposals. They show a 7 point boost (16% relative) over their baselines on SDS, a 5 point
boost (10% relative) over state-of-the-art on semantic segmentation, and state-of-the-art
performance in object detection.
Bogdan Alexe, Thomas Deselaers, and Vittorio Ferrari presented a generic
objectness measure, quantifying how likely it is for an image window to contain an object
of any class. They explicitly train it to distinguish objects with a well-defined boundary in
space, such as cows and telephones, from amorphous background elements, such as grass
and road. The measure combines in a Bayesian framework several image cues measuring
characteristics of objects, such as appearing different from their surroundings and having
a closed boundary. These include an innovative cue to measure the closed boundary
characteristic. In experiments on the challenging PASCAL VOC 07 dataset, they show this
new cue to outperform a state-of-the-art saliency measure, and the combined objectness
measure to perform better than any cue alone. They also compare to interest point
operators, a HOG detector, and three recent works aiming at automatic object
segmentation. Finally, they present two applications of objectness. In the first, they sample
a small number of windows according to their objectness probability and give an algorithm
to employ them as location priors for modern class-specific object detectors. As they
show experimentally, this greatly reduces the number of windows evaluated by the
expensive class-specific model. In the second application, they use objectness as a
complementary score in addition to the class-specific model, which leads to fewer false
positives.

As shown in several recent papers, objectness can act as a valuable focus of
attention mechanism in many other applications operating on image windows, including
weakly supervised learning of object categories, unsupervised pixelwise segmentation,
and object tracking in video. Computing objectness is very efficient and takes only about
4 sec. per image.
Camille Couprie, Clement Farabet, Laurent Najman and Yann LeCun addresses
multiclass segmentation of indoor scenes with RGB-D inputs. While this area of research
has gained much attention recently, most works still rely on hand-crafted features. In
contrast, they apply a multiscale convolutional network to learn features directly from the
images and the depth information. They obtain state-of-the-art on the NYU-v2 depth
dataset with an accuracy of 64.5%. They illustrate the labeling of indoor scenes in video
sequences that could be processed in real-time using appropriate hardware such as an
FPGA.
C.P. Town and D. Sinclair demonstrate an approach to content based image
retrieval founded on the semantically meaningful labelling of images by high level visual
categories. The image labelling is achieved by means of a set of trained neural network
classifiers which map segmented image region descriptors onto semantic class
membership terms. It is argued that the semantic terms give a good estimate of the salient
features which are important for discrimination in image retrieval. Furthermore, it is
shown that the choice of visual categories such as grass or sky which mirror high level
human perception allows the implementation of intuitive and versatile query composition
interfaces and a variety of image similarity metrics for content based retrieval.
Christian Szegedy, Alexander Toshev and Dumitru Erhan note that deep neural
networks (DNNs) have shown outstanding performance on image classification tasks. In
their paper they go one step further and address the problem of object detection using
DNNs, that is, not only classifying but also precisely localizing objects of various classes.
They present a simple and yet powerful formulation of object detection as a regression
problem to object bounding box masks. They define a multiscale inference procedure
which is able to produce high-resolution object detections at a low cost by a few network
applications. State-of-the-art performance of the approach is shown on Pascal VOC.
Clement Farabet, Camille Couprie, Laurent Najman and Yann LeCun propose a
method that uses a multiscale convolutional network trained from raw pixels to extract
dense feature vectors that encode regions of multiple sizes centered on each pixel.

The method alleviates the need for engineered features, and produces a powerful
representation that captures texture, shape and contextual information. They report results
using multiple post-processing methods to produce the final labeling.
Among those, they propose a technique to automatically retrieve, from a pool of
segmentation components, an optimal set of components that best explain the scene; these
components are arbitrary, e.g. they can be taken from a segmentation tree, or from any
family of over-segmentations. The system yields record accuracies on the Sift Flow
Dataset (33 classes) and the Barcelona Dataset (170 classes) and near-record accuracy on
Stanford Background Dataset (8 classes), while being an order of magnitude faster than
competing approaches, producing a 320 × 240 image labeling in less than a second,
including feature extraction.
Clement Farabet, Camille Couprie, Laurent Najman and Yann LeCun also proposed a
scene parsing method that starts by computing a tree of segments from a graph of pixel
dissimilarities. Simultaneously, a set of dense feature vectors is computed which encodes
regions of multiple sizes centered on each pixel.
The feature extractor is a multiscale convolutional network trained from raw
pixels. The feature vectors associated with the segments covered by each node in the tree
are aggregated and fed to a classifier which produces an estimate of the distribution of
object categories contained in the segment. A subset of tree nodes that cover the image
are then selected so as to maximize the average “purity” of the class distributions, hence
maximizing the overall likelihood that each segment will contain a single object. The
convolutional network feature extractor is trained end-to-end from raw pixels, alleviating
the need for engineered features. After training, the system is parameter free. The system
yields record accuracies on the Stanford Background Dataset (8 classes), the Sift Flow
Dataset (33 classes) and the Barcelona Dataset (170 classes) while being an order of
magnitude faster than competing approaches, producing a 320 × 240 image labeling in
less than 1 second.
Hongsheng Li, Rui Zhao, and Xiaogang Wang present highly efficient algorithms
for performing forward and backward propagation of Convolutional Neural Network
(CNN) for pixelwise classification on images. For pixelwise classification tasks, such as
image segmentation and object detection, surrounding image patches are fed into CNN
for predicting the classes of centered pixels via forward propagation and for updating
CNN parameters via backward propagation. However, forward and backward propagation
was originally designed for whole-image classification.
Directly applying it to pixelwise classification in a patch-by-patch scanning
manner is extremely inefficient, because surrounding patches of pixels have large
overlaps, which lead to a lot of redundant computation. The proposed algorithms
eliminate all the redundant computation in convolution and pooling on images by
introducing novel d-regularly sparse kernels. It generates exactly the same results as those
by patch-by-patch scanning.
Convolution and pooling operations with such kernels are able to continuously
access memory and can run efficiently on GPUs. A fraction of patches of interest can be
chosen from each training image for backward propagation by applying a mask to the
error map at the last CNN layer. Its computation complexity is constant with respect to
the number of patches sampled from the image. Experiments have shown that their
proposed algorithms speed up commonly used patch-by-patch scanning over 1500 times
in both forward and backward propagation. The speedup increases with the sizes of
images and patches. Source code of a GPU implementation is ready to be released to the
public.
The topic of semantic segmentation has witnessed considerable progress due to
the powerful features learned by convolutional neural networks (CNNs). The current
leading approaches for semantic segmentation exploit shape information by extracting
CNN features from masked image regions. This strategy introduces artificial boundaries
on the images and may impact the quality of the extracted features. Besides, the
operations on the raw image domain require to compute thousands of networks on a
single image, which is time-consuming.
Jifeng Dai, Kaiming He and Jian Sun propose a
method to exploit shape information via masking convolutional features. The proposal
segments (e.g., super-pixels) are treated as masks on the convolutional feature maps. The
CNN features of segments are directly masked out from these maps and used to train
classifiers for recognition. They further propose a joint method to handle objects and
“stuff” (e.g., grass, sky, water) in the same framework. State-of-the-art results are
demonstrated on benchmarks of PASCAL VOC and the new PASCAL-CONTEXT, with a
compelling computational speed.
Jonathan Long, Evan Shelhamer and Trevor Darrell show that convolutional
networks by themselves, trained end-to-end, pixels-to-pixels, exceed the state-of-the-art in
semantic segmentation.

Their key insight is to build “fully convolutional” networks that take input of
arbitrary size and produce correspondingly-sized output with efficient inference and
learning. They define and detail the space of fully convolutional networks, explain their
application to spatially dense prediction tasks, and draw connections to prior models.
They adapt contemporary classification networks (AlexNet, the VGG net, and
GoogLeNet) into fully convolutional networks and transfer their learned representations
by fine-tuning to the segmentation task. They then define a novel architecture that
combines semantic information from a deep, coarse layer with appearance information
from a shallow, fine layer to produce accurate and detailed segmentations. Their fully
convolutional network achieves state-of-the-art segmentation of PASCAL VOC (20%
relative improvement to 62.2% mean IU on 2012), NYUDv2, and SIFT Flow, while
inference takes less than one fifth of a second for a typical image.
Jose M. Alvarez, Yann LeCun, Theo Gevers and Antonio M. Lopez refer to semantic
segmentation as the process of assigning an object label (e.g., building, road, sidewalk,
car, pedestrian) to every pixel in an image. Common approaches formulate the task as a
random field labeling problem modeling the interactions between labels by combining
local and contextual features such as color, depth, edges, SIFT or HoG. These models are
trained to maximize the likelihood of the correct classification given a training set.
However, these approaches rely on hand-designed features (e.g., texture, SIFT or HoG)
and a higher computational time required in the inference process.
Therefore, in their paper, they focus on estimating the unary potentials of a
conditional random field via ensembles of learned features. They propose an algorithm
based on convolutional neural networks to learn local features from training data at
different scales and resolutions. Then, diversification between these features is exploited
using a weighted linear combination. Experiments on a publicly available database show
the effectiveness of the proposed method for semantic road scene segmentation in
still images. The algorithm outperforms appearance based methods and its performance is
similar compared to state-of-the-art methods using other sources of information such as
depth, motion or stereo.
Joseph J. Lim, C. Lawrence Zitnick and Piotr Dollár proposed a
novel approach to both learning and detecting local contour-based representations for
mid-level features. Their features, called sketch tokens, are learned using supervised mid-
level information in the form of hand drawn contours in images. Patches of human
generated contours are clustered to form sketch token classes and a random forest
classifier is used for efficient detection in novel images.
They demonstrate their approach on both top-down and bottom-up tasks. They
show state-of-the-art results on the top-down task of contour detection while being over
200× faster than competing methods. They also achieve large improvements in detection
accuracy for the bottom-up tasks of pedestrian and object detection as measured on
INRIA and PASCAL, respectively. These gains are due to the complementary
information provided by sketch tokens to low-level features such as gradient histograms.
Joao Carreira and Cristian Sminchisescu presented a novel framework for
generating and ranking plausible object hypotheses in an image using bottom-up
processes and mid-level cues. The object hypotheses are represented as figure-ground
segmentations, and are extracted automatically, without prior knowledge about properties
of individual object classes, by solving a sequence of constrained parametric min-cut
problems (CPMC) on a regular image grid. They then learn to rank the object hypotheses
by training a continuous model to predict how plausible the segments are, given their
mid-level region properties. They show that this algorithm significantly outperforms the
state of the art for low-level segmentation in the VOC09 segmentation dataset. It achieves
the same average best segmentation covering as the best performing technique to date,
0.61, when using just the top 7 ranked segments, instead of the full hierarchy. Their
method achieves 0.78 average best covering using 154 segments. In a companion paper,
they also show that the algorithm achieves state-of-the-art results when used in a
segmentation-based recognition pipeline.
Convolutional Neural Networks (CNNs) have recently shown state of the art
performance in high level vision tasks, such as image classification and object detection.
This work brings together methods from CNNs and probabilistic graphical models for
addressing the task of pixel-level classification (also called “semantic image
segmentation”). Liang-Chieh Chen, George Papandreou, Iasonas Kokkinos, Kevin
Murphy and Alan L. Yuille show that responses at the final layer of CNNs are not
sufficiently localized for accurate object segmentation. This is due to the very invariance
properties that make CNNs good for high level tasks. They overcome this poor
localization property of deep networks by combining the responses at the final CNN layer
with a fully connected Conditional Random Field (CRF). Qualitatively, their "DeepLab"
system is able to localize segment boundaries at a level of accuracy which is beyond
previous methods. Quantitatively, their method sets the new state of the art on the PASCAL
VOC-2012 semantic image segmentation task, reaching 71.6% IoU accuracy on the test
set.
They show how these results can be obtained efficiently: Careful network re-
purposing and a novel application of the ‘hole’ algorithm from the wavelet community
allow dense computation of neural net responses at 8 frames per second on a modern
GPU.
Mohammadreza Mostajabi, Payman Yadollahpour and Gregory Shakhnarovich introduce a
purely feed-forward architecture for semantic segmentation. They map small image
elements (superpixels) to rich feature representations extracted from a sequence of nested
regions of increasing extent. These regions are obtained by “zooming out” from the
superpixel all the way to scene-level resolution. This approach exploits statistical
structure in the image and in the label space without setting up explicit structured
prediction mechanisms, and thus avoids complex and expensive inference. Instead
superpixels are classified by a feed forward multilayer network. Their architecture
achieves new state of the art performance in semantic segmentation, obtaining 64.4%
average accuracy on the PASCAL VOC 2012 test set.
Ning Zhang et al., in their paper "Part-based R-CNNs for fine-grained category
detection", showed that semantic part localization can facilitate fine-grained categorization
by explicitly isolating subtle appearance differences associated with specific object parts.
Methods for pose-normalized representations have been proposed, but generally presume
bounding box annotations at test time due to the difficulty of object detection. They
proposed a model for fine-grained categorization that overcomes these limitations by
leveraging deep convolutional features computed on bottom-up region proposals. Their
method learns whole object and part detectors, enforces learned geometric constraints
between them, and predicts a fine-grained category from a pose-normalized representation.
Experiments on the Caltech-UCSD bird dataset confirm that this method outperforms
state-of-the-art fine-grained categorization methods in an end-to-end evaluation without
requiring a bounding box at test time.
Pablo Arbeláez, Bharath Hariharan, Chunhui Gu, Saurabh Gupta, Lubomir
Bourdev and Jitendra Malik addressed the problem of segmenting and recognizing
objects in real world images, focusing on challenging articulated categories such as
humans and other animals. For this purpose, they propose a novel design for region-based
object detectors that integrates efficiently top-down information from scanning-windows
part models and global appearance cues. Their detectors produce class-specific scores for
bottom-up regions, and then aggregate the votes of multiple overlapping candidates
through pixel classification.
They evaluate their approach on the PASCAL segmentation challenge, and report
competitive performance with respect to current leading techniques. On VOC2010, their
method obtains the best results in 6/20 categories and the highest performance on
articulated objects.
Pedro H. O. Pinheiro and Ronan Collobert propose an approach consisting of a recurrent
convolutional neural network which allows them to consider a large input context, while
limiting the capacity of the model. Scene parsing is a technique that consists of giving a
label to all pixels in an image according to the class they belong to. To ensure good
visual coherence and a high class accuracy, it is essential for a scene parser to capture
image long range dependencies.
In a feed-forward architecture, this can be simply achieved by considering a
sufficiently large input context patch around each pixel to be labeled. Contrary to most
standard approaches, their method does not rely on any segmentation methods, nor any
task-specific features. The system is trained in an end-to-end manner over raw pixels, and
models complex spatial dependencies with low inference cost. As the context size
increases with the built-in recurrence, the system identifies and corrects its own errors.
Their approach yields state-of-the-art performance on both the Stanford Background
Dataset and the SIFT Flow Dataset, while remaining very fast at test time.
Pierre Sermanet, David Eigen, Xiang Zhang, Michael Mathieu, Rob Fergus and Yann
LeCun present an integrated framework for using Convolutional Networks for
classification, localization and detection. They show how a multiscale and sliding
window approach can be efficiently implemented within a ConvNet. They also introduce
a novel deep learning approach to localization by learning to predict object boundaries.
Bounding boxes are then accumulated rather than suppressed in order to increase
detection confidence. They show that different tasks can be learned simultaneously using
a single shared network. This integrated framework is the winner of the localization task
of the ImageNet Large Scale Visual Recognition Challenge 2013 (ILSVRC2013) and
obtained very competitive results for the detection and classification tasks. In post-
competition work, they establish a new state of the art for the detection task. Finally, they
release a feature extractor from their best model, called OverFeat.
Ross Girshick in 2015 presented the technique "Fast R-CNN". In his paper he
proposed Fast R-CNN, a clean and fast framework for object detection. Compared to
traditional R-CNN, and its accelerated version SPPnet, Fast R-CNN trains networks using
a multi-task loss in a single training stage.
The multi-task loss simplifies learning and improves detection accuracy. Unlike
SPPnet, all network layers can be updated during fine-tuning. They show that this
difference has practical ramifications for very deep networks, such as VGG16, where
mAP suffers when only the fully-connected layers are updated. Compared to "slow" R-CNN,
Fast R-CNN is 9× faster at training VGG16 for detection, 213× faster at test-time, and
achieves a significantly higher mAP on PASCAL VOC 2012. Compared to SPPnet, Fast
R-CNN trains VGG16 3× faster, tests 10× faster, and is more accurate.
Object detection performance, as measured on the canonical PASCAL VOC
dataset, has plateaued in the last few years. The best-performing methods are complex
ensemble systems that typically combine multiple low-level image features with high-
level context. In this paper, Ross Girshick, Jeff Donahue, Trevor Darrell, and Jitendra
Malik propose a simple and scalable detection algorithm that improves mean average
precision (mAP) by more than 30% relative to the previous best result on VOC 2012—
achieving a mAP of 53.3%.
Their approach combines two key insights
 One can apply high-capacity convolutional neural networks (CNNs) to bottom-up
region proposals in order to localize and segment objects.
 When labeled training data is scarce, supervised pre-training for an auxiliary task,
followed by domain-specific fine-tuning, yields a significant performance boost.
Since they combine region proposals with CNNs, they call their method R-CNN:
Regions with CNN features. They also compare R-CNN to OverFeat, a recently proposed
sliding-window detector based on a similar CNN architecture. They find that R-CNN
outperforms OverFeat by a large margin on the 200-class ILSVRC2013 detection dataset.
Shaoqing Ren et al. published a paper on Faster R-CNN, using it for real-time object
detection with Region Proposal Networks. State-of-the-art object detection networks
depend on region proposal algorithms to hypothesize object locations. Advances like
SPPnet and Fast R-CNN have reduced the running time of these detection networks,
exposing region proposal computation as a bottleneck. In this work, they introduced a
Region Proposal Network (RPN) that shares full-image convolutional features with the
detection network, thus enabling nearly cost-free region proposals. An RPN is a fully
convolutional network that simultaneously predicts object bounds and objectness scores
at each position. RPNs are trained end-to-end to generate high-quality region proposals,
which are used by Fast R-CNN for detection. With a simple alternating optimization,
RPN and Fast R-CNN can be trained to share convolutional features. For the very deep
VGG-16 model, their detection system has a frame rate of 5fps (including all steps) on a
GPU, while achieving state-of-the-art object detection accuracy on PASCAL VOC 2007
(73.2% mAP) and 2012 (70.4% mAP) using 300 proposals per image.
S. Ji and H.W. Park proposed a two-step image segmentation algorithm based on
region coherency for the segmentation of color images. The first step is watershed
segmentation, and the next is region merging using artificial neural networks.
Spatially homogeneous regions are obtained by the first step, but the regions are over-
segmented. The second step merges the over-segmented regions. The proposed method
exploits the luminance and chrominance difference components of the color image to
verify region coherency. The YUV color coordinate system is used in this work.
Graph cut optimization is one of the standard workhorses of image segmentation since
for binary random field representations of the image, it gives globally optimal results and
there are efficient polynomial time implementations. Often, the random field is applied
over a flat partitioning of the image into non-intersecting elements, such as pixels or
super-pixels. In the paper Victor Lempitsky, Andrea Vedaldi and Andrew Zisserman
show that if, instead of a flat partitioning, the image is represented by a hierarchical
segmentation tree, then the resulting energy combining unary and boundary terms can
still be optimized using graph cut (with all the corresponding benefits of global optimality
and efficiency). As a result of such inference, the image gets partitioned into a set of
segments that may come from different layers of the tree. They apply this formulation,
which they call the pylon model, to the task of semantic segmentation where the goal is to
separate an image into areas belonging to different semantic classes. The experiments
highlight the advantage of inference on a segmentation tree (over a flat partitioning) and
demonstrate that the optimization in the pylon model is able to flexibly choose the level
of segmentation across the image. Overall, the proposed system has superior
segmentation accuracy on several datasets (Graz-02, Stanford background) compared to
previously suggested approaches.

CHAPTER 3
FEATURE EXTRACTION

3.1 INTRODUCTION
The purpose of feature extraction is to transform an input image that contains a
large amount of information into a reduced representation with a set of features. Images
contain a large number of features. It is important to extract the features relevant to a
particular task in order to reduce the complexity of processing. Most likely, these are
attributes of the image being analyzed. This technique increases the overall efficiency
of the system. Texture is a most important property of an image; it specifies attributes,
resolution, etc., and can be used in image mining. Features extracted using the gray level
co-occurrence matrix (GLCM) are used to distinguish between normal and abnormal
medical images.

3.2 STATISTICAL FEATURES
An algorithm that implements classification, especially in a concrete
implementation, is known as a classifier. The term "classifier" typically refers
to the function, implemented by the classification algorithm, that maps the input data to
a category. GLCM is a statistical texture feature that considers the spatial relationship of
the pixels. Haralick recommended the use of the co-occurrence matrix or gray level
co-occurrence matrix. It considers the relationship between two neighboring pixels: the
first pixel is known as the reference and the second as the neighbor pixel.

Texture contains important information about the surface structure of the image.
Generally texture is a feature used in the analysis and interpretation of images. P[i, j] is
defined by first specifying a displacement vector d = (dx, dy) and counting all pairs of
pixels separated by d having gray levels i and j. G represents the dimension of the
co-occurrence matrix (the number of gray levels), and Px(i) and Py(j) are the marginal
probabilities.
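As an illustration, a GLCM for the displacement d = (0, 1) can be built with the MATLAB Image Processing Toolbox; the file name mri_slice.png is a hypothetical placeholder:

I = imread('mri_slice.png');       % hypothetical input slice
if size(I, 3) == 3
    I = rgb2gray(I);               % the GLCM is defined on one channel
end

% Pairs of pixels separated by d = (0, 1), quantized to G = 8 levels;
% 'Symmetric' also counts each pair in the reverse order.
glcm = graycomatrix(I, 'Offset', [0 1], 'NumLevels', 8, 'Symmetric', true);

% Built-in second-order statistics for a quick check
stats = graycoprops(glcm, {'Contrast', 'Correlation', 'Energy', 'Homogeneity'});
disp(stats)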

The features are defined in terms of the normalized co-occurrence matrix P(i, j), with
marginal means μx, μy and standard deviations σx, σy, where Px+y(k) = Σ(i+j=k) P(i, j)
and Px−y(k) = Σ(|i−j|=k) P(i, j).

Mean
μx = Σi Σj i · P(i, j)

Standard Deviation
σx = sqrt( Σi Σj (i − μx)² · P(i, j) )

Contrast
Σi Σj (i − j)² · P(i, j)

Correlation
Σi Σj (i − μx)(j − μy) · P(i, j) / (σx · σy)

Cluster Prominence
Σi Σj (i + j − μx − μy)⁴ · P(i, j)

Cluster Shade
Σi Σj (i + j − μx − μy)³ · P(i, j)

Dissimilarity
Σi Σj |i − j| · P(i, j)

Energy
Σi Σj P(i, j)²

Entropy
−Σi Σj P(i, j) · log P(i, j)

Information Measure of Correlation
(HXY − HXY1) / max(HX, HY), where HXY = −Σi Σj P(i, j) · log P(i, j),
HXY1 = −Σi Σj P(i, j) · log( Px(i) · Py(j) ), and HX, HY are the entropies of Px and Py.

Homogeneity
Σi Σj P(i, j) / (1 + |i − j|)

Maximum Probability
max over (i, j) of P(i, j)

Sum of Squares (Variance)
Σi Σj (i − μ)² · P(i, j)

Autocorrelation
Σi Σj (i · j) · P(i, j), where p and q are the positional differences in the i-th and j-th
direction, and m and n are the image dimensions.

Sum Average
Σk k · Px+y(k), k = 2, …, 2G

Sum Variance
Σk (k − Sum Average)² · Px+y(k)

Sum Entropy
−Σk Px+y(k) · log Px+y(k)

Difference Variance
the variance of Px−y

Difference Entropy
−Σk Px−y(k) · log Px−y(k), k = 0, …, G − 1

Inertia
Σi Σj (i − j)² · P(i, j) (equivalent to Contrast)
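Several of the simpler features above can be computed directly from the normalized matrix P; a minimal MATLAB sketch, continuing from the glcm built earlier:

P = double(glcm) / sum(glcm(:));   % normalize counts to probabilities
G = size(P, 1);
[J, I] = meshgrid(1:G, 1:G);       % I(r,c) = row index i, J(r,c) = column index j

contrast      = sum(sum((I - J).^2 .* P));
dissimilarity = sum(sum(abs(I - J) .* P));
energy        = sum(sum(P.^2));
homogeneity   = sum(sum(P ./ (1 + abs(I - J))));
maxProb       = max(P(:));
nz            = P > 0;                        % avoid log(0)
entropyVal    = -sum(P(nz) .* log(P(nz)));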

3.3 FRACTAL DIMENSIONS

A fractal dimension is a ratio providing a statistical index of complexity,
comparing how detail in a pattern changes with the scale at which it is measured. It has
also been characterized as a measure of the space-filling capacity of a pattern that tells
how a pattern scales differently from the space it is embedded in; a fractal dimension
does not need to be an integer.

The essential idea of "fractured" dimensions has a long history in mathematics,
but the term itself was brought to the fore by the mathematician Mandelbrot, based on his
1967 paper on self-similarity, in which he discussed fractional dimensions.

In terms of that notion, the fractal dimension of a coastline quantifies how the
number of scaled measuring sticks required to measure the coastline changes with the
scale applied to the stick. There are many formal mathematical definitions of fractal
dimension that hinge upon this basic concept of change in detail with change in scale.
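One common way to estimate a fractal dimension of an image is box counting: count the boxes of side s that contain any foreground pixel and fit the slope of log(count) against log(1/s). A minimal MATLAB sketch, assuming a binary image BW (e.g. an edge or segmentation mask):

function D = boxcountDim(BW)
% Estimate the box-counting (fractal) dimension of a binary image BW.
n = min(size(BW));
p = floor(log2(n));
sizes = 2.^(1:p-1);                % box side lengths s
counts = zeros(size(sizes));
for k = 1:numel(sizes)
    s = sizes(k);
    c = 0;
    for i = 1:s:size(BW, 1) - s + 1
        for j = 1:s:size(BW, 2) - s + 1
            if any(any(BW(i:i+s-1, j:j+s-1)))
                c = c + 1;         % this box contains foreground
            end
        end
    end
    counts(k) = c;
end
% Slope of log N(s) against log(1/s) estimates the dimension
coeffs = polyfit(log(1 ./ sizes), log(counts), 1);
D = coeffs(1);
end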

3.4 GRAY LEVEL DIFFERENCE STATISTICS


According to the number of intensity points (pixels) in each combination,
statistics are classified into first order, second order and higher order statistics. The gray
level difference method is a way of extracting second order statistical texture features.
The approach has been used in a number of applications.
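A sketch of the idea behind gray level difference statistics: form the absolute difference image for a displacement d = (0, 1) and take first-order statistics of its histogram (the input file name is again a hypothetical placeholder):

I = double(imread('mri_slice.png'));      % hypothetical grayscale slice
D = abs(I(:, 1:end-1) - I(:, 2:end));     % |I(x, y) - I(x, y+1)|

h = histcounts(D(:), 0:256);              % difference histogram
p = h / sum(h);                           % normalized probabilities
d = 0:255;                                % possible difference values

gldsMean     = sum(d .* p);               % first-order statistics of
gldsContrast = sum(d.^2 .* p);            % the difference distribution
nz = p > 0;
gldsEntropy  = -sum(p(nz) .* log(p(nz)));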

3.4.1 Laws TEM
Transmission electron microscopy is a microscopy technique in which a beam of
electrons is transmitted through a specimen to form an image. The specimen is most
often an ultrathin section less than 100 nm thick or a suspension on a grid. An image is
formed from the interaction of the electrons with the sample as the beam is transmitted
through the specimen. The image is then magnified and focused onto an imaging device,
such as a fluorescent screen, a layer of photographic film, or a sensor such as a
scintillator attached to a charge-coupled device. Transmission electron microscopes are
capable of imaging at significantly higher resolution than light microscopes, owing to the
smaller de Broglie wavelength of electrons. This enables the instrument to capture fine
detail, even as small as a single column of atoms, which is thousands of times smaller
than a resolvable object seen in a light microscope. TEMs find application in cancer
research, virology, and materials science as well as pollution, nanotechnology and
semiconductor research.

Fig: 3.1 Feature Extraction of Image Parameters Using Excel Sheet

CHAPTER 4
KNN CLASSIFIER

4.1 INTRODUCTION OF NEURAL NETWORK


Neural networks are made up of many layers, and each layer consists of a
different number of neurons which have trainable weights and biases. All the neurons are
fully connected to the neurons in the previous and following layers. The first layer is the
input layer, which is viewed as a single vector. The last layer is the output layer, whose
output is viewed as the prediction result. The other layers between the input and output
layers are called hidden layers; they process and pass the 'message' from the previous
layer to the next layer. Every neuron receives some inputs from neurons in the previous
layer. It then computes a dot product of all the inputs, optionally followed by a
non-linear function, as the output of this neuron.

4.1.1 How does the neural network work

1. Initialize all weights in the neural network; w_ij^(l) stands for the weight on the path
from the i-th neuron in the (l − 1)-th layer to the j-th neuron in the l-th layer.
2. Feedforward:

(a) Take one training example; set the input values of each neuron in the 0th (input)
layer and the label in the output layer.
(b) Compute the total net input x_j^(l) from the previous layer to each hidden layer
neuron in the next layer, and squash the total net input using an activation function to
obtain the output y_j^(l) of the next layer (here we use the logistic function); then
repeat the process for the output layer neurons:

X^(l) = Y^(l−1) W^(l−1), W^(l−1) ∈ R^(d(l−1) × d(l)),
Y^(l) = σ(X^(l)), X^(l) and Y^(l) ∈ R^(1 × d(l)),

in which d(l) is the number of neurons in the l-th layer.

3. Backpropagation:

W(l) = W(l) − η dW(l)

(a) Compute the gradient matrix dW on the weights from the second-to-last layer to the
output layer (l = n). Let the error be E − Y(n), where E denotes the label vector; then
calculate the gradient of each weight between the last layer and its preceding layer:

δ(n) = −(E − Y(n)) σ′(X(n)),  δ(n) ∈ R(d(n) × 1)
dW(n−1) = (Y(n−1))^T δ(n)

(b) Compute the gradients in the previous layers (l = n − 1, . . . , 1):

δ(l) = W(l) δ(l+1)
dW(l−1) = (Y(l−1))^T δ(l)

Repeat the process until all weights in the neural network have been updated.

4. Return to the second step (feedforward), and keep updating the weights until the
iteration limit is reached.
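The steps above condense into a few lines of NumPy. The sketch below (a minimal
single-hidden-layer illustration with hypothetical layer sizes; it is not the full
implementation described later) trains on one example with the logistic activation.

import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

rng = np.random.default_rng(0)
d0, d1, d2 = 4, 8, 3                # layer widths: input, hidden, output
W1 = rng.normal(0, 0.1, (d0, d1))   # weights, layer 0 -> 1
W2 = rng.normal(0, 0.1, (d1, d2))   # weights, layer 1 -> 2
eta = 0.5

x = rng.random((1, d0))             # one training example (row vector)
E = np.array([[0.0, 1.0, 0.0]])     # its one-hot label

for _ in range(1000):
    # Feedforward: X(l) = Y(l-1) W(l-1),  Y(l) = sigma(X(l))
    Y1 = sigmoid(x @ W1)
    Y2 = sigmoid(Y1 @ W2)
    # Backpropagation: delta(n) = -(E - Y(n)) sigma'(X(n)),
    # with sigma'(X) = Y (1 - Y) for the logistic function
    d2_ = -(E - Y2) * Y2 * (1 - Y2)
    dW2 = Y1.T @ d2_
    # Propagate the error backwards (sigma' factor included)
    d1_ = (d2_ @ W2.T) * Y1 * (1 - Y1)
    dW1 = x.T @ d1_
    # Gradient step: W(l) = W(l) - eta dW(l)
    W2 -= eta * dW2
    W1 -= eta * dW1

print(Y2)   # approaches the label E as training proceeds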

4.2 KNN CLASSIFIER

4.2.1 INTRODUCTION

A k-Nearest Neighbor (kNN) based classifier classifies a query instance based on the
class labels of its neighboring instances. Although kNN has proved to be a ubiquitous
classification/regression tool with good scalability, it suffers from some drawbacks.

Two of its major drawbacks are:

 The existing kNN algorithm is equivalent to using only local prior
probabilities to predict instance labels, and hence it does not take into
account the class distribution around the wider neighborhood of the query
instance, which results in undesirable performance on imbalanced data.
 It uses all the training data at runtime and is hence slow. In this thesis,
we provide a suite of solutions to tackle these drawbacks.

Classification is an age-old problem. As early as the 4th century BC, Aristotle
tried to group organisms into two classes depending on whether they were beneficial or
harmful to humans. He also introduced the concept of classifying all forms of life in order
to organize the rich diversity of living organisms. Today, classification systems find
differentiating features between classes and use them to classify unknown instances.
Classification has been recognized as an important problem in data mining among other
knowledge discovery tasks.

In the recent past, a lot of research centered on the nearest neighbor methodology has
been done. Although kNN is computationally expensive, it is very simple to understand,
accurate, requires only a few parameters to be tuned, and is robust with regard to the
search space. Also, a kNN classifier can be updated at very little cost as new training
instances with known classes are presented.

A strong point of kNN is that, for all data distributions, its probability of error is
bounded above by twice the Bayes probability of error. However, it suffers from two
major drawbacks:

 It uses only local prior probabilities to predict instance labels, and hence does not
take into account the class distribution around the neighborhood of the query instance.
This results in undesirable performance on imbalanced data sets. The performance
of the kNN algorithm over imbalanced datasets can be improved if it uses this
information while classifying instances.
 It is a lazy learner, i.e. it uses all the training data at runtime, and is hence slow.

4.3 CLASSIFICATION
Classification is defined as the process of finding a set of models (or functions)
that describe and distinguish data classes and concepts, with the goal of using the
model to predict the classes of objects whose class labels are unknown. Thus,
classification is a supervised learning problem where the task is to predict the value of a
discrete output variable given a set of training examples and a test sample, where each
training example is a pair consisting of the input object and the desired class.

Generally, data classification is a two-step process. In the first step, a classifier is
built describing a pre-determined set of data classes or concepts. This is the learning step
(or training phase), where a classification algorithm builds the classifier by analyzing, or
learning from, a training set. In the second step, the model is used for classification.

Classification has been widely used in:

 Credit scoring: Define the cap limit for a credit card based on a user's past
behaviour.
 Search engines: Categorize or classify the type of query or document.
 Handwriting recognition: Receive and interpret intelligible handwritten input from
sources such as paper documents, photographs, touch-screens and other devices.
 Document categorization: Assign an electronic document to one or more
categories, based on its contents.
 Speech recognition, and medical image analysis and diagnosis.

In this subsection, we describe the problem of classification and the notation used to
model the dataset. The problem of classification is to estimate the value of the class
variable based on the values of one or more independent variables (known as feature
variables).

We model each tuple as {x, y}, where x is an ordered set of attribute values like {x1,
x2, . . . , xd} and y is the class variable to be predicted. Here xi is the value of the ith
attribute, and there are d attributes overall, corresponding to a d-dimensional space.

Formally, the problem has the following inputs:

 A set of n tuples called the training dataset, D = {(x1, y1), (x2, y2), . . . , (xn, yn)}.
 A query tuple xt.

The output is an estimated value of the class variable for the given query xt;
mathematically it can be expressed as:

yt = f(xt,D, parameters)

Where parameters are the arguments that the function f() takes. These are
generally set by the user or are learned by some method.

4.3.1 Mathematical Model of kNN

In this subsection, we present a mathematical model for the kNN algorithm and
show that kNN only makes use of local prior probabilities for classification. For a given
query instance xt, the kNN algorithm predicts

yt = argmax_cj Σ_{(xi, yi) ∈ N(xt,k)} I(yi = cj),  j = 1, . . . , m

where yt is the predicted class for the query instance xt, m is the number of classes
present in the data, N(xt,k) is the set of k nearest neighbors of xt, and I(·) is the indicator
function (1 when its argument is true, 0 otherwise).

The above equation can also be written as

yt = argmax_cj (1/k) Σ_{(xi, yi) ∈ N(xt,k)} I(yi = cj),

and we know that

p(cj)(xt,k) = (1/k) Σ_{(xi, yi) ∈ N(xt,k)} I(yi = cj),

where p(cj)(xt,k) is the probability of occurrence of the jth class in the neighborhood of xt.

Hence the equation turns out to be

yt = argmax{p(c1)(xt,k), p(c2)(xt,k), . . . , p(cm)(xt,k)}

It is clear from this equation that the kNN algorithm uses only prior probabilities to
calculate the class of the query instance. It ignores the class distribution around the
neighborhood of the query point.

Imbalanced Data

Building data mining models using unreliable or abnormal datasets presents a
significant challenge to classifier construction. Regardless of the strength of a particular
classification algorithm, learning from poor-quality data will result in sub-optimal
performance. Numerous studies dealing with the classification problem have shown that
the presence of errors in the training dataset lowers the predictive accuracy of a learner on
test data. There are many different dimensions of data quality, including class noise or
labeling errors, attribute noise, and missing values. Another commonly encountered
challenge in data mining applications is the occurrence of class imbalance.

A data set is imbalanced if its dependent variable is categorical and the number of
instances in one class differs from the number in the other class. In many real-world
applications, such as Web page search, scam-site detection and fraudulent-call detection,
there is a highly skewed distribution of classes. Various classification techniques, such as
kNN, SVM and neural networks, have been applied to such problems.

Fig:4.1 A sample scenario where the regular kNN algorithm will fail

These techniques have been designed and used, but it has been observed that the
algorithms do not perform as well on imbalanced datasets as on balanced datasets.
Learning from imbalanced data sets has been identified as one of the 10 most challenging
problems in data mining research. In the literature on solving class imbalance problems,
various solutions have been proposed.

Such techniques broadly include two different approaches:

 modifying existing methods


 application of a pre-processing stage.

In the recent past, a lot of research centered on the nearest neighbor methodology has
been done. However, one of the major drawbacks of kNN is that it uses only local prior
probabilities to predict instance labels, and hence does not take into account the class
distribution around the neighborhood of the query instance. This results in undesirable
performance on imbalanced data sets. The performance of the kNN algorithm over
imbalanced datasets can be improved if it uses information about the local class
distribution while classifying instances.

Fig. 4.1 shows an artificial two-class imbalance problem, where the majority class
"A" is represented by circles and the minority class "B" by triangles. The query
instance is represented by a cross.

As can be seen from the figure, the query instance would be classified as the
majority class "A" by a regular kNN algorithm with a k value equal to 7. But if the
algorithm had taken into account the imbalanced class distribution around the
neighborhood of the query instance (say, in the region represented by the dotted square),
it would have classified the query instance as belonging to the minority class "B", which
is the desired class.

Regression Analysis

The problem of regression is to estimate the value of a dependent variable based
on the values of one or more independent variables, e.g., predicting a price increase based
on demand, or money supply based on the inflation rate. Regression analysis includes any
technique for modeling and analyzing several variables when the focus is on the
relationship between a dependent variable and one or more independent variables.

More specifically, regression analysis helps to understand how the typical value
of the dependent variable changes when any one of the independent variables is varied,
while the other independent variables are held fixed. Regression algorithms can be used
for prediction (including forecasting of time-series data), inference, hypothesis-testing
and modeling of causal relationships.

Regression algorithms have been widely used for:

 Trend Line: A trend line represents a trend or the long-term movement in time-
series data.
 Finance: For analyzing and quantifying the systematic risk of an investment.
 Economics: Used to predict consumption spending, fixed investment spending,
inventory investment, spending on imports, the demand to hold liquid assets, labor
demand, and labor supply.

In statistics, regression is considered a collection of statistical function-fitting
techniques. Statistical approaches try to learn a probability function P(y | x) and use it to
predict the value of y for a given value of x. Users study the application domain to
understand the form of this probability function. The function may have multiple
parameters and coefficients in its expansion. Statistical approaches, although popular, are
not generic, in that they require the user to make an intelligent guess about the form of the
regression equation so as to get the best fit for the data.
Regression analysis has been studied extensively in statistics, but there have been
only a few studies from the data mining perspective. Most of these studies resulted in
algorithms that fall under the following broad categories: linear regression, nearest
neighbor algorithms, decision trees, support vector machines, neural networks and
logistic regression. However, most of these algorithms were originally developed for
classification purposes and were later modified for regression.

The current standard algorithms suffer from one or more of: high computational
complexity, poor results, fine tuning of parameters and extensive memory requirements.
kNN provides excellent accuracy, but has linear computational complexity. Finding an
optimal decision tree is an NP-complete problem. Neural networks are highly dependent
on the initialization of the weight vectors and generally have long training times. Also,
the best-fit structure of the neural network has to be intelligently guessed or determined
by trial and error.

In the recent past, a lot of research centered on the nearest neighbor methodology has
been done. However, one of the major drawbacks of kNN is that it is a lazy learner, i.e. it
uses all the training data at runtime. The accuracy of kNN depends highly on the distance
metric used. Euclidean distance is a simple and efficient method for computing the
distance between two reference data points. More complex distance functions may
provide better results depending on the dataset and domain, but a user may refrain from
using a better, generally computationally more complex, distance metric due to the high
run time of the algorithm. This motivated us to strive for an algorithm which has a
significantly lower run time and hence can incorporate expensive distance metrics with
ease.

4.4 NEAREST NEIGHBOR


One of the oldest, most accurate and simplest methods for pattern classification and
regression is the k-Nearest-Neighbor (kNN) method. kNN algorithms have been identified
as one of the top ten most influential data mining algorithms for their ability to produce
simple but powerful classifiers. kNN has been studied at length over the past few decades
and is widely applied in many fields. The kNN rule classifies each unlabeled example by
the majority label of its k nearest neighbors in the training dataset.

Despite its simplicity, the kNN rule often yields competitive results. A recent
work on prototype reduction, called Weighted Distance Nearest Neighbor (WDNN), is
based on retaining the informative instances and learning their weights for classification.
The algorithm assigns a non-negative weight to each training instance at the training
phase. Only the training instances with positive weight are retained (as the prototypes) in
the test phase. Although the WDNN algorithm is well formulated and shows encouraging
performance, in practice it can only work with k = 1. A more recent approach, WDkNN,
tries to reduce the time complexity of WDNN and extend it to work for values of k
greater than 1.

Chawla and Liu, in one of their recent works, presented a novel k-Nearest
Neighbors weighting strategy for handling the problem of class imbalance. They
proposed CCW (class confidence weights), which uses the probability of attribute values
given class labels to weight prototypes in kNN. While the regular kNN directly uses the
probabilities of class labels in the neighborhood of the query instance, they used
conditional probabilities of classes. They also showed how to calculate CCW weights
using mixture modeling and Bayesian networks. The method performed more accurately
than the existing state-of-the-art algorithms.

KaiYan Feng and others defined a new neighborhood relationship known as
passive nearest neighbors. For two points A and B belonging to class L, point B is the
local passive kth-order nearest neighbor of A if and only if A is the kth nearest
neighbor of B among all data of class L. For each query point, its k actual nearest
neighbors and k passive nearest neighbors are first calculated, and based on them an
overall score is calculated for each class. The class score determines the likelihood that
the query point belongs to that class.

In another recent work, Evan and others propose to use the geometric structure of
the data to mitigate the effects of class imbalance. The method works even when the level
of imbalance changes in the training data, as with online streaming data. For each query
point, a k-dimensional vector is calculated for each of the classes present in the data.

The vector consists of the distances of the query point to its k nearest neighbors in
that class. Based on this vector, the probability that the query point belongs to a particular
class is calculated. However, the approach has not been studied in depth.

Yang Song and others propose two different versions of kNN based on the idea
of informativeness. According to them, a point is treated as informative if it is close to
the query point and far away from points with different class labels. One of the
proposed versions, LI-KNN, takes two parameters k and I: it first finds the k nearest
neighbors of the query point and then, among them, finds the I most informative points.

Based on the class labels of the informative points, a class label is assigned to the
query point. They also showed that the values of k and I have very little effect on the
final result. The other version, GI-KNN, works on the assumption that some points are
more informative than others.

It tries to find globally informative points and then assigns a weight to each point
in the training data based on its informativeness. It then uses a weighted Euclidean
metric to calculate distances. In another recent work, a k-Exemplar-based Nearest
Neighbor (kENN) classifier was proposed which is more sensitive to the minority class.
The main idea is to first identify the exemplar minority-class instances in the training data
and then generalize them to Gaussian balls as the concept for the minority class. The
approach is based on extending the decision boundary for the minority class.

4.5 DESIGN

In our new design, instead of considering the nature of the entire data, we only
consider the region around the neighborhood of the query instance. More precisely, if k is
the number of neighbors used by the existing kNN algorithm to decide the class of the
query instance, then in this design we take into account the class distribution among the
k + d nearest neighbors of the query instance.

In our previous designs, very distant instances also have an effect when
classifying the query instance; in this one, we try to limit the region so that only instances
within that region can affect the decision. For a given query instance xt, the modified
kNN algorithm can be formally expressed as follows:

where W(c, xt) denotes the weighting factor for class c and instance xt, and can be
defined as:

where d is an input parameter, which can be taken as input from the user or can be
learned from the data. This algorithm will more closely monitor the nature of the data
around the neighborhood of the query instance.

In Fig. 4.2, data points are present in clusters, with each cluster having one major class.
If a classifier based on design 1 is used for classification on this type of data, it would
favor the minority class and assign the minority class label to the query instance.

 In this case, the regular kNN algorithm would fail to classify the minority-class
instances present at the boundary of the cluster region (for example, the query
instance shown in the figure).
 The current design will succeed in classifying both of the query instances correctly.

Fig:4.2 A sample scenario where data points are present in clusters, with each cluster
having one major class.

The weighting factor based on designs 1 and 2 for each class is only a function
of that class's distribution, so it can be computed in the pre-processing phase. Hence it
takes O(n) time (where n is the total number of training instances present in the data) to
calculate the weighting factor for all the classes in the pre-processing step. A kNN
classifier based on either of these two designs therefore has the same runtime as the
existing kNN classifier. On the other hand, for a kNN classifier based on design 3, the
weighting factor for each class is a function of both that class's distribution and the query
instance, and hence cannot be calculated at the pre-processing stage. However, for each
query instance, the weighting factors for all the classes can be calculated simultaneously
while retrieving the neighbors of the query point. Hence a kNN classifier based on
design 3 is computationally a little more expensive than the existing kNN classifier.

In KNN, K is the number of nearest neighbors, and the number of neighbors is the
core deciding factor. K is generally an odd number when the number of classes is 2.
When K = 1, the algorithm is known as the nearest neighbor algorithm; this is the
simplest case. Suppose P1 is the point whose label needs to be predicted. First, you find
the one point closest to P1, and then the label of that nearest point is assigned to P1.

Fig:4.3 K Nearest Neighbor


Suppose P1 is the point whose label needs to be predicted. First, you find the k
points closest to P1 and then classify P1 by a majority vote of its k neighbors: each
object votes for its class, and the class with the most votes is taken as the prediction. For
finding the closest similar points, you compute the distance between points using distance
measures such as the Euclidean, Hamming, Manhattan and Minkowski distances. KNN
has the following basic steps (a minimal sketch follows the list):
 Calculate distance
 Find closest neighbors
 Vote for labels
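The following Python sketch (a self-contained illustration of these three steps, with
made-up training data) implements the basic kNN classifier with Euclidean distance and
majority voting.

import numpy as np
from collections import Counter

def knn_predict(X_train, y_train, x_query, k=3):
    # Step 1: calculate the Euclidean distance to every training point
    dists = np.linalg.norm(X_train - x_query, axis=1)
    # Step 2: find the k closest neighbors
    nearest = np.argsort(dists)[:k]
    # Step 3: vote for labels; the most common class wins
    votes = Counter(y_train[i] for i in nearest)
    return votes.most_common(1)[0][0]

X_train = np.array([[1.0, 1.0], [1.2, 0.8], [5.0, 5.0], [5.2, 4.8]])
y_train = np.array(['A', 'A', 'B', 'B'])
print(knn_predict(X_train, y_train, np.array([1.1, 0.9])))   # prints 'A'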

Fig:4.4 Initial Data

Fig:4.5 Distance Calculation

Fig:4.6 Finding Nearest Neighbor


The training phase of k-nearest neighbor classification is much faster compared to
other classification algorithms: there is no need to train a model for generalization. That
is why KNN is known as a simple, instance-based learning algorithm. KNN can be
useful in the case of nonlinear data. It can also be used for regression problems, where
the output value for the object is computed as the average of the values of its k closest
neighbors.
4.5.1 Drawback of KNN
The testing phase of k-nearest neighbor classification is slower and costlier in
terms of time and memory: it requires a large amount of memory for storing the entire
training dataset for prediction. KNN requires scaling of the data, because KNN uses the
Euclidean distance between two data points to find nearest neighbors, and Euclidean
distance is sensitive to magnitudes; features with high magnitudes will carry more weight
than features with low magnitudes. KNN is also not suitable for high-dimensional data.

4.5.2 To improve KNN:
 For better results, normalizing the data onto the same scale is highly recommended;
generally, the normalization range considered is between 0 and 1 (see the sketch below).
 KNN is not suitable for high-dimensional data; in such cases, the dimensionality
needs to be reduced to improve performance. Also, handling missing values will help
us improve results.
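A minimal sketch of the min-max normalization step (our own illustrative helper,
assuming a numeric feature matrix with no constant columns):

import numpy as np

def min_max_normalize(X):
    # Rescale every feature column to the range [0, 1]
    x_min = X.min(axis=0)
    x_max = X.max(axis=0)
    return (X - x_min) / (x_max - x_min)

# Features with very different magnitudes would otherwise dominate
# the Euclidean distance used by kNN.
X = np.array([[1.0, 2000.0], [2.0, 3000.0], [3.0, 2500.0]])
print(min_max_normalize(X))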

CHAPTER 5

CNN CLASSIFIER

5.1 INTRODUCTION
Neural networks, as a fundamental classification algorithm, are widely used in many
image classification problems. With the rapid development of high-performance and
parallel computing devices, convolutional neural networks also draw increasingly more
attention from many researchers in this area.

In this project, we derived the theory behind the back-propagation neural network
and implemented a back-propagation neural network from scratch in Java. Then we
applied our neural network classifier to solve a tough image classification problem.
Moreover, we proposed a new approach to performing the convolution in a convolutional
neural network and ran some experiments to test the functionality of the dropout layer
and rectified linear neurons.
As a significant application of machine learning and pattern recognition theory,
image classification has become an important topic in daily life. Face recognition,
vehicle detection and signature verification are all excellent examples. In this project, we
chose a database for a general image classification problem containing 10 different
classes of images, such as airplane, truck and frog. Due to the low resolution and
complex image content, this problem is frequently treated as a hard issue in this area.
Through the implementation and application of the neural network algorithm, we
wish to understand in depth how a neural network works for this image classification
problem. Moreover, we ran extensive experiments on how a convolutional neural network
works on this problem from different aspects. Applying the knowledge we learned in this
course, we also proposed an improved version of the convolutional neural network
compared with the basic convolutional neural network.

5.2 BACKGROUND
Image classification has been one of the most important topics in the field of
computer vision and machine learning. As a popular benchmark in this field, the CIFAR-
10 database (Krizhevsky, 2009) is frequently used to judge the performance of a
classification algorithm.

Many researchers have paid enormous efforts to this problem. Even though the best
known result has achieved 94% accuracy (Karpathy, 2011), it is still a quite
challenging issue, and many other well-designed algorithms in the performance ranking
list can only achieve around 70% accuracy.

Neural networks and a series of derivative algorithms (Mitchell, 1997) have been
applied to image classification problems for a long time. The whole network consists of
multiple layers of neurons, each of which can be treated as a single processing unit.
The basic idea of the neural network is inspired by biological neural networks
(Aleksander and Morton, 1990). The backpropagation algorithm, the most popular way to
train a multi-layered neural network, was proposed by Bryson and Yu-Chi (1969) and
further improved by Werbos (1974) and Rumelhart et al. (1986).

Instead of a simple multiplication between neuron outputs and weights, a
convolutional neural network incorporates more operations such as convolution and
down-pooling (Simard et al, 2003). Due to the rapid development of computing platforms
like the GPU, increasingly more researchers have started to apply this algorithm to
complex image classification. From many other researchers' work, we realized the
difficulty of this dataset and classification issue: normally it takes dozens of hours to
train a well-performing model, even with a high-performance GPU and other parallel
computing techniques.

5.3 CONVOLUTIONAL NEURAL NETWORK

The convolutional neural network builds on the basic neural network described
above. So what do CNNs change? There are several variations in the CNN layer
architecture: the convolutional layer, the pooling layer and the fully-connected layer.
The fully-connected layer acts just as the neural network we have already covered
above.

Fig:5.1 Block Diagram Of CNN

The CNN algorithm has two main processes, convolution and sampling, which
happen in the convolutional layers and max-pooling layers.

5.3.1 Convolution process

Every neuron takes inputs from a rectangular n × n section of the previous layer; the
rectangular section is called the local receptive field. For the hidden neuron at position
(j, k), the output has the standard shared-weights form

y(j, k) = σ( b + Σ_{r=0..n−1} Σ_{c=0..n−1} w(r, c) · a(j + r, k + c) ),

where a denotes the activations of the previous layer. Since every local receptive field
uses the same weights w(r, c) and bias b from the equation above, the parameters can be
viewed as a trainable filter or kernel F; the convolution process can be considered as
performing an image convolution, and the convolutional layer is the convolution output
of the previous layer. We sometimes call the trainable filter from the input layer to the
hidden layer a feature map, with shared weights and bias.

Fig:5.2 Convolution process

5.3.2 Sampling process

After each convolutional layer, there may follow a pooling layer, so the
sampling process happens between the convolutional layer and the pooling layer. The
pooling layer takes small rectangular blocks from the convolutional layer and subsamples
each of them. In our work, we take the maximum of the block as the single output to the
pooling layer.
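Both processes can be sketched directly in NumPy. The code below (an illustrative toy,
not the project implementation) convolves an image with a single shared n x n kernel and
then max-pools the feature map in 2 x 2 blocks.

import numpy as np

def conv2d(image, kernel, bias=0.0):
    # Valid convolution with one shared kernel (a single feature map)
    n = kernel.shape[0]
    h, w = image.shape
    out = np.zeros((h - n + 1, w - n + 1))
    for j in range(out.shape[0]):
        for k in range(out.shape[1]):
            # Each output neuron sees an n x n local receptive field
            out[j, k] = np.sum(kernel * image[j:j + n, k:k + n]) + bias
    return out

def max_pool(feature_map, size=2):
    # Subsample by taking the maximum of each size x size block
    h, w = feature_map.shape
    h, w = h - h % size, w - w % size      # trim to a multiple of the block
    blocks = feature_map[:h, :w].reshape(h // size, size, w // size, size)
    return blocks.max(axis=(1, 3))

image = np.random.rand(8, 8)
kernel = np.random.rand(3, 3)
print(max_pool(conv2d(image, kernel)).shape)   # (3, 3)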

Fig:5.3 Sample process

CNNs have far fewer connections and parameters, and they are also easier to
train. Discarding the fully-connected strategy means paying more attention to the
regional structure, which is very meaningful for image processing, since there are fewer
relations between different regions of an image.

Activation Units

As usual, the standard activations for a neuron's output are y(x) = (1 + e^(−x))^(−1) or
y(x) = tanh(x), where x is the input.
Vanishing gradient problem
Vanishing gradients occur when higher-layer units are nearly saturated at −1 or 1,
leading lower layers of a deep neural network to have gradients of nearly 0. Such
vanishing gradients cause slow optimization convergence, and in some cases the final
trained network falls into a poor local minimum.
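A quick numerical check of this effect (illustrative only): the derivative of the logistic
activation is largest at 0 and nearly vanishes for saturated inputs, and the gradient picks
up one such factor per layer.

import numpy as np

sigmoid = lambda x: 1.0 / (1.0 + np.exp(-x))
dsigmoid = lambda x: sigmoid(x) * (1.0 - sigmoid(x))

for x in [0.0, 2.0, 5.0, 10.0]:
    print(x, dsigmoid(x))   # 0.25, ~0.105, ~0.0066, ~0.000045

# Five saturated layers scale the gradient by roughly 0.0066 ** 5,
# which is why the lower layers effectively stop learning.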
Regularization: Dropout

Dropout is a way of controlling the capacity of neural networks to prevent
overfitting. It is extremely simple and effective. During training, the dropped-out neurons
do not participate in the feedforward pass or in backpropagation. Dropout can be viewed
as sampling a sub-network within the full neural network, and only updating the
parameters of the sampled network based on the input data. The dropout layer is usually
placed between fully-connected layers. In a dropout layer, the choice of which units to
drop is random; in our experiment, we dropped units with a fixed probability of 0.5.
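A minimal dropout sketch (our own illustration using the common "inverted dropout"
convention, which may differ in detail from the library used in our experiments):

import numpy as np

def dropout(activations, p_drop=0.5, training=True,
            rng=np.random.default_rng()):
    # Randomly zero units with probability p_drop during training
    if not training:
        return activations                 # no dropout at test time
    mask = rng.random(activations.shape) >= p_drop
    # Inverted dropout: rescale so the expected activation is unchanged
    return activations * mask / (1.0 - p_drop)

h = np.ones((1, 10))
print(dropout(h))   # about half the units zeroed, survivors scaled by 2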

5.3.3 How we use the neural network

The input layer of the network contains neurons encoding the values of the input
pixels. Our training data for the network will consist of 32 by 32 pixel images from
CIFAR-10 dataset, and so the input layer contains 1024 (32*32) neurons.

Fig:5.4 Structure Of Neural Network

The second layer of the network is a hidden layer. We denote the number of
neurons in this hidden layer by n, and we’ll experiment with different values for n. The
example shown illustrates a small hidden layer containing just n=15 neurons.

The output layer of the network contains 10 neurons which stand for 10 types of
image label. We number the output neurons from 0 through 9 and select the neuron with
the highest activation value as predicting result.

5.4 OUR WORK

5.4.1 Data acquisition

The database can be obtained from https://www.cs.toronto.edu/~kriz/cifar.html. This
resource consists of 6 batches, each containing 10000 32*32 color images in 10 classes;
the data is given as a 10000*3072 numpy array for each batch, representing 10000 RGB
images. To apply this data in our work, we wrote some Python scripts to reshape the data
into two different formats, for our Java version of the neural network and for the Python
version of the convolutional neural network. Because we are not focused on getting the
best performance, as a course project, we chose to convert the RGB images to grayscale
to save computation time and make them easier to process.
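A sketch of that preprocessing (our own illustrative script; data_batch_1 is one of the
downloaded batch files, and the unpickling format follows the CIFAR-10 page):

import pickle
import numpy as np

# Load one CIFAR-10 batch (format as documented on the CIFAR-10 page)
with open('data_batch_1', 'rb') as f:
    batch = pickle.load(f, encoding='bytes')

data = batch[b'data']                   # (10000, 3072): 1024 R, 1024 G, 1024 B
labels = np.array(batch[b'labels'])     # (10000,)

rgb = data.reshape(-1, 3, 32, 32).astype(np.float32)
# Luminance-weighted grayscale conversion
gray = 0.299 * rgb[:, 0] + 0.587 * rgb[:, 1] + 0.114 * rgb[:, 2]
gray = gray.reshape(-1, 1024) / 255.0   # one 1024-value row per image

print(gray.shape, labels.shape)         # (10000, 1024) (10000,)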

5.4.2 Implementation of Neural Network

As the first part of our work, after a detailed derivation of back-propagation, we
implemented a Java version of the neural network from scratch, without using any
external machine learning libraries. Then we applied our neural network to this image
classification problem and tested its performance for different parameters.

In our implementation, we used the Java learning framework provided by the class
as the skeleton of our code. Through Python scripts, we transformed the data into the
same format as our homework data and wrote it to disk, so that we could use the existing
library to load our data. After that, each single pixel in our grayscale image plays the role
of a single neuron input at the first layer. From the perspective of the algorithm itself, we
use mini-batch stochastic gradient descent to train the model; more specifically, the
backpropagation algorithm described above is applied to calculate the gradients.

During our implementation, the time efficiency of our Java program bothered us
a lot. Relatively speaking, Java has less support for complex and large-scale matrix
operations. After we first applied our implemented program to the dataset, we stepped
through the output of each layer with the Eclipse debugger and the results seemed
reasonable. However, even using only one batch of the data set (10000 images), a
single iteration of a three-layer network took several minutes to run. As a comparison, we
configured the same parameters and data size in a Python neural network library; the
output was quite similar, but a single iteration took only a few seconds.

After much testing, we figured out that this was caused by Java's garbage collection
mechanism. We had followed a Princeton code example for the matrix operations in Java,
which creates a new array object in memory for every matrix operation. When the
computation becomes really complex, as in this neural network application, this makes
the JVM collect garbage on the heap frequently, which takes a significant amount of time.
To solve this problem, we instead pass the result array to the matrix operation method as
a parameter. Due to reference passing for objects in Java, we avoid creating many new
objects this way. The running time turned out to be much more efficient, but still not as
good as Python. This is also an important reason for us to move to Python for the rest of
our work, the convolutional neural network.

5.5 CONVOLUTIONAL FILTER
Through exploring how convolutional neural networks work and how they differ
from a normal neural network, we found an interesting problem concerning the
convolution layer. Generally speaking, convolution can be treated as a way of doing
feature extraction in computer vision and image processing. The difference is that this
kind of "feature extraction" can also "learn". A traditional way to initialize convolutional
filters, which are the parameters of the first layer, is through some "random" method,
such as a uniform distribution. Then, over the course of enough training, these initial
weights become increasingly more reasonable. Below is an example of a convolution
layer which contains 10 different initial filters.

Fig:5.5 Convolutional filters generated by uniform distribution

Based on this, we tried to make an improvement and then applied it to our
problem. Instead of the traditional convolution kernel initialization, we used different
image filters to initialize the convolutional layer: for example, Laplacian, Sobel and
Prewitt filters.
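A sketch of this initialization idea (illustrative only; the kernels are the standard 3 x 3
Sobel, Prewitt and Laplacian masks, and any remaining filter slots stay randomly
initialized):

import numpy as np

sobel_x = np.array([[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]], dtype=np.float32)
prewitt_x = np.array([[-1, 0, 1], [-1, 0, 1], [-1, 0, 1]], dtype=np.float32)
laplacian = np.array([[0, 1, 0], [1, -4, 1], [0, 1, 0]], dtype=np.float32)

def init_filters(n_filters=10):
    # Seed the conv filter bank with classic image filters instead of
    # purely random values; unseeded slots remain uniform random.
    bank = [sobel_x, sobel_x.T, prewitt_x, prewitt_x.T, laplacian]
    filters = np.random.uniform(-0.1, 0.1, (n_filters, 3, 3)).astype(np.float32)
    for i, f in enumerate(bank[:n_filters]):
        filters[i] = f
    return filters

print(init_filters().shape)   # (10, 3, 3)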

Fig: 5.6 Our convolutional filters

Fig:5.7 The convoluted result after a hundred iterations

Fig:5.8 The convoluted result after a hundred iterations

Dropout

We tried adding a dropout layer to the basic convolutional neural network between the
last convolutional layer and the fully connected (output) layer. We wished to explore how
well the dropout layer regularizes.

ReLU

To contrast the difference between using the sigmoid and the rectified function as the
activation, we tried both functions respectively to see the distinct results.

The experiment shows the validation accuracy of five neural networks with different
structures. In the experiment, each accuracy was picked as the maximum value near the
maximum point of (train cost/validation cost) on the timeline. The table indicates that a
neural network with more hidden layers will perform better than a neural network with
fewer hidden layers, up to some limit. (Since training a neural network with many more
layers would cost an enormous amount of time, which is impossible to run on our normal
device, we only discuss this issue within our experiment.)

We also found that training with a small mini-batch size takes more time to
reach the optimum, and that training with a large mini-batch size is more likely to fall
into a local optimum.

The convolutional neural network performs much better than the original neural
network at classifying the dataset: CNNs increased accuracy by almost 20 percentage
points with fewer parameters, and the training time was also reduced, though not
dramatically.

According to Figure n, it is obvious that our proposed new convolution
approach achieves much better performance at first, and after a number of iterations the
difference between the two approaches' performances becomes smaller and smaller.
However, even after the two methods begin to converge, our new approach still has an
advantage over the traditional method.

CHAPTER 6

MATLAB SOFTWARE

6.1 INTRODUCTION
MATLAB is a high-level language for technical computing. It integrates computation,
visualization and programming in an easy-to-use environment. MATLAB stands for
matrix laboratory. It was originally developed to provide easy access to matrix software
developed by the LINPACK and EISPACK projects. MATLAB is accordingly built on a
foundation of sophisticated matrix software in which the basic element is an array that
does not require pre-dimensioning.
Typical uses of MATLAB
 Math and computation
 Algorithm development
 Data acquisition
 Data analysis, exploration and visualization
 Scientific and engineering graphics
The main features of MATLAB
(i) Advanced algorithms for high-performance numerical computation, especially in
the field of matrix algebra
(ii) A large collection of predefined mathematical functions and the ability to define
one's own functions
(iii) Two- and three-dimensional graphics for plotting and displaying data
(iv) A complete online help system
(v) A powerful, matrix/vector-oriented high-level programming language for
specific applications
(vi) Toolboxes available for solving advanced problems in several application areas

Fig:6.1 Applications of MATLAB

6.2 THE MATLAB SYSTEM

The MATLAB system consists of five main parts:
6.2.1 Development Environment
This is the set of tools and facilities that help you use MATLAB functions and files.
Many of these tools are graphical user interfaces. It includes the MATLAB desktop and
Command Window, a command history, an editor and debugger, and browsers for
viewing help, the workspace, files, and the search path.
6.2.2 The MATLAB Mathematical Function Library
This is a vast collection of computational algorithms ranging from elementary functions,
like sum, sine, cosine, and complex arithmetic, to more sophisticated functions like
matrix inverse, matrix eigenvalues, Bessel functions, and fast Fourier transforms.
6.2.3 The MATLAB Language
This is a high-level matrix/array language with control flow statements, functions, data
structures, input/output, and object-oriented programming features. It allows both
"programming in the small" to rapidly create quick throw-away programs, and
"programming in the large" to create large and complex application programs.
6.2.4 Graphics
MATLAB has extensive facilities for displaying vectors and matrices as graphs, as well
as annotating and printing these graphs. It includes high-level functions for two-
dimensional and three-dimensional data visualization, video processing, animation, and
presentation graphics. It also includes low-level functions that allow you to fully
customize the appearance of graphics and to build complete graphical user interfaces for
your MATLAB applications.
6.2.5 The MATLAB Application Program Interface (API)
This is a library that allows you to write C and Fortran programs that interact with
MATLAB. It includes facilities for calling routines from MATLAB (dynamic linking),
calling MATLAB as a computational engine, and for reading and writing MAT-files.

6.3 STARTING MATLAB

On Windows platforms, start MATLAB by double-clicking the MATLAB shortcut
icon on your Windows desktop. On UNIX platforms, start MATLAB by typing matlab
at the operating system prompt. You can customize MATLAB startup; for instance, you
can change the directory in which MATLAB starts, or automatically execute MATLAB
statements at startup from a script file (conventionally startup.m).
6.3.1 MATLAB Desktop
When you start MATLAB, the MATLAB desktop appears, containing tools
(graphical user interfaces) for managing files, variables, and applications associated
with MATLAB. The accompanying illustration shows the default desktop. You can
customize the arrangement of tools and documents to suit your needs. For more
information, see the documentation on the desktop tools.

6.3.2 MATLAB Working Environment: MATLAB Desktop
The MATLAB desktop is the main MATLAB application window. The desktop consists
of five sub-windows: the Command Window, the Workspace Browser, the Current
Directory window, the Command History window, and one or more Figure windows,
which are shown only when the user displays a graphic.
The Command Window is where the user types MATLAB commands and
expressions at the prompt (>>) and where the output of those commands is displayed.
MATLAB defines the workspace as the set of variables that the user creates in a work
session. The Workspace Browser shows these variables and some information about
them. Double-clicking on a variable in the Workspace Browser launches the Array
Editor, which can be used to obtain information about, and in some instances edit,
certain properties of the variable.
The Current Directory tab above the Workspace tab shows the contents of the
current directory, whose path is shown in the Current Directory window. For example,
on a Windows operating system the path might be as follows: C:\MATLAB\Work,
indicating that directory "work" is a subdirectory of the main directory "MATLAB",
which is installed in drive C. Clicking on the arrow in the Current Directory window
shows a list of recently used paths. Clicking on the button to the right of the window
allows the user to change the current directory.
The Command History window contains a record of the commands a user has
entered in the Command Window, including both current and previous MATLAB
sessions. Previously entered MATLAB commands can be selected and re-executed from
the Command History window by right-clicking on a command or sequence of
commands. This launches a menu from which to select various options in addition to
executing the commands. This is a useful feature when experimenting with various
commands in a work session.

6.3.3 Using the MATLAB Editor to Create M-Files
The MATLAB editor is both a text editor specialized for creating M-files and a
graphical MATLAB debugger. The editor can appear in a window by itself, or it can be
a sub-window in the desktop. M-files are denoted by the extension .m, as in pixelup.m.
The MATLAB editor window has various pull-down menus for tasks such as saving,
viewing, and debugging files. Because it performs some simple checks and also uses
color to differentiate between various elements of code, this text editor is the tool of
choice for writing and editing M-functions. To open the editor, type edit at the prompt;
typing edit filename opens the M-file filename.m in an editor window, ready for editing.
As noted earlier, the file must be in the current directory, or in a directory on the search
path.
Getting Help
The principal way to get help online is to use the MATLAB Help Browser, opened as
a separate window either by clicking on the question mark symbol (?) on the desktop
toolbar, or by typing helpbrowser at the prompt in the Command Window. The Help
Browser is a web browser integrated into the MATLAB desktop that displays Hypertext
Markup Language (HTML) documents. The Help Browser consists of two panes: the
help navigator pane, used to find information, and the display pane, used to view the
information.
Self-explanatory tabs on the navigator pane are used to perform a search. For
example, help on a specific function is obtained by selecting the Search tab, selecting
Function Name as the Search Type, and then typing the function name in the Search for
field. It is good practice to open the Help Browser at the beginning of a MATLAB
session to have help readily available during code development or other MATLAB
tasks.
Another way to obtain help for a specific function is by typing doc followed by the
function name at the command prompt. For example, typing doc format displays the
documentation for the function called format in the display pane of the Help Browser.
This command opens the browser if it is not already open.

6.3.4 Saving and Retrieving a Work Session
There are several ways to save and load an entire work session or selected
workspace variables in MATLAB. The simplest is as follows. To save the entire
workspace, simply right-click on any blank space in the Workspace Browser window
and select Save Workspace As from the menu that appears. This opens a directory
window that allows naming the file and selecting any folder in the system in which to
save it. Then simply click Save.
To save a selected variable from the workspace, select the variable with a left click
and then right-click on the highlighted area. Then select Save Selection As from the
menu that appears. This again opens a window from which a folder can be selected in
which to save the variable. To select multiple variables, use shift-click or control-click
in the familiar manner, and then use the procedure just described for a single variable.
All files are saved in the double-precision, binary format with the extension .mat. These
saved files commonly are referred to as MAT-files. For example, a session named, say,
mywork_2003_02_10 would appear as the MAT-file mywork_2003_02_10.mat when
saved. Similarly, a saved video called final_video will appear as final_video.mat when
saved.
6.3.5 Graph Components
MATLAB displays graphs in a special window known as a figure. To make a graph,
you need to define a coordinate system; therefore every graph is placed within axes,
which are contained by the figure. The actual visual representation of the data is
achieved with graphics objects like lines and surfaces. These objects are drawn within
the coordinate system defined by the axes, which MATLAB automatically creates to
span the range of the data. The actual data is stored as properties of the graphics
objects.
6.3.6 Plotting Tools
Plotting tools are attached to figures and create an environment for making graphs.
These tools enable you to do the following:
• Select from a wide variety of graph types
• Change the type of graph that represents a variable
• See and set the properties of graphics objects
• Annotate graphs with text, arrows, and so on
• Create and arrange subplots in the figure
• Drag and drop data into plots
Display the plotting tools from the View menu or by clicking the plotting tools icon
in the figure toolbar, as shown in the following picture.
6.3.7 Editor/Debugger
Use the Editor/Debugger to create and debug M-files, the programs you write to run
MATLAB functions. The Editor/Debugger provides a graphical user interface for text
editing and for M-file debugging. To create or edit an M-file, use File > New or
File > Open, or use the edit function.
6.3.8 Feature Development
Feature development proceeds as follows:
• A MySQL feature is specified in a worklog entry.
• The worklog entry goes through specification, design, architecture and QA
reviews (though not necessarily in a strict sequence).
• The MySQL feature is implemented in a feature tree.
• Feature trees are branched from, and kept in sync with, the main MySQL
development tree, which is called TRUNK.
• When a feature has been implemented, it goes through a code review.
• When the code review is done, the feature tree is handed over to QA
(quality assurance).
• QA tests the feature, the implementer fixes bugs, and QA eventually
signs off on the feature.
• Once the feature is signed off, it is merged into TRUNK. Thus, TRUNK
accumulates features and bug fixes over time. Extensive regression testing
is performed on TRUNK continually, keeping TRUNK near Release
Candidate (RC) quality at all times.

6.4 FEATURE TESTING

New features in MySQL are developed and tested in separate feature trees
before they are pushed to TRUNK. Quality goals for new features are the
following:

• Complete functional and nonfunctional test coverage of changed and
new code
• No regressions
• At least 80% code coverage

QA involvement starts as soon as the requirements and specifications of the feature
are settled by the development team. QA reviews the available documents and gives
feedback on the design, usability, testability, et cetera. A dialogue follows between QA
and the developer, and changes are made along the way to ensure that the feature can be
properly tested.
Once the specifications and requirements are acceptable, QA writes the test plan,
which covers all scenarios that are to be tested. This includes functional tests,
integration tests, nonfunctional tests, et cetera. The test plan is reviewed by developers
and fellow QA colleagues. While the developers are writing the feature code, QA
engineers start working on the automated tests, test infrastructure improvements, et
cetera. The final round of testing starts after the feature has passed code review. This
stage can last anywhere from several days to months, depending upon the complexity of
the feature, the quality of the code, the number of bugs found, et cetera. Features get
signed off when the following conditions are met:
• No known bugs in the new feature – This is thoroughly verified, and even minor
bugs are not permitted. We believe that bugs are easiest to fix when the code is new,
and thus this helps us deliver features that are of high quality.
• No known regressions – A feature is developed on a tree which is tested as
frequently as possible by a continuous integration testing tool. Any regressions are
found and fixed before signoff.
• Adequate code coverage numbers – A code coverage report is generated for the
changed lines of code, and the minimum expected coverage is 80%. Most features have
coverage of over 90%. Any uncovered lines of code are analyzed and, wherever
possible, new tests are added to increase code coverage.
• All new tests are added to the automated regression suite.

The MySQL web site describes MySQL as the "world's most popular Open
Source database." Its popularity is no doubt helped by the fact that, if you require
MySQL for non-commercial use, you can download a copy free from the web site.
MySQL is almost always bundled with the PHP web scripting language, and the two are
frequently mentioned together. Most Linux distributions ship with MySQL and PHP as
standard, and MySQL has been ported to a large variety of platforms. Because of its
bundling with PHP, MySQL is routinely used as a database back end for a web server.
MySQL is what is known as a Database Management System (DBMS). The
management system decides how the data is stored, organized and retrieved, and also
controls client access to it. Each time a client retrieves data, deletes data, or adds more
data, the DBMS handles the request; the client cannot access the data files directly, but
can only communicate with the DBMS.
The management system is a barrier that controls access to the underlying data;
one cannot go directly to the database itself. MySQL can manage multiple databases at
once. For instance, when you install MySQL, the system creates a database which is
called mysql. This database contains most of the information required to describe the
activities that MySQL needs to perform. It stores details of other databases, of users and
of every other file that the system uses to store data. It is itself a collection of data used
for a specific purpose.
This makes MySQL self-describing, in that the tables it stores are used to
describe other tables that it stores when you create your own data structures in another
database. In this book we will use the mysql database to examine particular system
functions, but most of the other data that we create will be stored in a database called
mysqlfast. MySQL can easily manage more than one database, so to keep your tables
from being mixed up with other data, it is best to separate them into distinct databases.

6.5 SQLITE
This SQLite tutorial shows you everything you need to know to start using SQLite
effectively. You will learn SQLite through extensive hands-on exercises. If you have
been working with other relational database management systems, e.g., MySQL,
PostgreSQL, Oracle, Microsoft SQL Server, et cetera, and you have heard about SQLite,
you may be interested in learning more about it. Perhaps your colleagues recommended
using an SQLite database instead of plain files to manage structured data in your
applications, and you need to get started with SQLite quickly to check whether you can
use it for your applications; or you are just starting to learn SQL and want to use SQLite
as the database system. If you are one of the people described above, this SQLite tutorial
is for you. SQLite is an open-source, zero-configuration, self-contained, stand-alone,
transactional relational database engine designed to be embedded into an application.

CHAPTER 7

RESULT AND DISCUSSION

7.1 PERFORMANCE AND PARAMETERS

7.1.1 PERFORMANCE
The performance evaluation method is used to assess the use of segmentation with the
CNN classifier, and is carried out using measures like Sensitivity, Specificity, F-score,
Perfect Classification (PC), Missed Classification (MC), False Alarm (FA), Performance
Index (PI) and Precision.
These parameters are calculated from True Positives (TP), True Negatives (TN),
False Positives (FP) and False Negatives (FN). The EM algorithm produces the highest
SVM classifier accuracy when compared against other segmentation algorithms. The
classifier performance is analyzed using the following performance measures (a small
computation sketch follows the definitions).
True Positive (TP): Abnormal brain correctly identified as abnormal.

True Negative (TN): Normal brain correctly identified as normal.

False Positive (FP): Normal brain incorrectly identified as abnormal.

False Negative (FN): Abnormal brain incorrectly identified as normal.
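Given the four counts, the measures follow directly. A minimal helper (our own sketch;
the PC, MC, FA and PI variants differ across papers, so only the standard measures are
shown):

def classifier_metrics(tp, tn, fp, fn):
    # Standard performance measures from the confusion-matrix counts
    sensitivity = tp / (tp + fn)            # true positive rate (recall)
    specificity = tn / (tn + fp)            # true negative rate
    precision = tp / (tp + fp)
    f_score = 2 * precision * sensitivity / (precision + sensitivity)
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    return dict(sensitivity=sensitivity, specificity=specificity,
                precision=precision, f_score=f_score, accuracy=accuracy)

# Example: 40 abnormal found, 5 missed, 3 false alarms, 52 correct normals
print(classifier_metrics(tp=40, tn=52, fp=3, fn=5))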

If you’re building an image classifier these days, you’re probably using a
convolutional neural network to do it. CNNs are a type of neural network which build
progressively higher-level features out of groups of pixels commonly found in the images.
How an image scores on these features is then weighted to generate a final classification
result. CNNs are the best image classifier algorithm we know of, and they work
particularly well when given lots and lots of data to work with.

Progressive resizing is a technique for building CNNs that can be very helpful during
the training and optimization phases of a machine learning project. To understand why,
we must first understand that the most important features of an image classification
problem are “large”. Properly tuned gradient descent naturally favors robust,
well-supported features in its decision-making. In the image classification case, this
translates into features occupying as many pixels in as many of the sample images as
possible.

For example, suppose we teach a neural network to distinguish between oranges and apples. Suppose that one model classifies by distinguishing between the colors "orange" and "red", and another by distinguishing between "stem shaped like an orange stem" and "stem shaped like an apple stem". The first model is robust: any image we score, no matter how small or misshapen, will have orange pixels or red pixels usable by the model. The second model is not: we can imagine images so small that the stems are not easily distinguishable, images with the stem cropped out, or images where the stems have been removed outright.
The practical result is that while a model trained on very small images will learn fewer features than one trained on very large images, the ones it does learn will be the most important ones. Thus a model architecture that works on small images will generalize to larger ones. Meanwhile, small-image models are much faster to train: after all, an image input twice as large on each side has four times as many pixels to learn from.

 We started with a model trained on tiny 48x48-pixel images, which was 53% accurate.
 Next, we trained a model on 96x96-pixel images, reusing our 48x48 classifier within the new model and achieving 59% accuracy.
 Finally, we trained a model on 192x192-pixel images, reusing our 96x96 classifier (which in turn reused our 48x48 classifier!) and achieving 61% accuracy. A minimal sketch of this weight-reuse pattern follows the list.
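The following is a minimal, hedged Keras sketch of that reuse pattern, not the report's actual network: the layer sizes, the two-class softmax output and the tensorflow.keras API are assumptions made for illustration. Because the convolutional base here is fully convolutional and ends in global average pooling, every weight trained at 48x48 fits an otherwise identical 96x96 model; this is a simpler variant of the step-wise reuse described above, which adds new layers at each step instead.

    from tensorflow.keras import layers, models

    def build_model(image_size):
        # Fully convolutional base + global pooling, so the same
        # weights fit any input resolution.
        inputs = layers.Input(shape=(image_size, image_size, 3))
        x = inputs
        for filters in (32, 64, 128):
            x = layers.Conv2D(filters, 3, padding="same", activation="relu")(x)
            x = layers.MaxPooling2D()(x)
        x = layers.GlobalAveragePooling2D()(x)
        outputs = layers.Dense(2, activation="softmax")(x)  # assumed 2 classes
        model = models.Model(inputs, outputs)
        model.compile(optimizer="adam",
                      loss="sparse_categorical_crossentropy",
                      metrics=["accuracy"])
        return model

    small = build_model(48)
    # small.fit(train_images_48, train_labels, ...)    # train at 48x48 first

    large = build_model(96)
    large.set_weights(small.get_weights())   # reuse everything learned at 48x48
    # large.fit(train_images_96, train_labels, ...)    # then fine-tune at 96x96

The same chain extends to 192x192: build a 192 model and copy in the 96 model's weights before fine-tuning.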

Note that, in the interest of time, I did not spend much time experimenting with the new
layers I added in steps 2 and 3. To improve performance further, here are some other
things you could try:

 Increasing or decreasing the number of new convolutional layers you add.


 Increasing or decreasing the number of nodes in each new convolutional layer.
 Tuning hyper-parameters like activation functions and learning rates.
 Experimenting with the image preprocessing you apply.
 Unfreezing the fully-connected layer and adjusting/training that as well.
 Unfreezing one or more preexisting convolutional layers and (re-)training those as well; a sketch of this freeze/unfreeze pattern follows the list.
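For the last two suggestions, here is a hedged sketch of the freeze/unfreeze mechanics in Keras; the architecture and the learning rate are placeholders, not the report's actual choices:

    from tensorflow.keras import layers, models, optimizers

    # Illustrative model only -- not the report's actual network.
    model = models.Sequential([
        layers.Conv2D(32, 3, activation="relu", input_shape=(96, 96, 3)),
        layers.MaxPooling2D(),
        layers.Conv2D(64, 3, activation="relu"),
        layers.GlobalAveragePooling2D(),
        layers.Dense(2, activation="softmax"),
    ])

    # Freeze all convolutional layers so only the dense head is trained.
    for layer in model.layers:
        if isinstance(layer, layers.Conv2D):
            layer.trainable = False
    model.compile(optimizer="adam", loss="sparse_categorical_crossentropy")
    # ... fit the head here ...

    # Unfreeze the last convolutional layer and recompile with a lower
    # learning rate so the reused features are only gently adjusted.
    conv = [l for l in model.layers if isinstance(l, layers.Conv2D)]
    conv[-1].trainable = True
    model.compile(optimizer=optimizers.Adam(learning_rate=1e-4),
                  loss="sparse_categorical_crossentropy")

Recompiling after toggling trainable is required for the change to take effect, and the lower learning rate keeps the reused features from being overwritten.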

We started this discussion by noting that convolutional neural networks trained on small images are both fast and easy to train and readily generalizable to larger image inputs. This makes them good models to build during the early, experimental phase of a project, when you are still trying to get to grips with a basic network architecture that works well. We can concentrate on dashing off quick one-off models now, and on scaling them up and fine-tuning performance later. Finally, we saw that models trained this way can often achieve equal or better performance than models trained from scratch.

Consider the alternative: building and tuning a full-sized 192x192 model from the start. That model would have an unrestricted "model finding space": it could theoretically converge on any combination of layer weights that works best for the given problem. Progressive resizing is much more restrictive: it requires that a model that works well on 2n x 2n images must subsume a model that works well on n x n.

This could in theory mean that we "miss" an even better architecture that a model built from scratch could converge on. But in practice, models built using progressive-resizing principles often actually do better than models built from scratch.

The theoretical reason why is a mystery. One compelling theory, courtesy of Miguel Perez Michaus, is that it improves the ability of the model to learn "scale-dependent" patterns. He writes about this in the excellent blog post "Yes, Convolutional Neural Nets do care about Scale"; I recommend reading it if you want to learn more.

Nevertheless, progressive resizing is an interesting technique and a useful approach to image modeling problems, and, now that you have read this discussion, hopefully another useful tool in your deep learning toolbox.

7.1.2 PARAMETERS

a. Perfect Classification (PC): It is the ability of the classifier to identify the data correctly. The error rate is zero only when no false negatives and no false positives are found.
b. Missed Classification (MC): It is the opposite of perfect classification: the total number of misclassified samples with respect to the total number of inputs given.

c. Sensitivity: It relates to the test's ability to identify positive results: Sensitivity = TP / (TP + FN).

d. Specificity: It relates to the test's ability to identify negative results: Specificity = TN / (TN + FP).

e. Precision: Also called the positive predictive value, it is the probability that a positive prediction is correct: Precision = TP / (TP + FP).

f. F-score: It is the harmonic mean of precision and recall, given by F-score = (2 × Precision × Recall) / (Precision + Recall), where recall is the sensitivity defined above.

g. Accuracy: It is the proportion of the total number of predictions that were correct: Accuracy = (TP + TN) / (TP + TN + FP + FN).

h. False Alarm (FA): Also called the False Positive Rate (FPR), it is the proportion of negative cases incorrectly classified as positive: FA = FP / (FP + TN).

i. Performance Index (PI): It is used to calculate the overall efficiency of the classifier.
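As a hedged sketch, the standard formulas above can be computed directly from the four counts; the performance index is left out here because its defining equation is not reproduced in this text:

    # Standard classification measures from the four confusion counts.
    def classifier_metrics(tp, tn, fp, fn):
        sensitivity = tp / (tp + fn)            # recall / true positive rate
        specificity = tn / (tn + fp)            # true negative rate
        precision   = tp / (tp + fp)            # positive predictive value
        accuracy    = (tp + tn) / (tp + tn + fp + fn)
        false_alarm = fp / (fp + tn)            # false positive rate
        f_score     = 2 * precision * sensitivity / (precision + sensitivity)
        return {"sensitivity": sensitivity, "specificity": specificity,
                "precision": precision, "accuracy": accuracy,
                "false_alarm": false_alarm, "f_score": f_score}

    print(classifier_metrics(tp=40, tn=45, fp=5, fn=10))   # made-up counts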

7.2 RESULTS

Fig:7.1 Feature Extraction

1. Parameter Calculation

Classifier    Sensitivity    Selectivity    Precision    Accuracy    False Alarm    F-score
KNN                50             60            26           50              0          34
CNN                81            240            86           83              0          83

Table:7.1 Parameter Calculation

Fig:7.2 Classifiers And Parameters (bar graph comparing KNN and CNN on Sensitivity, Selectivity, Precision, Accuracy, False Alarm and F-score; y-axis 0 to 300)

2. Comparison of Classifiers by Accuracy

Classifier    Accuracy
KNN                 50
CNN                 83

Table:7.2 Comparison of Classifiers by Accuracy

Fig:7.3 Bar graph representation of classifier accuracy (KNN 50, CNN 83; y-axis 0 to 90)

CHAPTER 8

CONCLUSION

As discussed in Chapter 7, convolutional neural networks trained on small images are both fast and easy to train and readily generalizable to larger image inputs, which makes them good models to build during the early, experimental phase of a project. Progressive resizing restricts the search space, since a model that works well on 2n x 2n images must subsume one that works well on n x n, yet in practice models built this way often match or outperform models trained from scratch at full size.

 First application of deep learning techniques in underwater image processing.
 Introduction of a new coral-labeled dataset, "Atlantic Deep Sea", representing cold-water coral reefs.
 Investigation of convolutional neural networks in handling noisy, large-sized images and manipulating point-based multi-channel input data.

We have successfully completed our challenge of implementing a convolutional neural network classifier to determine tumour disease, achieving almost 90% accuracy. The existing system, the K-Nearest Neighbour classifier, has an accuracy of just 53%, which is much lower than that of the CNN classifier.

Hence, CNN is more accurate than the KNN classifier for image classification.

CHAPTER 9

FUTURE SCOPE

The future of image processing will involve scanning the heavens for other intelligent life out in space. New intelligent digital species, created entirely by research scientists in various nations of the world, will also build on advances in image processing applications. Due to advances in image processing and related technologies there will be millions and millions of robots in the world in a few decades' time, transforming the way the world is managed. Advances in image processing and artificial intelligence will involve spoken commands, anticipating the information requirements of governments, translating languages, recognizing and tracking people and things, diagnosing medical conditions, performing surgery, reprogramming defects in human DNA, and the automatic driving of all forms of transport. With increasing power and
sophistication of modern computing, the concept of computation can go beyond the
present limits and in future, image processing technology will advance and the visual
system of man can be replicated. The future trend in remote sensing will be towards
improved sensors that record the same scene in many spectral channels. Graphics data is
becoming increasingly important in image processing applications. The future image
processing applications of satellite based imaging ranges from planetary exploration to
surveillance applications.
Using large scale homogeneous cellular arrays of simple circuits to perform image
processing tasks and to demonstrate pattern-forming phenomena is an emerging topic.
The cellular neural network is an implementable alternative to fully connected neural
networks and has evolved into a paradigm for future imaging techniques. The usefulness
of this technique has applications in the areas of silicon retina, pattern formation, etc.
More broadly, convolutional neural networks can be applied in all fields of our daily life. The present accuracy is 84%, and it may be pushed toward 100% in future by using more advanced techniques in the CNN classifier.
 Composition of multiple deep convolutional models for N-dimensional data.
 Development of real-time image/video applications for coral recognition and detection.
 Intensive nature analysis for different coral classes in variant aquatic environments.

REFERENCES
1. Lillesand, T.M. and Kiefer, R.W. and Chipman, J.W., in “Remote Sensing and
Image Interpretation” 5th ed. Wiley, 2004
2. Li Deng and Dong Yu “Deep Learning: methods and applications” by Microsoft
research [Online] available at: http://research.microsoft.com/pubs/209355/NOW-
Book-RevisedFeb2014-online.pdf
3. McCulloch, Warren; Walter Pitts, "A Logical Calculus of Ideas Immanent in
Nervous Activity”, Bulletin of Mathematical Biophysics 5 (4): 115–133(1943)
4. An introduction to convolutional neural networks [Online]available
at:http://white.stanford.edu/teach/index.php/An_Introduction_to_Convolutional_
Neural_Networks
5. Hubel, D. and Wiesel, T. (1968). Receptive fields and functional architecture of monkey striate cortex. Journal of Physiology (London), 195, 215–243.
6. Yann LeCun, Leon Bottou, Yodhua Bengio and Patrick Haffner, “Gradient -Based
Learning Applied to Document Recognition”, Proc. Of IEEE, November 1998.
7. S. L. Phung and A. Bouzerdoum,”MATLAB library for convolutional neural
network,” Technical Report, ICT Research Institute, Visual and Audio Signal
Processing Laboratory, University of Wollongong. Available at:
http://www.uow.edu.au/˜phung
8. Tutorial on deep learning [Online] available at :
http://deeplearning.net/tutorial/lenet.html
9. Adelson, Edward H., Charles H. Anderson, James R. Bergen, Peter J. Burt, and
Joan M. Ogden. "Pyramid methods in image processing." RCA engineer 29, no. 6
(1984): 33-41.
10. M. Riedmiller and H. Braun, “A direct adaptive method of faster backpropagation
learning: The rprop algorithm”, in IEEE International Conference on Neural
Networks, San Francisco, 1993, pp. 586– 591.
11. S. L. Phung, A. Bouzerdoum, and D. Chai, “Skin segmentation using color pixel
classification: analysis and comparison,” IEEE Transactions on Pattern Analysis
and Machine Intelligence, vol. 27, no. 1, pp. 148–154, 2005.

12. Yi Yang and Shawn Newsam, "Bag-Of-Visual-Words and Spatial Extensions for
Land-Use Classification",ACM SIGSPATIAL International Conference on
Advances in Geographic Information Systems (ACM GIS), 2010.
13. J. Xiao, J. Hays, K. Ehinger, A. Oliva, and A. Torralba, “SUN Database: Large-
scale Scene Recognition from Abbey to Zoo”, IEEE Conference on Computer
Vision and Pattern Recognition (CVPR)
14. J. Xiao, K. A. Ehinger, J. Hays, A. Torralba, and A. Oliva, “SUN Database:
Exploring a Large Collection of Scene Categories”, (in revision) International
Journal of Computer Vision (IJCV)
15. Source for highway images [Online] National Highway Authority of India,
nhai.org
16. S. Daniel Madan Raja and A. Shanmugam, "ANN and SVM Based War Scene Classification using Wavelet Features: A Comparative Study", Journal of Computational Information Systems, 7:5 (2011), 1402–1411.
17. Source for war scene images [Online] available at: military.com and
militaryfactory
18. R. M. Bell and Y. Koren. Lessons from the Netflix prize challenge. ACM SIGKDD Explorations Newsletter, 9(2):75–79, 2007.
19. A. Berg, J. Deng, and L. Fei-Fei. Large scale visual recognition challenge 2010.
www.imagenet.org/challenges. 2010.
20. L. Breiman. Random forests. Machine learning, 45(1):5–32, 2001.
21. D. Cires¸an, U. Meier, and J. Schmidhuber. Multi-column deep neural networks
for image classification. Arxiv preprint arXiv:1202.2745, 2012.
22. D.C. Cires¸an, U. Meier, J. Masci, L.M. Gambardella, and J. Schmidhuber. High-
performance neural networks for visual object classification. Arxiv preprint
arXiv:1102.0183, 2011.
23. J. Deng, W. Dong, R. Socher, L.-J. Li, K. Li, and L. Fei-Fei. ImageNet: A Large-
Scale Hierarchical Image Database. In CVPR09, 2009.
24. J. Deng, A. Berg, S. Satheesh, H. Su, A. Khosla, and L. Fei-Fei. ILSVRC-2012,
2012. URL http://www.image-net.org/challenges/LSVRC/2012/.
25. L. Fei-Fei, R. Fergus, and P. Perona. Learning generative visual models from few training examples: An incremental Bayesian approach tested on 101 object categories. Computer Vision and Image Understanding, 106(1):59–70, 2007.

26. G. Griffin, A. Holub, and P. Perona. Caltech-256 object category dataset.
Technical Report 7694, California Institute of Technology, 2007. URL
http://authors.library.caltech.edu/7694.
27. G.E. Hinton, N. Srivastava, A. Krizhevsky, I. Sutskever, and R.R. Salakhutdinov.
Improving neural networks by preventing co-adaptation of feature detectors. arXiv
preprint arXiv:1207.0580, 2012.
28. K. Jarrett, K. Kavukcuoglu, M. A. Ranzato, and Y. LeCun. What is the best multi-stage architecture for object recognition? In International Conference on Computer Vision, pages 2146–2153. IEEE, 2009.
29. A. Krizhevsky. Learning multiple layers of features from tiny images. Master’s
thesis, Department of Computer Science, University of Toronto, 2009.
30. A. Krizhevsky. Convolutional deep belief networks on cifar-10. Unpublished
manuscript, 2010.
31. A. Krizhevsky and G.E. Hinton. Using very deep autoencoders for content-based
image retrieval. In ESANN, 2011.
32. Y. LeCun, B. Boser, J. S. Denker, D. Henderson, R. E. Howard, W. Hubbard, L. D. Jackel, et al. Handwritten digit recognition with a back-propagation network. In Advances in Neural Information Processing Systems, 1990.
33. Y. LeCun, F.J. Huang, and L. Bottou. Learning methods for generic object
recognition with invariance to pose and lighting. In Computer Vision and Pattern
Recognition, 2004. CVPR 2004. Proceedings of the 2004 IEEE Computer Society
Conference on, volume 2, pages II–97. IEEE, 2004.
34. Y. LeCun, K. Kavukcuoglu, and C. Farabet. Convolutional networks and
applications in vision. In Circuits and Systems (ISCAS), Proceedings of 2010
IEEE International Symposium on, pages 253–256. IEEE, 2010.
35. H. Lee, R. Grosse, R. Ranganath, and A. Y. Ng. Convolutional deep belief networks for scalable unsupervised learning of hierarchical representations. In Proceedings of the 26th Annual International Conference on Machine Learning, pages 609–616. ACM, 2009.
36. T. Mensink, J. Verbeek, F. Perronnin, and G. Csurka. Metric Learning for Large
Scale Image Classification: Generalizing to New Classes at Near-Zero Cost. In
ECCV - European Conference on Computer Vision, Florence, Italy, October
2012.

37. V. Nair and G. E. Hinton. Rectified linear units improve restricted boltzmann
machines. In Proc. 27th International Conference on Machine Learning, 2010.
38. N. Pinto, D. D. Cox, and J. J. DiCarlo. Why is real-world visual object recognition hard? PLoS Computational Biology, 4(1):e27, 2008.
39. N. Pinto, D. Doukhan, J.J. DiCarlo, and D.D. Cox. A high-throughput screening
approach to discovering good forms of biologically inspired visual representation.
PLoS computational biology, 5(11):e1000579, 2009.
40. B.C. Russell, A. Torralba, K.P. Murphy, and W.T. Freeman. Labelme: a database
and web-based tool for image annotation. International journal of computer vision,
77(1):157–173, 2008.
41. J. Sánchez and F. Perronnin. High-dimensional signature compression for large-scale image classification. In Computer Vision and Pattern Recognition (CVPR), 2011 IEEE Conference on, pages 1665–1672. IEEE, 2011.
42. P. Y. Simard, D. Steinkraus, and J. C. Platt. Best practices for convolutional neural networks applied to visual document analysis. In Proceedings of the Seventh International Conference on Document Analysis and Recognition, volume 2, pages 958–962, 2003.
43. S. C. Turaga, J. F. Murray, V. Jain, F. Roth, M. Helmstaedter, K. Briggman, W. Denk, and H. S. Seung. Convolutional networks can learn to generate affinity graphs for image segmentation. Neural Computation, 22(2):511–538, 2010.
