
PROJECT REPORT
ON
TRAFFIC SIGN RECOGNITION USING MACHINE LEARNING
Submitted by

ANGEL MARY JOHN (KSD15IT003)


ANUSHA RAVINDRAN (KSD15IT005)
AYSHA FATHIMA NAJIA (KSD15IT011)

to

the University of Kerala


in partial fulfillment of the requirements for the award of the B.Tech Degree
in Information Technology Engineering

Department of Computer Science and Engineering


LBS College of Engineering, Kasaragod
May 2019


DEPARTMENT OF COMPUTER SCIENCE AND ENGINEERING

LBS COLLEGE OF ENGINEERING, KASARAGOD

CERTIFICATE

Certified that this report entitled 'Traffic Sign Recognition Using Machine Learning' is the report of the project presented by Anusha Ravindran, KSD15IT005, during the year 2018-2019 in partial fulfillment of the requirements for the award of the degree of Bachelor of Technology in Computer Science Engineering of the University of Kerala.

SANDEEP CHANDRAN
Assistant Professor
Dept. of CSE, LBSCEK

SARITH DIVAKAR M
Assistant Professor
Dept. of CSE, LBSCEK
SMITHA MOL M B

Head of the Department

Dept. of CSE, LBSCEK


DECLARATION

I, Anusha Ravindran, hereby declare that this project report entitled TRAFFIC SIGN
RECOGNITION USING MACHINE LEARNING is a bonafide work of mine, carried out
under the supervision of Mr. Sandeep Chandran, Asst. Professor, Department of Computer
Science and Engineering, LBS College of Engineering. I declare that, to the best of my
knowledge, the work reported herein does not form part of any other project report or
dissertation on the basis of which a degree or award was conferred on an earlier occasion to
any other candidate. The content of this project is not being presented by any other student
to this or any other university for the award of a degree.

Signature:

Name of student: ANUSHA RAVINDRAN

University Register Number: KSD15IT005

Signature:

Name of guide: SANDEEP CHANDRAN

Countersigned with name:

Head, Department Of Computer Science Engineering

LBS College of Engineering, Kasaragod. Date: 06/05/2019


ACKNOWLEDGEMENT

We take this opportunity to express our deep sense of gratitude and sincere thanks to all who
helped us to complete the project successfully.

We sincerely thank our Principal, Dr. MUHAMMED SHEKOOR, for providing us with the
facilities needed to go ahead with our project.

We express our sincere gratitude to Mrs. SMITHA MOL M B, Head of the Department,
Computer Science and Engineering, LBS College of Engineering, for supporting us with the
necessary facilities, which were essential for the successful completion and presentation of our work.

We express our sincere gratitude to Mr. SANDEEP CHANDRAN for supporting and guiding us
throughout the work.

We also express our heartiest gratitude to the project coordinator, Mr. SARITH DIVAKAR M,
for the timely suggestions and encouragement given for the successful completion of this work.

Finally, yet importantly, we would like to express our heartfelt thanks to our beloved parents
for their blessings, and to our classmates for their help and wishes for the successful completion of this work.


ABSTRACT

Traffic sign recognition is an important but challenging task, especially for automated driving and
driver assistance. It is a technology by which a vehicle is able to recognize the traffic signs put up
on the road. It uses image processing techniques to detect traffic signs, and the detection methods
can generally be divided into colour-based, shape-based and learning-based methods.

Recognition accuracy depends on two aspects: the feature extractor and the classifier. Current
popular algorithms mainly use convolutional neural networks (CNN) to perform both feature
extraction and classification. Such methods can achieve impressive results, but usually on the basis
of an extremely large and complex network. Moreover, since the fully connected layers of a CNN
form a classical neural network classifier, trained by gradient descent-based implementations, its
generalization ability is limited and sub-optimal. In the proposed approach, the CNN first learns
deep and robust features; the fully connected layers are then removed, which turns the CNN into a
feature extractor.


CONTENTS

List of Figures
List of Tables

Chapter-1. Introduction
    1.1 General Background
    1.2 Objective
    1.3 Scheme

Chapter-2. Literature Survey
    2.1 A system for traffic sign detection, tracking and recognition using colour, shape and motion information
    2.2 Recognition of traffic signs based on their colour and shape features extracted using human vision models
    2.3 Real-time road signs classification
    2.4 Robust class similarity measure for traffic sign recognition
    2.5 Convolutional networks and applications in vision

Chapter-3. Methodology
    3.1 Proposed model
        3.1.1 Feature extraction using CNN extractor
        3.1.2 Classification using CNN with deep convolutional features
            3.1.2.1 The convolutional layer
            3.1.2.2 The non-linear layer
            3.1.2.3 The pooling layer
    3.2 Architecture of proposed model
    3.3 Module description
        3.3.1 Feature extraction using CNN extractor
        3.3.2 Image classification using CNN
            3.3.2.1 CNN

Chapter-4. Traffic Sign Classifier
    4.1 Pipeline architecture
        4.1.1 Load the data
        4.1.2 Data summary and exploration
        4.1.3 Data preprocessing
            4.1.3.1 Shuffling
            4.1.3.2 Grayscaling
            4.1.3.3 Local histogram equalization
            4.1.3.4 Normalization
        4.1.4 Designing model architecture
            4.1.4.1 LeNet
                4.1.4.1.1 LeNet architecture
        4.1.5 Model training and evaluation
        4.1.6 Testing the model using the test set

Chapter-5. Code
Chapter-6. Results
Chapter-7. Conclusion
References


List of Figures

Figure 1: CNN
Figure 2: Classification using CNN
Figure 3: Architecture of the proposed model


CHAPTER 1

INTRODUCTION

Traffic sign recognition is a multi-category classification problem with unbalanced class
frequencies. It is a challenging real-world computer vision problem of high practical
relevance, which has been a research topic for several decades. Many studies have been
published on this subject, and multiple systems, which often restrict themselves to a subset of
relevant signs, are already commercially available in new high- and mid-range vehicles.
Nevertheless, there has been little systematic, unbiased comparison of approaches, and
comprehensive benchmark datasets are not publicly available. Road signs are designed to be
easily detected and recognized by human drivers. They follow clear design principles using
color, shape, icons and text. These allow for a wide range of variations between classes.
Signs with the same general meaning, such as the various speed limits, have a common
general appearance, leading to subsets of traffic signs that are very similar to each other.
Illumination changes, partial occlusions, rotations, and weather conditions further increase
the range of variations in visual appearance a classifier has to cope with. Humans are capable
of recognizing the large variety of existing road signs in most situations with near-perfect
accuracy.

Traffic sign recognition is a promising subfield of object recognition with various
applications; it can provide reliable safety precautions and guiding information to drivers on
motorways, in urban environments and the like. Nowadays, it is an indispensable component
of driver assistance systems (DAS) and unmanned ground vehicles (UGV). Even though many
algorithms have been put forward, as the German Traffic Sign Recognition Benchmark
(GTSRB) shows, there are still problems such as viewpoint changes, color distortion, motion
blur, contrast degradation, occlusion, and underexposure or overexposure, which make it
challenging to achieve a satisfying recognition accuracy. Traffic sign recognition is crucial in
many applications, such as autonomous driving, mapping and navigation, and it is also
important in intelligent transportation systems. Generally, a traffic sign recognition system
involves two related issues: traffic sign detection and traffic sign classification. The former
aims to accurately localize the traffic signs in an image, while the latter aims to assign the
detected objects to specific categories/subcategories.


1.1 GENERAL BACKGROUND


In recent years, traffic sign recognition (TSR) has become a core technology of safety and
traffic applications. Simply stated, traffic sign recognition helps identify traffic signs, easing
the driver's task and thus making for a safer journey. Due to varying weather conditions, a
traffic sign board may be distorted, faded, torn or bent, which can lead to misinterpretation
of the traffic sign. Most previous works have in some way restricted their working conditions,
for example by limiting the number of images used, or by using complicated classification
methods that reduce accuracy and lead to overfitting. Hence, other methods are needed to
rectify these defects. In traffic environments, signs regulate traffic, warn the driver, and
command or prohibit certain actions.

A real-time and robust automatic traffic sign recognition system can support and disburden the
driver and thus significantly increase driving safety and comfort. For instance, it can remind
the driver of the current speed limit and prevent him from performing inappropriate actions
such as entering a one-way street, passing another car in a no-passing zone, or unwanted
speeding. The aim of this project is to lessen many of these restrictions. Identification of
traffic signs is a demanding function for safe driving, for the driver as well as for the vehicles
following. One can recognize a traffic sign using its shape, colour and orientation, and the
various features of the image dataset can be used for classification.

1.2 OBJECTIVE
Traffic sign recognition is a technology which identifies traffic signs from a fair distance. In
this contribution, we describe a real-time system for vision-based traffic sign detection and
recognition. We focus on an important and practically relevant subset of (Indian) traffic
signs, namely speed signs and no-passing signs, and their corresponding end signs. The
problem of traffic sign recognition has some beneficial characteristics. First, the design of
traffic signs is unique; thus, object variations are small. Further, sign colors often contrast
very well against the environment. Moreover, signs are rigidly positioned relative to the
environment (contrary to vehicles), and are often set up in clear sight of the driver.
Nevertheless, a number of challenges remain for a successful recognition. First, weather and
lighting conditions vary significantly in traffic environments, diminishing the advantage of
the above-claimed object uniqueness. Additionally, as the camera is moving, additional image
distortions, such as motion blur and abrupt contrast changes, occur frequently. Further, the


sign installation and surface material can physically change over time, influenced by
accidents and weather, hence resulting in rotated signs and degenerated colors.

1.3 SCHEME
CNN is one of the neural network models for deep learning, characterized by three specific
properties, namely locally connected neurons, shared weights and spatial or temporal
sub-sampling. Generally, a CNN can be considered to be made up of two main parts. The
first contains alternating convolutional and max-pooling layers; the input of each layer is
the output of its previous layer. As a result, this forms a hierarchical feature extractor that
maps the original input images into feature vectors. The extracted feature vectors are then
classified by the second part, the fully connected layers, which form a typical feedforward
neural network.

Figure 1: CNN (the input image is fed to the input layer and passed through convolutional layers to produce CNN features).
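To make this two-part structure concrete, the following is a minimal sketch using the tf.keras API; the layer counts and sizes are illustrative assumptions, not the exact network used in this project.

# Minimal sketch of the two-part CNN described above (tf.keras API).
# Layer counts and sizes are illustrative, not this project's exact network.
import tensorflow as tf

def build_cnn(n_classes=43):
    return tf.keras.Sequential([
        # Part 1: hierarchical feature extractor (alternating conv/max-pool)
        tf.keras.layers.Conv2D(32, 5, activation='relu',
                               input_shape=(32, 32, 1)),
        tf.keras.layers.MaxPooling2D(2),
        tf.keras.layers.Conv2D(64, 5, activation='relu'),
        tf.keras.layers.MaxPooling2D(2),
        # Part 2: fully connected feedforward classifier
        tf.keras.layers.Flatten(),
        tf.keras.layers.Dense(120, activation='relu'),
        tf.keras.layers.Dense(n_classes, activation='softmax'),
    ])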


CHAPTER 2

LITERATURE SURVEY

2.1 A SYSTEM FOR TRAFFIC SIGN DETECTION, TRACKING AND RECOGNITION USING COLOUR, SHAPE AND MOTION INFORMATION
This paper describes a computer-vision-based system for real-time, robust traffic sign
detection, tracking, and recognition. Such a framework is of major interest for driver
assistance in an intelligent automotive cockpit environment. The proposed approach consists
of two components. First, signs are detected using a set of Haar wavelet features obtained
from AdaBoost training. Compared to previously published approaches, this solution offers a
generic, joint modeling of color and shape information without the need to tune free
parameters. Once detected, objects are efficiently tracked within a temporal information
propagation framework. Second, classification is performed using Bayesian generative
modeling. Making use of the tracking information, hypotheses are fused over multiple frames.
Experiments show high detection and recognition accuracy and a frame rate of approximately
10 frames per second on a standard PC.

In traffic environments, signs regulate traffic, warn the driver, and command or prohibit
certain actions. A real-time and robust automatic traffic sign recognition can support and
disburden the driver, and thus, significantly increase driving safety and comfort. For instance,
it can remind the driver of the current speed limit, prevent him from performing inappropriate
actions such as entering a one-way street, passing another car in a no passing zone, unwanted
speeding, etc. Further, it can be integrated into an adaptive cruise control (ACC) system for
less stressful driving. In a more global context, it can contribute to the scene understanding
of the traffic context (e.g., whether the car is driving in a city or on a freeway).

In this contribution, a real-time system for vision-based traffic sign detection and recognition
is described. The main focus is on an important and practically relevant subset of traffic signs,
namely speed signs and no-passing signs, and their corresponding end signs, respectively.


The problem of traffic sign recognition has some beneficial characteristics. First, the design
of traffic signs is unique; thus, object variations are small. Further, sign colors often contrast
very well against the environment. Moreover, signs are rigidly positioned relative to the
environment (contrary to vehicles), and are often set up in clear sight of the driver. Note that
the data are available in color.

The vast majority of published traffic sign recognition approaches utilizes at least two steps,
one aiming at detection, the other one at classification, that is, the task of mapping the
detected sign image into its semantic category.

Nevertheless, a number of challenges remain for a successful recognition. First, weather and
lighting conditions vary significantly in traffic environments, diminishing the advantage of
the above-claimed object uniqueness. Additionally, as the camera is moving, additional image
distortions, such as motion blur and abrupt contrast changes, occur frequently. Further, the
sign installation and surface material can physically change over time, influenced by
accidents and weather, hence resulting in rotated signs and degenerated colors. Finally, the
constraints given by the area of application require inexpensive systems (i.e., low-quality
sensors, slow hardware), high accuracy and real-time computation.

The drawback of this sequential application of color and shape detection is as follows. Regions
that have falsely been rejected by the color segmentation cannot be recovered in the further
processing. A joint modeling of color and shape can overcome this problem. Additionally,
color segmentation requires the fixing of thresholds, mostly obtained from a time-consuming
and error-prone manual tuning.

2.2 RECOGNITION OF TRAFFIC SIGNS BASED ON THEIR COLOUR AND SHAPE FEATURES EXTRACTED USING HUMAN VISION MODELS

Colour and shape are basic characteristics of traffic signs, used both by drivers and by
artificial traffic sign recognition systems. However, these sign features have not been
represented robustly in earlier recognition systems, especially in disturbed viewing
conditions. In this study, this information is represented by using a human vision colour
appearance model and by further developing an existing behaviour model of vision. The
colour appearance model CIECAM97 has been applied to extract colour information and to
segment and classify traffic signs, whilst shape features are extracted by the development of
the FOSTS model, an extension of the behaviour model of vision. The recognition rate is very
high for signs under artificial transformations that imitate possible real-world sign distortion
(up to 50% noise level, 50 m distance to the sign, and 5° perspective disturbance) for still
images. For British traffic signs (n = 98) obtained under various viewing conditions, the
recognition rate is up to 95%. Colour and shape are dominant visual features of traffic signs
with distinguishing characteristics, and are key information for drivers to process when
driving along the road. Therefore, to develop a driver assistance system for recognition of
traffic signs, this information should be utilised effectively and efficiently, even in the
knowledge that colour and shape vary with changes in lighting conditions and viewing
angles. Colour is regulated not only for the traffic sign category (red = stop, yellow = danger,
etc.) but also for the tint of the paint that covers the sign, which should correspond, with a
tolerance, to a specific wavelength in the visible spectrum. However, most colour-based
techniques in computer vision run into problems if the illumination source varies not only in
intensity but also in colour. This is because the spectral composition, and therefore the
colour, of daylight changes depending on weather conditions (e.g., sky with/without clouds),
the time of day, and at night, when all sorts of artificial light sources surround the scene.
Many authors have therefore developed various techniques to make use of the colour
information of traffic signs.

Tominaga developed a clustering method in a colour space, whilst Ohlander used a recursive
region-splitting method to achieve colour segmentation. The colour spaces they applied are
HSI (hue, saturation, intensity) and L*a*b*. These colour spaces are normally limited to only
one lighting condition, D65. Hence, the range of each colour attribute, such as hue, will be
narrowed down, given that weather conditions change with colour temperatures ranging from
5000 to 7000 K.

Shape is another powerful visual feature for recognition of signs [4–10]. However, when
signs appear in cluttered scenes, many objects may appear similar to road signs. Also, when
the viewing angles are different, the signs will appear differently with some degree of
distortion, sometimes with torn corners and occluded parts. Furthermore, signs vary in scale,
getting bigger as a vehicle moves toward them, and vary in size, appearing relatively small,
about 40–50 pixels wide at most. Another difficulty is linked to the way the signs are captured
by the acquisition system. It is stated that all road signs will be seen with a non-zero angle
between the optical axis of each camera and the normal vector to the sign surface. This angle
can be as high as 30°, depending on the distance between the sign and the cameras. Piccioli
and Campani concentrated on geometrical reasoning for the detection of triangular and
circular signs. For the triangular shapes, they segmented edges that were horizontal or had a
slope in the ranges [60 − e, 60 + e] and [−60 − e, −60 + e] degrees, where e is the deviation
from 60 calculated from samples. The Hough transform was applied to detect the circles.
However, only two types of shape were studied. Miura et al. extracted sign candidates as
white circular regions by using binarization with area filtering, which only keeps white
regions whose areas are within a predetermined range. Due to the dust of the road, the white
regions sometimes may not be the areas with higher intensity values, which results in many
false candidates.

More recently, Escalera has developed a driver support system which employs a genetic
algorithm for detection of the sign state and a neural network for classification. However, the
neural network needs to be re-trained whenever a new case is included, which is very
time-consuming. Due to adaptation to the environment, humans can correctly identify traffic
signs invariant of lighting conditions and viewing angles. Therefore, invariant features can be
extracted using vision models. In this study, two vision models have been applied and
developed. One model is CIECAM97 for measuring colour appearance invariant of lighting
conditions, utilised to extract colour features. The other vision model, the foveal system for
traffic signs (FOSTS), is developed based on the behaviour model of vision (BMV), imitating
some mechanisms of the real visual system for perceiving shapes [14–16]. CIECAM97 is a
standard colour appearance model recommended by the CIE (International Commission on
Illumination) in 1997 for measuring colour appearance under various viewing conditions
[17,18]. This model can estimate colour appearance as accurately as an average observer. It
takes weather conditions into account and simulates human perception of colours under
various viewing conditions and for different media, such as reflective colours, transmissive
colours, etc. For human perception, the most common terms used for colour or colour
appearance are lightness, chroma, and hue, which can be predicted by the model. The input
parameters are the viewing conditions, including the lighting source, the reference white, and
the background.


2.3 REAL TIME ROAD SIGNS CLASSIFICATION


This paper describes a method for classifying road signs based on a single color camera
mounted on a moving vehicle. The main focus is on the final neural-network-based
classification stage applied to the candidates provided by an existing traffic sign detection
algorithm. Great attention is paid to image preprocessing in order to provide a simpler and
clearer input to the network: candidate color images are cropped and converted to greyscale,
then enhanced using a contrast stretching technique; a multi-layer perceptron neural network
is then used to provide a matching score against different road sign models. Finally, results
are filtered using tracking. Benchmarks are presented, showing that the system is able to
classify more than 200 different Italian road signs in real time, with a recognition rate of 80%
to 90%.

Road sign detection and recognition can be an important aid to the driver, letting him
concentrate on driving; such a system can remember signs encountered, even those that go
unnoticed or are neglected, thus reducing the impact of these events on driving comfort and
also decreasing the possibility of related road accidents. Road signs are designed to be easily
readable, with high contrast and saturated colors, and are installed according to a strict
regulation; however, environmental light, weather conditions, paint degradation, dirt,
shadows and occlusions make automatic traffic sign recognition a challenging task.

The main goal of this paper is to describe the classification stage of a traffic sign recognition
system. Since traffic signs follow strict shape formats, the classification system is driven by
the information provided by the shape detection stage, whose output is used to route the input
pattern to a specialized classifier. Several road sign classification techniques are described in
the literature. One of the simplest methods is cross-correlation with models, in which model
signs, resampled to 16×16 pixels and roto-translated, are used to find the best match. Random
forests, an ensemble learning technique, have also been used to classify signs, and a
comparison has been made between this technique, SVM and AdaBoost. Support vector
machines (SVM) are widely adopted to classify the inner part of road signs. Linear SVMs and
SVMs with Gaussian kernels are used to recognize the symbol contained in the resampled
inner part of road signs: only significant pixels inside the region are used to train the SVM,
and each object is only compared with signs of the same shape and color. Since raw inputs are
hard to analyze, it can be useful for a classifier to reduce its input; principal component
analysis (PCA) and linear discriminant analysis (LDA) techniques can be used for this task.
Neural networks are also widely adopted, and this technique is also the one chosen to provide
the classification stage described and evaluated in this paper.

A comparative study between networks with one or two hidden layers has already been made,
demonstrating that better performance can be achieved using networks with two hidden
layers. Tests are also available on the use of resilient back-propagation or scaled conjugate
gradient to train neural networks. Neural networks have also recently been used in embedded
systems for traffic sign recognition, and tests have been made on how to train neural
networks using both synthetic and real images. Since a large road sign database can easily be
collected, this paper presents exhaustive benchmarks to provide a tested and effective
indication of how to train a neural network for a road sign classification system. The paper is
organized as follows: it presents the system architecture, then briefly explains the detection
stage presented in a previous article.

2.4 ROBUST CLASS SIMILARITY MEASURE FOR TRAFFIC SIGN RECOGNITION

2.5 CONVOLUTIONAL NETWORKS AND APPLICATIONS IN VISION

Convolutional networks are trainable multistage architectures. The input and output of each
stage are sets of arrays called feature maps. For example, if the input is a color image, each
feature map is a 2D array containing a color channel of the input image (for an audio input
each feature map would be a 1D array, and for a video or volumetric image, a 3D array). At
the output, each feature map represents a particular feature extracted at all locations on the
input. Each stage is composed of three layers: a filter bank layer, a non-linearity layer, and a
feature pooling layer.

A typical ConvNet is composed of one, two or three such three-layer stages, followed by a
classification module; each layer type is described here for the case of image recognition. A
key question of vision science (natural and artificial) is how to produce good internal
representations of the visual world. What sort of internal representation would allow an
artificial vision system to detect and classify objects into categories, independently of pose,
scale, illumination, conformation, and clutter? More interestingly, how could an artificial
vision system learn appropriate internal representations automatically, the way animals and
humans seem to learn by simply looking at the world? In the time-honored approach to
computer vision (and to pattern recognition in general), the question is avoided: internal
representations are produced by a hand-crafted feature extractor, whose output is fed to a
trainable classifier. While the issue of learning features has been a topic of interest for many
years, considerable progress has been achieved in the last few years with the development of
so-called deep learning methods. Good internal representations are hierarchical.

In vision, pixels are assembled into edglets, edglets into motifs, motifs into parts, parts into
objects, and objects into scenes. This suggests that recognition architectures for vision (and
for other modalities such as audio and natural language) should have multiple trainable stages
stacked on top of each other, one for each level in the feature hierarchy. This raises two new
questions: what to put in each stage and how to train such deep, multi-stage architectures?
Convolutional Networks (ConvNets) are an answer to the first question. Until recently, the
answer to the second question was to use gradient-based supervised learning, but recent
research in deep learning has produced a number of unsupervised methods which greatly
reduce the need for labeled samples.

The convolutional network architecture is a remarkably versatile, yet conceptually simple
paradigm that can be applied to a wide spectrum of perceptual tasks. While traditional
ConvNets trained with supervised learning are very effective, training them requires a large
number of labeled training samples. We have shown that using simple architectural tricks


such as rectification and contrast normalization, and using unsupervised pre-training of each
filter bank, the need for labeled samples is considerably reduced. Because of their
applicability to a wide range of tasks, and because of their relatively uniform architecture,
ConvNets are perfect candidates for hardware implementations, and embedded applications,
as demonstrated by the increasing amount of work in this area. We expect to see many new
embedded vision systems based on ConvNets in the next few years.

Despite the recent progress in deep learning, one of the major challenges of computer vision,
machine learning, and AI in general in the next decade will be to devise methods that can
automatically learn good feature hierarchies from unlabeled and labeled data in an integrated
fashion. Current and future research will focus on performing unsupervised learning on
multiple stages simultaneously, on the integration of unsupervised and supervised learning,
and on using the feedback path implemented by the decoders to perform visual inference,
such as pattern completion and disambiguation.

Various architectures and training procedures are compared to determine which
non-linearities are preferable, and which training protocol makes a difference. Generic object
recognition using the Caltech-101 dataset: a two-stage system is used, where the first stage is
composed of a filter bank layer with 64 filters of size 9×9, followed by different combinations
of non-linearities and pooling. The second-stage feature extractor is fed with the output of the
first stage and extracts 256 output feature maps, each of which combines a random subset of
16 feature maps from the previous stage using 9×9 kernels. Hence the total number of
convolution kernels is 256 × 16 = 4096.
1. Excellent accuracy of 65.5% is obtained using unsupervised pre-training and supervised
refinement with abs and normalization non-linearities. The result is on par with the popular
model based on SIFT and the pyramid match kernel SVM. It is clear that abs and
normalization are crucial for achieving good performance. This is an extremely important
fact for users of convolutional networks, which traditionally only use tanh().
2. Astonishingly, random filters without any filter learning whatsoever achieve decent
performance (62.9% for R), as long as abs and normalization are present (Rabs − N − PA). A
more detailed study of this particular case can be found in the literature.


3. Comparing experiments from rows R vs R+ and U vs U+, we see that supervised
fine-tuning consistently improves the performance, particularly with weak non-linearities.
4. It seems that unsupervised pre-training (U, U+) is crucial when the newly proposed
non-linearities are not in place.

Handwritten digit classification using the MNIST dataset: using the evidence gathered in the
previous experiments, a two-stage system with a two-layer fully connected classifier was
used. The two convolutional stages were pre-trained unsupervised and refined supervised. An
error rate of 0.53% was achieved on the test set; to the authors' knowledge, this is the lowest
error rate ever reported on the original MNIST dataset, without distortions or preprocessing.
The best previously reported error rate was 0.60%. Experiments on the German Traffic Sign
Recognition Benchmark (GTSRB) demonstrate that the proposed method can obtain results
competitive with state-of-the-art algorithms with much less complexity.


CHAPTER 3

METHODOLOGY

3.1 PROPOSED MODEL

Current popular algorithms mainly use convolutional neural networks to perform both feature
extraction and classification. Such methods can achieve impressive results, but often on the
basis of an extremely large and complex network or ensemble learning, together with massive
amounts of data. To make full use of the advantages of CNN, we propose a novel traffic sign
recognition architecture. Before the images are sent to the CNN for feature extraction, the
average image of the traffic signs is subtracted, to ensure illumination invariance to some
extent.
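As a minimal sketch, under the assumption that X_train and X_test are NumPy arrays of equally sized sign images (the names are illustrative), the average-image subtraction can be written as:

# Mean-image subtraction for rough illumination invariance (illustrative names).
import numpy as np

mean_image = X_train.astype(np.float32).mean(axis=0)  # average traffic sign
X_train_centered = X_train - mean_image  # centre the training images
X_test_centered = X_test - mean_image    # reuse the training mean at test time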

3.1.1 Feature Extraction Using CNN Extractor


To show that the deep convolutional features learnt by a plain CNN are discriminative
enough for traffic sign recognition, we refer to an original, simple structure to build up the
CNN. The difference is that an extra convolutional layer with 200 feature maps of 1×1
neurons is added before the fully connected layer. The max-pooling layers here are
non-overlapping, and no rectification or inner-layer normalization operation is used.
Considering that traffic sign images are relatively invariable in shape, and that the size of the
samples in the ITSRB dataset varies from 15×15 to 250×250, we assume that the influence of
cropping and warping is negligible. Thus, only the images in the bounding boxes given by the
annotations are cropped, and they are resized to 48×48 uniformly. Data augmentation is not
used, which means that random deformations (translation, rotation, scaling, etc.) are not
applied to the training set. Since the CNN is used to extract deep features rather than conduct
classification, the first eight layers of the CNN are taken out as a feature extractor, while the
fully connected layers are removed once training is done.
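A minimal sketch of this cut-off step, using the tf.keras API with illustrative layer names and sizes (not the exact eight-layer network described above):

# Build a CNN, train it, then truncate it into a feature extractor.
import numpy as np
import tensorflow as tf

cnn = tf.keras.Sequential([
    tf.keras.layers.Conv2D(32, 5, activation='relu',
                           input_shape=(48, 48, 3)),
    tf.keras.layers.MaxPooling2D(2, name='last_pool'),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(43, activation='softmax'),
])
# ... train `cnn` on the cropped 48x48 sign images here ...

# Drop the fully connected layers: keep everything up to the last pooling layer.
extractor = tf.keras.Model(inputs=cnn.input,
                           outputs=cnn.get_layer('last_pool').output)
features = extractor.predict(np.zeros((1, 48, 48, 3), np.float32))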

3.1.2 Classification Using CNN with Deep Convolutional Features


The accuracy of traffic sign recognition depends on two aspects: the feature extractor and the
classifier. The more discriminative the features and the more powerful the classifier, the
higher the recognition rate will be. Let us consider the use of CNN for image classification in
more detail.

Figure 2: Classification using CNN (a 32×32 input image → Conv(32), Conv(32) with ReLU activation → Pool(2×2)).
The image is passed through a series of convolutional, non-linear, pooling and fully
connected layers, and then the output is generated.

3.1.2.1 The convolutional layer


The convolutional layer is always the first. The image (a matrix of pixel values) is entered
into the convolutional layer. Imagine that reading of the input matrix begins at the top left of
the image. The software then selects a smaller matrix there, which is called a filter. The filter
performs convolution, that is, it moves along the input image. The filter's task is to multiply
its values by the original pixel values; all these multiplications are summed up, and one
number is obtained in the end. Since the filter has read the image only in the upper left
corner, it moves further and further right by one unit, performing a similar operation. After
passing the filter across all positions, a matrix is obtained which is smaller than the input
matrix. From a human perspective, this operation is analogous to identifying boundaries and
simple colours in the image, but in order to recognize properties of a higher level, the whole
network is needed.


The network will consist of several convolutional layers mixed with non-linear and pooling
layers. When the image passes through one convolutional layer, the output of the first layer
becomes the input of the second layer, and this happens with every further convolutional
layer.
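The sliding-filter computation described above can be sketched in a few lines of NumPy (single channel, stride 1, 'valid' borders; the filter values are illustrative):

import numpy as np

def convolve2d(image, filt):
    h, w = image.shape
    fh, fw = filt.shape
    out = np.zeros((h - fh + 1, w - fw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            # multiply the filter by the pixels under it and sum to one number
            out[i, j] = np.sum(image[i:i+fh, j:j+fw] * filt)
    return out

image = np.random.rand(32, 32)
vertical_edge = np.array([[1., 0., -1.]] * 3)   # illustrative 3x3 filter
feature_map = convolve2d(image, vertical_edge)  # 30x30, smaller than input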

3.1.2.2 The Non Linear Layer


The non-linear layer is added after each convolutional operation. It applies an activation
function, which introduces the non-linear property. Without this property, the network would
not be sufficiently expressive and would not be able to model the response variable.

3.1.2.3 The Pooling Layer


The pooling layer follows the non-linear layer. It works with the width and height of the
image and performs a down-sampling operation on them. As a result, the image volume is
reduced. This means that if some features have already been identified in the previous
convolution operation, a detailed image is no longer needed for further processing, and it is
compressed to a less detailed picture.

After completing the series of convolutional, non-linear and pooling layers, it is necessary to
attach a fully connected layer. This layer takes the output information from the convolutional
layers. Attaching a fully connected layer to the end of the network yields an N-dimensional
vector, where N is the number of classes from which the model selects the desired class.
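A minimal NumPy sketch of the 2×2 max-pooling step described above:

import numpy as np

def max_pool2d(x, size=2):
    h, w = x.shape
    x = x[:h - h % size, :w - w % size]  # crop so blocks tile evenly
    # split into non-overlapping size x size blocks, keep each block's maximum
    return x.reshape(h // size, size, w // size, size).max(axis=(1, 3))

feature_map = np.random.rand(28, 28)
pooled = max_pool2d(feature_map)  # 14x14: same features, less detail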

3.2 ARCHITECTURE OF PROPOSED MODEL


As outlined in Section 3.1, the proposed architecture subtracts the average traffic sign image,
uses a CNN to extract deep convolutional features, and then classifies them. The overall
pipeline is shown below.


Figure 3: Architecture of the proposed model (data preprocessing → data modeling → training → testing → evaluation → validation).

3.3 MODULE DESCRIPTION


3.3.1 Feature Extraction Using CNN Extractor

As described in Section 3.1.1, a plain CNN with an extra convolutional layer of 200 feature
maps of 1×1 neurons is trained on the cropped 48×48 sign images, without data augmentation.
Once training is done, the fully connected layers are removed, and the first eight layers of the
CNN serve as the feature extractor.

3.3.2 Image Classification Using CNN


A neural network consists of individual units called neurons. Neurons are arranged in a series
of groups called layers, and the neurons in each layer are connected to the neurons of the next
layer. Data flows from the input layer to the output layer along these connections. Each
individual node performs a simple mathematical calculation and then transmits its result to all
the nodes it is connected to. CNN is a special architecture of artificial neural networks that
uses some features of the visual cortex. One of the most popular uses of this architecture is
image classification.
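As a minimal sketch, one such layer of neurons computes a weighted sum of its inputs plus a bias and applies an activation (the sizes here are illustrative):

import numpy as np

def dense_layer(inputs, weights, bias):
    # each output neuron: weighted sum of all inputs + bias, then ReLU
    return np.maximum(0, inputs @ weights + bias)

x = np.random.rand(400)              # outputs of the previous layer
W = 0.1 * np.random.randn(400, 120)  # one weight per connection
b = np.zeros(120)
out = dense_layer(x, W, b)           # passed on to the next layer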


CHAPTER 4

TRAFFIC SIGN CLASSIFIER


In our project, we used Python and TensorFlow to classify traffic signs. The dataset has
more than 50,000 images in 43 classes.

4.1 PIPELINE ARCHITECTURE:


4.1.1 Load the data

We have three .p files of 32x32 resized images:


▪ train.p: The training set.
▪ test.p: The testing set.
▪ valid.p: The validation set.

4.1.2 Dataset Summary & Exploration

The pickled data is a dictionary with 4 key/value pairs:


• 'features' is a 4D array containing raw pixel data of the traffic sign images, (num
examples, width, height, channels).
• 'labels' is a 1D array containing the label/class id of the traffic sign. The
file signnames.csv contains id -> name mappings for each id.
• 'sizes' is a list containing tuples, (width, height), representing the original width and
height of the image.
• 'coords' is a list containing tuples, (x1, y1, x2, y2), representing the coordinates of a
bounding box around the sign in the image.

4.1.3 Data Preprocessing.

4.1.3.1 Shuffling
We shuffle the training data to increase randomness and variety in the training dataset,
so that the model is more stable. We use sklearn to shuffle our data.

4.1.3.2 Grayscaling
Using grayscale images instead of color improves the ConvNet's accuracy. We use
OpenCV to convert the training images into grayscale.


4.1.3.3 Local histogram equalization

This technique simply spreads out the most frequent intensity values in an image,
enhancing images with low contrast. Applying this technique is very helpful in our case,
since the dataset at hand contains real-world images, many of which have low contrast. We
use skimage to apply local histogram equalization to the training images.

4.1.3.4 Normalization
Normalization is a process that changes the range of pixel intensity values. Usually the
image data should be normalized so that it has zero mean and equal variance.
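A minimal sketch of zero-mean, unit-variance normalization (note that the Chapter 5 code instead scales pixel values to [0, 1]; both are common choices):

import numpy as np

def normalize(image):
    image = image.astype(np.float32)
    # shift to zero mean and scale to unit variance
    return (image - image.mean()) / (image.std() + 1e-8)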

4.1.4 Designing Model Architecture


Here, we design and implement a deep learning model that learns to recognize traffic
signs from our dataset, the Indian Traffic Sign Dataset.

We'll use Convolutional Neural Networks to classify the images in this dataset. The reason
behind choosing ConvNets is that they are designed to recognize visual patterns directly from
pixel images with minimal preprocessing. They automatically learn hierarchies of invariant
features at every level from data. We will implement two of the most famous ConvNets. Our
goal is to reach an accuracy of +95% on the validation set.

We start by explaining each network architecture, then implement it using TensorFlow.

1. We specify a learning rate of 0.001, which tells the network how quickly to
update the weights.
2. We minimize the loss function using the Adaptive Moment Estimation (Adam)
algorithm, an optimization algorithm that computes adaptive learning rates for each
parameter.
3. We run the minimize() function on the optimizer, which uses backpropagation to
update the network and minimize our training loss.

4.1.4.1 LeNet
LeNet-5 is a convolutional network designed for handwritten and machine-printed
character recognition.


4.1.4.1.1 LeNet Architecture

This ConvNet follows these steps:

Input => Convolution => ReLU => Pooling => Convolution => ReLU => Pooling =>
FullyConnected => ReLU => FullyConnected

Layer 1 (Convolutional): The output shape should be 28x28x6.


Activation. Your choice of activation function.
Pooling. The output shape should be 14x14x6.

Layer 2 (Convolutional): The output shape should be 10x10x16.


Activation. Your choice of activation function.
Pooling. The output shape should be 5x5x16.
Flattening: Flatten the output shape of the final pooling layer such that it's 1D instead of 3D.

Layer 3 (Fully Connected): This should have 120 outputs.


Activation. Your choice of activation function.

Layer 4 (Fully Connected): This should have 84 outputs.


Activation. Your choice of activation function.

Layer 5 (Fully Connected): This should have 43 outputs, one per traffic sign class.

4.1.5 Model Training and Evaluation.


In this step, we will train our model using normalized_images, then we'll compute the
softmax cross entropy between logits and labels to measure the model's error
probability.
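For reference, with a one-hot label vector $p$ and logits $z$ (the 43 raw outputs of the final layer), the per-example loss being minimized is

\mathrm{softmax}(z)_i = \frac{e^{z_i}}{\sum_j e^{z_j}}, \qquad L = -\sum_{i=1}^{43} p_i \log\big(\mathrm{softmax}(z)_i\big)

and the batch average of $L$ is the quantity passed to the Adam optimizer.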

4.1.6 Testing the Model Using the Test Set.


We use the testing set to measure the accuracy of the model over unknown
example


CHAPTER 5
5.1 CODE

# Importing Python libraries


import pickle
import numpy as np
import matplotlib.pyplot as plt
import random
import cv2
import skimage.morphology as morp
from skimage.filters import rank
from sklearn.utils import shuffle
import csv
import os
import tensorflow as tf
from tensorflow.contrib.layers import flatten

training_file = "./traffic-signs-data/train.p"
validation_file= "./traffic-signs-data/valid.p"
testing_file = "./traffic-signs-data/test.p"

with open(training_file, mode='rb') as f:
    train = pickle.load(f)
with open(validation_file, mode='rb') as f:
    valid = pickle.load(f)
with open(testing_file, mode='rb') as f:
    test = pickle.load(f)

#Mapping ClassID to traffic sign names


signs = []
with open('signnames.csv', 'r') as csvfile:
    signnames = csv.reader(csvfile, delimiter=',')
    next(signnames, None)  # skip the header row
    for row in signnames:
        signs.append(row[1])

X_train, y_train = train['features'], train['labels']


X_valid, y_valid = valid['features'], valid['labels']
X_test, y_test = test['features'], test['labels']

# Number of training examples


n_train = X_train.shape[0]

# Number of testing examples


n_test = X_test.shape[0]

# Number of validation examples.


n_validation = X_valid.shape[0]

# What's the shape of a traffic sign image?


image_shape = X_train[0].shape

# How many unique classes/labels there are in the dataset.


n_classes = len(np.unique(y_train))

print("Number of training examples: ", n_train)


print("Number of testing examples: ", n_test)
print("Number of validation examples: ", n_validation)
print("Image data shape =", image_shape)
print("Number of classes =", n_classes)

def list_images(dataset, dataset_y, ylabel="", cmap=None):
    """
    Display a list of images in a single figure with matplotlib.
    Parameters:
        dataset: An np.array compatible with plt.imshow.
        dataset_y: The labels corresponding to the images.
        ylabel (Default = ""): A string to be used as a label for each image.
        cmap (Default = None): Used to display gray images.
    """
    plt.figure(figsize=(15, 16))
    for i in range(6):
        plt.subplot(1, 6, i+1)
        indx = random.randint(0, len(dataset) - 1)
        # Use gray scale color map if there is only one channel
        cmap = 'gray' if len(dataset[indx].shape) == 2 else cmap
        plt.imshow(dataset[indx], cmap=cmap)
        plt.xlabel(signs[dataset_y[indx]])
        plt.ylabel(ylabel)
        plt.xticks([])
        plt.yticks([])
    plt.tight_layout(pad=0, h_pad=0, w_pad=0)
    plt.show()

# Plotting sample examples


list_images(X_train, y_train, "Training example")
list_images(X_test, y_test, "Testing example")
list_images(X_valid, y_valid, "Validation example")

def histogram_plot(dataset, label):
    """
    Plots a histogram of the input data.
    Parameters:
        dataset: Input data to be plotted as a histogram.
        label: A string to be used as a label for the histogram.
    """
    hist, bins = np.histogram(dataset, bins=n_classes)
    width = 0.7 * (bins[1] - bins[0])
    center = (bins[:-1] + bins[1:]) / 2
    plt.bar(center, hist, align='center', width=width)
    plt.xlabel(label)
    plt.ylabel("Image count")
    plt.show()

# Plotting histograms of the count of each sign


histogram_plot(y_train, "Training examples")
histogram_plot(y_test, "Testing examples")
histogram_plot(y_valid, "Validation examples")

X_train, y_train = shuffle(X_train, y_train)

def gray_scale(image):
    """
    Convert images to gray scale.
    Parameters:
        image: An np.array compatible with plt.imshow.
    """
    return cv2.cvtColor(image, cv2.COLOR_RGB2GRAY)

# Sample images after greyscaling


gray_images = list(map(gray_scale, X_train))
list_images(gray_images, y_train, "Gray Scale image", "gray")

def local_histo_equalize(image):
    """
    Apply local histogram equalization to grayscale images.
    Parameters:
        image: A grayscale image.
    """
    kernel = morp.disk(30)
    img_local = rank.equalize(image, selem=kernel)
    return img_local

# Sample images after Local Histogram Equalization


equalized_images = list(map(local_histo_equalize, gray_images))
list_images(equalized_images, y_train, "Equalized Image", "gray")
def image_normalize(image):
    """
    Normalize images to [0, 1] scale.
    Parameters:
        image: An np.array compatible with plt.imshow.
    """
    image = np.divide(image, 255)
    return image

# Sample images after normalization


n_training = X_train.shape
normalized_images = np.zeros((n_training[0], n_training[1], n_training[2]))
for i, img in enumerate(equalized_images):
    normalized_images[i] = image_normalize(img)
list_images(normalized_images, y_train, "Normalized Image", "gray")
normalized_images = normalized_images[..., None]  # add a channel dimension

def preprocess(data):
    """
    Applying the preprocessing steps to the input data.
    Parameters:
        data: An np.array compatible with plt.imshow.
    """
    gray_images = list(map(gray_scale, data))
    equalized_images = list(map(local_histo_equalize, gray_images))
    n_training = data.shape
    normalized_images = np.zeros((n_training[0], n_training[1], n_training[2]))
    for i, img in enumerate(equalized_images):
        normalized_images[i] = image_normalize(img)
    normalized_images = normalized_images[..., None]
    return normalized_images

class LaNet:

def __init__(self, n_out=43, mu=0, sigma=0.1, learning_rate=0.001):


# Hyperparameters
self.mu = mu
self.sigma = sigma

# Layer 1 (Convolutional): Input = 32x32x1. Output = 28x28x6.


self.filter1_width = 5
self.filter1_height = 5
self.input1_channels = 1
self.conv1_output = 6
# Weight and bias
self.conv1_weight = tf.Variable(tf.truncated_normal(
shape=(self.filter1_width, self.filter1_height, self.input1_channels,
self.conv1_output),
mean = self.mu, stddev = self.sigma))
self.conv1_bias = tf.Variable(tf.zeros(self.conv1_output))
# Apply Convolution
self.conv1 = tf.nn.conv2d(x, self.conv1_weight, strides=[1, 1, 1, 1], padding='VALID')
+ self.conv1_bias

# Activation:
self.conv1 = tf.nn.relu(self.conv1)

26
TRAFFIC SIGN RECOGNITION USING MACHINE LEARNING PROJECT REPORT

# Pooling: Input = 28x28x6. Output = 14x14x6.


self.conv1 = tf.nn.max_pool(self.conv1, ksize=[1, 2, 2, 1], strides=[1, 2, 2, 1],
padding='VALID')

# Layer 2 (Convolutional): Output = 10x10x16.


self.filter2_width = 5
self.filter2_height = 5
self.input2_channels = 6
self.conv2_output = 16
# Weight and bias
self.conv2_weight = tf.Variable(tf.truncated_normal(
shape=(self.filter2_width, self.filter2_height, self.input2_channels,
self.conv2_output),
mean = self.mu, stddev = self.sigma))
self.conv2_bias = tf.Variable(tf.zeros(self.conv2_output))
# Apply Convolution
self.conv2 = tf.nn.conv2d(self.conv1, self.conv2_weight, strides=[1, 1, 1, 1],
padding='VALID') + self.conv2_bias

# Activation:
self.conv2 = tf.nn.relu(self.conv2)

# Pooling: Input = 10x10x16. Output = 5x5x16.


self.conv2 = tf.nn.max_pool(self.conv2, ksize=[1, 2, 2, 1], strides=[1, 2, 2, 1],
padding='VALID')

        # Flattening: Input = 5x5x16. Output = 400.
        self.fully_connected0 = flatten(self.conv2)

        # Layer 3 (Fully Connected): Input = 400. Output = 120.
        self.connected1_weights = tf.Variable(tf.truncated_normal(
            shape=(400, 120), mean=self.mu, stddev=self.sigma))
        self.connected1_bias = tf.Variable(tf.zeros(120))
        self.fully_connected1 = tf.add(
            tf.matmul(self.fully_connected0, self.connected1_weights),
            self.connected1_bias)

        # Activation
        self.fully_connected1 = tf.nn.relu(self.fully_connected1)

        # Layer 4 (Fully Connected): Input = 120. Output = 84.
        self.connected2_weights = tf.Variable(tf.truncated_normal(
            shape=(120, 84), mean=self.mu, stddev=self.sigma))
        self.connected2_bias = tf.Variable(tf.zeros(84))
        self.fully_connected2 = tf.add(
            tf.matmul(self.fully_connected1, self.connected2_weights),
            self.connected2_bias)

        # Activation
        self.fully_connected2 = tf.nn.relu(self.fully_connected2)

        # Layer 5 (Fully Connected): Input = 84. Output = 43.
        self.output_weights = tf.Variable(tf.truncated_normal(
            shape=(84, 43), mean=self.mu, stddev=self.sigma))
        self.output_bias = tf.Variable(tf.zeros(43))
        self.logits = tf.add(
            tf.matmul(self.fully_connected2, self.output_weights),
            self.output_bias)

        # Training operation (labels and logits must be passed as
        # keyword arguments in TensorFlow 1.x)
        self.one_hot_y = tf.one_hot(y, n_out)
        self.cross_entropy = tf.nn.softmax_cross_entropy_with_logits(
            labels=self.one_hot_y, logits=self.logits)
        self.loss_operation = tf.reduce_mean(self.cross_entropy)
        self.optimizer = tf.train.AdamOptimizer(learning_rate=learning_rate)
        self.training_operation = self.optimizer.minimize(self.loss_operation)

        # Accuracy operation
        self.correct_prediction = tf.equal(tf.argmax(self.logits, 1),
                                           tf.argmax(self.one_hot_y, 1))
        self.accuracy_operation = tf.reduce_mean(
            tf.cast(self.correct_prediction, tf.float32))

        # Saving all variables
        self.saver = tf.train.Saver()

    def y_predict(self, X_data, BATCH_SIZE=64):
        """Run the network over X_data in batches and return the
        predicted class IDs."""
        num_examples = len(X_data)
        y_pred = np.zeros(num_examples, dtype=np.int32)
        sess = tf.get_default_session()
        for offset in range(0, num_examples, BATCH_SIZE):
            batch_x = X_data[offset:offset+BATCH_SIZE]
            y_pred[offset:offset+BATCH_SIZE] = sess.run(
                tf.argmax(self.logits, 1),
                feed_dict={x: batch_x, keep_prob: 1, keep_prob_conv: 1})
        return y_pred

    def evaluate(self, X_data, y_data, BATCH_SIZE=64):
        """Compute the average accuracy over (X_data, y_data) in batches."""
        num_examples = len(X_data)
        total_accuracy = 0
        sess = tf.get_default_session()
        for offset in range(0, num_examples, BATCH_SIZE):
            batch_x = X_data[offset:offset+BATCH_SIZE]
            batch_y = y_data[offset:offset+BATCH_SIZE]
            accuracy = sess.run(
                self.accuracy_operation,
                feed_dict={x: batch_x, y: batch_y,
                           keep_prob: 1.0, keep_prob_conv: 1.0})
            total_accuracy += (accuracy * len(batch_x))
        return total_accuracy / num_examples
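
For reference, the tensor shapes through the network, collected from the layer comments above:

# Shape trace through LaNet (per the layer comments above):
#   input        : (batch, 32, 32, 1)
#   conv1 + pool : (batch, 28, 28, 6) -> (batch, 14, 14, 6)
#   conv2 + pool : (batch, 10, 10, 16) -> (batch, 5, 5, 16)
#   flatten      : (batch, 400)
#   fc1 -> fc2   : (batch, 120) -> (batch, 84)
#   logits       : (batch, 43)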

x = tf.placeholder(tf.float32, (None, 32, 32, 1))
y = tf.placeholder(tf.int32, (None))
# Keep probabilities are fed throughout (1.0 at inference); note that
# this LaNet variant does not actually apply dropout layers.
keep_prob = tf.placeholder(tf.float32)       # For fully-connected layers
keep_prob_conv = tf.placeholder(tf.float32)  # For convolutional layers


# Validation set preprocessing


X_valid_preprocessed = preprocess(X_valid)

EPOCHS = 30
BATCH_SIZE = 64
DIR = 'Saved_Models'

LeNet_Model = LaNet(n_out = n_classes)


model_name = "LeNet"

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    num_examples = len(y_train)
    print("Training ...")
    print()
    for i in range(EPOCHS):
        normalized_images, y_train = shuffle(normalized_images, y_train)
        for offset in range(0, num_examples, BATCH_SIZE):
            end = offset + BATCH_SIZE
            batch_x, batch_y = normalized_images[offset:end], y_train[offset:end]
            sess.run(LeNet_Model.training_operation,
                     feed_dict={x: batch_x, y: batch_y,
                                keep_prob: 0.5, keep_prob_conv: 0.7})

        validation_accuracy = LeNet_Model.evaluate(X_valid_preprocessed, y_valid)
        print("EPOCH {} : Validation Accuracy = {:.3f}%".format(
            i+1, validation_accuracy*100))
    LeNet_Model.saver.save(sess, os.path.join(DIR, model_name))
    print("Model saved")


CHAPTER 6
RESULTS

We use the testing set to measure the accuracy of the model over unknown examples.

# Test set preprocessing


X_test_preprocessed = preprocess(X_test)

with tf.Session() as sess:
    LeNet_Model.saver.restore(sess, os.path.join(DIR, "LeNet"))
    y_pred = LeNet_Model.y_predict(X_test_preprocessed)
    test_accuracy = sum(y_test == y_pred) / len(y_test)
    print("Test Accuracy = {:.1f}%".format(test_accuracy * 100))

During training, the following per-epoch validation accuracies were observed:

EPOCH 1 : Validation Accuracy = 81.451%


EPOCH 2 : Validation Accuracy = 87.755%
EPOCH 3 : Validation Accuracy = 90.113%
EPOCH 4 : Validation Accuracy = 91.519%
EPOCH 5 : Validation Accuracy = 90.658%
EPOCH 6 : Validation Accuracy = 92.608%
EPOCH 7 : Validation Accuracy = 92.902%
EPOCH 8 : Validation Accuracy = 92.585%
EPOCH 9 : Validation Accuracy = 92.993%
EPOCH 10 : Validation Accuracy = 92.766%
EPOCH 11 : Validation Accuracy = 93.356%
EPOCH 12 : Validation Accuracy = 93.469%
EPOCH 13 : Validation Accuracy = 93.832%
EPOCH 14 : Validation Accuracy = 94.603%
EPOCH 15 : Validation Accuracy = 93.333%
EPOCH 16 : Validation Accuracy = 93.787%
EPOCH 17 : Validation Accuracy = 94.263%
EPOCH 18 : Validation Accuracy = 92.857%


EPOCH 19 : Validation Accuracy = 93.832%


EPOCH 20 : Validation Accuracy = 93.605%
EPOCH 21 : Validation Accuracy = 93.447%
EPOCH 22 : Validation Accuracy = 94.286%
EPOCH 23 : Validation Accuracy = 94.671%
EPOCH 24 : Validation Accuracy = 94.172%
EPOCH 25 : Validation Accuracy = 94.399%
EPOCH 26 : Validation Accuracy = 95.057%
EPOCH 27 : Validation Accuracy = 95.329%
EPOCH 28 : Validation Accuracy = 94.218%
EPOCH 29 : Validation Accuracy = 94.286%
EPOCH 30 : Validation Accuracy = 94.853%

Model saved
As we can see, the LeNet model reaches a maximum validation accuracy of 95.3%
over 30 epochs, using a learning rate of 0.001.
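
Since the learning rate is a constructor argument of the model class, other values can be tried without modifying the network definition; a sketch only, with a hypothetical value chosen for comparison:

# Sketch: the same architecture with a smaller learning rate
# (0.0005 is an arbitrary value for experimentation).
LeNet_slow = LaNet(n_out=n_classes, learning_rate=0.0005)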

# Loading and resizing new test images


new_test_images = []
path = './traffic-signs-data/new_test_images/'
for image in os.listdir(path):
    img = cv2.imread(path + image)
    img = cv2.resize(img, (32, 32))
    img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)  # OpenCV loads BGR; plt expects RGB
    new_test_images.append(img)
new_IDs = [13, 3, 14, 27, 17]
print("Number of new testing examples: ", len(new_test_images))

Number of new testing examples: 5

plt.figure(figsize=(15, 16))
for i in range(len(new_test_images)):
    plt.subplot(3, 5, i+1)
    plt.imshow(new_test_images[i])
    plt.xlabel(signs[new_IDs[i]])
    plt.ylabel("New testing image")
    plt.xticks([])
    plt.yticks([])
plt.tight_layout(pad=0, h_pad=0, w_pad=0)
plt.show()

# New test data preprocessing


new_test_images_preprocessed = preprocess(np.asarray(new_test_images))

def y_predict_model(Input_data, top_k=5):
    """
    Generates the predictions of the model over the input data, and
    outputs the top softmax probabilities.
    Parameters:
        Input_data: Input data.
        top_k (Default = 5): The number of top softmax probabilities
        to be generated.
    """
    num_examples = len(Input_data)
    y_pred = np.zeros((num_examples, top_k), dtype=np.int32)
    y_prob = np.zeros((num_examples, top_k))
    with tf.Session() as sess:
        LeNet_Model.saver.restore(sess, os.path.join(DIR, "LeNet"))
        y_prob, y_pred = sess.run(
            tf.nn.top_k(tf.nn.softmax(LeNet_Model.logits), k=top_k),
            feed_dict={x: Input_data, keep_prob: 1, keep_prob_conv: 1})
    return y_prob, y_pred

y_prob, y_pred = y_predict_model(new_test_images_preprocessed)

test_accuracy = 0
for i, _ in enumerate(new_test_images_preprocessed):
    # Count a hit when the top-1 prediction matches the true class ID.
    if new_IDs[i] == y_pred[i][0]:
        test_accuracy += 1 / len(new_test_images_preprocessed)
print("New Images Test Accuracy = {:.1f}%".format(test_accuracy*100))

plt.figure(figsize=(15, 16))
for i in range(len(new_test_images_preprocessed)):
    plt.subplot(5, 2, 2*i+1)
    plt.imshow(new_test_images[i])
    plt.title(signs[y_pred[i][0]])
    plt.axis('off')
    plt.subplot(5, 2, 2*i+2)
    plt.barh(np.arange(1, 6, 1), y_prob[i, :])
    labels = [signs[j] for j in y_pred[i]]
    plt.yticks(np.arange(1, 6, 1), labels)
plt.show()

[Figure: Each of the five new test images shown beside a horizontal bar chart
of its top-five softmax probabilities. The highest-probability prediction for
the five images is Yield, Speed limit (60 km/h), Stop, Pedestrians, and
No entry, respectively.]

CHAPTER 7
CONCLUSION

In this work I explored what deep learning is, assembled and trained a CNN
model to classify photographs of traffic signs, and measured how the accuracy
depends on the number of epochs in order to detect potential overfitting.

In this process of traffic sign recognition, the first step is feature
extraction, followed by image classification over a variety of traffic signs
using a CNN classifier. This project thus demonstrates the fundamental ideas
of the CNN algorithm needed to accomplish image classification for traffic
sign recognition.

My next step would be to try this model on more datasets and to apply it to
practical tasks. I would also like to experiment with the neural network
design to see how higher efficiency can be achieved on various problems. We
expect to arrive at a better recognition system for traffic signs, one that
allows convolutional networks to be trained with very few labeled samples.

Using LeNet, we've been able to reach a very high accuracy rate. We can
observe that the model saturates after roughly 10 epochs, so we could save
computational resources by reducing the number of epochs to 10, or by stopping
training automatically once validation accuracy plateaus (a sketch of this
follows below). We can also try other preprocessing techniques to further
improve the model's accuracy. We could further improve the model using
hierarchical CNNs: a first network to identify broader groups (like speed
signs), followed by networks that classify finer features (such as the actual
speed limit). This model only works on input examples where the traffic sign
is centered in the image; it cannot detect signs in the image corners.
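
As an illustration of the early-stopping idea mentioned above, a small helper could monitor the per-epoch validation accuracies; this is a sketch only, not part of the trained model, and the patience and threshold values are arbitrary:

def should_stop(val_history, patience=5, min_delta=0.001):
    """Stop when validation accuracy has not improved by at least
    min_delta during the last `patience` epochs."""
    if len(val_history) <= patience:
        return False
    best_before = max(val_history[:-patience])
    best_recent = max(val_history[-patience:])
    return best_recent < best_before + min_delta

# Usage inside the training loop, after computing validation_accuracy:
#     history.append(validation_accuracy)
#     if should_stop(history):
#         break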

