Overview
This project served as an introduction to image classification using a convolutional neural
network (CNN). The CNN was an 18-layer cascaded network composed of basic image
processing operations: normalization, convolution, rectified linear unit (ReLU), maxpool,
full-connect, and softmax. The motivation of the project was to show that complex image
processing tasks can in fact be performed by applying many elementary operations many
times. Image normalization scaled the input image values into the range -0.5 to 0.5.
Convolution was performed using pretrained filter data supplied with the project. The
rectified linear unit thresholded the images so that negative values were set to 0. Maxpool
downsampled the image by a factor of two by selecting the maximum intensity value in
every 2x2 pixel block. Full-connect assigned all the processed information a value for each
possible image class, and softmax converted these values into probabilities. The class with
the highest probability is taken as the predicted class of the image.
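The project code itself is MATLAB; as an illustration, the normalization step can be sketched in Python/NumPy (the function name and the assumption of 8-bit input values are ours, not from the project):

```python
import numpy as np

def normalize(image):
    """Scale 8-bit pixel values into the range [-0.5, 0.5], as the
    normalization layer does (divisor of 255 assumed for uint8 input)."""
    return image.astype(float) / 255.0 - 0.5
```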
The image set used for this project was cifar10testdata.mat, which contains 10,000 images
with their corresponding image class from 1 to 10. In order, the classes are airplane (1),
automobile (2), bird (3), cat (4), deer (5), dog (6), frog (7), horse (8), ship (9), and truck (10).
Each image is 32x32x3: 32x32 pixels with red, green, and blue intensity channels.
This project passed all images in cifar10testdata into our 18-layer CNN and compared the
predicted class to the true class. It assessed the quality of the CNN by measuring the
accuracy of the predicted classes and by observing how the network responded to
user-defined lamp, truck, and bird images outside of the cifar10testdata imageset. The
CNN's accuracy is 43.71% for correctly guessing the class from the cifar10testdata
imageset. For guessing the correct class within the top-3 most probable classes, the CNN's
accuracy is 78.64%. For the additional images we decided to test the CNN on, it was able to
guess the truck image correctly, but it incorrectly guessed the bird image as an airplane. The
lamp image was used to evaluate the output when the CNN did not have any pretrained data
for lamp objects.
layer1 is then used to compute layer2, where we convolve the image data from layer1 with
the filter banks for this stage (filterbanks{2}). We used two nested for loops to go through
the filter banks and layers: the outer loop uses a variable i in the range 1 to 10 (one per
output array) and the inner loop uses a variable k in the range 1 to 3 (the RGB channels).
We chose nested for loops for the convolution steps because it is a traversal method we are
both familiar with, so we could both understand and write code that functions correctly. i is
used to access the 10 arrays of the 32x32x10 layer-2 output and the 3x3x10 filter bank
arrays. k is used to access the 3 arrays of the 32x32x3 input. The inner for loop calculates
the convolution for each channel and sums them, and the outer for loop then adds the bias
value to the calculated sum.
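The i/k loop structure above can be sketched in Python/NumPy (our MATLAB code uses conv2-style filtering; this sketch uses zero-padded correlation for brevity, and all function names and shapes are illustrative assumptions):

```python
import numpy as np

def conv2_same(x, k):
    """2-D correlation with zero padding; output is the same size as x.
    (Correlation and convolution coincide for symmetric kernels.)"""
    kh, kw = k.shape
    xp = np.pad(x, ((kh // 2, kh // 2), (kw // 2, kw // 2)))
    return np.array([[np.sum(xp[r:r + kh, c:c + kw] * k)
                      for c in range(x.shape[1])]
                     for r in range(x.shape[0])])

def conv_layer(image, filters, biases):
    """Layer-2-style convolution: the outer loop (i) builds one output map
    per filter, the inner loop (k) accumulates contributions from each
    input channel, and the bias is added after the channel sum."""
    H, W, C = image.shape
    D = filters.shape[3]
    out = np.zeros((H, W, D))
    for i in range(D):
        for k in range(C):
            out[:, :, i] += conv2_same(image[:, :, k], filters[:, :, k, i])
        out[:, :, i] += biases[i]
    return out
```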
The result from layer 2 is then used in the ReLU phase (activation function) that is layer 3,
where we simply take the values calculated in layer 2 and set any negative values to zero.
We do this using the max function, where each value within the 10 arrays is compared with
0 to see which is larger.
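The same elementwise comparison can be sketched in Python/NumPy (function name is ours):

```python
import numpy as np

def relu(x):
    """Elementwise max with zero: negative activations are clamped to 0,
    mirroring the max-function approach used in layer 3."""
    return np.maximum(x, 0)
```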
Layers 4 and 5 follow, and their layout is very similar to layers 2 and 3. In layer 6, however,
we calculate the maxpool. In this project, each of the 10 arrays from the previous layer is
downscaled by a factor of 2 every time maxpool is called. To calculate the downscaled
arrays, we used three nested for loops. The outer for loop uses a variable l which cycles
through each of the 10 arrays from the input. The next for loop cycles through every other
row in the array, using the variable i with values 1:2:(M-1), where M is the height of the
array. The innermost for loop cycles through every other column in the array using the
variable j with values 1:2:(N-1), where N is the width of the array. Inside the innermost for
loop, a 2x2 block is taken from the input array and its maximum is calculated using the max
function. This process starts at the upper left 2x2 block and works row by row, ending at the
lower right 2x2 block. Each iteration of this loop reads four values and outputs a single
maximum; in layer 6, for instance, each of the 10 arrays is reduced from 32x32 to 16x16.
We again chose nested for loops to calculate these values because the approach is easily
understood and easily portable to the maxpool calculations done in other layers.
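The three-loop maxpool can be sketched in Python/NumPy (zero-based indices replace MATLAB's 1:2:(M-1) stepping; the function name is ours):

```python
import numpy as np

def maxpool(layer):
    """Downsample each HxWxD stack by 2 using non-overlapping 2x2 blocks:
    l cycles over the D arrays, i over every other row, j over every other
    column, taking the max of each 2x2 block."""
    H, W, D = layer.shape
    out = np.zeros((H // 2, W // 2, D))
    for l in range(D):
        for i in range(0, H - 1, 2):
            for j in range(0, W - 1, 2):
                out[i // 2, j // 2, l] = layer[i:i + 2, j:j + 2, l].max()
    return out
```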
Layers 7-16 use the techniques defined for previous layers, but layer 17 is different as it
implements fullconnect. We took a similar approach to previous layers by using two nested
for loops. The outer for loop uses a variable l which takes values from 1 to 10 and is used to
access each of the output's 10 values as well as each of the 10 sets of filter arrays in
filterbanks. The inner for loop uses a variable k which also takes values 1 to 10 and is used
to access each of the 10 filter arrays within a set, as well as each of the 10 input arrays from
layer 16. Inside the inner for loop, we multiply each filter elementwise with the corresponding
input array and sum the result (in code: sum(sum(filterbanks{17}(:, :, k,
l).*layer16(:, :, k)))). The outer for loop then adds the corresponding bias value to the
summed total.
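The layer-17 full-connect loops can be sketched in Python/NumPy (shapes and names are illustrative; in the project, filters has shape 8x8x10x10 after the earlier maxpools, but any consistent shapes work the same way):

```python
import numpy as np

def fullconnect(x, filters, biases):
    """Full-connect as in layer 17: for each of D output classes (l), sum
    the elementwise products of the input maps (k) with that class's
    filter stack, then add the bias."""
    D = filters.shape[3]
    out = np.zeros(D)
    for l in range(D):
        for k in range(filters.shape[2]):
            out[l] += np.sum(filters[:, :, k, l] * x[:, :, k])
        out[l] += biases[l]
    return out
```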
The final layer then converts the 10 values from layer 17 into a probability for each class. To
emulate the softmax equation

    p_i = exp(x_i - α) / Σ_j exp(x_j - α),  where α = max_j x_j,

we set α to the maximum of the values calculated in layer 17 (for numerical stability). From
there, we used a for loop to go through each of the 10 values and calculate their probability.
The output from this layer is then returned to the main function.
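The stable softmax described above looks like this in Python/NumPy (vectorized here rather than looped; the function name is ours):

```python
import numpy as np

def softmax(x):
    """Stable softmax: subtract the maximum (alpha) before exponentiating
    so that exp never overflows, then normalize to probabilities."""
    e = np.exp(x - np.max(x))
    return e / e.sum()
```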
In the main function, we create a table for our confusion matrix. From there, we test each of
the 10,000 images using a for loop. main calls convnn for each image, and the returned
value is split into its probability and its predicted class. The predicted class is then compared
to the actual class and recorded in the confusion matrix based on the actual class index and
predicted class index. Images whose two indices match correspond to a diagonal value in
the table (i.e. (1,1), (2,2), ..., (10,10)). Once each of the 10,000 images is classified, we find
the accuracy by summing the correctly identified (diagonal) values and dividing by the total
of 10,000.
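The accuracy calculation from the confusion matrix reduces to a trace divided by the total count, sketched here in Python/NumPy (function name is ours):

```python
import numpy as np

def accuracy(confusion):
    """Fraction correct: sum of the diagonal (correctly classified) counts
    divided by the total number of classified images."""
    return np.trace(confusion) / confusion.sum()
```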
The flowchart showing how these subroutines interact can be seen below:
For our project, we passed each image in the imageset cifar10testdata.mat into our CNN
and saved the results into the tableAccuracy.mat variable. This variable contains ten 10x10
confusion matrices, which correspond to the top-k classifications. The total run time of our
CNN over the imageset was 34.6 minutes. The submitted code loads the tableAccuracy.mat
variable to calculate the accuracy without needing to rerun the network for another 34.6
minutes.
Experimental Observations
Based upon our intermediate and final results for the test case, photo number 490, our code
seems to be working as it should. The difference between our results and the test results at
each layer, given by layerResults, was zero or very close to zero (on the order of 10^-14 to
10^-18) for each pixel. Some examples of the intermediate results are shown below, where
the console output is our layer result (e.g. layer1, layer2, etc.) minus the test/expected layer
result (e.g. layerResult{1}, layerResult{2}, etc.):
(Console output differences for Layers 1 through 8.)
We additionally checked the images from several of the intermediate layers to further verify
the functionality. These images are shown below:
Each convolution yields an altered version of the image from the layer before it, highlighting
a specific feature or lack thereof. Additionally, maxpool creates an image that is ¼ the size of
the layer before it as we expected.
The output of running image 490 does in fact match the output of the debugging test (see
image below).
The results of the debugging test are shown in the following image. We can see that class 1,
or airplane, has the highest probability, which matches the command window output above.
The following figure shows the accuracy curve for the top-k classes. As previously
mentioned, the accuracy of the CNN is 43.71% when guessing the correct class from the
highest probability alone. The CNN correctly guesses the class within the top two
probabilities with an accuracy of 65.91%. As expected, the CNN reaches 100% accuracy
when considering the top-10 probabilities, since there are only 10 classes.
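A single point on the top-k curve checks whether the true class appears among the k most probable classes; a sketch in Python/NumPy (function name and zero-based class indices are ours):

```python
import numpy as np

def topk_correct(probs, true_class, k):
    """True if true_class is among the k highest-probability classes;
    averaging this over all images gives one point on the top-k curve."""
    topk = np.argsort(probs)[::-1][:k]  # class indices, best first
    return true_class in topk
```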
Figure 2 - Additional images to evaluate the effectiveness of the CNN.
Each image is 256x256 pixels with 3 color channels. For these images to be used in the
CNN, they were downsampled to 32x32 pixels with 3 color channels. To do so, the
downsampled image was created by selecting every 8th pixel of the original image after
convolving it with a Gaussian filter with a standard deviation of 2. The Gaussian smoothing
helps prevent high-frequency (aliasing) artifacts when downsampling. The standard
deviation of 2 was chosen after some trial and error. In this project, all standard deviations
lower than 4 gave consistent outputs; standard deviations greater than 4 blurred the image
too much, and classifications became inconsistent as the standard deviation increased
beyond 4.
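The blur-then-subsample step can be sketched in Python, using SciPy's gaussian_filter in place of our MATLAB filtering (function name and the per-channel loop are ours):

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def shrink(image, sigma=2, step=8):
    """Blur each color channel with a Gaussian (sigma=2 as chosen in the
    report), then keep every 8th pixel: 256x256x3 -> 32x32x3."""
    out = np.empty((image.shape[0] // step, image.shape[1] // step, 3))
    for c in range(3):
        out[:, :, c] = gaussian_filter(image[:, :, c], sigma)[::step, ::step]
    return out
```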
Using the configuration listed above, the results are as follows.
Finding the class for the bird you picked
The estimated class is airplane with probability 0.3042
Finding the class for the truck you picked
The estimated class is truck with probability 0.3055
Finding the class for the lamp you picked
The estimated class is bird with probability 0.3318
From these results, it appears that the bird is classified as an airplane, probably due to the
presence of the sky in the bird image. As most of the training data for the airplane class
consists of regions of sky (or blue), input images containing a lot of blue may correlate
highly with the airplane training data. The truck correlates well with the truck training data
and is classified correctly. The lamp, however, has no training data, so it is classified as
whichever object it has the most similarity to in the training data.
Using a standard deviation of 4, the results are as follows.
Finding the class for the bird you picked
The estimated class is airplane with probability 0.3721
Finding the class for the truck you picked
The estimated class is airplane with probability 0.3118
Finding the class for the lamp you picked
The estimated class is frog with probability 0.3168
Here we see that the classifications are not correct for all images. The truck image is now
classified as an airplane, the bird remains classified as an airplane, and the lamp is now
classified as a frog. With a standard deviation of 10, the others remain the same, but the
truck is now classified as a ship.
For further testing, it is best to use an image that contains the object of interest as large as
possible, with only that object in the image. To account for an unknown object in an input
image, we could report an unknown class when many of the known objects' probabilities are
too close to each other. If, for example, the class probabilities approach a uniform
distribution, the output could be "unable to detect class". Another enhancement could be
adding a threshold for declaring what an object is: if no class reaches at least 0.25
probability, the classification could be reported as too ambiguous. If an unknown image is
supplied, as with the lamp, this would catch the case where the object's class does not exist
within the training data.
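The proposed rejection threshold could be sketched in Python/NumPy (this is a hypothetical enhancement, not part of the submitted code; function name, return values, and the 0.25 default are ours):

```python
import numpy as np

def classify_with_reject(probs, threshold=0.25):
    """If no class reaches the threshold, declare the result too
    ambiguous; otherwise return the most probable class index."""
    if probs.max() < threshold:
        return "unable to detect class"
    return int(np.argmax(probs))
```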
Documentation of Roles on Project
MATLAB Coding for CNN: Peter & Matthew
Overview: Matthew
Exploration: Matthew