
CMPEN/EE 454 Project 1

Image Classification using a Convolutional Neural Net


Matthew McTaggart & Peter Rancourt

Overview
This project served as an introduction to image classification using a convolutional neural net (CNN). The network was an 18-layer cascade composed of basic image processing operations: normalization, convolution, rectified linear unit (ReLU), maxpool, fullconnect, and softmax. The motivation of the project was to show that complex image processing tasks can in fact be performed by applying many elementary operations many times. Image normalization scaled the input image values to the range of -0.5 to 0.5. Convolution was performed using the pretrained filter data given with the project. The rectified linear unit thresholded the images such that negative values were set to 0. Maxpool downsampled the image by a factor of two, selecting the maximum image intensity value in every 2x2 pixel block. Fullconnect mapped all the processed information to one value per possible image class, and softmax converted these values into probabilities. The class with the highest probability is taken as the predicted class of the image.

The image set used for this project was cifar10testdata.mat, which contains 10,000 images with their corresponding image classes, numbered 1 to 10. In order, the classes are airplane (1), automobile (2), bird (3), cat (4), deer (5), dog (6), frog (7), horse (8), ship (9), and truck (10). Each image is 32x32x3: 32x32 pixels with red, green, and blue intensity channels.

This project passed every image in cifar10testdata into our 18-layer CNN and compared the predicted class to the true class. It assessed the quality of the CNN by measuring the accuracy of the predicted classes, and by examining how the network responded to user-selected lamp, truck, and bird images from outside the cifar10testdata image set. The CNN's accuracy in correctly guessing the class of a cifar10testdata image is 43.71%. When the correct class only needs to appear among the top-3 most probable classes, the accuracy is 78.64%. Of the additional images we tested, the CNN correctly guessed the truck image but incorrectly guessed the bird image as an airplane. The lamp image was used to evaluate the output when the CNN had no pretrained data for the object in question.

Outline of Procedural Approaches


The structure of our code consists of two subroutines, main() and convnn(input). The CNN process begins in main, where we start with a single test image to verify that our CNN code works correctly. We define a variable n to be 490, representing image number 490 in the test images. We then use n to retrieve the corresponding image data from imageset and store it in the variable ap. From there we pass ap as the input to our second subroutine, convnn. Image 490 was chosen because it is the image used in the provided debuggingTest.mat variable; we used the difference between our results and debuggingTest.mat to verify correctness.
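
A minimal sketch of this test harness (the 32x32x3x10000 layout of imageset is an assumption about the provided file):

load('cifar10testdata.mat');        % provides imageset and the true class labels
n = 490;                            % image used by debuggingTest.mat
ap = imageset(:, :, :, n);          % one 32x32x3 image
probabilities = convnn(ap);         % 1x10 vector of class probabilities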
The convnn function computes the individual stages of the CNN and ultimately outputs a list of probabilities corresponding to the likelihood of the image belonging to each category. We start with layer one, where we normalize the input data's RGB pixel values to the range of -0.5 to 0.5. This is done using the equation input / 255.0 - 0.5, where input is a 32x32x3 array. The result is stored in the variable layer1.
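
In MATLAB this layer is a single vectorized statement (a minimal sketch; the double conversion is needed because the raw pixels are stored as uint8):

layer1 = double(input) / 255.0 - 0.5;   % map 0..255 to the range -0.5..0.5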

layer1 is then used to compute layer2, where we convolve the image data from layer1 with the filter bank for this stage (filterbanks{2}). We used two nested for loops to traverse the filters and channels: the outer loop uses a variable i in the range 1 to 10 (the 10 output channels) and the inner loop uses a variable k in the range 1 to 3 (the RGB channels). We chose nested for loops for the convolution steps because it is a traversal method we are both familiar with, so we could both understand and write code that functions correctly. i is used to access the 10 channels of the 32x32x10 layer-2 output and the corresponding 3x3 filters in the filter bank; k is used to access the 3 arrays of the 32x32x3 input. The inner loop calculates the convolution for each channel and sums them, and the outer loop then adds the bias term to the calculated sum.
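
A minimal sketch of this loop, assuming the bias terms are provided in a cell array named biasvectors and that the filters are applied with MATLAB's conv2 using 'same' output size:

layer2 = zeros(32, 32, 10);
for i = 1:10                                   % the 10 output channels
    for k = 1:3                                % the RGB input channels
        layer2(:, :, i) = layer2(:, :, i) + ...
            conv2(layer1(:, :, k), filterbanks{2}(:, :, k, i), 'same');
    end
    layer2(:, :, i) = layer2(:, :, i) + biasvectors{2}(i);   % assumed bias name
end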

The result from layer 2 is then used in the ReLU phase (the activation function) that is layer 3, where we simply take the values calculated in layer 2 and set any negative values to zero. We do this using the max function, comparing each value within the 10 arrays against 0 and keeping the larger.
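
Because MATLAB's max compares elementwise when one argument is a scalar, this entire layer is a single statement:

layer3 = max(layer2, 0);   % elementwise ReLU: negative values become 0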

Layers 4 and 5 follow, and their layout is very similar to layers 2 and 3. In layer 6, however, we calculate the maxpool. Each of the 10 arrays from the previous layer is downscaled by a factor of 2 in each dimension every time maxpool is called. To calculate the downscaled arrays, we used three nested for loops. The outer loop uses a variable l to cycle through each of the 10 arrays of the input. The next loop cycles through every other row of the array, using the variable i with the range 1:2:(M-1), where M is the height of the array. The innermost loop cycles through every other column, using the variable j with the range 1:2:(N-1), where N is the width of the array. Inside the innermost loop, a 2x2 block is taken from the input array and its maximum is found with the max function. This process starts at the upper-left 2x2 block and works row by row, ending at the lower-right 2x2 block. Each iteration reduces a 2x2 block (4 values) to a single value; in layer 6, for instance, each 32x32 input array becomes a 16x16 output array for each of the 10 channels. We again chose nested for loops to calculate these values because the approach is easily understood and easily portable to the maxpool calculations done in other layers.
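
A minimal sketch of this maxpool using the loop variables described above (layer5 here is the assumed 32x32x10 input to layer 6):

[M, N, C] = size(layer5);
layer6 = zeros(M/2, N/2, C);
for l = 1:C                            % each of the 10 channels
    for i = 1:2:(M-1)                  % every other row
        for j = 1:2:(N-1)              % every other column
            block = layer5(i:i+1, j:j+1, l);            % one 2x2 block
            layer6((i+1)/2, (j+1)/2, l) = max(block(:));
        end
    end
end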

Layers 7-16 use the techniques defined for previous layers, but layer 17 is different: it implements fullconnect. We took a similar approach to previous layers, using two nested for loops. The outer loop uses the variable l, which takes values 1 to 10 and is used to access each of the output's 10 values as well as each of the 10 sets of filter arrays in filterbanks. The inner loop uses a variable k, which also takes values 1 to 10 and is used to access each of the 10 filter arrays within a set, as well as each of the 10 arrays of the input from layer 16. Inside the inner loop, we sum the elementwise product of the filter bank and the input array (in code: sum(sum(filterbanks{17}(:, :, k, l).*layer16(:, :, k)))). The outer loop then adds the corresponding bias term to the summed value.
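
Sketched in full, with the same assumed biasvectors name as before:

layer17 = zeros(1, 10);
for l = 1:10                                    % one output value per class
    for k = 1:10                                % the 10 input channels from layer 16
        layer17(l) = layer17(l) + ...
            sum(sum(filterbanks{17}(:, :, k, l) .* layer16(:, :, k)));
    end
    layer17(l) = layer17(l) + biasvectors{17}(l);    % assumed bias name
end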

The final layer then converts the 10 values from layer 17 into a probability for each of the 10 classes. It implements the equations

α = max_j z_j
p_i = exp(z_i − α) / Σ_j exp(z_j − α)

where z_1, ..., z_10 are the values calculated in layer 17. Setting α to the maximum of those values keeps the exponentials from overflowing. We used a for loop to go through each of the 10 values and calculate its probability, and the output of this layer is then returned to the main function.
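
A minimal sketch of this layer, assuming the ten layer-17 values are stored in a 1x10 vector layer17:

alpha = max(layer17);                 % largest value, subtracted for stability
expvals = exp(layer17 - alpha);
layer18 = expvals / sum(expvals);     % probabilities that sum to 1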

In the main function, we create a table for our confusion matrix. From there, we test each of the 10,000 images using a for loop. main calls convnn for each image, and the returned value is split into its probability and its predicted class. The predicted class is then compared to the actual class and recorded in the confusion matrix based on its true class index and predicted class index. Entries whose two indices match lie on the diagonal of the table (i.e. (1,1), (2,2), ..., (10,10)). Once all 10,000 images are classified, we find the accuracy by summing the correctly identified (diagonal) values and dividing by the total of 10,000.
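
A minimal sketch of this loop (trueclass is an assumed name for the label vector in cifar10testdata.mat):

confusion = zeros(10, 10);
for n = 1:10000
    probabilities = convnn(imageset(:, :, :, n));
    [~, predicted] = max(probabilities);              % most probable class
    actual = trueclass(n);                            % assumed label name
    confusion(actual, predicted) = confusion(actual, predicted) + 1;
end
accuracy = sum(diag(confusion)) / 10000;              % 0.4371 in our run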

The flowchart showing how these subroutines interact can be seen below:

For our project, we passed each image in the cifar10testdata.mat image set into our CNN and saved the results into the tableAccuracy.mat variable. This variable contains ten 10x10 confusion matrices, one for each of the top-k classifications (k = 1 to 10). The total run time of our CNN over the image set was 34.6 minutes. The submitted code loads the tableAccuracy.mat variable to calculate the accuracy without needing to rerun the full 34.6-minute classification.
Experimental Observations
Based upon our intermediate and final results for the test case, photo number 490, our code appears to be working as it should. The difference between our results and the test results at each layer, given by layerResults, was zero or very close to zero (on the order of 10^-14 to 10^-18) for each pixel. Some examples of the intermediate results are shown below, where the console output is our layer result (e.g. layer1, layer2, etc.) minus the expected layer result (e.g. layerResults{1}, layerResults{2}, etc.):

(Screenshots of the per-layer differences for layers 1 through 18 appear here.)

Additionally, we can verify that our code calculates the correct number of values by checking that the resultant matrices are of the correct size after each stage:

We additionally checked the images from several of the intermediate layers to further verify the functionality. These images are shown below:

(Intermediate-layer images: Layer 2 convolution, Layer 3 ReLU, Layer 4 convolution, Layer 5 ReLU, Layer 6 maxpool, Layer 7 convolution, Layer 8 ReLU, Layer 9 convolution, Layer 10 ReLU, Layer 11 maxpool.)

Each convolution yields an altered version of the image from the layer before it, highlighting the presence or absence of a specific feature. Additionally, maxpool creates an image that is ¼ the area (half the width and half the height) of the layer before it, as we expected.

The output of running image 490 does in fact match the output of the debugging test (see image below).

The results of the debugging test are shown in the following image. We can see that class 1, airplane, has the highest probability, which matches the command window output above.

Run Performance Evaluation


In our main function, we tested all 10,000 images in the convnn function and placed each result into a matrix based on its predicted class index and its true class index; this formed our confusion matrix, which can be seen below:

After adding the diagonal values and dividing by the total number of pictures, we determined the overall accuracy of the CNN to be 43.71%. This rate is lower than we anticipated, but it is a reasonable starting point for a first attempt at image classification software. More training images for each class could potentially raise the accuracy.

The following figure shows the accuracy curve for the top-k classes. As previously mentioned, the accuracy of the CNN is 43.71% when only the single most probable class is considered. The CNN correctly guesses the class within the top two probabilities with an accuracy of 65.91%. As expected, the CNN eventually reaches 100% accuracy when considering the top-10 probabilities, since there are only 10 classes.

Figure 1 - Top-k Classification Rates
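
For reference, a top-k hit could be counted as follows (a sketch reusing the probabilities and actual variables from the loop sketched earlier, with topk a 1x10 counter initialized to zeros before that loop):

[~, ranked] = sort(probabilities, 'descend');    % class indices, most probable first
for k = 1:10
    topk(k) = topk(k) + ismember(actual, ranked(1:k));
end

Dividing topk by 10,000 gives the rates plotted in Figure 1.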


Exploration
Some additional images from the web were used to assess the response of the CNN developed in this project. We used images of a bird, a truck, and a lamp. The lamp was chosen because we wanted to explore how the CNN would behave on an image whose class it was never trained to recognize. The following figure shows the native images found on the web.

Figure 2 - Additional images used to evaluate the effectiveness of the CNN.

Each image is 256x256 pixels with 3 color channels. For these images to be used in the CNN, they had to be downsampled to 32x32 pixels with 3 color channels. To do so, each downsampled image was created by convolving the original image with a Gaussian filter with a standard deviation of 2 and then selecting every 8th pixel. The Gaussian smoothing helps prevent high-frequency aliasing artifacts when downsampling. The standard deviation of 2 was chosen after some trial and error: in this project, all standard deviations below 4 produced consistent outputs, while standard deviations greater than 4 blurred the image too much and the classifications were no longer consistent as the standard deviation increased. A sketch of this preprocessing appears below.
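
A minimal sketch of this preprocessing, assuming MATLAB's Image Processing Toolbox (the 9x9 kernel size and the file name are illustrative choices, not from our code):

img = im2double(imread('truck.jpg'));      % 256x256x3 image from the web
G = fspecial('gaussian', [9 9], 2);        % Gaussian kernel, standard deviation 2
smoothed = zeros(size(img));
for c = 1:3                                % smooth each color channel separately
    smoothed(:, :, c) = imfilter(img(:, :, c), G, 'replicate');
end
small = uint8(255 * smoothed(1:8:end, 1:8:end, :));   % every 8th pixel -> 32x32x3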
Using the listed configuration from above, the results are as follows:
Finding the class for the bird you picked
The estimated class is airplane with probability 0.3042
Finding the class for the truck you picked
The estimated class is truck with probability 0.3055
Finding the class for the lamp you picked
The estimated class is bird with probability 0.3318

From these results, it appears that the bird is classified as an airplane, probably due to the presence of the sky in the bird image. Since much of the training data for the airplane class consists of regions of sky (or blue), input images containing a lot of blue may correlate highly with the airplane training data. The truck happens to correlate well with the truck training data and was classified correctly. The lamp, however, has no training data, so it is classified as whichever trained object it most resembles.

Using a standard deviation of 4, the results are as follows:
Finding the class for the bird you picked
The estimated class is airplane with probability 0.3721
Finding the class for the truck you picked
The estimated class is airplane with probability 0.3118
Finding the class for the lamp you picked
The estimated class is frog with probability 0.3168

Here we see that the classifications are no longer correct for all images. The truck image is now classified as an airplane, the bird remains classified as an airplane, and the lamp is now classified as a frog. With a standard deviation of 10, the other classifications remain the same, but the truck is instead classified as a ship.

For further testing, it is best to use an image in which the object of interest is as large as possible and is the only object present. To account for an unknown object in an input image, we could report an unknown class whenever many of the known objects' probabilities are too close to each other; if, for example, all classes approach a uniform distribution, the output could be "unable to detect class". Another enhancement could be a threshold on the winning probability: if no class reaches at least 0.25 probability, the classification could be declared too ambiguous. This is the scenario likely to play out when, as with the lamp, the input's true class does not exist within the training data. A sketch of such a rejection rule follows.
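
A minimal sketch of this rejection rule (classlabels is an assumed name for the class-name cell array in cifar10testdata.mat):

[p, predicted] = max(probabilities);        % winning class and its probability
if p < 0.25
    disp('Unable to detect class: classification is too ambiguous');
else
    fprintf('The estimated class is %s with probability %.4f\n', ...
        classlabels{predicted}, p);
end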
Documentation of Roles on Project

MATLAB Coding for CNN - Peter & Matthew
MATLAB Project Code Streamlining & Comments - Matthew
Overview - Matthew
Outline of Procedural Approaches - Peter
Experimental Observations - Peter
Run Performance Evaluation - Peter
Exploration - Matthew
