# k-Nearest Neighbor

DataRepresentation
LearningGoals

 Describehowtorepresentcomplexdata
 Viewdatagraphically
DataRepresentation
 Mostlearningalgorithmsrequiredatainsome
numericrepresentation
± e.g.eachinputpatternisavector
 Ifdatanaturallyhasnumeric(realͲvalued)
features,justrepresentasvectorof(real)
numbers
± e.g.28x28imageby784x1vectorofpixel
intensities
 IfdatahasnonͲnumericrepresentation…
± let’slookatsomeexamples
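
As a minimal sketch of the image example above, the snippet below flattens a 28x28 grid of pixel intensities into a single 784-element feature vector. The image contents and the helper name `flatten` are made up for illustration; only the shapes come from the slide.

```python
# A hypothetical 28x28 grayscale image stored as a list of rows.
# The pixel values are invented purely for illustration.
ROWS, COLS = 28, 28
image = [[(r * COLS + c) % 256 for c in range(COLS)] for r in range(ROWS)]

def flatten(img):
    """Concatenate the rows of a 2-D image into one flat feature vector."""
    return [pixel for row in img for pixel in row]

x = flatten(image)
print(len(x))  # 784
```

Row-major order is an arbitrary but common choice; any fixed ordering works as long as every image is flattened the same way.
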
DatatoFeatures
 Textdocument

DatatoFeatures
Let’sconsiderdatasetsimilartoTennisPlayingexample

 Featuresarecategorical(Low/High,Yes/No,Overcast/Rainy/Sunny,etc)
 Featureswithonly2possiblevalues
± canberepresentedas0/1
 Featureswithmorethan2possiblevalues
± canwemapSunny=0,Overcast=1,Rainy=2?

DatatoFeatures
 Well,wecouldmapSunny=0,Overcast=1,Rainy=2…
 Butsuchamappingmaynotalwaysbeappropriate
± imaginefeaturevaluesbeingred,blue,green
± red=0,blue=1,green=2impliesredmoresimilartoblue
thantogreen
 Solution:forfeaturewithK>2possiblevalues,create
binaryfeatures,oneforeachpossiblevalue
DataVisualization
Turnfeaturesintonumericalvalues
Let’svisualizethisdata (simplemappingusedheretokeepvisualizationsimple)
Weight Color Label Weight Color Label

## 4 Red Apple 4 0 Apple

1 B B B
A A
5 Yellow Apple 5 1 Apple

Color
6 Yellow Banana 6 1 Banana

## 7 Yellow Banana 7 1 Banana 0 A A

8 Yellow Banana 8 1 Banana

## 6 Yellow Apple 6 1 Apple 0 Weight 10

WecanviewexamplesaspointsinandͲdimensionalspace
whered isnumberoffeatures
kͲNearestNeighbor
LearningGoals

 DescribekNNalgorithm
 DescribeimpactofkinkNN
 DescribekNNvariants(optional)

ExamplesinaFeatureSpace

feature2 testexample
whatclass?

label1
label2
label3

feature1
Anotherclassificationalgorithm?
Toclassifyexamplex:
Labelx withlabelofclosestexampletox intrainingset
kͲNearestNeighbor(kͲNN)
Toclassifyexamplex:
± Findk nearest
nearestneighborsofx
neighborsofx
± Chooseaslabelthe
Chooseaslabelthemajoritylabel
majoritylabel withink nearest
neighbors

Howdowemeasure“nearest”?
commonapproach:standardEuclidean distancemetric
twoͲdimensional:
dist(a,b) = sqrt((a1 – b1)2 + (a2 – b2)2) (b1, b2,…, bn)
nͲdimensional:
dist(a,b) = sqrt(i(ai – bi)2) (a1, a2, …, an)
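
The k-NN rule with Euclidean distance can be sketched in a few lines. This is a minimal illustration rather than a reference implementation: the names `euclidean` and `knn_classify` are hypothetical, ties in the vote are broken arbitrarily, and the data is the fruit example from the visualization slide (Red = 0, Yellow = 1).

```python
import math
from collections import Counter

def euclidean(a, b):
    """Standard Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((ai - bi) ** 2 for ai, bi in zip(a, b)))

def knn_classify(x, training, k, dist=euclidean):
    """Label x with the majority label among its k nearest training examples.

    training is a list of (feature_vector, label) pairs. A tie in the vote
    goes to the label encountered first, which is an arbitrary choice.
    """
    neighbors = sorted(training, key=lambda ex: dist(x, ex[0]))[:k]
    votes = Counter(label for _, label in neighbors)
    return votes.most_common(1)[0][0]

# (weight, color) pairs with Red = 0 and Yellow = 1
fruit = [
    ((4, 0), "Apple"),
    ((5, 1), "Apple"),
    ((6, 1), "Banana"),
    ((7, 1), "Banana"),
    ((8, 1), "Banana"),
    ((6, 1), "Apple"),
]

print(euclidean((0, 0), (3, 4)))           # 5.0
print(knn_classify((4.2, 0), fruit, k=1))  # Apple
print(knn_classify((7.5, 1), fruit, k=3))  # Banana
```

Sorting all n training examples is the simplest way to find the k nearest; it is fine for small data sets but is not the fastest option.
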

OtherDistanceMeasures
 BinaryͲvaluedfeatures
± Hammingdistance dist(a,b) = i I(ai  bi)
countsnumberoffeatureswheretwoexamples
disagree

 Mixed featuretypes(somereal,somebinary)
± mixeddistancemeasures
± e.g.Euclideanforrealpart,Hammingforbinarypart

 Canalsoassignweights tofeatures
dist(a,b) = i wi·d(ai,bi)
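
The distance measures above might be written as follows. The function names are hypothetical, the default per-feature distance `d` is assumed to be the absolute difference, and adding the Euclidean and Hamming parts with equal weight is one simple way to mix them, chosen here only for illustration.

```python
import math

def hamming(a, b):
    """Number of positions where two equal-length vectors disagree."""
    return sum(1 for ai, bi in zip(a, b) if ai != bi)

def weighted_distance(a, b, weights, d=lambda u, v: abs(u - v)):
    """Sum of per-feature distances d(a_i, b_i), each scaled by w_i."""
    return sum(w * d(ai, bi) for w, ai, bi in zip(weights, a, b))

def mixed_distance(a_real, a_bin, b_real, b_bin):
    """Euclidean distance on the real part plus Hamming on the binary part.

    Adding the two parts with equal weight is an assumption for illustration.
    """
    real_part = math.sqrt(sum((u - v) ** 2 for u, v in zip(a_real, b_real)))
    return real_part + hamming(a_bin, b_bin)

print(hamming([1, 0, 1, 1], [1, 1, 0, 1]))                     # 2
print(weighted_distance([1, 2], [3, 5], weights=[0.5, 2]))     # 7.0
print(mixed_distance([0.0, 0.0], [1, 0], [3.0, 4.0], [1, 1]))  # 6.0
```
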

kͲNNDecisionBoundaries

label1
label2
label3

WherearedecisionboundariesforkͲNN?
kͲNNgiveslocallydefined decisionboundariesbetweenclasses
(formssubsetofVoronoi diagramfortrainingdata)

kͲNNDecisionBoundaries
 Canbechangedbydifferentdistancemetrics

## dist(a,b)=(a1 – b1)2+(a2 – b2)2 dist(a,b)=(a1 – b1)2+(3a2 – 3b2)2

 Becomemorecomplexasmoreexamplesarestored

kͲNearestNeighbor(kͲNN)
Toclassifyexamplex:
± Findk nearestneighborsofx
± Chooseaslabelthemajoritylabelwithink nearest
neighbors

Howdowechoosek?

,PSDFWRIk

Whatisroleofk?

Howdoesitrelatetooverfittingandunderfitting?

Howdidwecontrolthisfordecisiontrees?

kͲNearestNeighbor(kͲNN)
Toclassifyexamplex:
Toclassifyexampled:
± Findk nearestneighborsofd
nearestneighborsofx
± Chooseaslabelthe
Chooseaslabelthemajoritylabelwithink
majoritylabel withink nearest
neighbors

Howdowechoosek?
 OftendataͲdependentandheuristicͲbased
± commonheuristic:choose3,5,7(oddnumber)
 Usevalidationdata

kͲNearestNeighbor(kͲNN)
Toclassifyexamplex:
± Findk nearestneighborsofx
± Chooseaslabelthemajoritylabelwithink nearest
neighbors

Anyvariants?
 Fixeddistance
 Weighted
examplessothatcloserexampleshavemorevote/weight
(oftenusesomesortofexponentialdecay)


kNN Problems and ML Terminology
Learning Goals

- Describe how to speed up kNN
- Define non-parametric and parametric and describe the differences
- Describe the curse of dimensionality

SpeedingupkͲNN
 kͲNNisa“lazy”learningalgorithm
± doesvirtuallynothingattrainingtime
 Butclassification/predictioncanbecostly
whentrainingsetislarge
± forn trainingexamplesandd features,howmany
computationsrequiredforeachtestexample?
 Twostrategiesforalleviatingthisweakness
± editednearestneighbor:donotretainevery
traininginstance
± kͲdtree:usesmartdatastructuretolookupNN
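
For the cost question above, a brute-force search computes one distance per training example, and each distance touches all d features, so a test example needs on the order of n·d coordinate operations. The sketch below simply counts them; the random data and the function name are illustrative only.

```python
import random

def brute_force_nearest(x, training):
    """Return (label of nearest example, number of coordinate operations)."""
    ops = 0
    best_label, best_d2 = None, float("inf")
    for features, label in training:
        d2 = 0.0
        for xi, fi in zip(x, features):
            d2 += (xi - fi) ** 2
            ops += 1  # one squared coordinate difference
        if d2 < best_d2:
            best_label, best_d2 = label, d2
    return best_label, ops

rng = random.Random(0)  # fixed seed so the run is repeatable
n, d = 1000, 10
training = [([rng.random() for _ in range(d)], rng.choice("AB"))
            for _ in range(n)]
query = [rng.random() for _ in range(d)]

label, ops = brute_force_nearest(query, training)
print(ops)  # 10000
```

Both strategies on the slide attack this count: editing the training set shrinks n, and a k-d tree avoids visiting most of the n examples.
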
Aside:NonͲparametricvsParametric
 nonͲparametric method
± notbasedonparameterizedfamiliesofprobability
distributions ofvariablesbeingassessed
± complexitygrowswithamountoftrainingdata
± (notnoneͲparametric) bothDTandkNN
arenonͲparametric
 parametric method
dataͲgeneratingdistribution
± hasfixednumberofparameters

CurseofDimensionality
withdimensions!
 NNbreaksdowninhighdimensionalspaces
because“neighborhood”becomesverylarge
 Ex:Supposewehave5000pointsuniformly
distributedinunithypercubeandwanttoapply
5ͲNNtotestexampleatorigin
± onaverage,needtoexplore5/5000=0.001ofvolume
± 1D:mustgodistanceof0.001onaverage
± 2D:mustgosqrt(0.001)§ 0.0316togetsquarethat
contains0.001ofvolume
± nͲD:inn dimensions,mustgo(0.001)1/n >>0.001
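
The edge length $(0.001)^{1/n}$ from the example above can be tabulated directly. The dimensions printed below are arbitrary choices for illustration.

```python
def edge_length(volume_fraction, n):
    """Side of an n-dimensional cube containing the given fraction of the unit hypercube."""
    return volume_fraction ** (1.0 / n)

fraction = 5 / 5000  # 5 neighbors out of 5000 uniformly distributed points

for n in (1, 2, 3, 10, 100):
    print(n, round(edge_length(fraction, n), 4))
```

By 10 dimensions the cube already spans about half of each axis, and by 100 dimensions it spans over 90% of each axis, so the 5 “nearest” neighbors are no longer local.
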

Summary:kͲNN
Whentoconsider
 examplesmaptopointsind
 smallnumber(<20)attributesperinstance
 lotsoftrainingdata