Learning Algorithms

SC549 Artificial Neural Networks
Semester II 2014/2015

- Supervised and unsupervised learning
- Hebb rule (Hebbian learning)
- Perceptron and its learning algorithm
- ADALINE and its learning algorithm (Delta rule)
Topic 03: Learning in Neural Networks

Training/Learning in Neural Networks

- Training/learning a neural network model essentially means selecting one model, from the set of allowed models, that minimizes a cost function
- Finding the decision boundary by adjusting the weights
- Finding the weight matrix that gives us the correct classification

Types of Learning Algorithms

- Supervised learning: infers a function from labeled training data
- Unsupervised learning: tries to find hidden structure in unlabeled data
Supervised Learning

- The ANN is trained repeatedly by a teacher: each input presented to the network has an associated desired output that is also presented.
- In each learning cycle, the error between the actual and the desired output is used to adjust the weights.
- When the error reaches an acceptable level, learning stops.

Unsupervised Learning

- A teacher is not involved; the network uses only the inputs.
- The inputs are automatically clustered based on some closeness or similarity criteria.
- Meaning is assigned to these clusters depending on the data.
- Networks trained this way are sometimes called self-organizing networks.
Machine Learning

[Diagram: a taxonomy of machine learning; the recoverable labels are listed below.]

- Unsupervised learning: given inputs x, model p(x)
  - K-means, GMM (EM), PCA, manifold learning
  - Density estimation
- Supervised learning: given training data (x, t), learn f: x -> t, then generalise to unseen testing data
  - Classification, t = {1, ..., n}: ANN, SVM
  - Regression, t a continuous variable: polynomial curve fitting, Gaussian processes
Machine Learning Problems: Generalization

- How well does a learned model generalize from the data it was trained on to a new test set?

Slide credit: L. Lazebnik
Generalization

Components of generalization error:
- Bias: how much does the average model, over all training sets, differ from the true model? This is error due to inaccurate assumptions/simplifications made by the model.
- Variance: how much do models estimated from different training sets differ from each other? This is error from sensitivity to small fluctuations in the training set; the variance is how much the predictions for a given point vary between different realizations of the model.

Slide credit: L. Lazebnik
Generalization: Bias-Variance Tradeoff

- Underfitting: the model is too simple to represent all the relevant class characteristics
  - High bias and low variance
  - High training error and high test error
  - Models with too few parameters are inaccurate because of a large bias (not enough flexibility).
- Overfitting: the model is too complex and fits irrelevant characteristics (noise) in the data
  - Low bias and high variance
  - Low training error and high test error
  - Models with too many parameters are inaccurate because of a large variance (too much sensitivity to the sample).

Slide credit: L. Lazebnik
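The tradeoff can be made concrete with a few lines of code. The sketch below is an illustration, not from the slides: the sine target, noise level, polynomial degrees, and sample sizes are arbitrary choices. It fits polynomials of increasing degree to a small noisy training set and compares training and test error:

```python
import numpy as np

rng = np.random.default_rng(0)

def noisy_sine(n):
    """Noisy samples of sin(2*pi*x) on [0, 1]."""
    x = rng.uniform(0.0, 1.0, n)
    return x, np.sin(2 * np.pi * x) + rng.normal(0.0, 0.2, n)

x_train, t_train = noisy_sine(10)
x_test, t_test = noisy_sine(200)

for degree in (1, 3, 9):
    coef = np.polyfit(x_train, t_train, degree)  # least-squares polynomial fit
    train_mse = np.mean((np.polyval(coef, x_train) - t_train) ** 2)
    test_mse = np.mean((np.polyval(coef, x_test) - t_test) ** 2)
    print(f"degree {degree}: train MSE {train_mse:.3f}, test MSE {test_mse:.3f}")

# Typical outcome: degree 1 underfits (high bias: high train AND test error);
# degree 9 interpolates the 10 training points (near-zero train error but a
# much larger test error, i.e. high variance); degree 3 balances the two.
```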
Bias-Variance Tradeoff

[Figure: the trade-off between a model's ability to minimize bias and variance. Test error is the sum of error due to incorrect assumptions (bias), which dominates when underfitting, and error due to the variance of the training samples, which dominates when overfitting.]

Slide credit: D. Hoiem
The Perfect Classification Algorithm

- Dimensionality reduction can decrease variance by simplifying models.
- Feature selection can decrease variance by simplifying models.
- A larger training set tends to decrease variance.
- Adding features (predictors) tends to decrease bias, at the expense of introducing additional variance.
- Learning algorithms typically have some tunable parameters that control bias and variance.
- One way of resolving the tradeoff is to use mixture models and ensemble learning (boosting, bagging).

Slide credit: D. Hoiem

Simple Neural Network Learning Algorithms

- Hebbian learning (Hebb rule)
- Perceptron learning
- Delta learning
Hebb Nets and Hebbian Learning

Hebb, in his influential book The Organization of Behavior (1949), claimed:
- Behavior changes are primarily due to changes of the synaptic strengths (w_ij) between neurons i and j.
- w_ij increases only when both i and j (the two connected neurons) are on: the Hebbian learning law (algorithm).

In an ANN, the Hebbian law can be stated: the weight w_i increases only if the outputs of both units, x_i and y, have the same sign. The weights are increased as follows:

    w_i(new) = w_i(old) + x_i * y
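As a minimal sketch of the rule (an illustration, not from the slides), one Hebbian update can be written directly in code, with names following the slide's notation:

```python
import numpy as np

def hebb_update(w, b, x, y):
    """One Hebb step: w_i(new) = w_i(old) + x_i * y; the bias sees a fixed input of 1."""
    return w + x * y, b + y

# One update with inputs x = (1, 1) and output y = 1
w, b = hebb_update(np.zeros(2), 0.0, np.array([1.0, 1.0]), 1.0)
print(w, b)  # [1. 1.] 1.0
```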
Hebbian Learning Algorithm: Hebb Net Example, AND Function

Example: AND function with binary units (1, 0) and a bias unit fixed at 1:

  (x1, x2, 1)    y = t
  (1, 1, 1)      1
  (1, 0, 1)      0
  (0, 1, 1)      0
  (0, 0, 1)      0

Weight trace after presenting each sample (samples with y = 0 produce no change):

  sample         w1   w2   b
  (1, 1, 1)      1    1    1
  (1, 0, 1)      1    1    1
  (0, 1, 1)      1    1    1
  (0, 0, 1)      1    1    1

An incorrect boundary, 1 + x1 + x2 = 0, is learned after using each sample once.
Hebb Net Example: AND Function

Bipolar units (-1, 1):

  (x1, x2, b)     y = t
  ( 1,  1, 1)      1
  ( 1, -1, 1)     -1
  (-1,  1, 1)     -1
  (-1, -1, 1)     -1

Weight trace after presenting each sample:

  sample           w1   w2   b
  ( 1,  1, 1)      1    1    1
  ( 1, -1, 1)      0    2    0
  (-1,  1, 1)      1    1   -1
  (-1, -1, 1)      2    2   -2

After the first sample alone, the boundary 1 + x1 + x2 = 0 is learned; this is not the correct boundary. After all four samples, a correct boundary, -1 + x1 + x2 = 0 (from -2 + 2*x1 + 2*x2 = 0), is successfully learned.
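The bipolar weight trace above can be reproduced with a short script (an illustration; the zero initial weights are assumed, as in the table):

```python
import numpy as np

# Bipolar AND training pairs: inputs (x1, x2) and target t
samples = [((1, 1), 1), ((1, -1), -1), ((-1, 1), -1), ((-1, -1), -1)]

w = np.zeros(2)  # w1, w2
b = 0.0          # bias weight (its input is fixed at 1)
for (x1, x2), t in samples:
    w += np.array([x1, x2], dtype=float) * t  # Hebb rule with y = t
    b += t
    print(f"after ({x1:2d}, {x2:2d}): w1={w[0]:+.0f}, w2={w[1]:+.0f}, b={b:+.0f}")

# Final weights (2, 2, -2) give the boundary -2 + 2*x1 + 2*x2 = 0,
# i.e. -1 + x1 + x2 = 0, which classifies AND correctly.
```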
With bipolar units, a correct boundary, -1 + x1 + x2 = 0, is successfully learned.

Stronger learning methods are needed:
- Error driven: for each sample s:t, compute y from s based on the current W and b, then compare y and t.
- Use the training samples repeatedly, and each time only change the weights slightly.
- The learning methods of the Perceptron and ADALINE are good examples.
Perceptron

- Introduced by Rosenblatt (1962).
- Rosenblatt's perceptron is built around a nonlinear neuron, the McCulloch-Pitts model of a neuron.
- Basically, it consists of a single neuron with adjustable synaptic weights and bias.
- The perceptron learning rule is more powerful than the Hebb learning rule.
- If there exists a weight matrix that gives us the correct classification, the perceptron iterative learning procedure can be shown to converge to the correct weights.
- Perceptrons can differentiate patterns only if they are linearly separable.
Perceptron Learning Algorithm

- The output of the perceptron, y = F(s), is computed using the activation function.
- For a given training sample s:t, change the weights only if the computed output y is different from the target output t (thus error driven).
- The following formula is used to update the weights when an error is found:

      w_i(new) = w_i(old) + η t x_i    or    w_i(new) = w_i(old) + η (t - y) x_i

  where t = {1, 0, -1} is the target value and η is the learning rate.
Notes:
- Learning occurs only when a sample has y != t.
- The given algorithm is suitable for either binary or bipolar input vectors, bipolar targets, a fixed threshold, and an adjustable bias.
- There are two loops; a completion of the inner loop (each sample is used once) is called an epoch.
- η is called the learning rate.
Stop condition:
- when no weight is changed in the current epoch, or
- when a predetermined number of epochs is reached.

Perceptron Example: AND Function

Binary inputs and bipolar targets:

  (x1, x2, b)    t
  (1, 1, 1)      1
  (1, 0, 1)     -1
  (0, 1, 1)     -1
  (0, 0, 1)     -1
[Figure slides: the training iterations are illustrated graphically. The classification is correct for the first input, and the decision boundary is adjusted whenever a sample is misclassified, until an epoch passes with no weight change.]
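Since the original illustrations are lost, here is a sketch of the training loop they depicted. It assumes the update w_i(new) = w_i(old) + η t x_i on error, η = 1, and a bipolar step activation with threshold 0; the undecided band around the threshold mentioned in the notes is omitted for simplicity:

```python
import numpy as np

def step(s):
    # Bipolar step activation with threshold 0
    return 1 if s > 0 else -1

# Binary inputs with a bias unit, bipolar targets (AND function)
X = np.array([[1, 1, 1], [1, 0, 1], [0, 1, 1], [0, 0, 1]], dtype=float)
T = np.array([1, -1, -1, -1])

w = np.zeros(3)  # [w1, w2, b]
eta = 1.0
for epoch in range(20):
    changed = False
    for x, t in zip(X, T):
        y = step(w @ x)
        if y != t:               # error-driven: update only on mistakes
            w += eta * t * x     # w_i(new) = w_i(old) + eta * t * x_i
            changed = True
    if not changed:              # stop: no weight changed in this epoch
        print(f"converged after epoch {epoch + 1}: w1, w2, b = {w}")
        break
```

With these choices the loop converges after a handful of epochs to a separating boundary for AND.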
Perceptron Learning Algorithm: Another Version (from Haykin's book)

This is the most commonly used version.

Variables and parameters:
- x(n) = (m+1)-by-1 input vector = [+1, x_1(n), x_2(n), ..., x_m(n)]
- w(n) = (m+1)-by-1 weight vector = [b, w_1(n), w_2(n), ..., w_m(n)]
- b = bias
- y(n) = actual response (quantized)
- d(n) = desired response (target)
- η = learning-rate parameter, a positive constant less than unity
1. Initialization. Set w(0) = 0. Then perform the following computations for time steps n = 1, 2, ....
2. Activation. At time step n, activate the perceptron by applying the continuous-valued input vector x(n) and desired response d(n).
3. Computation of Actual Response. Compute the actual response of the perceptron as y(n) = sgn[w^T(n) x(n)], where sgn(·) is the signum function.
4. Adaptation of Weight Vector. Update the weight vector of the perceptron to obtain

       w(n+1) = w(n) + η [d(n) - y(n)] x(n)

5. Continuation. Increment time step n by one and go back to step 2.

Using the previous notation, the weight update is

    w_i := w_i + η x_i (t - y),   i = 1, ..., n
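A compact rendering of Haykin's five steps (a sketch; the toy data and η = 0.1 are illustrative choices, not from the slides):

```python
import numpy as np

def sgn(v):
    return 1 if v >= 0 else -1  # signum; the value at exactly 0 is a convention

def train_perceptron(X, d, eta=0.1, steps=200):
    """x(n) carries a leading +1 and w(n) carries the bias b, as in Haykin's notation."""
    X = np.hstack([np.ones((len(X), 1)), X])  # prepend the fixed +1 input
    w = np.zeros(X.shape[1])                  # step 1: initialization, w(0) = 0
    for n in range(steps):                    # steps 2 and 5: present samples cyclically
        x, t = X[n % len(X)], d[n % len(X)]
        y = sgn(w @ x)                        # step 3: y(n) = sgn[w^T(n) x(n)]
        w = w + eta * (t - y) * x             # step 4: w(n+1) = w(n) + η[d(n)-y(n)]x(n)
    return w

# Illustrative linearly separable data with bipolar targets
X = np.array([[2.0, 1.0], [1.5, 2.0], [-1.0, -1.5], [-2.0, -0.5]])
d = np.array([1, 1, -1, -1])
print(train_perceptron(X, d))  # learned [b, w1, w2]
```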
The Adaptive Linear Neuron (ADALINE)

- ADALINE (Adaptive Linear Neuron, or later Adaptive Linear Element) is an early single-layer artificial neural network.
- The Adaptive Linear Neuron was introduced by Widrow and Hoff (1960).
- It has the same architecture as our simple network, with adjustable weights.
- It typically uses bipolar activations for its input signals and target output.
- The ADALINE is trained using the Delta Rule (also known as the LMS or Widrow-Hoff rule).
Delta (LMS) Learning Rule for ADALINE

- The delta rule adjusts the weights to reduce the difference between the actual and the desired outputs.
- This results in the smallest mean square error.
- This improvement ensures that the training gives you a more generalized system.

The learning algorithm is the same as Perceptron learning, except in Step 5:

    b(new) = b(old) + η (t - y_in)
    w_i(new) = w_i(old) + η (t - y_in) x_i

where y_in = b + Σ_i x_i w_i is the net input to the output unit.

It is derived in the following manner. The mean squared error for a particular training pattern is given by

    E = (t - y_in)^2
Delta (LMS) Learning Rule for a Single Output Unit

- The delta rule changes the weights of the neural connections so as to minimize the difference between the net input to the output unit, y_in, and the target value t.
- The aim is to minimize the error over all training patterns; however, this is accomplished by minimizing the error for each pattern, one at a time.
- E = (t - y_in)^2 is a function of all the weights w_i, i = 1, ..., n.
- The gradient ∂E/∂w_i gives the direction of most rapid increase in E, hence the error can be decreased by adjusting the weight w_i in the direction of -∂E/∂w_i. Since ∂E/∂w_i = -2 (t - y_in) x_i, the delta rule for adjusting the i-th weight for each pattern is

      Δw_i = η (t - y_in) x_i

  where η is the learning rate.
Delta (LMS) Learning Rule for Multiple Output Units

- The mean square error for a particular training pattern is E = Σ_J (t_J - y_in,J)^2.
- In the multiple-output-neuron case we again consider each weight separately for the error reduction.
- The local error will reduce most rapidly, for a given learning rate, by adjusting the weight from the I-th input to the J-th output unit according to

      Δw_IJ = η (t_J - y_in,J) x_I

ADALINE Delta Learning Algorithm

Step 0. Initialize the weights. Set the learning rate η.
Step 1. While the stopping condition is false, do Steps 2-6.
Step 2. For each bipolar training pair s:t, do Steps 3-5.
Step 3. Set the activations of the input units, i = 1, ..., n: x_i = s_i.
Step 4. Compute the net input to the output unit: y_in = b + Σ_i x_i w_i.
Step 5. Update the bias and weights, i = 1, ..., n:
          b(new) = b(old) + η (t - y_in)
          w_i(new) = w_i(old) + η (t - y_in) x_i
Step 6. Test for the stopping condition: if the largest weight change that occurred in Step 2 is smaller than a specified tolerance, then stop; otherwise continue.
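The algorithm transcribes almost line for line into code (a sketch; the bipolar AND data, η = 0.1, and the tolerance are illustrative choices):

```python
import numpy as np

def train_adaline(X, T, eta=0.1, tol=1e-4, max_epochs=100):
    w = np.zeros(X.shape[1])                   # Step 0: initialize weights
    b = 0.0
    for _ in range(max_epochs):                # Step 1: loop until stopping condition
        largest_change = 0.0
        for x, t in zip(X, T):                 # Steps 2-3: present each training pair
            y_in = b + x @ w                   # Step 4: net input to the output unit
            delta = eta * (t - y_in)
            b += delta                         # Step 5: update bias and weights
            w += delta * x
            largest_change = max(largest_change, abs(delta),
                                 np.max(np.abs(delta * x)))
        if largest_change < tol:               # Step 6: largest change below tolerance
            break
    return w, b

# Bipolar AND training pairs
X = np.array([[1, 1], [1, -1], [-1, 1], [-1, -1]], dtype=float)
T = np.array([1, -1, -1, -1], dtype=float)
print(train_adaline(X, T))  # hovers near the minimum-MSE solution w = (0.5, 0.5), b = -0.5
```

Note that with a fixed learning rate the per-pattern weight changes do not shrink to zero on this data (the residual error is nonzero), so here the loop ends at max_epochs, oscillating near the least-squares solution.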
ADALINE Delta Learning Algorithm

- The learning rate η has to be chosen appropriately: a small value will make the learning process extremely slow, while a large value will result in the learning process not converging.
- The initial weights are set to small random values.

Perceptron vs. Delta Rule

- The difference between ADALINE and the standard perceptron is that in the learning phase the ADALINE's weights are adjusted according to the weighted sum of the inputs (the net input); in the standard perceptron, the net input is passed to the activation (transfer) function and the function's output is used for adjusting the weights.

Perceptron training rule:
- uses a thresholded unit
- converges after a finite number of iterations
- the output hypothesis classifies the training data perfectly
- linear separability is necessary

Delta rule:
- uses an unthresholded linear unit
- converges asymptotically toward a minimum-error hypothesis
- termination is not guaranteed
- linear separability is not necessary