Learning Algorithms

SC549 Artificial Neural Networks
Semester II 2014/2015

- Supervised and unsupervised learning
- Hebb rule (Hebbian learning)
- Perceptron and its learning algorithm
- ADALINE and its learning algorithm (Delta rule)
Topic 03: Learning in Neural Networks

Training/Learning in Neural Networks

- Training/learning a neural network model essentially means selecting one model, from the set of allowed models, that minimizes a cost function
- Finding the decision boundary by adjusting the weights
- Finding the weight matrix that gives us the correct classification

Types of Learning Algorithms

- Supervised learning: infers a function from labeled training data
- Unsupervised learning: tries to find hidden structure in unlabeled data
Supervised Learning

- The ANN is trained repeatedly by a teacher: each input presented to the network has an associated desired output that is also presented.
- In each learning cycle, the error between the actual and the desired output is used to adjust the weights.
- When the error reaches an acceptable level, learning stops.

Unsupervised Learning

- A teacher is not involved; the network uses only the inputs.
- The inputs are automatically clustered based on some closeness or similarity criteria.
- Meaning is assigned to these clusters depending on the data.
- Networks trained this way are sometimes called self-organizing networks.
Machine Learning

[Diagram: a taxonomy of machine learning; the recoverable labels are listed below.]

- Unsupervised learning: given inputs x, model p(x)
  - K-means, GMM (EM), PCA, manifold learning
  - Density estimation
- Supervised learning: given training data (x, t), learn f: x -> t, then generalise to unseen testing data
  - Classification, t = {1, ..., n}: ANN, SVM
  - Regression, t a continuous variable: polynomial curve fitting, Gaussian processes
Machine Learning Problems: Generalization

- How well does a learned model generalize from the data it was trained on to a new test set?

Slide credit: L. Lazebnik
Generalization

Components of generalization error:
- Bias: how much does the average model, over all training sets, differ from the true model? This is error due to inaccurate assumptions/simplifications made by the model.
- Variance: how much do models estimated from different training sets differ from each other? This is error from sensitivity to small fluctuations in the training set; the variance is how much the predictions for a given point vary between different realizations of the model.

Slide credit: L. Lazebnik
Generalization: Bias-Variance Tradeoff

- Underfitting: the model is too simple to represent all the relevant class characteristics
  - High bias and low variance
  - High training error and high test error
  - Models with too few parameters are inaccurate because of a large bias (not enough flexibility).
- Overfitting: the model is too complex and fits irrelevant characteristics (noise) in the data
  - Low bias and high variance
  - Low training error and high test error
  - Models with too many parameters are inaccurate because of a large variance (too much sensitivity to the sample).

Slide credit: L. Lazebnik
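The tradeoff can be made concrete with a few lines of code. The sketch below is an illustration, not from the slides: the sine target, noise level, polynomial degrees, and sample sizes are arbitrary choices. It fits polynomials of increasing degree to a small noisy training set and compares training and test error:

```python
import numpy as np

rng = np.random.default_rng(0)

def noisy_sine(n):
    """Noisy samples of sin(2*pi*x) on [0, 1]."""
    x = rng.uniform(0.0, 1.0, n)
    return x, np.sin(2 * np.pi * x) + rng.normal(0.0, 0.2, n)

x_train, t_train = noisy_sine(10)
x_test, t_test = noisy_sine(200)

for degree in (1, 3, 9):
    coef = np.polyfit(x_train, t_train, degree)  # least-squares polynomial fit
    train_mse = np.mean((np.polyval(coef, x_train) - t_train) ** 2)
    test_mse = np.mean((np.polyval(coef, x_test) - t_test) ** 2)
    print(f"degree {degree}: train MSE {train_mse:.3f}, test MSE {test_mse:.3f}")

# Typical outcome: degree 1 underfits (high bias: high train AND test error);
# degree 9 interpolates the 10 training points (near-zero train error but a
# much larger test error, i.e. high variance); degree 3 balances the two.
```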
Bias-Variance Tradeoff

[Figure: the trade-off between a model's ability to minimize bias and variance. Test error is the sum of error due to incorrect assumptions (bias), which dominates when underfitting, and error due to the variance of the training samples, which dominates when overfitting.]

Slide credit: D. Hoiem
The Perfect Classification Algorithm

- Dimensionality reduction can decrease variance by simplifying models.
- Feature selection can decrease variance by simplifying models.
- A larger training set tends to decrease variance.
- Adding features (predictors) tends to decrease bias, at the expense of introducing additional variance.
- Learning algorithms typically have some tunable parameters that control bias and variance.
- One way of resolving the tradeoff is to use mixture models and ensemble learning (boosting, bagging).

Slide credit: D. Hoiem

Simple Neural Network Learning Algorithms

- Hebbian learning (Hebb rule)
- Perceptron learning
- Delta learning
Hebb Nets and Hebbian Learning

Hebb, in his influential book The Organization of Behavior (1949), claimed:
- Behavior changes are primarily due to changes of the synaptic strengths (w_ij) between neurons i and j.
- w_ij increases only when both i and j (the two connected neurons) are on: the Hebbian learning law (algorithm).

In an ANN, the Hebbian law can be stated: the weight w_i increases only if the outputs of both units, x_i and y, have the same sign. The weights are increased as follows:

    w_i(new) = w_i(old) + x_i * y
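As a minimal sketch of the rule (an illustration, not from the slides), one Hebbian update can be written directly in code, with names following the slide's notation:

```python
import numpy as np

def hebb_update(w, b, x, y):
    """One Hebb step: w_i(new) = w_i(old) + x_i * y; the bias sees a fixed input of 1."""
    return w + x * y, b + y

# One update with inputs x = (1, 1) and output y = 1
w, b = hebb_update(np.zeros(2), 0.0, np.array([1.0, 1.0]), 1.0)
print(w, b)  # [1. 1.] 1.0
```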
Hebbian Learning Algorithm: Hebb Net Example, AND Function

Example: AND function with binary units (1, 0) and a bias unit fixed at 1:

  (x1, x2, 1)    y = t
  (1, 1, 1)      1
  (1, 0, 1)      0
  (0, 1, 1)      0
  (0, 0, 1)      0

Weight trace after presenting each sample (samples with y = 0 produce no change):

  sample         w1   w2   b
  (1, 1, 1)      1    1    1
  (1, 0, 1)      1    1    1
  (0, 1, 1)      1    1    1
  (0, 0, 1)      1    1    1

An incorrect boundary, 1 + x1 + x2 = 0, is learned after using each sample once.
Hebb Net Example: AND Function

Bipolar units (-1, 1):

  (x1, x2, b)     y = t
  ( 1,  1, 1)      1
  ( 1, -1, 1)     -1
  (-1,  1, 1)     -1
  (-1, -1, 1)     -1

Weight trace after presenting each sample:

  sample           w1   w2   b
  ( 1,  1, 1)      1    1    1
  ( 1, -1, 1)      0    2    0
  (-1,  1, 1)      1    1   -1
  (-1, -1, 1)      2    2   -2

After the first sample alone, the boundary 1 + x1 + x2 = 0 is learned; this is not the correct boundary. After all four samples, a correct boundary, -1 + x1 + x2 = 0 (from -2 + 2*x1 + 2*x2 = 0), is successfully learned.
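The bipolar weight trace above can be reproduced with a short script (an illustration; the zero initial weights are assumed, as in the table):

```python
import numpy as np

# Bipolar AND training pairs: inputs (x1, x2) and target t
samples = [((1, 1), 1), ((1, -1), -1), ((-1, 1), -1), ((-1, -1), -1)]

w = np.zeros(2)  # w1, w2
b = 0.0          # bias weight (its input is fixed at 1)
for (x1, x2), t in samples:
    w += np.array([x1, x2], dtype=float) * t  # Hebb rule with y = t
    b += t
    print(f"after ({x1:2d}, {x2:2d}): w1={w[0]:+.0f}, w2={w[1]:+.0f}, b={b:+.0f}")

# Final weights (2, 2, -2) give the boundary -2 + 2*x1 + 2*x2 = 0,
# i.e. -1 + x1 + x2 = 0, which classifies AND correctly.
```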
With bipolar units, a correct boundary, -1 + x1 + x2 = 0, is successfully learned.

Stronger learning methods are needed:
- Error driven: for each sample s:t, compute y from s based on the current W and b, then compare y and t.
- Use the training samples repeatedly, and each time only change the weights slightly.
- The learning methods of the Perceptron and ADALINE are good examples.
Perceptron

- Introduced by Rosenblatt (1962).
- Rosenblatt's perceptron is built around a nonlinear neuron, the McCulloch-Pitts model of a neuron.
- Basically, it consists of a single neuron with adjustable synaptic weights and bias.
- The perceptron learning rule is more powerful than the Hebb learning rule.
- If there exists a weight matrix that gives us the correct classification, the perceptron iterative learning procedure can be shown to converge to the correct weights.
- Perceptrons can differentiate patterns only if they are linearly separable.
Perceptron Learning Algorithm

- The output of the perceptron, y = F(s), is computed using the activation function.
- For a given training sample s:t, change the weights only if the computed output y is different from the target output t (thus error driven).
- The following formula is used to update the weights when an error is found:

      w_i(new) = w_i(old) + η t x_i    or    w_i(new) = w_i(old) + η (t - y) x_i

  where t = {1, 0, -1} is the target value and η is the learning rate.
Notes:
- Learning occurs only when a sample has y != t.
- The given algorithm is suitable for either binary or bipolar input vectors, bipolar targets, a fixed threshold, and an adjustable bias.
- There are two loops; a completion of the inner loop (each sample is used once) is called an epoch.
- η is called the learning rate.
Stop condition:
- when no weight is changed in the current epoch, or
- when a predetermined number of epochs is reached.

Perceptron Example: AND Function

Binary inputs and bipolar targets:

  (x1, x2, b)    t
  (1, 1, 1)      1
  (1, 0, 1)     -1
  (0, 1, 1)     -1
  (0, 0, 1)     -1
[Figure slides: the training iterations are illustrated graphically. The classification is correct for the first input, and the decision boundary is adjusted whenever a sample is misclassified, until an epoch passes with no weight change.]
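Since the original illustrations are lost, here is a sketch of the training loop they depicted. It assumes the update w_i(new) = w_i(old) + η t x_i on error, η = 1, and a bipolar step activation with threshold 0; the undecided band around the threshold mentioned in the notes is omitted for simplicity:

```python
import numpy as np

def step(s):
    # Bipolar step activation with threshold 0
    return 1 if s > 0 else -1

# Binary inputs with a bias unit, bipolar targets (AND function)
X = np.array([[1, 1, 1], [1, 0, 1], [0, 1, 1], [0, 0, 1]], dtype=float)
T = np.array([1, -1, -1, -1])

w = np.zeros(3)  # [w1, w2, b]
eta = 1.0
for epoch in range(20):
    changed = False
    for x, t in zip(X, T):
        y = step(w @ x)
        if y != t:               # error-driven: update only on mistakes
            w += eta * t * x     # w_i(new) = w_i(old) + eta * t * x_i
            changed = True
    if not changed:              # stop: no weight changed in this epoch
        print(f"converged after epoch {epoch + 1}: w1, w2, b = {w}")
        break
```

With these choices the loop converges after a handful of epochs to a separating boundary for AND.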
Perceptron Learning Algorithm: Another Version (from Haykin's book)

This is the most commonly used version.

Variables and parameters:
- x(n) = (m+1)-by-1 input vector = [+1, x_1(n), x_2(n), ..., x_m(n)]
- w(n) = (m+1)-by-1 weight vector = [b, w_1(n), w_2(n), ..., w_m(n)]
- b = bias
- y(n) = actual response (quantized)
- d(n) = desired response (target)
- η = learning-rate parameter, a positive constant less than unity
1. Initialization. Set w(0) = 0. Then perform the following computations for time steps n = 1, 2, ....
2. Activation. At time step n, activate the perceptron by applying the continuous-valued input vector x(n) and desired response d(n).
3. Computation of Actual Response. Compute the actual response of the perceptron as y(n) = sgn[w^T(n) x(n)], where sgn(·) is the signum function.
4. Adaptation of Weight Vector. Update the weight vector of the perceptron to obtain

       w(n+1) = w(n) + η [d(n) - y(n)] x(n)

5. Continuation. Increment time step n by one and go back to step 2.

Using the previous notation, the weight update is

    w_i := w_i + η x_i (t - y),   i = 1, ..., n
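A compact rendering of Haykin's five steps (a sketch; the toy data and η = 0.1 are illustrative choices, not from the slides):

```python
import numpy as np

def sgn(v):
    return 1 if v >= 0 else -1  # signum; the value at exactly 0 is a convention

def train_perceptron(X, d, eta=0.1, steps=200):
    """x(n) carries a leading +1 and w(n) carries the bias b, as in Haykin's notation."""
    X = np.hstack([np.ones((len(X), 1)), X])  # prepend the fixed +1 input
    w = np.zeros(X.shape[1])                  # step 1: initialization, w(0) = 0
    for n in range(steps):                    # steps 2 and 5: present samples cyclically
        x, t = X[n % len(X)], d[n % len(X)]
        y = sgn(w @ x)                        # step 3: y(n) = sgn[w^T(n) x(n)]
        w = w + eta * (t - y) * x             # step 4: w(n+1) = w(n) + η[d(n)-y(n)]x(n)
    return w

# Illustrative linearly separable data with bipolar targets
X = np.array([[2.0, 1.0], [1.5, 2.0], [-1.0, -1.5], [-2.0, -0.5]])
d = np.array([1, 1, -1, -1])
print(train_perceptron(X, d))  # learned [b, w1, w2]
```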
The Adaptive Linear Neuron (ADALINE)

- ADALINE (Adaptive Linear Neuron, or later Adaptive Linear Element) is an early single-layer artificial neural network.
- The Adaptive Linear Neuron was introduced by Widrow and Hoff (1960).
- It has the same architecture as our simple network, with adjustable weights.
- It typically uses bipolar activations for its input signals and target output.
- The ADALINE is trained using the Delta Rule (also known as the LMS or Widrow-Hoff rule).
Delta (LMS) Learning Rule for ADALINE

- The delta rule adjusts the weights to reduce the difference between the actual and the desired outputs.
- This results in the smallest mean square error.
- This improvement ensures that the training gives you a more generalized system.

The learning algorithm is the same as Perceptron learning, except in Step 5:

    b(new) = b(old) + η (t - y_in)
    w_i(new) = w_i(old) + η (t - y_in) x_i

where y_in = b + Σ_i x_i w_i is the net input to the output unit.

It is derived in the following manner. The mean squared error for a particular training pattern is given by

    E = (t - y_in)^2
Delta (LMS) Learning Rule for a Single Output Unit

- The delta rule changes the weights of the neural connections so as to minimize the difference between the net input to the output unit, y_in, and the target value t.
- The aim is to minimize the error over all training patterns; however, this is accomplished by minimizing the error for each pattern, one at a time.
- E = (t - y_in)^2 is a function of all the weights w_i, i = 1, ..., n.
- The gradient ∂E/∂w_i gives the direction of most rapid increase in E, hence the error can be decreased by adjusting the weight w_i in the direction of -∂E/∂w_i. Since ∂E/∂w_i = -2 (t - y_in) x_i, the delta rule for adjusting the i-th weight for each pattern is

      Δw_i = η (t - y_in) x_i

  where η is the learning rate.
Delta (LMS) Learning Rule for Multiple Output Units

- The mean square error for a particular training pattern is E = Σ_J (t_J - y_in,J)^2.
- In the multiple-output-neuron case we again consider each weight separately for the error reduction.
- The local error will reduce most rapidly, for a given learning rate, by adjusting the weight from the I-th input to the J-th output unit according to

      Δw_IJ = η (t_J - y_in,J) x_I

ADALINE Delta Learning Algorithm

Step 0. Initialize the weights. Set the learning rate η.
Step 1. While the stopping condition is false, do Steps 2-6.
Step 2. For each bipolar training pair s:t, do Steps 3-5.
Step 3. Set the activations of the input units, i = 1, ..., n: x_i = s_i.
Step 4. Compute the net input to the output unit: y_in = b + Σ_i x_i w_i.
Step 5. Update the bias and weights, i = 1, ..., n:
          b(new) = b(old) + η (t - y_in)
          w_i(new) = w_i(old) + η (t - y_in) x_i
Step 6. Test for the stopping condition: if the largest weight change that occurred in Step 2 is smaller than a specified tolerance, then stop; otherwise continue.
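The algorithm transcribes almost line for line into code (a sketch; the bipolar AND data, η = 0.1, and the tolerance are illustrative choices):

```python
import numpy as np

def train_adaline(X, T, eta=0.1, tol=1e-4, max_epochs=100):
    w = np.zeros(X.shape[1])                   # Step 0: initialize weights
    b = 0.0
    for _ in range(max_epochs):                # Step 1: loop until stopping condition
        largest_change = 0.0
        for x, t in zip(X, T):                 # Steps 2-3: present each training pair
            y_in = b + x @ w                   # Step 4: net input to the output unit
            delta = eta * (t - y_in)
            b += delta                         # Step 5: update bias and weights
            w += delta * x
            largest_change = max(largest_change, abs(delta),
                                 np.max(np.abs(delta * x)))
        if largest_change < tol:               # Step 6: largest change below tolerance
            break
    return w, b

# Bipolar AND training pairs
X = np.array([[1, 1], [1, -1], [-1, 1], [-1, -1]], dtype=float)
T = np.array([1, -1, -1, -1], dtype=float)
print(train_adaline(X, T))  # hovers near the minimum-MSE solution w = (0.5, 0.5), b = -0.5
```

Note that with a fixed learning rate the per-pattern weight changes do not shrink to zero on this data (the residual error is nonzero), so here the loop ends at max_epochs, oscillating near the least-squares solution.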
ADALINE Delta Learning Algorithm

- The learning rate η has to be chosen appropriately: a small value will make the learning process extremely slow, while a large value will result in the learning process not converging.
- The initial weights are set to small random values.

Perceptron vs. Delta Rule

- The difference between ADALINE and the standard perceptron is that in the learning phase the ADALINE's weights are adjusted according to the weighted sum of the inputs (the net input); in the standard perceptron, the net input is passed to the activation (transfer) function and the function's output is used for adjusting the weights.

Perceptron training rule:
- uses a thresholded unit
- converges after a finite number of iterations
- the output hypothesis classifies the training data perfectly
- linear separability is necessary

Delta rule:
- uses an unthresholded linear unit
- converges asymptotically toward a minimum-error hypothesis
- termination is not guaranteed
- linear separability is not necessary