
SC549 Artificial Neural Networks
Semester II 2014/2015

Topic 03: Learning in Neural Networks

Contents

Learning algorithms
Supervised and unsupervised
Hebb rule (Hebbian learning)
Perceptron and its learning algorithm
ADALINE and its learning algorithm (Delta rule)

Training/Learning in Neural Networks

Training/learning a neural network model essentially means selecting one model from the set of allowed models that minimizes a cost function
Finding the decision boundary by adjusting the weights
Finding the weight matrix that gives us the correct classification

Types of Learning Algorithms

Supervised learning
infers a function from labeled training data
Unsupervised learning
tries to find hidden structure in unlabeled data

Supervised Learning

The ANN is trained repeatedly by a teacher.
Each input presented to the network will have an associated desired output that will also be presented.
Each learning cycle, the error between the actual and the desired output is used to adjust the weights.
When the error is an acceptable amount, the learning stops.

Unsupervised Learning

A teacher is not involved.
The network uses only inputs.
The inputs form automatic clusters based on some closeness or similarity criteria.
Meaning is associated to these clusters depending on the data.
Sometimes this type of training is called self-organizing networks.

Machine Learning

Unsupervised Learning: learns p(x) from the inputs x alone
Density estimation
K-means
GMM (EM)
PCA, manifold

Supervised Learning: learns a mapping f: x -> t from training data (x, t), evaluated on testing data
Classification, t = {1, ..., n}: ANN, SVM
Regression, t a continuous variable: polynomial curve fit, Gaussian process

Generalisation to unseen data

It is a fundamental issue: is what we learnt well generalised to unseen data?
This applies both to an unsupervised model p(x) and to a supervised mapping f: x -> t.

Machine Learning Problems

Generalization

Training set (labels known)
Test set (labels unknown)
How well does a learned model generalize from the data it was trained on to a new test set?
Slide credit: L. Lazebnik

Generalization

Components of generalization error
Bias: how much does the average model over all training sets differ from the true model?
Error due to inaccurate assumptions/simplifications made by the model
Variance: how much do models estimated from different training sets differ from each other?
Error from sensitivity to small fluctuations in the training set
The variance is how much the predictions for a given point vary between different realizations of the model.
Slide credit: L. Lazebnik
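These two components can be estimated empirically. Below is a small Python experiment (entirely illustrative: the sine data-generating function, noise level, polynomial degrees, and sample counts are arbitrary choices, not from the slides) that refits polynomials on many random training sets, then measures how far the average fit is from the true function (bias) and how much the individual fits spread around their average (variance):

import numpy as np

rng = np.random.default_rng(0)

def true_f(x):
    return np.sin(2 * np.pi * x)          # the "true model"

x_test = np.linspace(0, 1, 50)

def fit_predict(degree):
    """Fit one polynomial to a fresh noisy training set and predict on x_test."""
    x_tr = rng.uniform(0, 1, 20)
    y_tr = true_f(x_tr) + rng.normal(0, 0.3, 20)   # noisy labels
    coeffs = np.polyfit(x_tr, y_tr, degree)
    return np.polyval(coeffs, x_test)

for degree in (1, 9):                               # too simple vs. too flexible
    preds = np.stack([fit_predict(degree) for _ in range(200)])
    bias2 = np.mean((preds.mean(axis=0) - true_f(x_test)) ** 2)   # average model vs. true model
    var = np.mean(preds.var(axis=0))                              # spread across training sets
    print(f"degree {degree}: bias^2 = {bias2:.3f}, variance = {var:.3f}")

The low-degree fit shows high bias and low variance, the high-degree fit the opposite, matching the description above.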

Generalization

Underfitting: model is too simple to represent all the relevant class characteristics
High bias and low variance
High training error and high test error
Overfitting: model is too complex and fits irrelevant characteristics (noise) in the data
Low bias and high variance
Low training error and high test error
Slide credit: L. Lazebnik

Bias-Variance Tradeoff

Models with too few parameters are inaccurate because of a large bias (not enough flexibility).
Models with too many parameters are inaccurate because of a large variance (too much sensitivity to the sample).
Slide credit: D. Hoiem

Bias-variance tradeoff

[Figure: error versus model complexity, with an underfitting regime on one side and an overfitting regime on the other]
Slide credit: D. Hoiem

Bias-Variance Tradeoff

Tradeoff between a model's ability to minimize bias and variance

E(MSE) = noise^2 + bias^2 + variance
noise^2: unavoidable error
bias^2: error due to incorrect assumptions
variance: error due to variance of training samples
Slide credit: D. Hoiem
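Written out with explicit symbols (standard conventions, not notation from the slides), the same decomposition for a model \hat{f} fitted to random training sets, with y = f(x) + \varepsilon and noise variance \sigma^2, is:

E\big[(y - \hat f(x))^2\big] \;=\; \underbrace{\sigma^2}_{\text{noise}^2} \;+\; \underbrace{\big(E[\hat f(x)] - f(x)\big)^2}_{\text{bias}^2} \;+\; \underbrace{E\big[(\hat f(x) - E[\hat f(x)])^2\big]}_{\text{variance}}

where the expectations are taken over training sets and the observation noise.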

The perfect classification algorithm

Dimensionality reduction can decrease variance by simplifying models
Feature selection can decrease variance by simplifying models
A larger training set tends to decrease variance
Adding features (predictors) tends to decrease bias, at the expense of introducing additional variance
Learning algorithms typically have some tunable parameters that control bias and variance
One way of resolving the tradeoff is to use mixture models and ensemble learning
Boosting
Bagging
Slide credit: D. Hoiem

Simple Neural Networks Learning Algorithms

Hebbian learning (Hebb rule)
Perceptron learning
Delta learning

Hebb Nets and Hebbian Learning

Hebb, in his influential book The Organization of Behavior (1949), claimed:
Behavior changes are primarily due to the changes of synaptic strengths (wij) between neurons i and j
wi increases only when both i and j (two connected neurons) are on: the Hebbian learning law (algorithm)

Hebb Nets and Hebbian Learning

In ANN, the Hebbian law can be stated: wi increases only if the outputs of both units xi and y have the same sign.
The weights are increased as follows:
wi(new) = wi(old) + xi * y

Hebbian learning algorithm

Step 0. Initialization: b = 0, wi = 0, i = 1 to n
Step 1. For each of the training samples s:t do steps 2-4
        /* s is the input pattern, t the target output of the sample */
Step 2. xi := si, i = 1 to n  /* set s to the input units */
Step 3. y := t  /* set y to the target */
Step 4. wi := wi + xi*y, i = 1 to n  /* update weights */
        b := b + y  /* update bias */
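A minimal Python sketch of these four steps (the function name and the use of NumPy are illustrative choices, not from the slides):

import numpy as np

def hebb_train(samples, n_inputs):
    """One pass of Hebbian learning: wi := wi + xi*y, b := b + y."""
    w = np.zeros(n_inputs)               # Step 0: wi = 0
    b = 0.0                              #         b = 0
    for s, t in samples:                 # Step 1: one pass over the samples s:t
        x = np.asarray(s, dtype=float)   # Step 2: set s to the input units
        y = t                            # Step 3: y is the target
        w += x * y                       # Step 4: update weights
        b += y                           #         update bias
    return w, b

# The binary AND example from the next slide: ends with w = [1, 1], b = 1
binary_and = [((1, 1), 1), ((1, 0), 0), ((0, 1), 0), ((0, 0), 0)]
print(hebb_train(binary_and, 2))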

Hebb Net Example - AND Function

Binary units (1, 0), with a bias unit:

(x1, x2, 1)   y = t    w1   w2   b
(1, 1, 1)     1        1    1    1
(1, 0, 1)     0        1    1    1
(0, 1, 1)     0        1    1    1
(0, 0, 1)     0        1    1    1

An incorrect boundary 1 + x1 + x2 = 0 is learned after using each sample once.

Hebb Net Example - AND Function

With binary units, the boundary 1 + x1 + x2 = 0 is learned. This is not the correct boundary.

Hebb Net Example - AND Function

Bipolar units (1, -1):

(x1, x2, b)    y = t    w1   w2    b
(1, 1, 1)       1        1    1     1
(1, -1, 1)     -1        0    2     0
(-1, 1, 1)     -1        1    1    -1
(-1, -1, 1)    -1        2    2    -2

A correct boundary -1 + x1 + x2 = 0 is successfully learned.

Hebb Net Example - AND Function

With bipolar units, a correct boundary -1 + x1 + x2 = 0 is successfully learned.

Stronger learning methods are needed

Error driven: for each sample s:t, compute y from s based on the current W and b, then compare y and t
Use training samples repeatedly, and each time only change weights slightly
The learning methods of Perceptron and Adaline are good examples

Perceptron

By Rosenblatt (1962)
Rosenblatt's perceptron is built around a nonlinear neuron, the McCulloch-Pitts model of a neuron.
Basically, it consists of a single neuron with adjustable synaptic weights and bias.

Perceptron

The perceptron learning rule is more powerful than the Hebb learning rule.
If there exists a weight matrix that gives us the correct classification, the perceptron iterative learning procedure can be shown to converge to the correct weights.
Perceptrons can differentiate patterns only if they are linearly separable.

Perceptron

The output of the perceptron y = F(s) is computed using the activation function.

Perceptron Learning Algorithm

For a given training sample s:t, change weights only if the computed output y is different from the target output t (thus error driven).
The following formula is used to update when an error is found:
wi(new) = wi(old) + α*t*xi  (equivalently, Δwi = α*t*xi)
where t ∈ {1, 0, -1} is the target value and α is the learning rate.

Perceptron learning algorithm

Step 0. Initialization: b = 0, wi = 0, i = 1 to n
Step 1. While the stop condition is false, do steps 2-5
Step 2. For each of the training samples s:t, do steps 3-5
Step 3. xi := si, i = 1 to n
Step 4. Compute y
Step 5. If y != t
        wi := wi + α*xi*t, i = 1 to n
        b := b + α*t
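A runnable Python sketch of this loop (the bipolar step activation with threshold 0 in Step 4, and the function and variable names, are illustrative assumptions, not from the slides):

import numpy as np

def perceptron_train(samples, n_inputs, alpha=1.0, max_epochs=100):
    """Error-driven perceptron learning: weights change only when y != t."""
    w = np.zeros(n_inputs)                     # Step 0
    b = 0.0
    for _ in range(max_epochs):                # Step 1: outer loop
        changed = False
        for s, t in samples:                   # Step 2: one epoch over the samples s:t
            x = np.asarray(s, dtype=float)     # Step 3
            y = 1 if (b + w @ x) > 0 else -1   # Step 4: bipolar step activation (assumed)
            if y != t:                         # Step 5: update only on error
                w += alpha * t * x
                b += alpha * t
                changed = True
        if not changed:                        # stop: no weight changed in this epoch
            break
    return w, b

# AND with binary inputs and bipolar targets (the example on the following slides)
and_data = [((1, 1), 1), ((1, 0), -1), ((0, 1), -1), ((0, 0), -1)]
print(perceptron_train(and_data, 2))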

Notes:
Learning occurs only when a sample has y != t
The given algorithm is suitable for either binary or bipolar input vectors, bipolar targets, a fixed threshold θ, and an adjustable bias.
There are two loops: a completion of the inner loop (each sample is used once) is called an epoch
α is called the learning rate

Perceptron learning algorithm

Stop condition:
When no weight is changed in the current epoch, or
When a predetermined number of epochs is reached

Perceptron Example - AND Function

Binary inputs and bipolar targets

(x1, x2, b)   t
(1, 1, 1)      1
(1, 0, 1)     -1
(0, 1, 1)     -1
(0, 0, 1)     -1

Perceptron Example - AND Function

The classification is correct for the first input, and it is illustrated below.
[Figures omitted]

Perceptron learning algorithm - Another Version (From Haykin's Book)

This is the most commonly used version.

Variables and Parameters:
x(n) = (m+1)-by-1 input vector
     = [+1, x1(n), x2(n), ..., xm(n)]
w(n) = (m+1)-by-1 weight vector
     = [b, w1(n), w2(n), ..., wm(n)]
b = bias
y(n) = actual response (quantized)
d(n) = desired response (target)
η = learning-rate parameter, a positive constant less than unity

Perceptron learning algorithm - Another Version (From Haykin's Book)

1. Initialization. Set w(0) = 0. Then perform the following computations for time step n = 1, 2, ....
2. Activation. At time step n, activate the perceptron by applying the continuous-valued input vector x(n) and desired response d(n).
3. Computation of Actual Response. Compute the actual response of the perceptron as y(n) = sgn[wT(n) x(n)], where sgn(·) is the signum function.
4. Adaptation of Weight Vector. Update the weight vector of the perceptron to obtain w(n+1) = w(n) + η[d(n) - y(n)] x(n)
5. Continuation. Increment time step n by one and go back to step 2.

Perceptron learning algorithm - Another Version (From Haykin's Book)

Weight update:
w(n+1) = w(n) + η[d(n) - y(n)] x(n)
Using the previous notation:
wi := wi + α*xi*(t - y), i = 1 to n
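A one-step NumPy sketch of this update (the function and variable names are illustrative; the convention for sgn at exactly 0 is an assumption):

import numpy as np

def perceptron_step(w, x, d, eta=0.1):
    """One Haykin-style update: w(n+1) = w(n) + eta*[d(n) - y(n)]*x(n).
    x includes the leading +1, so w[0] plays the role of the bias b."""
    y = 1.0 if (w @ x) >= 0 else -1.0   # y(n) = sgn[wT(n) x(n)], taking sgn(0) = +1
    return w + eta * (d - y) * x

# Example: a single update on an arbitrary input with desired response d = -1
w = np.zeros(3)
w = perceptron_step(w, np.array([1.0, 0.5, -1.2]), d=-1)
print(w)   # [-0.2, -0.1, 0.24]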

The Adaptive Linear Neuron (ADALINE)

ADALINE (Adaptive Linear Neuron, or later Adaptive Linear Element) is an early single-layer artificial neural network
The Adaptive Linear Neuron was introduced by Widrow and Hoff (1960)
It has the same architecture as our simple network

The Adaptive Linear Neuron (ADALINE)

It typically uses bipolar activations for its input signals and target output.
The architecture has weights that are adjustable.
The ADALINE is trained using the Delta Rule (also known as the LMS or Widrow-Hoff rule).

Delta (LMS) Learning Rule for ADALINE

The delta rule adjusts the weights to reduce the difference between the actual and the desired outputs
This results in the smallest mean squared error
This improvement ensures that the training gives you a more generalized system

Delta (LMS) Learning Rule for ADALINE

Learning algorithm: same as Perceptron learning except in Step 5:
b(new) = b(old) + α(t - y_in)
wi(new) = wi(old) + α(t - y_in)*xi
where y_in = b + Σi wi*xi is the net input to the output unit

Delta (LMS) Learning Rule for ADALINE

It is derived in the following manner. The mean squared error for a particular training pattern is given by
E = (t - y_in)^2
E is a function of all the weights wi, i = 1, ..., n.
The gradient ∂E/∂wI gives the direction of most rapid increase in E; hence the error can be decreased by adjusting the weight wI in the direction of -∂E/∂wI, i.e.
ΔwI = -α ∂E/∂wI
where α is the learning rate.

Delta (LMS) Learning Rule for Single Output Unit

The delta rule changes the weights of the neural connections so as to minimize the difference between the net input to the output unit, y_in, and the target value t.
The aim is to minimize the error for all training patterns; however, this is accomplished by minimizing the error for each pattern, one at a time.
The delta rule for adjusting the I-th weight for each pattern is
ΔwI = α (t - y_in) xI
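Spelling out the gradient step the slide refers to (assuming, as above, E = (t - y_in)^2 and y_in = b + Σi wi xi; the factor of 2 is conventionally absorbed into the learning rate α):

\frac{\partial E}{\partial w_I} = \frac{\partial}{\partial w_I}\,(t - y_{in})^2 = -2\,(t - y_{in})\,\frac{\partial y_{in}}{\partial w_I} = -2\,(t - y_{in})\,x_I

\Delta w_I = -\alpha\,\frac{\partial E}{\partial w_I} \;\propto\; \alpha\,(t - y_{in})\,x_I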

Delta (LMS) Learning Rule for Multiple Output Units

The mean squared error for a particular training pattern is
E = Σj (tj - y_in,j)^2
In the multiple output neuron case we again consider each weight separately for the error reduction.
The local error will reduce most rapidly for a given learning rate by adjusting the weights according to the rule below.
The delta rule for adjusting the weight from the I-th input to the J-th output unit for each pattern is
ΔwIJ = α (tJ - y_in,J) xI

ADALINE - Delta Learning Algorithm

Step 0. Initialize weights. Set the learning rate α.
Step 1. While the stopping condition is false, do Steps 2-6.
Step 2. For each bipolar training pair s:t, do Steps 3-5.
Step 3. Set activations of the input units, i = 1, ..., n: xi = si.
Step 4. Compute the net input to the output unit: y_in = b + Σi wi*xi
Step 5. Update bias and weights, i = 1, ..., n:
        b(new) = b(old) + α(t - y_in)
        wi(new) = wi(old) + α(t - y_in)*xi
Step 6. Test for stopping condition: if the largest weight change that occurred in Step 2 is smaller than a specified tolerance, then stop; otherwise continue.
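A minimal Python sketch of this algorithm for a single output unit (the learning rate, tolerance, random initialization range, and the bipolar AND data are illustrative choices, not values from the slides):

import numpy as np

def adaline_train(samples, n_inputs, alpha=0.1, tol=0.01, max_epochs=1000):
    """ADALINE delta-rule training: updates are driven by (t - y_in), the error
    of the net input, not of a thresholded output."""
    rng = np.random.default_rng(0)
    w = rng.uniform(-0.5, 0.5, n_inputs)     # Step 0: small random initial weights
    b = rng.uniform(-0.5, 0.5)
    for _ in range(max_epochs):              # Step 1
        max_change = 0.0
        for s, t in samples:                 # Step 2: bipolar training pairs s:t
            x = np.asarray(s, dtype=float)   # Step 3
            y_in = b + w @ x                 # Step 4: net input
            delta = alpha * (t - y_in)       # Step 5: delta-rule update
            b += delta
            w += delta * x
            max_change = max(max_change, abs(delta), float(np.max(np.abs(delta * x))))
        if max_change < tol:                 # Step 6: stopping condition
            break
    return w, b

# Bipolar AND: the learned boundary approaches -1 + x1 + x2 = 0 (w ≈ [0.5, 0.5], b ≈ -0.5)
bipolar_and = [((1, 1), 1), ((1, -1), -1), ((-1, 1), -1), ((-1, -1), -1)]
print(adaline_train(bipolar_and, 2))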

Perceptron vs Delta Rule

The difference between Adaline and the standard perceptron is that in the learning phase the weights are adjusted according to the weighted sum of the inputs (the net).
In the standard perceptron, the net is passed to the activation (transfer) function and the function's output is used for adjusting the weights.

The learning rate α has to be chosen appropriately:
A small value will make the learning process extremely slow.
A large value will result in the learning process not converging.
The initial weights are set to small random values.

Perceptron training rule:
uses a thresholded unit
converges after a finite number of iterations
the output hypothesis classifies the training data perfectly
linear separability is necessary

Delta rule:
uses an unthresholded linear unit
converges asymptotically toward a minimum-error hypothesis
termination is not guaranteed
linear separability is not necessary
