DSES-4810-01 Intro to COMPUTATIONAL INTELLIGENCE & SOFT COMPUTING
With ever-increasing computer power readily available, novel engineering methods based on soft computing are emerging at a rapid rate. This course provides the students with a
working knowledge of computational intelligence (CI) covering the basics of fuzzy logic,
neural networks, genetic algorithms and evolutionary computing, simulated annealing,
wavelet analysis, artificial life and chaos. Applications in control, forecasting,
optimization, data mining, fractal image compression and time series analysis are
illustrated with engineering case studies.
This course provides a hands-on introduction to the fascinating discipline of computational intelligence (i.e., the synergistic interplay of fuzzy logic, genetic algorithms, neural networks, and other soft computing techniques). The students will develop the skills to solve engineering problems with computational intelligence paradigms. The course requires a CI-related project in the student's area of interest.
Instructor: Prof. Mark J. Embrechts (x4009, embrem@rpi.edu)
Office Hours: Thursday 10-11 am (CII 5217), or by appointment.
Class Time: Monday/Thursday: 8:30-9:50 am (Amos Eaton Hall 216)
Text (optional): J. S. Jang, C. T. Sun, E. Mizutani, Neuro-Fuzzy and Soft Computing, Prentice Hall, 1996 (1998). ISBN 0-13-261066-3

Course is open to graduate students and seniors of all disciplines.
GRADING:
Tests: 10%
5 Homework Projects: 35%
Course Project: 40%
Presentation: 15%
ATTENDANCE POLICY
Course attendance is mandatory; a make-up project is required for each missed class. A missed class without make-up results in the loss of half a grade point.
ACADEMIC HONESTY
Homework projects are individual exercises. You can discuss assignments with your peers, but not copy. The course project may be done in groups of 2.
COMPUTATIONAL INTELLIGENCE COURSE OUTLINE

1. INTRODUCTION TO ARTIFICIAL NEURAL NETWORKS (ANNs)
1.1 History
1.2 Philosophy of neural nets
1.3 Overview of neural nets

2. INTRODUCTION TO FUZZY LOGIC
2.1 History
2.2 Philosophy of fuzzy logic
2.3 Terminology and definitions

3. INTRODUCTION TO EVOLUTIONARY COMPUTING
3.1 Introduction to genetic algorithms
3.2 Evolutionary computing / evolutionary programming / genetic programming
3.3 Terminology and definitions

4. NEURAL NETWORK APPLICATIONS / DATA MINING WITH ANNs
4.1 Case study: time series forecasting (population forecasting)
4.2 Case study: automated discovery of novel pharmaceuticals (Part I)
4.3 Data mining with neural networks

5. FUZZY LOGIC APPLICATIONS / FUZZY EXPERT SYSTEMS
5.1 Fuzzy logic case study: tipping
5.2 Fuzzy expert systems

6. SIMULATED ANNEALING / GENETIC ALGORITHM APPLICATIONS
6.1 Simulated annealing
6.2 Supervised clustering with GAs
6.3 Case study: automated discovery of novel pharmaceuticals (Part II)

7. DATA VISUALIZATION WITH SELF-ORGANIZING MAPS
7.1 The Kohonen feature map
7.2 Case study: visual explorations for novel pharmaceuticals (Part III)

8. ARTIFICIAL LIFE
8.1 Cellular automata
8.2 Self-organized criticality
8.3 Case study: highway traffic jam simulation

9. FRACTALS and CHAOS
9.1 Fractal dimension
9.2 Introduction to chaos
9.3 Iterated function systems

10. WAVELETS
Monday, January 12, 2004

DSES-4810-01 Intro to COMPUTATIONAL INTELLIGENCE & SOFT COMPUTING

Instructor: Prof. Mark J. Embrechts (x4009 or 371-4562) (embrem@rpi.edu)
Office Hours: Tuesday 10-12 am (CII 5217), or by appointment.
Class Time: Monday/Thursday: 8:30-9:50 (Amos Eaton Hall 216)
LECTURES #1-3: INTRO to Neural Networks

The purpose of the first two lectures is to present an overview of the philosophy of artificial neural networks. Today's lecture will provide a brief history of neural network development and motivate the idea of training a neural network. We will introduce a neural network as a framework to generate a map from an input space to an output space. Three basic premises will be discussed to explain artificial neural networks:
(1) A problem can be formulated and represented as a map from an m-dimensional space Rm to an n-dimensional space Rn, or Rm -> Rn.
(2) Such a map can be realized by setting up an equivalent artificial framework of basic building blocks of McCulloch-Pitts artificial neurons. This collection of artificial neurons forms an artificial neural network or ANN.
(3) The neural net can be trained to conform to the map based on samples of the map and will reasonably generalize to new cases it has not encountered before.
Handouts:
1. Mark J. Embrechts, "Problem Solving with Artificial Neural Networks."
2. Course outline and policies.
Tasks:
Start thinking about project topic, meet with me during office hours or by appointment.
PROJECT DEADLINES:

January 22: Homework Project #0 (web page summary)
January 29: Project proposal (2 typed pages; title, references, motivation, deliverable, evaluation criteria)
WHAT IS EXPECTED FROM THE CLASS PROJECT?

Prepare a monologue about a course-related subject (15 to 20 written pages, plus supporting material in appendices).

Prepare a 20-minute lecture about your project and give the presentation. Hand in a hard copy of your slides.

A project starts in the library. Prepare to spend at least a full day in the library over the course of the project. Meticulously write down all the relevant references, and attach a copy of the most important references to your report.

The idea for the lecture and the monologue is that you spend the maximum amount of effort to allow a third party to present that same material, based on your preparation, with a minimal amount of effort.
A project proposal should be a fluent text of at least 2 full pages, where you are trying
to sell the idea for a research project in a professional way. Therefore the proposal
should contain a clear background and motivation.
The proposal should define a clear set of goals, deliverables, and a timetable.
Identify how you would judge your project successful and address the evaluation criteria.
Make sure you select a title (acronyms and logos are suggested as well), and add a list
of references to your proposal.
PROBLEM SOLVING WITH ARTIFICIAL NEURAL NETWORKS
Mark J. Embrechts
1. INTRODUCTION TO NEURAL NETWORKS

1.1 Artificial neural networks in a nutshell

This introduction to artificial neural networks explains as briefly as possible what is commonly understood by an artificial neural network and how it can be applied to solve data mining problems. Only the most popular type of neural network will be discussed here: i.e., feedforward neural networks (usually trained with the popular backpropagation algorithm). Neural nets emerged from psychology as a learning paradigm which mimics how the brain learns. There are many different types of neural networks and training algorithms, and different ways to interpret how and why a neural network operates. A neural network problem is viewed in this write-up as a parameter-free implementation of a map, and it is silently assumed that most data mining problems can be framed as a map. This is a very limited view, which does not fully cover the power of artificial neural networks. However, this view leads to an intuitive basic understanding of the neural network approach for problem solving with a minimum of otherwise necessary introductory material.
Three basic premises will be discussed in order to explain artificial neural networks:
(1) A problem can be formulated and represented as a map from an m-dimensional space Rm to an n-dimensional space Rn, or Rm -> Rn.
(2) Such a map can be implemented by constructing an artificial framework of basic building blocks of McCulloch-Pitts artificial neurons. This collection of artificial neurons forms an artificial neural network (ANN).
(3) The neural net can be trained to conform to the map based on samples of the map and will reasonably generalize to new cases it has not encountered before.
The next sections expand on these premises and explain a map, the McCulloch-Pitts neuron, the artificial neural network or ANN, training, and generalization.
1.2 Framing an equivalent map for a problem

Let us start by considering a token problem and reformulate this problem as a map. The token problem involves deciding whether a seven-bit binary number is odd or even. To restate this problem as a map two spaces are considered: a seven-dimensional input space containing all the seven-bit binary numbers, and a one-dimensional output space with just two elements (or classes): odd or even, which will be symbolically represented by a one or a zero. Such a map can be interpreted as a transformation from Rm to Rn, or Rm -> Rn (with m = 7 and n = 1). A map for the seven-bit parity problem is illustrated in figure 1.1.
Figure 1.1 The seven-bit parity problem as a map from R7 (the 128 seven-bit binary numbers 0000000 through 1111111) to R1 (parity: 1 or 0).
The seven-bit parity problem was just framed as a formal mapping problem. The specific details of the map are yet to be determined: all we have so far is the hope that a precise function can be formulated that transfers the seven-bit binary input space to a 1-dimensional, 1-bit output space, which solves the seven-bit parity problem. We hope that eventually we can specify a green box that formally could be implemented as a subroutine in a C code, where the subroutine would have a header of the type:
void Parity_Mapping(VECTOR sample, int *decision) {
code line 1;
...
line of code;
*decision = ... ;
} // end of subroutine
In other words: given a seven-bit binary vector as an input to this subroutine (e.g. {1, 0, 1, 1, 0, 0, 1}), we expect the subroutine to return an integer nicknamed "decision". The value for decision will turn out to be unity or zero, depending on whether the seven-bit input vector is odd or even.
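For reference, the green box for this token problem actually has a one-line closed form; here is a minimal Python sketch, assuming the usual parity-benchmark convention that the output is 1 for an odd number of ones. The point of the exercise, of course, is that a neural network must learn this map from examples rather than be handed the formula.

```python
def parity_mapping(sample):
    """Green-box map R7 -> R1: return 1 when the seven-bit input
    vector contains an odd number of ones, 0 otherwise."""
    return sum(sample) % 2

print(parity_mapping([1, 0, 1, 1, 0, 0, 1]))  # four ones -> prints 0
```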
We call this methodology a green-box approach to problem solving to imply that we only hope that such a function can eventually be realized, but that so far, we are clueless about how exactly we are going to fill the body of that green box. Of course, you probably guessed by now that somehow artificial neural networks will be applied to do this job for us. Before elaborating on neural networks we still have to discuss a subtle but important point related to our way of solving the seven-bit parity problem. Implicitly it is assumed for this problem that all seven-bit binary numbers are available and that the parity of each seven-bit binary number is known.
Let us complicate the seven-bit parity problem by specifying that we know, for the time being, the correct parity for only 120 of the 128 possible seven-bit binary numbers. We want to specify a map for these 120 seven-bit binary numbers such that the map will correctly identify the eight remaining binary numbers. This is a much more difficult problem than mapping the seven-bit parity problem based on all the possible samples, and whether an answer exists and can be found for this type of problem is often not clear at all from the onset. The methodology for learning what has to go in the green box for this problem will divide the available samples for this map into a training set (a subset of the known samples) and a test set. The test set will be used only for evaluating the goodness of the green-box implementation of the map.
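The 120/8 split described above can be sketched in a few lines of Python; the random shuffle and fixed seed are illustrative choices, not prescribed by the text:

```python
import itertools
import random

# All 128 seven-bit binary numbers, labeled with their parity
# (1 = odd number of ones, 0 = even).
samples = [(bits, sum(bits) % 2) for bits in itertools.product((0, 1), repeat=7)]

random.seed(0)              # illustrative; any 120/8 split would do
random.shuffle(samples)
train_set = samples[:120]   # parity assumed known for these
test_set = samples[120:]    # the eight held-out numbers
```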
Let us introduce a second example to illustrate how a regression problem can be reformulated as a mapping problem. Consider a collection of images of circles: all 64x64 black-and-white (B&W) pixel images. The problem here is to infer the radii of these circles based on the pixel values. Figure 1.2 illustrates how to formulate this problem as a formal map. A 64x64 image could be scanned row by row and be represented by a string of zeros and ones depending on whether each pixel is white or black. This input space has 64x64 or 4096 binary elements and can therefore be considered as a space with 4096 dimensions. The output space is a one-dimensional number, being the radius of the circle in the appropriate units.

We generally would not expect for this problem to have access to all possible 64x64 B&W images of circles to determine the mapping function. We therefore would only consider a representative sample of circle images, somehow use a neural network to fill out the green box to specify the map, and hope that it will give the correct circle radius within a certain tolerance for future, out-of-sample 64x64 B&W images of circles. It actually turns out that the formal mapping procedure as described so far would yield lousy estimates for the radius. Some ingenious form of preprocessing of the image data (e.g., considering selected frequencies of a 2-D Fourier transform) will be necessary to reduce the dimensionality of the input space.
Most problems can be formulated in multiple ways as a map of the type Rm -> Rn. However, not all problems can be elegantly transformed into a map, and some formal mapping representations might be better than others for a particular problem. Often ingenuity, experimentation, and common sense are called for to frame an appropriate map that can adequately be represented by artificial neural networks.
Figure 1.2 A 64x64 B&W image of a circle, scanned row by row into a 4096-dimensional binary vector: a map from R4096 to R1 (the circle radius).
1.3 The McCulloch-Pitts neuron and artificial neural networks

The first neural network premise states that most problems can be formulated as an equivalent formal mapping problem. The second premise states that such a map can be represented by an artificial neural network (or ANN): i.e., a framework of basic building blocks, the so-called McCulloch-Pitts artificial neurons.

The McCulloch-Pitts neuron was first proposed in 1943 by Warren McCulloch and Walter Pitts, a psychologist and a mathematician, in a paper illustrating how simple artificial representations of neurons could in principle represent any arithmetic function. How to actually implement such a function was first addressed by the psychologist Donald Hebb in 1949 in his book "The Organization of Behavior." The McCulloch-Pitts neuron can easily be understood as a simple mathematical operator. This operator has several inputs and one output and performs two elementary operations on the inputs: first it makes a weighted sum of all the inputs, and then it applies a functional transform to that sum, which will be sent to the output. Assume that there are N inputs {x1, x2, ..., xN}, or an input vector x, and consider the output y. The output y can be expressed as a function of its inputs according to the following equations:
sum = \sum_{i=1}^{N} x_i    (1)

and

y = f(sum)    (2)
So far, we have not yet specified the transfer function f(.). In its most simple form it is just a threshold function giving an output of unity when the sum exceeds a certain value, and zero when the sum is below this value. It is common practice in neural networks to use as transfer function the sigmoid function, which can be expressed as:

f(sum) = \frac{1}{1 + e^{-sum}}    (3)
Figure 1.3 illustrates the basic operations of a McCulloch-Pitts neuron. It is common practice to apply an appropriate scaling to the inputs (usually such that either 0 < xi < 1, or -1 < xi < 1).

Figure 1.3 A McCulloch-Pitts neuron: inputs x1 ... xN are multiplied by weights w1 ... wN, summed, and passed through the transfer function f(.).
One more enhancement has to be clarified for the basics of the McCulloch-Pitts neuron: before summing the inputs, they actually have to be modified by multiplying them with a weight vector, {w1, w2, ..., wN}, so that instead of using equation (1) and summing the inputs we will make a weighted sum of the inputs according to equation (4):

sum = \sum_{i=1}^{N} w_i x_i    (4)
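Equations (2)-(4) can be condensed into a short Python sketch of a single McCulloch-Pitts neuron with a sigmoid transfer function:

```python
import math

def mcculloch_pitts(inputs, weights):
    """One McCulloch-Pitts neuron: weighted sum of the inputs
    (eq. 4) followed by the sigmoid transfer function (eq. 3)."""
    total = sum(w * x for w, x in zip(weights, inputs))
    return 1.0 / (1.0 + math.exp(-total))

# A weighted sum of zero lands exactly in the middle of the sigmoid:
print(mcculloch_pitts([1.0, 1.0], [0.5, -0.5]))  # prints 0.5
```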
A collection of these basic operators can be stacked into a structure (an artificial neural network) that can have any number of inputs and any number of outputs. The neural network shown in figure 1.4 represents a map with two inputs and one output. There are two fan-out input elements and a total of six neurons. There are three layers of neurons: the first layer is called the first hidden layer, the second layer is the second hidden layer, and the output layer consists of one neuron. There are 14 weights. The layers are fully connected. In this example there are no backward connections, and this type of neural net is therefore called a feedforward network. The type of neural net of figure 1.4 is the most commonly encountered type of artificial neural network, the feedforward net:
(1) There are no connections skipping layers.
(2) The layers are fully connected.
(3) There is usually at least one hidden layer.
It is not hard to envision now that any map can be translated into an artificial neural network structure, at least formally. How to determine the right weight set and how many neurons to place in the hidden layers has not yet been addressed. This is the subject of the next section.
Figure 1.4 Typical artificial feedforward neural network: two inputs (x1, x2) fan out to a first hidden layer of three neurons, followed by a second hidden layer of two neurons and a single output neuron producing y; the weights are labeled wij.
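The 2-3-2-1 structure of figure 1.4 can be sketched directly; the random weights below are placeholders (training them is the subject of section 1.5):

```python
import math
import random

def sigmoid(s):
    return 1.0 / (1.0 + math.exp(-s))

def layer(inputs, weight_matrix):
    """One fully connected layer of McCulloch-Pitts neurons."""
    return [sigmoid(sum(w * x for w, x in zip(row, inputs)))
            for row in weight_matrix]

def feedforward(x, w_hidden1, w_hidden2, w_out):
    """The net of figure 1.4: 2 inputs -> 3 neurons -> 2 neurons -> 1 output."""
    return layer(layer(layer(x, w_hidden1), w_hidden2), w_out)[0]

random.seed(1)  # illustrative, untrained weights
w_hidden1 = [[random.uniform(-1, 1) for _ in range(2)] for _ in range(3)]  # 6 weights
w_hidden2 = [[random.uniform(-1, 1) for _ in range(3)] for _ in range(2)]  # 6 weights
w_out = [[random.uniform(-1, 1) for _ in range(2)]]                        # 2 weights
y = feedforward([0.3, 0.7], w_hidden1, w_hidden2, w_out)                   # 14 in total
```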
1.4 Artificial neural networks

Specifying an artificial neural network to conform with a particular map means determining the neural network structure and its weights:
(1) A proper number of basic neurons are appropriately connected.
(2) Appropriate weights are selected.
How to connect the neurons and how to select the weights is the subject of the discipline of artificial neural networks. Even when a neural network can in principle represent any function or map, it is not necessarily clear that one can ever specify such a neural network with the existing algorithms. This section will briefly address how to set up a neural network, and give at least a conceptual idea about determining an appropriate weight set.
The feedforward neural network of figure 1.4 is the most commonly encountered type of artificial neural net. For most functional maps at least one hidden layer of neurons, and sometimes two hidden layers of neurons, are required. The structural layout of a feedforward neural network can now be determined. For a feedforward layered neural network two points have to be addressed to determine the layout:
(1) How many hidden layers to use?
(2) How many neurons to choose in each hidden layer?
Different experts in the field often have different answers to these questions. A general guideline that works surprisingly well is to try one hidden layer first, and to choose as few neurons in the hidden layer(s) as one can get away with.
The most intriguing question still remains and addresses the third premise of neural networks: is it actually possible to come up with algorithms that allow us to specify a good weight set? How do we determine the weights of the network from samples of the map? Can we expect a reasonable answer from such a network for new cases that were not encountered before?
It is straightforward to devise algorithms that will determine a weight set for neural networks that contain just an input layer and an output layer and no hidden layer(s) of neurons. However, such networks do not generalize well at all. Neural networks with good generalization capabilities require at least one hidden layer of neurons. For many applications such neural nets generalize surprisingly well. The need for hidden layers in artificial neural networks was already realized in the late fifties. However, in the 1969 book "Perceptrons" the MIT professor Marvin Minsky (with Seymour Papert) argued that it might not be possible at all to come up with any algorithm to determine a suitable weight set if hidden layers are present in the network structure. Only in 1986 did such an algorithm emerge: the backpropagation algorithm, popularized by Rumelhart and McClelland in a very clearly written chapter of their book "Parallel Distributed Processing." The backpropagation algorithm was actually invented and reinvented several times, and its original formulation is generally credited to Paul Werbos. He described the backpropagation algorithm in his Harvard Ph.D. dissertation in 1974, but this algorithm was not widely noted at that time. The majority of today's neural network applications rely in one form or another on the backpropagation algorithm.
1.5 Training neural networks

The result of training a neural network is its weight set. Determining an appropriate weight set is called training or learning, based on the metaphor that learning takes place in the human brain, which can be viewed as a collection of connected biological neurons. The learning rule proposed by Hebb was the first mechanism for determining the weights of a neural network. The Canadian Donald Hebb postulated this learning strategy in the late forties as one of the basic mechanisms by which humans and animals learn. Later on it turned out that he had hit the nail on the head with his formulation. Hebb's rule is surprisingly simple, and while in principle Hebb's rule can be used to train multi-layered neural networks, we will not elaborate further on this rule. Let us just point out here that there are now many different neural network paradigms and many algorithms for determining the weights of a neural network. Most of these algorithms work iteratively: i.e., one starts out with a randomly selected weight set, applies one or more samples of the mapping, and gradually upgrades the weights. This iterative search for a proper weight set is called the learning or training phase.
Before explaining the workings of the backpropagation algorithm we will present a simple alternative, the random search. The most naive way to determine a weight set (which, rather surprisingly in hindsight, did not emerge before the backpropagation principle was formulated) is just to try randomly generated weight sets, and to keep trying with new randomly generated weight sets until one hits it just right. The random search is at least in principle a way to determine a suitable weight set, if it weren't for its excessive demands on computing time. While this method sounds too naive to give it even serious thought, smart random search paradigms (such as genetic algorithms and simulated annealing) are nowadays actually legitimate and widely used training mechanisms for neural networks. However, random search methods, bells and whistles notwithstanding, remain extremely demanding on computing time. Only the wide availability of ever faster computers has made this method practical at all.
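A naive random search along these lines is easy to sketch. Here it hunts for the weights of a tiny 2-2-1 net computing the two-bit parity (XOR) map; the network size, weight range, and number of trials are all illustrative assumptions:

```python
import math
import random

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def tiny_net(x, w):
    """A 2-2-1 feedforward net; w is a flat list of 9 weights (incl. biases)."""
    h1 = sigmoid(w[0] * x[0] + w[1] * x[1] + w[2])
    h2 = sigmoid(w[3] * x[0] + w[4] * x[1] + w[5])
    return sigmoid(w[6] * h1 + w[7] * h2 + w[8])

xor_map = [([0, 0], 0), ([0, 1], 1), ([1, 0], 1), ([1, 1], 0)]

def error(w):
    """Sum of squared output errors over the four samples."""
    return sum((tiny_net(x, w) - t) ** 2 for x, t in xor_map)

random.seed(0)
best_w, best_err = None, float("inf")
for _ in range(20000):                       # keep trying random weight sets
    w = [random.uniform(-10, 10) for _ in range(9)]
    e = error(w)
    if e < best_err:
        best_w, best_err = w, e
```

Even on this toy problem tens of thousands of trials are spent on a single tiny net, which illustrates why a naive random search does not scale to realistic networks.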
The process for determining the weights of a neural net proceeds in two separate stages. In the first stage, the training phase, one applies an algorithm to determine a (hopefully good) weight set with about 2/3 of the available mapping samples. The generalization performance of the just-trained neural net is subsequently evaluated in the testing phase based on the remaining samples of the map.
1.6 The backpropagation algorithm

An error measure can be defined to quantify the performance of a neural net. This error function depends on the weight values and the mapping samples. Determining the weights of a neural network can therefore be interpreted as an optimization problem, where the performance error of the network structure is minimized for a representative sample of the mappings. All paradigms applicable to general optimization problems therefore apply to neural nets as well. The backpropagation algorithm is elegant and simple, and is used in eighty percent of neural network applications. It consistently gives at least reasonably acceptable answers for the weight set. The backpropagation algorithm cannot be applied to just any optimization problem, but is specifically tailored to multi-layer feedforward neural networks.
There are many ways to define the performance error of a neural network. The most commonly applied error measure is the mean square error. This error, E, is determined by showing every sample to the net and totaling the squared differences between the actual output, o, and the desired target output, t, according to equation (5):

E = \sum_{i=1}^{n_{outputs}} (o_i - t_i)^2    (5)
Training a neural network starts out with a randomly selected weight set. A batch of samples is shown to the network, and an improved weight set is obtained by iteration following equations (6) and (7). The new weights for a particular neuron (labeled ij) at iteration (n+1) are an improvement on the weights from iteration (n), obtained by moving a small amount along the gradient of the error surface towards the direction of the minimum:

w_{ij}(n+1) = w_{ij}(n) + \Delta w_{ij}    (6)

\Delta w_{ij} = -\alpha \frac{dE}{dw_{ij}}    (7)
Equations (6) and (7) represent an iterative steepest descent algorithm, which will always converge to a local minimum of the error function provided that the learning parameter, \alpha, is small. The ingenuity of the backpropagation algorithm was to come up with a simple analytical expression for the gradient of the error in multi-layered nets by a clever application of the chain rule. While it was for a while commonly believed that the backpropagation algorithm was the only practical algorithm to implement equation (7), it is worth pointing out that the derivative of E with respect to the weights can easily be estimated numerically by tweaking the weights a little bit. This approach is perfectly valid, but is significantly slower than the elegant backpropagation formulation. The details for deriving the backpropagation algorithm can be found in the literature.
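The numerical alternative just mentioned (tweaking each weight a little to estimate dE/dw) is easy to sketch. Here a single sigmoid neuron, with a bias term added for convenience, is trained by finite-difference steepest descent on the two-input OR map; the learning rate, step size h, and iteration count are illustrative assumptions:

```python
import math

def neuron(x, w):
    """Single McCulloch-Pitts neuron with bias: w = [w1, w2, bias]."""
    return 1.0 / (1.0 + math.exp(-(w[0] * x[0] + w[1] * x[1] + w[2])))

or_map = [([0, 0], 0), ([0, 1], 1), ([1, 0], 1), ([1, 1], 1)]

def error(w):
    """Squared-error measure of equation (5) over the four samples."""
    return sum((neuron(x, w) - t) ** 2 for x, t in or_map)

w, alpha, h = [0.0, 0.0, 0.0], 1.0, 1e-4
for _ in range(2000):
    # Estimate dE/dw_i by central differences instead of backpropagation.
    grad = []
    for i in range(3):
        w_hi = list(w); w_hi[i] += h
        w_lo = list(w); w_lo[i] -= h
        grad.append((error(w_hi) - error(w_lo)) / (2 * h))
    # Steepest-descent update of equations (6) and (7).
    w = [wi - alpha * gi for wi, gi in zip(w, grad)]
```

Each iteration costs two extra error evaluations per weight, which is exactly why the analytical backpropagation gradient is so much faster on large nets.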
1.7 More neural network paradigms

So far, we have briefly described how feedforward neural nets can solve problems by recasting the problem as a formal map. The workings of the backpropagation algorithm to train a neural network were formally explained. While the views and algorithms presented here conform with the mainstream approach to neural network problem solving, there are literally hundreds of different neural network types and training algorithms. Recasting the problem as a formal map is just one part and one view of neural nets. For a broader view on neural networks we refer to the literature.
At least two more paradigms revolutionized and popularized neural networks in the eighties: the Hopfield net and the Kohonen net. The physicist John Hopfield gained attention for neural networks in 1982 when he wrote a paper in the Proceedings of the National Academy of Sciences indicating how neural networks form an ideal framework to simulate and explain the statistical mechanics of phase transitions. The Hopfield net can also be viewed as a recurrent content-addressable memory that can be applied to image recognition and traveling-salesman-type optimization problems. For several specialized applications, this type of network is far superior to any other neural network approach. The Kohonen network, proposed by the Finnish professor Teuvo Kohonen, on the other hand is a one-layer feedforward network that can be viewed as a self-learning implementation of the K-means clustering algorithm for vector quantization, with powerful self-organizing properties and biological relevance.
Other popular, powerful, and clever neural network paradigms are the radial basis function network, the Boltzmann machine, the counterpropagation network, and the ART (adaptive resonance theory) networks. Radial basis functions can be viewed as a powerful general framework for function approximation.
1.8 Literature

The domain of artificial neural networks is vast and its literature is expanding at a fast rate. While my knowledge is far from complete, let me briefly discuss my favorite neural network references in this section. Note also that an excellent comprehensive introduction to neural networks can be found under the frequently-asked-questions files on neural networks at various WWW sites (i.e., search for FAQ neural networks in AltaVista).
Jose Principe
Probably the standard textbook now for teaching neural networks. Comes with a demo version of NeuroSolutions.
Neural and Adaptive Systems: Fundamentals Through Simulations, Jose Principe, Neil R. Euliano, and W. Curt Lefebvre, John Wiley (2000).

Hagan, Demuth, and Beale
An excellent book for basic comprehensive undergraduate teaching, going back to basics with lots of linear algebra and good MATLAB illustration files.
Neural Network Design, Hagan, Demuth, and Beale, PWS Publishing Company (1996).

Joseph P. Bigus
Bigus wrote an excellent introduction to neural networks for data mining for the non-technical reader. The book makes a good case for neural networks as an important data mining tool and discusses the power and limitations of neural networks for data mining. Some conceptual case studies are discussed. The book does not really discuss the theory of neural networks, or how exactly to apply neural networks to a data mining problem, but it nevertheless gives many practical hints and tips.
Data Mining with Neural Networks: Solving Business Problems from Application Development to Decision Support, McGraw-Hill (1997).
Maureen Caudill
Maureen Caudill has published several books that aim at the beginner's market and provide valuable insight into the workings of neural nets. More than her books, I would recommend a series of articles that appeared in the popular monthly magazine AI EXPERT. Collections of Caudill's articles are bundled as separate special editions of AI EXPERT.

Philip D. Wasserman
Wasserman published two very readable books explaining neural networks. He has a knack for explaining difficult paradigms efficiently and understandably with a minimum of mathematical diversions.
Neural Computing, Van Nostrand Reinhold (1990).
Advanced Methods in Neural Computing, Van Nostrand Reinhold (1993).
Jacek M. Zurada
Zurada published one of the first books on neural networks that can be considered a textbook. It is aimed at an introductory-level graduate engineering course with an electrical engineering bias, and comes with a wealth of homework problems and software.
Introduction to Artificial Neural Systems, West Publishing Company (1992).

Laurene Fausett
An excellent introductory textbook at the advanced undergraduate level with a wealth of homework problems.
Fundamentals of Neural Networks: Architectures, Algorithms, and Applications, Prentice Hall (1994).

Simon Haykin
Nicknamed "the bible of neural networks" by my students, this 700-page work can be considered both a desktop reference and an advanced graduate-level text on neural networks with challenging homework problems.
Neural Networks: A Comprehensive Foundation, Macmillan College Publishing Company (1995).
Mohamad H. Hassoun
Excellent graduate-level textbook with clear explanations and a collection of very appropriate homework problems.
Fundamentals of Artificial Neural Networks, MIT Press (1995).

John Hertz, Anders Krogh, and Richard G. Palmer
This book is one of the earlier and better books on neural networks and provides a thorough understanding of the various neural paradigms and of how and why neural networks work. The book is excellent for its references and has an extremely high information density. Even though it is heavy on the Hopfield network and the statistical mechanics interpretation, I probably consult this book more than any other. It does not lend itself well as a textbook, but for a while it was one of the few good books available. Highly recommended.
Introduction to the Theory of Neural Computation, Addison-Wesley Publishing Company (1991).
Timothy Masters
Masters wrote a series of three books in short succession, and I would call his collection the user's guide to neural networks. Whether you program your own networks or simply use them, the wealth of information is invaluable. The books come with software and all source code is included. The software is very powerful, but is geared toward the serious C++ user and lacks a decent user interface for the non-C++ initiated. A must for the beginner and the advanced user alike.
Practical Neural Network Recipes in C++, Academic Press (1993).
Signal and Image Processing with Neural Networks, John Wiley (1994).
Advanced Algorithms for Neural Networks: A C++ Sourcebook, John Wiley (1995).

Bart Kosko
Advanced electrical engineering graduate-level textbook. Excellent for fuzzy logic and neural network control applications. Not recommended as a general introduction or advanced reference.
Neural Networks and Fuzzy Systems, Prentice Hall (1992).

Guido J. Deboeck
If you are serious about applying neural networks to stock market speculation, this book is a good starting point. No theory, just applications.
Trading on the Edge: Neural, Genetic, and Fuzzy Systems for Chaotic Financial Markets, John Wiley & Sons (1994).
2. CASE STUDY: POPULATION FORECASTING

2.1 Introduction

The purpose of this case study is to present an overview of the philosophy of artificial neural networks. This case study will motivate the view of neural networks as a model-free regression technique. The study presented here describes how to estimate the world's population for the year 2025 based on traditional regression techniques and based on an artificial neural network.
In the previous section an artificial neural network was explained as a biologically inspired model that can implement a map. This model is based on an interconnection of elementary McCulloch-Pitts neurons. It was postulated that:
(a) a problem can be formulated and represented as a map from an m-dimensional space Rm to an n-dimensional space Rn;
(b) such a map can be implemented by an artificial framework of McCulloch-Pitts neurons (an artificial neural network);
(c) the neural net can be trained to conform to the map based on samples of the map and will reasonably generalize to new cases it has not encountered before.
The most popular algorithm for training a neural network is the backpropagation algorithm, which has been rediscovered in various fields over and over again and is generally credited to Dr. Paul Werbos.[1] The backpropagation algorithm was widely popularized in 1986 by Rumelhart and McClelland,[2] explaining why the surge in popularity of artificial neural networks is a relatively recent phenomenon. For the derivation and implementation details of the backpropagation algorithm the reader is referred to the literature.
2.2 Population forecasting
The reverend Thomas Malthus identified in 1798 in his seminal work "An essay on the
principle of population"[3] that the world's population grows exponentially while agricultural
output grows linearly, predicting gloom and doom for future generations. Indeed, the rapidly
expanding population on our planet reminds us daily that the resources on our planet have to be
carefully mended to survive gracefully during the next few decades. The data for the world's
population from 1650 through 1996 are summarized in Table I and figure 2.1.[4]
TABLE I. Estimates for the world population (1650-1996)

YEAR    POP (millions)
1650    470
1750    694
1850    1091
1900    1571
1950    2513
1960    3027
1970    3678
1980    4478
1990    5292
1995    5734
1996    5772
In order to build a model for population forecasting we will normalize the data points (Table
II). The year 1650 is re-scaled as 0.0 and 2025 as 1.0 and we interpolate linearly in between for
all the other years. The reason for doing such a normalization is that it is customary (and often
required) for neural networks to scale the data between zero and unity. Since our largest
considered year will be 2025 it will be re-scaled as unity. The reader can easily verify that a
linear re-normalization of a variable x between a maximum value (max) and a minimal value
(min) will lead to a re-normalized value (xnor) according to:
x_{nor} = \frac{x - min}{max - min}
Because the population increases so rapidly with time we will work with the natural logarithm
of the population (in millions) and then re-normalize these data according to the above formula,
where (anticipating the possibility for a large forecast for the world's population in 2025) we
used 12 as the maximum possible value for the re-normalized logarithm of the population in
2025 and 6.153 as the minimum value. In other words: max in the above formula was
arbitrarily assigned a value of 12 to assure that the neural net predictions can accommodate
large values. Table II illustrates these transforms for the world population data.
Figure 2.1 (plot of the world population data of Table I)

TABLE II. Normalized world population data

YEAR    POP     YEARnor   ln(POP)   POPnor
1650    470     0.000     6.153     0.000
1750    694     0.267     6.542     0.067
1850    1091    0.533     6.995     0.144
1900    1571    0.667     7.359     0.206
1950    2513    0.800     7.829     0.287
1960    3027    0.827     8.015     0.318
1970    3678    0.853     8.210     0.352
1980    4478    0.880     8.407     0.385
1990    5292    0.907     8.574     0.414
1995    5734    0.920     8.654     0.428
1996    5772    0.923     8.661     0.429
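The normalized entries of Table II can be reproduced directly from the rescaling formula; a quick Python check:

```python
import math

years = [1650, 1750, 1850, 1900, 1950, 1960, 1970, 1980, 1990, 1995, 1996]
pops = [470, 694, 1091, 1571, 2513, 3027, 3678, 4478, 5292, 5734, 5772]

# YEARnor: linear rescaling with 1650 -> 0.0 and 2025 -> 1.0.
year_nor = [(y - 1650) / (2025 - 1650) for y in years]

# POPnor: rescale ln(population in millions) with min = 6.153, max = 12.
pop_nor = [(math.log(p) - 6.153) / (12 - 6.153) for p in pops]

print(round(year_nor[1], 3), round(pop_nor[1], 3))  # prints 0.267 0.067
```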
The simplest regression model for these data is a straight line, Y(x) = ax + b, fitted to the normalized data points (x_i, y_i) by minimizing the sum of squared errors:

E = \sum_{i=1}^{N} (y_i - Y(x_i))^2 = \sum_{i=1}^{N} (y_i - (a x_i + b))^2

The best values for a and b satisfy

\frac{\partial E}{\partial a} = 0, \qquad \frac{\partial E}{\partial b} = 0

or

\frac{\partial E}{\partial a} = -2 \sum_{i=1}^{N} (y_i - (a x_i + b)) x_i = 0

\frac{\partial E}{\partial b} = -2 \sum_{i=1}^{N} (y_i - (a x_i + b)) = 0

It is left as an exercise to the reader to verify that this yields for a and b:

a = \frac{N \sum_i x_i y_i - \sum_i x_i \sum_i y_i}{N \sum_i x_i^2 - \left(\sum_i x_i\right)^2}

b = \bar{y} - a \bar{x}

where

\bar{y} = \frac{1}{N} \sum_{i=1}^{N} y_i, \qquad \bar{x} = \frac{1}{N} \sum_{i=1}^{N} x_i
Table III illustrates the numerical calculation of a and b, where the first ten data entries were
used (in other words, we do not consider the 1996 data point).
TABLE III. Normalized data and the sums needed for the regression coefficients

Xnor    Ynor    xy      x²
0.000   0.000   0.000   0.000
0.267   0.067   0.018   0.071
0.533   0.144   0.077   0.284
0.667   0.206   0.137   0.445
0.800   0.287   0.230   0.640
0.827   0.318   0.263   0.684
0.853   0.352   0.300   0.728
0.880   0.385   0.339   0.774
0.907   0.414   0.375   0.823
0.920   0.428   0.394   0.846
Σ:
6.654   2.601   2.133   5.295
Expressions for a and b can now be evaluated from the data in Table III.
a = (10 × 2.133 - 6.654 × 2.601) / (10 × 5.295 - (6.654)²) = 0.464
b = 0.260 - 0.464 × 0.665 = -0.0486
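The same least-squares coefficients can be checked with a few lines of code (a Python sketch; the course exercises use MATLAB):

```python
# Normalized data (Xnor, Ynor) from Table III: first ten points (1650 - 1995)
x = [0.000, 0.267, 0.533, 0.667, 0.800, 0.827, 0.853, 0.880, 0.907, 0.920]
y = [0.000, 0.067, 0.144, 0.206, 0.287, 0.318, 0.352, 0.385, 0.414, 0.428]

N = len(x)
sx, sy = sum(x), sum(y)
sxy = sum(xi * yi for xi, yi in zip(x, y))
sxx = sum(xi * xi for xi in x)

# Closed-form least-squares estimates for the line y = a*x + b
a = (N * sxy - sx * sy) / (N * sxx - sx ** 2)
b = sy / N - a * sx / N

print(a, b)   # approximately 0.464 and -0.0486
```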
Forecasting for the year 2025 according to the regression model yields the following
normalized value for the population:

y2025 = a × 1.0 + b = 0.464 - 0.0486 = 0.415

When re-scaling back into the natural logarithm of the actual population we obtain:

ln(POP2025) = (max - min) × y2025 + min = (12 - 6.153) × 0.415 + 6.153 = 8.580
The actual population estimate for the year 2025 is the exponential of this value, leading to an
estimate of 5321 million people. Obviously this value is not what we would expect or accept as
a forecast. What actually happened is that over the considered time period (1650 - 1996) the
population has been exploding faster than exponentially, and the postulated exponential
model is not a very good one. The flaws in this simple regression approach become obvious
when we plot the data and their approximations in the re-normalized frame according to figure
2.2. Our model has an obvious flaw, but the approach we took here is a typical regression
implementation. Only by plotting our data and predictions, and often only after the fact, does
the reason for a poor or invalid estimate become obvious. More seasoned statisticians would
suggest that we try an approximation of the type:
y = a + b·e^(c·x) + d·x^e
or use ARMA models and/or other state-of-the-art time series forecasting tools. All these
methods are fair game for forecasting and can yield reliable estimates in the hands of an
experienced analyst. Nevertheless, from this simple case study we can conclude so far that
forecasting the world's population is a challenging problem indeed.
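The back-transformation from the normalized regression prediction to an actual population count can be sketched as follows, using the rounded coefficients a = 0.464 and b = -0.0486 quoted above (the rounding shifts the result a few million from the 5321 in the text):

```python
import math

a, b = 0.464, -0.0486          # regression coefficients from the text
lo, hi = 6.153, 12.0           # min/max used to normalize ln(population)

y2025 = a * 1.0 + b            # the year 2025 maps to the normalized input 1.0
ln_pop = (hi - lo) * y2025 + lo   # undo the normalization of ln(POP)
pop = math.exp(ln_pop)            # population in millions

print(round(pop))              # close to the 5321 million quoted in the text
```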
2.4 Simple neural network model for population forecasting
In this section we will develop the neural network approach for building a population
forecasting model. We will define a very simple network with one input element, two neurons
in the hidden layer and one output neuron. We will however include two bias nodes (dummy
nodes with input unity), which is standard practice for most neural network applications. The
network has common sigmoid transfer functions, and the bias is just an elegant way to allow
shifts of the transfer functions as well. The sigmoid transfer function can be viewed as a
crude approximation of the threshold function. Remember that an artificial neuron can be
viewed as a mathematical operator with two operations: a weighted sum of its inputs followed
by a transfer function.
Figure 2.2. The data and the regression approximation in the re-normalized frame.

Figure 2.3. The sigmoid transfer function f(z) = 1 / (1 + e^(-z)) as a crude approximation
to the threshold function. Note that the introduction of bias nodes (i.e., dummy nodes with
input unity, as shown in figure 2.4) allows horizontal shifts of the sigmoid.
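The sigmoid and the shifting effect of a bias weight can be illustrated in a few lines (a Python sketch; the weight values here are arbitrary examples):

```python
import math

def sigmoid(z):
    """Sigmoid transfer function: a smooth approximation of a threshold."""
    return 1.0 / (1.0 + math.exp(-z))

# A bias node shifts the sigmoid horizontally: with input weight w and bias
# weight b the neuron computes sigmoid(w*x + b), i.e. a sigmoid centered
# at x = -b/w instead of x = 0.
print(sigmoid(0.0))               # 0.5, the midpoint of the sigmoid
print(sigmoid(2.0 * 1.0 - 2.0))   # with w = 2, b = -2 the midpoint moves to x = 1
```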
Figure 2.4 is a representation of our simple neural network. Note that there are three neurons
and two bias nodes. There are three layers: an input layer, one hidden layer and an output layer.
Only the hidden layer and the output layer contain neurons: such a network is referred to as a
1x2x1 net. The two operations of a neuron (weighted sum and transfer function) are
symbolically represented on the figure for each neuron (by the symbols Σ and f). In order for a
neural network to be a robust function approximator at least one hidden layer of neurons and
generally at most two hidden layers of neurons are required. The neural network represented in
figure 2.4 is the most common neural network of the feedforward type and is fully connected.
The unknown weights are indicated on the figure by the symbols w1, w2, ..., w7.
The weights can be considered the neural network equivalent of the unknown
regression coefficients from our regression model. The algorithm applied here for finding these
coefficients is the standard backpropagation algorithm, which minimizes the sum of
the squares of the errors, similar to the way it was done for regression analysis. However,
contrary to regression analysis, an iterative numerical minimization procedure rather than an
analytical derivation was applied to estimate the weights that minimize the least-squares
error measure. The backpropagation algorithm uses a clever trick to solve this problem when a
hidden layer of neurons is present in the model. By all means think of a neural network as a
more sophisticated regression model. It is different from a regression model in the sense that
we do not specify linear or higher-order models for the regression analysis. We specify only a
neural network frame (number of layers of neurons, and number of neurons in each layer) and
let the neural network algorithm work out what the proper choice for the weights will be. This
approach is often referred to as a model-free approximation method, because we really do not
specify whether we are dealing with a linear, quadratic or exponential model. The neural
network was trained with MetaNeural, a general-purpose neural network program that uses
the backpropagation algorithm and runs on most computer platforms. The neural network was
trained on the same 10 patterns that were used for the regression analysis and the screen
response is illustrated in figure 2.5.
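MetaNeural itself is not needed to see the idea. A bare-bones backpropagation loop for a 1x2x1 network on the ten training patterns might look like this (a sketch of the general technique, not the MetaNeural implementation; the learning rate and epoch count are arbitrary choices):

```python
import math, random

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

# Ten normalized training patterns (YEARnor -> POPnor) from Table II
patterns = [(0.000, 0.000), (0.267, 0.067), (0.533, 0.144), (0.667, 0.206),
            (0.800, 0.287), (0.827, 0.318), (0.853, 0.352), (0.880, 0.385),
            (0.907, 0.414), (0.920, 0.428)]

random.seed(0)
# 1x2x1 net: v, c = input-to-hidden weights and hidden bias weights;
#            u, d = hidden-to-output weights and the output bias weight
v = [random.uniform(-1, 1) for _ in range(2)]
c = [random.uniform(-1, 1) for _ in range(2)]
u = [random.uniform(-1, 1) for _ in range(2)]
d = random.uniform(-1, 1)

def forward(x):
    h = [sigmoid(v[j] * x + c[j]) for j in range(2)]
    return h, sigmoid(sum(u[j] * h[j] for j in range(2)) + d)

def sse():
    """Sum-of-squares error over all training patterns."""
    return sum((t - forward(x)[1]) ** 2 for x, t in patterns)

eta = 0.5                         # learning rate
initial_error = sse()
for epoch in range(5000):
    for x, t in patterns:         # on-line (pattern-by-pattern) updates
        h, out = forward(x)
        delta_o = (t - out) * out * (1 - out)       # output-layer error term
        for j in range(2):        # back-propagate the error to the hidden layer
            delta_h = delta_o * u[j] * h[j] * (1 - h[j])
            u[j] += eta * delta_o * h[j]
            v[j] += eta * delta_h * x
            c[j] += eta * delta_h
        d += eta * delta_o

print(initial_error, sse())       # the error shrinks as training proceeds
```

This is the same least-squares minimization as in the regression section, only carried out by iterative weight updates instead of a closed-form solution.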
Figure 2.4. The 1x2x1 feedforward network: one input node, a hidden layer with neurons 1
and 2, an output neuron 3, two bias nodes (with input unity), and the weights w1, ..., w7.

Figure 2.5. Screen response from MetaNeural for training and testing the population
forecasting model.
Hands-on details for the network training will be left for lecture 3, where we will gain hands-on
exposure to artificial neural network programs. The files that were used for the
MetaNeural program are reproduced in the appendix. The program gave 0.48118 as the
prediction for the normalized population forecast in 2025. After re-scaling this
corresponds to 7836 million people. Probably a rather underestimated forecast, but definitely
better than the regression model. The weights corresponding to this forecast model are
reproduced in Table IV. The problem with the neural network model is that a 1-2-1 net is a rather
simplistic network and that, in the way we represented the patterns, too much emphasis is placed
on the earlier years (1650 - 1850), which are really not all that relevant. By over-sampling (i.e.,
presenting the data from 1950 onward, let's say, three times as often as the other data) and
choosing a 1-3-4-1 network, the way a more seasoned practitioner might approach this
problem, we actually obtained a forecast of 11.02 billion people for the world's population in
2025. This answer seems a lot more reasonable than the one obtained from the 1-2-1
network. Changing to the 1-3-4-1 model is just a matter of changing a few numbers in the input
file for MetaNeural and can be done in a matter of seconds. The results for the predictions
with the 1-3-4-1 network with over-sampling are shown in figure 2.6.
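Over-sampling is just a matter of repeating the more relevant patterns in the training file. A sketch of how such an over-sampled pattern set could be prepared (the threshold 0.800 is the normalized value of the year 1950):

```python
# Normalized patterns (YEARnor, POPnor) from Table II; the last six are 1950 onward
patterns = [(0.000, 0.000), (0.267, 0.067), (0.533, 0.144), (0.667, 0.206),
            (0.800, 0.287), (0.827, 0.318), (0.853, 0.352), (0.880, 0.385),
            (0.907, 0.414), (0.920, 0.428)]

# Present each pattern from 1950 onward three times as often as the others
oversampled = []
for x, t in patterns:
    copies = 3 if x >= 0.800 else 1     # 1950 re-scales to 0.800
    oversampled.extend([(x, t)] * copies)

print(len(oversampled))    # 4 + 6 * 3 = 22 patterns
```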
Figure 2.6. World population prediction with a 1-3-4-1 artificial neural network with over-sampling.
TABLE IV. Weight values corresponding to the neural network in figure 2.4

WEIGHT  VALUE
w1      -2.6378
w2       2.4415
w3       1.6161
w4      -1.3550
w5      -3.6308
w6       3.0321
w7      -1.3795
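The reported forecast can be checked by evaluating the network by hand. The assignment of w1, ..., w7 to connections below is our reading of figure 2.4 (w1, w2 feed the hidden neurons from the input, w3, w4 are the hidden bias weights, w5, w6 feed the output neuron, and w7 is its bias weight); with the weights of Table IV and input 1.0 (the year 2025) it reproduces the MetaNeural prediction of about 0.481:

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

# Weights from Table IV (the connection mapping is an assumption from figure 2.4)
w1, w2, w3, w4, w5, w6, w7 = (-2.6378, 2.4415, 1.6161, -1.3550,
                              -3.6308, 3.0321, -1.3795)

def predict(x):
    h1 = sigmoid(w1 * x + w3)                # hidden neuron 1
    h2 = sigmoid(w2 * x + w4)                # hidden neuron 2
    return sigmoid(w5 * h1 + w6 * h2 + w7)   # output neuron

out = predict(1.0)    # the year 2025 normalizes to 1.0
print(out)            # close to the reported 0.48118
```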
2.6 Conclusions
A neural network can be viewed as a least-squares, model-free, regression-like approximator
that can implement almost any map. Building a forecasting model for the world's
population with a simple neural network proceeds similarly to regression analysis and is
relatively straightforward. The fact that neural networks are model-free approximators is often
an advantage over traditional statistical forecasting methods and standard time series analysis
techniques. Where neural networks differ from standard regression techniques is in the way
the least-squares error minimization is implemented: while regression techniques
rely on closed one-step analytical formulas, the neural network approach employs an iterative
numerical backpropagation algorithm.
2.7 Exercises for the brave
1.
Derive the expressions for the parameters a, b, c, d, and e for the following regression
model:

y = a + b·e^(c·x) + d·x^e

and forecast the world's population for the year 2025 based on this model.
2.
Write a MATLAB program that implements the evaluation of the network shown in
figure 2.4 and verify the population forecast for the year 2025 based on this 1-2-1 neural
network model and the weights shown in TABLE IV.
3.
Expand the MATLAB program of exercise 2 into a program that can train the weights of a
neural network based on a random search model. I.e., start with an initial random
collection of weights (let's say all chosen from a uniform random distribution
between -1.0 and +1.0). Then iteratively adjust the weights by making small random
perturbations (one weight at a time), evaluate the new error after showing all the training
samples, and retain the perturbed weight if the error is smaller. Repeat this process until the
network has a reasonably small error.
2.8 References
[1] P. Werbos, "Beyond regression: New tools for prediction and analysis in the behavioral
sciences," Ph.D. thesis, Harvard University (1974).
[2] D. E. Rumelhart, G. Hinton, and R. J. Williams, "Learning internal representations by
error propagation," in Parallel Distributed Processing: Explorations in the Microstructure
of Cognition, Vol. 1, D. E. Rumelhart and J. L. McClelland, Eds., Chapter 8, pp. 318-362,
MIT Press, Cambridge, MA (1986).
[3] Malthus, "An Essay on the Principle of Population," 1798. Republished in the Pelican
Classics series, Penguin Books, England (1976).
[4] Otto Johnson, Ed., "1997 Information Please Almanac," Houghton Mifflin Company,
Boston & New York (1996).
The MetaNeural training file contains 10 training patterns, numbered 0 through 9, with the
POPnor target values of Table II:

pattern  target
0        0.000    (first training pattern)
1        0.067    (second training pattern)
2        0.144
3        0.206
4        0.287
5        0.318
6        0.352
7        0.385
8        0.414
9        0.428