
COURSE ANNOUNCEMENT Spring 2004

DSES-4810-01 Intro to COMPUTATIONAL INTELLIGENCE & SOFT COMPUTING
With ever-increasing computer power readily available, novel engineering methods based on soft computing are emerging at a rapid rate. This course provides students with a working knowledge of computational intelligence (CI), covering the basics of fuzzy logic, neural networks, genetic algorithms and evolutionary computing, simulated annealing, wavelet analysis, artificial life, and chaos. Applications in control, forecasting, optimization, data mining, fractal image compression, and time series analysis are illustrated with engineering case studies.
This course provides a hands-on introduction to the fascinating discipline of computational intelligence (i.e., the synergistic interplay of fuzzy logic, genetic algorithms, neural networks, and other soft computing techniques). Students will develop the skills to solve engineering problems with computational intelligence paradigms. The course requires a CI-related project in the student's area of interest.
Instructor:      Prof. Mark J. Embrechts (x4009, embrem@rpi.edu)
Office Hours:    Thursday 10-11 am (CII 5217), or by appointment
Class Time:      Monday/Thursday 8:30-9:50 am (Amos Eaton Hall 216)
Text (optional): J. S. Jang, C. T. Sun, E. Mizutani, Neuro-Fuzzy and Soft Computing, Prentice Hall, 1996 (1998). ISBN 0-13-261066-3

Course is open to graduate students and seniors of all disciplines.
GRADING:
Tests                   10%
5 Homework Projects     35%
Course Project          40%
Presentation            15%
ATTENDANCE POLICY
Course attendance is mandatory; a make-up project is required for each missed class. A missed class without a make-up results in the loss of half a grade point.
ACADEMIC HONESTY
Homework projects are individual exercises. You may discuss assignments with your peers, but not copy. The course project may be done in groups of two.

COMPUTATIONAL INTELLIGENCE COURSE OUTLINE

1. INTRODUCTION TO ARTIFICIAL NEURAL NETWORKS (ANNs)
1.1 History
1.2 Philosophy of neural nets
1.3 Overview of neural nets

2. INTRODUCTION TO FUZZY LOGIC
2.1 History
2.2 Philosophy of fuzzy logic
2.3 Terminology and definitions

3. INTRODUCTION TO EVOLUTIONARY COMPUTING
3.1 Introduction to genetic algorithms
3.2 Evolutionary computing / evolutionary programming / genetic programming
3.3 Terminology and definitions

4. NEURAL NETWORK APPLICATIONS / DATA MINING WITH ANNs
4.1 Case study: time series forecasting (population forecasting)
4.2 Case study: automated discovery of novel pharmaceuticals (Part I)
4.3 Data mining with neural networks

5. FUZZY LOGIC APPLICATIONS / FUZZY EXPERT SYSTEMS
5.1 Fuzzy logic case study: tipping
5.2 Fuzzy expert systems

6. SIMULATED ANNEALING / GENETIC ALGORITHM APPLICATIONS
6.1 Simulated annealing
6.2 Supervised clustering with GAs
6.3 Case study: automated discovery of novel pharmaceuticals (Part II)

7. DATA VISUALIZATION WITH SELF-ORGANIZING MAPS
7.1 The Kohonen feature map
7.2 Case study: visual explorations for novel pharmaceuticals (Part III)

8. ARTIFICIAL LIFE
8.1 Cellular automata
8.2 Self-organized criticality
8.3 Case study: highway traffic jam simulation

9. FRACTALS and CHAOS
9.1 Fractal dimension
9.2 Introduction to chaos
9.3 Iterated function systems

10. WAVELETS
Monday, January 12, 2004
DSES-4810-01 Intro to COMPUTATIONAL INTELLIGENCE & SOFT COMPUTING

Instructor:      Prof. Mark J. Embrechts (x4009 or 371-4562) (embrem@rpi.edu)
Office Hours:    Tuesday 10-12 am (CII 5217), or by appointment
Class Time:      Monday/Thursday 8:30-9:50 am (Amos Eaton Hall 216)
TEXT (optional): J. S. Jang, C. T. Sun, E. Mizutani, Neuro-Fuzzy and Soft Computing, Prentice Hall, 1996 (1998). ISBN 0-13-261066-3

LECTURES #1-3: INTRO to Neural Networks
The purpose of the first two lectures is to present an overview of the philosophy of artificial neural networks. Today's lecture will provide a brief history of neural network development and motivate the idea of training a neural network. We will introduce a neural network as a framework to generate a map from an input space to an output space. Three basic premises will be discussed to explain artificial neural networks:
(1) A problem can be formulated and represented as a map from an m-dimensional space R^m to an n-dimensional space R^n, or R^m -> R^n.
(2) Such a map can be realized by setting up an equivalent artificial framework of basic building blocks of McCulloch-Pitts artificial neurons. This collection of artificial neurons forms an artificial neural network or ANN.
(3) The neural net can be trained to conform to the map based on samples of the map, and will reasonably generalize to new cases it has not encountered before.
Handouts:
1. Mark J. Embrechts, "Problem Solving with Artificial Neural Networks."
2. Course outline and policies.
Tasks:

Start thinking about a project topic; meet with me during office hours or by appointment.

PROJECT DEADLINES:
January 22    Homework Project #0 (web page summary)
January 29    Project proposal (2 typed pages: title, references, motivation, deliverable, evaluation criteria)

WHAT IS EXPECTED FROM THE CLASS PROJECT?

Prepare a monologue about a course-related subject (15 to 20 written pages, with supporting material in appendices).

Prepare a 20-minute lecture about your project and give the presentation. Hand in a hard copy of your slides.

A project starts in the library. Prepare to spend at least a full day in the library over the course of the project. Meticulously write down all the relevant references, and attach a copy of the most important references to your report.

The idea for the lecture and the monologue is that you spend the maximum amount of effort to allow a third party to present that same material, based on your preparation, with a minimal amount of effort.

The project should be a finished and self-consistent document where you meticulously digest the prerequisite material, give a brief introduction to your work, and motivate the relevance of the material. Hands-on program development and personal expansions of and reflections on the literature are strongly encouraged. If your project involves programming, hand in a working version of the program (with source code) and document the program with a user's manual and sample problems.

It is expected that you spend on average 6 hours/week on the class project.


PROJECT PROPOSAL

A project proposal should be a fluent text of at least 2 full pages in which you try to sell the idea for a research project in a professional way. The proposal should therefore contain a clear background and motivation.
The proposal should define a clear set of goals, deliverables, and a timetable. Identify how you would consider your project successful and address the evaluation criteria.

Make sure you select a title (acronyms and logos are suggested as well), and add a list of references to your proposal.

PROBLEM SOLVING WITH ARTIFICIAL NEURAL NETWORKS
Mark J. Embrechts
1. INTRODUCTION TO NEURAL NETWORKS
1.1 Artificial neural networks in a nutshell

This introduction to artificial neural networks explains as briefly as possible what is commonly understood by an artificial neural network and how it can be applied to solve data mining problems. Only the most popular type of neural network will be discussed here: the feedforward neural network (usually trained with the popular backpropagation algorithm). Neural nets emerged from psychology as a learning paradigm that mimics how the brain learns. There are many different types of neural networks, training algorithms, and different ways to interpret how and why a neural network operates. A neural network problem is viewed in this write-up as a model-free implementation of a map, and it is silently assumed that most data mining problems can be framed as a map. This is a very limited view, which does not fully cover the power of artificial neural networks. However, this view leads to an intuitive basic understanding of the neural network approach for problem solving with a minimum of otherwise necessary introductory material.
Three basic premises will be discussed in order to explain artificial neural networks:
(1) A problem can be formulated and represented as a map from an m-dimensional space R^m to an n-dimensional space R^n, or R^m -> R^n.
(2) Such a map can be implemented by constructing an artificial framework of basic building blocks of McCulloch-Pitts artificial neurons. This collection of artificial neurons forms an artificial neural network (ANN).
(3) The neural net can be trained to conform to the map based on samples of the map, and will reasonably generalize to new cases it has not encountered before.
The next sections expand on these premises and explain what is meant by a map, a McCulloch-Pitts neuron, an artificial neural network or ANN, training, and generalization.
1.2 Framing an equivalent map for a problem

Let us start by considering a token problem and reformulating this problem as a map. The token problem involves deciding whether a seven-bit binary number is odd or even. To restate this problem as a map, two spaces are considered: a seven-dimensional input space containing all the seven-bit binary numbers, and a one-dimensional output space with just two elements (or classes), odd or even, which will be symbolically represented by a one or a zero. Such a map can be interpreted as a transformation from R^m to R^n, or R^m -> R^n (with m = 7 and n = 1). A map for the seven-bit parity problem is illustrated in figure 1.1.
Figure 1.1 The seven-bit parity problem posed as a mapping problem: the seven-bit binary numbers 0000000, 0000001, ..., 1111111 in R^7 are mapped to the values 1 or 0 in R^1.

The seven-bit parity problem was just framed as a formal mapping problem. The specific details of the map are yet to be determined: all we have so far is the hope that a precise function can be formulated that transforms the seven-bit binary input space to a 1-dimensional, 1-bit output space and thereby solves the seven-bit parity problem. We hope that eventually we can specify a green box that formally could be implemented as a subroutine in a C code, where the subroutine would have a header of the type:
void Parity_Mapping(VECTOR sample, int *decision) {
    // body still to be determined:
    // ... code that maps the seven-bit input vector onto a parity decision ...
    *decision = ... ;   // 1 if the input vector is odd, 0 if it is even
} // end of subroutine

In other words: given a seven-bit binary vector as an input to this subroutine (e.g., {1, 0, 1, 1, 0, 0, 1}), we expect the subroutine to return an integer nicknamed "decision". The value for decision will turn out to be unity or zero, depending on whether the seven-bit input vector is odd or even.
We call this methodology a green box approach to problem solving to imply that we only hope that such a function can eventually be realized, but that so far, we are clueless about how exactly we are going to fill the body of that green box. Of course, you have probably guessed by now that somehow artificial neural networks will be applied to do this job for us. Before elaborating on neural networks we still have to discuss a subtle but important point related to our way of solving the seven-bit parity problem. Implicitly it is assumed for this problem that all seven-bit binary numbers are available and that the parity of each seven-bit binary number is known.

Let us complicate the seven-bit parity problem by specifying that we know, for the time being, the correct parity for only 120 of the 128 possible seven-bit binary numbers. We want to specify a map for these 120 seven-bit binary numbers such that the map will correctly identify the eight remaining binary numbers. This is a much more difficult problem than mapping the seven-bit parity problem based on all the possible samples, and whether an answer exists and can be found for this type of problem is often not clear at all from the onset. The methodology for learning what has to go in the green box for this problem will divide the available samples for this map into a training set (a subset of the known samples) and a test set. The test set will be used only for evaluating the goodness of the green box implementation of the map.
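As a concrete illustration of this setup, the short Python sketch below builds the full table of 128 seven-bit input vectors with their parity labels and holds 8 of them out as a test set, exactly as described above. The sketch is only meant to make the data layout explicit; the variable names are illustrative and not part of the original text.

    import itertools
    import random

    # Build the complete map R^7 -> R^1 for the seven-bit parity problem:
    # every seven-bit vector paired with its parity label (1 = odd, 0 = even).
    samples = [(bits, sum(bits) % 2) for bits in itertools.product([0, 1], repeat=7)]

    # Keep the parity of only 120 numbers for training; the remaining 8 form the test set.
    random.seed(0)
    random.shuffle(samples)
    train_set, test_set = samples[:120], samples[120:]

    print(len(train_set), "training samples,", len(test_set), "test samples")
    print("example training sample (input vector, parity):", train_set[0])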
Let us introduce a second example to illustrate how a regression problem can be reformulated as a mapping problem. Consider a collection of images of circles: all 64x64 black and white (B&W) pixel images. The problem here is to infer the radii of these circles based on the pixel values. Figure 1.2 illustrates how to formulate this problem as a formal map. A 64x64 image could be scanned row by row and be represented by a string of zeros and ones, depending on whether each pixel is white or black. This input space has 64x64 or 4096 binary elements and can therefore be considered as a space with 4096 dimensions. The output space is a one-dimensional number, being the radius of the circle in the appropriate units.
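The sketch below makes this input representation concrete: a 64x64 binary image is scanned row by row into a 4096-dimensional vector that, paired with the known radius, forms one sample of the map. The synthetic circle image and the helper names are illustrative assumptions, not part of the original text.

    import numpy as np

    def circle_image(radius, size=64):
        # Synthetic 64x64 B&W image: pixel is 1 (black) inside the circle, 0 (white) outside.
        y, x = np.mgrid[0:size, 0:size]
        return ((x - size / 2) ** 2 + (y - size / 2) ** 2 <= radius ** 2).astype(int)

    radius = 17.0
    image = circle_image(radius)

    # Scan the image row by row into a single string of zeros and ones: one point in R^4096.
    input_vector = image.reshape(-1)      # shape (4096,)
    sample = (input_vector, radius)       # one (input, output) sample of the map R^4096 -> R^1

    print(input_vector.shape, "->", sample[1])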
We generally would not expect for this problem to have access to all possible 64x64 B&W images of circles to determine the mapping function. We therefore would only consider a representative sample of circle images, somehow use a neural network to fill out the green box to specify the map, and hope that it will give the correct circle radius within a certain tolerance for future out-of-sample 64x64 B&W images of circles. It actually turns out that the formal mapping procedure as described so far would yield lousy estimates for the radius. Some ingenious form of preprocessing on the image data (e.g., considering selected frequencies of a 2D Fourier transform) will be necessary to reduce the dimensionality of the input space.
Most problems can be formulated in multiple ways as a map of the type R^m -> R^n. However, not all problems can be elegantly transformed into a map, and some formal mapping representations might be better than others for a particular problem. Often ingenuity, experimentation, and common sense are called for to frame an appropriate map that can adequately be represented by artificial neural networks.

Figure 1.2 Determining the radius of a 64x64 B&W image of a circle, posed as a formal mapping problem (a map from R^4096 to R^1).

1.3 The McCulloch-Pitts neuron and artificial neural networks

The first neural network premise states that most problems can be formulated as an equivalent formal mapping problem. The second premise states that such a map can be represented by an artificial neural network (or ANN): i.e., a framework of basic building blocks, the so-called McCulloch-Pitts artificial neurons.
The McCulloch-Pitts neuron was first proposed in 1943 by Warren McCulloch and Walter Pitts, a psychologist and a mathematician, in a paper illustrating how simple artificial representations of neurons could in principle represent any arithmetic function. How to actually implement such a function was first addressed by the psychologist Donald Hebb in 1949 in his book "The Organization of Behavior." The McCulloch-Pitts neuron can easily be understood as a simple mathematical operator. This operator has several inputs and one output and performs two elementary operations on the inputs: first it makes a weighted sum of all the inputs, and then it applies a functional transform to that sum, which is sent to the output. Assume that there are N inputs {x1, x2, ..., xN}, or an input vector x, and consider the output y. The output y can be expressed as a function of its inputs according to the following equations:
    sum = \sum_{i=1}^{N} x_i                                (1)

and

    y = f(sum)                                              (2)

So far we have not yet specified the transfer function f(.). In its most simple form it is just a threshold function, giving an output of unity when the sum exceeds a certain value, and zero when the sum is below this value. It is common practice in neural networks to use as a transfer function the sigmoid function, which can be expressed as:

    f(sum) = \frac{1}{1 + e^{-sum}}                         (3)

Figure 1.3 illustrates the basic operations of a McCulloch-Pitts neuron. It is common practice to apply an appropriate scaling to the inputs (usually such that either 0 < x_i < 1, or -1 < x_i < 1).

Figure 1.3 The McCulloch-Pitts artificial neuron as a mathematical operator: inputs x1, ..., xN are multiplied by weights w1, ..., wN, summed, and passed through the transfer function f(.).

One more enhancement has to be clarified for the basics of the McCulloch-Pitts neuron: before summing the inputs, they actually have to be modified by multiplying them with a weight vector {w1, w2, ..., wN}, so that instead of using equation (1) and summing the inputs, we will make a weighted sum of the inputs according to equation (4):

    sum = \sum_{i=1}^{N} w_i x_i                            (4)
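To make equations (2)-(4) concrete, here is a minimal Python sketch of a single McCulloch-Pitts neuron with a sigmoid transfer function; the example inputs and weights are arbitrary values chosen only for illustration.

    import numpy as np

    def sigmoid(z):
        # Equation (3): the sigmoid transfer function.
        return 1.0 / (1.0 + np.exp(-z))

    def mcculloch_pitts_neuron(x, w):
        # Equation (4): weighted sum of the inputs, followed by
        # equation (2): the transfer function applied to that sum.
        return sigmoid(np.dot(w, x))

    x = np.array([0.2, 0.7, 0.1])    # scaled inputs, e.g. 0 < x_i < 1
    w = np.array([1.5, -0.8, 0.3])   # weight vector
    print(mcculloch_pitts_neuron(x, w))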

A collection of these basic operators can be stacked in a structure (an artificial neural network) that can have any number of inputs and any number of outputs. The neural network shown in figure 1.4 represents a map from two inputs to one output. There are two fan-out input elements and a total of six neurons. There are three layers of neurons: the first layer is called the first hidden layer, the second layer is the second hidden layer, and the output layer consists of one neuron. There are 14 weights. The layers are fully connected. In this example there are no backward connections, and this type of neural net is therefore called a feedforward network. The type of neural net of figure 1.4 is the most commonly encountered type of artificial neural network, the feedforward net:
(1) There are no connections skipping layers.
(2) The layers are fully connected.
(3) There is usually at least one hidden layer.

It is not hard to envision now that any map can be translated into an artificial neural network structure, at least formally. How to determine the right weight set and how many neurons to locate in the hidden layers has not yet been addressed. This is the subject of the next section.
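A minimal sketch of the forward pass through such a fully connected feedforward network (two inputs, two hidden layers of three and two neurons, one output neuron, 14 weights, as in figure 1.4) is shown below. The weights are random placeholders, since determining them is exactly the question deferred to the next section.

    import numpy as np

    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    def forward(x, weight_matrices):
        # Forward pass through a fully connected feedforward net (no skipped layers):
        # weighted sums, then the transfer function, layer by layer.
        activation = x
        for W in weight_matrices:
            activation = sigmoid(W @ activation)
        return activation

    rng = np.random.default_rng(0)
    layer_sizes = [2, 3, 2, 1]                      # 2 inputs, two hidden layers, 1 output neuron
    weights = [rng.uniform(-1, 1, size=(n_out, n_in))
               for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:])]
    print(sum(W.size for W in weights), "weights")  # 2*3 + 3*2 + 2*1 = 14
    print(forward(np.array([0.3, 0.9]), weights))   # output of the single output neuron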

Figure 1.4 Typical artificial feedforward neural network: two inputs x1 and x2, a first hidden layer, a second hidden layer, and one output neuron producing y.

1.4 Artificial neural networks

An artificial neural network is a collection of connected McCulloch-Pitts neurons. Neural networks can formally represent almost any functional map provided that:
(1) a proper number of basic neurons are appropriately connected, and
(2) appropriate weights are selected.

Specifying an artificial neural network to conform with a particular map means determining the neural network structure and its weights. How to connect the neurons and how to select the weights is the subject of the discipline of artificial neural networks. Even when a neural network can in principle represent any function or map, it is not necessarily clear that one can ever specify such a neural network with the existing algorithms. This section will briefly address how to set up a neural network, and give at least a conceptual idea about determining an appropriate weight set.
The feedforward neural network of figure 1.4 is the most commonly encountered type of
artificial neural net. For most functional maps at least one hidden layer of neurons, and
sometimestwohiddenlayersofneuronsarerequired.Thestructurallayoutofafeedforward
neuralnetworkcannowbedetermined.Forafeedforwardlayeredneuralnetworktwopoints
havetobeaddressedtodeterminethelayout:

(1) Howmanyhiddenlayerstouse?
(2) Howmanyneuronstochooseineachhiddenlayer?
Different experts in the field have often different answers to these questions. A general
guidelinethatworkssurprisinglywellistotryonehiddenlayerfirst,andtochooseasfew
neuronsinthehiddenlayer(s)asonecangetawaywith.
The most intriguing question still remains and addresses the third premise of neural networks: is it actually possible to come up with algorithms that allow us to specify a good weight set? How do we determine the weights of the network from samples of the map? Can we expect a reasonable answer from such a network for new cases that were not encountered before?
It is straightforward to devise algorithms that will determine a weight set for neural networks that contain just an input layer and an output layer and no hidden layer(s) of neurons. However, such networks do not generalize well at all. Neural networks with good generalization capabilities require at least one hidden layer of neurons. For many applications such neural nets generalize surprisingly well. The need for hidden layers in artificial neural networks was already realized in the late fifties. However, in the 1969 book "Perceptrons," the MIT professor Marvin Minsky (with Seymour Papert) argued that it might not be possible at all to come up with any algorithm to determine a suitable weight set if hidden layers are present in the network structure. Such an algorithm only emerged widely in 1986: the backpropagation algorithm, popularized by Rumelhart and McClelland in a very clearly written chapter in their book "Parallel Distributed Processing." The backpropagation algorithm was actually invented and reinvented several times, and its original formulation is generally credited to Paul Werbos. He described the backpropagation algorithm in his Harvard Ph.D. dissertation in 1974, but the algorithm was not widely noted at that time. The majority of today's neural network applications relies in one form or another on the backpropagation algorithm.
1.5 Training neural networks

The result of training a neural network is its weight set. Determining an appropriate weight set is called training or learning, based on the metaphor that learning takes place in the human brain, which can be viewed as a collection of connected biological neurons. The learning rule proposed by Hebb was the first mechanism for determining the weights of a neural network. The Canadian Donald Hebb postulated this learning strategy in the late forties as one of the basic mechanisms by which humans and animals can learn. Later on it turned out that he had hit the nail on the head with his formulation. Hebb's rule is surprisingly simple, and while in principle Hebb's rule can be used to train multi-layered neural networks, we will not elaborate further on this rule. Let us just point out here that there are now many different neural network paradigms and many algorithms for determining the weights of a neural network. Most of these algorithms work iteratively: i.e., one starts out with a randomly selected weight set, applies one or more samples of the mapping, and gradually upgrades the weights. This iterative search for a proper weight set is called the learning or training phase.

Before explaining the workings of the backpropagation algorithm we will present a simple alternative: the random search. The most naive way to determine a weight set (which, rather surprisingly in hindsight, did not emerge before the backpropagation principle was formulated) is just to try randomly generated weight sets, and to keep trying with new randomly generated weight sets until one hits it just right. The random search is at least in principle a way to determine a suitable weight set, if it weren't for its excessive demands on computing time. While this method sounds too naive to give it even serious thought, smart random search paradigms (such as genetic algorithms and simulated annealing) are nowadays actually legitimate and widely used training mechanisms for neural networks. However, random search methods come with many bells and whistles and are extremely demanding on computing time. Only the wide availability of ever faster computers has made this method practical at all.
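A minimal sketch of the naive random search described above follows; it simply keeps whichever randomly generated weight set gives the lowest error on the training samples. The error function used here is a toy stand-in for a network's training error, and all names are illustrative. As the text stresses, this is far too slow for serious use.

    import numpy as np

    def random_search(error_of, n_weights, n_trials=10000, seed=0):
        # Naive random search: keep trying random weight sets, retain the best one found so far.
        rng = np.random.default_rng(seed)
        best_w, best_err = None, np.inf
        for _ in range(n_trials):
            w = rng.uniform(-1.0, 1.0, size=n_weights)   # candidate weight set
            err = error_of(w)                            # error on the training samples
            if err < best_err:
                best_w, best_err = w, err
        return best_w, best_err

    # Toy error function standing in for the network's training error:
    toy_error = lambda w: np.sum((w - 0.25) ** 2)
    w, e = random_search(toy_error, n_weights=14)
    print(e)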
The process of determining the weights of a neural net proceeds in two separate stages. In the first stage, the training phase, one applies an algorithm to determine a (hopefully good) weight set with about 2/3 of the available mapping samples. The generalization performance of the just-trained neural net is subsequently evaluated in the testing phase based on the remaining samples of the map.
1.6 The backpropagation algorithm

An error measure can be defined to quantify the performance of a neural net. This error function depends on the weight values and the mapping samples. Determining the weights of a neural network can therefore be interpreted as an optimization problem, where the performance error of the network structure is minimized for a representative sample of the mappings. All paradigms applicable to general optimization problems therefore apply to neural nets as well. The backpropagation algorithm is elegant and simple, and is used in eighty percent of neural network applications. It consistently gives at least reasonably acceptable answers for the weight set. The backpropagation algorithm cannot be applied to just any optimization problem, but is specifically tailored to multi-layer feedforward neural networks.
There are many ways to define the performance error of a neural network. The most commonly applied error measure is the mean square error. This error, E, is determined by showing every sample to the net and totaling the squared differences between the actual output, o, and the desired target output, t, according to equation (5):

    E = \sum_{i=1}^{n_{outputs}} (o_i - t_i)^2              (5)

Training a neural network starts out with a randomly selected weight set. A batch of samples is shown to the network, and an improved weight set is obtained by iteration following equations (6) and (7). The new weights for a particular connection (labeled ij) at iteration (n+1) are an improvement over the weights from iteration (n), obtained by moving a small amount along the gradient of the error surface towards the direction of the minimum:

    w_{ij}(n+1) = w_{ij}(n) + \Delta w_{ij}                 (6)

    \Delta w_{ij} = -\alpha \frac{dE}{dw_{ij}}              (7)

Equations (6) and (7) represent an iterative steepest descent algorithm, which will always converge to a local minimum of the error function provided that the learning parameter, alpha, is small. The ingenuity of the backpropagation algorithm was to come up with a simple analytical expression for the gradient of the error in multi-layered nets by a clever application of the chain rule. While it was for a while commonly believed that the backpropagation algorithm was the only practical algorithm to implement equation (7), it is worth pointing out that the derivative of E with respect to the weights can easily be estimated numerically by tweaking the weights a little bit. This approach is perfectly valid, but is significantly slower than the elegant backpropagation formulation. The details of deriving the backpropagation algorithm can be found in the literature.
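The sketch below illustrates equations (6) and (7) with the numerical-gradient shortcut mentioned in the last paragraph: the derivative dE/dw is estimated by tweaking each weight a little, and the weights are then updated by steepest descent. It assumes a generic error function E(w) over a flattened weight vector; it is not the backpropagation algorithm itself, only the slower finite-difference alternative.

    import numpy as np

    def numerical_gradient(E, w, eps=1e-6):
        # Estimate dE/dw_ij by tweaking each weight a little bit (finite differences).
        grad = np.zeros_like(w)
        for i in range(w.size):
            w_plus, w_minus = w.copy(), w.copy()
            w_plus[i] += eps
            w_minus[i] -= eps
            grad[i] = (E(w_plus) - E(w_minus)) / (2 * eps)
        return grad

    def steepest_descent(E, w, alpha=0.1, n_iter=500):
        # Equations (6) and (7): w(n+1) = w(n) - alpha * dE/dw.
        for _ in range(n_iter):
            w = w - alpha * numerical_gradient(E, w)
        return w

    # Toy quadratic error standing in for the mean square error of equation (5):
    E = lambda w: np.sum((w - np.array([0.5, -0.3])) ** 2)
    print(steepest_descent(E, np.zeros(2)))   # converges towards [0.5, -0.3]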
1.7 More neural network paradigms

So far, we have briefly described how feedforward neural nets can solve problems by recasting the problem as a formal map. The workings of the backpropagation algorithm to train a neural network were formally explained. While the views and algorithms presented here conform with the mainstream approach to neural network problem solving, there are literally hundreds of different neural network types and training algorithms. Recasting the problem as a formal map is just one part and one view of neural nets. For a broader view on neural networks we refer to the literature.
At least two more paradigms revolutionized and popularized neural networks in the eighties: the Hopfield net and the Kohonen net. The physicist John Hopfield gained attention for neural networks in 1983 when he wrote a paper in the Proceedings of the National Academy of Sciences indicating how neural networks form an ideal framework to simulate and explain the statistical mechanics of phase transitions. The Hopfield net can also be viewed as a recurrent content-addressable memory that can be applied to image recognition and traveling-salesman-type optimization problems. For several specialized applications, this type of network is far superior to any other neural network approach. The Kohonen network, proposed by the Finnish professor Teuvo Kohonen, on the other hand, is a one-layer feedforward network that can be viewed as a self-learning implementation of the K-means clustering algorithm for vector quantization, with powerful self-organizing properties and biological relevance.
Other popular, powerful, and clever neural network paradigms are the radial basis function network, the Boltzmann machine, the counterpropagation network, and the ART (adaptive resonance theory) networks. Radial basis function networks can be viewed as a powerful general regression technique for multi-dimensional function approximation, employing Gaussian transfer functions with different standard deviations. The Boltzmann machine is a recursive simulated-annealing type of network with arbitrary network configuration. Hecht-Nielsen's counterpropagation network cleverly combines a feedforward neural network structure with a Kohonen layer. Grossberg's ART networks use a similar idea but can be elegantly implemented in hardware and retain a high level of biological plausibility.
There is room as well for more specialized networks such as Oja's rules for principal component analysis, wavelet networks, cellular automata networks, and Fukushima's neocognitron. Wavelet networks utilize the powerful wavelet transform and generally combine elements of the Kohonen layer with radial basis function techniques. Cellular automata networks are a neural network implementation of the cellular automata paradigm, popularized by Mathematica's inventor, Stephen Wolfram. Fukushima's neocognitron is a multi-layered network with weight sharing and feature extraction properties that has shown the best performance for handwriting and OCR recognition applications.
A variety of higher-order methods improve the speed of the backpropagation approach. Most widely applied are conjugate gradient methods and the Levenberg-Marquardt algorithm. Recursive networks with feedback connections are applied more and more, especially in neuro-control problems. For control applications, specialized and powerful neural network paradigms have been developed, and it is worthwhile noting that a one-to-one equivalence can be derived between feedforward neural nets of the backpropagation type and Kalman filters. Fuzzy logic and neural networks are often combined for control problems.
There is no shortage of neural network tools, and most paradigms can be applied to a wide range of problems. Most neural network implementations rely on the backpropagation algorithm. However, which neural network paradigm to use is often a secondary question, and whatever the user feels comfortable with is fair game.

1.8 Literature

The domain of artificial neural networks is vast and the literature is expanding at a fast rate. Knowing that this list is far from complete, let me briefly discuss my favorite neural network references in this section. Note also that an excellent, comprehensive introduction to neural networks can be found in the frequently asked questions (FAQ) files on neural networks at various WWW sites (i.e., search for "FAQ neural networks" in AltaVista).
Jose Principe
Probably the standard textbook now for teaching neural networks. Comes with a demo version of NeuroSolutions.

Neural and Adaptive Systems: Fundamentals Through Simulations, Jose Principe, Neil R. Euliano, and W. Curt Lefebvre, John Wiley, 2000.
Hagan, Demuth, and Beale
An excellent book for basic, comprehensive undergraduate teaching, going back to basics with lots of linear algebra and with good MATLAB illustration files:
Neural Network Design, Hagan, Demuth, and Beale, PWS Publishing Company, 1996.
Joseph P. Bigus
Bigus wrote an excellent introduction to neural networks for data mining for the non-technical reader. The book makes a good case for why neural networks are an important data mining tool and discusses the power and limitations of neural networks for data mining. Some conceptual case studies are discussed. The book does not really discuss the theory of neural networks, or how exactly to apply neural networks to a data mining problem, but it nevertheless gives many practical hints and tips.
Data Mining with Neural Networks: Solving Business Problems from Application Development to Decision Support, McGraw-Hill (1997).
Maureen Caudill
Maureen Caudill has published several books that aim at the beginner's market and provide valuable insight into the workings of neural nets. More than her books, I would recommend a series of articles that appeared in the popular monthly magazine AI EXPERT. Collections of Caudill's articles are bundled as separate special editions of AI EXPERT.
Phillip D. Wasserman
Wasserman published two very readable books explaining neural networks. He has a knack for explaining difficult paradigms efficiently and understandably with a minimum of mathematical diversions.
Neural Computing, Van Nostrand Reinhold (1990).
Advanced Methods in Neural Computing, Van Nostrand Reinhold (1993).
Jacek M. Zurada
Zurada published one of the first books on neural networks that can be considered a textbook. It is aimed at an introductory-level graduate engineering course with an electrical engineering bias and comes with a wealth of homework problems and software.
Introduction to Artificial Neural Systems, West Publishing Company (1992).
Laurene Fausett
An excellent introductory textbook at the advanced undergraduate level with a wealth of homework problems.

Fundamentals of Neural Networks: Architectures, Algorithms, and Applications, Prentice Hall (1994).
Simon Haykin
Nicknamed "the bible of neural networks" by my students, this 700-page work can be considered both a desktop reference and an advanced graduate-level text on neural networks with challenging homework problems.
Neural Networks: A Comprehensive Foundation, Macmillan College Publishing Company (1995).
Mohammed H. Hassoun
Excellent graduate-level textbook with clear explanations and a collection of very appropriate homework problems.
Fundamentals of Artificial Neural Networks, MIT Press (1995).
John Hertz, Anders Krogh, and Richard G. Palmer
This book is one of the earlier, better books on neural networks and provides a thorough understanding of the various neural paradigms and how and why neural networks work. This book is excellent for its references and has an extremely high information density. Even though this book is heavy on the Hopfield network and the statistical mechanics interpretation, I probably consult this book more than any other. It does not lend itself well as a textbook, but for a while it was one of the few good books available. Highly recommended.
Introduction to the Theory of Neural Computation, Addison-Wesley Publishing Company (1991).

Timothy Masters
Masters wrote a series of three books in short succession, and I would call his collection of books the user's guide to neural networks. Whether you program your own networks or simply use them, the wealth of information is invaluable. The books come with software and all source code is included. The software is very powerful, but is geared toward the serious C++ user and lacks a decent user interface for the non-C++ initiated. A must for the beginner and the advanced user.
Practical Neural Network Recipes in C++, Academic Press, Inc. (1993).
Signal and Image Processing with Neural Networks, John Wiley (1994).
Advanced Algorithms for Neural Networks: A C++ Sourcebook, John Wiley (1995).
Bart Kosko
Advanced electrical engineering graduate-level textbook. Excellent for fuzzy logic and neural network control applications. Not recommended as a general introduction or advanced reference.
Neural Networks and Fuzzy Systems, Prentice Hall (1992).
Guido J. DeBoeck
If you are serious about applying neural networks for stock market speculation, this book is a good starting point. No theory, just applications.
Trading on the Edge: Neural, Genetic, and Fuzzy Systems for Chaotic Financial Markets, John Wiley & Sons (1994).

2. NEURAL NETWORK CASE STUDY: POPULATION FORECASTING

2.1 Introduction
The purpose of this case study is to present an overview of the philosophy of artificial neural networks. This case study will motivate the view of neural networks as a model-free regression technique. The study presented here describes how to estimate the world's population for the year 2025 based on traditional regression techniques and based on an artificial neural network. In the previous section an artificial neural network was explained as a biologically inspired model that can implement a map. This model is based on an interconnection of elementary McCulloch-Pitts neurons. It was postulated that:
(a) Most real-world problems can be formulated as a map.
(b) Such a map can be formally represented by an artificial neural network, where the so-called "weights" are the free parameters to be determined.
(c) Neural networks can "train" their weights to conform with a map using powerful computational algorithms. This model for the map does not only represent the "training samples" quite reasonably, but generally extrapolates well to "test samples" that were not used to train the neural network.

The most popular algorithm for training a neural network is the backpropagation algorithm, which has been rediscovered in various fields over and over again and is generally credited to Dr. Paul Werbos.[1] The backpropagation algorithm was widely popularized in 1986 by Rumelhart and McClelland,[2] which explains why the surge in popularity of artificial neural networks is a relatively recent phenomenon. For the derivation and implementation details of the backpropagation algorithm the reader is referred to the literature.
2.2 Population forecasting
The reverend Thomas Malthus identified in 1798 in his seminal work "An Essay on the Principle of Population"[3] that the world's population grows exponentially while agricultural output grows linearly, predicting gloom and doom for future generations. Indeed, the rapidly expanding population on our planet reminds us daily that the resources on our planet have to be carefully managed to survive gracefully during the next few decades. The data for the world's population from 1650 through 1996 are summarized in Table I and figure 2.1.[4]
TABLE I. Estimates of the world population (1650 - 1996)

YEAR    POPULATION (in millions)
1650     470
1750     694
1850    1091
1900    1571
1950    2513
1960    3027
1970    3678
1980    4478
1990    5292
1995    5734
1996    5772

In order to build a model for population forecasting we will normalize the data points (Table II). The year 1650 is re-scaled as 0.0 and 2025 as 1.0, and we interpolate linearly in between for all the other years. The reason for doing such a normalization is that it is customary (and often required) for neural networks to scale the data between zero and unity. Since our largest considered year will be 2025, it will be re-scaled as unity. The reader can easily verify that a linear re-normalization of a variable x between a maximum value (max) and a minimum value (min) leads to a re-normalized value (xnor) according to:

    x_{nor} = \frac{x - min}{max - min}

Because the population increases so rapidly with time, we will work with the natural logarithm of the population (in millions) and then re-normalize these data according to the above formula, where (anticipating the possibility of a large forecast for the world's population in 2025) we used 12 as the maximum possible value for the re-normalized logarithm of the population in 2025 and 6.153 as the minimum value. In other words: max in the above formula was arbitrarily assigned a value of 12 to assure that the neural net predictions can accommodate large values. Table II illustrates these transforms for the world population data.
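A short sketch of the normalization used for Table II follows; it rescales the years between 1650 and 2025 and the natural logarithm of the population between the bounds 6.153 and 12, exactly as described above. The variable names are illustrative only.

    import numpy as np

    years = np.array([1650, 1750, 1850, 1900, 1950, 1960, 1970, 1980, 1990, 1995, 1996])
    pop   = np.array([470, 694, 1091, 1571, 2513, 3027, 3678, 4478, 5292, 5734, 5772])  # millions

    def normalize(x, lo, hi):
        # Linear re-normalization: x_nor = (x - min) / (max - min)
        return (x - lo) / (hi - lo)

    year_nor = normalize(years, 1650, 2025)          # 1650 -> 0.0, 2025 -> 1.0
    pop_nor  = normalize(np.log(pop), 6.153, 12.0)   # ln(population), re-scaled to [0, 1]

    for y, yn, lp, pn in zip(years, year_nor, np.log(pop), pop_nor):
        print(f"{y}  {yn:.3f}  {lp:.3f}  {pn:.3f}")  # reproduces the columns of Table II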

Figure 2.1 Estimates of the world population between 1650 and 1996.

TABLE II. Estimates of world population and corresponding normalizations

YEAR    POP     YEARnor   ln(POP)   POPnor
1650     470    0.000     6.153     0.000
1750     694    0.267     6.542     0.067
1850    1091    0.533     6.995     0.144
1900    1571    0.667     7.359     0.206
1950    2513    0.800     7.829     0.287
1960    3027    0.827     8.015     0.318
1970    3678    0.853     8.210     0.352
1980    4478    0.880     8.407     0.385
1990    5292    0.907     8.574     0.414
1995    5734    0.920     8.654     0.428
1996    5772    0.923     8.661     0.429

2.3 Traditional regression model for population forecasting

First we will apply traditional regression techniques to population forecasting. The classical Malthusian model assumes that the population grows as an exponential curve. This is equivalent to stating that the natural logarithm of the population will grow linearly with time. Because the re-normalization in the previous paragraph re-scaled the population numbers first into their natural logarithms, we should be able to get by with a linear regression model for the re-scaled values. In other words, we are trying to determine the unknown coefficients a and b in the following population model:

    POP_{nor} = a \cdot YEAR_{nor} + b

or, using the traditional symbols Y and X for the dependent and the independent variables,

    Y = aX + b

It is customary in regression analysis to determine the coefficients a and b such that the sum of the squares of the errors (E) between the modeled values and the actual values is minimized. In other words, the following function needs to be minimized:

    E = \sum_{i=1}^{N} (y_i - Y_i)^2 = \sum_{i=1}^{N} (y_i - a x_i - b)^2

There are N data points, x_i and y_i are the actual data points, and the Y values are the estimates according to the model. The values of the coefficients a and b for which this error is minimal can be found by setting the partial derivatives of the error with respect to the unknown coefficients a and b equal to zero and solving this set of two equations for these unknown coefficients. This leads to the following:

    \frac{\partial E}{\partial a} = 0, \qquad \frac{\partial E}{\partial b} = 0

or

    \frac{\partial E}{\partial a} = -2 \sum_i (y_i - a x_i - b) x_i = 0

    \frac{\partial E}{\partial b} = -2 \sum_i (y_i - a x_i - b) = 0

It is left as an exercise to the reader to verify that this yields for a and b:

    a = \frac{N \sum_i x_i y_i - \sum_i x_i \sum_i y_i}{N \sum_i x_i^2 - \left(\sum_i x_i\right)^2}

    b = \bar{y} - a \bar{x}

where

    \bar{y} = \frac{1}{N} \sum_{i=1}^{N} y_i, \qquad \bar{x} = \frac{1}{N} \sum_{i=1}^{N} x_i

Table III illustrates the numerical calculation of a and b, where the first ten data entries were used (in other words, we do not consider the 1996 data point).
TABLE III. Normalized data points and the intermediate sums used to calculate a and b

Xnor    Ynor    xy      x^2
0.000   0.000   0.000   0.000
0.267   0.067   0.018   0.071
0.533   0.144   0.077   0.284
0.667   0.206   0.137   0.445
0.800   0.287   0.230   0.640
0.827   0.318   0.263   0.684
0.853   0.352   0.300   0.728
0.880   0.385   0.339   0.774
0.907   0.414   0.375   0.823
0.920   0.428   0.394   0.846
-----   -----   -----   -----
6.654   2.601   2.133   5.295   (column sums)
Expressions for a and b can be evaluated based on the data in Table III:

    a = \frac{10 \cdot 2.133 - 6.654 \cdot 2.601}{10 \cdot 5.295 - 6.654^2} = 0.464

    b = 0.260 - 0.464 \cdot 0.665 = -0.0486

Forecasting for the year 2025 according to the regression model yields the following for the normalized value of the population:

    y_{2025} = a \cdot 1.0 + b = 0.464 - 0.0486 = 0.415

When re-scaling back into the natural logarithm of the actual population we obtain:

    \ln(POP_{2025}) = (max - min) \cdot y_{2025} + min = (12 - 6.153) \cdot 0.415 + 6.153 = 8.580

The actual population estimate for the year 2025 is the exponential of this value, leading to an estimate of 5321 million people. Obviously this value is not what we would expect or accept as a forecast. What actually happened is that over the considered time period (1650 - 1996) the population has been exploding faster than exponentially, and the postulated exponential model is not a very good one. The flaws in this simple regression approach become obvious when we plot the data and their approximations in the re-normalized frame according to figure 2.2. Our model has an obvious flaw, but the approach we took here is a typical regression implementation. Only by plotting our data and predictions, and often after the fact, does the reason for the poor or invalid estimate become obvious. More seasoned statisticians would suggest that we try an approximation of the type:
    y = \frac{a + b e^{cx}}{dx + e}

or use ARMA models and/or other state-of-the-art time series forecasting tools. All these
methods are fair game for forecasting and can yield reliable estimates in the hands of the
experienced analyst. Nevertheless, from this simple case study we can conclude so far that
forecasting the world's population seems to be a challenging forecasting problem indeed.
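The least-squares formulas and the 2025 forecast above can be checked with a few lines of Python; this sketch only reproduces the arithmetic of Tables II-III and the back-transformation, with the same normalization bounds (1650/2025 for the year, 6.153/12 for the logarithm of the population). The variable names are illustrative.

    import numpy as np

    # Normalized data for the first ten entries of Table II (1650 - 1995).
    x = np.array([0.000, 0.267, 0.533, 0.667, 0.800, 0.827, 0.853, 0.880, 0.907, 0.920])
    y = np.array([0.000, 0.067, 0.144, 0.206, 0.287, 0.318, 0.352, 0.385, 0.414, 0.428])

    N = len(x)
    a = (N * np.sum(x * y) - np.sum(x) * np.sum(y)) / (N * np.sum(x**2) - np.sum(x)**2)
    b = np.mean(y) - a * np.mean(x)
    print(a, b)                      # roughly 0.464 and -0.049

    # Forecast for 2025 (normalized year = 1.0), then undo the normalization of ln(POP).
    y_2025 = a * 1.0 + b
    ln_pop_2025 = (12.0 - 6.153) * y_2025 + 6.153
    print(np.exp(ln_pop_2025))       # roughly 5300 million people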
2.4 Simple neural network model for population forecasting
In this section we will develop the neural network approach for building a population forecasting model. We will define a very simple network with one input element, two neurons in the hidden layer, and one output neuron. We will, however, include two bias nodes (dummy nodes with input unity), which is standard practice for most neural network applications. The network has common sigmoid transfer functions, and the bias is just an elegant way to allow some shifts in the transfer functions as well. The sigmoid transfer function can be viewed as a crude approximation to the threshold function. Remember that an artificial neuron can be viewed as a mathematical operator with the following functions:

Figure 2.2 Results from regression analysis on the logarithmically normalized data entries.

Figure 2.3 The sigmoid function f(z) = 1 / (1 + e^{-z}) as a crude approximation to the threshold function. Note that the introduction of bias nodes (i.e., dummy nodes with input unity, as shown in figure 2.4) allows horizontal shifts of the sigmoid (and/or threshold function), allowing more powerful and more flexible approximation.

a) Make a weighted sum of the input signals, resulting in a signal z.
b) Apply a transfer function f(z) to the signal z, which in the case of a sigmoid corresponds to:

    f(z) = \frac{1}{1 + e^{-z}}

as illustrated in figure 2.3.

Figure 2.4 is a representation of our simple neural network. Note that there are three neurons and two bias nodes. There are three layers: an input layer, one hidden layer, and an output layer. Only the hidden layer and the output layer contain neurons: such a network is referred to as a 1x2x1 net. The two operations of a neuron (weighted sum and transfer function) are symbolically represented on the figure for each neuron (by the sum symbol and f). In order for a neural network to be a robust function approximator, at least one hidden layer of neurons and generally at most two hidden layers of neurons are required. The neural network represented in figure 2.4 is the most common neural network of the feedforward type and is fully connected. The unknown weights are indicated on the figure by the symbols w1, w2, ..., w7.
The weights can be considered as the neural network equivalent of the unknown regression coefficients from our regression model. The algorithm for finding these coefficients that was applied here is the standard backpropagation algorithm, which minimizes the sum of the squares of the errors similar to the way it was done for regression analysis. However, contrary to regression analysis, an iterative numerical minimization procedure rather than an analytical derivation was applied to estimate the weights in order to minimize the least-squares error measure. The backpropagation algorithm uses a clever trick to solve this problem when a hidden layer of neurons is present in the model. By all means think of a neural network as a more sophisticated regression model. It is different from a regression model in the sense that we do not specify linear or higher-order models for the regression analysis. We specify only a neural network frame (number of layers of neurons, and number of neurons in each layer) and let the neural network algorithm work out what the proper choice for the weights will be. This approach is often referred to as a model-free approximation method, because we really do not specify whether we are dealing with a linear, quadratic, or exponential model. The neural network was trained with MetaNeural, a general-purpose neural network program that uses the backpropagation algorithm and runs on most computer platforms. The neural network was trained on the same 10 patterns that were used for the regression analysis, and the screen response is illustrated in figure 2.5.
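For readers who want to see the 1x2x1 structure of figure 2.4 in code, here is a minimal sketch of its forward evaluation with bias nodes. The particular assignment of w1 through w7 to connections is an assumption made for illustration (the real assignment should be read off figure 2.4), and the weight values shown are arbitrary placeholders rather than the trained values of Table IV.

    import numpy as np

    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    def net_1x2x1(x, w):
        # Forward pass of a 1x2x1 feedforward net with one bias node feeding the hidden
        # layer and one bias node feeding the output neuron.
        # Assumed (illustrative) weight layout:
        #   w[0], w[1]: input -> hidden neurons 1 and 2
        #   w[2], w[3]: bias  -> hidden neurons 1 and 2
        #   w[4], w[5]: hidden neurons 1 and 2 -> output neuron
        #   w[6]:       bias  -> output neuron
        h1 = sigmoid(w[0] * x + w[2] * 1.0)
        h2 = sigmoid(w[1] * x + w[3] * 1.0)
        return sigmoid(w[4] * h1 + w[5] * h2 + w[6] * 1.0)

    w = np.array([0.5, -0.5, 0.1, 0.2, 1.0, -1.0, 0.3])   # arbitrary placeholder weights
    print(net_1x2x1(1.0, w))   # normalized output for the year 2025 (normalized input = 1.0)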

Figure 2.4 Neural network approximation for the population forecasting problem: one input node, a hidden layer with two neurons (neuron 1 and neuron 2), one output neuron (neuron 3), and two bias nodes; the connections carry the weights w1 through w7.

Figure 2.5 Screen response from MetaNeural for training and testing the population forecasting model.

Hands-on details of the network training will be left for lecture 3, where we will gain hands-on exposure to artificial neural network programs. The files that were used for the MetaNeural program are reproduced in the appendix. The program gave 0.48118 as the prediction for the normalized population forecast in 2025. After re-scaling this would correspond to 7836 million people. Probably a rather underestimated forecast, but definitely better than the regression model. The weights corresponding to this forecast model are reproduced in Table IV. The problem with the neural network model is that a 1-2-1 net is a rather simplistic network, and that, the way we represented the patterns, too much emphasis is placed on the earlier years (1650 - 1850), which are really not all that relevant. By over-sampling (i.e., presenting the data from 1950 onward, let's say, three times as often as the other data) and choosing a 1-3-4-1 network, the way a more seasoned practitioner might approach this problem, we actually obtained a forecast of 11.02 billion people for the world's population in 2025. This answer seems to be a lot more reasonable than the one obtained from the 1-2-1 network. Changing to the 1-3-4-1 model is just a matter of changing a few numbers in the input file for MetaNeural and can be done in a matter of seconds. The results for the predictions with the 1-3-4-1 network with over-sampling are shown in figure 2.6.
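Over-sampling as described above just means repeating the more relevant (recent) patterns in the training file; a brief sketch is given below. The factor of three and the 1950 cutoff follow the text, while the variable names are illustrative.

    import numpy as np

    years = np.array([1650, 1750, 1850, 1900, 1950, 1960, 1970, 1980, 1990, 1995])
    year_nor = (years - 1650) / (2025 - 1650)
    pop_nor  = np.array([0.000, 0.067, 0.144, 0.206, 0.287, 0.318, 0.352, 0.385, 0.414, 0.428])

    # Present the patterns from 1950 onward three times as often as the earlier ones.
    repeats = np.where(years >= 1950, 3, 1)
    x_train = np.repeat(year_nor, repeats)
    y_train = np.repeat(pop_nor, repeats)
    print(len(x_train), "training patterns after over-sampling")   # 4 + 6*3 = 22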

Figure 2.6 World population prediction with a 1-3-4-1 artificial neural network with over-sampling.

TABLE IV. Weight values corresponding to the neural network in figure 2.4

WEIGHT    VALUE
w1        -2.6378
w2         2.4415
w3         1.6161
w4        -1.3550
w5        -3.6308
w6         3.0321
w7        -1.3795

2.6 Conclusions
A neural network can be viewed as a least-squares, model-free, regression-like approximator that can implement almost any map. Building a forecasting model for the world's population with a simple neural network proceeds similarly to regression analysis and is relatively straightforward. The fact that neural networks are model-free approximators is often advantageous over traditional statistical forecasting methods and standard time series analysis techniques. Where neural networks differ from standard regression techniques is in the way the least-squares error minimization procedure is implemented: while regression techniques rely on closed one-step analytical formulas, the neural network approach employs a numerical, iterative backpropagation algorithm.
2.7 Exercises for the brave

1. Derive the expressions for the parameters a, b, c, d, and e for the following regression model:

    y = \frac{a + b e^{cx}}{dx + e}

and forecast the world's population for the year 2025 based on this model.

2. Write a MATLAB program that implements the evaluation of the network shown in figure 2.4 and verify the population forecast for the year 2025 based on this 1-2-1 neural network model and the weights shown in TABLE IV.

3. Expand the MATLAB program of exercise 2 into a program that can train the weights of a neural network based on a random search model. I.e., start with an initial random collection of weights (let's say all chosen from a uniform random distribution between -1.0 and +1.0). Then iteratively adjust the weights by making small random perturbations (one weight at a time), evaluate the new error after showing all the training samples, and retain the perturbed weight if the error is smaller. Repeat this process until the network has a reasonably small error.

2.8 References
[1] P. Werbos, "Beyond regression: New tools for prediction and analysis in the behavioral sciences," Ph.D. thesis, Harvard University (1974).
[2] D. E. Rumelhart, G. Hinton, and R. J. Williams, "Learning internal representations by error propagation," in Parallel Distributed Processing: Explorations in the Microstructure of Cognition, Vol. 1, D. E. Rumelhart and James L. McClelland, Eds., Chapter 8, pp. 318-362, MIT Press, Cambridge, MA (1986).
[3] T. Malthus, "An Essay on the Principle of Population," 1798. Republished in the Pelican Classics series, Penguin Books, England (1976).
[4] Otto Johnson, Ed., "1997 Information Please Almanac," Houghton Mifflin Company, Boston & New York (1996).

APPENDIX: INPUT FILES FOR 1-2-1 NETWORK FOR MetaNeural


ANNOTATED MetaNeural INPUT FILE: POP
3         Three-layered network
1         One input node
2         2 neurons in the hidden layer
1         One output neuron
1         Show all samples and then update weights
0.1       Learning parameter, first layer of weights
0.1       Learning parameter, second layer of weights
0.5       Momentum, first layer of weights
0.5       Momentum, second layer of weights
1000      Do a thousand iterations (for all patterns)
500       Show intermediate results every 500 iterations on the screen
1         Standard [0, 1] sigmoid transfer function
1         Temperature one for sigmoid (i.e., standard sigmoid)
pop.pat   Name of training pattern file
0         Ignored
100       Ignored
0.01      Stop training when error is less than 0.01
1
0.6       Initial weights are drawn from a uniform random distribution between [-0.6, 0.6]

POP.PAT: The pattern file


10                        (10 training patterns)
0.000   0.000   0         (first training pattern)
0.267   0.067   1         (second training pattern)
0.533   0.144   2
0.667   0.206   3
0.800   0.287   4
0.827   0.318   5
0.853   0.352   6
0.880   0.385   7
0.907   0.414   8
0.920   0.428   9
