1/3/2015

Principal Components Analysis
Suppose you have samples located in environmental space or in species space (see Similarity, Difference
and Distance). If you could simultaneously envision all environmental variables or all species, then there
would be little need for ordination methods. However, with more than three dimensions, we usually need
a little help. What PCA does is that it takes your cloud of data points, and rotates it such that the
maximum variability is visible. Another way of saying this is that it identifies your most important
gradients.
Let us take a hypothetical example where you have measured three different species, X1, X2, and X3:

In this example, it is possible (though it might be difficult) to tell that X1 and X2 are related to each
other, and it is less clear whether X3 is related to either X1 or X2. Our job is to determine whether there
are one or more hidden factors or components (or, in the case of community ecology, gradients) along which
our samples vary with respect to species composition.
(Note that X2 has negative values, something that will not happen with real species. I am only including
such a variable to demonstrate that the initial scaling is not relevant in PCA.)
The first stage in rotating the data cloud is to standardize the data by subtracting the mean and dividing
by the standard deviation. Thus, the centroid of the whole data set is zero. We label these standardized
axes S1, S2, and S3. The relative location of points remains the same:

A footnote: it may be argued that we should not divide by the standard deviation; we would want
a species which varies from, say, 8 to 10000 individuals to be considered more variable than a
species which varies from 100 to 102 individuals. By standardizing, we are giving all species the
same variation, i.e. a standard deviation of 1. We actually can have it both ways: a PCA without
dividing by the standard deviation is an eigenanalysis of the covariance matrix, and a PCA in
which you do indeed divide by the standard deviation is an eigenanalysis of the correlation matrix
(to do the latter in CANOCO, you need to specify "center and standardize" for your species; recall
that the covariance of standardized variables equals the correlation!). When using species/variables
measured in different units, you must use a correlation matrix.
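The footnote's claim that the covariance of standardized variables equals the correlation can be checked numerically. The sketch below uses made-up abundances (one species ranging from 8 to 10000, another from 100 to 102, echoing the ranges mentioned above); the data and the helper names are illustrative assumptions, not part of the original example.

```python
import math

def standardize(values):
    """Subtract the mean and divide by the sample standard deviation."""
    n = len(values)
    mean = sum(values) / n
    sd = math.sqrt(sum((v - mean) ** 2 for v in values) / (n - 1))
    return [(v - mean) / sd for v in values]

def covariance(a, b):
    """Sample covariance of two equal-length lists."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    return sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b)) / (n - 1)

def correlation(a, b):
    """Pearson correlation of two equal-length lists."""
    return covariance(a, b) / math.sqrt(covariance(a, a) * covariance(b, b))

# Hypothetical abundances: a highly variable species and a nearly constant one.
x1 = [8.0, 150.0, 2400.0, 10000.0, 730.0]
x2 = [100.0, 100.5, 101.5, 102.0, 101.0]

s1, s2 = standardize(x1), standardize(x2)

raw_cov = covariance(x1, x2)   # dominated by the scale of x1
std_cov = covariance(s1, s2)   # covariance after standardizing
corr = correlation(x1, x2)     # correlation of the raw data

print(f"raw covariance:          {raw_cov:.4f}")
print(f"standardized covariance: {std_cov:.4f}")
print(f"correlation:             {corr:.4f}")
```

The last two printed numbers agree, which is why a PCA of standardized data amounts to an eigenanalysis of the correlation matrix.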
From looking at the last two figures, one can already identify a gradient: from the lower left front to the
upper right back. In other words, there appears to be an underlying gradient along which species 1 and
species 2 both increase. (In the language of Gauch (1982), species 1 and 2 both contain some "redundant"
information.) Let us now draw a line along this gradient:

Principal Components Analysis chooses the first PCA axis as that line that goes through the centroid, but
also minimizes the sum of the squared distances from each point to that line. Thus, in some sense, the line is as
close to all of the data as possible. Equivalently, the line goes through the maximum variation in the data.
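The equivalence between "closest to the data" and "maximum variation" follows from Pythagoras: for a line through the centroid, each point's squared distance from the centroid splits into a squared projection along the line plus a squared distance to the line, so maximizing one sum minimizes the other. A minimal two-dimensional sketch, using made-up points and a brute-force scan over line angles (an illustration only, not how PCA software computes axes):

```python
import math

# Hypothetical points in two dimensions.
points = [(-1.5, -1.2), (-0.8, -0.9), (-0.3, 0.1), (0.2, -0.2),
          (0.6, 0.9), (0.9, 0.4), (0.9, 0.9)]

# Move the centroid to the origin.
cx = sum(x for x, _ in points) / len(points)
cy = sum(y for _, y in points) / len(points)
centered = [(x - cx, y - cy) for x, y in points]
total = sum(x * x + y * y for x, y in centered)

def line_summary(angle):
    """For a line through the centroid at `angle`, return
    (sum of squared projections onto it, sum of squared distances to it)."""
    ux, uy = math.cos(angle), math.sin(angle)
    along = sum((x * ux + y * uy) ** 2 for x, y in centered)
    across = sum((-x * uy + y * ux) ** 2 for x, y in centered)
    return along, across

# Try every direction from 0 to 179.9 degrees in steps of 0.1 degree.
angles = [math.radians(d / 10) for d in range(1800)]
best_by_variance = max(angles, key=lambda a: line_summary(a)[0])
best_by_distance = min(angles, key=lambda a: line_summary(a)[1])

print(f"max variation along the line at {math.degrees(best_by_variance):.1f} degrees")
print(f"min squared distance to line at {math.degrees(best_by_distance):.1f} degrees")
```

Both criteria select the same direction, which is the first PCA axis of this small data set.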
The second PCA axis also must go through the centroid, and also goes through the maximum variation in
the data, but with a certain constraint: it must be completely uncorrelated (i.e. at right angles, or
"orthogonal") to PCA axis 1.
If we rotate the coordinate frame so that PCA Axis 1 is on the X axis and PCA Axis 2 is on the Y axis,
then we get the following diagram:

We can see that samples a, b, c, and d are at one extreme of species composition, and samples t, w, x, y,
and z are at the other extreme. But there is a secondary gradient of species composition, from samples b,
m, n, u, r and t up to samples l, q, w, and y. What is the underlying biology behind such a gradient? PCA,
and any other indirect gradient analysis, is silent with respect to this question. This is where the
biological interpretation comes in. The scientist needs to ask, what is special about the samples on the
right that makes them fundamentally different from those samples on the left? What is it about the
biology of species 1 that makes it occur in the same locations as species 2?
We have only plotted two PCA Axes. However, there exist three axes in the data set (because there are
three species). Why did we not plot the third? This is for two reasons:
If we were going to plot three axes, then why even bother to perform PCA in the first place? We
end up with just as complicated a diagram as we started out with (i.e. samples in 3-dimensional
species space).
The third axis is much, much less important than the first two, as described below.
How do we determine how many axes are worth interpreting? Ultimately, this is left up to the reasons for
the investigation. But a big hint can be found with the eigenvalues. Every axis has an eigenvalue (also
called latent root) associated with it, and they are ranked from the highest to the lowest. The
eigenvalues for the three axes in the above example are 1.8907, 0.9951, and 0.1142
respectively. These are related to the amount of variation explained by the axis. Note that the sum of the
eigenvalues is 3, which is also the number of variables. It is typical to express the eigenvalues as
a percentage of the total:
PCA Axis 1: 63%
PCA Axis 2: 33%
PCA Axis 3: 4%

In other words, our first axis explained or "extracted" almost 2/3 of the variation in the entire data set,
and the second axis explained almost all of the remaining variation. Axis 3 only explained a trivial
amount, and might not be worth interpreting.
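The percentages are simply each eigenvalue divided by the sum of all eigenvalues. A short sketch of that arithmetic, using the three eigenvalues quoted for this example (the variable names are illustrative):

```python
# Eigenvalues quoted in the text for PCA Axes 1 to 3.
eigenvalues = [1.8907, 0.9951, 0.1142]

total = sum(eigenvalues)  # equals the number of standardized variables
percent = [100.0 * ev / total for ev in eigenvalues]

# Running total: how much variation the first k axes explain together.
cumulative = []
running = 0.0
for p in percent:
    running += p
    cumulative.append(running)

for axis, (p, c) in enumerate(zip(percent, cumulative), start=1):
    print(f"PCA Axis {axis}: {p:.0f}% (cumulative {c:.0f}%)")
```

The cumulative column is a convenient way to judge how many axes are worth interpreting: here the first two axes together account for about 96% of the variation.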
How do we know which species contribute to which axes? We look at the component loadings (or "factor
loadings"):

Species    PCA 1     PCA 2     PCA 3
S1         0.9688    0.0664    0.2387
S2         0.9701    0.0408    0.2391
S3        -0.1045    0.9945    0.0061

This means that the value of a sample along the first axis of PCA is 0.9688 times the standardized
abundance of species 1 PLUS 0.9701 times the standardized abundance of species 2 MINUS 0.1045 times
the standardized abundance of species 3.
We can interpret Axis 1 as being highly positively related to the abundances of species 1 and 2, and
weakly negatively related to the abundance of species 3. Axis 2, on the other hand, is positively related to
(and therefore correlated with) the abundance of all species, but mostly species 3. So the "gradient"
reflected by Axis 2 is something which benefits species 3.
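One way to connect the loadings to the eigenvalues: the printed loadings appear to be scaled so that the squared loadings in each column add up to that axis's eigenvalue, and the squared loadings in each row add up to 1 (the variance of a standardized species). The sketch below checks this using the magnitudes from the table; only the arithmetic is shown, and the scaling interpretation is an inference from the numbers rather than something the text states.

```python
# Magnitudes of the component loadings as printed in the table; squaring
# makes the sign of each loading irrelevant for this check.
loadings = {
    "S1": [0.9688, 0.0664, 0.2387],
    "S2": [0.9701, 0.0408, 0.2391],
    "S3": [0.1045, 0.9945, 0.0061],
}
eigenvalues = [1.8907, 0.9951, 0.1142]

# Column sums of squared loadings: variation captured by each axis.
axis_totals = [sum(row[k] ** 2 for row in loadings.values()) for k in range(3)]

# Row sums of squared loadings: share of each standardized species
# accounted for by the three axes together.
species_totals = {name: sum(v ** 2 for v in row) for name, row in loadings.items()}

for k, (total, ev) in enumerate(zip(axis_totals, eigenvalues), start=1):
    print(f"PCA {k}: sum of squared loadings = {total:.4f}, eigenvalue = {ev:.4f}")
for name, total in species_totals.items():
    print(f"{name}: sum of squared loadings = {total:.4f}")
```

Read this way, a squared loading is the fraction of a species' variation associated with an axis: roughly 94% of the variation in species 1 and in species 2 lies along Axis 1, and about 99% of the variation in species 3 lies along Axis 2.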
PCA is extremely useful when we expect species to be linearly (or even monotonically) related to each
other. Unfortunately, we rarely encounter such a situation in nature. It is much more likely that species
have a unimodal species response curve. That is, species usually peak in abundance at some intermediate
part of environmental gradients (see also Explorations in Coenospace). Here is a hypothetical coenocline:

This means that species are nonlinearly related to each other. Let us now plot the abundance of the
above three hypothetical species in species space:

However you describe the above cloud of points, it is certainly not a simple line or a plane. PCA would
fail miserably with such a data set. In particular, PCA produces an artifact known as the Horseshoe Effect
(similar to the Arch Effect), in which the second axis is curved and twisted relative to the first, and does
not represent a true secondary gradient. Do note, however, that if we only sampled a small enough
section of the gradient the data might be linear enough to allow the use of PCA.
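The artifact can be reproduced with simulated data. The sketch below builds a hypothetical coenocline (Gaussian response curves with evenly spaced optima, all parameters invented for illustration), runs a covariance-matrix PCA using power iteration with deflation (a simple stand-in for a library eigen-solver), and then compares the sample scores with the single true gradient. There is only one gradient in the simulated data, so any structure on the second axis is an artifact.

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def unit(v):
    length = math.sqrt(dot(v, v))
    return [x / length for x in v]

def leading_axis(matrix, start, iterations=1000):
    """Power iteration: largest eigenvalue and unit eigenvector of a symmetric matrix."""
    v = unit(start)
    for _ in range(iterations):
        v = unit([dot(row, v) for row in matrix])
    return dot(v, [dot(row, v) for row in matrix]), v

def pearson(a, b):
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    da = [x - mean_a for x in a]
    db = [y - mean_b for y in b]
    return dot(da, db) / math.sqrt(dot(da, da) * dot(db, db))

# Hypothetical coenocline: 21 samples along a gradient from 0 to 100, and
# 19 species with unimodal (Gaussian) response curves and optima every 10 units.
positions = [5.0 * i for i in range(21)]
optima = [-40.0 + 10.0 * j for j in range(19)]
tolerance = 15.0  # assumed width of every response curve
abundance = [[math.exp(-((x - u) ** 2) / (2 * tolerance ** 2)) for u in optima]
             for x in positions]

# Center each species (column), then form the species covariance matrix.
n_samples, n_species = len(positions), len(optima)
means = [sum(row[j] for row in abundance) / n_samples for j in range(n_species)]
centered = [[row[j] - means[j] for j in range(n_species)] for row in abundance]
cov = [[sum(r[i] * r[j] for r in centered) / (n_samples - 1)
        for j in range(n_species)] for i in range(n_species)]

# First axis, then remove its variation (deflation) to obtain the second axis.
start = [1.0 + 0.3 * j + 0.05 * j * j for j in range(n_species)]
value1, axis1 = leading_axis(cov, start)
deflated = [[cov[i][j] - value1 * axis1[i] * axis1[j]
             for j in range(n_species)] for i in range(n_species)]
value2, axis2 = leading_axis(deflated, start)

# Sample scores on the two axes.
pc1 = [dot(row, axis1) for row in centered]
pc2 = [dot(row, axis2) for row in centered]

# Axis 1 should track the gradient; axis 2 should track a curved (quadratic)
# function of the same gradient, i.e. the arch.
arch = [(x - 50.0) ** 2 for x in positions]
r_linear = pearson(pc1, positions)
r_arch = pearson(pc2, arch)
print(f"correlation of axis 1 with gradient position:     {r_linear:+.2f}")
print(f"correlation of axis 2 with squared distance from "
      f"the gradient midpoint: {r_arch:+.2f}")
```

The sign of either correlation is arbitrary, since a PCA axis can point in either direction; it is the magnitude of the second correlation that indicates how far the second axis is merely a folded version of the first.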
For the Boomer Lake example given in Explorations in Coenospace, we have belt transects established
along a lake shore, and a fairly well-defined zonation of plant species occurs as a function of distance
from the water. When we perform a PCA on this data set, we get the following diagram:

This illustrates the horseshoe effect. The second axis is a curved distortion of the first axis. The second
axis also has no easily understood biological meaning: there is no obvious reason why samples 6, 7, and
8 should be at opposite ends of a gradient from samples 1, 2, and 9 through 12.
Do recall, however, that there was one predominant gradient: that of sample 1 through 12 (being a
wetland to dryland gradient). PCA distorts this relationship with some incurving. Instead of
going from sample 1 to 12 (as it should), the most extreme samples along PCA Axis 1 are samples 3 and
10.
The "toe" of the horseshoe can either be up or down; in this case it just happens to be down.
In this particular example, we are able to see the arch, and therefore might be able to conclude that the
"real" extremes are quadrats 1 and 10, 11, or 12. This is because there is only one clear gradient and the
gradient is so strong. However, in many data sets, there may be more and weaker gradients, as well as
more noise. In such cases, it would be very difficult to make sense of PCA.
Although PCA is seldom useful for the analysis of samples in species space, it is still quite appropriate
for the analysis of samples in environmental space. This is because it is likely for most environmental
variables to be monotonically related to underlying factors, and to each other. Also, PCA allows the use
of variables which are not measured in the same units (e.g. elevation, concentration of nutrients,
temperature, pH, etc.).
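Mixing units works only because a correlation-matrix PCA does not depend on the units chosen. As a small illustration with invented measurements, converting elevation from meters to feet rescales its covariance with pH but leaves the correlation untouched:

```python
import math

def covariance(a, b):
    """Sample covariance of two equal-length lists."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    return sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b)) / (n - 1)

def correlation(a, b):
    """Pearson correlation of two equal-length lists."""
    return covariance(a, b) / math.sqrt(covariance(a, a) * covariance(b, b))

# Hypothetical environmental measurements at five sites.
elevation_m = [120.0, 340.0, 560.0, 910.0, 1500.0]
soil_ph = [7.8, 7.1, 6.6, 6.0, 5.2]

FEET_PER_METER = 3.28084
elevation_ft = [m * FEET_PER_METER for m in elevation_m]

cov_m = covariance(elevation_m, soil_ph)
cov_ft = covariance(elevation_ft, soil_ph)
r_m = correlation(elevation_m, soil_ph)
r_ft = correlation(elevation_ft, soil_ph)

print(f"covariance  (m vs pH): {cov_m:10.3f}   (ft vs pH): {cov_ft:10.3f}")
print(f"correlation (m vs pH): {r_m:10.3f}   (ft vs pH): {r_ft:10.3f}")
```

A covariance-matrix PCA of these variables would change with the choice of units, which is why the correlation matrix is the one to use for environmental data.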

This page was created and is maintained by Michael Palmer.

To the ordination web page

http://ordination.okstate.edu/PCA.htm
