Академический Документы
Профессиональный Документы
Культура Документы
FromWikipedia,thefreeencyclopedia
Ahistogramisagraphicalrepresentationofthe
distributionofdata.Itisanestimateoftheprobability
distributionofacontinuousvariable(quantitative
variable)andwasfirstintroducedbyKarlPearson.[1]To
constructahistogram,thefirststepisto"bin"therange
ofvaluesthatis,dividetheentirerangeofvaluesintoa
seriesofsmallintervalsandthencounthowmany
valuesfallintoeachinterval.Arectangleisdrawnwith
heightproportionaltothecountandwidthequaltothe
binsize,sothatrectanglesabouteachother.Ahistogram
mayalsobenormalizeddisplayingrelativefrequencies.
Itthenshowstheproportionofcasesthatfallintoeachof
severalcategories,withthesumoftheheightsequaling
1.Thebinsareusuallyspecifiedasconsecutive,non
overlappingintervalsofavariable.Thebins(intervals)
mustbeadjacent,andusuallyequalsize.[2]The
rectanglesofahistogramaredrawnsothattheytouch
eachothertoindicatethattheoriginalvariableis
continuous.[3]
Histogram
First
KarlPearson
described
by
Purpose Toroughlyassesstheprobability
distributionofagivenvariableby
depictingthefrequenciesofobservations
occurringincertainrangesofvalues
Histogramsgivearoughsenseofthedensityofthedata,
andoftenfordensityestimation:estimatingthe
probabilitydensityfunctionoftheunderlyingvariable.Thetotalareaofahistogramusedforprobability
densityisalwaysnormalizedto1.Ifthelengthoftheintervalsonthexaxisareall1,thenahistogramis
identicaltoarelativefrequencyplot.
Ahistogramcanbethoughtofasasimplistickerneldensityestimation,whichusesakerneltosmooth
frequenciesoverthebins.Thisyieldsasmootherprobabilitydensityfunction,whichwillingeneralmore
accuratelyreflectdistributionoftheunderlyingvariable.Thedensityestimatecouldbeplottedasan
alternativetothehistogram,andisusuallydrawnasacurveratherthanasetofboxes.
AvariablebinwidthhistogramwasintroducedbyDenbyandMallows(2009)
(http://pubs.research.avayalabs.com/pdfs/ALR2007003paper.pdf).Examplesofthisaredisplayedon
Censusbureaudatabelow.
Anotheralternativeistheaverageshiftedhistogram
(http://www.stat.rice.edu/~scottdw/stat550/HW/hw4/c05.pdf)whichisfasttocompute,andgetsasmooth
curveestimateofthedensitywithoutusingkernels.
Thehistogramisoneofthesevenbasictoolsofqualitycontrol.[4]
Histogramsareoftenconfusedwithbarcharts.Abarchartisaplotofcategoricalvariables,andthe
discontinuityshouldbeindicatedbyhavinggapsbetweentherectangles.Oftenthisisneglectedwhichmay
leadtoabarchartbeingconfusedforahistogram.
Contents
1Etymology
2Examples
3Mathematicaldefinition
3.1Cumulativehistogram
3.2Numberofbinsandwidth
4Seealso
5References
6Furtherreading
7Externallinks
Etymology
Theetymologyofthewordhistogramisuncertain.Sometimesitis
saidtobederivedfromtheGreekhistos'anythingsetupright'(asthe
mastsofaship,thebarofaloom,ortheverticalbarsofa
histogram)andgramma'drawing,record,writing'.Itisalsosaid
thatKarlPearson,whointroducedthetermin1891,derivedthe
namefrom"historicaldiagram".[5]
Anexamplehistogramoftheheights
of31BlackCherrytrees.
Examples
Thisisatoyexample
Bin Count
3.5
2.5
32
1.5
109
0.5
180
0.5
132
1.5
34
2.5
3.5
Thelanguageusedtodescribethepatternsinahistogramare
symmetric,skewedleftorright,unimodal,bimodalormultimodal.
Symmetric,unimodal
Skewedright
Skewedleft
Bimodal
Multimodal
Symmetric
Itisagoodideatoplotyourdataonseveraldifferentbinwidthstolearnmoreaboutit.Hereisanexample
ontipsgiveninarestaurant.
Tipsusinga$1
Tipsusinga10c
binwidth,skewedright, binwidth,stillskewed
unimodal
right,multimodalwith
modesat$and50c
amounts,indicates
rounding,alsosome
outliers
Hereareacouplemoreexamples.
PricesofhousessoldinAmesin2009,exhibitssomerightskew.
Acesbyplayersinagrandslamtennistournament,facettedby
gender.Therearemoreacesinthemensgame.
TheU.S.CensusBureaufoundthattherewere124millionpeople
whoworkoutsideoftheirhomes.[6]Usingtheirdataonthetime
occupiedbytraveltowork,Table2belowshowstheabsolute
numberofpeoplewhorespondedwithtraveltimes"atleast30but
lessthan35minutes"ishigherthanthenumbersforthecategories
aboveandbelowit.Thisislikelyduetopeopleroundingtheir
reportedjourneytime.Theproblemofreportingvaluesassomewhat
arbitrarilyroundednumbersisacommonphenomenonwhen
collectingdatafrompeople.
Histogramoftraveltime(towork),US2000census.Area
underthecurveequalsthetotalnumberofcases.This
diagramusesQ/widthfromthetable.
Databyabsolutenumbers
Interval Width Quantity Quantity/width
0
4180
836
13687
2737
10
18618
3723
15
19634
3926
20
17981
3596
25
7190
1438
30
16369
3273
35
3212
642
40
4122
824
45
15
9200
613
60
30
6461
215
90
60
3435
57
Thishistogramshowsthenumberofcasesperunitintervalastheheightofeachblock,sothattheareaof
eachblockisequaltothenumberofpeopleinthesurveywhofallintoitscategory.Theareaunderthe
curverepresentsthetotalnumberofcases(124million).Thistypeofhistogramshowsabsolutenumbers,
withQinthousands.
Histogramoftraveltime(towork),US2000census.Area
underthecurveequals1.ThisdiagramusesQ/total/width
fromthetable.
Databyproportion
Interval Width Quantity(Q) Q/total/width
0
4180
0.0067
13687
0.0221
10
18618
0.0300
15
19634
0.0316
20
17981
0.0290
25
7190
0.0116
30
16369
0.0264
35
3212
0.0052
40
4122
0.0066
45
15
9200
0.0049
60
30
6461
0.0017
90
60
3435
0.0005
Thishistogramdiffersfromthefirstonlyintheverticalscale.Theareaofeachblockisthefractionofthe
totalthateachcategoryrepresents,andthetotalareaofallthebarsisequalto1(thefractionmeaning"all").
Thecurvedisplayedisasimpledensityestimate.Thisversionshowsproportions,andisalsoknownasa
unitareahistogram.
Inotherwords,ahistogramrepresentsafrequencydistributionbymeansofrectangleswhosewidths
representclassintervalsandwhoseareasareproportionaltothecorrespondingfrequencies:theheightof
eachistheaveragefrequencydensityfortheinterval.Theintervalsareplacedtogetherinordertoshowthat
thedatarepresentedbythehistogram,whileexclusive,isalsocontiguous.(E.g.,inahistogramitispossible
tohavetwoconnectingintervalsof10.520.5and20.533.5,butnottwoconnectingintervalsof10.520.5
and22.532.5.Emptyintervalsarerepresentedasemptyandnotskipped.)[7]
Mathematicaldefinition
Inamoregeneralmathematicalsense,a
histogramisafunctionmithatcountsthe
numberofobservationsthatfallintoeachofthe
disjointcategories(knownasbins),whereasthe
graphofahistogramismerelyonewayto
representahistogram.Thus,ifweletnbethe
totalnumberofobservationsandkbethetotal
numberofbins,thehistogrammimeetsthe
followingconditions:
Anordinaryandacumulativehistogramofthesamedata.
Thedatashownisarandomsampleof10,000pointsfroma
normaldistributionwithameanof0andastandard
deviationof1.
Cumulativehistogram
Acumulativehistogramisamappingthatcountsthecumulativenumberofobservationsinallofthebins
uptothespecifiedbin.Thatis,thecumulativehistogramMiofahistogrammjisdefinedas:
Numberofbinsandwidth
Thereisno"best"numberofbins,anddifferentbinsizescanrevealdifferentfeaturesofthedata.Grouping
dataisatleastasoldasGraunt'sworkinthe17thcentury,butnosystematicguidelinesweregiven[8]until
Sturges'sworkin1926.[9]
Usingwiderbinswherethedensityislowreducesnoiseduetosamplingrandomnessusingnarrowerbins
wherethedensityishigh(sothesignaldrownsthenoise)givesgreaterprecisiontothedensityestimation.
Thusvaryingthebinwidthwithinahistogramcanbebeneficial.Nonetheless,equalwidthbinsarewidely
used.
Sometheoreticianshaveattemptedtodetermineanoptimalnumberofbins,butthesemethodsgenerally
makestrongassumptionsabouttheshapeofthedistribution.Dependingontheactualdatadistributionand
thegoalsoftheanalysis,differentbinwidthsmaybeappropriate,soexperimentationisusuallyneededto
determineanappropriatewidth.Thereare,however,varioususefulguidelinesandrulesofthumb.[10]
Thenumberofbinskcanbeassigneddirectlyorcanbecalculatedfromasuggestedbinwidthhas:
Thebracesindicatetheceilingfunction.
Squarerootchoice
whichtakesthesquarerootofthenumberofdatapointsinthesample(usedbyExcelhistogramsandmany
others).[11]
Sturges'formula
Sturges'formula[9]isderivedfromabinomialdistributionandimplicitlyassumesanapproximatelynormal
distribution.
Itimplicitlybasesthebinsizesontherangeofthedataandcanperformpoorlyifn<30,becausethe
numberofbinswillbesmalllessthansevenandunlikelytoshowtrendsinthedatawell.Itmayalso
performpoorlyifthedataarenotnormallydistributed.
RiceRule
TheRiceRule[12]ispresentedasasimplealternativetoSturges'srule.
Doane'sformula
Doane'sformula[13]isamodificationofSturges'formulawhichattemptstoimproveitsperformancewith
nonnormaldata.
where istheestimated3rdmomentskewnessofthedistributionand
Scott'snormalreferencerule
where isthesamplestandarddeviation.Scott'snormalreferencerule[14]isoptimalforrandomsamplesof
normallydistributeddata,inthesensethatitminimizestheintegratedmeansquarederrorofthedensity
estimate.[8]
FreedmanDiaconis'choice
TheFreedmanDiaconisruleis:[15][8]
whichisbasedontheinterquartilerange,denotedbyIQR.Itreplaces3.5ofScott'srulewith2IQR,which
islesssensitivethanthestandarddeviationtooutliersindata.
ChoicebasedonminimizationofanestimatedL2[16]riskfunction
where
and aremeanandbiasedvarianceofahistogramwithbinwidth ,
and
.
Remark
Agoodreasonwhythenumberofbinsshouldbeproportionalto
isthefollowing:supposethatthe
dataareobtainedas independentrealizationsofaboundedprobabilitydistributionwithsmoothdensity.
Thenthehistogramremainsequallyruggedas tendstoinfinity.If isthewidthofthedistribution
(e.g.,thestandarddeviationortheinterquartilerange),thenthenumberofunitsinabin(thefrequency)is
oforder
andtherelativestandarderrorisoforder
.Comparingtothenextbin,the
relativechangeofthefrequencyisoforder
providedthatthederivativeofthedensityisnonzero.
Thesetwoareofthesameorderif isoforder
,sothat isoforder
.
Thissimplecubicrootchoicecanalsobeappliedtobinswithnonconstantwidth.
Seealso
Databinning
Densityestimation
Kerneldensityestimation,asmootherbutmore
complexmethodofdensityestimation
FreedmanDiaconisrule
WikimediaCommonshas
mediarelatedto
Histograms.
Imagehistogram
Paretochart
SevenBasicToolsofQuality
Voptimalhistograms
References
1. ^Pearson,K.(1895)."ContributionstotheMathematicalTheoryofEvolution.II.SkewVariationin
HomogeneousMaterial".PhilosophicalTransactionsoftheRoyalSocietyA:Mathematical,Physicaland
EngineeringSciences186:343414.Bibcode:1895RSPTA.186..343P
(http://adsabs.harvard.edu/abs/1895RSPTA.186..343P).doi:10.1098/rsta.1895.0010
(http://dx.doi.org/10.1098%2Frsta.1895.0010).
2. ^Howitt,D.andCramer,D.(2008)StatisticsinPsychology.PrenticeHall
3. ^CharlesStangor(2011)"ResearchMethodsForTheBehavioralSciences".Wadsworth,CengageLearning.
ISBN9780840031976.
4. ^NancyR.Tague(2004)."SevenBasicQualityTools"(http://www.asq.org/learnaboutquality/sevenbasic
qualitytools/overview/overview.html).TheQualityToolbox.Milwaukee,Wisconsin:AmericanSocietyfor
Quality.p.15.Retrieved20100205.
5. ^M.EileenMagnello(December2006)."KarlPearsonandtheOriginsofModernStatistics:AnElastician
becomesaStatistician"(http://www.rutherfordjournal.org/article010107.html).TheNewZealandJournalforthe
HistoryandPhilosophyofScienceandTechnology.1volume.OCLC682200824
(https://www.worldcat.org/oclc/682200824).
6. ^US2000census(http://www.census.gov/prod/2004pubs/c2kbr33.pdf).
7. ^Dean,S.,&Illowsky,B.(2009,February19).DescriptiveStatistics:Histogram.Retrievedfromthe
ConnexionsWebsite:http://cnx.org/content/m16298/1.11/
8. ^abcScott,DavidW.(1992).MultivariateDensityEstimation:Theory,Practice,andVisualization.NewYork:
JohnWiley.
9. ^abSturges,H.A.(1926)."Thechoiceofaclassinterval".JournaloftheAmericanStatisticalAssociation:65
66.JSTOR2965501(https://www.jstor.org/stable/2965501).
10. ^e.g.5.6"DensityEstimation",W.N.VenablesandB.D.Ripley,ModernAppliedStatisticswithS(2002),
Springer,4thedition.ISBN0387954570.
11. ^EXCEL2007:Histogram(http://cameron.econ.ucdavis.edu/excel/ex11histogram.html)
12. ^OnlineStatisticsEducation:AMultimediaCourseofStudy(http://onlinestatbook.com/).ProjectLeader:David
M.Lane,RiceUniversity(chapter2"GraphingDistributions",section"Histograms")
13. ^DoaneDP(1976)Aestheticfrequencyclassication.AmericanStatistician,30:181183
14. ^Scott,DavidW.(1979)."Onoptimalanddatabasedhistograms".Biometrika66(3):605610.
doi:10.1093/biomet/66.3.605(http://dx.doi.org/10.1093%2Fbiomet%2F66.3.605).
15. ^Freedman,DavidDiaconis,P.(1981)."Onthehistogramasadensityestimator:L2theory".Zeitschriftfr
WahrscheinlichkeitstheorieundverwandteGebiete57(4):453476.doi:10.1007/BF01025868
(http://dx.doi.org/10.1007%2FBF01025868).
16. ^Shimazaki,H.Shinomoto,S.(2007)."Amethodforselectingthebinsizeofatimehistogram"
(http://www.mitpressjournals.org/doi/abs/10.1162/neco.2007.19.6.1503).NeuralComputation19(6):1503
1527.doi:10.1162/neco.2007.19.6.1503(http://dx.doi.org/10.1162%2Fneco.2007.19.6.1503).PMID17444758
(https://www.ncbi.nlm.nih.gov/pubmed/17444758).
Furtherreading
Lancaster,H.O.AnIntroductiontoMedicalStatistics.JohnWileyandSons.1974.ISBN0471
512508
Externallinks
JourneyToWorkandPlaceOfWork
WikimediaCommonshas
mediarelatedtoHistogram.
Lookuphistogramin
Wiktionary,thefree
dictionary.
(http://www.census.gov/population/www/socdemo/journey.html)(locationofcensusdocumentcited
inexample)
Smoothhistogramforsignalsandimagesfromafewsamples
(http://www.mathworks.com/matlabcentral/fileexchange/30480histconnect)
Histograms:Construction,AnalysisandUnderstandingwithexternallinksandanapplicationto
particlePhysics.(http://quarknet.fnal.gov/toolkits/ati/histograms.html)
AMethodforSelectingtheBinSizeofaHistogram
(http://2000.jukuin.keio.ac.jp/shimazaki/res/histogram.html)
Interactivehistogramgenerator(http://www.shodor.org/interactivate/activities/histogram/)
Matlabfunctiontoplotnicehistograms
(http://www.mathworks.com/matlabcentral/fileexchange/27388plotandcomparenicehistograms
bydefault)
DynamicHistograminMSExcel(http://excelandfinance.com/histograminexcel/)
Histogramconstruction
(http://wiki.stat.ucla.edu/socr/index.php/SOCR_EduMaterials_ModelerActivities_MixtureModel_1)
andmanipulation
(http://wiki.stat.ucla.edu/socr/index.php/SOCR_EduMaterials_Activities_PowerTransformFamily_Gr
aphs)usingJavaapplets,andcharts(http://www.socr.ucla.edu/htmls/SOCR_Charts.html)onSOCR
Retrievedfrom"http://en.wikipedia.org/w/index.php?title=Histogram&oldid=641376957"