Вы находитесь на странице: 1из 12

Histogram

FromWikipedia,thefreeencyclopedia

Ahistogramisagraphicalrepresentationofthe
distributionofdata.Itisanestimateoftheprobability
distributionofacontinuousvariable(quantitative
variable)andwasfirstintroducedbyKarlPearson.[1]To
constructahistogram,thefirststepisto"bin"therange
ofvaluesthatis,dividetheentirerangeofvaluesintoa
seriesofsmallintervalsandthencounthowmany
valuesfallintoeachinterval.Arectangleisdrawnwith
heightproportionaltothecountandwidthequaltothe
binsize,sothatrectanglesabouteachother.Ahistogram
mayalsobenormalizeddisplayingrelativefrequencies.
Itthenshowstheproportionofcasesthatfallintoeachof
severalcategories,withthesumoftheheightsequaling
1.Thebinsareusuallyspecifiedasconsecutive,non
overlappingintervalsofavariable.Thebins(intervals)
mustbeadjacent,andusuallyequalsize.[2]The
rectanglesofahistogramaredrawnsothattheytouch
eachothertoindicatethattheoriginalvariableis
continuous.[3]

Histogram

First
KarlPearson
described
by
Purpose Toroughlyassesstheprobability
distributionofagivenvariableby
depictingthefrequenciesofobservations
occurringincertainrangesofvalues

Histogramsgivearoughsenseofthedensityofthedata,
andoftenfordensityestimation:estimatingthe
probabilitydensityfunctionoftheunderlyingvariable.Thetotalareaofahistogramusedforprobability
densityisalwaysnormalizedto1.Ifthelengthoftheintervalsonthexaxisareall1,thenahistogramis
identicaltoarelativefrequencyplot.

Ahistogramcanbethoughtofasasimplistickerneldensityestimation,whichusesakerneltosmooth
frequenciesoverthebins.Thisyieldsasmootherprobabilitydensityfunction,whichwillingeneralmore
accuratelyreflectdistributionoftheunderlyingvariable.Thedensityestimatecouldbeplottedasan
alternativetothehistogram,andisusuallydrawnasacurveratherthanasetofboxes.
AvariablebinwidthhistogramwasintroducedbyDenbyandMallows(2009)
(http://pubs.research.avayalabs.com/pdfs/ALR2007003paper.pdf).Examplesofthisaredisplayedon
Censusbureaudatabelow.
Anotheralternativeistheaverageshiftedhistogram
(http://www.stat.rice.edu/~scottdw/stat550/HW/hw4/c05.pdf)whichisfasttocompute,andgetsasmooth
curveestimateofthedensitywithoutusingkernels.
Thehistogramisoneofthesevenbasictoolsofqualitycontrol.[4]
Histogramsareoftenconfusedwithbarcharts.Abarchartisaplotofcategoricalvariables,andthe
discontinuityshouldbeindicatedbyhavinggapsbetweentherectangles.Oftenthisisneglectedwhichmay
leadtoabarchartbeingconfusedforahistogram.

Contents
1Etymology
2Examples
3Mathematicaldefinition
3.1Cumulativehistogram
3.2Numberofbinsandwidth
4Seealso
5References
6Furtherreading
7Externallinks

Etymology
Theetymologyofthewordhistogramisuncertain.Sometimesitis
saidtobederivedfromtheGreekhistos'anythingsetupright'(asthe
mastsofaship,thebarofaloom,ortheverticalbarsofa
histogram)andgramma'drawing,record,writing'.Itisalsosaid
thatKarlPearson,whointroducedthetermin1891,derivedthe
namefrom"historicaldiagram".[5]

Anexamplehistogramoftheheights
of31BlackCherrytrees.

Examples
Thisisatoyexample

Bin Count
3.5

2.5

32

1.5

109

0.5

180

0.5

132

1.5

34

2.5

3.5

Thelanguageusedtodescribethepatternsinahistogramare
symmetric,skewedleftorright,unimodal,bimodalormultimodal.

Symmetric,unimodal

Skewedright

Skewedleft

Bimodal

Multimodal

Symmetric

Itisagoodideatoplotyourdataonseveraldifferentbinwidthstolearnmoreaboutit.Hereisanexample
ontipsgiveninarestaurant.

Tipsusinga$1
Tipsusinga10c
binwidth,skewedright, binwidth,stillskewed
unimodal
right,multimodalwith
modesat$and50c
amounts,indicates
rounding,alsosome
outliers

Hereareacouplemoreexamples.
PricesofhousessoldinAmesin2009,exhibitssomerightskew.
Acesbyplayersinagrandslamtennistournament,facettedby
gender.Therearemoreacesinthemensgame.
TheU.S.CensusBureaufoundthattherewere124millionpeople
whoworkoutsideoftheirhomes.[6]Usingtheirdataonthetime
occupiedbytraveltowork,Table2belowshowstheabsolute
numberofpeoplewhorespondedwithtraveltimes"atleast30but
lessthan35minutes"ishigherthanthenumbersforthecategories
aboveandbelowit.Thisislikelyduetopeopleroundingtheir
reportedjourneytime.Theproblemofreportingvaluesassomewhat
arbitrarilyroundednumbersisacommonphenomenonwhen
collectingdatafrompeople.

Histogramoftraveltime(towork),US2000census.Area
underthecurveequalsthetotalnumberofcases.This
diagramusesQ/widthfromthetable.

Databyabsolutenumbers
Interval Width Quantity Quantity/width
0

4180

836

13687

2737

10

18618

3723

15

19634

3926

20

17981

3596

25

7190

1438

30

16369

3273

35

3212

642

40

4122

824

45

15

9200

613

60

30

6461

215

90

60

3435

57

Thishistogramshowsthenumberofcasesperunitintervalastheheightofeachblock,sothattheareaof
eachblockisequaltothenumberofpeopleinthesurveywhofallintoitscategory.Theareaunderthe
curverepresentsthetotalnumberofcases(124million).Thistypeofhistogramshowsabsolutenumbers,
withQinthousands.

Histogramoftraveltime(towork),US2000census.Area
underthecurveequals1.ThisdiagramusesQ/total/width
fromthetable.

Databyproportion
Interval Width Quantity(Q) Q/total/width
0

4180

0.0067

13687

0.0221

10

18618

0.0300

15

19634

0.0316

20

17981

0.0290

25

7190

0.0116

30

16369

0.0264

35

3212

0.0052

40

4122

0.0066

45

15

9200

0.0049

60

30

6461

0.0017

90

60

3435

0.0005

Thishistogramdiffersfromthefirstonlyintheverticalscale.Theareaofeachblockisthefractionofthe
totalthateachcategoryrepresents,andthetotalareaofallthebarsisequalto1(thefractionmeaning"all").
Thecurvedisplayedisasimpledensityestimate.Thisversionshowsproportions,andisalsoknownasa
unitareahistogram.
Inotherwords,ahistogramrepresentsafrequencydistributionbymeansofrectangleswhosewidths
representclassintervalsandwhoseareasareproportionaltothecorrespondingfrequencies:theheightof

eachistheaveragefrequencydensityfortheinterval.Theintervalsareplacedtogetherinordertoshowthat
thedatarepresentedbythehistogram,whileexclusive,isalsocontiguous.(E.g.,inahistogramitispossible
tohavetwoconnectingintervalsof10.520.5and20.533.5,butnottwoconnectingintervalsof10.520.5
and22.532.5.Emptyintervalsarerepresentedasemptyandnotskipped.)[7]

Mathematicaldefinition
Inamoregeneralmathematicalsense,a
histogramisafunctionmithatcountsthe
numberofobservationsthatfallintoeachofthe
disjointcategories(knownasbins),whereasthe
graphofahistogramismerelyonewayto
representahistogram.Thus,ifweletnbethe
totalnumberofobservationsandkbethetotal
numberofbins,thehistogrammimeetsthe
followingconditions:
Anordinaryandacumulativehistogramofthesamedata.
Thedatashownisarandomsampleof10,000pointsfroma
normaldistributionwithameanof0andastandard
deviationof1.

Cumulativehistogram
Acumulativehistogramisamappingthatcountsthecumulativenumberofobservationsinallofthebins
uptothespecifiedbin.Thatis,thecumulativehistogramMiofahistogrammjisdefinedas:

Numberofbinsandwidth
Thereisno"best"numberofbins,anddifferentbinsizescanrevealdifferentfeaturesofthedata.Grouping
dataisatleastasoldasGraunt'sworkinthe17thcentury,butnosystematicguidelinesweregiven[8]until
Sturges'sworkin1926.[9]
Usingwiderbinswherethedensityislowreducesnoiseduetosamplingrandomnessusingnarrowerbins
wherethedensityishigh(sothesignaldrownsthenoise)givesgreaterprecisiontothedensityestimation.
Thusvaryingthebinwidthwithinahistogramcanbebeneficial.Nonetheless,equalwidthbinsarewidely
used.
Sometheoreticianshaveattemptedtodetermineanoptimalnumberofbins,butthesemethodsgenerally
makestrongassumptionsabouttheshapeofthedistribution.Dependingontheactualdatadistributionand
thegoalsoftheanalysis,differentbinwidthsmaybeappropriate,soexperimentationisusuallyneededto
determineanappropriatewidth.Thereare,however,varioususefulguidelinesandrulesofthumb.[10]

Thenumberofbinskcanbeassigneddirectlyorcanbecalculatedfromasuggestedbinwidthhas:

Thebracesindicatetheceilingfunction.
Squarerootchoice

whichtakesthesquarerootofthenumberofdatapointsinthesample(usedbyExcelhistogramsandmany
others).[11]
Sturges'formula
Sturges'formula[9]isderivedfromabinomialdistributionandimplicitlyassumesanapproximatelynormal
distribution.

Itimplicitlybasesthebinsizesontherangeofthedataandcanperformpoorlyifn<30,becausethe
numberofbinswillbesmalllessthansevenandunlikelytoshowtrendsinthedatawell.Itmayalso
performpoorlyifthedataarenotnormallydistributed.
RiceRule

TheRiceRule[12]ispresentedasasimplealternativetoSturges'srule.
Doane'sformula
Doane'sformula[13]isamodificationofSturges'formulawhichattemptstoimproveitsperformancewith
nonnormaldata.

where istheestimated3rdmomentskewnessofthedistributionand

Scott'snormalreferencerule

where isthesamplestandarddeviation.Scott'snormalreferencerule[14]isoptimalforrandomsamplesof
normallydistributeddata,inthesensethatitminimizestheintegratedmeansquarederrorofthedensity
estimate.[8]
FreedmanDiaconis'choice
TheFreedmanDiaconisruleis:[15][8]

whichisbasedontheinterquartilerange,denotedbyIQR.Itreplaces3.5ofScott'srulewith2IQR,which
islesssensitivethanthestandarddeviationtooutliersindata.
ChoicebasedonminimizationofanestimatedL2[16]riskfunction

where

and aremeanandbiasedvarianceofahistogramwithbinwidth ,

and

.
Remark
Agoodreasonwhythenumberofbinsshouldbeproportionalto
isthefollowing:supposethatthe
dataareobtainedas independentrealizationsofaboundedprobabilitydistributionwithsmoothdensity.
Thenthehistogramremainsequallyruggedas tendstoinfinity.If isthewidthofthedistribution
(e.g.,thestandarddeviationortheinterquartilerange),thenthenumberofunitsinabin(thefrequency)is
oforder
andtherelativestandarderrorisoforder
.Comparingtothenextbin,the
relativechangeofthefrequencyisoforder
providedthatthederivativeofthedensityisnonzero.
Thesetwoareofthesameorderif isoforder
,sothat isoforder
.
Thissimplecubicrootchoicecanalsobeappliedtobinswithnonconstantwidth.

Seealso
Databinning
Densityestimation
Kerneldensityestimation,asmootherbutmore
complexmethodofdensityestimation
FreedmanDiaconisrule

WikimediaCommonshas
mediarelatedto
Histograms.

Imagehistogram
Paretochart
SevenBasicToolsofQuality
Voptimalhistograms

References
1. ^Pearson,K.(1895)."ContributionstotheMathematicalTheoryofEvolution.II.SkewVariationin
HomogeneousMaterial".PhilosophicalTransactionsoftheRoyalSocietyA:Mathematical,Physicaland
EngineeringSciences186:343414.Bibcode:1895RSPTA.186..343P
(http://adsabs.harvard.edu/abs/1895RSPTA.186..343P).doi:10.1098/rsta.1895.0010
(http://dx.doi.org/10.1098%2Frsta.1895.0010).
2. ^Howitt,D.andCramer,D.(2008)StatisticsinPsychology.PrenticeHall
3. ^CharlesStangor(2011)"ResearchMethodsForTheBehavioralSciences".Wadsworth,CengageLearning.
ISBN9780840031976.
4. ^NancyR.Tague(2004)."SevenBasicQualityTools"(http://www.asq.org/learnaboutquality/sevenbasic
qualitytools/overview/overview.html).TheQualityToolbox.Milwaukee,Wisconsin:AmericanSocietyfor
Quality.p.15.Retrieved20100205.
5. ^M.EileenMagnello(December2006)."KarlPearsonandtheOriginsofModernStatistics:AnElastician
becomesaStatistician"(http://www.rutherfordjournal.org/article010107.html).TheNewZealandJournalforthe
HistoryandPhilosophyofScienceandTechnology.1volume.OCLC682200824
(https://www.worldcat.org/oclc/682200824).
6. ^US2000census(http://www.census.gov/prod/2004pubs/c2kbr33.pdf).
7. ^Dean,S.,&Illowsky,B.(2009,February19).DescriptiveStatistics:Histogram.Retrievedfromthe
ConnexionsWebsite:http://cnx.org/content/m16298/1.11/
8. ^abcScott,DavidW.(1992).MultivariateDensityEstimation:Theory,Practice,andVisualization.NewYork:
JohnWiley.
9. ^abSturges,H.A.(1926)."Thechoiceofaclassinterval".JournaloftheAmericanStatisticalAssociation:65
66.JSTOR2965501(https://www.jstor.org/stable/2965501).
10. ^e.g.5.6"DensityEstimation",W.N.VenablesandB.D.Ripley,ModernAppliedStatisticswithS(2002),
Springer,4thedition.ISBN0387954570.
11. ^EXCEL2007:Histogram(http://cameron.econ.ucdavis.edu/excel/ex11histogram.html)
12. ^OnlineStatisticsEducation:AMultimediaCourseofStudy(http://onlinestatbook.com/).ProjectLeader:David
M.Lane,RiceUniversity(chapter2"GraphingDistributions",section"Histograms")
13. ^DoaneDP(1976)Aestheticfrequencyclassication.AmericanStatistician,30:181183
14. ^Scott,DavidW.(1979)."Onoptimalanddatabasedhistograms".Biometrika66(3):605610.
doi:10.1093/biomet/66.3.605(http://dx.doi.org/10.1093%2Fbiomet%2F66.3.605).
15. ^Freedman,DavidDiaconis,P.(1981)."Onthehistogramasadensityestimator:L2theory".Zeitschriftfr
WahrscheinlichkeitstheorieundverwandteGebiete57(4):453476.doi:10.1007/BF01025868
(http://dx.doi.org/10.1007%2FBF01025868).

16. ^Shimazaki,H.Shinomoto,S.(2007)."Amethodforselectingthebinsizeofatimehistogram"
(http://www.mitpressjournals.org/doi/abs/10.1162/neco.2007.19.6.1503).NeuralComputation19(6):1503
1527.doi:10.1162/neco.2007.19.6.1503(http://dx.doi.org/10.1162%2Fneco.2007.19.6.1503).PMID17444758
(https://www.ncbi.nlm.nih.gov/pubmed/17444758).

Furtherreading
Lancaster,H.O.AnIntroductiontoMedicalStatistics.JohnWileyandSons.1974.ISBN0471
512508

Externallinks
JourneyToWorkandPlaceOfWork

WikimediaCommonshas
mediarelatedtoHistogram.
Lookuphistogramin
Wiktionary,thefree
dictionary.

(http://www.census.gov/population/www/socdemo/journey.html)(locationofcensusdocumentcited
inexample)
Smoothhistogramforsignalsandimagesfromafewsamples
(http://www.mathworks.com/matlabcentral/fileexchange/30480histconnect)
Histograms:Construction,AnalysisandUnderstandingwithexternallinksandanapplicationto
particlePhysics.(http://quarknet.fnal.gov/toolkits/ati/histograms.html)
AMethodforSelectingtheBinSizeofaHistogram
(http://2000.jukuin.keio.ac.jp/shimazaki/res/histogram.html)
Interactivehistogramgenerator(http://www.shodor.org/interactivate/activities/histogram/)
Matlabfunctiontoplotnicehistograms
(http://www.mathworks.com/matlabcentral/fileexchange/27388plotandcomparenicehistograms
bydefault)
DynamicHistograminMSExcel(http://excelandfinance.com/histograminexcel/)
Histogramconstruction
(http://wiki.stat.ucla.edu/socr/index.php/SOCR_EduMaterials_ModelerActivities_MixtureModel_1)
andmanipulation
(http://wiki.stat.ucla.edu/socr/index.php/SOCR_EduMaterials_Activities_PowerTransformFamily_Gr
aphs)usingJavaapplets,andcharts(http://www.socr.ucla.edu/htmls/SOCR_Charts.html)onSOCR
Retrievedfrom"http://en.wikipedia.org/w/index.php?title=Histogram&oldid=641376957"

Categories: Statisticalchartsanddiagrams Qualitycontroltools Estimationofdensities


Nonparametricstatistics
Thispagewaslastmodifiedon7January2015at08:42.
TextisavailableundertheCreativeCommonsAttributionShareAlikeLicenseadditionaltermsmay
apply.Byusingthissite,youagreetotheTermsofUseandPrivacyPolicy.Wikipediaisa
registeredtrademarkoftheWikimediaFoundation,Inc.,anonprofitorganization.

Вам также может понравиться