
# Dr. Valerie P. Muehsam, 2006

## Introduction to Statistics

In the business world, and in fact in practically every aspect of daily living, quantitative techniques are used to assist in decision making. Why? Unlike the classroom, in the "real world" there is often not enough information available to guarantee a correct decision. For instance, if advertisers would like to know how many households in the United States with televisions are tuned to a particular television show at a particular date and time, it would be impossible to determine without the complete cooperation of every household and an astonishing amount of time and money. If a consumer protection agency wanted to determine the true proportion of prescription drug users who also use herbal, non-regulated, over-the-counter supplements, this information would most likely not be available. Statistics and other quantitative techniques have developed as a response to this inability to determine characteristics of interest directly.

Statistics is defined as the process of collecting a sample and organizing, analyzing and interpreting data. The numeric values which represent the characteristics analyzed in this process are also referred to as statistics. When information related to a particular group is desired, and it is impossible or impractical to obtain this information for the whole group, a sample or subset of the group is obtained and the information of interest is determined for the subset. For instance, if someone is interested in the average annual income of all the students with majors in the College of Business Administration at Sam Houston State University, the only way this information could be obtained is if the annual income of every student in this population could be collected, recorded and analyzed without error.

Since this would take considerable time and money, and since the probability of collecting all the data necessary to determine the true annual salary of the students is small, a sample of this population will be taken. The mean annual salary of the sample of students will be determined and used to estimate the true mean annual salary of all the students with majors in the College of Business Administration at Sam Houston State University.

The study of statistics consists of two types: **descriptive statistics** and **inferential statistics**. Descriptive statistics are characteristics, usually numeric, used to describe a particular data set. An example of a descriptive statistic would be the average final exam grade of ten students in an elementary statistics class. This average test score is used to indicate a "typical value" for the exam grades of the ten students. Inferential statistics, on the other hand, are similar to descriptive statistics in that each is calculated from a sample; the difference is in how the statistic is used. In inferential statistics, the statistic is used to make inferences, or decisions, about the entire population of interest. In other words, we take a sample, calculate a statistic, and use that statistic to make an inference about the actual value of the characteristic in the entire population.

For instance, there are many descriptive characteristics of a firm's customers that its management would like to know, but this information may be difficult or impossible to determine. Measuring each and every customer of a large retail firm is nearly impossible. Even if the information were gathered, it is unlikely that it would be timely. Managers do not always know what the mean (average) weekly demand for a product will be or what proportion of television viewers will watch a particular show. Since these parameters of interest are not known, and are usually impossible or impractical to determine, they are estimated using partial information gathered from a sample. For instance, if the desired parameter is the mean annual salary of the income-earning residents of a particular county, a sample of 200 of these residents could be obtained, the annual salary of each resident (element) in the sample could be determined, and the mean annual salary of the sampled residents could be calculated. If the sample is drawn in a random fashion from a frame, or list, of the entire population, and if we use correct statistical techniques, the sample mean annual salary (a statistic) may be a good estimate of the true mean annual salary (a parameter) of all the income-earning residents of this county.

A **population** includes all the elements of interest. We use the term "element" to represent each individual unit of a group in which we have interest. For instance, elements may refer to people (e.g., customers), records (e.g., all loan accounts at a particular bank), products (e.g., when we are interested in the proportion defective), etc. The notation used in statistics to represent the **population size** is N. In our example above, the population of interest would be all the income-earning residents of the county. Each of these residents is an element in our population. If the population of income-earning residents in the county was 50,000, then N = 50,000. The size of the population, N, is often not known.

A **sample** is a subset of the population. The notation for the **sample size** is n. In our previous example, the sample would be the 200 residents we sampled out of all the income-earning residents in the county. In this case n = 200.


A **parameter** is a characteristic, usually numeric, of the population. Populations have many parameters, but researchers are often interested in only one or two of these characteristics. For instance, in our example above, the parameter of interest is the population mean annual salary of all the income-earning residents of the county. The mean annual salary is but one of many characteristics of this population that may be of interest and could also be estimated. The proportion of these residents who support a particular school bond issue and the mean age of the residents are two examples of other parameters that may be of interest.

A **statistic** is a characteristic, usually numeric, of the sample. Samples, like populations, have many statistics that may be calculated. For each parameter of a population, there is a corresponding statistic that may be calculated from a sample. An important item to remember is that a statistic is a **random variable**, which means that different samples may result in different values for the statistic. For instance, in the example above, the statistic is the sample mean annual income of the 200 residents of the county. This value is called the "sample mean" because it is calculated from the sample. Although the sample mean is our "best guess" for the value of the population mean, it is one of many possible values that could be calculated from different samples of size 200. In other words, there are many samples of 200 that could be collected from the population of 50,000 residents. Unfortunately, even if we take a random sample of 200, we could end up with the most affluent 200 residents in the county. The sample mean calculated from this sample would not be representative of the population. The possibility of collecting a sample like this cannot be ignored. We will, however, learn to use statistical techniques that allow us to estimate the probability of getting a value for the sample statistic that is not a good estimate of the population parameter.

The use of statistics to estimate parameters of interest is not guaranteed to be successful. If the estimate is not "good," the result could be a faulty decision that, in turn, could result in loss of time and/or revenue. We must not allow quantitative techniques to make decisions for us; we must use these techniques only as a tool to assist us in decision making.
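The idea that a statistic is a random variable can be made concrete with a small simulation. The sketch below is only an illustration: the population is synthetic, and the lognormal shape, its parameters, and the choice of 500 repeated samples are assumptions that do not come from the notes. Only N = 50,000 and n = 200 are taken from the example above.

```python
import random
import statistics

random.seed(0)  # fixed seed so the sketch is reproducible

N = 50_000  # population size from the county example
n = 200     # sample size from the county example

# Hypothetical right-skewed salaries; the lognormal shape and its
# parameters are assumptions made purely for illustration.
population = [random.lognormvariate(10.5, 0.6) for _ in range(N)]
mu = statistics.fmean(population)  # the parameter (unknown in practice)

# Each random sample of size n produces its own value of the statistic.
sample_means = [
    statistics.fmean(random.sample(population, n)) for _ in range(500)
]

print(f"population mean (parameter): {mu:,.0f}")
print(f"smallest sample mean:        {min(sample_means):,.0f}")
print(f"largest sample mean:         {max(sample_means):,.0f}")
```

Every run of `random.sample` is a legitimate random sample, yet the sample means differ from one another and from the population mean, which is why a single sample mean is treated as an estimate and not as the parameter itself.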

## Scale of Data Measurement

Before any statistical technique is employed, a researcher must determine the type of data that is to be collected. In a general sense, there are two types of data: **qualitative data** and **quantitative data**.

**Qualitative data** categorizes an element by a non-numeric attribute. For instance, if we are interested in which political party a resident belongs to, we are categorizing the resident using qualitative data: Democratic, Republican, Independent, etc. Qualitative data is often the data we are interested in gathering in the social sciences and particularly in business. For instance, much of what we want to know in business is related to attitudes or behavior of consumers. The data is not numeric and is therefore more difficult to analyze. We often calculate the proportion of elements with a particular characteristic (e.g., the proportion of residents who own their own home), but many techniques cannot be used on this type of data. There are two types of qualitative data: nominal data and ordinal data.

**Nominal data** is, in terms of structure, the lowest form of data. Nominal data is qualitative data that has no natural order. Examples of nominal data include: gender; political affiliation; type of car owned; product model; etc. Data comprised of "numbers" can also be qualitative data. Zip codes, area codes and telephone numbers are examples of data that are qualitative. In math terms, these data are not "real" numbers because they do not represent numeric measures. One way to determine whether "numbers" are numeric measures is to consider whether one might be interested in an average of these "numbers." If a number can be replaced with letters, words or symbols without losing any information, then the "number" is NOT a numeric measure.

**Ordinal data** is qualitative data that has a natural order. Examples of ordinal data include: military rank; size of clothing using S, M, L, XL; place in which a race was finished; condition of a used appliance using POOR, AVERAGE, GOOD, EXCELLENT; etc. While ordinal data has an order, the intervals between the rankings are not necessarily equal. Thus, while ordinal data has more structure than nominal data, math functions on the data, such as differences, are not valid.

**Quantitative data** categorizes an element by a numeric measure. Quantitative data are true numbers and, as a result, more quantitative techniques are available for use with this data. Quantitative data can be divided into two types: interval data and ratio data.

**Interval data** is quantitative data that has no natural starting point or zero level. Examples of interval data include Fahrenheit temperature and scores on IQ tests. Each of these is a numeric measure, but neither has a natural starting point or zero level. Zero degrees Fahrenheit is not the absence of temperature, just as there is no zero level for a test of intelligence. Interval data can be used for any technique that requires quantitative data; however, we must realize that ratios have no meaning with this type of data since there is no natural zero level. For example, 50 degrees Fahrenheit is not twice as warm as 25 degrees Fahrenheit.

**Ratio data** is quantitative data that has a natural starting point or zero level. Most quantitative data falls into this scale of data measurement. Examples of ratio scaled data include height, weight, rate of return, net income, etc. Since there is a natural zero level, ratios have meaning.
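The Fahrenheit example can be checked numerically. The sketch below brings in the kelvin scale, which the notes do not mention, as an assumed example of a temperature scale with a natural zero; the function name is hypothetical.

```python
def fahrenheit_to_kelvin(deg_f: float) -> float:
    """Convert Fahrenheit (interval scale) to kelvin (ratio scale)."""
    return (deg_f - 32.0) * 5.0 / 9.0 + 273.15


warm_f, cool_f = 50.0, 25.0

# Ratio of the raw Fahrenheit readings: looks like "twice as warm".
naive_ratio = warm_f / cool_f

# Ratio of the same two temperatures on a scale with a true zero.
kelvin_ratio = fahrenheit_to_kelvin(warm_f) / fahrenheit_to_kelvin(cool_f)

print(f"ratio of Fahrenheit readings: {naive_ratio:.2f}")
print(f"ratio on the kelvin scale:    {kelvin_ratio:.3f}")
```

The raw readings give a ratio of 2, but on a scale with a natural zero the warmer temperature is only about 5% higher, which is why ratios of interval data should not be interpreted.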

## Measures of Central Tendency

Once we have decided the type of data that we are going to collect, we must determine the type of techniques that are appropriate for analyzing the data. The first organizational technique we will most likely perform is to order the data from smallest value to largest value. We order the data to get an idea about the range of the values observed. Consider a particular example: if we have collected annual income figures from 1,000 households, what might we be interested in knowing about this data? Perhaps we would be interested in a typical annual income value for the data set. Typical values are often referred to as **Measures of Central Tendency**. Measures of central tendency are attempts to identify typical values which are representative of the 1,000 observations collected. The three most common measures of central tendency are the mean, the median and the mode. All three of these measures are referred to as "average" or "typical" values, although they are each different measures of typical.

The first, and most popular, measure of central tendency is the arithmetic mean, hereafter referred to as simply the **mean**. The mean is calculated as the sum of the observations divided by the number of observations. The sample mean is denoted $\bar{x}$ and the formula for calculating the sample mean is $\bar{x} = \frac{\sum x}{n}$. The population or true mean is denoted $\mu$ (the Greek script letter "mu") and is calculated the same way as the sample mean except that all elements in the population are measured. The mean requires at least interval scaled data, which means it is only valid for true numeric measures.

The mean is often referred to as the **"gravitational center of the data set,"** which is similar to the balancing point of the data. If equal weights were placed on a scale representing a number line, one for each observation in a data set, the mean would be the point at which the scale balances. Since each observation has an equal weight, the magnitude of the values influences the mean. The mean, while certainly the most commonly used measure of central tendency, is not always a good measure of "typical." For instance, extreme values relative to the rest of the data "pull" the mean in their direction. Extremely small values cause the mean to be "small" and extremely large values cause the mean to be "large." The result is that the mean is not a "good" measure of typical and in fact may be larger or smaller than all values except the extreme one. When extreme values occur in a data set, we often use another measure of typical referred to as the median. For instance, a typical income is often best expressed as the median income rather than the mean income, since there is a lower limit (zero) but not an upper limit on income.

The **median** is the second most commonly used measure of central tendency and is referred to as the **positional average**. The median is the **center value in an ordered data set**. If the data set has an odd number of observations, then the median is the value found in the center of the distribution of ordered values. If the data set has an even number of values, then the median is the mean of the two values surrounding the center of the data set. The median is also $P_{50}$, the fiftieth percentile. This means that 50%, or half, of the values are smaller than the median and half of the values, or 50%, are greater than the median. The procedure for finding the median is:

1. Order the data set from smallest to largest (or largest to smallest). NOTE: this requires that the data can be ordered, so the median cannot be found for nominal data.
2. Find i, which is the location or position of the median. This position can be calculated by using the following formula: $i = \frac{n + 1}{2}$, where n is the size of the sample.
3. If i is an integer, then the median is the value found at the ith position in the ordered data set. If i is not an integer, then the median is the mean of the two values surrounding the ith position.

The median is often denoted as M or $\tilde{x}$.
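The median procedure, and the way an extreme value pulls the mean, can be sketched as follows. The income figures are hypothetical and the function name `median_by_position` is an assumption introduced only for this illustration; `statistics.median` from the Python standard library is used as a cross-check.

```python
import statistics


def median_by_position(values):
    """Median via the positional procedure: i = (n + 1) / 2."""
    ordered = sorted(values)          # step 1: order the data
    n = len(ordered)
    i = (n + 1) / 2                   # step 2: position of the median
    if i.is_integer():                # step 3: i is a whole number
        return ordered[int(i) - 1]    # positions in the text are 1-based
    lower = ordered[int(i) - 1]       # value just below position i
    upper = ordered[int(i)]           # value just above position i
    return (lower + upper) / 2


# Hypothetical annual incomes in thousands; the last value is extreme.
incomes = [32, 35, 38, 41, 44, 47, 400]

print("mean:  ", statistics.fmean(incomes))
print("median:", median_by_position(incomes))
print("median without the extreme value:", median_by_position(incomes[:-1]))
```

With these numbers the mean is larger than every observation except the extreme one, while the median stays in the middle of the bulk of the data.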

The last of the more common measures of central tendency is called the **mode**. The mode is the most commonly occurring value in a data set; in other words, the value that occurs with the greatest frequency. The mode, unlike either the mean or the median, does not have to be unique. A data set can have more than one mode or no mode at all. A data set with one mode is referred to as unimodal, one with two modes as bimodal, and one with three or more modes as multimodal. There is no universal notation for the mode, and the mode is valid for any type of data.
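A minimal sketch of finding the mode or modes is given here. The function name `modes` is hypothetical, and the convention that a data set has no mode when every value occurs equally often is an assumption; the notes only say that a data set may have no mode at all.

```python
from collections import Counter


def modes(values):
    """Return every value that attains the highest frequency.

    Assumed convention: if two or more distinct values all occur
    equally often, the data set is treated as having no mode.
    """
    counts = Counter(values)
    if not counts:
        return []
    top = max(counts.values())
    if len(counts) > 1 and all(c == top for c in counts.values()):
        return []
    return [value for value, c in counts.items() if c == top]


print(modes([1, 2, 2, 3]))                  # one mode (unimodal)
print(modes(["S", "M", "M", "L", "L"]))     # two modes (bimodal)
print(modes([1, 2, 3]))                     # no mode
```

Because the function only counts occurrences, it works for nominal data such as clothing sizes as well as for numeric data.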

## Measures of Data Variation

Besides a measure of "typical," what else might we want to know about a data set? Do the measures of central tendency tell us all we need to "know" about the observations we have collected? Certainly not. In fact, two data sets could have the same mean and be completely different in terms of dispersion. Consider that we "know" the mean depth of a lake where we plan our next office picnic. Suppose the mean depth of the lake is 4 feet. Is this all we need to know about the depth of this lake? No. We need to know how much the values (depths) vary around 4 feet. The depth of the lake could be 4 feet at every point and have a mean of 4 feet, or the depth of the lake could vary greatly around 4 feet and still have a mean of 4 feet. There could be places where the depth is a few inches and other places where the depth is 10 feet. This information about how the data are dispersed is very important (especially for those of us who cannot swim). The study of statistics could appropriately be referred to as the study of variability, since many of the techniques compare the variability of typical values in different groups to determine whether or not these values are the same or different between groups.

**Measures of Data Variation** (variability, dispersion, or spread) are attempts to describe how spread out the values are, or how much they vary, in a particular data set. **All measures of data variation or dispersion** require quantitative data to calculate and are **nonnegative**. The measures of data variation are zero (if all the values are equal) or positive. A "large" measure of spread indicates a more dispersed data set, while a "small" measure indicates a more tightly grouped data set.

The easiest measure of spread to calculate is the range. The **range** is the difference between the largest or maximum value and the smallest or minimum value. The notation and formula for the range is $R = H - L$, where H is the largest or maximum value and L is the smallest or minimum value. The range, while simple to calculate, is only informative if it is "small."

"Small" and "large" are relative terms and must be determined relative to the magnitude of the values measured. For instance, a $3 spread in the price of dinner could be characterized as "small" if we are eating at a five-star restaurant in a pricey hotel in New York City, where the dinner entrees cost anywhere from $12.00 to $35.00, but may be characterized as "large" if we're eating at a local fast-food restaurant.

If the range is "small," the two extreme values are very close to each other, so the rest of the values must also be tightly grouped. If the range is "large," we know that the extreme values are a long way from each other, but we know nothing about the distribution of the rest of the observations. Since the range only uses two values in its calculation, we are provided with limited information.

As with our favorite measure of central tendency, the mean, we might like to come up with a measure of variability that incorporates all the values in the data set as opposed to using only the two values needed to calculate the range. We might be interested in finding out, on the average, how much the values vary around a "typical value." In an effort to describe the variability of a data set, we could measure the distance each value is from the mean, our standard measure of "typical." The distance a value is from the mean is called the **"deviation from the mean"** and is found by subtracting the mean from a particular value. This deviation from the mean can be negative (if the value is smaller than the mean), positive (if the value is bigger than the mean) or zero (if the value is equal to the mean). To calculate the **average deviation from the mean**, we could sum the deviations from the mean for each value in the data set and divide by the number of observations in our sample. Unfortunately, although a good idea intuitively, this value will always be zero: since the mean is the gravitational center of the data set, **the deviations from the mean sum to zero**, and so the average is $\frac{\sum (x - \bar{x})}{n} = 0$. This occurs because the deviations from the mean that are negative offset the deviations from the mean that are positive. We can avoid this problem by using the absolute value or the square of the deviations from the mean.

The **Mean Absolute Deviation (MAD)** is the sum of the absolute deviations from the mean divided by the sample size: $MAD = \frac{\sum |x - \bar{x}|}{n}$. The MAD is used in financial analysis to determine the variability in stock prices from the expected price. Unfortunately, while the MAD is the "best" measure of spread for descriptive purposes, it is not useful for inferential statistics since the absolute value function is not smooth.

The **sample variance**, denoted $s^2$, is the sum of the squared deviations from the mean divided by the sample size less one ($n - 1$). Continuing our effort to find an average deviation from the mean, we square the deviations from the mean to eliminate any negative values so our numerator is not equal to zero, and then divide by the sample size less one. Our denominator is made smaller (hence our variance is made larger) as an adjustment to our estimate for the true **population variance**, denoted $\sigma^2$ (sigma squared), since we calculate the sample variance, $s^2$, using the sample mean, $\bar{x}$, instead of the true population mean, $\mu$ (mu). The true measure of variability for the population should be calculated according to each value's distance from $\mu$, the population mean. The adjustment in the denominator makes our estimate larger than it would be without the adjustment, to account for the estimate ($\bar{x}$) used in the numerator. We would prefer to have a "small" measure of variability, because this indicates that the mean, $\bar{x}$, is a good measure of "typical" since most of the values are "close to" the mean; adjusting our estimate for the variance to be larger is therefore considered to be conservative. We are unsure of the true value of the mean, so we use the value of the sample mean to estimate the variability in the data. The deviations from the mean are estimated using deviations from the sample mean. It is said that we lose one degree of freedom (df) in the denominator for every estimate in the numerator. **All variances are of the form: sum of squares divided by degrees of freedom.**

The problem with the variance is that the value is in squared units. For instance, if we are measuring the dollar amount spent on lunch, the variance will be in dollars squared. Since squared units make interpretation difficult, we normally take the square root of the variance to return to the original units of measurement. The positive square root of the sample variance, $s^2$, is the **sample standard deviation**, s. The sample standard deviation, s, is our estimate for the true **population standard deviation**, denoted $\sigma$ (sigma), which is the positive square root of the population variance, $\sigma^2$.

The definitional formula for the sample variance, $s^2$, is given below, followed by an algebraic manipulation which we call the computational formula. The computational formula is easier and faster to calculate, but intuitively the definitional formula makes more sense as our estimate of the "average" (squared) deviation from the mean.

$$s^2 = \frac{\sum (x - \bar{x})^2}{n - 1} = \frac{\sum x^2 - \frac{(\sum x)^2}{n}}{n - 1}$$

$$s = \sqrt{s^2} = \text{the sample standard deviation}$$
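As a check that the definitional and computational formulas agree, here is a short sketch. The lunch amounts are hypothetical and the function names are assumptions introduced for illustration; `statistics.variance` from the Python standard library, which also divides by n − 1, serves as a cross-check.

```python
import math
import statistics


def sample_variance_definitional(xs):
    """Sum of squared deviations from the sample mean over (n - 1)."""
    n = len(xs)
    x_bar = sum(xs) / n
    return sum((x - x_bar) ** 2 for x in xs) / (n - 1)


def sample_variance_computational(xs):
    """Algebraically equivalent form using sum(x^2) and (sum x)^2 / n."""
    n = len(xs)
    return (sum(x * x for x in xs) - sum(xs) ** 2 / n) / (n - 1)


lunch = [6.0, 8.0, 9.0, 12.0, 15.0]  # hypothetical dollars spent on lunch

s2_def = sample_variance_definitional(lunch)
s2_comp = sample_variance_computational(lunch)
s = math.sqrt(s2_def)  # sample standard deviation, back in dollars

print(f"definitional s^2:  {s2_def:.4f} (dollars squared)")
print(f"computational s^2: {s2_comp:.4f} (dollars squared)")
print(f"s:                 {s:.4f} (dollars)")
```

Both forms give the same variance here; in floating-point arithmetic the computational form can lose precision when the values are large relative to their spread, so the definitional form is often the safer choice in code.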

Although we rarely calculate parameters, the following formulae are given for the population variance and the population standard deviation.

$$\sigma^2 = \frac{\sum (x - \mu)^2}{N} = \frac{\sum x^2 - \frac{(\sum x)^2}{N}}{N}$$

$$\sigma = \sqrt{\sigma^2} = \text{the population standard deviation}$$
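The measures in this section can be compared side by side on one small data set. The depths below are hypothetical values chosen so that the mean is 4 feet, echoing the lake example; treating the same five numbers first as a sample and then as a whole population is an assumption made only to contrast the two denominators.

```python
import statistics

depths = [0.5, 2.0, 3.0, 4.5, 10.0]  # hypothetical lake depths in feet

n = len(depths)
x_bar = sum(depths) / n
deviations = [x - x_bar for x in depths]

data_range = max(depths) - min(depths)        # R = H - L
mad = sum(abs(d) for d in deviations) / n     # mean absolute deviation
sum_sq = sum(d * d for d in deviations)       # sum of squared deviations
sample_var = sum_sq / (n - 1)                 # divides by n - 1
population_var = sum_sq / n                   # divides by N

print(f"mean depth:            {x_bar:.2f} ft")
print(f"sum of deviations:     {sum(deviations):.2f}")
print(f"range:                 {data_range:.2f} ft")
print(f"MAD:                   {mad:.2f} ft")
print(f"sample variance:       {sample_var:.3f} ft^2")
print(f"population variance:   {population_var:.3f} ft^2")
```

The deviations cancel to zero as described above, and the sample variance comes out larger than the population variance computed from the same numbers, which is the conservative adjustment made by dividing by n − 1.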

## Uses of the Standard Deviation

The standard deviation of a sample is an attempt to estimate the typical distance that values in the data set differ from the mean. We use the standard deviation as the step size to estimate the percentage of values that lie within one step, two steps, or three steps of the mean. For example, Chebyshev's Theorem, which applies to any distribution regardless of its shape, states that within k standard deviations of the mean (for k > 1), at least $\left(1 - \frac{1}{k^2}\right) \times 100\%$ of the values will fall. Since Chebyshev's Theorem applies to any distribution regardless of shape, the information learned is less specific than we might like. In other words, using the formula, we would discover that **at least 75% of the observations (in any distribution) lie within 2 standard deviations of the mean.** This means that 75% to 100% of the values will fall within two standard deviations of the mean. While some information is better than none, we would like to be more precise in our estimate of this percentage. For certain known distributions, we can more precisely estimate the percentage of values that lie within one, two or three standard deviations of the mean.
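A short sketch of the Chebyshev bound follows. The function name and the skewed data set are hypothetical; the population standard deviation (`statistics.pstdev`) is used for the check because the data are treated as the whole distribution.

```python
import statistics


def chebyshev_lower_bound(k: float) -> float:
    """Minimum share of values within k standard deviations (k > 1)."""
    if k <= 1:
        raise ValueError("the bound is only informative for k > 1")
    return 1.0 - 1.0 / k**2


for k in (2, 3):
    print(f"k = {k}: at least {chebyshev_lower_bound(k):.1%} of the values")

# A deliberately skewed, hypothetical data set to check the bound against.
skewed = [1, 1, 1, 2, 2, 3, 4, 50]
mu = statistics.fmean(skewed)
sigma = statistics.pstdev(skewed)
share_within_2 = sum(abs(x - mu) <= 2 * sigma for x in skewed) / len(skewed)
print(f"observed share within 2 standard deviations: {share_within_2:.1%}")
```

Even for this lopsided data set the observed share is above the guaranteed minimum, illustrating that the theorem gives a floor and not an estimate.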


**The Empirical Rule**, which only applies to a normal distribution, provides us with much more information about this particular distribution than Chebyshev's Theorem. The Empirical Rule states that for any normal distribution, approximately 68% of the values will fall within one standard deviation of the mean, approximately 95% of the values will fall within two standard deviations of the mean, and approximately 99.7% of the values will fall within three standard deviations of the mean. This much more precise information is only true for data distributed normally. The normal distribution, sometimes referred to as the Gaussian distribution after Karl Gauss, who found that certain errors are normally distributed, is bell-shaped and symmetrical, and models the behavior of many random variables. We will discuss the normal distribution as well as its probability distribution later in the course.
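The Empirical Rule percentages can be reproduced from the standard normal distribution. This sketch uses `statistics.NormalDist` from the Python standard library (available from Python 3.8); the helper name `share_within` is an assumption.

```python
from statistics import NormalDist

standard_normal = NormalDist(mu=0.0, sigma=1.0)


def share_within(k: float) -> float:
    """Share of a normal distribution within k standard deviations."""
    return standard_normal.cdf(k) - standard_normal.cdf(-k)


for k in (1, 2, 3):
    print(f"within {k} standard deviation(s): {share_within(k):.2%}")
```

The exact shares are slightly different from the rounded 68%, 95% and 99.7%, which is why the rule is stated with the word "approximately."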

## Measures of Position or Location

Measures of central tendency and measures of data variation are single values that describe an entire data set. Measures of position or location are measures of an individual value and indicate the relative position of that value to the other values in the data set. A commonly used measure of position is a **percentile**. Aptitude tests often provide an individual's percentile ranking to let them know how they did relative to others who took the test. To determine what test score exceeds a certain percentage of test scores, we first divide our ordered data set into 100 equal parts and then count in to determine the location of the value that corresponds to the percentile we are interested in.


The kth percentile, $P_k$, is that value which is equal to or greater than k% of the observations and is less than or equal to the remaining (100 − k)% of the observations. The procedure for calculating the kth percentile is:

1. Order the data from smallest to largest value.
2. Find $\frac{nk}{100}$, where n is the sample size and k is the percentile you are calculating.
3. (a) If $\frac{nk}{100}$ is not an integer, then i, the position of the kth percentile, will be the next integer greater than $\frac{nk}{100}$. (b) If $\frac{nk}{100}$ is an integer, then $i = \frac{nk}{100} + 0.5$. For example, if $\frac{nk}{100} = 6$ then i = 6.5.
4. (a) If i is an integer (3a above), then the kth percentile is the value found at the ith position. For example, if 3a gives i = 5, the kth percentile is the 5th value in the ordered data set. (b) If i is not an integer (3b above), then the kth percentile is the mean of the two values surrounding the ith position. For example, in 3b above, i = 6.5, so the kth percentile is the mean of the sixth and seventh values in the ordered data set.

Sometimes, instead of being interested in what data point has a certain percentage above it or below it, researchers are interested in determining the value that is "typical" for the "center" group of values. For example, suppose we are charged with the responsibility of developing the curriculum for a kindergarten class. The students in a class of kindergarteners could differ tremendously in terms of acquired knowledge. Suppose, in an effort to develop the curriculum, we give each student in the class an aptitude test to measure his or her abilities in basic knowledge. The scores may vary greatly, since some of the students may have attended preschool since they were very young while others may not have attended at all. If we do not have the resources for a multi-level curriculum, then we would develop a curriculum targeted at those "in the middle" in terms of their aptitude scores. Since we are interested in targeting the center of the distribution of aptitude scores, we will determine what constitutes the "middle 50%" and gear our curriculum toward those students.

**Quartiles**, which are just specific percentiles, allow us to divide our data into four equal groups. The first or lower quartile, $Q_1$, is equal to the 25th percentile, $P_{25}$. The second or mid-quartile, $Q_2$, is equal to the 50th percentile, $P_{50}$, which is also the median, M. The third or upper quartile, $Q_3$, is equal to the 75th percentile, $P_{75}$. We use these quartiles to help us determine characteristics of the middle 50% of our data.

For example, the **Interquartile Range (IQR)** is the range of the middle 50% of the data. Like the range, the IQR is a measure of data variation or dispersion, but instead of indicating the range of all the data as the range does, the IQR indicates the range of only the middle 50%. Like other measures of data variation, the IQR requires quantitative data to calculate. The formula for the IQR is $IQR = Q_3 - Q_1$. To calculate the IQR, the first and third quartiles are determined by finding the corresponding percentiles, i.e., $Q_3 = P_{75}$ and $Q_1 = P_{25}$.
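The percentile procedure and the quartiles built on it can be sketched as follows. This is a minimal illustration under stated assumptions: the function name `percentile` and the aptitude scores are hypothetical, and a non-integer nk/100 is rounded up to the next whole position. Statistical packages implement several other percentile conventions, so their results may differ slightly from this one.

```python
import math


def percentile(values, k):
    """k-th percentile (0 < k < 100) by the positional procedure."""
    if not 0 < k < 100:
        raise ValueError("k must be strictly between 0 and 100")
    ordered = sorted(values)                    # step 1: order the data
    n = len(ordered)
    position = n * k / 100                      # step 2: compute nk/100
    if position.is_integer():                   # step 3(b): i = nk/100 + 0.5
        lower = ordered[int(position) - 1]      # value just below i
        upper = ordered[int(position)]          # value just above i
        return (lower + upper) / 2              # step 4(b)
    i = math.ceil(position)                     # step 3(a): round up
    return ordered[i - 1]                       # step 4(a), 1-based position


# Hypothetical aptitude scores for a class of twelve students.
scores = [55, 60, 62, 65, 70, 71, 75, 78, 80, 84, 90, 95]

q1 = percentile(scores, 25)
q2 = percentile(scores, 50)   # the median
q3 = percentile(scores, 75)
iqr = q3 - q1                 # range of the middle 50%

print(f"Q1 = {q1}, Q2 = {q2}, Q3 = {q3}, IQR = {iqr}")
print(f"P90 = {percentile(scores, 90)}")
```

For k = 50 the procedure lands on the same position as the median formula i = (n + 1)/2, so $Q_2$ and the median agree.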


The **Mid-Quartile Range (MQR)** is a statistic we calculate to determine a "typical" value in the middle group of observations. The MQR is a measure of central tendency and is the mean of the extreme values of the middle 50% of the observations. It is not the mean of all observations in the middle 50%; instead, we find the mean of the first and third quartiles. The formula for the MQR is $MQR = \frac{Q_1 + Q_3}{2}$.
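A brief sketch of the distinction drawn above, between the MQR and the mean of the observations in the middle 50%, is shown here. The scores and the quartile values are hypothetical inputs assumed for illustration, and the function name is an assumption.

```python
import statistics


def mid_quartile_range(q1: float, q3: float) -> float:
    """Mean of the first and third quartiles."""
    return (q1 + q3) / 2


# Hypothetical ordered scores with assumed quartiles Q1 = 63.5, Q3 = 82.0.
scores = [55, 60, 62, 65, 70, 71, 75, 78, 80, 84, 90, 95]
q1, q3 = 63.5, 82.0

mqr = mid_quartile_range(q1, q3)

# For contrast: the mean of the observations lying between Q1 and Q3.
middle_half = [x for x in scores if q1 <= x <= q3]
middle_mean = statistics.fmean(middle_half)

print(f"MQR:                        {mqr}")
print(f"mean of middle observations: {middle_mean:.2f}")
```

The two numbers are close but not equal, since the MQR uses only the two quartiles while the second figure averages every observation between them.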

Another measure of position or location is called the **Z-score** or **Z value**. The Z-score for a particular value in a data set indicates **the number of standard deviations that value is from the mean.** Z-scores can be negative (if the value is less than the mean), positive (if the value is larger than the mean), or equal to zero (if the value is equal to the mean). The Z-score for the mean is always zero. For example, a value with a Z-score of 1.35 is 1.35 standard deviations above the mean. A value with a Z-score of −2.12 is 2.12 standard deviations below the mean. Z values can be calculated, and a Standard Normal Table used, to determine approximately what proportion of the values, for a normal distribution, are above or below a particular value, or between two values in a distribution.
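The Z-score is computed as the value minus the mean, divided by the standard deviation. The sketch below is an illustration under stated assumptions: the mean of 100 and standard deviation of 15 are hypothetical, the raw values were chosen to reproduce the Z-scores 1.35 and −2.12 from the text, and `statistics.NormalDist` from the Python standard library stands in for the Standard Normal Table.

```python
from statistics import NormalDist


def z_score(value: float, mean: float, std_dev: float) -> float:
    """Number of standard deviations that value lies from the mean."""
    return (value - mean) / std_dev


mean, std_dev = 100.0, 15.0  # hypothetical test-score distribution

z_high = z_score(120.25, mean, std_dev)  # above the mean, so positive
z_low = z_score(68.2, mean, std_dev)     # below the mean, so negative
z_mean = z_score(mean, mean, std_dev)    # the mean itself

# Proportion of a normal distribution lying below each Z value.
table = NormalDist()  # standard normal: mean 0, standard deviation 1
below_high = table.cdf(z_high)
below_low = table.cdf(z_low)

print(f"z = {z_high:.2f}: {below_high:.2%} of values fall below")
print(f"z = {z_low:.2f}: {below_low:.2%} of values fall below")
print(f"between the two: {below_high - below_low:.2%}")
```

Subtracting two cumulative proportions, as in the last line, gives the share of values between two points in a normal distribution.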

## Frequency Distributions

Terminology:

Defn: **The frequency, f, for a value or a class of values is the number of times that value or class of values occurs in the data set.** We are simply counting how often a value or set of values occurs in the data set.

1. What is the minimum number of times a value or class of values can occur in a data set? The minimum is zero (0). What is the maximum number of times a value or class of values can occur in the data set? The maximum is n, the total number of values in the data set. $0 \le f \le n$
2. If we add the frequencies for each value or set of values, they will sum to n. $\sum f = n$

Defn: **The relative frequency, f/n, for a value or a class of values is the proportion of time that a value or class of values occurs in the data set** (how often the value occurs divided by the total number of observations, which gives the proportion of times a value or class of values occurs).

1. What is the minimum proportion of time a value or class of values can occur in a data set? The minimum is zero (0). What is the maximum proportion of time a value or class of values can occur in the data set? The maximum is one (1). $0 \le f/n \le 1$
2. If we add the relative frequencies for each value or set of values, they will sum to one (1). $\sum f/n = 1$

Defn: **The cumulative frequency, F, for a value or a class of values is the number of times that value or any smaller value occurs in the data set.** We are simply keeping a running total.

1. Cumulative frequencies are non-decreasing (the values can level off but they cannot go down).
2. The cumulative frequency for the last value or class of values is n.
3. We must have at least ordinal scaled data to find cumulative frequencies.

Defn: **The cumulative relative frequency, F/n, for a value or a class of values is the proportion of time that value or any smaller value occurs in the data set.** We are simply keeping a running total of relative frequencies or proportions.

1. Cumulative relative frequencies are non-decreasing.
2. The cumulative relative frequency for the last value or class of values is one (1).
3. We must have at least ordinal scaled data to find cumulative relative frequencies.
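The four quantities defined above can be tabulated together. The sketch below uses hypothetical clothing-size data (ordinal, so cumulative totals are meaningful); the variable names are assumptions.

```python
from collections import Counter
from itertools import accumulate

# Hypothetical ordinal data: clothing sizes of ten customers.
sizes = ["M", "S", "L", "M", "XL", "M", "S", "L", "M", "L"]
order = ["S", "M", "L", "XL"]  # natural order of the ordinal scale

counts = Counter(sizes)
n = len(sizes)

freq = [counts[c] for c in order]           # f: how often each size occurs
rel_freq = [f / n for f in freq]            # f/n: proportion for each size
cum_freq = list(accumulate(freq))           # F: running total of f
cum_rel_freq = [F / n for F in cum_freq]    # F/n: running proportion

print(f"{'size':<6}{'f':>4}{'f/n':>8}{'F':>4}{'F/n':>8}")
for row in zip(order, freq, rel_freq, cum_freq, cum_rel_freq):
    print(f"{row[0]:<6}{row[1]:>4}{row[2]:>8.2f}{row[3]:>4}{row[4]:>8.2f}")
```

The frequencies sum to n, the relative frequencies sum to one, and the last cumulative entries are n and one, matching the numbered properties listed above.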

