
Describing Data with Numerical Measures

A parameter is a measure obtained from the population of interest.


A statistic is a measure obtained from a sample.

Measures of Central Tendency

Measures of central tendency describe the center of a given data set.

Mean

The arithmetic mean, or simply the mean, is the average of a given set of data. It is obtained by dividing the sum of all the observations by the number of observations.

Population Mean: Given the population data X1, X2, ..., XN, the population mean is given by

\[ \mu = \frac{1}{N} \sum_{i=1}^{N} X_i \]

Sample Mean: Given the sample data X1, X2, ..., Xn, the sample mean is given by

\[ \bar{x} = \frac{1}{n} \sum_{i=1}^{n} X_i \]

The number of U.S. cars in service by top car rental companies in a recent year according to Auto Rental News follows.
Company            Number of Cars in Service
Enterprise         643,000
Hertz              327,000
National/Alamo     233,000
Avis               204,000
Dollar/Thrifty     167,000
Budget             144,000
Advantage           20,000
U-Save              12,000
Payless             10,000
ACE                  9,000
Fox                  9,000
Rent-A-Wreck         7,000
Triangle             6,000
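As a quick illustration of the sample mean formula, the sketch below averages the car counts listed above. The names `cars_in_service` and `sample_mean` are chosen here for illustration and are not part of the original slides.

```python
# Number of cars in service, copied from the rental-company data above.
cars_in_service = [
    643_000, 327_000, 233_000, 204_000, 167_000, 144_000, 20_000,
    12_000, 10_000, 9_000, 9_000, 7_000, 6_000,
]


def sample_mean(values):
    """Return the sum of the observations divided by their count."""
    return sum(values) / len(values)


mean_cars = sample_mean(cars_in_service)
print(f"Total cars: {sum(cars_in_service):,}")  # Total cars: 1,791,000
print(f"Mean cars per company: {mean_cars:,.2f}")  # 137,769.23
```

The mean of about 137,769 cars is larger than the counts of seven of the thirteen companies, because the few very large fleets pull the average upward.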

Median
The median is the middle value of the data when the data are arranged in increasing order. It divides the data set into two equal parts, with half of the data values greater than or equal to it, and the other half less than or equal to it. The position of the sample median in the ordered data is given by 0.5(n+1).
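The position rule 0.5(n+1) can be sketched in a few lines. The slides do not say what to do when the position is not a whole number (even n); this sketch assumes the common convention of averaging the two neighbouring values. The function name `sample_median` is illustrative.

```python
# Car counts from the rental-company example (assumed sample data).
cars_in_service = [
    643_000, 327_000, 233_000, 204_000, 167_000, 144_000, 20_000,
    12_000, 10_000, 9_000, 9_000, 7_000, 6_000,
]


def sample_median(values):
    """Median located at the 1-based position 0.5 * (n + 1)."""
    ordered = sorted(values)
    n = len(ordered)
    position = 0.5 * (n + 1)
    lower = int(position)  # whole part of the position
    if position == lower:
        # Odd n: the position points at a single observation.
        return ordered[lower - 1]
    # Even n (assumed convention): average the two middle observations.
    return (ordered[lower - 1] + ordered[lower]) / 2


print(sample_median(cars_in_service))  # 20000 (7th of 13 ordered values)
print(sample_median([4, 1, 3, 2]))     # 2.5
```

For the thirteen rental companies the position is 0.5(13+1) = 7, and the seventh ordered value is 20,000.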

Mode

The most frequently occurring observation in the data set is called the mode. It is most useful for locating the center of qualitative data.
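A minimal sketch of finding the mode by counting occurrences follows. It returns every value tied for the highest count, which is a design choice made here rather than something the slides specify; the colour list is hypothetical data.

```python
from collections import Counter


def modes(values):
    """Return all values that occur most often, in first-seen order."""
    counts = Counter(values)
    highest = max(counts.values())
    return [value for value, count in counts.items() if count == highest]


# Car counts from the rental-company example: 9,000 appears twice.
cars_in_service = [
    643_000, 327_000, 233_000, 204_000, 167_000, 144_000, 20_000,
    12_000, 10_000, 9_000, 9_000, 7_000, 6_000,
]
print(modes(cars_in_service))  # [9000]

# Hypothetical qualitative data, where a mean or median is not defined.
colours = ["red", "blue", "red", "green"]
print(modes(colours))  # ['red']
```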

Measures of Dispersion or Measures of Variability


Quantitative measures that describe the extent to which the data are dispersed are known as measures of dispersion or measures of variability. These numerical values determine how widely spread the observations are.

Range
The range is the difference between the highest and lowest observations in the data set.
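For the rental-company data, the range can be computed as in this short sketch (the function name `data_range` is illustrative):

```python
# Car counts from the rental-company example.
cars_in_service = [
    643_000, 327_000, 233_000, 204_000, 167_000, 144_000, 20_000,
    12_000, 10_000, 9_000, 9_000, 7_000, 6_000,
]


def data_range(values):
    """Highest observation minus lowest observation."""
    return max(values) - min(values)


print(data_range(cars_in_service))  # 637000
```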

Variance
The variance is the average of the squared deviations of the observations from their mean. The higher the value, the more dispersed the data set is.

Population Variance: Given the population data X1, X2, ..., XN, the population variance is given by

\[ \sigma^2 = \frac{\sum_{i=1}^{N} (X_i - \mu)^2}{N} = \frac{\sum_{i=1}^{N} X_i^2 - N\mu^2}{N} \]

Sample Variance: Given the sample data X1, X2, ..., Xn, the sample variance is given by

\[ s^2 = \frac{\sum_{i=1}^{n} (X_i - \bar{x})^2}{n - 1} = \frac{\sum_{i=1}^{n} X_i^2 - n\bar{x}^2}{n - 1} \]
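The sketch below computes the population variance (divisor N) and the sample variance (divisor n - 1), and checks that the definitional and computational forms of the sample variance agree. The data set and function names are hypothetical, chosen so the arithmetic is easy to follow.

```python
def population_variance(values):
    """Average squared deviation from the mean, dividing by N."""
    n = len(values)
    mu = sum(values) / n
    return sum((x - mu) ** 2 for x in values) / n


def sample_variance(values):
    """Definitional form: squared deviations divided by n - 1."""
    n = len(values)
    x_bar = sum(values) / n
    return sum((x - x_bar) ** 2 for x in values) / (n - 1)


def sample_variance_shortcut(values):
    """Computational form: (sum of squares - n * mean^2) / (n - 1)."""
    n = len(values)
    x_bar = sum(values) / n
    return (sum(x * x for x in values) - n * x_bar ** 2) / (n - 1)


data = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical data with mean 5

print(population_variance(data))       # 4.0
print(round(sample_variance(data), 4)) # 4.5714
```

The computational form avoids a second pass over the deviations, but with very large values it can lose precision in floating-point arithmetic, so the definitional form is usually the safer choice in code.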

Standard Deviation

The standard deviation is the positive square root of the variance. It can be read as a typical distance of the observations from their mean, expressed in the same units as the data.

Population Standard Deviation: Given the population data X1, X2, ..., XN, the population standard deviation is given by

\[ \sigma = \sqrt{\sigma^2} \]

Sample Standard Deviation: Given the sample data X1, X2, ..., Xn, the sample standard deviation is given by

\[ s = \sqrt{s^2} \]
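A brief sketch of both standard deviations follows, using hypothetical data and illustrative function names. Python's standard `statistics` module provides `pstdev` and `stdev` for the same two quantities, and they are used here only as a cross-check.

```python
import math
import statistics


def population_std(values):
    """Square root of the population variance (divisor N)."""
    n = len(values)
    mu = sum(values) / n
    return math.sqrt(sum((x - mu) ** 2 for x in values) / n)


def sample_std(values):
    """Square root of the sample variance (divisor n - 1)."""
    n = len(values)
    x_bar = sum(values) / n
    return math.sqrt(sum((x - x_bar) ** 2 for x in values) / (n - 1))


data = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical data with mean 5

print(population_std(data))        # 2.0
print(round(sample_std(data), 3))  # 2.138

# Cross-check against the standard library.
assert math.isclose(population_std(data), statistics.pstdev(data))
assert math.isclose(sample_std(data), statistics.stdev(data))
```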

Coefficient of Variation
The coefficient of variation is the ratio of the standard deviation to the mean, expressed as a percentage. It is a relative measure of dispersion, useful for comparing the dispersion of two or more data sets measured in different units. It is given by

\[ CV = \frac{\text{standard deviation}}{\text{mean}} \times 100\% \]
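To show why the coefficient of variation helps when units differ, the sketch below compares two hypothetical data sets, heights in centimetres and weights in kilograms, that happen to have the same standard deviation. It assumes the sample standard deviation is the one intended.

```python
import statistics


def coefficient_of_variation(values):
    """Sample standard deviation divided by the mean, in percent."""
    return statistics.stdev(values) / statistics.mean(values) * 100


heights_cm = [150, 160, 170, 180, 190]  # hypothetical heights
weights_kg = [50, 60, 70, 80, 90]       # hypothetical weights

print(f"{coefficient_of_variation(heights_cm):.2f}%")  # 9.30%
print(f"{coefficient_of_variation(weights_kg):.2f}%")  # 22.59%
```

Both data sets have a standard deviation of about 15.81 in their own units, yet the weights vary more relative to their mean, which is the comparison the coefficient of variation is meant to expose.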

Measures of Position

A measure of position describes the standing or location of an observation relative to the rest of the data.

Quartiles divide the data set into 4 equal parts, deciles into 10 equal parts, and percentiles into 100 equal parts.
Quartiles and deciles are special cases of percentiles. The 1st quartile is the 25th percentile. The 4th decile is the 40th percentile. The median is also the 2nd quartile, the 5th decile and the 50th percentile.

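The position of each measure in the ordered data is given by the following formulas.

Percentile

\[ P_i = \frac{i(n + 1)}{100} \]

Decile

\[ D_i = \frac{i(n + 1)}{10} \]

Quartile

\[ Q_i = \frac{i(n + 1)}{4} \]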

At least i% of the data lie below the ith percentile, and at most (100 - i)% of the data lie above the ith percentile.

Example: The 90th percentile indicates that at least 90% of the data lie below it, and at most 10% of the data lie above it.
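The position formulas can be turned into values as sketched below. The slides give only the position i(n+1)/100; how to handle a fractional position is not stated, so this sketch assumes linear interpolation between the two neighbouring ordered values and clamps positions that fall outside 1..n. All function names are illustrative.

```python
def percentile(values, i):
    """Value at the 1-based position i * (n + 1) / 100 in the ordered data."""
    ordered = sorted(values)
    n = len(ordered)
    position = i * (n + 1) / 100
    position = min(max(position, 1), n)  # assumed: clamp to valid positions
    lower = int(position)
    fraction = position - lower
    if fraction == 0:
        return ordered[lower - 1]
    # Assumed convention: interpolate between the neighbouring values.
    return ordered[lower - 1] + fraction * (ordered[lower] - ordered[lower - 1])


def decile(values, i):
    """The ith decile is the (10 * i)th percentile."""
    return percentile(values, 10 * i)


def quartile(values, i):
    """The ith quartile is the (25 * i)th percentile."""
    return percentile(values, 25 * i)


# Car counts from the rental-company example.
cars_in_service = [
    643_000, 327_000, 233_000, 204_000, 167_000, 144_000, 20_000,
    12_000, 10_000, 9_000, 9_000, 7_000, 6_000,
]

print(quartile(cars_in_service, 1))  # 9000.0   (position 3.5)
print(quartile(cars_in_service, 2))  # 20000    (position 7, the median)
print(quartile(cars_in_service, 3))  # 218500.0 (position 10.5)
```

Other textbooks and software packages use different position rules, so results for small data sets may differ slightly from those of a given library.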

Measure of Skewness
A measure of skewness describes the extent of departure of the distribution of the data from symmetry.

A symmetric data set is one in which, if a line is drawn through its center, the picture on one side is a mirror image of the picture on the other side.

Positively skewed

A distribution that is skewed to the right, or positively skewed, has its higher values more spread out than its lower values; that is, it has a tail on the right.

Negatively skewed

A distribution that is skewed to the left, or negatively skewed, has its lower values more spread out than its higher values; that is, it has a tail on the left.

Coefficient of Skewness

\[ SK = \frac{3(\text{mean} - \text{median})}{\text{standard deviation}} \]


If SK is 0, the data has a symmetric distribution. If SK is positive, the data has a positively skewed distribution, i.e. the distribution has a tail on the right.

If SK is negative, the data has a negatively skewed distribution, i.e. the distribution has a tail on the left.
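As a rough check of the sign rule, the sketch below computes SK for the rental-company data and for a small symmetric data set. It assumes the sample standard deviation is used in the denominator; the function name is illustrative.

```python
import statistics


def coefficient_of_skewness(values):
    """SK = 3 * (mean - median) / standard deviation (sample version assumed)."""
    mean = statistics.mean(values)
    median = statistics.median(values)
    return 3 * (mean - median) / statistics.stdev(values)


# Car counts from the rental-company example.
cars_in_service = [
    643_000, 327_000, 233_000, 204_000, 167_000, 144_000, 20_000,
    12_000, 10_000, 9_000, 9_000, 7_000, 6_000,
]
symmetric = [1, 2, 3, 4, 5]  # hypothetical symmetric data

sk_cars = coefficient_of_skewness(cars_in_service)
sk_symmetric = coefficient_of_skewness(symmetric)

print(sk_cars > 0)    # True: mean (about 137,769) exceeds median (20,000)
print(sk_symmetric)   # 0.0
```

A positive SK for the car data matches the table: a few very large fleets form a long tail on the right.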
