
Z Score

The Z score is a useful and intuitive concept that appears repeatedly in statistics. It is
built from two of the most common parameters, the mean and the standard deviation.
Accurate and consistent measurement has been a difficult problem throughout history.
The yardstick differed from one era to another and across locations and cultures, and
various countries and rulers tried to unify the units of measurement. The unit that has
come closest to universal acceptance is the meter, and the metric system is the most
widely used system of measurement. Even so, the metric system is not used everywhere,
in spite of its ease and applicability, and it has its limitations too. One problem is the
difference in scale.
County fairs hold farming contests and give prizes for the “best” in different
categories. For example, the grower of the biggest pumpkin or the owner of the largest
pig receives a prize. If the fair decides to give a prize to the grower of the largest
produce, how should the winner be determined? Even the largest apple on record is no
match for any pumpkin. Similarly, who has the fastest animal: the owner of the fastest
horse or the owner of the winner of the pig race? Would it be fair to compare the amount
of milk from a cow with that from a goat? How can we compare the output of the most
productive small manufacturer to that of a large, fully automated one?
In statistics everything is measured in relative terms. If an apple is compared to
the average apple and a pumpkin is compared to the average pumpkin, we can determine
whether each one is larger or smaller than its own average. We cannot say in a sensible
way how large or how small each one is unless we put the difference in perspective.
Let the apple and the pumpkin both be 2.4 ounces larger than their own averages. These
numbers do not indicate whether either one is large in absolute or in relative terms, and
they do not allow a comparison of the two. Are the apple and the pumpkin equally large
compared to their own means? If we put things in perspective using a universal yardstick,
comparison becomes easier. The yardstick used in statistics is the standard deviation. If
the apple is 3.15 standard deviations larger than an average apple, it is a very large apple
indeed. If the pumpkin is 0.36 standard deviations larger than an average pumpkin, it is
not a very large pumpkin, just a little over the average, and not unexpected. With these
data we can say that the apple is relatively larger than the pumpkin. This concept is
called the Z score.
The Z score is defined as follows:

$$Z = \frac{\text{Observed} - \text{Expected}}{\text{Standard deviation of Observed}} \tag{1}$$

$$Z = \frac{X - \mu}{\sigma} \tag{2}$$
The expected value of an observation is its mean, µ; its standard deviation is σ. To
obtain the Z score of a value, subtract its mean from it and divide the result by the
standard deviation of the observation.
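
The definition can be tried out with a few lines of Python. This is only a minimal sketch: the function name `z_score` and all the weights and standard deviations below are hypothetical numbers chosen for illustration (the text gives only the 2.4 ounce difference), so the resulting Z scores differ from the 3.15 and 0.36 quoted earlier.

```python
def z_score(observed, expected, std_dev):
    """Return (observed - expected) / std_dev, the Z score of one value."""
    if std_dev <= 0:
        raise ValueError("standard deviation must be positive")
    return (observed - expected) / std_dev


# Hypothetical figures (assumptions): both fruits are 2.4 ounces above their mean.
apple_z = z_score(observed=8.4, expected=6.0, std_dev=0.8)
pumpkin_z = z_score(observed=162.4, expected=160.0, std_dev=6.4)

# The same 2.4 ounce difference gives very different Z scores,
# because each fruit is measured against its own standard deviation.
print("apple Z:", round(apple_z, 3))
print("pumpkin Z:", round(pumpkin_z, 3))
```

With these made-up inputs the apple is about 3 standard deviations above its mean and the pumpkin well under 1, which is the kind of comparison the Z score is meant to allow.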
The distance of an observation from its expected value is also called the error. The Z
score is the scaled error, and its unit of measurement is the standard deviation.
Obtaining the Z score of an observation is also known as standardizing the value. When
the value comes from a normal distribution, the process is known as standardization or
normalization, and the result is known as the normal (0, 1), pronounced “normal zero
one.” The standard normal table lists the standardized values of a normal distribution
together with their corresponding probabilities.
The standardization process can be applied to any value. If the item under
consideration is a single observation from the population, the result is called the Z score.

$$Z = \frac{X - \mu}{\sigma} \tag{3}$$
Suppose two students are given the task of measuring the error of an observation.
The first one found that the error is 41.6666... feet; in fact, the decimal places never
ended. Disliking decimal places, especially never-ending ones, he measured the
deviation of the data point from the mean in inches and was relieved to find that it had
no decimal places. He reported the error of the observation as 500 inches. The second
student, using an electronic measuring “tape,” subtracted the mean from the observation
and reported 0.007891414 miles as the error. While the first error seems large and the
second seems small, both are the same: 500 inches is 41.6666667 feet or 0.007891414
miles. The Z score provides a unique and comparable measure of error. Every error is
reported in units of its own standard deviation. Since the Z score is expressed in terms
of the standard deviation, it allows comparison of unrelated data measured in different
units.
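
The point about units can be checked numerically. The sketch below takes the 500 inch error from the example and assumes a standard deviation of 250 inches, which is a made-up figure not given in the text. Converting both the error and the standard deviation to feet or miles leaves the Z score unchanged.

```python
INCHES_PER_FOOT = 12
INCHES_PER_MILE = 63_360

error_in_inches = 500.0   # the error from the two-student example
sigma_in_inches = 250.0   # hypothetical standard deviation (an assumption)

divisors = {"inches": 1, "feet": INCHES_PER_FOOT, "miles": INCHES_PER_MILE}

errors = {}
z_by_unit = {}
for unit, divisor in divisors.items():
    # Express both the error and the standard deviation in the same unit.
    errors[unit] = error_in_inches / divisor
    sigma = sigma_in_inches / divisor
    z_by_unit[unit] = errors[unit] / sigma
    print(f"{unit:>6}: error = {errors[unit]:.9f}, Z = {z_by_unit[unit]:.4f}")
```

The raw errors differ by several orders of magnitude, while the Z score is the same in every unit because the conversion factor cancels in the ratio.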

What Do Others Say?


http://psych.colorado.edu/~mcclella/java/zcalc.html

Standardization of Sample Mean



If the value under consideration is the sample mean, $\bar{X}$, the resulting Z score
would be:

$$Z = \frac{\bar{X} - \mu}{\sigma_{\bar{X}}} \tag{4}$$

where $\bar{X}$ is the sample mean, µ is its expected value, which is also the population
mean, and the standard deviation of the sample mean is $\sigma_{\bar{X}} = \sigma / \sqrt{n}$.
The standard deviation of the sample mean is also known as the standard error. Even
though the letter Z is used for both a Z score and the normal (0, 1), its use is justified
here: according to the central limit theorem, the distribution of equation 4 approaches
the normal (0, 1) as the sample size approaches infinity.
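
A small simulation can illustrate the standardized sample mean. This is a sketch under stated assumptions: the population is taken to be exponential with mean 1 and standard deviation 1 (a deliberately non-normal choice), and the sample size, number of repetitions, and the helper name `z_of_sample_mean` are all arbitrary illustrative choices.

```python
import math
import random
from statistics import mean, pstdev


def z_of_sample_mean(sample, mu, sigma):
    """Standardize a sample mean using the standard error sigma / sqrt(n)."""
    n = len(sample)
    standard_error = sigma / math.sqrt(n)
    return (mean(sample) - mu) / standard_error


random.seed(0)                                # fixed seed so the run is repeatable
MU, SIGMA, N, REPS = 1.0, 1.0, 100, 2000      # assumed population and design

z_values = [
    z_of_sample_mean([random.expovariate(1.0) for _ in range(N)], MU, SIGMA)
    for _ in range(REPS)
]

# If the central limit theorem is doing its work, these should be
# close to 0 and 1 respectively.
print("mean of Z values:", round(mean(z_values), 3))
print("std dev of Z values:", round(pstdev(z_values), 3))
```

The simulation does not prove anything, but it shows the standardized means clustering around zero with a spread close to one even though the individual observations are far from normal.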

Z Score and Chebyshev’s Theorem


Chebyshev’s theorem is usually stated in words as:
The proportion of observations falling within K standard deviations of the mean is at
least $1 - 1/K^2$.
This is the same Z score concept. The statement requires the difference of a value from
its mean, i.e. (X − µ). Since the theorem applies to all values within K standard
deviations, i.e. Kσ, on either side of the mean, the absolute value is needed. The theorem
sets a lower limit on the probability that |X − µ| < Kσ. Therefore, Chebyshev’s theorem
states that:

$$P(|X - \mu| < K\sigma) \ge 1 - \frac{1}{K^2} \tag{5}$$

Since σ is positive, dividing both sides of the inequality |X − µ| < Kσ by σ does not
change its direction. Therefore

$$P\left(\frac{|X - \mu|}{\sigma} < K\right) \ge 1 - \frac{1}{K^2} \tag{6}$$

$$P(|Z| < K) \ge 1 - \frac{1}{K^2}$$

As is evident, the Z score is at the core of Chebyshev’s theorem. In fact, Chebyshev was
one of the major contributors to the Central Limit Theorem.
Note that the first part of the expression, i.e. P(|Z| < K), has the same form as the
probability statement for a confidence interval.
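
The bound can be compared with the share of values actually observed within K standard deviations. The sketch below uses simulated exponential data, an arbitrary assumption made only to have a skewed, non-normal data set; the function names are likewise illustrative.

```python
import random
from statistics import mean, pstdev


def share_within(values, k):
    """Proportion of values with |X - mean| < k * standard deviation."""
    mu = mean(values)
    sigma = pstdev(values)   # population formula, treating the data as the population
    return sum(1 for v in values if abs(v - mu) < k * sigma) / len(values)


def chebyshev_bound(k):
    """Lower bound 1 - 1/K^2 from Chebyshev's theorem."""
    return 1 - 1 / k ** 2


random.seed(1)
data = [random.expovariate(1.0) for _ in range(10_000)]   # assumed skewed data

for k in (1.5, 2, 3):
    print(f"K = {k}: observed share = {share_within(data, k):.4f}, "
          f"bound = {chebyshev_bound(k):.4f}")
```

The observed shares are typically well above the bound, which reflects that Chebyshev’s theorem holds for any distribution and is therefore conservative for most of them.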

Z Score and Test Statistics


The test statistic used to test a hypothesis is in fact the standardized value of the
observation. The only difference is that the expected value is the value specified by the
null hypothesis. The null hypothesis is the expected outcome until proven otherwise.
The general formula in equation 1 applies to all cases. Depending on the hypothesis, the
observed value, the expected value, and the standard deviation of the observed value are
modified.

Test Statistic for a Mean


The test statistic for testing the hypothesis that the mean of a population is equal
to a constant value $\mu_0$ is:

$$Z = \frac{\bar{X} - \mu_0}{\sigma_{\bar{X}}} \tag{7}$$

Recall that the standard deviation of the sample mean is also known as the standard
error. If the standard error is not known, replace it with its estimate, the sample
standard error $S_{\bar{X}}$. If the sample size is large, the test statistic is:

$$Z = \frac{\bar{X} - \mu_0}{S_{\bar{X}}} \tag{8}$$

If the standard deviation is not known and the sample size is small, the test
statistic is:

$$t = \frac{\bar{X} - \mu_0}{S_{\bar{X}}} \tag{9}$$
For more detail see hypothesis, test of hypothesis, test statistics, inference using
test of hypothesis.
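
The three cases above share one computation and differ only in which standard deviation goes into the standard error and which table the result is compared with. A minimal sketch follows; the function name `mean_test_statistic`, the sample, and the hypothesized mean are all hypothetical.

```python
import math
from statistics import mean, stdev


def mean_test_statistic(sample, mu_0, sigma=None):
    """Return (sample mean - mu_0) / standard error.

    If sigma (the population standard deviation) is supplied, the standard
    error is sigma / sqrt(n). Otherwise it is estimated as S / sqrt(n),
    where S is the sample standard deviation.
    """
    n = len(sample)
    spread = sigma if sigma is not None else stdev(sample)
    standard_error = spread / math.sqrt(n)
    return (mean(sample) - mu_0) / standard_error


sample = [2, 4, 4, 4, 5, 5, 7, 9]   # made-up data
mu_0 = 4                            # hypothesized mean (an assumption)

z_known_sigma = mean_test_statistic(sample, mu_0, sigma=2.0)
t_estimated = mean_test_statistic(sample, mu_0)

print("statistic with known sigma:", round(z_known_sigma, 4))
print("statistic with estimated S:", round(t_estimated, 4))
```

The code returns only the statistic. Whether it is read against the normal table or the t table depends on the sample size and on whether σ is known, as described above.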

Z Score in Regression

Standardization is also used in regression for testing purposes.

Test Statistic for Regression Slope


Usually the standard deviation is not known and its estimate is used. The formula
is given in terms of t, but if the sample is large enough the Z table can be used.

$$t = \frac{\hat{\beta}_1 - \beta_1}{S_{\hat{\beta}_1}} \tag{10}$$

where $\hat{\beta}_1$ is the estimate of the slope, $\beta_1$ is the hypothesized value of
the slope, and $S_{\hat{\beta}_1}$ is the standard deviation of the estimate of the slope.
This test statistic works for any slope in multiple regression. Most statistics software
provides a column of the standard deviations of the slopes and a column of calculated t
values. The first corresponds to the denominator of equation 10 and the latter to the
entire equation.
For more detail see hypothesis, test of hypothesis, test statistics, inference using
test of hypothesis.
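
For the special case of one explanatory variable, the slope, its standard deviation, and the t value can be computed by hand. The sketch below assumes simple linear regression fitted by least squares and uses the textbook expression S / sqrt(Σ(X − X̄)²) for the standard deviation of the slope, which the text does not spell out. The data and the function name are made up.

```python
import math
from statistics import mean


def slope_t_statistic(x, y, hypothesized_slope=0.0):
    """Return (slope estimate, its standard deviation, t) for simple regression."""
    n = len(x)
    x_bar, y_bar = mean(x), mean(y)
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))

    slope = s_xy / s_xx
    intercept = y_bar - slope * x_bar

    # S is the square root of the MSE, with n - 2 degrees of freedom.
    sse = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    s = math.sqrt(sse / (n - 2))

    se_slope = s / math.sqrt(s_xx)
    t = (slope - hypothesized_slope) / se_slope
    return slope, se_slope, t


x = [1, 2, 3, 4, 5]                    # hypothetical data
y = [2.1, 3.9, 6.2, 7.8, 10.1]

slope, se_slope, t = slope_t_statistic(x, y)
print("slope:", round(slope, 4))
print("standard deviation of slope:", round(se_slope, 4))
print("t:", round(t, 2))
```

The second and third returned values play the roles of the two software columns mentioned above: the standard deviation of the slope and the calculated t.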

Test Statistic for Regression Residual


If the value under consideration is a regression error value, then the Z score would
be:

$$Z = \text{Standardized residual} = \frac{\hat{\varepsilon} - 0}{S_{\hat{\varepsilon}}} = \frac{\hat{\varepsilon}}{S\sqrt{1 - h}} \tag{11}$$

where $\hat{\varepsilon}$ is the estimated error and its expected value is zero, S is the
square root of the MSE, and h is given by:

$$h = \frac{1}{n} + \frac{(X_p - \bar{X})^2}{\sum X^2 - \frac{(\sum X)^2}{n}}$$

Notice that h is identical to the value that is multiplied by σ² to obtain the
variance of the estimated mean of Y, $\hat{Y}$.
It is not customary to designate this formula as a Z. The Z notation is reserved for
the Z score and for normal (0, 1) values. Furthermore, the distributional properties of
equation 11 depend on the sample size. If the errors are random and the sample size
is large, it has a normal sampling distribution. If the sample size is small, it has
a t distribution. Therefore, it is called the standardized residual. For more detail see
the residual analysis.
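
A short sketch of the standardized residual for simple linear regression follows. It assumes the denominator is S·sqrt(1 − h), the usual textbook form, and that the line is fitted by least squares; the data and the function names `leverage` and `standardized_residuals` are hypothetical.

```python
import math
from statistics import mean


def leverage(x_p, x):
    """h = 1/n + (x_p - mean)^2 / (sum of X^2 - (sum of X)^2 / n)."""
    n = len(x)
    s_xx = sum(xi ** 2 for xi in x) - sum(x) ** 2 / n
    return 1 / n + (x_p - mean(x)) ** 2 / s_xx


def standardized_residuals(x, y):
    """Residuals of a least-squares line, each divided by S * sqrt(1 - h)."""
    n = len(x)
    x_bar, y_bar = mean(x), mean(y)
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = s_xy / s_xx
    intercept = y_bar - slope * x_bar

    residuals = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]
    s = math.sqrt(sum(e ** 2 for e in residuals) / (n - 2))   # root MSE

    return [e / (s * math.sqrt(1 - leverage(xi, x)))
            for xi, e in zip(x, residuals)]


x = [1, 2, 3, 4, 5]                    # hypothetical data
y = [2.1, 3.9, 6.2, 7.8, 10.1]

for xi, r in zip(x, standardized_residuals(x, y)):
    print(f"X = {xi}: h = {leverage(xi, x):.2f}, standardized residual = {r:.3f}")
```

Points far from the mean of X have larger h, so their residuals are divided by a smaller quantity. This is why the raw residuals and the standardized residuals can rank observations differently.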

What Do Others Say?

http://olam.ed.asu.edu/~glass/502/chp6/chp6.html
