Chapter 1 Outline
• Describing a Single Data Variable
o Introduction to Distributions
o Measure of the Distribution Center: Mean (Average)
o Measures of the Distribution Spread: Range, Variance, and
Standard Deviation
o Histogram: Visual Illustration of a Data Variable’s Distribution
• Describing the Relationship between Two Data Variables
o Scatter Diagram: Visual Illustration of How Two Data Variables
Are Related
o Correlation of Two Variables
o Measures of Correlation: Covariance
o Independence of Two Variables
o Measures of Correlation: Correlation Coefficient
o Correlation and Causation
• Arithmetic of Means, Variances, and Covariances
2. Consider the monthly growth rates of the Dow Jones Industrial Average and the
Nasdaq Composite Index.
a. In most months, would you expect the Nasdaq’s growth rate to be high
or low when the Dow’s growth rate is high?
b. In most months, would you expect the Nasdaq’s growth rate to be high
or low when the Dow’s growth rate is low?
c. Would you describe the Dow and Nasdaq growth rates as being
correlated or uncorrelated?
Year Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec
1901 2.09 0.56 5.66 5.80 5.12 0.75 3.77 5.75 3.67 4.17 1.30 8.51
1902 2.13 3.32 5.47 2.92 2.42 4.54 4.66 4.65 5.83 5.59 1.27 4.27
1903 3.28 4.27 6.40 2.30 0.48 7.79 4.64 4.92 1.66 2.72 2.04 3.95
1904 4.74 2.45 4.48 5.73 4.55 5.35 2.62 4.09 5.45 1.74 1.35 2.75
1905 3.90 1.70 3.66 2.56 1.28 2.86 2.63 6.47 6.26 2.27 2.06 3.15
1906 2.18 2.73 4.90 3.25 4.95 2.82 3.45 6.42 2.59 5.69 1.98 4.49
1907 2.73 1.92 1.82 1.98 4.02 2.61 3.87 1.44 8.74 5.00 4.50 3.89
1908 2.25 3.53 2.86 1.97 4.35 0.76 3.28 4.27 1.73 1.57 1.06 3.05
1909 3.56 5.16 3.01 5.53 3.36 2.24 2.24 3.80 4.99 1.23 1.06 2.95
1910 6.14 5.08 1.37 3.07 2.67 2.65 1.90 4.03 2.86 0.93 3.69 1.72
1911 2.36 2.18 3.80 1.87 1.37 2.02 4.21 5.92 3.41 8.81 3.84 4.42
1912 2.18 3.16 5.70 3.92 4.34 0.77 2.61 3.22 2.52 2.07 4.03 4.04
1913 3.98 2.94 6.30 3.30 4.94 0.90 1.59 2.26 2.56 5.16 2.11 3.38
1914 3.72 3.36 5.52 6.59 3.56 2.32 3.53 5.11 0.52 2.09 2.62 2.89
1915 6.52 7.02 0.12 3.99 1.20 3.00 9.13 8.28 1.37 2.89 2.20 5.86
1916 2.56 5.27 3.97 3.69 3.21 4.97 6.85 2.49 5.08 1.01 3.29 2.85
1917 3.30 1.98 4.08 1.83 4.13 5.27 3.36 7.06 2.42 6.60 0.63 2.56
1918 4.11 2.99 2.91 2.78 2.47 4.01 1.84 2.22 7.00 1.32 2.87 2.95
1919 2.02 2.80 4.22 2.37 6.20 1.09 4.17 4.80 4.45 1.81 6.20 1.48
1920 2.74 4.45 2.90 4.71 3.65 6.26 2.06 3.62 6.74 1.54 4.62 6.02
1921 2.00 2.38 3.57 6.47 4.56 3.87 6.00 2.35 1.84 1.08 6.20 1.90
1922 1.56 3.02 5.34 2.81 5.47 9.68 4.28 4.25 2.27 2.55 1.56 3.15
1923 6.02 1.81 1.98 3.19 3.26 2.24 1.77 2.55 1.89 5.50 5.05 4.23
1924 3.85 2.56 1.05 4.54 2.21 1.28 1.75 3.11 5.87 0.01 2.57 2.16
1925 3.42 3.64 4.12 3.10 2.55 4.28 6.97 1.93 3.09 4.74 3.23 3.56
1926 3.23 5.01 3.95 3.62 1.19 2.03 3.24 3.97 1.50 5.02 5.38 2.78
1927 2.50 2.62 1.96 1.60 4.83 3.37 3.40 5.01 2.79 4.59 8.65 5.66
1928 2.19 2.90 1.17 4.16 3.25 6.97 6.23 8.40 3.07 0.87 1.79 0.97
1929 4.33 3.92 3.20 6.89 4.17 3.06 0.70 1.54 3.62 2.75 2.73 4.05
1930 2.59 1.39 3.95 1.41 3.34 4.47 4.50 1.82 2.08 2.24 3.42 1.63
1931 3.58 1.80 3.79 2.95 7.44 4.24 3.87 6.57 2.50 3.06 1.55 3.83
1932 3.68 2.70 4.24 2.33 1.67 2.62 3.83 2.67 3.96 3.69 6.05 1.99
1933 2.44 3.48 4.79 5.03 1.69 3.68 2.25 6.63 12.34 3.90 1.19 2.81
1934 3.50 2.82 3.60 4.44 3.42 4.67 1.73 3.02 9.54 2.35 3.50 2.99
1935 4.96 2.50 1.48 2.54 2.17 5.50 3.10 0.82 4.67 0.88 4.41 1.05
1936 6.47 2.64 7.04 4.07 1.76 3.28 1.45 4.85 3.80 4.80 2.02 5.96
1937 5.38 2.22 3.38 4.03 6.09 5.72 2.88 4.91 3.24 4.33 4.86 2.44
1938 6.60 1.77 2.00 3.07 3.81 8.45 7.45 2.04 14.55 2.49 3.02 3.95
1939 2.21 3.62 4.49 4.56 2.15 3.21 2.30 3.89 2.97 4.55 0.98 3.89
1940 2.63 2.72 5.58 6.37 5.67 2.46 4.69 1.56 1.53 1.04 6.31 3.01
1941 2.21 1.59 1.63 0.55 2.87 6.13 4.04 1.79 2.88 2.13 4.29 3.82
1942 3.54 1.66 7.89 0.96 2.98 3.63 4.95 2.93 3.94 3.27 6.07 6.03
1943 2.92 1.63 3.07 3.66 5.62 2.38 6.18 2.49 2.40 3.88 4.64 0.58
1944 1.24 2.34 4.36 3.66 1.35 4.70 3.88 4.33 5.31 1.74 4.21 2.18
1945 3.07 3.33 2.16 5.43 6.45 7.67 7.36 2.79 3.57 2.18 3.54 3.91
1946 2.72 3.52 1.60 2.16 5.41 3.30 5.30 4.00 4.88 1.51 0.70 3.51
1947 3.37 1.96 3.29 4.59 4.63 3.22 2.73 1.69 2.84 2.04 5.63 2.33
1948 2.63 2.45 2.92 2.87 5.83 5.67 2.95 3.56 1.92 1.14 5.22 2.87
1949 4.52 2.47 1.67 2.70 4.76 0.72 3.41 3.64 3.55 2.58 1.79 2.44
1950 4.33 3.99 2.67 3.64 2.77 3.65 2.83 2.93 2.24 1.87 6.60 4.64
1951 3.28 4.61 5.13 3.63 2.96 3.05 4.15 3.56 2.63 4.66 4.64 4.35
1952 4.02 1.97 3.17 3.40 4.00 4.97 4.99 3.98 4.05 1.07 0.89 4.10
1953 6.24 2.97 8.24 5.36 6.81 2.41 1.95 1.87 1.88 5.15 2.36 4.53
1954 2.45 1.94 3.93 4.24 4.80 2.68 3.00 3.91 6.14 1.89 5.07 3.19
1955 0.81 3.73 4.39 4.76 3.00 4.06 1.99 16.10 3.80 7.57 4.46 0.79
1956 1.75 3.52 4.94 4.49 2.02 2.86 2.90 2.71 5.55 1.64 3.10 4.83
1957 1.38 1.10 1.55 2.75 3.89 4.50 1.67 0.94 1.57 2.19 5.54 6.39
1958 4.03 2.21 2.62 4.58 2.98 1.64 5.13 5.19 3.90 3.79 3.79 1.57
1959 3.81 2.32 3.84 3.80 1.04 5.65 5.07 6.70 1.03 7.81 4.33 3.85
1960 2.35 3.90 3.32 4.30 3.44 4.73 6.84 3.74 6.75 2.43 3.13 2.71
1961 2.52 3.16 3.00 4.72 3.20 6.05 2.82 2.86 2.02 2.33 3.79 3.27
1962 3.01 3.59 1.84 2.69 2.03 1.06 2.16 3.33 3.74 4.16 2.11 3.30
1963 2.95 2.62 3.61 2.00 1.97 3.98 1.92 2.54 3.56 0.32 3.92 2.19
1964 5.18 2.32 2.71 2.72 0.83 1.84 3.02 3.01 0.94 1.32 1.68 3.98
1965 1.57 2.33 1.10 2.43 2.69 2.41 3.97 3.43 3.68 2.32 2.36 1.88
1966 1.72 3.43 2.93 1.28 2.26 3.30 5.83 0.67 5.14 4.51 3.48 2.22
1967 1.37 2.89 3.27 4.51 6.30 3.61 5.24 3.76 2.12 1.92 2.90 5.14
1968 1.87 1.02 4.47 2.62 3.02 7.19 0.73 1.12 2.64 3.10 5.78 5.08
1969 1.28 2.31 1.97 3.93 2.73 3.52 6.89 5.20 2.94 1.53 5.34 6.30
1970 0.66 3.55 3.52 3.69 4.16 4.97 2.17 5.23 3.05 2.45 3.27 2.37
1971 1.95 3.29 2.53 1.49 3.77 2.68 2.77 4.91 4.12 3.60 4.42 3.19
1972 1.86 3.47 4.85 4.06 4.72 10.25 2.42 2.25 1.84 2.51 6.92 6.81
1973 4.26 2.58 3.45 6.40 5.45 4.43 3.38 2.17 1.83 2.24 2.30 8.77
1974 3.35 2.42 4.34 2.61 5.21 3.40 3.71 3.97 7.29 1.94 2.76 3.67
1975 4.39 3.04 3.97 2.87 2.10 4.68 10.56 6.13 8.63 4.90 5.08 3.90
1976 5.23 3.30 2.15 3.40 4.49 2.20 2.20 6.21 2.74 4.31 0.71 2.69
1977 2.24 2.21 5.88 4.91 3.57 3.83 4.04 5.94 7.77 5.81 4.37 5.22
1978 8.16 0.88 2.65 1.48 2.53 2.83 1.81 4.85 0.97 2.19 2.31 3.93
1979 11.01 2.49 3.00 5.37 4.78 0.77 6.67 5.14 4.54 5.79 3.84 4.00
1980 0.50 0.99 6.42 3.84 1.47 3.94 2.26 1.43 2.33 2.23 3.63 0.91
1981 0.49 7.58 0.24 4.48 2.99 3.81 3.11 1.36 3.53 6.10 1.57 4.41
1982 3.92 3.65 2.26 4.39 2.54 8.07 4.20 2.00 2.81 2.29 3.55 1.85
1983 4.82 4.42 4.95 8.99 5.54 2.42 3.10 2.39 1.82 5.47 7.05 6.40
1984 1.75 6.42 3.68 4.30 11.95 1.69 4.66 1.34 1.02 3.13 3.97 2.84
1985 1.73 1.97 2.65 1.55 4.53 3.59 2.16 4.29 2.88 3.50 6.27 1.78
1986 5.86 2.83 3.69 1.43 2.36 5.02 7.32 1.99 1.07 2.43 5.32 5.52
1987 4.32 0.08 4.58 4.76 1.44 4.16 1.51 3.84 7.65 4.16 3.27 2.31
1988 2.40 3.40 2.13 3.59 2.58 1.28 6.37 4.71 2.45 1.72 5.83 1.52
1989 0.94 2.55 2.00 4.29 8.79 5.74 3.81 5.97 5.99 8.10 3.21 1.06
1990 4.32 3.15 3.13 4.35 6.79 1.49 1.70 8.05 1.42 6.40 3.64 5.07
1991 2.37 1.67 4.73 3.66 5.40 2.03 1.39 9.06 7.10 4.21 5.01 3.20
1992 2.12 1.78 3.25 2.95 2.32 3.34 4.28 7.63 2.47 2.18 4.43 3.76
1993 2.18 2.31 5.44 4.69 0.88 2.53 2.99 3.04 4.59 3.79 4.35 3.86
1994 5.76 1.87 5.60 3.19 6.34 2.70 6.87 4.39 3.72 1.34 3.87 5.06
1995 3.66 3.00 1.68 2.15 2.09 2.10 3.75 2.38 3.04 10.93 4.66 2.20
1996 6.68 4.01 2.19 8.30 3.62 4.50 6.94 0.70 6.01 4.11 3.59 6.09
1997 3.56 2.27 3.19 3.68 3.56 1.30 3.99 4.69 1.30 2.27 4.67 1.38
1998 4.19 2.56 4.53 2.79 3.50 8.60 2.06 1.45 2.31 5.70 1.78 1.24
1999 5.67 1.89 4.82 0.87 3.83 2.78 1.65 5.45 13.19 3.48 2.77 1.84
2000 3.00 3.40 3.82 4.14 4.26 7.99 6.88 5.40 5.36 2.29 2.83 4.24
Table 1.1: Monthly Precipitation in Amherst, Massachusetts 1901-2000 (Inches)
Introduction to Distributions
What is the wettest month of the summer in Amherst? How can we address this
question? While it is possible to compare the inches of precipitation in June, July,
and August by carefully studying the numerical values recorded in Tables 1.1a
and 1.1b, it is difficult, if not impossible, to draw any conclusions. There is just
too much information to digest. In some sense, the table includes too much detail;
it overwhelms us. For example, we can see from the table that July was the
wettest summer month in 1996; August was the wettest summer month in 1997;
June was the wettest summer month in 1998; August was again the wettest
summer month in 1999; and finally, June was again the wettest summer month in
2000. We need a way to summarize the information contained in Tables 1.1a and
1.1b. Descriptive statistics perform this task. By describing the distribution of the
values, descriptive statistics distill the information contained in many
observations into single numbers. Summarizing data in this way has both benefits
and costs. Without a summary we can easily “lose sight of the forest for the
trees.” In the process of summarizing, however, some information will inevitably
be lost.
Since the mean represents the center of the distribution, the representative
value, why not simply calculate the mean amount of precipitation in June, July,
and August to decide on the wettest summer month? The month with the highest
mean would be deemed the wettest. To calculate the mean (average) precipitation
for June in the twentieth century, we sum the amount of precipitation in each June
and divide the total by the number of Junes, 100 in this case:
Mean for June = (.75 + 4.54 + … + 7.99) / 100 = 377.76 / 100 = 3.78
The mean precipitation for June is 3.78 inches. More formally, we can let x
represent the data variable for monthly precipitation in June:
x1 = value for the first observation (June 1901) = .75
x2 = value for the second observation (June 1902) = 4.54
.
.
.
xT = value for the Tth or last observation (June 2000) = 7.99
Mean[x] = x̄ = (x1 + x2 + … + xT) / T = Σ_{t=1}^{T} xt / T

where T = Total Number of Observations
The mean of a data variable is often denoted by a bar above the symbol, x ,
pronounced “x bar.”
The expression

Σ_{t=1}^{T} xt / T

is a concise way to describe the arithmetic used to compute the mean.
Let us now “dissect” the numerator of this expression:
• The upper case Greek sigma, Σ, is an abbreviation for the word
summation.
• The t=1 and T represent the first and last observations of the summation.
• The xt represents observation t of the data variable.
Consequently, the expression Σ_{t=1}^{T} xt says “calculate the sum of the xt’s from t = 1 to t = T”:

Σ_{t=1}^{T} xt = x1 + x2 + … + xT
Note that the x in Mean[x] is in a bold font. This is done to emphasize the
fact that the mean describes a specific characteristic, the distribution center, of the
entire collection of values, the entire distribution.
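The arithmetic of Mean[x] is easy to sketch in a few lines of Python. The five values below are the June figures for 1901 through 1905 from Table 1.1 (a short excerpt rather than the full 100-observation series, so the result differs from the century-long June mean of 3.78):

```python
# A sketch of the mean formula: sum the observations x1, ..., xT and
# divide by T. Data: June precipitation, 1901-1905, from Table 1.1.
x = [0.75, 4.54, 7.79, 5.35, 2.86]

T = len(x)               # T = total number of observations
mean_x = sum(x) / T      # Mean[x] = (x1 + x2 + ... + xT) / T

print(round(mean_x, 3))  # 4.258
```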
We would now like to calculate the precipitation mean for each summer
month. We could use the information in tables and a pocket calculator to compute
the means. This would not only be laborious, but also error prone. Fortunately,
econometric software provides us with an easy and reliable alternative. The
Amherst weather data are posted on our web site:
Amherst Precipitation Data: Monthly time series data of precipitation in
Amherst, Massachusetts from January 1901 to December 2000.
Yeart Year of observation t
Montht Month of observation t
Precipt Monthly precipitation in Amherst, MA for observation t
(inches)
Then:
• In the File Download window: Click Open. (Note that different browsers
may present you with a slightly different screen to open the workfile.)
Next, we instruct EViews to calculate the means:
• In the Workfile window: Highlight year by clicking on it; then, while
depressing <Ctrl>, click on month and precip to highlight them also.
• In the Workfile window: Double click on any of the highlighted variables.
• A new list now pops up: Click Open Group. A spreadsheet including the
variables Year, Month, and Precip for all the months appears.
• In the Group window: Click View; then click Descriptive Stats, and then
Individual Samples.3 Descriptive statistics for all the months of the
twentieth century now appear. We only want to consider one month at a
time. We want to compute the mean for June and then for July and then
for August. Let us see how to do this.
• In the Group window: Click Sample.
o In the Sample window: Enter month=6 in the “If condition
(optional)” text area to restrict the sample to the sixth month,
June, only.
o Click OK. Descriptive statistics for the 100 Junes appear in the
Group window. Record the mean.
• In the Group window: Click Sample.
o In the Sample window: Enter month=7 in the “If condition
(optional)” text area to restrict the sample to July only.
o Click OK. Descriptive statistics for the 100 Julys appear in the
Group window. Record the mean.
• In the Group window: Click Sample.
o In the Sample window: Enter month=8 in the “If condition
(optional)” text area to restrict the sample to August only.
o Click OK. Descriptive statistics for the 100 Augusts appear in the
Group window. Record the mean.
• NB: This last step is critical. In the Group window: Click Sample.
o In the Sample window: Clear the “If condition (optional)” text area
by deleting month=8; otherwise the restriction, month=8, will
remain in effect if you ask EViews to perform any more
computations.
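The restrict-then-average logic that EViews applies with its “If condition” can also be sketched in plain Python. The six records below are the summer months of 1901 and 1902 from Table 1.1; this is only an illustration of the sample-restriction idea, not a substitute for the full computation:

```python
# Mimic EViews' "If condition" sample restriction: keep only the
# observations whose month matches, then average their precipitation.
# Records are (year, month, precip) tuples: summer 1901-1902, Table 1.1.
records = [
    (1901, 6, 0.75), (1901, 7, 3.77), (1901, 8, 5.75),
    (1902, 6, 4.54), (1902, 7, 4.66), (1902, 8, 4.65),
]

def monthly_mean(records, month):
    # The list comprehension plays the role of the "month=6" restriction.
    values = [precip for (_, m, precip) in records if m == month]
    return sum(values) / len(values)

for m in (6, 7, 8):
    print(m, round(monthly_mean(records, m), 3))
```

Clearing the restriction in EViews corresponds to simply averaging over all records again.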
Table 1.2 summarizes the information. August has the highest mean. Based on the
mean criterion, August was the wettest summer month in the twentieth century;
the mean for August equals 3.96, which is greater than the mean for June or July.
While the center of the distribution is undoubtedly important, the spread can be
crucial also. If the spread is small, all the values of the distribution lie close to the
center, the mean. On the other hand, if the spread is large, some of the values lie
far below the mean and some lie far above the mean. Farming provides a good
illustration of why the spread can be important. Obviously, the mean precipitation
during the growing season is critical to the farmer. But the spread of the
precipitation is important also. Most crops grow best when they get a steady
amount of moderate rain over the entire growing season. An unusually dry period
followed by an unusually wet period or vice versa is not welcome news for the
farmer. Both the center (mean) and the spread are important. The years 1951 and
1998 illustrate this well.
In reality, 1951 was a better growing season than 1998 even though the
mean for 1998 was a little higher. Precipitation was less volatile in 1951 than in
1998. Arguably the most straightforward measure of distribution spread is its
range. In 1951, precipitation ranged from a minimum of 2.96 to a maximum of
4.15. In 1998, the range was larger, from 1.45 to 8.60.
While the range is the simplest, it is not the most sensitive. The most
widely cited measure of spread is the variance and its closely related cousin, the
standard deviation. The variance equals the average of the squared deviations of
the values from the mean. While this definition may sound a little overwhelming
when first heard, it is not as daunting as it sounds. We can use the following three
steps to calculate the variance:
• For each month, calculate the amount by which that month’s precipitation
deviates from the mean.
• Square each month’s deviation.
• Calculate the average of the squared deviations; that is, sum the squared
deviations and divide by the number of months, 5 in this case.
Let us first calculate the variance for 1998:
Month   Precipitation   Mean   Deviation From Mean     Squared Deviation
Apr     2.79            3.68   2.79 − 3.68 = −0.89     0.7921
May     3.50            3.68   3.50 − 3.68 = −0.18     0.0324
Jun     8.60            3.68   8.60 − 3.68 = 4.92      24.2064
Jul     2.06            3.68   2.06 − 3.68 = −1.62     2.6244
Aug     1.45            3.68   1.45 − 3.68 = −2.23     4.9729
                               Sum of Squared Deviations = 32.6282

Variance = Sum of Squared Deviations / N = 32.6282 / 5 = 6.5256
Note that the mean and the variance are expressed in different units; the mean is
expressed in inches and the variance in inches squared. Often it is useful to
compare the mean and the measure of spread directly, in terms of the same units.
The standard deviation allows us to do just that. The standard deviation is the
square root of the variance; hence, the standard deviation is expressed in inches,
just like the mean:
Standard Deviation = √Variance = √(6.5256 in²) = 2.55 in
We can use the same procedure to calculate the variance and standard
deviation for 1951:
Month   Precipitation   Mean   Deviation From Mean     Squared Deviation
Apr     3.63            3.47   3.63 − 3.47 = 0.16      0.0256
May     2.96            3.47   2.96 − 3.47 = −0.51     0.2601
Jun     3.05            3.47   3.05 − 3.47 = −0.42     0.1764
Jul     4.15            3.47   4.15 − 3.47 = 0.68      0.4624
Aug     3.56            3.47   3.56 − 3.47 = 0.09      0.0081
                               Sum of Squared Deviations = 0.9326

Variance = Sum of Squared Deviations / N = 0.9326 / 5 = 0.1865

Standard Deviation = √0.1865 = 0.43 in
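The three-step variance calculation translates directly into Python. The sketch below uses the April through August precipitation figures for 1998 and 1951 quoted in the text, reproducing the large variance of 6.5256 for 1998 and the much smaller spread of 1951:

```python
# The three-step variance calculation from the text:
# deviations from the mean, squared deviations, and their average.
# Data: April-August precipitation for 1998 and 1951 (inches).

def variance(values):
    mean = sum(values) / len(values)
    deviations = [v - mean for v in values]   # step 1: deviations from the mean
    squared = [d ** 2 for d in deviations]    # step 2: square each deviation
    return sum(squared) / len(values)         # step 3: average the squares

precip_1998 = [2.79, 3.50, 8.60, 2.06, 1.45]
precip_1951 = [3.63, 2.96, 3.05, 4.15, 3.56]

print(round(variance(precip_1998), 4))           # 6.5256 (inches squared)
print(round(variance(precip_1998) ** 0.5, 2))    # 2.55 (standard deviation, inches)
print(round(variance(precip_1951), 4))           # 0.1865
```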
When the spread is small, as it was in 1951, all observations will be close
to the mean. Hence, the deviations will be small. The squared deviations, the
variance, and the standard deviation will also be small. On the other hand, if the
spread is large, as it was in 1998, some observations must be far from the mean.
Hence, some deviations will be large. Some squared deviations, the variance, and
the standard deviation will also be large. Let us summarize:
Spread small Spread large
↓ ↓
All deviations small Some deviations large
↓ ↓
All squared deviations small Some squared deviations large
↓ ↓
Variance small Variance large
We can summarize the steps for calculating the variance concisely with the
following equations:
Var[x] = [(x1 − Mean[x])² + (x2 − Mean[x])² + … + (xT − Mean[x])²] / T

       = [(x1 − x̄)² + (x2 − x̄)² + … + (xT − x̄)²] / T

where T = Total Number of Observations
x̄ = Mean[x] = Mean of x

Var[x] = Σ_{t=1}^{T} (xt − Mean[x])² / T = Σ_{t=1}^{T} (xt − x̄)² / T
The standard deviation is the square root of the variance:
SD[ x ] = Var[ x ]
Σ_{t=1}^{T} (xt − Mean[x])² and Σ_{t=1}^{T} (xt − x̄)² equal the sum of the squared deviations from the mean.
Note that the x in Var[x] and SD[x] is in a bold font. This emphasizes the
fact that the variance and standard deviation describe one specific characteristic,
the distribution spread, of the entire distribution.
The most obvious difference in the two histograms is that the September
histogram has a longer “right hand tail.” The center of September’s distribution
lies to the right of February’s; consequently, we would expect September’s mean
to exceed February’s. Also, the distribution of precipitation in September is more
“spread out” than the distribution in February; hence, we would expect
September’s variance to be larger. Table 1.4 confirms quantitatively what we
observe visually. September has a higher mean: 3.89 for September versus 2.88
for February. Furthermore, the variance for September is greater.
Mean Variance
February 2.88 1.49
September 3.89 6.50
Table 1.4:
Means and Variances of Precipitation for February and September, 1901-2000
Scatter Diagram: Visual Illustration of How Two Data Variables Are Related
We shall use a second data example to introduce another type of useful graph, the
scatter diagram, which visually illustrates the relationship between two
variables.
Stock Market Data: Monthly time series data for growth rates of the Dow Jones
Industrial and Nasdaq stock indexes from January 1985 to December 2000.
Year Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec
1985 6.21 -0.22 -1.34 -0.69 4.55 1.53 0.90 -1.00 -0.40 3.44 7.12 5.07
1986 1.57 8.79 6.41 -1.90 5.20 0.85 -6.20 6.93 -6.89 6.23 1.94 -0.95
1987 13.82 3.06 3.63 -0.79 0.23 5.54 6.35 3.53 -2.50 -23.22 -8.02 5.74
1988 1.00 5.79 -4.03 2.22 -0.06 5.45 -0.61 -4.56 4.00 1.69 -1.59 2.56
1989 8.01 -3.58 1.56 5.46 2.54 -1.62 9.04 2.88 -1.63 -1.77 2.31 1.73
1990 -5.91 1.42 3.04 -1.86 8.28 0.14 0.85 -10.01 -6.19 -0.42 4.81 2.89
1991 3.90 5.33 1.10 -0.89 4.83 -3.99 4.06 0.62 -0.88 1.73 -5.68 9.47
1992 1.72 1.37 -0.99 3.82 1.13 -2.31 2.27 -4.02 0.44 -1.39 2.45 -0.12
1993 0.27 1.84 1.91 -0.22 2.91 -0.32 0.67 3.16 -2.63 3.53 0.09 1.90
1994 5.97 -3.68 -5.11 1.26 2.08 -3.55 3.85 3.96 -1.79 1.69 -4.32 2.55
1995 0.25 4.35 3.65 3.93 3.33 2.04 3.34 -2.08 3.87 -0.70 6.71 0.84
1996 5.44 1.67 1.85 -0.32 1.33 0.20 -2.22 1.58 4.74 2.50 8.16 -1.13
1997 5.66 0.95 -4.28 6.46 4.59 4.66 7.17 -7.30 4.24 -6.33 5.12 1.09
1998 -0.02 8.08 2.97 3.00 -1.80 0.58 -0.77 -15.13 4.03 9.56 6.10 0.71
1999 1.93 -0.56 5.15 10.25 -2.13 3.89 -2.88 1.63 -4.55 3.80 1.38 5.69
2000 -4.84 -7.42 7.84 -1.72 -1.97 -0.71 0.71 6.59 -5.03 3.01 -5.07 3.59
Table 1.5a: Monthly Percentage Growth Rate of Dow Jones Industrial Index
Year Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec
1985 12.79 1.97 -1.76 0.50 3.64 1.86 1.72 -1.19 -5.84 4.35 7.35 3.47
1986 3.35 7.06 4.23 2.27 4.44 1.32 -8.41 3.10 -8.41 2.88 -0.33 -3.00
1987 12.41 8.39 1.20 -2.86 -0.31 1.97 2.40 4.62 -2.35 -27.23 -5.60 8.29
1988 4.30 6.47 2.07 1.23 -2.35 6.59 -1.87 -2.76 2.95 -1.34 -2.88 2.66
1989 5.22 -0.40 1.75 5.14 4.35 -2.44 4.25 3.42 0.77 -3.66 0.11 -0.29
1990 -8.58 2.41 2.28 -3.54 9.26 0.72 -5.21 -13.01 -9.63 -4.27 8.88 4.09
1991 10.81 9.39 6.44 0.50 4.41 -5.97 5.49 4.71 0.23 3.06 -3.51 11.92
1992 5.78 2.14 -4.69 -4.16 1.15 -3.71 3.06 -3.05 3.58 3.75 7.86 3.71
1993 2.86 -3.67 2.89 -4.16 5.91 0.49 0.11 5.41 2.68 2.16 -3.19 2.97
1994 3.05 -1.00 -6.19 -1.29 0.18 -3.98 2.29 6.02 -0.17 1.73 -3.49 0.22
1995 0.43 5.10 2.96 3.28 2.44 7.97 7.26 1.89 2.30 -0.72 2.23 -0.67
1996 0.73 3.80 0.12 8.09 4.44 -4.70 -8.81 5.64 7.48 -0.44 5.82 -0.12
1997 6.88 -5.13 -6.67 3.20 11.07 2.98 10.52 -0.41 6.20 -5.46 0.44 -1.89
1998 3.12 9.33 3.68 1.78 -4.79 6.51 -1.18 -19.93 12.98 4.58 10.06 12.47
1999 14.28 -8.69 7.58 3.31 -2.84 8.73 -1.77 3.82 0.25 8.02 12.46 21.98
2000 -3.17 19.19 -2.64 -15.57 -11.91 16.62 -5.02 11.66 -12.68 -8.25 -22.90 -4.90
Table 1.5b: Monthly Percentage Growth Rate of Nasdaq Index
We shall focus on the relationship between the Dow Jones and Nasdaq
growth rates. In Figure 1.3, the Dow Jones growth rate is placed on the horizontal
axis and the Nasdaq growth rate is placed on the vertical axis:
Figure 1.3: Scatter Diagram – Dow Jones Growth Rate versus Nasdaq Growth
Rate
On the scatter diagram, each point illustrates the Dow Jones growth rate and the
Nasdaq growth rate for one specific month. For example, the top left point labeled
Feb 2000 represents February 2000 when the Dow fell by 7.42 percent and the
Nasdaq grew by 19.19 percent. Similarly, the point in the first quadrant labeled
Jan 1987 represents January 1987: the Dow rose by 13.82 percent and the Nasdaq
rose by 12.41 percent.
The Dow Jones and Nasdaq growth rates appear to be correlated. Two variables
are correlated when information about one variable helps us predict the other.
Typically, when the Dow Jones growth rate is positive, the Nasdaq growth rate is
also positive; similarly, when the Dow Jones growth rate is negative, the Nasdaq
growth rate is usually negative. Although there are exceptions, February 2000 for
example, knowing one growth rate typically helps us predict the other. For
example, if we knew that the Dow Jones growth rate was positive in one specific
month, we would predict that the Nasdaq growth rate would be positive also.
While we would not always be correct, we would be right most of the time.
Covariance quantifies the notion of correlation. We can use the following three
steps to calculate the covariance of two data variables, x and y:
• For each observation, calculate the amount by which variable x deviates
from its mean and the amount by which variable y deviates from its mean.
• Multiply each observation’s x deviation by its y deviation.
• Calculate the average of these products; that is, sum the products of the
deviations and divide by the number of observations.
We can express these steps concisely with an equation:
Cov[x, y] = [(x1 − x̄)(y1 − ȳ) + (x2 − x̄)(y2 − ȳ) + … + (xT − x̄)(yT − ȳ)] / T

          = Σ_{t=1}^{T} (xt − x̄)(yt − ȳ) / T

where T = Total Number of Observations
x̄ = Mean[x] = Mean of x
ȳ = Mean[y] = Mean of y
Let us calculate the covariance for the Dow and Nasdaq monthly growth
rates. The average monthly increase for the Dow Jones Industrial Average was
1.25 percent and the average increase for the Nasdaq Composite was 1.43 percent.
Their covariance equals 19.61:
Cov[x, y] = [(x1 − x̄)(y1 − ȳ) + (x2 − x̄)(y2 − ȳ) + … + (xT − x̄)(yT − ȳ)] / T

= [(6.21 − 1.25)(12.79 − 1.43) + (−.22 − 1.25)(1.97 − 1.43) + … + (3.59 − 1.25)(−4.90 − 1.43)] / 192

= 19.61
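The same three steps are easy to sketch in Python. To keep the example short, the sketch below uses only the January through June 1985 growth rates from Tables 1.5a and 1.5b, so its covariance (about 11.44) is an excerpt statistic and differs from the full-sample value of 19.61; like the full sample, it is positive:

```python
# Three-step covariance: deviations of x and y from their means,
# products of the paired deviations, and the average of those products.
# Data: Jan-Jun 1985 Dow and Nasdaq growth rates (Tables 1.5a and 1.5b).

def covariance(x, y):
    T = len(x)
    mean_x = sum(x) / T
    mean_y = sum(y) / T
    products = [(a - mean_x) * (b - mean_y) for a, b in zip(x, y)]
    return sum(products) / T

dow    = [6.21, -0.22, -1.34, -0.69, 4.55, 1.53]
nasdaq = [12.79, 1.97, -1.76, 0.50, 3.64, 1.86]

print(round(covariance(dow, nasdaq), 2))   # 11.44 -- positive, like the full sample
```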
A nonzero covariance suggests that the variables are correlated. To understand why,
consider a scatter diagram of the deviations. We place the deviation of the Dow Jones
growth rate from its mean on the horizontal axis and the deviation of the Nasdaq
growth rate from its mean on the vertical axis. This scatter diagram allows us to
motivate the relationship between the covariance and correlation.4
Figure 1.4: Scatter Diagram – Dow Jones Growth Rate Less Its Mean versus
Nasdaq Growth Rate Less Its Mean
The covariance equation and the scatter diagram are related. The
numerator of the covariance equation equals the sum of the products of each
month’s deviations, ( xt − x )( yt − y ) :
Cov[x, y] = Σ_{t=1}^{T} (xt − x̄)(yt − ȳ) / T
What can we say about the sign of each observation’s deviations and their
product, ( xt − x )( yt − y ) , in each quadrant of the scatter diagram?
• First quadrant. Dow growth rate is greater than its mean and Nasdaq
growth is greater than its mean. Both deviations are positive; hence, the
product of the deviations is positive in the first quadrant:
( xt − x ) > 0 and ( yt − y ) > 0 → ( xt − x )( yt − y ) > 0
• Second quadrant. Dow growth rate is less than its mean and Nasdaq
growth is greater than its mean. One deviation is positive and one
negative; hence, the product of the deviations is negative in the second
quadrant:
( xt − x ) < 0 and ( yt − y ) > 0 → ( xt − x )( yt − y ) < 0
• Third quadrant. Dow growth rate is less than its mean and Nasdaq growth
is less than its mean. Both deviations are negative; hence, the product of
the deviations is positive in the third quadrant:
( xt − x ) < 0 and ( yt − y ) < 0 → ( xt − x )( yt − y ) > 0
• Fourth quadrant. Dow growth rate is greater than its mean and Nasdaq
growth is less than its mean. One deviation is positive and one negative;
hence, the product of the deviations is negative in the fourth quadrant:
( xt − x ) > 0 and ( yt − y ) < 0 → ( xt − x )( yt − y ) < 0
                                  (yt − ȳ)
 Quadrant II                          |  Quadrant I
 (xt − x̄) < 0 and (yt − ȳ) > 0       |  (xt − x̄) > 0 and (yt − ȳ) > 0
 (xt − x̄)(yt − ȳ) < 0                |  (xt − x̄)(yt − ȳ) > 0
 -------------------------------------+------------------------------------- (xt − x̄)
 Quadrant III                         |  Quadrant IV
 (xt − x̄) < 0 and (yt − ȳ) < 0       |  (xt − x̄) > 0 and (yt − ȳ) < 0
 (xt − x̄)(yt − ȳ) > 0                |  (xt − x̄)(yt − ȳ) < 0
In the Dow Jones and Nasdaq deviation scatter diagram (Figure 1.4), most
months lie in the first and third quadrants. Consequently, the product of the
deviations, ( xt − x )( yt − y ) , is positive in most months. This explains why the
covariance is positive.5 A positive covariance means that the variables are positively
correlated. When one variable is above average, the other is typically above average
also. Similarly, when one variable is below average, the other is typically below
average.
Figure 1.6: Scatter Diagram – Amherst Precipitation Less Its Mean versus
Nasdaq Growth Rate Less Its Mean
Recall what we know about the sign of the deviation in each quadrant:
• First quadrant: ( xt − x ) > 0 and ( yt − y ) > 0 → ( xt − x )( yt − y ) > 0
• Second quadrant: ( xt − x ) < 0 and ( yt − y ) > 0 → ( xt − x )( yt − y ) < 0
• Third quadrant: ( xt − x ) < 0 and ( yt − y ) < 0 → ( xt − x )( yt − y ) > 0
• Fourth quadrant: ( xt − x ) > 0 and ( yt − y ) < 0 → ( xt − x )( yt − y ) < 0
Since the points are distributed more or less evenly across all four
quadrants, the products of the deviations, ( xt − x )( yt − y ) , are positive in about
half the months and negative in the other half.6 Consequently, the covariance will
be approximately equal to 0. More generally, if variables are independent, the
covariance will be about 0. In reality, the covariance of precipitation and the
Nasdaq growth rate is –0.91, approximately 0:
Cov[x, y] = Σ_{t=1}^{T} (xt − x̄)(yt − ȳ) / T = −.91 ≈ 0
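The claim that independent variables have a covariance near zero can be checked by simulation. The sketch below draws two independent random series (hypothetical data, fixed seed); because the deviation products land in all four quadrants about equally, they average out to approximately zero:

```python
# Simulate two independent variables: their deviation products fall in
# all four quadrants about equally, so the covariance is approximately 0.
# The data are randomly generated (hypothetical), with a fixed seed.
import random

random.seed(1)
T = 100_000
x = [random.gauss(0, 1) for _ in range(T)]
y = [random.gauss(0, 1) for _ in range(T)]

mean_x = sum(x) / T
mean_y = sum(y) / T
cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / T

print(abs(cov) < 0.05)   # True: the covariance is approximately 0
```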
We can use EViews to calculate the covariance. The stock market data are
posted on our web site.
Stock Market Data: Monthly time series data for growth rates of the Dow
Jones Industrial and Nasdaq stock indexes from January 1985 to December
2000.
DJGrowtht Monthly growth rate of the Dow Jones Industrial Average
based on the monthly close for observation t (percent)
NasdaqGrowtht Monthly growth rate of the Nasdaq Composite based on the
monthly close for observation t (percent)
Precipt Monthly precipitation in Amherst, MA for observation t
(inches)
Then:
• In the File Download window: Click Open. (Note that different browsers
may present you with a slightly different screen to open the workfile.)
Next, we instruct EViews to calculate the covariance of Amherst precipitation and
the Nasdaq growth rate:
• In the Workfile window: Highlight precip by clicking on it; then while
depressing <Ctrl> click on nasdaqgrowth to highlight it.
• In the Workfile window: Double click on any of the highlighted variables.
• A new list now pops up: Click Open Group. A spreadsheet including the
variables Precip and NasdaqGrowth appears.
• In the Group window: Click View, and then click Covariance Analysis…
• In the Covariance Analysis window: Be certain that the Covariance
checkbox is selected; then, click OK.
Last, close the file:
• In the EViews window: Click File, then Exit.
• In the Workfile window: Click No when asked whether to save the changes
made to the workfile.
Covariance Matrix
Precip NasdaqGrowth
Precip 4.170426 -0.911125
NasdaqGrowth -0.911125 43.09910
Table 1.6: Amherst Precipitation and Nasdaq Growth Rate Covariance Matrix
Both the variances and the covariances are reported. The variances are reported in
the diagonal cells: the variance for Amherst precipitation is 4.17 and the variance
for the Nasdaq growth rate is 43.10. Their covariance appears in the off diagonal
cells: the covariance is −.91. Note that the two off diagonal cells report the same
number. This results from a basic arithmetic fact. When we multiply two numbers
together, the order of the multiplication does not matter:
( xt − x )( yt − y ) = ( yt − y )( xt − x )
There is no natural range for the covariance; its magnitude depends on the units
used. To appreciate why, suppose that we measured Amherst precipitation in
centimeters rather than inches. Consequently, all precipitation figures appearing
in Tables 1.1a and 1.1b would be multiplied by 2.54 to convert from inches to
centimeters. Now, consider the covariance equation:
Cov[x, y] = [(x1 − x̄)(y1 − ȳ) + (x2 − x̄)(y2 − ȳ) + … + (xT − x̄)(yT − ȳ)] / T
Unit Insensitivity

Converting precipitation from inches to centimeters multiplies each deviation (xt − x̄) by 2.54; consequently, Cov[x, y] rises by a factor of 2.54. The variance of x,

Var[x] = Σ_{t=1}^{T} (xt − x̄)² / T

rises by a factor of 2.54², and hence √Var[x] rises by a factor of 2.54. Since the covariance and √Var[x] rise by the same factor of 2.54, the factors cancel: the correlation coefficient, which divides the covariance by √Var[x] √Var[y], is insensitive to the units of measurement.
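The unit-insensitivity argument can be checked numerically: scaling x by 2.54 multiplies the covariance by 2.54 but leaves the correlation coefficient untouched. The two short series below are hypothetical, chosen only for illustration:

```python
# Changing units changes the covariance but not the correlation coefficient.
# Multiplying x by 2.54 (inches -> centimeters) scales every deviation
# (xt - xbar) by 2.54: Cov[x, y] grows by 2.54, Var[x] by 2.54 ** 2, and
# sqrt(Var[x]) by 2.54 -- so the factors cancel in CorrCoef[x, y].
# The two short series are hypothetical.

def mean(v):
    return sum(v) / len(v)

def cov(x, y):                       # note: cov(x, x) equals Var[x]
    mx, my = mean(x), mean(y)
    return sum((a - mx) * (b - my) for a, b in zip(x, y)) / len(x)

def corrcoef(x, y):
    return cov(x, y) / ((cov(x, x) ** 0.5) * (cov(y, y) ** 0.5))

x_inches = [1.0, 2.0, 4.0, 5.0]
y        = [2.0, 1.0, 5.0, 4.0]
x_cm     = [2.54 * v for v in x_inches]

print(round(cov(x_cm, y) / cov(x_inches, y), 2))   # 2.54: covariance scales
print(round(corrcoef(x_inches, y), 6))             # 0.8
print(round(corrcoef(x_cm, y), 6))                 # 0.8 -- unchanged
```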
Natural Range
The correlation coefficient also has another important property; it must lie
between –1.00 and +1.00. Therefore, it provides us with a sense of how strongly
two variables are correlated. A correlation coefficient of +1.00 represents perfect
positive correlation and −1.00 represents perfect negative correlation. To
understand why, consider the two polar cases of perfect positive and perfect
negative correlation.
Figure 1.7: Range of Correlation Coefficients (a number line running from −1 to +1, centered at 0)
Var[x] = Σ_{t=1}^{T} (xt − x̄)² / T        Var[y] = Σ_{t=1}^{T} (yt − ȳ)² / T

CorrCoef[x, y] = Cov[x, y] / (√Var[x] √Var[y])
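Applying this definition in Python is straightforward. To keep the sketch short it uses only the January through June 1985 growth rates from Tables 1.5a and 1.5b, so the resulting value is an excerpt statistic rather than the full-sample correlation; even so, it illustrates both the strong positive correlation and the natural range:

```python
# CorrCoef[x, y] = Cov[x, y] / (sqrt(Var[x]) * sqrt(Var[y])).
# Data: Jan-Jun 1985 Dow and Nasdaq growth rates (Tables 1.5a and 1.5b);
# an excerpt, so this is not the full 1985-2000 correlation.

def corrcoef(x, y):
    T = len(x)
    mx, my = sum(x) / T, sum(y) / T
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y)) / T
    var_x = sum((a - mx) ** 2 for a in x) / T
    var_y = sum((b - my) ** 2 for b in y) / T
    return cov / ((var_x ** 0.5) * (var_y ** 0.5))

dow    = [6.21, -0.22, -1.34, -0.69, 4.55, 1.53]
nasdaq = [12.79, 1.97, -1.76, 0.50, 3.64, 1.86]

r = corrcoef(dow, nasdaq)
print(round(r, 2))          # 0.89 -- strongly positively correlated
print(-1.0 <= r <= 1.0)     # True: always within the natural range
```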
Consider first the case in which the two variables are identical: yt = xt for
each observation. In this case, the variables exhibit perfect positive
correlation. If we know the value of x we can perfectly predict the value of y
and vice versa. Let us compute their correlation coefficient. To do so, first
note that x and y have identical means,

ȳ = x̄

and that each observation’s deviation from the mean is the same for x and y:

yt − ȳ = xt − x̄    for each t = 1, 2, …, T
Consider the above equations for the variances and covariance; both the variance
of y and the covariance equal the variance of x:
Var[y] = Var[x] and Cov[x, y] = Var[x]
$$\mathrm{Var}[y] = \frac{\sum_{t=1}^{T} (y_t - \bar{y})^2}{T} = \frac{\sum_{t=1}^{T} (x_t - \bar{x})^2}{T} = \mathrm{Var}[x]$$
(The first and last steps apply the definition of the variance; the middle step
uses $y_t - \bar{y} = x_t - \bar{x}$.)
and
$$\mathrm{Cov}[x, y] = \frac{\sum_{t=1}^{T} (x_t - \bar{x})(y_t - \bar{y})}{T} = \frac{\sum_{t=1}^{T} (x_t - \bar{x})^2}{T} = \mathrm{Var}[x]$$
(Again, the first and last steps apply the definitions; the middle step uses
$y_t - \bar{y} = x_t - \bar{x}$.)
Now, apply the correlation coefficient equation; the correlation coefficient equals
1.00:
$$\mathrm{CorrCoef}[x, y] = \frac{\mathrm{Cov}[x, y]}{\sqrt{\mathrm{Var}[x]\,\mathrm{Var}[y]}} = \frac{\mathrm{Var}[x]}{\sqrt{\mathrm{Var}[x]\,\mathrm{Var}[x]}} = \frac{\mathrm{Var}[x]}{\mathrm{Var}[x]} = 1.00$$
(The first step applies the definition; the second uses $\mathrm{Cov}[x, y] = \mathrm{Var}[x]$
and $\mathrm{Var}[y] = \mathrm{Var}[x]$.)
In this case, the variables exhibit perfect negative correlation. Clearly, y's mean is
the negative of x's,
$$\bar{y} = -\bar{x}$$
and y's deviation from its mean equals the negative of x's deviation from its mean
for each observation:
$$y_t - \bar{y} = -(x_t - \bar{x}) \quad \text{for each } t = 1, 2, \dots, T$$
The variance of y equals the variance of x, and the covariance equals the negative
of the variance of x:
$$\mathrm{Var}[y] = \mathrm{Var}[x] \qquad \mathrm{Cov}[x, y] = -\mathrm{Var}[x]$$
$$\mathrm{Var}[y] = \frac{\sum_{t=1}^{T} (y_t - \bar{y})^2}{T} = \frac{\sum_{t=1}^{T} (x_t - \bar{x})^2}{T} = \mathrm{Var}[x]$$
(The first and last steps apply the definition; the middle step uses
$y_t - \bar{y} = -(x_t - \bar{x})$, and squaring eliminates the minus sign.)
and
$$\mathrm{Cov}[x, y] = \frac{\sum_{t=1}^{T} (x_t - \bar{x})(y_t - \bar{y})}{T} = -\frac{\sum_{t=1}^{T} (x_t - \bar{x})^2}{T} = -\mathrm{Var}[x]$$
(The first and last steps apply the definitions; the middle step uses
$y_t - \bar{y} = -(x_t - \bar{x})$.)
Applying the correlation coefficient equation, the correlation coefficient equals
−1.00:
$$\mathrm{CorrCoef}[x, y] = \frac{\mathrm{Cov}[x, y]}{\sqrt{\mathrm{Var}[x]\,\mathrm{Var}[y]}} = \frac{-\mathrm{Var}[x]}{\sqrt{\mathrm{Var}[x]\,\mathrm{Var}[x]}} = \frac{-\mathrm{Var}[x]}{\mathrm{Var}[x]} = -1.00$$
(The first step applies the definition; the second uses $\mathrm{Cov}[x, y] = -\mathrm{Var}[x]$
and $\mathrm{Var}[y] = \mathrm{Var}[x]$.)
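The two polar cases can be checked numerically. The following Python sketch uses the chapter's divide-by-T formulas; the series `x` is an illustrative assumption, not data from the text. With y = x the correlation coefficient works out to +1.00, and with y = −x it works out to −1.00.

```python
# Verify CorrCoef = +1.00 when y = x and CorrCoef = -1.00 when y = -x.

def mean(v):
    return sum(v) / len(v)

def var(v):
    m = mean(v)
    return sum((vt - m) ** 2 for vt in v) / len(v)

def cov(x, y):
    mx, my = mean(x), mean(y)
    return sum((xt - mx) * (yt - my) for xt, yt in zip(x, y)) / len(x)

def corr_coef(x, y):
    return cov(x, y) / (var(x) * var(y)) ** 0.5

x = [3.0, 1.5, 4.2, 2.8, 5.1]
y_pos = x[:]                 # perfect positive correlation: y = x
y_neg = [-xt for xt in x]    # perfect negative correlation: y = -x

print(round(corr_coef(x, y_pos), 2))   # 1.0
print(round(corr_coef(x, y_neg), 2))   # -1.0
```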
Then:
• In the File Download window: Click Open. (Note that different browsers
may present you with a slightly different screen to open the workfile.)
• In the Workfile window: Highlight precip by clicking on it; then, while
depressing <Ctrl>, click on nasdaqgrowth and djgrowth to highlight
them also.
• In the Workfile window: Double click on any of the highlighted variables.
• A new list now pops up: Click Open Group. A spreadsheet including the
variables Precip, NasdaqGrowth, and DJGrowth appears.
• In the Group window: Click View, and then click Covariance Analysis…
• In the Covariance Analysis window: Clear the Covariance box and select
the Correlation box; then, click OK.
__________________________________________________________________
All diagonal elements must equal 1.00. This reflects the fact that when two
variables are identical, perfect positive correlation results. Each off-diagonal cell
reports the correlation coefficient for the two different variables:
Correlation Matrix
Precip NasdaqGrowth DJGrowth
Precip 1.000000 -0.067960 -0.128425
NasdaqGrowth -0.067960 1.000000 0.669061
DJGrowth -0.128425 0.669061 1.000000
Table 1.7: Amherst Precipitation, Nasdaq Growth Rate, and Dow Jones Growth
Rate Correlation Matrix
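A correlation matrix like the one EViews reports can also be computed directly. This Python sketch uses three short made-up series (not the actual Precip, NasdaqGrowth, and DJGrowth data) just to show the mechanics: every diagonal entry is 1.00 and the matrix is symmetric.

```python
# Build a correlation matrix by hand from the chapter's formulas.

def mean(v):
    return sum(v) / len(v)

def var(v):
    m = mean(v)
    return sum((vt - m) ** 2 for vt in v) / len(v)

def cov(x, y):
    mx, my = mean(x), mean(y)
    return sum((xt - mx) * (yt - my) for xt, yt in zip(x, y)) / len(x)

def corr_coef(x, y):
    return cov(x, y) / (var(x) * var(y)) ** 0.5

series = {
    "Precip": [2.9, 4.1, 3.3, 5.0, 1.8],        # placeholder values
    "NasdaqGrowth": [1.2, -0.5, 0.8, -1.1, 0.4],
    "DJGrowth": [0.9, -0.2, 0.6, -0.8, 0.1],
}

# matrix[a][b] = CorrCoef[a, b]; diagonal entries are 1.00, and the
# matrix is symmetric because order of multiplication does not matter.
matrix = {a: {b: corr_coef(series[a], series[b]) for b in series}
          for a in series}
```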
Note that all the correlation coefficients fall within the –1.00 to +1.00 range. Each
correlation coefficient provides us with a sense of how correlated two variables
are.
CorrCoef[Dow Jones Growth Rate, Nasdaq Growth Rate] = .67
CorrCoef[Nasdaq Growth Rate, Amherst Precipitation] = –.07
The correlation coefficient for the Dow and Nasdaq growth rates is
positive, illustrating that they are positively correlated. On the other hand, the
correlation coefficient for the Nasdaq growth rate and Amherst precipitation is
approximately 0, indicating that the Nasdaq growth rate and Amherst
precipitation are essentially uncorrelated, just as we would expect if the two
variables were independent.
The fact that two variables are highly correlated does not necessarily indicate that
one variable is causing the other to rise and fall. For example, the Dow Jones and
Nasdaq growth rates are indeed positively correlated. This does not imply,
however, that a rise in the Dow Jones causes the Nasdaq to rise or that a rise in
the Nasdaq causes the Dow Jones to rise. It simply means that when one rises the
other tends to rise, and when one falls the other tends to fall. One reason that
these two variables tend to move together is that both are influenced by similar
factors; for example, both are influenced by the general health of the economy.
When the economy prospers, both Dow Jones stocks and Nasdaq stocks tend to
rise; therefore, both indexes tend to rise. On the other hand, when the economy
falters, both indexes tend to fall. While the indexes are correlated, other factors
are responsible for the causation.
$$\mathrm{Mean}[x] = \bar{x} = \frac{x_1 + x_2 + \dots + x_T}{T} = \frac{\sum_{t=1}^{T} x_t}{T} \quad \text{where } T = \text{Total Number of Observations}$$
$$\mathrm{Var}[x] = \frac{(x_1 - \bar{x})^2 + (x_2 - \bar{x})^2 + \dots + (x_T - \bar{x})^2}{T} = \frac{\sum_{t=1}^{T} (x_t - \bar{x})^2}{T}$$
$$\mathrm{Cov}[x, y] = \frac{(x_1 - \bar{x})(y_1 - \bar{y}) + (x_2 - \bar{x})(y_2 - \bar{y}) + \dots + (x_T - \bar{x})(y_T - \bar{y})}{T} = \frac{\sum_{t=1}^{T} (x_t - \bar{x})(y_t - \bar{y})}{T}$$
Mean of the sum of two variables: Mean[x + y] = Mean[x] + Mean[y]
$$\mathrm{Mean}[x + y] = \frac{(x_1 + y_1) + (x_2 + y_2) + \dots + (x_T + y_T)}{T}$$
$$= \frac{(x_1 + x_2 + \dots + x_T) + (y_1 + y_2 + \dots + y_T)}{T}$$
$$= \frac{x_1 + x_2 + \dots + x_T}{T} + \frac{y_1 + y_2 + \dots + y_T}{T}$$
$$= \bar{x} + \bar{y}$$
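The mean-of-a-sum rule can be verified numerically. The two short series below are illustrative assumptions, not data from the text.

```python
# Check Mean[x + y] = Mean[x] + Mean[y] on a small example.

def mean(v):
    return sum(v) / len(v)

x = [1.0, 4.0, 2.5, 3.5]
y = [0.5, -1.0, 2.0, 1.5]

lhs = mean([xt + yt for xt, yt in zip(x, y)])  # Mean[x + y]
rhs = mean(x) + mean(y)                        # Mean[x] + Mean[y]
print(lhs, rhs)  # 3.5 3.5
```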
Variance of a constant plus a variable: Var[c + x] = Var[x]
The variance of a constant plus a variable equals the variance of the variable.
$$\mathrm{Var}[c + x] = \frac{[(c + x_1) - (c + \bar{x})]^2 + [(c + x_2) - (c + \bar{x})]^2 + \dots + [(c + x_T) - (c + \bar{x})]^2}{T}$$
$$= \frac{[(c - c) + (x_1 - \bar{x})]^2 + [(c - c) + (x_2 - \bar{x})]^2 + \dots + [(c - c) + (x_T - \bar{x})]^2}{T}$$
$$= \frac{(x_1 - \bar{x})^2 + (x_2 - \bar{x})^2 + \dots + (x_T - \bar{x})^2}{T}$$
$$= \mathrm{Var}[x]$$
Variance of a constant times a variable: Var[cx] = c2Var[x]
The variance of a constant times a variable equals the constant squared times
the variance of the variable.
$$\mathrm{Var}[cx] = \frac{(cx_1 - c\bar{x})^2 + (cx_2 - c\bar{x})^2 + \dots + (cx_T - c\bar{x})^2}{T}$$
$$= \frac{c^2(x_1 - \bar{x})^2 + c^2(x_2 - \bar{x})^2 + \dots + c^2(x_T - \bar{x})^2}{T}$$
$$= c^2\,\frac{(x_1 - \bar{x})^2 + (x_2 - \bar{x})^2 + \dots + (x_T - \bar{x})^2}{T}$$
$$= c^2\,\mathrm{Var}[x]$$
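Both constant rules, Var[c + x] = Var[x] and Var[cx] = c²Var[x], can be checked with a short numerical example. The series and constant are illustrative assumptions.

```python
# Adding a constant shifts the data without changing its spread;
# multiplying by a constant stretches the spread by c squared.

def mean(v):
    return sum(v) / len(v)

def var(v):
    m = mean(v)
    return sum((vt - m) ** 2 for vt in v) / len(v)

x = [1.0, 4.0, 2.5, 3.5]
c = 3.0

shifted = [c + xt for xt in x]  # Var unchanged: deviations are identical
scaled = [c * xt for xt in x]   # Var multiplied by c**2
```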
Variance of the sum of two variables: Var[x + y] = Var[x] + 2Cov[x, y] + Var[y]
The variance of the sum of two variables equals the sum of their variances plus
twice their covariance.
$$\mathrm{Var}[x + y] = \frac{[(x_1 + y_1) - (\bar{x} + \bar{y})]^2 + \dots + [(x_T + y_T) - (\bar{x} + \bar{y})]^2}{T}$$
$$= \frac{[(x_1 - \bar{x}) + (y_1 - \bar{y})]^2 + \dots + [(x_T - \bar{x}) + (y_T - \bar{y})]^2}{T}$$
$$= \frac{[(x_1 - \bar{x})^2 + 2(x_1 - \bar{x})(y_1 - \bar{y}) + (y_1 - \bar{y})^2] + \dots + [(x_T - \bar{x})^2 + 2(x_T - \bar{x})(y_T - \bar{y}) + (y_T - \bar{y})^2]}{T}$$
$$= \frac{[(x_1 - \bar{x})^2 + \dots + (x_T - \bar{x})^2] + 2[(x_1 - \bar{x})(y_1 - \bar{y}) + \dots + (x_T - \bar{x})(y_T - \bar{y})] + [(y_1 - \bar{y})^2 + \dots + (y_T - \bar{y})^2]}{T}$$
$$= \frac{(x_1 - \bar{x})^2 + \dots + (x_T - \bar{x})^2}{T} + 2\,\frac{(x_1 - \bar{x})(y_1 - \bar{y}) + \dots + (x_T - \bar{x})(y_T - \bar{y})}{T} + \frac{(y_1 - \bar{y})^2 + \dots + (y_T - \bar{y})^2}{T}$$
$$= \mathrm{Var}[x] + 2\mathrm{Cov}[x, y] + \mathrm{Var}[y]$$
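The variance-of-a-sum identity can likewise be verified numerically, again with illustrative series rather than data from the text.

```python
# Check Var[x + y] = Var[x] + 2*Cov[x, y] + Var[y] on a small example.

def mean(v):
    return sum(v) / len(v)

def var(v):
    m = mean(v)
    return sum((vt - m) ** 2 for vt in v) / len(v)

def cov(x, y):
    mx, my = mean(x), mean(y)
    return sum((xt - mx) * (yt - my) for xt, yt in zip(x, y)) / len(x)

x = [1.0, 4.0, 2.5, 3.5]
y = [0.5, -1.0, 2.0, 1.5]

lhs = var([xt + yt for xt, yt in zip(x, y)])  # Var[x + y]
rhs = var(x) + 2 * cov(x, y) + var(y)         # right-hand side
```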
Covariance of a constant plus a variable: Cov[c + x, y] = Cov[x, y]
Adding a constant to a variable leaves the covariance unchanged; since
$(c + x_t) - (c + \bar{x}) = x_t - \bar{x}$,
$$\mathrm{Cov}[c + x, y] = \frac{[(c + x_1) - (c + \bar{x})](y_1 - \bar{y}) + \dots + [(c + x_T) - (c + \bar{x})](y_T - \bar{y})}{T}$$
$$= \frac{(x_1 - \bar{x})(y_1 - \bar{y}) + (x_2 - \bar{x})(y_2 - \bar{y}) + \dots + (x_T - \bar{x})(y_T - \bar{y})}{T}$$
$$= \mathrm{Cov}[x, y]$$
Covariance of the product of a constant and a variable: Cov[cx, y] = cCov[x, y]
Multiplying a variable by a constant multiplies the covariance by that constant.
$$\mathrm{Cov}[cx, y] = \frac{(cx_1 - c\bar{x})(y_1 - \bar{y}) + (cx_2 - c\bar{x})(y_2 - \bar{y}) + \dots + (cx_T - c\bar{x})(y_T - \bar{y})}{T}$$
$$= \frac{c(x_1 - \bar{x})(y_1 - \bar{y}) + c(x_2 - \bar{x})(y_2 - \bar{y}) + \dots + c(x_T - \bar{x})(y_T - \bar{y})}{T}$$
$$= c\,\frac{(x_1 - \bar{x})(y_1 - \bar{y}) + (x_2 - \bar{x})(y_2 - \bar{y}) + \dots + (x_T - \bar{x})(y_T - \bar{y})}{T}$$
$$= c\,\mathrm{Cov}[x, y]$$
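A quick numerical check of Cov[cx, y] = cCov[x, y], with an illustrative pair of series and constant:

```python
# Multiplying x by a constant multiplies the covariance by that constant,
# even when the constant is negative.

def mean(v):
    return sum(v) / len(v)

def cov(x, y):
    mx, my = mean(x), mean(y)
    return sum((xt - mx) * (yt - my) for xt, yt in zip(x, y)) / len(x)

x = [1.0, 4.0, 2.5, 3.5]
y = [0.5, -1.0, 2.0, 1.5]
c = -2.0

lhs = cov([c * xt for xt in x], y)  # Cov[cx, y]
rhs = c * cov(x, y)                 # c * Cov[x, y]
```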
1. With the exception of two months, the data were obtained from NOAA's
National Climatic Data Center. Data for these two months, March 1950 and
October 1994, were missing; they were obtained from the Phillip T. Ives records
stored in the Amherst College archives.
2. The median and mode are other measures of the center. They are presented in
Chapter 25.
3. The common sample option eliminates all observations in which one or more
values are missing for any of the variables; the individual samples option does
not. Since no values are missing for June, July, and August, the choice of
common or individual has no impact.
4. The discussion that follows is not mathematically rigorous because it ignores
the magnitude of the deviation products. Nevertheless, it provides valuable
insights. Chapter 25 provides a more rigorous discussion of covariance.
5. As mentioned above, we are ignoring how the magnitude of the products
affects the sum.
6. Again, note that this explanation ignores the magnitude of the products.
7. See Appendix 1.1 at the end of this chapter for the algebraic proofs.