Financial Valuation and Econometrics

Final Draft version under Copy-Editing by World Scientific Publishing, June 2010
To Leng
Contents

Preface                                                          vi–viii
Chapter 1   Probability Distribution and Statistics                 1–24
Chapter 2   Statistical Laws and Central Limit Theorem
            Application: Stock Return Distributions                25–40
Chapters 3–9                                                      41–172
Chapter 10                                                       173–194
Chapter 11                                                       195–206
Chapter 12  Specification Errors                                 207–234
Chapter 13                                                       235–253
Chapter 14                                                       254–269
Chapters 15–20                                                   270–373
Appendix A  Matrix Algebra                                       374–394
Appendix B  Eviews Guide                                         395–405
Appendix C  Excel Regression Guide                               406–410
Appendix D  Multiple Choice Question Tests                       411–435
Appendix E  Solutions to Problem Sets                            436–465
Index                                                            466–470

Among the chapter topics and applications are: Cross-Sectional Regression
(Application: Testing CAPM); More Multiple Linear Regressions (Application:
Multi-Factor Asset Pricing); Errors-in-Variables (Application: Exchange Rates
and Risk Premium); Unit Root Processes (Application: Purchasing Power Parity);
Conditional Heteroskedasticity (Application: Risk Estimation); Mean Reverting
Continuous Time Process (Application: Bonds and Term Structures); Implied
Parameters (Application: Option Pricing); and Generalized Method of Moments
(Application: Consumption-Based Asset Pricing).
Preface
This book is an introduction to financial valuation and financial data analyses
using econometric methods. The complexity and enormity of global financial
markets today have far-reaching implications for financial decision-making and
practice. The uncertainty that drives and perturbs financial market prices often
draws rigorous scientific search, in the hope of finding clues that reproduce
winning formulas for gold, or else the secret recipe to avert disasters.
This scientific drive is partly fueled by the rise of mathematical analyses,
including economic and statistical theory, to the occasion, and is also spurred
on by the increasing availability of financial market data as well as of the
computing power to crunch the data.
Financial valuation is the key to investment decisions and risk management,
which are central to any economy today. In a nutshell, investment leads to the
production of tomorrow's consumption goods. Risk management leads to
prudence in investment and savings so as to avoid bankruptcies that cost
aplenty. Since the 1950s, the field of finance has developed a rich set of
theories and a rigorous framework with which to understand how stock prices
are formed, whether rationally or sometimes perhaps behaviorally or with
anomalies. Besides stock prices and returns, bond prices, interest rates,
exchange rates, futures, and option prices are the other major financial
variables in the capital markets.
The empirical validation of financial valuation models and of market
phenomena by data, and in turn the feedback to appropriate and effective
theoretical modeling, form an interesting and exciting experience in the study
of finance. There are really three key domain knowledge areas here. Financial
valuation or pricing theories are typically constructed from more fundamental
economic axioms such as investor rationality and insatiability. Some
mathematical tools such as optimization and conditional expectations are
utilized. In the process of deriving a closed form or else analytical form
theoretical model, market equilibrium conditions are often added as part of the
necessary conditions to a solution. A major output of such theorizing efforts is
an asset pricing model. Theoretical models help to explain positively how
market variables behave and provide a vehicle for developing optimal
decision-making.
Yet pragmatic investment decisions typically require parameter inputs that
have to be estimated or require forecasts of future prices. Such considerations
inevitably lead to applications of statistical models to historical prices and
time series of the relevant economic variables. In more formal language, this
is the construction of a probability space on these variables of interest. If we
model a particular variable over time, it is a statistical model. If we model a
collection of variables over time, where these variables mutually influence
each other, then it is also called an econometric model. Thus the second key
domain knowledge area is probability and statistical theory, or else
econometrics in the context of problems to do with capital market finance.
Finally, another key domain is how data are collected and used. Raw data
are just numbers that in themselves do not lend much insight. For example, if
we collect twelve past monthly return rates of a particular stock, and find that
their sample average is one percent, this average of one percent should
not be used simply as an expectation of what the next month's return would
be. But suppose we have in addition a statistical model showing that the
monthly return rate of this stock follows an upward trend of half a percent
while any deviation is due to random error; then it is more accurate to expect
next month's return to be half a percent. Sometimes great attention has to be paid to
whether monthly return rate, or daily return rate, or intra-day return rate, is
more appropriate for the question under study. It is of paramount importance
to understand how the data are obtained, whether there are observational or
recording errors, and whether there are better proxy variables to represent the
effect we seek.
This book is a modest attempt to bring together these domains in financial
valuation theory, in econometrics modeling, and in the empirical analyses of
financial data. These domains are highly inter-twined and should be properly
understood in order to correctly and effectively harness the power of data and
statistical or econometric methods for investment and financial decision-making.
One can think of many good books on basic econometrics and also many
good books on finance theory and modeling. The contribution of this book, and
at the same time its novelty, is in employing materials in basic econometrics,
particularly linear regression analyses, and weaving into them threads of
foundational finance theory, concepts, ideas, and models. The treatment in this
book is at a basic level. It is hoped that advanced undergraduate or first-year
postgraduate students learning finance and/or basic econometrics or linear
regression analyses can go through the materials in this book with a
heightened appreciation of how applied econometrics intertwines with the
discovery of financial market knowledge.
It is also hoped that all students who work through this book will begin to
understand that it may indeed be erroneous to make a forecast by simply
taking a bunch of financial time series data and running a straight-line least
squares regression. We should seek to know what the theory, if any, behind
the linkage of the variables is, be able to choose the appropriate time
series data, and employ useful econometric modeling to address not-so-apparent
features of the time series such as non-homogeneity, non-linearity,
measurement errors, and so on. Students should also appreciate that the
estimates or test statistics are in themselves random variables that
behave in some prescribed manner as the sample size varies, and that
misbehave if there is a spurious problem, and thus be able to interpret
empirical results with more scientific precision.
The chapters of the book are organized along a general progression of
topics taught in basic econometrics, particularly in linear regression analyses,
although there is also coverage on time series analyses and on the nonlinear
generalized method-of-moments technique. There is a clear attempt on my
part to make this coincide with teaching of key concepts and theories in
financial valuation. In fact, this feature of covering both finance and
econometrics at the same time should be especially rewarding and interesting
to students who are learning both finance and basic econometrics at the same
time.
At the beginning of each chapter in this book, key points of learning are
listed so that students can check their own progress if they have covered the
major materials of the chapter. Some econometrics materials, especially those
involving multiple variables, are more conveniently developed in terms of
matrices instead of arithmetic algebra. Therefore, some prior knowledge of
matrix algebra will be helpful. Appendix A contains a short refresher on matrix
algebra as a preparation.
Most chapters contain one or more finance application examples
where finance concepts, and sometimes theory, are taught. I have tried to
incorporate real examples of companies and practice where useful, and owing to
my nationality, I naturally use some examples of Singapore-based companies.
References to articles and sources are usually listed as footnotes on the same
pages. Data sources for the empirical examples are cited. The empirical
examples were developed using EVIEWS, a statistical software package that
is easily available. Alternative software such as R or SAS, or even EXCEL
(with VBA), can be used as well. A beginner's guide to using EVIEWS and
also EXCEL regression is provided in Appendix B and C respectively. Each
chapter ends with a problem set for the student to practice on, and more reading
references should the student desire to learn more advanced materials related
to the contents of that chapter. Appendix D contains sets of multiple choice
question tests so students can quickly check if they understand the concepts
taught. Appendix E provides solutions to the problem sets.
This manuscript is a substantial expansion and revision of a draft
version that I used to teach a course on Investment and Financial Data
Analysis. I wish to express my thanks to Dharma, Hong Chao, Jane Lim, Yi
Bao, Christopher Ting, and several other colleagues who provided
valuable feedback. Finally, any updates or errata will be available at
http://www.mysmu.edu/faculty/kglim.
Kian Guan
Singapore, April 2010
Chapter 1
PROBABILITY DISTRIBUTION AND STATISTICS
Key Points of Learning
Random variable, Joint probability distribution, Marginal probability
distribution, Conditional probability distribution, Expected value, Variance,
Covariance, Correlation, Independence, Normal distribution function,
Chi-square distribution, Student-t distribution, F-distribution, Data types and
categories, Sampling distribution, Hypothesis, Statistical test
1.1 PROBABILITY

… defined, or which has interpretative meaning only when there is existence of a
joint probability distribution describing the random variables.
Figure 1.1
S&P 500 Index Portfolio Return Rate and Price-Earning Ratio 1872-2009
(Data from Prof Shiller, Yale University)
[Time-series plot of the S&P 500 index return rate (ranging between -60% and
60%) and the S&P 500 index aggregate P/E ratio against year, 1870-2010.]
In Figure 1.2, we plot the U.S. national aggregate consumption versus national
disposable income in US$ billion. Disposable income is defined as Personal
Income less personal taxes. Personal Income is National Income less corporate
taxes and corporate retained earnings. In turn, National Income is Gross
Domestic Product (GDP) less depreciation and indirect business taxes such as
sales tax. GDP is essentially the total dollar output or gross income of the
country. If we include repatriations from citizens working abroad, then it
becomes Gross National Product (GNP).
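The chain of accounting definitions above can be traced in a few lines of code. A minimal sketch follows; all dollar figures are hypothetical illustrations (not data from the text), chosen only to show the identities.

```python
# Tracing the national income accounting chain described above.
# All dollar figures (in $billion) are hypothetical illustrations.
gdp = 14000.0                     # Gross Domestic Product
depreciation = 1800.0
indirect_business_taxes = 1000.0  # e.g. sales tax
corporate_taxes = 400.0
corporate_retained_earnings = 600.0
personal_taxes = 1500.0
repatriations = 100.0             # from citizens working abroad

national_income = gdp - depreciation - indirect_business_taxes
personal_income = national_income - corporate_taxes - corporate_retained_earnings
disposable_income = personal_income - personal_taxes
gnp = gdp + repatriations

print(national_income)     # 11200.0
print(disposable_income)   # 8700.0
print(gnp)                 # 14100.0
```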
In Figure 1.2, it appears that consumption increases in disposable income.
The relationship is approximately linear. This is intuitive as on a per capita
basis, we would expect that for each person, when his or her disposable
income rises, he or she would consume more. In life-cycle models of financial
economics theory, some types of individual preferences could lead to
consumption as an increasing function of individual wealth which consists of
inheritance as well as fresh income. Sometimes analyses of income also
break it down into a permanent part and a transitory part. More on these can
be read in economics articles on life-cycle models and hypotheses.
Figure 1.2
U.S. Annual National Aggregate Consumption versus Disposable Income
1999-2009 (Data from Federal Reserve Board of U.S. in $billion)
[Scatter plot of consumption ($7,200 to $9,600) against disposable income
($7,000 to $11,000).]
… purchases of goods and services, then the plot displays the positive income
effect on such effective demand. Theoretically, each Xt and each Yt for every
time t is a random variable.
Figure 1.3
U.S. Annual Year-to-Year Change in National Aggregate Consumption
versus Change in Disposable Income 2000-2009 (Data from Federal
Reserve Board of U.S. in $billion)
[Scatter plot of the ten annual changes, labeled P1 to P10: change in
consumption ($-100 to $400) against change in disposable income.]
… stock ABC's holding or discrete return rate at time t+1: $X_{t+1} = P_{t+1}/P_t - 1$.
The corresponding continuously compounded return rate at t+1 is $\ln(P_{t+1}/P_t)$,
which is approximately $X_{t+1}$ when $X_{t+1}$ is close to 0. Another stock XYZ
has discrete return rate $Y_{t+1}$ at time t+1.
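The approximation just mentioned is easy to check numerically. A short sketch with hypothetical prices (the book's own empirical work uses EVIEWS; Python is used here only for illustration):

```python
import math

# Discrete (holding-period) return X = P1/P0 - 1 versus the continuously
# compounded return ln(P1/P0). Prices are hypothetical.
p0, p1 = 100.0, 101.0

discrete = p1 / p0 - 1         # 0.01, i.e. a 1% return
log_ret = math.log(p1 / p0)    # about 0.00995, close to 0.01

# The two are approximately equal when the return is close to 0.
print(abs(discrete - log_ret) < 0.0001)   # True
```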
Table 1.1
Discrete Bivariate Joint Probability of Two Stock Return Rates

                                 Xt+1
 P(xt+1,yt+1)    a1      a2      a3      a4      a5      a6     P(yt+1)
 Yt+1   b1     0.005   0.03    0.03    0.015   0.005   0.01     0.095
        b2     0.015   0.02    0.04    0.015   0.005   0.02     0.115
        b3     0.015   0.025   0.05    0.02    0.015   0.05     0.175
        b4     0.03    0.03    0.07    0.08    0.025   0.035    0.27
        b5     0.02    0.06    0.04    0.05    0.045   0.02     0.235
        b6     0.015   0.035   0.02    0.02    0.005   0.015    0.11
 P(xt+1)       0.10    0.20    0.25    0.20    0.10    0.15     1.00
In Table 1.1, we must take care to distinguish between the random variable Xt+1
and the realized value it takes in an outcome, e.g. xt+1 = a3. For example, a3
could be 0.08 or 8%. In the bivariate discrete probability distribution shown in
the table, Xt+1 takes one of six possible values viz. a1, a2, a3, a4, a5, and a6. The
probability of any one of these six events or outcomes is given by
P(Xt+1 = xt+1 = ak), or in short P(xt+1), and is shown in the last row of the table.
The probability function P(.) for a discrete probability distribution is also called a
probability mass function (pmf). We should think of a probability or chance as
a one-to-one function that maps or assigns a number in [0,1] to each
realized value of the random variable. ℝ denotes the real line or (−∞, +∞).
Likewise, the probability of any one of the six outcomes of random variable
Yt+1 is given by P(yt+1) and is shown in the last column of the table. Note that
the probabilities of events that make up all the possibilities must sum up to 1.
The joint probability of event or outcome with realized values (xt+1 , yt+1) is
given by P(Xt+1=xt+1,Yt+1=yt+1). These probabilities are shown in the cells
within the inner box. For example, P(a3, b5) = 0.04. This means that the
probability or chance of Xt+1 = a3 and Yt+1 = b5 simultaneously occurring is
0.04 or 4%. Clearly the sum of all the joint probabilities within the inner box
must equal 1. The marginal probability of Yt+1 = b3 in the context of the
(bivariate) joint probability distribution is the probability that Yt+1 takes the
realized value yt+1 = b3 regardless of the simultaneous value of xt+1. We write
this marginal probability as PY(Yt+1=b3). The subscript Y to the probability
function P(.) is to highlight that it is the marginal probability of Y. Sometimes this
is omitted. Note that this marginal probability is also a univariate probability.
In this case, PY(b3) = P(a1,b3) + P(a2,b3) + P(a3,b3) + P(a4,b3) + P(a5,b3) +
P(a6,b3). Notice we simplify the notations indicating the aj's and bk's are
values of xt+1 and yt+1 respectively where the context is understood. In a full
summation notation,
$$P_Y(Y_{t+1} = b_3) = \sum_{j=1}^{6} P(X_{t+1} = a_j,\, Y_{t+1} = b_3).$$
This is obviously the sum of numbers in the row involving b3, and is equal to
0.175. The marginal probability of Xt+1 = a2 is given by
$$P_X(X_{t+1} = a_2) = \sum_{k=1}^{6} P(X_{t+1} = a_2,\, Y_{t+1} = b_k) = 0.2.$$
What is $\sum_{j=1}^{6} \sum_{k=1}^{6} P(X_{t+1} = a_j,\, Y_{t+1} = b_k)$? It is
$$\sum_{j=1}^{6} \sum_{k=1}^{6} P(X_{t+1} = a_j,\, Y_{t+1} = b_k) = \sum_{j=1}^{6} P_X(X_{t+1} = a_j) = 1.$$
Table 1.2
Joint Probability of Two Stock Return Rates when Xt+1 = a2

 Yt+1:          b1     b2     b3     b4     b5     b6
 P(a2, yt+1):  0.03   0.02   0.025  0.03   0.06   0.035
We recall Bayes rule on event sets, that
$$P(A \mid B) = \frac{P(A \cap B)}{P(B)},$$
where A and B are events or event sets in a universe. We can think of the
outcome {Xt+1 = a2} as event B, and the outcome {Yt+1 = b3} as event A. Events
can be more general, as occurrences {Xt+1=aj}, {Yt+1=bk}, {Xt+1=aj, Yt+1=bk}
are all events or event sets. More exactly,
$$P(b_3 \mid a_2) = \frac{P(a_2, b_3)}{P_X(a_2)} = \frac{0.025}{0.2} = 0.125.$$
In general,
$$P(Y_{t+1} = b_k \mid X_{t+1} = a_j) = \frac{P(X_{t+1} = a_j,\, Y_{t+1} = b_k)}{P_X(X_{t+1} = a_j)} = \frac{P(X_{t+1} = a_j,\, Y_{t+1} = b_k)}{\sum_{k=1}^{6} P(X_{t+1} = a_j,\, Y_{t+1} = b_k)}.$$
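The marginal and conditional probabilities worked out above can be verified directly from the joint probabilities in Table 1.1. A minimal sketch (Python is used only for illustration):

```python
# Recomputing the marginals and the conditional probability P(b3 | a2)
# from the joint probabilities of Table 1.1. Rows are the y-outcomes
# b1..b6, columns are the x-outcomes a1..a6.
joint = [
    [0.005, 0.03,  0.03,  0.015, 0.005, 0.01],   # b1
    [0.015, 0.02,  0.04,  0.015, 0.005, 0.02],   # b2
    [0.015, 0.025, 0.05,  0.02,  0.015, 0.05],   # b3
    [0.03,  0.03,  0.07,  0.08,  0.025, 0.035],  # b4
    [0.02,  0.06,  0.04,  0.05,  0.045, 0.02],   # b5
    [0.015, 0.035, 0.02,  0.02,  0.005, 0.015],  # b6
]

p_y = [sum(row) for row in joint]        # marginal probabilities of Y
p_x = [sum(col) for col in zip(*joint)]  # marginal probabilities of X

print(round(p_y[2], 3))    # 0.175 : P_Y(b3)
print(round(p_x[1], 3))    # 0.2   : P_X(a2)
print(round(sum(p_x), 3))  # 1.0   : total probability

# Bayes rule: P(b3 | a2) = P(a2, b3) / P_X(a2)
print(round(joint[2][1] / p_x[1], 3))   # 0.125
```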
… $\int_{0}^{9.5} \int_{2}^{3} f(x,y)\, dy\, dx$.
The support of a random variable such as Xt+1 is the range of x. For joint
normal densities, the ranges are usually (−∞, ∞). Thus, Yt+1 also has the same
support. It is usually harmless to use (−∞, ∞) as the support even if the range is
a finite [a,b], since the probabilities of the null events (−∞, a) and (b, ∞) are zeros.
However, when more advanced mathematics is involved, it is typically better
to be precise. Notice also that a probability is essentially an integral of a
function, whether continuous or discrete, and is the area under the pdf curve.
The marginal probability density functions of Xt+1 and Yt+1 are given by
$$f_Y(y) = \int_{-\infty}^{\infty} f(x,y)\, dx \quad \text{and} \quad f_X(x) = \int_{-\infty}^{\infty} f(x,y)\, dy,$$
and the conditional pdf is $f(y \mid x) = f(x,y)/f_X(x)$.
1.2 EXPECTATIONS

For the discrete probability distribution,
$$E(X_{t+1}) = \sum_{j=1}^{6} a_j\, P(a_j),$$
and for a continuous pdf,
$$E(X_{t+1}) = \int_{-\infty}^{\infty} x\, f_X(x)\, dx.$$
The conditional expectation in the discrete case is
$$E(X_{t+1} \mid b_4) = \sum_{j=1}^{6} a_j\, P(a_j \mid b_4),$$
and for a continuous pdf,
$$E(X_{t+1} \mid y) = \int_{-\infty}^{\infty} x\, f(x \mid y)\, dx.$$
Notice that for the continuous pdf, the conditional expected value given y is a
function containing only y. This means that one can further evaluate more
specific conditional expectations based on given sets of y values, e.g. {y: −2 <
y < 3}. Then E(Xt+1 | −2 < y < 3) is found via
$$\int x\, f(x \mid -2 < y < 3)\, dx = \int x\, \frac{f(x,\, -2 < y < 3)}{\int_{-2}^{3} f_Y(y)\, dy}\, dx = \frac{\int x \left[ \int_{-2}^{3} f(x,y)\, dy \right] dx}{\int_{-2}^{3} f_Y(y)\, dy} = \frac{\int_{-2}^{3} \left[ \int x\, f(x,y)\, dx \right] dy}{\int_{-2}^{3} f_Y(y)\, dy}.$$
The interchange of integrals in the last step above uses the Fubini Theorem,
assuming some mild regularity conditions satisfied by the functions.
The variance of a continuous random variable Xt+1 is given by
$$\mathrm{var}(X_{t+1}) = \sigma_X^2 = \int_{-\infty}^{\infty} (x - \mu_X)^2 f_X(x)\, dx,$$
and the covariance of Xt+1 and Yt+1 by
$$\mathrm{cov}(X_{t+1}, Y_{t+1}) = \sigma_{XY} = \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} (x - \mu_X)(y - \mu_Y)\, f(x,y)\, dx\, dy.$$
If the two random variables tend to move together such that when one is
high the other is also high, then the covariance will be a positive number. If they vary inversely,
then the covariance will be a negative number. If there is no co-moving
relationship and each random variable moves independently, then their
covariance is zero. Notice that covariance is also an expectation or integral.
The co-movement of two random variables is typically better characterized
by their correlation coefficient, which is the covariance normalized, or divided,
by their standard deviations:
$$\mathrm{corr}(X_{t+1}, Y_{t+1}) = \rho_{XY} = \frac{\sigma_{XY}}{\sigma_X \sigma_Y}.$$
One advantage of using the correlation coefficient rather than the covariance is
that the correlation coefficient is not denominated in the value units of X or Y
but is a ratio.
It is important to understand that correlation measures association but not
causality. In Figure 1.3, clearly changes in consumption and income are
strongly positively correlated. If one were to conclude that increasing
consumption will increase income, the resulting action would be disastrous. Or
even if one simply concludes (based on some understanding of
macroeconomics theory or by intuition) that increased income causes
increased consumption, it may still be premature as there are so many other
possibilities and qualifications. For example, some other unobserved variables
such as general education level could lead to increases in both income and
consumption.
Or, suppose we think of Yt+1 as GDP and Xt+1 as population. Both increase
with time due to various economic and geo-political reasons. But it would be
disastrous for policy to infer that increasing population leads to or causes an
increase in GDP. This has to assume fairly constant employment and output.
For general random variables X and Y (dropping time subscripts), we can
write their means, variances, and covariance as follows:
$$E(X) = \mu_X, \qquad E(Y) = \mu_Y,$$
$$\mathrm{var}(X) = E(X - \mu_X)^2 = E(X^2) - \mu_X^2,$$
$$\mathrm{var}(Y) = E(Y - \mu_Y)^2 = E(Y^2) - \mu_Y^2,$$
$$\mathrm{cov}(X,Y) = E\big[(X - \mu_X)(Y - \mu_Y)\big] = E(XY) - \mu_X \mu_Y.$$
Covariances are actually linear operators. A function is f : A → B, a mapping
from a domain A to a range B. One can think of an operator as a special case
of a function where the domain and range consist of normed spaces such as
vector spaces. These technicalities are not important except in more advanced
courses.
Now consider N random variables Xi, where i = 1, 2, …, N. A very useful
property of a covariance is shown below:
$$\mathrm{cov}\!\left( \sum_{i=1}^{N} X_i,\; \sum_{j=1}^{N} X_j \right) = E\!\left[ \left( \sum_{i=1}^{N} \big( X_i - E(X_i) \big) \right)\!\left( \sum_{j=1}^{N} \big( X_j - E(X_j) \big) \right) \right]$$
$$= \sum_{i=1}^{N} \sum_{j=1}^{N} E\big[ \big( X_i - E(X_i) \big)\big( X_j - E(X_j) \big) \big] = \sum_{i=1}^{N} \sum_{j=1}^{N} \mathrm{cov}(X_i, X_j).$$
In particular,
$$\mathrm{var}(X + Y) = \mathrm{cov}(X + Y,\, X + Y) = \mathrm{cov}(X,X) + \mathrm{cov}(X,Y) + \mathrm{cov}(Y,X) + \mathrm{cov}(Y,Y)$$
$$= \mathrm{var}(X) + \mathrm{var}(Y) + 2\,\mathrm{cov}(X,Y).$$
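The identity var(X + Y) = var(X) + var(Y) + 2cov(X, Y) is distribution-free and holds exactly in-sample when all moments are estimated with the same 1/n normalization. A sketch using NumPy (an illustrative tool choice; the book itself relies on EVIEWS):

```python
import numpy as np

# Numerical check of var(X + Y) = var(X) + var(Y) + 2 cov(X, Y).
# The identity holds exactly in-sample with population-style (1/n)
# estimators, whatever the joint distribution of the simulated draws.
rng = np.random.default_rng(0)
x = rng.normal(0.0, 1.0, 100_000)
y = 0.5 * x + rng.normal(0.0, 2.0, 100_000)   # y is correlated with x

lhs = np.var(x + y)
rhs = np.var(x) + np.var(y) + 2 * np.cov(x, y, bias=True)[0, 1]
print(abs(lhs - rhs) < 1e-9)   # True
```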
A convenient property of a correlation coefficient is that it lies between −1
and +1. This is shown as follows. For any real λ,
$$\mathrm{var}(\lambda X + Y) = \lambda^2 \sigma_X^2 + \sigma_Y^2 + 2\lambda \sigma_{XY} \ge 0.$$
Put $\lambda = -\dfrac{\sigma_Y}{\sigma_X}$. Then,
$$\frac{\sigma_Y^2}{\sigma_X^2}\,\sigma_X^2 + \sigma_Y^2 - 2\,\frac{\sigma_Y}{\sigma_X}\,\sigma_{XY} = 2\sigma_Y^2 (1 - \rho) \ge 0,$$
so $1 - \rho \ge 0$, or $\rho \le 1$. Putting $\lambda = \dfrac{\sigma_Y}{\sigma_X}$ likewise gives $\rho \ge -1$. Therefore $-1 \le \rho \le 1$.

1.3 DISTRIBUTIONS
The normal distribution has pdf
$$f(x) = \frac{1}{\sqrt{2\pi}\,\sigma} \exp\!\left[ -\frac{1}{2} \left( \frac{x - \mu}{\sigma} \right)^{\!2} \right],$$
where the mean of x is μ and the s.d. of x is σ; μ and σ are given constants.
Its moments are
$$E(X) = \int_{-\infty}^{\infty} x f(x)\, dx = \mu, \qquad \mathrm{Var}(X) = E(X - \mu)^2 = \int_{-\infty}^{\infty} (x - \mu)^2 f(x)\, dx = \sigma^2,$$
and its cumulative distribution function is
$$F(x) = \int_{-\infty}^{x} f(u)\, du.$$
Define $Z = \dfrac{X - \mu}{\sigma}$, or $X = \mu + \sigma Z$; Z is called the standard normal
variable. For the normal distribution N(μ, σ²),
$$F(x) = \int_{-\infty}^{(x-\mu)/\sigma} \phi(z)\, dz,$$
where φ(·) is the standard normal pdf and $z = \dfrac{x - \mu}{\sigma}$ is the standardized
value.
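The standardization F(x) = Φ((x − μ)/σ) can be checked with SciPy's normal distribution functions. The values of μ, σ, and x below are illustrative assumptions, chosen so that z works out to the familiar 1.645:

```python
from scipy.stats import norm

# Checking F(x) = Phi((x - mu)/sigma) for X ~ N(mu, sigma^2).
mu, sigma = 1.0, 2.0
x = 4.29                      # gives z = (4.29 - 1.0)/2.0 = 1.645

z = (x - mu) / sigma
print(round(norm.cdf(x, loc=mu, scale=sigma), 4))   # 0.95
print(round(norm.cdf(z), 4))                        # 0.95, the same value
```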
[Figure: standard normal pdf with 5% of the probability in the left tail,
below a = −1.645.]
Table 1.3
Standard Normal Cumulative Probabilities Φ(z) = P(Z ≤ z)

   z       Φ(z)         z       Φ(z)
  0.000   0.5000       1.600   0.9452
  0.100   0.5398       1.645   0.9500
  0.200   0.5793       1.700   0.9554
  0.300   0.6179       1.800   0.9641
  0.400   0.6554       1.960   0.9750
  0.500   0.6915       2.000   0.9772
  0.600   0.7257       2.100   0.9821
  0.700   0.7580       2.200   0.9861
  0.800   0.7881       2.300   0.9893
  0.900   0.8159       2.330   0.9901
  1.000   0.8413       2.400   0.9918
  1.100   0.8643       2.500   0.9938
  1.282   0.9000       2.576   0.9950
  1.300   0.9032       2.600   0.9953
  1.400   0.9192       2.700   0.9965
  1.500   0.9332       2.800   0.9974
The bivariate normal pdf is
$$f(x, y) = \frac{1}{2\pi \sigma_X \sigma_Y \sqrt{1 - \rho^2}}\, \exp\!\left( -\frac{1}{2}\, q \right), \tag{1.1}$$
where
$$q = \frac{1}{1 - \rho^2} \left[ \left( \frac{x - \mu_X}{\sigma_X} \right)^{\!2} - 2\rho \left( \frac{x - \mu_X}{\sigma_X} \right)\!\left( \frac{y - \mu_Y}{\sigma_Y} \right) + \left( \frac{y - \mu_Y}{\sigma_Y} \right)^{\!2} \right]$$
and $\rho = \dfrac{\mathrm{cov}(x, y)}{\sigma_X \sigma_Y}$. More generally, the multivariate normal pdf is
$$f(x_1, x_2, \ldots, x_p) = (2\pi)^{-p/2}\, |\Sigma|^{-1/2} \exp\!\left[ -\frac{1}{2} (x - \mu)^T \Sigma^{-1} (x - \mu) \right],$$
where μ is the mean vector and Σ is the covariance matrix of the vector
X. The 3rd central moment divided by variance^{3/2} is known as skewness. The 4th central
moment divided by variance² is known as kurtosis.
The normal distribution r.v. X ~ N(μ, σ²) has a mean μ, variance σ²,
skewness 0, and kurtosis 3. Hence the standard normal variate Z ~ N(0,1)
has a mean 0, variance 1, skewness 0, and kurtosis 3.
Many financial variables, e.g. daily stock returns, currency rate of change,
etc. display skewness as well as large kurtosis compared with the benchmark
normal distribution with symmetrical pdf, skewness = 0, and kurtosis = 3.
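Sample skewness and kurtosis are readily computed with SciPy. In the sketch below, the two series are simulated stand-ins for actual daily return data (not data from the text), one normal-like and one with fat tails:

```python
import numpy as np
from scipy.stats import skew, kurtosis

# Sample skewness and kurtosis against the normal benchmarks of 0 and 3.
rng = np.random.default_rng(42)
normal_like = rng.normal(0.0, 0.01, 50_000)
fat_tailed = rng.standard_t(5, 50_000) * 0.01   # heavier tails than normal

print(abs(skew(normal_like)) < 0.1)                         # True: skewness near 0
print(abs(kurtosis(normal_like, fisher=False) - 3) < 0.2)   # True: kurtosis near 3
print(kurtosis(fat_tailed, fisher=False) > 3)               # True: excess kurtosis
```

With `fisher=False`, SciPy reports the (Pearson) kurtosis itself, so the normal benchmark is 3 rather than 0.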
Figure 1.5
Example of a Pdf with Negative Skewness and Large Kurtosis
[Plot of f(x): a pdf with negative or left skewness (longer left tail),
overlaid on a shaded normal pdf.]
Departure from normality is illustrated by a pdf in Figure 1.5. The shaded
area in Figure 1.5 shows a normal pdf. The unshaded curve shows pdf of a
random variable with negative skewness and a kurtosis larger than that of the
normal random variable.
The concept of stochastic independence between random variables is
important. Two random variables X and Y are said to be stochastically
independent if and only if their joint pdfs can be expressed as follows:
f(X,Y) = fX(X) fY(Y).
One implication of the above is that for any function h(.) of X and any
function g(.) of Y, their expectation can be found as
E(h(X) g(Y)) = E(h(X)) E(g(Y)).
A special case is the covariance operator. If X and Y are (stochastically)
independent, then it implies their covariance is zero:
$$\mathrm{cov}(X,Y) = E\big[(X - \mu_X)(Y - \mu_Y)\big] = E(X - \mu_X)\, E(Y - \mu_Y) = 0.$$
The converse is not always true. It is true only for special cases such as when
X and Y are jointly normally distributed. When X and Y are jointly normally
distributed, then if they have zero covariance, they are stochastically
independent. For the bivariate normal pdf, the conditional pdf is
$$g(x \mid y) = \frac{f(x, y)}{f_Y(y)}.$$
Or,
$$g(x \mid y) = \frac{\dfrac{1}{2\pi \sigma_X \sigma_Y \sqrt{1-\rho^2}}\; e^{-\frac{1}{2} q}}{\dfrac{1}{\sqrt{2\pi}\,\sigma_Y}\; e^{-\frac{1}{2} \left( \frac{y - \mu_Y}{\sigma_Y} \right)^2}} = \frac{1}{\sqrt{2\pi \sigma_X^2 (1-\rho^2)}}\, \exp\!\left\{ -\frac{1}{2\sigma_X^2 (1-\rho^2)} \left[ x - \mu_X - \rho\, \frac{\sigma_X}{\sigma_Y} (y - \mu_Y) \right]^{2} \right\}$$
$$= \frac{1}{\sqrt{2\pi \sigma_{X|Y}^2}}\, \exp\!\left[ -\frac{(x - \mu_{X|Y})^2}{2\sigma_{X|Y}^2} \right],$$
where $\sigma_{X|Y}^2 = (1 - \rho^2)\,\sigma_X^2$ is the variance of X conditional on Y = y,
and $\mu_{X|Y} = \mu_X + \rho\, \dfrac{\sigma_X}{\sigma_Y} (y - \mu_Y)$ is the mean of X conditional on Y = y.
The random variable
$$V = \left( \frac{X - \mu}{\sigma} \right)^{\!2} \sim \chi_1^2$$
is a chi-square distribution with 1 degree of freedom. If X1, X2, X3, …, Xn are
n random variables each independently drawn from the same population
distribution N(μ, σ²), or think of {Xi}i=1 to n as a random sample of size n, then
$$\sum_{i=1}^{n} \left( \frac{X_i - \mu}{\sigma} \right)^{\!2} \sim \chi_n^2.$$
If $U \sim \chi_{r_1}^2$ and $V \sim \chi_{r_2}^2$ are independently distributed, then
$$\frac{U\, r_1^{-1}}{V\, r_2^{-1}} \sim F_{r_1, r_2}$$
is an F-distribution with r1 and r2 degrees of freedom.
1.4 STATISTICAL ESTIMATION
The sample mean is
$$\bar{x} = \frac{1}{n} \sum_{k=1}^{n} x_k.$$
Another common sample statistic is the unbiased sample variance
$$s^2 = \frac{1}{n-1} \sum_{k=1}^{n} (x_k - \bar{x})^2.$$
As a random variable, the sample mean is
$$\bar{X}_n = \frac{1}{n} \sum_{k=1}^{n} X_k,$$
with
$$E(\bar{X}_n) = E\!\left( \frac{1}{n} \sum_{k=1}^{n} X_k \right) = \frac{1}{n} \sum_{k=1}^{n} E(X_k) = \mu,$$
and
$$\mathrm{var}(\bar{X}_n) = \frac{1}{n^2}\, \mathrm{var}\!\left( \sum_{k=1}^{n} X_k \right) = \frac{1}{n^2} \sum_{k=1}^{n} \mathrm{var}(X_k) = \frac{n \sigma^2}{n^2} = \frac{\sigma^2}{n}.$$
Since $\bar{X}_n$ is a normal random variable, therefore
$$\bar{X}_n \sim N\!\left( \mu,\, \frac{\sigma^2}{n} \right) \quad \text{and} \quad \frac{\sqrt{n}\,(\bar{X}_n - \mu)}{\sigma} \sim N(0,1).$$
Also,
$$(n-1)\,\frac{s^2}{\sigma^2} \sim \chi_{n-1}^2.$$
Thus it can be seen that $E(\chi_{n-1}^2) = n-1$, the number of degrees of freedom. Then
$$\frac{\sqrt{n}\,(\bar{X}_n - \mu)}{s}$$
is distributed as Student-t with (n−1) degrees of freedom and zero mean.
Denote the random variable with t-distribution, n−1 degrees of freedom, as $t_{n-1}$.
Then,
$$\frac{\sqrt{n}\,(\bar{X}_n - \mu)}{s} \sim t_{n-1}.$$
Suppose we find (−a, +a), a > 0, such that Prob(−a ≤ t_{n−1} ≤ +a) = 95%. Since
t_{n−1} is symmetrically distributed, then Prob(−a ≤ t_{n−1}) = 97.5% and
Prob(t_{n−1} ≤ +a) = 97.5%. Thus,
$$\mathrm{Prob}\!\left( -a \le \frac{\sqrt{n}\,(\bar{X}_n - \mu)}{s} \le a \right) = 0.95.$$
Also,
$$\mathrm{Prob}\!\left( \bar{X}_n - a\, \frac{s}{\sqrt{n}} \le \mu \le \bar{X}_n + a\, \frac{s}{\sqrt{n}} \right) = 0.95.$$
Suppose x1, x2, x3, …, xn−1, xn are randomly sampled from X ~ N(μ, σ²), with
sample size n = 30. The t-statistic value such that Prob(t29 ≤ 2.045) = 97.5% is
t29 = 2.045. Then
$$\mathrm{Prob}\!\left( \bar{X}_n - 2.045\, \frac{s}{\sqrt{30}} \le \mu \le \bar{X}_n + 2.045\, \frac{s}{\sqrt{30}} \right) = 0.95.$$
The 95% confidence interval estimate of μ is thus
$$\left( \bar{X}_n - 2.045\, \frac{s}{\sqrt{30}},\; \bar{X}_n + 2.045\, \frac{s}{\sqrt{30}} \right)$$
when the estimated s is entered.
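The interval above can be reproduced with SciPy's Student-t quantile function. The sample below is simulated purely for illustration (the book's empirical work uses EVIEWS):

```python
import numpy as np
from scipy.stats import t

# The 95% confidence interval for mu with n = 30, as derived above.
rng = np.random.default_rng(7)
sample = rng.normal(1.0, 2.0, 30)     # a simulated sample from N(1, 4)

n = len(sample)
xbar = sample.mean()
s = sample.std(ddof=1)                # unbiased sample standard deviation

crit = t.ppf(0.975, df=n - 1)         # the t_29 value with 97.5% below it
print(round(crit, 3))                 # 2.045, matching the text

lo = xbar - crit * s / np.sqrt(n)
hi = xbar + crit * s / np.sqrt(n)
print(lo < xbar < hi)                 # True: the interval brackets the sample mean
```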
1.5 STATISTICAL TESTING

In many situations there is a priori (or ex-ante) information about the value of
the mean μ, and it may be desirable to use observed data to test if the
information is correct. μ is called a parameter of the population or fixed
distribution N(μ, σ²). A statistical hypothesis is an assertion about the true
value of the population parameter, in this case μ. A simple hypothesis
specifies a single value for the parameter, while a composite hypothesis
specifies more than one value. We will work with the simple null hypothesis H0
(sometimes this is called the maintained hypothesis), which is what is
postulated to be true. The alternative hypothesis HA is what will be the case if
the null hypothesis is rejected. Together the values specified under H0 and HA
should form the total universe of possibilities of the parameter. For example,
H0: μ = 1
HA: μ ≠ 1.
A statistical test of the hypothesis is a decision rule that, given the inputs from
the sample values and hence the sampling distribution, chooses to either reject or
else not reject (intuitively similar in meaning to "accept") the null H0. Given
this rule, the set of sample outcomes or sample values that lead to rejection of
H0 is called the critical region. If H0 is true but is rejected, a Type I error is
committed. If H0 is false but is accepted, a Type II error is committed.
The statistical rule on H0: μ = 1, HA: μ ≠ 1, is that if the test statistic
$$t_{n-1} = \frac{\bar{X}_n - 1}{s / \sqrt{n}}$$
falls within the critical region (shaded), defined as {t_{n−1} < −a or t_{n−1} > +a}, a > 0, as
shown in Figure 1.6 below, then H0 is rejected in favor of HA. Otherwise H0 is
not rejected and is accepted.
Figure 1.6
Critical Region under the Null Hypothesis H0: μ = 1
[Plot of the t_{n−1} pdf centred at 0, with the critical region shaded in the
two tails below −a and above +a.]
If H0 were true, then the t-distribution would be correct, and therefore the
probability of rejecting H0 would be the area of the critical region, or 5% in
this case. Notice that for n = 61, P(−2 < t60 < 2) = 0.95. Moreover, the
t-distribution is symmetrical, so each of the right and left shaded tails makes up
2.5%. This is called a 2-tailed test with a significance level of 5%. The
significance level is the probability of committing a Type I error when H0 is
true. In the above example, if the sample t-statistic is 1.045, then it is < 2, and
we cannot reject H0 at the 2-tailed 5% significance level. Given a sample
t-value, we can also find its p-value, which is the probability under H0 of t60
exceeding 1.045 in a one-tailed test, or of exceeding |1.045| in a 2-tailed test.
In the above 2-tailed test, the p-value of a sample statistic of 1.045 would be
2 × Prob(t60 > 1.045) = 2 × 0.15 = 0.30 or 30%. Another way to perform the test
is: if the p-value < test significance level, reject H0; otherwise H0 cannot
be rejected.
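The p-value arithmetic in this example can be reproduced with SciPy:

```python
from scipy.stats import t

# A sample t-statistic of 1.045 with 60 degrees of freedom, tested
# 2-tailed against the 5% significance level, as in the text above.
t_stat, df = 1.045, 60

p_two_tailed = 2 * t.sf(abs(t_stat), df)   # sf(x) = 1 - cdf(x), the upper tail
print(p_two_tailed < 0.31)                 # True: approximately 0.30
print(p_two_tailed > 0.05)                 # True: cannot reject H0 at 5%

print(round(t.ppf(0.975, df), 2))          # 2.0: the 2.5% critical value
```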
In theory, if we reduce the probability of Type I error, the probability of
Type II error increases, and vice-versa. This is illustrated as follows.
Figure 1.7
[The null t_{n−1} pdf f(X) centred at 0 with the critical region beyond ±2
shaded, together with a dotted pdf representing the true distribution when
H0 is false.]
Suppose H0 is false, and μ > 1, so the true t_{n−1} distribution is represented by the
dotted curve in Figure 1.7. The critical region {t_{n−1} < −2.00 or t_{n−1} > 2.00}
remains the same, so the probability of committing a Type II error is 1 − the sum
of the shaded areas. Clearly, this probability increases as we reduce the critical
region in order to reduce the Type I error. Although it is ideal to reduce both types
of errors, the tradeoff forces us to choose between the two. In practice, we fix
the probability of Type I error when H0 is true, i.e. determine a fixed
significance level, e.g. 10%, 5%, or 1%. The power of a test is the probability
of rejecting H0 when it is false. Thus power = 1 − P(Type II error), or power
equals the shaded area in Figure 1.7. Clearly this power is a function of the
alternative parameter value μ ≠ 1. We may determine such a power function
of μ ≠ 1.
Thus reducing the significance level also reduces power and vice-versa. In
statistics, it is customary to want to design a test so that its power function of
μ ≠ 1 equals or exceeds that of any other test with equal significance level for
all plausible parameter values μ ≠ 1 in HA. If such a test is found, it is called a
uniformly most powerful test.
We have seen the performance of a 2-tailed test. Sometimes we embark
instead on a one-tailed test such as H0: μ = 1, HA: μ < 1, in which we
theoretically rule out the possibility of μ > 1, i.e. P(μ > 1) = 0. In this case, it
makes sense to limit the critical region to only the left side, for when μ < 1,
then t_{n−1} will tend to become smaller. Thus at the one-tail 5% significance
level, the critical region is {t_{n−1} < −1.671} for n = 61.
1.6 DATA TYPES

Consider the types of data series that are commonly encountered in regression
analyses. There are four generic types, viz.
(a) Time series
(b) Cross-sectional
(c) Pooled time series cross-sectional
(d) Panel/longitudinal/micropanel
Time series are the most prevalent in empirical studies in finance. They are
data indexed by time. Each data point is a realization of a random variable at a
particular point in time. The data occur as a series over time. A sample of such
data is typically a collection of the realized data over time, such as the history
of ABC stock's prices on a daily basis from January 2, 1970 till
December 31, 2002.
Cross-sectional data are also common in finance. An example is the
reported annual net profit of all companies listed on an exchange for a specific
year. If we collect the cross sections for each year over a 20-year period, then
we have a pooled time series cross section of companies over 20 years. Panel
data are less used in finance. They are data collected by tracking specific
individuals or subjects over time and across subjects.
The nature of data also differs according to the following categories.
(a) Quantitative
(b) Ordinal, e.g. very good, good, average, poor
(c) Nominal/categorical, e.g. married/not married, college graduate/non-graduate
Quantitative data such as return rates, prices, volume of trades, etc. have
the least limitations and therefore the greatest use in finance. These data
provide not only ordinal rankings or comparisons of magnitudes, but also
exact degrees of comparisons. There are some limitations and therefore
special considerations to the use of the other categories of data. In the
treatment of ordinal and nominal data, we may have to use specific tools such
as dummy variables in regression.
1.7 PROBLEM SET
1.1 X, Y, Z are r.v.s with a joint pdf f(X,Y,Z) that is integrable. Show using
the concept of marginal pdfs that E(X+Y+Z) = E(X)+E(Y)+E(Z) by
integrating over (X+Y+Z).
1.2 … $\sum_{i=1}^{N} \sum_{j=1}^{N} \mathrm{cov}(X_i, X_j)$ in terms of the N by N …
1.3 The following is the probability distribution table of a trivariate U1, U2,
and U3.

  U1    U2    U3    P(U1,U2,U3)
  -1    -2    -3       0.125
  -1    -2     3       0.125
  -1     2    -3       0.125
  -1     2     3       0.125
   1    -2    -3       0.125
   1    -2     3       0.125
   1     2    -3       0.125
   1     2     3       0.125

Find the bivariate probability distribution P(U1, U2). Find the marginal
P(U3).
1.4 In the probability distribution table of a trivariate U1, U2, and U3,

  U1    U2    U3    P(U1,U2,U3)
  -1    -2    -3       0.125
  -1    -2     3       0.125
  -1     2    -3       0.125
  -1     2     3       0.125
   1    -2    -3       0.125
   1    -2     3       0.125
   1     2    -3       0.125
   1     2     3       0.125

(iii) …
1.5 X, Y have joint pdf f(X,Y) = exp(-X-Y) for 0<X,Y<, and pdf is 0
elsewhere. Find the marginal pdfs of X and Y. Are X and Y stochastically
dependent?
1.6 X, Y have a joint pdf f(X,Y) = 1 on the set {0 ≤ X ≤ 2, 0 ≤ Y ≤ X/2}.
(i)
Find the marginal distributions of X and Y.
(ii)
Find the variances of X and Y, and the covariance of X,Y.
(iii)
Find the conditional means E(X|Y), E(Y|X), and conditional
variances var(X|Y), var(Y|X).
1.7 Xit is distributed as independent univariate normal N(0,1) for i = 1, 2, 3, and t = 1, 2, …, 60. Yt = 0.5X1t + 0.3X2t + 0.2X3t. What are the mean and the standard deviation of Yt? If a computer program runs and churns out 3K random values Zj belonging to the univariate normal N(0,1) distribution, and Wi = 0.5Z3i−2 + 0.3Z3i−1 + 0.2Z3i for i = 1, 2, …, K, what is the distribution of the sample mean (1/K) ∑_{i=1}^{K} Wi, given that the Wi's are independent across i?

X    +1    -1
Y    -1     0
Z     0    +1
Chapter 2
STATISTICAL LAWS AND
CENTRAL LIMIT THEOREM
APPLICATION: STOCK RETURN DISTRIBUTIONS
Key Points of Learning
Stochastic process, Stationarity, Law of large numbers, Central limit theorem,
Rates of return, Lognormal distribution, Information sets, Random walk, Law
of iterated expectations, Unconditional expectation, Conditional mean,
Conditional variance, Jarque-Bera test
2.1 STOCHASTIC PROCESS

Suppose monthly return rates r̃1, r̃2, r̃3, etc. form a stochastic process that is weakly stationary. If Var(r̃1) = 0.25, what is Var(r̃5)? Clearly this is the same constant, 0.25. If Cov(r̃1, r̃3) = 0.10, what is Cov(r̃7, r̃9)? Clearly, this is 0.10, since the time gap between the two random variables is similarly two months in either case.
Suppose we have a realized history of the past 60 monthly return rates {rt}t=1,2,…,60. Each of these rt's is a known number, e.g. 0.01, one percent, or −0.005, negative half percent. The realized number rt is a sample point taken from the pdf of random variable r̃t. We have to learn to distinguish between what is a random variable that has an attached pdf, and what is a realized sample point that is a given number. Notice that sometimes a tilde is put over the variable to denote it as being random. The past history or realized values of the stochastic process, {rt}t=1,2,…,60, e.g. {0.010, −0.005, 0.003, 0.008, −0.012, …, 0.008}, is called a time series, which is a time-indexed sequence of sample points of each random variable r̃t in the stochastic process {r̃t}t.
A stochastic process {Xi}i is said to be strongly stationary if each set {Xi, Xi+1, Xi+2, …, Xi+k} for any i and the same k has the same joint multivariate pdf independent of i. As an example, consider joint multivariate normal distributions, MVN. Suppose the process {r̃t}t is strongly stationary with r̃3, r̃4, r̃5 jointly distributed as MVN(M, Σ); then clearly the joint multivariate pdf of r̃4, r̃5, r̃6 is the same MVN(M, Σ).
There are two very important and essential theorems dealing with
stochastic processes and therefore applicable to the study of time series of
empirical data. They are the Law of Large Numbers and the Central Limit
Theorem.
2.2 LAW OF LARGE NUMBERS
The Law of Large Numbers (LLN) states that if x1, x2, …, xn is a realized
sample randomly chosen from any random variable Xi with a fixed pdf where
each time a draw is taken from an independent Xi, then the sample average or
sample mean converges to the expected value of random variable Xi or E(Xi).
This is sometimes referred to as Kolmogorov's LLN when the convergence
refers to a sample mean taken from a time series, and the corresponding
stochastic process is stationary and also independently distributed. The latter
implies that any Xj and Xk within {Xi}i are independent. We will discuss
convergence in a later chapter, but for now, it suffices to understand it as
approaching in value in some arbitrarily close fashion. Thus, the law of
large numbers states:
lim_{n→∞} (1/n) ∑_{i=1}^{n} xi = μ, where E(Xi) = μ.
An extension of the above, relaxing the assumption of independence, states that in a (stationary) ergodic stochastic process {Xt}t with mean μ, i.e. E(Xt) = μ for all t, if x1, x2, …, xn is a realized sample randomly chosen from the stochastic process {Xt}t, then

lim_{n→∞} (1/n) ∑_{i=1}^{n} xi = μ.
2.3 CENTRAL LIMIT THEOREM

The Central Limit Theorem states that if X1, X2, …, Xn are random variables drawn from the same stationary distribution with mean μ and variance σ², and we let

Y = ( ∑_{i=1}^{n} Xi − nμ ) / (σ√n),

or equivalently

Y = √n ( X̄ − μ ) / σ, where X̄ = (1/n) ∑_{i=1}^{n} Xi,

then

lim_{n→∞} Y ~ N(0,1).

Hence, for large n,

X̄ = (1/n) ∑_{i=1}^{n} Xi ≈ μ + (σ/√n) Y ~ N(μ, σ²/n),

or,

∑_{i=1}^{n} Xi = nμ + σ√n Y ~ N(nμ, nσ²).    (2.1)

This says that when n is large, the sample mean X̄, itself a random variable, is approximately normally distributed with mean μ and variance σ²/n.
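The two limit laws above can be checked numerically. The following is a minimal sketch using only the Python standard library; the sample size, number of trials, and the N(0.01, 0.05²) population are arbitrary illustrative choices, not values from the text.

```python
import math
import random

random.seed(42)

def sample_mean_stats(n, trials, mu=0.01, sigma=0.05):
    """Draw `trials` independent samples of size n from N(mu, sigma^2)
    and return the average and standard deviation of the sample means."""
    means = []
    for _ in range(trials):
        xs = [random.gauss(mu, sigma) for _ in range(n)]
        means.append(sum(xs) / n)
    grand_mean = sum(means) / trials
    sd = math.sqrt(sum((m - grand_mean) ** 2 for m in means) / trials)
    return grand_mean, sd

# LLN: the sample mean converges to mu = 0.01 as n grows.
# CLT: the sample mean's standard deviation shrinks like sigma / sqrt(n),
# so sd * sqrt(n) should stay close to sigma = 0.05.
m, sd = sample_mean_stats(n=400, trials=2000)
```

Multiplying the simulated standard deviation of X̄ by √n recovers σ, which is exactly the σ²/n scaling stated above.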
2.4 RETURN RATES AND THE LOGNORMAL DISTRIBUTION

Let rt,t+1 = ln(Pt+1/Pt) be the continuously compounded return rate over the holding period or interval [t, t+1). If the time interval or each period is small, this is approximately the discrete return rate Pt+1/Pt − 1. However, the discrete return rate is bounded from below by −1. Contrary to that, the return rt,t+1 has (−∞, ∞) as support, as in a normal distribution.
We can justify how rt,t+1 can be reasonably normally distributed, or equivalently, that the price is lognormally distributed, over a longer time interval. Consider a small time interval or period Δ = 1/T, such that ln(Pt+Δ/Pt), the small-interval continuously compounded return, is a random variable (not necessarily normal) with mean μΔ = μ/T and variance σΔ² = σ²/T. The allowance of small Δ → 0 in the above satisfies (c).
Aggregating the returns,

ln(Pt+Δ/Pt) + ln(Pt+2Δ/Pt+Δ) + ln(Pt+3Δ/Pt+2Δ) + … + ln(Pt+TΔ/Pt+(T−1)Δ) = ln(Pt+TΔ/Pt).    (2.2)

The right-hand side of equation (2.2) is simply the continuously compounded return ln(Pt+1/Pt) = rt,t+1 over the longer period [t, t+1), whose length is TΔ = 1.
Moreover,

var( ln(Pt+k/Pt) ) = ∑_{j=0}^{k−1} var( rt+j,t+j+1 ) = kσ².

Thus, the ex-ante variance of return increases with the holding period [t, t+k). This satisfies characteristic (d).
It is important to recognize that the discrete or holding period return rate Pt+1/Pt − 1 does not display some of the appropriate characteristics. The discrete period returns have to be aggregated geometrically in the following way:

Pt+1/Pt − 1 = Rt,t+1,

Pt+k/Pt = ∏_{j=0}^{k−1} Pt+j+1/Pt+j = ∏_{j=0}^{k−1} ( 1 + Rt+j,t+j+1 ) ≥ 0.

The lower boundary of zero is implied by the limited liability of owners of listed stocks. This discrete setup is cumbersome and poses analytical intractability when it comes to computing drifts and variances. It is straightforward to compute the means and variances of sums of random variables, as in the case of the continuously compounded returns, but not so for products of random variables when they are not necessarily independent, as in the case of the discrete period returns here.
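The contrast between additive log returns and geometrically compounded discrete returns can be illustrated on a short simulated price path. The drift and volatility values below are made-up illustrative assumptions, not data from the text.

```python
import math
import random

random.seed(7)

# Simulate 12 periods of prices with i.i.d. normal log returns.
prices = [100.0]
for _ in range(12):
    prices.append(prices[-1] * math.exp(random.gauss(0.005, 0.04)))

# Continuously compounded returns add across periods (equation (2.2)).
log_returns = [math.log(prices[t + 1] / prices[t]) for t in range(12)]
total_log = math.log(prices[-1] / prices[0])
assert abs(sum(log_returns) - total_log) < 1e-9

# Discrete returns must instead be compounded as a product of (1 + R).
disc_returns = [prices[t + 1] / prices[t] - 1 for t in range(12)]
prod = 1.0
for r in disc_returns:
    prod *= 1 + r
total_disc = prices[-1] / prices[0] - 1
assert abs(prod - 1 - total_disc) < 1e-9
```

The sum of the one-period log returns equals the twelve-period log return exactly, while the discrete returns only aggregate through the product of gross returns, which is why means and variances are harder to handle in the discrete setup.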
2.5 CONDITIONAL MEAN AND CONDITIONAL VARIANCE

Earlier we have seen how, when two random variables X, Y are jointly bivariate normal, we can express the conditional mean or expectation of one in terms of the other, viz.

E(X|Y) = E(X) + ρ (σX/σY) [ Y − E(Y) ],    (2.3)

where ρ is the correlation of X and Y. Note that E^{X,Y}(X) = E^X(X), where the superscripts in the expectation operator denote that the integral is taken with respect to those random variables. We could also use the small letter x to denote the sample realization of random variable X, although sometimes we ignore this if the context is clear as to which is used, whether it is a random variable or a realized value. We could also employ the notation E^{X|y}(X|y) to denote an expected value taken on random variable X based on the conditional probability of X given Y = y.
When two random variables X, Y are not jointly normal, the linear relationship in equation (2.3) is not possible based just on distributional assumptions. Instead we have to impose the linear relationship directly. For example, we may assume or specify:

X = a + bY + e    (2.4)

where a, b are constants, and e is a random variable with zero mean and variance σe², and is independent of Y. Equation (2.4) is called a linear regression model, or a linear relationship connecting two or more random variables including at least one unobservable random variable e. Then

E(X) = a + b E(Y).

Now E(X|Y) = a + bY, since E(Y|Y) = Y and E(e|Y) = E(e) = 0. Then E(X|Y) = a + b E(Y) + b[ Y − E(Y) ] = E(X) + b[ Y − E(Y) ].

Also, cov(Y,X) = cov(Y,a) + b cov(Y,Y) + cov(Y,e) = b var(Y). Thus,

b = σXY / σY²,

which is identical to the case under bivariate normality. Thus, it may be seen that the linear regression model in (2.4) plays a crucial role.
From (2.4), var(X) = b² var(Y) + σe², where var(e) = σe². But the conditional variance var(X|Y) = σe². Therefore var(X|Y) < var(X) whenever b ≠ 0. Thus it is seen that providing information Y reduces the ex-ante uncertainty or variance of X. Of course, if X and Y are not linearly related, i.e. b = 0, then var(X|Y) = var(X), in which case knowing Y does not reduce the uncertainty. This idea of reducing uncertainty about X with given relevant information Y is central to the thinking and theory of finance. For example, if we know pertinent information about tomorrow's stock return movements, then the risk of investing in stocks will be suitably reduced.
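A quick simulation makes the decomposition var(X) = b² var(Y) + σe² and the reduction to var(X|Y) = σe² concrete. The parameter values a = 1, b = 2, σe = 0.5 below are illustrative assumptions.

```python
import random

random.seed(1)

a, b, sigma_e = 1.0, 2.0, 0.5

# X = a + bY + e, with Y ~ N(0,1) and e ~ N(0, sigma_e^2) independent of Y.
ys = [random.gauss(0.0, 1.0) for _ in range(50000)]
xs = [a + b * y + random.gauss(0.0, sigma_e) for y in ys]

def var(z):
    m = sum(z) / len(z)
    return sum((v - m) ** 2 for v in z) / len(z)

# Unconditional variance of X: theoretically b^2 * 1 + sigma_e^2 = 4.25.
uncond = var(xs)
# Conditional variance var(X|Y): the residual variance, theoretically 0.25.
cond = var([x - (a + b * y) for x, y in zip(xs, ys)])
```

Knowing Y removes the b² var(Y) component, leaving only σe², which is the sense in which relevant information reduces ex-ante risk.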
2.6 INFORMATION SETS AND THE RANDOM WALK
We have introduced the idea of a stochastic process earlier. Let the time sequence of random variables Pt, Pt+1, Pt+2, … represent the prices of a stock at time t, t+1, t+2, etc. Then {Pt}t is a stochastic process.

Let Φt, Φt+1, Φt+2, … represent the information sets at time t, t+1, t+2, etc. that are available to the decision-maker for making forecasts of the future price of the stock. We may interpret an information set Φt essentially as a random variable that can take realized sample values φt that are called information. A piece of information at different times t, t+1, t+2, etc., viz. φt, φt+1, φt+2, …, can be thought of as some function of another random variable Yt at time t, i.e. φt(yt). Φt has a joint density with Pt, and possibly also with Pt+1. Therefore, since Yt and Pt, Pt+1 are jointly determined in a probabilistic manner, then given information φt(yt), a better forecast of next period's Pt+1 can be attained.

Et(Pt+1) ≡ E(Pt+1|Φt) is a conditional expectation or forecast of next period's Pt+1 based on information available at t, i.e. Φt. Notice that the subscript t is used to denote evaluation of the integral over the information set at t, i.e. Φt. Such applications of conditional expectations are plentiful in the finance literature.
Early studies in finance suggest simple stochastic processes for prices, such as a random walk:

Pt+1 = μ + Pt + et+1    (2.5)

where μ is a constant drift, and et+1 is a disturbance or white noise, i.e. a random variable that is independent of past information as well as prices. Equation (2.5) is sometimes called an arithmetic random walk in prices. The latter name arises since, when an arithmetic or subtraction operation is performed on the prices, such as taking the price difference, the result equals a constant plus a disturbance.

Since Pt ∈ Φt, Et(Pt+1) ≡ E(Pt+1|Φt) = E(Pt+1|Pt), as only Pt is relevant according to the random walk process above. Other information within Φt apart from Pt is redundant in equation (2.5). This is an implication of the arithmetic random walk in prices. Thus, Et(Pt+1) ≡ E(Pt+1|Φt) = μ + Pt.
If μ = 0, then the best forecast of tomorrow's price is today's price Pt according to the random walk theory as in (2.5). Suppose we construct a random walk in the natural logarithm of price. Then

ln Pt+1 = μ + ln Pt + et+1.

This is sometimes called the geometric random walk in prices. Or, ln(Pt+1/Pt) ≡ rt,t+1 = μ + et+1. If we specify et+1 ~ N(0, σ²), then we are back to the lognormal model described earlier. Thus we see that we can construct meaningful return rate distributions using the linear model of the stochastic price process as in the random walk model above. The linear model is essentially a difference equation in the logarithms of price.
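The geometric random walk is easy to simulate: differencing ln Pt recovers the i.i.d. normal returns, and the price level stays strictly positive, consistent with lognormality. The drift μ = 0.0005 and volatility σ = 0.01 below are assumed values for illustration only.

```python
import math
import random

random.seed(3)

mu, sigma = 0.0005, 0.01

# ln P_{t+1} = mu + ln P_t + e_{t+1}, with e ~ N(0, sigma^2).
log_p = [math.log(100.0)]
for _ in range(1000):
    log_p.append(log_p[-1] + mu + random.gauss(0.0, sigma))

# Differencing the log prices gives back the one-period returns.
returns = [log_p[t + 1] - log_p[t] for t in range(1000)]
mean_r = sum(returns) / len(returns)
sd_r = math.sqrt(sum((r - mean_r) ** 2 for r in returns) / len(returns))

# Prices are exp of the log random walk, hence always positive (lognormal).
prices = [math.exp(v) for v in log_p]
```

The sample mean and standard deviation of the simulated returns sit close to the assumed μ and σ, and no simulated price ever goes below zero.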
Suppose the information set Φ′t is a subset of Φt. The Law of Iterated Expectations states:

E[ E(Pt+1|Φt) | Φ′t ] = E(Pt+1|Φ′t).

If we condition on the null set ∅, then E[ E(Pt+1|Φt) | ∅ ] = E(Pt+1|∅) = E(Pt+1), which is also the unconditional expectation or unconditional forecast. Applying the Law of Iterated Expectations to information revelation over time,

E[ E(Pt+2|Φt+1) | Φt ] = E(Pt+2|Φt), since Φt ⊆ Φt+1.

Or, Et[ Et+1(Pt+2) ] = Et(Pt+2). The best forecast today at t of tomorrow (t+1)'s forecast of P at t+2 is equal to the best forecast today at t of P at t+2.
2.7 LAW OF ITERATED EXPECTATIONS

In the following, we shall show more formally that E[ E(Pt+1|Φt) | Φ′t ] = E(Pt+1|Φ′t) when Φ′t ⊆ Φt. Although not necessary, for convenience of proof we shall assume that the jointly distributed random variables X, Y, and Z have joint probability density function f(x, y, z). Then, f(x, y) = ∫ f(x, y, z) dz.
The Law of Iterated Expectations can take various forms. At the simplest level, consider

E^{X,Y}(Y) = ∫x ∫y y f(x, y) dy dx
          = ∫x [ ∫y y ( f(x, y)/f(x) ) dy ] f(x) dx
          = ∫x [ ∫y y f(y|x) dy ] f(x) dx
          = ∫x E^{Y|x}(Y|x) f(x) dx
          = E^X [ E^{Y|x}(Y|x) ],

where we use the notation E^{Y|x}(Y|x) to denote an expected value taken on random variable Y based on the conditional probability density function f(y|x) expressed in the superscript to the expectation operator. We could also have used E^{Y|x} Y[x], indicating that the integrand is y, which is a function of x.
Similarly, we should be able to show

E^{X,Y|z}(Y|z) = ∫x ∫y y f(x, y|z) dy dx
             = ∫x [ ∫y y ( f(x, y|z)/f(x|z) ) dy ] f(x|z) dx
             = ∫x [ ∫y y f(y|x, z) dy ] f(x|z) dx
             = ∫x E^{Y|x,z}(Y|x, z) f(x|z) dx
             = E^{X|z} [ E^{Y|x,z}(Y|x, z) ].

We can think of the random variables {X, Z} as the information set Φt, and the values x, z as the realized information φt at time t. Likewise {Z} is an information set Φ′t, and clearly Φ′t ⊆ Φt. Thus we may rewrite the equation of the law of iterated expectations above as:

E(Y|Φ′t) = E[ E(Y|Φt) | Φ′t ].
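The two-information-set version of the law can be verified exactly on a small discrete distribution, where integrals become sums. The joint probability table for (X, Y, Z) below is a hypothetical toy example, not a table from the text.

```python
# Hypothetical joint probabilities for triples (x, y, z); they sum to 1.
p = {(-1, 0, 0): 0.10, (-1, 2, 0): 0.20, (1, 0, 0): 0.15, (1, 2, 0): 0.05,
     (-1, 0, 1): 0.05, (-1, 2, 1): 0.10, (1, 0, 1): 0.20, (1, 2, 1): 0.15}

def prob(match):
    """P of the event where the listed coordinates take the given values."""
    return sum(pr for s, pr in p.items()
               if all(s[i] == v for i, v in match))

def e_y(match):
    """E[Y | event]; coordinate 1 of each triple is the Y value."""
    return sum(pr * s[1] for s, pr in p.items()
               if all(s[i] == v for i, v in match)) / prob(match)

# Law of iterated expectations with {X, Z} as the larger information set
# and {Z} as the smaller one: E[ E(Y | X, Z) | Z ] = E(Y | Z).
for z in (0, 1):
    outer = sum(prob([(0, x), (2, z)]) / prob([(2, z)]) * e_y([(0, x), (2, z)])
                for x in (-1, 1))
    assert abs(outer - e_y([(2, z)])) < 1e-12
```

Averaging the finer conditional means E(Y|x, z) over the conditional distribution of X given Z reproduces E(Y|z) exactly, which is the discrete counterpart of the integral derivation above.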
2.8 TEST OF NORMALITY
Given the importance of the role of the normal distribution in financial returns
data, it is not surprising that many statistics have been devised to test if a
given sample of data {ri} comes from a normal distribution. One such statistic
is the Jarque-Bera (JB) test of normality.1 The test is useful only when the
sample size n is large (sometimes, we call such a test an asymptotic test). The
JB test statistic is

JB = n [ S²/6 + (K − 3)²/24 ]  →d  χ²(2),

¹ See C. M. Jarque and A. K. Bera (1987), "A Test for Normality of Observations and Regression Residuals," International Statistical Review, vol. 55, 163-172.
where S is the skewness measure of {ri} and K is the kurtosis measure of {ri}. The inputs of these measures to the JB test statistic are usually sample estimates. For {ri} to follow a normal distribution, its skewness sample estimate should converge to 0, since the normal distribution is symmetrical with third moment being zero, and its kurtosis sample estimate should converge to 3. If the JB statistic is too large, exceeding say the 95th percentile of a χ² distribution with 2 degrees of freedom, or 5.99, then the null hypothesis, H0, of normal distribution is rejected. The JB statistic is large if S and (K − 3) deviate materially from zero.
Recall that the p-value (probability value) of a realized test statistic t* based on the null hypothesis distribution is either:
(a) the probability of obtaining test statistic values whose magnitudes are even larger than t*, i.e. P(t ≥ t*) for a one-tail (right-tail) test, or
(b) the probability of obtaining test statistic values whose absolute magnitudes are larger than t* in a symmetrical zero-mean null distribution, i.e. P(t ≤ −|t*| or t ≥ |t*|) for a two-tail test.
Thus if in a statistical test the significance level is set at α% and the p-value is x%, then reject H0 if x ≤ α, and accept H0 if x > α.
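The JB statistic is straightforward to compute from sample moments. The sketch below simulates its use on made-up data (not the stock series studied later); the 5% critical value of χ²(2) is 5.99.

```python
import random

random.seed(0)

def jarque_bera(r):
    """JB = n * (S^2/6 + (K - 3)^2/24), asymptotically chi-square(2)."""
    n = len(r)
    mean = sum(r) / n
    m2 = sum((x - mean) ** 2 for x in r) / n
    m3 = sum((x - mean) ** 3 for x in r) / n
    m4 = sum((x - mean) ** 4 for x in r) / n
    skew = m3 / m2 ** 1.5
    kurt = m4 / m2 ** 2
    return n * (skew ** 2 / 6 + (kurt - 3) ** 2 / 24)

normal = [random.gauss(0, 1) for _ in range(2000)]
fat_tailed = [random.gauss(0, 1) ** 3 for _ in range(2000)]  # very high kurtosis

jb_normal = jarque_bera(normal)
jb_fat = jarque_bera(fat_tailed)
# Reject normality at the 5% level when JB exceeds 5.99.
```

For the fat-tailed sample the statistic is far above 5.99, so normality is rejected, while for the truly normal sample it typically stays small.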
2.9 APPLICATION: STOCK RETURN DISTRIBUTIONS
Tuesday's return is usually computed as the log (natural logarithm) of the close of Tuesday's stock price relative to the close of Monday's price. Unlike other days, however, one has to be sensitive to the fact that Monday's return cannot usually be computed as the log of the close of Monday's stock price relative to the close of Friday's price. The latter return spans 3 days, and some may argue that the Monday daily return should be a third of this, although it is also clearly the case that Saturday and Sunday have no trading. Some may use the closing price relative to the opening price on the same day to compute daily returns. Open-to-close return signifies return captured during daytime trading when the Exchange is open. However, close-to-open return signifies price change taking place overnight. We shall not be concerned with these issues for the present purpose.
The three series of daily, weekly, and monthly return rates are tabulated in
histograms. Descriptive statistics of these distributions such as mean, standard
deviation, skewness, and kurtosis are reported. The Jarque-Bera tests for
normality of the distributions are also conducted.
Figure 2.1
Histogram and Statistics of Daily AXP Stock Return Rates

Series: DR   Sample: 1 1257   Observations: 1257
Mean          0.000377
Median       -0.000210
Maximum       0.063081
Minimum      -0.057786
Std. Dev.     0.013331
Skewness      0.129556
Kurtosis      5.817207
Jarque-Bera   419.1988
Probability   0.000000

(Histogram spans daily returns from -0.06 to 0.06.)
In Figure 2.1, the JB test statistic shows a p-value of less than 0.0005; thus normality is rejected at significance level 0.0005 or 0.05% for the daily returns. The mean return in the sampling period is 0.0377% per day, or about 252 × 0.0377% = 9.5% per annum. The daily return standard deviation or volatility is 1.333%. If the continuously compounded return were indeed normally distributed, the annual volatility may be computed as √252 × 0.01333 = 21.16%. Figure 2.1 indicates AXP stock has positive skewness during this sampling period. Its kurtosis of 5.817 exceeds 3.0, which is the kurtosis of a normally distributed random variable.
In Figure 2.2, the JB test statistic shows a p-value of less than 0.0005; thus normality is also rejected at significance level 0.0005 or 0.05% for the weekly returns.
Figure 2.2
Histogram and Statistics of Weekly AXP Stock Return Rates

Series: WR   Sample: 1 1257   Observations: 251
Mean          0.001815
Median        0.001179
Maximum       0.112051
Minimum      -0.103370
Std. Dev.     0.025712
Skewness     -0.040191
Kurtosis      5.345425
Jarque-Bera   57.59904
Probability   0.000000

(Histogram spans weekly returns from -0.10 to 0.10.)
Figure 2.3
Histogram and Statistics of Monthly AXP Stock Return Rates

Series: MR   Sample: 1 1257   Observations: 57
Mean          0.008601
Median        0.004985
Maximum       0.121969
Minimum      -0.092568
Std. Dev.     0.047514
Skewness      0.166812
Kurtosis      2.721483
Jarque-Bera   0.448583
Probability   0.799082

(Histogram spans monthly returns from -0.10 to 0.10.)
In Figure 2.3, the JB test statistic of 0.449 shows a p-value of 0.799. Thus normality is not rejected at significance level 0.10 or 10%. (Indeed it is not rejected even at a significance level of 79%. Sometimes we may call the p-value the exact significance level.)
In the tables of the Figures, note that sample size n = 1257 for the daily returns, n = 251 for the weekly returns, and n = 57 for the monthly returns. The sample mean formula is r̄ = (1/n) ∑_{i=1}^{n} ri, while the sample skewness and kurtosis measures are

S = (1/n) ∑_{i=1}^{n} (ri − r̄)³ / σ̂³  and  K = (1/n) ∑_{i=1}^{n} (ri − r̄)⁴ / σ̂⁴,

where σ̂ is the sample standard deviation.
It is interesting to note that daily and weekly stock return rates are usually not normal, but aggregation to monthly return rates produces normality, as would be expected from our earlier discussion of the Central Limit Theorem. This result has important implications for the financial modeling of stock returns. Short-interval return rates should not be modeled as normal, given our findings. In fact, the descriptive statistics of the return rates for the different intervals above show that shorter-interval return rates tend to display higher kurtosis or a "fat tail" in the pdf. Many recent studies of shorter-interval return rates introduce other kinds of distributions, or else stochastic volatility, to produce returns with fatter tails or higher kurtosis than that of the normal distribution.
The next example is that of the Overseas Chinese Banking Corporation (OCBC), which is one of the three largest banks in Singapore. OCBC is a strong blue-chip stock with plenty of liquidity in trading. The OCBC bank daily stock returns in a 5-year period from 10/27/1997 to 10/25/2002 are collected from the Singapore Stock Exchange (SGX) source and processed as follows. The return rates are daily continuously compounded return rates ln(Pt+1/Pt). Weekly as well as monthly stock returns are computed from the daily return rates. Likewise, the three series of daily, weekly, and monthly return rates are tabulated in histograms shown in Figures 2.4, 2.5, and 2.6. Descriptive statistics of these distributions such as mean, standard deviation, skewness, and kurtosis are reported. The Jarque-Bera tests for normality of the distributions are also conducted.

As in the case of American Express Company, the daily return rates of OCBC show very high kurtosis, deviating from normality. There is also skewness. In Figures 2.4 and 2.5, the JB test statistics show p-values of less than 0.0005; thus normality is rejected at significance level 0.0005 or 0.05% for the daily as well as the weekly returns.
From Figure 2.4, the mean return in the sampling period is 0.0259% per day, or about 253 × 0.0259% = 6.55% per annum. The daily return standard deviation or volatility is 2.527%. The annual volatility may be computed as √253 × 0.02527 = 40.19%.
Figure 2.4
Histogram and Statistics of Daily OCBC Stock Return Rates

Series: DR   Sample: 1 1305   Observations: 1305
Mean          0.000259
Median        0.000000
Maximum       0.163979
Minimum      -0.116818
Std. Dev.     0.025270
Skewness      0.297441
Kurtosis      6.720127
Jarque-Bera   771.7571
Probability   0.000000

(Histogram spans daily returns from -0.10 to 0.15.)
Figure 2.5
Histogram and Statistics of Weekly OCBC Stock Return Rates

Series: WR   Sample: 1 1301   Observations: 255
Mean          0.001324
Median        0.007725
Maximum       0.203466
Minimum      -0.261330
Std. Dev.     0.061151
Skewness     -0.272323
Kurtosis      5.723938
Jarque-Bera   81.98756
Probability   0.000000

(Histogram spans weekly returns from -0.2 to 0.2.)
In Figure 2.6, the JB test statistic shows a p-value of 0.858. Thus normality is not rejected at significance level 0.10 or 10%. In the tables of the Figures, note that sample size n = 1305 for the daily returns, n = 255 for the weekly returns, and n = 65 for the monthly returns.
Figure 2.6
Histogram and Statistics of Monthly OCBC Stock Return Rates

Series: MR   Sample: 1 1281   Observations: 65
Mean          0.005055
Median        3.76E-16
Maximum       0.244560
Minimum      -0.247624
Std. Dev.     0.113431
Skewness     -0.081671
Kurtosis      2.706749
Jarque-Bera   0.305165
Probability   0.858488

(Histogram spans monthly returns from -0.2 to 0.2.)
2.10 PROBLEM SET
(iii) Suppose R_{t,t+Δ}, R_{t+Δ,t+2Δ}, etc. are independent and have constant mean μ, but different variances, where the variance tends to be high when return is low and low when return is high. Intuitively explain whether the monthly return sample distribution will or will not show fatter tails.
2.3 X1, X2, and X3 are n × 1 vectors of stocks A, B, and C's observed monthly return rates from month t = 1 to month t = n. L is an n × 1 vector with each element equal to 1. Express the average returns of A, B, and C in terms of the X's and L. What is the multivariate distribution of these average returns if the stock returns are independent of each other, but each stock return is theoretically MVN N(μ, Σ_{n×n})?
2.4 Suppose we are testing a theory that says that the market as a whole holds a conditional expectation of a certain stock's return Rt+1 of XtQ, i.e. Et(Rt+1|Φt) = XtQ, where Φt is all the information available to the market at t, Xt is an observable variable that has a stationary history, and Q is a known constant. Show how you would test this model by employing the Law of Iterated Expectations.
2.5 Prove that E(et+1|Pt) = 0 implies that et+1 and Pt have zero correlation.
2.6 Find the mean and variance of the lognormally distributed random variable X where ln X ~ N(μ, σ²).
Chapter 3
TWO-VARIABLE LINEAR REGRESSION
APPLICATION: FINANCIAL HEDGING
Key Points of Learning
Regression model, Transformations, Dependent variable, Regressand,
Regressor, Ordinary least squares method, Classical conditions, Unbiased
linear estimator, Estimation efficiency, Gauss-Markov theorem, Testing of
coefficient estimates, Decomposition of squares, Coefficient of determination,
Forecasting, Stock index futures, Cost of carry, Arbitrage, Hedging, Hedge
ratio
3.1 REGRESSION
In Figure 3.1, we show sample observations of X and Y variables that
occur simultaneously.
The bold line shows an attempt to draw a linear line as close to the
observed occurrences as possible. Or does it make sense for us to draw a
nonlinear line that fits all the sample points exactly as seen in the dotted
curve? Obviously not. This is because the bivariate points are just realized observations of bivariate random variables, and at the next sampling, no matter how large the sample size is, the points will change positions. Drawing a line through all or most of the 8 points is like fitting a model with 8 parameters, and results in an over-fitting problem.
Figure 3.1
(Scatterplot of the eight sample points (X1,Y1) through (X8,Y8) against X, with a bold fitted straight line and a dotted curve that fits every point exactly.)
What we want is a straight line in a linear model (or a curve in a nonlinear model) that is estimated in such a way that, whatever the sample, as long as the size is sufficiently large, the line will pretty much remain at about the same position. This will then serve the purposes of (1) explaining Y given any X (not just those observed) that is within the normal range of X in the context, and (2) forecasting given a new X or, in the case of time series, the next period Xt+1. When the sample size is small, there will be large sampling errors in the parameter estimates.
Therefore, the idea of a regression model (which need not be linear), Y = f(X; θ) + ε, where ε is a random error or noise, is one where the parameter(s) θ are suitably estimated as θ̂, with θ̂ close to the true θ, given a sample of {(Xi, Yi)}, by minimizing

∑_{i=1}^{n} g( Yi − f(Xi; θ̂) )

in some sense, where g(·) is a criterion function. For example, g(z) = z² is one such criterion function. Thus a linear regression model does not fit random variables X, Y perfectly, but allows for a residual noise in order that the model is not over-parameterized or over-fitted. This would then serve purposes (1) and (2).
A Linear (bivariate) Regression Model is
Yt = a + bXt + et ,
where a, b are constants. In the linear regression model, Yt is the dependent
variable or regressand. Xt is the explanatory variable or regressor. et is a
residual noise, disturbance, or innovation.
If a constant a has been specified in the linear regression model, then the mean of et is zero. If a constant has not been specified, then et may have a non-zero mean. It is common to add the specification that et is independently and identically distributed (in short, i.i.d.). This means that the probability distributions of et, et+1, et−1, etc. are all identical, and that et is stochastically independent of all other r.v.s, including its own lags and leads, i.e. cov(et, et−k) = 0 and cov(et, et+k) = 0 for k = 1, 2, 3, ….

An even stronger specification is that et is i.i.d. and also normally distributed, and we can write this as n.i.d. N(μ, σ²). In trying to employ the model to explain, and also to forecast, the constant parameters a and b need to be estimated effectively, and perhaps some form of testing on their estimates could be done to verify if they accord with theory. This forms the bulk of the material in the rest of this chapter.
It is also important to recognize that a linear model provides for correlation between Xt and Yt (this need not be the only type of model providing correlation; e.g. the nonlinear model Yt = exp(Xt) also does the job), as we saw occurs in the joint bivariate distribution of (X, Y). For example, in Yt = a + bXt + et with i.i.d. et, we have cov(Xt, Yt) = b var(Xt) ≠ 0 provided b ≠ 0.

Sometimes we encounter a timeplot (a timeplot shows a variable's realized values against time) or a scatterplot (a graph of simultaneous pairs of realized values of random variables) that does not look linear, unlike Figure 3.1. As an example, consider the following two regressions, both producing straight lines that appear to cut evenly through the collection of points in each graph if we use the criterion that minimizes the sum of squares z².

The point is that using some intuitively appropriate criterion to fit linear lines is not enough. It is important to first establish that the relationships are linear before fitting a linear regression model makes sense.
Figure 3.2
(Scatterplot of Y against X showing a nonlinear, curved relationship.)

Figure 3.3
(Scatterplot of Y against X containing an outlier with a very high Y-value.)
In Figure 3.2, the graph of Y versus X is clearly a nonlinear curve bending toward the origin. If it is quadratic, then it is appropriate in that case to use a nonlinear regression model such as Y = a + bX + cX² + e.
In Figure 3.3, for Y versus X, there is clearly an outlier point with a very
high Y-value. As a result, the fitted line is actually above the normal points
that form the rest of the sample. This can be treated either by excluding the
outlier point if the assessment is that it is an aberration or distortion, or else by
providing for another explanatory variable to explain that point that may be a
rare event.
Thus, a visual check on the data plots is useful to ascertain if a linear
regression model is appropriate and whether there are outliers.
Sometimes there are theoretical models that specify relationships between random variables that look nonlinear, but can be transformed to linear models so that linear regression methods can be applied for estimation and testing. Examples are as follows.
When Y = AX^B ε, take the log-log transformation (taking logs on both sides), so

ln Y = ln A + B ln X + ln ε.

Note that here the disturbance noise ε must necessarily be larger than zero, otherwise ln ε will have non-feasible values. Here, ln ε can range from −∞ to ∞. Sometimes Y is called the constant elasticity function, since B is the constant elasticity (when ln ε is fixed at zero).

When Y = exp(a + bX + ε), taking logs on both sides ends up with a semi-log transformation, ln Y = a + bX + ε. This is also called a semi-log model.

When e^Y = AX^B ε, taking logs on both sides ends up again with a semi-log model Y = ln A + B ln X + ln ε. Sometimes when the regressor X is a fast-increasing series relative to Y, taking the natural log of X as regressor will produce a more stable result, as long as theory has nothing against this ad hoc data transformation practice.
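The log-log transformation can be checked by simulation: generate Y = AX^B ε with multiplicative lognormal noise, regress ln Y on ln X by least squares, and the slope estimate recovers the constant elasticity B. All parameter values here are made up for illustration.

```python
import math
import random

random.seed(5)

A, B = 2.0, 1.5
xs = [random.uniform(1.0, 10.0) for _ in range(5000)]
# Multiplicative noise eps = exp(N(0, 0.1^2)) > 0, so ln(eps) is well defined.
ys = [A * x ** B * math.exp(random.gauss(0.0, 0.1)) for x in xs]

# Transform to a linear model: ln Y = ln A + B ln X + ln eps.
lx = [math.log(x) for x in xs]
ly = [math.log(y) for y in ys]

# Least-squares slope of ln Y on ln X estimates the elasticity B.
mx, my = sum(lx) / len(lx), sum(ly) / len(ly)
b_hat = (sum((u - mx) * (v - my) for u, v in zip(lx, ly))
         / sum((u - mx) ** 2 for u in lx))
a_hat = my - b_hat * mx  # estimate of ln A
```

With a moderately large sample, b_hat lands close to B = 1.5 and exp(a_hat) close to A = 2, showing that the transformed model is estimable by ordinary linear regression.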
There are examples of interesting nonlinear curves that are important in economics. An example is the Phillips curve, as follows.

Figure 3.4
Phillips Curve Relating Short-Run Wage Inflation with Unemployment Level
(Y axis: rate of wage change; X axis: unemployment level.)
Next we study one major class of estimators of the linear regression model
and the properties of such estimators. This class is synonymous with the
criterion method for deriving the estimates of the model parameters. This is
the ordinary least squares criterion. For this chapter, we will cover only the
two-variable linear regression model.
3.2 ORDINARY LEAST SQUARES METHOD

Consider the two-variable linear regression model

Ỹi = a + bXi + ẽi,   i = 1, 2, …, N.    (3.1)

The classical conditions are:

(A1) E(ei) = 0 for every i.
(A2) var(ei) = σ², the same constant, for every i.
(A3) cov(ei, ej) = 0 for every i ≠ j.
(A4) cov(Xi, ej) = 0 for every i and j.
In simpler treatments, the Xi's are assumed to be given, so we can treat them as constants. We shall adhere to this mostly in this chapter. This treatment can be justified easily in the case when there is repeated sampling of Yi, given that the Xi's can be pre-selected. If not, the results that ensue are interpreted as being conditional on the given Xi's. Such an Xt is deterministic or exogenous, and is also referred to as "pre-determined" in a time series context. At other times, Xt is stochastic and occurs jointly with Yt. It is theoretically easier if Xt is deterministic. In that case, it is easier to accept that the ei's are uncorrelated, or stronger still, i.i.d. Such properties of the disturbance will be seen to simplify the estimation theory.
In addition to assumptions (A1) through (A4), we could also add a distributional assumption to the random variables, e.g.

(A5) ei ~ N(0, σ²).
In Figure 3.5 below, the dots represent the data points (Xi, Yi) for each i. The regression lines passing amidst the points represent attempts to provide a linear association between Xi and Yi. The scalar value êi indicates the vertical distance between the point (Xi, Yi) and the fitted regression line. The solid line provides a better fit than the dotted line, and we shall elaborate on this.
Figure 3.5
Ordinary Least Squares Regression of Observations (Xi, Yi)
(Scatter of points (X1,Y1), (X2,Y2), (X3,Y3) with vertical deviations ê1, ê2, ê3 from the fitted line.)
The requirement of a linear regression model estimation is to estimate a and b.
The ordinary least squares (OLS) method of estimating a and b is to find a
N 2
and b so as to minimize the residual sum of squares (RSS), e i .
i1
Note that this is different from minimizing the sum of squares of random
variables ei which we do not observe. This is an important concept that should
not be missed. It harks back to the distinction of what is a random variable and
what is its sample value in a single draw.
It should also be noted that the estimators $\hat{a}$ and $\hat{b}$ are themselves random variables. However, given a particular sample, the computed number $\hat{a} = 0.245$, for example, is a realized value of the estimator, and is called an estimate. Although the same notation is used, the context should be distinguished.
The key criterion in OLS is to minimize the sum of the squares of the vertical distances from the points to the fitted OLS straight line (or plane if the problem is of a higher dimension):
$$\min_{\hat{a},\,\hat{b}}\ \sum_{i=1}^{N}\hat{e}_i^2 = \sum_{i=1}^{N}\left(Y_i - \hat{a} - \hat{b}X_i\right)^2.$$
The first-order conditions (FOC) are
$$\frac{\partial}{\partial \hat{a}}\sum_{i=1}^{N}\hat{e}_i^2 = -2\sum_{i=1}^{N}\left(Y_i - \hat{a} - \hat{b}X_i\right) = 0$$
$$\frac{\partial}{\partial \hat{b}}\sum_{i=1}^{N}\hat{e}_i^2 = -2\sum_{i=1}^{N} X_i\left(Y_i - \hat{a} - \hat{b}X_i\right) = 0.$$
Note that the above left-side quantities are partial derivatives. The equations above are called the normal equations for the linear regression of $Y_i$ on $X_i$.
From the FOC,
$$\sum_{i=1}^{N} Y_i = N\hat{a} + \hat{b}\sum_{i=1}^{N} X_i \quad\Rightarrow\quad N\bar{Y} = N\hat{a} + \hat{b}N\bar{X},$$
so
$$\hat{a} = \bar{Y} - \hat{b}\bar{X} \tag{3.2}$$
and
$$\sum_{i=1}^{N} X_i Y_i = \hat{a}\sum_{i=1}^{N} X_i + \hat{b}\sum_{i=1}^{N} X_i^2 = \hat{a}N\bar{X} + \hat{b}\sum_{i=1}^{N} X_i^2. \tag{3.3}$$
Putting (3.2) into (3.3):
$$\sum_{i=1}^{N} X_i Y_i = \sum_{i=1}^{N} X_i\left(\bar{Y} - \hat{b}\bar{X}\right) + \hat{b}\sum_{i=1}^{N} X_i^2$$
$$\sum_{i=1}^{N} X_i\left(Y_i - \bar{Y}\right) = \hat{b}\sum_{i=1}^{N} X_i\left(X_i - \bar{X}\right)$$
$$\hat{b} = \frac{\sum_{i=1}^{N} X_i\left(Y_i - \bar{Y}\right)}{\sum_{i=1}^{N} X_i\left(X_i - \bar{X}\right)} = \frac{\sum_{i=1}^{N}\left(X_i - \bar{X}\right)\left(Y_i - \bar{Y}\right)}{\sum_{i=1}^{N}\left(X_i - \bar{X}\right)^2} = \frac{\sum_{i=1}^{N} x_i y_i}{\sum_{i=1}^{N} x_i^2} \tag{3.4}$$
where $x_i \equiv X_i - \bar{X}$ and $y_i \equiv Y_i - \bar{Y}$, and the second equality uses $\bar{X}\sum_{i=1}^{N}\left(Y_i - \bar{Y}\right) = 0$ and $\bar{X}\sum_{i=1}^{N}\left(X_i - \bar{X}\right) = 0$.
We see that $\hat{a}$ and $\hat{b}$ are linear estimators with fixed weights (when the $X_i$'s are deterministic) on the $Y_i$'s; they are linear functions of the $Y_i$'s. The weights for $\hat{b}$ are as follows:
$$\hat{b} = \sum_{i=1}^{N} w_i Y_i \quad\text{where}\quad w_i = \frac{x_i}{\sum_{i=1}^{N} x_i^2}.$$
These weights satisfy
$$\sum_{i=1}^{N} w_i = 0, \qquad \sum_{i=1}^{N} w_i x_i = 1, \qquad \sum_{i=1}^{N} w_i^2 = \frac{1}{\sum_{i=1}^{N} x_i^2}.$$
The weights for $\hat{a}$ are as follows:
$$\hat{a} = \bar{Y} - \hat{b}\bar{X} = \sum_{i=1}^{N}\frac{1}{N}Y_i - \bar{X}\sum_{i=1}^{N} w_i Y_i = \sum_{i=1}^{N} v_i Y_i \quad\text{where}\quad v_i = \frac{1}{N} - \frac{x_i\bar{X}}{\sum_{j=1}^{N} x_j^2}.$$
These weights satisfy
$$\sum_{i=1}^{N} v_i = 1, \qquad \sum_{i=1}^{N} v_i X_i = 0, \qquad \sum_{i=1}^{N} v_i^2 = \frac{1}{N} + \frac{\bar{X}^2}{\sum_{i=1}^{N} x_i^2},$$
the last following from
$$\sum_{i=1}^{N}\left(\frac{1}{N} - \frac{x_i\bar{X}}{\sum_j x_j^2}\right)^2 = \frac{1}{N} - \frac{2\bar{X}\sum_i x_i}{N\sum_j x_j^2} + \frac{\bar{X}^2\sum_i x_i^2}{\left(\sum_j x_j^2\right)^2} = \frac{1}{N} + \frac{\bar{X}^2}{\sum_i x_i^2}.$$
In the above, $\sum_{i=1}^{N} x_i = \sum_{i=1}^{N}\left(X_i - \bar{X}\right) = N\bar{X} - N\bar{X} = 0$.
Now, for the finite sample properties of OLS estimators:
$$\hat{b} = \sum_{i=1}^{N} w_i\left(a + bX_i + e_i\right) = b + \sum_{i=1}^{N} w_i e_i,$$
using $\sum_i w_i = 0$ and $\sum_i w_i X_i = 1$. Thus $E(\hat{b}) = b$, and
$$\operatorname{var}(\hat{b}) = E\left(\hat{b} - b\right)^2 = E\left(\sum_{i=1}^{N} w_i e_i\right)^2 = \sigma_e^2\sum_{i=1}^{N} w_i^2 = \frac{\sigma_e^2}{\sum_{i=1}^{N} x_i^2}.$$
Similarly, $\hat{a} = \sum_{i=1}^{N} v_i\left(a + bX_i + e_i\right) = a + \sum_{i=1}^{N} v_i e_i$. Thus $E(\hat{a}) = a$, and
$$\operatorname{var}(\hat{a}) = E\left(\hat{a} - a\right)^2 = E\left(\sum_{i=1}^{N} v_i e_i\right)^2 = \sigma_e^2\sum_{i=1}^{N} v_i^2 = \sigma_e^2\left(\frac{1}{N} + \frac{\bar{X}^2}{\sum_{i=1}^{N} x_i^2}\right).$$
The above results show that the means of the estimators $\hat{a}$ and $\hat{b}$ are centered at the true population parameters $a$ and $b$ respectively. Thus we say that the OLS estimators $\hat{a}$ and $\hat{b}$ are unbiased.
What is the probability distribution of $\hat{b}$? Using (A5), since $\hat{b}$ is a linear combination of the $e_i$'s that are normally distributed, $\hat{b}$ is also normally distributed:
$$\hat{b} \sim N\!\left(b,\ \frac{\sigma_e^2}{\sum x_i^2}\right), \qquad \hat{a} \sim N\!\left(a,\ \sigma_e^2\left[\frac{1}{N} + \frac{\bar{X}^2}{\sum x_i^2}\right]\right).$$
Moreover,
$$\operatorname{cov}(\hat{a}, \hat{b}) = E\left(\sum_i v_i e_i\sum_j w_j e_j\right) = \sigma_e^2\sum_i v_i w_i = -\frac{\sigma_e^2\bar{X}}{\sum x_i^2}.$$
3.3
GAUSS-MARKOV THEOREM
The Gauss-Markov Theorem states that amongst all linear and unbiased estimators of the form
$$\hat{A} = \sum_{i=1}^{N}\alpha_i Y_i, \qquad \hat{B} = \sum_{i=1}^{N}\beta_i Y_i,$$
where the $\alpha_i$'s and $\beta_i$'s are constant weights in the $X_i$'s (and not in the $Y_i$'s or $a$ or $b$), and $E(\hat{A}) = a$, $E(\hat{B}) = b$, the OLS estimators have the smallest variances, i.e.
$$\operatorname{var}(\hat{b}) \le \operatorname{var}(\hat{B}), \qquad \operatorname{var}(\hat{a}) \le \operatorname{var}(\hat{A}).$$
In this sense, OLS estimators (under the classical conditions) are called BLUE, viz. Best Linear Unbiased Estimators, for the linear regression model in (3.1). They are efficient estimators (estimation efficiency) as they are unbiased and have the least variances. They are best in this linear unbiased class.²
What happens to the estimators $\hat{a}$ and $\hat{b}$ when the sample size $N$ goes toward infinity? In such a situation, when the sample size approaches infinity (or practically, when we have a very large though still finite sample), we are in the realm of asymptotic (large sample) theory.
Consider the following sample moments as the sample size increases toward $\infty$. The population means are $E(X) = \mu_X$ and $E(Y) = \mu_Y$, and the population covariance is $\sigma_{XY} = E\left[(X - \mu_X)(Y - \mu_Y)\right] = E(XY) - \mu_X\mu_Y$. The sample means are $\bar{X} = \frac{1}{N}\sum_{i=1}^{N} X_i$ and $\bar{Y} = \frac{1}{N}\sum_{i=1}^{N} Y_i$. From the Law of Large Numbers, $\lim_{N\to\infty}\bar{X} = \mu_X$ and $\lim_{N\to\infty}\bar{Y} = \mu_Y$ when $X$ and $Y$ are stationary.
The sample covariance $\frac{1}{N-1}\sum_{i=1}^{N}\left(X_i - \bar{X}\right)\left(Y_i - \bar{Y}\right)$ is unbiased, but we may also use $S_{XY} = \frac{1}{N}\sum_{i=1}^{N}\left(X_i - \bar{X}\right)\left(Y_i - \bar{Y}\right)$, which is equivalent as $N$ approaches $\infty$; similarly for the sample variance $S_X^2 = \frac{1}{N}\sum_{i=1}^{N}\left(X_i - \bar{X}\right)^2$. Both converge to their population counterparts $\sigma_{XY}$ and $\sigma_X^2$. The population correlation coefficient is $\rho_{XY} = \frac{\sigma_{XY}}{\sigma_X\sigma_Y}$, so that $\sigma_{XY} = \rho_{XY}\sigma_X\sigma_Y$. The sample correlation coefficient is
$$r_{XY} = \frac{\sum_i\left(X_i - \bar{X}\right)\left(Y_i - \bar{Y}\right)}{\sqrt{\sum_i\left(X_i - \bar{X}\right)^2}\sqrt{\sum_i\left(Y_i - \bar{Y}\right)^2}} = \frac{S_{XY}}{S_X S_Y}.$$
The OLS estimator can therefore be written as
$$\hat{b} = \frac{S_{XY}}{S_X^2} = \frac{r_{XY} S_X S_Y}{S_X^2} = r_{XY}\,\frac{S_Y}{S_X},$$
which converges to $\sigma_{XY}/\sigma_X^2 = b$ as $N \to \infty$.
² There are some estimators that are biased but may possess smaller variances, e.g. the Stein estimators.
Thus $\hat{b}$ is consistent. The estimated residual is $\hat{e}_i = Y_i - \hat{a} - \hat{b}X_i$. It can be further expressed as
$$\hat{e}_i = Y_i - \bar{Y} + \hat{b}\bar{X} - \hat{b}X_i = \left(Y_i - \bar{Y}\right) - \hat{b}\left(X_i - \bar{X}\right) = y_i - \hat{b}x_i.$$
It is important to distinguish this estimated residual from the actual unobserved $e_i$. From the FOCs in (3.2) and (3.3), we see that
$$\sum_{i=1}^{N}\hat{e}_i = 0 \quad\text{and}\quad \sum_{i=1}^{N} X_i\hat{e}_i = 0, \text{ or also } \sum_{i=1}^{N}\left(X_i - \bar{X}\right)\hat{e}_i = 0.$$
What are their population equivalents? They are respectively $E(e_i) = 0$, and $E(X_i e_i) = 0$ or $\operatorname{cov}(X_i, e_i) = 0$.
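These two orthogonality conditions are easy to confirm numerically. The sketch below refits OLS on the same illustrative ten-point data set used earlier (not data from the text's empirical samples):

```python
# Check the FOC implications: sum e-hat_i = 0 and sum X_i * e-hat_i = 0.
X = [2.0, 2.5, 3.0, 3.0, 4.0, 5.0, 4.5, 6.0, 7.0, 8.0]
Y = [3.0, 4.0, 4.5, 5.0, 6.5, 7.0, 8.0, 8.5, 9.0, 9.5]
N = len(X)
X_bar, Y_bar = sum(X) / N, sum(Y) / N
b_hat = sum((x - X_bar) * (y - Y_bar) for x, y in zip(X, Y)) \
        / sum((x - X_bar) ** 2 for x in X)
a_hat = Y_bar - b_hat * X_bar
resid = [y - a_hat - b_hat * x for x, y in zip(X, Y)]

print(abs(sum(resid)) < 1e-9)                           # True: residuals sum to 0
print(abs(sum(x * e for x, e in zip(X, resid))) < 1e-9) # True: orthogonal to X
```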
3.4
DECOMPOSITION
Recall that in the OLS method, we minimize the sum of squares of estimated residual errors $\sum_{i=1}^{N}\hat{e}_i^2$. Now, with the fitted value $\hat{Y}_i = \hat{a} + \hat{b}X_i$,
$$\sum_{i=1}^{N}\left(Y_i - \bar{Y}\right)^2 = \sum_{i=1}^{N}\left(\hat{Y}_i - \bar{Y} + \hat{e}_i\right)^2 = \sum_{i=1}^{N}\left(\hat{Y}_i - \bar{Y}\right)^2 + 2\sum_{i=1}^{N}\left(\hat{Y}_i - \bar{Y}\right)\hat{e}_i + \sum_{i=1}^{N}\hat{e}_i^2 = \sum_{i=1}^{N}\left(\hat{Y}_i - \bar{Y}\right)^2 + \sum_{i=1}^{N}\hat{e}_i^2 \tag{3.5}$$
since $\sum_{i=1}^{N}\hat{Y}_i\hat{e}_i = \sum_{i=1}^{N}\left(\hat{a} + \hat{b}X_i\right)\hat{e}_i = 0$ and $\bar{Y}\sum_{i=1}^{N}\hat{e}_i = 0$.
Define the Total Sum of Squares (TSS) $= \sum_i\left(Y_i - \bar{Y}\right)^2$, the Explained Sum of Squares (ESS) $= \sum_i\left(\hat{Y}_i - \bar{Y}\right)^2$, and the Residual Sum of Squares (RSS) $= \sum_i\hat{e}_i^2 = \sum_i\left(Y_i - \hat{Y}_i\right)^2$. Thus, from (3.5), TSS = ESS + RSS. RSS is also called the unexplained sum of squares (USS).
Now,
$$\text{ESS} = \sum_i\left(\hat{Y}_i - \bar{Y}\right)^2 = \sum_i\left(\hat{a} + \hat{b}X_i - \hat{a} - \hat{b}\bar{X}\right)^2 = \hat{b}^2\sum_i\left(X_i - \bar{X}\right)^2 = \left(r_{XY}^2\frac{S_Y^2}{S_X^2}\right) N S_X^2 = N S_Y^2\, r_{XY}^2,$$
$$\text{TSS} = \sum_i\left(Y_i - \bar{Y}\right)^2 = N S_Y^2.$$
So,
$$\frac{\text{ESS}}{\text{TSS}} = r_{XY}^2. \quad\text{Also,}\quad r_{XY}^2 = 1 - \frac{\text{RSS}}{\text{TSS}}.$$
Define
$$R^2 = \frac{\text{ESS}}{\text{TSS}} = 1 - \frac{\text{RSS}}{\text{TSS}},$$
where $0 \le R^2 \le 1$ is called the coefficient of determination. This coefficient $R^2$ measures the degree of fit of the linear regression line to the data points in the sample. The closer $R^2$ is to 1, the better is the fit. A perfect fit occurs if all points lie on the straight line; then $R^2 = 1$.
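The decomposition and the identity $R^2 = r_{XY}^2$ can be verified numerically; the sketch below reuses the illustrative ten-point data set (not the text's empirical data):

```python
import math

# Verify TSS = ESS + RSS and R^2 = r_XY^2 for a two-variable OLS fit.
X = [2.0, 2.5, 3.0, 3.0, 4.0, 5.0, 4.5, 6.0, 7.0, 8.0]
Y = [3.0, 4.0, 4.5, 5.0, 6.5, 7.0, 8.0, 8.5, 9.0, 9.5]
N = len(X)
X_bar, Y_bar = sum(X) / N, sum(Y) / N
b_hat = sum((x - X_bar) * (y - Y_bar) for x, y in zip(X, Y)) \
        / sum((x - X_bar) ** 2 for x in X)
a_hat = Y_bar - b_hat * X_bar
fitted = [a_hat + b_hat * x for x in X]

TSS = sum((y - Y_bar) ** 2 for y in Y)
ESS = sum((yf - Y_bar) ** 2 for yf in fitted)
RSS = sum((y - yf) ** 2 for y, yf in zip(Y, fitted))
R2 = ESS / TSS

# sample correlation coefficient r_XY
r_xy = sum((x - X_bar) * (y - Y_bar) for x, y in zip(X, Y)) / math.sqrt(
    sum((x - X_bar) ** 2 for x in X) * sum((y - Y_bar) ** 2 for y in Y))
```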
The unbiased estimator of the residual variance $\sigma_e^2$ is
$$\hat\sigma_e^2 = \frac{1}{N-2}\sum_{i=1}^{N}\hat{e}_i^2.$$
To see this, write $\hat{e}_i = e_i - (\hat{a} - a) - (\hat{b} - b)X_i$, where $\hat{a} - a = \sum_j v_j e_j$ and $\hat{b} - b = \sum_j w_j e_j$. Expanding and taking expectations term by term, using the weight properties derived earlier, yields
$$\operatorname{var}(\hat{e}_i) = \sigma_e^2\left(1 - \frac{1}{N} - \frac{x_i^2}{\sum_k x_k^2}\right),$$
and, for $i \neq j$,
$$\operatorname{cov}(\hat{e}_i, \hat{e}_j) = E\left[\left(e_i - (\hat{a} - a) - (\hat{b} - b)X_i\right)\left(e_j - (\hat{a} - a) - (\hat{b} - b)X_j\right)\right] = -\sigma_e^2\left(\frac{1}{N} + \frac{x_i x_j}{\sum_k x_k^2}\right).$$
56
Note that although the true $e_i$ and $e_j$ are independent according to the classical conditions, their OLS estimates are correlated. Now, summing the variances of the $\hat{e}_i$'s,
$$E\left(\sum_{i=1}^{N}\hat{e}_i^2\right) = \sigma_e^2\sum_{i=1}^{N}\left(1 - \frac{1}{N} - \frac{x_i^2}{\sum_k x_k^2}\right) = (N-2)\,\sigma_e^2,$$
so $\hat\sigma_e^2$ is indeed unbiased. The estimated variances and covariance of the OLS estimators are
$$\hat\sigma_{\hat{a}}^2 = \hat\sigma_e^2\left(\frac{1}{N} + \frac{\bar{X}^2}{\sum x_i^2}\right), \qquad \hat\sigma_{\hat{b}}^2 = \frac{\hat\sigma_e^2}{\sum x_i^2}, \qquad \hat\sigma_{\hat{a}\hat{b}} = -\frac{\hat\sigma_e^2\bar{X}}{\sum x_i^2}.$$
If $\sigma_e^2$ were known,
$$Z = \frac{\hat{b} - b}{\text{s.e.}(\hat{b})} \sim N(0,1).$$
Using $\hat\sigma_e^2$ in place of $\sigma_e^2$, hypotheses such as $H_0: b = 1$ or $H_0: a = 0$ can be tested via
$$\frac{\hat{b} - 1}{\hat\sigma_e\sqrt{1/\sum x_i^2}} \sim t_{N-2}, \qquad \frac{\hat{a} - 0}{\hat\sigma_e\sqrt{\dfrac{1}{N} + \dfrac{\bar{X}^2}{\sum x_i^2}}} \sim t_{N-2}.$$
3.5
FORECASTING
Given a new regressor value $X_{N+1}$, the forecast of $Y_{N+1}$ is
$$\hat{Y}_{N+1} = \hat{a} + \hat{b}X_{N+1}$$
where $\hat{a}$ and $\hat{b}$ are the OLS BLUE estimates. This forecast is the most efficient amongst all linear forecasts. In general, the most efficient forecast is $E(Y_{N+1}|X_{N+1})$, and this need not be the OLS forecast if the regression model is not linear as in (3.1).
We can express the above in terms of deviations. Substituting (3.2),
$$\hat{Y}_{N+1} = \bar{Y} - \hat{b}\bar{X} + \hat{b}X_{N+1} = \bar{Y} + \hat{b}x_{N+1}, \quad\text{where } x_{N+1} = X_{N+1} - \bar{X}.$$
But (3.1) gives $\bar{Y} = a + b\bar{X} + \frac{1}{N}\sum_{i=1}^{N} e_i$, and here we are dealing with
$$Y_{N+1} = a + bX_{N+1} + e_{N+1} = \bar{Y} + bx_{N+1} + e_{N+1} - \frac{1}{N}\sum_{i=1}^{N} e_i,$$
which is again a representation as a random variable. Then the forecast error is
$$Y_{N+1} - \hat{Y}_{N+1} = \left(b - \hat{b}\right)x_{N+1} + e_{N+1} - \frac{1}{N}\sum_{i=1}^{N} e_i,$$
and
$$\operatorname{var}\left(Y_{N+1} - \hat{Y}_{N+1}\,\middle|\, x_{N+1}\right) = x_{N+1}^2\operatorname{var}(\hat{b}) + \sigma_e^2 + \frac{\sigma_e^2}{N} = \sigma_e^2\left(1 + \frac{1}{N} + \frac{x_{N+1}^2}{\sum_{i=1}^{N} x_i^2}\right).$$
So,
$$\frac{Y_{N+1} - \hat{Y}_{N+1}}{\hat\sigma_e\sqrt{1 + \dfrac{1}{N} + \dfrac{x_{N+1}^2}{\sum_i x_i^2}}} \sim t_{N-2},$$
and a 95% prediction interval for $Y_{N+1}$ is
$$\hat{Y}_{N+1} \pm t_{N-2,\,95\%}\,\hat\sigma_e\sqrt{1 + \frac{1}{N} + \frac{x_{N+1}^2}{\sum_i x_i^2}},$$
where $t_{N-2,\,95\%}$ denotes the two-tailed 95% critical value.
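A sketch of the point forecast and its 95% prediction interval follows. The value $X_{N+1} = 6.5$ and the hardcoded critical value 2.306 (two-tailed 95%, $N-2 = 8$ d.f.) are illustrative assumptions, and the data are the same ten points used earlier:

```python
import math

# Forecast Y_{N+1} and a 95% prediction interval for the two-variable model.
X = [2.0, 2.5, 3.0, 3.0, 4.0, 5.0, 4.5, 6.0, 7.0, 8.0]
Y = [3.0, 4.0, 4.5, 5.0, 6.5, 7.0, 8.0, 8.5, 9.0, 9.5]
N = len(X)
X_bar, Y_bar = sum(X) / N, sum(Y) / N
Sxx = sum((x - X_bar) ** 2 for x in X)
b_hat = sum((x - X_bar) * (y - Y_bar) for x, y in zip(X, Y)) / Sxx
a_hat = Y_bar - b_hat * X_bar
RSS = sum((y - a_hat - b_hat * x) ** 2 for x, y in zip(X, Y))
sigma2_e = RSS / (N - 2)            # unbiased residual variance

X_new = 6.5                         # hypothetical out-of-sample X_{N+1}
Y_fore = a_hat + b_hat * X_new
se_fore = math.sqrt(sigma2_e * (1 + 1 / N + (X_new - X_bar) ** 2 / Sxx))
t_crit = 2.306                      # two-tailed 95% t value with 8 d.f.
lo, hi = Y_fore - t_crit * se_fore, Y_fore + t_crit * se_fore
```

Note that the forecast error variance always exceeds $\hat\sigma_e^2$ because of the extra estimation uncertainty in $\hat{a}$ and $\hat{b}$.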
Alternatively, maximum likelihood estimators (MLE) can be developed. MLE essentially chooses estimators that maximize the sample likelihood function. There is equivalence of OLS and ML estimators in the specific case of normally distributed i.i.d. $e_i$'s. However, MLE is in general a nonlinear estimator.³
³ We shall re-visit the idea of maximum likelihood estimation in more detail in a later chapter. The theory of maximum likelihood estimation and nonlinear estimation can be read in more advanced econometrics textbooks. There are numerous such excellent books. One example is: Russell Davidson and James G. MacKinnon, (1993), Estimation and Inference in Econometrics, Oxford University Press.
3.6
Consider, for example, a futures contract that matures in December. This is called a December Nikkei 225 Index futures contract to reflect its maturity. After its maturity date, the contract is worthless. In September, however, the traded price (this is not a currency price, but an index price or a notional price) of this December contract will reflect how the market thinks the final at-maturity Nikkei 225 index will be. If the September market trades the index futures at a notional price of 12,000, and you know that the December index number is going to be higher, then you will buy (long) say $N$ of the Nikkei 225 Index December futures contracts. At maturity in December, if you still have not sold your position, and if the Nikkei 225 index is indeed higher at 14,000, then you will make a big profit. This profit is calculated as the increase in futures notional price (2,000 points in this case) × the Yen value per point per contract × the number of contracts $N$.
Thus, a current stock index futures notional price is related to the index
notional price at a future time. At maturity, the index futures notional price
also converges to the underlying stock index number. As stock index
represents the average price of a large portfolio of stocks, the corresponding
stock index futures notional price is related to the value of the underlying
large portfolio of stocks making up the index. This relationship is sometimes
called the no-arbitrage model pricing. It can be explained briefly as follows.
3.7
Suppose we define the stock index value to be $S_t$ at current time $t$. This value is the capitalization-weighted average of the underlying portfolio stock prices. The actual market capitalization currency value of the portfolio is of course a very large constant multiplier of this index value. Nevertheless, the percentage return to the index changes reflects the overall portfolio's gain or loss. Suppose an institutional investor holds a large diversified portfolio, say of the major Japanese stocks. Even if this portfolio consists of only 80% of the number of stocks in the Nikkei 225 Index, the return to this portfolio would look quite similar to the return computed on the changes in the Nikkei 225 Index value.
Let the effective risk-free interest rate or cost of carry be $R_{t,T}$ over $[t,T]$. An arbitrageur could in principle buy or short-sell a portfolio of N225 stocks in proportions equal to their value weights in the index. Let the cost of this portfolio be $\lambda S_t$, whereby $\lambda$ is a constant multiplier reflecting the proportionate relationship of the portfolio value to the index notional value $S_t$. Hence, the percentage return on the index is also the same percentage return on the portfolio.
The arbitrageur either carries (holds) the portfolio, or short-sells the portfolio, till maturity $T$, with a final cost at $T$ of $\lambda S_t(1+R_{t,T})$ after the opportunity cost of interest compounding is added. Suppose the Japanese stocks in the N225 index issue an aggregate amount of dividends $D$ over the period $[t,T]$. Since the N225 index notional value is proportional to the overall 225 Japanese stocks' market value, the dividend yield $d$ as a fraction of the total market value is the same dividend yield as a fraction of the N225 index notional value. Then, the dividends issued to the arbitrageur's portfolio amount to $d\lambda S_t$. Suppose that the dividends to be received are perfectly anticipated; then the present value of this amount, $d^*\lambda S_t$, can be deducted from the cost of carry. Let $D^* = d^* S_t$. The net cost of carry of the stocks as at time $T$ is then
$$\lambda\left[S_t - D^*\right]\left(1 + R_{t,T}\right).$$
Suppose the N225 index futures notional price is now trading at $F_{t,T}$. The subscript notation indicates the price at $t$ for a contract that matures at $T$. The arbitrageur would enter a buy or long position in the stocks if at $t$, $F_{t,T} > [S_t - D^*](1+R_{t,T})$. At the same time $t$, the arbitrageur sells an index futures contract at notional price $F_{t,T}$. For simplicity, we assume the currency value per point per contract is 1. Without loss of generality, assume $\lambda = 1$. At $T$, whatever the index value $S_T = F_{T,T}$, the arbitrageur would:
Sell the portfolio at $S_T$, gaining (Yen or \$, whichever may be)
$$\$\left(S_T - [S_t - D^*](1+R_{t,T})\right).$$
Cash-settle the index futures trade, gaining
$$\$\left(F_{t,T} - F_{T,T}\right), \text{ or } \$\left(F_{t,T} - S_T\right).$$
The net gain is the sum of the two terms, or $\$\left(F_{t,T} - [S_t - D^*](1+R_{t,T})\right) > 0$. Thus, the arbitrageur risklessly makes a profit equivalent to the net gain above.
Conversely, the arbitrageur would enter a short position in the stocks if at $t$, $F_{t,T} < [S_t - D^*](1+R_{t,T})$. At the same time $t$, the arbitrageur buys an index futures contract at notional price $F_{t,T}$. At $T$, whatever the index value $S_T = F_{T,T}$, the arbitrageur would:
Buy back the portfolio at $S_T$, gaining (Yen or \$, whichever may be)
$$\$\left([S_t - D^*](1+R_{t,T}) - S_T\right).$$
Cash-settle the index futures trade, gaining
$$\$\left(F_{T,T} - F_{t,T}\right), \text{ or } \$\left(S_T - F_{t,T}\right).$$
The net gain is the sum of the two terms, or $\$\left([S_t - D^*](1+R_{t,T}) - F_{t,T}\right) > 0$. Thus, the arbitrageur risklessly makes a profit equivalent to the net gain above.
We have of course ignored transaction costs in this analysis, which would make it even more difficult to earn a riskless arbitrage⁴ profit. The cost-of-carry model price of the index futures, $F_{t,T} = [S_t - D^*](1+R_{t,T})$, is also called the fair value price.
We employ data from the Singapore Exchange (SGX) that contain daily end-of-day Nikkei 225 Index values and Nikkei 225 Index December 1999 futures contract prices traded at SIMEX/SGX during the period September 1 to October 15, 1999. During this end-1999 period, the Japan money market interest rate was very low at 0.5% p.a. We use this as the cost-of-carry interest rate. We also assume the Nikkei 225 stock portfolio's aggregate dividend was 1.0% p.a. at present value. During these trade dates, the term-to-maturity is about ¼ of a year, or 3 months, from September/October till December.
Based on the finance theory above, we plot in Figure 3.6 the two time series of the N225 futures price $F_{t,T}$ and the fair price $F_t^* = [S_t - D^*](1+R_{t,T})$, where $D^* = (\tfrac{1}{4}\times 1.0\%)\,S_t$ so that $S_t - D^* = 0.9975\,S_t$, and $R_{t,T} = \tfrac{1}{4}\times 0.5\%$.
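The fair value computation can be sketched as follows. The simple quarter-year scaling of the quoted per-annum rates is an assumption consistent with the $0.9975\,S_t$ figure in the text, and the index level is the 1 September 1999 value quoted later in this chapter:

```python
# Cost-of-carry fair value F* = [S_t - D*](1 + R_{t,T}) with the chapter's
# end-1999 inputs.
S_t = 17479.0              # N225 index level on 1 Sep 1999
r_pa, d_pa = 0.005, 0.010  # carry rate and dividend yield, per annum
tau = 0.25                 # about 3 months to the December maturity
D_star = d_pa * tau * S_t  # present value of dividends over [t, T]
R_tT = r_pa * tau          # effective carry rate over [t, T]
F_fair = (S_t - D_star) * (1 + R_tT)
```

Because the dividend yield exceeds the carry rate here, the fair futures price sits slightly below the spot index level.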
Figure 3.6
Prices of Nikkei 225 December futures contract from 9/1/1999 to 10/15/1999
[Time series plot of the Futures Price and the Fair Value over 9/1 to 10/15, both moving between about 15,500 and 18,500.]
⁴ One of the earliest studies showing that such risk-free arbitrage in the Nikkei 225 Stock Index futures had largely disappeared, after transaction costs, in the late 1980s is a paper by Kian-Guan Lim, (1992), Arbitrage and Price Behavior of the Nikkei Stock Index Futures, Journal of Futures Markets, Vol 12, No 2, 151-162.
We also compute a normalized percentage difference $p = (F_{t,T} - F_t^*)/F_t^*$ indicating the percentage deviation from the fair price $F_t^*$. We test the statistical hypothesis $H_0: p = 0$ during this period; the simple t-test is used here. If $p$ is highly positive, then arbitrageurs can make a profit by shorting the index futures and buying stocks to carry. If $p$ is highly negative, then arbitrageurs can make a profit by longing the index futures and shorting stocks (borrowing them through a brokerage and paying extra transaction costs in this case). The time series of $p$ is also shown in Figure 3.7.
Figure 3.7
Difference between futures price and fair price
[Time series plot of $p$ over 9/1 to 10/15, fluctuating roughly between −0.015 and +0.01.]
Figure 3.6 shows that the futures price and the fair value track each other closely. The sample size for the two variables is 30. The t-statistic of their daily difference, $\bar{p}/\text{s.e.}(\bar{p})$, at d.f. 29, is −0.1278, which is insignificant; thus $H_0: p = 0$ cannot be rejected.
Sometimes when large transactions take place, the market price may be sensitive to the price pressure: prices may move against a large buy order, resulting in a higher average price paid for the total order, or prices may dip on a large sell order, resulting in lower average revenue. This is called impact cost. For a short position in stocks, the cost could be even higher, as the arbitrageur normally would also have to pay for borrowing the stocks.
The graph in Figure 3.7 suggests that when $p > 0$ (futures price > fair value), except for one outlier, it reverses direction downward when it hits about ½%. When $p < 0$ (futures price < fair value), it reverses direction upward when it hits about −1%. One possible interpretation, though this is by no means conclusive, is as follows. Suppose the transaction cost to arbitrageurs in a long spot-short futures situation, where $p > 0$, was a little less than ½%; then arbitrage profits occurred at about $p = $ ½%. This drew infinitely (or theoretically close to infinitely) many sell orders for futures and buy orders for spot, and thus pushed the futures price down and the spot price up. Then $p$ started to drop toward zero. This is the observed reversal pattern close to $p = $ ½%.
On the other hand, suppose the transaction cost to arbitrageurs in a short spot-long futures situation, where $p < 0$, was a little less than 1%; then arbitrage profits occurred at about $p = -1\%$. This drew infinitely many buy orders for futures and sell orders for spot, and thus pushed the futures price up and the spot price down. Then $p$ started to increase toward zero. This is the observed reversal pattern close to $p = -1\%$.
The above interpretation would be consistent with (note that we did not say it is conclusive or even convincing evidence of)⁵ the presence of some arbitrage opportunities for large institutional players with relatively low transaction costs. The pattern basically says that if on day $t$, $\Delta p_t$ (the change in $p$ from $t-1$ to $t$) hit a high note and drew in arbitrage, then $p$ would reverse, and the following $\Delta p_{t+1}$ would be opposite in sign to $\Delta p_t$.
We can statistically examine the reversal in $p$ by regressing the daily change in $p$ on its lag. Specifically, we perform the linear regression $\Delta p_{t+1} = a + b\,\Delta p_t + e_{t+1}$, where $\Delta p_{t+1} = p_{t+1} - p_t$, $a$ and $b$ are coefficients, and $e_{t+1}$ is assumed to be an i.i.d. residual error. In this case, the first data point $\Delta p_2 = p_2 - p_1$ is the change in $p$ from 9/1 to 9/2; the last data point $\Delta p_{30} = p_{30} - p_{29}$ is the change in $p$ from 10/14 to 10/15. Since we employ a lag in the regression, the number of sample observations used is further reduced by 1, so there are only $N = 28$ data points $\{\Delta p_{30},\ \Delta p_{29},\ \ldots,\ \Delta p_3\}$ for the dependent variable, and 28 data points $\{\Delta p_{29},\ \Delta p_{28},\ \ldots,\ \Delta p_2\}$ for the explanatory variable, which is the first lag of the dependent variable in the regression. Thus the linear regression produces t-statistics with $N-2$, or in this case 26, degrees of freedom. The regression results are reported in Table 3.1.
⁵ There are other stories from a market micro-structure perspective. One example of such possibilities is about bid-ask bounce causing negative serial correlations even when there is no information and the market is efficient. See Richard Roll, (1984), A Simple Implicit Measure of the Effective Bid-Ask Spread in an Efficient Market, Journal of Finance, 1127-1139.
The regression results show that $\hat{b} = -0.523708$, which is statistically significant at the 2-tailed 1% significance level (p-value 0.0057), thus rejecting $H_0: b = 0$. This shows the presence of reversals in $p$ across days.
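The construction of the regression variables and the OLS slope can be sketched as follows. The series `p` below is made-up illustrative data, not the SIMEX/SGX sample used in the text:

```python
# Regress the change in mispricing on its own lag to detect reversal.
# p: made-up daily series of normalized futures-minus-fair-value deviations.
p = [0.004, -0.002, 0.006, -0.005, 0.003, -0.004, 0.002, -0.001, 0.005, -0.006]
dp = [p[t] - p[t - 1] for t in range(1, len(p))]   # delta-p_t
y = dp[1:]    # dependent variable: delta-p_{t+1}
x = dp[:-1]   # explanatory variable: first lag, delta-p_t
n = len(y)
xb, yb = sum(x) / n, sum(y) / n
b_hat = sum((xi - xb) * (yi - yb) for xi, yi in zip(x, y)) \
        / sum((xi - xb) ** 2 for xi in x)
a_hat = yb - b_hat * xb
# A negative slope b_hat indicates day-to-day reversal, as found in Table 3.1.
```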
Table 3.1
Regression of Difference of Actual and Fair N225 Futures Price on its Lag: $\Delta p_{t+1} = a + b\,\Delta p_t + e_{t+1}$ (CHANGEP is $\Delta p_{t+1}$)

Dependent Variable: CHANGEP
Method: Least Squares
Sample: 1 28
Included observations: 28

Variable        Coefficient   Std. Error   t-Statistic   Prob.
C                  0.000161     0.000891         0.181   0.8579
LAGCHANGEP        -0.523708     0.173972        -3.010   0.0057

R-squared                    0.2585
Adjusted R-squared           0.2299
S.E. of regression           0.004701
Sum squared resid            0.000200
F-statistic                  9.062
Prob(F-statistic) df 1,26    0.00574
3.8
HEDGING
We can also use linear regression to study optimal hedging.⁶ Suppose a large institutional investor holds a huge well-diversified portfolio of Japanese stocks whose return closely follows that of the N225 stock index return or rate of change. Suppose in September 1999 the investor was nervous about an imminent big fall in Japanese equity prices, and wished to protect his portfolio value over the period September to mid-October 1999. He could liquidate his stocks, but this would be unproductive since his main business was to invest in the Japanese equity sector. Besides, liquidating a huge holding, or even a big part of it, would likely result in losses due to impact costs. Thus, the investor decided to hedge the potential drop in index value by selling some number $h$ of Nikkei 225 index futures contracts. If Japanese stock prices did fall, then the gain in the short position of the futures contracts would make up for the loss in the actual portfolio value.
The investor's original stock position has a total current value $V_t$; for example, this could be 10 billion Yen. Suppose his stock position value is a constant factor $f$ times the N225 index value $S_t$. Then $V_{t+1} = f\,S_{t+1}$, and further the portfolio return rate $\Delta V_{t+1}/V_t = \Delta S_{t+1}/S_t$, as mentioned in the last paragraph.
In essence, the investor forms a hedged portfolio comprising $f S_t$ Yen of stocks and $h$ short positions in N225 index futures contracts. The contract with maturity $T$ has notional traded price $F_{t,T}$ and an actual price value of $500 \times F_{t,T}$, where the contract is specified to have a value of ¥500 per notional price point. At the end of the risky period, his hedged portfolio value change would be
$$P_{t+1} - P_t = f\left(S_{t+1} - S_t\right) - 500\,h\left(F_{t+1,T} - F_{t,T}\right). \tag{3.6}$$
In effect, the investor wished to minimize the risk or variance of $P_{t+1} - P_t \equiv \Delta P$. Simplifying notation, from (3.6), $\Delta P = f\,\Delta S - 500\,h\,\Delta F$. So,
$$\operatorname{var}(\Delta P) = f^2\operatorname{var}(\Delta S) + 500^2 h^2\operatorname{var}(\Delta F) - 2\times 500\,hf\operatorname{cov}(\Delta S, \Delta F).$$
The FOC for minimizing $\operatorname{var}(\Delta P)$ with respect to the decision variable $h$ yields
$$2h\left(500^2\right)\operatorname{var}(\Delta F) - 2\left(500 f\right)\operatorname{cov}(\Delta S, \Delta F) = 0,$$
or a risk-minimizing optimal hedge of
$$h^* = \frac{f\operatorname{cov}(\Delta S, \Delta F)}{500\operatorname{var}(\Delta F)}.$$
This is a positive number of contracts, since $S_t$ and $F_{t,T}$ would move together; recall that at maturity $T$ of the futures contract, $S_T = F_{T,T}$. $h^*$ can be estimated by substituting in the sample estimates of the covariance in the numerator and of the variance in the denominator. It can also be estimated through the following linear regression employing the OLS method:
$$\Delta S = a + b\,\Delta F + e$$
where $e$ is the usual residual error that is uncorrelated with $\Delta F$. We run this regression and the results are shown in Table 3.2. Theoretically, $b = \operatorname{cov}(\Delta S, \Delta F)/\operatorname{var}(\Delta F) = (500/f)\,h^*$. (Recall that earlier in the chapter, for the two-variable linear regression, $b = \operatorname{cov}(X_i, Y_i)/\operatorname{var}(X_i)$.) The OLS estimate of $b$ is shown in Table 3.2.
⁶ One of the earliest studies to highlight the use of least squares regression in optimal hedging is Louis H. Ederington, (1979), The Hedging Performance of the New Futures Markets, Journal of Finance, Vol 34, No 1, 157-170.
Table 3.2
Regression of Change in Nikkei Index (SPOTCHANGE) on Change in Nikkei Futures Price (FUTCHANGE): $\Delta S = a + b\,\Delta F + e$

Dependent Variable: SPOTCHANGE
Method: Least Squares
Sample: 2 30
Included observations: 29

Variable     Coefficient   Std. Error   t-Statistic   Prob.
FUTCHANGE       0.715750     0.092666      7.723968   0.0000
C               4.666338    24.01950       0.194273   0.8474

R-squared             0.688436     Mean dependent var       1.087586
Adjusted R-squared    0.676897     S.D. dependent var     227.5159
S.E. of regression  129.3249       Akaike info criterion   12.62900
Sum squared resid  451573.2       Schwarz criterion        12.72330
Log likelihood     -181.1206      Hannan-Quinn criter.     12.65854
F-statistic          59.65968     Durbin-Watson stat        2.706941
Prob(F-statistic)     0.000000
From Table 3.2, $\hat{b}$ is 0.71575. With a ¥10 billion portfolio value and the spot N225 index on 1 September 1999 at 17,479, $f = 10\text{ billion}/17479 = 572{,}115$. The number of futures contracts to short in this case is estimated at
$$h^* = \hat{b} f/500 = 0.71575 \times 572{,}115/500 \approx 819 \text{ N225 futures contracts}.$$
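The hedge computation above can be reproduced directly with the chapter's numbers:

```python
# Risk-minimizing number of N225 futures contracts to short: h* = b-hat * f / 500.
V = 10e9           # portfolio value, Yen
S = 17479.0        # spot N225 index, 1 Sep 1999
b_hat = 0.71575    # OLS slope of delta-S on delta-F (Table 3.2)
f = V / S          # scale factor of portfolio value to index level
h_star = b_hat * f / 500.0
print(round(h_star))   # 819 contracts, as in the text
```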
3.9
PROBLEM SET
3.1 Suppose $Y_t$ is the excess stock return at month $t$, and $X_t$ is the excess market return at time $t$. Suppose we run an OLS regression
$$Y_t = a + bX_t + e_t$$
on a sample of size 60. Assume normally distributed returns.
(i) Find the alpha and the beta estimates of the stock using the following sample data:
$$\sum_{t=1}^{60} X_t Y_t = 5.0\times 10^{3}; \qquad \bar{X} = 0.005; \qquad \sum_{t=1}^{60} X_t^2 = \ldots$$
(ii) Suppose you had instead run an OLS regression
$$Y_t = c + dZ_t + n_t$$
where $Z_t$ is another factor variable, e.g. market trading volume. If OLS …

… $s^2 = \frac{1}{n-1}\sum_{k=1}^{n}\left(x_k - \bar{x}\right)^2$. Find the expected value of $s^2$ and …

(i) Find $\bar{X}$ and $\bar{Y}$ for the following data:

$X_i$:  2   2.5   3   3   4   5   4.5   6   7   8
$Y_i$:  3   4   4.5   5   6.5   7   8   8.5   9   9.5
… in the following equation (involving $Y$, $X$, and $X^2$) using OLS? Do you need to assume anything about the data?
3.7 You hypothesize that demand for a particular credit card is linearly related to income. In using OLS of demand on income to estimate the coefficients of the regression model, assuming the hypothesis is true, would you attempt to sample the card subscription patterns or demand using a survey of people from a homogeneous income group or from a wide range of income groups? Explain.
3.8 A well-diversified stock portfolio has \$40m value and a beta of 1.22 relative to the S&P 500. The portfolio manager anticipates a bearish horizon. Instead of liquidating, he sells $N$ contracts of S&P 500 stock index futures. Suppose 1 futures contract = \$500 × futures price in points, or notional value. Suppose the S&P 500 notional value is now 1500 points. Suppose the rate of change of the S&P 500 index is the same as the rate of change of the futures index. What is $N$ if the manager wants to hedge perfectly?
3.9 Given the market model of stock returns $r_{it} = \alpha_i + \beta_i r_{mt} + e_{it}$, where $e_{it}$ is i.i.d. $N(0, \sigma_e^2)$, suppose the OLS estimates $\hat\alpha_i$ and $\hat\beta_i$ are found. Given sample observations $\{r_{it}\}$ and $\{r_{mt}\}$ for $t = 1, 2, 3, \ldots, T$, and $\hat\alpha_i$, $\hat\beta_i$, show the appropriate formulae for computing the unbiased estimates of
(i) $\sigma_e^2$
(ii) the conditional variance $\operatorname{var}(r_{it}|r_{mt})$
(iii) the (unconditional) variance of $r_{it}$
FURTHER RECOMMENDED READINGS
[1] Russell Davidson and James G. MacKinnon, (1993), Estimation and
Inference in Econometrics, Oxford University Press.
[2] Damodar N. Gujarati, (1995), Basic Econometrics, 3rd edition,
McGraw-Hill.
[3] Jack Johnston and John DiNardo, (1997), Econometric Methods, 4th edition, McGraw-Hill.
Chapter 4
MODEL ESTIMATION
APPLICATION: CAPITAL ASSET PRICING MODEL
Key Points of Learning
Market model, Capital asset pricing model, Beta, Alpha, Systematic risk, Unsystematic risk, Security Characteristic Line, Test of Zero-Beta CAPM, Jensen measure, Treynor measure, Sharpe measure, Appraisal ratio, Market timing
4.1
MARKET MODEL
Suppose $r_{it}$ is the return rate of stock $i$ at time $t$, and $r_{mt}$ is the market portfolio return rate (or alternatively the return rate of a market index) at time $t$. If $r_{it}$ and $r_{mt}$ are bivariate normally distributed (all stock returns being MVN is sufficient to give this bivariate relationship), then the conditional distribution of $r_{it}|r_{mt}$ is normal⁷
with (conditional) mean and (conditional) variance
$$E\left(r_{it}\,\middle|\, r_{mt}\right) = E(r_{it}) + \frac{\sigma_{im}}{\sigma_m^2}\left(r_{mt} - E(r_{mt})\right)$$
$$\operatorname{var}\left(r_{it}\,\middle|\, r_{mt}\right) = \sigma_i^2 - \frac{\sigma_{im}^2}{\sigma_m^2} \tag{4.2}$$
where $\sigma_{im} = \operatorname{cov}(r_{it}, r_{mt})$, $\sigma_m^2 = \operatorname{var}(r_{mt})$, and $\sigma_i^2 = \operatorname{var}(r_{it})$. Since the conditional mean is linear in $r_{mt}$, we may write the market model regression
$$r_{it} = a + b\,r_{mt} + e_{it} \tag{4.3}$$
where
$$a = E(r_{it}) - \frac{\sigma_{im}}{\sigma_m^2}E(r_{mt}), \qquad b = \frac{\sigma_{im}}{\sigma_m^2} \tag{4.4}$$
and $E\left(e_{it}\,|\,r_{mt}\right) = 0$. The residual variance is then
$$\operatorname{var}(e_{it}) = \sigma_e^2 = \sigma_i^2 - \frac{\sigma_{im}^2}{\sigma_m^2},$$
which is (4.2).
⁷ The market model does not necessarily imply the Sharpe-Lintner CAPM, as the parameter $a$ is not necessarily constrained to be the CAPM intercept. Nevertheless, $a$ is a constant that can possibly follow that constraint. CAPM does not necessarily imply the market model, as quadratic utility, and not MVN, can be a sufficient condition for CAPM. But MVN is a sufficient condition both for CAPM and for the market model.
4.2
The Sharpe-Lintner CAPM imposes the restriction
$$a = r_f\left(1 - b\right)$$
where $r_f$ is the riskfree rate that is constant. This condition is not exactly (4.4), since $r_f$ is neither required by nor appears in (4.4). However, it can be thought of as a special case of (4.4). Then, imposing this theoretical CAPM restriction, (4.3) becomes
$$r_{it} - r_{ft} = b\left(r_{mt} - r_{ft}\right) + e_{it} \tag{4.5}$$
where we have added time indexing to the riskfree rate in order to allow the rate to vary over time. The familiar CAPM equation we usually see is the expectation condition (take expectations of (4.5)):
$$E(r_{it}) - r_{ft} = b\left[E(r_{mt}) - r_{ft}\right] \tag{4.6}$$
which holds for each time period $t$. In other words, this CAPM is actually a single-period model where the cross-sectional expected returns of stocks are related to the excess expected market portfolio return. It has become common, for empirical as well as some theoretical reasons, including assuming stationary return processes, to treat the estimation and testing of CAPM in the time series version (4.5) rather than just the expectation condition above.
In (4.5), the relationship applies for every stock $i$ in the economy, so it is convenient to write $b$ as $b_i$ (still a constant) to indicate the association of a different $b_i$ with each different stock $i$. $b_i$ is called the beta of stock $i$. Thus,
$$\hat{b}_i = \frac{\sum_{t=1}^{T}\left(r_{mt} - \bar{r}_m\right)\left(r_{it} - \bar{r}_i\right)}{\sum_{t=1}^{T}\left(r_{mt} - \bar{r}_m\right)^2}$$
where $\bar{r}_x = \frac{1}{T}\sum_{t=1}^{T} r_{xt}$. Since the classical OLS conditions are met, $\hat{b}$ is BLUE, and it estimates $\sigma_{im}/\sigma_m^2$.
A slightly more general specification adds an intercept:
$$r_{it} - r_{ft} = a_i + b_i\left(r_{mt} - r_{ft}\right) + e_{it} \tag{4.7}$$
where $E\left(e_{it}\,|\,r_{mt}\right) = 0$ for each $i$ and $t$. Since $r_{ft}$ is supposed to be a constant at $t$, $e_{it}$ is independent of $r_{ft}$, and thus also of $(r_{mt} - r_{ft})$. This version is slightly more general and has an added advantage as follows. It involves a regression of the excess stock $i$ return rate $r_{it} - r_{ft}$ on the excess market return rate $r_{mt} - r_{ft}$, with intercept $a_i$. $a_i$ is also called the alpha of stock $i$. It is theoretically 0 in equilibrium, but could turn out positive or negative in an actual regression. The interpretation then becomes one of financial performance:
$a_i > 0$: the stock has performed better than the CAPM equilibrium prediction;
$a_i < 0$: the stock has underperformed relative to the CAPM equilibrium prediction.
sense of changing its distribution so that if there is some random break in
between, the sampling moments computed for 5 years will be more robust
(less prone to deviation from the true underlying distribution during the
sample period) and make more sense than sampling moments across 10 years.
Another example is when the underlying distribution is stationary, but
conditional distribution changes dramatically such as 5 years of recession
followed by 5 years of boom. In such a case, taking a sample from the entire
10 years to model a situation of boom may provide incorrect inferences. Some
studies also recommend adjustments to the estimation of beta to minimize
sampling errors. For details, see Blume (1975).9
4.3
ESTIMATING BETA
Weekly stock return data for Chuan Hup (marine sector) and for the market index are collected over the sampling period 9/10/2000 to 8/25/2002 from the Singapore Stock Exchange. The market index is represented by the Singapore Straits Times Index, comprising large-capitalization stocks from the main listing (the number as well as the constituent stocks are updated from time to time). The ST Index return proxies for the market portfolio return. The Singapore Government Treasury bill 3-month rates of return, which serve as a proxy for the riskfree rate, are also collected from Datastream.
To compute the excess weekly return on a stock, we subtract the weekly riskfree return rate from the weekly stock return. Ideally we should use the return of a Treasury bill with one week left till maturity as the weekly riskfree rate. However, short-term T-bill rates such as 3-month rates are more easily available, and we may use these as approximations. The interest rates are typically quoted on a per annum basis, so we need to convert them to a weekly basis via $\ln\left(1 + r_{\text{weekly}}\right) = \frac{1}{52}\ln\left(1 + r_{\text{p.a.}}\right)$.
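The per-annum-to-weekly conversion can be sketched as follows; the 2% p.a. quote is an arbitrary illustration, not a rate from the text's sample:

```python
import math

# Convert a per annum interest rate quote to a weekly rate using
# ln(1 + r_weekly) = (1/52) ln(1 + r_pa).
def weekly_rate(r_pa: float) -> float:
    return math.exp(math.log(1.0 + r_pa) / 52.0) - 1.0

r_w = weekly_rate(0.02)   # a 2% p.a. quote, chosen only for illustration
```

This geometric conversion gives a slightly smaller number than the naive division $r_{\text{p.a.}}/52$, since compounding is accounted for.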
The linear regression model (4.7) is employed to estimate alpha and beta:
$$r_{it} - r_{ft} = a_i + b_i\left(r_{mt} - r_{ft}\right) + e_{it}.$$
The case of Chuan Hup Limited (CHL) is illustrated as follows. The dependent variable is the weekly excess return rate, denoted CHL_EXC_RET. The formulae for the various reported statistics in Table 4.1 are explained as follows.
The sample sequence is from 2 to 103, thus yielding a sample size of 102 (Included observations). The number of regressors is $k = 2$, as shown by the explanatory variables in the Variable column, viz. C and MKT_EXC_RET. With $X_t$ denoting the market excess return regressor, the standard error of the intercept estimate is
$$\hat\sigma_e\sqrt{\frac{1}{T} + \frac{\bar{X}^2}{\sum_{t=1}^{T}\left(X_t - \bar{X}\right)^2}}$$
and the standard error of the slope estimate is
$$\frac{\hat\sigma_e}{\sqrt{\sum_{t=1}^{T}\left(X_t - \bar{X}\right)^2}}.$$
The SSR, Sum squared resid, is $\text{SSR} = \sum_{t=1}^{T}\hat{e}_t^2$, and the S.E. of regression is $\hat\sigma_e = \sqrt{\text{SSR}/(T-2)}$. The S.D. of the dependent variable is $\sqrt{\frac{1}{T-1}\sum_{t=1}^{T}\left(Y_t - \bar{Y}\right)^2}$. The F-statistic in the table is
$$F_{k-1,\,T-k} = \frac{R^2/(k-1)}{\left(1 - R^2\right)/(T-k)}.$$
For the case $k = 2$, the t-statistic for MKT_EXC_RET is also the square root of the $F_{1,T-k}$ statistic, where the first degree of freedom of the F-statistic is one, i.e. $t_{T-k}^2 = F_{1,\,T-k}$. This result does not generalize to $k > 2$.

Table 4.1
Regression Results of $r_{it} - r_{ft} = a_i + b_i\left(r_{mt} - r_{ft}\right) + e_{it}$
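The relationship $t^2 = F_{1,T-k}$ for $k = 2$ can be checked numerically: both statistics reduce to $(T-2)\,\text{ESS}/\text{RSS}$. The sketch below uses an illustrative ten-point data set, not the CHL sample:

```python
import math

# For k = 2 the squared slope t-statistic equals the F-statistic.
X = [2.0, 2.5, 3.0, 3.0, 4.0, 5.0, 4.5, 6.0, 7.0, 8.0]
Y = [3.0, 4.0, 4.5, 5.0, 6.5, 7.0, 8.0, 8.5, 9.0, 9.5]
T = len(X)
X_bar, Y_bar = sum(X) / T, sum(Y) / T
Sxx = sum((x - X_bar) ** 2 for x in X)
b_hat = sum((x - X_bar) * (y - Y_bar) for x, y in zip(X, Y)) / Sxx
a_hat = Y_bar - b_hat * X_bar
fitted = [a_hat + b_hat * x for x in X]
ESS = sum((yf - Y_bar) ** 2 for yf in fitted)
RSS = sum((y - yf) ** 2 for y, yf in zip(Y, fitted))
R2 = ESS / (ESS + RSS)

F = (R2 / 1) / ((1 - R2) / (T - 2))            # F_{1, T-2}
t = b_hat / math.sqrt((RSS / (T - 2)) / Sxx)   # slope t-statistic
```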
4.4
INTERPRETATION OF REGRESSION RESULTS
Chuan Hup Limited (CHL) is a firm that has substantial interests in the marine business, among others. We employ all 2 years of data, although a case can be made for using a slightly smaller sample size, as was noted earlier.
What is the OLS estimate of alpha? $\hat{a}_i = \overline{r_i - r_f} - \hat{b}_i\,\overline{r_m - r_f}$, where the bar denotes the sampling average. From Table 4.1, it is seen that the OLS estimate of alpha is 0.0032, but the p-value of its t-statistic is 0.24, which means that we cannot reject $H_0: a = 0$ at significance levels up to 20% for a 2-tailed test. The estimate $\hat{a}$ appears not to be significantly different from zero. Thus the alpha is positive (which would indicate a well-performing stock), but it is not significant.
The OLS estimate of beta is 0.624. Thus the stock is not well-diversified (or else its beta would be close to 1), and it is positively correlated with, but not strongly sensitive to, market movements. Its t-statistic is 7.755 with a p-value that is essentially zero, so beta is certainly significantly positive. The F-statistic, with degrees of freedom $k-1 = 1$ and $T-k = 102-2 = 100$, is 60.143. Notice that the higher the coefficient of determination $R^2$, the higher the F-value. This is a test of the null $H_0: a = b = 0$. Thus a good fit with a reasonably high $R^2 = 0.376$ implies that $a$ and $b$ fit well and are unlikely to both be zero. Therefore $H_0$ is rejected, since the p-value for $F_{1,100}$ is less than 0.00001. The estimate of stock $i$'s systematic risk is
$$\hat{b}_i^2 \times \frac{1}{T-1}\sum_{t=1}^{T}\left(r_{mt} - \bar{r}_m\right)^2.$$
The estimate of stock $i$'s unsystematic risk is $\hat\sigma_e^2 = \text{RSS}/(T-2)$. From the standard error of regression $\hat\sigma_e$ and the t-value …
the intercept as alpha, and indications of unsystematic risks as dispersions of the returns about the fitted line, the stock's security characteristic line (SCL). We produce such a plot in Figure 4.1. The stock's SCL should not be confused with the market's security market line (SML), which is represented by a graph of expected returns versus their corresponding betas. We also plot the estimated (or fitted) residuals of the SCL in Figure 4.2.
Figure 4.2
Estimated Residuals
4.5
PERFORMANCE MEASURES

The Treynor measure is

    E(r_pt − r_ft) / b_p ,

where subscript p denotes association with a portfolio. Theoretically, in equilibrium when there is no abnormal performance, as in a zero Jensen measure, it is equivalent to the expected excess market portfolio return rate. Therefore, this measure shows whether a portfolio is performing better than, equal to, or worse than the market portfolio.
It is estimated using (r̄_p − r̄_f)/b̂_p. It can be shown that if the Jensen measure is zero, then

    E(r_pt − r_ft)/b_p = E(r_mt − r_ft) .
The above performance measures are useful for well-diversified portfolios, but could also be interpreted for individual stocks. For portfolios that are not well-diversified, their total risk σ_p becomes important. The Sharpe measure or Sharpe ratio

    E(r_pt − r_ft) / σ_p

shows how well the portfolio is performing relative to the capital market line with slope

    E(r_mt − r_ft) / σ_m .

Comparing the two,

    E(r_pt − r_ft)/σ_p − E(r_mt − r_ft)/σ_m ≷ 0 (uncertain)
⇔ E(r_pt − r_ft) − (σ_p/σ_m) E(r_mt − r_ft) ≷ 0 ,

while, since b_p = ρ_pm σ_p/σ_m, the Jensen measure

    a = E(r_pt − r_ft) − ρ_pm (σ_p/σ_m) E(r_mt − r_ft) ≷ 0 (uncertain).
Thus, there is also some relationship between the Sharpe performance measure and the other two measures. All the measures identify superior performance consistently with one another. The Sharpe measure is estimated using (r̄_p − r̄_f)/σ̂_p. There is also the appraisal ratio a_p/σ_e, which is estimated by â_p/σ̂_e, where σ̂_e is the standard error of the regression.
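The Treynor, Jensen, Sharpe, and appraisal measures can be sketched on hypothetical monthly data as follows (all parameter values below are illustrative assumptions, not estimates from any actual fund):

```python
import numpy as np

rng = np.random.default_rng(1)
T = 60
rf = np.full(T, 0.003)                        # hypothetical monthly riskfree rate
rm = rf + 0.005 + rng.normal(0.0, 0.04, T)    # hypothetical market returns
rp = rf + 0.002 + 0.9 * (rm - rf) + rng.normal(0.0, 0.02, T)  # portfolio returns

ex_p, ex_m = rp - rf, rm - rf
b_p = np.cov(ex_p, ex_m, ddof=1)[0, 1] / np.var(ex_m, ddof=1)  # portfolio beta
a_p = ex_p.mean() - b_p * ex_m.mean()         # Jensen measure (alpha)
resid = ex_p - a_p - b_p * ex_m
sigma_e = np.std(resid, ddof=2)               # unsystematic (residual) risk

treynor = ex_p.mean() / b_p                   # reward per unit of beta
sharpe = ex_p.mean() / np.std(rp, ddof=1)     # reward per unit of total risk
appraisal = a_p / sigma_e                     # alpha per unit of residual risk

print(treynor, sharpe, a_p, appraisal)
```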
Banz, Rolf W., (1981), The Relationship Between Return and Market Value of Common Stocks, Journal of Financial Economics, 9, 3-18.
Figure 4.3  Alphas and Betas in CAPM regressions using monthly returns of 10 Sized-Portfolios in Sampling Period January 2002 to December 2006
[Chart of ALPHA and BETA for portfolios 1 through 10; vertical axis from -0.4 to 1.6]
Fung, William, and David A Hsieh, (2001), The Risk in Hedge Fund Strategies:
Theory and Evidence from Trend Followers, Review of Financial Studies, Vol 14,
No 2, 313-342.
performances. Research in hedge fund strategies has been especially voluminous in recent years.
Market timing refers to the ability of fund managers to shift investment funds into the market portfolio when the market is rising, and to shift out of the stock market into money assets or safe Treasury bonds when the market is falling, particularly if the market falls below the riskfree return. If a particular fund can perform in this way, then its returns profile over time will look as follows. See Figure 4.4.
Figure 4.4  Excess Fund Return
[Plot of the fund's excess returns over time: the fund switches to the riskfree asset when the market falls below r_f]
Note the nonlinear profile of the returns realizations over time. Suppose we represent the above on a different set of axes by squaring the X variable. See Figure 4.5.
It can be seen that the existence of market timing ability in a fund portfolio will show up as a slope when we regress excess fund return on the square of excess market return, as Figure 4.5 indicates. If there is no market timing ability, there will be as many points in the negative 4th quadrant, and the slope of a fitted line will be flat or close to zero. This idea was first proposed by Treynor and Mazuy (1966)12. Many important subsequent studies include
12
Treynor J. L., and Kay Mazuy, (1966), Can Mutual Funds Outguess the Market? Harvard Business Review, Vol 43.
Merton (1981)13. If we employ a multiple linear regression using another explanatory variable, which is the square of the market excess return,

    r_it − r_ft = a_i + b_i (r_mt − r_ft) + c_i (r_mt − r_ft)² + e_it ,

where e_it is independent of r_mt, then market timing ability in a fund will show up as a significantly positive ĉ_i. Unfortunately, many mutual funds that were studied did not display such market timing abilities.
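A minimal sketch of this quadratic timing regression on simulated data (the fund's convex payoff and every parameter value below are hypothetical):

```python
import numpy as np

rng = np.random.default_rng(2)
T = 120
ex_m = rng.normal(0.004, 0.045, T)             # hypothetical excess market returns
# a hypothetical timing fund: its payoff is convex in the market return
ex_f = -0.001 + 0.5 * ex_m + 2.0 * ex_m ** 2 + rng.normal(0.0, 0.01, T)

# regress excess fund return on excess market return and its square
X = np.column_stack([np.ones(T), ex_m, ex_m ** 2])
coef = np.linalg.lstsq(X, ex_f, rcond=None)[0]
c_hat = coef[2]                                # timing coefficient c_i

print(c_hat)   # a significantly positive c_hat indicates timing ability
```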
Figure 4.5
[Excess fund return plotted against squared excess market return: the fund switches to the riskfree asset when the market falls below r_f]
4.6
PROBLEM SETS
4.1 A cross-sectional regression of 100 portfolio average returns R̄_i on their estimated betas yields OLS estimates â = 0.004 and b̂ = 0.004, with

    Σ_{i=1}^{100} ê_i² = 0.00245  and  X'X = ( 100  120 )
                                              ( 120  160 ) .

(i) Find the t-statistics of the OLS estimators â, b̂ under the null H0: a = 0, HA: a ≠ 0; and H0: b = 0, HA: b ≠ 0.
(ii) What is the average return of all the portfolio average returns R̄_i's?
(iii) According to CAPM, how would you interpret â = 0.004 and b̂ = 0.004?
4.2 A researcher runs the following regression of stock i returns rit on
market portfolio returns rmt :
rit = a + b rmt + eit
where eit is a residual noise that is i.i.d. and independent of rmt.
(i) Is eit independent of rit ?
FURTHER RECOMMENDED READINGS
[1] Bodie Z., Alex Kane, and A.J. Marcus, (1999), Investments, 4th edition,
Irwin McGraw Hill.
[2] Merton, R.C., (1981), On Market Timing and Investment Performance, I:
An Equilibrium Theory of Value for Market Forecasts, Journal of
Business, Vol 54, 363-406.
[3] Sharpe, William, (1964), Capital Asset Prices: A Theory of Market
Equilibrium under Conditions of Risk, Journal of Finance, 19, 425-442.
Chapter 5
CONSTRAINED REGRESSION
APPLICATION: COST OF CAPITAL
Key Points of Learning
Constrained regression, Net present value analysis, Internal rate of return,
Capital budgeting, Weighted average cost of capital, Levered beta, Levered
cost of equity, Dividend growth model, Residual income valuation model,
Earnings forecast, Security market line, Capital market line, Excess market
return, Fair rates of return to Utilities
In this chapter, we extend the usage of CAPM to the problem of estimating the cost of capital for funding risky projects. This involves estimating betas, as shown in the last chapter, as well as estimating the market risk premium. The latter is a bit trickier and sometimes requires auxiliary regressions in which the intercept is constrained to be zero, i.e. regression through the origin. Although constrained regression is more general and can apply to constraints on any set of coefficients in a linear regression equation, we consider only the case of regression through the origin here.
5.1
[Figure: sample points (X1,Y1), (X2,Y2), (X3,Y3), the unconstrained OLS line (dotted), and the constrained least squares line L through the origin 0 (bold)]
We reconsider the sample observations of X and Y variables in Figure 3.5 of
chapter 3. Instead of the OLS line without constraint (which is now shown as
the dotted line), we seek the least squares line that must pass through the
origin 0. This line is the bold line L as seen on the graph.
The Linear (bivariate) Regression Model is

    Ỹ_i = b X_i + ẽ_i ,   i = 1, 2, …, N,    (5.1)

for a sample of size N, where b is the constant slope, and there is zero intercept. As usual, Y_i is the dependent variable and X_i is the explanatory variable; e_i is the residual noise. As in Chapter 3, we assume conditions (A1), (A2), (A3), (A4), and (A5) hold.
Employing the same minimum least squares criterion,

    min_{b̂} Σ_{i=1}^N ê_i² ,  where ê_i = Y_i − b̂ X_i ,

the first-order condition is

    d( Σ_{i=1}^N ê_i² )/db̂ = −2 Σ_{i=1}^N X_i (Y_i − b̂ X_i) = 0 .    (5.2)

Solving,

    b̂ = Σ_{i=1}^N X_i Y_i / Σ_{i=1}^N X_i² .    (5.3)
Substituting Ỹ_i = b X_i + ẽ_i into (5.3),

    b̂ = Σ_{i=1}^N X_i (b X_i + ẽ_i) / Σ_{i=1}^N X_i² = b + Σ_{i=1}^N X_i ẽ_i / Σ_{i=1}^N X_i² .

Hence

    E(b̂) = b ,  and  var(b̂) = var( Σ_{i=1}^N X_i ẽ_i / Σ_{i=1}^N X_i² ) = σ² / Σ_{i=1}^N X_i² .
From equation (5.2), it is seen that if ê_i = Y_i − b̂ X_i is the fitted residual, then

    Σ_{i=1}^N ê_i X_i = 0 .

However, (1/N) Σ_{i=1}^N ê_i is not necessarily 0. This can be seen by taking the sum over ê_i = Y_i − b̂ X_i :

    Σ_{i=1}^N ê_i = Σ_{i=1}^N Y_i − b̂ Σ_{i=1}^N X_i .

If Σ_{i=1}^N ê_i = 0, then b̂ = Σ_{i=1}^N Y_i / Σ_{i=1}^N X_i , which is of course not true in general.
What is the probability distribution of b̂ ? Using (A5), since b̂ is a linear combination of the ẽ_i's that are normally distributed, b̂ is also normally distributed:

    b̂ ~ N( b , σ² / Σ_{i=1}^N X_i² ) .
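A quick numerical sketch of regression through the origin (the slope 1.5 and the data design are arbitrary assumptions). It verifies that the fitted residuals are orthogonal to X by the first-order condition, but do not generally average to zero:

```python
import numpy as np

rng = np.random.default_rng(3)
N = 200
x = rng.uniform(0.5, 2.0, N)
y = 1.5 * x + rng.normal(0.0, 0.3, N)     # true model has zero intercept

b_hat = np.sum(x * y) / np.sum(x * x)     # constrained OLS slope
e_hat = y - b_hat * x                     # fitted residuals

# First-order condition: residuals are orthogonal to X ...
print(np.sum(e_hat * x))                  # numerically ~ 0
# ... but they need not sum to zero, unlike OLS with an intercept
print(np.mean(e_hat))
```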
5.2
CAPITAL PROJECTS
the project is profitable or not, or at all financially feasible. This is the area of
capital budgeting.
The central idea in finance when it comes to capital budgeting and project
evaluation is that the cost of capital of a project must be commensurate with
its market risk. As we have seen earlier, the market will require higher
expected return for higher risk. Therefore if we are to fund our project with
market capital, the market will require a certain expected return based on the
risk of our project. This expected return is equivalent to our cost of capital.
For example, if the market requires 10% p.a. return for 10 years of loan or
debt on borrowing to finance a project, then we have to apply 10% p.a. as our
cost of capital. This 10% rate is then used as a discount rate to compute the
Net Present Value (NPV) of the project. Obviously, only projects with
positive NPVs are financially feasible. And given capital budgets or
constraints, only those projects with the highest NPVs are selected. Thus,
estimating cost of capital is closely connected to (1) whether it is financially
justifiable to fund a particular capital project, and (2) which capital project
should receive priority in funding with limited capital available (capital
budgeting).
Many fallacies were committed with respect to (1) and (2). For example, a large firm may be able to borrow funds from a bank based on a general credit facility at a prime rate of 10% p.a. If a division in the firm proposed a project with a market risk-adjusted (or risk-assessed) cost of 15% p.a., should it be funded based on the firm's 10% cost? You may suppose the NPV at 15% cost is negative, but the NPV at 10% cost is positive. If in another firm, division A has projects with internal rate of return (IRR) of 12% p.a. while division B has projects with IRR of only 8% p.a., should all the firm's borrowing capacity at 10% p.a. cost be allocated to fund only division A's projects?
These are deceptively simple questions. The answers are negative. In the first case, the firm's credit line at 10% cost is based on the firm's weighted average cost of capital (WACC) and its weighted average risks of all existing projects, and possibly new projects with similar average risks. Should the firm decide to fund a new marginal project at 15% risk-adjusted cost with a negative NPV, its firm value will decrease, and its future average cost will rise from the current 10%. Thus, a firm should only fund a project with positive NPV using the project's own risk-adjusted cost, and not the firm's lower overall WACC.
In the second case, it should again be noted that the firm's overall borrowing cost of 10% p.a. is likely an average of the costs of capital in both operating and ongoing divisions. Also note that the IRR does not reflect anything about the risk of the projects in the division. IRR is simply the discount rate that equates a project's present value of cash inflows with its present value of outflows. Some of division A's projects may be highly risky with risk-adjusted costs of over 10%. On the other hand, some of division B's projects may have risk-adjusted costs of less than 10%. The overall funding should be allocated to both divisions' projects with risk-adjusted costs lower than their IRRs (here we assume there is no problem of cashflow ambiguity or multiple solutions in the computation of the IRRs), i.e. with positive NPVs.
5.3
NET PRESENT VALUE ANALYSIS
Net present value (NPV) analysis is the key method to assess the worth of a
new capital project proposal. It could be a replacement plant or a new
investment in an ongoing firm, or it could be in project financing where
oftentimes the project will have a terminal time when it would be sold off.
In NPV analysis, there are two key inputs: Expected cash flows and the
risk-adjusted discount rate(s) over the future horizon of the cashflows. The
steps in the analysis may be summarized as follows.
(a) Forecast the nominal cash flows.
Sometimes this may require forecasts of inflation rates. The inflation rates are utilized to estimate subsequent years' nominal revenues by multiplying present revenue by (1 + expected inflation rate), if no real growth is anticipated. If real growth is expected, then this must also be used to gross up the nominal revenues.
(b) Ascertain the currency of the cashflows and anticipate the risks that come
with currency conversions. This foreign exchange risk can be hedged by
using currency derivatives.
(c) Ascertain the cost of capital or risk-adjusted discount rate r.
(d) Ascertain the initial capital outlay.
For example, a risky stream of T years of annual future (nominal) cashflows has a PV computed as its expected value C discounted at the nominal per annum risk-adjusted discount rate r. This annuity of $C from t = 1 till t = T has

    PV = C/(1+r) + C/(1+r)² + … + C/(1+r)^T = (C/r) [1 − 1/(1+r)^T]

using summation by geometric progression. If the cashflow is grossed up by an inflation component i, then the same risky stream of T years of annual future (nominal) cashflows has

    PV = C(1+i)/(1+r) + C(1+i)²/(1+r)² + … + C(1+i)^T/(1+r)^T = [C(1+i)/(r−i)] [1 − (1+i)^T/(1+r)^T].

One can also add a growth rate g to i.
Suppose for the annuity, we use the certainty equivalent cash flow X < C. Then the risk of the project is not reflected in the cash flows, but should be reflected in the lower risk-free discount rate k. Thus, PV = (X/k) [1 − 1/(1+k)^T] = (C/r) [1 − 1/(1+r)^T]. Therefore, the certainty equivalent X is related to C as follows:

    X = C (k/r) [1 − 1/(1+r)^T] / [1 − 1/(1+k)^T].

The NPV of the
project is then the PV less the initial capital outlay, or else less the present
value of capital outlays.
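The annuity, growing-annuity, and certainty-equivalent formulas above can be sketched as functions (the numerical inputs below are arbitrary illustrations):

```python
def annuity_pv(C, r, T):
    """PV of $C per year for T years at discount rate r."""
    return (C / r) * (1 - 1 / (1 + r) ** T)

def growing_annuity_pv(C, r, i, T):
    """PV of cashflows C(1+i)^t, t = 1..T, at discount rate r (r > i)."""
    return C * (1 + i) / (r - i) * (1 - ((1 + i) / (1 + r)) ** T)

def certainty_equivalent(C, r, k, T):
    """Certainty-equivalent cashflow X whose annuity PV at the riskfree
    rate k equals the risky annuity PV of C at rate r."""
    return C * (k / r) * (1 - 1 / (1 + r) ** T) / (1 - 1 / (1 + k) ** T)

C, r, k, T = 100.0, 0.10, 0.04, 10
pv = annuity_pv(C, r, T)
X = certainty_equivalent(C, r, k, T)
# X discounted at the riskfree rate k reproduces the same PV
print(pv, annuity_pv(X, k, T))
```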
5.4
CAPITAL STRUCTURE
When we consider firm-level share valuation, the following steps can be performed. Let $X = EBIT, expected earnings before interest and tax. Expected earnings after interest and tax charges are

    EAIT = (X − r_D D)(1 − t),

where r_D is the cost of debt, D is the debt level of the firm, and t is the firm's tax rate. This cashflow EAIT goes to equity holders, or to equity value E.
For a perpetual stream of cashflows, the cost of equity is

    r_E = EAIT/E = (X − r_D D)(1 − t)/E
or  r_E = [X(1−t) − r_D D(1−t)] / E .    (5.4)

Then the firm's weighted average cost of capital, WACC, is defined as r_A = X(1−t)/[D+E].
Let V = D + E, where V is the firm's total market value, assuming the firm has only debt and equity. Note that in this V = D + E equation, we do not consider any tax effect. Without any tax effect, the value of the firm should be the same whether it is leveraged or not, so we can write V = V_U for unlevered firm value V_U, ceteris paribus. From (5.4),

    r_A = [r_E E + r_D D(1−t)] / [D+E]
or  r_A = r_E E/[D+E] + r_D D(1−t)/[D+E]
or  r_A = r_E E/V + r_D (1−t) D/V .    (5.5)

(5.5) shows clearly the weighting in the overall cost of capital by E/V and D/V. For the debt part, it is the after-tax cost of debt r_D(1−t). Note that t is the marginal tax rate (marginal to this project or incremental taxable revenues, so that the entire analysis is to find the cost of the next funding dollar), and r_E and r_D are fund costs payable to suppliers of equity capital and of debt or loans. We assume competitive capital markets throughout.
Assuming perpetual flows, an unlevered firm's value is

    V_U = X(1−t)/r_U .    (5.6)
The total expected cashflow to equity and debt holders is X(1−t) + t r_D D. Note that with corporate tax, interest payments are tax-deductible, and this accounts for the second term above, which is called a tax shield. The levered firm value, ceteris paribus, is

    V_L = V_U + t D .    (5.7)

The levered cost of equity then satisfies

    r_E = r_U + (r_U − r_D)(1−t) D/E .    (5.8)

Analogously for betas,

    b_E = b_U + (b_U − b_D)(1−t) D/E ,    (5.9)

and when the debt beta b_D is zero,

    b_E = b_U [1 + (1−t) D/E] .    (5.10)

Equations (5.7) and (5.8) represent some of the ingenious results obtained in a series of papers by Modigliani and Miller, including the path-breaking paper in corporate finance: Franco Modigliani and Merton Miller, (1958), The Cost of Capital, Corporation Finance, and the Theory of Investment, American Economic Review, June, 261-297. (5.8) is sometimes called MM Proposition II.
The initial b_E could be obtained using the regression

    r_i − r_f = a + b_E (r_M − r_f) + e,

where r_i is the return rate on the levered firm's equity, constraining the intercept to zero if the unconstrained version produces a significant intercept different from 0.
As an illustration, suppose a levered firm with debt-equity ratio 0.1 and
marginal tax rate 40% produces an estimate of levered equity beta at 1.2. To
find a new beta supposing the firm intends to lever debt up to a ratio of 25%,
we can compute:
bU = 1.2/[1 + 0.6(0.1)] = 1.13
then new bE = 1.13 [1+0.6(0.25)] = 1.30.
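The unlevering and relevering computation in this example can be written out directly, using the zero-debt-beta levering relation b_E = b_U [1 + (1−t) D/E]:

```python
def unlever_beta(b_E, t, DE):
    """Unlevered (asset) beta from the levered equity beta, assuming zero debt beta."""
    return b_E / (1 + (1 - t) * DE)

def relever_beta(b_U, t, DE):
    """Levered equity beta at a new target debt-equity ratio."""
    return b_U * (1 + (1 - t) * DE)

b_U = unlever_beta(1.2, t=0.40, DE=0.10)      # 1.2/[1 + 0.6(0.1)] = 1.13
b_E_new = relever_beta(b_U, t=0.40, DE=0.25)  # 1.13[1 + 0.6(0.25)] = 1.30
print(round(b_U, 2), round(b_E_new, 2))
```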
Equations (5.5), (5.8), and (5.10) are useful in the computation of costs of capital for capital budgeting and project valuation purposes, especially in firms where there is leverage, provided at some point either r_U or r_E is known. Even with the CAPM method, we still need to estimate the market premium, which is the expected excess market return for the period in which the cost is to be computed.
There are two main approaches to this issue. The first is to estimate the
cost of equity (whether from levered or unlevered firms) directly by
estimating the cashflows that accrue to the equity. The second is to employ the
CAPM in its expected value form, and in that process, necessarily estimating
beta and the market risk premium. We shall now explain these approaches.
5.5
In what follows, we shall assume that market data are available, and that the market is efficient so that the CAPM holds. In a well-functioning market, a (listed) stock entitles its holder to a perpetual stream of after-tax15 dividends (random variables as at time now t = 0) starting next period t = 1, through to infinity. There is no terminal date unless the firm goes bankrupt. This infinite horizon is in contrast to project financing with a finite terminal date T, as seen in the earlier sections. Let the expected values (based on the current date) of these future dividends be

    D1, D2, D3, …, Dn, …

15
We ignore the small difference created by dividend tax rebates allowable for investors with personal income tax lower than imputed tax. So, all investors face the same tax rate on dividends and so would agree on the same discount formulation.
and the after-tax required rates of return (or risk-adjusted expected rates of return) on equity for each of the future dividends be

    R1, R2, R3, …, Rn, …

quoted on a per period basis. Then the price of the stock now in a rational efficient market is

    P0 = D1/(1+R1) + D2/(1+R2)² + D3/(1+R3)³ + … + Dn/(1+Rn)^n + … = Σ_{t=1}^∞ Dt/(1+Rt)^t .

If the required rates of return are constant at R and dividends grow at a constant rate g, so that Dt = D0 (1+g)^t, then

    P0 = Σ_{t=1}^∞ D0 (1+g)^t/(1+R)^t = D0 (1+g)/(R − g) = D1/(R − g),

where R > g. P0 = D1/(R − g) is called the Dividend Growth Model (or Gordon model). Rearranging, the cost of equity is

    R = D1/P0 + g .    (5.11)
Period   3      4      5      6      7      8      9       10      11
$Div     0.025  0.025  0.03   0.03   0.03   0.03   0.035   0.04    0.04
Growth   25%    0%     20%    0%     0%     0%     16.67%  14.29%  0%

Period   12     13     14     15     16     17     18      19      20     21      Average
$Div     0.04   0.04   0.04   0.05   0.05   0.045  0.045   0.045   0.045  0.05    0.037
Growth   0%     0%     0%     25%    0%     -10%   0%      0%      0%     11.11%  5.10%
Thus, the annual % growth rate of dividends is projected at 2 × 5.10% or 10.2% p.a. If the current stock price is $5 and the last annual dividend is $0.045 + $0.05 = $0.095, then using (5.11):

    R_j = (0.095 × 1.102)/5 + 10.2% = 12.3% p.a.

Using the DGM, the cost of equity is estimated at 12.3% p.a.
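This DGM estimate can be reproduced step by step:

```python
last_annual_div = 0.045 + 0.05   # sum of the last two semi-annual dividends, $0.095
g = 2 * 0.0510                   # annualized dividend growth, 10.2% p.a.
P0 = 5.0                         # current stock price

D1 = last_annual_div * (1 + g)   # expected dividend over the coming year
R = D1 / P0 + g                  # cost of equity from the dividend growth model
print(round(R * 100, 1))         # 12.3 (% p.a.)
```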
Note that sometimes DGM cannot be used because the stock has no steady
dividend record, so the estimation of dividend growth is not reliable. Or in a
volatile market, current stock price can change dramatically by the minute, so
that the expected dividend yield component (D1/P0) in (5.11) is unreliable
because of the highly variable denominator.
The DGM method is not the only method in the approach using direct
estimation of cashflows and discount rate. The Residual Income Valuation
model16 uses direct estimation or forecast of a firms earnings instead of
dividends.
    P0 = BV0 + Σ_{t=1}^∞ (Yt − r BVt−1)/(1+r)^t ,    (5.12)

where BVt is the book value of equity at time t from now, Yt is the earnings forecast (some averaging of the many earnings forecasts on active stocks made by analysts in the finance industry), and r is the risk-adjusted discount rate on equity earnings and also the charge on the book value of equity. Yt is abnormal in the sense that it is at least equal to or more than the capital charge r BVt−1. It can be shown that this model is equivalent to the DGM when BV0 is related to some discounted value of the excess of Yt over the dividend issue at t.
One advantage of (5.12) over (5.11) or other forms of dividend growth
model is that earnings forecasts are much more readily available from the
industry.17 They are intuitively more accurate than dividend forecasts since
expected dividends are derived from forecast earnings, and are discretionary
issues after considering new investment requirements of the firm and retained
earnings plough-back. The latter are much harder to forecast.18
16
See for example Ohlson, J., (1995), Earnings, book values, and dividends in equity
valuation, Contemporary Accounting Research, 11, 661-687; and also Feltham G.,
and J. Ohlson, (1995), Valuation and clean surplus accounting for operating and
financial activities, Contemporary Accounting Research, 11, 689-731.
17
See for example the data provider www.ibes.com or the I/B/E/S data.
18
There remain other issues. For a detailed discussion and further references, see
Kothari, S.P., (2001), Capital markets research in accounting, Journal of
Accounting and Economics, 31, 105-231.
5.6
The Security Market Line (SML) approach is based on the CAPM. Equation (5.13) below is the SML, and is typically represented as a line in a graph of expected return versus beta. It is oftentimes confused with the CAPM, which is not only the result in (5.13) but also the model, including the underlying assumptions and equilibrium conditions, which produces (5.13).

    R_j ≡ E(r_j) = r_f + β_j [E(r_m) − r_f] .    (5.13)
Suppose we use the recent past history of excess stock return rates (we suggest using the last 5 years or 60 months of returns data on the stock, the market, and the riskfree rate) versus excess market return rates to run OLS without intercept, i.e.

    r_jt − r_ft = β_j (r_mt − r_ft) + e_jt .

This is to estimate the beta β_j of stock j. Suppose we use

    r_{f,T+1} + β̂_j (1/T) Σ_{t=1}^T (r_mt − r_ft)

to estimate the required rate of return presently at T+1, where the sample history of returns over [1, T] was used. During certain historical sampling periods, the sample average excess market return (1/T) Σ_{t=1}^T (r_mt − r_ft) may be negative because the market was falling, and therefore it does not suitably represent the expected market risk premium E(r_{m,T+1}) − r_{f,T+1} that must be positive.
Ferson and Locke (1998)19 also pointed out in their study that estimating this
component of market risk premium in the CAPM or SML setup produces the
most critical errors, while errors due to beta estimation and even riskfree rate
assumption are minor. In some industry practice, the market risk premium,
E(rm) rf , is estimated at t using a long history of past realized premium that
averaged out to give more stable positive estimates.20 We can also approach
the estimation of market risk premium in an interesting econometric
framework.
19
Ferson, W.E., and D.H. Locke, (1998), Estimating the Cost of Capital through
Time: An Analysis of Sources of Error, Management Science, Vol 44, 4, 485-500.
20
Market risk premium estimates over long horizons were provided, for example, in
Ibbotson Associates, Stocks, Bonds, Bills and Inflation: 1987 Yearbook, Ibbotson
Associates Inc., Chicago.
To estimate a positive market risk premium, we employ an auxiliary regression as follows. First we look at the Capital Market Line (CML), also implied by the CAPM.
The CML is the line containing all possible portfolios of investors. Each portfolio is just a simple linear combination of 2 assets: the market portfolio and the riskfree asset. Any portfolio P on the CML or portfolio efficient frontier with expected return E(r_p) and risk σ_p satisfies the following equation:

    [E(r_p) − r_f]/σ_p = [E(r_m) − r_f]/σ_m .

The right-hand side is basically the Sharpe index or ratio. It is basically a reward-to-risk ratio, and the higher the number, the better is the performance of the portfolio. In CAPM theory, of course, all portfolios on the CML share the same reward-to-risk ratio (call it λ) in equilibrium.
Figure 5.1  The Capital Market Line
[Expected return E(r_p) plotted against risk σ_p; the CML starts at r_f and passes through the market portfolio at (σ_m, E(r_m))]
Putting in the time subscript to denote the relationship at each time point t, we have

    [E(r_mt) − r_ft]/σ_mt = λ > 0 .    (5.14)
Therefore, r_mt − r_ft = λ σ_mt + u_t, where u_t is assumed to be n.i.d. with zero mean. Now, r_mt and r_ft are the realized market return at t and the realized riskfree rate at t respectively. For the purpose of this study, they are monthly rates. σ_mt > 0 here is allowed to vary over time, so the left-side excess expected market return in (5.14) is effectively a conditional expected return.
Merton (1980)21 also suggested that the excess market return at time t may be estimated by

    E(r_mt) − r_ft = λ σ²_mt ,    (5.15)

where λ is a positive constant now denoting relative risk aversion, i.e. a larger λ implies the requirement of a higher risk premium against total market risk σ²_mt at time t.
The advantage of both of the above specifications, (5.14) and (5.15), of the market risk premium is that the OLS estimate of the expected market premium will not be negative. In either of the above specifications, σ²_mt is practically not observable, and so is estimated, following Merton (1980), from the squared monthly log market returns over the preceding twelve months:

    σ̂²_mt = (1/12) Σ_{k=1}^{12} [ln R_{m(t−k)}]² .

Now we can estimate λ using OLS linear regression through the origin, in

    r_mt − r_ft = λ σ̂^p_mt + u_t   (p = 1 or 2 depending on which of the above versions),

where u_t is assumed to be n.i.d., and u_t is not correlated with σ̂^p_mt.
21
Merton, R.C., (1980), On Estimating the Expected Return on the Market: An Exploratory Investigation, Journal of Financial Economics, 8, 323-361.
The estimate λ̂ > 0 is thus obtained. Finally, the market risk premium or the expected excess market return is computed as λ̂ σ̂^p_mt.
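A sketch of this constrained-regression estimation of λ on simulated data; for simplicity the market volatility σ_mt is taken as given rather than estimated from past squared returns, and every parameter value below is hypothetical:

```python
import numpy as np

rng = np.random.default_rng(4)
T = 240
sigma = 0.04 * np.exp(0.3 * rng.normal(size=T))   # time-varying market volatility
lam = 0.35                                        # true reward-to-risk ratio lambda
ex_m = lam * sigma + sigma * rng.normal(size=T)   # excess market returns: E = lam * sigma

# regression through the origin: lam_hat = sum(x*y)/sum(x*x), with x_t = sigma_t
lam_hat = np.sum(sigma * ex_m) / np.sum(sigma ** 2)

# fitted expected excess market return (market risk premium) for the latest month
premium = lam_hat * sigma[-1]
print(lam_hat, premium > 0)
```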
the project has expected ROE of 8% but risk-adjusted rate of 10%, do we still
embark on the project? No. It is all about positive NPV as we had discussed
earlier.
5.7
FAIR RATES OF RETURN TO UTILITIES
The CAPM methodology has been extensively used since the mid-1960s, and especially prior to the 1980s, to investigate the cost of capital to investments (or equivalently, the required rate of return) in regulated industries in the U.S., such as electric power, natural gas, insurance, utilities, telecommunications, and so on.22 The method has also been applied, sometimes with slight variation, to investigating required returns to potential investments in new technology or continuing investments in industry. Baldwin, Tribendis, and Clark (1984)23 studied this problem of the cost of continuing investments in the U.S. steel-making industry. This was of course an important question when there was so much cost competition from overseas production at that time.
As an example of this important application of finance, suppose a utility firm has a rate base or book equity value of $100m. It supplies an expected 10m units of power. The allowed fair rate of return (or cost of capital) is 5% p.a. A fair rate is one where return is commensurate with the (systematic) risks in the market, but also sufficient to maintain the credit standing of the firm and to be able to attract fresh capital. These are usually the stipulations of the court when consumers or consumer associations try to sue utilities to bring down rates. The last 2 conditions are more difficult and subjective in the rate-fixing process.
If expected costs are $30m per year, including depreciation and taxes (so as to maintain the firm's steady state), then it should set the per unit power price at

    {$30m + 0.05 × $100m} / 10m = $3.50.
Thus, one can see that the rate of $3.50 that consumers pay is very much
determined by the allowed fair rate of return of 5% p.a. Much of the finance
theory and econometrics in this chapter could be put to good use in
determining scientifically this fair rate.
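The rate computation above can be expressed as:

```python
rate_base = 100e6        # book equity value, $100m
fair_rate = 0.05         # allowed fair rate of return, 5% p.a.
expected_costs = 30e6    # annual costs, including depreciation and taxes
units = 10e6             # expected units of power supplied

# per unit price recovers costs plus a fair return on the rate base
price_per_unit = (expected_costs + fair_rate * rate_base) / units
print(price_per_unit)    # 3.5
```

A lower allowed fair rate feeds directly into a lower consumer price, which is why estimating this rate carefully matters.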
5.8
PROBLEM SET
5.1 Why is it necessary either for the government or the courts to set rates
22
See Myers, Stewart C., (1972), The Application of Finance Theory to Public Utility Rate Cases, Bell Journal of Economics and Management Science, Spring, 58-97.
23
Baldwin C.Y., J.J. Tribendis, and J.P. Clark (1984), The evolution of market risk
in the U.S. steel industry and implications for required rates of return, The Journal of
Industrial Economics, September, 73-98.
or prices per unit power or water in utilities, but not necessary for
setting prices charged on goods sold by private companies in general?
5.2 According to DGM, what would a high P/E ratio imply about the stocks
future earning prospects?
5.3 Suppose a steady firm has $100 million market value of assets paid for
by 10 million shares of equity funding, and in steady state the assets are
expected to generate earnings of $10 million a year forever, what is the
equilibrium market price per share if the cost of equity is 5% p.a.?
5.4 Suppose the same firm as in Q5.3 decides to retain 60% of its earnings
every year as internal financing, but could plough back this retained
earnings into new investments that provide only return of 5% p.a., what
would be its new price per share? Explain why. (Ignore tax, potential
bankruptcy, and temporary business cycles.)
5.5 Suppose for a firm with a constant dividend growth rate of 3% p.a., we
estimated the cost of equity to be 8% p.a., what is its fair share price if
the current dividend is $0.50 per share?
5.6 Comment if the SML model is consistent with DGM, and what if any, are
their key differences?
FURTHER RECOMMENDED READINGS
[1] Grinblatt, M., and Sheridan Titman, (2004), Financial Markets and
Corporate Strategy, McGraw-Hill.
[2] Jensen, M., and W. Meckling, (1976), Theory of the Firm: Managerial
Behavior, Agency Costs, and Ownership Structure, Journal of Financial
Economics, October, 305-360.
[3] Miller, Merton, (1977), Debt and Taxes, Journal of Finance, 32, 261-275.
[4] Myers, Stewart C., (1984), The Capital Structure Puzzle, Journal of
Finance, 39, 575-592.
Chapter 6
TIME SERIES ANALYSIS
APPLICATION: INFLATION FORECASTING
Key Points of Learning
Time series model, White noise, Autoregressive process, Moving average
process, ARMA process, Autocovariance function, Autocorrelation function,
Autocorrelogram, Box Pierce Q-statistic, Ljung-Box test, Backward shift
operator, Invertibility, Yule-Walker equations, Partial autocorrelation
function, ARIMA, Exponential-weighted moving average, De-seasonalization,
Gross domestic product, Inflation, Out-of-sample forecast, Fisher effect
6.1
STATIONARY PROCESSES

This is the unconditional mean and variance, i.e. taking expectation without any other information available at t.
3
For time series, this is also called autocovariance or sometimes serial covariance.
identically distributed (i.i.d.) process {u_t} where u_t is stationary, and in addition is also independent of u_{t−k} and u_{t+k} for any k ≠ 0. Thus, pdf(u_t | u_{t−k}, u_{t+k}) = pdf(u_t), k ≠ 0. {u_t} is called a white noise. In the slightly weaker case where E(u_t) = 0, var(u_t) = constant σ_u², and cov(u_t, u_{t+k}) = 0 for any k ≠ 0, {u_t} is called a weakly stationary zero serial correlation process. Most covariance-stationary processes can be formed from linear combinations of white noises.
Since covariance-stationary processes have a constant variance at any time t, they do not display changing conditional variances. Some common examples of covariance-stationary processes can be modelled using the Box-Jenkins (1976) approach.
Consider the following stochastic processes to model how random variable Y_t evolves through time. They are all constructed from the basic white noise process {u_t}, t = −∞, …, +∞, where each u_t is i.i.d. with zero mean and variance σ_u².

(a) Autoregressive order one, AR(1), process:

    Y_t = δ + φ Y_{t−1} + u_t ,   φ ≠ 0,    (6.1)

where Y_t autoregresses on its first lag.

(b) Moving Average order one, MA(1), process:

    Y_t = δ + u_t + θ u_{t−1} ,   θ ≠ 0,    (6.2)

where the residual is made up of a moving average of two white noises u_t and u_{t−1}.

(c) Autoregressive Moving Average order one-one, ARMA(1,1), process:

    Y_t = δ + φ Y_{t−1} + u_t + θ u_{t−1} ,   φ ≠ 0, θ ≠ 0,    (6.3)

where Y_t autoregresses on its first lag, and the residual is also a moving average.
6.2
AUTOREGRESSIVE PROCESS
    Y_1 = δ + φ Y_0 + u_1
    ...
    Y_{−T+1} = δ + φ Y_{−T} + u_{−T+1}

These equations are stochastic, not deterministic, as each equation contains a random variable u_t that is not observable.
By repeated substitution for Yt in this AR(1) process,
(6.4) Yt ( Yt-2 u t 1 ) u t
or
Yt (1 )
Yt-2 (u t
u t 1 )
(1
) (u t
u t 1 u t 2
2
For each t,
E(Yt) = c(1 + φ + φ² + …) = c/(1 − φ) , provided |φ| < 1 .
Otherwise, if |φ| ≥ 1, either a finite mean does not exist or the mean is not constant for every t.
var(Yt) = var(ut + φut-1 + φ²ut-2 + …) = σu²(1 + φ² + φ⁴ + …) = σu²/(1 − φ²) , provided |φ| < 1 .
Otherwise, if |φ| ≥ 1, either a finite variance does not exist or the variance is not constant.
The autocovariance of Yt and Yt-1 is
γ(1) = cov(Yt, Yt-1) = cov(c + φYt-1 + ut, Yt-1) = φ var(Yt-1) = φσY² ,
or
corr(Yt, Yt-1) = γ(1)/σY² = φ .
Also, corr(Yt, Yt-k) = φ^k ≡ ρ(k), obtained by dividing the autocovariance at lag k of Yt and Yt-k, γ(k), by the variance of Yt.
Hence we see that the AR(1) Yt process is covariance-stationary with constant mean c/(1 − φ), constant variance σu²/(1 − φ²), and autocorrelation at lag k of φ^k.
As a numerical example, suppose Yt = 2.5 + 0.5Yt-1 + ut with σu² = 3. Then the mean is 2.5/(1 − 0.5) = 5, and the variance is 3/(1 − 0.25) = 4 = σY². The first-order autocovariance is
cov(Yt, Yt-1) = 0.5 cov(Yt-1, Yt-1) = 0.5 × 4 = 2 .
Since Yt is stationary, cov(Yt+k, Yt+k+1) = cov(Yt+k, Yt+k-1) = γ(1) = 2 for any k. The first-order autocorrelation (autocorrelation at lag 1) is
corr(Yt+k, Yt+k+1) = corr(Yt+k, Yt+k-1) = ρ(1) = γ(1)/σY² = 2/4 = 0.5 ,
and in general ρ(k) = 0.5^k → 0 as k → ∞.
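As a quick numerical check of the derivation above, a minimal Python simulation of this AR(1) example (c = 2.5, φ = 0.5, σu² = 3) can be sketched as follows; the sample moments should come out near the derived mean 5, variance 4, and ρ(1) = 0.5.

```python
import random

# Simulate Y_t = 2.5 + 0.5*Y_{t-1} + u_t with var(u) = 3 and compare the
# sample mean, variance, and lag-1 autocorrelation with the derived values.
random.seed(42)
T = 200_000
sigma_u = 3 ** 0.5
y = 5.0                      # start at the stationary mean
ys = []
for _ in range(T):
    y = 2.5 + 0.5 * y + random.gauss(0.0, sigma_u)
    ys.append(y)

mean = sum(ys) / T
var = sum((v - mean) ** 2 for v in ys) / T
cov1 = sum((ys[t] - mean) * (ys[t - 1] - mean) for t in range(1, T)) / (T - 1)
rho1 = cov1 / var
print(round(mean, 2), round(var, 2), round(rho1, 2))
```

With a long enough sample, the three printed numbers sit very close to 5, 4, and 0.5.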
6.3
MOVING AVERAGE PROCESS
For the MA(1) process in (6.2), E(Yt) = c and var(Yt) = σu²(1 + θ²) are constant. The first-order autocovariance is
cov(Yt, Yt-1) = cov(c + ut + θut-1, c + ut-1 + θut-2) = θσu² ,
so that
corr(Yt, Yt-1) = θσu²/[σu²(1 + θ²)] = θ/(1 + θ²) ,
and
corr(Yt, Yt-k) = 0 for k > 1 .
Hence the ACF of an MA(1) is
ρ(k) = θ/(1 + θ²) , k = 1
ρ(k) = 0 , k > 1 .
6.4
ARMA(1,1) PROCESS
By repeated substitution in the ARMA(1,1) process (6.3),
Yt = c + φ(c + φYt-2 + ut-1 + θut-2) + ut + θut-1
= c(1 + φ) + φ²Yt-2 + ut + (φ + θ)ut-1 + φθut-2
= c(1 + φ + φ²) + φ³Yt-3 + ut + (φ + θ)ut-1 + φ(φ + θ)ut-2 + φ²θut-3
= c(1 + φ + φ² + …) + ut + (φ + θ)(ut-1 + φut-2 + φ²ut-3 + …) .
For each t,
E(Yt) = c/(1 − φ) , provided |φ| < 1 ,
and
var(Yt) = σu²[1 + (φ + θ)² + (φ + θ)²φ² + (φ + θ)²φ⁴ + …]
= σu²[1 + (φ + θ)²/(1 − φ²)] ≡ σY² , provided |φ| < 1 .
The autocovariances are
γ(1) = σu²[(φ + θ) + φ(φ + θ)²/(1 − φ²)] , and γ(k) = φγ(k−1) for k > 1 .
Hence the ARMA(1,1) process is covariance-stationary with constant mean c/(1 − φ), constant variance σu²[1 + (φ + θ)²/(1 − φ²)] = σY², and ACF
ρ(k) = γ(1)/σY² , k = 1
ρ(k) = φρ(k−1) , k > 1 .
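The stationary variance and first autocovariance of the ARMA(1,1) can likewise be checked by simulation. The parameters below (φ = 0.5, θ = 0.3, σu² = 1) are illustrative choices, not from the text:

```python
import random

# Check the ARMA(1,1) stationary moments against the closed-form expressions,
# using illustrative parameters phi = 0.5, theta = 0.3, var(u) = 1, c = 0.
phi, theta = 0.5, 0.3
var_formula = 1 + (phi + theta) ** 2 / (1 - phi ** 2)                 # sigma_Y^2
gamma1_formula = (phi + theta) + phi * (phi + theta) ** 2 / (1 - phi ** 2)

random.seed(0)
T = 300_000
y_prev, u_prev = 0.0, 0.0
ys = []
for _ in range(T):
    u = random.gauss(0.0, 1.0)
    y = phi * y_prev + u + theta * u_prev
    ys.append(y)
    y_prev, u_prev = y, u

mean = sum(ys) / T
var = sum((v - mean) ** 2 for v in ys) / T
gamma1 = sum((ys[t] - mean) * (ys[t - 1] - mean) for t in range(1, T)) / (T - 1)
print(round(var, 3), round(var_formula, 3), round(gamma1, 3), round(gamma1_formula, 3))
```

The simulated variance and lag-1 autocovariance should match the formulas to two or three decimal places.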
For a simple linear regression Yt = a + bXt + ut, recall that the OLS estimators converge to
b = cov(Xt, Yt)/var(Xt) and a = E(Yt) − b E(Xt) .
Then only when {Xt}, {Yt} are stationary can we apply the Law of Large Numbers so that the sample moments converge to these constant population moments. Likewise, the residual variance estimator Σt êt²/(T − 2) converges to σu² only when {Xt}, {Yt}, and hence {ut}, are stationary. And only then can we perform appropriate statistical inference based on the usual tT-2-statistics.
Hence we see the importance of stationarity of the dependent variable process {Yt}, or regressand, and of {Xt}, or regressor. Even though the OLS estimators â and b̂ may be unbiased, if that is all we know and nothing about the sampling distribution of these OLS estimators can be estimated, then statistical inference would be impossible.
It is important to know that while we are dealing with covariance-stationary processes, e.g. AR(1) (|φ| < 1) or MA(1), their conditional mean will change over time. This is distinct from the constant mean we talk about, which is the unconditional mean. The distinction is often a source of confusion among beginning students, but it is important enough to elaborate.
6.5
CONDITIONAL MEAN AND VARIANCE
Consider the covariance-stationary AR(1) process
Yt = c + φYt-1 + ut (φ ≠ 0) .
Also, Yt+1 = c + φYt + ut+1, so the conditional mean E(Yt+1 | Yt) = c + φYt changes with each new realization Yt, even though the unconditional mean c/(1 − φ) is constant. (The conditional mean also serves as forecast when estimated parameters are used, e.g. Ŷt+1 = ĉ + φ̂Yt.) The conditional variance var(Yt+1 | Yt) = σu² is constant, and is smaller than the unconditional variance σu²/(1 − φ²) since φ ≠ 0.
For the MA(1) covariance-stationary process,
E(Yt+1 | ut) = c + θut and var(Yt+1 | ut) = σu² < σu²(1 + θ²) .
Likewise, the conditional mean changes at each t, but the conditional variance is constant and is smaller than the unconditional variance.
We will branch off from this point to two other important topics: (1) non-stationary processes, and (2) conditional variance that is not constant. These topics will be covered in later chapters.
Given a time series and knowing it is from a stationary process, the next step is to identify the statistical time series model or the generating process for the time series. Since AR and MA models produce different autocorrelation functions (ACF) ρ(k), k > 0, we can find the sample autocorrelation function r(k) and use this to try to differentiate between an AR or MA or perhaps ARMA model. To get to the sample autocorrelation function, we need to estimate the sample autocovariance function.
6.6
SAMPLE AUTOCORRELATION FUNCTION
The sample autocovariance function is
c(k) = (1/T) Σ_{t=1}^{T−k} (Yt − Ȳ)(Yt+k − Ȳ) ,
and the sample autocorrelation function is
r(k) = c(k)/c(0) , for k = 0, 1, 2, 3, …, p .    (6.6)
In the above estimator of the autocorrelation function, there are sometimes different versions in different statistical packages due to the use of different divisors, e.g.
r(k) = [1/(T − k)] Σ_{t=1}^{T−k} (Yt − Ȳ)(Yt+k − Ȳ) ÷ (1/T) Σ_{t=1}^{T} (Yt − Ȳ)²
or
r(k) = [1/(T − k)] Σ_{t=1}^{T−k} (Yt − Ȳ)(Yt+k − Ȳ) ÷ [1/(T − 1)] Σ_{t=1}^{T} (Yt − Ȳ)² .
For example, the sample correlations of Yt with its lags may appear as follows:

            Yt      Yt-1    Yt-2    Yt-3
  Yt        1
  Yt-1      0.53    1
  Yt-2      0.24    0.53    1
  Yt-3      0.12    0.24    0.53    1

so that r(1) = 0.53, r(2) = 0.24, and r(3) = 0.12.
[Figure 6.1  Sample Autocorrelation Functions of AR and MA Processes: plotted against lag k, the r(k) of the AR process decays gradually toward 0, while the r(k) of the MA process drops to 0 after its order.]
Based on the AR(1) process in equation (6.1) and the sample autocorrelation measure r(k) in (6.6), Bartlett's approximation28 gives
var(r(k)) ≈ (1/T)[(1 + φ²)(1 − φ^{2k})/(1 − φ²) − 2kφ^{2k}] .    (6.7)
For k = 1, var(r(1)) ≈ (1/T)(1 − φ²). For k = 2,
var(r(2)) ≈ (1/T)(1 + 2φ² − 3φ⁴) = (1/T)(1 + 3φ²)(1 − φ²) = (1 + 3φ²) var(r(1)) ≥ var(r(1)) .
6.7
TESTING ZERO AUTOCORRELATIONS
28
For derivation of the var(r(k)) above, see M.S. Bartlett (1946), On the theoretical specification of sampling properties of autocorrelated time series, Journal of the Royal Statistical Society B, Vol. 27.
A test of whether the observed time series is white noise is essentially a test of the null hypothesis H0: φ = 0. Then
var(r(k)) ≈ 1/T
for all k > 0. Under H0, (6.1) becomes Yt = c + ut, where {ut} is i.i.d. Hence {Yt} is i.i.d. Therefore, asymptotically as sample size T increases to +∞,
√T [r(1), r(2), …, r(m)]′ ~ N(0, Im) ,
where Im is the m × m identity matrix. Each statistic
zj = [r(j) − 0]/√(1/T)
is thus asymptotically standard normal. Reject H0: ρ(j) = 0 at the 5% significance level if the |z| value exceeds 1.96 for a 2-tailed test.
Since it is known that AR processes have an ACF ρ(k), approximated by r(k), that decays to zero slowly, but MA(q) processes have an ACF ρ(k), approximated by r(k), that is zero for k > q, a check of the autocorrelogram (graph of r(k)) in Figure 6.2 below shows that we cannot reject H0: ρ(k) = 0 for k > 1 at the 5% significance level. For MA(q) processes where ρ(k) = 0 for k > q,
var(r(k)) ≈ (1/T)[1 + 2 Σ_{j=1}^{q} r(j)²] , k > q .
For MA(1), this is var(r(k)) ≈ (1/T)[1 + 2r(1)²] for k > 1.
Thus MA(1) is identified. Compare this with the AR(3) that is also shown. The standard error used in most statistical programs is 1/√T. This standard error is reasonably accurate for AR(1) and for MA(q) processes when q is small.
[Figure 6.2  Identification Using Sample Autocorrelogram: r(k) of an MA(1) cuts off after lag 1, while r(k) of an AR(3) decays gradually; bands at ±1.96 s.e. mark the rejection limits.]

To test the joint null hypothesis H0: ρ(1) = ρ(2) = … = ρ(m) = 0, the Box-Pierce Q-statistic is
Q = T Σ_{k=1}^{m} r(k)² ~ χ²m .    (6.8)
This is an asymptotic test statistic. The Ljung and Box (1978) test statistic provides an approximate finite sample correction to the above asymptotic test statistic:
Q′ = T(T + 2) Σ_{k=1}^{m} r(k)²/(T − k) ~ χ²m .    (6.9)
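A minimal Python sketch of the sample ACF in (6.6), the z-statistic, and the Box-Pierce and Ljung-Box statistics in (6.8)–(6.9), using the divisor-T convention of (6.6):

```python
import random

def acf(y, m):
    """Sample autocorrelations r(1)..r(m) as in (6.6), divisor T throughout."""
    T = len(y)
    ybar = sum(y) / T
    c0 = sum((v - ybar) ** 2 for v in y) / T
    r = []
    for k in range(1, m + 1):
        ck = sum((y[t] - ybar) * (y[t + k] - ybar) for t in range(T - k)) / T
        r.append(ck / c0)
    return r

def box_pierce(r, T):
    return T * sum(rk ** 2 for rk in r)                                    # (6.8)

def ljung_box(r, T):
    return T * (T + 2) * sum(rk ** 2 / (T - k) for k, rk in enumerate(r, 1))  # (6.9)

# For simulated white noise, each |z_j| = |r(j)|*sqrt(T) and the Q-statistics
# should be small (Q is approximately chi-squared with m degrees of freedom).
random.seed(1)
T = 2000
y = [random.gauss(0.0, 1.0) for _ in range(T)]
r = acf(y, 6)
z = [rk * T ** 0.5 for rk in r]
print([round(v, 2) for v in z], round(box_pierce(r, T), 2), round(ljung_box(r, T), 2))
```

For this large T, the finite-sample correction is tiny: Q′ and Q differ only in the third decimal place.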
ARMA(P,Q) PROCESSES
A pth order autoregressive process, AR(p), is
Yt = c + φ₁Yt-1 + φ₂Yt-2 + … + φpYt-p + ut ,    (6.10)
and a qth order moving average process, MA(q), is
Yt = c + ut + θ₁ut-1 + θ₂ut-2 + … + θqut-q .    (6.11)
We shall define the backward shift operator B as a function such that its map is its lag: BYt = Yt-1. The operator follows the usual algebraic properties: B²Yt = B(BYt) = BYt-1 = Yt-2, and B^kYt = Yt-k.
A finite qth order MA process is always stationary, as the mean and variance are finite constants. A finite pth order AR process may be non-stationary, and is stationary provided a restriction on the parameters φk's holds. For the AR(p) in (6.10), we express it as
φ(B)Yt ≡ (1 − φ₁B − φ₂B² − … − φpB^p)Yt = c + ut .
φ(B) ≡ 1 − φ₁B − φ₂B² − … − φpB^p = 0 is called the characteristic equation of the AR(p) process. For the process to be stationary, the roots or zeros of the characteristic equation must lie outside the unit circle (complex roots lie outside the unit circle on the Argand diagram). For example, in the AR(1) case, φ(B) ≡ 1 − φ₁B = 0, and B = φ₁⁻¹, so if B lies outside the unit circle, then the requirement for stationarity is that |φ₁⁻¹| > 1, or |φ₁| < 1.
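The root condition can be checked mechanically. A small sketch for the AR(2) case solves the characteristic equation 1 − φ₁B − φ₂B² = 0 with the quadratic formula; the coefficient values below are illustrative:

```python
import cmath

def ar2_stationary(phi1, phi2):
    """AR(2) is stationary iff both roots of 1 - phi1*B - phi2*B^2 = 0
    lie outside the unit circle (AR(1) if phi2 = 0, with root B = 1/phi1)."""
    if phi2 == 0:
        return abs(phi1) < 1
    # Solve -phi2*B^2 - phi1*B + 1 = 0 with the quadratic formula.
    a, b, c = -phi2, -phi1, 1.0
    disc = cmath.sqrt(b * b - 4 * a * c)
    roots = [(-b + disc) / (2 * a), (-b - disc) / (2 * a)]
    return all(abs(root) > 1 for root in roots)

print(ar2_stationary(0.5, 0.0))    # AR(1) with phi = 0.5
print(ar2_stationary(0.5, 0.3))
print(ar2_stationary(1.2, 0.0))    # |phi| > 1
print(ar2_stationary(0.5, 0.6))    # phi1 + phi2 > 1
```

Using `cmath` keeps the check valid when the roots are complex conjugates.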
Next we show how MA processes can sometimes be represented by infinite order AR processes. As an example, consider MA(1):
Yt = c + ut + θut-1 (θ ≠ 0), or Yt = c + (1 + θB)ut .
So, (1 + θB)⁻¹Yt = (1 + θB)⁻¹c + ut .
Note that (1 + x)⁻¹ = 1 − x + x² − x³ + x⁴ − … for |x| < 1. Also, note that the constant (1 + θB)⁻¹c = c/(1 + θ) ≡ c′. Then,
(1 − θB + θ²B² − θ³B³ + θ⁴B⁴ − …)Yt = c′ + ut , or
Yt − θYt-1 + θ²Yt-2 − θ³Yt-3 + θ⁴Yt-4 − … = c′ + ut .
Thus
Yt = c′ + θYt-1 − θ²Yt-2 + θ³Yt-3 − θ⁴Yt-4 + … + ut ,
which is an infinite order AR process.
This AR(∞) process is not a proper representation that allows an infinite number of past Yt-k's to forecast a finite Yt unless it is stationary. If it is not stationary, Yt may increase by too much, based on an infinite number of explanatory past Yt-k's. It is stationary provided that the root of 1 + θB = 0 lies outside the unit circle, i.e. |θ| < 1.
If a stationary MA(q) process can be equivalently represented as a stationary AR(∞) process, then the MA(q) process is said to be invertible. Although all finite order MA(q) processes are stationary, not all are invertible. For example, Yt = ut − 0.3ut-1 is invertible, but Yt = ut − 1.3ut-1 is not.
Invertibility of an MA(q) process to a stationary AR(∞) allows expression of current Yt and future Yt+k in terms of past Yt-k, k > 0. This could facilitate forecasts and interpretations of past impact.
It is not an interesting issue to consider inverting AR(p) processes to infinite order MA processes.
A mixed autoregressive moving average process of order (p,q) is
Yt = c + φ₁Yt-1 + φ₂Yt-2 + … + φpYt-p + ut + θ₁ut-1 + θ₂ut-2 + … + θqut-q .
This is also invertible to an infinite order AR or, less interestingly, an infinite order MA.
Consider the AR(p) process in (6.10): Yt = c + φ₁Yt-1 + φ₂Yt-2 + … + φpYt-p + ut. Note that ut is i.i.d. and has zero correlation with Yt-k, k > 0. Multiply both sides by Yt-k. Then,
Yt-kYt = cYt-k + φ₁Yt-kYt-1 + φ₂Yt-kYt-2 + … + φpYt-kYt-p + Yt-kut .
Taking unconditional expectation on both sides, and noting that E(Yt-kYt) = γ(k) + μ² where μ = E(Yt-k) for any k, then
γ(k) + μ² = cμ + φ₁[γ(k−1) + μ²] + φ₂[γ(k−2) + μ²] + … + φp[γ(k−p) + μ²] .    (6.12)
Finding the unconditional mean in (6.10), μ = c + φ₁μ + φ₂μ + … + φpμ, and using this in (6.12), we have
γ(k) = φ₁γ(k−1) + φ₂γ(k−2) + … + φpγ(k−p) for k > 0 .
Dividing both sides by γ(0),
ρ(k) = φ₁ρ(k−1) + φ₂ρ(k−2) + … + φpρ(k−p) for k > 0 .
If we set k = 1, 2, 3, …, p, we obtain p equations. Put ρ(0) = 1. The following equations derived from the AR(p) are called the Yule-Walker equations. They solve for the p parameters φ₁, φ₂, …, φp.
ρ(1) = φ₁ + φ₂ρ(1) + … + φpρ(p−1)
ρ(2) = φ₁ρ(1) + φ₂ + … + φpρ(p−2)
ρ(3) = φ₁ρ(2) + φ₂ρ(1) + … + φpρ(p−3)
⋮
ρ(p) = φ₁ρ(p−1) + φ₂ρ(p−2) + … + φp
If we replace the ρ(k) by sample r(k) as approximates, then the p Yule-Walker equations can be solved as follows for the parameter estimates φ̂k's.
In matrix notation, the sample Yule-Walker equations are r = Rφ̂, where
r = [r(1), r(2), …, r(p)]′ , φ̂ = [φ̂₁, φ̂₂, …, φ̂p]′ ,
and R is the p × p matrix

      1        r(1)     r(2)    …   r(p−1)
      r(1)     1        r(1)    …   r(p−2)
      ⋮                              ⋮
      r(p−1)  r(p−2)  r(p−3)  …   1     .

Therefore, φ̂ = R⁻¹r .    (6.13)
The other parameters can be estimated as follows.
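The system (6.13) can be solved in closed form when p = 2. The sketch below uses the illustrative sample correlations r(1) = 0.53 and r(2) = 0.24 from the correlation example earlier:

```python
def yule_walker_ar2(r1, r2):
    """Solve the 2x2 Yule-Walker system (6.13):
         r1 = phi1 + phi2*r1
         r2 = phi1*r1 + phi2
    by inverting R = [[1, r1], [r1, 1]] directly."""
    det = 1.0 - r1 * r1          # determinant of R
    phi1 = (r1 - r1 * r2) / det
    phi2 = (r2 - r1 * r1) / det
    return phi1, phi2

phi1, phi2 = yule_walker_ar2(0.53, 0.24)
print(round(phi1, 3), round(phi2, 3))
```

Note that φ̂₂ of a fitted AR(2) comes out as [r(2) − r(1)²]/[1 − r(1)²], which is exactly the lag-2 partial autocorrelation discussed in the PACF section.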
The mean is estimated by Ȳ = (1/T) Σ_{t=1}^{T} Yt, so that ĉ = Ȳ(1 − φ̂₁ − φ̂₂ − … − φ̂p), and the innovation variance is estimated by
σ̂u² = [(1/T) Σ_{t=1}^{T} (Yt − Ȳ)²][1 − φ̂₁r(1) − φ̂₂r(2) − … − φ̂pr(p)] .
6.9
PARTIAL AUTOCORRELATION FUNCTION
Suppose we fit an AR(2), Yt = c + φ₂₁Yt-1 + φ₂₂Yt-2 + ut, and solve the two sample Yule-Walker equations in (6.13). Then
φ̂₂₂ = [r(2) − r(1)²]/[1 − r(1)²] .
Note that the computation of φ̂₂₁ is not important at all here. Suppose we take k = 3, and based on AR(3): Yt = c + φ₃₁Yt-1 + φ₃₂Yt-2 + φ₃₃Yt-3 + ut, compute the sample Yule-Walker equations in (6.13), and obtain φ̂₃₃. In a similar way, we can for each k fit
Yt = c + φk1Yt-1 + φk2Yt-2 + φk3Yt-3 + … + φkkYt-k + ut ,
and, holding all Yt-j's, j < k, as constants, the autocorrelation coefficient of Yt and Yt-k is simply φkk, which is the partial autocorrelation. The function or sequence φ̂kk(k), k = 1, 2, 3, …, is called the partial autocorrelation function (PACF). They can be computed from a time series {Yt}. Their theoretical counterparts are φkk, where the theoretical ρ(k) are used instead of r(k) in (6.13).
What is interesting to note is that given the true order p of AR(p), theoretically φ₁₁ ≠ 0, φ₂₂ ≠ 0, φ₃₃ ≠ 0, …, φpp ≠ 0, while φp+1,p+1 = 0, φp+2,p+2 = 0, φp+3,p+3 = 0, and so on. That is, φkk ≠ 0 for k ≤ p, and φkk = 0 for k > p. Therefore, while an AR(p) process has a decaying ACF that does not disappear to zero, it has a PACF that disappears to zero after lag p.
For k > p, var(φ̂kk) ≈ T⁻¹. Therefore, we can apply hypothesis testing to determine if, for a particular k, H0: φkk = 0 should be rejected or not by considering if the statistic φ̂kk/√(T⁻¹) exceeds 1.96 in absolute value.

[Figure  Sample partial autocorrelograms of MA(1) and AR(3) processes, with bands at ±1.96 s.e.: the PACF of the AR(3) cuts off after lag 3, while that of the MA(1) decays gradually.]
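One standard way to compute the φ̂kk sequence from the autocorrelations is the Durbin-Levinson recursion, which solves the Yule-Walker equations (6.13) for successively larger k. A sketch, fed with the theoretical ACF ρ(k) = φ^k of an AR(1) with φ = 0.6, should return φ₁₁ = 0.6 and φkk ≈ 0 for k > 1:

```python
def pacf(rho, kmax):
    """Durbin-Levinson recursion. rho[k] is the autocorrelation at lag k,
    with rho[0] = 1. Returns [phi_11, phi_22, ..., phi_{kmax,kmax}]."""
    phi = [rho[1]]               # AR(1) fit: phi_11 = rho(1)
    out = [rho[1]]
    for k in range(2, kmax + 1):
        num = rho[k] - sum(phi[j] * rho[k - 1 - j] for j in range(k - 1))
        den = 1.0 - sum(phi[j] * rho[j + 1] for j in range(k - 1))
        phi_kk = num / den
        # Update the AR(k) coefficients from the AR(k-1) coefficients.
        phi = [phi[j] - phi_kk * phi[k - 2 - j] for j in range(k - 1)] + [phi_kk]
        out.append(phi_kk)
    return out

# Theoretical ACF of an AR(1) with phi = 0.6: rho(k) = 0.6**k
rho = [0.6 ** k for k in range(6)]
print([round(v, 4) for v in pacf(rho, 5)])
```

In practice the same recursion is applied to the sample r(k)'s to produce the sample partial autocorrelogram.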
6.10
GDP GROWTH
Gross domestic product (GDP) is the gross domestic output in an economy, and is seen to be rising with a time trend. If we input a time trend variable T, and then run an OLS regression on an intercept and a time trend,29 we obtain
Y = 7977.59 + 292.74*T
where Y is GDP. We then perform a time series analysis on the residuals of this regression, i.e. e = Y − 7977.59 − 292.74*T. Sample ACF and PACF correlograms are plotted for the estimated residuals e to determine its time series model. This is shown in Figure 6.5.
29
At this juncture, another type of analysis may alternatively proceed with taking the first difference. We shall look at this in a later chapter on Unit Roots.
Figure 6.5
Sample Autocorrelation and Partial Autocorrelation Statistics
Figure 6.5 indicates significant ACF that decays slowly so AR(p) process
is suggested. PACF is significant only for lag 1, so an AR(1) is identified. The
residual process is not i.i.d. since the Ljung-Box Q-statistics in (6.9) indicate
rejection of zero correlation for m=1,2,3,etc. Then, we may approximate the
process as
(Yt - 7977.59 - 292.74*T) = 0.573*(Yt-1 - 7977.59 - 292.74*[T-1]) + ut
where ut is i.i.d. Then, Yt = 3598 + 126 T + 0.573 Yt-1 + ut . If we create a
fitted (in-sample fit) series
Ŷt = 3598 + 126T + 0.573Yt-1 , we can compare it against the actual series, as in Figure 6.6.

[Figure 6.6  Fitted GDP Equation Ŷt = 3598 + 126T + 0.573Yt-1, plotting fitted GDP against actual GDP.]
6.11
ARIMA
Suppose for a stationary time series {Yt}, both its ACF and PACF decay slowly or exponentially without reducing to zero; then an ARMA(p,q), p ≠ 0, q ≠ 0, model is identified. Sometimes a time series' ACF does not reduce to zero, and its PACF also does not reduce to zero, not because it is ARMA(p,q), but rather because it is an autoregressive integrated moving average process, ARIMA(p,d,q). This means that we need to take d differences of {Yt} in order to arrive at a stationary ARMA(p,q). For example, if {Yt} is ARIMA(1,1,1), then {Yt − Yt-1} is ARMA(1,1). In such a case, we have to take differences and then check the resulting ACF, PACF in order to proceed to determine the ARMA.
There is a special case, ARIMA(0,1,1), that is interesting amongst others:
ΔYt ≡ Yt − Yt-1 = ut − θut-1 , |θ| < 1 .
Then (1 − B)Yt = (1 − θB)ut. So,
(1 − θB)⁻¹(1 − B)Yt = ut .
Since (1 − θB)⁻¹ = 1 + θB + θ²B² + θ³B³ + …, expanding and collecting terms gives
Yt = (1 − θ) Σ_{j=1}^{∞} θ^{j−1} Yt-j + ut ,
so that
E(Yt | Yt-1, Yt-2, …) = (1 − θ)Yt-1 + θ[(1 − θ) Σ_{j=1}^{∞} θ^{j−1} Yt-1-j] .
Thus, the forecast of the next Yt at time t−1 is a weighted average of the last observation Yt-1 and the exponential-weighted moving average of the observations before Yt-1.
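The AR(∞) form above is just exponential smoothing. A small sketch checks numerically that the direct weighted sum (1 − θ)Σj θ^{j−1}Yt-j equals the recursion Ft = (1 − θ)Yt-1 + θFt-1 (the value of θ and the data are illustrative):

```python
theta = 0.7
y = [10.0, 12.0, 9.0, 11.0, 13.0, 12.5, 11.8, 12.2]

# Direct truncated sum: (1 - theta) * sum_j theta**(j-1) * Y[t-j]
t = len(y)
direct = (1 - theta) * sum(theta ** (j - 1) * y[t - j] for j in range(1, t + 1))

# Recursive exponential smoothing: F_t = (1 - theta)*Y_{t-1} + theta*F_{t-1},
# started from F = 0 so both truncate identically at the start of the sample.
f = 0.0
for obs in y:
    f = (1 - theta) * obs + theta * f

print(round(direct, 6), round(f, 6))
```

The recursion is what forecasting software actually runs; it needs only the previous forecast, not the whole history.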
Another special application of ARIMA is de-seasonalization. Suppose a
time series {St} is monthly sales. It is noted that every December (when
Christmas and New Year comes around) sales will be higher because of an
additive seasonal component X (assume this is a constant for simplicity).
Otherwise St = Yt. Assume Yt is stationary.
Then the stochastic process of sales {St} will look as follows:
Y1, Y2, …, Y11, Y12 + X, Y13, Y14, …, Y23, Y24 + X, Y25, Y26, …, Y35, Y36 + X, Y37, Y38, …
This is clearly a non-stationary series even if {Yt} by itself is stationary, because the mean jumps by X each December. A stationary series can be obtained from the above, for purposes of forecast and analysis, by performing the appropriate differencing:
St − St-12 = (1 − B¹²)St = Yt − Yt-12 .
Suppose (1 − B¹²)St = ut, a white noise. Then this can be notated as (0,1,0)₁₂. If (1 − φB¹²)St = ut, then it is (1,0,0)₁₂. Notice that the subscript 12 denotes the power of B. If (1 − φB)(1 − B¹²)St = ut, then it is (1,0,0)(0,1,0)₁₂. (1 − B¹²)St = (1 − θB)ut is (0,1,0)₁₂(0,0,1).
In this case, the X component is removed, and we perform analyses on the differenced series {Yt − Yt-12}. Suppose a stationary sales series
St = 1000 + 0.3St-1 + 0.2St-2 + 0.05St-3 + ut
where the previous month's sales has a 0.3 impact on the current St, and so on. A forecast of this month's sales given all previous months' sales information yields
E(St | St-1, St-2, …) = 1000 + 0.3St-1 + 0.2St-2 + 0.05St-3 .
Thus, if the past three months sales records are 25,000 units, 30,000 units,
and 18,000 units, then
Et-1(St) = 15,400 units.
Note the subscript t-1 denotes all information available at time t-1.
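The forecast arithmetic can be verified directly:

```python
# One-step forecast of the AR(3) sales model, with the past three months'
# sales from the text: S_{t-1} = 25,000, S_{t-2} = 30,000, S_{t-3} = 18,000.
forecast = 1000 + 0.3 * 25_000 + 0.2 * 30_000 + 0.05 * 18_000
print(forecast)
```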
6.12
INFLATION
Let Pt be the price level at time t, and define the (continuously compounded) inflation rate over period t as
It = ln(Pt/Pt-1) .
Using US data for the period 1953 to 1977, Fama and Gibbons (1984)
reported the following sample autocorrelations in their Table 1 (shown as
follows).
N = 299         r(1)    r(2)    r(3)    r(4)    r(5)    r(6)
It               .55     .58     .52     .52     .52     .52
It − It-1       −.53     .11    −.06    −.01    −.00     .03

                r(7)    r(8)    r(9)    r(10)   r(11)   r(12)
It               .48     .49     .51     .48     .44     .47
It − It-1       −.04    −.04     .06     .02    −.08     .09
Given sample size N, the standard error of r(k) is approximately 1/√N = 0.058. Using the 5% significance level, or a critical region outside of about two standard errors, or about 0.113, the r(k)'s for the It process are all significantly greater than 0, but the r(k)'s for the It − It-1 process are all not significantly different from 0 except for r(1). The autocorrelation of It is seen to decline particularly slowly, suggesting the plausibility of an ARIMA process. The ACF of It − It-1 suggests an MA(1) process. Thus It is plausibly ARIMA(0,1,1). Using this identification,
It − It-1 = ut + θut-1 , |θ| < 1 .    (6.14)
From (6.14), Et-1(It) = It-1 + θut-1 since ut is i.i.d. Substituting this conditional forecast or expectation back into (6.14),
It = Et-1(It) + ut .    (6.15)
But Et(It+1) = It + θut and Et-1(It) = It-1 + θut-1 imply that
Et(It+1) − Et-1(It) = It − It-1 + θ(ut − ut-1)
= ut + θut-1 + θ(ut − ut-1)
= (1 + θ)ut .    (6.16)
This indicates that the forecast follows a random walk since ut is i.i.d. From (6.15), ut = It − Et-1(It) can also be interpreted as the unexpected inflation rate realized at t.
Suppose θ is estimated as −0.8. Then (6.14) can be written as
It − It-1 = ut − 0.8ut-1 .
From (6.16), the variance of the change in expected inflation is (1 − 0.8)²σu², or 0.04σu², while the variance of the unexpected inflation is σu². Thus, the latter variance is much bigger. This suggests that using past inflation rates to forecast future inflation, as in the time series model (6.15) above, is not very efficient. Other relevant economic information could be harnessed to produce a forecast with less variance in the unexpected inflation, i.e. a forecast with less surprise. Fama and Gibbons (1984) show such approaches using the Fisher effect, which says that
Rt = Et-1(rt) + Et-1(It)
where Rt is the nominal risk-free interest rate from the end of period t−1 to the end of period t (known at t−1), rt is the real interest rate for the same period as the nominal rate, and It is the inflation rate.
Notice that the right side of the Fisher equation contains terms in expectations. This is because inflation at t−1, when Rt is known, has not yet happened, and so is only expected in form, or ex-ante. If the real rate rt were known at t−1, then the equation would be Rt = rt + Et-1(It), and this creates an immediate contradiction in that Et-1(It) = Rt − rt would become known at t−1, which it is not.
Thus, at t−1, the real rate, like inflation, is ex-ante and not known. In fact, unlike inflation, which may be measured ex-post fairly accurately, the real rate cannot be accurately observed ex-post and has to be estimated.
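The variance comparison above (with θ = −0.8, so the forecast revision is (1 + θ)ut = 0.2ut) can be illustrated by simulation:

```python
import random

# With theta = -0.8, the change in expected inflation is (1 + theta)*u_t = 0.2*u_t,
# so its variance should be about 0.04 of the unexpected-inflation variance var(u).
theta = -0.8
random.seed(7)
u = [random.gauss(0.0, 1.0) for _ in range(200_000)]
revision = [(1 + theta) * ut for ut in u]          # E_t(I_{t+1}) - E_{t-1}(I_t)

var_u = sum(x * x for x in u) / len(u)
var_rev = sum(x * x for x in revision) / len(revision)
print(round(var_rev / var_u, 4))
```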
6.13
PROBLEM SET
… is 1.5% and the lagged one, lagged two, and lagged three return rates are 2%, 1%, and 1.2% respectively? Do you need to check any condition on the stochastic process before you perform this forecast?

[Figure: a sample autocorrelogram r(k) with bands at ±1.96 s.e.]
rp = (1/N) Σ_{i=1}^{N} ri , where ri is the return of asset i during the same period. … (1/N*) Σ_{i=1}^{N*} ci → c. Assume that
(1/N*) Σ_{i=1}^{N*} bi → bp
where bp is a constant > 0. Find the variance of rp for N* that is finite, and for N that is almost infinite.
Chapter 7
RANDOM WALK
APPLICATION: MARKET EFFICIENCY
Key Points of Learning
Random walk, Rational expectation, Informational efficiency, Equilibrium
asset pricing model, Benchmark model, Joint test, Market efficiency, Euler
condition, Martingale, Markov process, Strong-form market efficiency, Semi-strong form market efficiency, Weak-form market efficiency, Variance ratio,
Long term memory
RANDOM WALK
iterated expectations. Note that cov(et+1, Pt) = 0 need not imply E(et+1|Pt) = 0. The former represents zero correlation, while the latter represents a stronger condition of conditional stochastic independence between et+1 and Pt.
Note that the strongest version implies the strong version, which in turn implies the weak version of the random walk. For an example that the reverse is not true, suppose cov(et+1, et) = 0, but cov(e²t+1, e²t) ≠ 0. These conditions satisfy the weak version, but not the strong or strongest version, since stochastic independence means E(e²t+1 e²t) = E(e²t+1) E(e²t), and hence cov(e²t+1, e²t) necessarily equals 0 in that case.
A related process Pt+1 = μ + bPt + et+1, where b = 1 and et+1 is stationary with zero mean but possibly stochastically dependent over time, is called a Unit Root Process.
In the special case of the random walk in (7.1) when the drift is 0,
Pt+1 = Pt + et+1 .    (7.2)
Suppose Zt represents observed information variables in the economy at t, including Pt. Then
E(Pt+1 | Zt) = E(Pt | Zt) + E(et+1 | Zt) = Pt + 0 .
The last term is zero whether we have the strongest or strong or weak version of random walks. Hence also,
E(Pt+1 − Pt | Zt) = 0 .    (7.3)
The converse, where equation (7.3) implies (7.1), is not necessarily true, as nonlinear forms, e.g. Pt+1 = Pt εt+1 with E(εt+1 | Zt) = 1, can also lead to (7.3). Thus, the random walk process is slightly stronger than (7.3), and is a special case of (7.3). Nevertheless, the random walk process is a convenient workhorse to test a condition such as (7.3). In other words, if empirical data satisfy (7.3), they are consistent with (7.1), though (7.1) is not exactly validated.
time t. Prices following a process with property (7.3), such that the conditional expectation of next period's price is the last period price, are said to follow a martingale process.
Suppose ln εt+1 ~ N(μ, σ²). Then yt+1 ≡ exp(ln εt+1) = εt+1 is lognormally distributed. Now,
E(yt+1 | Zt) = E(εt+1 | Zt) = E(e^{N(μ,σ²)}) = e^{μ + ½σ²} .
Thus Pt+1 = Pt εt+1 satisfies E(Pt+1 | Zt) = Pt e^{μ + ½σ²}, which equals Pt, the martingale property, only if μ = −½σ².
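The lognormal mean E(e^{N(μ,σ²)}) = e^{μ+½σ²} underlying this martingale condition can be verified by simulation (σ = 0.2 is an illustrative choice, with μ = −½σ² so that E(εt+1) = 1):

```python
import math
import random

# E[exp(X)] for X ~ N(mu, sigma^2) equals exp(mu + sigma^2/2); choosing
# mu = -sigma^2/2 makes E(eps) = 1, so P_{t+1} = P_t * eps_{t+1} is a martingale.
sigma = 0.2
mu = -0.5 * sigma ** 2
random.seed(3)
N = 400_000
mean_eps = sum(math.exp(random.gauss(mu, sigma)) for _ in range(N)) / N
print(round(mean_eps, 4))
```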
We can use the z-statistic
zj = [r(j) − 0]/√(1/T)
to test that the jth autocorrelation is zero. To test if all autocorrelations are zero, viz.
H0: ρ(1) = ρ(2) = … = ρ(m) = 0 [with m usually set at 6, 12, 18],
we can apply the Ljung and Box (1978) test statistic that provides for approximate finite sample correction:
Q′ = T(T + 2) Σ_{k=1}^{m} r(k)²/(T − k) ~ χ²m .
Under (7.5), the null hypothesis is H0: cov(rt, rt-k) = 0 for any k ≠ 0, so we can use the z-statistic and also the Q-statistic on the continuously compounded return rates. Processes such as ARMA with dependence on lags are said to exhibit long memory and are predictable.
7.2
INFORMATIONAL EFFICIENCY
The issue of whether stock returns are predictable is a very old one, and will always be terribly important as long as the stock market remains a major source of funding for the corporate world. The issue of predictability is intimately connected to the issue of market efficiency. Because the microeconomics and finance literature have been very intense on this topic, at least during a good stretch of the last three decades, with an enormous collection of writings, there has always been some confusion in the construction of definitions and ideas.
To begin with, it is important first to understand what rational expectation
is. Rational expectation is to some economists a very fundamental assumption
of rationality in any financial valuation or price modeling. It assumes that
agents in the market make forecasts in a rational way by incorporating all
given information at hand and then act optimally on these forecasts. The way
information is incorporated is typically in a statistical sense through the
Bayesian rule when risks and uncertainties in relevant variables are expressed
as a multivariate probability distribution.
Suppose we are given a model Yt = a + bXt + ut , ut being i.i.d. If Xt is
known or observed by the agents in the market, the rational expectation of Y t
is then the conditional mean E(Yt|Xt) = a + bXt . This is the best forecast with
minimum error. It is a statistical result no doubt, but it is the way the model
and agents in the economy work. If the forecast is made any other way, e.g.
E(Yt|Xt) = a + bXt − v , v > 0 ,
then obviously it is irrational, since the forecast error
Yt − E(Yt|Xt) = Yt − a − bXt + v = ut + v
is expected to be larger.
Suppose instead of using the given information Xt, the agents in the market use a subset of the information Zt ⊂ Xt to form the forecast E(Yt|Zt) instead. Clearly, this is a rational forecast given Zt, but it is an inefficient use of information: not all information in Xt is applied. The market, based on such inefficient information usage with respect to the information available to the market, is informationally inefficient. Otherwise it is informationally efficient. In this context, it is also referred to as market efficiency30 if there is informational efficiency.
According to our definition, informational efficiency includes necessarily
rational expectations. However, rational expectation by itself strictly does not
ensure informational efficiency as is seen in the case above where only subset
information Zt is used. Rational expectation is a necessary but not sufficient
condition for informational efficiency. Conversely, it is not ordinarily
conceivable in usual circumstances to have irrationality and yet informational
efficiency. We shall illustrate the concept of informational efficiency with
respect to use of available information by an example as follows.
Suppose a true statistical model of returns generation over two periods is
as follows.
Figure 7.1
Today
0.7
good
news
0.3
0.4
0.6
0.2
Earnings
$20
bad
news
$10
good
news
$20
bad
news
0.8
30
good
news
bad
news
$10
There have been various other definitions of market efficiency with regard to
transaction costs. However, the chief usage in finance is in the sense of
informationally efficient or inefficient market.
130
In the first period, there is either good news with probability 0.4 or bad news
with probability 0.6. If it is good news, then in the second period, there is
either good news with probability 0.7 or bad news with probability 0.3. If
there is bad news, then in the second period, there is either good news with
probability 0.2 or bad news with probability 0.8. The information structure is
depicted as follows.
One period has just passed, and the asset giving rise to the earnings is to
be priced by the market today at the start of the second period. Suppose by
some equilibrium asset pricing model taking into account risk aversion and
economy wide factors, the required risk-adjusted rate of return for this asset
over one period is 10%. This equilibrium model is sometimes referred to as
the benchmark model of equilibrium asset price.
If good news had happened in the first period, and the market has
efficiently taken this information into account, the expected earnings for next
period is 0.7 x $20 + 0.3 x $10 = $17. The price of the asset today is $17/1.1 =
$15.45.
If bad news had happened in the first period, and the market has
efficiently taken this information into account, the expected earnings for next
period is 0.2 x $20 + 0.8 x $10 = $12. The price of the asset today is $12/1.1 =
$10.91.
However, if the market is inefficient with respect to this relevant news,
and does not take this into account, the probability of good news next period is
0.4x0.7+0.6x0.2=0.4, and the probability of bad news next period is
0.4x0.3+0.6x0.8=0.6. The expected earnings for next period is 0.4 x $20 + 0.6
x $10 = $14. The price of the asset today under this informationally inefficient
market is $14/1.1 = $12.73.
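The three prices in this example can be reproduced with a few lines:

```python
# Discounted expected-earnings prices from the example, with a required
# one-period risk-adjusted return of 10%.
p_good = (0.7 * 20 + 0.3 * 10) / 1.1        # market efficiently uses good news
p_bad = (0.2 * 20 + 0.8 * 10) / 1.1         # market efficiently uses bad news

# Informationally inefficient market: the news is ignored, so the
# unconditional probability of good news next period is used instead.
p_good_next = 0.4 * 0.7 + 0.6 * 0.2
p_ignore = (p_good_next * 20 + (1 - p_good_next) * 10) / 1.1

print(round(p_good, 2), round(p_bad, 2), round(p_ignore, 2))
```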
Thus we see that in an informationally efficient market, when relevant
information affecting price arrives (as at end of period 1), the asset price will
adjust quickly to reflect this information. If it is good news, price will rise to
$15.45. If it is bad news, price will fall to $10.91. But if the market is not
informationally efficient, the news will not be quickly incorporated into price,
and it will stay at $12.73. If only a subset, including null set, of the revealed
information is utilized by the market, the asset price does not fully incorporate
the information and therefore will not adjust quickly. The market is then
informationally inefficient. This idea of quick price adjustment in the face of
relevant information reaching the market will again be taken up in the chapter
on Event Studies.
There are two important salient points to note in the above example.
Firstly, rational expectations of prices are employed throughout, whether the
market is informationally efficient or not. If a market is rational, yet
inefficient, it means that not all available information out there is utilized or
taken as being given. It could be something about information being costly
and thus cannot be totally appropriated, or market participants are slow to
utilize all existing information at once. Thus rational expectation is not a
sufficient condition for concepts of market efficiency. However, it is usually a
necessary condition, just as rationality is usually the foundation to equilibrium
asset pricing. Secondly, we note that if we use the asset price to assess, whether by how much it moves or does not move, if the market is informationally efficient or not, then this assessment's (or more formally, a test's) conclusion also depends on the benchmark model or the equilibrium required risk-adjusted rate of return. Thus, if in the above example the benchmark model is incorrectly specified, and the required rate of return for the asset is actually 33.5% per period, then when there is good news and the market is informationally efficient, the resulting price is $17/1.335 = $12.73.
Using the incorrect benchmark of 10% will result in the wrong conclusion that the market is informationally inefficient. Hence the often-quoted expression that a test of market efficiency is a joint test of informational efficiency and of a correct equilibrium asset pricing model. A rejection of the null hypothesis of market efficiency could be due, not just to sampling or Type I error, but to an incorrect benchmark model even when the market is informationally efficient. (More rarely, but plausibly, an acceptance of informational efficiency could be due, not just to sampling or Type II error, but to an incorrect benchmark model.) In summary, under a joint hypothesis, the rejection of H0: Market Efficiency could be a rejection of the benchmark model and not of market efficiency. Vice versa, it could be a rejection of market efficiency and not of the benchmark model. It could also be a rejection of both. However, the acceptance of H0 could coincide with market inefficiency together with an incorrect benchmark model. Therefore we need to understand the benchmark model a little more here.
Therefore we need to understand the benchmark model a little more here.
A very general and necessary condition for intertemporal asset pricing
model is the Euler condition31
31
This condition is necessary for many forms of rational asset pricing models, including the CAPM and its many variations. It becomes a testable implication for asset pricing, and beginning with an important theoretical work by Lucas (1978) [Asset Prices in an Exchange Economy, Econometrica 46, 1429-1445], and an important empirical work by Hansen and Singleton (1983) [Stochastic Consumption, Risk Aversion, and Temporal Behavior of Asset Returns, Journal of Political Economy 91, 249-265], the investigations have spawned a whole new era of research, leading to important results such as consumption-based asset pricing, Mehra and Prescott's equity premium puzzle, and Hansen and Jagannathan's lower bound for stochastic discount rate volatility. A related econometrics paper, Hansen and Singleton (1982) [Generalized Instrumental Variables Estimation of Nonlinear Rational Expectations Models, Econometrica 50, 1269-1288], has also generalized results in classical
Pt = Et[ Pt+1 β U′(Ct+1)/U′(Ct) ]    (7.6)
where Pt+1 is the next period asset price, U′(Ct) is the marginal utility of consumption Ct at t, and 0 < β < 1 is a time preference discount factor, or inverse of the impatience (to consume) coefficient, with higher preference for present consumption instead of postponement if β is smaller. The term in the bracket to the right of the price Pt+1 is sometimes called the pricing kernel. Such a necessary condition presupposes liquid trading in a securitized market and the availability of information to all agents.
All agents are assumed to be alike, or homogeneous, so equilibrium in the market or the market clearing price Pt can be attained at t. Importantly, it is assumed that the conditional expectation in (7.6) is taken by agents with respect to all available information Zt in the economy at t. This includes the history of the asset prices {Pt}, denoted 𝒫t, and the history of all available public information, denoted 𝒴t, e.g. earnings records, trading records, announcements, etc. Information in Zt excluding 𝒫t and 𝒴t would be any other information at t that was not publicly available, and would include insider information. The three information sets satisfy the following set relationship:
𝒫t ⊂ 𝒴t ⊂ Zt .
Equation (7.6) provides conditions for fixing the required risk-adjusted
rate of return in the market. For example, if U(.) is quadratic, or if return rates
are MVN, we can derive the single-period CAPM as seen in earlier chapters.
In general, when rational expectation is taken in (7.6) for a general kernel, the
price Pt+1 need not be a martingale or exhibit a random walk. A random walk
is a special case of a martingale as discussed earlier.32 Therefore we should
keep in mind that rejection of random walk does not necessarily amount to
rejecting rational expectations and market efficiency.
As a special case, if U(.) is linear in Ct, which is the same as assuming
agents are risk-neutral, then U′(Ct+1) is a constant, and the ratio
U′(Ct+1)/U′(Ct) = 1. Then we have (letting β ≈ 1 for a sufficiently small interval):

Et( Pt+1 ) = Pt .      (7.7)
It is important to emphasize that this is a conditional expectation taken by the
agents in the economy (and this benchmark model has only one representative
agent, so we are not dealing with a heterogeneous-agents model33) with the
information set Zt used by the agent. Another way to state (7.7) is that the
present price Pt is a sufficient statistic for the conditioning information set
Zt used by the agent. "Sufficient statistic" means that it alone is enough to
produce exactly the same conditional expectation as using the entire
information set Zt at t, which also includes the sufficient statistic.
E[ E( Pt+1 − Pt | Zt ) | Φt^Y ] = 0 , or
E( Pt+1 − Pt | Φt^Y ) = 0      (7.8)

E[ E( Pt+1 − Pt | Zt ) | Φt^P ] = 0 , or
E( Pt+1 − Pt | Φt^P ) = 0      (7.9)

since Φt^P ⊂ Φt^Y ⊂ Zt .
In (7.3), the stochastic process {Pt} is a martingale. A related property of
random variables in any stochastic process {Qt} is that P( Qt+1 | Qt, Qt−1,
Qt−2, … ) = P( Qt+1 | Qt ). Such a process is called a Markov process, and is said
to exhibit the Markov property.
A Markov process {Pt} is necessary for a martingale in prices. Martingale
theory was developed in connection with the idea of fair games. As an example
of a fair game: bet $1; the outcome is $2 with probability 50% and $0 with
probability 50% (double or zero). Expected wealth next round is $1. Under the
double-or-zero strategy, wealth is a martingale; equivalently, the expected gain
is always zero.
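A quick simulation sketch (ours, purely illustrative) confirms that under the double-or-zero game, average wealth stays at the initial $1 after any number of rounds:

```python
import random

random.seed(0)

# Double-or-zero fair game: wealth doubles with probability 1/2, else drops to
# zero, so E[W_{t+1} | W_t] = 0.5*2*W_t + 0.5*0 = W_t: wealth is a martingale.
n_paths, rounds = 100_000, 5
total = 0.0
for _ in range(n_paths):
    w = 1.0
    for _ in range(rounds):
        w = 2.0 * w if random.random() < 0.5 else 0.0
    total += w

avg_wealth = total / n_paths
print(round(avg_wealth, 3))   # close to 1.0: zero expected gain each round
```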
Historically, Et( Pt+1 ) = Pt, or (7.3), has come to be well studied. If we
assume an infinite supply of funds for arbitrage and a very small period interval,
it should hold approximately true under speculative efficiency.34 If the
market is efficient, any information leading to equilibrium price increase (or
decrease) as illustrated by Figure 7.1, must cause the current price to adjust until
the forecast based on all available information at t equals the price Pt. Intuitively,
even if (7.3) is strictly not a benchmark model, it may be a good
approximation. It is also plausible for the economy to allow a positive drift,
i.e. Et( Pt+1 ) = μ + Pt , for some μ > 0.
In view of the earlier discussion of the joint hypothesis of informational
efficiency and correct benchmark model, we simplify the issue by either
assuming that the benchmark model is correct, or that the market is
informationally efficient. Indeed this has been the implicit practice of most
empirical finance research. If we assume that the market is informationally
efficient, then any test of (7.6) is a test of the asset pricing model, and there
has been a copious amount of research testing the Euler condition in (7.6). If
we assume the benchmark model, e.g. (7.3) or (7.8) or (7.9) is correct, then
the test of the condition amounts to testing whether the market is
informationally efficient. We shall adopt this latter assumption of a correct
benchmark, and thus (7.3) is called the strong-form market efficiency
hypothesis, (7.8) is called the semi-strong form market efficiency hypothesis,
and (7.9) is called the weak-form market efficiency hypothesis. Note that
these hypotheses deal with aspects of informational efficiency. Much of this
framework was a result of the seminal contribution by Eugene Fama. In 1970,
he surveyed the idea of an informationally efficient capital market, and gave
the following famous definition: "A market in which prices always fully
reflect available information is called efficient."
What are the hypothesis implications of (7.3), (7.8), and (7.9)? Suppose
we accept (7.3), and thus also (7.8) and (7.9); the market is then said to have
strong-form (informational) efficiency. If we reject (7.3), but accept (7.8) and hence
also (7.9), the market is said to have semi-strong form (informational)
efficiency. If we reject (7.3) and (7.8), but accept (7.9), the market is said to
have weak-form (informational) efficiency.
What exactly are the testable implications of (7.3), (7.8), and (7.9)? How
do we test (7.3), (7.8), and (7.9)? It is easiest to test (7.9), the martingale
property of asset price. Let the price change at t+1 be defined by
et+1 = Pt+1 - Pt .
According to (7.9), its conditional expectation is zero, i.e.
E( et+1 | Φt^P ) = 0 .
(7.10)
Taking iterated (unconditional) expectations,

E[ E( et+1 | Φt^P ) ] = E( et+1 ) = 0 .      (7.11)
A variation of the autocorrelation test is the variance ratio test for random walk
processes. Suppose stock prices {Pt} follow process:
ln Pt+1 = μ + ln Pt + et+1      (7.13)
empirical data work, we shall make this one month. Then σ² is the variance of
monthly return rates. Call the return over one period rt+1 = rt(1). Call the return
over two periods or two months rt(2) = rt+1 + rt+2 ≡ ln( Pt+2/Pt ). The return over
q periods or q months is rt(q) = rt+1 + rt+2 + … + rt+q ≡ ln( Pt+q/Pt ). This
discrete random walk process is consistent with, and implied by, the continuous-time
geometric diffusion process that will be discussed later in the chapter on
bonds and term structures.

The important point is that var[ rt(1) ] = σ², var[ rt(2) ] = 2σ², …,
var[ rt(q) ] = qσ², and so on.
Therefore, the random walk model (7.13) implies that:

var[ rt(q) ] = q var[ rt(1) ] ,      (7.14)

or

var[ rt(q) ] / ( q var[ rt(1) ] ) = 1 .

Equivalently,

var[ rt(q) ] / ( q var[ rt(1) ] ) − 1 = 0 .      (7.15)
Equation (7.15) is thus the testable implication of the random walk process in
(7.13).
For a sample size N+1 that is large relative to q, with price data P0, P1, P2, …,
PN, the sampling estimates of the terms in (7.15), respectively for the
numerator and denominator, are (choose N so that N/q is an integer):

σ̂q²/q = (1/q) · ( 1/(N/q) ) Σ_{j=1}^{N/q} ( ln Pqj − ln Pq(j−1) − qμ̂ )²      (7.16)

and

σ̂1² = (1/N) Σ_{j=1}^{N} ( ln Pj − ln Pj−1 − μ̂ )²      (7.17)

where μ̂ = (1/N) Σ_{j=1}^{N} ( ln Pj − ln Pj−1 ) .
By the Law of Large Numbers for the strongly stationary process rt(q), for
any q, as N → ∞, σ̂q²/q → σ² and σ̂1² → σ² in terms of asymptotic
convergence. In (7.17), by the Central Limit Theorem, the sample average σ̂1²
converges, with mean

E( σ̂1² ) = E[ (1/N) Σ_{j=1}^{N} ( ln Pj − ln Pj−1 − μ̂ )² ] → σ²

and asymptotic variance

var( σ̂1² ) ≈ (1/N) var[ ( ln Pj − ln Pj−1 − μ )² ]
          = (1/N) var( e²t+1 ) = (1/N) var( σ² z²t+1 ) = 2σ⁴/N ,      (7.18)

where zt+1 ≡ et+1/σ ~ N(0,1), so that var( z²t+1 ) = 2. Hence

√N ( σ̂1² − σ² ) →d N( 0, 2σ⁴ ) .

Similarly, for the numerator estimator, each q-period demeaned return
ln Pqj − ln Pq(j−1) − qμ is distributed as √q σ Y with Y ~ N(0,1), so

var( σ̂q²/q ) ≈ ( 1/(N/q) ) var[ (1/q)( ln Pqj − ln Pq(j−1) − qμ )² ]
           = ( 1/(N/q) ) var( σ² Y² ) = 2qσ⁴/N ,

and

√N ( σ̂q²/q − σ² ) →d N( 0, 2qσ⁴ ) .

Consider the statistic Jd = σ̂q²/q − σ̂1² .      (7.19)
Asymptotically the two estimators are positively correlated, with covariance
equal to the variance of the efficient estimator σ̂1², so that

var( σ̂q²/q − σ̂1² ) = var( σ̂q²/q ) − var( σ̂1² ) = 2(q − 1)σ⁴/N .

Then

√N Jd →d N( 0, 2(q − 1)σ⁴ ) .

Dividing by σ̂1², which converges to σ², gives

√N Jr →d N( 0, 2(q − 1) ) , where Jr = σ̂q² / ( q σ̂1² ) − 1 ,

and hence

√( N/(2(q − 1)) ) Jr →d N( 0, 1 ) .
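The whole construction can be sketched compactly in code (a minimal illustration on simulated data; the function and variable names are ours, and this is the simple non-overlapping version, not Lo and MacKinlay's refinement for overlapping data):

```python
import math
import random

def variance_ratio_stat(logP, q):
    """Non-overlapping variance ratio statistics (J_r, Z) as derived in the text."""
    N = len(logP) - 1
    assert N % q == 0, "choose N so that N/q is an integer"
    r1 = [logP[j] - logP[j - 1] for j in range(1, N + 1)]   # one-period returns
    mu = sum(r1) / N                                        # drift estimate mu-hat
    var1 = sum((r - mu) ** 2 for r in r1) / N               # sigma_1^2, as in (7.17)
    rq = [logP[q * j] - logP[q * (j - 1)] for j in range(1, N // q + 1)]
    varq = sum((r - q * mu) ** 2 for r in rq) / (N // q)    # sigma_q^2, as in (7.16)
    Jr = varq / (q * var1) - 1.0
    Z = math.sqrt(N / (2.0 * (q - 1))) * Jr
    return Jr, Z

# simulate 720 monthly log prices from a Gaussian random walk with drift
random.seed(42)
logP = [0.0]
for _ in range(720):
    logP.append(logP[-1] + random.gauss(0.01, 0.04))

Jr, Z = variance_ratio_stat(logP, q=6)
print(round(Jr, 3), round(Z, 3))   # |Z| should be small for a true random walk
```

For genuinely random-walk data, Z is asymptotically standard normal, so values far outside ±2 would reject the null.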
We collect monthly (month-end) S&P 500 index prices Pt from Yahoo!
Finance over the sampling period December 1949 to December 2009. The
statistic Jr, the variance ratio test statistic Z = √( N/(2(q − 1)) ) Jr, and the
p-values of the normality tests are reported in Table 7.1 for different
q = 2, 3, 6, 12, 24, and 36. Here N is 720 observations.
Table 7.1 shows that the random walk hypothesis is rejected in a 2-tailed
test at the 10% significance level for all cases except q = 36. At the 5%
significance level, it is also not rejected at q = 24. However, for q ≤ 12, there is a
very strong indication of positive autocorrelation in the monthly returns. This
35
See Lo, Andrew W., and A. Craig MacKinlay, (1988), Stock Prices do not Follow
Random Walks: Evidence from a Simple Specification Test, The Review of Financial Studies,
Vol. 1, Spring, 41-66, for a more general discussion involving overlapping data.
led to the variance ratio being larger than one, and hence the test statistic being
greater than zero.
Table 7.1
Variance Ratio Test of Random Walk Hypothesis on S&P 500 Prices

  q      Jr        Z         p-value
  2      0.2298    3.5605    0.0004
  3      0.4992    5.9905    0.0000
  6      0.4667    3.7762    0.0002
 12      0.9622    5.3838    0.0000
 24      0.4507    1.7640    0.0777
 36      0.4330    1.3788    0.1679

7.4
ALTERNATIVE HYPOTHESES
We can see how the variance ratio is related to autocorrelations when the random
walk does not hold. Define rt(2) = rt + rt−1; this is a 2-period return ln[ Pt/Pt−2 ].
Define the variance ratio of a 2-period continuously compounded return to
twice a 1-period continuously compounded return:

VR(2) = var[ rt(2) ] / ( 2 var[rt] )
      = ( var[rt] + var[rt−1] + 2 cov[rt, rt−1] ) / ( 2 var[rt] )
      = 1 + ρ1 ,

where ρ1 is the first-order autocorrelation of rt.
More generally, with autocorrelations ρk ≡ corr( rt, rt−k ), the variance of the
q-period return rt(q) = rt + rt−1 + … + rt−q+1 is 1ᵀΣ1, where Σ is the q × q
covariance matrix of ( rt, rt−1, …, rt−q+1 ) whose (i, j)-th element is Var[rt] ρ|i−j| .
Summing all its elements,

Var[ rt(q) ] = Var[rt] [ q + 2(q−1)ρ1 + 2(q−2)ρ2 + 2(q−3)ρ3 + … + 2ρq−1 ] .

Thus,

VR(q) = Var[ rt(q) ] / ( q Var[rt] )
      = 1 + (2/q) [ (q−1)ρ1 + (q−2)ρ2 + (q−3)ρ3 + … + ρq−1 ]
      = 1 + 2 Σ_{k=1}^{q−1} ( 1 − k/q ) ρk .

In particular, for an AR(1) return process with ρk = ρᵏ and |ρ| < 1, the sum can
be evaluated term by term to give the closed form

VR(q) = 1 + 2ρ/( 1 − ρ ) − 2ρ( 1 − ρ^q ) / ( q( 1 − ρ )² ) .
From the above, if VR(q) < 1 for q > 1, then ρ < 0, where |ρ| < 1. Conversely, in
general, when there is positive first-order correlation, ρ > 0, then VR(q) > 1.
The early literature on stock prices indicated that they behave like random
walks.36 If strong-form market efficiency is true, stock return is not
predictable based on any past information. If price is a random walk and there
is no memory or correlation in returns, then VR(q) = 1 for any q. Statistically
significant deviation from 1 indicates long-term memory.
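Numerically, the general relation VR(q) = 1 + 2 Σ_{k=1}^{q−1}(1 − k/q)ρk can be cross-checked against the AR(1) closed form (a quick sketch of ours, assuming ρk = ρ^k):

```python
def vr_from_autocorr(rho, q):
    # VR(q) = 1 + 2 * sum_{k=1}^{q-1} (1 - k/q) * rho_k, with rho_k = rho**k for AR(1)
    return 1.0 + 2.0 * sum((1.0 - k / q) * rho ** k for k in range(1, q))

def vr_ar1_closed_form(rho, q):
    # closed form for an AR(1) return process with coefficient rho, |rho| < 1
    return 1.0 + 2.0 * rho / (1.0 - rho) \
               - 2.0 * rho * (1.0 - rho ** q) / (q * (1.0 - rho) ** 2)

for rho in (-0.3, 0.2, 0.5):
    for q in (2, 6, 12):
        assert abs(vr_from_autocorr(rho, q) - vr_ar1_closed_form(rho, q)) < 1e-12

# positive rho inflates the ratio above 1; negative rho shrinks it below 1
assert vr_from_autocorr(0.2, 6) > 1.0 > vr_from_autocorr(-0.2, 6)
```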
Variance ratio tests have indicated that stocks have smaller variances over a
long horizon than a random walk process would imply. Thus there is some
exhibition of price memory. The existence of price memory may
be explained by short-run disequilibrium during which prices do not adjust
quickly because of the cost of acquiring and evaluating information. Slow
price adjustment and informational inefficiency for limited periods of time
can offer a window for profitable trading using simple technical trading
methods; such short windows of informational inefficiency may explain why
technical trading or charting activities persist from time to time. But of course,
it has also been found that many claimed technical profits disappear when
trading costs such as commissions and fees are counted. Smaller variances over
a longer horizon could also reflect negative return autocorrelation due to price
reversion following initial over-reaction.
7.5
PROBLEM SET
7.1 Suppose we are testing a joint hypothesis that both statements A and B
are simultaneously true. Coming up with some statistical measures, we
perform a test and show that the joint hypothesis that A and B are true is
rejected. Does this mean that by itself statement B is not true?
36
The intellectual debate of the 1960s through the early 1980s was typified by the
popular book, A Random Walk down Wall Street, written by Burton Malkiel. Those
were exciting times when fund managers tried to attract investments by promising
superior returns that beat the market. As it turned out, funds generally did not
consistently beat the market, and indeed most funds lost to the market index in terms
of performance after commission and costs were deducted.
7.2 A simple two period market works as follows. In the first period, the
market opens for trade and an equilibrium price of $10 is reached for
stock X. It is common knowledge that in the next period when market
trades, 4 states of the world are possible. S1, S2, S3, and S4 can occur
with probabilities 0.1, 0.4, 0.4, and 0.1 respectively. Share prices P2 of
$15, $12, $10, and $7 are associated with each of these states in period 2
respectively. Investors' risk-adjusted required rate of return in the market
is homogeneously 10% per period. In period 1, suppose an informed
investor has private information Φ = {S1, S2, S3}, i.e. that one of the states in
the set will occur in period 2.
(i) Conditional on Φ, what is the informed investor's expected stock price
E(P2) for X in period 1? What is the informed investor's new revised
price for X in period 1?
(ii) What is the uninformed price?
(iii) If the investor managed to buy the stock X at $10.20, what is his
abnormal expected return?
7.3 Suppose we can collect the trading activity report of corporate officers
(filed with Exchange Commission) and run a regression
et+1 = c0 + c1It + t+1
where et+1 is residual return over day [t,t+1] after fitting a model such as
market model or CAPM, and It = 1 if officer buys own stocks, -1 if he
sells his own stocks at time t. What does the significance of the OLS estimate
of c1 imply about market efficiency? Which form of efficiency is involved?
7.4 The tree below shows the resolution of uncertainties over time. Each
branch represents a state of nature occurring at that time period (t,t+1].
The number on the branch indicates the probability of occurrence of the
state. The letter at each node denotes the state at that time t. By t=3, the
security prices P3 are revealed as shown on the nodes.
(i) Find conditional expectations E(P3|B) and E(P3|X) at t=1.
(ii) Hence find the expectation of P3 at t=1 when there is no information.
(iii) Suppose the per period risk-adjusted discount rate of the security is
(4/3)1/2-1, independent of the information. What would be the
different prices of the security at t=1 given information B, given
information X, and given no information?
(iv) If the observed market price at t=1 is $3.45, what is a possible
conclusion to draw about market informational efficiency?
[Tree diagram over t = 0, 1, 2, 3: the root at t = 0 branches through intermediate
states A, B, X, Z at t = 1 and t = 2, with branch probabilities including 1/3, 2/3,
1/5, and 4/5, and terminal security prices at t = 3 of $10, $8, $8, $6, $5, $4, $2,
and $1.]
7.5 For the AR(1) process rt = ρrt−1 + εt , show that the variance ratio

VR(q) = 1 + 2ρ/( 1 − ρ ) − 2ρ( 1 − ρ^q ) / ( q( 1 − ρ )² ) .
FURTHER RECOMMENDED READINGS
[1] John F.O. Bilson, (1981), The Speculative Efficiency Hypothesis,
Journal of Business, Vol. 54, No. 3.
[2] John Y. Campbell, Andrew W. Lo, and A. Craig MacKinlay, (1997), The
Econometrics of Financial Markets, Princeton University Press.
[3] Fama, Eugene F., (1970), Efficient Capital Markets: A Review Of
Theory And Empirical Work, Journal Of Finance, Volume 25, Issue 2,
Papers And Proceedings Of The Twenty-Eighth Annual Meeting Of The
American Finance Association New York, N.Y. December, 28-30, 1969,
383-417.
[4] Kian-Guan Lim, (2007), The Efficient Markets Hypothesis: A
Developmental Perspective, Chapter 9 of Pioneers of Financial
Economics Volume II: Twentieth Century Contributions edited by
Geoffrey Poitras.
[5] Burton Malkiel, (1996), A Random Walk Down Wall Street, Revised
Edition, W. W. Norton & Company.
Chapter 8
AUTOREGRESSION AND PERSISTENCE
APPLICATION: PREDICTABILITY
Key Points of Learning
Mean Reversion, Dividend-Price ratio, Forward shift operator, Long-run
return, Serial correlation, Out-of-sample forecast, Price-earnings ratio,
Business cycle, Momentum, Volume
In this chapter we explore how long-run mean reversion in stock returns can
have significant implications for the forecasting or predictability of such returns.
8.1
MEAN REVERSION
Since the late 1980s, there has been a strong surge of research interest in the
belief that stock returns are after all predictable to some extent. We shall see
that this means a certain mean-reverting pattern or mean reversion in returns
over the long run, and also long-term predictability based on some current
fundamental information. It also means that random walk holds approximately
true only for the very short-run prices, but is not appropriate for long-run
prices.
An example of a mean-reversion process is

Yt+1 = c − φYt + et+1

where 0 < φ < 1, and et+1 is a zero-mean i.i.d. process. When Yt is large, Yt+1
tends to be smaller, and vice-versa. The unconditional mean of the stationary
AR(1) process is c/(1+φ). The conditional mean is Et( Yt+1 ) = c − φYt .
If Yt is above the mean c/(1+φ), i.e. Yt > c/(1+φ), then

Et( Yt+1 ) = c − φYt < c − φ[ c/(1+φ) ] = c/(1+φ) .

Conversely, if Yt is below the mean, Yt < c/(1+φ), then Et( Yt+1 ) > c/(1+φ).
Thus, we see the mean reversion effect. When Yt exceeds the unconditional
mean, or the long-run mean level, the expected Yt+1 next period will be
below this mean. Conversely, when Yt is below this long-run mean level,
then the expected Yt+1 will rise above it.
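A short simulation (our illustrative sketch, with assumed values of c and φ and unit-variance Gaussian shocks) verifies both the unconditional mean c/(1+φ) and the reversion of the conditional mean:

```python
import random
import statistics

random.seed(1)
c, phi = 1.0, 0.5            # unconditional mean c/(1+phi) = 2/3
Y = [0.0]
for _ in range(200_000):
    # mean-reverting AR(1): Y_{t+1} = c - phi*Y_t + e_{t+1}
    Y.append(c - phi * Y[-1] + random.gauss(0.0, 1.0))

mu = c / (1 + phi)
mean_Y = statistics.fmean(Y[1000:])      # sample mean after a burn-in
# average next-period value, given Y_t above the long-run mean, falls below it
above = statistics.fmean(Y[t + 1] for t in range(1000, len(Y) - 1) if Y[t] > mu)
print(round(mean_Y, 3), round(above, 3))
```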
As seen in the earlier chapter on market efficiency, when stock returns
exhibit memory, as in negative autocorrelation, they follow a mean-reversion
process, though the reversion could be very small and detectable
only over a long horizon.
This idea of stock return memory is also affirmed by a separate study by
Fama and French (1988)37, who ran regressions of the dependent variable rt,t+k
on the lagged k-period return:

rt,t+k = ak + bk rt−k,t + et+k .
They found the estimated coefficient b̂k to be negative. There is thus
empirical evidence that market prices exhibit some form of memory pattern
within them, and future prices have some predictability content
from past prices.
For example, suppose rt+1 = φrt + εt+1 with a small mean reversion, φ < 0.
Then,

ln Pt+1 − ln Pt = φ( ln Pt − ln Pt−1 ) + εt+1 ,

for φ < 0. Thus, ln Pt+1 = ( 1 + φ ) ln Pt − φ ln Pt−1 + εt+1 .
Thus it is seen that the past price Pt−1 (apart from the current Pt) does predict
the future Pt+1, i.e. there is also price memory as a result of, or in connection
with, predictability in stock returns when φ ≠ 0.
Though random walk of prices has come to be pretty much accepted as
given in the short run, the long-run issue has become quite important from the
point of view of investors who invest for the long-run. The long-run situation
has paramount implications for pension funds, insurance companies, and the
generally greying population considering how their investment plans would
work out in the long horizon. One implication of long-term memory and mean
reversion in stock prices is that volatility of stock returns over a long horizon
does not increase as much as is prescribed by a random walk process. This
smaller risk would induce a larger fraction of wealth to be allocated to equity
if the investor holds a longer horizon, and is not about to liquidate the
positions over short horizons.
8.2
PRICE-DIVIDEND RATIO
One possibility that could explain returns predictability is that the log dividend-price
ratio is positively correlated with return, and more significantly so over a
longer horizon. We shall explore this idea further.
We shall show how dividend-price ratio or its inverse price-dividend ratio
is a fundamental input to stock return. Let us start with the definition of stock
return over one period as the sum of capital gain Pt+1/Pt and dividend yield
Dt+1/Pt .
Rt+1 = ( Pt+1 + Dt+1 ) / Pt .      (8.1)

Dividing and multiplying by dividends,

Rt+1 = ( Pt+1/Dt+1 + 1 ) · ( Dt+1/Dt ) · ( Dt/Pt ) , or

Pt/Dt = ( 1/Rt+1 ) · ( Pt+1/Dt+1 + 1 ) · ( Dt+1/Dt ) .

The left-hand side of the above definition is the price-dividend ratio of the
stock. Taking natural logarithms (lower-case letters denoting logs):

pt − dt = −rt+1 + Δdt+1 + ln( 1 + e^(pt+1 − dt+1) ) .      (8.2)

Expanding ln( 1 + e^(pt+1 − dt+1) ) to first order about a point k,

ln( 1 + e^(pt+1 − dt+1) ) ≈ ln( 1 + e^k ) + [ e^k/(1 + e^k) ] ( pt+1 − dt+1 − k ) ,

so that

pt − dt = −rt+1 + Δdt+1 + c + ρ( pt+1 − dt+1 )
        = c + ρ( pt+1 − dt+1 ) + Δdt+1 − rt+1 ,      (8.3)

where the constant c = ln( 1 + e^k ) − k e^k/(1 + e^k) and ρ = e^k/(1 + e^k), with
0 < ρ < 1. Since ln( 1 + e^x ) > 0 for every x, the linearized term satisfies

c + ρ( pt+1 − dt+1 ) > 0 .      (8.4)
Rewriting (8.3) with the forward shift operator F, where F( pt − dt ) = pt+1 − dt+1 ,

pt − dt = ρF( pt − dt ) + c + Δdt+1 − rt+1 , so that

pt − dt = [ 1/(1 − ρF) ] ( c + Δdt+1 − rt+1 )
        = ( 1 + ρF + ρ²F² + … ) ( c + Δdt+1 − rt+1 )
        = c/(1 − ρ) + Σ_{j=1}^{∞} ρ^(j−1) ( Δdt+j − rt+j ) .      (8.5)

Since the left-hand side is known at t, taking conditional expectations,

pt − dt = c/(1 − ρ) + Et[ Σ_{j=1}^{∞} ρ^(j−1) ( Δdt+j − rt+j ) ] .      (8.6)
Equation (8.6) shows that a high price-dividend ratio on the left-hand side is
associated with high future dividend growth, or else low future returns. (8.6)
shows that price-dividend ratios vary over time because they forecast changing
future dividend growth rates and returns.
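The accuracy of the log-linearization behind this result is easy to check numerically: ln(1 + e^x) is expanded to first order around a point k, with slope ρ = e^k/(1+e^k) and intercept c = ln(1+e^k) − ρk (the expansion point k = 3.5 below is our own arbitrary choice, roughly a mean log price-dividend ratio):

```python
import math

k = 3.5                                    # assumed expansion point
rho = math.exp(k) / (1.0 + math.exp(k))    # slope, strictly between 0 and 1
c = math.log(1.0 + math.exp(k)) - rho * k  # intercept of the linearization

def exact(x):
    return math.log(1.0 + math.exp(x))     # the term in (8.2)

def linearized(x):
    return c + rho * x                     # the approximation used in (8.3)

for x in (3.0, 3.5, 4.0):
    print(x, round(abs(exact(x) - linearized(x)), 5))  # errors are tiny near k
```

The approximation error is exactly zero at x = k and stays small for x near it, which is why the linearization is harmless when p − d is a slowly varying series.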
If future dividend growth or returns in (8.6) are not forecastable, then
pt − dt may be assumed constant. Volatility in pt − dt is largely due to
changing expected future returns (i.e. volatile information impact on
expectations). From (8.3),

rt+1 = c + ρ( pt+1 − dt+1 ) + Δdt+1 + ( dt − pt ) .      (8.7)

In (8.7), we should require the expected future stock return to be positive (even
if realized returns may sometimes be negative). Rewrite (8.7) as

E( rt+1 ) = E[ c + ρ( pt+1 − dt+1 ) ] + E[ Δdt+1 + dt − pt ] .

By (8.4), the first term on the right side is > 0, so a sufficient condition for
E( rt+1 ) > 0 is

E[ Δdt+1 + dt − pt ] ≥ 0 .      (8.8)
Suppose the log dividend-price ratio38 (or simply dividend-price ratio from
38
We should be careful to point out that this is not the usual dividend yield used in
the existing literature. Dividend yield is dividend divided by the previous-period price,
while the dividend-price ratio is dividend divided by the current end-of-period price.
However, this difference is sometimes blurred.
now on) has long persistence (i.e. it is a slow-moving variable):

dt+1 − pt+1 = φ( dt − pt ) + ξt+1 ,      (8.9)

with φ close to 1. Substituting (8.9) into (8.7) gives

rt+1 = c + λ( dt − pt ) + Δdt+1 − ρξt+1 , where λ ≡ 1 − ρφ ∈ (0,1) .      (8.10)

Note that the last terms could be highly serially correlated if Δdt+1 is highly
serially correlated. This may be seen from (8.9) if we take the first difference:

Δdt+1 − Δpt+1 = φ( Δdt − Δpt ) + Δξt+1 , or
Δdt+1 = φΔdt + ( Δpt+1 − φΔpt + Δξt+1 )

where the bracketed term is close to i.i.d. given that the prices are close to
random walks. From (8.10),

E( rt+1 ) = c + λ E( dt − pt ) + E( Δdt+1 − ρξt+1 ) ,

and a positive contribution from the constant terms obtains when

c + E( Δdt+1 − ρξt+1 ) ≥ 0 .      (8.11)

Writing μ ≡ c + E( Δdt+1 − ρξt+1 ), decompose

c + Δdt+1 − ρξt+1 = μ + ηt+1

where ηt+1 has zero mean and may be serially correlated. Substituting this into
(8.10), then

rt+1 = μ + λ( dt − pt ) + ηt+1 .      (8.12)

If rt+1 is a constant, in this case μ, then de facto λ( dt − pt ) and also ηt+1 are
zero. In this case, μ is the riskfree rate at time t+1. We define

r*t+1 = rt+1 − μ

to be the excess return at time t+1, so

r*t+1 = λ( dt − pt ) + ηt+1 .

Then the system of regression equations can be written as

r*t+1 = λ( dt − pt ) + ηt+1
r*t+2 = λ( dt+1 − pt+1 ) + ηt+2
  ⋮
r*t+k = λ( dt+k−1 − pt+k−1 ) + ηt+k .      (8.13)
Notice that λ = 1 − ρφ ∈ (0,1) is small since ρφ is close to 1, and that φ close
to 1 is large. Then, using (8.9) to sum the equations in (8.13), the left-hand side
becomes the excess continuously compounded k-period return rate, and

r*t,t+k = β( dt − pt ) + ut,t+k      (8.14)

where β = λ( 1 + φ + φ² + … + φ^(k−1) ) = λ( 1 − φᵏ )/( 1 − φ ), and the stationary
disturbance ut,t+k involves terms in ηt+k, ηt+k−1, …, ηt+1, ξt+k−1, ξt+k−2, …, ξt+1,
and may be serially correlated. It should be noted that β increases with k.
8.3
OUT-OF-SAMPLE FORECASTS
Fama and French (1988, 1989)39 employed the idea in (8.14) to investigate the
influence of the dividend-price ratio on future stock returns and found non-trivial
effects. The regression (8.14) can be run as it is, in ratio forms without taking
logs, or with an added intercept (as long as the estimate of the intercept is not
significantly large). Thus, we can run OLS on

r*t,t+k = a + b( dt − pt ) + ut,t+k      (8.15)

using the sample t = 1, …, T−k. Now, even though ut,t+k may be serially correlated,
as long as it is not contemporaneously correlated with dt − pt , the estimates â and
b̂ are unbiased and consistent. Because of the lack of a long enough
time series, overlapping data usually have to be used in the regression,
especially for a large k > 1, or long horizon. For example, if we use dependent
variables that are excess returns over a 4-year holding period or horizon, the
regression equations would look like

r1,5 = a + b( d1 − p1 ) + u1,5
39
Fama, Eugene F., and Kenneth R. French, (1988), Dividend Yields and Expected
Stock Returns, Journal of Financial Economics 22, 3-27. Fama Eugene F., and
Kenneth R. French, (1989), Business Conditions and Expected Returns on Stocks
and Bonds, Journal of Financial Economics 25, 23-49.
r2,6 = a + b( d2 − p2 ) + u2,6
r3,7 = a + b( d3 − p3 ) + u3,7
  ⋮
rT−4,T = a + b( dT−4 − pT−4 ) + uT−4,T
where the subscripts (i, j) denote end of year i to end of year j. Thus the excess
returns are 4-year returns. More specifically, a dependent data point, say r3,7,
would mean log( P7/P3 ), the difference of the log price at the end of the 7th
year and the log price at the end of the 3rd year. And the explanatory variable
( d3 − p3 ) would be the log of total dividends issued over the 3rd year less the
log of price at the end of the 3rd year.
If more frequent dividends, e.g. quarterly dividends, are available, and each
period t denotes a quarter, then in the above setup r3,7 would mean log( P7/P3 ),
the difference of the log price at the end of the 7th quarter and the log price at
the end of the 3rd quarter. And the explanatory variable ( d3 − p3 ) would be the
log of total dividends issued over the 3rd quarter (not year) less the log of price
at the end of the 3rd quarter. Moreover, as dividends are sometimes reported in
annualized terms, be sure to convert the annualized figure to a quarterly figure
for such reported dividends. Care with data cannot be over-emphasized if the
theory or model is to be properly tested and utilized.
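The mechanics of the overlapping long-horizon regression can be sketched as follows (simulated data under our own assumed parameter values, not the book's; in practice one would also compute serial-correlation-robust standard errors):

```python
import random

random.seed(7)

def ols(x, y):
    """OLS of y on a constant and a single regressor x; returns (a_hat, b_hat)."""
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    b = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / \
        sum((xi - xbar) ** 2 for xi in x)
    return ybar - b * xbar, b

# Simulate a predictive system: d-p is a persistent AR(1), and the one-period
# return loads on lagged d-p.  Parameter values are illustrative picks.
T, k = 400, 4                     # sample length, return horizon
b1, phi = 0.1, 0.9                # one-period slope, persistence of d-p
dp, r = [0.0], []
for t in range(T):
    r.append(b1 * dp[-1] + random.gauss(0.0, 0.1))
    dp.append(phi * dp[-1] + random.gauss(0.0, 0.1))

# overlapping k-period returns r_{t,t+k} regressed on d_t - p_t, as in (8.15)
y = [sum(r[t:t + k]) for t in range(T - k)]
x = dp[:T - k]
a_hat, b_hat = ols(x, y)
print(round(b_hat, 3))   # near b1*(1 + phi + phi**2 + phi**3): the slope grows with horizon
```

The estimated slope inflates with the horizon k exactly as the persistence argument behind (8.14) predicts.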
The OLS regression produces estimates â and b̂ that are unbiased and
consistent. However, since the errors ut,t+k may be serially correlated (and
perhaps even heteroskedastic), the OLS t-statistics may not be appropriate for
inference, and must be read with care. Given estimates â and b̂, the out-of-sample
forecast of a 4-year return starting at the end of the regression sample T,
i.e. a return over [T, T+4], is given by:

ÊT( rT,T+4 ) = â + b̂( dT − pT ) .      (8.16)

A 95% interval forecast of rT,T+4 is

â + b̂ xT ± t(N−2, 95%) · se · √( 1 + 1/N + ( xT − x̄ )² / Σ_{t=1}^{N} ( xt − x̄ )² )

where xT = dT − pT, x̄ is the in-sample mean of xt, se is the regression standard
error, and N is the in-sample size.
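As an arithmetic sketch of the point forecast and its 95% interval (all inputs below are made-up estimates, not the book's numbers):

```python
import math

# Hypothetical fitted regression r = a + b*x + u over N in-sample points,
# with x = d - p; forecast a 4-year return at x_T and its 95% interval.
a_hat, b_hat = 1.5, 0.4
s_e = 0.12                                 # regression standard error
N = 50
x = [-3.4 + 0.02 * t for t in range(N)]    # illustrative in-sample d_t - p_t values
xbar = sum(x) / N
Sxx = sum((xi - xbar) ** 2 for xi in x)
x_T = -3.2                                 # current d_T - p_T
t_crit = 2.01                              # approx. two-tailed 95% t value, N-2 df

forecast = a_hat + b_hat * x_T             # point forecast, as in (8.16)
half_width = t_crit * s_e * math.sqrt(1.0 + 1.0 / N + (x_T - xbar) ** 2 / Sxx)
print(round(forecast, 2), round(half_width, 3))   # 0.22 0.246
```

Note that the interval widens the further x_T is from the in-sample mean x̄, so forecasts from unusual current dividend-price ratios are the least precise.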
8.4
PRICE-EARNINGS RATIO

Shiller (2001)40 studied the relationship between aggregate U.S. market
price-earnings ratios from 1881 to 1989 and the stock market's aggregate 10-year
real return over the 10-year period immediately after the P/E ratio is
registered. His plot looks as follows in our simplistic reproduction.
Figure 8.1

[Scatter plot: annualized 10-year real return (measured at t + 10 years, scale up
to about 20%) on the vertical axis against the P/E ratio at t (0 to 30) on the
horizontal axis.]
Clearly, there is a negative relationship. When today's P/E is high, the return
over the 10 years into the future tends to be low. A high stock market price
today is therefore bad for investors who wish to hold stocks over the next 10
years. On the contrary, a low P/E or low market price today is good for
long-term investors who wish to buy and hold stocks over the next 10 years, for
the return will be good. This result appears to be in sync with the dividend-price
ratio D/P we just studied. High P/E, or low E/P, is highly similar to low D/P, and
hence predicts low future return. Low P/E, or high E/P, similar to a high
dividend-price ratio, predicts high future returns.
The similarity could be due to the relationship:
P/E = ( P/D ) × DPR

where DPR ≡ D/E (E is EPS, or earnings per share) is the dividend payout
ratio. If DPR = 1, then P/D = P/E and the results apply equivalently to P/E. Or,
we can express the relationship as P/D = ( P/E ) × DC, where DC ≡ E/D is the
dividend cover, the number of times earnings can cover the dividend on a
per-share basis.
The predictability is about time variation in the risk premium (the excess over
riskfree returns), and not about time-varying interest or discount rates, which are
found to have less of an impact. It makes economic sense in that at the low of a
business cycle, prices are low (low P/D but high D/P) and investors are
willing to hold stocks because of high expected returns (a high risk premium)
in the future. This would be compensated by a high expected return
over the long future horizon. High returns happen when the business cycle
reverts.
Conversely, at the high point of a business cycle in a boom, prices are
high (high P/D but low D/P), and low D/P predicts lower long-term returns.
Reversals in business cycles can be seen in the boom time of the 1960s
followed by the 1973 oil and exchange rate crisis, the 1970s recession, and
then the 1980s and 1990s boom (despite a couple of incidents
of sharp market falls in 1987 and 1989), the tech-bubble burst in 2001, and the
subsequent boom from 2002-2003 till the global financial crisis in
2008. Markets then rebounded in late 2009. Business cycle reversals
contribute to the observed predictability phenomenon.
There is an exciting and impactful area of work in investment on
momentum,41 which says that stocks that have climbed in price over the last
several months are likely to continue to do so in the next few months, and
stocks that are losers will continue to be losers over the next few months. Other studies
point out that momentum strategies to buy winners and sell losers (long-short
strategy) are particularly effective only in stocks with some of the largest trade
volumes, and are not evident in illiquid stocks. Hedge funds also come into
the picture and long-short is one popular though increasingly plain vanilla
strategy. Yet other studies of momentum indicate that the strategy works
mostly for small cap stocks. Momentum strategies come in various shapes and
sizes. In China, some studies indicate that momentum is estimated over a
window a few months back, followed by a waiting window, before a short
one-to-two-week window of momentum-induced buying and then liquidation
within the same period. Such strategies are said to yield returns of over
15% p.a. after transaction costs.
41
Jegadeesh, N., and Sheridan Titman, (1993), Returns to Buying Winners and
Selling Losers: Implications for Stock Market Efficiency, Journal of Finance 48,
65-91.
8.5
PROBLEM SET
8.1 Three researchers A, B, and C each have some interesting findings about
stock market research. A finds that using the Box-Ljung test, stock returns are
not correlated, so he concludes that stock prices are not predictable. B
uses variance-ratio tests and finds that they are significantly less than
1 at the 10% level, and concludes that there is evidence of stock return
correlation. C regresses long-term stock returns on their dividend-price
ratios and finds significant coefficients. He concludes that there is long-term
stock price predictability. Are their conclusions all in conflict? Or
can you reconcile their findings? Explain.
8.2 In a regression of 5-year future excess return on current dividend/price
variable, the estimated regression equation is ln(1+Rt,t+5) = 0.56 + 0.2
ln(Dt/Pt), where Rt,t+5 is the 5-year nominal excess return rate. If current
dividend per share is Dt = $1, current price per share is Pt = $10, what is
the forecast of the return rate over the next 5 years if the 5-year riskfree
rate is 1% p.a.?
FURTHER RECOMMENDED READINGS
[1] John H. Cochrane, (2001), Asset Pricing, Princeton University Press.
(Chapter 20 of the book is particularly relevant for the initial part of this
chapter.)
[2] John Y. Campbell, and Luis M. Viceira, (2002), Strategic Asset
Allocation: Portfolio Choice for Long-Term Investors, Oxford University
Press.
Chapter 9
ESTIMATION ERRORS AND T-TESTS
APPLICATION: EVENT STUDIES
Key Points of Learning
Measurement (Estimation) period, Post-event period, Event window, Event
date, Benchmark model, Abnormal return, Standardized abnormal return,
Average abnormal return, Cumulative abnormal return, Cumulative average
abnormal return, Tests of significance, Price pressure effect, Substitution
effect, Information effect, Information leakage
9.1
EVENTS
deviations before the time of the publicly announced event may be interpreted
as information leakage or inside information revelation.
There are three stages in an event study analysis. First, we need to define
the event of interest and identify the period over which the event will be
examined. Next, we have to design a testing framework that defines how to
measure the impact and how to test its significance. Finally, we need to collect
appropriate data to perform the testing of the event's impact and draw
conclusions in a model-theoretic and statistical sense.
Event studies are of various types; the event is typically in the form of an
announcement.
(a) Firm-specific events, e.g. insider trading, announcement of a board change,
announcement of a major strategic change, an unusual rights issue
announcement, announcement of filing for Chapter Eleven (re-organization
in the US in a last effort to avoid bankruptcy and liquidation,
which is Chapter Seven), executive stock option issues, employee stock
option issues, etc.
(b) Across-firms system-wide events, e.g. unanticipated better-than-forecast
earnings announcement (good news), unanticipated worse-than-forecast
earnings announcement (bad news), anticipated better-than or worse-than-forecast
earnings announcement (no news). Others include
announcements of bonus issues, stock splits, mergers and acquisitions,
better or worse than expected dividends, new
debt issues, seasoned equity issues, block sales, and purchases of other
companies' assets and stocks.
(c) Macro events, e.g. an increase in the CPF employer contribution, a projected
decline in GDP growth, regulatory changes, etc.
Event studies focus on the performance of stock prices (or equivalently
stock returns) before, during, and after the event announcement. On the flip
side, it is also about the reaction of stock investors to news or information, if
any, from the event.
9.2 TESTING FRAMEWORK

The daily continuously compounded stock return, computed from end-of-day
prices, is the variable commonly employed, in conjunction with the daily
continuously compounded market return, and also the daily riskfree rate.
Figure 9.1 Event Sampling Frame
[Timeline in trading days t: Measurement (Estimation) Period from t = -250 to t = -11; Event Window from t = -10 to t = +10; Post-Event Period from t = +11 to t = +250.]
Days are measured according to the number of trading days before Event Day 0
(or the event announcement day), or the number of trading days after Event Day
0. In the above, the measurement or estimation period has about 240 sample
points from t = -250 to t = -11. Sometimes, when we suspect that the market is
disruptive and beta may have changed over the nearly 1 calendar year (about
250+ trading days), we use a shorter time series, e.g. 60 trading days
(t = -70 to -11). During a stable measurement period, a longer or larger
sample is better in order to reduce sampling errors in the parameter estimators.
The post-event period that goes up to 1 calendar year is less often used
except for studies such as mergers and acquisitions, buyouts, and IPOs, where
a longer time is required before the effect of the event can be seen. The
event window (or period) is the most important part of the time frame and is
further delineated as follows.
Figure 9.2 Announcement Windows
[Pre-announcement window from t = -10 to t = -1; (Event) Announcement Day at t = 0; Post-announcement window from t = +1 to t = +10.]
The number of days to use in the event window typically includes 2 calendar
weeks (or 10 trading days) before the announcement day, the announcement
day itself, and 10 trading days after the announcement day. This window
should be large enough to show up any possible changes to returns due to the
event. The event date is the day when the event becomes public information
e.g. when the announcement or news is broadcast as public information. It is
denoted as day 0 in the event study calendar.
The parameters estimated from the measurement period are used in the
event period to compute the defined deviation from normal return as a
measure of impact of event, if any.
9.3 BENCHMARK

A normal return will have to be defined. Various benchmark models are used.

(A) Market Model (MVN of stock returns leads to this specification)

$r_{it} = \alpha_i + \beta_i r_{mt} + e_{it}$

for return $r_{it}$ of stock i at time t, where conditions (a), (b), (c) are assumed:
(a) $\mathrm{cov}(r_{mt}, e_{it}) = 0$;
(b) $\mathrm{var}(e_{it}) = \sigma_i^2$, a constant;
(c) $\mathrm{cov}(e_{it}, e_{i,t-k}) = 0$ for $k \neq 0$.

Then OLS regression on the data set, t = -250 to t = -11, will yield BLUE $\hat\alpha_i$ and $\hat\beta_i$, with residual variance estimated as

$\hat\sigma_i^2 = \frac{1}{L-2} \sum_{t=-L-10}^{-11} \left( r_{it} - \hat\alpha_i - \hat\beta_i r_{mt} \right)^2 = \frac{1}{L-2} \sum_{t=-L-10}^{-11} \hat e_{it}^2 .$

The market-model abnormal return at event time $\tau$ is the deviation from the fitted normal return $\bar r_{i\tau} = \hat\alpha_i + \hat\beta_i r_{m\tau}$:

$AR_{i\tau} = r_{i\tau} - \bar r_{i\tau} = r_{i\tau} - \hat\alpha_i - \hat\beta_i r_{m\tau} . \qquad (9.1)$
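The market-model estimation and the AR computation in (9.1) can be sketched in Python. The return series below are simulated placeholders, and the true alpha and beta values are assumptions for illustration only.

```python
import random

random.seed(0)

# Hypothetical illustration of benchmark (A): estimate alpha_i, beta_i by OLS
# over the estimation window, then compute abnormal returns in the event window.
L = 240
rm_est = [random.gauss(0.0005, 0.01) for _ in range(L)]               # market returns, t = -250..-11
ri_est = [0.001 + 1.2 * rm + random.gauss(0, 0.02) for rm in rm_est]  # stock returns (assumed model)

rm_bar = sum(rm_est) / L
ri_bar = sum(ri_est) / L
beta = sum((x - rm_bar) * (y - ri_bar) for x, y in zip(rm_est, ri_est)) / \
       sum((x - rm_bar) ** 2 for x in rm_est)
alpha = ri_bar - beta * rm_bar

# residual variance estimate, divided by L - 2 as in the text
sigma2 = sum((y - alpha - beta * x) ** 2 for x, y in zip(rm_est, ri_est)) / (L - 2)

# abnormal returns over the event window tau = -10..+10, as in (9.1)
rm_evt = [random.gauss(0.0005, 0.01) for _ in range(21)]
ri_evt = [0.001 + 1.2 * rm + random.gauss(0, 0.02) for rm in rm_evt]
AR = [y - alpha - beta * x for x, y in zip(rm_evt, ri_evt)]
```

With L = 240 estimation points, the estimates sit close to the assumed parameter values.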
In (B), the CAPM benchmark, the normal return during the EVENT WINDOW is defined as

$\bar r_{i\tau} = r_{f\tau} + \hat\beta_i \left( r_{m\tau} - r_{f\tau} \right) ,$

so that

$AR_{i\tau} = r_{i\tau} - \bar r_{i\tau} = \left( r_{i\tau} - r_{f\tau} \right) - \hat\beta_i \left( r_{m\tau} - r_{f\tau} \right) .$
Alternative measures of abnormal return include:

(C) Market Adjusted Excess Return, defined as $r_{i\tau} - r_{m\tau}$. This measure may be appropriate especially when the stocks have betas close to one. Note that when beta exactly equals one, then the CAPM abnormal return is also the market adjusted excess return.

(D) Mean Adjusted Excess Return, defined as $r_{i\tau} - \bar r_i$, where $\bar r_i = \frac{1}{L} \sum_{t=-L-10}^{-11} r_{it}$.
[Figure: event days of firms i = 1, 2, ..., N, each aligned at event time 0, so that events are spread out in calendar time but coincide at 0 in event time.]
Avoiding clustering of the events in time prevents confounding events, e.g.
9/11, when all stocks dived. If we had done an event study of positive
announcements and all firm-events lined up around late November-December 2001,
then 9/11 would have produced a negative effect that is certainly not due to
the positive earnings. Spreading out also has the advantage of ensuring the
various $AR_{i\tau}$'s of the various firms do not correlate, so that it is
easier to estimate the variance of a portfolio of the $AR_{i\tau}$'s across
firms. In other words, if the firm-events all cluster together at the same
time, then since firm returns (even after adjusting for market) may still
possess correlation, the portfolio variance will be harder to measure because
of the covariance terms.
Suppose we use the Market Model (A) as our correct benchmark model.
Then the $AR_{i\tau}$'s computed during the Event Window would be randomly
varying about zero provided (a) the model is true (which we assumed already);
and (b) the market is efficient (quick to process information and update price)
and rational (will process relevant information correctly given the
information). The last statement (b) is basically an assumption about market
efficiency. The implication of (a) and (b) is that residual noise does not
display a mean > 0 or < 0 when there is no news to alter the firm or stock's
mean and volatility. In an efficient and rational market, without significant
information impact, the expected value of $AR_{i\tau}$ is zero, conditional on
market information up to and including that at $\tau$. Hence, average abnormal
returns over the event window should not be significantly different from zero
when there is no significant information in the event.
Significant information in the event announcement is taken to be
unanticipated news that causes the market to either (a) re-evaluate the stock's
expected future earnings (thus also dividends), and/or (b) re-evaluate the
stock's risk-adjusted discount rate, resulting in the immediate efficient
adjustment of the stock price. With significant information impact, the
expected value of $AR_{i\tau}$ is non-zero (positive if good news on the stock
and negative if bad news on the stock), conditional on market information up to
and including that at $\tau$. Average abnormal returns over the event window
would then be significantly different from zero. The abnormal return
essentially reflects a change in the conditional expectation of the stock
return by the market.
We define the null hypothesis H0: Event has no impact on stock returns (or,
more specifically, no impact on the stock's abnormal returns). This same
hypothesis can be made more detailed in several ways as follows. From
equation (9.1), the market model abnormal return is

$AR_{i\tau} = r_{i\tau} - \hat\alpha_i - \hat\beta_i r_{m\tau}$

so that
$E(AR_{i\tau} \mid r_{m\tau}) = E(r_{i\tau} \mid r_{m\tau}) - E(\hat\alpha_i \mid r_{m\tau}) - r_{m\tau} E(\hat\beta_i \mid r_{m\tau}) = E(r_{i\tau} \mid r_{m\tau}) - \alpha_i - \beta_i r_{m\tau} = 0$

and

$\mathrm{var}(AR_{i\tau} \mid r_{m\tau}) = \mathrm{var}(r_{i\tau} \mid r_{m\tau}) + \mathrm{var}(\hat\alpha_i \mid r_{m\tau}) + r_{m\tau}^2\, \mathrm{var}(\hat\beta_i \mid r_{m\tau}) + 2 r_{m\tau}\, \mathrm{cov}(\hat\alpha_i, \hat\beta_i \mid r_{m\tau})$

$= \sigma_i^2 \left[ 1 + \frac{1}{L} + \frac{\left( r_{m\tau} - \bar r_m \right)^2}{\sum_{t=-L-10}^{-11} \left( r_{mt} - \bar r_m \right)^2} \right] ,$

where $\bar r_m = \frac{1}{L} \sum_{t=-L-10}^{-11} r_{mt}$.

Recall that while t ranges from -L-10 to -11 within the estimation period,
$\tau$ ranges from -10 to +10 within the event window. The covariance terms
between $e_{i\tau}$ and $(\hat\alpha_i, \hat\beta_i)$ are zero, because
$\hat\alpha_i$ and $\hat\beta_i$ involve $e_{it}$ (t = -L-10, ..., -11) only,
and these are independent of $e_{i\tau}$ in the event window. Note that
estimator errors are added to the variance of $AR_{i\tau}$.
9.4 TEST STATISTICS

If the estimation period sample size L is large, we can use asymptotic
arguments to show that $\mathrm{var}(AR_{i\tau} \mid r_{m\tau})$ converges to
$\sigma_i^2$, or use the latter as an approximation when L is fairly large,
e.g. L = 240. So, $AR_{i\tau} \mid r_{m\tau}$ is asymptotically distributed as
$N(0, \sigma_i^2)$.
Test for each stock i using

$\frac{AR_{i\tau}}{\hat\sigma_i} \sim N(0,1) \text{ or } t_{L-2} . \qquad (9.2)$

This is sometimes called the $SAR_{i\tau}$, the standardized abnormal return. The
distribution is sometimes also interpreted as Student-t with L-2 degrees of
freedom. This is because $\sigma_i^2$ is estimated via

$\hat\sigma_i^2 = \frac{1}{L-2} \sum_{t=-L-10}^{-11} \left( r_{it} - \hat\alpha_i - \hat\beta_i r_{mt} \right)^2$

and $\frac{(L-2)\hat\sigma_i^2}{\sigma_i^2} \sim \chi^2_{L-2}$, so that

$\frac{AR_{i\tau}}{\hat\sigma_i} = \frac{AR_{i\tau}/\sigma_i}{\hat\sigma_i/\sigma_i} = \frac{Z}{\sqrt{\hat\sigma_i^2/\sigma_i^2}} \sim t_{L-2} , \text{ where } Z \sim N(0,1) .$

Averaging across the N event firms at event time $\tau$, the average abnormal
return is

$AAR_\tau = \frac{1}{N} \sum_{i=1}^{N} AR_{i\tau} ,$

with

$\mathrm{var}(AAR_\tau \mid r_m) = \frac{1}{N^2} \sum_{i=1}^{N} \sigma_i^2 .$

So $AAR_\tau \mid r_m \sim N\left( 0, \frac{1}{N^2} \sum_{i=1}^{N} \sigma_i^2 \right)$, and

$\frac{AAR_\tau}{\sqrt{\frac{1}{N^2} \sum_{i=1}^{N} \sigma_i^2}} \sim N(0,1) . \qquad (9.3)$

There is another way to represent this test statistic in terms of the
$SAR_{i\tau}$:

$\frac{1}{\sqrt{N}} \sum_{i=1}^{N} \frac{AR_{i\tau}}{\sigma_i} \sim N(0,1) .$

This is approximately equal to (9.3).
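The standardized abnormal return and the cross-firm z-statistics can be sketched as follows. The firm-level variances and ARs are simulated under the null of no impact; all numbers are placeholders.

```python
import math
import random

random.seed(1)

# Hypothetical sketch of the standardized abnormal return SAR (9.2) and the
# cross-firm AAR z-statistic (9.3), simulated with H0 true (no event impact).
N = 25                                        # number of event firms (assumed)
sigmas = [random.uniform(0.01, 0.03) for _ in range(N)]
AR = [random.gauss(0, s) for s in sigmas]     # one event-day AR per firm

SAR = [a / s for a, s in zip(AR, sigmas)]     # each ~ N(0,1) under H0

AAR = sum(AR) / N
z_aar = AAR / math.sqrt(sum(s ** 2 for s in sigmas) / N ** 2)  # statistic (9.3)
z_sar = sum(SAR) / math.sqrt(N)               # SAR form, approximately equal to (9.3)
```

Under the null both z-statistics behave like standard normal draws, so values beyond about 2 in absolute size signal an event impact at the 5% level.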
In order to test for the persistence of the impact of the event during the
period from $\tau_1$ to k (where $\tau_1$ is the start of the event window,
$\tau_{21}$ is the end of the event window, and $\tau_1 \le k \le \tau_{21}$),
the abnormal returns can be added to obtain the cumulative abnormal return

$CAR(\tau_1, k) = \sum_{\tau=\tau_1}^{k} AR_{i\tau} ,$

$\frac{CAR(\tau_1, k)}{\sqrt{(k - \tau_1 + 1)\, \sigma_i^2}} \sim N(0,1) . \qquad (9.4)$

Similarly, the cumulative average abnormal return is

$CAAR(\tau_1, k) = \frac{1}{N} \sum_{i=1}^{N} \sum_{\tau=\tau_1}^{k} AR_{i\tau} = \sum_{\tau=\tau_1}^{k} AAR_\tau ,$

$\mathrm{var}\left( CAAR(\tau_1, k) \mid r_{m\tau_1}, \ldots, r_{mk} \right) = \frac{1}{N^2} \sum_{i=1}^{N} (k - \tau_1 + 1)\, \sigma_i^2 ,$

$\frac{CAAR(\tau_1, k)}{\sqrt{\frac{1}{N^2} \sum_{i=1}^{N} (k - \tau_1 + 1)\, \sigma_i^2}} \sim N(0,1) . \qquad (9.5)$
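The CAR test in (9.4) is a running sum scaled by the square root of the number of days covered. A minimal sketch, with a simulated AR series under the null and an assumed $\sigma_i$:

```python
import math
import random

random.seed(2)

# Hypothetical sketch of the CAR z-statistic (9.4): cumulate AR from the start
# of the event window up to day k, scale by sqrt((k - tau1 + 1) * sigma_i^2).
sigma_i = 0.02                                     # assumed residual s.d.
AR = [random.gauss(0, sigma_i) for _ in range(21)]  # event days tau = -10..+10, H0 true

def car_z(AR, sigma_i, k_idx):
    """z-statistic for CAR from the first event day up to index k_idx."""
    car = sum(AR[: k_idx + 1])
    n_days = k_idx + 1
    return car / math.sqrt(n_days * sigma_i ** 2)

z_full = car_z(AR, sigma_i, 20)   # whole window [-10, +10]
```

Note that one abnormal return of size $\sigma_i$ each day for d days gives a z-statistic of $\sqrt{d}$, which is why persistent small effects eventually become significant.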
The z-statistics in (9.2), (9.3), (9.4), and (9.5) can be used to test the H0. The
interpretation in each case will be slightly different. For example, in (9.3) a
rejection says that there is an unexpectedly large increase or decrease in return
for that event day, while in (9.5) a rejection says that there is an unexpectedly
large increase or decrease in cumulative return.

The tests are based on the null that the return's mean level and variance or
volatility remain constant. A rejection would mean that the conditional mean
had changed due to the event announcement. If we want to test whether there is a
conditional mean change not due to increased volatility because of the event,
i.e. not due to finite sampling errors caused by bigger variances, then we can
use the sample variance of AAR during the event window, i.e.

$\hat\sigma^2(AAR) = \frac{1}{20} \sum_{\tau=\tau_1}^{\tau_{21}} \left( AAR_\tau - \overline{AAR} \right)^2 ,$

and use

$\frac{AAR_\tau}{\hat\sigma(AAR)} \sim t_{20}$

for testing. It is important to interpret the $CAR_i$ or CAAR graph appropriately.
Figure 9.4 Graph of Cumulative Abnormal Returns based on Events
[Three CAR paths over event days -10 to +10, plotted against 2 s.e. bands: (i) significant leakage of information from day -3; (ii) informational efficiency, with significant impact on (-1,0) and less on (0,1), and a permanent effect on asset value with no reversal in AR; (iii) a temporary or transitory effect such as a price pressure effect, where AR reverses and CAR or CAAR returns to 0.]
The three events in Figure 9.4 show different paths of CAR. In the dotted
event path, CAR is significant only at event date due to a significantly positive
AR. After that, AR is negative, and thus CAR falls back to zero. CAR remains
at zero thereafter. This shows a price pressure effect that could be due to
excessive buying (or selling) not because of information but liquidity. Because
of temporary inelasticity, large volume selling will drive down price
temporarily and large volume buying will drive up price temporarily. As there
is no information, the prices will revert, thus showing a reversal in AR over
one or two days: CAR reverts back to zero after that. The positive AR is
possibly due to buying pressure and illiquidity. Such price pressure effects do
not last and are said to be temporary or transitory.
However, if there is excessive buying pressure, but sellers are plentiful and
can easily substitute into other shares, then the supply curve is flat or highly
elastic, so the buying pressure will not force up price. This substitution effect
will not produce any significant AR or CAR in the first place.
Similarly, if there is excessive selling pressure, but buyers are plentiful and
can easily substitute from other shares, then the demand curve is flat or highly
elastic, so the selling pressure will not force down the market share
price. It is more plausible that the market operates under the information
effect as well as the substitution effect, so dotted event paths as illustrated
in Figure 9.4 will be rare.
At times, even when a piece of information is known to produce asset price
changes, if this information is already known or anticipated, then its
announcement at day 0 will not produce any significant AR or CAR.
However, the solid-line event shown in Figure 9.4 illustrates a significant
event with a positive impact on price and returns, e.g. a positive earnings
announcement. The impact is permanent. The news is also quickly absorbed
and the price adjusted quickly, so that after day 1 of the event there is no
more price adjustment, AR is zero, and CAR stays constant thereafter.
In the other dotted event with positive CAR, the news appeared to hit
before event date, at about t=-3. The AR and thus CAR are significantly
positive. This may indicate information leakage. It appears that inside
information has caused the significant price changes before the public news.
In what follows, we study an actual event on a firm, and show how to
conduct the various event study tests. The data is collected from Center for
Research in Security Prices (CRSP) database managed at the University of
Chicago.
9.5
Around the beginning of the recent global financial crisis, the Bank of
America (BOA) announced on September 15, 2008 (Monday), that it was
acquiring Merrill Lynch (ML) in a $50 billion stock-for-stock exchange. BOA
would exchange 0.8595 of its share for each ML share. The offer price for ML's
share, in terms of the BOA share market value at that time, represented about
US$29 a share.
It was to be one of the most tumultuous weeks in the financial history of
global capital markets. Early on that Monday, Lehman Brothers, the number
four investment bank in the U.S., had filed for Chapter 11 bankruptcy. Earlier
in March, JP Morgan Chase had bought out the ailing Bear Stearns at a deep
discount. These investment firms had gotten into deep trouble by sinking huge
chunks of investment capital into real estate derivatives and collateralized
debt obligations (CDOs), and these instruments were either becoming a tiny
fraction of their original values or could not be traded as liquidity had dried
up in the fear of troubled mortgages.
Merrill Lynch's share was last traded at the close of the previous week at
US$17.05 a share, so BOA appeared to pay a premium of 70% above the last
market price. At its peak in 2007, ML's share was selling above US$98. The
deal also carried substantial risks for BOA, as ML had billions of dollars in
assets tied to mortgages that had dived in value. Merrill had reported four
straight quarterly losses.
Our task in this single-event single-firm study is to try as scientifically as
possible to test (a) if the acquisition announcement on September 15, 2008
(event day 0) contained significant information, whether good news or bad
news for the stockholders, and (b) if there was a leakage42 of information
during the 2 weeks (10 trading days) before the announcement.
We use daily traded data during one calendar year (about 252 market price
observations) before the event window [-10,+10] to estimate the market model
(A) as benchmark. The $\hat\alpha_i$ and $\hat\beta_i$ estimates of the market
model parameters for BOA are then used to estimate the abnormal returns

$AR_{i\tau} = r_{i\tau} - \hat\alpha_i - \hat\beta_i r_{m\tau}$

in the event window. The abnormal returns during the event window, 10 days
prior to the announcement date till 10 days post-announcement, are shown in
Figure 9.5.
Figure 9.5 Abnormal Returns Around the Event that Bank of America (BOA) Announced Acquisition of Merrill Lynch on September 15, 2008
[Daily abnormal return rate, roughly within -0.15 to +0.15, over trading days from August 29 to September 29, 2008, with the announcement date marked at September 15.]
42 All information and data are collected from known published sources such as
Yahoo Finance, CRSP, and the SGX database. The term leakage is purely a
technical word denoting information being known by a subset of investors. There
is no connotation of wrongdoing or of issues that would be implicated by law.
The cumulative abnormal return is shown in Figure 9.6.
Figure 9.6 Cumulative Abnormal Returns of Bank of America (BOA) from August 29 to September 29, 2008
[CAR rising from about 0 to about 0.30 over event days -10 to +10, with the event date marked at September 15.]

Table 9.1 CAR z-statistics of BOA around the announcement date (dates in the
first column recovered from the trading-day axis of Figure 9.6)

Dates    N(0,1)       Dates    N(0,1)
Aug 29   0.7719       Sep 15   0.3517
Sep 2    2.1730*      Sep 16   1.2196
Sep 3    2.6002**     Sep 17   1.3112
Sep 4    1.9566       Sep 18   1.6551
Sep 5    2.5640**     Sep 19   2.9633**
Sep 8    2.9481**     Sep 22   2.6989**
Sep 9    2.7704**     Sep 23   2.6798**
Sep 10   2.3756*      Sep 24   2.5824**
Sep 11   2.1583*      Sep 25   2.5253*
Sep 12   2.2696*      Sep 26   3.0170**
CAR values that are significantly different from 0 at the 5% 2-tail significance
level are marked with an asterisk *, whereas those significant at the 1% 2-tail
level are marked with two asterisks **.
Figures 9.5, 9.6, and Table 9.1 indicate that BOA's prices had increased
two weeks prior to the announcement date. At the event date on September 15,
there was a significant drop in price. There did not appear to be any leakage,
as the price did not show any marked changes in the few days before the event
date. The significant drop was probably due to the market taking the news as
bad, believing BOA was undertaking a high risk in acquiring ML.

However, in the 2 to 4 days subsequent to the announcement, BOA's prices
showed reversion back to normality and in fact climbed somewhat. This may
be due to stockholders realizing after all that the acquisition was actually
good for the future business development of BOA.

The news reported that BOA was combining its own large consumer and
corporate banking business with ML's global wealth management, advisory
expertise, and financial services capabilities and capacities, creating a huge
finance corporation that would rival Citigroup Inc.
9.6
A SEMI-CONDUCTOR FIRM
Some time during September 2002, Chartered Semiconductor Manufacturing (CSM)
drew a wave of criticism after it stunned the market on Monday, September 2,
by announcing a massive rights issue (8 rights for every 10 ordinary shares)
of S$1.1 billion at S$1 a share when its share was trading at the previous
Friday's close of S$2.10. There was talk of information leakage to the big
boys and a market sell-down the week earlier.

Even as late as August 31, CSM (the world's third largest chip maker, but
one that had seen its share price fall from a height of glory at over S$19 in
2000 to just over S$2 in mid-2002), which had lost money in the last 2 years
before 2002, had maintained that its cash balance and access to credit were
strong enough, and the market deduced that further fund raising was not
imminent.
Ordinarily, ceteris paribus, a rights issue should not increase or reduce
existing share value. We illustrate as follows.

An ordinary shareholder holds 1 share at $X. Suppose he is given a right
1:1 to buy a second newly issued share at $Y < $X. The holder's original
wealth is $(X + Y), with $Y in some other assets, e.g. a bank deposit. After
all ordinary shareholders exercise the rights, the new share price is
$N(X+Y)/(2N) = $(X+Y)/2, where N is the original number of shares of the firm.
If Y < X, it can be seen that the new share price is diluted or diminished.
But the original holder now has 2 shares, so his wealth is still $(X + Y), and
there is no change in value.
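The dilution arithmetic above can be checked with a quick numeric sketch. The prices echo the CSM episode, but the share count N is an assumed figure for illustration only.

```python
# Hypothetical numbers checking the rights-issue arithmetic:
# a 1:1 rights issue at $Y per new share when the share trades at $X.
X, Y = 2.10, 1.00        # illustrative prices echoing the CSM episode
N = 1_000_000            # original number of shares (assumed)

new_price = N * (X + Y) / (2 * N)     # = (X + Y) / 2, diluted below X when Y < X
holder_wealth_before = X + Y          # one share plus $Y held elsewhere
holder_wealth_after = 2 * new_price   # two shares at the diluted price

assert new_price < X                              # price is diluted
assert abs(holder_wealth_before - holder_wealth_after) < 1e-9  # wealth unchanged
```

So the rights issue by itself dilutes the share price but leaves the exercising holder's wealth unchanged, which is why any price reaction must come from the signal the issue sends.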
169
The exception is that his $Y was originally in his own pocket and could
seek other investments, but now is constrained to be used as new capital in the
firm. Thus a firm will issue rights in 2 kinds of situations: (1) when it
requires additional capital funding for new growth projects that will improve
returns (good news, welcomed by the shareholders because the $Y has a higher
return in the firm than in the shareholders' alternatives), and (2) when it
requires additional capital funding because it is losing money and needs to put
in more capital (bad news, not welcomed by the shareholders because the $Y has
a lower return in the firm than in the shareholders' alternatives).
Good news or good signals by the rights issue announcement will lead to
increased share price, while bad news or bad signals by the rights issue
announcement will lead to decreased share price.
In this particular episode, bad news appeared to be the case with CSM.
There are of course happy episodes with CSM, including its rise in prosperity
back in the middle of the 2000s. The sequence of events spans the close of
Friday, August 30, 2002, the 6 am announcement on Monday, September 2, 2002,
and the close of Monday, September 2, 2002.
The Business Times of September 5, 2002, reported: "But the biggest question of
all concerns the activity in its shares in the run-up to Monday's announcement
of the rights issue. ... Pre-announcement or not, it did not go unnoticed that
some quarters seemed to have started offloading Chartered's stock several days
before Monday's announcement. This heavy-volume selling drove down the
stock's price by more than 16 per cent during the week." "There was clearly
a leak," thundered the director of a local broking house. "How else do you
explain the share price falling sharply since last week, way ahead of the
announcement?"
Our task in this single-event single-firm study is to try as scientifically as
possible to test (a) if the rights issue announcement on September 2, 2002
(event day 0) contained significant information that influenced the share
price, and (b) if there was a leakage43 of information during the 2 weeks
(10 trading days) before the announcement.
43 All information and data are collected from known published sources.
Data were collected from the Straits Times stock reporting pages. The
Market Model in (A) is used as the benchmark model. The abnormal returns
from 10 trading days before the announcement date up to 8 trading days
post-event [-10, +8] are shown in Figure 9.7 below. (We use +8 because we
collected the data till that date.)
Figure 9.7 Abnormal Returns Around the Event that Chartered Semiconductor Manufacturing (CSM) Announced a Massive Rights Issue on September 2, 2002
[Daily abnormal return rate over trading days from August 19 to September 12, 2002, with the announcement date marked at September 2.]
The event date or the date of the announcement was September 2nd, 2002. In
event time, this is t=0.
The cumulative abnormal returns are shown in Figure 9.8 below, starting
from event date -10 to +8. Figure 9.8 shows that the CAR remains flat in the
post-announcement window, indicating a permanent information effect.

Table 9.2 then shows the t-test results of the CAR (starting from t = -5) for
the days of the week prior to the Monday, September 2, 2002, morning
announcement, and including the event date itself. As shown in the chapter,
the z-statistic is often used as an approximation to the t-statistic when the
sample size is large.
Figure 9.8 Cumulative Abnormal Returns of Chartered Semiconductor Manufacturing (CSM) from August 19 to September 12, 2002
[CAR falling from about 0 to about -0.40 over event days -10 to +8, with the event date marked at September 2.]
Table 9.2 T-statistics of the CAR from t = -5 to t = 0

Dates    N(0,1)
Aug 26   -0.11001
Aug 27   -0.54175
Aug 28   -0.98432
Aug 29   -1.21243
Aug 30   -2.64587**
Sep 2    -6.17074**
$CAR_i(\tau_1, \tau_2) = \sum_{t=\tau_1}^{\tau_2} AR_{it}$

where $\tau_1 = -10$ and $\tau_2 = +10$.

(ii) Suppose the sum of the variances of the $AR_{it}$ of 5 event-stocks
(i = 1, ..., 5) is $\sum_{i=1}^{5} \sigma_i^2$ ...
Chapter 10
MULTIPLE LINEAR REGRESSION
AND STOCHASTIC REGRESSORS
Key Points of Learning
Stochastic regressor, Consistency, Asymptotic efficiency, Orthogonal
projection, Ordinary least squares method, Tests of restrictions, Adjusted R2,
Schwarz criterion, Akaike information criterion, Hannan-Quinn criterion,
Forecasting
In this chapter we continue with the linear regression theory and extend the
Two-Variable case in Chapter 3 to more than two variables. This is called
multiple linear regression when there is more than one explanatory variable
besides the constant.
10.1 STOCHASTIC REGRESSORS

The linear regression equation is $\tilde Y_i = \alpha + \beta \tilde X_i + \tilde u_i$,
where $\tilde Y_i$ is the dependent variable (or regressand), $\tilde X_i$ is
the independent or explanatory variable (regressor), $\tilde u_i$ is the
disturbance or error variable, and $\alpha$ and $\beta$ are regression
coefficients. The tildes denote random variables (we usually would omit them
if the context is clear whether they are to be treated as random variables or
as the realized values of the random variables). Each pair of observables
$(X_i, Y_i)$ for i = 1, 2, 3, ..., N are realizations of the random variables in

$\tilde Y_i = \alpha + \beta \tilde X_i + \tilde u_i .$
The Classical Conditions (desirable conditions) for OLS regression are:
(A1) $E(\tilde u_i) = 0$ for every i,
(A2) $E(\tilde u_i^2) = \sigma^2$, a constant, for every i,
(A3) $E(\tilde u_i \tilde u_j) = 0$ for every $i \neq j$,
(A4) $\tilde X_i$ and $\tilde u_j$ are stochastically independent (of all other r.v.s) for each i, j,
(A5) $\tilde u_i \sim N(0, \sigma^2)$.

With stochastic regressors, (A1) can be replaced by the weaker conditional version

$E(\tilde u_i \mid \tilde X_j, \text{all } j) = 0 \text{ for every } i . \qquad (10.1)$

This implies $E(\tilde u_i) = 0$ and $\mathrm{cov}(\tilde u_i, \tilde X_j) = 0$
for all i, j. The latter is not as strong as stochastic independence between
$\tilde X_i$ and $\tilde u_j$, and is implied by the latter. The condition can
be written in matrix format as $E(\tilde U_{N\times1} \mid \tilde X_{N\times1}) = 0$.
Similarly, (A2) and (A3) can be replaced by

$E(\tilde u_i^2 \mid \tilde X_j, \text{all } j) = \sigma^2$, the same constant for every i, and
$E(\tilde u_i \tilde u_j \mid \tilde X_k, \text{all } k) = 0$ for every $i \neq j . \qquad (10.2)$
Note that under the weaker conditions (10.1) and (10.2), full stochastic or
probabilistic independence of $\tilde X_i$ and $\tilde u_j$ is obviated. We can
also replace (A5) by the conditional version $\tilde U \mid \tilde X \sim N(0, \sigma^2 I)$.
The OLS estimator of $\beta$ is

$\hat\beta_{OLS} = \frac{\sum_{i=1}^{N} \left( \tilde X_i - \frac{1}{N}\sum_{i=1}^{N} \tilde X_i \right) \left( \tilde Y_i - \frac{1}{N}\sum_{i=1}^{N} \tilde Y_i \right)}{\sum_{i=1}^{N} \left( \tilde X_i - \frac{1}{N}\sum_{i=1}^{N} \tilde X_i \right)^2} .$

Since the data are random variables, the estimators $\hat\alpha$ and $\hat\beta$
themselves are random variables. Given that $(\tilde X_i, \tilde Y_i)$ for each
i take realizations $(X_i, Y_i)$, the estimators $\hat\alpha$ and $\hat\beta$
also take realizations in the form of estimates (unique for each sample
$\{X_i, Y_i\}_{i=1,2,3,\ldots,N}$). The estimates are

$\hat\alpha = \bar Y - \hat\beta \bar X , \qquad \hat\beta = \frac{\sum_{i=1}^{N} (X_i - \bar X)(Y_i - \bar Y)}{\sum_{i=1}^{N} (X_i - \bar X)^2} .^{44}$
Again the symbols $\hat\alpha$ and $\hat\beta$ shall be used both in the
context of random variables (estimators) and in the context of estimates. Now,
as an estimator, substituting $\tilde Y_i = \alpha + \beta \tilde X_i + \tilde u_i$ gives

$\hat\beta = \beta + \frac{\sum_{i=1}^{N} \left( \tilde X_i - \frac{1}{N}\sum_{i=1}^{N} \tilde X_i \right) \left( \tilde u_i - \frac{1}{N}\sum_{i=1}^{N} \tilde u_i \right)}{\sum_{i=1}^{N} \left( \tilde X_i - \frac{1}{N}\sum_{i=1}^{N} \tilde X_i \right)^2} .$

By the law of iterated expectations,

$E(\hat\beta) = E\left[ E(\hat\beta \mid X) \right] = \beta + E\left[ \frac{\sum_{i=1}^{N} \left( \tilde X_i - \frac{1}{N}\sum \tilde X_i \right) E\left( \tilde u_i - \frac{1}{N}\sum \tilde u_i \,\middle|\, X \right)}{\sum_{i=1}^{N} \left( \tilde X_i - \frac{1}{N}\sum \tilde X_i \right)^2} \right] = \beta ,$

since conditional on X = {X1, X2, X3, ..., XN}, $E(\tilde u_i \mid X) = 0$.
Likewise it can be shown that $E(\hat\alpha) = \alpha$. This is because

$E(\hat\alpha) = E\left( \frac{1}{N}\sum_{i=1}^{N} \tilde Y_i - \hat\beta \cdot \frac{1}{N}\sum_{i=1}^{N} \tilde X_i \right) = E(\tilde Y_i) - E\left[ E(\hat\beta \mid X) \cdot \frac{1}{N}\sum_{i=1}^{N} \tilde X_i \right] = E(\tilde Y_i) - \beta E(\tilde X_i) = \alpha .$
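The unbiasedness argument can be illustrated with a small Monte Carlo sketch: averaging the estimator over many repeated samples with stochastic X should land near the true slope. All parameter values below are assumptions for illustration.

```python
import random

random.seed(3)

# Hypothetical Monte Carlo check of unbiasedness: with stochastic X and
# E(u | X) = 0, the average of beta-hat across repeated samples sits near beta.
alpha_true, beta_true, n, reps = 1.0, 2.0, 50, 400
betas = []
for _ in range(reps):
    X = [random.gauss(0, 1) for _ in range(n)]   # stochastic regressor
    u = [random.gauss(0, 1) for _ in range(n)]   # drawn independently of X
    Yv = [alpha_true + beta_true * x + e for x, e in zip(X, u)]
    xb, yb = sum(X) / n, sum(Yv) / n
    b = sum((x - xb) * (y - yb) for x, y in zip(X, Yv)) / \
        sum((x - xb) ** 2 for x in X)
    betas.append(b)

mean_beta = sum(betas) / reps   # close to beta_true = 2.0
```

Any single beta-hat wanders around the truth, but the average over replications reflects $E(\hat\beta) = \beta$.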
The OLS estimators are the best linear unbiased estimators (BLUE): amongst all
linear unbiased estimators of the form $\sum_{i=1}^{N} w_i Y_i$, for different
weights $w_i$ that are not functions of the $Y_i$'s nor of the parameters, but
just functions of X, they have the least variance given the same sample size.

For consistency, write

$\hat\beta = \frac{\frac{1}{N}\sum_{i=1}^{N} \left( X_i - \frac{1}{N}\sum X_i \right) \left( Y_i - \frac{1}{N}\sum Y_i \right)}{\frac{1}{N}\sum_{i=1}^{N} \left( X_i - \frac{1}{N}\sum X_i \right)^2} .$

By the law of large numbers for stationary random variables,
$\mathrm{plim}_{N\to\infty} \frac{1}{N}\sum_{i=1}^{N} X_i = E(X_i)$. So,

$\mathrm{plim}\,\hat\beta = \frac{E(X_i Y_i) - E(X_i)E(Y_i)}{\mathrm{var}(X_i)} = \frac{\mathrm{cov}(X_i, Y_i)}{\mathrm{var}(X_i)} = \beta .$

This convergence is in the sense of a probability limit plim, or convergence in
probability. It means that, in the ordinary limit sense of numbers in Euclidean
distance, $\lim_{N\to\infty} P\left( |\hat\beta_N - \beta| > \epsilon \right) = 0$
for any $\epsilon > 0$.

44 When dealing with stochastic regressors X, some authors, e.g. Judge et al.
(1980), Introduction to the Theory and Practice of Econometrics (2nd ed.,
p. 573), John Wiley, prefer not to call this a linear estimator since it is now
a stochastic function of Y: it contains $w_i$ that is a function of stochastic
X. Hence it is strictly not BLUE, unless we again condition on X. Conditioned
on X, it of course has minimum variance amongst all unbiased estimators
of the same linear form.
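Consistency can be sketched by computing beta-hat at increasing sample sizes: the estimates concentrate around the true slope as N grows. The data-generating values are assumptions for illustration.

```python
import random

random.seed(4)

# Hypothetical illustration of consistency: the sampling spread of beta-hat
# shrinks roughly like 1/sqrt(N) (cf. Figure 10.2), so plim beta-hat = beta.
beta_true = 1.5

def beta_hat(N):
    """One OLS slope estimate from a fresh sample of size N."""
    X = [random.gauss(0, 1) for _ in range(N)]
    Yv = [beta_true * x + random.gauss(0, 1) for x in X]
    xb, yb = sum(X) / N, sum(Yv) / N
    return sum((x - xb) * (y - yb) for x, y in zip(X, Yv)) / \
           sum((x - xb) ** 2 for x in X)

for N in (60, 200, 1000):
    print(N, round(beta_hat(N), 4))   # estimates cluster around 1.5
```

Re-running at each N many times would trace out the tightening densities depicted in Figure 10.2.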
The convergence can be depicted as a series of graphs with different
sample sizes N in Figure 10.2.

Figure 10.2
[Density functions of the sampling distribution of the estimator for N = 60, N = 200, N = 1000, and N → ∞, tightening around the true mean as N grows.]

The above density functions depict those of the sampling distributions of the
estimator. If consistent estimators also have the least asymptotic variances,
then they are asymptotically efficient.
Under the classical conditions, OLS estimators are BLUE, consistent, and
asymptotically efficient (amongst the class of linear consistent estimators).
With additional distributional assumption (A5), OLS estimators are also
normally distributed, and statistical inference becomes possible in finite
sample. (Note that without (A5), statistical inference is sometimes possible,
and if so, only in the asymptotic sense when the sample size is very large and
that the statistics converge to some kind of normality under the Central Limit
Theorem.)
However, there are situations in which OLS will not possess these
desirable sampling properties.
It is a good time to pause and ask why the stochastic regressor $\tilde X$ is necessary.
In economics and finance, the explanatory variables are not drawn within a
laboratory control setting. Instead, they are drawn from some probability
distribution, and therefore are not determined ex-ante. Therefore the
assumption of stochastic independence between X and U is important, or else
it is important to be able to say that given X, U behaves like white noise.
However, if we are able to reproduce a new time series sample $\{X_i, Y_i\}$ by
repeatedly sampling new $Y_i$ while keeping $X_i$ fixed, then the
weaker set of classical conditions that are conditioned on X becomes quite
intuitive and obvious. We shall keep to the more general setting of stochastic
regressors except when situations allow otherwise.
The above discussion extends to multivariate linear regression with more
than two regressors including the constant. For multivariate regression
involving many variables, it is more convenient to work with matrices. For the
set of k explanatory variables including the constant, the matrix notation X Nxk
is used. The k columns denote the k number of regressors, while the N rows
denote the sample values of each regressor in a sample of size N.
The homoskedasticity and non-autocorrelation condition is written as
$E(\tilde U_{N\times1} \tilde U_{N\times1}^T \mid \tilde X_{N\times k}) = \sigma^2 I_{N\times N}$,
and the conditional zero mean condition is
$E(\tilde U_{N\times1} \mid \tilde X_{N\times k}) = 0_{N\times1}$. When these
conditions hold, or when X and U are stochastically independent, and the
classical conditions are met, the OLS test statistics such as the t- and
F-statistics remain valid, provided the appropriate variances are used for
normalizing.
10.2

When we extend the linear regression analyses to more than two variables, we
are dealing with the multiple linear regression model, viz.

$Y_i = b_0 + b_1 X_{1i} + b_2 X_{2i} + b_3 X_{3i} + \cdots + b_{k-1} X_{k-1,i} + u_i \qquad (10.4)$

or $Y = XB + U$ in matrix form, where

$Y_{N\times1} = \begin{pmatrix} Y_1 \\ Y_2 \\ \vdots \\ Y_N \end{pmatrix}, \quad X_{N\times k} = \begin{pmatrix} 1 & X_{11} & \cdots & X_{k-1,1} \\ 1 & X_{12} & \cdots & X_{k-1,2} \\ \vdots & \vdots & & \vdots \\ 1 & X_{1N} & \cdots & X_{k-1,N} \end{pmatrix}, \quad U_{N\times1} = \begin{pmatrix} u_1 \\ u_2 \\ \vdots \\ u_N \end{pmatrix}, \quad B_{k\times1} = \begin{pmatrix} b_0 \\ b_1 \\ \vdots \\ b_{k-1} \end{pmatrix} .$

The classical conditions are correspondingly $\mathrm{var}(U) = \sigma_u^2 I_{N\times N}$,
where $E(u_i^2) = \sigma_u^2$, and, with the normality assumption,
$U \sim N(0, \sigma_u^2 I)$.
The conditional zero mean condition (A4), $E(U \mid X) = 0$, implies
$E(U) = 0$ by the Law of Iterated Expectations. Thus, it also implies that X
and U are not contemporaneously correlated, i.e. $\mathrm{cov}(X_{ji}, u_i) = 0$
for every j.
Let the estimated residuals be $\hat U = Y - X\hat B$. The sum of squared
residuals is $\sum_{t=1}^{N} \hat u_t^2 = \hat U^T \hat U$. Applying the
Ordinary Least Squares method, minimizing $\hat U^T \hat U$ gives the
first-order condition

$-2X^T Y + 2X^T X \hat B = 0 .$

So $\hat B = (X^T X)^{-1} X^T Y$. $(X^T X)$ has dimension k x k. For the inverse
of $(X^T X)$ to exist, $(X^T X)$ must have full rank of k. The rank of
$X_{N\times k}$ is at most min(N, k). Thus, k must be smaller than N. This
means that when we perform a regression involving k explanatory variables
(including the constant), we must employ a sample size N larger than k.
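The matrix formula $\hat B = (X^TX)^{-1}X^TY$ can be sketched by solving the normal equations $(X^TX)B = X^TY$ directly. The data and true coefficients below are assumptions; the small linear-algebra helpers are hand-rolled for self-containment.

```python
import random

random.seed(5)

# Hypothetical sketch of OLS in matrix form, via the normal equations.
def transpose(M):
    return [list(row) for row in zip(*M)]

def matmul(A, B):
    Bt = transpose(B)
    return [[sum(a * b for a, b in zip(row, col)) for col in Bt] for row in A]

def solve(A, b):
    """Solve A x = b (b a column vector) by Gauss-Jordan elimination with pivoting."""
    n = len(A)
    M = [row[:] + [b[i][0]] for i, row in enumerate(A)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(n):
            if r != col:
                f = M[r][col] / M[col][col]
                M[r] = [mr - f * mc for mr, mc in zip(M[r], M[col])]
    return [[M[i][n] / M[i][i]] for i in range(n)]

N, b_true = 200, [0.5, 1.0, -2.0]        # intercept and two slopes (assumed)
X = [[1.0, random.gauss(0, 1), random.gauss(0, 1)] for _ in range(N)]
Y = [[sum(b * x for b, x in zip(b_true, row)) + random.gauss(0, 0.1)] for row in X]

Xt = transpose(X)
B_hat = solve(matmul(Xt, X), matmul(Xt, Y))   # solves (X'X) B = X'Y
```

Here N = 200 comfortably exceeds k = 3, so $(X^TX)$ is invertible with probability one and the estimates land near the assumed coefficients.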
10.3

Figure 10.3
[The k-dimensional hyperplane formed by XB (for any B), or linear combinations of the k columns of $X_{N\times k}$; the residual $Y - X\hat B$ is orthogonal to the fitted vector $X\hat B$.]

Thus $X^T (Y - X\hat B) = 0$. Therefore, we obtain $\hat B = (X^T X)^{-1} X^T Y$.
10.4

Substituting $Y = XB + U$,

$\hat B = (X^T X)^{-1} X^T Y = B + (X^T X)^{-1} X^T U .$

If $E\left[ (X^T X)^{-1} X^T U \right] = 0$, as under the conditional zero mean
condition together with (A3), then $E(\hat B) = B$. Note that in this case
there is no restriction on X (i.e. it is not necessary that there must be a
constant regressor amongst X) for $\hat B$ to be BLUE. Therefore,

$\hat B - B = (X^T X)^{-1} X^T U .$

The k x k covariance matrix of $\hat B$, given X, is

$\mathrm{var}(\hat B \mid X) = E\left[ (\hat B - B)(\hat B - B)^T \mid X \right] = E\left[ (X^T X)^{-1} X^T U U^T X (X^T X)^{-1} \mid X \right]$

$= (X^T X)^{-1} X^T E(U U^T \mid X)\, X (X^T X)^{-1} = \sigma_u^2 (X^T X)^{-1} X^T I X (X^T X)^{-1} = \sigma_u^2 (X^T X)^{-1} .$

Unconditionally, $\mathrm{var}(\hat B) = E\left[ \sigma_u^2 (X^T X)^{-1} \right] = \sigma_u^2 E\left[ (X^T X)^{-1} \right]$.

The disturbance variance is estimated by $\hat\sigma_u^2 = \hat U^T \hat U / (N-k)$,
and the standard error of $\hat B_j$ is the square root of the j-th diagonal
element of $\hat\sigma_u^2 (X^T X)^{-1}$. If H0: $B_j = 1$, the statistic
$(\hat B_j - 1)/\mathrm{s.e.}(\hat B_j)$ is $t_{N-k}$.

10.5 TESTS OF RESTRICTIONS
Consider q linear restrictions H0: $RB = r$, where R is a known q x k matrix
and r is q x 1. Under H0, $E(R\hat B) = RB = r$, and
$\mathrm{var}(R\hat B) = R\,\mathrm{cov}(\hat B)\,R^T$, where
$\mathrm{cov}(\hat B) = \sigma_u^2 (X^T X)^{-1}$ as noted in the last section.
As an illustration of the variance of linear combinations, for a parameter
vector $(a, b, c)^T$ with covariance matrix $V_{3\times3} = \mathrm{cov}\left( (a, b, c)^T \right)$
and $R = \begin{pmatrix} x_1 & x_2 & x_3 \\ y_1 & y_2 & y_3 \end{pmatrix}$,

$\mathrm{var}\begin{pmatrix} x_1 a + x_2 b + x_3 c \\ y_1 a + y_2 b + y_3 c \end{pmatrix} = \begin{pmatrix} x^T V x & x^T V y \\ y^T V x & y^T V y \end{pmatrix} = R V R^T ,$

where $x = (x_1\ x_2\ x_3)^T$ and $y = (y_1\ y_2\ y_3)^T$.
Also, $\hat U^T \hat U / \sigma_u^2 \sim \chi^2_{N-k}$, and is independent of
$\hat B$. Thus,

$\frac{(R\hat B - r)^T \left[ R (X^T X)^{-1} R^T \right]^{-1} (R\hat B - r) / q}{\hat U^T \hat U / (N-k)} \sim F_{q, N-k} . \qquad (10.5)$
The constrained OLS problem is

$\min_B\ (Y - XB)^T (Y - XB) + 2\lambda^T (RB - r) ,$

with first-order conditions

$-2X^T Y + 2X^T X \hat B_C + 2R^T \hat\lambda = 0 \quad \text{and} \quad R\hat B_C - r = 0 ,$

where $\hat B_C$ is the OLS estimator under the constrained regression. In
partitioned form,

$\begin{pmatrix} X^T X & R^T \\ R & 0 \end{pmatrix} \begin{pmatrix} \hat B_C \\ \hat\lambda \end{pmatrix} = \begin{pmatrix} X^T Y \\ r \end{pmatrix} .$

Using the partitioned inverse with $C \equiv -\left[ R (X^T X)^{-1} R^T \right]^{-1}$,

$\hat B_C = (X^T X)^{-1} \left[ I + R^T C R (X^T X)^{-1} \right] X^T Y - (X^T X)^{-1} R^T C\, r ,$

which simplifies to

$\hat B_C = \hat B - (X^T X)^{-1} R^T \left[ R (X^T X)^{-1} R^T \right]^{-1} (R\hat B - r) .$

The constrained residuals are

$\hat U_C = Y - X\hat B_C = \hat U - X(\hat B_C - \hat B) .$

Hence, since $X^T \hat U = 0$,

$\hat U_C^T \hat U_C = \hat U^T \hat U + (\hat B_C - \hat B)^T X^T X (\hat B_C - \hat B) .$

Then

$\hat U_C^T \hat U_C - \hat U^T \hat U = (R\hat B - r)^T \left[ R (X^T X)^{-1} R^T \right]^{-1} (R\hat B - r) ,$

which is the numerator in (10.5) before dividing by q. Hence we see that

$\frac{(CSSR - USSR)/q}{USSR/(N-k)} \sim F_{q, N-k} ,$

where CSSR and USSR denote the constrained and unconstrained sums of squared
residuals. In the special case of H0: B = 0 (so R = I, r = 0, q = k),

$\frac{\hat B^T X^T X \hat B / k}{\hat U^T \hat U / (N-k)} \sim F_{k, N-k} .$
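The (CSSR - USSR)/q form of the test lends itself to a direct sketch. Here the unrestricted model is Y on a constant and one regressor, the restriction is the hypothetical H0: b1 = 0, and the data are simulated with b1 = 1, so the restriction is false and the F statistic should be large.

```python
import random

random.seed(6)

# Hypothetical sketch of the F-test via constrained vs unconstrained SSR.
# Unrestricted: Y = b0 + b1*X1 + u (k = 2); restriction H0: b1 = 0 (q = 1),
# so the restricted model is Y = b0 + u.
N = 120
X1 = [random.gauss(0, 1) for _ in range(N)]
Y = [0.5 + 1.0 * x1 + random.gauss(0, 1) for x1 in X1]   # b1 = 1, H0 false

def ssr_two_var(Y, X):
    """Unrestricted SSR from OLS of Y on a constant and one regressor."""
    n = len(Y)
    xb, yb = sum(X) / n, sum(Y) / n
    b = sum((x - xb) * (y - yb) for x, y in zip(X, Y)) / \
        sum((x - xb) ** 2 for x in X)
    a = yb - b * xb
    return sum((y - a - b * x) ** 2 for x, y in zip(X, Y))

ybar = sum(Y) / N
cssr = sum((y - ybar) ** 2 for y in Y)      # restricted (b1 = 0): deviations from the mean
ussr = ssr_two_var(Y, X1)                   # unrestricted SSR
F = ((cssr - ussr) / 1) / (ussr / (N - 2))  # ~ F(1, N-2) under H0
```

CSSR can never fall below USSR, since the restriction only removes fitting freedom; the F statistic asks whether the increase is larger than sampling noise allows.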
The above is a test of whether the X's explain Y, or allow a linear fitting of
Y. Suppose the X's do not explain Y, but the mean of Y is not zero. Then the
constraint $b_0 = 0$ should not be used: if used, there will be rejection of
the H0, but this will lead to the wrong conclusion that the X's explain Y.

In other words, if we allow or maintain the mean of Y to be non-zero, a
more suitable test of whether the X's affect Y is H0: $b_1 = b_2 = \cdots = b_{k-1} = 0$.
We leave out the constraint $b_0 = 0$. How do we obtain such a test statistic?

The restrictions H0: $b_1 = b_2 = \cdots = b_{k-1} = 0$ are equivalent to the
matrix restriction $R_{k-1 \times k} B = 0_{k-1 \times 1}$, where
$R = [0 \mid I_{k-1}]$ with its first column containing all zeros. Partition
$X = [\mathbf{1} \mid X^*]$, where the first column $\mathbf{1}$ contains all
ones and $X^*$ is N x (k-1). Then

$X^T X = [\mathbf{1} \mid X^*]^T [\mathbf{1} \mid X^*] = \begin{pmatrix} N & \mathbf{1}^T X^* \\ X^{*T} \mathbf{1} & X^{*T} X^* \end{pmatrix} .$
Now $R(X^TX)^{-1}R^T = [0\,|\,I_{k-1}](X^TX)^{-1}[0\,|\,I_{k-1}]^T$. This produces the bottom-right $(k-1)\times(k-1)$ submatrix of $(X^TX)^{-1}$. But by the partitioned-matrix result in linear algebra, this submatrix is
$$\left[X^{*T}X^* - X^{*T}\mathbf 1\,(1/N)\,\mathbf 1^TX^*\right]^{-1} = \left[X^{*T}\left(I - \tfrac1N\mathbf{1}\mathbf{1}^T\right)X^*\right]^{-1},$$
where $\left(I - \tfrac1N\mathbf{1}\mathbf{1}^T\right) \equiv M^0$ is symmetric and idempotent, and transforms a matrix into deviation form, i.e.
$$M^0X^* = \begin{pmatrix} X_{11}-\bar X_1 & X_{21}-\bar X_2 & \cdots & X_{(k-1)1}-\bar X_{k-1}\\ X_{12}-\bar X_1 & X_{22}-\bar X_2 & \cdots & X_{(k-1)2}-\bar X_{k-1}\\ \vdots & \vdots & & \vdots\\ X_{1N}-\bar X_1 & X_{2N}-\bar X_2 & \cdots & X_{(k-1)N}-\bar X_{k-1}\end{pmatrix} \equiv X^{**}.$$
Hence $R(X^TX)^{-1}R^T = \left[X^{**T}X^{**}\right]^{-1}$.
From the OLS definition of the estimated residual $\hat e$, $Y = X\hat B + \hat e$. So $M^0Y = M^0X\hat B + M^0\hat e$. Now $M^0\hat e = \hat e$ since for any column of X, including $\mathbf 1$, $X^T\hat e = 0$. This comes from the normal equations $X^TX\hat B - X^TY = 0$, or $-X^T(Y - X\hat B) = 0$, i.e. $-X^T\hat e = 0$. Then $M^0Y = 0 + X^{**}\hat B^* + \hat e$, where $\hat B^*$ collects the slope estimates. Let
$$M^0Y = \begin{pmatrix} Y_1 - \bar Y\\ Y_2 - \bar Y\\ \vdots\\ Y_N - \bar Y\end{pmatrix} \text{ be } Y^{**}.$$
Thus,
$$Y^{**} = X^{**}\begin{pmatrix}\hat b_1\\ \hat b_2\\ \vdots\\ \hat b_{k-1}\end{pmatrix} + \hat e,$$
i.e. $Y^{**} = X^{**}\hat B^{**} + \hat e$, where $\hat B^{**} = (\hat b_1\ \hat b_2\ \cdots\ \hat b_{k-1})^T = (X^{**T}X^{**})^{-1}X^{**T}Y^{**}$. Then
$$Y^{**T}Y^{**} = \hat B^{**T}X^{**T}X^{**}\hat B^{**} + \hat e^T\hat e,$$
since $X^{**T}\hat e = (M^0X^*)^T\hat e = X^{*T}M^0\hat e = X^{*T}\hat e = 0$. Hence, TSS = ESS + RSS.
Now,
$$\frac{(R\hat B)^T\left[R(X^TX)^{-1}R^T\right]^{-1}(R\hat B)}{\sigma_u^2} = \frac{\hat B^{**T}X^{**T}X^{**}\hat B^{**}}{\sigma_u^2} \sim \chi^2_{k-1},$$
so
$$\frac{\hat B^{**T}X^{**T}X^{**}\hat B^{**}/(k-1)}{\hat e^T\hat e/(N-k)} \sim F_{k-1,\,N-k},$$
i.e.
$$\frac{[\mathrm{ESS}/\mathrm{TSS}]/(k-1)}{[\mathrm{RSS}/\mathrm{TSS}]/(N-k)} \sim F_{k-1,\,N-k}.$$
Therefore,
$$\frac{R^2/(k-1)}{(1-R^2)/(N-k)} \sim F_{k-1,\,N-k}.$$
Note that if R2 is large, the F statistic is also large. What happens then to the test of H0: B2 = B3 = ... = Bk = 0? Do we tend to reject or accept?
While R2 measures how well the regression line $X\hat B$ fits, the adjusted R2 is usually used to check how well Y is explained by the model XB:
$$\bar R^2 = 1 - \frac{\mathrm{RSS}/(N-k)}{\mathrm{TSS}/(N-1)} = \frac{1-k}{N-k} + \frac{N-1}{N-k}R^2.$$
To penalize overfitting, for any fixed N an increase in k will be compensated for by a reduction to a smaller $\bar R^2$, ceteris paribus.
Three other common criteria for comparing the fit of various specifications or models XB are:
$$\text{Schwarz criterion: } \mathrm{SC} = -2\frac{L^*}{N} + \frac{k}{N}\ln N$$
$$\text{Akaike Information criterion: } \mathrm{AIC} = -2\frac{L^*}{N} + \frac{2k}{N}$$
$$\text{Hannan-Quinn criterion: } \mathrm{HQ} = -2\frac{L^*}{N} + \frac{2k}{N}\ln\ln N$$
where
$$L^* = \ln\left[\left(2\pi\hat\sigma^2\right)^{-N/2}\exp\left(-\frac{1}{2\hat\sigma^2}\sum_{i=1}^N\hat e_i^2\right)\right],$$
and $\sum_{i=1}^N\hat e_i^2$ is the SSR. Unlike $\bar R^2$, where a better fit yields a larger number, here smaller SC and AIC indicate better fits. This is due to the penalty imposed by larger k.
10.6
FORECASTING
The forecast of Y at a given vector c of explanatory variable values is
$$\hat Y = c^T\hat B.$$
The variance of the forecast is $\operatorname{var}(c^T\hat B) = c^T\operatorname{var}(\hat B)\,c$. Note that the variance of the forecast is not the variance of the forecast error. In terms of the prediction or forecast error in the next period, $Y = c^TB + U^*_{1\times 1}$, so
$$Y - \hat Y = U^* - c^T(\hat B - B),$$
and
$$\operatorname{var}(Y - \hat Y) = \sigma_u^2\left[1 + c^T(X^TX)^{-1}c\right].$$
Moreover,
$$\frac{Y - \hat Y}{\hat\sigma_u\sqrt{1 + c^T(X^TX)^{-1}c}} \sim t_{N-k},$$
so a 95% prediction interval is
$$\hat Y \pm t_{0.025}\,\hat\sigma_u\sqrt{1 + c^T(X^TX)^{-1}c}.$$
10.7
PROBLEM SETS
10.1
target firm as dependent variable, and the capitalization size of the
firm as explanatory variable.
Dependent Variable: IRET
Method: Least Squares
Sample: 1 100
Included observations: 100

Variable    Coefficient    Std. Error    t-Statistic    Prob.
C            0.0900         0.0300        3.000          0.0034
SIZE        -0.003          ????          ????           ????

R-squared            ????       Mean dependent var    0.0650
Adjusted R-squared   0.0547     F-statistic           ????
S.E. of regression   0.2928     Prob(F-statistic)     0.000000
Sum squared resid    0.1130
10.3
10.4
(iv) What is the confidence interval for level sales next month?
10.5
10.6
ln $Labor    ln $Capital
14.5         16.7
15.3         16.8
16.1         19.5
17.4         22.1
18.4         22.3
18.8         17.5
18.8         20.2
19.7         20.4
20.1         12.7
20.3         22.9
20.8         19.3
21.2         17.1
20.64  22.48  26.48  25.28  27.56  25.72  25.52  27.96

10.7
21.3  22.9  23.1  23.7  24.8  25.5  25.7  28.8
16.8  19.8  31.9  26.3  25.9  22.1  24.1  25.7
with entries 0.539, 0.004, 1141, 0.539, 0.276, 0.001, and $(X^TY)$ as $(6256,\ 12521,\ 64389)^T$.
$$\sum_{i=1}^{125}\hat e_i^2 = 0.85, \qquad X^TX = \begin{pmatrix}125 & 1.5\\ 1.5 & 3\end{pmatrix}, \qquad X^T = \begin{pmatrix}1 & 1 & \cdots & 1\\ x_1 & x_2 & \cdots & x_{125}\end{pmatrix}.$$
(i) Find the t-statistics of the OLS estimators $\hat a$, $\hat b$ under the null
10.10
10.11
Variable        Coefficient    Std. Error    t-Statistic    Prob.
(intercept)     -0.0005        0.05          -0.01          0.992
(market)         1.52635       0.7471         2.0430        0.0456

R-squared            ????          Mean dependent var       0.0085
Adjusted R-squared   ????          S.D. dependent var       0.02
S.E. of regression   0.003         Akaike info criterion    10.75
Sum squared resid    0.000522      Schwarz criterion        10.76
Log likelihood       -588          F-statistic              ????
Durbin-Watson stat   1.57
(i) Find the missing F-statistic, R2, and adjusted R2 statistics in the table. (Hint: all the information required is available within the table.)
(ii) What is the null hypothesis that the reported F-statistic is testing? What is the p-value of this F-test?
(iii) Given the market model of stock returns
$$r_{it} = \alpha_i + \beta_i r_{mt} + e_{it}$$
where $e_{it}$ is i.i.d. $N(0, \sigma_e^2)$, as in (a), find the unbiased estimates of $\sigma_e^2$ and of the conditional variance $\operatorname{var}(r_{it}|r_{mt})$.
(iv) In (iii), what is the unconditional variance of $r_{it}$?
FURTHER RECOMMENDED READINGS
[1] Fama, Eugene F., and James D. MacBeth, (1973), Risk, Return, and Equilibrium, Journal of Political Economy, Vol. 81, 607-636.
[2] Fama, Eugene F., and Kenneth R. French, (1992), The Cross-Section of Expected Stock Returns, Journal of Finance, Vol. XLVII, No. 2, June, 427-465.
[3] Black, Fischer, Michael C. Jensen, and Myron Scholes, (1972), The Capital Asset Pricing Model: Some Empirical Tests, in M. Jensen ed., Studies in the Theory of Capital Markets (Praeger Press).
Chapter 11
DUMMY VARIABLES AND ANOVA
APPLICATION: TIME EFFECT ANOMALIES
Key Points of Learning
Dummy variables, Asset pricing anomalies, Day-of-the-week effect, Seasonal
Effect, January Effect, Fear Gauge, Test of equality of the Means, Wald test,
Analysis of variance
DUMMY VARIABLES
We usually write
$$D_1 = \begin{cases}1 & \text{female}\\ 0 & \text{male}\end{cases}
\qquad\text{and}\qquad
D_2 = \begin{cases}1 & \text{managerial responsibility}\\ 0 & \text{no managerial responsibility}\end{cases}$$
E = number of years of formal education
W = number of years of working experience
Then a linear regression model is
$$S_i = c_0 + c_1D_{1i} + c_2D_{2i} + c_3E_i + c_4W_i + u_i \tag{11.1}$$
for subjects i = 1, 2, 3, ..., 60, where $u_i$ is the disturbance term that is i.i.d. and independent of the explanatory variables. Using the OLS method for the linear regression, the results are reported below. The dependent variable is salary and the sample size is 60.
Table 11.1
Explanation of Monthly Salary
The tests of the coefficients indicate that managerial position, education, and
working experience are significantly related to higher salary. The last two are
especially significant at p-values of less than 0.001%. Each additional year of
education adds on average S$408 to salary, ceteris paribus (everything else
being equal). Each additional year of work experience adds on average S$245
to salary, ceteris paribus.
Gender is not significant in explaining differences in salary. If this were
significant, the coefficient of S$274.39 on gender D1 indicates that if the
respondent were female (thus D1=1), then there is an average increase of
S$274.39 in monthly salary over that of a male, ceteris paribus. What is the interpretation of the regression constant C? Its estimate is -S$4606. Usually the constant is an estimate of some base or fixed-level effect. Here, given about 6 years of mandatory education in Singapore and the respondents' average of about 6.5 years of work experience, the incremental income from these would have put them at a baseline of about zero to start with. This is of course a rough explanation of why there is such a low negative constant.
Suppose now we run the same OLS regression except without the constant, and create two alternative dummies in place of the gender dummy:
$$G_1 = \begin{cases}1 & \text{male}\\ 0 & \text{female}\end{cases}
\qquad\text{and}\qquad
G_2 = \begin{cases}1 & \text{female}\\ 0 & \text{male}\end{cases} \tag{11.2}$$
Notice that $G_{1i} + G_{2i} = 1$ for any i. The regression results are as follows.
Table 11.2
Explanation of Monthly Salary
In the OLS regression of (11.1) and (11.2), it is notable that almost all the
reported results are identical. All unaffected explanatory variables D2i , Ei ,
and Wi , produced the same estimates and t-tests. SSE, R2 , mean of dependent
variable, DW d-statistic all remain the same.
The estimated coefficient $\hat g_1$ and its t-statistic in (11.2) are identical with the estimate of the constant and its t-statistic in (11.1). Thus it is seen that if the male dummy is left out, as in (11.1), its effect is actually contained in the constant. In (11.2), the male dummy $G_{1i}$ is included but the constant is left out, so $G_{1i}$ takes over the role of the constant.
However, the estimated coefficient $\hat g_2$ of $G_{2i}$ and its t-statistic in (11.2) do not resemble the coefficient of $D_{1i}$ in (11.1), even though $G_{2i} = D_{1i}$ for every i. Note, however, that $\hat g_2 - \hat g_1$ = S$274.39 = $\hat c_1$. Thus in (11.1), the estimate of $c_1$ is really the additional mean salary of females relative to males. This relative figure is revealed more explicitly in (11.2) as $\hat g_2 - \hat g_1$.
A key point to note is that in any regression where we allocate the respondents into x mutually exclusive and exhaustive categories and provide dummies D2, D3, ..., Dx (equal to 1 for membership in category 2, 3, ..., x respectively, 0 otherwise), and then perform a linear regression with a constant c, the estimated coefficients of the dummy variables D2, D3, ..., Dx are to be interpreted as membership effects relative to category 1, which does not have a dummy. The membership effect of category 1 is, however, captured in $\hat c$. Thus to obtain the full (not relative) membership effect of, say, category 2, we add the estimated coefficient of dummy variable D2 to $\hat c$. This full effect can also be obtained if we perform an MLR without a constant, but instead assign a dummy to every category.
In (11.2), we cannot have all dummies for all mutually exclusive and
exhaustive categories and still introduce a regression constant. In other words,
if the dummies are exhaustive, i.e. G1i + G2i = 1 for every i, then introducing a
constant viz.
Si = c + g1G1i + g2G2i + c2D2i + c3Ei + c4Wi + ui
will create a singularity in the explanatory $X_{60\times 6}$ matrix, and thus the OLS estimates cannot be derived: X is singular (of rank < k = 6), so $X^TX$ is not invertible. This is sometimes termed the dummy variable trap.
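The equivalence of the two parameterizations, and the trap itself, can be illustrated with simulated data (all numbers below are hypothetical, not the survey of the text):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 60
female = rng.integers(0, 2, n)             # D1 = G2
male = 1 - female                          # G1
educ = rng.normal(10, 2, n)
salary = -4000 + 300 * female + 400 * educ + rng.normal(0, 100, n)

# (11.1)-style: constant plus the female dummy
X1 = np.column_stack([np.ones(n), female, educ])
b1 = np.linalg.lstsq(X1, salary, rcond=None)[0]

# (11.2)-style: no constant, both gender dummies
X2 = np.column_stack([male, female, educ])
b2 = np.linalg.lstsq(X2, salary, rcond=None)[0]

# g1 equals the constant of (11.1), and g2 - g1 equals the female coefficient c1.
# Constant + exhaustive dummies together make X singular (the dummy trap):
X_trap = np.column_stack([np.ones(n), male, female, educ])
rank = np.linalg.matrix_rank(X_trap)       # rank 3 < 4 columns
```

The rank deficiency arises because the constant column is exactly the sum of the male and female dummy columns.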
11.2
The CAPM contains only one market factor. Sometimes other factors are found that explain higher or lower mean returns to portfolios or stocks which cannot be explained by existing equilibrium asset pricing models. More specifically, the issue is: if such abnormally high average returns exist, why does the (assumed) rational and efficient market not act to cream away this abnormal return quickly enough? For this reason, such noticed empirical irregularities are called asset pricing anomalies.46 Since the early 1980s, time
46
There is some similarity with new research in the area of behavioral finance which
tries to explain why classical rational expectations approach sometimes cannot explain
all empirically observed or stylized facts about market prices and investment results
because human beings can sometimes be irrational or behave in ways yet to be totally
effect anomalies (unusual observations) in stock returns were reported in the US stock market.
Many researchers reported that mean daily stock returns on Monday tend to be lower than those of other weekdays.47 Other time or seasonal effect anomalies, such as the January effect (higher monthly return in January relative to average), the holiday effect, and so on, have since been reported. Non-time anomalies include higher expected returns from investing in smaller-capitalization firms, firms with lower P/E ratios, and so on.
Of course, many anomalies do not survive closer analysis once transactions costs are taken into account. If a so-called anomaly exists but cannot be profitably exploited because transaction costs are too high, then it cannot be anomalous by definition. Time anomalies are interesting in finance and useful for instruction in basic econometrics when it comes to using dummy variables. In this chapter we will examine the Monday blues: lower-than-normal returns on Mondays. Much anecdotal evidence and many explanations have been advanced for such anomalies, and some anomalies seem to last for a long time and do not disappear easily. However, similar studies showed that the day-of-the-week effect in the U.S. stock market may have largely disappeared in the 1990s.
Let the daily continuously compounded return rate $R_t$ of a market portfolio, or else a well-traded stock, at time t be expressed as
$$R_t = c_1D_{1t} + c_2D_{2t} + c_3D_{3t} + c_4D_{4t} + c_5D_{5t} + e_t \tag{11.3}$$
where the $c_j$'s are constants, and the $D_{jt}$'s are dummy variables. These dummy variables take values as follows:
$$D_{1t} = \begin{cases}1 & \text{if day } t \text{ is a Monday}\\ 0 & \text{otherwise}\end{cases}\qquad
D_{2t} = \begin{cases}1 & \text{if day } t \text{ is a Tuesday}\\ 0 & \text{otherwise}\end{cases}$$
$$D_{3t} = \begin{cases}1 & \text{if day } t \text{ is a Wednesday}\\ 0 & \text{otherwise}\end{cases}\qquad
D_{4t} = \begin{cases}1 & \text{if day } t \text{ is a Thursday}\\ 0 & \text{otherwise}\end{cases}$$
understood. Not all anomalies are, however, candidates for behavioral financial
research as some may be explained by purely taxation or accounting effects.
47
For a complete survey of the literature, see Pettengill, G.N., (2003), A Survey of the Monday Effect Literature, Quarterly Journal of Business and Economics, June.
$$D_{5t} = \begin{cases}1 & \text{if day } t \text{ is a Friday}\\ 0 & \text{otherwise}\end{cases}$$
In matrix form, Y = XB + U is
$$\begin{pmatrix}R_1\\R_2\\\vdots\\R_N\end{pmatrix} = \begin{pmatrix}1&0&0&0&0\\0&1&0&0&0\\\vdots&&&&\vdots\\0&0&0&0&1\end{pmatrix}\begin{pmatrix}c_1\\c_2\\c_3\\c_4\\c_5\end{pmatrix} + \begin{pmatrix}u_1\\u_2\\\vdots\\u_N\end{pmatrix}.$$
Each row of the $X_{N\times 5}$ matrix contains all zero elements except for one unit element. Assume $U \sim N(0, \sigma_u^2 I)$.
The day-of-the-week effect refers to a price anomaly whereby a particular
day of the week has a higher mean return than the other days of the week. To
test for the day-of-the-week effect, a regression based on equation (11.3) is
performed.
To test whether the day-of-the-week effect occurred in the Singapore market, we employ Singapore stock exchange data. Continuously compounded daily returns are computed based on the Straits Times Industrial Index (STII) from 11 July 1994 to 28 August 1998, and on the re-constructed Straits Times Index (STI) from 31 August 1998 till the end of the sample period. The data were
collected from Datastream. At that time STI was a value-weighted index based
on 45 major stocks that make up approximately 61% of the total market
capitalization in Singapore. Since the STII and STI were indexes that captured
the major stocks that were the most liquidly traded, the day-of-the-week
effect, if any, would show up in the returns based on the index movements.
We use 1075 daily observations in each period: July 18, 1994 to August 28, 1998 (period 1), and September 7, 1998 to October 18, 2002 (period 2). Table
11.3 below shows descriptive statistics of return rates for each trading day of
the week.
Table 11.4 shows the multiple linear regression result of daily return rates
on the weekday dummies.
Table 11.3
Return Characteristics on Different Days of the Week

                 MON         TUE         WED         THU         FRI
Mean            -0.002278   -0.001052    0.001194   -0.000394   -0.000359
Median          -0.002272   -0.001255    0.000269   -0.000206    3.71E-05
Maximum          0.160307    0.095055    0.069049    0.040004    0.039870
Minimum         -0.078205   -0.092189   -0.039137   -0.079436   -0.069948
Std. Dev.        0.018646    0.013078    0.012631    0.013085    0.010729
Skewness         3.192659    0.263696    0.920829   -1.154059   -1.147256
Kurtosis         32.39143    25.84039    8.182601    10.15761    11.22779
Jarque-Bera      8103.961    4675.906    270.9991    506.6721    653.6114
Probability      0.000000    0.000000    0.000000    0.000000    0.000000
Sum             -0.489672   -0.226154
Sum Sq. Dev.     0.074401    0.036600
Observations     215         215         215         215         215
Table 11.4
Rt = c1D1t + c2D2t + c3D3t + c4D4t + c5D5t + et
July 18, 1994 to August 28, 1998 (period 1)

Dependent Variable: PERIOD1RE
Method: Least Squares
Sample: 1 1075
Included observations: 1075

Variable    Coefficient    Std. Error    t-Statistic    Prob.
D1          -0.002278      0.000947      -2.404414      0.0164
D2          -0.001052      0.000947      -1.110473      0.2670
D3           0.001194      0.000947       1.260246      0.2079
D4          -0.000394      0.000947      -0.416095      0.6774
D5          -0.000359      0.000947      -0.378842      0.7049

R-squared            0.006554    Mean dependent var    -0.000578
Adjusted R-squared   0.002840    S.D. dependent var     0.013909
S.E. of regression   0.013889    Akaike info criterion -5.710774
Sum squared resid    0.206413    Schwarz criterion     -5.687611
Log likelihood       3074.541    Durbin-Watson stat     1.678574
The OLS regression result shows that the coefficient $\hat c_1$ = -0.002278 is the only one that is significantly different from (smaller than) zero. This corresponds to D1, the Monday dummy variable. Thus, there is a significantly negative mean return on Monday. This Monday day-of-the-week effect in Singapore is similar to evidence in the US and in other exchanges in Asia.
Why do we interpret this as a negative mean return on Monday? (11.3) implies that
$$R_t = c_1D_{1t} + e_t$$
since the other $c_j$ coefficients are not significantly different from zero. Then $E(R_t) = c_1D_{1t}$. For Mondays, $D_{1t} = 1$, so the mean Monday return is $E(R_t|\text{Monday}) = c_1$, estimated by $\hat c_1$ = -0.002278. From Tables 11.3 and 11.4, it is seen that the means of the returns on Monday, Tuesday, etc. are indeed the coefficient estimates of the dummies $D_{1t}$, $D_{2t}$, etc. However, if there are other non-dummy quantitative explanatory variables for $R_t$, then $\hat c_j$ is in general not the mean return on the jth day of the week, but just its marginal contribution.
The negative or lower Monday return effect is sometimes referred to as the weekend effect, due to the explanation that most companies typically put out bad news, if any, during the weekend, so that Monday prices on average end relatively lower than on other days of the week. This weekend effect sometimes appears only in some months of the year; in the US, the Monday effect does not typically appear in January. It has also been empirically found that Friday returns are on average the highest, at least in studies before 2002.
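As a sketch with simulated returns (the coefficients below are loosely modeled on Table 11.4, not the actual STI data), OLS on day dummies alone recovers each day's sample mean:

```python
import numpy as np

rng = np.random.default_rng(2)
n_weeks = 215
days = np.tile(np.arange(5), n_weeks)          # 0 = Mon, ..., 4 = Fri
true_c = np.array([-0.0023, -0.0011, 0.0012, -0.0004, -0.0004])
r = true_c[days] + rng.normal(0.0, 0.014, days.size)

D = np.eye(5)[days]                            # N x 5 dummy matrix, one 1 per row
c_hat = np.linalg.lstsq(D, r, rcond=None)[0]

# With only day dummies as regressors, each c_hat equals that day's sample mean
mon_mean = r[days == 0].mean()
```

This is exactly the observation in the text: the dummy coefficients coincide with the weekday means as long as no other quantitative regressors are included.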
11.3
We test the null hypothesis that the means of all weekday returns are equal.
This takes the form of testing if the coefficients to the 5 dummies are all
equal, viz. H0: c1 = c2 = c3 = c4 = c5. The results are shown in Table 11.5 below.
Table 11.5
Wald and F-Test of Equal Mean Returns on All Weekdays

Equation: PERIOD1_REGRESSION
Null Hypothesis: C(1)=C(5), C(2)=C(5), C(3)=C(5), C(4)=C(5)

F-statistic    1.764813    Probability    0.133652
Chi-square     7.059252    Probability    0.132790
The Wald chi-square statistic is asymptotic, assuming $\hat\sigma_u^2 \approx \sigma_u^2$ in
$$\frac{(R\hat B - r)^T\left[R(X^TX)^{-1}R^T\right]^{-1}(R\hat B - r)}{\hat\sigma_u^2} \sim \chi^2_q.$$
Period 2: September 7, 1998 to October 18, 2002

Variable    Coefficient    Std. Error    t-Statistic    Prob.
N1          -0.000877      0.001055      -0.831321      0.4060
N2           0.000799      0.001055       0.756916      0.4493
N3           0.000193      0.001055       0.183048      0.8548
N4           0.001417      0.001055       1.342474      0.1797
N5           0.001840      0.001055       1.743269      0.0816

R-squared            0.003815    Mean dependent var     0.000674
Adjusted R-squared   0.000091    S.D. dependent var     0.015475
S.E. of regression   0.015474    Akaike info criterion -5.494645
Sum squared resid    0.256212    Schwarz criterion     -5.471482
Log likelihood       2958.372    Durbin-Watson stat     1.768158
Has the Monday day-of-the-week effect disappeared after August 1998? It seems so.
The day-of-the-week effect in the U.S. may have disappeared in the late 1990s partly because arbitrageurs would enter to cream away the profit by buying low at the close of Monday and selling the same stock high on Friday, earning on average above-normal returns. The arbitrageurs' activities have the effect of raising Monday's prices and lowering Friday's, and this would wipe out the observed differences. Recently, some studies48 documented high Monday VIX (volatility index) prices relative to Friday prices, and high fall prices relative to summer prices. This would allow abnormal trading profit by buying VIX futures at the CBOE on Friday and selling on Monday, and buying the same in summer and selling as autumn approaches. The day-of-the-week and seasonal effects are not explained by risk premia, but perhaps rather by behavioral patterns exhibiting pessimism or fear of uncertainty, hence a greater anticipated volatility or VIX index (sometimes called the Fear Gauge) on Monday for the whole working week ahead, and in autumn when the chilly winds start to blow in North America.
11.4
ANALYSIS OF VARIANCE
$$\mathrm{SST} = \sum_{i=1}^{5} n_i\left(\bar R_i - \bar R\right)^2$$
and
$$\mathrm{SSE} = \sum_{j=1}^{n_1}\left(R_{1j}-\bar R_1\right)^2 + \sum_{j=1}^{n_2}\left(R_{2j}-\bar R_2\right)^2 + \sum_{j=1}^{n_3}\left(R_{3j}-\bar R_3\right)^2 + \sum_{j=1}^{n_4}\left(R_{4j}-\bar R_4\right)^2 + \sum_{j=1}^{n_5}\left(R_{5j}-\bar R_5\right)^2,$$
48
See for example, Haim Levy (2010), Volatility Risk Premium, Market Sentiment
and Market Anomalies, Melbourne Conference in Finance, March 2010.
where R1j is a Monday return for a day j, R2j is a Tuesday return for a day j,
and so on.
The population means of Monday returns, Tuesday returns, Wednesday returns, and so on, are c1, c2, ..., c5. Intuitively, H0: c1 = c2 = c3 = c4 = c5 is supported if the variability between groups (SST) is small relative to the variability within groups (SSE), and H0 is rejected if the variability between groups is large relative to the variability within groups.
Analysis of between-group variance and within-group variance leads to the test statistic
$$\frac{\mathrm{SST}/(g-1)}{\mathrm{SSE}/(N-g)} \sim F_{g-1,\,N-g} \quad\text{(F-distribution)}. \tag{11.4}$$
PROBLEM SET
11.1
11.2
An analyst suspected that, for some strange reason, the daily return rates of some stocks are usually lower on Fridays and higher on other days of the week. To test this hypothesis, he collected the daily return data rt of a particular stock and performed the following linear regression using ordinary least squares.
$$r_t = c_1 + c_2I_1 + c_3I_2 + c_4I_3 + c_5I_4 + e_t$$
where the $c_i$'s are the regression coefficients, $e_t$ is the disturbance that is assumed to be i.i.d., and
$$I_1 = \begin{cases}1 & \text{if it is Monday}\\ 0 & \text{otherwise}\end{cases}\qquad
I_2 = \begin{cases}1 & \text{if it is Tuesday}\\ 0 & \text{otherwise}\end{cases}$$
$$I_3 = \begin{cases}1 & \text{if it is Wednesday}\\ 0 & \text{otherwise}\end{cases}\qquad
I_4 = \begin{cases}1 & \text{if it is Thursday}\\ 0 & \text{otherwise}\end{cases}$$
It is noted that trading takes place only on week days. If the estimated
equation is
Chapter 12
SPECIFICATION ERRORS
Key Points of Learning
Generalized least squares, Heteroskedasticity, Weighted least squares,
Relevance exclusion, Irrelevant inclusion, Multi-collinearity, Lagged
endogenous variable, Contemporaneous correlation, Measurement error,
Simultaneous equations bias, Probability limits, Instrumental variables,
White's heteroskedasticity-consistent covariance matrix estimator, Goldfeld-Quandt test, Breusch-Pagan & Godfrey (LM) test, Cochrane-Orcutt procedure
Since $\Omega$ is a covariance matrix, any vector $x_{N\times 1}$ must yield $\sigma_u^2 x^T\Omega x \ge 0$ as the variance of $x^TU$. Hence $\Omega$ is positive definite. In linear algebra, there is a theorem that if $\Omega$ is positive definite, then it can be expressed as
$$\Omega = PP^T \tag{12.1}$$
where P is an $N\times N$ non-singular matrix. Note that P is fixed and non-stochastic. From (12.1),
$$P^{-1}\Omega P^{-1T} = I \tag{12.2}$$
and
$$\Omega^{-1} = P^{-1T}P^{-1}. \tag{12.3}$$
Then
$$\hat B = \left(X^{*T}X^*\right)^{-1}X^{*T}Y^* \tag{12.4}$$
is BLUE. This $\hat B$ is not the original OLS estimator, since the regression is made using the transformed $Y^*$ and $X^*$.
We can express $\hat B$ above also in terms of the original Y and X. We do this by substituting the definitions $X^* = P^{-1}X$ and $Y^* = P^{-1}Y$ in (12.4) and utilizing (12.3):
$$\hat B = \left(X^{*T}X^*\right)^{-1}X^{*T}Y^* = \left[(P^{-1}X)^T(P^{-1}X)\right]^{-1}(P^{-1}X)^T(P^{-1}Y) = \left(X^TP^{-1T}P^{-1}X\right)^{-1}X^TP^{-1T}P^{-1}Y.$$
Or, $\hat B = \left(X^T\Omega^{-1}X\right)^{-1}X^T\Omega^{-1}Y$.
Given $\Omega$, the latter is the Generalized Least Squares (GLS) estimator of the regression of Y on X. This GLS estimator (exactly the same as $\hat B$ in (12.4)) is BLUE. It is interesting to note that OLS regression of Y on X based on $Y_{N\times 1} = X_{N\times k}B_{k\times 1} + U_{N\times 1}$, with non-homoskedastic disturbances U, produces an OLS estimator that is unbiased, i.e. $E\left[(X^TX)^{-1}X^TY\right] = B$, but it loses efficiency.
Given that $\Omega$ is known, an unbiased estimate for $\sigma_u^2$ is (since cov($U^*$) is $\sigma_u^2 I$):
$$\hat\sigma_u^2 = \frac{\left(Y^*-X^*\hat B\right)^T\left(Y^*-X^*\hat B\right)}{N-k} = \frac{\left(P^{-1}Y-P^{-1}X\hat B\right)^T\left(P^{-1}Y-P^{-1}X\hat B\right)}{N-k} = \frac{\left(Y-X\hat B\right)^TP^{-1T}P^{-1}\left(Y-X\hat B\right)}{N-k} = \frac{1}{N-k}\hat U^T\Omega^{-1}\hat U,$$
where $\hat U = Y - X\hat B$ above is found by using the GLS $\hat B$. Note that
$$\operatorname{cov}(\hat B) = \sigma_u^2\left(X^{*T}X^*\right)^{-1} = \sigma_u^2\left(X^TP^{-1T}P^{-1}X\right)^{-1} = \sigma_u^2\left(X^T\Omega^{-1}X\right)^{-1}.$$
Thus, the usual procedures of confidence interval estimation and testing of the parameters can be carried out.
As a special case, suppose the disturbance exhibits heteroskedasticity of a known form. For example, Y = XB + U, $X = (X_{i1}\,|\,X_{i2}\,|\,X_{i3}\,|\,\cdots\,|\,X_{ik})_{N\times k}$, and the first column of X is $X_{i1} = (1\ 1\ \cdots\ 1)^T$. Suppose the N disturbances have variances proportional to the square of a certain jth explanatory variable $X_{ij}$ as follows:
$$\operatorname{cov}(U) = \sigma_u^2\Omega, \qquad \Omega_{N\times N} = \begin{pmatrix}X_{1j}^2 & 0 & \cdots & 0\\ 0 & X_{2j}^2 & & \vdots\\ \vdots & & \ddots & 0\\ 0 & \cdots & 0 & X_{Nj}^2\end{pmatrix}.$$
Now
$$\Omega^{-1}_{N\times N} = \begin{pmatrix}1/X_{1j}^2 & 0 & \cdots & 0\\ 0 & 1/X_{2j}^2 & & \vdots\\ \vdots & & \ddots & 0\\ 0 & \cdots & 0 & 1/X_{Nj}^2\end{pmatrix},$$
and
$$\hat B = \left(X^T\Omega^{-1}X\right)^{-1}X^T\Omega^{-1}Y = \left(X^{*T}X^*\right)^{-1}X^{*T}Y^*,$$
where
$$Y^* = \begin{pmatrix}Y_1/X_{1j}\\ Y_2/X_{2j}\\ \vdots\\ Y_N/X_{Nj}\end{pmatrix} \qquad\text{and}\qquad X^* = \begin{pmatrix}1/X_{1j} & X_{12}/X_{1j} & \cdots & X_{1k}/X_{1j}\\ 1/X_{2j} & X_{22}/X_{2j} & \cdots & X_{2k}/X_{2j}\\ \vdots & \vdots & & \vdots\\ 1/X_{Nj} & X_{N2}/X_{Nj} & \cdots & X_{Nk}/X_{Nj}\end{pmatrix}.$$
For the notation $X_{ij}$, the first subscript i represents the time or cross-sectional position 1, 2, ..., N, and the second subscript j represents the column number of X, i.e. the jth explanatory variable.
The above GLS estimator $\hat B$ is also called the weighted least squares estimator, since we are basically weighting each observation $(Y_i\ X_{i1}\ X_{i2}\ \cdots\ X_{ij}\ \cdots\ X_{ik})$ by $1/X_{ij}$. Note that $Y^*$ is an MLR on $X^*$ with a constant. Where is the constant? It is the jth column of $X^*$, whose elements $X_{ij}/X_{ij}$ all equal one.
Then, to test H0: $B_j = 0$, we use the usual t-statistic
$$\frac{\hat B_j - 0}{\sqrt{\hat\sigma_u^2\left[\left(X^T\Omega^{-1}X\right)^{-1}\right]_{jj}}} \sim t_{N-k}.$$
Suppose instead that
$$\Omega_{N\times N} = \begin{pmatrix}\sigma_1^2 & 0 & \cdots & 0\\ 0 & \sigma_2^2 & & \vdots\\ \vdots & & \ddots & 0\\ 0 & \cdots & 0 & \sigma_N^2\end{pmatrix}.$$
The variances of $u_i$, $u_j$, etc. in U are not the same, at least for some of the u's. However, all cross-covariances $\operatorname{cov}(u_i, u_j) = 0$ for $i \ne j$. What happens? If $\{\sigma_i^2\}_{i=1,2,\ldots,N}$ are known, then we can apply GLS and obtain a BLUE estimator of B. More discussion of GLS for the case when $\{\sigma_i^2\}_{i=1,2,\ldots,N}$ are unknown will be presented after the listing of the other types of specification errors.
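The 1/X_ij weighting can be sketched with simulated data (assumption: the disturbance standard deviation is proportional to the regressor, as in the special case above):

```python
import numpy as np

rng = np.random.default_rng(3)
n = 200
x2 = rng.uniform(1.0, 5.0, n)
# disturbance sd proportional to x2, so variance proportional to x2^2
y = 2.0 + 0.5 * x2 + x2 * rng.normal(0.0, 1.0, n)

X = np.column_stack([np.ones(n), x2])
# weight each observation by 1 / x2: divide the whole row of [y, X]
Xs = X / x2[:, None]
ys = y / x2
b_wls = np.linalg.lstsq(Xs, ys, rcond=None)[0]
# after the transformation, the old slope column x2/x2 is the new constant
```

The estimates stay interpretable in the original model: b_wls still estimates the intercept 2.0 and slope 0.5, but with the efficiency of GLS.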
(b) Disturbances are serially correlated (if $u_t$ is a time series), or cross-correlated (if $u_i$ is cross-sectional).
For example, if the Ys, Xs, and Us are stochastic processes (over time), and $u_t$ is AR(1) or MA(1), then the autocorrelation is not zero at least at lag one. Then $E(UU^T) = \Omega_{N\times N} \ne \sigma_u^2 I_N$; specifically, the off-diagonal elements are not zero. For example, if the disturbance is
$$u_{t+1} = \rho u_t + e_{t+1},$$
where $e_{t+1}$ is zero-mean i.i.d., then
$$\Omega_{N\times N} = \begin{pmatrix}1 & \rho & \rho^2 & \cdots & \rho^{N-1}\\ \rho & 1 & \rho & \cdots & \rho^{N-2}\\ \rho^2 & \rho & 1 & \cdots & \rho^{N-3}\\ \vdots & & & \ddots & \vdots\\ \rho^{N-1} & \rho^{N-2} & \rho^{N-3} & \cdots & 1\end{pmatrix}.$$
What happens? If $\rho$ is known, then we can apply GLS and obtain a BLUE estimator of B. More discussion of GLS will be presented later for the case when $\rho$ is unknown, after the listing of the other types of specification errors.
12.2
(a) Exclusion of relevant explanatory variables
If relevant explanatory variables are omitted from the regression, then in general, for the OLS estimators $\hat B_0$, $\hat B_1$ computed from the misspecified model, $E(\hat B_0) \ne b_0$ and $E(\hat B_1) \ne b_1$. Thus the OLS estimators are biased and not consistent.
(b) Inclusion of irrelevant explanatory variables
The unnecessary inclusion of irrelevant explanatory variables is sometimes called an error of commission. Suppose the true specification is
Dt = b0 + b1Pt-1 + b2Yt + b3Qt + b4Mt + t ,
but additional explanatory variables
Z1t = number of cars in Hong Kong
Z2t = number of cars in New York
Z3t = number of cars in Tokyo
large. This means that the sampling errors of the estimators $\hat B$ are large in the face of multi-collinear X. It leads to small t-statistics for H0: $B_j = 0$, and thus the zero null is not rejected; it is difficult to obtain accurate estimates. Even if $B_j$ is actually > (or <) 0, we cannot reject H0: $B_j = 0$.
What can we do to fix the problem? We can fall back on a priori restrictions based on theory. For example, if explanatory variable $X_k$ is highly correlated with $X_1, X_2, \ldots, X_{k-1}, X_{k+1}$, etc., but $b_k$ is theoretically close to zero, we can restrict $b_k = 0$ and thus avoid the inclusion of $X_k$. This will eliminate the multi-collinearity problem.
Or else we live with the shortcoming, which is a data problem and not a model problem. The problem can of course be mitigated when the sample size increases. It should be noted that the OLS estimators are still BLUE, and asymptotically, the OLS estimators are still consistent.
(d) Non-stationarity
When X has a time trend, e.g. Xit = a0 + a1t + uit , uit being i.i.d., then it is
problematic to regress Yit on Xit because Xit is not stationary. When Yit is also
non-stationary, there is the problem of false correlation between Yit and Xit .
This problem will be explored more fully in the chapter on unit roots.
12.3
to be uncorrelated with $u_t$.) What happens? The OLS estimator is not BLUE, but it is still consistent.
We can show that as follows. In matrix notation, $Y_{N\times 1} = (Y_1\ Y_2\ \cdots\ Y_N)^T$ and $X_{N\times k} = (\mathbf 1\,|\,X_1\,|\,X_2\,|\,\cdots\,|\,Y^*)$, where $\mathbf 1$ is $N\times 1$, each $X_i$ is $N\times 1$, and $Y^* = (Y_0\ Y_1\ \cdots\ Y_{N-1})^T$. Also $U_{N\times 1} = (u_1\ u_2\ \cdots\ u_N)^T$ and $B_{k\times 1} = (B_0\ B_1\ B_2\ \cdots\ C)^T$. Then the OLS estimator is
$$\hat B = (X^TX)^{-1}X^TY = (X^TX)^{-1}X^T(XB + U) = B + (X^TX)^{-1}X^TU,$$
so
$$E(\hat B) = B + E\left[(X^TX)^{-1}X^TU\right].$$
However, if one element of $X_{N\times k}$, e.g. $Y^*$, is not independent of U, then $E\left[(X^TX)^{-1}X^TU\right]$ cannot be taken as the product of the expectations of $(X^TX)^{-1}X^T$ and U, as some random elements in $(X^TX)^{-1}X^T$ will be dependent on U.
The OLS estimator can always be written as $\hat B = B + (X^TX)^{-1}X^TU$. Therefore, if $Y_{t-1}$ and $u_t$ are contemporaneously uncorrelated, and all the other $X_{it}$'s are also uncorrelated with $u_t$, then
$$\frac1NX^TU = \begin{pmatrix}\frac1N\sum_tX_{1t}u_t\\ \frac1N\sum_tX_{2t}u_t\\ \vdots\\ \frac1N\sum_tY_{t-1}u_t\end{pmatrix} \xrightarrow{\ p\ } \begin{pmatrix}\operatorname{cov}(X_{1t},u_t)\\ \operatorname{cov}(X_{2t},u_t)\\ \vdots\\ \operatorname{cov}(Y_{t-1},u_t)\end{pmatrix} = 0_{k\times 1}.$$
Since $X^TX/N$ converges to a non-singular $k\times k$ matrix, say Q, then
$$\operatorname{plim}_{N\to\infty}\left(\hat B - B\right) = \operatorname{plim}_{N\to\infty}\left(\frac{X^TX}{N}\right)^{-1}\operatorname{plim}_{N\to\infty}\left(\frac{X^TU}{N}\right) = Q^{-1}\cdot 0 = 0.$$
Thus, zero contemporaneous correlation, but not independence, does not yield BLUE for OLS $\hat B$, but yields consistency. In other words, $\hat B$ is biased in finite samples but is consistent.
(b) Contemporaneous non-zero correlation
In addition to $Y_t = B_0 + B_1X_{1t} + B_2X_{2t} + \cdots + CY_{t-1} + u_t$, suppose $u_t$ is an AR(1) process, i.e. $u_t = \rho u_{t-1} + e_t$. Then
$$\operatorname{cov}(Y_{t-1}, u_t) = \operatorname{cov}(B_0 + B_1X_{1t-1} + B_2X_{2t-1} + \cdots + CY_{t-2} + u_{t-1},\ u_t) = \operatorname{cov}(u_{t-1}, u_t) \ne 0.$$
Thus, $Y_{t-1}$ and $u_t$ are contemporaneously correlated. What happens? The OLS estimator is not BLUE, and is also not consistent.
There are some special situations when stochastic dependence between X
and U arises and causes problems.
12.4
Some regression specifications require a simultaneous equations model. For example, if demand is modeled as the regression equation
$$D_t = B_0 + B_1P_t + e_t,$$
then there could be another simultaneous equation at work, viz. the supply equation
$$S_t = A_0 + A_1P_t + u_t,$$
with $\operatorname{cov}(u_t, e_t) = 0$. The two regressions constitute the simultaneous equations model. In economic equilibrium, $D_t = S_t$. Then
$$B_0 + B_1P_t + e_t = A_0 + A_1P_t + u_t,$$
or $e_t = A_0 - B_0 + (A_1 - B_1)P_t + u_t$. Since $\operatorname{cov}(e_t, u_t) = 0$, taking covariance with $u_t$ gives $(A_1 - B_1)\operatorname{cov}(P_t, u_t) + \operatorname{var}(u_t) = 0$, or $\operatorname{cov}(P_t, u_t) = -\operatorname{var}(u_t)/(A_1 - B_1)$. Therefore,
$$\operatorname{cov}(e_t, P_t) = (A_1 - B_1)\operatorname{var}(P_t) + \operatorname{cov}(P_t, u_t) = (A_1 - B_1)\operatorname{var}(P_t) - \frac{\operatorname{var}(u_t)}{A_1 - B_1} \ne 0.$$
Thus the MLR with simultaneous equations bias induces contemporaneous non-zero correlation. If we regress $D_t$ on $P_t$ without considering the simultaneous equations, the OLS estimator is not BLUE, and not consistent. The simultaneous equations bias can be shown graphically as follows.
Figure 12.1
Simultaneous Equations Bias under Demand and Supply Equations
(figure: price P on the axis, upward-sloping supply curve S, demand shifted by the disturbance et)
Given $P_t$, demand $D_t$ moves up or down with the disturbance $e_t$. Because the supply curve is elastic, movement in the demand curve due to $e_t$ induces a new equilibrium price along the supply curve, and thus $P_t$ also changes. Thus it is seen that $\operatorname{cov}(P_t, e_t) \ne 0$.
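The bias can be simulated (a hypothetical linear demand/supply system; the parameter values are illustrative only):

```python
import numpy as np

# D = b0 + b1*P + e (demand), S = a0 + a1*P + u (supply); equilibrium D = S
# gives P = (b0 - a0 + e - u) / (a1 - b1), so P is correlated with e.
rng = np.random.default_rng(4)
n = 100_000
b0, b1 = 10.0, -1.0        # demand slope negative
a0, a1 = 0.0, 1.0          # supply slope positive
e = rng.normal(0, 1, n)
u = rng.normal(0, 1, n)
P = (b0 - a0 + e - u) / (a1 - b1)
D = b0 + b1 * P + e

X = np.column_stack([np.ones(n), P])
b_ols = np.linalg.lstsq(X, D, rcond=None)[0]
# here cov(P, e) = var(e)/(a1 - b1) > 0, so the OLS slope converges to
# b1 + cov(P, e)/var(P) = 0 rather than to the true b1 = -1
```

Even with a very large sample, the slope estimate does not approach the demand slope: the bias is a consistency failure, not a small-sample problem.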
12.5
PROBABILITY LIMITS
The limit of a deterministic sequence, e.g. $1/N \to 0$, is easy to understand as the real number sequence getting arbitrarily close to the limit. For a random sequence $X_N$, the probability limit $\operatorname{plim}_{N\to\infty}X_N = A$ means that for any $\varepsilon > 0$,
$$\lim_{N\to\infty}\operatorname{prob}\left(|X_N - A| > \varepsilon\right) = 0.$$
For example, given $\bar X = \frac1N\sum_{i=1}^N X_i$, where the $X_i$ are i.i.d. with zero mean, $\operatorname{plim}\bar X = 0$.
There are some results on probability limits that we will use. If
$$\operatorname{plim}_{N\to\infty}\frac1N\sum_{i=1}^N X_{ij}u_i \ne 0$$
for at least some columns j amongst the k explanatory variables, then, since $\hat B_{OLS} = B + (X^TX)^{-1}X^TU$,
$$\operatorname{plim}\hat B_{OLS} = B + \operatorname{plim}\left(\frac1NX^TX\right)^{-1}\operatorname{plim}\left(\frac1NX^TU\right) = B + \Sigma_{XX}^{-1}\Sigma_{XU} \ne B.$$
INSTRUMENTAL VARIABLES
Suppose $Z_{N\times k}$ is a matrix of instruments satisfying
(a) $\operatorname{plim}\frac1NZ^TU = 0_{k\times 1}$, and
(b) $\operatorname{plim}\frac1NZ^TX = \Sigma_{ZX} \ne 0_{k\times k}$, with $\Sigma_{ZX}$ non-singular.
The instrumental variables (IV) estimator is $\hat B_{IV} = (Z^TX)^{-1}Z^TY$. So
$$\hat B_{IV} = (Z^TX)^{-1}Z^TY = B + (Z^TX)^{-1}Z^TU,$$
and the properties above yield
$$\operatorname{plim}\hat B_{IV} = B + \operatorname{plim}\left(\frac1NZ^TX\right)^{-1}\operatorname{plim}\left(\frac1NZ^TU\right) = B + \Sigma_{ZX}^{-1}\,0_{k\times 1} = B.$$
The covariance matrix of $\hat B_{IV}$ is
$$E\left[(\hat B_{IV}-B)(\hat B_{IV}-B)^T\,\middle|\,X, Z\right] = E\left[(Z^TX)^{-1}Z^TUU^TZ(X^TZ)^{-1}\,\middle|\,X, Z\right] = \sigma_u^2\,E\left[(Z^TX)^{-1}Z^TZ(X^TZ)^{-1}\right].$$
Asymptotically this is
$$\frac{\sigma_u^2}{N}\,E\left[\left(\operatorname{plim}\frac{Z^TX}{N}\right)^{-1}\left(\operatorname{plim}\frac{Z^TZ}{N}\right)\left(\operatorname{plim}\frac{X^TZ}{N}\right)^{-1}\right] = \frac{\sigma_u^2}{N}\,\Sigma_{ZX}^{-1}\Sigma_{ZZ}\Sigma_{XZ}^{-1}.$$
12.7
business cycle. For example, B could take different values conditional on the
state of the economy. The states of the economy could be driven by a Markov
Chain model. Each period the state could be good G or bad B. If the state is G,
next period probability of G is 0.6 and probability of B is 0.4. This is shown in
Table 12.1 below.
Table 12.1
Random Coefficient in Two States

                    To State G    To State B
From State G           0.6           0.4
From State B           0.2           0.8

12.8
HETEROSKEDASTICITY
Suppose $\Omega_{N\times N}$ is not known and the elements need to be estimated, but assume the form of the heteroskedasticity is known. Suppose the N disturbances have variances proportional to the $\delta$th power (say) of a certain jth explanatory variable $X_{ij}$, as follows:
$$\operatorname{cov}(U) = \sigma_u^2\Omega_{N\times N}, \qquad \Omega_{N\times N} = \begin{pmatrix}X_{1j}^\delta & 0 & \cdots & 0\\ 0 & X_{2j}^\delta & & \vdots\\ \vdots & & \ddots & 0\\ 0 & \cdots & 0 & X_{Nj}^\delta\end{pmatrix}.$$
(a) Use OLS to obtain $\hat B_{OLS} = (X^TX)^{-1}X^TY$. This is unbiased but not BLUE.
(b) Find the estimated residuals $\hat u_i = Y_i - \sum_j\hat B_{j,OLS}X_{ij}$. (Note: $\hat U$ estimates U.)
(c) Since $\operatorname{var}(u_i) = \sigma_u^2X_{ij}^\delta$ for i = 1, 2, ..., N, $\delta$ can be estimated from the $\hat u_i$'s, e.g. by regressing $\ln\hat u_i^2$ on $\ln X_{ij}$.
(d) Set
$$\hat\Omega_{N\times N} = \begin{pmatrix}X_{1j}^{\hat\delta} & 0 & \cdots & 0\\ 0 & X_{2j}^{\hat\delta} & & \vdots\\ \vdots & & \ddots & 0\\ 0 & \cdots & 0 & X_{Nj}^{\hat\delta}\end{pmatrix}.$$
Then $\hat B = \left(X^T\hat\Omega^{-1}X\right)^{-1}X^T\hat\Omega^{-1}Y$ will be approximately BLUE (subject to sampling error in $\hat\Omega$).
In situations where is unknown and the form is approximately INxN , then
we may stick to OLS. It is still unbiased, and approximately BLUE as in the
case of any estimated GLS estimators.
Suppose the heteroskedasticity implies
$$\Omega_{N\times N} = \begin{pmatrix}\sigma_1^2 & 0 & \cdots & 0\\ 0 & \sigma_2^2 & & \vdots\\ \vdots & & \ddots & 0\\ 0 & \cdots & 0 & \sigma_N^2\end{pmatrix}, \qquad \sigma_u^2 = 1.$$
The covariance matrix of $\hat B$ conditional on X is $(X^TX)^{-1}X^T\Omega X(X^TX)^{-1}$. Since $\Omega$ is diagonal with constants,
$$X^T\Omega X = \sum_{t=1}^N\sigma_t^2X_t^TX_t, \qquad\text{where } X_{N\times k} = \begin{pmatrix}X_1\\ X_2\\ \vdots\\ X_N\end{pmatrix}$$
and $X_t$ is the $1\times k$ t-th row of X. If $\sigma_t^2$ is independent of $X_t$,
$$E\left(\sum_t\sigma_t^2X_t^TX_t\right) = \sum_tE\left(\sigma_t^2\right)E\left(X_t^TX_t\right).$$
And given the stationarities of $\{\sigma_t^2\}$ and X, the above may be estimated by
$$\operatorname{plim}_{N\to\infty}\frac1N\sum_{t=1}^N\sigma_t^2X_t^TX_t = \left(\operatorname{plim}_{N\to\infty}\frac1N\sum_{t=1}^N\sigma_t^2\right)\left(\operatorname{plim}_{N\to\infty}\frac1N\sum_{t=1}^NX_t^TX_t\right).$$
Then
$$X^T\Omega X = \sum_{t=1}^N\sigma_t^2X_t^TX_t \approx N\,\sigma_0^2\,\Sigma_{XX},$$
where
$$\sigma_0^2 \equiv \operatorname{plim}_{N\to\infty}\frac1N\sum_{t=1}^N\sigma_t^2 \qquad\text{and}\qquad \Sigma_{XX} \equiv \operatorname{plim}_{N\to\infty}\frac1N\sum_{t=1}^NX_t^TX_t,$$
so that
$$(X^TX)^{-1}X^T\Omega X(X^TX)^{-1} \approx \frac{\sigma_0^2}{N}\,\Sigma_{XX}^{-1}$$
may be used as the estimate. Thus, when the sample size is large, OLS is still a useful estimator under such unknown heteroskedasticity, as the covariance matrix of the OLS estimators can be estimated and is consistent.
When $\sigma_t^2$ is a function of variables dependent on X, the covariance matrix49 of $\hat B$ is $(X^TX)^{-1}X^T\Omega X(X^TX)^{-1}$; then using $\hat\sigma_u^2(X^TX)^{-1}$ is incorrect.
First, run OLS and obtain $\hat u_t = Y_t - X_t\hat B_{OLS}$. Then let
$$\hat\Omega = \begin{pmatrix}\hat u_1^2 & 0 & \cdots & 0\\ 0 & \hat u_2^2 & & \vdots\\ \vdots & & \ddots & 0\\ 0 & \cdots & 0 & \hat u_N^2\end{pmatrix},$$
and use $(X^TX)^{-1}X^T\hat\Omega X(X^TX)^{-1}$ as the estimate of $\operatorname{cov}(\hat B)$. This is justified because
$$\operatorname{plim}_{N\to\infty}N(X^TX)^{-1}X^T\hat\Omega X(X^TX)^{-1} = \left(\operatorname{plim}\frac{X^TX}{N}\right)^{-1}\left(\operatorname{plim}\frac1N\sum_{t=1}^N\hat u_t^2X_t^TX_t\right)\left(\operatorname{plim}\frac{X^TX}{N}\right)^{-1} = \Sigma_{XX}^{-1}\left(\operatorname{plim}\frac{X^T\Omega X}{N}\right)\Sigma_{XX}^{-1},$$
since $\operatorname{plim}\frac1N\sum_{t=1}^N\hat u_t^2X_t^TX_t$ converges to $\operatorname{plim}\frac1NX^T\Omega X$. Hence
$$\operatorname{plim} N(X^TX)^{-1}X^T\hat\Omega X(X^TX)^{-1} = \operatorname{plim} N(X^TX)^{-1}X^T\Omega X(X^TX)^{-1},$$
so $(X^TX)^{-1}X^T\hat\Omega X(X^TX)^{-1}$ consistently estimates the covariance matrix of $\hat B$. This is White's heteroskedasticity-consistent covariance matrix estimator; in finite samples, a degrees-of-freedom adjusted version
$$\frac{N}{N-k}\,(X^TX)^{-1}X^T\hat\Omega X(X^TX)^{-1}$$
is sometimes used.
224
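The heteroskedasticity-consistent covariance estimate, the "sandwich" (XᵀX)⁻¹ Xᵀ Σ̂ X (XᵀX)⁻¹ with Σ̂ = diag(ût²), can be sketched in a few lines; the function name below is hypothetical.

```python
import numpy as np

def white_cov(X, y):
    """OLS estimate and White heteroskedasticity-consistent covariance:
    (X'X)^{-1} X' diag(u_hat^2) X (X'X)^{-1}."""
    b = np.linalg.lstsq(X, y, rcond=None)[0]
    u = y - X @ b                            # OLS residuals
    bread = np.linalg.inv(X.T @ X)
    meat = (X * (u ** 2)[:, None]).T @ X     # X' diag(u_hat^2) X
    return b, bread @ meat @ bread

rng = np.random.default_rng(1)
N = 400
x = rng.uniform(1.0, 4.0, N)
X = np.column_stack([np.ones(N), x])
y = X @ np.array([0.5, 1.5]) + rng.normal(size=N) * x   # variance depends on x
b, V = white_cov(X, y)
se = np.sqrt(np.diag(V))                     # robust standard errors
```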
In this test (the Goldfeld-Quandt test), the sample data are sorted in order of
the value of the explanatory variable that is associated with the disturbance
variance, starting with the data with the lowest disturbance variance. OLS
regression is then performed using the first third and the last third of this
sorted sample. If the association is true, then the disturbances of the first
third will have smaller variance (approximately homoskedastic) than the
variance of the disturbances of the last third. Since SSR/(N−k) (or RSS/(N−k))
is the unbiased estimate of the variance of the residuals, the ratio
[SSR(last third)/(n3−k)] / [SSR(first third)/(n1−k)], where n1 and n3 are the
sample sizes of the first third and last third respectively, is distributed as
F(n3−k, n1−k) under the null of no heteroskedasticity.
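The test statistic can be sketched as below, under the setup just described: sorting on one suspect regressor, dropping the middle third, and using equal sub-sample sizes. The helper name is hypothetical, and the returned F would be compared with the F(n3−k, n1−k) critical value.

```python
import numpy as np

def goldfeld_quandt_F(X, y, sort_col):
    """Sort by the suspect regressor, run OLS on the first and last thirds,
    and return F = [SSR3/(n3-k)] / [SSR1/(n1-k)]."""
    order = np.argsort(X[:, sort_col])
    X, y = X[order], y[order]
    n = len(y) // 3                  # size of each third (middle third dropped)
    k = X.shape[1]

    def ssr(Xs, ys):
        b = np.linalg.lstsq(Xs, ys, rcond=None)[0]
        e = ys - Xs @ b
        return e @ e

    return (ssr(X[-n:], y[-n:]) / (n - k)) / (ssr(X[:n], y[:n]) / (n - k))
```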
If the form of the heteroskedasticity is suspected to be of the form
σt² = f(Zt), linked linearly to some k−1 exogenous variables Zt, then the
Breusch-Pagan & Godfrey test (LM test) can be performed. Here the estimated
ût² is regressed against a constant and Zt, and an asymptotic χ²(k−1) test
statistic is reported under the null of homoskedasticity.
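The LM statistic can be sketched as follows, using the common N·R² form (one of several asymptotically equivalent versions of the test); the function name is hypothetical.

```python
import numpy as np

def breusch_pagan_lm(u_hat, Z):
    """Regress squared OLS residuals on a constant and Z and return
    LM = N * R^2, asymptotically chi-square under homoskedasticity."""
    N = len(u_hat)
    W = np.column_stack([np.ones(N), Z])
    y = u_hat ** 2
    b = np.linalg.lstsq(W, y, rcond=None)[0]
    resid = y - W @ b
    r2 = 1.0 - (resid @ resid) / (((y - y.mean()) ** 2).sum())
    return N * r2
```

LM is then compared with the χ²(k−1) critical value, k−1 being the number of Z variables.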
12.9 SERIAL CORRELATION
When E(UUᵀ) = Σ (N×N) ≠ σu² I(N), another possibility is that the off-diagonal
elements are not zero. This is equivalent to serial correlation in the
disturbances or residuals. Specifically, suppose the disturbance follows

ut+1 = ρ ut + et+1

where et+1 is zero-mean i.i.d. If ρ can be accurately estimated, then Estimated
or Feasible GLS can be applied. Specifically, the Cochrane-Orcutt (iterative)
procedure is explained here. It tries to transform the disturbances into i.i.d.
disturbances so that the problem is less severe, and then the FGLS estimators are
approximately consistent and asymptotically efficient. (Recall that if there are
lagged dependent variables on the right-hand side with AR(1) disturbances,
OLS estimates are biased and inconsistent, and the IV method has to be used.)
The covariance matrix of the disturbances is

                  [ 1          ρ          ρ²         …   ρ^(N−1) ]
                  [ ρ          1          ρ          …   ρ^(N−2) ]
Σ (N×N) = σu² ×   [ ρ²         ρ          1          …   ρ^(N−3) ]
                  [ …          …          …          …   …       ]
                  [ ρ^(N−1)    ρ^(N−2)    ρ^(N−3)    …   1       ] .
Recall Xt is a 1×k matrix containing all explanatory variables at time t. The
system of regression equations is

YN = XN B + uN
YN−1 = XN−1 B + uN−1
...
Y2 = X2 B + u2
Y1 = X1 B + u1 .

Therefore,

YN* ≡ YN − ρYN−1 = XN B + uN − ρ(XN−1 B + uN−1) = XN* B + eN
...
Y2* ≡ Y2 − ρY1 = X2* B + e2

where Xt* = Xt − ρXt−1. Thus, we are back to the classical conditions, and OLS
is BLUE on the transformed data. Of course, this is equivalent to GLS on the
original data. In practice, since ρ is not known, it has to be estimated.
First run OLS on the original data. Obtain the estimated residuals
ût = Yt − Xt B̂(OLS). Next, estimate ρ using

ρ̂ = [ (1/(N−1)) Σ_{t=2}^{N} ût ût−1 ] / [ (1/(N−1)) Σ_{t=2}^{N} ût−1² ] .

Note that the index starts from 2. ρ̂ can also be obtained from the reported
Durbin-Watson d-statistic in the OLS regression (approximately, ρ̂ ≈ 1 − d/2).
Then the transformations are done, viz.

YN* = YN − ρ̂YN−1            XN* = XN − ρ̂XN−1
...                          ...
Y2* = Y2 − ρ̂Y1              X2* = X2 − ρ̂X1
Y1* = √(1−ρ̂²) Y1            X1* = √(1−ρ̂²) X1 .

Then, OLS is run again using the transformed {Yt*} and {Xt*}. The estimators
are approximately consistent and asymptotically efficient. Iterations can be
performed using the updated OLS ût and ρ̂ for another round of OLS regression
till the estimates converge.
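The iterative procedure just described can be sketched as follows. This is a simplified version that drops the first observation instead of applying the √(1−ρ̂²) transformation to it; the function name is hypothetical.

```python
import numpy as np

def cochrane_orcutt(X, y, n_iter=10):
    """Iterated Cochrane-Orcutt FGLS for a regression with AR(1) disturbances."""
    b = np.linalg.lstsq(X, y, rcond=None)[0]        # initial OLS
    rho = 0.0
    for _ in range(n_iter):
        u = y - X @ b                                # current residuals
        rho = (u[1:] @ u[:-1]) / (u[:-1] @ u[:-1])   # rho_hat, index starts at 2
        Xs = X[1:] - rho * X[:-1]                    # quasi-differenced data
        ys = y[1:] - rho * y[:-1]
        b = np.linalg.lstsq(Xs, ys, rcond=None)[0]
    return b, rho
```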
The presence of serial correlation, hence non-spherical disturbances and a
breakdown of the classical conditions, can be tested using the Durbin-Watson
test, the Box-Pierce-Ljung Q-tests discussed earlier, and the Breusch-Godfrey
test (LM test).

There are three limitations of the DW test as a test for serial correlation.
First, the distribution of the DW d-statistic under the null hypothesis depends
on the data matrix. Thus bounds are placed on the critical region within which
the test results are inconclusive. Second, if there are lagged dependent
variables on the right-hand side of the regression, the DW d-test is no longer
valid. Lastly, the d-test is strictly valid on the null hypothesis of no serial
correlation only when it has the alternative hypothesis of first-order serial
correlation.

The other tests of serial correlation, viz. the Box-Pierce-Ljung Q-test and the
Breusch-Godfrey test (LM test), overcome these limitations. The essential idea
of the LM test is to test the null hypothesis that there is no serial
correlation in the residuals up to the specified order. Here the estimated ût is
regressed against a constant and lagged ût−k's, and an asymptotic χ²(k−1) test
statistic and an equivalent F(k−1, N−k) statistic are reported based on the null
hypothesis of zero restrictions on the slope coefficients of the ût−k's.
Strictly speaking, the distribution of the F-statistic is not known, and the
F-distribution is an approximation since û is estimated.
12.10 PROBLEM SET

12.1 If an OLS regression is run on [2], do we obtain the OLS estimate
D̂ = 1/B̂? Explain why or why not.

12.2 A theoretical model is postulated as

Yt = e^(a + bXt + cZt + εt) / ( 1 + e^(a + bXt + cZt + εt) )

with the following data:

t     Xt     Yt     Zt
1     10     0.6    3
2     15     0.5    6
3      5     0.7    2
4     20     0.8    5
12.3 Consider the following three regression estimates [1], [2], and [3] of
the same dependent variable (values in parentheses):

            [1]             [2]             [3]
C          −0.01 (0.36)     0.01 (0.55)     0.02 (0.87)
X1          0.21 (0.04)                     0.16 (0.34)
X2                          0.18 (0.06)     0.08 (0.41)

Explain if there is any problem with the data here. How would you proceed to
handle the problem?
Table 12.2
Glossary of Specification Errors and Remedies

Problem: Heteroskedasticity
Special Cases: Non-constant variances but zero correlations, i.e. covariance matrix σ²Σ ≠ σ²I is diagonal. Variance form is known, e.g. σi² ∝ Xi².
Effect on OLS: Unbiased and Consistent BUT not efficient (in both finite sample and large sample). Incorrect standard error when using var(B̂) = σ̂²(XᵀX)⁻¹.
Test if Problem Exists: Goldfeld-Quandt test; Breusch-Pagan & Godfrey test (LM test).
Solution: Weighted LS: BLUE. Consistent and Asymptotically Efficient (consistent estimation of the parameter in the heteroskedastic form gives rise to this estimator consistency).
Problem: Heteroskedasticity
Special Cases: Non-constant variances but zero correlations, i.e. covariance matrix σ²Σ ≠ σ²I is diagonal. Variance form is not known.
Effect on OLS: Unbiased and Consistent BUT not efficient (in both finite sample and large sample). Incorrect standard error when using var(B̂) = σ̂²(XᵀX)⁻¹.
Test if Problem Exists: White's test.
Solution: White's Heteroskedasticity-Consistent Covariance Matrix adjustment to the standard error estimate. The adjusted standard error is correct asymptotically; OLS is consistent and the adjusted covariance matrix allows correct statistical inference. However, the estimator is not asymptotically efficient.

Problem: Serial correlation
Resulting from: Covariance matrix σ²Σ ≠ σ²I contains non-zero off-diagonal elements.
Effect on OLS: Unbiased and Consistent BUT not efficient (in both finite sample and large sample). Incorrect standard error when using var(B̂) = σ̂²(XᵀX)⁻¹.
Test if Problem Exists: Durbin-Watson d-test; Box-Pierce-Ljung Q-tests; Breusch-Godfrey test (LM test).
Solution: Estimated or Feasible GLS (Cochrane-Orcutt or else Hildreth-Lu or other versions of iterative procedures are used). Consistent and Asymptotically Efficient. Approximately BLUE (in finite sample).
Problem: Lagged dependent variable as regressor (serially uncorrelated disturbances)
Resulting from: Partial Adjustment models. The actual dependent variable Yt* is not observed: Yt* = a + bXt + et, but the observed Yt = Yt−1 + q(Yt* − Yt−1) + ut, i.e. with delayed adjustment.
Effect on OLS: Biased and not efficient, but still consistent because there is zero contemporaneous correlation. Asymptotically Efficient.
Solution: OLS. Consistent and Asymptotically Efficient.

Problem: Disturbances ut with contemporaneous correlation with explanatory variable(s)
Resulting from: Errors-in-variables, i.e. measurement error in an explanatory variable.
Effect on OLS: Biased and not efficient. Not consistent because there is contemporaneous correlation. Not Asymptotically Efficient.
Solution: If the error form and specification are known, correction can be made to establish a consistent estimator with asymptotic inference.
231
Problem: Disturbances ut with contemporaneous correlation with
explanatory variable(s) (hence also are stochastically dependent)
Special Cases
Yt=a+bXt+ut such that
cov(Xt , ut) 0
Resulting from
Simultaneous
equation bias
Effect on OLS
Biased and not efficient.
Not consistent because there
is
contemporaneous
correlation.
Not
Asymptotically Efficient.
Solution
For exactly identified system, use Instrumental
Variables or Indirect Least Squares. For overidentified system, use two-stage least squares or else
limited information maximum likelihood method.
All these enables Consistency with asymptotic
variance, and are thus testable, but are usually not
asymptotically efficient.
Resulting from
Imposing Koyck lags
on lagged explanatory
variables or Adaptive
Expectations Models
where
Yt=a+b*XtE+et,
XtE is an expectation
formed at t and
follows
XtE=Xt-1E+q*(Xt-Xt-1E)
Effect on OLS
Biased and not efficient.
Not consistent because
there is contemporaneous
correlation.
Not
Asymptotically Efficient.
Solution
If can be accurately estimated, then Estimated or
Feasible GLS, e.g. Zellner-Geisel Method tries to
transform into i.i.d. disturbances so that the problem
is less severe, and then FGLS is approximately
Consistent and Asymptotically Efficient. Or else
perform Instrumental
Variables
Regression.
Consistent with asymptotic variance, thus testable,
but usually not asymptotically efficient.
Problem: Disturbances ut with contemporaneous correlation with explanatory variable(s) (hence also stochastically dependent)
Special Cases: Lagged Endogenous (Dependent) Variable Yt = a + bXt + cYt−1 + ut AND disturbances follow an AR process ut = ρut−1 + et, et i.i.d. Cov(Yt−1, ut) ≠ 0 and cov(Yt−1, ut−1) ≠ 0, so in general Y, U are not stochastically independent. Also, the contemporaneous correlation is not zero.
Effect on OLS: Biased and not efficient. Not consistent because there is contemporaneous correlation. Not Asymptotically Efficient.
Test if Problem Exists: Durbin-Watson h-test; Box-Pierce-Ljung Q-tests; Breusch-Godfrey test (LM test).
Solution: If ρ can be accurately estimated, then Estimated or Feasible GLS (Cochrane-Orcutt or else Hildreth-Lu or other versions of iterative procedures) tries to transform the problem into i.i.d. disturbances so that the problem is less severe, and then FGLS is approximately Consistent and Asymptotically Efficient. Or else perform Instrumental Variables regression: Consistent with asymptotic variance, thus testable, but usually not asymptotically efficient.

Problem: Wrong functional form
Effect on OLS: Spurious fit and bad forecasts.
Solution: Transform into linear form or perform nonlinear regressions.

Problem: Wrong explanatory variables
Special Cases: Omission of relevant explanatory variables.
Effect on OLS: Biased and not efficient. Not consistent. Not Asymptotically Efficient. Incorrect standard error when using var(B̂) = σ̂²(XᵀX)⁻¹.
Test if Problem Exists: Low R² and/or unreasonable economic interpretations of estimated coefficients.
Solution: Add variables based on theoretical reasoning or model. Or start with overfitting and reduce the number of explanatory variables by optimal model selection.
Problem: Wrong explanatory variables
Special Cases: Inclusion of irrelevant explanatory variables.
Resulting from: Data-mining/overfitting or wrong theoretical model.
Test if Problem Exists: Test for insignificant t-statistics for coefficient estimates.
Effect on OLS: Still BLUE in the sense that estimators of coefficients for irrelevant explanatory variables will be expected to be zeros (and also consistent). But may introduce unnecessary data multi-collinearity in finite sample and hence less precision to relevant estimators, i.e. the sampling variance of estimators of coefficients of relevant variables is now larger. Also, seriously biased forecasts when irrelevant variables are used (as non-zero coefficients appear in the forecast equation).

Problem: Multi-collinearity
Special Cases: Lagged explanatory variables is one cause.
Resulting from: At least one explanatory variable close to being a linear combination of other explanatory variable(s).
Effect on OLS: Low t-statistics (large sampling errors of estimators).
Test if Problem Exists: Correlation matrix and test of autocorrelation amongst lagged explanatory variables.

Problem: Multi-collinearity
Special Cases: Explanatory variables consist of lags of the same variable, e.g. Yt = c0 + c1Xt + c2Xt−1 + c3Xt−2 + c4Xt−3 + et.
Resulting from: The lagged explanatory variables are highly correlated.
Effect on OLS: Low t-statistics (large sampling errors of estimators).
Test if Problem Exists: Correlation matrix and test of autocorrelation amongst lagged explanatory variables.
Solution: Impose restrictions on coefficients, e.g. a Koyck lag so as to concentrate the effect onto just one Xt [but this leads to the Lagged Endogenous (Dependent) Variable problem], or else a polynomial distributed lag to concentrate the effect onto just a few composite (linear averages of Xt−k's) explanatory variables. Or increase the sample size.
Problem: Non-stationary regressand and regressor
Special Cases: Regressand and regressor are trend stationary.
Resulting from: Both a linear function of a strong time trend.
Effect on OLS: Spurious coefficients showing a significant slope when it does not exist.
Test if Problem Exists: Test for presence of a time trend.
Solution: Remove the trend by first differencing and run the regression using the first-differenced series.

Problem: Non-stationary regressand and regressor
Resulting from: Both unit root processes.
Effect on OLS: Spurious coefficient estimates.
Solution: Cointegration regression if there is cointegration between the unit root processes. If not, perform OLS on first differences.

Problem: Structural change
Resulting from: Very long time series, or economic regime changes, or major economic events.
Effect on OLS: Spurious coefficient estimates.
Solution: Apply model-specific techniques principally based on maximum likelihood.

Problem: Non-normal disturbances
Effect on OLS: Wrong statistical inference of OLS estimators and forecasts.
Solution: Ascertain the distribution and employ maximum likelihood methods.
Chapter 13
CROSS-SECTIONAL REGRESSION
APPLICATION: TESTING CAPM

Key Points of Learning
Mean-Variance Efficiency, Roll's Critique, Fama-MacBeth Procedure, Cross-sectional Regression Test, Coefficient Restriction, Wishart Distribution, Hotelling's T² Statistic, Asymptotic Chi-Square Test, Asymptotic Tests, Wald Test Statistic, Idiosyncratic Risk

Consider choosing an N×1 vector of portfolio weights w to minimize the
portfolio variance wᵀVw, where V is the N×N covariance matrix of the asset
returns, subject to achieving a target expected return k and full investment:

wᵀ E(R) = k    and    wᵀ j = 1

where E(R) is the N×1 vector of expected returns and j is an N×1 vector of ones.
To solve this, minimize over w the Lagrangian

L = ½ wᵀVw + a [ k − wᵀE(R) ] + b [ 1 − wᵀj ]

with multipliers a and b. The FOCs are:

∂L/∂w = Vw − a E(R) − b j = 0        (13.1)
∂L/∂a = k − wᵀE(R) = 0               (13.2)
∂L/∂b = 1 − wᵀj = 0                  (13.3)

where 0 is an N×1 vector of zeros.

From equation (13.1), the optimal (mean-variance, MV, efficient) portfolio
weights are given by:

w* = a V⁻¹ E(R) + b V⁻¹ j .          (13.4)
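The weights in (13.4) can be computed numerically once a and b are pinned down by the two constraints: substituting (13.4) into them gives a 2×2 linear system in (a, b). A sketch with made-up inputs (the three-asset μ and V below are purely illustrative, and the function name is hypothetical):

```python
import numpy as np

def mv_frontier_weights(mu, V, k):
    """Weights w* = a V^{-1} mu + b V^{-1} j as in (13.4), with a, b chosen
    so that w'mu = k and w'j = 1."""
    j = np.ones(len(mu))
    Vi_mu = np.linalg.solve(V, mu)               # V^{-1} E(R)
    Vi_j = np.linalg.solve(V, j)                 # V^{-1} j
    # Constraint system:  a*(mu'V^{-1}mu) + b*(mu'V^{-1}j) = k
    #                     a*(j'V^{-1}mu)  + b*(j'V^{-1}j)  = 1
    M = np.array([[mu @ Vi_mu, mu @ Vi_j],
                  [j @ Vi_mu, j @ Vi_j]])
    a, b = np.linalg.solve(M, np.array([k, 1.0]))
    return a * Vi_mu + b * Vi_j

mu = np.array([0.05, 0.08, 0.12])                # illustrative expected returns
V = np.array([[0.04, 0.01, 0.00],                # illustrative covariance matrix
              [0.01, 0.09, 0.02],
              [0.00, 0.02, 0.16]])
w = mv_frontier_weights(mu, V, 0.08)
```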
Substituting (13.4) into the constraints (13.2) and (13.3) gives two linear
equations in a and b, and solving them shows that a and b, and hence w*, are
linear in the target return k. The optimal weights can therefore be written as

w* = p + q k                          (13.7)

where p and q are fixed N×1 vectors of constants determined by V, E(R), and j.
Consider two MV frontier portfolios g = p + q kg
with expected return kg, and h = p + q kh with expected return kh. We can
form a linear combination of these two MV frontier portfolios:

z = α g + (1−α) h
  = α [ p + q kg ] + (1−α) [ p + q kh ]
  = p + q { α kg + (1−α) kh } .

Clearly, portfolio z is itself a MV frontier portfolio since its portfolio weights
satisfy the property in (13.7). Its expected return is kz = α kg + (1−α) kh.

We just showed an important result in portfolio mathematics, that there
exists a two-fund separation theorem, i.e. any efficient portfolio can be created
by investing in only two efficient funds. In other words, any linear
combination of two efficient portfolios is itself an efficient portfolio. We let
portfolio h have expected return kh equal to the riskfree rate rf. Then portfolio
h's weight vector is p + q rf. Such a portfolio exists on the MV frontier.

By the above result, the market portfolio m, which is MV efficient, can be
expressed as a linear combination of any arbitrary MV efficient portfolio g
and the MV efficient portfolio h with expected return rf, such that

m = αm g + (1−αm) h
  = αm [ p + q kg ] + (1−αm) [ p + q rf ]
  = p + q { αm kg + (1−αm) rf }
  = p + q km

where km = αm kg + (1−αm) rf, again of the frontier form (13.7). The CAPM
asserts that

E(ri) = rf + bi [ E(rm) − rf ] ,  bi = cov(ri, rm)/var(rm)        (13.8)

where E(ri) ≡ ki is the expected return of an ith portfolio (it can also be a stock
simply by taking the weights to be [0,0,…,0,1,0,…,0]ᵀ with 1 in the ith
position of the N×1 vector), and E(rm) ≡ km and rf are respectively the expected
market return and the riskfree rate for the period.
Substituting (13.7) into (13.8):

E(ri) = rf + bi [ αm E(rg) + (1−αm) rf − rf ]
      = rf + bi αm [ E(rg) − rf ]
      = rf + βi [ E(rg) − rf ]        (13.9)

where E(rg) ≡ kg, and βi = bi αm is a constant associated with portfolio g.
Equation (13.9) gives rise to the regression equation:

ri = rf + βi [ rg − rf ] + ei         (13.10)

where it is assumed that cov(rg, ei) = 0, and hence βi = cov(ri, rg)/var(rg),
which acts like a market beta if indeed rg is the market return.
Roll's critique⁵³ was a clever and deep insight into the CAPM tests of
(13.8). Typically, the S&P 500 or some other market index is used as a
proxy for the market portfolio, and its returns are assumed to be the market
returns rmt for a range of t. However, Roll noted that there was no way to
verify if indeed the proxy return is the market return. Thus acceptance of test
restrictions in (13.8) merely implies that the proxy or the arbitrary portfolio g
follows (13.9) and (13.10) instead. This in turn merely implies that there is
evidence the proxy is a MV efficient portfolio. Nothing more. Hence all the
tests of CAPM in the past using proxies are nothing more than tests of whether
the proxies are mean-variance efficient, i.e. whether that portfolio lies on the MV
efficient frontier. They do not test the CAPM in (13.8), and unless a market
return can truly be observed, the CAPM is unlikely to be testable in the formal
and exact sense. It may also be said that rejection of a test on de facto (13.9)
or (13.10) is not necessarily a rejection of the CAPM in (13.8), but rather a
rejection of the proposition that the chosen proxy is MV efficient.

Though the idealized situation of a definitive test of the CAPM cannot be
realized, given that we do not know exactly what the market return is, as it
may embody human capital returns and other asset returns not captured in
the standard proxies such as market index returns, researchers still agree that
CAPM tests are useful for knowing whether, given the set of stocks, the standard
proxies are mean-variance efficient (or sometimes called minimum variance
efficient). If the proxy is MV efficient, then it is still useful to employ the proxy
to describe differences in expected returns of different assets based on their
return correlations with the proxy return. In what follows, a stated CAPM test
would assume the market index is a good proxy for the market return.
13.2 CROSS-SECTIONAL REGRESSION TEST

53 Roll, Richard (1977), "A Critique of the Asset Pricing Theory's Tests: Part I",
Journal of Financial Economics 4, 129-176.
Under the CAPM, expected returns must line up with betas in the cross-section:

Ei = λ0 + λ1 bi ,   for i = 1, 2, …, N     (13.11)

where N is the number of assets, Ei ≡ E(ri), and λ0, λ1 are constants. In
particular, λ0 can be interpreted as the riskfree rate, and λ1 as the excess market
return or the market risk premium, according to (13.8). In other words, the
variation in assets' betas must explain the variation in assets' expected returns,
or the cross-section of expected returns. In fact, under the CAPM, the implication
is quite strong in that only betas are systematic in explaining all the variation
of the expected returns.
Intuitively, the CAPM result or SML relates to Markowitz portfolio
diversification. An asset with return having a high covariance with rmt does not
contribute as much to market portfolio variance reduction as does an asset
with return having a low or even negative covariance with rmt. Thus the
high-covariance asset, or equivalently the high-beta asset, will have higher
systematic risk (risk that cannot be diversified away due to covariation with rmt)
and thus a higher expected return according to (13.8). Thus, when there is a
relatively high expected return (or high required return) on an asset given its
current price Pt, which is the same as saying there is a relatively high expected
future price E(PT), one key possibility is that the asset has a high risk premium
attached to it, so that the relatively higher required return or risk premium
compensates the investor for bearing the higher systematic risk.
Tests using restriction (13.11) across assets for a particular period are
called cross-sectional tests of the CAPM. Alternative cross-sectional tests of
the CAPM include:

Ei = λ0 + λ1 bi + λ2 bi² + … + λk bi^k + εi ,   for i = 1, 2, …, N     (13.12)
processes for the returns, and extend the single-period CAPM to a time series
representation:

rit = rft + bi [ rmt − rft ] + eit ,   for i = 1, 2, …, N and t = 1, 2, …, T

that is consistent with (13.8), though with additional auxiliary assumptions
such as cov(rmt, eit) = 0 and bi being constant over time. The time series
representation can be re-written as:

rit − rft = ai + bi [ rmt − rft ] + eit ,   for i = 1, 2, …, N and t = 1, 2, …, T.   (13.13)

A testable restriction is then H0: ai = 0 for all i = 1, 2, …, N. If we had instead run
time series regressions on rit = ai + bi rmt + eit, then an alternative and
more complicated testable restriction on the coefficients is H0: ai = rf (1−bi) for
all i = 1, 2, …, N, assuming rft = constant rf for all t = 1, 2, …, T. Even more
complicated tests involve cross-sectional time series regressions.⁵⁵
We provide a cross-sectional regression of (13.11) as follows. 135 large
and liquid (actively traded) stocks on the Singapore Exchange were grouped
into 14 portfolios according to the size of their betas. The first portfolio
contains stocks with the lowest betas and the 14th portfolio contains stocks
with the highest betas. The betas range from 0.6 to 1.6. The portfolios are
chosen so that the number of stocks in each portfolio is about the same, and at
the same time the resulting portfolio betas are spread out as widely as
possible. The spread in the explanatory variable of beta is to ensure adequate
statistical power for the test. The betas were computed based on returns in the
past 5 years and were checked with a published source, Corporate Handbook
Singapore, Thomson Financial Publishing, to ensure consistency. The
averaged return of the stocks in a portfolio is then used as the proxy for the
expected return of that portfolio, i.e. the dependent variable in (13.11). For the
following example, we use the averaged portfolio returns for averaged
monthly returns of the stocks during the period October 2001 to September
2002.
Equation (13.11) is consistent with the following regression equation:

ri = λ0 + λ1 bi + εi ,   for i = 1, 2, …, N     (13.14)

where E(ri) = Ei, E(εi) = 0, and cov(bi, εi) = 0. Moreover, we shall add the
usual classical OLS condition that the error εi is homoskedastic and serially
uncorrelated.
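The cross-sectional OLS of (13.14) is a two-variable regression of average portfolio returns on portfolio betas, and can be sketched as follows. The 14 betas and returns below are simulated for illustration; they are not the book's Singapore data.

```python
import numpy as np

rng = np.random.default_rng(2)
beta = np.linspace(0.6, 1.6, 14)                          # 14 portfolio betas
ret = 0.002 + 0.004 * beta + rng.normal(0.0, 0.003, 14)   # simulated avg returns

X = np.column_stack([np.ones(14), beta])
lam, *_ = np.linalg.lstsq(X, ret, rcond=None)             # (lambda0_hat, lambda1_hat)
resid = ret - X @ lam
s2 = (resid @ resid) / (14 - 2)                           # residual variance, N - k d.f.
se = np.sqrt(s2 * np.diag(np.linalg.inv(X.T @ X)))        # classical OLS std errors
t_stats = lam / se
```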
55 See Gibbons, M. (1982), "Multivariate tests of financial models: A new approach",
Journal of Financial Economics 10, 3-27.
Using averaged portfolio returns introduces less error into the dependent
variable, which is supposed to be an expected return. This basically means that
the residual error variance is smaller. Betas are typically measured with
sampling errors. Portfolio betas allow for averaging across the stock betas in
the portfolio and thus have less measurement error. In the chapter on
Specification Errors, we saw that reducing such errors-in-variables is important
to avoid a downward bias in the estimated slope coefficient.

The graphical plot of return versus beta is shown in Figure 13.1 below.
The monthly return is an average of the returns during October 2001 to
September 2002.
Figure 13.1
Plot of Portfolio Averaged Monthly Return versus Portfolio Beta for
Singapore stocks, October 2001 to September 2002
[Scatter of average portfolio return (vertical axis, about −0.02 to 0.03) against
portfolio beta (horizontal axis, 0.6 to 1.6), together with the realized SML.]
242
return was about 6% p.a. Though both the OLS regression intercept and slope
estimates are positive and hence support the implications of SML in CAPM,
Figure 13.1 shows that the OLS fitted line is flatter than the realized SML
according to (13.8):
E(ri) = 0.0005 + bi 0.0045.
It means that beta alone cannot fully explain the cross sectional variation of
expected returns.56
Table 13.1
Cross-sectional Regression of Portfolio Return on Portfolio Beta
ri = λ0 + λ1 bi + εi

Dependent Variable: PORTRET
Method: Least Squares
Sample: 1 14
Included observations: 14

Variable      Coefficient    Std. Error    t-Statistic    Prob.
C              0.002941       0.012179      0.241508      0.8132
BETA           0.000493       0.010413      0.047316      0.9630

R-squared             0.000187    Mean dependent var      0.003502
Adjusted R-squared   -0.083131    S.D. dependent var      0.010120
S.E. of regression    0.010532    Akaike info criterion  -6.137247
Sum squared resid     0.001331    Schwarz criterion      -6.045953
Log likelihood        44.96073    Hannan-Quinn criter.   -6.145698
F-statistic           0.002239    Durbin-Watson stat      1.684388
Prob(F-statistic)     0.963039
The large standard errors of the regression estimates in Table 13.1 could be
due to the measurement errors of the betas. Specifically, in the regression of
(13.14), the estimated betas (from time series) contain errors, as they are
sampling estimates of the population parameters βi = cov(rit, rmt)/var(rmt). To
address these errors, Fama and MacBeth (1973)⁵⁷ used a three-step procedure
that has come to be known as the Fama-MacBeth procedure or approach in
cross-sectional regression involving asset expected returns. A second issue that
the Fama-MacBeth procedure addressed is that any non-zero cross-sectional
correlations of εi in (13.14) would lead to an estimator distribution that is
different from the usual t-distribution, and hence to incorrect inference based
on the t-tests.

56 A similar but stronger result for U.S. stocks during 1928 to 2003 is discussed in
Fama, Eugene F., and Kenneth R. French (2004), "The Capital Asset Pricing Model:
Theory and Evidence", Journal of Economic Perspectives 18, 25-46. The result was
also found by some earlier research on the U.S. stock market.
13.3 FAMA-MACBETH PROCEDURE

This approach is used in the original article Fama and MacBeth (1973) and
later articles such as Fama and French (1992).
(1) In step one, the N assets are divided into M portfolios. These are divided
based on criteria such as initial beta estimates⁵⁸ (that are subject to
measurement errors). Or the N assets could first be divided into M
portfolios based on firm size, and within each size portfolio there is a
further division into M sub-portfolios by initial betas. The latter would
create M² portfolios. Then compute the average or equal-weighted
portfolio return R̄j for each of the M (or M², whichever the case may be)
portfolios.

The idea is that each portfolio sorted by initial beta will behave
similarly (with the same distribution) given that their betas are
approximately the same. Averaging their returns in a particular period, say
month t, will produce an observation close to the portfolio's expected
return at time t. Any remaining measurement error in the expected return is
not critical as this variable is a dependent variable. This point will also
become clearer in later chapters.
(2) In step two, the beta for each portfolio is re-computed using time series
regression on returns data, rjt − rft = βj { rmt − rft } + ejt, typically using
60 monthly observations. This regression could be performed in two
ways: (a) on the equal-weighted portfolio return as the dependent variable
rjt, thus obtaining the portfolio β̂j; or (b) on each stock return rjt in a
portfolio, and then averaging these betas to obtain the portfolio beta.

The idea is that the estimated portfolio beta for each of the M portfolios will
contain as small a measurement error as is possible since there is initial beta
sorting. Finding a beta from such a portfolio containing assets with closely
similar betas will yield a more accurate estimate of beta. This is one of the key
merits of the Fama-MacBeth approach for cross-sectional regressions.

57 See Fama, Eugene F., and James D. MacBeth (1973), "Risk, Return, and
Equilibrium", Journal of Political Economy, Vol 81, 607-636. See also Fama, Eugene
F., and Kenneth R. French (1992), "The Cross-Section of Expected Stock Returns",
Journal of Finance, Vol XLVII, No 2, June, 427-465.
58 Betas may be computed as prior betas, i.e. using past time series of the stocks, e.g.
the last 60 months, to compute the beta for the next month's regression. Sometimes
forward betas, i.e. the next 60 months, are used instead if it is supposed that the
investor in the market looked forward rather than backward to infer beta.
Steps (1) and (2) above produce M bivariate data points { R̄j , β̂j } for
j = 1, 2, …, M. These would be sufficient for the cross-sectional regression⁵⁹

R̄j = a + λ β̂j + ej ,   j = 1, 2, 3, …, M.     (13.15)
(3) In step three, the cross-sectional regression (13.15) is run for a given
month t; the sample is then moved one month forward to t+1, portfolios are
reconstructed, and { R̄j , β̂j } re-estimated. Sometimes the portfolio
reconstruction is done only each year ahead. In this case, monthly returns and
betas are still used for the cross-sectional regression, but the changes from
month to month within that year will be due to delistings and other reasons.
For other FM-type regressions that involve other explanatory variables, as in
Fama and French (1992), regressions could be run with individual stock
returns as the dependent variable. In this case, the associated beta could still be
computed based on something like step (2) above to reduce errors-in-variables.
However, in this case, all stocks in the same portfolio for that period will have
the same estimated beta. Since the cross-section could involve thousands of
stocks from say 100 portfolios (M² = 100), using the same beta for some stocks
is not a big issue.
The a and λ are estimated cross-sectionally for each month, and then
the monthly time series of estimates are averaged across time (or across the
different cross-sectional regressions). The averages are then tested to see if
they are significantly different from zero, or positive as in the case of the
average of λ̂. This last step avoids the problem that any non-zero
cross-sectional correlations of εi in (13.14) would lead to incorrect inference
based on the t-tests for a single-month cross-sectional regression. There are
of course implicit assumptions that the alphas and betas are approximately
constant, that the estimators are uncorrelated over time, and that the sampling
distribution of the estimated alphas and betas follows the central limit
theorem, in which case the time series has to be reasonably long.

59 Some studies use returns as the dependent variable, while others use excess returns
as the dependent variable.
Suppose such cross-sectional regressions are performed over T months, so
that T slope estimates λ̂t are obtained. We then test if the time-average of λ̂,

λ̄ = (1/T) Σ_{t=1}^{T} λ̂t ,

is significantly different from zero (null E(λ̂) = 0) using

t = λ̄ / ( sd(λ̂t) / √T )

where sd(λ̂t) is the sample standard deviation of the λ̂t's, with the statistic
having d.f. T−1. In the Fama and MacBeth (1973) study, one particular λ̄
averaged over T = 72 months during 1935 to 1940 was found to be 1.09%, with
sd(λ̂t) = 11.6%. Hence the t-statistic (d.f. 71),

t = (1.09 / 11.6) × √72 ≈ 0.79,

was found not to be significantly positive. We can also test if the averaged a
differs significantly from the riskfree rate (or from zero when excess returns
are used as the dependent variable).
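The last step reduces to a one-sample t-test on the monthly slope estimates. A minimal sketch (the function name is hypothetical); the comment at the end re-checks the Fama-MacBeth figures quoted above.

```python
import numpy as np

def fm_tstat(lam_series):
    """t-statistic for the time-averaged slope: lam_bar / (sd / sqrt(T))."""
    T = len(lam_series)
    return lam_series.mean() / (lam_series.std(ddof=1) / np.sqrt(T))

# With lam_bar = 1.09%, sd = 11.6% and T = 72 as quoted above,
# t = (1.09 / 11.6) * sqrt(72), which is about 0.8 and not significant.
```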
13.4 ASYMPTOTIC TESTS
such time series regressions based on (13.13), and collect the âi's for the test.
It is still a cross-sectional test as we are considering the deviations of the âi's
across section, i.e. across the different stocks i. However, to produce test
distributions, we assume, as in the market model, that the zero-mean residual
errors in (13.13) for each i are jointly normally distributed at each t, and are
stationary over time. Specifically, for each t, the variance-covariance matrix of
the vector of residuals is:

                    [ e1t²       e1t e2t     …    e1t eNt ]
cov(et)  =  E       [ e2t e1t    e2t²        …    e2t eNt ]   =   Σ (N×N).
                    [ …          …           …    …       ]
                    [ eNt e1t    eNt e2t     …    eNt²    ]
We can write the random vector et ≡ (e1t, e2t, e3t, …, eNt)ᵀ as jointly normally
distributed N_N(0, Σ), where the subscript to N(·, ·) denotes multivariate
normality of dimension N. Now consider the T×N matrix:

       [ e1,t+1    e2,t+1    …    eN,t+1 ]
ε  =   [ e1,t+2    e2,t+2    …    eN,t+2 ]
       [ …                                ]
       [ e1,t+T    e2,t+T    …    eN,t+T ]

where we assume (e1,t, e2,t, …, eN,t) and (e1,t′, e2,t′, …, eN,t′) are
independent for t ≠ t′.
Let SNN = T . Clearly S is positive definite since any non-zero vector xN1
is such that xTSx = xTT x = (x)T x > 0. S is also symmetric. Random
square matrix S has a Wishart distribution. For most purposes, we consider the
case T > N, and the Wishart distribution above is said to have T degrees of
freedom, and a pdf function f(.) characterized by parameters T, N, and matrix
:
f s
T N 1/2
exp 12 trace 1s
2 (TN)/2
T/2
N T/2
j1
Notationally, we write SNN WN (,T). There are some useful results about
the Wishart distribution.
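A quick Monte Carlo check of one basic Wishart property, E(S) = TΣ when S = εᵀε is built from T i.i.d. N_N(0, Σ) rows, using a toy 2×2 covariance matrix (illustrative values only):

```python
import numpy as np

# Average S = eps' eps over many replications; the mean should be T * Sigma,
# consistent with S ~ W_N(Sigma, T). Sigma is a toy 2x2 covariance matrix.
rng = np.random.default_rng(1)
Sigma = np.array([[1.0, 0.3], [0.3, 2.0]])
T, N, reps = 50, 2, 5000

S_mean = np.zeros((N, N))
for _ in range(reps):
    eps = rng.multivariate_normal(np.zeros(N), Sigma, size=T)  # T x N matrix
    S_mean += eps.T @ eps / reps

print(np.round(S_mean / T, 2))  # should be close to Sigma
```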
If x_{N×1} is a vector (0, ..., 0, 1, 0, ..., 0) with all elements zero except the kth, then xᵀSx ~ σ_kk χ²_T = σ²_k χ²_T. Note that in (13.13), the residual errors are normally distributed with var(e_it) = σ_i² for all t, and cov(e_it, e_jt) = σ_ij.

More generally, if x_{N×1} is a nonzero constant vector, then xᵀSx ~ (xᵀΣx) χ²_T, a chi-square distribution with T degrees of freedom multiplied by the variance of xᵀe_t.

Next, if z_{N×1} ~ N_N(0, Σ) is distributed independently of S ~ W_N(Σ, T), then the Hotelling statistic

T_H² = T zᵀ S⁻¹ z                                              (13.16)

is distributed as

T_H²(N, T) = TN/(T−N+1) F_{N, T−N+1}.
As an example, for normally distributed random vectors x_t ~ N_N(μ, Σ), the sample mean x̄ = (1/T) Σ_{t=1}^{T} x_t over sample size T has distribution x̄ ~ N_N(μ, Σ/T), or √T(x̄ − μ) ~ N_N(√T μ, Σ). The unbiased sampling covariance matrix of the x_t is

U = (1/(T−1)) Σ_{t=1}^{T} (x_t − x̄)(x_t − x̄)ᵀ ~ (1/(T−1)) W_N(Σ, T−1).

Hence, (T − 1) U ~ W_N(Σ, T−1). From (13.16), we see that

(T−1) [√T(x̄ − μ)]ᵀ [(T−1)U]⁻¹ [√T(x̄ − μ)] = T (x̄ − μ)ᵀ U⁻¹ (x̄ − μ)

is distributed as T_H²(N, T−1). Re-arranging, we have

T (x̄ − μ)ᵀ U⁻¹ (x̄ − μ) ~ T_H²(N, T−1) = (T−1)N/(T−N) F_{N, T−N}.      (13.17)
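The statistic in (13.17) can be sketched as follows, with data simulated under the null μ = 0 and `scipy` supplying the F distribution (the dimensions N and T are illustrative):

```python
import numpy as np
from scipy import stats

# Hotelling T^2 test of H0: mu = 0 for an N-vector mean, mapped to an F
# statistic via F = T^2 * (T - N) / ((T - 1) * N) ~ F_{N, T-N}, cf. (13.17).
rng = np.random.default_rng(2)
N, T = 3, 60
x = rng.multivariate_normal(np.zeros(N), np.eye(N), size=T)   # T x N sample

xbar = x.mean(axis=0)
U = np.cov(x, rowvar=False, ddof=1)           # unbiased covariance, (T-1) divisor
T2 = T * xbar @ np.linalg.inv(U) @ xbar       # Hotelling T_H^2(N, T-1) statistic
F = T2 * (T - N) / ((T - 1) * N)
p_value = 1 - stats.f.cdf(F, N, T - N)
print(F, p_value)
```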
If instead we use the sampling covariance matrix that is normalized by sample size T,

Û = (1/T) Σ_{t=1}^{T} (x_t − x̄)(x_t − x̄)ᵀ = ((T−1)/T) U,  or  Û ~ (1/T) W_N(Σ, T−1),

so that T Û ~ W_N(Σ, T−1), then

(T−1) (x̄ − μ)ᵀ Û⁻¹ (x̄ − μ) ~ T_H²(N, T−1) = (T−1)N/(T−N) F_{N, T−N}.
In (13.17), as sample size T → ∞, the statistic (T−1)N/(T−N) F_{N, T−N} converges to N F_{N, ∞}, which is a chi-square statistic with N degrees of freedom. The latter is the result of a Wald test. A Wald test in this case is a test that the population mean is μ given the sample evidence x̄. The estimator x̄ is a maximum likelihood estimator of μ (under the normal distributional assumption, for example), and the Wald statistic is basically of the form:

T (x̄ − μ)ᵀ [ lim_{T→∞} T cov(x̄) ]⁻¹ (x̄ − μ) ~ χ²_N asymptotically,

where lim_{T→∞} T cov(x̄) = Σ is replaced in practice by a consistent estimator such as Û.
Assume further that cov(e_it, e_jt′) = 0 for any i, j and t ≠ t′, using sample t = 1, 2, ..., T. From the two-variable regression theory seen in Chapter 3, we know the sampling distributions of the OLS estimators â_j and b̂_j. For notational simplicity, let Y_jt = r_jt − r_ft and X_t = r_mt − r_ft. Then

â_j − a_j = Σ_{t=1}^{T} v_t e_jt,

where v_t = 1/T − x_t X̄ / Σ_{t=1}^{T} x_t²,  x_t = X_t − X̄,  and  X̄ = (1/T) Σ_{t=1}^{T} X_t. Now,

Σ_{t=1}^{T} v_t² = 1/T + X̄² / Σ_{t=1}^{T} x_t² = (1/T)(1 + μ̂_m²/σ̂_m²),

where we use the notations μ̂_m = (1/T) Σ_{t=1}^{T} (r_mt − r_ft) and σ̂_m² = (1/T) Σ_{t=1}^{T} (r_mt − r_ft − μ̂_m)².

The variance-covariance matrix of (â₁ − a₁, â₂ − a₂, ..., â_N − a_N)ᵀ is therefore

var(â − a) = ( Σ_{t=1}^{T} v_t² ) E[e_t e_tᵀ] = (1/T)(1 + μ̂_m²/σ̂_m²) Σ_{N×N},

where Σ = E[e_t e_tᵀ] is the matrix with (i, j) element E(e_it e_jt) as before. Hence

(â − a) ~ N_N( 0, (1/T)(1 + μ̂_m²/σ̂_m²) Σ ).

Or,

[ (1/T)(1 + μ̂_m²/σ̂_m²) ]^{−1/2} (â − a) ~ N_N(0, Σ).
Define the OLS residuals ê_jt = Y_jt − â_j − b̂_j X_t, and ê_t = (ê_1t, ê_2t, ..., ê_Nt)ᵀ. When T is large,

Û = (1/(T−2)) Σ_{t=1}^{T} ê_t ê_tᵀ ~ (1/(T−2)) W_N(Σ, T−2).

Then, under the null hypothesis that all a_j = 0,

T (1 + μ̂_m²/σ̂_m²)⁻¹ âᵀ Û⁻¹ â ~ T_H²(N, T−2) = (T−2)N/(T−N−1) F_{N, T−N−1}.      (13.18)
Two learning points to note here are: (1) equation (13.18) is tantamount to finding the statistic

T (1 + μ̂_m²/σ̂_m²)⁻¹ âᵀ [ (1/(T−2)) Σ_{t=1}^{T} ê_t ê_tᵀ ]⁻¹ â,

and (2) under joint normality of returns, the OLS estimators â_j's are also maximum likelihood estimators, hence (13.18) is a Wald test when T → ∞. In (13.18),

lim_{T→∞} T (1 + μ̂_m²/σ̂_m²)⁻¹ âᵀ Û⁻¹ â ~ N F_{N, ∞} = χ²_N.
We can apply the test statistic in (13.18) to test the CAPM, rejecting if the test statistic is too large, i.e. if the deviations of the â_i's from zero across the section are too large.60
60
See also Gibbons, M., S.A. Ross, and Jay Shanken, (1989), A Test of the
Efficiency of a Given Portfolio, Econometrica, 57, 1121-1152.
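A hedged sketch of computing the statistic in (13.18) on data simulated under the CAPM null (all true intercepts zero); the number of stocks, the market series, and the betas below are all illustrative, not an actual data set:

```python
import numpy as np
from scipy import stats

# Regress each stock's excess return on the market excess return, collect
# the intercepts a_hat and residuals, and form the (13.18) statistic
# T (1 + mu_m^2 / sig_m^2)^(-1) a' Uinv a, with its F-distribution p-value.
rng = np.random.default_rng(3)
N, T = 5, 120
X = rng.normal(0.006, 0.04, size=T)                        # market excess returns
betas = rng.uniform(0.5, 1.5, size=N)
Y = X[:, None] * betas + rng.normal(0, 0.02, size=(T, N))  # CAPM null: a = 0

Z = np.column_stack([np.ones(T), X])
coef, *_ = np.linalg.lstsq(Z, Y, rcond=None)               # 2 x N: intercepts, slopes
a_hat = coef[0]
resid = Y - Z @ coef
U = resid.T @ resid / (T - 2)                              # residual covariance

mu_m, sig2_m = X.mean(), X.var()                           # sig2_m uses 1/T divisor
stat = T / (1 + mu_m**2 / sig2_m) * a_hat @ np.linalg.inv(U) @ a_hat
p = 1 - stats.f.cdf(stat * (T - N - 1) / ((T - 2) * N), N, T - N - 1)
print(stat, p)
```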
13.5
IDIOSYNCRATIC RISK
The cross-sectional regression tests of the CAPM have produced mixed results, with some clearly rejecting the model while others could not. There has been much discussion centered on whether small sample bias could bias the results toward more rejections than warranted. There are also objections that those tests that support the Sharpe-Lintner CAPM did not impose enough of the restrictions implied by the model, and thus could not be said to have produced convincing evidence.
Yet another test of the CAPM is whether idiosyncratic risk (or unsystematic risk) is in fact priced by the market (as observed in market-traded prices). This type of test is somewhat similar to the tests under the first major implication of the CAPM, that only the betas explain cross-sectional variation in expected stock returns. However, in substance it challenges the basic CAPM premise that the market diversifies away all unsystematic risk (that a representative investor holds a large diversified portfolio) and so would only be compensated for bearing systematic risk related to the market. The trend of development is to accept that there may be systematic factors other than the market index factor, but a large part of mainstream finance academia would still debate whether unsystematic risk should be priced.

Regardless, there is a line of literature showing that idiosyncratic risk is indeed priced. This implies that the market is under-diversified, and thus each (on average) under-diversified market investor would demand compensation for stocks with high expected idiosyncratic risk.61
Fu (2009)62 showed that expected idiosyncratic risk indeed explained the cross-section of expected stock returns over the period July 1963 to December 2006; U.S. stocks on the NYSE, AMEX, and NASDAQ are covered in the study. Monthly idiosyncratic risk for each stock is basically estimated as the sample standard deviation of daily stock excess return residual errors (after fitting a multi-factor model to explain most of the time series variation), multiplied by the square root of the number of trading days in the month. The monthly idiosyncratic risks were found to be highly (positively) persistent. An exponential GARCH model was used to model the dynamics of the idiosyncratic risk and enabled expected idiosyncratic risk (as distinct from unexpected idiosyncratic risk, or the total realized idiosyncratic risk) to be estimated. Using a Fama-MacBeth-style approach, monthly cross-sectional
regressions of stock returns on various factors such as betas, sizes, values,
expected idiosyncratic risk, and so on, were performed. The estimated
coefficients or slopes on the expected idiosyncratic risk were collected and
their time series properties were evaluated. We show below an excerpt of
results on the estimated coefficient on expected idiosyncratic volatility under 3
different regression specifications involving slightly different explanatory
variables as controls.
Specification   Averaged coefficient of expected    t-statistic   Averaged R²
                idiosyncratic volatility
1               0.11                                 9.05         3.02%
2               0.13                                11.41         4.98%
3               0.15                                13.65         6.89%
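A minimal sketch of the monthly idiosyncratic-risk estimate described above, with a simulated daily factor series and loadings (the three factors and their loadings are illustrative assumptions, not Fu's exact specification):

```python
import numpy as np

# Fit a factor model to one month of daily excess returns, then scale the
# residual standard deviation by sqrt(number of trading days) to get the
# monthly idiosyncratic risk estimate. All data here are simulated.
rng = np.random.default_rng(4)
n_days = 21
factors = rng.normal(0, 0.01, size=(n_days, 3))      # hypothetical daily factors
r_excess = factors @ np.array([1.1, 0.4, -0.2]) + rng.normal(0, 0.015, n_days)

Z = np.column_stack([np.ones(n_days), factors])
coef, *_ = np.linalg.lstsq(Z, r_excess, rcond=None)
resid = r_excess - Z @ coef
idio_vol_month = resid.std(ddof=1) * np.sqrt(n_days)  # monthly idiosyncratic risk
print(idio_vol_month)
```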
13.6
PROBLEM SET
13.1
13.2
13.3
R̄_i = a + γ₁ β_i + γ₂ β_i² + e_i ,   i = 1, 2, 3, ..., N
13.4
is both close to zero in value and also not significantly different from zero. Under the above scenario, and given Roll's critique, is it still possible that the CAPM is in fact true?
13.5
13.6
f(e_t) = (2π)^{−N/2} |Σ|^{−1/2} exp( −(1/2) e_tᵀ Σ⁻¹ e_t ). Suppose e_t is i.i.d. across
Chapter 14
MORE MULTIPLE LINEAR REGRESSIONS
APPLICATION: MULTI-FACTOR ASSET PRICING
Key Points of Learning
Arbitrage Pricing Theory, Intertemporal Capital Asset Pricing Model, Risk
factors, Cross-sectional regression, Model selection, Multi-collinearity, Fama-French three factor model, Heteroskedasticity, Test of heteroskedasticity,
White heteroskedasticity-consistent adjustment
The CAPM is a single factor asset pricing model, with the market as the
factor. The implication is that systematic risk connected to the covariation
with market returns is the only risk that is priced. However, there is no reason
why the universe of assets in the economy could not have their prices and
required returns dependent on more than one economy-wide systematic factor.
There have been two approaches to the issue of finding more factors in asset pricing. One approach, by Chen, Roll, and Ross (1986)63 and others, is to specify macroeconomic and financial market variables with economically valid intuition for explaining co-movements with stock returns in a systematic fashion. For example, industrial production in an economy could increase and correlate with higher stock prices, especially for firms that have business exposures to industrial activities, and this is in addition to the general stock market movement.
Another approach is to look for factors to which certain firm characteristics, e.g. firm size, firm growth potential, the firm's industry sector, etc., could potentially be sensitive.
63
Chen, NF, R Roll, and S A Ross, (1986), Economic forces and the stock market,
Journal of Business, 59, 383-403.
As in the CAPM theory, finding factors is not enough; it is systematic factors that we want. Unsystematic factors are not considered important when an investor holds a large portfolio, because then these unsystematic noises cancel out one another, and there is no net impact on the portfolio's expected return. But for systematic factors that affect all, or almost all, of the economy's stocks, notwithstanding different loadings, these risks cannot be diversified away in a large portfolio. Sneezes in the factor will come back to haunt the portfolio performance, be it good or bad. This is the risk business.
As stock market and portfolio performance research continues, it is interesting that empirical data research has often come up with evidence of new systematic factors that are valuable to consider. Over time some proved to be spurious results, some were due to data-snooping64, some were over-shadowed by new factors that seem to subsume the old ones, and some disappeared with new and more recent data.

In what follows, we describe the framework for thinking about systematic factors that require risk-adjusted return compensation, first discussing the Arbitrage Pricing Theory and then moving on to empirical spadework.
14.2
Starting with the CAPM, Ross's (1976) Arbitrage Pricing Theory65 (APT) and Merton's (1973) Intertemporal Capital Asset Pricing Model66 (ICAPM) set the stage for decades of excitement and research into the pricing of assets. Essentially, if researchers know what, at a certain point in time, is the correct model to price an asset, then if the market prices the asset too low, it is opportune to go long the asset with the view that it will increase in price soon enough, and vice-versa. The equilibrium or correct price of an asset is of course related to the equilibrium (ex-ante) expected return, since future expected payoffs discounted by this expected return give the price. The
variation of returns in the cross-section of firms in the market is important enough to be well researched, as seen in the previous chapter.

Ross's APT has come to be understood as essentially a statistical model of prices. In a very large economy with no frictions and many assets, a no-arbitrage argument (without the need to specify investors' risk-return preferences) gives rise to equilibrium expected returns. These returns are related to an unknown number of factors in the economy that exogenously affect the returns in a statistical way. Merton's ICAPM is an intertemporal equilibrium model where investors make optimal consumption-versus-investment decisions constrained by their preferences and resources. The risks in the economy are driven by some finite number of economic state factors. Expected returns are related to the nature of these economic factors as well as, implicitly, to investor preferences. Although the characters of the APT and the ICAPM are quite different, they share a common intention of explaining equilibrium expected returns based on some other market factors, whether observed or not.
Tests of the APT tend to employ statistical methods such as the principal components method, factor analysis, etc., with a view to understanding how many factors there are in the economy. There have been various debates about whether the APT is testable, but we shall not worry about that here. In some sense, the ICAPM is more natural in suggesting a regression relationship between asset returns and observed market variables that are possibly the ones producing risks that investors must hedge. The use of observed market variables to proxy for the factors in the APT or for the risks in the ICAPM led to multi-factor asset pricing models. There is a colossal amount of research output on such models. One of the earlier papers, effectively on multi-factor models, is by Chen, Roll and Ross (1986), who found the following four macroeconomic variables to be significant in explaining the variation in the cross-section of firms' returns:
(a) index of industrial production
(b) differences in promised yields to maturity on AAA versus Baa corporate
bonds (default premium)
(c) differences in promised yields to maturity on long- and short-term
government bonds (term-structure of interest rates), and
(d) unanticipated inflation.
Keim and Stambaugh (1986) found67 the following three ex-ante
observable variables that affected risk premia of stocks and bonds:
(a) difference between yields on long-term Baa grade and below corporate
bonds and yields on short-term Treasury bills
67
Keim DB, and RF Stambaugh, (1986), Predicting returns in the bond and stock
markets, Journal of Financial Economics, 17, 357-390.
This proxied for default premium.
(b) loge (ratio of real S&P composite index to previous long-run S&P level)
This may proxy for inflationary tendencies.
(c) loge (average share price of the lowest market value quintile of firms on
NYSE)
There appears to be some business cycle and size effect.
There have been many other similar works. All suggest that there are at most 3 to 4 significant factors or economic variables that affect variation in the cross-sectional returns. It is instructive to understand the APT so that a better understanding of why multi-factors are used in cross-sectional regressions can be obtained. Strictly speaking, however, the APT neither implies nor is implied by multi-factor asset pricing with identified observable economic variables explaining the cross-sectional variation in average returns.
14.3
Suppose asset returns are generated by a K-factor model (K < N, where N is the total number of assets in the economy):

R_i = E(R_i) + Σ_{j=1}^{K} b_ij f_j + ε_i ,   i = 1, 2, ..., N                (14.1)

where E(R_i) = E_i, the f_j's are zero mean common risk factors (i.e., they affect asset i's return R_i via the loadings b_ij's), and ε_i is asset i's idiosyncratic noise, with

Cov(ε_i, ε_j) = 0 for i ≠ j,   and   Cov(ε_i, f_j) = 0 for every i, j.

In matrix notation, (14.1) is

R_{N×1} = E_{N×1} + B_{N×K} f_{K×1} + ε_{N×1},

with E(ε εᵀ) = σ² I_{N×N} and E(f εᵀ) = 0_{K×N}.
An example of (14.1) for stock returns is

R_i = E(R_i) + b_ig g̃ + b_iR R̃ + ε_i ,

where g̃ is the unexpected change in GDP, R̃ is the unexpected change in the prime rate, and E(R_i) is the stock's unconditional expected return in the absence of any factor shocks. The loadings (or sensitivities) are typically b_ig > 0 and b_iR < 0. When GDP is unexpectedly high with a booming economy, the firm's revenues will unexpectedly rise, giving rise to a higher return R_i; hence positive b_ig. When the prime rate or business cost unexpectedly rises, the firm's revenues will suffer unexpectedly, leading to a fall in return R_i.
Suppose we can find a portfolio x_{N×1}, where the elements are weights, such that

xᵀ l = 0 ,   where l = (1, 1, ..., 1)ᵀ is N×1,                                (14.2)

and

xᵀ B = 0_{1×K} .                                                             (14.3)

Such a portfolio costs nothing and bears no factor risk, so no-arbitrage requires that its expected return xᵀE be zero as well. Since this holds for every x orthogonal to both l and the columns of B, E must be a linear combination of them:

E_{N×1} = λ₀ l_{N×1} + B_{N×K} λ_{K×1} .                                     (14.5)
Here λ_{K×1} collects the risk premia for each source of the K factor risks. If we put B = 0, then clearly λ₀ is the riskfree rate. Note that if the asset is more sensitive, i.e., if B increases, then the systematic risk premium Bλ increases, and thus E also increases. For a single asset i, (14.5) implies:

E(R_i) = r_f + b_i1 λ₁ + b_i2 λ₂ + ... + b_iK λ_K .                          (14.6)
Substituting (14.6) into the return generating process

R_i = E(R_i) + Σ_{j=1}^{K} b_ij f_j + ε_i ,

it is seen that

R_i = r_f + b_i1 (λ₁ + f₁) + b_i2 (λ₂ + f₂) + ... + b_iK (λ_K + f_K) + ε_i . (14.7)
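A simulation sketch of the return generating process (14.7): with expected returns set at r_f + Bλ, the sample average returns should converge to the APT values. All parameter values below are illustrative:

```python
import numpy as np

# Simulate (14.7): R = r_f + B (lambda + f) + eps each period, where the
# factor shocks f and noises eps have zero mean. Average returns should
# then converge to the APT expected returns r_f + B lambda.
rng = np.random.default_rng(5)
N, K, T = 4, 2, 200_000
rf = 0.003
B = rng.uniform(-1, 1, size=(N, K))             # factor loadings b_ij
lam = np.array([0.005, 0.002])                  # factor risk premia lambda_j
f = rng.normal(0, 0.02, size=(T, K))            # zero-mean common factor shocks
eps = rng.normal(0, 0.01, size=(T, N))          # idiosyncratic noise

R = rf + (lam + f) @ B.T + eps                  # T x N panel of returns
print(np.round(R.mean(axis=0) - (rf + B @ lam), 4))  # near-zero vector
```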
There are some empirical problems in estimating and testing the APT. Firstly, the number of factors, K, is not known theoretically. Setting a different K will affect the estimation of the factor loadings B. Secondly, even if K is known, it is not known what the factors are. One can only make guesses about economic variables that may have an impact on E_i.

Equation (14.7) can be expressed in regression form as

R̃_{N×1} = r_f l_{N×1} + B_{N×K} (λ + f̃)_{K×1} + Ũ_{N×1}                      (14.8)

where E(R̃_{N×1}) = E_{N×1}, and

E(Ũ) = 0 ,   E(f̃ Ũᵀ)_{K×N} = 0 .                                            (14.9)
Each equation in the system in (14.8) is

R_i = r_f + b_i1 λ₁ + b_i2 λ₂ + ... + b_iK λ_K + ε_i .                       (14.10)

At each time t, we can write

R_it = r_ft + b_i1 λ_1t + b_i2 λ_2t + ... + b_iK λ_Kt + u_it                 (14.11)

where λ_jt is a risk (premium) factor that can vary over time, with E(λ_jt) = λ_j. Equation (14.11) would give a straightforward test of the APT, but unfortunately the risk factors λ_jt, and the number K of them, are usually unknown. The multi-factor approach using macroeconomic variables or a firm's own variables is an attempt to guess such a structure and to approximate it.
In the regression specification in (14.11), which we can adopt for multi-factor models, λ_1t can be defined as ones, i.e. allowing for a constant intercept. Further, u_it may in general covary with some or all of the λ_jt's. This is because the APT deals with a single period time frame and, strictly speaking, has little to say about the inter-temporal properties of stochastic processes. However, we shall add a bit more restriction to enable nice econometric results, i.e. we assume E(u_it λ_jt) = 0 for every t, i, and j.

Provided λ_jt is a stationary random variable, OLS regression can be performed on (14.11). To re-iterate, in practice we do not know a priori what the b_ij's (sensitivities) are for each i, nor what the λ_jt's are for each t. Nor do we know the regressor span K. However, if we link this framework with a multi-factor asset pricing model, then we can postulate the λ_jt's to be some specific observed market-wide economic variables for j = 1, 2, ..., K. (This also implies their expected values are the λ_j's, the risk premia themselves.)

Suppose we can perform a time series regression on (14.11), assuming we observe the risk factors λ_jt for each j and over time t:

R_it − r_ft = a_i + b_i1 λ_1t + ... + b_iK λ_Kt + u_it .                     (14.12)

We can then also check the effectiveness of out-of-sample forecasts against realized returns. At t, the out-of-sample forecast of R_it+1 − r_ft+1, conditional on given λ_1,t+1, ..., λ_K,t+1, is

â_i + b̂_i1 λ_1,t+1 + ... + b̂_iK λ_K,t+1 .
We shall come back to this framework when we discuss the Fama and French
three-factor model.
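The time series regression (14.12) and the one-step out-of-sample forecast can be sketched as follows, on simulated factor data (the loadings and premia are illustrative):

```python
import numpy as np

# Fit (14.12) by OLS on T observed periods of factor proxies, then form
# the out-of-sample forecast a_hat + sum_j b_hat_j * lambda_{j,t+1}
# using the next period's factor realizations. All series are simulated.
rng = np.random.default_rng(6)
T, K = 120, 3
lam_t = rng.normal(0.004, 0.02, size=(T + 1, K))     # observed factor series
y = 0.001 + lam_t[:T] @ np.array([0.9, 0.3, -0.4]) + rng.normal(0, 0.02, T)

Z = np.column_stack([np.ones(T), lam_t[:T]])
coef, *_ = np.linalg.lstsq(Z, y, rcond=None)         # [a_hat, b_hat_1..K]
forecast = coef[0] + lam_t[T] @ coef[1:]             # one-step excess-return forecast
print(forecast)
```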
14.4
CROSS-SECTIONAL REGRESSION
trading period, we can do many wonderful long-short strategies over huge portfolios, so as to minimize volatility risks and to make superior or abnormal profits. Hedge funds and many investment funds tracking indexes with enhanced alphas do this all the time, among other things.
Suppose we are back to using the APT in (14.5) and (14.6). Adding a constant intercept:

E(R_i) = r_f + a + b_i1 λ₁ + b_i2 λ₂ + ... + b_iK λ_K .

This is a perfect candidate for cross-sectional regression, as in the single-factor Fama-MacBeth study of CAPM testing seen in Chapter 13. Using an estimate such as the portfolio average return R̄_i as dependent variable, and the estimated betas or loading sensitivities b̂_ij, j = 1, 2, ..., K; i = 1, 2, ..., N, as explanatory variables, the regression is run across i as follows:

R̄_i − r_f = a + λ₁ b̂_i1 + λ₂ b̂_i2 + ... + λ_K b̂_iK + u_i .                 (14.13)

We can test this multi-factor asset pricing model by testing if the (14.13) multiple linear regression coefficient estimates of the factor means, λ̂₁, λ̂₂, ..., λ̂_K, are significantly different from zero. If so, there is statistical evidence for the model.
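One period's cross-sectional regression (14.13) can be sketched on simulated loadings and average excess returns, where the true λ vector is known and should be approximately recovered (all values are illustrative):

```python
import numpy as np

# Cross-sectional OLS of N assets' average excess returns on their
# estimated factor loadings; the fitted slopes estimate the factor
# risk premia lambda. Loadings and returns are simulated.
rng = np.random.default_rng(7)
N, K = 100, 3
b = rng.normal(1, 0.5, size=(N, K))                 # estimated loadings b_ij
lam_true = np.array([0.006, 0.003, -0.002])
y = 0.001 + b @ lam_true + rng.normal(0, 0.004, N)  # average excess returns

Z = np.column_stack([np.ones(N), b])
est, *_ = np.linalg.lstsq(Z, y, rcond=None)         # [a_hat, lambda_hat_1..K]
print(np.round(est[1:], 3))                         # close to lam_true
```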
Fama and French (1992)68 suggested (for US NYSE, AMEX, and NASDAQ exchange stocks recorded in the CRSP database) that for a cross-sectional regression with stock return as dependent variable, in addition to the estimated CAPM beta b̂_i, the ln(ME), ln(BE/ME), ln(BA/ME), and E/P variables of stock i in year m can be used for b_i1, b_i2, b_i3, ..., b_iK.

ME = market equity $ value = stock i's last price × number of stock i's shares outstanding in the market.
BE = book equity value
BA = total book asset value = BE + BL, where BL is book liability value
E/P = earnings to price ratio of stock i
ln(ME) represents the size of market equity. Small firms have lower ln(ME) values than larger firms. We expect smaller firms to be systematically more risky, and thus the market will require a higher return.

ln(BE/ME) represents book-to-market equity value. Underpriced stocks have a high book-to-market equity value, and vice-versa. Underpriced stocks are systematically more risky and will fetch higher required returns.

ln(BA/ME) represents relative leverage. Higher ln(BA/ME) implies a relatively higher component of debt. This increases beta, but beyond that it also increases default risk, which leads to higher expected return. High E/P indicates an underpriced stock (despite good earnings) and will explain higher expected returns.69
In the Fama and French studies on cross-sectional regressions, for each period such as a month, the cross-sectional multiple linear regression (14.13) yields coefficient estimates of the factor means λ̂₁, λ̂₂, ..., λ̂_K and their t-statistics. Over all the periods or months, the estimated first coefficient, e.g. λ̂₁, can be treated like a time series λ̂_1t, and its t-statistic can be obtained as

t = [ (1/T) Σ_{t=1}^{T} λ̂_1t ] / [ sd(λ̂_1t) / √T ] ,

where sd(λ̂_1t) is the sample standard deviation of the λ̂_1t's, to test if the estimates are significantly different from zero, assuming they are randomly distributed about zero under the null.
14.5
SINGAPORE FACTORS
The reported annual financial data of 60 major firms listed on the first board of the Singapore Exchange and featured in the Straits Times Index are used. For each firm the following data are collected from the source book Corporate Handbook Singapore, published by CEIC Holdings Ltd in connection with Thomson Financial Publishing. The data are contained in Multi Factor.xls:
Beta
Closing Share Price (S$)
Price/Earnings Close
Number of Ordinary Shares Outstanding
69
Jaffe, Keim, and Westerfield (1989) suggested a U-shape for average return versus
E/P ratio. See their paper, Earnings yields, market values, and stock returns, Journal
of Finance 44, 135-148.
Gross Dividend Per Share (cent)
Debt/Equity Ratio
Total Book Assets (In thousands $)
Total Book Equity (In thousands $)
Similar data for each of the years 1996, 1997, and 1998 were collected. We run a cross-sectional OLS regression similar to (14.13), replacing r_f with the regression constant a, and adding two more regressors (excess returns could also be used, with similar results):

R_i = a + λ₁ b_i1 + λ₂ b_i2 + λ₃ b_i3 + λ₄ b_i4 + λ₅ b_i5 + λ₆ b_i6 + u_i    (14.14)

where the classical conditions are assumed to be satisfied.

Stock i's annual return is formed from the previous year-end price P_t and the current year-end price P_t+1 as R_i = ln(P_t+1/P_t). For the beta of each stock i, instead of using the Fama-MacBeth procedure, or the procedure in Fama and French (1992) where portfolios of stocks with similar beta are first formed and used to estimate their beta from time series (then using this portfolio beta as the same beta applied to each stock in the portfolio), we use the reported beta in the source book: b_i1 = stock i's beta.

ME = S$ closing share price × number of ordinary shares outstanding
b_i2 = stock i's ln(ME)
b_i3 = stock i's ln(BE/ME) = ln(Total Book Equity / ME)
b_i4 = stock i's ln(BA/ME) = ln(Total Book Assets / ME)
b_i5 = stock i's P/E ratio
b_i6 = stock i's debt/equity ratio
The results of the OLS cross-sectional regression (14.14) for the above regressors for the year 1996 are reported in Table 14.1. Starting with a larger set of regressors, we shrink the set till the adjusted R² is about maximized. At the same time, we notice that regressors that are not significant can be removed, with adjusted R² increasing as a result. In the sample data, we find that the coefficients on ln(BA/ME), E/P, and D/E are not significantly different from zero. Coefficients for beta are negative and mostly insignificant, so beta is not retained. We arrive at

E(R_i) = −1.898 + 0.048 ln(ME) − 0.117 ln(BE/ME)

as the best fitting model, where the coefficient estimates are also significant and make sense.
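The shrink-the-regressor-set step can be sketched as an exhaustive search over subsets maximizing adjusted R²; the regressor names below mirror those in Table 14.1, but the data are simulated (only the first two regressors truly matter in the simulation):

```python
import numpy as np
from itertools import combinations

# For each subset of candidate regressors, fit OLS and compute adjusted
# R^2; keep the subset that maximizes it. Data are artificial.
rng = np.random.default_rng(8)
n, names = 60, ["lnME", "lnBEME", "lnBAME", "EP", "DE", "beta"]
X = rng.normal(size=(n, len(names)))
y = 0.05 * X[:, 0] - 0.12 * X[:, 1] + rng.normal(0, 0.05, n)

def adj_r2(cols):
    Z = np.column_stack([np.ones(n), X[:, list(cols)]])
    resid = y - Z @ np.linalg.lstsq(Z, y, rcond=None)[0]
    r2 = 1 - resid.var() / y.var()
    return 1 - (1 - r2) * (n - 1) / (n - len(cols) - 1)

subsets = [c for k in range(1, 7) for c in combinations(range(6), k)]
best = max(subsets, key=adj_r2)
print([names[i] for i in best])
```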
This result is similar to the US study in that beta is found to be unimportant in explaining the cross-sectional returns. The data show that beta is positively correlated with firm size. However, there are two major differences. Firstly, the size effect appears to be reversed: large capitalization firms appear to fetch higher expected returns, unlike in the U.S. Secondly, larger book-to-market equity produces lower returns across the section. These results are direct opposites of the US results. Why?
Table 14.1
The regressors are shown in the first row. The OLS estimates of the coefficients are reported below, with the corresponding p-values in brackets. Blanks indicate that the regression does not use that regressor. Each row (case) reports a different regression using a different set of regressors.

         constant   beta       ln(ME)    ln(BE/ME)  ln(BA/ME)  E/P       D/E       adj R²
case 1   -2.273     -0.00008   0.060     -0.0949    -0.0413    0.239     0.0230    0.0715
         (0.0037)   (0.748)    (0.056)   (0.1519)   (0.4773)   (0.7351)  (0.4605)
case 2   -2.061     -0.00011   0.055     -0.117                          0.0172    0.0965
         (0.0028)   (0.648)    (0.0672)  (0.0232)                        (0.5563)
case 3   -2.029     -0.00009   0.053     -0.101     -0.0217                        0.0938
         (0.0031)   (0.710)    (0.074)   (0.1131)   (0.6691)
case 4   -1.975     -0.0001    0.052     -0.117                                    0.1070
         (0.0031)   (0.641)    (0.0784)  (0.0225)
case 5   -1.978                0.051     -0.0980    -0.0254                        0.1077
         (0.0030)              (0.0775)  (0.1182)   (0.6088)
case 6   -1.986                0.052     -0.117                          0.0174    0.1093
         (0.0027)              (0.0736)  (0.0226)                        (0.5498)
case 7   -1.898                0.048     -0.117                                    0.1192
         (0.0030)              (0.0867)  (0.022)
case 8   -1.266                0.057                                               0.0363
         (0.0309)              (0.047)
case 9   -1.333     -0.0001    0.061                                               0.0502
         (0.0302)   (0.687)    (0.0452)
It is instructive to note that the variable log(BA/ME), which by itself is not significant, is highly correlated with several others, especially log(BE/ME). This introduces a multi-collinearity problem into the regression. In Table 14.1, the case 5 and case 6 regressions show that when both log(BE/ME) and log(BA/ME) are used together, the standard error (hence p-value) of the coefficient estimator of log(BE/ME) increases. Without log(BA/ME), the coefficient of log(BE/ME) is significantly different from zero at a p-value of 0.02. When log(BA/ME) is also present, multi-collinearity introduces more sampling noise, and the coefficient of log(BE/ME) is then significantly different from zero only at a p-value of 0.12.

This suggests dropping log(BA/ME) from the set of regressors since its coefficient does not appear to be significantly different from zero. Further, we run the maximal adjusted R² model (case 7 of Table 14.1)

R_i = c₀ + c₁ log(ME) + c₂ log(BE/ME) + u_i ,   i = 1, 2, 3, ..., 60

and then perform White's test of (unknown) heteroskedasticity. Heteroskedasticity is typically a problem in a cross-sectional regression since each stock in the sample has a heterogeneous variance, and there may be return correlations. The results are shown in Table 14.2 below.
Table 14.2
White's Heteroskedasticity Test
Thus, the null of no heteroskedasticity is rejected at the 5% significance level, since the p-value is 0.042445 for the F_{5,54} test. The 5 degrees of freedom come from the restrictions to zero in the regression of û_i² on logme, logme^2, logme*logbeme, logbeme, and logbeme^2. We also perform OLS with White's HCCME (Heteroskedasticity Consistent Covariance Matrix Estimator). The results are reported in Table 14.3. It is seen that in this case the results do not change much from the OLS.
Table 14.3
OLS Regression with White's HCCME
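A sketch of White's test in the two-regressor case above: regress the squared OLS residuals on the regressors, their squares, and the cross product, then form the F statistic with (5, n−6) degrees of freedom. The data below are simulated stand-ins for log(ME) and log(BE/ME), not the Singapore sample:

```python
import numpy as np
from scipy import stats

# White's test: fit the main OLS model, then regress squared residuals on
# x1, x1^2, x1*x2, x2, x2^2 and compute the F-version of the test.
rng = np.random.default_rng(9)
n = 60
x1, x2 = rng.normal(size=n), rng.normal(size=n)
y = 0.05 * x1 - 0.12 * x2 + rng.normal(0, 0.05, n) * (1 + 0.5 * np.abs(x1))

Z = np.column_stack([np.ones(n), x1, x2])
u2 = (y - Z @ np.linalg.lstsq(Z, y, rcond=None)[0]) ** 2

W = np.column_stack([np.ones(n), x1, x1**2, x1 * x2, x2, x2**2])
g, *_ = np.linalg.lstsq(W, u2, rcond=None)
r2 = 1 - ((u2 - W @ g) ** 2).sum() / ((u2 - u2.mean()) ** 2).sum()
F = (r2 / 5) / ((1 - r2) / (n - 6))           # F_{5, 54} under the null
print(F, 1 - stats.f.cdf(F, 5, n - 6))
```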
14.6
R_it − r_ft = a_i + b_i1 λ_1t + ... + b_iK λ_Kt + u_it .                     (14.15)
In contrast to the cross-sectional regression, here the b_ij's are estimated for each asset i.
One of several very influential papers by Fama and French (1993)70 suggests that the market factor and two other firm attributes, viz. capitalization size and the book-to-market equity ratio, are three variables that each correspond to a common risk factor affecting the cross-section of stocks. Stocks with a high book-to-market equity ratio are called value stocks, and those with a low book-to-market equity ratio are called growth stocks. Of course, there are stocks with persistently high book-to-market equity ratios that are about to go belly up or default. Together with the earlier 1992 study, Fama and French broke completely new and fascinating ground in the world of investment finance by pointing out presumably better explanations for the cross-sectional expected returns of stocks than the single factor CAPM does. The new proxies of systematic factors they suggested led to the voluminous research that followed. From the Fama and French (1992) U.S. study, the basic message is that small size stocks and high book-to-market equity ratio or value stocks tend to provide higher expected returns.
14.7
The Fama and French (1993) approach is essentially a time series regression employing (14.15), where the variables associated with the common random risk factors {λ_1t, λ_2t, λ_3t, ..., λ_Kt}, t = 1, 2, ..., T, are the explanatory variables whose expectations are the risk premia λ₁, λ₂, ..., λ_K, and the coefficients to be estimated are the sensitivities of stock or portfolio i. The dependent variable is the time series of stock or portfolio i's excess return.

The choice of {λ_1t, λ_2t, λ_3t, ..., λ_Kt}, t = 1, 2, ..., T, of course comes from the earlier Fama and French (1992) research, which had shown that their means {λ₁, λ₂, ..., λ_K} do affect the expected returns of stocks cross-sectionally. Under the CAPM, we already identified λ_1t as the excess market return, so that E(λ_1t) = λ₁ is the market premium. Fama and French (1992) show that small-sized firms produce cross-sectionally (consistently the same each period) higher returns than large-sized firms. Thus a brilliant guess of the λ_2t that corresponds with this size factor λ₂ is λ_2t = the difference at t between the average return of small-sized stocks and that of large-sized stocks. Small versus large portfolios formed for this purpose produce a return difference called the SMB (small minus big) factor variable that can be determined each period.

Likewise, if in (14.13) it is shown that high book-to-market equity ratio firms produce cross-sectionally (consistently the same each period) higher returns than low book-to-market equity ratio firms, then a brilliant guess of
70
Fama, Eugene, and K R French, (1993), Common risk factors in the returns on
stocks and bonds, Journal of Financial Economics 33, 3-56.
the λ_3t that corresponds with this BE/ME factor λ₃ is λ_3t = the difference at t between the average return of high book-to-market equity ratio stocks and that of low book-to-market equity ratio stocks. High BE/ME versus low BE/ME stock portfolios formed for this purpose produce a return difference called the HML (high minus low) factor variable that can be determined each period.

Fama and French (1993) found that these two variables are important risk factors (in addition to the market factor r_mt − r_ft that we already know through the CAPM) in explaining every stock i's return variations over time. It is in explaining every stock's variations that these risk factors can be considered systematic across the market.

Clearly, it may be interpreted that the mean over time of the random systematic risk λ_2t (the difference in average return of small-sized stocks and large-sized stocks at t) is λ₂ ≠ 0 (> 0). That is why this becomes a non-zero coefficient in the cross-sectional regression in (14.13). The same can be said for the time-averaged mean of λ_3t (the difference in average return of high book-to-market equity ratio stocks and low book-to-market equity ratio stocks at t). This mean is also a non-zero coefficient λ₃ in the cross-sectional regression in (14.13). Unlike factors linked to macroeconomic variables, these Fama-French factors can be constructed like funds that can be traded and used for hedging. Thus, there is greater plausibility in their use by the market, and in their role as systematic factors.
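A rough sketch of constructing SMB- and HML-style factor returns for one period on simulated data; note that the actual Fama-French factors use 2×3 double-sorted value-weighted portfolios, so the simple median/tercile split below is only an approximation:

```python
import numpy as np

# Sort stocks on size and on BE/ME, then take the differences of the
# average returns of the extreme groups to form SMB and HML for one
# period. Size, BE/ME, and returns are all simulated.
rng = np.random.default_rng(10)
N = 300
size = rng.lognormal(10, 1, N)                 # market equity
beme = rng.lognormal(0, 0.5, N)                # book-to-market equity ratio
ret = rng.normal(0.01, 0.05, N)                # one period of stock returns

small = size <= np.median(size)
smb = ret[small].mean() - ret[~small].mean()   # small minus big

hi, lo = np.quantile(beme, [2 / 3, 1 / 3])
hml = ret[beme >= hi].mean() - ret[beme <= lo].mean()  # high minus low
print(smb, hml)
```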
14.8
PROBLEM SET
14.1  A particular stock's monthly return rate at t is r_1t = 0.01 + 1.2 r_mt + e_1t, where r_mt is the market portfolio return at t, and e_1t is a residual noise statistically independent of r_mt. Another stock's monthly return rate at t is r_2t = 0.005 − 0.1 r_mt + e_2t, where e_2t is a residual noise also statistically independent of r_mt. Further, suppose cov(e_1t, e_2t) = 0.036. (Assume all returns are covariance-stationary.)
(i) If cov(r_1t, r_2t) > 0, find an upper bound for the variance of r_mt.

(ii) Suppose r_1t and r_2t can also be represented by

    r_1t = 0.01 + 1.2 r_mt + 0.2 I_t + ε_1t
    r_2t = 0.005 − 0.1 r_mt + 0.3 I_t + ε_2t

where I_t is an industry factor at t that is statistically independent of r_mt, and ε_1t and ε_2t are i.i.d. noises. What then is the variance of I_t?

14.2
series regressions on all stocks i, obtaining σ̂_i² = (1/T) Σ_{t=1}^{T} û_it², and suppose

R̄_i − r_f = a + λ₁ b̂_i1 + λ₂ b̂_i2 + ... + λ_K b̂_iK + λ_{K+1} σ̂_i² + u_i .

(i) If all stocks are held by investors in well-diversified portfolios, do you expect λ̂_{K+1} to be significantly different from zero?

(ii) If investors generally do not hold well-diversified portfolios, do you expect λ̂_{K+1} to be significantly positive? What do you call λ_{K+1}?
Chapter 15
ERRORS-IN-VARIABLE
APPLICATION: EXCHANGE RATES
AND RISK PREMIUM
Key Points of Learning
Spot exchange rate, Forward exchange rate, Unbiased expectation hypothesis,
Test of Restriction, Overlapping data problem, Serial correlation, Durbin-Watson d-statistic, Errors-in-variable, Missing variable problem, Forward risk
premium, Forward premium, Forward discount, Interest rate parity
This chapter covers the interesting topic of exchange rates. Currency trading
and speculation is one of the oldest and largest games in town. Multi-national
corporations also enter the currency market to hedge their currency exposures.
Corporate treasurers and finance controllers are familiar with transactions in
the forward market to hedge transactions denominated in other currencies. It is
important to understand the relationship between the forward and the spot
prices. The cost of carry model learned earlier in the context of the Nikkei stock index can
be applied in most forward contract situations such as this. We discuss and test
hypotheses about unbiased expectations and about risk premium. The
interesting case of the overlapping data problem is shown and its pitfalls are highlighted.
The test of serial correlation using the Durbin-Watson statistic is discussed. We
introduce the topic of tests of restrictions on the coefficients.
15.1 FOREIGN EXCHANGE
Exchange rates have been highly volatile since 1973, when the US and most of
the developed countries departed from the fixed exchange rate regime. Exchange rates are
heavily studied because of the critical role they play in lubricating the
machinery of world trade. They are also an important market for hedgers as well as
speculators. The currency forward market has become one of the principal
world markets.
The spot exchange rate is the rate valued at spot for current transaction,
although market practice requires actual exchange of currencies only one to
two trading days later. A direct quote to a U.S. citizen is in terms of the number of
US$ per unit of the foreign currency, e.g. $0.7 per S$, or $1.35 per euro. An
indirect quote to a U.S. citizen is in terms of the number of foreign currency units
per US$, e.g. S$1.4286 per US$ or 0.74 euro per US$. A direct quote of
US$1.02 per C$ is an indirect quote to Canadian citizens. To avoid these
confusions, we shall use the notation US$/FC to denote the number of US$
per unit of foreign currency, or FC/US$ to denote the number of units of
foreign currency per US$. Unless otherwise stated, $ refers to US$. The "/"
sign above denotes a quotient. This is sometimes confusing, as the way
quotations are read in the foreign exchange market is to put the base currency
first followed by the variable currency, written in short form as
base currency/variable currency. We shall keep to the quotient notation.
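The reciprocal relation between direct and indirect quotes can be checked with a line of arithmetic; a minimal sketch using the illustrative quotes from the text:

```python
# Direct and indirect quotes are reciprocals of each other: a direct quote
# to a U.S. citizen (US$ per unit of foreign currency) inverts into the
# indirect quote (foreign currency units per US$).
direct_usd_per_sgd = 0.70                  # US$/S$ direct quote from the text
indirect_sgd_per_usd = 1 / direct_usd_per_sgd
print(round(indirect_sgd_per_usd, 4))      # prints 1.4286, the S$/US$ quote above
```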
A forward x-month rate is a rate valued currently for a transaction x months
from the present. Thus a forward rate is a contractual rate locking in a rate for the
future. Common forward contracts have x equal to 1 month, 3 months, 6 months, and so on. The near-term (shorter maturity x) contracts tend to be
more liquid and heavily traded. For example, suppose the forward 3-month S$/$ rate is
1.45. This means that in 3 months' time, the buyer of $ forward will pay S$1.45
for each US$. The seller will collect S$1.45 for each US$ delivered.
15.2 UNBIASED EXPECTATIONS HYPOTHESIS

The unbiased expectations hypothesis (UEH) states that today's forward x-month rate is an unbiased predictor, or forecast, or expectation of the future x-month spot rate:
Ft,t+x = Et (St+x)
(15.1)
For the UEH in (15.1), Ft,t+x in $/Y is the forward rate of currency Y contracted
at time t for transaction at t+x > t. Thus the forward rate is an x-month forward
rate. St+x in $/Y is the spot rate of currency Y at time t+x when the forward
contract matures. From (15.1), Et(St+x − Ft,t+x) = 0, so UEH implies that forward
exchange speculation, i.e. buying a cheaper currency Y in $ forward and selling at
a dearer expected spot price of Y in $ later, or vice-versa, will not produce any
profit. Thus, intuitively, it suggests that the currency market is speculatively
efficient.
Bilson (1981)71 suggests that the hypothesis that forward prices are the
best unbiased forecast of future spot prices is not necessarily equivalent to the
efficient markets hypothesis (EMH). However, strictly speaking, the rational
expectations hypothesis is required in (15.1) when conditioning on market
information is done. In fact, if the time interval of the forecast is long, and market
investors are risk averse rather than risk-neutral, then a positive cost-of-carry
model would predict St = Ft,t+x / (1+rt,t+x) = Et(St+x) / (1+k), where rt,t+x is the
riskfree interest cost of carry over [t,t+x], and k > rt,t+x is the risk-adjusted
cost. Hence, Ft,t+x = Et(St+x) [(1+rt,t+x) / (1+k)] = Et(St+x) + δ, where δ = Et(St+x)
(rt,t+x − k) / (1+k) < 0, is a risk premium. Hence this is an equilibrium
model in which market expectations are rational but in which there is forward
bias or a negative risk premium. If the forecast interval is short, and the interest costs
rt,t+x and k are small, then the UEH may hold true in an approximate sense.
Of course, if investors are risk-neutral, i.e. k = rt,t+x, then UEH holds exactly.
Despite the existence of many models of currency risk bias, UEH has been
widely tested and investigated. Especially over short horizons and over
periods when market uncertainties are not rampant, the UEH may be fairly
accurate.
15.3 TEST OF RESTRICTIONS
The UEH in (15.1) may be tested for its restriction on observed time series of
spot rates St+x , and forward rates Ft,t+x . Suppose we employ monthly (end of
month) data, and we match spot rate at t+x, St+x , with forward rate applicable
at t+x but contracted earlier at t, i.e. Ft,t+x . At any one point in time, there
could be various forward rate matches for a single spot St+x, since x could be
1-month, 3-months, 6-months, etc. Suppose
St+x = E(St+x | Φt) + et+x ,
(15.2)
where Φt is the market information set used at t to predict St+x at t+x. The
prediction or forecast error et+x can be characterized as follows. Taking
conditional expectation on (15.2) implies
E(St+x | Φt) = E[ E(St+x | Φt) | Φt ] + E(et+x | Φt) .
Using the law of iterated expectations, the right-hand side is E(St+x | Φt) +
E(et+x | Φt) , so
E(et+x | Φt) = 0.
(15.3)
But et+x = St+x − Ft,t+x , using (15.1) and (15.2). Therefore UEH implies E(St+x
− Ft,t+x | Φt) = 0, which was discussed earlier. (15.3) also implies E(et+x) = 0.
Since Ft,t+x is known at t, Ft,t+x ∈ Φt . So (15.3), E(et+x | Φt) = 0, implies E(Ft,t+x
et+x | Φt) = 0, and hence cov(Ft,t+x , et+x | Φt) = 0. UEH thus implies that (15.2) can be
written as
St+x = a + b Ft,t+x + et+x
(15.4)
where a = 0, b = 1, the mean of et+x is zero, and et+x has zero correlation with Ft,t+x .
Thus, a testable hypothesis of the UEH is to test the restrictions H0: a = 0 and b
= 1 jointly in the linear regression model (15.4).
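As a sketch of how the joint restriction H0: a = 0, b = 1 might be tested, the F (Wald-type) statistic can be built from restricted and unrestricted sums of squared residuals; the series below are simulated to satisfy the UEH and are purely illustrative, not the book's data:

```python
import numpy as np

def wald_f_test_ueh(S, F):
    """F-test of H0: a = 0, b = 1 in S_{t+x} = a + b*F_{t,t+x} + e_{t+x}.

    F = ((SSR_r - SSR_u)/q) / (SSR_u/(N - 2)), with q = 2 restrictions;
    under H0 the restricted residuals are simply S - F.
    """
    N = len(S)
    X = np.column_stack([np.ones(N), F])
    beta = np.linalg.lstsq(X, S, rcond=None)[0]       # (a_hat, b_hat)
    ssr_u = np.sum((S - X @ beta) ** 2)               # unrestricted SSR
    ssr_r = np.sum((S - F) ** 2)                      # restricted SSR (a=0, b=1)
    q = 2
    f_stat = ((ssr_r - ssr_u) / q) / (ssr_u / (N - q))
    return beta, f_stat

# Illustrative data generated so that the UEH holds
rng = np.random.default_rng(0)
F = 1.4 + 0.05 * rng.standard_normal(200)
S = F + 0.01 * rng.standard_normal(200)
(a_hat, b_hat), f_stat = wald_f_test_ueh(S, F)
```

With data generated under H0, the estimates stay near (0, 1) and the F statistic is small relative to its critical values.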
15.4 OVERLAPPING DATA PROBLEM
In testing (15.4), suppose we use x = 3 months, i.e. forward 3-month rates are
used. The system of linear equations in the linear regression model involving
the time series variables St+x and Ft,t+x is as follows, starting with forward rate
F0,3 as the earliest observed data point.
ST = c0 + c1FT-3,T + eT
…
St+3 = c0 + c1Ft,t+3 + et+3
…
S4 = c0 + c1F1,4 + e4
S3 = c0 + c1F0,3 + e3
This system of equations leads to the following characterization of the
disturbance errors:
eT = ST − c0 − c1FT-3,T
…
e5 = S5 − c0 − c1F2,5
e4 = S4 − c0 − c1F1,4
e3 = S3 − c0 − c1F0,3
This can be depicted as follows.

Figure 15.1
Overlapping of Forecast Errors
[Time line in months: the forecast error spans the period [2,5]. Since this
forecast error overlaps with the earlier forecast error over [1,4], there may be
correlation in the forecast errors.]
Some events happening between t=3 and t=4, for example, will affect S5 and
also S4. Hence they affect e5 and e4. Thus e5 and e4 may be correlated. This
results in overlapping forecast errors. To characterize this further, we employ
an auxiliary model on the spot rates. Suppose the spot market dynamics is St+1
= μ + St + vt+1 , where vt+1 is i.i.d. and has zero mean. μ is too small so that after
transaction costs, there is no arbitrage profit to be made. If μ = 0, then {St} is a
strong-version random walk with zero drift. Then F2,5 = E2(S5) = E2( [S5−S4]+[S4−S3]+[S3−S2]+S2 ) = E2( 3μ+v5+v4+v3+S2 ) = 3μ+S2.
In general, Ft,t+x = xμ + St . Then the forecast error
et+x = St+x − Ft,t+x = St+x − St − a
where a = xμ, a constant.
The covariance of overlapping forecast errors is
cov(e6 , e5)
= cov(S6−S3 , S5−S2)
= cov([S6−S5]+[S5−S4]+[S4−S3] , [S5−S4]+[S4−S3]+[S3−S2])
= cov(v6 + v5 + v4 , v5 + v4 + v3)
= var(v5) + var(v4) ≠ 0.
Indeed they are positively correlated. Therefore e5 is correlated with e4 , e4
with e3 , and so on. This is called the overlapping data problem. This is a
problem because the serial correlation in the et+x's implies that OLS using (15.4)
will not satisfy all the classical conditions with regard to the disturbances.
In the above, it is empirically plausible that the disturbance et is AR(2) or AR(1),
so estimated GLS will need to be done instead of OLS if the serial correlation
is not close to zero. OLS on (15.4) without correcting for serial correlation
will result in estimators that are not BLUE, although they are still unbiased.
Moreover, in that case, statistical inference using the OLS estimated
covariances cannot be applied.
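The overlapping data problem can be checked by simulation; a minimal sketch assuming a driftless random walk for the spot rate, so that Ft,t+3 = St and the 3-period forecast error is a sum of three consecutive shocks:

```python
import numpy as np

# Driftless random walk S_t = S_{t-1} + v_t, so F_{t,t+3} = E_t(S_{t+3}) = S_t
# and the 3-month forecast error is e_{t+3} = S_{t+3} - S_t = v_{t+3}+v_{t+2}+v_{t+1}.
rng = np.random.default_rng(1)
T = 20000
v = rng.standard_normal(T)
S = np.cumsum(v)
e = S[3:] - S[:-3]                      # overlapping monthly forecast errors

def acorr(x, lag):
    x = x - x.mean()
    return np.dot(x[lag:], x[:-lag]) / np.dot(x, x)

# Theory: autocorrelation 2/3 at lag 1, 1/3 at lag 2, 0 at lag 3 and beyond.
rho1, rho2, rho3 = acorr(e, 1), acorr(e, 2), acorr(e, 3)
```

Sampling the errors only every third month (non-overlapping data) removes the first two autocorrelations, which is the remedy discussed below.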
We provide a regression of the monthly spot C$/$ rate from March 1989 to May
2002 on the monthly 3-month forward C$/$ rate from December 1988 to February
2002. We may view this as testing the UEH from the point of view of the US$
in terms of C$. In other words, we would like to validate whether the forward C$ value of a
US$ is a best forecast of the future spot US$ value in terms of C$.
We show the results of such OLS regressions as follows.
Table 15.1
Regression: St+x = a + b Ft,t+x + et+x
Dependent variable is spot rate. Sample size 159
Serial correlation in the disturbance term can be checked using the Durbin-Watson (DW) d-statistic:

d = Σ_{t=2}^{N} (û_t − û_{t−1})² / Σ_{t=1}^{N} û_t²
  = [ Σ_{t=2}^{N} û_t² + Σ_{t=2}^{N} û_{t−1}² − 2 Σ_{t=2}^{N} û_t û_{t−1} ] / Σ_{t=1}^{N} û_t²
  ≈ 2 ( 1 − ρ̂ ) ,
(15.5)

where
ρ̂ = Σ_{t=2}^{N} û_t û_{t−1} / Σ_{t=2}^{N} û_{t−1}²
is the estimated first-order serial correlation coefficient of the residuals. Thus ρ
may be inferred from the DW statistic. If d > 2, then ρ̂ < 0, and ut is likely to
be negatively correlated. If d < 2, then ρ̂ > 0, and ut is likely to be positively
correlated. If d is close to 2, then ρ̂ ≈ 0, and ut is likely to be serially uncorrelated.
In the above regression of (15.4), the DW d-statistic is 0.819. Thus, there
appears to be strong positive serial correlation in ut+x , as suggested by
the overlapping data problem.
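A minimal sketch of computing the d-statistic from a residual series; the AR(1) residuals here are simulated for illustration, with ρ = 0.6 chosen to roughly reproduce a d near the 0.819 reported above:

```python
import numpy as np

def durbin_watson(u):
    """d = sum_t (u_t - u_{t-1})^2 / sum_t u_t^2, roughly 2*(1 - rho_hat)."""
    return np.sum(np.diff(u) ** 2) / np.sum(u ** 2)

# AR(1) residuals with rho = 0.6 give d near 2*(1 - 0.6) = 0.8
rng = np.random.default_rng(2)
u = np.zeros(5000)
for t in range(1, 5000):
    u[t] = 0.6 * u[t - 1] + rng.standard_normal()
d = durbin_watson(u)
```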
This statistic follows a Durbin-Watson or DW distribution when the null
is zero correlation.72 The DW distribution is reported in table form giving, for a
sample size N, a given number K of regressors (including the constant), and
the significance level, two numbers dL and dH , where dL < dH . The null
hypothesis is H0: ρ = 0.
If DW d < 2, then if d < dL , reject H0 in favor of the alternative hypothesis
HA: ρ > 0. But if DW d > dH , then we cannot reject (thus accept) H0.
If DW d > 2, then if 4−d < dL , reject H0 in favor of the alternative hypothesis HA:
ρ < 0. But if 4−d > dH , then we cannot reject H0.
72
When the null of the disturbance is an AR(1) process, then the Durbin-Watson h-statistic is used.
S3* = √(1 − ρ²) S3 ,  F0,3* = √(1 − ρ²) F0,3 ,
F1,4* = F1,4 − ρ F0,3 , … , FN−3,N* = FN−3,N − ρ FN−4,N−1 .
Suggestions of the initial point corrections were apparently made by Prais and Winsten in
Prais, S.J. and C.B. Winsten, "Trend Estimators and Serial Correlation," Unpublished
Cowles Commission Discussion Paper, Chicago, 1954.
Here, without necessarily using the auxiliary model, we can show that the
non-overlapping forecast errors are zero-correlated. Note that we employ the law of iterated
expectations:
E(e6 e3) = E[ E3(e6 e3) ] = E[ e3 E3(e6) ] = E[ e3 · 0 ] = 0.
Since E(et+x) = 0, therefore cov(et+x , et) = E(et+x et) − E(et+x) E(et) = 0. Thus the
OLS regression on the non-overlapping dataset now satisfies the classical
conditions.
In Table 15.2, note that the DW d-statistic is now 2.17 and does not
appear to indicate the presence of positive or negative serial correlation. For
N=53, K=2, at the 5% significance level, dL=1.518 and dH=1.595. Now 4−d = 1.83 >
dH , hence we cannot reject H0: ρ = 0.
Under the UEH in (15.4), we test H0: a = 0 and b = 1. The following Table
15.3, employing the Wald Test, shows that the null hypothesis, hence UEH, is
not rejected.
Table 15.2
Regression of Spot rate on forward rate
Sample size 53 after adjusting endpoints

Table 15.3
Wald Test using non-overlapping regression
The probability figures on the top right of Table 15.3 show that we cannot
reject H0: c0 = 0, c1 = 1 at up to the 38.71% significance level.
15.5 ERRORS-IN-VARIABLE PROBLEM
According to (15.1), Ft,t+x = c0 + c1 Et (St+x), where c0 = 0 and c1 = 1. The right-hand explanatory variable in this case is a conditional forecast which is not
observed by the econometrician.74
We shall suppose that this conditional forecast E(St+x | Φt) is estimated by the
current spot St but with a measurement error ξt at t, i.e.
St = E(St+x | Φt) + ξt .
(15.6)
We shall assume that ξt is uncorrelated with E(St+x | Φt). This means that the
error ξt is not related to Φt. This is sometimes also called the errors-in-variables problem. Though the measurement error ξt is not correlated with E(St+x |
Φt), it is, however, necessarily correlated with St , since cov(St , ξt) = var(ξt) >
0. Suppose we instead run OLS regression on
74
Here it is not appropriate to use the auxiliary model on spot rates, St+x = St + vt+x
where vt+x is i.i.d., because taking the conditional expectation with respect to Φt yields
E(St+x | Φt) = St + E(vt+x | Φt) = St , which suggests the market forecast is observed. This
is not our purpose here.
Ft,t+x = c0 + c1 St + ut .
(15.7)
The OLS estimator of c1 in (15.7) is

ĉ1 = Σ_{t=0}^{N−3} (St − S̄) Ft,t+3 / Σ_{t=0}^{N−3} (St − S̄) St .

Substituting Ft,t+3 = c0 + c1 St + ut from (15.7),

ĉ1 = c1 + Σ_{t=0}^{N−3} (St − S̄) ut / Σ_{t=0}^{N−3} (St − S̄) St .

Therefore,

E(ĉ1) = c1 + E[ Σ_{t=0}^{N−3} (St − S̄) ut / Σ_{t=0}^{N−3} (St − S̄) St ] ≠ c1 ,
(15.8)

since St and ut are contemporaneously correlated. Hence ĉ1 is not consistent.
Note that from (15.6) and Ft,t+x = c1 E(St+x | Φt), the disturbance in (15.7) is
ut = −c1 ξt , so cov(St , ut) = −c1 var(ξt).
From (15.8),

ĉ1 − c1 = [ (1/N) Σ_{t=0}^{N−3} (St − S̄) ut ] / [ (1/N) Σ_{t=0}^{N−3} (St − S̄)(St − S̄) ] .

So,

plim ĉ1 = c1 + plim[ (1/N) Σ_{t=0}^{N−3} (St − S̄) ut ] / plim[ (1/N) Σ_{t=0}^{N−3} (St − S̄)² ]
        = c1 + cov(St , ut) / var(St)
        = c1 − c1 var(ξt) / var(St)
        = c1 [ 1 − var(ξt) / var(St) ] < c1 .

Thus when the sample size is large, the estimator ĉ1 will be biased downward.
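The downward (attenuation) bias can be checked by simulation; a minimal sketch, where the sizes of var(ξt) and var(St) are illustrative assumptions:

```python
import numpy as np

# Errors-in-variables: the observed regressor S_t = m_t + xi_t measures the
# unobserved conditional forecast m_t = E(S_{t+x}|Phi_t) with noise xi_t.
# Since F_{t,t+x} = m_t here (c0 = 0, c1 = 1), OLS of F on S attenuates the
# slope towards c1 * (1 - var(xi)/var(S)).
rng = np.random.default_rng(3)
N = 100000
m = rng.standard_normal(N)               # unobserved conditional forecast, var 1
xi = 0.5 * rng.standard_normal(N)        # measurement error, var 0.25
S = m + xi                               # observed regressor, var 1.25
F = m                                    # dependent variable under UEH

X = np.column_stack([np.ones(N), S])
c0_hat, c1_hat = np.linalg.lstsq(X, F, rcond=None)[0]
plim_c1 = 1 - 0.25 / 1.25                # = 0.8, the attenuated slope
```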
The OLS regression result of (15.7) with the measurement error problem
is shown in Table 15.4.

Table 15.4
Dependent variable is F3M
Sample size 159
1.01 and 1.02.
15.6 MISSING VARIABLE PROBLEM
Other kinds of serial correlation (error autocorrelation) problems can occur
with a missing explanatory variable in the regression specification. Suppose in
the OLS regression model
Yt = c0 + c1Xt + et ,
there is a missing variable Zt . Moreover, suppose the process for Zt is
Zt = φZt−1 + vt , φ ≠ 0.
Including this to produce the correct model leads to
Yt = c0 + c1Xt + c2Zt + ut ,
where {ut} and {vt} are independent i.i.d. processes.
Therefore,
et = c2Zt + ut .
(15.9)
Thus, the misspecified model disturbance is
et = c2 (φZt−1 + vt) + ut
= φc2Zt−1 + c2vt + ut .
Using (15.9) for c2Zt−1 = et−1 − ut−1 , then
et = φ(et−1 − ut−1) + c2vt + ut .
Hence et is serially correlated in most cases. However, if there are no missing
explanatory variables, then we do not detect such serial correlations. We show such
an example in Table 15.5.
Table 15.5
Dependent variable is differenced spot
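A simulation sketch of the missing variable effect (the values of φ, c2, and the noise variances below are illustrative): the short regression's disturbance et = c2Zt + ut inherits Zt's serial correlation.

```python
import numpy as np

# Missing AR(1) regressor Z_t = phi*Z_{t-1} + v_t: the short regression's
# disturbance e_t = c2*Z_t + u_t is serially correlated even though u_t is i.i.d.
rng = np.random.default_rng(4)
T, phi, c2 = 10000, 0.8, 1.0
v = rng.standard_normal(T)
u = rng.standard_normal(T)
Z = np.zeros(T)
for t in range(1, T):
    Z[t] = phi * Z[t - 1] + v[t]
e = c2 * Z + u                           # misspecified-model disturbance

e_c = e - e.mean()
rho1 = np.dot(e_c[1:], e_c[:-1]) / np.dot(e_c, e_c)
# Theory: rho1 = phi*var(Z)/(var(Z) + var(u)) with var(Z) = 1/(1 - phi^2),
# i.e. about 0.59 here.
```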
Later we will see that in most cases spot rates (or their loge) and forward rates
are unit root processes I(1), in which case they contain stochastic trends
and so (St+x − Ft,t+x) may itself be I(1). Then the above regression
may be spurious, i.e. the estimates may not mean anything and may not
converge to any population parameters or moments.
For the regression to make sense, which we implicitly assume here, St+x −
bFt,t+x (b a constant, can be 1) should be I(0), i.e. stationary. Then St+x and Ft,t+x
are said to be cointegrated with cointegrating vector (1, −b). Then the OLS
regression in (15.4)
St+x = a + bFt,t+x + et+x
will produce superconsistent (hence consistent) estimators. To avoid
spuriousness, we can also produce the results in Table 15.5 using differenced
series in the regression, i.e.
(St+x − St)/St = a + b (Ft,t+x − St)/St + ut+x .
15.7
15.8 PROBLEM SET
15.1.
R2=0.850
If Friedman's real expected income per capita data, Yet, were used
instead, the regression was estimated as:
Dt = −0.7247 − 0.048802 Pt + 0.025487 Yet
R2=0.895
(i) Suppose that disposable income Ydt = Yet + error, i.e. disposable
income is expected income plus disturbance. How would this
explain the relative magnitudes of the estimated coefficients of Ydt
and Yet?
(ii) In forecasting the next year's demand for automobile stock per
capita, suppose price will be fixed at the current year's level of
151.1, and real expected income per capita is taken to be 828.8,
what will be the forecast?
(iii) Will the forecast error be affected by the value taken as the real
expected income per capita next year, given that this is a very
accurate assessment?
(iv) Suppose in the above regression, Friedman's real expected
income per capita variable was not observable, but instead it was
assumed that it evolved according to adaptive expectations, i.e.
Yet = Yet−1 + λ(Ydt − Yet−1). Would the estimated coefficients using
OLS still be BLUE? Explain.
15.2
15.3
Dependent Variable: COUNT
Method: Least Squares
Sample: 1 751
Included observations: 751

Variable    Coefficient    Std. Error    t-Statistic    Prob.
C           -678.2789      59.66590      -11.36795      0.0000
PRICE        84.29917       2.701127      31.20888      0.0000

R-squared            ????         Mean dependent var      1131.280
Adjusted R-squared   ????         S.D. dependent var      584.6037
S.E. of regression   385.7005     Akaike info criterion   14.75066
Sum squared resid    111424892    Schwarz criterion       14.76297
Log likelihood       -5536.873    F-statistic             ????
Durbin-Watson stat   0.041786     Prob(F-statistic)       0.000000
Chapter 16
UNIT ROOT PROCESSES
APPLICATION: PURCHASING POWER PARITY
Key Points of Learning
Non-stationary process, Deterministic trend, Trend stationary process,
Stochastic trend, Difference stationary process, Augmented Dickey-Fuller
statistic, Spurious regression, Cointegration, Super-consistency, Real
exchange rate, Long-run PPP equilibrium, Exchange rate forecasting, Error
correction model
16.1 NON-STATIONARY PROCESS

Consider the unit root process with drift μ:
Yt = μ + Yt−1 + εt
(16.1)
where εt is i.i.d. with E(εt) = 0 and var(εt) = σ², a constant, and cov(εt , εt−k) = 0 for k ≠ 0.
Iterating (16.1) through time,
Yt = μ + (μ + Yt−2 + εt−1) + εt
= 2μ + (μ + Yt−3 + εt−2) + εt + εt−1
= ⋯ = tμ + Y0 + εt + εt−1 + εt−2 + ⋯ + ε1 .
Thus we see that a unit root process in Yt leads to Yt having a time trend tμ as
well as a stochastic trend Σ_{j=0}^{t−1} εt−j . The
starting value Y0 is still a random variable, although its variance may be very
small. Clearly, if E(Y0) = μ0 and var(Y0) = σ0² , then
E(Yt) = μ0 + tμ ≠ 0 , provided μ ≠ 0.
Hence the mean of Yt increases (decreases) with time according to drift μ >
(<) 0. Also,
var(Yt) = σ0² + var[ Σ_{j=0}^{t−1} εt−j ] = σ0² + tσ² .
Thus the variance of Yt changes over time due to the presence of a stochastic trend
in the unit root process. Therefore, {Yt} is not covariance-stationary, or we
shall simply call it non-stationary.
Suppose random variable Yt is trend stationary, i.e. stationary about a
deterministic time trend. By definition, a trend stationary process, unlike a
unit root process, does not have a stochastic trend, and thus does not display
changing variance over time, although its mean α + βt does change over time. The
unit root process, however, possesses both a time trend as in the trend
stationary process, and also an additional stochastic trend. The following is a
trend stationary process fluctuating randomly about the deterministic trend α +
βt:
Yt = α + βt + εt
(16.2)
where t is time, α and β are constants, and εt is an i.i.d. stationary random variable
with zero mean, such that var(Yt) = var(εt) = σ². Then
Yt−1 = α + β(t−1) + εt−1 ,
and so
ΔYt ≡ Yt − Yt−1 = β + εt − εt−1 ,
or
Yt = β + Yt−1 + ηt ,
(16.3)
where ηt = εt − εt−1 and var(ηt) = var(εt − εt−1) = 2σ² , since εt is i.i.d.
Equation (16.3) may look like the unit root process in (16.1). However, it
is really not so75 because the stationary noise term ηt carries a special
structure (thus we do not call this a difference stationary process). If we iterate
the process through time,
Yt = β + (β + Yt−2 + ηt−1) + ηt
= 2β + (β + Yt−3 + ηt−2) + ηt + ηt−1
= ⋯ = tβ + Y0 + ηt + ηt−1 + ⋯ + η1
= tβ + Y0 + εt − ε0 ,
where var(εt − ε0) = 2σ². Here we treat the starting value Y0 as a constant.
We thus see that for a trend stationary process, the variance of Yt stays the
same even as t increases. There is no stochastic trend, and the variance of Yt
does not change through time. The big difference is that the noise at the end of
a trend stationary process in (16.3), εt − ε0 , does not add up variance as fast as the
accumulated noise εt + εt−1 + ⋯ + ε1 in a unit root process.
Let us recall. A unit root process contains a deterministic time trend plus a
stochastic trend. The latter causes the unit root process to have changing
variances over time. A process with just deterministic time trend plus a
stationary noise, but not a stochastic trend, is called a trend stationary process.
75
One of the earliest and most exciting papers to point out this difference is Nelson,
Charles, and Charles Plosser, (1982), "Trends and Random Walks in Macroeconomic
Time Series: Some Evidence and Implications," Journal of Monetary Economics 10,
139-162.
A unit root process may thus be construed as a trend stationary process to
which a stochastic trend Σ_{j=0}^{t−1} εt−j is added. As the process evolves over a long
time, the stochastic trend will induce ever larger variances in the unit
root process. While both a trend stationary process and a unit root process will
display a similar increasing trend (expected values or means) if the drift is positive, the unit
root process will display increasing volatility over time relative to the trend
stationary process. This distinction is important in differentiating the two.
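A Monte Carlo sketch of the contrast (the drift, trend slope, and noise scale below are illustrative): the terminal variance of the unit root process grows like tσ², while the trend stationary process keeps a constant variance.

```python
import numpy as np

# Terminal-variance comparison across 5000 simulated paths of length T=100:
# unit root   Y_t = mu + Y_{t-1} + eps_t    -> var(Y_T) ~ T*sigma^2 = 100
# trend stat  Y_t = alpha + beta*t + eps_t  -> var(Y_T) ~ sigma^2 = 1
rng = np.random.default_rng(5)
n_paths, T, mu = 5000, 100, 0.1
eps = rng.standard_normal((n_paths, T))
unit_root = mu * np.arange(1, T + 1) + np.cumsum(eps, axis=1)   # Y0 = 0
trend_stat = 0.5 + 0.1 * np.arange(1, T + 1) + eps

var_ur_T = unit_root[:, -1].var()
var_ts_T = trend_stat[:, -1].var()
```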
In more general terms, equation (16.1) can be represented by ARIMA
(p,1,q) where p and q need not be zero for a unit root process. Thus, more
general unit root processes can be modeled by ARIMA (p,2,q), and so on.
Figure 16.1
Time Series Graphs of Stochastic Processes
[Figure: simulated sample paths over about 100 periods; plotted values range from −6 to 6.]
In Figure 16.1 above, we show how the three different processes would
look. Clearly the unit root process can produce large deviations away
from the mean.
16.2 SPURIOUS REGRESSION
Suppose
Yt = μ + Yt−1 + et
Zt = θ + Zt−1 + ut
and et and ut are independent of each other. They are also not correlated with
Yt−1 and Zt−1 . {Yt} and {Zt} are unit root processes with drifts μ and θ
respectively. Then,
Yt = tμ + Y0 + (et + et−1 + ⋯ + e1)
Zt = tθ + Z0 + (ut + ut−1 + ⋯ + u1) ,
showing their deterministic as well as stochastic trends. Let Y0 and Z0 be
independent. Then,
cov(Yt , Zt) = cov(Y0 , Z0) + Σ_{j=1}^{t} Σ_{k=1}^{t} cov(ej , uk) = 0 ,
so Yt and Zt are uncorrelated. Suppose we nevertheless run the OLS regression
Yt = a + bZt + ηt , with cov(Zt , ηt) = 0.
(16.4)
(16.4)
Substituting the trend representations into (16.4),
tμ + Y0 + Σ_{j=0}^{t−1} et−j = a + b ( tθ + Z0 + Σ_{j=0}^{t−1} ut−j ) + ηt .
Divide through by t:
μ + Y0/t + (1/t) Σ_{j=0}^{t−1} et−j = a/t + bθ + bZ0/t + b (1/t) Σ_{j=0}^{t−1} ut−j + ηt/t .
Since var(ηt) < ∞, the variance of the last term, var(ηt/t) = var(ηt)/t² , → 0 as t → ∞.
As t increases, the time-averages of the noise terms in et and ut converge to
zero via some version of the law of large numbers, due to their stationarity.
All terms except the following also converge to zero. We are then left with
μ = bθ ,
so b = μ/θ ≠ 0.
To see this more simply, suppose
Yt = μt + ξt
Zt = θt + ζt ,
where ξt and ζt are mean zero i.i.d. random variables that have zero
correlation. Even though Yt and Zt are not correlated, substituting t = (Zt − ζt)/θ gives
Yt = (μ/θ) Zt − (μ/θ) ζt + ξt .
So OLS regression of Yt on Zt will give a spurious estimate of b = μ/θ ≠ 0.
Next, suppose
Zt = θ + Zt−1 + ut
wt = wt−1 + ξt
Yt = c + dZt + wt .
Then
Yt = c + d(θ + Zt−1 + ut) + wt−1 + ξt
= dθ + (c + dZt−1 + wt−1) + dut + ξt
= dθ + Yt−1 + dut + ξt ,
so Yt is also a unit root process, where c, d ≠ 0. Here Yt is correlated with Zt due to Yt
being a linear combination involving Zt.
If we perform OLS on Yt = c + dZt + wt , d ≠ 0, the effects are as follows.
The OLS estimate of d will involve cov(Yt , Zt) = cov(c + dZt + wt , Zt) = d
var(Zt) + cov(c + wt , Zt). But the latter is a covariance of two independent unit
root processes, each with a deterministic trend (and a stochastic trend as well),
which produces a spurious sampling estimate that is not zero. Thus, the sampling
estimate of cov(Yt , Zt) under OLS will also be spurious, even if d ≠ 0.
At this point we can almost see when OLS on two related unit root
processes such as Yt and Zt is or is not feasible. It has to do with the
covariance of the explanatory variable and the residual variable, cov(wt , Zt).
If both are unit root processes, then there is spuriousness.
Suppose instead that wt is a stationary process, not a unit root process,
independent of Zt . Then the sample estimate of cov(wt , Zt) converges to 0. In this case,
the OLS estimate of d converges correctly.
In summary, suppose unit root processes Yt and Zt are truly related as
follows: Yt = c + dZt + wt , where the disturbance wt has a unit root and is not
correlated with Zt . Then it will not be appropriate to perform OLS of Yt on Zt ,
since wt is not stationary. The OLS result will be spurious.
When two unit root variables are not correlated, it is not appropriate to
perform OLS of one on the other, as the slope coefficient estimate will be
spurious. Even when two unit root variables are correlated, OLS regression
will produce a spurious slope estimate if the disturbance is a unit root
variable, as seen above. This is because of the non-zero covariance between the unit root
regressor and the unit root disturbance. In the latter case, the regression is not
spurious only if the disturbance is stationary.
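A Monte Carlo sketch of the spurious regression effect on two independent driftless random walks (the sample size and replication count are illustrative): the nominal 5% t-test on the slope rejects far too often.

```python
import numpy as np

def slope_t_stat(y, z):
    """t-statistic on the slope in the OLS regression y = a + b*z + error."""
    N = len(y)
    X = np.column_stack([np.ones(N), z])
    beta, res = np.linalg.lstsq(X, y, rcond=None)[:2]
    s2 = res[0] / (N - 2)
    var_b = s2 * np.linalg.inv(X.T @ X)[1, 1]
    return beta[1] / np.sqrt(var_b)

# Regress one independent random walk on another, 500 times: the nominal
# 5% two-sided t-test on the slope rejects much more often than 5%.
rng = np.random.default_rng(6)
T, reps = 200, 500
rejections = 0
for _ in range(reps):
    y = np.cumsum(rng.standard_normal(T))
    z = np.cumsum(rng.standard_normal(T))
    if abs(slope_t_stat(y, z)) > 1.96:
        rejections += 1
rejection_rate = rejections / reps
```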
16.3 COINTEGRATION
Sometimes two processes may be non-stationary such that they carry unit
roots. However, they could have a long-term equilibrium relationship so that
over a long time interval, it can be seen that a linear combination of them
behaves like a stationary process and they do not drift away from each other
aimlessly. We say they are cointegrated with each other. To model such a
relationship, we proceed as follows.
In Yt = c + dZt + wt ,
if wt is stationary, then Yt and Zt are said to be cointegrated with cointegrating
vector (1, −d), i.e.
(1  −d) (Yt , Zt)′ = c + wt .
Consider, for example, the log price process
ln Pt = μ + ln Pt−1 + εt .
This is similar to the stationary i.i.d. continuously compounded return
Rt = ln(Pt / Pt−1) = μ + εt
with mean μ and var(Rt) = σ². The mean and variance of the return over a
longer interval [t−1, T] are (T − t + 1)μ and (T − t + 1)σ². This is the same behavior as the unit root process
Yt = Yt−1 + εt , εt ~ stationary.
(16.5)
Then the first difference
ΔYt = Yt − Yt−1 = εt
is stationary (without any special structure like in the trend stationary
process). In this case Yt is said to be an integrated order 1, or I(1), process. The
first difference ΔYt is an integrated order 0, or I(0), process and is stationary.
I(k) is an integrated order k process. The kth difference of an I(k) process
becomes stationary for the first time. Thus an I(1) process is not
stationary, but its first difference is.
But suppose
Yt = μ + Yt−1 + εt , εt ~ stationary ;
(16.6)
then the first difference is
ΔYt = Yt − Yt−1 = μ + εt .
If
Yt = μ + δt + Yt−1 + εt , εt ~ stationary ,
(16.7)
then
ΔYt = μ + δt + εt .
The above (16.5), (16.6), and (16.7) are all unit root processes. The alternative
stationary autoregressive hypotheses are
Yt = ρYt−1 + εt ,  |ρ| < 1
(16.8)
Yt = μ + ρYt−1 + εt ,  |ρ| < 1
(16.9)
Yt = μ + δt + ρYt−1 + εt ,  |ρ| < 1.
(16.10)

16.4 TEST FOR UNIT ROOT
How do we test for the unit root processes (16.5), (16.6), or (16.7)? Using the
alternative specifications in (16.8), (16.9), and (16.10), we can write:
ΔYt = γYt−1 + εt
(16.11)
ΔYt = μ + γYt−1 + εt
(16.12)
ΔYt = μ + δt + γYt−1 + εt
(16.13)
where γ = ρ − 1. For the I(1) processes in (16.5), (16.6), and (16.7), however, γ = ρ − 1 = 0. Thus, we can test the null hypothesis of a unit root process by testing
H0: γ = 0.
In practice, before any test is carried out, specifications (16.11), (16.12), or
(16.13) are generalized to include lags of ΔYt so that any elements of
stationarity in ΔYt are explicitly modeled and we are left with a residual
noise et that has autocorrelations removed and is then i.i.d. with mean zero.
This is the same as saying that we model the stationary εt in (16.11), (16.12), and
(16.13) as an MA process. In general, (16.11), (16.12), and (16.13) can be
expressed as:
ΔYt = γYt−1 + φ1ΔYt−1 + φ2ΔYt−2 + ⋯ + φkΔYt−k + et
(no constant)
(16.14)
ΔYt = μ + γYt−1 + Σ_{j=1}^{k} φjΔYt−j + et
(there is a constant)
(16.15)
ΔYt = μ + δt + γYt−1 + Σ_{j=1}^{k} φjΔYt−j + et
(constant and time trend)
(16.16)
To test H0: γ = 0, we compute
γ̂_OLS / s.e.(γ̂_OLS) .
This is the usual formula for the t-value, but in this case it is not
distributed as a Student t_{T−n} statistic, where T is the sample size and n is the
number of parameters, i.e. n = k+1 for (16.14), n = k+2 for (16.15), and n = k+3
for (16.16). The distribution was found through simulation and is
reported in studies by Dickey and Fuller.76 For the computed t-value, we
76
See Dickey, D., (1976), "Estimation and Hypothesis Testing in Nonstationary Time
Series," PhD Dissertation, Iowa State University, and Fuller, W., (1976),
Introduction to Statistical Time Series, Wiley, New York.
therefore use the Dickey-Fuller (ADF) critical values for inference to test the
null hypothesis. The critical values for the test are shown in Table 16.1 below.
Table 16.1
Critical Values for Dickey-Fuller t-Test
Source: Fuller, W., (1996), Introduction to Statistical Time Series, (2nd ed.)
New York: Wiley

Case: No constant, Equation (16.5)
Sample size N    0.01     0.025    0.05     0.10
25               -2.66    -2.26    -1.95    -1.60
50               -2.62    -2.25    -1.95    -1.61
100              -2.60    -2.24    -1.95    -1.61
250              -2.58    -2.24    -1.95    -1.62
∞                -2.58    -2.23    -1.95    -1.62

Case: Constant, Equation (16.6)
N                0.01     0.025    0.05     0.10
25               -3.75    -3.33    -2.99    -2.64
50               -3.59    -3.23    -2.93    -2.60
100              -3.50    -3.17    -2.90    -2.59
300              -3.45    -3.14    -2.88    -2.58
∞                -3.42    -3.12    -2.86    -2.57

Case: Constant and Time Trend, Equation (16.7)
N                0.01     0.025    0.05     0.10
25               -4.38    -3.95    -3.60    -3.24
50               -4.16    -3.80    -3.50    -3.18
100              -4.05    -3.73    -3.45    -3.15
300              -3.98    -3.69    -3.42    -3.13
∞                -3.96    -3.67    -3.41    -3.13
If the computed t-value is less than the critical value (at, say, significance level 1%), then we reject H0: γ = 0 (or ρ = 1),
i.e. there is no unit root, at the 1% significance level.
If the computed t-value is greater than the critical value, then we cannot reject the evidence of a
unit root at the 1% level.
As another check on whether a process is a unit root process, the
autocorrelation function (ACF) of the process is computed. A unit root
process will typically show a highly persistent ACF, i.e. one where
autocorrelation decays very slowly with increase in the lags.
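A sketch of the Dickey-Fuller t-statistic computed by hand on simulated series (no augmentation lags, constant included; compare against the Table 16.1 "constant" panel — the series below are illustrative):

```python
import numpy as np

def df_t_stat(y):
    """Dickey-Fuller t-statistic for H0: gamma = 0 in
    dY_t = mu + gamma*Y_{t-1} + e_t (constant, no augmentation lags)."""
    dy, ylag = np.diff(y), y[:-1]
    X = np.column_stack([np.ones(len(ylag)), ylag])
    beta, res = np.linalg.lstsq(X, dy, rcond=None)[:2]
    s2 = res[0] / (len(dy) - 2)
    se = np.sqrt(s2 * np.linalg.inv(X.T @ X)[1, 1])
    return beta[1] / se

rng = np.random.default_rng(7)
rw = np.cumsum(rng.standard_normal(500))              # unit root process
ar = np.zeros(500)
for t in range(1, 500):
    ar[t] = 0.5 * ar[t - 1] + rng.standard_normal()   # stationary AR(1)

t_rw, t_ar = df_t_stat(rw), df_t_stat(ar)
# t_ar falls far below the 5% critical value of about -2.86 (reject the unit
# root); t_rw typically does not.
```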
16.5 PURCHASING POWER PARITY
The absolute purchasing power parity (PPP) version states that Pt = etPt* , where
Pt is the UK national price index in £, Pt* is the US national price index in US$,
and et is the spot exchange rate: the number of £ per $. Taking logs,
ln Pt = ln et + ln Pt* ,
and differentiating,
d ln Pt = d ln et + d ln Pt* ,  i.e.  dPt/Pt = det/et + dPt*/Pt* .
The relative PPP version states that
det/et = dPt/Pt − dPt*/Pt* .
Thus if the US inflation rate is 5% and the UK inflation rate is 10%, both over a horizon of T
years, then det/et = 10% − 5% = 5%, and the $ is expected to appreciate by 5%
over T years. et is the nominal exchange rate, exchanging et number of
pounds for one US$. The real exchange rate, or real £ per $, is the number of
units of real goods in the U.K. that can be obtained in exchange for one unit of the
same real good purchased in the U.S. Here, the number of units of real goods
purchased in the U.S. per US$ is 1/Pt*, supposing the US$ price per unit of good is Pt*. The
number of units of the same good that can be obtained in the U.K. by exchanging
one US$ is et/Pt , where we suppose the £ price per unit of good is Pt.
The real exchange rate (real £ per $) is therefore etPt*/Pt . If the real exchange rate of the
$ is rising over time, then goods prices in the U.S. are becoming more
expensive relative to the U.K., as more units of goods in the U.K. can be obtained in
exchange for one unit in the U.S. This can happen if nominal et increases, or
if inflation in the U.S. rises relative to that in the U.K. If PPP holds exactly, then the
real exchange rate is 1.
In log form, real exchange rate is
rt = ln et + ln Pt* - ln Pt = 0 under PPP.
(16.17)
In reality, in the short run, rt deviates from zero at any t. In the long-run, if
PPP holds, then rt will be a stationary process with mean zero. This means that
rt may deviate, but will over time revert back to its mean at 0. This is the
realistic interpretation of PPP (sometimes called the long-run PPP), rather than
stating rt as being equal to 0 at every time t.
If long-run PPP does not hold, then rt may deviate from 0 and not return to
it. It can then be described as following a unit root process, viz.
rt = rt-1 + t
(16.18)
where t is stationary with zero mean.
If equation (16.18) is the case, it means that each εt has a permanent effect in moving rt away from 0. This is because if ε1 > 0, then r1 = r0 + ε1, so the new r2 = r1 + ε2 is a stationary deviation from an r1 that has permanently absorbed ε1. Moreover, if rt has a drift, then the unit root process also incorporates a deterministic trend moving rt away from zero deterministically as well.
We test the validity of the long-run PPP by testing the null of a unit root in rt. We run OLS on

Δrt = μ + γ rt−1 + Σj λj Δrt−j + εt

to test H0: γ = 0.
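As an illustrative sketch (the book's own empirical work uses EViews), the Dickey-Fuller regression above can be run by plain OLS; the augmented version would simply append lagged Δr terms to the regressor matrix. The function name and simulated data below are for illustration only:

```python
import numpy as np

def df_tstat(r):
    """Dickey-Fuller t-statistic (constant, no trend, no lagged differences):
    regress dr_t = mu + gamma * r_{t-1} + e_t and return the t-ratio of gamma.
    Compare against DF critical values (about -2.89 at 5% with a constant)."""
    dr = np.diff(r)
    X = np.column_stack([np.ones(len(dr)), r[:-1]])
    beta, _, _, _ = np.linalg.lstsq(X, dr, rcond=None)
    e = dr - X @ beta
    s2 = e @ e / (len(dr) - 2)          # residual variance
    cov = s2 * np.linalg.inv(X.T @ X)   # OLS covariance matrix
    return beta[1] / np.sqrt(cov[1, 1])

rng = np.random.default_rng(0)
eps = rng.standard_normal(500)
stat_rw = df_tstat(np.cumsum(eps))  # random walk: stat near zero, cannot reject
stat_wn = df_tstat(eps)             # stationary series: large negative stat
```

The t-ratio must be compared against Dickey-Fuller (not Student-t) critical values, which is why the tables later in the chapter report values such as −2.8649 rather than −1.96.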
Suppose we test ln et , ln Pt*, and ln Pt separately and they are all unit root
processes. Then it is plausible that rt = ln et + ln Pt* - ln Pt in (16.17) is also a
unit root process. However, it is also possible that rt may be a stationary
process in the following way.
Suppose their linear combination

rt = ( 1  1  −1 ) (ln et, ln Pt*, ln Pt)′

is stationary and not unit root. Then the processes ln et, ln Pt*, and ln Pt are cointegrated with cointegrating vector ( 1  1  −1 ).
In empirical work, some currency pairs satisfy long-run PPP while others
do not. For those that satisfy long-run PPP, i.e. rt is stationary, then we may try
to forecast ln et+1 as follows. In some studies, there is some relaxation of PPP to allow for a more general cointegrating vector (1, b, −a) as long as a, b are close to one. Run OLS on

Δ ln et = α + a Δ ln Pt − b Δ ln Pt* + θ ( ln et−1 + ln Pt−1* − ln Pt−1 ) + ut        (16.19)

where ut is assumed to be i.i.d.
Notice how error correction is now built into the equation. When there is a long-run tendency for rt, or else (ln et + b ln Pt* − a ln Pt), to revert back to zero, then the past deviation ( ln et−1 + ln Pt−1* − ln Pt−1 ) is likely to exert influence via a negative θ to bring the error back towards zero. The above regression is called an error correction model, i.e. it uses the short-run deviation rt−1 to help explain variation in the next-period deviation. An error correction specification is possible provided cointegration exists. If there is no cointegration or long-run reversion to zero, then error correction is meaningless.
With the error correction model in (16.19), provided ln et, ln Pt, and ln Pt* are cointegrated, the forecast of the next period nominal exchange rate et+1, £ per $, can be obtained as

êt+1 = et (Pt+1/Pt)^a (Pt+1*/Pt*)^(−b) exp( α̂ + θ̂ rt )

or a and b can be fixed at one if they are not significantly different from one. In this case we also need to input the next period Pt+1 and Pt+1*. These may be more readily available in the form of price inflation forecasts.
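To make the forecasting recipe concrete, here is a small sketch of the error-correction forecast. The function name and all input values are illustrative, not estimates from the chapter's data:

```python
import numpy as np

def ppp_forecast(e_t, P_t, Ps_t, P_next, Ps_next, alpha, a, b, theta):
    """One-step forecast of the pound/dollar rate e_{t+1} from the
    error-correction form d(ln e) = alpha + a*d(ln P) - b*d(ln P*) + theta*r_t,
    where r_t = ln e_t + ln P*_t - ln P_t is the lagged PPP deviation."""
    r_t = np.log(e_t) + np.log(Ps_t) - np.log(P_t)
    dln_e = (alpha + a * (np.log(P_next) - np.log(P_t))
             - b * (np.log(Ps_next) - np.log(Ps_t)) + theta * r_t)
    return e_t * np.exp(dln_e)

# With a = b = 1, theta = 0 and equal UK/US inflation, the forecast is just e_t:
f = ppp_forecast(0.5, 100.0, 100.0, 105.0, 105.0, 0.0, 1.0, 1.0, 0.0)  # 0.5
```

A negative θ makes the forecast pull back towards the PPP level, which is exactly the error-correction effect described above.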
A caveat in the above exercise is that exchange rate forecasting is
extremely elusive, and PPP is not a particularly effective forecasting
specification, especially in the short-run.
However, there are many instances where currency pairs do not display
cointegration, and the real exchange rate is not stationary. The following
empirical results show this. We employ the £ per $ nominal exchange rate, US
CPI and UK CPI annual data from 1960 till 2001. The data series are
contained in the datafile PPP.xls. The real exchange rate is shown below.
Figure 16.2
£ per $ Real Exchange Rate, 1960 till 2001
[time-series plot of the log real exchange rate, 1960 to 2000]
Figure 16.2 shows that the log real exchange rate in £ per $ appears to be mostly
larger than zero, indicating better terms of trade and competitiveness favoring
U.K. from 1960 to 2001, except for a period in the early 1970s and during
mid 1980s.
Recall that et is the spot exchange rate in number of £ per $. Results of a unit
root test of ln et , of ln(Pt*) or U.S. price, of ln(Pt) or UK price, of the first
difference of ln(et), and of the real exchange rt are shown in Tables 16.2, 16.3,
16.4, 16.5 and 16.6 respectively.
From Table 16.2, the ADF test-statistic of −1.7760 is greater than the 10% critical value of −3.1988. Hence we cannot reject that the spot £ per $ exchange rate
during 1960-2001 follows a unit root process. All the ADF tests in Tables
16.2, 16.3, and 16.4 show that the log of the nominal spot exchange rate and the logs of the prices are unit root processes. The real exchange rate also has a unit
root as seen in Table 16.6. Its first difference, however, is stationary.
Sometimes, bank reports show the use of PPP in trying to forecast or make predictions about the future movement of a currency. For example, in the above £ per $ spot rate, the actual spot $ value may lie above the theoretical PPP $ value, and if the real exchange rate is stationary, a bank report may suggest that the $ is way overvalued and that PPP will bring about a correction soon, with the $ trending back to the PPP level. However, this is a dangerous prediction as the real exchange rate seems to be a unit root process during the period, and the $ value may indeed continue to move upward or not revert down.
Table 16.2
Augmented D-F unit root test of ln et
Table 16.3
Unit root test of ln Pt* (US price)
Table 16.4
Unit root test of ln Pt (UK price)
Table 16.5
Unit root test on First Difference of ln et
Table 16.6
Unit root test of real exchange rate rt
16.6  PROBLEM SET

16.1
The following augmented Dickey-Fuller unit root test statistics were
collected on a certain price series Pt with a sample size of 1000. The
critical values were also shown.
Test without constant and trend
ADF Test Statistic  -1.313870    1% Critical Value    -2.5678
                                 5% Critical Value    -1.9397
                                 10% Critical Value   -1.6158

Test with constant
ADF Test Statistic  -2.118800    1% Critical Value    -3.4397
                                 5% Critical Value    -2.8649
                                 10% Critical Value   -2.5685

Test with constant and trend
ADF Test Statistic  -3.847680    1% Critical Value    -3.9722
                                 5% Critical Value    -3.4167
                                 10% Critical Value   -3.1303
The augmented Dickey-Fuller unit root test was performed again, this
time on the first difference of the Pt series with a sample size of 999.
Test without constant and trend
ADF Test Statistic  -14.57404    1% Critical Value    -2.5678
                                 5% Critical Value    -1.9397
                                 10% Critical Value   -1.6158
16.3
Chapter 17
CONDITIONAL HETEROSKEDASTICITY
APPLICATION: RISK ESTIMATION
Key Points of Learning
Risk management, Value-at-risk, Historical approach, Parametric approach,
Volatility cluster, Volatility persistence, Conditional distribution,
Unconditional distribution, Conditional variance, ARCH, GARCH, ARCH-in-mean, Maximum likelihood, Fisher's information matrix, Cramer-Rao lower
bound, Asymptotic efficiency, Estimating GARCH, Futures margin
17.1  RISK MANAGEMENT
proprietary trading positions and lending exposures, to corporations' investments, to firms' hedging of transactions and translation risks79, and to an Exchange's margining rules and practices, and so on.
Major sources of risks that impact on asset values or prices are market
risks, credit risks, operational risks, liquidity risks, legal risks, political risks,
and model risks.80 The most prevalent and obvious of these is market risk.
When a bear market starts to run, be it in equities or bonds or futures or
options or commodities or the real estate sector, those with long positions will
start to worry and for good reasons. Conversely, in a bull run, those who sell
short ought to beware. Market movements can erode the value of asset
positions or net wealth, and this is a major concern for investors, be they
institutions or individuals.
The key concept in market risk management is Value-at-Risk. Value-at-Risk (VaR) is the maximum loss (or worst loss) over a specified horizon at a
given confidence level. It is also the minimum loss with probability equal to 1
less the confidence level. The idea of estimating such losses is related to
regulatory bodies requiring firms to hold enough capital so that the firms can
bear any foreseeable losses arising out of their trading and market activities.
The practice of such computations appeared to have formally started in 1980
when the Securities and Exchange Commission required financial firms to report
potential losses over a 30-day horizon with a 95% confidence level. The
computation of VaR became commonplace when Basel regulatory bodies
quickened supervisory roles in banking operations worldwide especially in the
developed markets.
Suppose a portfolio of stocks is currently valued at $100 million. This
$100m value is the marked-to-market value81 of the portfolio. If the
consideration of market risk exposure is over a day, then we are analyzing
daily VaR. If it is exposure over a week, then we think in terms of a weekly
78 … loans, investments, and other business activities. Sound operating banks in a country ensure economic stability and a well-functioning capital market.
79 Firms that export or import, and thus receive or pay foreign currencies for the goods, typically hedge currency risk by selling or buying currency forward contracts in order to lock in a certain exchange rate. This is an example of hedging transaction risk. Translation risk has to do with accounting exposures.
80 See Steven Allen (2003), Financial Risk Management: A Practitioner's Guide to Managing Market and Credit Risk, Wiley Finance. There are many excellent books written both by academics and by practitioners on this growing and important subject.
81 This means that the assets could be sold if needs be to realize the marked-to-market dollar value. On the other hand, in situations of an illiquid market or when the asset is not placed onto the market for sale, there is no ready market price. In such cases, theoretical models to price the assets are used, and the assets are said to be marked-to-model when a value is assessed based on the model price.
VaR. Over a day, suppose the probability distribution of the daily portfolio return rate r̃ is normal and shown in Figure 17.1, where σ is the daily return volatility. We assume that over a day, the expected return is negligibly small and cannot be estimated accurately, so we set it to zero, i.e. μ = 0. Suppose σ = 0.02.
Figure 17.1
Daily Ex-ante Portfolio Return
[normal density of r̃ marking left-tail critical values −2.33 and −1.645; the negative return rate at the 99% confidence level is −0.0466]
At the 99% confidence level (the critical value on the left tail for standard normal Z is −2.33), Prob( [r̃ − μ]/σ ≡ Z < −2.33 ) = 1%. Hence Prob( r̃ < −2.33σ ) = Prob( r̃ < −0.0466 ) = 1%. Since r̃ = P1/P0 − 1, where P1 is the next period or next day's portfolio price, then if r̃ < 0, the portfolio value loss is P0 − P1 = −P0 r̃. In this case, the daily VaR (or portfolio value loss) at the 99% confidence level is −P0 r̃99% = −100m × −0.0466 = $4.66m. There is a chance of losing $4.66m or more out of the portfolio of $100m, with a probability of 1%. This is also called the absolute VaR when μ = 0.
Suppose instead μ = 0.01, or a 1% expected increase in return. Then the return distribution is shifted to the right by μ. This is shown in Figure 17.2.
At the 99% confidence level (the critical value on the left tail for standard normal Z is −2.33), Prob( [r̃ − μ]/σ ≡ Z < −2.33 ) = 1%. Hence Prob( r̃ < μ − 2.33σ ) = Prob( r̃ < 0.01 − 0.0466 = −0.0366 ) = 1%. Since r̃ = P1/P0 − 1, where P1 is the next period or next day's portfolio price, then if r̃ < 0, the portfolio value loss is P0 − P1 = −P0 r̃. In this case, the daily VaR (or portfolio value loss) at the 99% confidence level is −P0 r̃99% = −100m × −0.0366 = $3.66m. There is a chance of losing $3.66m or more out of the portfolio of $100m, with a probability of 1%. This is also called the absolute VaR when μ > 0.
Figure 17.2
Daily Ex-ante Portfolio Return
[normal density of r̃ centered at μ = 0.01, marking the −2.33 and −1.645 critical values; the return at the 99% confidence level is μ − 0.0466; Absolute VaR $3.66m, Relative VaR $4.66m]
The absolute VaR is the loss measured with respect to the current marked-to-market portfolio value regardless of μ. On the other hand, if the loss is to be computed taking into account also the loss of the expected profit μP0, then the loss with respect to the expected value E(P1), not the current value P0, is E(P1) − P1 = P0(1+μ) − P1 = μP0 − P0 r̃. This is called the relative VaR. Relative VaR = Absolute VaR + μP0. Here μP0 = 0.01 × 100m = $1m. Hence relative VaR = $4.66m. For μ > (<) 0, Relative VaR > (<) Absolute VaR. For μ > 0, Relative VaR is more conservative, giving a higher VaR or loss number.
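The two worked examples above can be reproduced in a few lines; this is a sketch assuming the normal-return setup (z = 2.33 for the 99% confidence level), with an illustrative function name:

```python
def parametric_var(P0, mu, sigma, z=2.33):
    """Absolute and relative daily VaR under r ~ N(mu, sigma^2).
    Absolute VaR is the loss relative to the current value P0;
    relative VaR adds back the expected profit mu*P0 (loss vs E(P1))."""
    absolute = -P0 * (mu - z * sigma)
    relative = absolute + mu * P0
    return absolute, relative

# the chapter's worked examples, P0 = $100m, sigma = 0.02:
var_a, var_r = parametric_var(100.0, 0.00, 0.02)    # 4.66 and 4.66 ($m)
var_a2, var_r2 = parametric_var(100.0, 0.01, 0.02)  # 3.66 and 4.66 ($m)
```

With μ = 0 the two measures coincide; a positive μ lowers the absolute VaR but leaves the relative VaR unchanged, as in the text.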
There are 3 major approaches to measuring the probabilities:
(a) Historical Approach
Collect the immediate past historical daily returns and form an empirical
distribution.
(b) Parametric (VaR) Approach
Assume a normal distribution. Use immediate past daily returns to estimate
the mean and variance of the normal distribution.
r̄ = (1/T) Σ_{t=1}^{T} rt ,    σ̂² = 1/(T−1) Σ_{t=1}^{T} (rt − r̄)².
For a 5-day horizon, σ²(5 days) = 5 σ²(daily), or σ(5 days) = √5 σ(daily). For the mean, μ(5 days) = 5 μ(daily).
(c) Monte Carlo Approach
The price processes are specified. Usually they are complicated and
cannot be solved analytically. Computer simulations of the possible
paths of the prices are made millions of times, and the distribution of
such path prices is then collected and used for determining the critical
regions of losses.
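Approaches (a) and (b) can be sketched on a vector of past returns as follows; function names are illustrative, the data are simulated, and the √horizon scaling follows the variance rule above:

```python
import numpy as np

def historical_var(returns, P0, conf=0.99):
    """(a) Historical approach: loss at the empirical left-tail quantile."""
    return -P0 * np.quantile(returns, 1.0 - conf)

def normal_var(returns, P0, z=2.33, horizon_days=1):
    """(b) Parametric approach: fit mean and s.d. of a normal, then
    scale the mean by the horizon and the s.d. by its square root."""
    mu, sd = returns.mean(), returns.std(ddof=1)
    h = horizon_days
    return -P0 * (h * mu - z * sd * np.sqrt(h))

rng = np.random.default_rng(0)
past = rng.normal(0.0, 0.02, 5000)   # simulated past daily returns
hv = historical_var(past, 100.0)     # both estimates near $4.66m here
pv = normal_var(past, 100.0)
```

On genuinely normal data the two estimates agree; with fat-tailed market data the historical quantile typically exceeds the normal-based one, which previews the motivation for ARCH modeling below.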
In the Parametric Approach, we see how critical the volatility σ, from now to the end of the next day, is in determining VaR. If σ is underestimated, then VaR will be too small and excessive risk may be overlooked, creating grave dangers for the credit standing of the exposed firm or bank.
Why does volatility matter in the world of finance? To motivate, let us
consider the Nick Leeson lesson. In late 1994 and early 1995, Leeson, who
was chief trader for Barings Futures in Singapore, accumulated a speculative
long position on the Nikkei 225 Futures contracts worth nominally US$7.7
billion. By mid-February 1995, the N225 index had fallen more than 15%.
This is clearly a case of great volatility. Together with losses on options, Leeson lost US$1.3 billion and wiped out the entire equity capital of the Barings PLC Bank based in the U.K. at that time. History seemed to repeat itself when in January 2008 it was reported that a rogue trader at Société Générale lost more than US$7 billion through fraudulent practice.
If we use a stationary ARMA process as seen in Chapter 6 to model return, we would have constant variances, and this will often underestimate volatility when the market becomes uncertain and volatility clusters together. This is evidence of volatility persistence and is illustrated in the following diagram. Clearly, such clustering over say a few days means that, based on the most recent information at t, e.g. the volatility σt the day before, one can provide a much more accurate forecast of the next period volatility σt+1. This is a forecast of conditional (on t) volatility. We will require the modeling of the return process using other than an ARMA process, since the latter has constant conditional volatilities.
Modeling clustering avoids underestimating volatility and hence also VaR
during uncertain periods. We employ models of autoregressive conditional
heteroskedastic processes (ARCH) with changing conditional variance.
It should also be mentioned that VaR as reported by banks to regulatory agencies on a daily basis is meant to take care of normal day-to-day risks. Therefore, in situations where major market movements are expected, e.g. 9/11
or at the aftermath of Lehman Brothers bankruptcy in September 2008,
additional extreme risk tools would be deployed in addition to the tool of
VaR.
Figure 17.3
Illustrating a volatility cluster
[time-series plot with a marked cluster of high-volatility observations]
17.2  ARCH-GARCH

Consider the regression

yt = α + β xt + ut        (17.1)

where E(ut) = 0, Cov(ut, ut−k) = 0 for k ≠ 0, and xt, ut are stochastically stationary and independent; then Var(yt | xt) = Var(ut) = σu² is constant. In (17.1), we can think of xt in general terms, including it being lagged yt−1. So far we have not modeled anything about the variance of ut.
However, suppose we model a process on the variance of ut (not ut itself; note this distinction) such that:
Var (ut) = α0 + α1 ut−1² .        (17.2)
This is an autoregressive conditional heteroskedasticity or ARCH(1) model82 in the disturbance ut. Then in (17.1) and (17.2), we see that Var(yt | xt) = Var(ut) = α0 + α1 ut−1² ≠ σu². The conditional variance of yt indeed changes with past levels of ut−1, although the latter cannot be directly observed.
We can also write (17.2) in terms of a ut process as follows:

ut = et √( α0 + α1 ut−1² )        (17.3)

where et ~ N(0,1). To be precise, (17.3) implies (17.2), but the converse is not
necessarily true. It is interesting to know what is the nature of the distribution
of the disturbance ut. From (17.3), it should be evident that ut is
unconditionally not normally distributed. However, conditional on ut−1, ut is
normally distributed. Using Monte Carlo simulation with a sample size of
10000, a histogram of the distribution of ut is produced as follows.
Figure 17.4
Monte Carlo Simulation of errors ut = et √( α0 + α1 ut−1² )
[histogram of u with the following summary statistics]

Series: u (Sample 1 10000, Observations 10000)
Mean       0.004843
Median     0.000929
Maximum    3.423084
Minimum   -3.927891
Std. Dev.  0.497121
Skewness  -0.036257
Kurtosis   5.531066
Jarque-Bera  2671.481  (Probability 0.000000)
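A minimal version of that Monte Carlo experiment can be sketched as follows; the parameters α0 = α1 = 0.5 are assumed for illustration (not the chapter's exact ones), and the sample kurtosis comes out well above the normal value of 3:

```python
import numpy as np

rng = np.random.default_rng(1)
a0, a1 = 0.5, 0.5       # assumed ARCH(1) parameters, a1 < 1
n = 10000
u = np.zeros(n)
for t in range(1, n):
    # u_t = e_t * sqrt(a0 + a1 * u_{t-1}^2), e_t ~ N(0,1), as in (17.3)
    u[t] = rng.standard_normal() * np.sqrt(a0 + a1 * u[t - 1] ** 2)

var_u = u.var()                                # near a0/(1 - a1) = 1
kurt = np.mean(u ** 4) / np.mean(u ** 2) ** 2  # > 3: fatter tails than normal
```

This illustrates the point in the text: each ut is conditionally normal, yet the unconditional distribution of ut is leptokurtic.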
82 The seminal article in this area is Engle, R., (1982), "Autoregressive Conditional Heteroscedasticity with Estimates of the Variance of United Kingdom Inflations", Econometrica, 50, 987-1008. Its significant generalization is in Bollerslev, T., (1986), "Generalized Autoregressive Conditional Heteroscedasticity", Journal of Econometrics, 31, 307-327.
When Var(ut) = α0 + α1 ut−1² + α2 ut−2² + … + αq−1 ut−q+1² + αq ut−q², we call the conditional variance of ut above an ARCH(q) process.
Besides (17.2), another model of changing conditional variance is

Var (ut) = α0 + α1 ut−1² + β1 Var (ut−1) .        (17.4)
To illustrate, we first simulate a process {yt} following (17.1) with constant error variance:

yt = 1/2 + 1/2 xt + ut ,  xt ~ i.i.d. N(1, 0.4),  ut ~ i.i.d. N(0, 2),

so that, for example, y0 = 1/2 + 1/2 x0 + u0 and y1 = 1/2 + 1/2 x1 + u1. Then

E(yt) = 1/2 + 1/2 E(xt) = 1
Var(yt) = (1/2)² × 0.4 + 2 = 2.1 .
The plot of the time-path of yt is shown in Figure 17.5. It is seen that yt behaves like a random series with mean 1, and the two dotted lines are two standard deviations away from the mean. In this case, they are 1 ± 2√2.1 ≈ 1 ± 2.9 (about 4 and −2 respectively), with roughly 2% probability of exceeding the region between the two dotted lines on each side.
Figure 17.5
Stationary process yt ~ N(1, 2.1)
[time-series plot of yt against t]
Next we simulate sample paths of another process {yt}t=1,2,…,200 that follows (17.1) and (17.4) instead:

yt = 1/2 + 1/2 xt + ut ,  xt ~ i.i.d. N(1, 0.4),

with GARCH parameters α0 = 1/2, α1 = 1/4, and β1 = 1/2, so that the unconditional variance of ut is again 2. Start with y0 = 1/2 + 1/2 x0 + u0, where u0 ~ N(0, 2) and Var(u0) = 2.
Once u0² and Var(u0) are obtained, we can use (17.4) to obtain Var(u1). Next simulate u1 = e1 √( α0 + α1 u0² + β1 Var(u0) ) for e1 ~ N(0,1). Put y1 = 1/2 + 1/2 x1 + u1.
In general,

ut = et √( α0 + α1 ut−1² + β1 Var(ut−1) ) .
The plot is shown in Figure 17.6.

Figure 17.6
GARCH error process Var (ut) = 0.5 + 0.25 ut−1² + 0.5 Var (ut−1)
Unconditional yt has mean 1, variance 2.1
[time-series plot of yt against t]
Figure 17.6 shows a similar yt process as in Figure 17.5, with yt = α + β xt + ut. Its unconditional mean and variance are the same as those of yt in Figure 17.5: 1 and 2.1. However, its variance follows the GARCH error process Var(ut) = 0.5 + 0.25 ut−1² + 0.5 Var(ut−1). The figure shows that yt behaves like a random series with mean 1, and the two dotted lines are two standard deviations away from the mean, at about 4 and −2 respectively, with roughly 2% probability of exceeding the region on each side. There appears to be more volatility. At about the 50th observation, the variance clusters together and y-values persist below −2.
Figure 17.7
GARCH error process Var (ut) = 0.1 + 0.25 ut−1² + 0.7 Var (ut−1)
Unconditional yt has mean 1, variance 2.1
[time-series plot of yt against t]
We provide another simulation using the same yt = α + β xt + ut with the unconditional mean and variance of yt at the same values of 1 and 2.1 respectively. However, its variance now follows the GARCH error process Var(ut) = 0.1 + 0.25 ut−1² + 0.7 Var(ut−1), where clustering or persistence in volatility should be more evident because of the high α1 = 0.25 and the higher β1 = 0.7. Indeed Figure 17.7 shows the persistent and much higher volatility, with yt's exceeding +15 and falling below −15 in the observations from 100 to 150. Thus we see that GARCH modeling of variance is able to produce the kind of persistence and clustering in volatility sometimes observed in market prices.
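The simulation just described can be sketched as follows, using the text's first GARCH setting α0 = 0.5, α1 = 0.25, β1 = 0.5 (unconditional Var(ut) = 2, Var(yt) = 2.1); the seed and variable names are illustrative:

```python
import numpy as np

rng = np.random.default_rng(2)
a0, a1, b1 = 0.5, 0.25, 0.5   # GARCH(1,1); a0 / (1 - a1 - b1) = 2
n = 200
u = np.zeros(n)
h = np.zeros(n)               # h[t] = Var(u_t)
h[0] = 2.0                    # start at the unconditional variance
u[0] = rng.standard_normal() * np.sqrt(h[0])
for t in range(1, n):
    h[t] = a0 + a1 * u[t - 1] ** 2 + b1 * h[t - 1]   # equation (17.4)
    u[t] = rng.standard_normal() * np.sqrt(h[t])

x = rng.normal(1.0, np.sqrt(0.4), n)   # x_t ~ N(1, 0.4)
y = 0.5 + 0.5 * x + u                  # E(y_t) = 1, Var(y_t) = 2.1
```

Raising β1 towards 0.7 (with α0 lowered to 0.1 to keep the unconditional variance at 2) produces the longer, wilder clusters seen in Figure 17.7.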
In addition to models such as (17.1) and (17.4), suppose

yt = β xt + θ σt² + ut

where σt² ≡ Var(yt | xt, ut−1) = α0 + α1 ut−1². Then the yt process is an ARCH-in-mean or ARCH-M model. This version basically has the variance σt² driving the mean effect E(yt).
It is interesting to note that if we perform a regression of

Yt = c0 + c1 Xt + ut ,

where ut follows a GARCH process, OLS estimators are still BLUE as long as ut is unconditionally stationary. However, in such cases the maximum likelihood estimators will usually be more efficient. We shall discuss the estimation later.
17.3  STATIONARITY CONDITIONS
It is interesting to note that while GARCH processes are conditionally non-stationary with changing variances, they are still unconditionally stationary
processes. For reasons of data analyses, when we have only one time series or
one sample path, it is important to be able to invoke the law of large numbers
or ergodic theory so that sample averages can converge to population
parameters. The convergence requires stationarity, hence stationarity is
essential for most purposes. We shall show how GARCH processes are also
stationary.
If we expand (17.4):

Var (ut) = α0 + α1 ut−1² + β1 Var (ut−1)
        = α0 + α1 ut−1² + β1 [ α0 + α1 ut−2² + β1 Var (ut−2) ]
        = α0 (1 + β1) + α1 ( ut−1² + β1 ut−2² ) + β1² [ α0 + α1 ut−3² + β1 Var (ut−3) ]
        = α0 (1 + β1 + β1² + … ) + α1 ( ut−1² + β1 ut−2² + β1² ut−3² + … ) .        (17.6)

Let σ² = Var(ut) be the unconditional variance of ut. Taking unconditional expectations on both sides, and assuming there exists stationarity so that σ² = E(ut−1²) = E(ut−2²) = E(ut−3²) = …, then

σ² = α0/(1 − β1) + α1 ( σ² + β1 σ² + β1² σ² + … )
   = α0/(1 − β1) + α1 σ²/(1 − β1)
   = [ α0 + α1 σ² ] / (1 − β1), supposing |β1| < 1,

so that σ² = α0 / (1 − α1 − β1), which is finite and positive provided α1 + β1 < 1.
In Figure 17.7 above, given the parameters α0 = 1/10, α1 = 1/4, and β1 = 7/10, the unconditional variance of the GARCH disturbance is σ² = 0.1/(1 − 0.95) = 2.
17.4  MAXIMUM LIKELIHOOD

Consider a model Yt = f(Xt; θ) + et with et ~ i.i.d. N(0, σe²). The likelihood of observing the sample {Yt, Xt}t=1,…,N is

L = Π_{t=1}^{N} (1/√(2π σe²)) exp{ −(1/2) [ (Yt − f(Xt; θ)) / σe ]² } .

This is the density function of observing the sample, or the chance (we should strictly be careful not to interpret density as chance) of the sample taking the values that it did.
Taking the natural logarithm, which preserves the relative magnitudes, the log-likelihood function is

log L = −(N/2) log( 2π σe² ) − (1/2) Σ_{t=1}^{N} [ (Yt − f(Xt; θ)) / σe ]² .
Suppose the parameters to be estimated are put in a vector θ [if we use the earlier example, this means the coefficients in f(·) together with σe], and the variables Y, X are also notationally subsumed in Z. Then

∫ L(Z; θ) dZ = 1 ,

which is the property of any density function, that the area under the density curve sums to one. Differentiating the above with respect to the parameter θ,

∫ ∂L(Z; θ)/∂θ dZ = 0 .

Since ∂ log L/∂θ = (1/L) ∂L/∂θ, so ∂L/∂θ = L ∂ log L/∂θ, then

∫ [∂ log L(Z; θ)/∂θ] L dZ = E[ ∂ log L(Z; θ)/∂θ ] = 0 .        (17.7)

Differentiating once more, with respect to θᵀ,

∫ [∂² log L(Z; θ)/∂θ∂θᵀ] L dZ + ∫ [∂ log L(Z; θ)/∂θ] [∂L(Z; θ)/∂θᵀ] dZ = 0 .

Or,

∫ [∂² log L(Z; θ)/∂θ∂θᵀ] L dZ + ∫ [∂ log L(Z; θ)/∂θ] [∂ log L(Z; θ)/∂θᵀ] L dZ = 0 .

Hence

−∫ [∂² log L(Z; θ)/∂θ∂θᵀ] L dZ = ∫ [∂ log L(Z; θ)/∂θ] [∂ log L(Z; θ)/∂θᵀ] L dZ ,

i.e. −E[ ∂² log L(Z; θ)/∂θ∂θᵀ ] = E[ (∂ log L/∂θ)(∂ log L/∂θᵀ) ], defined as R(θ), the information matrix.
When the parameters are θML, R(θ) is also the covariance matrix of the score vector ∂ log L(Z; θ)/∂θ, which has a mean of 0 as seen in (17.7).
Now let h(Z) be any unbiased estimator of θ, so that

E[h(Z)] = ∫ h(Z) L(Z; θ) dZ = θ .

In differentiating the left-hand side with respect to θᵀ, recall that ∂L/∂θ = L ∂ log L/∂θ. Then

∫ h(Z) [∂ log L(Z; θ)/∂θᵀ] L(Z; θ) dZ = E[ h(Z) ∂ log L(Z; θ)/∂θᵀ ] = I .

Thus, I is the covariance matrix between h(Z) and ∂ log L(Z; θ)/∂θ, since the latter has mean zero. The stacked vector ( h(Z), ∂ log L/∂θ ) therefore has the positive semi-definite covariance matrix

cov[h(Z)]    I
I            R(θ)

Suppose θ is an n × 1 vector. Then h(Z) is also n × 1, and ∂ log L/∂θ is n × 1. For any arbitrary n × 1 vector p, pre- and post-multiplying the covariance matrix above by ( pᵀ, −pᵀ R−1 ) and its transpose gives

pᵀ cov[h(Z)] p − pᵀ R(θ)−1 p ≥ 0 .

The last line above shows the Cramer-Rao inequality. Thus the covariance matrix of any unbiased estimator, cov[h(Z)], is larger than or equal to the inverse of the information matrix R.
We can write, for any arbitrary n × 1 vector p, pᵀ cov[h(Z)] p ≥ pᵀ R−1 p, a 1 × 1 number. Clearly, if we choose pᵀ = (1, 0, 0, …, 0), then pᵀ cov[h(Z)] p = the
variance of the unbiased estimator of the first parameter in vector θ, and this is bounded below by the first row-first column element of R−1, say r11. Suppose the diagonal elements of R−1 are

r11, r22, …, rkk .

Then all unbiased estimators have variances bounded below by the Cramer-Rao lower bounds (r11, r22, …, rkk) respectively. An estimator that attains the lower bound is said to be a minimum variance unbiased or efficient estimator.
Though maximum likelihood estimators are not always unbiased in finite samples (one exception is the linear regression model with normally distributed ei, where the OLS and ML estimators of the coefficients are identical), they are in most cases consistent and asymptotically efficient. This makes ML a favorite estimation method especially when the sample size is large.
17.5  ESTIMATING GARCH
Given a historical time series of futures price and its changes, how do you
estimate the daily value-at-risk at 95% confidence interval83? Traditional
methods have used sampling variance method (assuming constant conditional
variance). Since there is plenty of evidence of volatility clustering and
contagion effects when markets moved together over a period of several days,
modeling volatility as a dynamic process such as in GARCH (including
ARCH) is useful for the purpose of estimating risk and developing margins
for risk control at the Exchange.84
Suppose for the following day, the volatility of ΔFt/Ft is forecast at σ, in order to estimate the Value-at-Risk or VaR of a long N225 futures contract
83
For bank risk control, it is usual to estimate the daily 95%, 97.5%, or 99% confidence interval Value-at-Risk, sometimes doing this several times in a day or at least at the
close of the trading day. Sometimes a 10-day 95% confidence interval VaR is also
used.
84
Perhaps one of the earliest applications of GARCH technology to Exchange risk
margin setting could be found in Lim, KG, (1996), Weekly Volatility Study of
SIMEX Nikkei 225 Futures Contracts using GARCH Methodology, Technical
Report, Singapore: SIMEX, December, 15pp.
position. The daily VaR at 95% confidence interval is such that Prob( F1 − F0 < −1.645 σ F0 ) = 5%. Recall therefore, VaR is 1.645 σ F0.
An Exchange is interested to decide the level of maintenance margin per contract, $x, so that within the next day, the chance of the Exchange taking the risk of a loss before top-up by the client or broker member, i.e. when the event ( |F1 − F0| > x ), or loss exceeding the maintenance margin, happens, is 5% or less. Then, x is set by the Exchange to be large enough, i.e. set x ≥ 1.645 σ F0.
Thus forecasting or estimating σ for the next day is an important task for setting the Exchange contract margin for the following day. We shall model the conditional variance of the daily rate of futures price change ΔFt/Ft as a GARCH(1,1) process. Assume E[ ΔFt/Ft ] = 0 over a day.
Let ΔFt/Ft = ut and then assume Var(ut) follows a GARCH(1,1) process as follows.
Var (ut) = c + α ut−1² + β Var (ut−1)        (17.8)

We wish to estimate the parameters {c, α, β} in (17.8) and then use them to forecast the following day's volatility Var(ut+1). Notationally, we let Var(ut) ≡ σt², and denote an estimate of the volatility σt+1 as σ̂t+1. If we take a historical sampling variance of a past daily time series {ΔFt/Ft}t=1,2,…,N, and let this sample variance be s², then we assume the initial Var(u0) = s², and the initial u0 = 0. Furthermore, assume ut conditional on ut−1 and on σt−1 is normally distributed. This shall allow us to perform a ML estimation of {c, α, β} in (17.8). Although assuming other distributions could in principle allow a ML estimation, the form of the density function could be complicated for accurate computations.
Here we observe a past history {ut}t=1,2,…,N. The likelihood function of this history is:

L = Π_{t=1}^{N} (1/√(2π σt²)) exp{ −(1/2) ut²/σt² }

Log L = −(N/2) log(2π) − (1/2) Σ_{t=1}^{N} log σt² − (1/2) Σ_{t=1}^{N} ut²/σt² .
By recursive substitution in (17.8),

σm² = c (1 − βm)/(1 − β) + α Σ_{t=1}^{m−1} βt−1 um−t² + βm s²        (17.9)

where we have put in u0 = 0 and σ0² = s². In the above formula, for example, when m = 1, σ1² = c + β s².
Putting expression (17.9) into the log-likelihood function, we maximize the following objective function:

Log L* = −(1/2) Σ_{t=1}^{N} log[ c (1 − βt)/(1 − β) + α Σ_{j=1}^{t−1} βj−1 ut−j² + βt s² ]
         − (1/2) Σ_{t=1}^{N} ut² / [ c (1 − βt)/(1 − β) + α Σ_{j=1}^{t−1} βj−1 ut−j² + βt s² ]

where we have removed the constant term −(N/2) log(2π). Note that log L* is a function of the parameters {c, α, β} given sample size N and s.
Using (17.9), once the parameters are estimated, the next day's volatility can be estimated as:

σ̂N+1² = ĉ (1 − β̂N+1)/(1 − β̂) + α̂ Σ_{t=1}^{N} β̂t−1 uN+1−t² + β̂N+1 s² .
17.6  TESTING FOR ARCH-GARCH EFFECTS
Consider the GARCH(1,1) conditional variance in (17.4). From the section on stationarity, we see that the GARCH process can be expanded in a similar way; the last term on the right-hand side shows an infinite number of lags in the ut−j²'s:

Var (ut) = α0 + α1 ut−1² + β1 Var (ut−1)
        = α0 + α1 ut−1² + β1 [ α0 + α1 ut−2² + β1 Var (ut−2) ]
        = α0 (1 + β1) + α1 ( ut−1² + β1 ut−2² ) + β1² [ α0 + α1 ut−3² + β1 Var (ut−3) ]
        = α0 (1 + β1 + β1² + … ) + α1 ( ut−1² + β1 ut−2² + β1² ut−3² + … ) .

Thus, heuristically, a GARCH process may be expressed as follows for an arbitrarily large number of lags N, where the cj's are constants:

ut² = c0 + c1 ut−1² + c2 ut−2² + … + cq ut−q² + … + cN ut−N² + et .
It is clear from both the expressions of ARCH and GARCH above that there is autocorrelation (serial correlation) in the square of the residuals. For ARCH(q), the autocorrelation in ut² is non-zero up to lag q, and becomes zero after that lag. For GARCH(q,p), the autocorrelation in ut² is non-zero for an arbitrarily long number of lags.
Considering (17.1), suppose we estimate via OLS and then obtain the estimated residuals:

ût = yt − α̂ − β̂ xt .

Compute the time series {ût²}. Then using the Ljung and Box Q-test in Chapter 6 on ût² and its auto-correlogram (not ût), test the correlations H0: ρ(1) = ρ(2) = ρ(3) = … = ρ(q) = 0. If H0 is rejected for an auto-correlogram that is significant out to lag q, then ARCH(q) is plausible. If the correlations do not appear to decay or disappear, then a GARCH process is likely. We should also follow up with the Ljung and Box Q-test on the standardized squares ût²/Var̂(ût).
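A sketch of the Q-statistic applied to squared residuals (compare the result against χ²(q) critical values, e.g. 11.07 at 5% for q = 5); the function name is illustrative:

```python
import numpy as np

def ljung_box_q(x, q):
    """Ljung-Box Q statistic for H0: rho(1) = ... = rho(q) = 0,
    asymptotically chi-square with q degrees of freedom under H0."""
    n = len(x)
    xc = x - x.mean()
    denom = xc @ xc
    q_sum = 0.0
    for k in range(1, q + 1):
        rho_k = (xc[k:] @ xc[:-k]) / denom   # lag-k sample autocorrelation
        q_sum += rho_k ** 2 / (n - k)
    return n * (n + 2) * q_sum

# apply to squared OLS residuals, e.g. ljung_box_q(u_hat ** 2, 10),
# to detect ARCH effects; on u_hat itself it tests plain serial correlation
```

For an i.i.d. series the statistic stays near q, while squared ARCH residuals produce values far out in the right tail of χ²(q).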
17.7  PROBLEM SET

17.1
and et is an unobserved i.i.d. disturbance and is distributed as N(0,1).
ht is an unobserved process that is uncorrelated with xt and is modeled
as follows:
ht = 0.2 ht-1 + 0.2 et-12
Suppose consistent estimates of a, b are 0.03 and 0.5 respectively. ht-1
is estimated consistently at 0.01. xt-1 has value 0.10 and current return
rt-1 is effectively 5%. Current price of the portfolio Pt-1 is $10 million.
Next period value xt is 0.12 and its variance is 0.02.
Show how you would compute the absolute and also the relative Value-at-Risk of the portfolio at time t at the 99% confidence level.
Given:
z-value    % Area to the right of the unit normal curve
1.28       10.0%
1.645       5.0%
1.96        2.5%
2.33        1.0%
2.56        0.5%
17.2

17.3

17.4
Chapter 18
MEAN REVERTING CONTINUOUS TIME PROCESS
APPLICATION: BONDS AND TERM STRUCTURES
Key Points of Learning
Bonds, Credit ratings, Yield-to-maturity, Bond equivalent yield, Bond total
return, Zero coupon bond, Zero yield, Spot rate, Bootstrapping, Spot yield
curve, Credit spread, Continuous Time Process, Short rate, Mean reversion,
Ornstein-Uhlenbeck process, Vasicek model, Cox-Ingersoll-Ross model,
Bond pricing, Treasury slope, Business cycle
In this chapter we shall study bonds and their term structures. Bonds are a
major class of investment assets distinct from equities, and they have salient
features that will be explained. Continuous time stochastic processes are briefly introduced in this chapter to show their usage in modeling bond prices
and therefore the resulting credit spreads and yields. Multiple regression
analyses involving explanation of credit spreads and of bond returns are
described.
18.1  BONDS
Capital markets in the world may be broadly divided into money markets
where borrowing and lending of short-term monies take place, equity markets
where companies raise capital for production by issuing stocks or shares to
investors, and the bond markets (sometimes called Fixed Income markets)
where medium to long-term borrowings and lendings occur.
Medium-term one-year to ten-year borrowings are usually called notes while longer-term borrowings are called debts or bonds. While in the past bonds took the form of nice colorful certificates stating the entitlement of holders (lenders) to payments of interest and principal repayments, these days they are mostly in electronic entry form. In the past, when an interest payment on a bond was due, holders would tear off a coupon attached to the bond certificate or package and send it to the company to require payment of interest. As a result, bonds with periodic interest payments are usually called coupon bonds.
Debts are of two major categories those that are negotiable instruments
may be bought and sold in a secondary market. These debts include not just
coupon bonds and notes, but also asset-backed and mortgaged-backed and
structured notes with attached derivatives. Non-negotiable debts on the other
hand are usually loans made privately between two parties, and are generally
not tradeable.
Borrowings in the form of bonds come from various sectors of the
economy. When sovereign governments borrow, the bonds are called
government bonds and usually carry the guarantee of the government in
repayment. Thus sovereign bonds typically have very high credit ratings. Of
course, there are governments in some countries that do not appear stable, and
whose economies appear to be in serious trouble with high inflation and low
incomes. In such cases, defaults have occurred and the credit ratings can be poor.
Corporate bonds issued by commercial companies are very large in size. Debt
borrowing by companies is the other major source of capital financing for
companies apart from equity issues. Banks and financial institutions also issue
debts for various purposes. They usually borrow at cheaper rates and lend at
higher rates.
We shall look at some major characteristics of a bond as follows. These
characteristics would have some impact on the value of the bonds or the prices
that the bonds would be able to sell for in a liquid market.
(a) Credit rating of issuer:
There are several key rating agencies, such as Standard & Poor's, Moody's,
and Fitch, that regularly provide updates on ratings of major bonds and
corporations. S&P ratings, for example, go from the almost risk-free AAA
(Aaa in Moody's) to D, which is default. Quality investment grade bonds are
typically BBB and better, while speculative risky bonds are BB and below,
and sometimes also referred to as junk bonds which means they give high
returns but also carry non-trivial probabilities of default. When a bond
defaults, all promised payments in terms of coupons and principals are
canceled, and the investor may hope for only a fraction of the outlay once the
liquidation process is completed.
Bonds issued by firms or counterparties that are deemed to be credit-risky
and which face a positive probability of default (or going bankrupt in some
cases) are priced lower to induce investors to buy. This translates into a higher
yield. Bonds with low credit ratings therefore have higher yields than bonds
with higher credit ratings. If we substract the Treasury yield from the creditrisky yield of a rated bond, we obtain the credit spread of that rated bond.
Credit spreads increase with lower ratings. AAA bonds typically have low
spreads of several basis points. B rated bonds may have spreads of several
hundred basis points (or several % above Treasury rate of the same tenor or
maturity).
(b) Bond coupon and frequency:
A higher coupon rate and a higher payment frequency mean more interest
payments arriving at a faster rate, which is good for the investor.
(c) Maturity of Bond:
A longer maturity means that the principal will be repaid only after a longer
time, and the bond is thus more risky for the investor.
(d) Taxability:
Some bonds carry tax-free status, which is common for municipal bonds.
Tax-free bonds, however, usually pay a lower interest rate. In general, a high
income-tax investor would prefer tax-free bonds because coupon interest on
most other bonds is treated as taxable income.
(e) Type of bond (straight, callable, puttable, sinking fund, zero-coupon):
Straight bonds have well defined coupon payment dates and final redemption
or maturity date when principal would be repaid. Callable bonds subject the
investor to risk of the bond being called for early redemption by the issuer if
the bond prices are rising. Therefore callable bonds usually carry higher
interest rates as compensation to investors for this call risk. Puttable bonds
present the reverse situation: investors can sell the bonds back to the
issuer at a date earlier than maturity. In this case, the interest rate received
by the investor is usually lower, to pay for the option the investor holds. In a common
type of sinking fund bond, the issuer may periodically call back parts of the
bond issue. Like callable bonds, sinking fund bonds carry higher interest
because there is risk to the investor who may have to surrender the bonds at a
time when alternative interest rates in the market are low. Zero coupon bonds
are also called discount bonds because they are issued at a deep discount. No
interim coupons are paid. At redemption, the investor is paid the bond's par
value.
(f) Liquidity:
Bonds that cannot be traded or sold in a liquid market usually require a
compensation in terms of higher interest or yield for bearing this risk.
There are other macroeconomic factors that affect the prices of bonds
generally. If inflation is uncertain and inflation risk premium in the form of
expected inflation is high, then via the Fisher effect discussed in an earlier
chapter, this induces a higher nominal interest rate in the economy, and thus
lower bond prices. During economic recession, credit spreads will be
especially wide, so credit-risky bonds will be priced at very low prices. The
converse is true of boom times. Increased money supplies by governments
also tend to reduce the general level of interest rates, and vice-versa.
Governments can directly intervene in the bond market through open market
operation by buying up Treasury bonds to increase money supply and by
selling such bonds to reduce money supply in the economy.
18.2
YIELD-TO-MATURITY
Consider, for example, a 10-year bond with par value $100,000 and a 5% p.a.
semi-annual coupon, currently priced at $88,000. The price satisfies:

88,000 = (1/2 x 5% x 100,000)/(1 + y/2) + (1/2 x 5% x 100,000)/(1 + y/2)^2
+ ... + (1/2 x 5% x 100,000 + 100,000)/(1 + y/2)^20
Notice that the numerators of the cashflows on the right-hand side are the
interest payments. The last payment includes the redemption of $100,000.
The denominator represents the opportunity cost of the investment, and embeds
a discount rate. As y is annualized, the 6-monthly discounting uses y/2%.
The discount rate y% is also called the Yield-to-maturity of the bond. In
other words, if an investor should hold the bond to maturity, then he can
realize the yield of y% p.a.
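The text's computations are done in Eviews; purely as an illustrative sketch, the YTM of this example (par $100,000, 5% p.a. semi-annual coupon, 20 half-year periods, price $88,000) can be solved by bisection, since the price is strictly decreasing in y. The function names below are ours, not the text's:

```python
def bond_price(y, par=100_000, coupon_rate=0.05, periods=20):
    """Price of a semi-annual coupon bond discounted at annualized yield y."""
    c = 0.5 * coupon_rate * par          # semi-annual coupon payment
    d = 1 + y / 2                        # per-period discount factor
    pv = sum(c / d**i for i in range(1, periods + 1))
    return pv + par / d**periods

def ytm(price, lo=0.0, hi=1.0, tol=1e-10):
    """Bisection: bond_price is decreasing in y, so narrow the bracket."""
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if bond_price(mid) > price:
            lo = mid                     # model price too high -> yield must rise
        else:
            hi = mid
    return 0.5 * (lo + hi)

y = ytm(88_000)    # roughly 6.66% p.a. for this example
```

The same bisection works for any monotone price-yield relation.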
There is an inverse relationship between current bond price and yield-to-maturity (YTM). As YTM rises for a particular bond, its price falls, and vice-versa.
Although YTM is simple to compute and offers some idea of what kinds
of returns accrue in investing in a particular bond, the YTM method has some
shortcomings.
Firstly, YTM assumes that cashflows, specifically the coupon interests,
are re-invested at the same YTM rates over time. This may not be feasible as
short-term interest rates for investment change and are uncertain. Secondly,
the YTM assumes that the discount rate for every cashflow in the future till
maturity is the same. In other words, the discount rate for a cashflow in ten
years' time is the same as that for a cashflow in one year's time, except that the
ten-year discount is compounded ten times.
real market where sometimes grave uncertainty even two years out will mean
that the discount applied to a two year future cashflow will be much more than
twice that of a one-year cashflow. YTM is a single rate that applies across all
cashflows up to a certain maturity T, and is not as flexible as a framework
whereby different discount rates are applied to different maturities up to T.
In the U.S., corporate bonds usually pay coupons twice a year, so the coupon
rate is a semi-annual coupon rate. If a bond pays a coupon only once a year,
the annualized rate is usually converted to a semi-annual equivalent rate called
the bond-equivalent yield. For example, if a European bond pays one annual
coupon of 6% p.a., then its bond-equivalent yield (BEY), comparable to U.S.
bonds that typically pay semi-annual coupons, is found via 2 x [(1.06)^0.5 - 1] =
5.91% p.a. Conversely, a U.S. semi-annual bond at BEY 5.91% p.a. is
equivalent in payment at the end of the year to (1 + 1/2 x 0.0591)^2 - 1 = 6%.
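The two conversions can be checked in a couple of lines (a sketch, using the 6% example above):

```python
annual_coupon_yield = 0.06                        # European bond: one 6% annual coupon
bey = 2 * ((1 + annual_coupon_yield) ** 0.5 - 1)  # bond-equivalent (semi-annual) yield
back_to_annual = (1 + bey / 2) ** 2 - 1           # converting back recovers 6%
```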
18.3
BOND TOTAL RETURN
For investment horizons other than the bond's maturity, it makes sense to
compute the bond's total rate of return. For a portfolio comprising both
equities and bonds, bond returns also allow the measurement of return
correlations for diversification purposes.
For computing the bond's total rate of return, we require the following
information:
(a) initial price
(b) end of horizon price
(c) holding period
(d) reinvestment rate
(e) interim cash-flows
(f) compounding frequency
(g) daycount convention
$25,652 in 1 year. The compounding frequency is semi-annual, and the
daycount convention is (actual number of days)/365.
Suppose the accrued interest at the start is 10/182 x $25,000 = $1374.
This means that the bond buyer pays this additional amount to the bond seller.
Suppose the accrued interest at the end of the horizon, when selling the bond,
is 11/183 x $25,000 = $1503. The seller obtains this amount in addition to
the clean or quoted bond price. The total price is sometimes called the full
or dirty price.
Then, the total bond return rate
= Future Value/Present Value - 1
= $(982,250 + 50,652 + 1503)/$(995,000 + 1374) - 1
= $1,034,405/$996,374 - 1
= 3.817% p.a.
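The arithmetic of the dirty prices and the total return can be reproduced directly (figures as quoted in the example above; the variable names are ours):

```python
# Dirty price = clean (quoted) price + accrued interest
clean_buy, accrued_buy = 995_000, round(10 / 182 * 25_000)      # accrued = 1374
clean_sell, accrued_sell = 982_250, round(11 / 183 * 25_000)    # accrued = 1503
coupons_plus_reinvest = 50_652       # interim cash flows with reinvestment

future_value = clean_sell + coupons_plus_reinvest + accrued_sell
present_value = clean_buy + accrued_buy
total_return = future_value / present_value - 1   # one-year holding period
```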
18.4
ZEROS
yield is computed as (100,000/50,835)^0.1 - 1 = 7% p.a. on annual
compounding. Its BEY z is found via (1.07)^10 = (1 + z/2)^20, or z = semi-annual
6.88% p.a. Thus, it is about 18 basis points (0.18%) higher for the zero
compared to a coupon bond of the same maturity. This semi-annual 6.88%
p.a. or annual 7% p.a. is called the 10Y zero rate and is also called a 10Y
spot rate.
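A quick check of these two computations (price $50,835, par $100,000, 10 years), written as a sketch:

```python
price, par, years = 50_835, 100_000, 10

# Annually compounded zero (spot) rate
annual_zero_rate = (par / price) ** (1 / years) - 1   # about 7% p.a.

# Semi-annual bond-equivalent yield z solving (1.07)^10 = (1 + z/2)^20
bey = 2 * ((1 + annual_zero_rate) ** 0.5 - 1)         # about 6.88% p.a.
```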
The idea of the zero or spot rate overcomes the shortcomings and problems of
using YTM. The spot rate RT at a given maturity T is the YTM on a zero
coupon bond with that maturity. It is the appropriate discount rate for valuing
the present value of any cash flow occurring on that specific maturity date in the
future. Any two different amounts $A or $2A occurring at T in the future
would unambiguously have present values of $a and $2a, using the same T-year
spot rate for discounting (assuming also the use of the same compounding
frequency and day-count convention).
18.5
SPOT RATES
Table 18.1
Current Coupon Bond Prices

Coupon   $ Price   End Year 1   End Year 2   End Year 3   End Year 4
4%       99.30     104
4.5%     99.50     4.5          104.5
5%       100.20    5            5            105
5.5%     101.20    5.5          5.5          5.5          105.5
To find the term structure of yields, first determine the one-year spot rate y1.
Put the one-year coupon bond price equal to the present value of the
cashflows in one year's time:
99.30 = 104/(1+y1), which implies y1 = 4.733%. Thus, the one-year zero price =
1/(1.04733) = 0.9548.
Now put the two-year coupon bond price equal to the present value of the
cashflows in one year's and two years' time:
99.50 = 4.5/1.04733 + 104.5/(1+y2)^2, which implies the two-year spot rate y2 =
4.769%.
The two-year zero price = 1/(1.04769)^2 = 0.9110. Similarly,
100.20 = 5/1.04733 + 5/(1.04769)^2 + 105/(1+y3)^3 implies the three-year spot
rate y3 = 4.935%. The three-year zero price = 0.8654.
And 101.20 = 5.5/1.04733 + 5.5/(1.04769)^2 + 5.5/(1.04935)^3 + 105.5/(1+y4)^4
implies the four-year spot rate y4 = 5.187%. The four-year zero price is 0.8169.
Hence the spot rates are found and plotted on a spot rate curve or zero yield
curve. See Figure 18.1 below.
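The bootstrapping steps follow a simple recursion: strip off the coupons already priced by the shorter-maturity zero prices, then solve for the next spot rate. A sketch using the Table 18.1 inputs (par 100, annual coupons):

```python
prices = [99.30, 99.50, 100.20, 101.20]
coupons = [4.0, 4.5, 5.0, 5.5]

spot_rates, zero_prices = [], []
for n, (price, c) in enumerate(zip(prices, coupons), start=1):
    # present value left over for the final cash flow (coupon + par) at year n
    residual = price - sum(c * zero_prices[i] for i in range(n - 1))
    y_n = ((100 + c) / residual) ** (1 / n) - 1
    spot_rates.append(y_n)
    zero_prices.append(1 / (1 + y_n) ** n)
```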
Figure 18.1
Spot Rate Curve
[Figure: spot yield (%) plotted against maturity; e.g. the three-year spot rate is 4.935%]
It is important to emphasize the usefulness of the spot rates to price a fixed
income security. The spot curve is the fundamental building block for the
valuation of any fixed income security. Just as a coupon bond can be
decomposed into a portfolio of zero coupon bonds, spot rates allow any
combination of timed cash-flows or coupon flows to be priced as a coupon
bond. For example, a 4Y coupon bond may have the following specifications:
(a) Repayment of principal of par $100,000 at end of 4 years
(b) First coupon payment of effective 5% at end of 1Y
(c) Second coupon payment of effective 10% at end of 3Y
(d) No coupon payment in the second and final year.
To find its no-arbitrage current price, we just need to find out the spot
rates (or equivalently the zero prices) at the points of the cashflows, i.e. at end
of 1Y, 3Y, and 4Y. Using the numbers from the bootstrapping illustration, the
1Y, 3Y, 4Y zero prices are 0.9548, 0.8654, and 0.8169. Hence, the present
value of this bond is $5000(0.9548) + 10,000(0.8654) + 100,000(0.8169)
=$95,118. If the market price for this bond is any other than this price, then
arbitrage opportunities would be possible by trading in 1Y, 3Y, and 4Y zeros
that are mispriced.
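Using the rounded zero prices quoted above, the no-arbitrage price of this 4Y bond can be reproduced as:

```python
# Bootstrapped zero prices (rounded values as quoted in the text)
zero_price = {1: 0.9548, 3: 0.8654, 4: 0.8169}

# Cash flows: 5% coupon at 1Y, 10% coupon at 3Y, par $100,000 at 4Y
cash_flows = {1: 5_000, 3: 10_000, 4: 100_000}

pv = sum(cf * zero_price[t] for t, cf in cash_flows.items())  # $95,118
```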
18.6
The credit spread for the Aa bonds is shown, and in this example, the credit
spread widens as maturity increases, given the greater uncertainties and risks
associated with a longer horizon.
Figure 18.2
Term structures: spot yield curves for different credit ratings
[Figure: yield (%) against maturity for the Treasury, Aa, and Ba curves; the gap between the Aa curve and the Treasury curve is the credit spread]
18.7
See a classic such as Samuel Karlin and Howard M. Taylor, (1981), A Second
Course in Stochastic Processes, Academic Press.
86
See Robert C. Merton, (1990), Continuous-Time Finance, Basil Blackwell.
differential equation is a multivariate deterministic function, the solution of a
stochastic differential equation is a random variable that is characterized by a
probability distribution.
The short rate is the spot interest rate when the term goes to zero. Let the
short rate be r. This is the instantaneous spot rate. An example of a diffusion
process of the short rate is:
drt = κ(θ - rt) dt + σ rt^γ dWt
(18.1)
drt = (α + β rt) dt + σ rt^γ dWt
(18.2)
where the equivalence of (18.2) and (18.1) is readily seen when we put α =
κθ and β = -κ.
87
Dothan, Uri, (1978), On the term structure of interest rates, Journal of Financial
Economics 6, 59-69.
88
Brennan, M.J. and Eduardo S. Schwarz, (1977), Savings bonds, retractable bonds,
and callable bonds, Journal of Financial Economics 3, 133-155.
89
Vasicek, Oldrich, (1977), An equilibrium characterization of the term structure,
Journal of Financial Economics 5, 177-188.
90
Cox, John C., J.E. Ingersoll, and Stephen A. Ross, (1985), A theory of the term
structure of interest rates, Econometrica 53, 385-407.
rt = e^(-κt) r0 + θ(1 - e^(-κt)) + σ e^(-κt) ∫₀ᵗ e^(κu) dWu
(18.3)
E(rt) = θ + e^(-κt)(r0 - θ) ,
(18.4)
var(rt) = σ² e^(-2κt) ∫₀ᵗ e^(2κu) du = (σ²/(2κ))(1 - e^(-2κt)) .
(18.5)
Over a small discrete interval Δ, the model may be approximated by
rt+Δ - rt = κθΔ - κΔ rt + et+Δ , et+Δ ~ N(0, σ²Δ) .
(18.6)
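As a sketch (with hypothetical parameter values, not estimates from this chapter), an Euler discretization of the Vasicek dynamics can be simulated and its terminal sample moments compared with the closed forms, i.e. mean θ + (r0 - θ)e^(-κT) and variance (σ²/(2κ))(1 - e^(-2κT)):

```python
import math
import random
import statistics

# Hypothetical parameter values chosen for illustration only
kappa, theta, sigma = 0.5, 0.05, 0.02
r0, T = 0.08, 2.0
n_steps, n_paths = 200, 4000
dt = T / n_steps

random.seed(0)
terminal = []
for _ in range(n_paths):
    r = r0
    for _ in range(n_steps):
        # Euler step of dr = kappa*(theta - r)*dt + sigma*dW
        r += kappa * (theta - r) * dt + sigma * math.sqrt(dt) * random.gauss(0.0, 1.0)
    terminal.append(r)

mc_mean = statistics.fmean(terminal)
mc_var = statistics.pvariance(terminal)

cf_mean = theta + (r0 - theta) * math.exp(-kappa * T)             # closed-form mean
cf_var = sigma**2 * (1 - math.exp(-2 * kappa * T)) / (2 * kappa)  # closed-form variance
```

With a fine enough grid and many paths, the Monte Carlo moments settle close to the closed forms.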
See K.C. Chan, Andrew Karolyi, F.A. Longstaff, and Anthony Saunders, (1992),
An Empirical Comparison of Alternative Models of the Short-Term Interest Rate,
The Journal of Finance, Vol 47, No.3, 1209-1227.
B(0, t) = E₀ᴾ [ exp( - ∫₀ᵗ ru du ) ]
(18.7)
Interest rate modeling is quite mathematical. An example of the many books that
can be consulted is Musiela M. and M. Rutkowski, (1998), Martingale Methods in
Financial Modelling, Springer.
18.8
Daily one-Month Treasury Bill Rates in the Secondary Market from August
2001 to March 2010 are obtained from the Federal Reserve Bank of New
York public website. The one-month spot rates are treated as proxies for the
short rate rt. The graph of the time series of this rate is shown in Figure 18.3.
Figure 18.3
Daily One-Month Treasury Bill Rates in the Secondary Market from
August 2001 to March 2010
[Figure: annualized daily Treasury one-month spot bond-equivalent yield (vertical axis 0.00 to 0.06), plotted against time from August 2001 to March 2010]
It is seen that the rates increased spectacularly from 2003 till 2007 when the
U.S. property market was booming. The rates collapsed in 2008 and 2009
during the global financial crisis as central banks cut interest rates.
We shall use the linear regression method to provide a preliminary investigation
of the plausibility of the Dothan, Vasicek, Brennan-Schwarz, and CIR
short-rate models.
Dothan's approximate discrete model is
rt+Δ - rt = σ rt εt+Δ
(18.8)
where εt+Δ ~ N(0, Δ). The implication is that (rt+Δ - rt)/rt ~ N(0, σ²Δ). It
should be pointed out here that, except for the Vasicek model mentioned
in this section, the other three models including Dothan's do
not imply a normal distribution for the short rates under continuous time.
Hence the discretized version with normal errors is merely an
approximation.
Figure 18.4
[Histogram of DRR. Series: DRR, Sample 2 2172, Observations 2170. Mean 0.027614, Median 0.000000, Maximum 15.99488, Minimum -0.794818, Std. Dev. 0.450052, Skewness 22.86198, Kurtosis 748.8531, Jarque-Bera 50487537, Probability 0.000000]
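The Jarque-Bera statistic in Figure 18.4 can be reproduced from the reported sample skewness S and kurtosis K via JB = (n/6)[S² + (K - 3)²/4]:

```python
n = 2170            # observations reported in Figure 18.4
skewness = 22.86198
kurtosis = 748.8531

# Jarque-Bera test statistic for normality
jb = n / 6 * (skewness**2 + (kurtosis - 3) ** 2 / 4)
```

The huge value relative to any chi-squared(2) critical value is why the probability is reported as 0.000000.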
Figure 18.4 and the embedded table show that (rt+Δ - rt)/rt is not normally
distributed. Thus (18.8) may not be a good description of the proxy short
rates.
Next we explore regression using the Vasicek discrete model in (18.6):
rt+Δ - rt = κθΔ - κΔ rt + et+Δ
(18.9)
where et+Δ is i.i.d. normally distributed N(0, σ²Δ). Let a = κθΔ and
b = -κΔ; the OLS results are reported in Table 18.2.
Table 18.2
Regression rt+Δ - rt = a + b rt + et+Δ , et+Δ ~ i.i.d. N(0, σ²Δ)

Dependent Variable: DR
Method: Least Squares
Date: 04/18/10  Time: 23:58
Sample: 2 2172
Included observations: 2171
White heteroskedasticity-consistent standard errors & covariance

Variable   Coefficient   Std. Error   t-Statistic   Prob.
C          1.63E-05      2.81E-05     0.579256      0.5625
R          -0.001504     0.001179     -1.275752     0.2022

R-squared 0.000695; Adjusted R-squared 0.000234; S.E. of regression 0.000977; Sum squared resid 0.002070; Log likelihood 11968.11; F-statistic 1.508434; Prob(F-statistic) 0.219512; Mean dependent var -1.67E-05; S.D. dependent var 0.000977; Akaike info criterion -11.02359; Schwarz criterion -11.01836; Hannan-Quinn criterion -11.02168; Durbin-Watson stat 1.653049.
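The Table 18.2 estimates can be mapped back to the Vasicek parameters via κ = -b/Δ and θ = a/(κΔ). Assuming Δ = 1/252 year for daily data (an assumption on our part; the table does not state Δ), a sketch:

```python
# OLS estimates from Table 18.2: intercept a = kappa*theta*delta, slope b = -kappa*delta
a_hat, b_hat = 1.63e-05, -0.001504
delta = 1 / 252          # assumed daily time step, in years

kappa_hat = -b_hat / delta                 # speed of mean reversion
theta_hat = a_hat / (kappa_hat * delta)    # long-run mean short rate
```

Neither coefficient is statistically significant in Table 18.2, so these point estimates are purely illustrative of the mapping, not evidence for the model.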
Figure 18.5
[Histogram of residuals. Series: Residuals, Sample 2 2172, Observations 2171. Mean 5.50e-20, Median -2.44e-06, Maximum 0.009721, Minimum -0.011617, Std. Dev. 0.000977, Skewness -0.642009, Kurtosis 38.81535, Jarque-Bera 116183.6, Probability 0.000000]
The discrete approximation of the Brennan-Schwarz short-rate model is:
rt+Δ - rt = κθΔ - κΔ rt + rt et+Δ
where et+Δ is i.i.d. normally distributed N(0, σ²Δ).
Let yt+Δ = (rt+Δ - rt)/rt; then the model implies
yt+Δ = κθΔ (1/rt) - κΔ + et+Δ .
Table 18.3
Regression yt+Δ = a + b (1/rt) + et+Δ , et+Δ ~ i.i.d. N(0, σ²Δ)

Dependent Variable: DRR
Method: Least Squares
Date: 04/19/10  Time: 00:01
Sample (adjusted): 2 2171
Included observations: 2170 after adjustments
White heteroskedasticity-consistent standard errors & covariance

Variable   Coefficient   Std. Error   t-Statistic   Prob.
C          0.000216      0.008442     0.025591      0.9796
RINV       6.73E-05      1.60E-05     4.209572      0.0000

R-squared 0.032504; Adjusted R-squared 0.032058; S.E. of regression 0.442780; Sum squared resid 425.0444; Log likelihood -1310.233; F-statistic 72.83674; Prob(F-statistic) 0.000000; Mean dependent var 0.027614; S.D. dependent var 0.450052; Akaike info criterion 1.209432; Schwarz criterion 1.214669; Hannan-Quinn criterion 1.211347; Durbin-Watson stat 2.181601.
In the above regression, the sign of the estimated intercept (which equals -κΔ)
is not negative; this implies an incorrect negative estimate of the speed of
mean reversion.
The discrete approximation of the CIR short-rate model is:
rt+Δ - rt = κθΔ - κΔ rt + rt^(1/2) et+Δ
where et+Δ is i.i.d. normally distributed N(0, σ²Δ). Let yt+Δ = (rt+Δ - rt)/rt^(1/2);
then the model implies
yt+Δ = κθΔ (1/rt^(1/2)) - κΔ rt^(1/2) + et+Δ ,
in which the intercept is zero. Let b1 = κθΔ and b2 = -κΔ. Then we perform
the regression
yt+Δ = c + b1 (1/rt^(1/2)) + b2 (rt^(1/2)) + et+Δ .
The results are shown in Table 18.4.
Table 18.4
Regression yt+Δ = c + b1 (1/rt^(1/2)) + b2 (rt^(1/2)) + et+Δ , et+Δ ~ i.i.d. N(0, σ²Δ)

Dependent Variable: DRSQRR
Method: Least Squares
Date: 04/19/10  Time: 00:07
Sample (adjusted): 2 2171
Included observations: 2170 after adjustments
White heteroskedasticity-consistent standard errors & covariance

Variable   Coefficient   Std. Error   t-Statistic   Prob.
C          -0.000630     0.001408     -0.447633     0.6545
SQRRINV    7.82E-05      2.64E-05     2.960648      0.0031
SQRR       -0.000523     0.007337     -0.071293     0.9432

R-squared 0.009705; Adjusted R-squared 0.008791; S.E. of regression 0.012429; Sum squared resid 0.334769; Log likelihood 6443.727; F-statistic 10.61857; Prob(F-statistic) 0.000026; Mean dependent var 0.000317; S.D. dependent var 0.012484; Akaike info criterion -5.936154; Schwarz criterion -5.928298; Hannan-Quinn criterion -5.933281; Durbin-Watson stat 2.018086.
In the regression reported in Table 18.4, the estimated coefficients are of the
correct signs and one of them is highly significant.
18.9
In Duffee (1998) and Collin-Dufresne et al. (2001)94, an increase in the 3-month
U.S. Treasury bill rate and an increase in the slope of the Treasury yield curve
would each lead to a decrease in the credit spread of corporate bonds,
especially bonds with longer maturity and lower ratings.
One way of understanding this is that when times are good, i.e. during a boom
at a business cycle peak, and demand for loanable funds is high, the short
(short-maturity) Treasury interest rates (and other loanable funds rates as well)
go up. Likewise, the expected next-period short rate is high. The latter implies that
today's long rate would account for the high expected next-period rate and
would also be high.
Higher long rates mean that the slope of the yield curve also goes up. In
these situations, the market's assessment of the credit risks of corporate
bonds is lower, and thus the credit spread (or credit premium, the difference
between the credit-risky bond interest and the Treasury interest of the same
maturity) would narrow.
On the other hand, when the Treasury yield curve slope starts to decrease
and turns negative, there is an expectation of future short rate decreases, a
signal of the market's negative assessment of future economic prospects.95
To empirically verify this economic intuition, we perform a regression
analysis as follows. Monthly spot interest rates of U.S. Treasury bills and
bonds and also Ba-rating bonds from February 1992 till March 1998 were
obtained from Lehman Brothers Fixed Income Database. We first construct
monthly Ba-rated credit spreads for 9-year term by finding the difference of
Ba-rated 9-year spot rate less the Treasury 9-year spot rate. We then construct
the Treasury yield curve slope (or its proxy slope since the yield curve is not
exactly a straight line) by taking the Treasury 10-year spot rate less the
Treasury 1-month spot rate.
Finally, we also employ the Treasury 1-month spot rate as the proxy for
short rate (very short-term Treasury spot rate). A regression of the credit
spread on the 1-month Treasury spot and the Treasury slope using the set of
74 monthly observations is performed. The results are shown in Table 18.5. It
94
See Duffee, Gregory R., (1998), The Relation Between Treasury Yields and
Corporate Bond Yield Spreads, The Journal of Finance, Vol LIII, No 6, 2225-2241,
and also Pierre Collin-Dufresne, Robert S. Goldstein, and J. Spencer Martin, (2001),
The Determinants of Credit Spread Changes, The Journal of Finance, Vol LVI, No
6, 2177-2207.
95
Ang, Andrew, M. Piazzesi, and Min Wei, (2006), What does the yield curve tell us
about GDP growth? Journal of Econometrics 131, 359-403, commented that every
recession in the U.S. after the mid-1960s was predicted by a negative yield curve
slope within 6 quarters.
is indeed seen that the coefficient estimates on the 1-month Treasury spot rate
and on the Treasury slope are both negative as indicated. Regression using the
3-month Treasury spot rate as proxy for the short rate yields similar
results. However, the coefficients for this sampling period are not significantly
negative based on the t-tests.
Table 18.5
Regression of credit spread on Treasury short rate
and Treasury Slope, 2/1992 to 3/1998 (74 Observations)

Dependent Variable: CREDITSPREAD
Method: Least Squares
Date: 04/18/10  Time: 10:36
Sample: 1 74
Included observations: 74

Variable        Coefficient   Std. Error   t-Statistic   Prob.
C               0.071755      0.022873     3.137055      0.0025
T1M             -0.387103     0.337696     -1.146305     0.2555
TERMSTRSLOPE    -1.120837     0.907097     -1.235631     0.2207

R-squared 0.024888; Adjusted R-squared -0.002580; S.E. of regression 0.017943; Sum squared resid 0.022857; Log likelihood 194.0529; F-statistic 0.906070; Prob(F-statistic) 0.408730; Mean dependent var 0.043620; S.D. dependent var 0.017919; Akaike info criterion -5.163593; Schwarz criterion -5.070185; Hannan-Quinn criterion -5.126331; Durbin-Watson stat 2.093021.
The credit spread and the Treasury slope are indeed important factors in
the economy. They can also explain the cross-sectional variations of
bond returns. We refer the reader to the article by Fama and
French (1993) on the recommended reading list.
18.10
PROBLEM SET
18.1
Suppose a record of the last 50 years was taken, and 10 of the years
had negative GDP growth while the other 40 years had positive GDP
growth. In the 10 years with negative GDP growth, i.e. recession, 7 of
these years saw inverted yield curves in the year just prior to the recession.
In the other 40 years of positive GDP growth, 7 of those years saw
inverted yield curves in the year just prior. If you see an
inverted yield curve this year, what is the probability of a recession
next year given just the above information?
If you could run an OLS linear regression using actual GDP growth
numbers, Y, as dependent variable and actual yield curve slope, X, as
explanatory variable, and suppose the OLS result is
Yt = 0.01 + 1.2*Xt-1
where the estimated residual variance is σu² = 0.0004. Assuming the
residual is i.i.d. normally distributed with zero mean, and that this
period Xt = -0.02, what is the estimated probability of a recession next
year using the linear model? (Use the following unit normal
distribution table.) If the two answers above do not match, explain
why this is so. (A couple of sentences will do.) If they match, you
do not need to explain.
a           0      0.1    0.2    0.3    0.4    0.5    0.6    0.7    0.8    0.9
Prob(z<a)   0.50   0.54   0.58   0.62   0.66   0.69   0.73   0.76   0.79   0.82

18.2.
Pt = A e^(-B Rt)
where t = 1, 2, 3, ..., T denotes consecutive trading months (end of
month).
Time   Jan   Feb   Mar   Apr   May   Jun   Jul   Aug   Sep   Oct
Rt %   3.5   3.3   3.2   3.4   3.5   3.5   3.6   3.7   3.7   4.0
Pt     78    80    81    79    78    78.5  77    76.5  76    75
[3] Fama, Eugene F., and Kenneth R. French, (1993), Common risk factors in the returns on stocks and bonds, Journal of Financial Economics 33, 3-56.
[4] Musiela Marek, and Marek Rutkowski, (1998), Martingale Methods in
Financial Modelling, Springer Applications of Mathematics series 36.
[5] Collin-Dufresne, Pierre, Robert S. Goldstein, and J. Spencer Martin,
(2001), The Determinants of Credit Spread Changes, The Journal of
Finance, Vol LVI, No 6, 2177-2207.
[6] Sundaresan, Suresh, (1997), Fixed Income Markets and Their
Derivatives, South-Western Publishing.
Chapter 19
IMPLIED PARAMETERS
APPLICATION: OPTION PRICING
Key Points of Learning
Warrants, Call, Put, Option premium, Intrinsic value, Time value, Put-call
parity, Black-Scholes formula, Historical volatility, Implied volatility,
Efficient forecast, Errors-in-variable problem, Leverage effect
Merton option pricing model; Fischer Black regrettably passed away in 1995
before the grand accolade.
A European (style) call option on an asset A is the right to purchase on a
fixed future date T, one unit of the underlying asset A, at a pre-determined
strike or exercise price K. A European put option on an asset A is the right to
sell on a fixed future date T, one unit of the underlying asset A, at a predetermined strike or exercise price K. An American (style) call option on an
asset A is the right to purchase at any time up to and including on a fixed
future date T, one unit of the underlying asset A, at a pre-determined strike or
exercise price K. An American put option on an asset A is the right to sell at
any time up to and including on a fixed future date T, one unit of the
underlying asset A, at a pre-determined strike or exercise price K.
Calls and puts are pretty much the basic types and most prevalent of all
options. Second generation options include many exotic types97 such as Asian
options, Barrier Options, Lookbacks, Flexi-Options, Digitals, Quantos,
Rainbows, Spread and Basket Options. Current trends in the 2000s include the
issuance of many different types of structured products that combine various
types of exotic or vanilla options. Besides selling structured products to
investing institutions such as hedge-funds, pension funds, mutual funds,
insurance companies, and corporations, increasingly a large amount of
structured products are also placed out to high net-worth individual investors
directly or through private banking distribution. Sell-side banks or structuring
arms also issue structured warrants on third party equity that are backed by an
inventory of the equity purchased from the open market. After the
technology bubble burst in the U.S. in 2001, and since about 2004, the major
financial markets of the world saw, until the middle of 2007, a giant wave
of financial product and process innovation, and enjoyed a boom time. Since
the middle of 2007, however, the U.S. subprime woes have continued to erode
market confidence.
A particular option, whether European call or put, or American
call or put, is of course defined by what the underlying asset is. Many
different types of options have been traded on Exchanges or on the Over-the-counter
(OTC) inter-bank market, based on assets ranging from stock or equity
to bonds, to money-market instruments, to futures contracts, to commodities
such as oil, metals, raw materials, and agricultural produce, to currencies, to
synthetic constructions such as indices, and to verifiable conditions such as
credits, weather, and so on.
97
See for example, Peter G. Zhang (1997), Exotic Options: A Guide to Second
Generation Options, World Scientific Press. Or, Israel Nelken (2000), Pricing,
Hedging, and Trading Exotic Options, McGraw-Hill.
19.2
OPTION PRICING
A call option has a maturity or expiry date T in the future. During the period
from the current time t till T, a call option has a price that is determined in the
market, whether an Exchange or OTC. If the call is not exercised by the buyer
or holder till maturity, then at maturity T, the call price should be max(0, ST - K)
where ST is the underlying asset price at time T. After T, whether the
option is exercised or not, the option is worthless as the contract has expired.
At t < T, a European call (option) price $c(t,St) is less than or equal to an
American call (option) price $C(t,St) when their exercise prices and terms-to-maturity
(remaining time to expiry) are the same. This is because an American
call option can always be utilized like a European call option by not
exercising till maturity; the converse is not possible. Thus an American call
option is at least as valuable as its European counterpart. The difference,
C(t,St) - c(t,St) ≥ 0, is called the exercise premium.
For a call option, at t ≤ T, the situations St > K, St = K, and St < K, are
called in-the-money (ITM), at-the-money (ATM), and out-of-the-money
(OTM) respectively. We show below the dollar payoff outcomes of a long call
and a short call position at maturity T.
Figure 19.1a  Value of a long call contract at T
[Long call payoff: Max(0, ST - K), plotted against asset price at T, ST]

Figure 19.1b  Value of a short call contract at T
[Short call payoff: -Max(0, ST - K), plotted against asset price at T, ST]
A put option has a maturity or expiry date T in the future. During the period
from the current time t till T, a put option has a price that is determined in the
market, whether an Exchange or OTC. If the put is not exercised by the buyer
or holder till maturity, then at maturity T, the put price should be max(0, K - ST)
where ST is the underlying asset price at time T. After T, whether the
option is exercised or not, the option is worthless as the contract has expired.
At t < T, a European put (option) price $p(t,St) is less than or equal to an
American put (option) price $P(t,St) when their exercise prices and terms-to-maturity
(remaining time to expiry) are the same. This is because an American
put option can always be utilized like a European put option by not
exercising till maturity; the converse is not possible. Thus an American put
option is at least as valuable as its European counterpart. The difference,
P(t,St) - p(t,St) ≥ 0, is called the exercise premium.
For a put option, at t ≤ T, the situations St < K, St = K, and St > K, are
called in-the-money (ITM), at-the-money (ATM), and out-of-the-money
(OTM) respectively. We show below the dollar payoff outcomes of a long put
and a short put position at maturity T.
Figure 19.2a  Value of a long put contract at T
[Long put payoff: Max(0, K - ST), plotted against asset price at T, ST]

Figure 19.2b  Value of a short put contract at T
[Short put payoff: -Max(0, K - ST), plotted against asset price at T, ST]
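The four payoff diagrams in Figures 19.1 and 19.2 can be summarized by four one-line functions (a sketch; the names are ours):

```python
def long_call(s_T, k):
    """Payoff of a long call at maturity: Max(0, S_T - K)."""
    return max(0.0, s_T - k)

def long_put(s_T, k):
    """Payoff of a long put at maturity: Max(0, K - S_T)."""
    return max(0.0, k - s_T)

def short_call(s_T, k):
    """Short positions are the mirror image of long positions."""
    return -long_call(s_T, k)

def short_put(s_T, k):
    return -long_put(s_T, k)
```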
Figure 19.3a  Profit/Loss of a long call contract at T
[Long call profit: Max(0, ST - K) - C, plotted against ST; breakeven point where ST = K + C]

Figure 19.3b  Profit/Loss of a short call contract at T
[Short call profit: C - Max(0, ST - K), plotted against ST]

Figure 19.4a  Value of a long call contract at t < T
[$ call price plotted against asset price at t, St, showing time value above intrinsic value]

Figure 19.4b  Value of a long put contract at t < T
[$ put price plotted against asset price at t, St]
becomes max(0, K) or K. When the put is deep-out-of-the-money, i.e. St - K is
highly positive, then p(t,St) gets close to 0. The figures show that the call price
function (bold curve) is a convex function originating at the origin, while the
put price function is a convex function toward the origin. The dotted curve
represents the intrinsic value function of the option.
Consider the situation in Figure 19.4a, suppose at t the price of the option
is $3. This option price is also called the option premium. If the underlying
asset price is trading at $8, and if the call strike is K = $5.50, then the call
option is ITM. The intrinsic value of the call option is max(0, St K), or $2.50
in this case. If the option is OTM or ATM, the intrinsic value is defined as
zero.
The time value or time premium of the option is the option premium less
the intrinsic value. In this case, the time value is $3 - $2.50, or $0.50. The time
premium is the value the option buyer has to pay to buy the chance of the
option getting more into the money during the remaining life of the option.
The time value is intuitively increasing with term-to-maturity and with
volatility of the underlying asset return.
19.3
PUT-CALL PARITY
Equilibrium call and put prices are also related in certain ways. For a
European call and a European put with the same strike price K and same
maturity T, they are related by the put-call parity theorem:
$$c - p = S_t - K e^{-r(T-t)}.$$
Thus, the equilibrium put price can be inferred from the call price, and vice versa. Of course, in reality, market prices of calls and puts only approximately satisfy the above relationship, since there are transaction costs, and there are also bid-ask spreads not considered in the above formulation.
Ignoring the issue of dividends, American call and put prices are governed by the following bounds:
$$S_t - K \le C - P \le S_t - K e^{-r(T-t)}.$$
The easiest way to prove the above is to look at the payoffs in Table 19.1. It serves to prove $C - P \le S_t - K e^{-r(T-t)}$, or $P - C + S_t - K e^{-r(T-t)} \ge 0$. Whatever may happen, the initial portfolio will produce either zero or positive future payoff. To prevent arbitrage, the portfolio must cost something at t. Thus $P - C + S_t - K e^{-r(T-t)} \ge 0$.
The put-call parity theorem for European options is also proved using the table. In the European case, there is no exercise before T, so the payoffs according to the table will be zero whatever the future outcomes. To prevent arbitrage, the current outlay cost must then equal zero. Hence $p - c + S_t - K e^{-r(T-t)} = 0$.
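The parity relation can be checked numerically. The sketch below (all parameters are hypothetical, not from the text) simulates risk-neutral lognormal terminal prices and confirms that discounted call-minus-put payoffs match $S_t - K e^{-r(T-t)}$ up to Monte Carlo error:

```python
import math
import random

# Hypothetical parameters for illustration only.
random.seed(0)
S0, K, r, sigma, tau, n = 100.0, 95.0, 0.05, 0.20, 0.5, 100_000

disc = math.exp(-r * tau)
call_sum = put_sum = 0.0
for _ in range(n):
    z = random.gauss(0.0, 1.0)
    # Risk-neutral lognormal terminal price
    ST = S0 * math.exp((r - 0.5 * sigma**2) * tau + sigma * math.sqrt(tau) * z)
    # Path-by-path identity: max(0, ST - K) - max(0, K - ST) = ST - K
    assert abs(max(0.0, ST - K) - max(0.0, K - ST) - (ST - K)) < 1e-9
    call_sum += max(0.0, ST - K)
    put_sum += max(0.0, K - ST)

c, p = disc * call_sum / n, disc * put_sum / n
# c - p is close to S0 - K e^{-r tau}, up to Monte Carlo error
```

The path-by-path identity asserted inside the loop is exactly what makes the payoff columns of the arbitrage portfolio cancel.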
Table 19.1
Proof of $C - P \le S_t - K e^{-r(T-t)}$, or $P - C + S_t - K e^{-r(T-t)} \ge 0$

Position at t           Outlay at t          If no exercise till T         At tau < T, if call is exercised
                                             S_T > K        S_T <= K      against put holder; liquidate
                                                                          all other positions
Long put                -P                   0              K - S_T       +P(tau, S_tau) > 0 (sell put)
Short call              +C                   -(S_T - K)     0             -(S_tau - K)
Long stock              -S_t                 +S_T           +S_T          +S_tau (sell stock)
Borrow K e^{-r(T-t)}    +K e^{-r(T-t)}       -K             -K            -K e^{-r(T-tau)} > -K
Total                   -(P - C + S_t        0              0             P(tau, S_tau) + K
                         - K e^{-r(T-t)})                                  - K e^{-r(T-tau)} > 0

A second portfolio proves the lower bound $S_t - K \le C - P$, i.e. $C - P - S_t + K \ge 0$:

Position at t           Outlay at t          If no exercise till T         At tau < T, if put is exercised
                                             S_T > K        S_T <= K      against call holder; liquidate
                                                                          all other positions
Long call               -C                   S_T - K        0             +C(tau, S_tau) >= 0 (sell call)
Short put               +P                   0              -(K - S_T)    -(K - S_tau)
Short stock             +S_t                 -S_T           -S_T          -S_tau (buy back stock)
Lend K                  -K                   +K e^{r(T-t)}  +K e^{r(T-t)} +K e^{r(tau-t)} >= K
Total                   -(C - P - S_t + K)   K e^{r(T-t)}   K e^{r(T-t)}  C(tau, S_tau)
                                              - K > 0        - K > 0       + K e^{r(tau-t)} - K >= 0
Thus, whatever may happen, the initial portfolio will produce positive future payoff. To prevent arbitrage, the portfolio must cost something at t. Thus $C - P - S_t + K \ge 0$. The latter could be a strict inequality, just as the American put price is above the European put price: as long as the underlying is low enough, there is a positive probability of early exercise in the American put case.
However, it can be shown in the American call case without dividends that the call will not be exercised before maturity. Thus the American call price is equal to the European call price in this special case.
19.4
IMPLIED VOLATILITY
In 1987, the U.S. stock market saw a dramatic rise of 42% in its market value until October 19 that year, when over two days the market fell a whopping 23%. This has come to be known as the Stock Market Crash of '87 or Black Monday, and an enormous amount of study has been poured into the subject. Two of the most important empirical studies on this subject, Schwert (1990) and Bates (1991),⁹⁹ basically indicated that stock index options contained information beyond that carried in the stocks themselves that could shed light on the event. This highlights the importance of empirical studies of option prices in investment. Specifically, we shall explore the information contained in the volatility that is implied in option prices.
A graphical depiction of prices and implied volatility during the time of the October 1987 crash may be seen in Figure 19.5. The numbers are inferred from the Schwert study.
From the earlier chapters, we know the volatility of a stock or a stock index return is its standard deviation, usually expressed on an annual basis. Since the lognormal diffusion process underlying the Black-Scholes option pricing model has the random walk as its discrete analogue, we know the variance of return over horizon T, relative to the variance $\sigma^2$ over one unit of time, is $\sigma^2 T$. Hence, the volatility of return over T is $\sigma\sqrt{T}$.
Annualized historical volatility can be estimated as $\sqrt{T}\,\hat\sigma_d$, where the number of trading days in a year is T = 252, for example, and the daily volatility is estimated as
$$\hat\sigma_d = \sqrt{\frac{1}{251}\sum_{d=1}^{252}\left(r_d - \bar r\right)^2},\qquad \bar r = \frac{1}{252}\sum_{d=1}^{252} r_d,\qquad r_d = \ln\!\left(\frac{S_d}{S_{d-1}}\right),$$
where $S_d$ is the day-d stock or stock index price. You would note that $r_d$ is a
99 G. William Schwert, (1990), "Stock Volatility and the Crash of '87", The Review of Financial Studies, Vol. 3, No. 1, 77-102; and David Bates, (1991), "The Crash of '87: Was It Expected? The Evidence from Options Markets", Journal of Finance, Vol. 46, No. 3, 1009-1044.
continuously compounded return rate.
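As a sketch, the estimator above can be applied to a simulated daily price path (the path and its true volatility below are hypothetical, not the S&P data):

```python
import math
import random

# Simulate 252 daily prices from a lognormal random walk with a known
# annualized volatility, then apply the historical volatility estimator.
random.seed(1)
sigma_true = 0.25
dt = 1.0 / 252.0
prices = [100.0]
for _ in range(252):
    z = random.gauss(0.0, 1.0)
    prices.append(prices[-1] * math.exp(-0.5 * sigma_true**2 * dt
                                        + sigma_true * math.sqrt(dt) * z))

# r_d = ln(S_d / S_{d-1}): continuously compounded daily returns
r = [math.log(b / a) for a, b in zip(prices, prices[1:])]
rbar = sum(r) / len(r)
sigma_d = math.sqrt(sum((x - rbar) ** 2 for x in r) / (len(r) - 1))
sigma_annual = math.sqrt(252) * sigma_d   # should be close to sigma_true
```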
Figure 19.5
S&P 500 Index and Implied Volatility in October 1987 During the U.S. Stock Market Crash
[Chart: vertical axis from about 10 to 35, plotting implied volatility (%) and the S&P 500 Index divided by 10; horizontal axis shows the dates 13, 14, 15, 16, 19, 20, 21 of October 1987.]
$$c(t,S_t) = S_t\,N(x) - K e^{-r\tau}\,N\!\left(x - \sigma\sqrt{\tau}\right)$$
$$p(t,S_t) = K e^{-r\tau}\,N\!\left(-x + \sigma\sqrt{\tau}\right) - S_t\,N(-x)$$   (19.1)
where
$$x = \frac{\ln(S_t/K) + \left(r + \tfrac{1}{2}\sigma^2\right)\tau}{\sigma\sqrt{\tau}}$$
and $\tau \equiv T - t$.
Black, F., and M. Scholes, 1973, "The Pricing of Options and Corporate Liabilities", Journal of Political Economy 81, 637-659.
The constant volatility in (19.1) can be generalized to one where the volatility $\sigma(t)$ is a deterministic function of time, with the average variance
$$\bar\sigma^2 = \frac{1}{\tau}\int_t^T \sigma(s)^2\,ds$$
replacing $\sigma^2$. Thus, an implied volatility $\sigma$ can be found such that the call price observed at t, c, is equal to
$$c = S_t\,N(x) - K e^{-r\tau}\,N\!\left(x - \sigma\sqrt{\tau}\right),\qquad x = \frac{\ln(S_t/K) + \left(r + \tfrac{1}{2}\sigma^2\right)\tau}{\sigma\sqrt{\tau}}.$$
Dates    SPX index    Strike K    Time-to-maturity    Riskfree rate    Call price
1/03     1202.08      1200        73/365              3% p.a.          31.5
3/01     1210.41      1200        80/365              3% p.a.          33.0
The implied volatilities for the nearest the money calls on Jan 03 and Mar 01
are 12.5% and 10.1% respectively.
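These implied volatilities can be reproduced by inverting (19.1) numerically; since the call price is increasing in $\sigma$, simple bisection suffices. A minimal sketch (the bracket and tolerance are arbitrary choices):

```python
from math import exp, log, sqrt
from statistics import NormalDist

N = NormalDist().cdf

def bs_call(S, K, r, tau, sigma):
    """Black-Scholes European call price, equation (19.1)."""
    x = (log(S / K) + (r + 0.5 * sigma * sigma) * tau) / (sigma * sqrt(tau))
    return S * N(x) - K * exp(-r * tau) * N(x - sigma * sqrt(tau))

def implied_vol(c_obs, S, K, r, tau, lo=1e-4, hi=3.0, tol=1e-8):
    """Solve bs_call(..., sigma) = c_obs by bisection."""
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if bs_call(S, K, r, tau, mid) < c_obs:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

iv_jan = implied_vol(31.5, 1202.08, 1200.0, 0.03, 73 / 365)   # about 0.125
iv_mar = implied_vol(33.0, 1210.41, 1200.0, 0.03, 80 / 365)   # about 0.101
```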
19.5
EFFICIENT FORECAST
Suppose the underlying asset price follows the lognormal diffusion
$$\frac{dS_t}{S_t} = \mu\,dt + \sigma\,dW_t,$$   (19.2)
where $W_t$ is a Wiener process or Brownian motion, so that $\sigma$ is the volatility of the rate of return process of the asset. This same $\sigma$ appears in the Black-Scholes option price in (19.1). At time t, when we find the implied volatility of an option that matures at time T, this $\hat\sigma_{[t,T]}$ is assumed to be the market's risk-neutral assessment of the volatility of $dS_t/S_t$ that will happen over [t,T].
Suppose we compute the realized (ex-post) volatility of $dS_t/S_t$ by
$$\sigma^*_{[t,T]} = \sqrt{\frac{1}{N}\sum_{k=t+1}^{T}\left(r_k - \bar r\right)^2},$$
where N is the number of days between t and T, $r_k$ is the daily return rate at day k, and $\bar r$ is the daily mean return rate over [t,T]. Then it would be reasonable to test if $\hat\sigma_{[t,T]}$ contains information about $\sigma^*_{[t,T]}$.
Christensen and Prabhala (1998)¹⁰¹ used monthly European S&P 100 index options from November 1983 to May 1995 to compute implied volatilities $\hat\sigma_{[t,t+1]}$ of the underlying S&P 100 index returns. The subscript [t,t+1] means that the implied volatility is obtained at the end of month t for a horizon till the end of the next month, which is the option's expiry. The options have about one month left to maturity. At the same time, they computed realized index return volatilities over the month ahead, $\sigma^*_{[t,t+1]}$. The options are nearest-the-money calls. Thus, for the sampling period t = Nov 83 to T = May 95, using 139 monthly sample points, the following non-overlapping regression was performed:
$$\ln\!\left(\sigma^*_{[t,t+1]}\right) = a + b\,\ln\!\left(\hat\sigma_{[t,t+1]}\right) + e_t.$$   (19.3)
101 Christensen, B.J. and N.R. Prabhala, (1998), "The relation between implied and realized volatility", Journal of Financial Economics 50, 125-150.
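A sketch of regression (19.3) on simulated data (the S&P 100 series itself is not reproduced here; the intercept 0.1, slope 0.9, and noise level below are assumptions for illustration only):

```python
import math
import random

# Simulate 139 months of log implied vols, and log realized vols that track
# them with noise, then run OLS of log realized on log implied.
random.seed(2)
T = 139
log_iv = [math.log(0.15) + 0.3 * random.gauss(0, 1) for _ in range(T)]
log_rv = [0.1 + 0.9 * x + 0.2 * random.gauss(0, 1) for x in log_iv]

xbar = sum(log_iv) / T
ybar = sum(log_rv) / T
sxx = sum((x - xbar) ** 2 for x in log_iv)
sxy = sum((x - xbar) * (y - ybar) for x, y in zip(log_iv, log_rv))
b_hat = sxy / sxx               # OLS slope, close to the assumed 0.9
a_hat = ybar - b_hat * xbar     # OLS intercept
```

A slope $b$ near one with an intercept near zero would indicate that implied volatility is an unbiased forecast of realized volatility.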
19.6
PROBLEM SET
19.1
Two call options on the same underlying stock are traded such that one has a strike price of $2.05 and the other has a strike price of $2.10. Both have the same maturity in 3 months. What are the possible reasons that their implied volatilities derived from the Black-Scholes model are different?
19.2
Company A has just announced the surprising news that its CEO is quitting, and that a search for a replacement will be started. In light of this, supposing options are available on the stocks of this firm, what would you expect the behavior of the implied volatility to be? In the event studies you have come across, what may be a necessary adjustment to make in trying to test the significance of any abnormal returns post-announcement?
Chapter 20
GENERALIZED METHOD OF MOMENTS
APPLICATION: CONSUMPTION-BASED ASSET
PRICING
Key Points of Learning:
Consumption, Utility, Representative agent, Utility-based asset pricing model,
Euler condition, Moment restrictions, Orthogonality conditions, Consumption
beta, Equity premium puzzle, Hansen-Jagannathan bound, Generalized
method of moments (GMM), Asymptotic chi-square test, Newey-West
covariance estimator
The theory underlying this chapter is treated in excellent books such as Huang and Litzenberger (1988) and Ingersoll (1987). Similarly excellent books in macroeconomics but using utility-based frameworks are Blanchard and Fischer (1989) and Stokey and Lucas (1989). These classics are shown in the further reading list at the end of the chapter.
20.1
A representative agent chooses consumption to maximize expected lifetime utility:
$$\max_{C_t}\; E_t\!\left[\sum_{k=0}^{\infty}\beta^k\,U(C_{t+k})\right]$$   (20.1)
Trading off a marginal unit of consumption at t against investing it at return $R_{t+1}$ and consuming the proceeds at t+1 gives
$$-U_C(C_t) + E_t\!\left[\beta\left(1+R_{t+1}\right)U_C(C_{t+1})\right] = 0.$$   (20.2)
Note that when the substitution is optimal as above, the equation has a net gain of zero, hence the 0 on the RHS. Then,
$$E_t\!\left[\left(1+R_{t+1}\right)\frac{\beta\,U_C(C_{t+1})}{U_C(C_t)}\right] = 1.$$   (20.3)
Equation (20.3) is an Euler condition. (20.3) applies to all traded assets in the economy. Let any such asset with price $P_t$ at t have random price $P_{t+1}$ at t+1. Then $1+R_{t+1} = P_{t+1}/P_t$, and we substitute this into (20.3). Thus,
$$E_t\!\left[\frac{\beta\,U_C(C_{t+1})}{U_C(C_t)}\,P_{t+1}\right] = P_t.$$   (20.4)
Let $M_{t+1} \equiv \dfrac{\beta\,U_C(C_{t+1})}{U_C(C_t)}$ be the kernel within the expectation in (20.4). Then (20.4) can be re-written as
$$E_t\!\left[M_{t+1}P_{t+1}\right] = P_t.$$   (20.5)
Dividing by $P_t$ and taking unconditional expectations,
$$E\!\left[M_{t+1}\left(1+R_{t+1}\right)\right] = 1.$$   (20.6)
Expanding the expectation of the product,
$$E(M_{t+1})\,E(1+R_{t+1}) + \mathrm{Cov}(M_{t+1}, R_{t+1}) = 1.$$   (20.7)
Since the riskfree asset with return $R_f$ should also satisfy (20.6), we have $(1+R_f)\,E(M_{t+1}) = 1$, or $E(M_{t+1}) = 1/(1+R_f)$. Hence (20.7) becomes
$$E(R_{t+1}) - R_f = -\,\frac{\mathrm{Cov}(M_{t+1}, R_{t+1})}{E(M_{t+1})}$$   (20.8)
or,
$$E(R_{t+1}) - R_f = \frac{\mathrm{Cov}(-M_{t+1}, R_{t+1})}{\mathrm{Var}(M_{t+1})}\cdot\frac{\mathrm{Var}(M_{t+1})}{E(M_{t+1})}.$$   (20.9)
If we put
$$\beta_C = \frac{\mathrm{Cov}(-M_{t+1}, R_{t+1})}{\mathrm{Var}(M_{t+1})}\qquad\text{and}\qquad \lambda_M = \frac{\mathrm{Var}(M_{t+1})}{E(M_{t+1})},$$
then $\lambda_M$ is a market risk premium or the price of risk common to all assets, and $\beta_C$ is a consumption beta specific to the asset with return $R_{t+1}$, giving the consumption-based capital asset pricing model (CCAPM). The intuition of this beta is as follows. Suppose the asset's consumption beta $\beta_C > 0$, so that its return $R_{t+1}$ correlates negatively with $M_{t+1}$. This implies a positive correlation between $R_{t+1}$ and $C_{t+1}$, since $U_{CC} < 0$, due to decreasing marginal utility to increasing consumption (U is a concave function), makes $M_{t+1}$ decreasing in $C_{t+1}$. Thus holding that asset adds to consumption volatility, which would require risk compensation in the form of a higher expected return for the asset.
(20.8) can be re-written as
$$E(R_{t+1}) - R_f = -\,\rho_{M,R}\,\frac{\sigma_M\,\sigma_R}{E(M_{t+1})},\qquad\text{or}\qquad \frac{E(R_{t+1}) - R_f}{\sigma_R} = -\,\rho_{M,R}\,\frac{\sigma_M}{E(M_{t+1})} \le \frac{\sigma_M}{E(M_{t+1})}.$$   (20.10)
The last inequality, which follows since $|\rho_{M,R}| \le 1$, is the Hansen-Jagannathan bound on the Sharpe ratio.
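The covariance decomposition behind (20.7) can be verified numerically. The joint draws of kernel and return below are hypothetical; the check is the exact sample analogue of $E[M(1+R)] = E(M)E(1+R) + \mathrm{Cov}(M,R)$:

```python
import random

random.seed(3)
n = 10000
# Hypothetical joint draws of kernel M and net return R
M = [max(0.01, 0.97 + 0.1 * random.gauss(0, 1)) for _ in range(n)]
R = [0.06 - 0.5 * (m - 0.97) + 0.05 * random.gauss(0, 1) for m in M]

def mean(v):
    return sum(v) / len(v)

def cov(u, v):
    mu, mv = mean(u), mean(v)
    return sum((a - mu) * (b - mv) for a, b in zip(u, v)) / len(u)

# Sample analogue of (20.7): E[M(1+R)] = E(M) E(1+R) + Cov(M, R), which holds
# exactly as an algebraic identity for any joint sample.
lhs = mean([m * (1 + r) for m, r in zip(M, R)])
rhs = mean(M) * (1 + mean(R)) + cov(M, R)
```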
The GMM was developed by Prof Lars P. Hansen (1982), "Large Sample Properties of Generalized Method of Moments Estimators", Econometrica 50, 1029-1054.
Suppose a theoretical model implies K moment restrictions on data $X_t$ and an m × 1 parameter vector $\theta$:
$$E[f_1(X_t,\theta)] = 0,\quad E[f_2(X_t,\theta)] = 0,\quad\ldots,\quad E[f_K(X_t,\theta)] = 0.$$   (20.11)
The sample analogues, evaluated at an estimator $\hat\theta$, are set close to zero:
$$\frac{1}{T}\sum_{t=1}^{T} f_j(x_t,\hat\theta) \approx 0,\qquad j = 1, 2, \ldots, K.$$   (20.12)
The Law of Large Numbers would tell us that as $T\to\infty$, these sample moments evaluated at $\theta$ converge to $E[f_j(X_t,\theta)]$ for all j = 1, 2, ..., K. Hence, if these conditions are approximately zero at $\hat\theta$, then intuitively the vector estimator $\hat\theta$ would also be close to $\theta$ in some fashion, given that $\theta$ is unique. We would also assume some regularity conditions on the moments $E[f_j(\cdot)]$ so that $E[f_j(\tilde\theta)]$ behaves smoothly and does not jump about as the value $\tilde\theta$ gets closer and closer to $\theta$. Let all the observable values of sample size T be $Y_T \equiv \{x_t\}_{t=1,2,\ldots,T}$.
Let
$$g(Y_T,\theta) \equiv \begin{pmatrix}\frac{1}{T}\sum_{t=1}^{T} f_1(x_t,\theta)\\ \frac{1}{T}\sum_{t=1}^{T} f_2(x_t,\theta)\\ \vdots\\ \frac{1}{T}\sum_{t=1}^{T} f_K(x_t,\theta)\end{pmatrix}.$$
The GMM estimator solves
$$\min_{\theta}\; g(Y_T,\theta)^T\,W_T(Y_T,\theta)\,g(Y_T,\theta).$$   (20.13)
Note that if $W_T(\cdot,\cdot)$ is any arbitrary symmetric positive definite matrix, then the estimator is consistent; moreover, an optimal weighting matrix function $W_T^*(\cdot,\cdot)$ can be found so that the estimators will be asymptotically efficient, or have the lowest asymptotic covariance in the class of estimators satisfying (20.13) for arbitrary $W_T(\cdot,\cdot)$.
Let the vector function $F_{K\times1}(X_t,\theta) = \left(f_1(X_t,\theta),\ f_2(X_t,\theta),\ \ldots,\ f_K(X_t,\theta)\right)^T$. Then,
$$g(Y_T,\theta) = \frac{1}{T}\sum_{t=1}^{T} F(x_t,\theta).$$
Let
$$\Omega_0 \equiv \sum_{j=-N}^{N} E\!\left[F(X_t,\theta)\,F(X_{t-j},\theta)^T\right].$$
The minimization of (20.13) gives the first order condition:
$$2\,\frac{\partial g(Y_T,\hat\theta)^T}{\partial\theta}\,W_T(Y_T,\hat\theta)\,g(Y_T,\hat\theta) = 0_{m\times1},$$
or
$$\frac{\partial g(Y_T,\hat\theta)^T}{\partial\theta}\,W_T(Y_T,\hat\theta)\,g(Y_T,\hat\theta) = 0_{m\times1}.$$   (20.14)
In the first step, $W_T$ is initially selected as $I_{K\times K}$. The solution $\hat\theta_1$ is consistent but not efficient. The consistent estimates $\hat\theta_1$ in this first step are then employed to find the optimal weighting matrix $W_T^*(Y_T,\hat\theta_1)$. Let
$$\hat\Omega_0(\hat\theta_1) = \frac{1}{T}\sum_{t=1}^{T} F(X_t,\hat\theta_1)F(X_t,\hat\theta_1)^T + \sum_{j=1}^{N}\left[\frac{1}{T}\sum_{t=j+1}^{T} F(X_t,\hat\theta_1)F(X_{t-j},\hat\theta_1)^T + \frac{1}{T}\sum_{t=j+1}^{T} F(X_{t-j},\hat\theta_1)F(X_t,\hat\theta_1)^T\right].$$   (20.15)
Then employ $W_T^* = \hat\Omega_0(\hat\theta_1)^{-1}$ as the optimal weighting matrix in (20.13) and minimize the function again in the second step to obtain the efficient and consistent GMM estimator $\theta^*$. Its first order condition is
$$\frac{\partial g(Y_T,\theta^*)^T}{\partial\theta}\,\hat\Omega_0^{-1}\,g(Y_T,\theta^*) = 0_{m\times1}.$$
As $T\to\infty$, $\hat\theta_1 \to \theta$, $\theta^* \to \theta$, $\hat\Omega_0 \to \Omega_0$, and
$$T\,g(Y_T,\theta^*)^T\,\hat\Omega_0^{-1}\,g(Y_T,\theta^*)\;\xrightarrow{d}\;\chi^2_{K-m}.$$   (20.16)
Sometimes the LHS is called the J-statistic, $J_T \equiv T\,g(Y_T,\theta^*)^T\,\hat\Omega_0^{-1}\,g(Y_T,\theta^*)$. Notice that this test statistic is asymptotically chi-square with K − m degrees of freedom, and not the K degrees of freedom it would have if the population parameter $\theta$ were instead in the arguments. This is because m degrees of freedom were taken up by the m linear dependencies created in the first order condition (20.14) in the solution for $\theta^*$. (20.16) also indicates that a test is only possible when K > m, i.e. the number of moment conditions or restrictions is greater than the number of parameters to be estimated. We can always create additional moment conditions by using instruments. One common instrument is lagged variables contained in the information set at t. If the moment conditions such as those in (20.11) are generated by conditional moments such as (20.6), then it is easy to enter the information variables observed at t into the expectation operator in (20.6), and then take iterated expectations on the null information set to arrive at unconditional moments such as in (20.11).
For example, a theoretical model may prescribe $E_{t-1}[f_1(X_t,\theta)] = 0$ as a conditional expectation. By the iterated expectation theorem, this leads to a moment restriction in (20.11). Since $X_{t-1}$ is observed at t−1, we can add it as an instrument to obtain $E_{t-1}[f_1(X_t,\theta)X_{t-1}] = 0$, hence another moment restriction $E[f_1(X_t,\theta)X_{t-1}] = 0$. From Appendix A, we may say the vector $f_1(X_t,\theta)$ is orthogonal to $X_{t-1}$ for any realization of $X_{t-1}$, $X_t$. Thus, such moment restrictions are sometimes also called orthogonality conditions.
The excess of the number of moment conditions over the number of parameters is called the number of overidentifying restrictions, and is the number of degrees of freedom of the asymptotic $\chi^2$ test.
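The two-step procedure and the J-test can be sketched on a toy overidentified model (this example is not from the text): X is drawn i.i.d. exponential with mean $\theta$, and we use the K = 2 moment conditions $E[X-\theta]=0$ and $E[X^2-2\theta^2]=0$ for the single parameter, leaving one overidentifying restriction.

```python
import math
import random

random.seed(4)
T = 5000
theta_true = 2.0
x = [random.expovariate(1.0 / theta_true) for _ in range(T)]

def gbar(theta):
    """Sample moment vector g(Y_T, theta)."""
    g1 = sum(xi - theta for xi in x) / T
    g2 = sum(xi * xi - 2.0 * theta * theta for xi in x) / T
    return (g1, g2)

def minimize(obj, lo, hi, iters=80):
    """One-dimensional golden-section search on [lo, hi]."""
    phi = (math.sqrt(5.0) - 1.0) / 2.0
    a, b = lo, hi
    for _ in range(iters):
        c, d = b - phi * (b - a), a + phi * (b - a)
        if obj(c) < obj(d):
            b = d
        else:
            a = c
    return 0.5 * (a + b)

# Step 1: identity weighting matrix
q1 = lambda th: sum(g * g for g in gbar(th))
th1 = minimize(q1, 0.1, 10.0)

# Step 2: weight by the inverse moment covariance at th1 (i.i.d. data, no lags)
f = [(xi - th1, xi * xi - 2.0 * th1 * th1) for xi in x]
S = [[sum(a[i] * a[j] for a in f) / T for j in range(2)] for i in range(2)]
det = S[0][0] * S[1][1] - S[0][1] * S[1][0]
W = [[S[1][1] / det, -S[0][1] / det], [-S[1][0] / det, S[0][0] / det]]

def q2(th):
    g = gbar(th)
    return sum(g[i] * W[i][j] * g[j] for i in range(2) for j in range(2))

th2 = minimize(q2, 0.1, 10.0)
J = T * q2(th2)    # asymptotically chi-square with K - m = 1 degree of freedom
```

With the model correctly specified, $J_T$ should look like a single $\chi^2_1$ draw; values far beyond the 5% critical value 3.84 would reject the moment restrictions.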
To derive the asymptotic distribution of $\theta^*$, expand $g(Y_T,\theta^*)$ by a linear Taylor series about the true population parameter $\theta$:
$$g(\theta^*) = g(\theta) + \frac{\partial g(\tilde\theta)}{\partial\theta^T}\,(\theta^* - \theta),$$
where $\tilde\theta$ is some linear combination of $\theta$ and $\theta^*$; $\tilde\theta$ is also consistent. Pre-multiplying by the m × K matrix $\frac{\partial g(Y_T,\theta^*)^T}{\partial\theta}\,\hat\Omega_0^{-1}$ and using the first order condition, we obtain
$$0 = \frac{\partial g(Y_T,\theta^*)^T}{\partial\theta}\,\hat\Omega_0^{-1}\,g(\theta) + \frac{\partial g(Y_T,\theta^*)^T}{\partial\theta}\,\hat\Omega_0^{-1}\,\frac{\partial g(\tilde\theta)}{\partial\theta^T}\,(\theta^* - \theta),$$
or
$$\sqrt{T}\,(\theta^* - \theta) = -\left[\frac{\partial g(Y_T,\theta^*)^T}{\partial\theta}\,\hat\Omega_0^{-1}\,\frac{\partial g(\tilde\theta)}{\partial\theta^T}\right]^{-1}\frac{\partial g(Y_T,\theta^*)^T}{\partial\theta}\,\hat\Omega_0^{-1}\,\sqrt{T}\,g(\theta).$$
Since $\sqrt{T}\,g(\theta)\xrightarrow{d} N(0,\Omega_0)$, asymptotically
$$\sqrt{T}\,(\theta^* - \theta)\;\xrightarrow{d}\;N(0, V),\qquad V = \left[\frac{\partial g(Y_T,\theta^*)^T}{\partial\theta}\,\Omega_0^{-1}\,\frac{\partial g(Y_T,\theta^*)}{\partial\theta^T}\right]^{-1},$$   (20.17)
where V is an m × m matrix.
(20.16) is the test statistic measuring the sampling moment deviations from the means imposed by the theoretical restrictions in (20.11). If the test statistic is too large and exceeds the critical boundaries of the chi-square random variable, then the moment conditions of (20.11), and thus the theoretical restrictions, would be rejected. (20.17) provides the asymptotic standard errors of the GMM estimator $\theta^*$ that can be utilized to infer if the estimates are statistically significant at a given significance level.
The estimator $\hat\Omega_0$ in (20.15) is consistent. However, in finite samples, this computed matrix may not be positive semi-definite. This may result in (20.16) being negative. The Newey-West HAC (heteroskedasticity and autocorrelation consistent covariance) matrix estimator¹⁰⁴ provides a positive semi-definite covariance estimator of $\Omega_0$ that can be used. This takes the form
$$\hat\Omega_0 = \frac{1}{T}\sum_{t=1}^{T}F(X_t,\hat\theta_1)F(X_t,\hat\theta_1)^T + \sum_{j=1}^{N}\left(1 - \frac{j}{N+1}\right)\left(\hat\Gamma_j + \hat\Gamma_j^T\right),$$
where
$$\hat\Gamma_j = \frac{1}{T}\sum_{t=j+1}^{T}F(X_t,\hat\theta_1)F(X_{t-j},\hat\theta_1)^T.$$
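A pure-Python sketch of this estimator (the AR(1) moment series, and the choices of K, N, and T, are for illustration only):

```python
import random

random.seed(5)
T, K, N = 500, 2, 4
# Simulate an autocorrelated K-dimensional moment series F_t
F = [[0.0] * K]
for _ in range(T - 1):
    prev = F[-1]
    F.append([0.6 * prev[k] + random.gauss(0, 1) for k in range(K)])

def outer_sum(lag):
    """(1/T) * sum_{t=lag+1}^{T} F_t F_{t-lag}^T as a K x K matrix."""
    G = [[0.0] * K for _ in range(K)]
    for t in range(lag, T):
        for i in range(K):
            for j in range(K):
                G[i][j] += F[t][i] * F[t - lag][j] / T
    return G

# Bartlett-weighted Newey-West estimate of the long-run covariance
omega = outer_sum(0)
for j in range(1, N + 1):
    w = 1.0 - j / (N + 1.0)
    Gj = outer_sum(j)
    for a in range(K):
        for b in range(K):
            omega[a][b] += w * (Gj[a][b] + Gj[b][a])
```

The Bartlett weights $1 - j/(N+1)$ taper the lagged autocovariances, which is what guarantees the resulting matrix is positive semi-definite.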
20.3
Consider again the Euler condition
$$E_t\!\left[\left(1+R_{t+1}\right)\frac{\beta\,U_C(C_{t+1})}{U_C(C_t)}\right] = 1$$
under constant relative risk aversion utility, for which $U_C(C_t) = C_t^{-\gamma}$. The condition becomes
$$E_t\!\left[\beta\left(1+R_{t+1}\right)Q_{t+1}^{-\gamma}\right] = 1,$$   (20.18)
where $Q_{t+1} \equiv C_{t+1}/C_t$ is the per capita consumption ratio.
Since there are two parameters, $\beta$ and $\gamma$, to be estimated, we require at least 3 moment restrictions. This will yield 1 overidentifying restriction. We could employ the lagged values of the $(1+R_t)$'s or $Q_t$'s as instruments. Here we form 3 moment restrictions:
(a) $E\!\left[\beta\left(1+R_{t+1}\right)Q_{t+1}^{-\gamma} - 1\right] = 0$,
(b) $E\!\left[\left(\beta\left(1+R_{t+1}\right)Q_{t+1}^{-\gamma} - 1\right)\left(1+R_t\right)\right] = 0$,
(c) $E\!\left[\left(\beta\left(1+R_{t+1}\right)Q_{t+1}^{-\gamma} - 1\right)\left(1+R_{t-1}\right)\right] = 0$.
104 See Newey, W.K. and Kenneth D. West, (1987), "A Simple, Positive Semi-Definite, Heteroskedasticity and Autocorrelation Consistent Covariance Matrix", Econometrica, Vol. 55, No. 3, 703-708.
If a distributional assumption is made about $Q_t$, then the Maximum Likelihood method can be employed.¹⁰⁵ Using the GMM method discussed in the previous section, the GMM estimation and test statistics are shown as follows. The quarterly consumption data from 2000 to 2009 used in the analysis are obtained from the public website of the Bureau of Economic Analysis of the U.S. Department of Commerce. In particular, the real durable consumption series is divided by population to obtain the per capita durable consumption. $Q_t$ measures the quarterly growth ratio.
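Before turning to the estimation output, the moment construction can be sketched in code. The consumption-growth and return series below are simulated so that the Euler equation holds at $\beta = 0.99$, $\gamma = 2$ by construction (these values and the noise parameters are assumptions, not the data behind Table 20.1); the three sample moments (a)-(c) should then be near zero at the true parameters:

```python
import math
import random

random.seed(6)
beta, gamma = 0.99, 2.0
T = 20000

# Persistent log consumption growth (AR(1)), so lagged returns are valid instruments
logQ = [0.0] * (T + 2)
for t in range(1, T + 2):
    logQ[t] = 0.5 * logQ[t - 1] + random.gauss(0.005, 0.05)
Q = [math.exp(q) for q in logQ]

# Multiplicative pricing error with E[eps] = 1 (lognormal mean correction)
eps = [math.exp(random.gauss(-0.0008, 0.04)) for _ in range(T + 2)]
# Gross returns built so that E_t[beta (1+R_{t+1}) Q_{t+1}^{-gamma}] = 1 exactly
gross = [Q[t] ** gamma / beta * eps[t] for t in range(T + 2)]

def sample_moments(b, g):
    """Sample analogues of moment restrictions (a), (b), (c)."""
    m = [0.0, 0.0, 0.0]
    for t in range(1, T + 1):
        u = b * gross[t + 1] * Q[t + 1] ** (-g) - 1.0
        m[0] += u / T
        m[1] += u * gross[t] / T        # instrument: lagged gross return
        m[2] += u * gross[t - 1] / T    # instrument: twice-lagged gross return
    return m

m = sample_moments(beta, gamma)         # all three near zero at the true values
```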
Table 20.1
GMM Estimation of the Euler Equation under Constant Relative Risk Aversion

Dependent Variable: Implicit Equation
Method: Generalized Method of Moments
Included observations: 37 after adjustments
Estimation weighting matrix: HAC Newey-West
Standard errors & covariance computed using estimation weighting matrix
Convergence achieved after 13 iterations
C(1)*MKTRET*CONS^C(2)-1
Instrument specification: MKTRET(-1) MKTRET(-2)
Constant added to instrument list

                  Coefficient    Std. Error    t-Statistic    Prob.
C(1)              0.998049       0.023597      42.29641       0.0000
C(2)             -1.361539       0.514941     -2.644066       0.0122

Mean dependent var    0.000000      S.D. dependent var    0.000000
S.E. of regression    0.091516      Sum squared resid     0.293129
Durbin-Watson stat    1.897556      J-statistic           4.115583
Instrument rank       3             Prob(J-statistic)     0.042490
From Table 20.1, it is seen that the time discount factor $\beta$ is estimated to be 0.998. The relative risk aversion coefficient $\gamma$ is estimated to be 1.36. Both are significantly different from zero at p-values of less than 2%. The J-statistic, which is distributed as $\chi^2$ with 1 degree of freedom, is 4.12 with a p-value of 0.0425. Therefore, the moment restrictions (a), (b), and (c) implied by the model and rational expectations are not rejected at the 1% significance level, though they would be rejected at the 5% level. The result is similar to those reported in Hansen and Singleton (1982).¹⁰⁶ In this GMM estimation, the covariance matrix of the sampling moments is estimated using the Newey-West estimator.
105 See Hansen, L.P., and K.J. Singleton (1983), "Stochastic consumption, risk aversion, and the temporal behavior of asset returns", Journal of Political Economy, Vol. 91, 249-268.
A three-moment asset pricing model incorporating co-skewness,
$$E(R_i) = \frac{\sigma_0^2\,\beta_i + \theta\,m_0^3\,\gamma_i}{\sigma_0^2 + \theta\,m_0^3}\;E(R_m),$$   (20.19)
can be rigorously tested for its full implications using the GMM.¹⁰⁸ The $R_i$'s and $R_m$ are the ith asset's and the market's excess return rates. $\beta_i$ is the usual market beta $\sigma_{im}/\sigma_m^2$. $\gamma_i$ is the ith asset's co-skewness with the market, $E\{(R_i - E(R_i))(R_m - E(R_m))^2\}/m_0^3$; $\mu_0$, $\sigma_0^2$, and $m_0^3$ are the usual mean, variance, and third central moment of the excess market return. $\theta$ is the marginal rate of substitution of terminal wealth mean for terminal wealth volatility. Then the moment restrictions testable under the null of (20.19), with parameters the $\beta_i$'s, $\gamma_i$'s, $\mu_0$, $\sigma_0$, and $\theta$ to be estimated, are:
$$E\!\left[R_i - \frac{\sigma_0^2\,\beta_i + \theta\,m_0^3\,\gamma_i}{\sigma_0^2 + \theta\,m_0^3}\,R_m\right] = 0,\quad\text{for all } i,$$
$$E\!\left[R_i R_m - \mu_0 R_i - \beta_i\left(R_m^2 - \mu_0 R_m\right)\right] = 0,\quad\text{for all } i,$$
$$E\!\left[R_i R_m^2 - 2\mu_0 R_i R_m + \left(\mu_0^2 - \sigma_0^2\right)R_i - \gamma_i\left(R_m - \mu_0\right)^3\right] = 0,\quad\text{for all } i,$$
and
$$E\!\left[R_m - \mu_0\right] = 0,\qquad E\!\left[\left(R_m - \mu_0\right)^2 - \sigma_0^2\right] = 0,\qquad E\!\left[\left(R_m - \mu_0\right)^3 - m_0^3\right] = 0.$$
Since 1982, the GMM technique has been one of the most widely applied
methods for empirical estimation and testing especially of nonlinear rational
expectations models in many fields of economics and finance. This attests to
its usefulness in many situations when serial correlations or lack of
distributional information hampers empirical investigation. However, its
drawback is its reliance on asymptotic theory. Many Monte Carlo and
econometric studies have arisen to investigate how inference under the GMM
can be improved.
20.5
PROBLEM SET
20.1
20.2
Suppose a valuation model implies $E_t\!\left[M_{t+1}P_{t+1} - P_t\right] = 0$. Given variables $G_t$ and $H_t$ that are observable at t, show how you would test the restriction
$$E\!\left[\frac{H_t}{G_t}\left(M_{t+1}\frac{P_{t+1}}{P_t} - 1\right)\right] = 0$$
using GMM.
20.3
Show how you would apply GMM to estimate a and b and to test the linear regression model $Y_t = a + bX_t + e_t$, where $E_{t-1}(e_t) = 0$ and $\mathrm{var}_{t-1}(e_t) = \sigma^2$, though $e_t$ is serially correlated. We also do not know the distribution of $e_t$. $e_t$ is also correlated with $X_t$. Assume $X_{t-1}$ is observed. Assume $\{X_t\}$ and $\{Y_t\}$ are stationary series.
20.4
20.5
What are the advantages and disadvantages of the GMM versus the
maximum likelihood estimation methods?
Appendix A
MATRIX ALGEBRA
A1.
MATRICES
$$\begin{pmatrix}2 & 1 & 3 & 4\\ 4 & 2 & 0 & 8\\ 5 & 1 & 7 & 12\end{pmatrix}$$ is a 3 x 4 matrix. The 2nd row and 3rd column element is $a_{23} = 0$. $(2\ \ 1\ \ 3\ \ 4)$ is a row vector, and $(2,\ 4,\ 5)^T$ is a column vector.
A vector is a special matrix. A scalar is a number, e.g. 3. A scalar $\lambda$ multiplied by a matrix M is a scalar operation, such that $\lambda\,[a_{ij}]_{r\times c} = [\lambda a_{ij}]_{r\times c}$. As an illustration,
$$3\begin{pmatrix}2 & 3\\ 1 & 4\\ 6 & 1\end{pmatrix} = \begin{pmatrix}6 & 9\\ 3 & 12\\ 18 & 3\end{pmatrix}.$$
375
Matrix addition is element-by-element:
$$\begin{pmatrix}a & b\\ c & d\end{pmatrix} + \begin{pmatrix}e & f\\ g & h\end{pmatrix} = \begin{pmatrix}a+e & b+f\\ c+g & d+h\end{pmatrix}.$$
Matrix multiplication of conformable matrices works as
$$\begin{pmatrix}a & b & c\\ d & e & f\end{pmatrix}_{2\times3}\begin{pmatrix}u & x\\ v & y\\ w & z\end{pmatrix}_{3\times2} = \begin{pmatrix}au+bv+cw & ax+by+cz\\ du+ev+fw & dx+ey+fz\end{pmatrix}_{2\times2}.$$
As a numerical illustration,
$$\begin{pmatrix}3 & 1 & 2\\ 1 & 0 & 3\\ 4 & 0 & 2\end{pmatrix}\begin{pmatrix}0 & -\tfrac{1}{5} & \tfrac{3}{10}\\ 1 & -\tfrac{1}{5} & -\tfrac{7}{10}\\ 0 & \tfrac{2}{5} & -\tfrac{1}{10}\end{pmatrix} = \begin{pmatrix}1 & 0 & 0\\ 0 & 1 & 0\\ 0 & 0 & 1\end{pmatrix}.$$
[aij]nxn is a square matrix. For a square matrix, if all diagonal elements a ii
are one, and all non-diagonal elements aij (i j) are zero, it is called an
identity matrix, usually denoted by Inxn. Each row or column of I is a unit
vector. However, these unit vectors are themselves different because their
units occur at different positions in the vector.
For a square matrix where not all diagonal elements are zero, and all nondiagonal elements are zero, it is a diagonal matrix. This is different from Inxn
since diagonal elements need not be all ones.
The trace of a square matrix with dimension n × n, or dim(n), is the sum of the n diagonal elements:
$$\mathrm{tr}\left([a_{ij}]\right) = \sum_{i=1}^{n} a_{ii}.$$
For example, $\mathrm{tr}\begin{pmatrix}3 & 2\\ 5 & 4\end{pmatrix} = 7$. Also,
tr(A) = tr(Aᵀ)
tr(AB) = tr(BA)
tr(A + B) = tr(A) + tr(B)
[aij]rxc , where all aij = 0, is a zero or null matrix 0.
The transpose of M is $M^T$. For example,
$$\begin{pmatrix}2 & 3\\ 1 & 4\\ 6 & 1\end{pmatrix}^T = \begin{pmatrix}2 & 1 & 6\\ 3 & 4 & 1\end{pmatrix}.$$
Or, $\left([a_{ij}]_{m\times n}\right)^T = [a_{ji}]_{n\times m}$: the ith row becomes the ith column, for all i. Suppose tr(A) = 6; then clearly tr(Aᵀ) = 6 also.
$[a_i]_{r\times1}$ where all $a_i = 1$ is a ones vector, 1. Note that a unit vector is not, and should not be confused with, a ones vector in which all elements are ones. For example,
$$(1\ \ 1\ \ 1\ \ 1)\begin{pmatrix}1\\ 1\\ 1\\ 1\end{pmatrix} = 4.$$
Also,
$$(5\ \ 6\ \ 3)\begin{pmatrix}1\\ 1\\ 1\end{pmatrix} = 5 + 6 + 3 = 14.$$
More generally, $(x_1\ \ x_2\ \ x_3\ \ \cdots\ \ x_n)\,1_{n\times1} = \sum_{i=1}^{n} x_i$.
The determinant of a 2 × 2 matrix
$$M = \begin{pmatrix}a & b\\ c & d\end{pmatrix}\quad\text{is}\quad |M| = ad - bc.$$
As an example, $\begin{vmatrix}10 & 3\\ 6 & 2\end{vmatrix} = 2$: the determinant is 10 × 2 − 3 × 6 = 2. This is a scalar. For a 3 × 3 matrix
$$M = \begin{pmatrix}a_{11} & a_{12} & a_{13}\\ a_{21} & a_{22} & a_{23}\\ a_{31} & a_{32} & a_{33}\end{pmatrix},$$
the determinant can be expanded along the first row:
$$|M| = a_{11}(-1)^{1+1}\begin{vmatrix}a_{22} & a_{23}\\ a_{32} & a_{33}\end{vmatrix} + a_{12}(-1)^{1+2}\begin{vmatrix}a_{21} & a_{23}\\ a_{31} & a_{33}\end{vmatrix} + a_{13}(-1)^{1+3}\begin{vmatrix}a_{21} & a_{22}\\ a_{31} & a_{32}\end{vmatrix}.$$
In the above expression, $\begin{vmatrix}a_{22} & a_{23}\\ a_{32} & a_{33}\end{vmatrix}$ is the determinant of the smaller matrix obtained by deleting the 1st row and the 1st column of the original M.
Consider a minor $|M_{ij}|$ of element $a_{ij}$: $M_{ij}$ is the remaining matrix after deleting the ith row and the jth column. The corresponding cofactor of element $a_{ij}$ is
$$C_{ij} = (-1)^{i+j}\,|M_{ij}|.$$
The determinant is then
$$|M_{n\times n}| = \sum_{j=1}^{n} a_{ij}\,C_{ij}$$
for any row i. Note that the same determinant can be found by expansion along a column k:
$$|M_{n\times n}| = \sum_{i=1}^{n} a_{ik}(-1)^{i+k}\,|M_{ik}|.$$
As an example,
$$\begin{vmatrix}5 & -6 & 3\\ 7 & 3 & 0\\ 4 & 3 & 2\end{vmatrix} = 5\begin{vmatrix}3 & 0\\ 3 & 2\end{vmatrix} - (-6)\begin{vmatrix}7 & 0\\ 4 & 2\end{vmatrix} + 3\begin{vmatrix}7 & 3\\ 4 & 3\end{vmatrix} = 5(6) + 6(14) + (27) = 141.$$
If A, B are conformable, A × B is an operation where one pre-multiplies B by A, or post-multiplies A by B.
If A, B are square matrices and A × B = I, then A is the inverse of B, $A = B^{-1}$; also, B is the inverse of A, $B = A^{-1}$. For any general X where its inverse exists, $XX^{-1} = X^{-1}X = I$.
For a 2 × 2 matrix, finding the inverse is simple. Given
$$M = \begin{pmatrix}a & b\\ c & d\end{pmatrix},\qquad M^{-1} = \frac{1}{|M|}\begin{pmatrix}d & -b\\ -c & a\end{pmatrix}.$$
As an illustration,
$$\begin{pmatrix}3 & 1\\ 4 & 2\end{pmatrix}^{-1} = \frac{1}{2}\begin{pmatrix}2 & -1\\ -4 & 3\end{pmatrix} = \begin{pmatrix}1 & -\tfrac{1}{2}\\ -2 & \tfrac{3}{2}\end{pmatrix}.$$
To verify this is indeed the inverse, we multiply it with the original matrix to yield the identity matrix:
$$\begin{pmatrix}1 & -\tfrac{1}{2}\\ -2 & \tfrac{3}{2}\end{pmatrix}\begin{pmatrix}3 & 1\\ 4 & 2\end{pmatrix} = \begin{pmatrix}1 & 0\\ 0 & 1\end{pmatrix}.$$
More generally, the adjoint (adjugate) of M is the transpose of its cofactor matrix:
$$\mathrm{adj}\,M = \begin{pmatrix}c_{11} & c_{21} & \cdots & c_{n1}\\ c_{12} & c_{22} & \cdots & c_{n2}\\ \vdots & \vdots & & \vdots\\ c_{1n} & c_{2n} & \cdots & c_{nn}\end{pmatrix},\qquad M^{-1} = \frac{\mathrm{adj}\,M}{|M|}.$$
As an example, if
$$M = \begin{pmatrix}4 & 1 & -1\\ 0 & 3 & 2\\ 3 & 0 & 7\end{pmatrix},\quad\text{then}\quad \mathrm{adj}\,M = \begin{pmatrix}21 & -7 & 5\\ 6 & 31 & -8\\ -9 & 3 & 12\end{pmatrix}.$$
Now |M| = 99; thus
$$M^{-1} = \frac{1}{99}\begin{pmatrix}21 & -7 & 5\\ 6 & 31 & -8\\ -9 & 3 & 12\end{pmatrix}.$$
Now suppose |M| = 0: does M have an inverse? No. If $M^{-1}$ does not exist, M is called a singular matrix. If $M^{-1}$ exists, M is a non-singular matrix. Thus we see that $\begin{pmatrix}1 & 2\\ 0 & 0\end{pmatrix}$ is singular since its determinant is zero.
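The adjugate example can be checked numerically (assuming NumPy is available):

```python
import numpy as np

# The 3 x 3 example above: inverse via the adjugate divided by the determinant.
M = np.array([[4.0, 1.0, -1.0],
              [0.0, 3.0, 2.0],
              [3.0, 0.0, 7.0]])
adjM = np.array([[21.0, -7.0, 5.0],
                 [6.0, 31.0, -8.0],
                 [-9.0, 3.0, 12.0]])
detM = np.linalg.det(M)     # 99
M_inv = adjM / detM
```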
A matrix may be partitioned into conformable blocks,
$$M = \begin{pmatrix}A & B\\ C & D\end{pmatrix}.$$
For the inverse of a partitioned matrix $M = \begin{pmatrix}A_{11} & A_{12}\\ A_{21} & A_{22}\end{pmatrix}$, define $B \equiv \left(A_{22} - A_{21}A_{11}^{-1}A_{12}\right)^{-1}$. Then
$$\begin{pmatrix}A_{11} & A_{12}\\ A_{21} & A_{22}\end{pmatrix}^{-1} = \begin{pmatrix}A_{11}^{-1} + A_{11}^{-1}A_{12}\,B\,A_{21}A_{11}^{-1} & -A_{11}^{-1}A_{12}\,B\\ -B\,A_{21}A_{11}^{-1} & B\end{pmatrix},$$
which can be verified by multiplying out:
$$\begin{pmatrix}A_{11} & A_{12}\\ A_{21} & A_{22}\end{pmatrix}\begin{pmatrix}A_{11}^{-1} + A_{11}^{-1}A_{12}BA_{21}A_{11}^{-1} & -A_{11}^{-1}A_{12}B\\ -BA_{21}A_{11}^{-1} & B\end{pmatrix} = \begin{pmatrix}I & 0\\ 0 & I\end{pmatrix} = I.$$
This facilitates finding the inverse of a large M using smaller matrices.
$M_{3\times2}$ can be partitioned into row or column vectors. A row partition is
$$\begin{pmatrix}X_1\\ X_2\\ X_3\end{pmatrix},$$
where each $X_i$ is a 1 × 2 vector.
For a general $M_{r\times c}$, if the number of rows r > the number of columns c, it is possible to find a 1 × r non-zero vector $[\lambda_j]$ so that $[\lambda_j]\,M = 0$. For example,
$$M = \begin{pmatrix}1 & 3\\ 1 & 2\\ 3 & 7\end{pmatrix},\qquad (1\ \ 2\ \ {-1})\begin{pmatrix}1 & 3\\ 1 & 2\\ 3 & 7\end{pmatrix} = (0\ \ 0).$$
This is akin to finding a linear combination of the rows, viz. $1\,(1\ \ 3) + 2\,(1\ \ 2) - 1\,(3\ \ 7) = (0\ \ 0)$. Thus at least one of the rows is linearly dependent on the others.
For $M_{r\times c}$ where $r \le c$, it may or may not be possible to find a 1 × r non-zero vector $[\lambda_j]$ so that $[\lambda_j]\,M = 0$. For example, for
$$M = \begin{pmatrix}1 & 1 & 3\\ 3 & 2 & 7\end{pmatrix},$$
if $(a\ \ b)\,M = a\,(1\ \ 1\ \ 3) + b\,(3\ \ 2\ \ 7) = (0\ \ 0\ \ 0)$, then $(a\ \ b)$ must be zero. Thus the two rows are linearly independent, and the rank of M is r. The rank of M is also the maximum number of linearly independent rows in M.
For the earlier $\begin{pmatrix}1 & 3\\ 1 & 2\\ 3 & 7\end{pmatrix}$, we found a non-zero linear combination of the rows such that (0 0) obtains. This means that out of the 3 rows in the matrix, one is linearly dependent on the other two. If we remove one row, e.g. the first row, and we cannot find non-zero $(a\ \ b)$ so that $(a\ \ b)\begin{pmatrix}1 & 2\\ 3 & 7\end{pmatrix} = (0\ \ 0)$, then it has a rank of 2.
A2.
LINEAR EQUATIONS
Consider the system of linear equations
$$x_1 + 3x_2 = 2$$   (A.1)
$$x_1 + 2x_2 = 1$$   (A.2)
$$3x_1 + 7x_2 = 4$$   (A.3)
or
$$\begin{pmatrix}1 & 3\\ 1 & 2\\ 3 & 7\end{pmatrix}\begin{pmatrix}x_1\\ x_2\end{pmatrix} = \begin{pmatrix}2\\ 1\\ 4\end{pmatrix}.$$
We perform the same row operations as above by pre-multiplying:
$$\begin{pmatrix}1 & 2 & -1\\ 0 & 1 & 0\\ 0 & 0 & 1\end{pmatrix}\begin{pmatrix}1 & 3\\ 1 & 2\\ 3 & 7\end{pmatrix}\begin{pmatrix}x_1\\ x_2\end{pmatrix} = \begin{pmatrix}1 & 2 & -1\\ 0 & 1 & 0\\ 0 & 0 & 1\end{pmatrix}\begin{pmatrix}2\\ 1\\ 4\end{pmatrix}.$$
Thus,
$$\begin{pmatrix}0 & 0\\ 1 & 2\\ 3 & 7\end{pmatrix}\begin{pmatrix}x_1\\ x_2\end{pmatrix} = \begin{pmatrix}0\\ 1\\ 4\end{pmatrix}.$$
In the above, the linear combination of rows $1\,(1\ \ 3) + 2\,(1\ \ 2) - 1\,(3\ \ 7) = (0\ \ 0)$ shows that the rows of
$$\begin{pmatrix}1 & 3\\ 1 & 2\\ 3 & 7\end{pmatrix}$$
are linearly dependent. The reduced system of equations after deleting the redundant equation is
$$\begin{pmatrix}1 & 2\\ 3 & 7\end{pmatrix}\begin{pmatrix}x_1\\ x_2\end{pmatrix} = \begin{pmatrix}1\\ 4\end{pmatrix}.$$
If the system cannot be reduced further, then matrix $\begin{pmatrix}1 & 2\\ 3 & 7\end{pmatrix}$ is non-singular, and
$$\begin{pmatrix}x_1\\ x_2\end{pmatrix} = \begin{pmatrix}1 & 2\\ 3 & 7\end{pmatrix}^{-1}\begin{pmatrix}1\\ 4\end{pmatrix} = \begin{pmatrix}7 & -2\\ -3 & 1\end{pmatrix}\begin{pmatrix}1\\ 4\end{pmatrix} = \begin{pmatrix}-1\\ 1\end{pmatrix}.$$
The solution X to a linear system $A_{n\times n}X_{n\times1} = B_{n\times1}$ above requires finding the inverse $A^{-1}$:
$$X_{n\times1} = A^{-1}_{n\times n}B_{n\times1} = \mathrm{adj}(A)_{n\times n}\,B_{n\times1}\,/\,|A|.$$
If $X_{n\times1} = [x_k]_{k=1,2,\ldots,n}$, then the kth row of X is
$$x_k = \frac{\sum_{i=1}^{n} b_i\,(-1)^{i+k}\,|M_{ik}|}{|A|}.$$
The numerator looks like a determinant. In this case, it is that of a new matrix formed by replacing the kth column of A with $B_{n\times1}$, say $A_K$. Then $x_K = |A_K|/|A|$.

A.3
EIGENVALUES
Given matrix $M_{n\times n}$ and vector $V_{n\times1} \ne 0$, suppose we can find a scalar $\lambda$ so that $MV = \lambda V$. Then $\lambda$ is called an eigenvalue or characteristic root of M, and V is called an eigenvector or characteristic vector of M. Now,
$$(M - \lambda I_n)V = 0.$$
If $(M - \lambda I_n)^{-1}$ exists, then V = 0, which is a trivial solution. For general $V \ne 0$, the system of linear equations holds if $(M - \lambda I_n)^{-1}$ does not exist, i.e. $(M - \lambda I_n)$ is singular. So
$$|M - \lambda I_n| = 0$$
is called the characteristic equation of M.
For example, for $M = \begin{pmatrix}2 & 2\\ 2 & -1\end{pmatrix}$, solve
$$\begin{vmatrix}2-\lambda & 2\\ 2 & -1-\lambda\end{vmatrix} = 0.$$
Obtain $(2-\lambda)(-1-\lambda) - 4 = 0$, or $\lambda^2 - \lambda - 6 = 0$. At this point, use the usual quadratic formula to find the roots: $\lambda = 3$ or $\lambda = -2$.
For $\lambda = 3$:
$$\begin{pmatrix}2-3 & 2\\ 2 & -1-3\end{pmatrix}\begin{pmatrix}v_1\\ v_2\end{pmatrix} = 0,$$
or $-v_1 + 2v_2 = 0$; thus $v_1 = 2v_2$. Then
$$v = \begin{pmatrix}2v_2\\ v_2\end{pmatrix},\qquad v^T v = (2v_2)^2 + v_2^2 = 5v_2^2 = 1$$
for a normalized eigenvector, so $v_2 = 1/\sqrt{5}$. Thus the normalized eigenvector is
$$v = \begin{pmatrix}2\sqrt{5}/5\\ \sqrt{5}/5\end{pmatrix}.$$

A.4
ORTHOGONALITY
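The eigenvalue example can be confirmed with NumPy (assuming it is available); since M is symmetric, `eigh` applies and returns the eigenvalues in ascending order:

```python
import numpy as np

M = np.array([[2.0, 2.0], [2.0, -1.0]])
vals, vecs = np.linalg.eigh(M)   # ascending order: [-2, 3]
v3 = vecs[:, 1]                  # unit eigenvector for lambda = 3
```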
If the scalar product of two vectors u and v is 0, $u^T v = 0$, then u and v are orthogonal. Unit vectors are orthogonal:
$$(1\ \ 0\ \ 0)\begin{pmatrix}0\\ 1\\ 0\end{pmatrix} = 0.$$
Orthogonality has a geometric interpretation. [Figure: in the 3-dimensional Euclidean space $R^3$, u, v, w are the 3 unit vectors shown on the x-, y-, and z-axes respectively.]
[Figure: a point P in $R^3$, and a plane (shaded) through the origin spanned by vectors a and b, with a vector w* on the plane.]
A vector w that lies on the plane (shaded area) formed by linear combinations of a and b is represented by
$$w = \alpha\begin{pmatrix}x_a\\ y_a\\ z_a\end{pmatrix} + \beta\begin{pmatrix}x_b\\ y_b\\ z_b\end{pmatrix} = \begin{pmatrix}x_a & x_b\\ y_a & y_b\\ z_a & z_b\end{pmatrix}\begin{pmatrix}\alpha\\ \beta\end{pmatrix} \equiv X\lambda.$$
A specific w* vector on this plane, with $\beta = 2/3$, is shown.
When $(x_a\ y_a\ z_a)^T$ and $(x_b\ y_b\ z_b)^T$ are linearly independent, their linear combination forms a subspace of a lower dimension (dim = number of weights in the combination). The (2-dim) subspace is said to be spanned by the 2 columns of X above.
The 2-dim subspace (or geometrically a 2-dim plane in $R^3$ space) above passes through the origin O. To be more general, when a 2-dim subspace does not pass through O, it is represented by $X\lambda - C$, where C is a constant 3 × 1 vector. Thus we can express the equation of a plane (2-dim) in 3-dim space as
$$V_{3\times1} = X\lambda - C,$$
where X and C are given, and $V = (x, y, z)^T$ represents a vector in 3-dim with general coordinates on the X-, Y-, and Z-axes. Since we can certainly solve $\alpha$ and $\beta$ in terms of X, C, and also x, y, the plane-equation can also be written as a linear equation $f(x, y, z;\ X, C) = 0$. An example is $3x + 4y - 2z = 5$.
There are interesting geometric interpretations to operations on vector spaces as in the above. The length of vector a is
$$\|a\| = \sqrt{x_a^2 + y_a^2 + z_a^2} = \left(a^T a\right)^{1/2}.$$
Note that the analytical formula of length in Euclidean space comes from the basic Pythagoras theorem. $\|a\|$ is sometimes called the norm of a. To convert a vector of length $\|a\|$ into a similar vector (i.e. same direction) but of unit length, transform it into $a/\|a\|$.
Now consider a simpler 2-dim space, as shown in the diagram. [Figure: vectors $(x_a\ y_a)$ and $(x_b\ y_b)$ from the origin with angle $\theta$ between them.] From basic trigonometric identity,
$$\cos\theta = \frac{x_a x_b + y_a y_b}{\|a\|\,\|b\|} = \frac{a^T b}{\|a\|\,\|b\|}.$$
[Figure: point A = $(x_a\ y_a)$ and the unit vector $e = (x_b\ y_b)/\|b\|$.] Thus,
$$\cos\theta = \frac{a^T e}{\|a\|},$$
or $\langle a, e\rangle = a^T e = \|a\|\cos\theta$ is the length of the projection of a onto e.
The Cauchy-Schwarz inequality states that
$$\left(\sum_{i=1}^{n} a_i b_i\right)^2 \le \left(\sum_{i=1}^{n} a_i^2\right)\left(\sum_{i=1}^{n} b_i^2\right).$$
Another inequality, the triangle inequality, states that
$$\|c - a\| \le \|b - a\| + \|c - b\|,$$
or $|AC| \le |AB| + |BC|$.
A5.
Note that a convex set can have linear segments, e.g. a sliced disc. Let A and B be convex sets in $R^n$ space that are disjoint, i.e. $A \cap B = \varnothing$. A linear functional p is an n × 1 vector of constants such that $p^T x = \text{constant } k$, for any point x in the $R^n$ space.
$p^T x = k$ represents a hyperplane in $R^n$ space. In $R^2$ or 2-dimensional Euclidean space, $p = (a\ \ b)^T$, so $p^T x$ in $R^2$ is $ax_1 + bx_2 = k$, which is a straight line. In $R^3$, $ax_1 + bx_2 + cx_3 = k$ is a 2-dim plane.
[Figure: two disjoint convex sets A (containing point x) and B (containing point y) in the $(u_1, u_2)$ plane, separated by the line $u_2 - u_1 = -3$.]
The above diagram shows two disjoint convex sets A and B in two dimensions, and a line $u_2 = u_1 - 3$ that passes between them. The Separating Hyperplane Theorem states that if A and B are two disjoint nonempty convex sets in $R^n$, then there exists a linear functional p such that $p^T x \ge p^T y$ for any x in A and y in B. This is illustrated in the $R^2$ or 2-dim diagram where $p^T = (-1\ \ 1)$: $(-1\ \ 1)\,x > -3 > (-1\ \ 1)\,y$, where $x^T = (u_1\ \ u_2)$.
A.6
QUADRATIC FORM
$$(x_1\ \ x_2)\,M_{2\times2}\begin{pmatrix}x_1\\ x_2\end{pmatrix}$$
is a quadratic form, and more generally $x^T M x$, where
$$x_{n\times1} = \begin{pmatrix}x_1\\ x_2\\ \vdots\\ x_n\end{pmatrix}\quad\text{and}\quad M_{n\times n} = [\sigma_{ij}].$$
For a portfolio with weights $w = (w_1\ \ w_2\ \ \cdots\ \ w_n)^T$ and $w^T 1 = 1$, the variance of the portfolio return is the quadratic form
$$\mathrm{Var}\!\left(\sum_{i=1}^{n} w_i\,\tilde r_i\right) = w^T V w,$$
where V is the covariance matrix of the returns. For two assets with return variances $\sigma_1^2 = 0.16$ and $\sigma_2^2 = 0.25$, for instance, the quadratic form evaluates to $w_1^2\sigma_1^2 + 2w_1 w_2\sigma_{12} + w_2^2\sigma_2^2$.
Let $\tilde r_1$, $\tilde r_2$ be return rates treated as random variables with given distribution functions. The mean of $\tilde r_i$ is its expected value $E(\tilde r_i)$ or $\mu_i$; write $M = (\mu_1\ \ \mu_2)^T$. The covariance matrix for $\tilde r_1$, $\tilde r_2$ is
$$V = E\{(\tilde R - M)(\tilde R - M)^T\} = E\!\left[\begin{pmatrix}\tilde r_1 - \mu_1\\ \tilde r_2 - \mu_2\end{pmatrix}\left(\tilde r_1 - \mu_1\ \ \ \tilde r_2 - \mu_2\right)\right]$$
$$= \begin{pmatrix}E(\tilde r_1 - \mu_1)^2 & E(\tilde r_1 - \mu_1)(\tilde r_2 - \mu_2)\\ E(\tilde r_2 - \mu_2)(\tilde r_1 - \mu_1) & E(\tilde r_2 - \mu_2)^2\end{pmatrix} = \begin{pmatrix}\sigma_1^2 & \sigma_{12}\\ \sigma_{21} & \sigma_2^2\end{pmatrix}.$$
Note that $\tilde r_i$ can be interpreted as the return rate of the ith security, or the return rate at time period i of the same security.
A.7
MATRIX CALCULUS
x 2
dm 2x
,
then
2 . Note that m is a vector and x is a
3
dx
x
3x
Suppose m
w 1
2 (1 w 1 12 w 2 )
2
393
2 (12 w 1 2 w 2 )
2
w 2
We can write
p 2
2
p
1 2 12 w 1
w 1
2
p 2
w
21 2 w 2
w 2
T
or,
w Vw 2 V w , where w is a vector.
w
If we take second order derivatives:
∂²σp²/∂w1∂w1 = 2σ1²
∂²σp²/∂w1∂w2 = 2σ12
∂²σp²/∂w2∂w2 = 2σ2²
∂²σp²/∂w2∂w1 = 2σ12
We can write

∂²σp²/∂w∂wT = ( ∂²σp²/∂w1∂w1   ∂²σp²/∂w2∂w1 ) = ( 2σ1²   2σ12 ) = 2V.
              ( ∂²σp²/∂w1∂w2   ∂²σp²/∂w2∂w2 )   ( 2σ21   2σ2² )
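The rule ∂(wTVw)/∂w = 2Vw can be verified numerically by central finite differences; V and w below are assumed illustrative values:

```python
import numpy as np

# Finite-difference check of d(w'Vw)/dw = 2Vw (assumed V and w).
V = np.array([[0.16, 0.03],
              [0.03, 0.25]])
w = np.array([0.3, 0.7])

def f(w):
    return w @ V @ w              # the quadratic form

h = 1e-6
grad_fd = np.array([(f(w + h * e) - f(w - h * e)) / (2 * h)
                    for e in np.eye(2)])
grad_analytic = 2 * V @ w
print(grad_fd, grad_analytic)     # the two should agree
```

For a quadratic function the central difference is exact up to rounding, so the two gradients match to machine precision.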
x nx1 itself has a multivariate normal distribution: x nx1 ~ N(μ nx1, V nxn), where
μ nx1 = [μi] and V nxn = [σij]. Here x nx1 = (X1 X2 … Xn)T.
If Dmxn is a constant matrix, then Dx has a multivariate normal distribution
with mean DM and covariance matrix DVDT, and dim(DVDT) is m × m.
A sum of n independent squared standard normal random variables is
distributed as χ² with n degrees of freedom. For univariate random variables
Xi ~d N(μi, σi²),
Zi = (Xi - μi)/σi ~ N(0,1)
and
Σi=1^n Zi² = zTz ~ χ²n .
More generally,
(x - M)T V⁻¹ (x - M) ~ χ²n .
A special case is V = diag[σi²], the diagonal matrix with σ1², σ2², …, σn² on
the diagonal and zeros elsewhere, in which case
Σi=1^n (Xi - μi)²/σi² ~d χ²n .
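A simulation sketch of the last result, with an assumed mean vector and covariance matrix: the statistic (x - M)TV⁻¹(x - M) should have mean n and variance 2n.

```python
import numpy as np

rng = np.random.default_rng(0)

# Assumed illustrative mean vector and covariance matrix.
M = np.array([0.02, 0.03])
V = np.array([[0.16, 0.03],
              [0.03, 0.25]])
n = len(M)

# Draw multivariate normal vectors and form (x - M)' V^{-1} (x - M),
# which is distributed chi-squared with n degrees of freedom.
x = rng.multivariate_normal(M, V, size=100_000)
d = x - M
stat = np.einsum('ij,jk,ik->i', d, np.linalg.inv(V), d)

# For chi2 with n df: mean = n and variance = 2n
print(stat.mean(), stat.var())
```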
Appendix B
EVIEWS GUIDE
EVIEWS is a statistical software package that works in a Windows environment
and is used mainly for econometrics and regression analyses. It has built-in
functions for many econometric procedures and easy data handling tools.
Different versions of EVIEWS may have slightly different bells and whistles,
though the generic framework of the statistical software should be similar.
To start EVIEWS, click the EVIEWS program icon and open into the
following blank Workfile. All work and outputs are done through a loaded
workfile.
In order to bring data from an Excel spreadsheet or other format into the
EVIEWS program for econometrics processing, a new loaded Workfile
has to be first created. Do this by
(1) clicking on the main menu File, then (choose and click) New, then
Workfile. The following submenu will appear to prompt where and
how to locate data in the PC in order to load them into the new workfile.
(2) Suppose our Excel datafile is OCBC daily return data.xls with 5
columns of data and 1306 rows (first row is non-data and can be
excluded during loading). One way is to specify undated or irregular
data, and enter start observation 1 and end observation 1305. Then
press OK.
(3) EVIEWS will then open the worksheet displayed as follows. Note that by
default, two time series c and resid will be initialized and appear, but they
basically do not contain information at this point. The program file below
contains, in the upper space, a programming window for writing programs,
and below it the worksheet that contains objects such as the two time
series.
(4) We can save the work file for future work, or re-save (save) the latest
update as work progresses in EVIEWS. This is done by clicking File
(on the main menu) Save As (and proceed to name the file to be
saved in the appropriate PC directory). This can be recalled later for
use.
Programming
window
worksheet
(5) Next, load the external data into the EVIEWS program itself. This is done
by clicking PROCS (procedure) in the submenu, then Import, then Read
Text-Lotus-Excel, and then pointing to the location of the Excel
datafile, e.g. OCBC daily return data.xls. The Excel Spreadsheet
Import menu will appear (see the next diagram; note that the external
Excel datafile has to be closed before this works). At this point it is
instructive to know the structure of the Excel datafile. It looks as
follows. The 5 columns of time series data are:
        A              B           C            D          E
 1      Calendar Date  Date Value  Day Number   Day        OCBC daily returns
 2      10/27/1997     35730       1            Monday     0
 3      10/28/1997     35731       2            Tuesday    -0.053708912
 4      10/29/1997     35732       3            Wednesday  0.020498102
 .
 .
 1305   10/24/2002     37553       1824         Thursday   -0.008888947
 1306   10/25/2002     37554       1825         Friday     -0.00896867
(6) Next specify the number of observations and enter the start Excel cell
name of the first observation to be loaded onto EVIEWS workfile.
Enter A2 accordingly. Enter the Excel sheet name if there is more
than one sheet in the Excel datafile. Enter the variable names of the
5 variables (each occupying a column) and separate each name by a
space. Then press OK. (Note that the length of time series for each
variable, 1 to 1305, has been entered earlier in (2), and now appears in
the Import Sample space.)
(7) By now, the 5 variable time series would be loaded. They appear as the
additional objects cadate, datev, day, dayno, and return in the
worksheet. We can think of objects as sheets of paper with data
series. To look at a particular object e.g. return (or to look at a sheet
of paper), we click on the object = return, and then under the submenu
of series: RETURN Workfile , click View Spreadsheet to see the
data series of the variable return. This is shown in diagram
SPREADSHEET below.
(8) On the same submenu, click View Graph Line to see the line
graph of the variable return. This is shown in diagram LINEGRAPH
below.
(9) On the same submenu, click View Descriptive Statistics
Histogram and Stats to see the histogram of the variable return. This
is shown in diagram HISTOGRAM below.
SPREADSHEET
LINEGRAPH
HISTOGRAM
(10) We can place the mouse pointer on either Range or Sample in the
worksheet space just below the submenu and then double-click to
change the range or sample inputs, e.g. from 1 to 200 (instead of the
present 1 to 1305) if we wish to work with a smaller sub-sample.
(11) An important resource in the EVIEWS program is to click on the main
menu Help Eviews Help Topics Index in order to learn the
details of certain commands etc.
(12) Now we will learn how to generate additional time series of variables.
We can either type directly on the programming window series
s=@sum(return), then hit carriage return, or we can click the Genr
(generate) in the workfile submenu, and type s=@sum(return) , then
click OK. Both methods will yield the same output series s which now
appears as an object in the worksheet. It is a series sample size 1305
with each number equal to the sum of the 1305 return rates. Note that if
we use the programming window to output new series, we have to add
the prefix series to initialize s. See the diagram below.
401
Generate New
Series
(13) We can also generate independent new time series as follows. Typing in
the programming window:
series x=0.03+@sqr(0.5)*nrnd
generates a series named x which is a random draw from a normal
distribution with mean 0.03 and variance 0.5.
Then generate another series y that is correlated with x by typing in
the programming window:
series y=0.005+0.5*x+@sqr(0.125)*nrnd
The last term above @sqr(0.125)*nrnd is really an independent
normal r.v. e or normal white noise with mean 0 and variance 0.125.
Mean of y is 0.005+0.5*E(x)+E(e) = 0.005+0.5*0.03+0 = 0.02. The
variance of y is (0.5)2*var(x)+var(e) = 0.25*0.5+0.125 = 0.25.
The covariance between x and y is cov(x,y) = 0.5*var(x) = 0.25.
Indeed, y=0.005+0.5*x+e itself forms a linear regression model
associating x with y. If we write y = a + bx + e, then a linear
regression estimation should yield estimate of a close to 0.005, and
402
estimate of b close to 0.5. We will use EVIEWS to examine the
plausibility of this construction next.
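The construction in (13) can also be replicated outside EVIEWS as a cross-check; a minimal Python sketch with the same parameter values (the sample size 1305 matches the workfile, the seed is arbitrary):

```python
import numpy as np

rng = np.random.default_rng(42)
n = 1305                       # same length as the workfile series

# x ~ N(0.03, 0.5) and y = 0.005 + 0.5*x + e with e ~ N(0, 0.125),
# mirroring the series commands typed into the programming window.
x = 0.03 + np.sqrt(0.5) * rng.standard_normal(n)
e = np.sqrt(0.125) * rng.standard_normal(n)
y = 0.005 + 0.5 * x + e

# OLS slope and intercept
b = np.cov(x, y, ddof=1)[0, 1] / np.var(x, ddof=1)
a = y.mean() - b * x.mean()
print(a, b)                    # near 0.005 and 0.5
```

As with the EVIEWS regression, the estimates should land close to the construction values 0.005 and 0.5, up to sampling error.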
(14) Once the x and y series are generated and appear in the worksheet, we
can create a new object called Group in order to perform some cross-variable analysis. To create the Group, first highlight the variables on
the worksheet that are to be put into the Group we choose series x
and y that have just been formed hold down Ctrl button and select
the variables to highlight.
Then on workfile submenu click Objects New Object Group
(at this stage, leave the Name for Object box as it is, i.e. Untitled)
OK. Then a Series List box will appear, with x y in the box
confirming the chosen series x and y to be grouped. Click OK. Then the
Group XY: workfile will appear showing the spreadsheet containing
only the columns of x and y. Click Name on the submenu of the Group
XY: workfile and proceed to give a name to this new Group object as
xy, upon which xy will appear in the worksheet as the new object.
We can also perform a shortcut to the above by highlighting the
variables x and y to be grouped, then right-click, click Open as
Group .
403
(15) Once the Group object xy is created, we can click on it. The Group: XY
workfile will be displayed. Click on its submenu View Descriptive
Stats Common Sample, and obtain the statistics as shown in the
diagram below. The sample mean of Y and X are highlighted, and are
shown to be close to 0.02 and 0.03 as expected. Moreover, the sample
standard deviations of Y and X are 0.496 and 0.721 respectively, giving
variances of about 0.25 and 0.5 as expected.
The View of the Group also shows the covariance matrix:

       Y                 X
Y      0.246229917878    0.248612205356
X      0.248612205356    0.519961054164
SCATTERPLOT
REGRESSION
RESIDUAL TEST
Appendix C
LINEAR REGRESSION IN EXCEL
C.1
INTRODUCTION
Many of the regressions done in this book can be performed using the Data
Analysis subroutine in EXCEL. To enable the Data Analysis subroutine, click
on the main menu Tools in EXCEL. Then click Add-Ins Next tick the
relevant toolboxes required. In this case, tick Analysis ToolPak and also
Solver Add-in. In the Microsoft Office 2007 version, go to the Office button,
click and choose Excel Options and then find the Add-Ins.
There are two ways of performing linear least square regression using
Excel.
(a) From main menu of Excel, click Tools Data Analysis Regression
(b) From main menu of Excel, click Insert (choose and click) Functions
Statistical Linest
The two methods are applicable for both simple and multiple linear
regression analysis. We demonstrate here the procedures in (a) for a linear
regression on a dataset.
C.2
Click OK. Excel provides several options that determine how the regression
output is displayed. We will highlight some of these.
Before we proceed to the regression output file, Excel is used to show the line
graph of the two time series of oil price and rig counts.
In the New Worksheet Ply: Select and enter the new sheet name regression
output so that the output is stored in the new sheet that will be created. Click
OK. The output file in the regression output sheet is shown below.
In this regression example, the dependent variable is the number of oil rigs
produced by a company, and the explanatory variable is the price of oil. We
would expect that if the price of oil increases, then the company will find it
profitable to produce more oil rigs as there would be a higher demand for oil
rigs to harvest more oil.
C.3
EXCEL is also packaged with VBA or Visual Basic for Applications, which is
essentially a BASIC language that, when coupled with the Microsoft EXCEL
spreadsheet program, enhances the latter's functionality and applicability for
more complex computational problems beyond just spreadsheet calculations.
VBA is largely about trying to automate repeated calculations using short-cut
programs. These programs are either user-defined VBA programs or they
employ Macros (or subroutines) in VBA. VBA is an Object-Oriented
programming language. Each Excel object, e.g. worksheets, charts, range, etc.,
represents a functionality. The objects are usually arranged in a hierarchy. For
example, the range, e.g. A1:C10 in EXCEL, is represented as an object called
data. This is referenced hierarchically using Application.Workbooks
("Distributions.xls").Sheets("lognormal").Range("data"), i.e. we have to first
point to the Excel file, then the correct worksheet within that file, then the
correct data or range.
Appendix D
MULTIPLE CHOICE QUESTION TESTS
D.1
TEST ONE
Please circle the letter associated with the most appropriate answer.
1. If we run an Ordinary Least Squares regression of Yi = a + bXi + ei where
ei is a white noise that is not correlated with Xi, and suppose sample
averages for Y and X are 3 and 4 respectively. Moreover, a is found to be
1. What is the value of b ?
(a)
0.5
(b)
1.0
(c)
4/3
(d)
Indeterminate from the given information
2. In the CAPM model, the SML line refers to the regression of:
(a) market model
(b) joint multivariate normal returns
(c) quadratic utility
(d) positive alphas
4. Suppose a firm generates a perpetual after-tax annual earnings per share of
$2 which are distributed as dividends on a stock with required return of
10% p.a. by the market. A new technology allows the firm to retain 50% of
earnings each year to reinvest in the new technology with a higher return
of 20%. Would the new share price be:
(a) <$10
(b) $10
(c) $20
(d) >$20
5. If we wish to estimate beta j according to the CAPM model, we should
strictly apply linear regression of
(a)
(b)
(c)
(d)
6. Since CAPM deals with real and not nominal return rates, i.e. beta is
covariance of real returns divided by variance of real market return, then
using excess returns for regression is suitable in this case since it
(a)
(b)
(c)
(d)
(b)
(c)
(d)
min
b X t
X Y b X 0 b XXY
X bX
X
t
t
2
t
ut
incorrect
biased
unbiased
degenerate
Xt
ut ,
X 2t
2
t
9. The purpose of computing cost of capital is not
(a)
(b)
(c)
(d)
10. The stock price according to the dividend growth model can be very
volatile not because of the following reason:
(a)
(b)
(c)
(d)
Answer Key:
1a, 2c, 3d, 4d, 5d, 6a, 7d, 8c, 9d, 10d
D.2
TEST TWO
Please circle the letter associated with the most appropriate answer.
1. For a random walk process on price, the forecast (or conditional
expectation) of return rate next time period t+1, given all information up
to time t, is
(a) zero
(b) constant
(c) uncertain, dependent on information at t
(d) white noise
(a) wrong specification
(b) asymptotic errors
(c) multi-collinearity
(d) errors-in-variables
8. If the variance ratio statistic VR(q) is less than 1, then it is quite likely that
when one regresses 5-year returns on its lag, the coefficient of the slope is
(a) zero
(b) positive
(c) negative
(d) infinite
(c)
(d)
10. Which of the following is not true about holding stocks over a longer
horizon
(a)
(b)
(c)
(d)
Answer Key:
1b, 2d, 3d, 4c, 5b, 6c, 7c, 8c, 9a, 10c
D.3
TEST THREE
Please circle the letter associated with the most appropriate answer.
1. In a financial event study, the time line is usually broken up into 2
adjacent blocks called the
(a)
(b)
(c)
(d)
4. The market model parameters α and β for each stock can be estimated
consistently (and also BLUE in finite sample) using OLS
(a) Yes
(b) No
(c) Not sure
(d) Sometimes
6. In selecting different stocks (or sometimes same stock but at different
times e.g. different years in a bonus shares event) for a common event
study e.g. bonus shares issue, it is important to ensure as far as possible
that their calendar dates do not cluster together. The following is not a
reason for this non-clustering.
(a)
(b)
(c)
(d)
7. Suppose ARit ~d N(0, σi²). If AARt = (1/N) Σi=1^N ARit is the average (or
aggregated) abnormal return of stocks i = 1,2,…,N, all at event time t, then
AARt has distribution (assuming independence of AR across stocks)
(a) N(0, σi²/N)
(b) N(0, σi²/N²)
(c) N(0, (1/N) Σi=1^N σi²)
(d) N(0, (1/N²) Σi=1^N σi²)
8. In using the test statistic
CAAR(τ1, τ2) / [var CAAR(τ1, τ2)]^(1/2) ~d N(0,1)
for each τ2 = τ1, τ1+1, τ1+2, … within the event period, what is the
null hypothesis?
(a) that the given event has no impact on the returns process (or more
specifically the abnormal returns) up to event day τ2
(b) that the given event has significant impact on the returns process
(or more specifically the abnormal returns) up to event day τ2
(c) that the given event has no impact on the returns process (or more
specifically the abnormal returns) every day within (τ1, τ2)
(d) that the given event has significant impact on the returns process
(or more specifically the abnormal returns) every day within (τ1, τ2)
9. Given the following diagram of CAAR versus event day, what would you
infer about the event if the standard error of CAAR at (-15,0) is 2%? (The
event is a type of financial announcement. Assume the model you used to
compute abnormal returns is correct.)

[Figure: CAAR plotted against event day t from -15 to +15, reaching about
5%; the 2 s.d. level of 2% is marked.]

(a) The event does not carry information and the market is efficient
(b) The event carries insignificant information and the market is efficient
(c) The event carries significant information and the market is efficient
(d) The event carries significant information and the market is inefficient
Answer Key:
1a, 2b, 3a, 4a, 5c, 6c, 7d, 8a, 9c, 10a
D.4
TEST FOUR
Please circle the letter associated with the most appropriate answer.
1. An analyst suspected that for some strange reason, the daily return rates
of some stocks are usually lower on Fridays and higher on other days of
the week. To test this hypothesis, he collected the daily return data rt of a
particular stock, and performed the following linear regression using
ordinary least squares.
rt = c1 + c2I1 + c3I2 + c4I3 + c5I4 + et

where the ci's are the regression constant coefficients, et is the disturbance
that is assumed to be i.i.d., and

I1 = 1 if it is Monday, 0 otherwise
I2 = 1 if it is Tuesday, 0 otherwise
I3 = 1 if it is Wednesday, 0 otherwise
I4 = 1 if it is Thursday, 0 otherwise

It is noted that trading takes place only on week days. If he had included
another dummy variable

I5 = 1 if it is Friday, 0 otherwise
OLS would be
(a)
(b)
(c)
(d)
3. The reported p-values for the t-statistics of c1, c2, c3, c4, c5 are 0.03, 0.11,
0.08, 0.15, and 0.02 respectively. Based on a test at the 5% significance level
for a 2-tail test, we can
(a)
(b)
(c)
(d)
0.02%
0.03%
-0.01%
none of the above
5. A currency speculator was trying to understand the following spot-forward
relationship of C$ (versus US$). According to the unbiased expectations
hypothesis
Ft,t+3 = Et(St+3) + φt,t+3
where Ft,t+3 is the forward 3-month C$ per US$ rate as at time t, St+3 is the
future spot rate at t+3 months, Et(.) denotes conditional expectation given
all market information current at t, and the risk premium φt,t+3 = 0.
Which of the following is a random variable at t?
(a) φt,t+3
(b) Ft,t+3
(c) St+3
(d) Et(St+3)
which
of
the
(a)
(b)
(c)
(d)
7. There are typically many specifications that are consistent. The following
is one. What restrictions on the regression coefficients and disturbance are
implied by the UEH?
St = c0 + c1Ft-k,t + et ,  k > 0
(a) c0 = 0, c1 = 0, E(et | Ft-k,t) = 0
(b) c0 ≠ 0, c1 ≠ 0, E(et | Ft-k,t) = 0
(c) c0 ≠ 0, c1 = 0, E(et | Ft-k,t) ≠ 0
(d) c0 = 0, c1 = 1, E(et | Ft-k,t) = 0
8. Consider the OLS regression
Ft,t+3 = c0 + c1 Êt(St+3) + et
where Ft,t+3 is a forward 3-month rate as in Q5, and Êt(St+3) is an
unbiased estimator at t of Et(St+3). What are the problems in such an
OLS regression?
(a)
(b)
(c)
(d)
(a)
(b)
(c)
(d)
(b)
(c)
(d)
Answer Key:
1d, 2b, 3d, 4a, 5c, 6c, 7d, 8c, 9b, 10b
Answers to Q5, Q6, and Q8 are explained in more detail as follows.
A5.
A6.
A8.
D.5
TEST FIVE
Please circle the letter associated with the most appropriate answer.
1. If Yi = c0 + c1Xi + ei ,  i = 1,2,…,N
(a) BLUE
(b) biased but consistent
(c) GLS
(d) unbiased but not efficient
2. If Yt = c0 + c1Xt + c2Zt + et ,
t=1,2,,T
4.
repeated OLS
estimating ut and transforming Yt , Xt , Zt using this
estimating and transforming Yt , Xt , Zt using this
generalizing the covariance matrix of ut and then applying OLS
(a) Accept H0
(b) Inconclusive on H0
(c) Reject H0, accept negative autocorrelation
(d) Reject H0, accept positive autocorrelation
5. Besides the DW d-statistic, what other test statistics do you use to help
detect serial or autocorrelations?
(a) Jarque-Bera
(b) Shapiro-Wilk
(c) Box-Pierce-Ljung
(d) Johnson
(a) close to 0
(b) close to 2
(c) different from 2
(d) cannot be computed

(a) Yes
(b) No
(c) Uncertain
(d) Sometimes

(a) BLUE
(b) unbiased but not consistent
(c) biased and not consistent
(d) none of the above

(a) OLS
(b) GLS
(c) IV
(d) DW
10. In the test of the UEH, St+k = c0 + c1Ft,t+k + et+k , based on joint hypothesis
H0: c0 = 0 and c1 = 1, which test statistic is used? Note that N is the sample
size.
(a) tN-2
(b) Fk,N-2
(c) Fk-1,N-2
(d) F2,N-2
Answer Key:
1d, 2c, 3c, 4b, 5c, 6c, 7b, 8c, 9c, 10d
D.6
TEST SIX
Please circle the letter associated with the most appropriate answer.
1. When a stochastic process {Yt} is covariance-stationary, the following
statement is not true:
(a)
(b)
(c)
(d)
4. While both a trend stationary process and a unit root process may display
similar looking trends, their difference is shown by
(a)
(b)
(c)
(d)
5. If Xt, Yt and Zt are all unit root processes, and we perform OLS regression
of
Xt = a+bYt+cZt+et where et is a disturbance that is independent of all the
other variables, then
(a)
(b)
(c)
(d)
(a) the process variable must have a time trend that increases linearly
with time
(b) the process variable must have a variance that increases linearly
with time
(c) the process variable is unpredictable
(d) the process variable is trend stationary
9. In the long-run, if PPP holds, then
(a) exchange rate and the two price indices are integrated processes
(b) real exchange rate is stationary
(c) variance of real exchange rate must converge to zero
(d) exchange rate follows a deterministic trend
b̂, and
(a)
testing if Yt - b̂Zt is stationary
(b)
(c)
(d)
Answer Key:
1c, 2d, 3d, 4c, 5d, 6b, 7d, 8b, 9b, 10c
*In Q10, (d) does not necessarily have the ADF distribution used for t-values.
D.7
TEST SEVEN
Please circle the letter associated with the most appropriate answer.
1. In the CAPM model, beta is
(a) Y = aX
(b) cov(Ri,RM)/var(RM)
(c) corr(Ri,RM)/σM
(d) (XTX)⁻¹(XTY)

(a) < 0.01
(b) > 0.01
(c) < 0.02
(d) > 0.02
4. The random walk hypothesis is best described by the postulation that one
cannot make
(a)
(b)
(c)
(d)
5. If a researcher uses Nikkei 225 index futures closing price data at Chicago
Mercantile Exchange on date YYY, compares those with Nikkei 225
index futures closing price data at Singapore Exchange on the date YYY,
and finds significant difference in the notional price, what is the most
likely reason to explain the difference?
(a)
(b)
(c)
(d)
7. What is not a good reason for why a financial event study should avoid
calendar-time clustering of sample firm events?
(a) There may be a systematic event impacting the market, unrelated to
the financial event, that occurred at a calendar time
(b) Correlations across different stocks at the same calendar time may
introduce more sampling errors
(c) The study implications may become conditional on the general
business condition or regime at that calendar time
(d) The market may impact different stocks differently at the same
calendar time
8. In a day-of-the-week test of significant returns, the reported p-values for
the t-statistics of c1, c2, c3, c4, c5 are 0.06, 0.01, 0.03, 0.09, and 0.04
respectively. Based on a test at the 5% significance level for a 2-tail test, we
can
(a)
(b)
(c)
(d)
(a) 0.02%
(b) 0.03%
(c) -0.01%
(d) none of the above
(a) OLS is biased
(b) OLS is inconsistent
(c) OLS is inefficient
(d) OLS cannot provide for a test
Answer Key:
1b, 2a, 3a, 4d, 5b, 6b, 7d, 8b, 9b, 10c
* For Q10, OLS can provide the HCCME for the test, so it is not (d).
D.8
TEST EIGHT
Please circle the letter associated with the most appropriate answer.
1. The
is
(a)
(b)
(c)
(d)
3. What is not an appropriate finance concept you learn from this course?
(a)
(b)
(c)
(d)
6. Multi-factor models are:
(a)
(b)
(c)
(d)
(a) buy-and-hold
(b) track an index
(c) outperform by taking risk
(d) match market performance

(a) stock-picking
(b) levered short sales
(c) market timing
(d) sector rotation
Appendix E
SOLUTIONS TO PROBLEM SETS
Chapter 1
1.1 Show E(X+Y+Z) = E(X) + E(Y) + E(Z).
E(X+Y+Z) = ∫∫∫ (x + y + z) f(x,y,z) dx dy dz
= ∫∫∫ x f(x,y,z) dxdydz + ∫∫∫ y f(x,y,z) dxdydz + ∫∫∫ z f(x,y,z) dxdydz
= ∫ x [∫∫ f(x,y,z) dydz] dx + ∫ y [∫∫ f(x,y,z) dxdz] dy + ∫ z [∫∫ f(x,y,z) dxdy] dz
= ∫ x fX(x) dx + ∫ y fY(y) dy + ∫ z fZ(z) dz
= E(X) + E(Y) + E(Z)
1.2 cov(Σi=1^N Xi, Σj=1^N Xj)
= E{ Σi=1^N [Xi - E(Xi)] Σj=1^N [Xj - E(Xj)] }
= E{ Σi=1^N Σj=1^N [Xi - E(Xi)][Xj - E(Xj)] }
= Σi=1^N Σj=1^N E{[Xi - E(Xi)][Xj - E(Xj)]}
= Σi=1^N Σj=1^N cov(Xi, Xj)
= 1T [cov(Xi, Xj)]NxN 1
where 1 = (1 1 … 1)T is N×1.
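The identity cov(Σi Xi, Σj Xj) = Σi Σj cov(Xi, Xj) can be checked on any data matrix; a sketch with random data (the dimensions are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(3)
X = rng.standard_normal((1000, 4))     # 1000 observations of N = 4 variables

lhs = np.var(X.sum(axis=1), ddof=1)    # variance of the row sums
rhs = np.cov(X, rowvar=False).sum()    # sum of all N x N covariance entries
print(lhs, rhs)                        # identical up to rounding
```

Both sides use the same (n - 1) divisor, so the equality is exact, not just approximate.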
PU3 = ∂P(U1, U2, U3)/∂U3, holding U1 and U2 constant.
Ỹ1 = b + Ũ1
i.e. Ỹ1 = b + 1 with probability of 0.5
     Ỹ1 = b - 1 with probability of 0.5
Ỹ2 = 2b + Ũ2
i.e. Ỹ2 = 2b + 2 with probability of 0.5
     Ỹ2 = 2b - 2 with probability of 0.5

b̂ = (X1Y1 + X2Y2)/(X1² + X2²) = (Y1 + 2Y2)/5

i.e.
b̂ = [(b+1) + 2(2b+2)]/5 = (5b+5)/5 with probability of 0.25
b̂ = [(b+1) + 2(2b-2)]/5 = (5b-3)/5 with probability of 0.25
b̂ = [(b-1) + 2(2b+2)]/5 = (5b+3)/5 with probability of 0.25
b̂ = [(b-1) + 2(2b-2)]/5 = (5b-5)/5 with probability of 0.25
(iii) Find the mean and variance of b̂.
E(b̂) = 0.25[(1/5)(5b+5) + (1/5)(5b-3) + (1/5)(5b+3) + (1/5)(5b-5)]
     = (1/20)(20b) = b
Var(b̂) = E(b̂²) - b²
       = (0.25/25)[(25b² + 50b + 25) + (25b² - 30b + 9) + (25b² + 30b + 9)
         + (25b² - 50b + 25)] - b²
       = (1/100)(100b² + 68) - b² = 0.68
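A sketch that enumerates the four equally likely outcomes of b̂ and confirms E(b̂) = b and Var(b̂) = 0.68 (the true value b = 2 is an arbitrary choice for illustration):

```python
from itertools import product

# Enumerate the four equally likely (U1, U2) pairs; b = 2 is illustrative.
b = 2.0
outcomes = []
for u1, u2 in product([1, -1], [2, -2]):    # each pair has probability 0.25
    y1 = b + u1
    y2 = 2 * b + u2
    outcomes.append((y1 + 2 * y2) / 5)      # b_hat = (Y1 + 2Y2)/5

mean = sum(outcomes) / 4
var = sum((o - mean) ** 2 for o in outcomes) / 4
print(mean, var)                            # b and 0.68
```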
1.5 f(x,y) = e^-(x+y) for 0 < x, y < ∞; 0 otherwise.
fX(x) = ∫0^∞ e^-(x+y) dy = e^-x ∫0^∞ e^-y dy = e^-x [-e^-y]0^∞
fX(x) = e^-x , 0 < x < ∞
fY(y) = e^-y , 0 < y < ∞
Since f(x,y) = fX(x)·fY(y), X and Y are stochastically
independent.
1.6
[Figure: the support of f(x,y) = 1 is the triangle below the line y = x/2, for
0 < x < 2, 0 < y < 1.]
fX(x) = ∫0^{x/2} 1 dy = [y]0^{x/2} = x/2 for 0 ≤ x ≤ 2 (0 o.w.)
fY(y) = ∫_{2y}^{2} 1 dx = [x]_{2y}^{2} = 2(1 - y) for 0 ≤ y ≤ 1 (0 o.w.)
E(X) = ∫0^2 x (x/2) dx = [x³/6]0^2 = 4/3
E(X²) = ∫0^2 x² (x/2) dx = [x⁴/8]0^2 = 2
Var(X) = E(X²) - E²(X) = 2 - (4/3)² = 2/9
E(Y) = ∫0^1 y · 2(1 - y) dy = [y² - (2/3)y³]0^1 = 1/3
E(Y²) = ∫0^1 y² · 2(1 - y) dy = [(2/3)y³ - (1/2)y⁴]0^1 = 1/6
Var(Y) = E(Y²) - E²(Y) = 1/6 - 1/9 = 1/18
E(XY) = ∫0^2 ∫0^{x/2} xy (1) dy dx = ∫0^2 x [y²/2]0^{x/2} dx = ∫0^2 (1/8)x³ dx
      = (1/8)[x⁴/4]0^2 = 1/2
Cov(X,Y) = E(XY) - E(X)E(Y) = 1/2 - (4/3)(1/3) = 1/18
f(x|y) = f(x,y)/fY(y) = 1/[2(1 - y)] for 2y ≤ x ≤ 2; note that ∫ f(x|y) dx = 1.
E(X|Y) = ∫_{2y}^2 x f(x|y) dx = [1/(2(1 - y))] ∫_{2y}^2 x dx
       = [1/(2(1 - y))][x²/2]_{2y}^2 = [1/(4(1 - y))](4 - 4y²) = 1 + y
f(y|x) = f(x,y)/fX(x) = 1/(x/2) = 2/x for 0 ≤ y ≤ x/2
E(Y|X) = ∫0^{x/2} y (2/x) dy = (2/x)[y²/2]0^{x/2} = (1/x)(x²/4) = x/4
E(X²|Y) = [1/(2(1 - y))] ∫_{2y}^2 x² dx = [1/(2(1 - y))][x³/3]_{2y}^2
        = [1/(6(1 - y))](8 - 8y³) = (4/3)(1 + y + y²)
Var(X|Y) = E(X²|Y) - E²(X|Y) = (4/3)(1 + y + y²) - (1 + y)²
         = 1/3 - (2/3)y + (1/3)y² = (1/3)(1 - y)²
E(Y²|X) = ∫0^{x/2} y² (2/x) dy = (2/x)[y³/3]0^{x/2} = x²/12
Var(Y|X) = E(Y²|X) - E²(Y|X) = x²/12 - x²/16 = x²/48
1.7 Xit is distributed as univariate normal, N(0,1) for i = 1,2,3, and
t = 1,2,…,60. Yt = 0.5X1t + 0.3X2t + 0.2X3t. Thus, E(Yt) = 0 since
E(Xit) = 0. E(Yt²) = E(Yt - E(Yt))² = var(Yt) = var(0.5X1t + 0.3X2t + 0.2X3t)
= 0.25 + 0.09 + 0.04 = 0.38.
Standard Deviation(Yt) = √0.38 ≈ 0.616.
1.8 X̄ = (1/A) Σ Xi ~ N(μ, K/A²). Hence A = 60. If the random vector Y = (X1, X2,
…, X60) has sample mean
(1/60) Σi=1^60 Xi = 0.5,
then the 95% confidence interval (a, b) is
a = 0.5 - 1.96 √(0.24/60) = 0.376
b = 0.5 + 1.96 √(0.24/60) = 0.624
= Rt+[N-1],t+N + Rt+[N-2],t+[N-1] + Rt+[N-3],t+[N-2] + … + Rt,t+1
= N × (1/N){Rt+[N-1],t+N + Rt+[N-2],t+[N-1] + … + Rt,t+1}
This converges to N × Normal(μ, σ²/N) as N increases. Hence, it
is distributed as Normal(Nμ, σ²N).
(ii) Use the Jarque-Bera test statistic
JB = n [ŝ²/6 + (k̂ - 3)²/24], distributed as χ²2,
where ŝ = μ̂3/σ̂³ and k̂ = μ̂4/σ̂⁴, with
σ̂² = (1/59) Σt=1^60 (Rt - R̄)²
μ̂3 = (1/59) Σt=1^60 (Rt - R̄)³
μ̂4 = (1/59) Σt=1^60 (Rt - R̄)⁴.
On average, the monthly returns will display negative skew and fatter
tails, as seen in the bold pdf.
2.3 Let X = [X1 X2 X3], an n × 3 matrix, and let L be the n × 1 vector of ones.
The average return vector is M3x1 = XT L/n.
E(M) = μ3x1. The variance-covariance matrix of M, or
Var(M) = diag(1/n² LT Σnxn L), where diag(K) means a diagonal matrix
with diagonal elements = K. Note that M is MVN.
2.4 Using the Law of Iterated Expectations, take unconditional expectation
over Et(Rt+1 | Φt) = XtQ, where Φt is the information set at t, to obtain
E(Rt+1) = E(XtQ) or E(Rt+1 - XtQ) = 0. Since Rt+1 and XtQ are stationary,
assume ergodicity, and test for the sample mean of the time series
{Rt+1 - XtQ} to be zero.
2.5 E[E(et+1|Pt)] = E(et+1) = 0 by the Law of Iterated Expectations. Also,
Pt E(et+1|Pt) = E(Ptet+1|Pt) = 0 implies E(Ptet+1) = 0. Thus,
cov(et+1, Pt) = E(Ptet+1) - E(Pt)E(et+1) = 0 - 0 = 0. Hence zero
correlation is implied.
2.6 X = e^Y where Y ~ N(μ, σ²). E(X) = exp(μ + σ²/2).
E(X²) = E(e^2Y) = exp(2μ + (4/2)σ²) = exp(2μ + 2σ²).
So var(X) = E(X²) - [E(X)]² = exp(2μ + 2σ²) - exp(2μ + σ²).
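A Monte Carlo sketch of these lognormal moment formulas, with assumed values μ = 0 and σ = 0.5:

```python
import math
import numpy as np

rng = np.random.default_rng(7)

# X = e^Y with Y ~ N(mu, sigma^2); mu and sigma are assumed values.
mu, sigma = 0.0, 0.5
X = np.exp(mu + sigma * rng.standard_normal(1_000_000))

theory_mean = math.exp(mu + sigma**2 / 2)
theory_var = math.exp(2*mu + 2*sigma**2) - math.exp(2*mu + sigma**2)
print(X.mean(), theory_mean)
print(X.var(), theory_var)
```

The simulated mean and variance should match the closed forms up to sampling error.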
Chapter 3
3.1 (i) b̂ = (Σ XY - 60 X̄ Ȳ)/(Σ X² - 60 X̄²) = 1.64
â = Ȳ - b̂ X̄ = 0.0052
σ̂e² = SSR/58 = 10⁻⁶
var(b̂) = σ̂e² [1/(Σ X² - 60 X̄²)] = 0.0004
var(â) = σ̂e² [1/60 + X̄²/(Σ X² - 60 X̄²)] = 2.667 × 10⁻⁸
3.2 s² = [1/(n-1)] Σk=1^n (Xk - X̄)² = [1/(n-1)][Σk=1^n Xk² - nX̄²].
Also, Var(X̄) = E(X̄²) - μ² = σ²/n. So, E(X̄²) = μ² + σ²/n.
E(s²) = [1/(n-1)][Σk=1^n E(Xk²) - nE(X̄²)]
      = [1/(n-1)][n(σ² + μ²) - n(μ² + σ²/n)]
      = [1/(n-1)](n - 1)σ² = σ².
3.3 No, Bt is not a stationary process. This is because BT becomes zero with
zero variance.
3.4 R2
3.5 (i) X̄ = 4.5 , Ȳ = 6.5
3.6 Strictly speaking, if Y = X²/(α + βX), one just needs 2 points (X1, Y1),
(X2, Y2) to infer α and β. But requiring OLS implies random error is
involved. A suitable assumption of the data structure is:
Random error ε: Y = X²/(α + βX) + ε, so that, provided Y ≠ 0,
X²/Y = α + βX + u, with u ≈ -ε X²/Y².
Then use OLS to regress X²/Y on constant and X.
3.7 Let the variance of the random error be σu² and the income variable be ai.
Then, the variance of the estimator of the coefficient of the income
variable is σu² / Σi (ai - ā)².
3.9 (i) σ̂e² = [1/(T-2)] Σt=1^T êit² = [1/(T-2)] Σt=1^T (rit - α̂i - β̂i rmt)²
σ̂m² = [1/(T-1)] Σt=1^T (rmt - r̄m)²
and the variance of rit is σ̂²(rit) = β̂i² σ̂m² + σ̂e².
Chapter 4
4.1 (i) σ̂u = sqrt(0.00245/98) = 0.005. sd(â) = 0.005 × 0.316 = 0.00158,
sd(b̂) = 0.005 × 0.25 = 0.00125; ta = 2.53, tb = 3.2
have b < 1. Thus the portfolio beta bp < 1. E(rp) = rf + bp[E(rm) - rf] <
rf + [E(rm) - rf] = E(rm). Thus rp is likely to fall below rm on average.
4.3 We need to make some assumption about initial outlay. Assume he has
$Pt to start with. Instead of buying stock, he puts $Pt into a riskfree bond
yielding interest rate r. He short-sells $Pt and puts this into the bond too, but
has to pay r for borrowing the scrips. At t+1, he buys in at $Pt+1. Final payoff
at t+1 is $Pt(1+r) + [Pt - Pt+1]. Initial outlay at t is $Pt. Return factor is
[Pt(1+r) + Pt - Pt+1]/Pt = 1 + r - (Pt+1/Pt - 1).
Return rate is r - (Pt+1/Pt - 1). This is r - (return in a long position).
So if Pt+1 = Pt, then the return rate is just r. If Pt+1 = 0, then the return rate is
100% + r.
4.4 Yes, if the asset's beta is negative and the market risk premium is
positive.
4.5 Gold's return rate is low and negatively or lowly correlated with market
return. Thus gold has a beta close to zero, if not negative.
Chapter 5
5.1 Utility companies are monopolies or oligopolies and hold strategic
resources that belong in part to the country. They should not overcharge
and build excess profits. There is no competition among service
providers, unlike for private goods. Thus the rates must be commensurate
with keeping the firm ongoing but without profiting from the captive
consumers.
5.2 According to DGM, if all earnings are issued as dividends, then
P/E = 1/(R-g) where P is current stock price, E is expected next period
Earnings, R is the risky discount rate, and g is the earnings growth. Hence
a high P/E would imply a high growth rate, provided R > g (hence risk
also would be higher), and thus higher future earning prospects.
5.3 Earnings are $1 per share a year forever. Share price =
$(1/1.05 + 1/1.052 + 1/1.053 + ) = 1/0.05 = $20.
5.4 The generated dividends each year may be shown as the sum of the
entries in each row.
$/share  dividend   generated from     generated from     generated from
         from       first plough-back  second plough-back third plough-back
         current    of retained        of retained        of retained
         earnings   earnings           earnings           earnings
2003     0.4
2004     0.4        (0.6×1.05)×0.4
2005     0.4        (0.6×1.05)×0.4     (0.6²×1.05²)×0.4
2006     0.4        (0.6×1.05)×0.4     (0.6²×1.05²)×0.4   (0.6³×1.05³)×0.4
2007     0.4        (0.6×1.05)×0.4     (0.6²×1.05²)×0.4   (0.6³×1.05³)×0.4
In general, in the nth plough-back of retained earnings, the additional dividend
issue due to that portion is (0.6^n × 1.05^n) × 0.4.
The present value of the dividend stream (summing all diagonals) per
share is:
0.4{1/1.05 + 0.6×1.05/1.05² + 0.6²×1.05²/1.05³ + 0.6³×1.05³/1.05⁴ + …}
+ (0.4/1.05){1/1.05 + 0.6×1.05/1.05² + 0.6²×1.05²/1.05³ + 0.6³×1.05³/1.05⁴ + …}
+ (0.4/1.05²){1/1.05 + 0.6×1.05/1.05² + 0.6²×1.05²/1.05³ + 0.6³×1.05³/1.05⁴ + …}
+ …
= 0.4[1 + 1/1.05 + 1/1.05² + …]{(1/1.05)(1 + 0.6 + 0.6² + 0.6³ + …)}
= 0.4(1.05/0.05)(1/1.05)(1/0.4)
= 1/0.05
= $20
Therefore, the price per share is unchanged at $20. This is unchanged by
the dividend policy as the retained earnings do not yield additional returns
over the original share returns if all dividends were issued.
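The $20 result can be confirmed by summing the plough-back layers numerically (the 400-layer truncation is arbitrary; the tail is negligible):

```python
# Sum the plough-back layers of 5.4: the layer created in year t pays
# (0.6*1.05)^(t-1) * 0.4 per year from year t onward, discounted at R = 5%.
R = 0.05
pv = 0.0
for t in range(1, 400):                      # truncation; tail is negligible
    layer = (0.6 * 1.05) ** (t - 1) * 0.4    # level perpetuity starting year t
    pv += (layer / R) / (1 + R) ** (t - 1)   # value at t-1, discounted to 0
print(round(pv, 2))                          # 20.0
```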
5.5 P = D1/(R-g) = .5*1.03/(0.05) = $10.30
5.6 There is no inherent inconsistency between the SML and DGM. The SML
is a single period model providing the risk-adjusted required rate of return,
while the DGM is on pricing a stock given the required rate of return and
all future expected earnings or dividends. The DGM further imposes some
restrictions such as constant expected return and constant growth in
earnings or dividends. The latter may be inconsistent with empirical
versions of CAPM where over time the required rates of return are
allowed to vary from period to period.
Chapter 6
6.1 Find the mean and variance of rt.
rt = μ + φ rt-1 + εt
Let E(rt) = m for all t. Then
m = μ + φm, so m = μ/(1 - φ).
Let Var(rt) = σr² for all t. Then
σr² = φ²σr² + σε², so σr² = σε²/(1 - φ²).
(iii) rt = 1.5% + ut - 0.1ut-1
Obtain ρ1 = -0.1/(1 + 0.1²) = -0.099. The MA is invertible since the root of
(1 - 0.1B) = 0 lies outside the unit circle, i.e. |B| > 1, so an AR(∞)
representation is appropriate:
(1 - 0.1B)⁻¹rt = (10/9) × 1.5% + ut, or
rt = 1.667% - 0.1rt-1 - 0.1²rt-2 - 0.1³rt-3 - … + ut
Forecast is 1.667% - 0.2% - 0.01% - 0.0012%
= 1.4558% or approximately 1.46%.
6.3 (i) E(Yt) = 5;
(ii) var(Yt) = 1.16 var(ut); autocovariances: γ(1) = −0.4 var(ut); γ(k) = 0 for |k| > 1. The ACF of Yt is 1 for k = 0, −0.4/1.16 = −0.345 for k = 1, and 0 for |k| > 1.
(iii) The mean and ACF are independent of t, hence Yt is covariance stationary.
(iv) (Yt − 5) = (1 − 0.4B)ut, so (1 − 0.4B)⁻¹(Yt − 5) = ut. The root or zero of the equation (1 − 0.4B) = 0 is B = 1/0.4 = 2.5, outside the unit circle, so the MA process is invertible. The AR(∞) representation is (1 + 0.4B + 0.16B² + 0.064B³ + ...)(Yt − 5) = ut, or
Yt = 25/3 − 0.4Yt−1 − 0.16Yt−2 − 0.064Yt−3 − ... + ut
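A quick check (not part of the original solution) of 6.3: for Yt = 5 + ut − 0.4ut−1, the variance is (1 + 0.4²)var(ut), the lag-1 autocorrelation is −0.4/1.16, and the AR(∞) intercept is 5/(1 − 0.4) = 25/3:

```python
# MA(1) moments and inversion constants for Y_t = 5 + u_t - 0.4 u_{t-1}.
theta = -0.4
var_y = 1 + theta**2                 # in units of var(u)
rho1 = theta / (1 + theta**2)
intercept = 5 / (1 - 0.4)            # 25/3
print(round(var_y, 2), round(rho1, 3), round(intercept, 4))  # -> 1.16 -0.345 8.3333
```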
6.4 (i) (1 − 0.5B − 0.4B²)Yt = 2 + ut. The roots of the characteristic equation (1 − 0.5B − 0.4B²) = 0 are B = −2.33 and 1.075. Both are outside the unit circle, so the AR process is stationary.
(ii) E(Yt) = 2/(1 − 0.5 − 0.4) = 20
(iii) var(Yt) = 0.25 var(Yt) + 0.16 var(Yt) + 2(0.5)(0.4)γ(1) + var(ut)
implies 0.59 var(Yt) = 0.4γ(1) + var(ut).
Next multiply the equation by Yt−1 and take expectations: γ(1) = 0.5 var(Yt) + 0.4γ(1), so var(Yt) = 1.2γ(1).
Thus 0.308γ(1) = var(ut), or γ(1) = 3.247 var(ut), and var(Yt) = 3.896 var(ut).
Multiply by Yt−2 and take expectations:
γ(2) = 0.5γ(1) + 0.4 var(Yt).
Therefore ρ(0) = 1; ρ(1) = 3.247/3.896 = 0.833; ρ(2) = 0.817.
In general, for higher k, ρ(k) = 0.5ρ(k−1) + 0.4ρ(k−2).
(iv) φ11 = ρ(1) = 0.833; φ22 = (ρ(2) − ρ(1)²)/(1 − ρ(1)²) = 0.402;
φkk = 0 for k > 2.
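A numerical check (not part of the original solution) of 6.4 via the Yule-Walker relations; note that exact arithmetic gives φ22 = 0.40, while the text's 0.402 comes from using the rounded ρ's:

```python
import numpy as np

# AR(2) (1 - 0.5B - 0.4B^2) Y_t = 2 + u_t: roots, mean, ACF and PACF at lag 2.
phi1, phi2 = 0.5, 0.4
roots = np.roots([-phi2, -phi1, 1.0])          # zeros of 1 - 0.5B - 0.4B^2
mean = 2 / (1 - phi1 - phi2)
rho1 = phi1 / (1 - phi2)                       # Yule-Walker: rho1 = phi1/(1 - phi2)
rho2 = phi1 * rho1 + phi2
pacf2 = (rho2 - rho1**2) / (1 - rho1**2)
print(np.round(sorted(roots), 3), round(mean), round(rho1, 3), round(rho2, 3), round(pacf2, 2))
```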
6.5 (i) Since the ut's are uncorrelated,
var(Yt) = var(ut) + A²[var(ut−1) + var(ut−2) + ...].
The second term on the right sums to infinity, hence var(Yt) is not finite and Yt is not covariance stationary.
(ii) Yt − Yt−1 = ut + (A − 1)ut−1. Thus the first difference is MA(1), i.e. Yt is ARIMA(0,1,1), and the differenced series is stationary. (Remember all finite MA processes are stationary.)
The ACF of the first-differenced series is
ρ(0) = 1
ρ(k) = (A − 1)/[1 + (A − 1)²] for k = +1 or −1
ρ(k) = 0 for |k| > 1
6.6 With r̄p = (1/N*) Σ_{i=1}^{N*} ri,
var(r̄p) = cov(r̄p, r̄p) = (1/N*²)[N*² b̄p² σm² + Σi Σj cov(ei, ej)]
= b̄p² σm² + (1/N*²) Σi Σj cov(ei, ej),
where b̄p = (1/N*) Σ_{i=1}^{N*} bi.
(ii)
(iii)
7.3
(iii) The per-period discount factor is (4/3)^(1/2). The discount over the 2 periods from t = 1 to t = 3 is 4/3.
P1|B = E(P3|B)/(4/3) = ($25/3)/(4/3) = $6 1/4
P1|X = E(P3|X)/(4/3) = ($7/2)/(4/3) = $2 5/8
P1 = E(P3)/(4/3) = ($67/15)/(4/3) = $3 7/20 or $3.35
(iv) The observed market price of $3.45 is approximately P1 without
information. Hence the market is not efficient with respect to
information B or X at time t=1.
7.5 To show the variance ratio VR(q) = 1 + 2 Σ_{k=1}^{q−1} (1 − k/q) ρk.
Now rt(q) ≡ rt + rt−1 + rt−2 + ... + rt−q+1, so
Var(rt(q)) = q Var(rt) + 2[(q − 1)γ1 + (q − 2)γ2 + (q − 3)γ3 + ... + γq−1],
where γk is the k-th autocovariance of rt. Hence
VR(q) = Var(rt(q))/[q Var(rt)]
= 1 + (2/q)[(q − 1)ρ1 + (q − 2)ρ2 + (q − 3)ρ3 + ... + ρq−1]
= 1 + 2 Σ_{k=1}^{q−1} (1 − k/q) ρk.
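The variance-ratio identity can be checked numerically (not part of the original solution), here using AR(1) autocovariances γk = φ^k/(1 − φ²) as an illustrative test case:

```python
# Compare Var(r_t(q)) / [q Var(r_t)] computed by brute force against the
# closed form 1 + 2 * sum_{k=1}^{q-1} (1 - k/q) rho_k, for an AR(1) example.
phi, q = 0.5, 5

def gamma(k):                      # AR(1) autocovariance with var(u) = 1
    return phi**abs(k) / (1 - phi**2)

var_q = sum(gamma(i - j) for i in range(q) for j in range(q))   # Var(r_t(q))
vr_direct = var_q / (q * gamma(0))
vr_formula = 1 + 2 * sum((1 - k/q) * phi**k for k in range(1, q))
print(round(vr_direct, 6), round(vr_formula, 6))  # -> 2.225 2.225
```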
0.099483 + risk-free return over 5 yrs = 0.099483 + [1.01^5 − 1] = 0.1505, or 15.05% over 5 yrs. This is also (1.1505^(1/5) − 1) = 0.0284 or 2.84% p.a.
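A quick check (not part of the original solution) of the annualisation arithmetic:

```python
# 9.9483% excess return plus the 5-year risk-free return (1.01^5 - 1),
# then converted to an equivalent per-annum rate.
five_yr = 0.099483 + (1.01**5 - 1)
per_annum = (1 + five_yr)**(1/5) - 1
print(round(five_yr, 4), round(per_annum, 4))  # -> 0.1505 0.0284
```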
Chapter 9
9.1 (i) N(0, 21*0.01) = N(0, 0.21)
(ii) CAAR(τ1, τ2) = Σ_{t=τ1}^{τ2} AARt = (1/N) Σ_{i=1}^{N} CARi(τ1, τ2), which has variance (1/N²) Σ_{i=1}^{N} (τ2 − τ1 + 1) σi².
Adjusted R² = 0.2928
(ii) F1,98 = [R²/1]/[(1 − R²)/98] = 41.996; t98 = −√41.996 = −6.480
(iii) t-statistic of SIZE = −6.480; standard error = −0.003/(−6.480) = 0.000463
(iv) Smaller target size has a significantly larger increase in returns at the 1% significance level.
10.2 t-statistic = (0.867-1)/0.113 = -1.177. We need to know the sample
size in order to determine the degrees of freedom of the t-statistic. But for
typical n > 30, the test will not reject H0 at 5% level.
10.3 Let

Y = (Y1, Y2, ..., Yn)^T,  Ȳ = (Ȳ, Ȳ, ..., Ȳ)^T,  L = (1, 1, ..., 1)^T,

X = ( X21  X31  .....  Xk1
      X22  X32  .....  Xk2
       :    :   .....   :
      X2n  X3n  .....  Xkn ),

X̄ = ( X̄2  X̄3  .....  X̄k
       :    :   .....   :
      X̄2  X̄3  .....  X̄k ),

B = (b2, b3, ..., bk)^T,  D = (d2, d3, ..., dk)^T,

and a and c are constants.
Then in general, the regression Y = aL + XB will yield OLS estimates â, B̂ that are different from the OLS estimates ĉ, D̂ in the regression Y − Ȳ = cL + (X − X̄)D.
In the special case where k = 2,

b̂2 = Σ_{i=1}^{N} (X2i − X̄2) Yi / Σ_{i=1}^{N} (X2i − X̄2)²

in the first regression, and

d̂2 = Σ_{i=1}^{N} [(X2i − X̄2) − 0] Yi / Σ_{i=1}^{N} [(X2i − X̄2) − 0]²

in the second regression, since the demeaned regressor has mean 0; hence b̂2 = d̂2. For the second regression,
ĉ = (Ȳ − Ȳ) − d̂2 (X̄2 − X̄2) = 0 − 0 = 0.
Thus the OLS estimates satisfy B̂ = D̂, but ĉ = 0 whereas â is not necessarily 0.
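An illustrative simulation (made-up data, not from the text) of this claim: the OLS slope from the raw regression equals the slope from the demeaned regression, whose fitted intercept is exactly zero:

```python
import numpy as np

# Compare OLS on Y = a + b X + e with OLS on the demeaned data.
rng = np.random.default_rng(0)
x = rng.normal(size=100)
y = 1.5 + 2.0 * x + rng.normal(size=100)

X = np.column_stack([np.ones(100), x])
a_hat, b_hat = np.linalg.lstsq(X, y, rcond=None)[0]

Xd = np.column_stack([np.ones(100), x - x.mean()])
c_hat, d_hat = np.linalg.lstsq(Xd, y - y.mean(), rcond=None)[0]

print(np.isclose(b_hat, d_hat), round(abs(c_hat), 10))  # -> True 0.0
```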
10.4 (i)
(ii) B̂ = (131.72, 19.423, 26.844)^T
(iii) 36.84 I_{52×52}
(i) X^T X = ( 125   1.5
              1.5    3  ),

so (X^T X)⁻¹ = (1/372.75) (  3     −1.5
                            −1.5   125  ).

σ̂u² = 0.85/123 = 0.006910, and

σ̂u² (X^T X)⁻¹ = (  5.56137×10⁻⁵    −2.78068×10⁻⁵
                   −2.78068×10⁻⁵    0.002317237  ).

t_â = 0.3/0.00746 = 40.23 and, for H0: b = 1, t = (b̂ − 1)/0.0481 = 0.09/0.0481 = 1.869.

(ii) F2,123 = { (â, b̂ − 1) (X^T X) (â, b̂ − 1)^T / 2 } / ( û^T û / 123 ) = 809.87
10.10
10.11(i)
(ii)
(iii)
(iv)
Chapter 11
11.1
11.2
Chapter 12
12.1
12.2 The F-statistic for testing H0: Rβ = r is

F = { (RB̂ − r)^T [R (X^T X)⁻¹ R^T]⁻¹ (RB̂ − r) / 2 } / ( û^T û / (T − k) ) = 0.44786

p-value = 0.6907
Hence we cannot reject H0 at the conventional 10% significance level.
12.3
Since the disturbances are all classical, this is clearly a multicollinearity problem: Y and Z are highly correlated, and neither is correlated with the noise. Given that A and B show significance for Y and Z, it would be incorrect to drop both, since regressing on X alone would be an under-specification with a missing variable and would result in a biased coefficient for X. He could pick either A or B, whichever has the higher R²; or use some theoretical justification to settle on A or B; or, if he could increase the sample size until the c2 and c3 estimates become significant, continue with C.
12.4
Chapter 13
13.1
13.2
13.3
No. The alpha and beta should explain all the cross-sectional expected returns. However, if there are severe measurement errors in β̂j, this may cause some of the variation to be explained by the irrelevant σ̂j.
13.4
13.5
13.6
Yes. If the market proxy is not the true market return, it could be that the true market return is Rmt = a1rmt + a2x1t + a3x2t. A regression rjt = a + b Rmt + ejt that produces a significant b̂ is supportive of the CAPM.
There is contemporaneous correlation among the explanatory variables that could cause serious bias in the interpretation of the t-statistics. Removing all or some of the xjt's may make the estimate â1 significantly positive, which would accord with the CAPM, or at least with the interpretation that the market proxy is MV efficient.
L ≡ log f(Y1, Y2, ..., YT | X1, X2, ..., XT)
= −(NT/2) ln(2π) − (T/2) ln|Σ| − (1/2) Σ_{t=1}^{T} (Yt − A − BXt)^T Σ⁻¹ (Yt − A − BXt)

where A ≡ (a1, a2, ..., aN)^T and B ≡ (b1, b2, ..., bN)^T. Note that Xt is a scalar.
Finding the FOCs:

∂L/∂A = Σ⁻¹ Σ_{t=1}^{T} (Yt − A − BXt) = 0_{N×1}
∂L/∂B = Σ⁻¹ Σ_{t=1}^{T} (Yt − A − BXt) Xt = 0_{N×1}
∂L/∂Σ⁻¹ = (T/2)Σ − (1/2) Σ_{t=1}^{T} (Yt − A − BXt)(Yt − A − BXt)^T = 0_{N×N}

Hence

B̂ = Σ_{t=1}^{T} (Yt − Ȳ)(Xt − X̄) / Σ_{t=1}^{T} (Xt − X̄)²  and  Â = Ȳ − B̂X̄,

where Ȳ ≡ T⁻¹ Σ_{t=1}^{T} Yt (N×1) and X̄ ≡ T⁻¹ Σ_{t=1}^{T} Xt. Moreover,

Σ̂ = (1/T) Σ_{t=1}^{T} (Yt − Â − B̂Xt)(Yt − Â − B̂Xt)^T.

The estimators are the same as the OLS estimators under joint normality.
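An illustrative simulation (made-up parameters, not from the text) confirming that the ML estimators derived above reduce to equation-by-equation OLS:

```python
import numpy as np

# N = 3 assets, T = 200 periods, scalar X_t; compare the closed-form ML
# estimators with OLS intercepts and slopes.
rng = np.random.default_rng(1)
T = 200
x = rng.normal(size=T)
A = np.array([0.1, -0.2, 0.3])
B = np.array([1.0, 0.8, 1.2])
Y = A + np.outer(x, B) + rng.normal(size=(T, 3))      # T x N

xd = x - x.mean()
B_ml = (Y - Y.mean(axis=0)).T @ xd / (xd @ xd)        # slope formula from the FOCs
A_ml = Y.mean(axis=0) - B_ml * x.mean()

coef = np.linalg.lstsq(np.column_stack([np.ones(T), x]), Y, rcond=None)[0]
print(np.allclose(A_ml, coef[0]), np.allclose(B_ml, coef[1]))  # -> True True
```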
Chapter 14
14.1 (i) cov(r1t, r2t) = −0.12σm² + 0.036 > 0, therefore σm² < 0.3, or 30%.
(ii) The key is to note that r1t and r2t can also be represented with
e1t = 0.2It + ε1t and
e2t = 0.3It + ε2t, so cov(0.2It + ε1t, 0.3It + ε2t) = 0.036.
Then 0.2 × 0.3 × var(It) = 0.036, so var(It) = 0.6, or 60%.
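A quick check (not part of the original solution) of the arithmetic in 14.1(ii):

```python
# With factor loadings 0.2 and 0.3 on the common index I_t,
# cov(e1, e2) = 0.2 * 0.3 * var(I_t) = 0.036 pins down var(I_t).
var_I = 0.036 / (0.2 * 0.3)
print(round(var_I, 2))  # -> 0.6
```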
14.2 (i) No, we do not expect the coefficient to be significantly different from zero, as unsystematic risk is not priced in an economy where all investors are fully diversified.
(ii) In this case, where investors are not fully diversified, the coefficient may be significantly positive, representing a risk premium as compensation to investors for the idiosyncratic risk of their holdings.
Chapter 15
15.1 (i) Measurement error depresses the estimate in the disposable income case.
(ii) Forecast is 13.025.
(iii) Yes, the forecast will be affected because this value multiplies the estimation errors; the further it is from the sample mean, the larger is the forecast error.
(iv) No, simultaneity bias.
15.2
15.3
Y1* = √(1 − ρ̂²) Y1.
Perform similar transforms on the Xs. Then regress Y* on c and X*. If the disturbance is not correlated with X or PRICE, then the estimators are unbiased and consistent. The adjustment depends on how accurate the estimate of ρ̂ above is. If it is not accurate, then the estimates may become biased and worse off.
(iv) k (inclusive of constant) = 2, so k − 1 = 1.
tN−k (of PRICE) = 31.20888 = t751−2, and t749² = F1,749,
so Fk−1,N−k = 31.20888² = 973.99.
(R²/[k − 1]) / ((1 − R²)/[N − k]) = Fk−1,N−k = 973.99,
so R² = 973.99/[749 + 973.99] = 0.5653.
Adjusted R²: R̄² = 1 − [RSS/(N − k)]/[TSS/(N − 1)] = (1 − k)/(N − k) + [(N − 1)/(N − k)]R²
= −1/749 + (750/749) × 0.5653 = 0.5647.
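The recovery of R² and adjusted R² from the t-statistic can be checked numerically (not part of the original solution):

```python
# With N = 751, k = 2, the squared t-statistic of PRICE is F(1, 749);
# invert F = R^2 (N-k) / (1 - R^2) to recover R^2, then adjust it.
t = 31.20888
N, k = 751, 2
F = t**2
r2 = F / ((N - k) + F)
adj_r2 = 1 - (N - 1) / (N - k) * (1 - r2)
print(round(r2, 4), round(adj_r2, 4))  # -> 0.5653 0.5647
```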
15.4
Chapter 16
16.1
Yes, a unit root process allowing for drift and trend, as suggested by the 3 tests. To avoid negative prices, we can use ln Pt = a + ln Pt−1 + εt; taking exponentials, the price process is Pt = e^a × Pt−1 × e^εt.
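An illustrative simulation (made-up drift and volatility, not from the text) of this price process, showing that exponentiation keeps prices strictly positive:

```python
import numpy as np

# Simulate P_t = e^a * P_{t-1} * e^{eps_t}: ln P_t is a unit-root process
# with drift, and P_t = exp(ln P_t) can never go negative.
rng = np.random.default_rng(2)
a, sigma = 0.001, 0.01
eps = sigma * rng.normal(size=1000)
log_p = np.log(100) + np.cumsum(a + eps)   # ln P_t, with P_0 = 100
prices = np.exp(log_p)
print(bool(prices.min() > 0))  # -> True
```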
16.2
16.3
Chapter 17
17.1
17.2
17.3
17.4
Chapter 18
18.1
18.2
R: event of recession
I: event of an inverted yield curve the year earlier
Prob(R|I) = Prob(R∩I)/Prob(I) = [7/50]/[14/50] = 50% only.
E(Yt+1) = 0.01 − 1.2 × 0.02 = −0.014, and Yt+1 | Xt ~ N(−0.014, 0.0004).
So Prob(Yt+1 < 0) = Prob({Yt+1 − (−0.014)}/0.02 < 0.014/0.02)
= Prob(z < 0.7) = 0.76.
The latter is parametric, and the distribution contains more information than the original non-parametric count. It could be that this period's negative yield slope is especially large, being larger than average.
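A quick check (not part of the original solution) of both probabilities in 18.2:

```python
from statistics import NormalDist

# Conditional recession probability from the counts, and the parametric
# P(Y_{t+1} < 0) under N(-0.014, 0.02^2).
p_cond = (7 / 50) / (14 / 50)
p_neg = NormalDist(-0.014, 0.02).cdf(0.0)
print(p_cond, round(p_neg, 2))  # -> 0.5 0.76
```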
Run the OLS regression ln Pt = ln A − B rt + et, where et is the residual error. B > 0, so an increase in rt leads to a fall in the bond price Pt: the usual inverse price-yield relationship for bonds.
The OLS estimates are ln A = 4.714 and B̂ = 10.13 (% change in 5-yr T-bond price per unit change in the 3-m T-bill rate).
Chapter 19
19.1
19.2
Chapter 20
20.1
20.2
Ht/Gt = Mt+1 Pt+1/Pt, or Ht/Gt = Mt+1 (Pt+1/Pt) − 1, are stationary.
20.3
20.5
are stationary. The ML method requires the serial correlation to be fully specified in order to compute the likelihood function of the sample. However, if the distribution is in fact known, the ML method will in general yield more efficient estimators, since GMM does not make use of distributional assumptions. Also, in finite samples the ML method in principle allows the finite-sample distribution of the estimators to be computed for exact inference, whereas GMM provides only asymptotic inference and tends to have biases in finite-sample inference and testing.
INDEX
abnormal return, 77, 82, 165, 169,
170, 171, 172, 173, 177, 182, 210,
440, 441, 455, 479, 489
ACF, 115, 118, 122, 123, 125, 126,
127, 129, 131, 132, 313, 475, 476
adjusted R², 183
aggregate consumption, 2
Akaike information criterion, 183
alpha, 73, 77
American call, 369, 370, 373, 375,
489
American put, 368, 371, 375
annual volatility, 38, 40, 385
ANOVA (analysis of variance), 206,
216
Appraisal ratio, 73
APT, 271, 272, 275, 276, 282
AR(1), 108, 109, 110, 113, 114, 116,
117, 119, 120, 122, 126, 130, 149,
152, 155, 223, 228, 238, 243, 290,
292, 450
arbitrage, 44, 269, 271, 274, 286, 380
arbitrageur, 63, 64, 66, 67
ARCH, 322, 323, 327, 328, 329, 330,
334, 339, 341, 342
ARCH-in-mean, 322
ARIMA, 104, 107, 127, 128, 129,
305, 475, 488
ARMA, 107, 108, 112, 115, 120, 127,
136, 327, 328, 452
asymptotic efficiency, 183, 322
asymptotic test, 36, 119
asymptotic variance, 146, 232, 244,
245, 246
Augmented Dickey-Fuller statistic,
302
autocorrelation function, See ACF
autocorrelogram, 107, 119, 124
autocovariance function, See ACF
autoregressive process, See AR(1)
average abnormal return, 165
backward shift operator, 107
bankruptcy, 106, 166, 176, 327, 479
BASEL II, 323
BE/ME, 277, 279, 280, 281, 284
Cochrane-Orcutt procedure, 219, 292
coefficient of determination, 44
cointegration, 248, 302
conditional mean, 27
conditional probability, 1
conditional variance, 27, 322
confidence level, 324, 325, 430
consistency, 183, 244
constrained regression, 90
consumption beta, 381
contemporaneous correlation, 219,
230
continuous probability (pdf), 5, 8, 9,
10, 14, 18
continuous time stochastic process,
345, 354
correlation (correlation coefficient), 1,
11, 12, 56, 58, 247, 248, 291, 293,
460, 483
cost of carry, 44
cost of equity, 90, 95, 97, 98, 99, 104,
106
covariance, 1, 10, 204, 243, 269, 282,
390, 413, 459
Cox-Ingersoll-Ross model, 345
Cramer-Rao lower bound, 322
credit ratings, 344
credit spread, 345
critical region, 21, 22, 23, 129, 215,
240
cross-sectional regression, 249, 257
cumulative abnormal return, 165
cumulative average abnormal return,
165
daily returns, 37, 38, 39, 41, 212, 217,
326, 419
data types, 1
day-of-the-week effect, 206
decomposition of squares, 44
defaults, 346
de-seasonalization, 107
deterministic trend, 302
DGM, 99, 100, 106, 473, 474
diagnostic, 341
disposable income, 2, 3, 4, 299, 300
dividend growth model. See DGM
futures hedging, 44
futures margin, 322
GARCH, 322, 323, 328, 330, 332,
333, 334, 335, 338, 339, 341, 342
Gauss-Markov theorem, 44
GDP, 3, 11, 124, 125, 127, 166, 202,
273, 303, 364, 365, 481
generalized least squares, 219
generalized method of moments, 381,
See GMM
geometric random walk, 34
GMM, 140, 381, 385, 386, 388, 390,
391, 392, 393, 394, 395, 490
Goldfeld-Quandt test, 219, 237, 242
gross domestic product, 3, 124
Hannan-Quinn criteria, 183
Hansen-Jagannathan bound, 381, 385
HCCME, 235, 236, 282, 458, 459
hedge ratio, 44
hedging, 44, 69, 369
heteroskedasticity, 219, 242, 243, 269,
281, 282, 323, 390
historical approach, 322
historical volatility, 367
HML, 284
homoskedastic, 50, 88, 190, 204, 220,
221, 237, 239, 243, 255, 292, 335,
449
Hotelling's T² statistic, 249
hypothesis, 1, 21, 142, 148, 153, 214,
288, 301, 312
ICAPM, 269, 271, 272, 285
idiosyncratic risk, 80, 84, 249, 266,
486
implied volatility, 367
inflation, 48, 101, 107, 132
information effect, 165
information leakage, 165
information set, 33, 34, 35, 142, 288,
388
informational efficiency, 133
instrumental variables, 219
interest rate parity, 286
internal rate of return, 90
Intertemporal Capital Asset Pricing
Model, See ICAPM
multi-factor asset pricing, 269, 272,
276, 277
multiple linear regression, 87, 183,
191, 201, 212, 219, 277, 278, 428
multivariate distribution, 43, 191
N225 index, 63, 64, 69, 70, 327
net present value, 90, 94
Newey-West covariance estimator,
381
news, 138, 139, 166, 168, 169, 170,
175, 176, 178, 179, 214, 273, 380
nominal data, 24
non-stationary process, 302
OLS, 49, 51, 54, 55, 56, 57, 59, 60,
61, 69, 70, 71, 72, 76, 80, 81, 82,
88, 91, 92, 101, 103, 113, 125, 151,
160, 168, 184, 185, 186, 188, 190,
191, 193, 194, 196, 197, 198, 201,
202, 204, 207, 208, 209, 210, 211,
213, 215, 218, 219, 220, 221, 224,
225, 226, 227, 228, 229, 230, 231,
233, 234, 235, 237, 238, 239, 240,
241, 242, 243, 244, 245, 246, 247,
248, 249, 255, 256, 257, 264, 265,
268, 276, 279, 280, 282, 290, 291,
293, 295, 296, 297, 298, 299, 300,
307, 308, 309, 310, 312, 315, 334,
338, 342, 365, 366, 434, 435, 438,
440, 443, 445, 446, 448, 449, 450,
451, 452, 453, 455, 457, 458, 459,
471, 480, 481, 484, 486, 487, 489
option premium, 367
ordinal data, 24
ordinary least squares, See OLS
Ornstein-Uhlenbeck process, 345, 355
orthogonal projection, 183
orthogonality conditions, 381
out-of-sample forecast, 107, 154
overlapping data problem, 286, 287,
290, 292, 293
overlapping forecast errors, 290
P/E ratios, 2, 210
PACF, 107, 122, 123, 125, 126, 127,
131
parametric approach, 322
partial autocorrelation function. See
PACF
Phillips curve, 48, 49
portfolios, 62, 83, 84, 85, 101, 210,
251, 253, 255, 258, 259, 260, 276,
279, 284, 285
post-event period, 165
PPP, 302, 314, 315, 316, 317, 454
pre-determined, 49, 50, 184, 191, 368
price pressure effect, 165
price-to-earnings, 1, See P/E ratios
pricing anomalies, 206, 210
probability limits, 219, 229
purchasing power parity, 302, 314,
See PPP
put-call parity theorem, 373
random coefficient, 232, 248
random walk, 27, 33, 34, 130, 133,
134, 135, 136, 141, 144, 145, 147,
148, 150, 155, 156, 290, 310, 375,
437, 453, 456, 474, 478
rates of return, 27
rational expectation, 133, 137, 138
regressand, 44, 248
regression through the origin, 90, 103
regressor, 44
relative risk aversion parameter, 391,
394
relevance exclusion, 219
representative agent, 381
residual income, 90
risk factors, 269
risk management, 322, 323
Roll's critique, 249, 252, 268
RSS, 51, 58, 193
S&P 500 Index, 1, 2, 376
sample mean, 19, 28, 29, 42, 92, 216,
262, 425, 469, 480, 486
sample variance, 19, 56, 71, 174, 340
sampling, 1, 85, 167
Schwarz criterion, 70, 183, 200, 205,
213, 215, 257, 301, 360, 362, 363,
365
seasonal effect, 206
security characteristic line, 73, 81
security market line, 90, See SML
semi-log, 48
serial correlation. See autocorrelation
Sharpe measure, 73, 83, 84
Sharpe ratio, 83, 102, 385
short rate, 345, 356, 357
significance level, 22, 23, 36, 38, 39,
40, 41, 68, 80, 84, 118, 129, 148,
178, 182, 201, 204, 215, 270, 282,
292, 293, 295, 313, 390, 392, 438,
444, 449, 453, 455, 457, 480, 484
Simultaneous equations bias, 219
single index, 75
skewness, 16, 17, 36, 37, 38, 40, 85,
393
Slutsky theorem, 231
SML, 81, 100, 101, 106, 252, 253,
254, 256, 433, 474
spot rate(s), 294, 345, 351
spot yield curve, 345
spurious regression, 302
standard deviation, 10, 79
standard normal variable, 13
statistical test, 1
stochastic regressor, 183
stochastic trend, 248, 302
stock index futures, 44
stock return, 2, 27, 30, 33, 36, 40, 70,
78, 84, 101, 119, 131, 150, 155,
156, 158, 163, 166, 170, 259, 277,
380, 437
Student-t, 1, 18, 20, 172
substitution effect, 165
sum of squares, 51, 57, 58, 193, 216,
See also RSS
super-consistency, 302
switching regression model, 232
systematic risk, 73
term structure, 99, 145, 352, 353, 355
testing of coefficient, 44
tests of restrictions, 183
three-moment asset pricing, 393
transaction costs, 64, 65, 66, 137, 153,
210, 373
treasury slope, 345, 364, 365
trend stationary process, 302
Treynor measure, 73, 82, 83
two-fund separation theorem, 251