
The leading brand in global finance certificate training

http://finance.gaodun.cn

Brief Introduction

Quantitative Methods
Level 2 -- 2017
Instructor: Feng

Topic weight:
Study Session 1-2    Ethics & Professional Standards    10-15%
Study Session 3      Quantitative Methods                5-10%
Study Session 4      Economics                           5-10%
Study Session 5-6    Financial Reporting and Analysis   15-20%
Study Session 7-8    Corporate Finance                   5-15%
Study Session 9-11   Equity Investment                  15-25%
Study Session 12-13  Fixed Income                       10-20%
Study Session 14     Derivatives                         5-15%
Study Session 15     Alternative Investments             5-10%
Study Session 16-17  Portfolio Management                5-10%
Weights:                                                  100%

Brief Introduction

Content:
Ø Study Session 3: Quantitative Methods for Valuation
• Reading 9: Correlation and Regression
• Reading 10: Multiple Regression and Issues in Regression Analysis
• Reading 11: Time-Series Analysis
• Reading 12: Excerpt from "Probabilistic Approaches: Scenario Analysis, Decision Trees, and Simulations"

Syllabus comparison:
Ø Compared with 2016, the 2017 syllabus is unchanged.

Brief Introduction

Recommended reading:
Ø Quantitative Investment Analysis
• Richard A. DeFusco, Dennis W. McLeavey, Jerald E. Pinto, David E. Runkle
• ISBN: 978-7-111-38802-9
• China Machine Press

Study advice:
Ø This course builds up its logic step by step, so make sure you understand each knowledge point before moving on;
Ø Combine the lectures with practice questions, but simply grinding through question banks is not recommended;
Ø Most importantly, follow the lectures carefully and attentively.

Correlation Analysis

Tasks:
Ø Calculate and interpret a sample covariance and a sample correlation coefficient;
Ø Formulate a hypothesis test of a population correlation coefficient;
Ø Describe limitations of correlation analysis.

(Happiness is having someone to love, something to do, something to believe in, and something to look forward to!)

Correlation Analysis

Scatter plots
Ø A graph that shows the relationship between the observations for two data series in two dimensions.
[Figure: scatter plot of country observations -- South Korea, Australia, U.K., U.S., Switzerland, Japan]

Sample covariance
Ø A statistical measure of the degree to which two variables move together; it captures the linear relationship between two variables.

    Cov(X,Y) = Σ (Xi - X̄)(Yi - Ȳ) / (n - 1),  summed over i = 1 to n

Ø Range of Cov(X,Y): -∞ < Cov(X,Y) < +∞.
ü Cov(X,Y) > 0: the two variables tend to move together;
ü Cov(X,Y) < 0: the two variables tend to move in opposite directions.
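The covariance and correlation formulas above translate directly into code. A minimal sketch in Python; the data here are hypothetical, chosen so the two series are perfectly linearly related:

```python
import math

def sample_covariance(x, y):
    """Cov(X,Y) = sum((Xi - Xbar)(Yi - Ybar)) / (n - 1)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    return sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / (n - 1)

def sample_correlation(x, y):
    """r = Cov(X,Y) / (sX * sY)."""
    sx = math.sqrt(sample_covariance(x, x))  # sample standard deviation of X
    sy = math.sqrt(sample_covariance(y, y))  # sample standard deviation of Y
    return sample_covariance(x, y) / (sx * sy)

# Hypothetical data: y = 2x exactly, so r should be +1.
x = [1.0, 2.0, 3.0, 4.0]
y = [2.0, 4.0, 6.0, 8.0]
print(round(sample_covariance(x, y), 4))   # 3.3333
print(round(sample_correlation(x, y), 6))  # 1.0
```

Note that a positive covariance only signs the relationship; its magnitude depends on the units of X and Y, which is why the correlation coefficient on the next slide rescales it to [-1, +1].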

Correlation Analysis

Sample correlation coefficient
Ø A measure of the direction and extent of linear association between two variables.

    rXY = Cov(X,Y) / (sX × sY)

Ø Range of rXY: -1 ≤ rXY ≤ +1.

Sample correlation coefficient (Cont.)
Ø r = +1: perfect positive linear correlation;
Ø r = -1: perfect negative linear correlation.


Correlation Analysis

Sample correlation coefficient (Cont.)
Ø 0 < r < 1: positive linear correlation;
Ø -1 < r < 0: negative linear correlation;
Ø r = 0: no linear correlation.

Correlation Analysis

Steps of hypothesis testing (Review of Level 1)
Ø Step 1: stating the hypotheses: the relation to be tested;
Ø Step 2: identifying the appropriate test statistic and its probability distribution;
Ø Step 3: specifying the significance level;
Ø Step 4: stating the decision rule;
Ø Step 5: collecting the data and calculating the test statistic;
Ø Step 6: making the statistical decision;
Ø Step 7: making the economic or investment decision.

Hypothesis testing of correlation
Ø Test whether the correlation coefficient between two variables is equal to zero.
ü H0: ρ = 0; Ha: ρ ≠ 0;
ü t-test: t = r√(n - 2) / √(1 - r²), with df = n - 2;
ü Two-tailed test;
ü Decision rule: reject H0 if t > +t_critical or t < -t_critical.

Correlation Analysis

Example:
An analyst wants to test the correlation between variable X and variable Y. The sample size is 20, and he finds that the covariance between X and Y is 16. The standard deviation of X is 4 and the standard deviation of Y is 8. At the 5% significance level, test the significance of the correlation coefficient between X and Y.

Answer:
Ø H0: ρ = 0; Ha: ρ ≠ 0;
Ø Sample correlation coefficient: r = 16/(4×8) = 0.5;
Ø t-statistic: t = 0.5 × √(20 - 2) / √(1 - 0.25) = 2.45;
Ø The critical value of a two-tailed t-test with df = 18 and a significance level of 5% is 2.101;
Ø Since 2.45 is larger than 2.101, the null hypothesis can be rejected, and we can say the correlation coefficient between X and Y is significantly different from zero.
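The worked example above can be checked in code. A minimal sketch of the correlation t-test, reusing the slide's numbers (n = 20, Cov = 16, sX = 4, sY = 8; the critical value 2.101 comes from the slide):

```python
import math

def corr_t_stat(r, n):
    """t = r * sqrt(n-2) / sqrt(1 - r^2), with df = n - 2."""
    return r * math.sqrt(n - 2) / math.sqrt(1 - r * r)

r = 16 / (4 * 8)        # r = Cov(X,Y) / (sX * sY) = 0.5
t = corr_t_stat(r, 20)
print(round(t, 2))      # 2.45
# Two-tailed critical value t(0.025, df=18) = 2.101 (from the slide):
print(t > 2.101)        # True -> reject H0: rho = 0
```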

Correlation Analysis

Limitations of correlation analysis
Ø Outliers: may result in false statistical significance of a linear relationship.

Limitations of correlation analysis (Cont.)
Ø Spurious correlation: a statistically significant correlation exists when in fact there is no relation (no economic explanation).

Correlation Analysis

Limitations of correlation analysis (Cont.)
Ø Nonlinear relationships: two variables can have a strong nonlinear relation and still have a very low correlation.

Summary
Ø Importance: ☆☆
Ø Content:
• Covariance and correlation coefficient;
• Hypothesis testing of the correlation coefficient;
• Limitations of correlation analysis.
Ø Exam tips:
• This part is the foundation for what follows; it is tested at many points and in fairly flexible question formats.

Simple Linear Regression

Tasks:
Ø Describe the assumptions underlying linear regression;
Ø Calculate and interpret the predicted value and confidence interval for the dependent variable;
Ø Interpret regression coefficients, formulate their hypothesis tests, and calculate and interpret their confidence intervals.

Dependent variable (Y)
Ø The variable that you are seeking to explain;
Ø Also referred to as the explained variable or predicted variable.

Independent variable (X)
Ø The variable(s) that you are using to explain changes in the dependent variable;
Ø Also referred to as the explanatory variable or predicting variable.

Simple Linear Regression

Linear regression
Ø Use a linear regression model to explain the dependent variable using the independent variable(s).

Simple linear regression model
Ø Yi = b0 + b1·Xi + εi,  i = 1, ..., n
where:
Yi = ith observation of the dependent variable, Y;
Xi = ith observation of the independent variable, X;
b0 = intercept;
b1 = slope coefficient;
εi = error term for the ith observation (also referred to as the residual or disturbance term).

Simple Linear Regression

Assumptions of the simple linear regression model
Ø The relationship between the dependent variable (Y) and the independent variable (X) is linear;
Ø The independent variable (X) is not random;
Ø The expected value of the error term is 0: E(ε) = 0;
Ø The variance of the error term is the same for all observations (homoskedasticity): E(εi²) = σε²,  i = 1, ..., n;
Ø The error term is uncorrelated (independent) across observations: E(εiεj) = 0 for all i ≠ j;
Ø The error term (ε) is normally distributed.

The regression line (the line of best fit)
Ø Ordinary least squares (OLS) regression: chooses values for the intercept (estimated intercept coefficient, b̂0) and slope (estimated slope coefficient, b̂1) to minimize the sum of squared errors (SSE).
ü Sum of squared errors (SSE): the sum of squared vertical distances between the observations and the regression line.
Ø Equation of the regression line: Ŷi = b̂0 + b̂1·Xi

Simple Linear Regression

The regression line
Ø Estimated slope coefficient (b̂1)
ü Calculation: b̂1 = CovXY / σX²
ü Interpretation: the sensitivity of Y to a change in X.
• The change in Y for a 1-unit change in X.
Ø Estimated intercept coefficient (b̂0)
ü Calculation: b̂0 = Ȳ - b̂1·X̄
ü Interpretation: the value of Y when X is equal to zero.

Predicted value of the dependent variable
Ø The value that is predicted by the regression equation, given an estimate of the independent variable.

    Ŷ = b̂0 + b̂1·Xp

where:
Ŷ = predicted value of the dependent variable;
Xp = forecasted value of the independent variable.
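The slope and intercept formulas above can be sketched as follows; the data are hypothetical, constructed to lie exactly on Y = 1 + 2X so the estimates are easy to verify by hand:

```python
def ols_simple(x, y):
    """Estimate b1 = Cov(X,Y)/Var(X) and b0 = Ybar - b1*Xbar."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / (n - 1)
    var = sum((xi - mx) ** 2 for xi in x) / (n - 1)
    b1 = cov / var      # slope: sensitivity of Y to a 1-unit change in X
    b0 = my - b1 * mx   # intercept: value of Y when X = 0
    return b0, b1

# Hypothetical data following Y = 1 + 2X exactly:
b0, b1 = ols_simple([1.0, 2.0, 3.0, 4.0], [3.0, 5.0, 7.0, 9.0])
print(round(b0, 6), round(b1, 6))  # 1.0 2.0
```

A prediction for a forecasted Xp is then simply `b0 + b1 * xp`, matching Ŷ = b̂0 + b̂1·Xp above.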

Simple Linear Regression

Predicted value of the dependent variable (Cont.)
Ø The confidence interval for a predicted value of the dependent variable is:

    Ŷ ± (tc × sf),  i.e.,  Ŷ - (tc × sf) < Y < Ŷ + (tc × sf)

where:
tc = two-tailed critical t-value with df = n - 2;
sf = standard error of the prediction.

Significance test for a regression coefficient
Ø H0: b1 = hypothesized value; Ha: b1 ≠ hypothesized value;
ü Typically, H0: b1 = 0; Ha: b1 ≠ 0, which tests whether the independent variable explains the variation in the dependent variable.
Ø Test statistic: t = (b̂1 - b1) / s_b̂1, with df = n - 2;
Ø Decision rule: reject H0 if t > +t_critical or t < -t_critical;
Ø Rejection of the null hypothesis means the regression coefficient is significantly different from the hypothesized value.

Simple Linear Regression

Confidence interval for a regression coefficient
Ø The confidence interval for a regression coefficient is:

    b̂1 ± (tc × s_b̂1),  i.e.,  b̂1 - (tc × s_b̂1) < b1 < b̂1 + (tc × s_b̂1)

where:
tc = two-tailed critical t-value with df = n - 2;
s_b̂1 = standard error of the regression coefficient.
Ø Can be applied as a significance test for a regression coefficient.
ü If the confidence interval does not include zero, the null hypothesis (H0: b1 = 0) is rejected, and the coefficient is said to be statistically significantly different from zero.

Summary
Ø Importance: ☆☆☆
Ø Content:
• Underlying assumptions of linear regression;
• Prediction of the dependent variable;
• Interpretation of hypothesis testing for a regression coefficient.
Ø Exam tips:
• Frequent test point 1: underlying assumptions, concept questions;
• Frequent test point 2: predicted value of the dependent variable, calculation questions.
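The interval logic above can be sketched in code. The coefficient estimate, standard error, and critical value below are hypothetical placeholders, not taken from the slides:

```python
def coef_confidence_interval(b_hat, t_crit, std_err):
    """Two-tailed CI for a regression coefficient: b_hat +/- t_c * s_b."""
    half_width = t_crit * std_err
    return b_hat - half_width, b_hat + half_width

# Hypothetical estimates: b1_hat = 0.52, s_b1 = 0.20, t_c (df = 18) = 2.101.
lo, hi = coef_confidence_interval(0.52, 2.101, 0.20)
print(round(lo, 4), round(hi, 4))  # 0.0998 0.9402
# The interval excludes zero, so H0: b1 = 0 would be rejected:
print(lo > 0 or hi < 0)            # True
```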

ANOVA Analysis (1)

Tasks:
Ø Describe and interpret ANOVA;
Ø Calculate and interpret SEE, R², and the F-statistic;
Ø Describe limitations of regression analysis.

Simple Linear Regression

Analysis of variance (ANOVA)
Ø A statistical procedure for dividing the total variability of a variable into components that can be attributed to different sources.
ü Total variation = explained variation + unexplained variation
• Total sum of squares (SST) = regression sum of squares (RSS) + sum of squared errors (SSE)

Simple Linear Regression

Analysis of variance (Cont.)
Ø A graphic explanation of the components of total variation.
Ø Total sum of squares (SST): measures the total variation in the dependent variable.

    SST = Σ (Yi - Ȳ)²,  summed over i = 1 to n

Ø Regression sum of squares (RSS): measures the variation in the dependent variable that is explained by the independent variable.

    RSS = Σ (Ŷi - Ȳ)²,  summed over i = 1 to n

Simple Linear Regression

Analysis of variance (Cont.)
Ø Sum of squared errors (SSE): measures the unexplained variation in the dependent variable.

    SSE = Σ (Yi - Ŷi)²,  summed over i = 1 to n

• Also known as the sum of squared residuals or the residual sum of squares.

Ø ANOVA table

                            df     Sum of Squares (SS)   Mean Sum of Squares (MS)
    Regression (explained)  1      RSS                   MSR = RSS/1
    Error (unexplained)     n-2    SSE                   MSE = SSE/(n-2)
    Total                   n-1    SST                   -

ü MSR: mean regression sum of squares;
ü MSE: mean squared error.

Simple Linear Regression

Standard error of estimate (SEE)
Ø The standard deviation of the error terms in the regression.

    SEE = √( SSE / (n - 2) ) = √MSE

Ø Measures the degree of variability of the actual Y-values relative to the estimated Y-values from the regression equation;
ü Gauges the "fit" of the regression line: the smaller the SEE, the better the fit.

Coefficient of determination (R²)
Ø The percentage of the total variation that is explained by the regression.

    R² = explained variation / total variation = RSS / SST = (SST - SSE) / SST

ü For simple linear regression, R² is equal to the squared correlation coefficient: R² = r².
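The SST/RSS/SSE decomposition, R², and SEE above can be computed together. A minimal sketch on hypothetical data whose OLS fit is ŷ = 0.5 + 1.6x; the last line also computes the simple-regression F-statistic, F = MSR/MSE:

```python
import math

def anova_components(y, y_hat):
    """Decompose total variation: SST = RSS + SSE (holds for OLS-fitted values)."""
    n = len(y)
    y_bar = sum(y) / n
    sst = sum((yi - y_bar) ** 2 for yi in y)               # total variation
    rss = sum((fi - y_bar) ** 2 for fi in y_hat)           # explained variation
    sse = sum((yi - fi) ** 2 for yi, fi in zip(y, y_hat))  # unexplained variation
    r2 = rss / sst                                         # coefficient of determination
    see = math.sqrt(sse / (n - 2))                         # standard error of estimate
    return sst, rss, sse, r2, see

# Hypothetical data; y_hat is the OLS fit 0.5 + 1.6x for x = 1..4:
y = [2.0, 4.0, 5.0, 7.0]
y_hat = [0.5 + 1.6 * x for x in [1.0, 2.0, 3.0, 4.0]]
sst, rss, sse, r2, see = anova_components(y, y_hat)
f_stat = (rss / 1) / (sse / (len(y) - 2))  # F = MSR/MSE with df 1 and n-2
print(round(r2, 4), round(see, 4), round(f_stat, 1))  # 0.9846 0.3162 128.0
```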

Simple Linear Regression

F-statistic
Ø The F-statistic assesses how well the independent variables, as a group, explain the variation in the dependent variable; it is used to test whether at least one independent variable explains a significant portion of the variation of the dependent variable.

    F = MSR / MSE = (RSS / k) / (SSE / (n - k - 1))

F-statistic (Cont.)
Ø For simple linear regression, the F-test duplicates the t-test for the significance of the slope coefficient.
ü H0: b1 = 0; Ha: b1 ≠ 0;
ü F = MSR / MSE = (RSS / 1) / (SSE / (n - 2)), with df_numerator = 1 and df_denominator = n - 2;
ü Decision rule: reject H0 if F > Fc.
ü Note: this is always a one-tailed test.

Simple Linear Regression

Limitations of regression analysis
Ø Regression relations can change over time (parameter instability);
Ø In investment contexts, public knowledge of regression relationships may negate their future usefulness;
Ø If the regression assumptions are violated, hypothesis tests and predictions based on linear regression will not be valid.

Summary
Ø Importance: ☆☆☆
Ø Content:
• ANOVA;
• SEE, R², and the F-statistic.
Ø Exam tips:
• Frequent test point 1: given an ANOVA table, calculate a blank cell;
• Frequent test point 2: calculation and interpretation of R²; both calculation and concept questions are possible.

Multiple Regression

Tasks:
Ø Formulate a multiple regression and explain the assumptions of a multiple regression model;
Ø Interpret estimated regression coefficients, formulate hypothesis tests for them, and interpret the results;
Ø Calculate and interpret the predicted value for the dependent variable.

Multiple regression
Ø Regression analysis with more than one independent variable.
ü Multiple linear regression model:

    Yi = b0 + b1·X1i + b2·X2i + ... + bk·Xki + εi

where:
Yi = the ith observation of the dependent variable Y;
Xji = the ith observation of the jth independent variable Xj;
bj = slope coefficient of the jth independent variable.

Multiple Regression

Assumptions of multiple linear regression
Ø The relationship between the dependent variable and the independent variables is linear;
Ø The independent variables are not random. Also, no exact linear relation exists between two or more of the independent variables;
Ø The expected value of the error term, conditioned on the independent variables, is 0: E(ε | X1, X2, ..., Xk) = 0;
Ø The variance of the error term is the same for all observations (homoskedasticity): E(εi²) = σε²;
Ø The error term is uncorrelated across observations: E(εiεj) = 0 for all i ≠ j;
Ø The error term is normally distributed.

Multiple Regression

Intercept term (b0)
Ø The value of the dependent variable when the independent variables are all equal to zero.

Slope coefficient (bj)
Ø The expected increase in the dependent variable for a 1-unit increase in that independent variable, holding the other independent variables constant.
ü Also called partial slope coefficients.

Hypothesis testing of regression coefficients
Ø Hypotheses: H0: bj = hypothesized value; Ha: bj ≠ hypothesized value (or the corresponding one-sided pairs);
Ø Test statistic: t = (b̂j - bj) / s_b̂j
ü df = n - k - 1, where k = number of independent variables;
Ø Decision rule: reject H0 if
ü t > +tc or t < -tc;
ü p-value < significance level (α).

Multiple Regression

Statistical significance of an independent variable
Ø Hypotheses: H0: bj = 0; Ha: bj ≠ 0;
Ø Test statistic: t = b̂j / s_b̂j
ü df = n - k - 1, where k = number of independent variables;
Ø Decision rule: reject H0 if
ü t > +t_critical or t < -t_critical;
ü p-value < significance level (α).

Interpreting the testing results
Ø Rejection of the null hypothesis means the regression coefficient is different from/greater than/less than the hypothesized value at the given level of significance (α);
Ø For significance testing, rejection of the null hypothesis means the regression coefficient is different from zero, i.e., the independent variable explains some variation of the dependent variable.

Multiple Regression

Confidence interval for a regression coefficient
Ø The confidence interval for a regression coefficient is:

    b̂j ± (tc × s_b̂j),  i.e.,  b̂j - (tc × s_b̂j) < bj < b̂j + (tc × s_b̂j)

where:
tc = two-tailed critical t-value with df = n - k - 1;
s_b̂j = standard error of the regression coefficient.
Ø Can be applied as a significance test for a regression coefficient.
ü If the confidence interval does not include zero, the null hypothesis (H0: bj = 0) is rejected, and the coefficient is said to be statistically significantly different from zero.

Predicting the dependent variable
Ø The regression equation can be used to predict the value of the dependent variable based on assumed values of the independent variables.

    Ŷi = b̂0 + b̂1·X̂1i + b̂2·X̂2i + ... + b̂k·X̂ki

where:
Ŷi = predicted value of the dependent variable;
b̂j = estimated slope coefficient for the jth independent variable.

Summary
Ø Importance: ☆☆
Ø Content:
• Assumptions of multiple linear regression;
• Interpretation and hypothesis testing of regression coefficients;
• Prediction of the dependent variable.
Ø Exam tips:
• Frequent test point: hypothesis testing of regression coefficients; question formats are fairly flexible, including calculating the test statistic and judging and interpreting the test result.

ANOVA Analysis (2)

Tasks:
Ø Describe and interpret the ANOVA table;
Ø Calculate and interpret the F-statistic, and describe how it is used in regression analysis;
Ø Distinguish between and interpret R² and adjusted R².

Multiple Regression

ANOVA table of multiple regression

                 df       SS     MSS
    Regression   k        RSS    MSR = RSS/k
    Error        n-k-1    SSE    MSE = SSE/(n-k-1)
    Total        n-1      SST    -

Ø R² = RSS/SST
Ø F = MSR/MSE, with df of k and n - k - 1
Ø SEE = √MSE

F-statistic
Ø Tests whether all regression coefficients are simultaneously equal to zero; equivalently, tests whether the independent variables, as a group, help explain the dependent variable, or assesses the effectiveness of the model, as a whole, in explaining the dependent variable.
ü H0: b1 = b2 = ... = bk = 0; Ha: at least one bj ≠ 0 (j = 1 to k);
ü F = MSR / MSE = (RSS / k) / (SSE / (n - k - 1))

Multiple Regression

F-statistic (Cont.)
Ø Decision rule: reject H0 if F-statistic > F-critical value.
ü The F-test here is always a one-tailed test.
Ø Rejection of H0 means at least one regression coefficient is significantly different from zero, and thus at least one independent variable makes a significant contribution to the explanation of the dependent variable.

R² (Coefficient of determination)

    R² = explained variation / total variation = RSS / SST = (SST - SSE) / SST

Ø Tests the overall effectiveness (goodness of fit) of the entire set of independent variables (the regression model) in explaining the dependent variable.
ü For example, an R² of 0.7 indicates that the model, as a whole, explains 70% of the variation in the dependent variable.

Multiple Regression

R² (Cont.)
Ø For multiple regression, however, R² will increase simply by adding independent variables that explain even a slight amount of the previously unexplained variation.
ü Even if the added independent variable is not statistically significant, R² will increase.

Adjusted R²

    adjusted R² = 1 - [ (n - 1) / (n - k - 1) ] × (1 - R²)

where: n = number of observations; k = number of independent variables.
Ø Adjusted R² ≤ R², and adjusted R² may be less than zero if R² is low enough.
Ø Adding a new independent variable will increase R², but may either increase or decrease the adjusted R².
ü If the new variable has only a small effect on R², the value of adjusted R² may decrease.
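The adjusted R² formula can be sketched as follows. The R², n, and k values are hypothetical, chosen to show R² rising while adjusted R² falls when a weak regressor is added:

```python
def adjusted_r2(r2, n, k):
    """Adjusted R^2 = 1 - ((n-1)/(n-k-1)) * (1 - R^2)."""
    return 1 - (n - 1) / (n - k - 1) * (1 - r2)

# Hypothetical model with n = 30 observations and k = 3 regressors:
print(round(adjusted_r2(0.70, 30, 3), 4))  # 0.6654
# Adding a 4th regressor lifts R^2 only slightly (0.70 -> 0.71),
# and adjusted R^2 actually falls:
print(round(adjusted_r2(0.71, 30, 4), 4))  # 0.6636
```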

Multiple Regression

Interpretation of the regression model
Ø Interpretation generally focuses on the regression coefficients;
Ø It is possible to identify a relationship that has statistical significance without any economic significance.

Summary
Ø Importance: ☆☆☆
Ø Content:
• ANOVA table;
• Calculation and interpretation of the F-statistic;
• R² and adjusted R².
Ø Exam tips:
• Frequent test point 1: interpretation of the F-statistic, concept questions;
• Frequent test point 2: comparison of R² and adjusted R², concept questions.

Violations of Assumptions

Tasks:
Ø Explain the types of heteroskedasticity and how heteroskedasticity and serial correlation affect statistical inference;
Ø Describe multicollinearity and explain its causes and effects in regression analysis.

Heteroskedasticity

Definition of heteroskedasticity
Ø The variance of the errors differs across observations (i.e., the error terms are not homoskedastic).
ü Unconditional heteroskedasticity: the heteroskedasticity of the error variance is not correlated with the independent variables.
• Creates no major problems for statistical inference.

Heteroskedasticity

Definition of heteroskedasticity (Cont.)
ü Conditional heteroskedasticity: the heteroskedasticity of the error variance is correlated with (conditional on) the values of the independent variables.
• Does create significant problems for statistical inference.

Effects of heteroskedasticity
Ø The coefficient estimates (b̂j) aren't affected.
Ø The standard errors of the coefficients (s_b̂j) are usually unreliable.
ü With financial data, the standard errors are most likely underestimated, so the t-statistics (t = (b̂j - bj) / s_b̂j) will be inflated and tend to find significant relationships where none actually exist (type I error).
Ø The F-test is also unreliable.

Heteroskedasticity

Testing for heteroskedasticity
Ø Examining scatter plots of the residuals (residuals plotted against the independent variable);
Ø The Breusch-Pagan χ² test.
ü H0: no heteroskedasticity;
ü BP χ² = n × R²_resid, with df = k (the number of independent variables); a one-tailed test;
• n = the number of observations;
• R²_resid = the R² of a second regression of the squared residuals from the first regression on the independent variables.
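The Breusch-Pagan statistic above is simple to compute once the auxiliary regression's R² is known. A sketch with hypothetical inputs; the 3.841 cutoff is the standard 5% χ² critical value for df = 1:

```python
def breusch_pagan_stat(n, r2_resid):
    """BP chi-square = n * R^2 from regressing squared residuals on the X's."""
    return n * r2_resid

# Hypothetical: n = 50 observations, auxiliary-regression R^2 = 0.08.
bp = breusch_pagan_stat(50, 0.08)
print(bp)           # 4.0
# With k = 1 independent variable, chi-square critical value (5%, df=1) = 3.841:
print(bp > 3.841)   # True -> reject H0 of no conditional heteroskedasticity
```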

Heteroskedasticity

Correcting for heteroskedasticity
Ø Use robust standard errors to recalculate the t-statistics;
ü Also called White-corrected standard errors.
Ø Use generalized least squares, rather than ordinary least squares, to build the regression model.

Serial Correlation

Definition of serial correlation (autocorrelation)
Ø The residuals (error terms) are correlated with one another; it typically arises in time-series regressions.
ü Positive serial correlation: a positive/negative error for one observation increases the chance of a positive/negative error for another observation.
ü Negative serial correlation: a positive/negative error for one observation increases the chance of a negative/positive error for another observation.

Serial Correlation

Effects of serial correlation
Ø The coefficient estimates aren't affected.
Ø The standard errors of the coefficients are usually unreliable.
ü Positive serial correlation: standard errors are underestimated and t-statistics inflated, suggesting significance when there is none (type I error);
ü Negative serial correlation: vice versa (type II error).
Ø The F-test is also unreliable.

Testing for serial correlation
Ø Residual scatter plots (residuals plotted against time: runs of same-sign residuals suggest positive serial correlation; alternating signs suggest negative serial correlation).

Serial Correlation

Testing for serial correlation (Cont.)
Ø The Durbin-Watson test
ü H0: no serial correlation;
ü DW ≈ 2 × (1 - r), if the sample size is very large;
• r = correlation coefficient between residuals from one period and those from the previous period.
ü Decision rule:

    DW = 0 (r = 1) ... dL ... dU ......... 4-dU ... 4-dL ... DW = 4 (r = -1)
    [ Positive | Inconclusive | Fail to reject H0 | Inconclusive | Negative ]

Correcting for serial correlation
Ø Adjust the coefficient standard errors (recommended);
• E.g., the Hansen method, which also corrects for conditional heteroskedasticity;
• The adjusted standard errors are also called Hansen-White standard errors.
Ø Modify the regression equation itself.
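The Durbin-Watson statistic can also be computed directly from the residuals rather than via the 2×(1-r) approximation. A sketch with hypothetical, deliberately alternating residuals (the pattern that signals negative serial correlation):

```python
def durbin_watson(resid):
    """DW = sum((e_t - e_{t-1})^2) / sum(e_t^2); values near 2 suggest no serial correlation."""
    num = sum((resid[t] - resid[t - 1]) ** 2 for t in range(1, len(resid)))
    den = sum(e * e for e in resid)
    return num / den

# Alternating residuals -> DW well above 2 (toward 4), i.e. negative serial correlation:
print(round(durbin_watson([1.0, -1.0, 1.0, -1.0, 1.0, -1.0]), 2))  # 3.33
```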

Multicollinearity

Definition of multicollinearity
Ø Two or more independent variables (or combinations of independent variables) are highly (but not perfectly) correlated with each other.

Effects of multicollinearity
Ø Estimates of regression coefficients become extremely imprecise and unreliable;
Ø Standard errors of regression coefficients are inflated, so t-tests on the coefficients have little power (more type II errors).
ü Greater probability that we will incorrectly conclude that a variable is not statistically significant.

Multicollinearity

Testing for multicollinearity
Ø The t-tests indicate that none of the regression coefficients is significant, while R² is high and the F-test indicates overall significance;
Ø The absolute value of the sample correlation between any two independent variables is greater than 0.7 (not recommended).

Correcting for multicollinearity
Ø Excluding one or more of the correlated independent variables.

Summary of Assumption Violations

    Violation                        Effects         Testing
    Conditional heteroskedasticity   Type I error    Residual scatter plots; Breusch-Pagan χ²-test, BP = n×R²
    Positive serial correlation      Type I error    Residual scatter plots; Durbin-Watson test, DW ≈ 2×(1-r)
    Negative serial correlation      Type II error   Residual scatter plots; Durbin-Watson test, DW ≈ 2×(1-r)
    Multicollinearity                Type II error   t-tests indicate no significance while the F-test indicates overall significance and R² is high

Summary
Ø Importance: ☆☆☆
Ø Content:
• Definition, effects, testing, and correcting of heteroskedasticity, serial correlation, and multicollinearity.
Ø Exam tips:
• Frequent test point: effects of heteroskedasticity and serial correlation, concept questions.

Other Issues in Regression Analysis

Tasks:
Ø Formulate a multiple regression with dummy variables and interpret the coefficients;
Ø Describe the effects of model misspecification and the avoidance of its common forms;
Ø Describe models with qualitative dependent variables.

Dummy Variable

Dummy variable
Ø Qualitative variables may be used as independent variables in a regression.
ü A dummy variable is one type of qualitative variable, and takes on a value of "0" or "1".
Ø If we want to distinguish among n categories, we need n-1 dummy variables.

Example
Ø Yi = b0 + b1·X1i + b2·X2i + b3·X3i + εi
where: Yi = quarterly value of EPS of a stock

    Y                            X1   X2   X3
    Q1 EPS                       1    0    0
    Q2 EPS                       0    1    0
    Q3 EPS                       0    0    1
    Q4 EPS (omitted category)    0    0    0
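The dummy-variable setup above can be sketched in code; the coefficient values below are hypothetical placeholders, not estimated from data:

```python
def quarterly_dummies(quarter):
    """Three dummies for Q1-Q3; Q4 is the omitted category (all zeros)."""
    return [1 if quarter == q else 0 for q in (1, 2, 3)]

def predict_eps(b0, b1, b2, b3, quarter):
    """Y = b0 + b1*X1 + b2*X2 + b3*X3 with quarterly dummies."""
    x1, x2, x3 = quarterly_dummies(quarter)
    return b0 + b1 * x1 + b2 * x2 + b3 * x3

# Hypothetical coefficients: b0 = 1.2 is the average Q4 (omitted-category) EPS;
# b1 = 0.3 means Q1 EPS averages 0.3 higher than Q4 EPS.
print(predict_eps(1.2, 0.3, 0.1, -0.2, 1))  # 1.5
print(predict_eps(1.2, 0.3, 0.1, -0.2, 4))  # 1.2
```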

Dummy Variable

Interpretation of coefficients
Ø Intercept coefficient (b0): the average value of the dependent variable for the omitted category.
Ø Regression coefficient (bj): the difference in the dependent variable (on average) between the category represented by the jth dummy variable and the omitted category.

Model Misspecification

Definition of model misspecification
Ø Misspecification concerns the set of variables included in the regression and the regression equation's functional form; a model is misspecified when either is chosen incorrectly.

Model Misspecification

Categories of model misspecification
Ø Misspecified functional form
ü Important variables omitted;
ü Variables need to be transformed;
ü Data pooled incorrectly.
Ø Independent variables correlated with the error term
ü Lagged dependent variables used as independent variables;
ü Incorrect dating of variables;
ü Independent variables measured with error.
Ø Other types of time-series misspecification

Effects of model misspecification
Ø Regression coefficients are often biased and inconsistent, leading to unreliable hypothesis testing and inaccurate predictions.

Model Misspecification

Avoiding model misspecification
Ø The model should be grounded in cogent economic reasoning;
Ø The functional form chosen for the variables should be appropriate given the nature of the variables;
Ø The model should be parsimonious;
Ø The model should be examined for violations of regression assumptions before being accepted;
Ø The model should be tested and found useful out of sample before being accepted.

Qualitative Dependent Variable

Qualitative dependent variable
Ø Dummy variables used as dependent variables instead of as independent variables.
ü Probit and logit models;
ü Discriminant models.

Summary
Ø Importance: ☆
Ø Content:
• Dummy variables;
• Model misspecification;
• Qualitative dependent variables.
Ø Exam tips:
• Not a key exam topic.

Trend Models

Tasks:
Ø Calculate and evaluate the predicted trend value for a time series;
Ø Describe factors that determine trend model selection;
Ø Evaluate limitations of trend models.

Trend Models

Linear trend models
Ø Work well in fitting time series that have a constant amount of change with time.

    yt = b0 + b1·t + εt;  fitted: ŷt = b̂0 + b̂1·t

Log-linear trend models
Ø Work well in fitting time series that have a constant growth rate with time (exponential growth).

    yt = e^(b0 + b1·t + εt),  i.e.,  ln(yt) = b0 + b1·t + εt

Trend Models

Linear trend model vs. log-linear trend model
Ø If the data plot with a linear shape (constant amount of change), a linear trend model may be appropriate.
Ø If the data plot with a non-linear (curved) shape (constant growth rate), a log-linear model may be more suitable.

Limitations of trend models
Ø A trend model is not appropriate for a time series whose data exhibit serial correlation.
ü Use the Durbin-Watson statistic to detect serial correlation.

Summary
Ø Importance: ☆
Ø Content:
• Linear trend model & log-linear trend model;
• Limitations of trend models.
Ø Exam tips:
• Not a key exam topic.

Autoregressive Models (AR)

Tasks:
Ø Describe the structure of an AR model, and explain the testing of autocorrelations of the residuals;
Ø Calculate one- and two-period-ahead forecasts given the estimated coefficients of an AR model;
Ø Explain mean reversion and calculate a mean-reverting level.

Autoregressive Model (AR)

Covariance stationarity
Ø A key assumption for an AR time-series model to be valid based on ordinary least squares (OLS) estimates.
Ø A covariance stationary series must satisfy three principal requirements:
ü Constant and finite expected value in all periods;
ü Constant and finite variance in all periods;
ü Constant and finite covariance with itself for a fixed number of periods in the past or future, in all periods.

Autoregressive model
Ø Uses past values of the dependent variable as independent variables.
ü AR(1): first-order autoregressive model

    xt = b0 + b1·xt-1 + εt

ü AR(p): p-order autoregressive model

    xt = b0 + b1·xt-1 + b2·xt-2 + ... + bp·xt-p + εt

• where p indicates the number of lagged values that the autoregressive model includes as independent variables.

Autoregressive Model

Chain rule of forecasting
Ø A one-period-ahead forecast for an AR(1) model:

    x̂t+1 = b̂0 + b̂1·xt

Ø A two-period-ahead forecast for an AR(1) model:

    x̂t+2 = b̂0 + b̂1·x̂t+1

Detecting autocorrelation
Ø Step 1: Estimate the AR(1) model using linear regression:
ü xt = b0 + b1·xt-1 + εt
Ø Step 2: Compute the autocorrelations ρ(εt, εt-k) of the residuals;
ü Autocorrelation: the correlation of a time series with its own past values;
ü The order of the correlation is given by k, where k represents the number of periods lagged.

Autoregressive Model

Detecting autocorrelation (Cont.)
Ø Step 3: Test whether the autocorrelations are significantly different from zero.
ü t = ρεt,εt-k / (1/√T)
• T: the number of observations in the time series;
• Degrees of freedom: T - 2.
ü If the residual autocorrelations differ significantly from 0, the model is not correctly specified and needs to be modified.

Autoregressive Model

Seasonality
Ø A time series that shows regular patterns of movement within the year.
Ø Testing for seasonality: test whether the seasonal autocorrelation of the residuals differs significantly from 0.
ü The 4th autocorrelation in the case of quarterly data;
ü The 12th autocorrelation in the case of monthly data.
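Step 3 above divides each residual autocorrelation by its standard error 1/√T; any lag whose t-statistic exceeds the critical value signals misspecification. A sketch with made-up autocorrelations:

```python
import math

# Hypothetical residual autocorrelations at lags 1-4 from an AR(1) fit
rho = [0.02, -0.05, 0.03, 0.25]
T = 100                          # number of observations
se = 1 / math.sqrt(T)            # standard error of each autocorrelation = 0.1

t_stats = [r / se for r in rho]  # t = rho / (1/sqrt(T)); roughly [0.2, -0.5, 0.3, 2.5]
t_crit = 1.98                    # approx. two-tailed 5% critical value, df = T - 2

significant = [abs(t) > t_crit for t in t_stats]
print(significant)  # [False, False, False, True] -> lag 4 suggests seasonality
```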

Autoregressive Model

Seasonality (Cont.)
Ø Correcting for seasonality: include a seasonal lag in the AR model:
ü Quarterly data: xt = b0 + b1xt-1 + b2xt-4 + εt
ü Monthly data: xt = b0 + b1xt-1 + b2xt-12 + εt
Ø Forecasting using an AR model with a seasonal lag:
ü Quarterly data: x̂t = b̂0 + b̂1xt-1 + b̂2xt-4
ü Monthly data: x̂t = b̂0 + b̂1xt-1 + b̂2xt-12

Autoregressive Model

Mean reversion
Ø A time series shows mean reversion if it has a tendency to move towards its mean.
ü It tends to fall when it is above its mean and rise when it is below its mean.
Ø Mean-reverting level for an AR(1) model:
xt = b0 / (1 - b1)
ü Covariance stationary → finite mean-reverting level;
ü |b1| < 1 in the AR(1) model → finite mean-reverting level.
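The mean-reverting level is a frequently tested calculation; a sketch with hypothetical estimates b0 = 2.0 and b1 = 0.6:

```python
# Hypothetical AR(1) estimates (illustrative values only)
b0, b1 = 2.0, 0.6
assert abs(b1) < 1               # |b1| < 1 -> finite mean-reverting level

mean_reverting_level = b0 / (1 - b1)   # 2.0 / 0.4 = 5.0
print(mean_reverting_level)            # 5.0

# Forecasts are pulled toward this level: a value above it is forecast to fall
x_t = 8.0
x_next = b0 + b1 * x_t                 # 6.8, between x_t and the level
```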

Summary
Ø Importance: ☆☆
Ø Content:
• Covariance stationarity and the AR model;
• Autocorrelation and seasonality;
• Mean reversion.
Ø Exam tips:
• Frequently tested: calculation of the mean-reverting level.

Random Walk
Tasks:
Ø Describe characteristics of random walk processes;
Ø Describe unit roots for time-series analysis and the steps of the unit root test for nonstationarity;
Ø Demonstrate how a random walk can be transformed to be stationary.

Random Walk

Random walk (simple random walk)
Ø A time series in which the value of the series in one period is the value of the series in the previous period plus an unpredictable random error.
xt = xt-1 + εt
ü A special AR(1) model with b0 = 0 and b1 = 1;
ü The best forecast of xt is xt-1.

Random Walk

Random walk with a drift
Ø A random walk with an intercept term that is not equal to zero (b0 ≠ 0).
xt = b0 + xt-1 + εt
ü It increases or decreases by a constant amount (b0) in each period.
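The two processes can be sketched by accumulating random errors; the drift value 0.5 is illustrative:

```python
import numpy as np

rng = np.random.default_rng(7)
n = 200
eps = rng.normal(size=n)        # unpredictable random errors

# Simple random walk: x_t = x_{t-1} + eps_t   (b0 = 0, b1 = 1)
rw = np.cumsum(eps)

# Random walk with a drift: x_t = 0.5 + x_{t-1} + eps_t   (b0 = 0.5)
drift = np.cumsum(0.5 + eps)

# The drift series ends n * 0.5 above the plain random walk, since both
# accumulate the same errors; neither has a finite mean-reverting level.
print(drift[-1] - rw[-1])  # approx. 100.0
```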

Random Walk

Random walk vs. covariance stationary
Ø A random walk will not exhibit covariance stationarity.
ü A time series must have a finite mean-reverting level to be covariance stationary;
ü A random walk has an undefined mean-reverting level:
xt = b0 / (1 - b1) = 0 / 0
Ø The least squares regression method doesn’t work to estimate an AR(1) model on a time series that is actually a random walk.

Random Walk

Unit root
Ø A time series is said to have a unit root if the lag coefficient is equal to one (b1 = 1), in which case it follows a random walk process.
ü Testing for a unit root can be used to test for nonstationarity, since a random walk is not covariance stationary;
• But a t-test of the hypothesis that b1 = 1 in the AR model is invalid for testing the unit root;

Random Walk

Unit root (Cont.)
ü Testing of the AR model can determine whether a time series is covariance stationary.
• If the autocorrelations at all lags are statistically indistinguishable from zero, the time series is stationary.

Random Walk

Dickey-Fuller test for unit root
Ø Step 1: Start with an AR(1) model: xt = b0 + b1xt-1 + εt;
Ø Step 2: Subtract xt-1 from both sides:
xt - xt-1 = b0 + (b1 - 1)xt-1 + εt;
ü Or: xt - xt-1 = b0 + g1xt-1 + εt, where g1 = b1 - 1;
Ø Step 3: Test whether g1 = 0.
ü H0: g1 = 0; Ha: g1 < 0;
ü Calculate the t-statistic and use revised critical values;
ü If we fail to reject H0, there is a unit root and the time series is non-stationary.
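The three steps can be sketched as a regression of (xt - xt-1) on xt-1; note that in practice the t-statistic must be compared against Dickey-Fuller critical values rather than ordinary t-table values (the -2.87 figure below is the approximate 5% value for a model with a constant):

```python
import numpy as np

rng = np.random.default_rng(0)
x = np.cumsum(rng.normal(size=500))   # a simulated random walk: has a unit root

# Step 2: regress dx_t = x_t - x_{t-1} on a constant and x_{t-1}
dx = np.diff(x)
X = np.column_stack([np.ones(len(dx)), x[:-1]])
coef, *_ = np.linalg.lstsq(X, dx, rcond=None)
g1 = coef[1]                          # g1 = b1 - 1; near 0 for a random walk

# Step 3: t-statistic for H0: g1 = 0 (unit root) vs. Ha: g1 < 0
resid = dx - X @ coef
s2 = resid @ resid / (len(dx) - 2)
se_g1 = np.sqrt(s2 * np.linalg.inv(X.T @ X)[1, 1])
t_stat = g1 / se_g1

# For a true random walk this typically fails to reject H0 (t_stat above the
# approximate 5% Dickey-Fuller critical value of about -2.87 with a constant)
print(g1, t_stat)
```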

Random Walk

First differencing
Ø A random walk (i.e., a series with a unit root) can be transformed into a covariance stationary time series by first differencing.
ü Subtract xt-1 from both sides of the random walk model:
xt - xt-1 = xt-1 - xt-1 + εt = εt
ü Define yt = xt - xt-1, so yt = εt;
Or yt = b0 + b1yt-1 + εt, where b0 = b1 = 0;
ü Then yt is a covariance stationary variable with a finite mean-reverting level of 0/(1-0) = 0.

Summary
Ø Importance: ☆☆☆
Ø Content:
• Random walk;
• Testing of unit roots;
• First differencing.
Ø Exam tips:
• Frequently tested: the unit root test procedure, interpretation of the test results, and how a random walk is transformed into a stationary series (first differencing).
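First differencing can be sketched in one line; the differenced series recovers the white-noise errors, which are stationary:

```python
import numpy as np

rng = np.random.default_rng(1)
eps = rng.normal(size=1000)     # white-noise errors
x = np.cumsum(eps)              # random walk x_t = x_{t-1} + eps_t (unit root)

y = np.diff(x)                  # first difference: y_t = x_t - x_{t-1} = eps_t

# y recovers the errors, so it is covariance stationary with a
# mean-reverting level of 0 / (1 - 0) = 0
print(bool(np.allclose(y, eps[1:])))  # True
```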

Model Evaluation
Tasks:
Ø Contrast in-sample and out-of-sample forecasts;
Ø Explain the ARCH model;
Ø Determine and justify an appropriate time-series model.

Model Evaluation

Comparing forecasting model performance
Ø In-sample forecast errors: the residuals within the sample period used to estimate the model;
Ø Out-of-sample forecast errors: the residuals outside the sample period used to estimate the model.
Ø Root mean squared error (RMSE) criterion: the model with the smallest RMSE for the out-of-sample data is typically judged most accurate.
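The RMSE comparison can be sketched as follows; the out-of-sample actuals and the two models' forecasts are made-up numbers:

```python
import math

def rmse(actual, forecast):
    """Root mean squared error of a set of forecasts."""
    sq_errors = [(a - f) ** 2 for a, f in zip(actual, forecast)]
    return math.sqrt(sum(sq_errors) / len(sq_errors))

# Hypothetical out-of-sample actual values and two models' forecasts
actual  = [1.0, 1.5, 0.8, 1.2]
model_a = [0.9, 1.4, 1.0, 1.1]
model_b = [1.3, 1.1, 0.5, 1.6]

# The model with the smaller out-of-sample RMSE is judged more accurate
print(rmse(actual, model_a) < rmse(actual, model_b))  # True -> prefer model A
```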

Model Evaluation

Instability of regression coefficients
Ø Financial and economic relationships are inherently dynamic, so the estimates of regression coefficients of the time-series model can change substantially across different sample periods.
Ø There is a tradeoff between reliability and stability.
• Models estimated with shorter time series are usually more stable but less reliable.

Model Evaluation

Autoregressive Conditional Heteroskedasticity (ARCH)
Ø Review of conditional heteroskedasticity: heteroskedasticity in which the error variance is correlated with (conditional on) the values of the independent variables.
Ø ARCH: conditional heteroskedasticity in AR models.
ü When ARCH exists, the standard errors of the regression coefficients in AR models are incorrect, and the hypothesis tests of these coefficients are invalid.

Model Evaluation

ARCH(1) model
Ø The variance of the error in a particular time-series model in one period depends on the variance of the error in previous periods.
ε̂t² = a0 + a1ε̂t-1² + ut, where ut is the error term.
ü If the coefficient a1 is statistically significantly different from 0, the time series is ARCH(1).
ü If a time-series model has ARCH(1) errors, generalized least squares must be used to develop a predictive model.

Model Evaluation

Predicting variance with ARCH models
Ø If a time-series model has ARCH(1) errors, the ARCH model can be used to predict the variance of the residuals in future periods.
ü σ̂t+1² = â0 + â1ε̂t²
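Detecting ARCH(1) amounts to regressing squared residuals on their own first lag. The sketch below simulates ARCH(1) errors (with illustrative parameters a0 = 0.5, a1 = 0.4), recovers a1, and forms the variance forecast a0_hat + a1_hat * eps[-1]**2:

```python
import numpy as np

rng = np.random.default_rng(3)
n = 2000
a0, a1 = 0.5, 0.4               # hypothetical ARCH(1) parameters

# Simulate errors whose variance depends on the previous squared error
eps = np.empty(n)
eps[0] = rng.normal()
for t in range(1, n):
    eps[t] = rng.normal() * np.sqrt(a0 + a1 * eps[t - 1] ** 2)

# Test for ARCH(1): regress eps_t^2 on a constant and eps_{t-1}^2
e2 = eps ** 2
X = np.column_stack([np.ones(n - 1), e2[:-1]])
a0_hat, a1_hat = np.linalg.lstsq(X, e2[1:], rcond=None)[0]

# If a1_hat is significantly different from 0, the errors are ARCH(1);
# the predicted variance of the next residual is then:
var_next = a0_hat + a1_hat * eps[-1] ** 2
print(a1_hat, var_next)
```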

Model Evaluation

Steps in time series forecasting
Ø Does the series have a trend? (check by plotting the data)
ü Yes: fit a linear trend or an exponential trend;
ü Then run a DW test on the residuals for serial correlation:
• No: use the trend model;
• Yes: use an AR model instead.

Model Evaluation

Steps in time series forecasting (Cont.)
Ø Before fitting an AR model: is the series covariance stationary?
ü No: apply first differencing;
ü Yes: estimate an AR(1) model, then test the residuals for serial correlation:
• Yes: add lags until no serial correlation remains;
• No: test for ARCH:
• ARCH: use generalized least squares;
• No ARCH: use the time-series model.

Regression With Two Time Series

Regression with two time series
Ø When running a regression with two time series, either or both could be subject to nonstationarity.
Ø Dickey-Fuller tests can be used to detect a unit root:
ü If neither time series has a unit root, linear regression can be safely used;
ü If only one time series has a unit root, linear regression cannot be used;

Regression With Two Time Series

Regression with two time series (Cont.)
ü If both time series have a unit root:
• If the two series are cointegrated, linear regression can be used;
• If the two series are not cointegrated, linear regression cannot be used.
ü Cointegration: two time series have a long-term financial or economic relationship such that they do not diverge from each other without bound in the long run.

Summary
Ø Importance: ☆
Ø Content:
• In-sample and out-of-sample forecasting;
• ARCH model;
• Regression with two time series.
Ø Exam tips:
• Not a major exam focus.

Simulation
Tasks:
Ø Describe the steps of simulation and the treatment of correlation;
Ø Describe the advantages, constraints, and issues of simulation;
Ø Compare scenario analysis, decision trees, and simulations.

Simulation

Steps in running a simulation
Ø Determine “probabilistic” variables;
Ø Define probability distributions for these variables;
Ø Check for correlation across variables;
Ø Run the simulation.

Simulation

Define probability distributions for variables
Ø Historical data;
Ø Cross-sectional data;
Ø Statistical distribution and parameters.

Simulation

Treatment of correlation across variables
Ø When there is strong correlation, positive or negative, across inputs, we have two choices:
ü Pick only the one input that has the bigger impact on value;
ü Build the correlation explicitly into the simulation.

Simulation

Advantages of using simulations
Ø Better input estimation;
Ø It yields a distribution for the expected value rather than a point estimate.
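Building correlation explicitly into a simulation is often done by drawing correlated random inputs, e.g. via a Cholesky factorization of an assumed correlation matrix. All inputs below (two value drivers, a 0.7 correlation, the means and standard deviations) are made up for illustration:

```python
import numpy as np

rng = np.random.default_rng(11)
n_trials = 100_000

# Hypothetical inputs: revenue growth and margin, assumed correlation 0.7
corr = np.array([[1.0, 0.7],
                 [0.7, 1.0]])
L = np.linalg.cholesky(corr)

z = rng.normal(size=(2, n_trials))   # independent standard normals
draws = L @ z                        # correlated standard normals

growth = 0.05 + 0.02 * draws[0]      # mean 5%, sd 2% (illustrative)
margin = 0.20 + 0.03 * draws[1]      # mean 20%, sd 3% (illustrative)

value = growth * margin              # toy "output" of each simulation trial
print(np.corrcoef(draws)[0, 1])      # close to the assumed 0.7
print(value.mean())                  # a full distribution, summarized by its mean
```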

Simulation

Constraints on simulations
Ø Book value constraints;
Ø Earnings and cash flow constraints;
Ø Market value constraints.

Simulation

Issues in using simulations in risk assessment
Ø Garbage in, garbage out;
Ø Real data may not fit distributions;
Ø Non-stationary distributions;
Ø Changing correlation across inputs.

Simulation

Comparing probabilistic approaches
Ø How to choose among the probabilistic approaches: scenario analysis, decision trees, and simulation:
ü Selective vs. full risk analysis;
ü Type of risk;
• Discrete vs. continuous.
ü Correlations across risks;
ü Quality of information.

Summary
Ø Importance: ☆
Ø Content:
• Steps of simulation and ways to define probability distributions;
• Advantages, constraints, and issues of simulation;
• Comparison of scenario analysis, decision trees, and simulation.
Ø Exam tips:
• Not a major exam focus.
