P(x₁ ≤ X ≤ x₂), where x₁ ≤ x₂
17. Conditional probability - the probability of A given that B has occurred: P(A|B) = P(AB) / P(B)
18. Joint probability of two events - the probability that they will both occur together.
19. Multiplication rule of Probability - multiplication of conditional and unconditional probability.
a. P(AB) = P(A|B) P(B)
20. Independent events - refers to events for which the occurrence of one has no influence on the occurrence of
the others.
a. P(A|B) = P(A) or equivalently P(B|A) = P(B)
21. Dependent events - if the independence condition is not satisfied, the events are dependent events. (the
occurrence of one is dependent on the occurrence of the other)
22. Addition rule for probabilities - used to determine the probability that at least one of the two events will
occur. E.g. given two events A and B, the addition rule can be used to determine the probability that either A or
B will occur.
23. Mutually exclusive events - the joint probability P(AB) is zero, so the probability that either A or B will occur is simply the sum of the unconditional probabilities.
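A quick numerical check of the rules above (a sketch using a hypothetical two-dice sample space, not an example from the notes; events A and B are made up):

```python
# Brute-force check of the multiplication rule P(AB) = P(A|B)P(B) and the
# addition rule P(A or B) = P(A) + P(B) - P(AB) over two fair dice.
from fractions import Fraction

outcomes = [(d1, d2) for d1 in range(1, 7) for d2 in range(1, 7)]

def prob(event):
    """Unconditional probability over equally likely outcomes."""
    return Fraction(sum(1 for o in outcomes if event(o)), len(outcomes))

A = lambda o: o[0] + o[1] == 7   # hypothetical event: dice sum to 7
B = lambda o: o[0] == 3          # hypothetical event: first die shows 3

p_ab = prob(lambda o: A(o) and B(o))      # joint probability P(AB)
p_a_given_b = p_ab / prob(B)              # conditional P(A|B) = P(AB)/P(B)

assert p_ab == p_a_given_b * prob(B)                              # rule 19
assert prob(lambda o: A(o) or B(o)) == prob(A) + prob(B) - p_ab   # rule 22
print(p_a_given_b == prob(A))  # True here, so A and B are independent (rule 20)
```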
Ch 3 Basic Statistics, Miller, Math and Stats for Risk Management
1. Statistics- word used to refer to data and methods used to analyze data
2. Descriptive Statistics - summary of important characteristics of large data sets
3. Inferential Statistics - pertain to procedures used to make forecasts, estimates, judgements about large set
of data on the basis of statistical characteristics of smaller set.
4. Population - set of all possible members of a stated group
5. Measures of Central Tendency - center, average of a data set. Can be used to represent the typical or
expected value of a data set.
6. Population mean - describes the expected value/average of the entire data set (population)
7. Sample mean - the mean of a sample of the population with n observations < N population members. Used to make inferences about the population mean when it is infeasible or not possible to get all members of the population.
8. Arithmetic mean - the only measure of central tendency for which the sum of the deviations from the mean is zero.
9. Median - midpoint of a data set when ordered in ascending or descending order; half the observations lie above the median and half below.
10. Mode - most frequent value observed in the dataset.
11. Geometric mean - used when calculating investment returns over multiple periods or when measuring compound growth rates. The nth root of the product of all n observed values.
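A short sketch contrasting the two means on made-up periodic returns:

```python
# Arithmetic vs. geometric mean of investment returns; the geometric mean
# is the nth root of the product of the n growth factors, minus one.
import math

returns = [0.10, -0.05, 0.08]    # hypothetical periodic returns

arithmetic = sum(returns) / len(returns)
geometric = math.prod(1 + r for r in returns) ** (1 / len(returns)) - 1

print(f"arithmetic mean: {arithmetic:.4%}")   # simple average
print(f"geometric mean:  {geometric:.4%}")    # compound growth rate per period
```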
12. Expected Value - weighted average of all possible outcomes of a random variable.
13. Properties of Expectation - see handwritten lecture notes
14. Variance - measure of the expected spread or dispersion of a random variable about its mean (in squared units). Can only be used for one variable.
15. Sample Variance - variance of the sample computed with n − 1 degrees of freedom to give an unbiased estimator of the population variance.
16. Standard Deviation - square root of the variance; measures spread or dispersion about the mean in the same units as the mean/expected value.
17. Properties of Variance - see handwritten notes
18. Covariance - expected value of the product of the deviations of two random variables from their respective expected values. Describes the co-movement of two variables. Captures a linear relationship between two variables, but does not mean much until reduced to correlation (as a percentage). Gives the direction of co-movement but not the size (level) of the relationship.
19. Properties of Covariance - see handwritten notes
20. Correlation - covariance is easier to interpret when scaled by the standard deviations of the two variables. Measures the strength of the linear relationship between the two random variables. Ranges between −1 ≤ ρ ≤ +1.
E(Xᵏ) = Σᵢ pᵢ Xᵢᵏ (kth moment)
E[(X − μ)ᵏ] = Σᵢ pᵢ (Xᵢ − μ)ᵏ (kth central moment)
38. Point estimate - a point estimate is a linear estimator when it can be expressed as a linear function of sample data. If the estimator is the best available (has minimum variance), exhibits linearity, and is unbiased, it is said to be BLUE (best linear unbiased estimator).
the number of successes per unit, e.g., the number of calls per hour arriving at a switchboard, or the number of defects per batch in a production process. E.g., the number of patients x affected out of 200 procedures when the average rate is 2 affected persons per procedure.
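A minimal sketch of the idea, assuming an average rate of 2 events per unit (scipy assumed available):

```python
# Poisson probabilities for the number of successes per unit at rate lam.
from scipy.stats import poisson

lam = 2.0                              # assumed events per unit (e.g., per hour)
for k in range(5):
    print(k, round(poisson.pmf(k, lam), 4))  # P(X = k) = exp(-lam) lam^k / k!
print("P(X <= 4) =", round(poisson.cdf(4, lam), 4))
```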
10. Pdf of a normal distribution
11. Cdf of a normal distribution
12. Confidence interval - a range of values around the expected outcome within which we expect the actual outcome to fall some specified percentage of the time. A 95% confidence interval is a range within which we expect the random variable to fall 95% of the time.
13. Confidence intervals for Normal Distribution - interval is based on the expected value (sometimes called a
point estimate) of a random variable and on its variability, which we measure with a standard deviation.
a. For any normally distributed random variable, 68% are within one standard deviation.
b. Approx. 95% are within 2 standard deviations.
c. 90% confidence interval: x̄ − 1.65σ to x̄ + 1.65σ
d. 95% confidence interval: x̄ − 1.96σ to x̄ + 1.96σ
e. 99% confidence interval: x̄ − 2.58σ to x̄ + 2.58σ
z = (observation − population mean) / (std dev) = (x − μ) / σ
16. Z-values - standardized random variables; a z-value measures the number of standard deviations an observation lies from the mean. A standard normal r.v. has mean = 0 and sd = 1.
17. Lognormal Distribution, pdf
18. Lognormal Distribution, cdf
19. Normal vs. Lognormal Distribution
20. Central Limit Theorem - states that for simple random samples of size n from a population with mean μ and finite variance σ², the sampling distribution of the sample mean x̄ approaches a normal distribution with mean μ and variance σ²/n as n becomes large. This is because when the sample size is large, the sum of independent and identically distributed r.v.s will be normally distributed.
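A small simulation sketch of the theorem, using an assumed (skewed) exponential population:

```python
# Sample means of size n from an exponential population cluster around mu
# with variance sigma^2 / n, even though the population is not normal.
import numpy as np

rng = np.random.default_rng(42)
n, trials = 100, 10_000          # exponential(scale=1): mu = 1, sigma^2 = 1

sample_means = rng.exponential(scale=1.0, size=(trials, n)).mean(axis=1)
print(round(sample_means.mean(), 4))   # ~ mu = 1
print(round(sample_means.var(), 4))    # ~ sigma^2 / n = 0.01
```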
21. Properties of Central Limit Theorem
Bayes Theorem
Unconditional Probability of events
Joint probability of events
Bayesian Approach
Frequentist Approach
Bayes vs. Frequentist
Bayesian Theorem with Multiple States
Ch. 7 Hypothesis Testing and Confidence Intervals, Miller, Mathematics and Statistics for Financial Risk Management
1. Simple Random Sampling - method of selecting a sample in such a way that each item or person in the population
being studied has the same likelihood of being included in the sample.
2. Sampling Error - the difference between a sample statistic (mean, variance, std dev of the sample) and its corresponding population parameter (true mean, variance, std dev of the population).
a. Sampling error of the mean = sample mean − population mean = x̄ − μ
3. Sampling Distribution of the sample statistic - the sample statistic is itself a random variable and therefore has a probability distribution. This is the probability distribution of all possible sample statistics computed from a set of equal-size samples randomly drawn from the same population.
4. Sampling Distribution of the mean - obtained by repeatedly sampling the mean of n observations from the population.
5. Mean of the sample average - e.g., for two random variables X₁ and X₂:
a. E(X₁ + X₂) = μ_x + μ_x = 2μ_x, so the mean of the sample average is E[(X₁ + X₂)/2] = μ_x
6. Variance of the sample average - recall Var(X) = σ_x²; then
a. Var[(X₁ + X₂)/2] = σ_x²/2, and in general terms, Var(X̄) = σ_x²/n
7. Standard Error - the std dev of the sample average is known as the standard error: σ_X̄ = σ/√n
8. Population mean - all observed values in the population are summed and divided by the number of observations in the population.
9. Sample mean - the sum of all values in a sample of the population divided by the number of observations in the sample.
10. Dispersion - the variability around the central tendency. A recurring theme in finance is the tradeoff between reward and variability: central tendency is a measure of reward, and dispersion is a measure of risk.
11. Population variance - average of the squared deviations from the mean. Population variance uses all members of a
population.
a. σ² = Σᵢ₌₁ᴺ (Xᵢ − μ)² / N
12. Sample variance - s² = Σᵢ₌₁ⁿ (Xᵢ − X̄)² / (n − 1). Dividing by n is a biased estimator for small sample sizes; using n − 1 improves the statistical properties of s² as an estimator of σ².
a. Standard error of the sample mean, estimated with the sample standard deviation: s_X̄ = se(X̄) = s/√n
16. Covariance between two random variables is a statistical measure of the degree to which two variables move
together. Captures the linear relationship between one variable and another. Positive covariance - the variables move together in the same direction; negative covariance - they move in opposite directions.
17. Population covariance - cov_xy = Σᵢ₌₁ᴺ (Xᵢ − μ_x)(Yᵢ − μ_y) / N
18. Sample covariance - s.cov_xy = Σᵢ₌₁ⁿ (Xᵢ − X̄)(Yᵢ − Ȳ) / (n − 1)
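A sketch applying formulas 17-18 to made-up data, cross-checked against NumPy:

```python
# Population vs. sample covariance, and the correlation obtained by scaling
# the sample covariance by the two sample standard deviations.
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])     # hypothetical observations
y = np.array([2.1, 1.9, 3.5, 3.9, 5.2])

cross = (x - x.mean()) * (y - y.mean())
pop_cov = cross.mean()                       # divide by N
samp_cov = cross.sum() / (len(x) - 1)        # divide by n - 1
corr = samp_cov / (x.std(ddof=1) * y.std(ddof=1))

print(pop_cov, samp_cov, round(corr, 4))
print(round(np.corrcoef(x, y)[0, 1], 4))     # NumPy cross-check
```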
19. Confidence Intervals - estimates that result in a range of values within which the actual value of a parameter will lie with probability 1 − α. α is called the level of significance for the confidence interval, and 1 − α is the degree of confidence. E.g., a variable will range from 15 to 25 with a 95% degree of confidence, or at the 5% level of significance.
22. Construction of a confidence interval - usually constructed by adding or subtracting an appropriate value from the point estimate. In general, confidence intervals take on the following form:
a. Point estimate ± (reliability factor)(standard error)
b. For the population mean when the variance is known: x̄ ± z_{α/2}(σ/√n)
27. Commonly used reliability factors - the reliability factor z_{α/2} is the standard normal value for which the probability in the right-hand tail of the distribution is α/2:
a. z_{α/2} = 1.65 for 90% confidence intervals (significance level 10%, 5% in each tail)
b. z_{α/2} = 1.96 for 95% confidence intervals (significance level 5%, 2.5% in each tail)
c. z_{α/2} = 2.58 for 99% confidence intervals (significance level 1%, 0.5% in each tail)
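A sketch that reproduces these reliability factors and builds the resulting intervals from assumed sample figures (x̄, σ, and n are made up):

```python
# z-based confidence intervals: point estimate +/- z_{alpha/2} * sigma/sqrt(n).
from scipy.stats import norm

x_bar, sigma, n = 25.0, 4.0, 64          # hypothetical mean, pop. std dev, size
for conf in (0.90, 0.95, 0.99):
    z = norm.ppf(1 - (1 - conf) / 2)     # reliability factor z_{alpha/2}
    half = z * sigma / n ** 0.5          # (reliability factor)(standard error)
    print(f"{conf:.0%}: {x_bar - half:.2f} to {x_bar + half:.2f} (z = {z:.2f})")
```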
28. Probabilistic Interpretation - after repeatedly sampling the population and constructing confidence intervals for each sample mean, 99% of the resulting confidence intervals will, in the long run, include the population mean.
29. Practical Interpretation - we are 99% confident that the population mean score is between x₁ and x₂.
a. When the population variance is unknown, the interval is built from the t-distribution: x̄ ± t_{α/2}(s/√n)
a. t reliability factors depend on sample size and degrees of freedom (n − 1), so we cannot rely on a standard set of factors. Confidence intervals built with the t-statistic will be more conservative (wider) than those built with z reliability factors.
Selecting Appropriate Test-Statistic
a. Size of the sample influences whether or not we can construct the appropriate confidence interval for
the sample mean.
b. Distribution non-normal but population variance is known - z-statistic can be used as long as the
sample size is large (n>30). We can use this because the central limit theorem assures us that the
distribution of the sample mean is approx. normal when n is large.
c. Distribution is normal and population variance is unknown - the t-statistic can be used as long as the sample size is large (n > 30). You can use the z-statistic, but the t-statistic is more conservative.
Hypothesis testing is the statistical assessment of a statement or idea regarding a population.
Hypothesis a statement about the value of a population parameter developed for the purpose of testing a theory
or belief. Hypotheses are stated in terms of the population parameter to be tested, like the population mean, .
Hypothesis Testing procedures - based on sample statistics and probability theory, are used to determine whether
a hypothesis is a reasonable statement and should not be rejected, or if it is an unreasonable statement and should be rejected.
a. State the hypothesis
b. Select the appropriate test statistic
c. Specify the level of significance
d. State the decision rule regarding the hypothesis
e. Collect the sample and calculate the sample statistic
f. Make a decision regarding the hypothesis
g. Make a decision based on the results of the test
38. Null hypothesis, H₀ - the hypothesis actually tested and the basis for the selection of the test statistic. The null is generally a simple statement about a population parameter.
a. Typical statements: H₀: μ = μ₀, H₀: μ ≤ μ₀, H₀: μ ≥ μ₀
39. Alternative hypothesis, H_A - what is concluded if there is sufficient evidence to reject the null hypothesis. It is usually the alternative hypothesis the researcher is really trying to assess. Why? Since you cannot really prove anything with statistics, when the null is rejected, the implication is that the alternative is valid.
Choice of Null and Alternative
a. Most common null will be an Equal to hypothesis . Alternative is often the hoped-for hypothesis.
When the null is that a coefficient is equal to zero, we hope to reject it and show significance of the
relationship.
Mutually exclusive alternative - when the null is "less than or equal to," the mutually exclusive alternative is framed
as greater than.
Hypothesis testing parameters - includes two statistics
a. Test statistic calculated from the sample data
b. Critical value of the test statistic.
Test statistic calculated by comparing the point estimate of the population parameter with the hypothesized value
of the parameter. (i.e. the value specified in the null hypothesis)
43. Standard error of the sample statistic - the adjusted standard deviation of the sample.
44. Critical value of the test statistic
45. Alternative Hypothesis - can be one-sided or two-sided. Whether the test is one-sided or two-sided depends on the
proposition being tested.
46. One-tailed test - used when testing whether something is just greater than or just less than a value. If you do not know on which side, two-tailed tests are preferred.
a. Upper tail: H₀: μ ≤ μ₀ versus H_A: μ > μ₀
b. Upper tail: if the calculated test statistic is greater than 1.645 at the 5% significance level, we conclude that the sample statistic is sufficiently greater than the hypothesized value; we reject the null hypothesis.
i. If the calculated test statistic is less than 1.645, we conclude that the sample statistic is not sufficiently different from the hypothesized value, and we fail to reject the null.
c. Lower tail: H₀: μ ≥ μ₀ versus H_A: μ < μ₀
d. The appropriate set of hypotheses depends on whether we believe the population mean, μ, to be greater than (upper tail) or less than (lower tail) the hypothesized value, μ₀.
47. Two-tailed test - allows for deviations on both sides of the hypothesized value (in the general case, zero).
a. A two-tailed test can be structured as H₀: μ = 0, H_A: μ ≠ 0
b. Since the alternative allows for values above and below the hypothesized parameter, a two-tailed test uses two critical values (or rejection points).
48. Decision/rejection rule for a two-tailed z-test:
a. Reject H₀ if: test statistic > upper critical value, or test statistic < lower critical value
49. Type I Error - the rejection of the null hypothesis when it is actually true.
50. Type II Error - the failure to reject the null hypothesis when it is actually false. (Difficult to assess in practice: it depends on sample size and the critical value chosen; the alternative may not reach statistical significance because of sample size and critical value, and also because of collinearity when more than two variables are involved.)
52. Type I and Type II Errors in Hypothesis Testing
Decision \ True Condition | H₀ is True | H₀ is False
Do not reject H₀ | Correct decision | Type II error
Reject H₀ | Type I error (significance level α) | Correct decision (power of the test)
53. Relationship between confidence intervals and hypothesis tests:
a. [Sample statistic − (critical value)(standard error)] ≤ population parameter ≤ [Sample statistic + (critical value)(standard error)]
b. Equivalently: −critical value ≤ test statistic ≤ +critical value, where test statistic = (sample statistic − hypothesized value) / standard error
c. This is the range over which we fail to reject the null for a two-tailed hypothesis test at a given level of significance.
54. Statistical Significance - does not imply economic significance. When something is statistically significant based on the data, the economic benefits might be diminished by the costs of executing and maintaining the strategy (transaction costs, taxes, downside risk from short sales), which can diminish returns and make the strategy not economically viable in the long term, even if it is statistically significant above zero.
55. Economic Significance - see directly above.
56. P-value - the probability of obtaining a test statistic that would lead to a rejection of the null hypothesis, assuming the null hypothesis is true. It is the smallest level of significance for which the null hypothesis can be rejected.
57. One-tailed test p-value - the probability that lies above the computed test statistic for upper-tail tests, or below the computed test statistic for lower-tail tests.
58. Two-tailed test p-value - the probability that lies above the positive value of the computed test statistic plus the probability that lies below the negative value of the computed test statistic.
59. T-test - employs a t-statistic, i.e., a hypothesis test based on the t-distribution. Mostly used when n > 30 and the population variance is unknown. Can also be used if the distribution is normal and the sample size is n < 30.
60. Critical z-values:
Level of Significance | Two-Tailed Test
0.10 = 10% | ±1.65
0.05 = 5% | ±1.96
0.01 = 1% | ±2.58
61. Chi-Squared Test - used for hypothesis tests concerning the variance of a normally distributed population. Letting σ₀² denote the hypothesized variance:
a. H₀: σ² = σ₀² versus H_A: σ² ≠ σ₀²
b. H₀: σ² ≤ σ₀² versus H_A: σ² > σ₀²
c. H₀: σ² ≥ σ₀² versus H_A: σ² < σ₀²
64. Hypothesis testing of a population variance requires the use of a chi-squared distributed test statistic, denoted χ². The chi-squared distribution is asymmetrical and approaches the normal distribution in shape as the degrees of freedom increase.
65. χ² = (n − 1)s² / σ₀², with n − 1 degrees of freedom
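A sketch of this test with hypothetical numbers (scipy assumed available):

```python
# Two-tailed chi-squared test of H0: sigma^2 = sigma0^2 at the 5% level.
from scipy.stats import chi2

n, s2, sigma0_2 = 30, 1.9, 1.0          # assumed sample size, s^2, sigma_0^2
stat = (n - 1) * s2 / sigma0_2          # chi^2 = (n - 1) s^2 / sigma_0^2
lo, hi = chi2.ppf([0.025, 0.975], df=n - 1)

print(f"chi2 = {stat:.1f}, reject H0: {stat < lo or stat > hi}")
```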
66. F-test - hypotheses concerning the equality of the variances of two populations are tested with an F-distributed test statistic.
a. Used under the assumption that the populations from which the samples are drawn are normally distributed and that the samples are independent.
b. H₀: σ₁² = σ₂² versus H_A: σ₁² ≠ σ₂²
c. H₀: σ₁² ≤ σ₂² versus H_A: σ₁² > σ₂²
d. H₀: σ₁² ≥ σ₂² versus H_A: σ₁² < σ₂²
e. F = s₁² / s₂², where s₁² is the larger sample variance, so only the critical value in the right-hand tail needs to be considered. The F critical value also takes into account the degrees of freedom of both samples: n₁ − 1 and n₂ − 1.
68. Chebyshev's Inequality - states that for any set of observations, whether sample or population data, and regardless of the distribution, the percentage of the observations that lie within k standard deviations of the mean is at least 1 − 1/k², for all k > 1.
69. Importance of Chebyshev's Inequality - it applies to any distribution. If we know the underlying distribution is normal, we can be even more precise about the percentage of observations that will fall within a given number of standard deviations of the mean.
a. Events in nonnormal distributions may not be so rare: observations beyond 3 standard deviations can occur up to about 11% of the time (1/3² ≈ 11%).
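A sketch comparing Chebyshev's bound with the exact normal tail:

```python
# Fraction of observations allowed beyond k standard deviations: at most
# 1/k^2 for any distribution, versus the exact value for a normal.
from scipy.stats import norm

for k in (2, 3, 4):
    chebyshev = 1 / k**2                  # upper bound, any distribution
    normal = 2 * (1 - norm.cdf(k))        # exact two-sided normal tail
    print(k, f"{chebyshev:.1%}", f"{normal:.4%}")
# k = 3: Chebyshev allows ~11.1% outside, a normal puts only ~0.27% there.
```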
70. Backtesting - involves comparing expected outcomes against actual data. It is common for risk managers to backtest VaR models to ensure the model is forecasting losses with the same frequency predicted by the confidence interval.
a. When the VaR measure is exceeded during a given testing period, it is known as an EXCEPTION or an EXCEEDANCE. After backtesting, if the number of exceptions is greater than expected, the risk manager may be underestimating the actual risk, and VaR may be understated. If the number of exceptions is less than expected, the risk manager may be overestimating the actual risk.
71. Limits of backtesting VaR
a. Backtesting VaR can cause issues because exceptions are often serially correlated: there is a high probability that an exception will occur after a previous period had an exception.
b. Also, the occurrence of exceptions tends to be correlated with overall market volatility: many exceptions with high market volatility, few exceptions with low market volatility.
c. Failure of the VaR model to quickly react to changing risk levels.
Ch. 11 Correlations and Copulas, Hull, Risk Management and Financial Institutions
1. Correlation - measures the strength of the linear relationship between two variables over time.
2. Covariance - measures the direction of the co-movement between two variables over time.
3. Rho, ρ - ranges over −1, 0, and +1; this standardized measure is more convenient in risk analysis applications than covariance, which can take on any value.
a. ρ_xy = cov(X, Y) / (σ_x σ_y)
b. cov_xy = ρ_xy σ_x σ_y
c. cov(X, Y) = E[(X − E(X))(Y − E(Y))] = E(XY) − E(X)E(Y)
d.
Variables are defined as independent if knowledge of one variable does not impact the probability
distribution for another variable. In other words conditional probability of V2 given information
regarding probability distribution of V1 is equal to the unconditional probability of V2 as expressed in
the following equation
e. P(V₂ | V₁ = x) = P(V₂)
f.
Correlation of zero does not imply that there is no dependence between the two variables. It implies
there is no linear relationship between the two variables. But value of one variable can still have a
nonlinear relationship with the other variable.
4. EWMA Model
a. Conventional wisdom suggests that more recent observations should carry more weight because they more accurately reflect the current market environment. We can calculate a new covariance on day n using an exponentially weighted moving average (EWMA) model. The model is designed to vary the weight given to more recent observations by adjusting λ.
b. cov_n = λ cov_{n−1} + (1 − λ) X_{n−1} Y_{n−1}, where X_{n−1} and Y_{n−1} are the returns of X and Y on day n − 1.
c. The correlation can then be computed from the EWMA covariance and the EWMA variances.
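A one-step sketch of the update in b, with λ = 0.94 (the usual RiskMetrics choice) and made-up inputs:

```python
# EWMA covariance update: yesterday's estimate decays by lambda, and the
# latest cross-product of returns gets weight (1 - lambda).
lam = 0.94
cov_prev = 0.000012              # hypothetical day n-1 covariance estimate
x_prev, y_prev = 0.01, -0.005    # hypothetical day n-1 returns of X and Y

cov_n = lam * cov_prev + (1 - lam) * x_prev * y_prev
print(cov_n)
```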
5. GARCH(1,1) Model - Alternative method to updating covariance rate for two variables X and Y.
a. cov_n = ω + α X_{n−1} Y_{n−1} + β cov_{n−1}, where X_{n−1} Y_{n−1} is the cross-product of the day n − 1 returns
b. Equivalently, cov_n = γ V_L + α X_{n−1} Y_{n−1} + β cov_{n−1}, where V_L is the long-run average covariance rate (so ω = γ V_L)
c. This equation requires the three weights to sum to 100%, i.e., γ + α + β = 1, and the long-term average covariance rate must equal ω / (1 − α − β)
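A matching one-step sketch with hypothetical weights satisfying γ + α + β = 1:

```python
# GARCH(1,1) covariance update and the implied long-run covariance rate.
omega, alpha, beta = 0.000002, 0.10, 0.85    # assumed omega = gamma * V_L
cov_prev, x_prev, y_prev = 0.000012, 0.01, -0.005

cov_n = omega + alpha * x_prev * y_prev + beta * cov_prev
V_L = omega / (1 - alpha - beta)             # long-run average covariance
print(cov_n, V_L)
```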
6. Variance-Covariance Matrix - can be constructed using the calculated estimates of variance and covariance rates for a set of variables. The diagonal of the matrix contains the variance rates, where i = j. Covariance rates are all the other elements of the matrix, where i ≠ j.
7. Positive semidefinite - a matrix is positive semidefinite if it is internally consistent. The following expression defines the necessary condition for an N×N variance-covariance matrix Ω:
a. wᵀΩw ≥ 0 for all N×1 vectors w
b. When small changes are made to a small positive-semidefinite matrix, the matrix will most likely remain positive semidefinite. However, changes to a large matrix (e.g., 1000×1000) will most likely cause it to no longer be positive semidefinite.
8. Internally consistent
9. Generating Samples for Bivariate Normal
a. Suppose we have a bivariate normal with two variables X and Y.
b. If the value of variable X is known, the value of variable Y is conditional on the value of X.
c. The expected value of Y is then normally distributed with mean and standard deviation:
i. E(Y | X) = μ_Y + ρ_XY σ_Y (X − μ_X) / σ_X
ii. σ_{Y|X} = σ_Y √(1 − ρ²_XY)
d. Steps for generating two sample sets of variables from a bivariate normal distribution:
i. Generate independent samples Z_x and Z_y from a standard normal distribution.
ii. The samples ε_x and ε_y are then generated. The first sample, for the X variable, is the same as the independent draw: ε_x = Z_x
iii. ε_y = ρ_XY Z_x + Z_y √(1 − ρ²_XY)
iv. ρ_XY is the correlation between X and Y.
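A sketch of steps i-iii with an assumed ρ_XY = 0.6, verified against the realized sample correlation:

```python
# Correlated bivariate normal samples from two independent standard normals.
import numpy as np

rng = np.random.default_rng(0)
rho = 0.6                                      # assumed correlation rho_XY
z_x, z_y = rng.standard_normal((2, 100_000))   # step i: independent draws

eps_x = z_x                                    # step ii: first sample unchanged
eps_y = rho * z_x + z_y * np.sqrt(1 - rho**2)  # step iii: mix in Z_x

print(round(np.corrcoef(eps_x, eps_y)[0, 1], 3))   # ~ 0.6
```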
A correlation structure between standard normal variables U_i can be imposed through a common factor F:
i. U_i = α_i F + √(1 − α_i²) Z_i
ii. The constant α_i is between −1 and +1.
iii. F and the Z_i have standard normal distributions and are uncorrelated with each other.
iv. Every Z_i is uncorrelated with every other Z_j; the correlation between U_i and U_j is α_i α_j.
a. Copula - each observation of the X marginal distribution is mapped, percentile by percentile, to U_x, a standard normal variable; the same is done for every observation of the Y marginal distribution to obtain U_y. U_y is then made linearly dependent on U_x, which imposes the chosen correlation structure on U₁ and U₂.
b. Gaussian (Normal) Copula - maps the marginal distribution of each variable to the standard normal
distribution. The mapping of each variable to the new distribution is done based on percentiles.
c. Multivariate copula - used to define a correlation structure for more than two variables. Suppose the marginal distributions are known for N variables V₁, V₂, V₃, …, V_N. For each i, variable V_i is mapped to U_i, where U_i has a standard normal distribution. The correlation structure among the U_i is then imposed with the factor model:
i. U_i = α_i F + √(1 − α_i²) Z_i
ii. F and the Z_i have standard normal distributions; the factor loading α_i determines the dependency of the U variables, which also defines the covariance copula for the V variables.
16. Tail dependence
a. There is greater tail dependence in a bivariate Student's t-distribution than in a bivariate normal distribution.
b. It is more common for two variables to have the same tail values at the same time under the bivariate Student's t-distribution.
c. During a financial crisis or some other extreme market condition, it is common for assets to be highly correlated and exhibit large losses at the same time. This suggests that the Student's t copula is better than a Gaussian copula at describing the correlation structure of assets that historically have extreme outliers in the distribution tails at the same time.
Ch4 Linear Regression with One Variable, Watson, Introduction to Econometrics
1. Dependent(explained) variable - variable attempting to be explained by an independent (X) variable.
2. Parameters of an equation - indicate the relationship (change in the relationship) between two variables.
(Linear in an OLS Regression)
3. Scatter plot - a visual representation of the relationship between the dependent variable and a given
independent variable. It uses a standard two-dimensional graph where the values of the dependent, or Y
Variable , are on the vertical axis
a. Can indicate the nature of the relationship between the dependent and independent variable.
b. A closer inspection can indicate if the relationship is linear or nonlinear.
4. Population regression coefficients - describe the relationship between the dependent and independent variables, but include the entire population. They can be described as the true parameter values.
5. Regression coefficients - parameters of the population regression equation.
a. E(Y_i | X_i) = β₀ + β₁X_i
6. Error Term (noise component) - Difference between each Y and its corresponding conditional Expectation. (the
line that fits the data)
a. ε_i = Y_i − E(Y_i | X_i)
b. Deviation from the expected value is the result of factors other than the included X-variable(s).
c. Each observed Y_i is therefore the sum of its conditional expectation E(Y_i | X_i) and the error term ε_i.
b. One interpretation of the term linear relates to the independent variable(s) and specifies that the independent variables enter the equation without a transformation (such as a square root or logarithm, e.g., X = ln(amount consumed)).
c. The 2nd interpretation of the term linear applies to the parameters. It specifies that the dependent variable is a linear function of the parameters, but does not require linearity in the variables.
d. E.g., E(Y_i | X_i) = β₀ + β₁²X_i or E(Y_i | X_i) = β₀ + (1/β₁)X_{1i} are nonlinear in the parameters. It would not be appropriate to apply linear regression to estimate the parameters of these functions.
The primary concern for linear models is that they display linearity in the parameters. When we refer
to a linear regression model we generally assume that the equation is linear in the parameters; it may
or may not be linear in the variables.
11. Ordinary Least Squares (OLS) - the process that estimates the population parameters B_i with corresponding values b_i that minimize the squared residuals (i.e., error terms). Recall the expression e_i = Y_i − (b₀ + b₁X_i)
a. Minimize Σ e_i² = Σ [Y_i − (b₀ + b₁X_i)]²
b. Slope coefficient: b₁ = Σᵢ₌₁ⁿ (X_i − X̄)(Y_i − Ȳ) / Σᵢ₌₁ⁿ (X_i − X̄)²
12. Intercept term - lines intersection with the Y-axis at X=0. It can be positive, negative, zero.
a. b₀ = Ȳ − b₁X̄, where Ȳ = mean of Y and X̄ = mean of X
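A sketch implementing the slope and intercept formulas in items 11-12 on simulated data, cross-checked against numpy.polyfit:

```python
# Closed-form OLS: b1 from demeaned cross-products, b0 from the means,
# plus R^2 = 1 - SSR/TSS as a fit measure.
import numpy as np

rng = np.random.default_rng(1)
x = rng.normal(size=50)
y = 2.0 + 0.5 * x + rng.normal(scale=0.3, size=50)   # true b0 = 2, b1 = 0.5

b1 = ((x - x.mean()) * (y - y.mean())).sum() / ((x - x.mean())**2).sum()
b0 = y.mean() - b1 * x.mean()

resid = y - (b0 + b1 * x)
r2 = 1 - (resid**2).sum() / ((y - y.mean())**2).sum()
print(round(b0, 3), round(b1, 3), round(r2, 3))
print(np.polyfit(x, y, 1))    # [slope, intercept] cross-check
```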
The assumptions underlying the model include:
i. E(ε_i | X_i) = 0
ii. The variance of the error term is constant for all X_i: Var(ε_i | X_i) = σ²
iii. Corr(ε_i, ε_j) = 0 for i ≠ j, meaning that knowing the value of an error for one observation does not reveal information concerning the value of an error for another observation.
iv. The error term is normally distributed.
15. Benefits of OLS Estimators
a. Interpretation and analysis of regression outputs are easily understood across fields of study
b. Unbiased, consistent and under special conditions efficient
16. Properties of OLS Estimators and their Sampling Distributions
17. Unbiased estimator of the population mean - the mean of the sampling distribution used as an estimator of the population mean is an unbiased estimator WHEN the expected value of the estimator is equal to the parameter you are trying to estimate: E(X̄) = μ
18. Central Limit Theorem - with large sample sizes, it is reasonable to assume that the sampling distribution will
approach the normal distribution. This means that the estimator is also a consistent Estimator.
19. Consistent Estimator - an unbiased estimator is one for which the expected value of the estimator is equal to
the parameter you are trying to estimate. A consistent estimator is one for which the accuracy of the
parameter estimate increases as the sample size increases.
20. Sum of Squares Error (SSE/SSR) - sums of squares that results from placing a given intercept and slope
coefficient into the equation and computing the residuals, squaring the residuals and summing them.
a. SSE = Σ (Y_i − Ŷ_i)² = Σ e_i²
21. Total Sum of Squares (TSS) - the total variation of the dependent variable around its sample mean: Σ (Y_i − Ȳ)²
22. Relationship between the three SS - Total Sum of Squares = Explained Sum of Squares + Sum of squared
residuals
23. Coefficient of Determination, R² - the percentage of variation in the dependent variable explained by the independent variable (% explained by the regression parameters).
a. underlying concept is that for the dependent variable, there is a total sum of squares(SST or TSS)
around the sample mean. The regression explains some portion of that TSS.
b. Total Sum of Squares = Explained Sum of Squares + Sum of squared residuals
c. R² = ESS/TSS = 1 − SSR/TSS, where SSR is the sum of squared residuals and ESS is the explained sum of squares
d. r = √R² - in a single-variable regression, the correlation coefficient equals the square root of R²
1. Confidence interval for a regression coefficient, B₁:
a. b₁ ± (t_c s_{b₁}), or equivalently [b₁ − (t_c s_{b₁}) < B₁ < b₁ + (t_c s_{b₁})]
b. t_c is the critical two-tailed t-value for the selected confidence level, with the appropriate number of degrees of freedom, which is equal to the number of sample observations minus 2 (i.e., n − 2)
c. s_{b₁} is the standard error of the regression coefficient. It is a function of the SER: as SER rises, s_{b₁} also rises, and the confidence interval widens.
2. Hypothesis tests of regression coefficients (t-test) - compare the estimated coefficient b₁ with a hypothesized population value B₁:
a. t = (b₁ − B₁) / s_{b₁}, with n − 2 degrees of freedom
b. H₀: B₁ = 0 versus H_A: B₁ ≠ 0
3. P-value - smallest level of significance for which the null hypothesis can be rejected. A small p-value provides
support for rejecting the null hypothesis
a. For two-tailed tests, the p-value is the probability that lies above the positive value of the computed test statistic plus the probability that lies below the negative value of the computed test statistic.
b. The p-value gives a general idea of statistical significance without selecting a significance level.
4. Predicted Values - values of the dependent variable based on the estimated regression coefficients and a
prediction about the value of the independent variable. They are the values that are predicted by the
regression equation, given an estimate of the independent variable.
a. Confidence interval for a predicted value: Ŷ ± (t_c s_f), i.e., Ŷ − (t_c s_f) < Y < Ŷ + (t_c s_f)
b. s_f is the standard error of the forecast:
s_f² = SER² [1 + 1/n + (X − X̄)² / ((n − 1) s_x²)]
12. Effect of heteroscedasticity - the coefficient estimates b_j aren't affected, but the standard errors are. If the standard errors are too small (while the coefficient estimates themselves are not affected), the t-statistics will be too large and the null hypothesis of no statistical significance will be rejected too often. The opposite is true when standard errors are too large.
13. Detecting Heteroscedasticity a. A scatter plot of the residuals versus the independent variables can reveal patterns among
observations.
14. Correcting Heteroscedasticity - beyond the scope of the FRM; some techniques are available, such as robust standard
errors.
15. Robust standard Errors - used to recalculate the t-statistics using the original regression coefficients.
16. Gauss-Markov Theorem - if the linear regression model assumptions are true and the regression errors
display homoscedasticity, then the OLS estimators have the following properties:
a. The OLS estimated coefficients have the minimum variance compared to other methods of estimating
the coefficients (i.e. they are the most precise) BEST
b. OLS estimated coefficients are based on linear functions. LINEAR
c. The OLS estimated coefficients are unbiased, which means that in repeated sampling the averages of
the coefficients from the sample will be distributed around the true population parameters. i.e.
E(b₀) = B₀ and E(b₁) = B₁ (UNBIASED)
d. The estimator of the error-term variance is also unbiased: E(σ̂²) = σ²
a.
Y_i = B₀ + B₁X_{1i} + B₂X_{2i} + … + B_kX_{ki} + e_i
4. OLS estimators Multiple Regression a. Intercept term is the value of the dependent variable when the independent variables are all equal to
zero
b. Slope coefficients estimated change in the dependent variable for a one-unit change in the
independent variable holding the other independent variables constant.
c. Slope coefficients (parameters) are sometimes called partial slope coefficients.
5. Homoscedasticity - refers to the condition that the variance of the error term is constant for all independent
variables, X.
a. Var(e_i | X_i) = σ²
6. Heteroscedasticity - the dispersion of the error terms varies over the sample (the variance is a function of the independent variables).
7. Standard Error of the Regression (SER) - measures the uncertainty about the accuracy of the predicted values of the dependent variable, Ŷ_i = b₀ + b_iX_i. The stronger the relationship, the closer the observations lie to the regression line (the residuals e_i are smaller).
a. Formally, SER is the standard deviation of the predicted values for the dependent variable. Equivalently, it is the standard deviation of the error terms in the regression. SER is sometimes specified as s_e:
i. SER = √[ Σᵢ (Y_i − Ŷ_i)² / (n − k − 1) ] = √[ SSR / (n − k − 1) ]
8. Adjusted R² - because R² almost always increases as independent variables are added to the model, even if the marginal contribution of the new variables is not statistically significant, the adjusted R² is used to compare models:
a. R²_a = 1 − [(n − 1) / (n − k − 1)] (1 − R²)
b. R²_a is less than or equal to R².
10. Assumptions of Multiple Regression a. A linear relationship exists between the dependent and independent variables.
b. Independent variables are not random, and there is no exact linear relation between two or more
independent variables.
c. The expected value of the error term, conditional on the independent variables, is zero: E(e_i | X₁, X₂, …, X_k) = 0
d. The variance of the error terms is constant for all observations.
e. Error term for one observation is not correlated with that of another observation.
E(e_i e_j) = 0 for j ≠ i
13. Imperfect Multicollinearity - two or more independent variables are highly correlated, but less than perfectly
correlated.
14. Effect of MultiCollinearity a. Greater probability that we will incorrectly conclude that a variable is not statistically significant (Type
II Error)
15. Detecting Multicollinearity a. The situation where t-tests indicate that non o the individual coefficients is significantly different from
zero, while the R-squared in the MLR model is high.
b. Indicates variables together explain much of the variation but the individual independent variables do
not.
c. Only way this happens is when independent variables are highly correlated with each other.
d. If absolute value of the sample correlation between two indep variables is greater than 0.7
multicollinearity may be a problem.
e. Even if the individual variables are not highly correlated pairwise, linear combinations of them might lead to multicollinearity. So low pairwise correlations do not necessarily mean that multicollinearity is not a problem.
16. Correcting Multicollinearity
a. Usually requires omitting a variable with highest correlation and low individual R^2
b. Stepwise regression until multicollinearity is minimized.
Ch7 Hypothesis Tests and Confidence Intervals in Multiple Regression, Watson, Introduction to
Econometrics
1. Hypothesis Testing of Regression coefficients (Multiple Regression) a. Needed to test estimated slope coefficients to determine if the independent variables make a
significant contribution to explaining the variation in the dependent variable.
2. Determining Statistical Significance - t-statistic used to test the significance of the individual coefficients in a
multiple regression is calculated using the same formula.
a. t = (b_j − B_j) / s_{b_j}, with n − k − 1 degrees of freedom
3. Interpreting p-values - as noted above, the p-value is the smallest level of significance for which the null hypothesis can be rejected.
a. P-value is less than the significance level, the null hypothesis can be rejected.
b. P-value is greater than the significance level, the null hypothesis cannot be rejected.
4. Other tests of the Regression Coefficients
5. Confidence Intervals for Regression Coefficients
a. b_j ± t_c s_{b_j}
b. Critical t-value with n-k-1 degrees of freedom and a 5% significance level, where n is the number of
observations and k is the number of independent variables.
6. Predicting the Dependent Variable a. We can make predictions about the dependent variable based on forecasted values of the independent
variables.
b. But we need predicted values for more than one independent variable.
c. Ŷ = b₀ + b₁X̂_{1i} + b₂X̂_{2i} + … + b_kX̂_{ki}
7. Joint Hypothesis Testing - tests two or more coefficients at the same time. We could develop a hypothesis for
a linear regression model with three independent variables that sets two of these coefficients equal to zero ,
H₀: b₁ = 0 and b₂ = 0,
versus alternative hypothesis that one of them is not equal to zero. If just one of the
equalities in the null does not hold, we can reject the entire null hypothesis.
a. Using a joint hypothesis is preferred in certain scenarios since testing coefficients individually leads to
a greater chance of rejecting the null hypothesis.
8. F-Statistic
a. F-test assesses how well the set of independent variables, as a group, explains the variation in the
dependent variable.
b. F-statistic is used to test whether at least one of the independent variables explains a significant
portion of the variation of the dependent variable.
c. F = (ESS / k) / (SSR / (n − k − 1)), where ESS is the explained sum of squares and SSR is the sum of squared residuals; both are found in the ANOVA table.
i. The critical value F_c has df_num = k and df_denom = n − k − 1.
ii. Reject H₀ if F (test statistic) > F_c (critical value).
d. Rejection of the null hypothesis at a stated level of significance indicates that at least one of the
coefficients is significantly different than zero, which is interpreted to mean that at least one of the
independent variables in the regression model makes a significant contribution to the explanation of
the dependent variable.
e. When testing the hypothesis that all the regression coefficients are simultaneously equal to zero, the F-test is always a one-tailed test, despite the fact that it looks like it should be a two-tailed test because there is an equal sign in the null hypothesis.
f. It should be noted that rejecting the null hypothesis indicates one or both of the coefficients are
significant.
10. Specification Bias - refers to how the slope coefficient and other statistics for a given independent variable are
usually different in a simple regression when compared to those of the same variable when included in a
multiple regression.
a. Indicated by the extent to which the coefficients for each independent variable is different when
compared across equations.
11. Decision rule for the F-test
12. Method of rejection
13. Coefficient of multiple correlation - simply the square root of R-squared. Always positive in multiple
regression.
14. ANOVA - table that spits out all the data for a multiple linear regression. F-tests, tstats, pvalues, 95% conf
intervals, parameter coefficients, R^2, ESS, SSR, TSS
15. R², Adjusted R²
1. Mean Squared Error (MSE):
a. MSE = Σₜ₌₁ᵀ e_t² / T, where e_t = y_t − ŷ_t (the difference between the actual and expected observation)
b. E.g., for a linear trend model: y_t = β̂₀ + β̂₁TIME_t
2. Selecting forecasting models:
a. Unfortunately, selecting a model based on the smallest MSE or highest R² is not effective in providing good out-of-sample models.
b. A better methodology for selecting the best forecasting model is to find the model with the smallest out-of-sample, one-step-ahead MSE.
3. Reducing MSE Bias
a. Using in-sample MSE to estimate out-of-sample MSE is not very effective, because in-sample MSE cannot increase when more variables are included in the forecasting model. MSE will therefore have a DOWNWARD bias (be too small) when used to predict out-of-sample MSE.
b. The bias must be adjusted using the s² measure.
4. s² measure - an unbiased estimate of the MSE because it corrects for degrees of freedom, as follows:
a. s² = Σₜ₌₁ᵀ e_t² / (T − k)
5. Data mining - as more variables are included in a regression equation, the model is at greater risk of overfitting the in-sample data.
a. Problem with data mining is that the regression model does a very good job of explaining the sample
data but does a poor job of forecasting out-of-sample data.
b. As more parameters are introduced to a regression model, it will explain the data better, but may be
worse at forecasting out-of-sample data. Increasing the number of parameters will not necessarily
improve the forecasting model.
6. Model Selection Criteria - selection criteria are often compared based on a penalty factor.
a. The unbiased estimator s² defined earlier can be rewritten by multiplying the numerator and denominator by T.
7. Penalty factor - writing s² this way makes the degrees-of-freedom penalty explicit:
a. s² = [T / (T − k)] × (Σₜ₌₁ᵀ e_t² / T)
8. Akaike Information Criterion: AIC = e^(2k/T) × (Σₜ₌₁ᵀ e_t² / T)
9. Schwarz Information Criterion: SIC = T^(k/T) × (Σₜ₌₁ᵀ e_t² / T)
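A sketch computing all four criteria from one residual vector (placeholder residuals; k is the number of model parameters):

```python
# MSE has no degrees-of-freedom penalty; s^2, AIC, and SIC penalize k
# progressively harder (penalty factors T/(T-k), e^{2k/T}, T^{k/T}).
import numpy as np

def selection_criteria(resid, k):
    T = len(resid)
    mse = np.sum(resid**2) / T
    s2 = np.sum(resid**2) / (T - k)     # unbiased estimate of the MSE
    aic = np.exp(2 * k / T) * mse
    sic = T ** (k / T) * mse
    return mse, s2, aic, sic

resid = np.random.default_rng(2).normal(size=100)   # placeholder residuals
print([round(c, 4) for c in selection_criteria(resid, k=3)])
```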
10. Evaluating Consistency 11. Consistency - is a key property that is used to compare different selection criteria. Two conditions are required
for a model selection criteria to be consistent:
a. When the TRUE Model or data-generating process (DGP) is one of the defined regression models, then
the probability of selecting the true model approaches one as the sample size increases.
b. When the TRUE model is not one of the defined regression models being considered, then the
probability of selecting the BEST APPROXIMATION MODEL approaches one as the sample size
increases.
12. NOTE: because we live in a very complex world, almost all economic and financial models have assumptions
that simplify this complex environment.
a. MSE does not penalize for degrees of freedom and is therefore not consistent.
b. Unbiased MSE, s², penalizes for degrees of freedom, but the adjustment is too small for consistency.
c. AIC's penalty is greater than s²'s; however, with large sample sizes, AIC tends to select models that have too many variables or parameters.
d. SIC imposes the greatest penalty factor and is the most consistent criterion as the number of parameters increases relative to the sample size.
e. BUT if we think the true model is very complex, then the AIC should be considered/examined.
13. Asymptotic Efficiency
a. Is the property that chooses a regression model with one-step-ahead forecast error variances closest to
the variance of the true model. Interestingly, the AIC is asymptotically efficient and the SIC is not
asymptotically efficient.
b. Lag operator: L y_t = y_{t−1}, and the first difference is Δy_t = (1 − L) y_t = y_t − y_{t−1}
a. Evolving information sets move the conditional mean over time (recall that a requirement of stationarity is a constant unconditional mean).
b. Thus it can model the dynamics of a time series process that is outside of covariance stationarity (unstable).
20. Rational polynomials
a. Applying Wold's Theorem to infinite distributed lags is not practical; therefore, we need to restate this lag model as infinite polynomials in the lag operator, because infinite polynomials do not necessarily contain an infinite number of parameters.
b. Infinite polynomials that are a ratio of finite-order polynomials are known as rational polynomials.
21. Rational distributed lags - the distributed lags constructed from these rational polynomials. With these lags, we can approximate the Wold Representation. An autoregressive moving average (ARMA) process is a practical approximation of Wold's Representation Theorem.
22. Sample mean and sample autocorrelation
a. Sample Mean - an approximation of the mean of the population, which can be used to estimate the autocorrelation function.
i. ȳ = (1/T) Σₜ₌₁ᵀ y_t
b. Sample Autocorrelation estimates the degree to which white noise characterizes a series of data.
Recall that for a time series to be classified as white noise process, all autocorrelations must be zero in
the population data set. The sample autocorrelation, as a function of displacement τ, is computed as follows (see the sketch after this item):
i. ρ̂(τ) = Σₜ₌τ₊₁ᵀ [(y_t − ȳ)(y_{t−τ} − ȳ)] / Σₜ₌₁ᵀ (y_t − ȳ)²
c. Sample partial autocorrelation can also be used to determine whether a time series exhibits white noise. It differs from sample autocorrelation in that it performs linear regression on a finite or feasible data series.
i. However, the outcome of sample partial autocorrelations is typically identical to that achieved through sample autocorrelation.
ii. Sample partial autocorrelations usually plot within two-standard-error bands (i.e., a 95% confidence interval) when the time series is white noise.
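A sketch of the sample autocorrelation formula applied to simulated white noise:

```python
# rho_hat(tau): cross-products of displaced deviations over the sum of
# squared deviations. For white noise, values should hover near zero.
import numpy as np

def sample_acf(y, tau):
    ybar = y.mean()
    return np.sum((y[tau:] - ybar) * (y[:-tau] - ybar)) / np.sum((y - ybar)**2)

y = np.random.default_rng(3).normal(size=500)    # simulated white noise
print([round(sample_acf(y, t), 3) for t in range(1, 6)])
# ~95% of these should fall inside the +/- 2/sqrt(T) band.
```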
23. Q-statistic - can be used to measure the degree to which autocorrelations vary from zero and whether white noise is present in a dataset.
a. This is done by evaluating the overall statistical significance of the autocorrelations.
b. This statistical measure is approximately Chi-Squared with m degrees of freedom in large samples
under the null hypothesis of no autocorrelations.
24. Box-Pierce Q-statistic a. Reflects the absolute magnitudes of the correlations, because it sums the squared autocorrelations.
b. Thus the signs do not cancel each other out, and large positive or negative autocorrelations
coefficients will result in large Q-statistics.
25. Ljung-Box Q-Statistic a. Is similar to Box-Pierce Q-Statistic EXCEPT that it replaces the sum of squared autocorrelations with a
weighted sum of squared autocorrelations. For large samples sizes, weights for both statistics are
roughly equal. More used on small data sets but same results with large data sets.
Ch8 Modelling Cycles : MA, AR, and ARMA Models , Diebold Elements of Forecasting
1. Moving Average Process
a. Used to capture the relationship between a time series variable and its current and lagged random
shocks.
b. Is a linear regression of the current values of a time series against both the current and previous
unobserved white noise error terms, which are random shocks.
2. MA(1) Process - a first-order moving average process has a mean of zero and a constant variance and can be
defined as:
a. y_t = ε_t + θε_{t−1}
b. ε_t is the current random shock and ε_{t−1} is the previous period's random shock.
c. If θ > 0 and ε_t > 0, then y_t > 0. This is equivalent to saying that a positive error term will yield a positive dependent variable (y_t).
g. When adding back the lagged error term, we are now saying that the dependent variable is impacted not only by the current error term, but also by the previous period's unobserved error term, which is amplified by a coefficient (θ).
3. Autocorrelation cutoff
a. ρ₁ = θ / (1 + θ²), and ρ_τ = 0 for τ > 1
b. For any lag beyond the first lagged error term, the autocorrelation will be zero in an MA(1) process. This is important because it is one condition of being covariance stationary (constant mean and variance).
4. Invertibility - an MA(1) process is invertible when |θ| < 1. The autoregressive representation, which is an algebraic rearrangement of the MA(1) process, is expressed in the following formula:
a. ε_t = y_t − θε_{t−1}
b. The process of inversion enables the forecaster to express current observables in terms of past observables.
6. MA(q) process - forecasters can broaden their horizons to a finite-order MA process of order q, which essentially adds lagged error terms out to the qth lag:
a. y_t = ε_t + θ₁ε_{t−1} + … + θ_qε_{t−q}
b. ε_{t−q} is the random shock q periods back.
d. An MA(q) process theoretically captures complex patterns in greater detail, which can potentially provide
for more robust forecasting. This also lengthens the memory from one period to the qth period.
e. Ma(q) process also exhibits autocorrelation cutoff after the qth lagged error term. Again this is
important because covariance stationarity is essential to the predictive ability of the model.
7. First-order autoregressive (AR(1) ) process a. More capable of capturing a more robust relationship and not in need of being inverted because it is
already in a favorable rearrangement compared to the unadjusted moving average process.
b. Must have mean = 0 and constant variance.
c. Specified in the form of a variable regressed against itself in a lagged form. This relationship can be
shown below:
i. y_t = φy_{t−1} + ε_t, where ε_t is white noise
ii. Covariance stationarity requires |φ| < 1.
c.
Used to reinforce a very important distinction between autoregressive processes and moving average
processes.
d. Recall that moving average processes exhibit autocorrelation cutoff, which means autocorrelation are
essentially zero beyond the order of the process.
e. The significance of the Yule-Walker equation is that for autoregressive processes, the autocorrelation decays very gradually.
i. E.g., an AR(p) process: y_t = φ₁y_{t−1} + φ₂y_{t−2} + … + φ_p y_{t−p} + ε_t, covariance stationary when |φ| < 1
ii. An ARMA(1,1) process combines AR and MA terms: y_t = φy_{t−1} + ε_t + θε_{t−1} (see the sketch below)
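A sketch contrasting the MA(1) autocorrelation cutoff with the gradual AR(1) decay, using assumed θ = φ = 0.8:

```python
# Simulate an MA(1) and an AR(1) series and compare sample autocorrelations:
# MA(1) cuts off after lag 1, AR(1) decays gradually (phi, phi^2, ...).
import numpy as np

rng = np.random.default_rng(4)
T, theta, phi = 5_000, 0.8, 0.8
eps = rng.standard_normal(T)

ma = eps + theta * np.concatenate(([0.0], eps[:-1]))
ar = np.zeros(T)
for t in range(1, T):
    ar[t] = phi * ar[t - 1] + eps[t]

def acf(y, tau):
    ybar = y.mean()
    return np.sum((y[tau:] - ybar) * (y[:-tau] - ybar)) / np.sum((y - ybar)**2)

print("MA(1):", [round(acf(ma, t), 2) for t in range(1, 5)])  # ~[0.49, 0, 0, 0]
print("AR(1):", [round(acf(ar, t), 2) for t in range(1, 5)])  # ~[0.8, 0.64, ...]
```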
12. Applications of AR and ARMA processes
a. Must pay attention to the decay of the autocorrelation function.
b. Determine whether there are any spikes in the autocorrelation function, which may indicate using an AR or ARMA model.
c. E.g., if every 12th autocorrelation jumps up, there may be a seasonality effect.
d. Useful to test various models using regression results (easiest to see the differences using data that follows some pattern of seasonality).
Ch23 Estimating Volatilities and Correlations for Risk Management , Hull Options Futures and Other
Derivatives 9th ed
1. Weighting Schemes to estimate volatility
a. Equally weighted (average)
b. Autoregressive conditional heteroscedasticity model (ARCH)
c. EWMA Model
d. Generalized ARCH(1,1), GARCH(1,1)
e. Maximum likelihood estimator
2. Equally weighted (Average)
a. Traditional models first used the change in asset value from period to period. The continuously compounded return over successive days is represented as:
i. u_i = ln(S_i / S_{i−1}), where S_i = asset price at time i
ii. ū = (1/m) Σᵢ₌₁ᵐ u_{n−i}, where n = present period and m = number of observations leading up to the current period
iii. Assuming the mean return ū = 0, the variance estimate is: σ_n² = (1/m) Σᵢ₌₁ᵐ u²_{n−i}
1. In simplest terms, historical data is used to generate returns in an asset pricing series.
2. Historical returns are used to generate the volatility parameter.
3. Can be used to infer expected realizations of risk.
4. Each period is equally weighted by 1/m. If we want to put more weight on recent periods, we can use a general weighting scheme:
b. General weighting scheme:
i. σ_n² = Σᵢ₌₁ᵐ αᵢ u²_{n−i}
ii. subject to the condition Σᵢ₌₁ᵐ αᵢ = 1
iii. The objective is to give greater influence to recent observations, so the αᵢ decline in value for older observations.
3. Autoregressive conditional heteroscedasticity model (ARCH(m))
a. One extension is to include an assumed long-run variance in the weighting scheme, via ARCH:
b. σ_n² = ω + Σᵢ₌₁ᵐ αᵢ u²_{n−i}
4. EWMA
a. The model is a specific case of the general weighting model presented in the previous section.
b. The main difference is that the weights are assumed to decline exponentially back through time.
c. σ_n² = λσ²_{n−1} + (1 − λ)u²_{n−1}, where λ is the weight on the previous volatility estimate and u_{n−1} is the return (change) in the previous period.
6. GARCH(1,1) model - one of the most popular methods of estimating volatility. Not only incorporates the most
recent estimates of variances and squared return, but also a variable that accounts for a long-run average
level of variance.
7. GARCH(p,q) - p is the number of lagged terms on historical returns squared; q is the number of lagged terms on historical volatility.
8. Weights of GARCH(1,1) model
a. σ_n² = ω + αu²_{n−1} + βσ²_{n−1}, where ω = γV_L
i. The weights must sum to one: γ + α + β = 1
ii. Stability requires α + β < 1
iii. The long-run average variance V_L = ω / (1 − α − β) is not negative.
iv. EWMA is the special case where ω = 0, α = 1 − λ, and β = λ.
b. GARCH model adds to the information generated by the EWMA model in that it also assigns a
weighting to the average long-run variance estimate.
c. GARCH also has an additional characteristic which is the implicit assumption that variance tends to
revert to a long-term average level.
d. Recognition of a mean-reverting characteristic in volatility is an important feature when pricing
derivative securities such as options.
10. Mean Reversion
a. Experience indicates that volatility exhibits a mean-reverting characteristic. A GARCH model tends to
display a better theoretical justification than the EWMA model.
b. Methods for estimating GARCH parameters (or weights), however, often generate outcomes that are not consistent with the model's assumptions.
c. The sum of the weights α and β is sometimes greater than 1, which causes instability in the volatility estimation.
d. In this case, the analyst must resort to using the EWMA model.
e. If the model is to be stationary over time, the persistence (α + β) must be less than one (with respect to reversion to the mean).
f. The PERSISTENCE describes the rate at which volatility will revert to its long-term value following a large movement.
g. The HIGHER the persistence (given that it is less than one), the longer it will take to revert to the mean following a shock or large movement.
h. A persistence of 1 means there is no reversion: with each change in volatility, a new level is attained.
11. Volatility using Maximum Likelihood Estimator
a. MLE selects values of model parameters that maximize the likelihood that the observed data will occur
in the sample.
b. Requires formulating an expression or function for the underlying probability distribution of the data
and then searching for the parameters that maximize the value generated by the expression.
12. GARCH Estimation
a. Estimated using maximum likelihood techniques.
b. Estimation begins with a guess of the model's parameters.
c. Then a calculation of the likelihood function based on those parameter estimates is made.
d. Parameters then slightly adjusted until the likelihood function fails to increase, at which the estimation
process assumes it has maximized the function and stops.
e. The values of the parameters at the point of maximum value in the likelihood function are then used to
estimate GARCH model volatility.
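A sketch of this loop using a generic optimizer on placeholder returns (initial guesses, bounds, and the data are all assumptions):

```python
# GARCH(1,1) by maximum likelihood: build the variance recursion from a
# parameter guess, score it with the Gaussian log-likelihood, and let the
# optimizer adjust (omega, alpha, beta) until the likelihood stops improving.
import numpy as np
from scipy.optimize import minimize

def neg_log_likelihood(params, u):
    omega, alpha, beta = params
    var = np.empty_like(u)
    var[0] = u.var()                     # initialize with the sample variance
    for t in range(1, len(u)):
        var[t] = omega + alpha * u[t - 1]**2 + beta * var[t - 1]
    return 0.5 * np.sum(np.log(var) + u**2 / var)

u = np.random.default_rng(5).standard_normal(1_000) * 0.01  # placeholder returns
res = minimize(neg_log_likelihood, x0=(1e-6, 0.1, 0.8), args=(u,),
               bounds=[(1e-12, None), (0.0, 1.0), (0.0, 1.0)])
print(res.x)   # estimated (omega, alpha, beta)
```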
13. Volatility term structure
a. GARCH does a good job of modeling volatility clustering, where periods of high volatility tend to be followed by other periods of high volatility, and periods of low volatility tend to be followed by subsequent periods of low volatility.
b. Thus, there is autocorrelation in the squared returns, u²ᵢ.
c. If the GARCH model does a good job of explaining volatility changes, there should be very little autocorrelation in u²ᵢ / σ²ᵢ.
d. GARCH does a good job of forecasting volatility from a term structure perspective (time to maturity). Even
though the actual volatility term structure figures are somewhat different from those forecasted by
GARCH models, GARCH generated volatility data does an excellent job in predicting how the volatility
term structure responds to changes in volatility.
14. Impact of volatility changes
15. Estimating covariances and correlations
1. Simulation models - generate random inputs that are assumed to follow a probability distribution.
a. With these inputs, the simulation model then generates scenarios or trials based on the probabilities associated with the probability distributions.
b. The last step is to analyze the characteristics of the generated inputs/samples (mean, variance, skewness, confidence intervals).
2. Choosing Simulation models
a. Bootstrapping Technique
b. Parameter Estimate Technique
c. Best Fit technique
d. Subjective Guess Technique
3. Bootstrapping technique
4. Parameter Estimate Technique
5. Best-Fit Technique
6. Subjective Guess Technique
7. Monte Carlo Simulation
8. Advantages of Simulation Modeling
9. Correlations can be incorporated into simulation modeling
10. Relationship between Accuracy and Number of Scenarios
11. Estimator Bias
12. Discretization Error Bias
13. Identifying Most Efficient Estimator
14. Inverse Transform Method
15. Inverse Transform Method for Discrete Distributions
16. Inverse Transform Method for Continuous Distributions
17. Hazard Rate
18. Pseudorandom Number Generators
19. Seed random numbers
20. Midsquare technique
21. Congruential pseudorandom number generator
22. Linear congruential pseudorandom generator
23. Quasirandom Sequences / Low-discrepancy Sequences
24. Stratified Sampling
25. Latin Hypercube Sampling Method
26. Generalized permutation matrix